Case Study

Monolithic to Microservices Migration — BPM System

Senior Software Engineer, Bangladesh Software Solution
[Figure: Monolithic application breaking into smaller microservices, with a separate UI, service layer, and database]

Platform

SaaS-based BPM Platform

Solution

Monolith to Microservices Migration

Industry

Business Process Automation / Enterprise SaaS

Key Tech

Python, .NET, Kubernetes, Dapr, Keycloak, ELK Stack

1. Monolithic Architecture — Challenges and Limitations

The initial version of the BPM platform was built as a single monolithic application. While this approach accelerated early-stage development and reduced initial complexity, it introduced significant scalability, performance, and maintainability limitations as the platform grew.

1.1 Identified Pain Points

Challenge: Slow Step-by-Step Execution
Monolithic Problem: Each workflow step waited for the previous one to finish before starting — like a single queue where everyone waits in line, making the whole system slow.
Microservices Solution: Asynchronous Communication — services communicate through messages without waiting, so tasks run independently without blocking each other.

Challenge: No Real Parallel Processing
Monolithic Problem: When a workflow needed to do multiple things at once, the system still did them one by one — losing the benefit of parallel execution.
Microservices Solution: Independent Service Execution — each service runs as a separate process, so multiple tasks execute truly in parallel across different services.

Challenge: Slow Data Processing
Monolithic Problem: All data requests went through one single layer, creating a bottleneck — like having only one cashier for thousands of customers.
Microservices Solution: Database per Service — each service owns its own database, so data operations are distributed and don’t create a single bottleneck.

Challenge: Hard to Find Errors
Monolithic Problem: When something broke, finding the exact cause was like finding a needle in a haystack — everything was mixed together with no clear separation.
Microservices Solution: Centralized Logging & Monitoring — all services send logs to one place with unique trace IDs, making it easy to follow a request across services and find the exact error.

Challenge: Unreliable Data Consistency
Monolithic Problem: Keeping data accurate across long, multi-step workflows was fragile — if one step failed midway, the entire process could end up in a broken state.
Microservices Solution: Eventual Consistency — each service handles its own data transaction locally, and services coordinate through events to keep data in sync across the system.

Challenge: Poor User Experience
Monolithic Problem: As workflows got more complex, the system became visibly slower — users experienced lag, long loading times, and unresponsive screens.
Microservices Solution: Horizontal Scaling — busy services automatically get more instances to handle the load, so users always get fast responses.

Challenge: Limited Security & Login System
Monolithic Problem: The login system couldn’t properly handle multiple organizations, user roles, or modern security standards like OAuth 2.0.
Microservices Solution: Dedicated Authentication Service — a separate service handles all login, user roles, and security, making it easy to support multiple organizations and modern security standards.

Challenge: Cannot Scale Independently
Monolithic Problem: To handle more load on one feature, the entire application had to be scaled — wasting resources on parts that didn’t need it.
Microservices Solution: Independent Scalability — each service scales on its own based on its specific load, so only the busy parts get more resources, saving cost.

Challenge: Single Point of Failure
Monolithic Problem: If any part of the system crashed, the entire platform went down — one bug could take everything offline.
Microservices Solution: Fault Isolation — services are isolated from each other, so if one service crashes, the rest continue working normally.

Challenge: Slow Deployment Cycle
Monolithic Problem: Every small change required redeploying the entire application — making releases slow, risky, and infrequent.
Microservices Solution: Independent Deployment — each service is built, tested, and deployed separately, so teams can release updates faster without affecting other services.
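The sequential-versus-parallel contrast above can be sketched with plain asyncio. This is an illustrative sketch only, not the platform's code: the step names and durations are invented, and `asyncio.sleep` stands in for a real service call.

```python
import asyncio
import time

async def run_step(name: str, seconds: float) -> str:
    """Simulate one workflow step (e.g., a downstream service call)."""
    await asyncio.sleep(seconds)
    return f"{name} done"

async def sequential(steps):
    # Monolith-style: each step blocks until the previous one finishes.
    return [await run_step(name, secs) for name, secs in steps]

async def concurrent(steps):
    # Microservice-style: independent branches run in parallel.
    return await asyncio.gather(*(run_step(name, secs) for name, secs in steps))

steps = [("validate", 0.2), ("enrich", 0.2), ("notify", 0.2)]

start = time.perf_counter()
asyncio.run(sequential(steps))
seq_elapsed = time.perf_counter() - start   # roughly the sum: ~0.6s

start = time.perf_counter()
asyncio.run(concurrent(steps))
par_elapsed = time.perf_counter() - start   # roughly the max: ~0.2s

print(f"sequential: {seq_elapsed:.2f}s, concurrent: {par_elapsed:.2f}s")
```

The three branches take roughly the sum of their durations when run in a single queue, but only the longest single duration when dispatched concurrently — the same effect the migration achieved by splitting workflow branches across services.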

2. Migration Guideline

3. Microservices Architecture — Solution Design

3.1 Technology Stack & Infrastructure Decisions

Technology: .NET / Python
Purpose: Service Development
Why It Was Chosen: .NET and Python are used to build the microservices — enabling polyglot development where each service uses the best-fit language.

Technology: Keycloak
Purpose: Authentication & Authorization
Why It Was Chosen: Provides OAuth 2.0, OpenID Connect, multi-tenant support, SSO, and role-based access control.

Technology: Dapr
Purpose: Service Communication
Why It Was Chosen: Sidecar-based abstraction for service-to-service calls, pub/sub messaging, and message broker portability — no vendor lock-in.

Technology: Distributed Cache
Purpose: Caching
Why It Was Chosen: Reduces database load by caching frequently accessed data, configurations, and session tokens.

Technology: Saga Pattern (Choreography)
Purpose: Transaction Management
Why It Was Chosen: Achieves eventual consistency across services through domain events and compensating transactions — no central orchestrator needed.

Technology: YARP
Purpose: API Gateway
Why It Was Chosen: Provides a unified API entry point with routing, rate limiting, load balancing, and centralized auth validation.

Technology: ELK Stack
Purpose: Logging & Monitoring
Why It Was Chosen: Centralized logging with Elasticsearch, Logstash, and Kibana — enables cross-service tracing, real-time alerting, and performance analytics.

Technology: Kubernetes
Purpose: Deployment & Orchestration
Why It Was Chosen: Independent scaling, self-healing, rolling deployments, and namespace-based tenant isolation.
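The choreography-based Saga row above can be illustrated with a minimal in-process sketch. The event names, handlers, and in-memory "bus" here are hypothetical stand-ins; in the real system the events would travel through Dapr pub/sub, and each handler would commit a local database transaction.

```python
# Choreography saga sketch: services react to each other's events, and a
# failure triggers a compensating transaction instead of a distributed rollback.
from collections import defaultdict

handlers = defaultdict(list)   # event name -> subscribed handlers
log = []                       # records what actually happened, in order

def subscribe(event, handler):
    handlers[event].append(handler)

def publish(event, data):
    for handler in handlers[event]:
        handler(data)

# "Order service": starts the saga, and compensates if payment fails.
def on_order_placed(data):
    log.append("order:created")
    publish("order.created", data)

def on_payment_failed(data):
    log.append("order:cancelled")   # compensating transaction

# "Payment service": runs its local transaction, then emits success or failure.
def on_order_created(data):
    if data["amount"] > data["balance"]:
        log.append("payment:failed")
        publish("payment.failed", data)
    else:
        log.append("payment:charged")
        publish("payment.succeeded", data)

subscribe("order.placed", on_order_placed)
subscribe("order.created", on_order_created)
subscribe("payment.failed", on_payment_failed)

# Insufficient balance: the payment step fails, so the order is cancelled.
publish("order.placed", {"amount": 150, "balance": 100})
print(log)   # ['order:created', 'payment:failed', 'order:cancelled']
```

Note that no service knows the full workflow: each one only reacts to events it subscribes to, which is exactly what makes the choreography variant work without a central orchestrator.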

4. Migration Outcomes — Statistical Analysis

4.1 Performance Benchmarks (Before vs. After)

Each metric below is listed as monolithic value → microservices value.

  • Workflow Speed (avg. processing time): ~6.5s → ~1.2s (~80% faster)
  • Parallel Execution (multi-branch execution): ~14s sequential → ~2.6s concurrent (~81% faster)
  • API Response (p95 response time): ~1,300ms → ~180ms (~86% faster)
  • Throughput (requests per second): ~120 req/s → ~1,800 req/s (~15x increase)
  • Concurrent Users (active users supported): ~80 → ~1,200+ (~15x increase)
  • Uptime (monthly availability): ~96.5% → ~99.85% (+3.35pp)
  • Failure Rate (workflow failure rate): ~4.8% → ~0.6% (~87% reduction)
  • MTTR (mean time to resolve): ~4.2 hrs → ~25 min (~90% faster)
  • Downtime (unplanned downtime per month): ~25 hrs → ~1.3 hrs (~95% reduction)
  • Deployment (deploy frequency): 1–2 per month → 15–20 per week (~30x increase)
  • Resource Cost (infrastructure cost): baseline → optimized (~30% reduction)
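As a sanity check, the headline percentages follow directly from the before/after values in the table; a few lines of arithmetic reproduce them:

```python
def pct_reduction(before: float, after: float) -> float:
    """Relative improvement: how much smaller 'after' is than 'before'."""
    return (before - after) / before * 100

print(round(pct_reduction(6.5, 1.2)))    # 82 -> reported as ~80% faster
print(round(pct_reduction(14, 2.6)))     # 81 -> ~81% faster
print(round(pct_reduction(1300, 180)))   # 86 -> ~86% faster
print(round(1800 / 120))                 # 15 -> ~15x throughput increase
```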

4.2 Key Improvements

  • Deploy Without Fear — Each service can be updated independently — no need to redeploy the entire system for a small change
  • One Crash Doesn’t Break Everything — If one service goes down, the rest keep running normally
  • Scale Only What’s Needed — Busy services get more resources automatically, while idle ones stay as they are — saving cost
  • Find Problems Quickly — Centralized logs make it easy to trace any issue across services in minutes, not hours
  • Data Stays Consistent — Even when operations span multiple services, the system keeps data in sync and auto-recovers from failures
  • Faster & Smoother for Users — Users experience noticeably faster load times and more responsive interactions
  • Higher Uptime — Better fault isolation and self-healing mean the platform stays available with minimal downtime
  • Swap Technology Easily — Infrastructure components can be replaced without changing application code

5. Key Takeaways

  1. Split by business function, not by technology — Don’t separate frontend/backend/database. Instead, split by what the system does (e.g., data, users, config) so each service can work and grow independently.
  2. Don’t tie yourself to one tool — Use abstraction layers so you can swap out databases, message brokers, or other infrastructure without rewriting your code.
  3. Plan for failures across services — When one operation touches multiple services, use patterns like Saga to keep data consistent and automatically undo changes if something goes wrong.
  4. Set up centralized logging from the start — With multiple services, bugs become much harder to find. A single place to view all logs with trace IDs saves hours of debugging.
  5. Use container orchestration for deployment — Tools like Kubernetes handle scaling, self-healing, and zero-downtime deployments automatically — essential when managing multiple services in production.
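The trace-ID idea in takeaway 4 can be sketched with Python's standard logging module. This is an illustrative sketch only: the real platform ships logs to the ELK stack, and downstream services would receive the ID via a request header rather than minting their own, but the core mechanic is attaching one correlation ID to every log line a request produces.

```python
import logging
import uuid
from contextvars import ContextVar

# One trace ID per request, visible to every log call on that request's path.
trace_id: ContextVar[str] = ContextVar("trace_id", default="-")

class TraceFilter(logging.Filter):
    """Stamps the current trace ID onto each log record before formatting."""
    def filter(self, record):
        record.trace_id = trace_id.get()
        return True

logger = logging.getLogger("bpm")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("[%(trace_id)s] %(name)s: %(message)s"))
handler.addFilter(TraceFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle_request():
    # The entry point (e.g., API gateway) mints the ID; services further
    # down the chain would set it from an incoming header instead.
    trace_id.set(uuid.uuid4().hex[:8])
    logger.info("request received")
    logger.info("calling payment service")

handle_request()   # both lines carry the same [trace_id] prefix
```

Searching the centralized log store for one trace ID then returns every line that request produced, across every service it touched.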

This case study shows how a monolithic application was transformed into a distributed microservices platform — resulting in better performance, higher reliability, and a system that’s easier to scale and maintain.