The PHP monolith was 7 years old, 3 million lines of code, and serving 400,000 active users. The team had been discussing microservices for two years. When a database bottleneck caused a 4-hour production outage in Q1 2025, the conversation became a mandate. Here's the complete story of the 6-month migration — every decision, every mistake, and every win.
Understanding the Monolith
Before any architectural decisions, we spent three weeks mapping the actual domain boundaries within the monolith. Using static analysis tools and database access pattern tracing, we identified natural bounded contexts: user authentication and authorization, product catalog management, order processing, inventory management, payment processing, and notification dispatch. Each boundary was fuzzy — code from one domain reached into another's database tables in dozens of places — but the logical separation was real.
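Here's a minimal sketch of the kind of tracing involved, assuming a hypothetical per-domain src/ layout and a deliberately crude regex over SQL keywords. The domain names match our bounded contexts; everything else is illustrative:

```php
<?php
// Sketch: scan each domain's PHP files for table names appearing after
// SQL keywords, then report tables referenced by more than one domain.
// The src/<Domain> layout and the regex are illustrative assumptions.

$domains = ['Auth', 'Catalog', 'Orders', 'Inventory', 'Payments', 'Notifications'];
$tableAccess = [];

foreach ($domains as $domain) {
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator("src/{$domain}", FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->getExtension() !== 'php') {
            continue;
        }
        $code = file_get_contents($file->getPathname());
        // Crude but effective: capture identifiers after FROM/JOIN/INTO/UPDATE.
        if (preg_match_all('/\b(?:FROM|JOIN|INTO|UPDATE)\s+`?(\w+)`?/i', $code, $m)) {
            foreach ($m[1] as $table) {
                $tableAccess[strtolower($table)][$domain] = true;
            }
        }
    }
}

// Tables touched by more than one domain are the shared-ownership hot spots.
foreach ($tableAccess as $table => $touchedBy) {
    if (count($touchedBy) > 1) {
        echo $table . ': ' . implode(', ', array_keys($touchedBy)) . PHP_EOL;
    }
}
```

The multi-domain tables this surfaces are exactly where the migration pain concentrates.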
The database was the central bottleneck. Every domain shared a single MySQL database with 340 tables. The order processing domain was joining across tables owned by four other domains in real time, creating lock contention during peak traffic. The schema had grown organically over 7 years — there were tables with over 200 columns, multiple circular foreign key relationships, and columns that existed only to support long-abandoned features.
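To make that coupling concrete, here is the shape of query the tracing kept surfacing in the order path. Table and column names are illustrative, not our real schema:

```php
<?php
// Schematic of a real-time cross-domain join: a single order read
// touching tables owned by four other domains. Every such read competes
// for locks with writes from those domains, which is where the peak-traffic
// contention came from. All names are illustrative.
$sql = <<<SQL
SELECT o.id, o.status,
       u.email,            -- users: auth domain
       p.name, p.price,    -- products: catalog domain
       i.stock_level,      -- inventory domain
       pay.captured_at     -- payments domain
FROM orders o
JOIN users u        ON u.id = o.user_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p     ON p.id = oi.product_id
JOIN inventory i    ON i.product_id = p.id
LEFT JOIN payments pay ON pay.order_id = o.id
WHERE o.id = :orderId
SQL;
```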
- Map bounded contexts before writing any microservice code
- Database shared ownership is the hardest problem to solve — plan for it explicitly
- Identify which services have the most cross-domain coupling as migration priorities
- Legacy code is a living system — changes continue during migration
The Migration Plan: Strangler Fig
We chose the Strangler Fig pattern: deploy a new service, route traffic to it progressively, and when the new service handles 100% of traffic, delete the corresponding monolith code. This avoids the big-bang rewrite risk that kills most microservices migrations. The first service we extracted was the notification dispatch domain — it had the clearest boundaries, lowest coupling to other domains, and zero read-time database joins with other domains.
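A minimal sketch of the progressive routing as it looked inside the monolith's dispatch path. The endpoint URL, the LegacyNotifier class, and the rollout percentage are illustrative stand-ins, not our production values:

```php
<?php
// Sketch: route a stable percentage of users to the new service.
// Hashing the user ID gives each user a consistent routing decision
// instead of flapping between old and new code paths per request.

function shouldRouteToNewService(string $userId, int $rolloutPercent): bool
{
    $bucket = abs(crc32($userId)) % 100;
    return $bucket < $rolloutPercent;
}

function dispatchNotification(string $userId, array $payload): void
{
    $rolloutPercent = 10; // raised progressively: 1% -> 10% -> 50% -> 100%

    if (shouldRouteToNewService($userId, $rolloutPercent)) {
        // New extracted service (hypothetical internal endpoint).
        $ch = curl_init('http://notifications.internal/v1/dispatch');
        curl_setopt_array($ch, [
            CURLOPT_POST           => true,
            CURLOPT_POSTFIELDS     => json_encode($payload),
            CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
            CURLOPT_RETURNTRANSFER => true,
        ]);
        curl_exec($ch);
        curl_close($ch);
    } else {
        // Legacy monolith code path (hypothetical class), deleted in phase 4.
        LegacyNotifier::dispatch($userId, $payload);
    }
}
```

When the rollout percentage reaches 100 and stays there without incident, the legacy branch becomes dead code you can delete.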
For each extracted service, we followed a four-phase process:
- Phase 1: extract the domain logic into a new service, still backed by the shared database.
- Phase 2: migrate the domain's tables to a dedicated database while keeping the old tables in sync via triggers.
- Phase 3: cut over to the new service exclusively and remove the sync triggers.
- Phase 4: delete the monolith code.
Phase 2 was always the hardest — distributed data consistency under concurrent writes is genuinely difficult. One way to implement the phase-2 sync is sketched below.
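A MySQL trigger cannot write to another server, so one common way to realize the phase-2 sync (sketched here with illustrative table names, not our exact mechanism) is a trigger that captures each write into an outbox table, which a small worker then replays into the other database:

```php
<?php
// Sketch: an AFTER INSERT trigger on the legacy table records each write
// in an outbox table; a replay worker drains the outbox into the new
// service's database. Table and column names are illustrative.
$pdo = new PDO('mysql:host=legacy-db;dbname=monolith', 'app', 'secret');

$ddl = <<<'SQL'
CREATE TRIGGER notifications_sync_insert
AFTER INSERT ON notifications
FOR EACH ROW
INSERT INTO notifications_outbox (source_id, op, payload, created_at)
VALUES (NEW.id, 'INSERT',
        JSON_OBJECT('user_id', NEW.user_id, 'body', NEW.body),
        NOW())
SQL;
$pdo->exec($ddl);

// UPDATE and DELETE get matching triggers. The replay worker polls
// notifications_outbox in id order, applies each row to the new database,
// and marks it done only after a successful write, so a worker crash
// re-delivers rather than drops changes.
```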
Challenges We Didn't Anticipate
Distributed transactions destroyed our initial timeline. Order processing required atomically updating inventory, creating an order record, and triggering payment: operations that were a single database transaction in the monolith but now span three services with three databases. We tried two-phase commit, which proved operationally complex and added latency to every order, then settled on the Saga pattern with compensating transactions. This required rewriting the order processing domain's core logic and added two months to the schedule.
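A minimal sketch of the pattern, not our production implementation: each saga step pairs an action with a compensating action, and a failure part-way through runs the compensations in reverse order. The service clients ($inventory, $orders, $payments) are hypothetical HTTP wrappers for the three services:

```php
<?php
// Sketch of an orchestrated saga with compensating transactions.

final class OrderSaga
{
    /** @var array<array{action: callable, compensate: callable}> */
    private array $steps = [];

    public function addStep(callable $action, callable $compensate): void
    {
        $this->steps[] = ['action' => $action, 'compensate' => $compensate];
    }

    public function execute(): void
    {
        $completed = [];
        foreach ($this->steps as $step) {
            try {
                ($step['action'])();
                $completed[] = $step;
            } catch (Throwable $e) {
                // Undo every completed step, newest first, then surface the failure.
                foreach (array_reverse($completed) as $done) {
                    ($done['compensate'])();
                }
                throw $e;
            }
        }
    }
}

// Illustrative usage with hypothetical service clients:
$saga = new OrderSaga();
$saga->addStep(
    fn() => $inventory->reserve($orderItems),
    fn() => $inventory->release($orderItems),
);
$saga->addStep(
    fn() => $orders->create($orderId, $orderItems),
    fn() => $orders->cancel($orderId),
);
$saga->addStep(
    fn() => $payments->charge($orderId, $amount),
    fn() => $payments->refund($orderId),
);
$saga->execute();
```

The trade-off is visible in the code: instead of one atomic transaction, you get eventual consistency plus the obligation to write (and test) a compensation for every step.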
Service discovery and configuration management, which we had assumed would be solved by Kubernetes and Consul, generated more operational incidents than any application-level issue. DNS propagation delays, health check misconfiguration, and sidecar proxy versioning mismatches between teams all caused cascading failures that were harder to debug than the equivalent monolith failures. We underestimated how much operational complexity we were adding alongside the architectural complexity.
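One example of how a health check misconfiguration cascades: if the liveness probe checks a dependency, a database blip makes Kubernetes restart every replica at once, amplifying the outage instead of containing it. Here is a minimal sketch of the conventional liveness/readiness split that avoids this; the routes, credentials, and dependency check are illustrative:

```php
<?php
// Sketch: liveness answers "is this process alive?" and never touches
// dependencies; readiness checks them, so a slow downstream service
// takes an instance out of rotation without triggering restarts.

// /healthz: liveness. No dependency checks, or the orchestrator will
// restart-loop every replica whenever the database blips.
if ($_SERVER['REQUEST_URI'] === '/healthz') {
    http_response_code(200);
    echo 'ok';
    exit;
}

// /readyz: readiness. A dependency failure pulls the instance from the
// load balancer but leaves the process running.
if ($_SERVER['REQUEST_URI'] === '/readyz') {
    try {
        $pdo = new PDO('mysql:host=orders-db;dbname=orders', 'app', 'secret', [
            PDO::ATTR_TIMEOUT => 2, // fail fast; a hung check is as bad as a failed one
        ]);
        $pdo->query('SELECT 1');
        http_response_code(200);
        echo 'ready';
    } catch (PDOException $e) {
        http_response_code(503);
        echo 'not ready';
    }
    exit;
}
```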
Results After 6 Months
We successfully extracted 4 of 6 planned domains in 6 months. Authentication, notifications, and catalog management are fully independent services. Order processing is 80% migrated (the payment integration remains in the monolith pending PCI compliance re-certification for the new service). Inventory management migration begins next quarter.
The measurable outcomes: deploy frequency improved from once per week to multiple times daily for extracted services. P95 API latency for order queries dropped from 820ms to 190ms after catalog separation. The database that caused the Q1 outage has been relieved of catalog query load and no longer spikes during traffic peaks. Just as important, the team's confidence has been transformed: engineers can now deploy catalog changes without fear of breaking order processing.
- Strangler Fig pattern is the only safe migration approach for live systems
- Distributed transaction patterns (Saga) must be designed before migration, not after
- Operational complexity grows non-linearly with service count — invest in observability early
- Plan for 40-50% schedule buffer — distributed data consistency always takes longer
Conclusion
Microservices are not an upgrade you apply to a monolith — they are a different operational model with genuine advantages and genuine costs. The advantages (independent deployment, fault isolation, team autonomy) materialized exactly as promised. The costs (distributed transaction complexity, operational overhead, debugging difficulty) were underestimated. If you are considering this path, budget more time for data migration than you think you need, and invest in observability infrastructure before the first service goes to production.
James Whitfield
Lead DevOps Engineer