

Engineering
You Don't Need Microservices
By Lumia Labs / On 09 Feb, 2026


Segment built over 140 microservices to handle event routing. Three engineers ended up spending most of their time keeping them running instead of building features. They went back to a monolith. "Instead of enabling us to move faster," Alexandra Noonan wrote, "the small team found themselves mired in exploding complexity."

We've seen this pattern play out with clients too. Amazon's Prime Video team moved from serverless microservices to a monolith, reducing infrastructure costs by 90%. Even the company that invented service-oriented architecture found it wasn't always the right answer.

Amazon's architecture isn't yours

The microservices movement started because monolithic applications became unwieldy as teams grew. Conway's Law played out: large shared codebases forced growing teams into coordination overhead. Jeff Bezos issued his famous API mandate around 2002, requiring all Amazon teams to communicate through service interfaces. That architectural decision enabled AWS.

The pattern worked for Amazon, so the industry followed. By 2020, O'Reilly found that 77% of organizations had adopted microservices. But most of those organizations aren't Amazon. They don't have thousands of engineers, decades of infrastructure tooling, or the operational budget to manage hundreds of independent services.

Martin Fowler had already noticed the pattern in 2015, when he wrote: "Almost all the successful microservice stories have started with a monolith that got too big and was broken up. Almost all the cases where I've heard of a system that was built as a microservice system from scratch, it has ended in serious trouble."

Most microservices are monoliths in disguise

Most microservice architectures we've worked with don't deliver on their promise. Mistakes we see:

- Multiple services share a database, making schema migrations difficult to run
- Deploying a single change to a single service is hard, because of leaking abstractions and hidden dependencies
- Deliveries are hard to coordinate across teams, because the services are not actually independent
- Bad API design makes services expose their inner workings (bad in a monolith as well, but there it still ships as one delivery)
- A failure in one service cascades through the whole system

Taibi, Lenarduzzi, and Pahl catalogued 20 of these anti-patterns through practitioner interviews. We recognize most of them.

It has a name: the distributed monolith. You pay the full operational tax of microservices but get none of the independence. Every service hop adds latency, and debugging across service boundaries takes roughly 35% longer than in a single process.

Kelsey Hightower, then a Distinguished Engineer at Google Cloud, put it bluntly: "Monoliths are the future because the problem people are trying to solve with microservices doesn't really line up with reality." And: "Now you went from writing bad code to building bad infrastructure that you deploy the bad code on top of."

We think the cognitive load on teams is often overlooked. Team Topologies research by Matthew Skelton and Manuel Pais backs this up: what matters isn't which architecture you pick, but which one your teams can handle. Microservices multiply what a team needs to understand: networking, service discovery, distributed tracing, container orchestration.

Method calls, not network calls

There's another option: the modular monolith. A single deployment with strict internal boundaries between domain modules, where communication happens through method calls instead of network calls. One test pipeline, one deployment, but with the separation of concerns that prevents spaghetti code.
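As a rough illustration of the shape, here is a minimal sketch in Python (the module and function names are hypothetical, not taken from any real system): two domain modules live in one codebase and one deployment, and the order flow reaches billing through an ordinary function call rather than a network hop.

```python
# Minimal modular-monolith sketch. The "billing" and "orders" modules would
# normally live in separate packages; all names here are illustrative.
from dataclasses import dataclass


# --- billing module: small, explicit public interface ---------------------
@dataclass
class Receipt:
    order_id: str
    amount_cents: int
    status: str


def charge(order_id: str, amount_cents: int) -> Receipt:
    """In a microservice setup this would be an HTTP or gRPC call to a
    separate deployment; here it is an in-process call."""
    # ...talk to the payment provider...
    return Receipt(order_id=order_id, amount_cents=amount_cents, status="paid")


# --- orders module: depends only on billing's public interface ------------
def place_order(order_id: str, amount_cents: int) -> Receipt:
    # A plain method call: no service discovery, retries, or serialization.
    return charge(order_id, amount_cents)


if __name__ == "__main__":
    print(place_order("order-42", 1999))
```

The boundary between the two modules is only a convention in this sketch; the principles at the end of this post are about turning that convention into something tooling can enforce.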
Shopify runs one of the largest Ruby on Rails applications in the world: 2.8 million lines of code, 500,000+ commits, hundreds of active developers. They evaluated microservices and explicitly rejected them. Kirsten Westeinde from Shopify's engineering team wrote that they wanted "a solution that increased modularity without increasing the number of deployment units." They landed on 37 components with defined boundaries, enforced by tools like Packwerk for static analysis. Developers work within clear boundaries without the overhead of distributed systems.

Basecamp's DHH described their approach in 2016: 12 programmers serving millions of users across six platforms, with a monolith of 200 controllers and 190 model classes. In 2023, he went further with "How to Recover from Microservices", citing Gall's Law: "A complex system that works is invariably found to have evolved from a simple system that worked."

Even Google recognized the pattern. Their Service Weaver framework lets you write an application as a modular monolith and deploy it as microservices only when needed. The architecture starts simple and gains complexity only where the system demands it.

Most teams are too small for microservices

Microservices solve problems that simpler architectures can't: hundreds of engineers who need to deploy independently, components with fundamentally different scaling requirements. At that scale, microservices pay for their complexity.

Stefan Tilkov argues that starting with a monolith creates coupling that's hard to undo later. He's right that it can, if you don't enforce module boundaries. But his own caveat is telling: the approach requires "deep domain expertise" and suits only larger systems.

The research behind Accelerate (Forsgren, Humble, Kim) shows that elite software delivery performance correlates with architecture that enables independent deployment. That's achievable with either microservices or a well-modularized monolith. Team autonomy drives performance, not the architectural pattern.

If your engineering organization has fewer than 50 developers, you almost certainly don't need microservices. Start with modules and move to microservices when the pressure demands it.

Modules first, services later

When we help teams rethink their architecture, we use these principles:

- Draw domain boundaries first. Before any architecture choice, understand your business domains. Get that wrong and no architecture saves you.
- Enforce boundaries in code. Module boundaries without enforcement erode within months. Use tooling like SonarQube to make violations visible (a minimal sketch of such a check follows this list).
- Measure cognitive load. If your team spends more time on infrastructure than features, your architecture is too complex.
- Plan ahead. A well-modularized monolith can graduate to microservices when needed. Design modules as if they could become services.
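For the second principle, here is a minimal sketch of what an automated boundary check can look like, assuming a Python codebase with one package per domain module under src/; the module names and the allowed-dependency map are invented for illustration. Packwerk, SonarQube, and import linters do the same job with far more rigor.

```python
# A minimal boundary check: fail the build when a module imports another
# module it is not allowed to depend on. Module names and the dependency
# map below are illustrative.
import ast
import pathlib

SRC = pathlib.Path("src")

# Which other domain modules each module may import from.
ALLOWED = {
    "orders": {"billing", "catalog"},
    "billing": set(),
    "catalog": set(),
}


def violations() -> list[str]:
    found = []
    for module, allowed in ALLOWED.items():
        for path in (SRC / module).rglob("*.py"):
            tree = ast.parse(path.read_text())
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    targets = [alias.name for alias in node.names]
                elif isinstance(node, ast.ImportFrom) and node.module:
                    targets = [node.module]
                else:
                    continue
                for target in targets:
                    root = target.split(".")[0]
                    if root in ALLOWED and root != module and root not in allowed:
                        found.append(f"{path}: {module} may not import {root}")
    return found


def test_module_boundaries():
    # Run as part of the normal test suite, so a violated boundary is a
    # failed build rather than a review comment someone forgets to leave.
    assert violations() == []
```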
Lumia Labs helps organizations make architectural decisions. If you're reconsidering your architecture, we'd like to hear from you.

Engineering
The Architecture Review Checklist
By Lumia Labs / On 16 Jan, 2026


Nobody hands you a map when you inherit a codebase. Maybe you're the new CTO and this is your first week. Maybe your company just acquired software built by strangers. Maybe the founders left, and now it's you. The documentation is thin. The git history tells stories you weren't there for. The system runs, mostly, but you don't know why. You definitely don't know where it's going to break.

This post gives you questions. Questions that surface problems before those problems surface themselves, usually on a holiday weekend. You'll learn more from the questions you can't answer than from the ones you can. Each "I don't know" tells you something.

Three areas matter most: security model, operational readiness, and change velocity. Each reveals different risks. Together, they tell you what you've actually inherited.

Security Model: Trust Archaeology

Security in inherited systems is archaeological: layers of decisions made by different people, under different threat models, at different stages of the company's life. Your job is to excavate.

Where does trust get granted? Can you trace the path from untrusted input to database write? Where exactly does the system decide to trust that input? Most teams can't draw this picture. Authentication happens in one service, authorization checks live somewhere else, and input validation is scattered across three microservices. That gap between components is where vulnerabilities hide.

What's the blast radius? If one component gets compromised, what else falls with it? Look for shared database credentials, service accounts with God-mode access, secrets in environment variables that every service can read. These patterns made sense when the system was three developers and one server. Now they mean a single breach cascades everywhere. The 2013 Target breach started with an HVAC contractor's credentials and ended with 40 million stolen credit cards. Nothing stopped lateral movement once attackers were inside.

What happens when auth fails? When the authentication service goes down, do requests fail or pass? Under pressure, many systems fail open: the "temporary bypass" that was never removed, the fallback that skips validation. These exist in almost every inherited codebase. Find them before an attacker does.
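To make the distinction concrete, here is a minimal sketch (the function and exception names are made up for illustration, not taken from any particular codebase) of the fail-open anti-pattern next to its fail-closed counterpart:

```python
# Fail-open vs. fail-closed when the authentication service is unreachable.
# AuthServiceDown and verify_token are illustrative stand-ins.

class AuthServiceDown(Exception):
    """Raised when the authentication service cannot be reached."""


def verify_token(token: str) -> bool:
    """Placeholder for the real call to the authentication service."""
    raise AuthServiceDown("auth service unreachable")


def is_authorized_fail_open(token: str) -> bool:
    # Anti-pattern: the "temporary bypass" that lets every request through
    # whenever the auth service is down.
    try:
        return verify_token(token)
    except AuthServiceDown:
        return True


def is_authorized_fail_closed(token: str) -> bool:
    # Safer default: an auth outage degrades availability, not security.
    try:
        return verify_token(token)
    except AuthServiceDown:
        return False
```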
What security decisions assumed a different world? The single-tenant system that became multi-tenant, the internal tool that became customer-facing. Security flaws can persist through multiple product releases, especially when inherited from dependencies. Ask when the last security review happened. Then ask what has changed since.

Security gaps don't just threaten data. They threaten the deal, the acquisition premium that evaporates after a breach.

Operational Readiness: What Happens When It Breaks

The system teaches you how it fails, but only if you're listening. Operational readiness means you can trace from "something's wrong" to "here's the line of code" before your customers start posting about it.

What happens when the system breaks at night? If something fails, will the right person wake up with enough context to act? Many inherited systems have alerts that fire into Slack channels nobody watches after dinner. That's wishful thinking dressed as monitoring. Check who's actually on call, what information they receive, and whether they can do anything useful with it.

What failure modes has this system never experienced? If it's never seen a database failover, never handled a dependent service going dark, never weathered real production load, you don't know how it behaves in those scenarios. The absence of incidents might mean the system is resilient, but it might also mean you've been lucky.

Which alerts does everyone ignore? Alert fatigue is operational debt with compound interest: every false positive trains your team to dismiss the next notification, until eventually the real incident gets lost in the noise. Ask how many alerts fired last week and how many were actionable. If fewer than half led to human action, your monitoring is mostly noise.

How long does recovery actually take, and have you ever tested it? Most inherited systems have backups that have never been restored, failover procedures that have never been executed, and runbooks written by people who left years ago. Your documented recovery time means nothing until you've run through it under pressure. GitLab learned this the hard way in 2017, when they discovered during an incident that their backups weren't working. For context: DORA research shows that elite-performing teams restore service in under an hour, while low performers take between a week and a month.

Who actually knows how this works? If the answer is one person, you have a single point of failure. Researchers call this the "bus factor": the number of people who could leave before a project stalls. A study at JetBrains found that files abandoned by their original developers tend to stay abandoned, becoming permanent blind spots in the codebase. If the answer is "the team that left," you're operating on muscle memory that's already fading.
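If you want a rough signal rather than a gut feeling, a small script over the git history can flag candidate knowledge silos. This is only a sketch, assuming a git repository with Python sources under src/; the path and the two-year window are arbitrary choices, and a serious analysis would weight authors by their share of changes.

```python
# Rough bus-factor probe: list files touched by at most one author recently.
# Paths, time window, and file pattern are illustrative.
import subprocess
from pathlib import Path


def authors(path: str, since: str = "2 years ago") -> set[str]:
    # Distinct commit authors for this path within the time window.
    out = subprocess.run(
        ["git", "log", f"--since={since}", "--format=%an", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line for line in out.splitlines() if line.strip()}


if __name__ == "__main__":
    single_owner = [
        str(p) for p in Path("src").rglob("*.py") if len(authors(str(p))) <= 1
    ]
    print(f"{len(single_owner)} files touched by at most one author in two years")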
Change Velocity: The Fear Tax

Developers who are scared to touch code batch changes into risky big releases. Fixes get deferred because they might break something else. Technical debt accumulates because nobody wants to venture into the dangerous parts. Survey research across the software industry found that teams waste 23% of their development time dealing with technical debt, and deadline pressure is the most common cause. Fear is a legitimate architectural metric.

Where do developers refuse to go? Every inherited system has these zones: the billing code nobody fully understands, the integration with that legacy system held together by careful attention and prayer. These become permanent blind spots and permanent sources of risk.

What's the worst that happens from a typo? Can one wrong character bring down production? A healthy architecture survives small mistakes, while a fragile one demands perfection at all times.

Do your tests actually catch bugs? A green test suite that misses regressions creates false confidence. Teams deploy because the tests passed, when the tests weren't checking what mattered. Look at the last few bugs that reached production. Should the tests have caught them?

How long until a new engineer can ship? This measures friction. If it takes months to understand the system well enough to contribute safely, change will always be slow. The codebase is effectively defended against its own team. Etsy has new engineers deploy on day one.

Can you undo a bad deploy in minutes? Willingness to ship correlates directly with ability to recover, and if rollback is scary, slow, or uncertain, every change feels permanent. Teams stop taking reasonable risks and the system calcifies.

What to Do With These Answers

Start with the questions you couldn't answer. Those are your blind spots, and blind spots don't stay hidden forever.

For the answers that worried you, write them down now. Next week the urgency will fade, you'll rationalize, and the system will keep running while you tell yourself it's probably fine. Capture the concern while you still feel it.

Document the surprises for your team. If the mental model doesn't match reality, someone else will hit the same confusion, probably during an incident at the worst possible time.

These questions won't give you a complete picture. Nothing will. But they'll tell you where the gaps are, and that's where to start.

Lumia Labs partners with organizations navigating exactly this situation: inherited systems and pressure to move forward. If you want a second set of eyes on what you've inherited, let's talk.