MICROSOFT: Process 10 Million Events Per Minute. The Real Problem Wasn’t Processing Them. | by Sagar Yadav | Jul, 2026
Adding Consumers Can Make The System Slower
Here’s something that doesn’t make it into most tutorials.
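Every time a consumer joins or leaves a consumer group, Kafka triggers a rebalance. Under the classic eager protocol, every consumer gives up all of its partitions and the group redistributes them from scratch. Until the new assignment is in place, message processing stops across the entire consumer group.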
In a stable deployment, this is a brief pause — seconds, maybe. In a system where pods restart frequently, or where autoscaling is configured aggressively, rebalances can happen constantly. The consumer group spends more time redistributing partitions than processing messages.
The throughput curve inverts. Adding servers makes the system slower.
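The inversion can be illustrated with a back-of-the-envelope model. This is a toy sketch, not a measurement: the function name, the per-consumer rate, the restart rate, and the pause length are all made-up assumptions, and it assumes every membership change stops the whole group for a fixed pause.

```python
def effective_throughput(
    consumers: int,
    per_consumer_rate: float = 1000.0,           # msgs/sec per consumer (assumed)
    restarts_per_consumer_per_min: float = 0.05,  # join/leave events per consumer (assumed)
    pause_seconds: float = 30.0,                  # group-wide stall per rebalance (assumed)
) -> float:
    """Messages per second the whole group sustains in this toy model."""
    # More consumers means more join/leave events, so more rebalances per minute.
    rebalances_per_min = consumers * restarts_per_consumer_per_min
    # Fraction of each minute the group spends stalled, capped at 100%.
    paused_fraction = min(1.0, rebalances_per_min * pause_seconds / 60.0)
    # Raw capacity grows linearly, but only the unpaused fraction is usable.
    return consumers * per_consumer_rate * (1.0 - paused_fraction)


if __name__ == "__main__":
    for n in (10, 20, 30, 40):
        print(n, "consumers ->", round(effective_throughput(n)), "msgs/sec")
```

With these illustrative defaults, throughput rises up to about 20 consumers and falls after that, because the time lost to rebalances grows faster than the capacity being added. Real systems will have different numbers, but the shape of the curve is the point.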
This tends to surface during the first real load test or, more painfully, the first time the platform actually needs to scale under pressure. Sticky partition assignment keeps as many partitions as possible with their current owners, and cooperative rebalancing revokes only the partitions that actually move, so the rest of the group keeps processing. Both reduce the disruption, but the problem first has to be recognized as a problem. Most teams encounter it after assuming that more consumers would simply mean more throughput.
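As a rough sketch of what opting in to cooperative rebalancing can look like, here is a consumer configuration using librdkafka-style property names (the ones the confluent-kafka Python client accepts). The helper name, broker address, and group id are placeholders invented for illustration, not anything from the system described here.

```python
def build_consumer_config(bootstrap_servers: str, group_id: str) -> dict:
    """Return a consumer config dict that requests cooperative-sticky assignment."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        # Incremental rebalancing: only partitions that change owner are revoked,
        # and partitions stay with their current consumer where possible.
        "partition.assignment.strategy": "cooperative-sticky",
    }


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    config = build_consumer_config("localhost:9092", "events-processor")
    print(config)
```

A dict like this would typically be passed to `confluent_kafka.Consumer(config)`. Every consumer in the group needs a compatible assignment strategy, so a change like this is usually rolled out carefully rather than flipped on a single instance.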
Part of a series on system design, production engineering, and the interview questions that reveal how engineers actually think under pressure. And if you’re preparing for practical engineering interviews or trying to improve production-level thinking beyond just solving DSA problems, I’ve also been exploring platforms like PracHub.