- The Routing Intent by Leonardo Furtado
Capacity Isn’t What You Think It Is
70% utilization doesn’t make you safe. In hyperscale networks, real capacity is what remains standing after failure. Let’s build for the world as it breaks.

“The 70% Lie”
For decades, network teams have leaned on a simple mental shortcut:
If link utilization is below 70%, we’re in the safe zone.
This logic shows up everywhere, from procurement checklists to boardroom slides. You’ll hear it recited with conviction:
“We’ve got plenty of headroom. Our links are sitting at 54%.”
And yet… some of the most catastrophic outages in hyperscale networks have occurred well below that threshold.
Why? Because the “70% rule” is built on a seductive but flawed assumption: that the network is operating in ideal conditions. That links won’t fail. That optical layers are stable. That there are no shared risk groups hiding in your topology. That when a path drops, traffic will reroute cleanly, without loss, latency spikes, or silent meltdowns buried in queue depths.
But networks don’t break under ideal conditions.
They break when reality diverges from your assumptions.
And nowhere is that divergence more dangerous than in capacity planning.
When you’re running hyperscale infrastructure, planning for steady state is no longer enough. You’re not building a system that merely performs; you’re building a system that must survive. Survivability is not tested by average load, but by what happens when something critical breaks and the rest of your network is forced to carry the load.
A real-world example:
In one case, a regional backbone link between two core data centers in a North American metro area failed due to a fiber cut caused by routine construction. Nothing catastrophic, just another contractor with a backhoe and a map that didn’t show the conduit.
No problem, right? The network had five other paths. All showed 40–60% utilization in steady state. Everyone thought there was plenty of room.
However, when traffic shifted, a specific pair of ECMP paths began to accumulate queuing delay. One under-provisioned device pair, connected through a legacy aggregation block that hadn’t been fully upgraded, choked under the new load. Within 30 minutes, TCP retransmits exploded. Some vital sessions dropped, and other services degraded so badly that the degradation triggered a customer support incident with hundreds of calls.
All this… with “plenty of headroom”.
It turned out the 70% figure was meaningless in the face of failure dynamics. The system had enough capacity on paper, but not under stress.
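To make the arithmetic behind this kind of failure concrete, here is a minimal sketch. Every number in it is hypothetical (the capacities, loads, and path names are not taken from the incident), and it assumes plain ECMP that splits the displaced traffic evenly across the surviving paths regardless of their capacity.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    capacity_gbps: float
    load_gbps: float

    @property
    def utilization(self) -> float:
        return self.load_gbps / self.capacity_gbps


def shift_failed_load(survivors, failed_load_gbps):
    """Spread the failed link's load evenly over the survivors.

    Assumption: capacity-blind ECMP, so every survivor receives the same
    absolute share, no matter how large or small the path is.
    """
    share = failed_load_gbps / len(survivors)
    return [Path(p.name, p.capacity_gbps, p.load_gbps + share) for p in survivors]


# Hypothetical steady state: every surviving path sits between 40% and 60%.
survivors = [
    Path("p1", 100.0, 50.0),
    Path("p2", 100.0, 55.0),
    Path("p3", 100.0, 45.0),
    Path("p4", 100.0, 60.0),
    Path("legacy", 40.0, 24.0),  # 60% utilized, but on a much smaller pipe
]
failed_load_gbps = 90.0  # hypothetical traffic carried by the cut link

after = shift_failed_load(survivors, failed_load_gbps)
for before, now in zip(survivors, after):
    print(f"{before.name}: {before.utilization:.0%} -> {now.utilization:.0%}")
```

In this made-up scenario the four large paths stay below 80%, while the legacy path goes from 60% to more than 100% of its capacity. The percentage looked the same as its neighbors; the absolute headroom did not.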
This is the core failure of conventional capacity planning:
It measures bandwidth, not survivability.
The moment your planning assumes perfection, you’ve already lost the ability to design for resilience.
What hyperscale operators have learned, sometimes painfully, is that capacity is not the ceiling of utilization. It’s the floor of safety under failure. It’s the minimum viable network that must keep working even when something major doesn’t.
This is why a new paradigm has emerged, one that goes beyond thresholds and dashboards. One that treats every capacity plan as a fault-tolerant design exercise, not a spreadsheet calculation.
This is failure-informed capacity planning.
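One way to picture the shift from thresholds to failure scenarios is a small check that asks, for each scenario, how loaded the busiest surviving link would be. The sketch below is illustrative only: the link names, the demand figure, the `worst_case` helper, and the shared-conduit grouping are all assumptions, and the model is deliberately simple (parallel links between two sites, demand split evenly across whatever survives).

```python
def worst_case(capacities, demand_gbps, scenarios):
    """Return (scenario, link, utilization) for the most loaded surviving link.

    capacities: link name -> capacity in Gbps
    scenarios:  scenario name -> set of links that fail together
    Assumption: demand is split evenly across surviving links.
    """
    worst = ("", "", 0.0)
    for scenario, failed in scenarios.items():
        survivors = {l: c for l, c in capacities.items() if l not in failed}
        if not survivors:
            # Nothing left to carry traffic: treat as unbounded overload.
            return (scenario, "", float("inf"))
        share = demand_gbps / len(survivors)
        for link, cap in survivors.items():
            if share / cap > worst[2]:
                worst = (scenario, link, share / cap)
    return worst


# Hypothetical topology: four parallel 100G links carrying 216G in total,
# which is 54% per link in steady state.
capacities = {"a": 100.0, "b": 100.0, "c": 100.0, "d": 100.0}
demand_gbps = 216.0
scenarios = {
    "steady state": set(),
    "link a fails": {"a"},
    "conduit-1 cut (a and b share it)": {"a", "b"},  # shared risk group
}

scenario, link, util = worst_case(capacities, demand_gbps, scenarios)
print(f"worst case: {scenario}: link {link} at {util:.0%}")
```

With these assumed numbers, a dashboard would show 54% everywhere, a single link failure would push the survivors to 72%, and a cut to the shared conduit would leave the two remaining links needing to carry 108% of their capacity. The planning question becomes which scenario sets the floor, not what the steady-state average is.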
Let’s now challenge the very premise of “how much is enough” and reframe the question entirely.
Because in a modern distributed infrastructure, you don’t plan for today’s traffic.
You plan for what’s going to break.
