- The Routing Intent by Leonardo Furtado
When Familiar Protocols Meet Unfamiliar Scale
What You Think You Know About BGP Breaks At Hyperscale

When BGP (Sort Of) Fails To Be The “BGP” We Know
In enterprise networks, we often treat protocols like BGP, OSPF, and IS-IS as stable, mature technologies that “just work.” At hyperscale, however, their behavior doesn’t scale linearly: it bends, twists, and in some cases breaks.
Hyperscale operators such as AWS, Meta, and Google have long recognized this. Theirs are not just large networks; they are planetary-scale systems, with tens of thousands of BGP-speaking routers, dozens of independently operating availability zones, and multi-layered routing architectures that span intra-region, inter-region, edge, backbone, and internet peering layers.
In such environments, the control plane’s default behavior is often the limiting factor in global resiliency and customer experience. One such example: BGP convergence.
The BGP Problem at Hyperscale: Delay by Design
When something goes wrong, say, a link failure, a route withdrawal, or a prefix redistribution, the speed at which BGP converges can make or break availability SLAs. And BGP was designed for stability and policy control, not for speed.
At enterprise scale, a few seconds of convergence is often tolerable. At hyperscale, however, those seconds can stretch into several minutes, and during that window different parts of the network can hold inconsistent views of reachability.
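To get a feel for how seconds become minutes, here is a rough back-of-the-envelope sketch. It assumes a toy model in which every BGP hop may hold an update for up to one MinRouteAdvertisementInterval (MRAI) before passing it on, using the default values suggested in RFC 4271 (5 seconds for iBGP, 30 seconds for eBGP). The hop counts and the function name are hypothetical, chosen only for illustration; real convergence depends on topology, timers, path exploration, and implementation details.

```python
# Toy model (an assumption, not a measurement): each BGP hop may delay an
# update by up to one MRAI interval before re-advertising it downstream.

def worst_case_propagation_seconds(hops: int, mrai_seconds: float) -> float:
    """Rough upper bound on the time for one update to cross `hops` sessions."""
    if hops < 0 or mrai_seconds < 0:
        raise ValueError("hops and mrai_seconds must be non-negative")
    return hops * mrai_seconds

# Hypothetical small network: 2 iBGP hops at the RFC 4271 suggested 5 s MRAI.
small = worst_case_propagation_seconds(hops=2, mrai_seconds=5.0)

# Hypothetical deep, multi-tier fabric: 8 eBGP hops at the suggested 30 s MRAI.
large = worst_case_propagation_seconds(hops=8, mrai_seconds=30.0)

print(f"small network: up to {small:.0f} s")
print(f"deep fabric:   up to {large:.0f} s ({large / 60:.0f} min)")
```

Under these assumptions, the same protocol defaults that cost about ten seconds in a shallow network add up to roughly four minutes across a deep, multi-tier one, and that is before accounting for path exploration or route processing load.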
Let’s explore what breaks down: