When Familiar Protocols Meet Unfamiliar Scale

What You Think You Know About BGP Breaks At Hyperscale

When BGP (Sort Of) Fails To Be The “BGP” We Know

In most enterprise networks, we treat protocols like BGP, OSPF, and IS-IS as stable, mature technologies that “just work.” But once you enter hyperscale territory, the behavior of these protocols doesn’t scale linearly: it bends, twists, and in some cases, breaks.

Hyperscale operators, such as AWS, Meta, and Google, have long realized this. These are not just large networks; they are planetary-scale systems, with tens of thousands of BGP-speaking routers, dozens of independently operating availability zones, and multi-layered routing architectures that include intra-region, inter-region, edge, backbone, and internet peering layers.

In such environments, the control plane’s default behavior is often the limiting factor for global resiliency and customer experience. BGP convergence is a prime example.

The BGP Problem at Hyperscale: Delay by Design

When something goes wrong (a link failure, a route withdrawal, or a prefix redistribution), the speed at which BGP converges can make or break availability SLAs. And BGP, by design, is not built for speed.
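One concrete piece of that design is route advertisement pacing. RFC 4271 defines a MinRouteAdvertisementIntervalTimer (MRAI) that enforces a minimum gap between UPDATEs for the same destination to the same peer, with suggested defaults of 30 seconds for eBGP and 5 seconds for iBGP. The sketch below is a simplified, single-prefix, single-peer model of that pacing, assuming an update leaves immediately when the timer is idle and is otherwise held, with only the latest state sent when the timer expires. The function name `mrai_send_times` is hypothetical, and real implementations differ in defaults, jitter, and how they apply the timer.

```python
def mrai_send_times(change_times, mrai):
    """Return when UPDATEs for ONE destination are sent to ONE peer.

    Simplified model (an assumption, not a vendor implementation): an update
    goes out immediately if the MRAI timer is idle; otherwise the change is
    held, and only the latest state is sent when the timer expires, so
    intermediate changes are coalesced.
    """
    sends = []
    next_allowed = float("-inf")  # earliest time the next UPDATE may leave
    pending = False               # a change is waiting for the timer
    for t in sorted(change_times):
        # If the timer expired before this change, flush the held update first.
        if pending and next_allowed <= t:
            sends.append(next_allowed)
            next_allowed += mrai      # sending restarts the timer
            pending = False
        if t >= next_allowed:
            sends.append(t)           # timer idle: advertise immediately
            next_allowed = t + mrai
        else:
            pending = True            # timer running: hold and coalesce
    if pending:
        sends.append(next_allowed)    # flush whatever is still held
    return sends


# A change at t=0 followed by two more within 2 s, with a 30 s eBGP-style MRAI.
print(mrai_send_times([0, 1, 2], mrai=30))   # [0, 30]
# A change at t=45 arrives while the timer (restarted at t=30) is still running.
print(mrai_send_times([0, 1, 45], mrai=30))  # [0, 30, 60]
```

In the first call, the changes at t=1 and t=2 are invisible to the peer until t=30: the router is healthy and has already computed the new best path, yet the timer deliberately holds it back.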

At enterprise scale, a few seconds of convergence is often tolerable. At hyperscale, however, those seconds can stretch into several minutes, and during that window different parts of the system can act on conflicting views of reachability, leading to system-wide inconsistencies.
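A back-of-the-envelope model shows how seconds become minutes. Assume, purely for illustration, that after a failure routers go through several rounds of path exploration (advertising progressively worse alternate paths before settling), that each round is paced by one MRAI interval (the RFC 4271 minimum route advertisement interval), and that the final update must still cross a number of routers with a small per-hop delay. The function `convergence_estimate_s` and every input value below are hypothetical; this is a toy estimate, not a measurement of any real network.

```python
def convergence_estimate_s(rounds: int, mrai_s: float, hops: int, per_hop_s: float) -> float:
    """Toy convergence estimate in seconds (illustrative assumptions only).

    rounds    -- path-exploration rounds, each assumed to wait one MRAI interval
    mrai_s    -- minimum route advertisement interval, in seconds
    hops      -- routers the final update must traverse
    per_hop_s -- assumed processing/propagation delay per router, in seconds
    """
    if rounds < 0 or hops < 0:
        raise ValueError("rounds and hops must be non-negative")
    return rounds * mrai_s + hops * per_hop_s


# Hypothetical small network: no path exploration needed, 3 hops.
print(convergence_estimate_s(rounds=0, mrai_s=30.0, hops=3, per_hop_s=0.5))  # 1.5
# Hypothetical large network: 5 exploration rounds at a 30 s MRAI, 8 hops.
print(convergence_estimate_s(rounds=5, mrai_s=30.0, hops=8, per_hop_s=0.5))  # 154.0
```

The point is the multiplication: the same 30-second timer costs nothing when a single update suffices, but it dominates once exploration rounds pile up, turning roughly a second and a half into more than two and a half minutes.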

Let’s explore what breaks down:


This content is free, but you must be subscribed to The Routing Intent by Leonardo Furtado to continue reading.
