When Familiar Protocols Meet Unfamiliar Scale

What You Think You Know About BGP Breaks At Hyperscale

When BGP (Sort Of) Fails To Be The “BGP” We Know

In most enterprise networks, we treat protocols like BGP, OSPF, and IS-IS as stable, mature technologies that “just work.” But in hyperscale territory, the behavior of these protocols doesn’t scale linearly: it bends, twists, and in some cases breaks.

Hyperscale operators, such as AWS, Meta, and Google, have long realized this. These are not just large networks; they are planetary-scale systems, with tens of thousands of BGP-speaking routers, dozens of independently operating availability zones, and multi-layered routing architectures that include intra-region, inter-region, edge, backbone, and internet peering layers.

In such environments, the control plane’s default behavior is often the limiting factor in global resiliency and customer experience. One such example: BGP convergence.

The BGP Problem at Hyperscale: Delay by Design

When something goes wrong, say, a link failure, route withdrawal, or prefix redistribution, the speed at which BGP converges can make or break availability SLAs. And BGP, by design, is not built for speed.

At enterprise scale, a few seconds of convergence is often tolerable. At hyperscale, those seconds can stretch into several minutes, and a convergence window that long can leave routing state inconsistent across the whole system.
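To make the "seconds into minutes" arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified model in which one update crosses a chain of BGP speakers in series and each speaker holds it for a fixed time before re-advertising. The function name, the hop counts, and the hold values are hypothetical choices for illustration, not measurements from any real network. The 30-second figure is the suggested default for MinRouteAdvertisementIntervalTimer on eBGP sessions in RFC 4271; real deployments often tune it much lower.

```python
def worst_case_convergence_seconds(
    hops: int,
    per_hop_hold_s: float,
    per_hop_processing_s: float = 0.0,
) -> float:
    """Upper bound on the time for one update to cross `hops` speakers in series.

    Toy model (an assumption, not how any specific implementation behaves):
    every hop adds its full hold time plus a fixed processing time.
    """
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return hops * (per_hop_hold_s + per_hop_processing_s)


# Hypothetical small topology: 3 hops, 5 s hold per hop.
small = worst_case_convergence_seconds(hops=3, per_hop_hold_s=5.0)

# Hypothetical multi-layer topology: 8 hops, 30 s hold per hop.
large = worst_case_convergence_seconds(hops=8, per_hop_hold_s=30.0)

print(f"small: {small:.0f} s, large: {large:.0f} s ({large / 60:.0f} min)")
```

The model ignores path exploration, update batching, and parallel paths, so it is only a way to see how per-hop delay multiplies across routing layers, not a prediction of real convergence time.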

Let’s explore what breaks down:

The Routing Intent, by Leonardo Furtado