4. Step 0: Requirements Before Devices (Problem-First Framework)
By now, the pattern should be obvious:
With smartphones, you don’t start with “iPhone or Galaxy?”
You start with: “I need a battery that lasts all day, these apps, a good screen, a good camera, and durability.”
Only after that do model names become meaningful.
In network engineering, Step 0 is the same, but the stakes are much higher. Once you commit to a protocol stack, a DC fabric, a firewall architecture, or a router family, you’re making a multi-year, multi-million-dollar decision that defines:
How you scale.
How you fail.
How you operate.
How often your weekends disappear.
And yet, people still jump straight to:
“We’ll just do EVPN everywhere.”
“We’re going SRv6; that’s the future.”
“Let’s standardize on Vendor X; they’re the market leader.”
That’s like saying, “We’ll just standardize on the most expensive phone” before you even know whether you need its features.
Serious engineering starts somewhere else: with a Problem / Requirements Document.
The Problem / Requirements Doc: Your Adult-On-Paper Step
Think of this doc as the grown-up version of your “I want these 5 things from my phone” list. It’s short, but it’s sharp.
At a minimum, it should cover:
1. Problem statement
2. Goals and SLOs
3. Non-goals
4. Constraints
5. Assumptions
Let’s walk through each.
1. Problem Statement: What Are We Actually Trying to Do?
This is the core question:
“What problem are we solving, for whom, and why now?”
Example:
“Design a spine–leaf DC fabric for 8,000 servers, delivering 10–20 Tbps of aggregate east–west capacity, with 99.99% availability for production workloads, and clear failure domains to contain blast radius.”
Another:
“Redesign the WAN core to support 500+ sites, each from 1 to 20 Gbps, across multiple providers, with per-tenant isolation and end-to-end encryption.”
You’re not listing technologies yet. You’re not name-dropping EVPN, SR-MPLS, or Vendor X. You’re describing the reality you need to exist.
2. Goals & SLOs: What “Good” Actually Means
Next, you define what success looks like, in measurable terms:
- Availability:
  - "Core DC fabric: 99.99% availability for production tenants."
  - "Backbone: no more than X minutes of brownout per month per region."
- Performance / latency:
  - "Median east–west RTT inside a DC pod ≤ 200 µs; 99th percentile ≤ 500 µs."
  - "Backbone latency between region A and B ≤ X ms under normal conditions."
- Convergence & recovery:
  - "Single-link failures in the DC: traffic rerouted in ≤ 200 ms (data plane), full control-plane recovery in ≤ 5 seconds."
  - "Core routing changes across regions converge in ≤ 60 seconds in steady-state scenarios."
- Operational SLOs:
  - "90% of configuration changes go through automated pipelines with pre-checks and post-checks."
  - "Most incidents detectable via monitoring within 1 minute of impact."
These are the “battery must last all day” equivalents. They anchor everything else.
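It also pays to capture these targets in a machine-readable form, so design reviews and post-change checks can reference the same numbers later. Here is a minimal sketch in Python; the field names and values are illustrative placeholders, not a standard schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slo:
    """One measurable target from the requirements doc (illustrative fields)."""
    name: str
    target: float
    unit: str
    measurement: str  # how and where the value is measured

# Example targets taken from the goals above; treat them as placeholders.
DC_FABRIC_SLOS = [
    Slo("availability", 99.99, "percent", "per production tenant, monthly"),
    Slo("east_west_rtt_p50", 200.0, "microseconds", "intra-pod, server to server"),
    Slo("east_west_rtt_p99", 500.0, "microseconds", "intra-pod, server to server"),
    Slo("link_failure_reroute", 200.0, "milliseconds", "data-plane traffic restored"),
    Slo("incident_detection", 60.0, "seconds", "monitoring alert after impact"),
]

def meets(slo: Slo, measured: float) -> bool:
    """True if a measured value satisfies the target.

    Availability-style targets are higher-is-better; latency and time
    budgets are lower-is-better. This toy rule keys off the unit.
    """
    return measured >= slo.target if slo.unit == "percent" else measured <= slo.target

print(meets(DC_FABRIC_SLOS[0], 99.995))  # True: availability above target
print(meets(DC_FABRIC_SLOS[3], 350.0))   # False: reroute slower than 200 ms
```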
3. Non-Goals: What You Are Not Solving
Non-goals are criminally underrated.
They protect you from scope creep and hype:
“We are not designing an intercontinental backbone here.”
“We are not solving for IoT at global scale; this is about DC workloads only.”
“We are not reinventing the entire security architecture; we are defining DC fabric and edge, not full enterprise zero trust.”
Explicit non-goals do two huge things:
They keep the design review focused.
They block people from dragging unrelated pet projects into your decision.
When someone says, “Why not add SRv6 for everything while we’re here?”, you can calmly point to the non-goals and say, “Not in scope for this decision.”
4. Constraints: Welcome to the Real World
This is where engineering meets reality.
Constraints include:
- Physical:
  - Rack space per site or per row.
  - Power envelope: kW per rack / per row.
  - Cooling capacity.
  - Fiber availability and layout.
- Organizational:
  - Team skills and experience: "Ops team is strong in MPLS and BGP; minimal SRv6 experience today."
  - On-call model: 24/7 vs best-effort.
  - Change management: strict change windows vs continuous deployment.
- Vendor / platform:
  - Existing contracts, discounts, or mandatory standards.
  - Hardware already deployed that must be reused.
  - Cloud provider or DC provider constraints.
- Economic:
  - CAPEX budget bands for the initial build.
  - OPEX expectations: "We expect to operate this with ≤ N FTEs per region."
  - ROI expectations: "Target payback period X years; network cost should not exceed Y% of service revenue."
- Time:
  - "First production tenants must be live in 9 months."
  - "Migration window is 18 months; dual-stack or hybrid designs must exist during transition."
These are the things that turn an academically nice design into something you can, or cannot, actually ship.
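One practical trick is to encode the hard constraints as pass/fail checks on any candidate design, so the later protocol and vendor debate can't quietly ignore them. A rough sketch, with invented budget figures, power draws, and field names:

```python
# Hypothetical hard-constraint check for a candidate fabric design.
# All budget figures and per-device power draws are invented placeholders.

PER_RACK_POWER_BUDGET_KW = 12.0
CAPEX_BUDGET_USD = 4_000_000

candidate = {
    "name": "candidate-fabric-A",
    "leafs_per_rack": 2,
    "leaf_power_kw": 0.65,     # assumed draw per leaf switch
    "servers_per_rack": 24,
    "server_power_kw": 0.4,    # assumed draw per server
    "capex_usd": 3_600_000,
}

def violations(c: dict) -> list[str]:
    """Return human-readable violations; an empty list means the design fits."""
    problems = []
    rack_kw = (c["leafs_per_rack"] * c["leaf_power_kw"]
               + c["servers_per_rack"] * c["server_power_kw"])
    if rack_kw > PER_RACK_POWER_BUDGET_KW:
        problems.append(f"rack power {rack_kw:.1f} kW exceeds {PER_RACK_POWER_BUDGET_KW} kW budget")
    if c["capex_usd"] > CAPEX_BUDGET_USD:
        problems.append(f"CAPEX {c['capex_usd']:,} USD exceeds {CAPEX_BUDGET_USD:,} USD budget")
    return problems

print(violations(candidate) or "fits all hard constraints")
```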
5. Assumptions: What You’re Betting On
Finally, you write down what you’re implicitly assuming anyway:
- Traffic patterns:
  - "80% east–west inside DC, 20% north–south."
  - "WAN traffic heavily hub-and-spoke today, trending toward more mesh with SaaS usage."
- Growth:
  - "20–30% annual traffic growth, with bursts possible due to new product launches."
  - "Number of customers or tenants doubling over 3 years."
- Failure scenarios:
  - "Single link and single device failures are common; multi-device failures are rare but possible."
  - "Fiber cuts between sites occur a few times per year."
- Operational maturity:
  - "Current automation coverage is low/medium, but the target is high."
  - "Current incident process is manual; plan to introduce more automated detection/remediation."
Assumptions are where you admit:
“This is the world we think we’re designing for. If these assumptions change, we may need to revisit the decision.”
That’s intellectual honesty. It also makes your future self very grateful when you come back to this doc in two years and remember why you chose what you chose.
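Written-down assumptions also give you something to test the design against. For example, a quick compounding projection (with an assumed starting load) shows why the 20–30% growth assumption matters over a typical hardware lifecycle:

```python
# Compounding growth projection over a typical 5-year hardware lifecycle.
# The starting load is an assumption, not a measurement.

current_tbps = 12.0          # assumed aggregate east-west traffic today
years = 5

for annual_growth in (0.20, 0.30):
    projected = current_tbps * (1 + annual_growth) ** years
    print(f"{int(annual_growth * 100)}% annual growth -> {projected:.1f} Tbps after {years} years")

# Roughly:
#   20% annual growth -> 29.9 Tbps after 5 years
#   30% annual growth -> 44.6 Tbps after 5 years
```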
Working Backwards: The Future-State Network Press Release
If you like Amazon’s Working Backwards model, this is where it fits perfectly.
Instead of starting from “what hardware do we want to buy?” you start from a future-state narrative:
“Our new DC network supports 8,000 servers across 4 pods, with 20 Tbps of aggregate east–west capacity. It maintains 99.99% availability for production tenants, even during maintenance and single-device failures. Most changes are deployed via automated pipelines, with pre-checks, staged rollout, and automated rollback. Incidents are detected within 60 seconds and mitigated in under 10 minutes in typical cases, with rich telemetry that makes troubleshooting fast and predictable.”
Or:
“Our new WAN architecture connects 500+ sites worldwide across three carriers. It provides per-tenant isolation with L3VPN/EVPN services, supports encrypted transit, and can reroute around provider outages within 60 seconds without manual intervention. Network changes are modeled as code, peer reviewed, and deployed via CI/CD pipelines.”
Once you have that “press release” style paragraph, you can derive:
SLOs: availability, latency, convergence, detection, mitigation.
Operational properties: automation level, observability, on-call expectations.
Economic expectations: cost envelope, ROI horizon, efficiency targets.
That future-state description becomes the North Star.
Then you work backwards:
“For this level of availability, what kind of HA and failure domains do we need?”
“For this level of automation, what OS and API capabilities are mandatory?”
“For this convergence behavior, what protocol features and designs are non-negotiable?”
“For this OPEX target, how simple must operations be?”
Only after this do you ask:
“Given these requirements and constraints, which protocol stack fits?”
“Which vendor platforms can satisfy this design with acceptable TCO and risk?”
“Which architectures (EVPN vs L2VPN, SR-MPLS vs RSVP, etc.) make sense here?”
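To make "working backwards" concrete: the 200 ms reroute target from the SLOs immediately bounds your failure-detection and repair design. A toy budget check, with illustrative timer values rather than vendor defaults:

```python
# Working backwards from the 200 ms reroute SLO: does a candidate
# detection-plus-repair design fit inside the budget? Timer values are
# illustrative assumptions, not vendor defaults.

REROUTE_BUDGET_MS = 200

bfd_tx_interval_ms = 50   # assumed BFD transmit interval
bfd_multiplier = 3        # session declared down after 3 missed packets
local_repair_ms = 50      # assumed switchover time to a precomputed backup path

detection_ms = bfd_tx_interval_ms * bfd_multiplier
total_ms = detection_ms + local_repair_ms

print(f"detection {detection_ms} ms + repair {local_repair_ms} ms = {total_ms} ms "
      f"(budget {REROUTE_BUDGET_MS} ms)")
print("fits the budget" if total_ms <= REROUTE_BUDGET_MS
      else "does not fit; tighten timers or change the repair design")
```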
The Smartphone Parallel, One More Time
Think about your phone decision:
You didn’t say:
“I must buy the newest iPhone because it’s the newest.”
You said:
“I need a battery that lasts all day, these apps, this screen, this camera, this durability.”
Then, and only then, did you look at models.
The network version is:
Before you even say “EVPN” or “SR-MPLS” or “Vendor X,” you should be at the smartphone stage of saying “I need a network that delivers X capacity, Y availability, Z operational properties, within these constraints and cost envelopes.”
If you skip Step 0 and jump straight to tech names, you’re not doing architecture. You’re doing shopping.
Next, we'll take these requirements and feed them into concrete decision frameworks, like multi-criteria matrices, QFD mappings, and Pugh comparisons, so you can systematically compare real options instead of arguing about preferences.
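As a taste of what that looks like, the simplest of those frameworks is a weighted scoring matrix: requirements become weighted criteria, candidate options get scored per criterion, and the weighted totals make the trade-offs explicit. A toy sketch, where the criteria, weights, and 1–5 scores are invented purely for illustration:

```python
# Toy weighted decision matrix. The criteria come from the requirements doc;
# the weights and 1-5 scores below are invented purely for illustration.

criteria_weights = {
    "meets_convergence_slo": 0.30,
    "team_skill_fit": 0.25,
    "automation_api_maturity": 0.25,
    "five_year_tco": 0.20,
}

options = {
    "EVPN/VXLAN fabric": {
        "meets_convergence_slo": 4, "team_skill_fit": 3,
        "automation_api_maturity": 4, "five_year_tco": 3,
    },
    "Classic L2 + MLAG": {
        "meets_convergence_slo": 2, "team_skill_fit": 5,
        "automation_api_maturity": 2, "five_year_tco": 4,
    },
}

for name, scores in options.items():
    total = sum(weight * scores[criterion] for criterion, weight in criteria_weights.items())
    print(f"{name}: weighted score {total:.2f}")
```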