1. The Outrage Trap

Every few months, a headline explodes across social media:

“Startup spends $100 million a year on AWS!”
“$300,000 per day to run a SaaS platform… who signed off on that?”

The response is always the same: a mix of disbelief, ridicule, and hot takes about how “you could run that on a Raspberry Pi.” Suddenly, engineers, investors, and product managers all become self-appointed FinOps experts, diagnosing architectural incompetence from a headline.

But here is the uncomfortable truth:

A large cloud bill doesn’t mean someone failed. It often means they succeeded—at scale.

In reality, many of the companies behind these so-called “eye-watering” infrastructure bills are running complex, latency-sensitive, globally distributed platforms. Their cloud usage isn’t the result of reckless provisioning or a forgotten S3 bucket; it’s the direct consequence of intentional architecture aligned with business outcomes.

In this article, we’ll take a closer look at anonymized real-world scenarios: companies with workloads that justify $100M+ in annual cloud spend. We’ll dissect what those numbers really mean through the lens of:

  • Performance requirements and architectural trade-offs

  • Network design complexity and operational reality

  • Cost structures and margin targets

  • FinOps principles and optimization techniques

We’ll also highlight where things go wrong, like when large cloud bills are the result of waste or poor design, and why that distinction matters more than ever in today’s economic climate.

This isn’t about defending high cloud bills blindly. It’s about giving you the tools to read those numbers intelligently, contextually, and with the technical and economic nuance they deserve.

2. Case #1 – A Real-Time Collaboration SaaS ($120M/year Cloud Spend)

Let’s start with a redacted case study: a design-focused, browser-based collaboration platform used by millions of designers, developers, and product teams worldwide. You know the type: real-time cursor sharing, asset libraries, commenting, multi-user whiteboarding.

At the peak of its growth, this company reached:

  • 15 million+ monthly active users

  • Global presence with users in over 100 countries

  • Hundreds of thousands of concurrent sessions, often from users collaborating across continents in near real-time

Their AWS bill? Over $120 million annually, or roughly $330,000/day.

It raised eyebrows. But once you unpack the architecture and business model, it becomes not only understandable but expected.

Engineering Use Case

Real-time demands, unforgiving latency

  • WebSockets everywhere: Each collaborative session maintains dozens of open WebSocket connections per user, all of which require ultra-low latency and high message throughput.

  • Multiplayer rendering: Sessions are stateful. Actions by one user trigger immediate updates for all others, often across continents.

  • Server-side rendering and preview: Thumbnails, design previews, and version histories require CPU- or GPU-intensive compute pipelines.

  • Storage complexity: Users upload large design files, fonts, libraries, and assets that must be stored durably and served quickly from the nearest region.

This isn’t static web content. This is interactive, global, millisecond-sensitive design orchestration, all from a browser.

Network and Infrastructure Reality

What’s actually running under the hood?

  • Global VPC design: Regional clusters spread across North America, Europe, and APAC, each with auto-scaling compute and regional failover.

  • PrivateLink + Direct Connect for hybrid operations and partner integrations.

  • Always-on environments for production, staging, QA, customer demos, and A/B testing (high cost, but critical for velocity).

  • High cross-AZ traffic: For fault tolerance and redundancy, leading to substantial inter-AZ and NAT gateway costs.

  • S3 and CloudFront: Delivering assets and fonts with CDN acceleration, but with significant PUT/GET request volumes and object replication logic.
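
To make the cross-AZ and NAT line items concrete, here is a minimal Python sketch of how those charges accumulate. The per-hour and per-GB rates are illustrative us-east-1 list prices at the time of writing, not figures from this case; check current AWS pricing before reusing them.

```python
# Back-of-the-envelope model for NAT gateway and cross-AZ charges.
# Rates are illustrative us-east-1 list prices, not billed figures.

NAT_HOURLY = 0.045        # $/hour per NAT gateway
NAT_PER_GB = 0.045        # $/GB processed through the gateway
INTER_AZ_PER_GB = 0.02    # $/GB, since $0.01 is charged in each direction

def monthly_network_cost(nat_gateways: int,
                         nat_gb_processed: float,
                         inter_az_gb: float,
                         hours: int = 730) -> dict:
    """Break down one month of NAT and inter-AZ spend."""
    nat_hours = nat_gateways * hours * NAT_HOURLY
    nat_data = nat_gb_processed * NAT_PER_GB
    inter_az = inter_az_gb * INTER_AZ_PER_GB
    return {
        "nat_hourly": round(nat_hours, 2),
        "nat_data": round(nat_data, 2),
        "inter_az": round(inter_az, 2),
        "total": round(nat_hours + nat_data + inter_az, 2),
    }

# Six gateways (two per AZ across three AZs), 200 TB through NAT,
# 500 TB of cross-AZ replication traffic:
print(monthly_network_cost(6, 200_000, 500_000))
```

Even at this modest scale, the per-GB data-processing charges dwarf the hourly ones, which is why NAT and inter-AZ traffic deserve their own dashboards.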

The network wasn't built for cost minimization; it was built for user experience at scale, where milliseconds and availability directly impact retention and NPS.

FinOps View: Why This Isn’t Waste

From a cloud economics standpoint, this company’s cloud spend hovered around 12–13% of annual revenue, comfortably within norms for compute-heavy SaaS platforms.

Here’s what the FinOps team was doing behind the scenes:

  • RI and SP Optimization: Over 70% of EC2 and RDS workloads were covered by Reserved Instances or Savings Plans, pushing effective discounts close to 35%.

  • Cost per active session was tracked and optimized quarterly, providing a unit economics framework to evaluate engineering changes.

  • Observability on cost spikes: NAT Gateway, EBS snapshots, and inter-region traffic were tagged, visualized, and metered daily.

  • Rightsizing and instance tuning: Teams had playbooks to test CPU/memory-intensive workloads on ARM-based instances or switch pipelines to serverless tasks during off-peak hours.
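
A "cost per active session" metric like the one above reduces to simple arithmetic once billing and analytics data are joined. The sketch below uses invented numbers purely for illustration:

```python
# Cost per active session, the unit-economics metric tracked quarterly.
# Inputs here are made up; real ones come from billing exports joined
# with product analytics.

def cost_per_session(monthly_cloud_cost: float,
                     monthly_sessions: int) -> float:
    """Blended infrastructure cost of serving one collaborative session."""
    return monthly_cloud_cost / monthly_sessions

def unit_cost_delta(previous: float, current: float) -> float:
    """Percent change quarter over quarter; negative means more efficient."""
    return (current - previous) / previous * 100

# $10M of monthly spend over 40M sessions -> $0.25 per session
print(cost_per_session(10_000_000, 40_000_000))
print(unit_cost_delta(0.25, 0.20))   # unit cost fell, even if total spend grew
```

The point of the metric is the second function: total spend can rise while unit cost falls, which is exactly the "success at scale" pattern this case illustrates.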

So yes, the bill was enormous. But it was measured, optimized, and strategically aligned with product scale.

“What looks like cloud overspend from the outside is often a deliberate investment in customer experience, resilience, and scale velocity.”

This SaaS platform didn’t arrive at a $120M/year bill by accident. It evolved through a series of conscious architectural decisions to support performance and reliability in a product where collaboration lag is a deal-breaker.

And that’s the point: when your product is the infrastructure, optimizing for cost in isolation is the wrong metric. You optimize for business value and measure cost in that light.

3. Case #2 – An AI/ML Research Platform ($90M/year Cloud Spend)

This next case revolves around a company in the AI/ML infrastructure-as-a-service space: think fine-tuning foundation models, API access to generative inference, and a marketplace for model hosting and deployment. Their customers range from startups embedding LLMs into apps to Fortune 500s retraining proprietary datasets.

They’re not building slide decks or UIs. They’re selling raw, GPU-intensive compute as a product.

By the time they raised their Series D, this company had:

  • Dozens of model families under management, some with billions of parameters

  • A customer base running tens of thousands of concurrent inference jobs

  • Massive ingestion and preprocessing pipelines for text, images, audio, and tabular data

  • And yes, an AWS bill of around $90 million/year

Engineering Use Case

Training and inference at hyperscale

  • Model training: Distributed training runs for large transformer models across hundreds of A100 nodes, often for days at a time.

  • Inference endpoints: High-throughput, low-latency model serving with autoscaling logic that can scale out across thousands of vCPUs or GPUs in minutes.

  • Data pipeline: Ingests petabytes of source data into object storage (S3), performs data cleaning and sharding using Dask or Spark on EMR.

  • Serving gateways: Provide multi-tenant APIs with fine-grained rate limiting, user-level monitoring, and zero-downtime upgrades.

  • Multi-region fault isolation: Training clusters in us-west-2, eu-central-1, and ap-southeast-1 to avoid compute scarcity and reduce latency to global customers.

Network and Infrastructure Reality

What’s burning the budget?

  • Tens of thousands of ephemeral EBS volumes during training jobs, deleted and recreated continuously.

  • Heavy east-west traffic between GPU nodes in model-parallel workloads, especially for transformer-based models.

  • Massive NAT gateway usage from autoscaled training clusters pulling dependencies and models from external repositories.

  • Burstable load: Inference APIs experience sudden spikes in demand, requiring large cold-start buffers.

  • Egress costs: Customer workloads frequently export trained models or inference outputs to external regions or on-prem environments.
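
Tiered internet egress pricing is why those model exports add up. The tier boundaries and rates below are illustrative of a typical AWS us-east-1 schedule, not guaranteed current prices:

```python
# Tiered egress calculator: each tier of monthly outbound traffic is
# billed at a progressively lower per-GB rate. Rates are illustrative.

TIERS = [  # (GB in tier, $/GB)
    (10_240, 0.09),       # first 10 TB
    (40_960, 0.085),      # next 40 TB
    (102_400, 0.07),      # next 100 TB
    (float("inf"), 0.05), # everything beyond
]

def egress_cost(gb: float) -> float:
    """Monthly internet egress bill for `gb` of outbound traffic."""
    cost, remaining = 0.0, gb
    for tier_gb, rate in TIERS:
        used = min(remaining, tier_gb)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return round(cost, 2)

# A 500 GB model checkpoint exported 200 times in a month (~100 TB):
print(egress_cost(100_000))
```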

None of this is "set and forget." This is chaotic, dynamic, and scale-intensive orchestration where network throughput is as essential as raw compute.

FinOps View: Why This Spend Was Strategic

Cloud costs here were routinely above 25% of revenue, but this was planned. The company operated under a known-growth-burn model, balancing capital efficiency with R&D velocity.

FinOps wasn't trying to slash costs to the bone. Their job was cost governance under chaos:

  • Custom scheduling for spot market volatility: Training jobs were placed on clusters with real-time spot interruption forecasting, enabling up to 65% savings.

  • Reserved GPU commitments with flexible terms: They negotiated early-access programs with AWS to get discounts on newer hardware in exchange for long-term use.

  • Unit cost tracking per model family: They could tell you how much it costs to train GPT-J, BERT, or Stable Diffusion per customer, per run.

  • Cost-aware product pricing: Customers were billed based on exact resource usage, allowing margins to be preserved even at high scale.
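
Here is a hedged sketch of what per-model-family unit costing and cost-aware pricing can look like in practice. The node count, hourly rate, spot discount, and margin target below are assumptions for illustration, not figures from this company:

```python
# Per-model-family unit costing with a spot discount applied, plus a
# cost-aware price that preserves a target gross margin. All inputs
# are assumed values.

def training_run_cost(nodes: int, hours: float, hourly_rate: float,
                      spot_discount: float = 0.0) -> float:
    """Cost of one distributed training run across `nodes` instances."""
    return nodes * hours * hourly_rate * (1 - spot_discount)

def price_with_margin(unit_cost: float, target_margin: float) -> float:
    """Price a run so that the target gross margin is preserved."""
    return unit_cost / (1 - target_margin)

# 128 nodes for 72 hours at an assumed $30/hr, on spot at 65% off:
run = training_run_cost(128, 72, 30.0, spot_discount=0.65)
print(round(run, 2))
print(round(price_with_margin(run, 0.60), 2))   # billed at a 60% margin
```

Because customers are billed on exact resource usage, the second function is what keeps margins intact as scale grows: the price moves with the cost, not ahead of or behind it.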

This wasn’t “spend now, optimize later.” It was optimized as you go, but spend deliberately for speed.

“When your product is raw compute, cloud cost isn’t overhead; it’s COGS.”

AI/ML platforms can’t exist without massive infrastructure, and cloud offers the only way to deliver that at speed, with global scale, and without up-front CapEx.

Yes, this company’s AWS bill was jaw-dropping. But it was also predictable, instrumented, and revenue-aligned.

In this business model, cloud cost is a sign of adoption, not a red flag.

4. Why “Cloud Cost ≠ Cloud Waste”

Every time a massive cloud bill hits public discourse, the default reaction is to assume waste.

“They must’ve left something running.”
“Who’s paying for all those NAT gateways?”
“Why didn’t they just move it on-prem?”

But the reality is that cost and waste are not the same thing.

In fact, at hyperscale, many of the biggest line items on your cloud bill reflect deliberate architectural and business decisions, not mistakes.

Let’s break that down.

When High Spend is Justified

1. High NAT Gateway Charges

NAT gateways are notoriously expensive, but also critical in some workloads. In multi-AZ deployments, particularly those involving auto-scaling workloads that need external updates (model weights, system packages, third-party APIs), NAT traffic becomes a constant.

Not wasteful: If your app auto-scales based on user traffic and those instances need external dependencies, this is just the cost of elasticity and security.

2. Cross-AZ and Cross-Region Data Transfer

You're replicating data across zones or regions? It’s probably for fault tolerance or latency optimization.

Not wasteful: These are costs tied to uptime guarantees, RTO/RPO targets, or user proximity. They’re architecture-level choices that align with SLAs and customer experience.

3. Always-On Environments

Many companies run 24/7 dev, staging, and canary regions.

Not wasteful: For large engineering teams with continuous deployment pipelines and global coverage, shutting down staging environments at night would cripple release velocity.

4. Redundant Infrastructure

It’s common to see 2–3x redundancy for certain mission-critical services, or even shadow deployments for disaster recovery.

Not wasteful: You’re paying for resilience. Downtime would cost more in SLA penalties or lost business.

⚠️ When High Spend Might Be Waste

Let’s be clear: not all large cloud bills are optimized. Here are scenarios where the cost should raise eyebrows:

1. Unattached Volumes and Idle Instances

Thousands of EBS volumes or EC2 instances without recent I/O or CPU activity? That’s a fireable offense at scale.

2. Low Utilization on Reserved Commitments

Buying a 3-year reserved instance but running it at 10% CPU utilization for 18 months? That’s poor forecasting and sunk cost.

3. Lack of Unit Economics

If you can’t answer “how much does a user session cost?” or “what’s our cost per model inference?” you’re flying blind. You don’t need to slash spend, but you do need telemetry.

4. No Spend Attribution

Costs aren’t tagged. There’s no ownership. Engineering can’t correlate features with cost deltas. That’s how zombie workloads survive.
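
Spend attribution ultimately boils down to a join between billing data and ownership tags. Here is a toy version, with invented field names and resources; real inputs would come from a cost and usage report:

```python
# Surface spend that no team owns: any resource without an "owner" tag
# is flagged for review. Resource shapes and tag names are invented.

def unattributed_spend(resources: list[dict]) -> tuple[float, list[str]]:
    """Return total untagged monthly spend and the offending resource IDs."""
    total, offenders = 0.0, []
    for r in resources:
        if not r.get("tags", {}).get("owner"):
            total += r["monthly_cost"]
            offenders.append(r["id"])
    return total, offenders

resources = [
    {"id": "i-0abc", "monthly_cost": 4200.0, "tags": {"owner": "platform"}},
    {"id": "vol-9xyz", "monthly_cost": 310.0, "tags": {}},           # zombie?
    {"id": "nat-7def", "monthly_cost": 1800.0, "tags": {"env": "prod"}},
]

total, offenders = unattributed_spend(resources)
print(total, offenders)   # 2110.0 ['vol-9xyz', 'nat-7def']
```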

The North Star: Intentional Spend

What separates justified high cloud bills from chaotic ones isn’t the number; it’s the intent, instrumentation, and accountability behind it.

A well-architected $100M/year cloud estate is far more sustainable than a poorly instrumented $10M/year sprawl.

The goal is not to minimize spend at all costs. It’s to maximize business value per dollar spent, and that requires visibility, collaboration, and clear architectural reasoning.

5. Network Engineering Economics at Scale

When people think about cloud cost, their minds jump to compute and storage. But as systems scale, especially globally, the network becomes the silent heavyweight of your cloud bill. Often overlooked, often misunderstood, but always present.

As a seasoned Network Development Engineer, here’s what I can tell you from firsthand experience:

Many of the most expensive components in hyperscale environments are not compute-bound, but network-bound.

The Cost of Building a Resilient, Low-Latency, Global Network

1. Always-On Routing Across Availability Zones

High Availability (HA) doesn’t come free.

  • Design pattern: Active-active across at least two or three AZs.

  • Result: Constant cross-AZ traffic billed at premium rates.

  • Reason: SLAs and low RTOs demand that each request can fail over instantly.

Even if one AZ never fails, the cost of being ready for it to fail is significant.

2. Global VPC Peering and PrivateLink Architectures

At scale, you stop exposing public endpoints and start peering VPCs via PrivateLink, Transit Gateway, or custom mesh overlays.

  • Transit Gateway data processing charges can become seven-figure line items.

  • VPC Peering and Inter-region transfers introduce both bandwidth costs and complexity in route management.

  • Load balancers (NLB/ALB) incur per-request and data processing fees on ingress and egress.

These are trade-offs made for security, observability, and policy enforcement, not cost savings.

3. NAT Gateway Economics

You’ll hear this complaint a lot: “We’re spending $100k/month on NAT.”

Why?

  • Stateless services auto-scaling in private subnets.

  • Microservices fetching external data (model weights, telemetry, updates).

  • Managed services requiring outbound Internet access.

Alternatives like NAT instances require significant operational overhead, and don’t scale horizontally or survive interruptions like NAT Gateways do. So you trade a higher managed-service price for operational simplicity and reliability.

Bandwidth Isn’t Free (Especially Not at Scale)

One of the most misunderstood aspects of cloud cost is egress bandwidth.

Here’s what changes when you grow:

| Scale | Bandwidth Source | Notes |
|---|---|---|
| Startup | Mostly CDN (cheap) | Static assets + caching do most of the work |
| Mid-scale | Inter-AZ and VPC Peering | Apps talking to apps, zone-to-zone |
| Hyperscale | Cross-region, hybrid, real-time | Data sync, DR, edge inference, partner access |

At hyperscale, internal traffic dwarfs customer-facing traffic, and most of it doesn’t benefit from caching or compression.

That’s right: your most expensive traffic is often the traffic no one sees.

Examples of Real Network Trade-Offs That Inflate Cost

  • Canary Regions: Running partial workloads in parallel across two regions to test resilience introduces 2x infra + 2x network cost but improves rollback safety and MTTD.

  • Shadow Deployments: Internal replicas of production traffic for observability, A/B testing, or model evaluation, none of which directly generate revenue.

  • Inter-region Queues: Event replication across regions or continents using services like Amazon SQS, MSK, or custom Kafka, each with per-message transfer fees.

  • Custom Backbone or SD-WAN Mesh: For hybrid, zero-trust architectures, adds software licensing and operational cost, but simplifies compliance and segmentation.

So Why Pay These Prices?

Because network cost is often the tax you pay for user trust, uptime, and performance.

It’s easy to say “optimize cross-AZ traffic,” but what if that traffic is replication for your customer database?

It’s easy to say “move to a cheaper region,” but what if that breaks GDPR compliance, or adds 300ms of latency to core workflows?

At hyperscale, “cheap” is often the opposite of “correct.”

Networking is invisible to users, but not to your cloud bill.

Your NAT gateways, transit gateways, cross-AZ traffic, and bandwidth egress are the scaffolding behind your app’s availability and speed. The goal isn’t to eliminate that cost. The goal is to understand its purpose, track it carefully, and optimize it within architectural bounds.

6. The Red Flags (When to Actually Worry)

Up to this point, we’ve defended large cloud bills as natural outcomes of scale, complexity, and intentional architecture.

But let’s be clear: not all high cloud bills are good bills.

There’s a fine line between deliberate cost and accidental waste, and organizations that operate at scale must be able to tell the difference. When cost starts to outpace value, and no one knows why, that’s when the red flags should start flying.

Here are the most common warning signs that your cloud bill is a symptom of dysfunction, not strategy.

1. No Unit Economics

"We don’t know what it costs to serve a user."

If you can’t answer basic questions like:

  • How much does one API call cost?

  • What’s the infra cost of a free-tier user vs. a paying user?

  • How does infrastructure cost change with a 10x increase in traffic?

…then you’re not in control of your platform’s economics. This isn't just a finance problem; it’s a systems observability problem.

Symptom: Cost grows linearly or unpredictably, regardless of user growth or product strategy.

Fix: Track cost per user/session/feature, and tie engineering output to spend delta.

2. Zombie Infrastructure

"That cluster was supposed to be temporary."

Idle EC2 instances. Detached but billable EBS volumes. DNS records pointing to nothing. Load balancers with zero traffic for weeks.

These aren't theoretical; they happen constantly in growing companies. Especially when teams move fast and ownership is unclear.

Symptom: Dozens (or hundreds) of resources with no usage, no tags, and no clear owner.

Fix: Enforce lifecycle tagging, implement TTLs on test environments, and monitor resource orphaning proactively.
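
One common implementation of the TTL fix is an `expires_at` tag applied at creation plus a scheduled sweep that flags anything past its deadline. The tag name and resource shape here are assumptions, not a standard:

```python
# Flag resources whose TTL tag is missing or already in the past.
# A real sweep would then notify the owner or terminate the resource.

from datetime import datetime, timezone

def expired_resources(resources: list[dict], now: datetime) -> list[str]:
    """IDs of resources whose `expires_at` tag is absent or past due."""
    flagged = []
    for r in resources:
        expires = r.get("tags", {}).get("expires_at")
        if expires is None or datetime.fromisoformat(expires) <= now:
            flagged.append(r["id"])
    return flagged

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
fleet = [
    {"id": "i-temp1", "tags": {"expires_at": "2025-05-20T00:00:00+00:00"}},
    {"id": "i-demo2", "tags": {"expires_at": "2025-07-01T00:00:00+00:00"}},
    {"id": "i-old3", "tags": {}},   # no TTL at all: flag for review
]
print(expired_resources(fleet, now))   # ['i-temp1', 'i-old3']
```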

3. Lift-and-Shift That Never Shifted

"We migrated to the cloud… but we’re still running like we’re on-prem."

This is one of the most common cost sins.

A classic case: an on-prem monolith is containerized and deployed into AWS, but nothing else changes. The architecture remains static, inefficient, and under-optimized for cloud primitives.

You end up using cloud as a colo, without benefiting from elasticity, managed services, or dynamic scaling.

Symptom: Constant 24/7 resource usage, no spot instances, no autoscaling, high ops burden.

Fix: Re-architect using cloud-native design patterns (event-driven, serverless, stateless microservices) where appropriate.

4. Low Utilization of Reserved Resources

"We committed to $3 million in compute, but we’re only using half."

Buying Reserved Instances (RIs) or Savings Plans without understanding usage patterns leads to overcommitment and sunk cost. Conversely, failing to commit when workloads are stable leads to unnecessary on-demand spend.

Symptom: RIs going unused or constantly running workloads on on-demand pricing.

Fix: Establish FinOps feedback loops with engineering, monitor reservation coverage, and refresh forecasts quarterly.
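
The two reservation metrics worth watching in that feedback loop, utilization and coverage, reduce to simple ratios. A minimal sketch, with invented hour counts:

```python
# Reservation health as two ratios. Utilization answers "did we use what
# we bought?"; coverage answers "is enough of our usage discounted?".

def reservation_utilization(committed_hours: float, used_hours: float) -> float:
    """Share of the commitment actually consumed (target: near 100%)."""
    return used_hours / committed_hours * 100

def reservation_coverage(total_hours: float, used_hours: float) -> float:
    """Share of overall usage that ran on discounted commitments."""
    return used_hours / total_hours * 100

# Committed 10,000 hours, used 5,200 of them, out of 20,000 total hours:
print(round(reservation_utilization(10_000, 5_200), 1))  # overcommitted
print(round(reservation_coverage(20_000, 5_200), 1))     # under-covered
```

Low utilization means sunk cost on the commitment; low coverage means unnecessary on-demand spend. The quarterly forecast refresh is what keeps both in range.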

5. Engineering Doesn’t Know (or Care) About Cost

"Our job is to ship code. Finance will figure it out."

This mindset is a fast track to waste. In high-performing organizations, cost accountability is distributed, not siloed.

If developers don’t see cloud bills, and infra teams aren’t measured on efficiency, the system breaks.

Symptom: Teams deploy expensive infrastructure without cost impact analysis, and costs spike during feature launches without correlation.

Fix: Embed FinOps metrics into dashboards, show cost deltas in CI/CD pipelines, and hold post-launch cost reviews.

A large bill is not automatically bad, but an unexplained or unexamined bill is.

The danger isn’t cloud spend. It’s cloud spend without visibility, ownership, or architectural awareness. If you’re seeing these red flags in your own environment, focus on regaining control rather than finding blame.

8. Benchmarks: What Is “Normal” Cloud Spend?

One of the first questions executives ask when they see an eight-figure cloud bill is:

“Is this normal?”

The answer depends on what business you’re in, what you’re delivering, and how you’ve architected for scale.

There is no single “good” or “bad” number, but there are industry benchmarks that help frame expectations. Cloud spend, when expressed as a percentage of revenue, reveals strategic intent, maturity, and business alignment.

Let’s break down some common profiles.

Productivity SaaS

Cloud Spend: 10–15% of Revenue

  • Collaborative tools, design platforms, knowledge sharing, etc.

  • Heavy WebSocket, compute for real-time rendering, and multi-region resilience.

Justified: Real-time latency matters, user concurrency drives scale.

🛑 Risk: High elasticity needs without strong cost controls lead to runaway bills.

Media & Streaming

Cloud Spend: 15–25% of Revenue

  • Audio/video platforms, live events, and transcoding services.

  • Massive object storage, high-throughput streaming, CDN costs, and transcoding compute.

Justified: Large egress, GPU pipelines, global delivery.

🛑 Risk: Free-tier users dominate usage without revenue offset.

AI/ML SaaS

Cloud Spend: 15–30% of Revenue

  • Model training and inference services, API access, and data prep.

  • Requires GPU instances, massive East-West traffic, and heavy storage IO.

Justified: Compute is the product.

🛑 Risk: Underutilized commitments, unmanaged spot fleets, lack of inference optimization.

E-commerce & Marketplace

Cloud Spend: 5–10% of Revenue

  • Web stores, digital product platforms, vendor marketplaces.

  • Moderate compute, high IOPS storage, peak-time scaling (Black Friday, etc.).

Justified: Usage spikes are predictable and seasonal.

🛑 Risk: In-house features (recommendation engines, image processing) may become cost centers if not optimized.

Enterprise SaaS

Cloud Spend: 5–12% of Revenue

  • CRMs, ERPs, HR platforms, security suites.

  • Stable workloads, predictable enterprise demand, often with slow growth curves.

Justified: Lower elasticity needs, strong reservation coverage.

🛑 Risk: Lack of FinOps discipline due to “locked-in” customers or long-term contracts.

Security / Compliance SaaS

Cloud Spend: 10–18% of Revenue

  • SIEMs, endpoint detection, and zero-trust platforms.

  • Large-scale log ingestion, long-term data storage, heavy indexing, and querying.

Justified: High ingest and compute demand, compliance requirements.

🛑 Risk: Poor observability on backend query costs, especially with managed services (e.g., Athena, CloudWatch Logs).

VC-Backed Startups

Cloud Spend: Varies Widely (10–40% of Revenue)

  • Depends on growth stage, burn model, and whether revenue exists yet.

  • Often trade profitability for velocity.

Justified: Infrastructure is a growth enabler.

🛑 Risk: Spend outpaces revenue and escapes accountability.

Key Benchmark Summary Table

| Company Type | Typical Cloud Spend (% Rev) | Primary Cost Drivers |
|---|---|---|
| Productivity SaaS | 10–15% | Real-time compute, WebSocket infra |
| Media/Streaming | 15–25% | Egress, transcoding, CDN |
| AI/ML SaaS | 15–30% | GPU compute, data movement, model hosting |
| E-commerce | 5–10% | Storage, autoscaling compute |
| Enterprise SaaS | 5–12% | Stable compute, reservation coverage |
| Security SaaS | 10–18% | Log ingestion, long-term retention |
| VC-Backed Startups | 10–40% | Speed over efficiency, high volatility |

“Normal cloud spend isn’t a number; it’s a ratio that reflects your business model and architecture.”

High spend can be justified when it’s tied to revenue and growth. But comparing your bill to a raw number from another company, especially one from a tweet, is misleading at best and dangerous at worst.

Instead, benchmark your spend:

  • Against your own revenue trajectory.

  • Against similar workload patterns and industry peers.

  • Against your past efficiency trends.
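
That benchmarking exercise can be expressed as a quick sanity check. The bands below mirror the ranges in this section; the revenue figure in the example is back-calculated from the 12–13% ratio cited in Case #1, not a disclosed number:

```python
# Classify cloud spend as a share of revenue against the section's bands.

BENCHMARKS = {  # profile: (low %, high %)
    "productivity_saas": (10, 15),
    "media_streaming": (15, 25),
    "ai_ml_saas": (15, 30),
    "ecommerce": (5, 10),
    "enterprise_saas": (5, 12),
    "security_saas": (10, 18),
}

def spend_ratio(annual_cloud_spend: float, annual_revenue: float) -> float:
    """Cloud spend as a percentage of revenue."""
    return annual_cloud_spend / annual_revenue * 100

def classify(profile: str, ratio: float) -> str:
    low, high = BENCHMARKS[profile]
    if ratio < low:
        return "below band (very efficient, or underinvesting)"
    if ratio > high:
        return "above band (investigate drivers)"
    return "within band"

# Case #1's $120M spend against an implied ~$950M of revenue:
ratio = spend_ratio(120_000_000, 950_000_000)
print(round(ratio, 1), classify("productivity_saas", ratio))
```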

This tells you whether you’re investing in scale or just burning for burn’s sake.

9. Mentioned: Figma, Snap, Snowflake, Apple

So far, we’ve walked through redacted case studies, network cost structures, benchmarks, and the nuanced reality of cloud economics. But to ground this further, let’s look at some real-world examples that have gone public, each with varying scale, strategy, and success.

These cases serve not just as validation, but as a reminder: high cloud spend alone is not an outlier. Lack of strategy is.

Figma – $545M over 5 Years with AWS

  • Deal: $545 million committed cloud spend from 2021 to 2026, disclosed in Adobe’s failed acquisition filings.

  • Breakdown: ~$109M/year → ~$300k/day.

  • Revenue: ~$821M annualized → roughly 13% of revenue spent on cloud.

  • Infrastructure Need: Real-time design collaboration, asset rendering, and multi-region availability.

This is a textbook example of cloud spend for a high-concurrency, low-latency SaaS platform.

"Figma’s bill isn’t a scandal—it’s what intentional scale looks like.”
Corey Quinn, Duckbill Group

Snap – Multi-Billion Dollar Cloud Commitments

  • Deal: $2B with Google Cloud + $1B with AWS over five years.

  • Context: Snap was burning 40–50% of its revenue on cloud services during periods of hypergrowth.

  • Impact: The imbalance between cloud cost and revenue forced renegotiations with providers.

🛑 A cautionary tale in growth-first, cost-later cloud economics.

“Revenue was real, but margin was elusive because infrastructure cost wasn’t bounded by strategy.”

Snowflake – $2.5B AWS Commitment

  • Deal: Committed $2.5B in AWS spend between 2024 and 2028.

  • Business Model: Cloud data warehouse provider; cloud cost is COGS.

  • Revenue: Expected to scale proportionally, with margins built into the pricing model.

A perfect case of infrastructure spend aligned with usage-based billing.

"Snowflake’s customers are buying cloud compute and storage—so Snowflake’s AWS bill is directly correlated with product demand."

Apple – ~$360M/year on AWS

  • Revealed: 2019 reports indicated Apple spends over $30M/month on AWS.

  • Despite owning one of the largest private data center footprints in the world.

  • Why? Elastic workloads, redundancy, and App Store/CDN back-end distribution.

Even the biggest on-prem operator sees value in cloud elasticity.

“Apple doesn’t need the cloud. They choose it where it’s more efficient.”

Lessons from These Cases

| Company | Cloud Spend (Est.) | Strategy | Verdict |
|---|---|---|---|
| Figma | $545M / 5 yrs | Real-time SaaS; spend tied to usage | Justified at scale |
| Snap | $3B+ / 5 yrs | Growth-first; poor margin control | ⚠️ Required course-correction |
| Snowflake | $2.5B / 5 yrs | Cloud-native product; cloud is COGS | Aligned with model |
| Apple | $360M / year | Selective use of elasticity & services | Strategic hybrid use |

“You’re not the first to get a massive cloud bill, and you won’t be the last.”

These companies didn't fall into cloud spend. They negotiated, structured, and built for it. Their bills reflected their growth, architecture, and business models.

If you’re operating at scale and growing responsibly, don’t let the sticker price scare you. Instead, ask:

  • Are we spending deliberately?

  • Is our spend tied to revenue or user value?

  • Are we instrumented well enough to course-correct?

10. Strategy, Not Scandal

Every time a company’s cloud bill leaks or becomes public, the Internet erupts in outrage:

“$300,000 a day? That’s insane.”
“You could run that on a single server rack!”
“This is why we should never use the cloud.”

But these reactions miss the point entirely.

In the world of hyperscale infrastructure, a large cloud bill isn’t inherently bad, just as a small bill isn’t inherently good. What matters is what that bill represents: business velocity, user value, architectural intent, and economic alignment.

What We've Learned

Through objective and redacted examples, we’ve shown that:

  • Cloud bills in the nine or even ten-figure range can be completely justified when they align with product demands, growth curves, and operational maturity.

  • Networking costs often become dominant at scale, and they’re not just noise; they’re the price of global latency, redundancy, and secure segmentation.

  • FinOps discipline matters, but the goal isn’t always to reduce cost. Sometimes it’s to spend better, not less.

  • On-prem isn’t dead. It’s one tool among many. For the right workloads, it remains the best option.

  • Benchmarks are contextual, not absolute. What’s expensive for one business may be margin-sustaining for another.

  • And finally, cloud isn’t the villain, but unmanaged cloud is.

A Modern Perspective on Cloud Spend

“Your cloud bill is a reflection of your architecture, your incentives, and your business maturity.”

  • If it’s growing in line with your revenue and tied to intentional design, good.

  • If it’s spiking randomly with no clear owner, bad.

  • If you’re using it to accelerate your roadmap, scale globally, and ship faster, great.

  • If it’s holding your margins hostage, course-correct.

Cloud spend isn’t a scandal. It’s a strategic narrative.

We’ve sat on both sides of the table: designing infrastructure for performance and scale, and justifying its cost under financial scrutiny.

Our message is simple:

  • Don’t be ashamed of a big number, be proud if it’s well understood.

  • Don’t optimize blindly, optimize with intent.

  • Don’t benchmark off headlines, benchmark off your business model.

In the cloud, like in business, context is everything.

See you in the next chapter!

Leonardo Furtado
