AWS Cost Optimization — Real Savings Playbooks

Here's something nobody tells you early enough: AWS pricing is a game, and most teams are losing it badly. Not because they're dumb, but because the defaults are expensive and the billing dashboard is deliberately confusing. I've seen startups burning through runway on NAT Gateway charges they didn't know existed. I've seen enterprises paying for idle RDS instances that nobody remembered provisioning.

These guides come from real cost reviews on real accounts. Every number is from an actual AWS bill. Every optimization was tested in production, not just theorized about in a spreadsheet.


Start Here

If your AWS bill makes you wince every month but you don't know where to start, these three guides will give you the biggest savings for the least effort.

We Saved a Startup 50L/year on AWS: Full Cost Teardown — The one that started it all. A line-by-line breakdown of a real startup's AWS bill, the 7 changes we made, and the exact savings from each. Spoiler: the biggest win wasn't technical — it was just turning off stuff nobody was using.

The 7 Biggest AWS Cost Leaks (and How to Plug Them in 30 Days) — NAT Gateways, idle load balancers, oversized RDS, unattached EBS volumes, forgotten Elastic IPs, data transfer between AZs, and CloudWatch log retention set to "forever." If you fix just these seven things, you'll probably save 20-30% without changing a single line of application code.
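
Three of those leaks (unattached EBS volumes, forgotten Elastic IPs, and log groups that never expire) can be spotted with a few read-only API calls. Here is a minimal sketch of that check, not the method from the post itself. The helper names are hypothetical, the dictionaries are assumed to follow the shapes boto3 returns from describe_volumes, describe_addresses, and describe_log_groups, and scan_account assumes boto3 is installed and credentials are configured. It reads only the first page of each response, so a large account would need paginators.

```python
def unattached_volumes(volumes):
    """IDs of EBS volumes in the 'available' state, i.e. not attached to anything."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]


def unassociated_addresses(addresses):
    """Elastic IPs with no AssociationId, i.e. allocated but not in use."""
    return [
        a.get("AllocationId", a.get("PublicIp"))
        for a in addresses
        if "AssociationId" not in a
    ]


def log_groups_without_retention(log_groups):
    """Log groups with no retentionInDays key, which means logs never expire."""
    return [g["logGroupName"] for g in log_groups if "retentionInDays" not in g]


def scan_account(region):
    """Hypothetical wrapper: one unpaginated, read-only pass over a single region."""
    import boto3  # imported lazily so the helpers above work without it

    ec2 = boto3.client("ec2", region_name=region)
    logs = boto3.client("logs", region_name=region)
    return {
        "unattached_volumes": unattached_volumes(
            ec2.describe_volumes()["Volumes"]
        ),
        "unassociated_addresses": unassociated_addresses(
            ec2.describe_addresses()["Addresses"]
        ),
        "log_groups_without_retention": log_groups_without_retention(
            logs.describe_log_groups()["logGroups"]
        ),
    }
```

Nothing here deletes anything. The output is a review list, and someone still has to confirm that each resource is really unused before removing it.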

AWS Cost Explorer Actually Explained — Cost Explorer is powerful and also kind of baffling. This is the walkthrough I give every team I work with: how to set up useful views, what the amortized vs unblended numbers actually mean, and the three reports you should check every Monday morning.


Deep Dives

These go beyond the quick wins. Architecture decisions, commitment strategies, and the kind of optimizations that compound over time.

Reserved Instances vs Savings Plans: The Decision Framework — I've spent way too many hours on this. Here's the actual math: when RIs win, when Savings Plans win, and the hybrid strategy that gets you 90% of the savings with 50% of the commitment risk. Includes the spreadsheet I use for every client engagement.

Right-Sizing EC2 and RDS Without Breaking Things — "Just use smaller instances" sounds simple until you find out that your m5.xlarge is actually CPU-bound during batch jobs every Tuesday night. Here's how to right-size safely using CloudWatch metrics, Compute Optimizer recommendations, and a staged rollout that won't page you at 2 AM.

Serverless Cost Optimization: When Lambda Gets Expensive — Lambda is cheap until it isn't. I break down the inflection points — when Lambda costs more than Fargate, when Fargate costs more than EC2, and the hybrid approach that keeps you in the sweet spot. Includes a calculator you can plug your own numbers into.
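
The inflection-point idea comes down to simple arithmetic: Lambda cost grows with request volume, while an always-on container or instance costs roughly the same every month. Here is a rough sketch of that comparison, not the calculator from the post. The function names are hypothetical, the free tier is ignored, and the prices in the usage example are illustrative placeholders rather than current AWS rates, so plug in the numbers from the pricing page for your region.

```python
def lambda_monthly_cost(requests, avg_duration_ms, memory_gb,
                        price_per_gb_second, price_per_million_requests):
    """Approximate monthly Lambda cost: compute (GB-seconds) plus request charges."""
    gb_seconds = requests * (avg_duration_ms / 1000.0) * memory_gb
    compute = gb_seconds * price_per_gb_second
    request_fees = (requests / 1_000_000) * price_per_million_requests
    return compute + request_fees


def break_even_requests(flat_monthly_cost, avg_duration_ms, memory_gb,
                        price_per_gb_second, price_per_million_requests):
    """Monthly request count at which Lambda costs the same as a flat-rate alternative."""
    cost_per_request = (
        (avg_duration_ms / 1000.0) * memory_gb * price_per_gb_second
        + price_per_million_requests / 1_000_000
    )
    return flat_monthly_cost / cost_per_request


if __name__ == "__main__":
    # Placeholder prices for illustration only; not real AWS rates.
    PRICE_GB_S = 0.00002
    PRICE_PER_M = 0.20
    n = break_even_requests(
        flat_monthly_cost=22.0,  # assumed monthly cost of an always-on alternative
        avg_duration_ms=100,
        memory_gb=1.0,
        price_per_gb_second=PRICE_GB_S,
        price_per_million_requests=PRICE_PER_M,
    )
    print(f"Lambda matches the flat-rate option at about {n:,.0f} requests/month")
```

Below the break-even volume Lambda is the cheaper option under these assumptions, and above it the flat-rate option wins. Average duration and memory size move that point a lot, so they are worth measuring rather than guessing.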

Data Transfer Costs: The Hidden AWS Tax — Data transfer is the most misunderstood line item on every AWS bill. Cross-AZ, cross-region, NAT Gateway, VPC endpoints, CloudFront vs direct S3 — I map out every path your bytes can take and what each one costs. This single post has saved readers more money than anything else I've written.

Spot Instances in Production: Yes, Really — Spot gets a bad reputation because people use it wrong. I run stateless workloads on Spot in production and have for two years. Here's the interruption handling, the capacity diversification, and the fallback strategy that makes it work. 60-70% savings on compute isn't a gimmick if you architect for it.

Building a Cost-Aware Engineering Culture — Tools don't fix cost problems. Culture does. This is the playbook for getting engineers to care about AWS spend without making them resent you: tagging standards, team-level dashboards, cost anomaly alerts, and the one metric that actually changes behavior.


What We're Building Next

I'm putting together a complete FinOps automation toolkit — Terraform modules that deploy cost monitoring, alerting, and automated right-sizing recommendations out of the box. Think of it as your AWS cost guardrails, deployed in 10 minutes. Early access coming soon.


Get These Guides in Your Inbox

Every week: one AWS failure broken down + the fix that worked. Real bills, real savings, no vendor pitches.



More from InfraTales:

  • Serverless Hub — Event-driven architectures that actually work in production.
  • Security Hub — Zero trust, IAM, and DevSecOps that actually ships.
