When a Canadian SaaS company came to us with an $80K/month Kubernetes bill that had doubled in eight months, we expected to find a few obvious waste levers. Instead, we found a systemic architecture problem — and fixing it required rebuilding the cluster's cost model from the ground up. Here's the complete playbook, from initial audit to a sustained $24K/month baseline.
The Audit: Where $80K Was Going
The first step was a full resource utilization audit using Prometheus metrics and Kubecost. The numbers were striking: average CPU utilization across all pods was 11%. Memory utilization was 18%. Teams had been copy-pasting resource requests from Stack Overflow, and the defaults were grossly over-provisioned. A typical API pod was requesting 4 vCPU and 8GB RAM but actually using 0.3 vCPU and 600MB at peak load.
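For readers who want to reproduce this kind of audit, a pair of Prometheus recording rules along these lines will surface request utilization per namespace. This is a minimal sketch rather than the exact queries we ran: the metric names assume stock cAdvisor and kube-state-metrics exporters, and the rule names are our own convention.

```yaml
# utilization-audit-rules.yaml: illustrative Prometheus recording rules.
# Metric names assume default cAdvisor and kube-state-metrics exporters; adjust
# the label filters (e.g. container!="POD") to match your scrape configuration.
groups:
  - name: cost-audit
    interval: 5m
    rules:
      # Fraction of requested CPU actually consumed, per namespace
      - record: namespace:cpu_request_utilization:ratio
        expr: |
          sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
          /
          sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})
      # Fraction of requested memory actually consumed, per namespace
      - record: namespace:memory_request_utilization:ratio
        expr: |
          sum by (namespace) (container_memory_working_set_bytes{container!=""})
          /
          sum by (namespace) (kube_pod_container_resource_requests{resource="memory"})
```

Kubecost layers cost allocation on top of the same data, but even these two ratios are enough to spot the 11%/18% problem described above.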
Dev and staging environments were running at production node specifications 24/7, including weekends. Three environments that developers hadn't touched in over two months were still running at full capacity. And because there was no namespace-level cost ownership, nobody was accountable for the waste. That combination is the organizational pattern we see most often behind runaway Kubernetes bills.
- Average CPU utilization was 11%, memory 18% across all pods
- Dev/staging ran at prod specs 24/7, including idle weekends
- No namespace-level cost ownership created zero accountability
- Teams copy-pasted resource requests with no measurement baseline
Right-Sizing Node Pools and Pods
We ran Vertical Pod Autoscaler (VPA) in recommendation mode for two weeks before applying any changes. VPA's recommendation reports showed the actual resource usage percentiles (p50, p95, p99) for every deployment. Armed with real data, we updated resource requests to p95 actual usage plus a 30% headroom buffer. This single change reduced reserved compute by 58% without touching a single application.
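For reference, recommendation-only mode is a one-line setting: updateMode: "Off" makes VPA observe and report without ever evicting a pod. A minimal sketch, where the deployment name and namespace are illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server               # illustrative; one VPA object per deployment
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server             # the workload being observed
  updatePolicy:
    updateMode: "Off"            # recommendation mode: observe and report, never evict
```

After the observation window, `kubectl describe vpa api-server -n production` prints the lower-bound, target, and upper-bound recommendations that informed the new requests.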
We then migrated from a single homogeneous m5.4xlarge node pool to three specialized pools: a CPU-optimized pool (c6i.2xlarge) for web servers and API pods, a memory-optimized pool (r6i.2xlarge) for in-memory caching and data processing, and a burst pool for batch jobs. Node affinity rules routed workloads to the right pool automatically. The result was better per-workload efficiency and a further 22% cost reduction.
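The routing itself is plain label-based node affinity. Here is a sketch of a deployment pinned to the CPU-optimized pool; the `node-pool` label key and its values are our own naming convention applied to each pool at creation time, and the image and resource numbers are placeholders rather than the client's actual values.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server                        # illustrative workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-pool        # custom label applied to every node in the pool
                    operator: In
                    values:
                      - cpu-optimized     # c6i pool for web servers and API pods
      containers:
        - name: api-server
          image: registry.example.com/api-server:latest   # placeholder image
          resources:
            requests:
              cpu: "500m"                 # p95 usage plus ~30% headroom, per the audit
              memory: "800Mi"
```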
Spot Instance Strategy
Stateless workloads — API servers, background workers, batch processors — moved to Spot instances with a multi-family diversification strategy. Rather than targeting a single instance type, we specified 8-10 instance families in the node pool configuration. This dramatically reduced interruption probability: when AWS needs to reclaim capacity in one family, the others remain available. Spot interruption handlers (graceful pod draining on SIGTERM) ensured zero dropped requests during reclamation events.
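With eksctl-managed node groups, the diversification can be expressed directly in the cluster config. This is a hedged sketch rather than the client's actual configuration: the cluster name, region, sizes, and exact family list are illustrative.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-cluster              # illustrative
  region: ca-central-1            # illustrative
managedNodeGroups:
  - name: spot-stateless
    spot: true                    # Spot capacity, allocated across the families below
    instanceTypes:                # diversify across families to lower interruption risk
      - c6i.2xlarge
      - c6a.2xlarge
      - c5.2xlarge
      - c5a.2xlarge
      - m6i.2xlarge
      - m6a.2xlarge
      - m5.2xlarge
      - m5a.2xlarge
    minSize: 3
    maxSize: 30
    labels:
      node-pool: spot-stateless
    taints:
      - key: spot
        value: "true"
        effect: NoSchedule        # only workloads that tolerate interruption land here
```

Pairing a pool like this with an interruption handler such as the AWS Node Termination Handler (or Karpenter's native interruption handling) is what gives pods the graceful SIGTERM drain window described above.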
We kept only stateful workloads (databases, queue brokers, services holding in-memory session state) on On-Demand capacity. For that On-Demand footprint, we covered 70% of baseline with 1-year Reserved Instance commitments and left the remaining 30% uncommitted as burst headroom. The combination of Spot, Reserved, and On-Demand pricing yielded a 67% effective compute cost reduction compared with the original all-On-Demand configuration.
The Results: Month by Month
Month 1: VPA right-sizing and dev/staging schedules took the bill from $80K to $51K. Month 2: node pool specialization and the Spot migration brought it down to $34K. Month 3: Reserved Instance commitments and final cleanup landed it at $24K. Total reduction: 70%, or $672,000 in annualized savings. More importantly, the cluster now has a cost governance process: monthly Kubecost reports go to team leads, namespace budgets are enforced, and dev environments shut down automatically on a schedule.
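The scheduled shutdown is the simplest piece to show. One common approach is a CronJob that scales every deployment in a dev namespace to zero outside working hours; the namespace, schedule, image, and ServiceAccount below are illustrative, and the ServiceAccount needs RBAC permission to scale deployments in that namespace.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dev-scale-down
  namespace: dev                      # illustrative dev namespace
spec:
  schedule: "0 23 * * 1-5"            # 23:00 on weekdays; dev stays down over the weekend
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: dev-scaler   # needs RBAC to patch deployments in this namespace
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest   # any image with kubectl on the PATH
              command:
                - /bin/sh
                - -c
                - kubectl scale deployment --all --replicas=0 -n dev
```

A companion job restores the environment in the morning; restoring each deployment to its original replica count takes a little extra bookkeeping, and recording the count in an annotation before scaling down is a common trick.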
- VPA right-sizing alone reduced reserved compute by 58%
- Multi-family Spot strategy achieved near-zero interruption impact
- 70% cost reduction in 90 days: $80K → $24K/month
- $672,000 in annualized savings with no performance degradation
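On the budget-enforcement side mentioned above, per-namespace ceilings map naturally onto ResourceQuota objects, which cap what a team can request regardless of what individual manifests ask for. The namespace and numbers here are illustrative, not the client's actual budgets.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-compute-budget        # illustrative
  namespace: team-payments         # illustrative team namespace
spec:
  hard:
    requests.cpu: "40"             # total CPU the namespace may request
    requests.memory: 80Gi
    limits.cpu: "60"
    limits.memory: 120Gi
```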
Conclusion
Kubernetes cost optimization is not magic — it's measurement, accountability, and discipline applied systematically. The technology exists (VPA, KEDA, Spot instances, Reserved capacity). The blocker is almost always organizational: no cost ownership, no visibility, no culture of measurement. Fix the culture first, then apply the tooling. The savings follow reliably.
James Whitfield
Lead DevOps Engineer