After managing Kubernetes clusters at FAANG-scale for 10+ years, I've learned one uncomfortable truth: most teams waste 40–60% of their cloud spend on Kubernetes alone. Not because of bad engineers — because the defaults are wrong and the tooling makes it easy to overprovision.
Here's exactly what I do on every engagement to cut that waste.
The Problem: Kubernetes Is Silently Burning Money
Before I show you the fixes, let me show you where the money goes:
| Waste Category | Typical % of Bill |
|---|---|
| Oversized nodes (CPU/memory requests >> actual usage) | 30–40% |
| Idle namespaces / forgotten workloads | 10–15% |
| On-demand instances where Spot would work | 15–20% |
| Over-allocated PersistentVolumes | 5–10% |
| Cross-AZ data transfer | 3–5% |
Add those up and 63–90% of your bill is potentially reducible. Let's go after each one.
1. Right-Size Your Resource Requests (Biggest Win)
The most expensive mistake in K8s is setting requests based on vibes, not data.
```yaml
# What most teams do (bad)
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1000m"
```
The problem: your pod actually uses 80Mi and 20m. You're paying for over 6x the memory and 25x the CPU you need.
Fix: Use VPA (Vertical Pod Autoscaler) in recommendation mode:
```shell
# Install VPA from the official autoscaler repo
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Create a VPA object in recommendation mode (no auto-apply yet)
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommendation only — don't auto-restart pods
EOF
```
After 24–48h of traffic, check recommendations:
```shell
kubectl describe vpa my-app-vpa
```
You'll see something like:
```
Recommendation:
  Container Recommendations:
    Container Name: my-app
    Lower Bound:  cpu: 12m,  memory: 64Mi
    Target:       cpu: 25m,  memory: 120Mi
    Upper Bound:  cpu: 100m, memory: 256Mi
```
Use Target as your new requests. I've seen this cut costs by 35% alone on teams that never benchmarked their pods.
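Applied to the example above, the updated resources block might look like the sketch below. The request numbers come straight from the Target line; the limits add roughly 2x headroom over it, which is my judgment call, not a VPA output:

```yaml
# Requests from the VPA Target; limits with ~2x headroom (judgment call, not VPA output)
resources:
  requests:
    memory: "120Mi"
    cpu: "25m"
  limits:
    memory: "256Mi"
    cpu: "100m"
```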
2. Cluster Autoscaler + Spot Instances
Running all On-Demand nodes is 3–4x more expensive than it needs to be.
Strategy: Mixed node groups
On-Demand: 20–30% of capacity (base load, critical workloads)
Spot: 70–80% of capacity (stateless apps, batch jobs)
AWS EKS node group config:
```hcl
# Terraform — EKS managed node group with mixed instances
resource "aws_eks_node_group" "spot_workers" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "spot-workers"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids

  capacity_type = "SPOT"
  instance_types = [
    "m5.xlarge",
    "m5a.xlarge",
    "m4.xlarge",
    "m5d.xlarge", # Multiple instance types = fewer interruptions
  ]

  scaling_config {
    desired_size = 3
    max_size     = 20
    min_size     = 1
  }

  labels = {
    "node-type" = "spot"
  }

  taint {
    key    = "spot"
    value  = "true"
    effect = "NO_SCHEDULE"
  }
}
```
Tolerate Spot in your deployments:
```yaml
spec:
  tolerations:
    - key: "spot"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: "node-type"
                operator: In
                values:
                  - "spot"
```
Spot interruption handling — add this to every stateless deployment:
```yaml
spec:
  terminationGracePeriodSeconds: 30  # Keep well under the 2-minute Spot interruption notice
  containers:
    - name: my-app
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 5"]
```
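Spot drains go through the eviction API, so a PodDisruptionBudget stops a wave of interruptions from taking down too many replicas at once. A minimal sketch, assuming your pods carry an `app: my-app` label (adjust the selector to your own labels):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1        # Always keep at least one replica up during node drains
  selector:
    matchLabels:
      app: my-app        # Assumed label; match your deployment's pod template
```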
3. Namespace Quotas — Stop Orphaned Workloads
Every team has that one dev who deployed a test workload 6 months ago and forgot about it.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-namespace-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "4"
    requests.memory: "8Gi"
    limits.cpu: "8"
    limits.memory: "16Gi"
    count/pods: "20"
    count/services: "10"
    persistentvolumeclaims: "5"
    requests.storage: "50Gi"
```
Also set LimitRanges so pods without explicit requests get sane defaults:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: dev
spec:
  limits:
    - default:
        cpu: "200m"
        memory: "256Mi"
      defaultRequest:
        cpu: "50m"
        memory: "64Mi"
      type: Container
```
4. Kubecost — See Where Every Dollar Goes
You can't optimize what you can't measure. Install Kubecost:
```shell
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="aGVsbUBrdWJlY29zdC5jb20=xm343yadf98"
```
Port-forward and open the dashboard:
```shell
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
```
What to look for:
- Cost by namespace — find the biggest spenders
- Efficiency score — anything below 50% is a right-sizing candidate
- Idle cost — nodes paying for nothing
- Network cost — cross-AZ traffic is expensive
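If you prefer raw numbers to the dashboard, Kubecost also serves an allocation API on the same port. A sketch using the port-forward from above; the `window` and `aggregate` parameters are the documented knobs, but check your Kubecost version's API docs for the exact response shape:

```shell
# Pull 7 days of cost, aggregated by namespace
curl -s "http://localhost:9090/model/allocation?window=7d&aggregate=namespace"
```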
5. HPA + KEDA for Auto-Scale Down
Don't pay for 10 replicas at 2 AM when traffic is zero.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1   # Allow scale-down to a single replica when traffic drops
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
```
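And the KEDA half: a cron scaler that maps known-quiet hours to a replica floor, complementing the metric-driven scaling. A sketch assuming KEDA is installed and your busy window is 08:00–22:00; the timezone and schedules are placeholders. Note that KEDA creates its own HPA for the target, so use a ScaledObject instead of (not alongside) a hand-written HPA on the same Deployment:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-cron
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: cron
      metadata:
        timezone: "America/New_York"  # Placeholder: use your business timezone
        start: "0 8 * * *"            # Scale up to desiredReplicas at 08:00
        end: "0 22 * * *"             # Drop back to minReplicaCount at 22:00
        desiredReplicas: "10"
```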
Real Numbers From a Recent Engagement
A fintech startup I worked with had this profile before:
- 12 On-Demand m5.2xlarge nodes running 24/7
- No VPA, requests set to round numbers
- Single namespace with no quotas
- Monthly bill: $4,200/mo on EKS compute alone
After 2 weeks of implementing the above:
- 3 On-Demand + 9 Spot nodes
- VPA recommendations applied to all 23 deployments
- Namespace quotas + LimitRanges deployed
- Monthly bill: $1,680/mo
Saving: $2,520/mo (60%) — and zero incidents from the changes.
Quick Wins Checklist
- [ ] Install VPA in recommendation mode, collect 48h of data
- [ ] Apply VPA target recommendations to requests/limits
- [ ] Move stateless workloads to Spot instances
- [ ] Add LimitRanges to every namespace
- [ ] Install Kubecost (free tier is enough to start)
- [ ] Enable HPA on all stateless deployments
- [ ] Schedule non-prod clusters to scale to 0 overnight
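The overnight scale-down can be as simple as a CronJob in the cluster. A sketch at the namespace level; it assumes a ServiceAccount named `scaler` with RBAC to scale deployments in `dev` (both are placeholders), and you'd pair it with a mirror-image morning scale-up job:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-scale-down
  namespace: dev
spec:
  schedule: "0 22 * * 1-5"  # 22:00 on weekdays
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler   # Assumed SA with permission to scale deployments
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command: ["kubectl", "scale", "deployment", "--all", "--replicas=0", "-n", "dev"]
```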
Start with VPA + Kubecost. Those two tools alone will show you exactly where your money is going and what to cut.
Have questions about your specific cluster setup? Drop a comment below — I read every one.