After managing Kubernetes clusters at FAANG-scale for 10+ years, I've learned one uncomfortable truth: most teams waste 40–60% of their cloud spend on Kubernetes alone. Not because of bad engineers — because the defaults are wrong and the tooling makes it easy to overprovision.
Here's exactly what I do on every engagement to cut that waste.
The Problem: Kubernetes Is Silently Burning Money
Before I show you the fixes, let me show you where the money goes:
| Waste Category | Typical % of Bill |
|---|---|
| Oversized nodes (CPU/memory requests >> actual usage) | 30–40% |
| Idle namespaces / forgotten workloads | 10–15% |
| On-demand instances where Spot would work | 15–20% |
| Over-allocated PersistentVolumes | 5–10% |
| Cross-AZ data transfer | 3–5% |
That's 60–90% of your bill potentially reducible. Let's go after each one.
1. Right-Size Your Resource Requests (Biggest Win)
The most expensive mistake in K8s is setting requests based on vibes, not data.
```yaml
# What most teams do (bad)
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1000m"
```
The problem: your pod actually uses 80Mi and 20m. You're paying for roughly 6x the memory and 25x the CPU you need.
Fix: Use VPA (Vertical Pod Autoscaler) in recommendation mode:
```shell
# Install VPA (official method: clone the autoscaler repo and run the install script)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Create a VPA object in recommendation mode (no auto-apply yet)
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # Recommendation only, no automatic pod restarts
EOF
```
After 24–48h of traffic, check recommendations:
```shell
kubectl describe vpa my-app-vpa
```
You'll see something like:
```
Recommendation:
  Container Recommendations:
    Container Name:  my-app
    Lower Bound:   cpu: 12m,  memory: 64Mi
    Target:        cpu: 25m,  memory: 120Mi
    Upper Bound:   cpu: 100m, memory: 256Mi
```
Use Target as your new requests. I've seen this cut costs by 35% alone on teams that never benchmarked their pods.
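Applied to the sample output above, the deployment's resource block shrinks to something like this (a sketch; the exact values come from your own VPA output, and rounding a little above Target leaves headroom):

```yaml
resources:
  requests:
    memory: "128Mi"   # VPA Target was 120Mi; rounded up for headroom
    cpu: "25m"
  limits:
    memory: "256Mi"   # The VPA Upper Bound makes a reasonable limit
    cpu: "100m"
```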
2. Cluster Autoscaler + Spot Instances
Running all On-Demand nodes is 3–4x more expensive than it needs to be.
Strategy: mixed node groups

- On-Demand: 20–30% of capacity (base load, critical workloads)
- Spot: 70–80% of capacity (stateless apps, batch jobs)
AWS EKS node group config:
```hcl
# Terraform: EKS managed node group backed by Spot capacity
resource "aws_eks_node_group" "spot_workers" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "spot-workers"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids

  capacity_type = "SPOT"
  instance_types = [
    "m5.xlarge",
    "m5a.xlarge",
    "m4.xlarge",
    "m5d.xlarge", # Multiple instance types = fewer interruptions
  ]

  scaling_config {
    desired_size = 3
    max_size     = 20
    min_size     = 1
  }

  labels = {
    "node-type" = "spot"
  }

  taint {
    key    = "spot"
    value  = "true"
    effect = "NO_SCHEDULE"
  }
}
```
Tolerate Spot in your deployments:
```yaml
spec:
  tolerations:
    - key: "spot"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: "node-type"
                operator: In
                values:
                  - "spot"
```
Spot interruption handling — add this to every stateless deployment:
```yaml
spec:
  terminationGracePeriodSeconds: 30  # Spot gives a 2-minute warning; keep shutdown well under 120s
  containers:
    - name: my-app
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 5"]
```
3. Namespace Quotas — Stop Orphaned Workloads
Every team has that one dev who deployed a test workload 6 months ago and forgot about it.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-namespace-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "4"
    requests.memory: "8Gi"
    limits.cpu: "8"
    limits.memory: "16Gi"
    count/pods: "20"
    count/services: "10"
    persistentvolumeclaims: "5"
    requests.storage: "50Gi"
```
Also set LimitRanges so pods without explicit requests get sane defaults:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: dev
spec:
  limits:
    - default:
        cpu: "200m"
        memory: "256Mi"
      defaultRequest:
        cpu: "50m"
        memory: "64Mi"
      type: Container
```
4. Kubecost — See Where Every Dollar Goes
You can't optimize what you can't measure. Install Kubecost:
```shell
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="aGVsbUBrdWJlY29zdC5jb20=xm343yadf98"
```
Port-forward and open the dashboard:
```shell
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
```
What to look for:
- Cost by namespace — find the biggest spenders
- Efficiency score — anything below 50% is a right-sizing candidate
- Idle cost — nodes paying for nothing
- Network cost — cross-AZ traffic is expensive
5. HPA + KEDA for Auto-Scale Down
Don't pay for 10 replicas at 2 AM when traffic is zero.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1   # Scale down to 1 at night
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5min before scaling down
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
```
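The section title mentions KEDA, which covers the case HPA can't: scaling a workload all the way to zero on a schedule. A minimal sketch using KEDA's cron scaler (assumes KEDA is already installed in the cluster; the deployment name, timezone, and hours are placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-business-hours
spec:
  scaleTargetRef:
    name: my-app           # Deployment to manage (placeholder)
  minReplicaCount: 0       # Allowed to drop to zero outside the window
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York
        start: 0 8 * * *   # Scale up at 08:00
        end: 0 20 * * *    # Scale back down after 20:00
        desiredReplicas: "3"
```

This pattern is a natural fit for non-prod environments: dev and staging replicas run during working hours and cost nothing overnight.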
Real Numbers From a Recent Engagement
A fintech startup I worked with had this profile before:
- 12 On-Demand m5.2xlarge nodes running 24/7
- No VPA; requests set to round numbers
- Single namespace with no quotas
- Monthly bill: $4,200 on EKS compute alone
After 2 weeks of implementing the above:
- 3 On-Demand + 9 Spot nodes
- VPA recommendations applied to all 23 deployments
- Namespace quotas + LimitRanges deployed
- Monthly bill: $1,680
Saving: $2,520/mo (60%) — and zero incidents from the changes.
6. PersistentVolume Right-Sizing
Storage is the forgotten cost. PVs are routinely over-provisioned by 5–10x because engineers pick round numbers and never revisit them.
Check actual disk usage inside pods:
```shell
# Find all PVCs and their claimed capacity
kubectl get pvc --all-namespaces -o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
CAPACITY:.spec.resources.requests.storage,\
STATUS:.status.phase

# Check actual usage inside a specific pod
kubectl exec -n production my-pod -- df -h /data
```
If your 100Gi PVC is only 12Gi full, reclaim the difference. Note that Kubernetes can expand a PVC in place but never shrink one, so downsizing means creating a smaller PVC and migrating the data. Move to the gp3 storage class while you're at it:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
    - ReadWriteOnce        # Required field, missing from many copy-pasted examples
  storageClassName: gp3    # Was gp2; gp3 is ~20% cheaper
  resources:
    requests:
      storage: 20Gi        # Was 100Gi
```
gp3 vs gp2 on AWS: switch all EBS volumes from gp2 to gp3. gp3 includes a 3,000 IOPS / 125 MBps baseline at a price roughly 20% lower than gp2. You can provision additional IOPS and throughput for an extra fee, which only matters at high load, but the base price drop applies immediately.
```shell
# Migrate all gp2 volumes to gp3 in one shot
# (tr splits the tab-separated ID list so xargs gets one volume per line)
aws ec2 describe-volumes \
  --filters "Name=volume-type,Values=gp2" \
  --query 'Volumes[].VolumeId' \
  --output text | tr '\t' '\n' | \
  xargs -I {} aws ec2 modify-volume --volume-id {} --volume-type gp3
```
This is a zero-downtime operation — AWS modifies the volumes live.
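To make gp3 the default for volumes created from now on, add a StorageClass. A sketch, assuming the EBS CSI driver is installed (the class name and default annotation are choices, not requirements):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true   # Lets you grow PVCs later without migration
```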
7. Savings Plans vs On-Demand for Your Baseline
For the on-demand portion of your cluster (the 20–30% that can't run on Spot), Savings Plans cut costs by 30–40%.
AWS Compute Savings Plans (preferred over Reserved Instances):
- 1-year term: ~20–25% discount vs on-demand
- 3-year term: ~35–45% discount
- Flexible: applies to any EC2 instance family, any region, any OS
My rule: Cover only your guaranteed minimum — the nodes you'd never scale below even at 3 AM. Everything above baseline runs Spot.
If you always have at least 3 m5.xlarge nodes, a 1-year Savings Plan on that capacity saves on the order of $1,000/year at a typical ~20% discount, at essentially no risk. The Spot tier handles burst. The Savings Plan handles the base.
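The baseline math is easy to sanity-check. A quick sketch, assuming us-east-1 on-demand pricing of $0.192/hr for m5.xlarge and a ~20% one-year discount (plug in your own region's prices):

```shell
# Yearly on-demand cost of the always-on baseline, and what a ~20%
# Savings Plan discount shaves off it.
nodes=3; hourly=0.192; discount=0.20
awk -v n="$nodes" -v h="$hourly" -v d="$discount" \
  'BEGIN { printf "baseline: $%.0f/yr, saved: $%.0f/yr\n", n*h*8760, n*h*8760*d }'
# → baseline: $5046/yr, saved: $1009/yr
```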
Check your current Savings Plan coverage:
```shell
aws ce get-savings-plans-coverage \
  --time-period Start=2024-01-01,End=2024-12-31 \
  --granularity MONTHLY \
  --query 'SavingsPlansCoverages[].Total'
```
8. Cross-AZ Traffic — The $3,000/Month Hidden Tax
AWS charges $0.01/GB for traffic crossing Availability Zones. For a cluster handling 10 TB/day of internal service-to-service traffic, that's $3,000/mo in transfer costs.
Diagnose with Kubecost: Look at the Network section — it shows inter-zone traffic costs broken down by service pair.
Fix: Topology Aware Routing (the `topology-mode` annotation requires Kubernetes 1.27+; on 1.23–1.26, use the older `service.kubernetes.io/topology-aware-hints` annotation instead)
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```
This tells kube-proxy to prefer endpoints in the same AZ as the caller. Note it only activates when there are enough endpoints spread across zones; with too few pods per zone, Kubernetes falls back to normal cluster-wide routing.
Reduce cross-AZ pod scheduling for chatty services:
```yaml
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - payment-service   # Co-locate with the service we call most
            topologyKey: topology.kubernetes.io/zone
```
If Service A calls Service B 10,000 times a second, co-locating them in the same AZ eliminates that transfer cost entirely.
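The numbers in this section are easy to verify with a one-liner (this assumes the $0.01/GB one-direction rate quoted above; AWS bills the other direction too, so double it for the full picture):

```shell
# Monthly cross-AZ cost for 10 TB/day (10,000 GB) at $0.01/GB
gb_per_day=10000
awk -v gb="$gb_per_day" 'BEGIN { printf "$%.0f/mo\n", gb * 30 * 0.01 }'
# → $3000/mo
```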
Quick Wins Checklist
- [ ] Install VPA in recommendation mode, collect 48h of data
- [ ] Apply VPA target recommendations to requests/limits
- [ ] Move stateless workloads to Spot instances
- [ ] Add LimitRanges to every namespace
- [ ] Install Kubecost (free tier is enough to start)
- [ ] Enable HPA on all stateless deployments
- [ ] Schedule non-prod clusters to scale to 0 overnight
Start with VPA + Kubecost. Those two tools alone will show you exactly where your money is going and what to cut.
Have questions about your specific cluster setup? Drop a comment below — I read every one.