
Kubernetes Cost Optimization: How I Cut Cloud Bills by 60% Without Sacrificing Uptime

Practical techniques I've used across dozens of production clusters to slash Kubernetes costs — from right-sizing nodes to Spot instances and namespace-level budgets.

March 10, 2026 · 8 min read
#kubernetes #cost-optimization #aws #eks #cloud

After managing Kubernetes clusters at FAANG-scale for 10+ years, I've learned one uncomfortable truth: most teams waste 40–60% of their cloud spend on Kubernetes alone. Not because of bad engineers — because the defaults are wrong and the tooling makes it easy to overprovision.

Here's exactly what I do on every engagement to cut that waste.


The Problem: Kubernetes Is Silently Burning Money

Before I show you the fixes, let me show you where the money goes:

| Waste Category | Typical % of Bill |
|---|---|
| Oversized nodes (CPU/memory requests >> actual usage) | 30–40% |
| Idle namespaces / forgotten workloads | 10–15% |
| On-demand instances where Spot would work | 15–20% |
| Over-allocated PersistentVolumes | 5–10% |
| Cross-AZ data transfer | 3–5% |

That's 60–90% of your bill potentially reducible. Let's go after each one.
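To make that concrete, here's the back-of-the-envelope math (a sketch; the percentages come from the table above, and the $10,000 bill is an arbitrary example):

```python
# Back-of-the-envelope: potentially reducible spend per waste category.
# The (low, high) fractions are the ranges from the table above.
WASTE_CATEGORIES = {
    "Oversized nodes":    (0.30, 0.40),
    "Idle namespaces":    (0.10, 0.15),
    "On-demand vs Spot":  (0.15, 0.20),
    "Over-allocated PVs": (0.05, 0.10),
    "Cross-AZ transfer":  (0.03, 0.05),
}

def reducible_range(monthly_bill: float) -> tuple[float, float]:
    """Return the (low, high) potentially reducible spend for a bill."""
    low = sum(lo for lo, _ in WASTE_CATEGORIES.values()) * monthly_bill
    high = sum(hi for _, hi in WASTE_CATEGORIES.values()) * monthly_bill
    return low, high

low, high = reducible_range(10_000)
print(f"Potentially reducible: ${low:,.0f}-${high:,.0f}/mo")
# Potentially reducible: $6,300-$9,000/mo
```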


1. Right-Size Your Resource Requests (Biggest Win)

The most expensive mistake in K8s is setting requests based on vibes, not data.

# What most teams do (bad)
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1000m"

The problem: your pod actually uses 80Mi and 20m. You're paying for roughly 6x the memory and 25x the CPU you need.

Fix: Use VPA (Vertical Pod Autoscaler) in recommendation mode:

# Install VPA from the kubernetes/autoscaler repo
# (there's no single manifest URL; the documented path is the vpa-up.sh script)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Create a VPA object in Recommendation mode (no auto-apply yet)
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # Recommendation only — don't auto-restart pods
EOF

After 24–48h of traffic, check recommendations:

kubectl describe vpa my-app-vpa

You'll see something like:

Recommendation:
  Container Recommendations:
    Container Name:  my-app
    Lower Bound:
      Cpu:     12m
      Memory:  64Mi
    Target:
      Cpu:     25m
      Memory:  120Mi
    Upper Bound:
      Cpu:     100m
      Memory:  256Mi

Use Target as your new requests. I've seen this cut costs by 35% alone on teams that never benchmarked their pods.
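If you want to pull the Target values out programmatically (say, to feed a script that patches your manifests), you can parse kubectl get vpa my-app-vpa -o json. A sketch; the sample status below is hypothetical, shaped like the autoscaling.k8s.io/v1 API:

```python
import json

# Hypothetical, trimmed `kubectl get vpa my-app-vpa -o json` output,
# shaped like the autoscaling.k8s.io/v1 status.recommendation field.
# The memory value is in bytes (125829120 bytes == 120Mi).
vpa_json = """
{
  "status": {
    "recommendation": {
      "containerRecommendations": [
        {"containerName": "my-app",
         "target": {"cpu": "25m", "memory": "125829120"}}
      ]
    }
  }
}
"""

def vpa_targets(raw: str) -> dict[str, dict[str, str]]:
    """Map container name -> recommended 'target' resource requests."""
    recs = json.loads(raw)["status"]["recommendation"]["containerRecommendations"]
    return {r["containerName"]: r["target"] for r in recs}

print(vpa_targets(vpa_json))
# {'my-app': {'cpu': '25m', 'memory': '125829120'}}
```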


2. Cluster Autoscaler + Spot Instances

Running all On-Demand nodes is 3–4x more expensive than it needs to be.

Strategy: Mixed node groups

On-Demand: 20–30% of capacity (base load, critical workloads)
Spot:       70–80% of capacity (stateless apps, batch jobs)
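The savings from that split are easy to estimate. A sketch, assuming a ~65% average Spot discount (typical for the m5 family, but it varies by instance type and AZ) and the us-east-1 m5.xlarge on-demand rate of $0.192/hr:

```python
def blended_cost(on_demand_hourly: float, spot_discount: float,
                 od_share: float, total_nodes: int) -> float:
    """Monthly cost of a mixed on-demand/Spot node fleet."""
    spot_hourly = on_demand_hourly * (1 - spot_discount)
    od_nodes = total_nodes * od_share
    spot_nodes = total_nodes - od_nodes
    hourly = od_nodes * on_demand_hourly + spot_nodes * spot_hourly
    return hourly * 730  # ~hours per month

# 12 m5.xlarge nodes at $0.192/hr, assuming a 65% Spot discount
all_od = blended_cost(0.192, 0.65, od_share=1.0, total_nodes=12)
mixed  = blended_cost(0.192, 0.65, od_share=0.25, total_nodes=12)
print(f"All on-demand: ${all_od:,.0f}/mo, 25/75 mix: ${mixed:,.0f}/mo")
# All on-demand: $1,682/mo, 25/75 mix: $862/mo
```

Roughly half the compute bill gone, before any right-sizing.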

AWS EKS node group config:

# Terraform — EKS managed node group with mixed instances
resource "aws_eks_node_group" "spot_workers" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "spot-workers"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids

  capacity_type = "SPOT"

  instance_types = [
    "m5.xlarge",
    "m5a.xlarge",
    "m4.xlarge",
    "m5d.xlarge",   # Multiple instance types = fewer interruptions
  ]

  scaling_config {
    desired_size = 3
    max_size     = 20
    min_size     = 1
  }

  labels = {
    "node-type" = "spot"
  }

  taint {
    key    = "spot"
    value  = "true"
    effect = "NO_SCHEDULE"
  }
}

Tolerate Spot in your deployments:

spec:
  tolerations:
  - key: "spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: "node-type"
            operator: In
            values:
            - "spot"

Spot interruption handling — add this to every stateless deployment:

spec:
  terminationGracePeriodSeconds: 30   # Shutdown must complete within the 2-minute Spot interruption notice; 30s leaves plenty of headroom
  containers:
  - name: my-app
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 5"]

3. Namespace Quotas — Stop Orphaned Workloads

Every team has that one dev who deployed a test workload 6 months ago and forgot about it.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-namespace-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "4"
    requests.memory: "8Gi"
    limits.cpu: "8"
    limits.memory: "16Gi"
    count/pods: "20"
    count/services: "10"
    persistentvolumeclaims: "5"
    requests.storage: "50Gi"

Also set LimitRanges so pods without explicit requests get sane defaults:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: dev
spec:
  limits:
  - default:
      cpu: "200m"
      memory: "256Mi"
    defaultRequest:
      cpu: "50m"
      memory: "64Mi"
    type: Container
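To see which pods are currently relying on those defaults (i.e. shipped with no requests at all), you can scan kubectl get pods -A -o json. A sketch over a hypothetical, trimmed pod list:

```python
import json

# Hypothetical, trimmed `kubectl get pods -A -o json` output.
pods_json = """
{"items": [
  {"metadata": {"namespace": "dev", "name": "test-app"},
   "spec": {"containers": [{"name": "app", "resources": {}}]}},
  {"metadata": {"namespace": "prod", "name": "api"},
   "spec": {"containers": [{"name": "api",
     "resources": {"requests": {"cpu": "50m", "memory": "64Mi"}}}]}}
]}
"""

def pods_without_requests(raw: str) -> list[str]:
    """Return 'namespace/name' for pods with any container missing requests."""
    hits = []
    for pod in json.loads(raw)["items"]:
        meta = pod["metadata"]
        for c in pod["spec"]["containers"]:
            if not c.get("resources", {}).get("requests"):
                hits.append(f"{meta['namespace']}/{meta['name']}")
                break
    return hits

print(pods_without_requests(pods_json))  # ['dev/test-app']
```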

4. Kubecost — See Where Every Dollar Goes

You can't optimize what you can't measure. Install Kubecost:

helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="aGVsbUBrdWJlY29zdC5jb20=xm343yadf98"

Port-forward and open the dashboard:

kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090

What to look for:

  • Cost by namespace — find the biggest spenders
  • Efficiency score — anything below 50% is a right-sizing candidate
  • Idle cost — nodes paying for nothing
  • Network cost — cross-AZ traffic is expensive

5. HPA + KEDA for Auto-Scale Down

Don't pay for 10 replicas at 2 AM when traffic is zero.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1      # Scale down to 1 at night
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # Wait 5min before scaling down
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
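The HPA handles load-based scaling, but it can't go below 1 replica. For the "2 AM" case, KEDA's cron scaler can drive non-prod workloads all the way to zero on a schedule. A sketch, assuming KEDA is installed and a Deployment named my-app (keep only one of KEDA or a standalone HPA per Deployment, since KEDA manages its own HPA under the hood):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-schedule
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 0          # Zero replicas outside the window
  triggers:
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 8 * * *        # Scale up at 08:00
      end: 0 22 * * *         # Scale to zero after 22:00
      desiredReplicas: "3"
```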

Real Numbers From a Recent Engagement

A fintech startup I worked with had this profile before:

  • 12 On-Demand m5.2xlarge nodes running 24/7
  • No VPA, requests set to round numbers
  • Single namespace with no quotas
  • Monthly bill: $4,200/mo on EKS compute alone

After 2 weeks of implementing the above:

  • 3 On-Demand + 9 Spot nodes
  • VPA recommendations applied to all 23 deployments
  • Namespace quotas + LimitRanges deployed
  • Monthly bill: $1,680/mo

Saving: $2,520/mo (60%) — and zero incidents from the changes.


6. PersistentVolume Right-Sizing

Storage is the forgotten cost. PVs are routinely over-provisioned by 5–10x because engineers pick round numbers and never revisit them.

Check actual disk usage inside pods:

# Find all PVCs and their claimed capacity
kubectl get pvc --all-namespaces -o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
CAPACITY:.spec.resources.requests.storage,\
STATUS:.status.phase

# Check actual usage inside a specific pod
kubectl exec -n production my-pod -- df -h /data

If your 100Gi PVC is only 12Gi full, downsize it. Note that Kubernetes can expand PVCs but never shrink them in place, so downsizing means provisioning a smaller PVC and migrating the data (snapshot/restore or a copy job). Migrate to the gp3 storage class while you're there:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  storageClassName: gp3    # Was gp2 — gp3 is 20% cheaper with same baseline IOPS
  resources:
    requests:
      storage: 20Gi        # Was 100Gi

gp3 vs gp2 on AWS: Switch all EBS volumes from gp2 to gp3. Same performance baseline (3,000 IOPS and 125 MB/s included), roughly 20% cheaper per GB. You can also provision IOPS and throughput independently of volume size; you pay for anything above the baseline, which only matters at high throughput, but the per-GB price drop applies to every volume immediately.
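The per-volume math (a sketch, using us-east-1 list prices of $0.10/GB-month for gp2 and $0.08/GB-month for gp3; check your region's rates):

```python
GP2_PER_GB = 0.10   # $/GB-month, us-east-1 list price
GP3_PER_GB = 0.08   # $/GB-month, ~20% cheaper

def monthly_storage_cost(gb: int, per_gb: float) -> float:
    """Monthly EBS cost for a volume of `gb` gigabytes."""
    return gb * per_gb

before = monthly_storage_cost(100, GP2_PER_GB)  # oversized 100Gi gp2 PVC
after  = monthly_storage_cost(20, GP3_PER_GB)   # right-sized 20Gi gp3 PVC
print(f"${before:.2f}/mo -> ${after:.2f}/mo per volume")
# $10.00/mo -> $1.60/mo per volume
```

Pennies per volume, but it compounds fast across hundreds of PVCs.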

# Migrate all gp2 volumes to gp3 in one shot
aws ec2 describe-volumes \
  --filters "Name=volume-type,Values=gp2" \
  --query 'Volumes[].VolumeId' \
  --output text | \
  xargs -I {} aws ec2 modify-volume --volume-id {} --volume-type gp3

This is a zero-downtime operation — AWS modifies the volumes live.


7. Savings Plans vs On-Demand for Your Baseline

For the on-demand portion of your cluster (the 20–30% that can't run on Spot), Savings Plans cut costs by 30–40%.

AWS Compute Savings Plans (preferred over Reserved Instances):

  • 1-year term: ~20–25% discount vs on-demand
  • 3-year term: ~35–45% discount
  • Flexible: applies to any EC2 instance family, any region, any OS

My rule: Cover only your guaranteed minimum — the nodes you'd never scale below even at 3 AM. Everything above baseline runs Spot.

If you always have at least 3 m5.xlarge nodes (~$0.192/hr each on-demand in us-east-1, roughly $5,000/year for the trio), a 1-year Savings Plan at a ~20% discount saves about $1,000/year at essentially no risk. The Spot tier handles burst. The Savings Plan handles the base.
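You can sanity-check that baseline-coverage math yourself. A sketch; $0.192/hr is the us-east-1 on-demand list price for m5.xlarge, and 20% is the low end of the 1-year discount range above:

```python
HOURS_PER_YEAR = 8760

def savings_plan_annual_savings(nodes: int, on_demand_hourly: float,
                                discount: float) -> float:
    """Annual $ saved by covering `nodes` always-on instances with a plan."""
    return nodes * on_demand_hourly * HOURS_PER_YEAR * discount

# 3 always-on m5.xlarge nodes, 1-year plan at ~20% discount
saved = savings_plan_annual_savings(3, 0.192, 0.20)
print(f"~${saved:,.0f}/year saved on the baseline")
# ~$1,009/year saved on the baseline
```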

Check your current Savings Plan coverage:

aws ce get-savings-plans-coverage \
  --time-period Start=2024-01-01,End=2024-12-31 \
  --granularity MONTHLY \
  --query 'SavingsPlansCoverages[].Coverage'

8. Cross-AZ Traffic — The $3,000/Month Hidden Tax

AWS charges $0.01/GB for traffic crossing Availability Zones. For a cluster handling 10 TB/day of internal service-to-service traffic, that's $3,000/mo in transfer costs.
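The arithmetic behind that number (a sketch; note AWS bills inter-AZ traffic at $0.01/GB in each direction, so chatty bidirectional traffic can effectively cost double this):

```python
CROSS_AZ_PER_GB = 0.01   # $/GB, each direction

def monthly_cross_az_cost(tb_per_day: float) -> float:
    """Monthly cross-AZ transfer cost for a given daily volume."""
    gb_per_month = tb_per_day * 1000 * 30
    return gb_per_month * CROSS_AZ_PER_GB

print(f"${monthly_cross_az_cost(10):,.0f}/mo")  # $3,000/mo
```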

Diagnose with Kubecost: Look at the Network section — it shows inter-zone traffic costs broken down by service pair.

Fix: Topology Aware Routing (the topology-mode annotation shown below requires Kubernetes 1.27+; on 1.23–1.26, use the older service.kubernetes.io/topology-aware-hints: auto annotation instead)

apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080

This tells kube-proxy to prefer endpoints in the same AZ as the caller. One caveat: the hints only activate when endpoints are spread evenly enough across zones; with too few pods per zone, Kubernetes falls back to cluster-wide routing.

Reduce cross-AZ pod scheduling for chatty services:

spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - payment-service    # Co-locate with the service we call most
          topologyKey: topology.kubernetes.io/zone

If Service A calls Service B 10,000 times a second, co-locating them in the same AZ eliminates that transfer cost entirely.


Quick Wins Checklist

  • [ ] Install VPA in recommendation mode, collect 48h of data
  • [ ] Apply VPA target recommendations to requests/limits
  • [ ] Move stateless workloads to Spot instances
  • [ ] Add LimitRanges to every namespace
  • [ ] Install Kubecost (free tier is enough to start)
  • [ ] Enable HPA on all stateless deployments
  • [ ] Schedule non-prod clusters to scale to 0 overnight

Start with VPA + Kubecost. Those two tools alone will show you exactly where your money is going and what to cut.


Have questions about your specific cluster setup? Drop a comment below — I read every one.

DevToCashAuthor

Senior DevOps/SRE Engineer · 10+ years · Professional Trader (IDX, Crypto, US Equities)

I write about real infrastructure patterns and trading strategies I use in production and in live markets. No courses, no affiliate hype — just documentation of what actually works.
