
Kubernetes Cost Optimization: How I Cut Cloud Bills by 60% Without Sacrificing Uptime

Practical techniques I've used across dozens of production clusters to slash Kubernetes costs — from right-sizing nodes to Spot instances and namespace-level budgets.

March 10, 2026 · 5 min read

#kubernetes #cost-optimization #aws #eks #cloud

After managing Kubernetes clusters at FAANG-scale for 10+ years, I've learned one uncomfortable truth: most teams waste 40–60% of their cloud spend on Kubernetes alone. Not because of bad engineers — because the defaults are wrong and the tooling makes it easy to overprovision.

Here's exactly what I do on every engagement to cut that waste.


The Problem: Kubernetes Is Silently Burning Money

Before I show you the fixes, let me show you where the money goes:

| Waste Category | Typical % of Bill |
|---|---|
| Oversized nodes (CPU/memory requests >> actual usage) | 30–40% |
| Idle namespaces / forgotten workloads | 10–15% |
| On-demand instances where Spot would work | 15–20% |
| Over-allocated PersistentVolumes | 5–10% |
| Cross-AZ data transfer | 3–5% |

That's 60–90% of your bill potentially reducible. Let's go after each one.
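Summing the table's ranges is a quick sanity check (the low ends actually add to 63%, a touch above the rounded-down 60% quoted above):

```python
# Low/high bounds from the waste table, as (low %, high %) of the bill
waste = {
    "oversized nodes": (30, 40),
    "idle namespaces": (10, 15),
    "on-demand where Spot would work": (15, 20),
    "over-allocated PersistentVolumes": (5, 10),
    "cross-AZ data transfer": (3, 5),
}
low = sum(lo for lo, _ in waste.values())
high = sum(hi for _, hi in waste.values())
print(f"{low}-{high}% of the bill is potentially reducible")  # 63-90%
```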


1. Right-Size Your Resource Requests (Biggest Win)

The most expensive mistake in K8s is setting requests based on vibes, not data.

# What most teams do (bad)
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1000m"

The problem: your pod actually uses 80Mi and 20m. You're paying for roughly 6x the memory and 25x the CPU you need.

Fix: Use VPA (Vertical Pod Autoscaler) in recommendation mode:

# Install VPA (clone the official autoscaler repo and run the install script)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Create a VPA object in Recommendation mode (no auto-apply yet)
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # Recommendation only — don't auto-restart pods
EOF

After 24–48h of traffic, check recommendations:

kubectl describe vpa my-app-vpa

You'll see something like:

Recommendation:
  Container Recommendations:
    Container Name: my-app
    Lower Bound:    cpu: 12m, memory: 64Mi
    Target:         cpu: 25m, memory: 120Mi
    Upper Bound:    cpu: 100m, memory: 256Mi

Use Target as your new requests. I've seen this cut costs by 35% alone on teams that never benchmarked their pods.
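Applied to the earlier manifest, those recommendations might translate into something like this (a sketch based on the sample VPA output above — use your own cluster's numbers, and keep the Upper Bound as headroom on limits):

```yaml
resources:
  requests:
    memory: "120Mi"   # VPA Target
    cpu: "25m"
  limits:
    memory: "256Mi"   # VPA Upper Bound as a ceiling
    cpu: "100m"
```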


2. Cluster Autoscaler + Spot Instances

Running all On-Demand nodes is 3–4x more expensive than it needs to be.

Strategy: Mixed node groups

On-Demand: 20–30% of capacity (base load, critical workloads)
Spot:       70–80% of capacity (stateless apps, batch jobs)

AWS EKS node group config:

# Terraform — EKS managed node group with mixed instances
resource "aws_eks_node_group" "spot_workers" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "spot-workers"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids

  capacity_type = "SPOT"

  instance_types = [
    "m5.xlarge",
    "m5a.xlarge",
    "m4.xlarge",
    "m5d.xlarge",   # Multiple instance types = fewer interruptions
  ]

  scaling_config {
    desired_size = 3
    max_size     = 20
    min_size     = 1
  }

  labels = {
    "node-type" = "spot"
  }

  taint {
    key    = "spot"
    value  = "true"
    effect = "NO_SCHEDULE"
  }
}

Tolerate Spot in your deployments:

spec:
  tolerations:
  - key: "spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: "node-type"
            operator: In
            values:
            - "spot"

Spot interruption handling — add this to every stateless deployment:

spec:
  terminationGracePeriodSeconds: 30   # Keep this well under 120s — Spot gives only a 2-minute warning
  containers:
  - name: my-app
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 5"]

3. Namespace Quotas — Stop Orphaned Workloads

Every team has that one dev who deployed a test workload 6 months ago and forgot about it.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-namespace-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "4"
    requests.memory: "8Gi"
    limits.cpu: "8"
    limits.memory: "16Gi"
    count/pods: "20"
    count/services: "10"
    persistentvolumeclaims: "5"
    requests.storage: "50Gi"

Also set LimitRanges so pods without explicit requests get sane defaults:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: dev
spec:
  limits:
  - default:
      cpu: "200m"
      memory: "256Mi"
    defaultRequest:
      cpu: "50m"
      memory: "64Mi"
    type: Container

4. Kubecost — See Where Every Dollar Goes

You can't optimize what you can't measure. Install Kubecost:

helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="aGVsbUBrdWJlY29zdC5jb20=xm343yadf98"

Port-forward and open the dashboard:

kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090

What to look for:

  • Cost by namespace — find the biggest spenders
  • Efficiency score — anything below 50% is a right-sizing candidate
  • Idle cost — nodes paying for nothing
  • Network cost — cross-AZ traffic is expensive
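As a rough sketch of what that efficiency score means — it's essentially usage divided by requests, though this is the general idea, not Kubecost's exact formula:

```python
def efficiency(cpu_used, cpu_requested, mem_used, mem_requested):
    """Rough pod efficiency: average of CPU and memory utilization vs requests."""
    return round(100 * (cpu_used / cpu_requested + mem_used / mem_requested) / 2)

# The over-provisioned pod from section 1: using 20m of 500m CPU, 80Mi of 512Mi memory
score = efficiency(20, 500, 80, 512)
print(f"efficiency: {score}%")  # far below the 50% right-sizing threshold
```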

5. HPA + KEDA for Auto-Scale Down

Don't pay for 10 replicas at 2 AM when traffic is zero.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1      # Low nighttime traffic lets HPA drop to 1
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # Wait 5min before scaling down
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
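HPA can't take a standard Deployment below 1 replica; that's where KEDA comes in. A sketch using KEDA's cron scaler to run zero replicas outside business hours (assumes KEDA is installed in the cluster; adjust the timezone and schedule to your traffic pattern, and don't attach a separate HPA to the same Deployment — KEDA manages its own):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 0          # Scale to zero outside the cron window
  maxReplicaCount: 50
  triggers:
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 8 * * *        # Scale up at 08:00
      end: 0 20 * * *         # Scale back down after 20:00
      desiredReplicas: "3"
```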

Real Numbers From a Recent Engagement

A fintech startup I worked with had this profile before:

  • 12 On-Demand m5.2xlarge nodes running 24/7
  • No VPA, requests set to round numbers
  • Single namespace with no quotas
  • Monthly bill: $4,200/mo on EKS compute alone

After 2 weeks of implementing the above:

  • 3 On-Demand + 9 Spot nodes
  • VPA recommendations applied to all 23 deployments
  • Namespace quotas + LimitRanges deployed
  • Monthly bill: $1,680/mo

Savings: $2,520/mo (60%) — and zero incidents from the changes.
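The arithmetic on those numbers:

```python
before, after = 4200, 1680   # monthly EKS compute spend, $/mo
savings = before - after
pct = round(100 * savings / before)
print(f"${savings}/mo saved ({pct}%)")  # $2520/mo saved (60%)
```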


Quick Wins Checklist

  • [ ] Install VPA in recommendation mode, collect 48h of data
  • [ ] Apply VPA target recommendations to requests/limits
  • [ ] Move stateless workloads to Spot instances
  • [ ] Add LimitRanges to every namespace
  • [ ] Install Kubecost (free tier is enough to start)
  • [ ] Enable HPA on all stateless deployments
  • [ ] Schedule non-prod clusters to scale to 0 overnight
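That last checklist item can be a plain CronJob. A sketch that zeroes out every Deployment in the dev namespace on weekday evenings — the `sa-scaler` ServiceAccount (which needs RBAC to patch deployments) and the `bitnami/kubectl` image are both assumptions to swap for your own:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: dev
spec:
  schedule: "0 20 * * 1-5"   # 20:00, Monday–Friday
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-scaler   # Hypothetical SA with deployment-patch RBAC
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command: ["kubectl", "scale", "deployment", "--all", "--replicas=0", "-n", "dev"]
```

A mirror-image CronJob in the morning scales things back up.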

Start with VPA + Kubecost. Those two tools alone will show you exactly where your money is going and what to cut.


Have questions about your specific cluster setup? Drop a comment below — I read every one.
