Introduction
Kubernetes makes scaling easy — almost too easy. Teams that migrate to Kubernetes often see their cloud bill double within six months, not because they are running more workloads, but because Kubernetes abstracts away the cost of idle resources. A cluster with 40% average CPU utilization is paying for 60% wasted compute.
FinOps — the practice of bringing financial accountability to cloud spending — is essential for Kubernetes in 2026. This guide covers practical cost optimization with Kubecost, spot instances, and right-sizing.
Why Kubernetes Costs Spiral
Three patterns drive Kubernetes cost overruns:
Over-provisioned requests. Developers set resources.requests.cpu: 500m as a safety margin, even if the workload averages 80m. The scheduler reserves the full 500m on the node, so the 420m gap is unavailable to other pods. Across 100 deployments with a few replicas each, that gap adds up to more than a hundred stranded vCPUs.
Idle nodes. The cluster autoscaler adds nodes when pods are pending but is slower to remove them. Spot instances compound this: when a spot node is reclaimed, its pods must be rescheduled, which can trigger fresh provisioning while underused nodes linger, creating a cycle of provisioning and waste.
No cost visibility. Without per-namespace, per-deployment cost attribution, teams have no feedback loop. A team that never sees their infrastructure cost has no incentive to optimize.
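To make the first pattern concrete, here is a minimal sketch that adds up the gap between requested and used CPU. The 500m request and 80m average come from the text above; the fleet shape (100 deployments with 3 replicas each) and all names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    replicas: int
    request_millicores: int    # resources.requests.cpu per pod
    avg_usage_millicores: int  # observed average CPU per pod


def stranded_vcpus(deployments):
    """Return vCPUs that are reserved by requests but unused on average."""
    stranded_m = sum(
        d.replicas * max(d.request_millicores - d.avg_usage_millicores, 0)
        for d in deployments
    )
    return stranded_m / 1000


# Hypothetical fleet: 100 deployments, 3 replicas each.
fleet = [Deployment(f"svc-{i}", 3, 500, 80) for i in range(100)]
print(stranded_vcpus(fleet))  # 126.0
```

With a single replica per deployment the same arithmetic gives 42 vCPUs, so the replica count matters as much as the per-pod gap.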
Kubecost: Cost Visibility Without Spreadsheets
Kubecost, built on the open-source OpenCost project, is a widely used tool for Kubernetes cost allocation. It breaks down costs by namespace, deployment, label, and even individual pod:
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostToken="..." \
  --set prometheus.server.persistentVolume.enabled=true
Kubecost answers the question every platform team dreads: "How much does the payment-service deployment cost?" It shows CPU cost, memory cost, GPU cost, and network egress cost — per deployment, per namespace, per team label.
The key Kubecost features for FinOps:
- Cost allocation by Kubernetes label. Tag deployments with team, environment, and cost-center labels. Kubecost aggregates costs by these labels automatically.
- Savings recommendations. Kubecost identifies over-provisioned pods and recommends right-sizing based on actual usage over the last 7 days.
- Efficiency scores. A per-namespace score showing what percentage of requested resources are actually used. A namespace at 30% efficiency is paying for 70% waste.
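The efficiency idea reduces to a ratio of used to requested resources. The sketch below is a simplified, single-resource version for illustration; it is not Kubecost's actual formula, and the function names and sample figures are assumptions.

```python
def efficiency(used: float, requested: float) -> float:
    """Fraction of requested resources actually used (0.0 if nothing is requested)."""
    if requested <= 0:
        return 0.0
    return used / requested


def idle_cost(monthly_cost: float, used: float, requested: float) -> float:
    """Share of the monthly cost that pays for requested-but-unused capacity."""
    return monthly_cost * (1 - min(efficiency(used, requested), 1.0))


# Hypothetical namespace: requests 40 vCPUs, uses 12, costs $2,000 per month.
print(f"efficiency: {efficiency(12, 40):.0%}")
print(f"idle cost:  ${idle_cost(2000, 12, 40):,.2f}")
```

A namespace that uses more than it requests is capped at 100% here, so it reports no idle cost rather than a negative one.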
Spot Instances: The 60% Discount
Spot and preemptible instances can cut compute costs by 60-70%. The trade-off is reliability: the provider can reclaim a spot instance on short notice (2 minutes on AWS, less on some other providers). Kubernetes tooling such as Karpenter handles this gracefully:
# Karpenter NodePool preferring spot with on-demand fallback
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
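Karpenter provisions spot instances first, falling back to on-demand when spot capacity is unavailable. Combine this with PodDisruptionBudgets so that voluntary disruptions, such as node drains and consolidation, never take critical workloads below a minimum replica count. A PDB cannot stop the provider from reclaiming a spot node, so also run enough replicas across nodes to absorb a sudden loss: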
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: payment-service
For truly stateless workloads (background jobs, batch processing), use 100% spot with no PDB: accept the occasional restart and pocket the 60-70% savings.
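The effect of a mixed fleet on the bill can be estimated with simple arithmetic. The sketch below assumes a flat on-demand price and a fixed spot discount; real spot prices vary by instance type, zone, and time, and every number here is hypothetical.

```python
def blended_hourly_cost(nodes, on_demand_price, spot_fraction, spot_discount):
    """Hourly cost of a fleet where spot_fraction of the nodes run at a discount."""
    spot_nodes = nodes * spot_fraction
    on_demand_nodes = nodes - spot_nodes
    spot_price = on_demand_price * (1 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


# Hypothetical fleet: 20 nodes at $0.20/hour, 70% on spot, 65% spot discount.
baseline = blended_hourly_cost(20, 0.20, 0.0, 0.65)
blended = blended_hourly_cost(20, 0.20, 0.7, 0.65)
print(f"on-demand only: ${baseline:.2f}/h")
print(f"70% spot:       ${blended:.2f}/h")
print(f"savings:        {1 - blended / baseline:.1%}")
```

The overall saving is lower than the headline spot discount because the on-demand share of the fleet still pays full price.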
Right-Sizing: Stop Paying for Unused Capacity
Right-sizing is the single highest-leverage FinOps practice. The process:
Step 1: Measure actual usage. Kubecost or Prometheus shows the P95 resource consumption over 7 days.
Step 2: Set requests to P95 + 15% buffer. If P95 CPU is 200m, set resources.requests.cpu: 230m. The 15% buffer handles traffic spikes without over-provisioning.
Step 3: Set limits for burst protection. For CPU, set limits.cpu: 500m so the container can burst above its request during traffic spikes; it is throttled only once it reaches the limit. For memory, set limits.memory close to requests.memory: memory cannot be throttled, so overcommitting it ends in OOMKills.
resources:
  requests:
    cpu: "230m"      # P95 actual usage + 15%
    memory: "256Mi"  # P95 actual usage + 15%
  limits:
    cpu: "500m"      # Burst headroom
    memory: "320Mi"  # OOM prevention
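Steps 1 and 2 can be sketched in a few lines of Python. This assumes you have already exported per-pod CPU samples in millicores (for example from Prometheus); the nearest-rank percentile method, the function names, and the sample data are illustrative choices, not something Kubecost prescribes.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, to avoid floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def recommended_request(samples_millicores, pct=95, buffer_pct=15):
    """P95 usage plus a percentage buffer, rounded up to a whole millicore."""
    p = percentile(samples_millicores, pct)
    return (p * (100 + buffer_pct) + 99) // 100


# Hypothetical 7-day sample set: mostly ~80m, a busy band at 200m, rare spikes to 400m.
samples = [80] * 90 + [200] * 5 + [400] * 5
print(percentile(samples, 95))       # 200
print(recommended_request(samples))  # 230
```

The rare 400m spikes sit above P95 and do not inflate the request; they are what the CPU limit's burst headroom is for.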
Right-sizing anti-patterns:
- Setting requests equal to limits — prevents bursting and wastes reserved capacity
- Copy-pasting resource values across deployments — different workloads have different profiles
- Never revisiting resource requests — usage patterns change over time
Bin Packing: Fill the Nodes
Bin packing ensures nodes run at high utilization by co-locating complementary workloads:
- Deploy memory-heavy and CPU-heavy workloads on the same node
- Use pod affinity/anti-affinity to control placement
- Set requests precisely so the scheduler can pack efficiently
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: payment-service
          topologyKey: kubernetes.io/hostname
Prefer the softer preferredDuringSchedulingIgnoredDuringExecution over the strict requiredDuringSchedulingIgnoredDuringExecution for anti-affinity, so the scheduler can still co-locate replicas on one node when capacity is tight.
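To see why co-locating complementary workloads matters, the toy first-fit packer below compares mixed placement with keeping CPU-heavy and memory-heavy pods on separate node pools. It is only an illustration: the Kubernetes scheduler uses its own filtering and scoring logic, not this algorithm, and the node and pod sizes are hypothetical.

```python
def first_fit(pods, node_cpu, node_mem):
    """Place each (cpu_millicores, mem_mib) pod on the first node with room.

    Returns a list of [used_cpu, used_mem] per node. Assumes every pod
    fits on an empty node.
    """
    nodes = []
    for cpu, mem in pods:
        for node in nodes:
            if node[0] + cpu <= node_cpu and node[1] + mem <= node_mem:
                node[0] += cpu
                node[1] += mem
                break
        else:
            # No existing node had room, so open a new one.
            nodes.append([cpu, mem])
    return nodes


NODE_CPU, NODE_MEM = 4000, 16384  # hypothetical 4 vCPU / 16 GiB node
cpu_heavy = [(3000, 2048)] * 4    # 3 vCPU, 2 GiB each
mem_heavy = [(500, 12288)] * 4    # 0.5 vCPU, 12 GiB each

mixed = first_fit(cpu_heavy + mem_heavy, NODE_CPU, NODE_MEM)
separate = first_fit(cpu_heavy, NODE_CPU, NODE_MEM) + first_fit(mem_heavy, NODE_CPU, NODE_MEM)
print(len(mixed), len(separate))  # 4 8
```

Each mixed node ends up at 3500m CPU and 14336Mi memory, close to full on both dimensions, while the separated pools leave one dimension mostly idle on every node.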
Cost Allocation Labels
Label every Kubernetes resource that creates cloud costs:
metadata:
  labels:
    team: payments
    environment: production
    cost-center: engineering
    app: payment-service
Kubecost aggregates by these labels. Without them, you know the cluster costs $12,000/month but cannot tell which team drove $8,000 of it. Labeling is free; the lack of labeling is expensive.
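Label-based aggregation is essentially a group-by over workload costs. The sketch below shows the idea on hand-written data; the record shape, the workload names, and the per-workload costs are hypothetical and do not reflect Kubecost's API or output format.

```python
from collections import defaultdict


def cost_by_label(workloads, label):
    """Sum monthly cost per value of the given label; missing labels go to 'unlabeled'."""
    totals = defaultdict(float)
    for w in workloads:
        key = w["labels"].get(label, "unlabeled")
        totals[key] += w["monthly_cost"]
    return dict(totals)


# Hypothetical workloads adding up to the $12,000 cluster from the text.
workloads = [
    {"name": "payment-service", "labels": {"team": "payments"}, "monthly_cost": 5000.0},
    {"name": "ledger-worker", "labels": {"team": "payments"}, "monthly_cost": 3000.0},
    {"name": "search-api", "labels": {"team": "search"}, "monthly_cost": 2500.0},
    {"name": "legacy-cron", "labels": {}, "monthly_cost": 1500.0},
]

by_team = cost_by_label(workloads, "team")
for team, cost in sorted(by_team.items(), key=lambda item: -item[1]):
    print(f"{team:<10} ${cost:,.0f}")
```

The "unlabeled" bucket is worth tracking on its own: it is the share of the bill that no team can be held accountable for.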
Monitoring the FinOps Loop
FinOps is a continuous loop, not a one-time project:
- Weekly: Review Kubecost savings recommendations. Implement right-sizing for the top 5 over-provisioned deployments.
- Monthly: Review per-team cost dashboards. Identify teams with above-average cost growth.
- Quarterly: Re-evaluate spot vs on-demand mix, reserved instance purchases, and GPU allocation.
For teams running GPU inference workloads, the cost optimization strategies in our Kubernetes LLM inference guide — quantization, MIG partitioning, and spot-first GPU provisioning — apply directly to the FinOps loop.
For securing the clusters where you run cost optimization tooling, see our Kubernetes security best practices guide.
FinOps is not about spending less — it is about spending on the right things. Kubernetes gives you the knobs. This guide shows you which ones to turn.