After managing AWS infrastructure at scale for 10+ years, I've seen companies hemorrhage thousands of dollars every month on preventable waste. These aren't hypothetical optimizations — they're patterns I've implemented across multiple organizations that consistently delivered 30–50% cost reductions.
Here are the five Terraform patterns that reliably cut costs without touching reliability, plus three bonus patterns I apply to every greenfield deployment.
Why Terraform is the Right Tool for Cost Optimization
Before the patterns: why Terraform specifically?
Cost optimization done manually is a ticking clock. An engineer spends a day right-sizing instances, things improve, and six months later the overprovisioning is back because no one remembered the rationale. Infrastructure-as-code makes cost decisions durable. When a new environment is provisioned, it inherits the optimized defaults. When someone wants to upsize, there's a PR review that catches it.
The other reason: Terraform forces you to enumerate every resource. You can't optimize what you can't see, and Terraform state is the most accurate inventory of your AWS footprint you'll ever have.
1. Right-Size Everything with Data Sources
The biggest waste I see is over-provisioned EC2 instances. Teams launch m5.2xlarge because it "feels safe" and never revisit it. Terraform can't tell you to downsize, but it can make resizing trivially easy.
```hcl
# Use variables tied to environment
variable "instance_type" {
  type = map(string)
  default = {
    production  = "m5.xlarge"
    staging     = "t3.medium"
    development = "t3.small"
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.instance_type[var.environment]

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}
```
Combine this with actual CPU and memory utilization from CloudWatch to compare against provisioned capacity (the AWS provider has no data source for metric statistics, so pull the numbers via the CLI or AWS Compute Optimizer). If your m5.2xlarge averages 8% CPU utilization, that's a t3.large at under a quarter of the price.
For workloads with low average CPU and occasional spikes, also consider t3 burstable instances. A t3.xlarge costs ~$0.1664/hr vs. m5.xlarge at $0.192/hr — and if your workload idles most of the time, the m5 is charging you for sustained capacity you never use.
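One way to keep under-utilization visible is to let Terraform create a low-CPU alarm alongside each instance. A minimal sketch; the threshold, evaluation window, and SNS topic reference are illustrative assumptions:

```hcl
# Flag instances that average under 10% CPU for a full day,
# i.e. candidates for downsizing (threshold and topic are illustrative)
resource "aws_cloudwatch_metric_alarm" "low_cpu" {
  alarm_name          = "${var.project}-app-low-cpu"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 24
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 3600
  statistic           = "Average"
  threshold           = 10
  alarm_description   = "Instance may be over-provisioned"
  alarm_actions       = [aws_sns_topic.cost_alerts.arn]

  dimensions = {
    InstanceId = aws_instance.app.id
  }
}
```

Routed to a team channel rather than on-call, this turns right-sizing from an annual audit into a standing signal.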
Typical savings: $500–2,000/mo per workload
2. Scheduled Auto-Scaling for Non-Production
Development and staging environments don't need to run 24/7. But most teams leave them on because "it's annoying to restart." Terraform-managed scheduled scaling solves this permanently.
resource "aws_autoscaling_schedule" "scale_down_nights" {
scheduled_action_name = "scale-down-nights"
min_size = 0
max_size = 0
desired_capacity = 0
recurrence = "0 20 * * MON-FRI" # 8pm weekdays
autoscaling_group_name = aws_autoscaling_group.dev.name
}
resource "aws_autoscaling_schedule" "scale_up_mornings" {
scheduled_action_name = "scale-up-mornings"
min_size = 1
max_size = 3
desired_capacity = 2
recurrence = "0 8 * * MON-FRI" # 8am weekdays
autoscaling_group_name = aws_autoscaling_group.dev.name
}
A team running 10 EC2 instances in dev 24/7 vs. 12 hours/weekday = 65% runtime reduction. On m5.large at ~$70/mo each, that's $455/mo saved on dev alone.
Don't forget weekends — although here there's nothing to add: because the scale-up action only fires MON-FRI, Friday's 8pm scale-down holds all the way to Monday 8am. That weekend window contributes about 29 percentage points of runtime reduction on top of the ~36% from weekday nights alone, which is how you get to 65%.
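The autoscaling schedules above only cover ASGs. For standalone dev instances, EventBridge Scheduler can call the EC2 API directly. A sketch, assuming an IAM role `aws_iam_role.scheduler` with `ec2:StopInstances` permission and an `aws_instance.dev` resource exist:

```hcl
# Stop standalone dev instances at 8pm on weekdays via EventBridge
# Scheduler's universal target (role and instance refs are assumptions)
resource "aws_scheduler_schedule" "stop_dev_instances" {
  name                = "stop-dev-instances-nights"
  schedule_expression = "cron(0 20 ? * MON-FRI *)"

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:ec2:stopInstances"
    role_arn = aws_iam_role.scheduler.arn

    input = jsonencode({
      InstanceIds = [aws_instance.dev.id]
    })
  }
}
```

A matching schedule with `ec2:StartInstances` brings them back in the morning; no Lambda function to maintain.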
Typical savings: $300–1,500/mo
3. S3 Lifecycle Policies as Code
S3 is deceptively expensive at scale. Objects accumulate for years in Standard storage when most of them haven't been accessed in months.
resource "aws_s3_bucket_lifecycle_configuration" "logs" {
bucket = aws_s3_bucket.logs.id
rule {
id = "transition-to-ia"
status = "Enabled"
filter {
prefix = "logs/"
}
transition {
days = 30
storage_class = "STANDARD_IA"
}
transition {
days = 90
storage_class = "GLACIER_IR"
}
expiration {
days = 365
}
}
}
Standard → Standard-IA → Glacier Instant Retrieval → deletion. The pricing difference is dramatic: Standard at $0.023/GB/mo vs. Glacier IR at $0.004/GB/mo. For a company storing 50TB of logs, that's roughly $1,150/mo vs. $200/mo.
One addition I always make: incomplete multipart upload cleanup. Failed uploads leave phantom objects that accumulate silently.
```hcl
rule {
  id     = "cleanup-incomplete-uploads"
  status = "Enabled"

  filter {} # empty filter: applies bucket-wide

  abort_incomplete_multipart_upload {
    days_after_initiation = 7
  }
}
```
This has saved $50–300/mo in situations where large file uploads frequently fail and retry.
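When access patterns are too unpredictable for fixed transition windows, S3 Intelligent-Tiering can be enabled as code instead. A sketch; the `aws_s3_bucket.data` reference and day thresholds are illustrative:

```hcl
# Let S3 move objects between access tiers automatically based on
# actual access patterns (bucket reference is illustrative)
resource "aws_s3_bucket_intelligent_tiering_configuration" "data" {
  bucket = aws_s3_bucket.data.id
  name   = "entire-bucket"

  tiering {
    access_tier = "ARCHIVE_ACCESS"
    days        = 90
  }

  tiering {
    access_tier = "DEEP_ARCHIVE_ACCESS"
    days        = 180
  }
}
```

This trades a small per-object monitoring fee for never having to guess the right transition schedule.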
Typical savings: $200–2,000/mo depending on data volume
4. NAT Gateway Consolidation
This one surprises people. NAT Gateways cost $0.045/hour ($32/mo) each, plus $0.045 per GB of data processed. Multi-AZ setups often have 3 NAT Gateways but only need one for non-production.
```hcl
locals {
  # Production: one NAT per AZ for HA
  # Everything else: single NAT to save money
  nat_gateway_count = var.environment == "production" ? length(var.availability_zones) : 1
}

resource "aws_nat_gateway" "main" {
  count         = local.nat_gateway_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index % length(aws_subnet.public)].id

  tags = {
    Name = "${var.project}-nat-${count.index}"
  }
}
```
Dropping from 3 NAT Gateways to 1 in staging/dev saves $64/mo per environment. Across 5 non-prod environments, that's $320/mo — plus data transfer savings.
But there's a more impactful optimization for production too: VPC Endpoints for AWS services. Every S3, DynamoDB, or SSM call from a private subnet goes through the NAT Gateway and costs $0.045/GB. Gateway endpoints (S3 and DynamoDB) are free; interface endpoints run about $0.01/hr (~$7.30/mo) per AZ, plus $0.01/GB processed.
```hcl
# Gateway endpoint for S3 — free, eliminates S3 data transfer via NAT
resource "aws_vpc_endpoint" "s3" {
  vpc_id       = aws_vpc.main.id
  service_name = "com.amazonaws.${var.aws_region}.s3"

  route_table_ids = concat(
    aws_route_table.private[*].id,
    aws_route_table.public[*].id
  )

  tags = {
    Name = "${var.project}-s3-endpoint"
  }
}
```
If your instances are regularly hitting S3 (logs, assets, backups), this single change can save $200–800/mo on data transfer charges alone.
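Interface endpoints for services like SSM follow a similar shape. A sketch; the private subnet and security group references are assumptions:

```hcl
# Interface endpoint for SSM: ~$7.30/mo per AZ, keeps Session Manager
# traffic off the NAT Gateway (subnet/SG references are illustrative)
resource "aws_vpc_endpoint" "ssm" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.aws_region}.ssm"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}
```

With `private_dns_enabled`, existing SDK and CLI calls resolve to the endpoint automatically; no application changes required.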
Typical savings: $100–500/mo
5. Reserved Instance Coverage via Terraform
This is less a code pattern and more a discipline pattern. Most teams buy Reserved Instances manually and lose track. Terraform + AWS Cost Explorer API can surface coverage gaps.
```hcl
# List the instance types offered in this region, useful when planning
# RI purchases (actual coverage reporting comes from Cost Explorer)
data "aws_ec2_instance_type_offerings" "available" {
  filter {
    name   = "location"
    values = [var.aws_region]
  }
  location_type = "region"
}

# Tag all long-lived instances for RI tracking
resource "aws_instance" "app" {
  # ... instance config ...

  tags = {
    RICandidate  = "true"
    UpTimeTarget = "99.9"
    LaunchDate   = timestamp()
  }

  lifecycle {
    # timestamp() changes on every plan; ignore it after first apply
    ignore_changes = [tags["LaunchDate"]]
  }
}
```
Instances tagged with RICandidate=true that have been running 30+ days are RI candidates. A 1-year Standard Reserved Instance on m5.xlarge saves ~38% vs. On-Demand. On a fleet of 20 m5.xlarge instances ($140/mo On-Demand), that's $1,064/mo saved.
For newer workloads, Compute Savings Plans are more flexible than RIs — they apply to any EC2 instance family, Fargate, and Lambda. The savings are slightly lower (up to 66% vs. 72% for RIs) but the flexibility means you won't be stuck paying for committed capacity when you migrate to a different instance family.
Typical savings: $1,000–5,000/mo at scale
Bonus: gp2 → gp3 EBS Migration
This is the lowest-effort optimization on this list. AWS launched gp3 EBS volumes in 2020 as a straight upgrade over gp2: 20% cheaper, baseline IOPS guaranteed at 3,000 regardless of volume size (vs. gp2's 3 IOPS/GB).
resource "aws_ebs_volume" "app" {
availability_zone = var.az
size = 100
type = "gp3" # Not gp2
iops = 3000 # Baseline included in price
throughput = 125 # MB/s baseline included
tags = {
Name = "${var.project}-app-volume"
}
}
For volumes already managed by Terraform, changing `type` applies in place without downtime; for everything else it's a one-time Console or CLI operation (`aws ec2 modify-volume`), also non-disruptive. Either way, enforcing type = "gp3" in your Terraform modules ensures all new infrastructure starts on gp3.
On 50 volumes averaging 200GB each (10TB total), at us-east-1 prices: gp2 costs $1,000/mo, gp3 costs $800/mo. $200/mo saved for changing one line.
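Don't forget root volumes: many AMIs still default them to gp2. A sketch forcing gp3 on the root device; the size is illustrative:

```hcl
resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.instance_type[var.environment]

  # Override the AMI's block device defaults so the root volume is gp3
  root_block_device {
    volume_type = "gp3"
    volume_size = 50
  }
}
```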
Typical savings: $50–500/mo
Integrating Infracost for PR-Level Cost Visibility
The patterns above work best when paired with automated cost feedback in pull requests. Infracost integrates with Terraform and posts a cost diff comment on every PR that changes infrastructure.
```yaml
# .github/workflows/infracost.yml
name: Infracost
on: [pull_request]

jobs:
  infracost:
    runs-on: ubuntu-latest
    steps:
      # Generate a baseline from the PR's base branch first,
      # otherwise --compare-to has nothing to diff against
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.base.ref }}

      - uses: infracost/actions/setup@v2
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      - name: Generate Infracost baseline
        run: |
          infracost breakdown --path=. \
            --format=json \
            --out-file=/tmp/infracost-base.json

      # Switch back to the PR branch, keeping the baseline file
      - uses: actions/checkout@v3
        with:
          clean: false

      - name: Generate Infracost diff
        run: |
          infracost diff --path=. \
            --format=json \
            --compare-to=/tmp/infracost-base.json \
            --out-file=/tmp/infracost.json

      - uses: infracost/actions/comment@v2
        with:
          path: /tmp/infracost.json
          behavior: update
```
This surfaces unexpected cost increases before code merges. A PR that accidentally changes instance_type from t3.medium to m5.2xlarge shows a ~$250/mo increase in the PR comment — before it lands in production.
Putting It Together: Real Numbers
Here's what implementing the five core patterns, plus the bonus items, looked like at a mid-size SaaS company I worked with:
| Pattern | Monthly Saving |
|---------|---------------|
| Right-sizing (7 instances downsized) | $1,840 |
| Dev environment scheduling | $620 |
| S3 lifecycle (80TB data) | $1,520 |
| NAT Gateway consolidation (4 envs) | $256 |
| Reserved Instance conversion | $2,100 |
| gp2 → gp3 migration (30 volumes) | $180 |
| VPC S3 endpoint (high S3 traffic) | $340 |
| Total | $6,856/mo |
That's $82,272/year from configuration changes in Terraform. No architectural redesign, no migration project, no downtime.
The Process I Follow
Before writing any Terraform changes:
- Pull Cost Explorer data for the past 90 days by service
- Identify top 3 cost drivers — usually EC2, RDS, NAT/data transfer
- Check utilization — CloudWatch metrics for EC2/RDS CPU, connections, memory
- Model savings — use AWS Pricing Calculator before committing
- Apply in non-prod first — validate before touching production
- Tag everything — `CostCenter`, `Environment`, `ManagedBy=terraform`
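As a backstop for all of the above, I keep a forecast-based budget alert in the same Terraform codebase. A sketch; the limit amount and subscriber email are placeholders:

```hcl
# Alert when forecasted monthly spend crosses 80% of the limit
# (amount and subscriber address are placeholders)
resource "aws_budgets_budget" "monthly" {
  name         = "monthly-aws-budget"
  budget_type  = "COST"
  limit_amount = "10000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = ["platform-team@example.com"]
  }
}
```

A forecasted (rather than actual) threshold gives you warning days before the overage lands on the bill.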
Cost optimization isn't a one-time project. Build it into your Terraform modules as defaults, and every new deployment starts lean rather than needing to be fixed later. The goal is infrastructure where the cheapest correct configuration is also the easiest configuration to provision.
Have a Terraform cost pattern that's saved you significant money? Drop it in the comments.