
Redis Caching Patterns That Slash AWS Costs: A Production SRE Guide

The 6 Redis caching patterns I use to cut AWS RDS costs by 40-60% in high-traffic systems — with real configs, Python examples, and when to use each one.

April 4, 2026 · 9 min read
#redis #caching #aws #performance #devops #sre #backend

At 50,000 requests/sec, your database is the first thing that dies. Redis is what keeps it alive — but only if you're using it correctly. Most engineers slap on a basic key-value cache and call it a day. That's leaving 80% of the performance gains on the table.

After years managing infrastructure at FAANG scale, I've settled on 6 Redis caching patterns that actually move the needle on cost and reliability.

Why Redis Over a Simple In-Memory Cache?

Before patterns — why Redis specifically?

  • Shared across all app instances — in-memory caches are per-process, so horizontal scaling creates cache misses on every new pod
  • Persistence options — RDB snapshots + AOF for disaster recovery
  • Rich data structures — hashes, sorted sets, streams (not just key-value)
  • Pub/sub + Lua scripting — enables atomic operations you can't do in a simple map
  • Eviction policies — LRU, LFU, TTL-based — automatic memory management

At 10 app pods, per-process caches mean every hot key has to be warmed 10 times over, roughly 10x the cold misses of a single shared Redis instance.
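
As a quick illustration of the rich-data-structure point, here's a minimal leaderboard sketch using a sorted set. The key and member names are hypothetical; the point is that ZINCRBY is a single atomic server-side operation, which no per-process map can give you across pods:

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Atomically bump a player's score; safe with many app pods writing
# concurrently because the increment happens inside Redis itself
r.zincrby("leaderboard:global", 10, "player:42")

# Read the top 5 players with their scores, highest first
top5 = r.zrevrange("leaderboard:global", 0, 4, withscores=True)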


Pattern 1: Cache-Aside (Lazy Loading)

The most common pattern. App checks cache first; on miss, loads from DB and writes to cache.

import redis
import json

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def get_user(user_id: int) -> dict:
    cache_key = f"user:{user_id}"
    
    # 1. Check cache
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # 2. Cache miss — fetch from DB (db stands in for your data layer)
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    
    # 3. Populate cache with TTL
    r.setex(cache_key, 3600, json.dumps(user))  # 1 hour TTL
    
    return user

When to use: Read-heavy workloads where cache misses are acceptable (user profiles, product pages, blog posts).

Tradeoff: First request after cache expiry hits the DB — acceptable for most use cases.

Real impact: A 100-RPS endpoint with 95% cache hit rate means only 5 DB queries/sec instead of 100. At AWS RDS pricing ($0.10–0.30/hour per vCPU), this translates to running a db.t3.medium instead of a db.r5.2xlarge.
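
One companion piece cache-aside needs that the snippet above doesn't show: when the underlying row changes, delete the cached copy so the next read repopulates it. A minimal sketch, reusing the same placeholder db client:

def save_user(user_id: int, data: dict):
    # Write to the source of truth first
    db.execute(
        "UPDATE users SET name=%s, email=%s WHERE id=%s",
        data['name'], data['email'], user_id
    )
    # Delete rather than update the cache entry; the next get_user()
    # reloads fresh data, which avoids races between concurrent writers
    r.delete(f"user:{user_id}")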


Pattern 2: Write-Through

Write to cache and DB simultaneously. Cache is always warm — no cold-start misses.

def update_user(user_id: int, data: dict) -> dict:
    cache_key = f"user:{user_id}"
    
    # 1. Write to DB first
    updated_user = db.execute(
        "UPDATE users SET name=%s, email=%s WHERE id=%s RETURNING *",
        data['name'], data['email'], user_id
    )
    
    # 2. Immediately update cache
    r.setex(cache_key, 3600, json.dumps(updated_user))
    
    return updated_user

When to use: Data that's read frequently right after being written (order status, user settings, config values).

Tradeoff: Higher write latency (two writes per update). Cache may store data that's never read (wasted memory for write-heavy datasets).

Optimization: Combine with a background job to expire unused keys weekly.
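
A sketch of that cleanup job: OBJECT IDLETIME reports seconds since a key was last touched (available when maxmemory-policy is an LRU variant, not LFU), and scan_iter walks the keyspace without blocking the server the way KEYS would. The user:* pattern and one-week cutoff are illustrative:

def expire_unused_keys(pattern="user:*", max_idle=7 * 24 * 3600):
    """Delete cached keys that haven't been read in a week."""
    removed = 0
    for key in r.scan_iter(match=pattern, count=500):
        idle = r.object("idletime", key)  # seconds since last access
        if idle is not None and idle > max_idle:
            r.delete(key)
            removed += 1
    return removed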


Pattern 3: Write-Behind (Write-Back)

Write to cache immediately, flush to DB asynchronously. Ultra-fast writes.

import asyncio
from collections import deque

write_queue = deque()

def update_counter(entity_id: int, delta: int):
    cache_key = f"counter:{entity_id}"
    
    # Atomic increment in Redis — O(1), sub-millisecond
    new_value = r.incrby(cache_key, delta)
    
    # Queue DB flush (async, non-blocking)
    write_queue.append((entity_id, new_value))
    
    return new_value

async def flush_to_db():
    """Background task — runs every 5 seconds"""
    while True:
        await asyncio.sleep(5)
        while write_queue:
            entity_id, value = write_queue.popleft()
            await db.execute(
                "UPDATE counters SET value=%s WHERE id=%s",
                value, entity_id
            )

When to use: High-frequency counters (page views, likes, inventory updates), real-time leaderboards, rate limiting.

Risk: Data loss on Redis crash if not persisted; mitigate with AOF persistence (appendonly yes in config). The in-process write_queue has the same exposure: if the app pod dies, queued flushes vanish, which is why production write-behind setups often keep the queue in Redis itself (a list or stream).

Real impact: A social platform tracking 10M daily "like" events can do this entirely in Redis at $50/mo (ElastiCache) instead of hammering a $500/mo RDS with 10M writes/day.
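
One refinement worth considering: the deque above records every increment, so a hot counter can queue thousands of redundant UPDATEs per flush cycle. Coalescing pending writes into a dict keyed by entity ID flushes each counter at most once per interval. A sketch under the same placeholder db client (safe as written only when everything runs on one event loop; add a lock if updates arrive from other threads):

pending: dict[int, int] = {}  # entity_id -> latest counter value

def update_counter_coalesced(entity_id: int, delta: int) -> int:
    new_value = r.incrby(f"counter:{entity_id}", delta)
    # Later updates for the same entity overwrite earlier ones
    pending[entity_id] = new_value
    return new_value

async def flush_coalesced():
    while True:
        await asyncio.sleep(5)
        batch = dict(pending)
        pending.clear()
        for entity_id, value in batch.items():
            await db.execute(
                "UPDATE counters SET value=%s WHERE id=%s",
                value, entity_id
            )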


Pattern 4: Read-Through

Redis sits in front of DB as a transparent proxy. App only talks to Redis — cache loads itself on miss.

This pattern is typically implemented in a shared caching library or middleware layer that owns the DB loader, rather than duplicated in application code. A minimal library-style version:

class ReadThroughCache:
    def __init__(self, redis_client, db_loader, ttl=3600):
        self.redis = redis_client
        self.loader = db_loader  # callable: key -> value
        self.ttl = ttl
    
    def get(self, key: str):
        value = self.redis.get(key)
        if value is not None:
            return json.loads(value)
        
        # Transparent load from DB
        value = self.loader(key)
        if value is not None:  # also cache falsy-but-valid values
            self.redis.setex(key, self.ttl, json.dumps(value))
        return value

# Usage — app doesn't know about DB at all
cache = ReadThroughCache(r, lambda k: db.get_by_key(k))
user = cache.get(f"user:{user_id}")

When to use: Clean separation of concerns — useful when multiple services need the same cached data without duplicating cache-aside logic.


Pattern 5: Cache Stampede Prevention (Mutex Lock)

The silent killer. When a popular key expires, hundreds of requests hit the DB simultaneously. This is a thundering herd problem.

import time
import uuid

def get_with_lock(key: str, db_loader, ttl=3600, lock_ttl=10):
    # 1. Check cache
    value = r.get(key)
    if value:
        return json.loads(value)
    
    # 2. Try to acquire lock
    lock_key = f"lock:{key}"
    lock_id = str(uuid.uuid4())
    acquired = r.set(lock_key, lock_id, nx=True, ex=lock_ttl)
    
    if acquired:
        try:
            # 3. We hold the lock — load from DB
            value = db_loader(key)
            r.setex(key, ttl, json.dumps(value))
            return value
        finally:
            # 4. Release lock (only if we own it)
            lua_script = """
            if redis.call("get", KEYS[1]) == ARGV[1] then
                return redis.call("del", KEYS[1])
            else
                return 0
            end
            """
            r.eval(lua_script, 1, lock_key, lock_id)
    else:
        # 5. Another process is loading — wait and retry
        for _ in range(20):
            time.sleep(0.1)
            value = r.get(key)
            if value:
                return json.loads(value)
        
        # Fallback: load directly (lock holder may have crashed)
        return db_loader(key)

When to use: Any high-traffic endpoint where a single cache key expires — product pages, homepage content, leaderboards.

Simpler alternative: Use probabilistic early expiration (PER) — randomly refresh the key before it expires:

import random

def get_with_per(key: str, db_loader, ttl=3600, beta=1.0):
    """Probabilistic early expiration — no locks needed"""
    data = r.get(key)
    
    if data:
        cached = json.loads(data)
        remaining_ttl = r.ttl(key)
        
        # In the last 10% of the TTL, refresh with a probability that
        # ramps from 0 up to beta as expiry approaches, so one lucky
        # request rebuilds the key before the herd arrives
        window = ttl * 0.1
        if 0 <= remaining_ttl < window:
            if random.random() < beta * (1 - remaining_ttl / window):
                value = db_loader(key)
                r.setex(key, ttl, json.dumps(value))
                return value
        
        return cached
    
    value = db_loader(key)
    r.setex(key, ttl, json.dumps(value))
    return value

Pattern 6: Cache Tagging (Invalidation Groups)

The hardest problem in caching: invalidation. When you update a product, you need to invalidate the product page, the category page, and the search results — not just product:123.

def set_with_tags(key: str, value, tags: list, ttl=3600):
    """Store a value and register it under multiple invalidation tags"""
    pipe = r.pipeline()
    
    # Store the value
    pipe.setex(key, ttl, json.dumps(value))
    
    # Register key under each tag (using Redis Sets)
    for tag in tags:
        tag_key = f"tag:{tag}"
        pipe.sadd(tag_key, key)
        pipe.expire(tag_key, ttl + 60)  # Tag set lives slightly longer
    
    pipe.execute()

def invalidate_tag(tag: str):
    """Invalidate all keys associated with a tag"""
    tag_key = f"tag:{tag}"
    keys = r.smembers(tag_key)
    
    if keys:
        pipe = r.pipeline()
        for key in keys:
            pipe.delete(key)
        pipe.delete(tag_key)
        pipe.execute()
    
    return len(keys)

# Usage
set_with_tags(
    f"product:{product_id}",
    product_data,
    tags=[f"category:{category_id}", "products:all", f"brand:{brand_id}"]
)

# When a category updates — invalidate everything tagged to it
invalidate_tag(f"category:{category_id}")

When to use: E-commerce, CMS, any system where one entity change affects multiple cached views.
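
One caveat as tag sets grow: SMEMBERS returns the entire set in one blocking call, which hurts at tens of thousands of members. A variant using sscan_iter deletes in batches and keeps Redis responsive (the batch size is arbitrary):

def invalidate_tag_large(tag: str, batch_size=500):
    """Incrementally invalidate a big tag instead of using SMEMBERS."""
    tag_key = f"tag:{tag}"
    removed, batch = 0, []
    for key in r.sscan_iter(tag_key, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            removed += len(batch)
            r.delete(*batch)
            batch = []
    if batch:
        removed += len(batch)
        r.delete(*batch)
    r.delete(tag_key)
    return removed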


Production Redis Configuration

Don't run Redis with default settings. For production:

# /etc/redis/redis.conf

# Memory limit — always set this
maxmemory 2gb
maxmemory-policy allkeys-lru

# Persistence — AOF for write-behind pattern safety
appendonly yes
appendfsync everysec

# Eviction — LRU for general caching workloads
# Use allkeys-lfu for skewed access patterns (power-law distribution)

# Network
tcp-keepalive 60
timeout 300

# Performance
hz 20
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes

For AWS ElastiCache, set these via parameter groups. The maxmemory-policy is the single most important parameter — with noeviction (the open-source default), Redis starts rejecting writes with OOM errors once memory fills up, effectively taking your cache offline.
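
A minimal boto3 sketch of pushing two of these values into a custom parameter group (the group name my-redis-params is hypothetical; note ElastiCache manages maxmemory itself, so you tune the policy rather than the limit):

import boto3

elasticache = boto3.client("elasticache")

elasticache.modify_cache_parameter_group(
    CacheParameterGroupName="my-redis-params",  # hypothetical name
    ParameterNameValues=[
        {"ParameterName": "maxmemory-policy", "ParameterValue": "allkeys-lru"},
        {"ParameterName": "timeout", "ParameterValue": "300"},
    ],
)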


Choosing the Right Pattern

| Pattern | Best For | DB Load | Write Latency | Complexity |
|---------|----------|---------|---------------|------------|
| Cache-Aside | Read-heavy, stale OK | Low | Normal | Low |
| Write-Through | Recent writes read immediately | Low | High | Medium |
| Write-Behind | High-frequency counters | Very Low | Very Low | High |
| Read-Through | Clean architecture | Low | Normal | Medium |
| Stampede Lock | Viral content, spiky traffic | Protected | Normal | High |
| Cache Tagging | Complex invalidation | Low | Normal | High |


The Real Cost Math

Here's what proper Redis caching actually saves on AWS:

Scenario: 500 RPS to a product catalog endpoint, ~20 ms of DB time per query

  • Without cache: 500 RPS × 20 ms = 10 seconds of query time every second (~10 saturated vCPUs) → you need a db.r5.2xlarge ($0.96/hr)
  • With 95% cache hit rate: 25 RPS to DB → db.t3.medium ($0.068/hr) handles it comfortably
  • Redis ElastiCache cache.t3.medium: $0.068/hr

Monthly savings: ($0.96 - $0.068 - $0.068) × 730 hours = ~$600/mo on one endpoint alone.

At scale, proper caching typically saves 40-60% on RDS costs. Redis pays for itself within the first week.


Common Mistakes to Avoid

  1. No TTL on keys — memory fills up, Redis starts evicting data randomly or crashes
  2. Caching mutable user-specific data without user context in the key — user A sees user B's data (security incident)
  3. JSON serializing large objects — serialize only what you need; a 50KB JSON blob in Redis is 50KB you're paying for
  4. Not handling cache misses gracefully — if Redis goes down, your app should fall back to DB, not crash
  5. Storing sessions in Redis without replication — single-node Redis failure logs out every user simultaneously

Always test your fallback path: redis-cli DEBUG SLEEP 30 simulates Redis being slow/unresponsive.
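
For mistake #4, a minimal fallback sketch: short socket timeouts plus try/except so a sick Redis degrades to direct DB reads instead of hanging or crashing requests (timeout values are illustrative, and db is the same placeholder data layer as above):

import json
import redis
from redis.exceptions import RedisError

# Short timeouts: a slow Redis should fail fast, not stall every request
r = redis.Redis(
    host='localhost', port=6379, decode_responses=True,
    socket_timeout=0.1, socket_connect_timeout=0.1,
)

def get_user_safe(user_id: int) -> dict:
    cache_key = f"user:{user_id}"
    try:
        cached = r.get(cache_key)
        if cached:
            return json.loads(cached)
    except RedisError:
        pass  # cache unavailable; fall through to the DB
    
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    
    try:
        r.setex(cache_key, 3600, json.dumps(user))
    except RedisError:
        pass  # best-effort cache write
    return user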


Redis done right is the difference between a $200/mo RDS bill and a $2,000/mo one. Pick the pattern that matches your access pattern, set your TTLs deliberately, and you'll cut infrastructure costs while improving p99 latency at the same time.
