Your app is slow. The logs say it's the database. But which query? And how do you fix it without a 3am maintenance window?
This is the exact process I use to diagnose and fix PostgreSQL performance problems in production.
Step 1: Find What's Actually Slow
Before optimizing anything, you need data.
Enable pg_stat_statements
-- Check if it's enabled
SELECT * FROM pg_extension WHERE extname = 'pg_stat_statements';
-- Enable it (requires superuser; the module must also be preloaded, see below)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Add to postgresql.conf (changing shared_preload_libraries requires a restart):
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 10000
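After the restart, verify the module is loaded and the view is queryable:
SHOW shared_preload_libraries;
SELECT count(*) FROM pg_stat_statements; -- errors if the module isn't loaded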
Find Your Top 10 Slowest Queries
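-- Note: column names below are PG 13+; PG 12 and older use total_time / mean_time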
SELECT
round(mean_exec_time::numeric, 2) AS mean_ms,
round(total_exec_time::numeric, 2) AS total_ms,
calls,
round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct_total,
substring(query, 1, 100) AS query_snippet
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
This shows you queries sorted by total time spent — not just slowest individual run, but slowest total impact on your database.
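The counters accumulate since the last reset, so old noise can drown out a new problem. To measure a fresh window, clear them first (note this wipes the stats for all sessions):
SELECT pg_stat_statements_reset();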
Find Queries with Worst Cache Hit Ratio
SELECT
schemaname,
tablename,
heap_blks_read,
heap_blks_hit,
round(heap_blks_hit::numeric / nullif(heap_blks_hit + heap_blks_read, 0) * 100, 2) AS cache_hit_pct
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 20;
You want a cache hit ratio above 99%. Below 95% means you're hitting disk constantly.
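For a single database-wide number, pg_stat_database gives the same ratio at a glance:
SELECT
datname,
round(blks_hit::numeric / nullif(blks_hit + blks_read, 0) * 100, 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname = current_database();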
Step 2: EXPLAIN ANALYZE — Read It Like a Map
Once you have your slow query, understand why it's slow:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id, u.email
ORDER BY order_count DESC
LIMIT 50;
Reading EXPLAIN Output
Limit (cost=45231.88..45231.90 rows=50) (actual time=2341.221..2341.234 rows=50)
-> Sort (cost=45231.88..45481.88 rows=100000) (actual time=2341.220..2341.225 rows=50)
-> HashAggregate (cost=35000.00..37500.00 rows=100000) (actual time=2298.445..2330.112 rows=500000)
-> Hash Left Join (cost=2500.00..30000.00 rows=1000000) (actual time=45.221..1987.332 rows=1000000)
-> Seq Scan on users (cost=0.00..15000.00) (actual time=0.021..342.551 rows=500000)
Filter: (created_at > (now() - '30 days'::interval))
Rows Removed by Filter: 450000
-> Hash (actual time=44.221..44.221 rows=1000000)
-> Seq Scan on orders (cost=0.00..20000.00) (actual time=0.015..35.221 rows=1000000)
Red flags to look for:
| Warning Sign | What It Means |
|-------------|---------------|
| Seq Scan on large table | Missing index — reads every row |
| Rows Removed by Filter: 450000 | Index would eliminate these |
| actual rows far from estimated rows | Statistics are stale, run ANALYZE |
| Hash Join with huge hash | Consider if join order can be improved |
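If the estimates are far off the actuals, refresh statistics before reaching for indexes; it's the cheapest possible fix:
ANALYZE users;
ANALYZE orders;
-- Re-run EXPLAIN (ANALYZE, BUFFERS) and check that estimated rows now track actual rows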
Step 3: Fix the Most Common Problems
Problem 1: Missing Index on Filter Column
-- Slow: full table scan
SELECT * FROM orders WHERE user_id = 12345;
-- Fix: create index
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);
-- CONCURRENTLY doesn't block reads or writes: safe for production (but slower to build)
Index on timestamp for time-range queries:
-- Slow: scanning all users
SELECT * FROM users WHERE created_at > NOW() - INTERVAL '30 days';
-- Fix:
CREATE INDEX CONCURRENTLY idx_users_created_at ON users(created_at);
Problem 2: Index Exists but Not Being Used
Postgres might ignore your index if the query wraps the indexed column in a function:
-- Check: why isn't index being used?
EXPLAIN SELECT * FROM users WHERE lower(email) = 'user@example.com';
-- Output: Seq Scan (because index is on email, not lower(email))
-- Fix: function-based index
CREATE INDEX CONCURRENTLY idx_users_email_lower ON users(lower(email));
-- Now this will use the index:
SELECT * FROM users WHERE lower(email) = 'user@example.com';
Index not used due to type mismatch:
-- Column is INTEGER but you're comparing against a NUMERIC value,
-- which forces a cast of the indexed column → no index
SELECT * FROM users WHERE id = 12345.0;
-- Fix: match types
SELECT * FROM users WHERE id = 12345; -- integer literal, index used
Problem 3: N+1 Query Problem
Your ORM fires 1 query to get users, then 1 query per user to get their orders = N+1 queries.
-- Bad pattern: called 10,000 times for 10,000 users
SELECT * FROM orders WHERE user_id = ?;
-- Fix: single query with JOIN or IN clause
SELECT u.id, u.email, o.id as order_id, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.id = ANY(ARRAY[1,2,3,...,10000]);
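N+1 patterns also show up clearly in pg_stat_statements: a tiny per-call time paired with an enormous call count. Sorting by calls surfaces them:
SELECT
calls,
round(mean_exec_time::numeric, 3) AS mean_ms,
round(total_exec_time::numeric, 2) AS total_ms,
substring(query, 1, 80) AS query_snippet
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;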
Problem 4: Missing Composite Index
Sometimes you need an index on multiple columns:
-- Query filters on two columns
SELECT * FROM orders
WHERE user_id = 12345 AND status = 'pending';
-- Single-column index on user_id helps
-- but a composite index is better
CREATE INDEX CONCURRENTLY idx_orders_user_status
ON orders(user_id, status);
-- Column order matters: lead with the column your queries always filter on
-- (user_id here), so the same index also serves lookups on user_id alone
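If most queries only ever ask for one hot status value, a partial index is worth considering instead; it's smaller and cheaper to maintain. A sketch, assuming 'pending' is the value you always filter on:
CREATE INDEX CONCURRENTLY idx_orders_user_pending
ON orders(user_id)
WHERE status = 'pending';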
Problem 5: VACUUM Not Keeping Up (Table Bloat)
Heavy UPDATE/DELETE workloads leave dead rows. PostgreSQL autovacuum should handle this, but sometimes it can't keep up.
-- Check table bloat
SELECT
schemaname,
tablename,
n_dead_tup,
n_live_tup,
round(100 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0)::numeric, 2) AS dead_pct,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
If dead_pct > 20% or autovacuum hasn't run recently:
-- Manual vacuum (non-blocking)
VACUUM ANALYZE orders;
-- If very bloated (need full rebuild, requires lock)
VACUUM FULL orders; -- Warning: table lock! Schedule maintenance window.
-- Better alternative to VACUUM FULL (no lock):
-- Use pg_repack extension
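pg_repack rebuilds the table online, taking only brief locks at the start and end. A typical invocation looks roughly like this (database name is a placeholder; check your version's docs for exact flags):
-- Install the extension in the target database first
CREATE EXTENSION IF NOT EXISTS pg_repack;
-- Then run the CLI from a shell:
-- $ pg_repack --table=orders mydb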
Tune autovacuum for high-write tables:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01, -- vacuum when 1% of rows are dead (default: 20%)
autovacuum_analyze_scale_factor = 0.005,
autovacuum_vacuum_cost_delay = 2 -- ms between cost-limited rounds; lower = faster vacuuming
);
Step 4: postgresql.conf Tuning
The defaults are conservative. For a dedicated database server:
# Memory (for 16GB RAM server)
shared_buffers = 4GB # 25% of RAM
effective_cache_size = 12GB # 75% of RAM (planner hint)
work_mem = 64MB # Per sort/hash operation (watch for OOM with many connections)
maintenance_work_mem = 1GB # For VACUUM, CREATE INDEX
# Checkpoints (reduce I/O spikes)
checkpoint_completion_target = 0.9
wal_buffers = 64MB
max_wal_size = 4GB
# Connection settings
max_connections = 100 # Use connection pooler (PgBouncer) instead of raising this
Critical: Don't raise max_connections to solve connection issues. Use PgBouncer. Each Postgres connection is a separate backend process consuming roughly 5-10MB of RAM, so 500 idle connections can burn several GB on connection overhead alone.
Step 5: Connection Pooling (PgBouncer)
Most applications don't need persistent connections. PgBouncer sits between your app and Postgres:
App (100 threads) → PgBouncer (20 server connections) → PostgreSQL
# pgbouncer.ini
[databases]
myapp = host=localhost port=5432 dbname=myapp
[pgbouncer]
pool_mode = transaction # Best performance for stateless apps
max_client_conn = 1000 # How many app connections
default_pool_size = 20 # How many actual PG connections
min_pool_size = 5
Transaction pool mode: the connection is returned to the pool after each transaction. Works for most apps, but it's incompatible with session state: SET commands, LISTEN/NOTIFY, advisory locks, and (before PgBouncer 1.21) prepared statements.
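PgBouncer exposes a virtual admin database for monitoring. Assuming your user is listed in admin_users or stats_users (user and port here are placeholders for your setup), you can watch the pools live:
psql -p 6432 -U pgbouncer pgbouncer -c 'SHOW POOLS;'
psql -p 6432 -U pgbouncer pgbouncer -c 'SHOW STATS;'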
Monitoring Queries to Run Weekly
-- Tables without primary keys (often unindexed; also break logical replication)
SELECT tablename
FROM pg_tables
WHERE schemaname = 'public'
AND tablename NOT IN (
SELECT table_name FROM information_schema.table_constraints
WHERE constraint_type = 'PRIMARY KEY' AND table_schema = 'public'
);
-- Unused indexes (wasting write performance)
SELECT
schemaname || '.' || relname AS table_name,
indexrelname AS index_name,
idx_scan AS scans,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan < 10
ORDER BY pg_relation_size(indexrelid) DESC;
-- Before dropping, make sure the index isn't enforcing a unique constraint
-- Long-running queries (> 5 minutes)
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active' AND (now() - query_start) > interval '5 minutes'
ORDER BY duration DESC;
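For lock problems specifically, pg_blocking_pids() (PG 9.6+) pairs each blocked session directly with its blockers:
SELECT
pid,
pg_blocking_pids(pid) AS blocked_by,
now() - query_start AS waiting,
substring(query, 1, 80) AS query_snippet
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;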
The 15-Minute Production Fix Checklist
When someone pages you at 2am because the database is slow:
- Check active queries: SELECT pid, wait_event_type, wait_event, query FROM pg_stat_activity WHERE state = 'active';
- Kill long-running queries if needed: SELECT pg_terminate_backend(<pid>);
- Check for lock contention: SELECT * FROM pg_locks l JOIN pg_stat_activity a ON l.pid = a.pid WHERE NOT l.granted;
- Find top slow queries: run the pg_stat_statements query above
- Check indexes exist: EXPLAIN ANALYZE on the worst query
- Create missing index: CREATE INDEX CONCURRENTLY ...
- Run ANALYZE if statistics are stale: ANALYZE <tablename>;
- Monitor: watch pg_stat_activity while app traffic recovers
Most production database incidents are solved by one of: missing index, stale statistics, or lock contention. This checklist covers all three.
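One nuance on killing queries: pg_cancel_backend cancels just the running statement and keeps the connection alive, while pg_terminate_backend drops the whole session. Try cancel first (the pid here is a placeholder):
SELECT pg_cancel_backend(12345);    -- polite: cancel the current query only
SELECT pg_terminate_backend(12345); -- forceful: kill the session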
FAQ
Q: How do I add an index without downtime?
A: Always use CREATE INDEX CONCURRENTLY in production. It takes longer to build but doesn't lock the table against writes. A regular CREATE INDEX blocks all writes (reads still work) until it completes.
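One caveat: if a CONCURRENTLY build fails or is cancelled, it leaves behind an INVALID index that you must drop and recreate. Check for leftovers with:
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;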
Q: My query was fast, now it's slow after adding more data. Why?
A: Postgres query planner uses statistics (row counts, value distributions) to pick execution plans. Run ANALYZE tablename to refresh statistics. If that doesn't help, the planner may be choosing wrong — try SET enable_seqscan = off temporarily to force index use and confirm the index helps.
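A safe pattern for that experiment is SET LOCAL inside a transaction, so the setting never leaks to other sessions:
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE SELECT ...; -- your slow query here
ROLLBACK;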
Q: Should I use UUID or integer for primary keys?
A: Integer (BIGINT) primary keys are faster for joins and indexing. UUID v4 is random — inserting random UUIDs into a B-tree index causes page splits, inflating the index. If you need UUIDs (for distributed systems), use UUIDv7 which is time-ordered, or use gen_random_uuid() with awareness that write performance degrades at large scale.
Q: What's the max table size PostgreSQL can handle?
A: Theoretically 32TB per table (at the default 8KB block size). Practically, tables over ~100GB start needing careful partition management. Declarative partitioning (PG 10+) solves this: partition by date or ID range, and queries that filter on the partition key skip irrelevant partitions entirely.
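A minimal sketch of declarative range partitioning, with illustrative table names:
CREATE TABLE events (
id BIGSERIAL,
created_at TIMESTAMPTZ NOT NULL,
payload JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Queries that filter on created_at only touch matching partitions
SELECT count(*) FROM events
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';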