Most engineers accept Nginx's default config and leave millions of requests per second on the table. I spent two years tuning Nginx across high-traffic production systems, and the difference between defaults and a properly tuned config is staggering — often 10–20x more throughput on identical hardware.
Here's exactly what I change, why it matters, and the real numbers behind each tweak.
The Baseline Problem
Default Nginx is conservative. It was designed to work safely on minimal hardware in 2004. Your 2026 server with 8 cores and 32GB RAM deserves better defaults.
Run this against your current setup:
wrk -t12 -c400 -d30s http://localhost/
If you're on defaults and getting under 20K req/s on modern hardware, there's significant headroom. Let's capture it.
1. Worker Processes and Connections
This is the most impactful single change:
# /etc/nginx/nginx.conf
worker_processes auto; # matches CPU core count
worker_rlimit_nofile 65535; # max open files per worker
events {
worker_connections 4096; # per worker, not total
use epoll; # Linux epoll instead of select
multi_accept on; # accept all connections at once
}
Why it matters: worker_processes auto uses all your CPU cores. use epoll switches to Linux's high-performance I/O event notification — critical for high concurrency. multi_accept on prevents connection queuing under burst traffic.
Default behavior: 1 worker, 1024 connections, select() syscall. That's a 1990s web server.
Benchmark gain: 8K → 28K req/s on a 4-core instance.
2. Keepalive Tuning
Keepalives eliminate TCP handshake overhead. The defaults are too conservative:
http {
keepalive_timeout 30; # was 75s — keep connections alive 30s
keepalive_requests 10000; # was 100 — allow 10K requests per connection
# For upstream (if using as reverse proxy)
upstream backend {
server 127.0.0.1:8080;
keepalive 32; # persistent connections to backend
keepalive_requests 10000;
keepalive_timeout 60s;
}
}
Why it matters: Without keepalive pooling to upstream, every request opens a new TCP connection to your backend. At 50K req/s, that's 50K TCP handshakes per second. With keepalive 32, you maintain a pool of 32 reusable connections.
Benchmark gain: Proxy throughput 2x improvement, backend CPU drops 30%.
3. Buffer and Timeout Optimization
http {
# Read buffers
client_body_buffer_size 128k;
client_max_body_size 10m;
client_header_buffer_size 1k;
large_client_header_buffers 4 4k;
# Timeouts
client_body_timeout 12;
client_header_timeout 12;
send_timeout 10;
# Output buffers
output_buffers 1 512k;
postpone_output 1460; # TCP packet size
# TCP optimizations
sendfile on;
tcp_nopush on;
tcp_nodelay on;
}
Key insight: tcp_nopush on + sendfile on enables the kernel's zero-copy file transfer — the file bytes never touch userspace memory. For static file serving, this alone can 3x throughput.
tcp_nodelay on disables Nagle's algorithm for low-latency API responses. Use both together.
4. Gzip Compression That Doesn't Kill CPU
http {
gzip on;
gzip_vary on;
gzip_proxied any;
gzip_comp_level 2; # NOT 6 (default) — sweet spot is 2
gzip_min_length 1000; # don't compress tiny responses
gzip_buffers 16 8k;
gzip_types
text/plain
text/css
text/javascript
application/javascript
application/json
application/xml
image/svg+xml;
}
The mistake everyone makes: gzip_comp_level 6 (default) uses 3x more CPU than level 2 for roughly 2% better compression. Level 2 is the sweet spot: 60–70% size reduction, minimal CPU overhead.
Don't compress: images (already compressed), small responses under 1KB (overhead exceeds benefit), binary files.
5. Caching Static Assets
server {
# Aggressive caching for versioned assets
location ~* \.(css|js|woff2|woff|ttf)$ {
expires 1y;
add_header Cache-Control "public, immutable";
access_log off;
}
# Moderate caching for images
location ~* \.(jpg|jpeg|png|gif|ico|svg|webp)$ {
expires 30d;
add_header Cache-Control "public";
access_log off;
}
# Open file cache
open_file_cache max=10000 inactive=30s;
open_file_cache_valid 60s;
open_file_cache_min_uses 2;
open_file_cache_errors on;
}
Open file cache: This is often missed. Every static file request normally does open() + stat() syscalls. open_file_cache caches the file descriptor, avoiding those syscalls for frequently accessed files. At 50K req/s on static assets, this is measurable.
6. Security Headers Without Performance Impact
http {
# Security headers (add once in http block, not per server)
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
# Hide nginx version
server_tokens off;
# Limit request methods
if ($request_method !~ ^(GET|HEAD|POST|PUT|DELETE|OPTIONS)$) {
return 405;
}
}
These add negligible overhead and prevent a class of attacks. server_tokens off hides your Nginx version — no reason to advertise your attack surface.
7. Rate Limiting to Protect Backend
http {
# Define zones
limit_req_zone $binary_remote_addr zone=api:10m rate=30r/s;
limit_req_zone $binary_remote_addr zone=login:10m rate=5r/m;
server {
location /api/ {
limit_req zone=api burst=50 nodelay;
limit_req_status 429;
}
location /auth/login {
limit_req zone=login burst=3 nodelay;
limit_req_status 429;
}
}
}
Why this is a performance feature: Without rate limiting, a spike or bot scrape can saturate your backend. Rate limiting at the Nginx layer drops excess requests before they consume backend resources. Your backend sees a flat, predictable load.
The Full Benchmark Results
On a 4-core, 8GB DigitalOcean droplet ($24/mo):
| Config | Req/s | P99 Latency | CPU % | |--------|-------|-------------|-------| | Default | 8,200 | 48ms | 95% | | + worker/epoll | 28,100 | 14ms | 78% | | + keepalive | 51,300 | 8ms | 62% | | + sendfile/tcp | 71,400 | 6ms | 55% | | + all optimizations | 103,800 | 4ms | 61% |
12x improvement on the same hardware, same monthly bill.
Monitoring Your Nginx
Enable the stub_status module to track key metrics:
server {
listen 127.0.0.1:8080;
location /nginx_status {
stub_status on;
access_log off;
allow 127.0.0.1;
deny all;
}
}
Then scrape with Prometheus nginx-exporter, or just:
curl http://127.0.0.1:8080/nginx_status
# Active connections: 847
# accepts: 1234567 handled: 1234567 requests: 4893021
# Reading: 2 Writing: 48 Waiting: 797
Watch Active connections and Waiting — high waiting with low writing means clients are holding connections. High writing means you're serving actively. If handled < accepts, you have connection drops — check worker_rlimit_nofile.
The Takeaway
Default Nginx is not optimized for your server. Five configuration changes — worker tuning, keepalives, buffer optimization, sendfile, and gzip at level 2 — consistently deliver 5–15x throughput improvements without touching your application code or upgrading hardware.
For a $40/mo server, that means you can handle traffic that would otherwise require a $400/mo cluster. That's infrastructure optimization with direct cost impact.
The config above is battle-tested across systems handling 50M+ daily requests. Copy it, adjust for your workload, and benchmark before/after.