Zero-downtime deployments shouldn't be a luxury for large teams. With GitHub Actions and Kubernetes, you can have a production-grade deployment pipeline in an afternoon. Here's the exact workflow I use — battle-tested across dozens of production services.
The Goal
Every merge to main should:
- Build a Docker image with a deterministic tag
- Run tests in parallel (fail fast)
- Push to container registry
- Deploy to Kubernetes with zero downtime
- Verify the deployment succeeded
- Auto-rollback if health checks fail
All within 5–8 minutes.
The Workflow File
# .github/workflows/deploy.yml
name: Build and Deploy

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test -- --coverage --passWithNoTests
      - name: Lint
        run: npm run lint

  build:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=sha-,format=short
            type=ref,event=branch
            type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' }}
      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Set up kubectl
        uses: azure/setup-kubectl@v3
        with:
          version: 'v1.28.0'
      - name: Configure kubeconfig
        run: |
          mkdir -p $HOME/.kube
          echo "${{ secrets.KUBECONFIG }}" | base64 -d > $HOME/.kube/config
          chmod 600 $HOME/.kube/config
      - name: Deploy to Kubernetes
        run: |
          IMAGE_TAG="sha-$(echo ${{ github.sha }} | cut -c1-7)"
          kubectl set image deployment/app \
            app=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:$IMAGE_TAG \
            --namespace=production
          kubectl annotate deployment/app \
            kubernetes.io/change-cause="Deploy $IMAGE_TAG from commit ${{ github.sha }}" \
            --namespace=production \
            --overwrite
      - name: Wait for rollout
        run: |
          kubectl rollout status deployment/app \
            --namespace=production \
            --timeout=300s
      - name: Smoke test
        run: |
          # Hit the health endpoint after deploy
          sleep 10
          ENDPOINT="${{ secrets.APP_HEALTH_URL }}"
          STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$ENDPOINT")
          if [ "$STATUS" != "200" ]; then
            echo "Health check failed with status $STATUS — rolling back"
            kubectl rollout undo deployment/app --namespace=production
            exit 1
          fi
          echo "Deployment healthy — status $STATUS"
The Kubernetes Deployment Config
Your Kubernetes deployment must be configured correctly for zero-downtime rollouts to work:
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: production
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never take pods down before new ones are ready
      maxSurge: 1         # one extra pod during transition
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: app
          image: ghcr.io/your-org/your-app:latest  # placeholder; the pipeline overwrites this with a SHA tag via kubectl set image
          ports:
            - containerPort: 3000
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "512Mi"
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health/live
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]
      terminationGracePeriodSeconds: 30
Why These Settings Matter
maxUnavailable: 0 — Kubernetes will not kill an old pod until its replacement passes readiness checks, so serving capacity never drops below the replica count during a rollout. This is what makes the rollout zero-downtime on the Kubernetes side; graceful shutdown (below) handles the app side.
readinessProbe — The new pod only receives traffic once /health/ready returns 200. Your app should return 503 until it's fully initialized (DB connections established, caches warmed).
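What that looks like depends on your stack. Here is a minimal sketch with Node's built-in http module, where initDependencies() is a hypothetical stand-in for whatever your app does at startup (opening DB pools, warming caches):

// health.js: sketch of a readiness-aware health server (assumes port 3000, as in the manifest)
const http = require('http');

let ready = false;

// Stand-in for real startup work; replace with your own initialization
async function initDependencies() {
  return new Promise((resolve) => setTimeout(resolve, 2000));
}

const server = http.createServer((req, res) => {
  if (req.url === '/health/ready') {
    res.writeHead(ready ? 200 : 503);   // 503 until startup work is done
    return res.end(ready ? 'ready' : 'starting');
  }
  if (req.url === '/health/live') {
    res.writeHead(200);                 // alive as long as the event loop responds
    return res.end('ok');
  }
  res.writeHead(200);
  res.end('hello');
});

server.listen(3000, async () => {
  await initDependencies();
  ready = true;                         // readiness probe starts passing here
});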
preStop sleep 5 — When a pod terminates, Kubernetes removes it from the service endpoints and sends SIGTERM in parallel, not in sequence, so traffic can still arrive for a moment after the process is told to shut down. The 5-second sleep keeps the process alive and serving while kube-proxy and load balancers catch up.
terminationGracePeriodSeconds: 30 — Gives your app 30 seconds to finish handling requests after SIGTERM before a hard kill.
Manual Rollback
If automated rollback fails or you need to roll back manually:
# See rollout history (shows the change-cause annotations)
kubectl rollout history deployment/app --namespace=production
# Roll back to previous version
kubectl rollout undo deployment/app --namespace=production
# Roll back to a specific revision
kubectl rollout undo deployment/app --namespace=production --to-revision=5
# Watch the rollback progress
kubectl rollout status deployment/app --namespace=production
With this setup, rolling back to the previous version takes under 60 seconds.
GitHub Environment Protection Rules
In GitHub repo settings, create an environment called production with:
- Required reviewers — require 1 approval for production deploys (optional but recommended)
- Deployment branches — only allow main
- Wait timer — 0 minutes (don't slow down routine deploys)
The environment: production line in the workflow triggers these checks.
Secrets to Configure
In GitHub → Settings → Secrets:
| Secret | Value |
|--------|-------|
| KUBECONFIG | Base64-encoded kubeconfig with deploy permissions |
| APP_HEALTH_URL | Full URL to your health check endpoint |
Generate the kubeconfig secret (pbcopy is macOS-only; on Linux, pipe to xclip or redirect to a file instead):
cat ~/.kube/config | base64 | pbcopy
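"Deploy permissions" can be narrower than a full admin kubeconfig: this pipeline only patches deployments and reads rollout state. A sketch of a namespace-scoped Role covering the kubectl set image, annotate, and rollout commands above; the name deployer is illustrative, and you may need to widen the verbs if your deploy does more:

# k8s/deployer-role.yaml: sketch; bind this to the service account behind the CI kubeconfig
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployer
  namespace: production
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "patch", "update"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]   # rollout status/undo read ReplicaSet history
    verbs: ["get", "list", "watch"]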
Build Time Benchmarks
With this setup and GitHub Actions cache for Docker layers:
- Test job: ~2 min
- Build + push: ~3 min (cold), ~90s (warm cache)
- Deploy + verify: ~2 min
- Total: ~5–7 minutes from merge to production
That's fast enough that you can deploy multiple times per day without friction.
Common Pitfalls
Probes too aggressive. If the liveness probe's initialDelaySeconds is too short and your app takes 20 seconds to start, Kubernetes kills and restarts it in a loop. An over-eager readiness probe won't restart anything, but it will keep the pod out of rotation and stall the rollout. Give your app breathing room.
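If startup time varies too much to guess, a startupProbe is the safer tool: Kubernetes suspends liveness and readiness checks until it succeeds. A sketch to sit alongside the probes in the manifest above:

# Sketch: add under the same container as the readiness/liveness probes
startupProbe:
  httpGet:
    path: /health/ready
    port: 3000
  periodSeconds: 5
  failureThreshold: 12   # 12 x 5s = up to 60s to come up before a restart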
Not draining connections. Containers need to handle SIGTERM gracefully — stop accepting new connections, finish existing ones, then exit. Most frameworks support this natively (Node.js: server.close(), Go: the http.Server Shutdown method); see the sketch below.
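In Node, a minimal sketch of that shutdown path; the 25-second safety timeout is an assumption, chosen to beat the 30-second grace period:

// shutdown.js: sketch of graceful SIGTERM handling, paired with the preStop sleep above
const http = require('http');

let ready = true;
const server = http.createServer((req, res) => {
  if (req.url === '/health/ready') {
    res.writeHead(ready ? 200 : 503);
    return res.end();
  }
  res.end('ok');
});
server.listen(3000);

process.on('SIGTERM', () => {
  ready = false;                         // readiness probe starts failing, so no new traffic is routed here
  server.close(() => process.exit(0));   // stop accepting connections, let in-flight requests finish
  // Safety net: exit on our own terms before the 30s grace period triggers SIGKILL
  setTimeout(() => process.exit(1), 25000).unref();
});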
Image tag latest in production. Always deploy specific SHA-based tags. latest is non-deterministic and kills your ability to roll back to a known version. (The workflow above still pushes a latest tag as a convenience alias, but every deploy pins the SHA tag.)
Smoke test too fast. The sleep 10 after deploy gives load balancers time to notice the new endpoints. Remove it and you might hit old pods during the smoke test.
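A fixed sleep is a guess; polling with retries is more robust. A sketch of a drop-in body for the Smoke test step, assuming up to 12 attempts at 5-second intervals:

# Sketch: poll the health endpoint instead of sleeping a fixed 10 seconds
ENDPOINT="${{ secrets.APP_HEALTH_URL }}"
for i in $(seq 1 12); do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$ENDPOINT")
  if [ "$STATUS" = "200" ]; then
    echo "Healthy after $i attempt(s)"
    exit 0
  fi
  sleep 5
done
echo "Health check never passed (last status: $STATUS)"
kubectl rollout undo deployment/app --namespace=production
exit 1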
The core insight: zero-downtime deployment is a property of correct configuration, not clever code. Get the readiness probes and rolling update strategy right, and Kubernetes handles the rest.