Most GitHub Actions tutorials show you "Hello World" pipelines that don't survive contact with real production requirements. I've built CI/CD systems that deploy 50+ times a day across 3 environments, with zero manual intervention.
This is the production-ready setup. Copy-paste the configs, adapt to your stack.
What We're Building
A pipeline that:
- Runs tests on every PR
- Builds and scans Docker images
- Deploys to staging automatically on merge to main
- Deploys to production with a manual approval gate
- Rolls back automatically if health checks fail
Repository Structure
.github/
  workflows/
    ci.yml              # runs on every PR
    deploy-staging.yml  # runs on merge to main
    deploy-prod.yml     # manual trigger (workflow_dispatch)
Step 1: The CI Workflow (Pull Requests)
# .github/workflows/ci.yml
name: CI

on:
  pull_request:
    branches: [main, develop]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run linter
        run: npm run lint

      - name: Run tests
        run: npm test -- --coverage

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}

  build:
    runs-on: ubuntu-latest
    needs: test
    permissions:
      contents: read
      security-events: write  # required by upload-sarif
    steps:
      - uses: actions/checkout@v4

      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .

      - name: Run Trivy security scan
        uses: aquasecurity/trivy-action@master  # pin to a release tag or commit SHA in production
        with:
          image-ref: myapp:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'
          limit-severities-for-sarif: 'true'  # apply the severity filter to SARIF output too
          exit-code: '1'  # fail the job when matching vulnerabilities are found

      - name: Upload Trivy scan results
        if: always()  # upload the report even when the scan fails the job
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'
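This CI pipeline runs on every PR that targets main or develop. A failing job blocks the merge only once test and build are marked as required status checks in branch protection, and the Trivy step needs a non-zero exit-code setting to fail the job on findings. The Trivy scan uploads its results to the GitHub Security tab.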
Step 2: Build and Push to Registry
# .github/workflows/deploy-staging.yml (first job)
name: Deploy to Staging

on:
  push:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4

      # The gha cache backend needs a Buildx builder; the default docker driver cannot export to it
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=sha-
            type=ref,event=branch
            type=raw,value=latest,enable={{is_default_branch}}

      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
The cache-from/cache-to settings with type=gha store Docker layers in the GitHub Actions cache, which can cut Docker build time by 60–80% on repeat builds. The image digest output is what the deploy job uses for immutable deployments.
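To make the digest idea concrete, here is a minimal sketch of how an immutable image reference can be assembled in a shell step. The image_ref helper is hypothetical (it is not part of the workflow above). It also lowercases the repository name, because registries reject uppercase repository names while github.repository keeps whatever casing your org and repo use.

```shell
# image_ref REGISTRY REPOSITORY DIGEST
# Hypothetical helper: prints registry/repository@sha256:... for use with
# kubectl set image. Runs in a subshell so its variables do not leak.
image_ref() (
  registry=$1
  # Registries require lowercase repository names (MyOrg/MyApp is rejected).
  repository=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  digest=$3

  # Refuse anything that is not a sha256 digest, such as a mutable tag.
  case $digest in
    sha256:*) ;;
    *) echo "not a sha256 digest: $digest" >&2; exit 1 ;;
  esac

  printf '%s/%s@%s\n' "$registry" "$repository" "$digest"
)
```

In a deploy step you would call it with the registry, the repository name, and the digest output of the build job, then pass the result to kubectl set image.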
Step 3: Deploy to Staging
  deploy-staging:
    runs-on: ubuntu-latest
    needs: build-and-push
    environment: staging
    steps:
      - uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/setup-kubectl@v3

      - name: Set kubeconfig
        run: |
          echo "${{ secrets.STAGING_KUBECONFIG }}" | base64 -d > /tmp/kubeconfig
          echo "KUBECONFIG=/tmp/kubeconfig" >> $GITHUB_ENV

      - name: Deploy to staging
        run: |
          kubectl set image deployment/api \
            api=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-and-push.outputs.image-digest }} \
            -n staging

      - name: Wait for rollout
        run: |
          kubectl rollout status deployment/api -n staging --timeout=5m

      - name: Run smoke tests
        run: |
          # wait for service to be ready
          sleep 10
          curl --fail https://staging.myapp.com/healthz || exit 1
          curl --fail https://staging.myapp.com/api/v1/status || exit 1
Using the image digest (not tag) for deployment guarantees immutability. The smoke test step fails the deployment if the health endpoint doesn't respond.
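A fixed sleep followed by a single curl is fragile when the service needs a little longer to warm up. One way to harden it is a small retry wrapper. This is a sketch under my own assumptions: the retry function and its argument order are hypothetical, not something GitHub Actions provides.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Exits 0 on the first success and 1 if every attempt failed.
retry() (
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      exit 0
    fi
    echo "attempt $n/$attempts failed: $*" >&2
    # No point in pausing after the final attempt.
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  exit 1
)
```

A smoke test step could then run retry 5 5 curl --fail --silent https://staging.myapp.com/healthz. The step fails only when all five attempts fail, which keeps the pass/fail signal honest.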
Step 4: Production Deploy With Manual Approval
# .github/workflows/deploy-prod.yml
name: Deploy to Production

on:
  workflow_dispatch:  # manual trigger
    inputs:
      image_tag:
        description: 'Image tag to deploy (e.g. sha-abc1234)'
        required: true

# env is not shared between workflow files, so define it here as well
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  deploy-prod:
    runs-on: ubuntu-latest
    environment: production  # approval gate once required reviewers are configured
    steps:
      - uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/setup-kubectl@v3

      - name: Set kubeconfig
        run: |
          echo "${{ secrets.PROD_KUBECONFIG }}" | base64 -d > /tmp/kubeconfig
          echo "KUBECONFIG=/tmp/kubeconfig" >> $GITHUB_ENV

      - name: Deploy to production
        run: |
          kubectl set image deployment/api \
            api=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ inputs.image_tag }} \
            -n production

      - name: Wait for rollout
        run: kubectl rollout status deployment/api -n production --timeout=10m

      - name: Production health check
        run: |
          sleep 15
          for i in {1..5}; do
            # succeed on the first healthy response
            curl --fail https://api.myapp.com/healthz && exit 0
            sleep 5
          done
          # every attempt failed: fail the step so the rollback step runs
          exit 1

      - name: Notify Slack on success
        if: success()
        run: |
          curl -X POST "${{ secrets.SLACK_WEBHOOK }}" \
            -H 'Content-type: application/json' \
            --data '{"text":"✅ Production deploy successful: ${{ inputs.image_tag }}"}'

      - name: Auto-rollback on failure
        if: failure()
        run: |
          kubectl rollout undo deployment/api -n production
          curl -X POST "${{ secrets.SLACK_WEBHOOK }}" \
            -H 'Content-type: application/json' \
            --data '{"text":"🚨 Production deploy FAILED: rolled back automatically"}'
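The environment: production setting ties the job to a GitHub Environment. Once you add required reviewers to that environment (Settings → Environments → production), the job pauses until one of them approves. This is your manual gate.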
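Because image_tag is typed in by hand, it is worth validating before it reaches kubectl. Interpolating an input straight into a run script is also a script injection risk. Below is a minimal sketch of a validator. The is_sha_tag name is hypothetical, and the 7 to 40 character range is my assumption, based on the sha-abc1234 example in the input description.

```shell
# is_sha_tag TAG
# Succeeds only for tags shaped like sha-<7 to 40 lowercase hex characters>.
is_sha_tag() {
  # Reject anything outside lowercase letters, digits, and hyphens first.
  # This also rules out spaces, newlines, and shell metacharacters.
  case $1 in
    *[!a-z0-9-]*) return 1 ;;
  esac
  printf '%s\n' "$1" | grep -Eq '^sha-[0-9a-f]{7,40}$'
}
```

To use it, pass the input through an env: entry on the step (for example IMAGE_TAG) and run is_sha_tag "$IMAGE_TAG" || exit 1 before the kubectl command. That way the value is never pasted into the script text itself.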
Step 5: Required Secrets Setup
In GitHub → Settings → Secrets and Variables → Actions:
STAGING_KUBECONFIG # base64-encoded kubeconfig for staging cluster
PROD_KUBECONFIG # base64-encoded kubeconfig for production cluster
SLACK_WEBHOOK # Slack incoming webhook URL
CODECOV_TOKEN # Coverage reporting
Generate the base64-encoded kubeconfig (the -w 0 flag is GNU coreutils; on macOS use base64 -i ~/.kube/config instead):
base64 -w 0 < ~/.kube/config
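A wrapped or truncated secret only shows up later as a confusing kubectl error, so a quick round-trip check before pasting the value into GitHub can save time. This is a sketch: encode_file and roundtrip_ok are hypothetical helper names, and the check assumes your base64 accepts -d for decoding, as the workflow steps above do.

```shell
# encode_file FILE
# Prints FILE as single-line base64. Stripping newlines with tr avoids
# relying on the GNU-only -w 0 flag.
encode_file() {
  base64 < "$1" | tr -d '\n'
}

# roundtrip_ok FILE
# Succeeds if decoding the encoded form reproduces FILE byte for byte.
roundtrip_ok() {
  encode_file "$1" | base64 -d | cmp -s - "$1"
}
```

Running roundtrip_ok ~/.kube/config && encode_file ~/.kube/config prints the value only when it decodes back to the original file.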
Reusable Workflows — DRY Your CI/CD
If you have multiple repositories running the same pipeline, don't copy-paste. Create reusable workflows:
# .github/workflows/reusable-build.yml (in your shared-actions repo)
name: Reusable Build

on:
  workflow_call:
    inputs:
      image-name:
        required: true
        type: string
      registry:
        required: false
        type: string
        default: 'ghcr.io'
    secrets:
      registry-token:
        required: true
    outputs:
      image-digest:
        description: "Image digest"
        value: ${{ jobs.build.outputs.digest }}

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4

      # Authenticate with the token the caller passes in; without this the push is rejected
      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: ${{ inputs.registry }}
          username: ${{ github.actor }}
          password: ${{ secrets.registry-token }}

      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ inputs.registry }}/${{ inputs.image-name }}:${{ github.sha }}
Call it from any repo:
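jobs:
  build:
    permissions:
      contents: read
      packages: write  # lets GITHUB_TOKEN push to ghcr.io
    uses: myorg/shared-actions/.github/workflows/reusable-build.yml@main
    with:
      image-name: myorg/my-service  # ghcr.io images live under an owner namespace
    secrets:
      registry-token: ${{ secrets.GITHUB_TOKEN }}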
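This pattern eliminates drift between team pipelines. Consumers that reference the workflow at @main pick up every change automatically. If you want controlled rollouts instead, reference a tag or commit SHA and bump it deliberately.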
Runner Cost Optimization
GitHub-hosted runners are billed per minute for private repositories once your plan's included minutes are used up (standard runners are free for public repositories). At scale this adds up fast.
Cancel outdated runs with concurrency:
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true  # cancels any in-progress run for the same branch
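On active teams this alone can cut runner minutes by 30–50%: outdated PR builds are cancelled automatically when you push a new commit. Leave cancel-in-progress off for deploy workflows, where cancelling a run midway can leave a rollout half-applied.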
Aggressive dependency caching:
- name: Cache npm downloads
  uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-
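The restore-keys fallback gives you a partial cache hit even when package-lock.json changes, avoiding a full re-download. Note that this step caches npm's download cache (~/.npm), not node_modules, and that the cache: 'npm' option on setup-node in Step 1 already does the same thing. Add an explicit cache step only when you need custom keys or paths.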
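If the lookup order is unclear, this small model of it may help: an exact key match wins, otherwise the newest cache whose key starts with the restore-keys prefix is used. This is only a sketch of the documented behavior. The pick_cache function is hypothetical and ignores details such as branch scoping.

```shell
# pick_cache KEY PREFIX
# Reads existing cache keys from stdin, newest first, and prints the one a
# restore would use. Exits non-zero on a complete miss.
pick_cache() (
  key=$1
  prefix=$2
  fallback=
  while IFS= read -r existing; do
    if [ "$existing" = "$key" ]; then
      printf '%s\n' "$existing"   # exact hit
      exit 0
    fi
    case $existing in
      "$prefix"*)
        # Remember only the first (newest) prefix match.
        if [ -z "$fallback" ]; then fallback=$existing; fi
        ;;
    esac
  done
  if [ -n "$fallback" ]; then
    printf '%s\n' "$fallback"     # partial hit via the prefix
    exit 0
  fi
  exit 1                          # miss: everything is downloaded again
)
```

After a lockfile change the exact key no longer exists, so the lookup falls through to the newest Linux-npm- entry and npm only has to fetch the packages that changed.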
Matrix builds for parallel testing:
jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        node: [18, 20, 22]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4  # npm ci needs the repository contents
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci && npm test
Three Node.js versions are tested in parallel: roughly the same wall-clock time as one, triple the coverage. Billed minutes triple as well, so keep the matrix to versions you actually support.
Common Mistakes to Avoid
Hardcoding credentials in workflow files:
Never write password: mypassword. Always use ${{ secrets.MY_SECRET }}. GitHub scans for leaked secrets and will alert you, but the damage is already done by then.
Not pinning action versions:
# Bad: tracks a branch, so any upstream commit can silently change behavior
uses: actions/checkout@main
# Good: pinned to a specific commit SHA
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
Unpinned actions can change behavior on any release. For production pipelines, pin to a commit SHA — not even a tag, because tags can be moved.
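Pinning is easy to forget, so a quick check helps. The sketch below lists every uses: line in a workflow file that references a remote action without a 40-character commit SHA. The unpinned_actions name is hypothetical, and the regex is a heuristic rather than a YAML parser: local ./ actions are skipped and everything else is flagged.

```shell
# unpinned_actions FILE
# Prints "line-number:content" for each remote action reference in FILE
# that is not pinned to a full commit SHA. Prints nothing if all are pinned.
unpinned_actions() {
  grep -En '^[[:space:]]*(-[[:space:]]*)?uses:[[:space:]]*[^.[:space:]]' "$1" \
    | grep -Ev '@[0-9a-f]{40}([[:space:]]|$)' \
    || true
}
```

Looping over .github/workflows/*.yml and failing when the output is non-empty turns this into a cheap lint step.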
Long-running CI blocking PRs:
Anything over 5 minutes kills developer velocity. Split heavy integration tests into a separate scheduled workflow that runs nightly, not on every PR. Keep the PR pipeline fast: lint + unit tests + Docker build only.
Exposing secrets in logs:
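# Always mask dynamic secrets before using them
- name: Login to registry
  env:
    TOKEN: ${{ steps.get-token.outputs.token }}
  run: |
    echo "::add-mask::$TOKEN"
    # --password-stdin keeps the token off the command line and out of the process list
    echo "$TOKEN" | docker login -u user --password-stdin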
The ::add-mask:: command redacts the value from all subsequent log output.
Protecting the Main Branch
In GitHub → Settings → Branches → Add rule for main:
- Require status checks: test, build
- Require PR before merging
- Require up-to-date branch before merging
- Dismiss stale reviews
This ensures no code bypasses CI to reach production.
Key Takeaways
- Deploy by image digest, or at least by a commit-specific sha- tag, never by a mutable tag like latest: immutable and auditable
- Trivy security scanning blocks dangerous images before they reach production
- GitHub Environments provide manual approval gates for production deploys
- Auto-rollback on health check failure eliminates most 3am wake-ups
- Docker layer caching in GHA reduces build times by 60–80%
Conclusion
This pipeline handles everything from PR checks to production rollback automatically. The manual approval gate for production means you control the timing, not the pipeline. Once set up, deployments become boring — and boring is exactly what production should be.
Published: 2026-04-15 | Category: DevOps | Read time: 10 min