CI/CD Pipeline With GitHub Actions: The Complete Production Setup

Most GitHub Actions tutorials show you "Hello World" pipelines that don't survive contact with real production requirements. I've built CI/CD systems that deploy 50+ times a day across 3 environments, with zero manual intervention.

This is the production-ready setup. Copy-paste the configs, adapt to your stack.

What We're Building

A pipeline that:

Runs tests on every PR
Builds and scans Docker images
Deploys to staging automatically on merge to main
Deploys to production with a manual approval gate
Rolls back automatically if health checks fail

Repository Structure

.github/
  workflows/
    ci.yml          # runs on every PR
    deploy-staging.yml   # runs on merge to main
    deploy-prod.yml      # manual trigger (workflow_dispatch)

Step 1: The CI Workflow (Pull Requests)

# .github/workflows/ci.yml
name: CI

on:
  pull_request:
    branches: [main, develop]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run linter
        run: npm run lint

      - name: Run tests
        run: npm test -- --coverage

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}

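  build:
    runs-on: ubuntu-latest
    needs: test
    permissions:
      contents: read
      security-events: write   # needed by upload-sarif
    steps:
      - uses: actions/checkout@v4

      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .

      - name: Run Trivy security scan
        # Pin this to a release tag or commit SHA for production use
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'
          limit-severities-for-sarif: true   # apply the severity filter to SARIF output too
          exit-code: '1'                     # fail the job when matching CVEs are found

      - name: Upload Trivy scan results
        if: always()                         # upload the report even when the scan fails
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'

This CI pipeline runs on every PR. It blocks the merge when tests fail or when Trivy finds CRITICAL or HIGH CVEs, provided the test and build checks are marked as required in branch protection. The exit-code: '1' setting is what makes the scan fail the job; without it Trivy only reports. The scan results are uploaded to the repository's Security tab.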

Step 2: Build and Push to Registry

# .github/workflows/deploy-staging.yml (first job)
name: Deploy to Staging

on:
  push:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}

    steps:
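      - uses: actions/checkout@v4

      - name: Set up Docker Buildx   # the gha cache backend needs a Buildx builder
        uses: docker/setup-buildx-action@v3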

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=sha-
            type=ref,event=branch
            type=raw,value=latest,enable={{is_default_branch}}

      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

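The cache-from/cache-to: type=gha settings store build layers in the GitHub Actions cache, which can cut Docker build time by 60–80% on repeat builds when most layers are unchanged. The image-digest output is passed to the deploy job so that it references the exact image that was built.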

Step 3: Deploy to Staging

  deploy-staging:
    runs-on: ubuntu-latest
    needs: build-and-push
    environment: staging

    steps:
      - uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/setup-kubectl@v3

      - name: Set kubeconfig
        run: |
          echo "${{ secrets.STAGING_KUBECONFIG }}" | base64 -d > /tmp/kubeconfig
          echo "KUBECONFIG=/tmp/kubeconfig" >> $GITHUB_ENV

      - name: Deploy to staging
        run: |
          kubectl set image deployment/api \
            api=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-and-push.outputs.image-digest }} \
            -n staging

      - name: Wait for rollout
        run: |
          kubectl rollout status deployment/api -n staging --timeout=5m

      - name: Run smoke tests
        run: |
          # wait for service to be ready
          sleep 10
          curl --fail https://staging.myapp.com/healthz || exit 1
          curl --fail https://staging.myapp.com/api/v1/status || exit 1

Deploying by image digest rather than by tag pins the exact image that was built: a tag can be repointed later, a digest cannot. The smoke test step fails the job if either endpoint returns an error. Note that it does not roll staging back on its own.
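A minimal sketch of how the digest-based reference could be assembled and checked inside a run step, assuming bash on the runner. The function name image_ref is hypothetical and not part of any tool. It also lowercases the repository name, because github.repository keeps the owner's original casing while image names must be lowercase.

```shell
# image_ref REGISTRY REPOSITORY DIGEST
# Prints REGISTRY/repository@DIGEST, or fails if DIGEST is not a sha256 digest.
# (Hypothetical helper for illustration.)
image_ref() {
  local registry="$1" repository="$2" digest="$3"

  # Refuse anything that is not a full sha256 digest, e.g. a tag passed by mistake.
  if ! printf '%s\n' "$digest" | grep -Eq '^sha256:[0-9a-f]{64}$'; then
    echo "not a sha256 digest: $digest" >&2
    return 1
  fi

  # Image repository names must be lowercase.
  repository="$(printf '%s' "$repository" | tr '[:upper:]' '[:lower:]')"

  printf '%s/%s@%s\n' "$registry" "$repository" "$digest"
}
```

In the deploy step this could be used as api="$(image_ref "$REGISTRY" "$IMAGE_NAME" "$DIGEST")", with the three values passed in through the step's env block.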

Step 4: Production Deploy With Manual Approval

# .github/workflows/deploy-prod.yml
name: Deploy to Production

on:
  workflow_dispatch:   # manual trigger
    inputs:
      image_tag:
        description: 'Image tag to deploy (e.g. sha-abc1234)'
        required: true

jobs:
  deploy-prod:
    runs-on: ubuntu-latest
    environment: production    # requires manual approval in GitHub UI

    steps:
      - uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/setup-kubectl@v3

      - name: Set kubeconfig
        run: |
          echo "${{ secrets.PROD_KUBECONFIG }}" | base64 -d > /tmp/kubeconfig
          echo "KUBECONFIG=/tmp/kubeconfig" >> $GITHUB_ENV

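      - name: Deploy to production
        env:
          # deploy-prod.yml defines no workflow-level env, so the image name is built here.
          # Passing the input through env also keeps it out of the shell script text.
          IMAGE: ghcr.io/${{ github.repository }}:${{ inputs.image_tag }}
        run: |
          kubectl set image deployment/api \
            api="$IMAGE" \
            -n production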

      - name: Wait for rollout
        run: kubectl rollout status deployment/api -n production --timeout=10m

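      - name: Production health check
        run: |
          sleep 15
          # Retry up to 5 times; exit non-zero if every attempt fails so the rollback step runs
          for i in {1..5}; do
            if curl --fail https://api.myapp.com/healthz; then
              exit 0
            fi
            sleep 5
          done
          echo "Health check failed after 5 attempts" >&2
          exit 1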

      - name: Notify Slack on success
        if: success()
        run: |
          curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
            -H 'Content-type: application/json' \
            --data '{"text":"✅ Production deploy successful: ${{ inputs.image_tag }}"}'

      - name: Auto-rollback on failure
        if: failure()
        run: |
          kubectl rollout undo deployment/api -n production
          curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
            -H 'Content-type: application/json' \
            --data '{"text":"🚨 Production deploy FAILED — rolled back automatically"}'

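The environment: production setting makes the job wait for approval, but only once you add required reviewers to the production environment under Settings → Environments. Without that protection rule the job runs immediately. With it, this is your manual gate.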
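The rollback step only fires if the health check step exits non-zero once every attempt has failed. A small retry helper makes that explicit. This is a sketch, assuming the default bash shell on the runner; the function name retry is hypothetical.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds. Returns 0 on the first success and
# 1 if all attempts fail, so the calling step fails too.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2

  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # No point sleeping after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done

  return 1
}
```

In the health check step this would be called as retry 5 5 curl --fail --silent --show-error https://api.myapp.com/healthz, after defining the function in the same run block.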

Step 5: Required Secrets Setup

In GitHub → Settings → Secrets and Variables → Actions:

STAGING_KUBECONFIG    # base64-encoded kubeconfig for staging cluster
PROD_KUBECONFIG       # base64-encoded kubeconfig for production cluster
SLACK_WEBHOOK         # Slack incoming webhook URL
CODECOV_TOKEN         # Coverage reporting

Generate base64 kubeconfig:

base64 -w 0 < ~/.kube/config   # -w 0 (no line wrapping) is GNU coreutils; on macOS use: base64 -i ~/.kube/config
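If you want one command that behaves the same on Linux and macOS, stripping newlines with tr avoids the GNU-only flag. The sketch below pairs it with the decode step the workflows use, so you can check the round trip locally before pasting the secret. Both function names are hypothetical.

```shell
# encode_kubeconfig FILE
# Prints FILE as single-line base64, ready to paste into a GitHub secret.
encode_kubeconfig() {
  base64 < "$1" | tr -d '\n'
}

# decode_kubeconfig ENCODED OUTFILE
# Reverses the encoding, the same way the workflow's "Set kubeconfig" step does,
# and restricts the file to the current user.
decode_kubeconfig() {
  printf '%s' "$1" | base64 -d > "$2"
  chmod 600 "$2"
}
```

For example, encode_kubeconfig ~/.kube/config prints the value to store as STAGING_KUBECONFIG or PROD_KUBECONFIG.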

Reusable Workflows — DRY Your CI/CD

If you have multiple repositories running the same pipeline, don't copy-paste. Create reusable workflows:

# .github/workflows/reusable-build.yml (in your shared-actions repo)
name: Reusable Build

on:
  workflow_call:
    inputs:
      image-name:
        required: true
        type: string
      registry:
        required: false
        type: string
        default: 'ghcr.io'
    secrets:
      registry-token:
        required: true
    outputs:
      image-digest:
        description: "Image digest"
        value: ${{ jobs.build.outputs.digest }}

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      digest: ${{ steps.build.outputs.digest }}
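    steps:
      - uses: actions/checkout@v4
      - name: Log in to registry   # uses the registry-token secret declared above
        uses: docker/login-action@v3
        with:
          registry: ${{ inputs.registry }}
          username: ${{ github.actor }}
          password: ${{ secrets.registry-token }}
      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ${{ inputs.registry }}/${{ inputs.image-name }}:${{ github.sha }}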

Call it from any repo:

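jobs:
  build:
    uses: myorg/shared-actions/.github/workflows/reusable-build.yml@main
    permissions:
      contents: read
      packages: write   # lets GITHUB_TOKEN push to ghcr.io
    with:
      image-name: myorg/my-service   # ghcr.io images live under an owner namespace
    secrets:
      registry-token: ${{ secrets.GITHUB_TOKEN }}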

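This pattern reduces drift between team pipelines. Consumers that reference @main pick up changes to the shared workflow on their next run. That includes breaking changes, so reference a tag or commit SHA instead if you want updates to be opt-in.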

Runner Cost Optimization

GitHub-hosted runners are free for public repos. Private repos get a monthly allowance of included minutes and are billed per minute beyond it. At scale this adds up fast.

Cancel outdated runs with concurrency:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true    # Cancels any in-progress run for the same branch

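On active teams this alone can cut runner minutes by 30–50%, because outdated PR builds are cancelled as soon as you push a new commit. Apply it to the CI workflow only: cancelling a deploy run mid-rollout skips the health check and rollback steps.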

Aggressive dependency caching:

- name: Cache npm downloads (~/.npm, not node_modules)
  uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-

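The restore-keys fallback gives you a partial cache hit even when package-lock.json changes, so npm only downloads the packages that actually changed. If you already set cache: 'npm' on actions/setup-node, as in Step 1, that action manages this cache for you and the explicit step is redundant.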
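The lookup order is easier to see in code. The sketch below is a simplified model of it, not the cache service's implementation: an exact match on key wins, otherwise the first cached entry that starts with the restore prefix is used. It assumes the cached keys are passed newest first; pick_cache is a hypothetical name.

```shell
# pick_cache KEY RESTORE_PREFIX CACHED_KEY...
# Prints "exact:<key>", "partial:<key>", or "miss" (and returns 1 on a miss).
pick_cache() {
  local key="$1" prefix="$2" candidate
  shift 2

  # 1. An exact match on the primary key always wins.
  for candidate in "$@"; do
    if [ "$candidate" = "$key" ]; then
      echo "exact:$candidate"
      return 0
    fi
  done

  # 2. Otherwise fall back to the first entry that starts with the restore prefix.
  for candidate in "$@"; do
    case "$candidate" in
      "$prefix"*)
        echo "partial:$candidate"
        return 0
        ;;
    esac
  done

  echo "miss"
  return 1
}
```

With a changed lockfile the primary key is new, so the lookup falls through to the Linux-npm- prefix and restores the most recent older cache.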

Matrix builds for parallel testing:

jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        node: [18, 20, 22]
    runs-on: ubuntu-latest
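    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci && npm test

Three Node.js versions are tested in parallel, so wall-clock time stays close to that of a single run. Billed minutes still triple, so keep the matrix to the versions you actually support.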

Common Mistakes to Avoid

Hardcoding credentials in workflow files: Never write password: mypassword. Always use ${{ secrets.MY_SECRET }}. GitHub scans for leaked secrets and will alert you, but the damage is already done by then.

Not pinning action versions:

# Bad: tracks a branch, so any new commit to the action can change or break your pipeline
uses: actions/checkout@main

# Good — pinned to a specific SHA
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2

Unpinned actions can change behavior on any release. For production pipelines, pin to a commit SHA — not even a tag, because tags can be moved.
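One way to keep this honest is a small check that lists every uses: line not pinned to a full 40-character commit SHA. This is a rough sketch based on grep, not a YAML parser; the function name find_unpinned is hypothetical, and local actions (paths starting with ./) are skipped.

```shell
# find_unpinned FILE...
# Prints each "uses:" line whose ref is not a 40-character commit SHA.
# Exit status is 0 when unpinned actions were found and 1 when none were.
find_unpinned() {
  # First grep: lines with a "uses:" key that do not point at a local ./ path.
  # Second grep: drop the ones already pinned to a full SHA.
  grep -nE '^[[:space:]-]*uses:[[:space:]]*[^.[:space:]]' "$@" \
    | grep -vE '@[0-9a-f]{40}([[:space:]]|$)'
}
```

Run as find_unpinned .github/workflows/*.yml. Because the exit status is 0 when something is found, invert it in CI, for example: if find_unpinned .github/workflows/*.yml; then exit 1; fi.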

Long-running CI blocking PRs: Anything over 5 minutes kills developer velocity. Split heavy integration tests into a separate scheduled workflow that runs nightly, not on every PR. Keep the PR pipeline fast: lint + unit tests + Docker build only.

Exposing secrets in logs:

# Always mask dynamic secrets
- name: Login to registry
  env:
    TOKEN: ${{ steps.get-token.outputs.token }}
  run: |
    echo "::add-mask::$TOKEN"
    # --password-stdin keeps the token out of the process list and shell history
    echo "$TOKEN" | docker login -u user --password-stdin

The ::add-mask:: command redacts the value from all subsequent log output.

Protecting the Main Branch

In GitHub → Settings → Branches → Add rule for main:

Require status checks: test, build
Require PR before merging
Require up-to-date branch before merging
Dismiss stale reviews

This ensures no code bypasses CI to reach production.

Key Takeaways

Use image digest (not tag) for production deployments — immutable and auditable
Trivy scanning with exit-code: '1' blocks images with CRITICAL or HIGH CVEs before they reach production
GitHub Environments provide manual approval gates for production deploys
Auto-rollback on health check failure eliminates most 3am wake-ups
Docker layer caching in GHA can reduce build times by 60–80% on repeat builds

Conclusion

This pipeline handles everything from PR checks to production rollback automatically. The manual approval gate for production means you control the timing, not the pipeline. Once set up, deployments become boring — and boring is exactly what production should be.

Next: Zero-Downtime Kubernetes Rolling Deployments — The Full Guide

Published: 2026-04-15 | Category: DevOps | Read time: 10 min