CI/CD is the skill that unlocks everything else in DevOps. Once you can ship code automatically — with confidence that broken code never reaches production — your entire engineering velocity changes.
This guide builds a production-ready pipeline from scratch using GitHub Actions. Not a toy example — the same patterns I use on systems with 50+ deployments per day.
What We're Building
A pipeline that:
- Triggers on every push to `main` and every PR
- Runs tests and code quality checks
- Builds and pushes a Docker image to a registry
- Deploys to production only when tests pass on `main`
- Rolls back automatically on failed health checks
The File Structure
```
.github/
└── workflows/
    ├── ci.yml      # Tests on every push/PR
    └── deploy.yml  # Deploy on main branch merge
```
Step 1: The CI Workflow (Tests + Build)
```yaml
# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: ['*']
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    name: Test
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: testpass
          POSTGRES_DB: testdb
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
      redis:
        image: redis:7
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 6379:6379

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'

      - name: Install dependencies
        run: pip install -r requirements.txt -r requirements-dev.txt

      - name: Run linter
        run: ruff check .

      - name: Run type checker
        run: mypy app/

      - name: Run tests
        env:
          DATABASE_URL: postgresql://postgres:testpass@localhost:5432/testdb
          REDIS_URL: redis://localhost:6379
          SECRET_KEY: test-secret-key-not-for-production
        run: |
          pytest tests/ \
            --cov=app \
            --cov-report=xml \
            --cov-fail-under=80 \
            -v

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          fail_ci_if_error: false

  build:
    name: Build Image
    runs-on: ubuntu-latest
    needs: test
    permissions:
      contents: read
      packages: write   # required to push to GHCR with GITHUB_TOKEN
    outputs:
      image: ${{ steps.meta.outputs.tags }}
      digest: ${{ steps.build.outputs.digest }}

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=sha-
            type=ref,event=branch
            type=semver,pattern={{version}}

      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
```
Key details:
- Service containers (Postgres, Redis) spin up alongside your tests, so you get real integration tests, not mocks
- `cache-from: type=gha` uses the GitHub Actions cache for Docker layer caching; build times drop 60–80% after the first run
- The image is only pushed to the registry on non-PR events (saves registry costs and clutter)
- `needs: test` ensures the build only runs after tests pass
Step 2: The Deploy Workflow
```yaml
# .github/workflows/deploy.yml
name: Deploy

on:
  workflow_run:
    workflows: [CI]
    types: [completed]
    branches: [main]

env:
  REGISTRY: ghcr.io

jobs:
  deploy:
    name: Deploy to Production
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    environment:
      name: production
      url: https://yourapp.com

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Get image tag from CI run
        id: get-image
        run: |
          # Get the SHA of the triggering commit
          SHA="${{ github.event.workflow_run.head_sha }}"
          IMAGE="${{ env.REGISTRY }}/${{ github.repository }}:sha-${SHA:0:7}"
          echo "image=$IMAGE" >> "$GITHUB_OUTPUT"

      - name: Deploy to server
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.PROD_HOST }}
          username: ${{ secrets.PROD_USER }}
          key: ${{ secrets.PROD_SSH_KEY }}
          script: |
            # Keep the current container around for rollback
            docker rename app app-previous || true
            docker stop app-previous || true

            # Pull and start the new image
            docker pull ${{ steps.get-image.outputs.image }}
            docker run -d \
              --name app \
              --restart unless-stopped \
              -p 8000:8000 \
              -e DATABASE_URL="${{ secrets.DATABASE_URL }}" \
              -e SECRET_KEY="${{ secrets.SECRET_KEY }}" \
              ${{ steps.get-image.outputs.image }}

            # Health check with retry
            for i in {1..12}; do
              if curl -sf http://localhost:8000/health; then
                echo "Health check passed"
                docker rm app-previous || true
                exit 0
              fi
              echo "Attempt $i failed, waiting 5s..."
              sleep 5
            done

            echo "Health check failed, rolling back"
            docker stop app && docker rm app
            docker start app-previous && docker rename app-previous app
            exit 1

      - name: Notify on failure
        if: failure()
        uses: 8398a7/action-slack@v3
        with:
          status: failure
          webhook_url: ${{ secrets.SLACK_WEBHOOK }}
```

Note two fixes that are easy to miss: `deploy.yml` must define `env.REGISTRY` itself (workflow env vars don't carry over from `ci.yml`), and the old container is renamed to `app-previous` before the new one starts, so the rollback branch actually has something to start.
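The `sha-${SHA:0:7}` tag computed in the deploy job has to match what `docker/metadata-action` produced in CI; its `type=sha` tag defaults to a short SHA, typically 7 characters. A quick sanity check of the bash substring expansion (the SHA value is illustrative):

```shell
# Bash substring expansion: first 7 characters of the full commit SHA
SHA="0123456789abcdef0123456789abcdef01234567"
TAG="sha-${SHA:0:7}"
echo "$TAG"   # sha-0123456
```

If the two sides ever disagree (for example, a custom short-SHA length in CI), the deploy job fails at `docker pull` with "manifest unknown" rather than deploying the wrong image, which is the safe failure mode.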
Step 3: Secrets Setup
In your GitHub repo → Settings → Secrets and variables → Actions, add:
```
PROD_HOST       # Production server IP/hostname
PROD_USER       # SSH username (e.g., ubuntu, deploy)
PROD_SSH_KEY    # Private SSH key (generate a deploy key)
DATABASE_URL    # Production database connection string
SECRET_KEY      # Application secret key
SLACK_WEBHOOK   # Optional: Slack notifications
CODECOV_TOKEN   # Optional: coverage reporting
```
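If you prefer the terminal to the web UI, the GitHub CLI can set these; a sketch assuming `gh` is installed and authenticated against the repo (all values are illustrative):

```shell
# Set deployment secrets from the command line
gh secret set PROD_HOST --body "203.0.113.10"
gh secret set PROD_USER --body "deploy"
gh secret set PROD_SSH_KEY < ~/.ssh/deploy_key    # reads the private key from stdin
gh secret set DATABASE_URL --body "postgresql://user:pass@db-host:5432/prod"
```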
Security best practice: Create a dedicated deploy user on your server with minimal permissions — only enough to run Docker commands. Never use root.
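Server-side, that deploy user takes only a few commands to set up; a sketch assuming an Ubuntu host with Docker already installed (the key filename `deploy_key` is illustrative):

```shell
# On your workstation: generate a dedicated deploy key pair
ssh-keygen -t ed25519 -f ~/.ssh/deploy_key -C "github-deploy" -N ""

# On the server (as an admin): create the deploy user with Docker access
sudo useradd -m -s /bin/bash deploy
sudo usermod -aG docker deploy        # docker group membership, no full sudo

# Authorize the public key for SSH logins
sudo -u deploy mkdir -p /home/deploy/.ssh
cat deploy_key.pub | sudo tee -a /home/deploy/.ssh/authorized_keys
sudo chmod 700 /home/deploy/.ssh
sudo chmod 600 /home/deploy/.ssh/authorized_keys
sudo chown -R deploy:deploy /home/deploy/.ssh
```

Be aware that membership in the `docker` group is effectively root-equivalent on most hosts, so treat the deploy key as highly sensitive even though the user has no sudo.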
Step 4: The Dockerfile That Works With This Pipeline
```dockerfile
FROM python:3.12-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

FROM python:3.12-slim

# curl is needed for the HEALTHCHECK below (slim images don't ship it)
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

# Create non-root user
RUN useradd -m -u 1000 app
WORKDIR /app

# Copy dependencies from builder
COPY --from=builder /root/.local /home/app/.local

# Copy application
COPY --chown=app:app . .

USER app
ENV PATH=/home/app/.local/bin:$PATH
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

CMD ["gunicorn", "app.main:app", "-b", "0.0.0.0:8000", "-w", "4"]
```
The multi-stage build keeps the final image small. The non-root user is a security requirement, not optional. The HEALTHCHECK hits the same `/health` endpoint the deploy script polls over SSH, so Docker's own view of container health agrees with the rollback check.
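Both the HEALTHCHECK and the deploy script assume the app actually serves `/health`. As a minimal sketch, here is such an endpoint as a plain WSGI callable; the gunicorn command above serves any WSGI callable named `app`, though your real `app/main.py` will look different:

```python
# app/main.py (sketch): a WSGI callable with a /health endpoint
import json

def app(environ, start_response):
    """Respond 200 on /health so container and deploy checks pass."""
    if environ.get("PATH_INFO") == "/health":
        body = json.dumps({"status": "ok"}).encode()
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

A production health check should go further and probe critical dependencies (a `SELECT 1` against the database, a Redis `PING`) so the rollback logic catches bad configuration, not just a dead process.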
Advanced: Matrix Testing
Test against multiple Python/Node versions:
```yaml
jobs:
  test:
    strategy:
      matrix:
        python-version: ['3.11', '3.12']
        os: [ubuntu-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
```
Advanced: Reusable Workflows
Once you have multiple repos, extract common steps:
```yaml
# .github/workflows/reusable-test.yml
on:
  workflow_call:
    inputs:
      python-version:
        required: false
        type: string
        default: '3.12'

jobs:
  test:
    # ... same test steps
```
Call it from any repo:
```yaml
jobs:
  test:
    uses: your-org/shared-workflows/.github/workflows/reusable-test.yml@main
    with:
      python-version: '3.12'
    secrets: inherit
```
The Hidden Cost of Bad CI/CD
Companies without proper CI/CD typically experience:
- 2–5 production outages per month from manual deployment errors
- 30–60 min deployment process requiring engineer attention
- "Works on my machine" bugs reaching production
With this pipeline:
- Deployments are automatic and take 4–6 minutes unattended
- Broken code is caught before it reaches main
- Rollback is automatic if production health checks fail
That's SRE-level reliability from a weekend of setup. The engineering time saved compounds every sprint.
Monitoring Your Pipeline
GitHub gives you basic analytics, but also track:
- Build time trend: should stay under 8 minutes for most apps
- Test flakiness rate: tests that fail randomly destroy trust in CI
- Deployment frequency: healthy teams ship 1–10x/day to production
- Change failure rate: % of deployments causing incidents
These are the DORA metrics. Track them to measure engineering team health — not just individual productivity.
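Two of these metrics fall out of data you already have in your deployment history. A sketch, using hypothetical deployment records, of deployment frequency and change failure rate:

```python
# Sketch: compute two DORA metrics from a list of deployment records
from datetime import date

deployments = [  # hypothetical data: (deploy date, caused an incident?)
    (date(2024, 5, 1), False),
    (date(2024, 5, 1), True),
    (date(2024, 5, 2), False),
    (date(2024, 5, 3), False),
]

# Deploys per day over the observed window (inclusive of both endpoints)
days = (max(d for d, _ in deployments) - min(d for d, _ in deployments)).days + 1
frequency = len(deployments) / days

# Fraction of deployments that caused an incident
failure_rate = sum(failed for _, failed in deployments) / len(deployments)

print(f"{frequency:.1f} deploys/day, {failure_rate:.0%} change failure rate")
```

In practice you would pull the records from the GitHub deployments API or your incident tracker rather than a hardcoded list; the arithmetic stays the same.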
A working CI/CD pipeline is the foundation everything else in DevOps is built on. Get this right and every other automation becomes easier.