Introduction
Ansible is the automation Swiss Army knife. It configures servers, deploys applications, orchestrates multi-tier rollouts, and patches operating systems — all with YAML playbooks and no agents required on target hosts.
In 2026, Ansible remains essential for SRE teams that manage hybrid infrastructure — cloud VMs, on-prem bare metal, and network devices — from a single control plane. This guide covers production patterns with dynamic inventories, AWX, and idempotent design.
Dynamic Inventories: Stop Hardcoding Hosts
Static inventory files do not scale. When EC2 instances come and go or Kubernetes pods reschedule, your playbook should discover targets dynamically:
# ansible.cfg
[defaults]
inventory = ./inventory.aws_ec2.yml

# inventory.aws_ec2.yml
plugin: aws_ec2
regions:
  - us-east-1
  - eu-west-1
filters:
  tag:Environment: production
  instance-state-name: running
keyed_groups:
  - key: tags.Role
    prefix: role
  - key: tags.Service
    prefix: service
This inventory groups EC2 instances by Role and Service tags — role_web, service_payments — so playbooks target them by group name. No IP addresses ever appear in a file.
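As a rough sketch of how a playbook might consume these groups, the play below targets only hosts that belong to both role_web and service_payments. The group names follow from the keyed_groups prefixes above; the play itself is illustrative and assumes SSH access to the discovered hosts is already configured.

```yaml
# Illustrative play (not part of the original setup): target hosts by generated group name.
- name: Check reachability of production payment web hosts
  hosts: "role_web:&service_payments"  # ":&" selects the intersection of both groups
  gather_facts: false
  tasks:
    - name: Confirm Ansible can connect and find Python on each discovered host
      ansible.builtin.ping:
```

Using role_web alone as the pattern would target every web host regardless of service.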
For GCP:
plugin: google.cloud.gcp_compute
projects:
  - my-project
filters:
  - status = RUNNING
keyed_groups:
  - key: labels.service
    prefix: service
Because the inventory is resolved at the start of every run, your playbook targets the instances that are running at that moment, never a stale list that still includes terminated ones.
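Discovery can also return nothing, for example when a tag is misspelled or the credentials lack access to a region, and a play whose pattern matches no hosts is skipped rather than failed. One possible guard, sketched here under the assumption that the role_web group from the AWS example should never be empty, is a first play that makes the run exit with an error instead:

```yaml
# Sketch: abort early when dynamic discovery produced no web hosts.
- name: Validate dynamic inventory before making changes
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Require at least one host in role_web
      ansible.builtin.assert:
        that:
          - groups['role_web'] | default([]) | length > 0
        fail_msg: "Dynamic inventory returned no hosts for role_web"
```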
Idempotency: The First Law of Ansible
An Ansible playbook should be idempotent: running it twice leaves the system in the same state as running it once. Declaring the desired state (state: present, state: started) rather than the steps to reach it is what makes this possible:
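- name: Ensure Nginx is installed and running
  hosts: role_web
  become: true  # package, config, and service changes need root
  tasks:
    - name: Install Nginx
      apt:
        name: nginx
        state: present
        update_cache: true
    - name: Ensure Nginx config
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        owner: root
        mode: "0644"
      notify: reload nginx
    - name: Ensure Nginx is running
      service:
        name: nginx
        state: started
        enabled: true
  handlers:
    # Runs only when the template task reports a change
    - name: reload nginx
      service:
        name: nginx
        state: reloaded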
apt with state: present checks whether Nginx is already installed before doing anything. template renders the file and compares it with what is on the host, rewriting it only when the content differs. service with state: started starts Nginx only if it is not already running. Run this playbook 100 times and it makes changes only on the first run (or when the rendered template changes).
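Modules such as command and shell cannot infer desired state on their own, so idempotency has to be declared on the task. A minimal sketch, assuming a hypothetical DH parameter file path and that openssl and nginx are available on the target hosts:

```yaml
# Sketch: making command-style tasks idempotent (file path is a hypothetical example).
- name: Idempotent command tasks
  hosts: role_web
  become: true
  tasks:
    - name: Generate DH parameters only if the file does not exist yet
      ansible.builtin.command:
        cmd: openssl dhparam -out /etc/nginx/dhparam.pem 2048
        creates: /etc/nginx/dhparam.pem  # task is skipped once this file exists

    - name: Validate the Nginx configuration without reporting a change
      ansible.builtin.command:
        cmd: nginx -t
      changed_when: false  # a read-only check never counts as "changed"
```

With creates and changed_when in place, a second run reports no changes for either task.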
AWX: Self-Service Automation
AWX (the open-source upstream of the automation controller in Red Hat Ansible Automation Platform, formerly Ansible Tower) provides a web UI and REST API for running Ansible playbooks. It removes the "only the SRE team can run Ansible" bottleneck:
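# Installs the AWX Operator; replace <tag> with a release tag from the awx-operator repository
kubectl apply -k "github.com/ansible/awx-operator/config/default?ref=<tag>"
Once the operator is running and an AWX instance has been created from its custom resource, create a Job Template that non-SRE users can run: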
# AWX Job Template
name: Provision Staging Environment
playbook: provision_environment.yml
inventory: aws_ec2
credentials: [aws_access_key, vault_password]
survey:
  - question: Service Name
    type: text
    required: true
  - question: Environment
    type: multiple_choice
    choices: [staging, qa]
A developer opens AWX, selects the template, fills in the survey, and clicks Launch. The playbook runs with credentials the SRE team has stored in AWX, and AWX records who launched it and what changed. No shell access is required.
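The same template can also be kept in version control rather than clicked together in the UI. The sketch below uses the job_template module from the awx.awx collection. It assumes the collection is installed, that connection details for AWX are supplied the way your version of the collection expects, and that the organization, project, inventory, and credentials already exist. The organization, project, and survey variable names are hypothetical, and the survey_spec layout follows the AWX survey format as I understand it, so check it against your AWX version.

```yaml
# Sketch: manage the job template as code with the awx.awx collection.
- name: Define the self-service job template
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Create or update the Provision Staging Environment template
      awx.awx.job_template:
        name: Provision Staging Environment
        organization: Default               # hypothetical organization name
        project: infrastructure-playbooks   # hypothetical project name
        playbook: provision_environment.yml
        inventory: aws_ec2
        credentials:
          - aws_access_key
          - vault_password
        survey_enabled: true
        survey_spec:
          name: Provision survey
          description: Inputs collected from the developer at launch
          spec:
            - question_name: Service Name
              variable: service_name        # hypothetical extra-var name
              type: text
              required: true
            - question_name: Environment
              variable: target_env          # hypothetical extra-var name
              type: multiplechoice
              choices: "staging\nqa"
              required: true
        state: present
```

Because the module is declarative, rerunning the play updates the template in place instead of creating a duplicate.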
Ansible vs Terraform: When to Use Which
| Tool | Best For | State Management |
|---|---|---|
| Ansible | Configuration management, app deployment | Procedural, no state file |
| Terraform | Cloud resource provisioning | State file, declarative |
| Ansible + Terraform | Provision infra with Terraform, configure with Ansible | Terraform owns the state file and hands off to Ansible |
The two are complementary, not competitive. Use Terraform to create an EC2 instance and Ansible to configure it. Use Ansible to patch 500 servers — Terraform is not designed for ongoing configuration management.
Ansible Vault: Secrets in Playbooks
Secrets belong in Ansible Vault, not in plaintext variables:
ansible-vault create group_vars/production/vault.yml
# Paste: db_password: "super-secret-value"
Reference in playbooks:
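- name: Configure database
  hosts: db
  vars_files:
    - group_vars/production/vault.yml
  tasks:
    - name: Set database password
      community.mysql.mysql_user:
        name: app
        password: "{{ db_password }}"
        state: present
      no_log: true  # keep the decrypted password out of task output and logs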
Run with --ask-vault-pass or store the vault password in AWX credentials.
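A layout some teams prefer, and one the Ansible documentation also describes, keeps variable names visible in a plaintext file while the values stay encrypted. The sketch below assumes a vault_ prefix convention and a second file, vars.yml, neither of which appears in the setup above:

```yaml
# group_vars/production/vars.yml (plaintext, committed as-is)
# Tasks keep referring to db_password; only the value lives in the vault.
db_password: "{{ vault_db_password }}"

# group_vars/production/vault.yml would then hold, before encryption:
# vault_db_password: "super-secret-value"
```

With this split, searching the repository for db_password still shows where the variable is defined, without exposing the secret.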
Production Patterns
- Use roles: reusable, shareable collections of tasks. One nginx role, used across 40 services.
- Use check_mode and diff before applying changes: ansible-playbook --check --diff site.yml
- Limit blast radius with --limit: ansible-playbook site.yml --limit 'role_web[0:4]' (a canary rollout on the first five hosts of the group)
- Use serial for rolling updates: serial: 2 updates 2 hosts at a time and, by default, stops the run if every host in a batch fails. Set max_fail_percentage for a stricter threshold.
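Several of these patterns can be combined in one play. The following is a sketch rather than a drop-in playbook: the nginx role and the /healthz endpoint are hypothetical, and the health check assumes each web host serves HTTP on port 80.

```yaml
# Sketch: rolling update that applies a role two hosts at a time.
- name: Rolling Nginx update
  hosts: role_web
  become: true
  serial: 2                 # process the group in batches of two hosts
  max_fail_percentage: 0    # abort the run as soon as any host in a batch fails
  roles:
    - role: nginx           # hypothetical role name
  post_tasks:
    - name: Wait for the health endpoint before the next batch starts
      ansible.builtin.uri:
        url: http://localhost/healthz   # hypothetical endpoint, queried on the target host
        status_code: 200
      register: health
      retries: 5
      delay: 3
      until: health.status == 200
```

Running it with --limit 'role_web[0:4]' first would confine the change to the canary hosts.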
For teams building full infrastructure pipelines, combine Ansible's configuration management with the infrastructure provisioning in our Platform Engineering guide, which covers Crossplane and self-service IDPs.
Ansible is not the newest tool, but it is still the most reliable way to ensure 500 servers are configured identically — every time.