1. You deployed an Ansible playbook, but it fails on some hosts while working on others. How would you debug and resolve this?
Steps to debug:
- Run `ansible-playbook -vvv playbook.yml` to enable verbose output for more detailed logs.
- Check the error message and the details of the failed task.
- Run `ansible all -m ping` to verify host connectivity.
- Run `ansible all -m setup` to gather system facts and detect OS or configuration differences between hosts.
- Verify the inventory configuration and ensure host groups are correctly assigned.
- Re-run the playbook against an affected host only using `--limit <hostname>`, and add `--step` to confirm each task interactively.
Resolution:
- If the issue is environment-specific, modify the playbook to handle different OS versions or missing dependencies using `when` conditions.
- Fix missing permissions by using `become: yes` where required.
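As a rough illustration of the resolution step, the sketch below branches on the OS family fact and escalates privileges. The group name `webservers` and the `nginx` package are placeholders, not something taken from a real environment.

```yaml
# Minimal sketch: "webservers" and "nginx" are illustrative placeholders.
- name: Install a package on hosts with different OS families
  hosts: webservers
  become: true          # escalate privileges for package installation
  tasks:
    - name: Install nginx on Debian-family hosts
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true
      when: ansible_facts['os_family'] == "Debian"

    - name: Install nginx on RedHat-family hosts
      ansible.builtin.dnf:
        name: nginx
        state: present
      when: ansible_facts['os_family'] == "RedHat"
```

Running it with `--limit <hostname>` against one failing host is a quick way to confirm that the condition matches the facts reported by `ansible <hostname> -m setup`.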
2. Your Ansible playbook takes too long to execute. How would you optimize it for faster performance?
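Optimization techniques:
- Raise `forks` (default 5) in `ansible.cfg` or with `-f` so tasks run on more hosts in parallel. Note that `serial` limits the batch size for rolling updates; it does not make a run faster.
- Use fact caching (`fact_caching = jsonfile` together with `fact_caching_connection` in `ansible.cfg`) to avoid gathering facts repeatedly.
- Use `async` and `poll` for long-running tasks to execute asynchronously.
- Reduce SSH overhead by setting `pipelining = True` in `ansible.cfg`.
- Optimize loops by using `loop` (or the older `with_items`) instead of writing separate tasks.
- Use `when` conditions to skip tasks that are not needed; `changed_when` only controls how a result is reported, and `check_mode` is meant for previews rather than speed.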
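The `async` and `poll` technique could look roughly like the following. The script path is hypothetical, and the timeout values are arbitrary choices for illustration.

```yaml
# Minimal sketch: /usr/local/bin/long_job.sh is a hypothetical script.
- name: Run a long job without blocking the play
  hosts: all
  gather_facts: false   # skip fact gathering when facts are not needed
  tasks:
    - name: Start the job in the background
      ansible.builtin.command: /usr/local/bin/long_job.sh
      async: 1800       # allow the job up to 30 minutes
      poll: 0           # do not wait here; move on immediately
      register: long_job

    - name: Wait for the job to finish
      ansible.builtin.async_status:
        jid: "{{ long_job.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 60       # 60 checks x 30 seconds matches the async limit
      delay: 30
```

Other tasks can be placed between the two shown here, so the hosts do useful work while the job runs.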
3. You need to ensure that Ansible applies configurations only if there’s a change. How would you implement this?
- Use idempotent modules like `copy`, `template`, `lineinfile`, and `file`, which apply changes only when required.
- Use `changed_when` in shell/command tasks to detect actual changes.
- Use `notify` handlers to trigger actions only when a task reports a change.
- Run with `--check` (or set `check_mode: yes` on a task) to dry-run and see changes before applying them.
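A minimal sketch of these points combined is shown below. The service name `myapp`, the template file, the script path, and the `migrated` marker in its output are all assumptions made for the example.

```yaml
# Minimal sketch: "myapp", app.conf.j2 and migrate.sh are hypothetical names.
- name: Apply configuration only when it changes
  hosts: webservers
  become: true
  tasks:
    - name: Render the application config (idempotent module)
      ansible.builtin.template:
        src: app.conf.j2
        dest: /etc/myapp/app.conf
        mode: "0644"
      notify: Restart myapp   # fires only if the rendered file changed

    - name: Run a script that reports its own status
      ansible.builtin.command: /usr/local/bin/migrate.sh
      register: migrate
      # Assumes the script prints "migrated" only when it changed something.
      changed_when: "'migrated' in migrate.stdout"

  handlers:
    - name: Restart myapp
      ansible.builtin.service:
        name: myapp
        state: restarted
```

On a second run with an unchanged template, the template task reports `ok` and the handler is not triggered.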
4. After an Ansible update, your playbooks start failing due to deprecated modules. How do you handle this situation?
Steps to resolve:
- Identify deprecated modules by running `ansible-playbook --check -vvv playbook.yml` and reading the deprecation warnings in the output.
- Check the Ansible release notes and porting guides for alternative modules.
- Modify playbooks to use recommended replacements (e.g., replacing `ec2` with `amazon.aws.ec2_instance`).
- Test the updated playbook in a non-production environment before deploying.
- Pin the working Ansible version using `pip install ansible==<working-version>` if immediate migration isn’t possible.
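As an illustration of the replacement step, a task that once used `ec2` might be rewritten with the fully qualified collection name along these lines. The instance name, region, AMI ID, and key name are placeholders, and the `amazon.aws` collection is assumed to be installed.

```yaml
# Minimal sketch: every value below is a placeholder.
- name: Launch an instance with the collection module
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Ensure the instance is running
      amazon.aws.ec2_instance:
        name: demo-web-01
        region: us-east-1
        instance_type: t3.micro
        image_id: ami-0123456789abcdef0   # placeholder AMI ID
        key_name: demo-key
        state: running
```

Parameter names differ between the old and new modules, so each migrated task is worth checking against the module documentation rather than renaming the module alone.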
5. Your Ansible playbook is executing but not making the expected changes to remote servers. How would you troubleshoot?
- Run the playbook with `--check --diff` to preview changes before applying.
- Check `register` variables to validate task outputs.
- Ensure the `changed_when` condition is correctly set.
- Verify correct `become` privileges are applied.
- Check inventory and variables using `ansible-inventory --graph` (add `--vars` to include variables).
- Review logs using `-vvv` for deeper insights.
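Inspecting a registered result might look like the sketch below. The service name `myapp` is hypothetical.

```yaml
# Minimal sketch: "myapp" is a hypothetical service name.
- name: Inspect what a task actually returned
  hosts: all
  tasks:
    - name: Query the service state
      ansible.builtin.command: systemctl is-active myapp
      register: svc_status
      changed_when: false   # a read-only query never changes the host
      failed_when: false    # keep going even if the service is inactive

    - name: Show the full registered result
      ansible.builtin.debug:
        var: svc_status
```

The debug output includes `rc`, `stdout`, and `stderr`, which usually makes it clear whether a task ran the command you expected on the host you expected.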
6. How would you design an Ansible architecture to scale across multiple regions with minimal overhead?
Key considerations:
- Use Ansible Tower/AWX to centralize execution and manage multiple environments.
- Implement a dynamic inventory for cloud-based scaling (AWS, Azure, GCP).
- Use `delegate_to` and `local_action` to run tasks that do not need the remote host on the control node, which avoids extra SSH connections.
- Leverage pull-based execution using `ansible-pull` for distributed control.
- Use fact caching and log aggregation for efficient execution.
- Optimize networking by using regional jump hosts or bastion servers.
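For the dynamic inventory point, an AWS inventory plugin file could be sketched as follows. The regions and the `Role` tag are assumptions, and the `amazon.aws` collection is assumed to be installed.

```yaml
# inventory/aws_ec2.yml  (the file name must end in aws_ec2.yml or aws_ec2.yaml)
# Minimal sketch: regions and the "Role" tag are illustrative.
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - eu-west-1
filters:
  instance-state-name: running   # ignore stopped instances
keyed_groups:
  - key: placement.region        # creates groups such as region_us_east_1
    prefix: region
  - key: tags.Role               # creates groups from the Role tag
    prefix: role
```

`ansible-inventory -i inventory/aws_ec2.yml --graph` shows the resulting groups, which can then be targeted per region.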
7. You need to ensure zero downtime while applying Ansible playbooks in production. What approach would you take?
- Use rolling updates with `serial: <N>` to update a few hosts at a time.
- Implement blue-green deployment by switching traffic between two environments.
- Use canary deployment, applying changes to a subset before full rollout.
- Ensure health checks pass before restarting critical services.
- Use the `reboot` module with pre- and post-reboot checks to prevent downtime.
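A rolling update with a health check might be sketched like this. The package and service name `myapp`, port 8080, and the `/health` path are assumptions for illustration.

```yaml
# Minimal sketch: "myapp", port 8080 and /health are hypothetical.
- name: Rolling update, two hosts at a time
  hosts: webservers
  become: true
  serial: 2                  # batch size for the rolling update
  max_fail_percentage: 0     # stop the rollout as soon as any host fails
  tasks:
    - name: Upgrade the application package
      ansible.builtin.package:
        name: myapp
        state: latest

    - name: Restart the service
      ansible.builtin.service:
        name: myapp
        state: restarted

    - name: Wait until the health endpoint answers
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 6
      delegate_to: localhost   # probe from the control node
      become: false
```

Because the health check runs inside each batch, a failing batch halts the play before the remaining hosts are touched.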
8. An Ansible playbook is stuck waiting for user input, causing automation failures. How would you fix it?
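- Check for options or tasks that require interactive input: drop `--ask-pass` and `--ask-become-pass` from automated runs and rely on SSH keys and non-interactive privilege escalation instead.
- Use `no_log: true` to keep sensitive values (passwords, secrets) out of the logs; it does not suppress prompts, so supply those values non-interactively.
- Pass required values via extra vars (`-e "var_name=value"`).
- Replace prompts in the playbook with variables that have default values to avoid manual input.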
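One way to replace a prompt with a defaulted variable is sketched below. The variable names `requested_version` and `app_version` and the default value are made up for the example.

```yaml
# Minimal sketch: variable names and the default value are hypothetical.
- name: Run without prompts
  hosts: all
  gather_facts: false
  vars:
    # Falls back to 1.0.0 when nothing is passed on the command line.
    app_version: "{{ requested_version | default('1.0.0') }}"
  tasks:
    - name: Show the version that will be deployed
      ansible.builtin.debug:
        msg: "Deploying version {{ app_version }}"
```

An automated run can override the default with `ansible-playbook playbook.yml -e "requested_version=2.3.1"`.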
9. Your Ansible deployment in a hybrid cloud environment is failing due to network latency. What strategies can you use?
- Use multiple inventory sources for better region-based execution.
- Implement Ansible Tower/AWX to manage execution closer to target regions.
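- Tune `forks` and `serial` to control how many hosts are contacted at once, and cut per-task SSH overhead with pipelining and persistent connections (ControlPersist).
- Use `ansible-pull` for edge deployments to reduce network overhead.
- Optimize playbook logic to minimize repeated connections.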
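Connection tuning for a high-latency group could be expressed as group variables, roughly as follows. The group name `remote_region` and the specific values are assumptions; suitable numbers depend on the actual network.

```yaml
# group_vars/remote_region.yml
# Minimal sketch: the group name and all values are illustrative.
ansible_ssh_pipelining: true   # fewer SSH round trips per task
ansible_ssh_args: "-C -o ControlMaster=auto -o ControlPersist=300s"   # reuse connections longer
ansible_ssh_retries: 3         # retry transient connection failures
```

Pipelining can conflict with `requiretty` in sudoers on some hosts, so it is worth testing on one host before enabling it for a whole region.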
10. You need to apply a security patch across 1,000+ servers using Ansible while ensuring rollback in case of failure. How would you do this?
Steps for safe deployment:
- Take a backup before applying changes using the `fetch` or `copy` module.
- Deploy in batches using `serial: <N>` to minimize impact.
- Verify patch installation with a test command before moving forward.
- Rollback strategy:
  - Use `when` conditions (or a `block` with `rescue`) to apply the rollback if verification fails.
  - Store a snapshot of critical files (`tar` backup).
  - Revert to previous packages if necessary (`yum history undo`, or reinstalling the earlier version with `apt-get install <package>=<version>`).
- Monitor & alert: Report failures immediately, for example from a `rescue` section; `notify` handlers run only when a task reports a change, not when it fails.
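The batch, verify, and rollback flow could be sketched with `block` and `rescue` as below. The paths, the `openssl` package, and the verification script are hypothetical, and the sketch restores files only; reverting the package itself would need an extra step.

```yaml
# Minimal sketch: paths, package and verify_patch.sh are hypothetical.
- name: Patch in batches with a file-level rollback
  hosts: all
  become: true
  serial: 50                  # patch 50 hosts per batch
  max_fail_percentage: 10     # abort if more than 10% of a batch fails
  tasks:
    - name: Patch, verify, and roll back on failure
      block:
        - name: Snapshot critical files before patching
          ansible.builtin.command:
            cmd: tar -czf /var/backups/myapp-pre-patch.tgz /etc/myapp
            creates: /var/backups/myapp-pre-patch.tgz

        - name: Apply the security update
          ansible.builtin.package:
            name: openssl
            state: latest

        - name: Verify the patch
          ansible.builtin.command: /usr/local/bin/verify_patch.sh
          changed_when: false

      rescue:
        - name: Restore the snapshot
          ansible.builtin.unarchive:
            src: /var/backups/myapp-pre-patch.tgz
            dest: /
            remote_src: true

        - name: Mark the host as failed so the batch limit applies
          ansible.builtin.fail:
            msg: "Patch failed on {{ inventory_hostname }}; snapshot restored."
```

A notification task could be added to the `rescue` section so that failures are reported as soon as they happen.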