1. You deployed an Ansible playbook, but it fails on some hosts while working on others. How would you debug and resolve this?
Steps to debug:
- Run `ansible-playbook -vvv playbook.yml` to enable verbose mode for more detailed logs.
- Check the error message and the details of the failed task.
- Run `ansible all -m ping` to verify host connectivity.
- Run `ansible all -m setup` to check system facts and detect OS or configuration differences.
- Verify the inventory configuration and ensure host groups are correctly assigned.
- Run the affected tasks in isolation using `--limit <hostname>`, and add `--step` to confirm each task interactively.
Resolution:
- If the issue is environment-specific, modify the playbook to handle different OS versions or missing dependencies using `when` conditions.
- Fix missing permissions by using `become: yes` where required.
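As a minimal sketch of the resolution above, the play below branches on the OS family fact and escalates privileges for package installation. The group name `webservers` and the package `nginx` are illustrative assumptions, not part of the original scenario.

```yaml
# Hypothetical play: "webservers" and "nginx" are placeholders.
- name: Install a web server across mixed OS families
  hosts: webservers
  become: true   # escalate privileges for package installation
  tasks:
    - name: Install nginx on Debian-family hosts
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true
      when: ansible_facts['os_family'] == "Debian"

    - name: Install nginx on RedHat-family hosts
      ansible.builtin.dnf:
        name: nginx
        state: present
      when: ansible_facts['os_family'] == "RedHat"
```

To reproduce a failure on one host, a run such as `ansible-playbook playbook.yml --limit <hostname> -vvv` narrows the output to that host.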
2. Your Ansible playbook takes too long to execute. How would you optimize it for faster performance?
Optimization techniques:
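- Raise `forks` in ansible.cfg so more hosts are processed in parallel, and consider `strategy: free` so fast hosts do not wait for slow ones. Note that `serial` limits the batch size for rolling updates; it does not add parallelism.
- Enable fact caching (`fact_caching = jsonfile` in ansible.cfg, together with `fact_caching_connection`) to avoid gathering facts repeatedly.
- Use `async` and `poll` for long-running tasks so they execute asynchronously.
- Reduce SSH overhead by setting `pipelining = True` in ansible.cfg.
- Optimize loops by using `loop` (or the older `with_items`) instead of separate tasks, and pass a list of names directly to package modules where supported.
- Use `when` conditions or the `creates`/`removes` arguments to skip tasks that have nothing to do, set `changed_when` so command tasks do not trigger handlers unnecessarily, and use check mode to see which tasks would change.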
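The play below is a rough sketch of several of these techniques together: the `free` strategy, a single package transaction instead of a loop, and an asynchronous task that is polled later. The script path and package names are hypothetical, and the timing values are assumptions to tune.

```yaml
- name: Faster play (illustrative)
  hosts: all
  strategy: free   # each host proceeds without waiting for the others
  tasks:
    - name: Install several packages in one transaction
      ansible.builtin.package:
        name:
          - git
          - curl
          - unzip
        state: present
      become: true

    - name: Start a long-running job in the background
      ansible.builtin.command: /usr/local/bin/long_job.sh   # hypothetical script
      async: 1800   # allow the job up to 30 minutes
      poll: 0       # do not block; check on it later
      register: long_job

    - name: Wait for the background job to finish
      ansible.builtin.async_status:
        jid: "{{ long_job.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 60
      delay: 30
```

Settings such as `forks`, `pipelining`, and fact caching live in ansible.cfg rather than in the playbook, so they are not shown here.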
3. You need to ensure that Ansible applies configurations only if there’s a change. How would you implement this?
- Use idempotent modules like `copy`, `template`, `lineinfile`, and `file`, which apply changes only when required.
- Use `changed_when` in `shell`/`command` tasks so they report a change only when one actually happened.
- Use `notify` with handlers to trigger actions only when a task reports a change.
- Use `check_mode: yes` on a task, or run the playbook with `--check`, to dry-run and see changes before applying them.
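A small sketch of these ideas, assuming a hypothetical template `nginx.conf.j2` and a hypothetical script that prints `updated` when it changes something:

```yaml
- name: Apply configuration only when it changes
  hosts: webservers   # hypothetical group
  become: true
  tasks:
    - name: Render nginx configuration
      ansible.builtin.template:
        src: nginx.conf.j2   # hypothetical template
        dest: /etc/nginx/nginx.conf
        mode: "0644"
      notify: Reload nginx   # fires only if the rendered file differs

    - name: Apply a setting through a script
      ansible.builtin.command: /usr/local/bin/set_flag.sh   # hypothetical script
      register: flag_result
      # Report "changed" only when the script says it updated something
      changed_when: "'updated' in flag_result.stdout"

  handlers:
    - name: Reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
```

Running this with `--check --diff` previews the template change without applying it.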
4. After an Ansible update, your playbooks start failing due to deprecated modules. How do you handle this situation?
Steps to resolve:
- Identify deprecated modules by running `ansible-playbook --check -vvv playbook.yml` and reviewing the deprecation warnings in the output.
- Check the Ansible release notes and porting guides for alternative modules.
- Modify playbooks to use the recommended replacements (e.g., replacing `ec2` with `amazon.aws.ec2_instance`).
- Test the updated playbook in a non-production environment before deploying.
- Pin the working Ansible version using `pip install ansible==<working-version>` if immediate migration isn’t possible.
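As an illustration of the `ec2` replacement mentioned above, the following sketch uses `amazon.aws.ec2_instance`. It assumes the `amazon.aws` collection and its Python dependencies are installed; every identifier (instance name, AMI ID, key name, subnet ID, region) is a placeholder.

```yaml
- name: Launch an instance with the collection module
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Create an EC2 instance (replacement for the legacy ec2 module)
      amazon.aws.ec2_instance:
        name: demo-instance                       # placeholder
        region: us-east-1                         # placeholder
        instance_type: t3.micro
        image_id: ami-0123456789abcdef0           # placeholder AMI ID
        key_name: demo-key                        # placeholder
        vpc_subnet_id: subnet-0123456789abcdef0   # placeholder
        state: running
```

Parameter names differ between the old and new modules, so each migrated task is worth checking against the collection documentation rather than renaming the module alone.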
5. Your Ansible playbook is executing but not making the expected changes to remote servers. How would you troubleshoot?
- Run the playbook with `--check --diff` to preview changes before applying.
- Check `register` variables to validate task outputs.
- Ensure the `changed_when` condition is correctly set.
- Verify that the correct `become` privileges are applied.
- Check inventory and variables using `ansible-inventory --graph`.
- Review logs using `-vvv` for deeper insights.
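One way to apply the `register` and `changed_when` checks is sketched below: register a task result, print it, then read the file back and assert on its content. The file path and the sysctl line are assumptions chosen for illustration.

```yaml
- name: Inspect what a task actually did
  hosts: all
  become: true
  tasks:
    - name: Ensure a sysctl line is present
      ansible.builtin.lineinfile:
        path: /etc/sysctl.d/99-custom.conf   # hypothetical file
        line: "vm.swappiness = 10"
        create: true
        mode: "0644"
      register: sysctl_line

    - name: Show the registered result (changed, msg, and so on)
      ansible.builtin.debug:
        var: sysctl_line

    - name: Read the file back to confirm the change landed
      ansible.builtin.command: cat /etc/sysctl.d/99-custom.conf
      register: sysctl_file
      changed_when: false   # read-only command, never a change

    - name: Assert the expected line is present
      ansible.builtin.assert:
        that:
          - "'vm.swappiness = 10' in sysctl_file.stdout"
```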
6. How would you design an Ansible architecture to scale across multiple regions with minimal overhead?
Key considerations:
- Use Ansible Tower/AWX to centralize execution and manage multiple environments.
- Implement a dynamic inventory for cloud-based scaling (AWS, Azure, GCP).
- Use `delegate_to` and `local_action` to reduce SSH connections.
- Leverage pull-based execution using `ansible-pull` for distributed control.
- Use fact caching and log aggregation for efficient execution.
- Optimize networking by using regional jump hosts or bastion servers.
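For the dynamic inventory point, a configuration along these lines is one possible starting place on AWS. It assumes the `amazon.aws` collection is installed and that the file name ends in `aws_ec2.yml`; the regions, cache path, and group prefix are illustrative.

```yaml
# inventory.aws_ec2.yml (hypothetical file name; must end in aws_ec2.yml or aws_ec2.yaml)
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - eu-west-1
filters:
  instance-state-name: running   # only include running instances
keyed_groups:
  # Creates one group per region, e.g. region_us_east_1
  - key: placement.region
    prefix: region
# Cache inventory results to avoid repeated API calls (paths are assumptions)
cache: true
cache_plugin: ansible.builtin.jsonfile
cache_connection: /tmp/aws_inventory_cache
cache_timeout: 300
```

Running `ansible-inventory -i inventory.aws_ec2.yml --graph` shows the resulting per-region groups, which plays can then target individually.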
7. You need to ensure zero downtime while applying Ansible playbooks in production. What approach would you take?
- Use rolling updates with `serial: <N>` to update a few hosts at a time.
- Implement blue-green deployment by switching traffic between two environments.
- Use canary deployment, applying changes to a subset before full rollout.
- Run health checks before and after restarting critical services.
- Use the `reboot` module with pre/post checks to prevent downtime.
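A rolling update with a health gate could look roughly like the sketch below. The package and service name `myapp`, the port, and the `/health` endpoint are hypothetical, and the batch size is an assumption.

```yaml
- name: Rolling update with a health gate
  hosts: webservers        # hypothetical group
  become: true
  serial: 2                # update two hosts at a time
  max_fail_percentage: 0   # stop the rollout if any host in a batch fails
  tasks:
    - name: Upgrade the application package
      ansible.builtin.package:
        name: myapp        # hypothetical package
        state: latest

    - name: Restart the service
      ansible.builtin.service:
        name: myapp
        state: restarted

    - name: Wait until the health endpoint returns 200
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/health"   # hypothetical endpoint
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 6
      delegate_to: localhost   # probe from the control node
      become: false
```

Because the play stops on the first failed batch, the remaining hosts keep serving traffic on the previous version.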
8. An Ansible playbook is stuck waiting for user input, causing automation failures. How would you fix it?
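- Check for tasks or connection settings that require interactive input. Avoid `--ask-pass` and `--ask-become-pass` in automation (keep `ask_pass = False` in ansible.cfg) and use SSH keys instead.
- Use `no_log: true` on tasks that handle passwords or secrets so the values stay out of logs; this hides output but does not suppress prompts.
- Pass required values via extra vars (`-e "var_name=value"`); a `vars_prompt` variable that is already defined this way is not prompted for.
- Give prompts in the playbook default values to avoid manual input.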
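The sketch below shows a prompt with a default value; the variable name `release_version` and its default are assumptions for illustration.

```yaml
- name: Deploy without blocking on interactive prompts
  hosts: all
  vars_prompt:
    - name: release_version   # hypothetical variable
      prompt: "Which release should be deployed?"
      default: "1.0.0"
      private: false
  tasks:
    - name: Show the chosen release
      ansible.builtin.debug:
        msg: "Deploying release {{ release_version }}"
```

Running it as `ansible-playbook deploy.yml -e "release_version=2.3.1"` supplies the value up front, so the prompt is skipped.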
9. Your Ansible deployment in a hybrid cloud environment is failing due to network latency. What strategies can you use?
- Use multiple inventory sources for better region-based execution.
- Implement Ansible Tower/AWX to manage execution closer to target regions.
- Batch hosts with `serial` and tune `forks` to control how many SSH connections run at once.
- Use `ansible-pull` for edge deployments to reduce network overhead.
- Optimize playbook logic to minimize repeated connections.
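Connection reuse can also be tuned per group for high-latency regions. The group variables below are a sketch under assumptions: the group name `remote_region` is hypothetical, the values are starting points, and pipelining requires that `requiretty` is not enforced in sudoers on the targets.

```yaml
# group_vars/remote_region.yml (hypothetical group name)
# Send module code over the existing SSH session instead of copying files first
ansible_ssh_pipelining: true
# Keep the SSH control connection open longer so later tasks reuse it
ansible_ssh_args: "-C -o ControlMaster=auto -o ControlPersist=300s"
# Allow more time for connections over slow links (seconds)
ansible_timeout: 30
```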
10. You need to apply a security patch across 1,000+ servers using Ansible while ensuring rollback in case of failure. How would you do this?
Steps for safe deployment:
- Take a backup before applying changes using the `fetch` or `copy` module.
- Deploy in batches using `serial: <N>` to minimize impact.
- Verify patch installation using a test command before moving forward.
- Rollback strategy:
  - Use `when` conditions (or a `block` with `rescue`) to apply the rollback if verification fails.
  - Store a snapshot of critical files (`tar` backup).
  - Revert to previous packages if necessary (`yum history undo` or `dpkg --remove`).
- Monitor & alert: Report failures immediately from a `rescue` section or a notification task. `notify` handlers run only when a task reports a change, so they suit post-change actions rather than failure alerts.
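The steps above can be sketched with a `block`/`rescue` pair. This is an outline under assumptions, not a complete patching workflow: the package name, backup path, batch size, and verification command are placeholders, and the package rollback shown covers only RedHat-family hosts.

```yaml
- name: Apply a security patch in batches with rollback
  hosts: all
  become: true
  serial: 50                # batch size is an assumption; tune for the fleet
  max_fail_percentage: 10   # abort the rollout if too many hosts fail
  vars:
    patch_package: openssl  # hypothetical package to patch
  tasks:
    - name: Patch with verification and rollback
      block:
        - name: Snapshot critical files before patching
          ansible.builtin.command: tar -czf /root/pre_patch_ssl.tar.gz -C / etc/ssl
          changed_when: true

        - name: Install the patched package
          ansible.builtin.package:
            name: "{{ patch_package }}"
            state: latest
          register: install_result

        - name: Verify the patched binary runs
          ansible.builtin.command: openssl version
          changed_when: false
      rescue:
        - name: Undo the last package transaction (RedHat family only)
          ansible.builtin.command: yum history undo last -y
          when: >-
            ansible_facts['os_family'] == "RedHat"
            and install_result is defined
            and install_result is changed

        - name: Restore the configuration snapshot
          ansible.builtin.command: tar -xzf /root/pre_patch_ssl.tar.gz -C /

        - name: Mark the host as failed so the batch limit applies
          ansible.builtin.fail:
            msg: "Patch failed on {{ inventory_hostname }}; rollback attempted"
```

The final `fail` task keeps a rolled-back host counted as failed, which lets `max_fail_percentage` stop the rollout before later batches are touched.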