1. You deployed an Ansible playbook, but it fails on some hosts while working on others. How would you debug and resolve this?
Steps to debug:
- Run `ansible-playbook -vvv playbook.yml` to enable verbose mode for more detailed logs.
- Check the error message and the details of the failed task.
- Run `ansible all -m ping` to verify host connectivity.
- Run `ansible all -m setup` to gather system facts and spot OS or configuration differences between hosts.
- Verify the inventory configuration and make sure hosts are assigned to the correct groups.
- Rerun the affected tasks on a single host with `--limit <hostname>`, adding `--step` to confirm each task interactively.
Resolution:
- If the issue is environment-specific, adapt the playbook to different OS versions or missing dependencies with `when` conditions.
- Fix missing permissions by adding `become: yes` where required.
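A minimal sketch of the `when` + `become` fix. It assumes a mixed fleet of RedHat- and Debian-family hosts and uses Apache as the package. Both are illustrative choices that do not come from the original scenario.

```yaml
- name: Install a web server on mixed Linux hosts (sketch)
  hosts: all
  become: true                      # escalate privileges; package installs need root
  tasks:
    - name: Install Apache on RedHat-family hosts
      ansible.builtin.yum:
        name: httpd
        state: present
      when: ansible_facts['os_family'] == "RedHat"

    - name: Install Apache on Debian-family hosts
      ansible.builtin.apt:
        name: apache2
        state: present
        update_cache: true
      when: ansible_facts['os_family'] == "Debian"
```

To reproduce a failure on one host first, run something like `ansible-playbook -vvv site.yml --limit web01 --step`. Here `web01` is a hypothetical hostname.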
2. Your Ansible playbook takes too long to execute. How would you optimize it for faster performance?
Optimization techniques:
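- Increase `forks` (default 5) in `ansible.cfg` or with `-f` so tasks run on more hosts in parallel. Note that `serial` does the opposite: it limits how many hosts are processed per batch. That helps rolling updates, not speed.
- Enable fact caching in `ansible.cfg` to avoid gathering facts on every run. Set `gathering = smart` together with `fact_caching = jsonfile` and a `fact_caching_connection` directory.
- Use `async` and `poll` to run long-running tasks asynchronously.
- Reduce SSH overhead by setting `pipelining = True` in the `[ssh_connection]` section of `ansible.cfg`.
- Replace repeated, near-identical tasks with a single task that uses `loop` (or the older `with_items`). For package modules, pass a list to `name` so everything installs in one transaction.
- Skip unnecessary work with `when` conditions or the `creates`/`removes` arguments of `command`/`shell`. `changed_when` only controls how a result is reported, and `check_mode` only simulates a run, so neither makes execution faster.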
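The sketch below illustrates two of these techniques: a package list in a single task, and a long-running job started with `async`/`poll: 0` and collected later with `async_status`. The group `webservers`, the package names, and `/usr/local/bin/rebuild-index.sh` are hypothetical placeholders.

```yaml
- name: Faster playbook run (sketch)
  hosts: webservers                 # hypothetical group
  become: true
  tasks:
    - name: Install several packages in one transaction instead of a loop
      ansible.builtin.package:
        name:
          - nginx
          - git
          - curl
        state: present

    - name: Start a long-running job without blocking the play
      ansible.builtin.command: /usr/local/bin/rebuild-index.sh   # hypothetical script
      async: 1800                   # allow up to 30 minutes
      poll: 0                       # fire and forget; check on it later
      register: rebuild_job

    - name: Wait for the background job to finish
      ansible.builtin.async_status:
        jid: "{{ rebuild_job.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 60
      delay: 30
```

Combine this with a higher fork count at run time, for example `ansible-playbook -f 20 site.yml`.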
3. You need to ensure that Ansible applies configurations only if there’s a change. How would you implement this?
- Use idempotent modules such as `copy`, `template`, `lineinfile`, and `file`, which apply changes only when required.
- Use `changed_when` in `shell`/`command` tasks so they report a change only when one actually happened.
- Use `notify` handlers to trigger actions (e.g., a service reload) only when a task reports a change.
- Use `check_mode: yes` on a task (or `--check` for the whole run), ideally with `--diff`, to dry-run and preview changes before applying them.
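A minimal sketch that combines an idempotent `template` task, a handler, and `changed_when` on a command. The template `nginx.conf.j2`, the script `/usr/local/bin/sync-users.sh`, and the assumption that the script prints "updated" when it changes something are all hypothetical.

```yaml
- name: Apply configuration only when it changes (sketch)
  hosts: webservers                 # hypothetical group
  become: true
  tasks:
    - name: Render nginx config; reports "changed" only if the content differs
      ansible.builtin.template:
        src: nginx.conf.j2          # hypothetical template
        dest: /etc/nginx/nginx.conf
      notify: Reload nginx          # handler runs only if this task changed something

    - name: Run a script and report a change only if it did something
      ansible.builtin.command: /usr/local/bin/sync-users.sh   # hypothetical script
      register: sync_result
      changed_when: "'updated' in sync_result.stdout"         # assumed output marker

  handlers:
    - name: Reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
```

Running the play a second time with no template changes should report `changed=0` for the template task, and the handler should not fire.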
4. After an Ansible update, your playbooks start failing due to deprecated modules. How do you handle this situation?
Steps to resolve:
- Identify deprecated modules from the deprecation warnings printed by `ansible-playbook --check -vvv playbook.yml`.
- Check the Ansible release notes and porting guides for replacement modules.
- Update playbooks to use the recommended replacements (e.g., replacing `ec2` with `amazon.aws.ec2_instance`).
- Test the updated playbook in a non-production environment before deploying.
- Pin the last working Ansible version with `pip install ansible==<working-version>` if immediate migration isn't possible.
5. Your Ansible playbook is executing but not making the expected changes to remote servers. How would you troubleshoot?
- Run the playbook with `--check --diff` to preview what would change before applying.
- Inspect `register`ed variables (e.g., with `debug`) to validate task outputs.
- Make sure any `changed_when` condition is correct and isn't masking real changes.
- Verify that the correct `become` privileges are applied.
- Check inventory groups and variables with `ansible-inventory --graph --vars`.
- Review the `-vvv` output for deeper insight.
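A sketch of the `register` and `become` checks. The host `app01`, the file `/etc/app/app.conf`, and the setting it manages are hypothetical.

```yaml
- name: Check what tasks actually did (sketch)
  hosts: app01                      # hypothetical host
  become: true
  tasks:
    - name: Ensure a setting is present
      ansible.builtin.lineinfile:
        path: /etc/app/app.conf     # hypothetical file
        regexp: '^max_connections='
        line: 'max_connections=200'
      register: conf_result

    - name: Show the full result (changed flag, messages)
      ansible.builtin.debug:
        var: conf_result

    - name: Check which user tasks run as
      ansible.builtin.command: whoami
      register: whoami_result
      changed_when: false           # read-only check; never report a change

    - name: Fail early if privilege escalation is not in effect
      ansible.builtin.assert:
        that: whoami_result.stdout == "root"
        fail_msg: "become is not applied; tasks run as {{ whoami_result.stdout }}"
```

Pair it with `ansible-playbook site.yml --check --diff --limit app01`, and with `ansible-inventory --host app01` to see the variables that host actually receives.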
6. How would you design an Ansible architecture to scale across multiple regions with minimal overhead?
Key considerations:
- Use Ansible Tower/AWX to centralize execution and manage multiple environments.
- Implement a dynamic inventory for cloud-based scaling (AWS, Azure, GCP).
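- Use `delegate_to` and `local_action` (with `run_once` where appropriate) to run cloud API or coordination tasks once from the control node instead of on every host.
- Use pull-based execution with `ansible-pull` for distributed control.
- Use fact caching to speed up runs and log aggregation to centralize troubleshooting.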
- Optimize networking by using regional jump hosts or bastion servers.
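A dynamic inventory can group hosts by region so plays target one region at a time. This sketch assumes AWS and the `amazon.aws.aws_ec2` inventory plugin. The file name and region list are hypothetical. The plugin expects the file name to end in `aws_ec2.yml`.

```yaml
# inventory/prod.aws_ec2.yml  (hypothetical path; name must end in aws_ec2.yml)
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - eu-west-1
filters:
  instance-state-name: running      # only include running instances
keyed_groups:
  # Creates one group per region, e.g. aws_region_us_east_1
  - key: placement.region
    prefix: aws_region
```

Inspect the result with `ansible-inventory -i inventory/prod.aws_ec2.yml --graph`, then target a region with `--limit aws_region_eu_west_1`.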
7. You need to ensure zero downtime while applying Ansible playbooks in production. What approach would you take?
- Use rolling updates with `serial: <N>` to update a few hosts at a time.
- Implement blue-green deployment by switching traffic between two environments.
- Use canary deployment, applying changes to a subset before full rollout.
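- Run health checks (e.g., with `uri` or `wait_for`) before moving on to the next batch and before restarting critical services.
- Use the `reboot` module with pre/post checks, combined with `serial`, so only one batch is out of service at a time.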
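A sketch of a rolling reboot with a post-reboot health check. It assumes the hosts sit behind a load balancer that stops routing to unhealthy nodes; draining hosts from the balancer is not shown. The group name and the `/health` endpoint on port 8080 are hypothetical.

```yaml
- name: Rolling reboot with health checks (sketch)
  hosts: webservers                 # hypothetical group behind a load balancer
  become: true
  serial: 2                         # at most two hosts out of service at a time
  max_fail_percentage: 0            # stop the rollout if any host in a batch fails
  tasks:
    - name: Reboot and wait until the host accepts commands again
      ansible.builtin.reboot:
        post_reboot_delay: 30
        reboot_timeout: 600
        test_command: uptime

    - name: Wait until the application answers its health endpoint
      ansible.builtin.uri:
        url: "http://localhost:8080/health"   # hypothetical endpoint, checked on the host itself
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 15
```

Because `serial` processes hosts batch by batch, a host that never becomes healthy fails its batch, and `max_fail_percentage: 0` stops the play before the next batch starts.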
8. An Ansible playbook is stuck waiting for user input, causing automation failures. How would you fix it?
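- Look for tasks or settings that wait for input: `vars_prompt`, the `pause` module, SSH host-key confirmations, and password prompts. Disable the connection password prompt with `ask_pass = False` in `ansible.cfg` (or `ANSIBLE_ASK_PASS=False`) and use SSH keys instead.
- Pass required values as extra vars (`-e "var_name=value"`). A `vars_prompt` variable that is already defined with `-e` is not prompted for.
- Replace prompts with variables that have defaults (e.g., in role defaults or `group_vars`). Move secrets into Ansible Vault and supply the vault password with `--vault-password-file`.
- Use `no_log: true` on tasks that handle passwords and secrets so they do not leak into logs.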
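A sketch showing that prompted values can be supplied up front and that secrets can be kept out of the logs. The variable names, the default version, and the destination path are hypothetical.

```yaml
- name: Deploy without blocking on prompts (sketch)
  hosts: app_servers                # hypothetical group
  become: true
  vars_prompt:
    - name: app_version
      prompt: "Version to deploy"
      private: false
      default: "1.0.0"              # hypothetical; used if the operator just presses Enter
    - name: db_password
      prompt: "Database password"
      private: true
  tasks:
    - name: Show the version being deployed
      ansible.builtin.debug:
        msg: "Deploying version {{ app_version }}"

    - name: Write the password file without logging the secret
      ansible.builtin.copy:
        content: "{{ db_password }}"
        dest: /etc/app/db.pass      # hypothetical path
        mode: "0600"
      no_log: true
```

In a non-interactive run such as CI or cron, pass both values up front, for example `ansible-playbook deploy.yml -e "app_version=1.2.3 db_password=..."` (or load the password from a vault file). Neither prompt is then shown.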
9. Your Ansible deployment in a hybrid cloud environment is failing due to network latency. What strategies can you use?
- Use multiple inventory sources for better region-based execution.
- Implement Ansible Tower/AWX to manage execution closer to target regions.
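- Tune `forks` and `serial` to control how many hosts are contacted at once. Enable SSH pipelining and `ControlPersist` so existing connections are reused instead of new ones being opened.
- Use `ansible-pull` for edge deployments to reduce network overhead.
- Restructure playbook logic to minimize repeated connections (e.g., pass lists to modules instead of looping).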
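Connection settings can be tuned per region with `group_vars`. This sketch assumes the default `ssh` connection plugin. The group name, bastion host, and timeout values are hypothetical.

```yaml
# group_vars/aws_region_eu_west_1.yml  (hypothetical group and host names)

# Route SSH for hosts in this region through a regional bastion.
ansible_ssh_common_args: "-o ProxyJump=ops@bastion-eu.example.com"

# Keep master connections open longer so later tasks reuse them
# (the default ssh_args already use ControlMaster with ControlPersist=60s).
ansible_ssh_args: "-C -o ControlMaster=auto -o ControlPersist=300s"

# Execute modules over the existing SSH session without copying temp files first.
# This requires that sudoers does not enforce requiretty when become is used.
ansible_pipelining: true

# Allow more time for the SSH handshake over high-latency links.
ansible_ssh_timeout: 30
```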
10. You need to apply a security patch across 1,000+ servers using Ansible while ensuring rollback in case of failure. How would you do this?
Steps for safe deployment:
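- Take a backup before applying changes, e.g., with the `fetch` module (copies files to the control node) or `copy` with `remote_src: yes`.
- Deploy in batches using `serial: <N>` to limit the impact, and set `max_fail_percentage` so the rollout stops if a batch fails.
- Verify the patch (e.g., check the installed package version) before moving on to the next batch.
- Rollback strategy:
  - Wrap the patch in a `block` with a `rescue` section (or use `when` conditions on a registered verification result) to roll back if verification fails.
  - Store an archive of critical files (e.g., a `tar` backup).
  - Revert packages if necessary. On RHEL-family hosts use `yum history undo <id>`. On Debian-family hosts, reinstall the previous version with `apt-get install <package>=<version>`.
- Monitor & alert: send alerts from the `rescue` section or a callback plugin. `notify` handlers run only when a task reports a change, so they are not a reliable failure signal on their own.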
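A sketch of the batch-and-rollback pattern using `block`/`rescue`. It assumes RHEL-family hosts where `yum history undo last` reverts the most recent transaction. The group name, package, and backup path are hypothetical. The rollback runs only if the update task actually changed something, so an unrelated earlier transaction is never undone.

```yaml
- name: Patch in batches with rollback (sketch)
  hosts: rhel_servers               # hypothetical group of RHEL-family hosts
  become: true
  serial: 10                        # patch ten hosts per batch
  max_fail_percentage: 0            # stop the rollout as soon as any host in a batch fails
  tasks:
    - name: Patch, verify, and roll back on failure
      block:
        - name: Archive critical files before patching
          ansible.builtin.command: tar -czf /var/tmp/etc-ssl-pre-patch.tar.gz /etc/ssl
          args:
            creates: /var/tmp/etc-ssl-pre-patch.tar.gz   # skip if the backup already exists

        - name: Apply the update
          ansible.builtin.yum:
            name: openssl           # hypothetical package being patched
            state: latest
          register: patch_result

        - name: Verify the patched binary still runs
          ansible.builtin.command: openssl version
          changed_when: false

      rescue:
        - name: Roll back the yum transaction made by this play
          ansible.builtin.command: yum -y history undo last
          when: patch_result is defined and patch_result is changed

        - name: Fail the host so max_fail_percentage halts the rollout
          ansible.builtin.fail:
            msg: "Patch failed on {{ inventory_hostname }}; rollback attempted."
```

The final `fail` task matters: a successful `rescue` otherwise marks the host as recovered rather than failed, and the rollout would continue.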