Friday, February 7, 2025

Ansible Scenario-Based Interview Questions & Answers

 

1. You deployed an Ansible playbook, but it fails on some hosts while working on others. How would you debug and resolve this?

Steps to debug:

  • Use ansible-playbook -vvv playbook.yml to enable verbose mode for more detailed logs.
  • Check the error message and failed task details.
  • Run ansible all -m ping to verify host connectivity.
  • Use ansible all -m setup to check system facts and detect OS/config differences.
  • Verify inventory configuration and ensure host groups are correctly assigned.
  • Run affected tasks in isolation using --limit <hostname> and --step to execute interactively.

Resolution:

  • If the issue is environment-specific, modify the playbook to handle different OS versions or missing dependencies using when conditions.
  • Fix missing permissions by using become: yes where required.
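
As a minimal sketch of the resolution above, the play below branches on the OS family fact and escalates privileges for package installation. The host group web and the package names are illustrative assumptions, not taken from any real environment.

```yaml
# Sketch only: "web", "apache2" and "httpd" are hypothetical example values.
- name: Install a web server on hosts with different OS families
  hosts: web
  become: true                     # run tasks with elevated privileges
  tasks:
    - name: Install Apache on Debian-family hosts
      ansible.builtin.apt:
        name: apache2
        state: present
        update_cache: true
      when: ansible_facts['os_family'] == 'Debian'

    - name: Install Apache on RedHat-family hosts
      ansible.builtin.dnf:
        name: httpd
        state: present
      when: ansible_facts['os_family'] == 'RedHat'
```

To reproduce a failure on a single host, the same playbook can be run with ansible-playbook -vvv playbook.yml --limit <hostname> --step.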

2. Your Ansible playbook takes too long to execute. How would you optimize it for faster performance?

Optimization techniques:

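  • Increase forks (forks in ansible.cfg, or -f on the command line) so more hosts run in parallel; the default is 5. Note that serial limits a play to batches of hosts for rolling updates and does not add parallelism.
  • Enable fact caching (gathering = smart with fact_caching = jsonfile and fact_caching_connection set in ansible.cfg) to avoid gathering facts repeatedly.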
  • Use async & poll for long-running tasks to execute asynchronously.
  • Reduce SSH overhead by setting pipelining = True in ansible.cfg.
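  • Prefer loop over repeated near-identical tasks, and pass a list of names directly to package modules (apt, dnf, package) so they install in one transaction.
  • Skip unnecessary work: set gather_facts: false where facts aren't needed, and guard command/shell tasks with creates/removes or when conditions. changed_when only affects what is reported as changed; it does not skip execution.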
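
A few of these techniques can be combined in one play. The following is a sketch under assumptions: the script path and package list are hypothetical, and ansible_ssh_pipelining is used here as the per-play counterpart of the pipelining setting in ansible.cfg (it typically requires that sudo on the targets does not enforce requiretty).

```yaml
# Sketch only: script path and package names are hypothetical.
- name: Speed-oriented play
  hosts: all
  gather_facts: false              # skip fact gathering when facts are not used
  vars:
    ansible_ssh_pipelining: true   # fewer SSH operations per task
  tasks:
    - name: Start a long-running job without blocking the play
      ansible.builtin.command: /usr/local/bin/long_job.sh
      async: 1800                  # allow up to 30 minutes
      poll: 0                      # fire and forget; checked later
      register: long_job

    - name: Install several packages in a single transaction
      ansible.builtin.package:
        name:
          - git
          - curl
          - rsync
        state: present
      become: true

    - name: Wait for the long-running job to finish
      ansible.builtin.async_status:
        jid: "{{ long_job.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 60
      delay: 30
```

The package task runs while the background job is still in progress, so the total run time is closer to the longer of the two than to their sum.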

3. You need to ensure that Ansible applies configurations only if there’s a change. How would you implement this?

  • Use idempotent modules like copy, template, lineinfile, and file, which apply changes only when required.
  • Use changed_when in shell/command tasks to detect actual changes.
  • Use notify handlers to trigger actions only when a task reports a change.
  • Run with --check --diff to dry-run the whole playbook and see changes before applying them; check_mode: yes on a task forces only that task into check mode.
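
The points above might look like this in practice. This is a sketch, not a definitive implementation: the template name is hypothetical, and the timedatectl show call assumes a reasonably recent systemd on the target.

```yaml
# Sketch only: nginx.conf.j2 is a hypothetical template in the role/playbook directory.
- name: Apply configuration only when something changes
  hosts: web
  become: true
  tasks:
    - name: Render nginx configuration (idempotent module)
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        mode: "0644"
      notify: Reload nginx         # handler runs only if the file changed

    - name: Read the current timezone (read-only, never "changed")
      ansible.builtin.command: timedatectl show --property=Timezone --value
      register: tz
      changed_when: false
      check_mode: false            # safe to run even under --check

    - name: Set the timezone only if it differs
      ansible.builtin.command: timedatectl set-timezone UTC
      when: tz.stdout != 'UTC'

  handlers:
    - name: Reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
```

Running it twice in a row should report no changes on the second run, which is a quick way to confirm idempotency.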

4. After an Ansible update, your playbooks start failing due to deprecated modules. How do you handle this situation?

Steps to resolve:

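  • Identify deprecated modules from the [DEPRECATION WARNING] messages Ansible prints during a run (for example ansible-playbook --check -vvv playbook.yml), or by linting the playbook with ansible-lint.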
  • Check the Ansible release notes for alternative modules.
  • Modify playbooks to use recommended replacements (e.g., replacing ec2 with amazon.aws.ec2_instance).
  • Test the updated playbook in a non-production environment before deploying.
  • Pin the working Ansible version using pip install ansible==<working-version> if immediate migration isn’t possible.
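
For the ec2 replacement mentioned above, a migrated task might look roughly like the sketch below. All values (instance name, region, AMI ID, key name, subnet ID) are placeholders, and parameter names should be checked against the amazon.aws collection version actually installed.

```yaml
# Sketch only: every value below is a placeholder, not a real resource.
- name: Launch an instance with the maintained module
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Ensure the instance exists and is running
      amazon.aws.ec2_instance:     # fully qualified name replaces the old "ec2" module
        name: demo-web-01
        region: us-east-1
        instance_type: t3.micro
        image_id: ami-xxxxxxxx
        key_name: my-key
        vpc_subnet_id: subnet-xxxxxxxx
        state: running
```

Using fully qualified collection names throughout makes it easier to see which collection a module comes from when a later upgrade deprecates it.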

5. Your Ansible playbook is executing but not making the expected changes to remote servers. How would you troubleshoot?

  • Run the playbook with --check --diff to preview changes before applying.
  • Check register variables to validate task outputs.
  • Check that when conditions aren't silently skipping the task and that changed_when isn't masking a real change.
  • Verify correct become privileges are applied.
  • Check inventory and variables using ansible-inventory --graph.
  • Review logs using -vvv for deeper insights.
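
A small sketch of the register-and-inspect approach follows. The host group app, the file path, and the setting name are hypothetical.

```yaml
# Sketch only: /etc/myapp/app.conf and max_connections are hypothetical.
- name: Inspect what a task actually did
  hosts: app
  become: true
  tasks:
    - name: Update a setting in the application config
      ansible.builtin.lineinfile:
        path: /etc/myapp/app.conf
        regexp: '^max_connections='
        line: max_connections=200
      register: conf_result

    - name: Show the full registered result (changed, msg, and so on)
      ansible.builtin.debug:
        var: conf_result

    - name: Confirm the expected line is present on the host
      ansible.builtin.command: grep -q '^max_connections=200$' /etc/myapp/app.conf
      changed_when: false          # a check, not a change
```

If the debug output shows changed: false while the file is still wrong, the regexp or path is the first thing to look at; if the task is reported as skipped, look at when conditions and the --limit in use.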

6. How would you design an Ansible architecture to scale across multiple regions with minimal overhead?

Key considerations:

  • Use Ansible Tower/AWX to centralize execution and manage multiple environments.
  • Implement a dynamic inventory for cloud-based scaling (AWS, Azure, GCP).
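  • Use delegate_to: localhost (local_action is the older shorthand) or run_once for tasks that only need to run on the control node or once per play, such as API calls, to avoid unnecessary SSH connections.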
  • Leverage pull-based execution using ansible-pull for distributed control.
  • Use fact caching and log aggregation for efficient execution.
  • Optimize networking by using regional jump hosts or bastion servers.
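
As one possible shape for the dynamic inventory point, here is a sketch of an AWS inventory plugin configuration. The regions and the Role tag are assumptions chosen for illustration, and the amazon.aws collection must be installed on the control node.

```yaml
# inventory/aws_ec2.yml (the file name must end in aws_ec2.yml or aws_ec2.yaml)
# Sketch only: regions and the "Role" tag are illustrative assumptions.
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - eu-west-1
filters:
  instance-state-name: running     # ignore stopped or terminated instances
keyed_groups:
  - key: placement.region          # groups hosts by region
    prefix: region
  - key: tags.Role                 # groups hosts by their Role tag
    prefix: role
```

The resulting groups can be reviewed with ansible-inventory -i inventory/aws_ec2.yml --graph, and plays can then target one region at a time with --limit.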

7. You need to ensure zero downtime while applying Ansible playbooks in production. What approach would you take?

  • Use rolling updates with serial: <N> to update a few hosts at a time.
  • Implement blue-green deployment by switching traffic between two environments.
  • Use canary deployment, applying changes to a subset before full rollout.
  • Ensure health checks before restarting critical services.
  • Use reboot module with pre/post checks to prevent downtime.
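
A rolling update along these lines could be sketched as follows. The load balancer host lb01, the lb-drain and lb-enable scripts, the package name myapp, and the health endpoint on port 8080 are all hypothetical.

```yaml
# Sketch only: lb01, lb-drain/lb-enable, myapp and the /health URL are hypothetical.
- name: Rolling update with health checks
  hosts: web
  become: true
  serial: 2                        # update two hosts at a time
  max_fail_percentage: 0           # stop the rollout on the first failed host
  tasks:
    - name: Take the host out of the load balancer
      ansible.builtin.command: "/usr/local/bin/lb-drain {{ inventory_hostname }}"
      delegate_to: lb01

    - name: Upgrade the application package
      ansible.builtin.package:
        name: myapp
        state: latest

    - name: Restart the application service
      ansible.builtin.service:
        name: myapp
        state: restarted

    - name: Wait until the health endpoint answers
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 6
      delegate_to: localhost
      become: false

    - name: Put the host back into the load balancer
      ansible.builtin.command: "/usr/local/bin/lb-enable {{ inventory_hostname }}"
      delegate_to: lb01
```

Because a host only rejoins the load balancer after its health check passes, a bad build stops at the first batch instead of spreading to the whole group.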

8. An Ansible playbook is stuck waiting for user input, causing automation failures. How would you fix it?

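  • Check for tasks or options requiring interactive input: drop --ask-pass/--ask-become-pass (-k/-K) and use SSH keys plus passwordless sudo or a vaulted ansible_become_password instead.
  • Keep secrets in Ansible Vault and supply the vault password non-interactively with --vault-password-file; add no_log: true to sensitive tasks so the values stay out of logs.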
  • Pass required values via extra_vars (-e "var_name=value").
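  • Avoid relying on vars_prompt in automation: prompts are skipped for variables already supplied with -e, and each prompt should carry a default so non-interactive runs still get a value.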
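
The sketch below shows a play that prompts when run by hand but can also run unattended. The variable db_password, the host group db, and the credentials path are hypothetical.

```yaml
# Sketch only: db_password and /etc/myapp/db.credentials are hypothetical.
- name: Play that works both interactively and unattended
  hosts: db
  become: true
  vars_prompt:
    - name: db_password
      prompt: Database password
      private: true                # do not echo the input
  tasks:
    - name: Write the credentials file
      ansible.builtin.copy:
        content: "password={{ db_password }}\n"
        dest: /etc/myapp/db.credentials
        mode: "0600"
      no_log: true                 # keep the secret out of task output and logs
```

In a pipeline, the prompt is bypassed by supplying the variable up front, for example ansible-playbook site.yml -e @secrets.yml --vault-password-file ~/.vault_pass, where secrets.yml is a vault-encrypted file (the file names here are assumptions).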

9. Your Ansible deployment in a hybrid cloud environment is failing due to network latency. What strategies can you use?

  • Use multiple inventory sources for better region-based execution.
  • Implement Ansible Tower/AWX to manage execution closer to target regions.
  • Reduce SSH overhead with ControlPersist and pipelining, and tune forks so slow links are not saturated; serial only controls batch size.
  • Use ansible-pull for edge deployments to reduce network overhead.
  • Optimize playbook logic to minimize repeated connections.
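
Connection tuning for a high-latency group can live in group variables. The following is a sketch assuming a hypothetical group named remote_region; the timeout and ControlPersist values are starting points rather than recommendations.

```yaml
# group_vars/remote_region.yml
# Sketch only: the group name and the numeric values are assumptions.
ansible_ssh_pipelining: true       # fewer round trips per task
ansible_ssh_args: "-C -o ControlMaster=auto -o ControlPersist=300s"   # reuse one SSH connection for 5 minutes
ansible_ssh_timeout: 60            # tolerate slow connection setup
```

Keeping these settings per group means low-latency hosts keep the defaults while distant regions get longer timeouts and more aggressive connection reuse.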

10. You need to apply a security patch across 1,000+ servers using Ansible while ensuring rollback in case of failure. How would you do this?

Steps for safe deployment:

  1. Take a backup before applying changes, for example with fetch (to the control node) or copy with remote_src: yes (on the host itself).
  2. Deploy in batches using serial: <N> to minimize impact.
  3. Verify patch installation with a verification command (for example, checking the package version or service health) before moving to the next batch.
  4. Rollback strategy:
    • Use block/rescue (or when conditions on a registered result) to trigger the rollback if verification fails.
    • Store a snapshot of critical files (tar backup).
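    • Revert to previous packages if necessary (yum history undo, or apt-get install <package>=<previous-version> on Debian-based systems; dpkg --remove only uninstalls the package).
  5. Monitor & alert: Report failures immediately from a rescue section or a notification callback; handlers fire only on changed tasks, not on failures.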
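
The steps above can be sketched as a single play using block/rescue. This is an illustration under assumptions, not a definitive implementation: the package name, the patched_version variable, the backup path, the config directory, and healthcheck.sh are all hypothetical, and the rollback uses dnf, so it applies to RedHat-family hosts only.

```yaml
# Sketch only: openssl, patched_version, /etc/myapp and healthcheck.sh are hypothetical.
- name: Patch in batches with rollback on failure
  hosts: all
  become: true
  serial: 50                       # 50 hosts per batch
  max_fail_percentage: 10          # abort the rollout if too many hosts fail
  tasks:
    - name: Patch with automatic rollback
      block:
        - name: Back up critical configuration
          ansible.builtin.command: tar -czf /var/backups/pre-patch.tgz -C / etc/myapp
          args:
            creates: /var/backups/pre-patch.tgz

        - name: Install the patched package version
          ansible.builtin.dnf:
            name: "openssl-{{ patched_version }}"
            state: present
          register: patch_result

        - name: Verify the host after patching
          ansible.builtin.command: /usr/local/bin/healthcheck.sh
          changed_when: false

      rescue:
        - name: Undo the package transaction if one was applied
          ansible.builtin.command: dnf -y history undo last
          when: patch_result is defined and patch_result is changed

        - name: Restore the configuration backup
          ansible.builtin.unarchive:
            src: /var/backups/pre-patch.tgz
            dest: /
            remote_src: true

        - name: Mark the host as failed so the batch limit applies
          ansible.builtin.fail:
            msg: "Patch failed on {{ inventory_hostname }}; rollback attempted."
```

The final fail task matters: without it, a rescued host counts as successful and max_fail_percentage would never stop a bad rollout.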