Ansible: Error Handling

In production environments, automation must be resilient. Ansible's error-handling features let you manage failures gracefully, implement automated rollbacks, and keep your playbooks running when a non-critical task fails.


🛡️ Basic Error Suppression (ignore_errors)

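By default, when a task fails, Ansible stops running further tasks on that host while continuing on the remaining hosts. Setting ignore_errors: true on a task tells Ansible to move on to the next task on that host even if this one fails. The failure is still shown in the output and counted as ignored, and the keyword does not cover unreachable hosts or syntax errors.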

- name: Run an unstable legacy script
  command: /opt/scripts/legacy_report.sh
  ignore_errors: true
 
- name: This task will run even if the script above fails
  debug:
    msg: "Continuing execution..."
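
An ignored failure is still recorded as failed in the task result, so later tasks can react to it. The following is a minimal sketch of that idea, reusing the script path from the example above. The variable name legacy_report, the localhost target, and the follow-up messages are assumptions made for illustration.

```yaml
- name: React to an ignored failure
  hosts: localhost            # assumption: adjust to your inventory
  gather_facts: false
  tasks:
    - name: Run an unstable legacy script
      command: /opt/scripts/legacy_report.sh
      register: legacy_report   # hypothetical variable name
      ignore_errors: true       # the play continues, but the result stays "failed"

    - name: Warn when the report could not be generated
      debug:
        msg: "Legacy report failed with rc={{ legacy_report.rc | default('n/a') }}"
      when: legacy_report is failed

    - name: Continue with work that depends on the report
      debug:
        msg: "Report generated, continuing..."
      when: legacy_report is succeeded
```

Registering the result keeps the failure visible to the rest of the play instead of discarding it.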

🎯 Custom Control Logic

Ansible judges command and shell tasks by their return code, so a command can be reported as failed when it actually succeeded, or as successful when it did not. Use these keywords to define your own success criteria.

1. failed_when

Define what exactly constitutes a failure.

- name: Check root filesystem usage
  shell: df -h / | tail -1 | awk '{print $5}' | sed 's/%//'
  register: disk_usage
  failed_when: disk_usage.stdout | int > 90  # Fail when more than 90% of the disk is used
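
failed_when can also reinterpret return codes and output text. The sketch below shows both directions. grep exits with 1 when it finds no match, which is not an error here. The second task covers a tool that exits with 0 while printing a failure marker. The log path, the sync_tool binary, and the FAILED marker are hypothetical.

```yaml
- name: Define custom failure conditions
  hosts: localhost            # assumption: adjust to your inventory
  gather_facts: false
  tasks:
    - name: Count error lines in the application log
      command: grep -c ERROR /var/log/app.log   # hypothetical log path
      register: grep_result
      # grep: rc 0 = matches found, rc 1 = no matches, rc 2 or higher = real error
      failed_when: grep_result.rc not in [0, 1]

    - name: Run a tool that exits 0 even when it reports a problem
      command: /usr/local/bin/sync_tool         # hypothetical tool
      register: sync_result
      # Fail on a non-zero exit code or on the assumed marker in the output
      failed_when: >
        sync_result.rc != 0 or
        'FAILED' in sync_result.stdout
```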

2. changed_when

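Control when a task reports "Changed" status. The command and shell modules report "Changed" on every successful run by default, so this keyword is especially useful for read-only status checks.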

- name: Check application status
  command: /usr/bin/app_status
  register: app_status
  changed_when: false  # This task will always report "OK", never "Changed"
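
changed_when also accepts an expression, so a task reports "Changed" only when it actually modified something. This is a minimal sketch, assuming a hypothetical command that prints "Applied" when it changes the configuration. Both the app_config path and the output marker are illustrative.

```yaml
- name: Report "Changed" only when something was modified
  hosts: localhost            # assumption: adjust to your inventory
  gather_facts: false
  tasks:
    - name: Apply pending configuration
      command: /usr/bin/app_config --apply      # hypothetical command
      register: apply_result
      # Assumed marker: the tool prints "Applied" only when it changed something
      changed_when: "'Applied' in apply_result.stdout"
```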

🏗️ Structured Error Handling (Block/Rescue/Always)

For complex workflows requiring rollbacks or guaranteed cleanups, use the Block pattern (similar to try/catch/finally in programming).

| Component | Function | Execution Rule |
| --- | --- | --- |
| Block | Primary Logic | Tasks to attempt first. |
| Rescue | Error Recovery | Runs ONLY IF a task in the Block fails. |
| Always | Cleanup/Logs | Runs REGARDLESS of success or failure. |

Practical Scenario: Database Migration

- name: Database Update Workflow
  block:
    - name: 1. Perform Migration
      command: /usr/bin/migrate_db.sh

  rescue:
    # msg is a parameter of the debug module, not of command,
    # so the announcement gets its own task.
    - name: 2. Announce Rollback
      debug:
        msg: "Migration failed! Rollback initiated."

    - name: 3. Rollback on Failure
      command: /usr/bin/rollback_db.sh

  always:
    - name: 4. Health Summary
      debug:
        msg: "Workflow completed. Checking connectivity..."
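
Inside rescue, Ansible exposes the ansible_failed_task and ansible_failed_result variables, which describe the task that triggered the rescue. The sketch below uses them to report the failure before rolling back. Because a rescue section that completes successfully clears the failed status for the host, the final fail task is one possible way to keep the run marked as failed. The localhost target and the message wording are assumptions.

```yaml
- name: Database update with failure details
  hosts: localhost            # assumption: adjust to your inventory
  gather_facts: false
  tasks:
    - name: Database Update Workflow
      block:
        - name: Perform Migration
          command: /usr/bin/migrate_db.sh
      rescue:
        - name: Report which task failed and why
          debug:
            msg: >-
              Task '{{ ansible_failed_task.name }}' failed:
              {{ ansible_failed_result.msg | default('no message') }}

        - name: Rollback on Failure
          command: /usr/bin/rollback_db.sh

        - name: Keep the host marked as failed after the rollback
          fail:
            msg: "Migration was rolled back."
      always:
        - name: Health Summary
          debug:
            msg: "Workflow completed."
```

The always section still runs after the fail task, so cleanup and logging are not skipped.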

💡 Troubleshooting Pro-Tips

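[!TIP] Use the Debug Module! When a task fails, use the debug module to inspect variables. Register the failing task's result and set ignore_errors: true on that task so the play reaches the debug task and prints the exact state before the playbook terminates. The example assumes my_output_variable was registered this way.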

- name: Inspect failed state
  debug:
    var: my_output_variable
  when: my_output_variable is failed
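
For context, here is one way the inspection task could fit into a complete sequence: register the result, print it on failure, then stop the play explicitly with the fail module. This is a sketch under assumptions. It reuses the /usr/bin/app_status command from the earlier example as a stand-in for the task under investigation, and the failure message is illustrative.

```yaml
- name: Inspect a failure, then stop
  hosts: localhost            # assumption: adjust to your inventory
  gather_facts: false
  tasks:
    - name: Run the command under investigation
      command: /usr/bin/app_status
      register: my_output_variable
      changed_when: false
      ignore_errors: true     # let the play reach the debug task

    - name: Inspect failed state
      debug:
        var: my_output_variable
      when: my_output_variable is failed

    - name: Stop the play once the state has been printed
      fail:
        msg: "app_status failed, see the printed variable for details."
      when: my_output_variable is failed
```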

[!CAUTION] Don't Overuse ignore_errors! Suppressing errors can lead to "silent corruption", where your infrastructure appears to be in the correct state but is actually missing critical components. Prefer failed_when or rescue for a more controlled approach.