Ansible: Error Handling

In production environments, automation must be resilient. Ansible's error-handling features let you manage failures gracefully, implement automated rollbacks, and keep playbooks running when non-critical tasks fail.

🛡️ Basic Error Suppression (`ignore_errors`)

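By default, Ansible stops executing on a host as soon as a task fails on it, while continuing on the remaining hosts. The ignore_errors keyword lets the play carry on with that host despite the failure. It only applies to tasks that actually run and return a failed result; it does not cover unreachable hosts, syntax errors, or undefined variables.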

- name: Run an unstable legacy script
  command: /opt/scripts/legacy_report.sh
  ignore_errors: true
 
- name: This task will run even if the script above fails
  debug:
    msg: "Continuing execution..."
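
An ignored failure is still recorded in the task result, so later tasks can react to it rather than lose it. The following is a minimal sketch, assuming the same script as above; the variable name legacy_report and the follow-up message are illustrative, not part of the original playbook.

```yaml
- name: Run an unstable legacy script
  command: /opt/scripts/legacy_report.sh
  register: legacy_report          # hypothetical variable name
  ignore_errors: true

- name: Flag the failure instead of silently moving on
  debug:
    # rc may be absent if the command never started, hence the default
    msg: "legacy_report.sh failed (rc={{ legacy_report.rc | default('n/a') }})"
  when: legacy_report is failed
```

The second task is skipped on hosts where the script succeeded, so the extra output appears only where something went wrong.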

🎯 Custom Control Logic

A command's exit code does not always match the outcome you care about: it may return a non-zero code even though it did its job, or exit with 0 despite a problem. Use these keywords to define your own success criteria.

1. `failed_when`

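Define exactly what constitutes a failure. The condition replaces the module's default failure check, so the task below fails only when usage of the root filesystem exceeds 90%.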

- name: Check free disk space
  shell: df -h / | tail -1 | awk '{print $5}' | sed 's/%//'
  register: disk_usage
  failed_when: disk_usage.stdout | int > 90
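
The same keyword also helps with commands whose non-zero exit codes are not all errors. As a sketch, diff exits with 0 when files are identical, 1 when they differ, and a higher code when something actually went wrong. The file paths and the variable name config_diff are assumptions made for illustration.

```yaml
- name: Compare the deployed config with a reference copy
  command: diff /etc/app/app.conf /opt/reference/app.conf   # hypothetical paths
  register: config_diff
  failed_when: config_diff.rc > 1    # 0 = identical, 1 = files differ, 2+ = real error

- name: Report drift without failing the play
  debug:
    msg: "Configuration has drifted from the reference copy"
  when: config_diff.rc == 1
```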

2. `changed_when`

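Stop tasks from reporting "Changed" when they modify nothing. The command and shell modules report "Changed" on every successful run by default, so this is especially useful for read-only status checks.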

- name: Check application status
  command: /usr/bin/app_status
  register: app_status
  changed_when: false  # This task will always report "OK", never "Changed"
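
changed_when also accepts an expression, which lets a task report "Changed" only when the command says it did something. A minimal sketch, assuming a hypothetical /usr/bin/app_sync command that prints the word "updated" when it modifies anything:

```yaml
- name: Synchronise application cache
  command: /usr/bin/app_sync                          # hypothetical command
  register: sync_result
  # Assumed output format: the command prints "updated" only when it changed something
  changed_when: "'updated' in sync_result.stdout"
```

Accurate change reporting matters most when handlers depend on it, since a handler is notified only by tasks that report a change.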

🏗️ Structured Error Handling (`Block/Rescue/Always`)

For complex workflows requiring rollbacks or guaranteed cleanups, use the Block pattern (similar to try/catch/finally in programming).

| Component | Function | Execution Rule |
| --- | --- | --- |
| Block | Primary Logic | Tasks to attempt first. |
| Rescue | Error Recovery | Runs ONLY IF a task in the Block fails. |
| Always | Cleanup/Logs | Runs REGARDLESS of success or failure. |

Practical Scenario: Database Migration

- name: Database Update Workflow
  block:
    - name: 1. Perform Migration
      command: /usr/bin/migrate_db.sh
 
  rescue:
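    - name: 2. Rollback on Failure
      command: /usr/bin/rollback_db.sh

    # msg is not a task keyword, so the message gets its own debug task
    - name: Report the rollback
      debug:
        msg: "Migration failed! Rollback initiated."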
 
  always:
    - name: 3. Health Summary
      debug:
        msg: "Workflow completed. Checking connectivity... "
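
Inside rescue, Ansible exposes the special variables ansible_failed_task and ansible_failed_result, which describe what went wrong in the block. Note also that a rescue section that completes successfully clears the failure, so the host is treated as if the block had succeeded. The sketch below is one possible variation on the workflow above; it logs the failure, rolls back, and then fails the host explicitly so that the run still ends in a failed state. The task names and messages are illustrative.

```yaml
- name: Database Update Workflow
  block:
    - name: Perform migration
      command: /usr/bin/migrate_db.sh

  rescue:
    - name: Show which task failed and why
      debug:
        msg: >-
          Task '{{ ansible_failed_task.name }}' failed:
          {{ ansible_failed_result.msg | default('no message') }}

    - name: Roll back
      command: /usr/bin/rollback_db.sh

    # Without this, a successful rollback would leave the host marked as OK
    - name: Mark the host as failed after the rollback
      fail:
        msg: "Migration failed and was rolled back."
```

Whether to re-raise the failure is a design choice: drop the final task if a clean rollback should count as a successful run.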

💡 Troubleshooting Pro-Tips

[!TIP] Use the debug module. When a task fails, use debug to inspect the variable it registered. Combined with ignore_errors on the failing task, this lets you print the exact state before the playbook terminates.

- name: Inspect failed state
  debug:
    var: my_output_variable
  when: my_output_variable is failed
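
For the inspection task to have something to print, the failing task must register its result and be allowed to continue. A fuller sketch of the pattern, assuming the /usr/bin/app_status command from earlier as the task under investigation, and an explicit fail task (with an illustrative message) to stop the play afterwards:

```yaml
- name: Run the task under investigation
  command: /usr/bin/app_status
  register: my_output_variable
  ignore_errors: true              # keep going long enough to print the result

- name: Inspect failed state
  debug:
    var: my_output_variable
  when: my_output_variable is failed

- name: Stop the play once the state has been printed
  fail:
    msg: "app_status failed; see the debug output above."
  when: my_output_variable is failed
```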

[!CAUTION] Don't overuse ignore_errors. Suppressing errors can lead to "silent corruption", where your infrastructure appears to be in the correct state but is actually missing critical components. Prefer failed_when or rescue for a more controlled approach.
