Ansible target hosts report unreachable when running an InfoScale Ansible playbook

Article ID: 100051629

Description

Error Message

When the playbook is run with verbose logging (for example, ansible-playbook with -vvvv, which enables connection debugging), the debug output contains the following messages:

  • Failed to connect to the host via ssh
  • mux_client_read_packet: read header failed: Broken pipe
  • Control master terminated unexpectedly

Cause

These messages indicate that the SSH session timed out. This can happen when a large package, such as VRTSvxvm, takes a long time to install.

Resolution

Configure the "ssh_args" value in the [ssh_connection] section of the ansible.cfg file to include ServerAliveInterval and ServerAliveCountMax. These options are passed only to the SSH connections that Ansible opens, so they do not affect other SSH sessions on the system.

  • Example of ssh_args in ansible.cfg:
    • ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=30 -o ServerAliveCountMax=10

      • Ansible recommends using ControlMaster and ControlPersist in the ssh_args as well, in order to improve playbook performance.
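
For context, a minimal sketch of where the line above might sit in ansible.cfg. The [ssh_connection] section name is the standard Ansible section for this setting; the option values are the ones from the example, and the comments are illustrative only.

```ini
[ssh_connection]
# ControlMaster/ControlPersist reuse one SSH connection per host.
# ServerAliveInterval/ServerAliveCountMax keep the session alive
# during long-running tasks such as large package installs.
ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=30 -o ServerAliveCountMax=10
```

Comments are placed on their own lines here because ansible.cfg does not treat a trailing "#" on a value line as a comment.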

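By default, the stock OpenSSH client configuration has ServerAliveInterval set to 0 and ServerAliveCountMax set to 3 (individual OS distributions may override these values). With an interval of 0, no alive messages are sent, and the session ends whenever the underlying TCP connection does. Setting ServerAliveInterval to a nonzero value makes the client send an alive message after that many seconds without data from the server, and ServerAliveCountMax sets how many consecutive alive messages may go unanswered before the client disconnects.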

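In the example above, ServerAliveInterval is set to 30 and ServerAliveCountMax is set to 10. The client sends an alive message after every 30 seconds of silence from the server and disconnects only after 10 consecutive alive messages go unanswered. An unresponsive connection is therefore tolerated for about 30 x 10 = 300 seconds, or 5 minutes, and a host that keeps answering the alive messages stays connected for as long as the task runs.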

The ServerAliveCountMax and ServerAliveInterval values can be tuned to your environment's needs.
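
As a sketch of such tuning, assume a hypothetical environment where an unresponsive host should be tolerated for roughly 10 minutes instead of 5. The values below are illustrative assumptions, not recommendations from this article.

```ini
[ssh_connection]
# Hypothetical tuning, not a recommended value:
# 60 s interval x 10 unanswered alive messages = about 600 s (10 minutes)
ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=60 -o ServerAliveCountMax=10
```

Raising either value lengthens the tolerated silence. A longer interval also means fewer alive messages on the wire, which may matter if a firewall in the path drops idle connections quickly.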

Issue/Introduction

When running an InfoScale Ansible playbook, the target hosts report "Unreachable" even though they are able to communicate properly outside of Ansible.

PLAY RECAP ***********************************************************************************************************
server101 : ok=1 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
server102 : ok=1 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0