An IOFENCE is a GAB-initiated signal that ejects a node from cluster membership. GAB invokes an IOFENCE when it deems it necessary that one or more nodes leave the current cluster membership for a specific port. After being IOFENCE'd, a node may either rejoin the membership or be halted (intentional panic), depending on the GAB client (port) and the nature of the IOFENCE. If the node is halted, the panic string includes the GAB IOFENCE reason, as seen in the scenarios below.
On receiving an IOFENCE, clients must unregister from GAB and remove themselves from the current port membership. For user clients such as HAD, the process is killed, which causes the unregistration and closes the port. If the clients do not unregister within the interval specified by iofence_timeout, the node is halted. iofence_timeout defaults to 15000 ms and can be changed with gabconfig -f. For kernel clients (CFS/CVM) which are not IOFENCE aware, the system is halted.
IOFENCEs can be categorized into two types:
- Locally initiated IOFENCEs, where a node deems it necessary to leave the current cluster membership.
- Over-the-wire IOFENCEs, where a node in the cluster deems that another node must leave and sends it an IOFENCE signal over the heartbeat links.
Scenarios of locally initiated IOFENCEs include:
"client process failure"
GAB clients such as HAD regularly heartbeat with GAB. By default, if the process fails to heartbeat GAB within the heartbeat interval, GAB will IOFENCE it by attempting to kill the process. If the process is killed successfully, port 'h' is unregistered (closed) and HAD is restarted by HASHADOW, allowing it to rejoin the membership. GAB makes up to five attempts to kill the process: the first four use SIGABRT, which produces a core file from the process, and the fifth uses SIGKILL as a last resort. If the process still cannot be killed after the five attempts (for example, because it is stuck in the kernel), GAB halts the system.
If gabconfig -k has been enabled, GAB will repeatedly attempt to kill the process and not halt the system.
If gabconfig -b has been enabled, GAB will halt the system when the client process first fails to heartbeat without making any attempt to kill the process.
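The escalation described so far can be summarised as a small model. This is only an illustrative sketch of the documented behaviour, not GAB's implementation: the function names and the action strings are hypothetical, the isolate timer is not modelled, and because the text does not say what happens when both -k and -b are enabled, the sketch rejects that combination.

```python
def kill_plan(attempts=5):
    """Signals used for each kill attempt: SIGABRT for all but the last, then SIGKILL."""
    return ["SIGABRT"] * (attempts - 1) + ["SIGKILL"]


def heartbeat_failure_actions(keep_killing=False, halt_immediately=False):
    """Actions taken when the client never dies after a missed heartbeat.

    keep_killing     models gabconfig -k (retry the kill, never halt).
    halt_immediately models gabconfig -b (halt without any kill attempt).
    Both names are hypothetical labels for this sketch.
    """
    if keep_killing and halt_immediately:
        # The combination is not described in the text, so refuse to guess.
        raise ValueError("behaviour with both -k and -b is not documented here")
    if halt_immediately:
        return ["halt"]
    actions = ["kill:" + sig for sig in kill_plan()]
    # Without -k the node is halted once the attempts are exhausted;
    # with -k GAB keeps trying to kill the process instead.
    actions.append("retry-kill" if keep_killing else "halt")
    return actions
```

With the defaults, the returned list is four SIGABRT attempts, one SIGKILL attempt, and then a halt.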
The HAD to GAB heartbeat timeout defaults to 15000 milliseconds and can be controlled with the VCS_GAB_TIMEOUT environment variable.
From Cluster Server 4.0 onwards, if the client process is not killed, GAB will forcefully close (unregister) the port and start an isolate timer. If the process has still not been killed after gab_isolate_time, the system is halted. The gab_isolate_time interval is meant to give HAD more time to be killed and so avoid the panic condition. gab_isolate_time is a kernel tunable parameter that defaults to 120000 milliseconds. The minimum value for this timer is 16 seconds and the maximum is 240 seconds.
When a client process failure occurs, a message similar to the following will be logged in /var/adm/messages:
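As a quick sanity check before changing the tunable, the documented bounds can be expressed in a few lines. This is a minimal sketch: the constant and function names are hypothetical, and it only validates a candidate value in milliseconds; it does not set the kernel tunable.

```python
# Documented limits for gab_isolate_time, expressed in milliseconds.
GAB_ISOLATE_TIME_DEFAULT_MS = 120_000
GAB_ISOLATE_TIME_MIN_MS = 16_000    # 16 seconds
GAB_ISOLATE_TIME_MAX_MS = 240_000   # 240 seconds


def check_isolate_time(value_ms=GAB_ISOLATE_TIME_DEFAULT_MS):
    """Return value_ms unchanged if it lies within the documented range."""
    if not GAB_ISOLATE_TIME_MIN_MS <= value_ms <= GAB_ISOLATE_TIME_MAX_MS:
        raise ValueError(
            f"gab_isolate_time must be between {GAB_ISOLATE_TIME_MIN_MS} "
            f"and {GAB_ISOLATE_TIME_MAX_MS} ms, got {value_ms}"
        )
    return value_ms
```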
GAB WARNING V-15-1-20058 Port h process heartbeat failed, killing process
Note: Client process failures usually indicate a busy or hung system and the source should be investigated.
"depleted memory reserves"
When a node enters a very low memory condition in a critical code path, the system is halted as there is nothing further that can be done.
"internal protocol error"
This indicates a possible bug or unknown error within GAB. If this condition arises, the node will be halted.
Scenarios of over-the-wire IOFENCEs include:
"quick re-open"
If a node leaves the running cluster and tries to join before the new cluster can be reconfigured (default is five seconds), the node is sent an IOFENCE. On receiving the message, the node will kill the client process HAD, which will be restarted by the HASHADOW process. The client process will then wait until the join can succeed. A quick re-open is sometimes referred to as a delayed re-open.
"network failure"
If a network partition occurs, a cluster can split into two or more separate clusters. If the separate clusters later join as one cluster, GAB detects that a partition has occurred from the committed membership sets (last committed, as reported by gabconfig -a) and designates one or more nodes to be ejected from the current cluster membership. GAB then sends an IOFENCE signal to the offending nodes for the relevant ports; for Cluster Server, these are ports 'a' and 'h'. GAB on the target system then kills the client process HAD so that it unregisters (closes) from port 'h' membership. GAB then unregisters itself from port 'a' and re-opens, causing a new join on port 'a'. The client process HAD is then restarted by the HASHADOW process and attempts to rejoin the cluster on port 'h'. If gabconfig -j has been enabled, the system is halted when the IOFENCE signal is received by the client process (HAD).
When an IOFENCE due to network failure occurs, the target node will log messages similar to the following in /var/adm/messages:
GAB INFO V-15-1-20041 Port h: network failure: killing process
GAB INFO V-15-1-20032 Port h closed
GAB INFO V-15-1-20032 Port a closed
GAB INFO V-15-1-20026 Port a registration waiting for seed port membership
GAB INFO V-15-1-20005 Port h registration waiting for seed port membership
If a network failure IOFENCE is received by a kernel-based GAB client, the node will be halted. The gabconfig -j option only applies to process-based GAB clients such as HAD ('h') or vxconfigd ('w') in Cluster Volume Manager. If the network failure condition is encountered in an SPFS/SFCFS scenario, the CVM kernel client ('v') on the target system will halt the node on receiving an IOFENCE. The panic string will contain a message similar to:
panic: GAB: Port v halting system due to network failure
Note: Although the network failure scenario is commonly associated with actual network issues, it has also been seen on busy systems where nodes are too busy to process LLT heartbeats, which makes their heartbeat links inactive.
"disjoint memberships"
This can occur where ejected nodes re-join.
What happens when an IOFENCE is sent
In an over-the-wire IOFENCE, the sender of the IOFENCE will be the node with the lowest node ID, commonly called the master. In scenarios where IOFENCEs invoke system halts, the master node will usually be left running as it will be the sender of the IOFENCE and not the receiver.
Prior to sending the IOFENCE, the master node will receive state information from every node in the cluster and output the following to /var/adm/messages:
GAB INFO V-15-1-20033 Port a nid0 3 3 3 0 0 0 20 2
GAB INFO V-15-1-20033 Port a nid1 3 3 2 3 0 5a258a 20 11
where the fields are: port-id, node-id, connected set, reliable set, valid set, committed set, deferred set, generation number, flags, and connects-id. Note that the membership fields are bitmasks; for example, 0x3 indicates nodes 0 and 1.
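To make the field layout concrete, the following sketch decodes one such state line. It assumes the five set fields and the generation number are hexadecimal (consistent with the 0x3 example and a value such as 5a258a) and that the node ID after "nid" is decimal; flags and connects-id are kept as raw strings because their encoding is not described. The names GabState, nodes_in and parse_state_line are hypothetical.

```python
import re
from typing import List, NamedTuple


class GabState(NamedTuple):
    port: str
    node_id: int
    connected: List[int]
    reliable: List[int]
    valid: List[int]
    committed: List[int]
    deferred: List[int]
    generation: int
    flags: str
    connects_id: str


def nodes_in(mask_hex):
    """Expand a hexadecimal membership bitmask into a list of node IDs."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# "Port <port> nid<node>" followed by exactly eight whitespace-separated fields.
_STATE = re.compile(r"Port (\w+) nid(\d+)((?:\s+[0-9a-fA-F]+){8})\s*$")


def parse_state_line(line):
    """Parse a V-15-1-20033 state line; raise ValueError if it does not match."""
    match = _STATE.search(line)
    if match is None:
        raise ValueError("not a GAB state line: " + line)
    f = match.group(3).split()
    return GabState(
        port=match.group(1),
        node_id=int(match.group(2)),
        connected=nodes_in(f[0]),
        reliable=nodes_in(f[1]),
        valid=nodes_in(f[2]),
        committed=nodes_in(f[3]),
        deferred=nodes_in(f[4]),
        generation=int(f[5], 16),
        flags=f[6],
        connects_id=f[7],
    )
```

Applied to the nid1 line above, the valid set decodes to node 1 only, while the connected, reliable and committed sets decode to nodes 0 and 1.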
When the IOFENCE is sent across the wire, a message similar to below is logged on the sending node:
GABINFO V-15-1-20034 Port a iofence set 2 dst 1 bdst 0
where the fields are: set = membership set, dst = destination node of the IOFENCE, and bdst = membership set.
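The iofence message can be decoded in the same way. This is a sketch under assumptions: set and bdst are treated as hexadecimal membership bitmasks (as with the state fields), dst is treated as a decimal node ID, and the function names are hypothetical.

```python
import re


def nodes_in(mask_hex):
    """Expand a hexadecimal membership bitmask into a list of node IDs."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


_IOFENCE = re.compile(
    r"Port (\w+) iofence set ([0-9a-fA-F]+) dst (\d+) bdst ([0-9a-fA-F]+)"
)


def parse_iofence_line(line):
    """Parse a V-15-1-20034 message into port, set, dst and bdst."""
    match = _IOFENCE.search(line)
    if match is None:
        raise ValueError("not a GAB iofence line: " + line)
    return {
        "port": match.group(1),
        "set": nodes_in(match.group(2)),   # nodes in the membership set
        "dst": int(match.group(3)),        # node the IOFENCE is sent to
        "bdst": nodes_in(match.group(4)),  # nodes in the bdst set
    }
```

For the example message, set 2 decodes to node 1, which matches the destination node dst 1.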
On the target nodes, the closing and restarting of the ports will be logged in /var/adm/messages.
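When reviewing a node after an IOFENCE, it can help to pull only the GAB messages quoted in this article out of the system log. The sketch below is one way to do that; the table covers only the message IDs that appear above, the descriptions are paraphrases, and the names KNOWN_GAB_MESSAGES and find_gab_events are hypothetical.

```python
import re

# Message IDs quoted in this article, with a short paraphrased description.
KNOWN_GAB_MESSAGES = {
    "V-15-1-20058": "client process heartbeat failed",
    "V-15-1-20041": "network failure, client process killed",
    "V-15-1-20032": "port closed",
    "V-15-1-20026": "registration waiting for seed port membership",
    "V-15-1-20005": "registration waiting for seed port membership",
    "V-15-1-20033": "per-node state received before an IOFENCE",
    "V-15-1-20034": "IOFENCE sent",
}

_MSG_ID = re.compile(r"\bV-15-1-\d+\b")


def find_gab_events(lines):
    """Return (message_id, description, line) for each known GAB message."""
    events = []
    for line in lines:
        match = _MSG_ID.search(line)
        if match and match.group(0) in KNOWN_GAB_MESSAGES:
            msg_id = match.group(0)
            events.append((msg_id, KNOWN_GAB_MESSAGES[msg_id], line.rstrip("\n")))
    return events
```

The function accepts any iterable of lines, so an open file handle on /var/adm/messages can be passed directly.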