Service groups active on a cluster node do not fail over when the node stops responding
Article ID: 100029238
Resolution
Symptom
When a cluster node fails, the service groups that were active on that node
enter the AutoDisabled state and do not fail over.
Cause
When a cluster node stops responding or is
restarted, two events are used to determine the next course of action for its
running service groups.
1. The first event is when the Veritas Cluster Server (VCS) engine process (had) stops.
2. The second event is when the Low Latency Transport (LLT) cluster heartbeats from that node stop, as seen from the other cluster nodes.
If the time interval between these two events is greater than the default
value of 120 seconds, the service groups that were active on the
failed/restarted node stay in the AutoDisabled state and do not fail over to
any active node.
If this time interval is less than 120 seconds, the service groups have their
AutoDisabled status cleared and fail over to one of the remaining active nodes.
The VCS system attribute ShutdownTimeout specifies the maximum time interval
allowed between the two events. The default value for this attribute is 120
seconds.
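The comparison described above can be sketched as a small shell function. This only illustrates the rule and is not how VCS implements it. The function name and the two timestamp arguments are hypothetical. The article does not say what happens when the interval is exactly equal to the timeout, so the sketch assumes that case is treated as a failover.

```shell
#!/bin/sh
# Illustrative sketch only; not VCS code.
# failover_decision HAD_STOP LLT_STOP [SHUTDOWN_TIMEOUT]
#   HAD_STOP          hypothetical epoch time (seconds) at which had stopped
#   LLT_STOP          hypothetical epoch time (seconds) at which LLT heartbeats stopped
#   SHUTDOWN_TIMEOUT  value of ShutdownTimeout; defaults to 120 as in the article
failover_decision() {
    had_stop=$1
    llt_stop=$2
    shutdown_timeout=${3:-120}
    interval=$((llt_stop - had_stop))
    # Assumption: an interval exactly equal to the timeout counts as a failover.
    if [ "$interval" -gt "$shutdown_timeout" ]; then
        echo "AutoDisabled"
    else
        echo "failover"
    fi
}

failover_decision 1000 1030        # interval of 30 seconds, default timeout
failover_decision 1000 1200        # interval of 200 seconds, default timeout
failover_decision 1000 1200 300    # interval of 200 seconds, timeout raised to 300
```

The third call shows why a larger ShutdownTimeout changes the outcome for a node that takes a long time to stop sending heartbeats after had has stopped.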
Details
If the service groups that were active on the failed/restarted node stay in
the AutoDisabled state, that status must be cleared and the service groups
brought online manually.
Verify that the failed/restarted cluster node is not active.
Clear the AutoDisabled flag from all affected service groups.
To determine which service groups are affected, use the hastatus command and
examine the AutoDisabled column under the Group State section. A service group
is affected if its AutoDisabled column shows "Y".
# hastatus -sum
# hagrp -autoenable <service-group-name> -sys <node-name>
where service-group-name is the name of the service group and node-name is the
cluster node name of the failed/restarted node.
Repeat this command for each AutoDisabled service group.
Bring these service groups online on the desired active cluster node. Here
node-name is the name of that active node.
# hagrp -online <service-group-name> -sys <node-name>
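When several service groups are affected, the clear-and-online steps above can be prepared as a dry run. The sketch below reads `hastatus -sum` style text and prints the hagrp commands without running them. It assumes that group rows start with the marker `B` and that the columns appear in the order Group, System, Probed, AutoDisabled, State. The function names, node names and sample rows are hypothetical. Check the column layout against real output from your cluster before relying on it.

```shell
#!/bin/sh
# Illustrative sketch only. Prints commands; it does not run them.

# list_autodisabled NODE
# Reads `hastatus -sum` style text on stdin and prints the names of the
# service groups whose AutoDisabled column is "Y" on NODE.
# Assumed row layout: B  <group>  <system>  <probed>  <autodisabled>  <state>
list_autodisabled() {
    awk -v node="$1" '$1 == "B" && $3 == node && $5 == "Y" { print $2 }'
}

# print_recovery_commands FAILED_NODE TARGET_NODE
# For each affected group, prints the autoenable command for the failed node
# followed by the online command for the chosen active node.
print_recovery_commands() {
    failed_node=$1
    target_node=$2
    list_autodisabled "$failed_node" | while read -r group; do
        echo "hagrp -autoenable $group -sys $failed_node"
        echo "hagrp -online $group -sys $target_node"
    done
}

# Hypothetical sample input standing in for real `hastatus -sum` output.
print_recovery_commands node2 node1 <<'EOF'
-- GROUP STATE
-- Group    System  Probed  AutoDisabled  State
B  appsg    node1   Y       N             OFFLINE
B  appsg    node2   Y       Y             OFFLINE
B  websg    node2   Y       N             OFFLINE
EOF
```

On a live cluster the input would come from `hastatus -sum | print_recovery_commands <failed-node> <active-node>`. Review the printed commands, and confirm that the failed node is not active, before running any of them.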