VCS service group remains in STARTING|PARTIAL state if a non-critical resource is in FAULTED state
Article ID: 100027914
Description
Error Message
# hagrp -state SG1
#Group Attribute System Value
SG1 State node1 |PARTIAL|STARTING|
SG1 State node2 |OFFLINE|
# hares -display -attribute State -group SG1 -sys node1
#Resource Attribute System Value
critical_res1 State node1 ONLINE
critical_res2 State node1 ONLINE
noncrit_res1 State node1 FAULTED
noncrit_res2 State node1 FAULTED
# hagrp -switch SG1 -to node2
VCS WARNING V-16-1-55046 Group SG1 is in transition on system node1; operations are not allowed
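To check a node for this condition from a script, the command output shown above can be parsed. The following is a minimal Python sketch that assumes the four whitespace-separated columns of the sample output (name, attribute, system, value) and skips header lines beginning with "#". The function names are hypothetical. The helper reads only the State attribute, so it cannot by itself confirm that the faulted resources are non-critical; that would require checking each resource's Critical attribute separately.

```python
def parse_state_rows(text: str) -> list[tuple[str, str, str, str]]:
    """Split VCS display output into (name, attribute, system, value) rows.

    Assumes the four whitespace-separated columns of the sample output;
    header lines starting with '#' and malformed lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 3)  # value column may contain '|' flags
        if len(parts) == 4:
            rows.append((parts[0], parts[1], parts[2], parts[3]))
    return rows


def group_stuck_starting(group_rows, system: str) -> bool:
    """True if the group's State on `system` carries both STARTING and PARTIAL."""
    for _name, attribute, sys_name, value in group_rows:
        if attribute == "State" and sys_name == system:
            flags = set(value.strip("|").split("|"))
            return {"STARTING", "PARTIAL"} <= flags
    return False


def faulted_resources(resource_rows, system: str) -> list[str]:
    """Names of resources whose State on `system` is FAULTED."""
    return [
        name
        for name, attribute, sys_name, value in resource_rows
        if attribute == "State" and sys_name == system and value == "FAULTED"
    ]


if __name__ == "__main__":
    # Sample text copied from the output shown above.
    group_out = """#Group Attribute System Value
SG1 State node1 |PARTIAL|STARTING|
SG1 State node2 |OFFLINE|"""
    res_out = """#Resource Attribute System Value
critical_res1 State node1 ONLINE
critical_res2 State node1 ONLINE
noncrit_res1 State node1 FAULTED
noncrit_res2 State node1 FAULTED"""
    if group_stuck_starting(parse_state_rows(group_out), "node1"):
        print("SG1 stuck on node1; FAULTED:",
              faulted_resources(parse_state_rows(res_out), "node1"))
```

In practice the two strings would be the captured output of `hagrp -state SG1` and `hares -display -attribute State -group SG1 -sys node1`.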
Cause
When online propagation reaches a non-critical resource in FAULTED state, propagation stops at that resource; however, because the resource is non-critical, the fault does not trigger a service group fault. As a result, the online operation neither completes nor fails, and the service group remains indefinitely in the STARTING|PARTIAL state.
This is expected (by-design) behavior of the VCS engine prior to version 6.0, but depending on the configuration it can cause unnecessary service interruptions, especially if the administrator is unaware of the exact cause.
This issue is being tracked under eTrack ID: 2210717.
Resolution
Veritas Engineering has modified the behavior of the VCS engine (version 6.0 and later) so that it recalculates the service group state when a non-critical resource faults during the online process, and removes the STARTING flag from the group state if the faulted resource was the last one waiting to go online.
This change is also included in the following patch release for VCS 5.1SP1RP3:
- VCS 5.1SP1RP3HF1
(Please contact Veritas Enterprise Support to obtain this patch.)
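The difference between the two engine behaviors can be illustrated with a toy model. The Python sketch below is not VCS code: the Resource fields, the dependency links, the "GROUP_FAULT" label and the function name are hypothetical. It assumes that a resource is onlined only after all the resources it requires are ONLINE, that a critical fault faults the whole group (failover is not modeled), and that once no further resource can be onlined the faulted resource counts as the last one waiting. The real engine's handling of resources blocked behind the fault is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    critical: bool
    outcome: str  # hypothetical: state the agent reports when onlined, "ONLINE" or "FAULTED"
    requires: list = field(default_factory=list)  # children that must be ONLINE first


def bring_group_online(resources, recalc_on_noncritical_fault: bool) -> str:
    """Toy model of online propagation; returns a VCS-style group state string."""
    state = {r.name: "OFFLINE" for r in resources}
    progressed = True
    while progressed:
        progressed = False
        for r in resources:
            if state[r.name] != "OFFLINE":
                continue
            if any(state[child] != "ONLINE" for child in r.requires):
                continue  # propagation does not reach r past a non-ONLINE child
            state[r.name] = r.outcome
            progressed = True
            if r.outcome == "FAULTED" and r.critical:
                return "GROUP_FAULT"  # hypothetical label; failover is not modeled
    online = sum(1 for s in state.values() if s == "ONLINE")
    if online == len(resources):
        return "|ONLINE|"
    base = "PARTIAL" if online else "OFFLINE"
    if recalc_on_noncritical_fault:
        # 6.0+ behavior as modeled: nothing else can progress, so the faulted
        # resource was the last one waiting and STARTING is removed.
        return f"|{base}|"
    # Pre-6.0 behavior as modeled: STARTING is cleared only by a completed
    # online or a group fault, and neither happens here, so it stays.
    return f"|{base}|STARTING|"


if __name__ == "__main__":
    # Resource names from the example above; the dependency links are hypothetical.
    sg1 = [
        Resource("critical_res1", True, "ONLINE"),
        Resource("critical_res2", True, "ONLINE"),
        Resource("noncrit_res1", False, "FAULTED", ["critical_res1"]),
        Resource("noncrit_res2", False, "FAULTED", ["critical_res2"]),
    ]
    print(bring_group_online(sg1, recalc_on_noncritical_fault=False))  # |PARTIAL|STARTING|
    print(bring_group_online(sg1, recalc_on_noncritical_fault=True))   # |PARTIAL|
```

With recalculation disabled, the model reproduces the |PARTIAL|STARTING| state from the example; with it enabled, the group settles at |PARTIAL| and is no longer in transition.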
Applies To
- All OS platforms
- Veritas Cluster Server (VCS) versions prior to 6.0 (up to VCS 5.1SP1RP3 at the time of writing)
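The version scope above can be encoded as a rough check. The following Python sketch assumes version strings follow the naming used in this article (for example 5.1SP1RP3, 5.1SP1RP3HF1 or 6.0); strings in other formats raise ValueError. The function name and the "unknown" result for releases this article does not mention (later 5.1 updates, other hotfixes) are illustrative choices, not Veritas guidance.

```python
import re

# Matches names such as "5.1SP1RP3", "5.1SP1RP3HF1", "6.0" or "6.0.1".
_VCS_VERSION = re.compile(
    r"^(?P<major>\d+)\.(?P<minor>\d+)(?P<extra>(?:\.\d+)+)?"
    r"(?:SP(?P<sp>\d+))?(?:RP(?P<rp>\d+))?(?:HF(?P<hf>\d+))?$"
)

LAST_AFFECTED = (5, 1, 1, 3)  # 5.1SP1RP3 as (major, minor, SP, RP)


def classify_vcs_version(version: str) -> str:
    """Return "fixed", "affected" or "unknown" for eTrack 2210717 (sketch)."""
    m = _VCS_VERSION.match(version.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized VCS version string: {version!r}")
    major, minor = int(m["major"]), int(m["minor"])
    if (major, minor) >= (6, 0):
        return "fixed"  # engine change shipped in 6.0 and later
    if m["extra"]:
        return "unknown"  # dotted pre-6.0 names are outside this sketch
    sp, rp, hf = (int(m[k] or 0) for k in ("sp", "rp", "hf"))
    base = (major, minor, sp, rp)
    if base == LAST_AFFECTED and hf == 1:
        return "fixed"  # 5.1SP1RP3HF1 includes the change
    if base <= LAST_AFFECTED and hf == 0:
        return "affected"
    return "unknown"  # later 5.x releases or other hotfixes: not covered here


if __name__ == "__main__":
    for v in ("5.1SP1RP3", "5.1SP1RP3HF1", "6.0"):
        print(v, classify_vcs_version(v))
```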
Issue/Introduction
If a non-critical (Critical = 0) resource in a Veritas Cluster Server (VCS) service group is in the FAULTED state while the service group is being brought online, the service group remains indefinitely in the STARTING|PARTIAL state.
Once the service group enters this state, it cannot be switched over to another node because the group is in a transitional state. Further attempts to take the service group offline may leave some resources stuck in the W_OFFLINE_REVERSE or W_OFFLINE_PROPAGATE state.
Additional Information
eTrack: 2210717