VCS unable to fail over the service group after a system crash or power-off
Article ID: 100006714
Updated On:
Cause
This issue is caused by a VCS bug that is fixed under Etrack 1937834, a child of parent Etrack 2081720. See the Additional Information section of this article for the related Etrack numbers.
Resolution
The fix for this issue is included in RP2, which can be downloaded here:
https://sort.Veritas.com/patch/detail/5510
Issue/Introduction
The /var/VRTSvcs/log/engine_A.log file on the surviving node in the cluster shows that server01 crashed or was powered off abruptly:
2011/10/06 18:03:35 VCS INFO V-16-1-10077 Received new cluster membership
2011/10/06 18:03:35 VCS NOTICE V-16-1-10112 System (server01) - Membership: 0x2, DDNA: 0x0
2011/10/06 18:03:35 VCS ERROR V-16-1-10079 System server01 (Node '0') is in Down State - Membership: 0x2
2011/10/06 18:03:35 VCS ERROR V-16-1-10322 System server01 (Node '0') changed state from RUNNING to FAULTED
2011/10/06 18:03:35 VCS NOTICE V-16-1-10446 Group AGILESTAGE is offline on system server01
The service group AGILESTAGE is only reported as offline on server01; no failover to the surviving node occurs.
The /var/adm/messages file on the surviving node shows the following LLT and GAB activity:
Oct 6 18:03:17 server kernel: LLT INFO V-14-1-10205 link 0 (eth2) node 0 in trouble
Oct 6 18:03:17 server kernel: LLT INFO V-14-1-10205 link 1 (eth3) node 0 in trouble
Oct 6 18:03:19 server kernel: LLT INFO V-14-1-10205 link 2 (bond0) node 0 in trouble
Oct 6 18:03:30 server kernel: LLT INFO V-14-1-10509 link 0 (eth2) node 0 expired
Oct 6 18:03:30 server kernel: LLT INFO V-14-1-10509 link 1 (eth3) node 0 expired
Oct 6 18:03:35 server kernel: GAB INFO V-15-1-20036 Port a gen dc712 membership ;1
Oct 6 18:03:35 server kernel: GAB INFO V-15-1-20036 Port h gen dc715 membership ;1
Oct 6 18:03:35 server Had[14619]: VCS INFO V-16-1-10077 Received new cluster membership
Oct 6 18:03:35 server Had[14619]: VCS ERROR V-16-1-10079 System server (Node '0') is in Down State - Membership: 0x2
Oct 6 18:03:35 server Had[14619]: VCS ERROR V-16-1-10322 System server (Node '0') changed state from RUNNING to FAULTED
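In these messages each LLT heartbeat link to node 0 is first reported "in trouble" and then "expired". As a convenience when reading longer logs, here is a minimal, hypothetical Python sketch that keeps the last reported state per link. The regular expression is an assumption based only on the message format shown above; other LLT versions may word these messages differently. Run on this excerpt, it makes it easy to see that links 0 (eth2) and 1 (eth3) are reported expired, while link 2 (bond0) is only reported "in trouble" in the lines shown.

```python
import re

# Assumed format, taken from the excerpt above, e.g.
#   "LLT INFO V-14-1-10205 link 0 (eth2) node 0 in trouble"
#   "LLT INFO V-14-1-10509 link 0 (eth2) node 0 expired"
LLT_RE = re.compile(
    r"LLT INFO V-14-1-\d+ link (?P<link>\d+) \((?P<nic>[^)]+)\) "
    r"node (?P<node>\d+) (?P<state>in trouble|expired)"
)


def summarize_llt(lines):
    """Return {(node, link, nic): last_state} for LLT trouble/expired messages."""
    states = {}
    for line in lines:
        m = LLT_RE.search(line)
        if m:
            key = (int(m.group("node")), int(m.group("link")), m.group("nic"))
            states[key] = m.group("state")  # later messages overwrite earlier ones
    return states


if __name__ == "__main__":
    sample = [
        "Oct 6 18:03:17 server kernel: LLT INFO V-14-1-10205 link 0 (eth2) node 0 in trouble",
        "Oct 6 18:03:17 server kernel: LLT INFO V-14-1-10205 link 1 (eth3) node 0 in trouble",
        "Oct 6 18:03:19 server kernel: LLT INFO V-14-1-10205 link 2 (bond0) node 0 in trouble",
        "Oct 6 18:03:30 server kernel: LLT INFO V-14-1-10509 link 0 (eth2) node 0 expired",
        "Oct 6 18:03:30 server kernel: LLT INFO V-14-1-10509 link 1 (eth3) node 0 expired",
    ]
    for (node, link, nic), state in sorted(summarize_llt(sample).items()):
        print(f"node {node} link {link} ({nic}): {state}")
```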
After server01 is powered back up, a second test is run. This time the "had" and "hashadow" processes are killed with signal 15 (SIGTERM) and, as expected, the group is autodisabled:
2011/10/06 18:41:48 VCS NOTICE V-16-1-10112 System (server02) - Membership: 0x2, DDNA: 0x1
2011/10/06 18:41:48 VCS ERROR V-16-1-10113 System server01 (Node '0') is in DDNA Membership - Membership: 0x2, Visible: 0x0
2011/10/06 18:41:48 VCS ERROR V-16-1-10322 System server01 (Node '0') changed state from RUNNING to FAULTED
2011/10/06 18:41:48 VCS NOTICE V-16-1-10449 Group AGILESTAGE autodisabled on node server01 until it is probed
2011/10/06 18:41:48 VCS NOTICE V-16-1-10449 Group VCShmg autodisabled on node server01 until it is probed
2011/10/06 18:41:48 VCS NOTICE V-16-1-10446 Group AGILESTAGE is offline on system server01
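The Membership and DDNA values in these messages are node bitmasks. Assuming that bit n stands for node ID n (consistent with the messages above, where Membership 0x2 excludes Node '0' and DDNA 0x1 names Node '0'), a small hypothetical Python helper can decode them. It shows the difference between the two tests: after the crash only node 1 remains in membership and no node is DDNA (Daemon Down Node Alive), whereas after had is killed node 0 is reported as DDNA, so VCS autodisables the group on it instead of failing it over.

```python
import re

# Assumed format, taken from the V-16-1-10112 messages in this article, e.g.
#   "System (server02) - Membership: 0x2, DDNA: 0x1"
MASK_RE = re.compile(r"Membership: (0x[0-9a-fA-F]+), DDNA: (0x[0-9a-fA-F]+)")


def nodes_in_mask(mask):
    """Return the node IDs whose bits are set, assuming bit n == node ID n."""
    return [node_id for node_id in range(mask.bit_length()) if (mask >> node_id) & 1]


def decode_membership(line):
    """Return (member_nodes, ddna_nodes) parsed from a log line, or None."""
    m = MASK_RE.search(line)
    if m is None:
        return None
    members, ddna = (int(value, 16) for value in m.groups())
    return nodes_in_mask(members), nodes_in_mask(ddna)


if __name__ == "__main__":
    crash = "VCS NOTICE V-16-1-10112 System (server01) - Membership: 0x2, DDNA: 0x0"
    had_killed = "VCS NOTICE V-16-1-10112 System (server02) - Membership: 0x2, DDNA: 0x1"
    print(decode_membership(crash))       # ([1], [])
    print(decode_membership(had_killed))  # ([1], [0])
```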
The server is then powered off, and the service group (SG) fails over to the other node as designed:
2011/10/06 18:42:53 VCS WARNING V-16-1-11141 LLT heartbeat link status changed. Previous status = eth2 UP eth3 UP bond0 UP; Current status = eth2 DOWN eth3 DOWN bond0 DOWN.
2011/10/06 18:42:54 VCS INFO V-16-1-10077 Received new cluster membership
2011/10/06 18:42:54 VCS NOTICE V-16-1-10112 System (server02) - Membership: 0x2, DDNA: 0x0
2011/10/06 18:42:54 VCS ERROR V-16-1-10079 System server01 (Node '0') is in Down State - Membership: 0x2
2011/10/06 18:42:54 VCS NOTICE V-16-1-10451 Cleared attribute-'autodisabled' for Group AGILESTAGE on node server01
2011/10/06 18:42:54 VCS NOTICE V-16-1-10451 Cleared attribute-'autodisabled' for Group VCShmg on node server01
2011/10/06 18:42:54 VCS ERROR V-16-1-10205 Group AGILESTAGE is faulted on system server01
2011/10/06 18:42:54 VCS NOTICE V-16-1-10446 Group AGILESTAGE is offline on system server01
2011/10/06 18:42:54 VCS INFO V-16-1-10493 Evaluating server02 as potential target node for group AGILESTAGE
2011/10/06 18:42:54 VCS INFO V-16-1-10493 Evaluating server01 as potential target node for group AGILESTAGE
2011/10/06 18:42:54 VCS INFO V-16-1-10494 System server01 not in RUNNING state
2011/10/06 18:42:54 VCS NOTICE V-16-1-10301 Initiating Online of Resource AGILESTAGE_DISK (Owner: Unspecified, Group: AGILESTAGE) on System server02
Because the group fails over correctly in the second test, the service group configuration is correct; the missing failover after an abrupt crash or power-off is caused by a VCS bug.
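To look for the same symptom in other engine_A.log files, a hypothetical Python sketch can compare the two cases above: after a system enters Down State, a group reported offline on it should be followed by a V-16-1-10493 "Evaluating ... as potential target node" message. This is not a Veritas tool, and the regular expressions assume only the message formats shown in this article. It is meant for a short window of the log around a single crash (it does not track a node rejoining), and a flagged group is not proof of this bug: groups that are frozen or have AutoFailOver set to 0 also legitimately stay offline.

```python
import re

# Assumed formats, taken from the engine_A.log excerpts in this article.
DOWN_RE = re.compile(r"V-16-1-10079 System (\S+) \(Node '\d+'\) is in Down State")
OFFLINE_RE = re.compile(r"V-16-1-10446 Group (\S+) is offline on system (\S+)")
EVAL_RE = re.compile(r"V-16-1-10493 Evaluating \S+ as potential target node for group (\S+)")


def groups_without_failover(lines):
    """Return {group: system} for groups reported offline on a Down system
    with no later failover-target evaluation in the given lines."""
    down_systems = set()
    pending = {}
    for line in lines:
        m = DOWN_RE.search(line)
        if m:
            down_systems.add(m.group(1))
            continue
        m = OFFLINE_RE.search(line)
        if m:
            group, system = m.groups()
            if system in down_systems:
                pending[group] = system  # failover is now expected for this group
            continue
        m = EVAL_RE.search(line)
        if m:
            # VCS started looking for a target node, so failover was attempted.
            pending.pop(m.group(1), None)
    return pending
```

Run on the first engine_A.log excerpt, it flags AGILESTAGE on server01; run on the power-off excerpt from the second test, it returns an empty result.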
Additional Information
ETrack: 1599129
ETrack: 2081720
ETrack: 1937834