System Freeze behavior has changed between VCS version 4.x and 5.x


Article ID: 100004449



Description


Resolution

The behavior of freezing a system in a VCS cluster changed significantly between version 4 and version 5.  In version 4, freezing a system meant that none of the Service Groups (SGs) hosted on that system could fail over from it.  In version 5 and above, Service Groups on a frozen system can still fault and fail over to other systems.  What the freeze prevents in version 5 and above is Service Groups coming online on the frozen system.

 

Frozen System SG failover policy

VERSION    Behavior
4.1        SGs cannot fail over to another system when the system is frozen
5.0        SGs can still fail over
5.0 MP3    SGs can still fail over
5.1        SGs can still fail over
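
The version split in the table can be expressed as a small script check. This is a minimal sketch, assuming the package naming seen in the example session (`VRTSvcs-5.0.30.00-MP3_RHEL5`) and treating everything with a major version of 5 or higher as having the newer behavior. `freeze_semantics` is a hypothetical helper name, not a VCS command.

```shell
#!/bin/sh
# freeze_semantics: print which freeze behavior applies to a VRTSvcs package
# string such as "VRTSvcs-5.0.30.00-MP3_RHEL5" (hypothetical helper).
freeze_semantics() {
    ver=${1#VRTSvcs-}      # drop the package name prefix -> "5.0.30.00-MP3_RHEL5"
    major=${ver%%.*}       # keep the text before the first dot -> "5"
    case $major in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;   # not a plain number
    esac
    if [ "$major" -ge 5 ]; then
        echo "frozen system: SGs can fail over off it, cannot come online on it"
    else
        echo "frozen system: SGs cannot fail over off it"
    fi
}

# Possible use on an RPM-based node:
#   freeze_semantics "$(rpm -qa | grep VRTSvcs-[0-9])"
```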

Here is an example of the change in behavior, illustrated on a Red Hat Enterprise Linux 5 (RHEL 5) system running VCS 5.0 MP3:

[gfellin@dex ~]# su -

[root@dex ~]# rpm -qa | grep VRTSvcs-5
VRTSvcs-5.0.30.00-MP3_RHEL5
[root@dex ~]#
[root@dex ~]# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen

A  dex                  RUNNING              0
A  polly                RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  websg           dex                  Y          N               ONLINE
B  websg           polly                Y          N               OFFLINE

[root@dex ~]# haconf -makerw
[root@dex ~]# hasys -freeze -persistent dex
[root@dex ~]# haconf -dump -makero
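
Before faulting anything, it can be useful to confirm that the freeze registered. The following is a minimal sketch, assuming the `hastatus -sum` layout shown above (system rows start with `A` and carry the Frozen flag in the fourth column) and assuming a nonzero flag means the system is frozen. `frozen_systems` is a hypothetical helper name, not a VCS command.

```shell
#!/bin/sh
# frozen_systems: read `hastatus -sum` output on stdin and print the name of
# every system whose Frozen column is nonzero (hypothetical helper).
# Assumption: system rows look like "A  <system>  <state>  <frozen>".
frozen_systems() {
    awk '$1 == "A" && NF >= 4 && $4 != 0 { print $2 }'
}

# Typical use on a cluster node (requires VCS):
#   hastatus -sum | frozen_systems
```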

Fault a resource on the frozen system.
[root@dex ~]# umount /webvol


[root@dex ~]# hastatus
attempting to connect....
attempting to connect....connected


group           resource             system               message
--------------- -------------------- -------------------- --------------------
                                     dex                  RUNNING
                                     polly                RUNNING
VCShmg                               dex                  OFFLINE
VCShmg                               polly                OFFLINE
-------------------------------------------------------------------------
websg                                dex                  ONLINE
websg                                polly                OFFLINE
                VCShm                dex                  ONLINE
                VCShm                polly                ONLINE
                webmount             dex                  ONLINE
-------------------------------------------------------------------------
                webmount             polly                OFFLINE
                webvolume            dex                  ONLINE
                webvolume            polly                OFFLINE
                webdg                dex                  ONLINE
                webdg                polly                OFFLINE
-------------------------------------------------------------------------

After the fault is detected, the entire SG fails over to the other node (polly):


websg                                dex                  PARTIAL *FAULTED*
                webmount             dex                  *FAULTED*
websg                                dex                  STOPPING PARTIAL *FAULTED*
                webvolume            dex                  WAITING FOR OFFLINE
                webvolume            dex                  OFFLINE
-------------------------------------------------------------------------
                webdg                dex                  WAITING FOR OFFLINE
                webdg                dex                  OFFLINE
websg                                dex                  *FAULTED* OFFLINE
                webmount             polly                WAITING FOR CHILDREN ONLINE
                webvolume            polly                WAITING FOR CHILDREN ONLINE
-------------------------------------------------------------------------

                webdg                polly                WAITING FOR ONLINE
websg                                polly                STARTING OFFLINE
                webdg                polly                ONLINE
websg                                polly                STARTING PARTIAL
                webvolume            polly                WAITING FOR ONLINE
-------------------------------------------------------------------------
                webvolume            polly                ONLINE
                webmount             polly                WAITING FOR ONLINE
                webmount             polly                ONLINE
websg                                polly                ONLINE

 

With node "dex" still frozen, clear the fault, and then take the resource offline on the other node.  Because "dex" is still frozen, the entire SG faults but does not fail over: no Service Group can come online on a frozen system, so the SG has nowhere else to go in this two-node cluster:

[root@dex ~]#
[root@dex ~]# hares -clear webmount -sys dex
[root@dex ~]# ssh polly "umount /webvol"


[root@dex ~]# hastatus
attempting to connect....
attempting to connect....connected


group           resource             system               message
--------------- -------------------- -------------------- --------------------
                                     dex                  RUNNING
                                     polly                RUNNING
VCShmg                               dex                  OFFLINE
VCShmg                               polly                OFFLINE
-------------------------------------------------------------------------
websg                                dex                  OFFLINE
websg                                polly                ONLINE
                VCShm                dex                  ONLINE
                VCShm                polly                ONLINE
                webmount             dex                  OFFLINE
-------------------------------------------------------------------------
                webmount             polly                ONLINE
                webvolume            dex                  OFFLINE
                webvolume            polly                ONLINE
                webdg                dex                  OFFLINE
                webdg                polly                ONLINE
-------------------------------------------------------------------------

After the fault is detected, the entire SG is faulted, but does not fail over:

websg                                polly                PARTIAL *FAULTED*
                webmount             polly                *FAULTED*
websg                                polly                STOPPING PARTIAL *FAULTED*
                webvolume            polly                WAITING FOR OFFLINE
                webvolume            polly                OFFLINE
-------------------------------------------------------------------------
                webdg                polly                WAITING FOR OFFLINE
                webdg                polly                OFFLINE
websg                                polly                *FAULTED* OFFLINE
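
A script can detect this "faulted everywhere" condition by looking for a node that still reports the group ONLINE. This is a minimal sketch, assuming the group rows of `hastatus -sum` keep the layout shown at the start of the example (`B`, group, system, Probed, AutoDisabled, State). `group_online_on` is a hypothetical helper name, not a VCS command.

```shell
#!/bin/sh
# group_online_on: read `hastatus -sum` output on stdin and print each system
# on which the given service group is fully ONLINE (hypothetical helper).
# Assumption: group rows look like
#   "B  <group>  <system>  <probed>  <autodisabled>  <state>".
group_online_on() {
    awk -v grp="$1" '$1 == "B" && $2 == grp && $6 == "ONLINE" { print $3 }'
}

# Typical use on a cluster node (requires VCS); empty output means the group
# is not fully online on any system:
#   hastatus -sum | group_online_on websg
```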

  

 


Issue/Introduction

The service group failover behavior between nodes in a VCS cluster changed in version 5.

In VCS v4.x and below:
Freezing a system: Freeze a system to prevent the service groups that it hosts from failing over to another system.

In VCS v5.x and above:
Freezing a system: Freeze a system to prevent service groups from coming online on the system.
However, service groups can still fail over, or the entire service group can fault, if a critical resource goes offline while the system is frozen.
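
Because a frozen system in v5.x blocks service groups from coming online on it, the freeze generally has to be lifted before a group can return to that node. The following is a minimal sketch, assuming the persistent freeze used in the example and the `hasys -unfreeze -persistent` form that mirrors the freeze command. `run`, `DRY_RUN`, and `unfreeze_system` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN is set
# (hypothetical helper, useful for reviewing the steps first).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# unfreeze_system: lift a persistent freeze on the named system.
# A persistent change needs a writable configuration, so the configuration is
# opened first, then dumped and made read-only again, as in the freeze example.
unfreeze_system() {
    run haconf -makerw &&
    run hasys -unfreeze -persistent "$1" &&
    run haconf -dump -makero
}

# Example (prints the commands without touching the cluster):
#   DRY_RUN=1 unfreeze_system dex
```

Running with `DRY_RUN=1` first shows the three commands in order, which is a cheap way to review the change before applying it to a live cluster.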

Additional Information

ETrack: 414300