Cluster failover does not occur when a node is in jeopardy membership

Article ID: 100024810

Description

Error Message

From Engine_a.txt of SERVER1:

2011/06/03 11:36:29 VCS INFO V-16-6-15004 (SERVER2) hatrigger:Failed to send trigger for postonline; script doesn't exist
2011/06/04 02:52:17 VCS INFO V-16-1-10077 Received new cluster membership
2011/06/04 02:52:17 VCS NOTICE V-16-1-10112 System (SERVER1) - Membership: 0x7, DDNA: 0x4
2011/06/04 02:52:17 VCS ERROR V-16-1-10111 System SERVER2 (Node '2') is in Regular and Jeopardy Memberships - Membership: 0x7, Jeopardy: 0x4
2011/06/04 02:52:22 VCS INFO V-16-1-10077 Received new cluster membership
2011/06/04 02:52:22 VCS NOTICE V-16-1-10112 System (SERVER1) - Membership: 0x3, DDNA: 0x0
2011/06/04 02:52:22 VCS ERROR V-16-1-10079 System SERVER2 (Node '2') is in Down State - Membership: 0x3
2011/06/04 02:52:22 VCS ERROR V-16-1-10322 System SERVER2 (Node '2') changed state from RUNNING to FAULTED

From Engine_a.txt of SERVER2:

2011/06/03 11:36:28 VCS NOTICE V-16-1-10447 Group D_Rpt_sql_grp is online on system SERVER2
2011/06/03 11:36:29 VCS INFO V-16-6-15004 (SERVER2) hatrigger:Failed to send trigger for postonline; script doesn't exist
2011/06/04 08:28:03 VCS INFO V-16-1-10196 Cluster logger started
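The Membership and Jeopardy values in these messages are hexadecimal bitmasks of cluster node IDs. The Python sketch below decodes them. It assumes that bit N corresponds to node ID N, which is consistent with SERVER2 being Node '2' and the log reporting Jeopardy: 0x4 (0x4 = 1 << 2). The regular expression and function names are illustrative helpers, not part of VCS.

```python
import re

# Lines copied verbatim from the SERVER1 engine log above.
LOG_LINES = [
    "2011/06/04 02:52:17 VCS ERROR V-16-1-10111 System SERVER2 (Node '2') is in Regular and Jeopardy Memberships - Membership: 0x7, Jeopardy: 0x4",
    "2011/06/04 02:52:22 VCS ERROR V-16-1-10079 System SERVER2 (Node '2') is in Down State - Membership: 0x3",
]

# Illustrative pattern for "Membership: 0x7" and "Jeopardy: 0x4" style fields.
MASK_RE = re.compile(r"(Membership|Jeopardy): (0x[0-9a-fA-F]+)")


def nodes_in_mask(mask: int) -> list[int]:
    """Return the node IDs whose bits are set (assumption: bit N means node ID N)."""
    return [node for node in range(mask.bit_length()) if (mask >> node) & 1]


def decode(line: str) -> dict[str, list[int]]:
    """Map each mask field found in a log line to the node IDs it contains."""
    return {name: nodes_in_mask(int(value, 16)) for name, value in MASK_RE.findall(line)}


if __name__ == "__main__":
    for line in LOG_LINES:
        # Print the timestamp followed by the decoded masks.
        print(line[:19], decode(line))
```

Read this way, the membership at 02:52:17 was nodes 0, 1, and 2 with node 2 in jeopardy, and five seconds later it was nodes 0 and 1 only.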

Cause

The group did not fail over because SERVER2 was in jeopardy membership immediately before it faulted. The SERVER1 log shows SERVER2 entering the Regular and Jeopardy memberships at 02:52:17 and being declared down five seconds later, at 02:52:22. Jeopardy means that all but one of a node's heartbeat links are down. Had both heartbeats been up when the node went down, VCS would have triggered a failover. If a node faults while it is in jeopardy, no failover occurs; this is by design. Even if VCS had attempted a failover, it likely would not have succeeded, because SERVER2 was probably not completely down and was likely still holding SCSI reservations and the IP address. The chances of a fault during the failover, of the nodes fighting over the disks, or of a concurrency violation would have been high.
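The rule described above can be modeled with the membership bitmasks from the log. This is a minimal, simplified sketch of the rule as stated in this article, not VCS's actual implementation. The function names are hypothetical, and it assumes bit N corresponds to node ID N.

```python
def departed(prev_members: int, new_members: int) -> int:
    """Bitmask of nodes present in the previous membership but not in the new one."""
    return prev_members & ~new_members


def failover_candidates(prev_members: int, prev_jeopardy: int, new_members: int) -> int:
    """Bitmask of departed nodes whose groups would fail over under the stated rule.

    A node that leaves while it was in jeopardy (only one heartbeat left) is
    excluded, because it may still hold SCSI reservations and IP addresses.
    """
    return departed(prev_members, new_members) & ~prev_jeopardy


if __name__ == "__main__":
    # Values from the SERVER1 log: membership 0x7 with jeopardy 0x4, then 0x3.
    print(hex(failover_candidates(0x7, 0x4, 0x3)))  # 0x0: no failover
    # Hypothetical contrast: the same departure, but node 2 was not in jeopardy.
    print(hex(failover_candidates(0x7, 0x0, 0x3)))  # 0x4: node 2's groups fail over
```

The only difference between the two calls is the jeopardy mask, which is exactly what separates the failover and no-failover outcomes in this article.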

Resolution

This behavior leaves clusters somewhat vulnerable to situations in which a node hangs but does not fail completely. We recommend configuring VCS notifications so that administrators are alerted in situations like this. Notification is not as convenient as an automated failover, but cluster software, including VCS, is designed to prioritize data integrity over availability: VCS avoids failing over when there is a high probability that doing so could cause a concurrency violation and possible data corruption.

Issue/Introduction

Cluster failover does not occur when a node is in jeopardy membership.