autoseed_gab_timeout tunable overview


Article ID: 100033733



Description

Error Message

Related error:

Feb 22 08:40:30 localhost kernel: GAB INFO V-15-1-20026 Port a[GAB_Control (refcount 2)] registration waiting for seed port membership
Feb 22 08:40:40 localhost kernel: GAB INFO V-15-1-20005 Port h[GAB_USER_CLIENT (refcount 0)] registration waiting for seed port membership
Feb 22 08:41:17 localhost kernel: GAB INFO V-15-1-20230 Client VxFen inited GAB API with handle ffff8803fd428200
Feb 22 08:41:22 localhost kernel: GAB INFO V-15-1-20036 Port a[GAB_Control (refcount 1)] gen  171d502 membership ;1

Feb 22 08:41:22 localhost kernel: GAB INFO V-15-1-20036 Port b[VxFen (refcount 2)] gen  171d501 membership ;1
Feb 22 08:41:22 localhost kernel: VXFEN WARNING V-11-1-50 VxDMP Preempt Abort ioctl failed with error: 6
Feb 22 08:41:22 localhost kernel: VXFEN WARNING V-11-1-50 VxDMP Preempt Abort ioctl failed with error: 6
Feb 22 08:41:22 localhost kernel: VXFEN WARNING V-11-1-50 VxDMP Preempt Abort ioctl failed with error: 6
Feb 22 08:41:22 localhost kernel: VXFEN WARNING V-11-1-12 Potentially a
Feb 22 08:41:22 localhost kernel:        preexisting split-brain.
Feb 22 08:41:22 localhost kernel:      I/O Fencing DISABLED!VXFEN INFO V-11-1-35
 Fencing driver going into RUNNING state
Feb 22 08:41:22 localhost kernel: GAB INFO V-15-1-20032 Port b closed
Feb 22 08:41:22 localhost kernel: GAB INFO V-15-1-20229 Client VxFen deiniting GAB API

Cause

In the scenario described, a 3-node cluster consists of nodes 14, 15 and 16.

With autoseed_gab_timeout enabled in /etc/vxfenmode on all nodes, the following steps were attempted:

- Power off nodes 15 and 16 with 'poweroff --force'.
- Confirm that node 14 is up and is now the only registered member:

[root@node-14 ~]# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   bc6e03 membership 0
Port b gen   bc6e07 membership 0
Port h gen   bc6e05 membership 0


Only node 14's keys are registered on the coordinator disks:

[root@node-14 ~]# vxfenadm -s -r -f /etc/vxfentab

Device Name: /dev/vx/rdmp/emc_clariion0_74

Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,56,48,65,70,48,48
        [Character Format]: VF80AF00
   *    [Node Format]: Cluster ID: 32943 Node ID: 0   Node Name: node-14

Device Name: /dev/vx/rdmp/emc_clariion0_73
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,56,48,65,70,48,48
        [Character Format]: VF80AF00
   *    [Node Format]: Cluster ID: 32943 Node ID: 0   Node Name: node-14

Device Name: /dev/vx/rdmp/emc_clariion0_72
Total Number Of Keys: 1
key[0]:
        [Numeric Format]:  86,70,56,48,65,70,48,48
        [Character Format]: VF80AF00
   *    [Node Format]: Cluster ID: 32943 Node ID: 0   Node Name: node-14




- Power off node 14 in such a way that its keys remain on the coordinator disks. This forces a pre-existing split brain in the next step, when node 15 powers on.

- Power on node 15. It eventually starts GAB and shows the following port membership, followed by these kernel messages:

[root@node-15 ~]# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen  171d502 membership ;1
Port h gen  171d504 membership ;1

Feb 22 08:40:30 localhost kernel: GAB INFO V-15-1-20026 Port a[GAB_Control (refcount 2)] registration waiting for seed port membership
Feb 22 08:40:40 localhost kernel: GAB INFO V-15-1-20005 Port h[GAB_USER_CLIENT (refcount 0)] registration waiting for seed port membership
Feb 22 08:41:17 localhost kernel: GAB INFO V-15-1-20230 Client VxFen inited GAB API with handle ffff8803fd428200
Feb 22 08:41:22 localhost kernel: GAB INFO V-15-1-20036 Port a[GAB_Control (refcount 1)] gen  171d502 membership ;1

Feb 22 08:41:22 localhost kernel: GAB INFO V-15-1-20036 Port b[VxFen (refcount 2)] gen  171d501 membership ;1
Feb 22 08:41:22 localhost kernel: VXFEN WARNING V-11-1-50 VxDMP Preempt Abort ioctl failed with error: 6
Feb 22 08:41:22 localhost kernel: VXFEN WARNING V-11-1-50 VxDMP Preempt Abort ioctl failed with error: 6
Feb 22 08:41:22 localhost kernel: VXFEN WARNING V-11-1-50 VxDMP Preempt Abort ioctl failed with error: 6
Feb 22 08:41:22 localhost kernel: VXFEN WARNING V-11-1-12 Potentially a
Feb 22 08:41:22 localhost kernel:        preexisting split-brain.
Feb 22 08:41:22 localhost kernel:      I/O Fencing DISABLED!VXFEN INFO V-11-1-35
 Fencing driver going into RUNNING state
Feb 22 08:41:22 localhost kernel: GAB INFO V-15-1-20032 Port b closed
Feb 22 08:41:22 localhost kernel: GAB INFO V-15-1-20229 Client VxFen deiniting GAB API


However, I/O fencing refuses to start on node 15, because the keys left behind by node 14 are detected as a pre-existing split brain.
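
The key check used in the steps above can be scripted. The following is a minimal sketch, not a supported tool: list_key_holders is a hypothetical helper name, and it assumes the "[Node Format]" line layout shown in the vxfenadm output in this article. It reads vxfenadm output on standard input and prints each distinct node ID and node name that still holds a key on the coordinator disks.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): list the distinct
# "node-id node-name" pairs that hold keys on the coordinator disks.
# Assumes the "[Node Format]" line layout shown in this article.
list_key_holders() {
    awk '/\[Node Format\]/ {
        id = ""; name = ""
        for (i = 1; i <= NF; i++) {
            # "Node ID: <n>" gives the node ID; "Cluster ID:" is skipped
            if ($i == "ID:"   && $(i-1) == "Node") id   = $(i+1)
            # "Node Name: <name>" gives the node name
            if ($i == "Name:" && $(i-1) == "Node") name = $(i+1)
        }
        if (id != "") print id, name
    }' | sort -u   # the same key appears once per coordinator disk
}

# Example use on a cluster node:
#   vxfenadm -s -r -f /etc/vxfentab | list_key_holders
```

If a node listed by this helper is absent from the `gabconfig -a` membership (as node-14 is in this scenario), its keys are either stale or the sign of a real split brain, and the fencing driver reports a potential PESB in both cases.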

 

Resolution

Note that the autoseed_gab_timeout tunable is not intended to resolve a pre-existing split brain (PESB). The tunable is useful only when there is no actual PESB (and no stale key), that is, when some nodes are genuinely down at cluster startup. In such cases the /etc/gabtab setting is too restrictive, because it requires all nodes to be up at startup.

For example, in the 3-node cluster above, if one node (e.g. node 14) is down during cluster startup, the /etc/gabtab setting `/sbin/gabconfig -c -n3` would not allow the cluster to form, even though there is no threat of data corruption, i.e. no PESB. The autoseed_gab_timeout tunable in /etc/vxfenmode helps here by allowing the cluster to form with 2 nodes. However, if stale fencing keys from node 14 are present (for example, left behind by an ungraceful shutdown), they are treated as a PESB even though no real split brain exists. Handling that situation is beyond the scope of the autoseed_gab_timeout tunable.
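
For reference, the two settings discussed above live in separate files. The fragment below is illustrative only. The gabtab line is the one quoted in this article, while the timeout value of 20 is an assumed example, not a recommendation. As commonly documented for this tunable, -1 turns automatic seeding off (the default), 0 turns it on, and a positive value is the number of seconds GAB waits before seeding automatically; verify this against the documentation for your release. Lines starting with # are annotations for this article.

```text
# /etc/gabtab: seed only when all 3 nodes are present
/sbin/gabconfig -c -n3

# /etc/vxfenmode: let GAB seed automatically after a delay (assumed value)
autoseed_gab_timeout=20
```

Neither setting clears keys from the coordinator disks, which is why the scenario in this article still ends with fencing disabled.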

Issue/Introduction

Concern, based on a misunderstanding of the tunable's purpose, that autoseed_gab_timeout did not resolve a pre-existing split brain.