Using an incorrect procedure to change co-ordinator disks can lead to inconsistencies between nodes in a cluster and may require nodes to be rebooted before fencing can be restarted

Article ID: 100022662

Resolution

In one customer support case, the customer replaced coordinator disks while the cluster was online. This caused the nodes to cache inconsistent serial numbers for the fencing (coordinator) disks, and the nodes had to be rebooted before fencing could be restarted.
 
The procedure that produced the inconsistency is reproduced below. Initially we have a two-node cluster in which both nodes have GAB port b (fencing) membership:
 
hoover# more /etc/llthosts
0 staubsauger
1 hoover
 
hoover# vxfenadm -g all -f /etc/vxfentab
Device Name: /dev/rdsk/c2t0d0s2
Total Number Of Keys: 2
key[0]:
Key Value[Numeric Format]: 66,45,45,45,45,45,45,45
Key Value[Character Format]: B-------
key[1]:
Key Value[Numeric Format]: 65,45,45,45,45,45,45,45
Key Value[Character Format]: A-------
Device Name: /dev/rdsk/c2t4d0s2
Total Number Of Keys: 2
key[0]:
Key Value[Numeric Format]: 66,45,45,45,45,45,45,45
Key Value[Character Format]: B-------
key[1]:
Key Value[Numeric Format]: 65,45,45,45,45,45,45,45
Key Value[Character Format]: A-------
Device Name: /dev/rdsk/c2t1d0s2
Total Number Of Keys: 2
key[0]:
Key Value[Numeric Format]: 66,45,45,45,45,45,45,45
Key Value[Character Format]: B-------
key[1]:
Key Value[Numeric Format]: 65,45,45,45,45,45,45,45
Key Value[Character Format]: A-------
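The numeric and character formats describe the same 8-byte registration key: each number is the ASCII code of one character, so 66 is "B", 65 is "A" and 45 is "-". The shell sketch below decodes a numeric key for illustration; the function name decode_key is hypothetical. (That "A" and "B" correspond to LLT node IDs 0 and 1 is an assumption consistent with /etc/llthosts above, not something the vxfenadm output itself states.)

```shell
#!/bin/sh
# Hypothetical helper: turn a key printed in "Numeric Format"
# (comma-separated decimal ASCII codes) into its "Character Format".
# The body runs in a subshell so the loop variable does not leak.
decode_key() (
    for n in $(printf '%s' "$1" | tr ',' ' '); do
        # Decimal -> three-digit octal, then printf expands the \NNN escape
        # into the corresponding single byte.
        printf "\\$(printf '%03o' "$n")"
    done
    printf '\n'
)

decode_key 66,45,45,45,45,45,45,45   # prints B-------
decode_key 65,45,45,45,45,45,45,45   # prints A-------
```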
 
We stop fencing on both nodes and, as expected, see all keys removed from the coordinator disks:
 
hoover# vxfenconfig -U
hoover# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 1516f14 membership 01
 
hoover# vxdisk -o alldgs list | grep fencingdg
c2t0d0s2 auto:cdsdisk - (fencingdg) online
c2t1d0s2 auto:cdsdisk - (fencingdg) online
c2t4d0s2 auto:cdsdisk - (fencingdg) online
 
hoover# vxfenadm -g all -f /etc/vxfentab
Device Name: /dev/rdsk/c2t0d0s2
Total Number Of Keys: 0
No keys...
Device Name: /dev/rdsk/c2t4d0s2
Total Number Of Keys: 0
No keys...
Device Name: /dev/rdsk/c2t1d0s2
Total Number Of Keys: 0
No keys...
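Before the coordinator disk group is touched, it can be useful to confirm that no registrations remain on any coordinator disk. A minimal shell sketch, assuming the vxfenadm -g output format shown above; the function name total_keys is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: sum every "Total Number Of Keys:" value in
# vxfenadm -g output read from stdin. Prints 0 when no keys remain.
total_keys() {
    # Split each line on ": " so the key count lands in field 2.
    awk -F': *' '/Total Number Of Keys:/ { sum += $2 } END { print sum + 0 }'
}

# Usage on a node where fencing has been stopped:
#   vxfenadm -g all -f /etc/vxfentab | total_keys
```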
 
At this stage the customer manually imports the coordinator disk group on one node and swaps one of its disks:
 
hoover# vxdg -t import fencingdg
hoover# vxdg -g fencingdg rmdisk c2t0d0s2
hoover# vxdg -g fencingdg adddisk c1t1d0s2
hoover# vxdg deport fencingdg
 
We then update /etc/vxfentab on both nodes to reflect the disk change:
 
hoover# cat /etc/vxfentab
/dev/rdsk/c1t1d0s2
/dev/rdsk/c2t1d0s2
/dev/rdsk/c2t4d0s2
 
staubsauger# cat /etc/vxfentab
/dev/rdsk/c1t1d0s2
/dev/rdsk/c2t1d0s2
/dev/rdsk/c2t4d0s2
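A quick way to confirm that the two copies of /etc/vxfentab agree is to compare them byte for byte. The shell sketch below assumes a copy of the other node's file has already been fetched to a local path; the paths and the function name same_vxfentab are hypothetical. In this case the check passes, which is what makes the failure misleading: the files agree, but the serial numbers cached by the fencing driver do not.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) only if two vxfentab
# files are identical; otherwise print the differences and fail.
same_vxfentab() {
    if cmp -s "$1" "$2"; then
        echo "vxfentab files match"
        return 0
    fi
    echo "vxfentab files differ:"
    diff "$1" "$2"
    return 1
}

# Usage (remote copy fetched beforehand, e.g. with scp):
#   same_vxfentab /etc/vxfentab /tmp/vxfentab.staubsauger
```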
 
Once this is complete, we attempt to restart fencing on both nodes:
 
hoover# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 1516f14 membership 01
 
staubsauger# vxfenconfig -c
VXFEN vxfenconfig NOTICE Driver will use SCSI-3 compliant disks.
hoover# vxfenconfig -c
VXFEN vxfenconfig NOTICE Driver will use SCSI-3 compliant disks.
VXFEN vxfenconfig ERROR V-11-2-1006 List of coordinator disks in running cluster is different than local node.
Unable to configure vxfen.
 
Regardless of the order in which fencing is started on the nodes, the cluster cannot form because of the error above. In addition, the following messages appear in the system log:
 
hoover# tail /var/adm/messages
Dec  3 16:39:04 hoover scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
Dec  3 16:39:04 hoover scsi: [ID 107833 kern.notice] Vendor: FUJITSU Serial Number: DAL0P6C03KUW
Dec  3 16:39:04 hoover scsi: [ID 107833 kern.notice] Sense Key: Illegal Request
Dec  3 16:39:04 hoover scsi: [ID 107833 kern.notice] ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
Dec  3 16:39:08 hoover gab: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port b gen 1516f36 membership 01
Dec  3 16:39:08 hoover vxfen: [ID 549548 kern.notice] NOTICE: VXFEN ERROR V-11-1-23 Got the serial number of 5000CCA20CC436B3 for coordinator disk
Dec  3 16:39:08 hoover from remote node. Corresponding value for local node is 5000CCA20BEEB8B4.
Dec  3 16:39:08 hoover vxfen: [ID 583204 kern.notice] NOTICE: VXFEN ERROR V-11-1-11 Snapshot different. Dropping out of cluster.
Dec  3 16:39:08 hoover vxfen: [ID 416634 kern.notice] NOTICE: VXFEN INFO V-11-1-35 Fencing driver going into RUNNING state
Dec  3 16:39:08 hoover gab: [ID 397130 kern.notice] GAB INFO V-15-1-20032 Port b closed
 
Note that the vxfen driver reports that the coordinator disk serial numbers do not match between nodes. This is caused by a difference in the serial numbers each node has cached for the coordinator disks: the node on which the disk was changed caches the new, correct serial numbers, whereas the other node(s) still cache the stale ones.
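The failure can be pictured as a comparison of two ordered lists of serial numbers: the snapshot held by the running cluster and the one cached on the joining node. The shell sketch below is a simplified, hypothetical model for illustration only, not the vxfen driver's actual implementation; the function name compare_snapshots is invented, and the serial numbers in the usage comment are the two values from the V-11-1-23 log message above.

```shell
#!/bin/sh
# Simplified, hypothetical model of the join check: the joining node's
# cached serial numbers must match the running cluster's snapshot entry
# for entry, otherwise the node drops out. Arguments are space-separated
# lists: $1 = remote (running cluster), $2 = local (joining node).
# The body runs in a subshell, so "exit" only ends this function call.
compare_snapshots() (
    remote=$1
    i=0
    set -- $2                      # positional parameters = local serials
    for r in $remote; do
        if [ $# -eq 0 ]; then
            echo "Snapshot different: local node lists fewer disks"; exit 1
        fi
        if [ "$r" != "$1" ]; then
            echo "Snapshot different: disk $i remote=$r local=$1"; exit 1
        fi
        shift
        i=$((i + 1))
    done
    if [ $# -ne 0 ]; then
        echo "Snapshot different: local node lists more disks"; exit 1
    fi
    echo "Snapshot matches"
)

# compare_snapshots 5000CCA20CC436B3 5000CCA20BEEB8B4
#   prints "Snapshot different: disk 0 remote=5000CCA20CC436B3 local=5000CCA20BEEB8B4"
#   and returns 1
```

In this model, editing /etc/vxfentab does not change either list, because the comparison uses the serial numbers already cached on each node.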
 
Note that the issue persists even if the vxfen module is unloaded and reloaded:
 
hoover# vxfenconfig -U
hoover# modinfo | grep fen
254 139c280 4fe28 319 1 vxfen (VRTS Fence 5.0MP1)
hoover# modunload -i 254
 
staubsauger# vxfenconfig -U
staubsauger# modinfo | grep fen
205 7bf7c000 4fe28 319 1 vxfen (VRTS Fence 5.0MP1)
staubsauger# modunload -i 205
 
staubsauger# vxfenconfig -c
VXFEN vxfenconfig NOTICE Driver will use SCSI-3 compliant disks.
hoover# vxfenconfig -c
VXFEN vxfenconfig NOTICE Driver will use SCSI-3 compliant disks.
VXFEN vxfenconfig ERROR V-11-2-1006 List of coordinator disks in running cluster is different than local node.
Unable to configure vxfen.
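The reload step takes the module ID from the first column of the modinfo output and passes it to modunload -i. A small shell sketch that extracts it, assuming the column layout shown above (ID in column 1, module name in column 6); the function name vxfen_module_id is hypothetical. As the transcript shows, reloading the module does not clear the stale serial numbers.

```shell
#!/bin/sh
# Hypothetical helper: print the module ID of vxfen from Solaris
# modinfo output read on stdin (column 1 = ID, column 6 = module name).
vxfen_module_id() {
    awk '$6 == "vxfen" { print $1 }'
}

# Usage on Solaris, after stopping fencing with vxfenconfig -U:
#   id=$(modinfo | vxfen_module_id) && [ -n "$id" ] && modunload -i "$id"
```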
 
At this stage the only way to re-form the cluster is to reboot the node(s) that continue to cache the stale serial numbers.
 

 
