Using an incorrect procedure to change coordinator disks can lead to inconsistencies between nodes in a cluster and may require nodes to be rebooted before fencing can be restarted

Article ID: 100022662

Resolution

In one customer support case, the customer changed coordinator disks online. This resulted in an inconsistency in the cached fencing disk serial numbers between nodes, and the nodes had to be rebooted before fencing could be restarted.

An example of the procedure that produces the inconsistency is shown below. Initially we have a two-node cluster in which both nodes have port b membership:

hoover# more /etc/llthosts
0 staubsauger
1 hoover

hoover# vxfenadm -g all -f /etc/vxfentab
Device Name: /dev/rdsk/c2t0d0s2
Total Number Of Keys: 2
key[0]:
Key Value [Numeric Format]: 66,45,45,45,45,45,45,45
Key Value [Character Format]: B-------
key[1]:
Key Value [Numeric Format]: 65,45,45,45,45,45,45,45
Key Value [Character Format]: A-------

Device Name: /dev/rdsk/c2t4d0s2
Total Number Of Keys: 2
key[0]:
Key Value [Numeric Format]: 66,45,45,45,45,45,45,45
Key Value [Character Format]: B-------
key[1]:
Key Value [Numeric Format]: 65,45,45,45,45,45,45,45
Key Value [Character Format]: A-------

Device Name: /dev/rdsk/c2t1d0s2
Total Number Of Keys: 2
key[0]:
Key Value [Numeric Format]: 66,45,45,45,45,45,45,45
Key Value [Character Format]: B-------
key[1]:
Key Value [Numeric Format]: 65,45,45,45,45,45,45,45
Key Value [Character Format]: A-------
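The numeric and character forms of each key are the same eight bytes: 66 is ASCII "B", 65 is "A" and 45 is "-". The sketch below decodes the numeric form with standard shell tools. The function names are hypothetical, and the node ID mapping (first byte minus 65, so "A" would be LLT node 0 and "B" node 1) is an assumption about the key format in this release; check it against /etc/llthosts on your own cluster.

```shell
#!/bin/sh
# decode_key: turn a comma-separated list of decimal byte values into text.
# (Hypothetical helper, not part of the vxfen tooling.)
decode_key() {
    printf '%s\n' "$1" |
        awk -F, '{ s = ""; for (i = 1; i <= NF; i++) s = s sprintf("%c", $i + 0); print s }'
}

# key_node_id: ASSUMPTION - the first byte is 'A' (65) plus the LLT node ID.
key_node_id() {
    printf '%s\n' "$1" | awk -F, '{ print $1 - 65 }'
}

decode_key 66,45,45,45,45,45,45,45     # prints B-------
key_node_id 66,45,45,45,45,45,45,45    # prints 1
```

Under that assumption, two keys on every disk ("A" and "B") indicate that both nodes listed in /etc/llthosts are registered with each coordinator disk.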

We stop fencing on both nodes. As expected, port b no longer appears in the GAB membership and all keys are removed from the coordinator disks:

hoover# vxfenconfig -U
hoover# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 1516f14 membership 01

hoover# vxdisk -o alldgs list | grep fencingdg
c2t0d0s2 auto:cdsdisk - (fencingdg) online
c2t1d0s2 auto:cdsdisk - (fencingdg) online
c2t4d0s2 auto:cdsdisk - (fencingdg) online

hoover# vxfenadm -g all -f /etc/vxfentab
Device Name: /dev/rdsk/c2t0d0s2
Total Number Of Keys: 0
No keys...

Device Name: /dev/rdsk/c2t4d0s2
Total Number Of Keys: 0
No keys...

Device Name: /dev/rdsk/c2t1d0s2
Total Number Of Keys: 0
No keys...
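The two checks in this step (no port b membership, zero keys on every disk) can be scripted against saved command output. This is a minimal sketch: the function names are hypothetical, and the patterns assume the output wording shown in this article.

```shell
#!/bin/sh
# Both helpers read saved command output on stdin (hypothetical names).

# port_b_present: succeed if `gabconfig -a` output lists port b.
port_b_present() {
    awk '$1 == "Port" && $2 == "b" { found = 1 } END { exit !found }'
}

# keys_per_device: print "<device> <key count>" for each disk listed in
# `vxfenadm -g all -f /etc/vxfentab` output.
keys_per_device() {
    awk '/^Device *Name:/          { dev = $NF }
         /^Total *Number Of Keys:/ { print dev, $NF }'
}

# Sample input copied from the transcript above.
gab_out='GAB Port Memberships
===============================================================
Port a gen 1516f14 membership 01'
fen_out='Device Name: /dev/rdsk/c2t0d0s2
Total Number Of Keys: 0
No keys...'

if printf '%s\n' "$gab_out" | port_b_present; then
    echo "port b still has membership"
else
    echo "port b is gone"
fi
printf '%s\n' "$fen_out" | keys_per_device
```

On a live node the same functions could be fed directly, for example `gabconfig -a | port_b_present`.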

At this stage the customer manually imports the coordinator disk group on one node and swaps disks:

hoover# vxdg -t import fencingdg
hoover# vxdg -g fencingdg rmdisk c2t0d0s2
hoover# vxdg -g fencingdg adddisk c1t1d0s2
hoover# vxdg deport fencingdg

We then update /etc/vxfentab on both nodes to reflect the disk change:

hoover# cat /etc/vxfentab
/dev/rdsk/c1t1d0s2
/dev/rdsk/c2t1d0s2
/dev/rdsk/c2t4d0s2

staubsauger# cat /etc/vxfentab
/dev/rdsk/c1t1d0s2
/dev/rdsk/c2t1d0s2
/dev/rdsk/c2t4d0s2
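Comparing the two files by eye works for three disks; a small helper makes the check repeatable. The sketch below assumes a copy of each node's /etc/vxfentab has already been placed in a local file; the function name and file paths are hypothetical.

```shell
#!/bin/sh
# same_disk_list: succeed if two vxfentab-style files name the same devices,
# ignoring order and blank lines. (Hypothetical helper.)
same_disk_list() {
    a=$(grep -v '^[[:space:]]*$' "$1" | sort)
    b=$(grep -v '^[[:space:]]*$' "$2" | sort)
    [ "$a" = "$b" ]
}

# Demo with stand-in copies of the two files.
tmp=$(mktemp -d)
printf '%s\n' /dev/rdsk/c1t1d0s2 /dev/rdsk/c2t1d0s2 /dev/rdsk/c2t4d0s2 > "$tmp/hoover.vxfentab"
printf '%s\n' /dev/rdsk/c2t4d0s2 /dev/rdsk/c1t1d0s2 /dev/rdsk/c2t1d0s2 > "$tmp/staubsauger.vxfentab"

if same_disk_list "$tmp/hoover.vxfentab" "$tmp/staubsauger.vxfentab"; then
    echo "device lists match"
else
    echo "device lists differ"
fi
rm -rf "$tmp"
```

A match here only confirms that the device paths agree. As the rest of this article shows, the serial numbers each node has cached for those paths can still differ.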

Once this is complete, we attempt to restart fencing on both nodes:

hoover# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 1516f14 membership 01

staubsauger# vxfenconfig -c
VXFEN vxfenconfig NOTICE Driver will use SCSI-3 compliant disks.

hoover# vxfenconfig -c
VXFEN vxfenconfig NOTICE Driver will use SCSI-3 compliant disks.
VXFEN vxfenconfig ERROR V-11-2-1006 List of coordinator disks in running cluster is different than local node.
Unable to configure vxfen.

Regardless of the order in which we start fencing on each node, the cluster is unable to form because of the above error. In addition, we see the following messages in the system log files:

hoover# tail /var/adm/messages
Dec  3 16:39:04 hoover scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
Dec  3 16:39:04 hoover scsi: [ID 107833 kern.notice] Vendor: FUJITSU Serial Number: DAL0P6C03KUW
Dec  3 16:39:04 hoover scsi: [ID 107833 kern.notice] Sense Key: Illegal Request
Dec  3 16:39:04 hoover scsi: [ID 107833 kern.notice] ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
Dec  3 16:39:08 hoover gab: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port b gen 1516f36 membership 01
Dec  3 16:39:08 hoover vxfen: [ID 549548 kern.notice] NOTICE: VXFEN ERROR V-11-1-23 Got the serial number of 5000CCA20CC436B3 for coordinator disk
Dec  3 16:39:08 hoover from remote node. Corresponding value for local node is 5000CCA20BEEB8B4.
Dec  3 16:39:08 hoover vxfen: [ID 583204 kern.notice] NOTICE: VXFEN ERROR V-11-1-11 Snapshot different. Dropping out of cluster.
Dec  3 16:39:08 hoover vxfen: [ID 416634 kern.notice] NOTICE: VXFEN INFO V-11-1-35 Fencing driver going into RUNNING state
Dec  3 16:39:08 hoover gab: [ID 397130 kern.notice] GAB INFO V-15-1-20032 Port b closed

Note that the vxfen driver reports that the fencing disk serial numbers do not match between nodes. This is due to a difference in the cached serial numbers for the coordinator disks on each node: the node where the disk was changed caches the new, correct serial numbers, whereas the other node(s) cache stale serial numbers.
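To spot this condition in a long log, the two serial numbers can be pulled out of the V-11-1-23 message. The sketch below is illustrative only: the function name is hypothetical and the pattern is based solely on the message wording shown in this article, which wraps across two syslog lines.

```shell
#!/bin/sh
# vxfen_serials: read syslog text on stdin and print the remote and local
# serial numbers from a V-11-1-23 message. (Hypothetical helper.)
vxfen_serials() {
    # Join the wrapped lines into one, then extract both hex strings.
    { tr '\n' ' '; echo; } |
        sed -n 's/.*serial number of \([0-9A-Fa-f]*\) for coordinator disk.*local node is \([0-9A-Fa-f]*\).*/remote=\1 local=\2/p'
}

# Sample input: the two relevant lines from the log excerpt.
log='Dec  3 16:39:08 hoover vxfen: [ID 549548 kern.notice] NOTICE: VXFEN ERROR V-11-1-23 Got the serial number of 5000CCA20CC436B3 for coordinator disk
Dec  3 16:39:08 hoover from remote node. Corresponding value for local node is 5000CCA20BEEB8B4.'

printf '%s\n' "$log" | vxfen_serials
# prints: remote=5000CCA20CC436B3 local=5000CCA20BEEB8B4
```

The helper assumes a single such message in its input; with several, only the last remote serial would be reported.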

Note that this issue persists even if the vxfen module is reloaded:

hoover# vxfenconfig -U
hoover# modinfo | grep fen
254 139c280 4fe28 319 1 vxfen (VRTS Fence 5.0MP1)
hoover# modunload -i 254

staubsauger# vxfenconfig -U
staubsauger# modinfo | grep fen
205 7bf7c000 4fe28 319 1 vxfen (VRTS Fence 5.0MP1)

staubsauger# modunload -i 205

staubsauger# vxfenconfig -c
VXFEN vxfenconfig NOTICE Driver will use SCSI-3 compliant disks.

hoover# vxfenconfig -c
VXFEN vxfenconfig NOTICE Driver will use SCSI-3 compliant disks.
VXFEN vxfenconfig ERROR V-11-2-1006 List of coordinator disks in running cluster is different than local node.
Unable to configure vxfen.

At this stage the only way to re-form the cluster is to reboot the node that is still caching stale serial numbers.

