VXFEN CRITICAL V-11-1-24 Local cluster node ejected from cluster to prevent potential data corruption since RACER died
Jul 28 01:19:36 HOSTP1 vmunix: VXFEN WARNING V-11-1-65 Could not eject node 1 from disk
Jul 28 01:19:36 HOSTP1 vmunix: with serial number 60060160510F260056C0EB8655FFDE11 since
Jul 28 01:19:36 HOSTP1 vmunix: keys of node 0 are not registered with it
Jul 28 01:19:37 HOSTP1 vmunix: with serial number 60060160510F260057C0EB8655FFDE11 since
Jul 28 01:19:38 HOSTP1 vmunix: with serial number 60060160510F260058C0EB8655FFDE11 since
This issue is caused by a defect in the vxdisk binary, which causes it to display the wrong pathname.
# vxdisk list emcpower5s2
Device: emcpower5s2 <<< emcpower5
devicetag: emcpower5
type: auto
hostid:
disk: name= id=1184769158.55.topas-db02
group: name=fendg id=1184769164.57.topas-db02 <<< fendg
info: format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags: online ready private autoconfig
pubpaths: block=/dev/vx/dmp/emcpower5s2 char=/dev/vx/rdmp/emcpower5s2
....
Multipathing information:
numpaths: 1
emcpower55c state=enabled <<< emcpower55, this is incorrect !!
mtvspt02$ cat vxdisk_list_emcpower55s2
Device: emcpower55s2 <<< emcpower55
devicetag: emcpower55
type: auto
clusterid: mycluster
disk: name= id=1286048319.88.topas-db04
group: name=oradg id=1184786112.114.topas-db02 <<< oradg
info: format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags: online ready private autoconfig shared autoimport
pubpaths: block=/dev/vx/dmp/emcpower55s2 char=/dev/vx/rdmp/emcpower55s2
.....
Multipathing information:
numpaths: 1
emcpower55c state=enabled <<< emcpower55 (correct)
The above problem can happen on any disk, not only the fencing coordinator disks. However, if it happens on a coordinator disk and /etc/vxfenmode has scsi3_disk_policy=raw, or the system is running Veritas Storage Foundation (SF) 4.1, where only raw mode is available, the vxfen startup script generates /etc/vxfentab with an incorrect list of coordinator disks. With coordinator disks in raw mode, the system startup script (e.g. /etc/rc2.d/S97vxfen) obtains the OS device pathnames of the coordinator disks from the faulty vxdisk list output. For example, the incorrect vxdisk list output above produces an /etc/vxfentab file containing /dev/rdsk/emcpower55c, which actually belongs to the data disk group oradg.
Note that the first node to start in the cluster succeeds even though its fencing keys are registered on the wrong disk. This is because the fencing driver (vxfen) does not generate any I/O to the disks; it only needs to register the keys. Subsequent nodes trying to join the cluster fail because a joining node is unlikely to be using the same wrong disk.
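The mismatch shown in the listings above (a devicetag of emcpower5 paired with a path named emcpower55c) can be spotted mechanically. The following is a minimal sketch, not a vendor tool: check_vxdisk_paths is a hypothetical helper name, and it assumes that each path name under "Multipathing information:" is the devicetag followed by a single slice letter, as in the PowerPath examples in this article. It also assumes a POSIX awk (on Solaris, nawk rather than the legacy /usr/bin/awk).

```shell
# check_vxdisk_paths (hypothetical helper): reads `vxdisk list <device>` output
# on stdin and reports any multipathing path that does not match the devicetag.
# Assumption: path name = devicetag + one trailing slice letter (e.g. "c").
check_vxdisk_paths() {
    awk '
        $1 == "devicetag:" { tag = $2 }
        /Multipathing information:/ { inpaths = 1; next }
        inpaths && $2 ~ /^state=/ {
            path = $1
            sub(/[a-z]$/, "", path)      # emcpower55c -> emcpower55
            if (path != tag) {
                printf "MISMATCH devicetag=%s path=%s\n", tag, $1
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

It could be used as `vxdisk list emcpower5s2 | check_vxdisk_paths`; a non-zero exit status would indicate that at least one path does not match the devicetag. Other device naming schemes would need a different comparison rule.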
This defect in the vxdisk binary is fixed through the etrack incidents listed in the Supplemental Material section of this article. The fix is available in VxVM 5.0MP3RP1HF10 on the Solaris platform.
There is no official patch for VxVM 4.1 on the Solaris platform because this version has reached End of Support. If you encounter the problem, use the attached hotfix vxdisk binary. One binary is for Solaris 9 and one is for Solaris 10. Note that the system must first be upgraded to the latest VxVM 4.1MP2 RP6 patch level.
# cksum vxdisk_41mp2rp6_e1634785_sol9
1367017323 1112248 vxdisk_41mp2rp6_e1634785_sol9
# cksum vxdisk_41mp2rp6_e1634785_sol10
2068478890 1006512 vxdisk_41mp2rp6_e1634785_sol10
Back up the original vxdisk binary and replace it with the hotfix binary that corresponds to your Solaris version.
# cd /usr/sbin
# mv vxdisk vxdisk.41mp2rp6.orig
# mv /tmp/vxdisk_41mp2rp6_e1634785_solX vxdisk
# chown root:sys vxdisk
# chmod 555 vxdisk
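Before replacing the binary, it is worth confirming that the downloaded file matches the checksum and size listed above. The following is a minimal sketch of such a check; verify_cksum is a hypothetical helper name, and it relies only on the standard cksum utility reading from standard input, in which case it prints the CRC and byte count without a filename.

```shell
# verify_cksum (hypothetical helper): compare a file's cksum CRC and size
# against expected values. Usage: verify_cksum <file> <crc> <size>
verify_cksum() {
    vc_file=$1; vc_crc=$2; vc_size=$3
    [ -r "$vc_file" ] || { echo "cannot read $vc_file" >&2; return 2; }
    # Reading from stdin makes cksum print only "<crc> <size>".
    set -- $(cksum < "$vc_file")
    if [ "$1" = "$vc_crc" ] && [ "$2" = "$vc_size" ]; then
        echo "OK $vc_file"
    else
        echo "MISMATCH $vc_file: got $1 $2, expected $vc_crc $vc_size" >&2
        return 1
    fi
}
```

For the Solaris 9 binary, for example, the call would be `verify_cksum /tmp/vxdisk_41mp2rp6_e1634785_sol9 1367017323 1112248`, using the values from the cksum listing above. Proceed with the replacement only if the function reports OK.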