VMwareDisks and DiskGroup resources used together may cause delays in bringing the DiskGroup resource online.

Article ID: 100065030

Description

Error Message

From DiskGroup_A.log

2024/03/07 03:28:29 VCS INFO V-16-2-13716 Thread(139768031495936) Resource(dataDG): Output of the completed operation (online)
==============================================
VxVM vxdisk ERROR V-5-1-14808 Invalid device name is specified.
VxVM vxdg ERROR V-5-1-19179 Disk group datadg: import failed:
No valid disk found containing disk group

Please refer to system log for details.
 

2024/03/07 04:05:15 VCS NOTICE V-16-10031-1559 DiskGroup:dataDG:online:Volumes in DiskGroup datadg will be started automatically as part of import command,the system level autostartvolume is set On
2024/03/07 04:05:15 VCS INFO V-16-2-13716 Thread(140319076787968) Resource(dataDG): Output of the completed operation (online)
==============================================
cat: /var/VRTSvcs/lock/volatile/vxpath_datadg_tmp: No such file or directory
VxVM vxdisk ERROR V-5-1-5267  Incorrect usage
 Usage:
        vxdisk [-f] scandisks [ [!]device=...| [!]ctlr=...| [!]pctlr=...|new|fabric]
rm: cannot remove '/var/VRTSvcs/lock/volatile/vxpath_datadg_tmp': No such file or directory
==============================================

 

Cause

This problem is observed when a DiskGroup resource depends on a VMwareDisks resource.
The VMwareDisks agent reports its resource online as soon as the VMware disk is attached to the virtual machine. If a dependent DiskGroup resource starts to come online at that moment, the import fails because the VMware disk is not yet present in the DMP database, owing to VxVM transaction latency.
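
The latency described above can be observed on a node by polling VxVM until the disks of the disk group become visible. The following is only a minimal sketch and is not part of the documented fix. The helper names wait_until and dg_visible are hypothetical, and the sketch assumes that `vxdisk -o alldgs list` shows the disk group name next to the newly attached disk.

```shell
#!/bin/sh
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds or roughly
# TIMEOUT seconds have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Hypothetical check: succeeds once VxVM lists a disk tagged with the
# given disk group name (assumption: the name appears in the listing).
dg_visible() {
    vxdisk -o alldgs list 2>/dev/null | grep -qw "$1"
}

# Example, run right after the VMware disk is attached:
#   wait_until 60 5 dg_visible datadg && echo "datadg disks visible to VxVM"
```

If the check only succeeds several seconds after the VMwareDisks resource is reported online, that gap is the window in which the DiskGroup import can fail.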

Resolution

Follow the steps below to add a delay while the VMwareDisks resource comes online, so that the DMP database is up to date by the time the DiskGroup resource starts to come online.

# cp -p /opt/VRTSvcs/bin/sample_triggers/resstatechange /opt/VRTSvcs/bin/triggers/

Add a sleep(20) call to the resstatechange trigger, after the "Refreshing OS device tree." log line.

# vi /opt/VRTSvcs/bin/triggers/resstatechange


`$HALOG -add "Refreshing OS device tree." -sev I -sys $ARGV[0]`;
sleep(20);   # added line: wait 20 seconds before rescanning
$cmd = "for i in `ls -Ud /sys/class/scsi_host/host*`; " . "do echo '- - -' > \$i/scan; done";
`$cmd`;

`$HALOG -add "Scanning disks" -sev I -sys $ARGV[0]`;
`$VXDISK scandisks`;

Ensure the file has execute permission.
# chmod 744 /opt/VRTSvcs/bin/triggers/resstatechange

Copy this file to /opt/VRTSvcs/bin/triggers/ on all cluster nodes.
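
One way to distribute the file is a small loop over the other nodes. This is a sketch under assumptions: the node names are placeholders, copy_trigger is a hypothetical helper name, and root scp access between cluster nodes is assumed. By default the helper only prints the commands it would run.

```shell
#!/bin/sh
TRIGGER=/opt/VRTSvcs/bin/triggers/resstatechange

# copy_trigger NODE [NODE...]
# Prints the scp command for each node (dry run, the default), or runs it
# when DRY_RUN=0. scp -p preserves the mode set by chmod.
copy_trigger() {
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "scp -p $TRIGGER $node:$TRIGGER"
        else
            scp -p "$TRIGGER" "$node:$TRIGGER"
        fi
    done
}

# Example (placeholder node names):
#   copy_trigger node2 node3             # show what would be copied
#   DRY_RUN=0 copy_trigger node2 node3   # actually copy
```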

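Enable TriggerResStateChange for the VMwareDisks resource. Replace <vmwaredisks_resource> with the name of that resource (vmware_disk in the example log below).

# haconf -makerw
# hares -modify <vmwaredisks_resource> TriggerResStateChange 1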
# haconf -dump -makero
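
To confirm that the setting took effect, the attribute can be read back with hares -value. A minimal sketch: trigger_enabled is a hypothetical helper name, and vmware_disk is the resource name taken from the example log in this article.

```shell
#!/bin/sh
# trigger_enabled RESOURCE
# Returns 0 when TriggerResStateChange reads back as 1 for RESOURCE,
# non-zero otherwise (including when hares is unavailable).
trigger_enabled() {
    [ "$(hares -value "$1" TriggerResStateChange 2>/dev/null)" = "1" ]
}

# Example:
#   trigger_enabled vmware_disk && echo "resstatechange trigger enabled"
```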

Switch the service group from one node to another, then check the engine log to confirm that the resstatechange trigger ran.

Example:
# cat /var/VRTSvcs/log/engine_A.log
2024/03/14 21:05:26 VCS NOTICE V-16-1-10301 Initiating Online of Resource vmware_disk (Owner: Unspecified, Group: diskdg) on System rhel81
2024/03/14 21:05:26 VCS INFO V-16-6-0 (rhel81b) resstatechange:restatechange Invoked for resource vmware_disk
2024/03/14 21:05:26 VCS INFO V-16-6-0 (rhel81b) resstatechange:Disk Id recived from VMware disk resource is <6000C291-521b-ce53-d03f-1939f15d5d62>
2024/03/14 21:05:26 VCS INFO V-16-6-0 (rhel81b) resstatechange:Removing disk rhel81b_vmdk0_1 from VxVM
2024/03/14 21:05:26 VCS INFO V-16-6-0 (rhel81b) resstatechange:Removing device [sdv] from OS database
2024/03/14 21:05:26 VCS INFO V-16-6-0 (rhel81b) resstatechange:Refreshing OS device tree.
2024/03/14 21:05:43 VCS INFO V-16-1-10298 Resource vmware_disk (Owner: Unspecified, Group: diskdg) is online on rhel81 (VCS initiated)
2024/03/14 21:05:43 VCS NOTICE V-16-1-10301 Initiating Online of Resource dataDG (Owner: Unspecified, Group: diskdg) on System rhel81
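
Rather than reading the whole engine log, the relevant lines can be filtered out. The sketch below assumes the message formats shown in the example above; trigger_lines is a hypothetical helper name.

```shell
#!/bin/sh
# trigger_lines LOGFILE RESOURCE
# Prints the lines logged by the resstatechange trigger together with the
# engine messages that name RESOURCE (online initiation and completion).
trigger_lines() {
    grep -e "resstatechange:" -e "Resource $2 " "$1"
}

# Example:
#   trigger_lines /var/VRTSvcs/log/engine_A.log vmware_disk
```

The time between the "Refreshing OS device tree." message and the message reporting the resource online gives a rough indication of how long the trigger held things up before the DiskGroup resource started.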

 

 

Additional Information

JIRA: STESC-8697