The following message is reported repeatedly in /etc/VRTSvcs/log/wac_A.log on the primary cluster:
2024/02/06 12:20:18 VCS WARNING V-16-1-10519 IpmHandle::send peer closed
The following messages are reported repeatedly in /etc/VRTSvcs/log/wac_A.log on the secondary (DR) cluster:
VCS ERROR V-16-3-18491 Unable to connect to remote cluster xxxxx securely
2024/02/06 12:20:41 VCS INFO V-16-3-18306 Initiating connection to cluster prodclus at xxx.xxx.xx.xxx
Where xxxxx and xxx.xxx.xx.xxx are the remote cluster name and the remote cluster IP address respectively. Normal switch operations succeed until the node crashes.
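To gauge how often these messages recur, the log can be tallied by message ID. The following is a minimal sketch only: the function name count_wac_msgs is illustrative and not part of VCS, and the log file path is whatever is passed as the first argument.

```shell
#!/bin/sh
# count_wac_msgs LOGFILE
# Prints "<message-id> <count>" for each of the three message IDs quoted above.
# Hypothetical helper name; it only reads the file it is given.
count_wac_msgs() {
    for id in V-16-1-10519 V-16-3-18491 V-16-3-18306; do
        # grep -c prints the number of matching lines (0 when there are none)
        printf '%s %s\n' "$id" "$(grep -c -- "$id" "$1")"
    done
}
```

Example usage: count_wac_msgs /etc/VRTSvcs/log/wac_A.log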
The remote cluster will be in an INIT state.
hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A server101 RUNNING 0
A server102 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
....
-- WAN HEARTBEAT STATE
-- Heartbeat To State
M Icmp drclus ALIVE
-- REMOTE CLUSTER STATE
-- Cluster State
N drclus INIT
...
P globalgroup drclus:drserver201 Y N OFFLINE
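When checking several clusters, the remote cluster state can be pulled out of the summary rather than read by eye. This is a sketch that assumes the output layout shown above (a "-- REMOTE CLUSTER STATE" header followed by lines beginning with N); the function name remote_cluster_states is hypothetical.

```shell
#!/bin/sh
# remote_cluster_states: reads `hastatus -sum` output on stdin and prints
# "<cluster> <state>" for each N line in the REMOTE CLUSTER STATE section.
# Assumes the layout shown in this article.
remote_cluster_states() {
    awk '
        /^-- REMOTE CLUSTER STATE/ { in_section = 1; next }
        # any other all-uppercase "-- ... STATE" header ends the section
        /^-- [A-Z ]+ STATE/        { in_section = 0 }
        in_section && $1 == "N"    { print $2, $3 }
    '
}
```

Example usage: hastatus -sum | remote_cluster_states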
The following command may show the service group in a migrate state from the node that crashed:
hagrp -display -all|grep -i migrateq
ClusterService MigrateQ localclus
globalgroup MigrateQ localclus Server101
globalgroup MigrateQ localclus Server101
The following command may show a non-zero value for the service group:
hagrp -display -all|grep -i intentonline
globalgroup IntentOnline localclus 1
globalgroup IntentOnline localclus 1
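The same check can be reduced to a list of affected groups. The sketch below assumes the column layout shown above (group name first, attribute name second, value last); the function name groups_with_intent_online is illustrative only.

```shell
#!/bin/sh
# groups_with_intent_online: reads `hagrp -display -all` output on stdin and
# prints each group whose IntentOnline value is non-zero, once per group.
# Assumes: field 1 = group, field 2 = attribute, last field = value.
groups_with_intent_online() {
    awk '$2 == "IntentOnline" && $NF != "0" && !seen[$1]++ { print $1 }'
}
```

Example usage: hagrp -display -all | groups_with_intent_online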
The ClusterService WAC process has been configured securely:
ps -ef | grep wac
root 32467 1 12 09:16 ? 00:30:44 /opt/VRTSvcs/bin/wac -secure
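For scripted checks, the presence of the -secure argument can be tested directly. This is a sketch under the assumption that the wac binary path is /opt/VRTSvcs/bin/wac as in the output above; the function name wac_is_secure is hypothetical.

```shell
#!/bin/sh
# wac_is_secure: reads `ps -ef` output on stdin and exits 0 if a
# /opt/VRTSvcs/bin/wac process was started with the -secure argument.
# The pattern requires whitespace after "wac", so wacstart is not matched.
wac_is_secure() {
    grep -Eq '/opt/VRTSvcs/bin/wac[[:space:]](.*[[:space:]])?-secure([[:space:]]|$)'
}
```

Example usage: ps -ef | wac_is_secure && echo "WAC is running in secure mode"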
Trust has not been configured, or has not been configured correctly, between the DR site and the node in question. As a result, WAC cannot enter a running state and remains stuck in an INIT state.
1. The immediate workaround is to flush the service group on both nodes and then manually bring the service group online, for example:
hagrp -flush globalgroup -sys server101
hagrp -flush globalgroup -sys server102
hagrp -online globalgroup -sys server102
2. The permanent solution is to either:
a. Modify the WAC start and monitor processes to run in insecure mode.
haconf -makerw
hares -modify wac StartProgram "/opt/VRTSvcs/bin/wacstart"
hares -modify wac MonitorProcesses "/opt/VRTSvcs/bin/wac"
haconf -dump -makero
b. Establish trust between the problem node and the DR cluster. The following must be run on the node where the service group will not come online, and on all nodes at the DR site.
export EAT_DATA_DIR=/var/VRTSvcs/vcsauth/data/WAC
/opt/VRTSvcs/bin/vcsat setuptrust -b xxx.xxx.xx.xxx:14149 -s high
Where xxx.xxx.xx.xxx is an IP address on the remote node.
Example assuming the following:
server102 is the node where the service group does not come online and has an IP address of 192.168.10.102.
DRServer201 is a node on the DR site with an IP address of 192.168.10.201.
DRServer202 is the second node on the DR site with an IP address of 192.168.10.202.
The following is executed on server102:
export EAT_DATA_DIR=/var/VRTSvcs/vcsauth/data/WAC
/opt/VRTSvcs/bin/vcsat setuptrust -b 192.168.10.201:14149 -s high
export EAT_DATA_DIR=/var/VRTSvcs/vcsauth/data/WAC
/opt/VRTSvcs/bin/vcsat setuptrust -b 192.168.10.202:14149 -s high
The following is executed on DRServer201 and DRServer202:
export EAT_DATA_DIR=/var/VRTSvcs/vcsauth/data/WAC
/opt/VRTSvcs/bin/vcsat setuptrust -b 192.168.10.102:14149 -s high
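Where several remote nodes are involved, the per-node commands can be generated from a list of IP addresses and reviewed before they are run. The following is a sketch only: print_trust_cmds is a hypothetical helper that prints the commands and does not execute them; the port 14149 and the security level high are taken from the steps above.

```shell
#!/bin/sh
# print_trust_cmds IP...
# Prints (does not run) the export line once, followed by one
# setuptrust command per remote IP address given as an argument.
print_trust_cmds() {
    echo 'export EAT_DATA_DIR=/var/VRTSvcs/vcsauth/data/WAC'
    for ip in "$@"; do
        echo "/opt/VRTSvcs/bin/vcsat setuptrust -b ${ip}:14149 -s high"
    done
}
```

Example usage on server102: print_trust_cmds 192.168.10.201 192.168.10.202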