VCS ERROR V-16-10061-657 CoordPoint:coordpoint:monitor:Child process terminated abnormally(9)
Although the monitor itself does not fail, the results of querying the CPS database are not returned quickly enough. As a result, the CoordPoint agent kills the monitor process and reissues the monitor.
Increase the MonitorInterval, MonitorTimeout, and ToleranceLimit values for the CoordPoint agent type on the active cluster(s):
# haconf -makerw
# hatype -modify CoordPoint MonitorInterval 120
# hatype -modify CoordPoint MonitorTimeout 120
# hatype -modify CoordPoint ToleranceLimit 3
# haconf -dump -makero
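After the change, the new values can be read back to confirm they took effect. The following is a minimal sketch, assuming the standard VCS `hatype -value` query is available on the PATH and is run as root on a cluster node; the function name `show_coordpoint_tunables` is hypothetical and not part of VCS.

```shell
# Hypothetical helper: print each tuned CoordPoint type attribute as name=value.
# Assumes the VCS command "hatype" is on the PATH of the calling shell.
show_coordpoint_tunables() {
    for attr in MonitorInterval MonitorTimeout ToleranceLimit; do
        # hatype -value <type> <attribute> prints the current type-level value
        printf '%s=%s\n' "$attr" "$(hatype -value CoordPoint "$attr")"
    done
}
```

Calling `show_coordpoint_tunables` after the commands above should report 120, 120, and 3 if the modification was saved.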
Mar 4 14:54:25 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-657 CoordPoint:coordpoint:monitor:Child process terminated abnormally(9)
Mar 4 14:54:25 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-658 CoordPoint:coordpoint:monitor:The child process was terminated due to signal (9)
Mar 4 14:54:36 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-657 CoordPoint:coordpoint:monitor:Child process terminated abnormally(9)
Mar 4 14:54:36 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-658 CoordPoint:coordpoint:monitor:The child process was terminated due to signal (9)
Mar 4 14:54:44 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-657 CoordPoint:coordpoint:monitor:Child process terminated abnormally(9)
Mar 4 14:54:44 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-658 CoordPoint:coordpoint:monitor:The child process was terminated due to signal (9)
Mar 4 14:54:44 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-655 CoordPoint:coordpoint:monitor:Total number of faults have exceeded the fault tolerance value
Mar 4 14:54:44 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13068 Thread(4) Resource(coordpoint) - clean completed successfully.
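To gauge how often the monitor is being killed, the occurrences of message V-16-10061-657 can be counted in the system log. This is a rough sketch only: the function name is hypothetical, and the log file path is supplied by the caller (for example `/var/adm/messages`, which is an assumption based on the Solaris-style syslog format shown above).

```shell
# Hypothetical helper: count CoordPoint monitor kills (V-16-10061-657) in a syslog file.
# $1 is the path to the log file; the caller supplies it.
count_coordpoint_monitor_kills() {
    # grep -c prints the number of matching lines
    grep -c 'V-16-10061-657 CoordPoint:' "$1"
}
```

A count that keeps growing after the timeout values have been raised suggests that the CPS queries are still taking longer than MonitorTimeout allows.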
At this time, /var/VRTSvcs/log/Coordpoint_A.log on the monitored server shows the following command being executed:
2024/03/04 14:54:14 VCS DBG_10 CoordPoint:coordpoint:monitor:exec -/opt/VRTScps/bin/cpsadm -s xxx.xx.x.xxx -p 443 -u {212d7ac0-1dd2-11b2-aa68-00144ffb1823} -a list_membership
CoordPoint.C:monitor_coordination_server[808]2024-03-04 14:54:22 CPS INFO V-97-1400-22019 Response message : CPS_CLUS_UUID: {50cf7e1c-1dd2-11b2-b70e-00144ffbba59} CPS_NODE_ID: 1 CPS_CLIENT_REQ_TIMESTAMP: Mon Mar 4 14:54:17 2024
2024-03-04 14:54:22 CPS INFO V-97-1400-22019 Response message : CPS_CLUS_UUID: {af9e7aca-1dd1-11b2-9ee6-00144ff9a8ac} CPS_NODE_ID: 1 CPS_CLIENT_REQ_TIMESTAMP: Mon Mar 4 14:54:17 2024
2024-03-04 14:54:22 CPS INFO V-97-1400-22019 Response message : CPS_CLUS_UUID: {212d7ac0-1dd2-11b2-aa68-00144ffb1823} CPS_NODE_ID: 0 CPS_CLIENT_REQ_TIMESTAMP: Mon Mar 4 14:54:17 2024
2024-03-04 14:54:22 CPS INFO V-97-1400-22019 Response message : CPS_CLUS_UUID: {af9e7aca-1dd1-11b2-9ee6-00144ff9a8ac} CPS_NODE_ID: 0 CPS_CLIENT_REQ_TIMESTAMP: Mon Mar 4 14:54:21 2024
2024-03-04 14:54:23 CPS INFO V-97-1400-904 Got message to list registered nodes of cluster with UUID {af9e7aca-1dd1-11b2-9ee6-00144ff9a8ac
However, /var/VRTSvcs/log/engine_A.log on the monitored server shows that the monitor does not succeed. A clean is called, and the resource recovers on its own:
2024/03/04 14:54:44 VCS INFO V-16-2-13068 (xxxxxxxx) Resource(coordpoint) - clean completed successfully.
2024/03/04 14:54:52 VCS INFO V-16-2-13082 (xxxxxxxx) Resource(coordpoint) recovered from fault, on its own.
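To see how long the CPS query takes outside the agent, the same cpsadm command that appears in the agent log can be timed by hand. The wrapper below is a sketch under stated assumptions: `time_command` is a hypothetical name, and it relies on `date +%s`, which is not available in every `date` implementation.

```shell
# Hypothetical helper: run a command, discard its output, and print elapsed whole seconds.
# Assumes the local "date" supports the +%s (epoch seconds) format.
time_command() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    rc=$?
    end=$(date +%s)
    echo $((end - start))
    # Pass the wrapped command's exit status back to the caller
    return $rc
}
```

For example, `time_command /opt/VRTScps/bin/cpsadm -s <CPS address> -p 443 -u <cluster UUID> -a list_membership`, with the arguments copied from the agent log. An elapsed time close to or above the configured MonitorTimeout would be consistent with the monitor being killed.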