VCS ERROR V-16-10061-657 CoordPoint:coordpoint:monitor:Child process terminated abnormally(9)

Description

Error Message

VCS ERROR V-16-10061-657 CoordPoint:coordpoint:monitor:Child process terminated abnormally(9)

Cause

Although the monitor does not fail as such, the results of the CPS database query are not returned fast enough. The CoordPoint agent therefore kills the monitor's child process with signal 9 (SIGKILL) and reissues the monitor.

Resolution

Increase the MonitorInterval, MonitorTimeout and ToleranceLimit attributes of the CoordPoint agent type on the active cluster(s):

# haconf -makerw
# hatype -modify CoordPoint MonitorInterval 120
# hatype -modify CoordPoint MonitorTimeout 120
# hatype -modify CoordPoint ToleranceLimit 3
# haconf -dump -makero
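After the configuration is dumped, the new type-level values can be read back to confirm that they took effect. The following is a minimal sketch, assuming that "hatype -value <type> <attribute>" prints a single integer for each of these attributes on your VCS version. The function name check_coordpoint_attrs is hypothetical, and the minimum values are simply the ones set by the commands above.

```shell
# Hypothetical helper: confirm that the CoordPoint type attributes are at
# least the values set above. Assumes "hatype -value <type> <attribute>"
# prints a single integer value.
check_coordpoint_attrs() {
    rc=0
    for pair in MonitorInterval:120 MonitorTimeout:120 ToleranceLimit:3; do
        attr=${pair%%:*}     # attribute name before the colon
        want=${pair#*:}      # minimum expected value after the colon
        have=$(hatype -value CoordPoint "$attr")
        # A non-numeric or empty value makes the test fail and is reported as LOW
        if [ "$have" -ge "$want" ] 2>/dev/null; then
            echo "OK   $attr=$have"
        else
            echo "LOW  $attr=$have (expected >= $want)"
            rc=1
        fi
    done
    return $rc
}
```

Run check_coordpoint_attrs on a node of each cluster that was changed; it returns a non-zero status if any attribute is still below the intended value.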

Issue/Introduction

The following errors are written repeatedly to /var/log/messages on the monitored server:

Mar  4 14:54:25 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-657 CoordPoint:coordpoint:monitor:Child process terminated abnormally(9)

Mar  4 14:54:25 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-658 CoordPoint:coordpoint:monitor:The child process was terminated due to signal (9)

Mar  4 14:54:36 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-657 CoordPoint:coordpoint:monitor:Child process terminated abnormally(9)

Mar  4 14:54:36 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-658 CoordPoint:coordpoint:monitor:The child process was terminated due to signal (9)

Mar  4 14:54:44 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-657 CoordPoint:coordpoint:monitor:Child process terminated abnormally(9)

Mar  4 14:54:44 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-658 CoordPoint:coordpoint:monitor:The child process was terminated due to signal (9)

Mar  4 14:54:44 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-10061-655 CoordPoint:coordpoint:monitor:Total number of faults have exceeded the fault tolerance value

Mar  4 14:54:44 xxxxxxxx AgentFramework[8465]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13068 Thread(4) Resource(coordpoint) - clean completed successfully.
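To gauge how often the monitor is being killed, the message IDs above can be counted in the system log. This is a minimal sketch that defaults to the log path named above; the function name count_coordpoint_kills is hypothetical, and on systems that log elsewhere the path should be passed as the first argument.

```shell
# Hypothetical helper: count CoordPoint monitor kills (V-16-10061-657) and
# fault tolerance breaches (V-16-10061-655) in a syslog file.
count_coordpoint_kills() {
    log=${1:-/var/log/messages}
    # grep -c prints 0 (and exits 1) when there are no matches
    kills=$(grep -c 'V-16-10061-657 CoordPoint:' "$log")
    breaches=$(grep -c 'V-16-10061-655 CoordPoint:' "$log")
    echo "monitor kills: $kills"
    echo "fault tolerance exceeded: $breaches"
}
```

A steadily rising kill count after the attribute change suggests the query latency is still above the monitor timeout.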

Around the same time, /var/VRTSvcs/log/Coordpoint_A.log on the monitored server shows that the agent executed the following command:

2024/03/04 14:54:14 VCS DBG_10 CoordPoint:coordpoint:monitor:exec -/opt/VRTScps/bin/cpsadm -s xxx.xx.x.xxx -p 443 -u {212d7ac0-1dd2-11b2-aa68-00144ffb1823} -a list_membership

    CoordPoint.C:monitor_coordination_server[808]
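Because the cause is query latency rather than a failed query, it can help to run the same cpsadm query by hand from the monitored server and time it. The sketch below passes its arguments straight to cpsadm, so they can be copied from the agent log line above with the real CP server address and cluster UUID substituted. The function name time_cps_query and the CPSADM override variable are hypothetical conveniences; the timing has one-second resolution and assumes a date command that supports +%s (GNU date does; older Solaris date may not).

```shell
# Hypothetical helper: run a cpsadm query and report how long it took, for
# comparison with the CoordPoint MonitorTimeout. CPSADM may be overridden;
# it defaults to the binary path shown in the agent log.
time_cps_query() {
    cpsadm=${CPSADM:-/opt/VRTScps/bin/cpsadm}
    start=$(date +%s)
    "$cpsadm" "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "exit=$status elapsed=$((end - start))s"
    return $status
}
```

For example: time_cps_query -s <cp_server> -p 443 -u <cluster_uuid> -a list_membership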

The /var/VRTScps/log/cpserver_A.log on the Coordination Point (CP) servers shows that the query executes successfully and that results are returned:

2024-03-04 14:54:22 CPS INFO V-97-1400-22019 Response message : CPS_CLUS_UUID: {50cf7e1c-1dd2-11b2-b70e-00144ffbba59} CPS_NODE_ID: 1 CPS_CLIENT_REQ_TIMESTAMP: Mon Mar  4 14:54:17 2024

2024-03-04 14:54:22 CPS INFO V-97-1400-22019 Response message : CPS_CLUS_UUID: {af9e7aca-1dd1-11b2-9ee6-00144ff9a8ac} CPS_NODE_ID: 1 CPS_CLIENT_REQ_TIMESTAMP: Mon Mar  4 14:54:17 2024

2024-03-04 14:54:22 CPS INFO V-97-1400-22019 Response message : CPS_CLUS_UUID: {212d7ac0-1dd2-11b2-aa68-00144ffb1823} CPS_NODE_ID: 0 CPS_CLIENT_REQ_TIMESTAMP: Mon Mar  4 14:54:17 2024

2024-03-04 14:54:22 CPS INFO V-97-1400-22019 Response message : CPS_CLUS_UUID: {af9e7aca-1dd1-11b2-9ee6-00144ff9a8ac} CPS_NODE_ID: 0 CPS_CLIENT_REQ_TIMESTAMP: Mon Mar  4 14:54:21 2024

2024-03-04 14:54:23 CPS INFO V-97-1400-904 Got message to list registered nodes of cluster with UUID {af9e7aca-1dd1-11b2-9ee6-00144ff9a8ac

However, /var/VRTSvcs/log/engine_A.log on the monitored server shows that the monitor does not succeed. Clean is called and completes successfully, and the resource then recovers from the fault on its own:

2024/03/04 14:54:44 VCS INFO V-16-2-13068 (xxxxxxxx) Resource(coordpoint) - clean completed successfully.

2024/03/04 14:54:52 VCS INFO V-16-2-13082 (xxxxxxxx) Resource(coordpoint) recovered from fault, on its own.
