vxcpserv error on Coordination Point Server: Error writing to database! :database is locked


Article ID: 100033688



Description

Error Message

Two types of error were seen:

a) On the client cluster:

The CoordPoint resource faulted, and the following messages were reported in CoordPoint_A.log:

2017/02/07 07:41:14 VCS WARNING V-16-10061-13196 Thread(140204565960448) script (/opt/VRTScps/bin/cpsadm) terminated due to signal (9)
2017/02/07 07:41:14 VCS ERROR V-16-10061-657 CoordPoint:coordpoint:monitor:Child process terminated abnormally(9)
2017/02/07 07:41:14 VCS ERROR V-16-10061-658 CoordPoint:coordpoint:monitor:The child process was terminated due to signal (9)
2017/02/07 07:41:15 VCS ERROR V-16-10061-655 CoordPoint:coordpoint:monitor:Total number of faults have exceeded the fault tolerance value
2017/02/07 07:41:15 VCS DBG_FFDC Generating FFDC for resource (coordpoint) as monitor entry point reported unexpected OFFLINE



b) In the Coordination Point Server's syslog file:


Feb 16 07:48:30 ud002029 vxcpserv: Error writing to database! :database is locked
Feb 16 07:48:30 ud002029 vxcpserv: Error writing to database! :database is locked
Feb 16 07:48:30 ud002029 vxcpserv: Error updating client's version information into database
Feb 16 07:48:30 ud002029 vxcpserv: Error updating client's version information into database
Feb 16 07:48:30 ud002029 vxcpserv: Error writing to database! :database is locked
Feb 16 07:48:30 ud002029 vxcpserv: Error updating client's version information into database
Feb 16 07:48:30 ud002029 vxcpserv: Error writing to database! :database is locked
Feb 16 07:48:30 ud002029 vxcpserv: Error updating client's version information into database


Although the sample messages in (a) and (b) were logged at different times, the two types of error usually occurred together.
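One way to check whether the two errors coincide on your own systems is to compare the timestamps from both logs. The following Python sketch is a hypothetical helper, not a Veritas tool. It makes these assumptions:

- The regular expressions match only the message text quoted above.
- The syslog lines use the classic `Mon DD HH:MM:SS host` prefix. That prefix carries no year, so the caller supplies one.
- Both hosts log in the same time zone.

The five-minute window is an arbitrary choice.

```python
import re
from datetime import datetime, timedelta

# CoordPoint monitor errors as written to CoordPoint_A.log on the client cluster.
AGENT_FAULT = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS ERROR \S+ CoordPoint:[^:]+:monitor:")
# vxcpserv lock errors as written to the Coordination Point Server syslog.
CPS_LOCKED = re.compile(
    r"^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ vxcpserv: .*database is locked")


def agent_faults(lines):
    """Timestamps of CoordPoint monitor errors."""
    return [datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
            for m in map(AGENT_FAULT.match, lines) if m]


def cps_lock_errors(lines, year):
    """Timestamps of 'database is locked' errors; classic syslog omits the year."""
    return [datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            for m in map(CPS_LOCKED.match, lines) if m]


def correlate(faults, locks, window=timedelta(minutes=5)):
    """Map each agent fault to the CPS lock errors logged within `window` of it."""
    return {fault: [t for t in locks if abs(t - fault) <= window] for fault in faults}
```

A fault that maps to an empty list had no nearby lock error on that Coordination Point Server. With several CP servers registered, repeating the check per server shows which one was stalling.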

Cause

The issue was triggered by IO saturation on the same SAN port used by the Coordination Point Server.
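To confirm IO saturation of this kind, you can sample write latency on the filesystem that holds the CPS database. The sketch below is a generic probe, not anything specific to vxcpserv. It times a small write followed by `os.fsync` on a scratch file. The function names are hypothetical, and the 0.5-second threshold for a "slow" sample is an assumption chosen for illustration. To catch a contention window that recurs at a fixed time, the probe would need to run periodically around the time the errors appear.

```python
import os
import tempfile
import time


def fsync_latency(directory, samples=5, size=4096):
    """Return per-sample write+fsync latencies (seconds) in `directory`.

    Creates and removes a scratch file there, so point it at a directory on
    the filesystem whose IO path you want to measure.
    """
    payload = b"\0" * size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="iolat-")
    try:
        for _ in range(samples):
            start = time.monotonic()
            os.pwrite(fd, payload, 0)  # overwrite the same block each time
            os.fsync(fd)               # force it through to stable storage
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def slow_samples(latencies, threshold=0.5):
    """Samples above an assumed threshold that would suggest an IO stall."""
    return [x for x in latencies if x > threshold]
```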

Resolution

Because the Coordination Point Server database is hosted on the root filesystem, the customer investigated the SAN port used by the system disk. It transpired that there was contention on that port at the same time every day, caused by an IO-intensive job run by another host. As a result, the client cluster was unable to run cpsadm commands to query the Coordination Point Server (CPS) database on the CPS server.
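Daily contention of this kind shows up as "database is locked" errors that cluster in the same hour on many different days. The following is a minimal sketch, assuming the syslog format shown in (b). The hypothetical helper records, for each hour of the day, on how many distinct days the error was logged. The `min_days` cut-off is arbitrary. The output only suggests when to look; identifying the competing host still requires investigating the SAN port itself.

```python
import re
from collections import defaultdict

# Matches the vxcpserv lock error in classic syslog format, capturing the
# date ("Feb 16") and the hour ("07") separately.
CPS_LOCKED = re.compile(
    r"^(\w{3} +\d{1,2}) (\d{2}):\d{2}:\d{2} \S+ vxcpserv: .*database is locked")


def days_with_errors_by_hour(lines):
    """Map hour of day -> number of distinct days with a lock error in that hour."""
    days_by_hour = defaultdict(set)
    for match in map(CPS_LOCKED.match, lines):
        if match:
            day = " ".join(match.group(1).split())  # normalise "Feb  7" to "Feb 7"
            days_by_hour[int(match.group(2))].add(day)
    return {hour: len(days) for hour, days in days_by_hour.items()}


def recurring_hours(counts, min_days=3):
    """Hours in which lock errors were seen on at least `min_days` different days."""
    return sorted(hour for hour, n in counts.items() if n >= min_days)
```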

Issue/Introduction

One of the clusters registered with several Coordination Point Servers reported errors when the CoordPoint resource tried to monitor the coordination points. The underlying issue was an IO hang on the Coordination Point Server.