VCS Agent for Sybase - Sybase dataserver killed by SIGPIPE because the online script exits before the Sybase dataserver writes its startup messages

Article ID: 100001508

Description

Error Message

2010/04/06 23:43:25 VCS NOTICE V-16-1-10301 Initiating Online of Resource SybDB (Owner: unknown, Group: asm filesystem) on System im001


2010/04/06 23:45:37 VCS ERROR V-16-2-13066 (im001) Agent is calling clean for resource(SybDB) because the resource is not up even after online completed.

Resolution

The online script in the Veritas Cluster Server (VCS) Agent for Sybase does not wait for the Sybase dataserver to print its startup messages; it exits first. The dataserver's standard output is a pipe, and once the online script has exited that pipe has no reader. The dataserver's first write therefore fails, and the operating system sends it the broken pipe signal (SIGPIPE). The default action for SIGPIPE is to terminate the process, so the dataserver exits.
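
The failure can be reproduced outside VCS with a short Perl sketch. It is an illustration only, not the agent's or Sybase's code: the parent process stands in for whatever holds the read end of the dataserver's standard output pipe, and the child stands in for the dataserver. The helper name run_race and its parameters $reader_secs and $writer_delay are hypothetical, and the timings are scaled down to fractions of a second. If the reader goes away before the child's first write, the child is killed by signal 13 (SIGPIPE), as in the truss output; if the reader outlives that write, the child exits normally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);    # WIFSIGNALED, WTERMSIG
use Time::HiRes qw(sleep);    # fractional-second sleep

# Hypothetical helper (not part of the VCS agent). The parent holds the read
# end of a pipe for $reader_secs seconds and then closes it (the "online
# script" side). The child waits $writer_delay seconds and then writes a
# "startup message" with SIGPIPE at its default action (the "dataserver").
# Returns the signal that killed the child, or 0 if no signal killed it.
sub run_race {
    my ($reader_secs, $writer_delay) = @_;
    pipe(my $reader, my $writer) or die "pipe: $!";

    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {                    # child: the "dataserver"
        close $reader;
        $SIG{PIPE} = 'DEFAULT';         # default action terminates the process
        sleep($writer_delay);
        syswrite($writer, "startup message\n");
        POSIX::_exit(0);                # reached only if the write raised no SIGPIPE
    }

    close $writer;                      # parent: the "online script" side
    sleep($reader_secs);
    close $reader;                      # from here on the pipe has no reader
    waitpid($pid, 0);
    return WIFSIGNALED($?) ? WTERMSIG($?) : 0;
}

printf "reader exits before the write: child signal %d\n", run_race(0.1, 0.5);
printf "reader outlives the write:     child signal %d\n", run_race(1.0, 0.1);
```

On Solaris and Linux, where SIGPIPE is signal 13 (the "#13" in the truss output), the first call should report 13 and the second 0. The child uses POSIX::_exit so that it does not flush a copy of the parent's buffered output when it exits.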

The following Solaris truss output shows the problem.

The Sybase agent executes the online script. (In this example the Sybase resource is configured inside a Solaris zone, so the script is run through zlogin. If a Solaris zone is not used, the agent calls the online script directly.)

28969/3:         7.1088 execve("/usr/sbin/zlogin", 0xFEC7811C, 0x0002D694)  argc = 61
28969/1:         argv: /usr/sbin/zlogin db2testzone /bin/sh -c '
28969/1:          CLUSTER_LOGDBG=""; export CLUSTER_LOGDBG;
28969/1:          VCS_LOG_AGENT_NAME="Sybase"; export VCS_LOG_AGENT_NAME;
28969/1:          CLUSTER_HOME="/opt/VRTSvcs";export CLUSTER_HOME;
28969/1:          VCS_AGFW="1";export VCS_AGFW;VCS_CONF="/etc/VRTSvcs";
28969/1:          export VCS_CONF; VCS_HOME="/opt/VRTSvcs"; export VCS_HOME;
28969/1:          VCS_LOG="/var/VRTSvcs";export VCS_LOG;
28969/1:          cd /opt/VRTSagents/ha/bin/Sybase;
28969/1:          "/opt/VRTSagents/ha/bin/Sybase/online" "SybEQIQ-DB" "Server"       <<< online script is called
28969/1:          "1" "SYBUS_EQIQ1" "Owner" "1" "sybase" "Home" "1"
28969/1:          "/cfs/fs10/qa/eqiq_local/ASE15" "Version" "1" "15.5" "SA" "1"
28969/1:          "sa" "SApswd" "1" "XXXXXX" "User" "1" "" "UPword"
28969/1:          "1" "" "Db" "1" "" "Table" "1" "" "Monscript" "1"
28969/1:          "/opt/VRTSagents/ha/bin/Sybase/SqlTest.pl" "DetailMonitor"
28969/1:          "1" "0" "Run_ServerFile" "1"
28969/1:          "/cfs/fs10/qa/eqiq_local/ASE/ASE-15_0/install/RUN_SYBUS_EQIQ1"
28969/1:          '

The online script executes the "startserver" command.

28985/1:         7.9447 execve("/cfs/fs10/qa/eqiq_local/ASE15/ASE-15_0/install/startserver", 0x00057784, 0x00057798)  argc = 3
28985/1:         argv:
28985/1:          /cfs/fs10/qa/eqiq_local/ASE15/ASE-15_0/install/startserver -f
28985/1:          /cfs/fs10/qa/eqiq_local/ASE/ASE-15_0/install/RUN_SYBUS_EQIQ1

The "startserver" command then starts the "dataserver".

28993/1:         8.1777 execve("/cfs/fs10/qa/eqiq_local/ASE15/ASE-15_0/bin/dataserver", 0x0003ACC4, 0x0003ACE0)  argc = 6
28993/1:         argv: /cfs/fs10/qa/eqiq_local/ASE15/ASE-15_0/bin/dataserver
28993/1:          -sSYBUS_EQIQ1 -d/cfs/fs10/qa/eqiq_local/ASE15/data/master.dat
28993/1:          -e/cfs/fs10/qa/eqiq_local/ASE15/log/SYBUS_EQIQ1.log
28993/1:          -c/cfs/fs10/qa/eqiq_local/ASE15/ASE-15_0/SYBUS_EQIQ1.cfg
28993/1:          -M/cfs/fs10/qa/eqiq_local/ASE15/ASE-15_0

Note that the online script process (process ID 28969) exits before the "dataserver" process (process ID 28993) writes its startup messages.

28969/1:         8.2216 _exit(10)            <<< online script exits

28993/1:         8.4279 write(1, " 0 0 : 0 0 :0 0 0 0 0 :".., 124)     Err#32 EPIPE        <<< dataserver tries to write the startup messages
28993/1:         8.4281     Received signal #13, SIGPIPE [default]       <<< dataserver gets SIGPIPE and the default SIGPIPE handler is to exit the program

The dataserver process receives the SIGPIPE signal and exits; as a result, the dataserver fails to start.

The problem is fixed in the VCS 5.1 release of the Sybase agent. For VCS 5.0, the problem will be fixed in the next 5.0MP3 Rolling Patch release. Check the release notes of that Rolling Patch for the fix, using the Etrack incident listed in the Supplemental Material section of this article.

The fixed online script adds a new resource attribute, "WaitForRecovery". When "WaitForRecovery" is enabled, the online script keeps checking the recovery state of the database instance for up to a user-specified timeout. This gives the dataserver enough time to print its startup messages. After the startup messages are printed, the dataserver changes its SIGPIPE handler to ignore the signal, so later writes to a pipe with no reader no longer terminate it.
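
The effect of ignoring SIGPIPE can be seen in a small Perl sketch. It is an illustration, not Sybase's code, and the function name write_after_reader_gone is hypothetical. With SIGPIPE set to IGNORE, writing to a pipe that has no reader no longer terminates the process; the write simply fails with EPIPE, which the program can detect and handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Errno qw(EPIPE);

# Hypothetical helper (not Sybase code). With SIGPIPE ignored, write to a
# pipe whose read end is already closed. Returns the errno of the failed
# write, or 0 if the write unexpectedly succeeded.
sub write_after_reader_gone {
    local $SIG{PIPE} = 'IGNORE';                # the process now survives SIGPIPE
    pipe(my $reader, my $writer) or die "pipe: $!";
    close $reader;                              # the pipe has no reader any more
    my $written = syswrite($writer, "late message\n");
    my $errno = defined $written ? 0 : $! + 0;  # save errno before close() can change it
    close $writer;
    return $errno;
}

my $errno = write_after_reader_gone();
if ($errno == EPIPE) {
    print "write failed with EPIPE; the process keeps running\n";
} else {
    print "unexpected result: errno $errno\n";
}
```

This also shows why the timing matters: ignoring SIGPIPE protects the dataserver only after the handler has been changed, so a write that happens before that point still runs under the default action.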

Until the fix is available, a temporary workaround is to delay the exit of the online script. Add the Perl statement sleep(30); before the exit 10; statement in the online script:

/opt/VRTSagents/ha/bin/Sybase/online:
....
sleep(30);  # Workaround: wait 30 seconds so the dataserver can print its startup messages
#
# Delay first monitor by 10 seconds
#
exit 10;




 
 

 
