VCS Netlsnr Agent - Second Level Monitor (LsnrTest.pl) kills the "lsnrctl status" command and returns UNKNOWN state before the agent monitor times out
Article ID: 100021353
Description
Error Message
VCS INFO V-16-20002-211 Netlsnr:lsnr_cup_public:monitor:Monitor procedure /opt/VRTSagents/ha/bin/Netlsnr/LsnrTest.pl returned the output:
bash: line 1: 22286 Killed
/opt/oracle/product/10.2/bin/lsnrctl status LISTENER_CUP
Cause
In the Oracle Listener (Netlsnr) agent, the second level monitor script (LsnrTest.pl) contains logic that kills the lsnrctl status command before the agent monitor reaches the timeout specified in the MonitorTimeout attribute.
In the LsnrTest.pl script, an alarm of 45 seconds is set before the lsnrctl status command is called:
my $TIMEOUT = 45;
$SIG{ALRM}=\&catch_alrm;
alarm $TIMEOUT;
.....
$LsnrResult = `$SU $Owner -c '$LD_ENV; echo $LIBRARY_PATH - \$$LIBRARY_PATH; $stat_str'`;
In the alarm signal handler, the lsnrctl status command is killed and a message similar to the following is logged. The monitor then returns an UNKNOWN state for the Netlsnr resource.
VCS INFO V-16-20002-211 Netlsnr:lsnr_cup_public:monitor:Monitor procedure /opt/VRTSagents/ha/bin/Netlsnr/LsnrTest.pl returned the output:
bash: line 1: 22286 Killed
/opt/oracle/product/10.2/bin/lsnrctl status LISTENER_CUP
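The pattern described above (start the command, arm a timer, kill the command when the timer fires) can be sketched in shell. This is not the agent's Perl code: run_with_limit is a hypothetical helper, and sleep 5 stands in for a hung lsnrctl status. The limit of 1 second is chosen only to keep the demonstration short.

```shell
#!/bin/sh
# Sketch only: run a command and SIGKILL it if it is still running after
# LIMIT seconds. "run_with_limit" is a hypothetical helper, not part of VCS.
run_with_limit() {
    limit=$1
    shift
    "$@" &                      # start the monitored command in the background
    cmd_pid=$!
    # Watchdog: sleep for the limit, then kill the command if it is still there.
    ( sleep "$limit"; kill -9 "$cmd_pid" ) >/dev/null 2>&1 &
    watchdog_pid=$!
    wait "$cmd_pid" 2>/dev/null
    status=$?
    kill "$watchdog_pid" 2>/dev/null   # command finished first: stop the watchdog
    return "$status"
}

# Stand-in for a hung "lsnrctl status": sleep 5 is killed after 1 second,
# so the reported exit status is non-zero.
run_with_limit 1 sleep 5
echo "exit status: $?"
```

As in LsnrTest.pl, the caller never sees the command hang: it only sees that the command was killed, and has to decide which state to report for that case.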
Because of the alarm, the Netlsnr monitor never returns OFFLINE or reaches a monitor timeout when the lsnrctl status command hangs. VCS also takes no corrective action on the UNKNOWN state returned by the LsnrTest.pl script; it only flags the state of the resource as UNKNOWN.
# hares -display listener | grep ' State'
listener State alaw1 ONLINE|STATE UNKNOWN
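A check for this flag could be scripted along the following lines. This is a minimal sketch: has_unknown_flag is a hypothetical helper, and it assumes input in the same column layout as the sample output above (resource, attribute, system, value).

```shell
#!/bin/sh
# Sketch only: report whether a resource's State value carries the
# STATE UNKNOWN flag. "has_unknown_flag" is a hypothetical helper; it reads
# text in the layout of the "hares -display" sample above on stdin.
has_unknown_flag() {
    # Keep rows whose second column is exactly "State", then look for the flag.
    awk '$2 == "State"' | grep -q 'STATE UNKNOWN'
}

# Sample row copied from the output above; on a cluster node the input would
# come from: hares -display listener
sample='listener State alaw1 ONLINE|STATE UNKNOWN'

if printf '%s\n' "$sample" | has_unknown_flag; then
    echo "listener: state is flagged UNKNOWN"
else
    echo "listener: no UNKNOWN flag"
fi
```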
Resolution
If VCS needs to take action to restart the Oracle Listener, modify the LsnrTest.pl script so that it returns OFFLINE under these conditions.
# pwd
/opt/VRTSagents/ha/bin/Netlsnr
# ls -l LsnrTest.pl
-rwxr--r-- 1 root sys 5878 Jun 24 14:53 LsnrTest.pl
Make a copy of the script before modifying it.
# cp LsnrTest.pl LsnrTest.pl.unknown
Increase the RestartLimit so that VCS tries to restart the Netlsnr resource before it fails over the whole service group to another node (if the Netlsnr resource is a critical resource). For example, the following commands set the RestartLimit to 1 so that VCS tries to restart the resource once before declaring it offline.
# haconf -makerw
# hatype -modify Netlsnr RestartLimit 1
# haconf -dump -makero
Freeze the service group containing the Netlsnr resource to prevent VCS from taking any unwanted actions during the script modification.
# hagrp -freeze "service group"
Change the line which returns the resource state in the catch_alrm subroutine:
#
# Subroutine - catch_alrm
# Purpose - To kill the lsnrctl process (if any) after $TIMEOUT seconds
#
sub catch_alrm {
    if ( $AgentDebug == 1) {
        VCSAG_LOG_MSG ( "E" , "lsnrctl operation timed out",14);
    }
    my $current_pgid = getpgrp();
    my @pids;
    my $pid;
    @pids= `$PSALRM -g $current_pgid | $IGREP "$LSNRMGR" | $VGREP grep | $AWK '{print \$1}'`;
    foreach $pid (@pids) {
        kill (9, $pid);
    }
    # exit $VCSAG_RES_UNKNOWN; # comment out this line
    exit $VCSAG_RES_OFFLINE; # change the line to return OFFLINE
}
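If the edit needs to be repeated on several nodes, it could be scripted instead of made by hand. The following is a sketch under stated assumptions, not a supported procedure: patch_catch_alrm is a hypothetical helper, it replaces the exit line rather than commenting it out, and it assumes that in the installed script the subroutine starts with "sub catch_alrm" in column 1, ends at the first "}" in column 1, and has its inner braces indented. It writes a separate copy so the result can be reviewed before it replaces LsnrTest.pl.

```shell
#!/bin/sh
# Sketch only: switch the exit state inside catch_alrm from UNKNOWN to OFFLINE.
# "patch_catch_alrm" is a hypothetical helper. The sed range limits the
# substitution to the lines from "sub catch_alrm" to the first "}" in column 1,
# so other "exit $VCSAG_RES_UNKNOWN;" lines in the script are left alone.
patch_catch_alrm() {
    src=$1    # original script, left untouched
    dst=$2    # patched copy to review
    sed '/^sub catch_alrm/,/^}/ s/exit \$VCSAG_RES_UNKNOWN;/exit $VCSAG_RES_OFFLINE;/' \
        "$src" > "$dst"
}

# Example, using the backup made earlier (LsnrTest.pl.offline is a
# hypothetical output name):
#   patch_catch_alrm LsnrTest.pl.unknown LsnrTest.pl.offline
#   diff LsnrTest.pl.unknown LsnrTest.pl.offline
```

The diff should show exactly one changed line, inside catch_alrm. If it shows none or more than one, the layout assumption does not hold and the script should be edited by hand.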
After the script is modified, wait a few minutes for VCS to complete a couple of monitor cycles. If the monitor entry point returns the correct state, unfreeze the service group.
# hagrp -unfreeze "service group"
After this change, when the lsnrctl status command hangs, the second level monitor script (LsnrTest.pl) returns OFFLINE. VCS then kills the Oracle Listener process through the clean entry point and restarts the resource according to the RestartLimit attribute.
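The difference between the two return states can be summarized as a small decision function. This is a deliberately simplified model built only from the behaviour described in this article, not a reproduction of the VCS engine: vcs_reaction is a hypothetical helper, and the restart counter is an illustrative parameter.

```shell
#!/bin/sh
# Simplified model only, based on the behaviour described in this article.
# "vcs_reaction" is a hypothetical helper, not a VCS command.
#   $1 = state returned by the monitor (ONLINE, OFFLINE or UNKNOWN)
#   $2 = restarts already attempted for the resource
#   $3 = RestartLimit
vcs_reaction() {
    state=$1
    restarts_done=$2
    restart_limit=$3
    case "$state" in
        OFFLINE)
            if [ "$restarts_done" -lt "$restart_limit" ]; then
                echo "clean, then restart the resource"
            else
                echo "restart limit reached, group fails over if the resource is critical"
            fi
            ;;
        UNKNOWN)
            echo "flag the state as UNKNOWN, no other action"
            ;;
        *)
            echo "no action"
            ;;
    esac
}

vcs_reaction UNKNOWN 0 1   # unmodified script, lsnrctl status hangs
vcs_reaction OFFLINE 0 1   # modified script, first hang
vcs_reaction OFFLINE 1 1   # modified script, hang again after one restart
```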
Additional Information
ETrack: 849011
ETrack: 1919377