VCS Netlsnr Agent - Second Level Monitor (LsnrTest.pl) kills the "lsnrctl status" command and returns UNKNOWN state before the agent monitor times out


Article ID: 100021353



Description

Error Message

VCS INFO V-16-20002-211 Netlsnr:lsnr_cup_public:monitor:Monitor procedure /opt/VRTSagents/ha/bin/Netlsnr/LsnrTest.pl returned the output:
bash: line 1: 22286 Killed
/opt/oracle/product/10.2/bin/lsnrctl status LISTENER_CUP

Cause

In the Oracle Listener (Netlsnr) agent, the second level monitor script (LsnrTest.pl) contains logic that kills the lsnrctl status command before the agent monitor reaches the timeout specified in the MonitorTimeout attribute.

In the LsnrTest.pl script, an alarm of 45 seconds is set before the lsnrctl status command is called.

my $TIMEOUT = 45;
$SIG{ALRM}=\&catch_alrm;
alarm $TIMEOUT;
.....
$LsnrResult = `$SU $Owner -c '$LD_ENV; echo $LIBRARY_PATH - \$$LIBRARY_PATH; $stat_str'`;
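
The effect of the alarm can be illustrated outside the agent with a small shell sketch. This is an analogy written in shell, not the agent's Perl code: run_with_deadline is a hypothetical helper, and the return value 99 is only a placeholder chosen here to stand for "state unknown".

```shell
# Hypothetical helper: run a command in the background and kill it with
# SIGKILL if it is still running after a deadline (in seconds).
# It mimics the idea of "alarm $TIMEOUT" plus catch_alrm; it is not agent code.
run_with_deadline() {
    deadline=$1
    shift
    "$@" &
    cmd_pid=$!
    elapsed=0
    # kill -0 only tests whether the process still exists
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$deadline" ]; then
            kill -9 "$cmd_pid" 2>/dev/null
            wait "$cmd_pid" 2>/dev/null
            return 99    # placeholder for "state unknown" (assumption)
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # The command finished in time: pass its own exit status through
    wait "$cmd_pid"
}

# Example: a command that hangs longer than the deadline gets killed
# run_with_deadline 45 /path/to/some_command
```

The point of the sketch is that a hung command never gets the chance to report a result of its own. The caller sees only the value returned from the timeout path.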


The alarm signal handler kills the lsnrctl status command, and a message similar to the following is logged. The monitor then returns an UNKNOWN state for the Netlsnr resource.

VCS INFO V-16-20002-211 Netlsnr:lsnr_cup_public:monitor:Monitor procedure /opt/VRTSagents/ha/bin/Netlsnr/LsnrTest.pl returned the output:
bash: line 1: 22286 Killed                  
/opt/oracle/product/10.2/bin/lsnrctl status LISTENER_CUP
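
To see how often this has happened, the log can be scanned for the message above. The following is a minimal sketch: count_killed_monitors is a hypothetical helper, and the log path shown in the usage comment is an assumption about where the Netlsnr agent log is kept on your system.

```shell
# Hypothetical helper: count monitor cycles in which LsnrTest.pl reported
# that the lsnrctl status command was killed.
# $1 = path to the log file to scan
count_killed_monitors() {
    awk '
        # Message header logged for the LsnrTest.pl output
        /V-16-20002-211/ && /LsnrTest\.pl/ { pending = 1; next }
        # The line right after the header carries the "Killed" notice
        pending && /Killed/ { count++ }
        { pending = 0 }
        END { print count + 0 }
    ' "$1"
}

# Usage (log location is an assumption, adjust as needed):
# count_killed_monitors /var/VRTSvcs/log/Netlsnr_A.log
```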


Because of the alarm, the Netlsnr monitor does not return OFFLINE or reach a monitor timeout when the lsnrctl status command hangs. VCS also takes no corrective action on the UNKNOWN state returned by the LsnrTest.pl script; it only changes the state of the resource to UNKNOWN.

# hares -display listener | grep ' State'
listener     State            alaw1      ONLINE|STATE UNKNOWN
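
The same check can be scripted. The sketch below assumes the hares -display output has the column layout shown above (resource, attribute, system, value); unknown_state_resources is a hypothetical helper name.

```shell
# Hypothetical helper: read "hares -display" output on stdin and print
# every resource/system pair whose State value contains STATE UNKNOWN.
unknown_state_resources() {
    awk '$2 == "State" && /STATE UNKNOWN/ { print $1 " on " $3 }'
}

# Usage on a cluster node:
# hares -display listener | unknown_state_resources
```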
 
 

Resolution

 
If there is a need for VCS to take action to restart the Oracle Listener, you can modify the LsnrTest.pl script so that the script will return OFFLINE under such conditions.

# pwd
/opt/VRTSagents/ha/bin/Netlsnr

# ls -l LsnrTest.pl
-rwxr--r--   1 root     sys         5878 Jun 24 14:53 LsnrTest.pl

Please make a copy of the script before modifying it.

# cp LsnrTest.pl LsnrTest.pl.unknown

Increase the RestartLimit so that VCS tries to restart the Netlsnr resource before it fails over the whole service group to another node (if the Netlsnr resource is a critical resource). For example, the following commands set the RestartLimit to 1 so that VCS tries to restart the resource once before declaring it offline.

# haconf -makerw
# hatype -modify Netlsnr RestartLimit 1
# haconf -dump -makero


Freeze the service group containing the Netlsnr resource to prevent VCS from taking any unwanted actions during the script modification.

# hagrp -freeze "service group"

Change the line which returns the resource state in the catch_alrm subroutine:

#
# Subroutine - catch_alrm
# Purpose - To kill the lsnrctl process (if any) after $TIMEOUT seconds
#
sub catch_alrm {
       if ( $AgentDebug == 1) {
               VCSAG_LOG_MSG ( "E" , "lsnrctl operation timed out",14);
       }                
                       
       my $current_pgid = getpgrp();
       my @pids;        
       my $pid;        
       @pids= `$PSALRM -g $current_pgid | $IGREP "$LSNRMGR" | $VGREP grep | $AWK '{print \$1}'`;
       foreach $pid (@pids) {
               kill (9, $pid);
       }                
       #  exit $VCSAG_RES_UNKNOWN;        # comment out this line
       exit $VCSAG_RES_OFFLINE;               # change the line to return OFFLINE
}    
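
Before unfreezing the service group, it may help to confirm that the edit landed where intended. The following is a minimal sketch, assuming the subroutine is laid out as shown above (it starts with "sub catch_alrm" and its closing brace sits at the start of a line); check_catch_alrm is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if catch_alrm in the given file has an
# active "exit $VCSAG_RES_OFFLINE" and no active "exit $VCSAG_RES_UNKNOWN".
# Lines that start with "#" are comments and are therefore ignored.
# $1 = path to LsnrTest.pl
check_catch_alrm() {
    awk '
        /^sub catch_alrm/ { in_sub = 1 }
        in_sub && /^[ \t]*exit[ \t]+\$VCSAG_RES_OFFLINE/ { offline = 1 }
        in_sub && /^[ \t]*exit[ \t]+\$VCSAG_RES_UNKNOWN/ { unknown = 1 }
        in_sub && /^[}]/ { in_sub = 0 }
        END { exit (offline && !unknown) ? 0 : 1 }
    ' "$1"
}

# Usage:
# check_catch_alrm /opt/VRTSagents/ha/bin/Netlsnr/LsnrTest.pl && echo "edit looks correct"
```

The check only looks at the text of the script. It does not replace watching a few monitor cycles as described in the next step.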


After the script is modified, wait a few minutes for VCS to complete a couple of monitor cycles. If the monitor entry point reports the correct state, unfreeze the service group.

# hagrp -unfreeze "service group"


After the above change, when the lsnrctl status command hangs, the second level monitor script (LsnrTest.pl) returns OFFLINE. VCS then kills the Oracle Listener process through the clean entry point and restarts the resource according to the configured RestartLimit attribute.






 
 

 


Additional Information

ETrack: 849011 ETrack: 1919377