Oracle agent returned OFFLINE even though the Oracle error (ORA-XXXXX) should be ignored according to oraerror.dat


Article ID: 100005624


Description

Error Message

VCS INFO V-16-20002-211 (gsp-pcmdb1) Oracle:OPCMP1:monitor:Monitor procedure /opt/VRTSagents/ha/bin/Oracle/SqlTest.pl returned the output: ERROR:
ORA-28002: the password will expire within 7 days

VCS ERROR V-16-2-13067 (gsp-pcmdb1) Agent is calling clean for resource(OPCMP1) because the resource became OFFLINE unexpectedly, on its own.

Cause

If the detail monitoring script SqlTest.pl receives any Oracle error (ORA-XXXXX) from its sqlplus commands, it returns OFFLINE to the Oracle agent together with the ORA-XXXXX code. The Oracle agent (/opt/VRTSagents/ha/bin/Oracle/OracleAgent) then processes this code according to oraerror.dat.

For details, see the section "How the agent handles Oracle error codes during detail monitoring" in the VCS Agent for Oracle Installation and Configuration Guide:

https://docs.infoscale.com/ 
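Before looking at the agent itself, it can help to confirm that the reported code actually appears in the on-disk oraerror.dat. The following is a minimal sketch, not part of the product: the function name lookup_ora_code is hypothetical, the default path is the agent directory used elsewhere in this article, and the search is a plain word match that makes no assumption about the exact layout of an entry. It assumes a POSIX shell such as ksh or bash.

```shell
# Hypothetical helper: print the non-comment lines of oraerror.dat
# that mention the given ORA code number as a whole word.
# Usage: lookup_ora_code 28002 [path-to-oraerror.dat]
lookup_ora_code() {
    code="$1"
    file="${2:-/opt/VRTSagents/ha/bin/Oracle/oraerror.dat}"
    # Drop comment lines first, then search for the code.
    grep -v '^#' "$file" | grep -w "$code"
}
```

For the error shown above, this would be called as lookup_ora_code 28002. If nothing is printed and the exit status is non-zero, the file has no entry for that code.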

One possible cause is that the agent could not read the oraerror.dat entries when it started.

According to the table "Predefined agent actions for Oracle errors" in that guide, if the oraerror.dat file is not available, the agent falls back to its default behavior of returning OFFLINE (FAILOVER). The agent also returns OFFLINE if oraerror.dat is empty or contains no valid entries.

Note that the Oracle agent loads oraerror.dat only when the agent starts. If the file is modified, the agent must be restarted before it recognizes the change.

On a Solaris system, a gcore of the OracleAgent process can confirm whether oraerror.dat was loaded when the agent started. For example, the 5.1GA oraerror.dat file contains 187 entries:

# egrep -v '^$|^#|}' oraerror.dat | wc -l
     187

In the gcore of the OracleAgent process, the data structure "token_data" holds the in-memory copy of oraerror.dat. Its 7th field is the number of entries in the oraerror.dat file plus 1.

# ps -ef |grep OracleAge
    root  2946     1   1 13:02:43 ?           0:00 /opt/VRTSagents/ha/bin/Oracle/OracleAgent -type Oracle -agdir /opt/VRTSagents/h

# gcore 2946
gcore: core.2946 dumped

#  mdb /opt/VRTSagents/ha/bin/Oracle/OracleAgent core.2946

mdb> *token_data/20D
0x9ecd0:        32              789216          0               815576          815672        
                668792          188             16961           1162824517      1330332928         <<<  187 + 1 = 188
                121             1634299438      1112492800      100             842019121     
                758133805       808525873       859451442       976499488       542852678
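The value to look for in the 7th field can be computed from the file itself. The sketch below is an illustration only: the function name expected_token_count is hypothetical, and it reuses the same filter as the egrep command above, written as grep -E (on older Solaris releases where /usr/bin/grep lacks -E, egrep can be substituted). It assumes a POSIX shell such as ksh or bash.

```shell
# Hypothetical helper: count oraerror.dat entries with the same filter
# as the egrep command in this article, then print entries + 1, the
# value expected in the 7th field of token_data.
expected_token_count() {
    file="$1"
    [ -r "$file" ] || return 1
    # -c prints the number of lines that survive the -v filter.
    entries=$(grep -Evc '^$|^#|}' "$file" || true)
    echo $((entries + 1))
}
```

For the 187-entry file in this example, the function prints 188, which matches the 7th field in the mdb output. A different value in the core suggests that the entries were not loaded when the agent started.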

 

Resolution

If the oraerror.dat file was not loaded by the OracleAgent when it started, first make sure that the file exists, is readable, and contains valid entries, and then restart the Oracle agent.

# pwd
/opt/VRTSagents/ha/bin/Oracle

# ls -l oraerror.dat
-rwxr--r--   1 root     root        3615 Apr  1 13:02 oraerror.dat
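The three preconditions (the file exists, is readable, and has entries) can be checked in one step before restarting the agent. This is a minimal sketch under stated assumptions: check_oraerror_file is a hypothetical name, the entry count uses the same filter as the egrep command earlier in this article and therefore only approximates "valid entries", and a POSIX shell such as ksh or bash is assumed.

```shell
# Hypothetical pre-restart check for oraerror.dat.
# Returns 0 if the file exists, is readable, and has at least one entry;
# returns 1, 2, or 3 for missing, unreadable, or empty respectively.
check_oraerror_file() {
    file="${1:-/opt/VRTSagents/ha/bin/Oracle/oraerror.dat}"
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    if [ ! -r "$file" ]; then
        echo "not readable: $file"
        return 2
    fi
    # Same filter as the egrep command in this article: skip blank
    # lines, comment lines, and lines containing a closing brace.
    entries=$(grep -Evc '^$|^#|}' "$file" || true)
    if [ "$entries" -eq 0 ]; then
        echo "no entries: $file"
        return 3
    fi
    echo "ok: $entries entries in $file"
}
```

Because the check is normally run as root, the readability test passes even for restrictive permissions, so the ls -l output above is still worth reviewing.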

# haagent -stop Oracle -force -sys <system_name>

# haagent -start Oracle -sys <system_name>

Take a new gcore of the OracleAgent process and check it again to confirm that the oraerror.dat entries were loaded successfully.

 

 
