Best practices for troubleshooting VCS Notifier Agent faults

Article ID: 100004827

Description

Error Message


2011/01/10 11:34:01 VCS NOTICE V-16-1-10301 Initiating Online of Resource Notifier (Owner: unknown, Group: ClusterService) on System symc-linux1

2011/01/10 11:34:01 VCS INFO V-16-1-10298 Resource Notifier (Owner: unknown, Group: ClusterService) is online on symc-linux1 (VCS initiated)
2011/01/10 11:34:02 VCS INFO V-16-1-10304 Resource Notifier (Owner: unknown, Group: ClusterService) is offline on symc-linux2 (First probe)
2011/01/10 11:35:02 VCS ERROR V-16-2-13067 (symc-linux1) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:35:02 VCS INFO V-16-2-13068 (symc-linux1) Resource(Notifier) - clean completed successfully.
2011/01/10 11:35:02 VCS ERROR V-16-2-13073 (symc-linux1) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 3) the resource.
2011/01/10 11:35:02 VCS NOTICE V-16-2-13076 (symc-linux1) Agent has successfully restarted resource(Notifier).
2011/01/10 11:36:03 VCS ERROR V-16-2-13067 (symc-linux1) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:36:03 VCS INFO V-16-2-13068 (symc-linux1) Resource(Notifier) - clean completed successfully.
2011/01/10 11:36:03 VCS ERROR V-16-2-13073 (symc-linux1) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 2 of 3) the resource.
2011/01/10 11:36:03 VCS NOTICE V-16-2-13076 (symc-linux1) Agent has successfully restarted resource(Notifier).
2011/01/10 11:37:03 VCS ERROR V-16-2-13067 (symc-linux1) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:37:03 VCS INFO V-16-2-13068 (symc-linux1) Resource(Notifier) - clean completed successfully.
2011/01/10 11:37:03 VCS ERROR V-16-2-13073 (symc-linux1) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 3 of 3) the resource.
2011/01/10 11:37:03 VCS NOTICE V-16-2-13076 (symc-linux1) Agent has successfully restarted resource(Notifier).
2011/01/10 11:38:03 VCS ERROR V-16-2-13067 (symc-linux1) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:38:03 VCS INFO V-16-2-13068 (symc-linux1) Resource(Notifier) - clean completed successfully.
2011/01/10 11:38:03 VCS INFO V-16-1-10307 Resource Notifier (Owner: unknown, Group: ClusterService) is offline on symc-linux1 (Not initiated by VCS)
2011/01/10 11:38:03 VCS NOTICE V-16-1-10301 Initiating Online of Resource Notifier (Owner: unknown, Group: ClusterService) on System symc-linux2
2011/01/10 11:38:03 VCS INFO V-16-1-10298 Resource Notifier (Owner: unknown, Group: ClusterService) is online on symc-linux2 (VCS initiated)
2011/01/10 11:39:03 VCS ERROR V-16-2-13067 (symc-linux2) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:39:03 VCS INFO V-16-2-13068 (symc-linux2) Resource(Notifier) - clean completed successfully.
2011/01/10 11:39:03 VCS ERROR V-16-2-13073 (symc-linux2) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 3) the resource.
2011/01/10 11:39:03 VCS NOTICE V-16-2-13076 (symc-linux2) Agent has successfully restarted resource(Notifier).
2011/01/10 11:40:03 VCS ERROR V-16-2-13067 (symc-linux2) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:40:03 VCS INFO V-16-2-13068 (symc-linux2) Resource(Notifier) - clean completed successfully.
2011/01/10 11:40:03 VCS ERROR V-16-2-13073 (symc-linux2) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 2 of 3) the resource.
2011/01/10 11:40:03 VCS NOTICE V-16-2-13076 (symc-linux2) Agent has successfully restarted resource(Notifier).
2011/01/10 11:41:03 VCS ERROR V-16-2-13067 (symc-linux2) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:41:03 VCS INFO V-16-2-13068 (symc-linux2) Resource(Notifier) - clean completed successfully.
2011/01/10 11:41:03 VCS ERROR V-16-2-13073 (symc-linux2) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 3 of 3) the resource.
2011/01/10 11:41:03 VCS NOTICE V-16-2-13076 (symc-linux2) Agent has successfully restarted resource(Notifier).
2011/01/10 11:42:03 VCS ERROR V-16-2-13067 (symc-linux2) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:42:03 VCS INFO V-16-2-13068 (symc-linux2) Resource(Notifier) - clean completed successfully.
2011/01/10 11:42:03 VCS INFO V-16-1-10307 Resource Notifier (Owner: unknown, Group: ClusterService) is offline on symc-linux2 (Not initiated by VCS)
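The restart pattern in the log above can be tallied with a short filter. The following is a minimal sketch and not part of the original article: the function name `count_unexpected_offline` is hypothetical, and it assumes the line layout shown above, where the fifth field is the message ID and the sixth field is the node name in parentheses. The engine log is typically /var/VRTSvcs/log/engine_A.log; adjust the path if your installation differs.

```shell
# count_unexpected_offline LOGFILE RESOURCE
# Counts V-16-2-13067 messages ("Agent is calling clean ... became OFFLINE
# unexpectedly") per node for one resource.
# Hypothetical helper; assumes the engine log layout shown above.
count_unexpected_offline() {
    grep "V-16-2-13067" "$1" | grep "resource($2)" |
        awk '{ gsub(/[()]/, "", $6); n[$6]++ }
             END { for (node in n) print node, n[node] }' |
        sort
}

# Example (usual engine log location; adjust as needed):
#   count_unexpected_offline /var/VRTSvcs/log/engine_A.log Notifier
```

For the log excerpt above, this reports 4 for each node: three clean calls that were followed by a restart attempt, and a fourth that was followed by the fault.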

 

Summary Status of VCS
-- SYSTEM STATE
-- System               State                Frozen

A  symc-linux1         RUNNING              0                   
A  symc-linux2         RUNNING              0            
       
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State         
B  ClusterService  symc-linux1         Y          N               OFFLINE|FAULTED      <<<<<<
B  ClusterService  symc-linux2         Y          N               OFFLINE|FAULTED      <<<<<<

B  asic-queueing   symc-linux1         Y          N               ONLINE        
B  asic-queueing   symc-linux2         Y          N               OFFLINE       
B  fop-servlet     symc-linux1         Y          N               ONLINE        
B  fop-servlet     symc-linux2         Y          N               OFFLINE       
B  network         symc-linux1         Y          N               ONLINE        
B  network         symc-linux2         Y          N               ONLINE        
B  nfs-share       symc-linux1         Y          N               OFFLINE       
B  nfs-share       symc-linux2         Y          N               ONLINE        
B  webservices     symc-linux1         Y          N               ONLINE        
B  webservices     symc-linux2         Y          N               OFFLINE       
-- RESOURCES FAILED
-- Group           Type                 Resource             System             
C  ClusterService  NotifierMngr         Notifier             symc-linux1       
C  ClusterService  NotifierMngr         Notifier             symc-linux2     
  
-- RESOURCES NOT PROBED
-- Group           Type                 Resource             System             
D  ClusterService  NIC                  csgnic               symc-linux1       
D  ClusterService  NIC                  csgnic               symc-linux2 
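On a cluster with many service groups, the faulted rows can be filtered out of the summary. This is a minimal sketch, assuming the column layout shown above (row marker, group, system, probed, auto-disabled, state) and that the summary comes from `hastatus -sum`; the function name `faulted_groups` is hypothetical.

```shell
# faulted_groups [FILE...]
# Reads "hastatus -sum" style output from FILE or standard input and prints
# "group system" for each group-state row (marker "B") whose state column
# contains FAULTED.
# Hypothetical helper; assumes the column layout shown above.
faulted_groups() {
    awk '$1 == "B" && $6 ~ /FAULTED/ { print $2, $3 }' "$@"
}

# Example on a cluster node:
#   hastatus -sum | faulted_groups
```

Applied to the summary above, this prints ClusterService once for each of the two nodes.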

Configuration and Logs

1.  /var/VRTSvcs/log/notifier_A.log
-------------------------------------------------------------------------
2010/12/15 16:24:41 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:26:14 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:27:18 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:28:32 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:29:56 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:31:30 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:36:54 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:32:31 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:33:02 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:33:43 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:34:34 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:35:35 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:36:46 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:40:50 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:41:21 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:42:02 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:42:53 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:43:55 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:45:06 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 16:08:30 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
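The warnings in this excerpt are dated 2010/12/15 and 2010/12/16, while the fault in the engine log occurred on 2011/01/10, so it is worth checking whether the warnings coincide with the fault window at all. A per-day count makes that comparison quick. This is a minimal sketch; the function name `warnings_per_day` is hypothetical, and it assumes the first field is the date and the fifth field is the message ID, as in the lines above.

```shell
# warnings_per_day MSGID LOGFILE
# Prints "date count" for every day on which MSGID was logged.
# Hypothetical helper; assumes the log layout shown above.
warnings_per_day() {
    awk -v id="$1" '$5 == id { n[$1]++ }
                    END { for (day in n) print day, n[day] }' "$2" |
        sort
}

# Example:
#   warnings_per_day V-16-1-10630 /var/VRTSvcs/log/notifier_A.log
```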


 

2.  /var/VRTSvcs/log/Notifier_A.log
-------------------------------------------------------------------------
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) name(Notifier) op(1607)
        VCSAgTimer.C:check_timers[297]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Resetting periodic timer for resource Notifier op 1607 to expire at 1485   <<<<<< Set the timer
        VCSAgTimer.C:_reset_periodic_timer[999]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Adding timer for Notifier with tmo 1485                                                  <<<<<<
        VCSAgTimer.C:_add[723]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Timer id is 28
        VCSAgTimer.C:_add[739]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Appending command minor code 1607 for resource Notifier
        VCSAgRes.C:append_cmd[340]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Scheduled resource Notifier
        VCSAgSched.C:put_req[173]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Picked Res(Notifier) from Scheduler
        VCSAgSched.C:_dequeue[64]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Resource (Notifier) received cmd minor code (1607)
        VCSAgRes.C:process_cmd[4727]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) Resource Notifier transitioning from Online to Monitoring
        VCSAgRes.C:internal_state[4083]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) The values of ArgList attributes are given below
        VCSAgRes.C:call_entry_point[986]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[0] is (14141)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[1] is (30)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[2] is (14144)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[3] is (162)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[4] is (public)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[5] is (2)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[6] is (172.16.141.15)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[7] is (Warning)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[8] is (mailgatensw.ffx.jfh.com.au)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[9] is (0)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[10] is (10)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[11] is ()
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[12] is ()
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[13] is (2)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[14] is (admin@symc.com)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[15] is (Warning)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) No OS encoded ArgList attributes
        VCSAgRes.C:call_entry_point[1028]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Adding timer for Notifier with tmo 1485
        VCSAgTimer.C:_add[723]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Timer id is 32
        VCSAgTimer.C:_add[739]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Calling monitor for resource Notifier
        VCSAgType.C:call_monitor[1268]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) agent ep version is 1
        VCSAgType.C:_is_script_ep[4948]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) Resource(Notifier) - monitor entry point exited with a confidence value 0.                <<<<<<< There was no response within its monitoring timeout.
        VCSAgType.C:call_monitor[1368]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) Notifier reported state (Offline) & conf_level (0)                                                      <<<<<<< Then place "offline" flag..
        VCSAgRes.C:call_entry_point[1324]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Canceling timer for (Notifier) op(1608)
        VCSAgTimer.C:_cancel[808]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Removing thread_id 4151311248
        VCSAgThreadTbl.C:remove[221]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Canceling timer for (Notifier) op(1605)
        VCSAgTimer.C:_cancel[808]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Canceling timer for (Notifier) op(1621)
        VCSAgTimer.C:_cancel[808]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Res(Notifier) - ToleranceCount (1) ToleranceLimit(0)
        VCSAgRes.C:tolerance_limit_reached[5262]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) ToleranceLimit reached
        VCSAgRes.C:tolerance_limit_reached[5268]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Canceling timer for (Notifier) op(1607)
        VCSAgTimer.C:_cancel[808]
2011/01/10 11:38:03 VCS ERROR V-16-2-13067 Thread(4151311248) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
..

..

In this example, the Notifier agent reported the resource as offline, but the log gives no reason for the failure.
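When the agent debug log is long, the monitor results can be extracted with a filter. The following is a minimal sketch that matches the "reported state" lines in the format shown in the log above; the function name `monitor_results` is hypothetical.

```shell
# monitor_results LOGFILE
# Prints "date time resource state confidence" for each line of the form
# "<resource> reported state (<State>) & conf_level (<n>)".
# Hypothetical helper; assumes the agent debug log layout shown above.
monitor_results() {
    sed -nE 's#^([0-9/]+ [0-9:]+) .* ([^ ]+) reported state \(([A-Za-z]+)\) & conf_level \(([0-9]+)\).*$#\1 \2 \3 \4#p' "$1"
}

# Example:
#   monitor_results /var/VRTSvcs/log/Notifier_A.log
```

A run of Offline results with confidence 0 immediately after each online attempt, as in this case, suggests that the notifier process exits shortly after it starts.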

 

3. Reviewing the configuration of Notifier

## main.cf
group ClusterService (
        SystemList = { symc-linux2 = 0, symc-linux1 = 1 }
        AutoStartList = { symc-linux1 }
        )
        NIC csgnic (
                Enabled = 0
                Device @symc-linux2 = bond0
                Device @symc-linux1 = bond0
                )
        NotifierMngr Notifier (
                SnmpConsoles = { "192.168.1.123" = Warning }
                SmtpServer = "mailgate.test.symantec.com"
                SmtpRecipients = { "admin@symc.com" = Warning }
                )
        Notifier requires csgnic

 

4. The commands recorded in /etc/VRTSvcs/conf/config/main.cmd show that the SmtpServerVrfyOff attribute was added to the NotifierMngr type at some point in the past.

$ egrep -i SmtpServerVrfyOff main.cmd
hatype -modify NotifierMngr ArgList EngineListeningPort MessagesQueue NotifierListeningPort SnmpdTrapPort SnmpCommunity SnmpConsoles SmtpServer SmtpServerVrfyOff SmtpServerTimeout SmtpReturnPath SmtpFromPath SmtpRecipients
haattr -add NotifierMngr SmtpServerVrfyOff -boolean 0
hares -modify Notifier SmtpServerVrfyOff 0
Note: Check the current value of this attribute in types.cf.

$ egrep -i SmtpServerVrfyOff types.cf
        static str ArgList[] = { EngineListeningPort, MessagesQueue, NotifierListeningPort, SnmpdTrapPort, SnmpCommunity, SnmpConsoles, SmtpServer, SmtpServerVrfyOff, SmtpServerTimeout, SmtpReturnPath, SmtpFromPath, SmtpRecipients }
        boolean SmtpServerVrfyOff = 0
According to the Administrator's Guide:
Set this value to 1 if your mail server does not support the SMTP VRFY command.
If this attribute is set to 1, the notifier does not send an SMTP VRFY request to the mail server specified in the SmtpServer attribute when sending emails.
Type and dimension: boolean-scalar. Default: 0
Therefore, with SmtpServerVrfyOff = 0, the notifier sends an SMTP VRFY request to the mail server specified in the SmtpServer attribute whenever it sends email.
The next step is to verify whether the SMTP server accepts connections from the VCS notifier and supports the SMTP VRFY command.
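One way to check VRFY support is to send the command to the mail server by hand and look at the reply code. The following is a minimal sketch that only classifies a reply line; the function name `classify_vrfy_reply` and the category names are hypothetical. The reply code meanings follow RFC 5321. How the notifier itself reacts to each code is not documented in this article, so treat the result as a hint rather than a verdict.

```shell
# classify_vrfy_reply "REPLY LINE"
# Maps the three-digit SMTP reply code at the start of a server reply to a
# coarse category. Hypothetical helper; code meanings per RFC 5321.
classify_vrfy_reply() {
    code=$(printf '%s' "$1" | cut -c1-3)
    case "$code" in
        250|251) echo "verified" ;;                       # address confirmed
        252)     echo "accepted-without-verification" ;;  # server will not verify
        500|502) echo "command-not-supported" ;;          # VRFY unknown or disabled
        55[0-9]) echo "rejected" ;;                       # address not accepted
        *)       echo "unknown" ;;
    esac
}

# A reply line can be obtained interactively, for example with
# "telnet <SmtpServer> 25" followed by "VRFY <recipient>".
# (Assumption: telnet or a similar client is installed on the node.)
#   classify_vrfy_reply "502 5.5.1 VRFY command is disabled"
```

A "command-not-supported" result is the case in which the Administrator's Guide recommends setting SmtpServerVrfyOff to 1.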

Resolution

 

1. Run the notifier manually from the command line. For example:
/opt/VRTSvcs/bin/notifier -s m=north -s m=south,p=2000,l=Error,c=your_company -t m=north,e="abc@your_company.com",l=SevereError
In this example, the notifier:
- Sends SNMP traps of all severity levels to north at the default SNMP port, with the community value public.
- Sends Error and SevereError traps to south at port 2000, with the community value your_company.
- Sends SevereError email messages to the recipient abc@your_company.com, using north as the SMTP server at the default port.

 

2. If the cause is still unclear, collect strace output for the notifier process.

- Trace the notifier process while the resource is faulting, so that the trace shows whether the process tried to open the SMTP connection. Replace PID with the process ID of the notifier:
#strace -f -v -p PID -o notifier_strace__`hostname`_`date '+%d.%m.%y'`.out -s 512
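Once the trace file exists, the connection attempts can be pulled out of it. The following is a minimal sketch, assuming the usual strace rendering of `connect()` calls (with the port shown as `htons(N)`) and the default SMTP port 25; the function name `smtp_connect_attempts` is hypothetical.

```shell
# smtp_connect_attempts TRACEFILE [PORT]
# Prints every connect() call to PORT (default 25) found in strace output.
# Returns non-zero if no such call was traced.
# Hypothetical helper; assumes strace prints the port as htons(PORT).
smtp_connect_attempts() {
    port=${2:-25}
    grep -E "connect\(.*htons\($port\)" "$1"
}

# Example:
#   smtp_connect_attempts notifier_strace.out ||
#       echo "no SMTP connection attempt traced"
```

The return value at the end of each matching line (for example 0, or -1 with an errno name) shows whether the connection succeeded. If no line matches, the notifier likely exited before it reached the mail server.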

 

3. As a final workaround, check whether setting SmtpServerVrfyOff to 1, which stops the notifier from sending the SMTP VRFY request, makes a difference.

#haconf -makerw
#hares -modify Notifier SmtpServerVrfyOff 1
#haconf -dump -makero

 

Applies To

Configuration
- Two nodes in VCS configuration

Version of OS/package

1. Operating system:
Linux symc-linux1 2.6.18-194.32.1.el5 #1 SMP Mon Dec 20 10:52:42 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
Linux symc-linux2 2.6.18-194.32.1.el5 #1 SMP Mon Dec 20 10:52:42 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
2. Product: SFHA 5.0 MP4

Issue/Introduction


The VCS (Veritas Cluster Server) Notifier agent fails.