Sep 3 02:09:35 vcshost1 gab: [ID 191522 kern.notice] GAB WARNING V-15-1-20058 Port h process 2305: heartbeat failed, killing process
Sep 3 02:09:35 vcshost1 gab: [ID 975177 kern.notice] GAB INFO V-15-1-20059 Port h heartbeat interval 30000 msec. Statistics:
Sep 3 02:09:35 vcshost1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 0 ~ 6000 msec: 332527539
Sep 3 02:09:35 vcshost1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 6000 ~ 12000 msec: 2
Sep 3 02:09:35 vcshost1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 12000 ~ 18000 msec: 0
Sep 3 02:09:35 vcshost1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 18000 ~ 24000 msec: 0
Sep 3 02:09:35 vcshost1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 24000 ~ 30000 msec: 0
Sep 3 02:09:35 vcshost1 gab: [ID 767912 kern.notice] GAB INFO V-15-1-20088 System information:
Sep 3 02:09:35 vcshost1 gab: [ID 746037 kern.notice] GAB INFO V-15-1-20089 number of cpu: 128
Sep 3 02:09:35 vcshost1 gab: [ID 343786 kern.notice] GAB INFO V-15-1-20090 physical memory: 32718848K
Sep 3 02:09:35 vcshost1 gab: [ID 974088 kern.notice] GAB INFO V-15-1-20091 free memory: 2300576 K
Sep 3 02:09:35 vcshost1 gab: [ID 663063 kern.notice] GAB INFO V-15-1-20092 average free memory in 5 sec: 2307056 K
Sep 3 02:09:35 vcshost1 gab: [ID 665429 kern.notice] GAB INFO V-15-1-20093 average free memory in 30 sec: 2301720 K
Sep 3 02:09:35 vcshost1 gab: [ID 259915 kern.notice] GAB INFO V-15-1-20094 number of processes: 143
Sep 3 02:09:35 vcshost1 gab: [ID 631272 kern.notice] GAB INFO V-15-1-20095 load average in 1 min: 29.98
Sep 3 02:09:35 vcshost1 gab: [ID 587815 kern.notice] GAB INFO V-15-1-20096 load average in 5 min: 28.92
Sep 3 02:09:35 vcshost1 gab: [ID 980060 kern.notice] GAB INFO V-15-1-20097 load average in 15 min: 32.77
Sep 3 02:09:35 vcshost1 gab: [ID 559196 kern.notice] GAB INFO V-15-1-20098 pagein rate: 0
Sep 3 02:09:35 vcshost1 gab: [ID 582491 kern.notice] GAB INFO V-15-1-20099 pageout rate: 0
Sep 3 02:09:35 vcshost1 gab: [ID 940236 kern.notice] GAB INFO V-15-1-20041 Port h: client process failure: killing process
Sep 3 02:09:35 vcshost1 Had[2305]: [ID 702911 daemon.alert] VCS WARNING V-16-1-53034 had Signal SIGABRT received
Sep 3 02:09:48 vcshost1 Had[2305]: [ID 702911 daemon.alert] VCS NOTICE V-16-1-53038 Beginning execution of the diagnostics script
Sep 3 02:09:50 vcshost1 gab: [ID 424555 kern.notice] GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
The FFDC log is a circular buffer within the _had process and is being continously updated by the _had process. This buffered data will be dumped when the _had process is killed by GAB. When the VCS engine process (_had) does not send heartbeats to GAB for more than the timout value sepecified in the environment variable VCS_GAB_TIMEOUT (default is 30 seconds, different value can be specified in the /opt/VRTSvcs/bin/vcsenv file), GAB will kill the _had process by sending a SIGABRT signal. As part of the SIGABRT signal handler, the _had process will dump the FFDC log to an external file /var/VRTSvcs/diag/had/FFDC_VCS_"pid"_had.log.
The FFDC log dump file will contain data like the followings.
Example 1:
#
# Log File Name: ./FFDC_VCS_2305_had.log
# System: vcshost3
# SysInfo: Solaris:vcshost3,Generic_142900-02,5.10,sun4v
# Created: 2011-09-03 02:09:35 <<< created 02:09:35
#
2011-09-03 02:09:22 [FFDC] [1] VCS DBG_IPM sent 82 bytes <<< the earliest entry in the circular buffer is from 02:09:22
Ipm.C:send[2657]
2011-09-03 02:09:22 [FFDC] [1] VCS DBG_IPM sent 83 bytes
Ipm.C:send[2657]
2011-09-03 02:09:22 [FFDC] [1] VCS DBG_IPM sent 90 bytes
Ipm.C:send[2657]
......
Example 2:
#
# Log File Name: ./FFDC_VCS_2629_had.log
# System: vcshost2
# SysInfo: Solaris:vcshost2,Generic_141444-09,5.10,sun4u
# Created: 2011-01-11 23:17:23
#
2011-01-11 23:15:38 [FFDC] [1] VCS DBG_AGENT Agent: Inside process_agent
Agent.C:process_agent[231]
2011-01-11 23:15:38 [FFDC] [1] VCS DBG_AGENT Agent name is WebLogic9
Agent.C:process_agent[241]
2011-01-11 23:15:38 [FFDC] [1] VCS DBG_AGENT minor code is 5379
Agent.C:process_agent[264]
2011-01-11 23:15:38 [FFDC] [1] VCS DBG_AGENT MSG_AGENT_ALIVE received
Agent.C:process_agent[351]
2011-01-11 23:15:38 [FFDC] [1] VCS DBG_AGENT Entering alive method for agent WebLogic9
Agent.C:alive[387]
.....
Because the whole FFDC circular buffer is dumped, the external FFDC log file will contain entries before the time the _had was killed.
Depends on the nature of the _had process problem, the FFDC may help in the root cause analysis.
Applies To
Veritas Cluster Server version 5.1 and onward