How to allow GAB to panic the machine on heartbeat loss in scenarios where HAD is getting killed frequently.


Article ID: 100021606



Resolution

Systems can sometimes be overwhelmed by programs or inadequate resources. This can result in GAB killing HAD multiple times with the message:
 
GAB WARNING V-15-1-20058 Port h process heartbeat failed, killing process
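
To gauge how often HAD is being killed, the message ID can be counted in a saved system log. The following is a minimal sketch, not a product tool: count_gab_kills is a hypothetical helper name, and the log file location is an assumption that varies by platform.

```shell
# Count GAB "process heartbeat failed" kill messages in a log file.
# count_gab_kills is a hypothetical helper name, not part of the product.
count_gab_kills() {
    # $1 = path to a system log file (location varies by platform)
    # Matching on the message ID alone tolerates differences in spacing.
    grep -c 'V-15-1-20058' "$1"
}

# Usage (the path is an assumption, substitute your own system log):
#   count_gab_kills /var/log/messages
```

A count that keeps growing over a short period suggests the continuous behavior this article is aimed at.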
 
Performance data is not always captured when this happens. Fortunately, most system cores capture enough information to allow analysis of the problem. Instead of manually capturing a live core around the time of each occurrence, you can set GAB to panic the machine on loss of the internal heartbeat. To see whether GAB is allowed to do this, list its options using gabconfig -l:
 
prevail:# gabconfig -l
GAB Driver Configuration
Driver state         : Configured
Partition arbitration: Disabled
Control port seed    : Disabled
Halt on process death: Disabled
Missed heartbeat halt: Disabled
Halt on rejoin       : Disabled
Keep on killing      : Disabled
Quorum flag          : Disabled
Restart              : Disabled
Node count           : 1
Disk HB interval (ms): 1000
Disk HB miss count   : 4
IOFENCE timeout (ms) : 15000
Stable timeout (ms)  : 5000
 
 
 
 
The output above shows that Missed heartbeat halt is disabled. This is the setting that must be changed to allow GAB to halt/panic the machine once the internal heartbeat mechanism times out (15 seconds).
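
When checking several nodes, it can be convenient to pull a single value out of the listing. The following is a minimal sketch that assumes the "label : value" layout shown above; gab_setting is a hypothetical helper name, not a GAB command.

```shell
# Print the value of one setting from `gabconfig -l` output read on stdin.
# gab_setting is a hypothetical helper name, not part of the product.
gab_setting() {
    # $1 = setting label exactly as listed, e.g. "Missed heartbeat halt"
    awk -F':' -v want="$1" '
        {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim whitespace around the label
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim whitespace around the value
            if (key == want) { print val; exit }
        }
    '
}

# Usage on a cluster node (requires GAB to be installed):
#   gabconfig -l | gab_setting "Missed heartbeat halt"
```

The helper prints nothing if the label is not found, so an empty result should be treated as "unknown" rather than "Disabled".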

The setting can be changed on the fly, with GAB running and active:

prevail:# gabconfig -b

List the configuration settings again to confirm the change:


prevail:# gabconfig -l
GAB Driver Configuration
Driver state         : Configured
Partition arbitration: Disabled
Control port seed    : Enabled
Halt on process death: Disabled
Missed heartbeat halt: Enabled
Halt on rejoin       : Disabled
Keep on killing      : Disabled
Quorum flag          : Disabled
Restart              : Enabled
Node count           : 1
Disk HB interval (ms): 1000
Disk HB miss count   : 4
IOFENCE timeout (ms) : 15000
Stable timeout (ms)  : 5000

Now, if the port h failure occurs again, GAB will panic the box and, if cores are enabled, the system will produce a system core that support can analyze.

Note: Leaving this setting on at all times is not recommended unless you intentionally want to panic the machine in these circumstances. It is meant to be used when this type of behavior is continuous, and it allows support to gather more information to aid in discovery and prevention of the problem. Once the node has panicked and GAB is restarted, the -b option will no longer be in the configuration, because GAB reads /etc/gabtab, which should not include the -b option.
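
To confirm that the setting will not persist across a restart, /etc/gabtab can be checked for the -b option. The following is a minimal sketch: gabtab_has_b is a hypothetical helper name, and it only detects -b written as a separate option on a non-comment gabconfig line, not combined with other flags.

```shell
# Report whether a gabtab-style file passes -b to gabconfig.
# gabtab_has_b is a hypothetical helper name, not part of the product.
gabtab_has_b() {
    # $1 = path to the file, normally /etc/gabtab
    # Drop comment lines, keep gabconfig lines, then look for a standalone -b.
    grep -v '^[[:space:]]*#' "$1" \
        | grep 'gabconfig' \
        | grep -Eq '(^|[[:space:]])-b([[:space:]]|$)'
}

# Usage:
#   if gabtab_has_b /etc/gabtab; then
#       echo "-b is in gabtab and will persist across restarts"
#   else
#       echo "-b is not in gabtab"
#   fi
```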




 
 

 
