How to set the amount of time the hashadow process waits (sleep time) before restarting HAD


Article ID: 100021273



Resolution

This article explains how to set the amount of time the hashadow process waits (sleep time) before restarting HAD, for cases where the system halts with the message "GAB: Port h halting system due to client process failure".


If a client process fails to heartbeat to GAB, the process is killed. If the process hangs in the kernel and cannot be killed, GAB halts the system.

HAD heartbeats with GAB at regular intervals. The heartbeat timeout is specified by HAD when it registers with GAB; the default is 15 seconds.  If HAD gets stuck within the kernel and cannot heartbeat with GAB within the specified timeout, GAB tries to kill HAD by sending a SIGABRT signal.  If it does not succeed, GAB sends a SIGKILL and closes the port. By default, GAB tries to kill HAD five times before closing the port.


The number of times GAB tries to kill HAD is a configurable kernel tunable parameter, gab_kill_ntries.

The minimum value for this tunable is 3 and the maximum is 10 (default: 5).
Closing the port indicates to the other nodes that HAD on this node has been killed.


If HAD recovers from its stuck state, it first processes its pending signals. The SIGKILL is delivered first, so HAD is killed. After sending the SIGKILL, GAB waits for a specific amount of time for HAD to be killed.


If HAD survives beyond this time limit, GAB panics the system. This time limit is a configurable kernel tunable parameter, gab_isolate_time.

The minimum value for this timer is 16 seconds and the maximum is 4 minutes.
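
The escalation described above can be sketched as a small shell model. This is only an illustration of one reading of the text (a number of SIGABRT attempts, then SIGKILL and port closure, then the isolation wait); it does not call GAB or change any tunable. The function names are hypothetical, and the isolation time is expressed in seconds purely for the example, so check the unit the real tunable expects before applying a value.

```shell
#!/bin/sh
# Illustrative model only: these functions are hypothetical and do not call GAB.

# Succeeds if $1 is a non-negative integer within [$2, $3].
in_range() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Ranges stated in this article:
#   gab_kill_ntries  : 3 to 10
#   gab_isolate_time : 16 seconds to 4 minutes (240 seconds)
valid_kill_ntries()  { in_range "$1" 3 10; }
valid_isolate_secs() { in_range "$1" 16 240; }

# Print the escalation steps for a HAD process that never exits.
# $1 = number of kill attempts, $2 = isolation wait in seconds.
gab_escalation_trace() {
    valid_kill_ntries "$1"  || { echo "invalid gab_kill_ntries: $1" >&2; return 1; }
    valid_isolate_secs "$2" || { echo "invalid isolate time: $2" >&2; return 1; }
    attempt=1
    while [ "$attempt" -le "$1" ]; do
        echo "attempt $attempt: SIGABRT"
        attempt=$((attempt + 1))
    done
    echo "SIGKILL sent, port h closed"
    echo "wait up to ${2}s; if HAD is still alive, panic the system"
}

# 5 is the documented default for gab_kill_ntries; 60 is an arbitrary in-range value.
gab_escalation_trace 5 60
```

The range checks are the useful part in practice: a value outside 3 to 10 or 16 seconds to 4 minutes is rejected before any trace is printed.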

In addition to the tunable parameters described above, you can set the VCS_HAD_RESTART_TIMEOUT environment variable to delay the hashadow process's attempt to restart HAD.

Please refer to the following:

VCS environment variables

VCS_HAD_RESTART_TIMEOUT:  Set this variable to designate the amount of time the hashadow process waits (sleep time) before restarting HAD.
Default: 0

Defining VCS environment variables:


Define VCS environment variables in the file vcsenv, which is located at the path /opt/VRTSvcs/bin/.
These variables are set for VCS when the hastart command is run.
To set a variable, use the syntax appropriate for the shell in which VCS starts.

For example, if you use the bash shell, define variables as:
export VCS_HAD_RESTART_TIMEOUT=6000
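
As a minimal sketch of how the definition could be added without leaving duplicate entries, the helper below rewrites a file so that it contains exactly one export line for a given variable. The helper name set_vcs_env_var and the scratch file name are hypothetical, and the demonstration deliberately works on a scratch file rather than on /opt/VRTSvcs/bin/vcsenv. It assumes existing definitions use the "export NAME=VALUE" form.

```shell
#!/bin/sh
# Hypothetical helper: set_vcs_env_var FILE NAME VALUE
# Leaves FILE with exactly one "export NAME=VALUE" line, appended at the end.
set_vcs_env_var() {
    file=$1
    name=$2
    value=$3
    tmp="${file}.tmp.$$"
    if [ -f "$file" ]; then
        # Keep every line except earlier "export NAME=..." definitions.
        # grep -v exits non-zero when nothing is left, which is not an error here.
        grep -v "^[[:space:]]*export[[:space:]]*${name}[[:space:]]*=" "$file" > "$tmp" || true
    else
        : > "$tmp"
    fi
    # No spaces around "=": bash rejects "export NAME = VALUE".
    echo "export ${name}=${value}" >> "$tmp"
    # Write back in place so the original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstrate on a scratch file, not on the real vcsenv.
demo_file="${TMPDIR:-/tmp}/vcsenv.demo.$$"
set_vcs_env_var "$demo_file" VCS_HAD_RESTART_TIMEOUT 6000
set_vcs_env_var "$demo_file" VCS_HAD_RESTART_TIMEOUT 6000   # second call must not duplicate
cat "$demo_file"
```

Because the variables are set for VCS when the hastart command is run, a change made this way would only be picked up the next time VCS is started.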

Once the HAD process is started, it creates the /var/VRTSvcs/lock/.hadlock file.  hashadow is then started as well, but it waits for HAD to release the lock because the lock is held by HAD.

When HAD stops, the lock is released. hashadow detects this and checks whether HAD stopped normally.  If hashadow detects that HAD stopped abnormally, it waits for the defined time interval (VCS_HAD_RESTART_TIMEOUT) and then tries to restart HAD.


If HAD stopped normally and released the lock, hashadow also stops normally.
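
The decision hashadow makes once the lock is released can be summarized in a short conceptual model. This is not hashadow's actual code: the function name and the "normal"/"abnormal" labels are hypothetical, and the function only prints the action instead of sleeping or restarting anything.

```shell
#!/bin/sh
# Conceptual model of the behaviour described above; not hashadow's real code.
# $1 = how HAD stopped: "normal" or "abnormal" (hypothetical labels)
# $2 = value of VCS_HAD_RESTART_TIMEOUT (the documented default is 0)
hashadow_next_action() {
    how=$1
    wait_time=${2:-0}
    case "$how" in
        normal)   echo "exit" ;;
        abnormal) echo "wait ${wait_time} then restart HAD" ;;
        *)        echo "unknown stop reason: $how" >&2; return 1 ;;
    esac
}

hashadow_next_action normal
hashadow_next_action abnormal "${VCS_HAD_RESTART_TIMEOUT:-0}"
```

With the default of 0 there is no delay before the restart attempt; a larger value postpones it.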

 

Issue/Introduction

How to set the amount of time the hashadow process waits (sleep time) before restarting HAD.