HAD does not start or gets restarted in Veritas Cluster Server versions 5.0.1 and above

Article ID: 100025192

Description

Error Message

From VCS engine_A.log: 
 
2011/07/08 01:01:53 VCS WARNING V-16-1-10485 Excessive delay between successive calls to GAB heartbeat (11 seconds)  
2011/07/08 01:03:51 VCS WARNING V-16-1-10485 Excessive delay between successive calls to GAB heartbeat (117 seconds) 
 
From syslog.log: 
 
Jul 11 15:30:24 node1 Had[4694]: VCS WARNING V-16-1-53034 HAD Signal SIGABRT received  
Jul 11 15:32:46 node1 Had[5481]: VCS WARNING V-16-1-53034 HAD Signal SIGABRT received
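To gauge how often and how badly the heartbeat is delayed, the V-16-1-10485 warnings can be filtered by delay. The following is a minimal sketch, not a Veritas-supplied tool: the function name gab_delays and its threshold argument are hypothetical, and it assumes the message format shown above, with the delay given as "(N seconds)".

```shell
# Print date, time and delay for each GAB heartbeat-delay warning
# (V-16-1-10485) whose delay is at least MIN_SECONDS.
# Usage: gab_delays LOGFILE [MIN_SECONDS]
# gab_delays is a hypothetical helper name, not part of VCS.
gab_delays() {
    logfile=$1
    min=${2:-0}
    awk -v min="$min" '
        /V-16-1-10485/ {
            # The delay is the number inside "(N seconds)".
            if (match($0, /\([0-9]+ seconds\)/)) {
                # Skip the "(" and let awk read the leading digits as a number.
                secs = substr($0, RSTART + 1, RLENGTH - 1) + 0
                if (secs >= min) print $1, $2, secs
            }
        }' "$logfile"
}
```

For example, gab_delays /var/VRTSvcs/log/engine_A.log 10 would list only delays of 10 seconds or more, assuming the engine log is in that location on the node.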

Cause

The stacks from the _had core files point to a hang in the select() system call. A sample stack from a _had core is shown below:

(0) 0x000000000479fdc0 _Z12VCSDumpStackv + 0x3b0 at Platform.C:1830 [/opt/VRTSvcs/bin/had]
(1) 0x00000000047a1000 VCSAbrtHandler + 0x60 at Platform.C:1990 [/opt/VRTSvcs/bin/had]
(2) 0xe0000001205c7420 ---- Signal 6 (SIGABRT) delivered ----
(3) 0x60000000c0948830 _select_sys + 0x30 [/usr/lib/hpux32/libc.so.1]
(4) 0x60000000c095ed40 _select + 0xe0 at ../../../../../core/libs/libc/shared_em_32_perf/../core/syscalls/t_select.c:21 [/usr/lib/hpux32/libc.so.1]
(5) 0x00000000046c4f30 _ZN9IpmHandle6eventsEP5DListPS1_S1_S2_i + 0xb30 at Ipm.C:502 [/opt/VRTSvcs/bin/had]
(6) 0x00000000046d0100 _ZN9IpmHandle4sendEP5VListi + 0x1300 at Ipm.C:2230 [/opt/VRTSvcs/bin/had]
(7) 0x000000000464e560 _ZN6System12process_dumpEPvP6MsgHdr + 0x920 at System.C:4871 [/opt/VRTSvcs/bin/had]
(8) 0x00000000041e2330 _Z15process_messagePvP5VListi + 0xda0 at had.C:461 [/opt/VRTSvcs/bin/had]
(9) 0x00000000041f5a50 _Z4MAINmPPc + 0x8d50 at had.C:3076 [/opt/VRTSvcs/bin/had]
(10) 0x0000000004206270 main + 0x40 at had.C:3576 [/opt/VRTSvcs/bin/had]
(11) 0x60000000c00427c0 main_opd_entry + 0x50 [/usr/lib/hpux32/dld.so]

This points to an HP-UX operating system issue. Further analysis by HP attributed it to a regression introduced by the OS patch PHKL_41700.

Resolution

To work around this problem, HP has suggested setting the "hires_timeout_enable" kernel parameter to 1 before starting the cluster. Run the following command to set it:

# kctune hires_timeout_enable=1  

Another possible solution is to install the following kernel patch:

PHKL_41967

Please note that the above patch was the most current at the time this document was edited (July 2011). Check with HP to see whether a newer release of the patch is available.

Applies To

This issue is specific to:

HP-UX 11.31

VCS or SFRAC 5.0.1 and subsequent patches

HP-UX kernel patch PHKL_41700 installed
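Whether a node matches these conditions can be checked from its installed patch list. The sketch below is illustrative only: has_patch is a hypothetical helper that reads a patch listing on standard input, and the commented commands assume the standard HP-UX swlist and kctune utilities are available on the node.

```shell
# Succeeds if the patch ID given as $1 appears as a whole word
# in the patch listing read from standard input.
# has_patch is a hypothetical helper name.
has_patch() {
    grep -q -w -- "$1"
}

# On the HP-UX node, the check might look like this (assumed usage):
#   swlist -l patch | has_patch PHKL_41700 && echo "PHKL_41700 is installed"
#   kctune hires_timeout_enable    # display the tunable's current value
```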

Issue/Introduction

The VCS High Availability Daemon (had) is repeatedly killed by GAB.

Additional Information

ETrack: 1724831
