From VCS engine_A.log:
2011/07/08 01:01:53 VCS WARNING V-16-1-10485 Excessive delay between successive calls to GAB heartbeat (11 seconds)
2011/07/08 01:03:51 VCS WARNING V-16-1-10485 Excessive delay between successive calls to GAB heartbeat (117 seconds)
From syslog.log:
Jul 11 15:30:24 node1 Had[4694]: VCS WARNING V-16-1-53034 HAD Signal SIGABRT received
Jul 11 15:32:46 node1 Had[5481]: VCS WARNING V-16-1-53034 HAD Signal SIGABRT received
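To see how often these warnings occur and how long the delays are, the two logs can be scanned with standard tools. The following is a minimal sketch, not part of the original analysis: the function names gab_delays and had_aborts are hypothetical, and the log paths in the usage note are typical locations that may differ on your nodes.

```shell
#!/bin/sh
# Hypothetical helpers for summarizing the messages shown above.

# Print "date time seconds" for each GAB heartbeat delay warning.
# $1: path to a VCS engine log (for example engine_A.log).
gab_delays() {
    grep 'V-16-1-10485' "$1" |
        sed -n 's/^\([0-9/]* [0-9:]*\) .*(\([0-9][0-9]*\) seconds).*$/\1 \2/p'
}

# Print the number of HAD SIGABRT messages in a syslog file.
# $1: path to the syslog file.
had_aborts() {
    grep -c 'V-16-1-53034 HAD Signal SIGABRT' "$1"
}
```

For example, assuming the usual log locations, run gab_delays /var/VRTSvcs/log/engine_A.log and had_aborts /var/adm/syslog/syslog.log. Delays that grow across successive warnings, followed by SIGABRT messages, match the pattern described here.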
The stacks from the _had core files point to hangs in the select() system call. A sample stack from one _had core is shown below:
(0) 0x000000000479fdc0 _Z12VCSDumpStackv + 0x3b0 at Platform.C:1830 [/opt/VRTSvcs/bin/had]
(1) 0x00000000047a1000 VCSAbrtHandler + 0x60 at Platform.C:1990 [/opt/VRTSvcs/bin/had]
(2) 0xe0000001205c7420 ---- Signal 6 (SIGABRT) delivered ----
(3) 0x60000000c0948830 _select_sys + 0x30 [/usr/lib/hpux32/libc.so.1]
(4) 0x60000000c095ed40 _select + 0xe0 at ../../../../../core/libs/libc/shared_em_32_perf/../core/syscalls/t_select.c:21 [/usr/lib/hpux32/libc.so.1]
(5) 0x00000000046c4f30 _ZN9IpmHandle6eventsEP5DListPS1_S1_S2_i + 0xb30 at Ipm.C:502 [/opt/VRTSvcs/bin/had]
(6) 0x00000000046d0100 _ZN9IpmHandle4sendEP5VListi + 0x1300 at Ipm.C:2230 [/opt/VRTSvcs/bin/had]
(7) 0x000000000464e560 _ZN6System12process_dumpEPvP6MsgHdr + 0x920 at System.C:4871 [/opt/VRTSvcs/bin/had]
(8) 0x00000000041e2330 _Z15process_messagePvP5VListi + 0xda0 at had.C:461 [/opt/VRTSvcs/bin/had]
(9) 0x00000000041f5a50 _Z4MAINmPPc + 0x8d50 at had.C:3076 [/opt/VRTSvcs/bin/had]
(10) 0x0000000004206270 main + 0x40 at had.C:3576 [/opt/VRTSvcs/bin/had]
(11) 0x60000000c00427c0 main_opd_entry + 0x50 [/usr/lib/hpux32/dld.so]
This points to an HP-UX operating system issue. Further analysis by HP traced it to a regression introduced by the OS patch PHKL_41700.
To work around this problem, HP suggests setting the "hires_timeout_enable" kernel parameter to 1 before starting the cluster. Run the following command to set it:
# kctune hires_timeout_enable=1
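Before and after making the change, it can help to confirm that the node actually carries the affected patch and to look at the current value of the tunable. The sketch below assumes HP-UX 11.31 with swlist and kctune on the PATH and sufficient privileges. The function names patch_installed and check_node are hypothetical, and the simple text match on the swlist output is an assumption that may need tightening on your system.

```shell
#!/bin/sh
# Hypothetical pre-check for the workaround described above.

# Succeed if the named patch appears in the installed patch list.
# $1: patch name, for example PHKL_41700.
patch_installed() {
    swlist -l patch 2>/dev/null | grep "$1" >/dev/null
}

# Report on the regression patch, then query the tunable.
check_node() {
    if patch_installed PHKL_41700; then
        echo "PHKL_41700 is installed"
    else
        echo "PHKL_41700 not found"
    fi
    # With only a tunable name, kctune reports its current value.
    kctune hires_timeout_enable
}
```

Calling check_node on each cluster node before starting VCS shows whether the node is exposed to the regression and whether the tunable has already been set.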
Another possible solution is to install the following kernel patch:
PHKL_41967
Please note that the above patch was the most current at the time this document was edited (July 2011). Check with HP to see whether a newer release of the patch is available.
Applies To
This issue is specific to:
HP-UX 11.31
VCS or SFRAC 5.0.1 and subsequent patches
HP-UX kernel patch PHKL_41700 installed
The VCS High Availability Daemon (had) is repeatedly killed by GAB.