GAB initiated panic on AIX with low system utilization

Article ID: 100023102

Resolution

In the errlog, the following entry is found:
---------------------------------------------------------------------------
LABEL:  KERNEL_PANIC
IDENTIFIER:      225E3B63
Date/Time:       Mon Nov  3 12:13:10 2008
Sequence Number: 1074016
Machine Id:      00344C0C4C00
Node Id:        xxxxxxx
Class:          S
Type:            TEMP
Resource Name:   PANIC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
ASSERT STRING
PANIC STRING
GAB: Port h halting system due to client process failure
---------------------------------------------------------------------------
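To confirm that a panic was initiated by GAB, the panic string can be pulled out of the detailed error report. The following is a minimal sketch: `panic_string` is a hypothetical helper name, and it assumes the detailed report prints the panic text on the line directly after `PANIC STRING`, as in the entry above.

```shell
#!/bin/sh
# Hypothetical helper: print the line that follows each "PANIC STRING"
# heading in detailed errpt output read from stdin.
panic_string() {
    awk 'found { print; found = 0 } /^PANIC STRING/ { found = 1 }'
}

# Usage on AIX, using the error identifier from the entry above:
#   errpt -a -j 225E3B63 | panic_string
```

If the output starts with "GAB:", the panic was requested by GAB rather than caused by a fault elsewhere in the kernel.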

The stack trace of the thread that caused the panic is:
(3)>f
pvthread+00DC00 STACK:
[00021B50].panic_trap+000000 ()
[0896632C]gab_halt+000080 (??)
[0896632C]gab_halt+000080 (??)
[08961B6C]gab_kill_process+0000A8 (??)
[08958D58]gab_timerscan+00032C (??)
[089547E0]gab_timeout_daemon+000080 (??)
[000FF624]procentry+000010 (??, ??, ??, ??)

Usually this happens when the system is under stress and HAD (running in user space) does not get CPU cycles in time to respond to GAB (in kernel space).

However, it has been seen on AIX that HAD can be paged out even if the system is more or less idle.

To avoid this, change the lru_file_repage setting from 1 to 0.

If lru_file_repage is set to 1, both computational and non-computational (file) pages can be paged out. This includes the pages belonging to HAD, and a paged-out HAD can easily miss a response to a GAB heartbeat.

To check the current value of lru_file_repage:
# vmo -o lru_file_repage

To change the value of lru_file_repage:
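# vmo -p -o lru_file_repage=0

The -p flag applies the change to the current value and to the value used at the next reboot. To re-enable the setting, use lru_file_repage=1 instead.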
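The check and the change can be combined so that the tunable is only modified when needed. This is a sketch rather than a supported procedure: the function names are hypothetical, and it assumes that `vmo -o` prints the tunable as `name = value`.

```shell
#!/bin/sh
# Hypothetical helper: read "lru_file_repage = 1" on stdin, print "1".
# Assumes the "name = value" output format of "vmo -o".
parse_vmo_value() {
    awk -F'=' '{ gsub(/[ \t]/, "", $2); print $2 }'
}

# Hypothetical helper: set lru_file_repage to 0 unless it is already 0.
ensure_lru_file_repage_off() {
    current=$(vmo -o lru_file_repage | parse_vmo_value)
    if [ "$current" = "0" ]; then
        echo "lru_file_repage is already 0"
    else
        echo "lru_file_repage is $current; setting it to 0"
        vmo -p -o lru_file_repage=0
    fi
}

# Usage on AIX, as root:
#   ensure_lru_file_repage_off
```

Nothing is changed until `ensure_lru_file_repage_off` is called explicitly, so the file can be sourced and reviewed first.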

Issue/Introduction

GAB initiated panic on AIX with low system utilization