Intermittent LLT panics observed on InfoScale 7.4.1/AIX 7.2 TL5 systems with errcheckdetail enabled.

Article ID: 100062279

Description

Error Message

CRASH INFORMATION:
CPU 56 CSA F100091561146D00 at time of crash, error code for LEDs: 70000000
pvthread+153600 STACK:
[00008BF0].simple_unlockir+000070 ()
[07DA24A8]07DA24A8 ()
[07DB3B20]07DB3B20 ()
[07DA2FB0]07DA2FB0 ()
[000D464C]clock+0002CC (??)
[00177378]i_softmod+0004F8 ()
[00142770]flih_util+000258 ()
____ Exception (F00000002FF47600) ____

Cause

In AIX 7.2 TL5, IBM introduced a performance feature in the interrupt-disabled unlock path. The change asserts that the lock has no waiters, because correct usage requires that all threads contending on a given lock either take it with interrupts disabled (and spin rather than sleep) or all take it with interrupts enabled (INTBASE). In cases where the lock holder is on another CPU, the holder eventually releases the lock and the disabled (spinning) lock request succeeds. If strong error checking (for example errcheckdetail, level 7) is enabled and AIX detects incorrect locking semantics, it triggers an assertion (system panic).

The core files showed that LLT held the lock on the interrupt-disabled path while a different LLT thread tried to take the same lock at INTBASE.

LLT registered a timer with AIX, specifying interrupt priority INTTIMER. The timer handler was unlocking llt_poll_req_lock:

CRASH INFORMATION:
CPU 56 CSA F100091561146D00 at time of crash, error code for LEDs: 70000000
pvthread+153600 STACK:
[00008BF0].simple_unlockir+000070 ()
[07DA24A8]07DA24A8 ()
[07DB3B20]07DB3B20 ()
[07DA2FB0]07DA2FB0 ()
[000D464C]clock+0002CC (??)
[00177378]i_softmod+0004F8 ()
[00142770]flih_util+000258 ()
____ Exception (F00000002FF47600) ____


 
Another LLT thread, which polls the MAC addresses of the private links, was acquiring the same lock at INTBASE:

 
(60)> th pvthread+18D600
                SLOT NAME     STATE    TID PRI   RQ CPUID  CL  WCHAN
pvthread+18D600 6358 llt_poll SLEEP D60301 03C   80       257  llt_poll_req_lock slist_table+000A80

(80)> f 6358
pvthread+18D600 STACK:
[006CE9C0]slock+000580 (0000000000137D3C, 8000000000001032 [??])
[0000956C].simple_lock+00006C ()   <<<<<<<
[F1000915901131A0]F1000915901131A0 ()
[00014D70].hkey_legacy_gate+00004C ()
[07DA2620].llt_poll_procfunc+000080 ()
[00486230]procentry+000010 (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF9220
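
The two stacks above can be read as one sequence: the timer handler holds llt_poll_req_lock on the interrupt-disabled path, llt_poll goes to sleep on it at INTBASE, and the disabled unlock then finds a sleeping waiter. The following is a minimal user-space sketch of that sequence in Python. It is not AIX or LLT code. ModelLock, KernelAssertion, replay_llt_scenario and the strict_checks flag are hypothetical names introduced only for illustration. strict_checks stands in for detailed error checking, and what the model does without it (handing the lock to the sleeper) is an assumption, not documented kernel behavior.

```python
class KernelAssertion(Exception):
    """Stands in for the system panic raised by the kernel assertion."""


class ModelLock:
    """Toy lock that tracks one holder and a list of sleeping waiters."""

    def __init__(self, name, strict_checks):
        self.name = name
        self.strict_checks = strict_checks  # models detailed error checking
        self.holder = None
        self.sleepers = []

    def lock_disabled(self, thread):
        """Interrupt-disabled acquire: a contended caller may only spin."""
        if self.holder is not None:
            return False  # caller would keep spinning until the holder releases
        self.holder = thread
        return True

    def lock_intbase(self, thread):
        """INTBASE acquire: a contended caller goes to sleep on the lock."""
        if self.holder is not None:
            self.sleepers.append(thread)
            return False
        self.holder = thread
        return True

    def unlock_disabled(self):
        """Interrupt-disabled release: asserts that nobody sleeps on the lock."""
        if self.strict_checks and self.sleepers:
            raise KernelAssertion(
                f"{self.name}: disabled unlock found sleeping waiters {self.sleepers}"
            )
        # Assumption of this model: without strict checks the lock is simply
        # handed to the first sleeper, so the mixed usage goes unnoticed.
        self.holder = self.sleepers.pop(0) if self.sleepers else None


def replay_llt_scenario(strict_checks):
    """Replay the sequence seen in the dump and report what the model does."""
    lock = ModelLock("llt_poll_req_lock", strict_checks)
    lock.lock_disabled("llt timer handler")  # timer registered at INTTIMER
    lock.lock_intbase("llt_poll")            # poll thread sleeps on the lock
    try:
        lock.unlock_disabled()               # timer handler releases the lock
    except KernelAssertion as exc:
        return f"panic: {exc}"
    return f"no panic, lock now held by {lock.holder}"


if __name__ == "__main__":
    print(replay_llt_scenario(strict_checks=False))
    print(replay_llt_scenario(strict_checks=True))
```

In this model the replay reports a panic only when strict_checks is set, which mirrors why the problem shows up only on systems with detailed error checking enabled. The mixed locking modes are present in both runs.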

Resolution

LLT locking has been fixed in the VRTSllt 7.4.1.1101 hotfix.

A hotfix is now available for this issue in the current version(s) of the product(s) mentioned. Please contact Veritas Technical Support to obtain the hotfix.

Issue/Introduction

Intermittent LLT panics observed on InfoScale 7.4.1/AIX 7.2 TL5 systems with errcheckdetail enabled.

Additional Information

JIRA: STESC-7487