Panic string:
GAB: Port h halting system due to client process failure
Panic stack:
==== panic thread: 0x2a10015fca0 ==== CPU: 22 ====
==== panic kernel thread: 0x2a10015fca0 PID: 0 on CPU: 22 affinity CPU: 22 ====
cmd: sched
[...]
unix:panicsys+0x48(0x2a10015f734, 0x2a10015f710, 0x18bab80, 0x1, , , 0x4480001606, , , , , , , , 0x2a10015f734, 0x2a10015f710)
unix:vpanic_common+0x78(0x2a10015f734, 0x2a10015f710, 0x7d5059d5, 0xa, 0xa, 0x1833000)
genunix:cmn_err+0x98(0x3, 0x2a10015f734, 0x4, 0x2a10015f76d, 0x2a10015f76e, 0x0)
gab:gab_halt+0xb0(0x36d70507, 0x36d70507, 0x600a28da180, 0x0, 0x0, 0x0)
gab:gab_kill_process+0xc8(0x10507)
gab:gab_timerscan+0x400(0x0)
genunix:callout_execute+0xb8(0x6009a044000, 0x6009a0131b8, 0x10b945c)
genunix:taskq_thread+0x1a4(0x6009a00d5f8, 0x0)
unix:thread_start+0x4()
-- end of kernel thread's stack --
The VCS engine (had) and GAB could not communicate, so GAB attempted to kill had so that it could be restarted. Because had was stuck in the kernel and could not be killed, GAB panicked the system. The following GAB messages show the sequence:
Sat Oct 22 07:05:15 2011| GAB INFO V-15-1-20041 Port h: client process failure: killing process
Sat Oct 22 07:05:30 2011| GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
Sat Oct 22 07:05:45 2011| GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
Sat Oct 22 07:06:00 2011| GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
Sat Oct 22 07:06:30 2011| GAB WARNING V-15-1-20138 Port h isolated due to client process failure
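The escalation in the messages above can be summarized with a short script. This is a minimal sketch, not a Veritas tool: the function names parse_gab_lines and summarize are hypothetical, the line layout is assumed to match the excerpt above exactly, and the timestamps are assumed to be written in an English locale.

```python
import re
from datetime import datetime

# Log lines copied from the excerpt above.
LOG = """\
Sat Oct 22 07:05:15 2011| GAB INFO V-15-1-20041 Port h: client process failure: killing process
Sat Oct 22 07:05:30 2011| GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
Sat Oct 22 07:05:45 2011| GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
Sat Oct 22 07:06:00 2011| GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure
Sat Oct 22 07:06:30 2011| GAB WARNING V-15-1-20138 Port h isolated due to client process failure
"""

# Assumed layout: "<timestamp>| GAB <LEVEL> <message id> <text>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4})\| GAB "
    r"(?P<level>\w+) (?P<msgid>V-\d+-\d+-\d+) (?P<text>.*)$"
)


def parse_gab_lines(text):
    """Return (timestamp, level, message id, text) for each GAB line."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a GAB message
        ts = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
        events.append(
            (ts, match.group("level"), match.group("msgid"), match.group("text"))
        )
    return events


def summarize(events):
    """Count kill attempts and measure the time from first kill to isolation."""
    first_kill = next(ts for ts, _, msgid, _ in events if msgid == "V-15-1-20041")
    isolated = next(ts for ts, _, msgid, _ in events if msgid == "V-15-1-20138")
    attempts = sum(1 for _, _, msgid, _ in events if msgid == "V-15-1-20035")
    return {
        "kill_attempts": attempts,
        "seconds_to_isolation": (isolated - first_kill).total_seconds(),
    }


if __name__ == "__main__":
    print(summarize(parse_gab_lines(LOG)))
```

For the excerpt above, the summary reports three repeated kill attempts and 75 seconds between the first kill message and the isolation of port h.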
The core dump shows that the _had thread was blocked in OS code, waiting on a mutex during page allocation:
genunix:turnstile_block+0x600(0x600a2325e08, 0x0, 0x19cc178, 0x1832ce0, 0x0, 0x0)
unix:mutex_vector_enter+0x428()
unix:mutex_enter(0x19cc178) - frame recycled
unix:page_get_mnode_freelist+0x150(0xf, 0x1, 0x0, 0x0, 0xb, 0xf)
unix:page_get_freelist+0x430(0x3003ac65d00, 0x30065780000, 0x600df8aa288, 0xff1b4000, 0x2000, 0xb)
unix:page_create_va+0x32c(0x3003ac65d00, 0x30065780000, 0x2000, 0x3, 0x600df8aa288, 0xff1b4000)
genunix:swap_getapage+0x168(0x3003ac65d00, 0x30065780000, 0x2000, 0x0, 0x2a107ac3490, 0x2000, 0x600df8aa288, 0xff1b4000, 0x4, 0x3003cc65760)
genunix:swap_getpage+0x4c(0x3003ac65d00, 0x30065780000, 0x2000, 0x0, 0x2a107ac3490, , , 0xff1b4000, 0x4, 0x3003cc65760)
Apart from _had, 75 other threads were blocked in page_get_mnode_freelist(). This suggests that _had was a victim of an operating system (OS) defect rather than the cause of the hang.
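A count like the one above can be obtained from a text dump of all thread stacks. The following is a minimal sketch under stated assumptions: the function name count_threads_in is hypothetical, each thread's stack is assumed to end with a line beginning "-- end of" (as in the panic stack shown earlier), and SAMPLE_DUMP is made-up illustrative data, not output from the actual core.

```python
import re

# Made-up sample for illustration only: three thread stacks,
# two of which contain a page_get_mnode_freelist frame.
SAMPLE_DUMP = """\
==== kernel thread: 0x1 ====
genunix:turnstile_block+0x600()
unix:mutex_vector_enter+0x428()
unix:page_get_mnode_freelist+0x150()
-- end of kernel thread's stack --
==== kernel thread: 0x2 ====
genunix:cv_wait+0x38()
unix:thread_start+0x4()
-- end of kernel thread's stack --
==== user thread: 0x3 ====
unix:mutex_vector_enter+0x428()
unix:page_get_mnode_freelist+0x150()
unix:page_get_freelist+0x430()
-- end of user thread's stack --
"""


def count_threads_in(dump_text, function_name):
    """Count thread stacks that contain a frame for function_name."""
    # Frames look like "module:function+0xNN(...)" or "module:function(...)".
    frame_re = re.compile(r":" + re.escape(function_name) + r"[+(]")
    count = 0
    seen_in_current = False
    for line in dump_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("-- end of"):
            # End of one thread's stack: record it if the frame was seen.
            if seen_in_current:
                count += 1
            seen_in_current = False
        elif frame_re.search(stripped):
            seen_in_current = True
    if seen_in_current:
        count += 1  # last stack had no terminator line
    return count


if __name__ == "__main__":
    print(count_threads_in(SAMPLE_DUMP, "page_get_mnode_freelist"))
```

Matching on ":function" followed by "+" or "(" avoids counting page_get_freelist frames when searching for page_get_mnode_freelist, and counting per stack rather than per line avoids double counting a thread with a repeated frame.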
Further analysis by Oracle (the OS support vendor) confirmed that the issue is related to known Solaris bug 6778289.
Oracle advised installing kernel patch 144500-19 to address this bug.
Customers are encouraged to contact Oracle support for further details on the OS defect.
Applies To
This issue is applicable to:
- Systems running the Solaris operating system
- VCS (any version)