savecore: [ID 570001 auth.error] reboot after panic: GAB: Port h halting system due to client process failure
The issue was related to UFS filesystem logging tuning.
From the crash analysis, we found that 'had' appears to be stuck in kernel context waiting on UFS. We sent the analysis to the customer and recommended that they contact Sun/Oracle, provide our analysis, and ask what would cause this. Sun confirmed that it was their issue and provided UFS tuning.
Numerous hardware errors also appear in the syslog and the message buffer, and iostat reports many hard and transport errors.
From the crash dump, we identified one devfsadmd thread in biowait and 79 threads waiting for a mutex.
We need to determine whether all 79 threads are waiting for the same mutex, and which mutex that is.
SolarisCAT(vmcore.0/10U)> tlist biowait
thread: 0x300085c1840 state: slp
PID: 396 cmd: devfsadmd
idle: 5 ticks (0.05 seconds)
buf @ 0x300dd2ae740
b_edev: 328(vxio),0 //platform/sun4u-us3/lib/libc_psr.so.1/platform/sun4u-us3/lib/sparcv9/libc_psr.so.1 b_blkno: 0x548dec
b_addr: 0x0 b_bufsize: 0x400
b_bcount: 1024
b_vp: 0x6005886de00 v_op: *specfs(bss):spec_vnodeops
b_flags: 0x80053 (BUSY|DONE|PAGEIO|READ|NOCACHE)
1 thread in biowait() found.
threads in biowait() by device:
count device (thread: max idle time)
1 328(vxio),0 (0x300085c1840: 0.05 seconds) //platform/sun4u-us3/lib/libc_psr.so.1/platform/sun4u-us3/lib/sparcv9/libc_psr.so.1
Note that the thread has been idle for only 5 ticks (0.05 seconds), so its presence in biowait may be misleading.
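As a cross-check on the buf shown above, the b_flags value can be decoded by hand. The sketch below is illustrative only: the bit values in the table are assumptions recalled from the Solaris sys/buf.h header, limited to the five flags SolarisCAT printed, and should be verified against the header for the release that produced the dump. The function name decode_b_flags is hypothetical.

```python
# Minimal sketch: decode the b_flags word of a Solaris buf structure.
# ASSUMPTION: the bit values below are recalled from <sys/buf.h> and cover
# only the flags SolarisCAT printed for this buf; verify before relying on them.
BUF_FLAGS = {
    0x00001: "BUSY",
    0x00002: "DONE",
    0x00010: "PAGEIO",
    0x00040: "READ",
    0x80000: "NOCACHE",
}


def decode_b_flags(value):
    """Return (names of known bits that are set, leftover bits not in the table)."""
    names = [name for bit, name in sorted(BUF_FLAGS.items()) if value & bit]
    known_mask = 0
    for bit in BUF_FLAGS:
        known_mask |= bit
    return names, value & ~known_mask


if __name__ == "__main__":
    names, leftover = decode_b_flags(0x80053)
    print("|".join(names), "leftover:", hex(leftover))
```

For 0x80053 this reproduces the BUSY|DONE|PAGEIO|READ|NOCACHE string printed by SolarisCAT. Under the assumed bit values, the DONE bit is set on this buf, which may be one more hint that the single biowait thread is not the real blocker.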
Threads of the VCS 'had' and 'hashadow' processes are also waiting for a mutex or lock.
SolarisCAT(vmcore.0/10U)> proc -l 11514
addr PID PPID RUID/UID size RSS swresv time command
============= ====== ====== ========== ========== ======== ======== ====== =========
0x6008425b958 11514 1 0 19685376 8568832 8667136 1328853 /opt/VRTSvcs/bin/had
thread: 0x300148b63c0 state: slp wchan: 0x6008425ba1e sobj: condition var (from genunix:exitlwps+0x11c) <<<<
thread: 0x300149029c0 state: slp wchan: 0x60057957760 sobj: mutex
SolarisCAT(vmcore.0/10U)> proc -l 11577
addr PID PPID RUID/UID size RSS swresv time command
============= ====== ====== ========== ========== ======== ======== ====== =========
0x600844dad40 11577 1 0 3670016 8192 540672 7 /opt/VRTSvcs/bin/hashadow
thread: 0x300085b9180 state: slp wchan: 0x6008532cb94 sobj: condition var (from genunix:wait_for_lock+0x34) <<<<
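To check whether the sleeping threads share one wait channel, the thread lines from the listings can be grouped by sobj type and wchan address. The following is a rough sketch, assuming the SolarisCAT output has been saved as text in the same "thread: ... state: ... wchan: ... sobj: ..." layout shown above. The regular expression and the group_by_wchan helper are hypothetical, not part of SolarisCAT.

```python
import re
from collections import Counter

# ASSUMPTION: thread lines look like the ones in the listings above, e.g.
#   thread: 0x300149029c0 state: slp wchan: 0x60057957760 sobj: mutex
THREAD_RE = re.compile(
    r"thread:\s+(0x[0-9a-f]+)\s+state:\s+(\w+)\s+"
    r"wchan:\s+(0x[0-9a-f]+)\s+sobj:\s+(.+?)(?=\s+\(from|\s*$)"
)


def group_by_wchan(lines):
    """Count threads per (sobj type, wchan address) pair."""
    counts = Counter()
    for line in lines:
        match = THREAD_RE.search(line)
        if match:
            _thread, _state, wchan, sobj = match.groups()
            counts[(sobj, wchan)] += 1
    return counts


if __name__ == "__main__":
    # Sample input: the three thread lines from the proc -l output above.
    sample = [
        "thread: 0x300148b63c0 state: slp wchan: 0x6008425ba1e sobj: condition var (from genunix:exitlwps+0x11c) <<<<",
        "thread: 0x300149029c0 state: slp wchan: 0x60057957760 sobj: mutex",
        "thread: 0x300085b9180 state: slp wchan: 0x6008532cb94 sobj: condition var (from genunix:wait_for_lock+0x34) <<<<",
    ]
    for (sobj, wchan), count in group_by_wchan(sample).most_common():
        print(f"{count:3d}  {sobj:<14} {wchan}")
```

Run over the full listing of the 79 mutex waiters, a single (mutex, wchan) row with a count of 79 would indicate that they all contend for the same mutex. Several rows would point to more than one lock being involved.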