GAB forced a panic because the "had" process was waiting for a return from the operating system.

Article ID: 100001673

Description

Error Message

GAB: Port h halting system due to client process failure

Resolution


GAB forced the panic because it could not communicate with the "had" processes. Port h in the error message is the GAB port used by the VCS engine, had. The "had" processes, in turn, were blocked waiting for a Solaris STREAMS operation to return. Note that GAB's inability to communicate with "had" can have a number of different causes; this article describes just one example.


1. To determine whether this issue may have caused the panic, get the process IDs (PIDs) of the "had" processes. Note that "grep had" also matches hashadow, the process that monitors had.

SolarisCAT(vmcore.0/10U)> proc | grep had
0x3002db8d330  20571  7678          0    3678208  1466368  581632      1 /opt/VRTSvcs/bin/hashadow
0x3006c25cf10  20548      1          0  23740416   548864 13910016      0 /opt/VRTSvcs/bin/had
0x60080dc40c0  20538      1          0  23740416   548864 13910016      0 /opt/VRTSvcs/bin/had
0x6007a716098  7785      1          0  21307392   745472 10469376      0 /opt/VRTSvcs/bin/had
0x60080bef138  7678      1          0    3678208  2244608  581632      2 /opt/VRTSvcs/bin/hashadow

SolarisCAT(vmcore.0/10U)>
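If the proc output has been saved to a text file, the PID extraction can be scripted. The following is a minimal Python sketch, not a Veritas or SolarisCAT tool: the function name had_pids is hypothetical, and it assumes the column layout shown above (hex proc address first, PID second, command path last). It also filters out hashadow, which "grep had" matches as well.

```python
import posixpath


def had_pids(proc_output):
    """Return the PIDs of processes whose command is exactly 'had'.

    Assumes each data line of SolarisCAT 'proc' output starts with a
    hex proc address followed by the PID, and ends with the command path.
    """
    pids = []
    for line in proc_output.splitlines():
        fields = line.split()
        # Skip prompts, headers, separators and blank lines.
        if len(fields) < 2 or not fields[0].startswith("0x"):
            continue
        # basename() also copes with a time column fused onto the path,
        # e.g. "0/opt/VRTSvcs/bin/had" -> "had".
        command = posixpath.basename(fields[-1])
        # 'grep had' also matches hashadow; keep only had itself.
        if command == "had":
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    sample = """\
0x3002db8d330  20571  7678          0    3678208  1466368  581632      1 /opt/VRTSvcs/bin/hashadow
0x3006c25cf10  20548      1          0  23740416   548864 13910016      0 /opt/VRTSvcs/bin/had
0x6007a716098  7785      1          0  21307392   745472 10469376      0 /opt/VRTSvcs/bin/had
"""
    print(had_pids(sample))  # [20548, 7785]
```

Each PID returned can then be passed to proc -L in the next step.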


2. Display the stack of one or more of the "had" processes to see what they were waiting on.

SolarisCAT(vmcore.0/10U)> proc -L 7785
   addr       PID    PPID  RUID/UID     size      RSS     swresv  time  command
============= ====== ====== ========== ========== ================ ====== =========
0x6007a716098  7785      1          0  21307392   745472 10469376      0 /opt/VRTSvcs/bin/had

==== user (LWP_SYS) thread: 0x3004b5a6aa0  PID:7785 ====
cmd: /opt/VRTSvcs/bin/had
t_wchan: 0x60059038e5a  sobj:condition var (from genunix:str_cv_wait+0x28)
t_procp:0x6007a716098
 p_as: 0x300247f90b0  size: 21307392  RSS:745472
 hat: 0x30086895b40
   cnum:CPU8:1263/2670
   cpusran:0,1,2,3,9,10,11,16,19,24,25,26,27
 zone: global
t_stk:0x2a1095b5ae0  sp: 0x2a1095b4c81  t_stkbase: 0x2a1095b0000
t_pri:157(RT)  t_tid: 2  pctcpu: 0.000000
t_lwp:0x30090350880  machpcb: 0x2a1095b5ae0
 mstate:LMS_SLEEP  ms_prev: LMS_SYSTEM
 ms_state_start: 1 minutes 18.2366647 seconds earlier
 ms_start: 2 days 19 hours 9 minutes 34.6118094 seconds earlier
psrset: 0  last CPU: 3
idle: 7824 ticks (1 minutes 18.24 seconds)
start: Wed Mar 24 00:58:56 2010
age: 241773 seconds (2 days 19 hours 9 minutes 33 seconds)
syscall: #4 write(, 0xfeff7c00) (sysent: genunix:write32+0x0)
tstate: TS_SLEEP - awaiting an event
tflg:   T_WAKEABLE - thread is blocked, signals enabled
       T_DFLTSTK - stack is default size
tpflg:  TP_TWAIT - wait to be freed by lwp_wait
       TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
       TS_DONT_SWAP - thread/LWP should not be swapped
pflag:  SMSACCT - process is keeping micro-state accounting
       SMSFORK - child inherits micro-state accounting

pc:      genunix:cv_wait_sig+0x114:  call      unix:swtch

genunix:cv_wait_sig+0x114(, 0x600587f3028)
genunix:str_cv_wait+0x28(0x60059038e5a, 0x600587f3028, 0xffffffffffffffff, 0x0, , 0x8000084)
genunix:strwaitq+0x238(0x600587f2fa8, 0x1, , 0x2, 0xffffffffffffffff, 0x2a1095b576c)
genunix:strwrite_common+0x278(, 0x2a1095b5a98)
specfs:spec_write(0x600592d76c0, 0x2a1095b5a98, 0x0, 0x6007ae044e0, 0x0) - frame recycled
genunix:fop_write+0x20(0x600592d76c0, 0x2a1095b5a98, 0x0, 0x6007ae044e0, 0x0)
sysmsg:sysmwrite+0xe4(, 0x2a1095b5a98, 0x6007ae044e0)
specfs:spec_write(0x6005ec2a6c0, 0x2a1095b5a98, 0x0, 0x6007ae044e0, 0x0) - frame recycled
genunix:fop_write+0x20(0x6005ec2a6c0, 0x2a1095b5a98, 0x0, 0x6007ae044e0, 0x0)
genunix:write+0x268(0x21)
unix:syscall_trap32+0xcc()
-- switch to user thread's user stack --



SolarisCAT(vmcore.0/10U)>
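When several "had" threads need to be checked, the saved proc -L output for each thread can be screened for the same STREAMS wait. The Python sketch below uses a hypothetical helper, streams_write_wait, which is not part of SolarisCAT. It reports whether the strwrite_common, strwaitq and str_cv_wait frames from this example all appear, and it converts the idle tick count to seconds assuming 100 ticks per second. That rate is an assumption, but it is consistent with the example output (7824 ticks shown as 1 minute 18.24 seconds). A match only shows that the thread was waiting in the STREAMS write path; it does not by itself confirm Sun Bug ID 6893378.

```python
import re

# Frames from the example stack above. Their presence together suggests the
# thread was asleep in the STREAMS write path; this is a heuristic drawn from
# this one case, not a complete signature for any specific bug.
STREAMS_WRITE_FRAMES = (
    "genunix:strwrite_common",
    "genunix:strwaitq",
    "genunix:str_cv_wait",
)

# Assumed clock rate: 100 ticks per second, which matches the example
# (idle: 7824 ticks = 78.24 seconds).
HZ = 100


def streams_write_wait(thread_dump):
    """Return (matched, idle_seconds) for one thread from 'proc -L' output.

    matched is True when every frame in STREAMS_WRITE_FRAMES appears in the
    dump; idle_seconds is None if no 'idle: N ticks' line is found.
    """
    matched = all(frame in thread_dump for frame in STREAMS_WRITE_FRAMES)
    m = re.search(r"^idle:\s*(\d+)\s+ticks", thread_dump, re.MULTILINE)
    idle_seconds = int(m.group(1)) / HZ if m else None
    return matched, idle_seconds


if __name__ == "__main__":
    dump = """\
idle: 7824 ticks (1 minutes 18.24 seconds)
genunix:cv_wait_sig+0x114(, 0x600587f3028)
genunix:str_cv_wait+0x28(0x60059038e5a, 0x600587f3028, 0xffffffffffffffff, 0x0, , 0x8000084)
genunix:strwaitq+0x238(0x600587f2fa8, 0x1, , 0x2, 0xffffffffffffffff, 0x2a1095b576c)
genunix:strwrite_common+0x278(, 0x2a1095b5a98)
"""
    print(streams_write_wait(dump))  # (True, 78.24)
```

The helper is meant to be run once per thread section, for example once for each PID found in step 1.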


3. Additional information
 
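In this case, the stack shows that the "had" process was blocked in the Solaris STREAMS write path: a write(2) system call, passed through the sysmsg driver (sysmwrite), reached strwrite_common and strwaitq, and the thread was sleeping in str_cv_wait/cv_wait_sig. The thread had been idle for about 78 seconds (7824 ticks) when the dump was taken. The cause of the hang appears to be Sun Bug ID 6893378.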
 
Additional information is available at the following link:  
 
Contact Veritas Support for additional information.
 
 
 

 
