How to determine whether SFRAC node panicked due to CRS timeout
Article ID: 100002410
Resolution
Obtain the crash dump from the customer system and verify the panic string and the panic thread:
SolarisCAT(vmcore.7/10U)> panic
panic on cpu1
panic string: forced crash dump initiated at user request
====panic user (LWP_SYS) thread: 0x300056dc340 PID: 16038 on CPU:1 ====   <<< Note the PID
cmd: /sbin/uadmin 51   <<< Note the command
t_procp:0x30003dc5120
p_as: 0x300059873f8 size: 2621440 rss:1474560
hat: 0x30008299880 cnum: 0x0 cpusran:1
zone: global
t_stk: 0x2a100bdbae0 sp:0x2a100bdb0b1 t_stkbase: 0x2a100bd6000
t_pri: 59(TS) pctcpu:0.037107
t_lwp: 0x60012438098 machpcb: 0x2a100bdbae0
mstate:LMS_SYSTEM ms_prev: LMS_USER
ms_state_start: 0.0000116 seconds earlier
ms_start: 0.2235608 seconds earlier
psrset: 0 lastCPU: 1
idle: 0 ticks (0 seconds)
start: Wed Jun 16 07:06:51 2010
age: 0 seconds (0 seconds)
syscall: #55 uadmin(, 0xffbffce8) (sysent:genunix:uadmin+0x0)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_PANIC - thread initiated a system panic
T_DFLTSTK - stack is default size
tpflg: TP_TWAIT - wait to be freed by lwp_wait
TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SMSACCT - process is keeping micro-state accounting
SMSFORK - child inherits micro-state accounting
pc: 0x106b2f4 unix:panic+0x1c: call unix:vpanic
unix:panic+0x1c(0x1269e48, 0x1, 0x1815000, 0x1815000,0x2b, 0x0)
genunix:kadmin+0x4ac(, 0x1, 0x0,0x60010803d98)
genunix:uadmin+0x11c(,0x1)
unix:syscall_trap32+0xcc()
-- switch to user thread's user stack --
Print the process tree of the panic PID:
SolarisCAT(vmcore.7/10U)> proc tree 16038
4059 /bin/sh /etc/init.d/init.cssd fatal
6855 /bin/sh /etc/init.d/init.cssd daemon   <<< This shows that the Oracle CRS daemon issued the uadmin command, which resulted in the system panic
16038 /sbin/uadmin 51
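The check on the process tree can be sketched as a small shell helper. This is only an illustration: the function name crs_initiated_panic is hypothetical, and it assumes the "proc tree" output has been saved as text and is supplied on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of SolarisCAT or CRS).
# Reads saved "proc tree" output on stdin and reports whether a uadmin
# process appears in the same tree as an init.cssd script.
crs_initiated_panic() {
    tree=$(cat)
    if printf '%s\n' "$tree" | grep 'init\.cssd' >/dev/null &&
       printf '%s\n' "$tree" | grep '/sbin/uadmin' >/dev/null; then
        echo "uadmin was issued under init.cssd: CRS-initiated panic is likely"
        return 0
    fi
    echo "no init.cssd ancestor found for uadmin"
    return 1
}
```

For example, if the tree was saved to a file, it could be run as `crs_initiated_panic < proc_tree.txt` (the file name is an assumption). A match only indicates that CRS requested the crash; the underlying cause still has to be determined.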
Several conditions can cause this type of panic:
- The system is too busy
- The SAN is responding slowly
- The file system is not responding
Verify whether the customer has configured the OCR and VOTEDISK on a CFS file system:
# export PATH=$PATH:/apps/crshome/bin
# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262144
Used space (kbytes) : 3264
Available space (kbytes) : 258880
ID : 1962738043
Device/File Name : /ocrvote/ocrdisk
Device/File integrity check succeeded
Device/File not configured
Cluster registry integrity check succeeded
# crsctl query css votedisk
0. 0 /ocrvote/votedisk
located 1 votedisk(s).
# mount -v | grep /ocrvote
/dev/vx/dsk/ocrvotedg/ocrvotevol on /ocrvote type vxfs read/write/setuid/devices/mincache=direct/delaylog/largefiles/qio/cluster/ioerror=mdisable/crw/mntlock=VCS/dev=4f0dea8 on Wed Jun 16 14:19:25 2010
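In the mount output, the vxfs type together with the cluster option indicates a CFS mount. A minimal sketch of that check follows. The function name is_cfs_mount is hypothetical, and a POSIX awk is assumed (on Solaris this would typically be nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given mount point appears in
# "mount -v" output (read from stdin) as a vxfs file system mounted
# with the "cluster" option.
is_cfs_mount() {
    # $1 = mount point to look for, e.g. /ocrvote
    awk -v mp="$1" '
        $2 == "on" && $3 == mp && $4 == "type" && $5 == "vxfs" {
            # Mount options are separated by "/" in this output format
            n = split($6, opts, "/")
            for (i = 1; i <= n; i++)
                if (opts[i] == "cluster") found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

Possible usage: `mount -v | is_cfs_mount /ocrvote && echo "OCR/VOTEDISK is on CFS"`.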
Check the current timeout values for CRS:
# /apps/crshome/bin/crsctl get css disktimeout
# /apps/crshome/bin/crsctl get css misscount
# /apps/crshome/bin/crsctl get css reboottime
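To record all three values in one pass, for instance before and after a change, the commands above can be wrapped in a small function. This is a sketch only: the function name show_css_timeouts and the CRSCTL variable are assumptions, and the raw crsctl output is printed without parsing because its exact format may vary between releases.

```shell
#!/bin/sh
# Path to crsctl as used in this article; override CRSCTL if CRS home differs.
CRSCTL=${CRSCTL:-/apps/crshome/bin/crsctl}

# Hypothetical helper: prints each CSS timeout parameter with the raw
# output returned by "crsctl get css <parameter>".
show_css_timeouts() {
    for param in disktimeout misscount reboottime; do
        printf '%s: %s\n' "$param" "$("$CRSCTL" get css "$param" 2>&1)"
    done
}
```

Running `show_css_timeouts` on each RAC node makes it easier to confirm that the values match across the cluster.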
Based on the outcome of the crash dump analysis, advise the customer to increase the above timeout values on all RAC nodes to prevent similar panics:
# /apps/crshome/bin/crsctl set css misscount 300
# /apps/crshome/bin/crsctl set css disktimeout 300