How to determine whether an SFRAC node panicked due to a CRS timeout

Article ID: 100002410

Resolution

Obtain the crash dump from the customer's system and verify the panic string and the panicking thread:

SolarisCAT(vmcore.7/10U)> panic
panic on cpu1
panic string: forced crash dump initiated at user request
====panic user (LWP_SYS) thread: 0x300056dc340 PID: 16038 on CPU:1 ====   <<< Note the PID
cmd: /sbin/uadmin 5 1   <<< Note the command
t_procp:0x30003dc5120
p_as: 0x300059873f8 size: 2621440 rss:1474560
hat: 0x30008299880 cnum: 0x0 cpusran:1
zone: global
t_stk: 0x2a100bdbae0 sp:0x2a100bdb0b1 t_stkbase: 0x2a100bd6000
t_pri: 59(TS) pctcpu:0.037107
t_lwp: 0x60012438098 machpcb: 0x2a100bdbae0
mstate:LMS_SYSTEM ms_prev: LMS_USER
ms_state_start: 0.0000116 seconds earlier
ms_start: 0.2235608 seconds earlier
psrset: 0 lastCPU: 1
idle: 0 ticks (0 seconds)
start: Wed Jun 16 07:06:51 2010
age: 0 seconds (0 seconds)
syscall: #55 uadmin(, 0xffbffce8) (sysent:genunix:uadmin+0x0)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_PANIC - thread initiated a system panic
      T_DFLTSTK - stack is default size
tpflg: TP_TWAIT - wait to be freed by lwp_wait
       TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SMSACCT - process is keeping micro-state accounting
       SMSFORK - child inherits micro-state accounting

pc: 0x106b2f4 unix:panic+0x1c: call unix:vpanic

unix:panic+0x1c(0x1269e48, 0x1, 0x1815000, 0x1815000, 0x2b, 0x0)
genunix:kadmin+0x4ac(, 0x1, 0x0, 0x60010803d98)
genunix:uadmin+0x11c(, 0x1)
unix:syscall_trap32+0xcc()
-- switch to user thread's user stack --
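
The panic string, PID and command above can also be pulled out of a saved copy of this output. The following is a minimal sketch for a Bourne-compatible shell, assuming the output of the SolarisCAT panic command has been pasted into a plain text file; the function name panic_summary and the file name panic.txt are hypothetical. A panic string of "forced crash dump initiated at user request" means the dump was requested from user level, so the next step is to find which process requested it.

```shell
# Hypothetical helper: summarize a saved copy of the SolarisCAT "panic" output.
# Assumption: the output was pasted into a plain text file.
# Backticks are used instead of $( ) so it also runs under a Bourne /bin/sh.

panic_summary() {
    # $1: file containing the saved "panic" output
    # Panic string, e.g. "forced crash dump initiated at user request"
    panic_str=`sed -n 's/^panic string: *//p' "$1"`
    # PID taken from the "====panic ... PID: <n> on CPU..." header line
    panic_pid=`sed -n 's/^====panic.*PID: *\([0-9][0-9]*\).*/\1/p' "$1"`
    # Command line of the panicking process
    panic_cmd=`sed -n 's/^cmd: *//p' "$1"`

    echo "panic string: $panic_str"
    echo "pid: $panic_pid"
    echo "cmd: $panic_cmd"

    # A user-requested dump means some process asked for it;
    # the process tree of the PID shows which one.
    case "$panic_str" in
        "forced crash dump initiated at user request")
            echo "user-requested dump: yes" ;;
        *)
            echo "user-requested dump: no" ;;
    esac
}
```

For example: panic_summary panic.txt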

Print the process tree of the panicking PID:

SolarisCAT(vmcore.7/10U)> proc tree 16038
4059 /bin/sh /etc/init.d/init.cssd fatal
6855 /bin/sh /etc/init.d/init.cssd daemon   <<< This shows that the Oracle CRS daemon (init.cssd) issued the uadmin command that resulted in the system panic
16038 /sbin/uadmin 5 1
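
The same conclusion can be reached mechanically from a saved copy of the proc tree output. A minimal sketch, assuming the listing was saved to a plain text file; the function name cssd_initiated is hypothetical, and it only checks that both an init.cssd process and /sbin/uadmin appear in the listing:

```shell
# Hypothetical helper: decide from a saved "proc tree <pid>" listing whether
# the Oracle CSS init script (init.cssd) appears above the uadmin process.
# Assumption: the listing was saved to a plain text file.

cssd_initiated() {
    # $1: file containing the saved "proc tree" output
    # grep output is discarded; only its exit status is used.
    if grep '/etc/init\.d/init\.cssd' "$1" >/dev/null &&
       grep '/sbin/uadmin' "$1" >/dev/null; then
        echo "panic requested by CSS: yes"
        # Show the matching init.cssd lines (e.g. "init.cssd fatal")
        grep 'init\.cssd' "$1"
        return 0
    fi
    echo "panic requested by CSS: no"
    return 1
}
```

For example: cssd_initiated proctree.txt (the file name is hypothetical).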

Several conditions can make CRS exceed its timeouts and trigger this type of panic, for example:

- The system is too busy

- Slow SAN response

- The file system is not responding

Verify whether the customer has configured the OCR and voting disk on a CFS file system:

# export PATH=$PATH:/apps/crshome/bin
# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262144
         Used space (kbytes)      :       3264
         Available space (kbytes) :     258880
         ID                       : 1962738043
         Device/File Name         : /ocrvote/ocrdisk
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

# crsctl query css votedisk
0. 0 /ocrvote/votedisk

located 1 votedisk(s).
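
The OCR and voting disk locations can also be extracted from this output by script. A minimal sketch, assuming output laid out like the examples above (other CRS versions print different layouts); the function names ocr_paths and votedisk_paths are hypothetical:

```shell
# Hypothetical helpers: list OCR and voting disk locations from command output.
# Assumption: the layout matches the ocrcheck and
# "crsctl query css votedisk" examples in this article.

ocr_paths() {
    # stdin: ocrcheck output; prints each "Device/File Name" value
    sed -n 's/^ *Device\/File Name *: *//p'
}

votedisk_paths() {
    # stdin: "crsctl query css votedisk" output; lines look like
    # "0.     0    /ocrvote/votedisk", so strip the two leading numbers
    sed -n 's/^ *[0-9][0-9]*\. *[0-9][0-9]* *//p'
}
```

For example: ocrcheck | ocr_paths, and crsctl query css votedisk | votedisk_paths.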

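# mount -v | grep /ocrvote
/dev/vx/dsk/ocrvotedg/ocrvotevol on /ocrvote type vxfs read/write/setuid/devices/mincache=direct/delaylog/largefiles/qio/cluster/ioerror=mdisable/crw/mntlock=VCS/dev=4f0dea8 on Wed Jun 16 14:19:25 2010

The "cluster" mount option shows that /ocrvote, which holds both the OCR and the voting disk, is a CFS (cluster-mounted VxFS) file system.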
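
The mount check can be scripted as well. A minimal sketch, assuming the Solaris mount -v line layout shown above; the function name is_cfs_mount is hypothetical. It succeeds only when the given mount point is VxFS with cluster among its mount options:

```shell
# Hypothetical helper: check whether a mount point is CFS, i.e. VxFS mounted
# with the "cluster" option. Assumed "mount -v" line layout:
#   <special> on <mount point> type <fstype> <opt/opt/...> on <date>

is_cfs_mount() {
    # $1: mount point (e.g. /ocrvote); stdin: "mount -v" output
    # Returns 0 if $1 is vxfs with "cluster" among its options, 1 otherwise.
    while read special on_kw mnt type_kw fstype opts rest; do
        [ "$mnt" = "$1" ] || continue
        [ "$fstype" = vxfs ] || return 1
        # Wrap options in slashes so "cluster" only matches as a whole option
        case "/$opts/" in
            */cluster/*) return 0 ;;
        esac
        return 1
    done
    return 1
}
```

For example: mount -v | is_cfs_mount /ocrvote && echo "/ocrvote is on CFS"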

Check the current timeout values for CRS:

# /apps/crshome/bin/crsctl get css disktimeout
# /apps/crshome/bin/crsctl get css misscount
# /apps/crshome/bin/crsctl get css reboottime
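
To record all three values in one pass, a minimal sketch such as the following can be used. It assumes the CRS home /apps/crshome used in this article (set CRSCTL to another path otherwise); the function name show_css_timeouts is hypothetical, and the values are printed exactly as crsctl returns them:

```shell
# Hypothetical helper: print the three CSS timeout values, one label each.
# CRSCTL defaults to the path used in this article; override it if the
# CRS home is elsewhere.
CRSCTL=${CRSCTL:-/apps/crshome/bin/crsctl}

show_css_timeouts() {
    for param in disktimeout misscount reboottime; do
        # "crsctl get css" output text is printed unchanged after the label
        printf '%s: ' "$param"
        "$CRSCTL" get css "$param"
    done
}
```

For example: show_css_timeouts > css_timeouts.txt (the file name is hypothetical).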

Based on the outcome of the crash dump analysis, advise the customer to increase the above timeout values on all RAC nodes to prevent similar panics. For example, to set misscount and disktimeout to 300 seconds:

# /apps/crshome/bin/crsctl set css misscount 300
# /apps/crshome/bin/crsctl set css disktimeout 300
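
Each change can be followed by a read-back to confirm that it was accepted. A minimal sketch, assuming the same CRS home as above; the function name set_css_timeout is hypothetical, and the read-back check only looks for the new value as a separate word in the crsctl get css output, since that output is not assumed to have a fixed format:

```shell
# Hypothetical helper: set one CSS timeout, then read it back to confirm.
# CRSCTL defaults to the path used in this article (an assumption about
# the CRS home).
CRSCTL=${CRSCTL:-/apps/crshome/bin/crsctl}

set_css_timeout() {
    # $1: parameter name (e.g. misscount, disktimeout); $2: new value in seconds
    "$CRSCTL" set css "$1" "$2" || return 1
    current=`"$CRSCTL" get css "$1"`
    # Accept the read-back if the value appears as a separate word
    case " $current " in
        *" $2 "*) echo "$1 is now $2"; return 0 ;;
    esac
    echo "$1 read back as: $current" >&2
    return 1
}
```

For example: set_css_timeout misscount 300 && set_css_timeout disktimeout 300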

Issue/Introduction

How to determine whether an SFRAC node panicked due to a CRS timeout
