Random odm_tsk_hold_add panics encountered on InfoScale 7.4.1.1100/sles12sp4 (4.12.14-95.29 kernel) systems with Oracle 13c databases.


Article ID: 100048077


Description

Error Message

 

Cause

The panics were caused by a pid leak in the odm driver. Such a leak can lead to exhaustion of the pid space, reuse of pids, and negative refcounts on pid objects.

Resolution

The odm code has been modified to make sure that allocated pids are freed up when they are no longer in use.

The fix is included in the odm 7.4.1.1700 patch. This patch requires the VxFS 7.4.1.1700 patch to be installed as well, so that the updated odm driver is recognized by VxFS.


Once both patches have been installed, the output of "cat /proc/slabinfo | grep pid" can be monitored for both the active and the total number of pid objects. If a pid leak is still present, these values will keep increasing over time. Note that pid slab usage is system-wide, so other modules or applications can also increase it.
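
The slab counters described above can also be sampled with a small script instead of by hand. The following is a minimal sketch, not part of the patch or of any InfoScale tooling: the function names parse_pid_slabs, read_pid_slabs and monitor are hypothetical, and the code assumes the standard /proc/slabinfo 2.x layout in which the first three columns are the cache name, active_objs and num_objs. Like the grep above, it matches every cache whose name contains "pid" (for example pid and pid_namespace). Reading /proc/slabinfo normally requires root.

```python
import time
from typing import Dict, Tuple


def parse_pid_slabs(slabinfo_text: str) -> Dict[str, Tuple[int, int]]:
    """Return {cache_name: (active_objs, num_objs)} for caches whose name contains 'pid'."""
    result: Dict[str, Tuple[int, int]] = {}
    for line in slabinfo_text.splitlines():
        # Skip the version banner and the '# name <active_objs> ...' header line.
        if line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        name = fields[0]
        if "pid" in name:
            # Column 2 is active_objs, column 3 is num_objs (total).
            result[name] = (int(fields[1]), int(fields[2]))
    return result


def read_pid_slabs(path: str = "/proc/slabinfo") -> Dict[str, Tuple[int, int]]:
    """Read the slabinfo file (usually root-only) and extract the pid caches."""
    with open(path) as f:
        return parse_pid_slabs(f.read())


def monitor(interval_seconds: int = 60, samples: int = 10) -> None:
    """Print a timestamped sample of the pid caches every interval_seconds."""
    for _ in range(samples):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for name, (active, total) in sorted(read_pid_slabs().items()):
            print(f"{stamp} {name}: active={active} total={total}")
        time.sleep(interval_seconds)
```

Calling monitor() as root and comparing samples taken hours apart gives the same information as repeated grep runs. A steady climb in the pid cache under a stable workload would be worth investigating, bearing in mind that the counters are system-wide and not specific to odm.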

 

Issue/Introduction

Random odm_tsk_hold_add panics encountered on InfoScale 7.4.1.1100/sles12sp4 (4.12.14-95.29 kernel) systems with Oracle 13c databases and different i/o workloads.
In each occurrence the following panic stack was observed:

crash> bt
PID: 30498 TASK: ffff923aa7ab50c0 CPU: 10 COMMAND: "ora_lg00_dbkro5"
#0 [ffffb975a3dcfb08] machine_kexec at ffffffff8605e902
#1 [ffffb975a3dcfb58] __crash_kexec at ffffffff861211ba
#2 [ffffb975a3dcfc18] crash_kexec at ffffffff861221a9
#3 [ffffb975a3dcfc30] oops_end at ffffffff8602e091
#4 [ffffb975a3dcfc50] no_context at ffffffff8606dcfb
#5 [ffffb975a3dcfca0] __do_page_fault at ffffffff8606e1dc
#6 [ffffb975a3dcfd08] do_page_fault at ffffffff8606e63b
#7 [ffffb975a3dcfd30] page_fault at ffffffff868016f5
[exception RIP: __task_pid_nr_ns+10]
RIP: ffffffff860a742a RSP: ffffb975a3dcfde8 RFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffff8705d540 RCX: 00007ffd05f7fa50
RDX: ffffffff8705d540 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffb975a3dcfe20 R8: 0000000000000007 R9: 0000000011cc2200
R10: 0000000000000000 R11: 0000000000000000 R12: ffff923d74f7ec80
R13: ffff923db360ecd0 R14: ffff923ae2821f00 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffffb975a3dcfde8] odm_tsk_hold_add at ffffffffc0db5962 [vxodm]
#9 [ffffb975a3dcfe08] odm_ioctl_ctl at ffffffffc0da3f9f [vxodm]
#10 [ffffb975a3dcfe80] odm_ioctl_ctl_unlocked at ffffffffc0db367d [vxodm]
#11 [ffffb975a3dcfe88] do_vfs_ioctl at ffffffff86260ca2
#12 [ffffb975a3dcfef8] sys_ioctl at ffffffff86261264
#13 [ffffb975a3dcff30] do_syscall_64 at ffffffff86003934
#14 [ffffb975a3dcff50] entry_SYSCALL_64_after_hwframe at ffffffff8680009a
RIP: 00007f108b7dd407 RSP: 00007ffd05f7fa18 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007ffd05f7fa50 RCX: 00007f108b7dd407
RDX: 00007ffd05f7fa50 RSI: 0000000056584f1c RDI: 0000000000000007
RBP: 0000000015173630 R8: 0000000015173650 R9: 0000000011cc2200
R10: 0000000015173710 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00000000151735f0 R15: 0000000015173650
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b

Additional Information

ETrack: 3987866 JIRA: STESC-4561