System panics due to NULL pointer dereference in vx_real_drop_inode.

Article ID: 100013635

Description

Error Message

A review of the system vmcore file shows the following system and panic information:

KERNEL: ./usr/lib/debug/lib/modules/2.6.32-279.31.1.el6.x86_64/vmlinux
    DUMPFILE: ./vmcore  [PARTIAL DUMP]
        CPUS: 40
        DATE: Tue Jul  1 13:40:21 2014
      UPTIME: 58 days, 05:22:12
LOAD AVERAGE: 3.98, 3.57, 3.46
       TASKS: 2016
    NODENAME: server1
     RELEASE: 2.6.32-279.31.1.el6.x86_64
     VERSION: #1 SMP Sun May 26 06:54:41 EDT 2013
     MACHINE: x86_64  (2397 Mhz)
      MEMORY: 1024 GB
       PANIC: "Oops: 0000 [#1] SMP " (check log for details)             <--------------------- Panic stack
         PID: 56848
     COMMAND: "flush-199:7002"
        TASK: ffff88ff916c9540  [THREAD_INFO: ffff88ff4052c000]         <----------------- Thread information
         CPU: 24
       STATE: TASK_RUNNING (PANIC)

The backtrace for the panicking task (PID 56848) shows that the fault occurred in vx_real_drop_inode:

crash> bt ffff88ff916c9540
PID: 56848  TASK: ffff88ff916c9540  CPU: 24  COMMAND: "flush-199:7002"
#0 [ffff88ff4052d7d0] machine_kexec at ffffffff81031fcb
#1 [ffff88ff4052d830] crash_kexec at ffffffff810b8de2
#2 [ffff88ff4052d900] oops_end at ffffffff814edb30
#3 [ffff88ff4052d930] no_context at ffffffff81042a0b
#4 [ffff88ff4052d980] __bad_area_nosemaphore at ffffffff81042c95
#5 [ffff88ff4052d9d0] bad_area_nosemaphore at ffffffff81042d63
#6 [ffff88ff4052d9e0] __do_page_fault at ffffffff810434c1
#7 [ffff88ff4052db00] do_page_fault at ffffffff814efb0e
#8 [ffff88ff4052db30] page_fault at ffffffff814ecec5
    [exception RIP: vx_real_drop_inode+108]                                   <----------------------
    RIP: ffffffffa07ceabc  RSP: ffff88ff4052dbe0  RFLAGS: 00010246
    RAX: 0000000000001130  RBX: 0000000000000000  RCX: ffff882381630db8
    RDX: 0000000000001130  RSI: ffffffff81fc3240  RDI: ffffffffa088b4d8
    RBP: ffff88ff4052dbf0   R8: 6040000000000000   R9: dc60a5d0a8a3ec08
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff88ffa5183000
    R13: ffff88ff4052dd20  R14: ffff882381630d98  R15: ffff882224fff4c8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#9 [ffff88ff4052dbf8] iput at ffffffff811905e2
#10 [ffff88ff4052dc18] writeback_sb_inodes at ffffffff811a05d0
#11 [ffff88ff4052dc78] writeback_inodes_wb at ffffffff811a06fb
#12 [ffff88ff4052dcd8] wb_writeback at ffffffff811a0a9b
#13 [ffff88ff4052ddd8] wb_do_writeback at ffffffff811a0d89
#14 [ffff88ff4052de68] bdi_writeback_task at ffffffff811a0e93
#15 [ffff88ff4052deb8] bdi_start_fn at ffffffff81134466
#16 [ffff88ff4052dee8] kthread at ffffffff81090896
#17 [ffff88ff4052df48] kernel_thread at ffffffff8100c0ca
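
The "Oops: 0000" error code indicates a kernel-mode read from a page that is not present, which is consistent with a NULL pointer dereference. As a minimal sketch (not part of the crash utility; the helper name parse_registers is hypothetical), the following Python snippet parses the register lines copied from the exception frame above and lists the registers that hold zero. It only narrows the candidates: which register the faulting instruction actually used as its base can be confirmed only by disassembling vx_real_drop_inode.

```python
import re

# Register lines copied verbatim from the exception frame in the backtrace.
EXCEPTION_FRAME = """\
RIP: ffffffffa07ceabc  RSP: ffff88ff4052dbe0  RFLAGS: 00010246
RAX: 0000000000001130  RBX: 0000000000000000  RCX: ffff882381630db8
RDX: 0000000000001130  RSI: ffffffff81fc3240  RDI: ffffffffa088b4d8
RBP: ffff88ff4052dbf0   R8: 6040000000000000   R9: dc60a5d0a8a3ec08
R10: 0000000000000000  R11: 0000000000000000  R12: ffff88ffa5183000
R13: ffff88ff4052dd20  R14: ffff882381630d98  R15: ffff882224fff4c8
"""


def parse_registers(text):
    """Return {register_name: integer_value} for every 'NAME: hex' pair."""
    pairs = re.findall(r"\b([A-Z0-9_]+):\s+([0-9a-f]+)\b", text)
    return {name: int(value, 16) for name, value in pairs}


registers = parse_registers(EXCEPTION_FRAME)

# Registers holding zero are candidates for the NULL pointer that was read.
zero_registers = sorted(name for name, value in registers.items() if value == 0)

print("Zero-valued registers:", ", ".join(zero_registers))
```
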

Cause

The panic results from incomplete SELinux support in Storage Foundation 5.1SP1PR2P1 and earlier.

At these product revisions, VxFS does not contain all of the security structures that SELinux requires.

The vmcore shows SELinux as enabled:

crash> rd -d32 vx_linux_security_enabled
     ffffffffa084a624:             1

As a result, the "vx_inode_abi_prepared" variable can be left with a NULL value:

crash> rd -d vx_inode_abi_prepared
ffffffffa088bd98:                0 <<------------- NULL

When a process dereferences this NULL value, as the flush thread did in vx_real_drop_inode, the system panics.
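
The two checks above can be combined into one test when several vmcores need to be triaged. The following Python sketch is an illustration only: the helper name parse_rd_decimal is hypothetical, the input lines are copied from the crash output in this article, and the parser reads only the first value on each "rd -d" line.

```python
import re


def parse_rd_decimal(line):
    """Parse one 'address:  value' line of `rd -d` output.

    Returns (address, value). Trailing annotations are ignored.
    """
    match = re.match(r"\s*([0-9a-f]+):\s+(-?\d+)", line)
    if match is None:
        raise ValueError(f"unrecognised rd output: {line!r}")
    return int(match.group(1), 16), int(match.group(2))


# Output lines copied from the crash session in this article.
selinux_enabled = parse_rd_decimal("     ffffffffa084a624:             1")
abi_prepared = parse_rd_decimal("ffffffffa088bd98:                0 <<------------- NULL")

# The condition described in this article: security support is switched on
# while vx_inode_abi_prepared still holds NULL.
matches_article = selinux_enabled[1] != 0 and abi_prepared[1] == 0

print("Matches the condition described in this article:", matches_article)
```

A match only indicates that the vmcore shows the same two values as this article; the backtrace should still be compared before the same cause is assumed.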

Resolution

Update Storage Foundation to 5.1SP1RP4 to prevent recurrence of the panic.

The fix is included in 5.1SP1PR2RP4 for RHEL6 and is available here:

https://docs.infoscale.com

Issue/Introduction

The system panics frequently because of newly introduced security code in the Linux operating system.