Veritas File System (VxFS) results in spin_lock_irqsave kernel panic (Hard LOCKUP) through vx_rddirahead_prepare path

Article ID: 100046162

Description

Error Message

# crash --osrelease vmcore_#####-190816102041
2.6.32-754.12.1.el6.x86_64

Crash Analysis:
 

      KERNEL: vmlinux.2.6.32-754.12.1.el6.x86_64
    DUMPFILE: vmcore_#####-190816102041.  [PARTIAL DUMP]
        CPUS: 48
        DATE: Fri Aug 16 12:06:48 2019
      UPTIME: 6 days, 00:18:13
LOAD AVERAGE: 12.60, 7.22, 6.33
       TASKS: 7985
    NODENAME: #####
     RELEASE: 2.6.32-754.12.1.el6.x86_64
     VERSION: #1 SMP Thu Mar 7 22:07:44 EST 2019
     MACHINE: x86_64  (2199 Mhz)
      MEMORY: 383.9 GB
       PANIC: "Kernel panic - not syncing: Hard LOCKUP"
         PID: 188307
     COMMAND: "find"
        TASK: ffff8832485db520  [THREAD_INFO: ffff883a57158000]
         CPU: 28
       STATE: TASK_RUNNING (PANIC)

crash>


crash> bt
PID: 188307  TASK: ffff8832485db520  CPU: 28  COMMAND: "find"
 #0 [ffff88016eb89a58] machine_kexec at ffffffff81040f1b
 #1 [ffff88016eb89ab8] crash_kexec at ffffffff810d6702
 #2 [ffff88016eb89b88] panic at ffffffff81558791
 #3 [ffff88016eb89c28] __perf_event_overflow at ffffffff81130afa
 #4 [ffff88016eb89ca8] perf_event_overflow at ffffffff81131154
 #5 [ffff88016eb89cb8] intel_pmu_handle_irq at ffffffff81025c8c
 #6 [ffff88016eb89e90] perf_event_nmi_handler at ffffffff8155e83f
 #7 [ffff88016eb89ea0] notifier_call_chain at ffffffff81560350
 #8 [ffff88016eb89ee0] atomic_notifier_call_chain at ffffffff815603ba
 #9 [ffff88016eb89ef0] notify_die at ffffffff810b12ce
#10 [ffff88016eb89f20] do_nmi at ffffffff8155dea9
#11 [ffff88016eb89f50] nmi at ffffffff8155d781
    [exception RIP: _spin_lock_irqsave+50]           <<<< ---
#12 [ffff883a5715bc08] _spin_lock_irqsave at ffffffff8155c3c2
#13 [ffff883a5715bc10] vx_rddirahead_prepare at ffffffffa093b9a0 [vxfs]  <--- VxFS
#14 [ffff883a5715bc50] vx_readdir_int at ffffffffa09d1c6b [vxfs]
#15 [ffff883a5715bdc0] vx_do_readdir at ffffffffa09d44b6 [vxfs]
#16 [ffff883a5715be60] vx_readdir at ffffffffa09d4896 [vxfs]
#17 [ffff883a5715beb0] vfs_readdir at ffffffff811b8796
#18 [ffff883a5715bf00] sys_getdents at ffffffff811b8929
#19 [ffff883a5715bf50] system_call_fastpath at ffffffff81564357
    RIP: 000000300a4a90a5  RSP: 00007ffdf4252718  RFLAGS: 00010202
    RAX: 000000000000004e  RBX: 0000000001d82290  RCX: ffffffff8156427e
    RDX: 0000000000008000  RSI: 0000000001d822c0  RDI: 0000000000000005
    RBP: ffffffffffffff00   R8: 0000000001d822c0   R9: 0000000000000001
    R10: 0000000000000018  R11: 0000000000000246  R12: 0000000000000016
    R13: 0000000001d822c0  R14: 0000000001d7f4f0  R15: 00000000000010d9
    ORIG_RAX: 000000000000004e  CS: 0033  SS: 002b
crash>

crash> dis -lr _spin_lock_irqsave+50 |tail -20
0xffffffff8155c391 <_spin_lock_irqsave+1>:      mov    %rsp,%rbp
0xffffffff8155c394 <_spin_lock_irqsave+4>:      nopl   0x0(%rax,%rax,1)
/usr/src/debug/kernel-2.6.32-754.12.1.el6/linux-2.6.32-754.12.1.el6.x86_64/arch/x86/include/asm/paravirt.h: 872
0xffffffff8155c399 <_spin_lock_irqsave+9>:      pushfq
0xffffffff8155c39a <_spin_lock_irqsave+10>:     pop    %rax
0xffffffff8155c39b <_spin_lock_irqsave+11>:     nopl   0x0(%rax,%rax,1)
0xffffffff8155c3a0 <_spin_lock_irqsave+16>:     mov    %rax,%rdx
/usr/src/debug/kernel-2.6.32-754.12.1.el6/linux-2.6.32-754.12.1.el6.x86_64/arch/x86/include/asm/paravirt.h: 882
0xffffffff8155c3a3 <_spin_lock_irqsave+19>:     cli
0xffffffff8155c3a4 <_spin_lock_irqsave+20>:     nopw   0x0(%rax,%rax,1)
/usr/src/debug/kernel-2.6.32-754.12.1.el6/linux-2.6.32-754.12.1.el6.x86_64/arch/x86/include/asm/spinlock.h: 127
0xffffffff8155c3aa <_spin_lock_irqsave+26>:     mov    $0x10000,%eax
0xffffffff8155c3af <_spin_lock_irqsave+31>:     lock xadd %eax,(%rdi)
0xffffffff8155c3b3 <_spin_lock_irqsave+35>:     movzwl %ax,%ecx
0xffffffff8155c3b6 <_spin_lock_irqsave+38>:     shr    $0x10,%eax
0xffffffff8155c3b9 <_spin_lock_irqsave+41>:     cmp    %eax,%ecx
0xffffffff8155c3bb <_spin_lock_irqsave+43>:     je     0xffffffff8155c3cb <_spin_lock_irqsave+59>
0xffffffff8155c3bd <_spin_lock_irqsave+45>:     pause
0xffffffff8155c3bf <_spin_lock_irqsave+47>:     movzwl (%rdi),%ecx
0xffffffff8155c3c2 <_spin_lock_irqsave+50>:     jmp    0xffffffff8155c3b9 <_spin_lock_irqsave+41>
crash>
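
The spin loop at +41 through +50 has the shape of a ticket spinlock: `lock xadd` takes a ticket from the upper 16 bits of the lock word, and the CPU then spins until the lower 16 bits match that ticket. The following is a minimal Python sketch of that arithmetic, assuming the 16-bit head / 16-bit tail layout implied by the `$0x10000` increment and the `movzwl` / `shr $0x10` pair. The function names are illustrative and are not kernel or VxFS symbols.

```python
TICKET_INC = 0x10000  # value added by `lock xadd %eax,(%rdi)` in the disassembly


def split_lock_word(word):
    """Split a 32-bit lock word into (head, tail).

    head = low 16 bits  (ticket currently being served)
    tail = high 16 bits (next ticket to hand out)
    """
    word &= 0xFFFFFFFF
    return word & 0xFFFF, word >> 16


def take_ticket(word):
    """Model the xadd: return (my_ticket, head_seen, new_lock_word)."""
    head, tail = split_lock_word(word)
    new_word = (word + TICKET_INC) & 0xFFFFFFFF
    return tail, head, new_word


def must_spin(my_ticket, head):
    """Model `cmp %eax,%ecx; je`: the CPU keeps spinning while head != ticket."""
    return head != my_ticket


# Free lock: the first taker gets ticket 0 and enters immediately.
ticket, head, word = take_ticket(0x00000000)
print(ticket, head, hex(word), must_spin(ticket, head))  # 0 0 0x10000 False

# Held lock (head=2, tail=3): the next taker gets ticket 3 and has to spin.
ticket, head, word = take_ticket(0x00030002)
print(ticket, head, hex(word), must_spin(ticket, head))  # 3 2 0x40002 True
```

In a vmcore, the same split can be applied to the 32-bit value read from the lock address to see how far the waiter's ticket is from the head.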

 

NOTE: Only CPU 28 is lagging (35.95 seconds behind the other CPUs):
 

crash> runq -T
  CPU 47: 0.00 secs
  CPU 46: 0.00 secs
  CPU 45: 0.00 secs
  CPU 44: 0.00 secs
  CPU 43: 0.00 secs
  CPU 42: 0.00 secs
  CPU 41: 0.00 secs
  CPU 40: 0.00 secs
  CPU 39: 0.00 secs
  CPU 38: 0.00 secs
  CPU 37: 0.00 secs
  CPU 36: 0.00 secs
  CPU 35: 0.00 secs
  CPU 34: 0.00 secs
  CPU 33: 0.00 secs
  CPU 32: 0.00 secs
  CPU 31: 0.00 secs
  CPU 30: 0.00 secs
  CPU 29: 0.00 secs
  CPU 27: 0.00 secs
  CPU 25: 0.00 secs
  CPU 22: 0.00 secs
  CPU 19: 0.00 secs
  CPU 21: 0.00 secs
  CPU 23: 0.00 secs
  CPU 20: 0.00 secs
  CPU 18: 0.00 secs
  CPU 17: 0.00 secs
  CPU 26: 0.00 secs
  CPU 24: 0.00 secs
  CPU 16: 0.00 secs
  CPU 14: 0.00 secs
  CPU 15: 0.00 secs
  CPU 12: 0.00 secs
   CPU 8: 0.00 secs
   CPU 9: 0.00 secs
  CPU 11: 0.00 secs
  CPU 10: 0.00 secs
   CPU 1: 0.00 secs
  CPU 13: 0.00 secs
   CPU 6: 0.00 secs
   CPU 7: 0.00 secs
   CPU 5: 0.00 secs
   CPU 2: 0.00 secs
   CPU 0: 0.00 secs
   CPU 3: 0.00 secs
   CPU 4: 0.00 secs
  CPU 28: 35.95 secs
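
With 48 CPUs the lagging entry is easy to miss. The following is a small Python sketch that filters saved `runq -T` output for CPUs above a lag threshold. The function name and the 1-second default threshold are assumptions chosen for illustration and are not part of the crash utility.

```python
import re


def lagging_cpus(runq_output, threshold_secs=1.0):
    """Return [(cpu, lag_secs)] for CPUs lagging more than threshold_secs,
    largest lag first. Expects lines such as '  CPU 28: 35.95 secs'."""
    pattern = re.compile(r"CPU\s+(\d+):\s+([\d.]+)\s+secs")
    lags = [(int(cpu), float(secs)) for cpu, secs in pattern.findall(runq_output)]
    over = [(cpu, secs) for cpu, secs in lags if secs > threshold_secs]
    return sorted(over, key=lambda item: -item[1])


# A few lines copied from the output above.
sample = """\
  CPU 29: 0.00 secs
  CPU 27: 0.00 secs
   CPU 4: 0.00 secs
  CPU 28: 35.95 secs
"""

print(lagging_cpus(sample))  # [(28, 35.95)]
```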

 

Cause

InfoScale 7.3.x and 7.4.0 have a known issue with spinlocks in VxFS. A spinlock is incorrectly re-initialized in the vx_rddirahead_prepare code path, which causes a deadlock on the system.
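
To illustrate why re-initializing a spinlock that is still in use can leave a CPU spinning, the following Python sketch models a ticket lock like the one in the `_spin_lock_irqsave` disassembly and replays one possible interleaving. This article does not describe the exact sequence inside VxFS, so the actors A, B and C, the class, and the ordering of events are all hypothetical.

```python
class TicketLock:
    """Toy single-threaded model of a 16-bit ticket spinlock.

    head: ticket currently being served
    tail: next ticket to hand out
    """

    def __init__(self):
        self.head = 0
        self.tail = 0

    def reinit(self):
        # Models a lock-init call on a lock that is still in use.
        self.head = 0
        self.tail = 0

    def take_ticket(self):
        ticket = self.tail
        self.tail = (self.tail + 1) & 0xFFFF
        return ticket

    def can_enter(self, ticket):
        return self.head == ticket

    def unlock(self):
        self.head = (self.head + 1) & 0xFFFF


def simulate_reinit_while_held():
    lock = TicketLock()
    a = lock.take_ticket()         # A takes ticket 0
    assert lock.can_enter(a)       # A now holds the lock
    lock.reinit()                  # lock is reset while A still holds it
    b = lock.take_ticket()         # B is handed ticket 0 again
    b_entered = lock.can_enter(b)  # True: A and B both "hold" the lock
    lock.unlock()                  # A releases: head becomes 1
    lock.unlock()                  # B releases: head becomes 2
    c = lock.take_ticket()         # C takes ticket 1, but head is already 2
    return b_entered, c, lock.head, lock.can_enter(c)


print(simulate_reinit_while_held())  # (True, 1, 2, False)
```

In this model C's ticket has already been skipped, so C would spin until the 16-bit head counter wraps around, while later callers that draw tickets at or beyond the head can still pass through. With interrupts disabled, as in `_spin_lock_irqsave`, a CPU stuck in such a loop is the kind of condition the NMI watchdog reports as a Hard LOCKUP.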

Cross-platform fixes have been created for RHEL 6, RHEL 7, AIX and Solaris 11.
 

Resolution

This is fixed in the 7.4.1 GA release.

In this instance, the environment is running Red Hat Enterprise Linux 6.10 with VRTSvxfs-7.4.0.1100-RHEL6.x86_64 and VRTSodm-7.4.0.1100-RHEL6.x86_64, so an ODM hot-fix is required in addition to the VxFS hot-fix.

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.10 (Santiago)


In addition to the RHEL6 VxFS hot-fix (fs-rhel6_x86_64-HotFix-7.4.0.1402), Veritas engineering recommends applying the Veritas ODM hot-fix (odm-rhel6_x86_64-HotFix-7.4.0.1401).
 

Patch ID: 7.4.0.1401
* 3972851 (3972852) After installing the VRTSvxfs-7.4.0.1402 hot-fix on the RHEL6 platform, ODM (VRTSodm-7.4.0.1100) fails to start.
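
For the RHEL6 7.4.0 case described here, whether an installed VRTSvxfs package is below the hot-fix level can be checked by comparing version fields. The following Python sketch does that. It assumes package names follow the `VRTSvxfs-7.4.0.1100-RHEL6.x86_64` pattern shown above and that the fourth version field can be compared numerically. The function names are hypothetical, and other platforms or release lines have different fix levels, so the sketch does not cover them.

```python
import re

# Hot-fix level taken from fs-rhel6_x86_64-HotFix-7.4.0.1402 (RHEL6, 7.4.0 line only).
FIXED_VXFS_RHEL6_740 = (7, 4, 0, 1402)


def package_version(pkg):
    """Extract the dotted version from a name like 'VRTSvxfs-7.4.0.1100-RHEL6.x86_64'."""
    match = re.search(r"-(\d+(?:\.\d+)+)-", pkg)
    if match is None:
        raise ValueError(f"no version found in {pkg!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def needs_vxfs_hotfix(pkg):
    """True if pkg is a 7.4.0.x build older than the 7.4.0.1402 hot-fix level."""
    version = package_version(pkg)
    return version[:3] == (7, 4, 0) and version < FIXED_VXFS_RHEL6_740


print(needs_vxfs_hotfix("VRTSvxfs-7.4.0.1100-RHEL6.x86_64"))  # True
print(needs_vxfs_hotfix("VRTSvxfs-7.4.0.1402-RHEL6.x86_64"))  # False
```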


Contact Veritas Technical Support to obtain the required hot-fixes.
 

Patch ID                                               RHEL6   RHEL7   AIX 7.2   SOL 11
infoscale-rhel6_x86_64-Patch-7.3.1.100                   X
infoscale-rhel7_x86_64-Patch-7.3.1.200                           X
infoscale-rhel7_x86_64-Patch-7.3.1.300                           X
fs-rhel6_x86_64-HotFix-7.4.0.1402 (requires ODM fix)     X
infoscale-rhel7_x86_64-Patch-7.4.0.1200                          X
fs-rhel7_x86_64-Patch-7.4.0.1400                                 X
infoscale-aix72-Patch-7.3.1.100                                            X
infoscale-sol11_sparc-Patch-7.3.1.200                                                X
infoscale-sol11_sparc-Patch-7.4.0.1200                                               X
fs-sol11_sparc-Patch-7.4.0.1500                                                      X
infoscale-sol11.4_sparc-Patch-7.4.0.1200                                             X


 

Issue/Introduction

Veritas File System (VxFS) in InfoScale 7.3.x and 7.4.0 may trigger a spin_lock_irqsave kernel panic (Hard LOCKUP) through the vx_rddirahead_prepare path.