The panic can occur on any running thread. The system typically crashes with "BUG: unable to handle kernel paging request" or "general protection fault: 0000 [#1] SMP", and the faulting stack may involve the kmem_cache family of functions.
Panic stacks such as the following may be observed across different drivers, which suggests that memory/slab corruption has occurred.
For example:
PID: 21534 TASK: ff1e379a42020000 CPU: 6 COMMAND: "sh"
#0 [ff4ed83bf6503980] machine_kexec at ffffffffa8e6c1f3
#1 [ff4ed83bf65039d8] __crash_kexec at ffffffffa8fb59aa
#2 [ff4ed83bf6503a98] crash_kexec at ffffffffa8fb68e1
#3 [ff4ed83bf6503ab0] oops_end at ffffffffa8e2a9c1
#4 [ff4ed83bf6503ad0] do_general_protection at ffffffffa8e274a5
#5 [ff4ed83bf6503b60] general_protection at ffffffffa9a0113e
[exception RIP: kmem_cache_alloc+218]
RIP: ffffffffa9129eba RSP: ff4ed83bf6503c18 RFLAGS: 00010286
RAX: 967b3762439c35ea RBX: 00000000006000c0 RCX: 967b3762439c365a
RDX: 000000000002072b RSI: 00000000006000c0 RDI: 0000000000039bf0
RBP: ff1e375980b964c0 R8: ff1e3798002f9bf0 R9: 0000000000000000
R10: ff1e3762431500e8 R11: 0000000000000000 R12: 00000000006000c0
R13: ffffffffa8ef366a R14: ff1e379a3cd065a0 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ff4ed83bf6503c58] vm_area_dup at ffffffffa8ef366a
#7 [ff4ed83bf6503c68] __split_vma at ffffffffa90e5e19
#8 [ff4ed83bf6503c98] __do_munmap at ffffffffa90e609f
#9 [ff4ed83bf6503cf0] __vm_munmap at ffffffffa90e64e8
Another example:
PID: 20577 TASK: ff44bde40e838000 CPU: 4 COMMAND: "mh_driver.pl"
#0 [ff66a1a7b710b860] machine_kexec at ffffffff83c6c1f3
#1 [ff66a1a7b710b8b8] __crash_kexec at ffffffff83db59aa
#2 [ff66a1a7b710b978] crash_kexec at ffffffff83db68e1
#3 [ff66a1a7b710b990] oops_end at ffffffff83c2a9c1
#4 [ff66a1a7b710b9b0] no_context at ffffffff83c7e913
#5 [ff66a1a7b710ba08] __bad_area_nosemaphore at ffffffff83c7ec8c
#6 [ff66a1a7b710ba50] do_page_fault at ffffffff83c7f8a7
#7 [ff66a1a7b710ba80] page_fault at ffffffff8480116e
[exception RIP: unmap_page_range+2246]
RIP: ffffffff83edba86 RSP: ff66a1a7b710bb30 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ffbaa2a05dddfbc0 RCX: 0000000000000000
RDX: ff44be2711102000 RSI: 0000000005000000 RDI: ff44be1b34e61e00
RBP: ff44be2711102000 R8: ffa33a6676766f88 R9: ff44be563ffd2000
R10: 0000000000000000 R11: ffffffffffffffff R12: 0000000005000000
R13: ff66a1a7b710bc78 R14: 0000000000000000 R15: 0000000005001000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ff66a1a7b710bc10] unmap_vmas at ffffffff83edc420
#9 [ff66a1a7b710bc70] exit_mmap at ffffffff83ee665d
During the installation of an InfoScale update patch, the /etc/sysconfig/llt file was restored on the first node (CVM master) but not on the second node (CVM slave). This resulted in inconsistent LLT/RDMA tunable settings between the nodes.
Compare the contents of the /etc/sysconfig/llt file on the nodes and make sure the same tunable settings are in place.
For example, if the following two lines are present in the /etc/sysconfig/llt file on the CVM master node but missing on the CVM slave node, add them to the file on the CVM slave node, either by editing the file or by copying it from the CVM master node:
LLT_MAXADVBUFS=4000
LLT_ADVBUF_SIZE=8192
Once this inconsistency is corrected, the CVM slave node should join the cluster successfully.
Please also check that the files below are consistent between the CVM master and CVM slave nodes:
/etc/sysconfig/gab
/etc/sysconfig/vxfen
/etc/sysconfig/vcs
/etc/sysconfig/amf
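The comparison described above can be scripted. The following is a minimal sketch, not an Arctera-supplied tool. It assumes that each file consists of simple NAME=VALUE assignments with optional # comments, and that the files from each node have already been copied into two local directories. The directory arguments and the function names parse_tunables and diff_tunables are hypothetical.

```python
import sys
from pathlib import Path

# File names taken from the list above, plus llt.
FILES = ("llt", "gab", "vxfen", "vcs", "amf")


def parse_tunables(text):
    """Return {NAME: VALUE} for each NAME=VALUE line, skipping blanks and # comments."""
    tunables = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        name, value = line.split("=", 1)
        # Assumption: surrounding quotes are not significant for the comparison.
        tunables[name.strip()] = value.strip().strip("\"'")
    return tunables


def diff_tunables(master, slave):
    """List (name, master_value, slave_value) for each mismatch; None means missing."""
    names = sorted(set(master) | set(slave))
    return [(n, master.get(n), slave.get(n))
            for n in names if master.get(n) != slave.get(n)]


def main(master_dir, slave_dir):
    # Both directories are assumed to hold copies of /etc/sysconfig from each node.
    for name in FILES:
        m_path, s_path = Path(master_dir) / name, Path(slave_dir) / name
        m = parse_tunables(m_path.read_text()) if m_path.exists() else {}
        s = parse_tunables(s_path.read_text()) if s_path.exists() else {}
        for tunable, m_val, s_val in diff_tunables(m, s):
            print(f"{name}: {tunable} master={m_val!r} slave={s_val!r}")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it with the two directories as arguments. Each line it prints names a tunable whose value differs between the nodes or is present on only one of them. In the scenario above, LLT_MAXADVBUFS and LLT_ADVBUF_SIZE would be reported with a slave value of None.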
Arctera is working to address this issue.