The example below is from a Red Hat Enterprise Linux (RHEL) 8 environment.
The extract below is from the panic stack:
crash> bt
PID: 10610 TASK: ff4aa1124e4b8000 CPU: 36 COMMAND: "dmpdaemon"
#0 [ff553c482092bb30] machine_kexec at ffffffff8d66da33
#1 [ff553c482092bb88] __crash_kexec at ffffffff8d7b757a
#2 [ff553c482092bc48] crash_kexec at ffffffff8d7b84b1
#3 [ff553c482092bc60] oops_end at ffffffff8d62be31
#4 [ff553c482092bc80] no_context at ffffffff8d67f923
#5 [ff553c482092bcd8] __bad_area_nosemaphore at ffffffff8d67fc9c
#6 [ff553c482092bd20] do_page_fault at ffffffff8d6808b7
#7 [ff553c482092bd50] page_fault at ffffffff8e2011ae
[exception RIP: dmpsvc_da_analyze_error+417]
RIP: ffffffffc09ec411 RSP: ff553c482092be08 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ff4aa0d3e5243900 RCX: 0000000000000000
RDX: 0000000000000000 RSI: d4532d9f8ca032df RDI: ff553c482092be48
RBP: ff4aa0d29ac74400 R8: ff553c482092be08 R9: ff553c482092be4e
R10: 0000000000000001 R11: 0000000000000000 R12: ff4aa0d3e5243c00
R13: 0000000000000000 R14: ff4aa0d264205fb8 R15: 0000000004200030
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ff553c482092be90] dmp_error_analysis_callback at ffffffffc0d217fa [vxdmp]
#9 [ff553c482092bed0] dmp_daemons_loop at ffffffffc0d333a4 [vxdmp]
#10 [ff553c482092bf10] kthread at ffffffff8d71e974
#11 [ff553c482092bf50] ret_from_fork at ffffffff8e20028f
The VRTSaslapm package for the IBM SVC array has an issue with the Linux multi-queue block I/O queuing mechanism (blk-mq) feature.
Workaround
Customers using the IBM SVC array can avoid the blk-mq related panic by disabling the DMP blk-mq feature, then rebooting:
# sysctl -w vxdmp.dmp_blk_mq_enable=0 >> /etc/sysctl.conf
# reboot
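On success, sysctl -w prints the new setting (for example "vxdmp.dmp_blk_mq_enable = 0"), and the redirect in the command above appends that line to /etc/sysctl.conf. Running the command a second time appends a duplicate line. The following is a minimal sketch, not a Veritas-supplied tool, of writing the setting without duplicating it. The function name persist_dmp_blk_mq_disable is hypothetical, the config path is a parameter so the function can be tried on a scratch file first, and it assumes GNU sed (sed -i), as shipped with RHEL.

```shell
# Hypothetical helper (bash + GNU sed): make the sysctl config file set
# vxdmp.dmp_blk_mq_enable = 0, rewriting any existing line for the key
# instead of appending a new line on every run.
persist_dmp_blk_mq_disable() {
    local conf="${1:-/etc/sysctl.conf}"
    local key='vxdmp.dmp_blk_mq_enable'
    local key_re='^[[:space:]]*vxdmp\.dmp_blk_mq_enable[[:space:]]*='
    touch "$conf" || return 1
    if grep -Eq "$key_re" "$conf"; then
        # Rewrite every existing (uncommented) line for the key to 0.
        sed -i -E "s/${key_re}.*/${key} = 0/" "$conf"
    else
        # Key not present yet: append it once.
        printf '%s = 0\n' "$key" >> "$conf"
    fi
}
```

For example, run it as root against /etc/sysctl.conf and then reboot as above:
# persist_dmp_blk_mq_disable /etc/sysctl.conf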
To check whether the blk-mq feature is enabled, list the mq directories of the Dynamic Multi-Pathing (DMP) devices. A DMP device with blk-mq enabled has multiple queues, for example:
# ls /sys/block/VxDMP*/mq
/sys/block/VxDMP10/mq:
0 1 2 3 4 5
/sys/block/VxDMP11/mq:
0 1 2 3 4 5
/sys/block/VxDMP12/mq:
0 1 2 3 4 5
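The same check can be scripted. The sketch below (the function name dmp_mq_queue_counts is hypothetical) prints each VxDMP device under /sys/block together with the number of queue directories in its mq directory; a count of 0 means the device has no mq directory. The sysfs directory is a parameter only so the function can be exercised against a test tree.

```shell
# Hypothetical helper (bash): print "<device> <queue count>" for each VxDMP
# block device. A count of 0 means the device has no mq directory.
dmp_mq_queue_counts() {
    local sysfs="${1:-/sys/block}"   # parameter only to allow testing
    local dev n
    for dev in "$sysfs"/VxDMP*; do
        [ -e "$dev" ] || continue    # no VxDMP devices: glob left unexpanded
        if [ -d "$dev/mq" ]; then
            # Each queue appears as a numbered subdirectory (0, 1, ...).
            n=$(find "$dev/mq" -mindepth 1 -maxdepth 1 -type d | wc -l)
        else
            n=0
        fi
        printf '%s %d\n' "${dev##*/}" "$n"
    done
}
```

On the system shown above, each of VxDMP10, VxDMP11 and VxDMP12 would be reported with 6 queues.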
After the blk-mq feature is disabled and the system is rebooted, the mq directory should not be present for any DMP device.
Example of blk-mq disabled:
# ls /sys/block/VxDMP*/mq
ls: cannot access '/sys/block/VxDMP*/mq': No such file or directory
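To confirm the result on each node, the sketch below (the function name dmp_blk_mq_disabled is hypothetical) returns success only when no VxDMP device has an mq directory, and names on standard error any device that still has one. A host with no VxDMP devices at all also passes, so confirm that the devices are present.

```shell
# Hypothetical helper (bash): return 0 if no VxDMP device has an mq
# directory (blk-mq off for DMP); otherwise list offenders and return 1.
dmp_blk_mq_disabled() {
    local sysfs="${1:-/sys/block}"   # parameter only to allow testing
    local mq dev rc=0
    for mq in "$sysfs"/VxDMP*/mq; do
        [ -d "$mq" ] || continue     # unmatched glob or no mq directory
        dev=${mq%/mq}
        echo "blk-mq still enabled: ${dev##*/}" >&2
        rc=1
    done
    return "$rc"
}
```

For example:
# dmp_blk_mq_disabled && echo "blk-mq disabled for DMP on $(hostname)"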
Apply this workaround on all nodes in the environment.
NOTE: Disabling this will not impact the performance of regular hard disks.
Veritas Private hotfix
Veritas Volume Manager (VxVM) 8.0.2.1501 (RHEL 8 and 9) has been released to prevent the NULL pointer dereference "dmp_daemons_loop" panic when the blk-mq feature ("vxdmp.dmp_blk_mq_enable") is enabled.
Veritas Volume Manager 8.0.2 Private hotfix 1501 is available on request for RHEL 8 and 9.
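To check whether a node is already at this hotfix level, the installed VxVM version can be compared with 8.0.2.1501. The sketch below (the function name version_at_least is hypothetical) relies on GNU sort -V for version ordering. It assumes the VRTSvxvm package reports a dotted version such as 8.0.2.1501 in its RPM VERSION field; whether a particular later release also contains this fix should be confirmed with Technical Support.

```shell
# Hypothetical helper (bash + GNU sort): return 0 if version $1 is equal
# to or newer than version $2, using sort -V (version sort) ordering.
version_at_least() {
    local lowest
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}
```

For example:
# version_at_least "$(rpm -q --qf '%{VERSION}' VRTSvxvm)" 8.0.2.1501 && echo "VxVM is at 8.0.2.1501 or later"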
Panic:
PID: 155809 TASK: ffff9abc40e08000 CPU: 11 COMMAND: "dmpdaemon"
#0 [ffffc13ca29a7b30] machine_kexec at ffffffffb4e6da33
#1 [ffffc13ca29a7b88] __crash_kexec at ffffffffb4fb757a
#2 [ffffc13ca29a7c48] crash_kexec at ffffffffb4fb84b1
#3 [ffffc13ca29a7c60] oops_end at ffffffffb4e2be31
#4 [ffffc13ca29a7c80] no_context at ffffffffb4e7f923
#5 [ffffc13ca29a7cd8] __bad_area_nosemaphore at ffffffffb4e7fc9c
#6 [ffffc13ca29a7d20] do_page_fault at ffffffffb4e808b7
#7 [ffffc13ca29a7d50] page_fault at ffffffffb5a011ae
[exception RIP: dmpsvc_da_analyze_error+417]
RIP: ffffffffc0ecc411 RSP: ffffc13ca29a7e08 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff9abd96d1f800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 68d38a89bb1d8221 RDI: ffffc13ca29a7e48
RBP: ffff9abf201b1400 R8: ffffc13ca29a7e08 R9: ffffc13ca29a7e4e
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9abd96d1f100
R13: 0000000000000000 R14: ffff9abd588c3938 R15: 00000000085000b0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffffc13ca29a7e90] dmp_error_analysis_callback at ffffffffc10397fa [vxdmp]
#9 [ffffc13ca29a7ed0] dmp_daemons_loop at ffffffffc104b3a4 [vxdmp]
#10 [ffffc13ca29a7f10] kthread at ffffffffb4f1e974
#11 [ffffc13ca29a7f50] ret_from_fork at ffffffffb5a0028f
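Both panic stacks above share the same signature: a page fault in the dmpdaemon thread at dmpsvc_da_analyze_error, reached from dmp_error_analysis_callback and dmp_daemons_loop in the vxdmp module. Addresses differ between boots and builds, so a triage check should match function names only. The sketch below (the function name matches_dmp_blk_mq_panic is hypothetical) reads saved crash bt output on standard input and succeeds if it finds the dmpdaemon command, the exception RIP in dmpsvc_da_analyze_error and the dmp_daemons_loop frame in vxdmp. It is a quick filter, not a substitute for analysis by Technical Support.

```shell
# Hypothetical helper (bash): succeed if crash "bt" text on stdin matches
# the dmpdaemon / dmpsvc_da_analyze_error / dmp_daemons_loop signature.
matches_dmp_blk_mq_panic() {
    local bt
    bt=$(cat)
    # 1. The panicking thread is the DMP daemon.
    grep -Fq 'COMMAND: "dmpdaemon"' <<<"$bt" || return 1
    # 2. The faulting instruction is in dmpsvc_da_analyze_error; the
    #    +offset is build-specific, so only the function name is matched.
    grep -Fq 'exception RIP: dmpsvc_da_analyze_error' <<<"$bt" || return 1
    # 3. The fault was reached from the DMP daemon loop in vxdmp.
    grep -q 'dmp_daemons_loop at .*\[vxdmp]' <<<"$bt"
}
```

Save the output of the bt command from crash to a file (bt.txt here), then run:
# matches_dmp_blk_mq_panic < bt.txt && echo "stack matches this issue"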
RESOLUTION:
The blk-mq code path processes I/O in request form and does not deal with the bio. The bio seen here appears to be a dummy bio, possibly added only for compatibility. The fix is to add a check for whether the I/O is request-based or bio-based and, if it is request-based, to handle it differently. The same check is already made in all other places; it was needed here as well.
Note: A supported hotfix has been made available for this issue. Please contact Technical Support to obtain this fix. This hotfix has not yet gone through extensive quality assurance (QA) testing. Consequently, if you are not adversely affected by this problem and have a satisfactory temporary workaround in place, we recommend that you wait for the public release of this hotfix. The Product Engineering Team currently plans to address this issue by way of a patch or hotfix to the current version of the software. Please note that we as a company reserve the right to remove any fix from the targeted release if it does not pass quality assurance tests. Our plans are subject to change and any action taken by you based on the above information or your reliance upon the above information is made at your own risk. Please contact your Sales representative or the Sales group for upgrade information, including upgrade eligibility for the release containing the resolution for this issue.