Veritas Volume Manager 8.0.2.1501 (RHEL8+9) released to prevent NULL pointer dereference "dmp_daemons_loop" panic when the blk_mq "vxdmp.dmp_blk_mq_enable" feature is enabled


Article ID: 100064514


Description

Error Message

The example shown is from a Red Hat Enterprise Linux 8 (RHEL 8) environment.

The extract below is from the panic stack:
 

crash> bt
PID: 10610  TASK: ff4aa1124e4b8000  CPU: 36  COMMAND: "dmpdaemon"
 #0 [ff553c482092bb30] machine_kexec at ffffffff8d66da33
 #1 [ff553c482092bb88] __crash_kexec at ffffffff8d7b757a
 #2 [ff553c482092bc48] crash_kexec at ffffffff8d7b84b1
 #3 [ff553c482092bc60] oops_end at ffffffff8d62be31
 #4 [ff553c482092bc80] no_context at ffffffff8d67f923
 #5 [ff553c482092bcd8] __bad_area_nosemaphore at ffffffff8d67fc9c
 #6 [ff553c482092bd20] do_page_fault at ffffffff8d6808b7
 #7 [ff553c482092bd50] page_fault at ffffffff8e2011ae
    [exception RIP: dmpsvc_da_analyze_error+417]
    RIP: ffffffffc09ec411  RSP: ff553c482092be08  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ff4aa0d3e5243900  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: d4532d9f8ca032df  RDI: ff553c482092be48
    RBP: ff4aa0d29ac74400   R8: ff553c482092be08   R9: ff553c482092be4e
    R10: 0000000000000001  R11: 0000000000000000  R12: ff4aa0d3e5243c00
    R13: 0000000000000000  R14: ff4aa0d264205fb8  R15: 0000000004200030
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ff553c482092be90] dmp_error_analysis_callback at ffffffffc0d217fa [vxdmp]
 #9 [ff553c482092bed0] dmp_daemons_loop at ffffffffc0d333a4 [vxdmp]
#10 [ff553c482092bf10] kthread at ffffffff8d71e974
#11 [ff553c482092bf50] ret_from_fork at ffffffff8e20028f

Cause

The VRTSaslapm package for the IBM SVC array has an issue with the Linux Multi-Queue Block I/O Queueing Mechanism (blk_mq) feature.

 

Resolution


Workaround

Customers using the IBM SVC array can avoid the blk_mq related panic by disabling the feature with the following commands:

# sysctl -w vxdmp.dmp_blk_mq_enable=0 >> /etc/sysctl.conf

# reboot
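
Because the sysctl command above appends to /etc/sysctl.conf every time it is run, repeated runs leave duplicate lines in the file. The following is a minimal sketch of an idempotent alternative for the persistence step only, to be run before the reboot. The function name persist_sysctl is hypothetical and is not part of any Veritas or Linux tooling; it simply rewrites a sysctl-style file so that the given key appears exactly once.

```shell
#!/bin/sh
# Hypothetical helper (not a Veritas tool): make sure "KEY = VALUE" appears
# exactly once in a sysctl-style configuration file.
# Usage: persist_sysctl FILE KEY VALUE
persist_sysctl() {
    conf=$1
    key=$2
    value=$3

    # Create the file if it does not exist yet.
    touch "$conf" || return 1
    tmp=$(mktemp) || return 1

    # Copy every line except existing assignments of KEY. The name is
    # compared as a plain string, so dots in KEY are not treated as wildcards.
    awk -v k="$key" -F= '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' \
        "$conf" > "$tmp"

    # Append the desired assignment in the "key = value" form sysctl prints.
    printf '%s = %s\n' "$key" "$value" >> "$tmp"

    # Write back through the original file so its ownership and mode are kept.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

As root, this could be called as: persist_sysctl /etc/sysctl.conf vxdmp.dmp_blk_mq_enable 0. It only edits the file; the running value still has to be changed with sysctl -w, and the reboot described above is still required.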

To check whether the blk-mq feature is enabled, use the command below to see whether the Dynamic Multi-Pathing (DMP) devices have multiple queues, for example:

# ls /sys/block/VxDMP*/mq
/sys/block/VxDMP10/mq:
0  1  2  3  4  5
/sys/block/VxDMP11/mq:
0  1  2  3  4  5
/sys/block/VxDMP12/mq:
0  1  2  3  4  5

Once the blk-mq feature has been disabled and the system has been rebooted, the mq directory should no longer be present for any of the DMP devices.

Example of blk-mq disabled:

# ls /sys/block/VxDMP*/mq
ls: cannot access '/sys/block/VxDMP*/mq': No such file or directory
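
The two listings above can be combined into a single per-device check. The sketch below assumes only what the listings show: that a DMP device appears as /sys/block/VxDMP* and has an mq subdirectory with one entry per queue when blk-mq is in use. The function name dmp_mq_status and its optional directory argument are hypothetical and exist only so the check can be exercised against a test directory.

```shell
#!/bin/sh
# Hypothetical helper: report, for each VxDMP* device under a sysfs block
# directory, whether an mq subdirectory (blk-mq queues) is present.
# Usage: dmp_mq_status [SYSFS_BLOCK_DIR]   (defaults to /sys/block)
dmp_mq_status() {
    root=${1:-/sys/block}
    for dev in "$root"/VxDMP*; do
        # If the glob matched nothing, the pattern is left as-is; skip it.
        [ -e "$dev" ] || continue
        name=${dev##*/}
        if [ -d "$dev/mq" ]; then
            # Each entry under mq/ corresponds to one queue.
            queues=$(ls "$dev/mq" | wc -l | tr -d ' ')
            echo "$name: blk-mq enabled ($queues queues)"
        else
            echo "$name: blk-mq disabled"
        fi
    done
}
```

Running dmp_mq_status with no arguments on a node inspects /sys/block; if no VxDMP devices are present, it prints nothing.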

This should be applied to all nodes in the environment to disable the feature.
 
NOTE: Disabling this will not impact the performance of regular hard disks.



Veritas Private hotfix

 

Veritas Volume Manager (VxVM) 8.0.2.1501 (RHEL8+9) has been released to prevent the NULL pointer dereference "dmp_daemons_loop" panic when the blk_mq "vxdmp.dmp_blk_mq_enable" feature is enabled.
 

Veritas Volume Manager 8.0.2 Private hotfix 1501 is available on request for RHEL 8 and 9.
 

Panic:

PID: 155809  TASK: ffff9abc40e08000  CPU: 11  COMMAND: "dmpdaemon"
#0 [ffffc13ca29a7b30] machine_kexec at ffffffffb4e6da33
#1 [ffffc13ca29a7b88] __crash_kexec at ffffffffb4fb757a
#2 [ffffc13ca29a7c48] crash_kexec at ffffffffb4fb84b1
#3 [ffffc13ca29a7c60] oops_end at ffffffffb4e2be31
#4 [ffffc13ca29a7c80] no_context at ffffffffb4e7f923
#5 [ffffc13ca29a7cd8] __bad_area_nosemaphore at ffffffffb4e7fc9c
#6 [ffffc13ca29a7d20] do_page_fault at ffffffffb4e808b7
#7 [ffffc13ca29a7d50] page_fault at ffffffffb5a011ae
    [exception RIP: dmpsvc_da_analyze_error+417]
    RIP: ffffffffc0ecc411  RSP: ffffc13ca29a7e08  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff9abd96d1f800  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 68d38a89bb1d8221  RDI: ffffc13ca29a7e48
    RBP: ffff9abf201b1400   R8: ffffc13ca29a7e08   R9: ffffc13ca29a7e4e
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff9abd96d1f100
    R13: 0000000000000000  R14: ffff9abd588c3938  R15: 00000000085000b0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#8 [ffffc13ca29a7e90] dmp_error_analysis_callback at ffffffffc10397fa [vxdmp]
#9 [ffffc13ca29a7ed0] dmp_daemons_loop at ffffffffc104b3a4 [vxdmp]
#10 [ffffc13ca29a7f10] kthread at ffffffffb4f1e974
#11 [ffffc13ca29a7f50] ret_from_fork at ffffffffb5a0028f


RESOLUTION:

The blk_mq related code processes I/O in request form and does not deal with the bio. The bio seen in this panic appears to be a dummy bio that may have been added only for compatibility. This code path needs a check for whether the I/O is request based or bio based so that request-based I/O is handled differently. The same check is already made in all other places and needs to be applied here as well.
 



Note: A supported hotfix has been made available for this issue. Please contact Technical Support to obtain this fix. This hotfix has not yet gone through extensive QA testing. Consequently, if you are not adversely affected by this problem and have a satisfactory temporary workaround in place, we recommend that you wait for the public release of this hotfix.

The Product Engineering Team currently plans to address this issue by way of a patch or hotfix to the current version of the software. Please note that we as a company reserve the right to remove any fix from the targeted release if it does not pass quality assurance tests. Our plans are subject to change and any action taken by you based on the above information or your reliance upon the above information is made at your own risk.

Please contact your Sales representative or the Sales group for upgrade information including upgrade eligibility to the release containing the resolution for this issue.

 

Issue/Introduction


Customers may encounter a NULL pointer dereference panic in the "dmp_daemons_loop" code path when the blk_mq "vxdmp.dmp_blk_mq_enable" feature is enabled.

blk_mq is a Linux feature that DMP now supports. It takes advantage of modern devices that can process large numbers of parallel I/O requests. Linux documentation link: https://docs.kernel.org/block/blk-mq.html

Additional Information

JIRA: STESC-8728