The messages file was flooded with blk_get_request failures.
For example:
kernel: VxVM vxdmp V-5-3-0 dmp_kernel_scsi_ioctl: blk_get_request failed dev 133/0x210 cmd 0x12 error code = 0
Similar messages were seen for SCSI commands 0x5e (PERSISTENT RESERVE IN), 0x5f (PERSISTENT RESERVE OUT), 0xa3 (MAINTENANCE IN) and 0x12 (INQUIRY).
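To gauge how widespread the failures are, the messages can be tallied per SCSI command. The following is a minimal sketch only: it assumes the log lines follow the format of the example above, and the function name count_blk_get_request_failures and the default log path /var/log/messages are illustrative assumptions, not part of VxVM.

```shell
# Summarize blk_get_request failures per SCSI command opcode.
# Usage: count_blk_get_request_failures [logfile]
# The default log path is an assumption; adjust it for the local syslog setup.
count_blk_get_request_failures() {
    grep 'blk_get_request failed' "${1:-/var/log/messages}" |
        sed -n 's/.* cmd \(0x[0-9a-fA-F]*\) .*/\1/p' |   # keep only the opcode
        sort | uniq -c |                                   # count per opcode
        awk '{ print $2, $1 }'                             # print "opcode count"
}
```

Running count_blk_get_request_failures with no argument reads the default log file and prints one line per opcode followed by the number of matching messages.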
The failures were caused by an OS-udev-vxdmp interaction issue.
The DMP device is a virtual device created on top of the OS device, and it relies on the OS udev rules to notify it of any changes. In this instance the OS-udev-vxdmp interaction was not behaving as expected: when the paths were re-activated, DMP was not made aware of it and so didn’t re-enable them.
Typically vxesd would be made aware that a device had been removed
eg. vxesd[108002]: vxesd: Device sdbar(71/1328) is removed.
and subsequently DMP would report that a path had been disabled.
vxesd would then be expected to be notified that the device had been added, which in turn prompts DMP to re-enable the path.
eg. vxesd[108002]: vxesd: Device sdbar(71/1328) is added.
However, in this instance vxesd was not made aware of the added devices.
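One way to spot this condition is to look for devices whose last vxesd event in the log is a removal with no later "added" event. The sketch below assumes the vxesd message format shown in the examples above; the function name vxesd_devices_not_readded and the default log path are illustrative assumptions.

```shell
# List devices whose most recent vxesd event in the log is "removed",
# i.e. no matching "added" event was logged afterwards.
# Usage: vxesd_devices_not_readded [logfile]
vxesd_devices_not_readded() {
    awk '
        /vxesd: Device .* is (removed|added)\./ {
            # The device name is the word that follows "Device".
            for (i = 1; i <= NF; i++)
                if ($i == "Device") { dev = $(i + 1); break }
            # Remember only the latest event seen for each device.
            state[dev] = ($NF == "removed.") ? "removed" : "added"
        }
        END {
            for (d in state)
                if (state[d] == "removed") print d
        }
    ' "${1:-/var/log/messages}" | sort
}
```

An empty result means every removal in the log was followed by an addition of the same device; any device listed is a candidate for a path that DMP was never told to re-enable.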
Additionally, it was determined that after a controller/SP reboot the paths can take a while to respond once the array returns. This delay led to blk_get_request failures and to paths being disabled, so it had to be accounted for on the OS side.
With the following solution in place, vxesd recognized when a device had been removed and added and subsequently dmp would disable and re-enable the paths correctly for the Storwize storage:
1. vxesd enabled (with VxVM udev rules files in place)
For the VxVM udev rules files, check to see if the following files exist:
/etc/udev/rules.d/40-VxVM.rules
/etc/udev/rules.d/99-vxdmp-remove-blockdev.rules
/lib/udev/vxvm-udev.sh
/lib/udev/vxpath_links
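If not, then the following files may need to be copied into place. Note that the first two commands write to the same target, so only one of them should be used (the .systemd variant presumably being the one for systemd-based releases):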
cp /etc/vx/vxvm-udev.rules /etc/udev/rules.d/40-VxVM.rules
cp /etc/vx/vxvm-udev.rules.systemd /etc/udev/rules.d/40-VxVM.rules
cp /etc/vx/vxdmp-remove-blockdev.rules /etc/udev/rules.d/99-vxdmp-remove-blockdev.rules
cp /etc/vx/vxvm-udev /lib/udev/vxvm-udev.sh
cp /etc/vx/vxpath_links /lib/udev/vxpath_links
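Then reload the udev rules:
/sbin/udevcontrol reload_rules
On releases that ship udevadm instead of udevcontrol, the equivalent command is:
udevadm control --reload-rules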
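The file check in step 1 can be scripted. This is a minimal sketch using only the four paths listed above; the function name missing_vxvm_udev_files and its optional root-prefix argument are hypothetical conveniences, not part of VxVM.

```shell
# Print each of the VxVM udev files from step 1 that is missing.
# Usage: missing_vxvm_udev_files [root]
# "root" is an optional path prefix (hypothetical, e.g. for a mounted image);
# with no argument the live system is checked.
missing_vxvm_udev_files() {
    for f in \
        /etc/udev/rules.d/40-VxVM.rules \
        /etc/udev/rules.d/99-vxdmp-remove-blockdev.rules \
        /lib/udev/vxvm-udev.sh \
        /lib/udev/vxpath_links
    do
        [ -e "${1:-}${f}" ] || echo "$f"
    done
}
```

No output means all four files are present; any path printed is one that may need to be copied into place as described above.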
2. The Red Hat scsi_mod.inq_timeout tunable needed to be increased.
Normally the default value of 20s for this tunable is sufficient, and Red Hat only recommends modifying it if the storage requires more time to respond to the inquiry.
The following Red Hat document provides some more information on this tunable:
https://access.redhat.com/solutions/3430351
In this particular scenario the recommendation was to set this tunable to '70' (seconds). This can be done in two ways:
a. Add scsi_mod.inq_timeout=70 to the kernel line in the grub configuration and reboot. This makes the setting permanent.
b. To set the tunable dynamically, run the following command (this change does not persist across a reboot):
echo 70 > /sys/module/scsi_mod/parameters/inq_timeout
Note that this tunable also applies to RHEL 7.
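After applying either method, the running value can be verified by reading the same sysfs file. The following is a sketch only; the function name check_inq_timeout and its arguments are illustrative assumptions, while the sysfs path and the value 70 are the ones given above.

```shell
# Compare the running scsi_mod.inq_timeout value with the wanted value.
# Usage: check_inq_timeout [wanted] [param_file]
# Returns 0 if they match, 1 if they differ, 2 if the file cannot be read.
check_inq_timeout() {
    wanted="${1:-70}"
    param_file="${2:-/sys/module/scsi_mod/parameters/inq_timeout}"
    current=$(cat "$param_file") || return 2
    if [ "$current" -eq "$wanted" ]; then
        echo "inq_timeout is $current (ok)"
    else
        echo "inq_timeout is $current, expected $wanted"
        return 1
    fi
}
```

Calling check_inq_timeout with no arguments checks the live system against the value of 70 recommended for this scenario. Running it again after a reboot shows whether the grub setting took effect.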
3. Set the DMP recoveryoption iotimeout to 600.
This can be done using the vxdmpadm command, for example:
vxdmpadm setattr enclosure
'600' was the value that best fit with the application timeout.
Please note that these issues may also occur on VxVM 7.x, in which case the above solution also applies.