Data corruption may occur on RHEL 8.1 because blk_execute_rq does not pass the error to its caller

Article ID: 100048354

Cause

This is a known bug in Red Hat Enterprise Linux 8.1 (BZ#1822252).

Code paths such as the function blk_execute_rq and some SCSI IOCTLs do not pass the block layer error to the caller.

Ideally, DMP disables the affected paths and fails over the IO after an HBA port is unplugged. In this case, the OS did not return an error for any SCSI request sent to an unplugged path, even though syslog showed the device rejecting IOs at that time. DMP therefore analyzed the error as DMP_PATH_OKAY and did not disable the unplugged paths. DMP then retried the IO on the same path through a SCSI request.

The SCSI request that carried the retried IO was returned as successful by the OS layer, so DMP returned the IO to the upper layer with success. The upper layer (VxFS or the application) treated the IO as complete, but the IO never reached the storage, so data was lost between the host and the storage.
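The failure sequence described above can be illustrated with a small model. This is only a sketch of the reasoning, not DMP or kernel code: the names Path, IoResult, submit, and send_io are hypothetical, and the model collapses the retry on the same path into a single submission.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    name: str
    plugged: bool  # False models an unplugged HBA port


@dataclass
class IoResult:
    reported_ok: bool      # what the upper layer is told
    reached_storage: bool  # what actually happened
    path: Optional[str]


def submit(path: Path, os_reports_errors: bool) -> int:
    """Return 0 for success or 1 for failure, as the caller sees it."""
    if path.plugged:
        return 0
    # On an affected kernel the rejection is not passed back,
    # so the caller sees 0 even though the IO went nowhere.
    return 1 if os_reports_errors else 0


def send_io(paths: List[Path], os_reports_errors: bool) -> IoResult:
    """Try each path in order and stop at the first reported success."""
    for path in paths:
        if submit(path, os_reports_errors) == 0:
            return IoResult(True, path.plugged, path.name)
        # A visible error lets the caller skip this path and fail over.
    return IoResult(False, False, None)
```

With an unplugged first path and a healthy second path, the model reports success without reaching storage when errors are swallowed, and fails over to the second path when errors are reported.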

Resolution

Red Hat provides a kernel fix in kernel-4.18.0-147.13.2.el8_1 and later. The fix sets an error value in the SCSI result when commands are rejected during submission, for example when the device is offline. Contact Red Hat for more information about the issue.
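To check whether a host already runs a kernel at or above the fixed level, the release string can be compared numerically. The following is a minimal sketch that assumes an el8 kernel release string such as the one printed by uname -r. It is a simplified comparison, not the full RPM version algorithm, and parse_kernel and has_fix are hypothetical helper names.

```python
import re

# Fixed kernel level named in this article.
FIXED_KERNEL = "4.18.0-147.13.2.el8_1"

# Captures the version (4.18.0) and the numeric build (147.13.2) before ".elN".
_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.el(\d+)")


def parse_kernel(release: str):
    """Split a release string into comparable tuples of integers."""
    match = _PATTERN.match(release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    build = tuple(int(part) for part in match.group(2).split("."))
    return version, build


def has_fix(release: str) -> bool:
    """True if the release is at or above the fixed kernel level."""
    return parse_kernel(release) >= parse_kernel(FIXED_KERNEL)
```

On a live system the input could come from platform.release() in the Python standard library, for example has_fix(platform.release()).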

As a workaround, disable dmp_fast_recovery.

When dmp_fast_recovery is set to off, IO is issued through an IOCTL rather than through a SCSI request. The IOCTL fails due to "not support ioctl with I/O vectors", and DMP restarts the failed IOs on an alternate path, so no IO is lost on the storage side.
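The workaround could be scripted roughly as follows. This sketch assumes the vxdmpadm settune and gettune subcommands accept the tunable as shown. Verify the exact syntax against the InfoScale documentation for your release, and run it as root. The function names are hypothetical.

```python
import subprocess

TUNABLE = "dmp_fast_recovery"


def settune_command(value: str):
    """Build the argument list that sets the tunable to on or off."""
    if value not in ("on", "off"):
        raise ValueError("value must be 'on' or 'off'")
    return ["vxdmpadm", "settune", f"{TUNABLE}={value}"]


def gettune_command():
    """Build the argument list that displays the current tunable value."""
    return ["vxdmpadm", "gettune", TUNABLE]


def disable_fast_recovery(run=subprocess.run):
    """Turn the tunable off, then display it so the change can be confirmed."""
    for command in (settune_command("off"), gettune_command()):
        run(command, check=True)
```

The run parameter exists only so the command sequence can be inspected without executing it. Calling disable_fast_recovery() with no arguments invokes the real commands.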

Issue/Introduction

Data corruption may occur because the OS API 'blk_execute_rq' did not pass the block layer error to its caller after HBA ports were unplugged. As a result, DMP did not disable the unplugged path.

Error Message

System log:
kernel: sd 17:0:2:9: rejecting I/O to offline device
kernel: VxVM vxdmp V-5-3-0 dmp_check_scsipkt: SCSI request completed host_byte = 0x0 msg_byte = 0x0 rq_status = 0x8 <<< (host_byte was returned as success on failed path)
kernel: VxVM vxdmp V-5-0-0 [Info] i/o error analysis done (status = 0) on path 66/0xd0 belonging to dmpnode 201/0x50
kernel: VxVM vxdmp V-5-3-1476 dmp_notify_events: Total number of events = 3
kernel: VxVM vxdmp V-5-0-148 [Info] i/o retry(1) on path 66/0xd0 belonging to the dmpnode 201/0x50

dmpevents.log:
I/O error occurred on Path sdat(66/208) belonging to Dmpnode emc0_022f(201/80)
I/O analysis done as DMP_PATH_OKAY on Path sdat(66/208) belonging to Dmpnode emc0_022f(201/80)
I/O retry(1) on Path sdat(66/208) belonging to Dmpnode emc0_022f(201/80)

Environment

RHEL 8.1
InfoScale 7.4.1 and above
VxFS/Oracle RAC
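The system log signature quoted in this article can be searched for mechanically. The sketch below is a heuristic only: it flags dmp_check_scsipkt lines that report host_byte = 0x0 after an earlier "rejecting I/O to offline device" line, and it does not correlate the SCSI device with the DMP path. The function name find_suspect_lines is hypothetical, and the patterns assume the message formats shown in this article.

```python
import re

OFFLINE = re.compile(r"sd \d+:\d+:\d+:\d+: rejecting I/O to offline device")
SCSIPKT = re.compile(
    r"dmp_check_scsipkt: SCSI request completed host_byte = (0x[0-9a-fA-F]+)"
)


def find_suspect_lines(lines):
    """Return SCSI completion lines with host_byte 0 seen after a rejection."""
    seen_offline = False
    suspects = []
    for line in lines:
        if OFFLINE.search(line):
            seen_offline = True
            continue
        match = SCSIPKT.search(line)
        if match and seen_offline and int(match.group(1), 16) == 0:
            suspects.append(line.rstrip("\n"))
    return suspects
```

The input can be any iterable of log lines, such as an open file handle on the system log.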

Additional Information

JIRA: STESC-4848