DMP returns an I/O failure to VxVM without retrying the I/O on alternate paths

Article ID: 100027074

Description

Error Message

From /etc/vx/dmpevents.log:

Tue Apr  3 23:11:26.649: I/O error occured on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2
Tue Apr  3 23:11:31.296: SCSI error occured on Path c26t50060E8005B0E218d22s2: opcode=0x12 reported bus reset (status=0x0, key=0x0, asc=0x0, ascq=0x0)
Tue Apr  3 23:11:31.296: I/O analysis done on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2

Tue Apr  3 23:11:36.685: I/O error occured on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2
          <<< No SCSI error reported here
Tue Apr  3 23:12:32.100: I/O analysis done on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2
Tue Apr  3 23:12:32.102: I/O error occured on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2
Tue Apr  3 23:13:27.219: I/O analysis done on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2
Tue Apr  3 23:13:27.960: I/O error occured on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2
Tue Apr  3 23:13:47.949: I/O analysis done on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2
Tue Apr  3 23:13:57.490: I/O error occured on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2
Tue Apr  3 23:14:12.296: I/O analysis done on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2
Tue Apr  3 23:14:22.520: I/O error occured on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2
Tue Apr  3 23:14:27.220: I/O analysis done on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2

From /var/adm/messages:

Apr  3 23:14:27 hostA vxio: [ID 556087 kern.warning] WARNING: VxVM vxio V-5-0-3 Plex vol66-01 block 5289488:
Apr  3 23:14:27 hostA        Uncorrectable read error on Subdisk disk28-02 block 1322512
 

Cause

Generally, when DMP receives an I/O error from the OS SCSI layer, it performs error analysis on the path. The error analysis is specific to the disk array type and is controlled by the Array Policy Module (APM). For example, with an Active/Active disk array in VxVM versions earlier than 5.1 SP1, if the error analysis finds that the problematic path is still available, DMP retries the I/O on the same path. DMP checks whether the path is available by sending a SCSI Inquiry over that path. In dmpevents.log, if no SCSI error is reported between the "I/O error occured" and "I/O analysis done" messages, the SCSI Inquiry succeeded, as in this excerpt:

Tue Apr  3 23:11:36.685: I/O error occured on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2
          <<< No SCSI error reported here
Tue Apr  3 23:12:32.100: I/O analysis done on Path c26t50060E8005B0E218d22s2 belonging to Dmpnode c8t50060E8005B0E224d22s2
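
The pattern described above (an "I/O error occured" message followed by "I/O analysis done" with no SCSI error in between) can be counted per path with a short script. The following is a minimal sketch only: the function name count_clean_analyses is hypothetical, and the message formats are assumed from the dmpevents.log excerpts in this article, not from a log format specification.

```python
import re

# Message patterns taken from the log excerpts in this article.
# The log itself spells "occured" with a single "r".
IO_ERROR = re.compile(r"I/O error occured on Path (?P<path>\S+)")
SCSI_ERROR = re.compile(r"SCSI error occured on Path (?P<path>\S+?):")
ANALYSIS_DONE = re.compile(r"I/O analysis done on Path (?P<path>\S+)")


def count_clean_analyses(lines):
    """Count, per path, the I/O errors whose analysis completed
    with no SCSI error logged in between."""
    saw_scsi_error = {}  # path -> SCSI error seen since the last I/O error?
    clean = {}           # path -> number of analyses with no SCSI error
    for line in lines:
        m = IO_ERROR.search(line)
        if m:
            saw_scsi_error[m["path"]] = False
            continue
        m = SCSI_ERROR.search(line)
        if m:
            if m["path"] in saw_scsi_error:
                saw_scsi_error[m["path"]] = True
            continue
        m = ANALYSIS_DONE.search(line)
        if m and m["path"] in saw_scsi_error:
            if not saw_scsi_error.pop(m["path"]):
                clean[m["path"]] = clean.get(m["path"], 0) + 1
    return clean


PATH = "c26t50060E8005B0E218d22s2"
NODE = "c8t50060E8005B0E224d22s2"
SAMPLE = [
    f"Tue Apr  3 23:11:26.649: I/O error occured on Path {PATH} belonging to Dmpnode {NODE}",
    f"Tue Apr  3 23:11:31.296: SCSI error occured on Path {PATH}: opcode=0x12 reported bus reset (status=0x0, key=0x0, asc=0x0, ascq=0x0)",
    f"Tue Apr  3 23:11:31.296: I/O analysis done on Path {PATH} belonging to Dmpnode {NODE}",
    f"Tue Apr  3 23:11:36.685: I/O error occured on Path {PATH} belonging to Dmpnode {NODE}",
    f"Tue Apr  3 23:12:32.100: I/O analysis done on Path {PATH} belonging to Dmpnode {NODE}",
]

print(count_clean_analyses(SAMPLE))
# {'c26t50060E8005B0E218d22s2': 1}
```

Run against the full excerpt under Error Message, the same logic counts five such error/analysis pairs on the path before the "Uncorrectable read error" appears in /var/adm/messages.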

If this condition persists (that is, the I/O fails but the SCSI Inquiry succeeds), DMP retries the I/O on the same path up to the number of times set by dmp_retry_count.

dmp_retry_count can be viewed or set by the vxdmpadm gettune and settune commands respectively.

# vxdmpadm gettune dmp_retry_count
            Tunable               Current Value  Default Value
------------------------------    -------------  -------------
dmp_retry_count                           5                5
 

At the same time, DMP monitors the retry process according to the recovery option that applies to the enclosure (or to the array name or array type, whichever is applicable). The DMP recovery option is viewed and set with the vxdmpadm getattr and setattr commands. The Error-Retry attribute sets the limit (a retry count for Fixed-Retry, or a time limit for Timebound) that DMP allows for the I/O to complete before returning it to VxVM as a failed I/O.

# vxdmpadm getattr enclosure xp24k0 recoveryoption
ENCLR-NAME      RECOVERY-OPTION      DEFAULT[VAL]        CURRENT[VAL]
==============================================================================
xp24k0          Throttle             Nothrottle[0]       Nothrottle[0]
xp24k0          Error-Retry          Fixed-Retry[5]      Fixed-Retry[5]      <<<  Recovery Option Error Retry - Fixed Retry

# vxdmpadm getattr enclosure hds9500-alua0 recoveryoption
ENCLR-NAME      RECOVERY-OPTION      DEFAULT[VAL]  CURRENT[VAL]
===============================================================
hds9500-alua0  Throttle             Nothrottle[0]  Nothrottle[0]
hds9500-alua0  Error-Retry          Fixed-Retry[5] Timebound[300]     <<< Recovery Option Error Retry - Timebound

With the default Fixed-Retry value of 5, DMP fails the I/O without retrying on other paths if the disk array keeps failing the I/O while still answering the SCSI Inquiry successfully for all 5 retries set by dmp_retry_count (also 5 by default).
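
To check which Error-Retry policy is in effect across several enclosures, the getattr output can be parsed. This is a rough sketch that assumes the column layout of the two sample outputs above. The function name parse_error_retry is hypothetical, and other VxVM versions may format the output differently.

```python
import re

# Matches the Error-Retry row of `vxdmpadm getattr ... recoveryoption`,
# based only on the sample outputs shown in this article.
ERROR_RETRY_ROW = re.compile(
    r"^(?P<name>\S+)\s+Error-Retry\s+"
    r"(?P<default>[^\s\[]+)\[(?P<default_val>\d+)\]\s+"
    r"(?P<current>[^\s\[]+)\[(?P<current_val>\d+)\]"
)


def parse_error_retry(output):
    """Return {enclosure: (current_policy, current_value)}."""
    result = {}
    for line in output.splitlines():
        m = ERROR_RETRY_ROW.match(line.strip())
        if m:
            result[m["name"]] = (m["current"], int(m["current_val"]))
    return result


SAMPLE_OUTPUT = """\
ENCLR-NAME      RECOVERY-OPTION      DEFAULT[VAL]  CURRENT[VAL]
===============================================================
hds9500-alua0  Throttle             Nothrottle[0]  Nothrottle[0]
hds9500-alua0  Error-Retry          Fixed-Retry[5] Timebound[300]
"""

print(parse_error_retry(SAMPLE_OUTPUT))
# {'hds9500-alua0': ('Timebound', 300)}
```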

 

Resolution

VxVM 5.1 SP1 includes enhancements delivered through Etrack 1924579, referenced under Additional Information. One of these enhancements is to retry the I/O on other paths before returning an I/O failure to VxVM.

Please upgrade to VxVM 5.1 SP1 to get the enhanced DMP path failure handling.

In VxVM versions earlier than 5.1 SP1, you can set the Fixed-Retry value higher than dmp_retry_count so that DMP retries the I/O on other paths.

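# vxdmpadm setattr enclosure enclosure_name recoveryoption=fixedretry retrycount=n

Here enclosure_name is the enclosure name reported by vxdmpadm getattr, and n is the new retry count.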
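
The workaround condition can be expressed as a small check. This is an illustrative sketch only: the helper name suggest_fixed_retry_change is hypothetical, and the choice of dmp_retry_count + 1 as the new value is an assumption (the smallest value that is higher than dmp_retry_count), since this article does not recommend a specific number. The command string uses the vxdmpadm setattr form that the article names for setting the recovery option.

```python
def suggest_fixed_retry_change(enclosure, dmp_retry_count, policy, value):
    """Return a suggested setattr command string, or None if no change
    is suggested.

    Assumption: the new retry count is dmp_retry_count + 1, the smallest
    value higher than dmp_retry_count. The article gives no number.
    """
    if policy != "Fixed-Retry":
        return None  # the workaround in this article covers Fixed-Retry only
    if value > dmp_retry_count:
        return None  # already higher than dmp_retry_count
    new_count = dmp_retry_count + 1
    return (
        f"vxdmpadm setattr enclosure {enclosure} "
        f"recoveryoption=fixedretry retrycount={new_count}"
    )


print(suggest_fixed_retry_change("xp24k0", 5, "Fixed-Retry", 5))
# vxdmpadm setattr enclosure xp24k0 recoveryoption=fixedretry retrycount=6
```

The function only builds the command text and does not run it, so the suggestion can be reviewed before it is applied.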
 


Applies To

Veritas Volume Manager (VxVM) version 5.1 GA or earlier, on all OS platforms.

Issue/Introduction

In a multipathed storage environment, DMP returns an I/O failure to VxVM without retrying the I/O on alternate paths.

Additional Information

ETrack: 1924579