DMP Recovery Option of Timebound Error Retry doesn't work correctly with the specified Timebound value

Article ID: 100029008

Description

Error Message

Wed Mar 20 21:37:23.825: I/O error occured on Path hdisk228 belonging to Dmpnode v62400_46       <<< Three I/O errors on three different paths
Wed Mar 20 21:37:23.825: I/O error occured on Path hdisk261 belonging to Dmpnode v62400_48       <<<
Wed Mar 20 21:37:23.825: I/O error occured on Path hdisk237 belonging to Dmpnode v62400_48       <<<
Wed Mar 20 21:37:23.826: I/O analysis done as DMP_PATH_OKAY on Path hdisk261 belonging to Dmpnode v62400_48
Wed Mar 20 21:37:23.826: I/O retry(1) on Path hdisk261 belonging to Dmpnode v62400_48          <<< Only one error was retried
Wed Mar 20 21:37:23.827: I/O analysis done as DMP_PATH_OKAY on Path hdisk228 belonging to Dmpnode v62400_46
Wed Mar 20 21:37:23.827: I/O analysis done as DMP_PATH_OKAY on Path hdisk237 belonging to Dmpnode v62400_48
Wed Mar 20 21:37:23.827: I/O error occured (errno=0x0) on Dmpnode v62400_46         <<< I/O errors were returned on the other two I/Os
Wed Mar 20 21:37:23.827: I/O error occured (errno=0x0) on Dmpnode v62400_48         <<<
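To check whether a host's log shows the same pattern (several path errors, only some of them retried, and errors then returned on the DMP nodes), you can scan the excerpt for the three event types shown above. The following Python sketch assumes only the line format of this excerpt. Other VxVM releases or platforms may word these events differently, so treat the regular expressions as illustrative and not as a documented log grammar.

```python
import re

# Lines from the log excerpt above, without the "<<<" annotations.
LOG = """\
Wed Mar 20 21:37:23.825: I/O error occured on Path hdisk228 belonging to Dmpnode v62400_46
Wed Mar 20 21:37:23.825: I/O error occured on Path hdisk261 belonging to Dmpnode v62400_48
Wed Mar 20 21:37:23.825: I/O error occured on Path hdisk237 belonging to Dmpnode v62400_48
Wed Mar 20 21:37:23.826: I/O analysis done as DMP_PATH_OKAY on Path hdisk261 belonging to Dmpnode v62400_48
Wed Mar 20 21:37:23.826: I/O retry(1) on Path hdisk261 belonging to Dmpnode v62400_48
Wed Mar 20 21:37:23.827: I/O analysis done as DMP_PATH_OKAY on Path hdisk228 belonging to Dmpnode v62400_46
Wed Mar 20 21:37:23.827: I/O analysis done as DMP_PATH_OKAY on Path hdisk237 belonging to Dmpnode v62400_48
Wed Mar 20 21:37:23.827: I/O error occured (errno=0x0) on Dmpnode v62400_46
Wed Mar 20 21:37:23.827: I/O error occured (errno=0x0) on Dmpnode v62400_48
"""

# "occured" is spelled exactly as it appears in the product's log text.
PATH_ERROR = re.compile(r"I/O error occured on Path (\S+) belonging to Dmpnode (\S+)")
PATH_RETRY = re.compile(r"I/O retry\((\d+)\) on Path (\S+) belonging to Dmpnode (\S+)")
NODE_ERROR = re.compile(r"I/O error occured \(errno=(0x[0-9a-fA-F]+)\) on Dmpnode (\S+)")


def summarize(log: str):
    """Return (path errors, retried paths, unretried errors, failed dmpnodes)."""
    errors = []        # (path, dmpnode) for each per-path I/O error, in order
    retried = set()    # paths that logged at least one retry
    failed_nodes = []  # dmpnodes that returned an error to the caller
    for line in log.splitlines():
        if m := PATH_ERROR.search(line):
            errors.append((m.group(1), m.group(2)))
        elif m := PATH_RETRY.search(line):
            retried.add(m.group(2))
        elif m := NODE_ERROR.search(line):
            failed_nodes.append(m.group(2))
    not_retried = [(path, node) for path, node in errors if path not in retried]
    return errors, sorted(retried), not_retried, failed_nodes


if __name__ == "__main__":
    errors, retried, not_retried, failed = summarize(LOG)
    print(f"{len(errors)} path errors; retried on: {retried}")
    print(f"errors never retried: {not_retried}")
    print(f"errors returned on dmpnodes: {failed}")
```

On the excerpt above, the sketch reports three path errors and a retry only on hdisk261. It reports no retry for hdisk228 or hdisk237, and errors returned on both v62400_46 and v62400_48. This matches the annotations in the log.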

Cause

ONE-LINE ABSTRACT:
An uncorrectable write error is seen on a subdisk when a SCSI device/bus reset occurs.

SYMPTOM:
The following messages can be seen in syslog:
SCSI error: return code = 0x00070000
I/O error, dev , sector
VxVM vxdmp V-5-0-0 i/o error occurred (errno=0x0) on dmpnode /
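A rough way to test a syslog file for this symptom is to count each of the three message signatures above and flag the file only when all three appear. The following Python sketch matches the fixed parts of each message. It deliberately ignores the device, sector, and dmpnode fields, which vary per host and are left blank in the messages above. The signature names are invented for the sketch, and a match only suggests this issue. It does not confirm it.

```python
import re

# Fixed text taken from the SYMPTOM messages; variable fields are not matched.
SIGNATURES = {
    "scsi_error": re.compile(r"SCSI error: return code = 0x00070000"),
    "block_io_error": re.compile(r"I/O error, dev\b"),
    "dmp_io_error": re.compile(
        r"VxVM vxdmp V-5-0-0 i/o error occurred \(errno=0x[0-9a-fA-F]+\) on dmpnode"
    ),
}


def scan(lines):
    """Count how many lines match each signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def matches_symptom(counts):
    """True only if every signature was seen at least once."""
    return all(counts.values())
```

Because `scan` accepts any iterable of strings, you can pass an open file object directly, for example `scan(open(path, errors="replace"))`, where `path` points to the host's syslog file.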

DESCRIPTION:
When SCSI resets occur, the I/O fails with a PATH_OK or PATH_RETRY error.
Because time-bound recovery is the default recovery option, VxVM retries the I/O until the timeout expires.
However, the time taken by each I/O retry is miscalculated, which drastically reduces the total timeout value.
All retries fail with the same error within this shortened timeout, and an uncorrectable error occurs.

RESOLUTION:
Code changes have been made to calculate the timeout value correctly.
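The article does not say exactly how the per-retry time was miscalculated. The Python model below is a hypothetical illustration only and is not VxVM's DMP code. It shows why overcharging each retry shrinks the effective time-bound window. It compares a retry loop that charges each retry only the time that retry took with one that, as an assumed bug, charges the cumulative elapsed time on every retry. The function `timebound_retry` and its parameters `budget_ms`, `attempt_ms`, and `recover_after_ms` are invented for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    attempts: int
    elapsed_ms: int
    succeeded: bool


def timebound_retry(budget_ms, attempt_ms, recover_after_ms, double_count=False):
    """Simulate retrying one I/O until it succeeds or the time budget runs out.

    Each attempt takes attempt_ms of simulated time. Attempts that start before
    recover_after_ms fail (the path is still recovering from the reset);
    attempts that start at or after it succeed.
    """
    remaining = budget_ms
    elapsed = 0
    attempts = 0
    while remaining > 0:
        attempts += 1
        start = elapsed
        elapsed += attempt_ms
        if start >= recover_after_ms:
            return Outcome(attempts, elapsed, True)
        if double_count:
            # HYPOTHETICAL bug: charge the total time since the first attempt
            # on every retry, so earlier time is subtracted again and again.
            remaining -= elapsed
        else:
            # Correct accounting: charge only the time this attempt took.
            remaining -= attempt_ms
    return Outcome(attempts, elapsed, False)


if __name__ == "__main__":
    print(timebound_retry(10_000, 100, 2_000))
    print(timebound_retry(10_000, 100, 2_000, double_count=True))
```

Consider a 10,000 ms budget, 100 ms attempts, and a path that recovers after 2,000 ms. The correctly accounted loop succeeds on attempt 21. The double-counting loop gives up after 14 attempts and only 1,400 ms of real time, before the path has recovered. This mirrors the "all retries fail within a shortened timeout" behavior described above.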

Resolution

Please upgrade to SF 6.0 or later. The miscalculation does not occur in SF 6.0 because Timebound Error Retry was redesigned.


Applies To

VxVM 5.1 and 5.1SP1 running on all platforms. The problem does not affect VxVM 6.0 and above because the related DMP kernel function was redesigned.

Additional Information

ETrack: 1844425