Mon Dec 12 16:46:13.517: I/O retry(3) on Path sdl belonging to Dmpnode emc_clariion0_126
Thu Oct 20 00:19:00.733: I/O retry(1796406) on Path sdj belonging to Dmpnode emc_clariion0_121
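The two messages above share one layout, so the retry count, path and DMP node can be pulled out with a regular expression. This is a minimal sketch: the pattern is inferred only from the two sample lines shown here, and the helper name parse_retry_line is hypothetical.

```python
import re

# Pattern inferred from the two sample messages in this article (assumption);
# other DMP releases may word the message differently.
RETRY_RE = re.compile(
    r"I/O retry\((?P<retry>\d+)\) on Path (?P<path>\S+) "
    r"belonging to Dmpnode (?P<dmpnode>\S+)"
)


def parse_retry_line(line):
    """Return the retry count, path and dmpnode from one message, or None."""
    m = RETRY_RE.search(line)
    if m is None:
        return None
    return {
        "retry": int(m.group("retry")),
        "path": m.group("path"),
        "dmpnode": m.group("dmpnode"),
    }


sample = ("Mon Dec 12 16:46:13.517: I/O retry(3) on Path sdl "
          "belonging to Dmpnode emc_clariion0_126")
print(parse_retry_line(sample))
# {'retry': 3, 'path': 'sdl', 'dmpnode': 'emc_clariion0_126'}
```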
There are two error-retry recovery options: timebound and fixed retry. The explanation below applies only to timebound; fixed retry does not recalculate the retry count.
With timebound, the message shows a retry count calculated with the following formula:
Number of retries = (io_timeout / DMP total_time)
io_timeout is set as a recovery option on each enclosure. You can check the current value with the following command:
# vxdmpadm getattr enclosure emc_clariion0 recoveryoption
ENCLR-NAME     RECOVERY-OPTION  DEFAULT[VAL]    CURRENT[VAL]
===============================================================
emc_clariion0  Throttle         Nothrottle[0]   Nothrottle[0]
emc_clariion0  Error-Retry      Timebound[300]  Timebound[300]
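If the io_timeout value needs to be picked up by a script, the Error-Retry row can be read from the command output. The sketch below works on the sample output shown above, pasted in as a string. The function name current_error_retry is hypothetical, and the four-column layout is assumed from this one sample, so other releases may format the output differently.

```python
import re

# Sample output copied from this article; not read from a live system.
SAMPLE = """\
ENCLR-NAME RECOVERY-OPTION DEFAULT[VAL] CURRENT[VAL]
===============================================================
emc_clariion0 Throttle Nothrottle[0] Nothrottle[0]
emc_clariion0 Error-Retry Timebound[300] Timebound[300]
"""


def current_error_retry(output):
    """Return (policy, value) from the CURRENT[VAL] column of the Error-Retry row."""
    for line in output.splitlines():
        fields = line.split()
        # Assumed layout: enclosure, option, default, current
        if len(fields) == 4 and fields[1] == "Error-Retry":
            m = re.fullmatch(r"(\w+)\[(\d+)\]", fields[3])
            if m:
                return m.group(1), int(m.group(2))
    return None


print(current_error_retry(SAMPLE))  # ('Timebound', 300)
```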
DMP total_time is the time DMP actually spent sending the I/O to the device before the I/O failed. This time cannot be determined from the messages, because the log shows only the failed I/O and not its start and finish times.
For example, retry(3) indicates that the calculated retry value was 4, because the message is printed after the retry count has been reduced by 1. Using the formula above with io_timeout = 300, DMP total_time works out to between 60 and 75 seconds:
300 / 75 = 4 ~ 300 / 60 = 5 (the result is rounded to an integer)
Likewise, retry(1796406) corresponds to a DMP total_time of about 0.000167 seconds (= 300 / 1796407).
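The arithmetic above can be reproduced with a short sketch. It assumes that the division result is truncated to an integer, which matches the 60~75 second range given for retry(3); the actual rounding inside DMP is not confirmed here. The function names are hypothetical.

```python
from typing import Tuple


def timebound_retries(io_timeout: float, total_time: float) -> int:
    """Number of retries = io_timeout / DMP total_time, truncated (assumption)."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    return int(io_timeout // total_time)


def total_time_range(io_timeout: float, logged_retry: int) -> Tuple[float, float]:
    """Range (low, high] of DMP total_time consistent with a logged retry(N)."""
    # The message is printed after the count is reduced by 1,
    # so the calculated value was N + 1.
    count = logged_retry + 1
    return io_timeout / (count + 1), io_timeout / count


print(total_time_range(300, 3))       # (60.0, 75.0)
print(timebound_retries(300, 75))     # 4
print(timebound_retries(300, 400))    # 0, I/O took longer than io_timeout
print(round(total_time_range(300, 1796406)[1], 6))  # 0.000167
```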
So, if the DMP I/O took longer than io_timeout, the retry count is zero and there is no further retry. If the DMP I/O took a very short time, the retry count is a huge value and DMP may retry many times. Remember, however, that the retry count is always recalculated from the last DMP I/O time, so DMP does not necessarily repeat that many times. Timebound retry limits the number of retries according to the device or SCSI I/O performance on the system. Fixed retry is straightforward: it just reduces the retry count, and there is no recalculation.
Applies To
Redhat 5 / SF5.1SP1