NOTICE: VxVM vxdmp V-5-3-0 Reached DMP Threshold IO TimeOut (300) for disk 49/0x32
Ensure that the iotimeout value is high enough for the disk drivers to retry the I/O (and return a fatal error) before the DMP node times out. The value of iotimeout should also be somewhat less than the timeout value for applications (database instances, etc.) to prevent application hang or failure.
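The constraint above can be sketched as a small shell check. This is only an illustration, not a Veritas tool: the function name check_iotimeout and its three arguments (the DMP iotimeout, the total disk driver timeout including retries, and the application timeout, all in seconds) are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of VxVM): succeeds only when
# driver_total < iotimeout < app_timeout, all given in seconds.
check_iotimeout() {
    iotimeout=$1
    driver_total=$2
    app_timeout=$3
    if [ "$iotimeout" -le "$driver_total" ]; then
        echo "iotimeout $iotimeout is too low: driver needs more than $driver_total seconds"
        return 1
    fi
    if [ "$iotimeout" -ge "$app_timeout" ]; then
        echo "iotimeout $iotimeout is too high: application times out at $app_timeout seconds"
        return 1
    fi
    echo "iotimeout $iotimeout is within range"
}
```

For example, `check_iotimeout 315 300 600` succeeds, where 600 is an arbitrary application timeout chosen for illustration; the real application timeout has to come from the application's own configuration.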
Applies To
5.1SP1 and later versions for Solaris (also in 6.x)
5.1SP1 and later versions for RedHat 5.x (due to a udev rule that sets SCSI_timeout to 60 seconds in RHEL5.x. This rule does not exist in RHEL4.x or RHEL6.x)
Typical scenarios in which the SCSI layer takes the maximum timeout period before failing an I/O are:
SAN fabric failure where the host does not lose local port connection. Loss of port connection usually results in immediate FATAL SCSI error.
No Device type failures where the target simply stops responding to any command. One possible cause is incorrect fabric zoning.
SPECIAL Note:
Solaris, when using the sd driver with sd_io_time set to 60 seconds or longer: 60 seconds is the default value, but sd_io_time is also commonly set explicitly in /etc/system as required by array vendors. The sd driver retries 5 times, yielding a path timeout of 300 seconds [ sd_io_time = 60 sec x sd_retry_count = 5 = 300 second timer ]. In this case, the DMP iotimeout value should be set to at least 315 to allow proper path failover when a device becomes unresponsive (SAN fabric failure, etc.).
In the case of Solaris 10 using the embedded ssd driver, the timing of the FATAL error differs. The ssd driver retries only 3 times and adds a 20 second FCP timeout, yielding a path timeout of 200 seconds with the same ssd_io_time value [ ssd_io_time = 60 sec x 3 retries + 20 sec = 200 second timer ].
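The arithmetic behind these Solaris figures can be sketched in shell. The function names path_timeout and min_iotimeout are made up for this example, and the 15 second default margin is an assumption taken from the difference between the 300 second sd path timeout and the suggested iotimeout of 315.

```shell
#!/bin/sh
# path_timeout IO_TIME RETRIES [EXTRA]
# Worst-case seconds before the driver returns a fatal error:
# io_time * retries, plus an optional fixed extra timeout
# (for example the 20 second FCP timeout of the ssd driver).
path_timeout() {
    echo $(( $1 * $2 + ${3:-0} ))
}

# min_iotimeout PATH_TIMEOUT [MARGIN]
# Smallest DMP iotimeout to consider; MARGIN defaults to 15 seconds (assumed).
min_iotimeout() {
    echo $(( $1 + ${2:-15} ))
}

sd_total=$(path_timeout 60 5)       # sd driver:  60 x 5      = 300
ssd_total=$(path_timeout 60 3 20)   # ssd driver: 60 x 3 + 20 = 200
echo "sd path timeout:  $sd_total"
echo "ssd path timeout: $ssd_total"
echo "sd iotimeout should be at least: $(min_iotimeout "$sd_total")"   # 315
```

If an array vendor requires a different sd_io_time in /etc/system, substitute that value for 60 to see how the path timeout changes.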
RedHat Enterprise Linux 5.x (RHEL5.x)
From the kernel source, we can see that the timeout at the SCSI layer is set to 30 seconds by default. We can also see that the retry count is hard-coded into the kernel:
drivers/scsi/sd.c
-----------------------------
/*
* Time out in seconds for disks and Magneto-opticals (which are slower).
*/
#define SD_TIMEOUT (30 * HZ)
#define SD_MOD_TIMEOUT (75 * HZ)
#define SD_FLUSH_TIMEOUT (60 * HZ)
/*
* Number of allowed retries
*/
#define SD_MAX_RETRIES 5
#define SD_PASSTHROUGH_RETRIES 1
-----------------------------
This code is the same in RHEL6. The timeout in RHEL5 is adjusted by the following udev rule that is installed by default.
/etc/udev/rules.d/50-udev.rules
-----------------------------
# sd: 0 TYPE_DISK, 7 TYPE_MOD, 14 TYPE_RBC
# sr: 4 TYPE_WORM, 5 TYPE_ROM
# st/osst: 1 TYPE_TAPE
# sg: 8 changer, [36] scanner
ACTION=="add", SUBSYSTEM=="scsi" , SYSFS{type}=="0|7|14", \
        RUN+="/bin/sh -c 'echo 60 > /sys$$DEVPATH/timeout'"
ACTION=="add", SUBSYSTEM=="scsi" , SYSFS{type}=="1", \
        RUN+="/bin/sh -c 'echo 900 > /sys$$DEVPATH/timeout'"
-----------------------------
This rule is only present in RHEL5.x. Checking RHEL4 and RHEL6, we can see that the timeout is 30 seconds, as hard-coded into the kernel.
The following document confirms these findings.
17. Controlling the SCSI Command Timer and Device Status
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/online_storage_reconfiguration_guide/task_controlling-scsi-command-timer-onlining-devices
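On a running Linux host, the effective per-device command timeout can be read back from sysfs; this is the same timeout attribute that the udev rule writes to. The following is a minimal sketch: the function name list_scsi_timeouts and its optional directory argument are made up for this example (the argument exists only so that the function can be pointed at a test directory instead of /sys/block).

```shell
#!/bin/sh
# list_scsi_timeouts [ROOT]
# Print "<device> <timeout-in-seconds>" for every sd* device under ROOT
# (default /sys/block). Devices without a readable timeout file are skipped.
list_scsi_timeouts() {
    root=${1:-/sys/block}
    for f in "$root"/sd*/device/timeout; do
        [ -r "$f" ] || continue      # also covers the no-match case
        dev=${f#"$root"/}            # strip the leading ROOT/
        dev=${dev%%/*}               # keep only the device name, e.g. sda
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}

# Typical use on a live system:
#   list_scsi_timeouts
```

Based on the udev rule above, a RHEL5.x host would be expected to report 60 for disk devices, while RHEL4 and RHEL6 would be expected to report 30 unless the value has been changed locally.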
To display the current DMP recovery option settings (including iotimeout) for an enclosure:
vxdmpadm getattr enclosure <enclosurename> recoveryoption
In the example below, the Disk enclosure is using Timebound Error-retry Logic and an iotimeout of 30 is used. Note this is less than most SCSI driver total timeout periods with retries at the SCSI layer.
#vxdmpadm getattr enclosure disk recoveryoption
ENCLR-NAME  RECOVERY-OPTION  DEFAULT[VAL]    CURRENT[VAL]
===============================================================
disk        Throttle         Nothrottle[0]   Nothrottle[0]
disk        Error-Retry      Timebound[300]  Timebound[30]   <-- iotimeout set for 30 seconds.
The example below displays all attributes of the enclosure emc0 and then sets a timebound iotimeout of 315 seconds:
#vxdmpadm getattr enclosure emc0
#vxdmpadm setattr enclosure emc0 recoveryoption=timebound iotimeout=315