SCSI Device SYNC CACHE Errors Result in LUN Trespassing, CVM/VCS Resource Timeouts, and Delays in vx Commands

Article ID: 100030118

Description

Error Message


In /var/adm/messages, with the array failover mode set to 2, the following message appears for every path:

mtvv40z-01 scsi: [ID 107833 kern.warning] Warning:/pci@0,0/pci1022,7450@a/pci17c2,20@4/sd@2,0(sd35):
mtvv40z-01 sdclose: Sending Sync Cache Command

In /var/adm/messages, with the array failover mode set to 1, the following failure message is produced for the secondary path:

mtvv40z-01 scsi: [ID 107833 kern.warning] Warning:/pci@0,0/pci1022,7450@b/pci1077,106@2/fp@0,0/disk@w5006016830602e22,6(sd17):
mtvv40z-01 SYNCHRONIZE CACHE command failed (5)
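
To gauge how widespread the warnings are, the messages can be tallied per sd instance. The Python sketch below is illustrative only: it assumes each warning is the two-line pair shown above (a `Warning:` header naming the device path and `sdNN` instance, followed by the message text), and `count_sync_cache_warnings` is a hypothetical helper, not a Solaris or Veritas tool.

```python
import re
from collections import Counter

# Assumption: each warning is a two-line pair as in the excerpts above:
# a "Warning:<device path>(sdNN):" header, then a line with the message text.
DEVICE_RE = re.compile(r"Warning:\s*(\S+?)\((sd\d+)\):")
PATTERNS = {
    "sync_cache_sent": "Sending Sync Cache Command",
    "sync_cache_failed": "SYNCHRONIZE CACHE command failed",
}


def count_sync_cache_warnings(lines):
    """Count SYNC CACHE messages per (sd instance, message kind)."""
    counts = Counter()
    current = None  # sd instance named by the most recent Warning: header
    for line in lines:
        match = DEVICE_RE.search(line)
        if match:
            current = match.group(2)
            continue
        for kind, text in PATTERNS.items():
            if current is not None and text in line:
                counts[(current, kind)] += 1
                current = None  # each header pairs with one message line
                break
    return counts
```

For example, `with open("/var/adm/messages") as f: print(count_sync_cache_warnings(f))` would list one entry per instance; entries for every path of a LUN match the failover mode 2 pattern described above.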

Monitoring I/O activity with iostat produces output like the following (the w/s column shows write activity on every path regardless of the failover mode):

   r/s    w/s   kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
   0.0   17.0    0.0    0.0   0.0   0.0     0.0     0.0   0   0  c2t5006016830602E22d0
 136.0   51.0   43.8    0.0   0.0   0.0     0.0     0.0   0   0  c2t5006016030602E22d0
   0.0   17.0    0.0    0.0   0.0   0.0     0.0     0.0   0   0  c3t5006016830602E22d0
 233.0   67.0   75.7    3.0   0.0   0.0     0.0     0.0   0   1  c3t5006016030602E22d0
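
The columns follow the Solaris `iostat -xn` extended format. As a rough check, the sketch below parses such output and lists paths that show writes but no reads. In the sample above these are the two `5006016830602E22` paths, the same target port named in the failover mode 1 secondary-path failure message. The helper names and the zero-reads heuristic are hypothetical assumptions, not an established diagnostic.

```python
# Column order as in the iostat -xn header shown above.
COLUMNS = ["r/s", "w/s", "kr/s", "kw/s", "wait", "actv",
           "wsvc_t", "asvc_t", "%w", "%b", "device"]


def parse_iostat(text):
    """Parse iostat -xn style rows into dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip header lines, blank lines, and anything with an unexpected shape.
        if len(fields) != len(COLUMNS) or fields[0] == "r/s":
            continue
        row = {name: float(value) for name, value in zip(COLUMNS[:-1], fields[:-1])}
        row["device"] = fields[-1]
        rows.append(row)
    return rows


def write_only_paths(rows):
    """Heuristic (assumption): paths with zero reads but nonzero writes."""
    return [r["device"] for r in rows if r["r/s"] == 0.0 and r["w/s"] > 0.0]
```

After the driver change described below, such paths would be expected to drop out of this list, since no writes reach the secondary path.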

Cause


Code intended to support PC peripheral devices with non-volatile cache interferes with storage arrays such as EMC CLARiiON on the x86 platform. This code causes the SYNC CACHE command to be issued. The sd driver at the OS level is not designed to be aware of primary versus secondary paths (or of the array failover mode). As a result, when a SYNC CACHE command is issued to the secondary path with failover mode 1, an EIO error is returned.
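
For reference, SYNCHRONIZE CACHE is a standard SCSI block command defined by T10 (SBC). The sketch below builds the 10-byte form of the CDB to show its layout, including the SYNC_NV bit that concerns non-volatile cache. This article does not state which form or which flags the sd driver sends, so `build_sync_cache_10` and its defaults are illustrative assumptions.

```python
import struct

SYNCHRONIZE_CACHE_10 = 0x35  # operation code for SYNCHRONIZE CACHE (10)


def build_sync_cache_10(lba=0, num_blocks=0, immed=False, sync_nv=False, control=0):
    """Return a 10-byte SYNCHRONIZE CACHE (10) CDB.

    lba=0 with num_blocks=0 asks the target to synchronize the whole medium.
    SYNC_NV (byte 1, bit 2) is defined in SBC-2; later SBC revisions mark it
    obsolete. IMMED (byte 1, bit 1) asks for status before the flush completes.
    """
    flags = (0x04 if sync_nv else 0) | (0x02 if immed else 0)
    # Big-endian layout: opcode, flags, 32-bit LBA, group number (0),
    # 16-bit number of logical blocks, control byte.
    return struct.pack(">BBIBHB", SYNCHRONIZE_CACHE_10, flags, lba, 0,
                       num_blocks, control)
```

Because the command carries no data, a target can act on it on any path it arrives at, which is why a path-unaware driver can trigger the array behavior described here.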
 
When the SYNC CACHE command is issued to the secondary path with failover mode 2, it results in a LUN trespass. These trespass events delay I/O, which in turn can cause VCS resources and related commands to time out.
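
The two failover-mode cases can be summarized as a small lookup. This is an illustrative Python sketch, not driver logic: `sync_cache_outcome` and its labels are hypothetical and encode only the combinations this article describes.

```python
# Hypothetical summary of the behavior described in this article.
# Only the secondary-path cases for failover modes 1 and 2 are documented.
UNPATCHED_OUTCOMES = {
    (1, "secondary"): "EIO (SYNCHRONIZE CACHE command failed)",
    (2, "secondary"): "LUN trespass (I/O delays, possible VCS timeouts)",
}


def sync_cache_outcome(failover_mode, path_role, patched=False):
    """Return the described result of a SYNC CACHE sent down a path."""
    if patched and path_role == "secondary" and failover_mode in (1, 2):
        # Per the tested driver change: no writes reach the secondary path.
        return "no write to secondary path"
    return UNPATCHED_OUTCOMES.get((failover_mode, path_role), "not described")
```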

Driver changes have been tested and verified with failover modes 1 and 2 on the EMC CLARiiON array. With the changes, no writes were issued to the secondary path and no EIO messages were generated, which eliminated the previously observed delays.

Resolution

This problem is specific to Solaris 10 on x86 hardware.

Note: If you are experiencing this issue, contact Sun/Oracle. The sd driver patch specific to Solaris 10 (SunOS 5.10) Update 4 (kernel Generic_125101-10) is IDR 128268-03.
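
Before contacting support, it may help to confirm whether a host matches the release the IDR targets and whether the IDR is already present. The sketch below assumes `uname -r`, `uname -v`, and `uname -p` report `5.10`, `Generic_125101-10`, and `i386` on the affected x86 host, and that installed patches appear as `Patch: <id> Obsoletes: ...` lines in `showrev -p` output. The helper names are hypothetical, and whether the IDR is listed with an `IDR` prefix is an assumption, so both spellings are accepted.

```python
import re

TARGET_KERNEL = "Generic_125101-10"  # kernel named in the note above
IDR_ID = "128268-03"                 # IDR named in the note above


def idr_applies(release, version, processor):
    """True if uname -r/-v/-p values match the release the IDR targets."""
    return release == "5.10" and version == TARGET_KERNEL and processor == "i386"


def idr_installed(showrev_output, idr_id=IDR_ID):
    """Look for the IDR in showrev -p style lines ("Patch: <id> Obsoletes: ...")."""
    pattern = re.compile(r"^Patch:\s*(?:IDR)?" + re.escape(idr_id) + r"\b")
    return any(pattern.match(line.strip()) for line in showrev_output.splitlines())
```

The input strings can be collected from `uname` and `showrev -p` on the host and passed in; keeping the checks as pure functions makes them easy to test off-host.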

Issue/Introduction

At times, LUN trespassing occurs because of sd layer SYNC CACHE errors that generate I/O on all device paths. This results in CVM/VCS resource timeouts and delays in the execution of the Volume Manager commands vxdctl enable and vxdisksetup, and in vxconfigd.