Veritas Volume Manager (VxVM) 5.0 MP3 RP5 (AIX) DMP may treat a SCSI opcode response as a MEDIA failure due to HBA adapter setting (init_link)


Article ID: 100027306


Description

Error Message


Sample error message from /etc/vx/dmpevents.log:


Tue May 15 10:56:14.085: SCSI error occured on Path hdisk38: opcode=0x2a reported illegal request (status=0x2, key=0x5, asc=0x21, ascq=0x0)
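To read such a line, its fields can be decoded against the standard SCSI tables. The following is a minimal POSIX shell sketch that assumes the message format shown above. The function names field and decode_dmp_scsi_error are hypothetical, and the lookup tables cover only a handful of standard SCSI values, so they are not a complete decoder.

```shell
# Print the hex value that follows "name=" in a log line ($1 = line, $2 = name).
# Assumes each field is preceded by a space or "(", as in the sample message.
field() {
    printf '%s\n' "$1" | sed -n "s/.*[ (]$2=\(0x[0-9a-fA-F]*\).*/\1/p"
}

# Decode one dmpevents.log "SCSI error ... reported illegal request" line.
decode_dmp_scsi_error() {
    line=$1
    dev_path=$(printf '%s\n' "$line" | sed -n 's/.* on Path \([^:]*\):.*/\1/p')
    opcode=$(field "$line" opcode)
    status_code=$(field "$line" status)
    sense_key=$(field "$line" key)
    asc=$(field "$line" asc)
    ascq=$(field "$line" ascq)

    # A few standard SCSI operation codes.
    case $opcode in
        0x12)      op_name="INQUIRY" ;;
        0x28)      op_name="READ(10)" ;;
        0x2a|0x2A) op_name="WRITE(10)" ;;
        *)         op_name=$opcode ;;
    esac
    # SCSI status byte.
    case $status_code in
        0x0|0x00) status_name="GOOD" ;;
        0x2|0x02) status_name="CHECK CONDITION" ;;
        *)        status_name=$status_code ;;
    esac
    # Sense key.
    case $sense_key in
        0x3|0x03) key_name="MEDIUM ERROR" ;;
        0x4|0x04) key_name="HARDWARE ERROR" ;;
        0x5|0x05) key_name="ILLEGAL REQUEST" ;;
        *)        key_name=$sense_key ;;
    esac
    # Additional sense code / qualifier pairs.
    case "$asc/$ascq" in
        0x20/0x0|0x20/0x00) asc_name="INVALID COMMAND OPERATION CODE" ;;
        0x21/0x0|0x21/0x00) asc_name="LOGICAL BLOCK ADDRESS OUT OF RANGE" ;;
        0x24/0x0|0x24/0x00) asc_name="INVALID FIELD IN CDB" ;;
        *)                  asc_name="$asc/$ascq" ;;
    esac

    printf 'path=%s opcode=%s status=%s key=%s asc/ascq=%s\n' \
        "$dev_path" "$op_name" "$status_name" "$key_name" "$asc_name"
}
```

For the sample line above, decode_dmp_scsi_error prints: path=hdisk38 opcode=WRITE(10) status=CHECK CONDITION key=ILLEGAL REQUEST asc/ascq=LOGICAL BLOCK ADDRESS OUT OF RANGE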

 

Cause




Because 5.0 MP3 RP5 already includes the following VxVM product enhancement, the HBA adapter setting (init_link=al) appears to be the root cause of DMP incorrectly treating the SCSI response as a MEDIA (disk) failure.

incident:     2201149
abstract:     [Product Enhancement]Solaris - DMP should try all possibilities to service I/O upon receipt of a SCSI illegal request following HBA fault
product:      UNIX_VOLUME_MANAGER

Without the above product enhancement, Veritas DMP treats these SCSI “illegal request” responses as a MEDIA failure and does not attempt to validate any other path.
As a result, the LUN is reported as failed.

With the product enhancement introduced in 5.0 MP3 RP5, DMP will attempt to validate the true LUN state by checking the alternate paths.
 

In Summary:

1) We don't expect a SCSI "illegal request" to be returned for a SCSI inquiry request, because any SCSI device responds to an inquiry if it is accessible.

2) Even in a genuine case of an "illegal request", DMP will now try all the paths before eventually failing the I/O. The only side effect is that the paths are marked as DISABLED, which is the same outcome expected if the DMP restore task sent out a SCSI probe and it failed.

Instead of failing the device, DMP now fails the path, so the I/O request is retried via the other enabled paths. This prevents a failure on a single path from failing the whole device.
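The retry behaviour summarised above can be modelled as a loop over the paths. This is a hypothetical POSIX shell sketch, not DMP's implementation: dmp_model_issue_io and the "send" command it calls are illustrative names, and in practice the path state changes are made by DMP itself.

```shell
# Hypothetical model of the post-RP5 behaviour (NOT DMP source code).
# Each path is tried in turn; a path that returns "illegal request" is
# marked DISABLED and the next path is used. The I/O fails only after
# every path has failed.
#   $1     name of a command simulating the I/O on one path (hypothetical;
#          returns 0 on success, non-zero on "illegal request")
#   $2...  the paths to the LUN
dmp_model_issue_io() {
    send=$1
    shift
    for dev_path in "$@"; do
        if "$send" "$dev_path"; then
            echo "I/O completed on $dev_path"
            return 0
        fi
        echo "$dev_path: illegal request, path DISABLED"
    done
    echo "all paths failed, I/O failed"
    return 1
}
```

For example, with a stub fake_send() { [ "$1" != hdisk38 ]; } that fails only on hdisk38, running dmp_model_issue_io fake_send hdisk38 hdisk40 reports hdisk38 as DISABLED and completes the I/O on hdisk40 (hdisk40 is a made-up second path).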

 
Factors outside of DMP's control, such as an HBA adapter setting, may also influence the intended behaviour.
 

Resolution

 

Recommendations:


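Ensure that the "init_link" attribute is set to "pt2pt" on all HBA adapters. In the example below, the first command changes init_link on fcs0; the other two set fc_err_recov=fast_fail and dyntrk=yes on the corresponding fscsi0 protocol device.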
 

# chdev -l fcs0 -a init_link=pt2pt -P
# chdev -l fscsi0 -a fc_err_recov=fast_fail -P
# chdev -l fscsi0 -a dyntrk=yes -P


In a previous instance, adjusting this setting prevented the SCSI opcode messages from appearing in /etc/vx/dmpevents.log.
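To confirm the effect of the change, the messages can be counted per path before and after. This is a minimal sketch that assumes every entry follows the format of the sample message ("... on Path <path>: ... reported illegal request ..."); the function name count_illegal_requests is hypothetical.

```shell
# Count "reported illegal request" entries per path in a dmpevents.log file.
#   $1  log file (defaults to /etc/vx/dmpevents.log)
# Prints one "path count" pair per line; prints nothing if there are no entries.
count_illegal_requests() {
    log=${1:-/etc/vx/dmpevents.log}
    grep 'reported illegal request' "$log" |
        sed -n 's/.* on Path \([^:]*\):.*/\1/p' |
        sort | uniq -c |
        awk '{ print $2, $1 }'
}
```

An empty result after the adapter change is consistent with the outcome described above.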

 


Applies To

Environment

AIX
VxVM 5.0 MP3 RP5


How to change the HBA adapter setting


In this instance, the fcs0 HBA adapter setting for the Link Initialization protocol (init_link) is currently defined as "Arbitrated Loop" (al), whereas the other HBAs are set to "pt2pt".


# lsattr -El fcs0
bus_intr_lvl  147        Bus interrupt level                                False
bus_io_addr   0xef800    Bus I/O address                                    False
bus_mem_addr  0xf0081000 Bus memory address                                 False
init_link     al         INIT Link flags                                    True
intr_priority 3          Interrupt priority                                 False
lg_term_dma   0x800000   Long term DMA                                      True
max_xfer_size 0x100000   Maximum Transfer Size                              True
num_cmd_elems 200        Maximum number of COMMANDS to queue to the adapter True
pref_alpa     0x1        Preferred AL_PA                                    True
sw_fc_class   2          FC Class for Fabric                                True


Comparison:


For fcs1, the Link Initialization protocol (init_link) is set to "point to point" (pt2pt). In this instance, fcs1 is not triggering any SCSI "illegal request" messages.

# lsattr -El fcs1
bus_intr_lvl  148        Bus interrupt level                                False
bus_io_addr   0xefc00    Bus I/O address                                    False
bus_mem_addr  0xf0080000 Bus memory address                                 False
init_link     pt2pt      INIT Link flags                                    True
intr_priority 3          Interrupt priority                                 False
lg_term_dma   0x800000   Long term DMA                                      True
max_xfer_size 0x100000   Maximum Transfer Size                              True
num_cmd_elems 200        Maximum number of COMMANDS to queue to the adapter True
pref_alpa     0x1        Preferred AL_PA                                    True
sw_fc_class   2          FC Class for Fabric                                True
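The comparison above can be automated across all adapters. This sketch assumes POSIX sh on AIX, uses lsdev -Cc adapter -F name to list adapters, and parses lsattr -El output; the function names are hypothetical. It only prints the suggested chdev command rather than running it, so the change can be reviewed first.

```shell
# Read "lsattr -El fcsN" output on stdin and print the init_link value.
init_link_from_lsattr() {
    awk '$1 == "init_link" { print $2 }'
}

# On AIX: report every fcs adapter whose init_link is not pt2pt, together
# with the chdev command from the Recommendations. Nothing is changed.
check_fc_init_link() {
    for adapter in $(lsdev -Cc adapter -F name | grep '^fcs'); do
        value=$(lsattr -El "$adapter" | init_link_from_lsattr)
        if [ "$value" != "pt2pt" ]; then
            echo "$adapter: init_link=$value, suggest: chdev -l $adapter -a init_link=pt2pt -P"
        fi
    done
}
```

With the two adapters shown above, check_fc_init_link would report only fcs0.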
 

Note: Veritas recommends engaging IBM support for guidance before modifying any HBA adapter setting.

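The attribute change requires a reboot of the server: with -P, chdev records the change in the ODM only, and it is applied to the device at the next restart.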


 

Issue/Introduction


On AIX, the Fibre Channel host adapter settings can influence the SCSI opcode response returned to DMP and trigger false MEDIA failures, resulting in a disk being marked as "failed" by DMP and Veritas Volume Manager (VxVM).

The AIX "link initialization" protocol (init_link) can be set to "point to point" (pt2pt) or "arbitrated loop" (al):

init_link:
    from online help:
    "Do not change this attribute unless directed by IBM support."
    al: arbitrated loop: first tries al, then pt2pt; if neither succeeds, the link remains down
    pt2pt: tries pt2pt; if not successful, the link remains down   (recommended in this instance)
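The fallback order quoted from the online help can be expressed as a small model. This is a hypothetical illustration only: negotiate_link is not an AIX command, and it does not reflect how the adapter or driver actually performs link initialization. Given the init_link setting and the topologies the attached port accepts, it reports whether the link comes up.

```shell
# Hypothetical model of the init_link fallback order (illustration only).
#   $1     init_link setting: al or pt2pt
#   $2...  topologies the attached port accepts (al and/or pt2pt)
negotiate_link() {
    setting=$1
    shift
    case $setting in
        al)    order="al pt2pt" ;;   # try al first, then fall back to pt2pt
        pt2pt) order="pt2pt" ;;      # try pt2pt only
        *)     echo "unknown init_link value: $setting" >&2; return 2 ;;
    esac
    for mode in $order; do
        for accepted in "$@"; do
            if [ "$mode" = "$accepted" ]; then
                echo "link up ($mode)"
                return 0
            fi
        done
    done
    echo "link down"
    return 1
}
```

In this model, negotiate_link al pt2pt prints "link up (pt2pt)", showing the al-to-pt2pt fallback, while negotiate_link pt2pt al prints "link down".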

After the "link initialization" protocol was changed, the SCSI opcode messages were no longer recorded in /etc/vx/dmpevents.log (associated with the Veritas vxesd event source daemon), and the problematic disk remained stable from then on.


With 5.0 MP3 RP5 for VxVM on AIX, DMP should now try all possibilities to service I/O upon receipt of a SCSI illegal request following HBA failure.
See Veritas Article: 000010578  
Prior to the above product enhancement, DMP was designed to behave as follows:

Upon receipt of a SCSI "illegal request" response, DMP treated the "illegal request" sense key as a device (MEDIA) failure, resulting in an application I/O failure even when the device was healthy and other paths were available.
 
When an I/O error occurs on a path of a DMP device, DMP performs I/O error analysis on that path; as part of this analysis, it sends a SCSI inquiry command down that path. In this instance, the SCSI inquiry returned a CHECK CONDITION status with the sense key "illegal request". DMP treated this event as a disk failure instead of a path failure, and therefore disabled the DMP device and failed the application I/O on it.

An "illegal request" typically suggests that the I/O will fail on any of the paths, so DMP does not retry. The probable sequence is:

•    DMP sends a correct SCSI inquiry request
•    The HBA layer receives the request but responds with an “illegal request”
•    DMP receives this as a response from the device saying the request is invalid
•    Since the response appears to come from the end device, which is typically the same across all of the paths, DMP fails the I/O

Since in this case the error returned by the device indicated a media failure, DMP did not retry any of the related paths. In the event that the error indicated a path failure or the HBA could not send the packet request to the device, DMP would have tried an alternate path.


Product design change:

To handle this event, from 5.0 MP3 RP5 onwards DMP tries alternate paths instead of failing the I/O, by changing the action to a DMP_PATH_FAILURE event.
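The design change amounts to a different decision on the same sense key. In this hypothetical sketch (not DMP source code), MEDIA_FAILURE is an illustrative label for the pre-RP5 outcome, DMP_PATH_FAILURE is the action named above, and the argument values pre-rp5 and rp5 are made up for the example.

```shell
# Hypothetical model of the DMP design change (NOT DMP source code).
#   $1  behaviour: pre-rp5 or rp5
#   $2  sense key returned to DMP's error-analysis SCSI inquiry
classify_inquiry_error() {
    behaviour=$1
    sense_key=$2
    case $sense_key in
        0x5|0x05)   # ILLEGAL REQUEST
            if [ "$behaviour" = "rp5" ]; then
                echo "DMP_PATH_FAILURE"   # disable the path, retry on other paths
            else
                echo "MEDIA_FAILURE"      # disable the DMP device, fail the I/O
            fi
            ;;
        *)
            echo "OTHER"                  # outside the scope of this article
            ;;
    esac
}
```

For the same "illegal request" sense key (0x5), the pre-RP5 model returns MEDIA_FAILURE and the RP5 model returns DMP_PATH_FAILURE.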