Loss of SCSI-3 Persistent Group Reservation (PGR) Keys during cluster reconfiguration if there are 32 or more keys per LUN

Article ID: 100001750

Resolution

A new incident was found in the way the Dynamic Multi-Pathing (DMP) module handles PGR keys. Because of a bug in interpreting the byte count in the response buffer of the SCSI Read PGR Keys command, the DMP module can currently handle a maximum of 31 PGR keys. The response buffer returns the PGR keys together with a 4-byte value that records the total number of bytes occupied by those keys, but the DMP module currently recognizes only one byte of that 4-byte value. Each PGR key consists of 8 bytes, and a 1-byte value can represent at most 255, so DMP can see at most 31 keys (248/8). When there are N keys and N is greater than or equal to 32, DMP can currently read only the first M keys, where M equals N modulo 32. For example, if there are 36 keys, DMP reads only the first 4, because 36 modulo 32 equals 4; with exactly 32 keys, DMP reads none.
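
The truncation described above can be reproduced with a short Python sketch. This is not DMP code. It assumes the SPC-3 layout of the READ KEYS parameter data (a 4-byte PRGENERATION field, a 4-byte big-endian ADDITIONAL LENGTH field, then the 8-byte keys), and it assumes that the single byte DMP recognizes is the low-order byte of the length, which is what the modulo-32 behaviour implies. All function names are hypothetical.

```python
import struct

KEY_SIZE = 8  # each PGR key is 8 bytes


def build_read_keys_response(keys, generation=0):
    """Build READ KEYS parameter data: PRGENERATION, ADDITIONAL LENGTH, keys."""
    header = struct.pack(">II", generation, len(keys) * KEY_SIZE)
    return header + b"".join(keys)


def parse_keys(buf, buggy=False):
    """Return the keys found in buf.

    buggy=True mimics a parser that honours only one byte of the
    4-byte length (assumed here to be the low-order byte).
    """
    (length,) = struct.unpack_from(">I", buf, 4)
    if buggy:
        length &= 0xFF
    body = buf[8:8 + length]
    return [body[i:i + KEY_SIZE] for i in range(0, len(body), KEY_SIZE)]


def visible_key_counts(n):
    """Return (keys seen by a correct parser, keys seen by the buggy parser)."""
    keys = [struct.pack(">Q", i + 1) for i in range(n)]
    buf = build_read_keys_response(keys)
    return len(parse_keys(buf)), len(parse_keys(buf, buggy=True))


for n in (31, 32, 36):
    print(n, visible_key_counts(n))
# 31 (31, 31)
# 32 (32, 0)
# 36 (36, 4)
```

With 36 keys the length field holds 288; its low-order byte is 32, which corresponds to 4 keys, matching the example in the text.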

In a Cluster Volume Manager (CVM) environment, the number of keys registered per LUN is determined by the number of nodes in the CVM cluster and the number of paths to the LUN. For the Active/Active diskarray type, each active CVM node always registers one key for each DMP path. For the Active/Passive diskarray type, each active CVM node initially registers a PGR key only on the primary paths; during path failover, an additional PGR key is registered on each secondary path. As a result, the maximum number of PGR keys that can be registered on each LUN is the product of the number of CVM cluster nodes and the number of paths to the LUN.
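
As a rough illustration of the counting rule above, the following Python sketch estimates how many keys a LUN holds initially and in the worst case. The function name and parameters are hypothetical, and the model is a simplification: it assumes every node sees the same number of primary and secondary paths.

```python
def pgr_keys_per_lun(nodes, primary_paths, secondary_paths=0, array_type="A/A"):
    """Return (initial, maximum) PGR key counts for one LUN.

    A/A: every node registers a key on every path straight away.
    A/P: every node registers on primary paths first; keys on the
         secondary paths are added only during path failover.
    """
    if nodes < 1 or primary_paths < 1 or secondary_paths < 0:
        raise ValueError("node and path counts must be positive")
    total_paths = primary_paths + secondary_paths
    maximum = nodes * total_paths
    if array_type == "A/A":
        initial = maximum
    elif array_type == "A/P":
        initial = nodes * primary_paths
    else:
        raise ValueError("array_type must be 'A/A' or 'A/P'")
    return initial, maximum


print(pgr_keys_per_lun(4, 8))                  # (32, 32)
print(pgr_keys_per_lun(4, 4, 4, "A/P"))        # (16, 32)
```

In the second case the LUN starts with 16 keys but can reach 32 after failover, so an Active/Passive configuration may cross the threshold only after a path failure.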

Because DMP reads an incomplete list of PGR keys, some PGR keys may be incorrectly removed from the LUNs during a CVM cluster reconfiguration. VxVM then returns random write errors to applications or file systems whenever DMP selects a path that no longer has a PGR key for a write operation.
 
The incident affects only the following combinations of VxVM versions, platforms, and diskarray types.
 
1. On Linux, Solaris, and HP-UX, the incident affects only pre-5.1 VxVM installations with A/P (Active/Passive) and ALUA (Asymmetric Logical Unit Access) diskarrays.
2. On AIX, the incident affects both pre-5.1 and 5.1 VxVM installations with all diskarray types.

Until the fix for the Etrack incidents listed in the Additional Information section of this article is available, Veritas advises customers to limit the number of PGR keys on a LUN to a maximum of 31. In a CVM environment, please ensure that the number of CVM nodes multiplied by the number of paths to each LUN is less than 32.
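
The workaround limit can be checked with simple arithmetic. The Python sketch below is only an illustration of the rule stated above (nodes times paths must stay below 32); the helper names are hypothetical and it is not a Veritas tool.

```python
MAX_SAFE_KEYS = 31  # largest key count the affected DMP versions read correctly


def is_safe(nodes, paths_per_lun):
    """True if the worst-case key count on a LUN stays within the limit."""
    if nodes < 1 or paths_per_lun < 1:
        raise ValueError("node and path counts must be positive")
    return nodes * paths_per_lun <= MAX_SAFE_KEYS


def max_paths_per_lun(nodes):
    """Largest number of paths per LUN that keeps the configuration safe."""
    if nodes < 1:
        raise ValueError("nodes must be positive")
    return MAX_SAFE_KEYS // nodes


print(is_safe(4, 7), is_safe(4, 8))   # True False
print(max_paths_per_lun(4))           # 7
print(max_paths_per_lun(8))           # 3
```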

This problem is addressed, along with other enhancements, through Etrack 1082077 on all platforms except AIX. On those platforms the fix has been present in VxVM 5.1 since its initial release. On AIX, the issue in VxVM 5.1 is addressed through Etrack 2064998, a child of Etrack 2040150.
 
Available Fixes
-----------------------
The fix is already available in the VxVM 5.0MP3 RP4 patch for AIX and Solaris.
 
Future Fixes
------------------
The fix will be included in the following patches:
AIX VxVM 5.1 RP2
HP-UX 5.0.1 RP2
Linux VxVM 5.0 MP4 RP1

Issue/Introduction

Loss of SCSI-3 Persistent Group Reservation (PGR) Keys during cluster reconfiguration if there are 32 or more keys per LUN

Additional Information

ETrack: 2040150
ETrack: 2064998
ETrack: 1082077