False DCPA (Data Corruption Protection Activated) events for JBOD or Internal disks can occur when the original LUN Serial Number cannot be retrieved from the original Page Code offset


Article ID: 100049898


Description

Error Message

 

The following messages suggest that the LUN serial number has changed for internal disk "sda":


fred vxvm:vxconfigd: V-5-1-14523 LUN serial number of the OS device path with device number 8/0 has changed from 600605B00D2DDE2021A9D944143610BB (sda) to 00bb10361444d9a92120de2d0db00506 (sda)
fred vxvm:vxconfigd: V-5-1-14522 Attempt to isolate DMP node 8/0 failed, error retuned is Device or resource busy: Device or resource busy
fred vxvm:vxconfigd: V-5-1-8769 ddl_find_devices_in_system: ddl_reconfigure_all failed: Device or resource busy: Device or resource busy
fred vxvm:vxconfigd: V-5-1-16011 Data Corruption Protection Activated - User Corrective Action Needed:
fred vxvm:vxconfigd: To recover, first ensure that the OS device tree is up to date (requires OS specific commands).
fred vxvm:vxconfigd: Then, execute 'vxdisk rm' on the following devices before reinitiating device discovery using 'vxdisk scandisks'
fred vxvm:vxconfigd: V-5-1-0 fred_disk_4
fred vxvm:vxconfigd: V-5-1-13790 No device configuration changes have been applied to DMP kernel database.
fred vxvm:vxconfigd: V-5-1-13791 Please consult the documentation for correct procedure to replace disk/path.
fred kernel: VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x5) on dmpnode 201/0x490

 

 

Cause


Vital Product Data (VPD) can be read from more than one location on a disk, through two main Page Codes: 0x80 and 0x83.

Each vendor may populate Page Codes 0x80 and 0x83 differently, which is why the Veritas Array Support Libraries (ASLs) are so critical.
 

Vital Product Data from Page Code 0x83

Serial Number details can be obtained from Page 0x83 by default


Vendor Specific Data from Page Code 0x80 – Alternate serial number information

Some vendors use Page 0x80 to store their ‘cabinet serial number’

ASLs

Standard ASLs (Array Support Libraries) check for UDID (Unique Disk Identifier) attributes at fixed page code locations.

The scsi3_jbod category, by contrast, uses a flexible approach and claims any disk that supports either page 0x80 or page 0x83. This is mainly required to work with as many SCSI-3 compliant devices as possible.

Because vxconfigd does not record whether a disk was originally claimed under the scsi3_jbod category with an identifier from page code 0x83, disk claiming errors and other related errors force vxconfigd to check the alternate page code 0x80 for the LUN Serial Number (LSN) and Cabinet Serial Number (CSN) details.

Not checking page 0x80 would reduce the DMP support matrix and could even cause problems claiming disks that are already under DMP control as scsi3_jbod devices with an identifier taken from page 0x80.

The downside is that the LUN Serial Numbers and Cabinet Serial Numbers may differ between the two page code locations, resulting in a false DCPA event and notification.
 

If a DMPNODE is detected with different disk attributes, access to the DMPNODE is stopped by the DCPA feature.

One symptom could be a File Descriptor (FD) Leak with vxconfigd preventing the disk from being accessed correctly.
 

Sample error message

fred vxvm:vxconfigd: V-5-1-12223 Error in claiming /dev/sd##: Too many open files
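
To check whether vxconfigd is approaching its open-file limit, the descriptor count can be compared against the per-process limit. The following is a minimal sketch for Linux that relies only on the /proc filesystem. The function names are hypothetical, and the use of `pgrep -x vxconfigd` to find the process ID is an assumption about the local environment.

```shell
#!/bin/sh
# Hypothetical helpers (not part of VxVM) for inspecting file descriptor usage.

# Count the file descriptors a process currently holds open.
# Each entry under /proc/<pid>/fd represents one open descriptor.
count_open_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Print the soft "Max open files" limit of a process.
# In /proc/<pid>/limits the soft limit is the fourth whitespace-separated field.
max_open_fds() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Example usage (assumes a single vxconfigd process is running):
#   pid=$(pgrep -x vxconfigd)
#   echo "vxconfigd: $(count_open_fds "$pid") open of $(max_open_fds "$pid") allowed"
```

A count that keeps growing towards the limit across repeated device discoveries would be consistent with the FD leak symptom described above.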

 

Resolution


The DCPA events are related to VxVM running "vxdisk scandisks" to generate the DMP tree.

The corresponding SCSI inquiry command fails to extract the serial number from page 0x83 for the problematic disk, and instead uses the details from the alternate page 0x80 to form the disk serial number.
 
The serial number details newly discovered from page 0x80 differ from those held in the current DMP database (originally taken from page 0x83). This leads to the false DCPA event, which disables the DMPNODE and causes an unplanned operational outage.
 

Sample evidence:
 
Internal disks sda/sdb/sdc/sdd/sde are single-path DMPNODEs.

- sda/sdb/sdc come from c0/c1

- sdd/sde come from c2.
 

# vxdmpadm getsubpaths ctlr=c0
NAME         STATE[A]   PATH-TYPE[M] DMPNODENAME  ENCLR-TYPE   ENCLR-NAME     ATTRS        PRIORITY
===================================================================================================
sda          ENABLED(A)     -          fred_disk_4  Disk         disk            -         -


# vxdmpadm getsubpaths ctlr=c1
NAME         STATE[A]   PATH-TYPE[M] DMPNODENAME  ENCLR-TYPE   ENCLR-NAME     ATTRS        PRIORITY
===================================================================================================
sdb          ENABLED(A)     -          fred_disk_2  Disk         disk            -         -
sdc          ENABLED(A)     -          fred_disk_3  Disk         disk            -         -

 

# vxdmpadm getsubpaths ctlr=c2
NAME         STATE[A]   PATH-TYPE[M] DMPNODENAME  ENCLR-TYPE   ENCLR-NAME     ATTRS        PRIORITY
===================================================================================================
sde          ENABLED(A)     -          fred_disk_0  Disk         disk            -         -
sdd          ENABLED(A)     -          fred_disk_1  Disk         disk            -         -


DCPA events occurred on the internal disks (sda/sdb/sdc) on different days:

fred vxvm:vxconfigd[12675]: V-5-1-14523 LUN serial number of the OS device path with device number 8/0 has changed from 600605B00D2DDE2021A9D944143610BB (sda) to 00bb10361444d9a92120de2d0db00506 (sda)

fred vxvm:vxconfigd: V-5-1-14523 LUN serial number of the OS device path with device number 8/16 has changed from 600605B009C17E6021949B6E148F16C7 (sdb) to 00c7168f146e9b9421607ec109b00506 (sdb)

fred vxvm:vxconfigd: V-5-1-14523 LUN serial number of the OS device path with device number 8/32 has changed from 600605B009C17E6021949B78152AC2DE (sdc) to 00dec22a15789b9421607ec109b00506 (sdc)
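
The old and new serial numbers in these messages can be extracted into a compact form for comparison. The following is a minimal sketch, assuming the V-5-1-14523 message layout shown above; the function name `lsn_changes` is hypothetical, and the syslog path in the usage comment is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print "device old_lsn new_lsn" for every
# V-5-1-14523 message read from standard input.
# Serial numbers are lower-cased so they can be compared with other output.
lsn_changes() {
    awk '/V-5-1-14523/ {
        for (i = 1; i <= NF; i++) {
            # "... changed from <old> (<dev>) to <new> (<dev>)"
            if ($i == "from") { old = tolower($(i + 1)); dev = $(i + 2) }
            if ($i == "to")   { cur = tolower($(i + 1)) }
        }
        gsub(/[()]/, "", dev)   # "(sda)" -> "sda"
        print dev, old, cur
    }'
}

# Example usage (log location is an assumption):
#   grep -h V-5-1-14523 /var/log/messages* | lsn_changes | sort -u
```

Lower-casing matters here because the messages print the original LSN in upper case, whereas the new LSN appears in lower case.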

 

VRTSexplorer evidence review


The original (old) LSN and the new LSN can both be retrieved from the vxscsiinq file captured by the VRTSexplorer.

The output confirms that the original LSN came from page 0x83 and that the new LSN now comes from page 0x80:

Sample output:

# egrep -a -A 20 "/dev/sda,|/dev/sdb,|/dev/sdc," vxscsiinq | egrep -a "evpd 0x1|Product serial number|Data"
Inquiry for /dev/sda, evpd 0x1, page code 0x80
Product serial number            : 00bb10361444d9a92120de2d0db00506   <<<<<  new LSN of sda
Inquiry for /dev/sda, evpd 0x1, page code 0x83
Data                : 600605b00d2dde2021a9d944143610bb       <<<<<   old LSN of sda
Inquiry for /dev/sdb, evpd 0x1, page code 0x80
Product serial number            : 00c7168f146e9b9421607ec109b00506   <<<<< new LSN of sdb
Inquiry for /dev/sdb, evpd 0x1, page code 0x83
Data                : 600605b009c17e6021949b6e148f16c7     <<<<< old LSN of sdb
Inquiry for /dev/sdc, evpd 0x1, page code 0x80
Product serial number            : 00dec22a15789b9421607ec109b00506    <<<<< new LSN of sdc
Inquiry for /dev/sdc, evpd 0x1, page code 0x83
Data                : 600605b009c17e6021949b78152ac2de    <<<<< old LSN of sdc
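
The page 0x80 and page 0x83 values can also be paired per device so that any mismatch stands out. The following is a minimal sketch that reads the filtered output shown above (the lines selected by the egrep pipeline) from standard input. The function name `vpd_pairs` is hypothetical, and the sketch assumes the line layout of this sample; it keeps only the first serial number seen for each page and device.

```shell
#!/bin/sh
# Hypothetical helper: for each device print
#   <device> <page 0x80 serial> <page 0x83 data> <same|differ>
vpd_pairs() {
    awk '
    /^Inquiry for/ {
        dev = $3; sub(/,$/, "", dev)      # "/dev/sda," -> "/dev/sda"
        page = $NF                         # "0x80" or "0x83"
        if (!(dev in seen)) { seen[dev] = 1; order[++n] = dev }
        next
    }
    /^Product serial number/ && page == "0x80" && !(dev in p80) {
        split($0, a, ":"); split(a[2], b, " ")
        p80[dev] = tolower(b[1])           # first token after the colon
    }
    /^Data/ && page == "0x83" && !(dev in p83) {
        split($0, a, ":"); split(a[2], b, " ")
        p83[dev] = tolower(b[1])
    }
    END {
        for (i = 1; i <= n; i++) {
            d = order[i]
            print d, p80[d], p83[d], (p80[d] == p83[d] ? "same" : "differ")
        }
    }'
}

# Example usage, reusing the pipeline from this article:
#   egrep -a -A 20 "/dev/sda,|/dev/sdb,|/dev/sdc," vxscsiinq \
#     | egrep -a "evpd 0x1|Product serial number|Data" | vpd_pairs
```

A "differ" result means the two pages report different identifiers for that device, so it only shows that a fallback to page 0x80 would yield a different serial number.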


DMP errors are observed on fred_disk_2/3/4.

Error types include EIO(0x5) error and DMP_CONN_FAILURE(0x20d) error.

The disks span controller ids c0 & c1.

# grep err /var/adm/vx/dmpevents.log* | awk '{print $(NF-3)" "$NF}' | sort | uniq -c
   1612 fred_disk_2(201/1024)
   2079 fred_disk_3(201/1008)
   1965 fred_disk_4(201/1040)
    540 sda(201/1040)
 

# awk '{print $(NF-3)" "$NF}' /var/adm/vx/dmpevents.log* | sort | uniq -c
      1 Binary matches
    486 (errno=0x20d) fred_disk_2(201/1024)
    535 (errno=0x20d) fred_disk_3(201/1008)
   1126 (errno=0x5) fred_disk_2(201/1024)
   1544 (errno=0x5) fred_disk_3(201/1008)
   1965 (errno=0x5) fred_disk_4(201/1040)
    540 (errno=0x5) sda(201/1040)
      5 invalid code
 

# lsblk_S
NAME  HCTL       TYPE VENDOR   MODEL             REV TRAN
sda   0:2:0:0    disk Intel    RMS3CC080        4.68      <<<<<<  hit error
sdb   1:2:0:0    disk Intel    RS3SC008         4.68      <<<<<< hit error
sdc   1:2:6:0    disk Intel    RS3SC008         4.68      <<<<<< hit error
sdd   2:1:1:0    disk LSI      Logical Volume   3000 sas
sde   2:1:0:0    disk LSI      Logical Volume   3000 sas


NOTE: No IO error messages reported from fred_disk_0/1(sdd/sde) whose controller id is c2.

Issue/Introduction

Veritas Volume Manager (VxVM) 5.0 MP3 introduced the "Data Corruption Protection Activated" (DCPA) feature, which puts a protective bubble around each DMP metadevice.

The DCPA feature helps to safeguard your data and can prevent data corruption.

This is a proactive message, rather than a warning about specific corruption. Using the "bubble protector" technology, the feature prevents newly presented paths from merging with an incorrect pre-existing DMP metadevice. This greatly reduces the chance of data corruption when incorrect LUN removal steps are used or the critical storage provisioning process is not followed.