The following messages suggest that the LUN serial number has changed for the internal disk "sda":
fred vxvm:vxconfigd: V-5-1-14523 LUN serial number of the OS device path with device number 8/0 has changed from 600605B00D2DDE2021A9D944143610BB (sda) to 00bb10361444d9a92120de2d0db00506 (sda)
fred vxvm:vxconfigd: V-5-1-14522 Attempt to isolate DMP node 8/0 failed, error retuned is Device or resource busy: Device or resource busy
fred vxvm:vxconfigd: V-5-1-8769 ddl_find_devices_in_system: ddl_reconfigure_all failed: Device or resource busy: Device or resource busy
fred vxvm:vxconfigd: V-5-1-16011 Data Corruption Protection Activated - User Corrective Action Needed:
fred vxvm:vxconfigd: To recover, first ensure that the OS device tree is up to date (requires OS specific commands).
fred vxvm:vxconfigd: Then, execute 'vxdisk rm' on the following devices before reinitiating device discovery using 'vxdisk scandisks'
fred vxvm:vxconfigd: V-5-1-0 fred_disk_4
fred vxvm:vxconfigd: V-5-1-13790 No device configuration changes have been applied to DMP kernel database.
fred vxvm:vxconfigd: V-5-1-13791 Please consult the documentation for correct procedure to replace disk/path.
fred kernel: VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x5) on dmpnode 201/0x490
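The recovery steps named in the V-5-1-16011 message can be turned into a dry-run checklist. The sketch below is a hypothetical helper (dcpa_recovery_plan is not a Veritas tool). It reads a log excerpt on standard input, assumes each affected device is listed on its own "V-5-1-0 <name>" line as in the sample above, and only prints the commands rather than running them. Review the printed plan against the product documentation before executing anything.

```shell
# Hypothetical helper, not part of VxVM: print (do not run) the recovery
# commands suggested by the V-5-1-16011 message.
# Assumption: each affected device appears on a line ending "V-5-1-0 <name>".
dcpa_recovery_plan() {
    awk '
        # A device line ends with: V-5-1-0 <device name>
        NF >= 2 && $(NF - 1) == "V-5-1-0" { devs[++n] = $NF }
        END {
            if (n == 0) exit 1          # nothing to do, signal it to the caller
            print "# refresh the OS device tree first (OS specific)"
            for (i = 1; i <= n; i++)
                print "vxdisk rm " devs[i]
            print "vxdisk scandisks"
        }'
}
```

For the excerpt above, the plan would consist of "vxdisk rm fred_disk_4" followed by "vxdisk scandisks".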
Vital Product Data (VPD) for a disk can be read from two main SCSI inquiry page codes, 0x80 and 0x83.
Each vendor may populate page codes 0x80 and 0x83 differently, which is why the Veritas Array Support Libraries (ASLs) are so critical.
Vital Product Data from Page Code 0x83
Serial Number details can be obtained from Page 0x83 by default
Vendor Specific Data from Page Code 0x80 – Alternate serial number information
Some vendors use Page 0x80 to store their ‘cabinet serial number’
ASLs
Standard ASLs (Array Support Libraries) check for UDID (Unique Disk Identifier) attributes at fixed page code locations.
The scsi3_jbod category instead takes a flexible approach and claims any disk that supports either page 0x80 or page 0x83. This is mainly required to work with as many SCSI-3 compliant devices as possible.
Because vxconfigd does not record whether a disk was originally claimed under the scsi3_jbod category with an identifier from page code 0x83, disk claiming and other related errors force it to check the alternate page code 0x80 for the LUN serial number (LSN) and cabinet serial number (CSN).
Not checking page 0x80 would shrink the DMP support matrix and could even cause problems claiming disks that are already under DMP as scsi3_jbod with identifiers taken from page 0x80.
The downside is that the LUN serial numbers and cabinet serial numbers may differ between the two page code locations, resulting in a false Data Corruption Protection Activated (DCPA) event and notification.
If a DMPNODE is detected with different disk attributes, access to the DMPNODE is stopped by the DCPA feature.
One possible symptom is a file descriptor (FD) leak in vxconfigd, which prevents the disk from being accessed correctly.
Sample error message:
fred vxvm:vxconfigd: V-5-1-12223 Error in claiming /dev/sd##: Too many open files
These DCPA events arise when VxVM runs "vxdisk scandisks" to generate the DMP tree.
The corresponding SCSI inquiry command fails to extract the serial number from page 0x83 for the problematic disk, so the details from the alternate page 0x80 are used to form the disk serial number instead.
The serial number newly discovered from page 0x80 differs from the value held in the DMP database (originally from page 0x83). This leads to a false DCPA event, which disables the DMPNODE and causes an unplanned operational outage.
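The comparison described above can be sketched as a small check. This only illustrates the reasoning and is not vxconfigd's actual logic; the function name and its labels ("page-fallback", "changed", "unchanged") are made up for the example. It takes the old LSN, the new LSN, and the values currently returned by page 0x83 and page 0x80.

```shell
# Hypothetical helper: decide whether an LSN change looks like a page
# fallback (identifier now read from 0x80 instead of 0x83) rather than a
# genuinely different disk.
# Arguments: old LSN, new LSN, page 0x83 value, page 0x80 value.
classify_lsn_change() {
    # Hex strings are compared case-insensitively.
    old=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    new=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    p83=$(printf '%s' "$3" | tr 'A-F' 'a-f')
    p80=$(printf '%s' "$4" | tr 'A-F' 'a-f')

    if [ "$old" = "$new" ]; then
        echo "unchanged"
    elif [ "$old" = "$p83" ] && [ "$new" = "$p80" ]; then
        echo "page-fallback"    # same disk, identifier taken from page 0x80
    else
        echo "changed"          # neither page explains it, investigate further
    fi
}
```

Called with the two sda values from the V-5-1-14523 message and the matching page 0x83 and page 0x80 values for the same disk, it reports "page-fallback", which is the false DCPA case.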
Sample evidence:
Internal disks sda/sdb/sdc/sdd/sde are single-path DMPNODEs.
- sda/sdb/sdc come from c0/c1
- sdd/sde come from c2.
# vxdmpadm getsubpaths ctlr=c0
NAME STATE[A] PATH-TYPE[M] DMPNODENAME ENCLR-TYPE ENCLR-NAME ATTRS PRIORITY
===================================================================================================
sda ENABLED(A) - fred_disk_4 Disk disk - -
# vxdmpadm getsubpaths ctlr=c1
NAME STATE[A] PATH-TYPE[M] DMPNODENAME ENCLR-TYPE ENCLR-NAME ATTRS PRIORITY
===================================================================================================
sdb ENABLED(A) - fred_disk_2 Disk disk - -
sdc ENABLED(A) - fred_disk_3 Disk disk - -
# vxdmpadm getsubpaths ctlr=c2
NAME STATE[A] PATH-TYPE[M] DMPNODENAME ENCLR-TYPE ENCLR-NAME ATTRS PRIORITY
===================================================================================================
sde ENABLED(A) - fred_disk_0 Disk disk - -
sdd ENABLED(A) - fred_disk_1 Disk disk - -
DCPA events occurred on the internal disks (sda/sdb/sdc) on different days:
fred vxvm:vxconfigd[12675]: V-5-1-14523 LUN serial number of the OS device path with device number 8/0 has changed from 600605B00D2DDE2021A9D944143610BB (sda) to 00bb10361444d9a92120de2d0db00506 (sda)
fred vxvm:vxconfigd: V-5-1-14523 LUN serial number of the OS device path with device number 8/16 has changed from 600605B009C17E6021949B6E148F16C7 (sdb) to 00c7168f146e9b9421607ec109b00506 (sdb)
fred vxvm:vxconfigd: V-5-1-14523 LUN serial number of the OS device path with device number 8/32 has changed from 600605B009C17E6021949B78152AC2DE (sdc) to 00dec22a15789b9421607ec109b00506 (sdc)
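When many of these messages are present, it can help to reduce them to one line per event. The following is a minimal sketch that assumes the V-5-1-14523 wording is exactly as shown above; lsn_changes is a hypothetical name. It prints the device number, the OS path, the old LSN and the new LSN.

```shell
# Hypothetical helper: print "devno path old_lsn new_lsn" for every
# V-5-1-14523 message read from standard input. Other lines are ignored.
lsn_changes() {
    sed -n 's|.*V-5-1-14523 .*device number \([0-9]*/[0-9]*\) has changed from \([0-9A-Fa-f]*\) (\([^)]*\)) to \([0-9A-Fa-f]*\) .*|\1 \3 \2 \4|p'
}
```

The first message above would be reduced to "8/0 sda 600605B00D2DDE2021A9D944143610BB 00bb10361444d9a92120de2d0db00506".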
VRTSexplorer evidence review
It is possible to retrieve the original (old) LSN and the new LSN from the vxscsiinq file captured by the VRTSexplorer.
This confirms that the original LSN came from page 0x83 and that the new LSN now comes from page 0x80:
Sample output:
# egrep -a -A 20 "/dev/sda,|/dev/sdb,|/dev/sdc," vxscsiinq | egrep -a "evpd 0x1|Product serial number|Data"
Inquiry for /dev/sda, evpd 0x1, page code 0x80
Product serial number : 00bb10361444d9a92120de2d0db00506 <<<<< new LSN of sda
Inquiry for /dev/sda, evpd 0x1, page code 0x83
Data : 600605b00d2dde2021a9d944143610bb <<<<< old LSN of sda
Inquiry for /dev/sdb, evpd 0x1, page code 0x80
Product serial number : 00c7168f146e9b9421607ec109b00506 <<<<< new LSN of sdb
Inquiry for /dev/sdb, evpd 0x1, page code 0x83
Data : 600605b009c17e6021949b6e148f16c7 <<<<< old LSN of sdb
Inquiry for /dev/sdc, evpd 0x1, page code 0x80
Product serial number : 00dec22a15789b9421607ec109b00506 <<<<< new LSN of sdc
Inquiry for /dev/sdc, evpd 0x1, page code 0x83
Data : 600605b009c17e6021949b78152ac2de <<<<< old LSN of sdc
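A side observation on the three samples above: each page 0x80 value appears to be the page 0x83 value with its bytes in reverse order, prefixed with 00 and with the final byte (60) dropped. This is only a pattern seen in this output, not documented behaviour, and it may not hold on other hardware. The sketch below checks it; both function names are hypothetical.

```shell
# Reverse the byte order of an even-length hex string (two hex digits per byte).
reverse_bytes() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = length($0) - 1; i >= 1; i -= 2)
            out = out substr($0, i, 2)
        print out
    }'
}

# Hypothetical check for the pattern seen in the samples above:
# page 0x80 value == "00" + byte-reversed page 0x83 value minus its last byte.
# Arguments: page 0x83 value, page 0x80 value. Returns 0 when the pattern holds.
matches_reversed_pattern() {
    p83=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    p80=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    rev=$(reverse_bytes "$p83")
    # ${rev%??} drops the final byte (last two hex digits) of the reversed string.
    expected="00${rev%??}"
    [ "$expected" = "$p80" ]
}
```

If the pattern holds for a disk, the "changed" serial number is simply a different encoding of the same identifier, which supports treating the DCPA event as false.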
DMP errors are observed on fred_disk_2/3/4.
The error types are EIO (0x5) and DMP_CONN_FAILURE (0x20d).
The affected disks span controller IDs c0 and c1.
# grep err /var/adm/vx/dmpevents.log* | awk '{print $(NF-3)" "$NF}' | sort | uniq -c
1612 fred_disk_2(201/1024)
2079 fred_disk_3(201/1008)
1965 fred_disk_4(201/1040)
540 sda(201/1040)
# awk '{print $(NF-3)" "$NF}' /var/adm/vx/dmpevents.log* | sort | uniq -c
1 Binary matches
486 (errno=0x20d) fred_disk_2(201/1024)
535 (errno=0x20d) fred_disk_3(201/1008)
1126 (errno=0x5) fred_disk_2(201/1024)
1544 (errno=0x5) fred_disk_3(201/1008)
1965 (errno=0x5) fred_disk_4(201/1040)
540 (errno=0x5) sda(201/1040)
5 invalid code
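The per-node counts above can be rolled up into one total per error code. This is a minimal sketch that assumes its input is the "count (errno=0x..) node" summary produced by the awk | sort | uniq -c pipeline above; errno_totals is a hypothetical name, and the two error names are the ones used in this article.

```shell
# Hypothetical helper: total the "count (errno=0x..) node" summary lines per
# error code. Lines in any other shape (for example "5 invalid code") are skipped.
errno_totals() {
    awk '
        $2 ~ /^\(errno=0x[0-9a-f]+\)$/ {
            code = $2
            gsub(/[()]|errno=/, "", code)   # "(errno=0x5)" -> "0x5"
            total[code] += $1
        }
        END {
            name["0x5"] = "EIO"
            name["0x20d"] = "DMP_CONN_FAILURE"
            for (code in total) {
                label = (code in name) ? name[code] : "unknown"
                print code, label, total[code]
            }
        }' | sort
}
```

Appending "| errno_totals" to the second pipeline above would, for the counts shown, give 1021 DMP_CONN_FAILURE events and 5175 EIO events.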
# lsblk -S
NAME HCTL TYPE VENDOR MODEL REV TRAN
sda 0:2:0:0 disk Intel RMS3CC080 4.68 <<<<<< hit error
sdb 1:2:0:0 disk Intel RS3SC008 4.68 <<<<<< hit error
sdc 1:2:6:0 disk Intel RS3SC008 4.68 <<<<<< hit error
sdd 2:1:1:0 disk LSI Logical Volume 3000 sas
sde 2:1:0:0 disk LSI Logical Volume 3000 sas
NOTE: No I/O error messages were reported for fred_disk_0/1 (sdd/sde), whose controller ID is c2.