Data corruption is possible in a multilevel LUNCLASS environment where the original (source) devices are offline or unavailable during the "vxdg import" LUN selection process


Article ID: 100011021



Cause

In this instance, Veritas Volume Manager (VxVM) incorrectly believed that the standard/replicated devices and the corresponding CLONE devices of the original Disk Group image were of the same LUNCLASS, because the udid_mismatch flag was not retained/reported for all LUNs associated with the Disk Group image to be imported.

VxVM was unable to differentiate between the two types of LUNCLASS: REPLICATED (e.g. EMC SRDF, HDS TrueCopy) and CLONE (e.g. EMC BCV, EMC SYMCLONE, HDS ShadowImage).

Resolution

A high-priority product enhancement has been added to fail the DG import operation in a multilevel LUNCLASS environment where duplicate VxVM Disk Group images are presented to the same host and VxVM is unable to clearly identify which Disk Group image, and which corresponding disks, contain the source (production) image to be selected for import.

A private hot-fix exists on top of VxVM releases 5.1 SP1 RP4 and 6.0.3 for Solaris SPARC and Solaris x86. The fix will be ported to all platforms once the required internal test cycles have been completed. Please contact Veritas Support if you require this hot-fix.

The DG import will now correctly fail with the following error:

VxVM vxdg ERROR V-5-1-0 Disk group : import failed:
DG import duplicate clone detected

 

Example #1:  

In this instance, VxVM has two identical Disk Group images visible on the same host: two REPLICATED (EMC SRDF-R1) LUNs and two corresponding CLONE (EMC BCV) LUNs in a split (read-write) state.


[At CLI level]

# vxdisk -eo alldgs list | egrep '(emc1_02b6|emc1_02b7|emc1_0042|emc1_0043)'
emc1_02b6    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d246s2 lun bcv snap
emc1_02b7    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d311s2 lun bcv snap
emc1_0042    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d244s2 lun srdf-r1 Mirror
emc1_0043    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d245s2 lun srdf-r1 Mirror
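
A minimal sketch, assuming the column layout of the "vxdisk -eo alldgs list" output shown above, of how such a listing could be summarized to show that two images of the same Disk Group are online. The function name summarize_lunclass is hypothetical and is not part of VxVM; the sample input is copied from the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not a VxVM command): count online disks per
# disk group and LUN type from "vxdisk -eo alldgs list" output on stdin.
# Assumptions taken from the sample listing only: field 4 is the disk
# group, field 5 is the status, and the LUN type is the token that
# follows the literal word "lun".
summarize_lunclass() {
    awk '
        $5 == "online" {
            dg = $4
            gsub(/[()]/, "", dg)      # deported groups are shown as (name)
            for (i = 6; i < NF; i++)
                if ($i == "lun") count[dg " " $(i + 1)]++
        }
        END { for (k in count) print k, count[k] }
    ' | sort
}

# Demo using the listing from Example #1.
summarize_lunclass <<'EOF'
emc1_02b6    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d246s2 lun bcv snap
emc1_02b7    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d311s2 lun bcv snap
emc1_0042    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d244s2 lun srdf-r1 Mirror
emc1_0043    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d245s2 lun srdf-r1 Mirror
EOF
```

For this listing the helper prints one line for the bcv image and one for the srdf-r1 image of "newsrdf", each with a count of 2. On a live host the same function could be fed from "vxdisk -eo alldgs list | summarize_lunclass".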


During the import validation, the revised code now prevents the Disk Group import because VxVM is unable to determine which is the original (source) Disk Group image.
 

# vxdg import newsrdf
VxVM vxdg ERROR V-5-1-0 Disk group newsrdf: import failed:
DG import duplicate clone detected.
Please refer to system log for details.

 

[In syslog]

Oct 31 15:46:10 viper vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-0 Disk Group newsrdf import failed: Duplicate clone disks are detected, please follow the vxdg (1M) man page to import disk group with duplicate clone disks. Duplicate clone disks are: emc1_02b7 : emc1_0043  emc1_0042 : emc1_02b6

Oct 31 15:46:10 viper vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-0 Disk group import of newsrdf failed with error 427 - DG import duplicate clone detected
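
As a sketch only, the duplicate pairs reported in the syslog warning above could be pulled out for review as follows. The message layout ("Duplicate clone disks are:" followed by "A : B" pairs) is assumed from this single sample and is not a documented format; extract_dup_pairs is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print one
# "diskA diskB" pair per line for every "A : B" pair that follows the
# text "Duplicate clone disks are:". Lines without that text are ignored.
extract_dup_pairs() {
    sed -n 's/.*Duplicate clone disks are: *//p' |
    awk '{
        # Tokens arrive as: A : B  C : D ...  (three tokens per pair)
        for (i = 1; i + 2 <= NF; i += 3)
            if ($(i + 1) == ":") print $i, $(i + 2)
    }'
}

# Demo using the warning from Example #1.
extract_dup_pairs <<'EOF'
Oct 31 15:46:10 viper vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-0 Disk Group newsrdf import failed: Duplicate clone disks are detected, please follow the vxdg (1M) man page to import disk group with duplicate clone disks. Duplicate clone disks are: emc1_02b7 : emc1_0043  emc1_0042 : emc1_02b6
EOF
```

Each output line names two disk access names that VxVM reported as duplicates of one another, which is the set of devices that needs to be reviewed before any manual change is made.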

 

Example #2:

In this instance, VxVM has two identical Disk Group images visible on the same host: two REPLICATED (EMC SRDF-R1) LUNs, one of which is unavailable/failed, and two corresponding CLONE (EMC BCV) LUNs in a split (read-write) state.


[At CLI level]

# vxdisk -eo alldgs list | egrep '(emc1_02b6|emc1_02b7|emc1_0042|emc1_0043)'
emc1_02b6    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d246s2 lun bcv snap
emc1_02b7    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d311s2 lun bcv snap
emc1_0042    auto           -            -           offline udid_mismatch c1t5006048C536979A0d244s2 lun srdf-r1 Mirror
emc1_0043    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d245s2 lun srdf-r1 Mirror


During the import validation, the revised code now prevents the Disk Group import because VxVM is unable to determine which is the original (source) Disk Group image.
 

 

# vxdg import newsrdf
VxVM vxdg ERROR V-5-1-0 Disk group newsrdf: import failed:
DG import duplicate clone detected.
Please refer to system log for details.

 

[In syslog]

Oct 31 15:47:14 viper vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-0 Disk Group newsrdf import failed: Duplicate clone disks are detected, please follow the vxdg (1M) man page to import disk group with duplicate clone disks. Duplicate clone disks are: emc1_02b7 : emc1_0043

Oct 31 15:47:14 viper vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-0 Disk group import of newsrdf failed with error 427 - DG import duplicate clone detected


Manual intervention

The product now requires manual intervention: the correct disks must be updated individually so that the devices intended for import appear to VxVM as source devices.
The import process will then select the source devices in preference to CLONE devices during the LUN selection process for the intended import operation.

# vxdisk -eo alldgs list | egrep '(emc1_02b6|emc1_02b7|emc1_0042|emc1_0043)'
emc1_02b6    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d246s2 lun bcv snap
emc1_02b7    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d311s2 lun bcv snap
emc1_0042    auto           -            -           offline udid_mismatch c1t5006048C536979A0d244s2 lun srdf-r1 Mirror
emc1_0043    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d245s2 lun srdf-r1 Mirror

 

Example #3:


The on-disk UDID can be updated using the "vxdisk updateudid" command:

# vxdisk updateudid emc1_0043

As the disk in this instance is a source device and not a CLONE, the "clone_disk" flag must be turned off manually using the "vxdisk set clone=off" command, one disk at a time.

# vxdisk set emc1_0043 clone=off


Note: The "clone_disk" flag must never be turned off for a real CLONE (HARDWARE_MIRROR) device.
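
The two manual steps above could be prepared as a dry run along the following lines. This is a sketch under stated assumptions: plan_source_fixup is a hypothetical name, the operator supplies the LUN type that represents the source devices (srdf-r1 in this example), and the column layout is taken from the sample listings. The function only prints the commands and never runs them, so the list can be checked against the Note above before anything is changed.

```shell
#!/bin/sh
# Hypothetical dry-run helper: for every ONLINE disk of the given LUN
# type that still carries udid_mismatch, print (do not execute) the two
# commands shown above. Input is "vxdisk -eo alldgs list" output on stdin.
# $1 = LUN type the operator has confirmed to be the source (assumption).
plan_source_fixup() {
    awk -v t="$1" '
        $5 == "online" {
            mismatch = 0; type = ""
            for (i = 6; i <= NF; i++) {
                if ($i == "udid_mismatch") mismatch = 1
                if ($i == "lun" && i < NF) type = $(i + 1)
            }
            if (mismatch && type == t) {
                print "vxdisk updateudid " $1
                print "vxdisk set " $1 " clone=off"
            }
        }
    '
}

# Demo using the listing from Example #2 (emc1_0042 is offline).
plan_source_fixup srdf-r1 <<'EOF'
emc1_02b6    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d246s2 lun bcv snap
emc1_02b7    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d311s2 lun bcv snap
emc1_0042    auto           -            -           offline udid_mismatch c1t5006048C536979A0d244s2 lun srdf-r1 Mirror
emc1_0043    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d245s2 lun srdf-r1 Mirror
EOF
```

For the Example #2 listing only emc1_0043 qualifies, so the output is the same pair of commands shown above. The BCV devices are skipped because their LUN type does not match, and the offline device is skipped because it cannot be updated.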


The remaining REPLICATED (source) device "emc1_0043" no longer reports the udid_mismatch flag or the "clone_disk" flag. With the current design, VxVM will now interpret the REPLICATED (srdf-r1) device as a source disk.
 

# vxdisk -eo alldgs list | egrep '(emc1_02b6|emc1_02b7|emc1_0042|emc1_0043)'
emc1_02b6    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d246s2 lun bcv snap
emc1_02b7    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d311s2 lun bcv snap
emc1_0042    auto           -            -           offline udid_mismatch c1t5006048C536979A0d244s2 lun srdf-r1 Mirror
emc1_0043    auto:cdsdisk   -            (newsrdf)   online               c1t5006048C536979A0d245s2 lun srdf-r1 Mirror   <<<< SOURCE DISK
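
Before attempting the import, the state change shown above could be confirmed with a small check such as the following sketch. The name has_udid_mismatch is hypothetical, and the check relies only on the udid_mismatch token appearing in the listing for the named disk, as in the samples in this article.

```shell
#!/bin/sh
# Hypothetical check: exit status 0 if the named disk access name still
# shows udid_mismatch in "vxdisk -eo alldgs list" output read on stdin,
# exit status 1 otherwise (including when the disk is not listed).
has_udid_mismatch() {
    awk -v da="$1" '
        $1 == da {
            for (i = 5; i <= NF; i++)
                if ($i == "udid_mismatch") found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Demo using the source disk line from the listing above.
if has_udid_mismatch emc1_0043 <<'EOF'
emc1_02b6    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d246s2 lun bcv snap
emc1_0043    auto:cdsdisk   -            (newsrdf)   online               c1t5006048C536979A0d245s2 lun srdf-r1 Mirror
EOF
then
    echo "emc1_0043 still reports udid_mismatch"
else
    echo "emc1_0043 is seen as a source disk"
fi
```

The check is deliberately per disk, matching the one-disk-at-a-time procedure described above.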

 
In this example, the VxVM Disk Group "newsrdf" originally contained two disks. As a result, the source Disk Group image must be imported using the "-f" (force) option so that VxVM fails the missing (unavailable) disk during the import validation process. Without "-f", the import fails:

 


# vxdg import newsrdf

VxVM vxdg ERROR V-5-1-10978 Disk group newsrdf: import failed:
Disk for disk group not found

 


By specifying the "-f" (force) import option, the source (standard) Disk Group image is imported and the unavailable disk is failed.


# vxdg -f import newsrdf

VxVM vxdg WARNING V-5-1-560 Disk emc0_01d8: Not found, last known location: emc1_02b6



[syslog]

Oct 31 16:05:29 viper vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-0 Disk Group newsrdf has a mix of standard and cloned disks: Trying to import the disk group from the standard disks: emc1_0043
Oct 31 16:05:29 viper vxvm:vxconfigd: [ID 702911 daemon.notice] V-5-1-0 Selecting configuration database copy from emc1_0043 from disks: emc1_0043
Oct 31 16:05:29 viper vxvm:vxconfigd: [ID 702911 daemon.notice] V-5-1-0 Trying to import the disk group newsrdf using configuration database copy from emc1_0043
Oct 31 16:05:30 viper vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-546 Disk emc0_01d8 in group newsrdf: Disk device not found
Oct 31 16:05:30 viper vxvm:vxconfigd: [ID 702911 daemon.notice] V-5-1-0 Disk group import of newsrdf succeeded.

 
The LUN selection process clearly identifies that the Veritas disk access name "emc1_0043" is the only candidate to be referenced during the import validation process. The corresponding CLONE devices are excluded/ignored.

# vxdisk -eo alldgs list | egrep '(emc1_02b6|emc1_02b7|emc1_0042|emc1_0043)'
emc1_02b6    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d246s2 lun bcv snap
emc1_02b7    auto:cdsdisk   -            (newsrdf)   online udid_mismatch c1t5006048C536979A0d311s2 lun bcv snap
emc1_0042    auto           -            -           offline udid_mismatch c1t5006048C536979A0d244s2 lun srdf-r1 Mirror
emc1_0043    auto:cdsdisk   emc0_01d9    newsrdf     online               c1t5006048C536979A0d245s2 lun srdf-r1 Mirror
-            -         emc0_01d8    newsrdf      failed was:emc1_02b6
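
After a forced import, the failed disk media records can be picked out of the listing for follow-up. The following is a sketch only: list_failed_dm is a hypothetical name, and the layout of the "failed was:" line is assumed from the single sample above.

```shell
#!/bin/sh
# Hypothetical helper: print "disk-media-name disk-group last-known-device"
# for every failed record in "vxdisk -eo alldgs list" output on stdin.
# Assumes the sample layout: "-  -  <dm name>  <dg>  failed was:<device>".
list_failed_dm() {
    awk '
        $1 == "-" && $5 == "failed" {
            sub(/^was:/, "", $6)      # keep only the last known device name
            print $3, $4, $6
        }
    '
}

# Demo using the last two lines of the listing above.
list_failed_dm <<'EOF'
emc1_0043    auto:cdsdisk   emc0_01d9    newsrdf     online               c1t5006048C536979A0d245s2 lun srdf-r1 Mirror
-            -         emc0_01d8    newsrdf      failed was:emc1_02b6
EOF
```

For the sample this reports disk media name emc0_01d8 in group newsrdf with emc1_02b6 as its last known location, matching the V-5-1-560 warning printed by the forced import.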



Automation

 


The process can be largely automated by using a Veritas Cluster Server (VCS) preonline trigger developed specifically to update the on-disk UDID (and disable the clone_disk flag) for the intended disks associated with a given LUNCLASS.

The VCS preonline trigger article is referenced below and is available to update UDID-based content for replicated devices managed by VCS with VxVM versions 5.1 SP1 RP3 and 6.0.1 onwards.
 

 


Applies To

Cross Platform

All versions impacted.

 

Issue/Introduction

In a multilevel LUNCLASS environment, a regular Disk Group import operation can wrongly import a mix of disks from different LUNCLASSes associated with the same Veritas Volume Manager (VxVM) managed Disk Group image residing on the same host, leading to potential data corruption. This issue is only encountered if the original (source) devices are offline or unavailable during the "vxdg import" LUN selection process.

The DG import process should have failed by design and should not have selected corresponding CLONE device(s) in place of the unavailable standard or replicated device(s). Because the DG import operation succeeds, a mix of LUNCLASSes can be imported from two different sources, leading to data corruption.