Diskgroup import failed due to cloned devices having the same on-disk UDID (udid_asl)

Article ID: 100004382

Description

Error Message

For a local diskgroup import with vxconfigd debugging turned on, the following messages can be found in the vxconfigd debug log.

12/15 19:03:56:  VxVM vxconfigd DEBUG V-5-1-16000 IMPORT(22): disk c65t4d4 phi_flags
12/15 19:03:56:  VxVM vxconfigd DEBUG V-5-1-16230 da_is_any_same_disk: disk c65t4d4 DG img4dg diskid 1204959468.305.hostA
        udid HITACHI%5FOPEN-V*5%5F0B1A2%5F1302
12/15 19:03:56:  VxVM vxconfigd DEBUG V-5-1-16229 da_is_any_same_disk: c65t4d4 is same disk same as c6t9d3
12/15 19:03:56:  VxVM vxconfigd DEBUG V-5-1-16004 IMPORT: disk c65t4d4 being skipped
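
The V-5-1-16229 message names both devices that VxVM considers to be the same disk. As a minimal sketch, assuming the debug log uses exactly the message format shown above, the device pairs could be pulled out with sed. The function name same_disk_pairs is hypothetical and is not part of VxVM.

```shell
# Hypothetical helper (not a VxVM command): for every V-5-1-16229 message on
# standard input, print "<disk being checked> <disk it was matched against>".
same_disk_pairs() {
    sed -n 's/.*V-5-1-16229 da_is_any_same_disk: \([^ ]*\) is same disk same as \([^ ]*\).*/\1 \2/p'
}

# Usage with two lines taken from the log excerpt above; only the
# V-5-1-16229 line produces output.
same_disk_pairs <<'EOF'
12/15 19:03:56:  VxVM vxconfigd DEBUG V-5-1-16229 da_is_any_same_disk: c65t4d4 is same disk same as c6t9d3
12/15 19:03:56:  VxVM vxconfigd DEBUG V-5-1-16004 IMPORT: disk c65t4d4 being skipped
EOF
```

In practice the function would read the saved debug log file on standard input; the location of that file depends on how vxconfigd debugging was enabled.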

In case of CVM shared diskgroup re-import during CVM mastership takeover, the vxconfigd debug log will show the following messages.

12/23 11:41:04:  VxVM vxconfigd DEBUG  V-5-1-16229 da_is_any_same_disk: ds51000_9 is same disk same as ds51000_6
12/23 11:41:04:  VxVM vxconfigd WARNING  V-5-1-16066 da_dg_reimport: disk 1285899514.43.hostB not found
12/23 11:41:04:  VxVM vxconfigd DEBUG  V-5-1-5529 import_finish:da_dg_reimport returned 183

Error 183 is defined in VxVM as:

#define VE_DISK_NOT_FOUND 183   /* Disk for disk group not found */

The following can be found in the system messages:

Dec 23 11:41:02 hostB vxvm:vxconfigd: V-5-1-7899 CVM_VOLD_CHANGE command received
Dec 23 11:41:02 hostB vxvm:vxconfigd: V-5-1-13170 Preempting CM NID 0
....
Dec 23 11:41:04 hostB vxvm:vxconfigd: V-5-1-16066 da_dg_reimport: disk 1285899514.43.hostB not found
.....
Dec 23 11:41:06 hostB kernel: GAB INFO V-15-1-20032 Port w closed
Dec 23 11:41:07 hostB vxvm:vxconfigd: V-5-1-8060 master: could not delete shared disk groups
Dec 23 11:41:07 hostB vxvm:vxconfigd: V-5-1-3865 node 1: missing vxconfigd
Dec 23 11:41:07 hostB vxvm:vxconfigd: V-5-1-7934 Disk group cfs_prod_tm_sdc_dsg2_dg: Disabled by errors
Dec 23 11:41:07 hostB vxvm:vxconfigd: V-5-1-3865 node 1: missing vxconfigd
Dec 23 11:41:07 hostB vxvm:vxconfigd: V-5-1-7934 Disk group cfs_prod_tm_sdc_ftse1_dg: Disabled by errors
Dec 23 11:41:07 hostB vxvm:vxconfigd: V-5-1-3865 node 1: missing vxconfigd
Dec 23 11:41:07 hostB vxvm:vxconfigd: V-5-1-7934 Disk group cfs_prod_tm_sdc_dsg1_dg: Disabled by errors
Dec 23 11:41:07 hostB vxvm:vxconfigd: V-5-1-3835 vold_set_new_role(): 183 returned from role_assume()
Dec 23 11:41:07 hostB vxvm:vxconfigd: V-5-1-11467 kernel_fail_join() :          Reconfiguration interrupted: Reason is transition to role failed (12, 1)
Dec 23 11:41:07 hostB kernel: VxVM vxio V-5-0-164 Failed to join cluster CFS_PROD_TM_SDC_1, aborting
Dec 23 11:41:07 hostB kernel: GAB INFO V-15-1-20032 Port v closed
Dec 23 11:41:07 hostB vxvm:vxconfigd: V-5-1-7901 CVM_VOLD_STOP command received

Cause

The output of "vxdisk -v list" can be used to check for duplicate on-disk UDIDs (udid_asl).

For example, the following is the "vxdisk -v list" output for a source-clone disk pair.

----------------------------------
Device:    c93t3d1                   <<< original disk
disk:      name=c93t3d1 id=1284855955.394.hostA    <<< diskid, different from the clone disk; reinitialized on 2010 Sep 19 10:25:55
group:     name=img6dg id=1262083254.514.hostA
udid:      HITACHI%5FOPEN-V*5%5F0B1A2%5F1302    <<< DDL UDID, same as the on-disk UDID
tag      udid_asl=HITACHI%5FOPEN-V*5%5F0B1A2%5F1302         <<< on-disk UDID, the same as the DDL UDID
----------------------------------
Device:    c87t10d4                  <<< clone disk from the above original disk
disk:      name=c38t6d4 id=1204959468.305.hostA    <<< diskid, different from the original disk; at clone time the original had been initialized on 2008 Mar  8 17:57:48
group:     name=img4dg id=1193535755.398.hostA
udid:      HITACHI%5FOPEN-E*5%5FA9AD%5F03C3    <<< DDL UDID, different from the on-disk UDID
tag      udid_asl=HITACHI%5FOPEN-V*5%5F0B1A2%5F1302       <<< on-disk UDID, different from the DDL UDID, cloned from the original disk
----------------------------------

The diskids of the two devices are different, but their on-disk udids are the same.

The "vxdisk -v list" output lists two udids:
- the Disk Discovery Layer (DDL) udid, and
- the on-disk udid, which is persistently saved in the private region of the disk.

The DDL udid is built when the device is scanned by DDL/DMP on the host, and it reflects the hardware attributes of the disk, e.g. cabinet serial number, LUN serial number, PID, VID, etc. The on-disk udid is set to the DDL udid when a disk is first initialized (vxdisksetup). If the disk is later cloned, the on-disk udid is copied to the clone disk. The two udids should be the same if the disk is not a clone.
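
The comparison between the two udids can be scripted. The following is a minimal sketch, assuming the "udid:" and "udid_asl=" line formats shown in the listing above; the function name udid_status is hypothetical and is not part of VxVM.

```shell
# Hypothetical helper (not a VxVM command): read the "vxdisk -v list" output
# of ONE device on standard input and compare its DDL udid with its on-disk
# udid (udid_asl). Prints "match", "mismatch", or "unknown".
udid_status() {
    awk '
        $1 == "udid:" { ddl = $2 }                    # DDL udid line
        /udid_asl=/   { sub(/.*udid_asl=/, "")        # strip up to the value
                        ondisk = $1 }                 # drop trailing notes
        END {
            if (ddl == "" || ondisk == "") print "unknown"
            else if (ddl == ondisk)        print "match"
            else                           print "mismatch"
        }
    '
}

# Usage with the clone disk lines from the listing above.
udid_status <<'EOF'
Device:    c87t10d4
udid:      HITACHI%5FOPEN-E*5%5FA9AD%5F03C3
tag      udid_asl=HITACHI%5FOPEN-V*5%5F0B1A2%5F1302
EOF
```

A "mismatch" result only indicates that the on-disk udid no longer reflects the hardware identity seen by DDL, which is what is expected for a cloned disk.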

The diskids differ because the original disk was reinitialized after the clone was taken. The initialization time of a disk can be deduced from the first field of the diskid, which is a time in seconds since the Unix epoch, for example by using Solaris mdb.

mdb> 0t1284855955=Y
                2010 Sep 19 10:25:55   
mdb> 0t1204959468=Y
                2008 Mar  8 17:57:48
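
On hosts without mdb, the same conversion can be sketched with the date command. This assumes GNU date (the "-d @SECONDS" form is a GNU extension); the function name diskid_time is hypothetical.

```shell
# Hypothetical helper: print the initialization time encoded in a diskid.
# Assumes GNU date; the result is printed in UTC.
diskid_time() {
    secs=${1%%.*}        # first dot-separated field of the diskid
    date -u -d "@${secs}" '+%Y-%m-%d %H:%M:%S UTC'
}

diskid_time 1284855955.394.hostA
diskid_time 1204959468.305.hostA
```

The times are printed in UTC, so they differ from the mdb output above, which presumably used the local timezone of the host.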

Resolution

The on-disk udid must be updated to resolve the diskgroup import failure. This can be done by deporting the diskgroup and then updating the on-disk udid with the "vxdisk updateudid" command.

# vxdg deport

# vxdisk updateudid 

For example, consider a diskgroup with two disks.

dg alawdg       default      default  5000     1294379866.55.fire01
dm alawdg01     hds9500-alua1_412 auto 65536   8228608  -
dm alawdg02     hds9500-alua1_413 auto 65536   8228608  -

hds9500-alua1_413 was cloned from hds9500-alua1_412. 

# vxdisk -v list hds9500-alua1_412 | egrep 'udid|disk:'
disk:      name=alawdg01 id=1253503347.60.fire02
udid:      HITACHI%5FDF600F%5FD600101C%5F019C
 tag      udid_asl=HITACHI%5FDF600F%5FD600101C%5F019C

# vxdisk -v list hds9500-alua1_413 | egrep 'udid|disk:'
disk:      name=alawdg02 id=1253503388.60.fire02
udid:      HITACHI%5FDF600F%5FD600101C%5F019D
 tag      udid_asl=HITACHI%5FDF600F%5FD600101C%5F019C    <<< cloned from hds9500-alua1_412

Note that "vxdisk updateudid" will not update the udid_asl while the diskgroup is imported.

# vxdisk updateudid hds9500-alua1_413

# vxdisk -v list hds9500-alua1_413 | egrep 'udid|disk:'
disk:      name=alawdg02 id=1253503388.60.fire02
udid:      HITACHI%5FDF600F%5FD600101C%5F019D
 tag      udid_asl=HITACHI%5FDF600F%5FD600101C%5F019C     <<< "vxdisk updateudid" does not update udid_asl while the diskgroup is imported.

Once the diskgroup is deported, the udid_asl can be updated.

# vxdg deport alawdg

# vxdisk updateudid hds9500-alua1_413

# vxdisk -v list hds9500-alua1_413 | egrep 'udid|disk:'
disk:      name= id=1253503388.60.fire02
udid:      HITACHI%5FDF600F%5FD600101C%5F019D
 tag      udid_asl=HITACHI%5FDF600F%5FD600101C%5F019D     <<< updated

Duplicate diskids and duplicate on-disk udids can be checked with the following commands.

Check the in-kernel private regions:

# vxdisk -q list | cut -f 1 -d' ' | xargs -i vxdisk -v list {} 2>/dev/null | egrep '^disk:|udid_asl' | sed 's/.*udid_asl=//' | sed 's/.*id=//' | sort | uniq -c

Check the on-disk private regions:

# vxdisk -q list | cut -f 1 -d' ' | xargs -i /etc/vx/diag.d/vxprivutil list /dev/vx/rdmp/{} 2>/dev/null | egrep '^diskid|udid_asl' | sed 's/.*udid_asl=//' | sort | uniq -c

For example,

# vxdisk -q list | cut -f 1 -d' ' | xargs -i vxdisk -v list {} 2>/dev/null | egrep '^disk:|udid_asl' | sed 's/.*udid_asl=//' | sed 's/.*id=//' | sort | uniq -c
   1 1166617995.273.alaw1
   1 1166618297.275.alaw1
....
   1 HITACHI%5FDF600F%5FD600101C%5F0054
   1 HITACHI%5FDF600F%5FD600101C%5F0055
...

# vxdisk -q list | cut -f 1 -d' ' | xargs -i /etc/vx/diag.d/vxprivutil list /dev/vx/rdmp/{} 2>/dev/null | egrep '^diskid|udid_asl' | sed 's/.*udid_asl=//' | sort | uniq -c
   1 diskid:  1166617995.273.alaw1
   1 diskid:  1166618297.275.alaw1
....
   1 HITACHI%5FDF600F%5FD600101C%5F0054
   1 HITACHI%5FDF600F%5FD600101C%5F0058
....
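
On systems with many disks, the "uniq -c" listing is long and the duplicates are easy to miss. As a minimal sketch, the output of either pipeline above could be passed through a filter that keeps only values with a count greater than 1. The function name only_duplicates is hypothetical, and the sample input below is illustrative rather than captured from a real system.

```shell
# Hypothetical helper: read "uniq -c" output on standard input and print only
# the lines whose leading count is greater than 1, i.e. the duplicated values.
only_duplicates() {
    awk '$1 > 1'
}

# Usage with illustrative input: only the line with count 2 is printed.
only_duplicates <<'EOF'
   1 HITACHI%5FDF600F%5FD600101C%5F019D
   2 HITACHI%5FDF600F%5FD600101C%5F019C
EOF
```

No output from the filter means that no duplicate diskids or on-disk udids were found among the listed devices.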

Issue/Introduction

Diskgroup import failed due to cloned devices having the same on-disk UDID (udid_asl).

In one instance involving a local diskgroup, the disks were cloned using a disk array hardware feature. The cloned disks were then presented back to the same machine, so they had the same on-disk UDID (udid_asl) as the source disks belonging to the local diskgroup, although the diskids were different. In this situation, every time the diskgroup was deported and re-imported, Veritas Volume Manager (VxVM) did not import one of the disks in the source-clone pair because it considered them to be the same disk. VxVM imported only the first disk discovered, refused to import the disk discovered later, and marked it as "failed was" in the "vxdisk list" output.

In another situation involving a Cluster Volume Manager (CVM) shared diskgroup, when the CVM master left the cluster, the node that tried to take over the CVM mastership failed to re-import the diskgroup and exited the cluster. The problem continued until all the nodes had exited the cluster.