How to restore a disk marked failed because of a link failure or I/O transport failure rather than a bad disk

Article ID: 100006454

Resolution

If vxdisk list shows a disk as failed because a link went offline or an I/O transport failure occurred, and not because the disk itself is bad, the following steps can help to recover the volume:

1. Restore the VERITAS Volume Manager (tm) information to the failed disk. First, identify the failed disks in the vxdisk list output:

# vxdisk list

- - testdg01 testdg failed was:c1t12d0s2

- - testdg02 testdg failed was:c1t13d0s2
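When many disks are listed, it can help to extract only the failed records. The following is a minimal sketch that assumes the column layout shown above (disk media name in the third field, disk group in the fourth, status in the fifth, and the old access name after `was:`). The function name `list_failed_disks` is hypothetical and is not part of Volume Manager.

```shell
# Print "diskname diskgroup old_accessname" for every disk that
# `vxdisk list` reports as failed. Reads `vxdisk list` output on stdin.
# list_failed_disks is a hypothetical helper name, not a VxVM command.
list_failed_disks() {
    awk '
        # A failed record looks like:
        #   - - testdg01 testdg failed was:c1t12d0s2
        $5 == "failed" && $6 ~ /^was:/ {
            sub(/^was:/, "", $6)      # strip the "was:" prefix
            print $3, $4, $6
        }
    '
}

# Typical use on a live system (assumes vxdisk is in PATH):
#   vxdisk list | list_failed_disks
```

The old access name printed in the last column is the device to check at the transport level before attempting the replacement in the next step.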


2. Use the vxdiskadm utility to remove and replace the failed drive:

# vxdiskadm

Choose item no. 4, 'Remove a disk for replacement'.

Note: The data in the public region will not be lost.

Before using the disk as the replacement disk, initialize it in another session:

# vxdisksetup -if accessname

Note: If the disk previously used a non-default format or non-default attributes, specify the same format and attributes when initializing it.

Note: Reinitialize the disk only to restore Volume Manager information to the private region.

Then choose item no. 5, 'Replace a failed or removed disk'.

After running items 4 and 5, the vxdisk list output should be similar to the following output:

# vxdisk list

c1t12d0s2 sliced testdg01 testdg online
c1t13d0s2 sliced testdg02 testdg online
c1t14d0s2 sliced testdg03 testdg online

3. Recover the volume by performing the following:

# vxprint -htg testdg

dm testdg01 c1t12d0s2 sliced 2179 8920560
dm testdg02 c1t13d0s2 sliced 2179 8920560
dm testdg03 c1t14d0s2 sliced 2179 8920560

v testvol - DISABLED ACTIVE 204800 RAID -
pl testvol-01 testvol DISABLED RECOVER 204864 CONCAT - RW
sd testdg01-01 testvol-01 testdg01 0 102460 c1t12d0 ENA
sd testdg02-01 testvol-01 testdg02 0 102460 c1t13d0 ENA
sd testdg03-01 testvol-01 testdg03 0 102460 c1t14d0 ENA


# vxmend -g testdg -o force off testvol-01

# vxmend -g testdg on testvol-01

Note: This changes the plex state from RECOVER to STALE

# vxprint -htg testdg

dm testdg01 c1t12d0s2 sliced 2179 8920560
dm testdg02 c1t13d0s2 sliced 2179 8920560
dm testdg03 c1t14d0s2 sliced 2179 8920560

v testvol - DISABLED ACTIVE 204800 RAID -
pl testvol-01 testvol DISABLED STALE 204864 CONCAT - RW
sd testdg01-01 testvol-01 testdg01 0 102460 c1t12d0 ENA
sd testdg02-01 testvol-01 testdg02 0 102460 c1t13d0 ENA
sd testdg03-01 testvol-01 testdg03 0 102460 c1t14d0 ENA
 

# vxmend -g testdg fix clean testvol-01

Note: This sets the plex state to CLEAN

# vxprint -htg testdg

dm testdg01 c1t12d0s2 sliced 2179 8920560
dm testdg02 c1t13d0s2 sliced 2179 8920560
dm testdg03 c1t14d0s2 sliced 2179 8920560

v testvol - DISABLED ACTIVE 204800 RAID -
pl testvol-01 testvol DISABLED CLEAN 204864 CONCAT - RW
sd testdg01-01 testvol-01 testdg01 0 102460 c1t12d0 ENA
sd testdg02-01 testvol-01 testdg02 0 102460 c1t13d0 ENA
sd testdg03-01 testvol-01 testdg03 0 102460 c1t14d0 ENA


# vxvol -g testdg start testvol
# vxprint -htg testdg

dm testdg01 c1t12d0s2 sliced 2179 8920560
dm testdg02 c1t13d0s2 sliced 2179 8920560
dm testdg03 c1t14d0s2 sliced 2179 8920560

v testvol - ENABLED ACTIVE 204800 RAID -
pl testvol-01 testvol ENABLED ACTIVE 204864 CONCAT - RW
sd testdg01-01 testvol-01 testdg01 0 102460 c1t12d0 ENA
sd testdg02-01 testvol-01 testdg02 0 102460 c1t13d0 ENA
sd testdg03-01 testvol-01 testdg03 0 102460 c1t14d0 ENA
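The command sequence in step 3 can be collected into a small helper. This is only a sketch: the names `recover_plex`, `run_step`, `plex_state`, and the `DRY_RUN` variable are hypothetical, and the helper defaults to printing the commands instead of running them so that the sequence can be reviewed first. It issues the same vxmend and vxvol commands shown above and nothing else.

```shell
# Run one command, or only print it when DRY_RUN is 1 (the default).
# run_step and DRY_RUN are hypothetical names used for this sketch.
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Walk a plex from RECOVER to ACTIVE using the commands from step 3.
#   $1 = disk group, $2 = volume, $3 = plex
recover_plex() {
    dg=$1
    vol=$2
    plex=$3
    run_step vxmend -g "$dg" -o force off "$plex" &&
    run_step vxmend -g "$dg" on "$plex" &&          # RECOVER -> STALE
    run_step vxmend -g "$dg" fix clean "$plex" &&   # STALE -> CLEAN
    run_step vxvol -g "$dg" start "$vol"            # start the volume
}

# Print "KSTATE STATE" for a plex, reading `vxprint -ht` output on stdin.
plex_state() {
    awk -v plex="$1" '$1 == "pl" && $2 == plex { print $4, $5 }'
}

# Typical use on a live system:
#   vxprint -htg testdg | plex_state testvol-01
#   recover_plex testdg testvol testvol-01            # prints the commands
#   DRY_RUN=0 recover_plex testdg testvol testvol-01  # runs them
```

Because each command is chained with `&&`, the sequence stops at the first command that fails, which leaves the plex in a known intermediate state that `plex_state` can report.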


4. Run a file system check and mount the file system. The following commands are for a VxFS file system:

# fsck -F vxfs /dev/vx/rdsk/testdg/testvol

Note: The file system is clean - log replay is not required

# mount -F vxfs /dev/vx/dsk/testdg/testvol /testvol

# df -k /testvol

File system kbytes used avail capacity mounted on
/dev/vx/dsk/testdg/testvol
44429208 39734216 4658376 90% /testvol
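To confirm from a script that the mount point is backed by the recovered volume, the df output can be parsed. The sketch below assumes the `df -k` layout shown above, including the case where a long device name is wrapped onto its own line. The function name `mounted_device` is hypothetical.

```shell
# Print the device mounted on the given mount point, reading `df -k`
# output on stdin. mounted_device is a hypothetical helper name.
#   $1 = mount point, for example /testvol
mounted_device() {
    awk -v mp="$1" '
        NR == 1 { next }                  # skip the header line
        NF == 1 { dev = $1; next }        # device name wrapped onto its own line
        NF == 5 && dev != "" {            # remainder of a wrapped record
            if ($5 == mp) print dev
            dev = ""
            next
        }
        NF >= 6 {                         # complete record on one line
            if ($NF == mp) print $1
            dev = ""
        }
    '
}

# Typical use on a live system:
#   df -k /testvol | mounted_device /testvol
```

If the function prints nothing, the mount point is not present in the df output that was supplied.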


 
