How to recover if a relayout is interrupted by a SAN outage

Article ID: 100004473

Description

Error Message

# vxrelayout -g testdg start testvol
VxVM vxrelayout ERROR V-5-1-2220  volume testvol-S01 is detached or disabled
VxVM vxrelayout ERROR V-5-1-2199  plex testvol-01 is detached or disabled
VxVM vxrelayout ERROR V-5-1-2220  volume testvol-D01 is detached or disabled
VxVM vxrelayout ERROR V-5-1-2199  plex testvol-Dp01 is detached or disabled
VxVM vxrelayout ERROR V-5-1-2220  volume testvol-W01 is detached or disabled
VxVM vxrelayout ERROR V-5-1-2220  volume testvol-T01 is detached or disabled
VxVM vxrelayout ERROR V-5-1-2199  plex testvol-T01-01 is detached or disabled
VxVM vxrelayout ERROR V-5-1-2199  plex testvol-T01-02 is detached or disabled
VxVM vxrelayout INFO V-5-1-2291 Attempting to cleanup
..

Cause

A SAN outage that occurs while a relayout is in progress leaves the volume in a mid-process state. The many additional objects {subvolumes (sv), volumes (v), plexes (pl), subdisks (sd)} that the relayout creates temporarily, while still maintaining access to the data, are left in place.

Resolution

Check the vxprint output for current status

# vxprint -g testdg -hrt testvol
dm d1           hds9500-alua0_210 auto 65536   1931008  -
dm d2           hds9500-alua0_232 auto 65536   954112   -
dm d3           hds9500-alua0_217 auto 65536   1931008  -
dm d4           hds9500-alua0_224 auto 65536   855808   -

v  testvol      -            ENABLED  ACTIVE   1024000  SELECT    -        fsgen
pl testvol-tp01 testvol      ENABLED  ACTIVE   1024000  CONCAT    -        RW
sv testvol-ts01 testvol-tp01 testvol-I01 2     1024000  0         3/5      ENA
v2 testvol-I01  -            ENABLED  ACTIVE   1024000  ROUND     -        relayout
p2 testvol-Ip01 testvol-I01  ENABLED(SPARSE) SRC 1024000 CONCAT   -        RW
sv testvol-Is01 testvol-Ip01 testvol-S01 1     206848   817152    0/1      DIS
v3 testvol-S01  -            DISABLED CLEAN    206848   SELECT    -        fsgen
p3 testvol-01   testvol-S01  DISABLED CLEAN    206848   STRIPE    2/128    RW
s3 d1-03        testvol-01   d1       1125376  103424   0/0       hds9500-alua0_210 ENA
s3 d3-03        testvol-01   d3       1125376  103424   1/0       hds9500-alua0_217 ENA
p2 testvol-Ip02 testvol-I01  ENABLED(SPARSE) TMP 919296 CONCAT    -        WO
sv testvol-Is02 testvol-Ip02 testvol-T01 1     102144   817152    0/2      DIS
v3 testvol-T01  -            DISABLED NEEDSYNC 102144   SELECT    -        fsgen
p3 testvol-T01-01 testvol-T01 DISABLED ACTIVE  102144   CONCAT    -        RW
s3 d1-04        testvol-T01-01 d1     1228800  102144   0         hds9500-alua0_210 ENA
p3 testvol-T01-02 testvol-T01 DISABLED ACTIVE  102144   CONCAT    -        RW
s3 d3-04        testvol-T01-02 d3     1228800  102144   0         hds9500-alua0_217 ENA
p2 testvol-Ip03 testvol-I01  DISABLED UNUSED   1024000  CONCAT    -        RW
sv testvol-Is03 testvol-Ip03 testvol-U01 1     1024000  0         0/1      DIS
v3 testvol-U01  -            DISABLED EMPTY    1024000  SELECT    -        fsgen
p3 testvol-Up01 testvol-U01  DISABLED(SPARSE) ACTIVE 1225600 STRIPE 3/128  RW
s3 d1-05        testvol-Up01 d1       989184   136192   0/272384  hds9500-alua0_210 ENA
s3 d3-05        testvol-Up01 d3       989184   136192   1/272384  hds9500-alua0_217 ENA
s3 d2-03        testvol-Up01 d2       885120   68992    2/272384  hds9500-alua0_232 ENA
p2 testvol-Ip04 testvol-I01  DISABLED(SPARSE) WOD 0     CONCAT    -        WO
sv testvol-Is04 testvol-Ip04 testvol-W01 0     0        0         0/1      DIS
v3 testvol-W01  -            DISABLED CLEAN    0        SELECT    -        fsgen
p3 testvol-Wp01 testvol-W01  DISABLED CLEAN    0        STRIPE    3/128    RW
p2 testvol-Ip05 testvol-I01  ENABLED(SPARSE) DST 817152 CONCAT    -        RW
sv testvol-Is05 testvol-Ip05 testvol-D01 1     817152   0         0/1      DIS
v3 testvol-D01  -            DISABLED CLEAN    817152   SELECT    -        fsgen
p3 testvol-Dp01 testvol-D01  DISABLED CLEAN    817152   STRIPE    3/128    RW
s3 d1-06        testvol-Dp01 d1       716800   272384   0/0       hds9500-alua0_210 ENA
s3 d3-06        testvol-Dp01 d3       716800   272384   1/0       hds9500-alua0_217 ENA
s3 d4-04        testvol-Dp01 d4       716800   104064   2/0       hds9500-alua0_224 ENA
s3 d2-04        testvol-Dp01 d2       716800   168320   2/104064  hds9500-alua0_232 ENA
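The nested listing above is long, so it can help to reduce it to one line per volume record. The following is a minimal sketch, assuming the column order shown in the vxprint -hrt output above; the function name list_volume_states is hypothetical and is not part of VxVM.

```shell
# Print "name kernel-state state" for every volume record found in
# vxprint -hrt output read from standard input.
# Volume records start with v, v2, v3, ...; subvolume (sv) records are skipped.
list_volume_states() {
    awk '$1 ~ /^v[0-9]*$/ { print $2, $4, $5 }'
}

# Hypothetical usage on a live system:
#   vxprint -g testdg -hrt testvol | list_volume_states
```

For the listing above this would show, among others, testvol-U01 as DISABLED EMPTY and testvol-S01 as DISABLED CLEAN.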

As the errors above indicate, the vxprint output shows that most of the volumes are in a DISABLED state. To recover, start ONLY the volumes named in the output of the # vxrelayout start command. Do NOT start all the volumes: vxrelayout expects some volumes (in this case testvol-U01, which is DISABLED/EMPTY) not to be in the ENABLED/ACTIVE state.

# vxrelayout -g testdg start testvol  | grep volume
VxVM vxrelayout ERROR V-5-1-2220  volume testvol-S01 is detached or disabled
VxVM vxrelayout ERROR V-5-1-2220  volume testvol-D01 is detached or disabled
VxVM vxrelayout ERROR V-5-1-2220  volume testvol-W01 is detached or disabled
VxVM vxrelayout ERROR V-5-1-2220  volume testvol-T01 is detached or disabled

Start the required volumes

# vxvol -g testdg start testvol-S01 testvol-D01  testvol-W01  testvol-T01
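To avoid retyping the volume names, the list given to vxvol could be derived from the vxrelayout error output. This is a minimal sketch, assuming the V-5-1-2220 message format shown in this article; the function name relayout_volumes_to_start is hypothetical, and the 2>&1 in the usage comment assumes the errors may be written to standard error.

```shell
# Read vxrelayout output on standard input and print, on one line,
# the names of the volumes reported by message V-5-1-2220.
relayout_volumes_to_start() {
    awk '/V-5-1-2220/ && $5 == "volume" { out = out sep $6; sep = " " }
         END { print out }'
}

# Hypothetical usage on a live system (review the list before starting anything):
#   vols=$(vxrelayout -g testdg start testvol 2>&1 | relayout_volumes_to_start)
#   echo "vxvol -g testdg start $vols"
```

Printing the vxvol command instead of running it directly keeps a manual check in place, so that a volume such as testvol-U01 is never started by accident.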

Remove the stale mount (if one exists), run fsck, and remount the filesystem

# umount -o force /tmnt

# fsck -F vxfs /dev/vx/rdsk/testdg/testvol

# mount -F vxfs /dev/vx/dsk/testdg/testvol  /tmnt
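The three commands above could be wrapped so that they are previewed before being run. This is a sketch only: the helper names run and remount_vxfs and the DRY_RUN variable are hypothetical, and the commands themselves are the ones shown in this article.

```shell
# Echo commands instead of running them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Force-unmount a stale mount, check the filesystem, and remount it.
# Arguments: disk group, volume, mount point.
remount_vxfs() {
    dg=$1
    vol=$2
    mnt=$3
    # The unmount may fail if nothing is mounted; that is not treated as fatal.
    run umount -o force "$mnt" || true
    # Stop if fsck exits non-zero so an unchecked filesystem is not mounted.
    run fsck -F vxfs "/dev/vx/rdsk/$dg/$vol" || return 1
    run mount -F vxfs "/dev/vx/dsk/$dg/$vol" "$mnt"
}

# Preview:  remount_vxfs testdg testvol /tmnt
# Execute:  DRY_RUN=0 remount_vxfs testdg testvol /tmnt
```

By default the function only prints the commands it would run, which suits a recovery where each step should be reviewed first.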

Recommended: Before restarting the relayout for the volume, make a complete backup in case there are further complications.

Start vxrelayout to continue

# vxrelayout -g testdg start testvol 

Finally check the status

# vxprint -g testdg -htr testvol
dm d1           hds9500-alua0_210 auto 65536   1931008  -
dm d2           hds9500-alua0_232 auto 65536   954112   -
dm d3           hds9500-alua0_217 auto 65536   1931008  -
dm d4           hds9500-alua0_224 auto 65536   855808   -

v  testvol      -            ENABLED  ACTIVE   1024000  SELECT    testvol-01 fsgen
pl testvol-01   testvol      ENABLED  ACTIVE   1024128  STRIPE    3/128    RW
sd d1-06        testvol-01   d1       716800   341376   0/0       hds9500-alua0_210 ENA
sd d3-06        testvol-01   d3       716800   341376   1/0       hds9500-alua0_217 ENA
sd d4-04        testvol-01   d4       716800   104064   2/0       hds9500-alua0_224 ENA
sd d2-04        testvol-01   d2       716800   237312   2/104064  hds9500-alua0_232 ENA
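To confirm that the plex now has the expected number of columns, the plex records can be filtered out of the listing. This is a minimal sketch, assuming the vxprint -htr column order shown above; the function name plex_layouts is hypothetical.

```shell
# Print "name layout ncol/width" for every plex record (pl, p2, p3, ...)
# found in vxprint output read from standard input.
plex_layouts() {
    awk '$1 ~ /^(pl|p[0-9]+)$/ { print $2, $7, $8 }'
}

# Hypothetical usage on a live system:
#   vxprint -g testdg -htr testvol | plex_layouts
```

For the final listing above, the single plex testvol-01 would be reported as STRIPE 3/128, which matches the three-column target of the relayout.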

 

Applies To

At the start of the process we have a standard 2-column striped volume with a mounted filesystem

# vxprint -g testdg -hrt testvol
dm d1           hds9500-alua0_210 auto 65536   1931008  -
dm d2           hds9500-alua0_232 auto 65536   954112   -
dm d3           hds9500-alua0_217 auto 65536   1931008  -
dm d4           hds9500-alua0_224 auto 65536   855808   -

v  testvol      -            ENABLED  ACTIVE   1024000  SELECT    testvol-01 fsgen
pl testvol-01   testvol      ENABLED  ACTIVE   1024000  STRIPE    2/128    RW
sd d1-03        testvol-01   d1       716800   512000   0/0       hds9500-alua0_210 ENA
sd d3-03        testvol-01   d3       716800   512000   1/0       hds9500-alua0_217 ENA

# df -kl /tmnt
Filesystem                               kbytes     used        avail  capacity  Mounted on
/dev/vx/dsk/testdg/testvol    512000  442348    65328    88%      /tmnt

Issue/Introduction

A relayout is started to increase a striped volume by one column:

# vxassist -g testdg relayout testvol ncol=+1

As a result of a SAN outage, vxrelayout fails to complete. In the worst case the filesystem is unmounted and some or all of the volumes/sub-volumes are disabled.

# df -kl may show a stale mount point where the filesystem existed:

df: cannot statvfs /tmnt: I/O error

Checking the status of the relayout:

# vxrelayout -g testdg status testvol
STRIPED, columns=2, stwidth=128 --> STRIPED, columns=3, stwidth=128
Relayout stopped, 79.80% completed.

An attempt to restart the relayout fails.
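When the status has to be checked repeatedly, the state and percentage can be pulled out of the status output. This is a minimal sketch, assuming the "Relayout stopped, 79.80% completed" line format shown in this article; other states are assumed to follow the same pattern, and the function name relayout_progress is hypothetical.

```shell
# Read vxrelayout status output on standard input and print
# "state percent", for example "stopped 79.80".
relayout_progress() {
    sed -n 's/^Relayout \([a-z]*\), \([0-9.]*\)% completed.*/\1 \2/p'
}

# Hypothetical usage on a live system:
#   vxrelayout -g testdg status testvol | relayout_progress
```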