VxVM: Fast Mirror Resync (FMR) may result in inconsistent plex content (Data corruption) following plex attach for layered volumes


Article ID: 100046958


Description


Cause


The issue applies only to VxVM layered volumes that have a DCO (Data Change Object) associated.

If no DCO is present on a layered volume, the corruption (missed writes) does not occur.

Without a DCO on a mirrored volume, a full resync of the data between plexes is required when a detached plex is reattached.

In this case, we found that the sub-volume's start offset within the virtual address space of the top-level volume was not aligned to the FMR region size.
 

  • Further analysis found a bug in the plex-attach code that skipped the atomic copy for a dirty region.

  • The skip occurred because the sub-volume start offset was not aligned to the DCO region size.

  • Because of the unaligned start offset, one region that spanned a 1 MB boundary was not resynced while the plex was being attached.

  • We narrowed down the defect on an internal setup by writing data patterns to specific volume offsets, with the sub-volume start offset NOT aligned to the DCO region size, and verifying the patterns after the plex attach finished.
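
The misalignment can be checked with simple arithmetic on the sub-volume start offsets that appear in the sv records of the vxprint output under Reproduction Steps. The sketch below assumes that vxprint reports offsets in 512-byte sectors and uses a 1 MB boundary (2048 sectors) purely for illustration; the DCO region size of a given volume may differ. The function name is hypothetical.

```shell
#!/bin/sh
# Assumption: offsets are in 512-byte sectors, so 1 MB = 2048 sectors.
SECTORS_PER_MB=2048

# misalignment START: print how many sectors START lies past the
# previous 1 MB boundary (0 means START is aligned).
misalignment() {
    echo $(( $1 % SECTORS_PER_MB ))
}

# Sub-volume start offsets of vol01-S01, vol01-S02 and vol01-S03.
for start in 0 1048469696 2096939392; do
    rem=$(misalignment "$start")
    if [ "$rem" -eq 0 ]; then
        echo "$start: aligned"
    else
        echo "$start: unaligned by $rem sectors ($(( rem / 2 )) KB)"
    fi
done
```

Under these assumptions, the first sub-volume starts on a 1 MB boundary, while the second and third start 96 KB and 192 KB past one.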
     

To help isolate the inconsistent plex content, the plex read policy can be changed to read from specific plexes, i.e. all the plexes associated with a given enclosure (site):

1. Stop the application

2. Set the read policy for each sub-layer volume to prefer the plex that belongs to a single enclosure

# vxvol -g <diskgroup> rdpol prefer <sub-layer-volume> <plex>

3. Start the application


If the application reports errors, switch the read preference to the plexes on the other enclosure:


4. Stop the application

5. Set the read policy for each sub-layer volume to prefer the plex that belongs to the other enclosure

# vxvol -g <diskgroup> rdpol prefer <sub-layer-volume> <plex>

6. Start the application



 

Resolution


Veritas engineering has released the private hot-fix listed below. Contact support to obtain the fix.

The vm-rhel7_x86_64-HotFix-7.3.1.2703 hot-fix includes fixes for multiple incidents:

Patch ID: 7.3.1.2703

3991737 (3976392) Memory corruption might happen in VxVM (Veritas Volume Manager) while processing Plex detach request.
3991996 (3950335) Support for throttling of Administrative IO for layered volumes
3992054 (3992053) Data corruption may happen with layered volumes due to some data not re-synced while attaching a plex.
3992302 (3991580) Deadlock may happen if IO performed on both source and snapshot volumes.



NOTE: The layered volumes issue impacts all VxVM versions and platforms.


Reproduction Steps
 

1. Create a layered volume, for example with layout=concat-mirror

# vxassist -bg testdg make vol01 1t layout=concat-mirror

NOTE: Creating the volume can take some time, depending on the volume size specified.
 

2. Add a DCO log to the volume

# vxsnap -g testdg prepare vol01
 

The vxprint output will look similar to the following:
 

# vxprint -qhtg testdg
dg testdg       default      default  23000    1579091402.34.gpk630r4c-08

dm 3pardata0_129 3pardata0_129 auto   65536    1048469696 -
dm 3pardata0_130 3pardata0_130 auto   65536    1048469696 -
dm 3pardata0_131 3pardata0_131 auto   65536    1048469696 -
dm 3pardata0_132 3pardata0_132 auto   65536    1048469696 -
dm 3pardata0_133 3pardata0_133 auto   65536    1048469696 -
dm 3pardata0_134 3pardata0_134 auto   65536    1048469696 -
dm 3pardata0_135 3pardata0_135 auto   65536    1048469696 -
dm 3pardata0_136 3pardata0_136 auto   65536    1048469696 -
dm 3pardata0_137 3pardata0_137 auto   65536    1048469696 -
dm 3pardata0_138 3pardata0_138 auto   65536    1048469696 -

v  vol01        -            ENABLED  ACTIVE   2147483648 SELECT  -        fsgen
pl vol01-03     vol01        ENABLED  ACTIVE   2147483648 CONCAT  -        RW
sv vol01-S01    vol01-03     vol01-L01 1       1048469696 0       2/2      ENA
sv vol01-S02    vol01-03     vol01-L02 1       1048469696 1048469696 2/2   ENA
sv vol01-S03    vol01-03     vol01-L03 1       50544256 2096939392 2/2     ENA
dc vol01_dco    vol01        vol01_dcl
v  vol01_dcl    -            ENABLED  ACTIVE   143488   SELECT    -        gen
pl vol01_dcl-01 vol01_dcl    ENABLED  ACTIVE   143488   CONCAT    -        RW
sd 3pardata0_133-01 vol01_dcl-01 3pardata0_133 50544256 143488 0  3pardata0_133 ENA
pl vol01_dcl-02 vol01_dcl    ENABLED  ACTIVE   143488   CONCAT    -        RW
sd 3pardata0_134-01 vol01_dcl-02 3pardata0_134 50544256 143488 0  3pardata0_134 ENA

v  vol01-L01    -            ENABLED  ACTIVE   1048469696 SELECT  -        fsgen
pl vol01-P01    vol01-L01    ENABLED  ACTIVE   1048469696 CONCAT  -        RW
sd 3pardata0_129-02 vol01-P01 3pardata0_129 0  1048469696 0       3pardata0_129 ENA
pl vol01-P02    vol01-L01    ENABLED  ACTIVE   1048469696 CONCAT  -        RW
sd 3pardata0_130-02 vol01-P02 3pardata0_130 0  1048469696 0       3pardata0_130 ENA

v  vol01-L02    -            ENABLED  ACTIVE   1048469696 SELECT  -        fsgen
pl vol01-P03    vol01-L02    ENABLED  ACTIVE   1048469696 CONCAT  -        RW
sd 3pardata0_131-02 vol01-P03 3pardata0_131 0  1048469696 0       3pardata0_131 ENA
pl vol01-P04    vol01-L02    ENABLED  ACTIVE   1048469696 CONCAT  -        RW
sd 3pardata0_132-02 vol01-P04 3pardata0_132 0  1048469696 0       3pardata0_132 ENA

v  vol01-L03    -            ENABLED  ACTIVE   50544256 SELECT    -        fsgen
pl vol01-P05    vol01-L03    ENABLED  ACTIVE   50544256 CONCAT    -        RW
sd 3pardata0_133-02 vol01-P05 3pardata0_133 0  50544256 0         3pardata0_133 ENA
pl vol01-P06    vol01-L03    ENABLED  ACTIVE   50544256 CONCAT    -        RW
sd 3pardata0_134-02 vol01-P06 3pardata0_134 0  50544256 0         3pardata0_134 ENA
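
The sub-volume start offsets can be pulled out of this output and checked against a boundary. The helper below is a minimal sketch: the function name is hypothetical, it assumes that the 7th field of an sv record is the sub-volume's start offset in the parent plex, that offsets are in 512-byte sectors, and that a 1 MB boundary (2048 sectors) is the value of interest.

```shell
#!/bin/sh
# sv_alignment REGION_SECTORS: read "vxprint -qht" output on stdin and, for
# each sv record, print the sub-volume name, its start offset (7th field)
# and the remainder of that offset modulo REGION_SECTORS (0 means aligned).
sv_alignment() {
    awk -v region="$1" '$1 == "sv" { printf "%s %s %d\n", $2, $7, $7 % region }'
}

# Sample input copied from the sv records shown above.
sv_alignment 2048 <<'EOF'
sv vol01-S01    vol01-03     vol01-L01 1       1048469696 0       2/2      ENA
sv vol01-S02    vol01-03     vol01-L02 1       1048469696 1048469696 2/2   ENA
sv vol01-S03    vol01-03     vol01-L03 1       50544256 2096939392 2/2     ENA
EOF
```

On a live system the same helper could be fed directly, for example with `vxprint -qhtg testdg | sv_alignment 2048`. A non-zero third column marks a sub-volume whose start offset is not on the chosen boundary.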

 


To prevent the hot-relocation daemon (vxrelocd) from relocating subdisks to other available space, stop the vxrelocd processes.

Example:

# ps -ef | grep -i vxrelocd
root      6317     1  0 Jan15 ?        00:00:00 /bin/sh - /usr/lib/vxvm/bin/vxrelocd root
root      6386  6317  0 Jan15 ?        00:00:00 /bin/sh - /usr/lib/vxvm/bin/vxrelocd root
root     32078 13648  0 09:52 pts/0    00:00:00 grep --color=auto -i vxrelocd


# kill -9 6317 6386

# ps -ef | grep -i vxrelocd
root     32080 13648  0 09:52 pts/0    00:00:00 grep --color=auto -i vxrelocd
 

 

3. Ideally, two enclosures would be used for redundancy. In this instance, the 2nd plex of each sub-layer volume is detached by disabling the corresponding dmpnodes

# vxdmpadm -f disable dmpnodename=<dmpnode-name>

Examples:

# vxdmpadm -f disable dmpnodename=3pardata0_130
# vxdmpadm -f disable dmpnodename=3pardata0_132
# vxdmpadm -f disable dmpnodename=3pardata0_134


 

4. Leave I/O running for 30 minutes to an hour or more, so that the surviving sub-layer plexes are updated while the other plexes remain detached (DISABLED NODEVICE)
 

# vxprint -qhtg testdg
dg testdg       default      default  23000    1579091402.34.gpk630r4c-08

dm 3pardata0_129 3pardata0_129 auto   65536    1048469696 -
dm 3pardata0_130 -           -        -        -        NODEVICE
dm 3pardata0_131 3pardata0_131 auto   65536    1048469696 -
dm 3pardata0_132 -           -        -        -        NODEVICE
dm 3pardata0_133 3pardata0_133 auto   65536    1048469696 -
dm 3pardata0_134 -           -        -        -        NODEVICE
dm 3pardata0_135 3pardata0_135 auto   65536    1048469696 -
dm 3pardata0_136 3pardata0_136 auto   65536    1048469696 -
dm 3pardata0_137 3pardata0_137 auto   65536    1048469696 -
dm 3pardata0_138 3pardata0_138 auto   65536    1048469696 -

v  vol01        -            ENABLED  ACTIVE   2147483648 SELECT  -        fsgen
pl vol01-03     vol01        ENABLED  ACTIVE   2147483648 CONCAT  -        RW
sv vol01-S01    vol01-03     vol01-L01 1       1048469696 0       1/2      ENA
sv vol01-S02    vol01-03     vol01-L02 1       1048469696 1048469696 1/2   ENA
sv vol01-S03    vol01-03     vol01-L03 1       50544256 2096939392 1/2     ENA
dc vol01_dco    vol01        vol01_dcl
v  vol01_dcl    -            ENABLED  ACTIVE   143488   SELECT    -        gen
pl vol01_dcl-01 vol01_dcl    ENABLED  ACTIVE   143488   CONCAT    -        RW
sd 3pardata0_133-01 vol01_dcl-01 3pardata0_133 50544256 143488 0  3pardata0_133 ENA
pl vol01_dcl-02 vol01_dcl    DISABLED NODEVICE 143488   CONCAT    -        RW
sd 3pardata0_134-01 vol01_dcl-02 3pardata0_134 50544256 143488 0  -        RLOC

v  vol01-L01    -            ENABLED  ACTIVE   1048469696 SELECT  -        fsgen
pl vol01-P01    vol01-L01    ENABLED  ACTIVE   1048469696 CONCAT  -        RW
sd 3pardata0_129-02 vol01-P01 3pardata0_129 0  1048469696 0       3pardata0_129 ENA
pl vol01-P02    vol01-L01    DISABLED NODEVICE 1048469696 CONCAT  -        RW
sd 3pardata0_130-02 vol01-P02 3pardata0_130 0  1048469696 0       -        RLOC

v  vol01-L02    -            ENABLED  ACTIVE   1048469696 SELECT  -        fsgen
pl vol01-P03    vol01-L02    ENABLED  ACTIVE   1048469696 CONCAT  -        RW
sd 3pardata0_131-02 vol01-P03 3pardata0_131 0  1048469696 0       3pardata0_131 ENA
pl vol01-P04    vol01-L02    DISABLED NODEVICE 1048469696 CONCAT  -        RW
sd 3pardata0_132-02 vol01-P04 3pardata0_132 0  1048469696 0       -        NDEV

v  vol01-L03    -            ENABLED  ACTIVE   50544256 SELECT    -        fsgen
pl vol01-P05    vol01-L03    ENABLED  ACTIVE   50544256 CONCAT    -        RW
sd 3pardata0_133-02 vol01-P05 3pardata0_133 0  50544256 0         3pardata0_133 ENA
pl vol01-P06    vol01-L03    DISABLED NODEVICE 50544256 CONCAT    -        RW
sd 3pardata0_134-02 vol01-P06 3pardata0_134 0  50544256 0         -        RLOC
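
To confirm which plexes are detached without reading the whole listing, the pl records can be filtered on their kernel state. This is a minimal sketch with a hypothetical function name; it assumes that the 4th field of a pl record is the kernel state and the 5th is the plex state, as in the output above.

```shell
#!/bin/sh
# detached_plexes: read "vxprint -qht" output on stdin and print
# "PLEX VOLUME STATE" for every plex whose kernel state is DISABLED.
detached_plexes() {
    awk '$1 == "pl" && $4 == "DISABLED" { print $2, $3, $5 }'
}

# Sample input copied from the pl records shown above.
detached_plexes <<'EOF'
pl vol01-P01    vol01-L01    ENABLED  ACTIVE   1048469696 CONCAT  -        RW
pl vol01-P02    vol01-L01    DISABLED NODEVICE 1048469696 CONCAT  -        RW
pl vol01-P03    vol01-L02    ENABLED  ACTIVE   1048469696 CONCAT  -        RW
pl vol01-P04    vol01-L02    DISABLED NODEVICE 1048469696 CONCAT  -        RW
EOF
```

On a live system the input would come from `vxprint -qhtg testdg | detached_plexes`. An empty result only indicates that no plex is in the DISABLED kernel state; it does not by itself confirm that a resync has finished.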


5. Enable the disabled dmpnodes for the detached plexes and wait for the vxattachd daemon (180 seconds or more) to detect the returning disks and perform the plex recovery

# vxdmpadm enable dmpnodename=<dmpnode-name>

Examples:

# vxdmpadm enable dmpnodename=3pardata0_130
# vxdmpadm enable dmpnodename=3pardata0_132
# vxdmpadm enable dmpnodename=3pardata0_134

 

6. Stop the application


7. Once the plexes have been resynced, set the plex read policy to read from the resynced plexes

# vxvol -g <diskgroup> rdpol prefer <sub-layer-volume> <plex>

Examples:

# vxvol -g testdg rdpol prefer vol01-L01 vol01-P02
# vxvol -g testdg rdpol prefer vol01-L02 vol01-P04
# vxvol -g testdg rdpol prefer vol01-L03 vol01-P06

 


8. Start the application and check whether it reports any errors


9. If errors are reported, stop the application and switch the read preference back to the 1st plex of each sub-layered volume

# vxvol -g <diskgroup> rdpol prefer <sub-layer-volume> <plex>
 

Examples:

# vxvol -g testdg rdpol prefer vol01-L01 vol01-P01
# vxvol -g testdg rdpol prefer vol01-L02 vol01-P03
# vxvol -g testdg rdpol prefer vol01-L03 vol01-P05


10. Start the application and check whether it reports any errors
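
Steps 7 and 9 apply the same vxvol command to every sub-layer volume, so the command lines can be generated from a list of volume and plex pairs. The sketch below is a dry run that only prints the commands for review; the function name and the VOLUME:PLEX argument format are hypothetical conveniences, while the printed command syntax is the one used in the examples above.

```shell
#!/bin/sh
# print_rdpol DISKGROUP VOLUME:PLEX...: print one "vxvol ... rdpol prefer"
# command per VOLUME:PLEX pair. Nothing is executed (dry run).
print_rdpol() {
    dg=$1
    shift
    for pair in "$@"; do
        vol=${pair%%:*}    # text before the first colon
        plex=${pair#*:}    # text after the first colon
        echo "vxvol -g $dg rdpol prefer $vol $plex"
    done
}

# Prefer the resynced (2nd) plexes, as in step 7.
print_rdpol testdg vol01-L01:vol01-P02 vol01-L02:vol01-P04 vol01-L03:vol01-P06
```

Printing the commands first leaves room to check the volume and plex names against the vxprint output before running them by hand.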
 

 

Issue/Introduction


In Veritas Volume Manager (VxVM), the Fast Mirror Resync (FMR) feature allows a detached plex to be resynchronized quickly. The regions of the volume that are modified (dirtied) after a plex is detached are tracked persistently in a bitmap, and only those regions are copied during the plex attach.
  • Dirty regions are tracked persistently in the Data Change Object (DCO) bitmap
  • In a layered volume, FMR tracking is done in the address space of the top-level (main) volume
  • The plex attach I/Os (ATOMIC_COPY), however, are performed at the sub-volume level, in the sub-volume's address space
  • During the plex attach operation, the VxVM code converts the sub-volume offset into a top-level volume offset and then checks whether the corresponding region is marked dirty or clean in the DCO bitmap
  • The VxVM code issues I/Os for the dirty regions; each I/O is 1 MB in size
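
The offset conversion described in the list above can be modelled with a few lines of arithmetic. This is an illustrative model only, not VxVM source code: the function names are hypothetical, offsets are assumed to be in 512-byte sectors, and the tracking region size is assumed to equal 1 MB (2048 sectors), which may not match the DCO region size of a real volume.

```shell
#!/bin/sh
# Illustrative model only; this is not VxVM source code.
REGION=2048    # assumed region size, in sectors (1 MB)
IO_SIZE=2048   # 1 MB copy I/O, in sectors

# to_top SV_START SV_OFFSET: translate a sub-volume offset into the
# top-level volume address space.
to_top() {
    echo $(( $1 + $2 ))
}

# regions_spanned TOP_OFFSET LENGTH: print the first and last region index
# touched by an I/O of LENGTH sectors that starts at TOP_OFFSET.
regions_spanned() {
    echo "$(( $1 / REGION )) $(( ($1 + $2 - 1) / REGION ))"
}

# First 1 MB copy I/O of vol01-S01 (start 0) and vol01-S02 (start 1048469696).
for sv_start in 0 1048469696; do
    top=$(to_top "$sv_start" 0)
    echo "sv_start=$sv_start top=$top regions=$(regions_spanned "$top" "$IO_SIZE")"
done
```

With the aligned start offset the I/O maps to a single region. With the unaligned start offset the same I/O touches two consecutive regions, so a lookup that consults only one bitmap position could overlook a dirty region. That is one plausible reading of the behavior described in this article, not a statement of how the VxVM code is written.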

We were able to reproduce a similar corruption in-house by running a random write workload against the volume after detaching a plex.

During the subsequent plex attach, one particular region of the volume was not resynced even though its bit in the DCO bitmap was marked dirty. The data on the two plexes was therefore inconsistent, which resulted in corruption when data was read from the newly attached plex.

Additional Information

JIRA: STESC-3899