VxVM: Fast Mirror Resync (FMR) may result in inconsistent plex content (Data corruption) following plex attach for layered volumes

Description

Error Message

Cause

The issue only applies to VxVM layered volumes with DCOs.

If DCOs are not present with layered volumes, the corruption (missed writes) will not occur.

When DCOs are not added to mirrored volumes, a full sync of the data is required between plexes when the detached plex is reattached.

In this case, we found that the sub-volume's start offset in the actual (top-level) volume's virtual address space was not aligned to the FMR region size.

Upon further analysis, we found a bug in the plex-attach code that skipped the atomic copy for a dirty region.
This happened because the sub-volume start offset was not aligned with respect to the DCO region size.
Because of the unaligned sub-volume start offset, the resync of one region that spanned a 1 MB boundary was skipped while attaching the plex.
We narrowed down this defect on our internal setup by writing data patterns at specific volume offsets, with a sub-volume start offset NOT aligned to the DCO region size, and verifying those patterns after the plex attach finished.
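
The alignment condition described above can be illustrated with a small arithmetic sketch. The helper name region_remainder is hypothetical, the offsets are the sub-volume start offsets shown in the sv records of the vxprint output under Reproduction Steps, and both the 512-byte sector unit and the 2048-sector (1 MB) region size are assumptions; the DCO region size on a given system may differ.

```shell
#!/bin/sh
# Hypothetical helper: print the remainder (in sectors) of a start offset
# relative to a region size. A result of 0 means the offset is region-aligned.
region_remainder() {
    offset=$1    # sub-volume start offset in the top-level volume, in sectors
    region=$2    # region size in sectors (assumed value, not read from the DCO)
    echo $(( offset % region ))
}

# Assumption: 2048 sectors of 512 bytes = 1 MB.
REGION_SECTORS=2048

# Start offsets of vol01-S01, vol01-S02 and vol01-S03 from the vxprint output.
for off in 0 1048469696 2096939392; do
    rem=$(region_remainder "$off" "$REGION_SECTORS")
    if [ "$rem" -eq 0 ]; then
        echo "offset $off: aligned"
    else
        echo "offset $off: NOT aligned (remainder $rem sectors)"
    fi
done
```

Under these assumptions, the second and third sub-volumes start 192 and 384 sectors past a region boundary, which is the kind of layout in which the skipped resync was observed.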

To isolate the inconsistent plex content, the plex read policy can be changed to read from a specific set of plexes, i.e. all the plexes associated with a given enclosure (site):

1. Stop the application

2. Set the read policy for each sub-layer volume to reference the plexes for a single enclosure

# vxvol -g <diskgroup> rdpol prefer <volume> <plex>

3. Start the application

If the application reports errors, switch the read plex preference to the plexes for the other enclosure.

4. Stop the application

5. Set the read policy for each sub-layer volume to reference the plexes for the other enclosure

# vxvol -g <diskgroup> rdpol prefer <volume> <plex>

6. Start the application
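
When several sub-layer volumes are involved, it may be convenient to generate the read-policy commands rather than type each one. The sketch below is a dry-run only: the helper name print_rdpol_cmds and the volume:plex pair notation are hypothetical, and the disk group, volume and plex names are borrowed from the Reproduction Steps as placeholders. It prints the commands and does not execute them.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) one
# "vxvol rdpol prefer" command per volume:plex pair.
print_rdpol_cmds() {
    dg=$1
    shift
    for pair in "$@"; do
        vol=${pair%%:*}    # text before the first colon
        plex=${pair#*:}    # text after the first colon
        echo "vxvol -g $dg rdpol prefer $vol $plex"
    done
}

# Placeholder names taken from the Reproduction Steps; substitute real ones.
print_rdpol_cmds testdg vol01-L01:vol01-P01 vol01-L02:vol01-P03 vol01-L03:vol01-P05
```

Reviewing the printed commands before running them keeps the change limited to the intended sub-layer volumes.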

Resolution

Veritas engineering has released the private hot-fix below. Contact support to obtain the fix.

The vm-rhel7_x86_64-HotFix-7.3.1.2703 hot-fix includes multiple incidents:

Patch ID: 7.3.1.2703
3991737 (3976392) Memory corruption might happen in VxVM (Veritas Volume Manager) while processing Plex detach request.
3991996 (3950335) Support for throttling of Administrative IO for layered volumes
3992054 (3992053) Data corruption may happen with layered volumes due to some data not re-synced while attaching a plex.
3992302 (3991580) Deadlock may happen if IO performed on both source and snapshot volumes.

NOTE: The layered volumes issue impacts all VxVM versions and platforms.

Reproduction Steps

1. Create a layered volume, e.g. layout=concat-mirror

# vxassist -bg testdg make vol01 1t layout=concat-mirror

NOTE: It can take some time to create the volume, depending on the volume size specified.

2. Add DCO log to the volume

# vxsnap -g testdg prepare vol01

The vxprint output will look similar to the below:

# vxprint -qhtg testdg
dg testdg default default 23000 1579091402.34.gpk630r4c-08

dm 3pardata0_129 3pardata0_129 auto 65536 1048469696 -
dm 3pardata0_130 3pardata0_130 auto 65536 1048469696 -
dm 3pardata0_131 3pardata0_131 auto 65536 1048469696 -
dm 3pardata0_132 3pardata0_132 auto 65536 1048469696 -
dm 3pardata0_133 3pardata0_133 auto 65536 1048469696 -
dm 3pardata0_134 3pardata0_134 auto 65536 1048469696 -
dm 3pardata0_135 3pardata0_135 auto 65536 1048469696 -
dm 3pardata0_136 3pardata0_136 auto 65536 1048469696 -
dm 3pardata0_137 3pardata0_137 auto 65536 1048469696 -
dm 3pardata0_138 3pardata0_138 auto 65536 1048469696 -

v vol01 - ENABLED ACTIVE 2147483648 SELECT - fsgen
pl vol01-03 vol01 ENABLED ACTIVE 2147483648 CONCAT - RW
sv vol01-S01 vol01-03 vol01-L01 1 1048469696 0 2/2 ENA
sv vol01-S02 vol01-03 vol01-L02 1 1048469696 1048469696 2/2 ENA
sv vol01-S03 vol01-03 vol01-L03 1 50544256 2096939392 2/2 ENA
dc vol01_dco vol01 vol01_dcl
v vol01_dcl - ENABLED ACTIVE 143488 SELECT - gen
pl vol01_dcl-01 vol01_dcl ENABLED ACTIVE 143488 CONCAT - RW
sd 3pardata0_133-01 vol01_dcl-01 3pardata0_133 50544256 143488 0 3pardata0_133 ENA
pl vol01_dcl-02 vol01_dcl ENABLED ACTIVE 143488 CONCAT - RW
sd 3pardata0_134-01 vol01_dcl-02 3pardata0_134 50544256 143488 0 3pardata0_134 ENA

v vol01-L01 - ENABLED ACTIVE 1048469696 SELECT - fsgen
pl vol01-P01 vol01-L01 ENABLED ACTIVE 1048469696 CONCAT - RW
sd 3pardata0_129-02 vol01-P01 3pardata0_129 0 1048469696 0 3pardata0_129 ENA
pl vol01-P02 vol01-L01 ENABLED ACTIVE 1048469696 CONCAT - RW
sd 3pardata0_130-02 vol01-P02 3pardata0_130 0 1048469696 0 3pardata0_130 ENA

v vol01-L02 - ENABLED ACTIVE 1048469696 SELECT - fsgen
pl vol01-P03 vol01-L02 ENABLED ACTIVE 1048469696 CONCAT - RW
sd 3pardata0_131-02 vol01-P03 3pardata0_131 0 1048469696 0 3pardata0_131 ENA
pl vol01-P04 vol01-L02 ENABLED ACTIVE 1048469696 CONCAT - RW
sd 3pardata0_132-02 vol01-P04 3pardata0_132 0 1048469696 0 3pardata0_132 ENA

v vol01-L03 - ENABLED ACTIVE 50544256 SELECT - fsgen
pl vol01-P05 vol01-L03 ENABLED ACTIVE 50544256 CONCAT - RW
sd 3pardata0_133-02 vol01-P05 3pardata0_133 0 50544256 0 3pardata0_133 ENA
pl vol01-P06 vol01-L03 ENABLED ACTIVE 50544256 CONCAT - RW
sd 3pardata0_134-02 vol01-P06 3pardata0_134 0 50544256 0 3pardata0_134 ENA

To prevent hot-relocation (vxrelocd) from trying to relocate subdisks to other available space, stop the vxrelocd processes.

Example:

# ps -ef | grep -i vxrelocd
root 6317 1 0 Jan15 ? 00:00:00 /bin/sh - /usr/lib/vxvm/bin/vxrelocd root
root 6386 6317 0 Jan15 ? 00:00:00 /bin/sh - /usr/lib/vxvm/bin/vxrelocd root
root 32078 13648 0 09:52 pts/0 00:00:00 grep --color=auto -i vxrelocd

# kill -9 6317 6386

# ps -ef | grep -i vxrelocd
root 32080 13648 0 09:52 pts/0 00:00:00 grep --color=auto -i vxrelocd

3. Ideally you would have two enclosures for redundancy. In this instance, however, the 2nd plex for each sub-layer volume is detached by disabling the corresponding dmpnodes

# vxdmpadm -f disable dmpnodename=<dmpnodename>

Examples:
# vxdmpadm -f disable dmpnodename=3pardata0_130
# vxdmpadm -f disable dmpnodename=3pardata0_132
# vxdmpadm -f disable dmpnodename=3pardata0_134

4. Leave I/O running for 30 minutes to an hour or more so that the surviving attached sub-layer plexes are updated while the other plexes remain in a detached state (DISABLED NODEVICE)

# vxprint -qhtg testdg
dg testdg default default 23000 1579091402.34.gpk630r4c-08

dm 3pardata0_129 3pardata0_129 auto 65536 1048469696 -
dm 3pardata0_130 - - - - NODEVICE
dm 3pardata0_131 3pardata0_131 auto 65536 1048469696 -
dm 3pardata0_132 - - - - NODEVICE
dm 3pardata0_133 3pardata0_133 auto 65536 1048469696 -
dm 3pardata0_134 - - - - NODEVICE
dm 3pardata0_135 3pardata0_135 auto 65536 1048469696 -
dm 3pardata0_136 3pardata0_136 auto 65536 1048469696 -
dm 3pardata0_137 3pardata0_137 auto 65536 1048469696 -
dm 3pardata0_138 3pardata0_138 auto 65536 1048469696 -

v vol01 - ENABLED ACTIVE 2147483648 SELECT - fsgen
pl vol01-03 vol01 ENABLED ACTIVE 2147483648 CONCAT - RW
sv vol01-S01 vol01-03 vol01-L01 1 1048469696 0 1/2 ENA
sv vol01-S02 vol01-03 vol01-L02 1 1048469696 1048469696 1/2 ENA
sv vol01-S03 vol01-03 vol01-L03 1 50544256 2096939392 1/2 ENA
dc vol01_dco vol01 vol01_dcl
v vol01_dcl - ENABLED ACTIVE 143488 SELECT - gen
pl vol01_dcl-01 vol01_dcl ENABLED ACTIVE 143488 CONCAT - RW
sd 3pardata0_133-01 vol01_dcl-01 3pardata0_133 50544256 143488 0 3pardata0_133 ENA
pl vol01_dcl-02 vol01_dcl DISABLED NODEVICE 143488 CONCAT - RW
sd 3pardata0_134-01 vol01_dcl-02 3pardata0_134 50544256 143488 0 - RLOC

v vol01-L01 - ENABLED ACTIVE 1048469696 SELECT - fsgen
pl vol01-P01 vol01-L01 ENABLED ACTIVE 1048469696 CONCAT - RW
sd 3pardata0_129-02 vol01-P01 3pardata0_129 0 1048469696 0 3pardata0_129 ENA
pl vol01-P02 vol01-L01 DISABLED NODEVICE 1048469696 CONCAT - RW
sd 3pardata0_130-02 vol01-P02 3pardata0_130 0 1048469696 0 - RLOC

v vol01-L02 - ENABLED ACTIVE 1048469696 SELECT - fsgen
pl vol01-P03 vol01-L02 ENABLED ACTIVE 1048469696 CONCAT - RW
sd 3pardata0_131-02 vol01-P03 3pardata0_131 0 1048469696 0 3pardata0_131 ENA
pl vol01-P04 vol01-L02 DISABLED NODEVICE 1048469696 CONCAT - RW
sd 3pardata0_132-02 vol01-P04 3pardata0_132 0 1048469696 0 - NDEV

v vol01-L03 - ENABLED ACTIVE 50544256 SELECT - fsgen
pl vol01-P05 vol01-L03 ENABLED ACTIVE 50544256 CONCAT - RW
sd 3pardata0_133-02 vol01-P05 3pardata0_133 0 50544256 0 3pardata0_133 ENA
pl vol01-P06 vol01-L03 DISABLED NODEVICE 50544256 CONCAT - RW
sd 3pardata0_134-02 vol01-P06 3pardata0_134 0 50544256 0 - RLOC

5. Enable the disabled dmpnodes for the detached plexes and wait for the vxattachd daemon (180 seconds or more) to detect the returning disks and perform the plex recovery

# vxdmpadm enable dmpnodename=<dmpnodename>

Examples:
# vxdmpadm enable dmpnodename=3pardata0_130
# vxdmpadm enable dmpnodename=3pardata0_132
# vxdmpadm enable dmpnodename=3pardata0_134

6. Stop the application

7. Once the plexes have been resynced, set the plex read policy to read from the resynced plexes

# vxvol -g <diskgroup> rdpol prefer <volume> <plex>

Examples:

# vxvol -g testdg rdpol prefer vol01-L01 vol01-P02
# vxvol -g testdg rdpol prefer vol01-L02 vol01-P04
# vxvol -g testdg rdpol prefer vol01-L03 vol01-P06

8. Start the application and check whether it reports any errors

9. If errors are reported, stop the application and switch the read preference back to the 1st plex of each sub-layer volume

# vxvol -g <diskgroup> rdpol prefer <volume> <plex>

Examples:

# vxvol -g testdg rdpol prefer vol01-L01 vol01-P01
# vxvol -g testdg rdpol prefer vol01-L02 vol01-P03
# vxvol -g testdg rdpol prefer vol01-L03 vol01-P05

10. Start the application and check whether it reports any errors
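
Before changing the read policy in step 7, it can help to confirm that the plexes detached in step 3 have returned to ENABLED ACTIVE. The sketch below filters vxprint -qhtg output for pl records in any other state. The helper name list_unrecovered_plexes is hypothetical, and it assumes the kernel state and state are the fourth and fifth fields of each pl record, as in the output shown in step 4.

```shell
#!/bin/sh
# Hypothetical helper: read "vxprint -qhtg <diskgroup>" output on stdin and
# print every plex whose kernel state / state is not ENABLED / ACTIVE.
list_unrecovered_plexes() {
    awk '$1 == "pl" && ($4 != "ENABLED" || $5 != "ACTIVE") { print $2 }'
}

# Sample pl records copied from the vxprint output in step 4.
sample='pl vol01-P01 vol01-L01 ENABLED ACTIVE 1048469696 CONCAT - RW
pl vol01-P02 vol01-L01 DISABLED NODEVICE 1048469696 CONCAT - RW'

# Prints only the plex that is still detached.
printf '%s\n' "$sample" | list_unrecovered_plexes
```

On a live system the same filter could be fed directly, for example with vxprint -qhtg testdg | list_unrecovered_plexes, and repeated until it prints nothing.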

Issue/Introduction

In Veritas Volume Manager (VxVM), the Fast Mirror Resync (FMR) feature allows fast resync of a detached plex: the regions of the volume that are modified (dirtied) after a plex is detached are persistently tracked in a bitmap, and only those regions are copied (resynced) during the plex attach.

The dirty regions are persistently tracked in the Data Change Object (DCO) bitmap.
In a layered volume, this FMR tracking happens with respect to the address space of the top volume (main volume),
whereas the plex attach I/Os (ATOMIC_COPY) are performed at the sub-volume level, in the sub-volume's address space.
During the plex attach operation, the VxVM code converts the sub-volume's offset into the top volume's offset and then checks whether the corresponding region is marked dirty or clean in the DCO bitmap.
The VxVM code generates I/Os on such dirty regions; these I/Os are 1 MB each.
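
The offset conversion and region lookup described above can be sketched with plain arithmetic. This is illustrative only and is not VxVM code: the helper name regions_for_io is hypothetical, and the 512-byte sector unit, the 2048-sector (1 MB) I/O length and the 2048-sector region size are all assumptions. The second call uses the start offset of vol01-S02 from the vxprint output under Reproduction Steps.

```shell
#!/bin/sh
# Illustrative arithmetic only; this is not VxVM code.
# Prints the first and last region index touched by one resync I/O.
regions_for_io() {
    sv_start=$1    # sub-volume start offset within the top-level volume (sectors)
    sv_off=$2      # I/O offset within the sub-volume (sectors)
    io_len=$3      # I/O length (sectors)
    region=$4      # region size (sectors), assumed
    top_off=$(( sv_start + sv_off ))    # convert to a top-level volume offset
    first=$(( top_off / region ))
    last=$(( (top_off + io_len - 1) / region ))
    echo "$first $last"
}

# Assumptions: 512-byte sectors, 1 MB (2048-sector) I/Os and regions.
regions_for_io 0 0 2048 2048             # aligned sub-volume start
regions_for_io 1048469696 0 2048 2048    # vol01-S02 start offset
```

With an aligned start the I/O stays within a single region (0 0). With the unaligned start the same 1 MB I/O touches two regions (511948 511949), so a check that consults only one region's bit could miss a dirty neighbour.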

We were able to reproduce a similar corruption in-house by running a random write workload on the volume after detaching a plex.

We found that during the subsequent plex attach, one particular region of the volume was not resynced even though its bit in the DCO bitmap was marked dirty. The data was therefore inconsistent between the two plexes, resulting in corruption when reading from the newly attached plex.

Additional Information

JIRA: STESC-3899
