BACKGROUND:
When using Veritas Volume Replicator (VVR), one of the most frequently asked questions is how a customer can verify that replication is working correctly, and therefore that the data held at their secondary site or node is intact. Likewise, if for some reason that data is not intact, how can this be rectified and what options are available?
VVR offers comprehensive data verification and synchronisation functionality within the product. The purpose of this document is to discuss how verification and synchronisation can be performed, and any considerations that apply to each method.
For the examples in this document, a two-node VVR pair was configured using a single RVG named 'testrvg', which contains three data volumes (datavol, datavol2 and datavol3). The configuration of the RVG on the primary and secondary nodes is shown below:
Primary:
rv testrvg 1 ENABLED ACTIVE primary 3 srlvol
rl to08 testrvg CONNECT ACTIVE rdgv240sol08 vvrdg to07
v datavol testrvg ENABLED ACTIVE 2097152 SELECT - fsgen
pl datavol-01 datavol ENABLED ACTIVE 2097152 CONCAT - RW
sd vvrdg01-01 datavol-01 vvrdg01 0 2097152 0 c3t0d0 ENA
pl datavol-02 datavol ENABLED ACTIVE LOGONLY CONCAT - RW
sd vvrdg01-02 datavol-02 vvrdg01 2097152 64 LOG c3t0d0 ENA
pl datavol-03 datavol ENABLED ACTIVE LOGONLY CONCAT - RW
sd vvrdg02-02 datavol-03 vvrdg02 1048576 64 LOG c3t0d1 ENA
v datavol2 testrvg ENABLED ACTIVE 2097152 SELECT - fsgen
pl datavol2-01 datavol2 ENABLED ACTIVE 2097152 CONCAT - RW
sd vvrdg02-03 datavol2-01 vvrdg02 1048640 2097152 0 c3t0d1 ENA
pl datavol2-02 datavol2 ENABLED ACTIVE LOGONLY CONCAT - RW
sd vvrdg02-04 datavol2-02 vvrdg02 3145792 64 LOG c3t0d1 ENA
pl datavol2-03 datavol2 ENABLED ACTIVE LOGONLY CONCAT - RW
sd vvrdg01-03 datavol2-03 vvrdg01 2097216 64 LOG c3t0d0 ENA
v datavol3 testrvg ENABLED ACTIVE 2097152 SELECT - fsgen
pl datavol3-01 datavol3 ENABLED ACTIVE 2097152 CONCAT - RW
sd vvrdg01-04 datavol3-01 vvrdg01 2097280 2097152 0 c3t0d0 ENA
pl datavol3-02 datavol3 ENABLED ACTIVE LOGONLY CONCAT - RW
sd vvrdg01-05 datavol3-02 vvrdg01 4194432 64 LOG c3t0d0 ENA
pl datavol3-03 datavol3 ENABLED ACTIVE LOGONLY CONCAT - RW
sd vvrdg02-05 datavol3-03 vvrdg02 3145856 64 LOG c3t0d1 ENA
v srlvol testrvg ENABLED ACTIVE 1048576 SELECT - SRL
pl srlvol-01 srlvol ENABLED ACTIVE 1048576 CONCAT - RW
sd vvrdg02-01 srlvol-01 vvrdg02 0 1048576 0 c3t0d1 ENA
Secondary:
rv testrvg 1 ENABLED ACTIVE secondary 3 srlvol
rl to07 testrvg CONNECT ACTIVE rdgv240sol07 vvrdg to08
v datavol testrvg ENABLED ACTIVE 2097152 SELECT - fsgen
pl datavol-01 datavol ENABLED ACTIVE 2097152 CONCAT - RW
sd vvrdg01-01 datavol-01 vvrdg01 0 2097152 0 c3t0d2 ENA
pl datavol-02 datavol ENABLED ACTIVE LOGONLY CONCAT - RW
sd vvrdg01-02 datavol-02 vvrdg01 2097152 64 LOG c3t0d2 ENA
pl datavol-03 datavol ENABLED ACTIVE LOGONLY CONCAT - RW
sd vvrdg02-02 datavol-03 vvrdg02 1048576 64 LOG c3t0d3 ENA
v datavol2 testrvg ENABLED ACTIVE 2097152 SELECT - fsgen
pl datavol2-01 datavol2 ENABLED ACTIVE 2097152 CONCAT - RW
sd vvrdg02-03 datavol2-01 vvrdg02 1048640 2097152 0 c3t0d3 ENA
pl datavol2-02 datavol2 ENABLED ACTIVE LOGONLY CONCAT - RW
sd vvrdg01-04 datavol2-02 vvrdg01 4194368 64 LOG c3t0d2 ENA
pl datavol2-03 datavol2 ENABLED ACTIVE LOGONLY CONCAT - RW
sd vvrdg02-04 datavol2-03 vvrdg02 3145792 64 LOG c3t0d3 ENA
v datavol3 testrvg ENABLED ACTIVE 2097152 SELECT - fsgen
pl datavol3-01 datavol3 ENABLED ACTIVE 2097152 CONCAT - RW
sd vvrdg01-03 datavol3-01 vvrdg01 2097216 2097152 0 c3t0d2 ENA
pl datavol3-02 datavol3 ENABLED ACTIVE LOGONLY CONCAT - RW
sd vvrdg01-05 datavol3-02 vvrdg01 4194432 64 LOG c3t0d2 ENA
pl datavol3-03 datavol3 ENABLED ACTIVE LOGONLY CONCAT - RW
sd vvrdg02-05 datavol3-03 vvrdg02 3145856 64 LOG c3t0d3 ENA
v srlvol testrvg ENABLED ACTIVE 1048576 SELECT - SRL
pl srlvol-01 srlvol ENABLED ACTIVE 1048576 CONCAT - RW
sd vvrdg02-01 srlvol-01 vvrdg02 0 1048576 0 c3t0d3 ENA
To simulate use by real applications, the data volumes were mounted on the primary node:
rdgv240sol07# mount | grep vvrdg
/datavol on /dev/vx/dsk/vvrdg/datavol read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=47c61a8 on Fri Jan 8 11:22:30 2010
/datavol3 on /dev/vx/dsk/vvrdg/datavol3 read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=47c61ac on Fri Jan 8 11:23:09 2010
/datavol2 on /dev/vx/dsk/vvrdg/datavol2 read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=47c61ab on Fri Jan 8 11:23:52 2010
Note also that, for the purposes of this document and its examples, differences in the data held on datavol2 and datavol3 were manually introduced between the primary and secondary nodes.
DATA VERIFICATION
First, let's look at data verification. Verification can take many forms, such as a secondary site fire drill via Veritas Cluster Server, migration to the secondary site followed by manual or application-based data verification, or the use of manually created snapshot volumes at the secondary site. This document, however, discusses the options available within VVR itself, as these are the recommended methods of performing full block-level consistency verification of the volumes within an RVG between primary and secondary nodes.
VVR offers two distinct forms of data verification: offline and online. The main difference between them is that offline verification must be performed in conjunction with application downtime on the primary node, whereas online verification requires some pre-configuration but no application downtime. The following sections discuss each method in more detail, with examples.
Offline verification:
Offline verification checks block-level consistency between corresponding primary and secondary data volumes. To do this, each volume is read in incremental chunks on each node, and an MD5 checksum is generated for each chunk on each node. The checksums generated for the same chunk are then compared between the primary and secondary nodes. If the checksums match, that chunk of the volume is considered identical on both nodes. If they differ, the chunk contains differences and needs some form of resynchronisation.
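The chunk-and-compare technique can be illustrated outside VVR using two ordinary image files. The bash sketch below is not vxrsync's implementation. The function name chunk_diff_pct, the 64 KB default chunk size and the use of GNU 'dd' and 'md5sum' are all assumptions made for illustration (Solaris provides 'digest -a md5' rather than 'md5sum'). It checksums matching chunks of two equal-sized images and prints the percentage of chunks that differ. This is roughly analogous to the Diff column in the VVR output further below.

```shell
#!/bin/bash
# Hypothetical illustration of chunked MD5 comparison (not vxrsync's code).
# Usage: chunk_diff_pct IMAGE_A IMAGE_B [CHUNK_KB]
# Prints the integer percentage of chunks whose MD5 checksums differ.
# Assumes equal-sized inputs and GNU dd/md5sum.
chunk_diff_pct() {
    local a=$1 b=$2 chunk_kb=${3:-64}
    local size chunks off sum_a sum_b i=0 diff=0
    size=$(wc -c < "$a")
    # Round up so that a partial final chunk is still compared.
    chunks=$(( (size + chunk_kb * 1024 - 1) / (chunk_kb * 1024) ))
    if [ "$chunks" -eq 0 ]; then
        echo 0
        return 0
    fi
    while [ "$i" -lt "$chunks" ]; do
        off=$(( i * chunk_kb ))
        # Each side checksums its own copy of chunk i; only checksums are compared.
        sum_a=$(dd if="$a" bs=1024 skip="$off" count="$chunk_kb" 2>/dev/null | md5sum | cut -d' ' -f1)
        sum_b=$(dd if="$b" bs=1024 skip="$off" count="$chunk_kb" 2>/dev/null | md5sum | cut -d' ' -f1)
        if [ "$sum_a" != "$sum_b" ]; then
            diff=$(( diff + 1 ))
        fi
        i=$(( i + 1 ))
    done
    echo $(( diff * 100 / chunks ))
}
```

For example, 'chunk_diff_pct primary.img secondary.img 64' prints 0 when the two images are identical.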
This method allows VVR to report the percentage difference between nodes for each data volume contained within the RVG.
Note that offline verification requires no pre-configuration. It does, however, require an outage of all applications using data volumes in the RVG for the duration of the verification procedure. If an outage is not acceptable, online verification, as described below, should be used instead.
1. Before verification can begin, all applications using data volumes within the primary RVG should be stopped.
2. If the data volumes hold file systems, all file systems on data volumes within the primary RVG should be unmounted:
rdgv240sol07# umount /datavol
rdgv240sol07# umount /datavol2
rdgv240sol07# umount /datavol3
rdgv240sol07# mount | grep vvrdg
rdgv240sol07#
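Before continuing, it can be useful to confirm in a script that nothing from the disk group is still mounted. Below is a minimal bash sketch. It assumes the /dev/vx/dsk/<diskgroup>/ device path seen in the 'mount' output above, and the function name dg_fully_unmounted is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if no mounted file system uses a device
# under /dev/vx/dsk/<DG>/ (the device path format seen in 'mount' above).
# Usage: dg_fully_unmounted DG
dg_fully_unmounted() {
    local dg=$1 mounted
    # The trailing slash stops "vvrdg" from also matching e.g. "vvrdg2".
    mounted=$(mount | grep -F "/dev/vx/dsk/$dg/" || true)
    if [ -n "$mounted" ]; then
        printf 'Still mounted:\n%s\n' "$mounted" >&2
        return 1
    fi
    return 0
}
```

For example: 'dg_fully_unmounted vvrdg || echo "unmount the remaining file systems first"'.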
3. Before verification can begin, the primary rlink must be reported as up to date. This means there is no outstanding data in the primary SRL waiting to be replicated to the secondary. This can be checked with the 'vxrlink status' command as shown below:
rdgv240sol07# vxrlink -g vvrdg status to08
8 January 2010 11:25:10 GMT
VxVM VVR vxrlink INFO V-5-1-4467 Rlink to08 is up to date
If the rlink is not initially reported as up to date, wait whilst replication catches up (or take other appropriate action). Once the rlink is reported as up to date, the procedure can continue.
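The waiting described above can be scripted. Below is a minimal bash sketch that repeatedly runs 'vxrlink -g <dg> status <rlink>', exactly as shown above, until the output contains the "is up to date" text of message V-5-1-4467. The function name, the default 10-second interval and the default poll limit are hypothetical choices. It also assumes the message wording matches the example output.

```shell
#!/bin/bash
# Hypothetical helper: poll 'vxrlink status' until the rlink reports
# "is up to date" (message V-5-1-4467 in the example output).
# Usage: wait_rlink_uptodate DG RLINK [INTERVAL_SECS] [MAX_POLLS]
# Returns 0 once up to date, or 1 if MAX_POLLS checks pass without it.
wait_rlink_uptodate() {
    local dg=$1 rlink=$2 interval=${3:-10} max=${4:-360} n=0
    while [ "$n" -lt "$max" ]; do
        if vxrlink -g "$dg" status "$rlink" 2>&1 | grep -q "is up to date"; then
            return 0
        fi
        n=$(( n + 1 ))
        # Do not sleep after the final unsuccessful check.
        if [ "$n" -lt "$max" ]; then
            sleep "$interval"
        fi
    done
    echo "rlink $rlink still not up to date after $max checks" >&2
    return 1
}
```

For example, 'wait_rlink_uptodate vvrdg to08 10 && echo "safe to continue"' only prints once the rlink has caught up.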
4. Detach primary and secondary rlinks:
rdgv240sol07# vradmin -g vvrdg -f stoprep testrvg
VxVM VVR vradmin WARNING V-5-52-92 Secondary data volumes will become out-of-date.
vradmin: Continue with stoprep (y/n)? y
rdgv240sol07#
rdgv240sol07# vxprint -qtrg vvrdg testrvg | grep ^rl
rl to08 testrvg DETACHED STALE rdgv240sol08 vvrdg to07
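The 'grep ^rl' check above can also be scripted. Below is a minimal bash sketch that relies only on the 'vxprint -qtrg' output format shown in this document. In that format rlink lines begin with "rl", and the fourth and fifth columns hold the kernel state (e.g. DETACHED) and the state (e.g. STALE). The function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers built on the 'vxprint -qtrg' output format shown above.
# Usage: rlink_states DG RVG   -> prints "<rlink> <kstate> <state>" per rlink
rlink_states() {
    vxprint -qtrg "$1" "$2" | awk '$1 == "rl" { print $2, $4, $5 }'
}

# Usage: rlinks_detached DG RVG
# Returns 0 only if the RVG has at least one rlink and every rlink is DETACHED.
rlinks_detached() {
    rlink_states "$1" "$2" | awk '{ seen = 1; if ($2 != "DETACHED") bad = 1 } END { exit !(seen && !bad) }'
}
```

For the example configuration, 'rlink_states vvrdg testrvg' would print "to08 DETACHED STALE" at this point.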
5. Start data verification using the 'vradmin syncrvg' command. Because the '-verify' flag is used, data consistency will be verified but no attempt will be made to synchronise the nodes. Whilst running, verification reports on progress. It shows the volume currently being checked, the percentage difference between the primary and secondary nodes for the portion already checked, and the amount of the volume checked so far:
rdgv240sol07# vradmin -g vvrdg -verify syncrvg testrvg rdgv240sol08
VxVM VVR vradmin WARNING V-5-52-126 Make sure applications using Primary data volumes are stopped. The result of the volume verification will be invalid if applications using Primary data volumes are not stopped.
vradmin: Continue with syncrvg -verify (y/n)? y
Message from Primary:
VxVM VVR vxrsync INFO V-5-52-2231 Starting volume verification to remote
VxVM VVR vxrsync INFO V-5-52-2211 Source host: 10.12.249.11
VxVM VVR vxrsync INFO V-5-52-2212 Destination host(s): 10.12.249.12
VxVM VVR vxrsync INFO V-5-52-2213 Total volumes: 3
VxVM VVR vxrsync INFO V-5-52-2214 Total size: 3.000 G
Eps_time Dest_host Src_vol Dest_vol F'shed/Tot_sz Diff Done
00:00:00 10.12.249.12 datavol3 datavol3 0M/1024M 0% 0%
00:00:10 10.12.249.12 datavol3 datavol3 370M/1024M 5% 36%
Message from Primary:
00:00:20 10.12.249.12 datavol3 datavol3 754M/1024M 2% 74%
Message from Primary:
00:00:26 10.12.249.12 datavol3 datavol3 1024M/1024M 2% 100%
00:00:26 10.12.249.12 datavol2 datavol2 0M/1024M 0% 0%
Message from Primary:
00:00:36 10.12.249.12 datavol2 datavol2 362M/1024M 5% 35%
Message from Primary:
00:00:46 10.12.249.12 datavol2 datavol2 739M/1024M 2% 72%
00:00:52 10.12.249.12 datavol2 datavol2 1024M/1024M 2% 100%
00:00:52 10.12.249.12 datavol datavol 0M/1024M 0% 0%
Message from Primary:
00:01:02 10.12.249.12 datavol datavol 349M/1024M 0% 34%
Message from Primary:
00:01:12 10.12.249.12 datavol datavol 739M/1024M 0% 72%
Message from Primary:
00:01:19 10.12.249.12 datavol datavol 1024M/1024M 0% 100%
VxVM VVR vxrsync INFO V-5-52-2218 Verification of the remote volumes found differences.
VxVM VVR vxrsync INFO V-5-52-2219 VxRSync operation completed.
VxVM VVR vxrsync INFO V-5-52-2220 Total elapsed time: 0:01:19
Note that at the end of verification, a one-line summary states whether the data volumes on the primary and secondary nodes are identical or whether differences were found. If any differences were found, however small, the customer should consider performing some form of data resynchronisation between nodes (as described below) as soon as possible.
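For longer runs with many volumes, a per-volume summary can be extracted from a saved copy of the verification output (for example, text copied from the terminal or captured with the 'script' utility). Below is a minimal bash sketch. It assumes the column layout shown above (Eps_time, Dest_host, Src_vol, Dest_vol, F'shed/Tot_sz, Diff, Done) and reads the Diff value from the row where Done reaches 100%. The function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for a saved copy of 'vradmin syncrvg -verify' or
# 'vradmin verifydata' progress output. Assumed columns:
# Eps_time Dest_host Src_vol Dest_vol F'shed/Tot_sz Diff Done
# Usage: summarise_verify_log [LOGFILE]   (reads stdin if no file is given)
# Prints "<volume> <final Diff>" for each volume that reached 100%.
summarise_verify_log() {
    awk '$7 == "100%" && $5 ~ /\// { print $3, $6 }' "${1:--}"
}

# Usage: verify_log_has_diffs LOGFILE
# Returns 0 if any volume finished with a Diff other than 0% (including "<1%").
verify_log_has_diffs() {
    summarise_verify_log "$1" | awk '$2 != "0%" { found = 1 } END { exit !found }'
}
```

Applied to the output above, summarise_verify_log would list datavol3 and datavol2 at 2% and datavol at 0%.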
6. Force-start replication. This is acceptable here because the rlink was confirmed to be up to date before verification started, and the primary data volumes have not been written to whilst verification was performed:
rdgv240sol07# vradmin -g vvrdg -f startrep testrvg
Message from Primary:
VxVM VVR vxrlink WARNING V-5-1-12397 This command should only be used if primary and all secondaries are already synchronized. If this is not the case detach the rlink and use autosync or checkpoint options to attach.
VxVM VVR vxrlink INFO V-5-1-3614 Secondary data volumes detected with rvg testrvg as parent:
VxVM VVR vxrlink INFO V-5-1-6183 datavol: len=2097152 primary_datavol=datavol
VxVM VVR vxrlink INFO V-5-1-6183 datavol2: len=2097152 primary_datavol=datavol2
VxVM VVR vxrlink INFO V-5-1-6183 datavol3: len=2097152 primary_datavol=datavol3
rdgv240sol07# vxprint -qtrg vvrdg testrvg | grep ^rl
rl to08 testrvg CONNECT ACTIVE rdgv240sol08 vvrdg to07
7. Remount file systems:
rdgv240sol07# mount -F vxfs /dev/vx/dsk/vvrdg/datavol /datavol
rdgv240sol07# mount -F vxfs /dev/vx/dsk/vvrdg/datavol2 /datavol2
rdgv240sol07# mount -F vxfs /dev/vx/dsk/vvrdg/datavol3 /datavol3
8. Restart applications
Online verification:
Online verification uses a method very similar to offline verification, but it does not require an outage of primary applications whilst running. Instead, when verification is started, a space-optimised snapshot of each data volume within the primary RVG is created on the primary node. In parallel, a marker is placed in the primary SRL to record, in terms of the sequence of writes to the volumes, when these snapshots were created.
The primary SRL continues to drain as normal until the snapshot marker is the oldest entry in the SRL. In other words, every write made before the primary snapshots were taken has been replicated. At this point, space-optimised snapshots of all data volumes within the RVG are created on the secondary node. In terms of the flow of I/O, the primary and secondary snapshots have therefore been taken at the same point in time and should contain identical data.
Data verification then proceeds by incrementally reading chunks of all snapshot volumes on the primary and secondary nodes and generating an MD5 checksum for each chunk. As with offline verification, the checksums are compared between nodes to determine whether the volumes are consistent or whether differences exist. Once verification is complete, the space-optimised snapshots are, by default, destroyed on both nodes.
A space-optimised snapshot is a persistent point-in-time copy of its parent volume as it was when created. Online verification can therefore proceed whilst applications on the primary node, and replication to the secondary node, continue to run as normal. Because the snapshot volumes, not the parent data volumes, are verified, the results remain valid even though the contents of the parent data volumes change whilst verification is in progress.
Because space-optimised snapshots are used, however, some pre-configuration of the volumes is required:
1. Prepare all data volumes on the primary and secondary nodes for snapshots (i.e. add a DCO object and enable persistent fast mirror resync). This can be done with the 'vxsnap prepare' command:
rdgv240sol07# vxsnap -g vvrdg prepare datavol
VxVM vxsnap INFO V-5-1-13571 Volume is under RVG, setting drl=no.
rdgv240sol07# vxsnap -g vvrdg prepare datavol2
VxVM vxsnap INFO V-5-1-13571 Volume is under RVG, setting drl=no.
rdgv240sol07# vxsnap -g vvrdg prepare datavol3
VxVM vxsnap INFO V-5-1-13571 Volume is under RVG, setting drl=no.
rdgv240sol07#
rdgv240sol08# vxsnap -g vvrdg prepare datavol
VxVM vxsnap INFO V-5-1-13571 Volume is under RVG, setting drl=no.
rdgv240sol08# vxsnap -g vvrdg prepare datavol2
VxVM vxsnap INFO V-5-1-13571 Volume is under RVG, setting drl=no.
rdgv240sol08# vxsnap -g vvrdg prepare datavol3
VxVM vxsnap INFO V-5-1-13571 Volume is under RVG, setting drl=no.
rdgv240sol08#
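Because the same 'vxsnap prepare' step must be repeated for every data volume on both nodes, a small loop avoids missing a volume. Below is a minimal bash sketch that uses only the 'vxsnap -g <dg> prepare <volume>' syntax shown above. The function name is hypothetical, and it must be run separately on the primary and the secondary.

```shell
#!/bin/bash
# Hypothetical wrapper: run 'vxsnap prepare' (as shown above) for each named
# data volume in a disk group. Run it on the primary AND on the secondary.
# Usage: prepare_rvg_vols DG VOL [VOL...]
# Returns non-zero if preparing any volume failed (all volumes are attempted).
prepare_rvg_vols() {
    local dg=$1 vol rc=0
    shift
    for vol in "$@"; do
        if ! vxsnap -g "$dg" prepare "$vol"; then
            echo "vxsnap prepare failed for $dg/$vol" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For the example configuration: 'prepare_rvg_vols vvrdg datavol datavol2 datavol3'.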
2. Optionally, pre-create cache objects for use by the space-optimised instant snapshots:
Pre-creating cache objects is not required, as VVR can create them during online verification. Pre-creation is nevertheless recommended, because VVR offers very limited flexibility over cache object size and location when it creates cache objects automatically during online verification.
First, a cache volume should be created with the required parameters:
rdgv240sol07# vxassist -g vvrdg make cachevol 5g
Next, a cache object should be created that uses the cache volume as its backing storage:
rdgv240sol07# vxmake -g vvrdg cache vvrcacheobj cachevolname=cachevol autogrow=on
Finally, the cache object should be started so that it is ready for use:
rdgv240sol07# vxcache -g vvrdg start vvrcacheobj
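The three commands above can be wrapped so that they are run identically on both nodes. Below is a minimal bash sketch that uses only the 'vxassist make', 'vxmake cache' and 'vxcache start' syntax shown above. The function name is hypothetical. It stops at the first failing command.

```shell
#!/bin/bash
# Hypothetical wrapper around the three commands shown above. Run it on both
# the primary and the secondary with the same CACHEOBJ name.
# Usage: create_cache_object DG CACHEVOL CACHEOBJ SIZE   (SIZE e.g. 5g)
create_cache_object() {
    local dg=$1 cachevol=$2 cacheobj=$3 size=$4
    vxassist -g "$dg" make "$cachevol" "$size" || return 1
    vxmake -g "$dg" cache "$cacheobj" cachevolname="$cachevol" autogrow=on || return 1
    vxcache -g "$dg" start "$cacheobj"
}
```

For example: 'create_cache_object vvrdg cachevol vvrcacheobj 5g' on rdgv240sol07 and again on rdgv240sol08.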
Note that cache objects should be created on both the primary and secondary nodes, and the cache object name should be identical on each node. Cache objects must also be sized correctly, so that they are large enough, or can grow, to accommodate the total data change likely to occur during the verification period. For example, if the primary data change rate is 1 GB per hour and verification is expected to take 10 hours, the cache objects should be around 10 GB in size. In practice, less than 10 GB may be required if many writes go to the same regions of the data volumes. This is because space-optimised snapshots use copy-on-first-write. On the first write to a given region, the original data is copied to the cache object, but subsequent writes to that region require no further copy.
It is important that cache objects are sized sufficiently. Inadequate cache space can cause snapshots to be invalidated during verification, making the verification results useless. Detailed sizing of cache objects is beyond the scope of this document.
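The worst-case rule of thumb given above (change rate multiplied by expected verification time) can be turned into a size string for 'vxassist make'. Below is a minimal bash sketch. The function name and the 1 GB floor are hypothetical. As noted above, copy-on-first-write usually means the real requirement is lower, so treat the result as an upper bound rather than a sizing method.

```shell
#!/bin/bash
# Hypothetical helper for the rule of thumb above:
#   worst-case cache size = change rate (GB/hour) x verification time (hours)
# rounded up to a whole number of GB and printed in the "Ng" form used by
# 'vxassist make' (minimum 1g). An upper bound, not a sizing method.
# Usage: cache_size_estimate RATE_GB_PER_HOUR HOURS
cache_size_estimate() {
    awk -v rate="$1" -v hours="$2" 'BEGIN {
        size = rate * hours
        whole = int(size)
        if (whole < size) whole++      # round up
        if (whole < 1) whole = 1
        printf "%dg\n", whole
    }'
}
```

With the figures from the example above, 'cache_size_estimate 1 10' gives 10g.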
Once pre-configuration steps have been completed, data verification can be started:
1. Initiate snapshot creation and online data verification using the 'vradmin verifydata' command. If using a pre-created cache object, specify its name as follows. This single cache object will be shared by all the space-optimised snapshot volumes created:
rdgv240sol07# vradmin -g vvrdg verifydata testrvg rdgv240sol08 cache=vvrcacheobj
Alternatively, if allowing VVR to create the cache objects, specify the cache object size instead. Note that a separate cache object of this size will be created for every data volume in the RVG:
rdgv240sol07# vradmin -g vvrdg verifydata testrvg rdgv240sol08 cachesize=1g
2. Data verification will begin by creating the cache objects (if not pre-created) and snapshot volumes on the primary node. Some time later, once the SRL marker has drained, they are created on the secondary node.
3. Once the snapshot volumes are available, VVR will start verifying the volumes between nodes and will report on progress until completion, as shown below:
rdgv240sol07# vradmin -g vvrdg verifydata testrvg rdgv240sol08 cache=vvrcacheobj
Message from Primary:
VxVM VVR vxrsync INFO V-5-52-2230 Starting verification for snapshots with prefix VD0008120151-datavol3
VxVM VVR vxrsync INFO V-5-52-2211 Source host: 10.12.249.11
VxVM VVR vxrsync INFO V-5-52-2212 Destination host(s): 10.12.249.12
VxVM VVR vxrsync INFO V-5-52-2213 Total volumes: 3
VxVM VVR vxrsync INFO V-5-52-2214 Total size: 3.000 G
Eps_time Dest_host Src_vol Dest_vol F'shed/Tot_sz Diff Done
00:00:01 10.12.249.12 datavol3 datavol3 0M/1024M 0% 0%
00:00:11 10.12.249.12 datavol3 datavol3 305M/1024M 6% 30%
Message from Primary:
00:00:21 10.12.249.12 datavol3 datavol3 615M/1024M 3% 60%
Message from Primary:
00:00:31 10.12.249.12 datavol3 datavol3 1004M/1024M 2% 98%
00:00:31 10.12.249.12 datavol3 datavol3 1024M/1024M 2% 100%
00:00:31 10.12.249.12 datavol2 datavol2 0M/1024M 0% 0%
Message from Primary:
00:00:41 10.12.249.12 datavol2 datavol2 346M/1024M 5% 34%
Message from Primary:
00:00:51 10.12.249.12 datavol2 datavol2 722M/1024M 2% 71%
Message from Primary:
00:00:58 10.12.249.12 datavol2 datavol2 1024M/1024M 2% 100%
00:00:58 10.12.249.12 datavol datavol 0M/1024M 0% 0%
Message from Primary:
00:01:08 10.12.249.12 datavol datavol 340M/1024M 0% 33%
Message from Primary:
00:01:18 10.12.249.12 datavol datavol 724M/1024M 0% 71%
00:01:25 10.12.249.12 datavol datavol 1024M/1024M 0% 100%
VxVM VVR vxrsync INFO V-5-52-2218 Verification of the remote volumes found differences.
VxVM VVR vxrsync INFO V-5-52-2219 VxRSync operation completed.
VxVM VVR vxrsync INFO V-5-52-2220 Total elapsed time: 0:01:25
Again, on completion, verification gives a one-line summary of whether the primary and secondary data volumes were found to be identical or whether differences were found. If differences were found, the volumes should be resynchronised between nodes as soon as possible.
VOLUME SYNCHRONISATION
VVR provides various methods for volume synchronisation. Factors influencing which method is most applicable to a given situation can include:
The amount of data that needs to be resynchronised between nodes: if this is small, a method that synchronises only the differences may be most applicable
Whether the customer wishes to resynchronise only a specific subset of volumes, where it may not be practical to resynchronise the whole RVG
Whether the customer wants to resynchronise the whole RVG (e.g. if the amount of difference is large) and how this is to be done, for example over the network or by using a disk/tape based backup
Advising on which method is best in a given situation is beyond the scope of this document, but the sections below show how each type of synchronisation is performed.
Differences Based Synchronisation
This form of synchronisation is best suited to cases where the amount of difference between primary and secondary data volumes is small. It works in a similar way to the verification procedures discussed above. All data volumes within the RVG are read incrementally in small chunks on the primary and secondary nodes, an MD5 checksum is generated for each chunk, and the checksums are then compared between nodes.
Where the MD5 checksums for a chunk match, that chunk is treated as identical between nodes. Where they differ, however, VVR attempts to correct the difference by re-replicating the chunk from the primary node and overwriting the inconsistent chunk on the secondary node.
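The compare-then-copy-only-differing-chunks idea can be illustrated with two ordinary image files. The bash sketch below is not VVR's implementation. The function name chunk_sync, the 64 KB default chunk size and the use of GNU 'dd' and 'md5sum' are assumptions. It overwrites, in place, only those chunks of the destination whose checksums differ from the source, and prints how many chunks were copied.

```shell
#!/bin/bash
# Hypothetical illustration of differences-based synchronisation on two
# equal-sized image files (not VVR's code). Chunks whose MD5 checksums
# differ are copied from SRC over DST in place; matching chunks are skipped.
# Usage: chunk_sync SRC DST [CHUNK_KB]   -> prints the number of chunks copied
chunk_sync() {
    local src=$1 dst=$2 chunk_kb=${3:-64}
    local size chunks off sum_s sum_d i=0 copied=0
    size=$(wc -c < "$src")
    chunks=$(( (size + chunk_kb * 1024 - 1) / (chunk_kb * 1024) ))
    while [ "$i" -lt "$chunks" ]; do
        off=$(( i * chunk_kb ))
        sum_s=$(dd if="$src" bs=1024 skip="$off" count="$chunk_kb" 2>/dev/null | md5sum | cut -d' ' -f1)
        sum_d=$(dd if="$dst" bs=1024 skip="$off" count="$chunk_kb" 2>/dev/null | md5sum | cut -d' ' -f1)
        if [ "$sum_s" != "$sum_d" ]; then
            # Rewrite only this chunk; conv=notrunc leaves the rest of DST intact.
            dd if="$src" of="$dst" bs=1024 skip="$off" seek="$off" count="$chunk_kb" conv=notrunc 2>/dev/null || return 1
            copied=$(( copied + 1 ))
        fi
        i=$(( i + 1 ))
    done
    echo "$copied"
}
```

Running it a second time on the same pair prints 0, because every chunk now matches.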
Note that differences-based synchronisation can be performed online. However, it must be performed in conjunction with an SRL checkpoint, because the rlinks are detached whilst synchronisation takes place. The SRL checkpoint is used to record incoming application writes made to the primary data volumes whilst synchronisation is performed. These writes are then 'replayed' at the end of synchronisation to guarantee consistency. It is therefore important to ensure that the primary SRL does not overflow during differences-based synchronisation. Otherwise the SRL checkpoint is deleted, synchronisation will not complete, and volume consistency will not be achieved.
1. Before starting differences-based synchronisation, replication should be stopped, with the rlinks being detached on the primary and secondary nodes:
rdgv240sol07# vradmin -g vvrdg -f stoprep testrvg
VxVM VVR vradmin WARNING V-5-52-92 Secondary data volumes will become out-of-date.
vradmin: Continue with stoprep (y/n)? y
Message from Primary:
VxVM VVR vxrlink INFO V-5-1-6466 Data volumes are in use. Before restarting replication a complete synchronization of the secondary data volumes must be performed.
rdgv240sol07# vxprint -qtrg vvrdg testrvg | grep ^rl
rl to08 testrvg DETACHED STALE rdgv240sol08 vvrdg to07
2. Start data synchronisation, specifying the name of the SRL checkpoint to be created by VVR. Once synchronisation begins, details of the SRL checkpoint can be viewed with the 'vxrvg cplist' command:
Initially the RVG contains no SRL checkpoints:
rdgv240sol07# vxrvg -g vvrdg cplist testrvg
VxVM VVR vxrvg INFO V-5-1-4472 Rvg testrvg has no checkpoints
Start data synchronisation with 'vradmin syncrvg', specifying an SRL checkpoint name of 'diffsyncckpt'. The command reports on progress as it runs:
rdgv240sol07# vradmin -g vvrdg -c diffsyncckpt syncrvg testrvg rdgv240sol08
Message from Host rdgv240sol08:
VxVM VVR vxrlink WARNING V-5-1-3532 Rlink to07 is already detached
Message from Primary:
VxVM VVR vxrsync INFO V-5-52-2233 Starting differences volume synchronization to remote
VxVM VVR vxrsync INFO V-5-52-2211 Source host: 10.12.249.11
VxVM VVR vxrsync INFO V-5-52-2212 Destination host(s): 10.12.249.12
VxVM VVR vxrsync INFO V-5-52-2213 Total volumes: 3
VxVM VVR vxrsync INFO V-5-52-2214 Total size: 3.000 G
Eps_time Dest_host Src_vol Dest_vol F'shed/Tot_sz Diff Done
00:00:00 10.12.249.12 datavol3 datavol3 0M/1024M 0% 0%
00:00:10 10.12.249.12 datavol3 datavol3 273M/1024M 6% 27%
...
We now see that an SRL checkpoint has been created with the name specified:
rdgv240sol07# vxrvg -g vvrdg cplist testrvg
Name MBytes % Log Started/Completed
---- ------ ------ -----------------
diffsyncckpt 40 7 Started
When synchronisation completes, the SRL checkpoint is marked 'Completed':
...
00:01:19 10.12.249.12 datavol datavol 281M/1024M <1% 27%
Message from Primary:
00:01:29 10.12.249.12 datavol datavol 586M/1024M <1% 57%
00:01:39 10.12.249.12 datavol datavol 884M/1024M <1% 86%
Message from Primary:
00:01:43 10.12.249.12 datavol datavol 1024M/1024M <1% 100%
VxVM VVR vxrsync INFO V-5-52-2219 VxRSync operation completed.
VxVM VVR vxrsync INFO V-5-52-2220 Total elapsed time: 0:01:43
rdgv240sol07# vxrvg -g vvrdg cplist testrvg
Name MBytes % Log Started/Completed
---- ------ ------ -----------------
diffsyncckpt 89 17 Completed
At this stage the rlinks remain detached. However, any further writes made to the data volumes are still recorded in the primary SRL and will not be 'lost'.
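Because an SRL overflow deletes the checkpoint and prevents synchronisation from completing, it can be worth watching how much of the SRL the checkpoint consumes. Below is a minimal bash sketch that parses the 'vxrvg cplist' output format shown above (Name, MBytes, % Log, Started/Completed). The function names and the idea of a warning threshold are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers parsing 'vxrvg cplist' output in the format shown
# above (columns: Name, MBytes, % Log, Started/Completed).
# Usage: ckpt_status DG RVG CKPT -> prints "<Started|Completed> <percent-of-SRL>"
# Returns non-zero if the checkpoint is not listed.
ckpt_status() {
    vxrvg -g "$1" cplist "$2" | awk -v c="$3" '$1 == c { print $4, $3; found = 1 } END { exit !found }'
}

# Usage: ckpt_below_limit DG RVG CKPT LIMIT_PCT
# Returns 1 (with a warning) if the checkpoint is missing or already uses at
# least LIMIT_PCT percent of the SRL, as an early sign of overflow risk.
ckpt_below_limit() {
    local pct
    pct=$(ckpt_status "$1" "$2" "$3" | awk '{ print $2 }')
    if [ -z "$pct" ]; then
        echo "checkpoint $3 not found" >&2
        return 1
    fi
    if [ "$pct" -ge "$4" ]; then
        echo "checkpoint $3 is using ${pct}% of the SRL" >&2
        return 1
    fi
    return 0
}
```

With the listing shown earlier, 'ckpt_status vvrdg testrvg diffsyncckpt' would print "Completed 17".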
3. To complete synchronisation, start replication using the checkpoint created by the 'vradmin syncrvg' command. This first drains the writes recorded in the SRL since the checkpoint and, once that is complete, resumes normal VVR replication:
rdgv240sol07# vradmin -g vvrdg -c diffsyncckpt startrep testrvg
Message from Primary:
VxVM VVR vxrlink INFO V-5-1-3614 Secondary data volumes detected with rvg testrvg as parent:
VxVM VVR vxrlink INFO V-5-1-6183 datavol: len=2097152 primary_datavol=datavol
VxVM VVR vxrlink INFO V-5-1-6183 datavol2: len=2097152 primary_datavol=datavol2
VxVM VVR vxrlink INFO V-5-1-6183 datavol3: len=2097152 primary_datavol=datavol3
After replication is restarted, 'vxrlink status' shows the checkpointed writes draining until the rlink is up to date once more:
...
8 January 2010 12:25:42 GMT
VxVM VVR vxrlink INFO V-5-1-4640 Rlink to08 has 2872 outstanding writes, occupying 69676 Kbytes (13%) on the SRL
...
8 January 2010 12:25:47 GMT
VxVM VVR vxrlink INFO V-5-1-4640 Rlink to08 has 1582 outstanding writes, occupying 32963 Kbytes (6%) on the SRL
...
8 January 2010 12:25:52 GMT
VxVM VVR vxrlink INFO V-5-1-4467 Rlink to08 is up to date
4. Finally, once it is no longer required, the SRL checkpoint should be deleted. If it is not, the checkpoint will continue to consume space within the SRL until the SRL overflows:
rdgv240sol07# vxrvg -g vvrdg -c diffsyncckpt checkdelete testrvg
rdgv240sol07#
Full Synchronisation Using SRL Checkpoint
It is also possible to perform a full synchronisation of an RVG using the 'vradmin syncrvg' command. The steps are identical to those for differences-based synchronisation described above, except that full synchronisation must be specified when running 'vradmin syncrvg'. For example:
rdgv240sol07# vradmin -g vvrdg -c diffsyncckpt -full syncrvg testrvg rdgv240sol08
Note that full synchronisation using an SRL checkpoint is not recommended. An 'autosync' (as described below) should be used instead, for the following reasons:
Autosync can resume after a primary or secondary reboot, whereas a syncrvg synchronisation must be restarted
Autosync is more efficient in terms of I/O than a syncrvg synchronisation
Autosync Synchronisation:
An autosync synchronisation is a full block-level resynchronisation of all data volumes within an RVG over the network. It guarantees that every block of every primary data volume is replicated to the secondary node. When an autosync completes, the primary and secondary volumes are therefore identical.
A full description of the autosync method is beyond the scope of this document. In outline, it uses the on-disk DCM (Data Change Map) logs of the data volumes to track progress (which is how it is able to survive a reboot), in conjunction with a temporary SRL checkpoint during completion. The only prerequisite for performing an autosync is therefore that the primary and secondary data volumes have DCM logs attached.
Note also that an autosync is designed to be performed whilst the data volumes are in use on the primary node, so no downtime is required.
1. Before starting an autosync rlinks should be detached:
rdgv240sol07# vradmin -g vvrdg -f stoprep testrvg
VxVM VVR vradmin WARNING V-5-52-92 Secondary data volumes will become out-of-date.
vradmin: Continue with stoprep (y/n)? y
Message from Primary:
VxVM VVR vxrlink INFO V-5-1-6466 Data volumes are in use. Before restarting replication a complete synchronization of the secondary data volumes must be performed.
rdgv240sol07# vxprint -qtrg vvrdg testrvg | grep ^rl
rl to08 testrvg DETACHED STALE rdgv240sol08 vvrdg to07
2. Start the autosync using the 'vradmin -a startrep' command:
rdgv240sol07# vradmin -g vvrdg -a startrep testrvg
Message from Primary:
VxVM VVR vxrlink WARNING V-5-1-3359 Attaching rlink to non-empty rvg. Autosync will be performed.
VxVM VVR vxrlink INFO V-5-1-3614 Secondary data volumes detected with rvg testrvg as parent:
VxVM VVR vxrlink INFO V-5-1-6183 datavol: len=2097152 primary_datavol=datavol
VxVM VVR vxrlink INFO V-5-1-6183 datavol2: len=2097152 primary_datavol=datavol2
VxVM VVR vxrlink INFO V-5-1-6183 datavol3: len=2097152 primary_datavol=datavol3
VxVM VVR vxrlink INFO V-5-1-3365 Autosync operation has started
Note that an autosync does NOT provide rolling output on synchronisation progress - progress must be monitored using other commands such as 'vxrlink status'.
3. Monitor progress with 'vxrlink status' (here the '-i 10' option repeats the report every 10 seconds). Once the autosync completes, VVR switches to normal SRL-based replication with no further intervention required:
rdgv240sol07# vxrlink -g vvrdg -i 10 status to08
8 January 2010 12:49:40 GMT
VxVM VVR vxrlink INFO V-5-1-4464 Rlink to08 is in AUTOSYNC. 2623520 Kbytes remaining.
VxVM VVR vxrlink INFO V-5-1-4464 Rlink to08 is in AUTOSYNC. 2511136 Kbytes remaining.
VxVM VVR vxrlink INFO V-5-1-4464 Rlink to08 is in AUTOSYNC. 2395168 Kbytes remaining.
...
VxVM VVR vxrlink INFO V-5-1-4467 Rlink to08 is up to date
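Since autosync gives no rolling percentage, a rough time-to-completion can be estimated from successive "Kbytes remaining" samples. Below is a minimal bash sketch. It assumes the message format shown above and that consecutive samples are exactly the '-i' interval apart. It also assumes the transfer rate stays roughly constant, which network contention or primary write load may not allow. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: estimate seconds remaining for an autosync from
# 'vxrlink -i INTERVAL status' lines of the form shown above:
#   "... Rlink to08 is in AUTOSYNC. 2623520 Kbytes remaining."
# Assumes samples are INTERVAL seconds apart and a roughly constant rate.
# Prints "unknown" if fewer than two samples or no progress is seen.
# Usage: autosync_eta INTERVAL_SECS < samples.txt
autosync_eta() {
    awk -v iv="$1" '
        /Kbytes remaining/ {
            for (i = 2; i <= NF; i++) if ($i == "Kbytes") kb = $(i - 1) + 0
            n++
            if (n == 1) first = kb
            last = kb
        }
        END {
            if (n < 2 || first <= last || iv <= 0) { print "unknown"; exit }
            rate = (first - last) / ((n - 1) * iv)   # Kbytes per second
            printf "%d\n", last / rate
        }'
}
```

With the three samples shown above and an interval of 10 seconds, this estimates roughly 209 seconds remaining.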
Full Synchronisation Using Disk/Tape Based Backup
If a full resynchronisation of an RVG between primary and secondary nodes is necessary, it may not be practical to perform it over the network. Low available network bandwidth may cause the resynchronisation to take many days or weeks to complete. In this case it may be more appropriate to use a disk or tape based backup in conjunction with an SRL checkpoint. In this way, resynchronisation can be performed without application downtime on the primary node.
The backup can take any form, provided it is guaranteed to be a block-level backup of the data volumes on the primary node. This is required because VVR needs block-level consistency between nodes; it has no knowledge of the file system or other application using the data volumes.
1. Before starting resynchronisation rlinks should be detached:
rdgv240sol07# vradmin -g vvrdg -f stoprep testrvg
VxVM VVR vradmin WARNING V-5-52-92 Secondary data volumes will become out-of-date.
vradmin: Continue with stoprep (y/n)? y
Message from Primary:
VxVM VVR vxrlink INFO V-5-1-6466 Data volumes are in use. Before restarting replication a complete synchronization of the secondary data volumes must be performed.
rdgv240sol07# vxprint -qtrg vvrdg testrvg | grep ^rl
rl to08 testrvg DETACHED STALE rdgv240sol08 vvrdg to07
2. Next, an SRL checkpoint should be started on the primary node:
rdgv240sol07# vxrvg -g vvrdg -c fullsynccpt checkstart testrvg
The checkpoint is created, marked as started, and grows as writes occur to primary data volumes:
rdgv240sol07# vxrvg -g vvrdg cplist testrvg
Name MBytes % Log Started/Completed
---- ------ ------ -----------------
fullsynccpt 88 17 Started
3. A full block-level backup should be performed of all primary data volumes.
4. Once the full backup is complete, the checkpoint should be ended to mark it as completed. Note that even after the checkpoint is marked as completed, further writes to the primary data volumes are still recorded in the primary SRL, although the rlinks remain detached:
rdgv240sol07# vxrvg -g vvrdg checkend testrvg
rdgv240sol07# vxrvg -g vvrdg cplist testrvg
Name MBytes % Log Started/Completed
---- ------ ------ -----------------
fullsynccpt 129 25 Completed
5. The full block-level backup should be transferred to the secondary node and restored to the corresponding data volumes within the RVG.
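For steps 3 and 5, a raw block-level copy made with 'dd' is one possible form of backup. Below is a minimal bash sketch. The function names are hypothetical, as is the VX_RDSK_ROOT variable, which defaults to the usual VxVM raw device location /dev/vx/rdsk/<diskgroup>/<volume>. It writes one image file per volume to a directory, for example on a transfer disk. It also assumes the secondary volumes are not in use and are at least as large as their primary counterparts when the images are written back.

```shell
#!/bin/bash
# Hypothetical helpers for a raw block-level copy of RVG data volumes.
# VX_RDSK_ROOT is an assumption: VxVM raw volume devices normally live
# under /dev/vx/rdsk/<diskgroup>/<volume>; it can be overridden for testing.
VX_RDSK_ROOT=${VX_RDSK_ROOT:-/dev/vx/rdsk}

# Primary (step 3): copy each volume to <DESTDIR>/<vol>.img
# Usage: backup_rvg_vols DG DESTDIR VOL [VOL...]
backup_rvg_vols() {
    local dg=$1 destdir=$2 vol
    shift 2
    for vol in "$@"; do
        dd if="$VX_RDSK_ROOT/$dg/$vol" of="$destdir/$vol.img" bs=1024k || return 1
    done
}

# Secondary (step 5): write each <SRCDIR>/<vol>.img back to the same-named volume.
# Usage: restore_rvg_vols DG SRCDIR VOL [VOL...]
restore_rvg_vols() {
    local dg=$1 srcdir=$2 vol
    shift 2
    for vol in "$@"; do
        dd if="$srcdir/$vol.img" of="$VX_RDSK_ROOT/$dg/$vol" bs=1024k || return 1
    done
}
```

For the example configuration, 'backup_rvg_vols vvrdg /backup datavol datavol2 datavol3' would run on rdgv240sol07. The matching 'restore_rvg_vols' call would run on rdgv240sol08 once the images have been transferred.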
6. Once the restore is complete, replication should be restarted using the SRL checkpoint on the primary. The writes recorded since the checkpoint are drained first. After that, VVR switches back to normal SRL-based replication and the rlink is marked as consistent:
rdgv240sol07# vradmin -g vvrdg -c fullsynccpt startrep testrvg
Message from Primary:
VxVM VVR vxrlink INFO V-5-1-3614 Secondary data volumes detected with rvg testrvg as parent:
VxVM VVR vxrlink INFO V-5-1-6183 datavol: len=2097152 primary_datavol=datavol
VxVM VVR vxrlink INFO V-5-1-6183 datavol2: len=2097152 primary_datavol=datavol2
VxVM VVR vxrlink INFO V-5-1-6183 datavol3: len=2097152 primary_datavol=datavol3
7. Once the primary SRL has drained and the SRL checkpoint is no longer required, the checkpoint should be deleted. If this is not done, the checkpoint will continue to consume space within the SRL until the SRL overflows:
rdgv240sol07# vxrvg -g vvrdg -c fullsynccpt checkdelete testrvg
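Steps 4, 6 and 7 all depend on the checkpoint state reported by 'vxrvg cplist'. The sketch below extracts the status and the "% Log" value (share of the SRL used) for a named checkpoint, for example to confirm it reads Completed before running startrep, or to watch its SRL usage until it is deleted. It assumes the four-column layout shown in the sample output above, which is not a documented parsing interface. On Solaris, nawk or /usr/xpg4/bin/awk may be needed in place of the older /usr/bin/awk, which may not accept '-v'.

```shell
#!/bin/sh
# Print the Started/Completed column for checkpoint $1 from 'vxrvg cplist'
# output on stdin. Assumed layout: Name  MBytes  % Log  Started/Completed
checkpoint_status() {
    awk -v cp="$1" '$1 == cp { print $NF }'
}

# Print the "% Log" value (percentage of the SRL used) for checkpoint $1.
checkpoint_log_pct() {
    awk -v cp="$1" '$1 == cp { print $3 }'
}

# Example use on the primary (commands as in the procedure above):
#   st=$(vxrvg -g vvrdg cplist testrvg | checkpoint_status fullsynccpt)
#   [ "$st" = "Completed" ] && vradmin -g vvrdg -c fullsynccpt startrep testrvg
```

Printing nothing for an unknown checkpoint name, rather than an error, keeps the functions usable in simple string comparisons.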
Synchronisation Of Individual Volumes
If verification finds that all differences affect only a single volume within an RVG, it may be more practical to resynchronise just that volume between nodes rather than the entire RVG.
The 'vradmin syncvol' command can be used to perform a one-time synchronisation of a single data volume, with the following caveats:
The volume contents must not change during synchronisation
The volume cannot be part of an RVG whilst synchronisation takes place
To perform volume synchronisation the following steps should be performed:
1. Any applications using the primary data volume in question should be stopped and its file system unmounted:
rdgv240sol07# umount /datavol3
2. The volume should be removed from the RVG on both the primary and secondary nodes for the duration of the synchronisation:
rdgv240sol07# vradmin -g vvrdg -f delvol testrvg datavol3
Message from Primary:
VxVM VVR vxvol WARNING V-5-1-3601 Rvg (testrvg) is started. Since there may be outstanding writes still in the SRL, any secondary data volumes corresponding to this primary data volume should be considered out of sync.
3. The volume should be synchronised using the 'vradmin syncvol' command, which reports its progress while running, as shown below:
rdgv240sol07# vradmin -g vvrdg syncvol datavol3 rdgv240sol08
VxVM VVR vradmin WARNING V-5-52-85 Volumes on remote hosts rdgv240sol08 will be overwritten. Continue with syncvol (y/n)? y
Message from Primary:
VxVM VVR vxrsync INFO V-5-52-2233 Starting differences volume synchronization to remote
VxVM VVR vxrsync INFO V-5-52-2211 Source host: 10.12.249.11
VxVM VVR vxrsync INFO V-5-52-2212 Destination host(s): 10.12.249.12
VxVM VVR vxrsync INFO V-5-52-2213 Total volumes: 1
VxVM VVR vxrsync INFO V-5-52-2214 Total size: 1.000 G
Eps_time Dest_host Src_vol Dest_vol F'shed/Tot_sz Diff Done
00:00:00 10.12.249.12 datavol3 datavol3 0M/1024M 0% 0%
00:00:10 10.12.249.12 datavol3 datavol3 300M/1024M 0% 29%
Message from Primary:
00:00:20 10.12.249.12 datavol3 datavol3 604M/1024M 0% 59%
Message from Primary:
00:00:30 10.12.249.12 datavol3 datavol3 900M/1024M 2% 88%
Message from Primary:
00:00:34 10.12.249.12 datavol3 datavol3 1024M/1024M 2% 100%
VxVM VVR vxrsync INFO V-5-52-2219 VxRSync operation completed.
VxVM VVR vxrsync INFO V-5-52-2220 Total elapsed time: 0:00:34
4. Once synchronisation is complete, the volume can be added back to the RVG on both the primary and secondary nodes:
rdgv240sol07# vradmin -g vvrdg addvol testrvg datavol3
VxVM VVR vradmin WARNING V-5-52-91 Make sure volumes are synchronized before running addvol.
vradmin: Continue with addvol (y/n)? y
5. Finally, the file system can be remounted and any applications using the volume restarted:
rdgv240sol07# mount -F vxfs /dev/vx/dsk/vvrdg/datavol3 /datavol3
rdgv240sol07#
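The five steps above can be collected into a single wrapper, sketched below. It is a hypothetical helper, not part of VVR: it assumes applications have already been stopped, that the file system is mounted at /<volume> as in the example, and it defaults to a dry run that only prints the commands. When run for real (DRY_RUN=0), 'syncvol' and 'addvol' still prompt for confirmation as shown above.

```shell
#!/bin/sh
# Hypothetical wrapper for the single-volume synchronisation procedure.
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: sync_one_volume <diskgroup> <rvg> <volume> <secondary_host>
# Assumes a VxFS file system mounted at /<volume> (Solaris 'mount -F').
sync_one_volume() {
    dg=$1 rvg=$2 vol=$3 sec=$4
    # Stop at the first failure so a volume is never added back to the RVG
    # after an unsuccessful synchronisation.
    run umount "/$vol" || return 1
    run vradmin -g "$dg" -f delvol "$rvg" "$vol" || return 1
    run vradmin -g "$dg" syncvol "$vol" "$sec" || return 1
    run vradmin -g "$dg" addvol "$rvg" "$vol" || return 1
    run mount -F vxfs "/dev/vx/dsk/$dg/$vol" "/$vol"
}
```

For example, 'sync_one_volume vvrdg testrvg datavol3 rdgv240sol08' prints the same sequence of commands used in the steps above.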
FINAL NOTES
Verification of data between primary and secondary nodes should become an integral part of a customer's standard procedures. This helps identify and resolve replication issues before the data at the secondary site is needed, whether for decision making or in a disaster scenario.
In addition, if verification finds differences, synchronisation should be performed as soon as possible to resolve them. Until synchronisation has been performed, data at the secondary site should be considered corrupt and unusable. Once synchronisation completes, verification should be repeated to confirm that the differences have been rectified.
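As one small, routine part of such a procedure, the state of each rlink can also be checked. The sketch below is only a coarse health check and not a substitute for block level data verification: it reads 'vxprint -qtrg' output in the format shown in this document and treats an rlink as healthy only when its kernel state is CONNECT and its state ACTIVE (as for 'to08' in the configuration listing), flagging states such as DETACHED STALE seen after stoprep. The function name and exit-status convention are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical check: report rlinks whose KSTATE/STATE is not CONNECT/ACTIVE.
# Input: 'vxprint -qtrg <dg> <rvg>' output on stdin; only "rl" lines are used.
# Assumed field layout: rl NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
# Exit status: 0 if every rlink is CONNECT ACTIVE, 1 otherwise (or if none found).
check_rlinks() {
    awk '
        $1 == "rl" {
            seen = 1
            if ($4 != "CONNECT" || $5 != "ACTIVE") {
                print "rlink " $2 ": " $4 " " $5
                bad = 1
            }
        }
        END {
            if (!seen) print "no rlinks found"
            if (bad || !seen) exit 1
        }
    '
}

# Example (on the primary):
#   vxprint -qtrg vvrdg testrvg | check_rlinks || echo "replication needs attention"
```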
For further information, see the Volume Replicator Administrator's Guide.