How to tune the VVR SRL vol_max_rdback_sz (RDBACK) tunable in cases when replication fails to drain the pending updates in the SRL


Article ID: 100038242


Description



This document explains how to tune the SRL "vol_max_rdback_sz" (RDBACK) tunable.

In shared environments such as CVM (shared diskgroups), the SRL is accessed extensively, so the RDBACK pool may need to be raised above its normal value to ensure that the replication link can cope with the increased workload at peak operational times.
 

IMPORTANT: In a shared environment, VVR always “READS BACK” from the SRL when replicating in asynchronous mode.


Figure 1.0

 

In asynchronous mode, where the Secondary or the network bandwidth cannot keep up with the incoming write rate, the Primary kernel memory buffers fill up.
For VVR to continue to provide memory for incoming writes and continue its processing, it must free the memory held by writes that have been written to the Primary data volume but not yet sent to the Secondary.

When VVR is ready to send the unsent writes whose memory was freed, those writes must first be “READ BACK” from the SRL.

In synchronous mode the data is always available in memory, whereas in asynchronous mode VVR may FREQUENTLY have to “READ BACK” the data from the SRL. Synchronous replication, however, can significantly decrease application performance by adding the network round trip to the latency of each write request.

In asynchronous mode, replication performance might therefore suffer because of the delay of the additional read operation.


KEY POINT:

VVR does not need to “READ BACK” from the SRL if the “NETWORK BANDWIDTH” is sufficient and the Secondary always keeps up with the incoming write rate, or if the Secondary only falls behind for short periods during which the accumulated writes are small enough to fit in the VVR kernel buffer.


If VVR reads back from the SRL frequently, striping the SRL volume over several dedicated disks (disks not used by data volumes) could improve performance, unless striping is already done at the array level.

To determine whether VVR is reading back from the SRL, use the “vxstat” command. In the output, note the number of read operations on the SRL.
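The check described above can be sketched as a small shell helper. This is only an illustration: it assumes the usual vxstat layout in which each volume line reads "vol NAME" followed by the READ and WRITE operation counts, and the function name `srl_read_ops` is hypothetical.

```shell
#!/bin/sh
# Print the READ operations count for one volume from vxstat output on stdin.
# Assumption: volume lines have the form
#   vol <name> <read-ops> <write-ops> <read-blocks> <write-blocks> ...
# $1 is the name of the SRL volume to look for.
srl_read_ops() {
    awk -v vol="$1" '$1 == "vol" && $2 == vol { print $3 }'
}
```

For example, `vxstat -g datadg srl_vol | srl_read_ops srl_vol`, where `srl_vol` is a placeholder for the real SRL volume name. A count that keeps rising between runs suggests that VVR is reading back from the SRL.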


vrstat command

The vrstat command prints statistical information for the volumes in Replicated Volume Groups (RVGs) and RLINKs, and for all hosts in a Replicated Data Set (RDS). Information is displayed across the RDS setup on all the hosts, and not for a specific host.

By default, the command displays statistics at intervals of 10 seconds. This interval can be changed by setting the VRAS_STATS_FREQUENCY environment variable to the required value in the /etc/vx/vras/vras_env file.

If no rvg argument is specified, the command displays information for the RLINKs, storage replicator logs (SRLs), data volumes or memory tunables across all the RDSs on the local host, depending on the option that is specified.

If no option or argument is specified, the vrstat command displays the consolidated status for the RLINKs, SRLs, data volumes and memory tunables of all RDSs on the local host.

The "-M" option with the vrstat command can be used to display detailed information for the memory tunables on every host in an RDS. The output from this option is similar to that from the vxmemstat command.


Scenario:

In this instance, replication stalls during a peak operating window. As a result, the SRL starts to fill up because its updates are not being replicated to the Secondary site. Note the waiting state reported for the RDBCK-datadg_rvg pool in the sample output.




Sample output:


# vrstat -M

Fri Aug 2 22:20:57 2013

Replicated Data Set prod_rvg:

Fri Aug 2 22:20:59 2013

Replicated Data Set datadg_rvg:


Memory-pool Statistics:
Host    Pool              DG  Min    Max      In       Allocated  Max      Waiting
                              Size   Size     Use                 Used
------  ----------------  --  -----  -------  -------  ---------  -------  -------
Barney  WRSHIP            -   1024   65536    0        1024       0        no
Barney  RDBCK-prodg_rvg   -   1024   1048576  660      1024       972      no
Barney  NMCOM-prodg_rvg   -   1024   262144   972      1024       1012     no
Barney  RDBCK-datadg_rvg  -   1024   1048576  1048320  1048580    1048320  yes      <<<< WAITING
Barney  NMCOM-datadg_rvg  -   1024   262144   262140   262140     262140   no
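Spotting a waiting pool in this output can be automated with a short filter. The following is a minimal sketch, assuming the nine-column layout of the sample output above (Host, Pool, DG, Min Size, Max Size, In Use, Allocated, Max Used, Waiting); the function name `waiting_pools` is hypothetical.

```shell
#!/bin/sh
# Read "vrstat -M" style output on stdin and print, for every memory pool
# whose Waiting column is "yes": host, pool name, In Use and Max Size.
# Header and separator lines are skipped because their ninth field is not "yes".
waiting_pools() {
    awk 'NF >= 9 && $9 == "yes" { print $1, $2, $6, $5 }'
}
```

Fed a saved copy of the sample output, for example `waiting_pools < vrstat_M.out` (a hypothetical file name), it prints only the RDBCK-datadg_rvg line.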


The tuning process may require multiple adjustments to "vol_max_rdback_sz" (RDBACK) until the pool is large enough to cope with the peak operating level.

In this example, the RDBACK (vol_max_rdback_sz) pool had already been increased from the default of 134217728 (128M) to 1073741824 (1024M). The vxtune command displays the current value:


# vxtune vol_max_rdback_sz
Tunable                            Current Value  Default Value Reboot  
--------------------------------- --------------- ------------- ------  
vol_max_rdback_sz                   1073741824     134217728     N
 

Because the pool was still exhausted at 1024M, the "vol_max_rdback_sz" value had to be increased further, to 2147483648 (2048M):

# vxtune vol_max_rdback_sz=2147483648

Querying the tunable again confirms the new value:
 

# vxtune vol_max_rdback_sz
Tunable                            Current Value  Default Value Reboot
--------------------------------- --------------- ------------- ------
vol_max_rdback_sz                   2147483648     134217728     N

 

 


# vrstat -M

Sun Aug 4 22:20:01 2013

Replicated Data Set prod_rvg:

Sun Aug 4 22:20:01 2013

Replicated Data Set datadg_rvg:


Memory-pool Statistics:
Host    Pool              DG  Min    Max      In       Allocated  Max      Waiting
                              Size   Size     Use                 Used
------  ----------------  --  -----  -------  -------  ---------  -------  -------
Barney  WRSHIP            -   1024   65536    0        1024       0        no
Barney  RDBCK-prodg_rvg   -   1024   1048576  660      1024       972      no
Barney  NMCOM-prodg_rvg   -   1024   262144   972      1024       1012     no
Barney  RDBCK-datadg_rvg  -   1024   2097152  1194960  1222456    1198080  no       <<<< NOT WAITING
Barney  NMCOM-datadg_rvg  -   1024   262144   262140   262140     262140   no


After increasing the RDBACK pool to 2147483648 (2048M), the issue appears to be resolved, as the RLINKs no longer go into a stalled state.
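The byte values given to vxtune and the Max Size figures in the vrstat -M samples differ by a factor of 1024, which is consistent with vrstat reporting pool sizes in kilobytes. That unit is inferred from the numbers in this article rather than stated in it. The sketch below does the conversion so that the two outputs can be compared; `bytes_to_kb` is a hypothetical helper name.

```shell
#!/bin/sh
# Convert a vol_max_rdback_sz value in bytes to kilobytes, the unit the
# vrstat -M "Max Size" column appears to use (inferred from the samples).
bytes_to_kb() {
    echo $(( $1 / 1024 ))
}

bytes_to_kb 1073741824   # prints 1048576, the Max Size shown while the pool was waiting
bytes_to_kb 2147483648   # prints 2097152, the Max Size shown after the increase
```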

 

Additionally, on the Primary, the "-e" option of the vxrlink stats command can be used to gather more detail about the replication throughput:

# vxrlink -g diskgroup -e stats rlink_name


Sample output:

# vxrlink -g datadg -i 10 -e stats rlk_adc

Fri Aug  2 22:24:25 2013
 Messages :
 --------
 Number of blocks sent                           : 2834216
 Compressed msgs                                 : 25447
 Compressed data(bytes)                          : 44041954
 Uncompressed data(bytes)                        : 270866944
 Compression Ratio                               : 6.15         
 Bandwidth Savings                               : 83.74%

 Errors :
 ------
 No memory available                             : 0
 No message slots available                      : 0
 No memory available in nmcom pool on Secondary  : 0
 Timeout                                         : 11905
 Missing packet                                  : 668
 Missing message                                 : 154
 Stream                                          : 0
 Checksum                                        : 0
 Unable to deliver due to transaction            : 5

 Messages :
 --------
 Number of blocks sent                           : 47701
 Compressed msgs                                 : 52
 Compressed data(bytes)                          : 343355
 Uncompressed data(bytes)                        : 1148416
 Compression Ratio                               : 3.34         
 Bandwidth Savings                               : 70.10%

 Errors :
 ------
 No memory available                             : 0
 No message slots available                      : 0
 No memory available in nmcom pool on Secondary  : 0
 Timeout                                         : 143
 Missing packet                                  : 8
 Missing message                                 : 0
 Stream                                          : 0
 Checksum                                        : 0
 Unable to deliver due to transaction            : 0
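The Compression Ratio and Bandwidth Savings lines in this output can be cross-checked from the two byte counters. The sketch below assumes that the ratio is uncompressed bytes divided by compressed bytes and that the savings figure is the fraction of bytes not sent. Both formulas reproduce the figures in the samples, but they are inferred, not documented here. The function name `compression_stats` is hypothetical.

```shell
#!/bin/sh
# Given uncompressed and compressed byte counts, print the compression
# ratio and the bandwidth savings percentage, both to two decimal places.
# $1 = "Uncompressed data(bytes)", $2 = "Compressed data(bytes)"
compression_stats() {
    awk -v u="$1" -v c="$2" \
        'BEGIN { printf "%.2f %.2f%%\n", u / c, (1 - c / u) * 100 }'
}

compression_stats 270866944 44041954   # prints: 6.15 83.74%
compression_stats 1148416 343355       # prints: 3.34 70.10%
```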
 
