Introduction
The important factors that affect VVR performance are the layout of the SRL and the sizing of the VVR buffers. This article explains how to decide on the layout of the SRL and how to size the VVR buffers. It also describes how to choose the values of other VVR tunables.
This section explains how the SRL affects application performance and how a good SRL layout can improve performance.
How SRL affects performance
Incoming writes to a data volume on the Primary are written first to the SRL and then to the data volume. VVR manages writes in the same way irrespective of the replication settings, including the mode of replication. Note that writes to different data volumes within the RVG are all written to the same SRL. Therefore, the SRL throughput may affect performance. In practice, the use of the SRL may not degrade performance significantly, for the following reasons:
- The writes to the SRL are sequential, whereas the writes to the data volumes are spatially random in most cases. Typically, sequential writes are processed faster than random writes.
- The SRL is not used to process read operations performed by the application. If a large percentage of the operations are read operations, then the SRL is not busy at these times.
If the rate at which the application writes to the data volumes is greater than the rate at which the SRL can process writes, then the application could become slow. The following sections explain how to lay out the SRL to improve performance.
Striping the SRL
Striping the SRL over several physical disks to increase the available bandwidth can improve performance.
Choosing disks for the SRL
It is recommended that there be no overlap between the physical disks comprising the SRL and those comprising the data volumes, because all write requests to VVR result in a write to both the SRL and the requested data volume. Any such overlap is guaranteed to lead to major performance problems, as the disk head thrashes between the SRL and data sections of the disk. Slowdowns of over 100% can be expected.
Mirroring the SRL
It is recommended that the SRL be mirrored to improve its reliability. The loss of the SRL immediately stops replication. The only way to recover from this is to perform a full resynchronization, which is a time-consuming procedure to be avoided whenever possible. Under certain circumstances, the loss of the SRL may even cause loss of the data volumes. The risk of this failure can be minimized by mirroring the SRL.
Tuning Veritas Volume Replicator
This section describes how to adjust the tunable parameters that control the system resources used by VVR. Depending on the system resources that are available, adjustments may be required to the values of some tunable parameters to optimize performance.
Note: In a shared disk group environment, each of the VVR buffer spaces must be set to the same value on each node.
Write and readback buffers
- Write buffer space on the Primary
- Readback buffer space on the Primary
- Buffer space on the Secondary
Write buffer space on the Primary
VVR processes writes differently depending on whether it is replicating in a private disk group or a shared disk group. Also, in a shared disk group environment, VVR processes writes differently when replicating in synchronous and asynchronous mode.
When a write is issued, a write buffer is allocated from the write buffer space on the Primary. In a private disk group, the buffer is not released until the data has been written to the Primary SRL and sent to all the Secondaries in synchronous mode. If the Secondaries in asynchronous mode cannot keep up with the application write rate, the data to be sent to the Secondary starts accumulating in the write-buffer space on the Primary. As a result, write-buffer space on the Primary becomes low. Then, VVR begins to free some buffers and postpones sending the data to the Secondaries in asynchronous mode. As a result, more space is freed up for incoming write requests so that they are not delayed. If the disk group is shared and the write is issued on the logowner, VVR allocates a write buffer from the write buffer space on the logowner.
If the disk group is shared and VVR is replicating in synchronous mode, and the write is issued on the non-logowner, VVR sends the write to the logowner. On the logowner, VVR receives the write in the write ship buffer space and then copies it to the write buffer space. This process is called write shipping. In a shared disk group that uses write shipping, the write buffer is freed in the same way as for a private disk group.
If the disk group is shared and VVR is replicating in asynchronous mode, and the write is issued on the non-logowner, VVR exchanges metadata information about the write with the logowner. After VVR receives the metadata information on the non-logowner, VVR performs the writes locally on the non-logowner. This process is called metadata shipping.
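The three write paths described above can be summarized in a short Python sketch. This is only an illustration of the decision logic in the text; the function name and the returned labels are hypothetical and are not part of any VVR interface.

```python
def primary_write_path(shared_disk_group: bool,
                       issued_on_logowner: bool,
                       synchronous: bool) -> str:
    """Return how a Primary write is handled, following the text above.

    Hypothetical helper for illustration only; VVR exposes no such call.
    """
    if not shared_disk_group:
        # Private disk group: the buffer comes from the local write buffer space.
        return "local"
    if issued_on_logowner:
        # Shared disk group, write issued on the logowner: buffer is
        # allocated from the write buffer space on the logowner.
        return "local"
    # Shared disk group, write issued on a non-logowner node.
    return "write shipping" if synchronous else "metadata shipping"


print(primary_write_path(True, False, True))
print(primary_write_path(True, False, False))
```

The `issued_on_logowner` argument is ignored for a private disk group, because the logowner distinction applies only to shared disk groups.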
Readback buffer space on the Primary
When VVR is ready to send the freed requests to the Secondary, the freed requests are read back from the SRL. The data from the SRL is read back into the readback buffer space on the Primary.
The need to read back data from the SRL has an impact on write latency because more non-sequential I/O is performed on the SRL. Reading back data from the SRL also increases the load on the system and slows the rate at which data is sent to the Secondaries.
Note: The write buffer is freed only if the mode of replication is asynchronous; the writes do not have to be read back from the SRL when replicating in synchronous mode.
Buffer space on the Secondary
The writes from the Primary are received into the buffer space on the Secondary. The write is then written to the Secondary data volume from this buffer space. A write on the Primary can complete before the write to the Secondary data volume completes, even in synchronous mode of replication. However, if the Secondary is low on buffer space, it rejects new writes from the Primary, thereby slowing down the Primary. On the Primary, this appears as an inability to send requests over the network. The results are identical to those of insufficient network bandwidth.
For Secondaries in asynchronous mode, there may be no limit to how far Secondary data volumes can fall behind unless certain mechanisms are in force. Hence, if all the Secondaries are replicating in asynchronous mode, the application may not slow down; if there are Secondaries in synchronous mode, the write rate of the application reduces.
Tunable parameters for the VVR buffer spaces
The amount of buffer space available to VVR affects the application and replication performance. You can use the following tunables to manage buffer space according to your requirements:
- vol_rvio_maxpool_sz
- vol_min_lowmem_sz
- vol_max_wrspool_sz
- vol_max_rdback_sz
- vol_max_nmpool_sz
Use the vxmemstat command to monitor the buffer space used by VVR. The following sections describe each of the above tunables.
Tunable parameters for the write buffer space on the Primary in a private disk group
The following tunable parameters control the write buffer space on the Primary in a private disk group:
- vol_rvio_maxpool_sz
- vol_min_lowmem_sz
The amount of buffer space that can be allocated within the operating system to handle incoming writes is defined by the tunable vol_rvio_maxpool_sz, which defaults to 128MB.
If the available buffer space is not sufficient to process the write request, writes are held up. VVR must wait for current writes to complete and release the memory being used before processing new writes.
Furthermore, when the buffer space is low, VVR frees buffers early, requiring VVR to read back the write from the SRL.
Both these problems can be alleviated by increasing vol_rvio_maxpool_sz. By setting the vol_rvio_maxpool_sz to be large enough to hold the incoming writes, you can increase the number of concurrent writes and reduce the number of readbacks from the SRL. When decreasing the value of the vol_rvio_maxpool_sz tunable, stop all the RVGs on the system on which you are performing this operation.
When deciding whether or not a given write is freed early and read back later, VVR looks at the amount of buffer space available, and frees the write if the amount is below a threshold defined by the tunable vol_min_lowmem_sz. If this threshold is too low, it results in buffers being held for a long time. New writes cannot be performed because of lack of buffer space.
The vol_min_lowmem_sz tunable is set to about 4MB by default.
You can raise the threshold by increasing the value of the tunable vol_min_lowmem_sz. It should be set to at least 3 x N x I, but not less than 520K, where N is the number of concurrent writes to replicated volumes, and I is the average I/O size, rounded up to 8 kilobytes. The vol_min_lowmem_sz tunable is auto-tunable: depending on the incoming writes, VVR increases or decreases the tunable value. The value that you specify for the tunable, using the vxtune utility or the system-specific interface, is used as an initial value and could change depending on the application write behavior.
Note: This tunable is used only when replicating in asynchronous mode because SRL is not read back when replicating in synchronous mode.
Use the vxrvg stats command to determine the maximum concurrency (N) and average write size (I).
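As a worked example of the sizing rule above, the following Python sketch computes a starting value for vol_min_lowmem_sz from N and I. It assumes that "rounded up to 8 kilobytes" means rounding the average I/O size up to the next multiple of 8 KB; that reading, the function names, and the sample values of N and I are assumptions made for illustration.

```python
KB = 1024


def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple (integer arithmetic only)."""
    return -(-value // multiple) * multiple


def min_lowmem_sz(concurrent_writes: int, avg_io_bytes: int) -> int:
    """Suggested initial vol_min_lowmem_sz in bytes: max(3 x N x I, 520K).

    Assumption: I is rounded up to the next multiple of 8 KB.
    """
    io_size = round_up(avg_io_bytes, 8 * KB)
    return max(3 * concurrent_writes * io_size, 520 * KB)


# Hypothetical workload: 64 concurrent writes averaging 10 KB each.
print(min_lowmem_sz(64, 10 * KB))   # 3 x 64 x 16 KB
# A light workload falls back to the 520K floor.
print(min_lowmem_sz(4, 4 * KB))
```

Because the tunable is auto-tuned, a value computed this way serves only as the initial setting.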
Tunable parameter for the readback buffer space
The amount of buffer space available for readbacks is defined by the tunable, vol_max_rdback_sz, which defaults to 128 megabytes. To accommodate reading back more data, increase the value of vol_max_rdback_sz. You may need to increase this value if you have multiple Secondaries in asynchronous mode for one or more RVGs.
Use the vxmemstat command to monitor the buffer space. If the output indicates that the amount of space available is completely used, increase the value of the vol_max_rdback_sz tunable to improve readback performance. When decreasing the value of the vol_max_rdback_sz tunable, pause replication to all the Secondaries to which VVR is currently replicating.
Tunable parameters for the buffer space on the Primary in a shared disk group
In a shared disk group environment, the following tunable parameters control the buffer space on the Primary when replicating in asynchronous mode:
- vol_rvio_maxpool_sz
- vol_min_lowmem_sz
- vol_max_rdback_sz
In asynchronous mode, the tunable parameters work the same way as for a private disk group.
The vol_rvio_maxpool_sz tunable applies to all nodes. The vol_min_lowmem_sz and vol_max_rdback_sz tunables are only applied on the logowner node. However, these tunables should also be set to the same values on all nodes, because any node may become the logowner at some time.
In a shared disk group environment, the following tunable parameters control the buffer space on the Primary when replicating in synchronous mode:
- vol_max_wrspool_sz
- vol_rvio_maxpool_sz
When replicating in synchronous mode, the vol_rvio_maxpool_sz tunable works the same way as for a private disk group, except that it does not prevent readbacks.
This tunable should be set on all nodes in the shared disk group. In addition, the amount of buffer space that can be allocated on the logowner to receive writes sent by the non-logowner is defined by the write ship buffer space tunable vol_max_wrspool_sz, which defaults to 64MB. This tunable should be set to the same value on all nodes, because any node may become the logowner at some time.
Tunable parameters for the buffer space on the Secondary
The amount of buffer space available for requests coming in to the Secondary over the network is determined by the VVR tunable, vol_max_nmpool_sz, which defaults to 64 megabytes. VVR allocates separate buffer space for each Secondary RVG, the size of which is equal to the value of the tunable vol_max_nmpool_sz. The buffer space on the Secondary must be large enough to prevent slowing the network transfers excessively.
If the buffer is too large, it can cause problems. When a write arrives at the Secondary, the Secondary sends an acknowledgment to the Primary so that the Primary knows the transfer is complete. When the write is written to the data volume on the Secondary, the Secondary sends another acknowledgment, which tells the Primary that the write can be discarded from the SRL. However, if this second acknowledgment is not sent within one minute, the Primary disconnects the RLINK. The RLINK reconnects immediately but this causes disruption of the network flow and potentially other problems. Thus, the buffer space on the Secondary should be sized in such a way that no write can remain in it for one minute. This size depends on the rate at which the data can be written to the disks, which is dependent on the disks themselves, the I/O buses, the load on the system, and the nature of the writes (random or sequential, small or large).
If the write rate is W megabytes/second, the size of the buffer should be no greater than W * 50 megabytes, that is, 50 seconds’ worth of writes.
There are various ways to measure W. If the disks and volume layouts on the Secondary are comparable to those on the Primary and you have I/O statistics from the Primary before replication was implemented, these statistics can serve to arrive at the maximum write rate.
Alternatively, if replication has already been implemented, start by sizing the buffer space on the Secondary to be large enough to avoid timeout and memory errors.
While replication is active at the peak rate, run the following command and make sure there are no memory errors and the number of timeout errors is small:
# vxrlink -g diskgroup -i5 stats rlink_name
Then, run the vxstat command to get the lowest write rate:
# vxstat -g diskgroup -i5
The output looks similar to this:
                      OPERATIONS          BLOCKS    AVG TIME(ms)
TYP NAME            READ   WRITE    READ   WRITE    READ   WRITE
Mon 29 Sep 2003 07:33:07 AM PDT
vol srl1               0    1245       0    1663     0.0     9.0
vol archive            0     750       0     750     0.0     9.0
vol archive-L01        0     384       0     384     0.0     5.9
vol archive-L02        0     366       0     366     0.0    12.1
vol ora02              0     450       0     900     0.0    11.1
vol ora03              0       0       0       0     0.0     0.0
vol ora04              0       0       0       0     0.0     0.0
Mon 29 Sep 2003 07:33:12 AM PDT
vol srl1               0     991       0    1389     0.0    20.1
vol archive            0     495       0     495     0.0    10.1
vol archive-L01        0     256       0     256     0.0     5.9
vol archive-L02        0     239       0     239     0.0    14.4
vol ora02              0     494       0     988     0.0    10.0
vol ora03              0       0       0       0     0.0     0.0
vol ora04              0       0       0       0     0.0     0.0
For each interval, add the numbers in the blocks written column for the data volumes, but do not include the SRL. Also, do not include any subvolumes. For example, archive-L01 and archive-L02 are subvolumes of the volume archive; the statistics of the writes to the subvolumes are already included in the statistics for the volume archive. You may vary the interval, the total time you run the test, and the number of times you run the test according to your needs.
In this example, the interval is 5 seconds and the count is in blocks. Hence, on a machine with a block size of 2 kilobytes, the number of megabytes per interval, M, is (total * 2048)/(1024*1024), where total is the sum for one interval. The number of megabytes per second is therefore M/5, and the size of the buffer is (M/5)*50. If there is more than one Primary, do not increase the buffer size beyond this number.
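The arithmetic above can be checked with a short Python sketch that uses the blocks-written figures from the sample vxstat output. The SRL (srl1) and the subvolumes (archive-L01, archive-L02) are left out, as the text requires. The variable and function names are illustrative only, and the 2048-byte block size is the value assumed in the example.

```python
BLOCK_SIZE = 2048   # bytes per block, as assumed in the example above
INTERVAL_S = 5      # vxstat sampling interval (-i5)
HOLD_S = 50         # the buffer should hold at most 50 seconds of writes

# Blocks written per interval for data volumes only.
samples = [
    {"archive": 750, "ora02": 900, "ora03": 0, "ora04": 0},
    {"archive": 495, "ora02": 988, "ora03": 0, "ora04": 0},
]


def buffer_mb(total_blocks: int) -> float:
    """Buffer size in megabytes for one interval: (M / 5) * 50."""
    megabytes = total_blocks * BLOCK_SIZE / (1024 * 1024)   # M
    return (megabytes / INTERVAL_S) * HOLD_S


sizes = [buffer_mb(sum(sample.values())) for sample in samples]
lowest = min(sizes)   # the text asks for the lowest write rate
print([round(size, 2) for size in sizes], round(lowest, 2))
```

For these two intervals the totals are 1650 and 1483 blocks, which give roughly 32.2 MB and 29.0 MB; the lower figure is the one to use as the upper bound for the buffer.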
The writes to the SRL should not be considered part of the I/O load of the application. However, in asynchronous mode, the Secondary writes the incoming updates to both the Secondary SRL and the data volumes, so it may be necessary to make the value of vol_max_nmpool_sz slightly larger. However, to avoid the problems discussed at the beginning of this section, the calculated vol_max_nmpool_sz value should still ensure that writes do not remain in the pool for more than one minute.
DCM replay block size
When the Data Change Map (DCM) is being replayed, data is sent to the Secondary in blocks. The tunable vol_dcm_replay_size enables you to configure the size of the DCM replay blocks according to your network conditions. The default value of vol_dcm_replay_size is 256K. Decreasing the value of the tunable vol_dcm_replay_size may improve performance in a high latency environment.
VVR uses a heartbeat mechanism to detect communication failures between the Primary and the Secondary hosts. The RLINKs connect after the heartbeats are exchanged between the Primary and the Secondary. The RLINK remains connected while the heartbeats continue to be acknowledged by the remote host. The maximum interval during which heartbeats can remain unacknowledged is known as the heartbeat timeout value. If the heartbeat is not acknowledged within the specified timeout value, VVR disconnects the RLINK.
The tunable vol_nm_hb_timeout enables you to specify the heartbeat timeout value. The default is 10 seconds. For a high latency network, increasing the value of vol_nm_hb_timeout above the default helps prevent the RLINKs from experiencing false disconnects.
In Release 5.1SP1 and later, you can tune the heartbeat timeout value with vxtune.
Memory chunk size
The tunable voliomem_chunk_size enables you to configure the granularity of memory chunks used by VVR when allocating or releasing system memory. A memory chunk is a contiguous piece of memory that can be written to the disk in one operation. If the write size of the application is larger than the memory chunk size then the write is split resulting in multiple operations, which can reduce performance.
The default memory chunk size is 32K. For applications performing large writes, increase the size of voliomem_chunk_size to improve replication performance. The maximum value of voliomem_chunk_size is 32K.
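To illustrate why a write larger than the memory chunk size costs more operations, the sketch below counts the pieces a single write would be split into. It assumes the write is divided into chunk-sized pieces with one operation per piece; the text only states that the write is split into multiple operations, so the exact count is an assumption, and the function name is hypothetical.

```python
KB = 1024


def operations_for_write(write_bytes: int, chunk_bytes: int = 32 * KB) -> int:
    """Number of chunk-sized operations needed for one write.

    Assumption: one operation per chunk, with a final partial chunk
    counted as a full operation (integer ceiling division).
    """
    return -(-write_bytes // chunk_bytes)


print(operations_for_write(32 * KB))    # fits in one chunk
print(operations_for_write(256 * KB))   # split across several chunks
```

Under this assumption, a 256K application write with a 32K chunk size is handled as eight operations instead of one.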
UDP tuning, adjusting the flow control algorithm
When you use UDP as the replication protocol, VVR uses its own network flow control. VVR increases or decreases the rate at which data is sent depending on the number of timeouts or memory errors it gets per second. If the number of errors is high, VVR decreases the sending rate to avoid network congestion. If there are only a few (or no) errors, VVR continues to increase the sending rate by a fixed amount every second. On a lossy network, a large number of errors may occur, which prevents VVR from increasing the sending rate. However, these errors are not due to network congestion, so VVR should continue to increase the sending rate.
You specify the error tolerance VVR uses by setting two tunables: vol_rp_increment and vol_rp_decrement. VVR increases its sending rate if timeouts or memory errors per second are not more than the vol_rp_increment value. VVR decreases its sending rate if timeouts or memory errors per second are more than vol_rp_decrement. The default value of both tunables is 8.
On a lossy network, the number of errors per second can exceed vol_rp_increment or vol_rp_decrement, so the sending rate does not increase and can even decrease further. This impacts replication performance. If RLINK statistics show a high number of errors and VVR is not using the available bandwidth, you may be able to improve replication performance by tuning vol_rp_increment and vol_rp_decrement to a higher value, such as 16 or 32. In Release 5.1SP1 and later, you can change both tunables using the vxtune command. This tuning is required only on the Primary host.
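The effect of the two thresholds can be seen in a small Python model of the rule described above. This is a simplified sketch and not VVR's actual implementation: the function name, the step size, and the lower bound on the rate are assumptions made for illustration. Only the two comparisons against vol_rp_increment and vol_rp_decrement come from the text.

```python
def next_send_rate(rate: float,
                   errors_per_sec: int,
                   rp_increment: int = 8,
                   rp_decrement: int = 8,
                   step: float = 1.0) -> float:
    """One-second update of the sending rate (hypothetical units).

    Decrease when errors exceed rp_decrement, increase when errors do
    not exceed rp_increment, otherwise hold the rate.
    """
    if errors_per_sec > rp_decrement:
        return max(rate - step, step)   # assumed floor of one step
    if errors_per_sec <= rp_increment:
        return rate + step
    return rate


# A lossy link producing 12 errors per second:
print(next_send_rate(10.0, 12))           # defaults (8, 8): rate drops
print(next_send_rate(10.0, 12, 16, 16))   # raised to 16: rate grows
```

With the default value of 8 for both tunables, 12 errors per second push the rate down; raising both to 16 lets the rate keep growing at the same error level, which is the behavior the tuning advice aims for.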
TCP tuning, tuning the number of TCP connections
If you use TCP as the replication protocol, VVR supports multiple connections. The number of connections is auto-tuned depending on the latency between the Primary host and the Secondary host. If the network link has very high bandwidth, the auto-tuned number of connections may not be enough to saturate it.
You can specify the number of TCP connections using the tunable nmcom_max_connections.
You may need to tune nmcom_max_connections in the following situations:
- VVR is not using available bandwidth
- The latency between the Primary host and Secondary host is high (greater than 20 ms)
To tune the number of TCP connections
1 Determine the number of active connections. Enter:
# /etc/vx/diag.d/vxkprint |grep active_connections
2 If the active_connections value is less than 8, tune nmcom_max_connections by continuing with the following steps.
3 Pause replication.
4 Tune the nmcom_max_connections value to 8. In Release 5.1SP1 and later, you can tune this value with the vxtune command. Enter:
# vxtune nmcom_max_connections 8
Note: This tuning is required only on the Primary host.
5 Resume replication to make the new value effective.
6 Use the vrstat command to observe the bandwidth use. If use increases and there is still room to grow, tune nmcom_max_connections to 16 or 32 connections.
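The procedure above amounts to stepping through the values 8, 16, and 32 while bandwidth remains unused. The following Python sketch expresses that decision. The function name and the boolean headroom flag are hypothetical; in practice the inputs would come from the vxkprint and vrstat output mentioned in the steps.

```python
CONNECTION_STEPS = (8, 16, 32)   # values named in the procedure above


def next_max_connections(active_connections: int,
                         bandwidth_headroom: bool):
    """Suggest the next nmcom_max_connections value, or None to stop.

    bandwidth_headroom is True when VVR is not yet using the available
    bandwidth (an observation the operator makes, for example with vrstat).
    """
    if not bandwidth_headroom:
        return None
    for value in CONNECTION_STEPS:
        if active_connections < value:
            return value
    return None


print(next_max_connections(4, True))
print(next_max_connections(8, True))
print(next_max_connections(32, True))
```

Each suggested change still requires pausing and resuming replication on the Primary, as described in steps 3 and 5.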
Message slots on the Secondary
VVR uses a number of memory slots on the Secondary site to store incoming network I/O. By default, the number of message slots is 1024. If the network has high packet reordering, the default number of message slots may not be enough. Packets are dropped if they are out of order, which impacts replication performance.
You can use network health check tools like Iperf to find packet reordering, or you can also use extended RLINK statistics. Enter the following command:
# vxrlink -g dgname -i10 -e stats rlink_name
Errors :
------
No memory available : 0
No message slots available : 0
No memory available in nmcom pool on Secondary : 0
Timeout : 0
Missing packet : 0
Missing message : 0
Stream : 0
Checksum : 0
Unable to deliver due to transaction : 0
Errors in the following categories indicate that packet reordering is happening in the network:
- No message slots available
- Missing message
You can specify the number of message slots with the nmcom_max_msgs tunable. For a network with high packet reordering, increasing the value from the default to 2048 or 3072 may improve replication performance. In Release 5.1SP1 and later, you can tune this value with the vxtune command.
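When the extended RLINK statistics are collected regularly, the two reordering indicators can be checked with a short script. The Python sketch below parses text in the "label : count" layout shown in the sample output above and reports whether either indicator is nonzero. The function names are hypothetical, the counts in SAMPLE are made up for illustration, and the real output may contain additional sections that this parser simply ignores.

```python
REORDERING_COUNTERS = ("No message slots available", "Missing message")

# Hypothetical counts, laid out like the Errors section shown above.
SAMPLE = """\
Errors :
------
No memory available : 0
No message slots available : 0
No memory available in nmcom pool on Secondary : 0
Timeout : 2
Missing packet : 0
Missing message : 3
Stream : 0
Checksum : 0
Unable to deliver due to transaction : 0
"""


def parse_error_counters(text: str) -> dict:
    """Collect 'label : count' lines into a dict; skip everything else."""
    counters = {}
    for line in text.splitlines():
        label, sep, value = line.rpartition(":")
        if sep and value.strip().isdigit():
            counters[label.strip()] = int(value.strip())
    return counters


def reordering_suspected(counters: dict) -> bool:
    """True if either reordering-related counter is greater than zero."""
    return any(counters.get(name, 0) > 0 for name in REORDERING_COUNTERS)


counters = parse_error_counters(SAMPLE)
print(reordering_suspected(counters))
```

A True result suggests that increasing nmcom_max_msgs is worth trying; timeouts alone are not treated as a sign of reordering here.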
VVR and NAT (Network Address Translation) firewall
VVR uses a heartbeat mechanism to detect communication failures between the Primary and the Secondary hosts. VVR uses IP addresses in the heartbeat message to send heartbeat acknowledgments.
When replicating over a Network Address Translation (NAT) based firewall, VVR must use the translated IP address, instead of the IP address in the heartbeat message. If the IP address in the heartbeat message is used, the heartbeat acknowledgment is dropped at the firewall and replication does not start.
The tunable vol_vvr_use_nat directs VVR to use the translated address from the received message so that VVR can communicate over a NAT-based firewall. Set this tunable to 1 only if there is a NAT-based firewall in the configuration.
The following tunable parameters control the compression settings on a per-system basis:
- vol_cmpres_enabled
- vol_cmpres_threads
Tunable parameter for enabling compression
The compression feature can be enabled using one of the following methods:
- On a per-Secondary basis using the vradmin set command.
- On a per-system basis using the global VVR compression tunable.
Enabling compression on a per-Secondary basis requires that you use the vradmin set command.
The global VVR compression tunable, vol_cmpres_enabled, lets you enable or disable compression on a per-system basis. Compression is disabled by default. The vol_cmpres_enabled tunable can be set on either the Primary or the Secondary host; however, it takes effect only on the Primary. The per-Secondary compression setting does not override the global setting.
In the case of a clustered environment, the tunable needs to be set on the current logowner node of the Primary or Secondary clusters. The compression-related statistics are maintained only on the logowner node of the Primary cluster. It is recommended that you set the tunable on all the nodes of the Primary and Secondary clusters to prevent disruption of the tunable setting in the event that the logowner node changes.
Note: If the global VVR compression tunable is enabled, but the per-Secondary setting is not, compression will be enabled but the vradmin repstatus and vxprint command outputs will not indicate whether compression is enabled or disabled.
Also, the vrstat and vxrlink -e stats commands will not display the compression-related statistics. To be able to view the compression mode and compression-related statistics, you should use the per-Secondary setting to enable compression.
Tunable parameter for setting compression threads
The vol_cmpres_threads tunable is a per-system tunable that lets you set the number of compression threads on the Primary host or the number of decompression threads on the Secondary host to a value between 1 and 64. The default value is 10. You can tune this setting depending on your CPU usage.
Tuning considerations for VVR compression
The following are tuning considerations for VVR compression:
- VVR compression results in additional CPU utilization on the Primary and Secondary hosts when enabled. It is recommended that you provision for this CPU overhead by allocating a dedicated CPU.
- Reducing the number of compression and decompression threads on the Primary and Secondary hosts reduces the amount of CPU consumption.
- In cases where CPU utilization is very high, it is recommended that you either disable compression or reduce the number of compression and decompression threads that are set on the Primary and Secondary hosts.