Background:
When working on VVR (Veritas Volume Replicator) issues it is common for customers to remark that VVR performance is not matching their expectations given the network infrastructure in place between the primary and secondary nodes.
Inadequate VVR performance may be caused by a number of factors and may require tuning of the product. However, VVR issues may also be caused by the underlying network infrastructure not performing as expected, for example showing much lower data throughput rates than expected. In this case the onus is on the network administrator or provider to improve the performance of the network to allow a corresponding improvement in VVR performance.
To achieve end to end testing of network performance the IPerf tool is commonly used. In most cases the tool is used to determine that connectivity exists between a port on two given nodes, and to find the total available bandwidth at any given time on that connection. Note, however, that this tool can also be used to tune different aspects of connections between nodes such as the number of TCP connections in use and so on.
Note that IPerf has been accredited by many major US institutions and is commonly available pre-compiled for many operating systems in their package repositories or on freeware sites. Note, however, that IPerf is not linked with or sponsored by Veritas and as such any defects or bugs found with IPerf should be reported directly to the developers and not to Veritas technical support.
Various versions of IPerf have been made available in the past with increasing functionality and flexibility being added to the later releases. The examples in this document are based on IPerf 2.0.4 and are run on Solaris SPARC systems. If using any other operating system or version of IPerf this document makes no guarantees that the options shown are applicable or even available on the target system.
Note: For Red Hat Linux, an iperf package is available in the distribution's repositories. The link below provides download and usage information for Red Hat Linux.
Issue:
In the examples in this document we will explain how to use IPerf to determine the maximum throughput of data available on a given UDP or TCP connection between two machines. When dealing with network connections and VVR, however, the following points need to be considered:
- The rated speed of a piece of network hardware or connection is the theoretical maximum throughput that can be achieved in a burst in ideal conditions. As such the real world maximum throughput which can be achieved is likely to be somewhat lower than the rated maximum throughput.
- Network throughput is commonly measured in bits per second, for example Mbps (megabits per second) or Kbps (kilobits per second). A bit is one eighth of a byte, so these units are eight times smaller than the corresponding data storage units such as MB (megabytes) or KB (kilobytes). As such a network card rated at 100Mbps can only transfer a theoretical maximum of ~12.5MB of data per second.
- Note also that 1Mbps == 1000Kbps and 1Kbps == 1000bps not 1024bps. This is different to data storage sizes where 1MB == 1024KB and 1KB == 1024B.
- The peak throughput achievable by VVR when replicating is likely to be somewhat lower than the maximum throughput found by network testing tools such as IPerf. This is due to the design of the VVR product, which makes it particularly sensitive to network 'features' such as dropped or reordered packets and packet fragmentation. Note also that VVR's use of the network is inherently more complex than the tests performed by simple network test tools.
- VVR has two main modes of replication: DCM replay (used during DCM replays and autosync operations) and SRL based replication (used during normal replication and SRL checkpoint replays). It is expected that DCM replays will provide somewhat better network throughput than SRL based replication. Again this is due to the design of the VVR product. In brief, DCM replays give better throughput due to:
  - Use of fixed data chunk sizes (areas of the underlying volume are being replicated rather than individual writes)
  - Not having to adhere to write ordering on the secondary node (rlinks are marked as inconsistent during DCM replays so there is no requirement for the secondary to process data received over the network in strict write order)
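The unit conversions described in the points above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names are invented for this example and are not part of IPerf or VVR.

```python
# Network rates count bits in powers of 1000; storage sizes (as used in this
# document) count bytes in powers of 1024.

BITS_PER_BYTE = 8


def mbps_to_bytes_per_sec(mbps):
    """Convert megabits per second (1 Mbps = 1,000,000 bits/s) to bytes per second."""
    return mbps * 1_000_000 / BITS_PER_BYTE


def bytes_per_sec_to_storage_mb(bytes_per_sec):
    """Express a byte rate in storage-style MB per second (1 MB = 1024 * 1024 bytes)."""
    return bytes_per_sec / (1024 * 1024)


if __name__ == "__main__":
    rate = mbps_to_bytes_per_sec(100)
    print(f"100 Mbps = {rate / 1_000_000:.1f} million bytes per second")
    print(f"100 Mbps = {bytes_per_sec_to_storage_mb(rate):.2f} MB per second (1024-based)")
```

For a 100Mbps link this works out to 12.5 million bytes per second, or roughly 11.9MB per second when 1MB is taken as 1024 x 1024 bytes.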
Using IPerf:
IPerf is based on a client/server model. As such IPerf binaries must be installed on both machines taking part in testing. IPerf is started on one node in 'server' mode. This node will sit and listen for a connection on a specific port. IPerf is then started on the second node in 'client' mode and is provided with the host name or IP address of the 'server' node. The client node will attempt to initiate a connection with the server node and once connected will start to push data over the connection. The characteristics of the connection can be modified by options provided to the IPerf binary. IPerf will attempt to push as much data as possible over the connection in a given amount of time and once complete will use the time taken and amount of data pushed to calculate a perceived bandwidth for the connection.
When using IPerf to test network throughput between primary and secondary nodes the following must be ensured before testing can begin:
- Replication must be stopped. If running, replication will be using a certain proportion of the available bandwidth between the nodes and as such will cause IPerf to give an artificially small value for available bandwidth which can be misleading. If using IPerf to test non VVR ports replication can simply be paused.
- If using IPerf to test throughput on specific ports which are also being used by VVR, the component of VVR using the port must be stopped. Failure to do this can leave VVR bound to the port, which can cause IPerf either to fail because it cannot bind to the port or to give misleading results. The component of VVR which should be stopped depends on the port being tested. For example, when using TCP for replication, data is commonly sent on TCP port 4145. As such, before IPerf can be used to test this port, the vxnetd kernel threads should be stopped on both nodes to ensure that the port is free.
To stop a specific daemon, the daemon's own init script can be used. The example below shows the use of the vxstart_vvr script to stop all VVR daemons for the duration of testing:
Initially, our rlink is CONNECT ACTIVE, i.e.:
# vxprint -qtrg testdg | grep ^rl
rl to_10.12.240.33 testrvg CONNECT ACTIVE 10.12.240.33 testdg to_10.12.249.249
We stop the VVR daemons on both nodes:
# /usr/sbin/vxstart_vvr stop
And the rlink drops to ENABLED ACTIVE:
# vxprint -qtrg testdg | grep ^rl
rl to_10.12.240.33 testrvg ENABLED ACTIVE 10.12.240.33 testdg to_10.12.249.249
We are now able to run IPerf as required to test connectivity and throughput between the nodes.
Once complete the IPerf server process on the server node should be killed and VVR daemons restarted on both nodes:
# /usr/sbin/vxstart_vvr start
Rlinks should transition back to CONNECT ACTIVE and replication should continue as normal:
# vxprint -qtrg testdg | grep ^rl
rl to_10.12.240.33 testrvg CONNECT ACTIVE 10.12.240.33 testdg to_10.12.249.249
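If the rlink state checks in the procedure above need to be scripted, the 'rl' record can be split into fields. The following is a minimal sketch in Python; the field names are assumptions inferred from the sample output in this document rather than an official schema, and the helper functions are hypothetical.

```python
def parse_rlink_line(line):
    """Split one 'rl' record from 'vxprint -qtrg <dg>' into named fields.

    The field names below are assumed from the sample output in this
    document; they are not taken from the vxprint documentation.
    """
    fields = line.split()
    if len(fields) != 8 or fields[0] != "rl":
        raise ValueError(f"not an rlink record: {line!r}")
    names = ("type", "name", "rvg", "kstate", "state",
             "remote_host", "remote_dg", "remote_rlink")
    return dict(zip(names, fields))


def is_connected(record):
    """True when the rlink reports CONNECT, i.e. VVR is still attached."""
    return record["kstate"] == "CONNECT"


if __name__ == "__main__":
    before = parse_rlink_line(
        "rl to_10.12.240.33 testrvg CONNECT ACTIVE 10.12.240.33 testdg to_10.12.249.249")
    after = parse_rlink_line(
        "rl to_10.12.240.33 testrvg ENABLED ACTIVE 10.12.240.33 testdg to_10.12.249.249")
    # IPerf testing should only begin once the rlink is no longer connected.
    print(is_connected(before), is_connected(after))
```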
IPerf Options:
In the following examples a number of command line switches are used when running IPerf. A description of each of these switches and when they should be used is given below:
-s: Run IPerf locally in server mode (i.e. sit and listen for incoming connections)
-c hostname: Run IPerf locally in client mode and attempt to initiate a connection with an existing IPerf server on the node given by hostname (this can be a name or IP address)
-p port_number: Define the port which IPerf should use for communications. If not specified this will default to port 5001
-t time: Define the length in seconds for which IPerf testing should run. If not specified IPerf will default to 10 seconds; however, a longer time such as 30 seconds appears to give more consistent results
-u: Run tests using the UDP protocol. If not specified IPerf will default to using the TCP protocol
-b bandwidth[KM]: The bandwidth at which the client should attempt to send when using the UDP protocol. Note that to effectively test throughput of a UDP connection the client should be told to send at a higher rate than the network is believed to have available. For example, if testing a 100Mbps network connection, bandwidth should be specified at higher than 100M, for example 250M. Failure to do this will prevent the client from sending as fast as possible and will give an artificially low result for available network bandwidth. Note that if not specified IPerf will default to 1Mbps. The reason for this is that IPerf does not want to flood the connection with UDP packets unless told to do so; however, we need to flood the network to get a valid result. Note that this option is not required when using the TCP protocol.
-l size: Size in bytes of datagrams to send during testing when using the UDP protocol. If not specified IPerf defaults to 1470 byte datagrams to try and avoid packet fragmentation. Note that this option is not required when using the TCP protocol.
-P number: Number of parallel connections to use when using the TCP protocol. If not specified IPerf defaults to a single TCP connection. This can help simulate multiple TCP connections as used by VVR in certain situations.
-B IP address: Bind to a specific IP address on the local node. This is useful where a machine has multiple IP addresses on the same subnet as it can be used to bind IPerf to the exact same IP as would be used by VVR on each machine ensuring that IPerf uses the exact same network infrastructure/routing as VVR traffic.
As we can see, IPerf is extremely flexible. As well as reporting the basic network bandwidth available between two nodes, it can also be used to generate comparative reports with changes in datagram size and numbers of connections, allowing some testing of tuning which could be applied to the VVR stack and the likely effect this will have on replication throughput.
When simply using IPerf to compare how much network bandwidth is available with the amount of bandwidth being used by VVR, care should be taken to match options used by IPerf such as datagram size, port numbers, protocol and so on as closely as possible to the current VVR configuration.
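When the same tests are repeated with different options it can help to build the client command line programmatically. The following Python sketch only assembles an argument list from the switches described above; the function name and its parameters are hypothetical, and treating -b and -l as UDP-only simply follows the descriptions in this document.

```python
def build_iperf_client_args(host, port=None, seconds=None, udp=False,
                            bandwidth=None, datagram_size=None,
                            parallel=None, bind_ip=None, binary="./iperf"):
    """Return an argv list for an IPerf client run using the switches above."""
    if not udp and (bandwidth is not None or datagram_size is not None):
        # This sketch follows the option descriptions above, which present
        # -b and -l as UDP options.
        raise ValueError("bandwidth and datagram_size are only used with udp=True")
    args = [binary, "-c", host]
    if udp:
        args.append("-u")
    if port is not None:
        args += ["-p", str(port)]
    if seconds is not None:
        args += ["-t", str(seconds)]
    if bandwidth is not None:
        args += ["-b", bandwidth]
    if datagram_size is not None:
        args += ["-l", str(datagram_size)]
    if parallel is not None:
        args += ["-P", str(parallel)]
    if bind_ip is not None:
        args += ["-B", bind_ip]
    return args


if __name__ == "__main__":
    # A UDP client run on port 12345 for 30 seconds at 250M.
    print(" ".join(build_iperf_client_args(
        "10.12.240.32", port=12345, seconds=30, udp=True, bandwidth="250M")))
    # A TCP client run on port 45678 for 30 seconds with 8 parallel connections.
    print(" ".join(build_iperf_client_args(
        "10.12.240.32", port=45678, seconds=30, parallel=8)))
```

The returned list could be passed to subprocess.run on a node where the iperf binary is installed; the sketch itself does not run anything.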
Examples Of Use:
Testing a single UDP connection between two nodes on port 12345 using the default size of datagram. Note that as the network connection between nodes is believed to be ~100Mbps a bandwidth value of 250Mbps is specified:
Server node:
root@pnd1# ./iperf -s -u -p 12345
------------------------------------------------------------
Server listening on UDP port 12345
Receiving 1470 byte datagrams
UDP buffer size: 8.00 KByte (default)
------------------------------------------------------------
[ 3] local 10.12.240.32 port 12345 connected with 10.12.240.33 port 46248
Client node:
root@pnd2# ./iperf -c 10.12.240.32 -u -p 12345 -t 30 -b 250M
------------------------------------------------------------
Client connecting to 10.12.240.32, UDP port 12345
Sending 1470 byte datagrams
UDP buffer size: 8.00 KByte (default)
------------------------------------------------------------
[ 3] local 10.12.240.33 port 46248 connected with 10.12.240.32 port 12345
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-30.0 sec 298 MBytes 83.4 Mbits/sec
Testing a single TCP connection between two nodes on port 45678:
Server node:
root@pnd1# ./iperf -s -p 45678
------------------------------------------------------------
Server listening on TCP port 45678
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[ 4] local 10.12.240.32 port 45678 connected with 10.12.240.33 port 37795
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-30.0 sec 189 MBytes 52.8 Mbits/sec
Client node:
root@pnd2# ./iperf -c 10.12.240.32 -p 45678 -t 30
------------------------------------------------------------
Client connecting to 10.12.240.32, TCP port 45678
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.12.240.33 port 37795 connected with 10.12.240.32 port 45678
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-30.0 sec 189 MBytes 52.8 Mbits/sec
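The Bandwidth column in the reports above can be cross-checked against the Transfer and Interval columns. The Python sketch below assumes that IPerf counts MBytes in units of 1024 x 1024 bytes and Mbits in units of 1,000,000 bits; this assumption is consistent with the sample output in this document but should be verified for other IPerf versions. The function name is hypothetical.

```python
def mbits_per_sec(mbytes, seconds):
    """Recompute a reported bandwidth from the Transfer and Interval columns.

    Assumes 1 MByte = 1024 * 1024 bytes and 1 Mbit = 1,000,000 bits.
    """
    bytes_sent = mbytes * 1024 * 1024
    return bytes_sent * 8 / seconds / 1_000_000


if __name__ == "__main__":
    # UDP test: 298 MBytes in 30.0 seconds, reported as 83.4 Mbits/sec.
    print(f"UDP: {mbits_per_sec(298, 30.0):.1f} Mbits/sec")
    # TCP test: 189 MBytes in 30.0 seconds, reported as 52.8 Mbits/sec.
    print(f"TCP: {mbits_per_sec(189, 30.0):.1f} Mbits/sec")
```

Small differences from the reported figures are expected because the Transfer column is itself rounded.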
The same TCP test as above but with 8 parallel connections (to show effect this might have on throughput):
Server node:
root@pnd1# ./iperf -s -p 45678
------------------------------------------------------------
Server listening on TCP port 45678
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[ 4] local 10.12.240.32 port 45678 connected with 10.12.240.33 port 37835
[ 5] local 10.12.240.32 port 45678 connected with 10.12.240.33 port 37836
[ 6] local 10.12.240.32 port 45678 connected with 10.12.240.33 port 37837
[ 7] local 10.12.240.32 port 45678 connected with 10.12.240.33 port 37838
[ 8] local 10.12.240.32 port 45678 connected with 10.12.240.33 port 37839
[ 9] local 10.12.240.32 port 45678 connected with 10.12.240.33 port 37840
[ 10] local 10.12.240.32 port 45678 connected with 10.12.240.33 port 37841
[ 11] local 10.12.240.32 port 45678 connected with 10.12.240.33 port 37842
[ ID] Interval Transfer Bandwidth
[ 8] 0.0-30.0 sec 41.0 MBytes 11.4 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 11] 0.0-30.0 sec 41.4 MBytes 11.6 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 9] 0.0-30.0 sec 41.5 MBytes 11.6 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-30.0 sec 41.0 MBytes 11.5 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 7] 0.0-30.1 sec 41.5 MBytes 11.6 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 10] 0.0-30.1 sec 42.6 MBytes 11.9 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-30.1 sec 41.6 MBytes 11.6 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-30.1 sec 41.4 MBytes 11.6 Mbits/sec
[SUM] 0.0-30.1 sec 332 MBytes 92.7 Mbits/sec
Client node:
root@pnd2# ./iperf -c 10.12.240.32 -p 45678 -t 30 -P 8
------------------------------------------------------------
Client connecting to 10.12.240.32, TCP port 45678
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[ 10] local 10.12.240.33 port 37842 connected with 10.12.240.32 port 45678
[ 3] local 10.12.240.33 port 37835 connected with 10.12.240.32 port 45678
[ 4] local 10.12.240.33 port 37836 connected with 10.12.240.32 port 45678
[ 5] local 10.12.240.33 port 37837 connected with 10.12.240.32 port 45678
[ 6] local 10.12.240.33 port 37838 connected with 10.12.240.32 port 45678
[ 7] local 10.12.240.33 port 37839 connected with 10.12.240.32 port 45678
[ 8] local 10.12.240.33 port 37840 connected with 10.12.240.32 port 45678
[ 9] local 10.12.240.33 port 37841 connected with 10.12.240.32 port 45678
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-30.0 sec 41.6 MBytes 11.6 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 7] 0.0-30.0 sec 41.0 MBytes 11.5 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 8] 0.0-30.0 sec 41.5 MBytes 11.6 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 10] 0.0-30.0 sec 41.4 MBytes 11.6 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 9] 0.0-30.0 sec 42.6 MBytes 11.9 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-30.0 sec 41.0 MBytes 11.5 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-30.0 sec 41.5 MBytes 11.6 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-30.0 sec 41.4 MBytes 11.6 Mbits/sec
[SUM] 0.0-30.0 sec 332 MBytes 92.7 Mbits/sec
As we can see, we achieve significantly higher aggregate throughput with 8 parallel connections (92.7 Mbits/sec) than with a single connection (52.8 Mbits/sec), as expected given the nature of the TCP protocol.
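When comparing runs like the ones above it can be convenient to extract the figures from the report text. The following Python sketch parses result lines in the format shown in this document; the regular expression and function name are illustrative only, and the sketch handles only lines reported in MBytes and Mbits/sec.

```python
import re

# Matches result lines such as "[ 5] 0.0-30.0 sec 41.6 MBytes 11.6 Mbits/sec".
RESULT_LINE = re.compile(
    r"\[\s*(?P<id>\d+|SUM)\]\s+[\d.]+-\s*[\d.]+ sec\s+"
    r"(?P<mbytes>[\d.]+) MBytes\s+(?P<mbits>[\d.]+) Mbits/sec"
)


def parse_results(report):
    """Return (list of per-connection Mbits/sec, [SUM] Mbits/sec or None)."""
    rates, total = [], None
    for line in report.splitlines():
        match = RESULT_LINE.search(line)
        if match is None:
            continue  # header, banner, or "connected with" line
        if match.group("id") == "SUM":
            total = float(match.group("mbits"))
        else:
            rates.append(float(match.group("mbits")))
    return rates, total


# Result lines copied from the 8 connection client report above.
CLIENT_REPORT = """\
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-30.0 sec 41.6 MBytes 11.6 Mbits/sec
[ 7] 0.0-30.0 sec 41.0 MBytes 11.5 Mbits/sec
[ 8] 0.0-30.0 sec 41.5 MBytes 11.6 Mbits/sec
[ 10] 0.0-30.0 sec 41.4 MBytes 11.6 Mbits/sec
[ 9] 0.0-30.0 sec 42.6 MBytes 11.9 Mbits/sec
[ 4] 0.0-30.0 sec 41.0 MBytes 11.5 Mbits/sec
[ 6] 0.0-30.0 sec 41.5 MBytes 11.6 Mbits/sec
[ 3] 0.0-30.0 sec 41.4 MBytes 11.6 Mbits/sec
[SUM] 0.0-30.0 sec 332 MBytes 92.7 Mbits/sec
"""

if __name__ == "__main__":
    rates, total = parse_results(CLIENT_REPORT)
    print(f"{len(rates)} connections, per-connection sum {sum(rates):.1f}, reported SUM {total}")
    # Compare against the single connection result of 52.8 Mbits/sec.
    print(f"ratio to single connection: {total / 52.8:.2f}")
```

The per-connection figures add up to slightly more than the reported [SUM] value because each line is rounded independently.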
Interpreting IPerf Results:
It should be noted that any results obtained with IPerf detail the bandwidth which was available specifically when the test was run. Available bandwidth can be influenced by many factors such as:
- The time of day/day of the week (i.e. networks are expected to be busier during peak hours or when backups are running)
- Applications running on nodes being tested (a high load may prevent IPerf/network stacks from getting sufficient CPU time skewing results)
- Routing in use - it is possible to use different network paths at different times of the day
As such IPerf should be run a number of times across the day to get a feel for what is normally achievable bandwidth and any periods where bandwidth is higher or lower than normal.
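One way to get a feel for results collected across the day is to compare each run with the median. The Python sketch below uses invented sample figures purely for illustration (they are not measurements from the systems in this document), and the 20% tolerance is an arbitrary assumption.

```python
from statistics import median


def summarize(samples):
    """Return min, median and max bandwidth for (label, Mbits/sec) samples."""
    rates = [rate for _, rate in samples]
    return {"min": min(rates), "median": median(rates), "max": max(rates)}


def unusual_runs(samples, tolerance=0.2):
    """Return labels of runs more than `tolerance` (as a fraction) from the median."""
    typical = median([rate for _, rate in samples])
    return [label for label, rate in samples
            if abs(rate - typical) > tolerance * typical]


if __name__ == "__main__":
    # Hypothetical results from runs at different times of day (Mbits/sec).
    runs = [("02:00", 91.0), ("09:00", 84.0), ("13:00", 52.0),
            ("18:00", 80.0), ("23:00", 90.0)]
    print(summarize(runs))
    print("runs outside normal range:", unusual_runs(runs))
```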
Likewise available bandwidth can also be influenced by physical factors in the network such as:
- Asymmetric routing (where a different route is taken through the network from client to server when compared with server to client) - network traffic may be significantly faster in one direction
- Data compression - data can be compressed depending on given ports and protocols
As such available bandwidth should always be tested in both directions between nodes to ensure that available bandwidth is comparable depending on which node is used as the server (otherwise the customer may experience significant changes in VVR throughput if the direction of replication is reversed after migration/fail over between sites). Likewise it is important (as far as possible) to test available bandwidth on the same ports as being used by VVR with comparable options such as protocol and packet size. This ensures that IPerf generated traffic is subject to the same compression and so on as VVR traffic.
Note that when using the TCP protocol data is sent on the VVR heartbeat port (which defaults to port 4145) whereas when using the UDP protocol data is sent on anonymous UDP ports by default making it very difficult to determine the exact port numbers in use for data transmission. If VVR throughput is very poor but IPerf shows significantly improved data throughput on a named UDP port or range of ports it is sensible to configure VVR to use the same ports as a test to see if this shows an increase in throughput.
For further information on displaying and modifying VVR ports please see the 'vrport' command's man page.