As part of the vxresize operation, the "fsadm grow" performs its operations as follows:
Analysis
Workload : 600 threads are doing appending writes on 600 different files across the same file system (workload is running from one node in a two node CFS cluster environment)
The STAP utility is used to collect "time taken by" vx_resize() and vx_freeze. An instrument "fsadm" binary was also created to collect the total time taken by fsadm throughout the various stages.
Jan 24 08:45:58 node fsadm: vxfs debug: fsadm before resize took 189 seconds <<<< this is the time taken by fsync (before freezing FS)
Jan 24 08:49:09 node fsadm: vxfs debug: fsadm->resize completed in 380 seconds <<<< Total time taken by fsadm grow
# date; time /opt/VRTS/bin/vxresize -g testdg vol1 +2T
Fri Jan 24 08:42:48 GMT 2020
real 6m21.325s <<< Total time taken by vxresize
user 0m0.013s
sys 0m0.260s
Function Total time(seconds) No of calls
------------------------------------------------------------
vx_resize 190 1
vx_cfs_resize 190 1
vx_freeze 189 1
So, in 381 seconds of the "vxresize" operation, fsync itself took 189 seconds to flush all the dirty data. The file system was frozen for only 189 seconds (i.e for 3 minutes) and this is the time window where the user would observe the "application IO freeze".
One thing that stood out is the amount of time the file system freeze operation took to complete the resize operation. From the analysis and further troubleshooting we were able understand where the time was spent and forced the flushing all the dirty pages before the initiating freeze operation. This reduced the dirty data in the system memory and improved the freeze process during the vxresize activity window.
The metasave collected from customer's environment was replayed on top of a cache volume. The vxresize with no workload against the file system is taking around 9 mins.
Size of Metasave : ~180GB
Actual FS size : 200 TB
Metasave has been replayed on a 200TB cache volume (created on top of 5TB of physical volume)
After replaying the metasave, the file system is mounted as a Clustered VxFS file system on both nodes in the CVm/CFS cluster, the vxresize (grow) is then executed from the CFS Primary node.
# date; strace -fFrtT -o resize_strace_out /opt/VRTS/bin/fsadm -b +1m /mnt1
Fri Jan 31 09:12:36 GMT 2020
UX:vxfs fsadm: INFO: V-3-25942: /dev/vx/rdsk/testdg/cachevol size increased from 429495426672 sectors to 429495428720 sectors
20496 0.000642 ioctl(3, _IOC(_IOC_WRITE, 0x46, 0x80, 0x1658), 0x7fff591720a0) = 0 <493.894782>
20496 493.894963 write(1, "UX:vxfs fsadm: INFO: V-3-25942: "..., 126) = 126 <0.000019> <<<<<<< Total time taken by fsadm command for growing the Filesystem
NOTE: Engineering wrote a customized STAP utility to find out the actual time taken by multiple function calls.
| Function | Total time (seconds) | No. of calls |
|---|---|---|
| vx_resize | 494 | 1 |
| vx_cfs_resize | 494 | 1 |
| vx_get_dele | 263 | 819204 |
| vx_resize_mapupd | 202 | 1 |
| vx_fs_reinit | 27 | 1 |
| vx_freeze | 1 | 1 |
| vx_thaw | 0 | 1 |
During the file system grow operation, calls to get the delegation of all the allocation units (there are 819204 allocation units in the concerned file system) are made and it is taking around ~50% of the total time consumed. Another function that is of interest is : vx_resize_mapupd that is called to update the extent maps for the resize request. This function is also taking around ~40% of the total time taken.
Engineering focused on the above mentioned functions to see how the performance could be improved by reducing the time required by these two functions.
Veritas engineering are working on a potential product enhancement to handle the resizing of larger CFS file systems.
The second action will involve having better logging during the freeze operation, so we know where the time is being taken during freeze operation and it will make debugging the issue easier. We are going to track this as part of our supportability enhancements.