VxFS: vxresize operations results in system freeze for 5-10 minutes causing application hangs and VCS timeouts for CFS file systems

Description

Error Message

Cause

As part of the vxresize operation, the "fsadm grow" performs its operations as follows:

It issues a sync IOCTL against the mount point to flush any dirty cached data for the filesystem
Till this point, file system is accessible.
Once done it issues an IOCTL to freeze the filesystem and update the metadata
During the freeze window, the file system is not accessible from outside so you will observe "ls/df" hang scenario till the time file system is frozen
Once the metadata change is done, the file system is unfrozen and becomes accessible again

Analysis

Workload : 600 threads are doing appending writes on 600 different files across the same file system (workload is running from one node in a two node CFS cluster environment)

The STAP utility is used to collect "time taken by" vx_resize() and vx_freeze. An instrument "fsadm" binary was also created to collect the total time taken by fsadm throughout the various stages.

Jan 24 08:45:58 node fsadm: vxfs debug: fsadm before resize took 189 seconds <<<< this is the time taken by fsync (before freezing FS)
Jan 24 08:49:09 node fsadm: vxfs debug: fsadm->resize completed in 380 seconds <<<< Total time taken by fsadm grow

# date; time /opt/VRTS/bin/vxresize -g testdg vol1 +2T
Fri Jan 24 08:42:48 GMT 2020

real 6m21.325s <<< Total time taken by vxresize
user 0m0.013s
sys 0m0.260s

Function Total time(seconds) No of calls
------------------------------------------------------------
vx_resize 190 1
vx_cfs_resize 190 1
vx_freeze 189 1

So, in 381 seconds of the "vxresize" operation, fsync itself took 189 seconds to flush all the dirty data. The file system was frozen for only 189 seconds (i.e for 3 minutes) and this is the time window where the user would observe the "application IO freeze".

One thing that stood out is the amount of time the file system freeze operation took to complete the resize operation. From the analysis and further troubleshooting we were able understand where the time was spent and forced the flushing all the dirty pages before the initiating freeze operation. This reduced the dirty data in the system memory and improved the freeze process during the vxresize activity window.

The metasave collected from customer's environment was replayed on top of a cache volume. The vxresize with no workload against the file system is taking around 9 mins.

Size of Metasave : ~180GB

Actual FS size : 200 TB

Metasave has been replayed on a 200TB cache volume (created on top of 5TB of physical volume)

After replaying the metasave, the file system is mounted as a Clustered VxFS file system on both nodes in the CVm/CFS cluster, the vxresize (grow) is then executed from the CFS Primary node.

# date; strace -fFrtT -o resize_strace_out /opt/VRTS/bin/fsadm -b +1m /mnt1
Fri Jan 31 09:12:36 GMT 2020
UX:vxfs fsadm: INFO: V-3-25942: /dev/vx/rdsk/testdg/cachevol size increased from 429495426672 sectors to 429495428720 sectors

20496 0.000642 ioctl(3, _IOC(_IOC_WRITE, 0x46, 0x80, 0x1658), 0x7fff591720a0) = 0 <493.894782>
20496 493.894963 write(1, "UX:vxfs fsadm: INFO: V-3-25942: "..., 126) = 126 <0.000019> <<<<<<< Total time taken by fsadm command for growing the Filesystem

NOTE: Engineering wrote a customized STAP utility to find out the actual time taken by multiple function calls.

Function	Total time (seconds)	No. of calls
vx_resize	494	1
vx_cfs_resize	494	1
vx_get_dele	263	819204
vx_resize_mapupd	202	1
vx_fs_reinit	27	1
vx_freeze	1	1
vx_thaw	0	1

During the file system grow operation, calls to get the delegation of all the allocation units (there are 819204 allocation units in the concerned file system) are made and it is taking around ~50% of the total time consumed. Another function that is of interest is : vx_resize_mapupd that is called to update the extent maps for the resize request. This function is also taking around ~40% of the total time taken.

Engineering focused on the above mentioned functions to see how the performance could be improved by reducing the time required by these two functions.

Resolution

Veritas engineering are working on a potential product enhancement to handle the resizing of larger CFS file systems.

The second action will involve having better logging during the freeze operation, so we know where the time is being taken during freeze operation and it will make debugging the issue easier. We are going to track this as part of our supportability enhancements.

Issue/Introduction

When resizing a Clustered Veritas File Systems, the system may freeze for extended periods of time, resulting in application hangs and VCS (Veritas Cluster Server) timeouts.

Example

The I/O against a 10Tb mount-point froze for 5 minutes whilst increasing the size of the file system with the vxresize command.

Symptoms:
A CVM/CFS clustered 4-node environment has multiple CFS mounts
1. Whilst performing the vxresize (on the CVM master), the "df" command hangs (strace reveals that the vxresize is running)

2. No read or writes could be performed against the mount-point

3. ls <mountpoint> operation also hangs until the resize operation is complete

4. Cluster can't monitor related resources and initiates clean routine for mounts

5. All four nodes are impacted

Welcome to "KB Articles"