VxFS file system becomes unresponsive while attempting to "grow" a file, using setext

book

Article ID: 100009652

calendar_today

Updated On:

Description

Error Message

On a VCS cluster that is running Oracle, the VCS engine_A.log may contain entries which indicate that the Oracle processes are not responding.

YYYY/MM/DD HH:MM:SS VCS NOTICE V-16-20002-27 Oracle:ora_yxdb:clean:Oracle(resource_name) kill KILL 50528416, 51118120, 51314812, 51380282, 53280990, 54067392, 57737352, 58130542, 62324986, 60293510
YYYY/MM/DD HH:MM:SS VCS NOTICE V-16-20002-210 monitoretting cookie for proc = ora_lmon_, PID = 51511342
<<<  hung process
YYYY/MM/DD HH:MM:SS VCS NOTICE V-16-20002-210 monitoretting cookie for proc = ora_lgwr_, PID = 52101228  <<<
YYYY/MM/DD HH:MM:SS VCS NOTICE V-16-20002-210 monitoretting cookie for proc = ora_smon_, PID = 53346416 <<<
YYYY/MM/DD HH:MM:SS VCS NOTICE V-16-20002-210 monitoretting cookie for proc = ora_dbw0_, PID = 54067392 <<<

Note: The above Oracle processes appear to be waiting on a Global Lock Manager (GLM) lock. The following thread is the GLM lock owner which was looping in the vx_growfile() kernel function.

(0)> f 3739
pvthread+0E9B00 STACK:
[04B520C8].vx_hash2ints+0000A8 ()
[04B90E44].vx_cbuf_hold+000030 ()
[04BDEFC4].vx_brelse+000248 ()
[04B399F0].vx_te_bufrelse+000040 ()
[04B40454].vx_pre_bmapsplit+0003A4 ()
[04B2D2A0].vx_dopreamble+00009C ()
[04B7A8DC].vx_growfile+0004D4 ()           
<<< vx_growfile()
[04CB411C].vx_doreserve+000118 ()
[04CB7974].vx_tran_extset+0005DC ()
[04BBB7E8].vx_extset_msg+0006E8 ()
[04D4E488].vx_cfs_extset+000040 ()
[04CB4D54].vx_extset+0002D4 ()
[04CB7BD4].vx_setext+000190 ()
[04CFBE38].vx_uioctl+0004AC ()
[04C4E720].vx_ioctl+0000D0 ()
[04B5A0C0].vx_ioctl_skey+00004C ()
[0059D2F0]vnop_ioctl+000050 (??, ??, ??, ??, ??, ??)
[00014F50].kernel_add_gate_cstack+000030 ()
[04C4E888].vx_vop_ioctl+00001C ()
[04D513D4].vx_odm_resize@AF15_6+00015C ()
[04D51024].vx_odm_resize+000030 ()
[F1000000C03305C4].odm_vx_resize+000040 ()
[F1000000C032BAAC].odm_resize+0000E8 ()
[F1000000C031A48C].vxodmioctl+00018C ()
[00014D70].hkey_legacy_gate+00004C ()
[0059D2F0]vnop_ioctl+000050 (??, ??, ??, ??, ??, ??)
[005B2D58]vno_ioctl+000178 (??, ??, ??, ??, ??)
[006865D8]common_ioctl+0000F8 (??, ??, ??, ??)
[00003850]ovlya_addr_sc_flih_main+000130 ()

 

 

Additional information from engineering:

Problem:
------------

- A lot of oracle threads got stuck around IRWLOCK, which is owned by vx_growfile() thread.
- vx_growfile() is trying to grow one oracle DB file, and VxFS is trying to allocate some extents to grow the file.
- The extent allocation happenes through multiple calls to vx_bmap_alloc(), because the file system got badly fragmented.
- The TE buffer which containes the requesting offset has already 486 data extent descriptors, so it has space left just for 26 entries.
- At this point, when vx_te_bmap_alloc() allocates the 26th extent, it will cause the buffer to become full.
- Now vx_te_bmap_alloc() is called to allocate the 27th extent, it calls vx_te_bmap_split(), which locates the buffer and find that it is full, so it goes up 1 level to the parent of this TE buffer and starts locking buffers in VX_LOCK_EXCL mode.
- At this time, GLM doesn’t have VX_LOCK_EXCL level cached for the parent TE buffer on this node, because vx_growfile() issues a trylock, which is GLM_WEAK, hence the VX_LOCK_EXCL call for the parent buffer fails with VX_EBMAPLOCK error.
- So vx_te_bmap_split() sets up a preamble routine. This preamble routine is set up such that it would look for 21 extent descriptors.
- The VX_EBMAPLOCK error causes the current transaction to be undone, so the TE buffer still has space left for 26 entries, all the 26th allocated extents are rolled back, note that vx_tranundo happens before the preamble routine.
- In vx_grow(), we then call the preamble routine that was set up in vx_te_bmap_split().
- The preamble routine vx_pre_bmapsplit() calls vx_te_bmap_split(). In vx_te_bmap_split() we locate the TE buffer that would hold the data extents for this grow. Then we find out if the buffer has space for 21 more entries.
- At this point, the buffer still has space for 26 entries, which is larger than 21, so vx_te_bmap_split() thinks the buffer has enough space to hold 21 descriptors, as a result, it locks only this buffer in VX_LOCK_EXCL mode and doesn’t bother to lock its parent buffer in VX_LOCK_EXCL mode, no real split of the bmap tree happen, and it just returns normally.
- After the preamble routine completes, vx_growfile() retries the transaction. It runs into the same VX_EBMAPLOCK failure again and ends up in an infinite loop.

Summary:
--------------
This issue could be seen when the file system is badly fragmented, because that means multiple allocation is need to fulfill one growfile request, and each of these allocations needed to be entered into the indirect extent of the inode. As INDIR extent buffers have a limit to how many DATA extent descriptors they might contain - occasionally INDIR extents need to be split. Splitting requires taking exclusive locks on the parent INDIR extent buffers - they might fail with EBMAPLOCK (unable to get the lock).
When the growfile transaction fails to split the bmap tree with EBMAPLOCK, VxFS sets up the preamble to do the splitting through separate transaction (rather than through the growfile transaction) - so that growfile attempted again does not need to split and can succeed. However, here preamble checks for whether 21 more DATA extent descriptors can be inserted into INDIR extent then only it does the split. But here we've a peculiar situation where split is required after 26 DATA entries are added to the INDIR. In short, pre-amble insufficiently imagines making growfile transaction tried again and again in vain.
 

 

Cause

The vx_growfile() kernel function may run into a loop on a highly fragmented file system, which causes multiple processes to hang. The vx_growfile() routine is invoked through the setext(1) command or its Application Programming Interface (API).

When the vx_growfile() function requires more extents than the provided extent buffer can spare, an VX_EBMAPLOCK error may occur. To handle the error, VxFS cancels the transaction and repeats the same operation again, which creates the loop.

 

Resolution

A supported hotfix has been made available for this issue. Please contact Veritas Technical Support to obtain this fix. This hotfix has not yet gone through any extensive Q&A testing. Consequently, if you are not adversely affected by this problem and have a satisfactory temporary workaround in place, we recommend that you wait for the public release of this hotfix.

As a workaround, this issue can be largely avoided by periodically defragmenting the file system. Visit the following article for information about how to check and resolve file system fragmentation:

https://supportinfoscale.cloud.com/support-home/kbsearch/article?articleNumber=100024882&articleTitle=using-fsadm-to-measure-and-optimize-extent-and-directory-fragmentation&

 

Issue/Introduction

If VxFS file system is heavily fragmented, it may become unresponsive while attempting to grow a file, using setext. This may occur if setext is executed manually, or via the setext API by database software, if auto-grow is enabled (the autoextend attribute on an Oracle datafile). In particular, daemons may be hung on the vx_growfile() kernel function.

 

Additional Information

ETrack: 3189562 ETrack: 3189825