- Commands such as ls, cd, and mv may not respond.
- VCS monitor scripts may time out or fail when commands are unresponsive.
This issue is tracked via etrack incident # 2878164.
When VxFS frees memory and pre-translations from the heap, the call may happen in interrupt context, where a kernel extension is not allowed to call services such as xmfree() and xlate_remove(). VxFS therefore hangs these structures off per-CPU data to be freed later (during subsequent allocations). This deferral causes VxFS heap consumption to pile up.
Below is sample information from a snap captured at the time of the hang:
VMM Memory Limits:
Total available memory (4K frames) : 00E80000 58.0GB
Total unmanaged mem (wlm_hw_pages): 0006D300 1.8GB
4K number of frames : 00064130 1.6GB
4K frames pinned : 0004B40B 1.2GB
4K system pinnable frames remaining: 00004CE9 77.0MB ==> Very little leftover for 4K frames
4K user pinnable frames remaining : 0003F7C4 1015.8MB
64K number of frames : 0004CBBD 19.2GB
64K frames pinned : 0003A527 14.6GB
64K system pinnable frames remaining: 0000310A 784.7MB ==> Most of the 64K memory is used and very little is left over
64K user pinnable frames remaining : 0003BB50 15.0GB
16M number of frames : 000008E3 35.6GB
16M frames pinned : 000008E3 35.6GB
16M system pinnable frames remaining: 00000000 0.0MB ==> Pinned memory for 16M frames exhausted
16M user pinnable frames remaining : 00000000 0.0MB
16G number of frames : 00000000 0.0MB
16G frames pinned : 00000000 0.0MB
16G system pinnable frames remaining: 00000000 0.0MB
16G user pinnable frames remaining : 00000000 0.0MB
Free paging space (in 4K blocks) : 00256102 9.4GB
Paging space SIGDANGER level : 00015000 336.0MB
Paging space SIGKILL level : 00005400 84.0MB
(0)> xm -lu > xm_lu.out
(0)>
(0)> !grep -i vx xm_lu.out
0000000013EF12F8 10354 4872E24 .vx_alloc+0000A4
00000000001A0000 13 4872E24 .vx_alloc+0000A4
0000000000040000 2 4872E24 .vx_alloc+0000A4
0000000031AE0000 6359 4872E24 .vx_alloc+0000A4
000000007D520000 16041 4872E24 .vx_alloc+0000A4
00000000299C0000 5326 4872E24 .vx_alloc+0000A4
000000000144A920 249 4872E24 .vx_alloc+0000A4
0000000011834214 1480889 4872E24 .vx_alloc+0000A4
00000000002CC400 14457 4872E24 .vx_alloc+0000A4
(0)>
(0)> hcal
0000000013EF12F8+00000000001A0000+0000000000040000+000000007D520000+00000000299C0000+000000000144A920+0000000011834214+00000000002CC400
Value hexa: CDEFC22C Value decimal: 3455042092
(0)> dcal 3455042092/1024/1024
Value decimal: 3294 Value hexa: 00000CDE
(0)>
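The hcal/dcal arithmetic in the session above can be reproduced outside kdb. The following is a minimal sketch in Python, assuming (as the session does) that the first column of the xm -lu output is a size in bytes; the names VX_ALLOC_SIZES, HCAL_OPERANDS and total_bytes are illustrative and not part of any Veritas or AIX tool.

```python
# Sizes (hex) of the .vx_alloc entries, copied from the `xm -lu` output above.
# Assumption: the first column is a byte count, as the session treats it.
VX_ALLOC_SIZES = [
    "0000000013EF12F8",
    "00000000001A0000",
    "0000000000040000",
    "0000000031AE0000",
    "000000007D520000",
    "00000000299C0000",
    "000000000144A920",
    "0000000011834214",
    "00000000002CC400",
]

MIB = 1024 * 1024


def total_bytes(hex_sizes):
    """Sum a list of hexadecimal size strings and return the total as an int."""
    return sum(int(size, 16) for size in hex_sizes)


# The hcal expression as typed in the session has eight operands;
# the entry 0000000031AE0000 does not appear in it.
HCAL_OPERANDS = [s for s in VX_ALLOC_SIZES if s != "0000000031AE0000"]

hcal_total = total_bytes(HCAL_OPERANDS)
print(hex(hcal_total), hcal_total, hcal_total // MIB)  # 0xcdefc22c 3455042092 3294

# Summing all nine listed entries instead of the eight used by hcal.
all_total = total_bytes(VX_ALLOC_SIZES)
print(hex(all_total), all_total // MIB)
```

The first print matches the hcal and dcal results shown in the session (about 3294 MB). Summing all nine listed entries gives a somewhat larger figure, roughly 4089 MB; either way the VxFS heap consumption is in the multi-gigabyte range.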
The issue is addressed via a VxFS code change. VxFS now has separate worker threads that release the consumed heap. During a free, if the call happens to be in interrupt context, the allocated structure is placed on a work item queue, where the corresponding worker thread picks it up and releases it.
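The hand-off pattern described here can be illustrated with a small user-space model. This is only a conceptual sketch in Python, not VxFS code: the class DeferredFreeQueue, its methods, and the in_interrupt flag are all hypothetical, and appending to a list stands in for the real release calls (xmfree() and xlate_remove()).

```python
import queue
import threading


class DeferredFreeQueue:
    """Toy model: frees requested in "interrupt context" go to a worker thread."""

    def __init__(self):
        self._work = queue.Queue()      # stands in for the work item queue
        self._lock = threading.Lock()
        self.freed = []                 # records items that were released
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def free(self, item, in_interrupt):
        if in_interrupt:
            # Releasing is not allowed here, so hand the item to the worker.
            self._work.put(item)
        else:
            self._release(item)

    def _release(self, item):
        # Placeholder for the actual release of the structure.
        with self._lock:
            self.freed.append(item)

    def _run(self):
        while True:
            item = self._work.get()
            try:
                if item is None:        # sentinel: stop the worker
                    return
                self._release(item)
            finally:
                self._work.task_done()

    def drain(self):
        """Block until every queued item has been released."""
        self._work.join()

    def stop(self):
        self._work.put(None)
        self._worker.join()


q = DeferredFreeQueue()
q.free("buf-a", in_interrupt=False)   # released immediately by the caller
q.free("buf-b", in_interrupt=True)    # queued, released by the worker
q.drain()
print(sorted(q.freed))
q.stop()
```

The point of the model is that queued items are released promptly by a dedicated thread instead of waiting for a later allocation on the same CPU, which is what allowed the deferred structures to accumulate before the fix.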
The fix is available in the following VxFS patch releases:
VxFS 5.1SP1
AIX 5.3, 6.1 - 5.1SP1RP4 https://sort.Veritas.com/patch/detail/7886
AIX 7.1 - 5.1SP1PR1RP4 https://sort.Veritas.com/patch/detail/7887
VxFS 6.0.3
AIX - 6.0.3 https://sort.Veritas.com/patch/detail/6996
Applies To
This issue applies to VxFS 5.1 and 6.0 versions running on AIX systems.