The problem was caused by the etrack incident listed in the Supplemental Material section of this article. The following is a description of the problem.
SYMPTOM:
The system panics when Control-C is pressed while aio-stress is running.
DESCRIPTION:
The test program uses POSIX threads, which share the same mm_struct in the kernel. The function exit_aio() is (correctly) called only when the last pthread exits, from mmput() in ./kernel/fork.c:
if (atomic_dec_and_test(&mm->mm_users)) {
exit_aio(mm);
...
Therefore, other pthreads that have submitted AIO can exit without waiting for their I/O to complete. Because there is no synchronisation between the exiting pthreads and VxFS, vx_naio_do_work() can dereference an exited thread:
fsizelim = VX_GETU_RLIMIT_FSIZE_TASK(nwip->nwi_tsk);
which causes the panic. If the race were a little slower, the panic would instead occur further down, in VxFS's uiomove code.
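As a rough illustration of why only the last exiting pthread triggers exit_aio(), the following is a minimal userspace sketch in C of the reference-count pattern shown in the mmput() excerpt. The names model_mm, model_exit_aio and model_mmput are hypothetical stand-ins chosen for this example; this is not kernel or VxFS code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's mm_struct. */
struct model_mm {
    atomic_int mm_users;   /* number of threads sharing the address space */
    bool aio_torn_down;    /* set once the model's exit_aio() step has run */
};

/* Stand-in for exit_aio(); here it only records that it was called. */
static void model_exit_aio(struct model_mm *mm)
{
    mm->aio_torn_down = true;
}

/*
 * Same shape as the mmput() excerpt: every exiting thread drops one
 * reference, but only the thread that drops the last one runs the AIO
 * teardown. Returns true if this call ran the teardown.
 */
static bool model_mmput(struct model_mm *mm)
{
    int prev = atomic_fetch_sub(&mm->mm_users, 1);

    assert(prev > 0);      /* dropping a reference that was never held */
    if (prev == 1) {
        model_exit_aio(mm);
        return true;
    }
    return false;
}
```

With three threads sharing one model_mm, the first two calls to model_mmput() return without any teardown, which corresponds to the window in which an exited pthread can still have I/O in flight.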
The correct way to handle this is for the I/O to take a hold on the mm_struct (by incrementing mm->mm_users). However, mmput() is exported as GPL-only (EXPORT_SYMBOL_GPL(mmput)), so VxFS could not drop the hold.
RESOLUTION:
The fix uses two fields in the task structure: an exit hook (->tux_exit) that is called when a thread exits, regardless of whether other pthreads (that is, threads cloned with CLONE_VM) still share the mm_struct, and a counter (->tux_info) of the number of outstanding I/Os against the thread.
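The interaction of the two fields can be sketched in userspace C as follows. This is only an illustration under stated assumptions: the article does not say what the exit hook does, so the sketch assumes it blocks until the outstanding-I/O counter reaches zero. The type model_task and all model_* functions are hypothetical names and do not reflect the actual VxFS implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the two task-structure fields. */
struct model_task {
    pthread_mutex_t lock;
    pthread_cond_t  idle;                    /* signalled when the count hits 0 */
    int outstanding_io;                      /* plays the role of ->tux_info */
    void (*exit_hook)(struct model_task *);  /* plays the role of ->tux_exit */
};

/* Called when an I/O is submitted on behalf of the task. */
static void model_io_start(struct model_task *t)
{
    pthread_mutex_lock(&t->lock);
    t->outstanding_io++;
    pthread_mutex_unlock(&t->lock);
}

/* Called when an I/O completes; wakes a waiting exit hook on the last one. */
static void model_io_done(struct model_task *t)
{
    pthread_mutex_lock(&t->lock);
    assert(t->outstanding_io > 0);
    if (--t->outstanding_io == 0)
        pthread_cond_broadcast(&t->idle);
    pthread_mutex_unlock(&t->lock);
}

/* Assumed hook behaviour: block until no I/O still refers to the task. */
static void model_wait_for_io(struct model_task *t)
{
    pthread_mutex_lock(&t->lock);
    while (t->outstanding_io > 0)
        pthread_cond_wait(&t->idle, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

static void model_task_init(struct model_task *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->idle, NULL);
    t->outstanding_io = 0;
    t->exit_hook = model_wait_for_io;
}

/* Runs on every thread exit, not only on the exit of the last thread. */
static void model_task_exit(struct model_task *t)
{
    if (t->exit_hook)
        t->exit_hook(t);
}
```

In this model a thread that calls model_task_exit() while I/O is still pending waits until a worker has called model_io_done() for each request, so the worker never touches a task that has already gone away.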
Please upgrade to Veritas Storage Foundation 5.1SP1 to fix the problem.
The required patch can be downloaded from the Veritas Services and Operations Readiness Tools (SORT) website:
https://sort.Veritas.com/patch/matrix
Applies To
The problem affects only VxFS running on the Linux platform. It does not affect other platforms.