Analysis of the crash dump showed a backtrace like the one below, with MountAgent as the faulting process (the CFSMount agent is also subject to the same issue):
crash> bt
PID: 7280 TASK: ffff9d2652748000 CPU: 15 COMMAND: "MountAgent"
#0 [ffffb3d00f7eb488] machine_kexec at ffffffffae46156e
#1 [ffffb3d00f7eb4e0] __crash_kexec at ffffffffae58f99d
#2 [ffffb3d00f7eb5a8] crash_kexec at ffffffffae59088d
#3 [ffffb3d00f7eb5c0] oops_end at ffffffffae42434d
#4 [ffffb3d00f7eb5e0] no_context at ffffffffae47262f
#5 [ffffb3d00f7eb638] __bad_area_nosemaphore at ffffffffae47298c
#6 [ffffb3d00f7eb680] do_page_fault at ffffffffae473267
#7 [ffffb3d00f7eb6b0] page_fault at ffffffffaee010fe
[exception RIP: d_path+52]
RIP: ffffffffae753504 RSP: ffffb3d00f7eb760 RFLAGS: 00010286
RAX: ffff9d11006f1000 RBX: ffffb3d00f7eb7b8 RCX: 0000000000000301
RDX: 0000000000001000 RSI: ffff9d11006f0000 RDI: 0000000000000000
RBP: ffffb3d00f7eb790 R8: 0000000000000000 R9: 000000000166f25f
R10: 0000000000000000 R11: 0000000000000100 R12: ffffb3d00f7eb970
R13: 0000000000000000 R14: ffff9d11006f0000 R15: ffffffffc08631a0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffffb3d00f7eb798] amf_plat_fs_verify at ffffffffc0847e38 [amf]
#9 [ffffb3d00f7eb8b8] amf_ev_fsoff_verify at ffffffffc083eb74 [amf]
#10 [ffffb3d00f7eb8d0] amf_event_reg at ffffffffc08354ee [amf]
#11 [ffffb3d00f7eb920] amfioctl at ffffffffc084d754 [amf]
#12 [ffffb3d00f7ebe68] amf_ioctl at ffffffffc0843c1c [amf]
#13 [ffffb3d00f7ebe80] do_vfs_ioctl at ffffffffae72dfe4
#14 [ffffb3d00f7ebef8] ksys_ioctl at ffffffffae72e620
#15 [ffffb3d00f7ebf30] __x64_sys_ioctl at ffffffffae72e666
#16 [ffffb3d00f7ebf38] do_syscall_64 at ffffffffae40420b
#17 [ffffb3d00f7ebf50] entry_SYSCALL_64_after_hwframe at ffffffffaee000ad
RIP: 00007f4b5578a62b RSP: 00007f4b548f4e48 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4b5578a62b
RDX: 00007f4b40003ae0 RSI: 0000000046c0af04 RDI: 000000000000000c
RBP: 000000000000000c R8: 0000000000000000 R9: 00007f4b40000080
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 000000000040a21a R14: 00007f4b548f6b90 R15: 00007f4b40003ae0
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
The panic is caused by changes to related kernel structures in Red Hat Enterprise Linux (RHEL) 8.4. These changes affect the Cluster Server (VCS) Asynchronous Monitoring Framework (AMF), which agents use to receive instant notification of resource state changes instead of relying on traditional VCS periodic monitoring.
On RHEL 8.4, AMF causes a node to crash when agents register with it.
For Red Hat 8.4 support with InfoScale, the following patch is required:
To avoid the AMF-related panic, the following fix also needs to be applied:
However, any download of the 7.4.2.1800 patch made on or after 13 July 2021 is a refreshed version that already addresses the AMF-related panic:
InfoScale 7.4.1 is also affected if the InfoScale 7.4.1.2800 patch for RHEL 8.4 support has been applied. An AMF fix for this release is in progress and will be included in the forthcoming 7.4.1 Update 5 patch, due on 23 July.
If patching is not an option, a workaround exists: to prevent a system panic, disable the AMF mechanism for the Mount and CFSMount agents.
Run the following command before upgrading the operating system:
# haimfconfig -disable -agent Mount CFSMount
This command disables the AMF mechanism for the specified agents by changing the Mode value to 0 for each agent and for all the associated resources whose Mode values were overridden.
- If VCS is running, the command prompts the user to confirm whether to make the configuration changes persistent. If "No" is specified, the command exits. If "Yes" is specified, it disables the AMF mechanism and saves the update to the configuration by using the "haconf -dump -makero" command.
- If VCS is not running, the Mode value for the agents is modified in the VCS configuration file. Before it makes any changes to configuration files, the command prompts for confirmation. If "No" is specified, the command exits. If "Yes" is specified, the configuration file is updated.
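The behavior described above can be modeled as a small sketch. This is a hypothetical illustration of the logic, not the implementation of haimfconfig: the AgentType class, the disable_amf and effective_mode helpers, and the resource names are invented for the example. Only "Mode = 0 disables AMF" and the confirm/persist flow come from the text; the nonzero starting Mode values are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentType:
    # Hypothetical model: type-level Mode value (0 = AMF disabled).
    imf_mode: int
    # Resources that override the type-level Mode: resource name -> Mode.
    resource_overrides: Dict[str, int] = field(default_factory=dict)


def disable_amf(types: Dict[str, AgentType], agents: List[str],
                vcs_running: bool, confirm: bool) -> str:
    """Model of 'haimfconfig -disable -agent ...' (illustrative only)."""
    if not confirm:
        # User answered "No": exit without touching the configuration.
        return "exited without changes"
    for name in agents:
        agent = types[name]
        agent.imf_mode = 0
        # Only resources whose Mode was overridden need their own update.
        agent.resource_overrides = {res: 0 for res in agent.resource_overrides}
    if vcs_running:
        return "saved with: haconf -dump -makero"
    return "configuration file updated"


def effective_mode(agent: AgentType, resource: str) -> int:
    # A resource uses its override if present, else inherits the type value.
    return agent.resource_overrides.get(resource, agent.imf_mode)


types = {
    "Mount": AgentType(imf_mode=3, resource_overrides={"mnt_data": 2}),
    "CFSMount": AgentType(imf_mode=3),
}
print(disable_amf(types, ["Mount", "CFSMount"], vcs_running=True, confirm=True))
print(effective_mode(types["Mount"], "mnt_data"))  # 0 (override reset)
print(effective_mode(types["Mount"], "mnt_logs"))  # 0 (inherited from type)
```

The sketch shows why overridden resources must be updated separately: a resource without an override inherits the type-level value, but an override would otherwise keep AMF enabled for that resource.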
Disabling AMF for these agents makes them fall back to periodic monitoring (every 60 seconds by default), so the high availability of the application is not affected.
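The trade-off of the periodic fallback can be illustrated with simple arithmetic: with AMF, a state change is reported almost immediately, while with periodic monitoring it is seen at the next monitor cycle. The sketch below is a simplified model under stated assumptions (cycles start at t = 0, run every 60 seconds as in the default above, and take no time); detection_delay is a hypothetical helper, not a VCS function.

```python
def detection_delay(fault_time: float, interval: float = 60.0) -> float:
    """Seconds until the next periodic monitor cycle observes a fault.

    Assumes cycles run at t = 0, interval, 2*interval, ... and take no
    time; a fault exactly on a cycle boundary is seen by that cycle.
    """
    remainder = fault_time % interval
    return 0.0 if remainder == 0 else interval - remainder


print(detection_delay(125.0))  # 55.0
print(detection_delay(180.0))  # 0.0
```

Under this model, failover still happens, but detection can lag by up to one full interval instead of being near-instant.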
After upgrading to InfoScale patch 7.4.2.1800 for RHEL 8.4 support, the VCS MountAgent process causes a panic