After upgrading to InfoScale patch 7.4.2.1800, the VCS MountAgent process causes a system panic

Article ID: 100050773

Description

Error Message

Analysis of the crash dump shows a backtrace like the one below, with MountAgent as the panicking process (the CFSMount agent is subject to the same issue):

crash> bt
PID: 7280   TASK: ffff9d2652748000  CPU: 15  COMMAND: "MountAgent"
 #0 [ffffb3d00f7eb488] machine_kexec at ffffffffae46156e
 #1 [ffffb3d00f7eb4e0] __crash_kexec at ffffffffae58f99d
 #2 [ffffb3d00f7eb5a8] crash_kexec at ffffffffae59088d
 #3 [ffffb3d00f7eb5c0] oops_end at ffffffffae42434d
 #4 [ffffb3d00f7eb5e0] no_context at ffffffffae47262f
 #5 [ffffb3d00f7eb638] __bad_area_nosemaphore at ffffffffae47298c
 #6 [ffffb3d00f7eb680] do_page_fault at ffffffffae473267
 #7 [ffffb3d00f7eb6b0] page_fault at ffffffffaee010fe
    [exception RIP: d_path+52]
    RIP: ffffffffae753504  RSP: ffffb3d00f7eb760  RFLAGS: 00010286
    RAX: ffff9d11006f1000  RBX: ffffb3d00f7eb7b8  RCX: 0000000000000301
    RDX: 0000000000001000  RSI: ffff9d11006f0000  RDI: 0000000000000000
    RBP: ffffb3d00f7eb790   R8: 0000000000000000   R9: 000000000166f25f
    R10: 0000000000000000  R11: 0000000000000100  R12: ffffb3d00f7eb970
    R13: 0000000000000000  R14: ffff9d11006f0000  R15: ffffffffc08631a0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffb3d00f7eb798] amf_plat_fs_verify at ffffffffc0847e38 [amf]
 #9 [ffffb3d00f7eb8b8] amf_ev_fsoff_verify at ffffffffc083eb74 [amf]
#10 [ffffb3d00f7eb8d0] amf_event_reg at ffffffffc08354ee [amf]
#11 [ffffb3d00f7eb920] amfioctl at ffffffffc084d754 [amf]
#12 [ffffb3d00f7ebe68] amf_ioctl at ffffffffc0843c1c [amf]
#13 [ffffb3d00f7ebe80] do_vfs_ioctl at ffffffffae72dfe4
#14 [ffffb3d00f7ebef8] ksys_ioctl at ffffffffae72e620
#15 [ffffb3d00f7ebf30] __x64_sys_ioctl at ffffffffae72e666
#16 [ffffb3d00f7ebf38] do_syscall_64 at ffffffffae40420b
#17 [ffffb3d00f7ebf50] entry_SYSCALL_64_after_hwframe at ffffffffaee000ad
    RIP: 00007f4b5578a62b  RSP: 00007f4b548f4e48  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000001  RCX: 00007f4b5578a62b
    RDX: 00007f4b40003ae0  RSI: 0000000046c0af04  RDI: 000000000000000c
    RBP: 000000000000000c   R8: 0000000000000000   R9: 00007f4b40000080
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000001
    R13: 000000000040a21a  R14: 00007f4b548f6b90  R15: 00007f4b40003ae0
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
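
To compare another crash dump against this signature, the two distinguishing features of the backtrace above (a fault in d_path, reached from amf_plat_fs_verify in the amf module) can be checked in a saved copy of the `bt` output. The following is a minimal sketch; the function name and the idea of saving the backtrace to a text file are assumptions made for illustration, not part of any InfoScale tooling.

```shell
# Return 0 if a saved "bt" output matches the signature in this article:
# a page fault in d_path called from amf_plat_fs_verify in the amf module.
# Return 1 if it does not match, 2 if the file cannot be read.
# The function name and file argument are illustrative assumptions.
matches_amf_panic() {
    bt_file="$1"
    [ -r "$bt_file" ] || return 2
    grep -q 'exception RIP: d_path' "$bt_file" &&
        grep -q 'amf_plat_fs_verify .*\[amf\]' "$bt_file"
}
```

For example, `matches_amf_panic /tmp/bt.txt && echo "matches the AMF panic signature"`, where /tmp/bt.txt is a hypothetical file holding the backtrace. A match is only an indication; it does not replace a full analysis of the dump.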

 

Cause

The panic is caused by changes to kernel structures in Red Hat Enterprise Linux (RHEL) 8.4. These changes affect Cluster Server's Asynchronous Monitoring Framework (AMF), which agents use to get instant notifications (as opposed to traditional VCS monitoring).

On RHEL 8.4, AMF causes a node to crash when agents register with it.
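
Because the problem is specific to RHEL 8.4, it can help to confirm the operating system release on each node before deciding which fix applies. The sketch below assumes the standard /etc/os-release file is present and uses a hypothetical function name; it is one possible check, not a Veritas-provided procedure.

```shell
# Return 0 if the given os-release file describes RHEL 8.4,
# 1 if it describes something else, 2 if the file cannot be read.
# Defaults to /etc/os-release; the function name is an illustrative assumption.
is_rhel_8_4() {
    os_release="${1:-/etc/os-release}"
    [ -r "$os_release" ] || return 2
    # Source the file in a subshell so its variables do not leak into the caller.
    (
        . "$os_release"
        [ "$ID" = "rhel" ] && [ "$VERSION_ID" = "8.4" ]
    )
}
```

For example, `is_rhel_8_4 && echo "This node runs RHEL 8.4"`.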
 

Resolution

For RHEL 8.4 support with InfoScale, the following patch is required:

https://downloads.infoscale.com/infoscale/REL600675/7.4.2.1800?q=UPD559557&fileNumber=FILE261432&updateNumber=UPD559557 

 

To avoid the AMF-related panic, the following fix also needs to be applied:

https://downloads.infoscale.com/infoscale/REL600675/7.4.2.1900?q=UPD206370&fileNumber=FILE741279&updateNumber=UPD206370 

 

However, any copy of the 7.4.2.1800 patch downloaded on or after 13 July 2021 is a refreshed patch that already addresses the AMF-related panic:

https://downloads.infoscale.com/infoscale/REL600675/7.4.2.1800?q=UPD559557&fileNumber=FILE261432&updateNumber=UPD559557 

InfoScale 7.4.1 is also affected if the 7.4.1.2800 patch for RHEL 8.4 support has been applied. Work is ongoing on an AMF fix for this release. The issue will be fixed in the forthcoming 7.4.1 Update 5 patch, due on 23 July.

 

If patching is not an option, a workaround exists. To prevent a system panic, disable the AMF mechanism for the Mount and the CFSMount agents.

Run the following command before upgrading the operating system:

# haimfconfig -disable -agent Mount CFSMount

This command disables the AMF mechanism for the specified agents by changing the Mode value to 0 for each agent and for all the associated resources whose Mode values were overridden.

- If VCS is running, the command prompts the user to confirm whether to make the configuration changes persistent. If "No" is specified, the command exits. If "Yes" is specified, it disables the AMF mechanism and saves the update to the configuration by using the "haconf -dump -makero" command.

- If VCS is not running, the Mode value for the agents is modified in the VCS configuration file. Before it makes any changes to configuration files, the command prompts for confirmation. If "No" is specified, the command exits. If "Yes" is specified, the configuration file is updated.

Disabling AMF for these agents causes them to fall back to periodic monitoring (every 60 seconds by default), so the high availability of the application is not affected.
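
After the workaround has been applied, the IMF attribute of both agent types can be inspected to confirm that Mode is now 0. The sketch below assumes that VCS is running and that the standard VCS `hatype` command is on the PATH; the function name is hypothetical, and the exact output layout may differ between releases.

```shell
# Print the IMF attribute for the Mount and CFSMount agent types.
# Assumes VCS is running and the VCS hatype command is on PATH;
# the function name is an illustrative assumption.
show_imf_settings() {
    if ! command -v hatype >/dev/null 2>&1; then
        echo "hatype not found; is VCS installed and on PATH?" >&2
        return 127
    fi
    for agent_type in Mount CFSMount; do
        hatype -display "$agent_type" -attribute IMF
    done
}
```

Run `show_imf_settings` on a cluster node and check that the Mode key of the IMF attribute reads 0 for both Mount and CFSMount.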

Issue/Introduction

After upgrading to InfoScale patch 7.4.2.1800 for RHEL 8.4 support, the VCS MountAgent process causes a system panic.