Panic stack:
DUMPFILE: ./vmcore [PARTIAL DUMP]
CPUS: 64
DATE: Mon Jul 4 06:59:56 2022
UPTIME: 07:29:56
LOAD AVERAGE: 0.15, 0.08, 0.06
TASKS: 1999
NODENAME: XXXXX
RELEASE: 4.18.0-305.19.1.el8_4.x86_64
VERSION: #1 SMP Tue Sep 7 07:07:31 EDT 2021
MACHINE: x86_64 (2900 Mhz)
MEMORY: 127.6 GB
PANIC: "Kernel panic - not syncing: VxVM vxio V-5-3-2090 bulk_cleanup_verification: bad upd"
PID: 7693
COMMAND: "vxiod"
TASK: ffff8ebcfc659ec0 [THREAD_INFO: ffff8ebcfc659ec0]
CPU: 59
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 7693 TASK: ffff8ebcfc659ec0 CPU: 59 COMMAND: "vxiod"
#0 [ffffa2738b70bc68] machine_kexec at ffffffffb726156e
#1 [ffffa2738b70bcc0] __crash_kexec at ffffffffb738faad
#2 [ffffa2738b70bd88] panic at ffffffffb72e0df7
#3 [ffffa2738b70be78] volrv_seclog_write1_done at ffffffffc38d7c77 [vxio]
#4 [ffffa2738b70bea0] voliod_iohandle at ffffffffc3748c2b [vxio]
#5 [ffffa2738b70bee0] voliod_loop at ffffffffc3748e62 [vxio]
#6 [ffffa2738b70bf10] kthread at ffffffffb73043d6
#7 [ffffa2738b70bf50] ret_from_fork at ffffffffb7c0023f
The following messages are seen in dmesg or the messages log:
[26996.239215] VxVM VVR vxio V-5-3-2208 nmcom hdr_magic mismatch
[26996.239217] VxVM VVR vxio V-5-0-855 Disconnecting rlink prod1_to_chn due to stream error.
[26996.239772] VxVM VVR vxio V-5-3-1076 inconsistent update ids
[26996.239773] Kernel panic - not syncing: VxVM vxio V-5-3-2090 bulk_cleanup_verification: bad upd
The root cause has not yet been identified and is being investigated under defect STESC-7058. So far, the issue has been seen on RHEL 7 and RHEL 8 in releases 7.4.1, 7.4.2 and 8.0.
As a workaround, disable bulk transfer on both the Primary and Secondary clusters.
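Example (pause replication, set the vol_rv_bulk_transfer tunable to 0 on each host, resume replication, then confirm that the bulktransfer flag is no longer listed):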
[root@server101 ~]# vradmin -g datadg pauserep datadg-rvg
[root@server101 ~]# vxprint -Vl | grep flags
flags: closed primary enabled attached bulktransfer dcm_in_dco
[root@server101 ~]# vxtune vol_rv_bulk_transfer
Tunable Current Value Default Value Reboot Clusterwide
------------------------------ ----------- ------------ ------ -----------
vol_rv_bulk_transfer 1 1 N N
[root@server101 ~]# vxtune vol_rv_bulk_transfer 0
[root@server102 ~]# vxtune vol_rv_bulk_transfer 0
[root@server101 ~]# vradmin -g datadg resumerep datadg-rvg
[root@server101 ~]# vxtune vol_rv_bulk_transfer
Tunable Current Value Default Value Reboot Clusterwide
------------------------------ ----------- ------------ ------ -----------
vol_rv_bulk_transfer 0 1 N N
[root@server101 ~]# vxprint -Vl | grep flags
flags: closed primary enabled attached dcm_in_dco
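
The two checks in the example above (the Current Value column reported by vxtune and the flags line reported by vxprint) could be scripted roughly as follows. This is only a sketch: the helper names bulk_transfer_value and bulk_transfer_flag_cleared are hypothetical, not part of any Veritas tooling, and the parsing assumes the output layout shown in the example.

```shell
#!/bin/sh
# Hypothetical helper (not a Veritas command): read "vxtune vol_rv_bulk_transfer"
# output on stdin and print the Current Value column for the tunable.
# Assumes the column layout shown in the example above.
bulk_transfer_value() {
    awk '$1 == "vol_rv_bulk_transfer" { print $2 }'
}

# Hypothetical helper: read "vxprint -Vl" output on stdin and succeed only if
# a flags line is present and does not list the bulktransfer flag.
bulk_transfer_flag_cleared() {
    awk '/flags:/ {
             seen = 1
             for (i = 2; i <= NF; i++) if ($i == "bulktransfer") found = 1
         }
         END { exit (seen && !found) ? 0 : 1 }'
}

# Intended usage on a host where the workaround was applied:
#   vxtune vol_rv_bulk_transfer | bulk_transfer_value     # expect 0
#   vxprint -Vl | bulk_transfer_flag_cleared && echo "bulktransfer flag cleared"
```

Both helpers read only from stdin, so they can be tried against saved command output before being run on a live system. If more than one RVG is listed, the flag check fails when any flags line still shows bulktransfer.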