VVR: VxVM 6.2.1 VVR Primary node may panic when sending data (nmcom_throttle_send) to the Secondary node as a result of accessing already freed memory

Article ID: 100045756

Description

Error Message

The node panics with "BUG: unable to handle kernel paging request at ffff88332e8b552c".

KERNEL: vmlinux.2.6.32-696.18.7.el6.x86_64
DUMPFILE: vmcore-190328131922. [PARTIAL DUMP]
CPUS: 12
DATE: Tue Mar 26 18:24:35 2019
UPTIME: 82 days, 00:59:04
LOAD AVERAGE: 0.25, 0.38, 0.36
TASKS: 2129
NODENAME: ###########
RELEASE: 2.6.32-696.18.7.el6.x86_64
VERSION: #1 SMP Thu Dec 28 20:15:47 EST 2017
MACHINE: x86_64 (3491 Mhz)
MEMORY: 64 GB
PANIC: "BUG: unable to handle kernel paging request at ffff88332e8b552c"
PID: 57635
COMMAND: "nmcom-sender"
TASK: ffff8803a50dcab0 [THREAD_INFO: ffff88096e178000]
CPU: 3
STATE: TASK_RUNNING (PANIC)

The backtrace (bt) output for task ffff8803a50dcab0 shows that the "nmcom-sender" kernel thread was running and faulted at nmcom_throttle_send+494.

crash> bt
PID: 57635 TASK: ffff8803a50dcab0 CPU: 3 COMMAND: "nmcom-sender"
#0 [ffff88096e17ba00] machine_kexec at ffffffff8103eb3b
#1 [ffff88096e17ba60] crash_kexec at ffffffff810d2772
#2 [ffff88096e17bb30] oops_end at ffffffff81550570
#3 [ffff88096e17bb60] no_context at ffffffff810515eb
#4 [ffff88096e17bbb0] __bad_area_nosemaphore at ffffffff81051875
#5 [ffff88096e17bc00] bad_area_nosemaphore at ffffffff81051943
#6 [ffff88096e17bc10] __do_page_fault at ffffffff81052100
#7 [ffff88096e17bd30] do_page_fault at ffffffff815524fe
#8 [ffff88096e17bd60] page_fault at ffffffff8154f365
[exception RIP: nmcom_throttle_send+494]
RIP: ffffffffa09fbe5e RSP: ffff88096e17be10 RFLAGS: 00010086
RAX: ffff88332e8b5400 RBX: ffff88096e7a3710 RCX: 000000000000cc83
RDX: 0000000200000001 RSI: 0000000000000246 RDI: ffff88096e7a3710
RBP: ffff88096e17be50 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: ffff88096e7a3400 R14: ffff880bba33d000 R15: ffff880b7a2cd000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff88096e17be58] nmcom_sender at ffffffffa09fc2e2 [vxio]
#10 [ffff88096e17bee8] kthread at ffffffff810a6d0e
#11 [ffff88096e17bf48] kernel_thread at ffffffff81557afa

crash> ps 57635
PID PPID CPU TASK ST %MEM VSZ RSS COMM
> 57635 2 3 ffff8803a50dcab0 RU 0.0 0 0 [nmcom-sender]
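
As a quick check on the register dump above, the faulting address can be compared with the value in RAX. The sketch below only does that subtraction; `fault_offset` is a hypothetical helper name, not part of the crash utility or the vxio module. The result shows the fault lies a small distance above RAX, which is consistent with (but does not by itself prove) a field being read through a pointer held in RAX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: distance of the faulting address from a base register.
 * Both inputs are taken verbatim from the panic message and register dump. */
static uint64_t fault_offset(uint64_t fault_addr, uint64_t base)
{
    assert(fault_addr >= base);   /* only meaningful when the fault is above the base */
    return fault_addr - base;
}
```

Calling `fault_offset(0xffff88332e8b552cULL, 0xffff88332e8b5400ULL)` returns 0x12c (300 bytes).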

Cause

After sending data from the VVR (Veritas Volume Replicator) Primary server to the Secondary server, the sender code accessed fields in memory that had already been released (freed), because the ACK for that data had already been processed.

This is a rare race condition between the sender and ACK processing: the panic occurs when the sender touches the memory after it has been freed.
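
The general pattern behind this kind of failure can be sketched in user-space C. This is an illustration under stated assumptions, not the vxio source and not the actual fix: the names `update_msg`, `throttle_state`, `send_and_ack`, `throttle_send_safe`, and `demo_throttle` are all hypothetical, and the ACK is simulated as completing before the send call returns.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the vxio module. */
struct update_msg {
    uint32_t payload_len;          /* bytes carried by this update */
};

struct throttle_state {
    uint64_t bytes_sent;           /* running total kept by the sender */
};

/* Simulates a send whose ACK is processed before the call returns:
 * the ACK path owns the message and frees it. */
static void send_and_ack(struct update_msg *msg)
{
    free(msg);                     /* the sender must not touch msg after this */
}

/* Unsafe ordering (shown as a comment only, never compiled):
 *     send_and_ack(msg);
 *     ts->bytes_sent += msg->payload_len;    // reads freed memory
 */

/* Safe ordering: copy what is needed before ownership is handed off. */
static void throttle_send_safe(struct throttle_state *ts, struct update_msg *msg)
{
    assert(ts != NULL && msg != NULL);
    uint32_t len = msg->payload_len;   /* snapshot while msg is still valid */
    send_and_ack(msg);
    ts->bytes_sent += len;             /* uses the local copy only */
}

/* Small driver: allocates one message, sends it, and returns the total. */
static uint64_t demo_throttle(uint32_t payload_len)
{
    struct throttle_state ts = { 0 };
    struct update_msg *msg = malloc(sizeof *msg);
    if (msg == NULL)
        return 0;
    msg->payload_len = payload_len;
    throttle_send_safe(&ts, msg);
    return ts.bytes_sent;
}
```

Snapshotting the needed fields is only one way to close such a window; holding a reference count on the message until the sender has finished with it is another. The article does not state which approach the hot-fix takes.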

Resolution

Veritas engineering has identified the source code responsible for the invalid memory access.

Code changes have been made to avoid the incorrect memory access.

Please contact Veritas Technical Support to obtain the private hot-fix VRTSvxvm-6.2.1.8202-RHEL6.x86_64.

Because the issue was identified only recently (June 2019), other product versions do not yet contain a fix.

Reference Escalation: STESC-2900

Issue/Introduction

The VVR (Veritas Volume Replicator) Primary node may panic when sending data (nmcom_throttle_send) to the Secondary node as a result of accessing already freed memory. The panic occurs while accessing internal VVR structures.
