Panic initiated by llt_rc_conn in mlx5_ib driver when using RDMA private links on RHEL 7.9 or RHEL 8.X

Article ID: 100050610

Updated On:

Description

Error Message

The panic stack trace will be similar to the following:

PID: 11241 TASK: ffff9f728e3da100 CPU: 13 COMMAND: "llt_rc_conn"
 #0 [ffff9f799ddcba48] machine_kexec at ffffffffa66662c4
 #1 [ffff9f799ddcbaa8] crash_get_memory_size at ffffffffa6722842
 #2 [ffff9f799ddcbb30] init_cq_frag_buf at ffffffffc0833ac7 [mlx5_ib]
 #3 [ffff9f799ddcbb78] crash_shrink_memory at ffffffffa6722930
 #4 [ffff9f799ddcbb90] oops_end at ffffffffa6d8d798
 #5 [ffff9f799ddcbbb8] die at ffffffffa6630a7b
 #6 [ffff9f799ddcbbe8] do_general_protection at ffffffffa6d8d092
 #7 [ffff9f799ddcbc20] general_protection at ffffffffa6d8c718
 [exception RIP: init_cq_frag_buf+103]
 RIP: ffffffffc0833ac7 RSP: ffff9f799ddcbcd0 RFLAGS: 00010202
 RAX: 73041b0734012940 RBX: 0000000000004000 RCX: 73041b0734012900
 RDX: ffff9f799e0c497c RSI: ffff9f799e0c4978 RDI: ffff9f767d1f9000
 RBP: ffff9f799ddcbcd0 R8: 0000000000000101 R9: 0000000000000000
 R10: 0000000000000100 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: ffff9f799e0c4940 R15: ffff9f767d1f9000
 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #8 [ffff9f799ddcbcd8] mlx5_ib_resize_cq at ffffffffc0835d60 [mlx5_ib]
 #9 [ffff9f799ddcbdb0] ib_resize_cq at ffffffffc05270df [ib_core]
#10 [ffff9f799ddcbdc0] llt_rdma_setup_qp at ffffffffc0a6a712 [llt]
#11 [ffff9f799ddcbe10] llt_rdma_cc_event_action at ffffffffc0a6b411 [llt]
#12 [ffff9f799ddcbe98] llt_rdma_client_conn_thread at ffffffffc0a6bb75 [llt]
#13 [ffff9f799ddcbec8] kthread at ffffffffa66c5da1
#14 [ffff9f799ddcbf50] ret_from_fork_nospec_begin at ffffffffa6d95ddd
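
When triaging a vmcore, it can help to check a saved backtrace against this signature automatically. The following is a minimal sketch, not a supported tool: it assumes the backtrace was saved as text in the same layout as the trace above, and the function name matches_signature and the regular expressions are illustrative choices rather than part of any product.

```python
import re

# Values taken from the trace above.
PANIC_COMMAND = "llt_rc_conn"
FAULTING_FUNCTION = "init_cq_frag_buf"
# Callers of the faulting function, innermost first.
CALL_CHAIN = ("mlx5_ib_resize_cq", "ib_resize_cq", "llt_rdma_setup_qp")

COMMAND_RE = re.compile(r'COMMAND:\s*"([^"]+)"')
RIP_RE = re.compile(r"\[exception RIP:\s*(\w+)\+\d+\]")
# Matches frame lines such as " #8 [ffff9f799ddcbcd8] mlx5_ib_resize_cq at ffffffffc0835d60".
FRAME_RE = re.compile(
    r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+", re.MULTILINE
)


def matches_signature(backtrace: str) -> bool:
    """Return True if the backtrace text looks like the panic described here."""
    command = COMMAND_RE.search(backtrace)
    if command is None or command.group(1) != PANIC_COMMAND:
        return False

    rip = RIP_RE.search(backtrace)
    if rip is None or rip.group(1) != FAULTING_FUNCTION:
        return False

    # The callers must appear in order, though other frames may sit between them.
    remaining = iter(CALL_CHAIN)
    wanted = next(remaining)
    for name in FRAME_RE.findall(backtrace):
        if name == wanted:
            wanted = next(remaining, None)
            if wanted is None:
                return True
    return False
```

A match only indicates that the trace resembles the one in this article. Confirming the root cause still requires the analysis described under Resolution.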

 

Cause

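The panic is caused by changes in the Mellanox mlx5 driver. As the stack trace shows, the general protection fault occurs in init_cq_frag_buf, which is called from mlx5_ib_resize_cq when LLT resizes a completion queue during RDMA queue pair setup (llt_rdma_setup_qp).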

Resolution

Contact Red Hat and reference Bugzilla 1642498.

Issue/Introduction

A panic occurs when using VCS with RDMA private links on RHEL 7.9 or RHEL 8.X.

Additional Information

JIRA: STESC-5879