IO hang seen on SUSE Linux Enterprise Server (SLES) 15 SP4

Article ID: 100060331

Description

Error Message

Because of the nature of the hang, a crash dump of the system was taken. The dump contained the following log messages:

[179740.059747] VxVM vxio V-5-0-1146 voldsio_timeout: Timeout value 240 seconds, actual io time 242 seconds, I/O Timedout, bad disk?
[179740.059849] VxVM vxio V-5-0-2036 voldio: I/O Timedout hence setting FAIL_IO flag on disk VPLEX_E556, 1 I/Os hung on the disk!
[179740.059855] VxVM vxio V-5-0-1112 voldio: I/O hung; disallowing all I/Os to the disk.
[179985.823800] VxVM vxio V-5-0-1146 voldsio_timeout: Timeout value 240 seconds, actual io time 245 seconds, I/O Timedout, bad disk?
[179985.823945] VxVM vxio V-5-0-2036 voldio: I/O Timedout hence setting FAIL_IO flag on disk VPLEX_E55C, 1 I/Os hung on the disk!
[179985.823951] VxVM vxio V-5-0-1112 voldio: I/O hung; disallowing all I/Os to the disk.
[180106.062054] AMF WARNING V-292-1-44 AMF can no longer monitor DGoffline events. Notifying reapers.
[180280.739512] VxVM vxio V-5-0-1146 voldsio_timeout: Timeout value 240 seconds, actual io time 243 seconds, I/O Timedout, bad disk?
[180280.739617] VxVM vxio V-5-0-2036 voldio: I/O Timedout hence setting FAIL_IO flag on disk VPLEX_E562, 1 I/Os hung on the disk!
[180280.739621] VxVM vxio V-5-0-1112 voldio: I/O hung; disallowing all I/Os to the disk.
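When a dump or `dmesg` output contains many of these messages, it can help to tabulate which disks VxVM failed and by how much each I/O overran the timeout. The Python sketch below is only an illustration. It assumes the message wording matches the lines above exactly, and the helper name `summarize_vxio_timeouts` is hypothetical, not part of any Veritas tool. It pairs each V-5-0-2036 (FAIL_IO) message with the V-5-0-1146 timeout message immediately before it, which is the order in which they appear in this log.

```python
import re

# Patterns assume the exact VxVM message wording shown in this article.
TIMEOUT_RE = re.compile(
    r"V-5-0-1146 voldsio_timeout: Timeout value (?P<limit>\d+) seconds, "
    r"actual io time (?P<actual>\d+) seconds"
)
FAIL_IO_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] VxVM vxio V-5-0-2036 .*"
    r"FAIL_IO flag on disk (?P<disk>[^,\s]+), (?P<hung>\d+) I/Os hung"
)


def summarize_vxio_timeouts(log_text):
    """Return one dict per FAIL_IO message, paired with the preceding timeout."""
    results = []
    last_timeout = (None, None)  # (timeout value, actual I/O time) in seconds
    for line in log_text.splitlines():
        t = TIMEOUT_RE.search(line)
        if t:
            last_timeout = (int(t["limit"]), int(t["actual"]))
            continue
        f = FAIL_IO_RE.search(line)
        if f:
            results.append({
                "uptime_s": float(f["ts"]),  # printk timestamp: seconds since boot
                "disk": f["disk"],
                "hung_ios": int(f["hung"]),
                "timeout_s": last_timeout[0],
                "actual_s": last_timeout[1],
            })
            last_timeout = (None, None)
    return results


if __name__ == "__main__":
    sample = (
        "[179740.059747] VxVM vxio V-5-0-1146 voldsio_timeout: Timeout value 240 "
        "seconds, actual io time 242 seconds, I/O Timedout, bad disk?\n"
        "[179740.059849] VxVM vxio V-5-0-2036 voldio: I/O Timedout hence setting "
        "FAIL_IO flag on disk VPLEX_E556, 1 I/Os hung on the disk!\n"
    )
    for entry in summarize_vxio_timeouts(sample):
        print(entry["disk"], entry["actual_s"] - entry["timeout_s"], "s over the timeout")
```

Applied to the full excerpt above, this would list VPLEX_E556, VPLEX_E55C and VPLEX_E562, each exceeding the 240-second timeout by a few seconds. The AMF warning line matches neither pattern and is skipped.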

More than 4,000 threads were found blocked with the same stack. Counting the kworker tasks in the uninterruptible (UN) state in crash:

crash> ps|grep UN|grep kworker|wc -l
4274

4274 of the 4276 tasks in the UN state are kworker threads, for example:
673      2  44  ffff8f2768752880  UN   0.0       0      0  [kworker/44:43]
11998      2  26  ffff8f27f6f9a880  UN   0.0       0      0  [kworker/26:165]

crash> bt 673
PID: 673    TASK: ffff8f2768752880  CPU: 44  COMMAND: "kworker/44:43"
 #0 [ffffabd599b4bc78] __schedule at ffffffff8864ce0d
 #1 [ffffabd599b4bd40] schedule at ffffffff8864dce4
 #2 [ffffabd599b4bd50] schedule_timeout at ffffffff88652e18
 #3 [ffffabd599b4bdc8] io_schedule_timeout at ffffffff8864e0f9
 #4 [ffffabd599b4bde0] wait_for_completion_io at ffffffff8864e722
 #5 [ffffabd599b4be20] blk_execute_rq at ffffffff880dbf63
 #6 [ffffabd599b4be60] dmp_send_scsi_work_fn at ffffffffc1122919 [vxdmp]
 #7 [ffffabd599b4be98] process_one_work at ffffffff87ccd647
 #8 [ffffabd599b4bed8] worker_thread at ffffffff87ccd84d
 #9 [ffffabd599b4bf10] kthread at ffffffff87cd5146
#10 [ffffabd599b4bf50] ret_from_fork at ffffffff87c049a2

Cause

The dump was referred to the operating system vendor, SUSE, for analysis. SUSE identified the following known issue involving SLES and QLogic:

https://www.suse.com/support/kb/doc/?id=000021056

Resolution

Please note that this article references sites not owned or maintained by Veritas and, as such, Veritas is not responsible for the content portrayed on such sites, including any revisions to or deletions of content or third-party software on which this article relies. The user is responsible for conducting all necessary due diligence before following the instructions described in this article.

To avoid the issue, the operating system vendor advised updating the kernel to version 4.12.14-122.156 or later on SLES 12 SP5, or to version 5.14.21-150400.24.60 or later on SLES 15 SP4.
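To check a host against these minimums, the running kernel release (as reported by `uname -r`, for example `5.14.21-150400.24.60-default`) can be compared with the version for its service pack. The Python sketch below is a simplification: it compares only the numeric fields of the version and release, drops a flavor suffix such as `-default`, and is not a full implementation of RPM version comparison, so `rpm` or `zypper` output remains authoritative. The names `MIN_FIXED`, `version_key` and `has_fix` are hypothetical.

```python
import re

# Minimum kernel versions quoted in this article, without the flavor suffix.
# Hypothetical table: extend only with versions confirmed by SUSE.
MIN_FIXED = {
    "SLES 12 SP5": "4.12.14-122.156",
    "SLES 15 SP4": "5.14.21-150400.24.60",
}


def version_key(kernel_release):
    """Map '5.14.21-150400.24.60-default' to (5, 14, 21, 150400, 24, 60).

    Keeps only the first two dash-separated fields (version and release), so
    a flavor suffix such as '-default' is dropped. Simplified: not rpmvercmp.
    """
    version_release = "-".join(kernel_release.split("-")[:2])
    return tuple(int(n) for n in re.findall(r"\d+", version_release))


def has_fix(kernel_release, service_pack):
    """True if kernel_release is at or above the minimum for service_pack."""
    return version_key(kernel_release) >= version_key(MIN_FIXED[service_pack])
```

For example, `has_fix("5.14.21-150400.24.46-default", "SLES 15 SP4")` returns `False`, while `has_fix("5.14.21-150400.24.60-default", "SLES 15 SP4")` returns `True`. On a live host, `platform.release()` supplies the same string as `uname -r`; the result is only meaningful when the service pack passed in matches the one actually installed.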