Avoiding stack overflow on Linux Platforms with Veritas InfoScale and Storage Foundation

Article ID: 100031567

Description

Error Message

The system panics with a VxFS or Veritas Oracle Disk Manager (ODM) kernel function in the panic stack.

Case 1

In one customer case, the system panicked after the following operations:

  1. Configure LLT to use RDMA heartbeat links.
  2. Use the Flexible Storage Sharing (FSS) option.
  3. Disconnect one LLT link. One of the nodes panics.


The resulting panic shows the following stack:

vx_dio_physio
vx_dio_rdwri
fdd_write_end
fdd_rw
fdd_odm_rw
odm_vx_io
odm_io_start
odm_io_req
odm_io
odm_io_stat
odm_ioctl_ctl
odm_ioctl_ctl_unlocked
vfs_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath

Case 2

In another customer case, the panic shows the following kernel stack:

vx_dev_strategy
vx_snap_strategy
vx_io_startnowait
vx_nalloc_getpage_lnx
vx_do_getpage
vx_do_read_ahead
vx_read_ahead
vx_do_getpage
vx_getpage1
vx_fault
__do_fault
handle_pte_fault
handle_mm_fault
__get_user_pages
get_user_pages
vx_dio_physio
vx_dio_rdwri
vx_write_direct
vx_write1
vx_write_common_slow
vx_write_common
vx_write
vfs_write
sys_pwrite64
system_call_fastpath

Cause

On RHEL (Red Hat Enterprise Linux) 6 systems up to and including RHEL 6.6, the kernel thread stack is limited to 8KB. Starting with RHEL 6.7, it is 16KB. Each kernel function that a thread calls consumes some of this stack space. The space is released when the function returns to its caller. When the combined stack space used by the chain of active function calls exceeds the limit, a stack overflow occurs and the system panics.

In some situations, a kernel thread that is executing VxFS kernel functions may already have used a large portion of its stack space. When that thread then calls lower-layer functions, only a small amount of stack space is left. If the lower-layer functions continue to consume this remaining space, the stack may overflow.

Resolution

Upgrade to RHEL 6.7 or later (on RHEL 6) or RHEL 7.1 or later (on RHEL 7). These releases increase the kernel stack size to 16KB, which eliminates almost all common stack overflow issues.
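To check whether a host is already on one of these releases, the release string in /etc/redhat-release can be compared against the thresholds above. The following is a minimal sketch. It assumes the usual "... release MAJOR.MINOR (...)" format of that file. It also treats any major release after 7 as having 16KB stacks, which is an assumption beyond what this article states. The function name stack_is_16k is hypothetical.

```shell
# Hypothetical helper: prints "yes" if the RHEL release string is at or above the
# thresholds in this article (6.7 on RHEL 6, 7.1 on RHEL 7), "no" if below,
# or "unknown" if no MAJOR.MINOR version can be parsed.
# Assumption (not from this article): majors after 7 also use 16KB stacks.
stack_is_16k() {
    # Extract "MAJOR MINOR" from e.g. "Red Hat Enterprise Linux Server release 6.6 (Santiago)"
    ver=$(printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
    [ -n "$ver" ] || { echo unknown; return 1; }
    major=${ver% *}
    minor=${ver#* }
    if [ "$major" -gt 7 ] ||
       { [ "$major" -eq 7 ] && [ "$minor" -ge 1 ]; } ||
       { [ "$major" -eq 6 ] && [ "$minor" -ge 7 ]; }; then
        echo yes
    else
        echo no
    fi
}

# Usage on a host:
#   stack_is_16k "$(cat /etc/redhat-release)"
```

A "no" result means the host is still on an 8KB-stack release, and the workarounds below apply.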

If you cannot upgrade RHEL yet, ensure that the latest Veritas Storage Foundation patches are installed, especially on SF 6.0.5. SF 6.0.5 has known incidents that increase kernel stack usage and therefore increase the chance of hitting a stack overflow. Also change the Linux kernel I/O scheduler to deadline. Refer to the following related article for details:
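Before following the related article, it can help to find which block devices are still using CFQ. The active scheduler is the bracketed entry in /sys/block/<dev>/queue/scheduler, for example "noop deadline [cfq]". The following is a minimal sketch. The SYSFS_ROOT override and both function names are hypothetical and exist only so that the logic can be tried against a scratch directory. The related article remains the reference for making the change persistent.

```shell
# Print the active (bracketed) I/O scheduler for one block device, e.g. "cfq".
# SYSFS_ROOT is a hypothetical override; it defaults to /sys.
active_scheduler() {
    sched_file="${SYSFS_ROOT:-/sys}/block/$1/queue/scheduler"
    [ -r "$sched_file" ] || { echo unknown; return 1; }
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$sched_file"
}

# List every block device whose active scheduler is CFQ.
list_cfq_devices() {
    for dir in "${SYSFS_ROOT:-/sys}"/block/*; do
        dev=${dir##*/}
        if [ "$(active_scheduler "$dev")" = "cfq" ]; then
            echo "$dev"
        fi
    done
}

# A runtime switch for one device (not persistent across reboots):
#   echo deadline > /sys/block/<dev>/queue/scheduler
```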

Article 000024448 - Linux Completely Fair Queuing (CFQ) I/O Scheduler configured on a system running SF may cause system panic due to kernel task stack overflow

Two VxFS kernel module parameters can be used to resolve the two panics described in this article. When they are configured, VxFS hands the I/O off to a separate thread before submitting it to VxVM (Volume Manager) if there is not sufficient stack space left. These are not run-time parameters. They can be set only at module load time, so they take effect only after the VxFS module is unloaded and reloaded, or after the system is rebooted.

The following two module parameters need to be configured for this solution:

  • vxfs_io_proxy_vxvm: If enabled, VxVM devices are included in I/O hand-off decisions.
  • vxfs_io_proxy_level: When free stack space falls below this level, an I/O is handed off to a proxy. The default value of vxfs_io_proxy_level is 4KB (4096 bytes).
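The effect of the two parameters can be illustrated with some simple arithmetic. The following is a hypothetical model of the decision as the bullets above describe it. It is not VxFS source code, and the function name needs_handoff is made up. On an 8KB stack with 3000 bytes already used, 5192 bytes remain. That is above the default 4096-byte level, so the I/O goes direct. It is below the 6144-byte level configured in this article, so the I/O is handed off. Raising the level therefore makes VxFS hand off earlier and leaves more headroom for the lower layers.

```shell
# Hypothetical model of the hand-off decision (not VxFS source). Sizes in bytes.
# Prints "handoff" or "direct".
needs_handoff() {
    stack_size=$1   # total kernel stack, e.g. 8192 on RHEL 6.6 and earlier
    stack_used=$2   # bytes already consumed by the calling thread
    level=$3        # vxfs_io_proxy_level
    proxy_vxvm=$4   # vxfs_io_proxy_vxvm: 1 = include VxVM devices in the decision
    is_vxvm=$5      # 1 if the target device is a VxVM volume
    free=$((stack_size - stack_used))
    # VxVM devices are only considered when vxfs_io_proxy_vxvm is enabled.
    if [ "$is_vxvm" -eq 1 ] && [ "$proxy_vxvm" -ne 1 ]; then
        echo direct
        return
    fi
    if [ "$free" -lt "$level" ]; then
        echo handoff
    else
        echo direct
    fi
}
```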

 
Set the above VxFS module parameters in a vxfs.conf file as follows:

1. Create a vxfs.conf file in the /etc/modprobe.d directory.

touch /etc/modprobe.d/vxfs.conf

2. Add the following lines to the vxfs.conf file.

options vxfs vxfs_io_proxy_vxvm=1

options vxfs vxfs_io_proxy_level=6144
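Steps 1 and 2 can also be done in a single command with a here-document. The following is a minimal sketch. The function name write_vxfs_conf and its optional path argument are hypothetical. The argument only allows the function to be tried on a scratch file first. Note that the function overwrites any existing file at that path.

```shell
# Write both VxFS module options to a modprobe.d file in one step.
# Defaults to /etc/modprobe.d/vxfs.conf; OVERWRITES an existing file at that path.
write_vxfs_conf() {
    conf=${1:-/etc/modprobe.d/vxfs.conf}
    cat > "$conf" <<'EOF'
options vxfs vxfs_io_proxy_vxvm=1
options vxfs vxfs_io_proxy_level=6144
EOF
}

# Usage (as root):
#   write_vxfs_conf
```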
 

The change takes effect when the system is rebooted, or when the VxFS module is unloaded and reloaded.
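Before rebooting, you can confirm that modprobe has picked up the new options. `modprobe -c` prints modprobe's merged configuration, including the `options` lines from /etc/modprobe.d. The following is a minimal sketch that extracts one vxfs parameter from that output. The function name vxfs_option_value is hypothetical. The function checks only the configuration that will be applied at the next module load, not the value in the currently loaded module.

```shell
# Read modprobe's merged configuration on stdin and print the last value found
# for the given vxfs parameter (prints nothing if the parameter is not set).
vxfs_option_value() {
    awk -v p="$1" '
        $1 == "options" && $2 == "vxfs" {
            for (i = 3; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == p) val = kv[2]
            }
        }
        END { if (val != "") print val }'
}

# Usage:
#   modprobe -c | vxfs_option_value vxfs_io_proxy_level
#   modprobe -c | vxfs_option_value vxfs_io_proxy_vxvm
```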

Issue/Introduction

VxFS (Veritas File System) is involved in a kernel panic caused by a stack overflow. VxFS detects the overflow immediately after the submitted I/O returns from the lower layer. However, the overflow itself does not happen in VxFS. It happens somewhere in the lower layers.

Applies to:

  • Veritas InfoScale
  • Storage Foundation
  • Linux platforms
     

Additional Information

ETrack: 3712961