LLT fails to start and sometimes core dumps during system boot


Article ID: 100008087



Description

Error Message

lltconfig -v displays the following error message:

V-14-2-15369 do_put: sendto failed: No buffer space available

Cause

LLT fails to start because it cannot send packets on the network: the sendto() system call returns the error ENOBUFS ("No buffer space available"). ENOBUFS means that the output queue for a network interface is full. This does not indicate a problem with LLT; it requires tuning of the operating system's network buffer configuration.
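Because ENOBUFS is reported per call through errno, a sender can distinguish it from other sendto() failures. The following is a minimal, hypothetical C sketch of that check on a POSIX system. The function name send_retry_enobufs and the bounded retry with a doubling back-off are illustrative assumptions, not how LLT is implemented, and retrying does not replace the OS tuning described in the Resolution.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper (not part of LLT): send one datagram and retry a
 * bounded number of times when sendto() fails with ENOBUFS, i.e. when the
 * interface output queue is full. Any other error is returned immediately.
 * Returns the byte count from sendto(), or -1 with errno set by sendto().
 */
ssize_t send_retry_enobufs(int fd, const void *buf, size_t len,
                           const struct sockaddr *dst, socklen_t dst_len,
                           int max_tries)
{
    struct timespec delay = { 0, 1000000L };    /* first back-off: 1 ms */

    for (int attempt = 1; ; attempt++) {
        ssize_t n = sendto(fd, buf, len, 0, dst, dst_len);
        if (n >= 0)
            return n;                           /* datagram accepted */
        if (errno != ENOBUFS || attempt >= max_tries)
            return -1;                          /* not ENOBUFS, or out of tries */

        /* Output queue full: wait, then double the delay (capped below 1 s). */
        nanosleep(&delay, NULL);
        if (delay.tv_nsec < 500000000L)
            delay.tv_nsec *= 2;
    }
}
```

For example, on a connected datagram socket, send_retry_enobufs(fd, buf, len, NULL, 0, 3) makes at most three attempts and, if every attempt hits a full queue, returns -1 with errno still set to ENOBUFS.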

In addition, LLT does not handle the ENOBUFS error correctly, and lltconfig dumps core. The crash is caused by a mismatch in how the error message is formatted: the code passes an integer argument where the print routine expects a string. When the message is formatted, the mismatched argument causes a segmentation fault, and lltconfig generates a core file.
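The bug class described above can be illustrated with a printf-style routine in C. This is a hypothetical sketch: log_msg and format_sendto_error are illustrative names, and the message template is an assumption modeled on the V-14-2-15369 text shown earlier, not the actual LLT source. Each conversion specifier receives an argument of the matching type, and the GCC/Clang format attribute lets -Wformat flag an integer passed for %s at compile time.

```c
#include <errno.h>      /* error numbers such as ENOBUFS used by callers */
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical printf-style logging routine. The format attribute
 * (GCC/Clang) makes -Wformat check each caller's arguments against the
 * format string: parameter 3 is the format, variadic arguments start at 4.
 */
__attribute__((format(printf, 3, 4)))
static int log_msg(char *out, size_t outlen, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out, outlen, fmt, ap);
    va_end(ap);
    return n;
}

/*
 * Format a sendto() failure for error number err. Both %s conversions
 * receive char * arguments: the caller's location and strerror(err).
 *
 * A call of the kind described above would look like:
 *     log_msg(out, outlen, "V-14-2-15369 %s: sendto failed: %s", where, err);
 * The int err is passed where %s expects a char *; -Wformat warns, and at
 * run time the integer is read as a pointer, which typically crashes.
 */
int format_sendto_error(char *out, size_t outlen, const char *where, int err)
{
    return log_msg(out, outlen, "V-14-2-15369 %s: sendto failed: %s",
                   where, strerror(err));
}
```

With glibc, format_sendto_error(buf, sizeof buf, "do_put", ENOBUFS) produces "V-14-2-15369 do_put: sendto failed: No buffer space available".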

Resolution

Until a patch is released that prevents the core dump (possibly in the next GA release of VCS), the workaround is:

1- Increase the values of the following kernel parameters to increase the network buffer sizes:
    net.core.wmem_max
    net.core.rmem_max
    net.core.wmem_default
    net.core.rmem_default

Note: Engage the OS vendor for advice on the actual values to configure, or refer to the Linux documentation.

2- To prevent lltconfig from generating a core, disable verbose logging for LLT.
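Before changing the parameters in step 1, it can help to record their current values (the values that commands such as sysctl net.core.wmem_max report) so the change can be verified after the reboot. The following minimal C sketch assumes a Linux system that exposes these tunables under /proc/sys/net/core; it only reads and prints the four parameters and does not choose or set new values, which should still come from the OS vendor. The function names are hypothetical.

```c
#include <stdio.h>

/*
 * Read one integer tunable from /proc/sys, e.g. "net/core/wmem_max".
 * Returns 0 and stores the value in *out on success, or -1 if the file
 * cannot be opened or parsed.
 */
int read_proc_sys_long(const char *rel_path, long *out)
{
    char path[256];
    int len = snprintf(path, sizeof path, "/proc/sys/%s", rel_path);
    if (len < 0 || (size_t)len >= sizeof path)
        return -1;                      /* path too long */

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Print the four buffer tunables named in step 1 in sysctl-style
 * "name = value" form. Returns how many of them could not be read.
 */
int print_net_core_buffers(FILE *stream)
{
    static const char *const names[] = {
        "wmem_max", "rmem_max", "wmem_default", "rmem_default"
    };
    int failures = 0;

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        char rel[64];
        long value;
        snprintf(rel, sizeof rel, "net/core/%s", names[i]);
        if (read_proc_sys_long(rel, &value) == 0) {
            fprintf(stream, "net.core.%s = %ld\n", names[i], value);
        } else {
            fprintf(stream, "net.core.%s: unreadable\n", names[i]);
            failures++;
        }
    }
    return failures;
}
```

Calling print_net_core_buffers(stdout) once before tuning and once after the node comes back up gives a simple before/after record of the four values.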

After applying the workaround(s) above, restart LLT and GAB (and vxfen, if configured) before attempting to start VCS services.


Applies To

OS: Linux (RHEL and SLES).

VCS versions: 5.1SP1RP1 and 6.0.

Issue/Introduction

In Linux environments running VCS 5.1SP1RP1 and 6.0, LLT may fail to start, or lltconfig may dump core, after a node reboots. When lltconfig is started with verbose logging, it returns the error "sendto failed". Manually restarting LLT after the first core dump may produce the same error or another core dump. In this situation, GAB and the remaining VCS components cannot start. Two issues have been identified that prevent lltconfig from starting or cause it to dump core:
1 - The sendto() system call fails at the OS level, so lltconfig tries to write an error message to the console.
2 - Because of issue 1, lltconfig run with the -v option tries to write that error message using an incorrect message ID, which causes lltconfig to dump core.

Additional Information

ETrack: 2763810