What does the message "Send out of context hbs to peers from llt_lrput" in Veritas Cluster Server (VCS) mean?

Article ID: 100023398

Resolution

When VCS sends a heartbeat to the other nodes in the cluster, it uses the Low Latency Transport (LLT) module to send the packet. An LLT kernel thread is responsible for sending the packets onto the network. If for some reason this thread is unable to run, no heartbeat is sent. The other nodes in the cluster may then time out this node and fence it out of the cluster.

To guard against a transient issue causing this node to be removed from the cluster, functionality was added in 5.0MP3RP3. Other LLT threads, which run at a very high priority (interrupt context), monitor the sending of heartbeats to ensure that one is sent. If a heartbeat has not been sent, these out-of-context threads send one. The message in the title indicates that this has occurred.

Because these out-of-context threads run at a very high priority, the node could continue sending heartbeat packets even though, for all other purposes, it is effectively dead. Out-of-context heartbeats are therefore sent only for a limited time, controlled by the LLT timer sendhbcap. The default is 3 minutes (a value of 18000, because LLT timer values are expressed in hundredths of a second).

To check the current value of sendhbcap:

lltconfig -T query | grep sendhbcap

  sendhbcap      = 18000
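
As a quick sanity check, the value shown above can be converted to seconds. The sketch below assumes that LLT timer values are in hundredths of a second, which is consistent with the 3-minute default and with the sample log message (1936 ticks reported as 19 secs). The function name llt_timer_to_secs is hypothetical and is not part of LLT.

```shell
# Hypothetical helper, not an LLT command.
# Assumption: LLT timer values are expressed in 1/100ths of a second.
llt_timer_to_secs() {
    # Integer division: 100 timer units per second.
    echo $(( $1 / 100 ))
}

llt_timer_to_secs 18000   # prints 180 (3 minutes)
```

Under the same assumption, a stall of about 19 seconds against a 180-second cap leaves roughly 160 seconds, which matches the "160 secs more to go" text in the sample message.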

To change the value on the running system (for example, to 20000):

lltconfig -T sendhbcap:20000

To make this value persist across reboots, append the following line to /etc/llttab:

set-timer sendhbcap:20000
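
If the file is edited by script rather than by hand, it may be worth avoiding duplicate set-timer lines when the change is applied more than once. The following is a minimal sketch only: persist_sendhbcap is a hypothetical helper, not a Veritas tool, and it assumes the sendhbcap entry starts at the beginning of a line. Back up /etc/llttab before trying anything like this.

```shell
# Hypothetical helper: write "set-timer sendhbcap:<value>" into an llttab-style file.
# An existing sendhbcap line is replaced; otherwise a new line is appended.
# Usage: persist_sendhbcap 20000 [file]    (file defaults to /etc/llttab)
persist_sendhbcap() {
    value=$1
    file=${2:-/etc/llttab}

    # Accept only a non-empty, purely numeric value.
    case $value in
        ''|*[!0-9]*) echo "sendhbcap value must be numeric" >&2; return 1 ;;
    esac

    if grep -q '^set-timer[[:space:]][[:space:]]*sendhbcap:' "$file"; then
        tmp=$(mktemp) || return 1
        # Rewrite via a temporary copy, then write back in place so the
        # original file keeps its ownership and permissions.
        sed "s/^set-timer[[:space:]][[:space:]]*sendhbcap:.*/set-timer sendhbcap:$value/" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        echo "set-timer sendhbcap:$value" >> "$file"
    fi
}
```

Running the helper twice with different values leaves a single sendhbcap line holding the most recent value.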

Issue/Introduction

The messages file contains lines similar to:

Apr 22 09:25:10 server1 llt: [ID 367489 kern.notice] LLT INFO V-14-1-10536 llt_send_hb: timer not called for 19 secs (1936 ticks). Send out of context hbs to peers from llt_lrput. 160 secs more to go
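
To see how often this occurs and how close the node came to exhausting the cap, the stall duration and the remaining time can be pulled out of each such line. The sketch below assumes the message wording matches the sample above exactly; llt_ooc_summary is a hypothetical name, and the location of the messages file depends on the platform.

```shell
# Hypothetical helper: read syslog lines on stdin and, for each out-of-context
# heartbeat message, print the stall duration and the seconds remaining.
# Lines that do not match the sample message format are ignored.
# Usage (path is a placeholder): llt_ooc_summary < /path/to/messages
llt_ooc_summary() {
    sed -n 's/.*llt_send_hb: timer not called for \([0-9][0-9]*\) secs.* \([0-9][0-9]*\) secs more to go.*/stalled=\1s remaining=\2s/p'
}
```

For the sample line above, this prints stalled=19s remaining=160s.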