Setting LLT peerinact to greater than 214748 causes LLT links to expire


Article ID: 100010308



Description

Error Message

In the following example, we first set peerinact to 214748, and LLT accepts the change normally. We then increase peerinact to 214749, and LLT immediately expires all links:

# lltconfig -T peerinact:214748

Response in messages file:

Aug 14 04:28:45 rhel61-01 kernel: LLT INFO V-14-1-10550 llt_setparam: Successfully set the LLT tunable  *LLT Peerinact Time(llt_xpeerinact)* from 214747 to 214748 on user request.
 

# lltconfig -T peerinact:214749

Response in messages file:

Aug 14 04:28:52 rhel61-01 kernel: LLT INFO V-14-1-10550 llt_setparam: Successfully set the LLT tunable *LLT Peerinact Time(llt_xpeerinact)* from 214748 to 214749 on user request.
Aug 14 04:28:52 rhel61-01 kernel: LLT INFO V-14-1-10033 link 0 (eth2) node 1 expired
Aug 14 04:28:52 rhel61-01 kernel: LLT INFO V-14-1-10033 link 0 (eth2) node 2 expired
Aug 14 04:28:52 rhel61-01 kernel: LLT INFO V-14-1-10033 link 1 (eth3) node 2 expired

If peerinact is set to a much higher value, LLT reacts slightly differently and logs hbreq messages on its links:

# lltconfig -T peerinact:360000

Response in messages file:

Aug 14 01:49:43 rhel61-01 kernel: LLT INFO V-14-1-10550 llt_setparam: Successfully set the LLT tunable  *LLT Peerinact Time(llt_xpeerinact)* from 1600 to 360000 on user request.
Aug 14 01:49:43 rhel61-01 kernel: LLT INFO V-14-1-10510 sent hbreq (NULL) on link 0 (eth2) node 1. 4 more to go.
Aug 14 01:49:43 rhel61-01 kernel: LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (eth3) node 1. 4 more to go.
Aug 14 01:49:43 rhel61-01 kernel: LLT INFO V-14-1-10510 sent hbreq (NULL) on link 0 (eth2) node 2. 4 more to go.

Cause

The issue is caused by an integer overflow in the LLT code when peerinact is set above 214748.
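This article does not describe the overflowing calculation itself. The boundary is, however, consistent with one plausible model: if LLT multiplied peerinact by 10000 (converting hundredths of a second to microseconds) and stored the result in a signed 32-bit integer, 214748 would be the largest value that fits, because 214748 x 10000 = 2,147,480,000 is below 2^31 - 1 = 2,147,483,647, while 214749 x 10000 = 2,147,490,000 is above it. The bash sketch below checks that arithmetic. The scale factor and integer width are assumptions for illustration only, not confirmed LLT internals.

```shell
#!/bin/bash
# Hypothetical model of the overflow. ASSUMPTION: LLT scales peerinact
# (hundredths of a second) by 10000 into a signed 32-bit integer.
# Neither the scale factor nor the variable width is confirmed by this article.
INT32_MAX=2147483647
SCALE=10000   # assumed: hundredths of a second -> microseconds

# Largest peerinact value whose scaled result still fits in INT32_MAX.
peerinact_threshold() {
    echo $(( INT32_MAX / SCALE ))
}

# Print "yes" if the scaled value would exceed INT32_MAX under this model.
# bash arithmetic is 64-bit, so the product itself does not overflow here.
would_overflow() {
    if [ $(( $1 * SCALE )) -gt "$INT32_MAX" ]; then
        echo yes
    else
        echo no
    fi
}

peerinact_threshold      # prints 214748
would_overflow 214748    # prints no  (2147480000 <= 2147483647)
would_overflow 214749    # prints yes (2147490000 >  2147483647)
```

Because the model is hypothetical, treat it only as a possible explanation of why the limit sits at 214748; the actual threshold may differ between platforms.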

Resolution

 The issue will be resolved at a later patch level.

As a workaround, do not set peerinact to a value greater than 214748.
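One way to apply this workaround consistently is to route changes through a small guard that refuses values above the limit. The following is a minimal sketch; the `set_peerinact` function is hypothetical and not part of VCS, and it only wraps the `lltconfig -T peerinact:VALUE` command form shown earlier in this article.

```shell
#!/bin/bash
# Hypothetical wrapper, not part of VCS: refuses peerinact values above the
# 214748 limit described in this article before calling lltconfig.
PEERINACT_MAX=214748

set_peerinact() {
    value=$1
    # Accept digits only; anything else is rejected with status 2.
    case "$value" in
        ''|*[!0-9]*)
            echo "invalid peerinact value: '$value'" >&2
            return 2 ;;
    esac
    # Reject values over the limit with status 1. The length check also
    # catches numbers too large for the shell's [ -gt ] comparison
    # (a 7+ digit string with leading zeros is rejected too, erring safe).
    if [ "${#value}" -gt 6 ] || [ "$value" -gt "$PEERINACT_MAX" ]; then
        echo "refusing peerinact=$value: above $PEERINACT_MAX, LLT may expire all links" >&2
        return 1
    fi
    # Same command form as shown earlier in this article.
    lltconfig -T "peerinact:$value"
}
```

For example, `set_peerinact 180000` sets a 30-minute value, while `set_peerinact 214749` returns status 1 without ever running lltconfig.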


Applies To

This problem affects all platforms and all versions of VCS up to and including 5.1SP1RP3 and 6.0.3. The exact threshold at which the issue occurs may vary between platforms, but the lowest value at which it can occur is 214749.

Issue/Introduction

The LLT peerinact tunable adjusts how long LLT waits, after heartbeats stop being received on a link, before it expires that link. Normally it is best to keep the default value, so that LLT reacts reasonably quickly to a real failure while still allowing enough time that a temporary glitch does not cause links to bounce up and down repeatedly.

In some situations it is useful to set peerinact to a high value: for example, during network maintenance that might temporarily affect LLT links, when LLT should ignore these events. However, if peerinact is set to a value greater than 214748, LLT might immediately expire all links, causing VCS to react as if all other nodes have left the cluster. This might lead to split brain, or to panics initiated by I/O fencing or GAB.

Peerinact is set in hundredths of a second, so a value of 214748 corresponds to about 2147 seconds, or roughly 35.8 minutes.
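Because peerinact is given in hundredths of a second, turning a planned maintenance window into a peerinact value is simple arithmetic. The bash sketch below shows the conversion in both directions; the helper names are hypothetical and only illustrate the unit conversion stated in this article.

```shell
#!/bin/bash
# Helpers for converting between minutes/seconds and peerinact units.
# Per this article, peerinact is expressed in hundredths of a second.
# The function names are illustrative, not part of VCS.
PEERINACT_MAX=214748

# Minutes -> peerinact units (1 minute = 60 s = 6000 hundredths).
peerinact_from_minutes() {
    echo $(( $1 * 60 * 100 ))
}

# peerinact units -> whole seconds (integer division truncates).
peerinact_to_seconds() {
    echo $(( $1 / 100 ))
}

peerinact_from_minutes 30               # prints 180000, below the limit
peerinact_from_minutes 36               # prints 216000, above the limit
peerinact_to_seconds "$PEERINACT_MAX"   # prints 2147 (about 35.8 minutes)
echo $(( PEERINACT_MAX / 6000 ))        # prints 35, longest whole-minute setting under the limit
```

A 30-minute window (180000) is safely below the limit, while a 36-minute window (216000) exceeds it and might trigger the link expiry described above on affected versions.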