VCS node will not join the cluster if the MTU size of its LLT private links differs from the LLT MTU of the existing cluster nodes

Article ID: 100024291

Description

Error Message

The engine_A.log on the joining node shows the following message:

VCS NOTICE V-16-1-10465 Getting snapshot.  snapped_membership: 0x3 current_membership: 0x3 current_jeopardy_membership: 0x0

The engine_A.log on the existing cluster nodes shows the following message:

VCS INFO V-16-1-10455 Sending snapshot to node membership: 0x2

Cause

The LLT MTU size on the joining node differs from the value on the active cluster nodes, so although the LLT links are reported as up, there is still a communication problem between the nodes.

In the example below, the node runs with the default MTU setting of 1500, which lltstat -c reports here as mtu: 1452:

bash-3.00# lltstat -c
LLT configuration information:
    node: 1
    name: sun07
    cluster: 33
      Supported Protocol Version(s)     : 5.0
    nodes: 0 - 63
    max nodes: 64
    max ports: 32
    links: 2
    mtu: 1452 <<<<<<<<<<<<
    max sdu: 66560
.....

   
And here the other node reports a different value:

bash-3.00# lltstat -c
LLT configuration information:
    node: 0
    name: sun06
    cluster: 33
      Supported Protocol Version(s)     : 5.0
    nodes: 0 - 63
    max nodes: 64
    max ports: 32
    links: 2
    mtu: 9146 <<<<<<<<<<<<<<<
    max sdu: 66560
...

When LLT starts, it reads the NIC MTU size from the OS kernel. If the NIC MTU size has been changed, for instance to enable jumbo frames, the LLT private links start with the new MTU size.
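
A quick way to spot the mismatch is to extract the mtu field from the lltstat -c output on each node and compare the values. The following is a minimal sketch, assuming the output layout shown in the examples above; the helper names llt_mtu and same_mtu are hypothetical and are not part of VCS.

```shell
# llt_mtu: read `lltstat -c` output on stdin and print the value of the "mtu:" field.
# Hypothetical helper; assumes the "mtu: <value>" layout shown in the examples above.
llt_mtu() {
    awk '$1 == "mtu:" { print $2; exit }'
}

# same_mtu: succeed only if every argument is the same value.
same_mtu() {
    first=$1
    for v in "$@"; do
        [ "$v" = "$first" ] || return 1
    done
    return 0
}
```

On each node, `lltstat -c | llt_mtu` prints that node's value; passing the collected values to same_mtu (for example `same_mtu 1452 9146`) returns a non-zero status when they differ.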

One symptom of the problem is that the OS messages file constantly reports alternating LLT "in trouble" and "active" messages, such as:

LLT INFO V-14-1-10205 link 0 (ce0) node 0 in trouble
LLT INFO V-14-1-10024 link 0 (ce0) node 0 active
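
To gauge how often a link is flapping, the "in trouble" messages can be counted per link and peer node. This is a rough sketch that assumes the message layout shown above; the function name llt_flap_counts is hypothetical, and the log file path in the usage note is an assumption that depends on the platform.

```shell
# llt_flap_counts: read a system log on stdin and print, for each link and peer
# node, how many "in trouble" messages (V-14-1-10205) were logged.
# Assumes the layout "... link <n> (<nic>) node <n> in trouble" shown above.
llt_flap_counts() {
    awk '/LLT INFO V-14-1-10205/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "link") {
                key = "link " $(i+1) " " $(i+2) " node " $(i+4)
                count[key]++
                break
            }
        }
    }
    END { for (k in count) print k ": " count[k] }' | sort
}
```

For example, on Solaris one might run `llt_flap_counts < /var/adm/messages`; a count that keeps growing on an otherwise healthy link is consistent with the MTU mismatch described here.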

Another symptom is a high rate of retransmitted data packets on the active cluster node that is sending the snapshot to the joining node:

bash-3.00# lltstat
LLT statistics:
    185        Snd data packets
    388981     Snd retransmit data
.........
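
The retransmission rate can be put into perspective by dividing the retransmit counter by the data packet counter. The sketch below assumes the two counter lines appear as in the lltstat output above; the function name llt_retransmit_ratio is hypothetical. What counts as "too high" is a judgment call, but with the sample figures the result is roughly 2,100 retransmissions per data packet sent.

```shell
# llt_retransmit_ratio: read `lltstat` output on stdin and print the number of
# retransmitted data packets per data packet sent (integer part only).
# Prints "n/a" when no data packets were counted.
llt_retransmit_ratio() {
    awk '
        /Snd data packets/    { sent = $1 }
        /Snd retransmit data/ { retrans = $1 }
        END {
            if (sent > 0)
                printf "%d\n", int(retrans / sent)
            else
                print "n/a"
        }'
}
```

Typical use would be `lltstat | llt_retransmit_ratio` on the node that is sending the snapshot.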
 

Resolution

Make sure the LLT links use the same MTU size on all cluster nodes.

Because the properties of the NICs used by LLT can be modified accidentally, it is good practice to set the LLT MTU size explicitly in the /etc/llttab LLT configuration file. Here it is set to the usual default value of 1500:

bash-3.00# cat /etc/llttab
set-node sun06
set-cluster 33
link ce0 /dev/ce:0 - ether - 1500
link ce1 /dev/ce:1 - ether - 1500
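
To confirm that every link directive carries an explicit MTU, the seventh field of each link line can be listed per node and compared across the cluster. This is a minimal sketch based on the link line layout shown above; the function name llttab_mtus is hypothetical, and treating a missing field or "-" as "unset" is an assumption of this sketch.

```shell
# llttab_mtus: read an llttab file on stdin and print "<tag> <mtu>" for each
# link directive. The MTU is taken from the seventh field, as in the example
# above; a missing field or "-" is reported as "unset" (sketch assumption).
llttab_mtus() {
    awk '$1 == "link" {
        mtu = "unset"
        if (NF >= 7 && $7 != "-") mtu = $7
        print $2, mtu
    }'
}
```

Running `llttab_mtus < /etc/llttab` on each node and comparing the output makes a link with a missing or different MTU easy to see.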

 

Applies To

VCS Cluster for Unix

Issue/Introduction

A VCS node will not join the cluster if the MTU size of its LLT private links has been modified and differs from the LLT MTU size of the existing cluster nodes. Although lltstat -vvn shows healthy LLT links with no obvious LLT problems, and gabconfig -a shows the correct Port a/Port h membership, the VCS HAD daemon fails to complete its startup on the joining node.