Cluster will not form if a node is stuck in a REMOTE_BUILD state.


Article ID: 100009943


Description

Error Message

C:\> hasys -state
#System                     Attribute        Value
       SysState        RUNNING
       SysState        REMOTE_BUILD

 

 

Cause

The issue may be caused by dropped packets on the heartbeat network, which prevent the cluster from forming. LLT packets may be too large to cross the network at the default LLT MTU of 1500. If the network does not allow UDP packets of this size, the LLT service cannot transmit cluster messages between the nodes.

Resolution

Edit the LLTTab.txt file (%vcs_root%\comms\llt) on ALL nodes of the cluster to use a lower MTU value (the default value is 1500).

In outline, the procedure is:

  • Stop the cluster
  • Edit the LLTTab.txt file, changing the MTU value to 1200
  • Restart the cluster

 

1.  Stop HAD on all nodes.
     Run the following command on one node:

hastop -all -force

     HAD may not respond to commands while the cluster is stuck in a REMOTE_BUILD state.

If so, manually end the HAD.exe and Hashadow.exe processes:

In Task Manager, on the Processes tab, click "Show processes from all users".

Select End Process for HAD.exe and Hashadow.exe.

     Confirm that both processes have ended and have not restarted.


2.  Stop LLT on all nodes.

Run the following command on ALL nodes of the cluster to stop LLT communication.

         net stop llt /y

3.  Back up the LLTTab.txt file (in the %vcs_root%\comms\llt folder).
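
The backup can also be scripted. The following is a minimal Python sketch, not part of the product: the function name backup_llttab and the ".bak" suffix are illustrative choices, and the caller is assumed to pass the full path to LLTTab.txt.

```python
import shutil
from pathlib import Path


def backup_llttab(llttab_path):
    """Copy LLTTab.txt to LLTTab.txt.bak in the same folder and return the backup path.

    Hypothetical helper: the name and the ".bak" suffix are assumptions for illustration.
    """
    source = Path(llttab_path)
    backup = source.with_name(source.name + ".bak")
    # copy2 copies the file content and also preserves timestamps where possible.
    shutil.copy2(source, backup)
    return backup
```

If %vcs_root% is available as an environment variable on the node (an assumption), the path could be built with os.path.expandvars(r"%vcs_root%\comms\llt\LLTTab.txt") before calling the function.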


4.  Manually edit the LLTTab.txt file on each node of the cluster. 

 

If heartbeat NICs are configured to use LLT over Ethernet, on each line starting with "link", change the last "-" to 1200.


Example LLTTab.txt file with the default MTU of 1500 (shown as "-" in the last field):

# This is a program generated file. Please do not edit.
set-node DRSERVER1
set-cluster 2

link Adapter0 00:00:00:00:00:00 - ether - -
link Adapter1 00:00:00:00:00:00 - ether - -

#disable LLT broadcasts

set-bcasthb 0
set-arp 1


Example LLTTab.txt file with MTU set to 1200:

# This is a program generated file. Please do not edit.
set-node DRSERVER1
set-cluster 2

link Adapter0 00:00:00:00:00:00 - ether - 1200
link Adapter1 00:00:00:00:00:00 - ether - 1200

#disable LLT broadcasts

set-bcasthb 0
set-arp 1


 
Save and close the LLTTab.txt file.



If heartbeat NICs are configured to use LLT over UDP, on each line starting with "link", change the second-to-last "-" to 1200.

Example link line with the default MTU:

link Link1 udp - udp 50000 - 10.10.10.10 -

Edited to set the MTU to 1200:

link Link1 udp - udp 50000 1200 10.10.10.10 -
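
For clusters with many nodes, the edit in step 4 can be sketched in Python. This is an illustration based only on the two example layouts in this article, not a supported tool. It assumes the fifth field of a "link" line is either "ether" or "udp", that the MTU is the last field on Ethernet lines and the third field from the end on UDP lines, and that fields are separated by whitespace. The names set_llt_mtu and MTU_FIELD are hypothetical.

```python
# Position of the MTU field, counted from the end of a "link" line.
# Assumption taken from the example lines in this article.
MTU_FIELD = {"ether": -1, "udp": -3}


def set_llt_mtu(llttab_text, mtu=1200):
    """Return LLTTab.txt content with the MTU set on every 'link' line."""
    updated = []
    for line in llttab_text.splitlines():
        fields = line.split()
        # Only "link" lines carry an MTU field; all other lines are copied unchanged.
        if len(fields) >= 7 and fields[0] == "link" and fields[4] in MTU_FIELD:
            fields[MTU_FIELD[fields[4]]] = str(mtu)
            line = " ".join(fields)
        updated.append(line)
    return "\n".join(updated) + "\n"
```

The function works on text rather than on the file itself, so the result can be reviewed before it is written back to LLTTab.txt on each node. Running it a second time leaves lines that already have an MTU value unchanged.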

5.  Start HAD to confirm that the cluster is able to form.

Run the following command on just one node to start HAD on all systems:

C:\> hastart

6.  Run the following command to check the cluster status:

C:\> hasys -state

All nodes should now be listed in a RUNNING state.

Note: if this command is run too soon, the other nodes may not have had enough time to start.
Repeat the command until all nodes show as RUNNING.
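
The check in step 6 can be automated when the command has to be repeated. The Python sketch below is an illustration only. It assumes the output layout shown under Error Message above: a header line starting with "#" and one row per node containing SysState followed by the state. The function name all_nodes_running is hypothetical.

```python
def all_nodes_running(hasys_output):
    """Return True if every SysState row in `hasys -state` output reports RUNNING."""
    states = []
    for line in hasys_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header line, which starts with "#".
        if not fields or fields[0].startswith("#"):
            continue
        if "SysState" in fields:
            # The state is the last field on the row.
            states.append(fields[-1])
    # An output with no SysState rows is treated as "not running".
    return bool(states) and all(state == "RUNNING" for state in states)
```

The output text can be captured in whatever way suits the environment, for example with subprocess.run(["hasys", "-state"], capture_output=True, text=True).stdout, and the check repeated after a short pause until it returns True.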


Applies To

SFW-HA 5.1 SP2

SFW-HA 6.0

SFW-HA 6.0.1

Windows 2008 Standard (x64-64bit)

LLT is configured to use UDP

Issue/Introduction

 When using LLT over UDP, the cluster may fail to form, and not all nodes may be able to take up cluster membership. The cluster will not form because a node is stuck in a REMOTE_BUILD state.