LLT link does not recover after a physical cable loss is repaired on a 1000BASE-T link on Solaris

Article ID: 100004522

Description

Error Message

The VCS/LLT testing utility dlpiping will show that the link layer is down, and DLPI packets cannot traverse the network.

In this example, the Solaris host vcs_server2 has had a cable failure on the /dev/nxge3 LLT interface, and the link has not recovered after the cable was fixed.

vcs_server2 # /opt/VRTSllt/dlpiping -cv /dev/nxge:3 00:14:4F:6A:2C:B3
dlpiping: opening network device: /dev/nxge (unit 3)
dlpiping: binding ping SAP 0xf00e
dlpiping: sent a request to 00:14:4F:6A:2C:FFFFFFB3:FFFFFFF0:0E
dlpiping: sent a request to 00:14:4F:6A:2C:FFFFFFB3:FFFFFFF0:0E
dlpiping: sent a request to 00:14:4F:6A:2C:FFFFFFB3:FFFFFFF0:0E
dlpiping: sent a request to 00:14:4F:6A:2C:FFFFFFB3:FFFFFFF0:0E
dlpiping: sent a request to 00:14:4F:6A:2C:FFFFFFB3:FFFFFFF0:0E
dlpiping: sent a request to 00:14:4F:6A:2C:FFFFFFB3:FFFFFFF0:0E
dlpiping: sent a request to 00:14:4F:6A:2C:FFFFFFB3:FFFFFFF0:0E
    

 

 vcs_server2 # /opt/VRTS/bin/lltstat -vvn

     0 vcs_server1         OPEN
                                  nxge0   UP      00:14:4F:6A:36:B0
                                  nxge1   UP      00:14:4F:6A:36:B1
                                  nxge2   UP      00:14:4F:6A:36:B2
                                  nxge3   DOWN
   * 1 vcs_server2         OPEN
                                  nxge0   UP      00:14:4F:6A:2C:70
                                  nxge1   UP      00:14:4F:6A:2C:71
                                  nxge2   UP      00:14:4F:6A:2C:72
                                  nxge3   UP      00:14:4F:6A:2C:73

Cause

Sun does not recommend that Symantec customers adopt a policy of forcing interface speed and duplex settings.

Auto-negotiation (AN; auto-neg; autoneg) is defined in:
IEEE Std 802.3u clause 28       (Fast Ethernet)
IEEE Std 802.3z clause 37       (Gigabit Ethernet)
Note: In 802.3z specifications, auto-negotiation is mandatory. All devices which are
802.3z compliant, MUST have auto-negotiation enabled by default.

The latest Solaris Ethernet devices and drivers (hme, qfe, eri, dmfe, ge, ce) are fully 802.3 compliant.
Even though these drivers have the ability to change auto-neg, speed and duplex settings, Sun's preferred (and recommended) way is to auto-negotiate and not disable auto-neg capabilities!

More Information from Sun:

Oracle(Sun): Recommended Ethernet Port Configuration (Auto-Negotiation or Manual Configuration) [ID 1006000.1]
Oracle(Sun): Should auto-negotiation be changed to force the speed and mode on Ethernet interface adapters? [ID 1004579.1]

 

Although this article is specific to Solaris, the IEEE 802.3 standard makes autonegotiation (adv_autoneg_cap=1 on Solaris) essential for reliable 1000BASE-T operation on any operating system platform.

Resolution

Veritas insists that any 1000BASE-T link used for LLT or TCP/IP retain its adv_autoneg_cap=1 tunable.

 

If there is concern that the links may 'train' to a lower speed, Veritas recommends turning off the advertised capability for the slower link speeds while leaving autonegotiation on.

For example, common settings that fix a link at 1000BASE-T with autonegotiation on:

adv-autoneg-cap = 1 adv_1000fdx_cap = 1 adv_100fdx_cap = 0 adv_10fdx_cap = 0;

 

If a link has NOT recovered because autoneg was turned off, using the ndd command to enable autoneg on the NIC instance is enough to recover a link that will not restart.

For example, to enable autonegotiation on /dev/nxge3 without a reboot:

# ndd -set /dev/nxge instance 3
# ndd -set /dev/nxge adv_autoneg_cap 1
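When more than one instance needs the same change, it can help to generate the commands first and review them before running anything. The following is a minimal sketch only: print_autoneg_cmds is a hypothetical helper name, and it simply repeats the two ndd commands from the example above for each instance given. It prints the commands and does not execute them.

```shell
# Print (do not run) the ndd commands that re-enable autonegotiation on
# the given instances of a driver, using the two-step form shown in this
# article. print_autoneg_cmds is a hypothetical helper name.
print_autoneg_cmds() {
    driver=$1
    shift
    for inst in "$@"; do
        # Select the instance, then set the tunable on it.
        echo "ndd -set /dev/$driver instance $inst"
        echo "ndd -set /dev/$driver adv_autoneg_cap 1"
    done
}
```

For instance, `print_autoneg_cmds nxge 1 3` prints the command pair for nxge1 followed by the pair for nxge3. A change made with ndd does not persist across a reboot, so the driver configuration file should still be corrected.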

Confirm that the nxge3 link is UP by looking at the latest syslog messages:

# grep nxge3 /var/adm/messages | tail

Confirm that LLT has recovered the link:

# /opt/VRTS/bin/lltstat -vvn
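On clusters with many links, the DOWN entries can be picked out of the lltstat output with a small filter. This is a minimal sketch rather than a Veritas-supplied tool: llt_down_links is a hypothetical helper name, and it assumes the output layout shown in the example earlier in this article (a node header line followed by one line per link). On Solaris, nawk or /usr/xpg4/bin/awk may be a safer choice than the default awk.

```shell
# List LLT links reported as DOWN in `lltstat -vvn` output read on stdin.
# llt_down_links is a hypothetical helper name, not part of VCS.
llt_down_links() {
    awk '
        NF == 0 { next }                      # skip blank lines
        # Link lines: "<link> UP <MAC>" or "<link> DOWN"
        $2 == "UP" || $2 == "DOWN" {
            if ($2 == "DOWN") print node, $1
            next
        }
        # Node header lines: "[*] <id> <name> <state>"; the node name is
        # the second-to-last field whether or not the "*" is present.
        NF >= 3 { node = $(NF - 1) }
    '
}
```

Usage would be `/opt/VRTS/bin/lltstat -vvn | llt_down_links`. An empty result means no link is reported as DOWN from this node's point of view.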

Applies To

This particular behaviour is only an issue for cards that are capable of running at 1000BASE-T speeds and higher. Forcing speed and duplex settings is not enough to allow a 1000BASE-T link to recover after a physical link loss (cable pull), even if some cards allow link_master to be set.

This particular example uses Sun NXGE interface cards, which are capable of speeds from 10BASE-T to 10GBASE-T.

1000BASE-T and 10GBASE-T both have a mandatory requirement for autonegotiation to be enabled.

Note: NIC tunables are generally set in the specific driver's /kernel/drv/.conf file, but they can also be set in custom-written startup scripts that adjust the NIC tunables using the Solaris 'ndd' command. Check both places when looking for a setting that disables autonegotiation.

Example of /kernel/drv/nxge.conf:

name = "pciex108e,abcd" parent = "/pci@0,600000/pci@0/pci@9" unit-address = "0" adv-autoneg-cap = 0 adv_10gfdx_cap = 0 adv_1000fdx_cap = 0 adv_100fdx_cap = 1 adv_10fdx_cap = 0;
name = "pciex108e,abcd" parent = "/pci@0,600000/pci@0/pci@9" unit-address = "0,1" adv-autoneg-cap = 0 adv_10gfdx_cap = 0 adv_1000fdx_cap = 1 adv_100fdx_cap = 0 adv_10fdx_cap = 0;
name = "pciex108e,abcd" parent = "/pci@0,600000/pci@0/pci@9" unit-address = "0,2" adv-autoneg-cap = 0 adv_10gfdx_cap = 0 adv_1000fdx_cap = 0 adv_100fdx_cap = 1 adv_10fdx_cap = 0;
name = "pciex108e,abcd" parent = "/pci@0,600000/pci@0/pci@9" unit-address = "0,3" adv-autoneg-cap = 0 adv_10gfdx_cap = 0 adv_1000fdx_cap = 1 adv_100fdx_cap = 0 adv_10fdx_cap = 0;
 
vcs_server1 # dladm show-dev  
bge0            link: up        speed: 100   Mbps       duplex: full 
nxge0           link: up        speed: 100   Mbps       duplex: full  
nxge1           link: up        speed: 1000  Mbps       duplex: full  
nxge2           link: up        speed: 100   Mbps       duplex: full  
nxge3           link: down      speed: 1000  Mbps       duplex: full  

 

The output shows that interfaces nxge1 and nxge3 are configured as 1000BASE-T, but have autonegotiation turned OFF in /kernel/drv/nxge.conf.

Disabling autonegotiation on 1000BASE-T links is not supported, because the link layer cannot adequately 'renegotiate' a lost link when autoneg is off, even if the NIC driver is tuned with link_master parameters. The link_master parameters are ONLY supported for back-to-back NIC configurations, and have been found to have issues even in that scenario. Running at 1000 Mbit speeds without autonegotiation enabled is simply not recommended on NICs capable of 1000BASE-T speeds or higher.
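A configuration file can be checked for this pattern before a cable fault exposes it. The sketch below is illustrative only: find_forced_gigabit is a hypothetical helper name, and it assumes each device entry sits on a single line and uses the property spellings shown in the nxge.conf example above (adv-autoneg-cap and adv_1000fdx_cap). It reads the file on stdin and prints the unit-address of every entry that advertises 1000 Mbit full duplex with autonegotiation disabled.

```shell
# Report nxge.conf entries that advertise 1000 Mbit full duplex while
# autonegotiation is disabled. Reads the file content on stdin.
# find_forced_gigabit is a hypothetical helper name.
find_forced_gigabit() {
    # Drop comment lines, then normalise "key = value" to "key=value"
    # so that each property becomes a single whitespace-separated field.
    sed -e '/^[[:space:]]*#/d' -e 's/[[:space:]]*=[[:space:]]*/=/g' |
    awk '
        /adv-autoneg-cap=0/ && /adv_1000fdx_cap=1/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^unit-address=/) print $i
        }
    '
}
```

Usage would be `find_forced_gigabit < /kernel/drv/nxge.conf`. Tunables applied by startup scripts through ndd are not covered by this check.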

Issue/Introduction

If a physical link is disconnected, the ability of the link to self-recover (without a reboot) is severely limited if autonegotiation is not set in the NIC configuration file.