VCS MultiNICB resource sees frequent link UP / DOWN messages


Article ID: 100008390


Description

Error Message

From VCS engine_A.log:

2012/04/27 01:19:33 VCS INFO V-16-10001-6557 (n933) MultiNICB:Network_MultiNICB:monitor:Device: igb0 went from Up to Down
2012/04/27 01:19:42 VCS INFO V-16-10001-6556 (n933) MultiNICB:Network_MultiNICB:monitor:Device: igb0 went from Down to Up

2012/04/27 18:23:26 VCS INFO V-16-10001-6557 (n933) MultiNICB:Network_MultiNICB:monitor:Device: igb0 went from Up to Down
2012/04/27 18:23:36 VCS INFO V-16-10001-6556 (n933) MultiNICB:Network_MultiNICB:monitor:Device: igb0 went from Down to Up

 

From MultiNICB agent (debug) log:

2012/05/29 03:01:41 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:970 haping status for igb0 =100 MultiNICB.C:checkStatus[970]
2012/05/29 03:01:42 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:970 haping status for igb1 =111 MultiNICB.C:checkStatus[970]
2012/05/29 03:01:42 VCS INFO V-16-10001-6557 MultiNICB:Network_MultiNICB:monitor:Device: igb1 went from Up to Down

 

Cause

The VCS MultiNICB agent uses haping (/opt/VRTSvcs/bin/MultiNICB/haping) to check link health. haping sends ICMP request packets to the configured NetworkHosts and waits for a reply for the NetworkTimeout interval (default 100 msec). If a reply is received within this period, haping returns 100 (link up); otherwise it returns an error code.

haping returns different error codes depending on the type of error.

In this case haping returns 111, which indicates a timeout. This can happen for several reasons:
- The network host that the agent is trying to reach takes too long to reply, so haping times out.
- The reply is delayed because of high network traffic.
- Network fluctuations cause request or reply packets to be dropped.
- The network host itself is down, and so on.
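
To see how often each interface hit a timeout, the haping return codes in the agent debug log can be tallied. The following is a minimal sketch, not a supported tool: the function name `haping_status_summary` is hypothetical, and it assumes the debug lines have the "haping status for <interface> = <code>" shape shown in this article. It is written for a POSIX shell.

```shell
# Hypothetical helper: count haping return codes per interface.
# $1 = path to a MultiNICB agent debug log (the path is supplied by the caller).
haping_status_summary() {
    # Pull "<interface> <code>" out of each "haping status for" line,
    # tolerating optional spaces around "=", then count identical pairs.
    sed -n 's/.*haping status for \([^ ]*\) *= *\([0-9][0-9]*\).*/\1 \2/p' "$1" |
        sort | uniq -c
}
```

Each output line holds a count, an interface name, and a return code. A high count of 111 for one interface but not the other points at that link or its path to the NetworkHosts rather than at the hosts themselves.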

When haping reports a timeout for a specific link, the MultiNICB agent retries OfflineTestRepeatCount (default 3) times before reporting the link as down; that is, haping is invoked 3 times for the same interface. The agent reports the link as down only if haping returns an error all 3 times.

2012/05/29 03:01:41 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:970 haping status for igb0 = 100 MultiNICB.C:checkStatus[970]
2012/05/29 03:01:42 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:970 haping status for igb1 = 111 MultiNICB.C:checkStatus[970]
2012/05/29 03:01:42 VCS INFO V-16-10001-6557 MultiNICB:Network_MultiNICB:monitor:Device: igb1 went from Up to Down
2012/05/29 03:01:42 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:956 calling haping on igb1 MultiNICB.C:checkStatus[956]
2012/05/29 03:01:43 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:970 haping status for igb1 = 111 MultiNICB.C:checkStatus[970]
2012/05/29 03:01:43 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:956 calling haping on igb1 MultiNICB.C:checkStatus[956]
2012/05/29 03:01:43 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:970 haping status for igb1 = 100 MultiNICB.C:checkStatus[970]
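
The retry rule described above can be sketched as a small shell function. This illustrates the described behavior only and is not the agent's actual code: the name `link_is_down` is hypothetical, and the probe command is passed in by the caller and is assumed to exit with 100 for "link up", as haping does.

```shell
# Hypothetical helper: exit 0 if the link should be reported DOWN.
# $1 = number of attempts (plays the role of OfflineTestRepeatCount),
# remaining arguments = probe command, expected to exit 100 when the link is up.
link_is_down() {
    attempts=$1
    shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        rc=0
        "$@" || rc=$?
        if [ "$rc" -eq 100 ]; then
            return 1    # one good reply is enough: the link stays up
        fi
        i=$((i + 1))
    done
    return 0            # every attempt failed: report the link as down
}
```

With 3 attempts, a single delayed reply does not mark the link down; only three consecutive failures do.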

 

 

Resolution

This condition mostly occurs because of network fluctuations or a flood of network traffic.

This can be confirmed by running the haping command and collecting the following output. The data needs to be collected while the issue is occurring, that is, when haping times out (haping return value = 111).


# /opt/VRTSvcs/bin/MultiNICB/haping -v -g

For example:

# /opt/VRTSvcs/bin/MultiNICB/haping -v -g igb1  

The output of ping to the default router or the configured NetworkHosts should also be checked.

# ping -s 10 10
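
Because the haping output is only useful when captured during a timeout, it may help to run the probe in a loop and keep the output only for runs that do not return 100. The following is a minimal sketch under stated assumptions: the function name `capture_failures`, its argument order, and the output file are hypothetical, and the probe command is supplied by the caller. It needs a POSIX shell; on Solaris 10 that means ksh, bash, or /usr/xpg4/bin/sh rather than the legacy /bin/sh.

```shell
# Hypothetical helper: run a probe repeatedly and record only the failures.
# $1 = number of iterations, $2 = seconds to sleep between runs,
# $3 = file to append to, remaining arguments = probe command.
capture_failures() {
    count=$1
    pause=$2
    outfile=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        rc=0
        output=$("$@" 2>&1) || rc=$?
        # Any exit code other than 100 is treated as a failed probe.
        if [ "$rc" -ne 100 ]; then
            {
                echo "=== $(date '+%Y/%m/%d %H:%M:%S') rc=$rc"
                echo "$output"
            } >> "$outfile"
        fi
        i=$((i + 1))
        sleep "$pause"
    done
}
```

For example, `capture_failures 3600 1 /tmp/haping_igb1.out /opt/VRTSvcs/bin/MultiNICB/haping -v -g igb1` would probe igb1 about once a second for an hour, using the haping invocation shown above. The timestamps in the output file can then be matched against the Up to Down messages in engine_A.log.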

One possible workaround is to increase the NetworkTimeout value from the default 100 ms to a larger value, for example 1000 ms, as follows:

# haconf -makerw
# hares -modify Network_MultiNICB  NetworkTimeout 1000
# haconf -makero -dump

Applies To

Solaris 10

VCS 5.1SP1

MultiNICB resource configured in Base Mode (UseMpathd = 0)

 

Issue/Introduction

Frequent link UP / DOWN messages are seen for the VCS MultiNICB resource.

Additional Information

ETrack: 2804288