MultiNICB monitor timeout "NIC" went from Up to Down

Article ID: 100003960

Description

Error Message

The following error messages appear in engine_A.log:
================================================
2010/09/23 11:49:13 VCS ERROR V-16-2-13027 (atrcxb953) Resource(pub_mnic) - monitor procedure did not complete within the expected time.
2010/09/23 11:49:55 VCS ERROR V-16-2-13027 (atrcxb946) Resource(pub_mnic) - monitor procedure did not complete within the expected time.
2010/09/23 11:51:46 VCS ERROR V-16-2-13027 (atrcxb946) Resource(stor_mnic) - monitor procedure did not complete within the expected time.
2010/09/23 11:52:24 VCS ERROR V-16-2-13027 (atrcxb953) Resource(pub_mnic) - monitor procedure did not complete within the expected time.
2010/09/23 11:52:56 VCS INFO V-16-10001-6557 (atrcxb946) MultiNICB:pub_mnic:monitor:Device:  bnxe0 went from Up to Down
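
To gauge how often the monitor timeouts occur and which resources they affect, the V-16-2-13027 lines can be tallied per host and resource. The following is a minimal sketch that assumes the log line format shown above; the function name `count_monitor_timeouts` and the embedded sample are illustrative and not part of VCS.

```python
import re
from collections import Counter

# Matches monitor-timeout lines in the format shown in the excerpt above, e.g.
# 2010/09/23 11:49:13 VCS ERROR V-16-2-13027 (host) Resource(name) - monitor procedure ...
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS ERROR V-16-2-13027 "
    r"\((?P<host>[^)]+)\) Resource\((?P<resource>[^)]+)\)"
)


def count_monitor_timeouts(lines):
    """Return a Counter keyed by (host, resource) for V-16-2-13027 lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            counts[(match["host"], match["resource"])] += 1
    return counts


# Sample copied from the excerpt above; in practice, pass the open log file instead.
SAMPLE = """\
2010/09/23 11:49:13 VCS ERROR V-16-2-13027 (atrcxb953) Resource(pub_mnic) - monitor procedure did not complete within the expected time.
2010/09/23 11:49:55 VCS ERROR V-16-2-13027 (atrcxb946) Resource(pub_mnic) - monitor procedure did not complete within the expected time.
2010/09/23 11:51:46 VCS ERROR V-16-2-13027 (atrcxb946) Resource(stor_mnic) - monitor procedure did not complete within the expected time.
2010/09/23 11:52:24 VCS ERROR V-16-2-13027 (atrcxb953) Resource(pub_mnic) - monitor procedure did not complete within the expected time.
2010/09/23 11:52:56 VCS INFO V-16-10001-6557 (atrcxb946) MultiNICB:pub_mnic:monitor:Device:  bnxe0 went from Up to Down
""".splitlines()

if __name__ == "__main__":
    for (host, resource), n in sorted(count_monitor_timeouts(SAMPLE).items()):
        print(f"{host} {resource}: {n}")
```

The INFO line reporting the Up to Down transition is deliberately not counted, so the tally reflects only cancelled monitor procedures.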


2010/09/27 10:41:32 VCS DBG_AGDEBUG V-16-50-0 Thread(1111) Calling monitor for resource stor_mnic
2010/09/27 10:41:32 VCS DBG_AGDEBUG V-16-50-0 Thread(1111) Agent framework version is 2
2010/09/27 10:41:32 VCS DBG_4 V-16-50-0 MultiNICB:stor_mnic:monitor: In checkStatus:889 calling haping on bnxe4  <=== Never returned

2010/09/27 10:42:42 VCS DBG_4 V-16-50-0 MultiNICB:stor_mnic:monitor: In checkStatus:889 calling haping on bnxe4  <--- this returned when a new thread was started
2010/09/27 10:42:42 VCS DBG_4 V-16-50-0 MultiNICB:pub_mnic:monitor: In checkStatus:889 calling haping on bnxe0
2010/09/27 10:42:42 VCS DBG_4 V-16-50-0 MultiNICB:stor_mnic:monitor: In checkStatus:896 haping status for bnxe4 = 100
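
The delay between a haping call and its reported status can be measured directly from the debug lines. The sketch below assumes the DBG_4 message format shown above and measures from the earliest call that has not yet been answered. The names `haping_durations` and `DEBUG_SAMPLE` are hypothetical, and the code does not interpret the meaning of the status value.

```python
import re
from datetime import datetime

PREFIX = r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) .*MultiNICB:(?P<res>[^:]+):monitor: "
CALL_RE = re.compile(PREFIX + r"In checkStatus:\d+ calling haping on (?P<dev>\S+)")
STATUS_RE = re.compile(
    PREFIX + r"In checkStatus:\d+ haping status for (?P<dev>\S+) = (?P<status>\d+)"
)


def parse_ts(text):
    """Parse the timestamp format used in the engine and agent logs."""
    return datetime.strptime(text, "%Y/%m/%d %H:%M:%S")


def haping_durations(lines):
    """Return (resource, device, seconds, status) for each haping status line.

    The duration is measured from the earliest unanswered call for the same
    resource and device, so a repeated call line does not reset the clock.
    """
    pending = {}
    results = []
    for line in lines:
        line = line.strip()
        call = CALL_RE.match(line)
        if call:
            key = (call["res"], call["dev"])
            pending.setdefault(key, parse_ts(call["ts"]))
            continue
        status = STATUS_RE.match(line)
        if status:
            started = pending.pop((status["res"], status["dev"]), None)
            if started is not None:
                elapsed = (parse_ts(status["ts"]) - started).total_seconds()
                results.append(
                    (status["res"], status["dev"], elapsed, int(status["status"]))
                )
    return results


# Sample copied from the debug excerpt above (annotations removed).
DEBUG_SAMPLE = """\
2010/09/27 10:41:32 VCS DBG_4 V-16-50-0 MultiNICB:stor_mnic:monitor: In checkStatus:889 calling haping on bnxe4
2010/09/27 10:42:42 VCS DBG_4 V-16-50-0 MultiNICB:stor_mnic:monitor: In checkStatus:889 calling haping on bnxe4
2010/09/27 10:42:42 VCS DBG_4 V-16-50-0 MultiNICB:pub_mnic:monitor: In checkStatus:889 calling haping on bnxe0
2010/09/27 10:42:42 VCS DBG_4 V-16-50-0 MultiNICB:stor_mnic:monitor: In checkStatus:896 haping status for bnxe4 = 100
""".splitlines()

if __name__ == "__main__":
    for res, dev, seconds, status in haping_durations(DEBUG_SAMPLE):
        print(f"{res} {dev}: {seconds:.0f}s, status {status}")
```

For the excerpt above, the only completed probe is bnxe4 on stor_mnic, 70 seconds after the first call was logged; the call for bnxe0 has no status line in the sample and is left pending.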

 

Resolution

This issue is fixed by hotfix patch vcs-sol_x64-VRTSvcsag-5.1RP2HF2. Please contact Veritas Technical Support to obtain the patch.

After applying the patch, set the NumThread attribute of the MultiNICB type to 10 by running the following commands:

#haconf -makerw
#hatype -modify MultiNICB NumThread 10
#haconf -dump -makero

Applies To

VCS 5.1 RP2 on Solaris x86_64 with a MultiNICB resource

Cause

The debug log shows that the haping call for interface bnxe4 never returned. The VCS monitoring framework cancels a monitor procedure that does not complete successfully within 300 seconds. Because haping did not return for more than 300 seconds, the monitor procedure was cancelled. haping was also observed to fail intermittently.
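
The cancellation described above follows a general supervision pattern: a probe is run with a deadline and abandoned once the deadline passes. The sketch below illustrates that pattern only. It is not the VCS agent framework's code, the child processes are stand-ins for haping, and the names `run_probe`, `HUNG_PROBE`, and `FAST_PROBE` are hypothetical. Timeouts are shortened so the example finishes quickly.

```python
import subprocess
import sys


def run_probe(command, timeout_seconds):
    """Run a probe command and return 'ok', 'failed', or 'timed out'."""
    try:
        completed = subprocess.run(
            command,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return "timed out"
    return "ok" if completed.returncode == 0 else "failed"


# Stand-ins for a probe that hangs and one that returns promptly.
HUNG_PROBE = [sys.executable, "-c", "import time; time.sleep(30)"]
FAST_PROBE = [sys.executable, "-c", "pass"]

if __name__ == "__main__":
    print("fast probe:", run_probe(FAST_PROBE, 10))
    print("hung probe:", run_probe(HUNG_PROBE, 0.5))
```

A probe that hangs is indistinguishable from a real outage when only the final result is inspected, which is why a stuck haping call can surface as a NIC going from Up to Down even though the network is healthy.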

This behaviour is a known issue with the MultiNICB agent. The solution described in this article can be used to avoid these messages.

 

 

Issue/Introduction

The customer experiences MultiNICB monitor timeouts although the MultiNICB resource is up and no network problems are seen. Messages stating that a NIC went from Up to Down are logged in engine_A.log. During monitoring, MultiNICB uses the haping binary to ping the host defined in NetworkHost or the broadcast address. If haping does not complete successfully, the agent logs the messages shown under Error Message.