MultiNICB monitor timeout issue

Article ID: 100003959

Description

Error Message

The following error messages appear in engine_A.log:
================================================
2010/09/23 11:49:13 VCS ERROR V-16-2-13027 (atrcxb953) Resource(pub_mnic) - monitor procedure did not complete within the expected time.
2010/09/23 11:49:55 VCS ERROR V-16-2-13027 (atrcxb946) Resource(pub_mnic) - monitor procedure did not complete within the expected time.
2010/09/23 11:51:46 VCS ERROR V-16-2-13027 (atrcxb946) Resource(stor_mnic) - monitor procedure did not complete within the expected time.
2010/09/23 11:52:24 VCS ERROR V-16-2-13027 (atrcxb953) Resource(pub_mnic) - monitor procedure did not complete within the expected time.
2010/09/23 11:52:56 VCS INFO V-16-10001-6557 (atrcxb946) MultiNICB:pub_mnic:monitor:Device: bnxe0 went from Up to Down

The haping command gives the following messages:
=======================================
# /opt/VRTSvcs/bin/MultiNICB/haping -v -i bnxe0 -g 10.44.193.41 bnxe0
# /opt/VRTSvcs/bin/MultiNICB/haping -l -i bnxe0 bnxe0
2010/09/27 10:41:32 VCS DBG_AGDEBUG V-16-50-0 Thread(1111) Calling monitor for resource stor_mnic
2010/09/27 10:41:32 VCS DBG_AGDEBUG V-16-50-0 Thread(1111) Agent framework version is 2
2010/09/27 10:41:32 VCS DBG_4 V-16-50-0 MultiNICB:stor_mnic:monitor: In checkStatus:889 calling haping on bnxe4 <=== never returned
2010/09/27 10:42:42 VCS DBG_4 V-16-50-0 MultiNICB:stor_mnic:monitor: In checkStatus:889 calling haping on bnxe4 <--- this returned only when a new thread was started
2010/09/27 10:42:42 VCS DBG_4 V-16-50-0 MultiNICB:pub_mnic:monitor: In checkStatus:889 calling haping on bnxe0
2010/09/27 10:42:42 VCS DBG_4 V-16-50-0 MultiNICB:stor_mnic:monitor: In checkStatus:896 haping status for bnxe4 = 100
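To see how often each resource hits the monitor timeout, the V-16-2-13027 messages can be tallied per host and resource. The following is a minimal sketch, not a supported tool: the function name count_monitor_timeouts is hypothetical, and the field positions assume the log line layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: tally V-16-2-13027 monitor timeouts per host and resource.
# Assumes the line layout shown in this article:
#   date time VCS ERROR V-16-2-13027 (host) Resource(name) - monitor procedure ...
count_monitor_timeouts() {
    # $1: path to an engine log file
    awk '/V-16-2-13027/ {
        host = $6; res = $7
        gsub(/[()]/, "", host)          # "(atrcxb953)" -> "atrcxb953"
        sub(/^Resource\(/, "", res)     # "Resource(pub_mnic)" -> "pub_mnic)"
        sub(/\)$/, "", res)             # "pub_mnic)" -> "pub_mnic"
        n[host " " res]++
    }
    END { for (k in n) print n[k], k }' "$1" | sort -k2,3
}
```

Usage, assuming the engine log is in its usual location: count_monitor_timeouts /var/VRTSvcs/log/engine_A.log. For the four error lines shown above, it reports one timeout each for pub_mnic and stor_mnic on atrcxb946 and two timeouts for pub_mnic on atrcxb953.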

Cause

The logs above show that the haping process called for an interface (bnxe4 in the debug output) never returned, and the monitor thread was then cancelled. The logs also show that haping is failing intermittently.
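One way to confirm a hung haping call is to measure the time between a "calling haping on" debug line and the matching "haping status for" line for the same device. This is a rough sketch under stated assumptions: the function name haping_latency is hypothetical, the field positions assume the DBG_4 line layout shown above, the earliest outstanding call per device is used as the start time, and a midnight rollover is not handled.

```shell
#!/bin/sh
# Hypothetical helper: report seconds between the first outstanding
# "calling haping on <dev>" line and the next "haping status for <dev>" line.
# Output columns: device, elapsed seconds, haping status value.
haping_latency() {
    # $1: path to an agent debug log
    awk '
    # Convert HH:MM:SS to seconds since midnight.
    function secs(t,  p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    /calling haping on/ {
        dev = $12
        if (!(dev in start)) start[dev] = secs($2)   # keep the earliest call
    }
    /haping status for/ {
        dev = $12
        if (dev in start) {
            print dev, secs($2) - start[dev], $14
            delete start[dev]
        }
    }' "$1"
}
```

Run against the debug lines shown above, this prints "bnxe4 70 100": the call made at 10:41:32 only produced a status at 10:42:42, 70 seconds later.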

Resolution

Workaround:
==========
As a workaround, set NumThread=1 for the MultiNICB resource.
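The workaround can be sketched as the command sequence below. This assumes the hatype syntax given under Permanent Fix also applies with a value of 1; the function name set_numthread and the RUN variable are hypothetical. By default the function only prints the commands so they can be reviewed first; set RUN=1 on a cluster node to execute them.

```shell
#!/bin/sh
# Hypothetical wrapper around the commands from this article.
# Prints the commands by default; executes them only when RUN=1.
set_numthread() {
    # $1: desired NumThread value for the MultiNICB type
    for cmd in "haconf -makerw" \
               "hatype -modify MultiNICB NumThread $1" \
               "haconf -dump -makero"; do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd || return 1    # stop at the first failing command
        else
            echo "$cmd"         # dry run: show what would be executed
        fi
    done
}
```

Calling set_numthread 1 without RUN=1 prints the three commands in order: make the configuration writable, change the attribute, then save the configuration and make it read-only again.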

Permanent Fix:
==============
To fix the issue permanently, upgrade the server to 5.1RP2HF2.
vcs-sol_x64-VRTSvcsag-5.1RP2HF2
After applying the patch, set NumThread for MultiNICB to 10 by running the following commands:
# haconf -makerw
# hatype -modify MultiNICB NumThread 10
# haconf -dump -makero

Applies To

Solaris X86_64 with 5.1RP2

Issue/Introduction

The customer is experiencing MultiNICB monitor timeouts.