VCS reports LDom resource as OFFLINE unexpectedly when the LDom is heavily loaded

Article ID: 100023481

Description

Error Message

VCS WARNING V-16-10001-19526 (unix1) LDom:prd2_ldom:monitor:The guest operating system is heavily loaded.

The LDom is then reported as offline:

VCS ERROR V-16-2-13067 (unix1) Agent is calling clean for resource(LDOM_prd02) because the resource became OFFLINE unexpectedly, on its own.
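To check whether a node has logged this sequence, the two message IDs above can be searched for together. The following is a minimal sketch: the function name `find_ldom_load_events` is hypothetical, and the log path is supplied by the caller because the location of the engine or agent log depends on the installation.

```shell
#!/bin/sh
# Sketch: list log lines carrying either message ID quoted in this article.
# The log file path is an argument; no default location is assumed here.
find_ldom_load_events() {
    logfile=$1
    # -E enables alternation so both IDs are matched in a single pass.
    grep -E 'V-16-10001-19526|V-16-2-13067' "$logfile"
}
```

For example, `find_ldom_load_events /var/VRTSvcs/log/engine_A.log`, assuming the engine log is in that commonly used location. A heavy-load warning followed shortly by the "calling clean" error for the same resource matches the behaviour described in this article.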

 

Cause

This behaviour is by design: VCS fails over or restarts LDoms that appear to be hung at 100% CPU usage.

Resolution

If it is undesirable for VCS to take the LDom offline in this situation, set the MonitorCPU attribute to 0 for the resource:

# haconf -makerw
# hares -modify prod2_ldom MonitorCPU 0
# haconf -dump -makero
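The three commands above can be wrapped so that they are always run together. This is only a sketch: the helper names `run` and `disable_monitor_cpu` and the `DRY_RUN` variable are hypothetical, and by default the script prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: apply the per-resource change as one unit.
# DRY_RUN defaults to 1 so the commands are only printed; set DRY_RUN=0
# on a cluster node to actually execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

disable_monitor_cpu() {
    resource=$1
    status=0
    # Open the cluster configuration for writing.
    run haconf -makerw || return 1
    # Turn off CPU monitoring for this one resource.
    run hares -modify "$resource" MonitorCPU 0 || status=$?
    # Always save and close the configuration, even if the modify failed.
    run haconf -dump -makero || return 1
    return "$status"
}
```

Calling `disable_monitor_cpu prod2_ldom` in dry-run mode prints the same three commands shown above. After a real run, `hares -value prod2_ldom MonitorCPU` should report 0.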

 

Alternatively, change the default behaviour for all LDom resources:


# haconf -makerw
# haattr -default LDom MonitorCPU 0
# haconf -dump -makero
 

To reduce the sensitivity to CPU usage reaching 100% rather than disabling the check, increase the ToleranceLimit attribute of the LDom type. The configuration must be writable, as in the examples above:

# hatype -modify LDom ToleranceLimit 2

With this setting, the agent delays declaring the resource as faulted by 2 consecutive monitor cycles in which the LDom CPU is 100% busy.

Note that increasing ToleranceLimit also delays resource failover in genuine fault conditions, so choose a value that suits the requirements of the local cluster.
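The trade-off can be estimated with simple arithmetic: the extra time before a fault is declared is roughly ToleranceLimit multiplied by the monitor interval. The sketch below uses a hypothetical helper, `extra_fault_delay`, and the 60-second interval is an assumed value, not one taken from this article.

```shell
#!/bin/sh
# Sketch: estimate how much later a fault is declared for a given
# ToleranceLimit. Both arguments are supplied by the caller; the interval
# is in seconds. This assumes monitor cycles run at a fixed interval.
extra_fault_delay() {
    tolerance_limit=$1
    monitor_interval=$2
    echo $((tolerance_limit * monitor_interval))
}

# ToleranceLimit 2 with an assumed 60-second monitor interval.
extra_fault_delay 2 60   # prints 120
```

The interval actually configured for the type can be read with `hatype -value LDom MonitorInterval` and substituted for the assumed 60.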

 


Applies To

VCS on LDOM environments.

Issue/Introduction

The VCS LDom agent has a MonitorCPU attribute, which is used to monitor the CPU usage of LDoms. It is set to 1 by default. If all the CPUs of the LDom are at 100% or at 0% usage, the LDom agent reports the resource as faulted.
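The rule described above can be illustrated with a short sketch. This is not the agent's actual code: the function name `cpu_rule` and its input format (one utilization percentage per CPU, as whole numbers) are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch of the rule described above, NOT the agent's implementation:
# given one utilization percentage per CPU, print "faulted" when every
# CPU is at 100 or every CPU is at 0, and "ok" otherwise.
cpu_rule() {
    # With no CPU values there is nothing to evaluate.
    [ "$#" -gt 0 ] || { echo "ok"; return 0; }
    all_busy=1
    all_idle=1
    for util in "$@"; do
        [ "$util" -eq 100 ] || all_busy=0
        [ "$util" -eq 0 ] || all_idle=0
    done
    if [ "$all_busy" -eq 1 ] || [ "$all_idle" -eq 1 ]; then
        echo "faulted"
    else
        echo "ok"
    fi
}

cpu_rule 100 100 100   # prints faulted
cpu_rule 100 37 100    # prints ok
cpu_rule 0 0           # prints faulted
```

A single CPU below 100% is enough to avoid the fault, which is why the issue is seen only when the guest is saturated on every CPU.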