A Volume Manager Disk Group in SFW 5.1 for a Windows Failover Cluster may take a prolonged time to go offline

Article ID: 100003370

Description

Error Message

INFORMATION      1001(0x000003e9)      Windows Error Reporting HOSTNAME     
Fault bucket , type 0 Event Name: WSFC Resource Deadlock

Cause

The default restart settings in WSFC are such that a faulting resource is terminated and then retried once before being declared offline.

In this case, the cluster discovers the fault, calls Terminate, and then attempts to bring the resource online again. Storage Foundation for Windows detects that the Disk Group is not present and returns a status of Pending. This creates a deadlock that lasts until either RHS.exe terminates (through hang protection) or the online pending timeout is reached.

Resolution

 

A private fix is available for this issue. It changes the return code from Pending to Failed, which causes the online retry to fail and allows WSFC to take the resource offline.
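The effect of the changed return code can be illustrated with a small model. The following Python sketch is not SFW or WSFC code; the names `OnlineResult` and `simulate_retry` are hypothetical, and the 300-second wait limit is only an illustrative value taken from the "up to 5 minutes" observation in this article, not a documented default. It shows why a Pending result keeps the resource waiting while a Failed result lets the retry end immediately.

```python
from enum import Enum


class OnlineResult(Enum):
    """Status returned by the (modelled) online retry. Hypothetical names."""
    PENDING = "Pending"
    FAILED = "Failed"


def simulate_retry(online_result, wait_limit_s=300, poll_s=5):
    """Return the modelled seconds spent waiting before the resource can go offline.

    Assumption: wait_limit_s stands in for whichever limit ends the wait
    (hang protection or the online pending timeout). The value 300 is illustrative.
    """
    elapsed = 0
    # While the retry reports Pending, the cluster keeps waiting until the limit.
    while online_result is OnlineResult.PENDING and elapsed < wait_limit_s:
        elapsed += poll_s
    # A Failed result never enters the loop, so the retry ends without waiting.
    return elapsed


if __name__ == "__main__":
    for result in OnlineResult:
        print(result.value, simulate_retry(result), "seconds")
```

In this model the Pending case waits for the whole limit, while the Failed case returns at once, which corresponds to the behavior change described above.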

To obtain the private fix, contact Veritas Enterprise Technical Support and reference this article during the call. A support representative will assist in troubleshooting the issue. If it is determined that the private fix addresses the problem, the support representative will help you obtain it.

Note: This fix specifically addresses the problem identified above. It has not been fully tested and should be applied in a test environment before it is placed into production. If the systems are not critically impaired, it is recommended to delay installation of this private fix until the next scheduled maintenance release. Before this private fix is applied, systems may need to be upgraded to the latest code base. The support representative will help determine the best course of action.

This fix also incorporates an earlier fix for a similar issue. Follow the URL in the Related Articles section for further information.

File information:

Filename       File version
cluscmd.dll    5.1.10036.584
vxres.dll      5.1.10036.584
vxvm.dll       5.1.10036.584
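
To check whether a node already carries these file versions, the comparison can be sketched in Python as below. This is an illustrative helper, not a Veritas tool: the function names are hypothetical, and the sketch assumes that the installed versions have already been read by some other means (for example, from the file properties) and that a numerically higher version also contains the fix.

```python
# File versions listed in this article for the private fix.
FIXED_VERSIONS = {
    "cluscmd.dll": "5.1.10036.584",
    "vxres.dll": "5.1.10036.584",
    "vxvm.dll": "5.1.10036.584",
}


def parse_version(text):
    """Turn a dotted version string such as '5.1.10036.584' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def files_needing_fix(installed):
    """Return the listed files that are missing or older than the fix version.

    `installed` maps file name to version string. How those versions are
    collected is outside this sketch.
    """
    return sorted(
        name
        for name, required in FIXED_VERSIONS.items()
        if name not in installed
        or parse_version(installed[name]) < parse_version(required)
    )


if __name__ == "__main__":
    # Hypothetical example input: one file is older than the fix version.
    example = {
        "cluscmd.dll": "5.1.10036.584",
        "vxres.dll": "5.1.10036.100",
        "vxvm.dll": "5.1.10036.584",
    }
    print(files_needing_fix(example))
```

Comparing the versions as integer tuples avoids the ordering errors of plain string comparison, for example treating "5.1.9999.0" as newer than "5.1.10036.584".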

 

 

Applies To

Windows Server 2008 Failover Clustering

Storage Foundation for Windows 5.1 Service Pack 1

Issue/Introduction

During a storage failure in which all disks are removed from a Windows Server 2008 Failover Cluster (WSFC) node, it may take up to 5 minutes for any Volume Manager Disk Group (VMDg) resources that are online on that node to be taken offline in the cluster. The WSFC resource monitoring process (RHS.exe) may also fault, and Windows Error Reporting may log messages relating to resource deadlocks.

Additional Information

ETrack: 2139735