Veritas Cluster Server (VCS) CVMVolDg & CFSMount monitor timeouts - causing VCS CFSMount resources to go offline


Article ID: 100019909


Resolution

The CVMVolDg and CFSMount agents have known bottlenecks. Most Veritas Cluster Server (VCS) clusters behave better after the following CVMVolDg & CFSMount resource type attribute changes:
 
Make VCS configuration read-writable:
 
# haconf -makerw

# hatype -modify CFSMount OnlineTimeout 600
# hatype -modify CFSMount MonitorInterval 75
# hatype -modify CFSMount MonitorTimeout 75
# hatype -modify CFSMount FaultOnMonitorTimeouts 0
# hatype -modify CFSMount NumThreads 8
 
# hatype -modify CVMVolDg ToleranceLimit 2
# hatype -modify CVMVolDg MonitorInterval 75
# hatype -modify CVMVolDg MonitorTimeout 75
# hatype -modify CVMVolDg FaultOnMonitorTimeouts 0
# hatype -modify CVMVolDg NumThreads 8  

Save changes and make VCS configuration read-only:

# haconf -dump -makero
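
The same sequence can be scripted so that both agent types receive identical settings. The following is only a sketch: the function name apply_tuning and the RUN variable are hypothetical, and RUN defaults to echo so that the commands are printed for review rather than executed.

```shell
#!/bin/sh
# Sketch only: apply_tuning and RUN are illustrative names, not part of VCS.
# RUN defaults to "echo" (dry run). Set RUN to an empty string to execute.
RUN="${RUN-echo}"

apply_tuning() {
    # Open the VCS configuration for writing
    $RUN haconf -makerw

    # Attribute changed on CFSMount only
    $RUN hatype -modify CFSMount OnlineTimeout 600
    # Attribute changed on CVMVolDg only
    $RUN hatype -modify CVMVolDg ToleranceLimit 2

    # Attributes changed identically on both types
    for type in CFSMount CVMVolDg; do
        $RUN hatype -modify "$type" MonitorInterval 75
        $RUN hatype -modify "$type" MonitorTimeout 75
        $RUN hatype -modify "$type" FaultOnMonitorTimeouts 0
        $RUN hatype -modify "$type" NumThreads 8
    done

    # Save the configuration and make it read-only again
    $RUN haconf -dump -makero
}

apply_tuning
```

Review the printed commands first. On a cluster node, running the script with RUN set to an empty string (for example, RUN= sh script_name) would execute them as root. The resulting values can then be checked with hatype -display CFSMount and hatype -display CVMVolDg.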
 

Attributes Explained:

NumThreads: Limits how many monitors run at the same time.

Decreasing NumThreads may lengthen the time it takes to online, offline, and probe resources.

ToleranceLimit: The number of times the monitor can report the resource as faulted before any action is taken.
By default, the CVMVolDg agent does not import or deport VxVM shared disk groups (unless CVMDeportOnOffline = 1).

The VCS CVMCluster agent enables the Volume Manager cluster functionality by automatically importing shared disk groups. It depends on the VxVM vxconfigd daemon, so the tolerance gives vxconfigd time to recover.

Sample main.cf resources:

        CVMCluster cvm_clus (
                CVMClustName = pri_clust
                CVMNodeId = { nodea = 0, nodeb = 1 }
                CVMTransport = gab
                CVMTimeout = 200
                )

        CVMVxconfigd cvm_vxconfigd (
                Critical = 0
                CVMVxconfigdArgs = { syslog }
                )

The CVMVxconfigd agent starts and monitors the VxVM vxconfigd daemon. The vxconfigd daemon maintains disk and disk group configurations, communicates configuration changes to the kernel, and modifies the configuration information that is stored on disks. A CVMVxconfigd resource must be present in the CVM service group.

 

NOTE: The VxVM shared diskgroup is imported from the CVM master node, if the disk group is not already imported.

If the CVMDeportOnOffline attribute is set to 1 and if the shared disk group does not contain open volumes on any node in the cluster, the disk group is deported from the CVM master node.
 
Example from the latest 7.4.2 private hot-fix:
 
        CVMVolDg srdf_dg (
                CVMDiskGroup = SRDFDG
                CVMVolume = { data }
                CVMActivation = sw
                CVMDeportOnOffline = 1
                ClearClone = 1
                DGOptions = "-t"
                ScanDisks = 1
                )
 

Increasing ToleranceLimit means that actual faults are ignored.  

NOTE: Do not increase ToleranceLimit without first consulting Veritas Technical Support or Professional Services, or without extensive testing.
 
MonitorInterval: Must be greater than or equal to MonitorTimeout. Check for error 13026 in engine_A.log as a guide to how long the monitors actually need to run.
 
FaultOnMonitorTimeouts: The number of monitor timeouts before a fault is declared.
Setting it to zero disables this behavior. Real resource faults are not affected.
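
The MonitorInterval rule above can be checked before any values are changed. The following is a minimal sketch; check_monitor_pair is a hypothetical helper, not a VCS command, and it only compares two numbers supplied by the caller.

```shell
#!/bin/sh
# Sketch only: check_monitor_pair is an illustrative helper, not part of VCS.
# Usage: check_monitor_pair <MonitorInterval> <MonitorTimeout>
check_monitor_pair() {
    interval=$1
    timeout=$2
    # The interval must be greater than or equal to the timeout
    if [ "$interval" -ge "$timeout" ]; then
        echo "OK: MonitorInterval ($interval) >= MonitorTimeout ($timeout)"
        return 0
    fi
    echo "WARN: MonitorInterval ($interval) < MonitorTimeout ($timeout)"
    return 1
}

# Values recommended in this article
check_monitor_pair 75 75
```

To find the messages that indicate how long the monitors run, a search such as grep 13026 /var/VRTSvcs/log/engine_A.log can be used, assuming the log is in the default VCS log directory.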
 
Please see the VCS Users Guide for details on these attributes.

Issue/Introduction


Veritas Cluster Server (VCS) CVMVolDg & CFSMount monitor timeouts can cause VCS CFSMount resources to go offline.