Symptom of Veritas Cluster Server needing to be restarted: "VCS WARNING V-16-1-10367 Dump already in progress"


Article ID: 100007906



Description

Error Message

# haconf -dump
VCS WARNING V-16-1-10367 Dump already in progress

 

After a node was rebooted, it remained in the following state:
 

adelscott  SysState           CURRENT_DISCOVER_WAIT

(as reported by 'hasys -state' on another node of the cluster and in the engine log)
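To spot a node stuck in a state such as CURRENT_DISCOVER_WAIT, the 'hasys -state' output can be filtered for systems that are not RUNNING. This is a minimal sketch that assumes the three-column layout (system, attribute, value) shown in the line above; the function name not_running_systems is hypothetical.

```shell
#!/bin/sh
# Read `hasys -state` output on stdin and print every system whose
# SysState is anything other than RUNNING.
# Assumption: each data line is "<system> SysState <value>" and
# header lines start with '#'.
not_running_systems() {
    awk '$1 !~ /^#/ && $2 == "SysState" && $3 != "RUNNING" { print $1 ": " $3 }'
}

# Usage on a live cluster node (not run here):
#   hasys -state | not_running_systems
```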

Cause

Unknown

Resolution

1)  Use 'ps -aef' to find the process IDs (PIDs) of the had and hashadow processes. Repeat steps 1 and 2 on every node in the cluster.

 

# ps -aef|grep ha
root 4135 1 0 14:24:57 ? 0:00 /opt/VRTSvcs/bin/hashadow
root 4019 1 0 14:24:55 ? 0:08 /opt/VRTSvcs/bin/had
root 4283 1 0 14:25:05 ? 0:08 /opt/VRTSvcs/bin/Phantom/PhantomAgent -type Phantom
root 5527 2459 0 14:26:10 ? 0:02 /opt/VRTSsfmh/bin/hareg -all -group -resource -clus -sys -rclus -rsys -rgroup -
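Because 'grep ha' also matches agents and other processes, the two PIDs can be picked out more precisely. The following is a small sketch, assuming the PID is in the second column of the 'ps -aef' output as shown above; the function name vcs_engine_pids is hypothetical.

```shell
#!/bin/sh
# Read `ps -aef` output on stdin and print the PIDs of had and hashadow
# on one line, separated by spaces.
# A line matches when any field is a path ending in /had or /hashadow,
# so agents such as PhantomAgent and tools such as hareg are skipped.
vcs_engine_pids() {
    awk '{
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\/(had|hashadow)$/) {
                printf "%s%s", sep, $2   # field 2 is the PID
                sep = " "
                break
            }
        }
    }
    END { if (sep) print "" }'
}

# Usage on a cluster node (not run here):
#   ps -aef | vcs_engine_pids
```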

2)  Kill both PIDs in a single command so that neither process has a chance to restart the other.

(this aborts the VCS engine but leaves production services running)

 

# kill 4135 4019

 

Use 'ps -aef|grep ha' to verify that both processes have been stopped.

 

3)  Determine whether I/O fencing is running (port b in the 'gabconfig -a' output) and, if it is, unconfigure it on all nodes of the cluster.

 

# gabconfig -a

GAB Port Memberships

=============================

Port a gen 286101 membership 01

Port b gen 286105 membership 01 <===

Port h gen 286104 membership 01

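(The membership value in the last column lists the node IDs on which the port is running; "01" means nodes 0 and 1. Port a is GAB, port b is I/O fencing, and port h is the VCS engine, had.)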

 

# vxfenconfig -U

 

Run 'gabconfig -a' to validate that port b has been dropped from the output.
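The check for a given port can be scripted rather than read by eye. This is a sketch that assumes the "Port <letter> gen <number> membership <ids>" line format shown in the output above; the function name gab_has_port is hypothetical.

```shell
#!/bin/sh
# Read `gabconfig -a` output on stdin.
# Exit status 0 if the given port letter is listed, 1 if it is not.
gab_has_port() {
    awk -v port="$1" '
        $1 == "Port" && $2 == port { found = 1 }
        END { exit found ? 0 : 1 }'
}

# Usage on a cluster node (not run here):
#   if gabconfig -a | gab_has_port b; then
#       echo "port b still present: fencing is still configured"
#   fi
```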

 

4)  Unconfigure gab on all nodes of the cluster

 

# gabconfig -U

 

Run 'gabconfig -a' to validate that no ports are listed in the output.

 

5)  Restart gab on all nodes.

 

# gabconfig -c -n<# of nodes>

 

After all nodes have been seeded, validate that gab has started on all nodes.

 

# gabconfig -a

GAB Port Memberships

===============================================================

Port a gen 286101 membership 01
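To confirm that every node has joined after seeding, the number of node IDs in the port a membership can be compared with the expected node count. This sketch assumes a small cluster whose membership is a plain run of single-digit node IDs, as in the two-node example above; the function name gab_member_count is hypothetical.

```shell
#!/bin/sh
# Read `gabconfig -a` output on stdin and print how many node IDs appear
# in the membership of the given port (0 if the port is not listed).
# Only "membership" lines are counted; any other line type is ignored.
gab_member_count() {
    awk -v port="$1" '
        $1 == "Port" && $2 == port && $5 == "membership" {
            ids = $6
            gsub(/[^0-9]/, "", ids)   # keep digits only
            n = length(ids)
        }
        END { print n + 0 }'
}

# Usage on a two-node cluster (not run here):
#   [ "$(gabconfig -a | gab_member_count a)" -eq 2 ] && echo "GAB seeded on both nodes"
```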

 

6)  Restart I/O fencing on all nodes if it was determined to be configured in step 3.

 

# vxfenconfig -c

 

After starting I/O fencing on all nodes, validate that it has started on all nodes.

 

# gabconfig -a

GAB Port Memberships

===============================================================

Port a gen 286101 membership 01

Port b gen 286109 membership 01

 

7)  Restart had (VCS engine) on all nodes

 

# hastart

 

After starting had on all nodes, validate that it has started on all nodes.

 

# gabconfig -a

GAB Port Memberships

===============================================================

Port a gen 286101 membership 01

Port b gen 286109 membership 01

Port h gen 286106 membership 01

 

After the cluster and service groups have started and been processed, use 'hastatus -sum' to view a summary of the cluster status.
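The order of steps 2 through 7 can be summarized with a small dry-run helper that only prints the commands and executes nothing. It is a sketch under the assumptions of this article: each printed command is run on all nodes before moving to the next one, and the PIDs are placeholders to be filled in per node. The function name print_restart_plan is hypothetical.

```shell
#!/bin/sh
# Print, in order, the commands from steps 2-7. Nothing is executed.
#   $1 = number of nodes in the cluster
#   $2 = "yes" if port b (I/O fencing) was present in step 3
print_restart_plan() {
    nodes=$1
    fencing=$2
    echo "kill <hashadow pid> <had pid>"
    if [ "$fencing" = "yes" ]; then
        echo "vxfenconfig -U"
    fi
    echo "gabconfig -U"
    echo "gabconfig -c -n$nodes"
    if [ "$fencing" = "yes" ]; then
        echo "vxfenconfig -c"
    fi
    echo "hastart"
}

# Example: two-node cluster with fencing configured
#   print_restart_plan 2 yes
```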

 

Applies To

A failover cluster running Veritas Cluster Server (VCS) version 5.0MP1RP5 on Solaris 10 systems.

 

Similar symptoms of command hanging and no logging taking place have been reported for other VCS versions and other supported Unix Operating Systems.

Issue/Introduction

No entries were being logged to the engine log on one or more nodes. Attempts to dump the configuration with 'haconf -dump' failed with warning V-16-1-10367, rebooted nodes did not rejoin the cluster, and 'hastop -local -force' hung. Recovery required stopping had, unconfiguring gab, and re-forming the cluster.