LVM resources are in the W_OFFLINE_PROPAGATE status after consecutively rebooting the nodes in a cluster

Article ID: 100041997

Description

Error Message

-- SYSTEM STATE
-- System               State                Frozen

A  server101            RUNNING              0
A  server102            RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  lvm1            server101            Y          N               PARTIAL
B  lvm1            server102            Y          N               STOPPING|PARTIAL
B  lvm2            server101            Y          N               PARTIAL
B  lvm2            server102            Y          N               STOPPING|PARTIAL

-- RESOURCES OFFLINING
-- Group           Type            Resource             System               IState

G  lvm1            LVMLogicalVolume vg02_u03             server102            W_OFFLINE_PROPAGATE
G  lvm1            LVMLogicalVolume vg02_u04             server102            W_OFFLINE_PROPAGATE
G  lvm2            LVMLogicalVolume vg01_u01             server102            W_OFFLINE_PROPAGATE
G  lvm2            LVMLogicalVolume vg01_u02             server102            W_OFFLINE_PROPAGATE


2018/02/28 01:42:19 VCS INFO V-16-1-10297 Resource vg01_u02 (Owner: Unspecified, Group: lvm2) is online on server102 (First probe)
2018/02/28 01:42:19 VCS ERROR V-16-1-10252 Concurrency Violation:CurrentCount increased above 1 for group lvm2
2018/02/28 01:42:19 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group lvm2 on all nodes
2018/02/28 01:42:19 VCS INFO V-16-1-10304 Resource u02 (Owner: Unspecified, Group: lvm2) is offline on server102 (First probe)
2018/02/28 01:42:19 VCS INFO V-16-1-10297 Resource vg02_u03 (Owner: Unspecified, Group: lvm1) is online on server102 (First probe)
2018/02/28 01:42:19 VCS ERROR V-16-1-10252 Concurrency Violation:CurrentCount increased above 1 for group lvm1
2018/02/28 01:42:19 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group lvm1 on all nodes
2018/02/28 01:42:19 VCS INFO V-16-1-10297 Resource vg01_u01 (Owner: Unspecified, Group: lvm2) is online on server102 (First probe)
2018/02/28 01:42:19 VCS ERROR V-16-1-10252 Concurrency Violation:CurrentCount increased above 1 for group lvm2
2018/02/28 01:42:19 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group lvm2 on all nodes
2018/02/28 01:42:19 VCS INFO V-16-1-10297 Resource vg02_u04 (Owner: Unspecified, Group: lvm1) is online on server102 (First probe)
2018/02/28 01:42:19 VCS ERROR V-16-1-10252 Concurrency Violation:CurrentCount increased above 1 for group lvm1
2018/02/28 01:42:19 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group lvm1 on all nodes

2018/02/28 01:42:19 VCS ERROR V-16-2-13077 (server102) Agent is unable to offline resource(vg01_u02). Administrative intervention may be required.
2018/02/28 01:42:19 VCS ERROR V-16-2-13077 (server102) Agent is unable to offline resource(vg02_u04). Administrative intervention may be required.
2018/02/28 01:42:19 VCS INFO V-16-2-13068 (server102) Resource(vg01_u01) - clean completed successfully.
2018/02/28 01:42:19 VCS ERROR V-16-2-13077 (server102) Agent is unable to offline resource(vg02_u03). Administrative intervention may be required.
2018/02/28 01:42:19 VCS ERROR V-16-2-13077 (server102) Agent is unable to offline resource(vg01_u01). Administrative intervention may be required.

Cause

During system boot, the volume groups are imported and activated on both nodes, outside of VCS control. Once the HAD daemon starts and probes the resources, it finds them online on more than one node, reports a concurrency violation, and requests administrator intervention.
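To confirm this on a node, you can check whether the logical volumes are already active before HAD brings them online. The following is a minimal sketch, not taken from the product documentation. It assumes the standard LVM2 lvs command and relies on the fifth character of the lv_attr field being "a" for an active volume. The function name list_active_lvs and the VG names in the usage comment are illustrative.

```shell
#!/bin/sh
# list_active_lvs: read the output of
#   lvs --noheadings -o vg_name,lv_name,lv_attr
# on stdin and print "vg/lv" for every logical volume that is active.
# The fifth character of lv_attr is the state field; "a" means active.
list_active_lvs() {
    awk 'NF >= 3 && substr($3, 5, 1) == "a" { print $1 "/" $2 }'
}

# Example use on a cluster node (requires root; vg01 and vg02 are assumed VG names):
#   lvs --noheadings -o vg_name,lv_name,lv_attr vg01 vg02 | list_active_lvs
```

If the same logical volumes are listed on both nodes right after boot, the activation happened outside VCS, which matches the concurrency violation in the log.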

Resolution

To prevent this issue from recurring, enable VG Activation Protection.

Perform these steps on each node of the cluster:

1. Edit /etc/lvm/lvm.conf, and add the following line:
tags { hosttags = 1 }

2. Create the file /etc/lvm/lvm_$(uname -n).conf.

3. Add the following line to the file that you created in Step 2:
activation { volume_list="@node" }

Here, "node" is the output of uname -n on that system.
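Steps 2 and 3 can be scripted so that the file name and the tag always match the host name. This is a hedged sketch rather than a supported procedure: the function name write_host_lvm_conf is hypothetical, and the target directory is a parameter so the function can be tried in a scratch directory before it is pointed at /etc/lvm.

```shell
#!/bin/sh
# write_host_lvm_conf DIR [NODE]
# Creates DIR/lvm_NODE.conf containing the activation line from Step 3.
# NODE defaults to the output of `uname -n`.
# DIR is /etc/lvm on a real node; pass a scratch directory to test safely.
write_host_lvm_conf() {
    dir=$1
    node=${2:-$(uname -n)}
    printf 'activation { volume_list="@%s" }\n' "$node" > "$dir/lvm_${node}.conf"
}

# Example (as root, on each node):
#   write_host_lvm_conf /etc/lvm
```

Step 1 (adding the tags line to /etc/lvm/lvm.conf) is left as a manual edit here, because lvm.conf may already contain a tags section.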

Note: After this change, the server may fail to boot in some scenarios, because the tagging applies to all volume groups (VGs), including any that the system needs at boot.

To exclude a VG from tagging (VolGroup00 in this example), list it by name in the volume_list line in /etc/lvm/lvm_$(uname -n).conf:
activation { volume_list = [ "VolGroup00" , "@node" ] }
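When more than one VG has to stay outside the tagging scheme, the line above can be generated instead of typed by hand. The helper below is an illustrative sketch; the name activation_line is an assumption, and it only prints the line in the same array form shown above without touching any file.

```shell
#!/bin/sh
# activation_line NODE [VG ...]
# Prints an activation line whose volume_list contains each VG given by name,
# followed by the host tag "@NODE".
activation_line() {
    node=$1
    shift
    list=""
    for vg in "$@"; do
        list="${list}\"${vg}\" , "
    done
    printf 'activation { volume_list = [ %s"@%s" ] }\n' "$list" "$node"
}

# Example:
#   activation_line "$(uname -n)" VolGroup00
```

Review the printed line before writing it to /etc/lvm/lvm_$(uname -n).conf, since a mistake in volume_list can prevent VGs from activating at boot.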

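Then set the EnableLVMTagging attribute on each LVMVolumeGroup resource (vg01 and vg02 in this example). This is a change to the cluster configuration, so it only needs to be made once, from any node: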
haconf -makerw
hares -modify vg02 EnableLVMTagging 1
hares -modify vg01 EnableLVMTagging 1
haconf -dump -makero
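After the configuration is saved, the attribute can be read back to confirm the change. The sketch below assumes the standard VCS command hares -value; the function name check_lvm_tagging is hypothetical, and vg01 and vg02 are the resource names used above.

```shell
#!/bin/sh
# check_lvm_tagging RES [RES ...]
# Prints every resource whose EnableLVMTagging attribute is not 1
# and returns non-zero if any such resource is found.
check_lvm_tagging() {
    rc=0
    for res in "$@"; do
        val=$(hares -value "$res" EnableLVMTagging 2>/dev/null)
        if [ "$val" != "1" ]; then
            echo "$res: EnableLVMTagging=${val:-unknown}"
            rc=1
        fi
    done
    return $rc
}

# Example:
#   check_lvm_tagging vg01 vg02 && echo "LVM tagging enabled on all resources"
```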

The issue was tested with InfoScale 7.3.1, but it may be applicable to all VCS (Veritas Cluster Server) versions.