After adding Volume resources to SFCFS configuration, service groups are faulting.

Article ID: 100021020

Resolution

The customer has a 3-node SFORARAC cluster. After they added some Volume resources to the configuration and rebooted all of the nodes, the service groups did not come up properly. The customer would like to know whether there is a configuration error and what caused this to happen.

Looking at the main.cf:

group CFSRAC-EEO-PRDEC1R (
SystemList = { db100w48m3 = 0, db101w48m3 = 1, db102w48m3 = 2 }
AutoFailOver = 0
Parallel = 1
AutoStartList = { db100w48m3, db101w48m3, db102w48m3 }
)

CFSMount PRDEC1R-exp_CFSMount (
Critical = 0
MountPoint = "/etrade/prd/eeo/dbs/PRDEC1R/bkup/exp"
BlockDevice = "/dev/vx/dsk/CFS-PRD1-db100w48m3dg/prd1_exp"
MountOpt @db100w48m3 = "suid,rw"
MountOpt @db101w48m3 = "suid,rw"
MountOpt @db102w48m3 = "suid,rw"
NodeList = { db100w48m3, db101w48m3, db102w48m3 }
)

CVMVolDg PRDEC1R-EEO_CVMVolDg (
Critical = 0
CVMDiskGroup = CFS-PRD1-db100w48m3dg
CVMActivation @db100w48m3 = sw
CVMActivation @db101w48m3 = sw
CVMActivation @db102w48m3 = sw
)

Volume PRDEC1R-exp_VOL (
DiskGroup = CFS-PRD1-db100w48m3dg
Volume = prd1_exp
)

These are the errors seen in the engine_A.log:

2009/05/12 16:21:23 VCS ERROR V-16-10031-12502 (db101w48m3) Volume:PRDEC1R-exp_VOL:offline:Could not stop volume prd1_exp.

2009/05/12 16:21:25 VCS ERROR V-16-2-13064 (db101w48m3) Agent is calling clean for resource(PRDEC1R-exp_VOL) because the resource is up even after offline completed.

2009/05/12 16:21:25 VCS ERROR V-16-10031-12504 (db101w48m3) Volume:PRDEC1R-exp_VOL:clean:Could not stop volume prd1_exp even after using forceful option.

2009/05/12 16:21:26 VCS ERROR V-16-2-13069 (db101w48m3) Resource(PRDEC1R-exp_VOL) - clean failed.

This also prevents the dependent CFSMount resource from coming online, because the underlying volume cannot be stopped.

2009/05/12 16:21:41 VCS ERROR V-16-2-13065 (db101w48m3) Agent is calling clean for resource(PRDEC1R-data1_CFSMount) because online did not complete within the expected time.

2009/05/12 16:21:43 VCS ERROR V-16-1-10303 Resource PRDEC1R-data1_CFSMount (Owner: unknown, Group: CFSRAC-EEO-PRDEC1R) is FAULTED (timed out) on sys db101w48m3

To correct this, remove all of the Volume resources from the CFS service groups. The CVMVolDg resource monitors all of the volumes, and the CFSMount agent checks them, so the Volume resources are not needed.
Also, Volume resources are meant to be used in failover service groups, not in parallel service groups. Since all of these volumes are part of a shared diskgroup, and the shared volumes are known to the CVMVolDg agent, removing the Volume resources from the main.cf will resolve the issue.
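
As a rough sketch of the change described above, the service group from the excerpt might look like this once the Volume resource is removed. The resource definitions are copied from the main.cf shown earlier. The "requires" line is an assumption: the original excerpt does not show any dependency lines, so compare it against the links actually present in your configuration. Any existing link that references PRDEC1R-exp_VOL would also have to be removed.

```
group CFSRAC-EEO-PRDEC1R (
SystemList = { db100w48m3 = 0, db101w48m3 = 1, db102w48m3 = 2 }
AutoFailOver = 0
Parallel = 1
AutoStartList = { db100w48m3, db101w48m3, db102w48m3 }
)

CFSMount PRDEC1R-exp_CFSMount (
Critical = 0
MountPoint = "/etrade/prd/eeo/dbs/PRDEC1R/bkup/exp"
BlockDevice = "/dev/vx/dsk/CFS-PRD1-db100w48m3dg/prd1_exp"
MountOpt @db100w48m3 = "suid,rw"
MountOpt @db101w48m3 = "suid,rw"
MountOpt @db102w48m3 = "suid,rw"
NodeList = { db100w48m3, db101w48m3, db102w48m3 }
)

CVMVolDg PRDEC1R-EEO_CVMVolDg (
Critical = 0
CVMDiskGroup = CFS-PRD1-db100w48m3dg
CVMActivation @db100w48m3 = sw
CVMActivation @db101w48m3 = sw
CVMActivation @db102w48m3 = sw
)

// The Volume resource PRDEC1R-exp_VOL has been removed from this parallel group.

// Assumed dependency, not shown in the original excerpt:
PRDEC1R-exp_CFSMount requires PRDEC1R-EEO_CVMVolDg
```

The same removal applies to every other Volume resource defined in the CFS service groups.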


