The customer has a 3-node SFORARAC cluster. They added some Volume resources to
the configuration, and after rebooting all of the nodes, they did not come up
properly. The customer would like to know if there is a configuration error, and
what caused this to happen.
Looking at the main.cf:
group CFSRAC-EEO-PRDEC1R (
    SystemList = { db100w48m3 = 0, db101w48m3 = 1, db102w48m3 = 2 }
    AutoFailOver = 0
    Parallel = 1
    AutoStartList = { db100w48m3, db101w48m3, db102w48m3 }
    )

    CFSMount PRDEC1R-exp_CFSMount (
        Critical = 0
        MountPoint = "/etrade/prd/eeo/dbs/PRDEC1R/bkup/exp"
        BlockDevice = "/dev/vx/dsk/CFS-PRD1-db100w48m3dg/prd1_exp"
        MountOpt @db100w48m3 = "suid,rw"
        MountOpt @db101w48m3 = "suid,rw"
        MountOpt @db102w48m3 = "suid,rw"
        NodeList = { db100w48m3, db101w48m3, db102w48m3 }
        )

    CVMVolDg PRDEC1R-EEO_CVMVolDg (
        Critical = 0
        CVMDiskGroup = CFS-PRD1-db100w48m3dg
        CVMActivation @db100w48m3 = sw
        CVMActivation @db101w48m3 = sw
        CVMActivation @db102w48m3 = sw
        )

    Volume PRDEC1R-exp_VOL (
        DiskGroup = CFS-PRD1-db100w48m3dg
        Volume = prd1_exp
        )
Here are the errors seen in the engine_A.log:
2009/05/12 16:21:23 VCS ERROR V-16-10031-12502 (db101w48m3) Volume:PRDEC1R-exp_VOL:offline:Could not stop volume prd1_exp.
2009/05/12 16:21:25 VCS ERROR V-16-2-13064 (db101w48m3) Agent is calling clean for resource(PRDEC1R-exp_VOL) because the resource is up even after offline completed.
2009/05/12 16:21:25 VCS ERROR V-16-10031-12504 (db101w48m3) Volume:PRDEC1R-exp_VOL:clean:Could not stop volume prd1_exp even after using forceful option.
2009/05/12 16:21:26 VCS ERROR V-16-2-13069 (db101w48m3) Resource(PRDEC1R-exp_VOL) - clean failed.
This also prevents the dependent CFSMount resource from coming online, because
the underlying volume cannot be stopped.
2009/05/12 16:21:41 VCS ERROR V-16-2-13065 (db101w48m3) Agent is calling clean for resource(PRDEC1R-data1_CFSMount) because online did not complete within the expected time.
2009/05/12 16:21:43 VCS ERROR V-16-1-10303 Resource PRDEC1R-data1_CFSMount (Owner: unknown, Group: CFSRAC-EEO-PRDEC1R) is FAULTED (timed out) on sys db101w48m3
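When reviewing a longer engine_A.log, it can help to pull out just the timestamp and message ID of each entry to see the order of events. The following is a minimal Python sketch, not a supported tool. It assumes each log entry sits on a single line in the layout shown above; the names LINE_RE and parse_engine_log are hypothetical and chosen for illustration only.

```python
import re

# Assumed layout: "<date> <time> VCS <SEVERITY> <V-x-y-z> <message text>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS "
    r"(?P<severity>[A-Z]+) (?P<msg_id>V-\d+-\d+-\d+) (?P<text>.*)$"
)


def parse_engine_log(lines):
    """Return one dict per line that matches the assumed layout."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            entries.append(m.groupdict())
    return entries


# Three of the entries quoted in this article.
LOG = [
    "2009/05/12 16:21:23 VCS ERROR V-16-10031-12502 (db101w48m3) "
    "Volume:PRDEC1R-exp_VOL:offline:Could not stop volume prd1_exp.",
    "2009/05/12 16:21:26 VCS ERROR V-16-2-13069 (db101w48m3) "
    "Resource(PRDEC1R-exp_VOL) - clean failed.",
    "2009/05/12 16:21:43 VCS ERROR V-16-1-10303 Resource "
    "PRDEC1R-data1_CFSMount (Owner: unknown, Group: CFSRAC-EEO-PRDEC1R) "
    "is FAULTED (timed out) on sys db101w48m3",
]

for entry in parse_engine_log(LOG):
    print(entry["ts"], entry["severity"], entry["msg_id"])
```

Lines that do not match the assumed layout (for example, wrapped continuation lines) are skipped and not reported.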
To correct this, remove all of the Volume resources from the CFS service groups.
The CVMVolDg resource monitors all of the volumes, and these are checked by the
CFSMount agent, so there is no need for the Volume resources.
Also, Volume resources are not meant to be used in a parallel service group,
only in failover service groups. But since all of these volumes are part of a
shared diskgroup, and the shared volumes are known by the CVMVolDg agent, just
remove the Volume resources from the main.cf, and this will resolve your issue.
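To locate every Volume resource that needs to be removed, the main.cf can be scanned for Volume resources defined in groups that set Parallel = 1. The following is a minimal Python sketch of that check, not a full main.cf parser. It assumes the usual layout in which each group or resource header ("group NAME (" or "Type NAME (") sits on one line and each closing parenthesis sits on a line of its own. The function name volume_resources_in_parallel_groups and the SAMPLE text (a trimmed copy of the configuration above) are hypothetical and for illustration only.

```python
import re

# Header of a service group, e.g. "group CFSRAC-EEO-PRDEC1R (".
GROUP_RE = re.compile(r"^\s*group\s+(\S+)\s*\(")
# Header of a resource, e.g. "Volume PRDEC1R-exp_VOL (".
RESOURCE_RE = re.compile(r"^\s*([A-Za-z]\w*)\s+(\S+)\s*\(")
# Group attribute that marks the group as parallel.
PARALLEL_RE = re.compile(r"^\s*Parallel\s*=\s*1\b")


def volume_resources_in_parallel_groups(main_cf_text):
    """Return {group: [Volume resource names]} for groups with Parallel = 1."""
    parallel = set()
    volumes = {}
    group = None
    in_group_header = False
    for line in main_cf_text.splitlines():
        m = GROUP_RE.match(line)
        if m:
            group = m.group(1)
            in_group_header = True
            continue
        if in_group_header:
            # Still inside the group's own attribute block.
            if PARALLEL_RE.match(line):
                parallel.add(group)
            if line.strip() == ")":
                in_group_header = False
            continue
        m = RESOURCE_RE.match(line)
        if m and group is not None and m.group(1) == "Volume":
            volumes.setdefault(group, []).append(m.group(2))
    return {g: names for g, names in volumes.items() if g in parallel}


SAMPLE = """\
group CFSRAC-EEO-PRDEC1R (
    SystemList = { db100w48m3 = 0, db101w48m3 = 1, db102w48m3 = 2 }
    AutoFailOver = 0
    Parallel = 1
    )

    CVMVolDg PRDEC1R-EEO_CVMVolDg (
        CVMDiskGroup = CFS-PRD1-db100w48m3dg
        )

    Volume PRDEC1R-exp_VOL (
        DiskGroup = CFS-PRD1-db100w48m3dg
        Volume = prd1_exp
        )
"""

print(volume_resources_in_parallel_groups(SAMPLE))
```

The sketch only reports candidates; it does not modify the configuration. Attribute lines such as "Volume = prd1_exp" are not mistaken for resource headers because they contain no opening parenthesis.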