When consulting the main.cf, all three main entities required by Clusterware (the Voting device, the PrivNIC device and the OCR repository) were configured and online, so there was no obvious reason for the cssd resource failing to come online. A look at the ocssd.log file in $CRSHOME/log on the node in question revealed the following messages:
...
[CSSD]2010-06-09 14:32:37.550 [20] >TRACE: clssnmRcfgMgrThread: LocalJoin
[CSSD]2010-06-09 14:32:37.550 [20] >Warning: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[CSSD]2010-06-09 14:32:38.415 [8] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(527667) LATS(179894445
...
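When ocssd.log is large, it helps to pull out just the warnings. Below is a minimal Python sketch that parses lines in the layout seen in the excerpt above. The regular expression is inferred from those few lines, not from any documented Oracle log format, and the names `parse_lines` and `warnings_only` are hypothetical.

```python
import re
import sys
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator

# Layout inferred from the excerpt above (an assumption, not a documented format):
# [COMPONENT]YYYY-MM-DD HH:MM:SS.mmm [thread] >LEVEL: function: message
LINE_RE = re.compile(
    r"^\[(?P<component>\w+)\]"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>\d+)\] >(?P<level>\w+): "
    r"(?P<func>\w+): (?P<msg>.*)$"
)


@dataclass
class CssdEntry:
    timestamp: datetime
    thread: int
    level: str
    function: str
    message: str


def parse_lines(lines: Iterable[str]) -> Iterator[CssdEntry]:
    """Yield parsed entries, silently skipping lines that do not match the layout."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        yield CssdEntry(
            timestamp=datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f"),
            thread=int(m["thread"]),
            level=m["level"].upper(),  # the log mixes "TRACE" and "Warning"
            function=m["func"],
            message=m["msg"],
        )


def warnings_only(lines: Iterable[str]) -> list[CssdEntry]:
    """Keep only the entries logged at warning level."""
    return [e for e in parse_lines(lines) if e.level == "WARNING"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python cssd_warnings.py /path/to/ocssd.log
    with open(sys.argv[1], errors="replace") as f:
        for entry in warnings_only(f):
            print(entry.timestamp, entry.function, entry.message)
```

Lines that do not fit the inferred layout (such as truncated or continuation lines) are skipped rather than raising, since partial output is usually more useful than none when scanning a log.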
However, a check on the Voting device confirmed that heartbeating was fine, so the PrivNIC device was checked next. PrivNIC manages an IP address on each node that Clusterware's OCSSD processes use to communicate with their peer processes on the other nodes. From each node, a ping of the other node's PrivNIC IP address failed. Once the networking issue had been resolved and the PrivNIC IP addresses could be pinged successfully, the cssd resource was able to come online.
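The reachability check can be scripted so it is repeated the same way from every node. This is a minimal Python sketch, assuming the PrivNIC addresses are supplied by hand as a node-to-address mapping (the node name `nodeB` used in the example is hypothetical) and that the system `ping` is on the PATH. The Solaris and Linux argument forms below are assumptions about those platforms' `ping` utilities, so check them against your systems.

```python
import platform
import subprocess
from typing import Callable, Mapping


def ping_command(address: str, timeout_s: int = 2) -> list[str]:
    """Build a single-probe ping command.

    Assumption: Solaris ping takes 'host [timeout]' and exits 0 when the host
    answers; Linux (iputils) ping uses -c for the probe count and -W for the
    timeout in seconds.
    """
    if platform.system() == "SunOS":
        return ["ping", address, str(timeout_s)]
    return ["ping", "-c", "1", "-W", str(timeout_s), address]


def default_probe(address: str) -> bool:
    """Return True if a single ping to the address succeeds."""
    result = subprocess.run(
        ping_command(address),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable_peers(
    local_node: str,
    privnic_addresses: Mapping[str, str],
    probe: Callable[[str], bool] = default_probe,
) -> list[str]:
    """Return the peer nodes whose PrivNIC address does not answer from here.

    The local node's own address is not probed. `probe` can be replaced,
    for example with a stub in tests.
    """
    return [
        node
        for node, addr in sorted(privnic_addresses.items())
        if node != local_node and not probe(addr)
    ]
```

Run on nodeA with `{"nodeA": "10.0.0.5", "nodeB": "10.0.0.6"}`, a non-empty result such as `["nodeB"]` would point at the kind of interconnect problem described here, before the cssd resource is even brought online.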
A sample PrivNIC resource is shown below:
PrivNIC ora_priv (
    Critical = 0
    Device@nodeA = { nxge3 = 0, nxge15 = 1 }
    Device@nodeB = { nxge3 = 0, nxge15 = 1 }
    Address@nodeA = "10.0.0.5"
    Address@nodeB = "10.0.0.6"
    NetMask = "255.255.255.240"
)
...
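Because the per-node attributes in a PrivNIC block are easy to get wrong when lines are copied and edited, a quick sanity check can catch duplicate node keys, nodes missing a Device or Address entry, and addresses that do not share a subnet under the given NetMask. This is a minimal Python sketch that only understands the simple one-attribute-per-line layout of the sample; it is not a full main.cf parser, and `check_privnic` is a hypothetical helper name.

```python
import ipaddress
import re
from collections import defaultdict

# Per-system attribute, e.g.  Address@nodeA = "10.0.0.5"
LOCAL_ATTR_RE = re.compile(r'^\s*(?P<attr>\w+)@(?P<node>[\w.-]+)\s*=\s*(?P<value>.+?)\s*$')
# Cluster-wide attribute, e.g.  NetMask = "255.255.255.240"
GLOBAL_ATTR_RE = re.compile(r'^\s*(?P<attr>\w+)\s*=\s*(?P<value>.+?)\s*$')


def check_privnic(block: str) -> list[str]:
    """Return human-readable problems found in a PrivNIC resource block."""
    problems: list[str] = []
    per_node: dict[str, dict[str, str]] = defaultdict(dict)  # attr -> {node: value}
    global_attrs: dict[str, str] = {}

    for line in block.splitlines():
        if m := LOCAL_ATTR_RE.match(line):
            attr, node, value = m["attr"], m["node"], m["value"].strip('"')
            if node in per_node[attr]:
                problems.append(f"duplicate {attr}@{node}")
            per_node[attr][node] = value
        elif m := GLOBAL_ATTR_RE.match(line):
            global_attrs[m["attr"]] = m["value"].strip('"')

    # Every node should define both an interface list and an address.
    nodes = set(per_node["Device"]) | set(per_node["Address"])
    if len(nodes) < 2:
        problems.append(f"only {len(nodes)} node(s) defined; no peer to talk to")
    for node in sorted(nodes):
        for attr in ("Device", "Address"):
            if node not in per_node[attr]:
                problems.append(f"missing {attr}@{node}")

    # Addresses must be distinct and (assumed here) on one common subnet.
    addresses = per_node["Address"]
    if len(set(addresses.values())) != len(addresses):
        problems.append("two nodes share the same PrivNIC address")
    netmask = global_attrs.get("NetMask")
    if netmask and addresses:
        networks = {
            ipaddress.IPv4Network(f"{addr}/{netmask}", strict=False)
            for addr in addresses.values()
        }
        if len(networks) > 1:
            problems.append("PrivNIC addresses are not in the same subnet")
    return problems
```

Fed a block in which both Device lines and both Address lines name the same node, this reports the duplicates and the missing peer, which is much easier to spot than a cssd resource that silently refuses to come online.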