"Oracle CRS failure. Rebooting for cluster integrity" is reported immediately before a system reboot


Article ID: 100024390



Description

Error Message

Oracle clsomon shutdown successful.
Oracle CSSD failure 143.
Oracle CRS failure.  Rebooting for cluster integrity.

Panic string:
^Mpanic[cpu17]/thread=3002e1ced80:
unix: [ID 156897 kern.notice] forced crash dump initiated at user request
unix: [ID 100000 kern.notice]
genunix: [ID 723222 kern.notice] 000002a10789d960 genunix:kadmin+4a4 (b4, 1, 0, 1225400, 5, 1)
genunix: [ID 179002 kern.notice]   %l0-3: 000000000182b400 00000000011ddc00 0000000000000004 0000000000000004
genunix: [ID 723222 kern.notice] 000002a10789da20 genunix:uadmin+11c (60030805d98, 1, 0, ff390000, 0, 0)
genunix: [ID 179002 kern.notice]   %l0-3: 0000000001009558 000002a10789db90 000003000c5fc170 0000000001054528
unix: [ID 100000 kern.notice]
genunix: [ID 672855 kern.notice] syncing file systems...

Cause

As the message “Oracle CRS failure. Rebooting for cluster integrity” suggests, the reboot is triggered by Oracle CRS.

This behavior has been observed when the OracleCSSD resource faults because the monitoring scripts used by VCS repeatedly time out. The timeouts are usually caused by extreme CPU utilization around the time of the failure, which prevents the scripts from completing within their allotted time.

When this occurs, the Oracle CSS daemon appears to be “offline” to VCS because it cannot be monitored: the server is extremely busy and the monitoring scripts time out. As a result of these timeouts, VCS changes the resource state to “faulted”. If the Oracle CSS daemon then remains unresponsive, neither stopping nor starting, Oracle CRS may trigger a fencing mechanism that reboots the node. This mechanism appears to be intentional, designed to protect data integrity.
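The fault path described above can be pictured as a small state machine in which only sustained timeouts, not a single slow monitor cycle, move the resource to the faulted state. The sketch below is a hypothetical Python illustration, not VCS agent code: the class `MonitoredResource`, its `fault_threshold` of 4 consecutive timeouts, and the state names are assumptions chosen for illustration. In VCS itself the corresponding behavior is governed by resource type attributes (for example, `MonitorTimeout` and `FaultOnMonitorTimeouts`); check the VCS documentation for the values used in your release.

```python
from dataclasses import dataclass


@dataclass
class MonitoredResource:
    """Hypothetical model of an agent's fault logic (illustration only, not VCS code)."""
    name: str
    fault_threshold: int = 4       # assumed: consecutive monitor timeouts tolerated before faulting
    consecutive_timeouts: int = 0
    state: str = "ONLINE"

    def record_monitor_result(self, timed_out: bool) -> str:
        """Apply one monitor cycle's outcome and return the resulting state."""
        if self.state == "FAULTED":
            return self.state      # a faulted resource stays faulted until it is cleared
        if timed_out:
            self.consecutive_timeouts += 1
            if self.consecutive_timeouts >= self.fault_threshold:
                self.state = "FAULTED"
        else:
            self.consecutive_timeouts = 0  # one successful monitor cycle resets the counter
        return self.state


# A CPU-starved host where every monitor cycle times out.
cssd = MonitoredResource("OracleCSSD")
history = [cssd.record_monitor_result(timed_out=True) for _ in range(4)]
print(history)  # ['ONLINE', 'ONLINE', 'ONLINE', 'FAULTED']
```

Because a successful cycle resets the counter, intermittent slowness alone does not fault the resource in this model; it takes a stretch of consecutive timeouts, which matches the article's point that prolonged CPU saturation is the trigger.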

Resolution

Investigate CPU usage on the affected node(s), as they appear to be under heavy load. This load is what ultimately causes the monitoring timeouts and the subsequent resource fault. For further clarification of the CSSD/CRS fencing mechanism that triggers the node reboot, contact Oracle Support.
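As a first pass at the CPU investigation, load averages can be compared against the number of CPUs on the node. The sketch below is a hypothetical helper, not a Veritas or Oracle tool: the function names `saturated_intervals` and `check_this_host` and the rule of thumb that a per-CPU load above 1.0 indicates saturation are assumptions. It uses only Python's standard `os.getloadavg()` and `os.cpu_count()`, and it reports the current load only, so it would need to be sampled periodically (for example, from a scheduled job) to capture conditions around the time of a failure.

```python
import os


def saturated_intervals(loads, ncpu, threshold_per_cpu=1.0):
    """Return the load-average intervals whose per-CPU load exceeds the threshold.

    `loads` is a (1-minute, 5-minute, 15-minute) tuple as returned by os.getloadavg().
    The 1.0 default threshold is an assumed rule of thumb, not a vendor-specified limit.
    """
    labels = ("1m", "5m", "15m")
    return [label for label, load in zip(labels, loads) if load / ncpu > threshold_per_cpu]


def check_this_host(threshold_per_cpu=1.0):
    """Check the current host's load averages against its logical CPU count."""
    ncpu = os.cpu_count() or 1
    loads = os.getloadavg()  # Unix only; raises OSError if load averages are unavailable
    return saturated_intervals(loads, ncpu, threshold_per_cpu)


if __name__ == "__main__":
    try:
        busy = check_this_host()
    except OSError:
        print("Load averages are not available on this platform.")
    else:
        print("Saturated intervals:", busy or "none")
```

A sustained saturated 5- or 15-minute figure is more relevant here than a brief 1-minute spike, since the fault follows repeated timeouts. On Solaris, native tools such as `vmstat`, `mpstat`, and `prstat` give a more detailed per-CPU and per-process view.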


Issue/Introduction

System panic due to CRS errors