VOM DB resource fails to come online on a system where the SFM_Services service group faulted earlier
Article ID: 100031576
Description
Error Message
2015/11/03 19:32:37 VCS NOTICE V-16-1-10301 Initiating Online of Resource SFM_Services_DB (Owner: Unspecified, Group: SFM_Services) on System server101
2015/11/03 19:32:37 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online : online called
2015/11/03 19:32:37 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online : Invoking db_monitor.sh
2015/11/03 19:32:37 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online : db_monitor.sh returned [100]
2015/11/03 19:32:37 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online : attaching database
2015/11/03 19:33:05 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online : database start returned [256].
2015/11/03 19:33:05 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online : db_online:: database [SFMdb3] could not be started. Error code: 1
2015/11/03 19:33:05 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online : returning from online
2015/11/03 19:33:05 VCS INFO V-16-10031-504 (server101) Application:SFM_Services_DB:online:Executed /opt/VRTSsfmcs/config/vcs/db/online as user root
2015/11/03 19:35:07 VCS ERROR V-16-2-13066 (server101) Agent is calling clean for resource(SFM_Services_DB) because the resource is not up even after online completed.
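The signature above has two parts: the db_online failure for SFM_Services_DB, followed by the agent calling clean (V-16-2-13066). A minimal sketch of a check for both messages is shown below. The function name `matches_signature` is hypothetical, and the log path in the usage note assumes the default VCS engine log location.

```shell
#!/bin/sh
# matches_signature (hypothetical helper): reads VCS engine log lines on stdin
# and exits 0 only if both of the following are present for SFM_Services_DB:
#   - the db_online "could not be started" message
#   - the V-16-2-13066 "Agent is calling clean" error
matches_signature() {
    awk '
        /SFM_Services_DB/ && /could not be started/ { failed = 1 }
        /V-16-2-13066/ && /SFM_Services_DB/         { cleaned = 1 }
        END { exit !(failed && cleaned) }
    '
}
```

Possible usage, assuming the engine log is at /var/VRTSvcs/log/engine_A.log: `matches_signature < /var/VRTSvcs/log/engine_A.log && echo "signature found"`.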
Cause
The issue occurs on systems where the VOM service group SFM_Services faulted earlier.
One such scenario is an SFM_Services fault caused by a storage issue. As part of the service group fault/failover, the DB resource is taken offline, but stale postgres processes are left behind.
Any subsequent attempt to online the DB resource on that node fails with the above error because of these stale postgres processes.
---> Stale postgres processes left over from the earlier fault
# ps -ef |grep post
99999 18661 1 0 18:17 ? 00:00:00 /opt/VRTSsfmcs/pgsql/bin/postgres -D /var/opt/VRTSsfmcs/db/data -p 5636
99999 18662 18661 0 18:17 ? 00:00:00 postgres: logger process
99999 18733 18661 0 18:17 ? 00:00:00 postgres: checkpointer process
99999 18734 18661 0 18:17 ? 00:00:00 postgres: writer process
99999 18735 18661 0 18:17 ? 00:00:00 postgres: wal writer process
99999 18736 18661 0 18:17 ? 00:00:00 postgres: autovacuum launcher process
99999 18737 18661 0 18:17 ? 00:00:00 postgres: stats collector process
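The check above can be narrowed to the parent postgres process for the VOM data directory. The following is a minimal sketch that only reports PIDs and does not stop anything. The function name `find_stale_postmaster` is hypothetical, and the data directory is taken from the ps output above. A reported process is stale only if the SFM_Services_DB resource is offline on that node.

```shell
#!/bin/sh
# Data directory as shown in the ps output above.
PGDATA_DIR="/var/opt/VRTSsfmcs/db/data"

# find_stale_postmaster (hypothetical helper): reads `ps -ef` style lines on
# stdin and prints the PID (second column) of every postgres process that was
# started with -D $PGDATA_DIR. Child processes such as "postgres: logger
# process" do not carry the -D argument and are therefore not listed.
find_stale_postmaster() {
    awk -v dir="$PGDATA_DIR" '
        index($0, "postgres -D " dir) > 0 { print $2 }
    '
}
```

Possible usage on the affected node: `ps -ef | find_stale_postmaster`. An empty result means no parent postgres process for that data directory was found.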
Resolution
The /opt/VRTSsfmcs/config/vcs/db/offline script has been modified to clean up any stale postgres processes, which resolves this issue.
This fix is included in VOM patch 6.1HF8 (6.1.0800), which can be obtained from sort.veritas.com at the following link:
https://sort.veritas.com/patch/detail/11080
Issue/Introduction
The VOM (Veritas Operations Manager) DB (database) resource in a VOM HA / HADR (High Availability Disaster Recovery) setup fails to come online on a system where the SFM_Services service group faulted earlier.
In a VOM HA or HADR setup, the SFM_Services service group is configured to make VOM services, including the DB, highly available.
Additional Information
ETrack: 3860010