VOM DB resource fails to come online on a system where the SFM_Services service group failed earlier

Article ID: 100031576

Description

Error Message

2015/11/03 19:32:37 VCS NOTICE V-16-1-10301 Initiating Online of Resource SFM_Services_DB (Owner: Unspecified, Group: SFM_Services) on System server101
2015/11/03 19:32:37 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online : online called
2015/11/03 19:32:37 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online : Invoking db_monitor.sh
2015/11/03 19:32:37 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online :  db_monitor.sh returned [100]
2015/11/03 19:32:37 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online : attaching database
2015/11/03 19:33:05 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online : database start returned [256].
2015/11/03 19:33:05 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online : db_online:: database [SFMdb3] could not be started. Error code: 1
2015/11/03 19:33:05 VCS INFO V-16-0 (server101) Application:SFM_Services_DB:online:SFM:db_online : returning from online
2015/11/03 19:33:05 VCS INFO V-16-10031-504 (server101) Application:SFM_Services_DB:online:Executed /opt/VRTSsfmcs/config/vcs/db/online as user root
2015/11/03 19:35:07 VCS ERROR V-16-2-13066 (server101) Agent is calling clean for resource(SFM_Services_DB) because the resource is not up even after online completed.
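When more than one node is involved, it can help to confirm which node logged this failure and for which database. The filter below is a minimal sketch that pulls the date, time and database name out of lines like the db_online message above. The function name is hypothetical, and feeding it the VCS engine log (assumed here to be /var/VRTSvcs/log/engine_A.log) is an assumption about where these messages are written on a given installation.

```shell
#!/bin/sh
# Sketch: scan VCS log text on stdin for the db_online failure message and
# print "<date> <time> <database>" for each occurrence.
# Hypothetical helper; which log file to feed it is an assumption.
db_online_failures() {
    awk '
        /db_online:: database \[/ && /could not be started/ {
            # Text after "database [" starts with the database name.
            rest = substr($0, index($0, "database [") + length("database ["))
            # Fields 1 and 2 are the log date and time; the name ends at "]".
            print $1, $2, substr(rest, 1, index(rest, "]") - 1)
        }
    '
}
```

For example, db_online_failures < /var/VRTSvcs/log/engine_A.log would print 2015/11/03 19:33:05 SFMdb3 for the log excerpt above.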

Cause

The issue occurs on systems where the VOM service group SFM_Services faulted earlier.

One such scenario is when the SFM_Services service group faults because of a storage issue. During the service group fault/failover, the DB resource is taken offline, but stale postgres processes are left running on the node.

Any subsequent attempt to bring the DB resource online on that node fails with the error above because of these stale postgres processes.


---> Stale postgres processes left over from the earlier fault

# ps -ef | grep post
99999    18661     1  0 18:17 ?        00:00:00 /opt/VRTSsfmcs/pgsql/bin/postgres -D /var/opt/VRTSsfmcs/db/data -p 5636
99999    18662 18661  0 18:17 ?        00:00:00 postgres: logger process
99999    18733 18661  0 18:17 ?        00:00:00 postgres: checkpointer process
99999    18734 18661  0 18:17 ?        00:00:00 postgres: writer process
99999    18735 18661  0 18:17 ?        00:00:00 postgres: wal writer process
99999    18736 18661  0 18:17 ?        00:00:00 postgres: autovacuum launcher process
99999    18737 18661  0 18:17 ?        00:00:00 postgres: stats collector process
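To check a node for leftover processes more precisely than grep, a filter like the following could pick out only the postmaster (the parent process started with -D and the VOM data directory). This is a sketch under stated assumptions: the function name is hypothetical, and it reads the output of ps -e -o pid= -o args= rather than ps -ef, because that format puts the PID and command in fixed positions on every platform.

```shell
#!/bin/sh
# Sketch: print the PID of each postmaster started with "-D <data_dir>".
# Reads stdin in the format of:  ps -e -o pid= -o args=
# (PID in field 1, full command line in the remaining fields).
# Child processes normally retitle themselves "postgres: <name> process",
# so their command field is "postgres:" and they are not matched.
find_postmaster_pids() {
    awk -v dir="$1" '
        $2 == "postgres" || $2 ~ /\/postgres$/ {
            for (i = 3; i < NF; i++) {
                if ($i == "-D" && $(i + 1) == dir) {
                    print $1
                    break
                }
            }
        }
    '
}
```

Run as ps -e -o pid= -o args= | find_postmaster_pids /var/opt/VRTSsfmcs/db/data, it would report 18661 for the processes listed above. Any PID printed while SFM_Services is offline on that node points to the stale-process condition described here.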

Resolution


The /opt/VRTSsfmcs/config/vcs/db/offline script has been modified to clean up any stale postgres processes, which resolves this issue.

This fix is included in VOM patch 6.1HF8 (6.1.0800), which can be downloaded from sort.veritas.com using the link below:

https://sort.veritas.com/patch/detail/11080
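The patch is the supported fix. For illustration only, the kind of cleanup described for the offline script might look like the following sketch. It is not the contents of the patched script: the function name and timeout argument are hypothetical, and it assumes it runs as root on a node where SFM_Services is offline, so any postmaster still running against the VOM data directory is stale. It uses PostgreSQL's documented postmaster signals (SIGINT for fast shutdown, SIGQUIT for immediate shutdown) and avoids SIGKILL, which would leave shared memory behind and orphan the child processes.

```shell
#!/bin/sh
# Sketch: ask stale postmasters to shut down, escalating only if needed.
# Usage: stop_postmasters TIMEOUT_SECONDS PID...
# Returns 0 if every listed PID is gone afterwards, 1 otherwise.
# Hypothetical helper, NOT the logic of the patched VOM offline script.
# Run as root: kill -0 on another user's process fails (EPERM), and this
# sketch would then wrongly treat that process as already gone.
stop_postmasters() {
    timeout=$1
    shift
    for pid in "$@"; do
        # Skip PIDs that no longer exist.
        kill -0 "$pid" 2>/dev/null || continue
        # SIGINT = PostgreSQL fast shutdown: current transactions are
        # aborted and the postmaster stops its child processes.
        kill -INT "$pid" 2>/dev/null
        waited=0
        while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
            sleep 1
            waited=$((waited + 1))
        done
        # SIGQUIT = immediate shutdown: last resort if fast shutdown stalls.
        if kill -0 "$pid" 2>/dev/null; then
            kill -QUIT "$pid" 2>/dev/null
            sleep 1
        fi
    done
    # Report failure if any listed postmaster is still alive.
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then
            echo "postmaster $pid is still running" >&2
            return 1
        fi
    done
    return 0
}
```

In the listing in the Cause section, the postmaster is PID 18661 (the parent of the other postgres processes), so a call could look like stop_postmasters 30 18661. Only the postmaster is signalled; it then stops its own child processes.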

Issue/Introduction

The VOM (Veritas Operations Manager) DB (database) resource in a VOM HA / HADR (High Availability Disaster Recovery) setup fails to come online on a system where the SFM_Services service group failed earlier.


In a VOM HA or HADR setup, the SFM_Services service group is configured to make the VOM services, including the DB, highly available.

Additional Information

ETrack: 3860010