VCS agents, including Netlsnr, dump core due to SIGABRT signal

Article ID: 100007930

Description

Error Message

Messages related to the VCS Netlsnr agent (this may happen for other VCS agents as well):

2011/10/31 03:35:39 VCS NOTICE V-16-1-53026 Agent Netlsnr ipm connection still valid
2011/10/31 03:35:39 VCS NOTICE V-16-1-53027 Waiting one more try for ipm connection for agent Netlsnr to go away
2011/10/31 03:35:51 VCS WARNING V-16-1-10023 Agent Netlsnr not sending alive messages since Mon Oct 31 03:33:28 2011
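
To spot this symptom across many log lines, a short script can pick out the V-16-1-10023 warning and report how long each agent has been silent. This is a minimal sketch, not a Veritas tool: the function name `find_silent_agents` is hypothetical, and it assumes the engine log uses exactly the timestamp and message layout shown above.

```python
import re
from datetime import datetime

# Hypothetical helper: matches the V-16-1-10023 warning in the layout shown above.
ALIVE_RE = re.compile(
    r"^(?P<logged>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS WARNING V-16-1-10023 "
    r"Agent (?P<agent>\S+) not sending alive messages since (?P<since>.+)$"
)


def find_silent_agents(lines):
    """Return (agent, last_alive, logged_at) for every V-16-1-10023 warning."""
    hits = []
    for line in lines:
        m = ALIVE_RE.match(line.strip())
        if not m:
            continue  # not the "not sending alive messages" warning
        logged = datetime.strptime(m.group("logged"), "%Y/%m/%d %H:%M:%S")
        since = datetime.strptime(m.group("since"), "%a %b %d %H:%M:%S %Y")
        hits.append((m.group("agent"), since, logged))
    return hits


if __name__ == "__main__":
    sample = [
        "2011/10/31 03:35:39 VCS NOTICE V-16-1-53026 Agent Netlsnr ipm connection still valid",
        "2011/10/31 03:35:51 VCS WARNING V-16-1-10023 Agent Netlsnr not sending alive "
        "messages since Mon Oct 31 03:33:28 2011",
    ]
    for agent, since, logged in find_silent_agents(sample):
        print(agent, "silent for", logged - since)  # Netlsnr silent for 0:02:23
```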

A stack trace of the agent core file, obtained with dbx, may show:

(dbx) where
__fd_select(??, ??, ??, ??, ??) at 0xd0221e3c
Platform.select(int,fd_set*,fd_set*,fd_set*,timeval*)(0x0, 0x0, 0x0, 0x0, 0x20219ae8), line 228 in "time.h"
VCSSleep(timeval*)(tp = 0x20219ae8), line 1192 in "Platform.C"
vcsag_timer_thread_func(void*)(vp = 0x2ff228e4), line 1492 in "VCSAgMain.C"

Cause

This issue is tracked as Symantec etrack incident 2607299, referenced in the Additional Information section below.

A VCS agent must send a heartbeat to the engine periodically (within the time set in the AgentReplyTimeout attribute); the heartbeat tells the engine that the agent is running properly. To do this, the agent creates a timer and reads its value periodically. If the difference between the current time and the last heartbeat time is greater than or equal to the time set in the AgentReplyTimeout attribute, the agent sends a heartbeat message.
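
The heartbeat check described above can be modeled roughly as follows. This is a hypothetical sketch, not the VCS agent framework code: `HeartbeatTimer`, `tick`, and the injectable `clock` are illustrative names, and `reply_timeout` simply stands in for the AgentReplyTimeout value.

```python
class HeartbeatTimer:
    """Hypothetical model of the agent-side heartbeat check (not VCS source code)."""

    def __init__(self, reply_timeout, clock):
        self.reply_timeout = reply_timeout  # stands in for AgentReplyTimeout (seconds)
        self.clock = clock                  # callable returning the current time
        self.last_heartbeat = clock()       # time of the last heartbeat sent
        self.sent = []                      # record of heartbeat times, for inspection

    def tick(self):
        """Called periodically; sends a heartbeat once the interval has elapsed."""
        now = self.clock()
        if now - self.last_heartbeat >= self.reply_timeout:
            self.sent.append(now)           # placeholder for "send heartbeat to engine"
            self.last_heartbeat = now


if __name__ == "__main__":
    fake_now = 0
    hb = HeartbeatTimer(reply_timeout=10, clock=lambda: fake_now)
    for fake_now in range(0, 31, 5):        # poll every 5 simulated seconds
        hb.tick()
    print(hb.sent)  # [10, 20, 30]
```

Injecting the clock keeps the sketch deterministic, so the heartbeat schedule can be checked without waiting in real time.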

In this case, the timer was re-initialized whenever a resource was enabled, so the agent failed to send a heartbeat to the engine. Because the engine received no heartbeat, it attempted to terminate the agent with SIGABRT, which generated the core file.
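
One way to picture this failure is to model the agent's timer as a counter that restarts from zero when re-initialized, while the recorded last-heartbeat time keeps its old value. The sketch below rests entirely on that assumption (the article does not describe the framework internals); `AgentTimer`, `simulate`, and `longest_gap` are hypothetical names. Running it with `reset_on_enable=False` approximates the corrected behavior.

```python
"""Hypothetical model of the timer re-initialization bug; not VCS source code."""


class AgentTimer:
    """Elapsed-time counter: read() returns seconds since the timer's base time."""

    def __init__(self, clock):
        self.clock = clock
        self.base = clock()

    def read(self):
        return self.clock() - self.base

    def reinitialize(self):
        # Assumed pre-fix behavior: the counter restarts from zero.
        self.base = self.clock()


def simulate(reset_on_enable, reply_timeout=10, enable_at=25, end=60):
    """Tick once per simulated second; return the times heartbeats are sent."""
    now = 0
    timer = AgentTimer(lambda: now)   # the timer reads the simulated wall clock
    last_heartbeat = timer.read()     # recorded in timer units
    sent = []
    for now in range(end + 1):
        if reset_on_enable and now == enable_at:
            timer.reinitialize()      # resource enabled: timer restarts at 0
        current = timer.read()
        # Heartbeat rule from the article: send when elapsed >= timeout.
        # After a reset, current - last_heartbeat is negative, so no heartbeat
        # is sent until the restarted timer catches up.
        if current - last_heartbeat >= reply_timeout:
            sent.append(now)
            last_heartbeat = current
    return sent


def longest_gap(sent, start=0):
    """Longest interval between consecutive heartbeats, counting from start."""
    times = [start] + sent
    return max(b - a for a, b in zip(times, times[1:]))


if __name__ == "__main__":
    for reset in (True, False):
        sent = simulate(reset_on_enable=reset)
        print("reset" if reset else "fixed", sent, "longest gap:", longest_gap(sent))
```

With a 10-second timeout and a resource enabled at t = 25, the re-initializing run sends heartbeats at [10, 20, 55], leaving a 35-second gap, while the non-resetting run keeps a steady 10-second interval.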

Resolution

Veritas has changed the agent framework library so that the timer is no longer re-initialized when a resource is enabled.

The following patch fixes the issue:

VCS 5.1 SP1 RP2 HF1 (for AIX)


Applies To

This issue applies to the following AIX versions running VCS 5.1 SP1 and later:

AIX 5.3
AIX 6.1
AIX 7.1

Issue/Introduction

VCS (Veritas Cluster Server) agents generate a core file due to a SIGABRT signal received once the AgentReplyTimeout time has elapsed after one or more resources are enabled.

Additional Information

ETrack: 2607299