Netlsnr agent core dumping on Solaris 10 and Cluster Server 5.0MP3

Article ID: 100005442

Description

Error Message

2010/09/30 06:58:37 VCS WARNING V-16-1-10023 Agent Netlsnr not sending alive messages since Thu Sep 30 06:56:19 2010
2010/09/30 06:58:37 VCS WARNING V-16-1-53025 Agent Netlsnr has faulted; ipm connection was lost; restarting the agent
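To check how often the agent is faulting, entries like the two above can be picked out of the log text with a short script. The following Python is a minimal sketch only: the line layout is inferred from the two sample messages in this article, the function names parse_log_line and agent_fault_times are hypothetical, and it is not part of any Cluster Server tooling.

```python
import re
from datetime import datetime

# Layout inferred from the two sample lines in this article (an assumption):
# "<YYYY/MM/DD HH:MM:SS> VCS <SEVERITY> <V-n-n-n> <message text>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"VCS (?P<severity>[A-Z]+) "
    r"(?P<msg_id>V-\d+-\d+-\d+) "
    r"(?P<text>.*)$"
)


def parse_log_line(line):
    """Split one log line into timestamp, severity, message ID and text.

    Returns None for lines that do not match the assumed layout.
    """
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y/%m/%d %H:%M:%S")
    return record


def agent_fault_times(lines, agent="Netlsnr"):
    """Return the timestamps of V-16-1-53025 (agent faulted) entries for one agent."""
    prefix = f"Agent {agent} has faulted"
    times = []
    for line in lines:
        record = parse_log_line(line)
        if (
            record is not None
            and record["msg_id"] == "V-16-1-53025"
            and record["text"].startswith(prefix)
        ):
            times.append(record["ts"])
    return times
```

Passing the lines of a log file to agent_fault_times gives one timestamp per agent restart, which makes a daily pattern such as the one reported here easy to confirm.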

Cause

The following stack trace was captured from the pflags and pstack output:

-----------------  lwp# 6 / thread# 6  --------------------
 feed7224 t_splay  (426ba8, 1, 425af8, feed73cc, fefb03a8, 0) + 18c
 feed65bc _malloc_unlocked (1000, 0, 426ba8, 0, 0, 0) + 18c
 feed6414 malloc   (1000, 1, d9fd8, fef1ba24, fefb03a8, fefba518) + 4c
 ff237e60 __1cJvcsmalloc6FIpcI_pv_ (1000, ff2b0455, 6a, 3, fe970707, ffc) + 60
 ff237ba4 __1cLvcssnprintf6FpcIpkcE_I_ (fe96f704, 1000, ff299b3c, 6, 7ffffc00, 0) + 5c
 ff19b730 __1cIVCSAgLogDlog6MnMulog_dbg_lev_ipkcp2ii4pv_v_ (2fdd0, 15, 1, ff2a9120, ff2a914a, 160) + 100
 ff19c014 __1cIVCSAgLogFtrace6Mpkcp1ii3E_v_ (0, ff2a915e, ff2a914a, 160, 0, ff2a915e) + 8c
 ff1f6f84 __1cMVCSAgProcessP_create_process6M_l_ (38508, 3bb2b0, fe973a40, fe97193c, 0, ff2a92a7) + 164
 ff1f7854 __1cMVCSAgProcessTexec_script_in_zone6Mpkcppc3ipLp166nSVCSAgContainerType_3_i_ (38508, 19de6, fe974b78, fe973b78, 1000, fe973b74) + 564
 ff1ab110 VCSAgExecInContainer (fe973abc, 1, fe974b78, fe973b78, 1000, fe973b74) + 318
 00012bbc getZoneUserId (407760, fe974cb4, 3a1318, fe974ca0, 0, 1a0a7) + ec
 0001378c netlsnr_monitor (fe979710, 1f2558, fe9799c0, 13ec, 0, ff2a4111) + 71c
 ff1e4e78 __1cNVCSAgEPStructMcall_monitor6Mpkcppvpi_nNVCSAgResState__ (81298, fe979710, 1f2558, fe9799c0, 0, ff2a287a) + 60
 ff1d0bf4 __1cJVCSAgTypeMcall_monitor6Mpkcp1ppv5pipc_nNVCSAgResState__ (34998, fe979510, fe979710, 1f2558, 194330, fe9799c0) + 614
 ff1b1330 __1cIVCSAgResQcall_entry_point6MnPVCSAgEntryPoint_pvpnNVCSAgIntState__nMVCSAgRetType__ (ff29d0fa, 1a, fe9799bc, 300a0, ff29e42b, 1243) + 16f8
 ff191008 __1cNVCSAgISOnlineHmonitor6MpnIVCSAgRes__nJVCSAgBool__ (300e0, 66958, ff29e42b, 1243, 0, ff29e436) + 228
 ff1b9f20 __1cIVCSAgResLprocess_cmd6MpnFVList_pi_nJVCSAgBool__ (66958, 0, fe97ad08, 15ff, 0, ff29eb2c) + 300
 ff1bcd48 __1cIVCSAgResQprocess_resource6Fp0_v_ (66958, ff2a4961, 1ee, 1d6, 0, ff2a490b) + d8
 ff1df3cc vcsag_service_thread_start (1, fe97c000, 0, 0, ff1defd0, 0) + 3fc
 fef48a20 _lwp_start (0, 0, 0, 0, 0, 0)

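Taken together, the pflags and pstack output shows that the agent process crashed with a SIGSEGV signal while performing a memory allocation. At the time of the crash, the thread was inside malloc (_malloc_unlocked and t_splay), called from vcsmalloc while the agent was logging a trace message during netlsnr_monitor.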
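When a core contains many threads, it can help to list only the threads whose innermost frames are inside the allocator, as in the trace above. The Python below is a rough sketch under stated assumptions: it expects pstack text in the layout shown in this article, the allocator function names are taken from this trace plus free and realloc, and the helper names parse_pstack and threads_in_allocator are hypothetical.

```python
import re

# Header and frame layouts are taken from the pstack excerpt in this article.
THREAD_HEADER = re.compile(r"^-+\s+lwp#\s*(\d+)\s*/\s*thread#\s*(\d+)\s+-+$")
FRAME = re.compile(r"^\s*([0-9a-fA-F]+)\s+(\S+)\s*\(")

# Allocator entry points to look for (assumed list, based on the trace above).
ALLOCATOR_FUNCS = ("t_splay", "_malloc_unlocked", "malloc", "free", "realloc")


def parse_pstack(text):
    """Map each lwp number to its function names, innermost frame first."""
    stacks = {}
    current = None
    for line in text.splitlines():
        header = THREAD_HEADER.match(line.strip())
        if header:
            current = int(header.group(1))
            stacks[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            stacks[current].append(frame.group(2))
    return stacks


def threads_in_allocator(stacks, depth=3):
    """Return lwp numbers whose innermost `depth` frames include an allocator call."""
    return sorted(
        lwp
        for lwp, frames in stacks.items()
        if any(name in ALLOCATOR_FUNCS for name in frames[:depth])
    )
```

For the trace in this article, parse_pstack would record lwp 6 with t_splay as its innermost frame, and threads_in_allocator would report that lwp.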

Resolution

The issue will be fixed in 5.0MP3RP5 via e2309041, due in H1 2011.


Applies To

Solaris 10

5.0MP3 VCS

Oracle instances running with Solaris zones

Issue/Introduction

The Netlsnr agent was dumping core daily on Cluster Server 5.0MP3.