How to enable First Failure Data Capture (FFDC) for VCS agent entry point using AdvDbg attribute?

Article ID: 100038026

Description

First Failure Data Capture (FFDC) is the process of capturing diagnostic information when an unexpected event occurs, such as:

  • Agent heartbeat loss with the engine
  • Abnormal agent behavior (for example, a segmentation fault)
  • Agent entry point (EP) timeout or failure

VCS already captures FFDC logs for the first two events: agent heartbeat loss and abnormal agent behavior. Starting with the 6.0 release, VCS also captures information on entry point (EP) timeout. This advanced debugging is activated by setting the AdvDbg attribute for the resource type. It helps with root cause analysis (RCA) and saves time when troubleshooting the unexpected event. When configured, it directs the agent framework to invoke the following predefined actions on entry point timeout:

■ pstack: Used to generate the process tree or process stack or both.
■ core: Used to generate the core of the agent process.

A process can be the agent process or any command executed from the agent entry point. All information is captured under the $VCS_LOG/diag/agents/ directory.
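As a quick check after a timeout, the captured files can be listed newest first. This is a minimal sketch: the helper name list_ffdc_files is hypothetical, and the fallback to /var/VRTSvcs/log when $VCS_LOG is unset is an assumption about a default installation.

```shell
# Hypothetical helper: list FFDC files, newest first.
# Assumption: VCS_LOG falls back to /var/VRTSvcs/log when it is unset.
list_ffdc_files() {
    diag_dir="${1:-${VCS_LOG:-/var/VRTSvcs/log}/diag/agents}"
    if [ ! -d "$diag_dir" ]; then
        echo "no FFDC directory at $diag_dir" >&2
        return 1
    fi
    # -t sorts by modification time, newest first; -1 prints one entry per line
    ls -1t "$diag_dir"
}
```

Calling list_ffdc_files with no argument uses the directory named in the article; passing a path overrides it, which is convenient when inspecting logs copied from another node.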

Working of pstack action

When configured with this action, the agent framework captures either the process stack alone or the process stack together with the process tree on entry point timeout. The agent framework decides internally whether to capture the process tree along with the process stack.

Working of core action

When configured with this action, the agent framework captures a process core on entry point timeout. In this release, the agent framework supports capturing only the core of the agent process. The core file is named as core.... A core is generated each time an entry point of the resource times out, so the core available for the agent process is the one generated by the most recent entry point timeout.

Configuring AdvDbg attribute

AdvDbg is a keylist attribute. Each key consists of three colon-separated fields:
::

In the above syntax:

First field: Name of the resource-level entry point. For example, monitor, offline, online, clean, and so on.
Second field: Always has the value timeout; the field is reserved for future use.
Third field: Specifies what information to capture on entry point timeout. Its value can be pstack, core, or both (comma-separated, as in pstack,core).

For example:

monitor:timeout:pstack instructs the agent framework to generate pstack information on monitor timeout.
offline:timeout:pstack instructs the agent framework to generate pstack information on offline timeout.
clean:timeout:pstack,core instructs the agent framework to generate pstack information as well as core on clean timeout.
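Before passing a key to hatype or hares, it can help to check that it has the three-field shape described above. The following is a minimal sketch, not part of VCS: the function name check_advdbg_key is hypothetical, and it accepts only the action spellings that appear in the examples above (pstack, core, and pstack,core).

```shell
# Hypothetical helper: split one AdvDbg key into its three fields and
# check them against the values described in this article.
check_advdbg_key() {
    key=$1
    # Require exactly two colons, that is, three fields.
    case $key in
        *:*:*:*) echo "invalid: too many fields"; return 1 ;;
        *:*:*)   ;;
        *)       echo "invalid: expected three colon-separated fields"; return 1 ;;
    esac
    entry_point=${key%%:*}   # text before the first colon
    rest=${key#*:}           # text after the first colon
    event=${rest%%:*}        # middle field
    actions=${rest#*:}       # last field

    if [ -z "$entry_point" ]; then
        echo "invalid: empty entry point"; return 1
    fi
    if [ "$event" != "timeout" ]; then
        echo "invalid: second field must be timeout"; return 1
    fi
    # Assumption: only the spellings shown in the examples are accepted.
    case $actions in
        pstack|core|pstack,core) ;;
        *) echo "invalid: action must be pstack, core, or pstack,core"; return 1 ;;
    esac
    echo "entry_point=$entry_point event=$event actions=$actions"
}
```

For example, check_advdbg_key clean:timeout:pstack,core prints the three parsed fields, while a key such as monitor:pstack is rejected because the middle field is missing.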

To configure pstack and core during monitor timeout for Mount agent, run the following command:

# hatype -modify Mount AdvDbg -add monitor:timeout:pstack,core
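The command above changes the cluster configuration, which is normally read-only. A minimal sketch of the surrounding steps follows, assuming the standard haconf and hatype commands are on PATH. The wrapper names run and enable_mount_ffdc and the DRY_RUN variable are hypothetical; with DRY_RUN=1 (the default here) the commands are only printed, not executed.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: enable pstack and core capture on Mount monitor timeout.
enable_mount_ffdc() {
    # Open the configuration for writing.
    run haconf -makerw &&
    # Add the key from the article to the Mount type.
    run hatype -modify Mount AdvDbg -add monitor:timeout:pstack,core &&
    # Display the attribute to confirm the new value.
    run hatype -display Mount -attribute AdvDbg &&
    # Save the configuration and return it to read-only.
    run haconf -dump -makero
}
```

Running enable_mount_ffdc as is prints the four commands for review; setting DRY_RUN=0 on a cluster node executes them in order and stops at the first failure.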
 
    

To override this value at the resource level and capture only pstack during monitor timeout for a specific resource, run the following commands, passing the resource name as the first argument after -override and -modify:

# hares -override AdvDbg
# hares -modify AdvDbg -add monitor:timeout:pstack

To clear this action at the resource level, run the following commands, with the resource name given before AdvDbg:

# hares -modify AdvDbg -delete monitor:timeout:pstack
# hares -undo_override AdvDbg

To stop capturing FFDC output for the Mount agent, run the following command:

# hatype -modify Mount AdvDbg -delete monitor:timeout:pstack,core

Example pstack output captured on a monitor EP timeout of the NFSRestart agent:

Thread 4 (Thread 0xf6cffb70 (LWP 1600)):
#0  0xffffe425 in __kernel_vsyscall ()
#1  0xf7397c1b in waitpid () from /lib/libc.so.6
#2  0xf73343ab in do_system () from /lib/libc.so.6
#3  0xf7334772 in system () from /lib/libc.so.6
#4  0x080491a5 in _plat_start_lock_daemons(nfsspec*) ()
#5  0x0804b9c7 in restart_daemons ()
#6  0x0804ba0f in restart_nfs_daemons(nfsspec*) ()
#7  0x0804ca80 in nfsrestart_monitor ()
#8  0xf760db80 in VCSAgEPStruct::call_monitor(char const*, void**, int*) () from /usr/lib/libvcsagfw.so
#9  0xf7604690 in VCSAgType::call_monitor(char const*, char const*, void**, void**, int*, unsigned long long*, VCSAgContainer*) () from /usr/lib/libvcsagfw.so
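
To see the call chain at a glance, the frame number and function name can be extracted from each frame line. This is a minimal sketch using sed; the function name summarize_pstack is hypothetical, and the pattern assumes frame lines shaped like the ones above (a # and frame number, an address, the word "in", then the function).

```shell
# Hypothetical helper: reduce pstack frame lines on stdin to
# "<frame number> <function name>", dropping arguments and library paths.
summarize_pstack() {
    sed -n 's/^#\([0-9]*\) *0x[0-9a-f]* in \([^ (]*\).*/\1 \2/p'
}
```

Fed the trace above, the summary runs from __kernel_vsyscall at frame 0 through nfsrestart_monitor at frame 7. Frames 1 to 3 (waitpid, do_system, system) suggest that the monitor entry point was waiting on a command started through system() when the stack was captured.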
 

 
