vxesd (eventsource) daemon reporting high CPU usage whilst trying to obtain HBA topology information

book

Article ID: 100050409

calendar_today

Updated On:

Description

Error Message


The "top" command can be used to capture the CPU usage:
 

# top

top - 17:17:38 up 5:59, 4 users, load average: 2.10, 2.46, 2.31
Tasks: 792 total, 2 running, 790 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.9 us, 4.6 sy, 0.0 ni, 91.0 id, 2.4 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13139635+total, 3081160 free, 84459792 used, 43855408 buff/cache
KiB Swap: 66674684 total, 66667136 free, 7548 used. 40086748 avail Mem

 

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
114286 root 20 0 322436 2608 1452 R 100.3 0.0 0:38.73 vxesd



Example for Linux:

# grep -i "many open files" /var/log/messages*
messages-20210415:Apr 5 18:08:00 andcdfxpnbums04 vxvm:vxconfigd: V-5-1-12223 Error in claiming /dev/sdack: Too many open files
messages-20210415:Apr 5 18:08:54 andcdfxpnbums04 vxvm:vxconfigd: V-5-1-12223 Error in claiming /dev/sdack: Too many open files
messages-20210415:Apr 5 18:08:54 andcdfxpnbums04 vxvm:vxconfigd: V-5-1-12223 Error in claiming /dev/sddy: Too many open files
messages-20210415:Apr 5 18:09:44 andcdfxpnbums04 vxvm:vxconfigd: V-5-1-12223 Error in claiming /dev/sdeh: Too many open files
messages-20210415:Apr 5 18:09:44 andcdfxpnbums04 vxvm:vxconfigd: V-5-1-12223 Error in claiming /dev/sdack: Too many open files

 

The pstack for the vxesd process id

Example:

# pstack 48134
Thread 7 (Thread 0x7f6faa482700 (LWP 48165)):
#0 0x00007f6fb24d1c3d in poll () from /lib64/libc.so.6
#1 0x0000000000407e85 in vold_get_dmp_events ()
#2 0x0000000000414114 in es_execute_thread ()
#3 0x00007f6fb31aaea5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f6fb24dc96d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f6fa9c41700 (LWP 48252)):
#0 0x00007f6fb24a385d in nanosleep () from /lib64/libc.so.6
#1 0x00007f6fb24a36f4 in sleep () from /lib64/libc.so.6
#2 0x00007f6fb3474c89 in qlapi_event_thread (arg=0x0) at ql_event.c:548
#3 0x00007f6fb31aaea5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f6fb24dc96d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f6fa9440700 (LWP 48387)):
#0 0x00007f6fb31aea35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000414257 in es_del_event ()
#2 0x000000000040d376 in process_hbaeve_queue ()
#3 0x0000000000414114 in es_execute_thread ()
#4 0x00007f6fb31aaea5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f6fb24dc96d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f6fa8c3f700 (LWP 48389)):
#0 0x00007f6fb24d39a3 in select () from /lib64/libc.so.6
#1 0x00000000004078b9 in es_check_udev_events ()
#2 0x0000000000414114 in es_execute_thread ()
#3 0x00007f6fb31aaea5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f6fb24dc96d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f6fa3fff700 (LWP 48390)):
#0 0x00007f6fb31aea35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000414257 in es_del_event ()
#2 0x000000000041aaae in es_process_disc_queue ()
#3 0x0000000000414114 in es_execute_thread ()
#4 0x00007f6fb31aaea5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f6fb24dc96d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f6fa37fe700 (LWP 213529)):
#0 0x00007f6fb24a385d in nanosleep () from /lib64/libc.so.6
#1 0x00007f6fb24a36f4 in sleep () from /lib64/libc.so.6
#2 0x00007f6fb3474c89 in qlapi_event_thread (arg=0x0) at ql_event.c:548
#3 0x00007f6fb31aaea5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f6fb24dc96d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f6fb35c5740 (LWP 48134)):
#0 0x00007f6fb31b19dd in accept () from /lib64/libpthread.so.0
#1 0x0000000000407032 in start_ipc ()
#2 0x000000000040fd60 in main ()

 

To save the vxesd debug information to a file, redirect the stderr & stout:
 
# vxesd -d > /tmp/vxesd.out 2>&1

strace output can also be obtained against the process if for vxesd

# strace -p 48134 -o /var/tmp/vxesd_48134.out

 

 

Cause

 

Unknown at this time, investigation on-going


When vxesd if reporting high CPU usage, Veritas engineering will require the following evidence:
 

#  i=0; while true; do ((i++));echo -e "\n$i" >> /tmp/esd.out;echo `date` >> /tmp/esd.out; pstack `pgrep vxesd` >> /tmp/esd.out; [[ $i -ge 1200 ]]&&gcore `pgrep vxesd`&&break;done

 

 

In addition, the following monitor.sh script can also be deployed and run in the background.

 

# cat monitor.sh

#!/bin/bash

CPU_THRESHOLD=90
CPU_HIGH_CNT=6
INTERVAL=5
LOGFILE=/tmp/vxesd_cpu.log
cnt=0
i=0
pid_vxesd=`pgrep vxesd`
while true
do
    cur_date=`date`
    cpu_util=`top -s -b -n 1 -p $pid_vxesd | grep vxesd | awk '{print $(NF-3)}'`
    echo >> $LOGFILE
    echo "$cur_date - $i - $cpu_util" >> $LOGFILE
    exceeded=`echo "$cpu_util > $CPU_THRESHOLD" | bc`
    if [ $exceeded -eq 1 ];then
        pstack $pid_vxesd >> $LOGFILE
        ((cnt++))
    else
        if [ $((i%12)) -eq 0 ];then
            pstack $pid_vxesd >> $LOGFILE
        fi
        cnt=0
    fi
    if [ $cnt -eq $CPU_HIGH_CNT ];then
        echo "gcore on vxesd" >> $LOGFILE
        gcore $pid_vxesd
        exit 0
    fi
    ((i++))
    sleep $INTERVAL
done

 

 

NOTES:

The script attributes: CPU_THRESHOLD, CPU_HIGH_CNT & INTERVAL can be tuned as needed.

The monitor.sh script will record the script results to /tmp/vxesd_cpu.log.
The script will also capture the gcore for vxesd.


Run the moinitor.sh script using nohup as follows:

# nohup ./monitor.sh &

 

 

To provide detailed analysis of the vxesd core file, engineering will also require the offending binary and related library files. These can be captured using the vxgetcore utility.

 

# cd /opt/VRTSspt/vxgetcore


# ./vxgetcore

Note:
        ./vxgetcore will attempt to find the correct core
        and/or binary file, but we can not guarantee that it will succeed in
        collecting the right combination of files. If you know their exact locations
        then you may quit now and re-run this utility as follows:

        # ./vxgetcore -c /path/to/corefile -b /path/to/binary

        Or, for full usage, see help:
        # ./vxgetcore -h

Press "CTRL+C" now to abort, ENTER to continue

 

 

Resolution


Whilst the issue is being investigated, the user may need to stop and restart the vxesd daemon (once the evidence have been captured) to reduce the high CPU usage.


Starting and stopping the event source daemon
 

By default, DMP starts vxesd at boot time.
 

To stop the vxesd daemon, use the vxddladm utility:

# vxddladm stop eventsource
 

To start the vxesd daemon, use the vxddladm utility:

# vxddladm start eventsource [logfile=logfilename]


To view the status of the vxesd daemon, use the vxddladm utility:

# vxddladm status eventsource

 

 

Figure 2.0

 

image.png

 

 

Issue/Introduction


The Veritas vxesd (eventsource) daemon is reporting high CPU usage whilst trying to obtain HBA topology information.

In layman terms, vxesd (eventsource) daemon is a 'helper' process that receives notifications of device related events which are used to take appropriate actions.

The benefits of vxesd include:
  • SAN Fabric event monitoring & pro-active error detection (SAN event)
  • DMP event logging (DMP event) /etc/vx/log/dmpevents.log
  • Automated device discovery (OS event)
  • iSCSI topology discovery
The vxesd daemon is key in regards to monitoring HBA related events. Figure 1.0

image.png
This issue has been observed in Netbackup Appliance environments running Veritas Volume Manager (VxVM) 7.2.x.

VxVM patches 7.2.0.5400/5401 do not appear to resolve the issue, however, 7.2.0.5401 does include a fix for a file descriptor leak issue with vxconfigd.
Veritas Netbackup Emergency Engineering Binary (EEB) hot-fix NBAPP_EEB_ET4034781-3.3.0.1-3.x86_64.rpm available to address Veritas Volume Manager 7.2.0.x File Descriptor (FD) leak & False DCPA events   https://docs.infoscale.com /support/en_US/article.100050306