How to collect File Descriptor (FD) and client connections details in relation to vxconfigd (Too many open files) collect_vold.sh (LINUX)

Article ID: 100050411

Description

 

This article outlines how to collect File Descriptor (FD) counts and client connection details for the Veritas Volume Manager (VxVM) vxconfigd daemon on Linux, using the script collect_vold.sh.


Sample messages for vxconfigd (from syslog):

Apr  5 18:03:43 fred vxvm:vxconfigd: V-5-1-12223 Error in claiming /dev/sdbuz: Too many open files
 

Environments running NetBackup Appliance (NBA) v3.3.0.1, also known as InfoScale 7.2.x, are experiencing a high number of client connections to vxconfigd along with a new File Descriptor (FD) leak issue.


Observations

The VxVM cmdlog file located in /var/adm/vx records multiple VxVM commands (vxdisk scandisks, vxdisk -p list, vxprint) being executed every 15 minutes (the NetBackup autosupport collector interval), with spurts every 25-30 seconds.

The repeated execution of these VxVM commands appears to have a negative impact on vxconfigd over an extended period of time.
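
To gauge how often these commands appear, the cmdlog can be searched for them. The following is a minimal sketch rather than a Veritas tool: the function name count_vxvm_cmds is hypothetical, and it only counts lines that mention the three commands named above, without assuming anything else about the cmdlog layout.

```shell
#!/usr/bin/bash
# Hypothetical helper (not part of VxVM): count the lines in a command log
# that mention the VxVM commands named in this article.
count_vxvm_cmds() {
    local log=$1    # e.g. /var/adm/vx/cmdlog
    printf 'scandisks: %s\n' "$(grep -c 'vxdisk .*scandisks' "$log")"
    printf 'vxdisk -p: %s\n' "$(grep -c -- 'vxdisk .*-p' "$log")"
    printf 'vxprint: %s\n'   "$(grep -c 'vxprint' "$log")"
}

# Example use on the affected host:
#   count_vxvm_cmds /var/adm/vx/cmdlog
```

Running the function at intervals and comparing the counts gives a rough rate for each command.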


Operational Impact


NetBackup Appliance environments are reporting repeated DMP I/O disk failures for SAN-attached and internal appliance disks.

Even with VRTSvxvm 7.2.0.5401 applied, vxconfigd is still encountering a high number of client connections and a new File Descriptor (FD) leak, separate from the known issue described below.
 

Known issue

Veritas has provided a fix for both NetBackup Appliance and InfoScale 7.2.x environments to handle a known File Descriptor (FD) leak issue.

Veritas Netbackup Emergency Engineering Binary (EEB) hot-fix NBAPP_EEB_ET4034781-3.3.0.1-3.x86_64.rpm available to address Veritas Volume Manager 7.2.0.x File Descriptor (FD) leak & False DCPA events

https://www.veritas.com/support/en_US/article.100050306


This EEB hot-fix is a private hot-fix and is only available by contacting Veritas Technical Support.
 

Once the EEB has been installed, the VRTSvxvm package reports version 7.2.0.5401:
 

/home/maintenance # rpm -qa |grep vxvm
VRTSvxvm-7.2.0.5401-RHEL7.x86_64
 

Veritas InfoScale VRTSvxvm-7.2.0.5401-RHEL7.x86_64.rpm has been ported to the Emergency Engineering Binary (EEB) hot-fix via Etrack 4034781.
 

Reference Veritas Incidents:

ET4034781
STESC-5775

 

Even with this EEB hot-fix installed, vxconfigd may continue to report unwanted DMP I/O failures.

 

 

Script: collect_vold.sh
 

Veritas engineering has developed the following "collect_vold.sh" script to debug issues relating to the Veritas Volume Manager vxconfigd daemon on Linux.

 

The "collect_vold.sh" script will collect the FD (File Descriptor) count and the number of client connections for vxconfigd on Linux platforms.


Usage:

# nohup ./collect_vold.sh <fd_threshold> <connection_threshold> &

e.g.
 

# nohup ./collect_vold.sh 800 600 &

Once either the FD count or the client connection count exceeds its threshold (800 and 600 in the example above; both values can be changed), the script saves its output to /tmp/fd_stat.out.

The script also collects a gcore of vxconfigd in the current directory and then exits.
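
When choosing the FD threshold, it can help to compare the daemon's current descriptor count with its open-file limit, since "Too many open files" is reported when a process reaches that limit. The sketch below is illustrative only: the helper names fd_soft_limit and fd_count are hypothetical and are not part of collect_vold.sh. It relies on the standard Linux /proc/<pid>/limits and /proc/<pid>/fd entries.

```shell
#!/usr/bin/bash
# Hypothetical helpers (not part of collect_vold.sh).

# Print the soft "Max open files" limit from a /proc/<pid>/limits file.
fd_soft_limit() {
    awk '/^Max open files/ {print $4}' "$1"
}

# Print the number of entries in a /proc/<pid>/fd directory.
fd_count() {
    ls "$1" | wc -l | tr -d ' '
}

# Example use on the affected host:
#   pid=$(pgrep -o vxconfigd)
#   echo "$(fd_count /proc/$pid/fd) of $(fd_soft_limit /proc/$pid/limits) descriptors in use"
```

A threshold somewhat below the reported soft limit leaves room for the script to capture data before vxconfigd starts failing.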

 

# cat collect_vold.sh

#!/usr/bin/bash
# Usage: ./collect_vold.sh <fd_threshold> <connection_threshold>

while :
do
    # PID of the vxconfigd daemon (oldest matching process)
    pid=$(pgrep -o vxconfigd)

    # Open file descriptors of vxconfigd; plain "ls" avoids counting
    # the "total" line that "ls -l" prints
    fds=$(ls /proc/$pid/fd | wc -l)
    # Client connections to vxconfigd (vold)
    connections=$(netstat -a | grep vold | wc -l)

    if [ "$fds" -gt "$1" ]; then
        curdate=$(date)
        echo "$curdate - fds($fds) is greater than $1" >> /tmp/fd_stat.out
        echo "connections: $connections" >> /tmp/fd_stat.out
        ls -l /proc/$pid/fd >> /tmp/fd_stat.out
        ps -ef --forest >> /tmp/fd_stat.out
        gcore -o core.$(date +%F-%T) $pid
        exit 0
    fi

    if [ "$connections" -gt "$2" ]; then
        curdate=$(date)
        echo "$curdate - connections($connections) is greater than $2" >> /tmp/fd_stat.out
        echo "connections: $connections" >> /tmp/fd_stat.out
        ls -l /proc/$pid/fd >> /tmp/fd_stat.out
        ps -ef --forest >> /tmp/fd_stat.out
        gcore -o core.$(date +%F-%T) $pid
        exit 0
    fi

    # Pause between samples to avoid a busy loop
    # (this interval is not in the original script; adjust as needed)
    sleep 10
done

 

# End of Script

 

vxgetcore
 

To analyze the vxconfigd core file, Veritas Support also requires the related binary and library files.
 

The vxgetcore utility can be used to collect these additional files.

 

# cd /opt/VRTSspt/vxgetcore


# ./vxgetcore

Note:
        ./vxgetcore will attempt to find the correct core
        and/or binary file, but we can not guarantee that it will succeed in
        collecting the right combination of files. If you know their exact locations
        then you may quit now and re-run this utility as follows:

        # ./vxgetcore -c /path/to/corefile -b /path/to/binary

        Or, for full usage, see help:
        # ./vxgetcore -h

Press "CTRL+C" now to abort, ENTER to continue
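
If the exact locations are to be passed with -c and -b, they first need to be identified. The sketch below is one possible way to do that, under stated assumptions: the helper name latest_core is hypothetical, the core is assumed to be in the directory where collect_vold.sh was started (gcore writes files named core.<date>.<pid> there), and the binary path is read from the running daemon's /proc entry rather than taken from any Veritas documentation.

```shell
#!/usr/bin/bash
# Hypothetical helper: print the most recently modified core.* file
# in the given directory (prints nothing if there is none).
latest_core() {
    ls -t "$1"/core.* 2>/dev/null | head -n 1
}

# Example use on the affected host (the directory is an assumption):
#   core=$(latest_core /directory/where/collect_vold.sh/was/started)
#   bin=$(readlink -f /proc/$(pgrep -o vxconfigd)/exe)
#   cd /opt/VRTSspt/vxgetcore && ./vxgetcore -c "$core" -b "$bin"
```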

 

Once the tar file has been created by the vxgetcore utility, upload the evidence to Veritas Technical Support.

 

This article contains information about providing data to Veritas Technical Support
https://www.veritas.com/docs/000097935

 

Troubleshooting steps:


To isolate what is running at the time of the "vxdisk scandisks" execution, the following while loop was run:
 

# while true; do ps `ps -ef |grep scandisks | awk '{print $3}'`; done

Sample output:

 

===========
   PID TTY      STAT   TIME COMMAND
   PID TTY      STAT   TIME COMMAND
   PID TTY      STAT   TIME COMMAND
   PID TTY      STAT   TIME COMMAND
   PID TTY          TIME CMD
 84835 pts/1    00:00:00 sudo
 84852 pts/1    00:00:00 clish
 90095 pts/1    00:00:01 collector
 90107 pts/1    00:00:03 intelHw
 90263 pts/1    00:00:00 sh
 90264 pts/1    00:10:04 qaucli
 90755 pts/1    00:10:01 qaucli
 90756 pts/1    00:00:01 /usr/openv/netb
 90757 pts/1    00:10:01 qaucli
 90758 pts/1    00:00:00 bpgetconfig
 90760 pts/1    00:00:00 cat
159456 pts/1    00:00:00 su
225432 pts/1    00:00:00 ps
331366 pts/1    00:00:00 sudo
331395 pts/1    00:00:00 bash
   PID TTY      STAT   TIME COMMAND
   PID TTY      STAT   TIME COMMAND
   PID TTY      STAT   TIME COMMAND
   PID TTY      STAT   TIME COMMAND
   PID TTY      STAT   TIME COMMAND
   PID TTY      STAT   TIME COMMAND
   PID TTY      STAT   TIME COMMAND
.
.
   PID TTY      STAT   TIME COMMAND
   PID TTY      STAT   TIME COMMAND
   PID TTY      STAT   TIME COMMAND
   PID TTY      STAT   TIME COMMAND
   PID TTY      STAT   TIME COMMAND
240062 pts/1    Sl     0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
   PID TTY      STAT   TIME COMMAND
240062 pts/1    Sl     0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
   PID TTY      STAT   TIME COMMAND
240062 pts/1    Sl     0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
   PID TTY      STAT   TIME COMMAND
240062 pts/1    Sl     0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
   PID TTY      STAT   TIME COMMAND
240062 pts/1    Sl     0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
   PID TTY      STAT   TIME COMMAND
240062 pts/1    Sl     0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
   PID TTY      STAT   TIME COMMAND
240062 pts/1    Sl     0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
   PID TTY      STAT   TIME COMMAND
240062 pts/1    Sl     0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
   PID TTY      STAT   TIME COMMAND
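
The same parent lookup can be written so that only the parent of each matching process is printed. This is a sketch under assumptions, not the command used in the investigation above: the helper name show_parent is hypothetical, and pgrep -f is used to match "scandisks" in the command line of running processes.

```shell
#!/usr/bin/bash
# Hypothetical helper: print the PID and command line of the parent
# of the process whose PID is given as $1.
show_parent() {
    local ppid
    ppid=$(ps -o ppid= -p "$1" | tr -d ' ')
    [ -n "$ppid" ] && ps -o pid=,args= -p "$ppid"
}

# Example loop on the affected host:
#   while true; do
#       for pid in $(pgrep -f 'vxdisk.*scandisks'); do
#           show_parent "$pid"
#       done
#       sleep 1
#   done
```

Because pgrep does not report itself, the output is limited to the callers of the scandisks commands.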

 

NOTE: To prove that the NetBackup autosupport collector (as-collector) was triggering the VxVM commands, the as-collector was disabled, and no VxVM commands were observed while it was stopped.


The NetBackup msdp.py and msdpcloud.py scripts call "is_d2c_media" and "is_d2c_model".
These modules run the VxVM commands vxdisk scandisks, vxdisk -p list and vxprint.
 

The autosupport (as-collector) plugins can be disabled as follows:


# cd /opt/autosupport/collector/plugins


Disable the specified plugins using the following syntax:

# for i in Sas3ircu.yaml MegaRAIDCollectorPlugin.yaml partition.yaml MSDPCLOUDCollectorPlugin.yaml MSDPCollectorPlugin.yaml SASCableManager.yaml;do sed -i 's/isdisabled: 0/isdisabled: 1/g' $i;done


# as-collector stop
# as-collector start
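
Because the sed command above edits the plugin files in place, keeping a copy of each file makes the change easier to revert later. The following is a minimal sketch, assuming GNU sed; the helper names disable_plugin and restore_plugin are hypothetical and not part of the autosupport tooling.

```shell
#!/usr/bin/bash
# Hypothetical helper: set "isdisabled" to 1 in one plugin file,
# keeping the original as <file>.bak (sed in-place backup suffix).
disable_plugin() {
    sed -i.bak 's/isdisabled: 0/isdisabled: 1/g' "$1"
}

# Hypothetical helper: restore a plugin file from its .bak copy.
restore_plugin() {
    [ -f "$1.bak" ] && mv "$1.bak" "$1"
}

# Example use from /opt/autosupport/collector/plugins:
#   for i in MSDPCLOUDCollectorPlugin.yaml MSDPCollectorPlugin.yaml; do
#       disable_plugin "$i"
#   done
```

As with the original command, the as-collector would need to be restarted after either helper is used.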

To confirm the nbapp_d2cmgmt plugin and VxVM commands are not running, type:

# tail -f /var/adm/vx/cmdlog /log/app_vxul/*.log /log/autosupport/collector.log /var/log/messages | egrep -A2 "vxdisk|vxprint|cmdlog|\/app_vxul\/|collector.log|nbapp_d2cmgmt|\/messages"


Veritas Technical Support will also require the contents of the "/log/app_vxul/" directory.

 

Figure 1.0

 



 

NOTE: The VxVM "vxdisk scandisks" command is an expensive operation. Limit its use and run it only when there is an essential need to do so.

 

[VxVM][210201-000359][Saudi Arabia Ministry of Interior] NBA: RHEL 7.6 VRTSvxvm-7.2.0.5401-RHEL7: vxvm:vxconfigd: V-5-1-12223 Error in claiming /dev/sdcbw: Too many open files - resulting in dgdisabled for nbuapp, high connections count to vold

https://jira.community.veritas.com/browse/APPCFT-7707
