This article outlines how to collect File Descriptor (FD) and client connection details for the Veritas Volume Manager (VxVM) vxconfigd daemon on Linux, using the collect_vold.sh script.
Sample messages for vxconfigd (from syslog):
Apr 5 18:03:43 fred vxvm:vxconfigd: V-5-1-12223 Error in claiming /dev/sdbuz: Too many open files
Environments running with Netbackup Appliance (NBA) v3.3.0.1 aka InfoScale 7.2.x are experiencing a high number of client connections to vxconfigd along with a new File Descriptor (FD) leak issue.
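To gauge how close a process is to the "Too many open files" condition shown above, its open FD count can be compared with its soft limit. The following is a minimal sketch that assumes a Linux /proc filesystem; the function name fd_usage is illustrative and is not part of any Veritas tooling.

```shell
# fd_usage PID: print the open FD count and the soft "Max open files"
# limit for PID, both read from /proc (Linux only).
# Reading another user's process normally requires root.
fd_usage() {
    local pid=$1
    local open limit
    # Count directory entries directly; this avoids the "total" line
    # that "ls -l" adds to the count.
    open=$(ls -1A "/proc/$pid/fd" 2>/dev/null | wc -l)
    # Field 4 of the "Max open files" row is the soft limit.
    limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "pid=$pid open_fds=$open soft_limit=$limit"
}
```

For vxconfigd it could be called as: fd_usage "$(pgrep -o vxconfigd)"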
Observations
The VxVM cmdlog file located in /var/adm/vx records multiple VxVM commands (vxdisk scandisks, vxdisk -p list, vxprint) being executed every 15 minutes (the Netbackup autosupport collector interval), with bursts every 25-30 seconds.
The repeated execution of these VxVM commands appears to have a negative impact on vxconfigd over an extended period of time.
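How often these commands appear can be estimated by tallying matching entries in the cmdlog. The sketch below matches on the command text only, so it makes no assumption about the rest of the cmdlog record layout; the function name tally_vxvm_cmds is illustrative.

```shell
# tally_vxvm_cmds FILE: count how often each of the VxVM commands named
# in this article appears in FILE, most frequent first.
tally_vxvm_cmds() {
    local file=$1
    # -o prints only the matched text, so each hit becomes one line
    # that sort/uniq can count.
    grep -oE 'vxdisk scandisks|vxdisk -p list|vxprint' "$file" \
        | sort | uniq -c | sort -rn
}
```

Example: tally_vxvm_cmds /var/adm/vx/cmdlog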
Operational Impact
Netbackup Appliance environments are reporting repeated DMP I/O disk failures for SAN attached and internal appliance disks.
Even with VRTSvxvm 7.2.0.5401 applied, vxconfigd still encounters a high number of client connections and a new File Descriptor (FD) leak that is separate from the known issue described below.
Known issue
Veritas has provided a fix for both Netbackup Appliances & InfoScale 7.2.x environments to handle a known File Descriptor (FD) leak issue.
Veritas Netbackup Emergency Engineering Binary (EEB) hot-fix NBAPP_EEB_ET4034781-3.3.0.1-3.x86_64.rpm available to address Veritas Volume Manager 7.2.0.x File Descriptor (FD) leak & False DCPA events
https://www.veritas.com/support/en_US/article.100050306
This EEB hot-fix is a private hot-fix and is only available by contacting Veritas Technical Support.
Once the EEB has been installed, the VRTSvxvm package reports version 7.2.0.5401:
/home/maintenance # rpm -qa |grep vxvm
VRTSvxvm-7.2.0.5401-RHEL7.x86_64
Veritas InfoScale VRTSvxvm-7.2.0.5401-RHEL7.x86_64.rpm has been ported to the Emergency Engineering Binary (EEB) hot-fix via Etrack 4034781.
Reference Veritas Incidents:
ET4034781
STESC-5775
Even with this EEB hot-fix, vxconfigd may continue to report unwanted DMP I/O failures.
Script: collect_vold.sh
Veritas engineering has developed the following "collect_vold.sh" script to debug issues relating to the Veritas Volume Manager vxconfigd daemon on Linux.
The "collect_vold.sh" script will collect the FD (File Descriptor) count and the number of client connections for vxconfigd on Linux platforms.
Usage:
# nohup ./collect_vold.sh <FD threshold> <client connection threshold> &
e.g.
# nohup ./collect_vold.sh 800 600 &
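Once either the FD count or the client connection count exceeds its threshold (800 and 600 in the example; both values can be changed), the script appends the collected output to /tmp/fd_stat.out.
The script also collects a gcore of vxconfigd in the current directory. Note that both "exit 0" lines in the listing are commented out, so the script keeps running (and keeps collecting) while a threshold is exceeded; uncomment them if the script should exit after the first capture.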
# cat collect_vold.sh
#!/usr/bin/bash
while :;
do
fds=$(ls -l /proc/`pgrep vxconfigd`/fd | wc -l)
connections=`netstat -a | grep vold | wc -l`
if [ $fds -gt "$1" ];then
curdate=`date`
echo "$curdate - fds($fds) is greater than $1" >> /tmp/fd_stat.out
echo "connections: $connections" >> /tmp/fd_stat.out
ls -l /proc/`pgrep vxconfigd`/fd >> /tmp/fd_stat.out
ps -ef --forest >> /tmp/fd_stat.out
gcore -o core.`date +%F-%T` `pgrep vxconfigd`
#exit 0
fi
if [ $connections -gt "$2" ];then
curdate=`date`
echo "$curdate - connections($connections) is greater than $2" >> /tmp/fd_stat.out
echo "connections: $connections" >> /tmp/fd_stat.out
ls -l /proc/`pgrep vxconfigd`/fd >> /tmp/fd_stat.out
ps -ef --forest >> /tmp/fd_stat.out
gcore -o core.`date +%F-%T` `pgrep vxconfigd`;
#exit 0
fi
done
# End of Script
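The listing above polls in a tight loop with no delay between iterations. As a sketch of one possible variation (this is not the Veritas-supplied script), the threshold test can be factored into a function and the loop paced with sleep. The function names and the 10-second default interval are assumptions made for illustration.

```shell
# exceeds VALUE LIMIT: succeed when VALUE is strictly greater than LIMIT.
exceeds() {
    [ "$1" -gt "$2" ]
}

# count_fds PID: number of open descriptors for PID (Linux /proc).
count_fds() {
    ls -1A "/proc/$1/fd" 2>/dev/null | wc -l
}

# watch_fds PID LIMIT [INTERVAL]: poll until the FD count of PID exceeds
# LIMIT, print one log line and return 0. INTERVAL is the delay between
# polls in seconds (assumed default: 10).
watch_fds() {
    local pid=$1 limit=$2 interval=${3:-10} fds
    # kill -0 only tests that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        fds=$(count_fds "$pid")
        if exceeds "$fds" "$limit"; then
            echo "$(date) - fds($fds) is greater than $limit"
            return 0
        fi
        sleep "$interval"
    done
    return 1   # the process went away before the threshold was reached
}
```

A call such as watch_fds "$(pgrep -o vxconfigd)" 800 >> /tmp/fd_stat.out returns after the first threshold crossing, at which point the gcore step from the original script could be run once.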
vxgetcore
To analyze the vxconfigd core file, Veritas Support will also require the related binary and library files.
The vxgetcore utility can be used to collect these additional files.
# cd /opt/VRTSspt/vxgetcore
# ./vxgetcore
Note:
./vxgetcore will attempt to find the correct core
and/or binary file, but we can not guarantee that it will succeed in
collecting the right combination of files. If you know their exact locations
then you may quit now and re-run this utility as follows:
# ./vxgetcore -c /path/to/corefile -b /path/to/binary
Or, for full usage, see help:
# ./vxgetcore -h
Press "CTRL+C" now to abort, ENTER to continue
Once the tarfile has been created by the vxgetcore utility, upload the evidence to Veritas Technical Support.
The following article describes how to provide data to Veritas Technical Support:
https://www.veritas.com/docs/000097935
Troubleshooting steps:
To isolate what is running at the time of the "vxdisk scandisks" execution, the following while loop was run. It repeatedly prints the parent of every process whose command line contains "scandisks":
# while true; do ps `ps -ef |grep scandisks | awk '{print $3}'`; done
Sample output:
===========
PID TTY STAT TIME COMMAND
PID TTY STAT TIME COMMAND
PID TTY STAT TIME COMMAND
PID TTY STAT TIME COMMAND
PID TTY TIME CMD
84835 pts/1 00:00:00 sudo
84852 pts/1 00:00:00 clish
90095 pts/1 00:00:01 collector
90107 pts/1 00:00:03 intelHw
90263 pts/1 00:00:00 sh
90264 pts/1 00:10:04 qaucli
90755 pts/1 00:10:01 qaucli
90756 pts/1 00:00:01 /usr/openv/netb
90757 pts/1 00:10:01 qaucli
90758 pts/1 00:00:00 bpgetconfig
90760 pts/1 00:00:00 cat
159456 pts/1 00:00:00 su
225432 pts/1 00:00:00 ps
331366 pts/1 00:00:00 sudo
331395 pts/1 00:00:00 bash
PID TTY STAT TIME COMMAND
PID TTY STAT TIME COMMAND
PID TTY STAT TIME COMMAND
PID TTY STAT TIME COMMAND
PID TTY STAT TIME COMMAND
PID TTY STAT TIME COMMAND
PID TTY STAT TIME COMMAND
.
.
PID TTY STAT TIME COMMAND
PID TTY STAT TIME COMMAND
PID TTY STAT TIME COMMAND
PID TTY STAT TIME COMMAND
PID TTY STAT TIME COMMAND
240062 pts/1 Sl 0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
PID TTY STAT TIME COMMAND
240062 pts/1 Sl 0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
PID TTY STAT TIME COMMAND
240062 pts/1 Sl 0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
PID TTY STAT TIME COMMAND
240062 pts/1 Sl 0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
PID TTY STAT TIME COMMAND
240062 pts/1 Sl 0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
PID TTY STAT TIME COMMAND
240062 pts/1 Sl 0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
PID TTY STAT TIME COMMAND
240062 pts/1 Sl 0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
PID TTY STAT TIME COMMAND
240062 pts/1 Sl 0:00 python2.7 /opt/NBUAppliance/scripts/nbapp_d2cmgmt_storage_ui.py configuration check msdpc
PID TTY STAT TIME COMMAND
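In the loop above, the grep in the pipeline can match its own process entry, which adds noise to the output. A hedged alternative sketch uses pgrep, which never reports itself, and asks ps for the parent PID directly. The function name show_parents is illustrative.

```shell
# show_parents PATTERN: for each process whose full command line matches
# PATTERN, print the PID and command line of its parent process.
show_parents() {
    local pattern=$1 pid ppid
    for pid in $(pgrep -f -- "$pattern"); do
        # "-o ppid=" prints only the parent PID, with no header line.
        ppid=$(ps -o ppid= -p "$pid" 2>/dev/null | tr -d ' ')
        if [ -n "$ppid" ]; then
            ps -o pid=,args= -p "$ppid"
        fi
    done
    return 0
}
```

For example: while true; do show_parents scandisks; sleep 1; done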
NOTE: To prove that the Netbackup autosupport collector (as-collector) was triggering the VxVM commands, the as-collector was disabled; no VxVM commands were observed whilst it was stopped.
The Netbackup msdp.py & msdpcloud.py scripts call "is_d2c_media" & "is_d2c_model".
These modules run the VxVM commands vxdisk scandisks, vxdisk -p list and vxprint.
The autosupport (as-collector) plugins can be disabled as follows:
# cd /opt/autosupport/collector/plugins
Disable the specified plugins using the following syntax:
# for i in Sas3ircu.yaml MegaRAIDCollectorPlugin.yaml partition.yaml MSDPCLOUDCollectorPlugin.yaml MSDPCollectorPlugin.yaml SASCableManager.yaml;do sed -i 's/isdisabled: 0/isdisabled: 1/g' $i;done
# as-collector stop
# as-collector start
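Because sed -i changes the plugin files in place, it can be useful to keep a copy of each original and to confirm the resulting flag. The sketch below wraps the same substitution in a function; the name disable_plugins and the .bak suffix are assumptions made for illustration.

```shell
# disable_plugins DIR FILE...: set "isdisabled: 1" in each named plugin
# file under DIR, keeping the original as FILE.bak, then print the flag.
disable_plugins() {
    local dir=$1 f
    shift
    for f in "$@"; do
        if [ -f "$dir/$f" ]; then
            # -i.bak edits in place and saves the original as FILE.bak.
            sed -i.bak 's/isdisabled: 0/isdisabled: 1/g' "$dir/$f"
            grep -H 'isdisabled' "$dir/$f"
        else
            echo "skipped (not found): $dir/$f" >&2
        fi
    done
    return 0
}
```

Example: disable_plugins /opt/autosupport/collector/plugins MSDPCollectorPlugin.yaml MSDPCLOUDCollectorPlugin.yaml, followed by the as-collector stop and start shown above. Restoring the .bak copies and restarting the collector reverts the change.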
To confirm the nbapp_d2cmgmt plugin and VxVM commands are not running, type:
# tail -f /var/adm/vx/cmdlog /log/app_vxul/*.log /log/autosupport/collector.log /var/log/messages | egrep -A2 "vxdisk|vxprint|cmdlog|\/app_vxul\/|collector.log|nbapp_d2cmgmt|\/messages"
Veritas Technical Support will require the contents of the "/log/app_vxul/" directory.
Figure 1.0

NOTE: The VxVM "vxdisk scandisks" command is an expensive operation. It should be run only when there is an essential need to do so.
https://jira.community.veritas.com/browse/APPCFT-7707