Sample error
vxvm:vxconfigd: V-5-1-12223 Error in claiming /dev/sdclh: Too many open files
Data Corruption Protection Activated (DCPA) messages
V-5-1-14523 LUN serial number of the OS device path with device number 8/32 has changed from 600605B00B7B222026ADF17F59D0ECF8 (sdc) to 00f8ecd0597ff1ad2620227b0bb00506 (sdc)
V-5-1-14522 Attempt to isolate DMP node 8/32 failed, error retuned is Device or resource busy: Device or resource busy
V-5-1-8769 ddl_find_devices_in_system: ddl_reconfigure_all failed: Device or resource busy: Device or resource busy
V-5-1-16011 Data Corruption Protection Activated - User Corrective Action Needed:
To recover, first ensure that the OS device tree is up to date (requires OS specific commands).
Then, execute 'vxdisk rm' on the following devices before reinitiating device discovery using 'vxdisk scandisks'
V-5-1-0 fred_disk_6
V-5-1-13790 No device configuration changes have been applied to DMP kernel database.
V-5-1-13791 Please consult the documentation for correct procedure to replace disk/path.
VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x5) on dmpnode 201/0x20
See related article:
False DCPA (Data Corruption Protection Activated) events for JBOD or Internal disks can occur when the original LUN Serial Number cannot be retrieved from the original Page Code offset
https://www.veritas.com/support/en_US/article.100049898
The above DCPA events are related to file-open failures: vxconfigd holds too many open files and does not release them correctly, which results in file descriptor (FD) leaks in vxconfigd. As the devices are internal to the NetBackup Appliance, such DCPA messages are unexpected, and they subsequently result in unintended DMPNODE failures.
Further investigation shows the LUN Serial Number (LSN) changing, which triggers the DCPA events. The impacted DMPNODEs are then disabled to protect data integrity.
The LSN changed because SCSI inquiries against page code 0x80 failed while inquiries against page code 0x83 succeeded. Because each page code returns a different LSN, a different LSN was reported for the same disk.
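As a rough illustration of how such an event can be spotted in collected logs, the sketch below parses the V-5-1-14523 message quoted earlier and flags the case where the OS device name is unchanged but the reported LSN differs. The function names and the regular expression are illustrative assumptions based only on the sample message in this article; they are not part of any Veritas tool.

```python
import re

# Pattern modelled on the V-5-1-14523 sample message in this article.
# Other product versions may word the message differently (assumption).
LSN_CHANGE_RE = re.compile(
    r"V-5-1-14523 LUN serial number of the OS device path with device number "
    r"(?P<devno>\d+/\d+) has changed from "
    r"(?P<old_lsn>[0-9A-Fa-f]+) \((?P<old_dev>\w+)\) to "
    r"(?P<new_lsn>[0-9A-Fa-f]+) \((?P<new_dev>\w+)\)"
)


def parse_lsn_change(line):
    """Return the fields of a V-5-1-14523 message as a dict, or None."""
    match = LSN_CHANGE_RE.search(line)
    return match.groupdict() if match else None


def same_path_new_lsn(event):
    """True when the OS device name is unchanged but the reported LSN differs."""
    return (
        event["old_dev"] == event["new_dev"]
        and event["old_lsn"].lower() != event["new_lsn"].lower()
    )


SAMPLE = (
    "V-5-1-14523 LUN serial number of the OS device path with device number "
    "8/32 has changed from 600605B00B7B222026ADF17F59D0ECF8 (sdc) "
    "to 00f8ecd0597ff1ad2620227b0bb00506 (sdc)"
)

if __name__ == "__main__":
    event = parse_lsn_change(SAMPLE)
    print(event["devno"], same_path_new_lsn(event))
```

An unchanged device name with a different LSN is only a hint that the serial number may have been read from a different page code; it does not by itself prove that the event is a false DCPA.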
We believe the SCSI inquiries against page code 0x80 initially failed because of the FD leak in vxconfigd, which ultimately triggered the false DCPA events.
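One way to check whether a process is approaching its open-file limit on Linux is to count the entries under /proc/<pid>/fd and compare the count with the "Max open files" soft limit in /proc/<pid>/limits. The following is a minimal sketch of that check, not a Veritas-supplied diagnostic; the helper names and the proc_root parameter are hypothetical, and reading another process's fd directory normally requires root.

```python
import os
import re


def count_open_fds(pid, proc_root="/proc"):
    """Count entries in <proc_root>/<pid>/fd (one entry per open descriptor)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def soft_open_file_limit(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None when the line is missing or the limit is 'unlimited'.
    """
    match = re.search(
        r"^Max open files\s+(\d+|unlimited)\s+(\d+|unlimited)",
        limits_text,
        re.MULTILINE,
    )
    if match is None or match.group(1) == "unlimited":
        return None
    return int(match.group(1))


def fd_usage(pid, proc_root="/proc"):
    """Return (open_fd_count, soft_limit) for the given process ID."""
    with open(os.path.join(proc_root, str(pid), "limits")) as handle:
        limit = soft_open_file_limit(handle.read())
    return count_open_fds(pid, proc_root), limit


if __name__ == "__main__":
    # Inspect this interpreter itself; for vxconfigd, pass its PID instead.
    print(fd_usage(os.getpid()))
```

A count that keeps growing toward the soft limit across repeated samples would be consistent with the "Too many open files" error shown in the sample output.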
When disks attached to a system are in an "online invalid" state, the VxVM vxconfigd daemon reads these devices. The reads are necessary to confirm whether the underlying devices have a VxVM private region present, and they cannot be avoided from a VxVM perspective.
To prevent these unwanted reads, the "online invalid" disks can be offlined.
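The sketch below shows one way to pick out "online invalid" disks from `vxdisk list` output and build the matching `vxdisk offline` commands. It assumes the usual five-column layout (DEVICE, TYPE, DISK, GROUP, STATUS); the sample output and device names are illustrative and were not captured from the appliance. The commands are only constructed, not executed.

```python
# Illustrative 'vxdisk list' output (assumed layout, hypothetical devices).
SAMPLE_VXDISK_LIST = """\
DEVICE       TYPE            DISK         GROUP        STATUS
disk_0       auto:cdsdisk    disk_0       nbuapp       online
disk_1       auto:none       -            -            online invalid
disk_2       auto:none       -            -            online invalid
disk_3       auto            -            -            offline
"""


def online_invalid_devices(vxdisk_list_output):
    """Return device names whose STATUS column starts with 'online invalid'."""
    devices = []
    for line in vxdisk_list_output.splitlines():
        fields = line.split()
        # Columns: DEVICE TYPE DISK GROUP STATUS (status may span several words).
        if len(fields) >= 6 and fields[0] != "DEVICE":
            if fields[4:6] == ["online", "invalid"]:
                devices.append(fields[0])
    return devices


def offline_commands(devices):
    """Build one 'vxdisk offline <device>' argument list per device."""
    return [["vxdisk", "offline", device] for device in devices]


if __name__ == "__main__":
    for command in offline_commands(online_invalid_devices(SAMPLE_VXDISK_LIST)):
        print(" ".join(command))
```

Returning argument lists rather than running the commands keeps the sketch safe to try on any host; an administrator can review the list before deciding whether to offline each device.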
Veritas Volume Manager (VxVM) disks listed in an online invalid can cause intermittent hangs of vxconfigd and general slowness
https://www.veritas.com/support/en_US/article.100050354
To better manage SAN events, Veritas engineering has released a VxVM patch via a Veritas NetBackup Emergency Engineering Binary (EEB) to dynamically handle the removal and addition of Backup images (LUNs) following SAN zoning events.
The Veritas InfoScale engineering team created the private hot-fix vm-rhel7_x86_64-HotFix-7.4.2.2205 and the Array Support Library (ASL) update VRTSaslapm_Linux_7.4.2.2601. Both are rolled into the EEB NBAPP_EEB_ET4057043-4.0.0.1-1.x86_64 to automate the handling of SAN zoned devices.
In addition, SAN attached disks that appear in an "online invalid" state will be offlined automatically by the vxattachd daemon (/opt/VRTS/bin/vxattachd).
When the EEB NBAPP_EEB_ET4057043-4.0.0.1-1.x86_64 is installed, the installer also creates the default template file containing the vxattachd tunables:
# more /etc/default/vxattachd
handle_invalid_disk=on
skip_offline=on
remove_disable_dmpnode=on
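To confirm that the template holds the values shown above, the file can be parsed as simple key=value pairs and compared with the expected settings. This is a minimal sketch: the helper names are hypothetical, and skipping blank lines and lines starting with "#" is an assumption about the file format rather than documented behaviour.

```python
# Expected values, taken from the template listing in this article.
EXPECTED_TUNABLES = {
    "handle_invalid_disk": "on",
    "skip_offline": "on",
    "remove_disable_dmpnode": "on",
}


def parse_tunables(text):
    """Parse key=value lines; blank lines and '#' lines are skipped (assumption)."""
    tunables = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        tunables[key.strip()] = value.strip()
    return tunables


def unexpected_tunables(text, expected=EXPECTED_TUNABLES):
    """Return {name: found_value} for tunables that are missing or differ."""
    found = parse_tunables(text)
    return {
        name: found.get(name)
        for name, value in expected.items()
        if found.get(name) != value
    }


SAMPLE_TEMPLATE = """\
handle_invalid_disk=on
skip_offline=on
remove_disable_dmpnode=on
"""

if __name__ == "__main__":
    # An empty dict means every expected tunable is present and set to 'on'.
    print(unexpected_tunables(SAMPLE_TEMPLATE))
```

On the appliance, the text would come from reading /etc/default/vxattachd instead of the embedded sample.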
To reduce the number of "vxdisk scandisks" commands executed by the as-collector (autosupport), Veritas also recommends installing EEB patch NBAPP_EEB_ET4046344-4.0.0.0-1.x86_64.
Over the years we have found the DMP tunable "dmp_monitor_fabric" can cause performance issues across various platforms.
The "dmp_monitor_fabric" tunable is designed to detect errors proactively by catching SAN events. With it off, DMP detects errors reactively. In either case DMP can handle the errors coming from the lower layers; the only difference is whether the approach is proactive or reactive.
The tunable is turned off by default on AIX.
Confirm that the EEB patch has also disabled the "dmp_monitor_fabric" tunable:
https://www.veritas.com/support/en_US/article.100051027
fred:/home/maintenance # vxdmpadm gettune dmp_monitor_fabric
Tunable                         Current Value  Default Value
------------------------------  -------------  -------------
dmp_monitor_fabric              off            on
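The same check can be scripted by parsing the three-column output of `vxdmpadm gettune`. The sketch below works on captured text, using the output shown above as its sample; the function name is hypothetical and the parser assumes one tunable per line with single-word values.

```python
def parse_gettune(output):
    """Return {tunable: (current_value, default_value)} from gettune output."""
    values = {}
    for line in output.splitlines():
        fields = line.split()
        # The header splits into five words and the ruler starts with '-',
        # so only data rows have exactly three non-dash fields.
        if len(fields) != 3 or fields[0].startswith("-"):
            continue
        values[fields[0]] = (fields[1], fields[2])
    return values


SAMPLE_GETTUNE = """\
Tunable Current Value Default Value
------------------------------ ------------- -------------
dmp_monitor_fabric off on
"""

if __name__ == "__main__":
    current, default = parse_gettune(SAMPLE_GETTUNE)["dmp_monitor_fabric"]
    print("current:", current, "default:", default)
```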
The following article outlines how to disable (unbind) and enable (bind) HBA access on Red Hat:
How to temporarily (unbind) disable HBA (Fibre Channel) ports (controllers) on RedHat for testing DMP & vxattachd interoperability
https://www.veritas.com/support/en_US/article.100050716
EEB Overview
NBAPP_4.0 - 4046344
Problem Description
EEB will update the interval time for the following collector plugins: MegaRAIDCollectorPlugin.yaml, msdp.py, partition.yaml, partition.py, Sas3ircu.yaml, and sas3ircu.py.
NOTES:
Check the interval time for each collector plugin in /log/autosupport/collector.log after installing the EEB.
NB_Appliance Installed Files
/opt/NBUAppliance/scripts/post_uninstall-4046344.pl
/opt/NBUAppliance/scripts/install-4046344.pl
Installation Requires: All NetBackup services can remain running.
NBAPP_4.0.0.1 - 4057043
Problem Description
EEB will bring down all NetBackup Appliance Services. EEB will update the VRTSvxvm and VRTSaslapm RPMs. EEB will set the tunable dmp_monitor_fabric to off.
EEB will reboot NetBackup Appliance after installation or RollBack Completes.
EEB will do the following:
- Bring down all NetBackup and Appliance Services
- Update the InfoScale RPMs VRTSvxvm and VRTSaslapm.
- Set VxVM tunable dmp_monitor_fabric to off.
- Update /etc/default/vxattachd
- Reboot NetBackup Appliance Server after installation or RollBack Completes.
NB_Appliance Installed Files
/opt/NBUAppliance/scripts/install-4057043.pl
/opt/NBUAppliance/scripts/post_uninstall-4057043.pl
/opt/NBUAppliance/scripts/pre_proc_uninstall_4057043.pl
/opt/NBUAppliance/scripts/preprocess_install_4057043.pl
/opt/NBUAppliance/scripts/rpms_4057043.tar
Installation Requires: Shutdown and restart all NetBackup services.
EEB Install Steps
Log in to the NetBackup Appliance:
fred.Main_Menu> Manage
Entering appliance management view...
fred.Manage> Software
• Check EEBs have been downloaded
fred.Software> List Downloaded
EEB 1 INSTALL
fred.Software> Install NBAPP_EEB_ET4057043-4.0.0.1-1.x86_64.rpm
- [Info] The file: /opt/NBUAppliance/scripts/rpms_4057043.tar has been extracted successfully.
- [Info] Installing RPM VRTSvxvm-7.4.2.2205-RHEL7.x86_64.rpm.
- [Info] Successfully installed VRTSvxvm-7.4.2.2205-RHEL7.x86_64.rpm
- [Info] Installing RPM VRTSaslapm-7.4.2.2601-RHEL7.x86_64.rpm.
- [Info] Successfully installed VRTSaslapm-7.4.2.2601-RHEL7.x86_64.rpm
- [Info] Successfully imported nbuapp
- [Info] Successfully set dmp_monitor_fabric to off
- [Info] Installation done, rebooting the setup.
Shutdown scheduled for Sat 2022-02-05 21:28:34 PST, use 'shutdown -c' to cancel.
EEB 2 INSTALL
fred.Software> Install NBAPP_EEB_ET4046344-4.0.0.0-1.x86_64.rpm
Extracting post-process script -- /opt/NBUAppliance/scripts/install-4046344.pl
Changing interval time for MegaRAIDCollectorPlugin.yaml to 1443...
- [Info] Successfully changed interval value for MegaRAIDCollectorPlugin.yaml.
Changing interval time for msdp.py to "60*17"...
- [Info] Successfully changed interval value for msdp.py.
Changing interval time for partition.py to "13 * 60"...
- [Info] Successfully changed interval value for partition.py.
Changing telemtry interval time for partition.py to "21 * 60"...
- [Info] Successfully changed telemetry interval value for partition.py.
Changing interval time for partition.yaml to 913...
- [Info] Successfully changed interval value for partition.yaml.
Changing interval time for sas3ircu.py to "60*19"...
- [Info] Successfully changed interval value for sas3ircu.py.
Changing interval time for Sas3ircu.yaml to 1023...
- [Info] Successfully changed interval value for Sas3ircu.yaml.
- [Info] Successfully restarted the Collector service.
post-process complete.
- [Info] Installer finished execution.
- [Info] Install script exited successfully.
- [Warning] No recipients are configured to receive software notifications. Use 'Settings->Alerts->Email Software Add' command to configure the appropriate Email address.
- [Info] Successfully installed the EEB NBAPP_EEB_ET4046344-4.0.0.0-1.x86_64.rpm.
fred.Software> List EEBs
List of installed EEBs:
NBAPP_EEB_ET4046344-4.0.0.0-1.x86_64
NBAPP_EEB_ET4057043-4.0.0.1-1.x86_64
All file systems mount successfully:
/dev/vx/dsk/nbuapp/advol 5368709120 493432 5326276512 1% /advanceddisk/dp1/advol
/dev/vx/dsk/nbuapp/pdvol 47365624680 2613992 47094576256 1% /msdp/data/dp1/pdvol
/dev/vx/dsk/nbuapp/catvol 2147483648 2529528 2128196944 1% /cat
/dev/vx/dsk/nbuapp/cfgvol 104857600 418072 103624088 1% /config
/dev/vx/dsk/nbuapp/pdcatvol 10737418240 662032 10652875752 1% /msdp/cat
/dev/vx/dsk/nbuapp/1pdvol 27796303000 1580264 27577577456 1% /msdp/data/dp1/1pdvol
/dev/vx/dsk/nbuapp/nbrepovol 104857600 74120 103964872 1% /nbrepo
tmpfs 26352908 0 26352908 0% /run/user/0
tmpfs 26352908 0 26352908 0% /run/user/1007
tmpfs 26352908 0 26352908 0% /run/user/888
RPMs are upgraded.
fred:/home/maintenance # rpm -qa | egrep "VRTSvxvm|VRTSaslapm"
VRTSaslapm-7.4.2.2601-RHEL7.x86_64
VRTSvxvm-7.4.2.2205-RHEL7.x86_64
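The version check can also be scripted by splitting each package string into name, version, release, and architecture, then comparing it with the versions delivered by the EEB. This is a minimal sketch with hypothetical helper names; it assumes the conventional name-version-release.arch form that `rpm -qa` prints and does an exact string comparison rather than a full RPM version comparison.

```python
# Versions delivered by NBAPP_EEB_ET4057043, per the install log in this article.
EXPECTED_VERSIONS = {
    "VRTSvxvm": "7.4.2.2205",
    "VRTSaslapm": "7.4.2.2601",
}


def parse_rpm_name(package):
    """Split 'name-version-release.arch' into its four parts."""
    nvr, _, arch = package.strip().rpartition(".")
    name, version, release = nvr.rsplit("-", 2)
    return {"name": name, "version": version, "release": release, "arch": arch}


def version_mismatches(rpm_output, expected=EXPECTED_VERSIONS):
    """Return {name: found_version_or_None} for packages that do not match."""
    found = {}
    for line in rpm_output.splitlines():
        if line.strip():
            info = parse_rpm_name(line)
            found[info["name"]] = info["version"]
    return {
        name: found.get(name)
        for name, version in expected.items()
        if found.get(name) != version
    }


SAMPLE_RPM_OUTPUT = """\
VRTSaslapm-7.4.2.2601-RHEL7.x86_64
VRTSvxvm-7.4.2.2205-RHEL7.x86_64
"""

if __name__ == "__main__":
    # An empty dict means both packages are at the expected versions.
    print(version_mismatches(SAMPLE_RPM_OUTPUT))
```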