Troubleshooting VCS Application Resource Faulting and Failover
Article ID: 100066476
Description
Error Message
Agent is calling clean for resource(app_res) because the resource became OFFLINE unexpectedly, on its own.
Cause
The VCS Application resource "app_res" faulted outside of VCS control, which triggered a failover to the secondary node. Possible causes include a system issue, an application crash, or a misconfiguration.
The Application resource type lets a user define a custom agent through a set of scripts that start, stop, monitor, and clean up the application in the event of a failure. An unexpected fault means that the monitor process returned an "offline" exit status even though the VCS engine had not issued an intentional stop or offline command.
If the Application agent is a custom agent defined by a user rather than by Veritas, the user is responsible for troubleshooting the resource and verifying that it is defined and working correctly.
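To make the monitor contract concrete, the following is a minimal Python sketch of how a custom monitor script might decide its exit status. It assumes the exit-code convention described in the VCS bundled agents documentation for MonitorProgram (110 or 0 for online, 100 or 1 for offline, 101 to 110 for online with a confidence level, anything else unknown); confirm these values for your release. The function names are hypothetical and are not part of any Veritas API.

```python
import os

# Assumed MonitorProgram exit codes; verify against the documentation
# for the VCS release in use.
ONLINE_FULL_CONFIDENCE = 110
OFFLINE = 100


def interpret_monitor_exit(code: int) -> str:
    """Translate a monitor exit code into the state the agent would report."""
    if code in (OFFLINE, 1):
        return "OFFLINE"
    if code == 0 or 101 <= code <= 110:
        return "ONLINE"
    return "UNKNOWN"


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def monitor_exit_code(pid: int) -> int:
    """Pick the exit code a hypothetical PID-based monitor script would use."""
    return ONLINE_FULL_CONFIDENCE if pid_is_running(pid) else OFFLINE
```

Under this convention, if the application process dies on its own, the next monitor cycle returns the offline code without VCS having requested a stop, and the agent treats that as an unexpected fault and calls clean.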
Resolution
To troubleshoot and resolve the issue, follow these steps:
- Check OS Logs:
  - Review the OS logs for error messages or events recorded around the time of the fault.
  - Look for system issues or errors that could have taken the resource offline unexpectedly.
- Analyze Veritas Engine Logs:
  - Examine the Veritas engine logs for more information about the faulted resource.
  - Look for error codes or messages related to the app_res resource.
  - Note any indications of resource failures or unexpected behavior.
- Review Application Agent Logs:
  - Check the Veritas Application agent logs for errors or warnings related to the app_res resource.
  - Look for clues or patterns that could help identify the root cause of the fault.
- Analyze VCS Configuration:
  - Verify the VCS configuration for the app_res resource.
  - Ensure that the start, stop, and monitor programs are correctly defined.
  - Check for any misconfigurations or inconsistencies that could lead to resource faults.
- Investigate Application-Level Issues:
  - Focus on the application itself to determine why it went offline outside of VCS control.
  - Check the application logs or perform additional troubleshooting steps specific to the app_res resource.
  - Look for application crashes, errors, or other issues that could have triggered the fault.
- Implement Corrective Actions:
  - Based on the analysis of the logs and diagnostic information, take appropriate corrective actions.
  - Address any system issues, application crashes, or misconfigurations identified during troubleshooting.
  - Make the necessary changes to the VCS configuration or application settings to prevent future faults.
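The log-review and configuration steps above can be partly automated. The Python sketch below is illustrative only: it assumes that relevant log lines contain both the resource name and a severity keyword such as ERROR or WARNING, and that the program attributes have already been read from the configuration into a dictionary. StartProgram, StopProgram, and MonitorProgram are Application agent attribute names; the helper functions themselves are hypothetical.

```python
import os
import shlex

# Severity keywords assumed to appear in log lines of interest.
SEVERITIES = ("ERROR", "WARNING", "CRITICAL")


def find_resource_events(lines, resource):
    """Return log lines that mention the resource and carry a severity keyword."""
    hits = []
    for line in lines:
        if resource in line and any(sev in line for sev in SEVERITIES):
            hits.append(line.rstrip("\n"))
    return hits


def check_programs(attrs):
    """Return a list of problems found in the configured program attributes.

    An unset MonitorProgram is reported here, but it can be legitimate when
    the resource is monitored through other attributes instead.
    """
    problems = []
    for name in ("StartProgram", "StopProgram", "MonitorProgram"):
        value = attrs.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
            continue
        # The attribute may include arguments; only the first token is the program.
        executable = shlex.split(value)[0]
        if not (os.path.isfile(executable) and os.access(executable, os.X_OK)):
            problems.append(f"{name}: {executable} is missing or not executable")
    return problems
```

On UNIX and Linux installations the engine and Application agent logs are typically found at /var/VRTSvcs/log/engine_A.log and /var/VRTSvcs/log/Application_A.log, so the lines from those files could be passed to find_resource_events with "app_res" as the resource name. Treat the output as a starting point for manual review, not as a diagnosis.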
Issue/Introduction
The VCS Application resource faulted outside of VCS control, resulting in a failover to the secondary node.