Veritas Cluster Server (VCS) WebSphereMQ6 agent does not clean up the WebSphere MQ Manager processes when Second Level Monitor returns offline.
Article ID: 100001623
Resolution
This article describes a situation in which the application processes are running but the application itself is not functioning properly. This can happen when the application is started outside VCS and then fails to function properly. It can also happen when the VCS offline entry point fails to bring the application down completely: the application stops functioning normally, but its processes are left running.
For any VCS agent that has a Second Level Monitor (also called Detail Monitor or In-depth Monitor) or a customizable monitor script (MonScript), an offline state returned by the Second Level Monitor overrides the online state returned by the First Level Monitor.
In general, the First Level Monitor is process-based: it only checks the process table for the application processes. For the WebSphereMQ6 agent, it monitors the MQ Manager processes (e.g. amqrrmfa, runmqchi). If those processes exist, the First Level Monitor returns online. If the Second Level Monitor is enabled, the agent then calls it. (The Second Level Monitor is not called if the First Level Monitor returns offline.)
The Second Level Monitor is functionality-based: it uses an application utility to check whether the application is functioning properly. For the WebSphereMQ6 agent, it runs the WebSphere MQ utility runmqsc and checks the utility's exit code. If the Second Level Monitor returns offline because the application is not functioning properly, the agent ignores the First Level Monitor result and returns offline as the final resource state.
This causes a problem because VCS then sees only an offline resource: it cannot detect the stale application processes and does not clean them up. A subsequent online attempt may fail because the application finds that its processes are already running and refuses to start new ones.
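The two-level decision described above can be sketched as a small Python model. This is an illustration only, not the agent's actual code: the function name `monitor` and its boolean parameters are hypothetical, and the exit codes 100 (offline) and 110 (online) are assumed from the usual VCS monitor convention.

```python
# Illustrative model of the monitor flow in agent versions up to 5.1.7.0.
# Exit codes are assumed from the usual VCS convention (100 = offline, 110 = online).
OFFLINE = 100
ONLINE = 110


def monitor(processes_running: bool, second_level_enabled: bool, utility_ok: bool) -> int:
    """Return the final monitor exit code for one monitor cycle (hypothetical helper)."""
    # First Level Monitor: only checks whether the processes exist.
    if not processes_running:
        return OFFLINE  # the Second Level Monitor is not called in this case
    if not second_level_enabled:
        return ONLINE
    # Second Level Monitor: a failing utility check overrides the first-level
    # result, even though the processes are still running.
    return ONLINE if utility_ok else OFFLINE


# Stale processes plus a failing utility check: the resource is reported offline,
# so nothing tells VCS that processes are left behind.
print(monitor(processes_running=True, second_level_enabled=True, utility_ok=False))
```

In this model the stale-process case is indistinguishable from a cleanly stopped application, which is why the clean entry point is never invoked.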
This problem exists in the WebSphereMQ6 agent up to version 5.1.7.0. In WebSphereMQ6 agent version 5.1.8.0, the Second Level Monitor will be enhanced to handle this situation: instead of returning the offline state directly, it will call the function HandlePartialOnline(). If the existing resource state is offline, HandlePartialOnline() will cause the monitor to return online with a confidence level of 50; the monitor exit code is 105 in this case. HandlePartialOnline() has additional logic to return offline in the immediately following monitor cycle. This online-to-offline transition causes the agent to declare the resource faulted and call the clean entry point, which cleans up the stale application processes.
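The behavior described for HandlePartialOnline() might look roughly like the following Python sketch. The real function is part of the agent and its internals are not shown in this article; the class name, the way the previous report is remembered, and the offline exit code 100 are assumptions made for illustration. The handling of a resource that is already online is also an assumption.

```python
# Hypothetical sketch of the HandlePartialOnline() behavior in version 5.1.8.0.
OFFLINE = 100          # assumed VCS convention for "offline"
PARTIAL_ONLINE = 105   # online with a confidence level of 50 (from the article)


class PartialOnlineHandler:
    """Remembers whether the previous cycle already reported partial online."""

    def __init__(self) -> None:
        self._reported_partial = False

    def handle_partial_online(self, resource_is_online: bool) -> int:
        # Resource currently offline and not yet reported: report online with
        # confidence 50 so that VCS registers the resource as online.
        if not resource_is_online and not self._reported_partial:
            self._reported_partial = True
            return PARTIAL_ONLINE
        # Immediately following cycle (or resource already online, an
        # assumption): report offline so VCS sees an online-to-offline transition.
        self._reported_partial = False
        return OFFLINE


handler = PartialOnlineHandler()
first = handler.handle_partial_online(resource_is_online=False)
second = handler.handle_partial_online(resource_is_online=True)
print(first, second)
```

The two consecutive results form the online-to-offline transition that leads the agent to declare a fault and run the clean entry point.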
The enhancement will be available in WebSphereMQ6 agent version 5.1.8.0, which will be part of the next Agent Pack release (second quarter 2010). Until version 5.1.8.0 is available, the workaround is to set the WebSphereMQ6 SecondLevelMonitor attribute to 2. With this setting, the Second Level Monitor is called only on every alternate monitor cycle. In a cycle where the Second Level Monitor is not called, the agent returns online; in the next cycle, the Second Level Monitor runs and the agent returns offline. This online-to-offline transition also causes VCS to clean the resource and kill the stale WebSphere MQ Manager processes.
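Why the workaround produces a transition can be seen in a short simulation. This is a simplified model, not agent code: the helper names are hypothetical, it assumes the stale processes stay present and the utility check keeps failing for every simulated cycle, and it assumes the Second Level Monitor runs on cycles whose number is a multiple of the attribute value.

```python
from typing import List


def monitor_states(cycles: int, second_level_every: int) -> List[str]:
    """Simulate reported states while stale processes exist and the utility check fails."""
    states = []
    for cycle in range(1, cycles + 1):
        # Assumption: the Second Level Monitor runs on every Nth cycle.
        runs_second_level = cycle % second_level_every == 0
        # Without the second-level check, only the process table is consulted,
        # so the stale processes make the resource look online.
        states.append("OFFLINE" if runs_second_level else "ONLINE")
    return states


def clean_triggered(states: List[str]) -> bool:
    """True if any cycle shows the online-to-offline transition that leads to clean."""
    return any(prev == "ONLINE" and cur == "OFFLINE"
               for prev, cur in zip(states, states[1:]))


every_cycle = monitor_states(4, second_level_every=1)
alternate = monitor_states(4, second_level_every=2)
print(every_cycle, clean_triggered(every_cycle))
print(alternate, clean_triggered(alternate))
```

With the attribute set to 1 the model reports offline on every cycle and no transition occurs; with 2, the first pair of cycles already contains the online-to-offline transition.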
Issue/Introduction
Veritas Cluster Server (VCS) WebSphereMQ6 agent does not clean up the WebSphere MQ Manager processes when Second Level Monitor returns offline.