The following error messages appear in the VCS engine log.
2017/12/22 13:38:16 VCS WARNING V-16-2-13139 Thread(139753808484096) Canceling thread (139753574995712)
2017/12/22 13:38:16 VCS DBG_FFDC Generating FFDC for resource (vmdk_vmdisks) as monitor entry point could not complete within the expected time
....
The agent then logs the "last (NN) invocations" messages.
2017/12/22 18:33:16 VCS ERROR V-16-2-13028 Thread(139753574995712) Resource(vmdk_vmdisks) - the last (60) invocations of the monitor procedure did not complete within the expected time.
2017/12/22 18:38:16 VCS WARNING V-16-2-13139 Thread(139753808484096) Canceling thread (139753574995712)
2017/12/22 18:38:16 VCS DBG_FFDC Generating FFDC for resource (vmdk_vmdisks) as monitor entry point could not complete within the expected time
....
2017/12/29 05:33:17 VCS ERROR V-16-2-13028 Thread(139753574995712) Resource(vmdk_vmdisks) - the last (1920) invocations of the monitor procedure did not complete within the expected time.
2017/12/29 05:38:16 VCS WARNING V-16-2-13139 Thread(139753808484096) Canceling thread (139753574995712)
2017/12/29 05:38:16 VCS DBG_FFDC Generating FFDC for resource (vmdk_vmdisks) as monitor entry point could not complete within the expected time
....
The count eventually reaches 1980 invocations.
2017/12/29 10:33:16 VCS ERROR V-16-2-13028 Thread(139753574995712) Resource(vmdk_vmdisks) - the last (1980) invocations of the monitor procedure did not complete within the expected time.
2017/12/29 10:38:17 VCS WARNING V-16-2-13139 Thread(139753808484096) Canceling thread (139753574995712)
2017/12/29 10:38:17 VCS DBG_FFDC Generating FFDC for resource (vmdk_vmdisks) as monitor entry point could not complete within the expected time
When all the file descriptors are used up, "Too many open files" errors are then logged.
2017/12/29 14:57:17 VCS ERROR V-16-10061-22503 VMwareDisks:vmdk_vmdisks:monitor:Failed to login to ESX esxserver.example.com with error 'Fail to create pipe, (Too many open files)'
2017/12/29 14:57:17 VCS DBG_FFDC Generating FFDC for resource (vmdk_vmdisks) as monitor entry point reported UNKNOWN
Some of the software components used in the VMwareDisks agent are not safe against thread cancellation. If one of the agent's entry points times out and its thread is cancelled, the TCP socket opened to communicate with the ESX server may not be closed, which leaks a file descriptor.
The problem can be confirmed by running "lsof" to check the file descriptors used by the VMwareDisks agent. For example, the following output shows that the agent is leaking TCP sockets: the connections to the ESX server are left in the CLOSE_WAIT state. (124589 is the process ID of the VMwareDisks agent.)
# lsof -p 124589
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
VMwareDis 17321 root 3u IPv4 50412863 0t0 TCP vm01.example.com:60392->esxserver.example.com:https (CLOSE_WAIT)
VMwareDis 17321 root 12u IPv4 150664591 0t0 TCP vm01.example.com:52080->esxserver.example.com:https (CLOSE_WAIT)
VMwareDis 17321 root 13u IPv4 150669552 0t0 TCP vm01.example.com:52129->esxserver.example.com:https (CLOSE_WAIT)
VMwareDis 17321 root 14u IPv4 150676851 0t0 TCP vm01.example.com:52205->esxserver.example.com:https (CLOSE_WAIT)
.....
VMwareDis 17321 root 2044u IPv4 203213931 0t0 TCP vm01.example.com:40168->esxserver.example.com:https (CLOSE_WAIT)
VMwareDis 17321 root 2045u IPv4 203220271 0t0 TCP vm01.example.com:40239->esxserver.example.com:https (CLOSE_WAIT)
VMwareDis 17321 root 2046u IPv4 203221975 0t0 TCP vm01.example.com:40298->esxserver.example.com:https (CLOSE_WAIT)
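To quantify the leak rather than reading the listing by eye, the CLOSE_WAIT entries can be counted. The following is a minimal sketch, not part of the product: count_close_wait is a hypothetical helper name, and AGENT_PID is a placeholder for the process ID of the VMwareDisks agent.

```shell
# Hypothetical helper: count the lines of lsof output (read from stdin)
# that show a socket in the CLOSE_WAIT state.
count_close_wait() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the helper's exit status clean in that case.
    grep -c 'CLOSE_WAIT' || true
}

# Usage (AGENT_PID is a placeholder, not a value from this article):
#   lsof -p "$AGENT_PID" | count_close_wait
```

A count that keeps growing between runs suggests that sockets are being leaked rather than reused.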
Monitor the VCS engine log for VMwareDisks entry point timeout messages. The following log entries show monitor entry point timeouts.
2017/12/22 13:38:16 VCS WARNING V-16-2-13139 Thread(139753808484096) Canceling thread (139753574995712)
2017/12/22 13:38:16 VCS DBG_FFDC Generating FFDC for resource (vmdk_vmdisks) as monitor entry point could not complete within the expected time
2017/12/22 18:33:16 VCS ERROR V-16-2-13028 Thread(139753574995712) Resource(vmdk_vmdisks) - the last (60) invocations of the monitor procedure did not complete within the expected time.
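The timeout count can also be pulled out of the log with a small filter. This is a sketch under assumptions: last_timeout_count is a hypothetical helper name, and the log path in the usage comment is the usual default location of the engine log, which may differ on a given system.

```shell
# Hypothetical helper: read engine log lines from stdin and print the
# invocation count from the most recent V-16-2-13028 message, if any.
last_timeout_count() {
    sed -n 's/.*V-16-2-13028.*the last (\([0-9][0-9]*\)) invocations.*/\1/p' | tail -n 1
}

# Usage (the path is an assumed default, adjust as needed):
#   grep 'vmdk_vmdisks' /var/VRTSvcs/log/engine_A.log | last_timeout_count
```

The helper prints nothing when the log contains no such message.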
If the above messages are observed, please check the number of file descriptors opened by the agent.
# lsof -p <PID of the VMwareDisks agent>
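Deciding when the descriptor count is getting close to the limit can be scripted. The sketch below is illustrative only: fd_near_limit is a hypothetical helper, the 80 percent threshold is an arbitrary example value, and the /proc-based count in the usage comment assumes a Linux system.

```shell
# Hypothetical helper: succeed (exit 0) when the open descriptor count
# has reached the given percentage of the limit.
#   $1 = current count, $2 = limit (default 2048), $3 = percent (default 80)
fd_near_limit() {
    count=$1
    limit=${2:-2048}
    percent=${3:-80}
    [ $(( count * 100 )) -ge $(( limit * percent )) ]
}

# Usage on Linux (AGENT_PID is a placeholder for the agent's process ID):
#   count=$(ls "/proc/$AGENT_PID/fd" | wc -l)
#   if fd_near_limit "$count"; then echo "restart the VMwareDisks agent soon"; fi
```

The threshold leaves headroom so that the restart can be scheduled before the agent runs out of descriptors.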
Restart the agent using the following commands before the number of open file descriptors approaches 2048.
# haagent -stop VMwareDisks -force -sys <system name>
# haagent -start VMwareDisks -sys <system name>
Check that the agent is running again.
# haagent -display VMwareDisks