If the operating system does not schedule the GAB and LLT timer functions on time, heartbeating is affected and the node is ultimately evicted. This is the expected behaviour from the LLT/GAB perspective.
In virtualized environments there are a lot of external factors that need to be considered with respect to the stability of the cluster.
Such factors include:
1. Provisioning ratio: CPU and memory provisioning ratios affect the stability of the Veritas cluster. For maximum stability, the ratio should be kept as low as possible. For critical solutions that require maximum resiliency, the ratio should be 1:1 for both memory and CPU.
2. CPU load on ESX: Even if the provisioning ratio is low, CPU load on ESX can still play a part in cluster stability. If the load on the ESX server is very high, it can affect how the vCPUs of the guest VMs are scheduled, because from the ESX server's point of view vCPUs are just processes.
3. CPU requirement of the actual workload on guests: If the total CPU requirement for workloads exceeds the available physical CPU capacity, then node evictions will still occur due to heartbeat timeouts.
4. External events: External events such as vMotion and VMDK backups are known to add CPU load on the ESX servers. The duration of any stun that these events cause in cluster environments should be monitored, and the peerinact tunable increased if needed. If peerinact cannot be increased, these types of operations should be avoided.
5. Hypervisor best practices: Hypervisor best practices should be followed. The following link can be used to debug virtual machine performance issues: https://kb.vmware.com/s/article/2001003
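As a rough illustration of the provisioning ratio in item 1, the ratio can be worked out by dividing the total resources allocated to guests by the physical resources of the host. The sketch below is only an example: the function name `provisioning_ratio` and the core counts are hypothetical and are not taken from any Veritas or VMware tool.

```shell
#!/bin/sh
# Print the provisioning ratio of allocated to physical resources as "N.NN:1".
# Usage: provisioning_ratio ALLOCATED PHYSICAL
# (hypothetical helper; the numbers must come from your own inventory)
provisioning_ratio() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.2f:1\n", a / p }'
}

# Hypothetical host: 48 vCPUs allocated across all guests, 16 physical cores.
provisioning_ratio 48 16    # prints 3.00:1

# Hypothetical host sized for critical workloads: 16 vCPUs on 16 cores.
provisioning_ratio 16 16    # prints 1.00:1
```

The same calculation applies to memory, using allocated and physical memory in the same unit.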
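Veritas recommends changing the default value of peerinact from 16 seconds to a minimum of 32 seconds. LLT timer values are specified in hundredths of a second, so 32 seconds corresponds to a value of 3200.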
The following command can be used on each cluster node to set it dynamically:
# lltconfig -T peerinact:3200
To confirm that the new value is in place:
# lltconfig -T query
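If a different timeout is chosen, the value passed to lltconfig can be derived from the timeout in seconds. The following is a minimal sketch, assuming the timeout is a whole number of seconds. The helper names `peerinact_units` and `peerinact_cmd` are hypothetical, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Convert a timeout in whole seconds to LLT timer units (1 unit = 0.01 s).
peerinact_units() {
    echo $(( $1 * 100 ))
}

# Build (but do not run) the lltconfig command for a timeout in seconds.
peerinact_cmd() {
    echo "lltconfig -T peerinact:$(peerinact_units "$1")"
}

# Print the command for the recommended minimum of 32 seconds.
peerinact_cmd 32    # prints: lltconfig -T peerinact:3200
```

The printed command can then be reviewed and run as root on each cluster node.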
To make this setting persistent across reboots, the following line needs to be added to the /etc/llttab file:
set-timer peerinact:3200
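The edit to /etc/llttab can also be scripted so that repeated runs do not leave duplicate peerinact lines. The following is a sketch under stated assumptions: the function name `set_peerinact` is hypothetical, the sample file contents are illustrative only, and it is assumed that the set-timer line may be placed at the end of the file. The demonstration works on a scratch file, not on the real /etc/llttab.

```shell
#!/bin/sh
# Ensure exactly one "set-timer peerinact:<value>" line in an llttab-style file.
# Usage: set_peerinact FILE VALUE   (hypothetical helper)
set_peerinact() {
    file=$1
    value=$2
    [ -f "$file" ] || return 1
    tmp=$(mktemp) || return 1
    # Copy every line except an existing peerinact timer line.
    # grep exits non-zero when nothing is left to copy, which is fine here.
    grep -v '^[[:space:]]*set-timer[[:space:]][[:space:]]*peerinact:' "$file" > "$tmp" || true
    # Append the new setting at the end (assumed to be an acceptable position).
    echo "set-timer peerinact:$value" >> "$tmp"
    # Overwrite the contents in place so ownership and mode are preserved.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstration on a scratch file with illustrative contents.
demo=$(mktemp)
printf 'set-node node1\nset-cluster 100\nset-timer peerinact:1600\n' > "$demo"
set_peerinact "$demo" 3200
cat "$demo"
rm -f "$demo"
```

Back up the real file before applying a change like this to it, and apply the same change on every cluster node.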