Summary: When a VMware resource pool constrains CPU for a VM in the pool, the guest OS can report high CPU usage while VMware shows the VM as not particularly busy. In esxtop, the VM shows a high CPU Ready %. The resolution is to increase the CPU limit in the resource pool settings, power off some VMs, or check for processes in the VMs using excessive CPU.
I had been struggling with a Windows 2012 R2 VM running on VMware. I have a few hundred of these, but this one happened to be a domain controller and was painful to remote desktop to. Further, it is running on a hyper-converged Nutanix host in India. I know the host and storage are fine. I even had the fine folks at Nutanix help look at the issue, just in case there was some odd issue with storage for this one VM.
To be honest, I have a couple more VMs experiencing issues. One thing we did find was that CPU Ready was often at a high percentage. One of the traps we fall into is seeing ample free CPU on the host, around 40% in this case, and assuming all is well. But free CPU on the host does not mean VMs are not waiting for CPU. Clearly mine were, with 10 or 15% Ready.
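The Ready figures above come from esxtop, which reports a percentage directly. If you read Ready from the vCenter performance charts instead, the counter is a summation in milliseconds per sample interval (20 seconds for the real-time chart), so it needs converting before you can compare it with esxtop. Here is a minimal sketch of that conversion. The function name is mine, and dividing by the vCPU count to get a per-vCPU average is an assumption about how you want to normalise the number.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (milliseconds) into a percentage.

    ready_ms   -- Ready summation value read from the performance chart
    interval_s -- chart sample interval in seconds (20 s for real-time)
    vcpus      -- vCPU count, used here to average per vCPU (assumption)
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    # Ready time divided by the total time available in the interval.
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)


if __name__ == "__main__":
    # Hypothetical sample: 2,000 ms of Ready in a 20 s real-time interval.
    print(cpu_ready_percent(2000))
```

With those hypothetical inputs the VM spent 2 of every 20 seconds waiting for a physical CPU, which is the 10% range I was seeing.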
I manually control memory and CPU for each of my resource pools. Basically, each resource pool gets a percentage of usable CPU and memory. I deduct 20% of both CPU and memory from the total of all hosts in the cluster, then calculate what that means in KB and MHz and allocate these per resource pool. This way, I can ensure the machine never exceeds 80% resource utilization. I had to create a spreadsheet to calculate these values for me. It is tedious, and when I add a host to the cluster or, worse, a new resource pool, there is a lot of updating to the spreadsheet and then updating resource pool settings.
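The spreadsheet arithmetic described above could be sketched in a few lines of Python. This is only an illustration of the calculation, not my actual spreadsheet: the function name, pool names, host totals and percentages are all made up, and it uses integer math so the results are whole MHz and KB.

```python
def pool_allocations(total_cpu_mhz: int, total_mem_kb: int,
                     pool_percent: dict, reserve_percent: int = 20) -> dict:
    """Split usable cluster CPU and memory across resource pools.

    total_cpu_mhz / total_mem_kb -- sum over all hosts in the cluster
    pool_percent    -- {pool name: whole-number percent of usable resources}
    reserve_percent -- share held back from every pool (20% in the post)
    """
    if not 0 <= reserve_percent < 100:
        raise ValueError("reserve_percent must be in [0, 100)")
    if sum(pool_percent.values()) > 100:
        raise ValueError("pool percentages add up to more than 100")

    # Usable capacity after holding back the reserve.
    usable_cpu = total_cpu_mhz * (100 - reserve_percent) // 100
    usable_mem = total_mem_kb * (100 - reserve_percent) // 100

    return {
        name: {"cpu_mhz": usable_cpu * pct // 100,
               "mem_kb": usable_mem * pct // 100}
        for name, pct in pool_percent.items()
    }


if __name__ == "__main__":
    # Hypothetical cluster totals and pool split, for illustration only.
    limits = pool_allocations(200_000, 1_000_000_000,
                              {"infrastructure": 25, "apps": 50, "test": 25})
    for name, limit in limits.items():
        print(name, limit)
```

Adding a host then means changing the two totals, and adding a pool means adding one entry, after which the new limits still have to be entered into each resource pool's settings.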
So, I checked the resource pool usage for this problem VM, and while it is close to the max, the pool still shows around 5% free CPU resources. So, I have this VM that is basically idle but using 90% CPU in the OS. VMware performance stats show it as rather idle. Remote Desktop is painfully slow, even just to open Control Panel. It is performing poorly as a DC. And there are plenty of free resources on the host.
I see the VM has an exceedingly high and pretty constant CPU Ready %. What could cause this, I thought. Then I realize the only thing that would create this condition is the resource pool limit constraining the VMs in the pool, despite the pool looking OK.
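That chain of reasoning (high Ready on the VM, free CPU on the host, pool sitting near its limit) can be written down as a rough triage check. This is a sketch of a heuristic, not a VMware rule: the function name and all three thresholds are my own assumptions and would need tuning for your environment.

```python
def pool_limit_suspected(vm_ready_pct: float,
                         host_cpu_free_pct: float,
                         pool_cpu_used_mhz: float,
                         pool_cpu_limit_mhz: float,
                         ready_threshold_pct: float = 5.0,      # assumed
                         host_free_threshold_pct: float = 20.0,  # assumed
                         pool_headroom_pct: float = 10.0) -> bool:  # assumed
    """Return True when the symptoms point at the resource pool CPU limit."""
    if pool_cpu_limit_mhz <= 0:
        raise ValueError("pool_cpu_limit_mhz must be positive")

    # Percentage of the pool's CPU limit that is still unused.
    pool_free_pct = (pool_cpu_limit_mhz - pool_cpu_used_mhz) * 100.0 / pool_cpu_limit_mhz

    vm_is_waiting = vm_ready_pct >= ready_threshold_pct
    host_has_room = host_cpu_free_pct >= host_free_threshold_pct
    pool_is_tight = pool_free_pct <= pool_headroom_pct
    return vm_is_waiting and host_has_room and pool_is_tight


if __name__ == "__main__":
    # Ready and host figures from this post; the pool MHz values are hypothetical.
    print(pool_limit_suspected(vm_ready_pct=12.0, host_cpu_free_pct=40.0,
                               pool_cpu_used_mhz=95_000, pool_cpu_limit_mhz=100_000))
```

The point of the check is that each number looks fine on its own. It is only the combination that singles out the pool limit.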
So, I add 20,000 MHz to the resource pool. The CPU usage for the pool in the summary display makes this weird jump, and I can now see that the resource pool was actually under duress and the VMs were being heavily constrained. Within a few minutes the VM returns to normal. The CPU % in Windows drops to a normal few percent, RDP is working normally, and the CPU Ready % for the VM is under 1% again.