To troubleshoot poor vSphere performance, start by asking the following questions:
A: Is this really unexpected behavior? (Analyze the application-specific I/O behavior.)
B: Are you running the latest product versions, and is the hardware on the HCL (Hardware Compatibility List)?
C: Is the VM running VMware Tools?
D: Is there a VM resource or shares issue?
E: Was antivirus software installed, or were recent changes made on the VM or ESXi host?
F: Is the underlying storage healthy?
G: Does the ESXi host have enough resources?
H: Has the networking side been verified?
I: Is CPU power management enabled? (Check the BIOS power policies.)
Performance issues can originate in several different areas, such as:
1: CPU
2: Memory
3: Storage
4: Network latency
ESXTOP Command Overview
Things that can cause poor storage performance:
– Undersized storage arrays/devices unable to provide the needed performance
– I/O Stack Queue congestion
– I/O Bandwidth saturation, Link/Pipe Saturation
– Host CPU Saturation
– Guest Level Driver and Queuing Interactions
– Incorrectly Tuned Applications
Storage Stack Components:
GAVG (Guest Average Latency): total latency as seen from vSphere. GAVG is the sum of DAVG and KAVG.
KAVG (Kernel Average Latency): time an I/O request spent waiting inside the vSphere storage stack.
QAVG (Queue Average Latency): time spent waiting in a queue inside the vSphere storage stack. QAVG is part of KAVG.
DAVG (Device Average Latency): latency coming from the physical hardware, that is, the HBA and the storage device.
To find out more about the thresholds for each metric, read: http://www.yellow-bricks.com/esxtop/
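As a rough illustration of how these metrics fit together, the sketch below computes GAVG from DAVG and KAVG and points at the layer that looks responsible. The threshold values (25 ms for DAVG, 2 ms for KAVG) are assumed starting points, not values stated in this post, and the names `LatencySample` and `diagnose` are hypothetical; check the thresholds against the page linked above and your own baseline.

```python
from dataclasses import dataclass

# Assumed warning thresholds in milliseconds; tune them for your environment.
DAVG_WARN_MS = 25.0
KAVG_WARN_MS = 2.0


@dataclass
class LatencySample:
    """One set of esxtop storage latency values, in milliseconds."""
    davg_ms: float
    kavg_ms: float
    qavg_ms: float

    @property
    def gavg_ms(self) -> float:
        # Latency seen by the guest is device latency plus kernel latency.
        return self.davg_ms + self.kavg_ms


def diagnose(sample: LatencySample) -> list:
    """Return a list of human-readable findings for one sample."""
    findings = []
    if sample.davg_ms > DAVG_WARN_MS:
        findings.append("high DAVG: check the HBA, fabric, and storage array")
    if sample.kavg_ms > KAVG_WARN_MS:
        if sample.qavg_ms > 0:
            findings.append("high KAVG with queuing: check queue depths")
        else:
            findings.append("high KAVG: check the host storage stack and CPU")
    return findings


sample = LatencySample(davg_ms=30.0, kavg_ms=0.5, qavg_ms=0.0)
print(sample.gavg_ms, diagnose(sample))
```

A high DAVG with a low KAVG points away from the host and toward the physical storage path, which is why the two are checked separately.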
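Memory Constraints:
In the esxtop memory view (press m), examine the MEM overcommit avg on the first line of the output. This value reflects the ratio of the requested memory to the available memory, minus 1. A value of 0 means there is no overcommitment; any value above 0 means the virtual machines request more memory than the host can provide.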
If memory is overcommitted:
A: Increase the amount of physical RAM on the host
B: Decrease the amount of RAM allocated to the virtual machines
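The overcommit value described above is simple arithmetic, so a short worked example may help. This is a minimal sketch; the function name `mem_overcommit` and the sample sizes are made up for illustration.

```python
def mem_overcommit(requested_mb: float, available_mb: float) -> float:
    """Ratio of requested memory to available memory, minus 1."""
    if available_mb <= 0:
        raise ValueError("available_mb must be positive")
    return requested_mb / available_mb - 1


# VMs request 4 GB on a host with 4 GB available: no overcommitment.
print(mem_overcommit(4096, 4096))  # 0.0

# VMs request 6 GB on a host with 4 GB available: 50% overcommitted.
print(mem_overcommit(6144, 4096))  # 0.5
```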
Determine whether the virtual machines are ballooning and/or swapping:
MCTLSZ (MB): displays the amount of guest physical memory reclaimed by the balloon driver.
SWCUR (MB): displays the current swap usage.
Ensure that the ballooning and/or swapping is not caused by an incorrectly set memory limit.
CPU Constraints:
Check the load average. A load average of 1.00 means that the ESXi/ESX Server machine’s physical CPUs are fully utilized, and a load average of 0.5 means that they are half utilized.
Examine %RDY (CPU ready), the percentage of time that the virtual machine was ready but could not be scheduled to run on a physical CPU. It should remain under 5%. If it does not, increase the number of physical CPUs on the host or decrease the number of virtual CPUs allocated on the host, either by reducing the vCPUs assigned to the VMs running on the ESXi host or by reducing the number of VMs running on the host.
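The two CPU checks above can be expressed as a small helper. This is only a sketch: the function name `cpu_findings`, the VM names, and the sample values are hypothetical, and the numbers would have to be read from esxtop by hand or by your own tooling.

```python
def cpu_findings(load_average: float, ready_pct_by_vm: dict,
                 ready_limit: float = 5.0) -> list:
    """Flag a saturated host and any VM whose CPU ready time exceeds the limit."""
    findings = []
    if load_average >= 1.0:
        findings.append(f"host CPUs fully utilized (load average {load_average:.2f})")
    for vm, ready_pct in sorted(ready_pct_by_vm.items()):
        if ready_pct > ready_limit:
            findings.append(f"{vm}: ready time {ready_pct:.1f}% exceeds {ready_limit:.1f}%")
    return findings


# Hypothetical readings for two VMs on one host.
for line in cpu_findings(1.20, {"vm-a": 7.5, "vm-b": 1.2}):
    print(line)
```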
Network Latency:
Network performance can be heavily affected by CPU performance, so rule out a CPU performance issue before investigating network latency.
To investigate network latency, test the maximum bandwidth from the virtual machine with the Iperf tool.
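If the bandwidth test is run with iperf3 and its JSON output option (for example `iperf3 -c <server> -J`), the result can be compared against the expected link speed. The sketch below assumes iperf3 rather than the original Iperf, assumes a TCP test whose report contains `end.sum_received.bits_per_second`, and uses hypothetical function names and sample data.

```python
import json


def received_mbps(report_json: str) -> float:
    """Extract the receiver-side throughput in Mbit/s from an iperf3 JSON report."""
    report = json.loads(report_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1_000_000


def link_utilization(report_json: str, link_mbps: float) -> float:
    """Fraction of the nominal link speed that the test achieved."""
    return received_mbps(report_json) / link_mbps


# Trimmed, made-up report for a VM behind a 1 Gbit/s uplink.
sample = '{"end": {"sum_received": {"bits_per_second": 940000000.0}}}'
print(received_mbps(sample))
print(link_utilization(sample, 1000))
```

A result far below the nominal link speed, with CPU already ruled out, suggests a bottleneck on the network path.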
If you identify a bottleneck on the network:
– Verify the VMware Tools version
– Verify the speed settings for the network adapters
– Use multiple NICs to increase overall network capacity for port groups that contain VMs
– If you are using iSCSI storage and jumbo frames, ensure that everything is properly configured
If you are using Network I/O Control, ensure that the shares and limits are properly configured for your traffic. Also ensure that traffic shaping is correctly configured.