Hi All ...
In the tenth part of our series, we'll go through vSphere Resource Pools and Resource Management algorithms and techniques.
We'll also go through the ESXTOP tool, which is a very important monitoring tool.
Please concentrate well, as this is an important part that can be helpful in many administration scenarios.
Credits:
- Frank Denneman
- Duncan Epping
- Arnim van Lieshout
- zpan (Don’t know the actual name)
- haiping (Don’t know the actual name)
Now, Let's Start...
1. Resource Pools & Resource Management in vSphere 5.5:
The official guide released by VMware about resource pools and resource management in vSphere 5.5:
2. CPU Scheduler in vSphere 5.1:
Long story short, the CPU Scheduler is the main VMkernel component responsible for scheduling the vCPUs of all VMs for execution on the underlying physical CPUs. It uses a genius algorithm called relaxed co-scheduling. The technical paper below explains the CPU Scheduler and how its algorithm operates:
https://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf
3. Memory Reservation and Impact:
A nice article by Frank Denneman about Memory Reservation in vSphere. Unfortunately, it’s based on ESX rather than a vSphere 5 release, but the concept is the same (confirmation needed):
http://frankdenneman.nl/2009/12/08/impact-of-memory-reservation/
4. Memory Management and Metrics in vSphere 5:
Technical paper by VMware about memory management and related concepts in vSphere 5.0:
http://www.vmware.com/files/pdf/mem_mgmt_perf_vsphere5.pdf
Another KB article by VMware explaining memory concepts, including Memory Overcommitment, and updated to vSphere 5.5:
To understand memory metrics, check the following article by Duncan Epping, which clarifies them a lot:
http://www.yellow-bricks.com/2010/12/20/vcenter-and-memory-metrics/
Lastly, a series of articles by Arnim van Lieshout on his blog is also useful for understanding memory metrics and how ESXi hosts deal with their physical memory in a vSphere environment:
http://www.van-lieshout.com/2009/04/esx-memory-management-part-1/
http://www.van-lieshout.com/2009/05/esx-memory-management-part-2/
http://www.van-lieshout.com/2009/05/esx-memory-management-%E2%80%93-part-3/
5. Large Memory Pages:
Large Memory Pages are 2 MB pages in physical memory, compared with the regular 4 KB pages. VMkernel backs VM virtual memory with these 2 MB pages, even when the virtual memory pages themselves are smaller than 2 MB, to increase memory performance: larger pages reduce the size of the VMkernel translation tables (used to translate a virtual memory page address to a physical memory page address) and hence decrease memory access time. This feature is used automatically when the physical CPUs support HW-assisted MMU.
The following KB article by VMware describes it:
Keep in mind that Large Memory Pages reduce the amount of memory freed by the Transparent Page Sharing (TPS) feature, and hence all of the VM memory may show as allocated, as described by the article above. In case of memory over-commitment, Large Memory Pages are broken into smaller pages (4 KB), and hence TPS can save memory again.
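To get a feeling for why larger pages shrink the translation tables, here is a small arithmetic sketch. It only counts how many pages are needed to map a given amount of memory with 4 KB and 2 MB pages. The 4 GB VM size is a hypothetical value picked for the example, and the real VMkernel table layout is more involved than a flat page count.

```python
# Illustrative arithmetic only: count the pages needed to map a VM's memory
# with small (4 KB) pages versus large (2 MB) pages.

KB = 1024
MB = 1024 * KB
GB = 1024 * MB

SMALL_PAGE = 4 * KB   # regular page size
LARGE_PAGE = 2 * MB   # large page size


def pages_needed(memory_bytes: int, page_size: int) -> int:
    """Return the number of pages required to cover memory_bytes (rounded up)."""
    return -(-memory_bytes // page_size)


vm_memory = 4 * GB  # hypothetical VM size, chosen only for this example

small = pages_needed(vm_memory, SMALL_PAGE)
large = pages_needed(vm_memory, LARGE_PAGE)

print(f"4 KB pages needed: {small}")
print(f"2 MB pages needed: {large}")
print(f"Reduction factor:  {small // large}")
```

Fewer pages to track means fewer translation entries to look up, which is where the performance gain comes from. It also hints at why TPS finds fewer matches: two 2 MB pages are far less likely to be identical than two 4 KB pages.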
6. Resource Allocation in vSphere 5.x:
In vSphere 5.x, the allocation of any resource (CPU, memory, etc.) is controlled by three parameters: Reservation, Shares and Limit. Reservation is “the minimum allocation of that resource for the VM, without which the VM can’t be powered on”. Limit is “the maximum allocation of that resource for the VM; it is bounded either by the underlying host’s physical resources or by the parent resource pools’ reservations, limits and shares”, and it’s greater than or equal to the VM’s reservation. Shares is “a proportional weight according to which a portion of the resource is allocated to the VM in case of resource contention”.
For any VM with Reservation, Shares and Limit configured, the reserved portion of a resource is physically allocated to it when the VM is powered on, and the physical resources assigned in its HW settings keep being allocated to it until one of the following cases happens:
1-) The Limit is reached, and then no more of that resource is allocated.
2-) Contention occurs, and then the Shares value is used. The share values of all VMs under the same parent resource pool or host are added up, and the total amount of the resource (CPU/memory) is divided between the VMs according to these values (relative weighting). If a VM’s portion of the resource is greater than its Limit value, the Limit value is used and allocated to the VM. If not, the whole portion is allocated to the VM.
3-) The resources configured for the VM in its HW settings are allocated completely, which is the optimal case.
Keep in mind that, in all cases, the guest OS is not aware of these operations and assumes that it already has what is configured in the VM HW settings. For example, on a Windows machine, Task Manager shows the amount of CPU and memory configured in the machine’s HW settings, regardless of what is actually allocated physically on the host.
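The contention case described above can be sketched roughly as follows. This is a simplified model and not the actual ESXi algorithm: it divides a host's capacity by share weight and then clamps each VM between its reservation and its limit, without redistributing whatever a capped VM leaves unused. The VM names, the numbers and the function name are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VM:
    name: str
    shares: int        # relative weight used under contention
    reservation: int   # guaranteed minimum, in MHz
    limit: int         # upper bound, in MHz


def allocate_under_contention(capacity: int, vms: List[VM]) -> Dict[str, float]:
    """Split capacity by share weight, then clamp to [reservation, limit].

    Simplification: capacity left over by a VM capped at its limit is NOT
    handed to the other VMs, which a real scheduler would do.
    """
    total_shares = sum(vm.shares for vm in vms)
    result = {}
    for vm in vms:
        proportional = capacity * vm.shares / total_shares
        # Never below the reservation, never above the limit.
        result[vm.name] = min(max(proportional, vm.reservation), vm.limit)
    return result


# Hypothetical host with 6000 MHz of CPU under contention.
vms = [
    VM("vm-a", shares=2000, reservation=500, limit=4000),
    VM("vm-b", shares=1000, reservation=500, limit=4000),
    VM("vm-c", shares=1000, reservation=0, limit=1000),
]
allocation = allocate_under_contention(6000, vms)
for name, mhz in allocation.items():
    print(f"{name}: {mhz:.0f} MHz")
```

In this example vm-a holds half of the shares and so gets half of the capacity, while vm-c would be entitled to 1500 MHz by shares but is capped at its 1000 MHz limit.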
7. Resource Monitoring Using ESXTOP:
ESXTOP is a really useful tool that lets you examine every aspect of your ESXi hosts’ performance, as well as your VMs’ disk performance and your storage devices’ VAAI performance. It takes some time to master the tool and get its full capabilities. First, the following document on VMware Forums by zpan (don’t know the actual name) is a nice starting guide for understanding ESXTOP and its counters:
https://communities.vmware.com/docs/DOC-9279
Another document on VMware Forums, by haiping (don’t know the actual name), is another guide:
https://communities.vmware.com/docs/DOC-11812
Both guides are based on vSphere 4.1, but they can still be applied to vSphere 5.1 (vSphere 5.5 confirmation needed).
The following guide by Duncan Epping is also really nice and contains many useful links about the ESXTOP tool:
http://www.yellow-bricks.com/esxtop/
The following link is a PDF file from VMWorld 2012 which summarizes many useful counters and their respective value levels:
http://www.vmworld.net/wp-content/uploads/2012/05/Esxtop_Troubleshooting_eng.pdf
Lastly, I summarized some of the important counters as follows:
1-) CPU Load Average:
It indicates how heavily the pCPUs are utilized, expressed as the ratio of the number of vCPU threads being handled to the number of pCPUs. For example: 1.00 means that each pCPU handles one thread, i.e. 100% utilization; 2.00 means that each pCPU handles one thread while another thread is waiting to be executed; and 0.33 means that each thread has 3 pCPUs available to handle it, e.g. a host with 6 pCPUs handling just two threads at the time of that average.
2-) Ready Time (%RDY):
It’s the percentage of time a vCPU with load to run waits for a pCPU before that pCPU begins to execute the load. The lower the better.
A normal value is lower than 10%, and it increases when VMs are given more vCPUs than they actually need.
3-) %DRPTX - %DRPRX:
Percentage of packets dropped while transmitting (TX) and receiving (RX). When pNIC utilization is high, packets are queued in the pNIC buffers. When those buffers are full, packets begin to queue in the vSwitch buffer. When the vSwitch buffer is full, it begins to drop packets. These percentages can also be high due to heavy pCPU utilization: as pCPU utilization gets heavier, the pCPUs can’t find enough time windows to move packets between vNICs, pNICs and vSwitches, and in and out of vSwitches. This leads to full queues in both pNIC and vSwitch buffers, and hence to dropped packets.
The higher these percentages are, the higher the utilization is and the worse the network performance is.
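The three counters above boil down to simple ratios. The sketch below shows the arithmetic only: the function names are hypothetical, the inputs are made-up sample values, and the exact way ESXTOP samples and averages these counters internally is not modeled here. The 5000 ms interval assumes the default ESXTOP refresh delay of 5 seconds.

```python
def cpu_load_average(runnable_threads: int, pcpus: int) -> float:
    """Ratio of threads wanting CPU time to the number of physical CPUs."""
    return runnable_threads / pcpus


def ready_percent(ready_ms: float, interval_ms: float) -> float:
    """Share of the sample interval a vCPU spent ready but not running."""
    return 100 * ready_ms / interval_ms


def dropped_percent(dropped: int, total: int) -> float:
    """Dropped packets as a percentage of all packets seen in the interval.

    Assumption: total includes the dropped packets.
    """
    if total == 0:
        return 0.0
    return 100 * dropped / total


# Two threads on a 6-pCPU host, as in the 0.33 example above.
print(f"Load average: {cpu_load_average(2, 6):.2f}")

# A vCPU that waited 500 ms during a 5000 ms sample sits right at 10%.
print(f"%RDY: {ready_percent(500, 5000):.1f}")

# 25 dropped packets out of 10000 transmitted.
print(f"%DRPTX: {dropped_percent(25, 10000):.2f}")
```

Reading the numbers this way makes the thresholds easier to reason about: a ready value of 10% means the vCPU lost half a second out of every five waiting for a pCPU.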
8. Creating and Using New ESXTOP Profile:
To create a custom ESXTOP profile that contains certain counters, use the following steps:
1-) Run ESXTOP in interactive mode: esxtop.
2-) Select your view (CPU, Disk Adapter, etc.).
3-) Use the hot keys: (f) for adding/removing fields of the selected view and (o) for re-ordering the columns of the selected view.
4-) Save the view in a new profile with (W) and enter the path to save to when prompted.
To open the saved profile, run the following command: esxtop -c <File_Path>.
9. ESXTOP in Replay Mode Best Practice:
ESXTOP Replay Mode is used to view all performance counters gathered by the vm-support command.
Use the following steps:
1-) Run the vm-support tool using: vm-support -p -d <Duration_of_Performance_Data_Gathering> -i <Duration_between_Each_Two_Snapshots_of_Data_Gathered> -w <Path_to_Save_the_Snapshots>.
2-) After the gathering finishes, check the directory where you saved the output. It’ll be saved as a .tgz file.
3-) To begin checking the saved data, decompress that .tgz using: tar zxvf <File_Path/File_Name>.
4-) After the decompression finishes, open the directory created with the same name as the .tgz file.
5-) Reconstruct the data from the snapshots inside the directory using: ./reconstruct.sh.
6-) After it finishes, use the command: esxtop -R <Directory_Where_the_Snapshots_Reside_after_Reconstruction> to replay the data from the saved snapshots.
Share the Knowledge ...
Previous: vSphere 5.x Notes & Tips - Part IX: