Maximizing VM Performance 1.2


8/8/2019 Maximizing VM Performance 1.2

Contents

Maximizing Virtual Machine Performance
    Introduction
    Requirements
    Virtual hardware and guest OS
        vCPU
        Memory
        Disk
        Network
        Delete unnecessary devices from your virtual hardware and guest OS
    Acknowledgement
    Summary


Maximizing Virtual Machine Performance

Introduction

VM performance is ultimately determined by the underlying physical hardware and the hypervisor that serves as the foundation for your virtual infrastructure. The construction of this foundation has become simpler over the years, but there are still several areas that should be fine-tuned in order to maximize VM performance in your environment. While some of this content is generic to any hypervisor, this document focuses on VMware ESX(i) 4.1.

This is an introduction to performance tuning and is not intended to cover everything in detail. Most topics include links to sites that contain deep-dive information if you wish to learn more.

Requirements

VMware ESX(i) 4.1 - If you are running an older version, make sure to upgrade. Performance and scalability have increased significantly since ESX(i) 3.x, and ESX(i) 4.1 offers some improvements over ESX(i) 4.0 as well.

Virtual machine hardware version 7 - this hardware version introduces features that increase performance. If you are not running virtual hardware version 7, make sure to upgrade VMware Tools first, then shut down the VM's guest OS. In the VI Client, right-click the VM and select Upgrade Virtual Hardware.

Warning: once you upgrade the virtual hardware to version 7 you lose backward compatibility with ESX(i) 3.x, so if you have a mixed environment make sure to upgrade all ESX(i) hosts first.

Virtual hardware and guest OS

The sections below make recommendations on how to configure the various hardware components for best performance, as well as what optimizations can be done inside the guest OS.

vCPU

Start with 1 vCPU - most applications work well with that. After some time you can evaluate CPU utilization and application performance. If the application response is poor you can add additional vCPUs as needed. If you start with multiple vCPUs and determine that you have over-provisioned, it can be cumbersome to revert, depending on your OS (see HAL).

vFoglight with the Exchange cartridge looks beyond the hypervisor into the application layer


Make sure you select the correct Hardware Abstraction Layer (HAL) in the guest operating system. The HAL is the operating system's driver for the CPU; the choices are Uni-Processor (UP) for a single processor or Symmetric Multiprocessing (SMP) for multiple processors.

    Windows 2008 uses the same HAL for both UP and SMP, which makes it easy to downgrade the number of CPUs.

Windows 2003 and earlier have different HAL drivers for UP versus SMP. Windows automatically changes the HAL driver when going from UP to SMP. It can be very complicated to go from SMP to UP, depending on the OS and version.

If you have a VM running Windows 2003 SP2 or later which you have downgraded from 2 vCPUs to 1 vCPU, you will still have the multiprocessor HAL in the OS. This results in slower performance than a system with the correct HAL. The HAL driver can be manually updated; however, Windows versions prior to Windows 2003 SP2 cannot be easily corrected. I have personally seen systems with an incorrect HAL driver consume more CPU, often peaking at unnecessarily high CPU utilization once the system gets stressed.

Make sure your multi-processor VMs have an OS and application that support multi-threading and can take advantage of it. If not, you are wasting resources.

This example shows a VM with almost the same CPU utilization across all vCPUs. That means the OS and application are multi-threaded.

    CPU scheduling

ESX 2 used strict co-scheduling, which required a 2 vCPU VM to have 2 pCPUs available at the same time. At that time physical CPUs had single or dual cores, which led to slow performance when hosting too many VMs. ESX(i) 3 introduced relaxed co-scheduling, which allows a 2 vCPU VM to be scheduled even when 2 pCPUs are not available at the same time.

ESX(i) 4 refines the relaxed co-scheduler even further, increasing performance and scalability.


    CPU % Ready

The best indication that a VM is suffering from CPU congestion on an ESX(i) host is when CPU % Ready reaches 5-10% over time; in this range further analysis might be needed. Values higher than 10% definitely indicate critical contention. This means that the VM has to wait for the ESX(i) host to schedule its CPU requests due to CPU resource contention with other VMs. This metric is one of the most important ones to monitor in order to understand the overall performance of a virtual environment. It can only be seen from the hypervisor; as a result, CPU utilization inside the guest OS might look very high while the real cause stays hidden.

    CPU % Ready is an important metric to understand VM performance
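The percentage can be derived from the raw "ready" summation value that vCenter's real-time charts report in milliseconds. A minimal sketch, assuming the default real-time sample interval of 20 seconds (20,000 ms); the ready_ms value is hypothetical:

```shell
# Convert a vCenter real-time "CPU Ready" summation value (milliseconds)
# into CPU % Ready. Real-time charts sample every 20 s = 20000 ms.
ready_ms=1600        # hypothetical value read from the performance chart
interval_ms=20000
awk -v r="$ready_ms" -v i="$interval_ms" \
    'BEGIN { printf "CPU %% Ready: %.1f%%\n", (r / i) * 100 }'
```

Here 1600 ms of ready time inside a 20-second window works out to 8%, which by the guideline above already warrants further analysis.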

Memory

When you create a VM you allocate a certain amount of memory to it. There is a feature in the virtual machine settings known as Memory Limit (which often hurts more than it helps). This setting is designed to limit the hypervisor's memory allocation to a value other than what is actually assigned. The guest OS will still see the full memory allocation, but the hypervisor will only allow use of physical memory up to the amount of the Memory Limit.

The only use case I have found for this is an application that requires, for example, 16 GB of memory to install or start, but only uses 4 GB in operation. You can set a Memory Limit at a much lower value than the actual memory allocation. The guest OS and application will see the full 16 GB of memory, but the ESX(i) host limits the physical memory to 4 GB.

But in reality, the memory limit often gets set on VMs that you had no intention to limit. This can happen when you move VMs across different resource pools or perform a P2V of a physical system. It may also happen as the result of a known bug in vCenter which randomly sets a memory limit on a virtual machine, or, worst of all, in previously configured templates, which results in all deployed VMs inheriting the setting.

As further explanation: if you allocate 2 GB of memory to a VM and there is a limit at 512 MB, the guest OS will see 2 GB of memory but the ESX(i) host will only allow 512 MB of physical memory. If the guest OS requires more than 512 MB, the memory balloon driver will start to inflate, letting the guest OS decide which pages are actively being used. If the balloon can't reclaim any more memory, the guest OS will start to swap. If the balloon can't deflate, or if memory usage is too high on the ESX(i) server, the host will start to use memory compression and then VMkernel swapping as a last resort. Ballooning is a first warning signal; guest OS / ESX(i) host swapping will definitely impact the performance of the VM, the ESX(i) host, and the storage subsystem that has to serve as virtual memory. For further explanation see: http://www.vmguru.com/index.php/articles-mainmenu-62/mgmt-and-monitoring-mainmenu-68/96-memory-behavior-when-vm-limits-are-set

vFoglight: Detect, diagnose and resolve VM problems. Memory limits are detected by a rule, a diagnosis tells you what is wrong, and optional workflows automate the resolution.


    Memory sizing

    When configuring the amount of VM memory, consider the following:

o Too much memory will increase the VM memory overhead - your VM density (number of VMs per host) will not be as high as it could be.

o Too little memory can result in guest OS swapping - your performance will be affected negatively.

To determine the correct amount of memory you need to monitor active memory utilization over at least 30-90 days to be able to see patterns. Some systems might only be used during a certain period of the quarter, but used very heavily during that period.

Memory utilization (Active Memory) in this example is very low over time, which makes it safe to decrease the memory setting without affecting VM and application performance.
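Once you trust the observed peak, the new allocation can be derived mechanically. A minimal sketch of one possible heuristic - the 25% headroom buffer and the 256 MB rounding step are my own assumptions, not a VMware recommendation:

```shell
# Illustrative right-sizing heuristic (an assumption, not a VMware formula):
# size the VM at its observed peak active memory plus 25% headroom,
# rounded up to the next 256 MB boundary.
peak_active_mb=1400   # hypothetical peak from 90 days of monitoring
awk -v p="$peak_active_mb" 'BEGIN {
  target = p * 1.25                        # add 25% headroom
  step = 256
  rounded = int((target + step - 1) / step) * step
  printf "Suggested allocation: %d MB\n", rounded
}'
```

For a 1400 MB peak this suggests 1792 MB; adjust the headroom to match how bursty the workload is.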

    Memory reclamation

It is a best practice to right-size memory allocation in order to avoid placing extra load on ESX(i) hosts due to memory reclamation. There are several techniques an ESX(i) host uses to reclaim VM memory. After all, you want to run as many VMs as possible and will probably over-commit memory (allocate more than you have).

o Ballooning*: Reclaiming memory by increasing memory pressure inside the VM - this requires VMware Tools. Do not disable ballooning, as doing so will negatively impact performance. If you experience a lot of ballooning, try to vMotion the VM to another host, which will allocate all memory back to the VM. Also, make sure you don't have a fixed memory limit configured on the VM.

    o Swapping*: Reclaiming memory by having ESX(i) host swap out VM memory to disk.

o Memory compression*: Reclaiming memory by compressing pages before they are swapped out to disk. By default, up to 10% of the VM memory allocation can be used as the compression cache.

o Transparent Page Sharing: Reclaiming memory by removing redundant pages with the same content (in the same VM or across VMs).

    * Only active when ESX(i) host is experiencing memory contention.

    For more details: http://www.vmware.com/files/pdf/techpaper/vsp_41_perf_memory_mgmt.pdf

    http://frankdenneman.nl/2010/06/memory-reclaimation-when-and-how/

Memory definitions:

Granted: Physical memory granted to the VM by the ESX(i) host.

Active: Physical memory actively being used by the VM.

Ballooned: Memory being used by the VMware Memory Control Driver to allow the VM OS to selectively swap memory.

Swapped: Memory being swapped to disk.

For a complete list of metrics and descriptions see: http://communities.vmware.com/docs/DOC-5600


    Disk

Now, let's move on to the most complex building block of the foundation, the disk configuration:

    ParaVirtualized SCSI (PVSCSI) controller

PVSCSI provides better throughput and lower CPU utilization. Studies have shown that PVSCSI delivers a 12% increase in throughput and an 18% decrease in CPU utilization compared to the LSI Logic controller. http://blogs.vmware.com/performance/2009/05/350000-io-operations-per-second-one-vsphere-host-with-30-efds.html

    VMware benchmarking PVSCSI versus LSI Logic: http://www.vmware.com/pdf/vsp_4_pvscsi_perf.pdf

ESX(i) 4.1 has improved PVSCSI to handle low disk I/O workloads, where older ESX(i) versions had queuing problems that could result in latency. http://vpivot.com/2010/02/04/pvscsi-and-low-io-workloads

For more information on how to configure PVSCSI: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1010398

    Separate OS, Swap and Data disks into separate VMDK files.

o This will help performance and data protection (primarily by excluding the swap data).

o Consider creating a separate virtual disk controller for each disk. This allows higher disk I/O than a single controller. Power off the VM and set the disks' SCSI IDs to 1:0, 2:0 and so on; each new bus number gives you an additional controller.
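For illustration, placing each disk on its own SCSI bus produces .vmx entries along these lines - the file names and the choice of PVSCSI are hypothetical, and scsi0:0 is the OS disk while scsi1:0 sits on its own second controller:

```
scsi0.present = "TRUE"
scsi0.virtualDev = "pvscsi"
scsi0:0.fileName = "os.vmdk"
scsi1.present = "TRUE"
scsi1.virtualDev = "pvscsi"
scsi1:0.fileName = "data.vmdk"
```

In practice you would make these changes through the VI Client rather than by editing the .vmx directly.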

    LUN sizing and VM placement

Create the datastores with an appropriate size (500-1000 GB). LUNs that are too big will host too many VMs, causing SCSI reservation conflicts and potentially lower disk I/O due to metadata locking (e.g. vMotion, VM power-on, snapshots).

o vStorage API for Array Integration (VAAI) is a new API in ESX(i) 4.1 that takes some of the heavy lifting from the hypervisor and moves it to the storage hardware. If your hardware supports it, you will be able to run bigger datastores without performance problems. It also reduces the metadata locking mentioned above. For more details: http://www.yellow-bricks.com/2010/11/23/vstorage-apis-for-array-integration-aka-vaai/

Use an 8 MB block size when creating the datastores; it has no negative impact on performance and it can hold larger VMDK files. You must have the same block size on all datastores if you want to leverage VAAI.

Monitor the LUN performance and identify any latency as quickly as possible to ensure that disk I/O is streamlined across all LUNs.


    vFoglight Storage: Monitoring paths, throughput and latency is important

    VMFS and guest OS Alignment

If you create new VMFS volumes from the vCenter client, the volumes will be aligned correctly for you. If you created the VMFS volumes during ESX(i) installation, your volumes will be unaligned. The only way to fix this is to Storage vMotion all VMs in the affected datastore to a new datastore and then recreate it from the vCenter client.

Windows 2008, 7 and Vista align NTFS volumes by default. All prior Windows server OSs misalign the disks. You can only align a disk when you create it. Most Linux distributions have the same misalignment tendency.

On average, properly aligning the disks can increase performance by 12% and decrease latency by 10%. For more information see: http://www.vmware.com/pdf/esx3_partition_align.pdf
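For a new data disk on an older Windows guest, the partition can be created with an explicit offset from a cmd prompt; align=64 reflects the 64 KB starting offset discussed in the VMware alignment paper, and the disk number is a placeholder - confirm the right value with your storage vendor:

```
diskpart
DISKPART> select disk 1
DISKPART> create partition primary align=64
```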

Quest vOptimizer Pro can detect and resolve alignment problems on existing disks for Windows and Linux. To learn more: http://www.quest.com/voptimizer%2Dpro/

    Storage I/O Control (SIOC)

From ESX(i) 4.1, SIOC can be enabled on a per-datastore basis. This can be helpful if you are concerned that some mission-critical VMs are not getting the required disk I/O during times of disk congestion.

You can configure disk shares per VM. If there is disk congestion, the VMs with higher disk shares have priority for disk I/O (shares are used only when there's contention) - this works the same way as memory shares.
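As a rough illustration of how proportional shares behave under contention - generic shares arithmetic with made-up numbers, not an exact SIOC formula:

```shell
# Under contention, each consumer gets a slice of the resource
# proportional to its shares relative to all active shares.
awk 'BEGIN {
  total_iops = 10000                 # hypothetical datastore capability
  vm_a = 2000; vm_b = 1000           # hypothetical disk share values
  total = vm_a + vm_b
  printf "VM A: %d IOPS, VM B: %d IOPS\n",
         total_iops * vm_a / total, total_iops * vm_b / total
}'
```

With a 2:1 share ratio, VM A is entitled to roughly twice the I/O of VM B while the datastore is congested; when there is no contention, shares impose no limit at all.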


Network

Physical network

Make sure you have multiple redundant physical NICs at 1 Gbit/s or 10 Gbit/s speeds connected to the VM virtual network switches.

    VMXNET3

The network driver in the guest OS can be upgraded from the default E1000 to VMXNET3, a paravirtualized network driver with the same kind of enhancements as the paravirtualized storage described above; it can also leverage 10 Gbit/s network speeds.

Caution! The IP address will reset to DHCP and a new MAC address will be generated. Make sure to capture your old settings first; on Windows, ipconfig /all > c:\ip.txt will capture them into ip.txt.

Optionally, enabling jumbo frames on the network can help maximize the size of the packets that traverse the environment.

Set the MTU to 9000 in the guest OS driver, the vSwitch, and the physical network ports (end to end). Your network infrastructure must also support jumbo frames.
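One quick sanity check from a Windows guest is a don't-fragment ping with a payload sized for a 9000-byte MTU (9000 minus 28 bytes of IP and ICMP headers leaves 8972); the gateway address here is a placeholder:

```
ping -f -l 8972 192.168.1.1
```

If any hop in the path is not configured for jumbo frames, the reply will report that the packet needs to be fragmented.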

ESX(i) 4.1 supports Fault Tolerance for guest OSs using VMXNET3.

For more detailed information and performance tests see: http://www.vmware.com/pdf/vsp_4_vmxnet3_perf.pdf and http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001805

    Network I/O Control (NetIOC)

NetIOC allows you to control the network bandwidth utilized by vMotion, NFS, iSCSI, Fault Tolerance, VMs and management. This is done by configuring shares or limits, and it lets you enforce quality of service, making sure critical components always get the network bandwidth they require.

Delete unnecessary devices from your virtual hardware and guest OS

Unnecessary devices in the virtual hardware and inside the guest OS take unnecessary CPU and memory resources to emulate. If you don't use them, make sure to delete them. Cleaning up inside the guest OS will not gain much performance; it's more of a housekeeping task.

    Floppy, CD, USB, Serial Port, Com Port, Sound

Fewer devices mean less overhead for your VM

Clean up deleted hardware inside the guest OS as well:

o For Windows, at a cmd prompt type:

    set devmgr_show_nonpresent_devices=1

    Start Device Manager from the same cmd session (devmgmt.msc) so the variable applies

    In Device Manager, select View > Show hidden devices

    Delete all non-present devices


Acknowledgement

Thanks to my colleagues at Quest Software: Tommy Patterson, Chris Walker, Paul Martin, Thomas Bryant and Scott Herold for reviewing and giving valuable feedback.

    A special thanks to VMware Trainer and Blogger: Eric Sloof at ntpro.nl for additional review and finding some errors.

Summary

Tune the foundation and you will better utilize the infrastructure. Once the building blocks (CPU, memory, disk and network) are optimized, the environment can deliver its true performance.