
  • 7/31/2019 VMware vSphere Performance WP En

    1/13

What's New in VMware vSphere 4: Performance Enhancements

WHITE PAPER


Table of Contents

Scalability Enhancements
CPU Enhancements
Memory Enhancements
Storage Enhancements
Networking Enhancements
Resource Management Enhancements
Performance Management Enhancements
Application Performance
Oracle
SQL Server
SAP
Exchange
Summary
References


VMware vSphere 4, the industry's first cloud operating system, includes several unique new features that allow IT organizations to leverage the benefits of cloud computing with maximum efficiency, uncompromised control, and flexibility of choice. VMware vSphere 4 provides significant performance enhancements that make it easier for organizations to virtualize their most demanding and intense workloads. These performance enhancements provide VMware vSphere 4 with better:

Efficiency: Optimizations that reduce virtualization overhead and deliver the highest consolidation ratios.

Control: Enhancements that improve ongoing performance monitoring and management, as well as dynamic resource sizing for better scalability.

Choice: Improvements that provide several options of guest OS, virtualization technologies, a comprehensive HCL, and integrations with third-party management tools to choose from.

This document outlines the key performance enhancements of VMware vSphere 4, organized into the following categories:

Scalability Enhancements
CPU, Memory, Storage, Networking
Resource Management
Performance Management

Finally, the white paper showcases the performance improvements these benefits produce in various tier-1 enterprise applications.

Scalability Enhancements

A summary of the key new scalability improvements of vSphere 4, as compared to VMware's previous datacenter product, VMware Infrastructure 3 (VI3), is shown in the following table:

Feature                                  | VI3      | vSphere 4
Virtual Machine CPU Count                | 4 vCPUs  | 8 vCPUs
Virtual Machine Memory Maximum           | 64 GB    | 255 GB
Host CPU Core Maximum                    | 32 cores | 64 cores
Host Memory Maximum                      | 256 GB   | 1 TB
Powered-on VMs per ESX/ESXi Maximum      | 128      | 320

For details, see the Systems Compatibility Guide and the Guest Operating System Installation Guide.

Additional changes that enhance the scalability of vSphere include:

64 Logical CPUs and 512 Virtual CPUs Per Host: ESX/ESXi 4.0 provides headroom for more virtual machines per host and the ability to achieve even higher consolidation ratios on larger machines.

64-bit VMkernel: The VMkernel, a core component of the ESX/ESXi 4.0 hypervisor, is now 64-bit. This provides greater host physical memory capacity and more seamless hardware support than earlier releases.

64-bit Service Console: The Linux-based Service Console for ESX 4.0 has been upgraded to a 64-bit version derived from a recent release of a leading enterprise Linux vendor.
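As a back-of-envelope illustration of this headroom, a planned consolidation can be checked against the per-host maximums quoted in this paper. This is only a sketch; `fits_on_host` is a hypothetical helper, and real sizing must also account for memory, storage, and network limits.

```python
# Illustrative only: check a consolidation plan against the vSphere 4
# per-host limits quoted above (512 vCPUs, 320 powered-on VMs per host).
HOST_LIMITS = {"vcpus": 512, "powered_on_vms": 320}

def fits_on_host(num_vms: int, vcpus_per_vm: int) -> bool:
    """Return True if the plan stays within the documented per-host maximums."""
    total_vcpus = num_vms * vcpus_per_vm
    return (num_vms <= HOST_LIMITS["powered_on_vms"]
            and total_vcpus <= HOST_LIMITS["vcpus"])

print(fits_on_host(128, 4))  # 512 vCPUs total: at the limit, still allowed
print(fits_on_host(320, 2))  # 640 vCPUs: VM count fits but vCPU count does not
```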


New Virtual Hardware: ESX/ESXi 4.0 introduces a new generation of virtual hardware (virtual hardware version 7), which adds significant new features including:

Serial Attached SCSI (SAS) virtual device for Microsoft Cluster Service: Provides support for running Windows Server 2008 in a Microsoft Cluster Service configuration.

IDE virtual device: Ideal for supporting older operating systems that lack SCSI drivers.

VMXNET Generation 3: See the Networking section.

Virtual Machine Hot Plug Support: Provides support for adding and removing virtual devices, adding virtual CPUs, and adding memory to a virtual machine without having to power off the virtual machine.

Hardware version 7 is the default for new ESX/ESXi 4.0 virtual machines. ESX/ESXi 4.0 will continue to run virtual machines created on hosts running ESX Server versions 2.x and 3.x. Virtual machines that use virtual hardware version 7 features are not compatible with ESX/ESXi releases prior to version 4.0.

VMDirectPath for Virtual Machines: VMDirectPath I/O device access enhances CPU efficiency in handling workloads that require constant and frequent access to I/O devices, by allowing virtual machines to directly access the underlying hardware devices. Other virtualization features, such as VMotion, hardware independence, and sharing of physical I/O devices, are not available to virtual machines using this feature. VMDirectPath I/O for networking I/O devices is fully supported with the Intel 82598 10 Gigabit Ethernet Controller and the Broadcom 57710 and 57711 10 Gigabit Ethernet Controllers. It is experimentally supported for storage I/O devices with the QLogic QLA25xx 8Gb Fibre Channel, the Emulex LPe12000 8Gb Fibre Channel, and the LSI 3442e-R and 3801e (1068 chip based) 3Gb SAS adapters.

Increased NFS Datastore Support: ESX now supports up to 64 NFS shares as datastores in a cluster.

CPU Enhancements

Resource Management and Processor Scheduling

The ESX 4.0 scheduler includes several new features and enhancements that help improve the throughput of all workloads, with notable gains in I/O-intensive workloads. These include:

Relaxed co-scheduling of vCPUs, introduced in earlier versions of ESX, has been further fine-tuned, especially for SMP VMs.

The ESX 4.0 scheduler uses new finer-grained locking that reduces scheduling overhead in cases where frequent scheduling decisions are needed.

The new scheduler is aware of processor cache topology and takes the processor cache architecture into account to optimize CPU usage.

For I/O-intensive workloads, interrupt delivery and the associated processing costs make up a large component of the virtualization overhead. The above scheduler enhancements greatly improve the efficiency of interrupt delivery and its associated processing.
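As a toy illustration of cache-topology awareness (this is not the ESX scheduler's actual algorithm, which is not described in detail here), a placement routine might prefer to co-locate an SMP VM's vCPUs on cores that share a last-level cache:

```python
# Toy model, not the ESX scheduler: place an SMP VM's vCPUs on cores that
# share a last-level cache (LLC) when a large enough cache domain exists.
from collections import defaultdict

def place_vcpus(core_to_llc: dict, num_vcpus: int):
    """Return a list of cores for the vCPUs, preferring a single LLC domain."""
    llc_groups = defaultdict(list)
    for core, llc in core_to_llc.items():
        llc_groups[llc].append(core)
    # Prefer the smallest LLC domain that still fits all vCPUs, so the VM's
    # working set stays warm in one cache and other domains stay free.
    candidates = [g for g in llc_groups.values() if len(g) >= num_vcpus]
    if candidates:
        return sorted(min(candidates, key=len))[:num_vcpus]
    # No single domain fits: fall back to spreading across domains.
    return sorted(core_to_llc)[:num_vcpus]

topology = {0: "llc0", 1: "llc0", 2: "llc0", 3: "llc0", 4: "llc1", 5: "llc1"}
print(place_vcpus(topology, 2))  # picks the two cores sharing llc1
```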

Memory Enhancements

Hardware-assisted Memory Virtualization

Memory management in virtual machines differs from physical machines in one key aspect: virtual memory address translation. Guest virtual memory addresses must first be translated to guest physical addresses using the guest OS's page tables, before finally being translated to machine physical memory addresses. ESX performs the latter step by means of a set of shadow page tables for each virtual machine. Creating and maintaining the shadow page tables adds both CPU and memory overhead.


Hardware support is available in current processors to alleviate this situation. The hardware-assisted memory management capabilities from Intel and AMD are called EPT and RVI, respectively. This support consists of a second level of page tables implemented in hardware, containing guest-physical-to-machine memory address translations. ESX 4.0 introduces support for the Intel Xeon processors that support EPT; support for AMD RVI has existed since ESX 3.5.
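The two translation levels can be sketched as a pair of lookup tables. This toy model (hypothetical page-table contents, 4 KB pages) shows a guest-virtual address passing first through the guest OS's page tables and then through the hardware's nested tables:

```python
# Toy two-level translation in the style of EPT/RVI nested paging:
# guest-virtual -> guest-physical (guest page tables) -> machine (nested tables).
guest_page_table = {0x1000: 0x5000}   # guest-virtual page -> guest-physical page
nested_page_table = {0x5000: 0x9000}  # guest-physical page -> machine page

def translate(gva: int) -> int:
    """Translate a guest-virtual address to a machine-physical address."""
    page, offset = gva & ~0xFFF, gva & 0xFFF
    gpa = guest_page_table[page] | offset                   # level 1: guest OS tables
    mpa = nested_page_table[gpa & ~0xFFF] | (gpa & 0xFFF)   # level 2: hardware tables
    return mpa

print(hex(translate(0x1234)))  # 0x1234 -> guest-physical 0x5234 -> machine 0x9234
```

With shadow page tables, ESX had to pre-compute the composed guest-virtual-to-machine mapping in software; nested paging lets the hardware walk both levels itself, at the cost of longer walks on a TLB miss.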

Figure 1. Efficiency improvements using hardware-assisted memory virtualization, for example workloads (Apache compile, SQL Server, Citrix XenApp).

Figure 1 illustrates the efficiency improvements seen for a few example workloads when using hardware-assisted memory virtualization.

While this hardware support obviates the need for maintaining shadow page tables (and the associated performance overhead), it introduces some costs of its own. Translation lookaside buffer (TLB) miss costs, in the form of increased latency, are higher with two-level page tables than with a one-level table. Using large memory pages, a feature that has been available since ESX 3.5, the number of TLB misses can be reduced. Since TLB miss latency is higher with this form of hardware virtualization assist, but large pages reduce the number of TLB misses, the combination of hardware assist and large page support that exists in vSphere yields optimal performance.
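Why large pages help can be seen with simple arithmetic: covering the same working set with 2 MB pages requires far fewer TLB entries than with 4 KB pages, so the now more expensive two-level walks happen less often. A sketch (illustrative numbers, not measurements):

```python
# Back-of-envelope: TLB entries needed to cover a working set with 4 KB
# versus 2 MB pages. Fewer required entries means fewer TLB misses, which
# matters more when each miss walks two levels of page tables.
def tlb_entries_needed(working_set_bytes: int, page_size: int) -> int:
    """Ceiling of working_set / page_size: pages (TLB entries) to cover it."""
    return -(-working_set_bytes // page_size)  # ceiling division

ws = 1 << 30  # a 1 GB working set
print(tlb_entries_needed(ws, 4 * 1024))        # 262144 entries with 4 KB pages
print(tlb_entries_needed(ws, 2 * 1024 * 1024)) # 512 entries with 2 MB pages
```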

Storage Enhancements

A variety of architectural improvements have been made to the storage subsystem of vSphere 4. The combination of the new paravirtualized SCSI driver and additional ESX kernel-level storage stack optimizations dramatically improves storage I/O performance. With these improvements, all but a very small segment of the most I/O-intensive applications become attractive targets for VMware virtualization.

VMware Paravirtualized SCSI (PVSCSI)

Emulated versions of hardware storage adapters from BusLogic and LSI Logic were the only choices available in earlier ESX releases. The advantage of this full virtualization is that most operating systems ship drivers for


these devices. However, it precludes the use of performance optimizations that are possible in virtualized environments. To this end, ESX 4.0 ships with a new virtual storage adapter, Paravirtualized SCSI (PVSCSI). PVSCSI adapters are high-performance storage adapters that offer greater throughput and lower CPU utilization for virtual machines. They are best suited for environments in which guest applications are very I/O intensive. The PVSCSI adapter extends to the storage stack the performance gains associated with other paravirtual devices, such as the VMXNET network adapter available in earlier versions of ESX. As with other device emulations, PVSCSI emulation improves efficiency by:

Reducing the cost of virtual interrupts

Batching the processing of I/O requests

Batching I/O completion interrupts

A further optimization, which is specific to virtual environments, reduces the number of context switches between the guest and the Virtual Machine Monitor. Efficiency gains from PVSCSI can result in an additional 2x CPU savings for Fibre Channel (FC), and up to 30 percent CPU savings for iSCSI.
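The effect of batching completion interrupts, one of the techniques listed above, can be sketched with ceiling-division arithmetic. The batch size here is illustrative, not an actual PVSCSI parameter:

```python
# Toy model of completion-interrupt batching: instead of raising one virtual
# interrupt per completed I/O, raise one per batch of completions, cutting
# the per-interrupt processing cost paid by the guest and the monitor.
def interrupts_raised(num_completions: int, batch_size: int) -> int:
    """Interrupts needed to report num_completions, batch_size at a time."""
    return -(-num_completions // batch_size)  # ceiling division

ios = 10_000
print(interrupts_raised(ios, 1))  # 10000 interrupts, unbatched
print(interrupts_raised(ios, 8))  # 1250 interrupts with batches of 8
```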

Figure 2. Efficiency gains with the PVSCSI adapter: efficiency of 4K-block I/Os, LSI Logic versus PVSCSI, over software iSCSI and Fibre Channel.

VMware recommends that you create a primary adapter for use with the disk that will host the system software (boot disk) and a separate PVSCSI adapter for the disk that will store user data, such as a database or mailbox. The primary adapter will be the default for the guest operating system on the virtual machine. For example, for virtual machines with Microsoft Windows 2008 guest operating systems, LSI Logic is the default primary adapter.
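In a virtual machine's .vmx file, that recommended layout might look like the following sketch. The `scsiN.virtualDev` keys are the standard virtual-hardware settings for adapter type; the file names and disk layout are illustrative assumptions, not values from this paper:

```ini
; Sketch of a .vmx fragment: default adapter for the boot disk,
; separate PVSCSI adapter for the data disk (file names are hypothetical).
scsi0.present = "TRUE"
scsi0.virtualDev = "lsilogic"   ; primary adapter: guest OS boot disk
scsi0:0.fileName = "boot.vmdk"
scsi1.present = "TRUE"
scsi1.virtualDev = "pvscsi"     ; PVSCSI adapter: I/O-intensive data disk
scsi1:0.fileName = "data.vmdk"
```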

iSCSI Support Improvements

vSphere 4 includes significant updates to the iSCSI stack, for both software iSCSI (that is, where the iSCSI initiator runs at the ESX layer) and hardware iSCSI (that is, where ESX leverages a hardware-optimized iSCSI HBA). These changes offer dramatic improvements in both the performance and the functionality of both


software and hardware iSCSI, and deliver a significant reduction of CPU overhead for software iSCSI. Efficiency gains for the iSCSI stack can result in 7-26 percent CPU savings for reads and 18-52 percent for writes.

Figure 3. iSCSI percentage CPU efficiency gains, ESX 4 vs. ESX 3.5, for hardware and software iSCSI, reads and writes.

Software iSCSI and NFS Support with Jumbo Frames

vSphere 4 adds support for jumbo frames with both NFS and iSCSI storage protocols, on 1Gb as well as 10Gb NICs. The 10Gb support for iSCSI allows for 10x I/O throughput (more details in the Networking section below).

Improved I/O Concurrency

Asynchronous I/O execution has always been a feature of ESX. However, ESX 4.0 has improved the concurrency of the storage stack with an I/O mode that allows vCPUs in the guest to execute other tasks after initiating an I/O request, while the VMkernel handles the actual physical I/O. In VMware's February 2009 announcement on Oracle DB OLTP performance, the gains attributed to this improved concurrency model were measured at 5 percent.
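The concurrency model described above can be sketched with ordinary threads standing in for the vCPU and the VMkernel. This is illustrative only; the real mechanism lives inside the hypervisor's storage stack:

```python
# Toy illustration of the improved I/O concurrency model: after a vCPU
# initiates an I/O request, it keeps executing other work while a separate
# thread (standing in for the VMkernel) performs the physical I/O.
import threading
import time

done = threading.Event()
result = {}

def vmkernel_io():
    """Pretend physical I/O handled by the VMkernel."""
    time.sleep(0.05)  # simulated device latency
    result["data"] = b"block"
    done.set()

threading.Thread(target=vmkernel_io).start()  # initiate the I/O

overlap_work = sum(range(1000))  # the vCPU does useful work meanwhile
done.wait()                      # block only when the result is needed
print(overlap_work, result["data"])
```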

Networking Enhancements

Significant changes have been made to the vSphere 4 network subsystem, delivering dramatic performance improvements.

VMXNET Generation 3

vSphere 4 includes VMXNET3, the third generation of paravirtualized NIC adapter from VMware. New VMXNET3 features over the previous version, Enhanced VMXNET, include:

MSI/MSI-X support (subject to guest operating system kernel support)

Receive Side Scaling (supported in Windows 2008 when explicitly enabled through the device's Advanced configuration tab)


IPv6 checksum and TCP Segmentation Offloading (TSO) over IPv6

VLAN offloading

Large TX/RX ring sizes (configured from within the virtual machine)

Network Stack Performance and Scalability

vSphere 4 includes optimizations to the network stack that can saturate 10Gbps links for both transmit- and receive-side network I/O. The improvements in the VMkernel TCP/IP stack also improve both iSCSI throughput and the maximum network throughput for VMotion.

vSphere 4 utilizes transmit queues to provide 3x throughput improvements in transmit performance for small packet sizes.

Figure 4. Network transmit throughput improvement for vSphere: gains over ESX 3.5 with 1, 4, 8, and 16 VMs.

vSphere 4 supports Large Receive Offload (LRO), a feature that coalesces TCP packets from the same connection to reduce CPU utilization. Using LRO with ESX provides a 40 percent improvement in both throughput and CPU costs.
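What LRO does can be sketched in a few lines. Real LRO operates in the network stack on TCP headers and keeps per-connection aggregation state; this toy version merges only adjacent in-order segments, which is enough to show why the per-packet processing cost drops:

```python
# Toy sketch of Large Receive Offload: merge consecutive in-order TCP
# segments from the same connection into one larger packet before handing
# them up the stack, so per-packet processing happens less often.
def lro_coalesce(segments):
    """segments: list of (conn_id, seq, payload); returns coalesced packets."""
    merged = []
    for conn, seq, data in segments:
        if merged:
            mconn, mseq, mdata = merged[-1]
            # Same connection and contiguous sequence number: extend the packet.
            if conn == mconn and seq == mseq + len(mdata):
                merged[-1] = (mconn, mseq, mdata + data)
                continue
        merged.append((conn, seq, data))
    return merged

segs = [("c1", 0, b"aa"), ("c1", 2, b"bb"), ("c2", 0, b"xx"), ("c1", 4, b"cc")]
print(lro_coalesce(segs))  # the first two c1 segments are merged into one
```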


Resource Management Enhancements

VMotion

Performance enhancements in vSphere 4 reduce the time to VMotion a VM by up to 75 percent.

Storage VMotion Performance

Storage VMotion is now fully supported (it was experimental before) and has a much improved switchover time; for very I/O-intensive VMs, this improvement can be 100x. Storage VMotion leverages a new and more efficient block copy mechanism called Changed Block Tracking, reducing CPU and memory resource consumption on the ESX host by up to two times.
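The Changed Block Tracking idea can be sketched as follows. This is a toy model, not the VMFS implementation: record which blocks the guest writes while a bulk copy is in flight, then re-copy only those blocks instead of the whole disk:

```python
# Toy sketch of Changed Block Tracking: track dirty blocks during a copy,
# then re-copy only those blocks on the next pass.
class ChangedBlockTracker:
    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self.dirty = set()

    def on_write(self, block: int):
        """Called on each guest write while the bulk copy is in flight."""
        self.dirty.add(block)

    def blocks_to_recopy(self):
        """Drain and return the sorted set of blocks needing a second pass."""
        changed, self.dirty = sorted(self.dirty), set()
        return changed

cbt = ChangedBlockTracker(num_blocks=1_000_000)
for blk in (42, 7, 42, 99):  # guest writes arriving during the bulk copy
    cbt.on_write(blk)
print(cbt.blocks_to_recopy())  # only 3 of 1,000,000 blocks need re-copying
```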

Figure 5. Decreased Storage VMotion time, ESX 3.5 vs. ESX 4.

Figure 6. Improved VMFS performance: 20-VM provisioning time, ESX 3.5 vs. ESX 4.

Figure 7. Performance enhancements lead to a reduced time to VMotion: elapsed VMotion time for a 4GB VM during SPECjbb (active) and after SPECjbb (idle), ESX 3.5 vs. ESX 4 (seconds, lower is better).


Figure 8. Time to boot 512 VDI VMs (512-VM boot storm over Fibre Channel), ESX 3.5 vs. ESX 4.

VM Provisioning

VMFS performance improvements offer more efficient VM creation and cloning. This use case is especially important given vSphere's more ambitious role as a cloud operating system.

Performance Management Enhancements

Enhanced vCenter Server Scalability

As organizations adopt server virtualization at an unprecedented level, the need to manage large-scale virtual data centers is growing significantly. To address this, vCenter Server, included with vSphere 4, has been enhanced to manage up to 300 hosts and 3,000 virtual machines. You also have the ability to link many vCenter Servers in your environment with vCenter Server Linked Mode to manage up to 10,000 virtual machines from a single console.

vCenter Performance Charts Enhancements

Performance charts in vCenter have been enhanced to provide a single view of all performance metrics, such as CPU, memory, disk, and network, without navigating through multiple charts. In addition, the performance charts include the following improvements:

Aggregated charts show high-level summaries of resource distribution, which is useful for identifying the top consumers.

Thumbnail views of hosts, resource pools, clusters, and datastores allow for easy navigation to the individual charts.

Drill-down capability across multiple levels in the inventory helps in isolating the root cause of performance problems quickly.

Detailed datastore-level views show utilization by file type and unused capacity.

Application Performance

Oracle

VMware testing has shown that, running a resource-intensive OLTP benchmark based on a non-comparable implementation of the TPC-C* workload specification, an Oracle DB in an 8-vCPU VM with vSphere 4 achieved 85 percent of native performance. This workload demonstrated 8,900 database transactions per second and 60,000 disk input/outputs per second (IOPS). The results demonstrated in this proof point represent the most I/O-intensive application-based workload ever run in an x86 virtual environment to date.

* The benchmark was a fair-use implementation of the TPC-C business model; these results are not TPC-C compliant results, and are not comparable to official TPC-C results. TPC Benchmark is a trademark of the TPC.

Figure 9. Comparison of ESX 4 Oracle DB VM throughput vs. a 2-CPU native configuration, for 2-, 4-, and 8-processor configurations.

The results above were run on a server with only eight physical cores, resulting in an 8-way VM configuration that was not under-committing the host. The slightly less committed four-vCPU configuration ran at 88 percent of native.

SQL Server

Running an OLTP benchmark based on a non-comparable implementation of the TPC-E* workload specification, a SQL Server virtual machine with four virtual CPUs on vSphere 4.0 showed 90 percent efficiency with respect to native. The SQL Server VM with a 500GB database performed 10,500 IOPS and 50Mb/s of network throughput.

Figure 10. Comparison of vSphere SQL Server VM throughput vs. a 1-CPU native configuration: relative scaling ratio for 1-, 2-, and 4-CPU configurations.

* The benchmark was a fair-use implementation of the TPC-E business model; these results are not TPC-E compliant results, and are not comparable to official TPC-E results. TPC Benchmark is a trademark of the TPC.


SAP

VMware testing demonstrated that running SAP in a VM with vSphere 4 scaled linearly from one to eight vCPUs per VM and achieved 95 percent of native performance on a standard 2-tier SAP benchmark. This multi-tiered application architecture includes the SAP application tier and a back-end SQL Server database instantiated in a single virtual machine.

Figure 11. Comparison of ESX 4 SAP VM throughput vs. a 1-CPU native configuration: relative scaling ratio for 1-, 2-, 4-, and 8-CPU configurations.

Exchange

Microsoft Exchange Server is one of the most demanding applications in today's datacenters, save the very largest databases being deployed. Previous work on virtual Exchange deployments showed VMware's ability to improve performance over native configurations by designing an Exchange architecture with a greater number of mailbox instances running fewer mailboxes per instance.

With the performance enhancements added to vSphere 4, single-VM Exchange mailbox servers have been demonstrated at up to 8,000 mailboxes per instance. This means that Exchange administrators have the option of choosing the higher-performing smaller mailbox servers or the more cheaply licensed large mailbox servers.


VMware, Inc., Hillview Avenue, Palo Alto, CA, USA. www.vmware.com. Copyright 2009 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.

Figure 12. vSphere performance enhancements with Microsoft Exchange: ESX 4 Exchange mailbox count (users, in thousands) and 95th-percentile latency (ms) for 1, 2, 4, 6, and 8 VMs (at 8 VMs, the number of vCPUs exceeds the number of physical CPUs).

Summary

VMware innovations continue to make VMware vSphere 4 the industry standard for computing in data centers of all sizes and across all industries. The numerous performance enhancements in VMware vSphere 4 enable organizations to get even more out of their virtual infrastructure and further reinforce the role of VMware as the industry leader in virtualization.

vSphere represents dramatic advances in performance compared to VMware Infrastructure 3, ensuring that even the most resource-intensive and scale-out applications, such as large databases and Microsoft Exchange email systems, can run on private clouds powered by vSphere.

    References

    Performance Evaluation of AMD RVI Hardware Assist

    http://www.vmware.com/pdf/RVI_performance.pdf

    Performance Evaluation of Intel EPT Hardware Assist

    http://www.vmware.com/pdf/Perf_ESX_Intel-EPT-eval.pdf