VMware Performance Troubleshooting

  • 7/31/2019 Vm Ware Performance Troubleshooting

    1/53

    VMware Performance Troubleshooting

    Presented by Chris Kranz


    2/53

    Topics Covered

    Introduction

    Root Cause Analysis

    Performance Characteristics

    CPU

    Networking

    Memory

    Disk

    Virtual Machine optimisation

    ESXTop

    vm-support

    Service Console

    Resource Groups

    Design Guidelines

    Capacity Planner limitations and cautions

    Conclusion

    Reference Articles


    3/53

    Introduction

    Multiple layers of virtualisation are used to increase service levels, availability and manageability

    However, multiple layers of virtualisation often mask performance and configuration issues, making them more of a challenge to troubleshoot and correct

    The worst outcome is that performance issues after a virtualisation project lead to the perception that VMware reduces performance, which can affect future confidence in VMware


    4/53

    Performance Basics

    Virtual Machine Resources

    CPU

    Memory

    Disk

    Networking


    5/53

    Resource Maximums

    Resource                 Host    Guest
    Logical Processors       64      N/A
    Virtual CPUs             N/A     8
    Virtual CPUs per Core    20      N/A
    Memory                   1TB     256GB

    http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf


    6/53

    Typical Host

    vSphere 1U Host

    CPUs 2 x Quad Core

    Memory 32-64GB RAM

    Typically 3 VMs per core, 24 VMs per host

    Each VM has 2GB of RAM = 48GB of RAM in total
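The sizing arithmetic on this slide can be reproduced with a short sketch. The function name and parameters are illustrative only, and the 3 VMs per core ratio is the slide's rule of thumb rather than a limit.

```python
def vm_capacity(sockets, cores_per_socket, vms_per_core, ram_per_vm_gb):
    """Return (vm_count, ram_needed_gb) for a simple per-core consolidation ratio."""
    cores = sockets * cores_per_socket
    vm_count = cores * vms_per_core
    return vm_count, vm_count * ram_per_vm_gb


# 2 x quad core host, 3 VMs per core, 2GB per VM (figures from the slide)
vms, ram_gb = vm_capacity(sockets=2, cores_per_socket=4, vms_per_core=3, ram_per_vm_gb=2)
print(vms, ram_gb)  # 24 48
```

At 48GB the VMs alone fit only in the upper half of the 32-64GB range quoted for a typical host, before any virtualisation overhead is counted.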


    7/53

    Root Cause Analysis

    http://www.vmware.com/resources/techresources/10066


    8/53

    Root Cause ...


    9/53

    Monitoring Performance

    Do not rely on guest tools alone, but they:

      Can show high CPU and memory utilisation

      Can measure latency and throughput of disk and network interfaces

    Use the virtualisation layer to diagnose the cause:

      The guest is unaware of the virtualisation workload

      Guest OSs account for time differently

      The guest has no visibility of available resources


    10/53

    esxtop (service console only)

    resxtop (remote command line utilities)

    Performance graphs in vCenter

    Performance Analysis Tools


    11/53

    esxtop can be run:

    Interactively

    Batch (e.g. esxtop -a -b > analysis.csv)

    Load the batch output into Windows perfmon or MS Excel

    Two keys to remember

    H : help

    F : fields to display

    esxtop
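As an alternative to perfmon or Excel, the batch CSV can be summarised with a few lines of script. This is a minimal sketch: the header layout in the sample (a perfmon-style timestamp column followed by counter paths ending in names such as "% Ready") is an assumption about the export, and the host and VM names are made up.

```python
import csv
import io


def column_averages(csv_text, name_contains):
    """Average every counter column whose header contains `name_contains`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name_contains in name]
    totals = {i: 0.0 for i in wanted}
    rows = 0
    for row in reader:
        if len(row) != len(header):
            continue  # skip truncated or malformed rows
        for i in wanted:
            totals[i] += float(row[i])
        rows += 1
    if rows == 0:
        return {}
    return {header[i]: totals[i] / rows for i in wanted}


# Hypothetical two-sample export; real exports are far wider.
sample = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(100:vm-a)\% Ready","\\esx01\Group Cpu(100:vm-a)\% Used"',
    '"07/31/2019 10:00:00","4.0","50.0"',
    '"07/31/2019 10:00:05","6.0","70.0"',
])

for counter, average in column_averages(sample, "% Ready").items():
    print(counter, average)
```

For a real capture, read analysis.csv from disk and pass its text to the same function.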


    12/53

    esxtop basics

    Number of Worlds

    Name of Resource Pool, Virtual Machine or World

    Host Resources


    13/53

    Performance Characteristics

    CPU: Slow Processing, High CPU Wait

    Networking: Packet Loss, Slow Network

    Memory: Slow Processing, Disk Swapping

    Disk: Log Stalls, Disk Queue

    Slow Application Performance

    Reduced User Experience

    Data Loss and Corruption


    14/53

    CPU

    ESX Scheduler

    Service Console

    Virtual Machine

    Limits / Shares / Reservations

    Basic World States: Ready / Run / Wait

    CPU States: Ready / Usage / Wait


    15/53

    CPU - esxtop

    PCPU(%): CPU utilization

    %USED: Utilization

    %RDY: Ready Time

    %RUN: Run Time

    %WAIT: Wait and idling time

    High %RDY + High %USED can imply over-commitment


    16/53

    CPU - VI Client

    Used Time > Ready Time:

    Possible CPU over-commitment

    Used Time

    Ready Time
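The VI Client charts report ready time in milliseconds per sample, while esxtop reports %RDY. A rough conversion between the two is sketched below; it assumes the 20 second real-time sampling interval that the disk statistics slide quotes, and the function name is illustrative.

```python
def ready_percent(ready_ms, interval_s=20):
    """Convert a ready-time summation (ms per sample) to a percentage of the interval."""
    return ready_ms * 100.0 / (interval_s * 1000.0)


# 2000 ms of ready time in one 20 s sample
print(ready_percent(2000))  # 10.0
```

Historical charts use longer roll-up intervals, so pass the matching interval_s when reading anything other than the real-time view.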


    17/53

    CPU - Further Investigation

    %MLMTD shows this VM has been limited


    18/53

    VMware Memory Management

    Transparent Page Sharing

    VMware Tools Balloon Driver to force the VM to swap to disk

    Virtual Machine Page File


    19/53

    Memory - Ballooning vs. Swapping

    The balloon driver causes the guest to swap pages that it chooses to disk

    ESX swapping will swap any pages to disk.


    20/53

    Memory

    Ballooning can be disabled (0 value) or controlled on a per Virtual Machine basis using:

    sched.mem.maxmemctl

    The default is 65%, and it can also be controlled at host level.

    It is only an issue in resource contention scenarios (or for latency-sensitive VMs, e.g. Citrix)
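To make the per-VM setting concrete, the sketch below derives a balloon cap from the configured memory and formats it as a configuration line. It assumes the value of sched.mem.maxmemctl is expressed in MB and written in the quoted key = "value" style of a .vmx file; both should be checked against the resource management guide for the release in use. The helper name is hypothetical.

```python
def maxmemctl_line(configured_mb, fraction=0.65):
    """Build a config line capping the balloon at `fraction` of the VM's memory.

    Assumption: the setting takes a size in MB. A fraction of 0 disables ballooning.
    """
    limit_mb = int(configured_mb * fraction)
    return 'sched.mem.maxmemctl = "%d"' % limit_mb


print(maxmemctl_line(4096))     # sched.mem.maxmemctl = "2662"
print(maxmemctl_line(4096, 0))  # sched.mem.maxmemctl = "0"
```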


    21/53

    Memory - Host

    VI Client shows memory usage of the host. This is calculated as consumed + overhead memory + Service Console.

    Performance charts are a very good way of showing the Virtual Machine memory breakdown.

    Consumed Memory

    Ballooned Memory

    Shared Memory

    Swapped Memory


    22/53

    Memory - Guest

    Host Memory = Consumed + Overhead Memory

    Guest Memory = Active Memory for Guest OS


    23/53

    Memory - Guest Overhead


    24/53

    Memory

    Virtual Machine Memory Metrics - VI Client

    Metric: Description

    Memory Active (KB): Physical pages touched recently by a VM

    Memory Usage (%): Active memory / configured memory

    Memory Consumed (KB): Machine memory mapped to a virtual machine, including its portion of shared pages. Doesn't include overhead memory.

    Memory Granted (KB): Physical pages allocated to a virtual machine. May be less than configured memory. Includes shared pages. Doesn't include overhead memory.

    Memory Shared (KB): Physical pages shared with other virtual machines

    Memory Balloon (KB): Physical memory ballooned from a virtual machine

    Memory Swapped (KB): Physical memory in swap file (approx. swap out - swap in). Swap out and swap in are cumulative.

    Overhead Memory (KB): Machine pages used for virtualisation
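The derived metrics in this table are simple arithmetic over the raw counters. The sketch below restates them, together with the "Host Memory = Consumed + Overhead" relation from the Memory - Guest slide; the function names and sample values are illustrative only.

```python
def memory_usage_percent(active_kb, configured_kb):
    """Memory Usage (%) = active memory / configured memory."""
    return 100.0 * active_kb / configured_kb


def swapped_kb(swap_out_kb, swap_in_kb):
    """Approximate memory currently in the swap file from the cumulative counters."""
    return max(swap_out_kb - swap_in_kb, 0)


def host_memory_kb(consumed_kb, overhead_kb):
    """Host memory used on behalf of a VM = consumed + overhead."""
    return consumed_kb + overhead_kb


# A 2GB VM with 512MB active, 300000 KB swapped out and 120000 KB swapped back in
print(memory_usage_percent(524288, 2097152))  # 25.0
print(swapped_kb(300000, 120000))             # 180000
print(host_memory_kb(1500000, 100000))        # 1600000
```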


    25/53

    Memory

    Host Memory Metrics - VI Client

    Metric: Description

    Memory Active (KB): Physical pages touched recently by the host

    Memory Usage (%): Active memory / configured memory

    Memory Consumed (KB): Total host physical memory - free memory on host. Includes Overhead and Service Console memory.

    Memory Granted (KB): Sum of physical pages allocated to all virtual machines. Doesn't include overhead memory.

    Memory Shared (KB): Physical pages shared by virtual machines on host

    Shared Common (KB): Total machine pages used by shared pages

    Memory Balloon (KB): Machine pages ballooned from virtual machines

    Memory Swap Used (KB): Physical memory in swap file (approx. swap out - swap in). Swap out and swap in are cumulative.

    Overhead Memory (KB): Machine pages used for virtualisation


    26/53

    Memory - esxtop

    PMEM: Total physical memory breakdown

    VMKMEM: Memory managed by vmkernel

    COSMEM: Service Console memory breakdown

    PSHARE: Page sharing statistics

    SWAP: Swap statistics

    MEMCTL: Balloon driver data


    27/53


    28/53

    Memory

    esxtop / VI Client metrics: Host Usage

    VI Client: esxtop

    Memory Active: N/A (try /proc/vmware/sched/mem-verbose)

    Memory Usage: N/A (try /proc/vmware/sched/mem-verbose)

    Memory Consumed: PMEM total - PMEM free

    Memory Granted: N/A (SZTGT and CMTTGT represent memory scheduler targets)

    Memory Shared: PSHARE (shared)

    Memory Shared Common: PSHARE (common)

    Memory Balloon: MEMCTL

    Memory Swap Used: SWAP (r/s and w/s are rates)

    Overhead Memory: OVHD & OVHDMAX


    29/53

    Memory - VI Client memory usage graph


    30/53

    Memory - Troubleshooting Memory usage issues


    31/53

    Networking

    Network configuration is more likely to blame than resource contention

    Switch Assisted Teaming (IP Hash)

    VLAN Trunking

    Flow Control (full)

    Speed & Duplex (1000Mb / Full)

    Port Fast

    BPDU Disabled

    STP Disabled

    Link State Tracking

    Jumbo Frames


    32/53

    Networking - esxtop

    Transmit and Receive in Mb/s

    Transmit and Receive in Packets


    33/53

    Networking - esxtop

    Dropped Packets Received

    Dropped Packets Transmitted


    34/53

    Disk

    Varying Factors

    File system performance

    Disk subsystem configuration (SAN, NAS, iSCSI, local disk)

    Disk caching

    Disk formats (thick, sparse, thin)

    ESX Storage Stack

    Different latencies for different disks

    Queuing within the kernel

    K: Kernel

    D: Device

    G: Guest


    35/53

    Disk

    VI Client statistics

    Quite Coarse Statistics

    Disk read / write rate (KB/s)

    Disk usage: sum of read BW and write BW (KB/s)

    Disk read / write requests (per 20s interval)

    Bus resets / Command aborts (per 20s interval)

    Per LUN or aggregated stats


    36/53

    Disk

    esxtop statistics

    Aggregated stats similar to VI Client

    Disk read / write per sec (READS/s, WRITES/s)

    MB read / write per sec (MBREAD/s, MBWRTN/s)

    Latency Statistics

    Kernel Average / command (KAVG/cmd)

    Device Average / command (DAVG/cmd)

    Guest Average / command (GAVG/cmd)

    Queuing Information

    Adapter Queue Length (AQLEN)

    LUN Queue Length (LQLEN)

    VMKernel (QUED)

    Active Queue (ACTV)

    %Used (%USD = ACTV/LQLEN)
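Two of these counters are derived from the others, which is useful when sanity-checking a capture. The sketch below restates %USD = ACTV/LQLEN from the slide and uses the general relationship that guest latency is roughly kernel latency plus device latency; the function names and sample readings are illustrative.

```python
def queue_used_percent(actv, lqlen):
    """%USD: share of the LUN queue currently occupied by active commands."""
    return 100.0 * actv / lqlen


def guest_latency_ms(kavg_ms, davg_ms):
    """GAVG/cmd is approximately KAVG/cmd + DAVG/cmd."""
    return kavg_ms + davg_ms


print(queue_used_percent(24, 32))   # 75.0
print(guest_latency_ms(0.5, 12.0))  # 12.5
```

A high KAVG share of the total points at queuing inside the kernel, while a high DAVG share points at the array or the fabric.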


    37/53

    Disk - SAN Rough Estimates

    Purely looking at a single ESX host, roughly:

    Throughput (in MBps) = (Outstanding IOs * Block size in KB) / latency in msec

    FC, rough maximums:

    Effective Link Bandwidth = ~80-90% of Real Bandwidth

    Effective (2Gbps) = 200 to 230 MBps

    Effective (4Gbps) = 410 to 460 MBps

    Effective (8Gbps) = 820 to 920 MBps

    iSCSI / NFS / FCoE, rough maximums:

    Effective Link Bandwidth = ~70-80% of Real Bandwidth

    Effective (1GigE) = 90 to 100 MBps

    Effective (10GigE) = 900 to 1000 MBps
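The throughput rule of thumb can be worked through numerically. The sketch below applies the formula as stated and also rearranges it to give the latency needed for a target throughput; the workload figures are made-up examples, not measurements.

```python
def throughput_mbps(outstanding_ios, block_kb, latency_ms):
    """Throughput (MBps) = (outstanding IOs * block size in KB) / latency in ms."""
    return outstanding_ios * block_kb / latency_ms


def required_latency_ms(outstanding_ios, block_kb, target_mbps):
    """Same formula solved for latency: the latency needed to reach target_mbps."""
    return outstanding_ios * block_kb / target_mbps


# 32 outstanding IOs of 8KB at 5 ms latency
print(throughput_mbps(32, 8, 5))        # 51.2
# Latency needed to push the same workload to 100 MBps
print(required_latency_ms(32, 8, 100))  # 2.56
```

The result is only meaningful while it stays below the effective link bandwidth listed for the transport in use.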


    38/53

    Disk - Desired Latency Calculations

    Desired Latency in msec


    39/53

    Disk - VI Client

    SAN Cache disabled

    Poor throughput

    SAN Cache enabled

    High throughput


    40/53

    Disk - esxtop

    Latency is quite high

    After enabling cache, latency is reduced


    41/53

    Virtual Machine Optimisation

    Deploy all machines from an optimised template!

    VMware Tools MUST be installed

    The disks MUST be block aligned to the storage (even when using NFS and SAN)

    Where possible, always separate data disks from OS disks

    Windows performance settings should be optimised for application performance

    Guest operating system timeouts should be set as defined by the SAN vendor

    Pagefile should be separated where appropriate (this can impact VMware SRM however)

    Unused Windows services should be disabled (wireless config, print spooler, audio, etc.)

    Last access update time should be disabled (unless required)

    Logging of the VM should be disabled (only enabled for troubleshooting)

    Remove any unused virtual hardware (floppy drives, USB, etc.)

    Disable screen savers and power saving features, including logon screen saver

    Enable Remote Desktop, avoid using the VI Client for remote administration

    Install standard applications into the template (bginfo, AntiVirus, any host agents, etc.)

    Multiple CPUs should be allocated sparingly


    42/53

    Virtual Machine Optimisation

    Block alignment is vital to good disk performance!
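Whether a guest partition is aligned comes down to a modulo check on its starting offset. The sketch below assumes a 4KB storage block size, which varies by array and should be taken from the storage vendor, and uses the legacy 63-sector starting offset as an example of a misaligned partition.

```python
def is_aligned(partition_offset_bytes, block_size_bytes=4096):
    """True when the partition starts on a storage block boundary."""
    return partition_offset_bytes % block_size_bytes == 0


print(is_aligned(63 * 512))     # False: 32256 bytes is not a multiple of 4096
print(is_aligned(1024 * 1024))  # True: a 1MB offset is aligned
```

A misaligned partition makes a single guest IO straddle two storage blocks, which is why the earlier optimisation list insists on alignment for every template and P2V machine.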


    43/53

    esxtop

    Command Options when inside esxtop

    Command    Action
    space      Update the display
    ?          Show the help page
    q          Quit
    f / F      Add or remove columns from the display
    o / O      Change the order the display is sorted
    s          Change the update interval
    #          Change the number of instances to display
    W          Write configuration to file
    e          Expand / Rollup CPU Stats
    V          View only VM instances
    L          Change the length of the NAME field
    m          Display memory statistics
    n          Display network statistics
    i          Display interrupt statistics
    d          Display disk adapter statistics
    u          Display disk device statistics
    v          Display disk VM statistics


    44/53

    esxtop

    Command Line Options from the console

    Command    Action
    -b         Batch mode
    -l         Locks the objects available in the first snapshot
    -s         Enables secure mode
    -a         Show all statistics
    -c         Sets the configuration file
    -R         Enables replay mode (used with vm-support -S)
    -d         Sets the update interval
    -n         Runs esxtop for n iterations
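For a timed batch capture, the -d and -n options combine: the number of iterations is the capture window divided by the update interval. The sketch below only builds the argument list from the options in the table above; the helper name is hypothetical and nothing is executed.

```python
import math


def esxtop_batch_args(duration_s, interval_s, all_stats=True):
    """Build an esxtop batch-mode command line covering duration_s seconds."""
    iterations = math.ceil(duration_s / interval_s)
    args = ["esxtop", "-b"]
    if all_stats:
        args.append("-a")
    args += ["-d", str(interval_s), "-n", str(iterations)]
    return args


# Ten minutes of samples at a 5 second interval
print(" ".join(esxtop_batch_args(600, 5)))  # esxtop -b -a -d 5 -n 120
```

Redirect the output of the resulting command to a .csv file on the host, as in the earlier batch example.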


    45/53

    esxtop

    Expand the default window size for your session to get all statistics


    46/53

    vm-support

    Creates a packaged, compressed archive containing the following sections:

    boot: contains the grub configuration

    etc: contains the Console OS configuration files (cron, tcpwrappers, syslog, etc.)

    proc: contains much of the hardware configuration modules and variables

    tmp: contains a lot of the ESX specific configuration output

    var: contains log files and any core dumps

    vmfs: contains the structure of the VMFS datastores

    esx3-installation (where appropriate): contains a copy of the previous esx3 configuration variables


    47/53

    vm-support

    Using vm-support to extract performance information:

    vm-support -S -d <duration> -i <interval>

    <duration> and <interval> are in seconds

    The output from this can then be replayed in esxtop for review after it has been extracted.

    esxtop -R <extracted vm-support directory>


    48/53

    Service Console Performance

    Multiple Service Console networks for network resiliency

    Increased Service Console memory, up to 800MB

    Use host agents supplied by your vendors

    Make storage recommended tweaks such as HBA Queue Depth and IO timeouts

    Minimal use of the VI Client console; use RDP or SSH instead

    Properly sized vCenter server, 64-bit OS where possible


    49/53

    Resource Groups

    Dynamically reallocate resource shares

    Add a VM: shares allow you to over-commit resources and have a graceful re-allocation

    Remove a VM and exploit extra resources across all remaining VMs
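The re-allocation described here is proportional: under contention each VM receives capacity in the ratio of its shares to the total. The sketch below is a simplification that ignores reservations and limits; the VM names, share values and the 8000MHz pool are made up for illustration.

```python
def allocate_by_shares(capacity, shares):
    """Split capacity between VMs in proportion to their share values."""
    total = sum(shares.values())
    return {vm: capacity * value / total for vm, value in shares.items()}


# Three VMs competing for an 8000MHz pool
before = allocate_by_shares(8000, {"vm-a": 2000, "vm-b": 1000, "vm-c": 1000})
print(before)  # {'vm-a': 4000.0, 'vm-b': 2000.0, 'vm-c': 2000.0}

# Remove vm-c: the remaining VMs absorb its allocation in the same 2:1 ratio
after = allocate_by_shares(8000, {"vm-a": 2000, "vm-b": 1000})
print(round(after["vm-a"]), round(after["vm-b"]))  # 5333 2667
```

Shares only come into play when the pool is contended; an idle pool lets any VM use what it needs up to its limit.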


    50/53

    Design Guidelines

    Full Resilience / Multiple paths

    Standard configuration across all aspects (ESX, Storage, Networking, etc.)

    Standard naming conventions

    Learn from others' mistakes

    Follow vendors' best-practice guidelines

    Rule out the basics before requesting support


    51/53

    Capacity Planner & P2V Cautions and Limitations

    Peak CPU usage can sometimes be misleading

    Back-end storage system performance

    P2V machines will require block-aligning to the storage

    P2V machines will still require guest OS optimisation


    52/53

    Conclusion

    Performance issues can often be traced with simple root cause analysis using basic tools (VI Client / esxtop)

    Performance tools help diagnose issues and help rule out non-issues

    Performance tools are useful in different contexts, not always either/or

      Real-time data and troubleshooting: esxtop

      Historical data: VI Client

      Coarse resource / cluster usage: VI Client

      Detailed resource usage: esxtop

    Combine information from various tools to get a complete picture

    Always benchmark your systems first so you know what the optimal performance you can achieve is


    53/53

    Reference Articles

    http://www.vmware.com/pdf/esx3_memory.pdf

    http://www.vmworld.com/docs/DOC-2370

    http://blogs.vmware.com/performance/

    http://communities.vmware.com/docs/DOC-5420

    http://kb.vmware.com/kb/1008205

    http://communities.vmware.com/community/vmtn/general/performance

    http://www.vmware.com/products/vmmark/

    http://www.vmware.com/pdf/vsphere4/r40/vsp_40_san_cfg.pdf

    http://www.vmware.com/pdf/vsphere4/r40/vsp_40_iscsi_san_cfg.pdf

    http://www.vmware.com/pdf/vsphere4/r40/vsp_40_resource_mgmt.pdf

    http://www.vmware.com/pdf/GuestOS_guide.pdf

    http://www.vmware.com/resources/techresources/10066

    http://www.vmware.com/resources/techresources/10059

    http://www.vmware.com/resources/techresources/10062
