Performance Report Sun Unified Storage and VMware View 1.0


    Performance Report

    VMware View linked clone performance

    on Sun's Unified Storage

    Author: Erik Zandboer

    Date: 02-04-2010
    Version: 1.00


    Table of contents

    1 Management Summary
    1.1 Introduction
    1.2 Objectives
    1.3 Results
    2 Initial objective
    2.1 VMware View
    2.2 Storage requirements
    3 Technical overview of the solutions
    3.1 VMware View linked cloning
    3.2 Sun Unified Storage
    3.3 Linked cloning technology combined with Unified Storage
    4 Performance test setup
    4.1 VMware ESX setup
    4.2 VMware View setup
    4.3 Windows XP vDesktop setup
    4.4 Unified Storage setup
    5 Tests performed
    5.1 Test 1: 1500 idle vDesktops
    5.2 Test 2: User load simulated linked clone desktops
    5.3 Test 2a: Rebooting 100 vDesktops in parallel
    5.4 Test 2b: Recovering all vDesktops after storage appliance reboot
    5.5 Test 3: User load simulated full clone desktops
    6 Test results
    6.1 Test Results 1: 1500 idle vDesktops
    6.1.1 Measured Bandwidth and IOP sizes
    6.1.2 Caching in the ARC and L2ARC
    6.1.3 I/O Latency
    6.2 Test Results 2: User load simulated linked clone desktops
    6.2.1 Deploying the initial 500 user load-simulated vDesktops
    6.2.2 Impact of 500 vDesktop deployment on VMware ESX
    6.2.3 Impact of 500 vDesktop deployment on VMware vCenter and View
    6.2.4 Deploying vDesktops beyond 500
    6.2.5 Performance figures at 1300 vDesktops
    6.2.6 Extrapolating performance figures
    6.3 Test Results 2a: Rebooting 100 vDesktops
    6.4 Test Results 2b: Recovering all vDesktops after storage appliance reboot
    6.5 Test Results 3: User load simulated full clone desktops


    7 Conclusions
    7.1 Conclusions on scaling VMware ESX
    7.2 Conclusions on scaling networking between ESX and Unified Storage
    7.3 Conclusions on scaling Unified Storage CPU power
    7.4 Conclusions on scaling Unified Storage Memory and L2ARC
    7.5 Conclusions on scaling Unified Storage LogZilla SSDs
    7.6 Conclusions on scaling Unified Storage SATA storage
    8 Conclusions in numbers
    9 References
    Appendix 1: Hardware test setup
    Appendix 2: Table of derived constants


    People involved

    Name Company Responsibility E-Mail

    Erik Zandboer Dataman B.V. Sr. Technical Consultant [email protected]

    Simon Huizenga Dataman B.V. Technical Consultant [email protected]

    Kees Pleeging Sun Project leader [email protected]

    Cor Beumer Sun Storage Solution Architect [email protected]

    Version control

    Version Date Status Description

    0.01 11-02-2010 Initial draft Initial draft for internal (Dataman / Sun) review

    0.02 12-03-2010 Final draft Adjusted some reviewed minors; added conclusions and derived constants

    1.0 02-04-2010 Release Changed last minors; changed minors in reviewed items added in 0.02


    Abbreviations and definitions

    Abbreviation Description

    VM Virtual Machine. Virtualized workload on a virtualization platform (such as VMware ESX)

    GbE Gigabit Ethernet. Physical network connection at Gigabit speed.

    IOPS I/O operations Per Second. The number of both read- and write commands from and to a

    storage device per second. Take note that the ratio between reads and writes cannot be

    extracted from these values, only the sum of the two. Also see ROPS and WOPS.

    OPS Operations Per Second. More general term, and closely related to IOPS.

    ROPS Read Operations Per Second. The number of read commands performed on a storage device per second.

    WOPS Write Operations Per Second. The number of write commands performed on a storage device per second.

    TPS Transparent Page Sharing. A feature unique to VMware ESX, where several memory pages can be identified as containing equal data and then stored only once in physical memory, effectively saving physical memory. In most respects it is comparable to data deduplication.

    SSD Solid State Drive. Normally indicates a non-volatile storage device with no moving parts. It can be a flash drive (like the ReadZilla device), but it can also be a battery-backed (and optionally flash-backed) RAM drive (like the LogZilla device).

    KB KBytes. Also seen in conjunction with /s or .sec-1, which denotes KBytes per second.

    MB MBytes. Also seen in conjunction with /s or .sec-1, which denotes MBytes per second.

    Mb Mbits. Also seen in conjunction with /s or .sec-1, which denotes Mbits per second.

    vDesktop Virtualized Desktop. A Virtual Machine (VM) running a client operating system such as

    Windows XP.

    ave Average. Shorthand used in graphs to indicate the value is an averaged value.

    HT, HTx HyperTransport bus. High-bandwidth connection between CPUs and I/O devices on mainboards. Often indicated with numbers (HT0, HT1) to indicate specific connections.

    UFS Unified Storage (Device). Storage device which is capable of delivering the same data using

    multiple protocols.


    1 Management Summary

    1.1 Introduction

    Running virtual desktops (vDesktops) puts a lot of stress on storage systems. Conventional storage systems are easily scaled to the right size: a number of disks delivers a certain capacity and performance.

    In an effort to tackle the need for a large number of disks in a virtualized desktop (vDesktop) environment, Dataman started to analyze the basic needs of a vDesktop storage solution based on VMware linked cloning technology. The new Sun Unified Storage (UFS) solution (see reference [4]) appeared to have a significant head start in delivering high vDesktop performance with a small number of disks.

    Because of the alternative way this storage solution works, it is next to impossible to calculate performance numbers. How the Unified Storage performs is very dependent on the workload offered. This is why Dataman teamed up with Sun to run performance tests on these storage devices.

    1.2 Objectives

    The performance test had several goals:

    - To measure the performance impact on the Unified Storage array as more vDesktops were deployed in the environment;
    - To examine the impact of vDesktop reboots;
    - To extrapolate the measured performance data;
    - To project (and avoid) performance bottlenecks;
    - To define scaling constants for scaling the environment to a projected number of vDesktops.

    The tests were performed in Sun's datacenter in Linlithgow, Scotland. Hardware and housing were generously made available to Dataman for a period of two months, over which all necessary tests were performed.

    1.3 Results

    The performance tests have proven to be very effective; during the final stages of the test, the testing environment stopped at 1319 user-simulated vDesktops because the VMware environment, having only eight nodes, could not handle any more virtual machines (VMs). At that stage, all vDesktops still performed without any issues or noticeable latency on a single-headed UFS device. Even more remarkable, the environment could have run with only 16 SATA spindles in a mirrored setup! It is the underlying ZFS file system and the intelligent use of memory and Solid State Disks (SSDs) that make all the difference here.


    2 Initial objective

    After virtualization practically conquered the world for server loads, it now continues with the desktop. Virtualizing a large number of desktops on a small set of servers has proven to pose its own set of challenges. The one most often encountered is the performance requirement of the underlying storage array. Scaling disks just to satisfy capacity needs has always been bad practice, but it works out especially badly in a virtual desktop environment. Today's large disk capacities do not help either.

    2.1 VMware View

    One of the leading platforms for delivering virtualized desktops is VMware ESX in combination with VMware View. VMware View is able to deliver virtual desktops using linked cloning technology. This technology delivers very fast desktop image duplication and is more efficient in terms of storage capacity needs.

    Calculating the number of ESX nodes (cores and memory) is not too hard; it is no different from having full-cloned desktops. But what are the requirements of the underlying storage array?

    2.2 Storage requirements

    The structure of linked clones poses some challenges to the storage. For reasons explained in the next paragraphs, Sun's 7000 series Unified Storage (see reference [4]) was selected as THE platform to drive linked clone loads most efficiently.

    The objective of this performance test is to prove that Sun's 7000 series Unified Storage in combination with linked clones gives great performance at little cost.


    3 Technical overview of the solutions

    In order to better understand the performance test setup and its results, it is important to have some knowledge about the underlying technologies.

    3.1 VMware View linked cloning

    VMware View is basically a broker between the clients and the virtualized desktops (vDesktops) in the datacenter. The idea is that a single Windows XP image can be used to clone thousands of identical desktops. The broker controls the cloning and customization of these desktops.

    VMware View enables an extra feature: linked cloning. When using linked cloning technology, only a small

    number of fully cloned desktop images exist. All virtual desktops that are actually used are derivatives of

    these full clone images. In order to be able to differentiate the desktops, all writes to a virtual desktop's disk are captured in a separate file, much like VMware snapshot technology. The result of this is that many

    read operations are performed from the few full clones within the environment.

    Following the VMware best practices, it is recommended to have a maximum of 64 linked clones under every

    full clone (called a replica).
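    As a rough illustration of what this guideline means for sizing (not part of the original test; the helper below is hypothetical), the number of replicas required on VMFS-backed LUNs follows directly from the 64-clone limit:

        import math

        def replicas_needed(num_linked_clones: int, clones_per_replica: int = 64) -> int:
            """Full-clone replicas required under the 64-linked-clones-per-replica guideline."""
            return math.ceil(num_linked_clones / clones_per_replica)

        # Example: a pool of 1300 linked clones would need ceil(1300 / 64) = 21 replicas.
        print(replicas_needed(1300))  # -> 21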

    3.2 Sun Unified Storage

    Sun's Unified Storage uses the ZFS file system internally. There are some very specific differences from just about any other file system. It is far beyond the scope of this document to dive deep into ZFS, so only some features of these appliances will be discussed.

    Sun's Unified Storage appliances have a lot of CPU power and memory compared to most competitors. The CPU power is required to drive the ZFS file system in an appropriate manner, and the memory helps with caching of data. This caching is partly the key to the extreme performance of the appliance, even with relatively slow SATA disks. The use of Solid State Drives (SSDs) further enhances the performance of the appliance: read SSDs (called Readzillas) basically extend the appliance's memory, and logging SSDs (called Logzillas) help synchronous writes to be acknowledged faster (the effect appears somewhat similar to write caching, but the technology is very different).

  • 8/8/2019 Performance Report Sun Unified Storage and VMware View 1.0

    9/74

    Page 9 of74

    Performance Report: VMware View linked clone performance on Suns Unified Storage (v1.0)

    3.3 Linked cloning technology combined with Unified Storage

    The basic idea of using Sun's Unified Storage for linked cloned desktops came from two directions. First, a storage device with a lot of cache was needed, in order to be able to store the replicas (full clone images). Secondly, the barrier of 64 linked clones per replica limited the effectiveness of the cache, since one replica is needed for every 64 linked clones. This limit applies to storage devices having LUNs with VMFS (the VMware file system for storing VMs) on them; LUN queuing, LUN locking and some other artifacts come into play here.

    But when using NFS for storage, and not iSCSI or FC, the 64-linked-clones-per-replica barrier could possibly be broken. NFS has no issues with a thousand or more open files being accessed in parallel. Since Sun's Unified Storage is also able to deliver NFS, Sun's storage device appeared to be the right choice.



    4 Performance test setup

    The performance test was set up in Sun's test laboratory in Linlithgow, Scotland. Sun made a number of servers, a Sun 7410 Unified Storage device and the necessary switching components available. The total hardware setup can be viewed in appendix 1.

    4.1 VMware ESX setup

    A total of nine servers were available for VMware ESX. Eight were used for virtual desktop loads; the ninth server was used for all other required VMs such as vCenter, SQL, View and Active Directory. The specifications of the servers used:

    8x Sun X4450 with 4x 6-core Intel CPU (2.6GHz), 64GB memory
    1x Sun X4450 with 4x 4-core Intel CPU (2.6GHz), 16GB memory

    All nodes were connected with a single GbE NIC to the management network, a single NIC to a VMotion network, and with a third Ethernet NIC to an isolated client network where the Windows XP virtual desktops could connect to Active Directory / file services.

    The eight nodes performing virtual desktop loads were also connected to an NFS storage network using two

    GbE interfaces. All these interfaces were connected to a single GbE switch.

    ESX 3.5 update 5 was used to perform the tests. The setup was kept at defaults; console memory was increased to 800MB (the maximum). In order to make sure both GbE connections to the storage array would be used, two different subnets were used towards the array, each subnet accessed by its own VMkernel interface. Each VMkernel interface in turn was connected to one of the two GbE interfaces, guaranteeing static load balancing across both interfaces for every host.

    To be able to house the maximum number of VMs possible on a single vSwitch, the port-count of the vSwitch

    was increased to 248 ports.


    4.2 VMware View setup

    For managing the desktops, a Windows 2003 64-bit Enterprise Edition template was created. From this template, five VMs were derived:

    1) Microsoft SQL 2005 Standard server with SP3;
    2) Domain controller with DNS and file sharing enabled;
    3) VMware vCenter 2.5 update 5;
    4) VMware View 3.1.2;
    5) VMware Update Manager.

    During the tests, all these VMs were constantly monitored to guarantee that any limits found in the

    performance tests were not due to limitations within these VMs.

    All ESX nodes involved in carrying vDesktops were put in a single VMware cluster, which was kept at its default settings. A single Resource Pool was created within the cluster (also at default settings) to hold all vDesktops during the tests.

    4.3 Windows XP vDesktop setup

    The Windows XP image used was a standard Windows XP install with SP2 integrated. PSTools was installed inside the image, in order to be able to start and stop applications in batches to simulate a simple user load on the vDesktops. No further tuning was done to the image.

    Within VMware the images were configured with an 8GB disk, a single vCPU and 512MB of memory.

    User load was simulated by using autologon of the vDesktop, after which a batch file was started. This batch

    file performed standard tasks with built-in delays. Examples of the tasks were:

    - Starting MSPaint, which loads an image from the Domain Controller / file server;
    - Starting Internet Explorer;
    - Starting MSinfo32;
    - Unzipping putty.zip to a local directory, then deleting it again;
    - Starting Solitaire;
    - Stopping all applications again.

    These actions were fixed in order and delay. The delays were tuned until a vDesktop delivered an average load of 300MHz and just about 6 IOPS (this is accepted as being a lightweight user). In this user load, a rather high write load was introduced (of every 6 I/Os, 5 are writes). This is considered to be a worst-case I/O distribution for a vDesktop, making it a perfect setup for storage performance testing.
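    To put this per-desktop profile in perspective, a small back-of-the-envelope sketch (illustrative only; the per-desktop numbers are the ones stated above, and the helper is hypothetical):

        def aggregate_iops(num_vdesktops: int, iops_per_desktop: float = 6.0,
                           write_fraction: float = 5.0 / 6.0) -> dict:
            """Rough aggregate NFS load for the simulated lightweight user profile."""
            total = num_vdesktops * iops_per_desktop
            return {"total_iops": total,
                    "write_ops_per_s": total * write_fraction,
                    "read_ops_per_s": total * (1.0 - write_fraction)}

        # Example: 1300 such desktops generate roughly 7800 IOPS, of which ~6500 are writes.
        print(aggregate_iops(1300))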

    Checking the performance of the XP desktops was not a primary objective of the performance tests; however, after each test a few randomly chosen vDesktops were accessed and the introduction to Windows XP was started to check the fluidity of the animation, making sure the desktops were still responsive.


    4.4 Unified Storage setup

    The Sun 7410 Unified Storage device was connected to the storage switch using two 10GbE interfaces. Only a single head was used in the performance test, connected to 136 underlying SATA disks in six trays. In four of the trays a LogZilla was present; in total two LogZillas (2x 18[GB]) were assigned to the 7410 head. Inside the 7410 head itself, two Readzillas were available (2x 100[GB]). All SATA storage (apart from some hot spares) was mirrored (at the ZFS level). With a drive size of 1TB, this effectively delivers 60TB of total storage.
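    As a rough capacity check (illustrative; the exact hot-spare count is not stated in this report, so the 16 spares below are an assumption that reproduces the quoted 60TB):

        def usable_mirrored_tb(total_drives: int, drive_tb: float, hot_spares: int) -> float:
            """Usable capacity of a ZFS mirrored pool: spare drives excluded, data drives halved."""
            data_drives = total_drives - hot_spares
            return (data_drives // 2) * drive_tb

        # 136 x 1TB SATA drives with an assumed 16 hot spares -> 60TB usable.
        print(usable_mirrored_tb(136, 1.0, 16))  # -> 60.0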

    The 7410 itself was configured with two quad-core AMD Opteron 2356 processors and 64[GB] of memory. A single dual-port 10GbE interface card was added to the system for connection to the storage network. A third link (1GbE) was used for management on the management network.

    During configuration, two shares were created, each having its own IP address on its own 10GbE uplink. This ensures static load balancing for the ESX nodes, and also ensures the load is evenly spread over both 10GbE links on the storage unit. Jumbo frames were not enabled anywhere in the tests.

    In order to be able to measure the usage of the HyperTransport busses inside the 7410, a script that measures these loads was inserted into the unit.


    5 Tests performed

    A total of three tests were performed. The first test loaded 1500 idle vDesktops in linked clone mode onto the storage. In the second test an attempt was made to load as many user load-simulated vDesktops as possible onto the testing environment, in steps of 100 vDesktops. The third and final test was equal to the second test, but now using full clones from VMware View.

    For all tests both NFS shares were used. VMware View automatically balances the number of VMs equally across all available stores.

    5.1 Test 1: 1500 idle vDesktops

    In the first test, VMware View was simply instructed to deploy 1500 Windows XP images from a single source image. The resulting images did not perform any user load simulation; they were booted and then left idle. This test was performed to get a general idea of the ESX and storage load required for this number of VMs.

    5.2 Test 2: User load simulated linked clone desktops

    After the initial test mentioned in 5.1, the test was repeated, now with user load simulated desktops. The test was performed in steps, with an additional 100 vDesktops every step. The steps were repeated until a limitation in storage, ESX and/or the external environment was met.

    5.3 Test 2a: Rebooting 100 vDesktops in parallel

    As test 2 (5.2) reached the 1000 vDesktop mark, a hundred vDesktops were rebooted in parallel. This test was performed to simulate a real-life scenario, where a group of desktops is rebooted in a live environment. The impact on the storage device in particular was monitored.

    5.4 Test 2b: Recovering all vDesktops after storage appliance reboot

    As test 2 (5.2) reached its maximum, the storage array was forcibly rebooted. This was not really part of the performance test, yet it was interesting to see the recovery process of the storage array, and the recovery of the VMs on it.


    5.5 Test 3: User load simulated full clone desktops

    Using full clones on a Sun 7000 storage device was not expected to work as efficiently as a linked cloning configuration. In this test a number of full clone desktops were deployed, 25 vDesktops in each step.


    6 Test results

    The test results are described on a per-test basis. The initial 1500 idle-running vDesktop test is also used as a general introduction to the behavior of the storage device, the solid state drives and the observed latencies.

    6.1 Test Results 1: 1500 idle vDesktops

    As an initial test, 1500 idle-running, linked-cloned vDesktops were deployed onto the test environment. After the system had settled, this gave a first proof that the storage device was able to cope with at least 1500 idle vDesktop loads.


    6.1.1 Measured Bandwidth and IOP sizes

    The NFS bandwidth used while running this workload is shown in figure 6.1.1:

    Figure 6.1.1: Running 1500 idle desktops, about 22MB/s of writes and 10MB/s of reads are observed.

    The fact that about twice as much data is written as read is probably because the vDesktops are running idle (few reads take place), while each vDesktop has only 512[MB] of memory, causing it to use its local swap file and write out to the storage device.



    As both bandwidth and number of IOPS have been measured, it is easy to derive the average block size

    of the NFS reads and writes:

    Figure 6.1.2: Average NFS read- and write block sizes observed

    Since VMware ESX will try to concatenate sequential reads and writes whenever possible, it is very likely that the writes are completely random (the NTFS 4K block size appears to be dominant here). The read operations are bigger on average, which probably means some quasi-sequential reads are going on.
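    The derivation is simply bandwidth divided by operation rate. A minimal sketch with approximate values taken from figures 6.1.1 and 6.1.3 (illustrative readings, not exact measurements):

        def avg_block_size_kb(bandwidth_mb_per_s: float, ops_per_s: float) -> float:
            """Average block size in KB derived from measured bandwidth and operation rate."""
            return bandwidth_mb_per_s * 1024.0 / ops_per_s

        # Roughly 22MB/s of NFS writes at close to 5000 WOPS gives an average write of ~4.5[KB],
        # in the neighborhood of the NTFS 4K block size; reads are larger on average (figure 6.1.2).
        print(round(avg_block_size_kb(22, 5000), 1))  # -> 4.5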



    Since all writes to the storage device are synchronous and have very small block sizes, all writes are put into the LogZilla devices before they are passed on to SATA. As the data to be written traverses these stages, the number of WOPS becomes smaller with every step:

    Figure 6.1.3: Number of Write operations observed through the three stages

    Here it becomes obvious how effective the underlying ZFS file system is. The completely random write load, which consists of nearly 5000 write operations per second, is converted in the last stage (SATA) to just over 30 write operations per second. ZFS effectively converts the tiny random NFS writes into large sequential blocks, dealing with the relatively poor seek times of the physical SATA drives.



    The write operations are effectively being dealt with. On reads, the following is observed on the SATA drives:

    Figure 6.1.4: Observed SATA read operations per second.

    At an average read bandwidth of 10[MB.sec-1] (see figure 6.1.1), less than 0.3 read operations per second (ROPS) are observed on the SATA drives. This raises the suspicion that most (in fact almost all) read operations are served by the read cache (ARC or L2ARC), and only very few reads actually originate from the SATA drives, effectively boosting the overall read performance of the Sun 7000 storage device.



    6.1.2 Caching in the ARC and L2ARC

    Zooming in on the read performance, we need to look closer at the read caching going on. In figure 6.1.5 it is obvious that the ARC (64[GB] minus overhead) was saturated while the L2ARC (200[GB]) was only filled up to about 70[GB]:

    Figure 6.1.5: Running 1500 idle desktops, the ARC shows as fully filled while the usage of the L2ARC flash drives varies around 64[GB].



    The ARC/L2ARC not being saturated should mean that all actively read data still fits into memory (ARC) or Readzilla (L2ARC). This is clearly shown in figure 6.1.6, where the number of ARC hits is much larger than the number of ARC misses:

    Figure 6.1.6: Running 1500 idle desktops, the ARC hits show around 7000 per second while the

    ARC misses show up at about 250. This is an indication of the effectiveness of the

    (L2)ARC while running this specific workload.
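    From the approximate hit and miss rates quoted above, the effective cache hit ratio can be estimated (a simple illustration; the helper is hypothetical):

        def cache_hit_ratio(hits_per_s: float, misses_per_s: float) -> float:
            """Fraction of read operations served from ARC/L2ARC rather than from SATA."""
            return hits_per_s / (hits_per_s + misses_per_s)

        # About 7000 ARC hits/sec versus about 250 misses/sec -> a hit ratio of roughly 96-97%.
        print(round(cache_hit_ratio(7000, 250), 3))  # -> 0.966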

    While read operations appear to be properly served from the ARC or L2ARC, write operations must be committed to the disks at some point. The NFS writes are synchronous, meaning that each write operation must be guaranteed to be saved by the storage device before the operation is acknowledged. This would normally mean rather poor write performance, since the underlying disks are relatively slow SATA drives.

    This problem is countered by the use of LogZilla devices. These devices are write-optimized solid state disks (SSDs), which constantly store the write operation metadata and acknowledge the write back immediately, before it is actually committed to disk. As soon as the write is actually committed to SATA storage, the metadata entry is removed from the LogZilla (this is the reason it is called a LogZilla and not a write cache; the LogZilla is only there to make sure the dataset does not get into an inconsistent state when, for example, a power outage occurs).



    The underlying ZFS file system flushes the writes to disk at least every 30 seconds. The ZFS file system is able to perform the random writes to the SATA disks very effectively, turning them into a big sequential write whenever possible. This can be verified from the graph in figure 6.1.3.

    6.1.3 I/O Latency

    Besides read and write performance, it is also necessary to look at storage latency. Latency is the delay between a request to the storage and the answer back. For a read it is typically the time from the read request to the delivery of the data. For a write it is typically the time from the write request to the write acknowledgement.

    Best performance is achieved when latency is minimal. To be able to graph latency through time, a three-dimensional graph is required. The functions of the different axes are:

    - Horizontal axis: Time;
    - Vertical axis: Number of read and/or write operations;
    - Depth axis: Latency.

    Latency is grouped into ranges instead of unique values. This enables the creation of 3D graphs, because it is now possible to see groups of IOPS which fall within a certain latency range.

    Since on many occasions almost all latency falls within the lowest group of 0-20[ms], graphs are often zoomed in, with the number of IOPS (vertical axis) clipped to a low number. As a result, the peaks of the 0-20[ms] latency group go off the chart. This gives room for a clearer view of the higher latency groups. Please take note that these graphs do not give a total overview of the number of IOPS performed; they merely give insight into the tiny details which are almost invisible in the original (non-zoomed) graph.
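    A minimal sketch of this kind of grouping, assuming latency samples in milliseconds and 20[ms] wide buckets (the actual bucket boundaries used by the storage analytics may differ):

        from collections import Counter

        def bucket_latencies(latencies_ms, bucket_width_ms=20):
            """Group individual I/O latencies into ranges such as 0-20[ms], 20-40[ms], ..."""
            counts = Counter()
            for lat in latencies_ms:
                low = int(lat // bucket_width_ms) * bucket_width_ms
                counts[(low, low + bucket_width_ms)] += 1
            return dict(counts)

        # Example: most operations land in the 0-20[ms] group, a few outliers end up higher.
        print(bucket_latencies([1.2, 3.5, 18.0, 45.0, 95.0]))
        # -> {(0, 20): 3, (40, 60): 1, (80, 100): 1}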

    In figure 6.1.7a (with its zoomed counterpart 6.1.7b) the latency graph is displayed for NFS read operations with 1500 idle-running vDesktops. Almost all operations fall within the 0-20[ms] latency group. Only when looking at the zoomed graph (figure 6.1.7b) can some higher latencies be observed. However, these are so small in number compared to the number of IOPS within the 0-20[ms] latency group that only very little impact is to be expected from them.

    The read operations that required more time to complete are probably the ARC/L2ARC cache misses,

    and had to be read from SATA. These SATA reads are the reads observed in figure 6.1.4.


    Figure 6.1.7a: Observed latency in NFS reads. Most read operations are served within 20[msec]

    Figure 6.1.7b: Detail of latency in NFS read operations. Clipped at only 20 OPS to visualize higher

    latency read operations.



    6.2 Test Results 2: User load simulated linked clone desktops

    After the initial test with idle-running desktops, the environment was reset. A new Windows XP image was introduced, which delivers a lightweight user pattern:

    - 200[MHz] CPU load;
    - 300[MB] active memory;
    - 7 observed NFS IOPS.

    The memory and CPU load were deliberately held at a low level, so that a maximum number of VMs would fit onto the virtualization platform. The number of IOPS was matched to the accepted industry average of 5 - 5.6 IOPS, with a calculated 150% overhead for linked cloning technology (see reference [1] for an explanation of the 150% factor).


    6.2.1 Deploying the initial 500 user load-simulated vDesktops

    When deploying the initial 500 vDesktops, the effect of the deployment was clearly reflected in several graphs. In figure 6.2.1 the ARC + L2ARC size grows almost linearly during deployment:

    Figure 6.2.1: Observed ARC/L2ARC data size growth when deploying the first 500 desktops.

    During the deployment of the very first vDesktops, the ARC immediately fills with both replicas (a replica is the full-clone image from which the linked clones are derived). There are two replicas, because two NFS shares were used and VMware View places one replica on each share. In the leftmost part of the graph it can actually be seen that both replicas are put into the ARC one by one.

    After this initial action, the ARC starts to fill. This is because the created linked clones are also being read back. Since every vDesktop behaves the same, the read-back performed on the linked clones is also identical, which explains the near-linear growth.



    At the right of figure 6.2.1, the ARC fills up to its memory limit of 64[GB] minus the Storage 7000 overhead. It

    is not until this time that the L2ARC starts to fill in the same linear manner as the ARC did. It becomes clear

    that the L2ARC behaves as a direct (though somewhat slower) extension of the ARC (which resides in

    memory).

    When looking at ARC hits and misses in figure 6.2.2, it becomes clear that more and more read operations

    are performed throughout the deployment:

    Figure 6.2.2: Observed ARC hits and misses while deploying the initial 500 user loaded vDesktops.

    The graph in figure 6.2.2 clearly shows the growing number of ARC hits. The ARC misses hardly increase at

    all. This means that as more vDesktops are deployed, the effectiveness of the read cache mechanism

    increases.



    Figure 6.2.3: Consumed NFS bandwidth during deployment of the initial 500 vDesktops

    In figure 6.2.3 it is clearly visible that the first 500 vDesktops were deployed in batches of 100. During the

    linked cloning deployment, consumed NFS bandwidth is clearly higher than during normal running periods.



    Figure 6.2.4: SATA Read- and Write Operations observed during the deployment of the initial 500

    vDesktops. Note that the vertical scale has been extended to -2 in order to clearly

    display the Read Operations, which run over the vertical axis itself.

    Figure 6.2.4 shows that the SATA Write Operations increase with the number of vDesktops running. The Read

    Operations remain at a minimum level, without any measurable increase. This is in line with figure 6.2.2

    showing that the read cache gets more effective with a growing number of deployed vDesktops.



    The write operations to SATA are synchronous, and get accelerated by the LogZillas. The graph in figure 6.2.5

    shows the WOPS to the LogZilla devices:

    Figure 6.2.5: Write Operations to the LogZilla device(s).



    The ZFS file system is able to deliver this workload using a very limited number of SATA write operations. A possible downside of the ZFS file system is the large amount of CPU overhead imposed. See figure 6.2.6 for details on the CPU usage of the Sun 7000 storage device:

    Figure 6.2.6: CPU usage in the Sun 7000 storage during deployment of 500 user load-simulated

    vDesktops



    6.2.2 Impact of 500 vDesktop deployment on VMware ESX

    As the number of vDesktops increases, the load on VMware ESX and vCenter also increases. See figures 6.2.7, 6.2.8 and 6.2.9 for more details:

    Figure 6.2.7: CPU usage within one of the eight VMware ESX hosts during the deployment of the

    initial 250 vDesktops. The topmost grey graph is the CPU overhead of VMware ESX.

    In figure 6.2.7 the deployment of vDesktops is clearly visible. Each time a vDesktop is deployed and started, a ribbon is added to the graph. Each vDesktop uses the same amount of CPU power, which increases slightly just after deployment (when the VM is booting its operating system).


    Figure 6.2.8: Active Memory used by the vDesktops on one of the ESX nodes during the

    deployment of the initial 250 vDesktops. The lower red ribbon is ESX memory

    overhead due to the Service Console.

    Figure 6.2.8 shows the active memory consumed as the vDesktops are deployed on one of the ESX nodes. After each batch of 100 vDesktops, the memory consumption stops increasing and then slightly decreases. This effect is caused by two things:

    1) Freeing up of tested memory within the VMs (Windows VMs touch all memory during the memory test);
    2) VMware's Transparent Page Sharing technology.

    As the VMs settle on the ESX server, ESX starts to detect identical memory pages, effectively deduplicating them (item 2 on the list above). This feature can save a lot of physical memory, especially when deploying many (almost) identical VM workloads.
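    Conceptually, this page sharing is content-based deduplication of memory pages; the sketch below is a simplified illustration of the idea only, not VMware's actual implementation (which matches pages by hash and verifies them bit for bit before sharing):

        def shared_page_savings(pages: list) -> int:
            """Count how many 4[KB] pages could be collapsed into shared copies (identical content)."""
            unique = set(pages)              # one physical copy per distinct page content
            return len(pages) - len(unique)  # pages saved by sharing

        # Example: three VMs booted from the same image expose many identical (e.g. zeroed) pages.
        pages = [b"\x00" * 4096] * 3 + [b"app-data-1", b"app-data-2"]
        print(shared_page_savings(pages))  # -> 2 pages saved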


    Figure 6.2.9: Physical memory shared between vDesktops thanks to VMware's Transparent Page Sharing (TPS) function within VMware ESX.

    Transparent Page Sharing (TPS) effects become clearer when looking at the graph in figure 6.2.9. As VMs are

    added to the ESX server, more memory pages are identified as being duplicates, saving more and more

    physical memory.


    6.2.3 Impact of 500 vDesktop deployment on VMware vCenter and View

    VMware vCenter and VMware View are not directly involved in delivering the vDesktop workloads, but they play an important role during the deployment of new vDesktops. The CPU loads on these machines clearly show the deployment of the batches of vDesktops:

    Figure 6.2.10: Observed CPU load on the (dual vCPU) vCenter server during vDesktop deployment. Note the dual y-axis descriptions; some values are percentages, others are [MHz].

    In figure 6.2.10, the deployment batches can clearly be identified. After each batch, the vCenter server settles at a slightly higher CPU load. This is caused by the growing number of VMs to manage and monitor within the entire ESX cluster.


    Figure 6.2.11: Observed CPU load on the VMware View server during vDesktop deployment. Note the dual y-axis descriptions; some values are percentages, others are [MHz].

    The VMware View server shows pretty much the same characteristics as the VMware vCenter server: higher CPU loads during the batch deployment of vDesktops, settling somewhat higher after each batch.


    6.2.4 Deploying vDesktops beyond 500

    After the successful deployment of the initial 500 vDesktops, further batches of 100 vDesktops were deployed. The goal was to fit as many vDesktops onto the testbed as possible, keeping track of all potential boundaries (performance-wise).

    The largest number of vDesktops that could be deployed was 1319. At this point VMware stopped deploying more vDesktops because the ESX servers were running out of vCPUs. Within ESX version 3.5, the limit on the number of VMs that can run on a single node is fixed at 170. This maximum was reached just before ESX physical memory ran out:

    Figure 6.2.12: ESX node resource usage when deploying 1300 vDesktops

    As the graph in figure 6.2.12 shows, memory and CPU usage were almost evenly matched. The limit on the number of running VMs, the memory limit and the CPU power limit were reached almost simultaneously.
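    The observed ceiling follows directly from the per-node VM limit; a quick check (illustrative, using the figures stated above):

        def cluster_vm_ceiling(nodes: int, vms_per_node_limit: int = 170) -> int:
            """Hard ceiling on running VMs imposed by the ESX 3.5 per-node limit."""
            return nodes * vms_per_node_limit

        # Eight desktop-carrying nodes at 170 VMs each allow 1360 VMs at most; the test
        # stalled at 1319, just below this ceiling, with memory and CPU also nearly exhausted.
        print(cluster_vm_ceiling(8))  # -> 1360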



    Due to the nature of the ZFS file system, the CPU load on the storage device was a concern. The

    measured values can be found in figure 6.2.13:

    Figure 6.2.13: CPU load on the 7000 storage during deployment of 1300 vDesktops. Note the HT0

    value. This is the HT-bus between the two quad core CPUs inside the storage device.

    The relaxation points at 600-700 and 1200 vDesktops were due to settling of the

    environment during weekends.

    As shown, the CPU load on the storage device is quite high, but not near saturation yet. The HT0 bus displayed here was the HT-bus with the highest bandwidth usage. This is due to the fact that a single, dual-channel PCI-e 10GbE card was used in the environment. The result of this was that the second CPU had to transport all of its data to the first CPU in order to get its data in and out of the 10GbE interfaces. Note that the design could have been optimized here by using two separate 10GbE cards, each on PCI-e lanes that use a different HT-bus. This would have resulted in better load balancing across CPUs and HyperTransport busses. See figure 7.3.1 for a graphical representation of this.



    The memory consumption of the 7000 storage is directly linked to the amount of read cache used. As the number of vDesktops increases, the ARC (memory cache) fills up. At about 450 vDesktops, the ARC reaches its 64[GB] limit and the L2ARC (solid state drive) starts to fill (see figure 6.2.14):

    Figure 6.2.14: Memory usage on the 7000 storage during the deployment of 1300 vDesktops. Note the L2ARC (SSD drive) starting to fill as the ARC (memory) saturates. The relaxation between 600 and 700 vDesktops is due to a pause in deployment during a weekend (ARC flushing occurred over time as the vDesktops settled into their workload).

    The L2ARC finally settled at just about 100[GB] of used space (on the testbed there was a total of

    200[GB] of ReadZilla available).



    The network bandwidth and IOPS used by the testbed are displayed in figure 6.2.15:

    Figure 6.2.15: NFS traffic observed during the deployment of 1300 vDesktops.

    The dips in the graph at 600/700 and 900/1000 vDesktops are actually weekends; the vDesktops settled into their behavior, which shows in the graph in figure 6.2.15.



    6.2.5 Performance figures at 1300 vDesktops

    The system saturated at 1300 vDesktops, due to the limit on the maximum number of running VMs inside the ESX servers. Performance of the vDesktops at this number was still very acceptable, even though both memory and CPU power were almost at their maximum.

    The VMs were still very responsive. Random vDesktops were accessed through the console, and responsiveness was tested by starting the welcome to Windows XP introduction animation. Neither frame rate nor animation speed deteriorated significantly through the entire range of 0 to 1300 vDesktops.

    A good technical measure of this is the CPU ready time: the time during which a VM is ready to execute on a physical CPU core, but ESX cannot manage to schedule it onto one:

    Figure 6.2.16: CPU ready time measured on a vDesktop on a 30 minute interval.


    Note that these values are summed between samples, and all millisecond values should be divided by 1800 (30 minutes) in order to obtain the number of milliseconds of ready time per second (instead of per 30 minutes). In the leftmost part of the graph, vDesktops are still being deployed and booted up, impacting performance (ready time is about 12.5 [ms.sec-1]). After the deployment is complete, ready time drops to about 4.2 [ms.sec-1]. These values are very acceptable from a CPU performance point of view.
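    The conversion from the summed counter to a per-second figure is straightforward (sketch using the 30-minute sample interval mentioned above; the helper is hypothetical):

        def ready_ms_per_sec(summed_ready_ms: float, sample_interval_s: int = 1800) -> float:
            """Convert a summed CPU ready time per sample into milliseconds of ready time per second."""
            return summed_ready_ms / sample_interval_s

        # A 22500[ms] sample over 30 minutes corresponds to 12.5 [ms.sec-1] of ready time;
        # 7560[ms] corresponds to the ~4.2 [ms.sec-1] seen after deployment completed.
        print(ready_ms_per_sec(22500), ready_ms_per_sec(7560))  # -> 12.5 4.2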

    Besides CPU ready time, NFS latency also greatly influences the responsiveness of the vDesktops. The graphs in the following figures were made at a load of 1300 vDesktops:


    Figure 6.2.17a and 6.2.17b: Observed NFS read latency at 1300 user simulated vDesktops



Graph 6.2.17a shows that almost all read operations are served within 20 [ms], which is quite impressive at this load.

When we look at the read latency in more detail (figure 6.2.17b), there are some read operations that take longer to be served. To put this in numbers: every second, between one and two read operations take up to about 100 [ms] to complete. Note that this is only about 0.2% of the read operations performed.

Besides read latency, write latency was also measured. The write latency appears to be a little worse than the read latency:


    Figure 6.2.18a and 6.2.18b: Observed NFS write latency at 1300 user simulated vDesktops

[Charts: NFS Write Latency (1300 user-loaded vDesktops) and NFS Write Latency ZOOMED; vertical axis: NFS write operations [sec-1]]


About 150 write operations per second require more than the base 0-40 [ms] window to complete. Since the total number of write operations is about 6000 per second, this is about 2.5% of the operations performed. The most likely explanation for these higher latencies is that some writes are not committed to the LogZilla, but are flushed to disk directly. This is normal behavior for ZFS.

Within ZFS, larger blocks are not committed to the LogZilla. This is controlled by a parameter called zfs_immediate_write_sz. This parameter is actually a constant within ZFS, set to 32768 (see reference [2]).

VMware ESX will concatenate writes when possible, up to 64 [KB]. It is safe to assume that the majority of writes equal the vDesktop's NTFS block size (4 [KB]); however, some blocks do get concatenated within VMware and exceed the 32 [KB] threshold.

Looking at figure 6.1.2, we can see that the average write size is 5.5 [KB]. If we project the average write size from the behavior described above (roughly 97.5% of writes at 4 [KB] and about 2.5% concatenated up to 64 [KB]), we arrive at 0.975 x 4 + 0.025 x 64 ≈ 5.5 [KB].

This is a near-perfect match, so it is safe to assume that the higher write latencies observed are indeed caused by this behavior. Tuning the zfs_immediate_write_sz constant could help in this case (increasing it to 65537, which is 2^16 + 1, would make sure that 64 [KB] writes are also stored in the LogZillas). Unfortunately, adjusting this parameter is not supported on the Sun 7000 storage arrays (nor, to my knowledge, in ZFS).
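A minimal sketch of the calculation above, assuming the write mix consists only of 4 [KB] and 64 [KB] writes, with the 64 [KB] share set to the roughly 2.5% of writes observed to bypass the LogZilla:

```python
# Projected average NFS write size for a mix of 4 KB NTFS-sized writes and
# 64 KB writes concatenated by ESX. The 64 KB fraction is taken from the
# ~2.5% of writes observed to bypass the LogZilla (i.e. larger than the
# 32768-byte zfs_immediate_write_sz constant).

small_kb, large_kb = 4, 64
large_fraction = 0.025  # ~150 of ~6000 write operations per second

avg_write_kb = (1 - large_fraction) * small_kb + large_fraction * large_kb
print(f"Projected average write size: {avg_write_kb:.1f} KB")  # 5.5 KB
```

The result matches the 5.5 [KB] average write size observed in figure 6.1.2.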


VMware ESX has a feature called Transparent Page Sharing (TPS), which allows ESX to map several identical virtual memory pages to the same physical memory page. VMware performs this memory deduplication either in hardware (vSphere 4 with supported CPUs) or using spare CPU cycles (optionally on both ESX 3.x and vSphere 4), so the positive effect of TPS grows over time (also see figure 6.2.9).

At a total of 1300 deployed vDesktops, ESX saves a large amount of memory:

Figure 6.2.19: Memory shared between vDesktops within a single ESX server.

As shown in figure 6.2.19, there are 170 VMs running (the graph shows only one of the eight ESX nodes); each ribbon in this graph represents a VM. In total, 22.5 [GB] of memory is shared, and thus saved, between vDesktops per ESX node. Without TPS, each ESX server would have required 64 + 22.5 = 86.5 [GB] of memory (a saving of roughly 26%).

Looking at the entire ESX cluster, each ESX server saves about the same amount of memory thanks to TPS, for a total of 8 x 22.5 = 180 [GB] of memory saved.


    6.2.6 Extrapolating performance figures

In order to predict the maximum number of vDesktops that can be placed on a given environment, it is important to take note of all limiting factors. By extrapolating the measurements of these factors, we can determine how to scale the different resources (such as CPU, memory and SSD drives) to match the number of vDesktops we need to deploy.

For scaling VMware ESX CPU and memory, we set the maximum allowable load at 85%. The extrapolated graph can be found in figure 6.2.20:

Figure 6.2.20: Extrapolation of figure 6.2.11: ESX node resource usage

In figure 6.2.20, memory is the limit at 1300 vDesktops (which was in fact the limit we ran into during the test). CPU had some room to spare: if we pushed CPU consumption to 85%, we could deploy about 1650 vDesktops.

[Chart: VMware ESX node resource usage (extrapolated); series: Node CPU, Node mem; axes: CPU/memory usage (average) [%] versus number of deployed user-simulated vDesktops]
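The extrapolation itself is a simple linear projection toward the 85% ceiling. A minimal sketch, using an illustrative per-desktop CPU increment back-derived from the roughly 1650 vDesktop result above (not a measured value):

```python
def max_vdesktops(baseline_pct: float, pct_per_desktop: float,
                  ceiling_pct: float = 85.0) -> int:
    """Largest vDesktop count keeping a linearly growing resource below the ceiling."""
    return int((ceiling_pct - baseline_pct) / pct_per_desktop)

# Illustrative: a CPU curve rising ~67% over 1300 vDesktops from a ~0% baseline
# extrapolates to roughly 1650 vDesktops at the 85% ceiling.
print(max_vdesktops(baseline_pct=0.0, pct_per_desktop=67.0 / 1300))  # ~1649
```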


    Figure 6.2.21: Extrapolation of figure 6.2.12: CPU load on the 7000 storage

Looking at figure 6.2.21, the extrapolated 7000 storage CPU usage would put the maximum number of vDesktops at around 1900. The theoretical maximum of the HyperTransport (HT) bus is 4 [GB.sec-1], but a generally accepted practical value is around 2.5 [GB.sec-1]. This would mean the HT bus limits the number of vDesktops to about 1950.

[Chart: 7000 Storage CPU resources (extrapolated); series: 7410 CPU load, HT0/socket1; axes: CPU consumed [%] and HyperTransport bus throughput [GB.sec-1] versus number of deployed user-simulated vDesktops]
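A minimal sketch of the same style of projection for the HT bus, assuming throughput scales linearly with the number of vDesktops; the roughly 1.7 [GB.sec-1] operating point is the value reported for 1300 vDesktops in section 7.3, and the 2.5 [GB.sec-1] ceiling is the practical value mentioned above:

```python
# Project the vDesktop count at which HyperTransport throughput reaches a
# practical ceiling, assuming linear scaling from one measured point.

measured_vdesktops = 1300
measured_ht_gb_s = 1.7     # reported at 1300 vDesktops (section 7.3)
ceiling_ht_gb_s = 2.5      # generally accepted practical limit (theoretical: 4)

limit = measured_vdesktops * ceiling_ht_gb_s / measured_ht_gb_s
print(round(limit))  # ~1900, of the same order as the ~1950 quoted above
```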


For read caching, the 7000 storage relies on memory and solid state drives (SSDs). Both extrapolate in basically the same way, only memory is much faster than SSD. For extrapolating memory usage, using the ARC values is sufficient:

Figure 6.2.22: Extrapolation of figure 6.2.14: Memory usage on the 7000 storage

Extrapolation of the ARC size shows that, with 256 [GB] of memory minus some overhead for the kernel, up to about 2400 vDesktops could be deployed. Beyond this point, SSDs (ReadZilla) would have to be used to extend the read cache beyond 256 [GB], which is the maximum amount of RAM that can be installed in the biggest 7000 series array at the time of writing.

An important note is that the measured range of the ARC is rather short. A slight variation in the measurement could have quite a dramatic effect on the final number of vDesktops that can be deployed in a given environment.

[Chart: 7000 Storage memory usage (extrapolated); series: 7410 ARC [GB]; axes: memory usage [GB] versus number of deployed user-simulated vDesktops]


Finally, the NFS traffic is extrapolated in order to project the network bandwidth and the number of IOPS required for a given number of vDesktops:

Figure 6.2.23: Extrapolation of figure 6.2.15: NFS traffic observed

The extrapolation in figure 6.2.23 is bounded by some limits. For the network bandwidth projection, a maximum of 2x 1 GbE is used, with usage limited to 50% per link in order to avoid possible saturation and packet dropping on the links.

The total number of IOPS in this projection is limited to 12,000 [sec-1]. The reason for choosing this number is that, at the measured I/O distribution, roughly 10,000 write operations per second (WOPS) would then be performed, which is the maximum for a single LogZilla device.

According to this graph, these maximums come into play above 1800 vDesktops. For the NFS read bandwidth, the maximum is not reached in this graph, but would only be hit somewhere near 4000 vDesktops (!).

[Chart: NFS traffic (extrapolated); series: NFS writes, NFS reads, NFS IOPS; axes: network traffic [MB.sec-1] and NFS IOPS [sec-1] versus number of deployed user-simulated vDesktops]
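A minimal sketch of the relation behind the 12,000 IOPS ceiling: with writes capped at a single LogZilla's roughly 10,000 WOPS, the sustainable total depends on the write share of the I/O mix (the 5/6 write fraction below is inferred from the two numbers quoted above, not measured independently):

```python
# Total NFS IOPS sustainable when write operations are capped by one LogZilla.

LOGZILLA_WOPS_LIMIT = 10_000

def total_iops_ceiling(write_fraction: float) -> float:
    """Total IOPS at which the write share reaches the LogZilla limit."""
    return LOGZILLA_WOPS_LIMIT / write_fraction

# A write fraction of roughly 5/6 reproduces the 12,000 IOPS limit used above:
print(total_iops_ceiling(5 / 6))  # 12000.0
```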


    In close relation to the NFS IOPS performed, SATA ROPS and WOPS can also be extrapolated:

Figure 6.2.24: Extrapolation of figure 6.2.4: SATA read and write operations

The graph in figure 6.2.24 clearly shows that hardly any SATA ROPS are performed, while SATA WOPS increase steadily as the number of running vDesktops grows. Note that even at 1500 running vDesktops, the projected number of WOPS is only 68 [sec-1]; ROPS remain near zero.

[Chart: SATA read and write operations (extrapolated); series: SATA WOPS [/sec], SATA ROPS [/sec]; axes: read/write operations [sec-1] versus number of deployed vDesktops]


    The write acceleration through the LogZilla device(s) can also be extrapolated:

    Figure 6.2.25: Extrapolation of figure 6.2.5: LogZilla WOPS performed

[Chart: LogZilla WOPS average [/sec] (extrapolated); series: LogZilla WOPS ave; axes: LogZilla WOPS [sec-1] versus number of vDesktops deployed]


Latency is more complex to extrapolate. By extrapolating each latency group separately, a 3D graph can be reconstructed to show the projected NFS read latencies:

Figure 6.2.26: Extrapolation of NFS read latency, clipped at 100 read operations per second.

Figure 6.2.26 is an extreme zoom of the extrapolated NFS read latency graph. The graph has been cut into segments, with gaps inserted between them, to give a clear view of how the latency distribution develops as more vDesktops are deployed on the environment.

As the number of vDesktops grows, more latency is introduced; this was already established. What this graph adds is that the distribution of latency changes as the load increases.

[Chart: Extrapolated NFS read latencies; vertical axis: NFS read operations [sec-1], clipped at 100]


6.3 Test Results 2a: Rebooting 100 vDesktops

The impact on the storage of rebooting a large number of VMs should never be underestimated. The reboot process uses far more resources than a regular workload, and rebooting many vDesktops in parallel in particular can mean a large increase in the number of I/O operations performed.

As a subtest, we shut down and then restarted a hundred vDesktops while a total of 800 vDesktops were deployed. The impact is best seen in the latency graphs:


    Figure 6.3.1a and 6.3.1b: NFS read latency rebooting 100 vDesktops (@800 deployed).

[Charts: NFS Read Latency (800 vDesktops, 100 rebooting) and NFS Read Latency ZOOMED; vertical axis: number of read OPS [sec-1]]


As can be seen in graphs 6.3.1a and 6.3.1b, the reboot took about one hour in total. The restart was issued through VMware View, which schedules the restarts to vCenter spread over time. In the unzoomed graph (a), the peaks above 4000 ROPS indicate the higher number of read operations caused by the restarting vDesktops. The zoomed graph (b) shows in more detail how read latency deteriorates during the restarts. This is because the linked-clone data that was previously written by the VMs is now read back and has to be introduced into the ARC/L2ARC read cache, meaning these reads have to come from the relatively slow SATA drives. A second restart might have had less impact in this respect (untested).

The filling of the L2ARC (from SATA) while the vDesktops reboot can clearly be seen in the graph in figure 6.3.2:

Figure 6.3.2: L2ARC growth during the vDesktop reboot

A hundred rebooting vDesktops caused the L2ARC to grow by about 50 [GB]. Since all common reads come from only two replicas which are already stored in the ARC, apparently each VM reads about 0.5 [GB] of unique data (from its linked clone).

[Chart: (L2)ARC growth (restart of 100 vDesktops in an 800 vDesktop environment); series: ARC data size, L2ARC data size; vertical axis: ARC/L2ARC memory usage [GB]]
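A minimal sketch of the per-vDesktop estimate above (both inputs are the values quoted in the text):

```python
# Estimate the unique data read per rebooting vDesktop from L2ARC growth.

l2arc_growth_gb = 50      # observed L2ARC growth during the reboot subtest
rebooted_vdesktops = 100

print(l2arc_growth_gb / rebooted_vdesktops)  # 0.5 GB of unique linked-clone data per VM
```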


    Network bandwidth used is also clearly higher during the reboot of the vDesktops:

    Figure 6.3.3: NFS bandwidth used during reboot of 100 vDesktops.

At the left of the graph, the regular I/O workload can be observed; the rest of the graph shows the reboot of the 100 vDesktops.

[Chart: NFS bandwidth used (restart of 100 vDesktops in an 800 vDesktop environment); series: NFS write ave, NFS read ave; vertical axis: NFS read/write [MB.sec-1]]


6.4 Test Results 2b: Recovering all vDesktops after storage appliance reboot

While running 1000 vDesktops, the storage array was forcibly rebooted. This subtest was performed to see the impact on the storage array, on the data and on the vDesktops.

At the time of the forced shutdown of the storage device, all VMs froze. After the storage appliance rebooted, the ZFS file system had to perform some resilvering (checking and making sure the data is consistent, a very reliable feature of ZFS) before normal NFS communication with the ESX servers could resume. At that point the VMs simply unfroze and resumed their normal behavior almost instantly.


    In graph 6.4.1 the effects of the forced reboot can clearly be seen:

Figure 6.4.1: Network and CPU load behavior during the reboot of the storage appliance. The red striped bars indicate that no measurements were made (during the reboot of the storage device itself).


The red bar in figure 6.4.1 indicates the time required to (re)boot the storage device. The quiet period after that is the so-called resilvering of the ZFS file system. No I/O is performed at this stage, but as can be seen, the CPU is quite busy during the resilvering.

After resilvering is done, the storage device immediately resumes serving I/O and settles quite fast. After a reboot of the appliance, the ARC is empty (being RAM), and the L2ARC data is forcibly deleted and will be rebuilt as read operations start to occur. Initially, these reads have to come from SATA, filling up the ARC and, after that, the L2ARC. In figure 6.4.2 the refilling of the ARC and L2ARC is clearly visible:

    Figure 6.4.2: Filling of the ARC and the L2ARC after a forced reboot of the storage appliance.


The graph in figure 6.4.2 clearly shows the rapid filling of the ARC. It appears to fill a little during resilvering, then shoots up quickly (this is probably where the two replicas are pulled into the ARC). From there on, the filling of the ARC slows its pace and the L2ARC starts to fill as well. The third graph in figure 6.4.2 shows the (L2)ARC misses: for a few minutes there are quite a lot of misses, but this resolves rather quickly.

All in all, the device is up and running again within 15 minutes. Note that the setup used here did not make use of the clustering features available for the 7000 series; all tests were performed on a single storage processor.


6.5 Test Results 3: User load simulated full clone desktops

A limited test was added to the original linked-clone test scenario. In this test the same (user-simulated) Windows XP images were deployed, but this time in full-clone mode instead of linked-clone mode. Only 150 full-clone desktops were deployed, to observe the behavior of the ARC and L2ARC in this scenario.

Figure 6.5.1: Filling of the ARC and the L2ARC during the deployment of 150 full-clone vDesktops. At the far left some test vDesktops (full clones) are deployed. At (1) the first batch of 25 vDesktops is deployed; at (2) the rest of the vDesktops are deployed.

See figure 6.5.1. After the start of the test (far left), some full-clone test vDesktops are deployed. At marker (1), the first batch of 25 vDesktops is deployed. Shortly after marker (1), the ARC and L2ARC sizes settle around 25 [GB], which indicates that the vDesktops perform around 1 [GB] of reads each. Because the ARC is not saturated yet, the L2ARC remains (almost) empty at this stage. Beyond marker (2), the rest of the vDesktops are deployed, quickly filling the ARC and the L2ARC.


Figure 6.5.2: Extrapolation of the 7000 storage CPU usage.

Figure 6.5.2 contains an extrapolation of the CPU load on the 7000 storage. The extrapolation is extensive and leaves room for error; however, it appears to be well in line with the CPU figures measured in the linked-clone setup (see figure 6.2.21).

[Chart: 7410 CPU load average percent (extrapolated); series: 7410 CPU load ave; axes: 7410 CPU load (average) [%] versus number of deployed full-clone vDesktops]


More interesting is the number of IOPS performed in the full-clone scenario compared to the linked-clone scenario:

Figure 6.5.3: NFS IOPS comparison of full-clone versus linked-clone vDesktop deployment.

Figure 6.5.3 shows that linked-clone vDesktops use more IOPS than full-clone vDesktops. This effect can be explained by the way linked clones function within VMware ESX; their behavior is much like VMware snapshotting (see reference [1] for more details).

Another thing that seems visible in figure 6.5.3 is that the deployment of linked clones appears to have a greater IOPS impact than full-clone deployment. Note, however, that this is not the case: in figure 6.5.3 the time scale has been adjusted in order to fit both graphs into a single figure, while the speed of deployment is very different:

- Linked clones deploy at a rate of 100 vDesktops per hour;
- Full clones deploy at a rate of 10 vDesktops per hour.

This factor of 10 is not visible in the graph, but in fact the full-clone vDesktop deployment uses far more I/O in total, as illustrated in the sketch below. This makes sense: in the full-clone scenario every vDesktop gets its boot drive fully copied, while linked clones only incur some IOPS overhead when creating an empty linked clone (plus some other administrative actions on disk).

[Chart: NFS IOPS performed; FULL vs. LINKED clones (0 to 100 vDesktops); series: NFS IOPS FULL-CLONE, NFS IOPS LINKED-CLONE; axes: number of IOPS [sec-1] versus number of vDesktops deployed]
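To make the time-scale effect explicit, here is a minimal sketch comparing total deployment I/O when the deployment rates differ by a factor of 10; the average IOPS values are illustrative placeholders of the kind one would read off a graph like figure 6.5.3, not measured figures:

```python
# Total I/O issued during deployment depends on IOPS *and* on how long the
# deployment takes, which differs by a factor of 10 between the two modes.

def total_io(avg_iops: float, vdesktops: int, rate_per_hour: float) -> float:
    hours = vdesktops / rate_per_hour
    return avg_iops * hours * 3600

vdesktops = 100
linked = total_io(avg_iops=3000, vdesktops=vdesktops, rate_per_hour=100)
full = total_io(avg_iops=2000, vdesktops=vdesktops, rate_per_hour=10)
print(f"full-clone / linked-clone total I/O: {full / linked:.1f}x")  # ~6.7x
```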


7 Conclusions

From all the tests conducted, some very interesting conclusions can be drawn. First of all, the fact that the environment managed to run over 1300 vDesktops without performance issues is in itself a great accomplishment. Looking deeper into the measured values yields a wealth of information and best practices on how to configure Sun Unified Storage 7000 in combination with VMware View linked clones.

7.1 Conclusions on scaling VMware ESX

It proves to be very important to scale your VMware ESX nodes correctly. There are basically three things to keep in mind:

1) The number of CPU cores inside an ESX server;
2) The amount of memory inside an ESX server;
3) The number of vCPUs/VMs the ESX server can deliver.

The first and second are the obvious ones: put in too much CPU power and you run out of memory, leaving the CPU cores underutilized; put in too much memory and you run out of CPU power, leaving memory underutilized.

The third is sometimes forgotten, but proved to be the culprit in our test setup: if you use ESX servers with too much CPU and memory, you will run out of vCPUs, and VMs will simply not start anymore beyond a certain point. Luckily, this limit gets higher with each release of VMware ESX:

- ESX 3.01 / ESX 3.5: 128 vCPUs, 128 VMs;
- ESX 3.5U2+: 192 vCPUs, 170 VMs;
- vSphere (ESX4): 512 vCPUs, 320 VMs.

As shown, using vSphere as a basis allows for much bigger ESX servers; a small sizing sketch below illustrates how these three limits interact.
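A minimal sketch of the sizing check implied by these three factors; the per-desktop CPU and memory figures are illustrative placeholders, while the 170 VM per-host limit is the ESX 3.5U2+ value listed above:

```python
# The achievable vDesktop count per ESX node is the minimum of what the
# CPU, the memory and the per-host VM limit allow.

def node_capacity(cores: int, desktops_per_core: float,
                  ram_gb: float, ram_per_desktop_gb: float,
                  max_vms_per_host: int) -> dict:
    limits = {
        "cpu": int(cores * desktops_per_core),
        "memory": int(ram_gb / ram_per_desktop_gb),
        "vm_limit": max_vms_per_host,
    }
    limits["effective"] = min(limits.values())
    return limits

# Illustrative node with 64 GB RAM under the ESX 3.5U2+ limit of 170 VMs:
print(node_capacity(cores=16, desktops_per_core=12,
                    ram_gb=64, ram_per_desktop_gb=0.375,
                    max_vms_per_host=170))
# {'cpu': 192, 'memory': 170, 'vm_limit': 170, 'effective': 170}
```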


7.2 Conclusions on scaling networking between ESX and Unified Storage

The network did not really prove to be an issue during the tests performed. Bandwidth usage to any single ESX node proved to be well within the capabilities of a single GbE connection.

Bandwidth to the storage also remained well within the designed bandwidth; the two 10 GbE connections remained underutilized throughout all tests.

Load balancing was deliberately introduced into the test environment, but could have been skipped without issue in this case. If the 7000 storage had been driven using 1 GbE links, load balancing would be recommended.


7.3 Conclusions on scaling Unified Storage CPU power

During the tests, the CPUs inside the 7000 storage were not saturated. At 1300 user-simulated vDesktops, the load on the two CPUs reached 85%, which should be considered close to the maximum performance. In order to scale up further, four CPUs (or six-core CPUs) would be required.

The HyperTransport bus between the two CPUs showed quite high utilization (in the order of 1.7 [GByte.sec-1]). This was partially due to the fact that the two 10 GbE ports both reside on a single PCIe card, which forced all traffic through the HyperTransport bus of CPU0 instead of being load-balanced between CPU0 and CPU1:

[Diagram: Sun 7410 Unified Storage HyperTransport bus technology; two CPU sockets with memory buses, HT buses to HyperTransport-to-I/O bridges, and PCIe buses carrying the 10Gb Ethernet ports]

Figure 7.3.1: Sun 7410 Unified Storage HyperTransport bus architecture. In the performance tests a single PCIe card with dual 10GbE ports was used. Best practice would be to use two single-port 10GbE PCIe cards, each on a different HT bus (shown in semi-transparency).


7.4 Conclusions on scaling Unified Storage Memory and L2ARC

In order to get the best performance out of the 7000 Unified Storage, read cache is very important; this type of storage was even primarily selected for its large read-cache capabilities. Using linked clones, all replicas (the full-clone "mothers" of the linked clones) were committed directly to the read cache. For each linked clone deployed, a small additional amount of read cache was required. The amount of read cache should be carefully matched to the projected number of vDesktops on the storage device; see chapter 8 for more details.

The L2ARC presents itself in the form of one or more read-optimized Solid State Drives (SSDs) and can be seen as a direct extension of internal memory. It is important to note, though, that L2ARC storage is roughly a factor 1000 slower than memory. Best practice is therefore to match internal memory to the required read cache, and only if the memory requirements exceed the physical maximum amount of internal memory should L2ARC be used to reach the required amount.
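A minimal sketch of this sizing approach; all per-item sizes are illustrative placeholders chosen for the example, not values measured in this report:

```python
# Split a projected read-cache requirement between ARC (RAM) and L2ARC (SSD).

def read_cache_plan(replicas: int, replica_gb: float,
                    vdesktops: int, cache_per_desktop_gb: float,
                    max_ram_gb: float) -> dict:
    required = replicas * replica_gb + vdesktops * cache_per_desktop_gb
    return {
        "required_gb": required,
        "arc_gb": min(required, max_ram_gb),          # keep as much as possible in RAM
        "l2arc_gb": max(0.0, required - max_ram_gb),  # only the overflow goes to SSD
    }

# Illustrative: two replicas of 8 GB and ~0.1 GB of cached reads per clone,
# on an array with at most 256 GB of RAM.
print(read_cache_plan(replicas=2, replica_gb=8,
                      vdesktops=2400, cache_per_desktop_gb=0.1,
                      max_ram_gb=256))
# {'required_gb': 256.0, 'arc_gb': 256.0, 'l2arc_gb': 0.0}
```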

7.5 Conclusions on scaling Unified Storage LogZilla SSDs

The LogZilla devices enable the 7000 Unified Storage to quickly acknowledge synchronous writes. The metadata of each write is stored in the LogZilla and the write itself in the ARC; finally, the write is committed to disk from the ARC and the metadata in the LogZilla is flagged as handled.

In normal operation, the LogZilla is never read from. Only on recovery (for example after power loss) is the LogZilla read, and the ZFS file system is returned to a consistent state using the metadata that was not yet flagged as handled.

In effect, the addition of a LogZilla greatly lowers the write latency of the storage device. The performed tests show that the LogZilla really helps to keep write latency to a minimum.

Each LogZilla is able to perform about 10,000 [WOPS]. When the projected number of writes is larger than 10,000 [WOPS], adding LogZillas could help. Note, however, that adding a second LogZilla will not help performance-wise: the Unified Storage will place both LogZillas in a RAID1 configuration. This RAID1 configuration does help to ensure performance: a LogZilla may fail and the storage device will keep working normally, whereas with a single LogZilla the synchronous writes would have to be written to disk directly if the LogZilla fails, clipping performance.

Using four LogZilla devices would increase the number of WOPS that can be performed by a single storage device: the Unified Storage will put four LogZillas into a RAID10 configuration, effectively being able to perform 20,000 [WOPS].
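A minimal sketch of the resulting write capacity, assuming each LogZilla sustains roughly 10,000 WOPS and the appliance always mirrors them (RAID1 for two devices, RAID10 striping across mirrored pairs for four):

```python
# Effective write-acceleration capacity for mirrored LogZilla devices.

WOPS_PER_LOGZILLA = 10_000

def logzilla_wops_capacity(devices: int) -> int:
    """Capacity scales with the number of mirrored pairs, not with devices."""
    mirrored_pairs = max(1, devices // 2)  # a single device is simply unmirrored
    return mirrored_pairs * WOPS_PER_LOGZILLA

for n in (1, 2, 4):
    print(n, logzilla_wops_capacity(n))
# 1 -> 10000 (unmirrored), 2 -> 10000 (RAID1), 4 -> 20000 (RAID10)
```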


7.6 Conclusions on scaling Unified Storage SATA storage

Throughout the tests, the number of SATA ROPS and WOPS has consistently remained low. This is due to the way ZFS works: ZFS aims to serve most (if not all) reads from the ARC and L2ARC, and it combines and reorders small random writes into very large blocks, converting the small random writes into large sequential writes. This way of working minimizes ROPS and results in only a few large sequential writes to SATA (also see reference [3]).

    Given the fact that a single