
Achieving a Million I/O Operations per Second from a Single VMware vSphere® 5.0 Host

Performance Study

TECHNICAL WHITE PAPER


Table of Contents

Introduction
Executive Summary
Software and Hardware
Test Workload
Multi-VM Tests
    Experimental Setup
        Server
        Storage Area Network
        Virtual Platform
        Virtual Machines
        Iometer Workload
        Test Bed
    Results
        Scaling I/O Operations per Second with Multiple VMs
        Scaling I/O Throughput as I/O Request Size Increases
        CPU Cost of an I/O Operation with LSI Logic SAS and PVSCSI Virtual Controllers
Performance of a Single VM
    Experimental Setup
        Server
        Storage Area Network
        Virtual Platform
        Virtual Machines
        Iometer Workload
        Test Bed
    Results
        Scaling I/O Operations per Second with Virtual SCSI Controllers
Conclusion
Appendix A
    Selecting the Right PCIe Slots for the HBAs
    Building a Scalable Test Infrastructure
    Assigning Virtual Disks to a VM
    Estimation of CPU Cost of an I/O Operation
References
About the Author
    Acknowledgements


Introduction

One of the essential requirements for a platform supporting enterprise datacenters is the ability to sustain the extreme I/O demands of the applications running in those datacenters. A previous study [1] showed that vSphere can easily handle demands for high I/O operations per second. The experiments discussed in this paper strengthen that assertion by demonstrating that a vSphere 5.0 virtual platform can satisfy an extremely high level of I/O demand originating from hosted applications, provided the hardware infrastructure meets the need.

Executive Summary

The results obtained from the experiments show that:

• A single vSphere 5.0 host is capable of supporting a million+ I/O operations per second.

• 300,000 I/O operations per second can be achieved from a single virtual machine (VM).

• I/O throughput (bandwidth consumption) scales almost linearly as the request size of an I/O operation increases.

• I/O operations on vSphere 5.0 systems with Paravirtual SCSI (PVSCSI) controllers consume fewer CPU cycles than those with LSI Logic SAS virtual SCSI controllers.

Software and Hardware

Because of its capabilities and rich feature set, VMware vSphere has become one of the industry's leading choices as a platform for building private and public clouds. Significant improvements have been made to vSphere's storage stack in successive releases. These improvements enable vSphere to easily satisfy the ever-increasing demand for I/O throughput from almost all enterprise applications.

The Symmetrix VMAX storage system is a high-end, highly scalable storage system from EMC [2]. It allows the scaling of storage system resources through common building blocks called Symmetrix VMAX engines. A system can be scaled from one VMAX engine with one storage bay to eight VMAX engines with a maximum of ten storage bays. Each VMAX engine contains four quad-core processors, up to 128GB of memory, and up to 16 front-end ports for host connectivity.

The Emulex LightPulse LPe12002 is an 8Gbps Fibre Channel PCI Express dual-channel host bus adapter [3]. The LPe12002 delivers some of the industry's highest performance, CPU efficiency, and reliability, making it a convenient choice for mission-critical and I/O-intensive applications in cloud environments.

Test Workload

Iometer was used to generate the I/O load in all the experiments discussed in this paper [4]. Iometer was configured to generate 16 or 32 Outstanding I/Os (OIOs) with 100% random and 100% read requests. The size of the I/O requests was varied among 512 bytes, 1KB, 2KB, 4KB, and 8KB, depending on the experiment. These I/O sizes represent the I/O characteristics of a wide gamut of transaction-oriented applications such as databases and enterprise mail servers.
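For quick reference, the test matrix can be captured in a few lines of Python. This is only an illustrative summary of the parameters listed above; the key names are descriptive labels of ours, not Iometer's configuration-file syntax.

    # Illustrative summary of the Iometer workload used throughout the paper.
    # Key names are descriptive labels, not Iometer .icf fields.
    workload = {
        "access_pattern": "100% random",
        "read_percentage": 100,
        "outstanding_ios": (16, 32),                     # depends on the test case
        "request_sizes_bytes": [512, 1024, 2048, 4096, 8192],
    }

    for size in workload["request_sizes_bytes"]:
        print(f"{size:>5}-byte requests, OIOs from {workload['outstanding_ios']}")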


Multi-VM Tests

Experimental Setup

Server

• 4 Intel Xeon E7-4870 processors, 2.40GHz, 10 cores each

• 256GB memory

• 6 dual-port Emulex LPe12002 HBAs (8Gbps)

Storage Area Network

• VMAX with 8 engines

• 4 quad-core processors and 128GB of memory per engine

• 64 front-end 8Gbps Fibre Channel (FC) ports

• 64 back-end 4Gbps FC ports

• 960 * 15K RPM, 450GB FC drives

• 1 FC switch

Virtual Platform

• vSphere 5.0

Virtual Machines

• Windows Server 2008 R2 EE x64

• 4 vCPUs

• 8GB memory

• 3 virtual SCSI controllers

Iometer Workload

• 1 worker for a pair of virtual disks (five workers per VM)

• 100% random

• 100% read

• An access region of 6.4GB in each virtual disk (a total of 384GB on 60 virtual disks)

• 16 or 32 OIOs depending on the test case

• Request size of 512 bytes, 1KB, 2KB, 4KB, or 8KB depending on the test case


Test Bed

A single physical server running vSphere 5.0 was used for the experiments. Six identical VMs were created on this host. Six dual-port 8Gbps FC HBAs were installed in the vSphere host, and all 12 FC ports of these HBAs were connected to an FC switch. The VMAX was connected to the same FC switch via 60 8Gbps FC links, with each link connected to a separate front-end FC port on the array. Refer to "Building a Scalable Test Infrastructure" in the appendix for more details.

A total of 480 RAID-1 groups were created on top of the 960 FC drives in the VMAX array. A single LUN was created in each RAID group. A 250GB metaLUN was created on each set of 8 LUNs, resulting in a total of 60 metaLUNs. All 60 metaLUNs were exposed to the vSphere host. A separate VMFS datastore was created on each of the 60 metaLUNs. To ensure high concurrency and effective use of all the available I/O processing capacity, each VMFS datastore was configured to use a fixed path via a dedicated FC port on the array.
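The layout arithmetic is easy to verify. A minimal sketch, using only the figures stated above:

    # Sanity check of the storage layout described above; all numbers come
    # from the text, nothing here talks to a real array.
    fc_drives = 960
    raid1_groups = fc_drives // 2      # 480 RAID-1 groups (a mirrored pair each)
    luns = raid1_groups                # one LUN per RAID-1 group
    metaluns = luns // 8               # each 250GB metaLUN spans 8 LUNs -> 60
    datastores = metaluns              # one VMFS datastore per metaLUN

    assert (raid1_groups, metaluns, datastores) == (480, 60, 60)
    print(f"{raid1_groups} RAID-1 groups -> {metaluns} metaLUNs/datastores")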

A single thick* virtual disk was created in each VMFS datastore. The virtual disks were assigned to the VMs as follows:

• Each VM was assigned a total of 10 virtual disks† for Iometer testing.

• All 10 virtual disks in a VM were distributed across three virtual SCSI controllers. Refer to "Assigning Virtual Disks to a VM" for more details on the distribution of virtual disks.

• The virtual SCSI controller type was varied between LSI Logic SAS and PVSCSI for the comparison tests.

• A total of 384GB (6.4GB in each virtual disk) of disk space was used by Iometer to generate the I/O load in all VMs. Because of its 1TB of aggregate memory, the VMAX was able to cache most of the disk blocks in this 384GB of disk space.

* Also known as eagerzeroed thick.
† These virtual disks were different from the virtual disk that contained the operating system and Iometer application–related files.


Figure 1. Configuration for Multi-VM Tests


Results

For each experiment, metrics such as I/O operations per second (IOPS), throughput in megabytes per second (MBps), and the latency of a single I/O operation in milliseconds (ms) were collected to analyze the I/O performance under the different test scenarios.

Scaling I/O Operations per Second with Multiple VMs

This test focused on illustrating the scalability of I/O operations per second on a single host as the number of VMs generating the I/O load increased. Iometer in each VM was configured to generate a similar I/O load using 32 OIOs of 8KB size. The number of VMs generating the I/O load was increased from one to six. In each case, the total I/O operations per second and the average latency of each I/O operation were measured in each VM. This test was done with PVSCSI controllers only.

Figure 2. Scaling I/O Operations per Second with Multiple VMs

Figure 2 shows the aggregate number of I/O operations per second achieved as the number of VMs was increased. Aggregate I/O operations per second scaled from about 200,000 to slightly above 1 million as the number of VMs was increased from one to six. The latency of I/O operations remained under 2 milliseconds throughout the test, increasing by only 10% as the I/O load on the host increased.
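The reported latency is consistent with a quick Little's law estimate. The sketch below assumes the 32 OIOs apply per Iometer worker (five workers per VM, as listed in the setup); that reading is ours and is not stated explicitly in the paper.

    # Little's law: outstanding I/Os = IOPS x latency, so latency = OIO / IOPS.
    # Assumes 32 OIOs per worker, five workers per VM (our reading of the setup).
    vms, workers_per_vm, oios_per_worker = 6, 5, 32
    aggregate_iops = 1_000_000                     # approximate result for 6 VMs

    outstanding = vms * workers_per_vm * oios_per_worker   # 960 in-flight I/Os
    latency_ms = outstanding / aggregate_iops * 1000
    print(f"~{latency_ms:.2f} ms average latency")  # ~0.96 ms, under the 2 ms observed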

Scaling I/O Throughput as I/O Request Size Increases

This test focused on demonstrating the scalability of I/O throughput achieved on the host as the size of the I/O operations increased from 512 bytes to 8KB. Iometer in each VM was configured to generate a similar I/O load using 16 outstanding I/Os of varying size. The number of VMs generating the I/O load was fixed at six. The total I/O operations per second, I/O throughput (megabytes per second), and average latency of each I/O operation were measured in each VM. This test was run with both LSI Logic SAS and PVSCSI virtual controllers in order to compare their impact on overall I/O performance.



Figure 3. I/O Operations per Second with Different Request Sizes

Figure 3 shows the aggregate I/O performance of all six VMs generating a similar I/O load, each time with a different request size. For each request size, experiments were conducted with both LSI Logic SAS and PVSCSI controllers. As seen in Figure 3, the aggregate I/O operations achieved from six VMs remained well over 1 million for all I/O sizes except 8KB. The average latency of a single I/O operation remained well under a millisecond‡ with each virtual SCSI controller type, except at the 8KB request size. The increase in I/O latency as measured by Iometer was due to a corresponding increase in the I/O latency at the storage.

Please note that the slightly lower aggregate IOPS observed with the 8KB request size and 6 VMs in this test case, compared to the test with the same request size and number of VMs described in the section titled "Scaling I/O Operations per Second with Multiple VMs," was primarily due to the lower number of OIOs used for this test.

‡ Note that the I/O latency in this test was lower than that in the section titled "Scaling I/O Operations per Second with Multiple VMs" because the number of OIOs was half of that used for the latter test.



Figure 4. Scaling I/O Throughput with the Request Size

The corresponding aggregate I/O throughput observed in the host is shown in Figure 4. As seen in the figure, throughput scales almost linearly as the I/O request size is doubled in each iteration: vSphere used the available I/O bandwidth to scale from 592MB per second to almost 8GB per second as the I/O request size increased from 512 bytes to 8KB. The linear scaling clearly indicates that the vSphere software stack did not introduce any bottleneck for the I/O workload originating from the VMs. The slight drop observed at the 8KB request size was due to a corresponding increase in the I/O latency observed in the storage array.
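Because throughput is just IOPS multiplied by request size, the two endpoints quoted above are enough to recover the implied IOPS and see why the scaling is nearly linear. A small sketch (assuming decimal megabytes, since the exact convention is not stated):

    # Invert throughput = IOPS x request_size at the two endpoints given in
    # the text (592MB/s at 512 bytes, ~8GB/s at 8KB). Decimal MB assumed.
    def implied_iops(mb_per_sec: float, request_bytes: int) -> float:
        return mb_per_sec * 1_000_000 / request_bytes

    print(f"512B: {implied_iops(592, 512):,.0f} IOPS")    # ~1.16 million
    print(f"8KB:  {implied_iops(8000, 8192):,.0f} IOPS")  # ~0.98 million

    # IOPS stays near a million at every size, so doubling the request size
    # roughly doubles MB/s -- the near-linear scaling seen in Figure 4.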

Figures 3 and 4 also compare the aggregate I/O performance of the host when using the two different virtual SCSI controllers: LSI Logic SAS and PVSCSI. The PVSCSI adapter provided 7% to 10% better throughput than LSI Logic SAS in all of the cases.

CPU Cost of an I/O Operation with LSI Logic SAS and PVSCSI Virtual Controllers

Another important metric for comparing the performance of the LSI Logic SAS and PVSCSI adapters is the CPU cost of an I/O operation. A detailed explanation of the estimation methodology is given in "Estimation of CPU Cost of an I/O Operation" in Appendix A. Figure 5 compares the CPU cost of an I/O operation with an LSI Logic SAS virtual SCSI adapter to that with a PVSCSI adapter for an I/O request size of 8KB.



Figure 5. CPU Cost of an I/O Operation with LSI Logic SAS and PVSCSI Adapters

As seen in Figure 5, the PVSCSI adapter provided 8% better throughput at a 10% lower CPU cost. These results clearly show that, under extreme load conditions, PVSCSI adapters are capable of providing better throughput at a lower CPU cost than LSI Logic SAS adapters.



Performance of a Single VM

The next study focuses on the performance of a single VM.

Experimental Setup

Server

• 4 Intel Xeon E7-4870 processors, 2.40GHz, 10 cores each

• 256GB memory

• 4 dual-port Emulex LPe12002 HBAs (8Gbps)

Storage Area Network

• VMAX with 5 engines

• 4 quad-core processors and 128GB of memory per engine

• 40 front-end 8Gbps FC ports

• 40 back-end 4Gbps FC ports

• 960 * 15K RPM, 450GB FC drives

• 1 FC switch

Virtual Platform

• vSphere 5.0

Virtual Machines

• Windows Server 2008 R2 EE x64

• 16 vCPUs

• 16GB memory

• 1 to 4 virtual SCSI controllers

Iometer Workload

• Two workers for every 10 virtual disks

• 100% random

• 100% read

• An access region of 6.4GB in each virtual disk

• 16 OIOs

• Request size of 8KB

Test Bed

A single VM running on the vSphere 5.0 host was used for this test. The number of vCPUs was increased from 4 to 16, and the amount of memory assigned to the VM was increased from 8GB to 16GB. The storage layout created for the multi-VM tests was reused for this study. Iometer was configured to use a total of 40 virtual disks. All the virtual disks were assigned to the single test VM in increments of 10, with each set of 10 virtual disks attached to a new virtual SCSI controller§. Iometer was configured to generate the same I/O load on every virtual disk, which ensured a constant load on each virtual SCSI controller.

§ vSphere 5.0 supports a total of 4 virtual SCSI controllers per VM.

Each set of 10 virtual disks was accessed through its own dual-port HBA in the host and separate FC ports on the array. A total of four dual-port HBAs in the host and 40 front-end FC ports in the array were used to access all 40 virtual disks.

Figure 6. Test Configuration for the Single-VM Test


Results

As in the multi-VM test case, metrics such as I/O operations per second and the latency of a single I/O operation in milliseconds were used to study the I/O performance.

Scaling I/O Operations per Second with Virtual SCSI Controllers

This test case focused on studying the scalability of I/O operations per second when the number of virtual SCSI controllers in a VM was increased.

Figure 7. Scaling I/O Operations per Second with 8KB Request Size in a Single VM

Figure 7 shows the aggregate I/O operations per second and the average latency of an I/O operation achieved from a single VM as the number of virtual SCSI controllers was increased. As shown in the figure, aggregate I/O operations per second increased almost linearly, while the latency of I/O operations remained nearly constant after an initial increase from one to two virtual SCSI controllers.



Conclusion

vSphere offers a virtualization layer that, in conjunction with some of the industry's biggest storage platforms, can be used to create private and public clouds capable of supporting any level of I/O demand originating from the applications running in those clouds. The results of the experiments conducted at EMC labs illustrate that:

• vSphere can easily support a million+ I/O operations per second from a single host out of the box.

• A single VM running on vSphere 5.0 is capable of supporting 300,000 I/O operations per second at an 8KB request size.

• vSphere can scale linearly in terms of I/O throughput (bandwidth consumption) as the request size of an I/O operation increases.

• vSphere’s Paravirtual SCSI controller offers lower CPU cost for an I/O operation compared to that of the LSI Logic SAS virtual SCSI controller.

The results presented in this paper are by no means the upper limit for the I/O operations achievable through any of the components used for the tests. The intent is to show that a vSphere virtual infrastructure, such as the one used for this study, can easily handle even the most extreme I/O demands that exist in datacenters today. Customers can virtualize their most I/O-intensive applications with confidence on a platform powered by vSphere 5 and EMC VMAX.


Appendix A

Selecting the Right PCIe Slots for the HBAs

To achieve a million+ I/O operations per second with an 8KB request size through the fewest HBAs, the HBAs collectively had to provide 8GB per second of throughput. Although five dual-port 8Gbps HBAs could theoretically meet this requirement, each HBA would then be operating near its saturation point. To ensure sufficient concurrency while retaining enough capacity to push a million+ IOPS, six HBAs were used, which required each HBA to sustain at least 1.3GB per second. The HBAs therefore had to be placed in PCIe slots capable of supporting that throughput from each HBA. The HBAs were placed in the slots shown in Table 1, which also lists the maximum theoretical throughput of each slot.

Figure 8. PCIe Slot Configuration of the Server used for the Tests

SLOT NUMBER    PCIe VERSION    NUMBER OF LANES    MAXIMUM THROUGHPUT**
1              Gen 2           4                  2GBps
2, 3, 4, 6     Gen 2           8                  4GBps
7              Gen 2           16                 8GBps

Table 1. Details of the PCIe Slots

** Each PCIe Gen 2 channel provides a theoretical max of 500MB per second.
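The slot capacities follow directly from the per-lane figure in the footnote. A minimal sketch that checks each slot against the roughly 1.3GB per second each HBA had to sustain:

    # Per-slot bandwidth = lanes x 500MB/s (PCIe Gen 2), checked against the
    # ~1.3GB/s each of the six HBAs must sustain for 8GB/s aggregate.
    LANE_MBPS = 500
    slot_lanes = {1: 4, 2: 8, 3: 8, 4: 8, 6: 8, 7: 16}   # from Table 1

    required_mbps = 8 * 1024 / 6                         # ~1365 MB/s per HBA
    for slot, lanes in sorted(slot_lanes.items()):
        capacity = lanes * LANE_MBPS
        verdict = "ok" if capacity >= required_mbps else "insufficient"
        print(f"slot {slot}: x{lanes} = {capacity} MB/s -> {verdict}")
    # Every slot clears the bar; the x8 and x16 slots leave the most headroom.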


Building a Scalable Test Infrastructure

To build a scalable virtual infrastructure that can support a million+ IOPS, a few tests were conducted initially. A single VM with 2 vCPUs and 8GB of memory was chosen as the basic building block. This VM was assigned 4 thick virtual disks, each created on a separate VMFS datastore. Each datastore was created on a separate metaLUN in the VMAX array. Each datastore was accessed through a single 8Gbps FC port in the host, but with dedicated FC ports on the array. Iometer was configured to issue 100% random, 100% read requests to 6.4GB worth of storage space on each disk.

With this Iometer profile, the VM produced 120K IOPS and 100K IOPS with 512-byte and 8KB request sizes, respectively. If the vCPU and storage configuration of the VM were doubled, it was theoretically possible to achieve 240K and 200K IOPS with 512-byte and 8KB request sizes, respectively, through a dual-port 8Gbps FC HBA. Thus, a single VM with the following configuration was chosen as the building block for the tests: 4 vCPUs and 8GB of memory, issuing I/O requests to 10 virtual disks (all on separate VMFS datastores) through the two ports of a dual-port adapter. The number of VMs was increased to six, each identical in configuration. The final test setup is shown in Figure 1.
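The extrapolation from the building block to the final configuration is simple arithmetic, sketched below with the measured figures from the preliminary tests:

    # Scale the measured 2-vCPU building block (120K IOPS at 512B, 100K at 8KB)
    # to the 4-vCPU VM used in the tests, then across six identical VMs.
    block_iops = {"512B": 120_000, "8KB": 100_000}       # 2 vCPUs, 4 vdisks
    vm_iops = {k: 2 * v for k, v in block_iops.items()}  # doubled configuration

    vms = 6
    for size, iops in vm_iops.items():
        print(f"{size}: {iops:,} per VM -> {iops * vms:,} aggregate")
    # 6 x 200K = 1.2M projected IOPS at 8KB: comfortable headroom over the
    # one-million target.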

Assigning Virtual Disks to a VM

To achieve enough parallelism while pushing a high number of I/O operations per second, virtual disks in each VM were spread across multiple virtual SCSI controllers as follows:

NUMBER OF VDISKS    VIRTUAL SCSI CONTROLLER ID
4                   0
3                   1
3                   2

Table 2. Virtual Disk Assignment for Multi-VM Tests

NUMBER OF VDISKS    VIRTUAL SCSI CONTROLLER ID
10                  0
10                  1
10                  2
10                  3

Table 3. Virtual Disk Assignment for Single-VM Test
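Both assignments follow the same round-robin rule: divide the disks evenly across the controllers and give any remainder to the lowest-numbered controllers. A small illustrative helper (the function is ours, for illustration only):

    # Spread num_disks across num_controllers as evenly as possible,
    # reproducing Tables 2 and 3 above.
    def assign_disks(num_disks: int, num_controllers: int) -> dict[int, int]:
        counts = {c: num_disks // num_controllers for c in range(num_controllers)}
        for c in range(num_disks % num_controllers):
            counts[c] += 1                 # remainder goes to the lowest IDs
        return counts

    print(assign_disks(10, 3))   # {0: 4, 1: 3, 2: 3}           (Table 2)
    print(assign_disks(40, 4))   # {0: 10, 1: 10, 2: 10, 3: 10} (Table 3)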


Estimation of CPU Cost of an I/O Operation

%Processor time of the vSphere host was used to estimate the CPU cost of an I/O operation. When all six VMs were actively issuing 8KB I/O requests to their storage, the total %Processor time of the vSphere host was recorded using esxtop [5]. The CPU cost of an I/O operation was estimated using the following equation:

CPU cost per I/O = (average %processor time of the host × rated CPU clock frequency × number of logical threads) ÷ (100 × total I/O operations per second)

The rated clock frequency of the processors in the server was 2.4GHz. By default, hyperthreading was enabled on the vSphere host. Hence the number of logical threads was 80.
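As a worked instance of the equation, consider a host averaging 50% processor time while serving one million 8KB IOPS; the 50% figure is a placeholder for illustration, not a measured result from the tests:

    # CPU cost per I/O = (%proc time x rated clock x logical threads)
    #                    / (100 x IOPS)
    # Host parameters from the text: 2.4GHz rated clock, 80 logical threads.
    def cycles_per_io(pct_processor_time: float, iops: float,
                      clock_hz: float = 2.4e9, threads: int = 80) -> float:
        return (pct_processor_time * clock_hz * threads) / (100 * iops)

    # Hypothetical 50% average processor time at one million IOPS:
    print(f"{cycles_per_io(50.0, 1_000_000):,.0f} cycles per I/O")  # 96,000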

References

1. "VMware vSphere 4 Performance with Extreme I/O Workloads." VMware, Inc., 2009. http://www.vmware.com/pdf/vsp_4_extreme_io.pdf.

2. “EMC Symmetrix VMAX storage system.” EMC Corporation, 2011. http://www.emc.com/collateral/hardware/specification-sheet/h6176-symmetrix-vmax-storage-system.pdf.

3. “LightPulse LPe12002.” Emulex Corporation, 2008. http://www.emulex.com/products/host-bus-adapters/emulex-branded/lightpulse-lpe12002/overview.html.

4. “Iometer.” http://www.iometer.org/.

5. “Interpreting esxtop Statistics.” VMware, Inc., 2010. http://communities.vmware.com/docs/DOC-9279.

About the Author

Chethan Kumar is a senior member of the Performance Engineering team at VMware, where his work focuses on performance-related topics concerning databases and storage. He has presented his findings in white papers, blog articles, and technical papers at academic conferences and at VMworld.

Acknowledgements

The author would like to thank VMware's partners (Intel, Emulex, and EMC) for providing the necessary gear to build the infrastructure used for the experiments discussed in this paper. He would also like to thank the Symmetrix Performance Engineering team of Thomas Rogers, John Aurin, John Adams, and Dan Aharoni for installing and configuring the hardware setup and for running all the experiments. Finally, the author would like to thank Chad Sakac, VP of VMware Alliance, EMC, for supporting this effort and ensuring the availability of the hardware components to complete the exercise in time.

VMware, Inc. 3401 Hillview Avenue, Palo Alto, CA 94304 USA. Tel 877-486-9273, Fax 650-427-5001, www.vmware.com

Copyright © 2011 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.
