
Modeling the Performance of Virtual I/O Server

Jie Lu

BMC Software Inc. Waltham, Massachusetts, USA

The virtual I/O server is widely adopted in server virtualization solutions, for it provides the sharing of physical disks and network adapters in a flexible and reliable manner. Traditional performance models for physical servers lack the ability to represent such an environment. On the basis of analyzing the architecture and characteristics of a variety of virtual I/O servers from different vendors, this paper presents a practical analytical model for measuring the performance of applications and servers involving a virtual I/O server. It also introduces methods that leverage commodity modeling tools.

1. INTRODUCTION

Server virtualization enables the partitioning of a physical computer frame into multiple isolated virtual environments, each acting as an individual server. It allows physical resources to be shared between partitions to provide more efficient utilization. The virtualization layer, the hypervisor, runs directly on hardware to provide a virtual machine abstraction to the guest OSs. A guest OS that would otherwise run directly on a separate physical system runs in such a virtual machine environment. The hypervisor is responsible for low-level resource allocation and scheduling.

Virtualizing the CPU uses the same time-sharing mechanism found in any modern operating system. The hypervisor performs context switches between virtual CPUs, just as conventional operating systems perform context switches between processes or threads. Virtualizing memory is relatively easy, too. Many server virtualization solutions do not allow the sharing of memory between virtual machines; in that case, virtualizing memory simply means mapping the addresses seen by the virtual machines to separate memory partitions of the physical system. For hypervisors that do allow shared memory, another layer of address mapping translates virtual machine memory addresses to actual physical memory addresses. The mechanism is similar to virtual memory management in conventional operating systems.

However, virtualizing I/O devices is not trivial. Unlike CPU and memory management, which have already been somewhat virtualized within an OS, the I/O subsystem usually lies outside of the OS core kernel and is therefore complicated to multiplex. The Virtual I/O Server provides the opportunity to share physical disk and network I/O adapters in a flexible and reliable manner. An I/O adapter and its associated disk subsystem or network interface can be shared by many virtual machines on the same server. This facilitates the consolidation of LAN and disk I/O resources and minimizes the number of physical adapters that are required. The number of virtual machines that can be supported is therefore not limited by the number of physical adapters or PCI slots available, which further increases overall system utilization.

All vendors supporting paravirtualization provide their own implementation of the virtual I/O server. IBM has the Virtual I/O Server for System p PowerVM virtualization, which includes DLPAR and SPLPAR. Sun uses the Service Domain for the Logical Domains platform. Microsoft Hyper-V uses the VSP-VSC (Virtualization Service Provider and Client) framework to virtualize I/O. Citrix XenServer introduces the Split Driver model to achieve the same functionality.

For any of the above virtualized servers, modeling the performance of CPU and memory is not much different from modeling standalone servers. The same schemes can still be used, perhaps with minor modification. But the conventional model does not apply to I/O. First, the I/O devices are shared among multiple virtual machines.

The I/O measurements from the virtual machines do not reflect the topology of the physical I/O devices. Second, virtual I/O consumes not only the devices themselves, but also the CPU and memory of the virtual I/O server. Contention on the virtual I/O server's CPU and memory directly impacts the performance of workloads running in the client virtual machines. One has to make sure that the virtual I/O server has sufficient processing power, in addition to I/O bandwidth. Effectively modeling the performance of a virtualized environment involving a virtual I/O server therefore calls for new methodologies.

After discussing the performance measuring and modeling issues involving the virtual I/O server, this paper presents practical modeling schemes. Section 2 describes the general architecture and concepts of the virtual I/O server, followed in Section 3 by a brief introduction of four sample solutions from IBM, Sun, Microsoft, and Citrix. Section 4 introduces a basic performance model with a centralized system and multiple workloads. Next, we present a practical method based on a distributed system model using commodity tools. Finally, the paper is concluded in Section 6.

2. VIRTUAL I/O SERVER ARCHITECTURE

Full virtualization requires that the guest OS run in a virtual machine without any modification. Therefore most full virtualization solutions, such as VMware ESX Server, employ the monolithic model to provide emulated forms of simple devices. The emulated devices are typically chosen to be common hardware, so it is likely that drivers already exist for any given guest. The hypervisor then provides the device drivers for the actual physical devices. Therefore, all I/O devices are owned by the hypervisor, and each guest only accesses the emulated virtual devices. Supporting the range of hardware available for a commodity server at the hypervisor level would not be economical, since most devices are already supported by the common operating systems that run as guests. In paravirtualization solutions, the guest OS kernel has to be modified in order to run anyway. Thus most paravirtualization solutions employ a client-server model to virtualize the I/O devices.

Generic term           | IBM PowerVM            | Sun Logical Domains    | Microsoft Hyper-V | Citrix XenServer
-----------------------|------------------------|------------------------|-------------------|-----------------------------
Virtual I/O server     | Virtual I/O Server     | Service Domain         | VSP Partition     | Driver Domain
Client virtual machine | Client partition       | Logical Domain         | Child Partition   | Domain U
Virtual bus            | Virtual SCSI           | Domain Channel         | VMBus             | XenBus
Front end interface    | Virtual Adapter        | Virtual Device Driver  | VSC               | Top half of split driver
Backend service        | Virtual Server Adapter | Virtual Device Service | VSP               | Bottom half of split driver

Table 1. Virtual I/O server terminologies

There is no consensus on the architecture and terminology used in virtual I/O server solutions; each vendor has its own. But they all share the same baseline concepts and structures. We will use commonly understood terminology to illustrate the general architecture of the virtual I/O server here. When we describe each individual vendor's solution later, we will use its own terminology. Table 1 gives the mapping of the terminologies.

The virtual I/O server usually consists of five major components: the actual device driver, the backend service interface, the virtual bus, the front end client interface, and the virtual device driver. Figure 1 illustrates the architecture of a typical virtual I/O server. In paravirtualization, the hypervisor implements memory sharing and message passing mechanisms for virtual machines to communicate with each other. The virtual bus, a well-defined protocol built on top of these communication mechanisms, provides a way of enumerating the virtual devices available to a given virtual machine and connecting to them. The backend service runs on the virtual I/O server to handle multiplexing and provide a generic interface; multiplexing allows more than one client virtual machine to use the device. The front end interface runs on a client virtual machine and communicates with the backend service via the virtual bus protocol. The backend therefore acts as a provider while the front end acts as a consumer.
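To make the provider/consumer relationship concrete, the following is a minimal conceptual sketch in Python of how a front end forwards a client request over the virtual bus to the backend, which multiplexes requests from several client VMs onto one physical driver. The class and method names are hypothetical illustrations, not taken from any vendor's API.

```python
from collections import deque

class PhysicalDriver:
    """Stands in for the real device driver owned by the virtual I/O server."""
    def submit(self, req):
        # A real driver would program the adapter; here we just echo a result.
        return f"completed {req}"

class Backend:
    """Backend service: multiplexes requests from many client VMs onto one driver."""
    def __init__(self, driver):
        self.driver = driver
        self.bus = deque()          # stands in for the shared-memory virtual bus

    def enqueue(self, client_id, req):
        self.bus.append((client_id, req))

    def service_all(self):
        responses = {}
        while self.bus:
            client_id, req = self.bus.popleft()
            responses.setdefault(client_id, []).append(self.driver.submit(req))
        return responses

class FrontEnd:
    """Front end in a client VM: looks like a local device driver to the guest."""
    def __init__(self, client_id, backend):
        self.client_id = client_id
        self.backend = backend

    def read_block(self, block_no):
        self.backend.enqueue(self.client_id, f"read block {block_no}")

# Two client VMs share one physical device through the same backend.
backend = Backend(PhysicalDriver())
FrontEnd("vm1", backend).read_block(7)
FrontEnd("vm2", backend).read_block(42)
print(backend.service_all())
```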

Figure 1. Architecture of a typical virtual I/O server (the device driver and backend run in the virtual I/O server VM; the virtual device driver and front end run in each client VM; the two sides communicate over the virtual bus provided by the hypervisor)

The virtual I/O server owns the physical I/O devices and their corresponding device drivers. For storage, it exports a pool of heterogeneous physical storage as a homogeneous pool of block storage to the client virtual machines in simple abstract forms. The virtualized storage devices can be backed by internal physical disks, external LUNs, optical storage, logical volumes, or even files. For network interfaces, it usually implements the Ethernet transport mechanism as well as an Ethernet switch that supports VLAN capability. The client virtual machines access these abstract virtual devices through the virtual device driver, just as if they were attached locally. In this way, the virtualization solution leverages the large base of device drivers in existing operating systems.

An I/O request from a client virtual machine goes to the virtual device driver first; the front end interface on the client then communicates with the backend service on the virtual I/O server via the virtual bus; the virtual I/O server calls the actual device driver to send the request to the physical device. When the I/O request finishes service on the physical device, the response is returned to the client along the same route in reverse. Usually the control virtual machine, the special one permitted to use the privileged control interface to the hypervisor, acts as the virtual I/O server. Alternatively, one or several virtual machines can be dedicated as virtual I/O servers.

3. COMMERCIAL SOLUTIONS

IBM PowerVM

The IBM Virtual I/O Server (VIOS) is part of the System p PowerVM hardware feature [HALE2008]. It allows virtualization of physical storage and network resources. The VIOS can run in either a dedicated processor partition or a micro-partition. The VIOS is a standard storage subsystem, capable of exporting physical storage in the form of standard SCSI-compliant LUNs. The virtualized storage devices are accessed by the client partitions through virtual SCSI devices. The Virtual I/O Server implements the virtual SCSI server adapter to act as a server, or SCSI target device. The client logical partitions have a SCSI initiator referred to as the virtual SCSI client adapter, and access the virtual SCSI targets as standard SCSI LUNs. Physical disks owned by the VIOS can either be exported and assigned to a client logical partition as a whole, or be partitioned into logical volumes and assigned to different partitions.

The virtual Ethernet function is provided by the POWER Hypervisor. The Virtual I/O Server allows shared access to external networks through the Shared Ethernet Adapter (SEA). The SEA supports link aggregation, SEA failover, TCP segmentation offload, and GVRP.

Sun Logical Domains

Sun Logical Domains are supported on all Sun servers that use Sun processors with Chip Multithreading Technology [SUN2007]. A logical domain can play four roles: control domain, service domain, I/O domain, and guest domain. A single logical domain may function in one or more of these roles. Among them, a service domain provides specific virtualized services, including virtual disk, network, and console services, to guest domains, using a logical domain channel for communication. By buffering device control, the service domain can actually change the underlying device or device driver while the logical domain continues to execute. A logical domain channel (LDC) is a point-to-point, full-duplex link created by the hypervisor. Within the Logical Domains architecture, LDCs provide a data path between virtual devices and guest domains and establish virtual networks between logical domains. A unique LDC is explicitly created for each link, ensuring data transfer isolation. With assistance from the hypervisor, data is transferred across an LDC as a simple 64-byte datagram or by using shared memory. Virtual I/O devices, such as disk, network, console, and cryptographic units, are created by the hypervisor and subsequently offered to logical domains by a service domain. Guest domains contain virtual device drivers that communicate over a logical domain channel with a virtual device service, such as the virtual disk bridge, in a service domain. The service domain then connects to the actual I/O device.

Citrix XenServer

Xen delegates hardware support to a guest, called the Driver Domain. Usually this is Domain 0, although it could be another guest domain. Xen employs a split device driver model, which consists of four major components: the real driver, the bottom half of the split driver, the shared ring buffers, and the top half of the split driver. The bottom half of the split driver provides multiplexing features. These features are exported to other domains using ring buffers in shared memory segments. The top half of the split driver, running in an unprivileged guest domain, is typically very simple. It initializes a memory page with a ring data structure and exports it via the grant table mechanism. It then advertises the grant reference via the XenStore, where the bottom half driver can retrieve it. The bottom half driver then maps the page into its own address space, giving a shared communication channel into which the top half driver inserts requests and the bottom half driver places responses [CHIS2008]. Xen provides abstract devices by implementing a high-level interface that corresponds to a particular device category. Rather than providing a SCSI device or an IDE device, Xen provides an abstract block device, which supports only two operations: read and write a block. The operations are implemented in a way that allows them to be grouped in a single request, so that I/O reordering in the Driver Domain kernel or in the controller can be used effectively. A Xen network driver is added to the Driver Domain kernel as a virtual interface, and the existing low-level services for bridging, routing, and virtual interfaces in the guest OS are used to handle multiplexing of network devices.
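The shared-ring idea behind the split driver can be illustrated with a small, self-contained sketch: a toy fixed-size ring with producer and consumer indices. This is only a conceptual illustration; it does not use Xen's actual ring macros, grant table, or XenStore APIs.

```python
class IORing:
    """Toy fixed-size request ring shared between the two halves of a split driver.

    A real Xen I/O ring lives in a granted memory page and also carries a
    response path; this sketch only mimics the request-producer/consumer idea.
    """
    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0   # advanced by the top (front-end) half
        self.req_cons = 0   # advanced by the bottom (back-end) half

    def push_request(self, req):
        if self.req_prod - self.req_cons == self.size:
            raise RuntimeError("ring full; front end must wait")
        self.slots[self.req_prod % self.size] = req
        self.req_prod += 1

    def pop_request(self):
        if self.req_cons == self.req_prod:
            return None                      # nothing pending
        req = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        return req

# Top half (guest) inserts block requests; bottom half (driver domain) drains them.
ring = IORing()
for block in (3, 4, 5):
    ring.push_request(("read", block))
while (req := ring.pop_request()) is not None:
    print("driver domain handles", req)
```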
Microsoft Hyper-V

In Hyper-V, the Virtualization Service Provider (VSP) runs within a partition that owns the corresponding physical I/O devices. It virtualizes a specific class of devices (e.g., networking or storage) by exposing an abstract device interface. The Virtualization Service Client (VSC) runs within an enlightened child partition to consume the virtualized hardware service. The physical devices are managed by traditional driver stacks in the VSP partition. By default, the parent partition provides the VSP services. VMBus, a software-based bus using memory sharing and hypervisor IPC messages, enables VSPs and VSCs to communicate efficiently [KIEF2005].

4. CENTRALIZED SYSTEM MODEL

The typical approach to modeling a virtual server is to use a centralized system model. A queuing network is built to connect all resources of the physical server. Each virtual machine is treated as a workload traversing the queuing network, so multiple workloads contend for the resources of the physical server. Figure 2 illustrates the concept of the queuing network model. Each physical resource, such as a CPU, disk, or NIC, has an input queue, and the requests flow through these resources.

Figure 2. Queuing network model using centralized system approach (requests from VM 1 through VM M flow through the queues of the CPUs, disks, and NICs)

The operational analysis formulas for multi-class open queuing networks can then be applied here [MENA2002]. First, the average throughput, X_0, of each VM can be measured at the application level. For open systems in operational equilibrium, the average throughput equals the average arrival rate, \lambda = X_0. However, different VMs have different arrival rates, so we use a vector \lambda = (\lambda_1, \ldots, \lambda_m, \ldots, \lambda_M) to denote the set of all arrival rates, where M is the number of VMs. Since all performance metrics depend on the values of the arrival rates, each is a function of \lambda. The utilization of each resource can then be represented as:

U_{i,m}(\lambda) = \lambda_m \times D_{i,m}    (4-1)

U_i(\lambda) = \sum_{m=1}^{M} U_{i,m}(\lambda)    (4-2)

Here, D_{i,m} denotes the service demand of VM m on resource i, which is the total service time that the requests from VM m spend on resource i. The physical resource utilization for each VM can usually be obtained from the hypervisor layer. In full virtualization solutions, any time-based metric collected within a guest OS is not valid for performance modeling.
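As a quick illustration of (4-1) and (4-2), the sketch below computes per-VM and total utilization for each resource. The arrival rates and service demands are made-up numbers, not measurements from this paper.

```python
# Illustration of equations (4-1) and (4-2) with hypothetical numbers.
# Arrival rate lambda_m in requests/sec, service demand D[i][m] in sec/request.
arrival = {"vm1": 20.0, "vm2": 35.0}
demand = {                       # resource -> VM -> service demand
    "cpu":  {"vm1": 0.010, "vm2": 0.008},
    "disk": {"vm1": 0.015, "vm2": 0.005},
}

def utilization(demand, arrival):
    per_vm = {i: {m: arrival[m] * d for m, d in vms.items()}   # (4-1)
              for i, vms in demand.items()}
    total = {i: sum(vms.values()) for i, vms in per_vm.items()}  # (4-2)
    return per_vm, total

per_vm, total = utilization(demand, arrival)
print(total)   # approx {'cpu': 0.48, 'disk': 0.475}, i.e. 48% and 47.5% busy
```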

However, not all paravirtualization solutions provide valid performance measures at the hypervisor level. IBM PowerVM and Sun Logical Domains do not provide statistical resource usage metrics at all, and XenServer only keeps track of high-level CPU usage for each individual virtual machine [LU2006]. None of these solutions supplies I/O statistics. Normally, performance is measured within each virtual machine for a paravirtualization solution. Unlike full virtualization, the guest OS kernel of a paravirtualized server has been modified to run, and by implementing involuntary wait time it keeps track of valid accounting information. So we can use the regular performance metrics from all virtual machines to build the above queuing model. Collecting resource usage data from the guest OS is trivial with OS-provided tools or third-party products.

The measurements of CPU and memory can be plugged directly into the queuing model. But the disk and network measurements from a client VM are against virtual devices; only the virtual I/O server measures the actual device usage when it performs the I/O on behalf of the client VMs. Conventional data collection methods would account for both the virtual and the physical I/O statistics in separate VMs, and when aggregated to the frame level they are double counted. In order to get valid model parameters, we have to figure out the physical device usage for the client VMs. The virtual I/O server can provide the mapping of each virtual I/O device to a physical device. With such mapping information, we can distribute the actual I/O metrics measured at the virtual I/O server to each individual VM.

Accurately distributing the I/O still does not solve the whole puzzle. While providing the virtual I/O service, the virtual I/O server also consumes CPU cycles, so this CPU usage has to be charged to the client VMs as well. Although most vendors recommend dedicating a VM to the virtual I/O server, nothing prevents users from running other software inside it. Regardless, there are always management and monitoring tools consuming resources that should not be charged to the virtual I/O services. Hence, we have to identify the CPU usage spent serving the virtual I/O and distribute it to the client VMs proportionally by I/O weight. The rest of the virtual I/O server can be left as a regular VM running on the physical server. In many cases, there are processes associated with the virtual I/O services; for example, the process seaproc corresponds to the Shared Ethernet Adapter service on IBM VIOS. Sometimes the CPU consumption of the virtual I/O services is not charged to any process. In that case we attribute the difference between the system-level CPU time and the sum of the process-level CPU times to the virtual I/O service.

With the resource utilization and service demand metrics, we may derive the other performance metrics of the model as follows. The average residence time of requests from VM m at resource i is:

R'_{i,m}(\lambda) = \frac{D_{i,m}}{1 - U_i(\lambda)}    (4-3)

The overall average response time of requests from VM m, where K is the number of resources, is:

R_{0,m}(\lambda) = \sum_{i=1}^{K} R'_{i,m}(\lambda)    (4-4)

The average number of requests from VM m at resource i is:

n_{i,m}(\lambda) = \frac{U_{i,m}(\lambda)}{1 - U_i(\lambda)}    (4-5)

The average number of requests at resource i is:

n_i(\lambda) = \sum_{m=1}^{M} n_{i,m}(\lambda)    (4-6)
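Continuing the hypothetical numbers from the earlier sketch, the following evaluates (4-3) through (4-6). It is a plain implementation of the standard open multi-class formulas, not a tool from this paper.

```python
# Equations (4-3)..(4-6) for a multi-class open queuing network,
# using the same hypothetical arrival rates and service demands as before.
arrival = {"vm1": 20.0, "vm2": 35.0}
demand = {
    "cpu":  {"vm1": 0.010, "vm2": 0.008},
    "disk": {"vm1": 0.015, "vm2": 0.005},
}

# Total utilization per resource, equations (4-1) and (4-2).
U = {i: sum(arrival[m] * d for m, d in vms.items()) for i, vms in demand.items()}

# Residence time per VM per resource, equation (4-3).
R = {i: {m: d / (1.0 - U[i]) for m, d in vms.items()} for i, vms in demand.items()}

# Response time per VM, equation (4-4): sum of residence times over all resources.
R0 = {m: sum(R[i][m] for i in demand) for m in arrival}

# Queue lengths, equations (4-5) and (4-6).
n_im = {i: {m: arrival[m] * demand[i][m] / (1.0 - U[i]) for m in vms}
        for i, vms in demand.items()}
n_i = {i: sum(vals.values()) for i, vals in n_im.items()}

print("utilization:", U)
print("response time per VM:", R0)
print("avg requests per resource:", n_i)
```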

Now this queuing model is ready to evaluate performance based on the measurements discussed above and to perform capacity planning for the future. When the workload on a VM grows, its arrival rate increases accordingly. Applying formulas (4-1) and (4-2) gives the effect on resource utilization. The throughput of the host reaches its maximum when any single resource saturates at 100% utilization. The effect of the growth on response time can also be derived using formulas (4-3) and (4-4). Figure 5 illustrates this modeling scheme with a real example.

Example

Next, we use the IBM Virtual I/O Server as an example to illustrate the above method. Our test environment is an 8-way IBM 9113-55A System p server. Six SPLPARs have been created on it, and two of them are configured as VIOS. Table 2 lists the configurations of these SPLPARs.

Partition Name | Partition Type | Pool ID | Virtual Processors | Entitlement | Role   | OS
ts40           | SPLPAR         | 0       | 2                  | 0.5         | Client | AIX 5.3
d001           | SPLPAR         | 0       | 4                  | 1.0         | Client | AIX 5.3
d002           | SPLPAR         | 0       | 4                  | 0.5         | Client | AIX 5.3
ts001          | SPLPAR         | 0       | 4                  | 0.5         | Client | AIX 5.3
vi13a          | SPLPAR         | 0       | 2                  | 0.2         | VIOS   | AIX 5.3
vi13b          | SPLPAR         | 0       | 2                  | 0.2         | VIOS   | AIX 5.3

Table 2. Test environment configuration

Each of the partitions is instrumented to collect performance data, including CPU, memory, disk, and network statistics. Figure 3 displays the overall CPU utilization and disk I/O rate measured from each individual partition. Notice that the I/O is double counted in the right chart.

Figure 3. Top level partition resource usage (left: partition CPU utilization in %; right: partition disk I/O rate in MB/sec; both measured over the course of a day starting at 1:00 AM)

First, we use the command lsmap in the VIOS to list the mapping between virtual devices and physical devices. The virtual devices here are identified by the slot number of the virtual SCSI client adapter and the Logical Unit Number (LUN) of the virtual SCSI device. The command lscfg inside a client partition reports the slot number and LUN for each virtual device. We can therefore correlate this information to connect a virtual device in a client partition to the physical device in the VIOS. In this way, we can eliminate the virtual I/O rate from the client partitions and assign them the actual physical I/O rate from the VIOS. In case multiple VIOS are configured in a single frame, the HMC command lshwres gives the Virtual I/O Server name and the slot number for a given client partition.

Next, we identify the CPU usage of the virtual I/O services. The CPU used by the SEA service is charged to the process seaproc, so we can distribute it to the client partitions based on their network rates. IBM VIOS does not account the CPU consumption of the virtual SCSI service to any process, so the unaccounted CPU time is distributed to the client partitions according to their disk I/O rates. Our test data show that the unaccounted CPU time can exceed 80% of the overall CPU time when there is heavy I/O activity. Figure 4 demonstrates the correlation between the disk I/O rate and the unaccounted CPU utilization, and the correlation between the network byte rate and the seaproc process CPU utilization.

Finally, we are able to evaluate the model and perform capacity planning. In our experiment, we chose the interval between 1:00 AM and 2:00 AM to build the baseline model. With the data on CPU utilization, disk utilization, throughput, and so on, we can compute the response time. We then assume the applications will grow 20% per quarter.

Based on the baseline model, we are able to plot the growth of the resource utilization and workload response time, as shown in Figure 5.
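The proportional charging described above can be sketched as follows. The partition names and rates are hypothetical, and the split rules (seaproc CPU by network rate, unaccounted CPU by disk I/O rate) simply restate the rule used in this section.

```python
# Distribute VIOS CPU used for virtual I/O to client partitions by I/O weight.
# All numbers below are hypothetical, for illustration only.
seaproc_cpu_util = 1.2        # % CPU charged to the seaproc process (SEA service)
unaccounted_cpu_util = 3.5    # % CPU not charged to any process (virtual SCSI service)

net_rate = {"ts40": 0.8, "d001": 2.4, "d002": 0.8}     # MB/sec of network through the SEA
disk_rate = {"ts40": 1.0, "d001": 6.0, "d002": 3.0}    # MB/sec of disk I/O via the VIOS

def distribute(total, weights):
    """Split `total` among the keys of `weights` proportionally to their values."""
    s = sum(weights.values())
    return {k: total * v / s for k, v in weights.items()}

sea_share = distribute(seaproc_cpu_util, net_rate)
vscsi_share = distribute(unaccounted_cpu_util, disk_rate)

# CPU to charge back to each client partition for its virtual I/O.
charge = {p: sea_share.get(p, 0.0) + vscsi_share.get(p, 0.0) for p in disk_rate}
print(charge)   # approximately {'ts40': 0.59, 'd001': 2.82, 'd002': 1.29}
```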

Figure 4. Correlations between I/O rate and CPU utilization (left: disk I/O rate in MB/sec versus unaccounted CPU utilization in %; right: network rate in MB/sec versus seaproc CPU utilization in %; hourly values over a full day)

Figure 5. Resource utilization and response time growth (CPU utilization growth per partition, disk utilization growth per disk, and response time growth per partition, from the current quarter through Q4)

As the charts show, the system saturates in Q3, because disk d0 on partition d001 reaches 100% utilization. As a result, the response time of that partition grows toward infinity. We would have to either upgrade the system or adjust the configuration before Q3. Fortunately, the virtual I/O server allows the I/O subsystem to be reconfigured transparently, without the client partition being involved.

5. DISTRIBUTED SYSTEM MODEL

Although the previous performance model works, it is still primitive and has some shortcomings. In a virtualized environment, VMs are usually assigned a portion or share of resources. The virtual I/O server is constrained by the resource allocation of its hosting VM. Charging the resource usage directly to the client VMs violates these configuration constraints, so we have to adjust the above model to reflect the different constraints. In fact, many existing modeling tools are quite mature at dealing with distributed systems, and even with certain partitioned or virtualized servers. We can leverage these commodity tools to model the virtual I/O server, even though it is not explicitly supported. Many modeling tools provide sophisticated solutions for capacity planning. Although most of them focus on node-based (individual server) systems, some do allow dependencies between workloads on different servers in order to model a distributed system. For example, a file server hosts a workload that workloads on separate servers depend on. Modeling the virtual I/O server is then similar to modeling a file server. The difference is that regular file servers rely on the network outside the server, while the virtual I/O server uses the memory bus inside the physical frame.
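The workload-dependency idea developed in this section can be sketched as a response-time composition: a client application workload calls a VIO service workload on the virtual I/O server, so the service workload's residence time is added to the client's response time. The workload names, rates, and demands below are hypothetical, and the single-queue estimate is only a stand-in for what a commodity modeling tool would compute.

```python
# Toy distributed-system view: client workloads call a VIO service workload.
# The call count per client request stands in for its I/O rate; all numbers
# here are hypothetical and only illustrate the composition of response time.

vio = {"demand": 0.004, "arrival": 0.0}    # VIO service: sec of VIOS resource per I/O

clients = {
    # per-request local demand (sec) and I/O calls issued per request
    "app_on_d001": {"arrival": 30.0, "local_demand": 0.012, "io_calls": 2.0},
    "app_on_ts40": {"arrival": 10.0, "local_demand": 0.020, "io_calls": 1.0},
}

# The VIO service workload's arrival rate is driven by its callers.
vio["arrival"] = sum(c["arrival"] * c["io_calls"] for c in clients.values())
vio_util = vio["arrival"] * vio["demand"]
vio_residence = vio["demand"] / (1.0 - vio_util)      # open single-queue estimate

for name, c in clients.items():
    local_util = c["arrival"] * c["local_demand"]
    local_residence = c["local_demand"] / (1.0 - local_util)
    # Client response time = local residence + called VIO service residence.
    response = local_residence + c["io_calls"] * vio_residence
    print(f"{name}: response time ~ {response:.4f} sec")
```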

Figure 6. Distributed system model concept (application workloads in client VMs depend on VIO service workloads in the virtual I/O server)

With such a tool, we first create a service workload on the virtual I/O server and associate it with the resource usage identified in the previous section. In general, separate workloads are created for disk I/O and network I/O; they can be further broken down by device if necessary. Then we identify the workloads on client partitions that issue virtual I/O requests, and make them depend on the service workload on the virtual I/O server. This is equivalent to having the client workloads call the service workload, where the call count is determined by the I/O rate. Figure 6 shows the concept of the modeling method. In this way, the virtual I/O server and the service workload become a component of the response time of the client workloads, and growing a client workload incurs growth of the virtual I/O service workload as well.

Using this method, we perform the same study as before. The results demonstrate the same trend, as illustrated in Figure 7. Unlike the previous experiment, the bottleneck is now the virtual I/O server, as one of its disks saturates in Q3. Since applications on other client partitions, such as d001, depend on it, their response times grow rapidly.

6. CONCLUSION

The virtual I/O server is becoming more and more popular in server virtualization, for it provides the benefit of sharing physical disks and network adapters in a flexible and reliable manner. As traditional approaches are not suitable for such an environment, new methods of performance modeling and capacity planning are needed. This paper has presented practical methods for effectively measuring and modeling the performance of applications and servers involving a virtual I/O server.

Figure 7. Results from distributed system model (CPU utilization growth per partition, disk utilization growth for hdisk10 through hdisk13, and response time growth per partition, from the current quarter through Q4)

REFERENCES

[CHIS2008] D. Chisnall, The Definitive Guide to the Xen Hypervisor. Prentice Hall, 2008.
[HALE2008] C. Hales, C. Milsted, O. Stadler, and M. Vagmo, PowerVM Virtualization on IBM System p: Introduction and Configuration. IBM Redbooks, 2008.
[KIEF2005] M. Kieffer, Windows Virtualization Architecture. WinHEC, 2005.
[LU2006] J. Lu, L. Makhlis, and J. Chen, "Measuring and Modeling the Performance of the Xen VMM," Int. CMG Conference, 2006, pp. 621-628.
[MENA2002] D. A. Menasce and V. A. F. Almeida, Capacity Planning for Web Services: Metrics, Models, and Methods. Prentice Hall, 2002.
[SUN2007] Virtualization with Logical Domains and Sun CoolThreads Servers. Sun Microsystems white paper, 2007.