
Analysis and experimental evaluation of data plane virtualization with Xen

游清權


Outline

• Introduction

• Virtual Network with Xen
  – Data path in Xen
  – Routers’ data plane virtualization with Xen
  – Performance problem statement

• Experiments and Analysis

• Related work

• Conclusion and perspectives


Introduction

• System virtualization
  – Isolation
  – Mobility
  – Dynamic reconfiguration
  – Fault tolerance of distributed systems
  – Increased security due to isolation

Introduction

• Virtualization could potentially solve key issues of the current Internet (security, mobility, reliability, configurability)

– However, it introduces overhead due to the additional layers

• Resources are shared between virtual machines:
  – the network interfaces
  – the processors
  – the memory (buffer space)
  – the switching fabric

• It is a challenge to obtain predictable, stable and optimal performance!


Virtual Network with Xen

• Data path in Xen


Data path in Xen

• VMs in Xen access the network hardware through the virtualization layer

• Each domU has a virtual interface for each physical network interface

• Virtual interfaces are accessed via a split device driver (frontend driver in domU, backend driver in dom0)

1. Data path in Xen

1. Network packets are emitted on a VM

2. They are copied to a segment of shared memory by the Xen hypervisor and transmitted to dom0

3. Packets are bridged (path 1) or routed (path 2) between the virtual interfaces and the physical ones

• The additional path a packet takes is shown as a dashed line
• Overhead:
  – copy to the shared memory
  – multiplexing and demultiplexing


2. Routers’ data plane virtualization with Xen

• Xen can be used for fully (i.e. control plane and data plane) virtualized software routers

• Figure 2: architecture with software routers loaded into two virtual machines to create virtual routers

• The VMs have no direct access to the physical hardware interfaces

• Packets are forwarded between each virtual interface and the corresponding physical interface (multiplexing and demultiplexing); see the configuration sketch below
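As an illustration only, a domU for such a virtual router could be declared with a Xen 3.x xm domain configuration (these files use Python syntax). The kernel path, image path, names and bridge names below are hypothetical, not taken from the slides.

```python
# Hypothetical xm config: one virtual router domU with two virtual interfaces,
# each attached to a dom0 bridge backed by one of the two physical NICs.
kernel = "/boot/vmlinuz-2.6.18.8-xen"   # assumed paravirtualized guest kernel
memory = 512                            # MB of RAM for the domU
name   = "vrouter1"
vcpus  = 1
vif    = ['bridge=xenbr0', 'bridge=xenbr1']   # one vif per physical NIC (bridged path 1)
disk   = ['file:/srv/xen/vrouter1.img,xvda,w']
root   = "/dev/xvda ro"
```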


3. Performance problem statement

• Efficiency is defined in terms of throughput

• Fairness of the inter-virtual machine resource sharing is derived from the classical Jain index[6]

• n: number of VMs sharing the physical resources
• Xi: the metric achieved by each virtual machine i
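As a working sketch, assuming the classical Jain index J = (ΣXi)² / (n · ΣXi²) and, as an assumption, efficiency measured as the aggregate throughput relative to a non-virtualized Linux host, both metrics can be computed from the per-VM throughputs:

```python
def jain_fairness(xs):
    """Classical Jain index: (sum xi)^2 / (n * sum xi^2); 1 means a perfectly fair share."""
    n = len(xs)
    return sum(xs) ** 2 / (n * sum(x * x for x in xs))

def efficiency(xs, r_classical):
    """Assumed definition: aggregate throughput relative to non-virtualized Linux."""
    return sum(xs) / r_classical

# Hypothetical per-VM throughputs (Mb/s) for 4 domUs sharing one 1 Gb/s NIC:
per_vm = [235, 230, 240, 233]
print(jain_fairness(per_vm))      # close to 1 -> fair sharing
print(efficiency(per_vm, 938))    # relative to Rclassical(T/R) = 938 Mb/s
```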


Experiments and Analysis

• 1. Experimental setup
• All experiments are executed on the fully controlled, reservable and reconfigurable French national testbed Grid’5000 [4].

• End-hosts are IBM eServer 325 machines:
  – 2 AMD Opteron 246 CPUs (2.0 GHz/1MB), one core each
  – 2 GB of memory and a 1 Gb/s NIC


Experiments and Analysis

• Virtual routers are hosted on IBM eServer 326m machines:
  – 2 AMD Opteron 246 CPUs (2.0 GHz/1MB), one core each
  – 2 GB of memory and two 1 Gb/s NICs

• Xen 3.1.0 and 3.2.1 with the modified Linux kernels 2.6.18-3 and 2.6.18.8, respectively

• Measurement tools (see the sketch below):
  – iperf for TCP throughput
  – netperf for UDP rate
  – xentop for CPU utilization
  – the classical ping utility for latency
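These measurements could be scripted roughly as below; the peer host name is a placeholder and the command-line options are only plausible defaults, not the ones used in the study.

```python
import subprocess

TARGET = "endhost-b"  # hypothetical peer host

def run(cmd):
    """Run one measurement command and return its textual output."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout

tcp = run(["iperf", "-c", TARGET, "-t", "30"])            # TCP throughput
udp = run(["netperf", "-H", TARGET, "-t", "UDP_STREAM"])  # UDP rate
lat = run(["ping", "-c", "10", TARGET])                   # latency
cpu = run(["xentop", "-b", "-i", "1"])                    # per-domain CPU utilization (batch mode, run in dom0)
print(tcp, udp, lat, cpu, sep="\n")
```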


Experiments and Analysis

• Evaluation of virtual end-hosts
  – Network performance on virtual end-hosts implemented with Xen 3.1 and Xen 3.2.

– Some results with Xen 3.1 were not satisfactory, dom0 being the bottleneck.

– A second run was therefore performed on Xen 3.1, attributing more CPU time to dom0 (up to 32 times the share attributed to a domU); this setup is called Xen 3.1a (see the sketch below).
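A minimal sketch of such a re-weighting with the credit scheduler, assuming the xm toolstack and the default domU weight of 256 (so 32x would be a weight of 8192); the concrete values used in the experiments are not stated here.

```python
import subprocess

# Give dom0 up to 32x the credit-scheduler weight of a default domU (weight 256).
DOM0_WEIGHT = 32 * 256
subprocess.run(["xm", "sched-credit", "-d", "Domain-0", "-w", str(DOM0_WEIGHT)], check=True)

# Display dom0's resulting weight and cap.
subprocess.run(["xm", "sched-credit", "-d", "Domain-0"], check=True)
```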


Sending performance

• First experiment
  – TCP sending throughput on 1, 2, 4 and 8 virtual hosts
  – Figure 3: throughput per VM and aggregate throughput

– On Xen 3.1 and 3.2, throughput is close to the classical Linux throughput Rclassical(T/R) = 938 Mb/s

– On Xen 3.1a and 3.2, the aggregate throughput obtained by the VMs reaches more than on default 3.1


Sending performance

• In the three cases (Xen 3.1, 3.1a and 3.2) we conclude:
  – The system is efficient and predictable in terms of throughput

• The throughput per VM corresponds to the fair share of the available bandwidth of the link (Rtheoretical/N).
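For illustration, the fair share per VM on the 1 Gb/s link (taking the measured classical Linux throughput of 938 Mb/s as an approximation of Rtheoretical) is simply:

```python
R_THEORETICAL = 938  # Mb/s, approximated here by Rclassical(T/R)
for n in (1, 2, 4, 8):
    print(f"{n} VM(s): fair share = {R_THEORETICAL / n:.1f} Mb/s")
# 8 VMs -> about 117 Mb/s each
```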


Sending performance

• Average CPU utilization for each guest domain is shown in Figure 4.

• For a single domU:
  – the two CPUs are used at around 50% in the three setups (Xen 3.1, 3.1a and 3.2)

• Linux system without virtualization:
  – only Cclassical(E) = 32% of both CPUs is in use

• With 8 domUs:
  – both CPUs are used at over 70%


Sending performance

• 3.1a: Increasing dom0’s CPU weight

• Even if virtualization introduces a processing overhead, two processors allow 8 concurrent VMs to achieve a throughput equivalent to the maximum theoretical throughput of a 1 Gb/s link.

• The fairness index is here close to 1 (bandwidth and CPU time are fairly shared)


2. Receiving performance

• Figure 5: on Xen 3.1, the aggregate throughput decreases slightly
  – (according to the number of VMs)

• Only 882 Mb/s on a single domU
• Only 900 Mb/s on a set of 8 concurrent domUs

– This corresponds to around 95% of the throughput Rclassical(T/R) = 938 Mb/s on a classical Linux system.


Receiving performance.

• The efficiency Ethroughput
  – varies between 0.96 for 8 domUs and 0.94 for a single domU
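These values are consistent with the throughputs on the previous slide, assuming Ethroughput = Raggregate / Rclassical:

```python
R_CLASSICAL = 938  # Mb/s, receive throughput of a non-virtualized Linux host
print(round(882 / R_CLASSICAL, 2))  # single domU on Xen 3.1 -> 0.94
print(round(900 / R_CLASSICAL, 2))  # 8 concurrent domUs     -> 0.96
```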

• By changing scheduler parameters (Xen 3.1a):
  – the aggregate throughput improves, reaching about 970 Mb/s on 8 virtual machines


Receiving performance

• On Xen 3.1, the bandwidth sharing between the domUs is very unfair (with a growing number of domUs)

• This is due to an unfair treatment of the events and has been fixed in Xen 3.2.

• Simply providing dom0 with more CPU time:
  – 3.1a improves fairness in Xen 3.1 by giving dom0 enough time to treat all the events


Receiving performance

• Fair resource sharing:
  – makes performance much more predictable

• Xen 3.2 is similar to Xen 3.1a
  – throughput increases by about 6% (compared to the default 3.1 version)


Receiving performance

• Total CPU cost
  – varies between 70% and 75% (Xen 3.1 and 3.2)
  – an important overhead compared to a Linux system without virtualization
  – network reception takes Cclassical(R) = 24% on a classical Linux system

• Notice that on default Xen 3.1:
  – the efficiency in terms of throughput decreases, but the available CPU time is not entirely consumed
  – the sharing is unfair


Receiving performance

• This proposal improves fairness but increases CPU usage
• Xen 3.2:
  – domU CPU sharing is fair (dom0’s CPU usage decreases slightly)

– less total CPU overhead while still achieving better throughput

• Conclusion:
  – important improvements have been implemented in Xen 3.2 to decrease the excessive dom0 CPU overhead


3. Evaluation of virtual routers

• Forwarding performance of virtual routers with 2 NICs:
  – UDP receiving throughput over the VMs
  – Maximum-sized packets are sent at maximum link speed over the virtual routers and the TCP throughput is measured
  – Further, the latency over the virtual routers is measured

• Xen 3.2a:
  – Xen 3.2 in its default configuration
  – but with an increased weight parameter for dom0 in CPU scheduling


3. Evaluation of virtual routers: forwarding performance (figure)


3. Evaluation of virtual routers

• Performance of virtual routers
  – UDP traffic is generated over one or several virtual routers (1 to 8) sharing a single physical machine
    • maximum-sized packets (1500 bytes)
    • minimum-sized packets (64 bytes)

• Figure 7: obtained UDP bit rate and TCP throughput
• Expected packet loss rate with maximum-sized packets on each VM: 1 − Rtheoretical/(N × Rtheoretical) (see the sketch below)
• Classical Linux router: Rclassical(F) = 957 Mb/s
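A small sketch of the expected numbers, assuming the offered load is N flows at Rtheoretical while the shared outgoing 1 Gb/s link carries at most Rtheoretical, taking a 1500-byte packet as a 1518-byte Ethernet frame, and counting 20 extra bytes per frame for preamble and inter-frame gap (all assumptions about how the figures are computed):

```python
LINK_RATE = 1e9  # bit/s: 1 Gb/s link

def max_packet_rate(frame_bytes):
    """Theoretical packet rate on the link, adding 20 bytes per frame
    for the Ethernet preamble and inter-frame gap (assumption)."""
    return LINK_RATE / ((frame_bytes + 20) * 8)

def expected_loss_rate(n):
    """1 - Rtheoretical / (N * Rtheoretical) = 1 - 1/N."""
    return 1 - 1 / n

print(f"max-sized frames (1518 B): {max_packet_rate(1518):,.0f} pkt/s")  # ~81,000 pkt/s
print(f"min-sized frames (64 B):   {max_packet_rate(64):,.0f} pkt/s")    # ~1,488,000 pkt/s
for n in (2, 4, 8):
    print(f"N = {n}: expected loss rate = {expected_loss_rate(n):.2f}")
```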


3. Evaluation of virtual routers

• Detailed UDP packet rates and loss rates per domU with maximum- and minimum-sized packets.


3. Evaluation of virtual routers

• The aggregate UDP rate is in some cases a bit higher than the theoretical value
  – due to small variations in the start times of the different flows

• Resource sharing is fair
  – the performance of this setup is predictable

• With minimum-sized packets on 4 or 8 virtual routers, dom0 becomes overloaded

• Giving a bigger CPU share to dom0 (Xen 3.2a):
  – the overall TCP throughput increases


3. Evaluation of virtual routers

• Virtual router (VR) latency
  – Concurrent virtual routers sharing the same physical machine are either idle or stressed forwarding maximum-rate TCP flows.


Related work

• Performance of virtual packet transmission in Xen is a crucial subject and has been treated in several papers


Conclusion and perspectives

• Virtualization mechanisms are costly:
  – additional copy
  – I/O scheduling of the virtual machines sharing the physical devices

• Virtualizing the data plane by forwarding packets in domU is becoming an increasingly promising approach


Conclusion and perspectives

• End-host throughput improved in Xen 3.2 compared to 3.1
• Virtual routers behave similarly to classical Linux routers when forwarding big packets.

• Latency is impacted by the number of concurrent virtual routers.

• Our next goal is to evaluate the performance on 10 Gbit/s links and implement virtual routers on the Grid’5000 platform.
