7

Click here to load reader

Virtual Distributed Environments in a Shared … network technologies facilitates the creation of virtual distributed environments in a shared infrastructure. Paul Ruth Xuxian Jiang

  • Upload
    lamkien

  • View
    212

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Virtual Distributed Environments in a Shared … network technologies facilitates the creation of virtual distributed environments in a shared infrastructure. Paul Ruth Xuxian Jiang

0018-9162/05/$20.00 © 2005 IEEE May 2005 63

C O V E R F E A T U R E

P u b l i s h e d b y t h e I E E E C o m p u t e r S o c i e t y

Virtual DistributedEnvironments in aShared Infrastructure

S panning multiple domains, the Grid pro-vides for the federation, allocation, andmanagement of heterogeneous networkedresources and makes them available to alarge number of users. Advances in Grid

computing technology have contributed to the formation of a wide-area shared cyber infrastruc-ture that provides opportunities for a broad spec-trum of distributed and parallel computing appli-cations to take advantage of the massive aggregatecomputational power available across the Internet.However, realizing the full potential of such ashared infrastructure presents significant researchchallenges.

Many Grid users are familiar with the traditionaljob submission and execution model as well as theservice-oriented access model as defined by theOpen Grid Services Architecture (OGSA).1 Poweredby key enabling technologies such as Globus’ GridResource Allocation and Management (GRAM),2

the Grid provides a single detail-hiding interface forrequesting and using networked resources to exe-cute jobs that users submit. The widely used job andservice models define an appropriate paradigm forresource sharing and program execution.

However, some applications are operationalrather than functional, making it difficult to mapthem to independent jobs or services. Examples ofthese applications include the fine-grained emula-tion of real-world systems such as airport opera-tions and antiterrorism exercises, each of whichinvolves numerous dynamic and diverse objects,contexts, and object-interaction patterns.

Furthermore, some applications require speciallyconfigured and customized execution environ-ments, including operating systems, network-levelservices, and application-level services, packages,and libraries. For example, many scientific applica-tions require mathematical and communicationlibraries such as basic linear algebra subprogramsand a message-passing interface (MPI), and a Javaapplication might insist on a specific version of Javavirtual machine. It might not always be possible tosatisfy such requirements in a shared infrastructure.Moreover, requirements could conflict with eachother—for example, the applications might requiredifferent versions of the same library.

Finally, it isn’t possible to prevent users fromrunning potentially untrustworthy or malfunction-ing applications. Developers can introduce bugsor vulnerabilities into an application either inad-vertently or deliberately. Well-known applicationssuch as SETI@Home (www.securityfocus.com/bid/7292), for example, have exhibited securityvulnerabilities that could be exploited to launch a malicious attack against any machine on theInternet.

As a result, containing the impact of a networkattack on an application is critical to prevent anycascading effect on other applications or the under-lying shared infrastructure.

We propose a paradigm to accommodate distrib-uted applications that are hard to map to jobs orservice instances, require customized execution andnetwork environments, or require strong contain-ment of security risk and impact.

A middleware system that integrates and elevates virtual machine and virtual network technologies facilitates the creation of virtual distributedenvironments in a shared infrastructure.

Paul RuthXuxian JiangDongyan XuSebastienGoasguenPurdue University

Page 2: Virtual Distributed Environments in a Shared … network technologies facilitates the creation of virtual distributed environments in a shared infrastructure. Paul Ruth Xuxian Jiang

64 Computer

We’ve developed a middleware systemthat integrates and extends virtual machineand virtual network technologies to supportmutually isolated virtual distributed environ-ments in shared infrastructures like the Gridand the PlanetLab overlay infrastructure(www.planet-lab.org).3

VIRTUALIZATION MIDDLEWAREClearly, a shared infrastructure needs to

enable mutually isolated distributed environ-ments to complement the job and service-ori-ented sharing model. Mutually isolated

distributed environments should support

• on-demand creation, • customizability and configurability, • binary compatibility for applications, and • containment of negative impact of malicious

or malfunctioning applications.

Virtualization is emerging as a promising solu-tion. It introduces a level of indirection betweenapplications and the shared infrastructure.Developers have introduced technologies for thevirtualization of machines and networks.

Examples of virtual machine technologies includeVMware (www.vmware.com), User-Mode Linux(UML, http://user-mode-linux.sourceforge.net),and Xen.4 Although their implementations differ,each of these systems offers virtual machines thatachieve binary compatibility and networking capa-bility like real machines. Examples of virtual net-work technologies include VNET5 and VirtualInternetworking on Overlay Infrastructures(Violin),6 which both create virtual IP networks forvirtual machine communications.

Renato Figueiredo and coauthors first proposedapplying virtual machine technology to Grid com-puting.7 They identified major advantages of vir-tual machines for the Grid, including security,isolation, customization, legacy support, adminis-trator privileges, resource control, and site inde-pendence. Their In-VIGO8 and Virtuoso9 projectsare among the first to address Grid resource virtu-alization and heterogeneity masking. With In-VIGO’s VMPlants architecture,10 the system canautomatically create customized virtual machines ina shared execution environment that adhere to thetraditional job submission and execution model.

We’ve taken virtualization to the next level bydeveloping middleware that creates virtual distrib-uted environments in a shared infrastructure suchas the Grid and PlanetLab. Our goal is to provide

customized and consistent runtime and network-ing environments for a wide spectrum of distrib-uted applications.

FROM VIRTUAL MACHINES TO VIRTUALDISTRIBUTED ENVIRONMENTS

In previous work,11 we extended virtual machinetechnologies to create virtual private servers. Theseservers are UML-enabled virtual machines3 re-quested and instantiated on demand, each with aregular IP address and a customized installation ofOS and application services. Enhancing the under-lying host OS guarantees each virtual machine aportion of the physical host’s CPU, memory, andbandwidth.

However, virtual machine technologies onlyachieve isolation between individual virtualmachines. A set of virtual machines does not createan isolated virtual network because the machinesare addressable to and from other Internet hosts.Moreover, creating multiple virtual networks is dif-ficult because all the virtual machines share thesame IP address space, and conflict can occurbetween the virtual networks.

Virtual network technologies such as Violin andVNET solve this problem by creating high-ordervirtual IP networks that offer the following fea-tures:

• Each virtual network has its own IP addressspace. The address spaces of different virtualnetworks can safely overlap.

• Virtual machines in a virtual network aren’tvisible on the Internet, preventing attacks bothfrom the virtual machines to the Internet andfrom the Internet directly to the virtualmachines.

• Within a virtual network, users can specify andenforce selected (rather than complete) con-nectivity between virtual machines based onthe application’s communication pattern.

• A user-specified threshold bounds the trafficvolume on each virtual machine interconnec-tion, preventing ill-behaving virtual machinesfrom generating excessive traffic. Furthermore,it isn’t possible to tamper with enforcement ofthe virtual network topology and traffic vol-ume from inside the virtual machines.

Violin achieves network virtualization by intro-ducing user-level communication indirectionbetween the virtual machines and the underlyinginfrastructure. Virtual network-enabling entities—Violin daemons—are located at this communica-

Virtualization introduces a level

of indirectionbetween

applications and the shared

infrastructure.

Page 3: Virtual Distributed Environments in a Shared … network technologies facilitates the creation of virtual distributed environments in a shared infrastructure. Paul Ruth Xuxian Jiang

May 2005 65

tion-indirection level. These daemons form a user-level overlay network that serves as the underlyingvirtual network traffic carrier.

Inside the virtual network, the virtual machinesuse standard IP services to communicate with eachother. Below the virtual network, the Violin dae-mons emulate these services via application-levelmechanisms such as User Datagram Protocol(UDP) tunneling. An important benefit of thisdesign is that the daemons can emulate networkservices that are not widely deployed in the Internet(for example, IP multicast) and enable them in avirtual network.

In VNET, because network virtualization is notcompletely realized at the user level, host kernel-level devices have to be created to tunnel networktraffic. An advanced feature of VNET is its dynamictopology adaptation capability:9 The virtual net-work’s topology adapts to the virtual machine com-munication pattern that the application running inthe virtual network exhibits, thus dynamicallyimproving the application’s runtime communica-tion performance.

Integrating virtual network and on-demand vir-tual machine creation and customization11 tech-nologies makes virtual distributed environments areality. The Violin-based middleware system inte-grates and enhances such technologies to create vir-tual distributed environments. For simplicity, wecall these environments Violins when there’s noambiguity.

Violins offer the four desirable features of a vir-tual distributed environment:

• on-demand creation of both virtual machinesand the virtual IP network connecting them;

• customization of the virtual network topologyand services, OS services, and application ser-vices, packages, and libraries;

• achieving binary compatibility by creating thesame runtime and network environment underwhich an application was originally developed;

• containment of negative impact by isolatingvirtual network address space, limiting bothvirtual machine resources and inter-virtual-machine traffic rates, and granting users vir-tual administrator privileges valid only withinViolins.

Figure 1 shows a multilayer overview of twoViolins on top of a shared infrastructure. In the bot-tom layer, the physical infrastructure, with hetero-geneous networked resources, spans multiplenetwork domains. In the middle layer, middleware

systems federate the networked resources to forma shared infrastructure. We deployed our virtual-ization middleware at this layer. The top layer con-sists of two mutually isolated Violins, each with itsown network, OS, and application services cus-tomized for the application running in it.

VIOLIN DETAILS Violin realizes the functionalities that connect vir-

tual machines residing on different host machines,allowing communication across a virtual IP net-work without having a presence in the underlyingInternet. Although it’s possible to implementViolin’s features by extending other virtualizationarchitectures, our implementation takes advantageof UML’s user-level execution feature and opensource code.

Standard UML virtual machines are instantiatedas processes in a Linux user’s execution space.Communication outside the host machines occursthrough a virtual network interface, or tap device,residing in the host’s kernel. The UML virtualmachine contains a virtual network interface thatconnects to the host’s tap device, and the host actsas a router, forwarding packets between the virtualmachine and the physical network. A user will needroot-level privileges to safely create the tap deviceand manage the routing table. In addition, the vir-tual machine needs an IP address that is routableon the physical network to access the networkthrough the tap device.

Violin bypasses the need for tap devices and letsvirtual machines exist in an orthogonal IP addressspace that is totally decoupled from the Internet. AViolin-enabled virtual machine contains a virtualnetwork interface that does not connect to the host;instead, it connects to a virtual switch (a Violin dae-mon) running in the host machine’s user space. Thevirtual switch behaves like a physical switch,

Internet

Two mutuallyisolatedViolins

Virtualmachine

Middleware-federated

infrastructure

Physicalinfrastructure

Physicalmachines

Figure 1. Multilayeroverview of two virtual distributedenvironments (Violins) on top of a sharedinfrastructure.

Page 4: Virtual Distributed Environments in a Shared … network technologies facilitates the creation of virtual distributed environments in a shared infrastructure. Paul Ruth Xuxian Jiang

66 Computer

accepting and forwarding virtual layer-2 framesbetween virtual machines.

For efficiency, the virtual switches mimic physi-cal network links by tunneling the network framesvia the lightweight but unreliable UDP. Using trans-port protocols (such as TCP) inside the virtualdomain can achieve end-to-end reliability betweenvirtual machines. By tunneling virtual network traf-fic through user-level connections instead of rout-ing the traffic directly through the underlyingnetwork, Violin creates a virtual network of virtualmachines with only user-level privileges.

EXPERIMENTAL RESULTS We have successfully deployed the Violin-based

middleware system for virtual distributed environ-ments in PlanetLab and in local cluster platforms.

Earlier work demonstrated Violin’s communica-tion performance using PlanetLab-based measure-ments.6 In these studies, we measured both TCPthroughput and Internet Control Message Protocollatency between a pair of virtual machines in aViolin and between the underlying PlanetLab hosts.Our results showed that the virtual machines’ per-formance is constantly within a small degradationfactor (less than 15 percent) of the PlanetLab hosts’performance.

HPL performance comparisonTo compare Violin’s performance with the phys-

ical cluster’s performance, we conducted experi-ments using the High-Performance Linpack (HPL)benchmark. Our experiments’ methodology wasto stress Violin to its limit and find the maximumnumber of floating-point operations per second(flops) it can achieve relative to the flops achievablein the physical cluster without virtualization.

Each host in the cluster consisted of dual 1.2-GHz Athlon processors and 1-Gbyte RAM runningDebian Linux 3.0. A 100-megabyte-per-secondEthernet switch connected the hosts.

We ran HPL in an increasing number of proces-sors in the physical cluster and ran the sameincreasing number of virtual machines in a Violinon top of the same cluster. For a given number ofprocessors or virtual machines, we tuned HPLparameters including problem size, block size, andprocess grid shape to reach its maximum perfor-mance level.

We set the HPL problem-size parameter as themaximum that Violin can handle and used thesame problem size for both the virtual and physi-cal cases. We separately tuned the other parametersfor maximum performance in each case. Becausea cluster node has dual processors, we created twoMPI processes and two virtual machines in eachcluster node in the physical and virtual cases,respectively.

Figure 2 shows the experimental results for run-ning HPL in up to 64 processors or virtualmachines. These results indicate that Violin con-stantly achieves approximately 80 percent of thephysical cluster’s performance. This trend scales toat least 64 virtual machines. Such a performancepenalty is small enough to justify Violin’s practical-ity, and the scaling trend shows promise for its usein larger infrastructures.

0

5

10

15

20

25

30

35

2 4 8 16 32 64

Gflo

ps

ViolinPhysical cluster

Number of processors or virtual machines

Figure 2. Performance comparison. Comparison of results achieved running High-Performance Linpack (HPL) in a physical cluster and in Violin on top of thesame physical cluster nodes shows a 20 percent performance penalty, a scalingtrend that shows promise for using Violins in even larger infrastructures.

Number of Violins/total number of virtual machines

0

1

2

3

4

5

6

7

8

Gflo

ps

1/32 2/64 4/128 8/256 16/512

Figure 3. HPL performance with multiple Violins. Each bar represents the aggregate performance observed during each test, with each bar segment representing one Violin’s performance. Each Violin consists of 32 virtualmachines; thus, in the 16-Violin test, 512 virtual machines concurrently run in 32 cluster nodes.

Page 5: Virtual Distributed Environments in a Shared … network technologies facilitates the creation of virtual distributed environments in a shared infrastructure. Paul Ruth Xuxian Jiang

May 2005 67

Sharing among multiple Violins We also evaluated the performance of multiple

Violins sharing the same set of physical nodes. Inthese studies, we used 32 cluster nodes to createmultiple Violins, each with 32 virtual machines. Weincreased the number of Violins (from 1 to 16) shar-ing the physical nodes and measured each Violin’sperformance. In the 16-Violin test, each clusternode supported 16 virtual machines, limiting eachto 32 Mbytes of memory.

To make the comparison fair, we found the largestHPL problem size that would run with 32 virtualmachines each limited to 32 Mbytes of memory andused the same problem size for all tests in this exper-iment. We tuned other HPL parameters to find eachtest’s highest performance. Our virtual networktechnology lets all running Violins use the same setof IP addresses for their virtual machines withoutany conflict.

As Figure 3 shows, as more virtual machinesshare the same cluster resources, they each receivea decreasing but equal share of the resources, withlittle degradation in their aggregate performance.In fact, the aggregate performance increased for upto eight Violins and slightly decreased for the 16-Violin case. Thus, increasing the number of Violinsincurs very small overhead in terms of loss of aggre-gate performance.

Violin communication-pattern profiling Violin provides a convenient facility for moni-

toring and profiling the communication pattern andtraffic rate among virtual machines. Violin’s all-software user-level implementation allows easyextension for network monitoring capability.

Figure 4 shows the communication profile gen-

erated by running HPL in an eight-virtual machineViolin. This communication pattern is more irreg-ular and complicated than the classic master-work-ers pattern seen in distributed applications such asCondor12 and SETI@home. This suggests that theplacement of virtual machines will significantlyimpact an application’s communication perfor-mance when the underlying infrastructure involvesheterogeneous network connections between phys-ical hosts.

ONGOING WORK To make virtual distributed environments more

self-adaptive, we’re extending our middleware sys-tem by adding intelligence to Violins. Each virtualenvironment will behave like an active organismthat manages itself with minimum user attention.Furthermore, a virtual environment will adapt itselfat runtime by adjusting its resource allocation,scale, and topology driven by the dynamics of theapplication running inside. Virtualization technolo-gies offer great flexibility in attaining these goals.

Figure 5 presents a simplified example illustrat-ing support for highly dynamic applications in anext-generation self-adaptive virtual distributedenvironment. In this example, the application is anagent-based emulation of crowds of people. Each

5 Mbps

5 Mbps7 Mbps

7 Mbps

6 Mbps 5 Mbps3 Mbps

4 Mbps

6 Mbps

6 Mbps6 Mbps0 4 6 3

5172

Figure 4. Communication profile. Running HPL in a Violin with eight virtualmachines generates an irregular communication pattern. The arrows’ directionand thickness indicate the traffic flows and rates between virtual machines.

Scale-up Grow Migrate

Time

Figure 5. Next-generation virtual distributed environment. The application is an agent-based emulation of crowds ofpeople that studies their mobility and interactions in a specific context. The virtual distributed environment adaptsitself to the application’s dynamics.

Page 6: Virtual Distributed Environments in a Shared … network technologies facilitates the creation of virtual distributed environments in a shared infrastructure. Paul Ruth Xuxian Jiang

68 Computer

colored square represents a geographic region withunique environmental characteristics. The emula-tion’s goal is to study the mobility of individuals aswell as their interactions with each other and withthe environment in a specific context such as shop-ping or commuting.

At the beginning of the emulation (the left-mostinstance), the few people present are in two of thefour regions. At this point, the virtual environmentconsists of two virtual machines emulating the tworegions and the people inside. As time elapses, thecrowd in the green region grows, causing a scale-upin the capacity of the virtual machine emulating thegreen region. Later, more people arrive, and thereare people in the brown region. At this point, the vir-tual environment grows by creating a new virtualmachine for the brown region. The last (right-most)instance doesn’t involve a change in the emulatedworld, but instead shows a virtual machine’s migra-tion due to a change in the resource availability of theunderlying physical host. The virtual environmentinitiates and performs all the adaptations.

T he concept of a virtual distributed environ-ment provides a powerful abstraction for theisolation and customization of runtime and

networking systems for different applications run-ning in a shared infrastructure. We propose theintegration and elevation of virtualization technolo-gies to support mutually isolated virtual distributedenvironments in shared infrastructures, such as theGrid and PlanetLab.

Our experimental results using the Violin-basedvirtualization middleware support further researchand wider deployment of virtual distributed envi-ronments as a new paradigm for distributed com-puting. In particular, next-generation Violins willpossess the intelligence for self-provisioning andadaptation, providing more agile and accommo-dating virtual distributed environments for highlydynamic distributed applications. ■

AcknowledgmentThe US National Science Foundation partly

funded this research under grants SCI-0438246 andSCI-0504261.

References 1. I. Foster et al., “The Physiology of the Grid: An Open

Grid Services Architecture for Distributed SystemsIntegration,” Open Grid Service Infrastructure WG,

Global Grid Forum, June 2002; http://www.globus.org/research/papers/ogsa.pdf.

2. I. Foster and C. Kesselman, “Globus: A Toolkit-Based Grid Architecture,” The Grid: Blueprint for aNew Computing Infrastructure, Morgan Kaufmann,1999, pp. 259-278.

3. L. Peterson et al., “A Blueprint for Introducing Dis-ruptive Technology into the Internet,” Proc. ACMWorkshop on Hot Topics in Networking (HotNets-I),ACM Press,2003, pp. 59-64.

4. P. Barham et al., “Xen and the Art of Virtualization,”Proc. ACM Symp. Operating Systems Principles(SOSP), ACM Press, 2003, pp. 164-177.

5. A. Sundararaj and P. Dinda, “Toward Virtual Net-works for Virtual Machine Grid Computing,” Proc.3rd Usenix Virtual Machine Technology Symp.,Usenix, 2004, pp. 177-190.

6. X. Jiang and D. Xu, “Violin: Virtual Internetwork-ing on Overlay Infrastructure,” Proc. 2nd Int’l Symp.Parallel and Distributed Processing and Applications,LNCS 3358, Springer-Verlag, 2004, pp. 937-946.

7. R. Figueiredo, P. Dinda, and J. Fortes, “A Case forGrid Computing on Virtual Machines,” Proc. 23rdInt’l Conf. Distributed Computing Systems (ICDCS),IEEE CS Press, 2003, pp. 550-559.

8. S. Adaballa et al., “From Virtualized Resources toVirtualized Computing Grid: The In-VIGO System,”J. Future-Generation Computing System, to appear.

9. A. Sundararaj, A. Gupta, and P. Dinda, “DynamicTopology Adaptation of Virtual Networks of VirtualMachines,” Proc. 7th Workshop Languages, Compil-ers, and Runtime Support for Scalable Systems, 2004,http://www.tlc2.uh.edu/lcr2004/Final_Proceedings/Sundararaj.pdf.

10. I. Krsul et al., “VMPlants: Providing and ManagingVirtual Machine Execution Environments for GridComputing,” Proc. IEEE/ACM Supercomputing,IEEE CS Press, 2004, p. 7.

11. X. Jiang and D. Xu, “SODA: A Service-on-DemandArchitecture for Application Service Hosting UtilityPlatforms,” Proc. 12th IEEE Int’l Symp. High-Per-formance Distributed Computing (HPDC-12), IEEECS Press, 2003, pp. 174-183.

12. D.H.J Epema et al., “A Worldwide Flock of Condors:Load-Sharing among Workstation Clusters,” J.Future-Generation Computer Systems, vol. 12, no.1, 1996, pp. 53-65.

Paul Ruth is a PhD candidate in the Department ofComputer Science at Purdue University and amember of the Laboratory for Research in Emerg-ing Network and Distributed Services (Friends).His research interests include virtualization, high-

Page 7: Virtual Distributed Environments in a Shared … network technologies facilitates the creation of virtual distributed environments in a shared infrastructure. Paul Ruth Xuxian Jiang

May 2005 69

performance computing, and Grid computing.Ruth received a BS in computer science and math-ematics from Baker University, Kansas. He is amember of the IEEE. Contact him at [email protected].

Xuxian Jiang is a PhD candidate in the Depart-ment of Computer Science at Purdue Universityand a member of the Laboratory for Research in Emerging Network and Distributed Services(Friends). His research interest is in virtualization-based system security and distributed computing.Jiang received an MS in computer science fromXi’an Jiaotong University, China. He is a memberof the IEEE, the ACM, and Usenix. Contact him [email protected].

Dongyan Xu is an assistant professor in the Depart-ment of Computer Science at Purdue University

and director of the Laboratory for Research inEmerging Network and Distributed Services(Friends). His research interest is in virtualization-based distributed computing and system security.Xu received a PhD in computer science from theUniversity of Illinois at Urbana-Champaign. He isa member of the IEEE, the ACM, and Usenix. Con-tact him at [email protected].

Sebastien Goasguen is a research scientist anddirector of the Grid and Distributed ComputingGroup within the Department of InformationTechnology at Purdue University. His researchinterests include applied computational science,parallel programming, virtual machines, and Gridcomputing. Goasguen received a PhD in electricalengineering from Arizona State University. He is amember of the IEEE. Contact him at [email protected].