14
Future Generation Computer Systems xxx (2004) xxx–xxx From virtualized resources to virtual computing grids: the In-VIGO system Sumalatha Adabala, Vineet Chadha, Puneet Chawla, Renato Figueiredo, José Fortes , Ivan Krsul, Andrea Matsunaga, Mauricio Tsugawa, Jian Zhang, Ming Zhao, Liping Zhu, Xiaomin Zhu Advanced Computing and Information Systems (ACIS) Laboratory, University of Florida, Gainesville, FL 32611-6200, USA Received 25 September 2003; received in revised form 11 November 2003 Abstract This paper describes the architecture of the first implementation of the In-VIGO grid-computing system. The architecture is designed to support computational tools for engineering and science research In Virtual Information Grid Organizations (as opposed to in vivo or in vitro experimental research). A novel aspect of In-VIGO is the extensive use of virtualization technol- ogy, emerging standards for grid-computing and other Internet middleware. In the context of In-VIGO, virtualization denotes the ability of resources to support multiplexing, manifolding and polymorphism (i.e. to simultaneously appear as multiple resources with possibly different functionalities). Virtualization technologies are available or emerging for all the resources needed to construct virtual grids which would ideally inherit the above mentioned properties. In particular, these technolo- gies enable the creation of dynamic pools of virtual resources that can be aggregated on-demand for application-specific user-specific grid-computing. This change in paradigm from building grids out of physical resources to constructing virtual grids has many advantages but also requires new thinking on how to architect, manage and optimize the necessary middleware. This paper reviews the motivation for In-VIGO approach, discusses the technologies used, describes an early architecture for In-VIGO that represents a first step towards the end goal of building virtual information grids, and reports on first experiences with the In-VIGO software under development. © 2004 Elsevier B.V. All rights reserved. Keywords: Virtual machines; Grid-computing; Middleware; Virtual data; Network computing 1. Introduction Information “grids” promise to revolutionize sci- entific computing and information processing by en- abling access to unprecedented computing power and functionality of diverse interconnected computers, geographically distributed software applications and other network-accessible resources. Fundamentally, Corresponding author. E-mail address: [email protected] (J. Fortes). the goal of a grid infrastructure is to provide “flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources” [1]. Traditional approaches to grid-computing have focused on middleware-driven management of com- puters, storage, networks and other resources through sharing mechanisms that have evolved from centrally administered domains—multi-user operating systems, user accounts and file systems [2]. The middleware of these approaches interact with physical resources at the same level as local users and applications do. 0167-739X/$ – see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.future.2003.12.021

From virtualized resources to virtual computing grids: the In-VIGO

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: From virtualized resources to virtual computing grids: the In-VIGO

Future Generation Computer Systems xxx (2004) xxx–xxx

From virtualized resources to virtual computinggrids: the In-VIGO system

Sumalatha Adabala, Vineet Chadha, Puneet Chawla, Renato Figueiredo,José Fortes∗, Ivan Krsul, Andrea Matsunaga, Mauricio Tsugawa,

Jian Zhang, Ming Zhao, Liping Zhu, Xiaomin ZhuAdvanced Computing and Information Systems (ACIS) Laboratory, University of Florida, Gainesville, FL 32611-6200, USA

Received 25 September 2003; received in revised form 11 November 2003

Abstract

This paper describes the architecture of the first implementation of the In-VIGO grid-computing system. The architectureis designed to support computational tools for engineering and science research In Virtual Information Grid Organizations (asopposed to in vivo or in vitro experimental research). A novel aspect of In-VIGO is the extensive use of virtualization technol-ogy, emerging standards for grid-computing and other Internet middleware. In the context of In-VIGO, virtualization denotesthe ability of resources to support multiplexing, manifolding and polymorphism (i.e. to simultaneously appear as multipleresources with possibly different functionalities). Virtualization technologies are available or emerging for all the resourcesneeded to construct virtual grids which would ideally inherit the above mentioned properties. In particular, these technolo-gies enable the creation of dynamic pools of virtual resources that can be aggregated on-demand for application-specificuser-specific grid-computing. This change in paradigm from building grids out of physical resources to constructing virtualgrids has many advantages but also requires new thinking on how to architect, manage and optimize the necessary middleware.This paper reviews the motivation for In-VIGO approach, discusses the technologies used, describes an early architecture forIn-VIGO that represents a first step towards the end goal of building virtual information grids, and reports on first experienceswith the In-VIGO software under development.© 2004 Elsevier B.V. All rights reserved.

Keywords:Virtual machines; Grid-computing; Middleware; Virtual data; Network computing

1. Introduction

Information “grids” promise to revolutionize sci-entific computing and information processing by en-abling access to unprecedented computing power andfunctionality of diverse interconnected computers,geographically distributed software applications andother network-accessible resources. Fundamentally,

∗ Corresponding author.E-mail address:[email protected] (J. Fortes).

the goal of a grid infrastructure is to provide “flexible,secure, coordinated resource sharing among dynamiccollections of individuals, institutions, and resources”[1]. Traditional approaches to grid-computing havefocused on middleware-driven management of com-puters, storage, networks and other resources throughsharing mechanisms that have evolved from centrallyadministered domains—multi-user operating systems,user accounts and file systems[2]. The middlewareof these approaches interact with physical resourcesat the same level as local users and applications do.

0167-739X/$ – see front matter © 2004 Elsevier B.V. All rights reserved.doi:10.1016/j.future.2003.12.021

Page 2: From virtualized resources to virtual computing grids: the In-VIGO

2 S. Adabala et al. / Future Generation Computer Systems xxx (2004) xxx–xxx

Therefore, such approaches must deal with the com-plexity of decoupling local administration policiesand configuration idiosyncrasies of distributed re-sources from the quality of service expected from endusers with respect to security, flexibility and perfor-mance. This complex decoupling can be simplified, oreven eliminated, by fundamentally changing the waygrid-computing is performed. The change advocatedby this paper and the In-VIGO approach consistsof raising the level of abstraction at which the mid-dleware enables resource sharing from physical tovirtualized resources.

Virtualization technologies encompass a varietyof mechanisms and techniques used to decouple thearchitecture and user-perceived behavior of hardwareand software resources from their physical implemen-tation. The decoupling provided by virtualization en-ables functions and capabilities that are desired fromcomputer systems, including consolidation of physicalresources, security and isolation, flexibility and easeof management. These and other characteristics havemotivated the use of virtualization technologies indifferent areas of computer system design. Examplesinclude:

• Virtual memory[3], which allows multiple virtualaddress spaces to share a single physical memory,and provides to software the view of a memorylarger than its physical implementation.

• Classical virtual machines[4] (e.g. VMware, IBMz800), which enable the sharing of a single physicalcomputer (CPU, memory, I/O devices) by multiplevirtual computers, each independently configuredwith their own operating system and applications.

• Application virtual machines and associated lan-guages (e.g. Java and C#), which allow virtual ma-chines to execute programs in physical machineswith different instruction set architectures.

• Virtual private networks[5], which provide secureprivate networks over a shared public network.

Virtualization is the basis of the In-VIGO architec-ture. As a distributed computing system that includesprocessing nodes, storage devices, networks, softwareapplications, and user interfaces, In-VIGO uses virtu-alization technologies in different forms, which havethree fundamental capabilities in common. Thesecapabilities are: polymorphism, manifolding and mul-tiplexing. They allow resources to simultaneously

Virtual Machine V2

Polymorphism Multiplexing

Virtual Machine V3Virtual Machine V1

Server “host” P

Manifolding

Fig. 1. A virtual machine monitor allows a server P to be sharedby three distinct virtual machines.

(via multiplexing) appear as multiple resources (viamanifolding) with possibly different functionalities(via polymorphism). These properties are illustratedin Fig. 1, where processor virtualization technologyis used to enable a single physical computer to si-multaneously appear as three separate machines withdifferent operating systems and applications.

This paper is organized as follows.Sections 2and 3introduce the In-VIGO approach and core tech-niques to support virtualized grid resources, data, andinterfaces. A detailed description of the In-VIGO ar-chitecture appears inSection 4. Section 5discussesthe current implementation of In-VIGO. Conclusionsfrom the design, development and initial experienceswith In-VIGO appear inSection 6.

2. The In-VIGO concept

In-VIGO provides a distributed environment wheremultiple application instances can coexist in virtualor physical resources, such that clients are unawareof the complexities inherent to grid-computing. TheIn-VIGO project leverages experience with early sys-tems and standards for grid-computing[1,6,7], and

Page 3: From virtualized resources to virtual computing grids: the In-VIGO

S. Adabala et al. / Future Generation Computer Systems xxx (2004) xxx–xxx 3

XULUIML Virtual interfaces

Virtual

networksNFS SQL

Virtual

dataC#

NFS Virtual

applications

VMware IBMz800 machines

Virtual

JINI.netGlobus

XML

ServicesServices

Virtual information gridsOGSA

HTTP

N nohubOtherpor

UDDI SOAP

WSDL

a

SigmicroNetcare

Services Services

Virtual computing grids

Networks

TCP/IPUDP

DataAMachines

Fig. 2. The In-VIGO approach.

makes extensive use of virtualization technology forthe creation of dynamic pools of virtual resources thatcan be aggregated on-demand.

The In-VIGO approach, as depicted inFig. 2, addsthree layers of virtualization to the traditional grid-computing model. The first virtualization layer createspools of virtual resources that are the “primitive” com-ponents of a virtual computing grid, namely virtualmachines, virtual data, virtual applications and virtualnetworks. This layer decouples the process of allocat-ing applications to resources from that of managingjobs across administrative domains, physical machinesand local software configurations. In other words, jobsare mapped to virtual resources (e.g. RedHat Linux 7.0x86 virtual machines), and only virtual resources getmanaged across domains and physical environments(e.g. WinXP and Win2000 physical systems at differ-ent locations).

In the second layer, grid applications are instanti-ated as services which can be connected as needed tocreate virtual information grids. This layer decouplesthe process of using and composing services from thatof managing the execution of the underlying grid ap-plications. Different grid-computing mechanisms (e.g.Globus, Condor-G, .NET and JXTA) can be used torun applications but their implementation details arelargely hidden once the grid-enabled applications areencapsulated and composed as services[8] (e.g. usingOGSI, OGSI.NET and Jini).

In the third layer, aggregated services (possibly pre-sented to users via portals) export interfaces that arevirtualized in order to enable displaying by differ-ent access devices. This layer decouples the processof generating interfaces of services (e.g. XML andUIML) from the process of rendering them on specificdevices (e.g. HTML for laptop, WHML for a palmtopand WAP WML for a cell phone).

The currently deployed In-VIGO system1 imple-ments the first layer of virtualization (with exceptionof virtual networks)—users can develop applicationsor use one of several grid-enabled tools with interac-tive as well as batch-oriented interfaces. The authorsare currently developing research prototypes for theother virtualization layers. The focus of this paper ison the virtual computing grid layer (i.e. the first virtu-alization layer discussed above) as it is implementedin the current In-VIGO implementation.

3. Virtualization in In-VIGO

The In-VIGO architecture is built upon componentsthat allow for the virtualization of grid resources anduser interfaces. A distributed virtual file system fa-cilitates data transfer across grid resources; virtualmachines provide isolation, resource integrity, legacysoftware support, environment encapsulation and cus-tomization; virtual applications allow modification andextension of individual tool behavior as well as the useof a collection of tools as a single unified application;virtual networks allow the configuration of a networkthat meets the requirements of a specific grid applica-tion or user; and virtual interfaces allow applicationsto conform to changing environments, allowing uni-versal access to a long-lived application session.

3.1. Virtual data and the virtual file system

Virtualization techniques can be applied to facili-tate the transfer of data across grid resources. The griddistributed virtual file system forms the basic frame-work for the transfer of data necessary to In-VIGOapplications. It relies on a virtualization layer builton existing Network File System (NFS) components,

1 Accessible athttp://invigo.acis.ufl.edu. Courtesy accounts areavailable.

Page 4: From virtualized resources to virtual computing grids: the In-VIGO

4 S. Adabala et al. / Future Generation Computer Systems xxx (2004) xxx–xxx

and is implemented at the level of Remote ProcedureCalls (RPC) by means of middleware-controlled filesystem proxies[9]. A virtual file system proxy in-tercepts RPC calls from an NFS client and forwardsthem to an NFS server, possibly modifying argumentsand returning values in the process. Through the useof proxies, the virtual file system currently supportsmultiplexing, manifolding and polymorphism by (1)sharing of a single file account across multiple users,(2) allowing multiple per-user virtual file systemsessions in a single server, and (3) mapping of userand group identities to allow for cross-domain NFSauthentication[10]. Proxy-level transformations canalso be used to support data virtualization, enablingpolymorphism and manifolding of file contents; al-though this capability is not currently exploited inIn-VIGO, it is the subject of ongoing research. Po-tential applications of data content virtualizationinclude translation of file formats, language transla-tion, summarization and reduction of data, or otherdata transformations invoked at data-access time(Fig. 3).

3.2. Virtual machines

A “classic” virtual machine (VM) is an efficient,isolated duplicate of the underlying physical ma-chine [4]. Unlike virtual machines associated witha particular programming language (e.g. the JavaVM), the layer of indirection in which “classic”VMs operate is that of the instruction set architecture(ISA) of the physical machine. A virtual machinemonitor (VMM) is responsible for providing the per-

Fig. 3. Proxy transformations can be used to support data virtualization, enabling polymorphism and manifolding of file contents.

ception of multiple, isolated machines that share asingle physical resource by intercepting and emu-lating the behavior of privileged ISA instructions.Classic VMs support manifolding (multiple VMsper physical resource), multiplexing (time-sharingof CPU, space-sharing of memory and disk amongseveral VMs), and polymorphism (with respect tovirtual hardware configuration, operating systems andapplications).

The virtualization properties of classic VMs are at-tractive in a grid-computing environment for severalreasons, including user isolation, resource integrity,legacy software support, environment encapsulationand customization[11]. The classic virtual machinemonitors implemented by VMware[12] and IBMzVM are currently used in In-VIGO.

3.3. Virtual applications

A virtual application can appear differently fromthe actual application or applications being used, andcan be replicated so that multiple executions can berun simultaneously over the same physical appli-cation. The implementation of virtual applicationsshares analogies with the approaches used to imple-ment both virtual data and virtual machines. Like vir-tual data, virtual applications use a virtual file systemto support manifolding and multiplexing of applica-tion codes. Like virtual machines, virtual applicationsuse a monitor to modify the semantics and capabil-ities presented to application users, thus supportingpolymorphism. In addition, virtual applications canconsist of multiple applications, either as multiple

Page 5: From virtualized resources to virtual computing grids: the In-VIGO

S. Adabala et al. / Future Generation Computer Systems xxx (2004) xxx–xxx 5

instances of the same application or as compositionsof distinct applications. Since different applicationsencapsulated in a virtual application may requiredistinct hardware and software resources, the virtualapplication monitor must interact with grid middle-ware in order to locate, reserve, allocate and aggregateall those resources. In the context of In-VIGO, virtualapplications are exposed to the users; the term “tools”is used to refer to the actual applications on whichvirtual applications are based.

3.4. Virtual networks

In the context of In-VIGO, virtual networks are log-ical networks that are decoupled from physical (andother virtual) networks, per user and per application.The goal is to enable multiple virtual applications toshare a physical network, each having the percep-tion that it uses a private, dedicated network whosefunctionality and quality of service match the appli-cation needs and user access rights. Partial implemen-tations of these virtual networks are already possiblein the form of virtual private networks[5] and tun-neling [13]. Extensive ongoing research on overlaynetworks[14–16] is developing technologies that of-fer promise of enabling not only private and securecommunication but also necessary quality of serviceand customized functionality. Quoting[14], an overlaynetwork is “a configuration within which a base net-work is used to support some second network, layeredupon the underlying infrastructure”. According to thesame reference, each virtual overlay network has aglobal unique identifier, some set of access points anda set of strong guarantees. Overlay networks can pro-vide a basis for multiplexing, manifolding and poly-morphism as required by virtual networks. However,further research is needed to understand what kinds offunctionality to support, and how to efficiently builddynamic instances of such overlay networks. There areother published definitions of virtual networks that arenot being considered for implementation in In-VIGObut are worth mentioning. They include networks builtupon a physical network and partitioned for use byindividuals; logically and/or physically allocated sub-sets of network resources; collections of logical sub-sets of physical network resources; and simulations ofone or more networks on top of a physical network[17–19].

3.5. Virtual user interfaces

The notion of virtual user interfaces is a particu-lar case of data virtualization. It leverages ongoingefforts to universalize user interfaces[20–22] whereapplications use an abstract representation of an in-terface to communicate with a device-independentserver that renders the user interface to conform to theabilities of the device. However, unlike HTML, XMLor UIML, virtualization of a user interface also allowsthe appearance of the application to differ accordingto the context on which an application is deployed(polymorphism) and the access to the application us-ing a variety of devices and device types in a singlesession (manifolding and multiplexing). An exampleis an application session where a user starts the ap-plication on a workstation, later queries the state ofthe application using a PDA or a cellular phone andfinally retrieves the application results using a laptopor a public Internet machine.

4. The architecture of In-VIGO

Given that the objective of In-VIGO is to sup-port computational tools for engineering and scienceresearch, the component that is most evident is theVirtual Application (VAP). The In-VIGO VAP is anaggregation of actual tools into a logical applicationsession that allows users to perform session operationssuch as setting up execution parameters, importingdata files, executing tools and retrieving experimentresults. As shown inFig. 4, users do not interact di-rectly with the VAP, but rather by means of a separatecomponent called the User Interface Manager (UIM).The UIM has two primary objectives: provide the vir-tual application with mechanisms to interact with theuser, and provide the user with a virtual interface thatoffers a disconnected session that can be accessedusing a variety of devices.

Virtual applications, and hence their designers, donot concern themselves with the specifics of resourcemanagement, reservation or allocation. A separateresource management component provides the APIsand services needed to allocate and access all the re-sources needed to execute the desired tools with thedesired quality of service. Similarly, virtual applica-tions and the UIM are not concerned with user infor-

Page 6: From virtualized resources to virtual computing grids: the In-VIGO

6 S. Adabala et al. / Future Generation Computer Systems xxx (2004) xxx–xxx

User Interface

Manager

User

Management

System

Global

Information

System

Database

System

Resource

Manager

Virtual

Application

Virtual

Application

Rule

User’s

Actions

Application

Status

Authentication

Virtual Application

Control

Application Control

Virtual Application

Instantiation

Job Contol

Job Control

Virtual File

System

[Uses]

Job Contol

[Uses]

[Uses]

Virtual

Application

Job

Fig. 4. High-level architecture of the In-VIGO system. The primary components shown in this figure are the virtual application, the virtualfile system, the resource manager, the user interface manager, the global Information System, and the user manager.

mation, privileges, authentication or session manage-ment. These are managed by the User ManagementSystem. Finally, the Resource Manager and UserManagement System components use the facilitiesof the Global Information System, which maintainsthe necessary information in a relational DatabaseSystem.

Virtual applications access user data through agrid-wide virtual file system that is set up and con-trolled by the Resource Manager, on behalf of a user.Through this mechanism, applications are presentedwith the interface of a conventional NFS-based dis-tributed file system—no application-level modifica-tions are required to access user data.

4.1. The virtual application

The virtual application in In-VIGO interacts withthe User Interface Manager using a special discon-nected secure XML-based protocol called VirtualApplication Control Protocol. Users do not interactdirectly with the Virtual Application, but rather with

the User Interface Manager. The Virtual Applicationis reactive in that it never attempts to contact the UIM(and, hence, the user) unless it is contacted first. Ituses a sequence number and a shared password tocommunicate with the UIM using a network connec-tion that is closed and discarded once the messageis processed. Hence, a session in In-VIGO is discon-nected, requiring that every action/reply be processedusing a different network connection. Disconnectedsessions have several desirable properties:

1. Sessions can be long-lived and, so long as the vir-tual application does not crash, can survive networkor server outages. Sessions even survive a crash onthe User Interface Manager, since it stores the ses-sion required state on permanent storage.

2. Virtual Application sessions are browser-indepen-dent and hence it is possible for a user to interfacewith a session from different devices. For exam-ple, a user could initiate a transaction on a desktopcomputer, monitor the transaction status on a PDAor a cell phone, and finally recover the results using

Page 7: From virtualized resources to virtual computing grids: the In-VIGO

S. Adabala et al. / Future Generation Computer Systems xxx (2004) xxx–xxx 7

Fig. 5. The architecture of the virtual application. Dotted red arrows represent actions performed at application installation time. Solidarrows indicate actions performed by the system at run-time. The numbers represent the sequence of events from the installation of anapplication, to the execution of a tool, to the retrieval of execution results.

a laptop or from a public computer of an InternetCafe.

There is a Virtual Application instance for everytool, and for every tool session. As shown inFig. 5,the actual behavior of the application is determinedat runtime by an application configuration file thatdescribes how the application interacts with the user,and a set of Java classes that implement the actualapplication behavior (e.g. resource allocation andprocess execution). Hence, the installation of an ap-plication in In-VIGO entails: (1) the installation ofa tool in grid-enabled resources, (2) the creation ofan XML application configuration file, and (3) thecreation of a series of Java classes to control the ap-plication behavior. The latter are referred to as “rules”in In-VIGO. The process of creating the configurationfile and Java-based rules can be automated, especiallyfor applications with a native graphical user interface(which is presented to the user through a VNC-based

virtual display) as well as batch applications thatexpect command-line arguments.

In-VIGO rules represent the control and resourceview of the virtual application in In-VIGO. That is,they specify the resource requirements (operatingsystem, architecture), prerequisites of the execution(directory structure, files) and the execution logiccorresponding to the user’s input. Using the ResourceManager, rules find the proper resources for a jobrequest and execute it using grid protocols.

The Virtual File System, as discussed later, pro-vides users, virtual applications and rules a unified andgrid-wide file system. Rules cannot execute on a filesystem that can potentially change in the middle of atool execution, or the execution of a collection of toolsthat are perceived as a single action by the user, andhence the Resource Manager provides a running rulewith afile fork, a copy of the directory where the toolswill be run, and that is presented to the user once theuser-perceived action is finished.

Page 8: From virtualized resources to virtual computing grids: the In-VIGO

8 S. Adabala et al. / Future Generation Computer Systems xxx (2004) xxx–xxx

This model has been designed to support the seam-less implementation of parameter sweeps in In-VIGO.Virtual application executions that support parametricranges are referred to asranging applications. Sup-port for ranging relies on the Virtual Application togenerate all the values in the range, unfold the val-ues into the parameter and executes the rules as manytimes as the number of values in the range. Hence,multiple executions are performed over the same datafiles but with different parameters, and grid-wide re-sources can be exploited concurrently. Note that thetools themselves are not aware that parameter sweepsare being performed and hence it is possible to rangeparallel executions for tools that traditionally couldonly be run sequentially.

4.2. The virtual file system

A key challenge that must be addressed by grid mid-dleware is the provisioning of data to applications, inthe absence of a centralized administrative domain. Itis desirable that such a data provisioning mechanismbe fast, secure, scalable and interoperable with pro-tocols and application programming interfaces (APIs)that are supported by typical grid nodes.

Data management in In-VIGO is provided by avirtualization layer built on top of NFS. The resultinggrid virtual file system allows dynamic creation anddestruction of file system sessions on a per-user orper-application basis. Such sessions allow on-demanddata transfers, and present to users and applicationsthe API of a widely used distributed network filesystem across nodes of a computational grid. Thevirtualization layer is implemented via proxies thatintercept, modify and forward NFS calls, without re-quiring modifications to existing clients and servers[9].

The dynamic setup of an In-VIGO virtual filesystem (VFS) session involves the execution of a se-quence of tasks under coordination of the ResourceManager. A virtual file system session is establishedfor a logical user account that is dynamically allocatedon behalf of a user, allowing a client-side “shadow” ac-count to access data from a server-side “file” accountthrough an NFS-mounted share[10]. The session setupbegins with the execution of server-side proxies forthe NFS and MOUNT protocol via a grid-aware jobsubmission mechanism (e.g. SSH or Globus). Once

a virtual file system proxy is running, the dynamicsession is established by a client-side mount opera-tion, also initiated by the resource manager via a jobsubmission mechanism. Logically, the In-VIGO gridmiddleware acts as a system administrator to managedynamic client–server data sessions on a per-userbasis.

4.3. The resource manager

Grid resources—CPUs, memory, disks, network,computers, clusters, grids—which may be distributedgeographically and across administrative domains,are managed by a wide variety of resource manage-ment systems. Examples are: operating systems (e.g.Linux with its process scheduler and file systems) forsingle-CPU/SMP machines, batch/job/queue manage-ment systems (e.g. PBS, Sun Grid Engine, Condor)and distributed file systems (e.g. NFS, AFS, PVFS)for clusters of machines, job management systemsfor peer-to-peer clusters (e.g. Enfuzion) and gridresource managers (e.g. Globus GRAM, Storage Re-source Management for grid data). In-VIGO interactswith these resource managers for two purposes: (a)for running In-VIGO middleware jobs to create vir-tual resources, and (b) for running In-VIGO user jobson physical and virtual resources. The core servicesprovided by these resource managers are very similar,however their programming interfaces differ. The re-source manager in In-VIGO provides a uniform andsimple API to access all these resources.

The resource management functionality in In-VIGOis implemented by providing support for the followingphases of running a job:

1. Determine the resource specification for the job(s)based on user/application authorization, applica-tion requirements and explicit resource specifica-tions.

2. Select/create/reserve a resource to run the job,based on the current available resources.

3. Run the job, i.e. run any pre-job tasks (e.g. stagingin files); submit job; monitor progress of job; findout that the job is done; run any post-job tasks (e.g.staging out files); and release resources.

The resource specification and selection API isbased on the Globus RSL language, to specify jobrequirements, and the Condor ClassAd language, to

Page 9: From virtualized resources to virtual computing grids: the In-VIGO

S. Adabala et al. / Future Generation Computer Systems xxx (2004) xxx–xxx 9

describe resources and their constraints. The currentimplementation of this API supports few resourcetypes and very simple selection criteria (round-robinselection of unused resources). The job managementAPIs provided by different distributed resource man-agers (e.g. Globus) are encapsulated to conform to anabstract job submission and control API. This enablesuniform management of jobs using simple API call-ing sequences, independent of the type of resource,and the type of job, i.e. instantiated by the In-VIGOuser or the In-VIGO middleware.

4.4. The user interface manager

The front-end for In-VIGO, or User Interface Man-ager, is a web layer that uses Java Servlets to processrequests and construct responses. The User InterfaceManager captures the actions of the end user and sendsappropriate requests to the Virtual Application Man-ager using the VACP protocol. The UIM initiates arequest in response of a user action and the VAP re-sponds with an appropriate reply in XML form. Thisreply message is then parsed and converted into dif-ferent markup representations depending upon the enddevice. Presently we only support W3C HTML 4.01compliant devices.

In-VIGO requires the User Interface Manager toeasily manage dynamic content. An In-VIGO sessionrequires the web layer to ensure the persistence of thecontent of the pages to be passed between differentcomponents. The MVC[23] (model–view–controller)design pattern is used to provide flexibility for thedesign. To provide a higher layer of abstraction formanaging a large number of concurrent requests thecontent of the user screen is represented as compo-nents: the user screen is logically divided into differ-ent parts, which are represented as Java componentsat the UIM layer. The various components persist theview information for a specific client. This includesthe location of the component on the screen, its con-tent and logic to change its view for different requests.All the logical representations are handled by a sin-gle dispatcher servlet which acts as the controller. Theactual content is generated at the VAP which acts asthe model. By dividing the functionality we decou-ple the application logic, content presentation and userinteraction. This model facilitates the dynamic gener-ation of content and provides flexibility in terms of

easy transcoding, caching of content and renderingthe view.

Fig. 6shows a sample In-VIGO screen rendered ona Windows Internet Explorer workstation. The screenshows a step of the execution of the Dinero applica-tion, where execution parameters are being defined bythe user. The labels highlight the screen components,each representing a different aspect of the applicationexecution.

4.5. The global information system

Similar to the Grid Information Service (GIS)[24],the goal of the In-VIGO Information System is toprovide a uniform interface (framework) to allowinformation flow between various In-VIGO compo-nents, hiding the complexity of dealing with a varietyof information access interfaces. When a user inter-acts with the In-VIGO web interface (e.g. starts ajob), a variety of events take place in the grid throughdifferent layers/modules/services of In-VIGO. Eachof these components requires access to different setsof information. The Information System in In-VIGOmaintains the information about entities that partic-ipate in a computational grid: users, access controllists, computational resources (e.g. hosts, shadow ac-counts, file accounts, virtual machines) and services(e.g. user sessions, VFS sessions, applications).

The current In-VIGO Information System is basedon the centralized relational database server MySQLto store and manage the necessary information but itcan be extended to support other types of informationproviders (e.g. Lightweight Directory Access Protocolserver, Condor ClassAd Catalog server), as shown inFig. 7. The relational data model approach is attrac-tive to In-VIGO because the majority of the informa-tion is updated on-demand or on-failure, and requiresACID (atomicity, consistency, isolation, durability)properties.

4.6. The user manager

The In-VIGO user management layer deals withuser authentication and access control. The infor-mation to authenticate a user can be stored in manyways: Unix system files (/etc/passwd etc.), an ex-ternal LDAP database, an external SQL database orother data storage. The API for each authentication

Page 10: From virtualized resources to virtual computing grids: the In-VIGO

10 S. Adabala et al. / Future Generation Computer Systems xxx (2004) xxx–xxx

Fig. 6. Sample In-VIGO screen showing the execution of the Dinero simulator and the screen components.

mechanism is different in each case, however the in-formation needed is usually a user/password pair. TheUser Management layer hides the variety of authen-tication mechanisms, providing a uniform and simpleAPI to perform authentication for all In-VIGO compo-

User

Manager

Resource

Manager

Application

Manager

In-VIGO Information System

MySQL

DBMSLDAP classads

User

Manager

Resource

Manager

Application

Manager

In-VIGO Information System

MySQL

DBMSLDAP classads

Fig. 7. The In-VIGO information system is based on a central-ized MySQL database server but can be extended to support anyinformation provider.

nents. The access control is achieved by using a pow-erful and flexible mechanism in which applications areable to specify access constraints by assigning actionrules to different permission groups, while the user isable to constraint his capability by using different userclasses.

If the services are provided through a stateless pro-tocol such as the HTTP, the layer on the top is requiredto be stateful to keep track of user actions. The meth-ods for user session tracking commonly used whenworking with HTTP are: basic authentication, cook-ies, and URL rewriting. Although the User Manager(UM) provides an interface that supports all of thesemethods, the last one is used in In-VIGO to avoidcookies or the transmission of the user password ineach interaction. Another desired feature of web ser-vices provider is the capability of maintaining multi-ple states for each user. In In-VIGO this is achieved bydecoupling the browser identification and the session

Page 11: From virtualized resources to virtual computing grids: the In-VIGO

S. Adabala et al. / Future Generation Computer Systems xxx (2004) xxx–xxx 11

identification, which is maintained in the InformationSystem.

5. Implementation

In-VIGO was implemented at the ACIS Laboratoryand is at the time of this writing in its first beta ver-sion release. It was implemented using Java and JavaServlets for the core of the system, Perl for operat-ing system-specific operations and C for the VirtualFile System. In-VIGO leverages on existing standardsand open source technologies wherever possible, andin particular uses XML for all configuration files andprotocols, Condor ClassAds for resource description,Globus and SSH for resource invocation and SQLservers as data repositories.

In-VIGO has been configured to run a variety ofscientific and test applications, including:

• Dinero (a cache simulator for memory referencetraces supporting multi-level caches and dissimilarI and D caches).

• CNT-IV (an application that calculates theC–V andI–V characteristics of ideal, ballistic carbon nan-otube field-effect transistors).

• CoWord (a CoWord analysis tool), DLX View (aninteractive pipelined computer simulator using theDLX instruction set).

• GAMESS (the General Atomic and Molecular Elec-tronic Structure System—a general ab initio quan-tum chemistry package).

• LSS (Light Scattering Spectroscopy—used to esti-mate the diameter, diameter deviation and refractiveindex of pixels in an image).

• MolCToy (a collection of simple theoretical mod-els of the conduction through individual moleculesbetween two contacts).

• Nanomos (a self-consistent 2D-simulator for thinbody, fully depleted, double gated, n-MOSFETs).

• Simple Scalar (a set of fast, flexible and accurateuni-processor simulators based on a close derivativeof the MIPS architecture).

• CACTI (an integrated cache access time, cycle time,area, aspect ratio, and power model).

• Huckel-IV (a simple quantitative method which cal-culates current–voltage characteristics and conduc-tance spectrum of a molecule sandwiched between

two metallic contacts one of which could be a scan-ning probe).

• XSpim (a software simulator that runs programswritten for MIPS R2000 and R3000 processors).

• XGobi (a package for plotting scatterplots ofmulti-dimensional tabular data).

• Fract-O-Rama (a fractal generation program).• GNU Chess application using the X-Board graph-

ical user interface.Fig. 6 shows a typical screen-shot of In-VIGO executing the Dinero CacheSimulator.

All users have a file system that is shared by allapplications and that can be accessed by using a filemanager or a Unix prompt on a virtual machine thatprovides users with an isolated and secure virtualoperating environment. The Unix virtual machine iscalled a Virtual Workspace and is accessible throughthe toolbar on all applications. This Virtual Workspaceis a VMware virtual machine with a minimal config-uration that mounts the user’s file system using theVirtual File System.

In its current deployment version, In-VIGO SSH isused for executing jobs that are latency sensitive andGlobus is used for long-running jobs, yet users areunaware of the grid mechanisms used and their jobscan execute on any machine available in the currentpool: a 40-node Linux cluster, a Z800 IBM server and4 dual-processor Linux servers.

6. Conclusions

The In-VIGO grid-computing system described isdesigned to support computational tools for engineer-ing and science research In Virtual Information GridOrganizations. A novel aspect of this system is theextensive use of virtualization technology, emergingstandards for grid-computing and other Internet mid-dleware to enable the creation of dynamic pools ofvirtual resources that can be aggregated on-demandfor application-specific computational grids. Build-ing grids out of physical resources to constructingvirtual grids has many advantages but also requiresnew thinking on how to architect, manage and opti-mize the necessary middleware. In-VIGO representsa first step towards this goal and our experienceshave shown that grid-computing middleware is still

Page 12: From virtualized resources to virtual computing grids: the In-VIGO

12 S. Adabala et al. / Future Generation Computer Systems xxx (2004) xxx–xxx

lacking support for robust and fault tolerant dis-tributed computing for interactive applications, andthat virtualization can compensate for many of theselimitations.

Acknowledgements

This material is based upon work supportedby the National Science Foundation under GrantsNo. EIA-9975275, EIA-0224442, ACI-0219925,EEC-0228390 and NSF Middleware Initiative (NMI)collaborative grants ANI-0301108/ANI-0222828, andby the Army Research Office Defense UniversityResearch Initiative in Nanotechnology. The authorsalso acknowledge two SUR grants from IBM and agift from VMware Corporation. Any opinions, find-ings and conclusions or recommendations expressedin this material are those of the authors and do notnecessarily reflect the views of the National Sci-ence Foundation, Army Research Office, IBM, orVMware.

References

[1] I. Foster, C. Kesselman, S. Tuecke, The anatomy of the grid:enabling scalable virtual organizations, Int. J. Supercomput.Appl. 15 (3) (2001) 200–222.

[2] K. Krauter, R. Buyya, M. Maheswaran, A taxonomy andsurvey of grid resource management systems for distributedcomputing, Softw. Pract. Exp. 32 (2) (2002) 135–164.

[3] J. Fotheringham, Dynamic storage allocation in the Atlasincluding an automatic use of a backing store, Commun.ACM 4 (10) (1961) 435–436.

[4] R.P. Goldberg, Survey of virtual machine research, IEEEComput. Mag. 7 (6) (1974) 34–45.

[5] B. Gleeson, A. Lin, J. Heinanen, G. Armitage, A. Malis, AFramework for IP-based Virtual Private Networks, RFC 2764,Internet Engineering Task Force, February 2000.

[6] W.E. Johnston, D. Gannon, B. Nitzberg, L.A. Tanner, B.Thigpen, A. Woo, Computing and data grids for scienceand engineering, in: Proceedings of the 2000 ACM/IEEEConference on Supercomputing (CDROM), no. 52, 2000.

[7] N.H. Kapadia, J.A.B. Fortes, PUNCH: an architecture forweb-enabled wide-area network-computing special issue onhigh performance distributed computing of cluster computing,J. Netw. Softw. Tools Appl. 2 (2) (1999) 153–164 (specialissue).

[8] I. Foster, C. Kesselman, J.M. Nick, S. Tuecke, The Physiologyof the Grid: An Open Grid Services Architecture forDistributed Systems Integration, Technical Report, Open

Grid Service Infrastructure WG, Global Grid Forum, June2002.

[9] R. Figueiredo, N.H. Kapadia, J.A.B. Fortes, The PUNCHvirtual file system: seamless access to decentralized storagein a computational grid, in: Proceedings of the 10th IEEEInternational Symposium on High Performance Computing,August 2001.

[10] N.H. Kapadia, R. Figueiredo, J.A.B. Fortes, Enhancing thescalability and usability of computational grids via logicaluser accounts and virtual file systems, in: Proceedings ofthe Heterogeneous Computing Workshop (HCW) at theInternational Parallel and Distributed Processing Symposium(IPDPS), April 2001.

[11] R. Figueiredo, P.A. Dinda, J.A.B. Fortes, A case forgrid computing on virtual machines, in: Proceedings ofInternational Conference on Distributed Computing Systems(ICDCS), May 2003.

[12] J. Sugerman, G. Venkitachalam, B.-H. Lim, Virtualizing I/Odevices on VMware workstation’s hosted virtual machinemonitor, in: Proceedings of the USENIX Annual TechnicalConference, USENIX Association, June 2001, pp. 1–14.

[13] R. Thayer, IP Security Document Roadmap, Internet RFC2411, November 1998.

[14] K.P. Birman, Technology challenges for virtual overlaynetworks, IEEE Trans. Syst. Man Cybern. Part A. Syst. Hum.31 (4) (2001) 319–326.

[15] D. Andersen, H. Balakrishnan, F. Kaashoek, R. Morris,Resilient overlay networks, in: Proceedings of the 18thACM Symposium on Operating Systems Principles (SOSP),October 2001, Banff, Canada.

[16] J. Touch, Y. Wang, L. Eggert, G. Finn, Virtual Internet Archi-tecture, Future Developments of Network Architectures (FD-NA) at Sigcomm, August 2003. Available as ISI-TR-2003-570.

[17] Z. Dziong, Y. Xiong, L. Mason, Virtual network conceptand its applications for resource management in ATMbased networks, in: Proceedings of IFIP/IEEE InternationalConference on Broadband Communications’96, April 1996,pp. 223–234.

[18] G. Woodruff, N. Perinpanathan, F. Chang, P. Appanna, A.Leon-Garcia, ATM Network Resources Management usingLayer and Virtual Network Concepts, IM’97, May 1997.

[19] K.H. Randall, Virtual networks: implementation and analysis,Masters Thesis, Department of Electrical Engineering andComputer Science, Massachusetts Institute of Technology,1994.

[20] T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler,Extensible Markup Language (XML) 1.0, 2nd ed., W3CRecommendation, 6 October 2000.

[21] D. Raggett, A. Le Hors, I. Jacobs, HTML 4.0 Specification.http://www.w3.org/TR/1998/REC-html40-19980424.

[22] M. Abrams, J. Helms, User Interface Markup Language (UI-ML) Specification, Language Version 3.0, February 2002.http://www.uiml.org/specs/docs/uiml30-revised-02-12-02.pdf.

[23] Sun Microsystems, Inc., Java BluePrints Model-View-Con-troller. http://java.sun.com/blueprints/patterns/MVC-detailed.html.

Page 13: From virtualized resources to virtual computing grids: the In-VIGO

S. Adabala et al. / Future Generation Computer Systems xxx (2004) xxx–xxx 13

[24] Grid Forum, Grid Forum web site:http://www.gridforum.org.

Sumalatha Adabala is a PhD candidatein Electrical and Computer Engineeringat Purdue University. She received herBE from Bangalore University and MSEEfrom Purdue University. Her research in-terests include computer architecture, vir-tual machines, operating systems and dis-tributed systems.

Vineet Chadha is a graduate student inthe Advance Computing and InformationSystem Laboratory of the University ofFlorida. His research interests include dis-tributed file systems, data caching, com-puter architecture and open source soft-ware. He has a Master’s degree in com-puter science from Mississippi State Uni-versity with software engineering as spe-cialization. He is a member of the hon-

our society of Computer Science Upsilon Pi Epsilon. He lovesLinux kernel hacking and playing around with Linux utilities. Hishobbies include reading magazines, Indian classical music and anevening walk.

Puneet Chawla is a graduate student inthe Advanced Computing and Informa-tions Systems Laboratory at Universityof Florida. His research interests are inthe areas of distributed and enterprisecomputing. Chawla received his Bache-lors of Engineering Degree from PunjabEngineering College, India. He is also aSun Certified Java Programmer and De-veloper.

Renato Figueiredo received his PhD de-gree in Computer Engineering from Pur-due University in 2001. He is currentlyan Assistant Professor in the Depart-ment of Electrical and Computer Engi-neering at the University of Florida. Priorto joining the University of Florida hewas an assistant professor at NorthwesternUniversity. His research interests includenetwork-centric grid computing systems,

virtual machines, distributed file systems and high-performancecomputer architectures.

José Fortes received his BS degree inElectrical Engineering from the Universi-dade de Angola in 1978, his MS degree inElectrical Engineering from the ColoradoState University, Fort Collins in 1981 andhis PhD degree in Electrical Engineeringfrom the University of Southern Califor-nia, Los Angeles in 1984. From 1984 to2001 he was in the faculty of the Schoolof Electrical Engineering of Purdue Uni-

versity at West Lafayette, Indiana. In 2001, he joined both theDepartment of Electrical and Computer Engineering and the De-partment of Computer and Information Science and Engineeringof the University of Florida as Professor and BellSouth EminentScholar. His research interests are in the areas of parallel process-ing, computer architecture, network-computing and fault-tolerantcomputing. José Fortes is a Fellow of the Institute of Electricaland Electronics Engineers (IEEE) professional society. He was aDistinguished Visitor of the IEEE Computer Society from 1991to 1995.

Ivan Krsul was born in Bolivia, SouthAmerica. He received his Bachelor ofScience in Computer Engineering degreefrom the Catholic University of Americain May 1989 and his MS and PhD de-grees in Computer Science from PurdueUniversity in May 1994 and May 1998,respectively. He is currently a Visiting As-sistant Professor at the Advanced Com-puting and Information Systems (ACIS)

Laboratory, University of Florida department of Electrical & Com-puter Engineering, having returned to academia from a 4-year in-cursion into the implementation of digital government solutionsfor the Government of Bolivia. Dr. Krsul is interested in systemsecurity, distributed systems and software engineering solutionsfor dynamic and complex organizations.

Andrea Matsunaga received her BS de-gree in Electrical Engineering from thePolytechnic School of the University ofSao Paulo (Brazil), in 1998, her MS de-gree in Electrical Engineering from theElectronic Systems Department (PSI) ofthe Polytechnic School of the Universityof São Paulo (Brazil), in 2001; and iscurrently a graduate student at the De-partment of Electrical and Computer En-

gineering of the University of Florida, working with Dr. Fortes.Her research interests are in the areas of distributed computing,information management, computer architecture and dsm systems.

Page 14: From virtualized resources to virtual computing grids: the In-VIGO

14 S. Adabala et al. / Future Generation Computer Systems xxx (2004) xxx–xxx

Mauricio Tsugawa received his BS de-gree in Electrical Engineering from thePolytechnic School of the University ofSao Paulo (Brazil), in 1998, his MS de-gree in Electrical Engineering from theElectronic Systems Department (PSI) ofthe Polytechnic School of the Universityof São Paulo (Brazil), in 2001; and is cur-rently a graduate student at the Departmentof Electrical and Computer Engineering of

the University of Florida, working with Dr. Jose Fortes. His re-search interests are in the areas of distributed computing, parallelprocessing, computer architecture and configurable logic design.

Jian Zhang is currently a PhD studentin the ACIS laboratory at the Universityof Florida. She received her MS degreein Electrical and Computer Engineeringfrom the University of Florida in 2001.

Ming Zhao is a PhD Research Assis-tant at the ACIS Laboratory, University ofFlorida. He has received both BS and MSdegrees from Tsinghua University, China,in 1999 and 2001, respectively.

Liping Zhu received her BS degree in1993 from Hubei Automotive IndustrialInstitute, and her MS degree in 1998 fromBeijing University of Posts and Telecom-munications, China. She has previouslyworked in Beijing Telecom R&D Cen-ter and is currently a PhD student in theDepartment of Electrical and ComputerEngineering, University of Florida. Herresearch interests include resource man-

agement and virtual applications in grid computing.

Xiaomin Zhu was born in China in 1975.He received his BS degree from Depart-ment of Telecommunications at BeijingUniversity of Posts and Telecommunica-tions (China) in July 1997. He was a sys-tem engineer in Nortel Networks (China)Ltd. Company and Ericsson (China) Ltd.Company. His working area was 3G wire-less communication and Telecommunica-tion Network Management System. Cur-

rently, he is a graduate student of Department of Electrical andComputer Engineering at University of Florida and working asa research assistant in the Advanced Computing and InformationSystems Laboratory. He is one of the researchers of In-VIGOsystem and participates in the development of Virtual Applicationsubsystem of In-VIGO. His research interests are in the areas ofGrid Computing, Virtualization and Networking.