Grid Computing: Resource Sharing and Coordinated Problem Solving in Scalable Distributed Communities

Grid computing technologies enable controlled resource sharing in distributed communities and the coordinated use of those shared resources as community members tackle common goals. These technologies include new protocols, services, and APIs for secure resource access, resource management, fault detection, communication, and so forth, that in turn enable new application concepts such as virtual data, smart instruments, collaborative design spaces, and metacomputations. In this talk, we review applications that are motivating widespread interest in Grid concepts within the scientific and engineering communities. Then, we describe the Globus Grid architecture that has been adopted by many Grid projects, focusing in particular on our security, resource management, and data management technologies.



Grid Computing

Ian Foster
Mathematics and Computer Science Division, Argonne National Laboratory
Department of Computer Science, The University of Chicago

Carl Kesselman
Information Sciences Institute, University of Southern California


Overview

• Grid computing concept

• Grid technical landscape

• Data grids & data grid projects


The Grid Concept


The Grid: The Web on Steroids

Web: uniform access to HTML documents (http://)

Grid: flexible, high-performance access to all significant resources: sensor nets, data archives, computers, software catalogs, colleagues

On-demand creation of powerful virtual computing systems


Grid Computing: Take 2

• Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, in the absence of central control, omniscience, or trust relationships

• Via investigations of
– New applications that become possible when resources can be shared in a coordinated way
– Protocols, algorithms, and persistent infrastructure to facilitate sharing


Grid Communities and Applications: NSF National Technology Grid


Grid Communities and Applications: Network for Earthquake Eng. Simulation

• NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other

• On-demand access to experiments, data streams, computing, archives, collaboration

NEESgrid: Argonne, Michigan, NCSA, UIUC, USC


Grid Communities and Applications: Mathematicians Solve NUG30

• Community = an informal collaboration of mathematicians and computer scientists

• Condor-G delivers 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites)

• Solves NUG30 quadratic assignment problem:
14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23

MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
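As a rough scale check (this arithmetic is not on the original slide): 3.46E8 CPU-seconds spread over 7 days (about 6.05E5 seconds) corresponds to an average of roughly 570 processors busy throughout the run, consistent with the reported peak of 1009.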


Grid Communities and Applications: Home Computers Evaluate AIDS Drugs

• Community =
– 1000s of home computer users
– Philanthropic computing vendor (Entropia)
– Research group (Scripps)

• Common goal = advance AIDS research


A Little History

• Early 90s – gigabit testbeds, metacomputing

• Mid to late 90s – early experiments (e.g., I-WAY), academic software projects (e.g., Globus), application experiments

• 2000
– Major application communities emerging
– Major infrastructure deployments
– Rich technology base
– Grid Forum: >300 people, >90 orgs, 11 countries


Technical Landscape


Grid Technical Challenges: Sharing, Coordinated Use

• New problem solving methods and tools
– Data, computing, collaboration, sensors, …

• Numerous cross-cutting technical problems
– Authentication, authorization, policy, audit; resource discovery, access, allocation, control; failure detection & recovery; brokering; …

• No central control, omniscience, trust; hence
– Need to preserve local site independence
– Policy discovery and negotiation important
– Myriad interesting failure modes


How Are Cross-Cutting Technical Issues Addressed?

• Development of Grid protocols & services
– Protocol-mediated access to remote resources
– New services: e.g., resource brokering
– “On the Grid” = speak Intergrid protocols
– Mostly (extensions to) existing protocols

• Development of Grid APIs & SDKs
– Facilitate application development by supplying higher-level abstractions

• The (hugely successful) model is the Internet

• The Grid is not a distributed OS!


Layered Grid Architecture (By Analogy to Internet Architecture)

Fabric – “Controlling things locally”: access to, & control of, resources

Connectivity – “Talking to things”: communication (Internet protocols) & security

Resource – “Sharing single resources”: negotiating access, controlling use

Collective – “Managing multiple resources”: ubiquitous infrastructure services

User – “Specialized services”: user- or application-specific distributed services

Application

(Internet protocol architecture, for comparison: Application, Transport, Internet, Link.)


Protocols, Services, and Interfaces Occur at Each Level

Applications
Languages/Frameworks
User Services – user service protocols, user service APIs and SDKs
Collective Services – collective service protocols, collective service APIs and SDKs
Resource Services – resource service protocols, resource APIs and SDKs
Connectivity – connectivity protocols, connectivity APIs
Fabric Layer – local access APIs and protocols


Grid Services Architecture: Fabric Layer Protocols & Services

• Just what you would expect: the diverse mix of resources that may be shared
– Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc.

• Few constraints on low-level technology: connectivity and resource level protocols form the “neck in the hourglass”

• Defined by interfaces, not physical characteristics


Grid Services Architecture: Connectivity Layer Protocols & Services

• Communication
– Internet protocols: IP, DNS, routing, etc.

• Security: Grid Security Infrastructure (GSI)
– Uniform authentication & authorization mechanisms in multi-institutional settings
– Single sign-on, delegation, identity mapping
– Public key technology, SSL, X.509, GSS-API
– Supporting infrastructure: Certificate Authorities, key management, etc.

GSI: www.globus.org
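To make “single sign-on, delegation” concrete, here is a minimal sketch of the GSI workflow as a user sees it, assuming the Globus Toolkit command-line tools are installed; grid-proxy-init and grid-proxy-info are standard GSI commands, but exact options vary across toolkit versions:

    import subprocess

    # Create a short-lived proxy credential signed by the user's
    # long-term X.509 certificate; this prompts for the private-key
    # passphrase exactly once.
    subprocess.run(["grid-proxy-init"], check=True)

    # Inspect the proxy: subject, issuer, remaining lifetime.
    subprocess.run(["grid-proxy-info"], check=True)

    # From here on, GSI-enabled tools (GRAM submission, GridFTP
    # transfers) authenticate and delegate using the proxy, with no
    # further password prompts: single sign-on.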


Grid Services Architecture: Resource Layer Protocols & Services

• Grid Resource Allocation Mgmt (GRAM)
– Remote allocation, reservation, monitoring, control of compute resources

• GridFTP protocol (FTP extensions)
– High-performance data access & transport

• Grid Resource Information Service (GRIS)
– Access to structure & state information

• Network reservation, monitoring, control

• All integrated with GSI: authentication, authorization, policy, delegation

GRAM, GridFTP, GRIS: www.globus.org
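As a hedged illustration of GRAM in use, the sketch below submits a simple job described in RSL (the Resource Specification Language used by GRAM); the gatekeeper host name is a made-up placeholder, and command options vary across Globus Toolkit versions:

    import subprocess

    # RSL: declarative description of the resource request --
    # which executable to run and how many instances.
    rsl = "&(executable=/bin/hostname)(count=4)"

    # Submit to the GRAM gatekeeper on a (hypothetical) remote
    # resource; -o returns the job's output. Authentication and
    # delegation ride on the GSI proxy credential.
    subprocess.run(
        ["globusrun", "-o", "-r", "gatekeeper.example.org", rsl],
        check=True,
    )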


Grid Services Architecture: Collective Layer Protocols & Services

• Index servers, aka metadirectory services
– Custom views on dynamic resource collections assembled by a community

• Resource brokers (e.g., Condor Matchmaker)
– Resource discovery and allocation

• Replica catalogs

• Co-reservation and co-allocation services

• Etc., etc.

Metadirectory: www.globus.org; Condor: www.cs.wisc.edu/condor
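To show what broker-mediated discovery looks like in the Condor system cited above, here is a sketch (written in Python for consistency with the other examples) that generates a standard Condor submit description; the job itself is hypothetical, while OpSys and Memory are standard machine ClassAd attributes the Matchmaker evaluates:

    import textwrap

    # Job ClassAd: the Matchmaker pairs it with machine ClassAds
    # satisfying Requirements, preferring higher Rank (more memory).
    submit_description = textwrap.dedent("""\
        executable   = analyze
        arguments    = input.dat
        requirements = (OpSys == "LINUX") && (Memory >= 256)
        rank         = Memory
        queue 10
    """)

    with open("analyze.sub", "w") as handle:
        handle.write(submit_description)
    # Submitted with: condor_submit analyze.sub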


Example: User Portal

A web portal, mapped onto the layers:

User – source code discovery, application configuration
Collective – brokering, co-allocation, certificate authorities
Resource – access to data, access to computers, access to network performance data, …
Connect – communication, service discovery (DNS), authentication, authorization, delegation
Fabric – storage systems, clusters, networks, …

Fabric elements such as the source code repository (lookup protocol) and compute resources (access protocol) are each reached through a protocol with an associated API and SDK.



Example: High-Throughput Computing System

A high-throughput computing system, mapped onto the layers:

User – dynamic checkpoint, job management, failover, staging
Collective – brokering, certificate authorities
Resource – access to data, access to computers, access to network performance data
Connect – communication, service discovery (DNS), authentication, authorization, delegation
Fabric – storage systems, schedulers

Fabric elements such as the checkpoint repository (checkpoint protocol) and compute resources (access protocol) are each reached through a protocol with an associated API and SDK.


Data Grids


Data Grid Problem

• “Enable a geographically distributed community [of thousands] to pool their resources in order to perform sophisticated, computationally intensive analyses on Petabytes of data”

• Note that this problem
– Is not unique to physics
– Overlaps strongly with other Grid problems

• Data Grids do introduce new requirements and R&D challenges


Data Grid Hierarchy Concept

(Figure, courtesy Harvey Newman: the tiered LHC data distribution model. 1 TIPS is approximately 25,000 SpecInt95 equivalents.)

Tier 0 – CERN Computer Centre, fed at ~PBytes/sec from the online system and backed by an offline processor farm of ~20 TIPS (~100 MBytes/sec to the farm); ~622 Mbits/sec outward, or air freight (deprecated)
Tier 1 – regional centres: FermiLab (~4 TIPS) and centres in France, Italy, and Germany; ~100 MBytes/sec links
Tier 2 – centres of ~1 TIPS each (e.g., Caltech); ~622 Mbits/sec links
Institute servers – ~0.25 TIPS each, with physics data caches; ~1 MBytes/sec to
Tier 4 – physicist workstations

There is a “bunch crossing” every 25 nsecs and 100 “triggers” per second; each triggered event is ~1 MByte in size. Physicists work on analysis “channels”; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server.
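The slide's rates are mutually consistent, a check worth making explicit (this arithmetic is not on the original slide): 100 triggers per second at ~1 MByte per triggered event gives ~100 MBytes/sec of recorded data, matching the ~100 MBytes/sec links near the top of the hierarchy.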


Major Data Grid Projects

• Earth System Grid (DOE Office of Science)
– DG technologies, climate applications

• European Data Grid (EU)
– DG technologies & deployment in EU

• GriPhyN (NSF ITR)
– Investigation of “Virtual Data” concept

• Particle Physics Data Grid (DOE Science)
– DG applications for HENP experiments


Data Grid Architecture

Appln – discipline-specific data grid application
User – coherency control, replica selection, task management, virtual data catalog, virtual data code catalog, …
Collective – replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs, …
Resource – access to data, access to computers, access to network performance data, …
Connect – communication, service discovery (DNS), authentication, authorization, delegation
Fabric – storage systems, clusters, networks, network caches, …


Globus Data-Intensive Services Architecture

Programs: globus-url-copy, replica programs, custom servers, custom clients
Libraries: globus_gass_copy, globus_gass, globus_gass_transfer, globus_ftp_client, globus_ftp_control, globus_replica_catalog, globus_replica_manager
Underlying libraries: globus_common, GSI (security), globus_io, OpenLDAP client

All of these components already exist.
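For a flavor of the top of this stack, here is a sketch of a GridFTP transfer driven by the globus-url-copy program; the host names and paths are made-up placeholders, and options vary by toolkit version:

    import subprocess

    # Third-party transfer between two GridFTP servers, with both
    # ends authenticated via the GSI proxy credential. A local
    # destination could equally be given as a file:// URL.
    subprocess.run(
        [
            "globus-url-copy",
            "gsiftp://source.example.org/data/run042.dat",
            "gsiftp://dest.example.org/cache/run042.dat",
        ],
        check=True,
    )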


High-Level View of Earth System Grid: A Model Architecture for Data Grids

(Figure.) The application presents an attribute specification to a metadata catalog, which resolves it to a logical collection and logical file name. A replica catalog maps that name to multiple locations: replica location 1 (tape library with disk cache), replica location 2 (disk array), replica location 3 (disk cache). A replica selection service ranks the candidates using performance information & predictions from MDS and NWS, and the selected replica is then retrieved with GridFTP commands.
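A minimal sketch of that selection flow in Python; every name and the two-replica catalog here are hypothetical stand-ins invented for illustration, not real Globus APIs:

    # Toy catalogs playing the roles of the metadata catalog,
    # replica catalog, and NWS bandwidth predictions in the figure.
    metadata_catalog = {("dataset", "ccm3", "monthly"): "lfn://esg/ccm3-monthly"}
    replica_catalog = {"lfn://esg/ccm3-monthly": [
        "gsiftp://tape.example.org/ccm3",   # replica location 1
        "gsiftp://disk.example.org/ccm3",   # replica location 2
    ]}
    predicted_bandwidth = {"gsiftp://tape.example.org/ccm3": 2.0,
                           "gsiftp://disk.example.org/ccm3": 40.0}  # MB/s

    def select_replica(attribute_spec):
        # 1. Metadata catalog: attributes -> logical file name.
        logical_name = metadata_catalog[attribute_spec]
        # 2. Replica catalog: logical name -> candidate locations.
        locations = replica_catalog[logical_name]
        # 3. Rank candidates by predicted performance and pick the
        #    best; the winner would be fetched with GridFTP.
        return max(locations, key=lambda loc: predicted_bandwidth[loc])

    print(select_replica(("dataset", "ccm3", "monthly")))
    # -> gsiftp://disk.example.org/ccm3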


Grid Physics Network (GriPhyN)

Enabling R&D for advanced data grid systems, focusing in particular on the Virtual Data concept.

(Figure.) A production team, individual investigators, and other users work through interactive user tools. Their requests pass through virtual data tools, request planning and scheduling tools, and request execution management tools, all supported by resource management services, security and policy services, and other Grid services. Transforms operate on raw data sources (ATLAS, CMS, LIGO, SDSS) and on distributed resources (code, storage, computers, and network).


The Virtual Data Concept

“[A virtual data grid enables] the definition and delivery of a potentially unlimited virtual space of data products derived from other data. In this virtual space, requests can be satisfied via direct retrieval of materialized products and/or computation, with local and global resource management, policy, and security constraints determining the strategy used.”
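The materialize-or-compute decision at the heart of this definition can be made concrete with a toy sketch; the catalogs, product names, and transform below are invented for illustration, and real policy and resource constraints are omitted:

    # Hypothetical virtual-data resolution: return a product directly
    # if materialized, otherwise derive it from its recipe.

    def resolve(product, materialized, recipes):
        if product in materialized:             # direct retrieval
            return materialized[product]
        transform, inputs = recipes[product]    # derivation recipe
        args = [resolve(p, materialized, recipes) for p in inputs]
        result = transform(*args)
        materialized[product] = result          # cache for reuse
        return result

    # Example: "calibrated" is derived from raw data by a transform.
    materialized = {"raw": [1.0, 2.0, 3.0]}
    recipes = {"calibrated": (lambda xs: [2.0 * x for x in xs], ["raw"])}
    print(resolve("calibrated", materialized, recipes))  # [2.0, 4.0, 6.0]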


Virtual Data in Action

(Figure: requests flow among major archive facilities, network caches & regional centers, and local sites.)

• A data request may
– Access local data
– Compute locally
– Compute remotely
– Access remote data

• Scheduling subject to local & global policies

• Local autonomy


Abbreviated Acknowledgments

• Other Globus principals include: Steve Tuecke, Ann Chervenak, Bill Allcock, Gregor von Laszewski, Lee Liming, Steve Fitzgerald

• Data Grid collaborators: Arie Shoshani, Reagan Moore, Miron Livny

• GriPhyN Co-PI: Paul Avery

• PPDG PIs: Richard Mount, Harvey Newman


Summary

• Grid computing concepts are real: they underlie a variety of major projects in physics, other sciences, and industry

• The promise is that, by allowing coordinated sharing, Grids enable quantitative & qualitative changes in problem solving techniques

• A clear picture of Grid architecture is emerging

• Data Grid concepts and technologies are less mature but are appearing rapidly


For More Information

• Book (Morgan Kaufmann)
– www.mkp.com/grids

• Globus
– www.globus.org

• Grid Forum
– www.gridforum.org

• PPDG
– www.ppdg.net

• GriPhyN
– www.griphyn.org


The End