The Grid: Beyond the Hype Ian Foster Argonne National Laboratory University of Chicago Globus Alliance www.mcs.anl.gov/~foster Seminar, Duke, September 14, 2004


Page 1: The Grid: Beyond the Hype

The Grid: Beyond the Hype

Ian Foster

Argonne National Laboratory

University of Chicago

Globus Alliance

www.mcs.anl.gov/~foster

Seminar, Duke, September 14, 2004

Page 2: The Grid: Beyond the Hype


Grid Hype

Page 3: The Grid: Beyond the Hype

The Shape of Grids to Come?

Energy Internet

Internet Hype?

Page 4: The Grid: Beyond the Hype

eScience & Grid: 6 Theses

1. Scientific progress depends increasingly on large-scale distributed collaborative work

2. Such distributed collaborative work raises challenging problems of broad importance

3. Any effective attack on those problems must involve close engagement with applications

4. Open software & standards are key to producing & disseminating required solutions

5. Shared software & service infrastructure are essential application enablers

6. A cross-disciplinary community of technology producers & consumers is needed

Page 5: The Grid: Beyond the Hype

Global Knowledge Communities: E.g., High Energy Physics

Page 6: The Grid: Beyond the Hype

The Grid

“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”

1. Enable integration of distributed resources

2. Using general-purpose protocols & infrastructure

3. To achieve better-than-best-effort service

Page 7: The Grid: Beyond the Hype

The Grid (2)

• Dynamically link resources/services
    • From collaborators, customers, eUtilities, … (members of evolving “virtual organization”)
• Into a “virtual computing system”
    • Dynamic, multi-faceted system spanning institutions and industries
    • Configured to meet instantaneous needs, for:
• Multi-faceted QoX for demanding workloads
    • Security, performance, reliability, …

Page 8: The Grid: Beyond the Hype

Problem-Driven, Collaborative Research Methodology

[Figure: diagram relating the activities Design, Build, Deploy, Apply, and Analyze to Computer Science; Software, Standards; Infrastructure; Discipline Advances; and Global Community.]

Page 9: The Grid: Beyond the Hype

Problem-Driven, Collaborative Research Methodology

[Figure: diagram relating the activities Design, Build, Deploy, Apply, and Analyze to Computer Science; Software, Standards; Infrastructure; Discipline Advances; and Global Community.]

Page 10: The Grid: Beyond the Hype

Resource/Service Integration as a Fundamental Challenge

[Figure: resources (R), registries, resource managers (RM), security services, and policy services, annotated as follows.]

• Discovery
    • Many sources of data, services, computation
    • Registries organize services of interest to a community
• Access
    • Data integration activities may require access to, & exploration/analysis of, data at many locations
    • Exploration & analysis may involve complex, multi-step workflows
• Resource management is needed to ensure progress & arbitrate competing demands
• Security & policy must underlie access & management decisions
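The discovery and policy roles on this slide can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the `Service` and `Registry` classes, the service names, and the idea of expressing policy as a set of permitted virtual organizations (VOs) are hypothetical and are not part of any Grid toolkit API.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    kind: str  # e.g. "data" or "compute" (hypothetical categories)
    allowed_vos: set = field(default_factory=set)  # hypothetical policy: VOs granted access


class Registry:
    """Hypothetical community registry that organizes services of interest."""

    def __init__(self):
        self._services = []

    def register(self, service):
        self._services.append(service)

    def discover(self, kind):
        # Discovery: return every registered service of the requested kind.
        return [s for s in self._services if s.kind == kind]


def authorized(service, vo):
    # Policy check that sits underneath every access decision.
    return vo in service.allowed_vos


registry = Registry()
registry.register(Service("sky-survey-db", "data", {"astro-vo"}))
registry.register(Service("climate-archive", "data", {"climate-vo"}))
registry.register(Service("cluster-a", "compute", {"astro-vo", "climate-vo"}))

# Discover data services, then keep only those the policy allows for this VO.
usable = [s.name for s in registry.discover("data") if authorized(s, "astro-vo")]
print(usable)
```

Keeping discovery and authorization as separate steps mirrors the slide's point that security and policy underlie access decisions rather than being part of the registry itself.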

Page 11: The Grid: Beyond the Hype

CPU v. Collab.

[Figure: CPU count (log scale, 10 to 100,000) plotted against collaboration size (0 to 2500). Labeled points: Earth Simulator, Atmospheric Chemistry Group, LHC Exp., Astronomy, Grav. Wave, Nuclear Exp., Current accelerator Exp.]

Scale Metrics: Participants, Data, Tasks, Performance, Interactions, …

Page 12: The Grid: Beyond the Hype

Profound Technical Challenges

How do we, in dynamic, scalable, multi-institutional, computationally & data-rich settings:

• Negotiate & manage trust
• Access & integrate data
• Construct & reuse workflows
• Plan complex computations
• Detect & recover from failures
• Capture & share knowledge
• Represent & enforce policies
• Achieve end-to-end QoX
• Move data rapidly & reliably
• Support collaborative work
• Define primitive protocols
• Build reusable software
• Package & deliver software
• Deploy & operate services
• Operate infrastructure
• Upgrade infrastructure
• Perform troubleshooting
• Etc., etc., etc.

Page 13: The Grid: Beyond the Hype

Grid Technologies Address Key Requirements

• Infrastructure (“middleware”) for establishing, managing, and evolving multi-organizational federations
    • Dynamic, autonomous, domain independent
    • On-demand, ubiquitous access to computing, data, and services
• Mechanisms for creating and managing workflow within such federations
    • New capabilities constructed dynamically and transparently from distributed services
    • Service-oriented, virtualization

Page 14: The Grid: Beyond the Hype

Computer Science Contributions

Protocols and/or tools for use in dynamic, scalable, multi-institutional, computationally & data-rich settings for:

• Large-scale distributed system architecture
• Cross-org authentication
• Scalable community-based policy enforcement
• Robust & scalable discovery
• Wide-area scheduling
• High-performance, robust, wide-area data management
• Knowledge-based workflow generation
• High-end collaboration
• Resource & service virtualization
• Distributed monitoring & manageability
• Application development
• Wide area fault tolerance
• Infrastructure deployment & management
• Resource provisioning & quality of service
• Performance monitoring & modeling

Page 15: The Grid: Beyond the Hype

Collaborative Workflow: Virtual Data

www.griphyn.org/chimera

[Figure: Virtual Data System relating Transformation, Derivation, and Data, with relationships labeled created-by, execution-of, and consumed-by/generated-by.]

• “I’ve come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes.”
• “I’ve detected a calibration error in an instrument and want to know which derived data to recompute.”
• “I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won’t have to write one from scratch.”
• “I want to apply an astronomical analysis program to millions of objects. If the results already exist, I’ll save weeks of computation.”
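Two of the questions quoted on this slide (which derived data to recompute, and whether results already exist) reduce to lookups over recorded derivations. The sketch below is a toy illustration only: the record layout, dataset names, and helper functions are hypothetical and do not reflect the Chimera system's actual data model or interface.

```python
from collections import defaultdict, deque

# Hypothetical derivation records: (transformation, inputs, outputs).
derivations = [
    ("calibrate", ["raw_image"], ["calibrated_image"]),
    ("extract", ["calibrated_image"], ["galaxy_catalog"]),
    ("cluster", ["galaxy_catalog"], ["cluster_list"]),
    ("summarize", ["survey_log"], ["log_summary"]),
]

# Index: dataset -> datasets generated by derivations that consumed it.
consumers = defaultdict(list)
for _, inputs, outputs in derivations:
    for item in inputs:
        consumers[item].extend(outputs)


def downstream(dataset):
    """Return every dataset derived, directly or indirectly, from `dataset`."""
    seen, queue = set(), deque([dataset])
    while queue:
        current = queue.popleft()
        for derived in consumers[current]:
            if derived not in seen:
                seen.add(derived)
                queue.append(derived)
    return sorted(seen)


def existing_outputs(transformation, inputs):
    """Return recorded outputs if this derivation was already executed, else None."""
    for name, recorded_inputs, outputs in derivations:
        if name == transformation and recorded_inputs == inputs:
            return outputs
    return None


# A calibration error in the instrument taints everything derived from raw_image.
print(downstream("raw_image"))
# Reuse a previous result instead of recomputing it.
print(existing_outputs("extract", ["calibrated_image"]))
```

The breadth-first walk answers the recomputation question, while the exact-match lookup answers the reuse question; `log_summary` is untouched because it does not depend on `raw_image`.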

Page 16: The Grid: Beyond the Hype

Adaptive Unstructured Multicast

“UMM: A dynamically adaptive, unstructured multicast overlay” M. Ripeanu et al.

[Figure: nodes A-E shown at three layers: application overlay, base overlay, and physical topology.]

[Plot: overlay delay (ms, 0 to 400) against IP delay (ms, 0 to 200), with reference lines RDP=1 and RDP=2.]

[Plot: Max RDP, 95% RDP, and 90% RDP (0 to 10) and maximum link stress (0 to 12) against time (sec, 0 to 3840). Annotation: 10 nodes fail, then rejoin 900 s later.]
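The plots on this slide report RDP values. Assuming RDP stands for relative delay penalty, that is, the ratio of the delay along the overlay path to the direct IP delay between the same two nodes (an assumption; the slide does not define the term), the Max, 95%, and 90% figures can be computed as below. The delay pairs are made-up numbers, and the nearest-rank percentile is one common definition chosen here for simplicity.

```python
import math


def rdp(overlay_delay_ms, ip_delay_ms):
    """Relative delay penalty: overlay delay divided by direct IP delay."""
    return overlay_delay_ms / ip_delay_ms


def percentile(values, fraction):
    """Nearest-rank percentile of `values` for a fraction in (0, 1]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


# Hypothetical (overlay delay, IP delay) measurements in milliseconds.
pairs = [(50, 50), (120, 60), (90, 60), (200, 80), (100, 100)]
rdps = [rdp(overlay, ip) for overlay, ip in pairs]

max_rdp = max(rdps)
p90_rdp = percentile(rdps, 0.90)
median_rdp = percentile(rdps, 0.50)
print(max_rdp, p90_rdp, median_rdp)
```

An RDP of 1 means the overlay path is as fast as the direct IP path; an RDP of 2 means it takes twice as long, which is how the RDP=1 and RDP=2 reference lines in the scatter plot would be read under this definition.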

Page 17: The Grid: Beyond the Hype

Problem-Driven, Collaborative Research Methodology

[Figure: diagram relating the activities Design, Build, Deploy, Apply, and Analyze to Computer Science; Software, Standards; Infrastructure; Discipline Advances; and Global Community.]

Page 18: The Grid: Beyond the Hype

Open Standards & Software

• Standardized & interoperable mechanisms for secure & reliable:
    • Authentication, authorization, policy, …
    • Representation & management of state
    • Initiation & management of computation
    • Data access & movement
    • Communication & notification
• Good quality open source implementations to accelerate adoption & development
    • E.g., Globus Toolkit

Page 19: The Grid: Beyond the Hype

Evolution of Open Grid Standards and Software

[Figure: increased functionality and standardization (vertical axis) against time, 1990 to 2010 (horizontal axis), with these stages:]

• Custom solutions
• Globus Toolkit: de facto standard, single implementation (alongside Internet standards)
• Open Grid Services Arch: real standards, multiple implementations (alongside Web services, etc.)
• Managed shared virtual systems: research

Page 20: The Grid: Beyond the Hype

WS Core Enables Frameworks: E.g., Resource Management

• Web services (WSDL, SOAP, WS-Security, WS-ReliableMessaging, …)
• WS-Resource Framework & WS-Notification (Resource identity, lifetime, inspection, subscription, …)
• WS-Agreement (Agreement negotiation)
• WS Distributed Management (Lifecycle, monitoring, …)
• Applications of the framework (Compute, network, storage provisioning, job reservation & submission, data management, application service QoS, …)

Page 21: The Grid: Beyond the Hype

WSRF & WS-Notification

• Naming and bindings (basis for virtualization)
    • Every resource can be uniquely referenced, and has one or more associated services for interacting with it
• Lifecycle (basis for fault resilient state mgmt)
    • Resources created by services following factory pattern
    • Resources destroyed immediately or scheduled
• Information model (basis for monitoring, discovery)
    • Resource properties associated with resources
    • Operations for querying and setting this info
    • Asynchronous notification of changes to properties
• Service groups (basis for registries, collective svcs)
    • Group membership rules & membership management
• Base Fault type
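The lifecycle and information-model ideas on this slide (factory creation, immediate or scheduled destruction, resource properties, and notification of property changes) can be mimicked in a few lines of plain Python. This is only a conceptual sketch: the `Factory` and `Resource` classes, the key format, and the integer clock are hypothetical, and nothing here corresponds to the actual WSRF or WS-Notification message formats or to any toolkit API.

```python
import itertools


class Resource:
    """Hypothetical stateful resource with properties and subscribers."""

    def __init__(self, key, properties, terminate_at=None):
        self.key = key                      # unique reference to this resource
        self.properties = dict(properties)  # resource properties
        self.terminate_at = terminate_at    # scheduled destruction time, if any
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def get_property(self, name):
        return self.properties[name]

    def set_property(self, name, value):
        self.properties[name] = value
        # Notify every subscriber that a property changed.
        for callback in self._subscribers:
            callback(self.key, name, value)


class Factory:
    """Hypothetical service that creates and destroys resources."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.resources = {}

    def create(self, properties, terminate_at=None):
        key = f"resource-{next(self._ids)}"
        self.resources[key] = Resource(key, properties, terminate_at)
        return key

    def destroy(self, key):
        # Immediate destruction.
        del self.resources[key]

    def sweep(self, now):
        # Scheduled destruction: remove resources whose termination time has passed.
        expired = [k for k, r in self.resources.items()
                   if r.terminate_at is not None and r.terminate_at <= now]
        for key in expired:
            self.destroy(key)
        return expired


factory = Factory()
cpu = factory.create({"utilization": 0.2}, terminate_at=100)

events = []
factory.resources[cpu].subscribe(lambda key, name, value: events.append((key, name, value)))
factory.resources[cpu].set_property("utilization", 0.9)

expired_early = factory.sweep(now=50)   # nothing has expired yet
expired_late = factory.sweep(now=100)   # the resource reaches its termination time
print(events, expired_early, expired_late)
```

Callbacks here are invoked synchronously for brevity; the slide describes asynchronous notification, which a real system would deliver through a messaging layer.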

Page 22: The Grid: Beyond the Hype

Bringing it All Together

Scenario: Resource management & scheduling

[Figure: a Grid Scheduler, jobs (J), WS-Resources (R) for blades (IBM), storage, and network, a Service Level Agreement (A), and notifications, annotated as follows.]

• WS-Resource used to “model” physical processor resources
• WS-Resource Properties “project” processor status (like utilization)
• Local processor manager is “front-ended” with a Web service interface
• Other kinds of resources are also “modeled” as WS-Resources
• WS-Notification can be used to “inform” the scheduler when processor utilization changes
• Grid “Jobs” and “tasks” are also modeled using WS-Resources and Resource Properties
• Grid Scheduler is a Web Service
• Service Level Agreement is modeled as a WS-Resource
• Lifetime of SLA Resource tied to the duration of the agreement
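The scheduling scenario on this slide can be sketched as a scheduler that keeps a local view of processor utilization, updated by notifications, and places each job on the least-utilized processor. The `GridScheduler` class, the processor names, and the least-utilized placement rule are all assumptions made for illustration; the slide does not specify a placement policy.

```python
class GridScheduler:
    """Hypothetical scheduler driven by utilization notifications."""

    def __init__(self):
        self.utilization = {}  # processor name -> last reported utilization
        self.placements = {}   # job name -> processor chosen for it

    def on_utilization_change(self, processor, value):
        # Called whenever a processor resource reports a property change.
        self.utilization[processor] = value

    def submit(self, job):
        # Assumed policy: place the job on the least-utilized known processor.
        target = min(self.utilization, key=self.utilization.get)
        self.placements[job] = target
        return target


scheduler = GridScheduler()
scheduler.on_utilization_change("blade-1", 0.80)
scheduler.on_utilization_change("blade-2", 0.10)
scheduler.on_utilization_change("blade-3", 0.40)

first = scheduler.submit("job-1")

# blade-2 becomes busy and notifies the scheduler before the next submission.
scheduler.on_utilization_change("blade-2", 0.95)
second = scheduler.submit("job-2")
print(first, second)
```

Because the scheduler reacts to pushed updates rather than polling each processor, its view changes between the two submissions without any extra query traffic.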

Page 23: The Grid: Beyond the Hype

The Globus Alliance & Toolkit (Argonne, USC/ISI, Edinburgh, PDC)

• An international partnership dedicated to creating & disseminating high-quality open source Grid technology: the Globus Toolkit
    • Design, engineering, support, governance
• Academic Affiliates make major contributions
    • EU: CERN, Imperial, MPI, Poznan
    • AP: AIST, TIT, Monash
    • US: NCSA, SDSC, TACC, UCSB, UW, etc.
• Significant industrial contributions
• 1000s of users worldwide, many contribute

Page 24: The Grid: Beyond the Hype

Globus Toolkit History: An Unreliable Memoir

[Figure: Globus Toolkit downloads per month from Globus.Org (0 to 30,000), 1997 to 2002, annotated with the following events.]

• DARPA, NSF begin funding Grid work
• NASA initiates Information Power Grid
• Globus Project wins Global Information Infrastructure Award
• MPICH-G released
• The Grid: Blueprint for a New Computing Infrastructure published
• GT 1.0.0 Released
• Early Application Successes Reported
• GT 1.1.1 Released
• GT 1.1.2 Released
• GT 1.1.3 Released
• NSF & European Commission Initiate Many New Grid Projects
• GT 1.1.4 and MPICH-G2 Released
• Anatomy of the Grid Paper Released
• First EuroGlobus Conference Held in Lecce
• Significant Commercial Interest in Grids
• NSF GRIDS Center Initiated
• GT 2.0 beta Released
• Physiology of the Grid Paper Released
• GT 2.0 Released
• GT 2.2 Released

Only Globus.Org; not downloads from: NMI, UK eScience, EU DataGrid, IBM, Platform, etc.

Page 25: The Grid: Beyond the Hype

Globus Toolkit Contributors Include

• Grid Packaging Technology (GPT): NCSA
• Persistent GRAM Jobmanager: Condor
• GSI/Kerberos interchangeability: Sandia
• Documentation: NASA, NCSA
• Ports: IBM, HP, Sun, SDSC, …
• MDS stress testing: EU DataGrid
• Support: IBM, Platform, UK eScience
• Testing and patches: Many
• Interoperable tools: Many
• Replica location service: EU DataGrid
• Python hosting environment: LBNL
• Data access & integration: UK eScience
• Data mediation services: SDSC
• Tooling, Xindice, JMS: IBM
• Brokering framework: Platform
• Management framework: HP
• $$: DARPA, DOE, NSF, NASA, Microsoft, EU

Page 26: The Grid: Beyond the Hype

GT-Based Grid Tools & Solutions

[Figure: tools and solutions shown on top of the Globus Toolkit.]

• Virtual Data Toolkit
• Platform Globus
• NSF Middleware Init.
• Butterfly Grid
• EU DataGrid
• IBM Grid Toolbox
• MPICH-G2
• Access Grid
• Earth System Grid
• Fusion Grid
• BIRN Biomedical Grid
• TeraGrid
• NEESgrid
• UK eScience Grid

Page 27: The Grid: Beyond the Hype

Problem-Driven, Collaborative Research Methodology

[Figure: diagram relating the activities Design, Build, Deploy, Apply, and Analyze to Computer Science; Software, Standards; Infrastructure; Discipline Advances; and Global Community.]

Page 28: The Grid: Beyond the Hype

Infrastructure

• Broadly deployed services in support of virtual organization formation and operation
    • Authentication, authorization, discovery, …
• Services, software, and policies enabling on-demand access to important resources
    • Computers, databases, networks, storage, software services, …
• Operational support for 24x7 availability
• Integration with campus infrastructures
• Distributed, heterogeneous, instrumented systems can be wonderful CS testbeds

Page 29: The Grid: Beyond the Hype

Infrastructure Status

• Many infrastructure deployments worldwide
    • Community-specific & general-purpose
    • From campus to international
    • Most based on GT technology
• U.S. examples: TeraGrid, Grid2003, NEESgrid, Earth System Grid, BIRN
• Major open issues include practical aspects of operations and federation
• Scalability issues (number of users, sites, resources, files, jobs, etc.) also arising

Page 30: The Grid: Beyond the Hype

NSF Network for Earthquake Engineering Simulation (NEES)

Transform our ability to carry out research vital to reducing vulnerability to catastrophic earthquakes

Page 31: The Grid: Beyond the Hype

NEESgrid User Perspective

Secure, reliable, on-demand access to data, software, people, and other resources (ideally all via a Web Browser!)

Page 32: The Grid: Beyond the Hype

How it Really Happens (with the Globus Toolkit)

[Figure: layered architecture diagram with the following tiers.]

• Users work with client applications
• Application services organize VOs & enable access to other services
• Collective services aggregate &/or virtualize resources
• Resources implement standard access & management interfaces

Components shown: Web Browser, Data Viewer Tool, Simulation Tool, Telepresence Monitor, CHEF, CHEF Chat Teamlet, MyProxy, Certificate Authority, Globus Index Service, Globus MCS/RLS, Globus GRAM (with Compute Servers), Globus DAI (with Database services), Cameras.

Legend (with counts): Application Developer 2, Off the Shelf 9, Globus Toolkit 4, Grid Community 4

Page 33: The Grid: Beyond the Hype

Grid2003: An Operational Grid

• 28 sites (2100-2800 CPUs) & growing
• 400-1300 concurrent jobs
• 7 substantial applications + CS experiments
• Running since October 2003

[Figure: map of sites, including Korea.]

http://www.ivdgl.org/grid2003

Page 34: The Grid: Beyond the Hype

Open Science Grid Components

• Computers & storage at 28 sites (to date)
    • 2800+ CPUs
• Uniform service environment at each site
    • Globus Toolkit provides basic authentication, execution management, data movement
    • Pacman installation system enables installation of numerous other VDT and application services
• Global & virtual organization services
    • Certification & registration authorities, VO membership services, monitoring services
• Client-side tools for data access & analysis
    • Virtual data, execution planning, DAG management, execution management, monitoring
• IGOC: iVDGL Grid Operations Center

Page 35: The Grid: Beyond the Hype


www.earthsystemgrid.org

DOE Earth System Grid

Goal: address technical obstacles to the sharing & analysis of high-volume data from advanced earth system models

Page 36: The Grid: Beyond the Hype

Earth System Grid

Page 37: The Grid: Beyond the Hype

Problem-Driven, Collaborative Research Methodology

[Figure: diagram relating the activities Design, Build, Deploy, Apply, and Analyze to Computer Science; Software, Standards; Infrastructure; Discipline Advances; and Global Community.]

Page 38: The Grid: Beyond the Hype

NEESgrid Multi-site Online Simulation Test

[Figure: three coupled models, the UIUC Experimental Model, the U. Colorado Experimental Model, and the NCSA Computational Model, annotated with m1, f1, f2, and gx.]

All computational models written in Matlab.

Page 39: The Grid: Beyond the Hype

NEESgrid Multisite Online Simulation Test (July 2003)

[Plot: number of participants (0 to 70) against time of day (8:00 to 18:30), with series for UIUC and Colorado.]

[Images labeled: Illinois, Colorado, Illinois (simulation).]

Page 40: The Grid: Beyond the Hype

MOST: A Grid Perspective

[Figure: a Simulation Coordinator connected to NTCP servers for the U. Colorado Experimental Model, the UIUC Experimental Model, and the NCSA Computational Model, annotated with m1, f1, f2, x1, and gx.]

Page 41: The Grid: Beyond the Hype

Grid2003 Applications To Date

• CMS proton-proton collision simulation
• ATLAS proton-proton collision simulation
• LIGO gravitational wave search
• SDSS galaxy cluster detection
• ATLAS interactive analysis
• BTeV proton-antiproton collision simulation
• SnB biomolecular analysis
• GADU/Gnare genome analysis
• Various computer science experiments

www.ivdgl.org/grid2003/applications

Page 42: The Grid: Beyond the Hype

Example Grid2003 Workflows

[Figure: workflow graphs for genome sequence analysis, physics data analysis, and Sloan digital sky survey.]

Page 43: The Grid: Beyond the Hype

Example Grid3 Application: NVO Mosaic Construction

NVO/NASA Montage: A small (1200 node) workflow

Construct custom mosaics on demand from multiple data sources

User specifies projection, coordinates, size, rotation, spatial sampling

Work by Ewa Deelman et al., USC/ISI and Caltech
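A workflow like the one on this slide is a directed acyclic graph of tasks, and a planner has to release tasks only after their dependencies finish. The four-task graph below is a made-up miniature (the task names are hypothetical and are not the actual Montage modules), ordered with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) as a stand-in for whatever planner the project used.

```python
from graphlib import TopologicalSorter

# Hypothetical miniature mosaic workflow: each task maps to the tasks it depends on.
workflow = {
    "reproject_a": set(),
    "reproject_b": set(),
    "match_backgrounds": {"reproject_a", "reproject_b"},
    "coadd": {"match_backgrounds"},
}

sorter = TopologicalSorter(workflow)
sorter.prepare()

# Group tasks into stages; tasks within one stage could run concurrently.
stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    stages.append(ready)
    sorter.done(*ready)

print(stages)
```

Grouping ready tasks into stages makes the available parallelism explicit: both reprojection tasks can be dispatched to different sites at once, while the final co-addition must wait for everything before it.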

Page 44: The Grid: Beyond the Hype

Concluding Remarks

[Figure: diagram relating the activities Design, Build, Deploy, Apply, and Analyze to Computer Science; Software, Standards; Infrastructure; Discipline Advances; and Global Community.]

Page 45: The Grid: Beyond the Hype

eScience & Grid: 6 Theses

1. Scientific progress depends increasingly on large-scale distributed collaborative work

2. Such distributed collaborative work raises challenging problems of broad importance

3. Any effective attack on those problems must involve close engagement with applications

4. Open software & standards are key to producing & disseminating required solutions

5. Shared software & service infrastructure are essential application enablers

6. A cross-disciplinary community of technology producers & consumers is needed

Page 46: The Grid: Beyond the Hype

Global Community

Page 47: The Grid: Beyond the Hype

Utility Computing is One of Several Commercial Drivers

(Based on a slide from HP)

[Figure: progression plotted against value and shared, traded resources, with stages labeled clusters, grid-enabled systems, programmable data center, virtual data center, and computing utility or GRID; a marker labeled "today". Other labels: Open VMS clusters, TruCluster, MC ServiceGuard; Tru64, HP-UX, Linux; UDC; switch fabric, compute, storage.]

• Utility computing
• On-demand
• Service-orientation
• Virtualization

Page 48: The Grid: Beyond the Hype

Significant Challenges Remain

• Scaling in multiple dimensions
    • Ambition and complexity of applications
    • Number of users, datasets, services, …
    • From technologies to solutions
• The need for persistent infrastructure
    • Software and people as well as hardware
    • Currently no long-term commitment
• Institutionalizing multidisciplinary approach
    • Understand implications on the practice of computer science research

Page 49: The Grid: Beyond the Hype

Thanks, in particular, to:

• Carl Kesselman and Steve Tuecke, my long-time Globus co-conspirators
• Gregor von Laszewski, Kate Keahey, Jennifer Schopf, Mike Wilde, Argonne colleagues
• Globus Alliance members at Argonne, U.Chicago, USC/ISI, Edinburgh, PDC
• Miron Livny, U.Wisconsin Condor project, Rick Stevens, Argonne & U.Chicago
• Other partners in Grid technology, application, & infrastructure projects
• DOE, NSF, NASA, IBM for generous support

Page 50: The Grid: Beyond the Hype

For More Information

• Globus Alliance: www.globus.org
• Global Grid Forum: www.ggf.org
• Open Science Grid: www.opensciencegrid.org
• Background information: www.mcs.anl.gov/~foster
• GlobusWORLD 2005: Feb 7-11, Boston
• 2nd Edition: www.mkp.com/grid2