Issues for Grids and WorldWide Computing Harvey B Newman California Institute of Technology ACAT2000 Fermilab, October 19, 2000

Page 1: Issues for Grids and WorldWide Computing Harvey B Newman California Institute of Technology ACAT2000 Fermilab, October 19, 2000

Issues for Grids and WorldWide Computing

Harvey B Newman
California Institute of Technology

ACAT2000
Fermilab, October 19, 2000

Page 2

LHC Vision: Data Grid Hierarchy

[Diagram: tiered data grid. Labels recovered from the figure:]

Experiment; Online System
Tier 0 + 1: Offline Farm, CERN Computer Ctr, > 20 TIPS
Tier 1: France Center, FNAL Center, Italy Center, UK Center
Tier 2: Tier2 Centers
Tier 3: Institutes, ~0.25 TIPS
Tier 4: Workstations
Physics data cache
Link rates shown: ~PBytes/sec; ~100 MBytes/sec; ~2.5 Gbits/sec; ~0.6-2.5 Gbits/sec; ~622 Mbits/sec; 100 - 1000 Mbits/sec

1 bunch crossing; ~17 interactions per 25 nsecs; 100 triggers per second. Each event is ~1 MByte in size.

Physicists work on analysis “channels”. Each institute has ~10 physicists working on one or more channels.
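The trigger rate and event size quoted on this page can be cross-checked against the ~100 MBytes/sec link label with a few lines of arithmetic. The sketch below uses only the two figures from the slide; the number of live seconds per year is an assumption introduced here for illustration and does not appear in the talk.

```python
# Back-of-envelope check of the rates quoted on the slide.
TRIGGER_RATE_HZ = 100        # from the slide: 100 triggers per second
EVENT_SIZE_MB = 1.0          # from the slide: ~1 MByte per event
LIVE_SECONDS_PER_YEAR = 1e7  # ASSUMPTION: not stated on the slide


def recorded_rate_mb_per_s(trigger_rate_hz: float, event_size_mb: float) -> float:
    """Data rate leaving the trigger, in MB/s."""
    return trigger_rate_hz * event_size_mb


def yearly_volume_pb(rate_mb_per_s: float, live_seconds: float) -> float:
    """Raw data volume per year in PB, using decimal units (1 PB = 1e9 MB)."""
    return rate_mb_per_s * live_seconds / 1e9


rate = recorded_rate_mb_per_s(TRIGGER_RATE_HZ, EVENT_SIZE_MB)
volume = yearly_volume_pb(rate, LIVE_SECONDS_PER_YEAR)
print(f"recorded rate: {rate:.0f} MB/s")
print(f"raw volume per assumed year: {volume:.1f} PB")
```

With these inputs the recorded rate comes out at 100 MB/s, which agrees with the ~100 MBytes/sec label in the diagram.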

Page 3

US-CERN Link BW Requirements Projection (PRELIMINARY)

Year                              2001    2002    2003    2004    2005    2006
Installed Link BW in Mbps          310     622    1600    2400    4000    6500 [#]
Incl. New SLAC Throughput [*]    (120)   (250)   (400)   (600)  (1000)  (1600)

[#] Includes ~1.5 Gbps each for ATLAS and CMS, plus BaBar, Run2 and other
[*] D0 and CDF at Run2: needs presumed to be comparable to BaBar

[Chart: Bandwidth (Mbps), 0 to 7000, by fiscal year, FY2001 to FY2006]
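The growth implied by the installed-bandwidth figures can be made explicit. The following sketch takes the six yearly values from the table on this page and computes the average annual growth factor and the year-over-year ratios; the function name is chosen here for illustration.

```python
# Installed US-CERN link bandwidth (Mbps), as listed on the slide.
years = [2001, 2002, 2003, 2004, 2005, 2006]
installed_mbps = [310, 622, 1600, 2400, 4000, 6500]


def annual_growth_factor(first: float, last: float, n_years: int) -> float:
    """Constant yearly multiplier that takes `first` to `last` in n_years."""
    return (last / first) ** (1.0 / n_years)


growth = annual_growth_factor(
    installed_mbps[0], installed_mbps[-1], years[-1] - years[0]
)
year_over_year = [b / a for a, b in zip(installed_mbps, installed_mbps[1:])]

print(f"average growth factor per year: {growth:.2f}")
for year, ratio in zip(years[1:], year_over_year):
    print(f"{year}: x{ratio:.2f} over the previous year")
```

The average works out to a little over 1.8x per year, that is, the projected link capacity nearly doubles annually over the period.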

Page 4

Grids: The Broader Issues and Requirements

A New Level of Intersite Cooperation, and Resource Sharing
  Security and Authentication Across World-Region Boundaries
  Start with cooperation among Grid Projects (PPDG, GriPhyN, EU DataGrid, etc.)
Develop Methods for Effective HEP/CS Collaboration In Grid and VDT Design
  Joint Design and Prototyping Effort, with (Iterative) Design Specifications
  Find an Appropriate Level of Abstraction
    Adapted to > 1 Experiment; > 1 Working Environment
Be Ready to Adapt to the Coming Revolutions
  In Network, Collaborative, and Internet Information Technologies

Page 5

PPDG

[Diagram labels:]
User communities: BaBar, D0, CDF, Nuclear Physics, CMS, Atlas; Globus Users, SRB Users, Condor Users, HENPGC Users
Data management efforts: BaBar Data Management, CMS Data Management, Nucl Physics Data Management, D0 Data Management, CDF Data Management, Atlas Data Management
Teams: Globus Team, Condor, SRB Team, HENPGC

Page 6

GriPhyN: PetaScale Virtual Data Grids

Build the Foundation for Petascale Virtual Data Grids

[Diagram labels:]
Users: Production Team, Individual Investigator, Workgroups
Interactive User Tools
Virtual Data Tools; Request Planning & Scheduling Tools; Request Execution & Management Tools
Resource Management Services; Security and Policy Services; Other Grid Services
Transforms
Raw data source
Distributed resources (code, storage, computers, and network)

Page 7

EU-Grid Project Work Packages

Work Package Number   Work Package title                          Lead contractor
WP1                   Grid Workload Management                    INFN
WP2                   Grid Data Management                        CERN
WP3                   Grid Monitoring Services                    PPARC
WP4                   Fabric Management                           CERN
WP5                   Mass Storage Management                     PPARC
WP6                   Integration Testbed                         CNRS
WP7                   Network Services                            CNRS
WP8                   High Energy Physics Applications            CERN
WP9                   Earth Observation Science Applications      ESA
WP10                  Biology Science Applications                INFN
WP11                  Dissemination and Exploitation              INFN
WP12                  Project Management                          CERN

Page 8

Grid Issues: A Short List of Coming Revolutions

Network Technologies
  Wireless Broadband (from ca. 2003)
  10 Gigabit Ethernet (from 2002: See www.10gea.org)
  10GbE/DWDM-Wavelength (OC-192) integration: OXC
Internet Information Software Technologies
  Global Information “Broadcast” Architecture
    E.g. the Multipoint Information Distribution Protocol (MIDP; [email protected])
  Programmable Coordinated Agent Architectures
    E.g. Mobile Agent Reactive Spaces (MARS) by Cabri et al., Univ. of Modena
The “Data Grid” - Human Interface
  Interactive monitoring and control of Grid resources
    By authorized groups and individuals
    By Autonomous Agents

Page 9

CA*net 3 National Optical Internet in Canada

[Network map. Legend: GigaPOP; ORAN; CA*net 3 Primary Route; CA*net 3 Diverse Route]
Cities: Vancouver, Calgary, Regina, Winnipeg, Ottawa, Montreal, Toronto, Halifax, St. John’s, Fredericton, Charlottetown
Regional networks: BCnet, Netera, SRnet, MRnet, ONet, RISQ, ACORN
Other labeled points: Chicago, STAR TAP, Seattle, New York, Los Angeles

Deploying a 4 channel CWDM Gigabit Ethernet network – 400 km
Deploying a 4 channel Gigabit Ethernet transparent optical DWDM – 1500 km
Multiple Customer Owned Dark Fiber Networks connecting universities and schools
16 channel DWDM: 8 wavelengths @OC-192 reserved for CANARIE; 8 wavelengths for carrier and other customers
Consortium Partners: Bell Nexxia, Nortel, Cisco, JDS Uniphase, Newbridge
Condo Dark Fiber Networks connecting universities and schools
Condo Fiber Network linking all universities and hospital

Page 10

CA*net 4 Possible Architecture

[Network map.]
Cities: Vancouver, Calgary, Regina, Winnipeg, Ottawa, Montreal, Toronto, Halifax, St. John’s, Fredericton, Charlottetown
Other labeled points: Chicago, Seattle, New York, Los Angeles, Miami, Europe
Legend: Dedicated Wavelength or SONET channel; OBGP switches; Optional Layer 3 aggregation service; Large channel WDM system

Page 11

OBGP Traffic Engineering - Physical

[Diagram labels: AS 1, AS 2, AS 3, AS 4, AS 5; Intermediate ISP; Tier 1 ISP; Tier 2 ISP; Dual Connected Router to AS 5; Red Default Wavelength]

Optical switch looks like BGP router and AS1 is direct connected to Tier 1 ISP but still transits AS 5
Router redirects networks with heavy traffic load to optical switch, but routing policy still maintained by ISP
Bulk of AS 1 traffic is to Tier 1 ISP
For simplicity only data forwarding paths in one direction shown

Page 12

VRVS Remote Collaboration System: Statistics

[Chart: Number of Machines and People registered in VRVS, by month, Jan-97 to Sep-00; vertical axis 0 to 3400; series: Machines Registered, People Registered]

30 Reflectors, 52 Countries

Mbone, H.323, MPEG2 Streaming, VNC

Page 13

VRVS: Mbone/H.323/QT Snapshot

Page 14

VRVS R&D: Sharing Desktop

VNC technology integrated in the upcoming VRVS release

Page 15

Worldwide Computing Issues

Beyond Grid Prototype Components: Integration of Grid Prototypes for End-to-end Data Transport
  Particle Physics Data Grid (PPDG) ReqM; SAM in D0
  PPDG/EU DataGrid GDMP for CMS HLT Productions
Start Building the Grid System(s): Integration with Experiment-specific software frameworks
Derivation of Strategies (MONARC Simulation System)
  Data caching, query estimation, co-scheduling
  Load balancing and workload management amongst Tier0/Tier1/Tier2 sites (SONN by Legrand)
  Transaction robustness: simulate and verify
Transparent Interfaces for Replica Management
  Deep versus shallow copies: Thresholds; tracking, monitoring and control

Page 16

Grid Data Management Prototype (GDMP)

Distributed Job Execution and Data Handling: Goals
  Transparency
  Performance
  Security
  Fault Tolerance
  Automation

[Diagram: Site A, Site B, Site C; arrows labeled "Submit job", "Job writes data locally", "Replicate data"]

Jobs are executed locally or remotely
Data is always written locally
Data is replicated to remote sites

GDMP V1.1: Caltech + EU DataGrid WP2. Tests by CALTECH, CERN, FNAL, Pisa for CMS “HLT” Production 10/2000; Integration with ENSTORE, HPSS, Castor
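The three rules on this page (run anywhere, write locally, replicate to remote sites) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Site` class, its methods, the dictionary used as a catalog, and the file name are all hypothetical and are not GDMP's actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical site: holds files and a list of subscribed remote sites."""
    name: str
    files: dict = field(default_factory=dict)        # file name -> contents
    subscribers: list = field(default_factory=list)  # sites wanting replicas

    def subscribe(self, other: "Site") -> None:
        self.subscribers.append(other)

    def run_job(self, output_name: str, payload: bytes, catalog: dict) -> None:
        # Rule: data is always written locally, then recorded in the catalog.
        self.files[output_name] = payload
        catalog.setdefault(output_name, set()).add(self.name)
        # Rule: data is replicated to remote sites.
        for site in self.subscribers:
            site.replicate_from(self, output_name, catalog)

    def replicate_from(self, source: "Site", name: str, catalog: dict) -> None:
        # Copy the file and register the new replica location.
        self.files[name] = source.files[name]
        catalog[name].add(self.name)


catalog = {}  # hypothetical replica catalog: file name -> set of site names
site_a, site_b, site_c = Site("A"), Site("B"), Site("C")
site_a.subscribe(site_b)
site_a.subscribe(site_c)
site_a.run_job("hlt_run1.db", b"event data", catalog)
print(sorted(catalog["hlt_run1.db"]))
```

In this toy model replication is synchronous and cannot fail; the goals listed above (fault tolerance, security, automation) are exactly the parts it leaves out.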

Page 17

MONARC Simulation: Physics Analysis at Regional Centres

Similar data processing jobs are performed in each of several RCs
There is a profile of jobs, each submitted to a job scheduler
Each Centre has “TAG” and “AOD” databases replicated.
Main Centre provides “ESD” and “RAW” data
Each job processes AOD data, and also a fraction of ESD and RAW data.
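The data placement described here determines how much of a job's input is read locally at the Regional Centre and how much must come from the Main Centre. The sketch below estimates that split for a single job. The AOD and ESD event sizes, the event count, and the ESD/RAW fractions are hypothetical placeholders chosen for illustration; only the ~1 MByte RAW event size is quoted in this talk. This is not the MONARC simulation itself.

```python
# Event sizes in MB. AOD and ESD values are ASSUMED placeholders;
# the ~1 MB RAW event size is the figure quoted earlier in the talk.
EVENT_SIZE_MB = {"AOD": 0.01, "ESD": 0.1, "RAW": 1.0}

# From the slide: AOD is replicated at each Centre; ESD and RAW
# are provided by the Main Centre.
LOCAL_AT_REGIONAL_CENTRE = {"AOD"}


def job_traffic_mb(n_events: int, esd_fraction: float, raw_fraction: float):
    """Return (local_mb, remote_mb) read by one analysis job."""
    events_read = {
        "AOD": n_events,
        "ESD": n_events * esd_fraction,
        "RAW": n_events * raw_fraction,
    }
    local = remote = 0.0
    for kind, count in events_read.items():
        volume = count * EVENT_SIZE_MB[kind]
        if kind in LOCAL_AT_REGIONAL_CENTRE:
            local += volume
        else:
            remote += volume
    return local, remote


# Hypothetical job: 100,000 events, 1% also read from ESD, 0.1% from RAW.
local_mb, remote_mb = job_traffic_mb(
    n_events=100_000, esd_fraction=0.01, raw_fraction=0.001
)
print(f"local reads:  {local_mb:.0f} MB")
print(f"remote reads: {remote_mb:.0f} MB")
```

Even small ESD and RAW fractions contribute noticeably to remote traffic because those event records are much larger than AOD records.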

Page 18

ORCA Production on CERN/IT-Loaned Event Filter Farm Test Facility

[Diagram labels:]
FARM: 140 Processing Nodes
PileupDB servers (groups labeled "17 Servers" and "9 Servers"); Total 24 Pile Up Servers
SignalDB servers: 6 Servers for Signal
Output Servers; Lock Servers
SUN; HPSS
2 Objectivity Federations

The strategy is to use many commodity PCs as Database Servers

Page 19

Network Traffic & Job efficiency

[Plots. Labels: Measurement; Simulation; Mean measured value ~48 MB/s; Jet <0.52>; Muon <0.90>]

Page 20

From UserFederation To Private Copy

[Diagram labels: UF.boot; MyFED.boot; UserCollection; AMS; database boxes CD, CH, MD, MH, TH, MC, TD]

ORCA 4 tutorial, part II - 14. October 2000

Page 21

Beyond Traditional Architectures: Mobile Agents

“Agents are objects with rules and legs” -- D. Taylor

Mobile Agents: (Semi)-Autonomous, Goal Driven, Adaptive
  Execute Asynchronously
  Reduce Network Load: Local Conversations
  Overcome Network Latency; Some Outages
  Adaptive; Robust, Fault Tolerant
  Naturally Heterogeneous
  Extensible Concept: Coordinated Agent Architectures

[Diagram labels: Application; Service; Agent (repeated)]

Page 22

Coordination Architectures for Mobile Java Agents

A lot of Progress since 1998
Fourth Generation Architecture: “Associative Blackboards”
  After 1) Client/Server, 2) Meeting-Oriented, 3) Blackboards;
  Analogous to CMS ORCA software: Observer-based “action on demand”
MARS: Mobile Agent Reactive Spaces (Cabri et al.)
  See http://sirio.dsi.unimo.it/MOON
  Resilient and Scalable; Simple Implementation
  Works with standard Agent implementations (e.g. Aglets: http://www.trl.ibm.co.jp)
  Data-oriented, to provide temporal and spatial asynchronicity (See Java Spaces, Page Spaces)
  Programmable, authorized reactions, based on “virtual Tuple spaces”
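An associative blackboard lets agents retrieve data by content rather than by name or by meeting the agent that produced it, which is what gives the temporal and spatial decoupling mentioned above. A minimal sketch of that idea follows. The class, its method names, and the example tuples are hypothetical; this is not the MARS or JavaSpaces API, and `None` is used here as the wildcard field in a template.

```python
class AssociativeBlackboard:
    """Toy blackboard: tuples are looked up by matching a template."""

    def __init__(self):
        self._tuples = []

    def write(self, tup):
        """Leave a tuple on the blackboard."""
        self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(template, tup):
        # Same length, and every non-None template field equals the tuple field.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def read(self, template):
        """Return the first matching tuple without removing it, else None."""
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def take(self, template):
        """Return and remove the first matching tuple, else None."""
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return self._tuples.pop(i)
        return None


board = AssociativeBlackboard()
# One agent leaves a tuple and moves on...
board.write(("replica", "run42.db", "FNAL"))
# ...another agent, arriving later, finds it by partial content.
found = board.read(("replica", "run42.db", None))
print(found)
```

The writer and the reader never need to be present at the same time or know each other's identity; they only share the shape of the tuple.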

Page 23

Mobile Agent Reactive Spaces (MARS) Architecture

MARS Programmed Reactions: Based on Metalevel 4-Ples: (Reaction, Tuple, Operation-Type, Agent-ID)
  Allows Security, Policies
  Allows Production of Tuple on Demand

[Diagram: network nodes connected through the Internet; labels on a node: Agent Server, Tuple Space, MetaLevel Tuple space, Reference to the local Tuple Space; steps marked A, B, C, D]

A: Agents Arrive
B: They Get Ref. To Tuple Space
C: They Access Tuple Space
D: Tuple Space Reacts, with Programmed Behavior
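Step D, where the tuple space reacts with programmed behavior, can be sketched as a lookup in a metalevel list of 4-tuples before the ordinary read is performed. The code below is a simplified illustration and not the MARS implementation: the class and function names are hypothetical, only the read operation is intercepted, the first matching reaction wins, and `None` stands for "any" in each field.

```python
class ReactiveTupleSpace:
    """Toy tuple space whose reads can be intercepted by metalevel reactions."""

    def __init__(self):
        self.tuples = []  # base-level tuples
        self.meta = []    # metalevel 4-tuples: (reaction, template, op_type, agent_id)

    @staticmethod
    def _fits(pattern, value):
        # None acts as a wildcard in a pattern.
        return pattern is None or pattern == value

    def _fits_all(self, patterns, values):
        return len(patterns) == len(values) and all(
            self._fits(p, v) for p, v in zip(patterns, values)
        )

    def install_reaction(self, reaction, template, op_type, agent_id):
        self.meta.append((reaction, template, op_type, agent_id))

    def write(self, tup):
        self.tuples.append(tuple(tup))

    def read(self, template, agent_id):
        # Step D: consult the metalevel space before the default lookup.
        for reaction, r_template, r_op, r_agent in self.meta:
            if (self._fits(r_op, "read")
                    and self._fits(r_agent, agent_id)
                    and self._fits_all(r_template, template)):
                return reaction(self, template, agent_id)
        for tup in self.tuples:
            if self._fits_all(template, tup):
                return tup
        return None


def deny(space, template, agent_id):
    """Security/policy reaction: the agent gets nothing back."""
    return None


def produce_load_on_demand(space, template, agent_id):
    """Tuple-on-demand reaction: build the answer only when it is requested."""
    site = template[1]
    return ("load", site, len(space.tuples))  # placeholder metric


space = ReactiveTupleSpace()
space.write(("policy", "quota=10"))
space.install_reaction(deny, ("policy", None), "read", "guest")
space.install_reaction(produce_load_on_demand, ("load", None, None), "read", None)

print(space.read(("policy", None), "guest"))         # intercepted by deny
print(space.read(("policy", None), "admin"))         # ordinary lookup
print(space.read(("load", "siteA", None), "guest"))  # produced on demand
```

The two reactions correspond to the two bullets on this page: one enforces a policy keyed on the agent's identity, the other produces a tuple that was never stored.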

Page 24

GRIDs In 2000: Summary

Grids are (in) our Future…

Let’s Get to Work

Page 25

Grid Data Management Issues

Data movement and responsibility for updating the Replica Catalog
Metadata update and replica consistency
Concurrency and locking
Performance characteristics of replicas
Advance Reservation: Policy, time-limit
How to advertise policy and resource availability
Pull versus push (strategy; security)
Fault tolerance; recovery procedures
Queue management
Access control, both global and local
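Several of these issues (who updates the Replica Catalog, replica consistency, concurrency and locking) interact even in a very small model. The sketch below shows one possible arrangement, assuming a single catalog guarded by one lock and a per-file version counter used to tell current replicas from stale ones. The class, method names, site names, and file name are hypothetical and are not taken from any Grid toolkit.

```python
import threading


class ReplicaCatalog:
    """Toy catalog: tracks which site holds which version of a logical file."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = {}    # logical file name -> latest version number
        self._replicas = {}  # logical file name -> {site: version held}

    def register_update(self, lfn, site):
        """A site wrote a new version of lfn and updates the catalog itself."""
        with self._lock:  # serialize concurrent updaters
            version = self._latest.get(lfn, 0) + 1
            self._latest[lfn] = version
            self._replicas.setdefault(lfn, {})[site] = version
            return version

    def register_replica(self, lfn, site, version):
        """A site reports that it now holds a copy of the given version."""
        with self._lock:
            if version > self._latest.get(lfn, 0):
                raise ValueError("replica is newer than any registered version")
            self._replicas.setdefault(lfn, {})[site] = version

    def consistent_sites(self, lfn):
        """Sites whose copy matches the latest registered version."""
        with self._lock:
            latest = self._latest.get(lfn, 0)
            held = self._replicas.get(lfn, {})
            return sorted(s for s, v in held.items() if v == latest)


catalog = ReplicaCatalog()
v1 = catalog.register_update("run42.db", "CERN")
catalog.register_replica("run42.db", "FNAL", v1)
v2 = catalog.register_update("run42.db", "CERN")  # FNAL's copy is now stale
print(catalog.consistent_sites("run42.db"))
```

A single global lock is the simplest choice and would not scale to a distributed catalog; the remaining items on the list (pull versus push, recovery, access control) are outside what this sketch covers.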