LHC Experiments and the PACI: A Partnership for Global Data Analysis
Harvey B. Newman, Caltech
Advisory Panel on CyberInfrastructure, National Science Foundation
November 29, 2001
http://l3www.cern.ch/~newman/LHCGridsPACI.ppt


Page 1: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Harvey B. Newman, Caltech
Advisory Panel on CyberInfrastructure
National Science Foundation
November 29, 2001

http://l3www.cern.ch/~newman/LHCGridsPACI.ppt

LHC Experiments and the PACI: A Partnership for Global Data Analysis

Page 2: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Global Data Grid Challenge

“Global scientific communities, served by networks with bandwidths varying by orders of magnitude, need to perform computationally demanding analyses of geographically distributed datasets that will grow by at least 3 orders of magnitude over the next decade, from the 100 Terabyte to the 100 Petabyte scale [from 2000 to 2007]”

Page 3: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

The Large Hadron Collider (2006-): The Next-Generation Particle Collider

The largest superconductor installation in the world

Bunch-bunch collisions at 40 MHz, each generating ~20 interactions

Only one in a trillion may lead to a major physics discovery

Real-time data filtering: Petabytes per second to Gigabytes per second

Accumulated data of many Petabytes/Year

Large data samples explored and analyzed by thousands of globally dispersed scientists, in hundreds of teams
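To make these rates concrete, here is a back-of-envelope sketch (my own arithmetic, not from the slides; the per-event size and the post-filter rate are assumed round numbers):

```python
# Rough arithmetic behind the quoted rates (assumed values marked as such).
seconds_per_year    = 1e7     # assumed effective running time per year
bunch_crossing_hz   = 40e6    # bunch-bunch collisions at 40 MHz (from the slide)
interactions_per_bx = 20      # ~20 interactions per crossing (from the slide)
event_bytes         = 1e6     # assume ~1 MB per recorded event
recorded_hz         = 100     # assume O(100) events/s survive the real-time filter

interaction_rate = bunch_crossing_hz * interactions_per_bx
stored_per_year  = recorded_hz * event_bytes * seconds_per_year

print(f"interaction rate: {interaction_rate:.0e} per second")                    # ~8e+08
print(f"recorded volume:  {stored_per_year / 1e15:.0f} PB/year per experiment")  # ~1 PB
```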

Page 4: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Four LHC Experiments: The Petabyte to Exabyte Challenge
ATLAS, CMS, ALICE, LHCb

Higgs + new particles; Quark-Gluon Plasma; CP Violation

Data stored: ~40 Petabytes/Year and up; CPU: 0.30 Petaflops and up

0.1 to 1 Exabyte (1 EB = 10^18 Bytes) (2007) (~2012?) for the LHC Experiments

Page 5: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Evidence for the Higgs at LEP at M~115 GeV; the LEP program has now ended

Page 6: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

All charged tracks with pt > 2 GeV

Reconstructed tracks with pt > 25 GeV

(+30 minimum bias events)

10^9 events/sec, selectivity: 1 in 10^13 (1 person in a thousand world populations)

LHC: Higgs Decay into 4 Muons; 1000X LEP Data Rate

Page 7: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

LHC Data Grid Hierarchy

Experiment -> Online System at ~PByte/sec; Online System -> Tier 0 at ~100-400 MBytes/sec
Tier 0 (+1), CERN: 700k SI95, ~1 PB Disk, Tape Robot
Tier 1 centers (FNAL: 200k SI95, 600 TB; IN2P3 Center, INFN Center, RAL Center): connected to CERN at ~2.5 Gbps
Tier 2 centers: connected to Tier 1 at ~2.5 Gbps
Tier 3 (institutes, ~0.25 TIPS): connected at 100-1000 Mbits/sec
Tier 4: workstations and physics data caches

CERN/Outside Resource Ratio ~1:2; Tier0 : (sum of Tier1s) : (sum of Tier2s) ~ 1:1:1

Physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels

Page 8: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

TeraGrid: NCSA, ANL, SDSC, Caltech
A Preview of the Grid Hierarchy and Networks of the LHC Era

[Map: TeraGrid sites and interconnects]
Sites and hubs: NCSA/UIUC (Urbana), ANL, SDSC (San Diego), Caltech (Pasadena), UIC, Ill Inst of Tech, Univ of Chicago, Starlight / NW Univ (multiple carrier hubs, Chicago), Indianapolis (Abilene NOC)
Links: DTF Backplane (4x: 40 Gbps); OC-48 (2.5 Gb/s, Abilene); multiple 10 GbE (Qwest); multiple 10 GbE (I-WIRE dark fiber)
Solid lines in place and/or available in 2001; dashed I-WIRE lines planned for Summer 2002
StarLight: Int’l Optical Peering Point (see www.startap.net)
Source: Charlie Catlett, Argonne

Page 9: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Current Grid Challenges: Resource Discovery, Co-Scheduling, Transparency

Discovery and efficient co-scheduling of computing, data handling, and network resources
Effective, consistent replica management
Virtual Data: recomputation versus data transport decisions
Reduction of complexity in a “Petascale” world
“GA3”: Global Authentication, Authorization, Allocation
VDT: transparent access to results (and data when necessary)
Location independence of the user analysis, Grid, and Grid-development environments
Seamless multi-step data processing and analysis: DAGMan (Wisc), MOP+IMPALA (FNAL), as sketched below
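For the last item, a multi-step chain can be written as a Condor DAGMan DAG; the sketch below (submit-file names invented) shows a simulate -> digitize -> reconstruct dependency of the kind such production tools automate:

```python
# Sketch only: a three-step production chain expressed as a Condor DAGMan DAG.
# The submit-file names are invented; JOB and PARENT/CHILD are standard DAGMan
# keywords for declaring jobs and their dependencies.
dag = """\
JOB  sim   cmsim.sub
JOB  digi  digitize.sub
JOB  reco  orca_reco.sub
PARENT sim  CHILD digi
PARENT digi CHILD reco
"""

with open("production.dag", "w") as f:
    f.write(dag)

# Run with: condor_submit_dag production.dag
```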

Page 10: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

CMS Production: Event Simulation and Reconstruction
“Grid-Enabled”, Automated

Worldwide production at 12 sites: CERN, FNAL, Caltech, UCSD, UFL, Wisconsin, INFN, Moscow, IN2P3, Imperial College, Helsinki, Bristol
Common production tools (IMPALA); GDMP; Simulation and Digitization steps (with and without pile-up, “PU”)
Per-site status ranges from fully operational, to in progress, to not yet operational

Page 11: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

US CMS TeraGrid Seamless Prototype

Caltech/Wisconsin Condor/NCSA Production
Simple job launch from Caltech
  Authentication using the Globus Security Infrastructure (GSI)
  Resources identified using the Globus Information Infrastructure (GIS)
CMSIM jobs (batches of 100, 12-14 hours, 100 GB output) sent to the Wisconsin Condor flock using Condor-G
  Output files automatically stored in NCSA Unitree (GridFTP)
ORCA phase: read-in and process jobs at NCSA
  Output files automatically stored in NCSA Unitree
Future: multiple CMS sites; storage in Caltech HPSS also, using GDMP (with LBNL’s HRM)

Animated flow diagram of the DTF prototype: http://cmsdoc.cern.ch/~wisniew/infrastructure.html
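As an illustration of the Condor-G submission step described above, a minimal sketch follows; the gatekeeper host, wrapper script, and batch layout are invented placeholders, not the actual production configuration:

```python
# Hypothetical sketch of handing a batch of CMSIM jobs to the Wisconsin Condor
# flock via Condor-G; the gatekeeper host and script names are invented.
import subprocess
import textwrap

submit_description = textwrap.dedent("""\
    # Classic Condor-G submit description: jobs are routed through a Globus
    # gatekeeper into the remote Condor pool, authenticated with GSI.
    universe        = globus
    globusscheduler = gatekeeper.example.wisc.edu/jobmanager-condor
    executable      = run_cmsim.sh
    arguments       = $(Process)
    output          = cmsim_$(Process).out
    error           = cmsim_$(Process).err
    log             = cmsim_batch.log
    queue 100
    """)

with open("cmsim_batch.sub", "w") as f:
    f.write(submit_description)

# A valid GSI proxy (grid-proxy-init) is needed before submitting.
subprocess.run(["condor_submit", "cmsim_batch.sub"], check=True)
```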

Page 12: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Baseline BW for the US-CERN Link: HENP Transatlantic WG (DOE+NSF)

US-CERN plans: 155 Mbps to 2 x 155 Mbps this year; 622 Mbps in April 2002; DataTAG 2.5 Gbps research link in Summer 2002; 10 Gbps research link in ~2003

Transoceanic networking integrated with the TeraGrid, Abilene, regional nets, and continental network infrastructures in the US, Europe, Asia, and South America

Page 13: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Transatlantic Net WG (HN, L. Price): Bandwidth Requirements [*] (Mbps)

             2001    2002    2003    2004    2005    2006
CMS           100     200     300     600     800    2500
ATLAS          50     100     300     600     800    2500
BaBar         300     600    1100    1600    2300    3000
CDF           100     300     400    2000    3000    6000
D0            400    1600    2400    3200    6400    8000
BTeV           20      40     100     200     300     500
DESY          100     180     210     240     270     300
CERN BW   155-310     622    1250    2500    5000   10000

[*] Installed BW. Maximum link occupancy 50% assumed.
The network challenge is shared by both next- and present-generation experiments.

Page 14: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Internet2 HENP Networking WG [*]: Mission

To help ensure that the required
  national and international network infrastructures,
  standardized tools and facilities for high performance and end-to-end monitoring and tracking, and
  collaborative systems
are developed and deployed in a timely manner, and used effectively to meet the needs of the US LHC and other major HENP programs, as well as the general needs of our scientific community.

To carry out these developments in a way that is broadly applicable across many fields, within and beyond the scientific community.

[*] Co-Chairs: S. McKee (Michigan), H. Newman (Caltech); with thanks to R. Gardner and J. Williams (Indiana)

Page 15: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Grid R&D: Focal Areas for NPACI/HENP Partnership

Development of Grid-enabled user analysis environments
  CLARENS (+IGUANA) project for portable Grid-enabled event visualization, data processing and analysis
  Object integration: backed by an ORDBMS, and file-level Virtual Data Catalogs
Simulation toolsets for systems modeling and optimization; for example: the MONARC system
Globally scalable agent-based realtime information marshalling systems, to face the next-generation challenge of dynamic global Grid design and operations
  Self-learning (e.g. SONN) optimization
  Simulation (now-casting) enhanced: to monitor, track and forward-predict site, network and global system state
1-10 Gbps networking development and global deployment
  Work with the TeraGrid, STARLIGHT, Abilene, the iVDGL GGGOC, HENP Internet2 WG, Internet2 E2E, and DataTAG
Global collaboratory development: e.g. VRVS, Access Grid

Page 16: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

CLARENS: a Data Analysis Portal to the Grid: Steenberg (Caltech)

A highly functional graphical interface, Grid-enabling the working environment for “non-specialist” physicists’ data analysis
Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol. This ensures implementation independence.
The server is implemented in C++ to give access to the CMS OO analysis toolkit.
The server will provide a remote API to Grid tools:
  Security services provided by the Grid (GSI)
  The Virtual Data Toolkit: object collection access
  Data movement between Tier centers using GSI-FTP
  CMS analysis software (ORCA/COBRA)
Current prototype is running on the Caltech Proto-Tier2
More information at http://heppc22.hep.caltech.edu, along with a web-based demo
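A minimal illustration of the XML-RPC pattern Clarens relies on (the real server is the C++ one described above; this Python sketch with an invented catalog method only shows the client/server exchange):

```python
# Illustrative only: the Clarens server itself is C++; this shows the XML-RPC
# request/response pattern that keeps clients implementation-independent.
# The method name and the returned collection names are invented.
from threading import Thread
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def list_collections():
    # A real server would consult the analysis backend / virtual data catalog.
    return ["higgs_4mu_tags", "minbias_2001"]

server = SimpleXMLRPCServer(("localhost", 8080), logRequests=False, allow_none=True)
server.register_function(list_collections)
Thread(target=server.serve_forever, daemon=True).start()

client = ServerProxy("http://localhost:8080")
print(client.list_collections())   # -> ['higgs_4mu_tags', 'minbias_2001']
```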

Page 17: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Modeling and Simulation: MONARC System
SIMULATION of Complex Distributed Systems

Modelling and understanding current systems, their performance and limitations, is essential for the design of future large-scale distributed processing systems.

The simulation program developed within the MONARC (Models Of Networked Analysis At Regional Centers) project is based on a process-oriented approach to discrete event simulation. It is based on Java(TM) technology and provides a realistic modelling tool for such large-scale distributed systems.
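MONARC itself is a Java program; the sketch below is only a minimal Python/SimPy analogue of the process-oriented discrete-event approach, with invented job lengths and farm size:

```python
# Minimal Python/SimPy analogue of a process-oriented discrete-event model:
# jobs arrive at a regional centre, queue for a small CPU farm, and run.
import random
import simpy

rng = random.Random(42)

def job(env, name, farm, cpu_hours):
    arrived = env.now
    with farm.request() as slot:          # wait for a free CPU
        yield slot
        yield env.timeout(cpu_hours)      # "run" for the job's CPU time
    waited = env.now - arrived - cpu_hours
    print(f"{name}: waited {waited:.1f} h, finished at t = {env.now:.1f} h")

def job_source(env, farm):
    for i in range(10):
        env.process(job(env, f"job{i}", farm, cpu_hours=rng.uniform(2, 6)))
        yield env.timeout(rng.expovariate(1.0))   # roughly one new job per hour

env = simpy.Environment()
farm = simpy.Resource(env, capacity=3)            # a 3-CPU farm
env.process(job_source(env, farm))
env.run()
```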

Page 18: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

MONARC SONN: 3 Regional Centres Learning to Export Jobs (Day 9)

[Figure: three regional centres and their links at Day = 9]
CERN: 30 CPUs; CALTECH: 25 CPUs; NUST: 20 CPUs
Links: 1 MB/s, 150 ms RTT; 1.2 MB/s, 150 ms RTT; 0.8 MB/s, 200 ms RTT
Site efficiencies: <E> = 0.73, 0.66, 0.83
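The actual optimization is a self-organizing neural network; purely as an illustration of the decision it learns, here is a toy cost model (queue depths, link table and job size invented) that picks the centre where a job is expected to finish soonest:

```python
# Not the MONARC SONN itself; just a toy cost model for the same run-locally-or-
# export decision, seen from CALTECH. CPU counts match the figure above.
SITES = {            # name: (cpus, queued_jobs)
    "CALTECH": (25, 60),
    "CERN":    (30, 90),
    "NUST":    (20, 10),
}
LINKS_FROM_CALTECH = {       # name: (bandwidth MB/s, RTT s)
    "CERN": (1.2, 0.150),
    "NUST": (0.8, 0.200),
}

def expected_finish_hours(site, job_cpu_hours, input_mb):
    cpus, queued = SITES[site]
    wait = queued * job_cpu_hours / cpus                 # crude queue estimate
    transfer = 0.0
    if site != "CALTECH":
        bw, rtt = LINKS_FROM_CALTECH[site]
        transfer = (input_mb / bw + rtt) / 3600.0        # ship the input first
    return wait + transfer + job_cpu_hours

best = min(SITES, key=lambda s: expected_finish_hours(s, job_cpu_hours=4.0, input_mb=500.0))
print("run/export decision:", best)
```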

Page 19: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

TCP Protocol Study: Limits

We determined precisely:
  The parameters which limit the throughput over a high-BW, long-delay (170 msec) network
  How to avoid intrinsic limits and unnecessary packet loss

Methods used to improve TCP:
  Linux kernel programming in order to tune TCP parameters
  We modified the TCP algorithm
  A Linux patch will soon be available

Result: the current state of the art for reproducible throughput:
  125 Mbps between CERN and Caltech
  135 Mbps between CERN and Chicago

Status: ready for tests at higher BW (622 Mbps) in Spring 2002

Congestion window behavior of a TCP connection over the transatlantic line:
1) A packet is lost
2) Fast Recovery (temporary state to repair the loss); a new loss occurs
3) Back to slow start (Fast Recovery couldn’t repair the loss; the lost packet is detected by timeout, so the connection goes back to slow start with cwnd = 2 MSS)

Losses occur when the cwnd is larger than 3.5 MByte

[Chart: TCP performance between CERN and Caltech; throughput in Mbps (0-140) for connections 1-11, without tuning vs. with the SSTHRESH parameter tuned]

Maximizing US-CERN TCP Throughput (S. Ravot, Caltech)

Reproducible 125 Mbps between CERN and Caltech/CACR
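A quick cross-check of these figures (my own arithmetic, not from the slide): single-stream throughput is roughly cwnd/RTT, so the achieved 125 Mbps and the 3.5 MByte loss threshold are consistent over this path:

```python
# Back-of-envelope bandwidth-delay-product check for the 170 ms transatlantic path.
RTT = 0.170   # seconds

def window_mbytes_for(rate_mbps):
    """Congestion window (MBytes) needed to sustain a given rate."""
    return rate_mbps * 1e6 / 8 * RTT / 1e6

def rate_mbps_for(cwnd_mbytes):
    """Approximate throughput (Mbps) for a given congestion window."""
    return cwnd_mbytes * 1e6 * 8 / RTT / 1e6

print(f"window needed for 125 Mbps: {window_mbytes_for(125):.2f} MB")   # ~2.7 MB
print(f"ceiling at cwnd = 3.5 MB:   {rate_mbps_for(3.5):.0f} Mbps")     # ~165 Mbps
```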

Page 20: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Agent-Based Distributed System: JINI Prototype (Caltech/Pakistan)

Includes “Station Servers” (static) that host mobile “Dynamic Services”

Servers are interconnected dynamically to form a fabric in which mobile agents travel, with a payload of physics analysis tasks

Prototype is highly flexible and robust against network outages

Amenable to deployment on leading edge and future portable devices (WAP, iAppliances, etc.): “the” system for the travelling physicist

The design and studies with this prototype use the MONARC Simulator, and build on SONN studies

See http://home.cern.ch/clegrand/lia/

[Diagram: Station Servers and Lookup Services connected via proxy exchange, registration, service listener, lookup discovery service, and remote notification]

Page 21: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Globally Scalable Monitoring Service

[Diagram: an RCMonitorService and Farm Monitors register with Lookup Services and are discovered by clients (other services) via proxies; a component factory provides GUI marshaling, code transport, and RMI data access; data is gathered by push & pull, rsh & ssh existing scripts, and snmp]

Page 22: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Examples: GLAST meeting
10 participants connected via VRVS (and 16 participants in audio only)

VRVS: 7300 hosts; 4300 registered users in 58 countries; 34 reflectors, 7 in I2; annual growth 250%

US CMS will use the CDF/KEK remote control room concept for Fermilab Run II as a starting point. However, we will (1) expand the scope to encompass a US-based physics group and US LHC accelerator tasks, and (2) extend the concept to a Global Collaboratory for realtime data acquisition + analysis.

Page 23: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Next Round Grid Challenges: Global Workflow Monitoring, Management, and Optimization

Workflow management: balancing policy versus moment-to-moment capability to complete tasks
  Balance high levels of usage of limited resources against better turnaround times for priority jobs
  Goal-oriented, according to (yet to be developed) metrics
Maintaining a global view of resources and system state
  Global system monitoring, modeling, quasi-realtime simulation; feedback on the macro- and micro-scales
Adaptive learning: new paradigms for execution optimization and decision support (eventually automated)
Grid-enabled user environments

Page 24: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

PACI, TeraGrid and HENP

The scale, complexity and global extent of the LHC data analysis problem is unprecedented
The solution of the problem, using globally distributed Grids, is mission-critical for frontier science and engineering
HENP has a tradition of deploying new highly functional systems (and sometimes new technologies) to meet its technical and ultimately its scientific needs
HENP problems are mostly “embarrassingly” parallel, but potentially “overwhelming” in their data- and network-intensiveness
HENP/Computer Science synergy has increased dramatically over the last two years, focused on Data Grids
  Successful collaborations in GriPhyN, PPDG, EU Data Grid
The TeraGrid (present and future) and its development program is scoped at an appropriate level of depth and diversity to tackle the LHC and other “Petascale” problems, over a 5-year time span matched to the LHC schedule, with full operations in 2007

Page 25: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Some Extra Slides Follow

Page 26: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Computing Challenges: LHC Example

Geographical dispersion: of people and resources
Complexity: the detector and the LHC environment
Scale: tens of Petabytes per year of data

5000+ physicists, 250+ institutes, 60+ countries

Major challenges associated with:
  Communication and collaboration at a distance
  Network-distributed computing and data resources
  Remote software development and physics analysis
  R&D: new forms of distributed systems: Data Grids

Page 27: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Why Worldwide Computing? Regional Center Concept Goals

Managed, fair-shared access for physicists everywhere
Maximize total funding resources while meeting the total computing and data handling needs
Balance proximity of datasets to large central resources against regional resources under more local control: Tier-N Model
Efficient network use: higher throughput on short paths
  Local > regional > national > international
Utilizing all intellectual resources, in several time zones
  CERN, national labs, universities, remote sites
  Involving physicists and students at their home institutions
Greater flexibility to pursue different physics interests, priorities, and resource allocation strategies by region, and/or by common interests (physics topics, subdetectors, …)
Manage the system’s complexity: partitioning facility tasks, to manage and focus resources

Page 28: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

HENP Related Data Grid Projects

Funded projects:
  PPDG I        USA  DOE     $2M             1999-2001
  GriPhyN       USA  NSF     $11.9M + $1.6M  2000-2005
  EU DataGrid   EU   EC      €10M            2001-2004
  PPDG II (CP)  USA  DOE     $9.5M           2001-2004
  iVDGL         USA  NSF     $13.7M + $2M    2001-2006
  DataTAG       EU   EC      €4M             2002-2004

About to be funded project:
  GridPP*       UK   PPARC   >$15M?          2001-2004

Many national projects of interest to HENP:
  Initiatives in US, UK, Italy, France, NL, Germany, Japan, …
  EU networking initiatives (Géant, SURFNet)
  US Distributed Terascale Facility ($53M, 12 TFLOPS, 40 Gb/s network)

* = in final stages of approval

Page 29: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Network Progress and Issues for Major Experiments

Network backbones are advancing rapidly to the 10 Gbps range: “Gbps” end-to-end data flows will soon be in demand
These advances are likely to have a profound impact on the major physics experiments’ Computing Models
We need to work on the technical and political network issues
  Share technical knowledge of TCP: windows, multiple streams, OS kernel issues; provide a user toolset
Getting higher bandwidth to regions outside W. Europe and the US: China, Russia, Pakistan, India, Brazil, Chile, Turkey, etc., even to enable their collaboration
Advanced integrated applications, such as Data Grids, rely on seamless “transparent” operation of our LANs and WANs, with reliable, quantifiable (monitored), high performance
  Networks need to become part of the Grid(s) design
  New paradigms of network and system monitoring and use need to be developed, in the Grid context

Page 30: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

Grid-Related R&D Projects in CMS: Caltech, FNAL, UCSD, UWisc, UFl

Installation, configuration and deployment of prototype Tier2 centers at Caltech/UCSD and Florida
Large scale automated distributed simulation production
  DTF “TeraGrid” (micro-)prototype: CIT, Wisconsin Condor, NCSA
  Distributed Monte Carlo Production (MOP): FNAL
“MONARC” distributed systems modeling
  Simulation system applications to Grid hierarchy management
  Site configurations, analysis model, workload
  Applications to strategy development; e.g. inter-site load balancing using a “Self Organizing Neural Net” (SONN)
Agent-based system architecture for distributed dynamic services
Grid-enabled object oriented data analysis

Page 31: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

MONARC Simulation System Validation

[Figures: measurement vs. simulation for the CMS Proto-Tier1 production farm at FNAL (mean measured value ~48 MB/s) and the CMS farm at CERN; Jet <0.52>, Muon <0.90>]

Page 32: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

MONARC SONN: 3 Regional Centres Learning to Export Jobs (Day 0)

[Figure: the same three-centre configuration at Day = 0]
CERN: 30 CPUs; CALTECH: 25 CPUs; NUST: 20 CPUs
Links: 1 MB/s, 150 ms RTT; 1.2 MB/s, 150 ms RTT; 0.8 MB/s, 200 ms RTT

Page 33: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

US CMS Remote Control Room for LHC

Page 34: Harvey B. Newman, Caltech Advisory Panel on CyberInfrastructure   National Science Foundation

1. Using the local Tag event database, the user plots event parameters of interest
2. The user selects a subset of events to be fetched for further analysis
3. Lists of matching events are sent to Caltech and San Diego
4. Tier2 servers begin sorting through databases, extracting the required events
5. For each required event, a new large virtual object is materialized in the server-side cache; this object contains all tracks in the event
6. The database files containing the new objects are sent to the client using Globus FTP; the client adds them to its local cache of large objects
7. The user can now plot event parameters not available in the Tag
8. Future requests take advantage of previously cached large objects in the client
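A toy, self-contained paraphrase of this client-side flow (all classes, event counts, and the selection cut are invented; the dictionaries stand in for the Objectivity databases and the GSI-FTP transfers):

```python
# Toy paraphrase of the demo's client-side flow, following steps 1-8 above.
import random

class Tier2Server:
    def __init__(self, name, full_events):
        self.name = name
        self.full = full_events                       # event_id -> "large object"
    def materialize(self, event_ids):
        # Server-side cache fill: build the large objects for the requested ids.
        return {eid: self.full[eid] for eid in event_ids if eid in self.full}

rng = random.Random(0)
tags = {eid: {"pt": rng.uniform(0, 50)} for eid in range(1000)}   # small Tag objects (client)
servers = [Tier2Server("Caltech",  {eid: f"tracks-{eid}" for eid in range(0, 600)}),
           Tier2Server("SanDiego", {eid: f"tracks-{eid}" for eid in range(600, 1000)})]
client_cache = {}

# Steps 1-2: query the local Tag database and select events passing a cut
selected = [eid for eid, tag in tags.items() if tag["pt"] > 45]

# Steps 3-6: send the matching lists to the servers; "transfer" the large objects back
for srv in servers:
    client_cache.update(srv.materialize(selected))    # stands in for GSI-FTP files

# Steps 7-8: further plots use the locally cached large objects first
print(f"{len(client_cache)} large objects now cached on the client")
```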

“Tag” database of ~140,000 small objects (client); Full Event Databases of ~100,000 and ~40,000 large objects (the two Tier2 servers)

Bandwidth Greedy Grid-enabled Object Collection Analysis for Particle Physics (SC2001 Demo)

Julian Bunn, Ian Fisk, Koen Holtman, Harvey Newman, James Patton

Requests are sent to each Tier2 server; parallel tuned GSI FTP transfers return the object files to the client

The object of this demo is to show Grid-supported interactive physics analysis on a set of 144,000 physics events. Initially we start out with 144,000 small Tag objects, one for each event, on the Denver client machine. We also have 144,000 LARGE objects, containing full event data, divided over the two Tier2 servers.

http://pcbunn.cacr.caltech.edu/Tier2/Tier2_Overall_JJB.htm