
Page 1

The Promise of Computational Grids in the LHC Era

Paul Avery, University of Florida

Gainesville, Florida, USA

[email protected]
http://www.phys.ufl.edu/~avery/

CHEP 2000, Padova, Italy

Feb. 7-11, 2000

Page 2

LHC Computing Challenges
- Complexity of the LHC environment and the resulting data
- Scale: petabytes of data per year
- Geographical distribution of people and resources

Example: CMS, with 1800 physicists from 150 institutes in 32 countries

Page 3

Dimensioning / Deploying IT Resources
- LHC computing scale is "something new"
- Solution requires directed effort, new initiatives
- Solution must build on existing foundations
- Robust computing at national centers essential
- Universities must have resources to maintain intellectual strength, foster training, engage fresh minds
- Scarce resources are/will be a fact of life: plan for it
- Goal: get new resources, optimize deployment of all resources to maximize effectiveness
  - CPU: CERN / national lab / region / institution / desktop
  - Data: CERN / national lab / region / institution / desktop
  - Networks: international / national / regional / local

Page 4

Deployment Considerations
- Proximity of datasets to appropriate IT resources (a toy placement rule is sketched after this list)
  - Massive data: CERN & national labs
  - Data caches: regional centers
  - Mini-summary data: institutions
  - Micro-summary data: desktop
- Efficient use of network bandwidth: local > regional > national > international
- Utilizing all intellectual resources
  - CERN, national labs, universities, remote sites
  - Scientists, students
- Leverage training, education at universities
- Follow lead of commercial world: distributed data, web servers
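As a rough illustration of the proximity and bandwidth points above, the toy Python sketch below sends work to the most local tier that already holds the class of data it needs. The tier names and data classes come from this slide, but the code itself, and the idea of writing the rule this way, is only an illustration, not part of the talk.

```python
# Toy placement rule (illustrative only): run a job at the most local tier
# that already hosts the data class it needs, so traffic stays
# local > regional > national > international.

# Tiers ordered from most local to most central.
TIER_ORDER = ["desktop", "institution", "regional_center", "national_lab", "CERN"]

# Data class assumed to live at each tier, per the mapping on this slide.
DATA_AT_TIER = {
    "desktop": {"micro-summary"},
    "institution": {"mini-summary"},
    "regional_center": {"data-cache"},
    "national_lab": {"massive"},
    "CERN": {"massive"},
}

def place_job(required_data: str) -> str:
    """Return the most local tier that hosts the required data class."""
    for tier in TIER_ORDER:
        if required_data in DATA_AT_TIER[tier]:
            return tier
    raise ValueError(f"no tier hosts {required_data!r}")

if __name__ == "__main__":
    for data in ("micro-summary", "data-cache", "massive"):
        print(data, "->", place_job(data))
```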

Page 5

Solution: A Data Grid
- Hierarchical grid: best deployment option
  - Hierarchy: optimal resource layout (MONARC studies)
  - Grid: unified system
- Arrangement of resources
  - Tier 0: central laboratory computing resources (CERN)
  - Tier 1: national center (Fermilab / BNL)
  - Tier 2: regional computing center (university)
  - Tier 3: university group computing resources
  - Tier 4: individual workstation/CPU
- We call this arrangement a "Data Grid" to reflect the overwhelming role that data plays in deployment

Page 6

Layout of Resources
- Want good "impedance match" between Tiers
  - Tier N-1 serves Tier N
  - Tier N big enough to exert influence on Tier N-1
  - Tier N-1 small enough not to duplicate Tier N
- Resources roughly balanced across Tiers

(Figure: balance relations between adjacent tiers, comparing the summed Tier 1 resources with Tier 0 and the summed Tier 2 resources with Tier 1. Reasonable balance?)
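The "reasonable balance?" question can be checked roughly with the circa-2005 CPU capacities quoted later in this talk (CERN ~350k SI95, a Tier 1 center ~70k SI95, a Tier 2 center ~20k SI95). The sketch below does that arithmetic; the numbers of Tier 1 and Tier 2 centers are assumptions chosen for illustration, not figures from the talk.

```python
# Back-of-the-envelope balance check across tiers, using the capacities
# from the "US Model Circa 2005" slide later in this talk.

CERN_SI95 = 350_000      # Tier 0 (CERN, CMS/ATLAS)
TIER1_SI95 = 70_000      # one Tier 1 center (FNAL/BNL scale)
TIER2_SI95 = 20_000      # one Tier 2 center

N_TIER1 = 5              # assumed number of Tier 1 centers (illustrative)
N_TIER2_PER_TIER1 = 4    # assumed number of Tier 2 centers per Tier 1 (illustrative)

sum_tier1 = N_TIER1 * TIER1_SI95
sum_tier2 = N_TIER2_PER_TIER1 * TIER2_SI95

print(f"Sum of Tier 1 / Tier 0 = {sum_tier1 / CERN_SI95:.2f}")   # ~1.0
print(f"Sum of Tier 2 / Tier 1 = {sum_tier2 / TIER1_SI95:.2f}")  # ~1.1
```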

Page 7

Data Grid Hierarchy (Schematic)

(Figure: schematic of the hierarchy, with Tier 0 (CERN) at the top, Tier 1 centers below it, Tier 2 centers under each Tier 1, and Tier 3 and Tier 4 sites at the leaves.)

Page 8

US Model Circa 2005

(Figure: the US data grid circa 2005, with 2.4 Gbps, 622 Mbits/s, and N × 622 Mbits/s links between the tiers.)
- CERN (CMS/ATLAS): 350k SI95, 350 TBytes disk, robot
- Tier 1 (FNAL/BNL): 70k SI95, 70 TBytes disk, robot
- Tier 2 center: 20k SI95, 25 TBytes disk, robot
- Tier 3: university workgroups 1, 2, ..., M
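To give a feel for what the link speeds above imply, the sketch below estimates how long it would take to replicate a full Tier 2 disk cache (25 TBytes, from the figure) over the two wide-area speeds shown. The calculation ignores protocol overhead and contention and is purely illustrative, not part of the talk.

```python
# Illustrative transfer-time arithmetic for the link speeds in the figure.

def transfer_days(terabytes: float, link_mbps: float) -> float:
    """Days needed to move `terabytes` of data over a `link_mbps` Mbit/s link."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / 86_400

TIER2_CACHE_TB = 25  # Tier 2 disk capacity from the diagram above

for mbps in (622, 2_400):
    print(f"{TIER2_CACHE_TB} TB over {mbps} Mbit/s: "
          f"{transfer_days(TIER2_CACHE_TB, mbps):.1f} days")
```

The point of the exercise is the one made on the "Deployment Considerations" slide: frequently used data must already sit close to its users, since bulk replication over wide-area links takes days.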

Page 9

Data Grid Hierarchy (CMS)

(Figure: the CMS data grid hierarchy. Link bandwidths shown: ~PBytes/sec, ~100 MBytes/sec, ~2.4 Gbits/sec, ~622 Mbits/sec, and 1-10 Gbits/sec.)
- Tier 0: Online System; Offline Farm, ~20 TIPS; CERN Computer Center
- Tier 1: Fermilab, ~4 TIPS; France, Italy, and Germany Regional Centers
- Tier 2: Tier 2 centers, ~1 TIPS each
- Tier 3: institutes, ~0.25 TIPS each, with a physics data cache on the institute server
- Tier 4: workstations

- Bunch crossings every 25 nsec; 100 triggers per second; each event is ~1 MByte in size
- 1 TIPS = 25,000 SpecInt95; a PC (today) = 10-20 SpecInt95
- Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels; data for these channels is cached by the institute server
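The numbers on this slide fit together as simple arithmetic, worked out in the sketch below. The effective running time per year (10^7 seconds) is an assumed figure for illustration, not one from the talk.

```python
# Worked arithmetic from the numbers on the CMS hierarchy slide.

TRIGGER_RATE_HZ = 100       # triggers per second
EVENT_SIZE_MB = 1.0         # ~1 MByte per event
SECONDS_PER_YEAR = 1e7      # assumed effective running time per year (illustrative)

raw_rate_mb_s = TRIGGER_RATE_HZ * EVENT_SIZE_MB
print(f"Raw data rate: ~{raw_rate_mb_s:.0f} MBytes/sec")   # matches the ~100 MBytes/sec link
print(f"Raw data volume: ~{raw_rate_mb_s * SECONDS_PER_YEAR / 1e9:.0f} PByte/year")

# Capacity in familiar units: 1 TIPS = 25,000 SpecInt95; a PC (in 2000) = 10-20 SpecInt95.
OFFLINE_FARM_TIPS = 20
farm_si95 = OFFLINE_FARM_TIPS * 25_000
print(f"Offline farm: {farm_si95:,} SI95, i.e. roughly "
      f"{farm_si95 // 20:,} to {farm_si95 // 10:,} PCs of the day")
```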

Page 10

Why a Data Grid: Physical
- Unified system: all computing resources part of the grid
- Efficient resource use (manage scarcity)
  - Averages out spikes in usage
  - Resource discovery / scheduling / coordination truly possible
  - "The whole is greater than the sum of its parts"
- Optimal data distribution and proximity
  - Labs are close to the data they need
  - Users are close to the data they need
  - No data or network bottlenecks
- Scalable growth

Page 11

Why a Data Grid: Political
- Central lab cannot manage / help 1000s of users
  - Easier to leverage resources, maintain control, assert priorities regionally
- Cleanly separates functionality
  - Different resource types in different Tiers
  - Funding complementarity (NSF vs. DOE)
  - Targeted initiatives
- New IT resources can be added "naturally"
  - Additional matching resources at Tier 2 universities
  - Larger institutes can join, bringing their own resources
  - Tap into new resources opened by the IT "revolution"
- Broaden community of scientists and students
  - Training and education
  - Vitality of the field depends on the University / Lab partnership

Page 12

Tier 2 Regional Centers
- Possible model: CERN : National : Tier 2 = 1/3 : 1/3 : 1/3
- Complementary role to Tier 1 lab-based centers
  - Less need for 24×7 operation, hence lower component costs
  - Less production-oriented, so can respond to analysis priorities
  - Flexible organization, e.g. by physics goals, subdetectors
  - Variable fraction of resources available to outside users
- Range of activities includes
  - Reconstruction, simulation, physics analyses
  - Data caches / mirrors to support analyses
  - Production in support of the parent Tier 1
  - Grid R&D ...

(Figure: Tier 0 down through Tier 4, with more organization toward the top and more flexibility toward the bottom.)

Page 13

Distribution of Tier 2 Centers
- Tier 2 centers arranged regionally in the US model
  - Good networking connections to move data (caches)
  - Location independence of users always maintained
- Increases collaborative possibilities
  - Emphasis on training, involvement of students
- High-quality desktop environment for remote collaboration, e.g. the next-generation VRVS system

Page 14

Strawman Tier 2 Architecture

Linux farm of 128 nodes                  $0.30M
Sun data server with RAID array          $0.10M
Tape library                             $0.04M
LAN switch                               $0.06M
Collaborative infrastructure             $0.05M
Installation and infrastructure          $0.05M
Net connect to Abilene network           $0.14M
Tape media and consumables               $0.04M
Staff (ops and system support)*          $0.20M
Total estimated cost (first year)        $0.98M

Cost in succeeding years, for evolution, upgrade and ops: $0.68M

* 1.5 – 2 FTE support required per Tier 2. Physicists from institute also aid in support.
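As a quick consistency check, the line items in the table above add up to the quoted first-year total. The short script below (an illustration, not part of the talk) does the addition.

```python
# Consistency check of the first-year Tier 2 cost table (values in $M).

items = {
    "Linux farm (128 nodes)":           0.30,
    "Sun data server + RAID array":     0.10,
    "Tape library":                     0.04,
    "LAN switch":                       0.06,
    "Collaborative infrastructure":     0.05,
    "Installation and infrastructure":  0.05,
    "Net connect to Abilene network":   0.14,
    "Tape media and consumables":       0.04,
    "Staff (ops and system support)":   0.20,
}

total = sum(items.values())
print(f"First-year total: ${total:.2f}M")   # $0.98M, as quoted above
```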

Page 15

Strawman Tier 2 Evolution

                                2000                2005
Linux farm                      1,500 SI95          20,000 SI95*
Disks on CPUs                   4 TB                20 TB
RAID array                      1 TB                20 TB
Tape library                    1 TB                50-100 TB
LAN speed                       0.1-1 Gbps          10-100 Gbps
WAN speed                       155-622 Mbps        2.5-10 Gbps
Collaborative infrastructure    MPEG2 VGA           Realtime HDTV
                                (1.5-3 Mbps)        (10-20 Mbps)

RAID disk used for “higher availability” data

* Reflects lower Tier 2 component costs due to less demanding usage, e.g. simulation.
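The 2000-to-2005 strawman above implies growth factors of roughly an order of magnitude or more in most categories. The sketch below computes them, using the lower end wherever the table quotes a range; that choice is mine, not the talk's.

```python
# Growth factors implied by the 2000 -> 2005 Tier 2 strawman
# (lower bound used where the table gives a range).

strawman = {                       # (2000, 2005)
    "Linux farm (SI95)":  (1_500, 20_000),
    "Disks on CPUs (TB)": (4, 20),
    "RAID array (TB)":    (1, 20),
    "Tape library (TB)":  (1, 50),
    "WAN speed (Mbps)":   (155, 2_500),
}

for item, (y2000, y2005) in strawman.items():
    print(f"{item:20s} grows by a factor of ~{y2005 / y2000:.0f}")
```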

Page 16

The GriPhyN Project
- Joint project involving
  - US-CMS, US-ATLAS
  - LIGO (gravity wave experiment)
  - SDSS (Sloan Digital Sky Survey)
  - http://www.phys.ufl.edu/~avery/mre/
- Requesting funds from NSF to build the world's first production-scale grid(s)
  - Sub-implementations for each experiment
  - NSF pays for Tier 2 centers, some R&D, some networking
- Realization of a unified Grid system requires research
  - Many common problems for different implementations
  - Requires partnership with CS professionals

Page 17

R&D Foundations I
- Globus (Grid middleware)
  - Grid-wide services
  - Security
- Condor (see M. Livny's paper)
  - General language for service seekers / service providers
  - Resource discovery
  - Resource scheduling, coordination, (co)allocation
- GIOD (networked object databases)
- Nile (fault-tolerant distributed computing)
  - Java-based toolkit, running on CLEO

Page 18

R&D Foundations II
- MONARC
  - Construct and validate architectures
  - Identify important design parameters
  - Simulate an extremely complex, dynamic system
- PPDG (Particle Physics Data Grid)
  - DOE / NGI funded for 1 year
  - Testbed systems
  - Later program of work incorporated into GriPhyN

Page 19

The NSF ITR Initiative
- Information Technology Research program
  - Aimed at funding innovative research in IT
  - $90M in funds authorized
  - Maximum of $12.5M for a single proposal (5 years)
  - Requires extensive student support
- GriPhyN submitted a preproposal Dec. 30, 1999
  - Intend that ITR fund most of our Grid research program
  - Major costs for people, especially students / postdocs
  - Minimal equipment
  - Some networking
- Full proposal due April 17, 2000

Page 20

Summary of Data Grids and the LHC
- Develop an integrated distributed system, while meeting LHC goals
  - ATLAS/CMS: production, data-handling oriented
  - (LIGO/SDSS: computation, "commodity component" oriented)
- Build and test the regional center hierarchy
  - Tier 2 / Tier 1 partnership
  - Commission and test software, data handling systems, and data analysis strategies
- Build and test the enabling collaborative infrastructure
  - Focal points for student-faculty interaction in each region
  - Realtime high-resolution video as part of the collaborative environment
- Involve students at universities in building the data analysis, and in the physics discoveries at the LHC