
High Energy Physics and Data Grids


Page 1: High Energy Physics and Data Grids


High Energy Physics and Data Grids

Paul Avery, University of Florida
http://www.phys.ufl.edu/~avery/
[email protected]

US/UK Grid Workshop, San Francisco
August 4-5, 2001

Page 2: High Energy Physics and Data Grids


Essentials of High Energy Physics
Better name: "Elementary Particle Physics"

Science: elementary particles and the fundamental forces

Particles (three families of quarks and leptons):
  Quarks:  (u, d)  (c, s)  (t, b)
  Leptons: (e, νe)  (μ, νμ)  (τ, ντ)

Forces and their carriers:
  Strong: gluon
  Electroweak: photon, W, Z0
  Gravity: graviton

Goal: a unified theory of nature
  Unification of forces (Higgs, superstrings, extra dimensions, ...)
  Deep connections to the large-scale structure of the universe
  Large overlap with astrophysics, cosmology, nuclear physics

Page 3: High Energy Physics and Data Grids


HEP Short History + Frontiers
(Figure: timeline relating distance scale, energy, and time after the Big Bang to the milestones of the field.)

10^-10 m, ~10 eV, >300,000 yr: 1900... Quantum Mechanics, atomic physics
10^-15 m, MeV - GeV, ~3 min: 1940-50 Quantum Electrodynamics; 1950-65 nuclei, hadrons, symmetries, field theories
10^-16 m, >> GeV, ~10^-6 sec: 1965-75 quarks, gauge theories; 1970-83 SPS, electroweak unification, QCD
10^-18 m, ~100 GeV, ~10^-10 sec: 1990 LEP, 3 families, precision electroweak; 1994 Tevatron, top quark
10^-19 m, ~10^2 GeV, ~10^-12 sec: the next step... 2007 LHC, Higgs? supersymmetry? origin of masses
10^-32 m, ~10^16 GeV, ~10^-32 sec: proton decay (underground)? Grand Unified Theories?
10^-35 m, ~10^19 GeV (Planck scale), ~10^-43 sec: quantum gravity? superstrings? the origin of the universe

Page 4: High Energy Physics and Data Grids


HEP Research
Experiments are primarily accelerator based
  Fixed target, colliding beams, special beams
Detectors
  Small, large, general purpose, special purpose
... but a wide variety of other techniques
  Cosmic rays, proton decay, g-2, neutrinos, space missions
Increasing scale of experiments and laboratories
  Forced on us by ever higher energies
  Complexity, scale, costs -> large collaborations
  International collaborations are the norm today
  Global collaborations are the future (LHC)

LHC discussed in next few slides

Page 5: High Energy Physics and Data Grids


The CMS Collaboration
1809 physicists and engineers, 31 countries, 144 institutions

Number of scientists:    Member States 1010, Non-Member States 448, USA 351, Total 1809
Number of laboratories:  Member States 58, Non-Member States 36, USA 50, Total 144
Associated institutes:   365

Participating countries: Armenia, Austria, Belarus, Belgium, Bulgaria, China, China (Taiwan), Croatia, Cyprus, Estonia, Finland, France, Georgia, Germany, Greece, Hungary, India, Italy, Korea, Pakistan, Poland, Portugal, Russia, Slovak Republic, Spain, Switzerland, Turkey, UK, Ukraine, USA, Uzbekistan, plus CERN.

Page 6: High Energy Physics and Data Grids


CERN LHC site
(Figure: aerial view of the LHC ring showing the four experiments: CMS, ATLAS, LHCb, ALICE.)

Page 7: High Energy Physics and Data Grids


High Energy Physics at the LHC
The "Compact" Muon Solenoid at the LHC (CERN)
(Figure: CMS detector drawing, with a Smithsonian "standard man" for scale.)

Page 8: High Energy Physics and Data Grids


Collisions at LHC (2007?)

Particles: protons (partons: quarks, gluons)
Bunches/beam: 2835
Protons/bunch: 10^11
Beam energy: 7 TeV (7 x 10^12 eV)
Luminosity: 10^34 cm^-2 s^-1
Crossing rate: 40 MHz (every 25 nsec), average ~20 collisions per crossing
Collision rate: ~10^9 Hz
New physics rate: ~10^-5 Hz
Selection: 1 in 10^13

(Figure: a bunch crossing in which two partons interact, e.g. Higgs -> Z0 Z0 -> e+ e- e+ e-, or SUSY signatures with leptons and jets.)
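The quoted rates hang together with simple arithmetic. Below is a minimal back-of-the-envelope check in Python, assuming an inelastic proton-proton cross-section of roughly 100 mb (an assumed number, not given on the slide):

```python
# Rough consistency check of the LHC collision numbers quoted above.
LUMINOSITY = 1e34        # cm^-2 s^-1, from the slide
SIGMA_INELASTIC = 1e-25  # cm^2 (~100 mb), assumed inelastic pp cross-section
CROSSING_RATE = 40e6     # Hz, one bunch crossing every 25 ns
SELECTION = 1e-13        # 1 interesting event in 10^13

collision_rate = LUMINOSITY * SIGMA_INELASTIC   # ~1e9 Hz, matching the slide
pileup = collision_rate / CROSSING_RATE         # ~25 collisions/crossing (~20 quoted)
selected_rate = collision_rate * SELECTION      # ~1e-4 Hz: a handful of events per day

print(f"collision rate ~{collision_rate:.0e} Hz, "
      f"pile-up ~{pileup:.0f} per crossing, selected ~{selected_rate:.0e} Hz")
```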

Page 9: High Energy Physics and Data Grids


HEP Data
Scattering is the principal technique for gathering data
  Collisions of beam-beam or beam-target particles
  Typically caused by a single elementary interaction
  But also background collisions that obscure the physics

Each collision generates many particles: an "event"
  Particles traverse the detector, leaving an electronic signature
  Information is collected and put into mass storage (tape)
  Each event is independent -> trivial computational parallelism

Data intensive science
  Size of raw event record: 20 KB - 1 MB
  10^6 - 10^9 events per year
  0.3 PB per year (2001): BaBar (SLAC)
  1 PB per year (2005): CDF, D0 (Fermilab)
  5 PB per year (2007): ATLAS, CMS (LHC)
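Because each event is independent, event processing is embarrassingly parallel. A minimal Python sketch of that pattern follows; reconstruct() and the toy event payloads are illustrative placeholders, not real experiment code:

```python
# Sketch of the trivial event-level parallelism described above: events share no
# state, so a pool of workers can process them with no coordination. The same
# pattern scales from one machine to a farm or a Grid of computing centers.
from multiprocessing import Pool

def reconstruct(raw_event: bytes) -> dict:
    """Stand-in for per-event reconstruction (tracking, clustering, ...)."""
    return {"size_bytes": len(raw_event), "first_byte": raw_event[0]}

def process_run(raw_events, n_workers=8):
    with Pool(n_workers) as pool:
        return pool.map(reconstruct, raw_events)   # plain map: no inter-event dependencies

if __name__ == "__main__":
    toy_events = [bytes([i % 256]) * 20_000 for i in range(1000)]  # ~20 KB toy "events"
    summaries = process_run(toy_events)
    print(len(summaries), "events processed")
```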

Page 10: High Energy Physics and Data Grids


Data Rates: From Detector to Storage

Detector output: 40 MHz, ~1000 TB/sec
After Level 1 Trigger (special hardware): 75 kHz, 75 GB/sec
After Level 2 Trigger (commodity CPUs): 5 kHz, 5 GB/sec
After Level 3 Trigger (commodity CPUs, physics filtering): 100 Hz, 100 MB/sec raw data to storage
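The implied rejection factors can be read straight off these numbers; a short Python sketch of the arithmetic:

```python
# Event-rate reduction through the trigger cascade quoted above.
stages = [
    ("detector output", 40e6, 1000e12),   # rate in Hz, throughput in bytes/s
    ("after Level 1",   75e3,   75e9),
    ("after Level 2",    5e3,    5e9),
    ("after Level 3",    1e2,  100e6),
]
for (prev, prev_rate, _), (name, rate, bw) in zip(stages, stages[1:]):
    print(f"{prev} -> {name}: rejection ~{prev_rate / rate:.0f}x, {bw / 1e6:.0f} MB/s out")

overall = stages[0][1] / stages[-1][1]
print(f"overall: {overall:.0f}x fewer events (40 MHz down to 100 Hz)")
```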

Page 11: High Energy Physics and Data Grids


LHC Data Complexity
"Events" resulting from beam-beam collisions:
  The signal event is obscured by ~20 overlapping, uninteresting collisions in the same crossing
  CPU time does not scale from previous generations (2000 -> 2007)

Page 12: High Energy Physics and Data Grids


Example: Higgs Decay into 4 Muons
  All charged tracks with pt > 2 GeV
  Reconstructed tracks with pt > 25 GeV (+30 minimum bias events)
  40M events/sec, selectivity: 1 in 10^13

Page 13: High Energy Physics and Data Grids


LHC Computing Challenges
  Complexity of the LHC environment and the resulting data
  Scale: petabytes of data per year (100 PB by ~2010); millions of SpecInt95s of CPU
  Geographical distribution of people and resources: 1800 physicists, 150 institutes, 32 countries

Page 14: High Energy Physics and Data Grids


Transatlantic Net WG (HN, L. Price)
Tier0 - Tier1 BW requirements [*]

              2001   2002   2003   2004   2005   2006
  CMS          100    200    300    600    800   2500
  ATLAS        100    200    300    600    800   2500
  BaBar        300    600   1100   1600   2300   3000
  CDF          600   1200   1600   2000   3000   4000
  D0           600   1200   1600   2000   3000   4000
  BTeV          20     40    100    200    300    500
  DESY         100    180    210    240    270    300
  CERN BW  155-310    622   1250   2500   5000  10000

[*] Installed BW in Mbps. Maximum link occupancy 50%; work in progress.

Page 15: High Energy Physics and Data Grids


Hoffmann LHC Computing Report 2001
Tier0 - Tier1 link requirements

(1) Tier1 - Tier0 data flow for analysis: 0.5 - 1.0 Gbps
(2) Tier2 - Tier0 data flow for analysis: 0.2 - 0.5 Gbps
(3) Interactive collaborative sessions (30 peak): 0.1 - 0.3 Gbps
(4) Remote interactive sessions (30 flows peak): 0.1 - 0.2 Gbps
(5) Individual (Tier3 or Tier4) data transfers (limit to 10 flows of 5 MB/sec each): 0.8 Gbps

TOTAL per Tier0 - Tier1 link: 1.7 - 2.8 Gbps

Corresponds to ~10 Gbps baseline BW installed on the US-CERN link
Adopted by the LHC experiments (Steering Committee report)
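As a quick check, the quoted total is simply the sum of the five components; a one-screen Python sketch:

```python
# Summing the per-link components from the Hoffmann report table above (Gbps).
components = {
    "Tier1 - Tier0 analysis flow":      (0.5, 1.0),
    "Tier2 - Tier0 analysis flow":      (0.2, 0.5),
    "interactive collaborative":        (0.1, 0.3),
    "remote interactive sessions":      (0.1, 0.2),
    "individual Tier3/Tier4 transfers": (0.8, 0.8),
}
low = sum(lo for lo, hi in components.values())
high = sum(hi for lo, hi in components.values())
print(f"total per Tier0 - Tier1 link: {low:.1f} - {high:.1f} Gbps")  # 1.7 - 2.8 Gbps
```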

Page 16: High Energy Physics and Data Grids


LHC Computing Challenges
Major challenges associated with:
  Scale of the computing systems
  Network-distribution of computing and data resources
  Communication and collaboration at a distance
  Remote software development and physics analysis

Result of these considerations: Data Grids

Page 17: High Energy Physics and Data Grids


Global LHC Data Grid Hierarchy

Tier0: CERN
Tier1: National laboratory
Tier2: Regional center (university, etc.)
Tier3: University workgroup
Tier4: Workstation

(Figure: the Tier 0 center at CERN feeds Tier 1 centers, each of which serves several Tier 2 centers, with Tier 3 workgroups and Tier 4 workstations below them.)

Key ideas:
  Hierarchical structure
  Tier2 centers
  Operate as a unified Grid

Page 18: High Energy Physics and Data Grids


Example: CMS Data Grid

(Figure: the CMS computing hierarchy and its data flows.)

Online system: bunch crossing every 25 nsec; 100 triggers per second; each event ~1 MByte; ~PBytes/sec off the detector, ~100 MBytes/sec into the CERN computer center
Tier 0 +1: CERN computer center, > 20 TIPS
Tier 1 (2.5 Gbits/sec links): France, Italy, UK, USA centers
Tier 2 (~622 Mbits/sec links): Tier2 centers
Tier 3 (100 - 1000 Mbits/sec): institutes at ~0.25 TIPS each, with physics data caches; each institute has ~10 physicists working on one or more analysis "channels"
Tier 4: workstations, other portals

Experiment CERN/outside resource ratio ~1:2; Tier0 : (sum of Tier1) : (sum of Tier2) ~ 1:1:1
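To get a feel for why multi-Gbps links matter at these data volumes, here is an illustrative Python sketch using the nominal link speeds from the figure and the 50% occupancy rule of thumb from the Transatlantic Net WG table; the 1 PB dataset size is just an example, not a number from this slide:

```python
# How long it takes to move 1 PB over the nominal tier-to-tier links shown above.
TIER_LINKS_MBPS = {
    "Tier0 -> Tier1": 2500,   # 2.5 Gbits/sec
    "Tier1 -> Tier2": 622,    # ~622 Mbits/sec
    "Tier2 -> Tier3": 1000,   # 100 - 1000 Mbits/sec (upper end)
}

def transfer_days(petabytes, link_mbps, occupancy=0.5):
    """Days to move `petabytes` over a link used at the given occupancy fraction."""
    bits = petabytes * 1e15 * 8
    return bits / (link_mbps * 1e6 * occupancy) / 86400

for link, mbps in TIER_LINKS_MBPS.items():
    print(f"{link}: ~{transfer_days(1.0, mbps):.0f} days per PB at 50% occupancy")
```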

Page 19: High Energy Physics and Data Grids


Tier1 and Tier2 Centers

Tier1 centers
  National laboratory scale: large CPU, disk, tape resources
  High speed networks
  Many personnel with broad expertise
  Central resource for a large region

Tier2 centers
  New concept in the LHC distributed computing hierarchy
  Size ~ [national lab x university]^1/2
  Based at a large university or small laboratory
  Emphasis on small staff, simple configuration & operation

Tier2 role
  Simulations, analysis, data caching
  Serve a small country, or a region within a large country
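The sizing rule above is a geometric mean: a Tier2 is meant to sit between a national-lab Tier1 and a university workgroup in capacity. A tiny illustration, with purely hypothetical CPU numbers:

```python
# Geometric-mean sizing heuristic for a Tier2 center; the SpecInt95 totals below
# are invented placeholders, not figures from the talk.
from math import sqrt

tier1_capacity = 300_000      # hypothetical national-lab scale (SpecInt95)
university_capacity = 3_000   # hypothetical university workgroup scale (SpecInt95)

tier2_capacity = sqrt(tier1_capacity * university_capacity)
print(f"Tier2 ~ sqrt(Tier1 x university) = {tier2_capacity:,.0f} SpecInt95")  # ~30,000
```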

Page 20: High Energy Physics and Data Grids


LHC Tier2 Center (2001)
(Figure: prototype Tier2 hardware layout: a router to the WAN, a Gigabit Ethernet switch feeding several Fast Ethernet switches of commodity nodes, and a data server with RAID storage (>1) and tape attached via a hi-speed channel.)

Page 21: High Energy Physics and Data Grids


Hardware Cost Estimates

Buy late, but not too late: phased implementation
  R&D phase 2001-2004; implementation phase 2004-2007
  R&D to develop capabilities and the computing model itself
  Prototyping at increasing scales of capability & complexity

(Figure: hardware cost estimates, annotated with component timescales of 1.1, 1.2, 1.4, and 2.1 years.)

Page 22: High Energy Physics and Data Grids


HEP Related Data Grid Projects

Funded projects
  GriPhyN (USA): NSF, $11.9M + $1.6M
  PPDG I (USA): DOE, $2M
  PPDG II (USA): DOE, $9.5M
  EU DataGrid (EU): $9.3M

Proposed projects
  iVDGL (USA): NSF, $15M + $1.8M + UK
  DTF (USA): NSF, $45M + $4M/yr
  DataTag (EU): EC, $2M?
  GridPP (UK): PPARC, > $15M

Other national projects
  UK e-Science (> $100M for 2001-2004)
  Italy, France, (Japan?)

Page 23: High Energy Physics and Data Grids


(HEP Related) Data Grid Timeline
(Figure: milestones laid out by quarter, Q2 2000 through Q3 2001.)

  Outline of US-CMS Tier plan
  Submit GriPhyN proposal, $12.5M
  Caltech-UCSD install prototype Tier2
  GriPhyN approved, $11.9M + $1.6M
  Submit iVDGL preproposal
  EU DataGrid approved, $9.3M
  1st Grid coordination meeting
  Submit PPDG proposal, $12M
  Submit DTF proposal, $45M
  Submit DataTAG proposal, $2M
  Submit iVDGL proposal, $15M
  PPDG approved, $9.5M
  2nd Grid coordination meeting
  iVDGL approved? DTF approved? DataTAG approved

Page 24: High Energy Physics and Data Grids


Coordination Among Grid Projects

Particle Physics Data Grid (US, DOE)
  Data Grid applications for HENP
  Funded 1999, 2000 ($2M); funded 2001-2004 ($9.4M)
  http://www.ppdg.net/

GriPhyN (US, NSF)
  Petascale Virtual-Data Grids
  Funded 9/2000 - 9/2005 ($11.9M + $1.6M)
  http://www.griphyn.org/

European Data Grid (EU)
  Data Grid technologies, EU deployment
  Funded 1/2001 - 1/2004 ($9.3M)
  http://www.eu-datagrid.org/

HEP in common; focus on infrastructure development & deployment; international scope
Now developing a joint coordination framework (GridPP, DTF, iVDGL very soon?)

Page 25: High Energy Physics and Data Grids


Data Grid Management

Page 26: High Energy Physics and Data Grids


PPDG
(Figure: PPDG at the hub, coordinating the data management efforts of BaBar, D0, CDF, Nuclear Physics, CMS and ATLAS with the Globus, SRB, Condor and HENPGC teams and their user communities.)

Page 27: High Energy Physics and Data Grids


EU DataGrid Project

Work Package   Title                                     Lead contractor
WP1            Grid Workload Management                  INFN
WP2            Grid Data Management                      CERN
WP3            Grid Monitoring Services                  PPARC
WP4            Fabric Management                         CERN
WP5            Mass Storage Management                   PPARC
WP6            Integration Testbed                       CNRS
WP7            Network Services                          CNRS
WP8            High Energy Physics Applications          CERN
WP9            Earth Observation Science Applications    ESA
WP10           Biology Science Applications              INFN
WP11           Dissemination and Exploitation            INFN
WP12           Project Management                        CERN

Page 28: High Energy Physics and Data Grids


PPDG and GriPhyN Projects

PPDG focuses on today's (evolving) problems in HENP
  Current HEP: BaBar, CDF, D0
  Current NP: RHIC, JLAB
  Future HEP: ATLAS, CMS

GriPhyN focuses on tomorrow's solutions
  ATLAS, CMS, LIGO, SDSS
  Virtual data, "petascale" problems (petaflops, petabytes)
  Toolkit, export to other disciplines, outreach/education

Both emphasize
  Application science drivers
  CS/application partnership (reflected in funding)
  Performance

Explicitly complementary

Page 29: High Energy Physics and Data Grids


PPDG Multi-site Cached File Access System

(Figure: a primary site (data acquisition, tape, CPU, disk, robot) connected to several satellite sites (tape, CPU, disk, robot) and universities (CPU, disk, users).)

Services: resource discovery, matchmaking, co-scheduling/queueing, tracking/monitoring, problem trapping + resolution
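The services listed above boil down to decisions like "which site should serve this file request?". The following is an illustrative sketch of such a matchmaking step, not PPDG code; the site names and cost model are invented:

```python
# Toy matchmaking for a multi-site cached file access: prefer a site that already
# caches the requested file, otherwise fall back to the least-loaded tape-capable site.
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    has_tape: bool
    load: float                               # 0.0 idle .. 1.0 saturated
    cached_files: set = field(default_factory=set)

def match_site(filename, sites):
    cached = [s for s in sites if filename in s.cached_files]
    candidates = cached or [s for s in sites if s.has_tape]
    return min(candidates, key=lambda s: s.load)   # least-loaded candidate wins

sites = [
    Site("primary site", True, 0.9, {"run42.raw"}),
    Site("satellite-1",  True, 0.4),
    Site("university-A", False, 0.1, {"run42.raw"}),
]
print(match_site("run42.raw", sites).name)  # university-A: cached copy, lowest load
print(match_site("run43.raw", sites).name)  # satellite-1: not cached anywhere, has tape
```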

Page 30: High Energy Physics and Data Grids


GriPhyN: PetaScale Virtual-Data Grids (~1 Petaflop, ~100 Petabytes)

(Figure: production teams, individual investigators and workgroups use interactive user tools backed by virtual data tools, request planning & scheduling tools, and request execution & management tools. These rely on resource management services, security and policy services, and other Grid services, which manage the distributed resources (code, storage, CPUs, networks), raw data sources, and transforms.)

Page 31: High Energy Physics and Data Grids


Virtual Data in Action

A data request may
  Compute locally
  Compute remotely
  Access local data
  Access remote data

Scheduling based on
  Local policies
  Global policies
  Cost

(Figure: an item request moving across local facilities and caches, regional facilities and caches, and major facilities and archives.)
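A minimal sketch of that decision, assuming an invented cost model (none of these numbers come from the talk):

```python
# Toy planner for a virtual-data request: choose the cheapest policy-allowed way to
# satisfy it, whether that means fetching an existing copy or re-deriving the product.
def plan_request(product, options, policy_allows):
    allowed = [(action, cost) for action, cost in options if policy_allows(action)]
    action, cost = min(allowed, key=lambda pair: pair[1])
    return f"{product}: {action} (estimated cost {cost})"

options = [
    ("access local cached copy",    1),   # cheapest when the product is already cached
    ("access remote archive copy", 20),   # pay for the network transfer
    ("recompute remotely",         35),   # remote CPU, ship only the (smaller) result
    ("recompute locally",          50),   # local CPU to re-run the transformation
]
print(plan_request("higgs_4mu_histograms", options, policy_allows=lambda a: True))
```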

Page 32: High Energy Physics and Data Grids


GriPhyN Goals for Virtual Data

Transparency with respect to location
  Caching and catalogs in a large-scale, high-performance Data Grid

Transparency with respect to materialization
  Exact specification of algorithm components
  Traceability of any data product
  Cost of storage vs. CPU vs. networks

Automated management of computation
  Issues of scale, complexity, transparency
  Complications: calibrations, data versions, software versions, ...

Explore the concept of virtual data and its applicability to data-intensive science
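One way to picture "transparency with respect to materialization" is that every data product carries a record of exactly how it can be (re)derived. The structure below is a hedged illustration of that idea, not the actual GriPhyN catalog schema; all names are hypothetical:

```python
# Illustrative bookkeeping for virtual data: a product is traceable to a versioned
# transformation and its inputs, so it can be re-materialized on demand.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Transformation:
    program: str            # algorithm component, pinned exactly
    version: str
    calibration_tag: str    # calibrations and software versions are part of the identity

@dataclass
class Derivation:
    product: str
    inputs: Tuple[str, ...]
    transformation: Transformation
    materialized_at: Optional[str] = None   # None means "virtual": derivable on request

reco = Transformation("example_reco", "v3.2.1", calibration_tag="2001-07")
aod = Derivation("run42.aod", ("run42.raw",), reco)

# A request for run42.aod can be satisfied by locating a cached copy or by re-running
# the recorded transformation on run42.raw, whichever is cheaper.
print(aod)
```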

Page 33: High Energy Physics and Data Grids


Data Grid Reference Architecture

(Figure: layered architecture, from top to bottom.)

Application: discipline-specific Data Grid applications
Collective: request planning services, request management services, replica selection services, replica management services, community authorization service, online certificate repository, information services, co-allocation services, distributed catalog services, consistency management services, system monitoring services, resource brokering services, usage accounting services
Resource: storage management protocol, compute management protocol, network management protocol, catalog management protocol, code management protocol, service registration protocol, enquiry protocol
Connectivity: communication, service discovery (DNS), authentication, delegation
Fabric: storage systems, compute systems, networks, catalogs, code repositories