
Page 1: Computing for Hall D


Computing for Hall D

Ian Bird

Hall D Collaboration Meeting
March 22, 2002

Page 2: Computing for Hall D

Data volume per experiment per year (raw data, in units of 10^9 bytes)

[Chart: raw data volume per experiment vs. year, roughly 100 to 1,000,000 x 10^9 bytes on a log scale, 1980–2010. Experiments shown: E691, E665, E769, E791, E831, E871, KTeV, NA48, ALEPH, ZEUS, CDF/D0, BaBar, JLAB, STAR/PHENIX, CMS/ATLAS.]

But: collaboration sizes!

Page 3: Computing for Hall D


Technologies

• Technologies are advancing rapidly
  – Compute power
  – Storage – tape and disk
  – Networking

• What will be available 5 years from now?
  – Difficult to predict – but it will not be a problem to provide any of the resources that Hall D will need…
  – E.g. computing:

Page 4: Computing for Hall D


Intel Linux Farm

• First purchases: 9 duals per 24” rack
• FY00: 16 duals (2u) + 500 GB cache (8u) per 19” rack
• FY01: 4 CPUs per 1u
• Recently: 5 TB IDE cache disk (5 x 8u) per 19” rack

Page 5: Computing for Hall D


Compute power

• Blades
  – Low-power chips
    • Transmeta, Intel
  – Hundreds in a single rack

• “An RLX System 300ex chassis holds twenty-four ServerBlade 800i units in a single 3U chassis. This density achievement packs 336 independent servers into a single 42U rack, delivering 268,800 MHz, over 27 terabytes of disk storage, and a whopping 366 gigabytes of DDR memory.”
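The density figures in the quote follow from simple chassis arithmetic; a quick check (assuming 800 MHz per ServerBlade 800i, as the model name and the quoted total imply):

```python
# Sanity check of the quoted RLX blade-density figures.
# Assumptions: a 42U rack filled with 3U chassis, 800 MHz per blade.
blades_per_chassis = 24            # ServerBlade 800i units per System 300ex chassis
chassis_per_rack = 42 // 3         # 3U chassis fitting in a 42U rack -> 14
servers_per_rack = blades_per_chassis * chassis_per_rack
print(servers_per_rack)            # 336 independent servers
print(servers_per_rack * 800)      # 268800 MHz aggregate clock
```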

Page 6: Computing for Hall D


Technologies

• As well as computing, storage and networking technologies will also advance rapidly

• Grid computing techniques will bring these technologies together

• Facilities – new Computer Center planned

• Issues will not be technology, but:
  – How to use them intelligently
  – The Hall D computing model
  – People
  – Treating computing seriously enough to assign sufficient resources

Page 7: Computing for Hall D


(Data-) Grid Computing

Page 8: Computing for Hall D


Particle Physics Data Grid Collaboratory Pilot

Who we are:
Four leading Grid computer science projects
and
Six international high energy and nuclear physics collaborations

What we do:
Develop and deploy Grid services for our experiment collaborators
and
Promote and provide common Grid software and standards

The problem at hand today:
Petabytes of storage, Teraops/s of computing,
Thousands of users, Hundreds of institutions,
10+ years of analysis ahead

Page 9: Computing for Hall D


PPDG Experiments

ATLAS – A Toroidal LHC ApparatuS at CERN – Runs 2006 on
Goals: TeV physics – the Higgs and the origin of mass …
http://atlasinfo.cern.ch/Atlas/Welcome.html

BaBar – at the Stanford Linear Accelerator Center – Running now
Goals: study CP violation and more
http://www.slac.stanford.edu/BFROOT/

CMS – the Compact Muon Solenoid detector at CERN – Runs 2006 on
Goals: TeV physics – the Higgs and the origin of mass …
http://cmsinfo.cern.ch/Welcome.html/

D0 – at the D0 colliding beam interaction region at Fermilab – Runs soon
Goals: learn more about the top quark, supersymmetry, and the Higgs
http://www-d0.fnal.gov/

STAR – Solenoidal Tracker At RHIC at BNL – Running now
Goals: quark-gluon plasma …
http://www.star.bnl.gov/

Thomas Jefferson National Accelerator Facility (JLab) – Running now
Goals: understanding the nucleus using electron beams …
http://www.jlab.org/

Page 10: Computing for Hall D


PPDG Computer Science Groups

Condor – develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing on large collections of computing resources with distributed ownership.

http://www.cs.wisc.edu/condor/

Globus – developing fundamental technologies needed to build persistent environments that enable software applications to integrate instruments, displays, and computational and information resources managed by diverse organizations in widespread locations.

http://www.globus.org/

SDM – Scientific Data Management Research Group – optimized and standardized access to storage systems.

http://gizmo.lbl.gov/DM.html

Storage Resource Broker – client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and for cataloging and accessing replicated data sets.

http://www.npaci.edu/DICE/SRB/index.html

Page 11: Computing for Hall D


Delivery of End-to-End Applications & Integrated Production Systems

to allow thousands of physicists to share data & computing resources for scientific processing and analyses

PPDG Focus:

- Robust Data Replication

- Intelligent Job Placement and Scheduling

- Management of Storage Resources

- Monitoring and Information of Global Services

Relies on Grid infrastructure:
- Security & Policy
- High Speed Data Transfer
- Network management

Resources: Computers, Storage, Networks

Operators & Users

Page 12: Computing for Hall D


Project Activities, End-to-End Applicationsand Cross-Cut Pilots

Project Activities are focused, collaborative developments between an experiment and a computer science group.

- Replicated data sets for science analysis – BaBar, CMS, STAR
- Distributed Monte Carlo production services – ATLAS, D0, CMS
- Common storage management and interfaces – STAR, JLAB

End-to-End Applications are used in experiment data handling systems to give real-world requirements, testing, and feedback.

- Error reporting and response
- Fault-tolerant integration of complex components

Cross-Cut Pilots for common services and policies:

- Certificate Authority policy and authentication
- File transfer standards and protocols
- Resource monitoring – networks, computers, storage

Page 13: Computing for Hall D


Year 0.5-1 Milestones (1)

Align milestones to Experiment data challenges:

– ATLAS – production distributed data service – 6/1/02

– BaBar – analysis across partitioned dataset storage – 5/1/02

– CMS – Distributed simulation production – 1/1/02

– D0 – distributed analyses across multiple workgroup clusters – 4/1/02

– STAR – automated dataset replication – 12/1/01

– JLAB – policy driven file migration – 2/1/02 (see the sketch below)
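To illustrate the last milestone (JLAB policy-driven file migration), the sketch below shows the general shape of a policy loop that migrates idle files from a disk cache to a tape-backed store. It is a minimal, hypothetical example: the paths and the 30-day idle policy are invented and do not describe the actual JLab implementation.

```python
import os
import shutil
import time

# Hypothetical sketch of one policy-driven migration pass: files in a disk
# cache that have not been accessed for MAX_AGE_DAYS are copied to a
# tape-backed store and removed from the cache. Paths and the policy
# threshold are invented for illustration only.
CACHE_DIR = "/cache/halld"      # invented disk-cache path
TAPE_DIR = "/mss/halld"         # invented tape-backed store path
MAX_AGE_DAYS = 30               # example policy threshold

def eligible(path: str) -> bool:
    """Policy predicate: true if the file has been idle longer than the threshold."""
    idle_seconds = time.time() - os.path.getatime(path)
    return idle_seconds > MAX_AGE_DAYS * 86400

def migrate_once() -> None:
    """Run one migration pass over the cache directory."""
    for name in os.listdir(CACHE_DIR):
        src = os.path.join(CACHE_DIR, name)
        if os.path.isfile(src) and eligible(src):
            shutil.copy2(src, os.path.join(TAPE_DIR, name))  # stage to the store
            os.remove(src)                                    # free the cache slot

if __name__ == "__main__":
    migrate_once()
```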

Page 14: Computing for Hall D


Year 0.5-1 Milestones

Common milestones with EDG:

GDMP – robust file replication layer – Joint Project with EDG Work Package (WP) 2 (Data Access)

Support of Project Month (PM) 9 WP6 TestBed Milestone. Will participate in integration fest at CERN - 10/1/01

Collaborate on PM21 design for WP2 - 1/1/02

Proposed WP8 Application tests using PM9 testbed – 3/1/02

Collaboration with GriPhyN:

SC2001 demos will use common resources, infrastructure and presentations – 11/16/01

Common, GriPhyN-led grid architecture

Joint work on monitoring proposed

Page 15: Computing for Hall D


Year ~0.5-1 “Cross-cuts”

• Grid File Replication Services used by >2 experiments:
  – GridFTP – production releases
    • Integrate with D0-SAM, STAR replication
    • Interfaced through SRB for BaBar, JLAB
    • Layered use by GDMP for CMS, ATLAS
  – SRB and Globus Replication Services
    • Include robustness features
    • Common catalog features and API (see the sketch after this list)
  – GDMP/Data Access layer continues to be shared between EDG and PPDG

• Distributed Job Scheduling and Management used by >1 experiment:
  – Condor-G, DAGman, Grid-Scheduler for D0-SAM, CMS
  – Job specification language interfaces to distributed schedulers – D0-SAM, CMS, JLAB

• Storage Resource Interface and Management
  – Consensus on API between EDG, SRM, and PPDG
  – Disk cache management integrated with data replication services
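The "common catalog features and API" item above boils down to a mapping from logical file names to the physical replicas registered at each site. The toy sketch below shows that shape in plain Python; it is not the Globus or SRB catalog API, and the site and file names are invented.

```python
from collections import defaultdict

# Toy replica catalog: logical file name -> set of physical replica URLs.
# Illustrative only; the real Globus/SRB catalogs expose richer interfaces.
class ReplicaCatalog:
    def __init__(self) -> None:
        self._replicas = defaultdict(set)

    def register(self, lfn: str, pfn: str) -> None:
        """Record that the physical copy `pfn` holds logical file `lfn`."""
        self._replicas[lfn].add(pfn)

    def lookup(self, lfn: str) -> set:
        """Return all known physical replicas of a logical file."""
        return set(self._replicas.get(lfn, set()))

# Invented example names, for illustration only.
catalog = ReplicaCatalog()
catalog.register("run1234.evio", "gsiftp://site-a.example.org/cache/run1234.evio")
catalog.register("run1234.evio", "gsiftp://site-b.example.org/data/run1234.evio")
print(catalog.lookup("run1234.evio"))
```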

Page 16: Computing for Hall D


Year ~1 other goals:

• Transatlantic Application Demonstrators:
  – BaBar data replication between SLAC and IN2P3
  – D0 Monte Carlo job execution between Fermilab and NIKHEF
  – CMS & ATLAS simulation production between Europe and the US

• Certificate exchange and authorization
  – DOE Science Grid as CA?

• Robust data replication (see the sketch below)
  – Fault tolerant
  – Between heterogeneous storage resources

• Monitoring services
  – MDS2 (Metacomputing Directory Service)?
  – Common framework
  – Network, compute, and storage information made available to scheduling and resource management
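The robust data replication goal above is, at its simplest, a fault-tolerant transfer loop layered over a Grid transfer tool. A minimal sketch, assuming the standard globus-url-copy client is available and using invented placeholder URLs:

```python
import subprocess
import time

# Minimal fault-tolerant replication sketch: retry a GridFTP transfer a few
# times before reporting failure. Assumes the globus-url-copy client is
# installed; source and destination URLs are invented placeholders.
def replicate(src_url: str, dst_url: str, attempts: int = 3, backoff_s: int = 60) -> bool:
    for attempt in range(1, attempts + 1):
        result = subprocess.run(["globus-url-copy", src_url, dst_url])
        if result.returncode == 0:
            return True                            # transfer succeeded
        print(f"attempt {attempt} failed; retrying in {backoff_s}s")
        time.sleep(backoff_s)
    return False                                   # leave fault handling to the caller

if __name__ == "__main__":
    replicate("gsiftp://site-a.example.org/data/run1234.evio",
              "gsiftp://site-b.example.org/cache/run1234.evio")
```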

Page 17: Computing for Hall D


PPDG activities as part of the Global Grid Community

Coordination with other Grid projects in our field:
- GriPhyN – Grid Physics Network
- European DataGrid
- Storage Resource Management collaboratory
- HENP Data Grid Coordination Committee

Participation in experiment and Grid deployments in our field:
- ATLAS, BaBar, CMS, D0, STAR, JLAB experiment data handling systems
- iVDGL/DataTAG – International Virtual Data Grid Laboratory
- Use DTF computational facilities?

Active in standards committees:
- Internet2 HENP Working Group
- Global Grid Forum

Page 18: Computing for Hall D


What should happen now?

• The collaboration needs to define its computing model
  – It really will be distributed – grid-based
  – Although the compute resources can be provided, it is not obvious that the vast quantities of data can really be analyzed efficiently by a small group
    • Do not underestimate the task
  – The computing model will define requirements for computing – some of which may require some lead time

• Ensure software and computing is managed as a project equivalent in scope to the entire detector
  – It has to last at least as long, and it runs 24x365
  – The complete software system is more complex than the detector, even for Hall D, where the reconstruction is relatively straightforward
  – It will be used by everyone

• Find and empower a computing project manager now