Computing for Hall D
Ian Bird
Hall D Collaboration Meeting, March 22, 2002
Data Volume per experiment per year (Raw data, in units of 10^9 bytes)

[Figure: log-scale plot of raw data volume, 100 to 1,000,000 (× 10^9 bytes), versus year, 1980–2010. Experiments shown: E691, E665, E769, E791, E831, ALEPH, ZEUS, NA48, KTeV, E871, CDF/D0, BABAR, STAR/PHENIX, JLAB, CMS/ATLAS.]
But: collaboration sizes!
Technologies
• Technologies are advancing rapidly
– Compute power
– Storage – tape and disk
– Networking
• What will be available 5 years from now?
– Difficult to predict – but it will not be a problem to provide any of the resources that Hall D will need…
– E.g. computing:
Intel Linux Farm:
– First purchases: 9 duals per 24” rack
– FY00: 16 duals (2U) + 500 GB cache (8U) per 19” rack
– FY01: 4 CPUs per 1U
– Recently: 5 TB IDE cache disk (5 × 8U) per 19” rack
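To see how fast farm density grew across these generations, a quick back-of-the-envelope comparison in Python (a sketch: the 42U rack height is an assumption, and each “dual” is taken to be a dual-CPU server):

# CPUs per rack across the farm generations listed above.
# Assumptions: a full-height 42U rack; a "dual" is a dual-CPU server.
RACK_U = 42

first_purchase_cpus = 9 * 2    # 9 duals per rack   -> 18 CPUs
fy00_cpus = 16 * 2             # 16 duals (+ cache) -> 32 CPUs
fy01_cpus = 4 * RACK_U         # 4 CPUs per 1U      -> 168 CPUs

print(first_purchase_cpus, fy00_cpus, fy01_cpus)  # 18 32 168
# Roughly a 9x CPU-density improvement in about two years.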
Compute power
• Blades
– Low power chips (Transmeta, Intel)
– Hundreds in a single rack
• “An RLX System 300ex chassis holds twenty-four ServerBlade 800i units in a single 3U chassis. This density achievement packs 336 independent servers into a single 42U rack, delivering 268,800 MHz, over 27 terabytes of disk storage, and a whopping 366 gigabytes of DDR memory.”
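The quoted density figures are internally consistent; a quick arithmetic check in Python (assuming a full 42U rack and the 800 MHz clock implied by the “800i” model name):

# Check the blade-density arithmetic in the RLX quote above.
CHASSIS_U = 3            # one System 300ex chassis occupies 3U
BLADES_PER_CHASSIS = 24  # ServerBlade 800i units per chassis
RACK_U = 42
BLADE_MHZ = 800          # assumed clock, from the "800i" model name

chassis_per_rack = RACK_U // CHASSIS_U                   # 14 chassis
blades_per_rack = chassis_per_rack * BLADES_PER_CHASSIS  # 336 servers
total_mhz = blades_per_rack * BLADE_MHZ                  # 268,800 MHz

print(chassis_per_rack, blades_per_rack, total_mhz)  # 14 336 268800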
Technologies
• As well as computing, storage and networking technologies will also advance rapidly
• Grid computing techniques will bring these technologies together
• Facilities – new Computer Center planned
• Issues will not be technology, but:
– How to use them intelligently
– Hall D computing model
– People
– Treating computing seriously enough to assign sufficient resources
(Data-) Grid Computing
Particle Physics Data Grid Collaboratory Pilot

Who we are: four leading Grid computer science projects and six international high energy and nuclear physics collaborations

What we do: develop and deploy Grid services for our experiment collaborators, and promote and provide common Grid software and standards

The problem at hand today: petabytes of storage, teraops/s of computing, thousands of users, hundreds of institutions, 10+ years of analysis ahead
PPDG Experiments
ATLAS – A Toroidal LHC ApparatuS at CERN. Runs 2006 on.
Goals: TeV physics – the Higgs and the origin of mass…
http://atlasinfo.cern.ch/Atlas/Welcome.html

BaBar – at the Stanford Linear Accelerator Center. Running now.
Goals: study CP violation and more.
http://www.slac.stanford.edu/BFROOT/

CMS – the Compact Muon Solenoid detector at CERN. Runs 2006 on.
Goals: TeV physics – the Higgs and the origin of mass…
http://cmsinfo.cern.ch/Welcome.html/

D0 – at the D0 colliding beam interaction region at Fermilab. Runs soon.
Goals: learn more about the top quark, supersymmetry, and the Higgs.
http://www-d0.fnal.gov/

STAR – Solenoidal Tracker At RHIC at BNL. Running now.
Goals: quark-gluon plasma…
http://www.star.bnl.gov/

JLAB – Thomas Jefferson National Laboratory. Running now.
Goals: understanding the nucleus using electron beams…
http://www.jlab.org/
PPDG Computer Science Groups
Condor – develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing on large collections of computing resources with distributed ownership.
http://www.cs.wisc.edu/condor/
Globus - developing fundamental technologies needed to build persistent environments that enable software applications to integrate instruments, displays, computational and information resources that are managed by diverse organizations in widespread locations
http://www.globus.org/
SDM - Scientific Data Management Research Group – optimized and standardized access to storage systems
http://gizmo.lbl.gov/DM.html
Storage Resource Broker - client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and cataloging/accessing replicated data sets.
http://www.npaci.edu/DICE/SRB/index.html
Delivery of End-to-End Applications & Integrated Production Systems
to allow thousands of physicists to share data & computing resources for scientific processing and analyses
PPDG Focus:
- Robust Data Replication
- Intelligent Job Placement and Scheduling
- Management of Storage Resources
- Monitoring and Information of Global Services
Relies on Grid infrastructure:
- Security & Policy
- High Speed Data Transfer
- Network management
Resources: Computers, Storage, Networks
Operators & Users
Project Activities, End-to-End Applications and Cross-Cut Pilots

Project Activities are focused Experiment–Computer Science collaborative developments:
- Replicated data sets for science analysis – BaBar, CMS, STAR
- Distributed Monte Carlo production services – ATLAS, D0, CMS
- Common storage management and interfaces – STAR, JLAB
End-to-End Applications used in Experiment data handling systems to give real-world requirements, testing and feedback:
- Error reporting and response
- Fault tolerant integration of complex components

Cross-Cut Pilots for common services and policies:
- Certificate Authority policy and authentication
- File transfer standards and protocols
- Resource monitoring – networks, computers, storage
Year 0.5-1 Milestones (1)
Align milestones to Experiment data challenges:
– ATLAS – production distributed data service – 6/1/02
– BaBar – analysis across partitioned dataset storage – 5/1/02
– CMS – Distributed simulation production – 1/1/02
– D0 – distributed analyses across multiple workgroup clusters – 4/1/02
– STAR – automated dataset replication – 12/1/01
– JLAB – policy driven file migration – 2/1/02
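The JLAB milestone above, policy-driven file migration, is straightforward to illustrate. A minimal sketch in Python (hypothetical cache and staging paths and a simple age-based policy, not JLAB's actual implementation):

import os
import shutil
import time

# Sketch of policy-driven file migration (hypothetical policy and paths,
# not JLAB's actual system): files in a disk cache that have not been
# accessed within MAX_AGE_SECONDS are moved to a tape staging area.
CACHE_DIR = "/cache"       # hypothetical disk cache
TAPE_DIR = "/tape-stage"   # hypothetical tape staging area
MAX_AGE_SECONDS = 7 * 24 * 3600

def migrate_by_policy():
    now = time.time()
    for name in os.listdir(CACHE_DIR):
        path = os.path.join(CACHE_DIR, name)
        if not os.path.isfile(path):
            continue
        # Policy: migrate files not accessed within the age limit.
        if now - os.path.getatime(path) > MAX_AGE_SECONDS:
            shutil.move(path, os.path.join(TAPE_DIR, name))

if __name__ == "__main__":
    migrate_by_policy()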
Year 0.5-1 Milestones (2)
Common milestones with EDG:
GDMP – robust file replication layer – Joint Project with EDG Work Package (WP) 2 (Data Access)
Support of Project Month (PM) 9 WP6 TestBed Milestone. Will participate in integration fest at CERN - 10/1/01
Collaborate on PM21 design for WP2 - 1/1/02
Proposed WP8 Application tests using PM9 testbed – 3/1/02
Collaboration with GriPhyN:
SC2001 demos will use common resources, infrastructure and presentations – 11/16/01
Common, GriPhyN-led grid architecture
Joint work on monitoring proposed
Year ~0.5-1 “Cross-cuts”
• Grid File Replication Services used by >2 experiments (a replication sketch follows this list):
– GridFTP – production releases
• Integrate with D0-SAM, STAR replication
• Interfaced through SRB for BaBar, JLAB
• Layered use by GDMP for CMS, ATLAS
– SRB and Globus Replication Services
• Include robustness features
• Common catalog features and API
– GDMP/Data Access layer continues to be shared between EDG and PPDG
• Distributed Job Scheduling and Management used by >1 experiment (a scheduling sketch follows this list):
– Condor-G, DAGman, Grid-Scheduler for D0-SAM, CMS
– Job specification language interfaces to distributed schedulers – D0-SAM, CMS, JLAB
• Storage Resource Interface and Management
– Consensus on API between EDG, SRM, and PPDG
– Disk cache management integrated with data replication services
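To make the robust replication idea concrete: the core pattern is transfer, verify by checksum, retry on failure. A minimal sketch in Python, with a local copy standing in for a GridFTP transfer (the helper names are hypothetical, not GDMP's or SRB's actual API):

import hashlib
import shutil

def checksum(path, algo="md5"):
    """Checksum a file in chunks so multi-GB physics files fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def replicate(source, destination, retries=3):
    """Copy source to destination, verify by checksum, retry on failure.

    shutil.copy stands in for a GridFTP transfer; the retry-and-verify
    loop, not the transport, is the point of the sketch.
    """
    want = checksum(source)
    for attempt in range(1, retries + 1):
        try:
            shutil.copy(source, destination)
            if checksum(destination) == want:
                return True
        except OSError as err:
            print(f"attempt {attempt} failed: {err}")
    return False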
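Similarly, the DAGman-style job management listed above boils down to submitting jobs in dependency order. A minimal sketch (hypothetical job names; a plain topological sort stands in for Condor's machinery, which also handles retries and remote submission):

from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical analysis DAG: each job lists the jobs it depends on.
dag = {
    "simulate": set(),
    "reconstruct": {"simulate"},
    "analyze": {"reconstruct"},
}

def submit(job):
    print(f"submitting {job} to a (stand-in) batch scheduler")

# Run jobs in an order that respects the dependencies, DAGman-style.
for job in TopologicalSorter(dag).static_order():
    submit(job)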
Year ~1 other goals:
• Transatlantic Application Demonstrators:
– BaBar data replication between SLAC and IN2P3
– D0 Monte Carlo job execution between Fermilab and NIKHEF
– CMS & ATLAS simulation production between Europe and the US
• Certificate exchange and authorization
– DOE Science Grid as CA?
• Robust data replication
– fault tolerant
– between heterogeneous storage resources
• Monitoring Services (a sketch follows this list)
– MDS2 (Metacomputing Directory Service)?
– common framework
– network, compute and storage information made available to scheduling and resource management
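As an illustration of the kind of host information such a monitoring framework would feed to schedulers, a minimal sketch using only local OS calls (the record format is hypothetical; MDS2 itself is a directory service, not shown):

import os
import socket
import time

def local_resource_report(path="/"):
    """Gather the kind of host data a monitoring service would publish
    for schedulers: load average, free disk, and a timestamp.
    Hypothetical record format; Unix-only OS calls."""
    load1, _, _ = os.getloadavg()
    disk = os.statvfs(path)
    return {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        "load_1min": load1,
        "disk_free_gb": disk.f_bavail * disk.f_frsize / 1e9,
    }

print(local_resource_report())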
PPDG activities as part of the Global Grid Community
Coordination with other Grid Projects in our field:
- GriPhyN – Grid Physics Network
- European DataGrid
- Storage Resource Management collaboratory
- HENP Data Grid Coordination Committee

Participation in Experiment and Grid deployments in our field:
- ATLAS, BaBar, CMS, D0, STAR, JLAB experiment data handling systems
- iVDGL/DataTAG – International Virtual Data Grid Laboratory
- Use DTF computational facilities?

Active in Standards Committees:
- Internet2 HENP Working Group
- Global Grid Forum
What should happen now?
• Collaboration needs to define its computing model
– It really will be distributed – grid based
– Although the compute resources can be provided, it is not obvious that the vast quantities of data can really be analyzed efficiently by a small group
• Do not underestimate the task
– The computing model will define requirements for computing – some of which may require some lead time
• Ensure software and computing is managed as a project equivalent in scope to the entire detector
– It has to last at least as long, and it runs 24x365
– The complete software system is more complex than the detector, even for Hall D where the reconstruction is relatively straightforward
– It will be used by everyone
• Find and empower a computing project manager now