DØSAR, a Regional Grid within DØ
Jae Yu, Univ. of Texas, Arlington
THEGrid Workshop, July 8 – 9, 2004
Univ. of Texas at Arlington
The Problem
• High Energy Physics
  – Total expected data size is over 5 PB (a 5,000-inch stack of 100 GB hard drives) for CDF and DØ
  – Detectors are complicated → need many people to construct them and make them work
  – The collaboration is large and scattered all over the world
  – Allow software development at remote institutions
  – Optimized resource management, job scheduling, and monitoring tools
  – Efficient and transparent data delivery and sharing
• Use the opportunity of having a large data set to further grid computing technology
  – Improve computational capability for education
  – Improve quality of life
DØ and CDF at the Fermilab Tevatron
• World's highest-energy proton–antiproton collider
  – E_cm = 1.96 TeV (= 6.3×10⁻⁷ J/p → 13 MJoules on 10⁻⁶ m²), equivalent to the kinetic energy of a 20 t truck at a speed of 80 mi/hr (see the worked estimate below)
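As a quick check of the truck comparison (a back-of-the-envelope estimate added here, not from the original slide): a 20 t truck at 80 mi/hr (≈ 35.8 m/s) carries

    E_{\mathrm{kin}} = \tfrac{1}{2} m v^{2}
                     = \tfrac{1}{2}\,(2\times10^{4}\ \mathrm{kg})\,(35.8\ \mathrm{m/s})^{2}
                     \approx 1.3\times10^{7}\ \mathrm{J} \approx 13\ \mathrm{MJ},

consistent with the ~13 MJ quoted for the stored beam energy.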
[Figure: map of the Tevatron ring near Chicago, with the proton/antiproton beam directions and the CDF and DØ detector locations marked.]
DØ Collaboration
650 collaborators, 78 institutions, 18 countries
Centralized Deployment Models
• Started with the Lab-centric SAM infrastructure in place, …
• …transitioning to a hierarchically distributed model

DØ Remote Analysis Model (DØRAM)
[Figure: DØRAM hierarchy with Fermilab as the Central Analysis Center (CAC), feeding Regional Analysis Centers (RACs), Institutional Analysis Centers (IACs), and Desktop Analysis Stations (DASs); solid lines mark normal interaction/communication paths, dashed lines occasional ones.]
DØ Southern Analysis Region (DØSAR)
• One of the regional grids within the DØGrid
• A consortium coordinating activities to maximize computing and analysis resources, in addition to the European efforts
• Members: UTA, OU, LTU, LU, SPRACE, Tata, KSU, KU, Rice, UMiss, CSF, UAZ
• MC farm clusters – a mixture of dedicated and multi-purpose, rack-mounted and desktop, with 10s–100s of CPUs
• http://www-hep.uta.edu/d0-sar/d0-sar.html
DØRAM Implementation
• UTA is the first US DØRAC
• DØSAR formed around UTA
[Figure: map of DØRAM sites – DØSAR members UTA, OU/LU, LTU, Rice, KU, KSU, Ole Miss, UAZ, and Mexico/Brazil in the Americas, plus the European RAC at GridKa (Karlsruhe) serving Aachen, Bonn, Mainz, Munich, and Wuppertal.]
UTA – RAC (DPCC)
• 100 P4 Xeon 2.6 GHz CPUs = 260 GHz, with 64 TB of disk space
• 84 P4 Xeon 2.4 GHz CPUs = 202 GHz, with 7.5 TB of disk space
• Total CPU: 462 GHz
• Total disk: 73 TB
• Total memory: 168 GB
• Network bandwidth: 68 Gb/sec
The tools
• Sequential Access via Metadata (SAM)
  – Data replication and cataloging system
• Batch systems
  – FBSNG: Fermilab's own batch system
  – Condor
    • Three of the DØSAR farms consist of desktop machines under Condor
  – PBS
    • Most of the dedicated DØSAR farms use this manager
• Grid framework: JIM = Job Inventory Management
  – Provides the framework for grid operation → job submission, matchmaking, and scheduling
  – Built upon Condor-G and Globus (see the illustrative sketch below)
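The slides do not show JIM's actual job description format, so the following is a rough illustration only: it assumes a Condor-G style setup in which a plain submit description (here a hypothetical mc_request.sub targeting a Globus gatekeeper) is handed to condor_submit. The host name, jobmanager path, executable, and file names are invented placeholders, not DØ/JIM specifics.

# Illustrative sketch only: driving a Condor-G style submission to a Globus
# gatekeeper from a small Python script. All names below are hypothetical.
import subprocess
import textwrap

SUBMIT_FILE = "mc_request.sub"  # hypothetical file name

submit_description = textwrap.dedent("""\
    universe        = globus
    globusscheduler = gatekeeper.example.edu/jobmanager-pbs
    executable      = run_mcfarm_job.sh
    arguments       = --events 250000
    output          = mc_request.out
    error           = mc_request.err
    log             = mc_request.log
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    queue
    """)

def submit_job() -> None:
    """Write the submit description and hand it to condor_submit."""
    with open(SUBMIT_FILE, "w") as f:
        f.write(submit_description)
    # condor_submit parses the description and queues the grid job.
    subprocess.run(["condor_submit", SUBMIT_FILE], check=True)

if __name__ == "__main__":
    submit_job()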
Operation of a SAM Station
[Figure: SAM station data-flow diagram – producers/consumers talk to the station & cache manager; project managers drive eworkers and file stagers; a file storage server and file storage clients move files between the cache disk/temp disk and MSS or other stations. Solid arrows show data flow, dashed arrows show control.]
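The slides do not spell out the station logic, but the essence of the diagram – serve a file from the local cache if present, otherwise stage it in from MSS or another station and record it in the catalog – can be sketched as follows. Every class and function name here is invented for illustration; this is not SAM's real API.

# Conceptual sketch of the cache-or-stage behaviour suggested by the diagram.
import os
import shutil

class SamStationSketch:
    def __init__(self, cache_dir: str, temp_dir: str):
        self.cache_dir = cache_dir   # "Cache Disk" in the diagram
        self.temp_dir = temp_dir     # "Temp Disk" in the diagram
        self.catalog = {}            # file name -> cached local path

    def deliver(self, file_name: str, source_path: str) -> str:
        """Return a local path for file_name, staging it in if needed."""
        if file_name in self.catalog:          # cache hit: no transfer needed
            return self.catalog[file_name]
        # Cache miss: a file stager copies the file from MSS or another
        # station into the temp area, then it is promoted into the cache.
        staged = os.path.join(self.temp_dir, file_name)
        shutil.copy(source_path, staged)       # stand-in for a real transfer
        cached = os.path.join(self.cache_dir, file_name)
        shutil.move(staged, cached)
        self.catalog[file_name] = cached       # update the catalog
        return cached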
Tevatron Grid Framework (JIM)
[Figure: JIM monitoring page showing the UTA and TTU sites.]
The tools cnt’d• Local Task managements
– DØSAR• Monte Carlo Farm (McFarm) management Cloned to other
institutions• Various Monitoring Software
– Ganglia resource– McFarmGraph: MC Job status monitoring– McPerM: Farm performance monitor
• DØSAR Grid: Submit requests onto a local machine and the requests gets transferred to a submission site and executed at an execution site
– DØGrid• Uses mcrun_job request script• More adaptable to a generic cluster
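To make the client-site → submission-site → execution-site hand-off concrete, here is a purely conceptual sketch; the site names and the round-robin dispatch rule are made up for illustration and do not reflect DØSAR's actual code.

# Conceptual sketch of the DØSAR request flow described above: a request is
# created at a client site, handed to a submission site, and dispatched to an
# execution site. Names and the dispatch rule are illustrative only.
from dataclasses import dataclass
from itertools import cycle

@dataclass
class McRequest:
    request_id: int
    events: int            # number of MC events to generate

class SubmissionSiteSketch:
    def __init__(self, execution_sites):
        self._next_site = cycle(execution_sites)   # trivial dispatch rule

    def submit(self, request: McRequest) -> str:
        site = next(self._next_site)
        # A real submission site would stage input, schedule and track the
        # job; here we only report where the request would run.
        return f"request {request.request_id} ({request.events} events) -> {site}"

if __name__ == "__main__":
    # Hypothetical execution sites; any DØSAR farm could play this role.
    sub_site = SubmissionSiteSketch(["uta-dpcc", "ou-farm", "sprace"])
    for i in range(3):
        print(sub_site.submit(McRequest(request_id=i, events=250_000)))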
Ganglia Grid Resource Monitoring
Operating since Apr. 2003
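As a rough illustration of how such Ganglia data can be pulled programmatically (assuming a default gmond installation that publishes its XML state on TCP port 8649; the host name below is a placeholder), one could do:

# Illustrative only: read the cluster state a default gmond publishes as XML
# on TCP port 8649 and print one metric per host. Metric names vary by setup.
import socket
import xml.etree.ElementTree as ET

def read_gmond_xml(host: str, port: int = 8649) -> str:
    """Fetch the raw XML dump that gmond sends on connect."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def print_load(xml_text: str) -> None:
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "load_one":   # 1-minute load average
                print(host.get("NAME"), metric.get("VAL"))

if __name__ == "__main__":
    print_load(read_gmond_xml("ganglia.example.edu"))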
Job Status Monitoring: McFarmGraph
Operating since Sept. 2003
Farm Performance Monitor: McPerM
Designed, implemented, and improved by UTA students
Operating since Sept. 2003
(Slide credit: Joel Snow, Langston University, "D0 Grid/Remote Computing", April 2004)
DØSAR MC Delivery Stat. (as of May 10, 2004)

Institution          Inception   NMC (TMB) ×10⁶
LTU                  6/2003      0.4
LU                   7/2003      2.3
OU                   4/2003      1.6
Tata, India          6/2003      2.2
Sao Paulo, Brazil    4/2004      0.6
UTA-HEP              1/2003      3.6
UTA-RAC              12/2003     8.2
DØSAR Total          (5/10/04)   18.9
DØSAR Computing & Human Resources
(F = faculty, PD = postdoc, GA = graduate assistant, Sys = system administrator)

Institution      CPU (GHz) [future]   Storage (TB)        People
Cinvestav        13                   1.1                 1F + ?
Langston         22                   1.3                 1F + 1GA
LTU              25 + [12]            1.0                 1F + 1PD + 2GA
KU               12                   ??                  1F + 1PD
KSU              40                   1.2                 1F + 2GA
OU               19 + 270 (OSCER)     1.8 + 120 (tape)    4F + 3PD + 2GA
Sao Paulo        60 + [120]           4.5                 2F + many
Tata Institute   52                   1.6                 1F + 1Sys
UTA              430                  74                  2.5F + 1Sys + 1.5PD + 3GA
Total            943 [1075]           85.5 + 120 (tape)   14.5F + 2Sys + 6.5PD + 10GA
How does the current Tevatron MC Grid work?
[Figure: workflow diagram – a client site submits requests through the global grid to submission sites, which dispatch work to regional grids and on to execution sites (desktop clusters and dedicated clusters), with SAM handling data delivery throughout.]
Actual DØ Data Re-processing at UTA
Network Bandwidth Needs
Summary and Plans
• Significant progress has been made in implementing grid computing technologies for the DØ experiment
  – The DØSAR Grid has been operating since April 2004
• A large amount of documentation and expertise has been accumulated
• Moving toward data re-processing and analysis
  – A first partial re-processing of 180 million events has been completed
  – Different levels of complexity
• Improved infrastructure is necessary, especially network bandwidth
  – LEARN will boost the stature of Texas in the HEP grid computing world
  – Started working with AMPATH and the Oklahoma, Louisiana, and Brazilian consortia (tentatively named the BOLT Network) → a Texan consortium is needed
• UTA's experience with the DØSAR Grid will be an important asset for the expeditious implementation of THEGrid