DØSAR, a Regional Grid within DØ
Jae Yu, Univ. of Texas, Arlington
THEGrid Workshop, July 8 – 9, 2004
Univ. of Texas at Arlington
The Problem
• High Energy Physics
  – Total expected data size is over 5 PB (a 5,000-inch stack of 100 GB hard drives) for CDF and DØ
  – Detectors are complicated → need many people to construct them and make them work
  – The collaboration is large and scattered all over the world
  – Allow software development at remote institutions
  – Optimized resource management, job scheduling, and monitoring tools
  – Efficient and transparent data delivery and sharing
• Use the opportunity of having a large data set to further grid computing technology
  – Improve computational capability for education
  – Improve quality of life
DØ and CDF at the Fermilab Tevatron
• World's highest-energy proton–antiproton collider
  – E_cm = 1.96 TeV (= 6.3×10⁻⁷ J/p → 13 MJoules on 10⁻⁶ m²), equivalent to the kinetic energy of a 20 t truck at a speed of 80 mi/hr (see the worked estimate below)
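As a quick check of the truck comparison (a back-of-the-envelope estimate added here, not from the original slide): a 20 t truck at 80 mi/hr (≈ 35.8 m/s) carries

    E_{\mathrm{kin}} = \tfrac{1}{2} m v^{2}
                     = \tfrac{1}{2}\,(2\times10^{4}\ \mathrm{kg})\,(35.8\ \mathrm{m/s})^{2}
                     \approx 1.3\times10^{7}\ \mathrm{J} \approx 13\ \mathrm{MJ},

consistent with the ~13 MJ quoted for the stored beam energy.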
[Figure: map of the Tevatron ring near Chicago, with the proton/antiproton beam directions and the CDF and DØ detector locations marked.]
DØ Collaboration
650 collaborators, 78 institutions, 18 countries
Centralized Deployment Models
• Started with the Lab-centric SAM infrastructure in place, …
• …transitioning to a hierarchically distributed model

DØ Remote Analysis Model (DØRAM)
[Figure: DØRAM hierarchy with Fermilab as the Central Analysis Center (CAC), feeding Regional Analysis Centers (RACs), Institutional Analysis Centers (IACs), and Desktop Analysis Stations (DASs); solid lines mark normal interaction/communication paths, dashed lines occasional ones.]
DØ Southern Analysis Region (DØSAR)
• One of the regional grids within the DØGrid
• A consortium coordinating activities to maximize computing and analysis resources, in addition to the European efforts
• Members: UTA, OU, LTU, LU, SPRACE, Tata, KSU, KU, Rice, UMiss, CSF, UAZ
• MC farm clusters – a mixture of dedicated and multi-purpose, rack-mounted and desktop, with 10s–100s of CPUs
• http://www-hep.uta.edu/d0-sar/d0-sar.html
DØRAM Implementation
• UTA is the first US DØRAC
• DØSAR formed around UTA
[Figure: map of DØRAM sites – DØSAR members UTA, OU/LU, LTU, Rice, KU, KSU, Ole Miss, UAZ, and Mexico/Brazil in the Americas, plus the European RAC at GridKa (Karlsruhe) serving Aachen, Bonn, Mainz, Munich, and Wuppertal.]
UTA – RAC (DPCC)
• 100 P4 Xeon 2.6 GHz CPUs = 260 GHz, with 64 TB of disk space
• 84 P4 Xeon 2.4 GHz CPUs = 202 GHz, with 7.5 TB of disk space
• Total CPU: 462 GHz
• Total disk: 73 TB
• Total memory: 168 GB
• Network bandwidth: 68 Gb/sec
The tools
• Sequential Access via Metadata (SAM)
  – Data replication and cataloging system
• Batch systems
  – FBSNG: Fermilab's own batch system
  – Condor
    • Three of the DØSAR farms consist of desktop machines under Condor
  – PBS
    • Most of the dedicated DØSAR farms use this manager
• Grid framework: JIM = Job Inventory Management
  – Provides the framework for grid operation → job submission, matchmaking, and scheduling
  – Built upon Condor-G and Globus (see the illustrative sketch below)
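The slides do not show JIM's actual job description format, so the following is a rough illustration only: it assumes a Condor-G style setup in which a plain submit description (here a hypothetical mc_request.sub targeting a Globus gatekeeper) is handed to condor_submit. The host name, jobmanager path, executable, and file names are invented placeholders, not DØ/JIM specifics.

# Illustrative sketch only: driving a Condor-G style submission to a Globus
# gatekeeper from a small Python script. All names below are hypothetical.
import subprocess
import textwrap

SUBMIT_FILE = "mc_request.sub"  # hypothetical file name

submit_description = textwrap.dedent("""\
    universe        = globus
    globusscheduler = gatekeeper.example.edu/jobmanager-pbs
    executable      = run_mcfarm_job.sh
    arguments       = --events 250000
    output          = mc_request.out
    error           = mc_request.err
    log             = mc_request.log
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    queue
    """)

def submit_job() -> None:
    """Write the submit description and hand it to condor_submit."""
    with open(SUBMIT_FILE, "w") as f:
        f.write(submit_description)
    # condor_submit parses the description and queues the grid job.
    subprocess.run(["condor_submit", SUBMIT_FILE], check=True)

if __name__ == "__main__":
    submit_job()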
Operation of a SAM Station
[Figure: SAM station data-flow diagram – producers/consumers talk to the station & cache manager; project managers drive eworkers and file stagers; a file storage server and file storage clients move files between the cache disk/temp disk and MSS or other stations. Solid arrows show data flow, dashed arrows show control.]
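The slides do not spell out the station logic, but the essence of the diagram – serve a file from the local cache if present, otherwise stage it in from MSS or another station and record it in the catalog – can be sketched as follows. Every class and function name here is invented for illustration; this is not SAM's real API.

# Conceptual sketch of the cache-or-stage behaviour suggested by the diagram.
import os
import shutil

class SamStationSketch:
    def __init__(self, cache_dir: str, temp_dir: str):
        self.cache_dir = cache_dir   # "Cache Disk" in the diagram
        self.temp_dir = temp_dir     # "Temp Disk" in the diagram
        self.catalog = {}            # file name -> cached local path

    def deliver(self, file_name: str, source_path: str) -> str:
        """Return a local path for file_name, staging it in if needed."""
        if file_name in self.catalog:          # cache hit: no transfer needed
            return self.catalog[file_name]
        # Cache miss: a file stager copies the file from MSS or another
        # station into the temp area, then it is promoted into the cache.
        staged = os.path.join(self.temp_dir, file_name)
        shutil.copy(source_path, staged)       # stand-in for a real transfer
        cached = os.path.join(self.cache_dir, file_name)
        shutil.move(staged, cached)
        self.catalog[file_name] = cached       # update the catalog
        return cached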
Tevatron Grid Framework (JIM)
[Figure: JIM monitoring page showing the UTA and TTU sites.]
The tools cnt’d• Local Task managements
– DØSAR• Monte Carlo Farm (McFarm) management Cloned to other
institutions• Various Monitoring Software
– Ganglia resource– McFarmGraph: MC Job status monitoring– McPerM: Farm performance monitor
• DØSAR Grid: Submit requests onto a local machine and the requests gets transferred to a submission site and executed at an execution site
– DØGrid• Uses mcrun_job request script• More adaptable to a generic cluster
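To make the client-site → submission-site → execution-site hand-off concrete, here is a purely conceptual sketch; the site names and the round-robin dispatch rule are made up for illustration and do not reflect DØSAR's actual code.

# Conceptual sketch of the DØSAR request flow described above: a request is
# created at a client site, handed to a submission site, and dispatched to an
# execution site. Names and the dispatch rule are illustrative only.
from dataclasses import dataclass
from itertools import cycle

@dataclass
class McRequest:
    request_id: int
    events: int            # number of MC events to generate

class SubmissionSiteSketch:
    def __init__(self, execution_sites):
        self._next_site = cycle(execution_sites)   # trivial dispatch rule

    def submit(self, request: McRequest) -> str:
        site = next(self._next_site)
        # A real submission site would stage input, schedule and track the
        # job; here we only report where the request would run.
        return f"request {request.request_id} ({request.events} events) -> {site}"

if __name__ == "__main__":
    # Hypothetical execution sites; any DØSAR farm could play this role.
    sub_site = SubmissionSiteSketch(["uta-dpcc", "ou-farm", "sprace"])
    for i in range(3):
        print(sub_site.submit(McRequest(request_id=i, events=250_000)))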
Ganglia Grid Resource Monitoring
Operating since Apr. 2003
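As a rough illustration of how such Ganglia data can be pulled programmatically (assuming a default gmond installation that publishes its XML state on TCP port 8649; the host name below is a placeholder), one could do:

# Illustrative only: read the cluster state a default gmond publishes as XML
# on TCP port 8649 and print one metric per host. Metric names vary by setup.
import socket
import xml.etree.ElementTree as ET

def read_gmond_xml(host: str, port: int = 8649) -> str:
    """Fetch the raw XML dump that gmond sends on connect."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def print_load(xml_text: str) -> None:
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "load_one":   # 1-minute load average
                print(host.get("NAME"), metric.get("VAL"))

if __name__ == "__main__":
    print_load(read_gmond_xml("ganglia.example.edu"))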
Job Status Monitoring: McFarmGraph
Operating since Sept. 2003
Farm Performance Monitor: McPerM
Designed, implemented, and improved by UTA students
Operating since Sept. 2003
(Slide credit: Joel Snow, Langston University, "D0 Grid/Remote Computing", April 2004)
DØSAR MC Delivery Stat. (as of May 10, 2004)

Institution          Inception   NMC (TMB) ×10⁶
LTU                  6/2003      0.4
LU                   7/2003      2.3
OU                   4/2003      1.6
Tata, India          6/2003      2.2
Sao Paulo, Brazil    4/2004      0.6
UTA-HEP              1/2003      3.6
UTA-RAC              12/2003     8.2
DØSAR Total          (5/10/04)   18.9
DØSAR Computing & Human Resources
(F = faculty, PD = postdoc, GA = graduate assistant, Sys = system administrator)

Institution      CPU (GHz) [future]   Storage (TB)        People
Cinvestav        13                   1.1                 1F + ?
Langston         22                   1.3                 1F + 1GA
LTU              25 + [12]            1.0                 1F + 1PD + 2GA
KU               12                   ??                  1F + 1PD
KSU              40                   1.2                 1F + 2GA
OU               19 + 270 (OSCER)     1.8 + 120 (tape)    4F + 3PD + 2GA
Sao Paulo        60 + [120]           4.5                 2F + many
Tata Institute   52                   1.6                 1F + 1Sys
UTA              430                  74                  2.5F + 1Sys + 1.5PD + 3GA
Total            943 [1075]           85.5 + 120 (tape)   14.5F + 2Sys + 6.5PD + 10GA
How does the current Tevatron MC Grid work?
[Figure: workflow diagram – a client site submits requests through the global grid to submission sites, which dispatch work to regional grids and on to execution sites (desktop clusters and dedicated clusters), with SAM handling data delivery throughout.]
Actual DØ Data Re-processing at UTA
Network Bandwidth Needs
Summary and Plans
• Significant progress has been made in implementing grid computing technologies for the DØ experiment
  – The DØSAR Grid has been operating since April 2004
• A large amount of documentation and expertise has been accumulated
• Moving toward data re-processing and analysis
  – A first partial re-processing of 180 million events has been completed
  – Different levels of complexity
• Improved infrastructure is necessary, especially network bandwidth
  – LEARN will boost the stature of Texas in the HEP grid computing world
  – Started working with AMPATH and the Oklahoma, Louisiana, and Brazilian consortia (tentatively named the BOLT Network) → a Texan consortium is needed
• UTA's experience with the DØSAR Grid will be an important asset for the expeditious implementation of THEGrid