
Meta-Computing at DØ Igor Terekhov, for the DØ Experiment Fermilab, Computing Division, PPDG ACAT 2002 Moscow, Russia June 28, 2002


Page 1:

Meta-Computing at DØ

Igor Terekhov, for the DØ Experiment
Fermilab, Computing Division, PPDG
ACAT 2002, Moscow, Russia
June 28, 2002

Page 2:

2

Overview

Overview of the D0 Experiment
Introduction to computing and the paradigm of distributed computing
SAM – the advanced data handling system
Global Job and Information Management (JIM) – the current Grid project
Collaborative Grid work

Page 3:

3

The DØ Experiment

p-pbar collider experiment, 2 TeV
Detector (real) data:
  1,000,000 channels (793k from the Silicon Microstrip Tracker), 5-15% read at a time
  Event size 250 KB (25% increase in Run IIb)
  Recorded event rate 25 Hz in Run IIa, 50 Hz (projected) in Run IIb
  On-line data rate 0.5 TB/day, total 1 TB/day
  Estimated 3-year totals (incl. processing and analysis): over 10^9 events, 1-2 PB
Monte Carlo data:
  6 remote processing centers
  Estimated ~300 TB in the next 2 years
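The quoted rates are easy to cross-check with a back-of-the-envelope calculation (not from the slides; the ~50% live fraction is my assumption):

```python
# Rough consistency check of the slide's numbers (illustrative only).
event_size_bytes = 250e3      # 250 KB per event, Run IIa
rate_hz = 25                  # recorded event rate, Run IIa
seconds_per_day = 86400

bytes_per_day = event_size_bytes * rate_hz * seconds_per_day
print(f"Detector data: {bytes_per_day / 1e12:.2f} TB/day")      # ~0.54 TB/day

live_fraction = 0.5           # assumed duty factor, not a slide number
events_3_years = rate_hz * seconds_per_day * 365 * 3 * live_fraction
print(f"Events in 3 years: {events_3_years:.1e}")                # order 10^9
```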

Page 4:

4

The Collaboration

600+ people
78 institutions
18 countries
A large Virtual Organization whose members share resources for solving common problems

Page 5:

6

Analysis Assumptions

Job class | Num. of jobs | % of dataset | Duration | CPU/evt (500 MHz)
Long      | 6            | 30%          | 12 weeks | 5 sec
Medium    | 50           | 10%          | 4 weeks  | 1 sec
Short     | 150          | 1%           | 1 week   | 0.1 sec
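One way to read the table is as an aggregate CPU budget. A hedged sketch, assuming a total dataset of roughly 10^9 events (my assumption, consistent with the 3-year estimate earlier) and that each job touches the stated fraction of it:

```python
# Illustrative CPU budget implied by the analysis assumptions above.
# The 1e9-event dataset size is an assumption, not a number from this slide.
total_events = 1e9
seconds_per_year = 365 * 86400

job_classes = [
    # (class, number of jobs, fraction of dataset, CPU sec/event at 500 MHz)
    ("Long",   6,   0.30, 5.0),
    ("Medium", 50,  0.10, 1.0),
    ("Short",  150, 0.01, 0.1),
]

for name, njobs, frac, cpu_per_event in job_classes:
    cpu_seconds = njobs * frac * total_events * cpu_per_event
    print(f"{name:6s}: {cpu_seconds / seconds_per_year:6.1f} CPU-years at 500 MHz")
```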

Page 6:

7

Data Storage

The Enstore Mass Storage System, http://www-isd.fnal.gov/enstore/index.html
All data, including derived datasets, is stored on tape in an Automated Tape Library (ATL) robot
Enstore is attached to the network and accessible via a cp-like command
Other, remote MSSs may be used (the distributed-ownership paradigm – grid computing)
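Since Enstore is reached with a cp-like command over the network, client code can treat a tape-resident file much like a remote copy. A minimal sketch, assuming a bare two-argument encp invocation and a /pnfs-style path (both should be checked against the Enstore documentation):

```python
import subprocess

def fetch_from_enstore(tape_path: str, local_path: str) -> None:
    """Copy a file out of the tape-backed namespace with the cp-like
    'encp' command. The two-argument form and the /pnfs path in the
    example are illustrative assumptions, not verified Enstore usage."""
    subprocess.run(["encp", tape_path, local_path], check=True)

# Hypothetical usage:
# fetch_from_enstore("/pnfs/sam/dzero/some_raw_file.raw", "/scratch/some_raw_file.raw")
```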

Page 7:

8

Data Handling - SAM

Responds to the above challenges in:
  Amounts of data
  Rate of access (processing)
  The degree to which the user base is distributed
Major goals and requirements:
  Reliably store (real and MC) produced data
  Distribute the data globally to remote analysis centers
  Catalogue the data – contents, status, locations, processing history, user datasets, etc.
  Manage resources
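To make the cataloguing requirement concrete, here is a minimal sketch of the kind of per-file record such a catalogue has to carry; the field names are hypothetical and do not reflect the actual SAM schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FileRecord:
    """Hypothetical per-file metadata entry; the real SAM catalogue lives
    in a central Oracle database and differs in detail."""
    name: str                                           # unique file name
    data_tier: str                                      # e.g. "raw", "reconstructed"
    event_count: int
    size_bytes: int
    status: str                                         # e.g. "on tape", "cached"
    locations: List[str] = field(default_factory=list)  # stations / MSS volumes holding it
    parents: List[str] = field(default_factory=list)    # processing history (input files)
    datasets: List[str] = field(default_factory=list)   # user datasets referencing it

example = FileRecord(
    name="d0_raw_000123_456.raw", data_tier="raw",
    event_count=12000, size_bytes=3_000_000_000,
    status="on tape", locations=["enstore"],
)
```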

Page 8:

9

SAM Highlights

SAM is Sequential data Access via Meta-data, http://d0db.fnal.gov/sam
Joint project between D0 and the Computing Division, started in 1997 to meet the Run II data handling needs
Employs a centrally managed RDBMS (Oracle) for the meta-data catalog
Processing takes place at stations
Actual data is managed by a fully distributed set of collaborating servers (see architecture later)

Page 9:

10

SAM Advanced Features

Uniform interfaces for data access modes:
  The online system, reconstruction farm, Monte Carlo farm, and analysis server are all subclasses of the station.
Uniform capabilities for processing at FNAL and remote centers
On-demand data caching and forwarding (intra-cluster and global)
Resource management:
  Co-allocation of compute and data resources (interfaces with the batch system abstraction)
  Fair-share allocation and scheduling
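A toy sketch of the on-demand caching and forwarding idea: a station serves a file from its own cache if it can, otherwise pulls it from a peer station or the mass storage system and keeps a copy. All names are hypothetical, not the SAM implementation:

```python
# Toy model of on-demand caching with forwarding between stations.
class Station:
    def __init__(self, name, mss, peers=None):
        self.name = name
        self.mss = mss              # object with a fetch(filename) -> bytes method
        self.peers = peers or []    # other Station instances reachable over the network
        self.cache = {}             # filename -> bytes, standing in for the cache disk

    def get(self, filename):
        if filename in self.cache:               # local cache hit
            return self.cache[filename]
        for peer in self.peers:                  # prefer an existing nearby replica
            if filename in peer.cache:
                data = peer.cache[filename]
                break
        else:                                     # fall back to the MSS (tape)
            data = self.mss.fetch(filename)
        self.cache[filename] = data               # routing + caching = replication
        return data
```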

Page 10:

11

Components of a SAM Station

[Diagram: a station comprises the Station & Cache Manager, a File Storage Server, File Stager(s), Project Masters/Consumers, eworkers, File Storage Clients, and Producers, backed by cache disk and temp disk; data flows between these and the MSS or other stations, with separate control paths.]

Page 11:

13

SAM as a Distributed System

[Diagram: several data sites connected over the WAN, FNAL among them; each site runs stations with an optimizer and a logger, and the database server is shared locally (optional) or globally (the standard arrangement).]

Page 12:

14

Data Flow

[Diagram: data flow between data sites across the WAN.]

Routing + Caching = Replication

Page 13:

15

SAM as a Data Grid

Provides high-level collective services of reliable data storage and replication
Embraces multiple MSSs (Enstore, HPSS, etc.), local resource management systems (LSF, FBS, PBS, Condor), and several different file transfer protocols (bbftp, Kerberos rcp, GridFTP, …)
Optionally uses Grid technologies and tools:
  Condor as a batch system (in use)
  Globus FTP for data transfers (ready for deployment)
From de facto to de jure…
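The support for several file transfer protocols suggests a thin dispatch layer between the station and the actual movers. A hedged sketch of what such a layer could look like; the command lines are only approximate (rcp and globus-url-copy exist with roughly these forms, the bbftp entry is a placeholder) and none of this is the real SAM configuration:

```python
import subprocess

# Hypothetical protocol dispatch table; the real SAM configuration differs.
TRANSFER_COMMANDS = {
    "rcp":     lambda src, dst: ["rcp", src, dst],
    "gridftp": lambda src, dst: ["globus-url-copy", src, dst],   # src/dst as URLs
    "bbftp":   lambda src, dst: ["bbftp-wrapper", src, dst],     # placeholder command
}

def transfer(protocol: str, src: str, dst: str) -> None:
    """Move one file using the protocol configured for this route."""
    subprocess.run(TRANSFER_COMMANDS[protocol](src, dst), check=True)
```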

Page 14:

16

[Diagram: the SAM architecture drawn as Grid-style layers.
Fabric: tape storage elements, disk storage elements, compute elements, LANs and WANs, resource and services catalog, replica catalog, meta-data catalog, code repository.
Connectivity and resource: CORBA, UDP, file transfer protocols (ftp, bbftp, rcp, GridFTP), mass storage system protocols (e.g. encp, hpss); authentication and security via GSI, SAM-specific user/group/node/station registration, and the bbftp 'cookie'.
Collective services: catalog protocols, significant event logger, naming service, database manager, catalog manager, SAM resource management, batch systems (LSF, FBS, PBS, Condor), data mover, job services, storage manager, job manager, cache manager, request manager.
Application: request formulator and planner; client applications (web, Python and Java codes, command line, D0 Framework C++ codes).
Names in "quotes" ("Dataset Editor", "File Storage Server", "Project Master", "Station Master", "Stager", "Optimiser") are SAM-given software component names; marked components will be replaced or added/enhanced using PPDG and Grid tools.]

Page 15:

17

Dzero SAM Deployment Map

[Map: SAM deployment around the world; sites are marked as processing centers or analysis sites.]

Page 16:

18

SAM usage statistics for DZero

• 497 registered SAM users in production
• 360 of them have at some time run at least one SAM project
• 132 of them have run more than 100 SAM projects
• 323 of them have run a SAM project at some time in the past year
• 195 of them have run a SAM project in the past 2 months
• 48 registered stations, 340 registered nodes
• 115 TB of data on tape
• 63,235 cached files currently (over 1 million cache entries total)
• 702,089 physical and virtual data files known to SAM
• 535,048 physical files (90K raw, 300K MC related)
• 71,246 "analysis" projects ever run
• http://d0db.fnal.gov/sam_data_browsing/ for more info

Page 17:

19

SAM + JIM Grid

So we can reliably replicate a TB of data – what's next?
It is the handling of jobs, not data, that constitutes the top of the services pyramid
Need services for job submission, brokering and reliable execution
Need resource discovery and opportunistic computing (shared vs dedicated resources)
Need monitoring of the global system and jobs
Job and Information Management (JIM) emerged
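A key point is that brokering should be informed by the data handling layer: jobs go where their data (or spare capacity) already is. A toy illustration with entirely hypothetical structures, not the JIM design:

```python
# Toy data-aware broker: prefer the site that already caches the largest
# share of the requested dataset, breaking ties on free CPUs.
def pick_site(n_dataset_files, cached_files_per_site, free_cpus_per_site):
    candidates = [s for s, cpus in free_cpus_per_site.items() if cpus > 0]
    if not candidates:
        return None
    def score(site):
        locality = cached_files_per_site.get(site, 0) / max(n_dataset_files, 1)
        return (locality, free_cpus_per_site[site])
    return max(candidates, key=score)

# Example: 1000-file dataset, mostly cached at FNAL -> job is sent to "fnal"
# pick_site(1000, {"fnal": 900, "lyon": 400}, {"fnal": 20, "lyon": 150})
```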

Page 18:

20

JIM and SAM-Grid

(NB: Please hear Gabriele Garzoglio's talk)
Project started in 2001 as part of the PPDG collaboration to handle D0's expanded needs
Recently included CDF
These are real Grid problems, and we are incorporating (adopting) or developing Grid solutions
http://www-d0.fnal.gov/computing/grid
PPDG, GridPP, iVDGL, DataTAG and other Grid projects

Page 19:

21

SAMGrid Principal Components

(NB: Please come to Gabriele's talk)
Job Definition and Management: The preliminary job management architecture is aggressively based on the Condor technology provided through our collaboration with the University of Wisconsin CS group.
Monitoring and Information Services: We assign a critical role to this part of the system and widen the boundaries of this component to include all services that provide, or receive, information relevant to job and data management.
Data Handling: The existing SAM data handling system, when properly abstracted, plays a principal role in the overall architecture and has direct effects on the job management services.
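Since the job management builds on Condor-G submission to remote gatekeepers, a submit description is the natural interface. A hedged sketch that generates one from Python; the attribute names follow Condor-G documentation of that era but should be verified, and the gatekeeper contact string is invented:

```python
# Sketch of generating a Condor-G style submit description (attribute names
# and the gatekeeper contact below are assumptions to be checked).
SUBMIT_TEMPLATE = """\
universe        = globus
globusscheduler = {gatekeeper}
executable      = {executable}
arguments       = {arguments}
output          = job.out
error           = job.err
log             = job.log
queue
"""

def write_submit_file(path, gatekeeper, executable, arguments=""):
    with open(path, "w") as f:
        f.write(SUBMIT_TEMPLATE.format(
            gatekeeper=gatekeeper, executable=executable, arguments=arguments))

# Hypothetical usage, followed by 'condor_submit job.submit':
# write_submit_file("job.submit",
#                   "gatekeeper.some-d0-site.example/jobmanager-pbs",
#                   "run_d0_analysis.sh", "--dataset my_dataset")
```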

Page 20:

22

SAM-Grid Architecture

[Diagram: the architecture is drawn as three principal components with their services.
Job Handling: JH client, request broker, job scheduler, Condor-G with the Condor MMS, GRAM, and a site gatekeeper in front of the local batch system and compute element resources.
Monitoring and Information: logging and bookkeeping, info processor and converter, resource info, Grid sensors, (all) job status updates, MDS-2, Condor ClassAds, and a Grid replica catalog.
Data Handling: SAM, DH resource management, data delivery and caching, and the replica catalog.
Security spans the components via AAA services and GSI. The legend distinguishes principal components, services, implementations or libraries, and information flows.]

Page 21:

23

SAMGrid: Collaboration of Collaborations

HEP Experiments are traditionally collaborativeHEP Experiments are traditionally collaborative Computing solutions in the Grid era: new types of Computing solutions in the Grid era: new types of

collaborationcollaboration Sharing solution within experiment – UTA MCFarm Sharing solution within experiment – UTA MCFarm

software etcsoftware etc Collaboration between experiments – D0 and CDF Collaboration between experiments – D0 and CDF

joining forces an important event for SAM and joining forces an important event for SAM and FNALFNAL

Collaboration among the grid players: Physicists, Collaboration among the grid players: Physicists, Computer Scientists (Condor and Globus teams), Computer Scientists (Condor and Globus teams), Physics-oriented computer professionals (such as Physics-oriented computer professionals (such as myself)myself)

Page 22:

24

Conclusions

The Dzero experiment is one of the largest currently running experiments and presents computing challenges
The advanced data handling system, SAM, has matured. It is fully distributed, its model has proven sound, and we expect it to scale to meet Run II needs for both D0 and CDF
Expanded needs are in the area of job and information management
The recent challenges are typical of Grid computing, and D0 engages actively, in collaboration with computer scientists and other Grid participants
More in Gabriele Garzoglio's talk