David Adams ATLAS ATLAS Distributed Analysis David Adams BNL September 30, 2004 CHEP2004 Track 5: Distributed Computing Systems and Experiences


David Adams

ATLAS

ATLAS Distributed Analysis

David Adams, BNL

September 30, 2004

CHEP2004, Track 5: Distributed Computing Systems and Experiences


Contents

Goals
Key concepts
• Datasets
• Transformations
• Jobs
• AJDL
Service architecture
Analysis services
• DIAL
• ATPROD
• ARDA
Catalog services
Data management services
Clients
Status
ARDA
Conclusions
Contributors
More information


Goals

Provide to globally distributed users:
• Access to globally distributed data that
  – Is comprehensible
  – Enables selection of relevant data
  – Enables sensible placement of data
• Means to perform globally distributed processing on this data
  – High-level view that hides details of underlying middleware
  – But enables monitoring and debugging
  – Automatic, complete and accurate provenance

All the above must be easy to use
• Well-integrated with analysis environments
  – ROOT, Python, etc.
• Graphical views where appropriate
  – Browse and examine data
  – Monitor jobs, …


Key concepts

Dataset
• Describes a collection of data
  – E.g. a collection of reconstructed events
  – A collection of histograms, …

Transformation
• Defines an operation to be performed on the data
• Dataset → Dataset
• Application + task (user configuration of application)

Job
• Instance of a transformation
• Typical user request processed as a collection of sub-jobs
  – Same transformation acting on sub-datasets
  – Plus splitting of the input dataset and merging of the output


Key concepts (cont)

[Figure: analysis workflow between a user analysis framework and an analysis service]

Boxes: User analysis framework; Dataset, Dataset 1, Dataset 2; Application; Task; Code; Result, Result 1, Result 2; Analysis Service; Job 1, Job 2

Annotations: ROOT, GANGA, …; Event data, summary data, tuples, …; Athena, dialpaw, ROOT, …

Numbered steps:
1. create or locate
2. select
3. create or select
4. select
5. submit(app,tsk,ds)
6. split
7. create
8. run(app,tsk,ds1) and run(app,tsk,ds2)
9. fill (one per job)
10. gather


Datasets

Dataset includes
• Identifier
• Location of data, e.g. list of logical files
  – Absent for virtual datasets
• Content (i.e. description of the content)
  – E.g. list of event IDs and the type of data for each event
  – Or a list of histogram names
• List of constituent datasets
  – Usually their IDs
  – When dataset is composite, access to location and content may require use of the constituent datasets

Dataset selection catalog holds metadata

Dataset replica catalog holds replica mapping
• 1 virtual → N concrete dataset mapping
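A minimal Python sketch of the dataset description above may help make the composite case concrete. The class and field names (`Dataset`, `dataset_id`, `files`, `constituent_ids`) and the helper `all_files` are hypothetical illustrations, not the DIAL or AJDL classes; the repository is modeled as a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    """Hypothetical in-memory stand-in for a dataset description."""
    dataset_id: str
    # Logical file names; None models a virtual dataset with no location.
    files: Optional[List[str]] = None
    # Description of the content, e.g. event IDs or histogram names.
    content: List[str] = field(default_factory=list)
    # IDs of constituent datasets; non-empty for a composite dataset.
    constituent_ids: List[str] = field(default_factory=list)


def all_files(ds: Dataset, repository: Dict[str, Dataset]) -> List[str]:
    """Collect logical files, descending into constituents when composite."""
    found = list(ds.files or [])
    for cid in ds.constituent_ids:
        found.extend(all_files(repository[cid], repository))
    return found


# Two concrete datasets and one composite dataset built from their IDs.
repo = {
    "ds1": Dataset("ds1", files=["lfn:a.root"], content=["evt1", "evt2"]),
    "ds2": Dataset("ds2", files=["lfn:b.root"], content=["evt3"]),
}
composite = Dataset("ds12", constituent_ids=["ds1", "ds2"])
print(all_files(composite, repo))
```

The composite dataset carries no files of its own, so its location is only reachable through the constituents, which is the situation the last sub-bullet describes.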


Datasets (cont)

For ATLAS data, we identify
• Types of data
  – Used to define dataset categories
  – Category will be part of the content specification
• Types of datasets
  – Currently C++ classes with XML data representation
  – Third column indicates if this class exists
  – Likely will move to XML schema as the primary definition
• See table


Datasets (cont)

Name    Type                         Exists  Description
EVIDS   EventDataset                 ×       List of event IDs
EVGEN   AtlasPoolEventDataset        ×       From event generator
HITS    AtlasPoolEventDataset        ×       Hits, e.g. from GEANT
DIGITS  AtlasPoolEventDataset        ×       Digitization of hits
RAW     AtlasByteStreamEventDataset          Raw data
ESD     AtlasPoolEventDataset        ×       Event summary data
AOD     AtlasPoolEventDataset        ×       Analysis oriented data
TAG     AtlasPoolTagEventDataset             Event metadata
NTUP    RootNtupleDataset                    Ntuples
HISTO   RootHistogramDataset         ×       Histograms
CBNT    CbntDataset                  ×       DC1 combined ntuples
TEXT    TextDataset                          Text data, e.g. log files


Transformations

Transformation
• Describes an operation to act on a dataset to produce a new dataset
• Has two components
  – Application = code shared by multiple transformations
    > Usually scripts to locate and run code in software packages
  – Task = user-supplied configuration (parameters or code)

Task
• List of files
  – Presently embedded in task
  – Later could also be logical files
• Named parameters
  – To be added soon
• Typically created by user submitting the job


Transformations (cont)

Application
• Two entry points (presently scripts)
  – build_task fetches task files, compiles, etc.
  – run creates the output dataset from the input dataset and the built task
• Typically created by application developer
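As a rough illustration of the two entry points, the sketch below models an application as a pair of Python callables and a task as a set of embedded files. In the talk the entry points are scripts; the names `Application`, `Task`, `build_cuts_task` and `run_selection`, the file name `cuts.txt`, and the use of a list of numbers as a "dataset" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """User-supplied configuration: embedded files (name -> text)."""
    files: Dict[str, str]


@dataclass
class Application:
    """Code shared by many transformations, exposed as two entry points."""
    build_task: Callable[[Task], dict]
    run: Callable[[dict, List[float]], List[float]]


def build_cuts_task(task: Task) -> dict:
    # Stand-in for fetching the task files and compiling them.
    return {"threshold": float(task.files["cuts.txt"])}


def run_selection(built: dict, input_values: List[float]) -> List[float]:
    # Stand-in for creating an output dataset from input plus built task.
    return [v for v in input_values if v > built["threshold"]]


app = Application(build_task=build_cuts_task, run=run_selection)
task = Task(files={"cuts.txt": "2.5"})
built = app.build_task(task)
print(app.run(built, [1.0, 3.0, 4.5]))
```

Keeping the build step separate from the run step means the task only has to be prepared once, however many sub-datasets the run step is later applied to.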

Software package management
• Need an interface to enable build_task and run scripts to locate software on any machine
• E.g. “locate mypkg 1.2.3” returns /usr/contrib/mypkg/1.2.3/rh73_gcc73
• Also support querying and installation
• Implement as thin layer on existing package management systems
  – Pacman, RPM, local build, …
• Use service to handle installation and removal of packages
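The package interface described above could look roughly like the following sketch. `PackageLocator` and its methods are hypothetical; a real implementation would delegate to Pacman, RPM or a local build rather than keep an in-memory table, and only the `locate mypkg 1.2.3` example and its path come from the slide.

```python
from typing import Dict, List, Optional, Tuple


class PackageLocator:
    """Hypothetical thin layer over whatever package manager a site uses."""

    def __init__(self) -> None:
        # (name, version) -> installation path on this machine
        self._installed: Dict[Tuple[str, str], str] = {}

    def install(self, name: str, version: str, path: str) -> None:
        # A real backend (Pacman, RPM, local build) would do the work here.
        self._installed[(name, version)] = path

    def remove(self, name: str, version: str) -> None:
        # Removing a package that is not installed is silently ignored.
        self._installed.pop((name, version), None)

    def locate(self, name: str, version: str) -> Optional[str]:
        # Returns the installation path, or None if the package is absent.
        return self._installed.get((name, version))

    def query(self, name: str) -> List[str]:
        # Lists the installed versions of a package in sorted order.
        return sorted(v for (n, v) in self._installed if n == name)


locator = PackageLocator()
locator.install("mypkg", "1.2.3", "/usr/contrib/mypkg/1.2.3/rh73_gcc73")
print(locator.locate("mypkg", "1.2.3"))
```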


Transformations (cont)

[Table: transformations by input dataset category (rows, IN) and output dataset category (columns, OUT). Output columns: EVTIDS, EVGEN, HITS, DIGITS, RAW, ESD, AOD, TAG, NTUP, HISTO. Entries are listed left to right for each input row.]

IN       Transformations
(none)   IDBLD, DAQ
EVTIDS   GEN
EVGEN    G4SIM, G4SIM, G4SIM
HITS     DIGI, DIGI, DIGI
DIGITS   PACK, RECO, RECO, RECO
RAW      UNPACK
ESD      AODBLD
AOD      SELECT, TAGBLD, ANALYZE, ANALYZE
TAG      SELECT
NTUP     ANALYZE, ANALYZE

For ATLAS we identify the above transformations
• Characterized by input and output dataset categories
• Most common ones listed; others are possible


Jobs

A job is an instance of a transformation acting on a dataset
• Output result is another dataset
• Partial result may be available before job is complete

Typical user-submitted job is split into sub-jobs
• By splitting input dataset and applying the same transformation to each sub-dataset
• Strategies for splitting and merging results must be provided

Provenance
• Dataset provenance is specified by recording the input dataset and transformation
• More complete information is available from the job:
  – Site, CPU, submission, start and stop times, …
  – Log files maintained for some period, perhaps as datasets
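The split, transform, merge pattern and the provenance record can be sketched in a few lines of Python. Everything here is illustrative: the round-robin `split`, the bin-wise `merge`, the toy transformation `count_parity`, and the provenance dictionary are assumptions chosen for the example, since the talk only states that splitting and merging strategies must be supplied.

```python
from typing import Callable, Dict, List, Tuple


def split(events: List[int], n_sub: int) -> List[List[int]]:
    """One possible splitting strategy: round-robin into n_sub sub-datasets."""
    return [events[i::n_sub] for i in range(n_sub)]


def merge(results: List[Dict[str, int]]) -> Dict[str, int]:
    """One possible merging strategy: add histogram-like counts bin by bin."""
    total: Dict[str, int] = {}
    for result in results:
        for name, count in result.items():
            total[name] = total.get(name, 0) + count
    return total


def count_parity(events: List[int]) -> Dict[str, int]:
    """Stand-in transformation: count even and odd event IDs."""
    even = sum(1 for e in events if e % 2 == 0)
    return {"even": even, "odd": len(events) - even}


def run_job(
    events: List[int],
    transformation: Callable[[List[int]], Dict[str, int]],
    n_sub: int = 2,
) -> Tuple[Dict[str, int], dict]:
    # Apply the same transformation to every sub-dataset, then merge.
    sub_results = [transformation(sub) for sub in split(events, n_sub)]
    output = merge(sub_results)
    # Provenance: the input dataset and the transformation that produced it.
    provenance = {"input": list(events), "transformation": transformation.__name__}
    return output, provenance


output, provenance = run_job(list(range(10)), count_parity, n_sub=3)
print(output, provenance["transformation"])
```

Because merging here is a simple sum, a partial result can be formed from whichever sub-jobs have finished, which matches the remark that partial results may be available before the job completes.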


AJDL

AJDL = Abstract Job Definition Language

Components are representations of
• Dataset
• Transformation = Application + Task
• Job
• JobPreferences
• File
• Identifiers for all the above

Presently defined as C++ classes
• With methods to write to and read from XML
  – Different for each subclass of Dataset
  – Same for subclasses of Job
• XML specified in DTD files
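To illustrate the write-to and read-from XML idea, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The AJDL components are C++ classes whose XML is defined by DTD files not shown in the talk, so the element and attribute names used here (`task`, `file`, `id`, `name`) are invented for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical task component: an identifier plus a list of file names."""
    task_id: str
    file_names: List[str]

    def to_xml(self) -> str:
        # Serialize to an XML string (element names are assumptions).
        root = ET.Element("task", {"id": self.task_id})
        for name in self.file_names:
            ET.SubElement(root, "file", {"name": name})
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "Task":
        # Rebuild the object from the XML string produced by to_xml.
        root = ET.fromstring(text)
        names = [f.get("name") for f in root.findall("file")]
        return cls(root.get("id"), names)


original = Task("task-001", ["cuts.txt", "analysis.C"])
xml_text = original.to_xml()
restored = Task.from_xml(xml_text)
print(xml_text)
```

A round trip through the XML string returns an equal object, which is what allows a component to be written at one site and read back at another.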


AJDL (cont)

Look at moving to XML schema
• Automatically derive classes from XML definitions
  – Automatic support for other languages (Python, Java, …)
• In collaboration with GANGA and others

At the same time
• Try to find one representation for all datasets
• Introduce separate type for event ID lists
  – Often too large to carry around in a dataset

Also interested in specifying interfaces for AJDL services
• Those that operate on AJDL components
• Services listed later

Interested in working with others on these specifications


Service architecture

ADA itself is distributed
• Allows data access and job management to be distributed
  – Important for scaling to a large number of users
• Collection of web services
  – Analysis service for job processing
  – Job monitoring
  – Catalog services
    > Metadata
    > Repository
    > Replica (not only for files)
• Users interact through clients
  – ROOT client from DIAL
  – Python client from GANGA


Service architecture

[Figure: ADA service architecture in three layers]
• GUI and command line clients: ROOT, PYTHON
• High level services for cataloging and job submission and monitoring: AMI DBS, DIAL AS, ATPROD AS, ARDA AS
• Workload management systems: LSF, CONDOR, gLite WMS, ATPROD
• Interface labels on the connections: AJDL, sh, SQL, gLite, AMI ws


DIAL analysis service

Two instances running at BNL
• Long-running jobs using Condor job submission
• Interactive response using fast LSF queue

Working to improve interactive response
• Submit jobs to perform result merging
  – Presently done on service host
• Use parallel jobs for merging
• Long term, look at the use of job agents
  – Possibly as part of ARDA

Add service to act as switch
• Delegate jobs based on
  – Job requirements
  – Desired response time
  – Resource availability
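One way to picture the proposed switch is as a small selection function over the available analysis services. The sketch below is only an assumption about how such delegation might work: the `Service` and `JobRequest` fields, the numbers, and the rule "pick the eligible service with the shortest expected wait" are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Service:
    name: str
    max_cpu_hours: float    # largest job the service accepts
    typical_wait_s: float   # expected delay before a job starts
    free_slots: int         # currently available resources


@dataclass
class JobRequest:
    cpu_hours: float        # job requirement
    max_wait_s: float       # desired response time


def choose_service(job: JobRequest, services: List[Service]) -> Optional[Service]:
    """Return the eligible service with the shortest expected wait, if any."""
    candidates = [
        s for s in services
        if s.free_slots > 0
        and s.max_cpu_hours >= job.cpu_hours
        and s.typical_wait_s <= job.max_wait_s
    ]
    return min(candidates, key=lambda s: s.typical_wait_s, default=None)


# Illustrative numbers only: a fast interactive queue and a batch service.
services = [
    Service("interactive", max_cpu_hours=0.1, typical_wait_s=5.0, free_slots=4),
    Service("batch", max_cpu_hours=100.0, typical_wait_s=600.0, free_slots=50),
]
chosen = choose_service(JobRequest(cpu_hours=0.05, max_wait_s=60.0), services)
print(chosen.name if chosen else "no service available")
```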


ATPROD analysis service

Enable submission to the existing ATLAS production system
• At least for user-level production

Strategy
• Split input dataset
• Make an entry in the production catalog for each sub-job
• Monitor catalog and gather and merge results as jobs finish
• Same for the other analysis services

Not yet implemented
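Since this service is not yet implemented, the following Python sketch is purely hypothetical. It models the production catalog as a dictionary and shows the strategy listed above: split the input, make one catalog entry per sub-job, then poll the catalog and merge results as sub-jobs finish. The function names, the entry layout and `fake_production_step` (which stands in for the production system completing a sub-job) are all assumptions.

```python
from typing import Dict, List, Set


def submit(catalog: Dict[str, dict], job_id: str, events: List[int], n_sub: int) -> List[str]:
    """Split the input and make one catalog entry per sub-job."""
    sub_ids = []
    for i in range(n_sub):
        sub_id = f"{job_id}.{i}"
        catalog[sub_id] = {"input": events[i::n_sub], "state": "pending", "result": None}
        sub_ids.append(sub_id)
    return sub_ids


def fake_production_step(catalog: Dict[str, dict], sub_id: str) -> None:
    """Stand-in for the production system finishing one sub-job."""
    entry = catalog[sub_id]
    entry["result"] = [e * e for e in entry["input"]]
    entry["state"] = "done"


def gather(catalog: Dict[str, dict], sub_ids: List[str], merged: List[int], seen: Set[str]) -> bool:
    """Merge results of newly finished sub-jobs; return True when all are done."""
    for sub_id in sub_ids:
        entry = catalog[sub_id]
        if entry["state"] == "done" and sub_id not in seen:
            merged.extend(entry["result"])
            seen.add(sub_id)
    return len(seen) == len(sub_ids)


catalog: Dict[str, dict] = {}
sub_ids = submit(catalog, "job1", [1, 2, 3, 4], n_sub=2)
merged: List[int] = []
seen: Set[str] = set()
for sub_id in sub_ids:
    fake_production_step(catalog, sub_id)
    finished = gather(catalog, sub_ids, merged, seen)
print(merged, finished)
```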


ARDA analysis service

Enable submission to the gLite WMS
• Let EGEE do the work of matchmaking, brokering, job tracking, monitoring, error reporting, …

There is a service to submit to the existing prototype system

Expect first release of gLite next month
• Quickly deploy an analysis service based on this
• Make regular updates taking advantage of more gLite features


Catalog services

Goals of ADA cataloging:
• Provide a repository for AJDL objects indexed by ID
  – Insert at site A and extract with ID at site B
• Enable users to assign metadata to objects and retrieve with queries
• Record dataset provenance
• Provide job monitoring

Identify three types of catalogs
• Repository
  – Map ID to XML string
• Metadata catalog
  – Map ID to named attributes
• Replica catalog
  – Map ID to a list of IDs
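The three catalog types reduce to three kinds of mapping, which the following Python sketch makes explicit with in-memory dictionaries. The class and method names (`Repository.insert`, `MetadataCatalog.assign`, `ReplicaCatalog.add`, and so on) are hypothetical; the actual catalogs are web services, with AMI named as the first hosting choice.

```python
from typing import Dict, List


class Repository:
    """Maps an ID to the XML string of an AJDL object."""

    def __init__(self) -> None:
        self._xml: Dict[str, str] = {}

    def insert(self, obj_id: str, xml: str) -> None:
        self._xml[obj_id] = xml

    def extract(self, obj_id: str) -> str:
        return self._xml[obj_id]


class MetadataCatalog:
    """Maps an ID to named attributes and supports simple equality queries."""

    def __init__(self) -> None:
        self._attrs: Dict[str, Dict[str, object]] = {}

    def assign(self, obj_id: str, **attributes: object) -> None:
        self._attrs.setdefault(obj_id, {}).update(attributes)

    def query(self, **wanted: object) -> List[str]:
        # IDs whose attributes match every requested name/value pair.
        return sorted(
            obj_id for obj_id, attrs in self._attrs.items()
            if all(attrs.get(k) == v for k, v in wanted.items())
        )


class ReplicaCatalog:
    """Maps an ID to a list of replica IDs."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[str]] = {}

    def add(self, obj_id: str, replica_id: str) -> None:
        self._replicas.setdefault(obj_id, []).append(replica_id)

    def replicas(self, obj_id: str) -> List[str]:
        return list(self._replicas.get(obj_id, []))


repository, metadata, replica_catalog = Repository(), MetadataCatalog(), ReplicaCatalog()
repository.insert("ds1", "<dataset id='ds1'/>")
metadata.assign("ds1", category="AOD")
metadata.assign("ds2", category="ESD")
replica_catalog.add("ds1", "ds1-replica-a")
print(metadata.query(category="AOD"), replica_catalog.replicas("ds1"))
```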


Catalog services (cont)

Required global catalog instances
• Repositories for Dataset, Application, Task, Job
• Metadata catalog for Dataset
  – Same as that used for production?
• Replica catalog for Dataset
• More later
• First choice is to host these in AMI (soon)

Next add local job catalog to record analysis service state
• So service can be restarted without losing jobs

Later look at issues such as
• Distributed cataloging
• Private catalogs


Data management services

DQ (Don Quijote) was developed as part of production
• Provides access to file replica catalogs from all three grids
• Enables file movement, including between grids
• ADA will adopt this for replica management and movement

ATLAS has a plan to add a file transfer service
• Adopt this as well when available

SRM provides file management at the site level
• ATLAS expects sites to deploy this service
• DQ and ADA will use this as it is deployed

gLite has a suite of data management services
• Including SRM
• Rest of service model is complex; hide it behind DQ
  – Already have DQ interface to AliEn file catalog


Clients

DIAL provides a ROOT client
• ACLiC used to build dictionaries for DIAL classes
  – All DIAL classes available on the ROOT command line
  – Enables catalog browsing, job submission, monitoring, etc.

GANGA provides a Python client
• PyLCGDict used to build Python wrappers for DIAL classes
  – All DIAL classes available on the Python command line
• Later build Python-only client
  – Restricted functionality but
  – Greater portability

GUI
• GANGA is developing a GUI
  – Data browsing
  – Configure, submit and monitor jobs


Status

Present system includes
• ROOT and Python command line clients
• DIAL analysis services running
  – Interactive service at BNL
  – Batch service at BNL
• Datasets
  – Classes for combined ntuples, ATLAS-POOL event collections
  – All DC1 CBNT data
  – A few DC2 samples
• Transformations
  – DC1 CBNT histograms
  – DIGI: atlasdigi-8.5.0
  – RECO: atlas-reco-8.x.0, x = 3, 4, 5


ARDA

ATLAS-ARDA prototype
• ARDA is a CERN project to deliver prototype distributed analysis systems for the LHC experiments
  – Based on gLite (EGEE middleware)
• The ATLAS ARDA prototype makes use of the components shown in the figure
• Expect functional system this year

[Figure: ADA service architecture diagram. Clients: ROOT, PYTHON. Services: AMI DBS, DIAL AS, ATPROD AS, ARDA AS. Workload management: LSF, CONDOR, gLite WMS, ATPROD. Interface labels: AJDL, sh, SQL, gLite, AMI ws.]


Conclusions

Status
• ADA is coming together but there is still much to do
• Still in demo mode; for serious use we must add
  – Dataset description of DC2 data
  – Repositories for applications, tasks, datasets and jobs in AMI
  – Dataset selection catalog in AMI
  – Dataset replica catalogs in AMI
  – Transformations for the full DC2 production/analysis chain
  – Means to move output data to a storage element
• Expect all this year

Future developments (beyond those above)
• Update AJDL, moving to XML schema and adding WSDL
• GUI (expect this soon)
• ATPROD service to access more compute resources
• ARDA service to try out EGEE middleware
• Improvements to DIAL service to improve interactive response


Contributors

DIAL
• D. Adams, W. Deng, V. Sambamurthy, N. Chetan, C. Kannan

GANGA
• K. Harrison, C. Tan, A. Soroko

ARDA
• D. Liko, F. Orellana

AMI
• S. Albrand, J. Fulachier

ATLAS
• C. Haeberli, J. Bahilo, F. Fassi, G. Rybkine, M. Branco

Many useful discussions
• All the above and PPDG, GAG, gLite, …


More information

For more information on ADA, see the home page
http://www.usatlas.bnl.gov/ADA

Includes status of subprojects, relevant talks and documents, and links to associated projects

To try it out, run root demo 3 in the latest DIAL release
http://www.usatlas.bnl.gov/~dladams/dial/releases/0.92

See the ADA paper in the CHEP2004 proceedings