24
David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

Embed Size (px)

Citation preview

Page 1: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

David Adams

ATLAS

ATLAS Distributed Analysis Plans

David AdamsBNL

December 2, 2003

ATLAS software workshopCERN

Page 2: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 2

David Adams

ATLAS

Contents

DAC mandate

Scope

Strategy

Scenario for first release

Plans for the first release

GANGA status

DIAL status

Deliverables for the first release

Conclusions

Page 3: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 3

David Adams

ATLAS

DAC MandateDistributed Analysis Coordinator

• Is responsible for coordinating the development of software tools for distributed analysis and their integration into the ATLAS software environment

• Start with the analysis of existing tools such as GANGA, DIAL, AtCom…

• Provide users with transparent access to metadata of different sorts as well as to event data in all stages of processing

• Participate actively in the definition of LCG projects such as ARDA

• Is a member of relevant LCG committees and working groups

Page 4: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 4

David Adams

ATLAS

ScopeAnalysis (not necessarily distributed)

• Supports the manipulation and extraction of summary data (e.g. histograms) from any type of event data

– AOD, ESD, …

• Supports user-level production of event data– e.g. MC generation, simulation and reconstruction

Distributed analysis• Extends the extraction and production support to

include distributed processing and distributed data• Natural extension of non-distributed analysis• Easily invoked from any ATLAS analysis environment

– including Python, ROOT, command line– easily ported to any future environment (e.g. JAS)

Page 5: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 5

David Adams

ATLAS

StrategyImplement DA as a collection of grid services

• As described in ARDA document

• Use ARDA components where possible

• Add missing and ATLAS-specific pieces

Provide clients for ATLAS analysis environments• Python, ROOT, command line

Regular releases• Perhaps for each SW week and ATLAS X.0

• Provide useful tool

• Demonstrate functionality

• Expand functionality with each release

Page 6: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 6

David Adams

ATLAS

Strategy (cont)Look to common projects for most of the pieces

• ARDA, GANGA, DIAL, …

• Share as much as possible with ATLAS production– Also distributed

– Similar interfaces and code for bulk and user-level production

• ADA (ATLAS distributed analysis) must identify these pieces and tie them together

Deployment• ADA services must be deployed at relevant sites

• Provide testing and monitoring of these services

• Work with facilities to deploy and maintain– Also to develop facility-specific features

Page 7: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 7

David Adams

ATLAS

Scenario for first releaseHere is a scenario for user interaction with the first release of ADA

• Authenticate – Proxy from authentication service

• Choose application– E.g. PAW to process DC1 ntuples– Or Athena to process DC2 AOD– Also Athena reconstruction?

• Define task– Analysis: provide code to define and fill histograms– Production: athena job options, maybe code– Perhaps select starting point from provenance catalog

• Select input dataset– From dataset metadata catalog service

Page 8: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 8

David Adams

ATLAS

Scenario for first release (cont)• Create job configuration

– Response time, role, …

• Locate processing service

• Submit job– Application, task, dataset, configuration

• While job is running– Query service for status and partial results

– Examine partial results (e.g. histograms)

– Kill job if results are bad

• When job is finished– Examine complete result

– Modify task or select new dataset and repeat

Page 9: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 9

David Adams

ATLAS

Plan for first releaseSchedule

• Implement and deploy in advance of March 2004 software workshop

• Provide starting point for discussion at that meeting

Building blocks• Code and developers in GANGA and DIAL

– Following sections summarize current status

• LCG project following from ARDA– Just starting; so don’t wait but

– Stay closely coupled to that project

• Open to contributions (especially effort) from others

Page 10: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 10

David Adams

ATLAS

Ganga: status update (1)Work since September software week has focused on refactoring, to create a system that is more modular and more flexible

• In short-term (next 1-2 months), changes will mainly affect developers

• In longer term (in time for DC2) will see significant gains for users: improvements in functionality, ease of use and stability

Have introduced PyBus software bus, developed by W. Lavrijsen with contributions from K. Harrison

• Allows association between module and logical name to be made at run time

• Makes system more configurable: supports ATLAS/LHCb customizations and user add-ons

Moving to XML-based job description• Mechanics have been worked out, but still defining details of XML

schema• Aim to have job description consistent with DIAL (and others?)

Page 11: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 11

David Adams

ATLAS

Ganga: status update (2)Job-options editor (JOE) is evolving to become a more powerful, standalone component, which will be loaded by Ganga

• Assist user in the creation/modification of Gaudi/Athena job options by presenting the user with a hierarchical view of available options files and helping the user with value entry

• In process of creating Job Options Information Resource (JOIR) database

– JOIR database of job options will facilitate validation by providing valid ranges, valid option choices, and option descriptions

– Considering suggestions from LHCb for improving automated job-option extraction

Page 12: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 12

David Adams

ATLAS

Ganga: job definition and submission to LCG

Application

SelectApplication

PrepareSandbox

PrepareAlgFlowOptions

and DLLs

AlgorithmFlowEdit

AlgorithmFlow

AlgParamOptions

SelectDatasets

EditAlgParamOptions

DatasetOptions

AlgFlowOptions

SandboxDLLs

JobOptionsFileCatalogue slice

Submit Job

Metadatacatalogue

AlgOptionscatalogue

DLLs

Filecatalogue

Page 13: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 13

David Adams

ATLAS

Ganga: future plans

Plans well defined up to March 2004• Work towards Ganga/DIAL integration within ADA

• Enable job submission to LCG

• Release improved version of JOE

• Include interface to Pacman 3 for package installation– Informal Pacman workshop pencilled in for January 2004

• More tentatively, looking at possibilities for interfacing to Atlantis for displaying event data

Request for GridPP funding beyond December 2004 requires ATLAS/LHCb work plan for Ganga up to September 2007

• Need to ensure ATLAS priorities are taken into account

Page 14: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 14

David Adams

ATLAS

DIAL statusRelease 0.60

• Made in November

• Has application to process combined ntuple datasets with PAW

• Command line and ROOT clients

• Processing can be done by instantiating a private scheduler or by contacting a persistent web service

• Dataset catalogs have been implemented– DSC – dataset selection catalog

– DRC – dataset replica catalog

– Datasets created for all DC1 combined ntuples

Page 15: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 15

David Adams

ATLAS

DIAL status (cont)High-level JDL

• DIAL envisions a hierarchy of schedulers

• Interface to these schedulers constitutes a high-level JDL (job definition language)

– Job submission, monitoring and gathering of results

– See figure

• Would like to standardize this JDL so schedulers can be shared between projects and experiments

– See figure

Page 16: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 16

David Adams

ATLAS

UserAnalysis

Job 1

Job 2

Application Task

Dataset 1

Scheduler

1. Create or locate

2. select 3. Create or select

4. select

5. submit(app,tsk,ds)

6. splitDataset

Dataset 2

7. create

e.g. ROOT

e.g. athena

Result9. fill

10. gather

Result 9. fill

Result CodeComponents of DIAL

high-level JDL

Page 17: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 17

David Adams

ATLAS

DIAL status: sharing via JDL

P I/S E A L

G A N G A

R O O T

J A S

C o m m a nd line

H ighle v e lJ D L

P R O O FG A N G A -LC G

C o nd o r-G

G C E /C him e raS T A RJ D A P (J A S )

D IA L-inte ra c tiv e

An a ly s is e n v iro n m e n ts S ch e d u le r s

G r idse rv ice s

A T LA S p ro d u c tio n

P lu g -inclie n ts

P o rta l/s w itc h

Page 18: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 18

David Adams

ATLAS

Deliverables for first releaseComments

• Goal is to support the scenario outlined earlier

• Build on current GANGA and DIAL implementations and plans

• Emergence of ARDA project may change plans

• Coordination with ATLAS production may also lead to changes

• Add more tasks if more ideas and effort are found

Page 19: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 19

David Adams

ATLAS

Deliverables for first release (cont)Authentication service

• GSI based

• Support both EDG and US certificates

High-level JDL• Start from current DIAL interface

• Incorporate ideas from PPDG, ARDA, …– If available in time

• This defines the interface (WSDL) for the following analysis and production services

Page 20: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 20

David Adams

ATLAS

Deliverables for first release (cont)Interactive analysis service

• Build on existing DIAL scheduler service– Add authentication

– Deploy as web or grid service

• Client schedulers– Keep command line and ROOT clients

– Add Python (GANGA) client

> Possibly with associated GUI

• Application/task/dataset– Keep PAW with fortran task to fill histograms from HBOOK

combined ntuples

– Add ROOT with C++ task to fill from ROOT ntuples?

– Add athena with C++ task to fill from AOD?

Page 21: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 21

David Adams

ATLAS

Deliverables for first release (cont)User-level batch production service?

• Start from GANGA LCG submission service– Add high-level JDL– Requires GANGA to support client-server

• Other candidates for production services:– GCE/Chimera– DIAL– New ATLAS production model– Switch to choose between these

• Supported production tasks– Reconstruction– Simulation?– Event generation?– Fill histograms from AOD?

Page 22: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 22

David Adams

ATLAS

Deliverables for first release (cont)Dataset and file catalog services

• Functionality:– Means for users to select an input dataset

– Means for production to register output dataset

– Means for system (e.g. DIAL scheduler) to turn dataset specification into accessible physical files

• Start from AMI and DIAL

• Need file catalog and replication services– Magda, RLS1, RLS2, …

Page 23: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 23

David Adams

ATLAS

ConclusionsDistributed analysis is a new project for ATLAS

Philosophy• Tightly integrate with non-distributed analysis

• Be neutraluse client-server mechanism to support different analysis environments and different processing systems

• Be flexiblecapabilities (and hence demands) will change as technology evolves

• Be responsive to evolving user requirements

• Build on existing ideas and projects including GANGA, DIAL, ATLAS production and ARDA

Page 24: David Adams ATLAS ATLAS Distributed Analysis Plans David Adams BNL December 2, 2003 ATLAS software workshop CERN

ADA Plans ATLAS SW – Grid session December 2, 2003 24

David Adams

ATLAS

Conclusions (cont)Plan of action

• Define interface (high-level JDL)

• Quickly implement services for analysis, user-level production and dataset catalogs

• Expose to users, learn lessons and re-implement

• Repeat

More information• Web site coming soon

• Mail to [email protected]