Analysis Software Strategy Jürgen Knobloch HTASC, DESY 9 October 2001 AIDA ANAPHE LIZARD

Preview:

DESCRIPTION

Jürgen Knobloch API 3 ANAPHE Components Lizard Commander Histograms NTuples Fitting Plotting VectorOfPoints Functions Analyzer Abstract types HTL Tags (HepODBMS Gemini/HepFitting Qplotter VectorOfPoints CLHEP Class Libraries for HEP Python SWIG Objectivity/DB NAG-C Qt Implemetations (HEP-specific) non-HEP components User Interface - using Abstract Types AIDA (Abstract Interfaces for Data Analysis)

Citation preview

Analysis Software Strategy

Jürgen KnoblochHTASC, DESY 9 October 2001

AIDA

ANAPHE

LIZARD

API 2Jürgen Knobloch

Anaphe – Lizard - AIDA

• Anaphe - Libraries for Hep Computing– Full replacement of CERNLIB– Open source, mostly “license-free”

• Lizard is based on Anaphe components and the Python scripting language (through SWIG)

– PAW functionality– young but has very solid base in mature Anaphe– real plug-in structure

• AIDA project (Abstract Interfaces for Data Analysis)

– Interface definitions & classes for C++ and Java

API 3Jürgen Knobloch

ANAPHE Components

LizardCommander

HistogramsNTuplesFittingPlottingVectorOfPointsFunctionsAnalyzer

Abstract types

HTLTags (HepODBMSGemini/HepFittingQplotterVectorOfPoints

CLHEPClass Libraries for

HEP

Python SWIG

Objectivity/DBNAG-CQt

Implemetations (HEP-specific)

non-HEP components

User Interface - using Abstract Types

AIDA(Abstract

Interfaces for Data Analysis)

API 4Jürgen Knobloch

ANAPHE - History

• LHC++ “Libraries for Hep Computing”– 1995 project initiated with the aim to develop a modular

replacement of CERNLIB (lib and tools)– using standards (STL, OpenGL, ...) and commercial

components (e.g., NAG_C, OpenInventor, IrisExplorer)

• First iteration on physics data analysis tool – data driven approach (based on IRIS Explorer) – GUI based, not command line driven

• Request to create new physics analysis tool – September 99:new requirements defined together with

experiments– identified categories/components and Abstract Types

API 5Jürgen Knobloch

AIDA - History

• started in Sept. 1999 (HepVis 99, Orsay)– several (mini-) workshops since then

• main ones Paris 2000 and Boston 2001• release 1.0 summer 2000

– concentrated on “developers view”• Histogram package only

– IAxis, IHistogram, IHistogram1D, IHistogram2D, IHistogramFactory

• release 2.0 May 2001 (“Boston release”)– about 20 Interface classes – aiming at discussion and gathering feedback

API 6Jürgen Knobloch

History and Status - Lizard

• Started after CHEP-2000• Full version out since June 2001

– “PAW like” analysis functionality plus– on-demand loading of compiled code using shared

libraries• gives full access to experiment’s analysis code

and data– based on Abstract Interfaces

• flexible and extensible• License-free version available

API 7Jürgen Knobloch

USDP-like approach

• Start with OO analysis– collection of user requirements

• OO design phase– define categories and classes– find patterns

• Create prototype– considered “throw away”

• get feedback from users• Iterate, iterate, iterate ...

API 8Jürgen Knobloch

User requirements

• Ease of use (“like PAW”)• Foresee customization/integration

– e.g., use persistency/messaging/... from experiment

• Framework used will not be exposed/imposed– needs to be compatible with experiment’s

framework• Plan for extensions

– “code for now, design for the future”• Maximize flexibility/interoperability

API 9Jürgen Knobloch

Architecture Overview

• Maximize flexibility and re-use– Abstract Interfaces allow each component to

develop independently– not bound to a specific implementation of any

component (“plugin” style) – re-use of existing packages to implement

components reduces start-up time significantly• Identify and use patterns - avoid anti-patterns

– learn from other people’s experiences/failures

API 10Jürgen Knobloch

“Ignominy” Tool to Quantify Modularity

0123456789

101112

0% 10% 20% 30% 40% 50% 60% 70%% packages in cyclic dependency

NC

CD

= s

um s

ingl

e pa

ckag

e de

pend

enci

es n

orm

alis

ed to

a

bina

ry tr

ee

ATLAS

IGUANA

COBRA (CMS)

ROOT

ORCA (CMS)

Anaphe / Lizard

GEANT4

idealNCCD (“spaghetti index”)

1.0: good toolkit< 1.0: independent packages> 1.0: strongly-coupled

ideal toolkit

See:See:[8-024][8-024]

IncludesFortran

API 11Jürgen Knobloch

CLHEP

• HEP foundation class library– Random number generators– Physics vectors

• 3- and 4- vectors– Geometry– Linear algebra– System of units– more packages recently added

• will continue to evolve• wwwinfo.cern.ch/asd/lhc++/clhep/

API 12Jürgen Knobloch

2D Graphics libraries

• Qt – multi-platform C++ GUI toolkit

• C++ class library, not wrapper around C libs

• superset of Motif and MFC• available on Unix and MS Windows• no change for developer

– commercial but with public domain version– www.troll.no

• Qplotter– “add-on” functionality for HEP

• “HIGZ/HPLOT”

API 13Jürgen Knobloch

Basic 3D Graphic Libraries

• OpenGL (basic graphics)– De-facto industry standard for basic 3D graphics– Used in CAD/CAE, games, VR, medical imaging

•OpenInventor (scene mgmt.)–OO 3D toolkit for graphics–Cubes, polygons, text, materials–Cameras, lights, picking–3D viewers/editors,animation–Based on OpenGL/MesaGL

API 14Jürgen Knobloch

Mathematical Libraries

• NAG (Numerical Algorithms Group) C Library– Covers a broad range of functionality

• Linear algebra• differential equations• quadrature, etc.

– Special functions of CERNLIB added to Mark-6 release

• mostly for theory and accelerator • Quality assurance• extensive testing done by NAG

– www.nag.com

API 15Jürgen Knobloch

Histograms: the HTL package

• Histograms are the basic tool for physics analysis

– Statistical information of density distributions

• Histogram Template Library (HTL) – design based on C++ templates– Modular : separation between sampling and

display– Extensible : open for user defined binning

systems– Flexible: support transient/persistent at the

same time– Open: large use of abstract interfaces– recent addition: 3D histograms

API 16Jürgen Knobloch

Fitting and Minimization

• Fitting and Minimization Library (FML)

– common OO interface • NAG-C, MINUIT

– based on Abstract Interfaces• IVector, IModelFunction, …

• Gemini– common minimization

interface – very thin layer

API 17Jürgen Knobloch

Tags, Ntuples and Events

• NtupleTag Library– Ntuple navigation and analysis– common OO interface for different storage

• ODBMS • HBook (CERNLIB)

• Exploiting Tag concept – enhanced Ntuples– associated with an underlying persistent store– optional association to the Event may be used to navigate to

any other part of the Event• even from an interactive visualization program

– main use: speedup data selection for analysis…• Tag data is typically better clustered than the original

data

Object Association

API 18Jürgen Knobloch

Interactive Data Analysis

• Aim: “OO replacement for PAW”– analysis of “ntuple-like data” (“Tags”, “Ntuples”, …)– visualisation of data (Histograms, scatter-plot, “Vectors”)– fitting of histograms (and other data)– access to experiment specific data/code

• Maximize flexibility and re-use– plug-in structure– careful design with limited source and binary dependencies

• Foresee customization/integration– allow use from within experiment’s s/w framework!

API 19Jürgen Knobloch

Lizard: Interfaces

API 20Jürgen Knobloch

Anaphe components

API 21Jürgen Knobloch

Scripting (Architectural issue)

• Typical use of scripting is quite different from programming (reconstruction, analysis, ...)

– history “go back to where I was before”– repetition/looping - with “modifiable parameters”

• SWIG to (semi-) automatically create connection to chosen scripting language

– allows flexibility to choose amongst several scripting languages

– Python, Perl, Tcl, Guile, Ruby, (Java) …• Python - OO scripting, no “strange $!%-variables”

– other scripting languages possible (through SWIG)• Can be enhanced and/or replaced by a GUI

– scripting window within GUI application

API 22Jürgen Knobloch

Sample session

API 23Jürgen Knobloch

Future Enhancements

• Access to other implementations of components

– HBOOK histograms and ntuples (RWN) – mostly done

– OpenScientist, ROOT histograms?• Adding other “scripting” languages

– Perl , Tcl, cint ?• Communication with Java tools/packages

– via AIDA• JAS • WIRED

API 24Jürgen Knobloch

Distributed Computing

• Motivation– move code to data– parallel analysis

• Techniques– services via AI– late binding– plug-in architecture

• End-user (Lizard)– look-and-feel of local analysis

• R&D started and first prototype available soon– CORBA

API 25Jürgen Knobloch

The AIDA project

• AIDA project (Abstract Interfaces for Data Analysis)

• Presently active mainly developers from existing packages

– Tony Johnson (JAS)– Andreas Pfeiffer (Lizard/Anaphe)– Guy Barrand (OpenScientist )– Mark Dönszelmann (Wired)– Developers from LHCb/Gaudi

API 26Jürgen Knobloch

AIDA

• Design Interfaces for Data Analysis (in HEP)– “The goals of the AIDA project are to define

abstract interfaces for common physics analysis tools, such as histograms. The adoption of these interfaces should make it easier for developers and users to select to use different tools without having to learn new interfaces or change their code. In addition it should be possible to exchange data (objects) between AIDA compliant applications.” (http://aida.freehep.org)

• Open for contributions of any kind– questions, suggestions, code, implementations …

API 27Jürgen Knobloch

Motivation

• Minimize coupling between components• Provide flexibility to interchange implementations of

these interfaces• Allows and try to re-use existing packages

– even across “language boundaries”• e.g., C++ analysis using Java Histograms

• Allow for faster turn-around time

API 28Jürgen Knobloch

Components + Abstract Interfaces

• User Code uses only Interface classes

– IHistogram1D * hist = histoFactory-> create1D(‘track quality’, 100, 0., 10.)

• Actual implementations are selected at run-time

– loading of shared libraries

• No change at all to user code but keep freedom to choose implementation

Histo-IF Fitter-IF

User Code

Histo-Impl. 2

Fitter-Impl. Y

Histo-Impl. 1

Fitter-Impl. X

API 29Jürgen Knobloch

Architectural issue:Abstract Interfaces• Abstract Interfaces

– Only pure virtual methods, inheritance only from other Abstract Interfaces

– Components use other components only through their Abstract Interface

– Defines a kind of a “protocol” for a component – Allow each component to develop independently

• reduces maintenance effort significantly– Maximize flexibility and re-use of packages

• Re-use of existing packages to implement components reduces start-up time significantly

API 30Jürgen Knobloch

Architectural issue: Components (I)

• Identify components by functionality• Define “protocol” using Abstract Interfaces• Emphasize separation of different aspects for

each component– Example: Histogram

• statistical entity (density distribution of a physics quantity)

• view of a “collection of data points” (which can be a density distribution but also a detector efficiency curve)

• command to manipulate/store/plot/fit/...

API 31Jürgen Knobloch

Architectural issue: Components (II)

• “User’s view” is different from “implementor’s (developer’s) view”

– separate Abstract Interfaces for both aspects – “command-layer” vs. “implementation-layer(s)”

• UserInterface as a separate component– by definition couples to most of the other

components • Facade pattern

– promotes weak coupling between the other components

– interfaces to scripting and/or GUI

API 32Jürgen Knobloch

Initial Categories and dependencies

API 33Jürgen Knobloch

Across the languages

• JAida : C++ access to Java libs– using C++ proxies implementing the C++ Abstract Interfaces

to the Java interfaces

C++UserCode

AIDA-IF C++ AIDA-IF Java

Java Lib

JAida

API 34Jürgen Knobloch

Infrastructure (I)

• Location of repository (Java and C++)– Anonymous CVS access – :pserver:anoncvs@cvs.freehep.org:/cvs/aida

(passwd aida)– module: aida

• “Release area” for C++ code – /afs/cern.ch/sw/contrib/AIDA/<version>/AIDA/

• #include <AIDA/IHistogram.h>• version : x.y.p

– tarballs (C++) available in /afs/cern.ch/sw/contrib/AIDA/tar/AIDA-<version>.tar.gz

API 35Jürgen Knobloch

Infrastructure (III)

• Mailing lists (archived) – <listName>@cern.ch

• project-aida-dev (open)• project-aida (open)• project-aida-announce (posting moderated,

subscription open)• Web page (http://aida.freehep.org/)

– Updated automatically from repository– On web page links to implementations

• from whoever provides one (and informs us)

API 36Jürgen Knobloch

Release schedule / plan

• Release for discussion and feedback – even if not (yet) “complete”

• V 2.0 mid May 2001 (“Boston release”)– C++ and Java version of the Interfaces

• V 2.1 Aug. 2001 (“Genova release”)– updates from discussions at Geant-4 workshop– fixed some problems with C++ version

• Aiming for 2-3 month release frequency

API 37Jürgen Knobloch

More information

• cern.ch/Anaphe• cern.ch/Anaphe/Lizard• aida.freehep.org/• cern.ch/DB• wwwinfo.cern.ch/asd/lhc++/clhep/

API 38Jürgen Knobloch

“Use-cases” of AIDA

• Java reference implementation in FreeHEP repository

• JAS, OpenScientist and Lizard/Anaphe plan for implementations of version 2.x by Dec. 2001

• Used by Gaudi/Athena (LHCb, Atlas, Harp)– Gaudi people involved in design

• Adopted and used in Geant-4 examples/testing– new category created in Geant-4 for analysis

• No need to go for “least common denominator”– use “reasonable” superset and concentrate on design

API 39Jürgen Knobloch

AIDA - Summary

• Design of Abstract Interfaces for Data Analysis

– Maximize flexibility and re-use– Allow for faster turn-around time – Allows for and try to re-use existing packages

• No need to go for “least common denominator”

– use “reasonable” superset – concentrate on proper design

Recommended