Analysis Software Strategy Jürgen Knobloch HTASC, DESY 9 October 2001 AIDA ANAPHE LIZARD


Analysis Software Strategy

Jürgen Knobloch

HTASC, DESY, 9 October 2001

AIDA

ANAPHE

LIZARD


Anaphe – Lizard - AIDA

• Anaphe - Libraries for Hep Computing
  – Full replacement of CERNLIB
  – Open source, mostly “license-free”
• Lizard is based on Anaphe components and the Python scripting language (through SWIG)
  – PAW functionality
  – young but has very solid base in mature Anaphe
  – real plug-in structure
• AIDA project (Abstract Interfaces for Data Analysis)
  – Interface definitions & classes for C++ and Java


ANAPHE Components

[Diagram: layered view of the ANAPHE components]

• User Interface (using Abstract Types): Lizard Commander
• Abstract types, AIDA (Abstract Interfaces for Data Analysis): Histograms, NTuples, Fitting, Plotting, VectorOfPoints, Functions, Analyzer
• Implementations (HEP-specific): HTL, Tags (HepODBMS), Gemini/HepFitting, Qplotter, VectorOfPoints, CLHEP (Class Libraries for HEP)
• non-HEP components: Python, SWIG, Objectivity/DB, NAG-C, Qt


ANAPHE - History

• LHC++ “Libraries for Hep Computing”
  – 1995: project initiated with the aim to develop a modular replacement of CERNLIB (lib and tools)
  – using standards (STL, OpenGL, ...) and commercial components (e.g., NAG_C, OpenInventor, IrisExplorer)
• First iteration on physics data analysis tool
  – data driven approach (based on IRIS Explorer)
  – GUI based, not command line driven
• Request to create new physics analysis tool
  – September 99: new requirements defined together with experiments
  – identified categories/components and Abstract Types


AIDA - History

• started in Sept. 1999 (HepVis 99, Orsay)
  – several (mini-) workshops since then
    • main ones Paris 2000 and Boston 2001
• release 1.0 summer 2000
  – concentrated on “developers view”
    • Histogram package only
  – IAxis, IHistogram, IHistogram1D, IHistogram2D, IHistogramFactory
• release 2.0 May 2001 (“Boston release”)
  – about 20 Interface classes
  – aiming at discussion and gathering feedback


History and Status - Lizard

• Started after CHEP-2000
• Full version out since June 2001
  – “PAW like” analysis functionality plus
  – on-demand loading of compiled code using shared libraries
    • gives full access to experiment’s analysis code and data
  – based on Abstract Interfaces
    • flexible and extensible
• License-free version available


USDP-like approach

• Start with OO analysis
  – collection of user requirements
• OO design phase
  – define categories and classes
  – find patterns
• Create prototype
  – considered “throw away”
• get feedback from users
• Iterate, iterate, iterate ...


User requirements

• Ease of use (“like PAW”)
• Foresee customization/integration
  – e.g., use persistency/messaging/... from experiment
• Framework used will not be exposed/imposed
  – needs to be compatible with experiment’s framework
• Plan for extensions
  – “code for now, design for the future”
• Maximize flexibility/interoperability


Architecture Overview

• Maximize flexibility and re-use
  – Abstract Interfaces allow each component to develop independently
  – not bound to a specific implementation of any component (“plugin” style)
  – re-use of existing packages to implement components reduces start-up time significantly
• Identify and use patterns - avoid anti-patterns
  – learn from other people’s experiences/failures


“Ignominy” Tool to Quantify Modularity

[Figure: scatter plot. x-axis: % of packages in cyclic dependency (0% to 70%); y-axis: NCCD = sum of single-package dependencies normalised to a binary tree (0 to 12). Projects plotted: ATLAS, IGUANA, COBRA (CMS), ROOT, ORCA (CMS), Anaphe / Lizard, GEANT4. The position of an “ideal toolkit” is marked, and one annotation reads “Includes Fortran”.]

NCCD (“spaghetti index”)
• 1.0: good toolkit
• < 1.0: independent packages
• > 1.0: strongly-coupled

See: [8-024]
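The normalisation named on the plot can be made concrete with a short sketch. It assumes the usual definition from large-scale C++ design: the cumulative component dependency (CCD) is the sum, over all packages, of the number of packages each one needs (itself included), and NCCD divides that by the CCD of a balanced binary tree with the same number of packages, (N + 1) log2(N + 1) - N. Whether Ignominy computes its index in exactly this way is not stated on the slide; the type and function names below are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// deps[i] lists the packages that package i depends on directly.
using DependencyGraph = std::vector<std::vector<std::size_t>>;

// Number of packages reachable from `start`, counting `start` itself.
std::size_t reachableCount(const DependencyGraph& deps, std::size_t start) {
    std::vector<bool> seen(deps.size(), false);
    std::vector<std::size_t> stack{start};
    seen[start] = true;
    std::size_t count = 0;
    while (!stack.empty()) {
        std::size_t pkg = stack.back();
        stack.pop_back();
        ++count;
        for (std::size_t next : deps[pkg]) {
            assert(next < deps.size());
            if (!seen[next]) {
                seen[next] = true;
                stack.push_back(next);
            }
        }
    }
    return count;
}

// Cumulative component dependency: sum of the per-package counts.
std::size_t ccd(const DependencyGraph& deps) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < deps.size(); ++i) {
        total += reachableCount(deps, i);
    }
    return total;
}

// CCD of a balanced binary tree with n nodes: (n + 1) * log2(n + 1) - n.
double binaryTreeCcd(std::size_t n) {
    double m = static_cast<double>(n) + 1.0;
    return m * std::log2(m) - static_cast<double>(n);
}

// Normalised CCD: about 1.0 for a tree-like toolkit, larger when packages
// are strongly coupled (for example through dependency cycles).
double nccd(const DependencyGraph& deps) {
    assert(!deps.empty());
    return static_cast<double>(ccd(deps)) / binaryTreeCcd(deps.size());
}
```

With three packages, a root depending on two leaves gives NCCD = 1.0, three unrelated packages give a value below 1.0, and a three-package cycle gives a value above 1.0, in line with the thresholds listed on the slide.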


CLHEP

• HEP foundation class library
  – Random number generators
  – Physics vectors
    • 3- and 4- vectors
  – Geometry
  – Linear algebra
  – System of units
  – more packages recently added
• will continue to evolve
• wwwinfo.cern.ch/asd/lhc++/clhep/


2D Graphics libraries

• Qt
  – multi-platform C++ GUI toolkit
    • C++ class library, not wrapper around C libs
    • superset of Motif and MFC
    • available on Unix and MS Windows
    • no change for developer
  – commercial but with public domain version
  – www.troll.no
• Qplotter
  – “add-on” functionality for HEP
    • “HIGZ/HPLOT”


Basic 3D Graphic Libraries

• OpenGL (basic graphics)
  – De-facto industry standard for basic 3D graphics
  – Used in CAD/CAE, games, VR, medical imaging
• OpenInventor (scene mgmt.)
  – OO 3D toolkit for graphics
  – Cubes, polygons, text, materials
  – Cameras, lights, picking
  – 3D viewers/editors, animation
  – Based on OpenGL/MesaGL


Mathematical Libraries

• NAG (Numerical Algorithms Group) C Library
  – Covers a broad range of functionality
    • Linear algebra
    • differential equations
    • quadrature, etc.
  – Special functions of CERNLIB added to Mark-6 release
    • mostly for theory and accelerator
    • Quality assurance
    • extensive testing done by NAG
  – www.nag.com


Histograms: the HTL package

• Histograms are the basic tool for physics analysis
  – Statistical information of density distributions
• Histogram Template Library (HTL)
  – design based on C++ templates
  – Modular: separation between sampling and display
  – Extensible: open for user defined binning systems
  – Flexible: support transient/persistent at the same time
  – Open: large use of abstract interfaces
  – recent addition: 3D histograms
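To illustrate what "design based on C++ templates" and "open for user defined binning systems" can look like, here is a minimal sketch of a histogram parameterised on a binning policy. It is not the HTL API: `UniformBinning`, `VariableBinning` and `Histo1D` are hypothetical names chosen for this example, and display is deliberately left out to mirror the separation between sampling and display.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical binning policy: equal-width bins on [lo, hi).
class UniformBinning {
public:
    UniformBinning(std::size_t bins, double lo, double hi)
        : bins_(bins), lo_(lo), hi_(hi) { assert(bins > 0 && hi > lo); }
    std::size_t size() const { return bins_; }
    // Returns size() when x lies outside [lo, hi).
    std::size_t index(double x) const {
        if (x < lo_ || x >= hi_) return bins_;
        std::size_t i = static_cast<std::size_t>((x - lo_) / (hi_ - lo_) * bins_);
        return i < bins_ ? i : bins_ - 1;
    }
private:
    std::size_t bins_;
    double lo_, hi_;
};

// Hypothetical user-defined binning: arbitrary ascending bin edges.
class VariableBinning {
public:
    explicit VariableBinning(std::vector<double> edges) : edges_(std::move(edges)) {
        assert(edges_.size() >= 2);
    }
    std::size_t size() const { return edges_.size() - 1; }
    // Returns size() when x lies outside [first edge, last edge).
    std::size_t index(double x) const {
        if (x < edges_.front() || x >= edges_.back()) return size();
        auto it = std::upper_bound(edges_.begin(), edges_.end(), x);
        return static_cast<std::size_t>(it - edges_.begin()) - 1;
    }
private:
    std::vector<double> edges_;
};

// The histogram only samples; any type with size() and index() can be plugged in.
template <class Binning>
class Histo1D {
public:
    explicit Histo1D(Binning binning)
        : binning_(std::move(binning)), contents_(binning_.size(), 0.0) {}
    void fill(double x, double weight = 1.0) {
        std::size_t i = binning_.index(x);
        if (i < contents_.size()) contents_[i] += weight;
        else outOfRange_ += weight;
    }
    double binContent(std::size_t i) const { return contents_.at(i); }
    double outOfRange() const { return outOfRange_; }
private:
    Binning binning_;
    std::vector<double> contents_;
    double outOfRange_ = 0.0;
};
```

A `Histo1D<VariableBinning>` built from the edges 0, 1, 5, 10 behaves like any other histogram from the caller's point of view; only the policy type changes.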


Fitting and Minimization

• Fitting and Minimization Library (FML)
  – common OO interface
    • NAG-C, MINUIT
  – based on Abstract Interfaces
    • IVector, IModelFunction, …
• Gemini
  – common minimization interface
  – very thin layer
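The idea of a thin, common minimization interface in front of interchangeable engines can be sketched as follows. None of this is the Gemini or FML API: `IObjective`, `IMinimizer`, `CompassSearchMinimizer` and `LineChi2` are hypothetical names, and the simple derivative-free compass search merely stands in for a real engine such as NAG-C or MINUIT.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <utility>
#include <vector>

// Hypothetical interface for a function of the fit parameters (e.g. a chi-square).
class IObjective {
public:
    virtual ~IObjective() = default;
    virtual std::size_t dimension() const = 0;
    virtual double value(const std::vector<double>& params) const = 0;
};

// Hypothetical common minimizer interface; concrete engines sit behind it.
class IMinimizer {
public:
    virtual ~IMinimizer() = default;
    virtual std::vector<double> minimize(const IObjective& f,
                                         std::vector<double> start) const = 0;
};

// Stand-in engine: compass search (try +/- step along each axis, halve the
// step when no move improves the objective).
class CompassSearchMinimizer : public IMinimizer {
public:
    std::vector<double> minimize(const IObjective& f,
                                 std::vector<double> x) const override {
        assert(x.size() == f.dimension());
        double step = 1.0;
        double best = f.value(x);
        for (int iter = 0; iter < 100000 && step > 1e-8; ++iter) {
            bool improved = false;
            for (std::size_t i = 0; i < x.size(); ++i) {
                for (double dir : {+1.0, -1.0}) {
                    std::vector<double> trial = x;
                    trial[i] += dir * step;
                    double v = f.value(trial);
                    if (v < best) { best = v; x = trial; improved = true; }
                }
            }
            if (!improved) step *= 0.5;
        }
        return x;
    }
};

// Chi-square (unit errors) of the line y = p[0] * x + p[1] against data points.
class LineChi2 : public IObjective {
public:
    LineChi2(std::vector<double> xs, std::vector<double> ys)
        : xs_(std::move(xs)), ys_(std::move(ys)) { assert(xs_.size() == ys_.size()); }
    std::size_t dimension() const override { return 2; }
    double value(const std::vector<double>& p) const override {
        double sum = 0.0;
        for (std::size_t i = 0; i < xs_.size(); ++i) {
            double r = p[0] * xs_[i] + p[1] - ys_[i];
            sum += r * r;
        }
        return sum;
    }
private:
    std::vector<double> xs_, ys_;
};
```

Code that fits through `IMinimizer` does not change when the engine behind it is replaced, which is the point of keeping the layer thin.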


Tags, Ntuples and Events

• NtupleTag Library
  – Ntuple navigation and analysis
  – common OO interface for different storage
    • ODBMS
    • HBook (CERNLIB)
• Exploiting Tag concept
  – enhanced Ntuples
  – associated with an underlying persistent store
  – optional association to the Event may be used to navigate to any other part of the Event
    • even from an interactive visualization program
  – main use: speedup data selection for analysis…
    • Tag data is typically better clustered than the original data

[Figure label: Object Association]
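The speed-up from tags comes from scanning only a compact summary and following the association to the full event for the few entries that pass a cut. The sketch below illustrates that access pattern under simplifying assumptions: `Tag`, `Event`, `EventStore` and `selectEvents` are hypothetical names, the "persistent store" is an in-memory vector, and the event identifier is simply an index into it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical tag: a few summary quantities plus an association to the event.
struct Tag {
    double missingEt;     // summary quantity copied from the event
    int nTracks;          // summary quantity copied from the event
    std::size_t eventId;  // association used to navigate to the full event
};

// Hypothetical full event; far larger than its tag in a real experiment.
struct Event {
    std::size_t id;
    std::vector<double> trackPt;
};

// Stand-in for the persistent store; counts how often a full event is loaded.
class EventStore {
public:
    explicit EventStore(std::vector<Event> events) : events_(std::move(events)) {}
    const Event& load(std::size_t id) {
        assert(id < events_.size());
        ++loads_;
        return events_[id];
    }
    std::size_t loads() const { return loads_; }
private:
    std::vector<Event> events_;
    std::size_t loads_ = 0;
};

// Scan only the tags, then load the full event for the tags that pass the cut.
std::vector<const Event*> selectEvents(const std::vector<Tag>& tags,
                                       const std::function<bool(const Tag&)>& cut,
                                       EventStore& store) {
    std::vector<const Event*> selected;
    for (const Tag& tag : tags) {
        if (cut(tag)) selected.push_back(&store.load(tag.eventId));
    }
    return selected;
}
```

In this toy setup the number of calls to `load` equals the number of selected events, not the number of events in the store.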


Interactive Data Analysis

• Aim: “OO replacement for PAW”
  – analysis of “ntuple-like data” (“Tags”, “Ntuples”, …)
  – visualisation of data (Histograms, scatter-plot, “Vectors”)
  – fitting of histograms (and other data)
  – access to experiment specific data/code
• Maximize flexibility and re-use
  – plug-in structure
  – careful design with limited source and binary dependencies
• Foresee customization/integration
  – allow use from within experiment’s s/w framework!


Lizard: Interfaces


Anaphe components


Scripting (Architectural issue)

• Typical use of scripting is quite different from programming (reconstruction, analysis, ...)
  – history “go back to where I was before”
  – repetition/looping - with “modifiable parameters”
• SWIG to (semi-) automatically create connection to chosen scripting language
  – allows flexibility to choose amongst several scripting languages
  – Python, Perl, Tcl, Guile, Ruby, (Java) …
• Python - OO scripting, no “strange $!%-variables”
  – other scripting languages possible (through SWIG)
• Can be enhanced and/or replaced by a GUI
  – scripting window within GUI application


Sample session


Future Enhancements

• Access to other implementations of components
  – HBOOK histograms and ntuples (RWN): mostly done
  – OpenScientist, ROOT histograms?
• Adding other “scripting” languages
  – Perl, Tcl, cint?
• Communication with Java tools/packages
  – via AIDA
    • JAS
    • WIRED


Distributed Computing

• Motivation
  – move code to data
  – parallel analysis
• Techniques
  – services via AI
  – late binding
  – plug-in architecture
• End-user (Lizard)
  – look-and-feel of local analysis
• R&D started and first prototype available soon
  – CORBA


The AIDA project

• AIDA project (Abstract Interfaces for Data Analysis)

• Presently active: mainly developers of existing packages
  – Tony Johnson (JAS)
  – Andreas Pfeiffer (Lizard/Anaphe)
  – Guy Barrand (OpenScientist)
  – Mark Dönszelmann (Wired)
  – Developers from LHCb/Gaudi


AIDA

• Design Interfaces for Data Analysis (in HEP)
  – “The goals of the AIDA project are to define abstract interfaces for common physics analysis tools, such as histograms. The adoption of these interfaces should make it easier for developers and users to select to use different tools without having to learn new interfaces or change their code. In addition it should be possible to exchange data (objects) between AIDA compliant applications.” (http://aida.freehep.org)
• Open for contributions of any kind
  – questions, suggestions, code, implementations …


Motivation

• Minimize coupling between components
• Provide flexibility to interchange implementations of these interfaces
• Allow for and try to re-use existing packages
  – even across “language boundaries”
    • e.g., C++ analysis using Java Histograms
• Allow for faster turn-around time


Components + Abstract Interfaces

• User Code uses only Interface classes

  – IHistogram1D* hist = histoFactory->create1D("track quality", 100, 0., 10.);

• Actual implementations are selected at run-time

– loading of shared libraries

• No change at all to user code but keep freedom to choose implementation

[Diagram: User Code talks only to Histo-IF and Fitter-IF. Histo-Impl. 1 and Histo-Impl. 2 sit behind Histo-IF; Fitter-Impl. X and Fitter-Impl. Y sit behind Fitter-IF.]
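The arrangement on this slide can be sketched in a few lines. The interface names `IHistogram1D` and `IHistogramFactory` and the `create1D` call are taken from the slides, but the method signatures here are simplified assumptions rather than the real AIDA definitions. The implementations (`BinnedHisto1D`, `CountingHisto1D`), the `FactoryFor` helper and `loadHistogramFactory` are hypothetical, and a name lookup stands in for the loading of shared libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the interface classes: pure virtual methods only.
class IHistogram1D {
public:
    virtual ~IHistogram1D() = default;
    virtual void fill(double x) = 0;
    virtual int entries() const = 0;
    virtual std::string title() const = 0;
};

class IHistogramFactory {
public:
    virtual ~IHistogramFactory() = default;
    virtual std::unique_ptr<IHistogram1D> create1D(const std::string& title,
                                                   int bins, double lo, double hi) = 0;
};

// "Histo-Impl. 1": keeps per-bin counts.
class BinnedHisto1D : public IHistogram1D {
public:
    BinnedHisto1D(std::string title, int bins, double lo, double hi)
        : title_(std::move(title)), lo_(lo), hi_(hi), counts_(bins, 0) {
        assert(bins > 0 && hi > lo);
    }
    void fill(double x) override {
        if (x < lo_ || x >= hi_) return;  // out-of-range values are dropped here
        std::size_t i = static_cast<std::size_t>((x - lo_) / (hi_ - lo_) * counts_.size());
        if (i >= counts_.size()) i = counts_.size() - 1;
        ++counts_[i];
        ++entries_;
    }
    int entries() const override { return entries_; }
    std::string title() const override { return title_; }
private:
    std::string title_;
    double lo_, hi_;
    std::vector<int> counts_;
    int entries_ = 0;
};

// "Histo-Impl. 2": only counts in-range entries and ignores the bin count.
class CountingHisto1D : public IHistogram1D {
public:
    CountingHisto1D(std::string title, int /*bins*/, double lo, double hi)
        : title_(std::move(title)), lo_(lo), hi_(hi) {}
    void fill(double x) override { if (x >= lo_ && x < hi_) ++entries_; }
    int entries() const override { return entries_; }
    std::string title() const override { return title_; }
private:
    std::string title_;
    double lo_, hi_;
    int entries_ = 0;
};

template <class Histo>
class FactoryFor : public IHistogramFactory {
public:
    std::unique_ptr<IHistogram1D> create1D(const std::string& title, int bins,
                                           double lo, double hi) override {
        return std::unique_ptr<IHistogram1D>(new Histo(title, bins, lo, hi));
    }
};

// Stand-in for run-time selection; a real system would load a shared library.
std::unique_ptr<IHistogramFactory> loadHistogramFactory(const std::string& name) {
    if (name == "binned")
        return std::unique_ptr<IHistogramFactory>(new FactoryFor<BinnedHisto1D>());
    if (name == "counting")
        return std::unique_ptr<IHistogramFactory>(new FactoryFor<CountingHisto1D>());
    throw std::invalid_argument("unknown histogram implementation: " + name);
}

// User code: refers to the interface classes only.
int fillTrackQuality(IHistogramFactory& histoFactory, const std::vector<double>& values) {
    std::unique_ptr<IHistogram1D> hist =
        histoFactory.create1D("track quality", 100, 0., 10.);
    for (double v : values) hist->fill(v);
    return hist->entries();
}
```

`fillTrackQuality` compiles once and runs unchanged against either implementation; only the name passed to `loadHistogramFactory` differs.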


Architectural issue: Abstract Interfaces

• Abstract Interfaces
  – Only pure virtual methods, inheritance only from other Abstract Interfaces
  – Components use other components only through their Abstract Interface
  – Defines a kind of a “protocol” for a component
  – Allow each component to develop independently
    • reduces maintenance effort significantly
  – Maximize flexibility and re-use of packages
    • Re-use of existing packages to implement components reduces start-up time significantly


Architectural issue: Components (I)

• Identify components by functionality
• Define “protocol” using Abstract Interfaces
• Emphasize separation of different aspects for each component
  – Example: Histogram
    • statistical entity (density distribution of a physics quantity)
    • view of a “collection of data points” (which can be a density distribution but also a detector efficiency curve)
    • command to manipulate/store/plot/fit/...


Architectural issue: Components (II)

• “User’s view” is different from “implementor’s (developer’s) view”
  – separate Abstract Interfaces for both aspects
  – “command-layer” vs. “implementation-layer(s)”
• UserInterface as a separate component
  – by definition couples to most of the other components
    • Facade pattern
  – promotes weak coupling between the other components
  – interfaces to scripting and/or GUI
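A minimal sketch of the Facade idea follows: one `UserInterface` class is the only place that knows about several components, while the components themselves never refer to each other. The component interfaces (`IFitter`, `IPlotter`), their methods and the stand-in implementations are hypothetical and chosen only to keep the example small; they are not Lizard's actual classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical component interfaces; neither one mentions the other.
class IFitter {
public:
    virtual ~IFitter() = default;
    virtual double fitMean(const std::vector<double>& data) = 0;
};

class IPlotter {
public:
    virtual ~IPlotter() = default;
    virtual void drawLine(const std::string& label, double value) = 0;
};

// Facade: couples to the components so that they need not couple to each other.
// A scripting layer or a GUI would call into this class.
class UserInterface {
public:
    UserInterface(IFitter& fitter, IPlotter& plotter)
        : fitter_(fitter), plotter_(plotter) {}

    // One user-level command that coordinates two components.
    double fitAndPlot(const std::string& label, const std::vector<double>& data) {
        double mean = fitter_.fitMean(data);
        plotter_.drawLine(label, mean);
        return mean;
    }

private:
    IFitter& fitter_;
    IPlotter& plotter_;
};

// Stand-in fitter: the "fit" is just the arithmetic mean.
class MeanFitter : public IFitter {
public:
    double fitMean(const std::vector<double>& data) override {
        assert(!data.empty());
        double sum = 0.0;
        for (double x : data) sum += x;
        return sum / static_cast<double>(data.size());
    }
};

// Stand-in plotter: records what it was asked to draw instead of drawing it.
class RecordingPlotter : public IPlotter {
public:
    void drawLine(const std::string& label, double value) override {
        log.push_back(label + "=" + std::to_string(value));
    }
    std::vector<std::string> log;
};
```

Swapping `RecordingPlotter` for a real plotter affects neither `MeanFitter` nor the callers of `fitAndPlot`, which is the weak coupling the slide refers to.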


Initial Categories and dependencies


Across the languages

• JAida: C++ access to Java libs
  – using C++ proxies implementing the C++ Abstract Interfaces to the Java interfaces

[Diagram boxes: C++ User Code, AIDA-IF C++, JAida, AIDA-IF Java, Java Lib]


Infrastructure (I)

• Location of repository (Java and C++)
  – Anonymous CVS access
  – :pserver:[email protected]:/cvs/aida (passwd aida)
  – module: aida
• “Release area” for C++ code
  – /afs/cern.ch/sw/contrib/AIDA/<version>/AIDA/
    • #include <AIDA/IHistogram.h>
    • version : x.y.p
  – tarballs (C++) available in /afs/cern.ch/sw/contrib/AIDA/tar/AIDA-<version>.tar.gz


Infrastructure (III)

• Mailing lists (archived)
  – <listName>@cern.ch
    • project-aida-dev (open)
    • project-aida (open)
    • project-aida-announce (posting moderated, subscription open)
• Web page (http://aida.freehep.org/)
  – Updated automatically from repository
  – On web page links to implementations
    • from whoever provides one (and informs us)


Release schedule / plan

• Release for discussion and feedback
  – even if not (yet) “complete”
• V 2.0 mid May 2001 (“Boston release”)
  – C++ and Java version of the Interfaces
• V 2.1 Aug. 2001 (“Genova release”)
  – updates from discussions at Geant-4 workshop
  – fixed some problems with C++ version
• Aiming for 2-3 month release frequency


More information

• cern.ch/Anaphe
• cern.ch/Anaphe/Lizard
• aida.freehep.org/
• cern.ch/DB
• wwwinfo.cern.ch/asd/lhc++/clhep/


“Use-cases” of AIDA

• Java reference implementation in FreeHEP repository

• JAS, OpenScientist and Lizard/Anaphe plan for implementations of version 2.x by Dec. 2001

• Used by Gaudi/Athena (LHCb, Atlas, Harp)
  – Gaudi people involved in design
• Adopted and used in Geant-4 examples/testing
  – new category created in Geant-4 for analysis
• No need to go for “least common denominator”
  – use “reasonable” superset and concentrate on design


AIDA - Summary

• Design of Abstract Interfaces for Data Analysis

  – Maximize flexibility and re-use
  – Allow for faster turn-around time
  – Allow for and try to re-use existing packages
• No need to go for “least common denominator”
  – use “reasonable” superset
  – concentrate on proper design