CHEP 2000 - Highlights from Session A
Data Analysis: Algorithms & Methods
Vincenzo Innocente (CERN-CMS)
Ed Frank (Univ. of Pennsylvania - BaBar)
Highlights
Contributions
• General Architecture: 12
• Foundation Libraries: 3
• Detector reconstruction (all but one: tracking!)
  • Focus on Program Structure: 7
  • Strictly Algorithms: 3
• Simulation: 8
• Detector description: 4
Architecture
ORCA Software & Architecture
• When the project started, most people were worried about ways to bring on the physicists, develop the sub-detector software, etc.
  • Important, and the major emphasis of the last year, but actually less critical in the long term
• Engineering of the architecture, and crucially the data-handling issues, are really the critical items
  • Tracking algorithms can, and will, be rewritten many times. But having an architecture that allows and keeps track of plug-and-play is vital.
  • Even now we face very large datasets (multi-TB). Production, automation, mirroring, evolution are (some of) the hard issues.
• Reconstruction is much more than the reconstruction code
Offline Architecture: New Requirements
• Bigger experiment, higher rate, more data
• Larger and dispersed user community performing non-trivial queries against a large event store
• Make best use of new IT technologies
• Increased demand of both flexibility and coherence
  • ability to plug in new algorithms
  • ability to run the same algorithms in multiple environments
  • guarantees of quality and reproducibility
  • high performance
  • user-friendliness
CMS (offline) Software
[Diagram labels: Slow Control, Online Monitoring, Environmental data, Event Filter, Objectivity Formatter, Quasi-online Reconstruction, Simulation (G3 and/or G4), Data Quality, Calibrations, Group Analysis, User Analysis (on demand). The components "store", "store rec-Obj and calibrations" and "request part of event" through the Persistent Object Store Manager, which sits on an Object Database Management System.]
March 2000 HLT Production Plans
• 2M events ORCA-reconstructed with high-luminosity pile-up
• 2-4 terabytes in Objectivity/DB
• 400 CPU-weeks
• ~6 Production Units
• ~1-2 Production Units off the CERN site
• Copy of all data at CERN in HPSS, use of IT/ASD AMS backend to stage data to ~1 TB of disk pools
• Mirroring of data to a few off-site centers, including trans-Atlantic
• Users want (need!) now what they were promised for 2005
Offline Architecture: Solution
• One coherent architecture from online event filtering to final physics analysis
• Clear definition of Clients’ and Services’ interfaces and roles
• Framework which orchestrates instances of all these modules
• Set of common foundation libraries
Software Structure
[Diagram labels: Frameworks, Toolkits, Foundation Libraries; applications: Reconstruction, Simulation, Analysis, Triggers.]
One main framework: GAUDI.
Various specialised frameworks: visualisation, persistency, interactivity, simulation (Geant4), etc.
Basic libraries: STL, CLHEP, etc. (Vocabulary)
Applications implementing the physics algorithms.
DØ C++ Framework
• Set of well-established interfaces from which reconstruction and analysis algorithms are built.
• Propagates events through a set of algorithms in a well-defined and established manner.
• The algorithm configuration and set is determined at program execution time.
• The framework hides many system-related complexities from the user and the algorithm developer and allows for sharing of code for common or related tasks.
Offline Architecture: Enabling Technologies
• C++ & OO
• Run-Time Dynamic Loading
• Event-Driven Notification
• State Machines
• Persistent Object Store
• Database Technologies
• Networked Client-Server Architectures
• Layered Architecture to shield the user from the above!
CLEO III Dynamic Loading vs. Static Linking
• Both equally well supported, can mix.
• Static linking required for reconstruction jobs
  • need stable environment for long periods of time
• Dynamic linking/loading for rapid code development
  • Fast turn-around time needed
  • Cutting link times from hours/minutes to minutes/seconds
• Limit the number of libraries to link to:
  • Proper layering of code
  • Separation of data types from the algorithms that supply them
    • why would I have to link to a tracker to access tracks???
  • No direct links between objects reduces # of libs to link to
    • instead we use index-list objects (“Lattice”)
• Run-time cost of resolving symbols is low!
CMS Conclusions
• An “implicit invocation” architecture is a flexible software solution which can scale with the complexity of the CMS project.
• ODBMS, integrated into the framework,
  • provides coherent management of persistent objects
  • coupled with run-time dynamic loading, allows an application to be configured automatically
• The framework can effectively shield physics modules from the underlying technology without penalizing performance
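The “implicit invocation” idea can be illustrated with a minimal observer-style sketch. This is not CMS framework code: the names `Dispatcher`, `subscribe`, `publish`, `NewEvent` and `runExample` are hypothetical and chosen only for illustration. Modules register interest in a notification and the framework invokes them when it arrives, so no module ever calls another directly.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical notification type; a real framework would carry far more state.
struct NewEvent {
    int run;
    int number;
};

// Hypothetical dispatcher: observers register callbacks, and the framework
// (not the observers) decides when they are invoked.
class Dispatcher {
public:
    using Callback = std::function<void(const NewEvent&)>;

    void subscribe(Callback cb) { callbacks_.push_back(std::move(cb)); }

    // Notify every registered observer, in registration order.
    void publish(const NewEvent& ev) const {
        for (const auto& cb : callbacks_) cb(ev);
    }

private:
    std::vector<Callback> callbacks_;
};

// Illustration only: two independent "modules" react to the same
// notification without knowing about each other.
inline std::vector<std::string> runExample() {
    std::vector<std::string> log;
    Dispatcher dispatcher;
    dispatcher.subscribe([&log](const NewEvent& e) {
        log.push_back("tracker saw event " + std::to_string(e.number));
    });
    dispatcher.subscribe([&log](const NewEvent& e) {
        log.push_back("calorimeter saw event " + std::to_string(e.number));
    });
    dispatcher.publish(NewEvent{1, 42});
    return log;
}
```

Adding a third module in this sketch means adding one more `subscribe` call; neither the dispatcher nor the existing modules change.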
NOVA Architecture: Component-based Architecture
[Diagram labels]
• Sites: Regional Center, Remote Clients
• Areas: Data Management, Analysis Server, Middleware Components, Remote Analysis
• Legend: application specific (sample implementation provided); NOVA component; third-party tool customized for and integrated into NOVA; existing third-party tool employed by NOVA
• Status: Prototyped, Planned, Implemented
• Components: Offline Control Framework, CVS Code Repository, Analysis Daemon, Dynamically loaded apps, MySQL Analysis Catalogue, Monitoring Module, HyperNews Bug system, State Server, Mobile Analysis Client, Web browser, Visualisation, GCA Query, nanoDST, Data Repository, Grand Challenge Architecture (GCA), MySQL Data Catalogue, Catalog Interface, Client Data Binder Module, Server Data Binder Module, Parameters Repository, MySQL Client State DB, Web Server, Database Navigator
Offline Architecture: Commonalities and Differences
• Event Data Reduction
  • Externally: Pipes & Filters
  • Internally: Blackboard
  • CMS: Action on Demand
• External Services (geometry, run conditions, etc.)
  • Mainly procedural
  • CMS and DØ: “Event” Notification (implicit invocation)
[Diagram: data collections (lots of EmcDigis, lots of EmcClusters, lots of RecoTracks, lots of Associations) connected by the modules EmcClustering and TrackAssociator.]
Offline Architecture: Commonalities and Differences
• Distinction among data, detector and algorithms
  • Only BaBar makes no clear distinction
• Access to object collections by name
  • everybody uses named registries (flat or tree)
  • central component of Gaudi (LHCb) Services
• Persistency insulation layer:
  • Transient copy (managed by the framework)
  • direct smart pointer
Principal design choices
• Separation between “data” and “algorithms”
  • Data objects primarily carry data, have only basic methods, e.g. tracking hits
  • Algorithm objects primarily manipulate data, e.g. track fitter
• Three basic categories of data:
  • “event data” (obtained from particle collisions, real or simulated)
  • “detector data” (structure, geometry, calibration, alignment, ...)
  • “statistical data” (histograms, ...)
• Separation between “transient” and “persistent” data
  • Isolate user code from persistency technology
  • Different optimisation criteria
  • Transient as a bridge between independent representations
Module, event and environment structure
• Modules provide the algorithms
  • Use existing information to create new objects
  • Styles range from procedural monoliths to OO castles
• Framework/AC++ provides control & config
  • Uses TCL scripting, command line
  • Production executables run 300 modules
• Objects have behaviors, not just values
  • “Networks of objects collaborate to provide semantics”
  • Internal form of our track objects is irrelevant
• Objects kept in event and environment
  • Named access in a flat space
    • event -> Ifd<EmcCluster>::get("MergedClusters")
  • Implemented via ProxyDict
  • Proxies provide complex access when needed
  • Ensures physical decoupling
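Named access in a flat space can be sketched as a store keyed by both data type and name. The sketch below is not the BaBar `Ifd` or `ProxyDict` implementation; `EventStore`, `put` and `get` are hypothetical names, and only `EmcCluster` and the key "MergedClusters" are borrowed from the slide. It shows why a client needs only the data type's definition, not the module that produced the object.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical data type standing in for an element of a cluster collection.
struct EmcCluster {
    double energy;
};

// Hypothetical flat, named store: entries are keyed by (type, name), so the
// same name may be reused for objects of different types.
class EventStore {
public:
    template <typename T>
    void put(const std::string& name, std::shared_ptr<T> object) {
        entries_[Key(std::type_index(typeid(T)), name)] = std::move(object);
    }

    // Returns nullptr when nothing of type T is stored under that name.
    template <typename T>
    std::shared_ptr<T> get(const std::string& name) const {
        auto it = entries_.find(Key(std::type_index(typeid(T)), name));
        if (it == entries_.end()) return nullptr;
        return std::static_pointer_cast<T>(it->second);
    }

private:
    using Key = std::pair<std::type_index, std::string>;
    std::map<Key, std::shared_ptr<void>> entries_;
};
```

A producer would call `put` once per event, and any later module would call `get<std::vector<EmcCluster>>("MergedClusters")` without linking against the producer.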
Transient data store
[Diagram, logical view: AlgorithmA, AlgorithmB and AlgorithmC exchange data objects (Data T1 to Data T5) through the transient data store.]
• An Algorithm knows only which data (type and name) it uses as input and produces as output.
• The only coupling between algorithms is via the data.
• The execution order of the sub-algorithms is the responsibility of the parent algorithm.
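The three points above can be sketched as follows. This is a simplified illustration, not the Gaudi API: `TransientStore`, `Algorithm`, `AlgorithmA`, `AlgorithmB` and `Parent` are hypothetical, and the store holds plain named numbers instead of typed data objects.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical transient store: named values only, kept deliberately simple.
using TransientStore = std::map<std::string, double>;

// Hypothetical algorithm interface: each algorithm sees only the store.
class Algorithm {
public:
    virtual ~Algorithm() = default;
    virtual void execute(TransientStore& store) = 0;
};

// Reads "T1", produces "T2".
class AlgorithmA : public Algorithm {
public:
    void execute(TransientStore& store) override {
        store["T2"] = store.at("T1") * 2.0;
    }
};

// Reads "T2", produces "T3"; it never refers to AlgorithmA.
class AlgorithmB : public Algorithm {
public:
    void execute(TransientStore& store) override {
        store["T3"] = store.at("T2") + 1.0;
    }
};

// The parent owns its sub-algorithms and is responsible for their order.
class Parent : public Algorithm {
public:
    void add(std::unique_ptr<Algorithm> sub) { subs_.push_back(std::move(sub)); }

    void execute(TransientStore& store) override {
        for (auto& sub : subs_) sub->execute(store);
    }

private:
    std::vector<std::unique_ptr<Algorithm>> subs_;
};
```

Because `AlgorithmB` depends only on the name "T2", `AlgorithmA` could be swapped for any other producer of "T2" without touching `AlgorithmB`.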
[Diagram, physical view: a Parent algorithm with sub-algorithms A, B and C, and data objects Data T1 to Data T5.]
Action on Demand
• Compare the results of two different track reconstruction algorithms
[Diagram labels: Event, Hits, Detector Element, Rec Hits, Rec T1, Rec T2, T1, T2, CaloCl, Rec CaloCl, Analysis.]
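One way to picture action on demand is a store that builds a product only when somebody asks for it and caches the result. The sketch below rests on that assumption and is not CMS code; `OnDemandEvent`, `registerProducer`, `get` and `computations` are hypothetical names, and products are reduced to plain numbers.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical on-demand store: a product is computed the first time it is
// requested and cached afterwards; products never requested are never built.
class OnDemandEvent {
public:
    using Producer = std::function<double(OnDemandEvent&)>;

    void registerProducer(const std::string& name, Producer producer) {
        producers_[name] = std::move(producer);
    }

    double get(const std::string& name) {
        auto cached = cache_.find(name);
        if (cached != cache_.end()) return cached->second;
        // The producer may itself request other products from this event.
        const double value = producers_.at(name)(*this);
        ++computations_;
        cache_[name] = value;
        return value;
    }

    // Number of products actually computed so far.
    int computations() const { return computations_; }

private:
    std::map<std::string, Producer> producers_;
    std::map<std::string, double> cache_;
    int computations_ = 0;
};
```

An analysis that compares two track reconstructions would request both track products; the shared hits are then built once, and a registered calorimeter producer that nobody asks for is never run.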
StMaker
[Diagram: “regular” makers communication. Labels: StMaker with .maker, .data and .const branches; 1. Init(); 2. Make(); GetDataSet(); AddData().]
ALICE's choice
• Migrate immediately to C++
  • Immediately abandon PAW
  • But accept GEANT3.21 (initially)
• Adopt the ROOT framework
  • Not worried about being dependent on ROOT
  • Much more worried about being dependent on G4, Objy, ...
• Allow use of FORTRAN and C++
  • Allow starting with wrapping and bad design
• Impose a single framework
  • Provide central support, documentation and distribution
  • Train users in the framework
Detector Description
Detector Data Store
• The transient detector store contains a “snapshot” of the detector data valid for the currently processed event
[Diagram labels: Algorithm, Transient Detector Store (DetElement1, DetElement2), Detector Data Service, Detector Persistency Service, Converters, Persistent Detector Store, Geant4 Service, G4Converters, Geant4 Representation.]
Input: Why Use XML?
For 1st pass LCD used ad hoc file format, one-of-a-kind code for serial-only parsing of detector geom.
XML is a standard meta-language for defining markup languages. Good free parsers exist, more tools coming.
XML languages are plain-text, self-documenting.
Appl. interface to data (XML document) may be serial or random-access.
Avoid growing private file formats or, worse, hard-coding parameters.
Make it easy (well, easier) for several programs to use same input.
J.Bogart
LCD
Detector Description in XML
<lcdparm>
  <global file="largeParms2.xml" />
  <physical_detector topology="large" id="L2">
    <!-- Start subdetector description -->
    <volume id="EM_BARREL">
      <!-- Geometry, materials -->
      <tube>
        <barrel_dimensions inner_r="196.0" outer_z="322.0" />
        <layering n="40">
          <slice material="Pb" width="0.4" />
          <slice material="Tyvek" width="0.05" />
          <slice material="Polystyrene" width="0.1" sensitive="yes" />
        </layering>
        <segmentation cos_theta="300" phi="300" />
      </tube>
      <!-- Function -->
      <calorimeter type="em" />
    </volume>
    <!-- End subdetector description -->
    ...
J.Bogart
LCD
Detector Reconstruction
Track Reconstruction Framework: Motivation
• We cannot implement the optimal track reconstruction algorithm right away
  • There’s probably no one optimal algorithm but several, each optimized for a specific task
• We need a flexible framework for developing and evaluating algorithms
• The mathematical complexity of track finding/fitting often limits the number of developers
  • The involved algebra is often localized in a few places
• If we could encapsulate the involved algebra in a few classes and separate it from the logic of the algorithm, it would make track finding easier for developers
Reconstruction Object Model (BaBar IFR)
• Objects encapsulate the behavior of:
  • reconstruction information (strip, hit, cluster, ...)
  • the detector model (sector, layer, ...)
  • algorithm strategies (clusterizer, ...)
  • etc.
[Diagram labels: strip; “hit”: 1D-cluster; cluster.]
The BaBar Track Fit
• Written in OO C++
• Integrated with the BaBar software framework
• Exploits a novel formulation of the Kalman equations
  • Symmetric processing for both track directions
  • Processing in Parameter and Weight space reduces the number of matrix inversions required
• Fit result is expressed as a Piecewise Helix
  • Joined helix segments describing ‘most likely’ path through space
• Integrates support for other tracking operations
  • Pattern recognition
  • Alignment
• Used to fit >10^8 tracks in the commissioning run
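Effect Processing
[Diagram: outward processing through a sequence of sites (P), moving between Parameter Space and Weight Space. Labels: Material Effects, BField Effects, Hit Effects; steps that “involve matrix inversion” and steps that “involve only linear operations”; the optimal result (“opt”) is formed from the outward (“out”) and inward (“in”) results.]
• Optimal ‘parameters’ are easy to compute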
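Why weight space keeps some steps linear can be seen in a one-dimensional toy case. The sketch below is the textbook inverse-variance combination of two independent estimates, not the BaBar Kalman fit itself; `Estimate` and `combine` are hypothetical names. In weight space (weight = 1 / variance) the two results simply add, and only the final conversion back to parameter space needs an inversion, which for a scalar is a division.

```cpp
#include <cassert>
#include <cmath>

// A one-dimensional estimate in "parameter space": value and variance.
struct Estimate {
    double value;
    double variance;
};

// Combine two independent estimates of the same quantity.
inline Estimate combine(const Estimate& a, const Estimate& b) {
    const double wa = 1.0 / a.variance;  // weight of a
    const double wb = 1.0 / b.variance;  // weight of b
    // Linear operations only: weights and weighted values add.
    const double weight = wa + wb;
    const double weightedValue = wa * a.value + wb * b.value;
    // Single inversion to return to parameter space.
    return Estimate{weightedValue / weight, 1.0 / weight};
}
```

For track parameters the same structure holds with vectors and symmetric matrices in place of the scalars, which is where avoiding repeated matrix inversions starts to pay off.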
Code Organization
[Class diagram. Legend: General BaBar Tracking, Kalman Specific, CLHEP. Classes: TrkRecoTrk, TrkRep, KalRep, KalMaker, PiecewiseTrajectory, HelixTrajectory, KalSite, KalMaterial, KalBend, KalHit, KalParams, KalWeight, HepVector, HepSymMatrix. Annotations: “Inwards and Outwards”, “Lazy Cache”, and multiplicities on the associations.]
KalStub: A Pattern Recognition Tool
Experience with software development (BaBar IFR)
• Inflexible design was spotted when problems repeatedly occurred in the same code areas when introducing changes
• Applying a more flexible design has usually improved the software management
  • more effective development
  • problem isolation
• A concrete example: computation of the number of interaction lengths:
  • Abstract base class for cluster curve approximation
  • Path length computation in the detector model has been tested using a straight-line implementation of the curve approximation
  • Polynomial approximation from a fit in each view was implemented separately
  • The integration of the two pieces has been immediately successful
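The concrete example can be sketched with an abstract curve and two implementations. This is an illustration under simplifying assumptions (a single view, and path length taken as the plain geometric length of the curve), not the BaBar IFR code; `CurveApproximation`, `StraightLine`, `Polynomial` and `pathLength` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical abstract curve: y as a function of x in one detector view.
class CurveApproximation {
public:
    virtual ~CurveApproximation() = default;
    virtual double y(double x) const = 0;
};

// Straight line y = intercept + slope * x.
class StraightLine : public CurveApproximation {
public:
    StraightLine(double intercept, double slope)
        : intercept_(intercept), slope_(slope) {}
    double y(double x) const override { return intercept_ + slope_ * x; }

private:
    double intercept_;
    double slope_;
};

// Polynomial c0 + c1 x + c2 x^2 + ..., evaluated with Horner's scheme.
class Polynomial : public CurveApproximation {
public:
    explicit Polynomial(std::vector<double> coeffs) : coeffs_(std::move(coeffs)) {}
    double y(double x) const override {
        double result = 0.0;
        for (auto it = coeffs_.rbegin(); it != coeffs_.rend(); ++it) {
            result = result * x + *it;
        }
        return result;
    }

private:
    std::vector<double> coeffs_;
};

// Path length between x0 and x1, approximated by summing short chords.
// It depends only on the abstract interface, so it can be tested with a
// straight line and later used unchanged with a polynomial.
inline double pathLength(const CurveApproximation& curve, double x0, double x1,
                         int steps = 1000) {
    double length = 0.0;
    double prevX = x0;
    double prevY = curve.y(x0);
    for (int i = 1; i <= steps; ++i) {
        const double x = x0 + (x1 - x0) * i / steps;
        const double y = curve.y(x);
        length += std::hypot(x - prevX, y - prevY);
        prevX = x;
        prevY = y;
    }
    return length;
}
```

Since `pathLength` never names a concrete curve class, the two pieces can be developed and tested separately and then combined without changes to either.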
Simulation
Geant4 Capabilities
• Very powerful Geant4 kernel
  • tracking, stacks, geometry, hits, ...
• Extensive & transparent physics models
  • electromagnetic, hadronic, ...
  • extended energy range, new models
• Persistency, Visualization, ...
• Surpasses Geant-3 in nearly every respect
ESA Space Environment & Effects Analysis Section
X-Ray Surveys of Asteroids and Moons
• Induced X-ray line emission: indicator of target composition (~100 μm surface layer)
• Sources: cosmic rays, jovian electrons; solar X-rays, e, p
• Codes: Geant3.21; ITS3.0, EGS4; Geant4
• C, N, O line emissions included
[Image courtesy SOHO EIT]
Hadronic shower models in Geant4
• Typical example of OO design
• Highly structured and layered object model (inheritance tree):
  • at each level a given set of functionalities is made concrete which will be common to a given branch
  • 1st level: calculation of cross-sections and final states for particles in flight and at rest in a medium
  • 5th: implement the fragmentation function for string decay
• Results in a flexible framework to implement new hadronic interaction models
Changing cuts
• Results very stable with variation of cuts, even track length
• Also see shower profiles for different cuts (next slide) between 10 mm and 50 microns
CMS Geometry Model using GEANT4
• Categories based on responsibilities
• Geometry categories: CMS specific, OSCAR (Geant4) & Persistent
• Hits categories: CMS & OSCAR
• User Interaction categories: User Actions, GUI
• Utilities: Materials, Rotation Matrices
ATLAS Accordion Calorimeter
• G3: 0.5 megabytes, 10 seconds*SPECint95/GeV
• STATIC GEOMETRY
  • 110 megabytes of memory
  • CPU time is 9.5 seconds*SPECint95/GeV
• PARAMETERIZED GEOMETRY
  • 1500 seconds*SPECint95/GeV (1D voxelization)
• TAILORED GEOMETRY (G4Accordeon)
  • 8 megabytes of memory
  • CPU time is 11.5 seconds*SPECint95/GeV
ATLAS Calorimeter
• The first results on EM shower simulations are close to test beam and GEANT3 results, but more work is needed to understand the differences.
• GEANT4 performance comparable to that of GEANT3 can be achieved.
• The design of GEANT4 allows a user to extend GEANT4 functionality. This helps to implement the new idea of “tailored” geometry description that can be used for high performance simulation of any calorimeter or other regular structure.
The Virtual MC
[Diagram labels: Detector Code, AliRun, AliMC, TGeant3, TGeant4, TFluka, G3 geometry, G3toG4, G4 geometry.]
Tracking schema
[Diagram labels: Module Version StepManager (add the hit), FLUKA Step, Geant4 StepManager, GUSTEP, AliRun::StepManager, Disk I/O (Root). Caption: Inverse Framework plug-in.]
StdHepC++
• There is a strong need for a standard C++ Monte Carlo generator interface.
• StdHepC++ is a natural object-oriented implementation of such an interface.
• At present we have working examples which integrate StdHepC++ with the Fortran versions of Herwig, Pythia, Isajet.
• On the other side, StdHepC++ provides event blocks readable by MCFast and Geant3, and will have an interface to Geant4.
LHC++: what it is (I)
• Modular replacement of current CERNLIB for use in HEP experiments
  • memory management (C++)
  • persistency (“I/O”)
  • mathematical library
  • foundation classes
  • random number generators
  • histogramming
  • fitting
  • simulation
LHC++ Present configuration
• Object persistency from RD45 collaboration (Objectivity/DB)
• Foundation classes
  • HEP-specific foundation classes (CLHEP)
  • Random number generators (CLHEP)
• Mathematical library from NAG (NAG_C)
  • covers broad range of functionality
  • extensions required by CERN will be added in next release (Mark 6)
  • quality assurance
LHC++ Present configuration (cont.)
• Simulation: GEANT-4
  • worldwide collaboration
  • complete OO design
• Histogramming: HTL
• Fitting: Gemini, HepFitting packages
  • interface to any minimizer (at present: NAG, Minuit)
• Event generators
  • Lund people started Pythia-7 (C++)
  • StdHep++ in the process of becoming part of CLHEP
LHC++ packages and dependencies
User requirements for a physics analysis tool
• Easy to use for “end user”
  • “like PAW”
• Foresee customization/integration wrt. existing frameworks of experiments
  • e.g., use persistency/messaging/... from experiment
  • needs to be compatible with experiment’s framework
• Plan for extensions
• Maximize flexibility/interoperability
  • “plug-and-play-like” use of components from other frameworks (shared libs using the same interfaces)
Abstract Interfaces for Data Analysis
• AIDA project started by HepVis’99 workgroup: Abstract Interfaces for Data Analysis
  • http://wwwinfo.cern.ch/asd/lhc++/AIDA/index.html
• In close collaboration with users and developers from experiments and providers of other packages
  • Iguana, HippoDraw, JAS, OpenScientist
• Starting with Histogram classes
  • presently in final iteration
• Next items are Ntuples, Vectors and Fitting
Conclusions
• In response to the challenges posed by the new physics program and the expectations of the user community, all major experiments are investing in flexible and powerful software architectures based on frameworks (not just main & subroutines)
  • Many commonalities
  • Several qualifying differences
• Thrust on novel technologies and their impact on physicists
• Specialized sub-frameworks for detector reconstruction, detector description, physics process simulation
Conclusions
• Experience with OO is no longer confined to a few gurus and their prophets
  • Clear evidence that well-engineered OO software is much easier to adapt, extend, interface in response to evolving requirements
• Near Future (CHEP 2001?)
  • Consolidation of current architectures
  • Common approach to basic computing services
• Next Challenge: Customer Satisfaction
  • Physicists analyzing data from their desk using all the power they expect from new computing technologies