Upload
margaret-donovan
View
78
Download
5
Tags:
Embed Size (px)
DESCRIPTION
Introduction to the ROOT framework http://root.cern.ch. Ren é Brun CERN. Do-Son school on Advanced Computing and GRID Technologies for Research Institute of Information Technology, VAST, Hanoi. What's ROOT?. ROOT Application Domains. Data Analysis & Visualization. General Framework. - PowerPoint PPT Presentation
Citation preview
Introduction tothe ROOT framework
http://root.cern.ch
René BrunCERN
Do-Son school on Advanced Computing and GRID Technologies for Research
Institute of Information Technology, VAST, Hanoi
René Brun 2
What's ROOT?
René Brun 3
ROOT Application Domains
Data Storage: Local, Network
Data Analysis & Visualization
General Fram
ework
René Brun 4
ROOT in a Nutshell The ROOT system is an Object Oriented framework
for large scale data handling applications. It is written in C++.
Provides, among others, an efficient data storage and access system designed to support
structured data sets (PetaBytes) a query system to extract data from these data sets a C++ interpreter advanced statistical analysis algorithms (multi dimensional
histogramming, fitting, minimization and cluster finding) scientific visualization tools with 2D and 3D graphics an advanced Graphical User Interface
The user interacts with ROOT via a graphical user interface, the command line or scripts
The command and scripting language is C++, thanks to the embedded CINT C++ interpreter, and large scripts can be compiled and dynamically loaded.
A Python shell is also provided.
René Brun 5
ROOT: An Open Source Project
The project was started in 1995.The project is developed as a collaboration
between: Full time developers:
• 11 people full time at CERN (PH/SFT)• +4 developers at Fermilab/USA, Protvino , JINR/Dubna (Russia)
Large number of part-time contributors (155 in CREDITS file)A long list of users giving feedback, comments, bug
fixes and many small contributions• 2400 registered to RootForum• 10,000 posts per year
An Open Source Project, source available under the LGPL license
René Brun 6
More information...
http://root.cern.chDownloadDocumentationTutorialsOnline HelpMailing listForum
René Brun 7
Install ROOT
Download & extract ROOT v5.16/00 from http://root.cern.ch/root/Version516.html Set of precompiled versions and source code
Set environment export ROOTSYS=<path-to-installation> export PATH=$ROOTSYS/bin:$PATH export LD_LIBRARY_PATH=$ROOTSYS/lib:$LD_LIBRARY_PATH export DYLD_LIBRARY_PATH=$ROOTSYS/lib:
$DYLD_LIBRARY_PATH (MacOS X only) Compile source code (if chosen)
cd $ROOTSYS ./configure make More information at http://root.cern.ch/root/Install.html
René Brun 8
ROOT Library StructureROOT libraries are a layered structure CORE classes always required: support for RTTI, basic
I/O and interpreter Optional libraries loaded when needed. Separation
between data objects and the high level classes acting on these objects. Example: a batch job uses only the histogram library, no need to link histogram painter library
Shared libraries reduce the application size and link time
Mix and match, build your own application Reduce dependencies via plug-in mechanism
René Brun 9
Three User Interfaces
GUIwindows, buttons, menus
Command lineCINT (C++ interpreter),Python, Ruby,…
Macros, applications, libraries (C++ compiler and interpreter)
{ // example // macro...}
CINT Interpreter
René Brun 11
CINT in ROOTCINT is used in ROOT:
As command line interpreterAs script interpreterTo generate class dictionariesTo generate function/method calling stubsSignals/Slots with the GUI
The command line, script and programming language become the same
Large scripts can be compiled for optimal performance
René Brun 12
Compiled versus Interpreted Why compile? Faster execution, CINT has limitations…
Why interpret? Faster Edit → Run → Check result → Edit cycles
("rapid prototyping"). Scripting is sometimes just easier.
Are Makefiles dead? No! if you build/compile a very large application Yes! ACLiC is even platform independent!
René Brun 13
Running Code
To run function mycode() in file mycode.C:root [0] .x mycode.C
Equivalent: load file and run function:root [1] .L mycode.C
root [2] mycode()
All of CINT's commands (help):root [3] .h
René Brun 14
Running Code
Macro: file that is interpreted by CINT (.x)
Execute with .x mymacro.C(42)
int mymacro(int value)
{
int ret = 42;
ret += value;
return ret;
}
René Brun 15
Unnamed Macros
No functions, just statements
Execute with .x mymacro.C No functions thus no arguments
Recommend named macros!Compiler prefers it, too…
{
float ret = 0.42;
return sin(ret);
}
René Brun 16
Named Vs. UnnamedNamed: function scope unnamed:
global
mymacro.C: unnamed.C:
Back at the prompt:
void mymacro()
{ int val = 42; }
{ int val = 42; }
root [] val
Error: Symbol val is not defined
root [] val
(int)42
René Brun 17
Survivors: Heap Objects
obj local to function, inaccessible from outside:
Instead: create on heap, pass to outside:
pObj gone – but MyClass still where pObj pointed to! Returned by mymacro()
void mymacro()
{ MyClass obj; }
MyClass* mymacro()
{ MyClass* pObj = new MyClass();
return pObj; }
René Brun 18
Running Code – Libraries"Library": compiled code, shared library
CINT can call its functions!
Build a library: ACLiC! "+" instead of Makefile(Automatic Compiler of Libraries for CINT)
CINT knows all its functions / types:
something(42)
.x something.C(42)++
René Brun 19
My first session
root [0] 344+76.8(const double)4.20800000000000010e+002root [1] float x=89.7;root [2] float y=567.8;root [3] x+sqrt(y)(double)1.13528550991510710e+002root [4] float z = x+2*sqrt(y/6);root [5] z(float)1.09155929565429690e+002root [6] .q
root
root
root [0] try up and down arrows
See file $HOME/.root_hist
René Brun 20
My second session
root [0] .x session2.Cfor N=100000, sum= 45908.6
root [1] sum(double)4.59085828512453370e+004
Root [2] r.Rndm()(Double_t)8.29029321670533560e-001
root [3] .q
root
{ int N = 100000; TRandom r; double sum = 0; for (int i=0;i<N;i++) { sum += sin(r.Rndm()); } printf("for N=%d, sum= %g\n",N,sum);}
session2.C
unnamed macroexecutes in global scope
René Brun 21
My third session
root [0] .x session3.Cfor N=100000, sum= 45908.6
root [1] sumError: Symbol sum is not defined in current scope*** Interpreter error recovered ***
Root [2] .x session3.C(1000)for N=1000, sum= 460.311
root [3] .q
root
void session3 (int N=100000) { TRandom r; double sum = 0; for (int i=0;i<N;i++) { sum += sin(r.Rndm()); } printf("for N=%d, sum= %g\n",N,sum);}
session3.C
Named macroNormal C++ scope
rules
René Brun 22
My third session with ACLIC
root [0] gROOT->Time();root [1] .x session4.C(10000000)for N=10000000, sum= 4.59765e+006Real time 0:00:06, CP time 6.890
root [2] .x session4.C+(10000000)
for N=10000000, sum= 4.59765e+006 Real time 0:00:09, CP time 1.062
root [3] session4(10000000)for N=10000000, sum= 4.59765e+006Real time 0:00:01, CP time 1.052
root [4] .q
#include “TRandom.h”void session4 (int N) { TRandom r; double sum = 0; for (int i=0;i<N;i++) { sum += sin(r.Rndm()); } printf("for N=%d, sum= %g\n",N,sum);}
session4.C
File session4.CAutomatically compiled
and linked by thenative compiler.
Must be C++ compliant
René Brun 23
Macros with more than one function
root [0] .x session5.C >session5.logroot [1] .q
void session5(int N=100) { session5a(N); session5b(N); gROOT->ProcessLine(“.x session4.C+(1000)”);}void session5a(int N) { for (int i=0;i<N;i++) { printf("sqrt(%d) = %g\n",i,sqrt(i)); }}void session5b(int N) { double sum = 0; for (int i=0;i<N;i++) { sum += i; printf("sum(%d) = %g\n",i,sum); }}
session5.C
.x session5.Cexecutes the function
session5 in session5.C
root [0] .L session5.Croot [1] session5(100); >session5.logroot [2] session5b(3)sum(0) = 0sum(1) = 1sum(2) = 3
root [3] .q
use gROOT->ProcessLineto execute a macro from a
macro or from compiled code
René Brun 24
Running a script in batch mode
root –b –q myscript.Cmyscript.C is executed by the interpreter
root –b –q myscript.C(42)same as above, but giving an argument
root –b –q myscript.C+myscript.C is compiled via ACLIC with the native
compiler and linked with the executable.
root –b –q myscript.C++g(42)same as above. Script compiled in debug mode and
an argument given.
René Brun 25
Calling interpreter in a macro
All CINT commands can be excuted from an interpreted macro or from compiled code.gROOT->ProcessLine(“.x myscript.C”);
René Brun 26
ROOT Tutorials & Doc
See Users Guide (pdf) atSee online Reference Manual atExecute tutorials at $ROOTSYS/tutorials
293 tutorials in the following directoriesfoam gui matrix spectrum xml hist mlp splotnet sql geom physics gl image pyroot threadfft graphics io pythia tree fit graphs math quadp ruby unuran
René Brun 27
Generating a Dictionary
MyClass.h
MyClass.cxx rootcint –f MyDict.cxx –c MyClass.h
compile and link
libMyClass.so
MyDict.hMyDict.cxx
René Brun 2828
HTML
René Brun 29
Math TMVA
SPlot
Graphics & GUI
René Brun 31
TPad: main graphics container
Hello
Root > TLine line(.1,.9,.6,.6)
Root > line.Draw()
Root > TText text(.5,.2,”Hello”)
Root > text.Draw()
The Draw function adds the object to the list of primitives of the current pad.
If no pad exists, a pad is automatically created with a default range [0,1].
When the pad needs to be drawn or redrawn, the object Paint function is called.
Only objects derivingfrom TObject may be drawn
in a padRoot Objects or User objects
René Brun 32
Basic Primitives
TButton
TLine TArrow TEllipse
TCurvyLine
TPaveLabel
TPave
TDiamond
TPavesText
TPolyLine
TLatex
TCrown
TMarker
TText
TCurlyArc
TBox
René Brun 33
Full LateX
support on
screen and
postscript
TCurlyArcTCurlyLineTWavyLine
and other building blocks for Feynmann diagrams
Formula or diagrams can be
edited with the mouse
Feynman.C
latex3.C
René Brun 34
Graphs
TGraph(n,x,y)
TCutG(n,x,y)
TGraphErrors(n,x,y,ex,ey)
TGraphAsymmErrors(n,x,y,exl,exh,eyl,eyh)
TMultiGraph
gerrors2.C
René Brun 35
Graphics examples
René Brun 36
René Brun 37
René Brun 38
René Brun 39
Graphics (2D-3D)
“SURF”“LEGO”
TF3
TH3
TGLParametric
René Brun 40
ASImage: Image processor
René Brun 41
GUI (Graphical User Interface)
René Brun 42
Canvas tool bar/menus/help
René Brun 43
Object editor Click on any object to show
its editor
René Brun 44
TBrowser: a dynamic browser
see $ROOTSYS/test/RootIDE
René Brun 45
TBrowser Editor
René Brun 46
GUI C++ code generator
When pressing ctrl+S on any widget it is saved as a C++ macro file thanks to the SavePrimitive methods implemented in all GUI classes. The generated macro can be edited and then executed via CINT
Executing the macro restores the complete original GUI as well as all created signal/slot connections in a global way
// transient frame TGTransientFrame *frame2 = new TGTransientFrame(gClient->GetRoot(),760,590); // group frame TGGroupFrame *frame3 = new TGGroupFrame(frame2,"curve");
TGRadioButton *frame4 = new TGRadioButton(frame3,"gaus",10); frame3->AddFrame(frame4);
root [0] .x example.C
René Brun 47
The GUI builder provides GUI tools for developing user interfaces based on the ROOT GUI classes. It includes over 30 advanced widgets and an automatic C++ code generator.
The GUI Builder
René Brun 48
More GUI Examples$ROOTSYS/tutorials/gui
$ROOTSYS/test/RootShower
$ROOTSYS/test/RootIDE
René Brun 49
Geometry
The GEOMetry package is used to model very complex detectors (LHC). It includes
-a visualization system
-a navigator (where am I, distances, overlaps, etc)
René Brun 50
OpenGL
see $ROOTSYS/tutorials/geom
René Brun 51
Peak Finder + Deconvolutions
TSpectrum
René Brun 52
Fitters
Minuit
Fumili
LinearFitter
RobustFitter
Roofit
René Brun 53
Histograms
Analysis result: often a histogram
Value (x) vs. count (y)
Menu:View / Editor
René Brun 54
Fitting
Analysis result: often a fit based on a histogram
René Brun 55
FitFit = optimization in parameters,
e.g. Gaussian
For Gaussian: [0] = "Constant"[1] = "Mean"[2] = "Sigma" / "Width"
Objective: choose parameters [0], [1], [2] to get function as close as possible to histogram
2
2
]2[2
])1[(
]0[)(
x
exf
René Brun 56
Fit Panel
To fit a histogram:
right click histogram,
"Fit Panel"
Straightforward interface for fitting!
René Brun 57
Roofit: a powerful fitting framework
see $ROOTSYS/tutorials/fit/RoofitDemo.C
Input/Output
René Brun 59
Saving Data
Streaming, Reflection, TFile,
Schema Evolution
René Brun 60
Saving Objects – the Issues
Storing aLong: How many bytes? Valid:
0x0000002a0x000000000000002a0x0000000000000000000000000000002a
which byte contains 42, which 0? Valid:char[] {42, 0, 0, 0}: little endian, e.g. Intelchar[] {0, 0, 0, 42}: big endian, e.g. PowerPC
long aLong = 42; WARNING:
platform dependent!
René Brun 61
Platform Data Types
Fundamental data types:size is platform dependent
Store "int" on platform A0x000000000000002a
Read back on platform B – incompatible!Data loss, size differences, etc:0x000000000000002a
René Brun 62
ROOT Basic Data Types
Solution: ROOT typedefs
Signed Unsigned sizeof [bytes]
Char_t UChar_t 1
Short_t UShort_t 2
Int_t UInt_t 4
Long64_t ULong64_t 8
Double32_t float on disk, double in RAM
René Brun 6363
Object Oriented Concepts
Members: a “has a” relationship to the class.
Inheritance: an “is a” relationship to the class.
Class: the description of a “thing” in the system Object: instance of a class Methods: functions for a class
TObject
Jets Tracks EvNum
Momentum
Segments
Charge
Event
IsA
HasAHasA
HasA
HasAHasAHasA
René Brun 64
Saving Objects – the Issues
Storing anObject:need to know its members + base classesneed to know "where they are"
WARNING:
platform dependent!
class TMyClass {
float fFloat;
Long64_t fLong;
};
TMyClass anObject;
René Brun 65
Reflection
Need type description (aka reflection)
1. types, sizes, members
TMyClass is a class.
Members: "fFloat", type float, size 4 bytes "fLong", type Long64_t, size 8 bytes
class TMyClass {
float fFloat;
Long64_t fLong;
};
René Brun 66
Reflection
Need type description
1. types, sizes, members
2. offsets in memory class TMyClass {
float fFloat;
Long64_t fLong;
};
TMyClass
Mem
ory
Add
ress
fLong
fFloat
– 16– 14– 12– 10– 8– 6– 4– 2– 0
PADDING "fFloat" is at offset 0
"fLong" is at offset 8
René Brun 67
members memory re-order
I/O Using Reflection
TMyClass
Mem
ory
Add
ress
fLong
fFloat
– 16– 14– 12– 10– 8– 6– 4– 2– 0
PADDING
2a 00 00 00...
...00 00 00 2a
René Brun 68
C++ Is Not Java
Lesson: need reflection!
Where from?
Java: get data members with
C++: get data members with – oops. Not part of C++.
-- discussions for next C++
Class.forName("MyClass").getFields()
René Brun 69
Reflection For C++Program parses header files, e.g. Funky.h:
Collects reflection data for types requested in Linkdef.h
Stores it in Dict.cxx (dictionary file)
Compile Dict.cxx, link, load:C++ with reflection!
rootcint –f Dict.cxx Funky.h LinkDef.h
René Brun 70
Reflection By Selection
LinkDef.h syntax:
#pragma link C++ class MyClass+;#pragma link C++ typedef MyType_t;#pragma link C++ function MyFunc(int);#pragma link C++ enum MyEnum;
René Brun 71
Why So Complicated?
Simply use ACLiC:
Will create library with dictionary of all types in MyCode.cxx automatically!
.L MyCode.cxx+
René Brun 72
Saving Objects: TFile
ROOT stores objects in TFiles:
TFile behaves like file system:
TFile has a current directory:
TFile* f = new TFile("afile.root", "NEW");
f->mkdir("dir");
f->cd("dir");
René Brun 73
Saving Objects, Really
Given a TFile:
Write an object deriving from TObject:
"optionalName" or TObject::GetName()
Write any object (with dictionary):
TFile* f = new TFile("afile.root", "RECREATE");
object->Write("optionalName");
f->WriteObject(object, "name");
René Brun 74
"Where Is My Histogram?"
TFile owns histograms, graphs, trees(due to historical reasons):
h automatically deleted: owned by file.
c still there.
TFile* f = new TFile("myfile.root");TH1F* h = new TH1F("h","h",10,0.,1.);h->Write();Canvas* c = new TCanvas();c->Write();delete f;
René Brun 75
TFile as Directory: TKeys
One TKey per object:named directory entryknows what it points tolike inodes in file system
Each TFile directory is collection of TKeys
TKey: "hist1", TH1F
TFile "myfile.root"
TKey: "list", TListTKey: "hist2", TH2FTDirectory: "subdir"
TKey: "hist1", TH2D
René Brun 76
TKeys Usage
Efficient for hundreds of objects
Inefficient for millions:lookup ≥ log(n)sizeof(TKey) on 64 bit machine: 160 bytes
Better storage / access methods available,
René Brun 77
Risks With I/OPhysicists can loop a lot:
For each particle collision
For each particle created
For each detector module
Do something.
Physicists can loose a lot:
Run for hours…
Crash.
Everything lost.
René Brun 78
Name Cycles
Create snapshots regularly:
MyObject;1
MyObject;2
MyObject;3
…
MyObject
Write() does not replace but append!but see documentation TObject::Write()
René Brun 79
The "I" Of I/O
Reading is simple:
Remember:TFile owns histograms!file gone, histogram gone!
TFile* f = new TFile("myfile.root");TH1F* h = 0;f->GetObject("h", h);h->Draw();delete f;
René Brun 80
Ownership And TFiles
Separate TFile and histograms:
… and h will stay around.
TFile* f = new TFile("myfile.root");TH1F* h = 0;TH1::AddDirectory(kFALSE);f->GetObject("h", h);h->Draw();delete f;
René Brun 81
Changing Class:a Problem
Things change:
class TMyClass { float fFloat; Long64_t fLong;};
René Brun 82
Changing Class: a Problem
Things change:
Inconsistent reflection data!
Objects written with old version cannot be read!
class TMyClass { double fFloat; Long64_t fLong;};
René Brun 83
Solution: Schema Evolution
Support changes:
1. removed members: just skip data
2. added members: just default-initialize them (whatever the default constructor does)
Long64_t fLong;float fFloat;
file.root
float fFloat;
RAMignore
float fFloat;file.root Long64_t fLong;
float fFloat;
RAMTMyClass(): fLong(0)
René Brun 84
Solution: Schema Evolution
Support changes:
3. members change types
Long64_t fLong;float fFloat;
file.rootLong64_t fLong;double fFloat;
RAM
float DUMMY;RAM (temporary)
convert
René Brun 85
Supported Type Changes
Schema Evolution supports: float ↔ double, int ↔ long, etc (and pointers) TYPE* ↔ std::vector<TYPE> CLASS* ↔ TClonesArray
Adding / removing base class equivalent to adding / removing data member
René Brun 86
Storing Reflection
Big Question: file == memory class layout?
Need to store reflection to file,see TFile::ShowStreamerInfo():
StreamerInfo for class: TH1F, version=1 TH1 BASE offset= 0 type= 0 TArrayF BASE offset= 0 type= 0
StreamerInfo for class: TH1, version=5 TNamed BASE offset= 0 type=67... Int_t fNcells offset= 0 type= 3 TAxis fXaxis offset= 0 type=61
René Brun 87
Class Version
One TStreamerInfo for all objects of the same class layout in a file
Can have multiple versions in same file
Use version number to identify layout:
class TMyClass: public TObject {public: TMyClass(): fLong(0), fFloat(0.) {} virtual ~TMyClass() {} ... ClassDef(TMyClass,1); // example class};
René Brun 88
Random Facts On ROOT I/O
We know exactly what's in a TFile – need no library to look at data!
ROOT files are zippedCombine contents of TFiles with $ROOTSYS/bin/hadd
Can even open TFile("http://myserver.com/afile.root") including read-what-you-need!
Nice viewer for TFile: new TBrowser
René Brun 89
I/O
Object in Memory
Object in Memory
Streamer: No need for
transient / persistent classes
http
sockets
File on disk
Net File
Web File
XML XML File
SQL DataBase
LocalB
uffe
r
René Brun 90
TFile / TDirectoryA TFile object may be divided in a hierarchy of
directories, like a Unix file system.Two I/O modes are supported
Key-mode (TKey). An object is identified by a name (key), like files in a Unix directory. OK to support up to a few thousand objects, like histograms, geometries, mag fields, etc.
TTree-mode to store event data, when the number of events may be millions, billions.
René Brun 91
Self-describing filesDictionary for persistent classes written to the
file.ROOT files can be read by foreign readers Support for Backward and Forward
compatibilityFiles created in 2001 must be readable in 2015Classes (data objects) for all objects in a file
can be regenerated via TFile::MakeProject
Root >TFile f(“demo.root”);
Root > f.MakeProject(“dir”,”*”,”new++”);
René Brun 92
Example of key mode
void keywrite() {
TFile f(“keymode.root”,”new”);
TH1F h(“hist”,”test”,100,-3,3);
h.FillRandom(“gaus”,1000);
h.Write()
}void keyRead() {
TFile f(“keymode.root”);
TH1F *h = (TH1F*)f.Get(“hist”);
h.Draw();
}
René Brun 93
A Root file pippa.rootwith two levels of
directories
Objects in directory/pippa/DM/CJ
eg:/pippa/DM/CJ/h15
René Brun 94
1 billion people surfing the
Web
LHC: How Much Data?
105
104
103
102
Level 1 Rate (Hz)
High Level-1 Trigger(1 MHz)
High No. ChannelsHigh Bandwidth(500 Gbit/s)
High Data Archive(5 PetaBytes/year)10 Gbits/s in Data base
LHCB
KLOE
HERA-B
CDF II
CDF
H1ZEUS
UA1
LEP
NA49ALICE
Event Size (bytes)
104 105 106
ATLASCMS
106
107
STAR
ROOT Collection Classes
Or when one million TKeys are not enough…
René Brun 96
Collection Classes
The ROOT collections are polymorphic
containers that hold pointers to TObjects, so:They can only hold objects that inherit from
TObjectThey return pointers to TObjects, that have to
be cast back to the correct subclass
René Brun 97
Types Of Collections
The inheritance hierarchy of the primary collection classes
TCollection
TSeqCollection THashTable TMap
TList TOrdCollection TObjArray TBtree
TSortedList THashList TClonesArray
René Brun 98
Collections (cont)Here is a subset of collections supported by ROOT:
TObjArray TClonesArray THashTable THashList TMap
Templated containers, e.g. STL (std::list etc)
René Brun 99
TObjArray
A TObjArray is a collection which supports traditional array semantics via the overloading of operator[].
Objects can be directly accessed via an index. The array expands automatically when objects
are added.
René Brun 100
TClonesArray
Array of identical classes (“clones”).
Designed for repetitive data analysis tasks: same type of objects created and deleted many times.
No comparable class in STL!
The internal data structure of a TClonesArray
René Brun 101
Traditional Arrays
Very large number of new and delete calls in large loops like this (N(100000) x N(10000) times new/delete):
TObjArray a(10000); while (TEvent *ev = (TEvent *)next()) { for (int i = 0; i < ev->Ntracks; ++i) { a[i] = new TTrack(x,y,z,...); ... } ... a.Delete(); }
N(100000)
N(10000)
René Brun 102
You better use a TClonesArray which reduces the number of new/delete calls to only N(10000):
TClonesArray a("TTrack", 10000); while (TEvent *ev = (TEvent *)next()) { for (int i = 0; i < ev->Ntracks; ++i) { new(a[i]) TTrack(x,y,z,...); ... } ... a.Delete(); }
Pair of new/delete calls cost about 4 μsNN(109) new/deletes will save about 1 hour.
N(100000)
N(10000)
René Brun 103
THashTable - THashList
THashTable puts objects in one of several of its short lists. Which list to pick is defined by TObject::Hash(). Access much faster than map or list, but no order defined. Nothing similar in STL.
THashList consists of a THashTable for fast access plus a TList to define an order of the objects.
René Brun 104
TMap
TMap implements an associative array of (key,value) pairs using a THashTable for efficient retrieval (therefore TMap does not conserve the order of the entries).
ROOT Trees
René Brun 106
Why Trees ?
As seen previously, any object deriving from TObject can be written to a file with an associated key with object.Write()
However each key has an overhead in the directory structure in memory (up to 160 bytes). Thus, Object.Write() is very convenient for simple objects like histograms, but not for saving one object per event.
Using TTrees, the overhead in memory is in general less than 4 bytes per entry!
René Brun 107
Why Trees ?
Extremely efficient write once, read many ("WORM")
Designed to store >109 (HEP events) with same data structure
Load just a subset of the objects to optimize memory usage
Trees allow fast direct and random access to any entry (sequential access is the best)
René Brun 108
Building ROOT Trees
Overview of TreesBranches
5 steps to build a TTree
René Brun 109
Memory <--> TreeEach Node is a branch in the Tree
0123456789101112131415161718
T.Fill()
T.GetEntry(6)
T
Memory
René Brun 110
ROOT I/O -- Split/ClusterTree version
Streamer
File
Branches
Tree in memory
Tree entries
René Brun 111
Writing/Reading a Tree class Event : public Something {
Header fHeader;
std::list<Vertex*> fVertices;
std::vector<Track> fTracks;
TOF fTOF;
Calor *fCalor;
}
main() {
Event *event = 0;
TFile f(“demo.root”,”recreate”);
int split = 99; //maximum split
TTree *T = new TTree(“T”,”demo Tree”);
T->Branch(“event”,&event,split);
for (int ev=0;ev<1000;ev++) {
event = new Event(…);
T->Fill();
delete event;
}
t->AutoSave();
}
main() {
Event *event = 0;
TFile f(“demo.root”);
TTree *T = (TTree*)f.Get”T”);
T->SetBranchAddress(“event”,&event);
Long64_t N = T->GetEntries();
for (Long64_t ev=0;ev<N;ev++) {
T->GetEntry(ev);
// do something with event
}
}
Event.h
Write.C Read.C
René Brun 112
Tree structure
René Brun 113
8 Branches of T
8 leaves of branchElectrons
A double-clickto histogram
the leaf
Browsing a TTree with TBrowser
René Brun 114
The TTreeViewer
René Brun 115
Tree structureBranches: directories Leaves: data containersCan read a subset of all branches – speeds up
considerably the data analysis processesTree layout can be optimized for data analysisThe class for a branch is called TBranchVariables on TBranch are called leaf (yes -
TLeaf)Branches of the same TTree can be written to
separate files
René Brun 116
Five Steps to Build a Tree
Steps:
1. Create a TFile
2. Create a TTree
3. Add TBranch to the TTree
4. Fill the tree
5. Write the file
René Brun 117
Step 1: Create a TFile Object
Trees can be huge need file for swapping filled entries
TFile *hfile = new TFile("AFile.root");
René Brun 118
Step 2: Create a TTree Object
The TTree constructor:Tree name (e.g. "myTree") Tree title
TTree *tree = new TTree("myTree","A Tree");
René Brun 119
Step 3: Adding a Branch
Branch nameAddress of the pointer
to the object
Event *event = new Event(); myTree->Branch("EventBranch", &event);
René Brun 120
Splitting a Branch
Setting the split level (default = 99)
Split level = 0 Split level = 99
tree->Branch("EvBr", &event, 64000, 0 );
René Brun 121
Splitting (real example)
Split level = 0 Split level = 99
René Brun 122
Splitting
Creates one branch per member – recursively Allows to browse objects that are stored in trees,
even without their library Makes same members consecutive, e.g. for object
with position in X, Y, Z, and energy E, all X are consecutive, then come Y, then Z, then E. A lot higher zip efficiency!
Fine grained branches allow fain-grained I/O - read only members that are needed, instead of full object
Supports STL containers, too!
René Brun 123
Performance Considerations
A split branch is:Faster to read - the type does not have to be
read each timeSlower to write due to the large number of
buffers
René Brun 124
Step 4: Fill the Tree
Assign values to the object contained in each branch
TTree::Fill() creates a new entry in the tree: snapshot of values of branches’ objects
Create a for loop
for (int e=0;e<100000;++e) { event->Generate(e); // fill event myTree->Fill(); // fill the tree }
René Brun 125
Step 5: Write Tree To File
myTree->Write();
René Brun 126
Example macro{ Event *myEvent = new Event(); TFile f("mytree.root"); TTree *t = new TTree("myTree","A Tree"); t->Branch("SplitEvent", &myEvent); for (int e=0;e<100000;++e) { myEvent->Generate(); t->Fill(); } t->Write();}
René Brun 127
Reading a TTree
Looking at a treeHow to read a treeTrees, friends and chains
René Brun 128
Looking at the Tree
TTree::Print() shows the data layout
root [] TFile f("AFile.root")root [] myTree->Print();*******************************************************************************Tree :myTree : A ROOT tree **Entries : 10 : Total = 867935 bytes File Size = 390138 ** : : Tree compression factor = 2.72 ********************************************************************************Branch :EventBranch **Entries : 10 : BranchElement (see below) **............................................................................**Br 0 :fUniqueID : **Entries : 10 : Total Size= 698 bytes One basket in memory **Baskets : 0 : Basket Size= 64000 bytes Compression= 1.00 **............................................................................*……
René Brun 129
Looking at the Tree
TTree::Scan("leaf:leaf:….") shows the valuesroot [] myTree->Scan("fNseg:fNtrack"); > scan.txt
root [] myTree->Scan("fEvtHdr.fDate:fNtrack:fPx:fPy","", "colsize=13 precision=3 col=13:7::15.10");
******************************************************************************* Row * Instance * fEvtHdr.fDate * fNtrack * fPx * fPy ******************************************************************************** 0 * 0 * 960312 * 594 * 2.07 * 1.459911346 ** 0 * 1 * 960312 * 594 * 0.903 * -0.4093382061 ** 0 * 2 * 960312 * 594 * 0.696 * 0.3913401663 ** 0 * 3 * 960312 * 594 * -0.638 * 1.244356871 ** 0 * 4 * 960312 * 594 * -0.556 * -0.7361358404 ** 0 * 5 * 960312 * 594 * -1.57 * -0.3049036264 ** 0 * 6 * 960312 * 594 * 0.0425 * -1.006743073 ** 0 * 7 * 960312 * 594 * -0.6 * -1.895804524 *
René Brun 130
TTree Selection Syntax
Prints the first 8 variables of the tree.
Prints all the variables of the tree.Select specific variables:
Prints the values of var1, var2 and var3.A selection can be applied in the second argument:
Prints the values of var1, var2 and var3 for the entries where var1 is exactly 0.
MyTree->Scan();
MyTree->Scan("*");
MyTree->Scan("var1:var2:var3");
MyTree->Scan("var1:var2:var3", "var1==0");
René Brun 131
Looking at the TreeTTree::Show(entry_number) shows the values for
one entry
root [] myTree->Show(0);======> EVENT:0 EventBranch = NULL fUniqueID = 0 fBits = 50331648 [...] fNtrack = 594 fNseg = 5964 [...] fEvtHdr.fRun = 200 [...] fTracks.fPx = 2.066806, 0.903484, 0.695610, -
0.637773,... fTracks.fPy = 1.459911, -0.409338, 0.391340,
1.244357,...
René Brun 132
How To Read a TTree
Example:
1. Open the Tfile
2. Get the TTree
TFile f("tree4.root")
TTree *t4=0;
f.GetObject("t4",t4)
OR
René Brun 133
How to Read a TTree
3. Create a variable pointing to the dataroot [] Event *event = 0;
4. Associate a branch with the variable:root [] t4->SetBranchAddress("event_split", &event);
5. Read one entry in the TTreeroot [] t4->GetEntry(0)
root [] event->GetTracks()->First()->Dump()
==> Dumping object at: 0x0763aad0, name=Track, class=Track
fPx 0.651241 X component of the momentum
fPy 1.02466 Y component of the momentum
fPz 1.2141 Z component of the momentum
[...]
René Brun 134
Example macro
{
Event *ev = 0;
TFile f("mytree.root");
TTree *myTree = (TTree*)f->Get("myTree");
myTree->SetBranchAddress("SplitEvent",&ev);
for (int e=0;e<100000;++e) {
myTree->GetEntry(e);
ev->Analyse();
}
}
René Brun 135
Chains of Trees
A TChain is a collection of Trees.Same semantics for TChains and TTrees
root > .x h1chain.C root > chain.Process(“h1analysis.C”)
{ //creates a TChain to be used by the h1analysis.C class //the symbol H1 must point to a directory where the H1 data sets //have been installed TChain chain("h42"); chain.Add("$H1/dstarmb.root"); chain.Add("$H1/dstarp1a.root"); chain.Add("$H1/dstarp1b.root"); chain.Add("$H1/dstarp2.root");}
René Brun 136
Data Volume & Organisation
100MB 1GB 10GB 1TB100GB 100TB 1PB10TB
1 1 500005000500505
TTree TChain
A TFile typically contains 1 TTreeA TChain is a collection of TTrees or/and TChainsA TChain is typically the result of a query to a file
catalog
René Brun 137
Friends of Trees
- Adding new branches to existing tree without touching it, i.e.:
- Unrestricted access to the friend's data via virtual branch of original tree
myTree->AddFriend("ft1", "friend.root")
René Brun 138
Tree Friends
0123456789101112131415161718
0123456789101112131415161718
0123456789101112131415161718
Public
read
Public
read
User
Write
Entry # 8
René Brun 139
Tree Friends
TFile f1("tree1.root");tree.AddFriend("tree2", "tree2.root")tree.AddFriend("tree3", "tree3.root");tree.Draw("x:a", "k<c");tree.Draw("x:tree2.x", "sqrt(p)<b");
x
Processing timeindependent of thenumber of friendsunlike table joins
in RDBMS
Collaboration-widepublic read
Analysis groupprotected
userprivate
René Brun 140
Summary: Reading TreesTTree is one of the most powerful collections
available for HEPExtremely efficient for huge number of data sets
with identical layoutVery easy to look at TTree - use TBrowser!Write once, read many (WORM) ideal for
experiments' dataStill: extensible, users can add their own tree as
friend
René Brun 141
Analyzing Trees
Selectors, Proxies, PROOF
René Brun 142
Recap
TTree efficient storage and access for huge amounts of structured data
Allows selective access of data
TTree knows its layout
Almost all HEP analyses based on TTree
René Brun 143
TTree Data Access
Data access via TTree / TBranch is complex
Lots of code identical for all TTrees:getting tree from file, setting up branches, entry loop
Lots of code completely defined by TTree:branch names, variable types, branch ↔ variable mapping
Need to enable / disable branches "by hand"
Want common interface to allow generic "TTree based analysis infrastructure"
René Brun 144
TTree Proxy
The solution:write analysis code using branch names
as variablesauto-declare variables from branch
structureread branches on-demand
René Brun 145
Branch vs. Proxy
Take a simple branch:
Access via proxy in pia.C:
class ABranch { int a;};ABranch* b=0;tree->Branch("abranch", &b);
double pia() { return sqrt(abranch.a * TMath::Pi());}
René Brun 146
Proxy Details
Analysis script somename.C must contain somename()
Put #includes into somename.h
Return type must convert to double
double pia() { return sqrt(abranch.a * TMath::Pi());}
René Brun 147
Proxy Advantages
Very efficient: only reads leaf "a"
Can use arbitrary C++
Leaf behaves like a variable
Uses meta-information stored with TTree:branch namestypes contained in branches / leavesand their members (unrolling)
double pia() { return sqrt(abranch.a * TMath::Pi());}
René Brun 148
Simple Proxy Analysis
TTree::Draw() runs simple analysis:
Compiles pia.C
Calls it for each event
Fills histogram named "htemp" with return value
TFile* f = new TFile("tree.root");TTree* tree = 0;f->GetObject("MyTree", tree);tree->Draw("pia.C+");
René Brun 149
Behind The Proxy Scene
TTree::Draw("pia.C+") creates helper class
pia.C gets #included inside generatedSel!
Its data members are named like leaves,wrap access to Tree data
Can also generate Proxy via
generatedSel: public TSelector {#include "pia.C" ...};
tree->MakeProxy("MyProxy", "pia.C")
René Brun 150
TSelector
TSelector is base for event-based analysis:
1. setup
2. analyze an entry
3. draw / save histograms
generatedSel: public TSelector
TSelector::Begin()
TSelector::Process(Long64_t entry)
TSelector::Terminate()
René Brun 151
Extending The ProxyGiven Proxy generated for script.C (like pia.C):looks for and calls
Correspond to TSelector's functions
Can be used to set up complete analysis:fill several histograms,control event loop
void script_Begin(TTree* tree)Bool_t script_Process(Long64_t entry)void script_Terminate()
René Brun 152
Proxy And TClonesArray
TClonesArray as branch:
Cannot loop over entries and return each:
class TGap { float ore; };TClonesArray* b = new TClonesArray("TGap");tree->Branch("gap", &b);
double singapore() { return sin(gap.ore[???]);}
René Brun 153
Proxy And TClonesArray
TClonesArray as branch:
Implement it as part of pia_Process() instead:
class TGap { float ore; };TClonesArray* b = new TClonesArray("TGap");tree->Branch("gap", &b);
Bool_t pia_Process(Long64_t entry) { for (int i=0; i<gap.GetEntries(); ++i) hSingapore->Fill(sin(gap.ore[i])); return kTRUE;}
René Brun 154
Extending The Proxy, Example
Need to declare hSingapore somewhere!
But pia.C is #included inside generatedSel, so:
TH1* hSingapore;void pia_Begin() { hSingapore = new TH1F(...);}Bool_t pia_Process(Long64_t entry) { hSingapore->Fill(...);}void pia_Terminate() { hSingapore->Draw();}
ROOT on the GRID(s)
Many ways to use ROOT on a GRIDSingle laptop
Multi-Core desktopLocal area cluster
Cluster on a GRID centerClusters on GRID(s)
René Brun 156
GRID: the ideaExploit resources available on the Net: You
take and you give.Not a new idea: Seti, Condor, etcBUT:
Can we make it transparent?What type of work?
Problem is more complex than a simple CPU distribution taskWAN data accessData managementAuthentication, time & space sharingDisk and CPU quotas
René Brun 157
The HEP GRID100,000 computers
in 1000 locations5,000 physicists
in 1000 locations
WAN
René Brun 158
GRID: Users profile
Few big users submittingmany long jobs (Monte Carlo,
reconstruction)They want to run many jobs in
one month
Many users submittingmany short jobs (physics
analysis)They want to run many jobs in one hour or less
René Brun 159
Big Users
Monte Carlo jobs (one hour one day)Each job generates one file (1 GigaByte)
Reconstruction job (10 minutes -> one hour) Input from the MC job or copied from a storage
centerOutput (< input) is staged back to a storage center
Success rate (90%). If the job fails you resubmit it.
For several years, GRID projects focused effort on big users.
René Brun 160
Small Users Scenario 1: submit one batch job to the GRID. It runs
somewhere with varying response times. Scenario 2: Use a splitter to submit many batch jobs to
process many data sets (eg CRAB, Ganga, Alien). Output data sets are merged automatically. Success rate < 90%. You see the final results only when the last job has been received and all results merged.
Scenario 3: Use PROOF (automatic splitter and merger). Success rate close to 100%. You can see intermediate feedback objects like histograms. You run from an interactive ROOT session.
René Brun 161
Scenario 1 & 2: PROS
Job level parallelism. Conventional model. Nothing to change in user program.Initialisation phaseLoop on eventsTermination
Same job can run on laptop or on a GRID job.
René Brun 162
Scenario 1 & 2: CONS(1)
Long tail in the jobs wall clock time distribution.
René Brun 163
Scenario 1 & 2: CONS(2)
Can only merge output after a timecut.More data movement (input & output)Cannot have interactive feedbackTwo consecutive queries will produce
different results (problem with rare events)
Will use only one core on a multicore laptop or GRID node.
Hard to control priorities and quotas.
René Brun 164
Scenario 3: PROS
Predictive response. Event level parallelism. Workers terminate at the same time
Process moved to data as much as possible, otherwise use network.
Interactive feedbackCan view running queries in different
ROOT sessions.Can take advantage of multicore cpus
René Brun 165
Scenario 3: CONS
Event level parallelism. User code must follow a template: the TSelector API.
Good for a local area cluster. More difficult to put in place in a GRID collection of local area clusters.
Interactive schedulers, priority managers must be put in place.
Debugging a problem slightly more difficult.
René Brun 166
GRID & Parallelism: 1
The user application splits the problem in N subtasks. Each task is submitted to a GRID node (minimal input, minimal output).
The GRID task can run synchronously or asynchronously. If the task fails or time-out, it can be resubmitted to another node.
One of the first and simplest use of the GRID, but not many applications in HEP.
René Brun 167
GRID & Parallelism: 2
The typical case of Monte Carlo or reconstruction in HEP.
It requires massive data transfers between the main computing centers.
This activity has concentrated so far a very large fraction of the GRID projects and budgets.
It has been an essential step to foster coordination between hundreds of sites, improve international network bandwidths and robustness.
René Brun 168
GRID & Parallelism: 3Distributed data analysis will be a major
challenge for the coming experiments.This is the area with thousands of people
running many different styles of queries, most of the time in a chaotic way.
The main challenges:Access to millions of data sets (eg 500 TeraBytes)Best match between execution and data locationDistributing/compiling/linking users code(a few
thousand lines) with experiment large libraries (a few million lines of code).
Simplicity of useReal time responseRobustness.
René Brun 169
GRID & Parallelism: 3a
Currently 2 different & competing directions for distributed data analysis.Batch solution using the existing GRID
infrastructure for Monte Carlo and reconstruction programs. A front-end program partitions the problem to analyze ND data sets on NP processors.
Interactive solution PROOF. Each query is parallelized with an optimum match of execution and data location.
René Brun 170
Parallelism: Where ?
Multi-Core CPU laptop/desktop
2(2007) 32(2012?)
Network of desktops
Local Cluster
with multi-core CPUs
GRID(s)