Introduction to the ROOT framework root.cern.ch

Introduction tothe ROOT framework

http://root.cern.ch

René BrunCERN

Do-Son school on Advanced Computing and GRID Technologies for Research

Institute of Information Technology, VAST, Hanoi

René Brun 2

What's ROOT?

René Brun 3

ROOT Application Domains

Data Storage: Local, Network

Data Analysis & Visualization

General Fram

ework

René Brun 4

ROOT in a Nutshell The ROOT system is an Object Oriented framework

for large scale data handling applications. It is written in C++.

Provides, among others, an efficient data storage and access system designed to support

structured data sets (PetaBytes) a query system to extract data from these data sets a C++ interpreter advanced statistical analysis algorithms (multi dimensional

histogramming, fitting, minimization and cluster finding) scientific visualization tools with 2D and 3D graphics an advanced Graphical User Interface

The user interacts with ROOT via a graphical user interface, the command line or scripts

The command and scripting language is C++, thanks to the embedded CINT C++ interpreter, and large scripts can be compiled and dynamically loaded.

A Python shell is also provided.

René Brun 5

ROOT: An Open Source Project

The project was started in 1995.The project is developed as a collaboration

between: Full time developers:

• 11 people full time at CERN (PH/SFT)• +4 developers at Fermilab/USA, Protvino , JINR/Dubna (Russia)

Large number of part-time contributors (155 in CREDITS file)A long list of users giving feedback, comments, bug

fixes and many small contributions• 2400 registered to RootForum• 10,000 posts per year

An Open Source Project, source available under the LGPL license

René Brun 6

More information...

http://root.cern.chDownloadDocumentationTutorialsOnline HelpMailing listForum

René Brun 7

Install ROOT

Download & extract ROOT v5.16/00 from http://root.cern.ch/root/Version516.html Set of precompiled versions and source code

Set environment export ROOTSYS=<path-to-installation> export PATH=$ROOTSYS/bin:$PATH export LD_LIBRARY_PATH=$ROOTSYS/lib:$LD_LIBRARY_PATH export DYLD_LIBRARY_PATH=$ROOTSYS/lib:

$DYLD_LIBRARY_PATH (MacOS X only) Compile source code (if chosen)

cd $ROOTSYS ./configure make More information at http://root.cern.ch/root/Install.html

http://root.cern.ch/root/Version516.html

René Brun 8

ROOT Library StructureROOT libraries are a layered structure CORE classes always required: support for RTTI, basic

I/O and interpreter Optional libraries loaded when needed. Separation

between data objects and the high level classes acting on these objects. Example: a batch job uses only the histogram library, no need to link histogram painter library

Shared libraries reduce the application size and link time

Mix and match, build your own application Reduce dependencies via plug-in mechanism

René Brun 9

Three User Interfaces

GUIwindows, buttons, menus

Command lineCINT (C++ interpreter),Python, Ruby,…

Macros, applications, libraries (C++ compiler and interpreter)

{ // example // macro...}

CINT Interpreter

René Brun 11

CINT in ROOTCINT is used in ROOT:

As command line interpreterAs script interpreterTo generate class dictionariesTo generate function/method calling stubsSignals/Slots with the GUI

The command line, script and programming language become the same

Large scripts can be compiled for optimal performance

René Brun 12

Compiled versus Interpreted Why compile? Faster execution, CINT has limitations…

Why interpret? Faster Edit → Run → Check result → Edit cycles

("rapid prototyping"). Scripting is sometimes just easier.

Are Makefiles dead? No! if you build/compile a very large application Yes! ACLiC is even platform independent!

René Brun 13

Running Code

To run function mycode() in file mycode.C:root [0] .x mycode.C

Equivalent: load file and run function:root [1] .L mycode.C

root [2] mycode()

All of CINT's commands (help):root [3] .h

René Brun 14

Running Code

Macro: file that is interpreted by CINT (.x)

Execute with .x mymacro.C(42)

int mymacro(int value)

{

int ret = 42;

ret += value;

return ret;

}

René Brun 15

Unnamed Macros

No functions, just statements

Execute with .x mymacro.C No functions thus no arguments

Recommend named macros!Compiler prefers it, too…

{

float ret = 0.42;

return sin(ret);

}

René Brun 16

Named Vs. UnnamedNamed: function scope unnamed:

global

mymacro.C: unnamed.C:

Back at the prompt:

void mymacro()

{ int val = 42; }

{ int val = 42; }

root [] val

Error: Symbol val is not defined

root [] val

(int)42

René Brun 17

Survivors: Heap Objects

obj local to function, inaccessible from outside:

Instead: create on heap, pass to outside:

pObj gone – but MyClass still where pObj pointed to! Returned by mymacro()

void mymacro()

{ MyClass obj; }

MyClass* mymacro()

{ MyClass* pObj = new MyClass();

return pObj; }

René Brun 18

Running Code – Libraries"Library": compiled code, shared library

CINT can call its functions!

Build a library: ACLiC! "+" instead of Makefile(Automatic Compiler of Libraries for CINT)

CINT knows all its functions / types:

something(42)

.x something.C(42)++

René Brun 19

My first session

root [0] 344+76.8(const double)4.20800000000000010e+002root [1] float x=89.7;root [2] float y=567.8;root [3] x+sqrt(y)(double)1.13528550991510710e+002root [4] float z = x+2*sqrt(y/6);root [5] z(float)1.09155929565429690e+002root [6] .q

root

root

root [0] try up and down arrows

See file $HOME/.root_hist

René Brun 20

My second session

root [0] .x session2.Cfor N=100000, sum= 45908.6

root [1] sum(double)4.59085828512453370e+004

Root [2] r.Rndm()(Double_t)8.29029321670533560e-001

root [3] .q

root

{ int N = 100000; TRandom r; double sum = 0; for (int i=0;i<N;i++) { sum += sin(r.Rndm()); } printf("for N=%d, sum= %g\n",N,sum);}

session2.C

unnamed macroexecutes in global scope

René Brun 21

My third session

root [0] .x session3.Cfor N=100000, sum= 45908.6

root [1] sumError: Symbol sum is not defined in current scope*** Interpreter error recovered ***

Root [2] .x session3.C(1000)for N=1000, sum= 460.311

root [3] .q

root

void session3 (int N=100000) { TRandom r; double sum = 0; for (int i=0;i<N;i++) { sum += sin(r.Rndm()); } printf("for N=%d, sum= %g\n",N,sum);}

session3.C

Named macroNormal C++ scope

rules

René Brun 22

My third session with ACLIC

root [0] gROOT->Time();root [1] .x session4.C(10000000)for N=10000000, sum= 4.59765e+006Real time 0:00:06, CP time 6.890

root [2] .x session4.C+(10000000)

for N=10000000, sum= 4.59765e+006 Real time 0:00:09, CP time 1.062

root [3] session4(10000000)for N=10000000, sum= 4.59765e+006Real time 0:00:01, CP time 1.052

root [4] .q

#include “TRandom.h”void session4 (int N) { TRandom r; double sum = 0; for (int i=0;i<N;i++) { sum += sin(r.Rndm()); } printf("for N=%d, sum= %g\n",N,sum);}

session4.C

File session4.CAutomatically compiled

and linked by thenative compiler.

Must be C++ compliant

René Brun 23

Macros with more than one function

root [0] .x session5.C >session5.logroot [1] .q

void session5(int N=100) { session5a(N); session5b(N); gROOT->ProcessLine(“.x session4.C+(1000)”);}void session5a(int N) { for (int i=0;i<N;i++) { printf("sqrt(%d) = %g\n",i,sqrt(i)); }}void session5b(int N) { double sum = 0; for (int i=0;i<N;i++) { sum += i; printf("sum(%d) = %g\n",i,sum); }}

session5.C

.x session5.Cexecutes the function

session5 in session5.C

root [0] .L session5.Croot [1] session5(100); >session5.logroot [2] session5b(3)sum(0) = 0sum(1) = 1sum(2) = 3

root [3] .q

use gROOT->ProcessLineto execute a macro from a

macro or from compiled code

René Brun 24

Running a script in batch mode

root –b –q myscript.Cmyscript.C is executed by the interpreter

root –b –q myscript.C(42)same as above, but giving an argument

root –b –q myscript.C+myscript.C is compiled via ACLIC with the native

compiler and linked with the executable.

root –b –q myscript.C++g(42)same as above. Script compiled in debug mode and

an argument given.

René Brun 25

Calling interpreter in a macro

All CINT commands can be excuted from an interpreted macro or from compiled code.gROOT->ProcessLine(“.x myscript.C”);

René Brun 26

ROOT Tutorials & Doc

See Users Guide (pdf) atSee online Reference Manual atExecute tutorials at $ROOTSYS/tutorials

293 tutorials in the following directoriesfoam gui matrix spectrum xml hist mlp splotnet sql geom physics gl image pyroot threadfft graphics io pythia tree fit graphs math quadp ruby unuran

René Brun 27

Generating a Dictionary

MyClass.h

MyClass.cxx rootcint –f MyDict.cxx –c MyClass.h

compile and link

libMyClass.so

MyDict.hMyDict.cxx

René Brun 2828

HTML

René Brun 29

Math TMVA

SPlot

Graphics & GUI

René Brun 31

TPad: main graphics container

Hello

Root > TLine line(.1,.9,.6,.6)

Root > line.Draw()

Root > TText text(.5,.2,”Hello”)

Root > text.Draw()

The Draw function adds the object to the list of primitives of the current pad.

If no pad exists, a pad is automatically created with a default range [0,1].

When the pad needs to be drawn or redrawn, the object Paint function is called.

Only objects derivingfrom TObject may be drawn

in a padRoot Objects or User objects

René Brun 32

Basic Primitives

TButton

TLine TArrow TEllipse

TCurvyLine

TPaveLabel

TPave

TDiamond

TPavesText

TPolyLine

TLatex

TCrown

TMarker

TText

TCurlyArc

TBox

René Brun 33

Full LateX

support on

screen and

postscript

TCurlyArcTCurlyLineTWavyLine

and other building blocks for Feynmann diagrams

Formula or diagrams can be

edited with the mouse

Feynman.C

latex3.C

René Brun 34

Graphs

TGraph(n,x,y)

TCutG(n,x,y)

TGraphErrors(n,x,y,ex,ey)

TGraphAsymmErrors(n,x,y,exl,exh,eyl,eyh)

TMultiGraph

gerrors2.C

René Brun 35

Graphics examples

René Brun 36

René Brun 37

René Brun 38

René Brun 39

Graphics (2D-3D)

“SURF”“LEGO”

TF3

TH3

TGLParametric

René Brun 40

ASImage: Image processor

René Brun 41

GUI (Graphical User Interface)

René Brun 42

Canvas tool bar/menus/help

René Brun 43

Object editor Click on any object to show

its editor

René Brun 44

TBrowser: a dynamic browser

see $ROOTSYS/test/RootIDE

René Brun 45

TBrowser Editor

René Brun 46

GUI C++ code generator

When pressing ctrl+S on any widget it is saved as a C++ macro file thanks to the SavePrimitive methods implemented in all GUI classes. The generated macro can be edited and then executed via CINT

Executing the macro restores the complete original GUI as well as all created signal/slot connections in a global way

// transient frame TGTransientFrame *frame2 = new TGTransientFrame(gClient->GetRoot(),760,590); // group frame TGGroupFrame *frame3 = new TGGroupFrame(frame2,"curve");

TGRadioButton *frame4 = new TGRadioButton(frame3,"gaus",10); frame3->AddFrame(frame4);

root [0] .x example.C

René Brun 47

The GUI builder provides GUI tools for developing user interfaces based on the ROOT GUI classes. It includes over 30 advanced widgets and an automatic C++ code generator.

The GUI Builder

René Brun 48

More GUI Examples$ROOTSYS/tutorials/gui

$ROOTSYS/test/RootShower

$ROOTSYS/test/RootIDE

René Brun 49

Geometry

The GEOMetry package is used to model very complex detectors (LHC). It includes

-a visualization system

-a navigator (where am I, distances, overlaps, etc)

René Brun 50

OpenGL

see $ROOTSYS/tutorials/geom

René Brun 51

Peak Finder + Deconvolutions

TSpectrum

René Brun 52

Fitters

Minuit

Fumili

LinearFitter

RobustFitter

Roofit

René Brun 53

Histograms

Analysis result: often a histogram

Value (x) vs. count (y)

Menu:View / Editor

René Brun 54

Fitting

Analysis result: often a fit based on a histogram

René Brun 55

FitFit = optimization in parameters,

e.g. Gaussian

For Gaussian: [0] = "Constant"[1] = "Mean"[2] = "Sigma" / "Width"

Objective: choose parameters [0], [1], [2] to get function as close as possible to histogram

2

2

]2[2

])1[(

]0[)(

x

exf

René Brun 56

Fit Panel

To fit a histogram:

right click histogram,

"Fit Panel"

Straightforward interface for fitting!

René Brun 57

Roofit: a powerful fitting framework

see $ROOTSYS/tutorials/fit/RoofitDemo.C

Input/Output

René Brun 59

Saving Data

Streaming, Reflection, TFile,

Schema Evolution

René Brun 60

Saving Objects – the Issues

Storing aLong: How many bytes? Valid:

0x0000002a0x000000000000002a0x0000000000000000000000000000002a

which byte contains 42, which 0? Valid:char[] {42, 0, 0, 0}: little endian, e.g. Intelchar[] {0, 0, 0, 42}: big endian, e.g. PowerPC

long aLong = 42; WARNING:

platform dependent!

René Brun 61

Platform Data Types

Fundamental data types:size is platform dependent

Store "int" on platform A0x000000000000002a

Read back on platform B – incompatible!Data loss, size differences, etc:0x000000000000002a

René Brun 62

ROOT Basic Data Types

Solution: ROOT typedefs

Signed Unsigned sizeof [bytes]

Char_t UChar_t 1

Short_t UShort_t 2

Int_t UInt_t 4

Long64_t ULong64_t 8

Double32_t float on disk, double in RAM

René Brun 6363

Object Oriented Concepts

Members: a “has a” relationship to the class.

Inheritance: an “is a” relationship to the class.

Class: the description of a “thing” in the system Object: instance of a class Methods: functions for a class

TObject

Jets Tracks EvNum

Momentum

Segments

Charge

Event

IsA

HasAHasA

HasA

HasAHasAHasA

René Brun 64

Saving Objects – the Issues

Storing anObject:need to know its members + base classesneed to know "where they are"

WARNING:

platform dependent!

class TMyClass {

float fFloat;

Long64_t fLong;

};

TMyClass anObject;

René Brun 65

Reflection

Need type description (aka reflection)

1. types, sizes, members

TMyClass is a class.

Members: "fFloat", type float, size 4 bytes "fLong", type Long64_t, size 8 bytes

class TMyClass {

float fFloat;

Long64_t fLong;

};

René Brun 66

Reflection

Need type description

1. types, sizes, members

2. offsets in memory class TMyClass {

float fFloat;

Long64_t fLong;

};

TMyClass

Mem

ory

Add

ress

fLong

fFloat

– 16– 14– 12– 10– 8– 6– 4– 2– 0

PADDING "fFloat" is at offset 0

"fLong" is at offset 8

René Brun 67

members memory re-order

I/O Using Reflection

TMyClass

Mem

ory

Add

ress

fLong

fFloat

– 16– 14– 12– 10– 8– 6– 4– 2– 0

PADDING

2a 00 00 00...

...00 00 00 2a

René Brun 68

C++ Is Not Java

Lesson: need reflection!

Where from?

Java: get data members with

C++: get data members with – oops. Not part of C++.

-- discussions for next C++

Class.forName("MyClass").getFields()

René Brun 69

Reflection For C++Program parses header files, e.g. Funky.h:

Collects reflection data for types requested in Linkdef.h

Stores it in Dict.cxx (dictionary file)

Compile Dict.cxx, link, load:C++ with reflection!

rootcint –f Dict.cxx Funky.h LinkDef.h

René Brun 70

Reflection By Selection

LinkDef.h syntax:

#pragma link C++ class MyClass+;#pragma link C++ typedef MyType_t;#pragma link C++ function MyFunc(int);#pragma link C++ enum MyEnum;

René Brun 71

Why So Complicated?

Simply use ACLiC:

Will create library with dictionary of all types in MyCode.cxx automatically!

.L MyCode.cxx+

René Brun 72

Saving Objects: TFile

ROOT stores objects in TFiles:

TFile behaves like file system:

TFile has a current directory:

TFile* f = new TFile("afile.root", "NEW");

f->mkdir("dir");

f->cd("dir");

René Brun 73

Saving Objects, Really

Given a TFile:

Write an object deriving from TObject:

"optionalName" or TObject::GetName()

Write any object (with dictionary):

TFile* f = new TFile("afile.root", "RECREATE");

object->Write("optionalName");

f->WriteObject(object, "name");

René Brun 74

"Where Is My Histogram?"

TFile owns histograms, graphs, trees(due to historical reasons):

h automatically deleted: owned by file.

c still there.

TFile* f = new TFile("myfile.root");TH1F* h = new TH1F("h","h",10,0.,1.);h->Write();Canvas* c = new TCanvas();c->Write();delete f;

René Brun 75

TFile as Directory: TKeys

One TKey per object:named directory entryknows what it points tolike inodes in file system

Each TFile directory is collection of TKeys

TKey: "hist1", TH1F

TFile "myfile.root"

TKey: "list", TListTKey: "hist2", TH2FTDirectory: "subdir"

TKey: "hist1", TH2D

René Brun 76

TKeys Usage

Efficient for hundreds of objects

Inefficient for millions:lookup ≥ log(n)sizeof(TKey) on 64 bit machine: 160 bytes

Better storage / access methods available,

René Brun 77

Risks With I/OPhysicists can loop a lot:

For each particle collision

For each particle created

For each detector module

Do something.

Physicists can loose a lot:

Run for hours…

Crash.

Everything lost.

René Brun 78

Name Cycles

Create snapshots regularly:

MyObject;1

MyObject;2

MyObject;3

…

MyObject

Write() does not replace but append!but see documentation TObject::Write()

René Brun 79

The "I" Of I/O

Reading is simple:

Remember:TFile owns histograms!file gone, histogram gone!

TFile* f = new TFile("myfile.root");TH1F* h = 0;f->GetObject("h", h);h->Draw();delete f;

René Brun 80

Ownership And TFiles

Separate TFile and histograms:

… and h will stay around.

TFile* f = new TFile("myfile.root");TH1F* h = 0;TH1::AddDirectory(kFALSE);f->GetObject("h", h);h->Draw();delete f;

René Brun 81

Changing Class:a Problem

Things change:

class TMyClass { float fFloat; Long64_t fLong;};

René Brun 82

Changing Class: a Problem

Things change:

Inconsistent reflection data!

Objects written with old version cannot be read!

class TMyClass { double fFloat; Long64_t fLong;};

René Brun 83

Solution: Schema Evolution

Support changes:

1. removed members: just skip data

2. added members: just default-initialize them (whatever the default constructor does)

Long64_t fLong;float fFloat;

file.root

float fFloat;

RAMignore

float fFloat;file.root Long64_t fLong;

float fFloat;

RAMTMyClass(): fLong(0)

René Brun 84

Solution: Schema Evolution

Support changes:

3. members change types

Long64_t fLong;float fFloat;

file.rootLong64_t fLong;double fFloat;

RAM

float DUMMY;RAM (temporary)

convert

René Brun 85

Supported Type Changes

Schema Evolution supports: float ↔ double, int ↔ long, etc (and pointers) TYPE* ↔ std::vector<TYPE> CLASS* ↔ TClonesArray

Adding / removing base class equivalent to adding / removing data member

René Brun 86

Storing Reflection

Big Question: file == memory class layout?

Need to store reflection to file,see TFile::ShowStreamerInfo():

StreamerInfo for class: TH1F, version=1 TH1 BASE offset= 0 type= 0 TArrayF BASE offset= 0 type= 0

StreamerInfo for class: TH1, version=5 TNamed BASE offset= 0 type=67... Int_t fNcells offset= 0 type= 3 TAxis fXaxis offset= 0 type=61

René Brun 87

Class Version

One TStreamerInfo for all objects of the same class layout in a file

Can have multiple versions in same file

Use version number to identify layout:

class TMyClass: public TObject {public: TMyClass(): fLong(0), fFloat(0.) {} virtual ~TMyClass() {} ... ClassDef(TMyClass,1); // example class};

René Brun 88

Random Facts On ROOT I/O

We know exactly what's in a TFile – need no library to look at data!

ROOT files are zippedCombine contents of TFiles with $ROOTSYS/bin/hadd

Can even open TFile("http://myserver.com/afile.root") including read-what-you-need!

Nice viewer for TFile: new TBrowser

René Brun 89

I/O

Object in Memory

Object in Memory

Streamer: No need for

transient / persistent classes

http

sockets

File on disk

Net File

Web File

XML XML File

SQL DataBase

LocalB

uffe

r

René Brun 90

TFile / TDirectoryA TFile object may be divided in a hierarchy of

directories, like a Unix file system.Two I/O modes are supported

Key-mode (TKey). An object is identified by a name (key), like files in a Unix directory. OK to support up to a few thousand objects, like histograms, geometries, mag fields, etc.

TTree-mode to store event data, when the number of events may be millions, billions.

René Brun 91

Self-describing filesDictionary for persistent classes written to the

file.ROOT files can be read by foreign readers Support for Backward and Forward

compatibilityFiles created in 2001 must be readable in 2015Classes (data objects) for all objects in a file

can be regenerated via TFile::MakeProject

Root >TFile f(“demo.root”);

Root > f.MakeProject(“dir”,”*”,”new++”);

René Brun 92

Example of key mode

void keywrite() {

TFile f(“keymode.root”,”new”);

TH1F h(“hist”,”test”,100,-3,3);

h.FillRandom(“gaus”,1000);

h.Write()

}void keyRead() {

TFile f(“keymode.root”);

TH1F *h = (TH1F*)f.Get(“hist”);

h.Draw();

}

René Brun 93

A Root file pippa.rootwith two levels of

directories

Objects in directory/pippa/DM/CJ

eg:/pippa/DM/CJ/h15

René Brun 94

1 billion people surfing the

Web

LHC: How Much Data?

105

104

103

102

Level 1 Rate (Hz)

High Level-1 Trigger(1 MHz)

High No. ChannelsHigh Bandwidth(500 Gbit/s)

High Data Archive(5 PetaBytes/year)10 Gbits/s in Data base

LHCB

KLOE

HERA-B

CDF II

CDF

H1ZEUS

UA1

LEP

NA49ALICE

Event Size (bytes)

104 105 106

ATLASCMS

106

107

STAR

ROOT Collection Classes

Or when one million TKeys are not enough…

René Brun 96

Collection Classes

The ROOT collections are polymorphic

containers that hold pointers to TObjects, so:They can only hold objects that inherit from

TObjectThey return pointers to TObjects, that have to

be cast back to the correct subclass

René Brun 97

Types Of Collections

The inheritance hierarchy of the primary collection classes

TCollection

TSeqCollection THashTable TMap

TList TOrdCollection TObjArray TBtree

TSortedList THashList TClonesArray

René Brun 98

Collections (cont)Here is a subset of collections supported by ROOT:

TObjArray TClonesArray THashTable THashList TMap

Templated containers, e.g. STL (std::list etc)

René Brun 99

TObjArray

A TObjArray is a collection which supports traditional array semantics via the overloading of operator[].

Objects can be directly accessed via an index. The array expands automatically when objects

are added.

René Brun 100

TClonesArray

Array of identical classes (“clones”).

Designed for repetitive data analysis tasks: same type of objects created and deleted many times.

No comparable class in STL!

The internal data structure of a TClonesArray

René Brun 101

Traditional Arrays

Very large number of new and delete calls in large loops like this (N(100000) x N(10000) times new/delete):

TObjArray a(10000); while (TEvent *ev = (TEvent *)next()) { for (int i = 0; i < ev->Ntracks; ++i) { a[i] = new TTrack(x,y,z,...); ... } ... a.Delete(); }

N(100000)

N(10000)

René Brun 102

You better use a TClonesArray which reduces the number of new/delete calls to only N(10000):

TClonesArray a("TTrack", 10000); while (TEvent *ev = (TEvent *)next()) { for (int i = 0; i < ev->Ntracks; ++i) { new(a[i]) TTrack(x,y,z,...); ... } ... a.Delete(); }

Pair of new/delete calls cost about 4 μsNN(109) new/deletes will save about 1 hour.

N(100000)

N(10000)

René Brun 103

THashTable - THashList

THashTable puts objects in one of several of its short lists. Which list to pick is defined by TObject::Hash(). Access much faster than map or list, but no order defined. Nothing similar in STL.

THashList consists of a THashTable for fast access plus a TList to define an order of the objects.

René Brun 104

TMap

TMap implements an associative array of (key,value) pairs using a THashTable for efficient retrieval (therefore TMap does not conserve the order of the entries).

ROOT Trees

René Brun 106

Why Trees ?

As seen previously, any object deriving from TObject can be written to a file with an associated key with object.Write()

However each key has an overhead in the directory structure in memory (up to 160 bytes). Thus, Object.Write() is very convenient for simple objects like histograms, but not for saving one object per event.

Using TTrees, the overhead in memory is in general less than 4 bytes per entry!

René Brun 107

Why Trees ?

Extremely efficient write once, read many ("WORM")

Designed to store >109 (HEP events) with same data structure

Load just a subset of the objects to optimize memory usage

Trees allow fast direct and random access to any entry (sequential access is the best)

René Brun 108

Building ROOT Trees

Overview of TreesBranches

5 steps to build a TTree

René Brun 109

Memory <--> TreeEach Node is a branch in the Tree

0123456789101112131415161718

T.Fill()

T.GetEntry(6)

T

Memory

René Brun 110

ROOT I/O -- Split/ClusterTree version

Streamer

File

Branches

Tree in memory

Tree entries

René Brun 111

Writing/Reading a Tree class Event : public Something {

Header fHeader;

std::list<Vertex*> fVertices;

std::vector<Track> fTracks;

TOF fTOF;

Calor *fCalor;

}

main() {

Event *event = 0;

TFile f(“demo.root”,”recreate”);

int split = 99; //maximum split

TTree *T = new TTree(“T”,”demo Tree”);

T->Branch(“event”,&event,split);

for (int ev=0;ev<1000;ev++) {

event = new Event(…);

T->Fill();

delete event;

}

t->AutoSave();

}

main() {

Event *event = 0;

TFile f(“demo.root”);

TTree *T = (TTree*)f.Get”T”);

T->SetBranchAddress(“event”,&event);

Long64_t N = T->GetEntries();

for (Long64_t ev=0;ev<N;ev++) {

T->GetEntry(ev);

// do something with event

}

}

Event.h

Write.C Read.C

René Brun 112

Tree structure

René Brun 113

8 Branches of T

8 leaves of branchElectrons

A double-clickto histogram

the leaf

Browsing a TTree with TBrowser

René Brun 114

The TTreeViewer

René Brun 115

Tree structureBranches: directories Leaves: data containersCan read a subset of all branches – speeds up

considerably the data analysis processesTree layout can be optimized for data analysisThe class for a branch is called TBranchVariables on TBranch are called leaf (yes -

TLeaf)Branches of the same TTree can be written to

separate files

René Brun 116

Five Steps to Build a Tree

Steps:

1. Create a TFile

2. Create a TTree

3. Add TBranch to the TTree

4. Fill the tree

5. Write the file

René Brun 117

Step 1: Create a TFile Object

Trees can be huge need file for swapping filled entries

TFile *hfile = new TFile("AFile.root");

René Brun 118

Step 2: Create a TTree Object

The TTree constructor:Tree name (e.g. "myTree") Tree title

TTree *tree = new TTree("myTree","A Tree");

René Brun 119

Step 3: Adding a Branch

Branch nameAddress of the pointer

to the object

Event *event = new Event(); myTree->Branch("EventBranch", &event);

René Brun 120

Splitting a Branch

Setting the split level (default = 99)

Split level = 0 Split level = 99

tree->Branch("EvBr", &event, 64000, 0 );

René Brun 121

Splitting (real example)

Split level = 0 Split level = 99

René Brun 122

Splitting

Creates one branch per member – recursively Allows to browse objects that are stored in trees,

even without their library Makes same members consecutive, e.g. for object

with position in X, Y, Z, and energy E, all X are consecutive, then come Y, then Z, then E. A lot higher zip efficiency!

Fine grained branches allow fain-grained I/O - read only members that are needed, instead of full object

Supports STL containers, too!

René Brun 123

Performance Considerations

A split branch is:Faster to read - the type does not have to be

read each timeSlower to write due to the large number of

buffers

René Brun 124

Step 4: Fill the Tree

Assign values to the object contained in each branch

TTree::Fill() creates a new entry in the tree: snapshot of values of branches’ objects

Create a for loop

for (int e=0;e<100000;++e) { event->Generate(e); // fill event myTree->Fill(); // fill the tree }

René Brun 125

Step 5: Write Tree To File

myTree->Write();

René Brun 126

Example macro{ Event *myEvent = new Event(); TFile f("mytree.root"); TTree *t = new TTree("myTree","A Tree"); t->Branch("SplitEvent", &myEvent); for (int e=0;e<100000;++e) { myEvent->Generate(); t->Fill(); } t->Write();}

René Brun 127

Reading a TTree

Looking at a treeHow to read a treeTrees, friends and chains

René Brun 128

Looking at the Tree

TTree::Print() shows the data layout

root [] TFile f("AFile.root")root [] myTree->Print();*******************************************************************************Tree :myTree : A ROOT tree **Entries : 10 : Total = 867935 bytes File Size = 390138 ** : : Tree compression factor = 2.72 ********************************************************************************Branch :EventBranch **Entries : 10 : BranchElement (see below) **............................................................................**Br 0 :fUniqueID : **Entries : 10 : Total Size= 698 bytes One basket in memory **Baskets : 0 : Basket Size= 64000 bytes Compression= 1.00 **............................................................................*……

René Brun 129

Looking at the Tree

TTree::Scan("leaf:leaf:….") shows the valuesroot [] myTree->Scan("fNseg:fNtrack"); > scan.txt

root [] myTree->Scan("fEvtHdr.fDate:fNtrack:fPx:fPy","", "colsize=13 precision=3 col=13:7::15.10");

******************************************************************************* Row * Instance * fEvtHdr.fDate * fNtrack * fPx * fPy ******************************************************************************** 0 * 0 * 960312 * 594 * 2.07 * 1.459911346 ** 0 * 1 * 960312 * 594 * 0.903 * -0.4093382061 ** 0 * 2 * 960312 * 594 * 0.696 * 0.3913401663 ** 0 * 3 * 960312 * 594 * -0.638 * 1.244356871 ** 0 * 4 * 960312 * 594 * -0.556 * -0.7361358404 ** 0 * 5 * 960312 * 594 * -1.57 * -0.3049036264 ** 0 * 6 * 960312 * 594 * 0.0425 * -1.006743073 ** 0 * 7 * 960312 * 594 * -0.6 * -1.895804524 *

René Brun 130

TTree Selection Syntax

Prints the first 8 variables of the tree.

Prints all the variables of the tree.Select specific variables:

Prints the values of var1, var2 and var3.A selection can be applied in the second argument:

Prints the values of var1, var2 and var3 for the entries where var1 is exactly 0.

MyTree->Scan();

MyTree->Scan("*");

MyTree->Scan("var1:var2:var3");

MyTree->Scan("var1:var2:var3", "var1==0");

René Brun 131

Looking at the TreeTTree::Show(entry_number) shows the values for

one entry

root [] myTree->Show(0);======> EVENT:0 EventBranch = NULL fUniqueID = 0 fBits = 50331648 [...] fNtrack = 594 fNseg = 5964 [...] fEvtHdr.fRun = 200 [...] fTracks.fPx = 2.066806, 0.903484, 0.695610, -

0.637773,... fTracks.fPy = 1.459911, -0.409338, 0.391340,

1.244357,...

René Brun 132

How To Read a TTree

Example:

1. Open the Tfile

2. Get the TTree

TFile f("tree4.root")

TTree *t4=0;

f.GetObject("t4",t4)

OR

René Brun 133

How to Read a TTree

3. Create a variable pointing to the dataroot [] Event *event = 0;

4. Associate a branch with the variable:root [] t4->SetBranchAddress("event_split", &event);

5. Read one entry in the TTreeroot [] t4->GetEntry(0)

root [] event->GetTracks()->First()->Dump()

==> Dumping object at: 0x0763aad0, name=Track, class=Track

fPx 0.651241 X component of the momentum

fPy 1.02466 Y component of the momentum

fPz 1.2141 Z component of the momentum

[...]

René Brun 134

Example macro

{

Event *ev = 0;

TFile f("mytree.root");

TTree *myTree = (TTree*)f->Get("myTree");

myTree->SetBranchAddress("SplitEvent",&ev);

for (int e=0;e<100000;++e) {

myTree->GetEntry(e);

ev->Analyse();

}

}

René Brun 135

Chains of Trees

A TChain is a collection of Trees.Same semantics for TChains and TTrees

root > .x h1chain.C root > chain.Process(“h1analysis.C”)

{ //creates a TChain to be used by the h1analysis.C class //the symbol H1 must point to a directory where the H1 data sets //have been installed TChain chain("h42"); chain.Add("$H1/dstarmb.root"); chain.Add("$H1/dstarp1a.root"); chain.Add("$H1/dstarp1b.root"); chain.Add("$H1/dstarp2.root");}

René Brun 136

Data Volume & Organisation

100MB 1GB 10GB 1TB100GB 100TB 1PB10TB

1 1 500005000500505

TTree TChain

A TFile typically contains 1 TTreeA TChain is a collection of TTrees or/and TChainsA TChain is typically the result of a query to a file

catalog

René Brun 137

Friends of Trees

- Adding new branches to existing tree without touching it, i.e.:

- Unrestricted access to the friend's data via virtual branch of original tree

myTree->AddFriend("ft1", "friend.root")

René Brun 138

Tree Friends

0123456789101112131415161718

0123456789101112131415161718

0123456789101112131415161718

Public

read

Public

read

User

Write

Entry # 8

René Brun 139

Tree Friends

TFile f1("tree1.root");tree.AddFriend("tree2", "tree2.root")tree.AddFriend("tree3", "tree3.root");tree.Draw("x:a", "k<c");tree.Draw("x:tree2.x", "sqrt(p)<b");

x

Processing timeindependent of thenumber of friendsunlike table joins

in RDBMS

Collaboration-widepublic read

Analysis groupprotected

userprivate

René Brun 140

Summary: Reading TreesTTree is one of the most powerful collections

available for HEPExtremely efficient for huge number of data sets

with identical layoutVery easy to look at TTree - use TBrowser!Write once, read many (WORM) ideal for

experiments' dataStill: extensible, users can add their own tree as

friend

René Brun 141

Analyzing Trees

Selectors, Proxies, PROOF

René Brun 142

Recap

TTree efficient storage and access for huge amounts of structured data

Allows selective access of data

TTree knows its layout

Almost all HEP analyses based on TTree

René Brun 143

TTree Data Access

Data access via TTree / TBranch is complex

Lots of code identical for all TTrees:getting tree from file, setting up branches, entry loop

Lots of code completely defined by TTree:branch names, variable types, branch ↔ variable mapping

Need to enable / disable branches "by hand"

Want common interface to allow generic "TTree based analysis infrastructure"

René Brun 144

TTree Proxy

The solution:write analysis code using branch names

as variablesauto-declare variables from branch

structureread branches on-demand

René Brun 145

Branch vs. Proxy

Take a simple branch:

Access via proxy in pia.C:

class ABranch { int a;};ABranch* b=0;tree->Branch("abranch", &b);

double pia() { return sqrt(abranch.a * TMath::Pi());}

René Brun 146

Proxy Details

Analysis script somename.C must contain somename()

Put #includes into somename.h

Return type must convert to double


René Brun 147

Proxy Advantages

Very efficient: only reads leaf "a"

Can use arbitrary C++

Leaf behaves like a variable

Uses meta-information stored with TTree:branch namestypes contained in branches / leavesand their members (unrolling)


René Brun 148

Simple Proxy Analysis

TTree::Draw() runs simple analysis:

Compiles pia.C

Calls it for each event

Fills histogram named "htemp" with return value

TFile* f = new TFile("tree.root");TTree* tree = 0;f->GetObject("MyTree", tree);tree->Draw("pia.C+");

René Brun 149

Behind The Proxy Scene

TTree::Draw("pia.C+") creates helper class

pia.C gets #included inside generatedSel!

Its data members are named like leaves,wrap access to Tree data

Can also generate Proxy via

generatedSel: public TSelector {#include "pia.C" ...};

tree->MakeProxy("MyProxy", "pia.C")

René Brun 150

TSelector

TSelector is base for event-based analysis:

1. setup

2. analyze an entry

3. draw / save histograms

generatedSel: public TSelector

TSelector::Begin()

TSelector::Process(Long64_t entry)

TSelector::Terminate()

René Brun 151

Extending The ProxyGiven Proxy generated for script.C (like pia.C):looks for and calls

Correspond to TSelector's functions

Can be used to set up complete analysis:fill several histograms,control event loop

void script_Begin(TTree* tree)Bool_t script_Process(Long64_t entry)void script_Terminate()

René Brun 152

Proxy And TClonesArray

TClonesArray as branch:

Cannot loop over entries and return each:

class TGap { float ore; };TClonesArray* b = new TClonesArray("TGap");tree->Branch("gap", &b);

double singapore() { return sin(gap.ore[???]);}

René Brun 153

Proxy And TClonesArray

TClonesArray as branch:

Implement it as part of pia_Process() instead:

class TGap { float ore; };TClonesArray* b = new TClonesArray("TGap");tree->Branch("gap", &b);

Bool_t pia_Process(Long64_t entry) { for (int i=0; i<gap.GetEntries(); ++i) hSingapore->Fill(sin(gap.ore[i])); return kTRUE;}

René Brun 154

Extending The Proxy, Example

Need to declare hSingapore somewhere!

But pia.C is #included inside generatedSel, so:

TH1* hSingapore;void pia_Begin() { hSingapore = new TH1F(...);}Bool_t pia_Process(Long64_t entry) { hSingapore->Fill(...);}void pia_Terminate() { hSingapore->Draw();}

ROOT on the GRID(s)

Many ways to use ROOT on a GRIDSingle laptop

Multi-Core desktopLocal area cluster

Cluster on a GRID centerClusters on GRID(s)

René Brun 156

GRID: the ideaExploit resources available on the Net: You

take and you give.Not a new idea: Seti, Condor, etcBUT:

Can we make it transparent?What type of work?

Problem is more complex than a simple CPU distribution taskWAN data accessData managementAuthentication, time & space sharingDisk and CPU quotas

René Brun 157

The HEP GRID100,000 computers

in 1000 locations5,000 physicists

in 1000 locations

WAN

René Brun 158

GRID: Users profile

Few big users submittingmany long jobs (Monte Carlo,

reconstruction)They want to run many jobs in

one month

Many users submittingmany short jobs (physics

analysis)They want to run many jobs in one hour or less

René Brun 159

Big Users

Monte Carlo jobs (one hour one day)Each job generates one file (1 GigaByte)

Reconstruction job (10 minutes -> one hour) Input from the MC job or copied from a storage

centerOutput (< input) is staged back to a storage center

Success rate (90%). If the job fails you resubmit it.

For several years, GRID projects focused effort on big users.

René Brun 160

Small Users Scenario 1: submit one batch job to the GRID. It runs

somewhere with varying response times. Scenario 2: Use a splitter to submit many batch jobs to

process many data sets (eg CRAB, Ganga, Alien). Output data sets are merged automatically. Success rate < 90%. You see the final results only when the last job has been received and all results merged.

Scenario 3: Use PROOF (automatic splitter and merger). Success rate close to 100%. You can see intermediate feedback objects like histograms. You run from an interactive ROOT session.

René Brun 161

Scenario 1 & 2: PROS

Job level parallelism. Conventional model. Nothing to change in user program.Initialisation phaseLoop on eventsTermination

Same job can run on laptop or on a GRID job.

René Brun 162

Scenario 1 & 2: CONS(1)

Long tail in the jobs wall clock time distribution.

René Brun 163

Scenario 1 & 2: CONS(2)

Can only merge output after a timecut.More data movement (input & output)Cannot have interactive feedbackTwo consecutive queries will produce

different results (problem with rare events)

Will use only one core on a multicore laptop or GRID node.

Hard to control priorities and quotas.

René Brun 164

Scenario 3: PROS

Predictive response. Event level parallelism. Workers terminate at the same time

Process moved to data as much as possible, otherwise use network.

Interactive feedbackCan view running queries in different

ROOT sessions.Can take advantage of multicore cpus

René Brun 165

Scenario 3: CONS

Event level parallelism. User code must follow a template: the TSelector API.

Good for a local area cluster. More difficult to put in place in a GRID collection of local area clusters.

Interactive schedulers, priority managers must be put in place.

Debugging a problem slightly more difficult.

René Brun 166

GRID & Parallelism: 1

The user application splits the problem in N subtasks. Each task is submitted to a GRID node (minimal input, minimal output).

The GRID task can run synchronously or asynchronously. If the task fails or time-out, it can be resubmitted to another node.

One of the first and simplest use of the GRID, but not many applications in HEP.

René Brun 167

GRID & Parallelism: 2

The typical case of Monte Carlo or reconstruction in HEP.

It requires massive data transfers between the main computing centers.

This activity has concentrated so far a very large fraction of the GRID projects and budgets.

It has been an essential step to foster coordination between hundreds of sites, improve international network bandwidths and robustness.

René Brun 168

GRID & Parallelism: 3Distributed data analysis will be a major

challenge for the coming experiments.This is the area with thousands of people

running many different styles of queries, most of the time in a chaotic way.

The main challenges:Access to millions of data sets (eg 500 TeraBytes)Best match between execution and data locationDistributing/compiling/linking users code(a few

thousand lines) with experiment large libraries (a few million lines of code).

Simplicity of useReal time responseRobustness.

René Brun 169

GRID & Parallelism: 3a

Currently 2 different & competing directions for distributed data analysis.Batch solution using the existing GRID

infrastructure for Monte Carlo and reconstruction programs. A front-end program partitions the problem to analyze ND data sets on NP processors.

Interactive solution PROOF. Each query is parallelized with an optimum match of execution and data location.

René Brun 170

Parallelism: Where ?

Multi-Core CPU laptop/desktop

2(2007) 32(2012?)

Network of desktops

Local Cluster

with multi-core CPUs

GRID(s)

Documents

Introduction to the ROOT framework root.cern.ch