Statistical Data Analysis in Nuclear and Subnuclear Physics [Laboratory]
Gabriele Sirri, Istituto Nazionale di Fisica Nucleare
This is a collection of the slides shown in the 2015-2016 academic year …
… continuously being updated and reworked.
Credits: RooFit slides extracted or adapted from original presentations by Wouter Verkerke.
Terminology
Data modelling
What’s it all about?
Estimators
The maximum likelihood estimator
Data modelling – analysis examples
• Typical questions
– We obtained a mass distribution from data
– what are the signal and background(s) yields?
– what is the significance of the signal?
• Typical tasks
– Creation of an adequate model for the data
– Description of detector effects such as acceptances and resolutions
– Make sure the model is correctly implemented - toy Monte Carlo studies
– Fit the model to the data
– Graphical representation of the data and fit results
Intermezzo – Functions vs probability density functions
• Why use probability density functions rather than ‘plain’ functions to describe your data?
– Easier to interpret your models. If the Blue and Green pdfs are each guaranteed to be normalized to 1, then the fractions of Blue and Green can be cleanly interpreted as fractions of the total number of events
– Many statistical techniques only function properly with PDFs (e.g. maximum likelihood)
– Can sample ‘toy Monte Carlo’ events from a p.d.f. because its value is always guaranteed to be ≥ 0
• So why isn’t everybody always using them?
– The normalization can be hard to calculate (e.g. it can be different for each set of parameter values p)
– In >1 dimension (numeric) integration can be particularly hard
– RooFit aims to simplify these tasks
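To make the normalization issue concrete, here is a minimal plain C++ sketch (no ROOT; the shape exp(−a·x) on [0, 10] and all function names are illustrative assumptions, not RooFit API). A ‘plain’ function becomes a p.d.f. by dividing it by its integral over the observable range, computed here with a composite Simpson rule. Because the integral depends on the parameter a, it has to be recomputed whenever a changes.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Composite Simpson rule on [lo, hi] with n (even) sub-intervals.
double simpson(const std::function<double(double)>& f, double lo, double hi, int n = 1000) {
  assert(hi > lo);
  if (n % 2 != 0) ++n;  // Simpson's rule needs an even number of intervals
  const double h = (hi - lo) / n;
  double sum = f(lo) + f(hi);
  for (int i = 1; i < n; ++i) {
    sum += f(lo + i * h) * (i % 2 == 1 ? 4.0 : 2.0);
  }
  return sum * h / 3.0;
}

// A 'plain' function of x with parameter a (hypothetical example): exp(-a*x).
double plainFunction(double x, double a) { return std::exp(-a * x); }

// Normalization integral of plainFunction over [xmin, xmax]; it depends on a.
double normalization(double a, double xmin, double xmax) {
  return simpson([a](double x) { return plainFunction(x, a); }, xmin, xmax);
}

// The same function turned into a p.d.f. on [xmin, xmax].
// The normalization is recomputed on every call (kept simple on purpose).
double pdfValue(double x, double a, double xmin, double xmax) {
  return plainFunction(x, a) / normalization(a, xmin, xmax);
}
```

For a = 2 on [0, 10] the normalization is about 0.5, while for a = 1 it is about 1; integrating pdfValue over [0, 10] gives 1 for any a.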
Estimators
• An estimator is a procedure giving a value for a parameter or a property of a distribution as a function of the actual data values, e.g.
$\hat{\mu}(x) = \frac{1}{N}\sum_i x_i$ (estimator of the mean)
$\hat{V}(x) = \frac{1}{N}\sum_i (x_i - \hat{\mu})^2$ (estimator of the variance)
• A perfect estimator is
– Consistent: $\lim_{n\to\infty} \hat{a}_n = a$
– Unbiased: with finite statistics you get the right answer on average, $\langle \hat{a} \rangle = a$
– Efficient: $V(\hat{a}) = \langle (\hat{a} - \langle \hat{a} \rangle)^2 \rangle$ is minimal. The smallest achievable variance is called the Minimum Variance Bound (MVB)
– There are no perfect estimators for real-life problems
Note: the Cramér-Rao theorem says there is a limit to the accuracy of an estimator, i.e. there is a lower bound (the MVB) on its variance. Estimators are called efficient if V(estimator) = MVB.
The Likelihood estimator
• Definition of Likelihood
– given data D(x) = {x_i} and a p.d.f. F(x;p):
$L(p) = \prod_{i \in D} F(x_i;p)$, i.e. $L(p) = F(x_0;p)\cdot F(x_1;p)\cdot F(x_2;p)\cdots$
– For convenience the negative log of the likelihood is often used:
$-\ln L(p) = -\sum_{i \in D} \ln F(x_i;p)$
• Parameters are estimated by maximizing the likelihood, or equivalently minimizing –log(L):
$\left.\frac{d \ln L(p)}{dp_i}\right|_{p_i = \hat{p}_i} = 0$
Functions used in likelihoods must be Probability Density Functions:
$\int F(x;p)\,dx = 1, \quad F(x;p) \ge 0$
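A minimal sketch of this definition in plain C++ (no RooFit), assuming a Gaussian p.d.f. F(x; μ, σ) with σ known and only μ floating; the dataset and function names are illustrative. It builds −ln L and minimizes it with a golden-section search, a simple 1D minimizer used here in place of MINUIT. For this model the minimum must coincide with the sample mean.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Gaussian p.d.f. F(x; mu, sigma), normalized over the real line.
double gaussPdf(double x, double mu, double sigma) {
  const double pi = 3.14159265358979323846;
  const double z = (x - mu) / sigma;
  return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * pi));
}

// Negative log-likelihood: -ln L(mu) = -sum_i ln F(x_i; mu, sigma).
double nll(const std::vector<double>& data, double mu, double sigma) {
  assert(sigma > 0.0);
  double sum = 0.0;
  for (double x : data) sum -= std::log(gaussPdf(x, mu, sigma));
  return sum;
}

// Golden-section search for the minimum of nll(mu) in [lo, hi].
double fitMean(const std::vector<double>& data, double sigma, double lo, double hi) {
  const double invPhi = (std::sqrt(5.0) - 1.0) / 2.0;  // 1 / golden ratio
  double a = lo, b = hi;
  double c = b - invPhi * (b - a), d = a + invPhi * (b - a);
  while (b - a > 1e-9) {
    if (nll(data, c, sigma) < nll(data, d, sigma)) b = d; else a = c;
    c = b - invPhi * (b - a);
    d = a + invPhi * (b - a);
  }
  return 0.5 * (a + b);
}
```

For example, for the data {1.2, −0.4, 2.5, 0.3, 1.9} and σ = 1, fitMean(data, 1.0, −10.0, 10.0) returns 1.1, the sample mean, within the search tolerance.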
Maximum Likelihood – Variance on ML parameter estimates
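• Estimator for the parameter variance is
$\hat{\sigma}_p^2 = \hat{V}(\hat{p}) = \left(-\frac{d^2 \ln L}{dp^2}\right)^{-1}_{p=\hat{p}}$
– I.e. the variance is estimated from the 2nd derivative of –log(L) at the minimum
– Valid if the estimator is efficient and unbiased!
– From the Rao-Cramér-Fréchet inequality
$V(\hat{p}) \ge \frac{\left(1 + \frac{db}{dp}\right)^2}{\left\langle -\frac{d^2 \ln L}{dp^2} \right\rangle}$
where b = bias as a function of p; the inequality becomes an equality in the limit of an efficient estimator
• Visual interpretation of the variance estimate
– Taylor expand –log(L) around the minimum (the first derivative vanishes there):
$\ln L(p) = \ln L(\hat{p}) + \left.\frac{d \ln L}{dp}\right|_{\hat{p}} (p - \hat{p}) + \frac{1}{2}\left.\frac{d^2 \ln L}{dp^2}\right|_{\hat{p}} (p - \hat{p})^2 + \dots = \ln L_{max} - \frac{(p - \hat{p})^2}{2\hat{\sigma}_p^2} + \dots$
$\ln L(\hat{p} \pm \hat{\sigma}_p) \approx \ln L_{max} - \frac{1}{2}$
[Figure: –log(L) versus p, parabolic near the minimum at p̂; the ±1σ interval is where –log(L) rises by 0.5 above its minimum]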
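A numerical check of both statements, again as a plain C++ sketch assuming a Gaussian p.d.f. with known σ and floating mean μ (names are illustrative). The error from the second derivative, estimated with a central finite difference, is compared with the half-width of the interval where −ln L rises by 0.5 above its minimum. For this model −ln L is exactly parabolic in μ, so both give σ/√N.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// -ln L(mu) for a Gaussian with known sigma, dropping mu-independent constants.
double nllMu(const std::vector<double>& data, double mu, double sigma) {
  double sum = 0.0;
  for (double x : data) sum += (x - mu) * (x - mu) / (2.0 * sigma * sigma);
  return sum;
}

double sampleMean(const std::vector<double>& data) {
  assert(!data.empty());
  double s = 0.0;
  for (double x : data) s += x;
  return s / data.size();
}

// Error from the 2nd derivative: (d^2(-ln L)/dmu^2)^(-1/2) at the minimum,
// with the derivative estimated by a central finite difference of step h.
double hesseError(const std::vector<double>& data, double muHat, double sigma, double h = 1e-3) {
  const double d2 = (nllMu(data, muHat + h, sigma) - 2.0 * nllMu(data, muHat, sigma) +
                     nllMu(data, muHat - h, sigma)) / (h * h);
  return 1.0 / std::sqrt(d2);
}

// Upper error from the Delta(-ln L) = 0.5 rule, found by bisection on [muHat, muHat + range].
double deltaNllError(const std::vector<double>& data, double muHat, double sigma, double range = 10.0) {
  const double target = nllMu(data, muHat, sigma) + 0.5;
  double lo = muHat, hi = muHat + range;
  for (int i = 0; i < 200; ++i) {
    const double mid = 0.5 * (lo + hi);
    if (nllMu(data, mid, sigma) < target) lo = mid; else hi = mid;
  }
  return 0.5 * (lo + hi) - muHat;
}
```

For the five values {1.2, −0.4, 2.5, 0.3, 1.9} with σ = 1, both functions return about 0.447 = 1/√5.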
Maximum Likelihood – Properties of MLEs
• In general, Maximum Likelihood Estimators are
– Consistent (give the right answer for N → ∞)
– Mostly unbiased (bias ∝ 1/N, may need to worry at small N)
– Efficient for large N (you get the smallest possible error)
– Invariant: a transformation of parameters will not change your answer, e.g. $\widehat{p^2} = (\hat{p})^2$
• Use of the 2nd derivative of –log(L) for the variance estimate is usually OK
• MLE efficiency theorem: the MLE will be unbiased and efficient if an unbiased efficient estimator exists
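The 1/N bias can be seen with a small toy Monte Carlo study. Plain C++ sketch (not RooFit; the function name, seed and toy settings are illustrative assumptions): for Gaussian samples of size N, the ML estimator of the variance, V̂ = (1/N)Σ(x_i − μ̂)², averages to (N−1)/N·σ², while dividing by N−1 instead removes the bias.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct BiasResult {
  double meanMLVariance;        // average of (1/N) * sum (x - mean)^2 over toys
  double meanUnbiasedVariance;  // average of 1/(N-1) * sum (x - mean)^2 over toys
};

// Generate nToys samples of size n from N(0, sigma^2) and average both variance estimators.
BiasResult varianceBiasToys(int n, double sigma, int nToys, unsigned seed = 12345) {
  assert(n > 1 && nToys > 0);
  std::mt19937 rng(seed);
  std::normal_distribution<double> gauss(0.0, sigma);
  double sumML = 0.0, sumUnbiased = 0.0;
  std::vector<double> sample(n);
  for (int t = 0; t < nToys; ++t) {
    double mean = 0.0;
    for (double& x : sample) { x = gauss(rng); mean += x; }
    mean /= n;
    double ss = 0.0;
    for (double x : sample) ss += (x - mean) * (x - mean);
    sumML += ss / n;              // ML estimator (biased)
    sumUnbiased += ss / (n - 1);  // unbiased estimator
  }
  return {sumML / nToys, sumUnbiased / nToys};
}
```

With N = 5 the ML average comes out close to 0.8σ², the (N−1)/N factor, and the N−1 version close to σ²; the bias, −σ²/N, disappears as N grows.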
Maximum Likelihood – Extended ML
• The standard maximum likelihood only parameterizes the shape of the distribution
$L(p) = \prod_{i \in D} F(x_i;p)$, i.e. $L(p) = F(x_0;p)\cdot F(x_1;p)\cdot F(x_2;p)\cdots$
– I.e. one can determine the fraction of signal events from an ML fit, but not the number of signal events
• Extended Maximum Likelihood adds an extra term
$\ln L(p) = \sum_{i \in D} \ln F(x_i;p) - N_{exp} + N_{obs} \ln N_{exp}$
where the last two terms are the log of Poisson(N_exp, N_obs) (modulo a constant)
– A clever choice of parameters allows us to extract N_sig and N_bkg in one pass (N_exp = N_sig + N_bkg, f_sig = N_sig/(N_sig + N_bkg))
1. Introduction & Overview
Introduction – Focus: coding a probability density function
• Focus on one practical aspect of many data analyses in HEP: how do you formulate your p.d.f. in ROOT?
– For ‘simple’ problems (Gaussian, polynomial) this is easy
– But if you want to do unbinned ML fits, use non-trivial functions, or work with multidimensional functions, you quickly find that you need some tools to help you
Introduction – Why RooFit was developed
• BaBar experiment at SLAC: extract sin(2β) from time-dependent CP violation of B decays: e⁺e⁻ → Υ(4S) → BB̄
– Reconstruct both Bs, measure the decay time difference
– Physics of interest is in the decay-time-dependent oscillation
$f_{sig}\,\mathrm{SigSel}(m;\vec{p}_{sig})\left[\mathrm{SigDecay}(t;\vec{q}_{sig},\sin 2\beta) \otimes \mathrm{SigResol}(t|dt;\vec{r}_{sig})\right] + (1 - f_{sig})\,\mathrm{BkgSel}(m;\vec{p}_{bkg})\left[\mathrm{BkgDecay}(t;\vec{q}_{bkg}) \otimes \mathrm{BkgResol}(t|dt;\vec{r}_{bkg})\right]$
• Many issues arise
– Standard ROOT function framework clearly insufficient to handle such complicated functions → must develop a new framework
– Normalization of p.d.f. not always trivial to calculate → may need numeric integration techniques
– Unbinned fit, >2 dimensions, many events → computation performance important → must try to optimize code for acceptable performance
– Simultaneous fit to control samples to account for detector performance
Introduction – Relation to ROOT
[Diagram: ROOT provides the C++ command line interface & macros, data management & histogramming, graphics interface, I/O support and MINUIT; RooFit adds data modeling, toy MC data generation, data/model fitting and model visualization]
• Extension to ROOT – (almost) no overlap with existing functionality
RooFit core design philosophy
• Mathematical objects are represented as C++ objects (mathematical concept → RooFit class)
– variable x → RooRealVar
– function f(x) → RooAbsReal
– PDF f(x) → RooAbsPdf
– space point x⃗ → RooArgSet
– integral $\int_{x_{min}}^{x_{max}} f(x)\,dx$ → RooRealIntegral
– list of space points → RooAbsData
RooFit core design philosophy
• Represent relations between variables and functions as client/server links between objects
Math: f(x,y,z)
RooFit diagram: RooAbsReal f, linked to RooRealVar x, RooRealVar y and RooRealVar z
RooFit code:
RooRealVar x("x","x",5) ;
RooRealVar y("y","y",5) ;
RooRealVar z("z","z",5) ;
RooBogusFunction f("f","f",x,y,z) ;
Object-oriented data modeling
• In RooFit every variable, data point, function and PDF is represented by a C++ object
– Objects are classified by the data/function type they represent, not by their role in a particular setup
– All objects are self-documenting
• Name – unique identifier of the object
• Title – more elaborate description of the object
RooRealVar mass("mass","Invariant mass",5.20,5.30) ;       // object representing a 'real' value, with initial range
RooRealVar width("width","B0 mass width",0.00027,"GeV") ;   // initial value and optional unit
RooRealVar mb0("mb0","B0 mass",5.2794,"GeV") ;
RooGaussian b0sig("b0sig","B0 sig PDF",mass,mb0,width) ;    // PDF object, holding references to the variables
2. Basic use
The simplest possible example
RooRealVar x("x","Observable",-10,10) ;                  // object representing a 'real' value, with initial range
RooRealVar mean("mean","Mean of Gaussian",0,-10,10) ;     // initial value and range
RooRealVar sigma("sigma","Width of Gaussian",3,0.1,10) ;
RooGaussian model("model","signal pdf",x,mean,sigma) ;   // PDF object: name, title, references to variables
• We make a Gaussian p.d.f. with three variables: x, mean and sigma
Basics – Creating and plotting a Gaussian p.d.f
// Create an empty plot frame
RooPlot* xframe = w::x.frame() ;
// Plot model on frame
model.plotOn(xframe) ;
// Draw frame on canvas
xframe->Draw() ;
• Setup Gaussian PDF and plot
– A RooPlot is an empty frame capable of holding anything plotted versus its variable
– Plot range taken from limits of x
– Axis label from gauss title
– Unit normalization
Basics – Generating toy MC events
// Generate an unbinned toy MC set
RooDataSet* data = w::gauss.generate(w::x,10000) ;
// Generate a binned toy MC set
RooDataHist* hdata = w::gauss.generateBinned(w::x,10000) ;
// Plot the unbinned toy data
RooPlot* xframe = w::x.frame() ;
data->plotOn(xframe) ;
xframe->Draw() ;
Generate 10000 events from Gaussian p.d.f. and show distribution
Can generate both binned and unbinned datasets
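Because a p.d.f. is non-negative (and, in this example, bounded on the observable range), toy events can be drawn with the accept/reject (‘hit-or-miss’) method. A minimal plain C++ sketch, assuming a Gaussian shape with mean 0 and width 3 on x ∈ [−10, 10] whose maximum value is known; names are illustrative, and this shows the principle only, not how RooFit’s generator is implemented.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>
#include <vector>

// Accept/reject sampling: draw x uniformly in [xmin, xmax] and keep it with
// probability f(x)/fmax. f need not be normalized, only non-negative and <= fmax.
std::vector<double> generateAcceptReject(const std::function<double(double)>& f, double fmax,
                                         double xmin, double xmax, int nEvents,
                                         unsigned seed = 42) {
  assert(fmax > 0.0 && xmax > xmin && nEvents >= 0);
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> ux(xmin, xmax), uy(0.0, fmax);
  std::vector<double> events;
  events.reserve(nEvents);
  while (static_cast<int>(events.size()) < nEvents) {
    const double x = ux(rng);
    if (uy(rng) < f(x)) events.push_back(x);  // accepted with probability f(x)/fmax
  }
  return events;
}

// Unnormalized Gaussian shape with mean 0 and width 3 (maximum value 1 at x = 0).
double gaussShape(double x) { return std::exp(-0.5 * (x / 3.0) * (x / 3.0)); }
```

The efficiency of the method is the area under f divided by the box area fmax·(xmax − xmin), so a tight upper bound fmax keeps generation fast.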
Basics – Importing data
// Import unbinned data
RooDataSet data("data","data",w::x,Import(*myTree)) ;
// Import binned data
RooDataHist hdata("hdata","hdata",w::x,Import(*myTH1)) ;
• Unbinned data can also be imported from ROOT TTrees
– Imports the TTree branch named "x"
– Can be of type Double_t, Float_t, Int_t or UInt_t. All data is converted to Double_t internally
– Specify a RooArgSet of observables to import multiple observables
• Binned data can be imported from ROOT THx histograms
– Imports values, binning definition and SumW2 errors (if defined)
– Specify a RooArgList of observables when importing a TH2/3
Basics – ML fit of p.d.f to unbinned data
// ML fit of gauss to data
w::gauss.fitTo(*data) ;
(MINUIT printout omitted)
// Parameters of gauss now
// reflect fitted values
w::mean.Print() ;
RooRealVar::mean = 0.0172335 +/- 0.0299542
w::sigma.Print() ;
RooRealVar::sigma = 2.98094 +/- 0.0217306
// Plot fitted PDF and toy data overlaid
RooPlot* xframe = w::x.frame() ;
data->plotOn(xframe) ;
w::gauss.plotOn(xframe) ;
PDF automatically normalized to dataset
Basics – ML fit of p.d.f to unbinned data
RooFitResult* r = w::gauss.fitTo(*data,Save()) ;
r->Print() ;
RooFitResult: minimized FCN value: 25055.6,
estimated distance to minimum: 7.27598e-08
coviarance matrix quality:
Full, accurate covariance matrix
Floating Parameter FinalValue +/- Error
-------------------- --------------------------
mean 1.7233e-02 +/- 3.00e-02
sigma 2.9809e+00 +/- 2.17e-02
r->correlationMatrix().Print() ;
2x2 matrix is as follows
| 0 | 1 |
-------------------------------
0 | 1 0.0005869
1 | 0.0005869 1
• Can also choose to save full detail of fit
Basics – Integrals over p.d.f.s
• It is easy to create an object representing the integral over a normalized p.d.f. in a sub-range
w::x.setRange("sig",-3,7) ;
RooAbsReal* ig = w::gauss.createIntegral(w::x,NormSet(w::x),Range("sig")) ;
cout << ig->getVal() ;
0.832519
w::mean.setVal(-1) ;
cout << ig->getVal() ;
0.743677
• Similarly, one can also request the cumulative distribution function
$C(x) = \int_{x_{min}}^{x} F(x')\,dx'$
RooAbsReal* cdf = w::gauss.createCdf(w::x) ;
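For a Gaussian both quantities have a closed form in terms of the error function. A plain C++ sketch using std::erf, assuming the p.d.f. is normalized over the full observable range x ∈ [−10, 10] and using illustrative function names (not RooFit API). As in the example above, the integral is a function of the parameters: shifting μ changes its value.

```cpp
#include <cassert>
#include <cmath>

// Standard normal cumulative distribution Phi(z).
double phi(double z) { return 0.5 * (1.0 + std::erf(z / std::sqrt(2.0))); }

// Integral over [a, b] of a Gaussian(mu, sigma) that is normalized on the observable range [xmin, xmax].
double gaussIntegral(double a, double b, double mu, double sigma, double xmin, double xmax) {
  const double norm = phi((xmax - mu) / sigma) - phi((xmin - mu) / sigma);
  return (phi((b - mu) / sigma) - phi((a - mu) / sigma)) / norm;
}

// Cumulative distribution C(x) = integral of the normalized p.d.f. from xmin to x.
double gaussCdf(double x, double mu, double sigma, double xmin, double xmax) {
  return gaussIntegral(xmin, x, mu, sigma, xmin, xmax);
}
```

With σ = 3, gaussIntegral(−3, 7, μ, 3, −10, 10) drops when μ moves from 0 to −1, and gaussCdf reaches 1 at the upper end of the range.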
RooFit core design philosophy - Workspace
• The workspace serves as a container class for all objects created
Math: f(x,y,z)
RooFit diagram: RooAbsReal f, linked to RooRealVar x, RooRealVar y and RooRealVar z, all held in a RooWorkspace
RooFit code:
RooRealVar x("x","x",5) ;
RooRealVar y("y","y",5) ;
RooRealVar z("z","z",5) ;
RooBogusFunction f("f","f",x,y,z) ;
RooWorkspace w("w") ;
w.import(f) ;
Using the workspace
• Workspace
– A generic container class for all RooFit objects of your project
– Helps to organize analysis projects
• Creating a workspace
RooWorkspace w("w") ;
• Putting variables and functions into a workspace
– When importing a function or pdf, all its components (variables) are automatically imported too
RooRealVar x("x","x",-10,10) ;
RooRealVar mean("mean","mean",5) ;
RooRealVar sigma("sigma","sigma",3) ;
RooGaussian f("f","f",x,mean,sigma) ;
// imports f, x, mean and sigma
w.import(f) ;
Using the workspace
• Looking into a workspace
w.Print() ;
variables
---------
(mean,sigma,x)
p.d.f.s
-------
RooGaussian::f[ x=x mean=mean sigma=sigma ] = 0.249352
• Getting variables and functions out of a workspace
// Variety of accessors available
RooPlot* frame = w.var("x")->frame() ;
w.pdf("f")->plotOn(frame) ;
Using the workspace
• Alternative access to contents through namespace
– Uses CINT extension of C++, works in interpreted code only
// Export the workspace contents to a CINT namespace named after the workspace
w.exportToCint() ;
RooPlot* frame = w::x.frame() ;
w::f.plotOn(frame) ;
• Writing workspace and contents to file
w.writeToFile("wspace.root") ;
Using the workspace
• Organizing your code – separate construction and use of models
void makeModel(RooWorkspace& w) {
  // Construct model here
}
void useModel(RooWorkspace& w) {
  // Make fit, plots etc here
}
void driver() {
  RooWorkspace w("w") ;
  makeModel(w) ;
  useModel(w) ;
}
RooFit core design philosophy - Factory
• The factory allows one to fill a workspace with pdfs and variables using a simplified scripting language
Math: f(x,y,z)
RooFit diagram: RooAbsReal f, linked to RooRealVar x, RooRealVar y and RooRealVar z, all held in a RooWorkspace
RooFit code:
RooWorkspace w("w") ;
w.factory("BogusFunction::f(x[5],y[5],z[5])") ;
Factory and Workspace
• One C++ object per math symbol provides the ultimate level of control over each object’s functionality, but results in lengthy user code for even simple macros
• Solution: add a factory that auto-generates objects from a math-like language. Accessed through the factory() method of the workspace
• Example: reduce the construction of a Gaussian pdf and its parameters from 4 lines to 1 line of code
w.factory("Gaussian::f(x[-10,10],mean[5],sigma[3])") ;
replaces
RooRealVar x("x","x",-10,10) ;
RooRealVar mean("mean","mean",5) ;
RooRealVar sigma("sigma","sigma",3) ;
RooGaussian f("f","f",x,mean,sigma) ;
Factory language – Goal and scope
• Aim of factory language is to be very simple.
• The goal is to construct pdfs, functions and variables
– This limits the scope of the factory language (and allows it to be kept simple)
– Objects can be customized after creation
• The language syntax has only three elements
1. Simplified expression for creation of variables
2. Expression for creation of functions and pdfs is a trivial 1-to-1 mapping of the C++ constructor syntax of the corresponding object
3. Multiple objects (e.g. a pdf and its variables) can be nested in a single expression
• Operator classes (sum,product) provide alternate syntax in factory that is closer to math notation
Factory syntax
• Rule #1 – Create a variable
x[-10,10] // Create variable with given range
x[5,-10,10] // Create variable with initial value and range
x[5] // Create initially constant variable
• Rule #2 – Create a function or pdf object
ClassName::Objectname(arg1,[arg2],...)
– Leading ‘Roo’ in class name can be omitted
– Arguments are names of objects that already exist in the workspace
– Named objects must be of the correct type; if not, the factory issues an error
– Set and List arguments can be constructed with brackets {}
Gaussian::g(x,mean,sigma) → RooGaussian("g","g",x,mean,sigma)
Polynomial::p(x,{a0,a1}) → RooPolynomial("p","p",x,RooArgList(a0,a1))
Factory syntax
• Rule #3 – Each creation expression returns the name of the object created
– Allows input arguments to functions to be created ‘in place’ rather than in advance
Gaussian::g(x[-10,10],mean[-10,10],sigma[3])
is equivalent to
x[-10,10]
mean[-10,10]
sigma[3]
Gaussian::g(x,mean,sigma)
• Miscellaneous points
– You can always use numeric literals where values or functions are expected
Gaussian::g(x[-10,10],0,3)
– It is not required to give component objects a name, e.g.
SUM::model(0.5*Gaussian(x[-10,10],0,3),Uniform(x)) ;
Model building – (Re)using standard components
• RooFit provides a collection of compiled standard PDF classes
[Diagram: example classes RooGaussian, RooPolynomial, RooArgusBG, RooBMixDecay, RooHistPdf]
– Basic: Gaussian, Exponential, Polynomial, …, Chebychev polynomial
– Physics inspired: ARGUS, Crystal Ball, Breit-Wigner, Voigtian, B/D-Decay, …
– Non-parametric: Histogram, KEYS
• Easy to extend the library: each p.d.f. is a separate C++ class
Model building – (Re)using standard components
• List of most frequently used pdfs and their factory spec
Gaussian Gaussian::g(x,mean,sigma)
Breit-Wigner BreitWigner::bw(x,mean,gamma)
Landau Landau::l(x,mean,sigma)
Exponential Exponential::e(x,alpha)
Polynomial Polynomial::p(x,{a0,a1,a2})
Chebychev Chebychev::p(x,{a0,a1,a2})
Kernel Estimation KeysPdf::k(x,dataSet)
Poisson Poisson::p(x,mu)
Voigtian Voigtian::v(x,mean,gamma,sigma) (=BW⊗G)
Model building – Making your own
• Interpreted expressions
w.factory("EXPR::mypdf('sqrt(a*x)+b',x,a,b)") ;
• Customized class, compiled and linked on the fly
w.factory("CEXPR::mypdf('sqrt(a*x)+b',x,a,b)") ;
• Custom class written by you
– Offers the option of providing analytical integrals and custom handling of toy MC generation (details in the RooFit Manual)
• Compiled classes are faster in use, but require O(1-2) seconds of startup overhead
– The best choice depends on the context of use
Model building – Adjusting parameterization
• RooFit pdf classes do not require their parameter arguments to be variables; one can plug in functions as well
• The simplest tool to perform a reparameterization is an interpreted formula expression
w.factory("expr::w('(1-D)/2',D[0,1])") ;
– Note lower case: expr builds a function, EXPR builds a pdf
• Example: reparameterize a pdf that expects the mistag rate in terms of the dilution
w.factory("BMixDecay::bmix(t,mixState,tagFlav,
tau,expr('(1-D)/2',D[0,1]),dw,....") ;
3. Composite models
Model building – (Re)using standard components
• Most realistic models are constructed as the sum of one or more p.d.f.s (e.g. signal and background)
• Facilitated through the operator p.d.f. RooAddPdf
[Diagram: standard components (RooGaussian, RooPolynomial, RooArgusBG, RooBMixDecay, RooHistPdf) combined with “+” through RooAddPdf]
Adding p.d.f.s – Mathematical side
• From a math point of view adding p.d.f.s is simple
– Two components F, G:
$S(x) = f\,F(x) + (1 - f)\,G(x)$
– Generically for n+1 components P_0 … P_n:
$S(x) = c_0 P_0(x) + c_1 P_1(x) + \dots + c_{n-1} P_{n-1}(x) + \left(1 - \sum_{i=0}^{n-1} c_i\right) P_n(x)$
• For N p.d.f.s, there are N−1 fraction coefficients that should sum to less than 1
– The remainder is by construction 1 minus the sum of all other coefficients
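A quick numerical check that N−1 fractions suffice: if every component is normalized and the last coefficient is 1 minus the sum of the others, the sum is automatically normalized. Plain C++ sketch with illustrative names, assuming three components on x ∈ [0, 10] (two Gaussians and a flat background, chosen only for the example) integrated with a midpoint rule.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Pdf = std::function<double(double)>;

// S(x) = sum_{i<n} c_i P_i(x) + (1 - sum_{i<n} c_i) P_n(x); coefs has one entry fewer than pdfs.
double addPdf(double x, const std::vector<Pdf>& pdfs, const std::vector<double>& coefs) {
  assert(pdfs.size() == coefs.size() + 1);
  double sumCoef = 0.0, value = 0.0;
  for (std::size_t i = 0; i < coefs.size(); ++i) {
    value += coefs[i] * pdfs[i](x);
    sumCoef += coefs[i];
  }
  return value + (1.0 - sumCoef) * pdfs.back()(x);
}

// Midpoint-rule integral of f over [lo, hi].
double integrate(const Pdf& f, double lo, double hi, int n = 100000) {
  const double h = (hi - lo) / n;
  double sum = 0.0;
  for (int i = 0; i < n; ++i) sum += f(lo + (i + 0.5) * h);
  return sum * h;
}

// Gaussian normalized on [lo, hi] (normalization computed numerically).
Pdf normalizedGauss(double mu, double sigma, double lo, double hi) {
  Pdf shape = [mu, sigma](double x) { return std::exp(-0.5 * std::pow((x - mu) / sigma, 2)); };
  const double norm = integrate(shape, lo, hi);
  return [shape, norm](double x) { return shape(x) / norm; };
}
```

With fractions 0.5 and 0.1 for the two Gaussians, the flat component automatically receives 0.4 and the integral of the sum over [0, 10] is 1.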
Adding p.d.f.s – Factory syntax
• Additions are created through a SUM expression
SUM::name(frac1*PDF1,frac2*PDF2,...,PDFN)
– Note that the last PDF does not have an associated fraction
• Complete example
w.factory("Gaussian::gauss1(x[0,10],mean1[2],sigma[1])") ;
w.factory("Gaussian::gauss2(x,mean2[3],sigma)") ;
w.factory("ArgusBG::argus(x,k[-1],9.0)") ;
w.factory("SUM::sum(g1frac[0.5]*gauss1, g2frac[0.1]*gauss2, argus)") ;
Extended ML fits
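• In an extended ML fit, an extra term Poisson(N_obs, N_exp) is added to the likelihood
• This is most useful in combination with a composite pdf
– Shape and normalization:
$F(x) = f\,S(x) + (1 - f)\,B(x); \quad N_{exp}$
– Substituting $f = N_S/(N_S + N_B)$ and $N_{exp} = N_S + N_B$, i.e. fitting $N_S, N_B$ instead of $f, N_{exp}$:
$F(x) = \frac{N_S}{N_S + N_B}\,S(x) + \frac{N_B}{N_S + N_B}\,B(x); \quad N_{exp} = N_S + N_B$
SUM::name(Nsig*S,Nbkg*B)
– Written like this, the extended term is automatically included in –log(L)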
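For fixed shapes S and B, the extended likelihood can be maximized in N_S and N_B with a simple fixed-point (EM-style) iteration; RooFit uses MINUIT instead, so this plain C++ sketch only illustrates what the fit determines. Assumptions for illustration: S is a Gaussian with mean 5 and width 0.5, B is flat on [0, 10], and the toy data contain 300 signal and 700 background events. At the optimum N_S + N_B equals the number of observed events, as expected from the Poisson term.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct Yields { double nSig, nBkg; };

// Fixed shapes, normalized on [0, 10] (Gaussian tails outside the range are negligible here).
double sigShape(double x) {
  const double pi = 3.14159265358979323846;
  const double z = (x - 5.0) / 0.5;
  return std::exp(-0.5 * z * z) / (0.5 * std::sqrt(2.0 * pi));
}
double bkgShape(double) { return 0.1; }

// Extended -ln L = (nSig + nBkg) - sum_i ln(nSig*S(x_i) + nBkg*B(x_i)), up to a constant.
double extendedNll(const std::vector<double>& data, Yields y) {
  double sum = y.nSig + y.nBkg;
  for (double x : data) sum -= std::log(y.nSig * sigShape(x) + y.nBkg * bkgShape(x));
  return sum;
}

// Setting d(extendedNll)/dN_S = 0 gives N_S = sum_i N_S S_i / (N_S S_i + N_B B_i), same for N_B.
// Iterate this fixed point (EM algorithm), starting from an equal split.
Yields fitYields(const std::vector<double>& data, int nIter = 500) {
  Yields y{0.5 * data.size(), 0.5 * data.size()};
  for (int it = 0; it < nIter; ++it) {
    double newSig = 0.0, newBkg = 0.0;
    for (double x : data) {
      const double s = y.nSig * sigShape(x), b = y.nBkg * bkgShape(x);
      newSig += s / (s + b);
      newBkg += b / (s + b);
    }
    y = {newSig, newBkg};
  }
  return y;
}

// Toy data: nSig Gaussian events and nBkg flat events on [0, 10].
std::vector<double> makeToy(int nSig, int nBkg, unsigned seed = 7) {
  std::mt19937 rng(seed);
  std::normal_distribution<double> g(5.0, 0.5);
  std::uniform_real_distribution<double> u(0.0, 10.0);
  std::vector<double> data;
  for (int i = 0; i < nSig; ++i) data.push_back(g(rng));
  for (int i = 0; i < nBkg; ++i) data.push_back(u(rng));
  return data;
}
```

Each iteration already distributes exactly N_obs events between the two components, which is the extended-likelihood statement that the total yield is fixed by the observed count.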
Component plotting - Introduction
• Plotting, toy event generation and fitting work identically for composite p.d.f.s
– Several optimizations specific to composite models are applied behind the scenes (e.g. delegating event generation to the components)
• Extra plotting functionality specific to composite pdfs
– Component plotting
// Plot only argus components
w::sum.plotOn(frame,Components("argus"),LineStyle(kDashed)) ;
// Wildcards allowed
w::sum.plotOn(frame,Components("gauss*"),LineStyle(kDashed)) ;
Operations specific to composite pdfs
• Tree printing mode of workspace reveals component structure – w.Print("t")
RooAddPdf::sum[ g1frac * g1 + g2frac * g2 + [%] * argus ] = 0.0687785
RooGaussian::g1[ x=x mean=mean1 sigma=sigma ] = 0.135335
RooGaussian::g2[ x=x mean=mean2 sigma=sigma ] = 0.011109
RooArgusBG::argus[ m=x m0=k c=9 p=0.5 ] = 0
– Can also make input files for GraphViz visualization (w::sum.graphVizTree("myfile.dot"))
– Graph output on ROOT Canvas in near future (pending ROOT integration of GraphViz package)
Convolution
• Many experimental observable quantities are well described by convolutions
– Typically a physics distribution smeared with the experimental resolution (e.g. for B0 → J/ψ KS, an exponential decay distribution smeared with a Gaussian)
– By explicitly describing the observed distribution with a convolution p.d.f. one can disentangle detector and physics
• To the extent that enough information is in the data to make this possible
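A numerical cross-check of the exponential ⊗ Gaussian case mentioned above. Plain C++ sketch with illustrative names, assuming a decay p.d.f. (1/τ)e^(−t/τ) for t ≥ 0 and a Gaussian resolution of width σ centred at zero. The convolution integral is evaluated with the trapezoidal rule and compared with the standard closed form (1/(2τ)) exp(σ²/(2τ²) − t/τ) erfc((σ/τ − t/σ)/√2).

```cpp
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

// Physics p.d.f.: exponential decay with lifetime tau (zero for t < 0).
double decay(double t, double tau) { return t < 0.0 ? 0.0 : std::exp(-t / tau) / tau; }

// Resolution model: Gaussian with mean 0 and width sigma.
double resolution(double dt, double sigma) {
  return std::exp(-0.5 * (dt / sigma) * (dt / sigma)) / (sigma * std::sqrt(2.0 * kPi));
}

// (decay ⊗ resolution)(t) = integral over t' >= 0 of decay(t') * resolution(t - t'),
// evaluated with the trapezoidal rule on [0, 30 tau].
double convolvedNumeric(double t, double tau, double sigma, int n = 200000) {
  const double hi = 30.0 * tau, h = hi / n;
  double sum = 0.5 * (decay(0.0, tau) * resolution(t, sigma) +
                      decay(hi, tau) * resolution(t - hi, sigma));
  for (int i = 1; i < n; ++i) {
    const double tp = i * h;
    sum += decay(tp, tau) * resolution(t - tp, sigma);
  }
  return sum * h;
}

// Closed form of the same convolution.
double convolvedAnalytic(double t, double tau, double sigma) {
  return 0.5 / tau * std::exp(sigma * sigma / (2.0 * tau * tau) - t / tau) *
         std::erfc((sigma / tau - t / sigma) / std::sqrt(2.0));
}
```

The smeared distribution is non-zero also at negative t, which is exactly the detector effect the convolution p.d.f. separates from the physics.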
4. Common fitting issues
• Understanding MINUIT output
• Instabilities and correlation coefficients
What happens when you do pdf->fitTo(*data)?
Fitting and likelihood minimization
• What happens when you do pdf->fitTo(*data)
– 1) Construct object representing –log of (extended) likelihood
– 2) Minimize likelihood w.r.t floating parameters using MINUIT
• Can also do these two steps explicitly by hand
// Construct function object representing –log(L)
RooAbsReal* nll = pdf.createNLL(data) ;
// Minimize nll w.r.t its parameters
RooMinuit m(*nll) ;
m.migrad() ;
m.hesse() ;
Let’s take a closer look at MINUIT
A brief description of MINUIT functionality
• MIGRAD
– Find function minimum. Calculates the function gradient, follows it to a (local) minimum, recalculates the gradient, iterates until the minimum is found
• To see what MIGRAD does, it is very instructive to do RooMinuit::setVerbose(1). It will print a line for each step through parameter space
– The number of function calls required depends greatly on the number of floating parameters, the distance from the function minimum and the shape of the function
• HESSE
– Calculation of the error matrix from 2nd derivatives at the minimum:
$\hat{\sigma}_p^2 = \hat{V}(\hat{p}) = \left(-\frac{d^2 \ln L}{dp^2}\right)^{-1}_{p=\hat{p}}$
– Gives symmetric errors. Valid under the assumption that the likelihood is (locally) parabolic
– Requires roughly N² likelihood evaluations (with N = number of floating parameters)
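To make the HESSE step concrete, a plain C++ sketch for a Gaussian fit with two floating parameters (μ, σ); the dataset and function names are illustrative and this is not MINUIT’s algorithm. It evaluates the second derivatives of −ln L at the minimum with central finite differences (O(N²) function calls for N parameters) and inverts the 2×2 matrix to get the covariance matrix. For this model the expected result is V = diag(σ̂²/N, σ̂²/(2N)).

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Params = std::array<double, 2>;  // {mu, sigma}
using Matrix = std::array<std::array<double, 2>, 2>;

// -ln L(mu, sigma) for a Gaussian, dropping the constant N*ln(sqrt(2*pi)).
double nll(const std::vector<double>& data, const Params& p) {
  assert(p[1] > 0.0);
  double sum = data.size() * std::log(p[1]);
  for (double x : data) sum += (x - p[0]) * (x - p[0]) / (2.0 * p[1] * p[1]);
  return sum;
}

// Matrix of second derivatives of nll at p from central finite differences with step h.
Matrix hessian(const std::vector<double>& data, const Params& p, double h = 1e-4) {
  Matrix H{};
  for (int i = 0; i < 2; ++i) {
    for (int j = 0; j < 2; ++j) {
      auto shifted = [&](double di, double dj) {
        Params q = p;
        q[i] += di;
        q[j] += dj;
        return nll(data, q);
      };
      H[i][j] = (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h);
    }
  }
  return H;
}

// Covariance matrix V = H^{-1} (explicit 2x2 inverse).
Matrix covariance(const Matrix& H) {
  const double det = H[0][0] * H[1][1] - H[0][1] * H[1][0];
  return {{{H[1][1] / det, -H[0][1] / det}, {-H[1][0] / det, H[0][0] / det}}};
}
```

For the five values {1.2, −0.4, 2.5, 0.3, 1.9} the minimum is at μ̂ = 1.1, σ̂² = 1.1, and the sketch returns V ≈ diag(0.22, 0.11).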
A brief description of MINUIT functionality
• MINOS
– Calculate errors by explicitly finding the points (or contour for >1D) where Δ(–log L) = 0.5
– Reported errors can be asymmetric
– Can be very expensive with a large number of floating parameters
• CONTOUR
– Find contours of equal Δ(–log L) in two parameters and draw the corresponding shape
– Mostly an interactive analysis tool
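A plain C++ sketch of the MINOS idea for the width σ of a Gaussian fit (dataset and names are illustrative). Because the best μ for any fixed σ is always the sample mean, the profile of −ln L in σ has the closed form N ln σ + Σ(x_i − x̄)²/(2σ²) up to a constant; the points where it rises by 0.5 are found by bisection on each side of the minimum. The resulting interval is asymmetric, unlike the parabolic HESSE error σ̂/√(2N).

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Profile -ln L in sigma (mu already set to the sample mean), up to a constant.
double profileNll(const std::vector<double>& data, double sigma) {
  assert(!data.empty() && sigma > 0.0);
  double mean = 0.0;
  for (double x : data) mean += x;
  mean /= data.size();
  double ss = 0.0;
  for (double x : data) ss += (x - mean) * (x - mean);
  return data.size() * std::log(sigma) + ss / (2.0 * sigma * sigma);
}

// Bisection for the point between 'inside' and 'outside' where profileNll = nllMin + 0.5.
double crossing(const std::vector<double>& data, double inside, double outside, double nllMin) {
  for (int i = 0; i < 200; ++i) {
    const double mid = 0.5 * (inside + outside);
    if (profileNll(data, mid) < nllMin + 0.5) inside = mid; else outside = mid;
  }
  return 0.5 * (inside + outside);
}

// Returns {negative error, positive error} around the ML estimate sigmaHat.
std::pair<double, double> minosSigma(const std::vector<double>& data, double sigmaHat) {
  const double nllMin = profileNll(data, sigmaHat);
  const double lo = crossing(data, sigmaHat, 1e-3 * sigmaHat, nllMin);
  const double hi = crossing(data, sigmaHat, 100.0 * sigmaHat, nllMin);
  return {lo - sigmaHat, hi - sigmaHat};
}
```

For the five values {1.2, −0.4, 2.5, 0.3, 1.9} (σ̂ = √1.1 ≈ 1.05) the sketch gives roughly −0.26/+0.44, while the parabolic estimate σ̂/√(2N) is about 0.33 on both sides.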
Note on MIGRAD function minimization
• For all but the most trivial scenarios it is not possible to automatically find reasonable starting values of parameters
– So you need to supply ‘reasonable’ starting values for your parameters
– You may also need to supply a ‘reasonable’ initial step size in parameters. (A step size 10x the range of the plot below is clearly unhelpful)
– Using RooMinuit, the initial step size is the value of RooRealVar::getError(), so you can control this by supplying initial error values
• Reason: there may exist multiple (local) minima in the likelihood or χ²
[Figure: –log(L) versus p, showing a local minimum next to the true minimum]
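A toy illustration of why starting values matter, in plain C++: a hypothetical one-parameter function with two minima, f(p) = (p² − 1)² + 0.3p, minimized with plain fixed-step gradient descent (a much simpler algorithm than MIGRAD, used only to show the effect). Starting on the right converges to the local minimum near p ≈ 0.96; starting on the left finds the true minimum near p ≈ −1.04.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical "-log(L)" with a local minimum near p = 0.96 and the true minimum near p = -1.04.
double toyNll(double p) { return (p * p - 1.0) * (p * p - 1.0) + 0.3 * p; }
double toyNllDerivative(double p) { return 4.0 * p * (p * p - 1.0) + 0.3; }

// Fixed-step gradient descent from the starting value p0.
double minimizeFrom(double p0, double step = 0.01, int nIter = 20000) {
  double p = p0;
  for (int i = 0; i < nIter; ++i) p -= step * toyNllDerivative(p);
  return p;
}
```

Both runs report a converged minimum, but only one of them is the true one; nothing in the local behaviour of the function tells the minimizer which.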
Minuit function MIGRAD
**********
** 13 **MIGRAD 1000 1
**********
(some output omitted)
MIGRAD MINIMIZATION HAS CONVERGED.
MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=257.304 FROM MIGRAD STATUS=CONVERGED 31 CALLS 32 TOTAL
EDM=2.36773e-06 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 mean 8.84225e-02 3.23862e-01 3.58344e-04 -2.24755e-02
2 sigma 3.20763e+00 2.39540e-01 2.78628e-04 -5.34724e-02
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=0.5
1.049e-01 3.338e-04
3.338e-04 5.739e-02
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2
1 0.00430 1.000 0.004
2 0.00430 0.004 1.000
Parameter values and approximate errors reported by MINUIT
Error definition (in this case 0.5 for a likelihood fit)
Progress information, watch for errors here
• Purpose: find minimum
Approximate error matrix and covariance matrix
Value of χ² or likelihood at minimum
(NB: χ² values are not divided by N_d.o.f.)
Status: should be ‘converged’ but can be ‘failed’
Estimated Distance to Minimum (EDM): should be small, O(10⁻⁶)
Error matrix quality: should be ‘accurate’, but can be ‘approximate’ in case of trouble
Minuit function HESSE
• Purpose: calculate error matrix from $\frac{d^2(-\ln L)}{dp^2}$
**********
** 18 **HESSE 1000
**********
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=257.304 FROM HESSE STATUS=OK 10 CALLS 42 TOTAL
EDM=2.36534e-06 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 mean 8.84225e-02 3.23861e-01 7.16689e-05 8.84237e-03
2 sigma 3.20763e+00 2.39539e-01 5.57256e-05 3.26535e-01
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=0.5
1.049e-01 2.780e-04
2.780e-04 5.739e-02
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2
1 0.00358 1.000 0.004
2 0.00358 0.004 1.000
Error matrix (covariance matrix) calculated from
$V_{ij} = \left(-\frac{\partial^2 \ln L}{\partial p_i\,\partial p_j}\right)^{-1}$ (matrix inverse)
Correlation matrix ρ_ij calculated from
$V_{ij} = \rho_{ij}\,\sigma_i\,\sigma_j$
Global correlation vector: correlation of each parameter with all other parameters
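These relations can be checked against the HESSE printout. A plain C++ sketch (names are illustrative) that takes the 2×2 external error matrix printed above, extracts the errors σ_i = √V_ii and the correlation ρ_12 = V_12/(σ_1σ_2), and computes the global correlation coefficient with the usual MINUIT definition ρ_k = √(1 − 1/(V_kk (V⁻¹)_kk)). For two parameters the global coefficient equals |ρ_12|, consistent with the GLOBAL column in the printout.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Matrix2 = std::array<std::array<double, 2>, 2>;

// Correlation coefficient rho_ij = V_ij / (sigma_i * sigma_j).
double correlation(const Matrix2& V, int i, int j) {
  return V[i][j] / std::sqrt(V[i][i] * V[j][j]);
}

// Global correlation rho_k = sqrt(1 - 1 / (V_kk * (V^-1)_kk)) for a 2x2 covariance matrix.
double globalCorrelation(const Matrix2& V, int k) {
  assert(k == 0 || k == 1);
  const double det = V[0][0] * V[1][1] - V[0][1] * V[1][0];
  const double invKK = V[1 - k][1 - k] / det;  // diagonal element of the 2x2 inverse
  return std::sqrt(1.0 - 1.0 / (V[k][k] * invKK));
}
```

With V = ((1.049e-01, 2.780e-04), (2.780e-04, 5.739e-02)) this gives σ_mean ≈ 0.324 and ρ ≈ 0.00358, matching the printed values.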
Minuit function MINOS
**********
** 23 **MINOS 1000
**********
FCN=257.304 FROM MINOS STATUS=SUCCESSFUL 52 CALLS 94 TOTAL
EDM=2.36534e-06 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER PARABOLIC MINOS ERRORS
NO. NAME VALUE ERROR NEGATIVE POSITIVE
1 mean 8.84225e-02 3.23861e-01 -3.24688e-01 3.25391e-01
2 sigma 3.20763e+00 2.39539e-01 -2.23321e-01 2.58893e-01
ERR DEF= 0.5
• Error analysis through Δnll contour finding
Symmetric error (repeated result from HESSE)
MINOS error: can be asymmetric (in this example the ‘sigma’ error is slightly asymmetric)
Illustration of difference between HESSE and MINOS errors
• ‘Pathological’ example likelihood with multiple minima and non-parabolic behavior
[Figure: comparison of the MINOS error with the HESSE error obtained by extrapolating the parabolic approximation at the minimum]
Practical estimation – Fit convergence problems
• Sometimes fits don’t converge because, e.g.
– MIGRAD is unable to find the minimum
– HESSE finds negative second derivatives (which would imply negative errors)
• The reason is usually numerical precision and stability problems, but
– The underlying cause of fit stability problems is usually highly correlated parameters in the fit
• The HESSE correlation matrix is the primary investigative tool
– In the limit of 100% correlation, the usual point solution becomes a line solution (or surface solution) in parameter space. The minimization problem is no longer well defined
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2
1 0.99835 1.000 0.998
2 0.99835 0.998 1.000
Signs of trouble…
Mitigating fit stability problems
• Strategy I – More orthogonal choice of parameters
– Example: fitting the sum of 2 Gaussians of similar width
$F(x; f, m, s_1, s_2) = f\,G_1(x; s_1, m) + (1 - f)\,G_2(x; s_2, m)$
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL [ f] [ m] [s1] [s2]
[ f] 0.96973 1.000 -0.135 0.918 0.915
[ m] 0.14407 -0.135 1.000 -0.144 -0.114
[s1] 0.92762 0.918 -0.144 1.000 0.786
[s2] 0.92486 0.915 -0.114 0.786 1.000
HESSE correlation matrix: widths s1, s2 strongly correlated with fraction f
Mitigating fit stability problems
– Different parameterization:
$f\,G_1(x; s_1, m) + (1 - f)\,G_2(x; s_1 \cdot s_2, m)$
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL [f] [m] [s1] [s2]
[ f] 0.96951 1.000 -0.134 0.917 -0.681
[ m] 0.14312 -0.134 1.000 -0.143 0.127
[s1] 0.98879 0.917 -0.143 1.000 -0.895
[s2] 0.96156 -0.681 0.127 -0.895 1.000
– Correlation of width s2 and fraction f reduced from 0.92 to 0.68
– Choice of parameterization matters!
• Strategy II – Fix all but one of the correlated parameters
– If floating parameters are highly correlated, some of them may be redundant and not contribute additional degrees of freedom to your model
Mitigating fit stability problems -- Polynomials
• Warning: the regular parameterization of polynomials $a_0 + a_1 x + a_2 x^2 + a_3 x^3$ nearly always results in strong correlations between the coefficients $a_i$.
– Fit stability problems, inability to find right solution common at higher orders
• Solution: Use existing parameterizations of polynomials that have (mostly) uncorrelated variables
– Example: Chebychev polynomials
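One rough way to see the difference (a heuristic, not a statement about any particular fit): compare how much two basis functions overlap on x ∈ [−1, 1] under a flat weight, via the normalized inner product ∫b_i b_j dx / √(∫b_i² dx ∫b_j² dx). Plain C++ sketch with illustrative names; the Chebychev polynomials are evaluated with the standard recurrence T_{n+1}(x) = 2x T_n(x) − T_{n−1}(x). For x vs x³ the overlap is about 0.92, for T_1 vs T_3 about −0.50.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Chebychev polynomial T_n(x) from the recurrence T_{n+1} = 2x T_n - T_{n-1}.
double chebychev(int n, double x) {
  assert(n >= 0);
  if (n == 0) return 1.0;
  double tPrev = 1.0, t = x;
  for (int k = 1; k < n; ++k) {
    const double tNext = 2.0 * x * t - tPrev;
    tPrev = t;
    t = tNext;
  }
  return t;
}

// Ordinary monomial x^n.
double monomial(int n, double x) { return std::pow(x, n); }

// Normalized overlap of basis functions b(i, x) and b(j, x) on [-1, 1] (midpoint rule;
// the common step width cancels in the ratio).
double overlap(const std::function<double(int, double)>& b, int i, int j, int n = 20000) {
  const double h = 2.0 / n;
  double sij = 0.0, sii = 0.0, sjj = 0.0;
  for (int k = 0; k < n; ++k) {
    const double x = -1.0 + (k + 0.5) * h;
    sij += b(i, x) * b(j, x);
    sii += b(i, x) * b(i, x);
    sjj += b(j, x) * b(j, x);
  }
  return sij / std::sqrt(sii * sjj);
}
```

Strongly overlapping basis functions can compensate for each other in a fit, which is one source of the coefficient correlations described above.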
Minuit CONTOUR tool also useful to examine ‘bad’ correlations
• Example of 1,2 sigma contour of two uncorrelated variables
– Elliptical shape. In this example the parameters are uncorrelated
• Example of 1,2 sigma contour of two variables with problematic correlation
– Pdf = fG1(x,0,3)+(1-f)G2(x,0,s) with s=4 in data
Practical estimation – Bounding fit parameters
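• Sometimes it is desirable to bound the allowed range of parameters in a fit
– Example: a fraction parameter is only defined in the range [0,1]
– MINUIT option ‘B’ maps a finite-range parameter [a, b] to an internal infinite range using an arcsin(x) transformation:
$p_{int} = \arcsin\left(2\,\frac{p_{ext} - a}{b - a} - 1\right), \qquad p_{ext} = a + \frac{b - a}{2}\left(\sin p_{int} + 1\right)$
[Figure: bounded parameter space mapped onto the MINUIT internal parameter space (−∞, +∞), showing the internal error and the corresponding external error]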
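A plain C++ sketch of MINUIT’s sin/arcsin mapping for a parameter bounded in [a, b] (function names are illustrative): p_int = arcsin(2(p_ext − a)/(b − a) − 1), with inverse p_ext = a + (b − a)(sin p_int + 1)/2. Every real internal value maps back into [a, b], so the minimizer can step freely; the derivative dp_ext/dp_int shrinks near the bounds, which is why errors on parameters close to a limit should be treated with care.

```cpp
#include <cassert>
#include <cmath>

// External (bounded, [a, b]) -> internal (unbounded) value. Requires a <= pExt <= b.
double toInternal(double pExt, double a, double b) {
  assert(b > a && pExt >= a && pExt <= b);
  return std::asin(2.0 * (pExt - a) / (b - a) - 1.0);
}

// Internal -> external: any real pInt maps into [a, b].
double toExternal(double pInt, double a, double b) {
  return a + 0.5 * (b - a) * (std::sin(pInt) + 1.0);
}

// Derivative dpExt/dpInt, which scales an internal error into an external one.
double jacobian(double pInt, double a, double b) { return 0.5 * (b - a) * std::cos(pInt); }
```

For a fraction in [0, 1], the derivative is 0.5 in the middle of the range but only about 0.03 at p_ext = 0.999.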
8. Working with Likelihood
• Using discrete variables to classify data
• Simultaneous fits on multiple datasets
Plotting the likelihood
• A likelihood function is a regular RooFit function
• Can e.g. plot it as usual
RooAbsReal* nll = w::model.createNLL(data) ;
RooPlot* frame = w::param.frame() ;
nll->plotOn(frame,ShiftToZero()) ;
Constructing a χ² function
// Construct function object representing the chi2
RooAbsReal* chi2 = pdf.createChi2(data) ;
// Minimize chi2 w.r.t. its parameters
RooMinuit m(*chi2) ;
m.migrad() ;
m.hesse() ;
• Along similar lines it is also possible to construct a χ² function
– Only takes binned datasets (class RooDataHist)
– The normalized p.d.f. is multiplied by $N_{data}$ to obtain the prediction entering the χ²
– The MINUIT error definition is automatically set to 1 for a χ² (it is 0.5 for likelihoods), because the default error level is supplied through a virtual method of the function base class RooAbsReal
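To make the construction concrete, a minimal standalone sketch of a binned χ² in which the normalized p.d.f. is scaled by $N_{data}$. The per-bin variance is taken to be the expected count (Pearson form) purely as an assumption of this example; the function name and the bin-centre approximation of the bin integral are likewise illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Binned chi2 between observed counts and N_data * (normalized pdf integral per bin).
// pdfAtCenter[i] is the pdf at the centre of bin i; bin integral ~ pdf * binWidth.
double binnedChi2(const std::vector<double>& observed,
                  const std::vector<double>& pdfAtCenter, double binWidth) {
    assert(observed.size() == pdfAtCenter.size() && binWidth > 0.0);
    double nData = 0.0;
    for (double n : observed) nData += n;
    double chi2 = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        double expected = nData * pdfAtCenter[i] * binWidth;  // normalized pdf x N_data
        assert(expected > 0.0);
        double diff = observed[i] - expected;
        chi2 += diff * diff / expected;  // variance = expected count (assumption)
    }
    return chi2;
}
```

Since this is a χ² rather than −log(L), one standard deviation corresponds to Δχ² = 1, which is why the MINUIT error level is 1 here instead of 0.5.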
103
Automatic optimizations in the calculation of the likelihood
• Several automatic computational optimizations are applied to the calculation of likelihoods inside RooNLLVar
– Components that have all constant parameters are pre-calculated
– Dataset variables not used by the PDF are dropped
– PDF normalization integrals are only recalculated when the ranges of their observables or the values of their parameters are changed
– Simultaneous fits: when a parameter changes, only the parts of the total likelihood that depend on that parameter are recalculated
• Lazy evaluation: calculation is only done when the integral value is requested
• Applicability of optimization techniques is re-evaluated for each use
– Maximum benefit for each use case
• ‘Typical’ large-scale fits see significant speed increase
– Factor of 3x – 10x not uncommon.
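The caching and lazy-evaluation ideas in the list above can be illustrated with a small, hypothetical class (it is not RooNLLVar or any RooFit class): the normalization integral is stored together with a "dirty" flag, and is recomputed only when a parameter or the range has changed and a value is actually requested.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <utility>

// Hypothetical 1D pdf with a lazily recomputed, cached normalization integral.
class CachedNormPdf {
public:
    CachedNormPdf(std::function<double(double, double)> shape, double lo, double hi)
        : shape_(std::move(shape)), lo_(lo), hi_(hi) { assert(hi > lo); }

    // Changing a parameter or the range only marks the cache as stale.
    void setParam(double p) { if (p != param_) { param_ = p; dirty_ = true; } }
    void setRange(double lo, double hi) {
        if (lo != lo_ || hi != hi_) { lo_ = lo; hi_ = hi; dirty_ = true; }
    }

    // Normalized value; the integral is only computed when actually needed.
    double value(double x) { return shape_(x, param_) / norm(); }
    int integrationCount() const { return nIntegrations_; }

private:
    double norm() {
        if (dirty_) {
            norm_ = integrate();
            dirty_ = false;
            ++nIntegrations_;
        }
        return norm_;
    }
    // Midpoint rule; 1000 sub-intervals is an arbitrary choice for the sketch.
    double integrate() const {
        const int n = 1000;
        const double h = (hi_ - lo_) / n;
        double sum = 0.0;
        for (int i = 0; i < n; ++i) sum += shape_(lo_ + (i + 0.5) * h, param_);
        return sum * h;
    }

    std::function<double(double, double)> shape_;
    double lo_, hi_;
    double param_ = 0.0;
    double norm_ = 1.0;
    bool dirty_ = true;
    int nIntegrations_ = 0;
};
```

Repeated evaluations at fixed parameters (as over the events of a dataset) then reuse one integral.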
104
Statistical procedures involving likelihood
• ‘Simple’ Parameter and error estimation (MINUIT/HESSE/MINOS)
• Construct Bayesian credible intervals
– Likelihood appears in Bayes theorem for hypothesis with continuous parameters
• Construct (Profile) Likelihood Ratio intervals
– ‘Approximate confidence intervals’ (Wilks’ theorem)
– Connection to MINOS errors
• NB: Can also construct Frequentist intervals (Neyman construction), but these are based on PDFs, not likelihoods
105
Likelihood minimization – class RooMinuit
• Class RooMinuit is an interface to the ROOT implementation of the MINUIT minimization and error analysis package.
• RooMinuit takes care of
– Passing the value of the minimized RooFit function to MINUIT
– Propagating changes in parameters both from RooRealVar to MINUIT and back from MINUIT to RooRealVar, i.e. it keeps the state of RooFit objects synchronized with the MINUIT internal state
– Propagating error analysis information back to the RooRealVar parameter objects
– Exposing high-level MINUIT operations (MIGRAD, HESSE, MINOS, etc.) to RooFit users
– Making optional snapshots of complete MINUIT information (e.g. convergence state, full error matrix etc)
106
Demonstration of RooMinuit use
// Start Minuit session on the nll object constructed above
RooMinuit m(*nll) ;
// MIGRAD likelihood minimization
m.migrad() ;
// Run HESSE error analysis
m.hesse() ;
// Set sx to 3, keep fixed in fit
sx.setVal(3) ;
sx.setConstant(kTRUE) ;
// MIGRAD likelihood minimization
m.migrad() ;
// Run MINOS error analysis
m.minos() ;
// Release sx again, then draw 1,2,3 ‘sigma’ contours in sx,sy
sx.setConstant(kFALSE) ;
RooPlot* cframe = m.contour(sx,sy,1,2,3) ;
cframe->Draw() ;
107
What happens if there are problems in the NLL calculation
[#0] WARNING:Minization -- RooFitGlue: Minimized function has error status.
Returning maximum FCN so far (99876) to force MIGRAD to back out of this region.
Error log follows. Parameter values: m=-7.397
RooGaussian::gx[ x=x mean=m sigma=sx ] has 3 errors
• Sometimes the likelihood cannot be evaluated due to an error condition.
– The PDF probability is zero, or less than zero, at a coordinate where there is a data point: the event is ‘infinitely improbable’
– The normalization integral of the PDF evaluates to zero
• Most problematic during MINUIT operations. How are error conditions handled?
– All error conditions are gathered and reported in a consolidated way by RooMinuit
– Since MINUIT has no interface to deal with such situations, RooMinuit instead passes a large value to MINUIT to force it to retreat from the region of parameter space in which the problem occurred
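A hypothetical sketch of the "wall" strategy (not RooFit's code): if any event has a probability that is zero, negative or NaN, the NLL is not evaluated; instead the largest valid NLL seen so far is returned, which pushes the minimizer back out of the bad region. The class name and the fallback value used before any valid evaluation are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

class WalledNLL {
public:
    explicit WalledNLL(std::vector<double> data) : data_(std::move(data)) {
        assert(!data_.empty());
    }

    // pdf must be normalized; returns -sum_i log pdf(x_i), or the 'wall' value.
    double operator()(const std::function<double(double)>& pdf) {
        double nll = 0.0;
        int nErrors = 0;
        for (double x : data_) {
            double p = pdf(x);
            if (!(p > 0.0)) { ++nErrors; continue; }  // zero, negative or NaN
            nll -= std::log(p);
        }
        lastErrorCount_ = nErrors;
        if (nErrors > 0) return maxSoFar_;  // the 'wall'
        maxSoFar_ = haveValid_ ? std::max(maxSoFar_, nll) : nll;
        haveValid_ = true;
        return nll;
    }

    int lastErrorCount() const { return lastErrorCount_; }

private:
    std::vector<double> data_;
    // Before any valid evaluation there is no 'maximum so far'; the largest
    // finite double is used instead (an assumption of this sketch).
    double maxSoFar_ = std::numeric_limits<double>::max();
    bool haveValid_ = false;
    int lastErrorCount_ = 0;
};
```

Counting the failing events, as `lastErrorCount()` does, is the kind of information that ends up in the consolidated error log.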
108
What happens if there are problems in the NLL calculation
[Figure panels: pdf and data; -log(L) vs m0 when dropping problematic events; -log(L) vs m0 with ‘wall’ (RooFit default)]
• Classic example in B physics: floating the end point of the ARGUS function
– The ARGUS probability density above the end point is zero. If the end point is moved to a low value in the fit, some events end up above the end point, where the probability is zero and the likelihood term is –log(0) = infinity
109
What happens if there are problems in the NLL calculation
[#0] WARNING:Minization -- RooFitGlue: Minimized function has error status.
Returning maximum FCN so far (-1e+30) to force MIGRAD to back out of this region.
Error log follows
Parameter values: m=-7.397
RooGaussian::gx[ x=x mean=m sigma=sx ]
getLogVal() top-level p.d.f evaluates to zero or negative number
@ x=x=9.09989, mean=m=-7.39713, sigma=sx=0.1
getLogVal() top-level p.d.f evaluates to zero or negative number
@ x=x=6.04652, mean=m=-7.39713, sigma=sx=0.1
getLogVal() top-level p.d.f evaluates to zero or negative number
@ x=x=2.48563, mean=m=-7.39713, sigma=sx=0.1
• Can request more verbose error logging to debug problem
– Add PrintEvalErrors(N) with N>1
110
Bayesian formalism
[Figure: posterior ∝ likelihood ∗ prior; shaded area integrates X% of the posterior]
• Original Bayes Thm:
P(B|A) ∝ P(A|B) P(B).
• Let probability density function p(x|μ) be the conditional pdf for data x, given parameter μ. Then Bayes’ Thm becomes
p(μ|x) ∝ p(x|μ) p(μ).
• Substituting in a set of observed data, x0, and recognizing the likelihood, written as L(x0|μ) or L(μ), then
p(μ|x0) ∝ L(x0|μ) p(μ).
111
Illustration of nuisance parameters in Bayesian intervals
[Figure: MLE fit to data; -log LR(mean,sigma); LR(mean,sigma) × prior(mean,sigma), integrated over sigma, gives posterior(mean)]
• Example: data with Gaussian model (mean,sigma)
112
Bayesian formalism and integration
• Bayesian formalism often requires integration
• Straightforward to do in RooFit: the integration functionality for pdfs also works for likelihood functions
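As an illustration of the integration involved, a minimal sketch computing a Bayesian credible upper limit for a Poisson counting experiment with known background b. The flat prior on the signal s, the integration range and the grid size are assumptions of this example; it is not the RooStats BayesianCalculator.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Upper limit on a Poisson signal s >= 0 with known background b:
// posterior p(s|n) ∝ L(n|s) p(s) with a flat prior p(s) (assumption).
// The posterior is integrated with the midpoint rule on [0, sMax].
double bayesUpperLimit(int n, double b, double cl,
                       double sMax = 100.0, int nSteps = 200000) {
    assert(n >= 0 && b >= 0.0 && cl > 0.0 && cl < 1.0);
    const double h = sMax / nSteps;
    // log L(n|s) up to an s-independent constant
    auto logL = [&](double s) { return n * std::log(s + b) - (s + b); };
    // Subtract log L at the posterior mode to avoid overflow for large n
    const double logRef = logL(std::max(n - b, 0.5 * h));
    double total = 0.0;
    for (int i = 0; i < nSteps; ++i) total += std::exp(logL((i + 0.5) * h) - logRef);
    double cum = 0.0;
    for (int i = 0; i < nSteps; ++i) {
        cum += std::exp(logL((i + 0.5) * h) - logRef);
        if (cum >= cl * total) return (i + 1) * h;  // upper edge of the bin reaching cl
    }
    return sMax;
}
```

For n = 0 and b = 0 the posterior is $e^{-s}$, so the 90% upper limit is $\ln 10 \approx 2.30$.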
113
Likelihood ratio intervals
[Figure: likelihood ratio interval compared with the HESSE error (extrapolation of the parabolic approximation at the minimum)]
$$LR(\mu) = \frac{L(x,\mu)}{L(x,\hat{\mu})}$$
• Definition of Likelihood Ratio interval (identical to MINOS for 1 parameter)
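For a single parameter the likelihood ratio interval can be found by solving $-\ln LR(\mu) = 0.5$ on both sides of the minimum. A minimal sketch for a Poisson count n (so $\hat\mu = n$ and $-\ln LR(\mu) = \mu - n + n\ln(n/\mu)$), using bisection; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// -log LR(mu) = -log[ L(n|mu) / L(n|mu_hat) ] for a Poisson count n > 0.
double minusLogLR(int n, double mu) {
    assert(n > 0 && mu > 0.0);
    return mu - n + n * std::log(n / mu);
}

// Bisection for -log LR(mu) = up, given a bracket [a, b] containing one crossing.
double solveCrossing(int n, double up, double a, double b) {
    double fa = minusLogLR(n, a) - up;
    for (int it = 0; it < 200; ++it) {
        double m = 0.5 * (a + b);
        double fm = minusLogLR(n, m) - up;
        if ((fm > 0.0) == (fa > 0.0)) { a = m; fa = fm; }
        else { b = m; }
    }
    return 0.5 * (a + b);
}

// Likelihood ratio (MINOS-like) interval; up = 0.5 gives the 1-sigma interval.
std::pair<double, double> lrInterval(int n, double up = 0.5) {
    assert(n > 0);
    double lo = solveCrossing(n, up, 1e-9, n);
    double hi = solveCrossing(n, up, n, n + 20.0 * std::sqrt(n) + 20.0);
    return {lo, hi};
}
```

For n = 10 the interval is approximately [7.16, 13.50], visibly asymmetric, whereas the HESSE (parabolic) error would be the symmetric $\pm\sqrt{10} \approx \pm 3.16$.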
114
Dealing with nuisance parameters in Likelihood ratio intervals
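[Figure: MLE fit to data; -log LR(mean,sigma); -log PLR(mean)]
$$PLR(\mu) = \frac{L(\mu,\hat{\hat{\sigma}}(\mu))}{L(\hat{\mu},\hat{\sigma})}$$
• Numerator: best L for the given μ, over all values of σ
• Denominator: best L(μ,σ) overall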
• Nuisance parameters in LR interval
– For each value of the parameter of interest, search the full subspace of nuisance parameters for the point at which the likelihood is maximized.
– Associate that value of the likelihood with that value of the parameter of interest: this is the ‘profile likelihood’
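The profiling step can be sketched for the Gaussian example with parameter of interest μ (mean) and nuisance parameter σ (width): for each μ, −log(L) is minimized numerically over σ. The golden-section search (in log σ, over an assumed range $[10^{-3},10^{3}]$) and all names are illustrative choices, not RooFit's createProfile().

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// -log L for Gaussian data with mean mu and width sigma (constants dropped).
double nllMuSigma(const std::vector<double>& x, double mu, double sigma) {
    assert(sigma > 0.0);
    double s = 0.0;
    for (double xi : x) s += (xi - mu) * (xi - mu);
    return static_cast<double>(x.size()) * std::log(sigma) + s / (2.0 * sigma * sigma);
}

// Golden-section search for the minimum of a unimodal function on [lo, hi].
template <typename F>
double goldenMinimize(F f, double lo, double hi, double tol = 1e-12) {
    const double invPhi = (std::sqrt(5.0) - 1.0) / 2.0;
    double c = hi - invPhi * (hi - lo), d = lo + invPhi * (hi - lo);
    double fc = f(c), fd = f(d);
    while (hi - lo > tol) {
        if (fc < fd) { hi = d; d = c; fd = fc; c = hi - invPhi * (hi - lo); fc = f(c); }
        else         { lo = c; c = d; fc = fd; d = lo + invPhi * (hi - lo); fd = f(d); }
    }
    return 0.5 * (lo + hi);
}

// Profiled -log L: for fixed mu, minimize over the nuisance parameter sigma.
double profiledNLL(const std::vector<double>& x, double mu) {
    auto f = [&](double logSigma) { return nllMuSigma(x, mu, std::exp(logSigma)); };
    double best = goldenMinimize(f, std::log(1e-3), std::log(1e3));
    return f(best);
}

// -log of the profile likelihood ratio, relative to the global minimum at muHat.
double minusLogPLR(const std::vector<double>& x, double mu, double muHat) {
    return profiledNLL(x, mu) - profiledNLL(x, muHat);
}
```

Here the inner minimization also has the closed form $\hat{\hat\sigma}^2(\mu) = \frac{1}{N}\sum_i (x_i-\mu)^2$, giving $-\ln PLR(\mu) = \frac{N}{2}\ln\frac{S(\mu)}{S(\hat\mu)}$ with $S(\mu)=\sum_i (x_i-\mu)^2$, a convenient check of the numerical profile. The 1σ interval on μ is where $-\ln PLR = 0.5$.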
115
Working with profile likelihood
$$\lambda(p) = \frac{L(p,\hat{\hat{q}}(p))}{L(\hat{p},\hat{q})}$$
RooAbsReal* nll = model.createNLL(data,NumCPU(8)) ;
// Profile: minimize over all parameters except the parameters of interest in 'params'
RooAbsReal* pll = nll->createProfile(params) ;
RooPlot* frame = w::frac.frame() ;
nll->plotOn(frame,ShiftToZero()) ;
pll->plotOn(frame,LineColor(kRed)) ;
• A profile likelihood ratio (numerator: best L for a given p; denominator: best L overall) can be represented by a regular RooFit function (albeit an expensive one to evaluate)
116
Dealing with nuisance parameters in Likelihood ratio intervals
[Figure: Likelihood Ratio vs Profile Likelihood Ratio]
• The Profile Likelihood Ratio minimizes –log(L) for each value of fsig by varying the background shape parameters (a 6th-order Chebychev polynomial)
117
On the equivalence of profile likelihood and MINOS
• Demonstration of equivalence of (RooFit) profile likelihood and MINOS errors
– Macro to make above plots is 34 lines of code (+23 to beautify graphics appearance)
118
9. Intervals & Limits
• A brief introduction to RooStats
119
RooStats Project – Overview
• Goals:
– Standardize interface for major statistical procedures so that they can work on an arbitrary RooFit model & dataset and handle many parameters of interest and nuisance parameters.
– Implement most accepted techniques from Frequentist, Bayesian, and Likelihood-based approaches
– Provide utilities to perform combined measurements
• Design:
– Essentially all methods start with the basic probability density function or likelihood function. Building a good model is the hard part, and we want to re-use it for multiple methods → use RooFit to construct models
– Build series of tools that perform statistical procedures on RooFit models
120
RooStats Project – Structure
• RooFit (data modeling)
– Data modeling language (pdfs and likelihoods). Scales to arbitrary complexity
– Support for efficient integration, toy MC generation
– Workspace
• Persistent container for data models
• Completely self-contained (including custom code)
• Complete introspection and access to components
– Workspace factory provides easy scripting language to populate the workspace
• RooStats (limits, interval calculators & utilities)
– Profile Likelihood calculator
– Neyman construction (FC)
– Bayesian calculator (BAT & native MCMC)
– Utilities (combinations, construct pdfs corresponding to standard number counting problems)
121
RooStats Project – Organization
• Joint ATLAS/CMS project
• Core developers
– K. Cranmer (ATLAS)
– Gregory Schott (CMS)
– Wouter Verkerke (RooFit)
– Lorenzo Moneta (ROOT)
• Open project, you are welcome to join
– Max Baak, Mario Pelliccioni, Alfio Lazzaro contributing now
• Included since ROOT v5.22
– Example macros in $ROOTSYS/tutorials/roostats
• Documentation
– Code doc. via ROOT
– Users’ manual is in development
122
RooStats Project – Example
RooWorkspace* w = new RooWorkspace("w");
w->factory("Poisson::P(obs[150,0,300],"
           "sum::n(s[50,0,120]*ratioSigEff[1.,0,2.],"
           "b[100,0,300]*ratioBkgEff[1.,0.,2.]))");
w->factory("PROD::PC(P, Gaussian::sigCon(ratioSigEff,1,0.05),"
           "Gaussian::bkgCon(ratioBkgEff,1,0.1))");
$\mathrm{Poisson}(x \mid s\,r_s + b\,r_b)\cdot\mathrm{Gauss}(r_s;1,0.05)\cdot\mathrm{Gauss}(r_b;1,0.1)$
RooWorkspace(w) w contents
variables
---------
(b,obs,ratioBkgEff,ratioSigEff,s)
p.d.f.s
-------
RooProdPdf::PC[ P * sigCon * bkgCon ] = 0.0325554
RooPoisson::P[ x=obs mean=n ] = 0.0325554
RooAddition::n[ s * ratioSigEff + b * ratioBkgEff ] = 150
RooGaussian::sigCon[ x=ratioSigEff mean=1 sigma=0.05 ] = 1
RooGaussian::bkgCon[ x=ratioBkgEff mean=1 sigma=0.1 ] = 1
•Create workspace with above model (using factory)
•Contents of workspace from above operation
• Create a model - Example
123
RooStats Project – Example
RooPlot* frame = w::obs.frame(100,200) ;
w::PC.plotOn(frame) ;
frame->Draw() ;
• Simple use of model
124
RooStats Project – Example
ProfileLikelihoodCalculator plc;
plc.SetPdf(w::PC);
plc.SetData(data); // contains [obs=160]
plc.SetParameters(w::s);
plc.SetTestSize(.1);
ConfInterval* lrint = plc.GetInterval(); // that was easy.
FeldmanCousins fc;
fc.SetPdf(w::PC);
fc.SetData(data); fc.SetParameters(w::s);
fc.UseAdaptiveSampling(true);
fc.FluctuateNumDataEntries(false);
fc.SetNBins(100); // number of points to test per parameter
fc.SetTestSize(.1);
ConfInterval* fcint = fc.GetInterval(); // that was easy.
UniformProposal up;
MCMCCalculator mc;
mc.SetPdf(w::PC);
mc.SetData(data); mc.SetParameters(w::s);
mc.SetProposalFunction(up);
mc.SetNumIters(100000); // steps in the chain
mc.SetTestSize(.1); // 90% CL
mc.SetNumBins(50); // used in posterior histogram
mc.SetNumBurnInSteps(40);
ConfInterval* mcmcint = mc.GetInterval();
• Confidence intervals calculated with model
– Profile likelihood
– FeldmanCousins
– Bayesian (MCMC)
125
RooStats Project – Example
double fcul = fcint->UpperLimit(w::s);
double fcll = fcint->LowerLimit(w::s);
• Retrieving and visualizing output
126
RooStats Project – Example
• Some notes on example
– A complete working example (with output visualization) is shipped with the ROOT distribution ($ROOTSYS/tutorials/roostats/rs101_limitexample.C)
– Interval calculators make no assumptions on internal structure of model. Can feed model of arbitrary complexity to same calculator (computational limitations still apply!)
127
Introduction to RooStats
129
130
RooStats
RooStatsTutorial_120323.pdf, https://indico.desy.de/getFile.py/access?contribId=15&resId=3&materialId=slides&confId=5065, slides 1 to 14
131
TMVA
132
133
TMVA
TMVA (Toolkit for Multivariate Data Analysis). Using TMVA as a classifier. Description of TMVAGui.
- Multivariate Methods, by Niklaus Berger
- Statistical methods for data analysis, by L. Lista (Multivariate discriminators with TMVA)
5. Multidimensional models
• Uncorrelated products of p.d.f.s
• Using composition to build p.d.f.s with correlations
• Products of conditional and plain p.d.f.s
134
Building realistic models
[Figure: combining g(x;m,s) and m(y;a0,a1) into g(x,y;a0,a1,s). Possible in any PDF; no explicit support in PDF code needed]
– Multiplication
– Composition
135
Model building – Products of uncorrelated p.d.f.s
RooBMixDecay
RooPolynomial
RooHistPdf
RooArgusBG
RooGaussian
RooProdPdf*
$H(x,y) = F(x)\,G(y)$
136
Uncorrelated products – Mathematics and constructors
2D: $H(x,y) = F(x)\,G(y)$    nD: $H(\{x\}) = \prod_i F_i(\{x\}_i)$
w.factory("Gaussian::gx(x[-5,5],mx[2],sx[1])") ;
w.factory("Gaussian::gy(y[-5,5],my[-2],sy[3])") ;
w.factory("PROD::gxy(gx,gy)") ;
• Mathematical construction of products of uncorrelated p.d.f.s is straightforward
– No explicit normalization required: if the input p.d.f.s are unit normalized, the product is also unit normalized (this is true only because of the absence of correlations)
• Corresponding factory operator is PROD
137
How it works – event generation on uncorrelated products
[Figure: Delegate → Generate → Merge]
• If p.d.f.s are uncorrelated, each observable can be generated separately
– Reduced dimensionality of problem (important for e.g. accept/reject sampling)
– Actual event generation delegated to component p.d.f (can e.g. use internal generator if available)
– RooProdPdf just aggregates output in single dataset
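The delegate/generate/merge scheme can be sketched in standalone form: each observable of $H(x,y)=F(x)\,G(y)$ is sampled by one-dimensional accept/reject, and the values are merged into (x,y) events. Function names, the shapes used in the check and the bounds passed as fMax are assumptions of this sketch, not RooProdPdf internals.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <utility>
#include <vector>

// Accept/reject sampling of a 1D density f on [lo, hi], with f(x) <= fMax.
template <typename F>
double sample1D(F f, double lo, double hi, double fMax, std::mt19937& rng) {
    assert(hi > lo && fMax > 0.0);
    std::uniform_real_distribution<double> ux(lo, hi), uy(0.0, fMax);
    while (true) {
        double x = ux(rng);
        if (uy(rng) < f(x)) return x;
    }
}

// Events from H(x,y) = F(x) G(y): since F and G are uncorrelated,
// each observable is sampled separately in 1D and the results are merged.
template <typename FX, typename FY>
std::vector<std::pair<double, double>> generateProduct(
        FX fx, double xlo, double xhi, double fxMax,
        FY fy, double ylo, double yhi, double fyMax,
        int nEvents, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::pair<double, double>> events;
    events.reserve(nEvents);
    for (int i = 0; i < nEvents; ++i) {
        double x = sample1D(fx, xlo, xhi, fxMax, rng);  // delegate to x component
        double y = sample1D(fy, ylo, yhi, fyMax, rng);  // delegate to y component
        events.emplace_back(x, y);                       // merge into one event
    }
    return events;
}
```

For the gx, gy factory example one would pass the unnormalized Gaussian shapes on [-5,5] with fMax = 1; accept/reject only ever works in one dimension at a time.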
138
Fundamental multi-dimensional p.d.fs
EXPR::mypdf('sqrt(x+y)*sqrt(x-y)',x,y) ;
• It is also possible to define multi-dimensional p.d.f.s that do not arise through a product construction
– For example
– But usually n-dim p.d.f.s are constructed more intuitively through product constructs. Also correlations can be introduced efficiently (more on that in a moment)
• Example of fundamental 2-D B-physics p.d.f. RooBMixDecay
– Two observables: decay time (t, continuous) and mixingState (m, discrete [-1,+1])
139
Plotting multi-dimensional PDFs
TCanvas* c = new TCanvas("c","c",800,400) ;
c->Divide(2) ;
c->cd(1) ;
RooPlot* xframe = x.frame() ;
data->plotOn(xframe) ;
prod->plotOn(xframe) ;
xframe->Draw() ;
c->cd(2) ;
RooPlot* yframe = y.frame() ;
data->plotOn(yframe) ;
prod->plotOn(yframe) ;
yframe->Draw() ;
$f(x) = \int pdf(x,y)\,dy$
$f(y) = \int pdf(x,y)\,dx$
-Plotting a dataset D(x,y) versus x represents a projection over y
-To overlay PDF(x,y), you must plot Int(dy)PDF(x,y)
-RooFit automatically takes care of this!
•RooPlot remembers dimensions of plotted datasets
140
Introduction to slicing
[Figure: slice in x at x = x.getVal(); range in y]
• With multidimensional p.d.f.s it is also often useful to be able to plot a slice of a p.d.f
• In RooFit
– A slice is thin
– A range is thick
• Slices are mostly useful in discrete observables
– A slice in a continuous observable has no width and usually no data with the corresponding cut (e.g. “x=5.234”)
• Ranges work for both continuous and discrete observables
– A range of a discrete observable can be a list of one or more states
141
Plotting a slice of a dataset
// Mixing dataset defines dt,mixState
RooDataSet* data ;
// Plot the entire dataset
RooPlot* frame = dt.frame() ;
data->plotOn(frame) ;
// Plot the mixed part of the data
RooPlot* frame_mix = dt.frame() ;
data->plotOn(frame_mix,
Cut("mixState==mixState::mixed")) ;
• Use the optional cut string expression
– Works the same for binned data sets
142
Plotting a slice of a p.d.f
RooPlot* dtframe = dt.frame() ;
data->plotOn(dtframe,Cut("mixState==mixState::mixed")) ;
bmix.plotOn(dtframe,Slice(mixState,"mixed")) ;
dtframe->Draw() ;
For slices, both data and p.d.f. are normalized with respect to the full dataset. If the fraction ‘mixed’ in the above example disagrees between data and p.d.f. prediction, this discrepancy will show in the plot
143
Plotting a range of a p.d.f and a dataset
RooPlot* xframe = x.frame() ;
data->plotOn(xframe) ;
model.plotOn(xframe) ;
y.setRange("sig",-1,1) ;
RooPlot* xframe2 = x.frame() ;
data->plotOn(xframe2,CutRange("sig")) ;
model.plotOn(xframe2,ProjectionRange("sig")) ;
model(x,y) = gauss(x)*gauss(y) + poly(x)*poly(y)
Works also with >2D projections (just specify projection range on all projected observables)
Works also with multidimensional p.d.fs that have correlations
144
Physics example of combined range and slice plotting
// Plot projection on mB
RooPlot* mbframe = mb.frame(40) ;
data->plotOn(mbframe) ;
model.plotOn(mbframe) ;
// Plot mixed slice projection on deltat
RooPlot* dtframe = dt.frame(40) ;
data->plotOn(dtframe,
Cut("mixState==mixState::mixed")) ;
model.plotOn(dtframe,Slice(mixState,"mixed")) ;
Example setup: Argus(mB)*Decay(dt) (background) + Gauss(mB)*BMixDecay(dt) (signal)
[Figure panels: mB; dt (mixed slice)]
145
Plotting a range - Example
Example setup: Argus(mB)*Decay(dt) (background) + Gauss(mB)*BMixDecay(dt) (signal)
mb.setRange("signal",5.27,5.30) ;
RooPlot* dtframe2 = dt.frame(40) ;
data->plotOn(dtframe2,
Cut("mixState==mixState::mixed"),
CutRange("signal")) ;
model.plotOn(dtframe2,Slice(mixState,"mixed"),
ProjectionRange("signal")) ;
[Figure panels: mB with the “signal” range; dt (mixed slice); dt (mixed slice && “signal” range)]
146
Plotting a range - Example
// Generate 80K toy MC events from p.d.f to be projected
RooDataSet *toyMC =
model.generate(RooArgSet(dt,mixState,tagFlav,mb),80000);
// Apply desired cut on toy MC data
RooDataSet* mbSliceToyMC = (RooDataSet*) toyMC->reduce("mb>5.27");
// Plot the p.d.f. projection, averaging over the selected toy MC data
model.plotOn(dtframe2,Slice(mixState),ProjWData(mb,*mbSliceToyMC)) ;
$$\int M(x,y,z)\,dy\,dz \approx \frac{1}{N}\sum_{i \in D(y,z)} M(x,y_i,z_i)$$
• We can also plot the finite-width slice with a different technique: toy MC integration
147
Plotting non-rectangular PDF regions
[Figure: projection over a ‘donut’-shaped region, selected with a cut comparing $(x-5)^2+(y-3)^2$ with 4]
• Why is this interesting? Because with this technique we can trivially implement projection over arbitrarily shaped regions.
– Any cut prescription that you can think of to apply to data works
• Example: Likelihood ratio projection plot
– Common technique in rare decay analyses
– The PDF typically consists of an N-dimensional event selection PDF, where N is large (e.g. 6)
– Projection of data & PDF in any of the N dimensions doesn’t show a significant excess of signal events
– To demonstrate purity of selected signal, plot data distribution (with overlaid PDF) in one dimension, while selecting events with a cut on the likelihood ratio of signal and background in the remaining N-1 dimensions
148
Likelihood ratio plots
$$LR(x,y,z) = \frac{f\,S(x,y,z)}{f\,S(x,y,z) + (1-f)\,B(x,y,z)}$$
• Integrate over x:
$$LR(y,z) = \frac{\int f\,S(x,y,z)\,dx}{\int \left[f\,S(x,y,z) + (1-f)\,B(x,y,z)\right] dx}$$
• Plot LR vs (y,z)
• Idea: use information on S/(S+B) ratio in projected observables to define a cut
• Example: generalize previous toy model to 3 dimensions
• Express information on S/(S+B) ratio of model in terms of integrals over model components
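A minimal sketch of the projected likelihood ratio LR(y,z) defined on this slide: the x integrals of the signal and background densities are done numerically with the midpoint rule. S and B are caller-supplied normalized densities; the names and the integration settings are illustrative.

```cpp
#include <cassert>
#include <cmath>

// LR(y,z) = int f S(x,y,z) dx / int [f S + (1-f) B](x,y,z) dx,
// with the x integrals done by the midpoint rule on [xlo, xhi] (n steps).
template <typename FS, typename FB>
double projectedLR(FS sig, FB bkg, double fsig, double y, double z,
                   double xlo, double xhi, int n = 2000) {
    assert(fsig >= 0.0 && fsig <= 1.0 && xhi > xlo && n > 0);
    const double h = (xhi - xlo) / n;
    double intS = 0.0, intB = 0.0;
    for (int i = 0; i < n; ++i) {
        double x = xlo + (i + 0.5) * h;
        intS += sig(x, y, z) * h;
        intB += bkg(x, y, z) * h;
    }
    double num = fsig * intS;
    double den = num + (1.0 - fsig) * intB;
    return den > 0.0 ? num / den : 0.0;
}
```

An event with (y,z) is then kept if projectedLR(...) exceeds the chosen purity, e.g. 0.5.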
149
Likelihood ratio plots
$$\int_{LR(y,z)>0.5} M(x,y,z)\,dy\,dz \approx \frac{1}{N}\sum_{i \in D(y,z)} M(x,y_i,z_i)$$
where D(y,z) is a dataset with values of (y,z) sampled from the p.d.f. and filtered for events that meet LR(y,z)>0.5
[Figure panels: all events; only LR(y,z)>0.5]
• Decide on an s/(s+b) purity contour of LR(y,z)
– Example s/(s+b) > 50%
• Plot both data and model with corresponding cut.
– For data: calculate LR(y,z) for each event, plot only events with LR>0.5
– For model: using Monte Carlo integration technique:
150
Likelihood ratio plot on model with correlations
151
Likelihood ratio plots – Coded example
// Construct likelihood ratio in projection on (y,z)
w.factory("expr::LR('fsig*psig/ptot',fsig,"
          "PROJ::psig(sig,x),PROJ::ptot(model,x))") ;
// Generate toy dataset for MC integration over region with LR>68%
RooDataSet* tmpdata = model.generate(RooArgSet(x,y,z),10000) ;
tmpdata->addColumn(*w.function("LR")) ;
RooDataSet* projdata = (RooDataSet*) tmpdata->reduce(Cut("LR>0.68")) ;
// Add LR to observed data so we can cut on it
data->addColumn(*w.function("LR")) ;
RooDataSet* seldata = (RooDataSet*) data->reduce(Cut("LR>0.68")) ;
// Make plot for data and pdf
RooPlot* frame3 = x.frame(Title("Projection with LR(y,z)>68%")) ;
seldata->plotOn(frame3) ;
model.plotOn(frame3,ProjWData(*projdata)) ;
$$LR(y,z) = \frac{\int f\,S(x,y,z)\,dx}{\int \left[f\,S(x,y,z) + (1-f)\,B(x,y,z)\right] dx}$$
152
Plotting in more than 2,3 dimensions
TH1* ph2 = pdf.createHistogram("ph2",x,YVar(y)) ;
TH1* dh2 = data.createHistogram("dh2",x,Binning(10),
YVar(y,Binning(10)));
ph2->Draw("SURF") ;
dh2->Draw("LEGO") ;
• No equivalent of RooPlot for >1 dimensions
– Usually >1D plots are not overlaid anyway
• Easy to use createHistogram() methods provided in both RooAbsData and RooAbsPdf to fill ROOT 2D,3D histograms
153
Building models – Introducing correlations
$f(x;p) \rightarrow f(x, p(y,q)) = f(x,y;q)$
• Easiest way to do this is
– start with a 1-dim p.d.f. and change one of its parameters into a function that depends on another observable
– Natural way to think about it
• Example problem
– Observable is reconstructed mass M of some object.
– Fitting a Gaussian g(M,mean,sigma) plus some background to the dataset D(M)
– But reconstructed mass has bias depending on some other observable X
– Rewrite the fit function as g(M,meanCorr(mtrue,X,alpha),sigma), where meanCorr is an (empirical) function that corrects for the bias depending on X
154
Introducing correlations through composition
$f(x;p) \rightarrow f(x, p(y,q)) = f(x,y;q)$
w.factory("expr::mean('a*y+b',y[-10,10],a[0.7],b[0.3])") ;
w.factory("Gaussian::g(x[-10,10],mean,sigma[3])") ;
• RooFit pdf building blocks do not require variables as input, just real-valued functions
– Can substitute any variable with a function expression in parameters and/or observables
– Example: Gaussian with shifting mean
– No assumption made in function on a,b,x,y being observables or parameters, any combination will work
155
What does the example p.d.f look like?
Projection on Y
Projection on X
• Use example model with x,y as observables
• Note the flat distribution in y, which is unlikely to describe data. Solutions:
1. Use as conditional p.d.f g(x|y,a,b)
2. Use in conditional form multiplied by another pdf in y: g(x|y)*h(y)
156
Conditional p.d.f.s – Formulation and construction
$$F(x|y;p) = \frac{f(x,y;p)}{\int f(x,y;p)\,dx}$$
• Mathematical formulation of a conditional p.d.f
– A conditional p.d.f is not normalized w.r.t its conditional observables
– Note that the denominator in the above expression depends on y and is thus in general different for each event
• Constructing a conditional p.d.f in RooFit
– Any RooFit p.d.f can be used as a conditional p.d.f as objects have no internal notion of distinction between parameters, observables and conditional observables
– Observables that should be used as conditional observables have to be specified in use context (generation, plotting, fitting etc…)
157
Method 1 – Using a conditional p.d.f – fitting and plotting
pdf.fitTo(data,ConditionalObservables(y))
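$$F(x|y) = \frac{f(x,y)}{\int f(x,y)\,dx}$$
$$P_D(x) = \frac{1}{N}\sum_{i=1}^{N}\frac{p(x,y_i)}{\int p(x,y_i)\,dx} \quad \text{(sum over all } y_i \text{ in dataset } D\text{)}$$
$$P_p(x) = \frac{\int p(x,y)\,dy}{\int\!\!\int p(x,y)\,dx\,dy} \quad \text{(integrate over } y\text{)}$$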
• For fitting, indicate in fitTo() call what the conditional observables are
– You may notice a performance penalty if the normalization integral of the p.d.f. needs to be calculated numerically: for a conditional p.d.f. it must be evaluated again for each event
• Plotting: you cannot project a conditional F(x|y) on x without external information on the distribution of y
– Substitute integration with averaging over y values in data
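Averaging over the data replaces the missing model for y: $P(x) \approx \frac{1}{N}\sum_i F(x|y_i)$. A minimal sketch, where F is assumed to be already normalized over x for every y (as a conditional p.d.f. must be); the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Projection of a conditional pdf F(x|y) on x, averaging over the y values
// found in the dataset instead of integrating over a (missing) model for y.
template <typename CondPdf>
double projectOverData(CondPdf f, double x, const std::vector<double>& yValues) {
    assert(!yValues.empty());
    double sum = 0.0;
    for (double y : yValues) sum += f(x, y);  // F(x|y_i) for each event
    return sum / static_cast<double>(yValues.size());
}
```

Each evaluation costs one call of F per event in the dataset, which is why projecting over large datasets can be slow.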
158
How it works – event generation with conditional p.d.f.s
• Just like plotting, event generation of conditional p.d.f.s requires external input on the conditional observables
– Given an external input dataset P(dt)
– For each event in P, set the value of dt in F(t|dt) to dt_i and generate one event for observable t from F(t|dt_i)
– Store both t_i and dt_i in the output dataset
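A sketch of that loop for a decay-time model with per-event errors, assuming F(t|dt) is an exponential decay with lifetime τ smeared by a Gaussian of width scale × dt. Drawing an exponential decay time and adding independent Gaussian noise samples exactly this convolution. The function name, τ and the scale parameter are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// One t per event of an external dataset of per-event errors dt_i, drawn from
// F(t|dt_i) = Exp(t; tau) convolved with Gauss(0, scale*dt_i). Returns (t_i, dt_i).
std::vector<std::pair<double, double>> generateConditional(
        const std::vector<double>& dtValues, double tau, double scale, unsigned seed) {
    assert(tau > 0.0 && scale > 0.0);
    std::mt19937 rng(seed);
    std::exponential_distribution<double> decay(1.0 / tau);  // rate = 1/tau
    std::normal_distribution<double> unitGauss(0.0, 1.0);
    std::vector<std::pair<double, double>> out;
    out.reserve(dtValues.size());
    for (double dt : dtValues) {
        double t = decay(rng) + scale * dt * unitGauss(rng);  // sample F(t|dt_i)
        out.emplace_back(t, dt);                               // keep t_i and dt_i
    }
    return out;
}
```

The output keeps the input distribution of dt untouched; only t is generated.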
159
Physics example with conditional p.d.f.s
$F(t) = D(t;\tau) \otimes R(t;\mu,\sigma)$
$F(t|dt) = D(t;\tau) \otimes R(t;\mu,\sigma\cdot dt)$
• Want to fit the decay time distribution of B0 mesons (exponential) convolved with a Gaussian resolution
• However, the resolution on the decay time varies from event to event (e.g. more or fewer tracks available).
– We have in the data an error estimate dt for each measurement from the decay vertex fitter (“per-event error”)
– Incorporate this information into the physics model
– The resolution in the physics model is adjusted for each event to the expected error.
– An overall scale factor can account for incorrect vertex error estimates (i.e. if it is fitted >1, then dt underestimates the true error)
– The physics p.d.f. must be used as a conditional p.d.f. because it gives no sensible prediction on the distribution of the per-event errors
160
Physics example with conditional p.d.f.s
$F(t|dt) = D(t;\tau) \otimes R(t;\mu,\sigma\cdot dt)$
[Figure panels: small dt; large dt]
// Plotting of decay(t|dterr)
RooPlot* frame2 = dt.frame() ;
data->plotOn(frame2) ;
decay_gm1.plotOn(frame2,ProjWData(*data)) ;
$$P_D(x) = \frac{1}{N}\sum_{i=1}^{N}\frac{p(x,y_i)}{\int p(x,y_i)\,dx}$$
Note that projecting over large datasets can be slow. You can speed this up by projecting with a binned copy of the projection data
• Some illustrations of decay model with per-event errors
– Shape of F(t|dt) for several values of dt
• Plot of D(t) and F(t|dt) projected over dt
161
Method 2 – Building products with conditional pdfs
• Use of conditional pdf in fitting, plotting, event generation has some practical drawbacks
– Need external dataset with distribution in conditional observable in all operations
• But there is also a fundamental issue
– If your model has both a signal and a background component, the model assumes that the distribution of the conditional observable (e.g. the per-event error) is the same for signal and background
– This may not be a valid assumption (‘Punzi effect’)
– Way out: Construct a product F(x|y)*G(y) separately for signal and background
162
Example with product of conditional and plain p.d.f.
// I - Use g as conditional pdf g(x|y)
w::g.fitTo(data,ConditionalObservables(w::y)) ;
// II - Construct product with another pdf in y
w.factory("Gaussian::h(y,0,2)") ;
w.factory("PROD::gxy(g|y,h)") ;
$model(x,y) = g_x(x|y)\cdot g_y(y)$; projection on x: $\int g_x(x|y)\,g_y(y)\,dy$
163
Example with product of conditional and plain p.d.f.
$F(t,dt) = S(t|dt)\,s(dt) + B(t|dt)\,b(dt)$
• Following the ‘conditional product’ formalism you can now choose different distributions for the conditional observable for signal and background e.g.
• At this point F(t,dt) is a plain pdf: fitting, plotting and event generation work ‘as usual’ without external input
• You may want to use an empirical pdf for s(dt) or b(dt) if these distributions are difficult to model
– Histogram based pdf (RooHistPdf)
– Kernel estimation pdf (RooKeysPdf): see next slide
164
Special pdfs – Kernel estimation model
[Figure: sample of events → Gaussian pdf for each event → summed pdf for all events. Adaptive kernel: the width of each Gaussian depends on the local event density]
w.import(myData,Rename("myData")) ;
w.factory("KeysPdf::k(x,myData)") ;
• Kernel estimation model
– Construct smooth pdf from unbinned data, using kernel estimation technique
• Example
• Also available for n-D data
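A minimal fixed-width version of kernel estimation: one normalized Gaussian of width h per event, summed and divided by N. RooKeysPdf is adaptive (the width follows the local event density), so this is a simplification; the rule-of-thumb bandwidth helper (Silverman's rule) is an assumption of this sketch, not RooKeysPdf's choice.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Fixed-width Gaussian kernel density estimate at x.
double kernelDensity(const std::vector<double>& events, double h, double x) {
    assert(!events.empty() && h > 0.0);
    const double norm = 1.0 / (std::sqrt(2.0 * std::acos(-1.0)) * h * events.size());
    double sum = 0.0;
    for (double xi : events) {
        double u = (x - xi) / h;
        sum += std::exp(-0.5 * u * u);  // one Gaussian per event
    }
    return norm * sum;
}

// Silverman's rule of thumb: h = (4 / (3N))^(1/5) * sample standard deviation.
double silvermanBandwidth(const std::vector<double>& events) {
    assert(events.size() > 1);
    const double n = static_cast<double>(events.size());
    double mean = 0.0;
    for (double x : events) mean += x;
    mean /= n;
    double var = 0.0;
    for (double x : events) var += (x - mean) * (x - mean);
    var /= (n - 1.0);
    return std::pow(4.0 / (3.0 * n), 0.2) * std::sqrt(var);
}
```

Because every kernel is normalized, the sum divided by N integrates to one without any further normalization step.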
165
6. Fit validation, Toy MC studies
• Goodness-of-fit, χ²
• Toy Monte Carlo studies for fit validation
166
How do you know if your fit was ‘good’
• Goodness-of-fit is a broad issue in statistics in general; we will just focus on a few specific tools implemented in RooFit here
• For one-dimensional fits, a χ² is usually the right thing to do
– Some tools are implemented in RooPlot to calculate the χ²/ndf of a curve w.r.t. the data
double chi2 = frame->chisquare(nFloatParam) ;
– Tools also exist to plot residual and pull distributions from a curve and a histogram in a RooPlot
frame->makePullHist() ;
frame->makeResidHist() ;
167
GOF in >1D, other aspects of fit validity
• No special tools for >1 dimensional goodness-of-fit
– A χ² usually doesn’t work because empty bins proliferate with dimensions
– But if you have ideas you’d like to try, there exist generic base classes for implementation that provide the same level of computational optimization and parallelization as is done for likelihoods (RooAbsOptTestStatistic)
• But you can study many other aspects of your fit validity
– Is your fit unbiased?
– Does it (often) have convergence problems?
• You can answer these with a toy Monte Carlo study
– I.e. generate 10000 samples from your p.d.f., fit them all and collect and analyze the statistics of these 10000 fits.
– The RooMCStudy class helps out with the logistics
168
Advanced features – Task automation
[Figure: input model → generate toy MC → fit model, repeated N times → accumulate fit statistics → distributions of parameter values, parameter errors, parameter pulls]
// Instantiate MC study manager
RooMCStudy mgr(inputModel) ;
// Generate and fit 100 samples of 1000 events
mgr.generateAndFit(100,1000) ;
// Plot distribution of sigma parameter
mgr.plotParam(sigma)->Draw() ;
• Support for routine task automation, e.g. goodness-of-fit study
169
How to efficiently generate multiple sets of ToyMC?
• Use RooMCStudy class to manage generation and fitting
• Generating features
– Generator overhead is only incurred once, so it is efficient for a large number of small samples
– Optional Poisson distribution for #events of generated experiments
– Optional automatic creation of ASCII data files
• Fitting
– Fit with generator PDF or different PDF
– Fit results (floating parameters & NLL) automatically collected in summary dataset
• Plotting
– Automated plotting for distribution of parameters, parameter errors, pulls and NLL
• Add-in modules for optional modifications of procedure
– Concrete tools for variation of generation parameters, calculation of likelihood ratios for each experiment
– Easy to write your own. You can intervene at any stage and offer proprietary data to be aggregated with fit results
170
A RooMCStudy example
// Setup PDF
RooRealVar x("x","x",-5,15) ;
RooRealVar mean("mean","mean of gaussian",-1) ;
RooRealVar sigma("sigma","width of gaussian",4) ;
RooGaussian gauss("gauss","gaussian PDF",x,mean,sigma) ;
// Create manager
RooMCStudy mgr(gauss,gauss,x,"","mhv") ;
// Generate and fit 1000 experiments of 100 events each
mgr.generateAndFit(1000,100) ;
RooMCStudy::run: Generating and fitting sample 999
RooMCStudy::run: Generating and fitting sample 998
RooMCStudy::run: Generating and fitting sample 997
…
[Annotations of the constructor arguments, in order: generator PDF, fitting PDF, observables, generator options, fitting options]
• Generating and fitting a simple PDF
171
A RooMCStudy example
// Plot the distribution of the value
RooPlot* mframe = mean.frame(-2,0) ;
mgr.plotParamOn(mframe) ;
mframe->Draw() ;
// Plot the distribution of the error
RooPlot* meframe = mgr.plotError(mean,0.,0.1) ;
meframe->Draw() ;
// Plot the distribution of the pull
RooPlot* mpframe = mgr.plotPull(mean,-3,3,40,kTRUE) ;
mpframe->Draw() ;
Add Gaussian fit
• Plot the distribution of the value, error and pull of mean
172
A RooMCStudy example
// Plot the distribution of the NLL
mgr.plotNLL(mframe) ;
mframe->Draw() ;
• Plot the distribution of –log(L)
• NB: likelihood distributions cannot be used to deduce goodness-of-fit information!
173
A RooMCStudy example
mgr.fitParDataSet().get(10)->Print("v") ;
RooArgSet:::
1) RooRealVar::mean : 0.14814 +/- 0.191 L(-10 - 10)
2) RooRealVar::sigma : 4.0619 +/- 0.143 L(0 - 20)
3) RooRealVar::NLL : 2585.1 C
4) RooRealVar::meanerr : 0.19064 C
5) RooRealVar::meanpull : 0.77704 C
6) RooRealVar::sigmaerr : 0.14338 C
7) RooRealVar::sigmapull : 0.43199 C
TH2* h = mean.createHistogram("mean vs sigma",sigma) ;
mgr.fitParDataSet().fillHistogram(h,RooArgList(mean,sigma)) ;
h->Draw("BOX") ;
Pulls and errors have separate entries for easy access and plotting
• For other uses, use summarized fit results in RooDataSet form
174
Fit Validation Study – Practical example
$F(m; N_S, N_B, p_S, p_B) = N_S\,G(m; p_S) + N_B\,A(m; p_B)$
[Figure: distribution of Nsig(fit), with Nsig(generated) indicated]
• Example fit model in 1-D (B mass)
– Signal component is Gaussian centered at B mass
– Background component is Argus function (models phase space near kinematic limit)
• Fit parameter under study: Nsig
– Results of simulation study: 1000 experiments with NSIG(gen)=100, NBKG(gen)=200
– Distribution of Nsig(fit)
– This particular fit looks unbiased…
175
Fit Validation Study – The pull distribution
$$\mathrm{pull}(N_{sig}) = \frac{N_{sig}^{fit} - N_{sig}^{true}}{\sigma(N_{sig})}$$
[Figure: distribution of pull(Nsig)]
• What about the validity of the error?
– Distribution of error from simulated experiments is difficult to interpret…
– We don’t have equivalent of Nsig(generated) for the error
• Solution: look at the pull distribution
– Definition:
– Properties of pull:
• Mean is 0 if there is no bias
• Width is 1 if error is correct
– In this example: no bias, correct error within statistical precision of study
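The pull logic can be checked with a standalone toy study: generate many experiments, "fit" each one and collect $(\hat\mu - \mu_{true})/\sigma_{\hat\mu}$. To keep the sketch short it assumes a Gaussian model with known width, for which the ML estimate of the mean is the sample mean with error $\sigma/\sqrt{N}$; the names and numbers are illustrative. For this model the pull mean should come out near 0 and the width near 1.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct PullSummary { double mean; double width; };

// Toy MC study: nExp experiments of nEvents Gaussian events each.
PullSummary gaussianMeanPullStudy(int nExp, int nEvents, double muTrue,
                                  double sigma, unsigned seed) {
    assert(nExp > 1 && nEvents > 0 && sigma > 0.0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> gen(muTrue, sigma);
    std::vector<double> pulls;
    pulls.reserve(nExp);
    for (int e = 0; e < nExp; ++e) {
        double sum = 0.0;
        for (int i = 0; i < nEvents; ++i) sum += gen(rng);       // generate
        double muFit = sum / nEvents;                             // ML estimate
        double err = sigma / std::sqrt(static_cast<double>(nEvents));
        pulls.push_back((muFit - muTrue) / err);                  // pull
    }
    double m = 0.0;
    for (double p : pulls) m += p;
    m /= nExp;
    double v = 0.0;
    for (double p : pulls) v += (p - m) * (p - m);
    v /= (nExp - 1);
    return {m, std::sqrt(v)};
}
```

A pull mean away from 0 would indicate a bias, and a width different from 1 an incorrect error estimate.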
176
Fit Validation Study – Low statistics example
• Special care should be taken when fitting small data samples
– Also if fitting for small signal component in large sample
• Possible causes of trouble
– χ² estimators may become approximate as the Gaussian approximation of Poisson statistics becomes inaccurate
– ML estimators may no longer be efficient → the error estimate from the 2nd derivative may become inaccurate
– The bias term of ML and χ² estimators, proportional to 1/N, may no longer be small compared to 1/sqrt(N)
• In general, absence of bias and correctness of the error cannot be assumed. How to proceed?
– Use unbinned ML fits only – most robust at low statistics
– Explicitly verify the validity of your fit
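The 1/N bias mentioned above can be illustrated with a textbook case: the ML estimator of a Gaussian variance divides by N and has expectation (N-1)/N·σ², i.e. a bias of -σ²/N. The sketch below (plain C++, hypothetical function names, fixed random seed) averages this estimator over toy experiments; for N = 5 the bias is about 20%, for N = 100 about 1%.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// ML estimate of a Gaussian variance: divides by N, not N-1.
double mlVariance(const std::vector<double>& x) {
    assert(!x.empty());
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= x.size();
    double sq = 0.0;
    for (double v : x) sq += (v - mean) * (v - mean);
    return sq / x.size();
}

// Average ML variance estimate over many toy experiments with nEvents each,
// generated from a unit Gaussian. Expected value: (N-1)/N, i.e. bias = -1/N.
double averageMlVariance(std::size_t nEvents, std::size_t nToys, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::vector<double> sample(nEvents);
    double sum = 0.0;
    for (std::size_t t = 0; t < nToys; ++t) {
        for (double& v : sample) v = gauss(rng);
        sum += mlVariance(sample);
    }
    return sum / nToys;
}
```

The statistical error on the variance itself scales like 1/sqrt(N), so at small N the 1/N bias is no longer negligible by comparison, which is exactly the situation a toy study should expose.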
177
Demonstration of fit bias at low N – pull distributions
[Plots: distributions of NSIG(fit), σ(NSIG) and pull(NSIG) for NSIG(gen)=20, NBKG(gen)=200]
Distributions become asymmetric at low statistics
Pull mean ~2σ away from 0 → fit is positively biased!
• Low statistics example:
– Scenario as before but now with 200 bkg events and only 20 signal events (instead of 100)
• Results of simulation study
• Absence of bias, correct error at low statistics not obvious
178
New developments for automated studies
• A new alternative framework is being put in place to replace class RooMCStudy.
– Class RooStudyManager manages logistics of repeated studies, but does not implement content of study.
– Abstract concept of study interfaced through class RooAbsStudy
– Class RooGenFitStudy manages implementation of ‘generate-and-fit’ style studies (functionality of RooMCStudy)
• Greater flexibility in choice of study (you can put in anything you want)
• Support for multiple backend implementations
– Inline calculation (as done in RooMCStudy)
– Parallelized execution through PROOF (lite)
– Almost complete automation of support for batch submission
– Just need to change one line of your macro to change back-end
179
Demo of parallelization with PROOF-lite
RooStudyManager mcs(*w,gfs) ;
mcs.run(1000) ; // inline running
mcs.runProof(1000,"") ; // empty string is PROOF-lite
mcs.prepareBatchInput("default",1000,kTRUE) ;
• Example – Factor 8 speed up on a dual-quad core box.
– Works with out-of-the box ROOT distribution
– Also: graceful early termination when the user presses ‘Stop’
• Much larger gains can be made with ‘real’ PROOF farms
180
Constructing joint models
• Using discrete variables to classify data
• Simultaneous fits on multiple datasets
181
Datasets and discrete observables
Dataset A: X = 5.0, 3.7, 1.2, 4.3
Dataset B: X = 5.0, 3.7, 1.2
Dataset A+B:
X     source
5.0   A
3.7   A
1.2   A
4.3   A
5.0   B
3.7   B
1.2   B
• Discrete observables play an important role in management of datasets
– Useful to classify ‘sub datasets’ inside datasets
– Can collapse multiple, logically separate datasets into a single dataset by adding them and labeling the source with a discrete observable
– Allows operations such as simultaneous fits to be expressed as operations on a single dataset
182
Discrete variables in RooFit – RooCategory
// Define a cat. with explicitly numbered states
w.factory("b0flav[B0=-1,B0bar=1]") ;
// Define a category with labels only
w.factory("tagCat[Lepton,Kaon,NT1,NT2]") ;
w.factory("sample[CPV,BMixing]") ;
• Properties of RooCategory variables
– Finite set of named states → self-documenting
– Optional integer code associated with each state
• Used for classification of data, or to describe occasional discrete fundamental observable (e.g. B0 flavor)
183
Datasets and discrete observables – part 2
RooDataSet simdata("simdata","simdata",x,Index(source),
                   Import("A",*dataA),Import("B",*dataB)) ;
• Example of constructing a joint dataset from 2 inputs
• But can also derive classification from info within dataset
– E.g. (10<x<20 = “signal”, 0<x<10 | 20<x<30 = “sideband”)
– Encode classification using real → discrete mapping functions
184
A universal real → discrete mapping function
// Mass variable
RooRealVar m("m","mass",0,10.) ;
// Define threshold category ("Background" is the default state)
RooThresholdCategory region("region","Region of M",m,"Background") ;
// Define region boundaries
region.addThreshold(9.0,"SideBand") ;
region.addThreshold(7.9,"Signal") ;
region.addThreshold(6.1,"SideBand") ;
region.addThreshold(5.0,"Background") ;
[Figure: m axis divided into signal, sideband and background regions]
• Class RooThresholdCategory maps ranges of input RooRealVar to states of a RooCategory
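The mapping performed by the threshold category can be sketched in plain C++ as follows. This is not RooFit code: ThresholdMap is a hypothetical stand-in, and it assumes that each addThreshold(limit, state) assigns state to values below limit (and above the next lower threshold), with the constructor's default state used at and above the highest threshold, which is how the region definitions above read.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a threshold category (not RooFit code).
class ThresholdMap {
public:
    using Threshold = std::pair<double, std::string>;

    explicit ThresholdMap(std::string defaultState) : default_(std::move(defaultState)) {}

    // Each threshold is the upper limit of its state.
    void addThreshold(double upperLimit, const std::string& state) {
        assert(!state.empty());
        thresholds_.emplace_back(upperLimit, state);
        // Keep thresholds in increasing order, whatever the insertion order.
        std::sort(thresholds_.begin(), thresholds_.end(),
                  [](const Threshold& a, const Threshold& b) { return a.first < b.first; });
    }

    // First threshold above the value decides the state; otherwise the default.
    const std::string& map(double value) const {
        for (const Threshold& t : thresholds_) {
            if (value < t.first) return t.second;
        }
        return default_;
    }

private:
    std::string default_;
    std::vector<Threshold> thresholds_;
};
```

Used with the boundaries of the example above, values below 5.0 and above 9.0 end up in "Background", 6.1 to 7.9 in "Signal", and the two bands in between in "SideBand".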
185
Discrete multiplication function
// Define ‘product’ of tag and flav
RooSuperCategory prod("prod","prod",RooArgSet(tag,flav)) ;
flav {B0, B0bar} × tag {Lepton, Kaon, NT1, NT2} → prod:
{B0;Lepton} {B0bar;Lepton}
{B0;Kaon} {B0bar;Kaon}
{B0;NT1} {B0bar;NT1}
{B0;NT2} {B0bar;NT2}
• RooSuperCategory/RooMultiCategory provides category multiplication
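The state labels of a category product are simply the Cartesian product of the input labels. A minimal sketch, with a hypothetical helper productStates that reproduces the {flav;tag} labels listed above (the real RooSuperCategory also assigns integer codes, which are omitted here):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: labels of the product of two categories,
// formatted as {a;b} and ordered as in the table above.
std::vector<std::string> productStates(const std::vector<std::string>& first,
                                       const std::vector<std::string>& second) {
    assert(!first.empty() && !second.empty());
    std::vector<std::string> states;
    states.reserve(first.size() * second.size());
    for (const std::string& b : second) {      // outer loop: second category
        for (const std::string& a : first) {   // inner loop: first category
            states.push_back("{" + a + ";" + b + "}");
        }
    }
    return states;
}
```

The number of product states is the product of the input state counts (here 2 × 4 = 8).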
186
Discrete → discrete mapping function
// Define input category
RooCategory tagCat("tagCat","Tagging category") ;
tagCat.defineType("Lepton") ;
tagCat.defineType("Kaon") ;
tagCat.defineType("NetTagger-1") ;
tagCat.defineType("NetTagger-2") ;
// Create mapped category
RooMappedCategory tagType("tagType","type",tagCat) ;
// Add mapping rules (wildcard expressions allowed)
tagType.map("Lepton","CutBased") ;
tagType.map("Kaon","CutBased") ;
tagType.map("NetTagger*","NeuralNet") ;
[Diagram: tagCat states Lepton, Kaon → tagType CutBased; NT1, NT2 → tagType NeuralNet]
• RooMappedCategory provides cat → cat mapping
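A category-to-category mapping is a list of rules applied to each input state. The sketch below is a hypothetical MappedCategory class, not RooMappedCategory itself; as a simplifying assumption it only supports exact labels and a single trailing * wildcard, the first matching rule wins, and unmatched inputs get a default state.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of cat -> cat mapping (not RooMappedCategory's logic).
class MappedCategory {
public:
    explicit MappedCategory(std::string defaultState) : default_(std::move(defaultState)) {}

    // Add a rule: input label (or "prefix*") -> output state.
    void map(const std::string& pattern, const std::string& outState) {
        assert(!pattern.empty());
        rules_.emplace_back(pattern, outState);
    }

    // First matching rule wins; unmatched input states get the default.
    const std::string& apply(const std::string& inState) const {
        for (const auto& rule : rules_) {
            if (matches(rule.first, inState)) return rule.second;
        }
        return default_;
    }

private:
    // Exact match, or prefix match when the pattern ends with '*'.
    static bool matches(const std::string& pattern, const std::string& s) {
        if (pattern.back() == '*') {
            const std::string prefix = pattern.substr(0, pattern.size() - 1);
            return s.compare(0, prefix.size(), prefix) == 0;
        }
        return pattern == s;
    }

    std::string default_;
    std::vector<std::pair<std::string, std::string>> rules_;
};
```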
187
Exploring discrete data
// Tabulate contents of dataset by category state
RooTable* table=data->table(b0flav) ;
table->Print() ;
Table b0flav : aData
+-------+------+
| B0    | 4949 |
| B0bar | 5051 |
+-------+------+
// Extract contents by label
Double_t nB0 = table->get("B0") ;
// Extract contents fraction by label
Double_t b0Frac = table->getFrac("B0") ;
// Tabulate contents of selected part of dataset
data->table(tagCat,"x>8.23")->Print() ;
Table tagCat : aData(x>8.23)
+-------------+-----+
| Lepton      | 668 |
| Kaon        | 717 |
| NetTagger-1 | 632 |
| NetTagger-2 | 616 |
+-------------+-----+
• Just as real variables of a dataset can be plotted, discrete variables can be tabulated
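Tabulation amounts to counting entries per state. A minimal sketch with hypothetical helpers tabulate and fraction, loosely mirroring what RooTable's get() and getFrac() return in the example above (plain C++, no RooFit):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Count the entries of a discrete column per state label.
std::map<std::string, int> tabulate(const std::vector<std::string>& column) {
    std::map<std::string, int> table;
    for (const std::string& state : column) ++table[state];
    return table;
}

// Fraction of entries in one state (0 if the state does not occur).
double fraction(const std::map<std::string, int>& table, const std::string& state) {
    int total = 0;
    for (const auto& entry : table) total += entry.second;
    assert(total > 0);
    const auto it = table.find(state);
    return it == table.end() ? 0.0 : static_cast<double>(it->second) / total;
}
```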
188
Exploring discrete data
// Tabulate RooSuperCategory states
data->table(b0Xtcat)->Print() ;
Table b0Xtcat : aData
+---------------------+------+
| {B0;Lepton}         | 1226 |
| {B0bar;Lepton}      | 1306 |
| {B0;Kaon}           | 1287 |
| {B0bar;Kaon}        | 1270 |
| {B0;NetTagger-1}    | 1213 |
| {B0bar;NetTagger-1} | 1261 |
| {B0;NetTagger-2}    | 1223 |
| {B0bar;NetTagger-2} | 1214 |
+---------------------+------+
// Tabulate RooMappedCategory states
data->table(tcatType)->Print() ;
Table tcatType : aData
+----------------+------+
| Unknown        |    0 |
| Cut based      | 5089 |
| Neural Network | 4911 |
+----------------+------+
• Discrete functions built from categories in a dataset can be tabulated likewise
189
Fitting multiple datasets simultaneously
• Simultaneous fitting is an efficient way to incorporate information from a control sample into the signal sample
• Example problem: search for a rare decay
– The signal dataset has a small number of entries.
– The statistical uncertainty on the shape in the fit contributes significantly to the uncertainty on the fitted number of signal events
– However, the signal shape can be constrained from a control sample (e.g. another decay with similar properties that is not rare), so there is no need to rely on simulations
190
Fitting multiple datasets simultaneously
• Fit to control sample yields accurate information on shape of signal
• Q: What is the most practical way to combine the shape measurement on the control sample with the measurement of the signal on the physics sample of interest?
• A: Perform a simultaneous fit
– Automatic propagation of errors & correlations
– Combined measurement (i.e. the error will reflect contributions from both the physics sample and the control sample)
191
Discrete observable as data subset classifier
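\log L = \sum_{i=1}^{n} \log \mathrm{PDF}_A(\vec{D}_{A,i}) + \sum_{i=1}^{m} \log \mathrm{PDF}_B(\vec{D}_{B,i})
(first sum: ‘SIG’ sample A; second sum: ‘CTL’ sample B; the combined -log(L) is minimized)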
• Likelihood level definition of a simultaneous fit
• Minimize -logL(a,b,c)= -logL(a,b)+ -logL(b,c)
– Errors, correlations on common par. b automatically propagated
192
Discrete observable as data subset classifier
\log L = \sum_{i=1}^{n} \log \mathrm{PDF}_A(\vec{D}_{A,i}) + \sum_{i=1}^{m} \log \mathrm{PDF}_B(\vec{D}_{B,i})
RooSimultaneous implements ‘switch’ PDF:
case (indexCat) {
A: return pdfA ;
B: return pdfB ;
}
Likelihood of switch pdf with composite dataset automatically constructs sum of likelihoods above
\log L = \sum_{i=1}^{n+m} \log \mathrm{PDF}_{sim}(\vec{D}_{A+B,i})
• Likelihood level definition of a simultaneous fit
• PDF level definition of a simultaneous fit
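The claim that the switch pdf on a labeled combined dataset reproduces the sum of the per-sample likelihoods can be checked numerically. The sketch below assumes two Gaussian samples with their own means (a for A, c for B) and a shared width b; all names are hypothetical and the code does not use RooFit.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Normalized Gaussian density.
double gaussPdf(double x, double mean, double sigma) {
    const double kPi = 3.14159265358979323846;
    const double z = (x - mean) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * kPi));
}

// Likelihood-level definition: -log L of one sample for a Gaussian model.
double nll(const std::vector<double>& data, double mean, double sigma) {
    double sum = 0.0;
    for (double x : data) sum -= std::log(gaussPdf(x, mean, sigma));
    return sum;
}

// One event of the combined dataset: observable x plus a discrete label.
struct LabeledEvent {
    double x;
    std::string index;  // "A" or "B"
};

// PDF-level definition: a 'switch' pdf picks the component from the label.
// a = mean of A, c = mean of B, b = width shared by both samples.
double nllSimultaneous(const std::vector<LabeledEvent>& data, double a, double b, double c) {
    double sum = 0.0;
    for (const LabeledEvent& ev : data) {
        assert(ev.index == "A" || ev.index == "B");
        const double mean = (ev.index == "A") ? a : c;
        sum -= std::log(gaussPdf(ev.x, mean, b));
    }
    return sum;
}
```

For any parameter values, nllSimultaneous on the combined dataset equals nll(A) + nll(B), so minimizing either form gives the same joint fit.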
193
Practical fitting – Simultaneous fit technique
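[Figure panels: data Dsig(x) with model Fsig(x;a,b), and data Dctl(x) with model Fctl(x;b,c)]
• Given data Dsig(x) and model Fsig(x;a,b), and data Dctl(x) and model Fctl(x;b,c)
– Construct –log[Lsig(a,b)] and –log[Lctl(b,c)] and minimize their sum –log[L(a,b,c)] = –log[Lsig(a,b)] – log[Lctl(b,c)]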
194
Constructing joint pdfs
// Pdfs for channels ‘A’ and ‘B’
w.factory("Gaussian::pdfA(x[-10,10],mean[-10,10],sigma[3])") ;
w.factory("Uniform::pdfB(x)") ;
// Create discrete observable to label channels
w.factory("index[A,B]") ;
// Create joint pdf
w.factory("SIMUL::joint(index,A=pdfA,B=pdfB)") ;
// Input datasets for channels A and B (assumed to be filled elsewhere)
RooDataSet *dataA, *dataB ;
RooDataSet dataAB("dataAB","dataAB",w::x,Index(w::index),
                  Import("A",*dataA),Import("B",*dataB)) ;
• Operator class SIMUL to construct joint models at the pdf level
• Can also construct joint datasets
195
Building simultaneous fits in RooFit
// Model for the signal sample
w.factory("Gaussian::sig(x[-10,10],mean[0,-10,10],sigma[3,2,4])") ;
w.factory("Uniform::bkg(x)") ;
w.factory("SUM::model(Nsig[800,0,1000]*sig,Nbkg[0,1000]*bkg)") ;
// Model for the control sample (signal shape shares mean and sigma)
w.factory("Gaussian::sig_control(x,mean,sigma)") ;
w.factory("Chebychev::bkg_control(x,a0[1])") ;
w.factory("SUM::model_control(Nsig_control[500,0,10000]*sig_control,"
          "Nbkg_control[500,0,10000]*bkg_control)") ;
// Joint pdf construction
w.factory("SIMUL::model_sim(index[sig,control],"
          "sig=model,control=model_control)") ;
// Joint data construction
RooDataSet simdata("simdata","simdata",w::x,Index(w::index),
Import("sig",*data),Import("control",*data_control)) ;
// Joint fit
RooFitResult* rs = w::model_sim.fitTo(simdata,Save()) ;
• Code that constructs the example shown two slides back
196
Constructing joint likelihood
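// Joint likelihood directly from the joint pdf
RooAbsReal* nllJoint = w::joint.createNLL(dataAB) ;
// Alternatively: component likelihoods, named so the factory can refer to them
RooAbsReal* nllA = w::pdfA.createNLL(*dataA) ; nllA->SetName("nllA") ; w.import(*nllA) ;
RooAbsReal* nllB = w::pdfB.createNLL(*dataB) ; nllB->SetName("nllB") ; w.import(*nllB) ;
w.factory("sum::nllJoint(nllA,nllB)") ;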
• When you have a simultaneous pdf you can create a joint likelihood from the joint pdf
• Also possible to make likelihood functions of the components first and then add them
• Likelihood constructed either way is the same.
• Minimization of joint likelihood == Joint fit
197
Other scenarios in which simultaneous fits are useful
• Preceding example was ‘asymmetric’
– Very large control sample, small signal sample
– Physics in each channel possibly different (but with some similar properties)
• There are also ‘symmetric’ use cases
– Fit multiple data sets that are functionally equivalent, but have slightly different properties (e.g. purity)
– Example: split B physics data into blocks by flavor tagging technique (each technique results in a different sensitivity to the CP physics parameters of interest).
– Split data into blocks by data-taking run, as mass resolutions in each run may be slightly different
– For symmetric use cases the pdf-level definition of a simultaneous fit is very convenient, as you usually start with a single dataset with subclassing information derived from its observables
• By splitting data into subsamples with p.d.f.s that can be tuned to describe the (slightly) varying properties you can increase the statistical sensitivity of your measurement
198
A more empirical approach to simultaneous fits
• Instead of investing a lot of time in developing multi-dimensional models, split the data into many subsamples and fit all subsamples simultaneously to slight variations of a ‘master’ p.d.f.
• Example: Given dataset D(x,y) where observable of interest is x.
– Distribution of x varies slightly with y
– Suppose we’re only interested in the width of the peak, which is supposed to be invariant under y (unlike the mean)
– Slice the data into 10 bins of y and simultaneously fit each bin with a p.d.f. that has a different Gaussian mean parameter per bin but the same width
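For Gaussian slices this simultaneous fit even has a closed-form ML solution: setting the derivatives of -log L to zero gives each slice mean as the average of its bin, and the common width as the pooled RMS around the slice means (divided by the total N). A plain C++ sketch under that Gaussian assumption, with the hypothetical names SlicedFit and fitSlices:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct SlicedFit {
    std::vector<double> means;  // one Gaussian mean per y-slice
    double sigma;               // width shared by all slices
};

// Closed-form ML solution of a simultaneous Gaussian fit in which every
// slice has its own mean but all slices share one width.
SlicedFit fitSlices(const std::vector<std::vector<double>>& slices) {
    SlicedFit result;
    std::size_t nTotal = 0;
    double sumSq = 0.0;
    for (const auto& slice : slices) {
        assert(!slice.empty());
        double mean = 0.0;
        for (double x : slice) mean += x;
        mean /= slice.size();
        result.means.push_back(mean);
        for (double x : slice) sumSq += (x - mean) * (x - mean);
        nTotal += slice.size();
    }
    result.sigma = std::sqrt(sumSq / nTotal);
    return result;
}
```

Because each mean only depends on its own slice, the per-bin means come out uncorrelated with each other, consistent with the fit output that follows.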
199
A more empirical approach to simultaneous fits
Floating Parameter FinalValue +/- Error
-------------------- --------------------------
mean_bin1 -4.5302e+00 +/- 1.62e-02
mean_bin2 -3.4928e+00 +/- 1.38e-02
mean_bin3 -2.4790e+00 +/- 1.35e-02
mean_bin4 -1.4174e+00 +/- 9.64e-03
mean_bin5 -4.8945e-01 +/- 7.95e-03
mean_bin6 4.0716e-01 +/- 9.67e-03
mean_bin7 1.4733e+00 +/- 1.37e-02
mean_bin8 2.4912e+00 +/- 1.44e-02
mean_bin9 3.5028e+00 +/- 1.41e-02
mean_bin10 4.5474e+00 +/- 1.68e-02
sigma 2.7319e-01 +/- 2.46e-03
• Fit to sample of preceding page would look like this
– Each mean is fitted to its expected value (from -4.5 for mean_bin1 to +4.5 for mean_bin10, in steps of 1)
– But joint measurement of sigma
– NB: Correlation matrix is mostly diagonal as all mean_binXX parameters are completely uncorrelated!
200
A more empirical approach to simultaneous fits
• Preceding example was simplistic for the sake of clarity, but more sensible use cases exist
– Example: measurement of CP violation in B decays. The analyzing power of each event is diluted by a factor (1-2w), where w is the mistag rate of the flavor tagging algorithm
– A neural-net flavor tagging algorithm provides a tagging probability for each event in data. We could use prob(NN) as w, but then we would rely on a good calibration of the NN, which we don’t want
– In a simultaneous fit to CPV+Mixing samples, we can measure the average w from the latter. Now we do not rely on the NN calibration, but we do not exploit the event-by-event variation in analyzing power either.
– Improved scenario: divide the (CPV+mixing) data into 10 or 20 subsets corresponding to bins in prob(NN). Use an identical p.d.f. with only a separate parameter w_binXX expressing the fitted mistag rate of each subset.
– The simultaneous fit will now exploit the differences in analyzing power between events and be insensitive to the calibration of the flavor tagging NN.
– If the NN calibration is OK, the fitted mistag rate in each bin of prob(NN) will equal the average prob(NN) value for that bin
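Why binning in prob(NN) helps can be quantified with the effective tagging power Q = Σ_k ε_k (1-2w_k)², where ε_k is the fraction of events in bin k; the statistical precision on the CP parameters scales roughly as 1/sqrt(N·Q). Because (1-2w)² is convex in w, per-bin mistag rates never give a smaller Q than assigning every event the average w. A hypothetical sketch of both quantities:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Effective tagging power Q = sum_k eps_k * (1 - 2 w_k)^2, where eps_k is the
// fraction of events in prob(NN) bin k and w_k the mistag rate of that bin.
double effectivePower(const std::vector<double>& fractions,
                      const std::vector<double>& mistags) {
    assert(fractions.size() == mistags.size());
    double q = 0.0;
    for (std::size_t k = 0; k < fractions.size(); ++k) {
        const double dilution = 1.0 - 2.0 * mistags[k];
        q += fractions[k] * dilution * dilution;
    }
    return q;
}

// Same events, but every event is assigned the average mistag rate,
// as in a fit with a single w for the whole sample.
double averagedPower(const std::vector<double>& fractions,
                     const std::vector<double>& mistags) {
    assert(fractions.size() == mistags.size());
    double total = 0.0;
    double wAvg = 0.0;
    for (std::size_t k = 0; k < fractions.size(); ++k) {
        total += fractions[k];
        wAvg += fractions[k] * mistags[k];
    }
    assert(total > 0.0);
    wAvg /= total;
    const double dilution = 1.0 - 2.0 * wAvg;
    return total * dilution * dilution;
}
```

For two equally populated bins with w = 0.1 and w = 0.4, the binned power is 0.34 while the averaged one is 0.25.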
201
A more empirical approach to simultaneous fits
[Figure: three panels (Perfect NN, OK NN, Lousy NN), each showing the power measured in the control sample versus the NN-predicted power, with events of little and of great analyzing power indicated]
In all 3 cases the fit is not biased by the NN calibration
Better precision on the CPV measurement because more sensitive events in sample
Worse precision on the CPV measurement because fewer sensitive events in sample
202
Building simultaneous fits from a template
// Template pdf – B0 decay with mixing
w.factory("TruthModel::tm(t[-20,20])") ;
w.factory("BMixDecay::sig(t,mixState[mixed=-1,unmixed=1],"
          "tagFlav[B0=1,B0bar=-1],tau[1.54,1,2],"
          "dm[0.472,0.1,0.8],w[0.1,0,0.5],dw[0],tm)") ;
// Construct index category
w.factory("tagCat[Lep,Kao,NT1,NT2]") ;
// Construct simultaneous pdf with separate mistag rate for each category
w.factory("SIMCLONE::model(sig,$SplitParam({w,dw},tagCat))") ;
• In the ‘symmetric’ use case the models assigned to each state are very similar in structure – Usually just one parameter name is different
• The easiest way is to construct these from a template pdf and a prescription for how to tailor the template for each index state
• Use operator SIMCLONE instead of SIMUL
203
Building simultaneous fits from a template
RooWorkspace(w) w contents
variables
---------
(dm,dw,dw_Kao,dw_Lep,dw_NT1,dw_NT2,mixState,t,tagCat,tagFlav,tau,w,w_Kao,w_Lep,w_NT1,w_NT2)
p.d.f.s
-------
RooBMixDecay::sig[ mistag=w delMistag=dw mixState=mixState tagFlav=tagFlav tau=tau dm=dm t=t ] = 0.2
RooSimultaneous::model[ indexCat=tagCat Lep=sig_Lep Kao=sig_Kao NT1=sig_NT1 NT2=sig_NT2 ] = 0.2
RooBMixDecay::sig_Kao[ mistag=w_Kao delMistag=dw_Kao ... t=t ] = 0.2
RooBMixDecay::sig_Lep[ mistag=w_Lep delMistag=dw_Lep ... t=t ] = 0.2
RooBMixDecay::sig_NT1[ mistag=w_NT1 delMistag=dw_NT1 ... t=t ] = 0.2
RooBMixDecay::sig_NT2[ mistag=w_NT2 delMistag=dw_NT2 ... t=t ] = 0.2
analytical resolution models
----------------------------
RooTruthModel::tm[ x=t ] = 1
• Result
204
Adding parameter pdfs to the likelihood
w.factory("Gaussian::f(x[-10,10],mean[-10,10],sigma[3])") ;
w.factory("PROD::gprime(f,Gaussian(mean,1.15,0.30))") ;
\log L(\mu) = \sum_i \log f(x_i; \mu) + \log \mathrm{Gauss}(\mu; 1.15, 0.30)   (μ = mean)
• Systematic/external uncertainties can be modeled with regular RooFit pdf objects.
• To incorporate in likelihood, simply multiply with orig pdf
– Any pdf can be supplied: a Gaussian is the most common choice, but one can also use class RooMultiVarGaussian to introduce a Gaussian uncertainty on multiple parameters including their correlations
• Advantage of including systematic uncertainties in likelihood: error automatically propagated to error reported by MINUIT
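As a sketch of how the constraint enters, assume Gaussian data with a known width and the Gaussian constraint on the mean used above (1.15 ± 0.30). Multiplying the pdf by the constraint adds the term (μ-1.15)²/(2·0.30²) to -log L; minimizing numerically and taking the error from the second derivative (in the spirit of MIGRAD/HESSE, but with a simple golden-section search instead) yields a combined estimate whose error is smaller than either input. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// -log L for Gaussian data with known width sigma, plus a Gaussian constraint
// on the mean (central value mu0, uncertainty sigma0). Constant terms dropped.
double constrainedNll(const std::vector<double>& data, double sigma,
                      double mu, double mu0, double sigma0) {
    double nll = 0.0;
    for (double x : data) nll += 0.5 * (x - mu) * (x - mu) / (sigma * sigma);
    nll += 0.5 * (mu - mu0) * (mu - mu0) / (sigma0 * sigma0);
    return nll;
}

struct Estimate {
    double value;
    double error;
};

// Golden-section minimization of the NLL on [lo, hi], then the parabolic
// error 1/sqrt(d2NLL/dmu2) from a finite-difference second derivative.
Estimate fitConstrained(const std::vector<double>& data, double sigma,
                        double mu0, double sigma0, double lo, double hi) {
    auto f = [&](double mu) { return constrainedNll(data, sigma, mu, mu0, sigma0); };
    const double g = (std::sqrt(5.0) - 1.0) / 2.0;
    double a = lo, b = hi;
    while (b - a > 1e-10) {
        const double c = b - g * (b - a);
        const double d = a + g * (b - a);
        if (f(c) < f(d)) { b = d; } else { a = c; }
    }
    const double mu = 0.5 * (a + b);
    const double h = 1e-3;
    const double d2 = (f(mu + h) - 2.0 * f(mu) + f(mu - h)) / (h * h);
    assert(d2 > 0.0);
    return {mu, 1.0 / std::sqrt(d2)};
}
```

In this purely quadratic case the result coincides with the inverse-variance weighted average of the data mean and the external value 1.15 ± 0.30.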
205
Adding uncertainties to a likelihood
• Example 1 – Width known exactly
• Example 2 – Gaussian uncertainty on width
206
Using the fit result output
RooAbsPdf* paramPdf =
fr->createHessePdf(RooArgSet(frac,mean,sigma));
• The fit result class contains the full MINUIT output
• Can construct a multi-variate Gaussian pdf representing the pdf on the parameters
– Returned pdf represents HESSE parabolic approximation of fit
• Can also multiply this pdf in parameterswith a pdf in observables
– ‘Simultaneous fit’
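What the Hesse pdf assumes can be illustrated in one dimension. The sketch below, with hypothetical names and an exponential decay-time model chosen only for illustration, compares the exact ΔNLL of a lifetime fit with its parabolic (HESSE-style) approximation: they agree close to the minimum and diverge further away, so a parameterized likelihood is only as good as the parabolic approximation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// -log L of decay times t for the pdf exp(-t/tau)/tau.
double expNll(const std::vector<double>& t, double tau) {
    double nll = 0.0;
    for (double ti : t) nll += ti / tau + std::log(tau);
    return nll;
}

struct HesseApprox {
    double tauHat;  // ML estimate
    double error;   // parabolic error from the second derivative
};

// For this model the ML estimate is the sample mean, and the second
// derivative of the NLL at the minimum is n / tauHat^2.
HesseApprox fitExp(const std::vector<double>& t) {
    assert(!t.empty());
    double sum = 0.0;
    for (double ti : t) sum += ti;
    const double tauHat = sum / t.size();
    return {tauHat, tauHat / std::sqrt(static_cast<double>(t.size()))};
}

// Parameterized likelihood: Delta NLL replaced by its parabola.
double parabolicDeltaNll(const HesseApprox& h, double tau) {
    const double z = (tau - h.tauHat) / h.error;
    return 0.5 * z * z;
}
```

For ten events with mean 1.5, the parabola reproduces the exact ΔNLL to within a few percent at τ = 1.55, but overestimates it by more than a factor 2 at τ = 3.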
207
Another approach to joint fitting
• An ‘asymmetric’ simultaneous fit may spend the majority of its CPU time calculating the likelihood of the control sample part
– Because the control sample has many more events
– Example: joint fit between CPV golden modes and BMixing samples
• Alternate solution: Make joint fit using likelihood of signal sample and parameterized likelihood of control sample
– Assumption: Likelihood can be described by a multi-variate Gaussian with correlations (i.e. log-likelihood is parabolic)
– Very easy to do in RooFit using RooFitResult->createHessePdf()
– Example on next page
208
Example of joint fit with parameterized likelihood
// Joint pdf construction
w.factory("SIMUL::model_sim(index[sig,ctl],"
          "sig=model,ctl=model_ctl)") ;
// Joint data construction
RooDataSet simdata("simdata","simdata",w::x,Index(w::index),
                   Import("sig",*data),Import("ctl",*data_ctl)) ;
// Joint fit
RooFitResult* rs = w::model_sim.fitTo(simdata,Save()) ;
// Fit to control sample only
RooFitResult* r = w::model_ctl.fitTo(*data_ctl,Save()) ;
// Make pdf of parameters and import in workspace
RooAbsPdf* ctrlParamPdf = r->createHessePdf(*w::model_ctl.getParameters(*data_ctl)) ;
ctrlParamPdf->SetName("ctrlParamPdf") ;
w.import(*ctrlParamPdf) ;
w.factory("PROD::model_sim2(model,ctrlParamPdf)") ;
// Joint fit with parameterized likelihood for control sample
RooFitResult* rs2 = w::model_sim2.fitTo(*data,Save()) ;
Regular joint fit
Joint fit with parameterized L for ctl sample
209