
Multivariate Analysis with TMVA

Basic Concept and New Features

Helge Voss(*) (MPI–K, Heidelberg)

LHCb MVA Workshop, CERN, July 10, 2009

(*) On behalf of the author team: A. Hoecker, P. Speckmayer, J. Stelzer, J. Therhaag, E. v. Toerne, H. Voss

And the contributors: See acknowledgments on page xx

On the web: http://tmva.sf.net/ (home), https://twiki.cern.ch/twiki/bin/view/TMVA/WebHome (tutorial)

Event Classification

Suppose a data sample with two types of events, carrying the class labels Signal and Background. (We restrict ourselves here to the two-class case; many classifiers can in principle be extended to several classes, and otherwise analyses can be staged.)

We have discriminating variables x1, x2, ... How do we set the decision boundary to select events of type S?

[Figure: S and B events in the (x1, x2) plane with three candidate decision boundaries: rectangular cuts, a linear boundary, a nonlinear one]

Rectangular cuts? A linear boundary? A nonlinear one? How can we decide?

Once we have decided on a class of boundaries, how do we find the "optimal" one?


Regression In High Energy Physics

Most HEP event reconstruction requires some sort of "function estimate" for which we have no analytical expression:

energy deposit in the calorimeter: shower profile → calibration

entry location of the particle in the calorimeter

distance between two overlapping photons in the calorimeter

... you can certainly think of more.

Maybe you could even imagine some final physics parameter that needs to be fitted to your event data, where you would rather use a non-analytic fit function from Monte Carlo events than an analytic parametrisation of the theory:

the reweighting method used in the LEP W-mass analysis ...

Maybe your application can be successfully learned by a machine using Monte Carlo events?

(somewhat wild guessing; it might not always be reasonable to do)


Regression

[Figure: three sets of measurements f(x) vs. x, suggesting a constant, a linear, and a nonlinear functional form]

Constant? Linear? Nonlinear?

How do we estimate a "functional behaviour" from a given set of "known measurements"?

Assume for example "D" variables that somehow characterise the shower in your calorimeter.

From Monte Carlo or test-beam measurements, you have a data sample of events in which all D variables x are measured together with the corresponding particle energy.

The particle energy as a function of the D observables x is a surface in the (D+1)-dimensional space spanned by the D observables plus the energy.

Seems trivial? Maybe... the human eye (and the brain behind it) has very good pattern-recognition capabilities!

But what if you have more variables?


Event Classification: Why MVA?

What do you think: is this event more "likely" signal or background?

Assume 4 uncorrelated Gaussians:


Event Classification

Each event, whether Signal or Background, has D measured variables spanning the D-dimensional "feature space".

Find a mapping y(x) from the D-dimensional input/observable/"feature" space to a one-dimensional output, i.e. to the class labels.

Most general form: y = y(\mathbf{x}), with \mathbf{x} = \{x_1, \ldots, x_D\} the input variables and y: \mathbb{R}^D \to \mathbb{R}.

If one histograms the resulting y(x) values, one obtains the signal and background distributions shown on the next slide.

Who sees what this would look like for the regression problem?


Event Classification

Each event, whether Signal or Background, has D measured variables spanning the D-dimensional "feature space".

y(x): a "test statistic" in the D-dimensional space of input variables, y: \mathbb{R}^D \to \mathbb{R}, chosen such that y(B) → 0 and y(S) → 1.

y(x) = const defines a surface in feature space: the decision boundary.

y(x) is used to set the selection cut: y > c_cut: signal; y = c_cut: decision boundary; y < c_cut: background.

The distributions of y(x) for signal and background, PDF_S(y) and PDF_B(y): their overlap determines the separation power, and the achievable efficiency and purity.


MVA and Machine Learning

The previous slide was basically the idea of "Multivariate Analysis" (MVA). (Remark: what about "standard cuts"?)

Finding y(x): \mathbb{R}^D \to \mathbb{R} ("basically" the same for regression and classification), given a certain model class for y(x),

in an automatic way, using "known" or "previously solved" events, i.e. learning from known "patterns",

such that the resulting y(x) has good generalisation properties when applied to "unknown" events.

That is what the "machine" is supposed to be doing: supervised machine learning.

Of course there is no magic; we still need to:

choose the discriminating variables

choose the class of models (linear, non-linear, flexible or less flexible)

tune the "learning parameters" (bias vs. variance trade-off)

check the generalisation properties

consider the trade-off between statistical and systematic uncertainties


Overview/Outline

Idea: rather than just implementing new MVA techniques and making them available in ROOT (as, e.g., TMultiLayerPerceptron does), provide:

one common platform/interface for all MVA classification and regression algorithms

easy use of, and switching between, classifiers

(common) data pre-processing capabilities

training/testing of all classifiers on the same data sample and consistent evaluation

a common analysis, application and evaluation framework (ROOT macros, C++ executables, Python scripts, ...)

application of the trained classifier/regressor using libTMVA.so and the corresponding "weight files", or standalone C++ code (the latter is currently disabled in TMVA 4.0.1); a minimal usage sketch follows below.
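A hedged end-to-end sketch of this common interface, loosely following the TMVA 4 tutorial macros; the file, tree and variable names (tmva_example.root, TreeS, TreeB, var1, var2) are placeholders, and the option strings are abbreviated:

```cpp
#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/Reader.h"
#include "TMVA/Types.h"

void train() {
   TFile* input   = TFile::Open("tmva_example.root");   // hypothetical input file
   TTree* sigTree = (TTree*)input->Get("TreeS");
   TTree* bkgTree = (TTree*)input->Get("TreeB");
   TFile* outFile = TFile::Open("TMVA.root", "RECREATE");

   // One factory serves all booked methods: same data, consistent evaluation
   TMVA::Factory factory("TMVAClassification", outFile, "!V:AnalysisType=Classification");
   factory.AddVariable("var1", 'F');
   factory.AddVariable("var2", 'F');
   factory.AddSignalTree(sigTree, 1.0);                  // 1.0 = global event weight
   factory.AddBackgroundTree(bkgTree, 1.0);
   factory.PrepareTrainingAndTestTree("", "SplitMode=Random:!V");

   // Book several classifiers; switching methods is a one-line change
   factory.BookMethod(TMVA::Types::kBDT, "BDT", "NTrees=400");
   factory.BookMethod(TMVA::Types::kMLP, "MLP", "HiddenLayers=N+1");

   factory.TrainAllMethods();     // train all booked methods on the same sample
   factory.TestAllMethods();      // apply them to the test sample
   factory.EvaluateAllMethods();  // fill the comparison tables and plots
   outFile->Close();
}

void apply() {
   // Application via libTMVA.so and the weight file written during training
   TMVA::Reader reader("!Color:!Silent");
   Float_t var1, var2;
   reader.AddVariable("var1", &var1);  // same names and order as in training
   reader.AddVariable("var2", &var2);
   reader.BookMVA("BDT method", "weights/TMVAClassification_BDT.weights.xml");
   var1 = 1.2f; var2 = -0.4f;          // in practice: fill inside your event loop
   Double_t mvaOut = reader.EvaluateMVA("BDT method");
   (void)mvaOut;
}
```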

Outline

The TMVA project

Survey of the available classifiers/regressors

Toy examples

General considerations and systematics


TMVA Development and Distribution

TMVA is a SourceForge (SF) package for world-wide access:

Home page: http://tmva.sf.net/

SF project page: http://sf.net/projects/tmva

View SVN: http://tmva.svn.sourceforge.net/viewvc/tmva/trunc/TMVA/

Mailing list: http://sf.net/mail/?group_id=152074

Tutorial TWiki: https://twiki.cern.ch/twiki/bin/view/TMVA/WebHome

Active project

Currently 6 core developers, and 42 registered contributors at SF

>4864 downloads since March 2006 (not counting CVS/SVN checkouts and ROOT users)

Current version: 4.0.1 (29 Jun 2009)

We try to respond quickly to user requests, bug fixes, etc.

Integrated and distributed with ROOT since ROOT v5.11/03 (newest version 4.0.1 in ROOT production release 5.23)


New (and future) TMVA Developments

MV-Regression

Code re-design allowing for:

boosting of any classifier (not only decision trees)

combination of arbitrary pre-processing algorithms

classifier training in multiple regions

multi-class classifiers (will slowly creep into some classifiers...)

cross-validation

New MVA-Technique

PDE-Foam (classification + regression)

Data preprocessing

Gauss-transformation

arbitrary combination of preprocessing

Numerous small add-ons etc..


The TMVA Classifiers (Regressors)

Currently implemented classifiers:

Rectangular cut optimisation

Projective (1D) likelihood estimator

Multidimensional likelihood estimators (PDE range search, PDE-Foam, k-nearest neighbour; also as regressors)

Linear discriminant analysis (Fisher, H-Matrix, general linear; the latter also as regressor)

Function discriminant (also as regressor)

Artificial neural networks (3 multilayer-perceptron implementations; MLP also as regressor, with multi-dimensional target)

Boosted/bagged/randomised decision trees (also as regressor)

Rule ensemble fitting

Support vector machine

(Plugin mechanism for any other MVA classifier, e.g. NeuroBayes)

Currently implemented data preprocessing:

Linear decorrelation

Principal component analysis

Gauss transformation

Preprocessing is requested per method via a booking option, as sketched below.
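A hedged sketch, continuing the factory example above, of how a preprocessing transformation is attached to a method at booking time; the VarTransform option and its long names ("Decorrelate", "PCA", "Gauss") are quoted from memory of the TMVA manual, so treat them as assumptions:

```cpp
// Same classifier, different preprocessing: booked side by side for comparison
factory.BookMethod(TMVA::Types::kLikelihood, "Likelihood",    "!V");
factory.BookMethod(TMVA::Types::kLikelihood, "LikelihoodD",   "!V:VarTransform=Decorrelate");
factory.BookMethod(TMVA::Types::kLikelihood, "LikelihoodPCA", "!V:VarTransform=PCA");
factory.BookMethod(TMVA::Types::kFisher,     "FisherG",       "!V:VarTransform=Gauss");
```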


Example for Data Preprocessing: Decorrelation

Removal of linear correlations by rotating the input variables, commonly realised for all classifiers in TMVA (centrally in the DataSet class):

using the "square root" of the correlation matrix (spelled out below)

using Principal Component Analysis

Note that decorrelation is only complete if the correlations are linear and the input variables are Gaussian distributed.

[Figure: scatter plots of the input variables: original vs. SQRT-decorrelated vs. PCA-decorrelated]
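In formulas, the square-root variant amounts to the following standard construction (sketched here for reference):

```latex
% Diagonalise the symmetric covariance matrix C = S D S^T (S orthogonal,
% D diagonal), so that C^{1/2} = S \sqrt{D}\, S^T, and rotate/scale the inputs:
\mathbf{x} \;\longmapsto\; \mathbf{x}' = \bigl(C^{1/2}\bigr)^{-1}\mathbf{x},
\qquad \operatorname{cov}\bigl(x'_i,\, x'_j\bigr) = \delta_{ij}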


“Gaussian-isation”

Improve decorrelation by pre-Gaussianisation of the variables.

First: transformation to achieve a uniform (flat) distribution, for each event and variable k:

x_k^{\rm flat}(i_{\rm event}) = \int_{-\infty}^{x_k(i_{\rm event})} p_k(x)\,dx

(x_k^{\rm flat}: transformed variable k; x_k(i_{\rm event}): measured value; p_k: PDF of variable k)

Second: make it Gaussian via the inverse error function:

x_k^{\rm Gauss}(i_{\rm event}) = \sqrt{2}\;{\rm erf}^{-1}\!\left(2\,x_k^{\rm flat}(i_{\rm event}) - 1\right),
\qquad {\rm erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt

The integral can be solved in an unbinned way by event counting, or by creating non-parametric PDFs (see the likelihood section later). A toy code sketch follows.

Third: decorrelate (and "iterate" this procedure).
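A toy sketch of the two steps (not the TMVA implementation): the flat variable is obtained from the empirical CDF by event counting, then mapped to a Gaussian with ROOT's TMath::ErfInverse; ties and PDF smoothing are ignored for brevity.

```cpp
#include <algorithm>
#include <vector>
#include "TMath.h"

std::vector<double> gaussianise(const std::vector<double>& x) {
   std::vector<double> sorted(x);
   std::sort(sorted.begin(), sorted.end());
   std::vector<double> out(x.size());
   const double n = static_cast<double>(x.size());
   for (size_t i = 0; i < x.size(); ++i) {
      // empirical CDF: fraction of events below x_i  ->  flat in [0, 1]
      double flat = (std::lower_bound(sorted.begin(), sorted.end(), x[i])
                     - sorted.begin() + 0.5) / n;
      // second step: map the flat variable to a standard Gaussian
      out[i] = TMath::Sqrt2() * TMath::ErfInverse(2.0 * flat - 1.0);
   }
   return out;
}
```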


"Gaussian-isation"

[Figure: input-variable distributions: original vs. signal-Gaussianised vs. background-Gaussianised]

We cannot simultaneously Gaussianise both signal and background!


Pre-processing

The decorrelation/Gauss transform can be determined from either: signal + background together,

signal only, or

background only.

For all but the "likelihood" methods (1D or multi-dimensional) we need to decide on one of these (TMVA default: signal + background).

For "likelihood" methods, where we test the "signal" hypothesis against the "background" hypothesis, BOTH are used.


Rectangular Cut Optimisation

Simplest method: individual cuts on each variable

cuts out a single rectangular volume from the variable space

Technical challenge: how to find the optimal cuts? MINUIT fails due to the non-unique solution space.

TMVA uses: Monte Carlo sampling, Genetic Algorithm, Simulated Annealing

Huge speed improvement of the volume search by sorting events in a binary tree

Cuts usually benefit from prior decorrelation of the cut variables


Projective (1D) Likelihood Estimator (PDE Approach)

Probability density estimators for each input variable,

combined into a likelihood estimator (ignoring correlations); see the formula below.

Optimal approach if the correlations are zero (or after linear decorrelation)

Otherwise: significant performance loss

Technical challenge: how to estimate the PDF shapes. Three ways:

parametric fitting (function): difficult to automate for arbitrary PDFs

non-parametric fitting: easy to automate, but can create artefacts or suppress information

event counting: automatic and unbiased, but suboptimal

TMVA uses binned shape interpolation using spline functions,

or unbinned adaptive Gaussian kernel density estimation.

TMVA performs automatic validation of the goodness of fit.
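The combination referred to above is the projective likelihood ratio; a sketch of its usual definition, with p_k^{S,B} the per-variable PDFs:

```latex
y_{\mathcal{L}}(i) \;=\;
  \frac{\prod_{k=1}^{D} p_k^{S}\bigl(x_k(i)\bigr)}
       {\prod_{k=1}^{D} p_k^{S}\bigl(x_k(i)\bigr) \;+\;
        \prod_{k=1}^{D} p_k^{B}\bigl(x_k(i)\bigr)}
```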


Multidimensional PDE Approach

Use a single PDF per event class (signal, background) that spans all N_var dimensions.

Two different implementations of the same idea:

PDE Range-Search: count the number of signal and background events in the "vicinity" of the test event; a preset or adaptive volume V defines the "vicinity" (Carli-Koblitz, NIM A501, 576 (2003)).

[Figure: H0/H1 events in the (x1, x2) plane with a box volume around the test event; in this example y_PDERS(i_event, V) = 0.86]

Improve simple event counting by the use of kernel functions that penalise large distances from the test event.

Increase the counting speed with a binary search tree.

Genuine k-NN algorithm (author R. Ospanov):

intrinsically adaptive approach

very fast search with kd-tree event sorting; a brute-force sketch of the idea follows below.

Regression: the function value is taken as the distance-weighted average over the "volume".

PDE-Foam: self-adaptive phase-space binning, dividing the space into hypercubes used as "volumes"...
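A brute-force illustration of the k-NN response (the signal fraction among the k closest training events). TMVA's implementation uses a kd-tree instead of this linear scan, and the struct/function names here are hypothetical:

```cpp
#include <algorithm>
#include <vector>

struct Event { std::vector<double> x; bool isSignal; };

// assumes k <= train.size(); Euclidean metric in the raw variables
double knnResponse(const std::vector<Event>& train, const Event& test, size_t k) {
   std::vector<std::pair<double, bool> > dist;            // (distance^2, isSignal)
   for (size_t i = 0; i < train.size(); ++i) {
      double d2 = 0.0;
      for (size_t j = 0; j < test.x.size(); ++j) {
         double d = train[i].x[j] - test.x[j];
         d2 += d * d;
      }
      dist.push_back(std::make_pair(d2, train[i].isSignal));
   }
   std::partial_sort(dist.begin(), dist.begin() + k, dist.end()); // k closest events
   size_t nSig = 0;
   for (size_t i = 0; i < k; ++i) if (dist[i].second) ++nSig;
   return double(nSig) / double(k);                       // signal fraction in the vicinity
}
```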


Fisher’s Linear Discriminant Analysis (LDA)

LDA determines an axis in the input-variable hyperspace such that a projection of events onto this axis pushes signal and background as far away from each other as possible.

Classifier response couldn't be simpler:

y_{\rm Fi}(i_{\rm event}) = F_0 + \sum_{k=1}^{N_{\rm var}} F_k\, x_k(i_{\rm event})

where the F_k are the "Fisher coefficients", which define the projection axis (a closed form is given below).

Well known, simple and elegant classifier
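For reference, the Fisher coefficients have a closed form; sketched here up to an overall normalisation, with W the sum of the signal and background covariance matrices (the "within-class" matrix) and \bar{x}_{S,B} the class means:

```latex
F_k \;\propto\; \sum_{l=1}^{N_{\mathrm{var}}} W_{kl}^{-1}
  \bigl(\bar{x}_{S,l} - \bar{x}_{B,l}\bigr)
```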

Function discriminant analysis (FDA)

Fit any user-defined function of the input variables, requiring that signal events return 1 and background events 0.

Parameter fitting: Genetic Algorithm, MINUIT, MC, and combinations thereof.

Easy reproduction of the Fisher result, but nonlinearities can be added.

Very transparent discriminator.


Nonlinear Analysis: Artificial Neural Networks

Achieve a nonlinear classifier response by "activating" the output nodes in a nonlinear way, using the "activation" function

A(x) = \frac{1}{1 + e^{-x}}

Feed-forward Multilayer Perceptron: 1 input layer (the N_var discriminating input variables, x^{(0)}_{1..N_{\rm var}}), k hidden layers, 1 output layer (2 output classes, signal and background, x^{(k+1)}_{1,2}), with

x^{(k)}_j = A\!\left( w^{(k)}_{0j} + \sum_{i=1}^{M_{k-1}} w^{(k)}_{ij}\, x^{(k-1)}_i \right)

[Figure: network architecture with nodes 1..M_k per layer and weights w_ij]

Three different implementations in TMVA (all are multilayer perceptrons):

MLP: TMVA's own MLP implementation, for increased speed and flexibility

TMlpANN: interface to ROOT's MLP implementation

CFMlpANN: ALEPH's Higgs-search ANN, translated from FORTRAN

Regression: uses the real-valued output of ONE node; a forward-pass sketch follows below.
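A minimal sketch of the forward pass for a single hidden layer, matching the formula above (illustrative only; the weight layout is an assumption, with index 0 holding the bias):

```cpp
#include <cmath>
#include <vector>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); } // A(x) = 1/(1+e^-x)

// hidden[j][i]: weight from input i to hidden node j (i = 0 is the bias w0j)
// output[j]   : weight from hidden node j to the single output (j = 0 is the bias)
double mlpResponse(const std::vector<double>& input,
                   const std::vector<std::vector<double> >& hidden,
                   const std::vector<double>& output) {
   double y = output[0];                                  // output-node bias
   for (size_t j = 0; j < hidden.size(); ++j) {
      double a = hidden[j][0];                            // hidden-node bias w0j
      for (size_t i = 0; i < input.size(); ++i)
         a += hidden[j][i + 1] * input[i];
      y += output[j + 1] * sigmoid(a);                    // nonlinear activation
   }
   return y;                                              // real-valued node output
}
```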


Boosted Decision Trees (BDT)

DT: sequential application of cuts splits the data into nodes, where the final nodes (leaves) classify an event as signal or background.

BDT: combine a forest of DTs, with differently weighted events in each tree (the trees themselves can also be weighted):

e.g. "AdaBoost": incorrectly classified events receive a larger weight in the next DT

GradientBoost

"Bagging": resampling with replacement (equivalent to random Poisson event weights)

"Randomised Forest": like bagging, plus the use of random sub-sets of variables

Bottom-up "pruning" of a decision tree: remove statistically insignificant nodes to reduce tree overtraining.

Boosting/bagging/randomising create a set of "basis functions": the final classifier is a linear combination of these, which improves stability.


Boosted Decision Trees (BDT): Pruning

Bottom-up "pruning" of a decision tree: remove statistically insignificant nodes to reduce tree overtraining.

[Figure: decision tree before and after pruning]

Randomised forests don't need pruning → VERY ROBUST. Some people even claim that "boosted" decision trees in general don't need to be pruned, but according to our experience this is not always true.

Regression: the function value is taken as the average of the training events in the leaf node.


Learning with Rule Ensembles

[Figure: one of the elementary cellular automaton rules (Wolfram 1983, 2002). It specifies the next colour of a cell depending on its own colour and those of its immediate neighbours; the rule outcomes are encoded in the binary representation 30 = 00011110_2.]

Following the RuleFit approach by Friedman-Popescu (Tech. Rep., Stat. Dept., Stanford U., 2003).

The model is a linear combination of rules, where a rule is a sequence of cuts:

y_{\rm RF}(\mathbf{x}) = a_0 + \sum_{m=1}^{M_R} a_m\, r_m(\hat{\mathbf{x}}) + \sum_{k=1}^{n} b_k\, \hat{x}_k

(RuleFit classifier = sum of rules + linear Fisher term; the rules are cut sequences with r_m = 1 if all cuts are satisfied, 0 otherwise; the \hat{x}_k are the normalised discriminating event variables)

The problem to solve is:

create the rule ensemble: use a forest of decision trees

fit the coefficients a_m, b_k: "gradient direct regularisation", minimising the risk (Friedman et al.)

Pruning removes topologically equal rules (same variables in the cut sequence).


Boosting

[Diagram: the training sample is used to build classifier C^{(0)}(x); the sample is then re-weighted and classifier C^{(1)}(x) is trained on the weighted sample; re-weighting and training are repeated to give C^{(2)}(x), C^{(3)}(x), ..., C^{(m)}(x)]

The boosted classifier combines the N individual classifiers:

y(\mathbf{x}) = \sum_{i}^{N_{\rm Classifier}} w_i\, C^{(i)}(\mathbf{x})


Adaptive Boosting (AdaBoost)

[Diagram: as before, a chain of classifiers C^{(0)}(x) ... C^{(m)}(x), each trained on a re-weighted sample, now with the error rate f_err determining the weights]

AdaBoost re-weights the events misclassified by the previous classifier by the factor

\frac{1 - f_{\rm err}}{f_{\rm err}}, \qquad {\rm with}\quad
f_{\rm err} = \frac{\text{misclassified events}}{\text{all events}}

AdaBoost also weights the classifiers themselves, using the error rate of the individual classifier:

y(\mathbf{x}) = \sum_{i}^{N_{\rm Classifier}} \log\!\left(\frac{1 - f_{\rm err}^{(i)}}{f_{\rm err}^{(i)}}\right) C^{(i)}(\mathbf{x})

A compact code sketch of this procedure follows below.
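A toy sketch of these two formulas with decision stumps as the weak classifiers (illustrative, not the TMVA implementation; the stump-selection step that picks the best cut is omitted):

```cpp
#include <cmath>
#include <vector>

struct Stump { size_t var; double cut; };                // weak classifier: one cut

struct Sample {
   std::vector<std::vector<double> > x;                  // x[event][variable]
   std::vector<int>    label;                            // +1 signal, -1 background
   std::vector<double> weight;
};

int stumpClassify(const Stump& s, const std::vector<double>& ev) {
   return ev[s.var] > s.cut ? +1 : -1;                   // "signal" above the cut
}

// weighted error fraction f_err of one stump
double errorFraction(const Stump& s, const Sample& d) {
   double wrong = 0.0, total = 0.0;
   for (size_t i = 0; i < d.x.size(); ++i) {
      total += d.weight[i];
      if (stumpClassify(s, d.x[i]) != d.label[i]) wrong += d.weight[i];
   }
   return wrong / total;
}

// one AdaBoost iteration: boost misclassified events, return classifier weight
double adaBoostStep(const Stump& s, Sample& d) {
   double fErr  = errorFraction(s, d);
   double boost = (1.0 - fErr) / fErr;                   // event re-weight factor
   for (size_t i = 0; i < d.x.size(); ++i)
      if (stumpClassify(s, d.x[i]) != d.label[i]) d.weight[i] *= boost;
   return std::log(boost);                               // log((1-f_err)/f_err)
}

// final response: y(x) = sum_i log((1-f_err_i)/f_err_i) * C_i(x)
double boostedResponse(const std::vector<Stump>& stumps,
                       const std::vector<double>& alphas,
                       const std::vector<double>& ev) {
   double y = 0.0;
   for (size_t i = 0; i < stumps.size(); ++i)
      y += alphas[i] * stumpClassify(stumps[i], ev);
   return y;
}
```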


AdaBoost: A simple demonstration

The example (somewhat artificial, but nice for demonstration):

a data sample with three "bumps"

a weak classifier, i.e. one single simple cut (↔ a decision tree stump: split at var(i) > x vs. var(i) <= x)

Two reasonable cuts:

a) Var0 > 0.5: ε_signal = 66%, ε_bkg ≈ 0% → 16.5% of all events misclassified

b) Var0 < -0.5: ε_signal = 33%, ε_bkg ≈ 0% → 33% of all events misclassified

The training of a single decision tree stump will find "cut a)".


AdaBoost: A simple demonstration

The first "tree", choosing cut a), will give an error fraction f_err = 0.165.

Before building the next "tree", the wrongly classified training events are re-weighted by (1 - f_err)/f_err ≈ 5.

The next "tree" therefore sees essentially the following data sample: [figure: re-weighted sample]

... and will hence choose "cut b)": Var0 < -0.5.

The combined classifier, Tree1 + Tree2: the (weighted) average of the responses of both trees to a test event is able to separate signal from background as well as one would expect from the most powerful classifier.


AdaBoost: A simple demonstration

[Figure: classifier response with only 1 tree "stump" vs. with 2 tree "stumps" combined via AdaBoost]


Boosting Fisher Discriminant

Fisher discriminant: y(x) is a linear combination of the input variables.

A linear combination of Fisher discriminants is again a Fisher discriminant; hence "simple boosting doesn't help".

However: apply a CUT on the Fisher discriminant in each step (or any other non-linear transformation),

then it works!


Support Vector Machine

There are methods to create linear decision boundaries using only measures of distances;

this leads to a quadratic optimisation process.

Using variable transformations, i.e. moving into a higher-dimensional space (e.g. using var1*var1 in a Fisher discriminant), allows in principle to separate non-linear problems with linear decision boundaries.

The decision boundary in the end is defined only by the training events that are closest to the boundary:

the Support Vector Machine.


Support Vector Machines

Separable data:

Find the hyperplane that best separates signal from background.

Best separation: maximum distance (margin) between the closest events (the support vectors) and the hyperplane.

The linear decision boundary is the optimal hyperplane.

[Figure: signal and background events in the (x1, x2) plane, with the margin and the support vectors marked]

The solution of largest margin depends only on the inner products of the support vectors (distances):

a quadratic minimisation problem.

Non-separable data:

If the data are non-separable, add a misclassification cost term C \cdot \sum_i \xi_i (with slack variables \xi_i) to the minimisation function.


Support Vector Machines

Non-linear cases:

Transform the variables into a higher-dimensional feature space where again a linear boundary (hyperplane) can separate the data.

[Figure: circularly separated (x1, x2) data mapped via Φ(x1, x2) into a space where a hyperplane separates them]

The explicit transformation need not be specified: the solution depends only on the "scalar product" (inner product), x·y → Φ(x)·Φ(y).

Certain kernel functions can be interpreted as scalar products between transformed vectors in the higher-dimensional feature space, e.g. Gaussian, polynomial, sigmoid kernels (see below).

Choose a kernel and fit the hyperplane using the linear techniques developed above.

The kernel size parameter typically needs careful tuning! (Overtraining!)
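For reference, the three kernels named above in their usual form (a sketch of the standard definitions; σ, d, θ and κ are the tunable kernel parameters):

```latex
K_{\mathrm{Gauss}}(\mathbf{x},\mathbf{y}) =
    \exp\!\Bigl(-\tfrac{\lVert\mathbf{x}-\mathbf{y}\rVert^{2}}{2\sigma^{2}}\Bigr),
\qquad
K_{\mathrm{poly}}(\mathbf{x},\mathbf{y}) = \bigl(\mathbf{x}\cdot\mathbf{y} + \theta\bigr)^{d},
\qquad
K_{\mathrm{sigmoid}}(\mathbf{x},\mathbf{y}) = \tanh\bigl(\kappa\,\mathbf{x}\cdot\mathbf{y} + \theta\bigr)
```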


Data Preparation and Preprocessing

Data input format: ROOT TTree or ASCII

Supports selection of any subset, combination, or function of the available variables

Supports application of pre-selection cuts (possibly independent for signal and background)

Supports global event weights for the signal or background input files

Supports the use of any input variable as an individual event weight

Supports various methods for splitting into training and test samples:

block-wise

random

periodic (e.g. 3 test events, 2 training events, 3 test events, 2 training events, ...)

user-defined separate training and test trees

Preprocessing of input variables (e.g. decorrelation); a sketch of the corresponding calls follows below.
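A hedged sketch of the data-preparation calls listed above, continuing the factory example; the tree pointers, the "weight" branch and the cut are hypothetical, and the call names follow the TMVA tutorial macros of that era:

```cpp
factory.AddVariable("var1", 'F');
factory.AddVariable("myvar := var2 + var3", 'F');   // function of tree branches
factory.AddSignalTree(sigTree, 1.0);                // 1.0 = global event weight
factory.AddBackgroundTree(bkgTree, 1.0);
factory.SetBackgroundWeightExpression("weight");    // per-event weight from a branch
TCut presel = "var1 > 0";                           // pre-selection cut
factory.PrepareTrainingAndTestTree(presel,
    "nTrain_Signal=0:nTrain_Background=0:SplitMode=Random:!V");
```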


Code Flow for Training and Application Phases

[Figure: code-flow diagram; see the TMVA tutorial]

Scripts can be ROOT macros, C++ executables, or Python scripts (via PyROOT).


MVA Evaluation Framework

TMVA is not only a collection of classifiers, but an MVA framework

After training, TMVA provides ROOT evaluation scripts (through GUI)

Plot all signal (S) and background (B) input variables with and without pre-processing

Correlation scatters and linear coefficients for S & B

Classifier outputs (S & B) for test and training samples (spot overtraining)

Classifier Rarity distribution

Classifier significance with optimal cuts

B rejection versus S efficiency

Classifier-specific plots:

likelihood reference distributions

classifier PDFs (for probability output and Rarity)

network architecture, weights and convergence

rule-fitting analysis plots

visualisation of decision trees


Evaluating the Classifier Training

Projective likelihood PDFs, MLP training, BDTs, …

average no. of nodes before/after pruning: 4193 / 968


Evaluating the Classifier Training

Classifier output distributions for test and training samples …


Evaluating the Classifier Training

Optimal cut for each classifier ...

Using the TMVA graphical output one can determine the optimal cut on a classifier output (the working point in statistical significance).


Evaluating the Classifier Training

Background rejection versus signal efficiency: the best plot to compare classifier performance.

An elegant variable is the Rarity:

R(y) = \int_{-\infty}^{y} \hat{y}_B(y')\,dy'

It transforms the background to a uniform distribution; the height of the signal peak is then a direct measure of classifier performance.

If the background in data is non-uniform → problem in the training sample.


Evaluating the Classifiers (taken from TMVA output…)

--- Fisher : Ranking result (top variable is best ranked)
--- Fisher : ---------------------------------------------
--- Fisher : Rank : Variable : Discr. power
--- Fisher : ---------------------------------------------
--- Fisher : 1 : var4 : 2.175e-01
--- Fisher : 2 : var3 : 1.718e-01
--- Fisher : 3 : var1 : 9.549e-02
--- Fisher : 4 : var2 : 2.841e-02
--- Fisher : ---------------------------------------------

(the better the variable, the higher it ranks)

--- Factory : Inter-MVA overlap matrix (signal):
--- Factory : ------------------------------
--- Factory :             Likelihood  Fisher
--- Factory : Likelihood:     +1.000  +0.667
--- Factory : Fisher:         +0.667  +1.000
--- Factory : ------------------------------

Input Variable Ranking

Classifier correlation and overlap

How discriminating is a variable?

Do classifiers select the same events as signal and background? If not, there is something to gain!


Evaluating the Classifiers (taken from TMVA output…)

Evaluation results ranked by best signal efficiency and purity (area):

------------------------------------------------------------------------------
MVA                Signal efficiency at bkg eff. (error):   | Sepa-    Signifi-
Methods:           @B=0.01    @B=0.10    @B=0.30    Area    | ration:  cance:
------------------------------------------------------------------------------
Fisher           : 0.268(03)  0.653(03)  0.873(02)  0.882   | 0.444    1.189
MLP              : 0.266(03)  0.656(03)  0.873(02)  0.882   | 0.444    1.260
LikelihoodD      : 0.259(03)  0.649(03)  0.871(02)  0.880   | 0.441    1.251
PDERS            : 0.223(03)  0.628(03)  0.861(02)  0.870   | 0.417    1.192
RuleFit          : 0.196(03)  0.607(03)  0.845(02)  0.859   | 0.390    1.092
HMatrix          : 0.058(01)  0.622(03)  0.868(02)  0.855   | 0.410    1.093
BDT              : 0.154(02)  0.594(04)  0.838(03)  0.852   | 0.380    1.099
CutsGA           : 0.109(02)  1.000(00)  0.717(03)  0.784   | 0.000    0.000
Likelihood       : 0.086(02)  0.387(03)  0.677(03)  0.757   | 0.199    0.682
------------------------------------------------------------------------------
(the better the classifier, the higher it ranks)

Testing efficiency compared to training efficiency (overtraining check):

------------------------------------------------------------------------------
MVA                Signal efficiency: from test sample (from training sample)
Methods:           @B=0.01          @B=0.10          @B=0.30
------------------------------------------------------------------------------
Fisher           : 0.268 (0.275)    0.653 (0.658)    0.873 (0.873)
MLP              : 0.266 (0.278)    0.656 (0.658)    0.873 (0.873)
LikelihoodD      : 0.259 (0.273)    0.649 (0.657)    0.871 (0.872)
PDERS            : 0.223 (0.389)    0.628 (0.691)    0.861 (0.881)
RuleFit          : 0.196 (0.198)    0.607 (0.616)    0.845 (0.848)
HMatrix          : 0.058 (0.060)    0.622 (0.623)    0.868 (0.868)
BDT              : 0.154 (0.268)    0.594 (0.736)    0.838 (0.911)
CutsGA           : 0.109 (0.123)    1.000 (0.424)    0.717 (0.715)
Likelihood       : 0.086 (0.092)    0.387 (0.379)    0.677 (0.677)
------------------------------------------------------------------------------
(check for overtraining: compare test and training efficiencies)


Subjective Summary of TMVA Classifier Properties

[Table: subjective ratings of each classifier (Cuts, Likelihood, PDERS/k-NN, H-Matrix, Fisher, MLP, BDT, RuleFit, SVM) against the criteria: performance with no/linear correlations and with nonlinear correlations; speed of training and of response; robustness against overtraining and against weak input variables; curse of dimensionality; transparency]

The properties of the function discriminant (FDA) depend on the chosen function.


Regression Example

Example: the target is a quadratic function of 2 variables,

approximated by:

a "linear regressor"

a neural network

A training sketch for such a regression task follows below.
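A hedged sketch of booking such a regression task (TMVA 4 added multivariate regression); the tree and branch names (regTree, fvalue) are hypothetical and the option strings are abbreviated:

```cpp
TMVA::Factory factory("TMVARegression", outFile, "!V:AnalysisType=Regression");
factory.AddVariable("var1", 'F');
factory.AddVariable("var2", 'F');
factory.AddTarget("fvalue");                        // the quantity to estimate
factory.AddRegressionTree(regTree, 1.0);
factory.PrepareTrainingAndTestTree("", "SplitMode=Random:!V");
factory.BookMethod(TMVA::Types::kLD,  "LD",  "!V"); // linear regressor
factory.BookMethod(TMVA::Types::kMLP, "MLP", "!V:HiddenLayers=N+5");
factory.TrainAllMethods();
```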


NEW! TMVA Users Guide for 4.0, available on http://tmva.sf.net

TMVA Users Guide: 97 pp., incl. code examples

arXiv physics/0703039

Web page with a quick-reference summary of all training options!


Some Toy Examples


The “Schachbrett” Toy (chess board)

Performance achieved without parameter tuning: PDERS and BDT are the best "out of the box" classifiers.

After some parameter tuning, SVM and ANN (MLP) perform equally well.

[Figure: event distribution, events weighted by SVM response, and the theoretical maximum]


More Toys: Linear-, Cross-, Circular Correlations

Illustrate the behaviour of linear and nonlinear classifiers

Linear correlations (same for signal and background)

Linear correlations (opposite for signal and background)

Circular correlations (same for signal and background)


How does linear decorrelation affect strongly nonlinear cases?

Original correlations

SQRT decorrelation

PCA decorrelation


Weight Variables by Classifier Output

Linear correlations (same for signal and background)

Cross-linear correlations (opposite for signal and background)

Circular correlations (same for signal and background)

How well do the classifiers resolve the various correlation patterns?

(shown for Likelihood, Likelihood-D, PDERS, Fisher, MLP, BDT)


Final Classifier Performance

Background rejection versus signal efficiency curve:

[Figure: background rejection vs. signal efficiency for the linear, cross and circular examples]


Stability with Respect to Irrelevant Variables

Toy example with 2 discriminating and 4 non-discriminating variables.

[Figure: background rejection vs. signal efficiency when using only the two discriminating variables in the classifiers, compared to using all variables]


General Remarks and about Systematics


General Advice for (MVA) Analyses

There is no magic in MVA methods:

no need to be too afraid of "black boxes"

you typically still need to tune carefully and do some "hard work"

The most important thing at the start is finding good observables:

good separation power between S and B

little correlation amongst each other

no correlation with the parameters you try to measure in your signal sample!

Think also about possible combinations of variables: can this eliminate correlations?

Always apply straightforward preselection cuts and let the MVA only do the rest.

"Sharp features should be avoided": they cause numerical problems and loss of information when binning is applied;

simple variable transformations (e.g. log(var1)) can often smooth out these regions and allow signal and background differences to appear more clearly.

Treat regions of the detector that have different features "independently":

lumping them together can introduce correlations where otherwise the variables would be uncorrelated!


Some Words about Systematic Errors

Typical worries are:

What happens if the estimated "probability density" is wrong?

Can the classifier, i.e. the discrimination function y(x), introduce systematic uncertainties?

What happens if the training data do not match "reality"?

Any wrong PDF leads to an imperfect discrimination function y(x) (calling it "wrong" wouldn't be "right"; schematically, y(x) is built from P(x|S) and P(x|B)): a loss of discrimination power, and that's all!

Classical cuts face exactly the same problem. But: in addition to cutting on features that are not correct, you can now also "exploit" correlations that are in fact not correct. HOWEVER: see the next slides.

Systematic errors are only introduced once "Monte Carlo events" with imperfect modelling are used to estimate

efficiency and purity,

the number of expected events.

This is the same problem as in a classical "cut" analysis:

use control samples to test the MVA output distribution y(x)

the combined variable (the MVA output y(x)) might "hide" problems in ONE individual variable:

train the classifier with a few variables only and compare with data

check all input variables individually ANYWAY


Systematic "Error" in Correlations

Use as training sample events that have correlations:

optimise CUTs

train a proper MVA (e.g. Likelihood, BDT)

Assume that in "real data" there are NO correlations → SEE what happens!


Systematic "Error" in Correlations

Compare "data" (test sample) and Monte Carlo (both taken from the same underlying distribution):


Systematic "Error" in Correlations

Compare "data" (test sample) and Monte Carlo (both taken from underlying distributions that differ by the correlation!):

differences are ONLY visible in the MVA-output plots (and if you looked at cut sequences...).


Treatment of Systematic Uncertainties

"Calibration uncertainty" may shift the central value and hence worsen (or improve) the discrimination power of "var4".

Is there a strategy, however, to become "less sensitive" to possible systematic uncertainties?

Classically, for a variable that is prone to uncertainties, one does not cut in the region of the steepest gradient;

classically one would not place the most important cut on an uncertain variable.

Try to make the classifier less sensitive to "uncertain variables":

e.g. re-weight events in the training to decrease the separation in variables with large systematic uncertainty

(certainly not yet a recipe that can strictly be followed, more an idea of what could perhaps be done)


Treatment of Systematic Uncertainties

1st Way

[Figure: classifier output distributions for signal only]


Treatment of Systematic Uncertainties

2nd Way

[Figure: classifier output distributions for signal only]


Copyrights & Credits

TMVA is open source software

Use & redistribution of source permitted according to terms in BSD license

Acknowledgments: The fast development of TMVA would not have been possible without the contribution and feedback from many developers and users, to whom we are indebted: Andreas Hoecker (CERN, Switzerland), Jörg Stelzer (CERN, Switzerland), Peter Speckmayer (CERN, Switzerland), Jan Therhaag (Universität Bonn, Germany), Eckhard von Toerne (Universität Bonn, Germany), Helge Voss (MPI für Kernphysik Heidelberg, Germany), Tancredi Carli (CERN, Switzerland), Asen Christov (Universität Freiburg, Germany), Or Cohen (CERN, Switzerland and Weizmann, Israel), Krzysztof Danielowski (IFJ and AGH/UJ, Krakow, Poland), Dominik Dannheim (CERN, Switzerland), Sophie Henrot-Versille (LAL Orsay, France), Matthew Jachowski (Stanford University, USA), Kamil Kraszewski (IFJ and AGH/UJ, Krakow, Poland), Attila Krasznahorkay Jr. (CERN, Switzerland, and Manchester U., UK), Maciej Kruk (IFJ and AGH/UJ, Krakow, Poland), Yair Mahalalel (Tel Aviv University, Israel), Rustem Ospanov (University of Texas, USA), Xavier Prudent (LAPP Annecy, France), Arnaud Robert (LPNHE Paris, France), Doug Schouten (S. Fraser U., Canada), Fredrik Tegenfeldt (Iowa University, USA, until Aug 2007), Alexander Voigt (CERN, Switzerland), Kai Voss (University of Victoria, Canada), Marcin Wolter (IFJ PAN Krakow, Poland), Andrzej Zemla (IFJ PAN Krakow, Poland).

Backup slides on: (i) some illustrative toy examples, (ii) treatment of systematic uncertainties, (iii) sensitivity to weak input variables.

http://tmva.sf.net/


Questions/Discussion Points

How should one treat systematic uncertainties in MVA methods?

check multidimensional training samples including their correlations

which differences do we care about?

Can one name a particular classifier that is:

always the best?

always the best for a specific class of models?

How could one teach the classifiers to optimise for certain regions of interest (e.g. very large background rejection only)?

Is that actually necessary, or does the cut on the MVA output provide just this?

Train according to the "smallest error on the measurement" rather than the best separation of signal and background?

Does anyone have a good idea for a better treatment of negative weights?

How is the data sample best divided into "test" and "training" samples?

Do we want an additional validation sample?

automatic tuning of parameters

cross validation


Bias-Variance Trade-off and Overtraining

[Figure: prediction error vs. model complexity. Low complexity: high bias, low variance; high complexity: low bias, high variance. With increasing complexity the training-sample error keeps falling while the test-sample error rises again: overtraining! Illustrated alongside by S/B decision boundaries in the (x1, x2) plane of moderate and of excessive flexibility.]


Cross Validation

Many (all) classifiers have tuning parameters "α" that need to be controlled against overtraining:

the number of training cycles, the number of nodes (neural net)

the smoothing parameter h (kernel density estimator)

...

The more free parameters a classifier has to adjust internally, the more prone it is to overtraining.

More training data → better training results, but the division of the data set into a "training" and a "test" sample reduces the "training" data.

Cross validation: divide the data sample into, say, 5 sub-sets.

[Diagram: the 5 sub-sets, with a different one held out as "Test" in each of the 5 rounds]

train 5 classifiers y_i(x, α), i = 1, ..., 5, where classifier y_i(x, α) is trained without the i-th sub-sample

calculate the test error:

{\rm CV}(\alpha) = \frac{1}{N_{\rm events}} \sum_{{\rm events}\;k} L\bigl(y_i(x_k, \alpha)\bigr),
\qquad L: {\rm loss\ function}

choose the tuning parameter α for which CV(α) is minimal and train the final classifier using all data; a generic skeleton follows below.
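A generic k-fold skeleton of this procedure (a sketch, not TMVA code; the train/loss callables are supplied by the user):

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// 'train' builds a model from a sub-sample with tuning parameter alpha,
// 'loss' evaluates the trained model on one held-out event.
template <typename Event, typename Model>
double crossValidate(const std::vector<Event>& data, double alpha,
                     std::function<Model(const std::vector<Event>&, double)> train,
                     std::function<double(const Model&, const Event&)> loss,
                     std::size_t nFolds = 5) {
   double totalLoss = 0.0;
   for (std::size_t fold = 0; fold < nFolds; ++fold) {
      std::vector<Event> trainSet, testSet;
      for (std::size_t i = 0; i < data.size(); ++i)      // hold out the i-th sub-sample
         (i % nFolds == fold ? testSet : trainSet).push_back(data[i]);
      Model y = train(trainSet, alpha);                  // trained without this fold
      for (std::size_t i = 0; i < testSet.size(); ++i)
         totalLoss += loss(y, testSet[i]);               // accumulate the loss L
   }
   return totalLoss / static_cast<double>(data.size());  // CV(alpha)
}
```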


Minimisation

A robust global minimum finder is needed at various places in TMVA.

Default solution in HEP: (T)Minuit/Migrad [how much longer do we need to suffer ...?]

gradient-driven search, using a variable metric; can use a quadratic Newton-type solution

poor global minimum finder, gets quickly stuck in the presence of local minima

Specific global optimisers implemented in TMVA:

Genetic Algorithm: biology-inspired optimisation algorithm

Simulated Annealing: slow "cooling" of the system to avoid "freezing" in a local solution

Brute-force method: Monte Carlo sampling: sample the entire solution space and choose the solution providing the minimum estimator

good global minimum finder, but poor accuracy

TMVA allows minimisers to be chained:

for example, one can use MC sampling to detect the vicinity of a global minimum, and then use Minuit to accurately converge to it.


Minimisation Techniques

Monte Carlo / Grid Search (brute-force methods: random Monte Carlo or grid search):

sample the entire solution space, and choose the solution providing the minimum estimator

good global minimum finder, but poor accuracy

Minuit (default solution in HEP):

gradient-driven search, using a variable metric; can use a quadratic Newton-type solution

poor global minimum finder, gets quickly stuck in the presence of local minima

Simulated Annealing:

like heating up a metal and slowly cooling it down ("annealing")

atoms in a metal move towards the state of lowest energy, while for sudden cooling they tend to freeze in intermediate, higher-energy states → slow "cooling" of the system avoids "freezing" in a local solution

Genetic Algorithm:

biology-inspired

"genetic" representation of points in the parameter space

uses mutation and "crossover"

finds approximately global minima


Binary Trees

A binary tree is a data structure in which each node has at most two children, typically called "left" and "right".

Binary trees are used in TMVA to implement binary search trees and decision trees.

[Figure: tree with a root node and leaf nodes; "<" descends left, ">" descends right; the depth of the tree sets the search cost]

Amount of computing time to search for events:

box search: ∝ (N_events)² · N_var

BT search: ∝ N_events · N_var · ln₂(N_events)

A minimal sketch of the structure follows below.
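A minimal 1-D binary search tree to illustrate the structure (TMVA's version generalises this to N_var dimensions by cycling through the variables with tree depth; this sketch is illustrative only):

```cpp
struct Node {
   double value;
   Node  *left, *right;                     // "<" goes left, otherwise right
   explicit Node(double v) : value(v), left(0), right(0) {}
};

Node* insert(Node* root, double v) {
   if (!root) return new Node(v);           // empty subtree: create a leaf
   if (v < root->value) root->left  = insert(root->left, v);
   else                 root->right = insert(root->right, v);
   return root;
}

bool contains(const Node* root, double v) {
   while (root) {
      if (v == root->value) return true;    // found after O(tree depth) steps
      root = (v < root->value) ? root->left : root->right;
   }
   return false;
}
```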


"Gauss-Decorr-Gauss" etc.: the preprocessing transformations can be chained and iterated, e.g.

G (Gaussianise only)

GDG (Gaussianise, decorrelate, Gaussianise)

GDGDGDG (further iterations)