Modern Methods of Data Analysis - WS 07/08
Stephanie Hansmann-Menzemer

Lecture XII (14.01.08)

Contents:
● Multi-Variate Analysis Methods (MVA)


Multi-Variate Analysis Methods

● Univariate – only one attribute, represented by one variable in the data, is considered
● Multivariate – several attributes in the data are treated simultaneously, e.g. several variables are investigated at the same time

Considering attributes separately in a multi-dimensional data set can lead to wrong conclusions because of correlations:

– imagine a data set of dental patients with the following attributes: age, sex, family status, education
– investigate tooth loss in a univariate analysis: the loss depends strongly on age, but also on education and family status. For instance, married people have fewer teeth than singles. Is being married bad for your teeth???
– in a multivariate analysis the influence/correlations of the different factors/attributes are taken into account => married people are in general older than singles
– factoring out the attribute age => married people in fact have better teeth than singles
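The dental example can be reproduced with a toy simulation. The sketch below uses purely synthetic data; the model (tooth loss growing with age, marriage slightly reducing it, marriage probability rising with age) and all names and numbers are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Toy model (assumed): age uniform between 20 and 80 years.
age = rng.uniform(20, 80, n)
# Probability of being married rises with age.
married = (rng.uniform(0, 1, n) < 0.1 + 0.8 * (age - 20) / 60).astype(float)
# Teeth lost: grows with age; being married slightly *reduces* the loss.
lost = 0.2 * (age - 20) - 1.0 * married + rng.normal(0, 1.5, n)

# Univariate view: compare mean tooth loss of married and single patients.
diff_univariate = lost[married == 1].mean() - lost[married == 0].mean()

# Multivariate view: least-squares fit of lost ~ const + age + married,
# which factors out the age dependence.
A = np.column_stack([np.ones(n), age, married])
coef, *_ = np.linalg.lstsq(A, lost, rcond=None)

print("univariate difference (married - single):", diff_univariate)
print("marriage coefficient with age factored out:", coef[2])
```

In this toy model the univariate difference comes out positive (married patients appear to lose more teeth), while the marriage coefficient of the joint fit is negative.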


Event Classification

● How to exploit the information present in the discriminating variables?
● Often, a lot of information is also contained in the correlations between variables
● Different techniques exploit (all) these features in different ways => compare and choose
● How to make a selection? -> Let the machine learn (training)


Methods Classification

● Discriminant Analysis (mainly used)
– discriminate between different groups in data, e.g. signal and background (training events are classified as signal or background)
– find discriminant or test statistics to separate classes
– reduce dimensionality (feature space) of data => simplifies optimization

● Cluster Analysis
– find groups (clusters) of similar attributes in data
– contrary to the discriminant method, the cluster attributes are not known a priori

● Principal Component Analysis
– find new (uncorrelated) variables through transformations
– the new (reduced) set of variables are the principal components and describe the main features in data
– reduce dimensionality in data


Use in Particle Physics

● multivariate methods have been used in physics since the early 1990s
● e.g.: event selection for single top analysis 2006
● physics analysis tasks are inherently multivariate
– event selection (trigger)
– event reconstruction (tracking, vertexing, particle ID)
– signal/background discrimination
– parameter estimation
– ...


Discrimination with LH ratio

● Neyman-Pearson: optimal separation between 2 hypotheses (e.g. signal & background) via a cut on the likelihood ratio (the ratio of the signal and background pdfs):

● The acceptance region with the largest power (highest signal purity) at a given significance level is given by the “likelihood ratio”

● However, the pdfs have to be known analytically, which is not possible in many cases. In practice, MC simulation is used to approximate the pdfs by multidimensional histograms

● The method is impractical if there are too many input variables
– finite MC statistics
– MVA: alternatives which have the potential to come close in separation power to the LH ratio
– e.g. often done, however not optimal: factorize the likelihood; this means neglecting correlations!
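A minimal sketch of the factorized ("projective") likelihood ratio mentioned in the last point, with the one-dimensional pdfs approximated by histograms of simulated events. The toy Gaussian samples, the binning and the helper names `pdf_1d` and `factorized_lh_ratio` are assumptions for illustration, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "MC" samples (assumed): two input variables per event.
sig = rng.normal(loc=[1.0, 1.0], scale=1.0, size=(50000, 2))
bkg = rng.normal(loc=[-1.0, -1.0], scale=1.0, size=(50000, 2))

edges = np.linspace(-6.0, 6.0, 41)  # common binning for all variables


def pdf_1d(sample):
    """Histogram estimate of a 1D pdf; a tiny offset avoids division by zero."""
    hist, _ = np.histogram(sample, bins=edges, density=True)
    return hist + 1e-9


sig_pdfs = [pdf_1d(sig[:, k]) for k in range(2)]
bkg_pdfs = [pdf_1d(bkg[:, k]) for k in range(2)]


def factorized_lh_ratio(x):
    """Product over variables of the 1D signal/background pdf ratios.

    Correlations between the variables are neglected by construction.
    """
    ratio = 1.0
    for k, xk in enumerate(x):
        b = np.clip(np.searchsorted(edges, xk) - 1, 0, len(edges) - 2)
        ratio *= sig_pdfs[k][b] / bkg_pdfs[k][b]
    return ratio


print(factorized_lh_ratio([1.0, 1.0]))    # signal-like event
print(factorized_lh_ratio([-1.0, -1.0]))  # background-like event
```

Only n one-dimensional histograms have to be filled instead of one n-dimensional histogram, which is why the approach remains practical with finite MC statistics.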


Linear Discriminant Analysis (I)

● Simplest form for the test statistic is a linear function of the input variables:

● How to compute the optimal components of the coefficient vector?

mean value of signal/background data:

covariance matrix of signal/background data:
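A sketch of the quantities named above, in assumed notation that follows the usual textbook treatment (the symbols t, a, μ_k, V_k and the hypothesis label k = s, b for signal and background are choices made here):

```latex
% linear test statistic with coefficient vector a
t(\vec{x}) = \sum_{i=1}^{n} a_i x_i = \vec{a}^{\,T}\vec{x}

% mean values of the input variables for hypothesis k = s, b
(\mu_k)_i = \int x_i \, f(\vec{x}\,|\,H_k)\, d\vec{x}

% covariance matrix of the input variables for hypothesis k = s, b
(V_k)_{ij} = \int (x - \mu_k)_i \,(x - \mu_k)_j \, f(\vec{x}\,|\,H_k)\, d\vec{x}
```

In practice both integrals are replaced by sample averages over the signal and background training events.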


Linear Discriminant Analysis (II)

mean value of test statistic t for signal and background:

variance of test statistic t for signal and background:

We want to maximize the distance between the signal and background mean values of t to get good separation between signal and background. We also want the data to be as closely concentrated around the mean values as possible => minimize the variances of t.

=> Fisher discriminant
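In assumed notation (a is the coefficient vector of the linear test statistic t, μ_k and V_k are the mean vector and covariance matrix of the inputs for k = s, b), the two requirements above are commonly combined into a single quantity J(a) to be maximized. This is the standard textbook form, given here as a sketch:

```latex
% mean and variance of t for hypothesis k = s, b
\tau_k = \vec{a}^{\,T}\vec{\mu}_k, \qquad
\Sigma_k^2 = \vec{a}^{\,T} V_k\, \vec{a}

% large distance of the means, small variances
J(\vec{a}) = \frac{(\tau_s - \tau_b)^2}{\Sigma_s^2 + \Sigma_b^2}
           = \frac{\vec{a}^{\,T} B\, \vec{a}}{\vec{a}^{\,T} W\, \vec{a}}

% "between" and "within" matrices
B_{ij} = (\mu_s - \mu_b)_i \,(\mu_s - \mu_b)_j, \qquad W = V_s + V_b
```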


Fisher Discriminant

B, W depend only on the signal and background samples

can be computed from the S & B samples, without knowledge of the pdfs!
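A minimal sketch of how the Fisher coefficients can be obtained from samples alone. It assumes the standard result that the optimal coefficient vector is proportional to W^-1 (μ_s - μ_b), with W the sum of the signal and background covariance matrices; the toy samples and the function name `fisher_coefficients` are illustrative assumptions.

```python
import numpy as np


def fisher_coefficients(sig, bkg):
    """Coefficients a ~ W^-1 (mu_s - mu_b), estimated from the two samples."""
    mu_s, mu_b = sig.mean(axis=0), bkg.mean(axis=0)
    W = np.cov(sig, rowvar=False) + np.cov(bkg, rowvar=False)
    return np.linalg.solve(W, mu_s - mu_b)


rng = np.random.default_rng(2)
# Toy samples (assumed): strongly correlated variables, means differ only in x1.
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
sig = rng.multivariate_normal([1.0, 0.0], cov, size=20000)
bkg = rng.multivariate_normal([0.0, 0.0], cov, size=20000)

a = fisher_coefficients(sig, bkg)
t_sig, t_bkg = sig @ a, bkg @ a  # test statistic t = a^T x per event

print("coefficients:", a)
print("mean t (signal), mean t (background):", t_sig.mean(), t_bkg.mean())
```

Although the means differ only in the first variable, the second variable gets a non-zero (negative) coefficient because of the correlation; a cut on the first variable alone would ignore that information.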


Fisher Discriminant for Gaussian Distribution

● Assume multidimensional Gaussian distributions for S and B with a common covariance matrix

The LH ratio is then an exponential function of t. The exponential function is monotonic, thus cutting on t gives the same separation power as cutting on the LH ratio! In this case the Fisher discriminant is the optimal test statistic!
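The argument can be sketched as follows, in assumed notation: both pdfs are Gaussians with means μ_s, μ_b and the same covariance matrix V, so that W = 2V and the Fisher coefficients are a = W^-1 (μ_s - μ_b).

```latex
f(\vec{x}\,|\,H_k) \propto
  \exp\!\left[-\tfrac{1}{2}\,(\vec{x}-\vec{\mu}_k)^T V^{-1} (\vec{x}-\vec{\mu}_k)\right],
  \qquad k = s, b

% the terms quadratic in x cancel in the ratio
\frac{f(\vec{x}\,|\,H_s)}{f(\vec{x}\,|\,H_b)}
  = \exp\!\left[(\vec{\mu}_s-\vec{\mu}_b)^T V^{-1}\vec{x}
      - \tfrac{1}{2}\,\vec{\mu}_s^{\,T} V^{-1}\vec{\mu}_s
      + \tfrac{1}{2}\,\vec{\mu}_b^{\,T} V^{-1}\vec{\mu}_b\right]

% with a = W^{-1}(\mu_s - \mu_b) and W = 2V: (\mu_s-\mu_b)^T V^{-1} x = 2\,a^T x = 2t
\frac{f(\vec{x}\,|\,H_s)}{f(\vec{x}\,|\,H_b)} \propto e^{\,2t}
```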


Introduction Neural Networks:

● There are many areas where computers are significantly faster (“better”) than the human brain:
– e.g. all kinds of computation & numerical algorithms
● However, in some areas the human brain is significantly better
– e.g. pattern recognition (handwriting, face recognition)
● The human brain is much more robust against (partially) missing information (shadow/fog on picture etc.)
● Neural nets are motivated by the structure of the brain (“neurons”)


Model of Artificial Neuron: “Unit”

● McCulloch-Pitts Unit

[Figure: input links x[1], x[2], x[3], ... -> input function Σ a[i,j] x[i] -> activation function g[j] -> output x[j] -> output links]

x[j] = g[j](Σ a[i,j] x[i])
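The unit formula translates almost directly into code. The sketch below is illustrative: the function names `unit` and `step`, the input values and the weights are assumptions.

```python
import numpy as np


def unit(inputs, weights, activation):
    """Output of one unit: activation applied to the weighted sum of the inputs."""
    return activation(np.dot(weights, inputs))


def step(z, threshold=0.0):
    """Step (threshold) activation."""
    return 1.0 if z >= threshold else 0.0


x = np.array([1.0, 0.5, -2.0])  # inputs x[i]
a = np.array([0.2, 0.4, 0.1])   # weights a[i,j] of the links into unit j

out_step = unit(x, a, step)     # weighted sum is 0.2 + 0.2 - 0.2 = 0.2
out_tanh = unit(x, a, np.tanh)  # same unit with a smooth activation
print(out_step, out_tanh)
```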


Activation Function

● Simple case: step function (threshold)
● mainly used: sigmoid or tanh(x)
– differentiable, useful for optimization
– useful to describe non-linear functions

[Figure: sigmoid]
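The differentiability matters because training needs the derivative of the activation function. A small sketch, assuming the logistic form 1/(1+exp(-z)) for the sigmoid, checks the closed-form derivative against a numerical one and shows that tanh is a rescaled sigmoid:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    """Closed-form derivative: sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


z = np.linspace(-4.0, 4.0, 9)
h = 1e-6
# Central finite difference as an independent cross-check of the derivative.
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)

# tanh(z) = 2 * sigmoid(2z) - 1, so both have the same S shape.
tanh_from_sigmoid = 2.0 * sigmoid(2.0 * z) - 1.0

print(np.max(np.abs(numeric - sigmoid_prime(z))))
```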


Structure of Neural Nets (I)

● One-directional flow of information: “feed forward” net
● Typically ordered in “layers”
– input layer, output layer
– hidden layers
● One hidden layer + sigmoid activation function is enough to approximate any continuous function to arbitrary precision
● Two hidden layers + sigmoid activation function are enough to describe any function to arbitrary precision


Structure of Neural Nets (II)

● Nets without hidden layers, with step function:

and in n=3 dimensions:

no hidden layers: only linear separation possible, equivalent to Fisher discriminant analysis


Structure of Neural Nets (III)

1 hidden layer & step function: convex polygons

2 hidden layers & step function: overlay of several convex polygons, here: A and not B

additional layers don't bring any further features


AND/OR/XOR Nets

● “Design” an AND and an OR network:
– two input nodes
– one output node
– no hidden layer
– give the weights of the connections & the threshold for the activation function

● Design an XOR network:
– two input nodes
– one output node
– one hidden layer
– give the weights of the connections & the threshold for the activation function
– Why is it not possible to use a network without a hidden layer?
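One possible solution of the exercise, as a sketch: the weights and thresholds below are one choice among many, and the helper `step_unit` is a hypothetical name. XOR is built as "OR and not AND" with two hidden units; without a hidden layer a single unit can only separate the inputs with one straight line, and no line puts (0,1) and (1,0) on one side and (0,0) and (1,1) on the other.

```python
def step_unit(inputs, weights, threshold):
    """Threshold unit: fires (1) if the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def and_net(x1, x2):
    return step_unit((x1, x2), (1, 1), 1.5)  # both inputs needed


def or_net(x1, x2):
    return step_unit((x1, x2), (1, 1), 0.5)  # one input is enough


def xor_net(x1, x2):
    # Hidden layer: one OR unit and one AND unit.
    h_or = step_unit((x1, x2), (1, 1), 0.5)
    h_and = step_unit((x1, x2), (1, 1), 1.5)
    # Output: OR, vetoed by AND through a negative weight.
    return step_unit((h_or, h_and), (1, -1), 0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, and_net(x1, x2), or_net(x1, x2), xor_net(x1, x2))
```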


Recursive Nets (I)

“direct feedback”: neurons can enhance or suppress their own activation (depending on the sign of the weight). In this structure neurons are often at the limits of their activation region.

“indirect feedback”: connection to neurons of lower levels. This structure is often used to focus on some special input characteristics.


Recursive Nets (II)

“lateral feedback”: feedback among all nodes of each hidden layer. Often used with suppressing connections to neighbouring nodes and a self-activating connection. The strongest node is activated, others are suppressed. “The winner takes it all” nets.

Fully connected nets: if the weight matrix is symmetric and there is no self-activation (diagonal elements = 0), this is a Hopfield net (only of historical importance).

Feed forward nets with sigmoid function & one or two hidden layers are the most commonly used nets, often called multi layer perceptron (MLP).


Error (Loss) Function

● Parameters of the NN are determined by minimizing an error function => “network training”, backward propagation
● E.g. Χ²-like error or loss function:

s: activation function; a[i], w[ij]: weights of connections

NN target value for background

NN target value for signal

w: weight to adjust for different sizes of the S and B training samples
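A minimal sketch of network training by backward propagation with a squared-error loss. Everything specific here is an assumption for illustration: the toy Gaussian samples, target values 1 (signal) and 0 (background), equal sample sizes (so no extra sample weight is needed), one hidden layer of 4 sigmoid units without bias terms, and plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
sig = rng.normal(+1.0, 1.0, size=(n, 2))
bkg = rng.normal(-1.0, 1.0, size=(n, 2))
x = np.vstack([sig, bkg])
target = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = signal, 0 = background


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


n_hidden = 4
w = rng.normal(0.0, 0.5, size=(2, n_hidden))  # weights input -> hidden
a = rng.normal(0.0, 0.5, size=n_hidden)       # weights hidden -> output


def forward(inputs):
    hidden = sigmoid(inputs @ w)
    return hidden, sigmoid(hidden @ a)


def loss(out):
    return np.mean((out - target) ** 2)


loss_before = loss(forward(x)[1])

eta = 0.5  # learning rate
for _ in range(500):
    hidden, out = forward(x)
    # Backward propagation: chain rule from the output back to the weights.
    delta_out = 2.0 * (out - target) * out * (1.0 - out) / len(x)
    grad_a = hidden.T @ delta_out
    delta_hidden = np.outer(delta_out, a) * hidden * (1.0 - hidden)
    grad_w = x.T @ delta_hidden
    a -= eta * grad_a
    w -= eta * grad_w

loss_after = loss(forward(x)[1])
print(loss_before, loss_after)
```

For samples of different size, each event's contribution to the loss would additionally be multiplied by a per-class weight.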


Control Sample

● too many (hidden) nodes, too small a training sample and too long training => risk of overtraining
● use a control sample: stop training when the performance on the control sample gets worse
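The stopping rule can be sketched as follows. The function name `early_stopping_epoch`, the `patience` parameter and the loss values are made up for illustration; in a real training the losses would be evaluated on the control sample after each epoch.

```python
def early_stopping_epoch(control_losses, patience=3):
    """Return the epoch with the lowest control-sample loss seen before stopping.

    Scanning stops once the loss has not improved for `patience` epochs.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(control_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # control performance keeps getting worse: stop training
    return best_epoch


# Made-up control-sample losses per epoch, for illustration only.
toy_losses = [0.50, 0.40, 0.33, 0.30, 0.31, 0.32, 0.34, 0.29]
print(early_stopping_epoch(toy_losses))
```

With the default patience the scan stops at epoch 6 and returns epoch 3; the later fluctuation to 0.29 is never reached, which is the price of stopping early.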


Neural Nets: Practical Aspects

● Choose input variables sensibly:
– Do not include variables that are badly simulated
– Avoid variables with high correlations among themselves => drop all but one
– Some input variables have no discriminative power => drop them, reduce dimensionality
– Transform strongly peaked distributions into smoother ones, using log(), for instance
– Transform all variables to a similar numerical range
● Choose architecture sensibly:
– start with a simple architecture, increase complexity gradually
● Avoid overtuning, use cross validation on an independent training sample
● NN are no magic, understand what your trained NN is doing!
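Two of the preprocessing steps above (smoothing a peaked distribution with a logarithm, bringing all variables to a similar range) and a correlation check can be sketched like this. The toy variables `pt` and `angle` and the helper `standardize` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
# Toy inputs (assumed): one strongly peaked variable with a long tail,
# one variable in a completely different numerical range.
pt = rng.exponential(scale=5.0, size=10000)
angle = rng.uniform(0.0, np.pi, size=10000)


def standardize(v):
    """Shift to zero mean and scale to unit standard deviation."""
    return (v - v.mean()) / v.std()


# log1p = log(1 + x) smooths the peak and is safe for values at zero.
inputs = np.column_stack([standardize(np.log1p(pt)), standardize(angle)])

# Correlation matrix of the inputs: pairs with |corr| close to 1 are
# candidates for dropping all but one variable.
corr = np.corrcoef(inputs, rowvar=False)
print(inputs.mean(axis=0), inputs.std(axis=0))
print(corr)
```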


Some (older) physicists have strong opinions against NN ...

● You will often hear the following arguments:
– “If MC doesn't describe data perfectly, a NN cannot be used.” This is wrong!
– It is true that the better the data-MC agreement, the higher the separation power in data of a NN trained on MC. But this is true for every method developed on MC. Every discriminating tool, even a simple cut, has to be evaluated in data, independently of whether it has been developed on MC or on data!
● Use NN if they give a significant improvement; if not, using simple cuts is fine as well.


Example: Signal Selection

● Examine decays:

– Signal: events from (tuned) MC
– Background: events from B mass sidebands
– Input variables: particle ID (time-of-flight, dE/dx in drift chamber), momentum of daughter particles, opening angle between daughters, quality of reconstructed daughters (number of hits on the tracks), Χ² of vertex fit


Principal Component Analysis (I)

● find linearly uncorrelated variables by variable transformation
● reduce dimensionality of data

[Figure: background and signal distributions in the (x, y) plane, comparing tuned box cuts on the original variables x, y with a better (1D) cut along the axis with the largest difference between signal and background]

Assume the background is uncorrelated in two variables x and y, but the signal is correlated. A simple 2D box cut is not optimal. Better use a 1D cut in a new variable.

PCA is often used for NN preprocessing.


PCA (transformed variable space)

● Principal (main) components in data – orthogonal unit vectors along maximal data variances
● First principal component: for each point (x,y)[i] calculate the perpendicular distance d[i](u1) of the point to u1 (y = αx+β) and find α, β which minimize ∑d[i]²
● The points (x,y)[i] then have maximal variance along direction u1: “The main scatter in data points is along this direction”
● Mathematically:
– Goal is to transform X=(x,y) to U=(u1,u2)
– Calculate covariance matrix Cov(x)
– Compute eigenvalues λ[i] and eigenvectors v[i]
– Construct rotation matrix T=Col(v[i])
– Finally calculate u[i]=T^T x[i]
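The mathematical recipe above can be sketched with NumPy. The toy data set (a correlated two-dimensional Gaussian) is an assumption for illustration; centering the data before the rotation is a common convention added here.

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy data (assumed): two correlated variables x and y.
data = rng.multivariate_normal([0.0, 0.0], [[3.0, 2.0], [2.0, 3.0]], size=10000)

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)     # covariance matrix Cov(x)

eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]        # largest variance first
eigvals, T = eigvals[order], eigvecs[:, order]  # T has the eigenvectors as columns

u = centered @ T                         # row-wise u[i] = T^T x[i]
cov_u = np.cov(u, rowvar=False)          # diagonal up to rounding errors

print(eigvals)
print(cov_u)
```

The transformed variables are uncorrelated, and the first one carries the largest variance; dropping the components with small eigenvalues reduces the dimensionality.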


Random Grid Search

● Rapid search for optimal cuts
● Search systematically over a randomly selected grid of cut values for the discriminating variables to determine the best cut
● Density of grid points is randomly selected following the distribution of the signal variables (MC)
● Advantage: more efficient than a regularly spaced grid, less time-consuming

[Figure: Grid Search vs. Random Grid Search; signal and background]
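A sketch of the idea, under stated assumptions: the candidate cut points are taken to be randomly chosen signal MC events (so their density follows the signal distribution), each candidate defines a pair of lower cuts, and S/sqrt(S+B) is used as an example figure of merit. The toy samples and the function name `significance` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy samples (assumed): two discriminating variables per event.
sig = rng.normal(1.0, 1.0, size=(2000, 2))
bkg = rng.normal(-1.0, 1.0, size=(2000, 2))


def significance(cut):
    """S / sqrt(S + B) for events passing x > cut[0] and y > cut[1]."""
    s = np.sum(np.all(sig > cut, axis=1))
    b = np.sum(np.all(bkg > cut, axis=1))
    return s / np.sqrt(s + b) if s + b > 0 else 0.0


# Random grid: 200 signal events serve as candidate cut points.
candidates = sig[rng.choice(len(sig), size=200, replace=False)]
scores = np.array([significance(c) for c in candidates])
best_cut = candidates[np.argmax(scores)]

no_cut = len(sig) / np.sqrt(len(sig) + len(bkg))
print("best cut:", best_cut, "score:", scores.max(), "without cuts:", no_cut)
```

A regular grid with the same number of points would place many of them in regions without signal, where no sensible cut can lie.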


Optimization

● Assume we have our discriminants (NN, Fisher, Likelihood, simple cuts). According to the Neyman-Pearson lemma, the ratio D(x)=S(x)/B(x) has maximal power (smallest misidentification) for a given signal efficiency.
● Where to cut? It's your choice: choose the signal efficiency => this then fixes the cut

[Figure: background efficiency as function of signal efficiency]

This type of plot is useful to compare the merits of various discriminating variables, but it doesn't tell where to cut.
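The efficiency curve can be sketched by scanning a cut on a discriminant. The toy discriminant distributions and the chosen signal efficiency of 90% are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
# Toy discriminant values (assumed): larger values are more signal-like.
d_sig = rng.normal(1.0, 1.0, 10000)
d_bkg = rng.normal(-1.0, 1.0, 10000)

cuts = np.linspace(-5.0, 5.0, 101)
eff_sig = np.array([(d_sig > c).mean() for c in cuts])  # signal efficiency
eff_bkg = np.array([(d_bkg > c).mean() for c in cuts])  # background efficiency

# "Choose the signal efficiency => this then fixes the cut"
target_eff = 0.9
idx = np.argmin(np.abs(eff_sig - target_eff))
print("cut:", cuts[idx], "signal eff.:", eff_sig[idx], "background eff.:", eff_bkg[idx])
```

Plotting `eff_bkg` against `eff_sig` gives the curve described above; a better discriminant has a lower background efficiency at the same signal efficiency.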


Figure Of Merit

● Every optimization needs a figure of merit to tune on or to validate the tuning.

– Optimal for fitting signal properties in mixed signal+background sample

– Sometimes used for counting rates (e.g. branching ratio measurements)


TMVA: Toolkit for MultiVariate Data Analysis with ROOT

● large variety of sophisticated data selection algorithms
– Rectangular cut optimization
– Projective and Multi-dimensional cut optimization
– Fisher discriminant
– ANN (3 diff. implementations)
– Boosted/bagged Decision Trees
● one common interface to the different MVA methods
– easy to use & to compare many different MVA methods
● common preprocessing of input data: decorrelation, PCA
● TMVA provides training/test and evaluation of all MVAs
● Each MVA method provides a ranking of input variables
● choose the best one for your selection problem
● available as open source package
● however, still under development ... easily out of date

http://tmva.sourceforge.net


Prediction of your free will? www.phit.de/mousegame

[Figure: left/right click of mouse from test person; agreement of phi-t prediction with input]


Used pattern l l l r r; after a few iterations the net learned it ...