Transcript
Page 1: G. Cowan SUSSP65, St Andrews, 16-29 August 2009 / Statistical Methods 3 page 1 Statistical Methods in Particle Physics Lecture 3: Multivariate Methods

Statistical Methods in Particle Physics
Lecture 3: Multivariate Methods

SUSSP65

St Andrews

16–29 August 2009

Glen Cowan
Physics Department
Royal Holloway, University of London
[email protected]/~cowan

Outline

Lecture #1: An introduction to Bayesian statistical methods

Role of probability in data analysis (Frequentist, Bayesian)

A simple fitting problem : Frequentist vs. Bayesian solution

Bayesian computation, Markov Chain Monte Carlo

Lecture #2: Setting limits, making a discovery

Frequentist vs Bayesian approach, treatment of systematic uncertainties

Lecture #3: Multivariate methods for HEP

Event selection as a statistical test

Neyman-Pearson lemma and likelihood ratio test

Some multivariate classifiers: NN, BDT, SVM, ...

A simulated SUSY event in ATLAS

[Figure: event display with labels: high pT muons; high pT jets of hadrons; missing transverse energy; colliding beams p p]

Background events

This event from Standard Model ttbar production also has high pT jets and muons, and some missing transverse energy.

→ can easily mimic a SUSY event.

A simulated event

PYTHIA Monte Carlo

pp → gluino-gluino

...

Event selection as a statistical test

For each event we measure a set of numbers: $\vec{x} = (x_1, \ldots, x_n)$

$x_1$ = jet pT
$x_2$ = missing energy
$x_3$ = particle i.d. measure, ...

$\vec{x}$ follows some n-dimensional joint probability density, which depends on the type of event produced, i.e., was it $pp \to t\bar{t}$, $pp \to \tilde{g}\tilde{g}$, ...

E.g. hypotheses H0, H1, ...

Often simply “signal”, “background”

[Figure: densities p(x|H0) and p(x|H1) in the $(x_i, x_j)$ plane]

Finding an optimal decision boundary

In particle physics usually start by making simple “cuts”:

$x_i < c_i$
$x_j < c_j$

Maybe later try some other type of decision boundary:

[Figure: panels in the $(x_i, x_j)$ plane, each showing H0 and H1 regions separated by a decision boundary]
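The difference between rectangular cuts and another type of boundary can be sketched with a toy example. Everything here is an assumption made for illustration: the two-dimensional Gaussian samples, the cut values, and the helper names `passes_cuts`, `passes_linear` and `efficiency` are hypothetical and not taken from the lecture.

```python
import random

def passes_cuts(x, cuts):
    """Rectangular selection: accept the event if every x_i < c_i."""
    return all(xi < ci for xi, ci in zip(x, cuts))

def passes_linear(x, weights, threshold):
    """Linear boundary: accept if the weighted sum of the inputs is below threshold."""
    return sum(w * xi for w, xi in zip(weights, x)) < threshold

def efficiency(events, select):
    """Fraction of events accepted by the selection function."""
    return sum(1 for x in events if select(x)) / len(events)

random.seed(1)
# Toy 2D samples (assumed Gaussians, not real detector variables).
h0 = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(5000)]
h1 = [(random.gauss(2.0, 1.0), random.gauss(2.0, 1.0)) for _ in range(5000)]

cuts = (1.0, 1.0)  # hypothetical cut values c_i, c_j
for name, sample in (("H0", h0), ("H1", h1)):
    eff_cut = efficiency(sample, lambda x: passes_cuts(x, cuts))
    eff_lin = efficiency(sample, lambda x: passes_linear(x, (1.0, 1.0), 2.0))
    print(f"{name}: cuts {eff_cut:.3f}, linear {eff_lin:.3f}")
```

An event such as (0.5, 1.2) fails the cuts but passes the linear selection, which is the kind of difference the alternative boundaries are meant to exploit.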

The optimal decision boundary

Try to best approximate optimal decision boundary based on likelihood ratio:

[Equation not captured in transcript: ratio of the densities p(x|H0) and p(x|H1)]

or equivalently think of the likelihood ratio as the optimal statistic for a test of H0 vs H1.

In general we don't have the pdfs p(x|H0), p(x|H1), ... Rather, we have Monte Carlo models for each process.

Usually training data from the MC models is cheap.

But the models contain many approximations: predictions for observables obtained using perturbation theory (truncated at some order); phenomenological modeling of non-perturbative effects; imperfect detector description, ...
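As a toy illustration of the likelihood-ratio statistic, the sketch below assumes the two pdfs are known Gaussians with independent components, which is exactly what is not available in a real analysis. The ratio is written here as p(x|H1)/p(x|H0); the inverse would define the same family of decision boundaries. The means, width and function names are hypothetical.

```python
import math

def gauss_pdf(x, mean, sigma):
    """One-dimensional Gaussian density."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood_ratio(x, mean0=0.0, mean1=2.0, sigma=1.0):
    """p(x|H1) / p(x|H0) for independent Gaussian components (toy assumption)."""
    p0 = math.prod(gauss_pdf(xi, mean0, sigma) for xi in x)
    p1 = math.prod(gauss_pdf(xi, mean1, sigma) for xi in x)
    return p1 / p0

# Points with the same likelihood ratio lie on the same decision boundary.
for x in [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (2.0, 2.0)]:
    print(x, round(likelihood_ratio(x), 4))
```

In this toy case the contours of constant ratio are straight lines x1 + x2 = const, so (1, 1) and (2, 0) give the same value.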

Two distinct event selection problems

In some cases, the event types in question are both known to exist.

Example: separation of different particle types (electron vs muon). Use the selected sample for further study.

In other cases, the null hypothesis H0 means "Standard Model" events, and the alternative H1 means "events of a type whose existence is not yet established" (to do so is the goal of the analysis).

Many subtle issues here, mainly related to the heavy burden of proof required to establish presence of a new phenomenon.

Typically require p-value of background-only hypothesis below ~ $10^{-7}$ (a 5 sigma effect) to claim discovery of "New Physics".
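The correspondence between a p-value and a significance in sigma can be checked with a short sketch. It assumes the usual one-sided Gaussian tail convention; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def p_value_from_z(z):
    """One-sided tail probability of a standard Gaussian beyond z sigma."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def z_from_p_value(p):
    """Significance Z (in sigma) corresponding to a one-sided p-value."""
    return -NormalDist().inv_cdf(p)

print(p_value_from_z(5.0))      # tail probability for a 5 sigma effect
print(z_from_p_value(2.87e-7))  # significance for a p-value of that size
```

With this convention a 5 sigma effect corresponds to a p-value of roughly 3 in ten million, which is the order of magnitude quoted above.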

Using classifier output for discovery

[Figure, two panels versus classifier output y. Left: f(y) for signal and background, normalized to unity. Right: N(y) for background, normalized to expected number of events, with a possible excess in the search region bounded by ycut.]

Discovery = number of events found in search region incompatible with background-only hypothesis.

p-value of background-only hypothesis can depend crucially on the distribution f(y|b) in the "search region".
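A minimal counting-experiment sketch of this idea follows. It assumes the search region is y > ycut, that the number of background events in it is Poisson distributed with a known mean b, and that b is small enough for the direct sum to converge; the numbers and function names are hypothetical.

```python
import math

def poisson_p_value(n_obs, b, n_terms=200):
    """P(n >= n_obs) for a Poisson count with mean b, summing the upper tail.

    Summing the tail directly avoids the cancellation in 1 - P(n < n_obs);
    the fixed number of terms is adequate only for modest b (an assumption).
    """
    term = math.exp(-b) * b ** n_obs / math.factorial(n_obs)  # P(n_obs; b)
    total = 0.0
    for n in range(n_obs, n_obs + n_terms):
        total += term
        term *= b / (n + 1)
    return total

def count_in_search_region(y_values, y_cut):
    """Number of events whose classifier output exceeds y_cut."""
    return sum(1 for y in y_values if y > y_cut)

# Hypothetical inputs: expected background and observed classifier outputs.
b_expected = 0.5
y_observed = [0.12, 0.95, 0.40, 0.97, 0.91, 0.33, 0.99, 0.93]
n_obs = count_in_search_region(y_observed, y_cut=0.9)
print(n_obs, poisson_p_value(n_obs, b_expected))
```

The result depends directly on b, the integral of the background distribution over the search region, which is why the modeling of f(y|b) in that region matters so much.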

Example of a "cut-based" study

In the 1990s, the CDF experiment at Fermilab (Chicago) measured the number of hadron jets produced in proton-antiproton collisions as a function of their momentum perpendicular to the beam direction:

Prediction low relative to data for very high transverse momentum.

[Figure label: "jet" of particles]

High pT jets = quark substructure?

Although the data agree remarkably well with the Standard Model (QCD) prediction overall, the excess at high pT appears significant:

The fact that the variable is "understandable" leads directly to a plausible explanation for the discrepancy, namely, that quarks could possess an internal substructure.

Would not have been the case if the variable plotted was a complicated combination of many inputs.

High pT jets from parton model uncertainty

Furthermore the physical understanding of the variable led one to a more plausible explanation, namely, an uncertain modeling of the quark (and gluon) momentum distributions inside the proton.

When model adjusted, discrepancy largely disappears:

Can be regarded as a "success" of the cut-based approach. Physical understanding of output variable led to solution of apparent discrepancy.

Neural networks in particle physics

For many years, the only "advanced" classifier used in particle physics.

Usually use single hidden layer, logistic sigmoid activation function:

$s(u) = \frac{1}{1 + e^{-u}}$
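The evaluation of such a network can be sketched as follows. This is only the forward pass of a single-hidden-layer network with logistic sigmoid activations; applying the sigmoid at the output node as well is an assumption, and the weights, biases and function names are hypothetical values chosen for illustration, not trained parameters.

```python
import math

def sigmoid(u):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-u))

def network_output(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> one hidden layer -> single output in (0, 1)."""
    hidden = [
        sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(output_bias + sum(w * h for w, h in zip(output_weights, hidden)))

# Hypothetical network with 2 inputs and 2 hidden nodes.
hidden_weights = [(1.0, -1.0), (0.5, 0.5)]
hidden_biases = [0.0, -0.5]
output_weights = [2.0, -1.0]
output_bias = 0.1

print(network_output((0.3, 1.2), hidden_weights, hidden_biases, output_weights, output_bias))
```

Training consists of adjusting the weights and biases so that the output is close to 1 for signal and 0 for background; that step is not shown here.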

Neural network example from LEP II

Signal: e+e- → WW (often 4 well separated hadron jets)

Background: e+e- → qqgg (4 less well separated hadron jets)

Input variables based on jet structure, event shape, ...; none by itself gives much separation.

Neural network output:

(Garrido, Juste and Martinez, ALEPH 96-144)

Some issues with neural networks

In the example with WW events, goal was to select these events so as to study properties of the W boson.

Needed to avoid using input variables correlated to the properties we eventually wanted to study (not trivial).

In principle a single hidden layer with a sufficiently large number of nodes can approximate arbitrarily well the optimal test variable (likelihood ratio).

Usually start with relatively small number of nodes and increase until misclassification rate on validation data sample ceases to decrease.

Often MC training data is cheap -- problems with getting stuck in local minima, overtraining, etc., less important than concerns of systematic differences between the training data and Nature, and concerns about the ease of interpretation of the output.

Overtraining

[Figure panels: training sample; independent test sample]

If decision boundary is too flexible it will conform too closely to the training points → overtraining.

Monitor by applying classifier to independent test sample.
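The effect can be demonstrated with a deliberately flexible classifier. The sketch below swaps in a k-nearest-neighbour classifier rather than a neural network, purely because it is short to write: with k = 1 the boundary follows every training point, so the error on the training sample is zero while the error on an independent test sample is not. The toy Gaussian samples and all names are assumptions for illustration.

```python
import random

def make_sample(n, rng):
    """Toy sample: label 0 centred at (0, 0), label 1 at (1, 1), unit Gaussian spread."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        data.append(((rng.gauss(label, 1.0), rng.gauss(label, 1.0)), label))
    return data

def knn_predict(x, training, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        training,
        key=lambda t: (t[0][0] - x[0]) ** 2 + (t[0][1] - x[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def error_rate(sample, training, k):
    """Misclassification rate of the k-NN classifier on the given sample."""
    wrong = sum(1 for x, label in sample if knn_predict(x, training, k) != label)
    return wrong / len(sample)

rng = random.Random(42)
train = make_sample(300, rng)
test = make_sample(300, rng)
for k in (1, 25):
    print(k, error_rate(train, train, k), error_rate(test, train, k))
```

A large gap between the training and test error rates is the signature of overtraining; a smoother classifier (larger k here) typically narrows it.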

Particle i.d. in MiniBooNE

Detector is a 12-m diameter tank of mineral oil exposed to a beam of neutrinos and viewed by 1520 photomultiplier tubes:

H.J. Yang, MiniBooNE PID, DNP06

Search for $\nu_\mu$ to $\nu_e$ oscillations required particle i.d. using information from the PMTs.

Decision trees

Out of all the input variables, find the one for which a single cut gives best improvement in signal purity:

$P = \frac{\sum_{\text{signal}} w_i}{\sum_{\text{signal}} w_i + \sum_{\text{background}} w_i}$

where $w_i$ is the weight of the ith event.

Example by MiniBooNE experiment, B. Roe et al., NIM 543 (2005) 577

Resulting nodes classified as either signal/background.

Iterate until stop criterion reached based on e.g. purity or minimum number of events in a node.

The set of cuts defines the decision boundary.
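The search for the best single cut can be sketched as a brute-force scan. As a stand-in for "best improvement in signal purity", this sketch minimizes the summed weighted Gini index W P (1 - P) of the two daughter nodes, which is one common choice and an assumption here; the toy events and function names are hypothetical.

```python
def gini(signal_weight, background_weight):
    """Weighted Gini index W * P * (1 - P), with P the signal purity of the node."""
    total = signal_weight + background_weight
    if total == 0.0:
        return 0.0
    purity = signal_weight / total
    return total * purity * (1.0 - purity)

def best_cut(events):
    """Scan all variables and thresholds; return (variable index, cut value, Gini sum).

    events: list of (x, is_signal, weight), with x a tuple of input variables.
    """
    best = None
    n_vars = len(events[0][0])
    for j in range(n_vars):
        for threshold in sorted({x[j] for x, _, _ in events}):
            # Accumulate [signal, background] weights on each side of the cut.
            sides = {True: [0.0, 0.0], False: [0.0, 0.0]}
            for x, is_signal, w in events:
                sides[x[j] < threshold][0 if is_signal else 1] += w
            score = gini(*sides[True]) + gini(*sides[False])
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best

# Toy events: variable 0 separates the classes, variable 1 does not.
events = [
    ((0.2, 5.0), False, 1.0), ((0.4, 1.0), False, 1.0), ((0.5, 4.0), False, 1.0),
    ((0.7, 2.0), True, 1.0), ((0.8, 5.5), True, 1.0), ((0.9, 1.5), True, 1.0),
]
print(best_cut(events))
```

A full tree would apply the same scan recursively to each daughter node until a stop criterion such as a minimum number of events per node is met.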

BDT example from MiniBooNE

~200 input variables for each event (ν interaction producing e, μ or π).

Each individual tree is relatively weak, with a misclassification error rate ~ 0.4 – 0.45

B. Roe et al., NIM 543 (2005) 577

Monitoring overtraining

From MiniBooNE example:

Performance stable after a few hundred trees.

Comparison of boosting algorithms

A number of boosting algorithms on the market; differ in the update rule for the weights.
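As one concrete example of such an update rule, the sketch below implements a single AdaBoost-style step: events misclassified by the current tree have their weights increased, and the weights are then renormalized. This is one common convention and is not claimed to be the exact rule used in the analyses shown; the function name and toy inputs are hypothetical.

```python
import math

def adaboost_update(weights, misclassified):
    """One AdaBoost-style step: return (alpha, new normalized weights).

    weights: current event weights.
    misclassified: parallel list of booleans, True if the current tree
    classified that event wrongly.
    """
    total = sum(weights)
    err = sum(w for w, wrong in zip(weights, misclassified) if wrong) / total
    alpha = math.log((1.0 - err) / err)  # score of this tree; needs 0 < err < 1
    boosted = [
        w * math.exp(alpha) if wrong else w
        for w, wrong in zip(weights, misclassified)
    ]
    norm = sum(boosted)
    return alpha, [w / norm for w in boosted]

# Toy case: 10 equally weighted events, 4 of them misclassified (err = 0.4).
weights = [0.1] * 10
misclassified = [True] * 4 + [False] * 6
alpha, new_weights = adaboost_update(weights, misclassified)
print(alpha, new_weights)
```

After the update the misclassified events carry half of the total weight, so the next tree concentrates on the events the previous one got wrong.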

Single top quark production (CDF/D0)

Top quark discovered in pairs, but SM predicts single top production.

Use many inputs based on jet properties, particle i.d., ...

[Figure legend: signal (blue + green)]

Pair-produced tops are now a background process.

Different classifiers for single top

Also Naive Bayes and various approximations to likelihood ratio, ...

Final combined result is statistically significant (>5σ level) but the classifier outputs are not easy to understand.

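Support Vector Machines

Map input variables into high dimensional feature space: $\vec{x} \to \vec{\varphi}(\vec{x})$

Maximize distance between separating hyperplanes (margin) subject to constraints allowing for some misclassification.

Final classifier only depends on scalar products of $\vec{\varphi}(\vec{x})$:

$y(\vec{x}) = \mathrm{sign}\left( \sum_i \alpha_i y_i \, \vec{\varphi}(\vec{x}) \cdot \vec{\varphi}(\vec{x}_i) + b \right)$

So only need kernel

$K(\vec{x}, \vec{x}') = \vec{\varphi}(\vec{x}) \cdot \vec{\varphi}(\vec{x}')$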

Bishop ch 7
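The statement that only the kernel is needed can be checked numerically for a simple case. The sketch assumes a quadratic kernel in two dimensions, for which the feature map can be written out explicitly; the kernel choice, test points and function names are illustrative assumptions.

```python
import math

def phi(x):
    """Explicit feature map for the 2D quadratic kernel: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(a, b):
    """Scalar product of two vectors given as tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def kernel(x, z):
    """Quadratic kernel K(x, z) = (x . z)^2, evaluated in the input space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
# The scalar product in feature space agrees with the kernel in input space.
print(dot(phi(x), phi(z)), kernel(x, z))
```

The kernel is evaluated without ever constructing the feature vectors, which is what makes very high dimensional feature spaces practical.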

Using an SVM

To use an SVM the user must as a minimum choose

a kernel function (e.g. Gaussian)
any free parameters in the kernel (e.g. the σ of the Gaussian)
a cost parameter C (plays role of regularization parameter)

The training is relatively straightforward because, in contrast to neural networks, the function to be minimized has a single global minimum.

Furthermore evaluating the classifier only requires that one retain and sum over the support vectors, a relatively small number of points.

The advantages/disadvantages and rationale behind the choices above are not always clear to the particle physicist -- help needed here.
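How the classifier is evaluated once training is done can be sketched as a sum over support vectors with a Gaussian kernel. The support vectors, their coefficients, the bias and σ below are hypothetical values, not the result of an actual training; the cost parameter C does not appear because it only enters the training, where it bounds the coefficients.

```python
import math

def gaussian_kernel(x, z, sigma):
    """Gaussian kernel exp(-|x - z|^2 / (2 sigma^2))."""
    d2 = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def svm_decision(x, support_vectors, sigma, bias):
    """Decision function: bias + sum over support vectors of alpha * label * K(x, sv).

    support_vectors: list of (point, label in {-1, +1}, alpha).
    The sign of the returned value gives the predicted class.
    """
    return bias + sum(
        alpha * label * gaussian_kernel(x, sv, sigma)
        for sv, label, alpha in support_vectors
    )

# Hypothetical trained model with two support vectors.
support_vectors = [((0.0, 0.0), -1, 1.0), ((2.0, 2.0), +1, 1.0)]
sigma, bias = 1.0, 0.0

for x in [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]:
    print(x, svm_decision(x, support_vectors, sigma, bias))
```

Only the support vectors need to be stored, so the cost of evaluating the classifier scales with their number rather than with the size of the training sample.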

SVM in particle physics

SVMs are very popular in the Machine Learning community but have yet to find wide application in HEP. Here is an early example from a CDF top quark analysis (A. Vaiciulis, contribution to PHYSTAT02).

[Figure axis label: signal eff.]

Summary on multivariate methods

Particle physics has used several multivariate methods for many years:

linear (Fisher) discriminant
neural networks
naive Bayes

and has in the last several years started to use a few more:

k-nearest neighbour
boosted decision trees
support vector machines

The emphasis is often on controlling systematic uncertainties between the modeled training data and Nature to avoid false discovery.

Although many classifier outputs are "black boxes", a discovery at 5σ significance with a sophisticated (opaque) method will win the competition if backed up by, say, 4σ evidence from a cut-based method.

Extra slides
