1 Glen Cowan Multivariate Statistical Methods in Particle Physics Machine Learning and Multivariate Statistical Methods in Particle Physics Glen Cowan RHUL Physics www.pp.rhul.ac.uk/~cowan RHUL Computer Science Seminar 17 March, 2009

Machine Learning and Multivariate Statistical Methods in Particle Physics

Glen Cowan
RHUL Physics
www.pp.rhul.ac.uk/~cowan

RHUL Computer Science Seminar

17 March, 2009

Outline

Quick overview of particle physics at the Large Hadron Collider (LHC)

Multivariate classification from a particle physics viewpoint

Some examples of multivariate classification in particle physics
  Neural Networks
  Boosted Decision Trees
  Support Vector Machines

Summary, conclusions, etc.

The Standard Model of particle physics

Matter... + gauge bosons...

photon (γ), W, Z, gluon (g)

+ relativity + quantum mechanics + symmetries... = Standard Model

25 free parameters (masses, coupling strengths,...).

Includes Higgs boson (not yet seen).

Almost certainly incomplete (e.g. no gravity).

Agrees with all experimental observations so far.

Many candidate extensions to SM (supersymmetry, extra dimensions,...)

The Large Hadron Collider

Counter-rotating proton beams in 27 km circumference ring

pp centre-of-mass energy 14 TeV

Detectors at 4 pp collision points:
  ATLAS (general purpose)
  CMS (general purpose)
  LHCb (b physics)
  ALICE (heavy ion physics)

The ATLAS detector

2100 physicists
37 countries
167 universities/labs

25 m diameter
46 m length
7000 tonnes
~10^8 electronic channels

A simulated SUSY event in ATLAS

[Figure: event display of the incoming p p collision, labelled: high pT muons; high pT jets of hadrons; missing transverse energy]

Background events

This event from Standard Model ttbar production also has high pT jets and muons, and some missing transverse energy.

→ can easily mimic a SUSY event.

LHC event production rates

[Figure: production rates of LHC processes, annotated: most events (boring); mildly interesting; interesting; very interesting (~1 out of every 10^11)]

LHC data

At LHC, ~10^9 pp collision events per second, mostly uninteresting

  do quick sifting, record ~200 events/sec
  single event ~ 1 Mbyte
  1 “year” ≈ 10^7 s, 10^16 pp collisions / year
  2 × 10^9 events recorded / year (~2 Pbyte / year)

For new/rare processes, rates at LHC can be vanishingly small
  e.g. Higgs bosons detectable per year could be ~10^3

→ 'needle in a haystack'

For Standard Model and (many) non-SM processes we can generate simulated data with Monte Carlo programs (including simulation of the detector).
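The data-volume figures quoted above follow from simple arithmetic. The short Python check below reproduces them; the variable names are illustrative and the inputs are the rounded values from the slide, so the results are order-of-magnitude estimates only.

```python
# Back-of-envelope check of the LHC data volume (rounded inputs from the slide).
collision_rate_hz = 1e9     # ~10^9 pp collisions per second
recorded_rate_hz = 200      # events kept per second after the quick sifting
event_size_bytes = 1e6      # ~1 Mbyte per recorded event
seconds_per_year = 1e7      # one accelerator "year"

collisions_per_year = collision_rate_hz * seconds_per_year
recorded_per_year = recorded_rate_hz * seconds_per_year
bytes_per_year = recorded_per_year * event_size_bytes
kept_fraction = recorded_rate_hz / collision_rate_hz

print(f"pp collisions per year : {collisions_per_year:.1e}")
print(f"events recorded / year : {recorded_per_year:.1e}")
print(f"data volume per year   : {bytes_per_year / 1e15:.1f} Pbyte")
print(f"fraction of events kept: {kept_fraction:.1e}")
```

Only about two events in every ten million survive the sifting, which is why the selection itself has to be fast and well understood.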

A simulated event

PYTHIA Monte Carlo
pp → gluino-gluino

[Figure: generated event record, continued over many lines]

Multivariate analysis in particle physics

For each event we measure a set of numbers: $\vec{x} = (x_1, \ldots, x_n)$

  $x_1$ = jet pT
  $x_2$ = missing energy
  $x_3$ = particle i.d. measure, ...

$\vec{x}$ follows some n-dimensional joint probability density, which depends on the type of event produced, i.e., was it $pp \to t\bar{t}$, $pp \to \tilde{g}\tilde{g}$, ...

E.g. hypotheses $H_0$, $H_1$, ...
Often simply “signal”, “background”

[Figure: the densities $p(\vec{x}|H_0)$ and $p(\vec{x}|H_1)$ shown in the ($x_i$, $x_j$) plane]

Finding an optimal decision boundary

In particle physics usually start by making simple “cuts”:

  $x_i < c_i$
  $x_j < c_j$

Maybe later try some other type of decision boundary:

[Figure: panels showing the H0 and H1 regions in the ($x_i$, $x_j$) plane for different types of decision boundary]
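A cut-based selection of this kind can be sketched in a few lines of Python. The toy Gaussian samples, the cut values and all function names below are assumptions chosen for illustration, not taken from any real analysis.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def make_events(n, mean_i, mean_j):
    """Toy events: two unit-width Gaussian variables (x_i, x_j)."""
    return [(random.gauss(mean_i, 1.0), random.gauss(mean_j, 1.0))
            for _ in range(n)]

signal = make_events(10_000, 0.0, 0.0)      # hypothetical signal sample
background = make_events(10_000, 2.0, 2.0)  # hypothetical background sample

def passes_cuts(event, c_i, c_j):
    """Rectangular selection: keep the event if x_i < c_i and x_j < c_j."""
    x_i, x_j = event
    return x_i < c_i and x_j < c_j

def efficiency(events, c_i, c_j):
    """Fraction of the sample that survives the cuts."""
    return sum(passes_cuts(e, c_i, c_j) for e in events) / len(events)

sig_eff = efficiency(signal, 1.0, 1.0)
bkg_eff = efficiency(background, 1.0, 1.0)
print(f"signal efficiency    : {sig_eff:.3f}")
print(f"background efficiency: {bkg_eff:.3f}")
```

Moving the cut values trades signal efficiency against background rejection; a different type of boundary changes the shape of the accepted region rather than just its size.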

The optimal decision boundary

Try to best approximate optimal decision boundary based on likelihood ratio:

or equivalently think of the likelihood ratio as the optimal statistic for a test of H0 vs H1.

In general we don't have the pdfs p(x|H0), p(x|H1), ...
Rather, we have Monte Carlo models for each process.

Usually training data from the MC models is cheap.

But the models contain many approximations: predictions for observables obtained using perturbation theory (truncated at some order); phenomenological modeling of non-perturbative effects; imperfect detector description, ...
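Since only Monte Carlo samples are available, the likelihood ratio p(x|H1) / p(x|H0) has to be estimated from them. One simple way to do this, sketched below for a single variable, is to take the ratio of two normalized histograms. The Gaussian toy samples, the binning and the function names are assumptions made for illustration; in many dimensions a direct histogram becomes impractical, which is one motivation for the classifiers discussed later.

```python
import random

random.seed(2)  # reproducible toy samples

N_MC = 50_000
# Toy "Monte Carlo" samples of one variable x (assumed Gaussian shapes).
mc_h0 = [random.gauss(0.0, 1.0) for _ in range(N_MC)]  # H0, background-like
mc_h1 = [random.gauss(1.5, 1.0) for _ in range(N_MC)]  # H1, signal-like

LO, HI, NBINS = -4.0, 6.0, 20
WIDTH = (HI - LO) / NBINS

def bin_index(x):
    """Histogram bin containing x, for LO <= x < HI."""
    return min(int((x - LO) / WIDTH), NBINS - 1)

def histogram_pdf(sample):
    """Estimate p(x|H) as a histogram normalized to unit area."""
    counts = [0] * NBINS
    for x in sample:
        if LO <= x < HI:
            counts[bin_index(x)] += 1
    return [c / (len(sample) * WIDTH) for c in counts]

pdf_h0 = histogram_pdf(mc_h0)
pdf_h1 = histogram_pdf(mc_h1)

def likelihood_ratio(x):
    """Approximate p(x|H1) / p(x|H0) from the two histograms."""
    if not LO <= x < HI:
        return float("nan")      # outside the histogram range
    k = bin_index(x)
    if pdf_h0[k] == 0.0:
        return float("nan")      # no H0 events generated in this bin
    return pdf_h1[k] / pdf_h0[k]

for x in (-0.8, 0.7, 2.2):
    print(f"x = {x:+.1f}  ->  ratio ~ {likelihood_ratio(x):.2f}")
```

Cutting on this ratio defines the decision boundary; any monotonic function of it gives the same boundary.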

Two distinct event selection problems

In some cases, the event types in question are both known to exist.

Example: separation of different particle types (electron vs muon)
Use the selected sample for further study.

In other cases, the null hypothesis H0 means "Standard Model" events, and the alternative H1 means "events of a type whose existence is not yet established" (to do so is the goal of the analysis).

Many subtle issues here, mainly related to the heavy burden of proof required to establish presence of a new phenomenon.

Typically require p-value of background-only hypothesis below ~ 10^-7 (a 5 sigma effect) to claim discovery of "New Physics".
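The correspondence between a significance in "sigma" and a p-value can be checked with the one-sided tail probability of a standard Gaussian. The sketch below assumes that one-sided convention; the function name is illustrative.

```python
import math

def p_value_from_sigma(z):
    """One-sided tail probability of a standard Gaussian beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

for z in (3, 4, 5):
    print(f"{z} sigma  ->  p = {p_value_from_sigma(z):.2e}")
```

For 5 sigma this gives a p-value of about 2.9 × 10^-7, which is the threshold quoted above.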

Discovering "New Physics"

The LHC experiments are expensive

  ~ $10^10 (accelerator and experiments)

the competition is intense

  (ATLAS vs. CMS) vs. Tevatron

and the stakes are high:

  4 sigma effect
  5 sigma effect

So there is a strong motivation to extract all possible information from the data.

Using classifier output for discovery

[Figure: two panels of the classifier output y. Left: f(y) for signal and background, normalized to unity. Right: N(y), normalized to the expected number of events, showing the background, a possible excess, and the search region bounded by ycut.]

Discovery = number of events found in search region incompatible with background-only hypothesis.

p-value of background-only hypothesis can depend crucially on the distribution f(y|b) in the "search region".
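For a simple counting experiment in the search region, the p-value of the background-only hypothesis is the Poisson probability to observe at least the number of events found, given the expected background b. The sketch below assumes this counting model; the numbers are invented to show how strongly the result depends on b, and hence on the tail of f(y|b).

```python
import math

def poisson_p_value(n_obs, b):
    """P(n >= n_obs) for a Poisson-distributed count with mean b."""
    if n_obs <= 0:
        return 1.0
    # 1 - P(n <= n_obs - 1), building the Poisson terms iteratively.
    term = math.exp(-b)
    cdf = term
    for k in range(1, n_obs):
        term *= b / k
        cdf += term
    # Note: the subtraction loses precision for extremely small p-values.
    return max(0.0, 1.0 - cdf)

n_found = 5  # hypothetical number of events in the search region
for b in (0.5, 1.0):
    print(f"b = {b}: p = {poisson_p_value(n_found, b):.2e}")
```

Doubling the expected background from 0.5 to 1.0 raises the p-value by more than an order of magnitude, so a mismodelled tail of f(y|b) can create or hide an apparent discovery.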

Example of a "cut-based" study

In the 1990s, the CDF experiment at Fermilab (Chicago) measured the number of hadron jets produced in proton-antiproton collisions as a function of their momentum perpendicular to the beam direction:

[Figure: measured jet rate compared with prediction; inset sketch of a "jet" of particles]

Prediction low relative to data for very high transverse momentum.

High pT jets = quark substructure?

Although the data agree remarkably well with the Standard Model (QCD) prediction overall, the excess at high pT appears significant:

The fact that the variable is "understandable" leads directly to a plausible explanation for the discrepancy, namely, that quarks could possess an internal substructure.

Would not have been the case if the variable plotted was a complicated combination of many inputs.

High pT jets from parton model uncertainty

Furthermore the physical understanding of the variable led one to a more plausible explanation, namely, an uncertain modelling of the quark (and gluon) momentum distributions inside the proton.

When model adjusted, discrepancy largely disappears:

Can be regarded as a "success" of the cut-based approach. Physical understanding of output variable led to solution of apparent discrepancy.

Neural networks in particle physics

For many years, the only "advanced" classifier used in particle physics.

Usually use single hidden layer, logistic sigmoid activation function:
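The network described here, a single hidden layer with logistic sigmoid activation, can be written as a short forward pass. The sketch below assumes a sigmoid on the output node as well; the weights and inputs are made-up numbers, and a real network would obtain its weights by minimizing an error function on training data.

```python
import math

def sigmoid(u):
    """Logistic sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-u))

def network_output(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> one hidden layer -> single output in (0, 1)."""
    hidden = [sigmoid(b + sum(w * xi for w, xi in zip(ws, x)))
              for ws, b in zip(hidden_weights, hidden_biases)]
    return sigmoid(output_bias + sum(w * h for w, h in zip(output_weights, hidden)))

# Hypothetical network with 3 inputs and 2 hidden nodes.
x = [0.5, -1.2, 2.0]
hidden_weights = [[0.1, -0.3, 0.8], [-0.5, 0.2, 0.1]]
hidden_biases = [0.0, 0.1]
output_weights = [1.5, -2.0]
output_bias = 0.2

y = network_output(x, hidden_weights, hidden_biases, output_weights, output_bias)
print(f"network output y = {y:.3f}")
```

The output y is then used as the test variable: events are selected by requiring y above some cut.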

Neural network example from LEP II

Signal: e+e− → WW (often 4 well separated hadron jets)

Background: e+e− → qq̄gg (4 less well separated hadron jets)

[Figure: distributions of the input variables]
← input variables based on jet structure, event shape, ...
none by itself gives much separation.

Neural network output:

(Garrido, Juste and Martinez, ALEPH 96-144)

Some issues with neural networks

In the example with WW events, goal was to select these events so as to study properties of the W boson.

Needed to avoid using input variables correlated to the properties we eventually wanted to study (not trivial).

In principle a single hidden layer with a sufficiently large number of nodes can approximate arbitrarily well the optimal test variable (likelihood ratio).

Usually start with relatively small number of nodes and increase until misclassification rate on validation data sample ceases to decrease.

Usually MC training data is cheap -- problems with getting stuck in local minima, overtraining, etc., less important than concerns of systematic differences between the training data and Nature, and concerns about the ease of interpretation of the output.

Decision trees

Out of all the input variables, find the one for which a single cut gives the best improvement in signal purity:

Example by MiniBooNE experiment, B. Roe et al., NIM 543 (2005) 577

where w_i is the weight of the ith event.

Resulting nodes classified as either signal/background.

Iterate until stop criterion reached based on e.g. purity or minimum number of events in a node.

The set of cuts defines the decision boundary.
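The first step of the tree construction, finding the single best cut, can be sketched as follows. The purity is taken as the weighted signal fraction in a node, and the improvement is measured with a Gini-type index (sum of weights) × P × (1 − P); both this choice of criterion and the tiny data set are assumptions made for illustration.

```python
def purity(events):
    """Weighted signal fraction; events are (inputs, is_signal, weight)."""
    total = sum(w for _, _, w in events)
    signal = sum(w for _, is_sig, w in events if is_sig)
    return signal / total if total > 0 else 0.0

def gini(events):
    """(sum of weights) * P * (1 - P); zero for a perfectly pure node."""
    p = purity(events)
    return sum(w for _, _, w in events) * p * (1.0 - p)

def best_cut(events, var):
    """Scan cuts on input variable `var`; return (gain, cut value)."""
    parent = gini(events)
    values = sorted({x[var] for x, _, _ in events})
    best = (0.0, None)
    for lo, hi in zip(values, values[1:]):
        cut = 0.5 * (lo + hi)  # try the midpoint between neighbouring values
        left = [e for e in events if e[0][var] < cut]
        right = [e for e in events if e[0][var] >= cut]
        gain = parent - gini(left) - gini(right)
        if gain > best[0]:
            best = (gain, cut)
    return best

# Hypothetical training events: (inputs, is_signal, weight).
events = [
    ((0.2, 1.0), True, 1.0), ((0.4, 3.0), True, 1.0), ((0.5, 2.0), True, 1.0),
    ((1.5, 2.5), False, 1.0), ((1.8, 1.2), False, 1.0), ((2.1, 2.8), False, 1.0),
]

best_var = max(range(2), key=lambda v: best_cut(events, v)[0])
gain, cut = best_cut(events, best_var)
print(f"split on variable {best_var} at {cut} (gain {gain:.2f})")
```

Applying the same search recursively to the two resulting nodes, until the stop criterion is met, builds the full tree.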

Boosting

The resulting classifier is usually very sensitive to fluctuations in the training data. Stabilize by boosting:

Create an ensemble of training data sets from the original one by updating the event weights (misclassified events get increased weight).

Assign a score α_k to the classifier from the kth training set based on its error rate ε_k:

Final classifier is a weighted combination of those from the ensemble of training sets:
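The boosting steps above can be sketched with the discrete AdaBoost update, assumed here as one common choice: the score is α_k = ln((1 − ε_k)/ε_k), misclassified events have their weights multiplied by exp(α_k), and the final classifier is the sign of the α-weighted sum. The weak classifiers are single cuts ("stumps") on one variable, and the ten-point data set is invented for illustration.

```python
import math

def stump_predict(x, cut, sign):
    """Weak classifier: `sign` for x < cut, `-sign` otherwise."""
    return sign if x < cut else -sign

def train_stump(xs, ys, weights):
    """Return (weighted error, cut, sign) of the best single cut."""
    values = sorted(set(xs))
    cuts = [values[0] - 1.0] + [0.5 * (a + b) for a, b in zip(values, values[1:])]
    best = None
    for cut in cuts:
        for sign in (+1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, cut, sign) != y)
            if best is None or err < best[0]:
                best = (err, cut, sign)
    return best

def adaboost(xs, ys, n_rounds):
    """Train n_rounds stumps; return a list of (alpha, cut, sign)."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(n_rounds):
        err, cut, sign = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the logarithm finite
        alpha = math.log((1.0 - err) / err)
        ensemble.append((alpha, cut, sign))
        # Misclassified events get increased weight; then renormalize.
        weights = [w * math.exp(alpha) if stump_predict(x, cut, sign) != y else w
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def classify(x, ensemble):
    """Sign of the alpha-weighted sum of the weak classifiers."""
    score = sum(a * stump_predict(x, cut, sign) for a, cut, sign in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single cut can separate (+1 = signal, -1 = background).
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, n_rounds=3)
errors = sum(classify(x, ensemble) != y for x, y in zip(xs, ys))
print(f"training errors after 3 rounds: {errors}")
```

Each stump alone misclassifies several of these events, but the weighted combination of three separates them, which is the point of combining weak classifiers.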

Particle i.d. in MiniBooNE

Detector is a 12-m diameter tank of mineral oil exposed to a beam of neutrinos and viewed by 1520 photomultiplier tubes:

H.J. Yang, MiniBooNE PID, DNP06

Search for ν_μ to ν_e oscillations required particle i.d. using information from the PMTs.

BDT example from MiniBooNE

~200 input variables for each event (ν interaction producing e, μ or π).

Each individual tree is relatively weak, with a misclassification error rate ~ 0.4 – 0.45

B. Roe et al., NIM 543 (2005) 577

Monitoring overtraining

From MiniBooNE example:

Performance stable after a few hundred trees.

Comparison of boosting algorithms

A number of boosting algorithms on the market; differ in the update rule for the weights.

Boosted decision tree comments

Boosted decision trees have become popular in particle physics because they can handle many inputs without degrading; those that provide little/no separation are rarely used as tree splitters and are effectively ignored.

A number of boosting algorithms have been looked at, which differ primarily in the rule for updating the weights (ε-Boost, LogitBoost, ...).

Some studies have looked at other ways of combining weaker classifiers, e.g., Bagging (Bootstrap-Aggregating), which generates the ensemble of classifiers by random sampling with replacement from the full training sample. Not much experience yet with these.
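The sampling step of bagging is easy to sketch: each member of the ensemble is trained on a sample of the same size as the original, drawn with replacement. The stand-in training set, the function names and the majority-vote combination below are illustrative assumptions.

```python
import random

def bootstrap_sample(events, rng):
    """Draw len(events) events at random, with replacement."""
    return [rng.choice(events) for _ in events]

def bagged_vote(classifiers, x):
    """Majority vote of classifiers that each return +1 or -1."""
    return 1 if sum(c(x) for c in classifiers) >= 0 else -1

rng = random.Random(3)            # fixed seed for reproducibility
training = list(range(1000))      # stand-in for the full training sample
samples = [bootstrap_sample(training, rng) for _ in range(5)]

# Each bootstrap sample contains only part of the distinct original events.
unique_fraction = len(set(samples[0])) / len(training)
print(f"distinct events in one bootstrap sample: {unique_fraction:.2f}")
```

One classifier would be trained on each sample and the ensemble combined with `bagged_vote`. Unlike boosting, the samples are independent of how earlier classifiers performed.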

The top quark

Top quark is the heaviest known particle in the Standard Model.
Since mid-1990s has been observed produced in pairs:

Single top quark production

One also expected to find singly produced top quarks; pair-produced tops are now a background process. Use many inputs based on jet properties, particle i.d., ...

[Figure legend: signal (blue + green)]

Different classifiers for single top

Also Naive Bayes and various approximations to likelihood ratio,....

Final combined result is statistically significant (>5σ level) but not easy to understand classifier outputs.

Support Vector Machines

Map input variables into high dimensional feature space: x → φ(x)

Maximize distance between separating hyperplanes (margin) subject to constraints allowing for some misclassification.

Final classifier only depends on scalar products of φ(x):

So only need kernel

Bishop ch 7
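The statement that only the kernel is needed can be checked explicitly in a small case. For two input variables, the quadratic kernel K(x, z) = (x · z)^2 equals the scalar product of the feature vectors φ(x) = (x1^2, √2 x1 x2, x2^2). This particular kernel and the test points are chosen here for illustration only.

```python
import math

def dot(a, b):
    """Ordinary scalar product."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit feature map matching the quadratic kernel in 2 dimensions."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def quadratic_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without ever forming phi."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
via_feature_space = dot(phi(x), phi(z))
via_kernel = quadratic_kernel(x, z)
print(via_feature_space, via_kernel)
```

Both routes give the same number, so the classifier can be trained and evaluated using K alone; for kernels such as the Gaussian the corresponding feature space is infinite dimensional and is never constructed.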

Using an SVM

To use an SVM the user must as a minimum choose

  a kernel function (e.g. Gaussian)
  any free parameters in the kernel (e.g. the σ of the Gaussian)
  the cost parameter C (plays role of regularization parameter)

The training is relatively straightforward because, in contrast to neural networks, the function to be minimized has a single global minimum.

Furthermore evaluating the classifier only requires that one retain and sum over the support vectors, a relatively small number of points.

The advantages/disadvantages and rationale behind the choices above are not always clear to the particle physicist -- help needed here.
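Evaluating a trained SVM is then just a sum over the support vectors. The sketch below assumes a Gaussian kernel of width σ and uses two invented support vectors with invented multipliers; in a real application these come out of the training, where each multiplier is bounded between 0 and the cost parameter C.

```python
import math

def gaussian_kernel(x, z, sigma):
    """K(x, z) = exp(-|x - z|^2 / (2 sigma^2))."""
    dist2 = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-dist2 / (2.0 * sigma ** 2))

def svm_decision(x, support_vectors, sigma, bias):
    """Sum over support vectors given as (point, label +1/-1, multiplier)."""
    return bias + sum(alpha * label * gaussian_kernel(x, sv, sigma)
                      for sv, label, alpha in support_vectors)

# Hypothetical result of a training: two support vectors, sigma = 1, bias = 0.
support_vectors = [((0.0, 0.0), +1, 1.0), ((2.0, 2.0), -1, 1.0)]

near_signal = svm_decision((0.1, 0.0), support_vectors, sigma=1.0, bias=0.0)
near_background = svm_decision((2.0, 1.9), support_vectors, sigma=1.0, bias=0.0)
print(f"{near_signal:+.3f}  {near_background:+.3f}")
```

The sign of the decision function gives the class. A smaller σ makes the boundary follow the support vectors more closely, while a larger C penalizes misclassified training events more strongly.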

SVM in particle physics

SVMs are very popular in the Machine Learning community but have yet to find wide application in HEP. Here is an early example from a CDF top quark analysis (A. Vaiciulis, contribution to PHYSTAT02).

[Figure: plot with axis labelled signal eff.]

Summary, conclusions, etc.

Particle physics has used several multivariate methods for many years:

  linear (Fisher) discriminant
  neural networks
  naive Bayes

and has in the last several years started to use a few more

  k-nearest neighbour
  boosted decision trees
  support vector machines

The emphasis is often on controlling systematic uncertainties between the modeled training data and Nature to avoid false discovery.

Although many classifier outputs are "black boxes", a discovery at 5σ significance with a sophisticated (opaque) method will win the competition if backed up by, say, 4σ evidence from a cut-based method.

Quotes I like

“Alles sollte so einfach wie möglich sein, aber nicht einfacher.” (Everything should be as simple as possible, but not simpler.)

– A. Einstein

“If you believe in something you don't understand, you suffer,...”

– Stevie Wonder

Extra slides

Software for multivariate analysis

TMVA, Höcker, Stelzer, Tegenfeldt, Voss, Voss, physics/0703039

  From tmva.sourceforge.net, also distributed with ROOT
  Variety of classifiers
  Good manual

StatPatternRecognition, I. Narsky, physics/0507143

  Further info from www.hep.caltech.edu/~narsky/spr.html
  Also wide variety of methods, many complementary to TMVA
  Project currently appears to be no longer supported

Identifying particles in a detector

Different particle types (electron, pion, muon, ...) leave characteristically distinct signals in the particle detector:

But the characteristics overlap, hence the need for multivariate classification methods.

Goal is to produce a list of "electron candidates", "muon candidates", etc. with well known acceptance probabilities for all particle types.

Example of neural network for particle i.d.

ATLAS Calorimeter test (Damazio and de Seixas)

For every particle measure pattern of energy deposit in calorimeter ~ shower width, depth

Get training data by placing detector in test beam of pions, muons, etc.
Here muon beam essentially "pure"; electron and pion beams both have significant contamination.

NN architecture:

  10 input nodes
  8 nodes in 1 hidden layer
  3 output nodes

[Figure: distributions of the three network outputs (e, μ, π) for events from the e beam, μ beam and π beam]