Page 1: Computing and Statistical Data Analysis Stat 5 ...cowan/stat/2013/stat_5.pdf · G. Cowan Computing and Statistical Data Analysis / Stat 5 3 Multivariate methods Many new (and some

G. Cowan Computing and Statistical Data Analysis / Stat 5 1

Computing and Statistical Data Analysis Stat 5: Multivariate Methods

London Postgraduate Lectures on Particle Physics;

University of London MSci course PH4515

Glen Cowan
Physics Department, Royal Holloway, University of London
[email protected]
www.pp.rhul.ac.uk/~cowan

Course web page: www.pp.rhul.ac.uk/~cowan/stat_course.html

Finding an optimal decision boundary

In particle physics one usually starts by making simple “cuts”:

x_i < c_i, x_j < c_j

Maybe later try some other type of decision boundary.

[Figure: panels showing regions labelled H0 and H1 for different types of decision boundary]
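As a rough illustration of a cut-based selection, the sketch below applies cuts x_i < c_i to two toy event samples and reports the fraction of each that passes. The Gaussian toy data, the cut values, and the names `passes_cuts`, `events_h0`, and `events_h1` are all assumptions made for this example; they do not come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Toy data, invented for illustration: two variables (x_i, x_j) per event.
events_h1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(5000, 2))
events_h0 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(5000, 2))

def passes_cuts(x, c):
    """Return True for each event with x_i < c_i for every variable i."""
    return np.all(x < c, axis=1)

c = np.array([1.0, 1.0])  # hypothetical cut values (c_i, c_j)

# Fraction of each sample that falls inside the rectangular accepted region.
eff_h1 = passes_cuts(events_h1, c).mean()
eff_h0 = passes_cuts(events_h0, c).mean()
print(f"fraction of H1 events passing cuts: {eff_h1:.3f}")
print(f"fraction of H0 events passing cuts: {eff_h0:.3f}")
```

Cuts of this kind can only carve out a rectangular region in the variable space, which is one motivation for trying other types of decision boundary.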

Multivariate methods

Many new (and some old) methods:

Fisher discriminant
Neural networks
Kernel density methods
Support Vector Machines
Decision trees
Boosting
Bagging

New software for HEP, e.g.,

TMVA, Höcker, Stelzer, Tegenfeldt, Voss, Voss, physics/0703039
StatPatternRecognition, I. Narsky, physics/0507143

Overtraining

[Figure: decision boundary shown on the training sample and on an independent validation sample]

If the decision boundary is too flexible, it will conform too closely to the training points → overtraining. Monitor by applying the classifier to an independent validation sample.

Choose classifier that minimizes error function for validation sample.
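The idea of monitoring overtraining with a validation sample can be sketched as follows. This example is an assumption-laden toy, not the lecture's own procedure: it uses a k-nearest-neighbour classifier (where small k gives a very flexible boundary), takes the misclassification rate as the error function, and draws invented Gaussian data. The names `make_sample`, `knn_predict`, and `error_rate` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def make_sample(n):
    """Toy two-class sample: n events per class from overlapping Gaussians."""
    x0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))  # class 0
    x1 = rng.normal(loc=[1.5, 1.5], scale=1.0, size=(n, 2))  # class 1
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return x, y

def knn_predict(x_train, y_train, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    d2 = ((x[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]  # indices of k smallest
    votes = y_train[nearest].mean(axis=1)                # fraction voting class 1
    return (votes > 0.5).astype(int)

def error_rate(x_train, y_train, x, y, k):
    """Misclassification rate, used here as the error function."""
    return float(np.mean(knn_predict(x_train, y_train, x, k) != y))

x_train, y_train = make_sample(200)
x_val, y_val = make_sample(200)  # statistically independent of the training sample

ks = [1, 3, 5, 11, 21, 41, 81]   # odd k avoids tied votes
train_err = [error_rate(x_train, y_train, x_train, y_train, k) for k in ks]
val_err = [error_rate(x_train, y_train, x_val, y_val, k) for k in ks]

for k, et, ev in zip(ks, train_err, val_err):
    print(f"k = {k:3d}   training error = {et:.3f}   validation error = {ev:.3f}")

# Choose the classifier (here: the value of k) with the smallest validation error.
best_k = ks[int(np.argmin(val_err))]
print("chosen k:", best_k)
```

With k = 1 every training point is its own nearest neighbour, so the training error is zero even though the validation error is not; that gap is the signature of overtraining.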

Neural network example from LEP II

Signal: e+e- → W+W- (often 4 well-separated hadron jets)
Background: e+e- → qqgg (4 less well-separated hadron jets)

[Figure: distributions of the input variables]

Input variables are based on jet structure, event shape, ...; none by itself gives much separation.

[Figure: neural network output]

(Garrido, Juste and Martinez, ALEPH 96-144)

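Kernel-based PDE (KDE, Parzen window)

Consider d dimensions and N training events, x_1, ..., x_N; estimate f(x) with

$$\hat{f}(\vec{x}) = \frac{1}{N h^d} \sum_{i=1}^{N} K\left( \frac{\vec{x} - \vec{x}_i}{h} \right)$$

where h is the kernel bandwidth (smoothing parameter).

Use e.g. Gaussian kernel:

$$K(\vec{x}) = \frac{1}{(2\pi)^{d/2}} \, e^{-|\vec{x}|^2 / 2}$$

Need to sum N terms to evaluate the function (slow); faster algorithms only count events in the vicinity of x (k-nearest neighbor, range search).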
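A minimal sketch of the brute-force estimate is given below, assuming the standard Parzen-window form f(x) ≈ (1 / (N h^d)) Σ_i K((x - x_i) / h) with a Gaussian kernel K(u) = (2π)^(-d/2) exp(-|u|²/2). The function name `gaussian_kde`, the toy data, and the bandwidth h = 0.3 are choices made for this illustration only.

```python
import numpy as np

def gaussian_kde(x, x_train, h):
    """Kernel density estimate with a Gaussian kernel of bandwidth h.

    x:       evaluation points, shape (M, d)
    x_train: training events,   shape (N, d)
    Returns an array of M density values. Every evaluation point sums
    over all N training events, so the cost is O(M * N).
    """
    x = np.asarray(x, dtype=float)
    x_train = np.asarray(x_train, dtype=float)
    n, d = x_train.shape
    # Squared scaled distances |x - x_i|^2 / h^2, shape (M, N).
    u2 = ((x[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2) / h**2
    kernel = np.exp(-0.5 * u2) / (2.0 * np.pi) ** (d / 2.0)
    return kernel.sum(axis=1) / (n * h**d)

# Toy usage in d = 1: events drawn from a standard Gaussian.
rng = np.random.default_rng(seed=3)
x_train = rng.normal(size=(2000, 1))
grid = np.linspace(-6.0, 6.0, 601).reshape(-1, 1)

f_hat = gaussian_kde(grid, x_train, h=0.3)

dx = grid[1, 0] - grid[0, 0]
integral = f_hat.sum() * dx   # simple Riemann sum over the grid
f_at_zero = f_hat[300]        # grid[300] is x = 0
print(f"integral of estimate = {integral:.3f}")
print(f"estimate at x = 0    = {f_at_zero:.3f}")
```

A larger h gives a smoother but more biased estimate, and a smaller h follows the individual training events more closely. For large N, restricting the sum to events near x, as the slide suggests, avoids the full O(N) cost per evaluation.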
