
Page 1

Lectures on Multivariate Methods

Harrison B. Prosper, Florida State University

Bari Lectures, 30, 31 May, 1 June 2016

Page 2

Lecture 1
• Introduction
• Classification
• Grid Searches
• Decision Trees

Lecture 2
• Boosted Decision Trees

Lecture 3
• Neural Networks
• Bayesian Neural Networks


Page 3

https://support.ba.infn.it/redmine/projects/statistics/wiki/MVA


Page 4

Today, we shall apply neural networks to two problems:

• Wine Tasting
• Approximating a 19-parameter function.


Page 5

Recall that our goal is to classify wines as good or bad using the variables highlighted in blue, below:

variable    description
acetic      acetic acid
citric      citric acid
sugar       residual sugar
salt        NaCl
SO2free     free sulfur dioxide
SO2tota     total sulfur dioxide
pH          pH
sulfate     potassium sulfate
alcohol     alcohol content
quality     quality (between 0 and 1)

(See http://www.vinhoverde.pt/en/history-of-vinho-verde)


Page 6

Page 7


(One version of Problem 13): Prove that it is possible to do the following:

$$ f(x_1,\ldots,x_n) = F\big( g_1(x^{(1)},\ldots,x^{(m)}),\ldots, g_k(x^{(1)},\ldots,x^{(m)}) \big), \quad m < n, \text{ for all } n. $$

In 1957, Kolmogorov proved that it was possible with m = 3. Today, we know that functions of the form

$$ f(x_1,\ldots,x_I) = a + \sum_{j=1}^{H} b_j \tanh\!\left[ c_j + \sum_{i=1}^{I} d_{ji} x_i \right] $$

can provide arbitrarily accurate approximations of real functions of I real variables (Hornik, Stinchcombe, and White, Neural Networks 2, 359-366 (1989)).
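To make the form concrete, here is a minimal NumPy sketch of this one-hidden-layer function (the function name, array shapes, and random example are my own choices, not from the lectures):

```python
import numpy as np

def f(x, a, b, c, d):
    """One-hidden-layer network:
    f(x) = a + sum_j b_j * tanh(c_j + sum_i d_ji * x_i).
    x: inputs of shape (N, I); b, c: shape (H,); d: shape (H, I)."""
    return a + np.tanh(c + x @ d.T) @ b

# Example: a random network with I = 2 inputs and H = 5 hidden nodes.
rng = np.random.default_rng(0)
I, H = 2, 5
a = rng.normal()
b, c = rng.normal(size=H), rng.normal(size=H)
d = rng.normal(size=(H, I))
print(f(rng.normal(size=(3, I)), a, b, c, d))  # outputs for 3 input points
```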

Page 8


$$ f(x, w) = a + \sum_{j=1}^{H} b_j \tanh\!\left[ c_j + \sum_{i=1}^{I} d_{ji} x_i \right] $$

[Diagram: a feed-forward network mapping inputs x_1, x_2 through hidden nodes, with parameters c_j and d_{ji} feeding the hidden layer and a and b_j feeding the output n(x, w).]

f is used for regression; n is used for classification. w = (a, b, c, d) are free parameters to be found by the training.

Page 9

We consider the same two variables, SO2tota and alcohol, and the NN function shown here:

We use data for 500 good wines and 500 “bad” wines.


Page 10

To construct the NN, we minimize the empirical risk function

$$ R(w) = \frac{1}{2} \sum_{i=1}^{1000} \left[ y_i - n(x_i, w) \right]^2 $$

where y = 1 for good wines and y = 0 for bad wines, which yields an approximation to

$$ D(x) = \frac{p(x \mid \text{good})}{p(x \mid \text{good}) + p(x \mid \text{bad})}, \qquad x = (\text{SO2tota}, \text{alcohol}). $$
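As an illustration, here is a hedged sketch of such a fit in Python, using a toy stand-in for the 1000 wines (the logistic output for n, the input standardization, and all names are my assumptions, not the lectures' actual code):

```python
import numpy as np
from scipy.optimize import minimize

def n(x, w, H):
    """Classification network with a logistic output, so n(x, w) lies in [0, 1].
    (The logistic squashing is an assumption; the slides only need n in [0, 1].)"""
    I = x.shape[1]
    a, b = w[0], w[1:1 + H]
    c, d = w[1 + H:1 + 2 * H], w[1 + 2 * H:].reshape(H, I)
    return 1.0 / (1.0 + np.exp(-(a + np.tanh(c + x @ d.T) @ b)))

def R(w, x, y, H):
    """Empirical risk R(w) = (1/2) sum_i [y_i - n(x_i, w)]^2."""
    return 0.5 * np.sum((y - n(x, w, H)) ** 2)

# Toy stand-in for the 1000 wines: x = (SO2tota, alcohol), y = 1 good, 0 bad.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal([30.0, 11.0], 1.0, size=(500, 2)),   # "good"
               rng.normal([45.0,  9.5], 1.0, size=(500, 2))])  # "bad"
y = np.concatenate([np.ones(500), np.zeros(500)])
x = (x - x.mean(axis=0)) / x.std(axis=0)      # standardize the inputs

H = 5
w0 = 0.1 * rng.normal(size=1 + 2 * H + H * x.shape[1])
fit = minimize(R, w0, args=(x, y, H), method="BFGS")
print("minimum empirical risk:", fit.fun)
```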

Page 11


The distribution of D(x) and the Receiver Operating Characteristic (ROC) curve. In this example, the ROC curve shows the fraction f of each class of wine accepted.
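Computing such a ROC curve is straightforward once scores D(x) are in hand; a minimal sketch (with stand-in scores in place of the trained network's output):

```python
import numpy as np

def roc_curve(D_good, D_bad, thresholds):
    """For each cut D(x) > t, return the fraction of each class accepted."""
    eff_good = np.array([(D_good > t).mean() for t in thresholds])
    eff_bad = np.array([(D_bad > t).mean() for t in thresholds])
    return eff_good, eff_bad

# Stand-in scores; in practice D_good, D_bad are n(x, w) evaluated on each class.
rng = np.random.default_rng(2)
D_good = rng.beta(4, 2, size=500)   # scores tending toward 1
D_bad = rng.beta(2, 4, size=500)    # scores tending toward 0
eff_good, eff_bad = roc_curve(D_good, D_bad, np.linspace(0, 1, 101))
```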

Page 12


[Figure: the best-fit NN compared with the actual p(good | x).]

Page 13

Page 14


Page 15


Given the posterior density p(w | T) and some new data x, one can compute the predictive distribution of y,

$$ p(y \mid x, T) = \int p(y \mid x, w)\, p(w \mid T)\, dw. $$

If a definite estimate of y is needed for every x, it can be obtained by minimizing a suitable risk function,

$$ R[f_w] = \int L(y, f)\, p(y \mid x, T)\, dy. $$

Using, for example, L = (y − f)² yields the estimate

$$ f(x) \approx \bar{y}(x, T) \equiv \int y\, p(y \mid x, T)\, dy. $$

In general, a different L will yield a different estimate of f(x).
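Given posterior samples w_k ~ p(w | T), this quadratic-loss estimate reduces to the ensemble average of the network outputs. A minimal sketch (the function names and signatures are my assumptions):

```python
import numpy as np

def predictive_mean(x, w_samples, f):
    """Monte Carlo estimate of ybar(x, T) = ∫ y p(y | x, T) dy.
    With a likelihood centered on f(x, w), this reduces to the average of
    f(x, w_k) over posterior samples w_k ~ p(w | T)."""
    return np.mean([f(x, w) for w in w_samples], axis=0)
```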

Page 16


For function approximation, the likelihood p(y | x, w) for a given (y, x) is typically chosen to be Gaussian,

$$ p(y \mid x, w) = \exp\left( -\tfrac{1}{2} \left[ y - f(x, w) \right]^2 / \sigma^2 \right). $$

For classification one uses

$$ p(y \mid x, w) = \left[ n(x, w) \right]^y \left[ 1 - n(x, w) \right]^{1-y}, \qquad y = 0 \text{ or } 1. $$

Setting y = 1 in the predictive distribution gives

$$ p(1 \mid x, T) = \int n(x, w)\, p(w \mid T)\, dw, $$

which is an estimate of the probability that x belongs to the class for which y = 1.
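For completeness, here is a sketch of the corresponding classification log-likelihood log p(T | w), the quantity that (together with the prior) drives the Bayesian treatment below; the network function n and its signature are assumptions:

```python
import numpy as np

def log_likelihood(w, x, y, n):
    """log p(T | w) for the Bernoulli classification model:
    sum_i [ y_i log n(x_i, w) + (1 - y_i) log(1 - n(x_i, w)) ]."""
    p = np.clip(n(x, w), 1e-12, 1.0 - 1e-12)   # keep the logs finite
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```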

Page 17

[Figure: an ensemble of curves of p(S | p_T^{4l}) versus p_T^{4l} (GeV), for p_T^{4l} from 0 to 200 GeV.]

Page 18

To construct the BNN, we sample points w from the posterior density

$$ p(w \mid T) = \frac{p(T \mid w)\, p(w)}{p(T)} $$

using a Markov Chain Monte Carlo method. This generates an ensemble of neural networks, just as in the 1-D example on the previous slide.
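A hedged sketch of the mechanics, using the simplest such method, random-walk Metropolis (the lectures do not specify which sampler was used; this only illustrates the idea):

```python
import numpy as np

def metropolis(log_posterior, w0, n_samples, step=0.05, rng=None):
    """Random-walk Metropolis sampling of p(w | T). Only log p(T | w) + log p(w)
    is needed, up to a constant, so the normalization p(T) never enters."""
    if rng is None:
        rng = np.random.default_rng()
    w, logp = np.asarray(w0, dtype=float), log_posterior(w0)
    samples = []
    for _ in range(n_samples):
        w_new = w + step * rng.normal(size=w.shape)     # propose a random step
        logp_new = log_posterior(w_new)
        if np.log(rng.uniform()) < logp_new - logp:     # accept with prob min(1, ratio)
            w, logp = w_new, logp_new
        samples.append(w.copy())                        # keep current point either way
    return np.array(samples)
```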

Page 19


The distribution of D(x) (using the average of 100 NNs) and the ensemble of ROC curves, one for each NN.

Page 20


[Figure: the BNN average ⟨NN⟩ compared with the actual p(good | x).]

Page 21

We see that if good and bad wines are equally likely, and we are prepared to accept 40% of bad wines while accepting 80% of good ones, then the BDT does slightly better. But if we lower the acceptance rate of bad wines, all three methods work equally well.


Page 22

Page 23

Now that the Higgs boson has been found, the primary goal of researchers at the LHC is to look for, and we hope discover, new physics. There are two basic strategies:

1. Look for any deviation from the predictions of our current theory of particles (the Standard Model).

2. Look for the specific deviations predicted by theories of new physics, such as those based on a proposed symmetry of Nature called supersymmetry.

Page 24

While finishing a paper related to his PhD dissertation, my student Sam Bein had to approximate a function of 19 parameters!


Page 25

This is what I suggested he try.

1. Write the function to be approximated as

$$ f(\theta) = C(\theta) \prod_{i=1}^{19} g_i(\theta_i). $$

2. Use NNs to approximate each g_i(θ_i) by setting y(θ) = g_i.

3. Use another neural network to approximate the ratio

$$ D(\theta) = \frac{p(\theta)}{p(\theta) + \prod_{i=1}^{19} g_i(\theta_i)}, $$

where the "background" is sampled from the g functions and the "signal" is the original distribution of points.

Page 26

4. Finally, approximate the function f using

$$ f(\theta) = \frac{D(\theta)}{1 - D(\theta)} \prod_{i=1}^{19} g_i(\theta_i). $$

The intuition here is that much of the dependence of f on θ is captured by the product function, leaving (we hoped!) a somewhat gentler 19-dimensional function to be approximated.

The next slide shows the neural network approximations of a few of the g functions. Of course, 1-D functions can be approximated using a wide variety of methods. The point here is to show the flexibility of a single class of NNs.
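Steps 3 and 4 are an instance of the classifier-based density-ratio trick: since D ≈ p/(p + Πg), it follows that D/(1 − D) ≈ p/Πg, so multiplying by the product recovers p(θ). A minimal sketch (the names D, g_list, and the array shapes are assumptions for illustration):

```python
import numpy as np

def f_approx(theta, D, g_list):
    """Step 4: f(theta) ≈ [D / (1 - D)] * prod_i g_i(theta_i), where D is a
    classifier trained to separate the original points ("signal") from points
    sampled from the product of the g_i ("background").
    theta: shape (N, 19); D: callable on theta; g_list: 19 one-dim functions."""
    d = np.clip(D(theta), 1e-6, 1.0 - 1e-6)   # guard against division by zero
    prod_g = np.prod([g(theta[:, i]) for i, g in enumerate(g_list)], axis=0)
    return d / (1.0 - d) * prod_g
```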

Page 27


[Figure: NN approximations of the 1-D g functions for four of the 19 parameters: Ab and At (spanning −6000 to 6000) and M1 and M2 (spanning −3000 to 3000).]

Page 28


Page 29


• Multivariate methods can be applied to many aspects of data analysis, including classification and function approximation. Even though there are hundreds of methods, almost all are numerical approximations of the same mathematical quantity.

• We considered random grid searches, decision trees, boosted decision trees, neural networks, and the Bayesian variety.

• Since no method is best for all problems, it is good practice to try a few of them, say BDTs and NNs, and check that they give approximately the same results.