Jens Zimmermann, MPI für Physik München, ACAT 2005 Zeuthen 1
Statistical Learning Basics
Jens Zimmermann ([email protected])
Max-Planck-Institut für Physik, München
Forschungszentrum Jülich GmbH
• XEUS and MAGIC
• Example: γ-hadron Separation
• Basic Concepts and Notions
• Classical Methods
• Statistical Learning Methods
• Statistical Learning Theory
• Conclusion
XEUS and MAGIC
X-rays: 0.1 – 10 keV / Gamma-rays: 10 – 1000 GeV

XEUS: X-ray Evolving Universe Spectroscopy Mission
• to be launched into space ~2015
• physics: first galaxies, metal synthesis, IGM
• detector records the charge distribution of each hit

MAGIC: Major Atmospheric Gamma Imaging Cherenkov Telescope
• built on La Palma, 2003
• physics: AGN, SNR, GRB
• camera images the Cherenkov photons as an ellipse
Example: γ-hadron separation

Shower images: photon → small, compact ellipse; hadron → uniform.

„Hillas" parameters: length, width, size, ...

Photon excess at small α:
• significance of the excess
• number of excess events

[Figure: distributions before and after cuts]
Example: γ-hadron separation

• Choose (preprocessed) inputs (length, width, size, ...)
• Classification: photon vs. hadron
• Offline analysis: train with simulated photons
• Comparison to the classical „supercuts" method
• Neural network based on linear separation (details to be discussed)
Training of Statistical Learning Methods
Statistical Learning Method: from N examples (x_i, y_i) infer a rule x → out(x)
Important: Generalisation vs. Overtraining
Without noise but separable only by a complicated boundary? Easily separable but with noise?

Too high a polynomial degree results in interpolation (overtraining), while too low a degree means bad approximation.
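The polynomial-degree trade-off above can be sketched numerically. A minimal sketch, assuming a straight line as the true model plus Gaussian noise (the data, noise level, and function are illustrative, not from the talk): a line fit generalises, while the interpolating polynomial through all training points memorises the noise.

```python
import random

random.seed(0)

def f(x):
    # assumed true underlying function: a straight line
    return 2.0 * x + 1.0

# noisy training and validation samples
train = [(x, f(x) + random.gauss(0, 0.5)) for x in [0, 1, 2, 3, 4, 5]]
valid = [(x, f(x) + random.gauss(0, 0.5)) for x in [0.5, 1.5, 2.5, 3.5, 4.5]]

def fit_line(pts):
    # least-squares straight line: a low-degree model
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def fit_interpolation(pts):
    # Lagrange polynomial through every training point (degree n-1):
    # zero training error, but the model follows the noise
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(pts):
            term = yi
            for j, (xj, _) in enumerate(pts):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def mse(model, pts):
    return sum((model(x) - y) ** 2 for x, y in pts) / len(pts)

line = fit_line(train)
interp = fit_interpolation(train)
```

Comparing `mse(...)` on the training and validation sets shows the effect: the interpolation is exact on the training points but worse off the grid, which is exactly the generalisation-vs-overtraining tension of the slide.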
Classification vs. Regression
Classification:
• XEUS: pileup vs. single photon
• MAGIC: gamma vs. muon vs. hadron event

Regression:
• XEUS: reconstruction of the incident position with subpixel resolution, Δ = x_out − x_true (x in µm)
• MAGIC: reconstruction of the primary photon energy, Δ = E_out − E_true (E in GeV)

Resolution: σ² = ⟨Δ⟩² + σ_Δ²

[Figures: Δ histograms; background rejection vs. signal efficiency for training and validation samples; XEUS photon-recognition output distribution (0 ≤ output ≤ 1) separating photons from pileups]
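A working point on an efficiency/rejection curve like the one shown can be computed directly from classifier outputs. A minimal sketch with made-up output values, assuming photons are trained toward output 1:

```python
# hypothetical network outputs for labelled validation events
photon_outputs = [0.92, 0.85, 0.97, 0.71, 0.63]
pileup_outputs = [0.08, 0.31, 0.22, 0.55, 0.04]

def working_point(cut):
    """Signal efficiency and background rejection for one cut on the output."""
    sig_eff = sum(o > cut for o in photon_outputs) / len(photon_outputs)
    bkg_rej = sum(o <= cut for o in pileup_outputs) / len(pileup_outputs)
    return sig_eff, bkg_rej

# scanning the cut from 0 to 1 traces out the efficiency/rejection curve
curve = [working_point(c / 10) for c in range(11)]
```

Evaluating the curve on both training and validation samples, as on the slide, is the standard check against overtraining.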
Inputs and Preprocessing
Reasonable selection of inputs = steering the search in function space

• as many as necessary, as few as possible
• highest possible analysis level
• make use of symmetries (reflection, rotation)
• measure the importance of inputs (correlation, relevance)

[Figures: symmetry examples from MAGIC (camera pixels A, B, C, D) and XEUS]
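The correlation measure of input importance mentioned above can be computed directly; a minimal sketch using the Pearson coefficient (one common choice — relevance measures are method-specific and not shown here):

```python
import math

def pearson(xs, ys):
    """Linear correlation between an input variable and the target."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

Inputs with correlation near zero to the target (and high correlation to other inputs) are candidates for removal, in line with "as many as necessary, as few as possible".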
Motivation and Training Data
Lack of time

• „Online" application, usually trigger
• Very fast hardware implementations of statistical learning methods (down to a few 100 ns)
• Training with offline analysis

Example: neural network trigger at the H1 experiment
Motivation and Training Data
Lack of knowledge

• „Offline" application
• No theoretical description of the data
• Theoretical prediction does not match the data
• Theory too complicated to construct an algorithm
• Performance increase with statistical learning methods
• Incorporate prior knowledge by preprocessing
• Training with Monte Carlo simulation (careful!) or a modified experiment

Example: mesh experiment
Classical Methods
Classification: “cuts”
Two univariate cuts vs. one multivariate cut. XEUS: patterns which can be generated by single photons.
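The two styles of cut can be contrasted in code; the variable names and thresholds below are hypothetical illustrations, not the actual supercuts values:

```python
def passes_univariate_cuts(length, width):
    # two independent one-dimensional cuts (hypothetical thresholds)
    return length < 0.30 and width < 0.15

def passes_multivariate_cut(length, width):
    # one combined cut on a linear combination of both inputs
    return 2.0 * length + 4.0 * width < 1.0
```

A multivariate cut can accept events a box cut rejects (e.g. long but very narrow images), which is why it can outperform independent univariate cuts.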
Regression: “fit”
MAGIC: estimate the energy of the primary photon; minimise the relative error.
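Minimising the relative rather than the absolute error just changes the loss function; a minimal sketch:

```python
def mean_relative_error(estimates, true_values):
    """Average |E_out - E_true| / E_true over a set of events."""
    return sum(abs(e - t) / t
               for e, t in zip(estimates, true_values)) / len(true_values)
```

The relative loss weights a 10 GeV mistake at 100 GeV the same as a 100 GeV mistake at 1000 GeV, which matches an energy spectrum spanning orders of magnitude.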
Statistical Learning Methods
Decision Trees: C4.5, CAL5, CART

Local Density Estimators: Naïve Bayes, “Maximum Likelihood”, k-Nearest Neighbours

Linear Separation: Neural Network, Support Vector Machine, Linear Discriminant Analysis

Meta Learning Strategies: Bagging, Boosting, Random Subspace
Some Events
A toy dataset: talks characterised by the number of formulas and the number of slides, with two classes, experimentalists and theorists (shown as a scatter plot of # slides vs. # formulas).

# formulas   # slides
    42          21
    28           8
    71          19
    64          31
    29          36
    15          34
    48          44
    56          51
    25          55
    12          16
Decision Trees
First cut, on # formulas (all events):
• #formulas < 20 → exp
• #formulas > 60 → th
• rest: 20 < #formulas < 60 → undecided

Second cut, on # slides (subset with 20 < #formulas < 60):
• #slides > 40 → exp
• #slides < 40 → th
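The tree on this slide translates directly into code, with the class labels given by the cuts above:

```python
def classify_talk(n_formulas, n_slides):
    """Decision tree from the slide: cut on # formulas first, then # slides."""
    if n_formulas < 20:
        return "experimentalist"
    if n_formulas > 60:
        return "theorist"
    # 20 < #formulas < 60: decide on the number of slides
    return "experimentalist" if n_slides > 40 else "theorist"
```

Each leaf corresponds to one rectangular region of the input plane, which is why decision trees produce axis-parallel decision boundaries.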
Local Density Estimators
Search for similar events that are already classifiedand count the members of the two classes.
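A minimal k-nearest-neighbours sketch of this idea; the labelled toy events and their class assignments below are assumptions for illustration:

```python
import math

def knn_classify(point, labelled_events, k=3):
    """Majority vote among the k labelled events closest to `point`."""
    nearest = sorted(labelled_events,
                     key=lambda ev: math.dist(point, ev[0]))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# hypothetical labelled events: (features = (# formulas, # slides), class)
events = [((15, 34), "exp"), ((12, 16), "exp"), ((25, 55), "exp"),
          ((71, 19), "th"), ((64, 31), "th"), ((56, 51), "th")]
```

Note that k-NN makes no global model: it only interpolates among stored examples, in contrast to the linear-separation methods on the next slide.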
Methods Based on Linear Separation
Divide the input space into regionsseparated by one or more hyperplanes.
Extrapolation is done!
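The simplest such hyperplane learner is the perceptron; a minimal sketch on made-up, linearly separable toy data (an illustration of the idea, not the talk's neural network):

```python
def train_perceptron(data, epochs=20, lr=0.1):
    """Learn a separating hyperplane w·x + b = 0; labels y are +1 or -1."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * (w[0] * x[0] + w[1] * x[1] + b) <= 0:  # misclassified
                w[0] += lr * y * x[0]
                w[1] += lr * y * x[1]
                b += lr * y
    return w, b

# hypothetical linearly separable toy data
data = [((1, 1), -1), ((2, 1), -1), ((1, 2), -1),
        ((5, 5), 1), ((6, 4), 1), ((4, 6), 1)]
w, b = train_perceptron(data)
```

Unlike a density estimator, the learned hyperplane assigns a class to every point of the input space, including regions far from any training example — this is the extrapolation the slide warns about.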
Meta-Learning Strategies
Training Data
Classifier 1, Classifier 2, Classifier 3, ..., Classifier n
Combine different classificationsto one final decision
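The simplest combination rule is a majority vote (as used in bagging); a minimal sketch with hypothetical classifiers:

```python
def majority_vote(classifiers, x):
    """Final decision = most frequent answer among the individual classifiers."""
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)

# three hypothetical classifiers that disagree near the boundary
clfs = [lambda x: "photon" if x > 0.4 else "hadron",
        lambda x: "photon" if x > 0.5 else "hadron",
        lambda x: "photon" if x > 0.7 else "hadron"]
```

Boosting differs in that the vote is weighted and later classifiers are trained to focus on the events the earlier ones got wrong.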
Statistical Learning Theory
How does the error on the training set relate to the true error? The loss function here counts misclassifications.

PAC learning (probably approximately correct): finite hypothesis space H, training set of size n, target function y is in H. “Probably”: with probability at least 1 − δ; “approximately”: true error at most ε.
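The “probably” (δ) and “approximately” (ε) correspond to the standard sample-complexity bound for a finite hypothesis space and a learner that outputs a hypothesis consistent with all n training examples (a textbook result, not spelled out on the slide):

```latex
\Pr\big[\,\mathrm{err}(h) \le \varepsilon\,\big] \;\ge\; 1 - \delta
\qquad \text{whenever} \qquad
n \;\ge\; \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```

The required training-set size grows only logarithmically with the size of the hypothesis space.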
Statistical Learning Theory
VC framework (Vapnik–Chervonenkis)

The VC dimension of linear separation in two dimensions is three, because three points can be shattered but not four.

A bound for the true error that depends on the VC dimension gives the “generalisation error”.
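The bound alluded to here is, in its standard form (holding with probability at least 1 − δ over training sets of size n, where h is the VC dimension):

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f)
\;+\; \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) + \ln\frac{4}{\delta}}{n}}
```

The true risk R is controlled by the empirical risk plus a complexity term: richer function classes (larger h) need more training data before the training error becomes a trustworthy estimate.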
Conclusion
Many applications for statistical learning methodsin high energy and astrophysics
Classification or Regression
Online or Offline
Many different methods with three basic ideas(decision trees, local density estimators, linear separation)
Rich theory
Next Talk
Very promising results with statistical learning methods
But:
• Can they be trusted?
• Can they be controlled?
• Can one calculate uncertainties?