Jens Zimmermann, MPI für Physik München, ACAT 2005 Zeuthen 1
Statistical Learning Basics
Jens Zimmermann ([email protected])
Max-Planck-Institut für Physik, München
Forschungszentrum Jülich GmbH
• XEUS and MAGIC
• Example: γ-hadron Separation
• Basic Concepts and Notions
• Classical Methods
• Statistical Learning Methods
• Statistical Learning Theory
• Conclusion
XEUS and MAGIC
X-rays: 0.1 – 10 keV / Gamma-rays: 10 – 1000 GeV

XEUS: X-ray Evolving Universe Spectroscopy Mission
• to be launched into space ~2015
• physics: first galaxies, metal synthesis, IGM
• detector records the charge distribution of each hit

MAGIC: Major Atmospheric Gamma Imaging Cherenkov Telescope
• built on La Palma, 2003
• physics: AGN, SNR, GRB
• camera images the Cherenkov photons as an ellipse
Example: γ-hadron separation

Shower images: photon → small, compact ellipse; hadron → uniform.

„Hillas" parameters: length, width, size, ...

Photon excess at small α:
• significance of the excess
• number of excess events

[Figure: distributions before and after cuts]
Example: γ-hadron separation

• Choose (preprocessed) inputs (length, width, size, ...)
• Classification: photon vs. hadron
• Offline analysis: train with simulated photons
• Comparison to the classical „supercuts" method
• Neural network based on linear separation (details to be discussed)
Training of Statistical Learning Methods
Statistical Learning Method: from N examples (x_i, y_i) infer a rule x → out(x)
Important: Generalisation vs. Overtraining
Without noise but separable only by a complicated boundary? Easily separable but with noise?

Too high a polynomial degree results in interpolation (overtraining), while too low a degree means bad approximation.
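The polynomial-degree trade-off above can be sketched numerically. A minimal sketch, assuming a straight line as the true model plus Gaussian noise (the data, noise level, and function are illustrative, not from the talk): a line fit generalises, while the interpolating polynomial through all training points memorises the noise.

```python
import random

random.seed(0)

def f(x):
    # assumed true underlying function: a straight line
    return 2.0 * x + 1.0

# noisy training and validation samples
train = [(x, f(x) + random.gauss(0, 0.5)) for x in [0, 1, 2, 3, 4, 5]]
valid = [(x, f(x) + random.gauss(0, 0.5)) for x in [0.5, 1.5, 2.5, 3.5, 4.5]]

def fit_line(pts):
    # least-squares straight line: a low-degree model
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def fit_interpolation(pts):
    # Lagrange polynomial through every training point (degree n-1):
    # zero training error, but the model follows the noise
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(pts):
            term = yi
            for j, (xj, _) in enumerate(pts):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def mse(model, pts):
    return sum((model(x) - y) ** 2 for x, y in pts) / len(pts)

line = fit_line(train)
interp = fit_interpolation(train)
```

Comparing `mse(...)` on the training and validation sets shows the effect: the interpolation is exact on the training points but worse off the grid, which is exactly the generalisation-vs-overtraining tension of the slide.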
Classification vs. Regression
Classification:
• XEUS: pileup vs. single photon
• MAGIC: gamma vs. muon vs. hadron event

Regression:
• XEUS: reconstruction of the incident position with subpixel resolution, Δ = x_out − x_true (x in µm)
• MAGIC: reconstruction of the primary photon energy, Δ = E_out − E_true (E in GeV)

Resolution: σ² = ⟨Δ⟩² + σ_Δ²

[Figures: Δ histograms; background rejection vs. signal efficiency for training and validation samples; XEUS photon-recognition output distribution (0 ≤ output ≤ 1) separating photons from pileups]
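A working point on an efficiency/rejection curve like the one shown can be computed directly from classifier outputs. A minimal sketch with made-up output values, assuming photons are trained toward output 1:

```python
# hypothetical network outputs for labelled validation events
photon_outputs = [0.92, 0.85, 0.97, 0.71, 0.63]
pileup_outputs = [0.08, 0.31, 0.22, 0.55, 0.04]

def working_point(cut):
    """Signal efficiency and background rejection for one cut on the output."""
    sig_eff = sum(o > cut for o in photon_outputs) / len(photon_outputs)
    bkg_rej = sum(o <= cut for o in pileup_outputs) / len(pileup_outputs)
    return sig_eff, bkg_rej

# scanning the cut from 0 to 1 traces out the efficiency/rejection curve
curve = [working_point(c / 10) for c in range(11)]
```

Evaluating the curve on both training and validation samples, as on the slide, is the standard check against overtraining.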
Inputs and Preprocessing
Reasonable selection of inputs = steering the search in function space

• as many as necessary, as few as possible
• highest possible analysis level
• make use of symmetries (reflection, rotation)
• measure the importance of inputs (correlation, relevance)

[Figures: symmetry examples from MAGIC (camera pixels A, B, C, D) and XEUS]
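The correlation measure of input importance mentioned above can be computed directly; a minimal sketch using the Pearson coefficient (one common choice — relevance measures are method-specific and not shown here):

```python
import math

def pearson(xs, ys):
    """Linear correlation between an input variable and the target."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

Inputs with correlation near zero to the target (and high correlation to other inputs) are candidates for removal, in line with "as many as necessary, as few as possible".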
Motivation and Training Data
Lack of time

• „Online" application, usually trigger
• Very fast hardware implementations of statistical learning methods (down to a few 100 ns)
• Training with offline analysis

Example: neural network trigger at the H1 experiment
Motivation and Training Data
Lack of knowledge

• „Offline" application
• No theoretical description of the data
• Theoretical prediction does not match the data
• Theory too complicated to construct an algorithm
• Performance increase with statistical learning methods
• Incorporate prior knowledge by preprocessing
• Training with Monte Carlo simulation (careful!) or a modified experiment

Example: mesh experiment
Classical Methods
Classification: “cuts”
Two univariate cuts vs. one multivariate cut. XEUS: patterns which can be generated by single photons.
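The two styles of cut can be contrasted in code; the variable names and thresholds below are hypothetical illustrations, not the actual supercuts values:

```python
def passes_univariate_cuts(length, width):
    # two independent one-dimensional cuts (hypothetical thresholds)
    return length < 0.30 and width < 0.15

def passes_multivariate_cut(length, width):
    # one combined cut on a linear combination of both inputs
    return 2.0 * length + 4.0 * width < 1.0
```

A multivariate cut can accept events a box cut rejects (e.g. long but very narrow images), which is why it can outperform independent univariate cuts.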
Regression: “fit”
MAGIC: estimate the energy of the primary photon; minimise the relative error.
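Minimising the relative rather than the absolute error just changes the loss function; a minimal sketch:

```python
def mean_relative_error(estimates, true_values):
    """Average |E_out - E_true| / E_true over a set of events."""
    return sum(abs(e - t) / t
               for e, t in zip(estimates, true_values)) / len(true_values)
```

The relative loss weights a 10 GeV mistake at 100 GeV the same as a 100 GeV mistake at 1000 GeV, which matches an energy spectrum spanning orders of magnitude.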
Statistical Learning Methods
Decision Trees: C4.5, CAL5, CART

Local Density Estimators: Naïve Bayes, “Maximum Likelihood”, k-Nearest Neighbours

Linear Separation: Neural Network, Support Vector Machine, Linear Discriminant Analysis

Meta Learning Strategies: Bagging, Boosting, Random Subspace
Some Events
A toy dataset: talks characterised by the number of formulas and the number of slides, with two classes, experimentalists and theorists (shown as a scatter plot of # slides vs. # formulas).

# formulas   # slides
    42          21
    28           8
    71          19
    64          31
    29          36
    15          34
    48          44
    56          51
    25          55
    12          16
Decision Trees
First cut, on # formulas (all events):
• #formulas < 20 → exp
• #formulas > 60 → th
• rest: 20 < #formulas < 60 → undecided

Second cut, on # slides (subset with 20 < #formulas < 60):
• #slides > 40 → exp
• #slides < 40 → th
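The tree on this slide translates directly into code, with the class labels given by the cuts above:

```python
def classify_talk(n_formulas, n_slides):
    """Decision tree from the slide: cut on # formulas first, then # slides."""
    if n_formulas < 20:
        return "experimentalist"
    if n_formulas > 60:
        return "theorist"
    # 20 < #formulas < 60: decide on the number of slides
    return "experimentalist" if n_slides > 40 else "theorist"
```

Each leaf corresponds to one rectangular region of the input plane, which is why decision trees produce axis-parallel decision boundaries.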
Local Density Estimators
Search for similar events that are already classifiedand count the members of the two classes.
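A minimal k-nearest-neighbours sketch of this idea; the labelled toy events and their class assignments below are assumptions for illustration:

```python
import math

def knn_classify(point, labelled_events, k=3):
    """Majority vote among the k labelled events closest to `point`."""
    nearest = sorted(labelled_events,
                     key=lambda ev: math.dist(point, ev[0]))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# hypothetical labelled events: (features = (# formulas, # slides), class)
events = [((15, 34), "exp"), ((12, 16), "exp"), ((25, 55), "exp"),
          ((71, 19), "th"), ((64, 31), "th"), ((56, 51), "th")]
```

Note that k-NN makes no global model: it only interpolates among stored examples, in contrast to the linear-separation methods on the next slide.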
Methods Based on Linear Separation
Divide the input space into regionsseparated by one or more hyperplanes.
Extrapolation is done!
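The simplest such hyperplane learner is the perceptron; a minimal sketch on made-up, linearly separable toy data (an illustration of the idea, not the talk's neural network):

```python
def train_perceptron(data, epochs=20, lr=0.1):
    """Learn a separating hyperplane w·x + b = 0; labels y are +1 or -1."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * (w[0] * x[0] + w[1] * x[1] + b) <= 0:  # misclassified
                w[0] += lr * y * x[0]
                w[1] += lr * y * x[1]
                b += lr * y
    return w, b

# hypothetical linearly separable toy data
data = [((1, 1), -1), ((2, 1), -1), ((1, 2), -1),
        ((5, 5), 1), ((6, 4), 1), ((4, 6), 1)]
w, b = train_perceptron(data)
```

Unlike a density estimator, the learned hyperplane assigns a class to every point of the input space, including regions far from any training example — this is the extrapolation the slide warns about.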
Meta-Learning Strategies
Training Data
Classifier 1, Classifier 2, Classifier 3, ..., Classifier n
Combine different classificationsto one final decision
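The simplest combination rule is a majority vote (as used in bagging); a minimal sketch with hypothetical classifiers:

```python
def majority_vote(classifiers, x):
    """Final decision = most frequent answer among the individual classifiers."""
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)

# three hypothetical classifiers that disagree near the boundary
clfs = [lambda x: "photon" if x > 0.4 else "hadron",
        lambda x: "photon" if x > 0.5 else "hadron",
        lambda x: "photon" if x > 0.7 else "hadron"]
```

Boosting differs in that the vote is weighted and later classifiers are trained to focus on the events the earlier ones got wrong.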
Statistical Learning Theory
How does the error on the training set relate to the true error? The loss function here counts misclassifications.

PAC learning (probably approximately correct): finite hypothesis space H, training set of size n, target function y is in H. “Probably”: with probability at least 1 − δ; “approximately”: true error at most ε.
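The “probably” (δ) and “approximately” (ε) correspond to the standard sample-complexity bound for a finite hypothesis space and a learner that outputs a hypothesis consistent with all n training examples (a textbook result, not spelled out on the slide):

```latex
\Pr\big[\,\mathrm{err}(h) \le \varepsilon\,\big] \;\ge\; 1 - \delta
\qquad \text{whenever} \qquad
n \;\ge\; \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```

The required training-set size grows only logarithmically with the size of the hypothesis space.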
Statistical Learning Theory
VC framework (Vapnik–Chervonenkis)

The VC dimension of linear separation in two dimensions is three, because three points can be shattered but not four.

A bound for the true error that depends on the VC dimension gives the “generalisation error”.
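The bound alluded to here is, in its standard form (holding with probability at least 1 − δ over training sets of size n, where h is the VC dimension):

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f)
\;+\; \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) + \ln\frac{4}{\delta}}{n}}
```

The true risk R is controlled by the empirical risk plus a complexity term: richer function classes (larger h) need more training data before the training error becomes a trustworthy estimate.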
Conclusion
Many applications for statistical learning methodsin high energy and astrophysics
Classification or Regression
Online or Offline
Many different methods with three basic ideas(decision trees, local density estimators, linear separation)
Rich theory
Next Talk
Very promising results with statistical learning methods
But:
• Can they be trusted?
• Can they be controlled?
• Can one calculate uncertainties?