Neural networks in data analysis
Michal Kreps
ISAPP Summer Institute 2009
Outline
What neural networks are
Basic principles of NN
What they are useful for
Training of NN
Available packages
Comparison to other methods
Some examples of NN usage
I'm not an expert on neural networks, only an advanced user who understands the principles. Goal: help you understand the principles so that:
→ neural networks don't look like a black box to you
→ you have a starting point to use neural networks yourself
[Figure: feed-forward network with an 8-node input layer, one hidden layer and a single output node]
Tasks to solve
Different tasks exist in nature: some are easy to solve algorithmically, some are difficult or impossible
Example: reading handwritten text
Lots of personal specifics
Recognition often needs to work with only partial information
Practically impossible in an algorithmic way
But our brains can solve this task extremely fast
→ Use the brain as inspiration
Tasks to solve
Why would an (astro-)particle physicist care about recognition of handwritten text?
Because it is a classification task on not very regular data: do we see the letter A or H?
→ Classification tasks occur very often in our analyses: is this event signal or background?
→ Neural networks have some advantages in this task:
Can learn from known examples (data or simulation)
Can evolve to complex non-linear models
Can easily deal with complicated correlations among inputs
Going to more inputs does not cost additional examples
Down to principles
Zoom in: a detailed look shows a huge number of neurons
Neurons are interconnected, each having many inputs from other neurons
To build an artificial neural network:
Build a model of the neuron
Connect many neurons together
Model of neuron
Get inputs
Combine inputs into a single quantity: S = ∑_i w_i x_i
Pass S through an activation function; most common: A(x) = 2/(1 + e^(−x)) − 1
Give output
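A minimal sketch of this neuron model in Python/NumPy (my own illustration, not code from the lecture; the function names are arbitrary):

```python
import numpy as np

def activation(x):
    # Symmetric sigmoid A(x) = 2/(1 + e^-x) - 1, output in (-1, 1)
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def neuron_output(weights, inputs):
    # Combine inputs into the single quantity S = sum_i w_i x_i, then activate
    S = np.dot(weights, inputs)
    return activation(S)

print(neuron_output(np.array([0.5, -1.0, 2.0]), np.array([1.0, 0.3, -0.2])))
```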
Building neural network
Connecting many neurons can give complicated structures, even more complicated ones, or simple structures such as the network shown earlier: an 8-node input layer, one hidden layer and a single output node.
Output of feed-forward NN
Output of node k in the hidden layer: o_k = A(∑_i w^{1→2}_{ik} x_i)
Output of node j in the output layer: o_j = A(∑_k w^{2→3}_{kj} o_k)
Putting both together: o_j = A[∑_k w^{2→3}_{kj} A(∑_i w^{1→2}_{ik} x_i)]
More layers are usually not needed
If more layers are involved, the output follows the same logic
In the following, we stick to three layers as here
Knowledge is stored in the weights
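To make the formulas concrete, here is a sketch of the three-layer forward pass (an assumed illustration with random placeholder weights, not the lecture's code; w1to2 and w2to3 stand for w^{1→2} and w^{2→3}):

```python
import numpy as np

def activation(x):
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def forward(x, w1to2, w2to3):
    # Hidden layer: o_k = A(sum_i w^{1->2}_{ik} x_i)
    hidden = activation(w1to2.T @ x)
    # Output layer: o_j = A(sum_k w^{2->3}_{kj} o_k)
    return activation(w2to3.T @ hidden)

rng = np.random.default_rng(0)
x = rng.normal(size=8)              # 8 input nodes
w1to2 = rng.normal(size=(8, 8))     # input -> hidden weights
w2to3 = rng.normal(size=(8, 1))     # hidden -> output weights
print(forward(x, w1to2, w2to3))     # single output in (-1, 1)
```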
Tasks we can solve
Classification: binary decision whether an event belongs to one of two classes
Is this particle a kaon or not?
Is this event a candidate for the Higgs?
Will Germany become soccer world champion in 2010?
→ Output layer contains a single node
Conditional probability density: get a probability density for each event
What is the decay time of a decay?
What is the energy of a particle given the observed shower?
What is the probability that a customer will cause a car accident?
→ Output layer contains many nodes
NN work flow
[Figure: work flow of a neural-network analysis]
Neural network training
After assembling a neural network, the weights w^{1→2}_{ik} and w^{2→3}_{kj} are unknown; they are initialised to default or random values
Before using the neural network for the planned task, we need to find the best set of weights
We need some measure which tells us which set of weights performs best ⇒ loss function
Training is the process in which, based on known example events, we search for the set of weights that minimises the loss function
It is the most important part of using neural networks
Bad training can result in bad or even wrong solutions
There are many traps in the process, so one needs to be careful
Loss function
Practically any function which allows you to decide which neural network is better
→ In particle physics it can be the one which results in the best measurement
In practical applications, two functions are commonly used (T_ji is the true value, −1 for one class and +1 for the other; o_ji is the actual neural network output):
Quadratic loss function: χ² = ∑_j w_j χ²_j, with χ²_j = (1/2) ∑_i (T_ji − o_ji)²
→ χ² = 0 for correct decisions and χ² = 2 for wrong ones
Entropy loss function: E_D = ∑_j w_j E_jD, with E_jD = ∑_i −log((1/2)(1 + T_ji o_ji))
→ E_D = 0 for correct decisions and E_D = ∞ for wrong ones
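A sketch of the two loss functions for a single output node (my illustration; T is the true class in {−1, +1}, o the network output, w optional event weights):

```python
import numpy as np

def quadratic_loss(T, o, w=None):
    w = np.ones_like(T) if w is None else w
    # chi^2_j = 1/2 (T_j - o_j)^2: 0 for a correct decision, 2 for a fully wrong one
    return np.sum(w * 0.5 * (T - o) ** 2)

def entropy_loss(T, o, w=None):
    w = np.ones_like(T) if w is None else w
    # E_jD = -log(1/2 (1 + T_j o_j)): 0 for correct, infinity for fully wrong
    return np.sum(w * -np.log(0.5 * (1.0 + T * o)))

T = np.array([1.0, -1.0, 1.0])
o = np.array([0.9, -0.8, 0.1])
print(quadratic_loss(T, o), entropy_loss(T, o))
```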
Conditional probability density
Classification is the easy case: we need a single output node, and if the output is above a chosen value the event is class 1 (signal), otherwise class 2 (background)
Now the question is how to obtain a probability density
Use many nodes in the output layer, each answering the question whether the true value is larger than some threshold
As an example, take n = 10 nodes and a true value of 0.63
Vector of true values for the output nodes: (1, 1, 1, 1, 1, 1, −1, −1, −1, −1)
Node j answers the question whether the value is larger than j/10
The output vector then represents the integrated probability density
After differentiation we obtain the desired probability density
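A sketch of this target-vector construction (assumed illustration; the true value is taken to lie in [0, 1]):

```python
import numpy as np

def target_vector(true_value, n_nodes=10):
    # Node j (1-based) answers: is the true value larger than j/n_nodes?
    thresholds = np.arange(1, n_nodes + 1) / n_nodes
    return np.where(true_value > thresholds, 1.0, -1.0)

print(target_vector(0.63))  # six +1 entries followed by four -1 entries
```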
Interpretation of NN output
In classification tasks, in general a higher NN output means a more signal-like event
→ The NN output rescaled to the interval [0, 1] can be interpreted as a probability, provided that:
A quadratic/entropy loss function is used
The NN is well trained ⇔ the loss function is minimised
→ The purity P(o) = N_selected signal(o) / N_selected(o) should then be a linear function of the NN output
Looking at the expression for the output, o_j = A[∑_k w^{2→3}_{kj} A(∑_i w^{1→2}_{ik} x_i)]
→ the NN is a function which maps the n-dimensional input space to a 1-dimensional space
[Figure: purity as a function of network output]
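A sketch of this linearity check, binning the output and computing the purity per bin (my illustration; labels are True for signal, False for background):

```python
import numpy as np

def purity_vs_output(output, is_signal, n_bins=20):
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    purity = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (output >= lo) & (output < hi)
        # Purity in this bin: fraction of selected events that are signal
        purity.append(is_signal[sel].mean() if sel.any() else np.nan)
    return edges, np.array(purity)

rng = np.random.default_rng(2)
output = np.append(np.tanh(rng.normal(0.5, 1.0, 5000)),    # toy signal
                   np.tanh(rng.normal(-0.5, 1.0, 5000)))   # toy background
labels = np.append(np.ones(5000), np.zeros(5000)).astype(bool)
edges, purity = purity_vs_output(output, labels)
print(purity)
```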
How to check quality of training
Define quantities:
Signal efficiency: ε_s = N(selected signal) / N(all signal)
Efficiency: ε = N(selected) / N(all)
Purity: P = N(selected signal) / N(selected)
[Figure: distribution of the NN output for the two classes] → How good is the separation?
[Figure: purity vs. network output] → Are we at the minimum of the loss function?
[Figure: signal purity vs. signal efficiency; each point corresponds to a requirement out > cut (or out < cut); the curve depends on the training S/B; the best point is (1, 1)]
[Figure: signal efficiency vs. efficiency; each point corresponds to a requirement out > cut; shown together with the best achievable curve and the random-decision diagonal]
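The same quantities as code, scanning a cut on the network output (my sketch of how such performance curves can be produced):

```python
import numpy as np

def performance_curves(output, is_signal, cuts):
    n_sig_all, n_all = is_signal.sum(), len(output)
    rows = []
    for cut in cuts:
        sel = output > cut                      # requirement out > cut
        n_sel, n_sel_sig = sel.sum(), is_signal[sel].sum()
        eff_s = n_sel_sig / n_sig_all           # signal efficiency
        eff = n_sel / n_all                     # efficiency
        purity = n_sel_sig / n_sel if n_sel else float("nan")
        rows.append((cut, eff_s, eff, purity))
    return rows
```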
Issues in training
Main issue: finding the minimum in a high-dimensional space
5 input nodes with 5 nodes in the hidden layer and one output node give 5×5 + 5 = 30 weights
Imagine the task of finding the deepest valley in the Alps (only 2 dimensions):
Put at some random place, it is easy to find some minimum
But what is in the next valley?
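One standard remedy for the "next valley" problem (an assumed illustration on a 1-D toy loss, not the lecture's training algorithm) is gradient descent with several random restarts, keeping the deepest minimum found:

```python
import numpy as np

def toy_loss(w):
    # Two valleys of different depth; the deeper one lies near w = 2.5
    return np.sin(3.0 * w) + 0.1 * (w - 2.5) ** 2

def gradient(w, eps=1e-5):
    return (toy_loss(w + eps) - toy_loss(w - eps)) / (2.0 * eps)

def descend(w, lr=0.01, steps=2000):
    for _ in range(steps):
        w -= lr * gradient(w)
    return w                                   # ends in the nearest valley

rng = np.random.default_rng(1)
starts = rng.uniform(-3.0, 6.0, size=10)       # random starting points
minima = [descend(w) for w in starts]
best = min(minima, key=toy_loss)               # keep the deepest valley found
print(best, toy_loss(best))
```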
Issues in training
In training, we start with:
A large number of weights which are far from their optimal values
Input values in unknown ranges
→ Large chance of numerical issues in the calculation of the loss function
Statistical issues coming from low statistics in training:
In many places in physics not that much of an issue, as the training statistics can be large
In industry it can be quite a big issue to obtain a large enough training dataset
But also in physics it can sometimes be very costly to obtain a large enough training sample
Learning by heart (overtraining), when the NN has enough weights to do that
How to treat issues
All issues mentioned can in principle be treated by the neural network package:
Regularisation for numerical issues at the beginning of training
Weight decay ("forgetting"): helps against overtraining on statistical fluctuations; the idea is that effects which are not significant are not shown often enough to keep the weights that remember them
Pruning of connections: remove connections which don't carry significant information
Preprocessing: against numerical issues; helps the search for a good minimum through a good starting point and decorrelation
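A sketch of the weight-decay idea (the decay rate is an assumed illustrative value): weights are pulled slightly toward zero after every step, so connections that the data does not keep reinforcing are gradually forgotten:

```python
import numpy as np

def decayed_update(weights, grad, lr=0.01, decay=1e-4):
    weights = weights - lr * grad      # usual gradient step
    return weights * (1.0 - decay)     # forget a tiny fraction each step

w = np.array([0.5, -0.2, 1.3])
g = np.array([0.1, 0.0, -0.3])
print(decayed_update(w, g))
```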
Neurobayes preprocessing
[Figure: input node 5 flattened into bins of equal statistics; purity per bin with a spline fit; final preprocessed distribution]
Bin the variable into bins of equal statistics
Calculate the purity for each bin and do a spline fit
Use the purity instead of the original value, plus transform to µ = 0 and RMS = 1
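A sketch of these preprocessing steps (my illustration; the spline fit over the bin purities is skipped for brevity):

```python
import numpy as np

def preprocess(x, is_signal, n_bins=100):
    # Bin the variable into bins of (approximately) equal statistics
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    # Purity of each bin; a spline fit would smooth these values
    purity = np.array([is_signal[idx == b].mean() for b in range(n_bins)])
    out = purity[idx]                        # replace value by its bin purity
    return (out - out.mean()) / out.std()    # transform to mu = 0, RMS = 1

rng = np.random.default_rng(3)
x = np.append(rng.exponential(2.0, 5000), rng.exponential(1.0, 5000))
labels = np.append(np.ones(5000, bool), np.zeros(5000, bool))
print(preprocess(x, labels)[:5])
```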
Electron ID at CDF
Comparing two different NN packages with exactly the same inputs
Preprocessing is important and can improve the result significantly
Available packages
Not an exhaustive list:
NeuroBayes (http://www.phi-t.de)
The best I know
Commercial, but the fee is small for scientific use
Originally developed in a high energy physics experiment
ROOT/TMVA
Part of the ROOT package
Can be used for playing, but I would advise using something else for serious applications
SNNS: Stuttgart Neural Network Simulator (http://www.ra.cs.uni-tuebingen.de/SNNS/)
Written in C, but a ROOT interface exists
The best open-source package I know about
NN vs. other multivariate techniques
Several times I have seen statements of the type "my favourite method is better than other multivariate techniques"
Usually a lot of time is spent understanding and properly training the favourite method
Other methods are tried just for a quick comparison, without understanding or tuning
Often you will not even learn any details of how the other methods were set up
With TMVA it is now easy to compare different methods; often I saw the Fisher discriminant win in those comparisons
From some tests we found that the NN implemented in TMVA is relatively bad, so no wonder it is usually worse than the other methods
It is usually useful to understand the method you are using; none of them is a black box
Practical tips
No need for several hidden layers; one should be sufficient for you
Number of nodes in hidden layer should not be very large
→ Risk of learning by heart
Number of nodes in hidden layer should not be very small
→ Not enough capacity to learn your problem
→ Usually O(number of inputs) in hidden layer
Understand your problem: the NN gets numbers and can dig information out of them, but you know the meaning of those numbers
Practical tips
Don't throw away knowledge you acquired about the problem before starting to use a NN
→ A NN can do a lot, but you don't need it to discover what you already know
If you already know that a given event is background, throw it away
If you know about good inputs, use them rather than the raw data from which they are calculated
Use the neural network to learn what you don't know yet
Physics examples
Only mentioning work done by the Karlsruhe HEP group:
"Stable" particle identification at DELPHI, CDF and now Belle
Properties (E, φ, θ, Q-value) of inclusively reconstructed B mesons
Decay time in semileptonic B_s decays
Heavy quark resonance studies: X(3872), B**, B**_s, Λ*_c, Σ_c
B flavour tagging and B_s mixing (DELPHI, CDF, Belle)
CPV in B_s → J/ψφ decays (CDF)
b tagging for high-pT physics (CDF, CMS)
Single top and Higgs searches (CDF)
B hadron full reconstruction (Belle)
Optimisation of the KEKB accelerator
B**_s at CDF
[Figure: neural network output for signal and background, and the mass difference M(B⁺K⁻) − M(K⁻) − M(B⁺) for B**_s → B⁺K⁻, B⁺ → J/ψK⁺ candidates, showing the B_s1 and B*_s2 peaks; CDF Run 2, 1.0 fb⁻¹]
Background from data, signal from MC
First observation of the B_s1
Most precise masses
X(3872) mass at CDF
[Figure: network output for signal and background, and the J/ψππ mass distribution showing total, rejected and selected candidates; CDF II preliminary, 2.4 fb⁻¹]
X(3872) was the first of the charmonium-like states
A lot of effort to understand its nature
Powerful selection at CDF of the largest sample, leading to the best mass measurement
Single top search
[Figure: KIT flavour separator output for single top, tt̄, Wbb̄, Wcc̄, Wc, Wqq̄, diboson, Z+jets and QCD, normalised to unit area (TLC 2 jets, 1 tag), and the final NN output for candidate events with MC normalised to the SM prediction, all channels; CDF II preliminary, 3.2 fb⁻¹]
Complicated analysis with many different multivariate techniques
Tagging whether a jet is a b-quark jet or not using a NN
Final data plot shown above
Part of a big program leading to the discovery of single top production
B flavour tagging - CDF
Flavour tagging at hadron colliders is a difficult task
Lots of potential for improvements
In the plot, concentrate on the continuous lines
Example of not-that-good separation
Training with weights
[Figure: mass difference M(Λ_c π) − M(Λ_c) [GeV/c²], candidates per 0.4 MeV; ≈ 23300 signal events, ≈ 472200 background events; CDF Run II preliminary, L = 2.4 fb⁻¹]
One knows statistically whether an event is signal or background
Or one has a sample with a mixture of signal and background and a pure sample for one class
Can use data only; the sketch below gives an example of the application
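One way to realise such weighted training (my assumed sketch, not necessarily the procedure used in the analysis): give each data event a signal probability p from the mass fit and let it enter the loss twice, as signal with weight p and as background with weight 1 − p:

```python
import numpy as np

def weighted_quadratic_loss(outputs, p_signal):
    # Each event: target +1 with weight p, target -1 with weight 1 - p
    loss_sig = p_signal * 0.5 * (1.0 - outputs) ** 2
    loss_bkg = (1.0 - p_signal) * 0.5 * (-1.0 - outputs) ** 2
    return np.sum(loss_sig + loss_bkg)

outputs = np.array([0.8, -0.4, 0.1])
p_signal = np.array([0.9, 0.2, 0.5])   # signal probabilities from a mass fit
print(weighted_quadratic_loss(outputs, p_signal))
```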
Training with weights
[Figure: mass difference M(Λ_c⁺π⁺) − M(Λ_c⁺) [GeV/c²], candidates per 1.30 MeV, with data, fit function, Σ_c(2455)⁺⁺ → Λ_c⁺π⁺ and Σ_c(2520)⁺⁺ → Λ_c⁺π⁺ components, plus fit residuals; CDF Run II preliminary, L = 2.4 fb⁻¹]
→ Use only data in the selection
→ Can compare data to simulation
→ Can reweight simulation to match data
Examples outside physics
Medicine and pharma research: e.g. effects and undesirable effects of drugs, early tumour recognition
Banks: e.g. credit scoring (Basel II), financial time series prediction, valuation of derivatives, risk-minimised trading strategies, client valuation
Insurances: e.g. risk and cost prediction for individual clients, probability of contract cancellation, fraud recognition, fairness in tariffs
Retail chain stores: turnover prognosis for individual articles/stores
Data Mining Cup
World's largest students' competition: the Data-Mining-Cup
Very successful in competition with other data-mining methods:
2005: fraud detection in internet trading
2006: price prediction in eBay auctions
2007: coupon redemption prediction
2008: lottery customer behaviour prediction
NeuroBayes turnover prognosis
[Figure: NeuroBayes turnover prognosis results]
Individual health costs
Pilot project for a large private health insurance
Prognosis of the costs in the following year for each insured person, with confidence intervals
4 years of training, test on the following year
Results: probability density for each customer/tariff combination
Very good test results!
Potential for an objective cost reduction in health management
Prognosis of sport events
Results: Probabilities for home - tie - guest
Conclusions
Neural networks are a powerful tool for your analysis
They are not geniuses, but they can do marvellous things for one
A lecture of this kind cannot really make you an expert
I hope that I was successful in:
Giving you enough insight so that a neural network is no longer a black box to you
Teaching you enough not to be afraid to try
Feel free to talk to me, I’m at Campus Süd, building 30.23,room 9-11