Bayesian Within The Gates
A View From Particle Physics
Harrison B. Prosper, Florida State University
SAMSI, 24 January 2006
Outline
Measuring Zero as Precisely as Possible!
Signal/Background Discrimination
  1-D Example
  14-D Example
Some Open Issues
Summary
Measuring Zero!
Diamonds may not be forever
Neutron <-> anti-neutron transitions, CRISP Experiment (1982 – 1985), Institut Laue Langevin, Grenoble, France
Method: Fire a gas of cold neutrons onto a graphite foil. Look for annihilation of the anti-neutron component.
Measuring Zero!
Count the number of signal + background events, N.
Suppress the putative signal and count background events, B, independently.
Results:
N = 3
B = 7
Measuring Zero!
Classic 2-Parameter Counting Experiment
N ~ Poisson(s+b)
B ~ Poisson(b)
Wanted:
A statement like
s < u(N,B) @ 90% CL
Measuring Zero!
In 1984, no exact solution existed in the particle physics literature!
But surely it must have been solved by statisticians.
Alas, from Kendall and Stuart I learnt that calculating exact confidence intervals is “a matter of very considerable difficulty”.
Measuring Zero!
Exact in what way?
Over the ensemble of statements of the form
s ∈ [0, u),
at least 90% should be true,
whatever the true value of the signal s AND whatever the true value of the background parameter b.
Blame… Neyman (1937)
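Neyman's requirement can be checked by brute force: fix (s, b), simulate many (N, B) pairs, and count how often s < u(N, B). The sketch below assumes Python with NumPy; `coverage` and `naive_upper_limit` are hypothetical helpers, and the naive rule (a Gaussian approximation) is chosen only to have something to test, not a method from the talk.

```python
import math
import numpy as np

def coverage(upper_limit, s, b, n_trials=20_000, seed=1):
    """Monte Carlo fraction of statements s in [0, u(N, B)) that are true."""
    rng = np.random.default_rng(seed)
    n_obs = rng.poisson(s + b, size=n_trials)   # N ~ Poisson(s + b)
    b_obs = rng.poisson(b, size=n_trials)       # B ~ Poisson(b)
    limits = {}                                 # cache u(N, B) per distinct outcome
    hits = 0
    for n, bb in zip(n_obs.tolist(), b_obs.tolist()):
        if (n, bb) not in limits:
            limits[(n, bb)] = upper_limit(n, bb)
        hits += int(s < limits[(n, bb)])        # statement "s in [0, u)" is true
    return hits / n_trials

def naive_upper_limit(n, b_count):
    """Hypothetical rule: N - B plus 1.2816 Gaussian sigmas (90% one-sided)."""
    return max(n - b_count, 0) + 1.2816 * math.sqrt(n + b_count)

# "Exact" means coverage >= 0.9 at EVERY (s, b), so report the worst point of a small grid.
worst = min(
    (coverage(naive_upper_limit, s, b, n_trials=5_000), s, b)
    for s in (0.5, 2.0, 5.0)
    for b in (1.0, 5.0, 10.0)
)
print("lowest coverage %.3f at s = %.1f, b = %.1f" % worst)
```

A grid scan can only expose failures; proving coverage for all (s, b) is what makes the exact construction so difficult.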
“Keep it simple, but no simpler”
Albert Einstein
Bayesian @ the Gate (1984)
Solution:
p(N,B|s,b) = Poisson(s+b) Poisson(b)   the likelihood
p(s,b) = uniform(s,b)   the prior
Compute the posterior density
p(s,b|N,B) = p(N,B|s,b) p(s,b) / p(N,B)
Marginalize over b
p(s|N,B) = ∫ p(s,b|N,B) db
This reasoning was compelling to me then, and is much more so now!
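With flat priors the integral over b can be done in closed form: expanding (s+b)^N binomially turns p(s|N,B) into a mixture of Gamma(k+1, 1) densities with weights proportional to C(N,k) k! (N-k+B)! / 2^(N-k+B+1). The sketch below (plain Python, names hypothetical, not the original 1984 code) uses that mixture to find the 90% posterior upper bound for the CRISP counts N = 3, B = 7, assuming equal exposure for both counts as in the model above. Note this is a Bayesian credible bound, not a Neyman confidence bound.

```python
from math import comb, exp, factorial

def posterior_weights(n, b_count):
    """Weights w_k of p(s|N,B) = sum_k w_k Gamma(s; k+1, 1), flat priors on s and b."""
    raw = [comb(n, k) * factorial(k) * factorial(n - k + b_count)
           / 2 ** (n - k + b_count + 1) for k in range(n + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def posterior_cdf(u, n, b_count):
    """P(s <= u | N, B), using the Gamma(k+1, 1) CDF 1 - e^-u sum_{j<=k} u^j / j!."""
    cdf = 0.0
    for k, w in enumerate(posterior_weights(n, b_count)):
        tail = exp(-u) * sum(u ** j / factorial(j) for j in range(k + 1))
        cdf += w * (1.0 - tail)
    return cdf

def upper_limit(n, b_count, cl=0.9, tol=1e-10):
    """Solve posterior_cdf(u) = cl by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while posterior_cdf(hi, n, b_count) < cl:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if posterior_cdf(mid, n, b_count) < cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

u = upper_limit(3, 7)
print(f"s < {u:.2f} with 90% posterior probability")
```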
Particle Physics Data
proton + anti-proton
-> positron (e+) + neutrino (ν) + Jet1 + Jet2 + Jet3 + Jet4
This event “lives” in 3 (positron) + 2 (neutrino) + 3 x 4 (jets) = 17 dimensions.
Particle Physics Data
CDF/Dzero Discovery of top quark (1995)
[Figure: p + pbar -> t + tbar -> lepton + jets; data red, signal green, background blue and magenta]
Dzero: 17-D -> 2-D
But that was then, and now is now!
Today we have 2 GHz laptops, with 2 GB of memory!
It is fun to deploy huge, sometimes unreliable, computational resources, that is, brains, to reduce the dimensionality of data.
But perhaps it is now feasible to work directly in the original high-dimensional space, using hardware!
Signal/Background Discrimination
The optimal solution is to compute
p(S|x) = p(x|S) p(S) / [p(x|S) p(S) + p(x|B) p(B)]
Every signal/background discrimination method is ultimately an algorithm to approximate this solution, or a mapping thereof.
Therefore, if a method is already at the Bayes limit, no other method, however sophisticated, can do better!
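To make the Bayes limit concrete, the sketch below evaluates p(S|x) exactly for two hypothetical 1-D Gaussian densities (signal centred at +1, background at -1, unit width); these densities are an illustrative assumption, not physics data. For this choice the log-odds is exactly 2x, so p(S|x) = 1/(1 + e^(-2x)) when p(S) = p(B).

```python
import math

def gauss(x, mu, sigma):
    """Normal density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bayes_discriminant(x, p_signal=0.5, mu_s=1.0, mu_b=-1.0, sigma=1.0):
    """p(S|x) = p(x|S) p(S) / [p(x|S) p(S) + p(x|B) p(B)] for Gaussian p(x|S), p(x|B)."""
    num = gauss(x, mu_s, sigma) * p_signal
    den = num + gauss(x, mu_b, sigma) * (1.0 - p_signal)
    return num / den

for x in (-2.0, 0.0, 2.0):
    print(x, round(bayes_discriminant(x), 3))
```

A tiny prior p(S), as in a rare-signal search, pushes p(S|x) down everywhere; any classifier that reproduces this function (or a monotonic mapping of it) is already optimal.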
Signal/Background Discrimination
Given: D = {x, y}, with x = {x1,…,xN}, y = {y1,…,yN}, of N training examples
Infer: a discriminant function f(x, w), with parameters w
p(w|x, y) = p(x, y|w) p(w) / p(x, y)
= p(y|x, w) p(x|w) p(w) / [p(y|x) p(x)]
= p(y|x, w) p(w) / p(y|x), assuming p(x|w) -> p(x)
Signal/Background Discrimination
A typical likelihood for classification:
$$p(y \mid x, w) = \prod_{i=1}^{N} f(x_i, w)^{y_i}\,\big[1 - f(x_i, w)\big]^{1 - y_i}$$
where y = 0 for background events and y = 1 for signal events.
If f(x, w) is flexible enough, then maximizing p(y|x, w) with respect to w yields f = p(S|x), asymptotically.
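A quick numerical check of the claim, assuming NumPy: here a two-parameter logistic model f(x, w) = 1/(1 + e^-(w0 + w1 x)) stands in for a neural network. For the hypothetical Gaussian classes used below (signal N(+1, 1), background N(-1, 1), equal numbers) this family already contains p(S|x) = 1/(1 + e^(-2x)), so maximizing the Bernoulli likelihood by Newton's method should return w close to (0, 2).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
# Hypothetical training set: y = 1 for signal, y = 0 for background
x = np.concatenate([rng.normal(1.0, 1.0, n), rng.normal(-1.0, 1.0, n)])
y = np.concatenate([np.ones(n), np.zeros(n)])

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(x, y, n_iter=25):
    """Maximize sum_i [y_i log f_i + (1 - y_i) log(1 - f_i)] by Newton's method."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.zeros(2)
    for _ in range(n_iter):
        f = sigmoid(X @ w)
        grad = X.T @ (y - f)                           # gradient of the log-likelihood
        hess = X.T @ (X * (f * (1.0 - f))[:, None])    # minus the Hessian
        w += np.linalg.solve(hess, grad)
    return w

w = fit_logistic(x, y)
print("fitted (w0, w1):", w, " exact Bayes discriminant: (0, 2)")
```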
Signal/Background Discrimination
However, in a full Bayesian calculation one usually averages with respect to the posterior density
y(x) = ∫ f(x, w) p(w|D) dw
Questions:
1. Do suitably flexible functions f(x, w) exist?
2. Is there a feasible way to do the integral?
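Given points w_k drawn from p(w|D), the integral becomes the sample average (1/M) Σ_k f(x, w_k). The sketch below assumes NumPy and a hypothetical one-weight posterior w ~ N(2, 1), and compares that average with simply plugging in the posterior mean. The averaged prediction is pulled toward 1/2, reflecting the uncertainty in w.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
# Hypothetical posterior p(w|D) for a single weight: w ~ N(2, 1)
w_samples = rng.normal(2.0, 1.0, size=50_000)

x = 2.0
y_average = sigmoid(w_samples * x).mean()   # y(x) ≈ (1/M) sum_k f(x, w_k)
y_plugin = sigmoid(w_samples.mean() * x)    # f(x, w) at a single point estimate
print(f"posterior average: {y_average:.3f}, plug-in: {y_plugin:.3f}")
```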
Answer 1: Hilbert’s 13th Problem!
Prove that the following is impossible:
y(x,y,z) = F( A(x), B(y), C(z) )
In 1957, Kolmogorov proved the contrary conjecture:
y(x1,..,xn) = F( f1(x1),…,fn(xn) )
I’ll call such functions, F, Kolmogorov functions
Kolmogorov Functions
$$f(x, w) = b + \sum_{j=1}^{H} v_j \tanh\Big(a_j + \sum_{i=1}^{P} u_{ij}\, x_i\Big)$$
$$n(x, w) = \frac{1}{1 + \exp[-f(x, w)]}$$
[Figure: network diagram with inputs x1, x2; hidden-layer parameters u, a; output-layer parameters v, b; output n(x, w)]
A neural network is an example of a Kolmogorov function, that is, a function capable of approximating arbitrary mappings f: R^N -> U. The parameters w = (u, a, v, b) are called weights.
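The network above is a few lines of NumPy. The sketch below evaluates n(x, w) for a (P, H, 1) network with random, hypothetical weights, and counts the weights: P·H for u, H for a, H for v, and 1 for b. For the (14, 40, 1) network used later this gives 641, matching the "641-D parameter space" quoted there.

```python
import numpy as np

def n_params(p, h):
    """Number of weights in a (P, H, 1) network: u (P*H) + a (H) + v (H) + b (1)."""
    return p * h + h + h + 1

def network_output(x, u, a, v, b):
    """n(x, w) = 1 / (1 + exp[-f]), f = b + sum_j v_j tanh(a_j + sum_i u_ij x_i)."""
    hidden = np.tanh(a + x @ u)   # x: (events, P), u: (P, H), a: (H,)
    f = b + hidden @ v            # v: (H,), b: scalar
    return 1.0 / (1.0 + np.exp(-f))

rng = np.random.default_rng(0)
P, H = 14, 40
u, a = rng.normal(size=(P, H)), rng.normal(size=H)
v, b = rng.normal(size=H), rng.normal()
x = rng.normal(size=(5, P))       # five hypothetical 14-D events
print(network_output(x, u, a, v, b))
print(n_params(14, 40), n_params(1, 15))
```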
Answer 2: Use Hybrid MCMC
Computational Method
Generate a Markov chain (MC) of N points {w} drawn from the posterior density p(w|D) and average over the last M points.
Each point corresponds to a network.
Software: Flexible Bayesian Modeling by Radford Neal
http://www.cs.utoronto.ca/~radford/fbm.software.html
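The recipe can be sketched with a bare-bones hybrid (Hamiltonian) Monte Carlo sampler in NumPy. This is not FBM: to stay short, a two-parameter logistic model replaces the network, and the data, Gaussian prior width, step size, and trajectory length are all illustrative assumptions. Each iteration draws a fresh momentum, follows a leapfrog trajectory, and applies a Metropolis test on the total energy; y(x) is then averaged over the last M points of the chain.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 250  # hypothetical data: signal N(+1, 1) with y = 1, background N(-1, 1) with y = 0
x = np.concatenate([rng.normal(1.0, 1.0, n), rng.normal(-1.0, 1.0, n)])
y = np.concatenate([np.ones(n), np.zeros(n)])
X = np.column_stack([np.ones_like(x), x])    # f(x, w) = sigmoid(w0 + w1 x)
PRIOR_SD = 10.0                              # assumed Gaussian prior width per weight

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_post(w):
    t = X @ w   # Bernoulli log-likelihood y t - log(1 + e^t), plus Gaussian log-prior
    return np.sum(y * t - np.logaddexp(0.0, t)) - 0.5 * np.sum(w**2) / PRIOR_SD**2

def grad_log_post(w):
    return X.T @ (y - sigmoid(X @ w)) - w / PRIOR_SD**2

def hmc(w, n_samples, step=0.02, n_leapfrog=20):
    chain, accepted = [], 0
    for _ in range(n_samples):
        p = rng.normal(size=w.shape)                  # fresh momentum
        w_new, p_new = w.copy(), p.copy()
        p_new += 0.5 * step * grad_log_post(w_new)    # leapfrog: half momentum step
        for i in range(n_leapfrog):
            w_new += step * p_new                     # full position step
            if i < n_leapfrog - 1:
                p_new += step * grad_log_post(w_new)
        p_new += 0.5 * step * grad_log_post(w_new)    # final half momentum step
        h_old = -log_post(w) + 0.5 * p @ p            # energy = potential + kinetic
        h_new = -log_post(w_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:     # Metropolis test
            w, accepted = w_new, accepted + 1
        chain.append(w.copy())
    return np.array(chain), accepted / n_samples

chain, acc_rate = hmc(np.zeros(2), n_samples=500)
last = chain[-100:]                                   # keep the last M = 100 points

def y_bayes(x_new):
    """y(x) ≈ (1/M) sum_k f(x, w_k) over the retained points."""
    return sigmoid(last[:, :1] + last[:, 1:] * np.atleast_1d(x_new)).mean(axis=0)

print("acceptance rate:", acc_rate)
print("y(x) at x = -2, 0, 2:", y_bayes(np.array([-2.0, 0.0, 2.0])))
```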
A 1-D Example
Signal: p+pbar -> t q b
Background: p+pbar -> W b b
NN Model Class: (1, 15, 1)
MCMC: 500 tqb + Wbb events. Use last 20 networks in a MC chain of 500.
[Figure: distributions of x for tqb and Wbb]
A 1-D Example
[Figure: p(S|x) versus x]
Dots: p(S|x) = HS/(HS+HB), where HS, HB are 1-D histograms
Curves: individual NNs n(x, w_k)
Black curve: < n(x, w) >
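The "dots" are a histogram estimate of the Bayes discriminant. The sketch below (NumPy; Gaussian samples are a hypothetical stand-in for the tqb and Wbb variable x) fills HS and HB with the same bins and compares HS/(HS+HB) with the exact p(S|x) for those densities. Equal sample sizes make p(S) = p(B) implicit in the ratio.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical stand-ins for the signal and background distributions of x
x_sig = rng.normal(1.0, 1.0, 100_000)
x_bkg = rng.normal(-1.0, 1.0, 100_000)

edges = np.linspace(-3.0, 3.0, 25)                # 24 shared bins
h_s, _ = np.histogram(x_sig, bins=edges)          # HS
h_b, _ = np.histogram(x_bkg, bins=edges)          # HB
centers = 0.5 * (edges[:-1] + edges[1:])

with np.errstate(invalid="ignore"):               # an empty bin would give 0/0 = nan
    p_hist = h_s / (h_s + h_b)                    # p(S|x) ≈ HS / (HS + HB)

exact = 1.0 / (1.0 + np.exp(-2.0 * centers))      # exact p(S|x) for these Gaussians
for c, ph, pe in zip(centers, p_hist, exact):
    print(f"x = {c:+.2f}  histogram {ph:.3f}  exact {pe:.3f}")
```

In 1-D this works well; in 14-D the same binning would leave almost every bin empty, which is why a smooth, flexible function such as the network is needed.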
A 14-D Example (Finding Susy!)
Transverse momentum spectra
Signal: black curve
Signal/Noise: 1/100,000
A 14-D Example (Finding Susy!)
Missing transverse momentum spectrum
(caused by escape of neutrinos and Susy particles)
Variable count
4 x (ET, η, φ)
+ (ET, φ)
= 14
A 14-D Example (Finding Susy!)
[Figure panels: Likelihood, Prior]
Signal: 250 p+pbar -> top + anti-top (MC) events
Background: 250 p+pbar -> gluino gluino (MC) events
NN Model Class: (14, 40, 1) (641-D parameter space!)
MCMC: Use last 100 networks in a Markov chain of 10,000, skipping every 20.
But does it Work?
Signal to noise can reach 1/1 with an acceptable signal strength
But does it Work?
Let d(x) = N p(x|S) + N p(x|B) be the density of the data, containing 2N events, assuming, for simplicity, p(S) = p(B).
A properly trained classifier y(x) approximates
p(S|x) = p(x|S)/[p(x|S) + p(x|B)]
Therefore, since d(x) p(S|x) = N p(x|S), if the signal and background events are weighted with y(x), we should recover the signal density.
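The weighting argument can be checked numerically. In the sketch below (NumPy; hypothetical Gaussian data with N events of each class, and the exact p(S|x) standing in for a trained y(x)), every event in the mixed sample is weighted by y(x). The weights should sum to about N, and the weighted sample should reproduce the signal distribution, here with mean +1 and variance 1, even though the unweighted mixture has mean near 0.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50_000
# Hypothetical data with density d(x) = N p(x|S) + N p(x|B), 2N events in all
data = np.concatenate([rng.normal(1.0, 1.0, N), rng.normal(-1.0, 1.0, N)])

def y(x):
    """Exact p(S|x) for these densities; stands in for a trained classifier."""
    return 1.0 / (1.0 + np.exp(-2.0 * x))

weights = y(data)                                  # weight every event, signal or background
w_mean = np.average(data, weights=weights)
w_var = np.average((data - w_mean) ** 2, weights=weights)
print("sum of weights / N:", weights.sum() / N)    # should be close to 1
print("weighted mean, variance:", w_mean, w_var)   # should be close to the signal's 1, 1
print("unweighted mean:", data.mean())             # mixture mean, close to 0
```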
But does it Work?
Amazingly well!
Some Open Issues
Why does this insane function p(w1,…,w641|x1,…,x500) behave so well? 641 parameters > 500 events!
How should one verify that an n-D (n ~ 14) swarm of simulated background events matches the n-D swarm of observed events (in the background region)?
How should one verify that y(x) is indeed a reasonable approximation to the Bayes discriminant, p(S|x)?
Summary
Bayesian methods have been, and are being, used with considerable success by particle physicists. Happily, the frequentist/Bayesian Cold War is abating!
The application of Bayesian methods to highly flexible functions, e.g., neural networks, is very promising and should be broadly applicable.
Needed: A powerful way to compare high-dimensional swarms of points.
Agree, or not agree, that is the question!