Introduction to Unfolding in High Energy Physics
Mikael Kuusela
Institute of Mathematics, EPFL
Advanced Scientific Computing Workshop, ETH Zurich
July 15, 2014
Mikael Kuusela (EPFL) Unfolding in HEP July 15, 2014 1 / 66
Outline

1 Introduction
2 Basic unfolding methodology
  Maximum likelihood estimation
  Regularized frequentist techniques
  Bayesian unfolding
3 Challenges in unfolding
  Choice of the regularization strength
  Uncertainty quantification
  MC dependence in the smearing matrix
4 Unfolding with RooUnfold
5 Conclusions
The unfolding problem
Unfolding refers to the problem of estimating the particle-level distribution of some physical quantity of interest on the basis of observations smeared by an imperfect measurement device
What would the distribution look like when measured with a device having perfect experimental resolution?
Cf. deconvolution in optics, image reconstruction in medical imaging
[Figures: (a) true intensity and (b) smeared intensity as functions of the physical observable; folding maps the true density to the smeared density, and unfolding maps back]
Why unfold?
Unfolding is usually done to achieve one or more of the following goals:
1 Comparison of the measurement with future theories
2 Comparison of experiments with different responses
3 Input to a subsequent analysis
4 Exploratory data analysis
Unfolding is most often used in measurement analyses (as opposed to discovery analyses): QCD, electroweak, top, forward physics, ...
Examples of unfolding in LHC data analysis
Inclusive jet cross section
[Figure: CMS inclusive jet double-differential cross section d²σ/dp_T d|y| (pb/GeV) vs. jet p_T (GeV), √s = 7 TeV, L = 5.0 fb⁻¹, anti-kT R = 0.7, μ_R = μ_F = p_T, in five |y| bins from |y| < 0.5 to 2.0 < |y| < 2.5 scaled by powers of 10, compared to NNPDF2.1 ⊗ NP corrections]

W boson cross section

Hadronic event shape
[Figure: CMS normalized distribution (1/N) dN/d ln τ_C, √s = 7 TeV, L = 3.2 pb⁻¹, anti-kT R = 0.5 jets with p_T > 30 GeV/c and 90 GeV/c < p_T,1 < 125 GeV/c, data compared to Pythia6, Pythia8, Herwig++, MadGraph+Pythia6 and Alpgen+Pythia6]

Charged particle multiplicity
Problem formulation
Notation:
  λ ∈ R^p_+ : bin means of the true histogram
  x ∈ N^p_0 : bin counts of the true histogram
  μ ∈ R^n_+ : bin means of the smeared histogram
  y ∈ N^n_0 : bin counts of the smeared histogram

Assume that:
1 The true counts are independent and Poisson distributed: x|λ ∼ Poisson(λ), with the components x_i mutually independent given λ
2 The propagation of events to neighboring bins is multinomial conditional on the true counts and independent for each true bin

It follows that the smeared counts are also independent and Poisson distributed: y|λ ∼ Poisson(Kλ), with the components y_i mutually independent given λ
Problem formulation
Here the elements of the smearing matrix K ∈ Rn×p are given by
Kij = P(smeared event in bin i | true event in bin j)
and assumed to be known
The unfolding problem:
Problem statement
Given the smeared observations y and the Poisson regression model
y|λ ∼ Poisson(Kλ),
what can be said about the means λ of the true histogram?
The problem here is that typically K is an ill-conditioned matrix
Unfolding is an ill-posed inverse problem
The unfolding problem is typically ill-posed in the sense that the (pseudo)inverse of K is very sensitive to small perturbations in the data
From y|λ ∼ Poisson(Kλ) we have that μ = Kλ
We could naïvely estimate λ̂ = K†μ̂ = K†y
But this can lead to catastrophic results!
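The severity of this ill-conditioning is easy to reproduce numerically. The sketch below (an illustration with an assumed Gaussian-resolution toy smearing matrix, not the exact example used on these slides) builds K for a 40-bin histogram and shows that its condition number is astronomical, which is why applying K† to noisy counts amplifies the Poisson fluctuations enormously:

```python
import numpy as np

def gaussian_smearing_matrix(edges, sigma):
    """Toy smearing matrix: a true event at the center of bin j migrates to
    smeared bin i with Gaussian probability of width sigma (columns sum to 1)."""
    centers = 0.5 * (edges[:-1] + edges[1:])
    K = np.exp(-0.5 * ((centers[:, None] - centers[None, :]) / sigma) ** 2)
    return K / K.sum(axis=0)

edges = np.linspace(-5.0, 5.0, 41)           # 40 bins of width 0.25
K = gaussian_smearing_matrix(edges, sigma=1.0)
print(np.linalg.cond(K))                     # enormous: tiny perturbations in y explode
```

With a resolution this much wider than the bins, the smallest singular values of K fall below machine precision, so the condition number reported is essentially infinite.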
Demonstration of the ill-posedness
[Figures: smeared histogram and true histogram of the test setup on [−6, 6]]
Demonstration of the ill-posedness
[Figure: pseudoinverse estimate compared to the true histogram; the estimate oscillates with amplitude of order 10¹³]
The likelihood function
The likelihood function in unfolding is:

L(λ) = p(y|λ) = ∏_{i=1}^n p(y_i|λ) = ∏_{i=1}^n [ (∑_{j=1}^p K_ij λ_j)^{y_i} / y_i! ] exp(−∑_{j=1}^p K_ij λ_j),  λ ∈ R^p_+

This function uses our Poisson regression model to link the observations y with the unknown λ
The likelihood function plays a key role in all sensible unfolding methods
In most statistical problems, the maximum of the likelihood, or equivalently the maximum of the log-likelihood, provides a good estimate of the unknown
In ill-posed problems, this is usually not the case, but the maximum likelihood solution still provides a good starting point
Maximum likelihood estimation
Any histogram that maximizes the log-likelihood of the unfolding problem is called a maximum likelihood estimator λ̂_MLE of λ
Hence, we want to solve:

max_{λ ∈ R^p_+} log p(y|λ) = ∑_{i=1}^n [ y_i log(∑_{j=1}^p K_ij λ_j) − ∑_{j=1}^p K_ij λ_j ] + const
Maximum likelihood estimation
Theorem (Vardi et al. (1985))
Assume K_ij > 0 and y ≠ 0. Then the following hold for the log-likelihood log p(y|λ) of the unfolding problem:
1 The log-likelihood has a maximum.
2 The log-likelihood is concave and hence all the maxima are global maxima.
3 The maximum is unique if and only if the columns of K are linearly independent.

So a unique MLE exists when the columns of K are linearly independent, but how do we find it?
Maximum likelihood estimation
Proposition
Let K be an invertible square matrix and assume that λ̂ = K⁻¹y ≥ 0. Then λ̂ is the MLE of λ.

That is, matrix inversion gives us the MLE if K is invertible and the resulting estimate is nonnegative
Note that this result is more restrictive than it may seem:
  K is often non-square
  Even if K were square, it is often not invertible
  And even if K were invertible, K⁻¹y often contains negative values
Is there a general recipe for finding the MLE?
Maximum likelihood estimation
The MLE can always be found computationally by using the expectation-maximization (EM) algorithm (Dempster et al. (1977))
This is a widely used iterative algorithm for finding maximum likelihood solutions in problems that can be seen as containing incomplete observations
Starting from some initial value λ^(0) > 0, the EM iteration for unfolding is given by:

λ_j^(k+1) = (λ_j^(k) / ∑_{i=1}^n K_ij) · ∑_{i=1}^n [ K_ij y_i / ∑_{l=1}^p K_il λ_l^(k) ],  j = 1, …, p

The convergence of this iteration to an MLE (i.e. λ^(k) → λ̂_MLE as k → ∞) was proved by Vardi et al. (1985)
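As a concrete sketch (a minimal NumPy illustration written for these notes, not the RooUnfold implementation), the multiplicative update above takes only a few lines; K, y and the starting point are assumed inputs:

```python
import numpy as np

def dagostini_em(K, y, n_iter, lam0=None):
    """EM / D'Agostini iteration for the Poisson regression model y ~ Poisson(K lam).
    When stopped early, n_iter acts as the regularization parameter."""
    n, p = K.shape
    lam = np.full(p, y.sum() / p) if lam0 is None else np.asarray(lam0, dtype=float).copy()
    eff = K.sum(axis=0)                      # sum_i K_ij (efficiency of true bin j)
    for _ in range(n_iter):
        mu = K @ lam                         # current smeared means sum_l K_il lam_l
        lam = lam / eff * (K.T @ (y / mu))   # the multiplicative EM update above
    return lam

# On noise-free data y = K lam_true with an invertible K and a nonnegative
# inverse, the unique MLE is lam_true itself and the iteration converges to it
K = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.8, 0.2],
              [0.0, 0.1, 0.8]])
lam_true = np.array([100.0, 200.0, 300.0])
est = dagostini_em(K, K @ lam_true, n_iter=5000)
print(est)                                   # approaches [100, 200, 300]
```

Note that lam_true is an exact fixed point of the update: with y = Kλ_true the ratio y/μ is identically 1 and K^T 1 cancels against the efficiency factor.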
Maximum likelihood estimation
The EM iteration for finding the MLE in Poisson regression problems has been rediscovered many times in different fields:
  Optics: Richardson (1972)
  Astronomy: Lucy (1974)
  Tomography: Shepp and Vardi (1982); Lange and Carson (1984); Vardi et al. (1985)
  HEP: Kondor (1983); Multhei and Schorr (1987); Multhei et al. (1987, 1989); D'Agostini (1995)
In modern use, the algorithm is most often called D'Agostini iteration in HEP and Lucy–Richardson deconvolution in astronomy and optics
In HEP, the name "Bayesian unfolding" is also used, but this is an unfortunate misnomer:
  D'Agostini iteration is a fully frequentist technique for finding the MLE
  There is nothing Bayesian about it!
D’Agostini demo, k = 0
[Figures: smeared histogram showing μ, y and Kλ^(k); true histogram showing λ and λ^(k)]
D’Agostini demo, k = 100
[Figures: smeared histogram showing μ, y and Kλ^(k); true histogram showing λ and λ^(k)]
D’Agostini demo, k = 10000
[Figures: smeared histogram showing μ, y and Kλ^(k); true histogram showing λ and λ^(k), which has begun to develop large fluctuations]
D’Agostini demo, k = 100000
[Figures: smeared histogram showing μ, y and Kλ^(k); true histogram showing λ and λ^(k), whose fluctuations have grown further]
Regularization by early stopping of the EM iteration
We have seen that unfortunately the MLE itself is often useless:
  Due to the ill-posedness of the problem, it exhibits large, unphysical fluctuations
  In other words, the likelihood function alone does not contain enough information to constrain the solution
As the EM iteration proceeds, the solutions will typically first improve but will start to degrade at some point
This is because the algorithm will start overfitting to the Poisson fluctuations in y
This behavior can be exploited by stopping the iteration before unphysical features start to appear
The number of iterations k now becomes a regularization parameter that controls the trade-off between fitting the data and taming unphysical features
D’Agostini demo, k = 100
[Figures: smeared histogram showing μ, y and Kλ^(k); true histogram showing λ and λ^(k)]
Penalized maximum likelihood estimation
Early stopping of the EM iteration seems a bit ad hoc
Is there a more principled way of finding good solutions?
Ideally we would like to find a solution that fits the data but at the same time seems physically plausible
Let's consider a penalized maximum likelihood problem:

max_{λ ∈ R^p_+} F(λ) = log p(y|λ) − δ P(λ)

Here:
  P(λ) is a penalty function which obtains large values for physically implausible solutions
  δ > 0 is a regularization parameter which controls the balance between maximizing the likelihood and minimizing the penalty
Typically P(λ) is a measure of the curvature of the solution, i.e., it penalizes large oscillations
From penalized likelihood to Tikhonov regularization
To simplify this optimization problem, we use a Gaussian approximation of the Poisson likelihood

y|λ ∼ Poisson(Kλ) ≈ N(Kλ, C),  where C = diag(y)

Hence the objective function becomes:

F(λ) = log p(y|λ) − δ P(λ)
     = ∑_{i=1}^n [ y_i log(∑_{j=1}^p K_ij λ_j) − ∑_{j=1}^p K_ij λ_j ] − δ P(λ) + const
     ≈ −(1/2)(y − Kλ)^T C⁻¹ (y − Kλ) − δ P(λ) + const
From penalized likelihood to Tikhonov regularization
We furthermore drop the positivity constraint and absorb the factor 1/2 into the penalty to obtain

λ̂ = arg max_{λ ∈ R^p} −(y − Kλ)^T C⁻¹ (y − Kλ) − δ P(λ)
   = arg min_{λ ∈ R^p} (y − Kλ)^T C⁻¹ (y − Kλ) + δ P(λ)

We see that we have ended up with a penalized χ² problem
This is typically called (generalized) Tikhonov regularization
How to choose the penalty?
The penalty term should reflect the analyst's a priori understanding of the desired solution
Common choices include:
  Norm of the solution: P(λ) = ‖λ‖²
  Curvature of the solution: P(λ) = ‖Lλ‖², where L is a discretized 2nd derivative operator
  "SVD" unfolding (Hocker and Kartvelishvili, 1996): P(λ) = ‖L [λ_1/λ_1^MC, λ_2/λ_2^MC, …, λ_p/λ_p^MC]^T‖², where λ^MC is a MC prediction for λ
  TUnfold¹ (Schmitt, 2012): P(λ) = ‖L(λ − λ^MC)‖²

¹Also more general penalty terms are allowed in TUnfold
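For concreteness, the discretized 2nd derivative operator L appearing in these penalties can be built as follows (one common convention, chosen here for illustration; implementations differ in how they treat the boundary bins):

```python
import numpy as np

def second_derivative_operator(p):
    """(p-2) x p matrix with (L lam)_k = lam_k - 2 lam_{k+1} + lam_{k+2},
    so ||L lam||^2 measures the total discrete curvature of the histogram."""
    L = np.zeros((p - 2, p))
    for k in range(p - 2):
        L[k, k:k + 3] = [1.0, -2.0, 1.0]
    return L

L = second_derivative_operator(5)
print(L @ np.array([1.0, 2.0, 3.0, 4.0, 5.0]))   # a linear spectrum has zero curvature
```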
Least squares estimation with the pseudoinverse
Consider the least squares problem

min_{x ∈ R^p} ‖Ax − y‖²,

where A ∈ R^{n×p}, x ∈ R^p and y ∈ R^n
This problem always has a solution, but it may not be unique
A solution is always given by the Moore–Penrose pseudoinverse of A:

x̂_LS = A†y

When there are multiple solutions, the pseudoinverse gives the one with the smallest norm
When A has full column rank, the solution is unique
In this special case, the pseudoinverse is given by A† = (A^T A)⁻¹ A^T
Hence, the least squares solution is: x̂_LS = (A^T A)⁻¹ A^T y
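These properties are easy to check numerically (a small NumPy sanity check with arbitrary illustrative matrices, not data from the slides):

```python
import numpy as np

# Rank-deficient case: more unknowns than equations, infinitely many solutions;
# the pseudoinverse picks the minimum-norm one, which lstsq also returns
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
y = np.array([1.0, 1.0])
x_pinv = np.linalg.pinv(A) @ y
x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]
assert np.allclose(x_pinv, x_lstsq)

# Full-column-rank case: the pseudoinverse reduces to (A^T A)^{-1} A^T
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
assert np.allclose(np.linalg.pinv(B), np.linalg.inv(B.T @ B) @ B.T)
```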
Finding the Tikhonov regularized solution
We will now find an explicit form of the Tikhonov regularized estimator

λ̂ = arg min_{λ ∈ R^p} (y − Kλ)^T C⁻¹ (y − Kλ) + δ‖Lλ‖²

by rewriting this as a least squares problem
  This approach also easily generalizes to penalty terms involving λ^MC
Let us rewrite:

C⁻¹ = diag(1/y_1, …, 1/y_n) = diag(1/√y_1, …, 1/√y_n) · diag(1/√y_1, …, 1/√y_n) = AA = A^T A,  where A := diag(1/√y_1, …, 1/√y_n)

Defining ỹ := Ay and K̃ := AK, our optimization problem becomes

λ̂ = arg min_{λ ∈ R^p} (ỹ − K̃λ)^T (ỹ − K̃λ) + δ‖Lλ‖²
Finding the Tikhonov regularized solution
We can rewrite the objective function as follows:

(ỹ − K̃λ)^T (ỹ − K̃λ) + δ‖Lλ‖² = ‖K̃λ − ỹ‖² + ‖√δ Lλ‖² = ‖ [K̃λ − ỹ ; √δ Lλ] ‖² = ‖ [K̃ ; √δ L] λ − [ỹ ; 0] ‖²,

where [· ; ·] denotes vertical stacking
Here we recognize a least squares problem, so a minimizer is given by

λ̂ = [K̃ ; √δ L]† [ỹ ; 0]
Finding the Tikhonov regularized solution
Assuming that ker(K) ∩ ker(L) = {0}, the minimizer is unique and can be simplified as follows:

λ̂ = [K̃ ; √δ L]† [ỹ ; 0]
  = ( [K̃ ; √δ L]^T [K̃ ; √δ L] )⁻¹ [K̃ ; √δ L]^T [ỹ ; 0]
  = ( K̃^T K̃ + δ L^T L )⁻¹ K̃^T ỹ
  = ( K^T C⁻¹ K + δ L^T L )⁻¹ K^T C⁻¹ y

Hence we have obtained an explicit, closed-form solution for the Tikhonov regularization problem
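As a sketch (plain NumPy with an assumed toy K, y and L, not code from the lecture), the closed form can be implemented in one line and cross-checked against the stacked least squares formulation it was derived from:

```python
import numpy as np

def tikhonov_unfold(K, y, L, delta):
    """Generalized Tikhonov estimate (K^T C^-1 K + delta L^T L)^-1 K^T C^-1 y
    with C = diag(y); assumes all observed counts y_i > 0."""
    Cinv = np.diag(1.0 / y)
    return np.linalg.solve(K.T @ Cinv @ K + delta * (L.T @ L), K.T @ Cinv @ y)

# Cross-check against the stacked form [A K; sqrt(delta) L] lam ~ [A y; 0]
K = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.8, 0.2],
              [0.0, 0.1, 0.8]])
y = np.array([120.0, 260.0, 310.0])
L = np.array([[1.0, -2.0, 1.0]])             # discrete curvature penalty
A = np.diag(1.0 / np.sqrt(y))
stacked = np.vstack([A @ K, np.sqrt(10.0) * L])
rhs = np.concatenate([A @ y, np.zeros(1)])
direct = tikhonov_unfold(K, y, L, delta=10.0)
assert np.allclose(direct, np.linalg.lstsq(stacked, rhs, rcond=None)[0])
```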
Demonstration of Tikhonov regularization, P(λ) = ‖λ‖2
[Figure: Tikhonov regularized estimate compared to the true histogram]
Bayesian unfolding
In Bayesian unfolding, the inferences about λ are based on the posterior distribution p(λ|y)
This is obtained using Bayes' rule:

p(λ|y) = p(y|λ) p(λ) / p(y) = p(y|λ) p(λ) / ∫_{R^p_+} p(y|λ′) p(λ′) dλ′,  λ ∈ R^p_+,

where the likelihood p(y|λ) is the same as earlier and p(λ) is a prior distribution for λ
The most common choices as a point estimator of λ are:
  The posterior mean: λ̂ = E[λ|y] = ∫_{R^p_+} λ p(λ|y) dλ
  The maximum a posteriori (MAP) estimator: λ̂ = arg max_{λ ∈ R^p_+} p(λ|y)
The width of the posterior distribution p(λ|y) can be used to quantify uncertainty regarding λ
But note that the interpretation of the resulting Bayesian credible intervals is different from frequentist confidence intervals
Regularization using the prior
In the Bayesian approach, the prior density p(λ) regularizes the otherwise ill-posed problem
It concentrates the probability mass of the posterior on physically plausible solutions
The prior is typically of the form

p(λ) ∝ exp(−δ P(λ)),  λ ∈ R^p_+,

where P(λ) is a function characterizing a priori plausible solutions and δ > 0 is a hyperparameter controlling the scale of the prior density
For example, choosing P(λ) = ‖Lλ‖², where L is a discretized 2nd derivative operator, leads to the positivity-constrained Gaussian smoothness prior

p(λ) ∝ exp(−δ‖Lλ‖²),  λ ∈ R^p_+
Connection between Bayesian unfolding and penalized MLE
Notice that when p(λ) ∝ exp(−δ P(λ)), the Bayesian MAP solution coincides with the penalized maximum likelihood estimator:

λ̂_MAP = arg max_{λ ∈ R^p_+} p(λ|y)
       = arg max_{λ ∈ R^p_+} log p(λ|y)
       = arg max_{λ ∈ R^p_+} [ log p(y|λ) + log p(λ) ]
       = arg max_{λ ∈ R^p_+} [ log p(y|λ) − δ P(λ) ]
       = λ̂_PMLE

So the penalty term δ P(λ) can either be interpreted as a Bayesian prior or as a frequentist regularization term
The Bayesian interpretation has the advantage that we can visualize the prior p(λ) by, e.g., drawing samples from it
A note about Bayesian computations
To be able to compute the posterior mean E[λ|y] or form the Bayesian credible intervals, we need to be able to evaluate the posterior

p(λ|y) = p(y|λ) p(λ) / ∫_{R^p_+} p(y|λ′) p(λ′) dλ′

But the denominator is an intractable high-dimensional integral...
Luckily, it turns out that it is possible to sample from the posterior without evaluating the denominator
The sample mean and sample quantiles can then be used to compute the posterior mean and the credible intervals
The class of algorithms that enable this are called Markov chain Monte Carlo (MCMC) samplers; they are based on a Markov chain whose equilibrium distribution is the posterior p(λ|y)
The single-component Metropolis–Hastings sampler of Saquib et al. (1998) is particularly well-suited for the unfolding problem and seems to also work well in practice
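The idea can be sketched with a generic component-wise random-walk Metropolis sampler (written for illustration with an assumed toy K, y and L; the actual Saquib et al. sampler uses cleverer component-wise proposals). Only posterior ratios appear, so the intractable normalizing integral is never needed:

```python
import numpy as np

def mh_unfold(K, y, L, delta, n_samples=500, step=5.0, seed=0):
    """Component-wise random-walk Metropolis for the unnormalized posterior
    p(lam|y) ∝ p(y|lam) exp(-delta ||L lam||^2) on lam >= 0."""
    rng = np.random.default_rng(seed)
    p = K.shape[1]
    lam = np.full(p, y.sum() / p)             # positive starting point

    def log_post(l):                          # log posterior up to a constant
        mu = K @ l
        return np.sum(y * np.log(mu) - mu) - delta * np.sum((L @ l) ** 2)

    lp = log_post(lam)
    chain = np.empty((n_samples, p))
    for s in range(n_samples):
        for j in range(p):                    # update one component at a time
            prop = lam.copy()
            prop[j] += step * rng.standard_normal()
            if prop[j] >= 0.0:                # proposals outside lam >= 0 are rejected
                lp_prop = log_post(prop)
                if np.log(rng.uniform()) < lp_prop - lp:
                    lam, lp = prop, lp_prop
        chain[s] = lam
    return chain

chain = mh_unfold(K=np.eye(3) * 0.9 + 0.05, y=np.array([100.0, 150.0, 120.0]),
                  L=np.array([[1.0, -2.0, 1.0]]), delta=1e-4)
lam_post_mean = chain[len(chain) // 2:].mean(axis=0)  # discard burn-in, then average
```

Bin-wise credible intervals would then come from quantiles of the retained samples.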
Choice of the regularization strength
All unfolding methods involve a free parameter controlling the strength of the regularization:
  the parameter δ in Tikhonov regularization and Bayesian unfolding, the number of iterations in D'Agostini
This parameter is typically difficult to choose using only a priori information
But its value usually has a major impact on the unfolded spectrum
Most LHC analyses choose the regularization parameter using MC studies
But this may create an undesired MC bias
It would be better to choose the regularization parameter based on the observed data y
Data-dependent choice of the regularization strength
Many methods for using the observed data y to choose the regularization strength have been proposed in the literature:
  Goodness-of-fit test in the smeared space (Veklerov and Llacer, 1987)
  Empirical Bayes estimation (Kuusela and Panaretos, 2014)
  L-curve (Hansen, 1992)
  (Generalized) cross validation (Wahba, 1990)
  ...
At the moment, we have only limited experience with the relative merits of these methods in HEP unfolding
Goodness-of-fit for choosing the regularization strength
We present here a simplified version of the procedure proposed by Veklerov and Llacer (1987)
Let μ̂ = Kλ̂ be the estimated smeared mean
Consider the χ² statistic

T = (μ̂ − y)^T Ĉ⁻¹ (μ̂ − y),

where Ĉ = diag(μ̂)
If y ∼ Poisson(μ), then asymptotically T ∼ χ²_n, where n is the number of bins in y
Hence, E[T] ≈ n
This suggests that we should choose the regularization strength so that T is as close as possible to n
Note that this provides a balance between overfitting (T < n) and underfitting (T > n) the data
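The statistic itself is one line of code (a sketch; λ̂ here would come from whichever unfolding method is being tuned, and one scans δ or the number of iterations until T ≈ n):

```python
import numpy as np

def gof_statistic(K, y, lam_hat):
    """T = (mu_hat - y)^T C^-1 (mu_hat - y) with mu_hat = K lam_hat, C = diag(mu_hat)."""
    mu_hat = K @ lam_hat
    return np.sum((mu_hat - y) ** 2 / mu_hat)

# A perfect fit to the data gives T = 0 (deep overfitting territory, T << n) ...
K = np.eye(4)
lam_exact = np.array([50.0, 80.0, 90.0, 60.0])
assert gof_statistic(K, K @ lam_exact, lam_exact) == 0.0

# ... while for y ~ Poisson(mu) evaluated at the truth, T fluctuates around n
rng = np.random.default_rng(1)
mu = np.full(50, 100.0)
y = rng.poisson(mu).astype(float)
T = gof_statistic(np.eye(50), y, mu)
print(T)   # of order n = 50
```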
Uncertainty quantification
Proper uncertainty quantification is one of the main challenges in unfolding
By uncertainty quantification, we mean computing bin-wise frequentist confidence intervals at 1 − α confidence level:

inf_{λ ∈ R^p_+} P_λ[ λ̂_{i,L}(y) ≤ λ_i ≤ λ̂_{i,U}(y) ] = 1 − α

In practice, we can only hope to satisfy this approximately for finite sample sizes
Uncertainty quantification
Let SE[λ̂_i] be the standard error of λ̂_i (i.e., the standard deviation of the sampling distribution of λ̂_i)
In many situations, λ̂_i ± SE[λ̂_i] provides a reasonable 68% confidence interval
  But this is only true when λ̂_i is unbiased and has a symmetric sampling distribution
But in regularized unfolding the estimators are always biased!
  Regularization reduces variance by increasing the bias (bias-variance trade-off)
  Hence the SE confidence intervals may have lousy coverage

[Figure: sampling distribution p(λ̂_i|λ) with a ±SE[λ̂_i] band around E[λ̂_i]; when bias makes λ_i ≠ E[λ̂_i], the band can miss λ_i]
Uncertainty quantification
Let SE[λi ] be the standard error of λi (i.e., the standard deviation ofthe sampling distribution of λi )
In many situations, λi ± SE[λi ] provides a reasonable 68% confidenceinterval
But this is only true when λi is unbiased and has a symmetric samplingdistribution
But in regularized unfolding the estimators are always biased!Regularization reduces variance by increasing the bias (bias-variancetrade-off)Hence the SE confidence intervals may have lousy coverage
SE[λi ] SE[λi ]
p(λi |λ)
λi E[λi ]
Mikael Kuusela (EPFL) Unfolding in HEP July 15, 2014 48 / 66
Demonstration with Tikhonov regularization, P(λ) = ‖λ‖2
[Figures: Tikhonov estimates λ̂ ± SE[λ̂] for δ = 10⁻⁵, 10⁻³, 10⁻² and 1; the δ = 10⁻⁵ estimate oscillates between roughly ±2000, while the others stay near the scale of the true histogram]
Uncertainty quantification
The uncertainties returned by RooUnfold are estimates of the standard errors computed either using error propagation or resampling
  Hence these uncertainties should be understood as estimates of the spread of the sampling distribution of λ̂
  They should only be understood as approximate confidence intervals if it can be shown that the bias is negligible
Bootstrap resampling provides an attractive way of forming approximate confidence intervals that take into account the bias and the potential skewness of p(λ̂_i|λ) (Kuusela and Panaretos, 2014)
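One simple variant (a basic parametric percentile bootstrap, sketched here with a generic unfold callback and a hypothetical ridge-type rule; the intervals of Kuusela and Panaretos involve additional corrections) resamples the smeared counts and re-unfolds each replica:

```python
import numpy as np

def bootstrap_intervals(K, y, unfold, n_boot=200, alpha=0.32, seed=0):
    """Percentile bootstrap: draw y* ~ Poisson(y), unfold each replica, and take
    bin-wise quantiles of the replica estimates as interval endpoints."""
    rng = np.random.default_rng(seed)
    reps = np.array([unfold(K, rng.poisson(y)) for _ in range(n_boot)])
    lo = np.quantile(reps, alpha / 2.0, axis=0)
    hi = np.quantile(reps, 1.0 - alpha / 2.0, axis=0)
    return lo, hi

# Illustration with a simple ridge-type unfolding rule as the callback
K = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.8, 0.2],
              [0.0, 0.1, 0.8]])
y = np.array([130.0, 240.0, 280.0])
ridge = lambda K, y: np.linalg.solve(K.T @ K + 0.05 * np.eye(K.shape[1]), K.T @ y)
lo, hi = bootstrap_intervals(K, y, ridge)
```

Because the quantiles track the actual replica distribution, skewness of the sampling distribution is reflected in asymmetric interval endpoints.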
MC dependence in the smearing matrix
The smearing matrix K is typically estimated using Monte Carlo
In addition to a statistical error due to the finite sample size, there are two sources of systematics in K:
1 The matrix depends on the shape of the spectrum within each true bin:

K_ij = ( ∫_{F_i} ∫_{E_j} k(y, x) f(x) dx dy ) / ( ∫_{E_j} f(x) dx ),  i = 1, …, n, j = 1, …, p

2 The smearing of the variable of interest may depend on the MC distribution of some auxiliary variables
  For example, the energy resolution of jets depends on the pseudorapidity distribution of the jets
The first problem can be alleviated by making the true bins smaller at the cost of increased ill-posedness of the problem
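The standard MC estimate of K can be sketched as follows (NumPy, with assumed toy MC samples; columns are normalized by the number of true events per bin, so the column sums give the in-range efficiencies). Re-running it with a different true density f(x) changes the result, which is exactly the first systematic above:

```python
import numpy as np

def smearing_matrix_from_mc(x_true, x_smeared, true_edges, smeared_edges):
    """Estimate K_ij = P(smeared bin i | true bin j) from MC event pairs.
    Events smeared outside the measured range push the column sums below 1."""
    H, _, _ = np.histogram2d(x_smeared, x_true, bins=[smeared_edges, true_edges])
    n_true, _ = np.histogram(x_true, bins=true_edges)
    return H / np.where(n_true > 0, n_true, 1)

rng = np.random.default_rng(0)
x_true = rng.uniform(-4.0, 4.0, size=100_000)                 # toy MC truth, flat f(x)
x_smeared = x_true + rng.normal(0.0, 1.0, size=x_true.size)   # unit Gaussian resolution
edges = np.linspace(-4.0, 4.0, 21)
K = smearing_matrix_from_mc(x_true, x_smeared, edges, edges)
```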
Introduction to RooUnfold
RooUnfold (Adye, 2011) is an unfolding framework for ROOT that provides an interface for many standard unfolding methods
Written by Tim Adye, Richard Claridge, Kerstin Tackmann and Fergus Wilson
RooUnfold is currently the most commonly used unfolding framework among the LHC experiments, although other implementations are also occasionally used
RooUnfold includes the following unfolding techniques:
1 Matrix inversion
2 D'Agostini iteration
3 The SVD flavor of Tikhonov regularization
4 The TUnfold flavor of Tikhonov regularization
There is also an implementation of the so-called bin-by-bin unfolding technique:
  This is an obsolete method that replaces the full response matrix K by a diagonal approximation and while doing so introduces a huge MC bias
  This method should not be used!
RooUnfold classes
Figure from Adye (2011)
RooUnfoldInvert
RooUnfoldInvert(const RooUnfoldResponse* res, const TH1* meas, const char* name = 0, const char* title = 0)

This is the most basic method: it estimates λ using λ̂ = K⁻¹y
Remember that when λ̂ = K⁻¹y is nonnegative, this is the MLE
res contains the response matrix K
meas contains the smeared data y
The standard error of λ̂ is estimated using standard error propagation
RooUnfoldBayes
RooUnfoldBayes(const RooUnfoldResponse* res, const TH1* meas, Int_t niter = 4, Bool_t smoothit = false, const char* name = 0, const char* title = 0)

This implements the D'Agostini/Lucy–Richardson/EM iteration for finding the MLE
Remember that despite the name this is not a Bayesian technique
The iteration is started from the MC spectrum, i.e., λ^(0) = λ^MC contained in res
niter is the number of iterations
  For small niter, the solution is biased towards λ^MC; for large niter, we get a solution close to the MLE
  Note that the default niter = 4 is completely arbitrary and comes with no optimality guarantees
smoothit can be used to enable a smoothed version of the EM iteration (outside the scope of this course)
By default, the standard error of λ̂ is estimated using error propagation at each iteration of the algorithm
RooUnfoldSvd
RooUnfoldSvd(const RooUnfoldResponse* res, const TH1* meas, Int_t kreg = 0, Int_t ntoyssvd = 1000, const char* name = 0, const char* title = 0)

This implements the SVD flavor of Tikhonov regularization, i.e.,

λ̂ = arg min_{λ ∈ R^p} (y − Kλ)^T C⁻¹ (y − Kλ) + δ ‖L [λ_1/λ_1^MC, λ_2/λ_2^MC, …, λ_p/λ_p^MC]^T‖²,

where λ^MC is again contained in res
This is a wrapper for the TSVDUnfold class by K. Tackmann
kreg chooses the number of significant singular values in a certain transformation of the smearing matrix K
  Small kreg corresponds to a large δ and a large kreg to a small δ
The standard error of λ̂ is estimated by resampling ntoyssvd observations
  This also includes a contribution from the uncertainty of K
RooUnfoldTUnfold
RooUnfoldTUnfold(const RooUnfoldResponse* res, const TH1* meas, TUnfold::ERegMode reg = TUnfold::kRegModeDerivative, const char* name = 0, const char* title = 0)

This implements the TUnfold flavor of Tikhonov regularization, i.e.,

λ̂ = arg min_{λ ∈ R^p} (y − Kλ)^T C⁻¹ (y − Kλ) + δ ‖L(λ − λ^MC)‖²,

where the minimizer is found subject to an additional area constraint²
This is a wrapper for the TUnfold class by S. Schmitt
  TUnfold actually provides a lot of extra functionality which cannot be accessed through RooUnfold
The form of the matrix L is chosen using reg
  The supported choices are identity, 1st derivative and 2nd derivative
The regularization parameter δ is chosen using the SetRegParm(Double_t parm) method
  If δ is not chosen manually, it is found automatically using the L-curve technique, but this only seems to work when n ≫ p

²In the case of the TUnfold wrapper, the RooUnfold documentation is not explicit about the choice of λ^MC (it does not seem to come from res in this case)
RooUnfold practical
Start by downloading the code template at:
  www.cern.ch/mkuusela/ETH_workshop_July_2014/RooUnfoldExercise.cxx
A set of exercises based on this code can be found at:
  www.cern.ch/mkuusela/ETH_workshop_July_2014/practical.pdf
Useful supplementary material:
  These slides: www.cern.ch/mkuusela/ETH_workshop_July_2014/slides.pdf
  RooUnfold website: http://hepunx.rl.ac.uk/~adye/software/unfold/RooUnfold.html
  RooUnfold class documentation: http://hepunx.rl.ac.uk/~adye/software/unfold/htmldoc/RooUnfold.html
Conclusions
Unfolding is a complex data analysis task that involves several assumptions and approximations
It is crucial to understand the ingredients that go into an unfolding procedure
  Unfolding algorithms should never be used as black boxes!
All unfolding methods are based on complementing the likelihood with additional information about physically plausible solutions
The most popular techniques are the D'Agostini iteration and various flavors of Tikhonov regularization
Beware when using RooUnfold that:
  There is a MC dependence in both the smearing matrix and the regularization
  The uncertainties should be understood as standard errors and do not necessarily provide good coverage properties
  The regularization parameter has a major impact on the solution and should be chosen in a data-dependent way
There is plenty of room for further improvements in both unfolding methodology and software
References I
T. Adye. Unfolding algorithms and tests using RooUnfold. In H. B. Prosper and L. Lyons, editors, Proceedings of the PHYSTAT 2011 Workshop on Statistical Issues Related to Discovery Claims in Search Experiments and Unfolding, CERN-2011-006, pages 313–318, CERN, Geneva, Switzerland, 17–20 January 2011.
G. D'Agostini. A multidimensional unfolding method based on Bayes' theorem. Nuclear Instruments and Methods A, 362:487–498, 1995.
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1–38, 1977.
P. C. Hansen. Analysis of discrete ill-posed problems by means of the L-curve. SIAM Review, 34(4):561–580, 1992.
A. Höcker and V. Kartvelishvili. SVD approach to data unfolding. Nuclear Instruments and Methods in Physics Research A, 372:469–481, 1996.
References II
A. Kondor. Method of convergent weights – An iterative procedure for solving Fredholm's integral equations of the first kind. Nuclear Instruments and Methods, 216:177–181, 1983.
M. Kuusela and V. M. Panaretos. Empirical Bayes unfolding of elementary particle spectra at the Large Hadron Collider. arXiv:1401.8274 [stat.AP], 2014.
K. Lange and R. Carson. EM reconstruction algorithms for emission and transmission tomography. Journal of Computer Assisted Tomography, 8(2):306–316, 1984.
L. B. Lucy. An iterative technique for the rectification of observed distributions. Astronomical Journal, 79(6):745–754, 1974.
H. N. Mülthei and B. Schorr. On an iterative method for the unfolding of spectra. Nuclear Instruments and Methods in Physics Research A, 257:371–377, 1987.
H. N. Mülthei, B. Schorr, and W. Törnig. On an iterative method for a class of integral equations of the first kind. Mathematical Methods in the Applied Sciences, 9:137–168, 1987.
References III
H. N. Mülthei, B. Schorr, and W. Törnig. On properties of the iterative maximum likelihood reconstruction method. Mathematical Methods in the Applied Sciences, 11:331–342, 1989.
W. H. Richardson. Bayesian-based iterative method of image restoration. Journal of the Optical Society of America, 62(1):55–59, 1972.
S. S. Saquib, C. A. Bouman, and K. Sauer. ML parameter estimation for Markov random fields with applications to Bayesian tomography. IEEE Transactions on Image Processing, 7(7):1029–1044, 1998.
S. Schmitt. TUnfold, an algorithm for correcting migration effects in high energy physics. Journal of Instrumentation, 7:T10003, 2012.
L. A. Shepp and Y. Vardi. Maximum likelihood reconstruction for emission tomography. IEEE Transactions on Medical Imaging, 1(2):113–122, 1982.
Y. Vardi, L. Shepp, and L. Kaufman. A statistical model for positron emission tomography. Journal of the American Statistical Association, 80(389):8–20, 1985.
References IV
E. Veklerov and J. Llacer. Stopping rule for the MLE algorithm based on statistical hypothesis testing. IEEE Transactions on Medical Imaging, 6(4):313–319, 1987.
G. Wahba. Spline Models for Observational Data. SIAM, 1990.
Uncertainty quantification with the bootstrap
The bootstrap sample can be obtained as follows:
1 Unfold y to obtain λ̂
2 Fold λ̂ to obtain µ̂ = Kλ̂
3 Obtain a resampled observation y∗ ∼ Poisson(µ̂)
4 Unfold y∗ to obtain λ̂∗
5 Repeat R times from step 3
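As an illustration, the resampling loop above can be sketched in Python. This is a toy sketch, not the actual analysis code: `unfold` here is a simple Tikhonov-regularized least-squares estimator standing in for whatever unfolding method is used, and the smearing matrix `K` and regularization strength `tau` are assumed inputs.

```python
import numpy as np

def unfold(y, K, tau=1e-3):
    """Toy Tikhonov-regularized unfolding:
    lambda_hat = argmin ||K l - y||^2 + tau ||l||^2."""
    n = K.shape[1]
    return np.linalg.solve(K.T @ K + tau * np.eye(n), K.T @ y)

def bootstrap_sample(y, K, R=1000, rng=None):
    """Steps 1-5 of the slide: unfold, fold, resample, unfold again."""
    rng = np.random.default_rng(rng)
    lam_hat = unfold(y, K)                 # 1: unfold y -> lambda_hat
    mu_hat = K @ lam_hat                   # 2: fold -> mu_hat = K lambda_hat
    mu_hat = np.clip(mu_hat, 0, None)      # Poisson means must be non-negative
    boot = []
    for _ in range(R):                     # 5: repeat R times
        y_star = rng.poisson(mu_hat)       # 3: resample y* ~ Poisson(mu_hat)
        boot.append(unfold(y_star, K))     # 4: unfold y* -> lambda*
    return lam_hat, np.array(boot)         # boot has shape (R, n_bins)
```

Each row of `boot` is one bootstrap replicate λ̂∗(r), from which bin-wise quantiles can be read off.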
The bootstrap sample {λ̂∗(r)}, r = 1, …, R, follows the sampling distribution of λ̂ if the true value of λ was the observed value of our estimator
I.e., it is our best understanding of the sampling distribution of λ̂ for the data at hand
This procedure also enables us to take into account the data-dependent choice of the regularization strength
This is very difficult to do using competing methods
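One way this propagation might look in code: re-run the data-driven selection of the regularization strength on every bootstrap replicate, so the bootstrap sample also reflects the variability induced by that choice. The `choose_tau` function below (a generalized cross-validation criterion) is an illustrative stand-in, not necessarily the selection rule used on these slides.

```python
import numpy as np

def unfold(y, K, tau):
    """Toy Tikhonov-regularized unfolding with explicit strength tau."""
    n = K.shape[1]
    return np.linalg.solve(K.T @ K + tau * np.eye(n), K.T @ y)

def choose_tau(y, K, taus=tuple(10.0 ** p for p in range(-4, 2))):
    """Hypothetical data-driven choice of tau by minimizing a GCV score."""
    best_tau, best_score = taus[0], np.inf
    n = K.shape[1]
    for tau in taus:
        H = K @ np.linalg.solve(K.T @ K + tau * np.eye(n), K.T)  # hat matrix
        resid = y - H @ y
        score = (resid @ resid) / (len(y) - np.trace(H)) ** 2
        if score < best_score:
            best_tau, best_score = tau, score
    return best_tau

def bootstrap_with_tau_selection(y, K, R=200, rng=None):
    """Bootstrap where tau is re-selected for every resampled y*."""
    rng = np.random.default_rng(rng)
    tau_hat = choose_tau(y, K)
    lam_hat = unfold(y, K, tau_hat)
    mu_hat = np.clip(K @ lam_hat, 0, None)
    boot = []
    for _ in range(R):
        y_star = rng.poisson(mu_hat)
        tau_star = choose_tau(y_star, K)   # data-dependent choice repeated
        boot.append(unfold(y_star, K, tau_star))
    return lam_hat, np.array(boot)
```

Because `choose_tau` is called again inside the loop, the spread of the replicates includes the extra variability of choosing the regularization strength from data.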
Uncertainty quantification with the bootstrap
The bootstrap sample can be used to compute 1 − α basic bootstrap intervals to serve as approximate 1 − α confidence intervals for λi:
[λ̂i,L, λ̂i,U] = [2λ̂i − λ̂∗i,1−α/2, 2λ̂i − λ̂∗i,α/2],
where λ̂∗i,α denotes the α-quantile of the bootstrap sample {λ̂∗(r)i}, r = 1, …, R
This can be understood as the bootstrap analogue of the Neyman construction of confidence intervals
Figure : Neyman construction analogue: the sampling density g(θ̂) = p(θ̂ | θ = θ0) with probability α/2 in each tail, and the shifted densities g(θ̂ + θ0 − θL) and g(θ̂ + θ0 − θU) defining the interval endpoints θL and θU.
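Given the bootstrap replicates for a bin, the basic bootstrap interval above is a two-line computation. A minimal sketch, assuming the replicates are available as a NumPy array:

```python
import numpy as np

def basic_bootstrap_interval(lam_hat_i, boot_i, alpha=0.05):
    """Basic bootstrap 1-alpha interval for a single bin i:
    [2*lam_hat_i - q_{1-alpha/2}, 2*lam_hat_i - q_{alpha/2}],
    where q_a is the a-quantile of the bootstrap replicates boot_i."""
    q_lo = np.quantile(boot_i, alpha / 2)
    q_hi = np.quantile(boot_i, 1 - alpha / 2)
    return 2 * lam_hat_i - q_hi, 2 * lam_hat_i - q_lo
```

Note the quantiles enter in reversed order: the upper quantile of the replicates sets the lower interval endpoint, and vice versa.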
Demonstration
Figure : Tikhonov regularization with 95% bin-wise confidence intervals. The SE intervals cover in 23 bins out of 40, while the bootstrap intervals cover in 32 bins.