Connections between discrete and continuous models for biological image
analysis
Thesis Report
Submitted in partial fulfillment of the requirements of
BITS F422T Thesis
By
Anmol Kumar Gupta
ID No. 2013B4A7588P
Under the supervision of:
Dr. Ivo F. Sbalzarini
&
Dr. Rajiv Kumar
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE PILANI, PILANI CAMPUS
May 2018
Declaration of Authorship
I, Anmol Kumar Gupta, declare that this Thesis Report titled, ‘Connections between discrete
and continuous models for biological image analysis’ and the work presented in it are my own. I
confirm that:
This work was done wholly or mainly while in candidature for a research degree at this
University.
Where any part of this thesis has previously been submitted for a degree or any other
qualification at this University or any other institution, this has been clearly stated.
Where I have consulted the published work of others, this is always clearly attributed.
Where I have quoted from the work of others, the source is always given. With the
exception of such quotations, this thesis is entirely my own work.
I have acknowledged all main sources of help.
Where the thesis is based on work done by myself jointly with others, I have made clear
exactly what was done by others and what I have contributed myself.
Signed:
Date:
Certificate
This is to certify that the thesis report entitled “Connections between discrete and continuous
models for biological image analysis”, submitted by Anmol Kumar Gupta, ID No.
2013B4A7588P, in partial fulfillment of the requirements of BITS F422T Thesis, embodies the
work done by him under my supervision.
Supervisor
Dr. Ivo F. Sbalzarini
Professor, Faculty of Computer Science
and Faculty of Mathematics,
Center for Systems Biology, TU Dresden,
Germany
Date:
Co-Supervisor
Dr. Rajiv Kumar
Professor, Department of Mathematics,
BITS-Pilani, Pilani Campus,
India
Date:
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE PILANI
Abstract
Bachelor of Engineering (Hons.) Computer Science
Master of Science (Hons.) Mathematics
Thesis Report
by Anmol Kumar Gupta
Mathematical connections between discrete and continuous models in image analysis give
important insights into the uniqueness and stability of the solution, and help in choosing
the model with the least error. Error bounds from the discrete model can be used to assess
the quality of the solution of a continuous model; conversely, the physical interpretation of
the continuous model can be used to rationalize the choice of functions/potentials
in the discrete problem. We observe that there is a strong connection between discrete and
continuous convolution, and that the errors of a discrete convolution filter can be studied via that
connection. We also observe that the solution to denoising using Markov Random Fields, a
discrete model, becomes identical to Total Variational denoising, a continuous model, when
particular potential functions are chosen in the formulation of Markov Random Fields.
Acknowledgements
First of all, I would like to thank my supervisor Dr. Ivo F. Sbalzarini. He offered me an interesting
topic which deals with the dichotomy: continuous and discrete in the area of biological image
analysis. I have enjoyed all the discussions, suggestions, and help which he offered during group
meetings and one on one interactions. Much of my current progress is dedicated to the ideas
resulting from said discussions.
I am also grateful to Dr. Rajiv Kumar for agreeing to be the co-supervisor from BITS. I admire
his part of taking care of all the formalities of the University while I am writing my off-campus
thesis.
Next, I would like to thank Karl, a Ph.D. student at the MOSAIC group, for his suggestions
regarding calculations in Chapter 2. I have learned a lot from him in terms of mathematical
reasoning. Also, I would like to thank Joel, a Masters student at the MOSAIC group, for the
discussions on discretized continuous convolution.
Last I would like to thank each and every group member of MOSAIC group for making the
workplace full of fun and open to discussions.
Contents
Declaration of Authorship
Certificate
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Connection between discrete and continuous Convolution
2.1 Convolution
2.1.1 Continuous Convolution
2.1.2 Discrete Convolution
2.2 Mathematical framework for connection
2.2.1 Selecting a discrete filter
2.2.2 Writing the expression for convolution corresponding to the filter
2.2.3 Discretizing the continuous convolution
2.2.4 Comparing the algebraic expressions and solving for nodal points
2.3 Connection between quadrature schemes and discrete convolution filter
3 Connection between Markov Random Fields and Variational Model for a denoising problem
3.1 Introduction
3.2 Preliminaries
3.2.1 Markov Random Field
3.2.2 Variational Methods
3.2.3 Bayesian Framework
3.3 Mathematical framework for connection
3.3.1 Defining MRF model for denoising
3.3.2 Discretizing the Total Variational model
3.4 The connection depends on the choice of potentials
4 Conclusion and Future work
A Theoretical notes
A.1 Convolution
A.2 Numerical integration methods
B Convolution Integral: reasoning from Plots and Data
B.1 Inference from Absolute error
B.2 Inference from number of computations
B.3 Inference from Plots of different quadrature schemes
C Illustrative example of MRF Denoising
C.1 Binary MRF Model formulation
C.2 Algorithm for Binary MRF and results
List of Figures
2.1 Three-point Gaussian kernel
2.2 Discrete Convolution
2.3 Mid point rule to evaluate convolution
3.1 Graph structure for MRF
4.1 Test image for denoising
4.2 L2/TV denoising result
4.3 L1/TV denoising result
B.1 Absolute Errors in Numerical Integration
B.2 Computational Operations
B.3 Error plots for Numerical integration methods
C.1 MRF toy example results
List of Tables
A.1 Gauss–Legendre quadrature coefficients
Chapter 1
Introduction
The distinction between discrete and continuous models is a fundamental dichotomy in mathematics
and related fields [12]. In image analysis, both approaches to solving a problem have their own
importance. On the one hand, a continuous model can yield an analytical or numerical solution
of a problem; on the other hand, solving the same problem with a discrete model under some
assumptions can give a speed-up at the cost of an error. However, an analytical solution of a
continuous problem is not always available, and the assumptions under which a discrete model
solves the problem are not always known, so the discrete model can behave like a black box. By
studying both models on simple examples, we can hope to understand the link between these two
worlds, give a theoretical estimate of the suitability of a model for solving a problem, and even
identify a proper mix of steps when both approaches are used together.
This work aims to establish a link between the discrete and continuous worlds in the area
of image analysis by studying two important processes: convolution and denoising.
Chapter 2 provides a mathematical framework for the connection between the
continuous and the discrete models of convolution, using a low-pass Gaussian
filter as an example. Chapter 3 studies a well-known denoising problem from a continuous
perspective using Variational Models and from a discrete perspective using Markov Random
Fields (MRFs). In the last chapter of this work, we present the results obtained from denoising
an image using the L1/TV and L2/TV models and propose a framework with which a
connection between variational models and artificial neural networks can be sought.
Chapter 2
Connection between discrete and
continuous Convolution
The motivation for studying the connection between discrete and continuous convolution is to obtain a
mathematical framework for reasoning about the error and the convergence of a discrete
convolution filter.
2.1 Convolution
Convolution is the backbone of most filtering techniques and one of the most frequently used
operations in image processing; analyzing this connection therefore provides a general framework
for finding connections between discrete and continuous models in the filtering domain. Each term
of the convolution operation is also easy to analyze because of its mathematical simplicity: in the
discrete setting it is a summation of pixel-wise multiplications, and in the continuous setting
it is an integral of the pointwise product of two functions.
2.1.1 Continuous Convolution
The basic equation for convolution is as follows:
\begin{align}
C(x) = (f * g)(x) &= \int_{\Omega} f(t)\, g(x - t)\, dt \tag{2.1} \\
&= \int_{\Omega} f(x - t)\, g(t)\, dt, \tag{2.2}
\end{align}
also read as f convolved with g, where f and g are functions defined on Ω.
2.1.2 Discrete Convolution
The following discrete form of convolution is more common in the signal processing community:
\begin{align}
(f * g)[n] &= \sum_{m=-\infty}^{\infty} f[m]\, g[n - m] \tag{2.3} \\
&= \sum_{m=-\infty}^{\infty} f[n - m]\, g[m]. \tag{2.4}
\end{align}
Here, the set of indices m at which g[m] is nonzero is referred to as the support of g. For image
processing, we use the discrete form with finite support. Let g have finite support in the set
$\{-M, -M+1, \ldots, M-1, M\}$; then (2.4) can be rewritten as a finite summation:
\[ (f * g)[n] = \sum_{m=-M}^{M} f[n - m]\, g[m]. \tag{2.5} \]
For each pixel, (2.5) computes the inner product of the current sample neighborhood¹ and the kernel².
2.2 Mathematical framework for connection
Notations used:
f[n], g[m]: discrete image pixels and filter kernel, respectively.
f(x), g(t): continuous representations of the image and the filter, respectively.
$x_i$, $t_i$: nodal points in the quadrature scheme used to index f(x) and g(t), respectively.
We propose the following stepwise procedure to show the connection between discrete and
continuous convolution:
• Step 1: Selecting a discrete filter.
• Step 2: Writing the expression for discrete convolution with the filter chosen.
• Step 3: Discretizing the convolution integral.
• Step 4: Comparing the algebraic expressions of Step 2 and 3 term by term and solving
for nodal points.
After following the above steps one can easily reason about the equivalence of the discrete filter
and the choice of quadrature scheme characterized by unique nodal points.
¹ Here, neighborhood refers to the set $\{n - M, \ldots, n + M\}$, including n.
² Kernel in image processing terminology refers to the convolution matrix or mask which, when convolved with the
image, can produce effects like blurring, edge detection and many more.
Figure 2.1: Normalized three-point Gaussian kernel.
Figure 2.2: A single location in discrete convolution.
2.2.1 Selecting a discrete filter
The first step is to choose a filter kernel to be convolved with the image. To completely
define a discrete filter, we need to know the filter type (e.g. Mean, Median, Gaussian, Laplacian)
and the kernel dimension. During the discretization step of the convolution integral, the kernel
dimension decides the number of nodal points to be used in the quadrature scheme³. For the analysis
in the upcoming sections, we choose a three-point Gaussian kernel given by 1/4[1,2,1], as shown in
Figure 2.1.
2.2.2 Writing the expression for convolution corresponding to the filter
To compare the discrete and continuous convolution, we first write the algebraic expression
for the discrete convolution corresponding to the filter chosen in the previous step. For a given
one-dimensional image f indexed by $n \in \{0, 1, \ldots, N-2, N-1\}$, the discrete
convolution at each pixel n of the image with the filter g = 1/4[1,2,1] is as follows:
\begin{align}
(f * g)[n] &= \sum_{m=-1}^{1} f[n - m]\, g[m] \tag{2.6} \\
&= \frac{1}{4}\left(1 \cdot f[n-1] + 2 \cdot f[n] + 1 \cdot f[n+1]\right). \tag{2.7}
\end{align}
Without loss of generality, we choose zero padding, i.e. $f[n] = 0$ for $n \notin \{0, 1, \ldots, N-1\}$,
so that the above equation remains defined at the boundary pixels.
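As a small illustration of (2.7), the following sketch applies the 1/4[1,2,1] kernel with zero padding and compares the result with NumPy's built-in convolution. The function name `smooth_121` and the test signal are hypothetical choices made only for this example.

```python
import numpy as np

def smooth_121(f):
    """Apply the three-point kernel 1/4 [1, 2, 1] with zero padding, as in (2.7)."""
    f = np.asarray(f, dtype=float)
    padded = np.concatenate(([0.0], f, [0.0]))  # f[n] = 0 outside 0..N-1
    # padded[k] holds f[k-1], so the three slices are f[n-1], f[n], f[n+1]
    return 0.25 * (padded[:-2] + 2.0 * padded[1:-1] + padded[2:])

f = np.array([0.0, 4.0, 8.0, 4.0, 0.0])          # hypothetical 1D image
result = smooth_121(f)
reference = np.convolve(f, [0.25, 0.5, 0.25], mode="same")
print(result, reference)
```

Because the kernel is symmetric, the flip implied by f[n − m] in (2.6) does not change the result, which is why the direct weighted sum and `np.convolve` agree here.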
³ Numerical integration method for discretizing the convolution integral.
2.2.3 Discretizing the continuous convolution
The convolution integral of an image f with a filter g having finite
support $t \in [-k, k] \subset \mathbb{R}$ is as follows:
\[ C(x) = (f * g)(x) = \int_{-k}^{k} f(x - t)\, g(t)\, dt. \tag{2.8} \]
There are two questions which arise at this point:
• Which quadrature scheme should be used to discretize the convolution integral in (2.8)? In
principle, one can use any quadrature scheme, such as the Midpoint Rule, Simpson's Rule or
the Trapezoidal Rule. Two points should be noted, however. First,
the error in evaluating the integral depends on the quadrature scheme chosen.
Second, for a fixed quadrature scheme, the convergence rate varies with the kernel width chosen
for the discrete filter, which gives an idea of how
the discrete filter converges as one increases the kernel width (see Appendix B).
• How many nodal points should be chosen for the quadrature scheme? The answer depends on the
size of the kernel. The nodal points indicate where the continuous filter function inside the
integral is sampled. Therefore, the number of sample points of the continuous
filter function should match the size of the corresponding discrete kernel. Since we have chosen a
three-point discrete Gaussian kernel, the number of nodal points to be used in
the quadrature scheme is three.
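Applying the composite Midpoint Rule to (2.8) gives
\begin{align}
(f * g)(x) &= \int_{-k}^{k} f(x - t)\, g(t)\, dt \tag{2.9} \\
&= \int_{t_0}^{t_1} f(x - t)\, g(t)\, dt + \int_{t_1}^{t_2} f(x - t)\, g(t)\, dt + \int_{t_2}^{t_3} f(x - t)\, g(t)\, dt \tag{2.10} \\
&\approx h f(x - t_{1/2})\, g(t_{1/2}) + h f(x - t_{3/2})\, g(t_{3/2}) + h f(x - t_{5/2})\, g(t_{5/2}), \tag{2.11}
\end{align}
where $t_0 = -k$, $t_3 = k$, $t_{i-1/2} = (t_i + t_{i-1})/2$ and h is the sampling width.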
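The discretization in (2.9) to (2.11) can be sketched as follows. This is a minimal illustration, assuming equally wide sub-intervals so that h = 2k/n for n nodal points; the function name `midpoint_convolution` and its parameters are hypothetical and not part of the thesis.

```python
import numpy as np

def midpoint_convolution(f, g, x, k, n_nodes=3):
    """Approximate the integral of f(x - t) g(t) over [-k, k] by the composite midpoint rule."""
    h = 2.0 * k / n_nodes                      # width of each sub-interval (assumed uniform)
    edges = -k + h * np.arange(n_nodes + 1)    # t_0, t_1, ..., t_n
    mids = 0.5 * (edges[:-1] + edges[1:])      # t_{1/2}, t_{3/2}, ...
    return h * np.sum(f(x - mids) * g(mids))   # right-hand side of (2.11)

# Sanity check with a linear "image" and a constant filter, for which the
# midpoint rule is exact: the integral of (x - t) over [-k, k] equals 2 k x.
value = midpoint_convolution(lambda t: t, lambda t: np.ones_like(t), x=2.0, k=1.5)
print(value)
```

With k = 1.5 and three nodes, the midpoints fall at −1, 0 and 1, mirroring the three-point structure of (2.11).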
2.2.4 Comparing the algebraic expressions and solving for nodal points
We compare the algebraic expressions (2.7) and (2.11), obtained in Step 2 and Step 3 respectively,
to calculate the nodal points of the chosen quadrature scheme. Note that the image
is sampled at the same points in both expressions (2.7) and (2.11), so the image terms cancel,
leading to
\begin{align}
h\, g(t_{1/2}) &= 1/4, \tag{2.12} \\
h\, g(t_{3/2}) &= 2/4, \tag{2.13} \\
h\, g(t_{5/2}) &= 1/4. \tag{2.14}
\end{align}
Figure 2.3: Nodal points of the Midpoint Rule. The areas of the boxes numbered 1, 2, 3 over the Gaussian curve are in the ratio 1:2:1, corresponding to the discrete filter chosen.
Observation: Since the Midpoint Rule is applied on the interval $[-k, k]$, by symmetry
\[ t_{3/2} = 0. \tag{2.15} \]
Eliminating h from equations (2.12) and (2.13), we obtain
\[ g(t_{1/2}) = \frac{1}{2}\, g(t_{3/2}). \tag{2.16} \]
A continuous Gaussian filter can be represented as
\[ g(t) = \frac{1}{c} \exp\left(-\frac{t^2}{2\sigma^2}\right), \tag{2.17} \]
where c is the normalizing constant and σ is the standard deviation. Using (2.16) and (2.17), we
obtain
\[ -\frac{t_{1/2}^2}{2\sigma^2} = -\ln 2 - \frac{t_{3/2}^2}{2\sigma^2} \tag{2.18} \]
and finally, using (2.15), we get
\[ t_{1/2} = -\sqrt{2 \ln 2}\,\sigma. \tag{2.19} \]
Similarly, we solve for $t_{5/2}$ to get
\[ t_{5/2} = \sqrt{2 \ln 2}\,\sigma. \tag{2.20} \]
Also,
\[ h = t_{3/2} - t_{1/2} = \sqrt{2 \ln 2}\,\sigma. \tag{2.21} \]
The nodal points and the sample width h completely define the quadrature scheme. Therefore, a
discrete convolution with the filter 1/4[1,2,1] is equivalent to the continuous convolution discretized
by the three-point Midpoint Rule characterized by the nodal points given by (2.19), (2.15), (2.20)
and the sample width given by (2.21).
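The result can be checked numerically with a short sketch. The value σ = 1 is an arbitrary assumption, and the normalizing constant c = 2h is not stated in the text: it is obtained here by inserting g(0) = 1/c from (2.17) into (2.13).

```python
import math

sigma = 1.0                                  # arbitrary positive value (assumption)
h = math.sqrt(2.0 * math.log(2.0)) * sigma   # sample width, (2.21)
nodes = (-h, 0.0, h)                         # t_{1/2}, t_{3/2}, t_{5/2} from (2.19), (2.15), (2.20)
c = 2.0 * h                                  # from h * g(0) = 2/4 with g(0) = 1/c (derived here)

def g(t):
    """Continuous Gaussian filter (2.17) with the normalizing constant chosen above."""
    return math.exp(-t * t / (2.0 * sigma ** 2)) / c

weights = [h * g(t) for t in nodes]          # quadrature weights h * g(t_{i-1/2})
print(weights)
```

Under these assumptions the three weights reproduce the discrete kernel 1/4[1,2,1], because exp(−h²/(2σ²)) = 1/2 at the outer nodes.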
2.3 Connection between quadrature schemes and discrete convolution filter
In this section, we discuss how the error and convergence rate of discrete convolution filter can
be studied using the quadrature scheme chosen to discretize the convolution integral.
• A note on error analysis: If we sample the Gaussian curve exactly at the points given
by (2.15), (2.19) and (2.20) for the continuous convolution discretized by the Midpoint Rule,
we obtain exactly the same algebraic expression as for the three-point Gaussian convolution
filter 1/4[1,2,1]. Therefore, the error in the discrete convolution is exactly the same as the
error of the quadrature scheme used to discretize the convolution integral. Different
quadrature schemes have different error bounds, so one can choose the
quadrature scheme with the lowest error bound. Appendix B discusses errors for different
quadrature schemes.
• A note on convergence rate: The kernel width (window size) is directly related to the
number of nodal points in the quadrature scheme. Since the convergence rate of a
quadrature scheme depends on the number of nodal points (see Appendix B), one
can study how a discrete convolution filter converges as the window size is increased.
In summary, discrete convolution with a filter is the same as the continuous convolution discretized using
a specific quadrature scheme.
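The dependence of the error on the number of nodal points can be illustrated with a small experiment. The sketch below is not taken from the thesis data in Appendix B; it assumes an unnormalized Gaussian with σ = 1, the interval [−3, 3], and a constant image, so that the convolution integral reduces to the integral of the filter itself. The helper names are hypothetical.

```python
import math

def midpoint_integral(g, k, n_nodes):
    """Composite midpoint rule for the integral of g over [-k, k] with n_nodes nodal points."""
    h = 2.0 * k / n_nodes
    return h * sum(g(-k + (i + 0.5) * h) for i in range(n_nodes))

def gaussian(t):
    return math.exp(-0.5 * t * t)   # unnormalized Gaussian, sigma = 1 (assumption)

k = 3.0
# Closed form of the integral of exp(-t^2 / 2) over [-k, k]
exact = math.sqrt(2.0 * math.pi) * math.erf(k / math.sqrt(2.0))
node_counts = (4, 8, 16, 32, 64)
errors = [abs(midpoint_integral(gaussian, k, n) - exact) for n in node_counts]
for n, err in zip(node_counts, errors):
    print(n, err)
```

In this setting the absolute error shrinks as the number of nodal points grows, which corresponds to widening the discrete kernel while keeping the integration interval fixed.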
Chapter 3
Connection between Markov
Random Fields and Variational
Model for a denoising problem
The motivation for studying the connection between Markov Random Fields and the Variational Model
for denoising is to see how choosing a particular prior in Markov Random Fields (discrete) is
connected to the optimization of the energy functional in Variational methods (continuous).
3.1 Introduction
In this chapter, we consider another classical problem besides convolution: denoising.
Denoising is the process of recovering the original image from a corrupted or noisy image. Image
denoising is well studied in the computer vision, image processing, and signal processing
communities. In all these communities there are two main groups: those who work with
continuous models such as variational models, contour-based methods and level-set methods [3], and those who
work with discrete models, for instance Probabilistic Graphical Models such as Markov Random
Fields (MRFs). There have been many advancements in both fields [3], [11], [14], [20], but
much less work has been done on establishing the connection between these two classes of models [4].
We take a simple denoising problem and discuss how it is modeled using Markov Random
Fields, a discrete model. We then ask which assumptions in Total Variational
denoising (TVD), a continuous model, lead to exactly the same expressions for the data fidelity
and prior terms as in the MRF energy equation. Both models are based on the
Bayesian framework; we therefore start from the hypothesis that, for a specific formulation of the MRF
potentials, there must be a mathematical connection with a TVD model that solves the same problem in the
continuous domain. In other words, we ask which discretization scheme in TVD leads to
exactly the same formulation of the energy as in the MRF.
3.2 Preliminaries
3.2.1 Markov Random Field
A Markov Random Field is a countable set of random variables whose dependencies are
represented by an undirected graph; in image analysis applications, the variables are indexed by
the spatial positions of the image pixels.
The model ingredients which completely define an MRF include the following:
1. a set of random variables, defining the number of unknown variables in the model,
2. an undirected graph representing a family of distributions characterized by the Markov blanket¹,
3. the range of the random variables, defining their permissible values, and
4. potential functions defining the relationship between the random variables according to
the a priori assumptions about the model.
3.2.2 Variational Methods
Variational Methods are a specific class of optimization methods. The key idea is to define a
cost functional which is then minimized or maximized, depending on the problem. Variational
methods are widely used in image analysis for denoising [20], segmentation [18], motion
estimation [10], and dense 3D reconstruction and tracking [9]. One advantage of Variational
methods is that they permit an in-depth analysis of the cost functional, allowing statements about
the uniqueness and stability of the solution. Uniqueness is guaranteed if the energy functional is strictly
convex. The stability of the solution depends on how much the solution varies when
the conditions of the problem, for instance the nature of the noise, are changed.
Rudin, Osher, and Fatemi [20] used the total variation (TV) as an image regularization method to
remove noise while preserving edges. Later, TV proved effective in solving various image
inverse problems such as deblurring, restoration, and inpainting [8], [21]. Mathematically, the TV
of a function u on $\mathbb{R}^n$, $n > 1$, is defined as the integral of its gradient magnitude, i.e.
\[ \|u\|_{TV} = \int \|\nabla u\|. \tag{3.1} \]
¹ Set of neighbors of a node (random variable) in an undirected graph following the Markovian property of conditional independence.
3.2.3 Bayesian Framework
Denoising is an inverse problem. MRF and TVD are connected via the Bayesian framework:
the former maximizes the probability of an event under Markovian
assumptions of conditional independence, and the latter minimizes a continuous
energy functional involving the TV term (3.1).
The posterior distribution for our denoising model is as follows:
\[ \text{Probability}(\text{true image} \mid \text{noisy image}) = P(u \mid v). \tag{3.2} \]
Applying Bayes' rule to (3.2), we obtain
\[ P(u \mid v) = \frac{P(v \mid u)\, P(u)}{P(v)}, \tag{3.3} \]
where P(u | v) is the posterior, P(v | u) is the likelihood of the noisy image given the true image,
P(u) is the prior, and P(v) is a normalization term (constant), or
\[ \text{Posterior} \propto \text{Likelihood} \times \text{Prior}. \tag{3.4} \]
The goal is to maximize the posterior or, in other words, to minimize the negative logarithm of the
posterior, i.e. the energy of the model. Taking the negative logarithm on both sides of (3.4), we get
\[ \text{minimize } E(u) = E_{data}(u, v) + \lambda E_{prior}(u), \tag{3.5} \]
where $E_{data}$ is the data association term that takes into account the nature of the noise as well as
the relationship between the true and the noisy image, $E_{prior}$ is the regularity term that takes
into account how likely a given solution (true image) is a priori, and λ > 0 is the regularization
parameter that controls the smoothness of the expected image. This is the general form of the optimization
model used by both MRF and TVD, as illustrated in the upcoming sections.
3.3 Mathematical framework for connection
Notations used:
u: true image.
v: noisy image.
N(i, j): neighborhood of the pixel (i, j).
In this section, we provide a simple formulation to analyze the connection by focusing on the
choice of potentials in MRFs. We first define the MRF and then discretize the TVD so that
both models can be compared term by term. We then reason about how the choice of potentials
in the MRF affects the convexity of the problem.
Figure 3.1: The white nodes denote the variables corresponding to the true image u, and the grey nodes denote the variables corresponding to the noisy image v. Each $v_{i,j}$ is connected to the corresponding $u_{i,j}$, and each $u_{i,j}$ is connected to its neighbors.
3.3.1 Defining MRF model for denoising
If the image size is M × N, then there are 2 × M × N random variables in our model, represented
by $u_{i,j}$ and $v_{i,j}$, where $1 \le i \le M$ and $1 \le j \le N$, representing M × N independent pixels. The
undirected graph is represented by the pixel grids of both images, with the pixels as the vertices
and with the neighborhood connections, as well as the connections between true image pixels and noisy
image pixels, as the edges, as shown in Figure 3.1.
The goal is to maximize the posterior probability P(u | v), i.e. the probability of observing the
true image given its noisy version:
\begin{align}
\hat{u} &= \arg\max_{u} P(u \mid v) \tag{3.6} \\
&= \arg\max_{u} \underbrace{P(v \mid u)}_{\text{Likelihood}} \cdot \underbrace{P(u)}_{\text{Prior}}. \tag{3.7}
\end{align}
Modeling the likelihood: To model the likelihood, we assume that for each pixel (i, j) the
measured intensity $v_{i,j}$ is given by the true intensity $u_{i,j}$ plus additive noise. For illustration
purposes, we take the noise to be Gaussian, for which the likelihood is as follows:
\[ P(v_{i,j} \mid u_{i,j}) \propto \exp\left(-\frac{(u_{i,j} - v_{i,j})^2}{2\sigma^2}\right). \tag{3.8} \]
All measurements are mutually independent; therefore, the likelihood for the entire image v is
\[ P(v \mid u) = \prod_{i,j} P(v_{i,j} \mid u) = \prod_{i,j} P(v_{i,j} \mid u_{i,j}) \propto \prod_{i,j} \exp\left(-\frac{(u_{i,j} - v_{i,j})^2}{2\sigma^2}\right). \tag{3.9} \]
Modeling the prior: We expand the prior as follows:
\[ P(u) = \prod_{i,j} P(u_{i,j} \mid N(u_{i,j})), \tag{3.10} \]
where we assume a Markov property, namely that the probability of $u_{i,j}$ is sufficiently characterized
by its neighbors $N(u_{i,j})$. Assuming a simple smoothness prior, we have
\[ P(u) \propto \prod_{i,j} \exp\left(-\lambda \sqrt{(u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2}\right), \tag{3.11} \]
where λ > 0 is a regularization parameter controlling the smoothness.
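With these assumptions, the posterior distribution is given by
\[ P(u \mid v) \propto \prod_{i,j} \exp\left(-\frac{(u_{i,j} - v_{i,j})^2}{2\sigma^2}\right) \cdot \prod_{i,j} \exp\left(-\lambda \sqrt{(u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2}\right). \tag{3.12} \]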
Instead of maximizing this probability distribution, one can equivalently minimize its negative
logarithm, which is given by the energy
\begin{align}
\text{minimize } E(u) &= -\log P(u \mid v) \tag{3.13} \\
&= \underbrace{\sum_{i,j} \frac{(u_{i,j} - v_{i,j})^2}{2\sigma^2}}_{E_{data}(u, v)} + \lambda \underbrace{\sum_{i,j} \sqrt{(u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2}}_{E_{prior}(u)} \tag{3.14}
\end{align}
and thus the expected true image is
\[ \hat{u} = \arg\min_{u} E(u), \tag{3.15} \]
which is analogous to (3.6) but defined for the energy functional instead of the posterior probability.
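To make the energy (3.14) concrete, the following sketch evaluates it pixel by pixel for small arrays. It is only an illustration: the function name `mrf_energy`, the default parameter values and the 2 × 2 test image are hypothetical, and differences that would reach outside the pixel grid are assumed to be zero, since (3.14) itself does not specify the boundary treatment.

```python
import numpy as np

def mrf_energy(u, v, sigma=1.0, lam=1.0):
    """Energy (3.14): Gaussian data term plus the smoothness prior term."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    rows, cols = u.shape
    e_data = 0.0
    e_prior = 0.0
    for i in range(rows):
        for j in range(cols):
            e_data += (u[i, j] - v[i, j]) ** 2 / (2.0 * sigma ** 2)
            # Assumption: differences that would leave the grid are set to zero.
            dx = u[i + 1, j] - u[i, j] if i + 1 < rows else 0.0
            dy = u[i, j + 1] - u[i, j] if j + 1 < cols else 0.0
            e_prior += np.sqrt(dx ** 2 + dy ** 2)
    return e_data + lam * e_prior

v = np.array([[0.0, 0.0], [0.0, 2.0]])        # hypothetical noisy image
energy_copy = mrf_energy(v, v)                # u = v: no data cost, only prior cost
energy_flat = mrf_energy(np.full((2, 2), 0.5), v)  # constant u: no prior cost
print(energy_copy, energy_flat)
```

Comparing the two candidates shows the trade-off encoded in (3.14): copying the noisy image costs nothing in the data term but is penalized by the prior, whereas a constant image has zero prior energy but pays in the data term.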
3.3.2 Discretizing the Total Variational model
According to the total variational model of Rudin, Osher, and Fatemi for denoising [20], the energy functional
is given by
\[ E(u) = \int_{\Omega} \|u - v\|^2 + \lambda \int_{\Omega} \|\nabla u\|, \tag{3.16} \]
where Ω is the domain of the images u and v, and λ > 0 is the regularization parameter.
Discretizing the first term in (3.16), i.e. $\int_{\Omega} \|u - v\|^2$, is simple, as the integral over discrete pixels
(i, j) ∈ Ω is equivalent to a summation over the whole pixel grid, i.e.
\[ \int_{\Omega} (u - v)^2 \approx \sum_{i,j} (u_{i,j} - v_{i,j})^2, \tag{3.17} \]
which equals the $E_{data}(u, v)$ term in the MRF model (3.14) up to the constant factor 1/(2σ²); this
factor can be absorbed into the regularization parameter λ. Note that
discretizing $\int_{\Omega} \|u - v\|^2$ using the composite Midpoint quadrature scheme also yields the same
result, because the sample points are the discrete pixels (i, j) and the step size is therefore h = 1, leading
to a simple summation over the domain.
Discretizing the TV term in (3.16) involves choosing a discretization scheme for evaluating
the gradient and a suitable norm to evaluate the magnitude of the gradient. The gradient in the
TV term can be discretized using finite differences with Neumann boundary conditions [5]:
\[ (\nabla u)_{i,j} = (u^x_{i,j}, u^y_{i,j}), \tag{3.18} \]
where $u^x_{i,j}$ and $u^y_{i,j}$ are computed as follows:
\[ u^x_{i,j} = \begin{cases} u_{i+1,j} - u_{i,j} & \text{if } i < M \\ 0 & \text{if } i = M \end{cases} \tag{3.19} \]
\[ u^y_{i,j} = \begin{cases} u_{i,j+1} - u_{i,j} & \text{if } j < N \\ 0 & \text{if } j = N. \end{cases} \tag{3.20} \]
We observe that in (3.14) the prior term represents the Euclidean distance between neighbors of a
pixel. Therefore, the L2 norm is an obvious choice for evaluating the TV term:
\begin{align}
\|\nabla u\|_2 &= \|(u^x_{i,j}, u^y_{i,j})\|_2 \tag{3.21} \\
&= \sqrt{(u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2}, \tag{3.22}
\end{align}
which, summed over all pixels, is exactly the $E_{prior}(u)$ term in the MRF model (3.14).
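A vectorized sketch of the forward differences (3.19) and (3.20) and of the resulting discrete TV term is given below. The function names `forward_gradient` and `discrete_tv` and the step image are hypothetical; array axis 0 is taken to correspond to the index i and axis 1 to the index j.

```python
import numpy as np

def forward_gradient(u):
    """Forward differences (3.19)-(3.20); the difference at the last row/column is zero."""
    u = np.asarray(u, dtype=float)
    ux = np.zeros_like(u)
    uy = np.zeros_like(u)
    ux[:-1, :] = u[1:, :] - u[:-1, :]   # u_{i+1,j} - u_{i,j} for i < M
    uy[:, :-1] = u[:, 1:] - u[:, :-1]   # u_{i,j+1} - u_{i,j} for j < N
    return ux, uy

def discrete_tv(u):
    """Sum over all pixels of the L2 norm (3.22) of the discrete gradient."""
    ux, uy = forward_gradient(u)
    return float(np.sum(np.sqrt(ux ** 2 + uy ** 2)))

step = np.zeros((4, 4))
step[:, 2:] = 1.0                        # vertical edge of height 1 across 4 rows
tv_step = discrete_tv(step)
print(tv_step)
```

For this step image the discrete TV equals the jump height times the number of rows crossed by the edge, which matches the usual interpretation of TV as the total amount of intensity change.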
3.4 The connection depends on the choice of potentials
Similarity between MRF and L2/TV denoising:
• The TV model used as an example in the previous section is well known by the name of
L2/TV denoising. The example in the previous section showed how a particular choice
of potential for the MRF denoising model makes it identical to the L2/TV model of
denoising. We can now comment on global or local optimization of the energy functional
in the MRF model. The L2/TV model of denoising is convex; therefore, the solution
obtained in MRF denoising using the potential given in (3.11) is a global minimum.
Similarity between MRF and L1/TV denoising:
• L1/TV denoising: The L1/TV denoising has the following energy functional
\[ E(u) = \int_{\Omega} |u - v| + \lambda \int_{\Omega} |\nabla u|, \tag{3.23} \]
where the symbols have the same meaning as in (3.16). S. Alliney [1], [2] discussed
this type of energy functional for one-dimensional signals in a discrete setting. This model
is frequently used for restoring images corrupted by impulse noise [17], [7]. The energy
functional of the L1/TV model is not strictly convex; therefore, its minimizer need not be
unique. Nevertheless, the model is well known for preserving contrast and geometry [16].
If we model the likelihood in the MRF as
\[ P(v_{i,j} \mid u_{i,j}) \propto \exp\left(-|u_{i,j} - v_{i,j}|\right) \tag{3.24} \]
and the prior as
\[ P(u_{i,j} \mid N(u_{i,j})) \propto \exp\left(-\lambda \left(|u_{i+1,j} - u_{i,j}| + |u_{i,j+1} - u_{i,j}|\right)\right), \tag{3.25} \]
then it can be verified, using the same steps as in the previous section, that this
leads to exactly the same expression as the discretized L1/TV model (3.23). Since the L1/TV
model of denoising is not strictly convex [16], uniqueness of the solution is not
guaranteed for the above choices of likelihood and prior.
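The lack of uniqueness can be seen in a toy example that is not part of the thesis: a one-dimensional signal with two pixels, observation v = (0, 1) and λ = 1, all of which are assumptions made for illustration. By the triangle inequality, the discrete L1/TV energy of any candidate is at least |1 − 0| = 1, and several different candidates attain this value.

```python
def l1_tv_energy(u, v, lam):
    """Discrete L1/TV energy for a one-dimensional signal (absolute differences)."""
    data = sum(abs(ui - vi) for ui, vi in zip(u, v))
    tv = sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))
    return data + lam * tv

v = (0.0, 1.0)                                   # hypothetical two-pixel observation
candidates = [(0.0, 1.0), (0.0, 0.0), (1.0, 1.0), (0.25, 0.75), (0.5, 0.5)]
energies = [l1_tv_energy(u, v, lam=1.0) for u in candidates]
print(energies)
```

All listed candidates reach the same minimal energy, so the minimizer is a whole set rather than a single signal, which is the behavior expected from an energy that is convex but not strictly convex.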
Chapter 4
Conclusion and Future work
Conclusion:
This work used convolution and denoising as an example to present a mathematical framework
for the connection between discrete and continuous models in biological image analysis. The main
questions that have been addressed are: 1. How the choice of discrete convolution filter is related
to the quadrature scheme used to discretize the continuous convolution corresponding to the same
filter? and 2. How the choice of potentials/priors in MRF is related to TVD formulation?
Chapter 2 started with taking an example of a three-point Gaussian filter and showed how
the expression for discrete convolution with the filter is exactly same as the expression of the
convolution integral discretized using composite Midpoint Rule with specific nodal points and
sample width. It was concluded that the error in discrete convolution using the three-point
discrete filter is same as the error in discretizing the convolution integral using composite
Midpoint Rule. The error analysis of all the quadrature schemes are given in Appendix A and
using an appropriate quadrature scheme with least error one can report the error which one
makes while choosing a specific discrete filter.
Chapter 3 showed how a denoising problem is modeled using Markov Random Fields and
variational methods, and that certain choices of potentials/priors in the MRF lead to exactly the same
expressions as the discretized L1/TV and L2/TV denoising models.
Future work:
We have a set of noisy images of size 256 × 256 that have been denoised with the L2/TV [20]
and L1/TV [7] models using the primal-dual algorithm, which is discussed in depth
in [6]. The denoised images can be used to test whether there exists
an artificial neural network (ANN), a discrete model, that can be sufficiently trained on the results
of L1/TV or L2/TV denoising, a continuous model. If this is possible, then the ANN can be
termed a “solver” for the denoising problem. The ANN would be faster at solving the denoising
problem because, in general, querying a sufficiently trained network takes much less time than
numerically solving a variational model. The main advantage would be a speed-up in
the response time of a denoising query¹.
Figure 4.1: Original image vs. noisy image; size: 256 × 256. Source: ImageJ.
Figure 4.2: Results of L2/TV denoising for different values of the regularization parameter λ.
Figure 4.3: Results of L1/TV denoising for different values of the regularization parameter λ.
¹ Input: noisy image; output: denoised image.
Appendix A
Theoretical notes
A.1 Convolution
The basic equation for convolution is as follows:
\begin{align}
C(x) = (f * g)(x) &= \int_{\Omega} f(t)\, g(x - t)\, dt \tag{A.1} \\
&= \int_{\Omega} f(x - t)\, g(t)\, dt, \tag{A.2}
\end{align}
which is also read as f convolved with g, where f and g are functions defined on Ω.
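The following discrete form of convolution is more common in the signal processing community:
\begin{align}
(f * g)[n] &= \sum_{m=-\infty}^{\infty} f[m]\, g[n - m] \tag{A.3} \\
&= \sum_{m=-\infty}^{\infty} f[n - m]\, g[m]. \tag{A.4}
\end{align}
Here, the set of indices $m$ for which $g[m]$ is nonzero is referred to as the support of $g$.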
For image processing, however, we use the discrete form with finite support. Let $g$ have
finite support in the set $\{-M, -M+1, \dots, M-1, M\}$; then (A.4) can be rewritten as a finite
sum:
\[ (f * g)[n] = \sum_{m=-M}^{M} f[n - m]\, g[m]. \tag{A.5} \]
For each pixel n, (A.5) computes the inner product of the current sample neighborhood¹ and the kernel² (reversed in m; for a symmetric kernel the reversal has no effect).
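The finite sum (A.5) can be sketched directly in Python. This is a minimal illustration and not the implementation used in this work: the function name, the zero padding at the signal boundaries, and the three-point kernel values are assumptions made for the example.

```python
import numpy as np

def convolve_finite_support(f, g):
    """Evaluate (A.5): (f * g)[n] = sum_{m=-M}^{M} f[n - m] g[m].

    f is a 1-D signal; g holds the 2M + 1 kernel values g[-M], ..., g[M]
    (so its length must be odd).
    Assumption (not from the text): samples of f outside the signal are zero.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    M = (len(g) - 1) // 2
    padded = np.pad(f, M)  # zero padding of width M on both sides
    out = np.zeros_like(f)
    for n in range(len(f)):
        for m in range(-M, M + 1):
            # f[n - m] is stored at index n - m + M of the padded array,
            # g[m] is stored at index m + M of the kernel array
            out[n] += padded[n - m + M] * g[m + M]
    return out

signal = np.array([0.0, 0.0, 1.0, 0.0, 0.0])   # unit impulse
kernel = np.array([0.25, 0.5, 0.25])           # hypothetical three-point kernel
result = convolve_finite_support(signal, kernel)
```

Convolving a unit impulse reproduces the kernel centered at the impulse, which is a quick sanity check of the indexing.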
A.2 Numerical integration methods
Many computational applications, such as image analysis, rely on numerical integration
methods to evaluate integrals. In this section, the Midpoint Rule, the Trapezoidal Rule, Simpson’s Rule,
and Gaussian quadrature are discussed. The main focus is on how one can discretize the
convolution integral. The mathematical error bounds are also discussed, which give an idea of
how a change in kernel size and sampling step can affect the accuracy of the result.
Notation:
$[a, b]$: interval of integration.
$n$: number of subintervals.
$[x_{i-1}, x_i]$: subinterval, where $x_i = a + ih$ and $i = 0, 1, \dots, n$.
$h$: step size, $h = x_i - x_{i-1} = (b - a)/n$.
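In the midpoint rule, we approximate the function $f(x)$ by its value at the midpoint $x_{i-1/2} = (x_{i-1} + x_i)/2$ of each subinterval,
and the area under the curve for each such subinterval is then given by $h f(x_{i-1/2})$, i.e.,
\begin{align}
\int_a^b f(x)\,dx &= \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \dots + \int_{x_{n-1}}^{x_n} f(x)\,dx \tag{A.6} \\
&\approx h f(x_{1/2}) + h f(x_{3/2}) + \dots + h f(x_{n-1/2}) \tag{A.7} \\
&= h \sum_{i=1}^{n} f(x_{i-1/2}) \tag{A.8} \\
&\equiv M_n(f). \tag{A.9}
\end{align}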
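A minimal Python sketch of the composite midpoint rule (A.8) is given below. The function name and the test integrand are illustrative choices, not part of this work.

```python
import math

def midpoint_rule(f, a, b, n):
    """Composite midpoint rule M_n(f) = h * sum_{i=1}^{n} f(x_{i-1/2})."""
    h = (b - a) / n
    # x_{i-1/2} = a + (i - 1/2) h is the midpoint of the i-th subinterval
    return h * sum(f(a + (i - 0.5) * h) for i in range(1, n + 1))

# Integral of sin(x) over [0, pi]; the exact value is 2
approx = midpoint_rule(math.sin, 0.0, math.pi, 100)
```

Because the rule samples each subinterval at its center, it integrates linear functions exactly, which is a convenient check of an implementation.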
In the trapezoidal rule, we approximate $f$ by a piecewise linear function. The function is
evaluated at the nodal points $x_i$, and the area under the curve for each subinterval is then
given by $0.5h\,(f(x_{i-1}) + f(x_i))$, i.e.,
\begin{align}
\int_a^b f(x)\,dx &= \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \dots + \int_{x_{n-1}}^{x_n} f(x)\,dx \tag{A.10} \\
&\approx 0.5h\,(f(x_0) + f(x_1)) + 0.5h\,(f(x_1) + f(x_2)) + \dots + 0.5h\,(f(x_{n-1}) + f(x_n)) \tag{A.11} \\
&= 0.5h\,(f(x_0) + 2f(x_1) + 2f(x_2) + \dots + 2f(x_{n-1}) + f(x_n)). \tag{A.12}
\end{align}
¹ Here, neighborhood refers to the set $\{n - M, \dots, n + M\}$, including $n$.
² Kernel, in image processing terminology, refers to the convolution matrix or mask which, when convolved with the
image, can produce effects such as blurring, edge detection, and many more.
In Simpson’s rule, in contrast to the trapezoidal rule, we approximate $f$ by a piecewise quadratic.
The function is evaluated at successive triplets of nodal points, and the area under
the curve for each pair of adjacent subintervals $[x_i, x_{i+2}]$ is of the form $\frac{h}{3}(f(x_i) + 4f(x_{i+1}) + f(x_{i+2}))$. For even $n$,
\begin{align}
\int_a^b f(x)\,dx &= \int_{x_0}^{x_2} f(x)\,dx + \int_{x_2}^{x_4} f(x)\,dx + \dots + \int_{x_{n-2}}^{x_n} f(x)\,dx \tag{A.13} \\
&\approx \frac{h}{3}(f(x_0) + 4f(x_1) + f(x_2)) + \frac{h}{3}(f(x_2) + 4f(x_3) + f(x_4)) + \dots + \frac{h}{3}(f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)) \tag{A.14} \\
&= \frac{h}{3}(f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \dots + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)). \tag{A.15}
\end{align}
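The composite trapezoidal and Simpson formulas can be sketched in Python as follows. The function names and the test integrand are illustrative assumptions; the weights follow the standard composite rules (1, 2, ..., 2, 1 for the trapezoidal rule and 1, 4, 2, 4, ..., 2, 4, 1 for Simpson’s rule).

```python
import math

def trapezoidal_rule(f, a, b, n):
    """Composite trapezoidal rule: 0.5*h*(f(x_0) + 2 f(x_1) + ... + 2 f(x_{n-1}) + f(x_n))."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return 0.5 * h * (f(a) + 2.0 * interior + f(b))

def simpson_rule(f, a, b, n):
    """Composite Simpson's rule with weights 1, 4, 2, 4, ..., 2, 4, 1 (n must be even)."""
    if n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))    # nodes weighted by 4
    even = sum(f(a + i * h) for i in range(2, n, 2))   # interior nodes weighted by 2
    return h / 3.0 * (f(a) + 4.0 * odd + 2.0 * even + f(b))

# Integral of sin(x) over [0, pi]; the exact value is 2
trap = trapezoidal_rule(math.sin, 0.0, math.pi, 10)
simp = simpson_rule(math.sin, 0.0, math.pi, 10)
```

For this smooth integrand and the same number of subintervals, the Simpson estimate is closer to the exact value than the trapezoidal one.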
Gaussian quadrature approximates the integral by a weighted sum as follows:
\[ \int_{-1}^{1} f(x)\,dx \approx \sum_{i=1}^{n} w_i f(x_i), \tag{A.16} \]
where the $w_i$ are weights³ and the $x_i$ are the quadrature nodes. Conventionally, the interval of integration is $[-1, 1]$, but the following
change of variables gives a formula for the interval $[a, b]$:
\[ \int_a^b f(x)\,dx = \frac{b - a}{2} \int_{-1}^{1} f\left(\frac{b - a}{2}\,x + \frac{a + b}{2}\right) dx. \tag{A.17} \]
Applying the Gauss quadrature rule (A.16) to (A.17) results in
\[ \int_a^b f(x)\,dx \approx \frac{b - a}{2} \sum_{i=1}^{n} w_i\, f\left(\frac{b - a}{2}\,x_i + \frac{a + b}{2}\right). \tag{A.18} \]
Table A.1 shows the $w_i$ and $x_i$ values for up to $n = 5$.

n    x_i          w_i
1    0            2
2    ±0.57735     1
3    0            0.888889
     ±0.774597    0.555556
4    ±0.339981    0.652145
     ±0.861136    0.347855
5    0            0.568889
     ±0.538469    0.478629
     ±0.90618     0.236927

Table A.1: Gauss–Legendre quadrature coefficients
³ See Table A.1.
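The change of interval in (A.18) can be sketched in Python. The sketch assumes NumPy is available and uses its `numpy.polynomial.legendre.leggauss` routine to obtain the nodes and weights instead of hard-coding Table A.1; the function name `gauss_legendre` is an illustrative choice, and the integrand is assumed to accept NumPy arrays.

```python
import numpy as np

def gauss_legendre(f, a, b, n):
    """Approximate the integral of f over [a, b] with n Gauss-Legendre nodes, as in (A.18)."""
    x, w = np.polynomial.legendre.leggauss(n)  # nodes and weights on [-1, 1]
    half_width = 0.5 * (b - a)
    center = 0.5 * (a + b)
    # Map the nodes from [-1, 1] to [a, b] and rescale the weighted sum
    return half_width * np.sum(w * f(half_width * x + center))

nodes, weights = np.polynomial.legendre.leggauss(3)
# Integral of x^4 over [0, 1]; the exact value is 1/5
integral = gauss_legendre(lambda x: x**4, 0.0, 1.0, 3)
```

With n = 3 the nodes and weights agree with the corresponding row of Table A.1, and polynomials up to degree 5 are integrated exactly up to rounding.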
Appendix B
Convolution Integral: reasoning from Plots and Data
This appendix discusses the results obtained from numerical integration methods for the convolution
integral: the errors, the convergence rate, and the number of computational operations. I started with a very simple
example of convolving $f(x)$ with $g(x)$, where $f(x)$ and $g(x)$ are real-valued functions. To study
the errors in numerical integration methods, I took a Gaussian-filter-like function $g(x) = e^{-x^2}$,
neglecting the constants and taking σ = 1, and another function $f(x) = x$. The exact solution
comes out to be $\sqrt{\pi}\,x$ (since $\int_{-\infty}^{\infty} e^{-t^2}\,dt = \sqrt{\pi}$ and the odd term $t\,e^{-t^2}$ integrates to zero), which can be evaluated at different indices of $f$. The convolution
integral was also discretized using different quadrature rules: the Simpson, Trapezoidal, and Midpoint
rules, and Gaussian quadrature.
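The setup described above can be sketched with the midpoint rule in Python. This is not the code behind the figures in this appendix: the truncation of the integration domain to [-5, 5], the number of subintervals, and the evaluation point are assumptions chosen only to make the example concrete.

```python
import math

def g(t):
    return math.exp(-t * t)   # Gaussian-like kernel from the text

def f(x):
    return x                  # signal from the text

def conv_midpoint(x, half_width=5.0, n=50):
    """Midpoint-rule approximation of (f * g)(x) = integral of f(x - t) g(t) dt.

    Assumption: the integral over the real line is truncated to
    [-half_width, half_width], outside of which g is negligibly small.
    """
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(1, n + 1):
        t = -half_width + (i - 0.5) * h   # midpoint of the i-th subinterval
        total += f(x - t) * g(t)
    return h * total

x0 = 1.5                                  # hypothetical evaluation point
exact = math.sqrt(math.pi) * x0
error = abs(conv_midpoint(x0) - exact)
```

The absolute error computed this way can then be tabulated against the number of nodal points, in the spirit of the error plots discussed in this appendix.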
Figure B.1: Absolute Errors in Numerical Integration Methods
B.1 Inference from Absolute error
Points to be noted from Figure B.1:
1. The Midpoint rule and the Trapezoidal rule converge quite fast compared to the other methods.
2. The Midpoint and Trapezoidal rules give an accuracy of up to $10^{-2}$ if an equivalent
five-point stencil is used, and the accuracy increases further as the number of nodal/partition
points reaches nine.
3. Gauss quadrature converges slowly before n = 3 but does quite well compared
to Simpson’s Rule after n = 5.
Figure B.2: Error analysis and Computational operations per index/pixel
B.2 Inference from number of computations
Points to be noted from Figure B.2:
1. The number of operations per pixel/index is lowest for the Midpoint Rule. Also, for the
Midpoint Rule, the accuracy is the highest among all the methods.
2. For a fixed n ≥ 5, Figures B.1 and B.2 suggest that Gaussian quadrature performs well
compared to Simpson’s Rule in terms of accuracy, although the number of operations
per pixel/index is the same for both.
Figure B.3: Error plots for numerical integration methods: absolute error (left) and error (right) vs. number of nodal/partition points.
B.3 Inference from Plots of different quadrature schemes
Points to be noted from Figure B.3:
1. The convergence rate of the Midpoint Rule and the Trapezoidal Rule is of the order $1/n^2$ [13],
and it is also evident from the plot that after n = 2 these two methods converge almost
together.
2. For n ≥ 5, almost all methods except Simpson’s Rule perform well. The curve for Simpson’s Rule
can be seen to oscillate around the zero-error baseline.
The Midpoint Rule does quite well in terms of accuracy if an analogous five-point stencil is taken
for discrete low-pass filtering. The convergence of all these methods has been well studied in
the past [19]. Therefore, with knowledge of the computational cost and the convergence rate, one can
decide which filter is well suited and how many sampling points are enough to
obtain a result of acceptable accuracy.
Appendix C
Illustrative example of MRF Denoising
C.1 Binary MRF Model formulation
The probabilistic formulation of Markov Random Fields has been skipped, and the focus is kept
on how to choose the potentials based on specific assumptions about the image and the noise.
The objective of this appendix is to test the MRF denoising theory on a binary image and study
how the energy function is defined. This example is inspired by a book on machine
learning [15].
Notation:
$u$: binary image to be recovered,
$v$: noisy image, both indexed by $(i, j)$.
$u_{i,j}, v_{i,j} \in \{1, -1\}$.
$N(u_{i,j})$: set of neighbors of pixel $u_{i,j}$.
Task: To recover $u$ from the given $v$.
Assumptions:
1. The noise is assumed to be small, which implies that there must be good correlation
between the values of each pixel in the two images, i.e., $u_{i,j}$ and $v_{i,j}$ should be correlated.
2. For a given pixel $u_{i,j}$, the other pixels in the image are conditionally independent of $u_{i,j}$, except
for the neighbors (Markov blanket) of this pixel, i.e., $u_{i,j}$ and $N(u_{i,j})$ are correlated.
Modeling the data term in Energy (likelihood): The first assumption introduces a data
association or similarity term, i.e., how close the solution is to the observed data. The energy of the
same pixel in the two images is represented as $-\eta\, u_{i,j} v_{i,j}$, where $\eta > 0$ is a constant.
Modeling the regularization/smoothness term in Energy (prior): The second assumption
introduces a smoothness or regularity term, i.e., how likely a given solution is a priori: if
it is oscillatory, then it is not likely to be a real image, but if it is smooth, then it is likely to
be the real image. To introduce smoothness, we assume that the value of pixel $u_{i,j}$ is conditionally
dependent only on its neighbors $u_{i\pm1,j\pm1}$; therefore, the prior is given by $-\zeta\, u_{i,j} u_{i\pm1,j\pm1}$, where
$\zeta > 0$ is a constant.
The total energy is given by
\[ E(u, v) = -\eta \sum_{i,j} u_{i,j}\, v_{i,j} - \zeta \sum_{i,j} u_{i,j}\, u_{i\pm1,j\pm1}, \tag{C.1} \]
where the pixel indices run from 1 to $N$ in both the x and y directions [15].
C.2 Algorithm for Binary MRF and results
Algorithm 1 MRF Denoising Algorithm for binary image
• Given a noisy image v and constant parameters ζ, η.
1: Loop over the pixels of u:
2: Compute the energy using equation (C.1) with the current pixel set to −1 and to 1, and
assign to the current pixel in u the value that gives the lower energy.
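Algorithm 1 can be sketched in Python as a single greedy sweep over the pixels. Several details are assumptions made for this sketch and are not fixed by the text: u is initialized with v, the neighborhood consists of the four nearest pixels, each neighboring pair enters the local energy once, pixels outside the image contribute nothing, and only the part of (C.1) that depends on the current pixel is evaluated. The default parameter values are those quoted in the caption of Figure C.1; the function name and the test image are hypothetical.

```python
import numpy as np

def denoise_binary_mrf(v, eta=2.1, zeta=1.5, sweeps=1):
    """Greedy pixelwise energy minimization for a binary (+1/-1) image."""
    v = np.asarray(v, dtype=int)
    u = v.copy()                      # assumption: start from the noisy image
    rows, cols = u.shape
    for _ in range(sweeps):
        for i in range(rows):
            for j in range(cols):
                # Sum over the four nearest neighbors that lie inside the image
                neighbor_sum = 0
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        neighbor_sum += u[ni, nj]
                # Terms of the energy that involve pixel (i, j), for both candidate values
                energy_plus = -eta * v[i, j] - zeta * neighbor_sum
                energy_minus = eta * v[i, j] + zeta * neighbor_sum
                u[i, j] = 1 if energy_plus < energy_minus else -1
    return u

clean = np.ones((8, 8), dtype=int)
noisy = clean.copy()
for position in [(2, 2), (5, 5), (2, 6)]:   # hypothetical isolated flipped pixels
    noisy[position] = -1
restored = denoise_binary_mrf(noisy)
```

In this small test, every flipped pixel is surrounded by four unflipped neighbors, so the smoothness term outweighs the data term and the sweep restores the clean image.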
Figure C.1: Using Algorithm 1 with η = 2.1, ζ = 1.5 and a noisy image corrupted by 10% uniformly distributed random noise as input, we get a denoised image having about 1.648% error.
Bibliography
[1] Stefano Alliney. “Digital filters as absolute norm regularizers”. In: IEEE Transactions on
Signal Processing 40.6 (1992), pp. 1548–1562.
[2] Stefano Alliney. “Recursive median filters of increasing order: a variational approach”. In:
IEEE Transactions on Signal Processing 44.6 (1996), pp. 1346–1354.
[3] Leah Bar et al. “Mumford and Shah model and its applications to image segmentation
and image restoration”. In: Handbook of mathematical methods in imaging. Springer, 2011,
pp. 1095–1157.
[4] Antonin Chambolle. “Total variation minimization and a class of binary MRF models”.
In: International Workshop on Energy Minimization Methods in Computer Vision and
Pattern Recognition. Springer. 2005, pp. 136–152.
[5] Antonin Chambolle and Thomas Pock. “A first-order primal-dual algorithm for convex
problems with applications to imaging”. In: Journal of mathematical imaging and vision
40.1 (2011), pp. 120–145.
[6] Antonin Chambolle et al. “An introduction to total variation for image analysis”. In:
Theoretical foundations and numerical methods for sparse recovery 9.263-340 (2010), p. 227.
[7] Tony F Chan and Selim Esedoglu. “Aspects of total variation regularized L 1 function
approximation”. In: SIAM Journal on Applied Mathematics 65.5 (2005), pp. 1817–1837.
[8] L. Chato, S. Latifi, and P. Kachroo. “Total variation denoising method to improve the
detection process in IR images”. In: 2017 IEEE 8th Annual Ubiquitous Computing, Elec-
tronics and Mobile Communication Conference (UEMCON). 2017, pp. 441–447. doi:
10.1109/UEMCON.2017.8249109.
[9] Geert De Cubber. “Variational methods for dense depth reconstruction from monocular
and binocular video sequences”. In: (2010).
[10] Hendrik Dirks. “Variational methods for joint motion estimation and image reconstruction”.
PhD thesis.
[11] Vania V Estrela, Hermes A Magalhaes, and Osamu Saotome. “Total Variation Applications
in Computer Vision”. In: Handbook of Research on Emerging Perspectives in Intelligent
Pattern Recognition, Analysis, and Image Processing (2015), p. 41.
[12] James Franklin. “Discrete and continuous: a fundamental dichotomy in mathematics”. In:
(2017).
[13] JG Jones. “On the numerical solution of convolution integral equations and systems of
such equations”. In: Mathematics of Computation (1961), pp. 131–142.
[14] Stan Z Li. Markov random field modeling in image analysis. Springer Science & Business
Media, 2009.
[15] Stephen Marsland. Machine learning: an algorithmic perspective. CRC press, 2015.
[16] Charles A Micchelli et al. “Proximity algorithms for the L1/TV image denoising model”.
In: Advances in Computational Mathematics 38.2 (2013), pp. 401–426.
[17] Mila Nikolova. “A variational approach to remove outliers and impulse noise”. In: Journal
of Mathematical Imaging and Vision 20.1-2 (2004), pp. 99–120.
[18] Jean Petitot. “An introduction to the Mumford–Shah segmentation model”. In: Journal
of Physiology-Paris 97.2-3 (2003), pp. 335–342.
[19] Ji Qiang. “A high-order fast method for computing convolution integral with smooth
kernel”. In: Computer Physics Communications 181.2 (2010), pp. 313–316.
[20] Leonid I Rudin, Stanley Osher, and Emad Fatemi. “Nonlinear total variation based noise
removal algorithms”. In: Physica D: nonlinear phenomena 60.1-4 (1992), pp. 259–268.
[21] V Nivitha Varghees, M Sabarimalai Manikandan, and Rolant Gini. “Adaptive MRI image
denoising using total-variation and local noise estimation”. In: Advances in Engineering,
Science and Management (ICAESM), 2012 International Conference on. IEEE. 2012,
pp. 506–511.