
Connections between discrete and continuous models for biological image analysis

Thesis Report

Submitted in partial fulfillment of the requirements of

BITS F422T Thesis

By

Anmol Kumar Gupta

ID No. 2013B4A7588P

Under the supervision of:

Dr. Ivo F. Sbalzarini

&

Dr. Rajiv Kumar

BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE PILANI, PILANI CAMPUS

May 2018

http://www.bits-pilani.ac.in/

Declaration of Authorship

I, Anmol Kumar Gupta, declare that this Thesis Report titled, ‘Connections between discrete

and continuous models for biological image analysis’ and the work presented in it are my own. I

confirm that:

This work was done wholly or mainly while in candidature for a research degree at this

University.

Where any part of this thesis has previously been submitted for a degree or any other

qualification at this University or any other institution, this has been clearly stated.

Where I have consulted the published work of others, this is always clearly attributed.

Where I have quoted from the work of others, the source is always given. With the

exception of such quotations, this thesis is entirely my own work.

I have acknowledged all main sources of help.

Where the thesis is based on work done by myself jointly with others, I have made clear

exactly what was done by others and what I have contributed myself.

Signed:

Date:


Certificate

This is to certify that the thesis report entitled, “Connections between discrete and continuous models for biological image analysis” and submitted by Anmol Kumar Gupta ID No.

2013B4A7588P in partial fulfillment of the requirements of BITS F422T Thesis embodies the

work done by him under my supervision.

Supervisor

Dr. Ivo F. Sbalzarini

Professor, Faculty of Computer Science

and Faculty of Mathematics,

Center for Systems Biology, TU Dresden,

Germany

Date:

Co-Supervisor

Dr. Rajiv Kumar

Professor, Department of Mathematics,

BITS-Pilani, Pilani Campus,

India

Date:


BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE PILANI

Abstract

Bachelor of Engineering (Hons.) Computer Science

Master of Science (Hons.) Mathematics

Thesis Report

by Anmol Kumar Gupta

Mathematical connections between discrete and continuous models in image analysis give important insights into the uniqueness and stability of the solution and help in choosing the model with the least error. Error bounds from the discrete model can be used to assess the quality of the solution of a continuous model; conversely, the physical interpretation of the continuous model can be used to rationalize the choice of functions/potentials in the discrete problem. We observe that there is a strong connection between discrete and continuous convolution, and that the errors of a discrete convolution filter can be studied via that connection. We also observe that the solution to denoising using Markov Random Fields, a discrete model, becomes identical to Total Variational denoising, a continuous model, when particular potential functions are chosen in the formulation of the Markov Random Field.


Acknowledgements

First of all, I would like to thank my supervisor Dr. Ivo F. Sbalzarini. He offered me an interesting

topic which deals with the dichotomy: continuous and discrete in the area of biological image

analysis. I have enjoyed all the discussions, suggestions, and help which he offered during group

meetings and one-on-one interactions. Much of my current progress is owed to the ideas resulting from said discussions.

I am also grateful to Dr. Rajiv Kumar for agreeing to be the co-supervisor from BITS. I appreciate his taking care of all the formalities of the University while I was writing my off-campus thesis.

Next, I would like to thank Karl, a Ph.D. student at the MOSAIC group, for his suggestions

regarding calculations in Chapter 2. I have learned a lot from him in terms of mathematical

reasoning. Also, I would like to thank Joel, a Masters student at the MOSAIC group, for the

discussions on discretized continuous convolution.

Lastly, I would like to thank each and every member of the MOSAIC group for making the workplace full of fun and open to discussions.


Contents

Declaration of Authorship
Certificate
Abstract
Acknowledgements
Contents
List of Figures
List of Tables

1 Introduction

2 Connection between discrete and continuous Convolution
2.1 Convolution
2.1.1 Continuous Convolution
2.1.2 Discrete Convolution
2.2 Mathematical framework for connection
2.2.1 Selecting a discrete filter
2.2.2 Writing the expression for convolution corresponding to the filter
2.2.3 Discretizing the continuous convolution
2.2.4 Comparing the algebraic expressions and solving for nodal points
2.3 Connection between quadrature schemes and discrete convolution filter

3 Connection between Markov Random Fields and Variational Model for a denoising problem
3.1 Introduction
3.2 Preliminaries
3.2.1 Markov Random Field
3.2.2 Variational Methods
3.2.3 Bayesian Framework
3.3 Mathematical framework for connection
3.3.1 Defining MRF model for denoising
3.3.2 Discretizing the Total Variational model
3.4 The connection depends on the choice of potentials

4 Conclusion and Future work

A Theoretical notes
A.1 Convolution
A.2 Numerical integration methods

B Convolution Integral: reasoning from Plots and Data
B.1 Inference from Absolute error
B.2 Inference from number of computations
B.3 Inference from Plots of different quadrature schemes

C Illustrative example of MRF Denoising
C.1 Binary MRF Model formulation
C.2 Algorithm for Binary MRF and results

List of Figures

2.1 Three-point Gaussian kernel
2.2 Discrete Convolution
2.3 Mid point rule to evaluate convolution
3.1 Graph structure for MRF
4.1 Test image for denoising
4.2 L2/TV denoising result
4.3 L1/TV denoising result
B.1 Absolute Errors in Numerical Integration
B.2 Computational Operations
B.3 Error plots for Numerical integration methods
C.1 MRF toy example results

List of Tables

A.1 Gauss–Legendre quadrature coefficients

Chapter 1

Introduction

The distinction between discrete and continuous models is a fundamental dichotomy in mathematics and related fields [12]. In image analysis, both approaches to solving a problem have their own importance. On the one hand, a continuous model can yield an analytical or numerical solution of a problem; on the other hand, solving the same problem with a discrete model under some assumptions can give a speed-up at the cost of an error. However, an analytical solution of a continuous problem is not always available, and the assumptions under which a discrete model solves the problem are not always known, so that the model acts like a black box. In both scenarios, studying the two models on simple examples can help us understand the link between these two worlds and give a theoretical estimate of the suitability of a model for solving a problem, or even suggest a proper mix of steps when both approaches are used.

This work aims to establish a link between the two worlds, discrete and continuous, in the area of image analysis by studying two important processes: convolution and denoising. Chapter 2 provides a mathematical framework for the connection between the continuous and the discrete models of convolution, using a low-pass Gaussian filter as an example. In Chapter 3, a well-known denoising problem is studied from a continuous perspective using Variational Models and from a discrete perspective using Markov Random Fields (MRFs). In the last chapter, we present the results obtained from denoising an image using the L1/TV and L2/TV models and propose a framework with which a connection between variational models and artificial neural networks can be sought.


Chapter 2

Connection between discrete and continuous Convolution

The motivation to study the connection between discrete and continuous convolution is to have a mathematical framework for reasoning precisely about the error and the convergence of a discrete convolution filter.

2.1 Convolution

Convolution is the backbone of all filtering techniques and is the most frequently used operation in image processing. Analyzing the connection therefore provides a general framework for finding connections between discrete and continuous models in the filtering domain. Convolution is also mathematically simple, which makes each of its terms easy to analyze: in the discrete setting it is a summation of pixel-wise multiplications, and in the continuous setting it is an integral of the pointwise product of two functions.

2.1.1 Continuous Convolution

The basic equation for convolution is as follows:

\begin{align}
C(x) = (f * g)(x) &= \int_{\Omega} f(t)\, g(x - t)\, dt \tag{2.1} \\
&= \int_{\Omega} f(x - t)\, g(t)\, dt, \tag{2.2}
\end{align}

also read as f convolved with g, where f and g are functions defined on Ω.


2.1.2 Discrete Convolution

The following discrete form of convolution is more common in the signal processing community:

\begin{align}
(f * g)[n] &= \sum_{m=-\infty}^{\infty} f[m]\, g[n - m] \tag{2.3} \\
&= \sum_{m=-\infty}^{\infty} f[n - m]\, g[m]. \tag{2.4}
\end{align}

Here, the set of indices m for which g[m] is nonzero is referred to as the support of g. For image processing, however, we use the discrete form with a finite support. Let g have finite support in the set $\{-M, -M+1, \ldots, M-1, M\}$; then (2.4) can be rewritten as a finite summation:

\begin{equation}
(f * g)[n] = \sum_{m=-M}^{M} f[n - m]\, g[m]. \tag{2.5}
\end{equation}

For each pixel, (2.5) computes the inner product of the current sample neighborhood¹ and the kernel².

2.2 Mathematical framework for connection

Notations used:

f [n], g[m]: discrete image pixels and filter kernel respectively.

f(x), g(t): continuous representation of image and filter respectively.

$x_i$, $t_i$: nodal points in the quadrature scheme used to index f(x) and g(t) respectively.

We propose the following stepwise procedure to show the connection between discrete and

continuous convolution:

• Step 1: Selecting a discrete filter.

• Step 2: Writing the expression for discrete convolution with the filter chosen.

• Step 3: Discretizing the convolution integral.

• Step 4: Comparing the algebraic expressions of Step 2 and 3 term by term and solving

for nodal points.

After following the above steps one can easily reason about the equivalence of the discrete filter

and the choice of quadrature scheme characterized by unique nodal points.

¹ Here, neighborhood refers to the set $\{n-M, \ldots, n+M\}$, including n.

² Kernel, in image processing terminology, refers to the convolution matrix or mask which, when convolved with the image, can produce effects like blurring, edge detection and many more.


Figure 2.1: Normalized three-point Gaussian kernel.

Figure 2.2: A single location in discrete convolution.

2.2.1 Selecting a discrete filter

The first step is to choose a filter kernel to be convolved with the image. To completely define a discrete filter, we need to know the filter type (e.g. Mean, Median, Gaussian, Laplacian) and the kernel dimension. During the discretization of the convolution integral, the kernel dimension decides the number of nodal points to be used in the quadrature scheme³. For the analysis in the upcoming sections, we choose the three-point Gaussian kernel $\frac{1}{4}[1, 2, 1]$ shown in Figure 2.1.

2.2.2 Writing the expression for convolution corresponding to the filter

To compare the discrete and the continuous convolution, we first write the algebraic expression for the discrete convolution corresponding to the filter chosen in the previous step. For a given one-dimensional image f indexed by $n \in \{0, 1, \ldots, N-2, N-1\}$, the discrete convolution at each pixel n with the filter $g = \frac{1}{4}[1, 2, 1]$ is as follows:

\begin{align}
(f * g)[n] &= \sum_{m=-1}^{1} f[n - m]\, g[m] \tag{2.6} \\
&= \frac{1}{4}\left(1 \cdot f[n-1] + 2 \cdot f[n] + 1 \cdot f[n+1]\right). \tag{2.7}
\end{align}

Without loss of generality, we can choose zero padding, i.e. $f[n] = 0$ for $n \notin \{0, 1, \ldots, N-2, N-1\}$, so that the above equation does not index outside the image at the boundary pixels.
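As a small illustration of (2.7) with zero padding, the following sketch applies the filter to a short one-dimensional signal and compares the result with NumPy's built-in convolution. The function name `smooth_121` and the sample signal are hypothetical choices made only for this example.

```python
import numpy as np

def smooth_121(f):
    """Apply the filter 1/4 [1, 2, 1] to a 1-D signal with zero padding, as in (2.7)."""
    f = np.asarray(f, dtype=float)
    # Zero padding: f[n] = 0 outside the index set 0..N-1
    padded = np.concatenate(([0.0], f, [0.0]))
    # padded[k] holds f[k-1], so the three slices are f[n-1], f[n], f[n+1]
    return 0.25 * (padded[:-2] + 2.0 * padded[1:-1] + padded[2:])

f = np.array([4.0, 8.0, 0.0, 4.0])      # hypothetical sample signal
g = np.array([1.0, 2.0, 1.0]) / 4.0     # three-point Gaussian kernel

out = smooth_121(f)
# mode="same" keeps the central N samples of the zero-padded full convolution
reference = np.convolve(f, g, mode="same")
```

Because the kernel is symmetric, flipping it (as convolution formally requires) has no effect, which is why the explicit sum and `np.convolve` agree here.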

³ Numerical integration method for discretizing the convolution integral.


2.2.3 Discretizing the continuous convolution

The convolution integral for an image f convolved with a filter g having finite support $t \in [-k, k] \subset \mathbb{R}$ is as follows:

\begin{equation}
C(x) = (f * g)(x) = \int_{-k}^{k} f(x - t)\, g(t)\, dt. \tag{2.8}
\end{equation}

There are two questions which arise at this point:

• Which quadrature scheme should be used for discretizing the convolution integral in (2.8)? In principle, one can use any quadrature scheme, such as the Midpoint Rule, Simpson's Rule or the Trapezoidal Rule, to discretize the integral. There are, however, two points to be noted. First, the error in evaluating the integral varies with the quadrature scheme chosen. Second, for the same quadrature scheme, choosing a different kernel width for the discrete filter changes the convergence rate, which gives an idea of how the discrete filter converges as the kernel width is increased (see Appendix B).

• How many nodal points should be chosen for the quadrature scheme? The answer depends on the size of the kernel. The nodal points indicate where the continuous filter function inside the integral is sampled. Therefore, the number of sample points for the continuous filter function should match the size of the corresponding discrete kernel. Since we have chosen a three-point discrete Gaussian kernel, the number of nodal points to be used in the quadrature scheme is three.

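Applying the composite Midpoint Rule to (2.8) gives

\begin{align}
(f * g)(x) &= \int_{-k}^{k} f(x - t)\, g(t)\, dt \tag{2.9} \\
&= \int_{t_0}^{t_1} f(x - t)\, g(t)\, dt + \int_{t_1}^{t_2} f(x - t)\, g(t)\, dt + \int_{t_2}^{t_3} f(x - t)\, g(t)\, dt \tag{2.10} \\
&\approx h f(x - t_{1/2})\, g(t_{1/2}) + h f(x - t_{3/2})\, g(t_{3/2}) + h f(x - t_{5/2})\, g(t_{5/2}), \tag{2.11}
\end{align}

where $t_0 = -k$, $t_3 = k$, $t_{i-1/2} = (t_i + t_{i-1})/2$ and h is the sampling width.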

2.2.4 Comparing the algebraic expressions and solving for nodal points

We compare the algebraic expressions (2.7) and (2.11), obtained in Step 2 and Step 3 respectively, to calculate the nodal points of the chosen quadrature scheme. Note that the image is sampled at the same points in both expressions, so the image terms cancel, leading to

\begin{align}
h\, g(t_{1/2}) &= 1/4, \tag{2.12} \\
h\, g(t_{3/2}) &= 2/4, \tag{2.13} \\
h\, g(t_{5/2}) &= 1/4. \tag{2.14}
\end{align}

Figure 2.3: Nodal points of the Midpoint Rule. The areas of the boxes numbered 1, 2, 3 over the Gaussian curve are in the ratio 1:2:1, corresponding to the discrete filter chosen.

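Observation: Since the Midpoint Rule is applied on the interval $[-k, k]$, by symmetry

\begin{equation}
t_{3/2} = 0. \tag{2.15}
\end{equation}

Eliminating h from equations (2.12) and (2.13), we obtain

\begin{equation}
g(t_{1/2}) = \frac{1}{2}\, g(t_{3/2}). \tag{2.16}
\end{equation}

A continuous Gaussian filter can be represented as

\begin{equation}
g(t) = \frac{1}{c} \exp\left(-\frac{t^2}{2\sigma^2}\right), \tag{2.17}
\end{equation}

where c is the normalizing constant and σ is the standard deviation. Using (2.16) and (2.17), we obtain

\begin{equation}
-\frac{t_{1/2}^2}{2\sigma^2} = -\ln 2 - \frac{t_{3/2}^2}{2\sigma^2} \tag{2.18}
\end{equation}

and finally, using (2.15), we get

\begin{equation}
t_{1/2} = -\sqrt{2 \ln 2}\, \sigma. \tag{2.19}
\end{equation}

Similarly, we solve for $t_{5/2}$ to get

\begin{equation}
t_{5/2} = \sqrt{2 \ln 2}\, \sigma. \tag{2.20}
\end{equation}

Also,

\begin{equation}
h = t_{3/2} - t_{1/2} = \sqrt{2 \ln 2}\, \sigma. \tag{2.21}
\end{equation}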

The nodal points and the sample width h completely define the quadrature scheme. Therefore, a discrete convolution with the filter $\frac{1}{4}[1, 2, 1]$ is equivalent to the continuous convolution discretized by the three-point Midpoint Rule characterized by the nodal points given by (2.19), (2.15), (2.20) and the sample width given by (2.21).
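The result can be checked numerically. The sketch below samples the continuous Gaussian (2.17) at the nodal points (2.19), (2.15), (2.20) and multiplies by the sample width (2.21). The value of `sigma` is an arbitrary illustrative choice, and the normalizing constant `c = 2h` is not stated in the text; it is derived here from (2.13) together with (2.15), i.e. from h g(0) = h / c = 1/2.

```python
import math

sigma = 1.5  # hypothetical standard deviation; any sigma > 0 works

# Nodal points (2.19), (2.15), (2.20) and sample width (2.21)
h = math.sqrt(2.0 * math.log(2.0)) * sigma
nodes = [-h, 0.0, h]

# Normalizing constant implied by (2.13) with t_{3/2} = 0: h * g(0) = h / c = 1/2
c = 2.0 * h

def g(t):
    """Continuous Gaussian filter (2.17)."""
    return math.exp(-t * t / (2.0 * sigma * sigma)) / c

# Quadrature weights h * g(t_i); these should reproduce the kernel 1/4 [1, 2, 1]
weights = [h * g(t) for t in nodes]
```

The weights do not depend on the particular value of `sigma`, since both the nodal points and the sample width scale with it.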

2.3 Connection between quadrature schemes and discrete con-

volution filter

In this section, we discuss how the error and the convergence rate of a discrete convolution filter can be studied using the quadrature scheme chosen to discretize the convolution integral.

• A note on error analysis: If we sample the Gaussian curve exactly at the points given by (2.15), (2.19) and (2.20) for the continuous convolution discretized by the Midpoint Rule, we get exactly the same algebraic expression as for the three-point Gaussian convolution filter $\frac{1}{4}[1, 2, 1]$. Therefore, the error in the discrete convolution is exactly the same as the error of the quadrature scheme used for discretizing the convolution integral. Different quadrature schemes, however, have different error bounds, so one can choose a suitable quadrature scheme to report the lowest error. Appendix B discusses the errors of different quadrature schemes.

• A note on convergence rate: The kernel width (window size) is directly related to the number of nodal points in the quadrature scheme. Since the convergence rate of a quadrature scheme depends on the number of nodal points (see Appendix B), one can study how a discrete convolution filter converges as the window size is increased.

In summary, convolution with a discrete filter is the same as the continuous convolution discretized using a specific quadrature scheme.
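The dependence of the quadrature error on the number of nodal points can be illustrated with a short experiment. This is only a sketch under stated assumptions: the test image `f = cos`, the unit-variance Gaussian, the evaluation point `x`, the truncation `k` and the node counts are all hypothetical choices, and the reference value is itself a very fine midpoint approximation rather than an analytical result.

```python
import numpy as np

def gaussian(t, sigma=1.0):
    """Normalized Gaussian filter with standard deviation sigma."""
    return np.exp(-t**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

def midpoint_convolution(f, g, x, k, n):
    """Composite midpoint approximation of int_{-k}^{k} f(x - t) g(t) dt with n nodes."""
    h = 2.0 * k / n                        # sample width
    t = -k + h * (np.arange(n) + 0.5)      # midpoints of the n subintervals
    return h * np.sum(f(x - t) * g(t))

x, k = 0.5, 3.0
# Reference value from a very fine grid (assumed accurate enough for comparison)
reference = midpoint_convolution(np.cos, gaussian, x, k, 200001)

# Absolute error for kernels with 3, 9 and 27 nodal points
errors = {n: abs(midpoint_convolution(np.cos, gaussian, x, k, n) - reference)
          for n in (3, 9, 27)}
```

In this setting the absolute error shrinks as the number of nodal points, and hence the window size of the equivalent discrete filter, grows.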

Chapter 3

Connection between Markov Random Fields and Variational Model for a denoising problem

The motivation to study the connection between Markov Random Fields and the Variational Model for denoising is to see how choosing a particular prior in Markov Random Fields (discrete) is connected to the optimization of the energy functional of Variational methods (continuous).

3.1 Introduction

In this chapter, we consider another classical problem apart from convolution, namely denoising. Denoising is the process of recovering the original image from a corrupted or noisy image. Image denoising is well studied in the computer vision, image processing, and signal processing communities. In all these communities there are two main groups: those who work with continuous models such as variational models, contour-based and level-set methods [3], and those who work with discrete models, for instance probabilistic graphical models like Markov Random Fields (MRFs). There have been many advancements in both fields [3], [11], [14], [20], but much less work has been done on establishing the connection between these two kinds of models [4].

We take a simple denoising problem and discuss how to model it using Markov Random Fields, a discrete model. We then ask which assumptions in Total Variational denoising (TVD), a continuous model, lead to exactly the same expressions for the data fidelity and prior terms as in the MRF energy equation. Both models are based on the Bayesian framework. We therefore start with the hypothesis that, for a specific formulation of the MRF potentials, there must be a mathematical connection with a TVD model which solves the same problem in the continuous domain. In other words, we can ask which discretization scheme in TVD leads to exactly the same formulation of the energy as in the MRF.

3.2 Preliminaries

3.2.1 Markov Random Field

A Markov Random Field is a countable set of random variables whose dependencies are represented by an undirected graph; in image analysis applications, the variables are indexed by the spatial positions of the image pixels. The model ingredients which completely define an MRF include the following:

1. a set of random variables, defining the number of unknown variables in the model,

2. an undirected graph representing a family of distributions characterized by the Markov blanket¹,

3. the range of the random variables, defining their permissible values, and

4. potential functions defining the relationship between the random variables according to the a priori assumptions about the model.

3.2.2 Variational Methods

Variational Methods are a specific class of optimization methods. The key idea is to define a cost functional which is then minimized or maximized, depending on the problem. Variational methods are widely used in image analysis in the areas of denoising [20], segmentation [18], motion estimation [10], dense 3D reconstruction tracking [9]. One advantage of Variational methods is that the cost functional can be analyzed in depth, allowing statements about the uniqueness and stability of the solution. Uniqueness is guaranteed if the energy functional is strictly convex. The stability of the solution depends on how much the solution varies when the initial conditions of the problem, for instance the nature of the noise, are changed.

Rudin, Osher, and Fatemi [20] used the total variation as an image regularization method to remove noise while preserving edges. Later, TV proved efficient in solving various inverse problems in imaging such as deblurring, restoration, and inpainting [8], [21]. Mathematically, the TV of a function u defined on $\mathbb{R}^n$ with $n > 1$ is defined as the integral of its gradient magnitude, i.e.

\begin{equation}
\|u\|_{TV} = \int \|\nabla u\|. \tag{3.1}
\end{equation}

¹ Set of neighbors of a node (random variable) in an undirected graph following the Markovian property of conditional independence.


3.2.3 Bayesian Framework

Denoising is an inverse problem, and MRF and TVD are connected via the Bayesian framework: the former maximizes the probability of an event under Markovian assumptions of conditional independence, and the latter minimizes a continuous energy functional involving the TV term (3.1).

The posterior distribution for our denoising model is as follows:

\begin{equation}
\text{Probability}(\text{true image} \mid \text{noisy image}) = P(u \mid v). \tag{3.2}
\end{equation}

Applying Bayes' rule to (3.2) we obtain

\begin{equation}
P(u \mid v) = \frac{P(v \mid u)\, P(u)}{P(v)}, \tag{3.3}
\end{equation}

where $P(u \mid v)$ is the posterior, $P(v \mid u)$ is the likelihood of the noisy image given the true image, $P(u)$ is called the prior and $P(v)$ is a normalization term (constant), or

\begin{equation}
\text{Posterior} \propto \text{Likelihood} \times \text{Prior}. \tag{3.4}
\end{equation}

The goal is to maximize the posterior or, in other words, to minimize the negative logarithm of the posterior, i.e. the energy of the model. Taking the negative logarithm on both sides of (3.4) we get

\begin{equation}
\text{minimize } E(u) = E_{\text{data}}(u, v) + \lambda E_{\text{prior}}(u), \tag{3.5}
\end{equation}

where $E_{\text{data}}$ is the data association term, which takes into account the nature of the noise as well as the relationship between the true and the noisy image, $E_{\text{prior}}$ is the regularity term, which takes into account how likely a given solution (true image) is a priori, and $\lambda > 0$ is the regularization parameter, which controls the smoothness of the expected image. This is the general form of the optimization model used by both MRF and TVD, as illustrated in the upcoming sections.
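The step from Bayes' rule (3.3) to the energy (3.5) can be written out explicitly. The following is a sketch of the usual argument; identifying the negative log-likelihood with the data term and the negative log-prior with the scaled regularity term is an interpretation consistent with the definitions above, not a formula quoted from the source.

```latex
\begin{align*}
-\log P(u \mid v)
  &= -\log P(v \mid u) \;-\; \log P(u) \;+\; \log P(v) \\
  &= \underbrace{-\log P(v \mid u)}_{E_{\text{data}}(u,\, v)}
     \;+\; \underbrace{-\log P(u)}_{\lambda E_{\text{prior}}(u)}
     \;+\; \text{const.}
\end{align*}
```

Since $\log P(v)$ does not depend on u, dropping it does not change the minimizer, so maximizing the posterior and minimizing E(u) select the same image.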

3.3 Mathematical framework for connection

Notations used:

u: true image.

v: noisy image.

N(i, j): neighborhood of the pixel (i, j).

In this section, we provide a simple formulation to analyze the connection by focusing on the choice of potentials in MRFs. We first define the MRF and then discretize the TVD so that both models can be compared term-wise. Then we reason about how the choice of potentials in the MRF affects the convexity of the problem.

Figure 3.1: Here, the white nodes denote the variables corresponding to the true image u and the grey nodes denote the variables corresponding to the noisy image v. Each $v_{i,j}$ is connected to the corresponding $u_{i,j}$, and each $u_{i,j}$ is connected to its neighbors.

3.3.1 Defining MRF model for denoising

If the image size is M × N, then there are 2 × M × N random variables in our model, represented by $u_{i,j}$ and $v_{i,j}$ with $1 \le i \le M$ and $1 \le j \le N$, i.e. one pair of variables for each of the M × N pixels. The undirected graph is given by the pixel grids of both images, with the pixels as vertices; the edges are the neighborhood connections together with the connections between true image pixels and noisy image pixels, as shown in Figure 3.1.

The goal is to maximize the posterior probability $P(u \mid v)$, i.e. the probability of the true image given the noisy version of it:

\begin{align}
u &= \arg\max_{u} P(u \mid v) \tag{3.6} \\
&= \arg\max_{u} \underbrace{P(v \mid u)}_{\text{Likelihood}} \cdot \underbrace{P(u)}_{\text{Prior}}. \tag{3.7}
\end{align}

Modeling the likelihood: To model the likelihood we assume that for each pixel (i, j), the measured intensity, $v_{i,j}$, is given by the true intensity, $u_{i,j}$, plus additive noise. For illustration purposes we take the noise to be Gaussian, for which the likelihood is as follows:

\begin{equation}
P(v_{i,j} \mid u_{i,j}) \propto \exp\left(-\frac{(u_{i,j} - v_{i,j})^2}{2\sigma^2}\right). \tag{3.8}
\end{equation}

All measurements are mutually independent; therefore the likelihood for the entire image v is

\begin{equation}
P(v \mid u) = \prod_{i,j} P(v_{i,j} \mid u) = \prod_{i,j} P(v_{i,j} \mid u_{i,j}) \propto \prod_{i,j} \exp\left(-\frac{(u_{i,j} - v_{i,j})^2}{2\sigma^2}\right). \tag{3.9}
\end{equation}


Modeling the prior: We expand the prior as follows:

\begin{equation}
P(u) = \prod_{i,j} P(u_{i,j} \mid N(u_{i,j})), \tag{3.10}
\end{equation}

where we assume a Markov property, namely that the probability of $u_{i,j}$ is sufficiently characterized by its neighbors $N(u_{i,j})$. Assuming a simple smoothness prior, we have

\begin{equation}
P(u) \propto \prod_{i,j} \exp\left(-\lambda \sqrt{(u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2}\right), \tag{3.11}
\end{equation}

where $\lambda > 0$ is a regularization parameter controlling the smoothness.

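With these assumptions the posterior distribution is given by

\begin{equation}
P(u \mid v) \propto \prod_{i,j} \exp\left(-\frac{(u_{i,j} - v_{i,j})^2}{2\sigma^2}\right) \cdot \prod_{i,j} \exp\left(-\lambda \sqrt{(u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2}\right). \tag{3.12}
\end{equation}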

Instead of maximizing this probability distribution, one can equivalently minimize its negative logarithm, which is given by the energy

\begin{align}
\text{minimize } E(u) &= -\log P(u \mid v) \tag{3.13} \\
&= \underbrace{\sum_{i,j} \frac{(u_{i,j} - v_{i,j})^2}{2\sigma^2}}_{E_{\text{data}}(u,\, v)} + \lambda \underbrace{\sum_{i,j} \sqrt{(u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2}}_{E_{\text{prior}}(u)} \tag{3.14}
\end{align}

and thus the expected true image is

\begin{equation}
u = \arg\min_{u} E(u), \tag{3.15}
\end{equation}

which is analogous to (3.6) but defined for the energy functional instead of the posterior probability.

3.3.2 Discretizing the Total Variational model

According to the total variational model for denoising of Rudin, Osher and Fatemi [20], the energy functional is given by

\begin{equation}
E(u) = \int_{\Omega} \|u - v\|^2 + \lambda \int_{\Omega} \|\nabla u\|, \tag{3.16}
\end{equation}

where Ω is the domain of the images u and v, and $\lambda > 0$ is the regularization parameter.


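Discretizing the first term in (3.16), i.e. $\int_{\Omega} \|u - v\|^2$, is simple, since the integral over discrete pixels $(i, j) \in \Omega$ is equivalent to a summation over the whole pixel grid, i.e.

\begin{equation}
\int_{\Omega} (u - v)^2 \approx \sum_{i,j} (u_{i,j} - v_{i,j})^2, \tag{3.17}
\end{equation}

which equals the $E_{\text{data}}(u, v)$ term in the MRF model (3.14) up to the constant factor $1/(2\sigma^2)$; this factor can be absorbed by rescaling the regularization parameter λ without changing the minimizer. Note that discretizing $\int_{\Omega} \|u - v\|^2$ using the composite Midpoint quadrature scheme yields the same result, because the sample points are the discrete pixels (i, j) and therefore the step size is h = 1, leading to a simple summation over the domain.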

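Discretizing the TV term in (3.16) involves choosing a discretization scheme for evaluating the gradient and a suitable norm to evaluate the magnitude of the gradient. The gradient in the TV term can be discretized by using finite differences with Neumann boundary conditions [5]:

\begin{equation}
(\nabla u)_{i,j} = (u^x_{i,j},\, u^y_{i,j}), \tag{3.18}
\end{equation}

where $u^x_{i,j}$ and $u^y_{i,j}$ are computed as follows:

\begin{equation}
u^x_{i,j} = \begin{cases} u_{i+1,j} - u_{i,j} & \text{if } i < M \\ 0 & \text{if } i = M \end{cases} \tag{3.19}
\end{equation}

\begin{equation}
u^y_{i,j} = \begin{cases} u_{i,j+1} - u_{i,j} & \text{if } j < N \\ 0 & \text{if } j = N. \end{cases} \tag{3.20}
\end{equation}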

We observe that in (3.14) the prior term is the Euclidean norm of the differences between a pixel and its neighbors. Therefore, the L2 norm is an obvious choice for evaluating the TV term:

\begin{align}
\|(\nabla u)_{i,j}\|_2 &= \|(u^x_{i,j},\, u^y_{i,j})\|_2 \tag{3.21} \\
&= \sqrt{(u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2}, \tag{3.22}
\end{align}

which, summed over all pixels, is exactly equal to the $E_{\text{prior}}(u)$ term in the MRF model (3.14).
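The term-by-term agreement between the discretized TV energy and the MRF energy (3.14) can be checked numerically. The sketch below is illustrative only: the function names, the random test images and the parameter values are hypothetical. Two assumptions are made explicit. First, differences that would reach outside the pixel grid are set to zero in the MRF sum, mirroring the boundary handling of (3.19) and (3.20). Second, σ is chosen so that 2σ² = 1, because the data term of (3.14) carries a factor 1/(2σ²) that (3.16) does not.

```python
import numpy as np

def forward_differences(u):
    """Forward differences (3.19)-(3.20); the last row/column difference is 0."""
    ux = np.zeros_like(u)
    uy = np.zeros_like(u)
    ux[:-1, :] = u[1:, :] - u[:-1, :]   # u[i+1, j] - u[i, j] for i < M
    uy[:, :-1] = u[:, 1:] - u[:, :-1]   # u[i, j+1] - u[i, j] for j < N
    return ux, uy

def tv_energy(u, v, lam):
    """Discretized (3.16): squared residuals plus lam times the L2 norm of the gradient."""
    ux, uy = forward_differences(u)
    return np.sum((u - v) ** 2) + lam * np.sum(np.sqrt(ux**2 + uy**2))

def mrf_energy(u, v, lam, sigma):
    """MRF energy (3.14), accumulated pixel by pixel from the potentials."""
    M, N = u.shape
    data, prior = 0.0, 0.0
    for i in range(M):
        for j in range(N):
            data += (u[i, j] - v[i, j]) ** 2 / (2.0 * sigma**2)
            # Assumption: differences leaving the grid contribute 0
            dx = u[i + 1, j] - u[i, j] if i < M - 1 else 0.0
            dy = u[i, j + 1] - u[i, j] if j < N - 1 else 0.0
            prior += np.sqrt(dx**2 + dy**2)
    return data + lam * prior

rng = np.random.default_rng(0)
v = rng.random((5, 4))       # hypothetical noisy image
u = rng.random((5, 4))       # hypothetical candidate image
sigma = np.sqrt(0.5)         # 2 * sigma^2 = 1, so both data terms coincide

e_tv = tv_energy(u, v, lam=0.3)
e_mrf = mrf_energy(u, v, lam=0.3, sigma=sigma)
```

For any other σ the two energies differ only by the constant factor on the data term, which corresponds to a rescaling of λ.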

3.4 The connection depends on the choice of potentials

Similarity between MRF and L2/TV denoising:

• The TV model used as an example in the previous section is well known by the name of L2/TV denoising. The example showed how a particular choice of potentials for the MRF denoising model makes it identical to the L2/TV model of denoising. We can now comment on the global or local optimality of minimizers of the energy functional in the MRF model. The L2/TV model of denoising is convex; therefore, the solution obtained in MRF denoising using the potential given in (3.11) is a global optimum.


Similarity between MRF and L1/TV denoising:

• L1/TV denoising: The L1/TV denoising model has the following energy functional:

\begin{equation}
E(u) = \int_{\Omega} |u - v| + \lambda \int_{\Omega} |\nabla u|, \tag{3.23}
\end{equation}

where the meaning of the symbols remains the same as in (3.16). S. Alliney [1], [2] discussed this type of energy functional for one-dimensional signals in a discrete setting. This model is frequently used for restoring images corrupted by impulse noise [17], [7]. The energy functional of the L1/TV model is not strictly convex; therefore its minimizer need not be unique. The model is, however, well known for preserving contrast and geometry [16].

If we model the likelihood in the MRF as

\begin{equation}
P(v_{i,j} \mid u_{i,j}) \propto \exp\left(-|u_{i,j} - v_{i,j}|\right) \tag{3.24}
\end{equation}

and the prior as

\begin{equation}
P(u_{i,j} \mid N(u_{i,j})) \propto \exp\left(-\lambda \left(|u_{i+1,j} - u_{i,j}| + |u_{i,j+1} - u_{i,j}|\right)\right), \tag{3.25}
\end{equation}

then it can be verified, using the same steps as in the previous section, that this leads to exactly the same expression as the discretized L1/TV model (3.23). Since the L1/TV model of denoising is not strictly convex [16], uniqueness of the solution is not guaranteed for the above choices of likelihood and prior.
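The loss of uniqueness can be seen in a very small example. The sketch below uses a hypothetical two-pixel, one-dimensional signal and λ = 1, both chosen only for illustration, with a one-dimensional analogue of the discretized energy (3.23). By the triangle inequality, |u₁ − 0| + |u₂ − u₁| + |1 − u₂| ≥ 1 for every u, so any candidate with energy 1 is a global minimizer.

```python
def l1_tv_energy(u, v, lam):
    """1-D analogue of discretized (3.23): sum |u_i - v_i| + lam * sum |u_{i+1} - u_i|."""
    data = sum(abs(a - b) for a, b in zip(u, v))
    prior = sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))
    return data + lam * prior

v = [0.0, 1.0]   # hypothetical two-pixel noisy signal
lam = 1.0        # illustrative regularization parameter

# Several distinct candidates; each attains the lower bound 1 and is therefore optimal
candidates = [[0.0, 1.0], [0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [0.0, 0.25]]
energies = [l1_tv_energy(u, v, lam) for u in candidates]

# A candidate outside the optimal set, for comparison
worse = l1_tv_energy([2.0, 2.0], v, lam)
```

All listed candidates share the same minimal energy, so the minimizer is a whole set of signals rather than a single one, which is what the lack of strict convexity allows.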

Chapter 4

Conclusion and Future work

Conclusion:

This work used convolution and denoising as an example to present a mathematical framework

for the connection between discrete and continuous models in biological image analysis. The main

questions that have been addressed are: 1. How the choice of discrete convolution filter is related

to the quadrature scheme used to discretize the continuous convolution corresponding to the same

filter? and 2. How the choice of potentials/priors in MRF is related to TVD formulation?

Chapter 2 took the example of a three-point Gaussian filter and showed that the expression for
discrete convolution with this filter is exactly the expression obtained by discretizing the
convolution integral with the composite Midpoint Rule for specific nodal points and sample
width. It was concluded that the error of discrete convolution with the three-point filter is the
same as the error of discretizing the convolution integral with the composite Midpoint Rule. The
error analysis of the quadrature schemes is given in Appendix A and, using an appropriate
quadrature scheme with the least error, one can report the error made when choosing a specific
discrete filter.

Chapter 3 showed how a denoising problem is modeled using Markov Random Fields and
variational methods, and that certain choices of potentials/priors in the MRF lead to exactly the
same expressions as the discretized L1/TV and L2/TV denoising models.

Future work:

We have a set of noisy images of size 256 x 256 that have been denoised with the L2/TV [20]
and L1/TV [7] models using the Primal-Dual Algorithm, which is discussed in depth in [6].
The denoised images can be used to test whether an artificial neural network (ANN), a discrete
model, can be trained sufficiently well on the results of L1/TV or L2/TV denoising, a continuous
model. If this is possible, then the ANN can be


termed a “solver” for the denoising problem. The ANN would solve the denoising problem
faster because, in general, querying a sufficiently trained network takes much less time than
numerically solving a variational model. The main advantage would be a shorter response time
for a denoising query¹.

Figure 4.1: Original image vs. noisy image; size: 256 x 256. Source: ImageJ.

Figure 4.2: Results of L2/TV denoising for different values of the regularization parameter λ.

Figure 4.3: Results of L1/TV denoising for different values of the regularization parameter λ.

¹ Input: noisy image; output: denoised image.
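To make the cost of "performing a numerical solution of a variational model" concrete, the following is a minimal sketch of a first-order primal-dual iteration for a discretized L2/TV model. It is not the implementation used to produce the figures: the anisotropic TV term, the placement of λ on the TV term, the forward differences with dropped boundary terms, the step sizes τ = σ = 1/√8, the iteration count, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def grad(u):
    """Forward differences; differences reaching outside the image are zero."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[:-1, :] += px[:-1, :]
    d[1:, :] -= px[:-1, :]
    d[:, :-1] += py[:, :-1]
    d[:, 1:] -= py[:, :-1]
    return d

def tv_l2_energy(u, f, lam):
    """0.5 * ||u - f||^2 + lam * (anisotropic total variation of u)."""
    gx, gy = grad(u)
    return 0.5 * np.sum((u - f) ** 2) + lam * (np.abs(gx).sum() + np.abs(gy).sum())

def tv_l2_primal_dual(f, lam, iterations=300):
    """Primal-dual iteration for min_u 0.5*||u - f||^2 + lam*TV(u)."""
    tau = sigma = 1.0 / np.sqrt(8.0)   # tau * sigma * 8 <= 1
    u = f.copy()
    u_bar = f.copy()
    px = np.zeros_like(f)
    py = np.zeros_like(f)
    for _ in range(iterations):
        # Dual ascent step followed by projection onto |p| <= lam (per component).
        gx, gy = grad(u_bar)
        px = np.clip(px + sigma * gx, -lam, lam)
        py = np.clip(py + sigma * gy, -lam, lam)
        # Primal descent step: proximal map of the quadratic data term.
        u_old = u
        u = (u + tau * div(px, py) + tau * f) / (1.0 + tau)
        # Over-relaxation of the primal variable.
        u_bar = 2.0 * u - u_old
    return u

# Hypothetical test image: a bright square with additive Gaussian noise.
rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[8:24, 8:24] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
lam = 0.1
denoised = tv_l2_primal_dual(noisy, lam)
print(tv_l2_energy(noisy, noisy, lam), tv_l2_energy(denoised, noisy, lam))
```

Each query of such a solver costs several hundred sweeps over the image, which is the cost a trained network would be meant to avoid.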

Appendix A

Theoretical notes

A.1 Convolution

The basic equation for convolution is as follows:

\begin{align}
C(x) = (f * g)(x) &= \int_{\Omega} f(t)\, g(x - t)\, dt \tag{A.1}\\
&= \int_{\Omega} f(x - t)\, g(t)\, dt, \tag{A.2}
\end{align}

also called f convolved with g, where f and g are functions defined on Ω.

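The following discrete form of convolution is better known in the signal processing community:

\begin{align}
(f * g)[n] &= \sum_{m=-\infty}^{\infty} f[m]\, g[n - m] \tag{A.3}\\
&= \sum_{m=-\infty}^{\infty} f[n - m]\, g[m]. \tag{A.4}
\end{align}

Here, the set of indices m for which g[m] is nonzero is referred to as the support of g.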

For image processing, however, we use the discrete form with a finite support. Let g have
finite support in the set $\{-M, -M+1, \ldots, M-1, M\}$; then (A.4) can be rewritten as a finite
summation:

$$(f * g)[n] = \sum_{m=-M}^{M} f[n - m]\, g[m]. \tag{A.5}$$


For each pixel n, (A.5) computes the inner product of the current sample neighborhood¹ and the kernel².
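The finite sum (A.5) can be written out directly, as in the following sketch. The function name `finite_support_convolve`, the storage convention (the kernel value g[m] sits at array position m + M), the zero treatment of samples of f outside the array, and the example kernel are assumptions for illustration. The result is compared with NumPy's `np.convolve` in `"same"` mode, which uses the same index convention.

```python
import numpy as np

def finite_support_convolve(f, g, M):
    """Evaluate (f * g)[n] = sum_{m=-M}^{M} f[n - m] g[m] for every index n of f.

    g is an array of length 2M + 1 holding g[-M], ..., g[M].
    Assumption: samples of f outside the array are treated as zero.
    """
    N = len(f)
    out = np.zeros(N)
    for n in range(N):
        for m in range(-M, M + 1):
            k = n - m                      # index into f
            if 0 <= k < N:
                out[n] += f[k] * g[m + M]  # g[m] is stored at position m + M
    return out

f = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
g = np.array([0.25, 0.5, 0.25])            # hypothetical three-point smoothing kernel, M = 1
result = finite_support_convolve(f, g, M=1)
reference = np.convolve(f, g, mode="same")
print(np.allclose(result, reference))
```

Because the index of f runs backwards as m increases, the neighborhood is traversed in reverse order relative to the kernel; for a symmetric kernel this makes no difference.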

A.2 Numerical integration methods

Many computational applications, such as image analysis, rely on numerical integration
methods to evaluate integrals. This section discusses the Midpoint Rule, the Trapezoidal Rule,
Simpson’s Rule and Gaussian quadrature, with the main focus on how to discretize the
convolution integral. The mathematical error bounds are also discussed; they indicate how a
change in kernel size and sampling step affects the accuracy of the result.

Notations:

[a, b]: interval of integration.

n: number of subintervals.

$[x_{i-1}, x_i]$: subinterval, where $x_i = a + ih$ and $i = 0, 1, \ldots, n$.

$h$: step size, mathematically $h = x_i - x_{i-1} = (b - a)/n$.

In the midpoint rule, we approximate the function f(x) by its value at the midpoint
$x_{i-1/2} = (x_{i-1} + x_i)/2$ of each subinterval, and the area under the curve over each such
subinterval is then given by $h f(x_{i-1/2})$, i.e.,

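\begin{align}
\int_a^b f(x)\,dx &= \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \cdots + \int_{x_{n-1}}^{x_n} f(x)\,dx \tag{A.6}\\
&\approx h f(x_{1/2}) + h f(x_{3/2}) + \cdots + h f(x_{n-1/2}) \tag{A.7}\\
&= h \sum_{i=1}^{n} f(x_{i-1/2}) \tag{A.8}\\
&\equiv M_n(f). \tag{A.9}
\end{align}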
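A minimal sketch of the composite midpoint rule (A.8) follows. The function name `midpoint_rule` and the test integrand are assumptions for illustration; the integrand must accept a NumPy array.

```python
import numpy as np

def midpoint_rule(f, a, b, n):
    """Composite midpoint rule M_n(f) = h * sum_{i=1}^{n} f(x_{i-1/2})."""
    h = (b - a) / n
    midpoints = a + (np.arange(1, n + 1) - 0.5) * h   # x_{i-1/2}, i = 1..n
    return h * np.sum(f(midpoints))

# Example: the integral of sin over [0, pi] is exactly 2.
approx_midpoint = midpoint_rule(np.sin, 0.0, np.pi, 100)
print(abs(approx_midpoint - 2.0))
```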

In the trapezoidal rule, we approximate f by a piecewise linear function. The function value is
calculated at the nodal points $x_i$, and the area under the curve over each subinterval is then
given by $0.5h\,(f(x_{i-1}) + f(x_i))$, i.e.,

\begin{align}
\int_a^b f(x)\,dx &= \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \cdots + \int_{x_{n-1}}^{x_n} f(x)\,dx \tag{A.10}\\
&\approx 0.5h\,(f(x_0) + f(x_1)) + 0.5h\,(f(x_1) + f(x_2)) + \cdots + 0.5h\,(f(x_{n-1}) + f(x_n)) \tag{A.11}\\
&= 0.5h\,(f(x_0) + 2f(x_1) + 2f(x_2) + \cdots + 2f(x_{n-1}) + f(x_n)). \tag{A.12}
\end{align}
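The composite trapezoidal rule can be sketched in the same way; every interior node carries weight 2 and the two end nodes carry weight 1. The function name `trapezoidal_rule` and the test integrand are assumptions for illustration.

```python
import numpy as np

def trapezoidal_rule(f, a, b, n):
    """Composite trapezoidal rule on n subintervals of width h = (b - a) / n."""
    h = (b - a) / n
    y = f(a + h * np.arange(n + 1))        # f(x_0), ..., f(x_n)
    return 0.5 * h * (y[0] + 2.0 * np.sum(y[1:-1]) + y[-1])

# Example: the integral of sin over [0, pi] is exactly 2.
approx_trapezoid = trapezoidal_rule(np.sin, 0.0, np.pi, 100)
print(abs(approx_trapezoid - 2.0))
```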

¹ Here, neighborhood refers to the set $\{n-M, \ldots, n+M\}$, including $n$.

² Kernel, in image processing terminology, refers to the convolution matrix or mask which, when
convolved with the image, can produce effects such as blurring, edge detection and many more.


In Simpson’s rule, in contrast to the trapezoidal rule, we approximate f by a piecewise quadratic
function. The function is evaluated at successive triplets of nodal points, and the area under the
curve over each pair of adjacent subintervals $[x_i, x_{i+2}]$ is of the form
$\frac{h}{3}(f(x_i) + 4f(x_{i+1}) + f(x_{i+2}))$, which requires $n$ to be even, i.e.,

\begin{align}
\int_a^b f(x)\,dx &= \int_{x_0}^{x_2} f(x)\,dx + \int_{x_2}^{x_4} f(x)\,dx + \cdots + \int_{x_{n-2}}^{x_n} f(x)\,dx \tag{A.13}\\
&\approx \frac{h}{3}(f(x_0) + 4f(x_1) + f(x_2)) + \frac{h}{3}(f(x_2) + 4f(x_3) + f(x_4)) + \cdots + \frac{h}{3}(f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)) \tag{A.14}\\
&= \frac{h}{3}(f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)). \tag{A.15}
\end{align}
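A sketch of the composite Simpson rule (A.15) is given below. Odd-indexed interior nodes receive weight 4 and even-indexed interior nodes weight 2, so the number of subintervals must be even. The function name `simpson_rule` and the test integrand are assumptions for illustration.

```python
import numpy as np

def simpson_rule(f, a, b, n):
    """Composite Simpson rule on n subintervals; n must be even."""
    if n < 2 or n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    h = (b - a) / n
    y = f(a + h * np.arange(n + 1))        # f(x_0), ..., f(x_n)
    odd = np.sum(y[1:-1:2])                # f(x_1), f(x_3), ..., f(x_{n-1})
    even = np.sum(y[2:-1:2])               # f(x_2), f(x_4), ..., f(x_{n-2})
    return h / 3.0 * (y[0] + 4.0 * odd + 2.0 * even + y[-1])

# Simpson's rule integrates cubic polynomials exactly:
# the integral of x^3 over [0, 2] is 4.
approx_simpson = simpson_rule(lambda x: x ** 3, 0.0, 2.0, 2)
print(approx_simpson)
```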

Gaussian quadrature approximates the integral by a weighted sum as follows,

$$\int_{-1}^{1} f(x)\,dx \approx \sum_{i=1}^{n} w_i f(x_i), \tag{A.16}$$

where the $w_i$ are weights³. Conventionally, the interval of integration is $[-1, 1]$, but the
following conversion scheme can be applied to obtain a formula for the interval $[a, b]$:

$$\int_a^b f(x)\,dx = \frac{b - a}{2} \int_{-1}^{1} f\left(\frac{b - a}{2}\,x + \frac{a + b}{2}\right) dx. \tag{A.17}$$

Applying the Gauss quadrature rule (A.16) to (A.17) results in the following:

$$\int_a^b f(x)\,dx \approx \frac{b - a}{2} \sum_{i=1}^{n} w_i\, f\left(\frac{b - a}{2}\,x_i + \frac{a + b}{2}\right). \tag{A.18}$$

Table A.1 shows the $w_i$ and $x_i$ values for up to $n = 5$.

n    x_i          w_i
1    0            2
2    ±0.57735     1
3    0            0.888889
     ±0.774597    0.555556
4    ±0.339981    0.652145
     ±0.861136    0.347855
5    0            0.568889
     ±0.538469    0.478629
     ±0.90618     0.236927

Table A.1: Gauss–Legendre quadrature coefficients

³ Refer to Table A.1.
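The mapped rule (A.18) can be sketched with NumPy's `numpy.polynomial.legendre.leggauss`, which returns the Gauss–Legendre nodes and weights on [−1, 1]. The function name `gauss_legendre` and the test integrand are assumptions for illustration; the printed nodes and weights for n = 3 can be compared with the corresponding row of Table A.1.

```python
import numpy as np

def gauss_legendre(f, a, b, n):
    """n-point Gauss-Legendre quadrature on [a, b] via the affine map in (A.17)."""
    x, w = np.polynomial.legendre.leggauss(n)   # nodes and weights on [-1, 1]
    half_width = 0.5 * (b - a)
    midpoint = 0.5 * (a + b)
    return half_width * np.sum(w * f(half_width * x + midpoint))

# Nodes and weights for n = 3, for comparison with Table A.1.
nodes3, weights3 = np.polynomial.legendre.leggauss(3)
print(nodes3, weights3)

# An n-point rule is exact for polynomials up to degree 2n - 1:
# the integral of x^5 over [0, 1] is 1/6.
approx_gauss = gauss_legendre(lambda x: x ** 5, 0.0, 1.0, 3)
print(abs(approx_gauss - 1.0 / 6.0))
```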

Appendix B

Convolution Integral: reasoning from Plots and Data

This appendix discusses the results obtained from numerical integration methods for the
convolution integral: the errors, the convergence rate and the number of computational
operations. I started with a very simple example of convolving f(x) with g(x), where f(x) and
g(x) are real-valued functions. To study the errors of the numerical integration methods, I took
a Gaussian-filter-like function $g(x) = e^{-x^2}$, neglecting the constants and taking σ = 1, and
another function f(x) = x. The exact solution comes out to be $\sqrt{\pi}\,x$, since
$\int_{-\infty}^{\infty} (x - t)\, e^{-t^2}\, dt = \sqrt{\pi}\,x$, and it can be evaluated at the
different indices of f. The convolution integral was also discretized using different quadrature
rules: the Simpson, Trapezoidal and Midpoint rules, and Gaussian quadrature.
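The experiment described above can be sketched for the Midpoint Rule as follows. The truncation of the integration domain to [−L, L], the value L = 6, the evaluation point x = 2 and the node counts are assumptions chosen for illustration; they are not the settings behind the figures in this appendix.

```python
import numpy as np

def midpoint_convolution(f, g, x, L, n):
    """Approximate (f * g)(x) = integral of f(x - t) g(t) dt over [-L, L]
    with the composite midpoint rule on n subintervals."""
    h = 2.0 * L / n
    t = -L + (np.arange(1, n + 1) - 0.5) * h   # midpoints of the subintervals
    return h * np.sum(f(x - t) * g(t))

f = lambda s: s                    # f(x) = x
g = lambda t: np.exp(-t ** 2)      # Gaussian-like kernel
x = 2.0
exact = np.sqrt(np.pi) * x         # exact value of the convolution at x

# Absolute error for an increasing number of nodes (hypothetical choices).
errors = {n: abs(midpoint_convolution(f, g, x, L=6.0, n=n) - exact)
          for n in (3, 5, 9, 17, 41)}
for n, err in errors.items():
    print(n, err)
```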

Figure B.1: Absolute Errors in Numerical Integration Methods


B.1 Inference from Absolute error

Points to be noted from Figure B.1:

1. The Midpoint Rule and the Trapezoidal Rule converge considerably faster than the other
methods.

2. The Midpoint and Trapezoidal rules give an accuracy of up to $10^{-2}$ if an equivalent
five-point stencil is used, and the accuracy increases further as the number of nodal/partition
points reaches nine.

3. Although Gauss quadrature converges slowly before n = 3, it does quite well compared to
Simpson’s Rule after n = 5.

Figure B.2: Error analysis and Computational operations per index/pixel

B.2 Inference from number of computations


1. The number of operations per pixel/index is lowest for the Midpoint Rule. The Midpoint
Rule also has the highest accuracy among all the methods.

2. For a fixed n ≥ 5, Figures B.1 and B.2 suggest that Gaussian quadrature performs well
compared to Simpson’s Rule in terms of accuracy, although the number of operations per
pixel/index is the same for both.


Figure B.3: Error plots for numerical integration methods: absolute error (left) and error (right) vs. number of nodal/partition points.

B.3 Inference from Plots of different quadrature schemes


1. The convergence rate of the Midpoint Rule and the Trapezoidal Rule is of order $1/n^2$ [13],
and it is also evident from the plot that after n = 2 these two methods converge almost
together.

2. For n ≥ 5, almost all methods except Simpson’s Rule perform well. The curve for Simpson’s
Rule can be seen to oscillate around the zero-error baseline.

The Midpoint Rule does quite well in terms of accuracy if an analogous 5-point stencil is taken
for discrete low-pass filtering. The convergence of all these methods has been well studied in
the past [19]. Therefore, knowing the computational cost and the convergence rate, one can
decide which filter is well suited and how many sampling points are enough to obtain a result of
acceptable accuracy.

Appendix C

Illustrative example of MRF Denoising

C.1 Binary MRF Model formulation

The probabilistic formulation of Markov Random Fields has been skipped, and the focus is kept
on how to choose the potentials based on specific assumptions about the image and the noise.
The objective of this appendix is to test the MRF denoising theory on a binary image and to
study how the energy function is defined. This example is inspired by a book on machine
learning [15].

Notations:

$u$: binary image to be recovered,

$v$: noisy image, both indexed by $(i, j)$.

$u_{i,j}, v_{i,j} \in \{1, -1\}$.

$N(u_{i,j})$: set of neighbors of pixel $u_{i,j}$.

Task: To recover u from given v.

Assumptions:

1. The noise is assumed to be small, which implies that the values of each pixel in the two
images are strongly correlated, i.e. $u_{i,j}$ and $v_{i,j}$ should be correlated.

2. Given its neighbors (its Markov blanket), a pixel $u_{i,j}$ is conditionally independent of all
other pixels in the image, i.e. $u_{i,j}$ and $N(u_{i,j})$ are correlated.

Modeling the data term in Energy (likelihood): The first assumption introduces a data
association or similarity term, i.e. how close the solution is to the observed noisy image. The
energy of the same pixel in the two images is represented as $-\eta\, u_{i,j} v_{i,j}$, where
η > 0 is a constant.


Modeling the regularization/smoothness term in Energy (prior): The second assumption
introduces a smoothness or regularity term, i.e. how likely a given solution is a priori: if it is
oscillatory, it is unlikely to be a real image, but if it is smooth, it is likely to be one. To
introduce smoothness, we assume that the value of pixel $u_{i,j}$ is conditionally dependent only
on its neighbors $u_{i\pm1,j\pm1}$; the prior is therefore given by
$-\zeta\, u_{i,j} u_{i\pm1,j\pm1}$, where ζ > 0 is a constant.

The total energy is given by:

$$E(u, v) = -\eta \sum_{i,j} u_{i,j} v_{i,j} - \zeta \sum_{i,j} u_{i,j} u_{i\pm1,j\pm1}, \tag{C.1}$$

where the pixel indices run from 1 to N in both the x and y directions [15].

C.2 Algorithm for Binary MRF and results

Algorithm 1 MRF Denoising Algorithm for binary image

• Given a noisy image v and constant parameters ζ, η.

1: Loop over the pixels of u:

2: Compute the energy using equation (C.1) with the current pixel set to −1 and to 1; update the
current pixel in u to whichever value gives the lower energy.

Figure C.1: Using Algorithm 1 with η = 2.1, ζ = 1.5 and a noisy image corrupted by 10% uniformly distributed random noise as input, we get a denoised image with about 1.648% error.
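Algorithm 1 can be sketched as follows. Since only the terms of (C.1) that involve the current pixel change when that pixel is flipped, comparing the two total energies reduces to checking the sign of a local field. The 4-neighborhood (up, down, left, right), the counting of each neighbor pair once, the synthetic test image, the number of sweeps and all function names are assumptions for illustration; the sketch does not aim to reproduce the error value reported in Figure C.1.

```python
import numpy as np

def mrf_energy(u, v, eta, zeta):
    """Energy in the spirit of (C.1), assuming a 4-neighborhood with each pair counted once."""
    data = -eta * np.sum(u * v)
    smooth = -zeta * (np.sum(u[1:, :] * u[:-1, :]) + np.sum(u[:, 1:] * u[:, :-1]))
    return data + smooth

def denoise_binary_mrf(v, eta, zeta, sweeps=5):
    """Sweep over the pixels and set each one to the value (+1 or -1) of lower energy."""
    u = v.copy()
    rows, cols = u.shape
    for _ in range(sweeps):
        changed = False
        for i in range(rows):
            for j in range(cols):
                neighbors = 0
                if i > 0:
                    neighbors += u[i - 1, j]
                if i < rows - 1:
                    neighbors += u[i + 1, j]
                if j > 0:
                    neighbors += u[i, j - 1]
                if j < cols - 1:
                    neighbors += u[i, j + 1]
                # The energy with pixel value s is -s * field + const,
                # so +1 is preferred when field > 0 and -1 when field < 0.
                field = eta * v[i, j] + zeta * neighbors
                if field > 0:
                    new_value = 1
                elif field < 0:
                    new_value = -1
                else:
                    new_value = u[i, j]     # tie: keep the current value
                if new_value != u[i, j]:
                    u[i, j] = new_value
                    changed = True
        if not changed:
            break
    return u

# Hypothetical test: a square on a background with 10% of the pixels flipped.
rng = np.random.default_rng(0)
clean = -np.ones((32, 32), dtype=int)
clean[8:24, 8:24] = 1
flip = rng.random(clean.shape) < 0.10
noisy = np.where(flip, -clean, clean)
denoised = denoise_binary_mrf(noisy, eta=2.1, zeta=1.5)
err_before = np.mean(noisy != clean)
err_after = np.mean(denoised != clean)
print(err_before, err_after)
```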

Bibliography

[1] Stefano Alliney. “Digital filters as absolute norm regularizers”. In: IEEE Transactions on

Signal Processing 40.6 (1992), pp. 1548–1562.

[2] Stefano Alliney. “Recursive median filters of increasing order: a variational approach”. In:

IEEE Transactions on Signal Processing 44.6 (1996), pp. 1346–1354.

[3] Leah Bar et al. “Mumford and Shah model and its applications to image segmentation
and image restoration”. In: Handbook of mathematical methods in imaging. Springer, 2011,
pp. 1095–1157.

[4] Antonin Chambolle. “Total variation minimization and a class of binary MRF models”.

In: International Workshop on Energy Minimization Methods in Computer Vision and

Pattern Recognition. Springer. 2005, pp. 136–152.

[5] Antonin Chambolle and Thomas Pock. “A first-order primal-dual algorithm for convex

problems with applications to imaging”. In: Journal of mathematical imaging and vision

40.1 (2011), pp. 120–145.

[6] Antonin Chambolle et al. “An introduction to total variation for image analysis”. In:

Theoretical foundations and numerical methods for sparse recovery 9.263-340 (2010), p. 227.

[7] Tony F Chan and Selim Esedoglu. “Aspects of total variation regularized L1 function
approximation”. In: SIAM Journal on Applied Mathematics 65.5 (2005), pp. 1817–1837.

[8] L. Chato, S. Latifi, and P. Kachroo. “Total variation denoising method to improve the
detection process in IR images”. In: 2017 IEEE 8th Annual Ubiquitous Computing, Electronics
and Mobile Communication Conference (UEMCON). 2017, pp. 441–447. doi:
10.1109/UEMCON.2017.8249109.

[9] Geert De Cubber. “Variational methods for dense depth reconstruction from monocular

and binocular video sequences”. In: (2010).

[10] Hendrik Dirks. “Variational methods for joint motion estimation and image reconstruction”.

PhD thesis.


[11] Vania V Estrela, Hermes A Magalhaes, and Osamu Saotome. “Total Variation Applications

in Computer Vision”. In: Handbook of Research on Emerging Perspectives in Intelligent

Pattern Recognition, Analysis, and Image Processing (2015), p. 41.

[12] James Franklin. “Discrete and continuous: a fundamental dichotomy in mathematics”. In:

(2017).

[13] JG Jones. “On the numerical solution of convolution integral equations and systems of

such equations”. In: Mathematics of Computation (1961), pp. 131–142.

[14] Stan Z Li. Markov random field modeling in image analysis. Springer Science & Business

Media, 2009.

[15] Stephen Marsland. Machine learning: an algorithmic perspective. CRC press, 2015.

[16] Charles A Micchelli et al. “Proximity algorithms for the L1/TV image denoising model”.

In: Advances in Computational Mathematics 38.2 (2013), pp. 401–426.

[17] Mila Nikolova. “A variational approach to remove outliers and impulse noise”. In: Journal

of Mathematical Imaging and Vision 20.1-2 (2004), pp. 99–120.

[18] Jean Petitot. “An introduction to the Mumford–Shah segmentation model”. In: Journal

of Physiology-Paris 97.2-3 (2003), pp. 335–342.

[19] Ji Qiang. “A high-order fast method for computing convolution integral with smooth

kernel”. In: Computer Physics Communications 181.2 (2010), pp. 313–316.

[20] Leonid I Rudin, Stanley Osher, and Emad Fatemi. “Nonlinear total variation based noise

removal algorithms”. In: Physica D: nonlinear phenomena 60.1-4 (1992), pp. 259–268.

[21] V Nivitha Varghees, M Sabarimalai Manikandan, and Rolant Gini. “Adaptive MRI image

denoising using total-variation and local noise estimation”. In: Advances in Engineering,

Science and Management (ICAESM), 2012 International Conference on. IEEE. 2012,

pp. 506–511.
