Computer Vision and Graphics (ee2031): Digital Image Processing I. Dr John Collomosse ([email protected]), Centre for Vision, Speech and Signal Processing, University of Surrey.



Page 1:

Computer Vision and Graphics (ee2031) Digital Image Processing I

Dr John Collomosse [email protected]

Centre for Vision, Speech and Signal Processing University of Surrey

Page 2:

Learning Outcomes

After attending this lecture, and doing the reading and labwork, you should be able to:

•  Describe the basic framework for performing linear filtering on a digital image (convolution)

•  Implement image blurring and sharpening operations.

•  Compare and contrast several low-pass filters and describe their operation in the context of image processing.

•  Define the Fourier transform in both continuous and discrete terms for the 1D and 2D cases.

•  Describe the convolution theorem, its links to the Fourier transform, and its implications for digital image processing.

Credit: some images in these slides are from Noah Snavely (Cornell), David Lowe (UBC), Steve Seitz (Washington), and various Creative Commons sources.

Page 3:

Further reading:

Page 4:

What is an Image?

An image is a rectangular grid (raster) of picture elements (= pixels)

=   255 255 255 255 255 255 255 255 255 255 255 255
    255 255 255 255 255 255 255 255 255 255 255 255
    255 255 255  20   0 255 255 255 255 255 255 255
    255 255 255  75  75  75 255 255 255 255 255 255
    255 255  75  95  95  75 255 255 255 255 255 255
    255 255  96 127 145 175 255 255 255 255 255 255
    255 255 127 145 175 175 175 255 255 255 255 255
    255 255 127 145 200 200 175 175  95 255 255 255
    255 255 127 145 200 200 175 175  95  47 255 255
    255 255 127 145 145 175 127 127  95  47 255 255
    255 255  74 127 127 127  95  95  95  47 255 255
    255 255 255  74  74  74  74  74  74 255 255 255
    255 255 255 255 255 255 255 255 255 255 255 255
    255 255 255 255 255 255 255 255 255 255 255 255

Typically 1 byte (8 bits) per pixel. 0=black, 255=white.

Greyscale (Y)

Page 5:

Colour Images

An image is a rectangular grid (raster) of picture elements (= pixels). Colour images use 3 rasters (Red, Green, Blue).

Y = 0.30 R + 0.59 G + 0.11 B

For this introduction we process a colour image simply by processing the R, G and B rasters independently, as you would a greyscale image.
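As a rough sketch of how this channel-wise processing might look in NumPy (the function names here are illustrative, not from the course lab code):

    import numpy as np

    def to_greyscale(rgb):
        # Luma: Y = 0.30 R + 0.59 G + 0.11 B, for an (H, W, 3) array
        return 0.30 * rgb[..., 0] + 0.59 * rgb[..., 1] + 0.11 * rgb[..., 2]

    def process_colour(rgb, greyscale_filter):
        # Apply the same greyscale operation independently to the R, G and B rasters
        return np.stack([greyscale_filter(rgb[..., c]) for c in range(3)], axis=-1)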

Page 6:

Image as a function

We can think of a greyscale image as a function R² → R

=   (the same 14 x 12 grid of pixel values as shown on Page 4)

f(x,y) = the intensity of pixel (x,y)

A digital image f(x,y) only has compact support (it is defined over a finite grid of pixels, not over the whole plane).

Page 7:

Image as a function

When we “do image processing” we can think of transforming the function f(x,y) to form a new function g(x,y).

These lectures will focus on a class of transform called linear transforms, because they:

1) are useful (e.g. noise reduction, finding edges, sharpening detail)

2) can be performed efficiently via convolution

e.g. g(x,y) = f(x,y) + 20 (brightens the image);  g(x,y) = f(−x,y) (mirrors the image)

Page 8:

Noise reduction

Suppose we take a photo of a stationary scene.

f(x,y) = I(x,y) + N(0,σ)

If the noise is independent and zero-mean from photo to photo, then we can take many photos and average them: the noise averages towards zero (the central limit theorem argument), giving a less noisy result.

(Results shown for averages of 1, 10, 100 and 1000 photos.)

image = signal + noise
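A minimal sketch of this frame-averaging idea (assuming the photos are already registered to one another; NumPy used purely for illustration):

    import numpy as np

    def average_photos(photos):
        # photos: list of (H, W) arrays of the same stationary scene.
        # Zero-mean noise tends to cancel as more photos are averaged.
        return np.mean(np.stack(photos, axis=0), axis=0)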

Page 9:

Gaussian

In that example we used a Gaussian distribution to model noise

N(µ,σ)

A Gaussian distribution is the normal distribution generalised to any mean (here µ = 0) and standard deviation (here σ = 10).

N(0,1) is the standard Normal distribution.
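To make the noise model f(x,y) = I(x,y) + N(0,σ) concrete, a small sketch that simulates it with σ = 10 (the flat test image I here is purely hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    I = np.full((64, 64), 128.0)                            # hypothetical clean image I(x,y)
    noise = rng.normal(loc=0.0, scale=10.0, size=I.shape)   # samples from N(0, 10)
    f = np.clip(I + noise, 0, 255)                          # observed noisy image f(x,y)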

Page 10:

Noise reduction

What if we only have one photo (i.e. typical image processing)?

A pixel is usually very similar to its neighbours (spatial coherence), so we can average groups of neighbouring pixels together.

A window is centred on each pixel in turn and the mean value of the pixels beneath it is computed. The result is written to the corresponding pixel of a new image.

Input f(x,y):          Output g(x,y):
0 4 0 0 0 0            0 1 0 0
0 0 0 0 0 0            0 0 0 0
0 0 0 3 0 0            0 1 0 0
0 0 0 0 0 0            0 0 0 0
0 2 0 0 0 0
0 0 0 0 0 0

Page 11:

Another way of saying the same...

Consider a window containing a set of values.

For each pixel in the input:

1. Window values are multiplied with image beneath

2. The sum of these products is written to output image.

e.g. ((0 × 1) + (4 × 1) + (0 × 1) + ...) × 1/9 = 4/9

Window (1/9 ×):
1 1 1
1 1 1
1 1 1

Input f(x,y):          Output g(x,y):
0 4 0 0 0 0            1 1 0 0
0 0 0 0 0 0            0 0 0 0
0 0 0 3 0 0            0 1 0 0
0 0 0 0 0 0            0 0 0 0
0 2 0 0 0 0
0 0 0 0 0 0

Convolution

Page 12:

Terminology

Image f(x,y) was transformed into g(x,y) via convolution.

Each pixel was “replaced” by a linear combination of its neighbours. This is called linear filtering.

The weightings for each pixel were defined by the window

Input f(x,y)   *   1 1 1   =   Output g(x,y)
                   1 1 1
                   1 1 1

“Window” = “Template” = “Kernel” = “Filter” = “Mask”=...

Convolution operator, not multiplication!

i.e. the prescription for a linear filter is the values in the window
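As a sketch, the whole linear filtering procedure can be written as a double loop that slides the window over the image and writes each weighted sum to a new image. Zero padding at the borders is an assumption here, and the window is swept without flipping, which gives the same result as convolution for the symmetric filters used in this lecture:

    import numpy as np

    def linear_filter(f, h):
        # f: greyscale image, h: (2k+1) x (2k+1) window of weights
        k = h.shape[0] // 2
        padded = np.pad(f, k, mode="constant")          # zero padding at the borders
        g = np.zeros(f.shape, dtype=float)
        for y in range(f.shape[0]):
            for x in range(f.shape[1]):
                window = padded[y:y + 2 * k + 1, x:x + 2 * k + 1]
                g[y, x] = np.sum(window * h)            # sum of products
        return g

    box = np.ones((3, 3)) / 9.0                         # the mean / box filter
    # blurred = linear_filter(image, box)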

Page 13:

Closer look at the box filter

1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9

"Box filter" / "Box blur" / "Mean filter"

Any filter is itself a signal h(x,y)

Can be padded with zeros to match image size

Page 14:

Example of Box Filter

Blocky / square artifacts

More on this later...

Page 15:

Closer look at convolution process

Convolution expressed using the * operator.

Window of side 2k+1 (i.e. k = 1 for this 3x3 window):

1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
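Written out, one standard way to express the convolution sum for a window h of side 2k+1 is:

    g(x, y) = sum over i = -k..k, j = -k..k of  h(i, j) · f(x − i, y − j)

For the symmetric windows used here (box, Gaussian) the sign convention on the offsets makes no difference to the result.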

Page 16:

Other choices of filter

We can put different values in the window to create different effects (i.e. produce different linear filters):

0 0 0
0 1 0
0 0 0

Original  *  (this filter)  =  identical image

Page 17:

Other choices of filter:

We can put different values in the window to create different effects (i.e. produce different linear filters):

0 0 0
0 0 1
0 0 0

Original  *  (this filter)  =  image shifted left by 1 pixel

Page 18:

Other choices of filter:

We can put different values in the window to create different effects (i.e. produce different linear filters):

0 0 0      1 1 1
0 2 0   -  1 1 1      Sharpening filter (accentuates edges)
0 0 0      1 1 1

Original  *  (sharpening filter)  =  sharpened image
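If the box term is taken to be the normalised mean filter (1/9 in each cell, an assumption carried over from the box filter above), the combined sharpening kernel works out as:

    2·impulse − (1/9)·box  =   -1/9   -1/9   -1/9
                               -1/9   17/9   -1/9
                               -1/9   -1/9   -1/9

Its entries sum to 1, so overall brightness is preserved, while each pixel's difference from its local mean is amplified; that is what accentuates edges.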

Page 19:

Example of Image Sharpening

Page 20:

More on blurring

We get better results blurring with a “Gaussian” filter vs. the “box” filter.

3x3 box filter:          3x3 Gaussian:
0.11 0.11 0.11           0.06 0.13 0.06
0.11 0.11 0.11           0.13 0.24 0.13
0.11 0.11 0.11           0.06 0.13 0.06

(Images: original, box-blurred, Gaussian-blurred.)

Page 21:

2D Gaussian

0.06 0.13 0.06
0.13 0.24 0.13
0.06 0.13 0.06

In this case x and y are measured as offsets from the centre of the template. The standard deviation is fixed. The window truncates the Gaussian function beyond a certain distance.
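A sketch of building such a truncated, normalised Gaussian window, sampling G(x,y) ∝ exp(−(x² + y²) / 2σ²) at integer offsets from the centre. The value σ ≈ 0.85 is an assumption, chosen because it reproduces values close to the 0.06 / 0.13 / 0.24 kernel above:

    import numpy as np

    def gaussian_kernel(k=1, sigma=0.85):
        # (2k+1) x (2k+1) window; x and y are offsets from the centre
        offsets = np.arange(-k, k + 1)
        xx, yy = np.meshgrid(offsets, offsets)
        g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
        return g / g.sum()                    # normalise so the weights sum to 1

    # print(np.round(gaussian_kernel(), 2))   # roughly the 3x3 Gaussian shown above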

Page 22:

Convolution - Topics

Convolution is a versatile filtering mechanism, but:-

1) As described, it is slow, O(nm) for an image of n pixels and a filter of m pixels, and will take ages to process modern (e.g. multi-megapixel) digital images.

2) We don’t yet understand why particular sets of values in the filters have the result they do....

To answer both we need to understand Fourier’s theorem.

Page 23:

Fourier’s Theorem

“Any periodic signal can be synthesised by summing (possibly infinitely) many sine and cosine waves of various amplitudes and frequencies”

(or equivalently: many cosine waves of various phase, amplitude and frequency)

Page 24:

Fourier Synthesis Example

Adding sine waves of increasing frequency (with decreasing amplitude) to make a square wave:

y = sin(t);
y = sin(t) + sin(3*t)/3;
y = sin(t) + sin(3*t)/3 + sin(5*t)/5 + sin(7*t)/7 + sin(9*t)/9;
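The same synthesis as a small runnable sketch (NumPy and Matplotlib, purely illustrative):

    import numpy as np
    import matplotlib.pyplot as plt

    t = np.linspace(0, 4 * np.pi, 2000)
    approximations = [
        np.sin(t),
        np.sin(t) + np.sin(3 * t) / 3,
        sum(np.sin(n * t) / n for n in (1, 3, 5, 7, 9)),
    ]
    for y in approximations:
        plt.plot(t, y)      # each curve is a closer approximation to a square wave
    plt.show()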

Page 25:

Fourier Transform (Terminology)

The Fourier Transform (FT) is a piece of mathematics that decomposes a real signal into its individual frequency components.

spatial domain  --- FT (analysis) --->  frequency domain
spatial domain  <-- IFT (synthesis) --  frequency domain

You can convert a signal in the spatial domain to the frequency domain, and back again, with no loss of information.

Page 26:

Fourier Transform (Continuous)

1D Fourier transform and 1D inverse Fourier transform:

f(x) is the signal.

F(u) is the "response" at frequency 'u'.

The response is a complex number, comprising a magnitude (r) and a phase (φ).

Page 27:

Fourier Transform (Continuous)

1D Fourier transform and 1D inverse Fourier transform:

f(x) is the signal.

F(u) is the "response" at frequency 'u' (a complex number).

Normalisation required because u is angular frequency (u=2πv)
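For reference, one common convention for the continuous 1D transform pair, consistent with the angular-frequency note above (the exact placement of the 1/2π factor varies between texts):

    F(u) = ∫ f(x) e^(−iux) dx
    f(x) = (1/2π) ∫ F(u) e^(+iux) du

F(u) = r e^(iφ) is complex: r = |F(u)| is the magnitude of the response and φ its phase.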

Page 28:

Discrete Fourier Transform

1D Discrete Fourier Transform: 1D Inverse DFT:

Because a digital signal is sampled and has compact support, the DFT is used on digital signals. It is near-identical in form to the continuous FT.

The Fast Fourier Transform (FFT) is a fast way of computing the DFT; the classic radix-2 algorithm works only when N is a power of 2 (Cooley and Tukey, 1965).

Demo
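A sketch of the 1D DFT written directly from its usual definition, F(u) = Σₓ f(x) e^(−i2πux/N), checked against NumPy's FFT (the inverse has the opposite sign in the exponent and a 1/N factor):

    import numpy as np

    def dft(f):
        # O(N^2) DFT straight from the definition
        N = len(f)
        x = np.arange(N)
        u = x.reshape(-1, 1)
        return np.exp(-2j * np.pi * u * x / N) @ f

    signal = np.random.rand(8)                          # N = 8, a power of 2
    assert np.allclose(dft(signal), np.fft.fft(signal))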

Page 29:

2D DFT

The FT and DFT also work over 2D (or any-dimensional) signals. The 2D case is very important, because images are 2D signals; recall f(x,y).

2D DFT:

2D IDFT:

Recall that converting to/from the frequency domain is lossless.

2D DFT / IDFT allows us to manipulate image in frequency domain.

Image  --FT-->  frequency domain  -->  manipulate frequencies  --IFT-->  result image

Page 30:

2D DFT – Implementation

The 2D DFT is a "separable" transform: it is computable by running a 1D DFT over each image row, and then running each column of the result through its own 1D DFT.

Separability makes the 2D DFT fast to compute.

If the image has side lengths that are powers of 2, the FFT can be used instead of the plain DFT to speed things up even further (equivalently, you can pad the image with a border of zeros until its sides are powers of 2).
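A quick numerical check of the separability claim: a 1D FFT over every row, followed by a 1D FFT over every column of the result, matches the 2D FFT:

    import numpy as np

    img = np.random.rand(64, 64)                  # side length is a power of 2
    rows = np.fft.fft(img, axis=1)                # 1D DFT of each row
    rows_then_cols = np.fft.fft(rows, axis=0)     # then 1D DFT of each column
    assert np.allclose(rows_then_cols, np.fft.fft2(img))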

Page 31:

2D DFT

What does an image "look" like in the frequency domain?

Visualising |F(u,v)| (i.e. the magnitude of each frequency component)

Demo

(Figure: f(x,y) and its spectrum |F(u,v)|; annotations mark the origin (the d.c. component), lower frequencies, and higher frequencies.)

Page 32:

2D DFT: simpler examples

What will |F(u,v)| look like?

(Figure: f(x,y) and F(u,v).)

Page 33:

2D DFT: simpler examples

Although the result is predominantly what you would expect, there are additional high frequencies introduced.

This is because the signal isn't periodic (most images aren't).

What will |F(u,v)| look like?

Page 34:

2D DFT

Image processing by manipulating the frequency domain F(u,v):

“Ideal” Low-pass filter “Ideal” High-pass filter
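A sketch of this kind of frequency-domain manipulation: keep (or remove) all frequencies inside a circle around the d.c. component. The cut-off radius used here is arbitrary:

    import numpy as np

    def ideal_filter(img, radius=20, low_pass=True):
        F = np.fft.fftshift(np.fft.fft2(img))     # spectrum with the d.c. term at the centre
        h, w = img.shape
        v, u = np.ogrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
        keep = (u**2 + v**2) <= radius**2         # "ideal" low-pass mask
        if not low_pass:
            keep = ~keep                          # "ideal" high-pass mask
        return np.real(np.fft.ifft2(np.fft.ifftshift(F * keep)))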

Page 35:

Recall: Convolution

Convolution expressed using the * operator.

Window of side 2k+1 (i.e. k = 1 for this 3x3 window):

1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9

Slow O(nm)

Page 36:

Convolution Theorem

Convolution can be performed faster by converting both the image and the filter into the frequency domain (2D DFT), multiplying them together, and converting the result back (2D IDFT).

f(x,y) image   --FT-->   F[f]
h(x,y) filter  --FT-->   F[h]
F[f] × F[h]    --IFT-->  g(x,y)

F[.] indicates the FT of a function.

By considering convolution in this way, we can also understand why filters behave the way they do.
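A sketch of the theorem in use: zero-pad both signals to the full output size, multiply their 2D DFTs, and invert. Under those assumptions it reproduces direct (full) convolution, up to floating-point error:

    import numpy as np

    def conv2d_direct(f, h):
        # full linear convolution by accumulating shifted, scaled copies of f
        out = np.zeros((f.shape[0] + h.shape[0] - 1, f.shape[1] + h.shape[1] - 1))
        for i in range(h.shape[0]):
            for j in range(h.shape[1]):
                out[i:i + f.shape[0], j:j + f.shape[1]] += h[i, j] * f
        return out

    def conv2d_fft(f, h):
        # F[f * h] = F[f] . F[h]: multiply the spectra, then inverse-transform
        shape = (f.shape[0] + h.shape[0] - 1, f.shape[1] + h.shape[1] - 1)
        return np.real(np.fft.ifft2(np.fft.fft2(f, shape) * np.fft.fft2(h, shape)))

    f = np.random.rand(32, 32)
    h = np.ones((3, 3)) / 9.0
    assert np.allclose(conv2d_direct(f, h), conv2d_fft(f, h))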

Page 37:

Recall: Why is Gaussian better?

We get better results blurring with a “Gaussian” filter vs. the “box” filter.

(3x3 box filter and 3x3 Gaussian kernels, and the original / box-blurred / Gaussian-blurred results, as shown on Page 20.)

Page 38:

Fourier Analysis of Common Filters

Visualisations of 1D box and Gaussian filters and their Fourier transforms.

Page 39:

Fourier Analysis of Common Filters

Visualisations of 2D box and Gaussian filters and their Fourier transforms: the box transforms to a sinc, and the Gaussian transforms to a Gaussian.

Page 40:

Observations

The "ideal" low-pass filter corresponds to a "sinc" in the spatial domain: sinc(x) = sin(x)/x

The FT of a box is a sinc scaled according to the size of the box.

The FT of a Gaussian of σ is a Gaussian of 1/σ

The opposite holds too (i.e. FT of a sinc is a box, etc.).

Page 41:

Recall: Box vs. Gaussian blur

Can you explain the artifacts in the box filtered image?

(Figures: original image, box-filtered and Gaussian-filtered results.)

Page 42:

Question

How do we produce this “ideal” low-pass filtering scenario:

f(x,y) image   --FT-->   F[f]
h(x,y) filter  --FT-->   F[h] = ?
F[f] × F[h]    --IFT-->  ideal low-pass filtered result

Answer: use a 2D sinc filter (the "ideal low-pass filter"). But sinc has infinite extent and so cannot be represented exactly in a digital image, because digital images have compact support.

Page 43:

Ideal low-pass filter

Sinc is an oscillating function of infinite extent, and so is unsuitable as a filter for digital images, which have compact support.

Truncating the sinc to fit a finite spatial filter creates artifacts in the frequency domain, and thus ringing artifacts in the image.


Page 44:

Gaussian low-pass filter

A Gaussian does not have this problem. Although it also has infinite extent, it does not oscillate and it decays smoothly, so it is "well behaved".

(The FT of a Gaussian of σ is a Gaussian of 1/σ.)

So 1/σ determines the bandwidth of the frequencies passed: a larger spatial σ keeps only a narrower band of low frequencies.


Page 45:

Back to sharpening:

0 0 0      1 1 1
0 2 0   -  1 1 1      Sharpening filter (accentuates edges)
0 0 0      1 1 1

Original (unfiltered)  *  (sharpening filter)  =  filtered result

Page 46:

Back to sharpening

What does the blurring take out?

detail  =  original  −  smoothed (5x5)

Boost the detail and add it back:

sharpened  =  original  +  α × detail

Source: S. Lazebnik
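A sketch of that recipe (often called unsharp masking); the 5x5 box blur and the value of α here are illustrative choices, not those used on the slide:

    import numpy as np

    def sharpen(image, alpha=1.0):
        # sharpened = original + alpha * (original - smoothed)
        k = np.ones((5, 5)) / 25.0                       # 5x5 smoothing window
        padded = np.pad(image.astype(float), 2, mode="edge")
        smoothed = np.zeros(image.shape, dtype=float)
        for y in range(image.shape[0]):
            for x in range(image.shape[1]):
                smoothed[y, x] = np.sum(padded[y:y + 5, x:x + 5] * k)
        detail = image - smoothed
        return np.clip(image + alpha * detail, 0, 255)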

Page 47:

Sharpening

(Figure labels: Gaussian, scaled impulse, Laplacian of Gaussian; image, blurred image.)

0 0 0      1 1 1
0 2 0   -  1 1 1
0 0 0      1 1 1

Page 48:

Fourier analysis - LoG

Laplacian of Gaussian (LoG)

(Figure: the LoG filter and its FT.)

Similar to the Gaussian, the FT of a LoG is a LoG. What will this do to the high frequencies?

Page 49:

Example of Image Sharpening

Page 50:

Summary

After attending this lecture, and doing the reading and labwork, you should be able to:

•  Describe the basic framework for performing linear filtering on a digital image (convolution)

•  Implement image blurring and sharpening operations.

•  Compare and contrast several low-pass filters and describe their operation in the context of image processing.

•  Define the Fourier transform in both continuous and discrete terms for the 1D and 2D cases.

•  Describe the convolution theorem, its links to the Fourier transform, and its implications for digital image processing.