Visual Recognition: Filtering and Transformations
Raquel Urtasun, TTI Chicago, Jan 10, 2012
ttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf


Page 1

Visual Recognition: Filtering and Transformations

Raquel Urtasun

TTI Chicago

Jan 10, 2012

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 1 / 91

Page 2

Today’s lecture ...

Image formation and color

Image Filtering

Additional transformations

Page 3

Material

Chapters 2 and 3 of Richard Szeliski’s book

Available online here

Page 4

How is an image created?

The image formation process that produced a particular image depends on

lighting conditions

scene geometry

surface properties

camera optics

[Source: R. Szeliski]

Page 5

Image formation and color

Page 6

From photons to RGB values

Sample the 2D space on a regular grid.

Quantize each sample, i.e., the photons arriving at each active cell are integrated and then digitized.

[Source: D. Hoiem]

Page 7

Problems: Aliasing

Shannon’s Sampling Theorem states that the minimum sampling frequency must satisfy

fs ≥ 2fmax

If you haven’t seen this... take a class on Fourier analysis... everyone should have at least one!
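As a quick numeric illustration of the theorem (a toy sketch with a made-up 3 Hz sine, not from the slides): sampling above the Nyquist rate preserves the frequency, while sampling below it aliases the peak to a lower frequency.

```python
import numpy as np

# Toy demonstration: a 3 Hz sine needs fs >= 2 * 3 = 6 Hz to be represented faithfully.
f_signal = 3.0

def dominant_freq(fs):
    """Return the strongest frequency in one second of samples taken at rate fs."""
    t = np.arange(0, 1, 1 / fs)
    x = np.sin(2 * np.pi * f_signal * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return freqs[np.argmax(spectrum)]

f_ok = dominant_freq(8.0)     # fs >= 2*fmax: the peak stays at 3 Hz
f_alias = dominant_freq(4.0)  # fs < 2*fmax: the 3 Hz tone aliases down to 1 Hz
```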

Figure: example of a 1D signal

[Source: R. Szeliski]

Page 8

And in 2D...

Figure: (a) Example of a 2D signal. (b–d) downsampled with different filters

[Source: R. Szeliski]

Page 9

Color Cameras

Each color camera integrates light according to the spectral response function of its red, green, and blue sensors.

R = ∫ L(λ) SR(λ) dλ

G = ∫ L(λ) SG(λ) dλ

B = ∫ L(λ) SB(λ) dλ

where L(λ) is the incoming spectrum of light at a given pixel as a function of wavelength λ, and SR, SG, SB are the red, green, and blue spectral sensitivities of the corresponding sensors.
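These integrals can be approximated numerically. The sketch below uses made-up Gaussian-shaped curves for the spectrum and sensitivities (a real sensor’s curves would be measured), discretized over 400–700 nm with a simple Riemann sum.

```python
import numpy as np

# Hypothetical smooth curves standing in for a measured spectrum and sensor
# sensitivities; R = ∫ L(λ) S_R(λ) dλ becomes a Riemann sum on a wavelength grid.
lam = np.linspace(400.0, 700.0, 301)          # wavelength grid in nm
dlam = lam[1] - lam[0]
L   = np.exp(-((lam - 550.0) / 60.0) ** 2)    # toy incoming spectrum, peaked at 550 nm
S_R = np.exp(-((lam - 600.0) / 40.0) ** 2)    # toy red sensitivity
S_G = np.exp(-((lam - 540.0) / 40.0) ** 2)    # toy green sensitivity
S_B = np.exp(-((lam - 460.0) / 40.0) ** 2)    # toy blue sensitivity

R = np.sum(L * S_R) * dlam
G = np.sum(L * S_G) * dlam
B = np.sum(L * S_B) * dlam   # spectrum overlaps green most, blue least
```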

Page 10

Bayer Pattern

Color cameras use color filter arrays (CFAs), where alternating sensors arecovered by different colored filters.

More green filters, as the luminance signal is mostly determined by green values and the visual system is much more sensitive to high-frequency detail in luminance than in chrominance.

Demosaicing: interpolate the missing color values to obtain RGB values for all pixels.

Figure: (a) Bayer pattern. (b) Interpolated RGB.

[Source: R. Szeliski]
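A minimal sketch of the interpolation step, assuming an RGGB layout (green samples where row + column is odd) and plain neighbor averaging on a toy mosaic; real demosaicing algorithms are considerably more sophisticated.

```python
import numpy as np

# Toy 4x4 RGGB Bayer mosaic: fill the green channel at red/blue sites by
# averaging the available green neighbours (fewer at borders). This is the
# simplest bilinear demosaicing step, not a production algorithm.
H, W = 4, 4
mosaic = np.arange(H * W, dtype=float).reshape(H, W)

# In RGGB, green samples sit where (row + col) is odd.
green_mask = (np.add.outer(np.arange(H), np.arange(W)) % 2 == 1)

green = np.where(green_mask, mosaic, 0.0)
out = green.copy()
for i in range(H):
    for j in range(W):
        if not green_mask[i, j]:
            neigh = [green[k, l]
                     for k, l in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                     if 0 <= k < H and 0 <= l < W]
            out[i, j] = sum(neigh) / len(neigh)
```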

Page 13

RGB components

Figure: (a) Original image. (b) R component. (c) G component. (d) B component.

Page 14

HSV color space

(RGB) (HSV)

There are other color spaces that might be better from a processing perspective: Lab, HSV, etc.

Page 15

HSV components

Figure: (a) Original image. (b) H component. (c) S component. (d) V component.

Page 16

Filtering

Page 17

Applications of Filtering

Enhance an image, e.g., denoise, resize.

Extract information, e.g., texture, edges.

Detect patterns, e.g., template matching.

Page 18

Noise reduction

Simplest thing: replace each pixel by the average of its neighbors.

This assumes that neighboring pixels are similar, and that the noise is independent from pixel to pixel.

[Source: S. Marschner]

Page 19

Noise reduction

Simpler thing: replace each pixel by the average of its neighbors

This assumes that neighboring pixels are similar, and that the noise is independent from pixel to pixel.

Moving average in 1D: [1, 1, 1, 1, 1]/5
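In code (a sketch using NumPy’s 1D convolution), the box kernel spreads an impulse evenly over its support:

```python
import numpy as np

# Moving average with the box kernel [1, 1, 1, 1, 1] / 5: an impulse of
# height 10 is smeared into five samples of height 2.
signal = np.array([0., 0., 0., 10., 0., 0., 0.])
kernel = np.ones(5) / 5.0
smoothed = np.convolve(signal, kernel, mode='same')
```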

[Source: S. Marschner]

Page 20

Noise reduction

Simpler thing: replace each pixel by the average of its neighbors

This assumes that neighboring pixels are similar, and that the noise is independent from pixel to pixel.

Non-uniform weights [1, 4, 6, 4, 1] / 16

[Source: S. Marschner]

Page 21

Moving Average in 2D

[Source: S. Seitz]

Page 27

Linear Filtering: Correlation

Involves weighted combinations of pixels in small neighborhoods.

The output pixel’s value is determined as a weighted sum of input pixel values

g(i, j) = ∑k,l f(i + k, j + l) h(k, l)

The entries of the weight kernel or mask h(k, l) are often called the filter coefficients.

This operator is the correlation operator

g = f ⊗ h
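A direct (and deliberately slow) sketch of this definition, computing the valid region only; library routines do the same thing far more efficiently.

```python
import numpy as np

def correlate2d(f, h):
    """Valid-region cross-correlation: g(i, j) = sum_{k,l} f(i+k, j+l) h(k, l)."""
    kh, kw = h.shape
    H, W = f.shape
    g = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            g[i, j] = np.sum(f[i:i + kh, j:j + kw] * h)
    return g

f = np.arange(16, dtype=float).reshape(4, 4)
h = np.ones((3, 3)) / 9.0      # 3x3 box filter
g = correlate2d(f, h)          # 2x2 output of local means
```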

Page 31

Convolution Example

Figure: What does this filter do?

[Source: R. Szeliski]

Page 32

Smoothing by averaging

What if the filter size was 5 x 5 instead of 3 x 3?

[Source: K. Grauman]

Page 33

Gaussian filter

What if we want the nearest neighboring pixels to have the most influence on the output?

Removes high-frequency components from the image (low-pass filter).

[Source: S. Seitz]

Page 34

Smoothing with a Gaussian

[Source: K. Grauman]

Page 35

Gaussian filter: Parameters

Size of kernel or mask: the Gaussian function has infinite support, but discrete filters use finite kernels.

[Source: K. Grauman]

Page 36

Gaussian filter: Parameters

Variance of the Gaussian: determines extent of smoothing.

[Source: K. Grauman]
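A sketch of building such a finite kernel, truncating the infinite support at a radius of about 3σ (a common rule of thumb, not the only choice) and normalizing so the weights sum to 1:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Finite 2D Gaussian kernel; sigma sets the smoothing, the (2r+1)-wide
    window truncates the infinite support (r defaults to about 3*sigma)."""
    r = int(3 * sigma) if radius is None else radius
    x = np.arange(-r, r + 1)
    g1d = np.exp(-x**2 / (2 * sigma**2))
    k = np.outer(g1d, g1d)        # separable: outer product of two 1D Gaussians
    return k / k.sum()            # normalize so the weights sum to 1

k = gaussian_kernel(sigma=1.0)    # 7x7 kernel, largest weight at the center
```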

Page 37

Gaussian filter: Parameters

[Source: K. Grauman]

Page 38

Properties of the Smoothing

All values are positive.

They all sum to 1.

Amount of smoothing proportional to mask size.

Remove high-frequency components; low-pass filter.

[Source: K. Grauman]

Page 42

Example of Correlation

What is the result of filtering the impulse signal (image) F with the arbitrary kernel H?

[Source: K. Grauman]

Page 43

Convolution

Convolution operator

g(i, j) = ∑k,l f(i − k, j − l) h(k, l) = ∑k,l f(k, l) h(i − k, j − l) = f ∗ h

and h is then called the impulse response function.

Equivalent to flipping the filter in both dimensions (bottom to top, right to left) and applying cross-correlation.
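The flip-then-correlate equivalence can be checked numerically. The sketch below uses an asymmetric impulse kernel (an arbitrary example) so the two operators visibly shift the image in opposite directions.

```python
import numpy as np

def correlate2d(f, h):
    """Valid-region cross-correlation."""
    kh, kw = h.shape
    g = np.zeros((f.shape[0] - kh + 1, f.shape[1] - kw + 1))
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            g[i, j] = np.sum(f[i:i + kh, j:j + kw] * h)
    return g

def convolve2d(f, h):
    """Convolution = cross-correlation with the kernel flipped in both dimensions."""
    return correlate2d(f, np.flip(h))

f = np.arange(25, dtype=float).reshape(5, 5)
h = np.array([[0., 0., 0.],
              [1., 0., 0.],      # off-center impulse: correlation and
              [0., 0., 0.]])     # convolution shift in opposite directions
```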

Page 45

Matrix form

Correlation and convolution can both be written as a matrix-vector multiply, if we first convert the two-dimensional images f(i, j) and g(i, j) into raster-ordered vectors f and g

g = Hf

with H a sparse matrix.
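A 1D sketch of the matrix form: each row of H holds a shifted copy of the kernel, so the matrix-vector product (dense here for clarity, sparse and banded in practice) reproduces the filtering.

```python
import numpy as np

# Valid-region correlation of f with h, written as g = H f.
f = np.array([1., 2., 3., 4., 5.])
h = np.array([1., 0., -1.])

n, k = len(f), len(h)
H = np.zeros((n - k + 1, n))
for i in range(n - k + 1):
    H[i, i:i + k] = h              # each row: a shifted copy of the kernel

g_matrix = H @ f
g_direct = np.array([np.dot(f[i:i + k], h) for i in range(n - k + 1)])
```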

Page 46

Correlation vs Convolution

Convolution

g(i, j) = ∑k,l f(i − k, j − l) h(k, l),   G = H ∗ F

Cross-correlation

g(i, j) = ∑k,l f(i + k, j + l) h(k, l),   G = H ⊗ F

For a Gaussian or box filter, how will the outputs differ?

If the input is an impulse signal, how will the outputs differ? What is h ∗ δ, and what is h ⊗ δ?

Page 47

Example

What’s the result?

Page 51

Correlation vs Convolution

Convolution is both commutative and associative.

The Fourier transform of two convolved images is the product of their individual Fourier transforms.

Both correlation and convolution are linear shift-invariant (LSI) operators, which obey both the superposition principle

h ◦ (f0 + f1) = h ◦ f0 + h ◦ f1

and the shift invariance principle

g(i, j) = f(i + k, j + l) ↔ (h ◦ g)(i, j) = (h ◦ f)(i + k, j + l)

which means that shifting a signal commutes with applying the operator.
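Both principles are easy to verify numerically; the sketch below checks superposition for 1D convolution on random signals.

```python
import numpy as np

# Superposition: filtering a sum equals the sum of the filtered signals
# (here with 1D convolution as the LSI operator and a binomial kernel).
h = np.array([1., 2., 1.]) / 4.0
f0 = np.random.default_rng(0).standard_normal(10)
f1 = np.random.default_rng(1).standard_normal(10)

lhs = np.convolve(f0 + f1, h)
rhs = np.convolve(f0, h) + np.convolve(f1, h)
```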

Page 54

Boundary Effects

The results of filtering the image in this form will lead to a darkening of the corner pixels.

The original image is effectively being padded with 0 values wherever the convolution kernel extends beyond the original image boundaries.

A number of alternative padding or extension modes have been developed.
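The common extension modes can be illustrated with np.pad on a short 1D signal (mode names as in NumPy; image libraries expose similar options):

```python
import numpy as np

# Four common padding modes applied to the signal [1, 2, 3] with 2 border samples.
f = np.array([1., 2., 3.])
zero    = np.pad(f, 2, mode='constant')  # zero padding: darkens borders under filtering
edge    = np.pad(f, 2, mode='edge')      # replicate the edge value
reflect = np.pad(f, 2, mode='reflect')   # mirror, without repeating the edge sample
wrap    = np.pad(f, 2, mode='wrap')      # periodic (circular) extension
```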

Page 55

Separable Filters

The process of performing a convolution requires K² operations per pixel, where K is the size of the convolution kernel.

In many cases, this operation can be sped up by first performing a 1D horizontal convolution followed by a 1D vertical convolution, requiring only 2K operations per pixel.

If this is possible, then the convolution kernel is called separable.

It is then the outer product of two 1D kernels

K = vhT
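A sketch of the equivalence, using the separable binomial kernel [1, 2, 1]/4 in each direction as an example: the full 2D filtering matches two 1D passes.

```python
import numpy as np

# Separable kernel K = v h^T: filtering with K equals filtering the rows
# with h and then the columns with v (symmetric kernels, so convolution
# and correlation coincide here).
v = np.array([1., 2., 1.]) / 4.0
h = np.array([1., 2., 1.]) / 4.0
K = np.outer(v, h)

img = np.arange(25, dtype=float).reshape(5, 5)

def correlate2d(f, k2d):
    kh, kw = k2d.shape
    g = np.zeros((f.shape[0] - kh + 1, f.shape[1] - kw + 1))
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            g[i, j] = np.sum(f[i:i + kh, j:j + kw] * k2d)
    return g

full_2d = correlate2d(img, K)   # K^2 multiplies per output pixel

# Two 1D passes: horizontal then vertical, 2K multiplies per output pixel.
rows = np.apply_along_axis(lambda r: np.convolve(r, h, mode='valid'), 1, img)
sep = np.apply_along_axis(lambda c: np.convolve(c, v, mode='valid'), 0, rows)
```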

Page 60

Let’s play a game...

Is this separable? If yes, what’s the separable version?

What does this filter do?

Page 62

Let’s play a game...

Is this separable? If yes, what’s the separable version?

What does this filter do?

Page 64

Let’s play a game...

Is this separable? If yes, what’s the separable version?

What does this filter do?

Page 66

Let’s play a game...

Is this separable? If yes, what’s the separable version?

What does this filter do?

Page 68

Let’s play a game...

Is this separable? If yes, what’s the separable version?

What does this filter do?

Page 69

How can we tell if a given kernel K is indeed separable?

Inspection... this is what we were doing.

Looking at the analytic form of it.

Look at the singular value decomposition (SVD): if only one singular value is non-zero, then the kernel is separable

K = UΣVT = ∑i σi ui viT

with Σ = diag(σi). Then √σ1 u1 and √σ1 v1T are the vertical and horizontal kernels.
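A sketch of the SVD test on a kernel that is separable by construction (the Sobel-like outer product below is just an example):

```python
import numpy as np

# SVD test for separability: a kernel is separable iff it has rank 1,
# i.e. a single non-zero singular value.
K = np.outer([1., 2., 1.], [1., 0., -1.])   # separable by construction

U, S, Vt = np.linalg.svd(K)
rank_one = np.sum(S > 1e-10) == 1           # only sigma_1 is non-zero

v = np.sqrt(S[0]) * U[:, 0]                 # vertical 1D kernel
h = np.sqrt(S[0]) * Vt[0, :]                # horizontal 1D kernel
```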

Page 73

Filtering: Edge detection

Map the image from a 2D array of pixels to a set of curves, line segments, or contours.

Look for strong gradients, post-process.

Figure: [Shotton et al. PAMI, 07]

[Source: K. Grauman]

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 43 / 91

Page 75: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

What causes an edge?

[Source: K. Grauman]Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 44 / 91

Page 76: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Looking more locally...

[Source: K. Grauman]Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 45 / 91

Page 77: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Derivatives and Edges

An edge is a place of rapid change in the image intensity function.

[Source: S. Lazebnik]

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 46 / 91

Page 78: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

How to Implement Derivatives with Convolution

For 2D functions, the partial derivative is

∂f(x, y)/∂x = lim_{ε→0} [f(x + ε, y) − f(x, y)] / ε

We can approximate it with finite differences:

∂f(x, y)/∂x ≈ f(x + 1, y) − f(x, y)

What would be the filter to implement this using convolution?

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 47 / 91
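The answer, sketched in 1D with NumPy (the example row is illustrative): the finite difference corresponds to correlation with [−1, 1], which is convolution with the flipped kernel [1, −1].

```python
import numpy as np

# Forward difference out[x] = f[x+1] - f[x] is correlation with [-1, 1].
# np.convolve flips its kernel, so we hand it [1, -1] instead; 'valid'
# mode avoids zero-padded boundaries.
row = np.array([0.0, 0.0, 1.0, 3.0, 6.0, 6.0])
deriv = np.convolve(row, [1.0, -1.0], mode="valid")
print(deriv)  # [0. 1. 2. 3. 0.]
```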

Page 81: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Partial derivatives of an image

Figure: Using correlation filters

[Source: K. Grauman]

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 48 / 91

Page 82: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Finite Difference Filters

[Source: K. Grauman]

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 49 / 91

Page 83: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Image Gradient

The gradient of an image: ∇f = [∂f/∂x, ∂f/∂y]

The gradient points in the direction of most rapid change in intensity.

The gradient direction (orientation of the edge normal) is given by:

θ = tan⁻¹( (∂f/∂y) / (∂f/∂x) )

The edge strength is given by the magnitude ||∇f|| = √( (∂f/∂x)² + (∂f/∂y)² )

[Source: S. Seitz]

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 50 / 91
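A minimal NumPy sketch of magnitude and orientation (the `image_gradient` helper and the toy step-edge image are mine):

```python
import numpy as np

def image_gradient(f):
    """Gradient magnitude ||∇f|| and direction θ via finite differences."""
    gy, gx = np.gradient(f.astype(float))  # np.gradient returns (d/d_row, d/d_col)
    magnitude = np.sqrt(gx**2 + gy**2)
    theta = np.arctan2(gy, gx)             # orientation of the edge normal
    return magnitude, theta

# A vertical step edge: the gradient points along +x, so θ = 0 at the edge
f = np.tile([0.0, 0.0, 1.0, 1.0], (4, 1))
mag, theta = image_gradient(f)
```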

Page 87: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Effects of noise

Consider a single row or column of the image.

Plotting intensity as a function of position gives a signal.

[Source: S. Seitz]Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 51 / 91

Page 88: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Effects of noise

Smooth first, and look for peaks in ∂/∂x (h ∗ f).

[Source: S. Seitz]Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 52 / 91

Page 89: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Derivative theorem of convolution

Differentiation property of convolution

∂/∂x (h ∗ f) = (∂h/∂x) ∗ f

[Source: S. Seitz]Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 53 / 91
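A discrete analogue of this property follows from associativity of convolution: convolving with a difference kernel after smoothing equals convolving with the differentiated smoothing kernel. A small check (signal and kernels are illustrative):

```python
import numpy as np

# Discrete version of  d/dx (h * f) = (dh/dx) * f :
# with derivative kernel d = [1, -1], associativity of convolution gives
#   d * (h * f) == (d * h) * f
f = np.array([0.0, 1.0, 0.0, 0.0, 2.0])  # a small 1D signal
h = np.array([1.0, 2.0, 1.0]) / 4.0      # smoothing kernel
d = np.array([1.0, -1.0])                # finite-difference kernel

lhs = np.convolve(d, np.convolve(h, f))  # smooth first, then differentiate
rhs = np.convolve(np.convolve(d, h), f)  # differentiate the kernel, then filter once
assert np.allclose(lhs, rhs)
```

This is why precomputing a derivative-of-Gaussian kernel saves one full-image pass.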

Page 90: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Derivative of Gaussians

We have the following equivalence

(I ⊗ g) ⊗ h = I ⊗ (g ⊗ h)

[Source: K. Grauman]Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 54 / 91

Page 91: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Laplacian of Gaussians

Edges are found by detecting zero-crossings of the bottom graph.

[Source: S. Seitz]

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 55 / 91

Page 92: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

2D Edge Filtering

with ∇² the Laplacian operator: ∇²f = ∂²f/∂x² + ∂²f/∂y²

[Source: S. Seitz]Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 56 / 91

Page 93: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Effect of σ on derivatives

The detected structures differ depending on the Gaussian’s scale parameter:

Larger values: larger scale edges detected.

Smaller values: finer features detected.

[Source: K. Grauman]

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 57 / 91

Page 94: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Derivatives

Use opposite signs to get response in regions of high contrast.

They sum to 0 so that there is no response in constant regions.

High absolute value at points of high contrast.

[Source: K. Grauman]

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 58 / 91

Page 95: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Band-pass filters

The Sobel and corner filters are band-pass and oriented filters.

More sophisticated filters can be obtained by convolving with a Gaussian filter

G(x, y, σ) = 1/(2πσ²) · exp(−(x² + y²)/(2σ²))

and taking the first or second derivatives.

These filters are band-pass filters: they filter out both low and high frequencies.

The second derivative of a two-dimensional image is given by the Laplacian operator

∇²f = ∂²f/∂x² + ∂²f/∂y²

Blurring an image with a Gaussian and then taking its Laplacian is equivalent to convolving directly with the Laplacian of Gaussian (LoG) filter,

∇²G(x, y, σ) = ((x² + y²)/σ⁴ − 2/σ²) · G(x, y, σ)

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 59 / 91
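The closed-form LoG above can be sampled directly; a useful sanity check is that, like other derivative filters, it sums to approximately zero, so it gives no response on constant regions. A sketch (helper name and parameter values are mine):

```python
import numpy as np

def log_kernel(sigma, radius):
    """Sample the Laplacian of Gaussian  ((x²+y²)/σ⁴ − 2/σ²)·G(x,y,σ)."""
    ax = np.arange(-radius, radius + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return ((x**2 + y**2) / sigma**4 - 2 / sigma**2) * g

K = log_kernel(sigma=1.5, radius=6)
# The continuous LoG integrates to 0; the sampled kernel sums to ~0,
# and the center tap is negative (the "Mexican hat" dip).
print(abs(K.sum()))
```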

Page 100: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Band-pass filters

A directional or oriented filter can be obtained by smoothing with a Gaussian (or some other filter) and then taking a directional derivative ∇_û = ∂/∂û:

û · ∇(G ∗ f) = ∇_û(G ∗ f) = (∇_û G) ∗ f

with û = (cos θ, sin θ).

The Sobel operator is a simple approximation of this:

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 60 / 91

Page 101: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Steerable Filters

Oriented filters are used in many vision and image processing tasks: texture analysis, edge detection, image data compression, motion analysis.

One approach to finding the response of a filter at many orientations is to apply many versions of the same filter, each different from the others by some small rotation in angle.

More efficient is to apply a few filters corresponding to a few angles and interpolate between the responses.

One then needs to know how many filters are required and how to properly interpolate between the responses.

With the correct filter set and the correct interpolation rule, it is possible to determine the response of a filter of arbitrary orientation without explicitly applying that filter.

Steerable filters are a class of filters in which a filter of arbitrary orientation is synthesized as a linear combination of a set of basis filters.

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 61 / 91

Page 107: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Example of Steerable Filter

Consider a 2D symmetric Gaussian with σ = 1, and assume the normalization constant is 1:

G(x, y) = exp(−(x² + y²))

The directional derivative operator is steerable.

The first derivative is

G1^0 = ∂/∂x exp(−(x² + y²)) = −2x · exp(−(x² + y²))

and the same function rotated 90 degrees is

G1^90 = ∂/∂y exp(−(x² + y²)) = −2y · exp(−(x² + y²))

A filter of arbitrary orientation θ can be synthesized by taking a linear combination of G1^0 and G1^90:

G1^θ = cos θ · G1^0 + sin θ · G1^90

G1^0 and G1^90 are the basis filters, and cos θ and sin θ are the interpolation functions.

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 62 / 91
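The steering identity can be verified numerically: the steered combination equals the derivative taken directly along the direction (cos θ, sin θ). A sketch on a sampled grid (grid extent and θ = 30° are arbitrary choices):

```python
import numpy as np

# Basis filters G1^0 and G1^90 for G(x,y) = exp(-(x²+y²)) (constant taken as 1,
# as on the slide), and the steering rule G1^θ = cosθ·G1^0 + sinθ·G1^90.
ax = np.linspace(-3, 3, 61)
x, y = np.meshgrid(ax, ax)
g = np.exp(-(x**2 + y**2))

G1_0 = -2 * x * g                       # ∂G/∂x
G1_90 = -2 * y * g                      # ∂G/∂y

theta = np.deg2rad(30)
steered = np.cos(theta) * G1_0 + np.sin(theta) * G1_90

# Direct construction: derivative along u = (cosθ, sinθ) gives -2(u·(x,y))·G
u_coord = np.cos(theta) * x + np.sin(theta) * y
direct = -2 * u_coord * g
assert np.allclose(steered, direct)
```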

Page 111: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

More on steerable filters

Because convolution is a linear operation, we can synthesize an image filtered at an arbitrary orientation by taking linear combinations of the images filtered with G1^0 and G1^90:

if R1^0 = G1^0 ∗ I and R1^90 = G1^90 ∗ I, then R1^θ = cos θ · R1^0 + sin θ · R1^90

Check yourself that this is the case.

See [Freeman & Adelson, 91] for the conditions on when a filter is steerable and how many basis filters are necessary.

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 63 / 91
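The "check yourself" exercise can be done numerically: steering the two filtered images gives the same result as filtering once with the steered kernel. A sketch using SciPy's 2D convolution (the random test image and grid size are arbitrary):

```python
import numpy as np
from scipy.signal import convolve2d

# By linearity of convolution:
#   cosθ·(G1_0 ∗ I) + sinθ·(G1_90 ∗ I) == (cosθ·G1_0 + sinθ·G1_90) ∗ I
ax = np.linspace(-2, 2, 9)
x, y = np.meshgrid(ax, ax)
g = np.exp(-(x**2 + y**2))
G1_0, G1_90 = -2 * x * g, -2 * y * g    # basis filters from the previous slide

rng = np.random.default_rng(1)
I = rng.random((16, 16))
theta = np.deg2rad(45)

R_from_responses = (np.cos(theta) * convolve2d(I, G1_0, mode="valid")
                    + np.sin(theta) * convolve2d(I, G1_90, mode="valid"))
R_direct = convolve2d(I, np.cos(theta) * G1_0 + np.sin(theta) * G1_90, mode="valid")
assert np.allclose(R_from_responses, R_direct)
```

In practice one filters the image with the basis kernels once, then steers the cached responses to any θ for free.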

Page 114: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

[Source: W. Freeman 91]Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 64 / 91

Page 115: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Template matching

Filters as templates: filters look like the effects they are intended to find.

Use a normalized cross-correlation score to find a given pattern (template) in the image.

Normalization needed to control for relative brightnesses.

[Source: K. Grauman]Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 65 / 91
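A minimal sketch of normalized cross-correlation matching (helper names and the toy image are mine; a real implementation would use an FFT-based or library routine):

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation between two equally sized arrays, in [-1, 1]."""
    p = patch - patch.mean()       # subtracting the means controls for
    t = template - template.mean() # relative brightness
    denom = np.sqrt((p**2).sum() * (t**2).sum())
    return float((p * t).sum() / denom) if denom > 0 else 0.0

def match_template(image, template):
    """Slide the template over the image; return (row, col) of the best score."""
    H, W = template.shape
    scores = np.array([[ncc(image[i:i + H, j:j + W], template)
                        for j in range(image.shape[1] - W + 1)]
                       for i in range(image.shape[0] - H + 1)])
    return np.unravel_index(np.argmax(scores), scores.shape)

# Embed the template pattern at (2, 5) and recover it
tmpl = np.array([[0.0, 1.0], [0.0, 1.0]])
img = np.zeros((8, 8))
img[2:4, 6] = 1.0
```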

Page 116: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Template matching

[Source: K. Grauman]

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 66 / 91

Page 117: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

More complex Scenes

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 67 / 91

Page 118: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Template matching

What if the template is not identical to some subimage in the scene?

The match can still be meaningful if the scale, orientation, and general appearance are right.

How can I find the right scale?

[Source: K. Grauman]Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 68 / 91

Page 119: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Other transformations

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 69 / 91

Page 120: Visual Recognition: Filtering and Transformationsttic.uchicago.edu/~rurtasun/courses/VisualRecognition/lecture3.pdf · Visual Recognition: Filtering and Transformations Raquel Urtasun

Integral Images

If an image is going to be repeatedly convolved with different box filters, it is useful to compute the summed area table.

It is the running sum of all the pixel values from the origin:

s(i, j) = ∑_{k=0}^{i} ∑_{l=0}^{j} f(k, l)

This can be efficiently computed using a recursive (raster-scan) algorithm:

s(i, j) = s(i − 1, j) + s(i, j − 1) − s(i − 1, j − 1) + f(i, j)

The image s(i, j) is called an integral image and can actually be computed using only two additions per pixel if separate row sums are used.

To find the summed area (integral) inside a rectangle [i0, i1] × [j0, j1], we simply combine four samples from the summed area table:

S([i0, i1] × [j0, j1]) = s(i1, j1) − s(i1, j0 − 1) − s(i0 − 1, j1) + s(i0 − 1, j0 − 1)

Summed area tables have been used in face detection [Viola & Jones, 04]

Raquel Urtasun (TTI-C) Visual Recognition Jan 10, 2012 70 / 91
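Both formulas above fit in a few lines of NumPy (helper names are mine; `cumsum` along both axes stands in for the raster-scan recursion):

```python
import numpy as np

def integral_image(f):
    """Summed area table: s[i, j] = sum of f[:i+1, :j+1]."""
    return f.cumsum(axis=0).cumsum(axis=1)

def box_sum(s, i0, i1, j0, j1):
    """Sum of f over the inclusive rectangle [i0, i1] x [j0, j1] via 4 lookups."""
    total = s[i1, j1]
    if i0 > 0:
        total -= s[i0 - 1, j1]
    if j0 > 0:
        total -= s[i1, j0 - 1]
    if i0 > 0 and j0 > 0:
        total += s[i0 - 1, j0 - 1]
    return total

f = np.arange(16.0).reshape(4, 4)
s = integral_image(f)
assert box_sum(s, 1, 2, 1, 3) == f[1:3, 1:4].sum()
```

The guards handle rectangles touching the image border, where s(−1, ·) and s(·, −1) are taken to be 0.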


Example of Integral Images


Non-linear filters: Median filter

We have seen linear filters, i.e., their response to a sum of two signals is the same as the sum of the individual responses.

Median filter: non-linear filter that selects the median value from each pixel's neighborhood.

Robust to outliers, but not good for Gaussian noise.

α-trimmed mean: averages together all of the pixels except for the α fraction that are the smallest and the largest.
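Both non-linear filters can be sketched in a few lines; this is a minimal NumPy version (edge-replicated borders and the helper names are my choices, not specified on the slides):

```python
import numpy as np

def median_filter(f, k=3):
    """Each output pixel is the median of its k x k neighborhood
    (borders handled by edge replication)."""
    p = k // 2
    fp = np.pad(f, p, mode='edge')
    out = np.empty_like(f)
    H, W = f.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.median(fp[i:i + k, j:j + k])
    return out

def alpha_trimmed_mean(window, alpha=0.25):
    """Mean of the window after dropping the alpha fraction of the
    smallest and the largest values."""
    v = np.sort(np.ravel(window))
    t = int(alpha * v.size)
    return v[t:v.size - t].mean()

img = np.ones((5, 5))
img[2, 2] = 100.0                 # a single salt-noise outlier
den = median_filter(img)          # outlier removed: den[2, 2] == 1.0
```

Note how the median discards the outlier completely, while a plain mean filter would smear it over the whole neighborhood; the α-trimmed mean interpolates between the two behaviors.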


Example of non-linear filters

(Median filter) (α-trimmed mean)


Bilateral Filtering

Weighted filter kernel with a better outlier rejection.

Instead of rejecting a fixed percentage, we reject (in a soft way) pixels whose values differ too much from the central pixel value.

The output pixel value depends on a weighted combination of neighboring pixel values

g(i, j) = ∑_{k,l} f(k, l) w(i, j, k, l) / ∑_{k,l} w(i, j, k, l)

Data-dependent bilateral weight function

w(i, j, k, l) = exp( -((i-k)² + (j-l)²) / (2σ_d²) - ||f(i, j) - f(k, l)||² / (2σ_r²) )

composed of the domain kernel and the range kernel.
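A brute-force grayscale sketch of the weight function above (window radius, parameter values, and the function name are my choices for illustration; real implementations use much faster approximations):

```python
import numpy as np

def bilateral_filter(f, sigma_d=2.0, sigma_r=0.2, radius=3):
    """Brute-force bilateral filter: each weight combines a spatial
    (domain) term and an intensity-similarity (range) term."""
    H, W = f.shape
    out = np.zeros_like(f)
    for i in range(H):
        for j in range(W):
            num = den = 0.0
            for k in range(max(0, i - radius), min(H, i + radius + 1)):
                for l in range(max(0, j - radius), min(W, j + radius + 1)):
                    w = np.exp(-((i - k) ** 2 + (j - l) ** 2) / (2 * sigma_d ** 2)
                               - (f[i, j] - f[k, l]) ** 2 / (2 * sigma_r ** 2))
                    num += w * f[k, l]
                    den += w
            out[i, j] = num / den
    return out

# Noisy step edge (as in the Durand & Dorsey figure): smoothing happens
# on each side of the edge, but not across the 0 -> 1 discontinuity,
# because the range kernel gives cross-edge pixels negligible weight.
rng = np.random.default_rng(0)
step = np.tile(np.r_[np.zeros(8), np.ones(8)], (8, 1))
noisy = step + 0.05 * rng.standard_normal(step.shape)
smoothed = bilateral_filter(noisy)
```

With σ_r much smaller than the edge height, exp(-1/(2σ_r²)) ≈ 0, so the filter behaves like a Gaussian restricted to the pixel's own side of the edge.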


Example Bilateral Filtering

Figure: Bilateral filtering [Durand & Dorsey, 02]. (a) noisy step edge input; (b) domain filter (Gaussian); (c) range filter (similarity to center pixel value); (d) bilateral filter; (e) filtered step edge output; (f) 3D distance between pixels.

[Source: R. Szeliski]


Distance Transform

Useful for quickly precomputing the distance to a curve or a set of points.

Let d(k, l) be some distance metric between pixel offsets, e.g., the Manhattan distance

d(k, l) = |k| + |l|

or the Euclidean distance

d(k, l) = √(k² + l²)

The distance transform D(i, j) of a binary image b(i, j) is defined as

D(i, j) = min_{k,l : b(k,l)=0} d(i-k, j-l)

i.e., it is the distance to the nearest pixel whose value is 0.


Distance Transform Algorithm

The Manhattan distance can be computed using a forward and a backward pass of a simple raster-scan algorithm.

Forward pass: each non-zero pixel in b is replaced by the minimum of 1 + the distance of its north or west neighbor.

Backward pass: the same, but the minimum is taken over both the current value D and 1 + the distance of the south and east neighbors.

Figure: City block distance transform: (a) original binary image; (b) top to bottom (forward) raster sweep: green values are used to compute the orange value; (c) bottom to top (backward) raster sweep: green values are merged with the old orange value; (d) final distance transform.

[Source: R. Szeliski]
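The two raster sweeps translate directly into code; a minimal NumPy sketch of the city-block case (the function name and the INF sentinel are mine):

```python
import numpy as np

def manhattan_distance_transform(b):
    """Two-pass raster-scan city-block distance transform of a binary
    image b: D(i, j) = Manhattan distance to the nearest 0 pixel."""
    H, W = b.shape
    INF = H + W                      # larger than any possible distance
    D = np.where(b == 0, 0, INF).astype(np.int64)
    # Forward pass: propagate 1 + distance from north and west neighbors.
    for i in range(H):
        for j in range(W):
            if D[i, j] > 0:
                if i > 0:
                    D[i, j] = min(D[i, j], D[i - 1, j] + 1)
                if j > 0:
                    D[i, j] = min(D[i, j], D[i, j - 1] + 1)
    # Backward pass: merge current value with south and east neighbors.
    for i in range(H - 1, -1, -1):
        for j in range(W - 1, -1, -1):
            if i < H - 1:
                D[i, j] = min(D[i, j], D[i + 1, j] + 1)
            if j < W - 1:
                D[i, j] = min(D[i, j], D[i, j + 1] + 1)
    return D

b = np.ones((5, 5), dtype=int)
b[2, 2] = 0                          # single background pixel
D = manhattan_distance_transform(b)
print(D[0, 0])                       # |0-2| + |0-2| = 4
```

Two sweeps suffice because every shortest city-block path can be split into a monotone "down/right" part (handled forward) and an "up/left" part (handled backward).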


Example of Distance Transform

More complicated in the Euclidean case.

Example of a distance transform

The ridges form the skeleton or medial axis.

Extension: signed distance transform.

[Source: P. Felzenszwalb]


Fourier Transform

Fourier analysis can be used to analyze the frequency characteristics of various filters.

How can we analyze what a given filter does to high, medium, and low frequencies?

Pass a sinusoid of known frequency through the filter and observe by how much it is attenuated

s(x) = sin(2πfx + φ_i) = sin(ωx + φ_i)

with frequency f, angular frequency ω = 2πf, and phase φ_i.

If we convolve the sinusoidal signal s(x) with a filter whose impulse response is h(x), we get another sinusoid of the same frequency but different magnitude and phase

o(x) = h(x) ∗ s(x) = A sin(ωx + φ_o)


Filtering and Fourier

Since convolution can be expressed as a weighted summation of shifted input signals (sinusoids), the output is itself a single sinusoid at the same frequency

o(x) = h(x) ∗ s(x) = A sin(ωx + φ_o)

A is the gain or magnitude of the filter, while the phase difference Δφ = φ_o - φ_i is the shift or phase.


Complex notation

The sinusoid is expressed as s(x) = e^{jωx} = cos ωx + j sin ωx and the filtered sinusoid as

o(x) = h(x) ∗ s(x) = A e^{j(ωx + φ)}

The Fourier transform pair is

h(x) ←→ H(ω)

The Fourier transform in the continuous domain:

H(ω) = ∫_{-∞}^{∞} h(x) e^{-jωx} dx

The Fourier transform in the discrete domain:

H(k) = (1/N) ∑_{x=0}^{N-1} h(x) e^{-j2πkx/N}

where N is the length of the signal.

The discrete form is known as the Discrete Fourier Transform (DFT).
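The discrete formula above can be evaluated directly and checked against a library FFT; a minimal NumPy sketch (the 1/N normalization follows the slide's convention, so the result equals `np.fft.fft(h) / N`):

```python
import numpy as np

def dft(h):
    """Direct DFT with the slide's normalization:
    H(k) = (1/N) * sum_x h(x) e^{-j 2 pi k x / N}."""
    N = len(h)
    x = np.arange(N)
    return np.array([(h * np.exp(-2j * np.pi * k * x / N)).sum() / N
                     for k in range(N)])

# A pure sinusoid at frequency k = 3 concentrates all its energy in
# bins 3 and N-3 (the conjugate-symmetric pair), each with magnitude 1/2.
N = 32
x = np.arange(N)
h = np.sin(2 * np.pi * 3 * x / N)
H = dft(h)
```

This is the statement from the previous slides in discrete form: the filter's response to each frequency is read off as a single complex gain per bin.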


Properties of the Fourier Transform

[Source: R. Szeliski]


2D Fourier Transform

Same as 1D, but in 2D. Now the sinusoid is

s(x, y) = sin(ω_x x + ω_y y)

The 2D Fourier transform in the continuous domain is then

H(ω_x, ω_y) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} h(x, y) e^{-j(ω_x x + ω_y y)} dx dy

and in the discrete domain

H(k_x, k_y) = (1/MN) ∑_{x=0}^{M-1} ∑_{y=0}^{N-1} h(x, y) e^{-2πj(k_x x + k_y y)/(MN)}

where M and N are the width and height of the image.

All the properties carry over to 2D.
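One property that carries over to 2D is the convolution theorem: circular convolution in the spatial domain equals pointwise multiplication of the 2D DFTs. A small NumPy check (using `np.fft.fft2`, which follows the standard per-axis convention without the slide's 1/MN factor):

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal((8, 8))              # random test image
h = np.zeros((8, 8))
h[0, 0], h[0, 1], h[1, 0] = 0.5, 0.25, 0.25  # small blur kernel

# Circular convolution computed directly from the definition...
g = np.zeros((8, 8))
for i in range(8):
    for j in range(8):
        for k in range(8):
            for l in range(8):
                g[i, j] += f[k, l] * h[(i - k) % 8, (j - l) % 8]

# ...and via the Fourier domain: multiply the transforms, invert.
g_fft = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h)))
print(np.allclose(g, g_fft))                 # True
```

This is why large convolutions are often cheaper in the Fourier domain: the quadruple loop becomes two FFTs, a pointwise product, and an inverse FFT.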


Example of 2D Fourier Transform

[Source: A. Jepson]


Pyramids

We might want to change the resolution of an image before processing.

We might not know which scale we want, e.g., when searching for a face in an image.

In this case, we will generate a full pyramid of different image sizes.

Pyramids can also be used to accelerate the search, by first searching at the coarser levels of the pyramid and then at full resolution.
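One common construction (a Gaussian pyramid; the slides do not fix a particular kernel) is to repeatedly blur and downsample by 2. A minimal NumPy sketch with a separable [1 2 1]/4 binomial blur, helper names mine:

```python
import numpy as np

def blur_downsample(f):
    """One pyramid level: separable [1 2 1]/4 binomial blur
    (edge-replicated borders) followed by factor-2 decimation."""
    fp = np.pad(f, 1, mode='edge')
    h = (fp[:, :-2] + 2 * fp[:, 1:-1] + fp[:, 2:]) / 4.0   # horizontal
    v = (h[:-2, :] + 2 * h[1:-1, :] + h[2:, :]) / 4.0      # vertical
    return v[::2, ::2]

def pyramid(f, levels=3):
    """Full image pyramid: list of progressively coarser images."""
    levs = [f]
    for _ in range(levels - 1):
        levs.append(blur_downsample(levs[-1]))
    return levs

img = np.ones((16, 16))
p = pyramid(img)
print([l.shape for l in p])     # [(16, 16), (8, 8), (4, 4)]
```

The blur before decimation is essential: it removes high frequencies that would otherwise alias when every second sample is discarded.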


Image Pyramid

[Source: R. Szeliski]


Interpolation and Decimation

To interpolate (or upsample) an image to a higher resolution, we need to select an interpolation kernel with which to convolve the image

g(i, j) = ∑_{k,l} f(k, l) h(i - rk, j - rl)

with r the up-sampling rate.

The linear interpolator (corresponding to the tent kernel) produces interpolating piecewise linear curves.

More complex kernels exist, e.g., B-splines.

Decimation reduces the resolution:

g(i, j) = ∑_{k,l} f(k, l) h(i - k/r, j - l/r)

with r the down-sampling rate.

Different filters exist for decimation as well.
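The upsampling formula with a tent kernel reduces to ordinary linear interpolation; a minimal 1D NumPy sketch (the 2D case applies the same kernel separably; function names are mine):

```python
import numpy as np

def tent(t, r):
    """Tent (linear-interpolation) kernel with support r samples."""
    return max(0.0, 1.0 - abs(t) / r)

def upsample(f, r):
    """Upsample a 1D signal by rate r using the formula from the text:
    g(i) = sum_k f(k) h(i - r k), here with the tent kernel."""
    n = (len(f) - 1) * r + 1
    g = np.zeros(n)
    for i in range(n):
        for k in range(len(f)):
            g[i] += f[k] * tent(i - r * k, r)
    return g

f = np.array([0.0, 2.0, 4.0])
print(upsample(f, 2))      # linear interpolation: [0. 1. 2. 3. 4.]
```

Each output sample i draws on the (at most two) input samples whose scaled positions rk fall within one kernel width of i, which is exactly piecewise linear interpolation.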


Multi-Resolution Representations

The most used one is the Laplacian pyramid:

We first blur and subsample the original image by a factor of two and store this in the next level of the pyramid.

We then subtract this low-pass version from the original to yield the band-pass Laplacian image.

The pyramid has perfect reconstruction: the Laplacian images plus the base-level Gaussian are sufficient to exactly reconstruct the original image.

Wavelets are an alternative family of pyramids; we will not cover them here.
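The blur/subsample/subtract recipe above can be written down directly. Below is a minimal NumPy sketch (the filter choices are mine, not the lecture's); perfect reconstruction holds because each level stores the exact residual against the upsampled coarser level:

```python
import numpy as np

def blur(img):
    # Separable 5-tap binomial low-pass (an assumed, standard choice).
    k = np.array([1, 4, 6, 4, 1], dtype=float) / 16
    p = np.pad(img, 2, mode="reflect")
    p = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, p)

def up2(img):
    # Nearest-neighbour upsampling by 2 -- a crude stand-in for a proper
    # interpolation filter; any fixed choice preserves perfect reconstruction.
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def laplacian_pyramid(img, levels):
    pyr = []
    for _ in range(levels - 1):
        low = blur(img)[::2, ::2]     # next (coarser) Gaussian level
        pyr.append(img - up2(low))    # band-pass residual at this level
        img = low
    pyr.append(img)                   # base-level Gaussian image
    return pyr

def reconstruct(pyr):
    img = pyr[-1]
    for band in reversed(pyr[:-1]):
        img = up2(img) + band         # exact inverse of the analysis step
    return img
```

Because the synthesis step simply undoes the analysis step, `reconstruct(laplacian_pyramid(img, L))` recovers `img` regardless of which blur and upsampling filters are chosen.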

[Source: R. Szeliski]


Next class ... some image features
