CVIU Lecture 1



ENGN8530: Computer Vision and Image Understanding: Theories and Research

Topic 1: Introduction to Computer Vision and Image Understanding

Dr Chunhua Shen
Dr Roland Goecke
VISTA / NICTA & RSISE, ANU


What is Computer Vision?

"Vision is a process that produces, from images of the external world, a description that is useful to the viewer and not cluttered with irrelevant information." (Marr and Nishihara, 1978)

"Computer vision is the science and technology of machines that see. Computer vision is concerned with the theory and technology for building artificial systems that obtain information from images or multi-dimensional data." (Wikipedia)

Reference:
D. Marr and K. Nishihara, "Representation and recognition of the spatial organisation of three-dimensional shapes", Proc. Royal Society, B-200, 1978, pp. 269-294.


What is Computer Vision? (2)

Sometimes seen as complementary to biological vision. In biological vision, the visual perception of humans and various animals is studied, resulting in models of how these systems operate in terms of physiological processes.

Computer vision, on the other hand, studies and describes artificial vision systems that are implemented in software and/or hardware.


What is Computer Vision? (3)

Applications:
- Controlling processes (robots, vehicles)
- Detecting events (visual surveillance)
- Organising information (indexing databases of images / videos)
- Modelling objects or environments (medical image analysis)
- Interaction (HCI)

Source: Wikipedia


    Image Understanding

    Computer vision goeshand in hand withimage understanding

    What information dowe need to know tounderstand the scene?

    How can we makedecisions about whatobjects are present,their shape, their

    positioning? Source: CMU Computer Vision course


Image Understanding (2)

Many different questions and approaches to solving computer vision / image understanding problems:
- Can we build useful machines to solve specific (and limited) vision problems?
- Is there anything special about the environment which makes vision possible?
- Can we build a model of the world / scene from 2D images?

Many different fields are involved, e.g. computer science, AI, neuroscience, psychology, engineering, philosophy, art.


Sub-areas of CVIU

- Scene reconstruction
- Event detection
- Object tracking
- Object recognition
- Object structure recovery
- Ego-motion
- Multi-view geometry
- Indexing of image / video databases


Scene Reconstruction

- From stereo
- From multiple views


    Event Detection

    Source: MERL

    Source: Roland Goecke


    Object Tracking

    Source: Roland Goecke


    Object Recognition

[Figure: a query image is matched against an image database to retrieve results]

Source: David Nister


    Object Structure Recovery

Reference:
A.D. Worrall, J.M. Ferryman, G.D. Sullivan and K.D. Baker, "Pose and structure recovery using active models", Proc. 6th British Machine Vision Conference, Vol. 1, Birmingham, UK, 1995, pp. 137-146.


    Ego-motion

    Estimated camera path

    Optical flow

    Source: Roland Goecke


    Multi-View Geometry

    Epipolar geometry

    Source: Richard Hartley, Andrew Zisserman


    Indexing and Retrieval

[Figure: a query region retrieves ranked results from a video database]

Reference:
J. Sivic and A. Zisserman, "Video Google: A Text Retrieval Approach to Object Matching in Videos", Proc. International Conference on Computer Vision, Nice, France, 2003, pp. 1470-1477.


The Default Approach (Marr)

Work bottom-up from the image to a 3D world model via a hierarchy of representations as follows:
- Pixel array: the image
- Raw primal sketch: edge, corner, etc. representation
- Primal sketch: structural information, i.e. groupings, segmentations, etc.
- 2½-D sketch: depth information in an image-centred view
- 3-D world model

Reference:
D. Marr, Vision, Freeman, 1982.


The Default Approach (2)

Image sensor (visible, infra-red, radar)
→ Image capture
→ Digitisation
→ Image processing
→ Feature detection (edges, corners, regions)
→ Feature grouping
→ Characterisation of parts
→ Object recognition
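As one possible concrete rendering of the early stages of this pipeline, here is a minimal Python sketch using OpenCV; the input file name, blur kernel, and detector thresholds are illustrative assumptions, not values from the slides.

```python
# A minimal sketch of the early pipeline stages using OpenCV and NumPy.
# File name and all parameter values below are illustrative only.
import cv2

# Image capture / digitisation: load an already-digitised frame.
image = cv2.imread("frame.png")  # hypothetical input file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Image processing: reduce noise before feature detection.
gray = cv2.GaussianBlur(gray, (5, 5), 0)

# Feature detection: edges and corners.
edges = cv2.Canny(gray, 100, 200)
corners = cv2.goodFeaturesToTrack(gray, maxCorners=100,
                                  qualityLevel=0.01, minDistance=10)

# Feature grouping: connected components of the edge map as a crude grouping.
num_groups, labels = cv2.connectedComponents(edges)
n_corners = 0 if corners is None else len(corners)
print(f"{num_groups - 1} edge groups, {n_corners} corners")
```

Characterisation of parts and object recognition would build on these grouped features; they are omitted here.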


What is an Image?

An image is an array/matrix of values (picture elements = pixels) on a plane which describes the world from the point of view of the observer.

Because of the line-of-sight effect, this is a 2D representation of the 3D world.

The meaning of the pixels depends on the sensors used for their acquisition.

Source: Antonio Robles-Kelly


Imaging Sensors

The information seen by the imaging device is digitised and stored as pixel values.

Two important quantities of imaging sensors are:
- Spatial resolution: How many pixels are there? (image size)
- Signal resolution: How many values per pixel?

There are many different types of sensors:
- Optical: CCDs, CMOS, photodiodes, photomultipliers, photoresistors
- Infrared: bolometers
- Others: range sensors (laser), Synthetic Aperture Radar (SAR), Positron Emission Tomography (PET), Computed (Axial) Tomography (CAT/CT), Magnetic Resonance Imaging (MRI)
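To make the two resolution quantities concrete, here is a minimal NumPy sketch; the image size and bit depth are arbitrary assumptions.

```python
# Spatial resolution = number of pixels; signal resolution = values per pixel.
# The size (640x480) and bit depth (8-bit) are arbitrary illustrative choices.
import numpy as np

image = np.zeros((480, 640), dtype=np.uint8)      # 8-bit greyscale image

spatial_resolution = image.shape                  # (rows, cols) = (480, 640)
values_per_pixel = np.iinfo(image.dtype).max + 1  # 2**8 = 256 grey levels

print(f"Spatial resolution: {spatial_resolution[1]}x{spatial_resolution[0]} pixels")
print(f"Signal resolution: {values_per_pixel} values per pixel")
```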


Electro-Magnetic Spectrum

[Figure: spectrum bands — UV | Visible (from 0.4 µm) | NIR (to 1.0 µm) | SWIR (1.7–2.5 µm) | MWIR (3.0–5.0 µm) | LWIR (8.0–14.0 µm)]

The human eye can see light between 400 and 700 nm.


Charge-Coupled Device (CCD)

- CCDs (Charge-Coupled Devices) were invented in 1969 by Willard Boyle and George Smith at AT&T.
- They are composed of an array of capacitors which are sensitive to light.
- More modern devices are based upon photodiodes.

Source: Wikipedia


CCD (2)

- Generally, the light-sensitive unit of construction is arranged in an array whose topology is a lattice (not always true, e.g. log-polar CCDs).
- Colour CCDs:
  - Bayer filter: 1x Red, 1x Blue, 2x Green, because the human eye is more sensitive to green
  - RGBE filter: 1x Red, 1x Blue, 1x Green, 1x Emerald (Cyan)

Source: Wikipedia
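The sketch below builds a Bayer mosaic from a full-colour image, assuming the common RGGB ordering of the 2x2 cell (real sensors also use GRBG and other orderings); note that green is sampled twice per cell, matching the 2x Green above.

```python
# A minimal sketch of an RGGB Bayer mosaic; the RGGB ordering is one common
# layout, assumed here purely for illustration.
import numpy as np

def bayer_mosaic(rgb):
    """Subsample a full-colour image into a single-channel Bayer mosaic."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R at even rows, even cols
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G at even rows, odd cols
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G at odd rows, even cols
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B at odd rows, odd cols
    return mosaic

rgb = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
print(bayer_mosaic(rgb))
```

Recovering a full-colour image from such a mosaic (demosaicing) interpolates the two missing channels at each pixel.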


Bolometers

Invented by the astronomer Samuel Pierpont Langley in 1878.

A bolometer is a device consisting of an "absorber" in contact with a heat sink through an insulator. The sink can be viewed as a reference for the absorber temperature, which is raised by the power of the incident electromagnetic wave.

Source: Los Alamos National Laboratory


Microbolometer

The microbolometer, a particular kind of bolometer, is the basis for thermal cameras.

It is a grid of vanadium oxide or amorphous silicon heat sensors atop a corresponding grid of silicon.

IR radiation from a specific range of wavelengths strikes the vanadium oxide and changes its electrical resistance. This resistance change is measured and processed into temperatures which can be represented graphically.

Source: Roland Goecke


Synthetic Aperture Radar

SAR is an active sensing technique:
- The active sensor transmits radio waves
- The antenna picks up reflections

For a conventional radar, the footprint is governed by the size of the antenna (aperture).

SAR creates a synthetic aperture and delivers a 2D image. One dimension is the range (cross track), whereas the other one is the azimuth (along track).

Sonar and ultrasound work on the same principles but at different wavelengths.


SAR (2)

[Figure: SAR imaging geometry — radar track, range (cross track), azimuth (along track), nadir track; SAR image of Venus]

RADAR = Radio Detection and Ranging
NADIR = opposite of zenith

Source: Wikipedia


Positron Emission Tomography

- Active sensing technique; PET is based on measuring emitted radiation.
- PET is a nuclear medicine imaging technique which uses radiation from a radio-isotope introduced into the target.
- PET produces a 3D image or map of functional processes in the body.

Source: Wikipedia


Magnetic Resonance Imaging

- Active sensing technique; MRI is also based on measuring emitted radiation.
- MRI stimulates the emission of radiation by aligning the spins of water molecules, making use of a strong magnetic field (several tesla!).
- Good for showing soft tissue
- Not good for showing bones

[Images: MRI; Magnetic Resonance Angiography]

Source: Wikipedia


Functional MRI

- Functional MRI (fMRI) measures signal changes in the brain that are due to changing neural activity.
- Increases in neural activity cause changes in the MR signal due to a change in the ratio of oxygenated to deoxygenated haemoglobin.
- Deoxygenated haemoglobin attenuates the MR signal.

fMRI of head: highlighted areas show primary visual cortex

Source: Wikipedia


Computed (Axial) Tomography

- Employs a set of axially acquired X-ray images to recover a 3D representation of the object.
- Originally, the images were in axial or transverse planes, but modern CT scanners deliver volumetric data.
- Digital geometry processing is used to generate a 3D image of the internals of an object from a large series of 2D X-ray images taken around a single axis of rotation.

CT scan of head

Source: Wikipedia


CAT/CT

- Good for showing bones
- Not good for showing soft tissue

[Image: modern diagnostic software]


Camera Geometry

- The aperture allows light to enter the camera.
- The image plane is where the image is formed.
- The focal length is the distance between the aperture and the image plane.
- The optical axis passes through the centre of the aperture and is perpendicular to it.

[Figure: aperture of diameter d, image plane at focal length f, image coordinates (x', y'), optical axis z]


Camera Geometry (2)

[Figure: a point at height x and depth z projects to x' on the image plane at focal length f]

By similar triangles, $x'/f = x/z$, so

$$x' = \frac{f x}{z} = f \tan\theta$$

For a small angle $\theta$, $\tan\theta \approx \theta$, so $x' \approx f\theta$.
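A direct Python transcription of this projection equation; the focal length and point coordinates in the example are made-up values, not from the slides.

```python
# Pinhole projection x' = f * x / z, as on the slide; example values are
# arbitrary. Units just need to be consistent (metres here).
def project(x, y, z, f):
    """Project a 3D camera-frame point onto the image plane at distance f."""
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return f * x / z, f * y / z

# A point 2 m away, 0.5 m off-axis, with a 35 mm focal length:
x_img, y_img = project(0.5, 0.1, 2.0, 0.035)
print(f"image coordinates: ({x_img * 1000:.2f} mm, {y_img * 1000:.2f} mm)")
```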


Camera Geometry (3)

[Figure: an object spanning from $x_b$ (bottom) to $x_t$ (top) at depth $z$ projects to $x'_b$ and $x'_t$ on the image plane]

Using the formula from the previous slide,

$$x'_t = \frac{f x_t}{z}, \qquad x'_b = \frac{f x_b}{z}$$

Hence, size transforms as

$$x'_t - x'_b = \frac{f (x_t - x_b)}{z}$$

and, in terms of the angle $\theta$ subtended by the object, $x'_t - x'_b = 2 f \tan(\theta/2)$.


Camera Geometry (4)

[Figure: rays from a close object and a distant object passing through the aperture]

- Rays that pass through the camera aperture spread out and do not make a sharp point on the image.
- These rays need to be focussed to make a sharp point in the image.
- The rays from close objects diverge more than those from distant objects.
- For very distant objects, the rays are effectively parallel.


Aperture and Resolution

- Light diffracts as it passes through the aperture.
- A point in the scene spreads out into a blob in the image (a fundamental limit on image sharpness).
- The size of the Airy disk (and the best resolution) is given by the Rayleigh criterion:

$$\theta_{\min} = 1.22\,\frac{\lambda}{d}, \qquad R_{\min} = 1.22\,\frac{\lambda f}{d}$$

where $\lambda$ is the wavelength of the light and $d$ is the aperture diameter.

[Figure: Airy disk for a circular aperture; diffraction pattern for a square aperture; two just-separable points]
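A small helper that evaluates the Rayleigh criterion above; the 550 nm wavelength (mid-visible green), 5 mm aperture, and 50 mm focal length are assumed example values.

```python
import math

# Rayleigh criterion from the slide: theta_min = 1.22 * lambda / d and
# R_min = 1.22 * lambda * f / d. Example values below are assumptions.
def rayleigh(wavelength, aperture, focal_length):
    theta_min = 1.22 * wavelength / aperture   # angular limit, radians
    r_min = theta_min * focal_length           # spot size on the image plane
    return theta_min, r_min

theta, r = rayleigh(550e-9, 5e-3, 50e-3)  # 550 nm light, 5 mm aperture, 50 mm lens
print(f"theta_min = {math.degrees(theta) * 3600:.1f} arcsec, "
      f"R_min = {r * 1e6:.1f} um")
```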


Resolution

- The resolution of a camera is the minimum separation between two points such that they appear separately on the image plane.
- Since distant objects appear smaller and closer together, the resolution varies with respect to the distance.
- The angle between separable objects does not vary with distance → angular resolution.
- The distance on the image plane does not vary → image-plane resolution.


Camera Models

- Pinhole camera
- Camera with lenses


Pinhole Camera

Advantages:
- No distortion of image
- Depth of field from a few cm to infinity
- Wide angular field
- Works on ultra-violet and X-rays

Disadvantages:
- Very limited light gathering
- Poor resolution


Pinhole Camera (2)

- The simplest camera
- The pinhole (aperture d) must be small to get a sharp image
- But we need a large pinhole to get enough light!


Pinhole Camera (3)

For distant objects the geometric limit is

$$R = d$$

The diffraction limit is

$$R = 1.22\,\frac{\lambda f}{d}$$

The best resolution occurs when these two are equal:

$$d = 1.22\,\frac{\lambda f^*}{d} \quad\text{or}\quad f^* = \frac{d^2}{1.22\,\lambda}$$

$f^*$ is the optimal focal length.

[Figure: resolution R versus focal length f — the geometric limit is constant at R = d, while the diffraction limit grows with f; the two curves cross at f*]
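This sketch evaluates $f^* = d^2/(1.22\lambda)$ and the corresponding diffraction-limited angle; with a 0.5 mm pinhole and an assumed mid-visible wavelength of 550 nm it reproduces the ~37 cm and ~4.6' figures quoted on the examples slide below.

```python
import math

# Optimal pinhole focal length f* = d**2 / (1.22 * lambda), from the slide.
# lambda = 550 nm (mid-visible green) is an assumed value.
def optimal_focal_length(aperture, wavelength=550e-9):
    return aperture**2 / (1.22 * wavelength)

d = 0.5e-3                         # 0.5 mm pinhole
f_star = optimal_focal_length(d)
theta_min = 1.22 * 550e-9 / d      # diffraction-limited angle, radians
print(f"f* = {f_star * 100:.0f} cm")                             # ~37 cm
print(f"theta_min = {math.degrees(theta_min) * 60:.1f} arcmin")  # ~4.6'
```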


Pinhole Camera (4)

[Plots: the geometric limit, and the effect of a longer wavelength and of a smaller aperture on the resolution curves]


Cameras with Lenses

- For better light-gathering capabilities, we need to increase the aperture.
- A lens removes the geometric limit on resolution, since it focuses all light entering through the aperture on the same point on the image.

[Figure: a lens of aperture d and focal length f focusing rays that would otherwise follow the pinhole path]


Cameras with Lenses (2)

- We can have apertures as large as we like.
- The price to pay: chromatic and spherical aberration.
- The angular resolution of a lens-based camera is the diffraction limit of the aperture:

$$\theta = 1.22\,\frac{\lambda}{d}$$

- The larger the aperture, the better the resolution.
- The image-plane resolution is still

$$R = 1.22\,\frac{\lambda f}{d}$$


Camera Resolution Examples

Pinhole camera, 0.5 mm pinhole:
- Optimal focal length f* = 37 cm
- θ = 4.6', equivalent to 1 mm at 75 cm

For a 35 mm lens camera and visible light:
- θ = 3.9'', equivalent to 1 mm at 52 m
- Focal length depends on the lens, but typically


Illumination

- The amount of light entering the camera is proportional to the area of the lens ($\pi d^2/4$).
- The area covered by the image is proportional to $f^2$.
- So, the brightness of the image is proportional to $d^2/f^2$, i.e. dependent on the focal ratio $f/d$.
- Brightness is controlled by a moveable aperture which changes d.
- Referred to by a sequence of f-stops: f:1 is fully open, and each successive f-stop halves the brightness (so the aperture diameter is reduced by $\sqrt{2}$): f:1.4, f:2, f:2.8, f:4, f:5.6
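A short sketch of this f-stop progression: each stop multiplies the focal ratio by √2, which halves the aperture area and hence the brightness (the starting point f:1 follows the slide; everything else is arithmetic).

```python
import math

# Each successive f-stop scales the focal ratio f/d by sqrt(2); since the
# aperture area is proportional to d**2, this halves the image brightness.
f_number = 1.0  # f:1 is fully open
for stop in range(6):
    relative_brightness = 1.0 / f_number**2  # brightness ∝ d^2/f^2 = 1/N^2
    print(f"f:{f_number:.1f}  relative brightness = {relative_brightness:.3f}")
    f_number *= math.sqrt(2)
```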


Absorption and Reflection

Incident light energy = reflected + absorbed + transmitted energy

All of these are object (material, surface) dependent!


The BSDF

Bidirectional Scattering Distribution Function: describes the way in which light is scattered by a surface.

BSDF = BRDF + BSSRDF + BTDF
- BRDF: Bidirectional reflectance distribution function
- BSSRDF: Bidirectional surface scattering reflectance distribution function (incl. subsurface scattering)
- BTDF: Bidirectional transmittance distribution function

Source: Wikipedia


The BRDF

- It describes the reflectance of an object as a function of the illumination, viewing geometry and wavelength.
- It is given by the ratio of the reflected radiance to the incident irradiance (incident flux per unit area).

Reference:
F. Nicodemus, "Reflectance nomenclature and directional reflectance and emissivity", Appl. Opt., Vol. 9, 1970, pp. 1474-1475.


The BRDF (2)

- The modelling of the lighting conditions in the scene is of pivotal importance for the acquisition and processing of digital imagery.
- The radiance function can be decomposed into a linear combination of ambient, diffuse and specular components.
- Recovering the radiance function from a single image is an underconstrained problem.


The BRDF (3)

In general, the BRDF has the standard form (after Nicodemus, 1970)

$$f_r(\omega_i, \omega_o) = \frac{\mathrm{d}L_o(\omega_o)}{\mathrm{d}E_i(\omega_i)} = \frac{\mathrm{d}L_o(\omega_o)}{L_i(\omega_i)\cos\theta_i \,\mathrm{d}\omega_i}$$

The function depends on:
- Incoming and outgoing angle
- Incoming and outgoing wavelength
- Incoming and outgoing polarisation
- Incoming and outgoing position (subsurface scattering)
- Delay between the incoming and outgoing light rays


Radiance

- Power per unit projected area perpendicular to the ray, per unit solid angle in the direction of the ray.
- The flux is given by

$$\mathrm{d}\Phi = L(x, \omega)\cos\theta \,\mathrm{d}\omega \,\mathrm{d}A$$

- The solid angle is proportional to the surface area S of a projection of the object onto a sphere, divided by the square of its radius R: $\omega = S/R^2$.

[Figure: radiance L(x, ω) leaving a surface patch dA into solid angle dω]
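One step the slide leaves implicit, worth writing out because the BRDF on the previous slides divides by exactly this quantity: integrating radiance over the hemisphere $\Omega$ of incoming directions gives the irradiance at a point (standard radiometry, not specific to these slides):

$$E(x) = \frac{\mathrm{d}\Phi}{\mathrm{d}A} = \int_{\Omega} L(x, \omega)\cos\theta \,\mathrm{d}\omega$$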


Example BRDFs

- Oren and Nayar
- Cook and Torrance


Example BRDFs (2)

where $m_p$ is the microfacet slope


Example BRDFs (3)

- Phong
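As a minimal concrete example of one of these models, here is a sketch of Phong-style reflectance (diffuse plus specular lobes) in Python; the coefficients k_d, k_s and the shininess exponent n are arbitrary assumptions, and normalisation factors are omitted for simplicity.

```python
import numpy as np

# A minimal sketch of Phong-style reflectance: a Lambertian diffuse term plus
# a specular lobe around the mirror direction. k_d, k_s and n are arbitrary
# example values; normalisation constants are omitted.
def phong(normal, light_dir, view_dir, k_d=0.7, k_s=0.3, n=32):
    normal = normal / np.linalg.norm(normal)
    light_dir = light_dir / np.linalg.norm(light_dir)
    view_dir = view_dir / np.linalg.norm(view_dir)

    diffuse = k_d * max(np.dot(normal, light_dir), 0.0)
    # Mirror reflection of the light direction about the normal:
    reflect = 2.0 * np.dot(normal, light_dir) * normal - light_dir
    specular = k_s * max(np.dot(reflect, view_dir), 0.0) ** n
    return diffuse + specular

# Light 45 degrees off the normal, viewer along the mirror direction:
print(phong(np.array([0.0, 0.0, 1.0]),
            np.array([1.0, 0.0, 1.0]),
            np.array([-1.0, 0.0, 1.0])))
```

With the viewer exactly on the mirror direction, the specular term is at its maximum; tilting the view direction away makes it fall off as the cosine raised to the power n.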