
C18 Computer Vision

David Murray

[email protected]
www.robots.ox.ac.uk/∼dwm/Courses/4CV

Michaelmas 2015


Course Content

DWM: Intro, basic image features, basic multiview geometry ...

1. Introduction; imaging geometry; camera calibration
2. Salient feature detection – edges, lines and corners
3. Recovering 3D from two images I: epipolar geometry
4. Recovering 3D from two images II: stereo correspondence algorithms; triangulation

Slides, videos, etc at www.robots.ox.ac.uk/∼dwm/Courses/4CV

AV: Motion, tracking, recognition ...

5. Structure from Motion I: estimating the F matrix
6. Structure from Motion II
7. Visual Motion and Tracking
8. Object Detection and Recognition


Useful texts

Multiple View Geometry in Computer Vision
Richard Hartley, Andrew Zisserman. Cambridge U. Press. ISBN 0521623049

Computer Vision: Algorithms and Applications
Richard Szeliski. Springer; online at szeliski.org/book

Computer Vision: A Modern Approach
David Forsyth, Jean Ponce. Prentice Hall. ISBN 0130851981


Computer Vision: This time ...

1. Introduction; imaging geometry; camera calibration
2. Salient feature detection – edges, lines and corners
3. Recovering 3D from two images I: epipolar geometry
4. Recovering 3D from two images II: stereo correspondence algorithms; triangulation


Lecture 1

1.1 Introduction

1.2 The perspective camera as a geometric device

1.3 Perspective using homogeneous coordinates

1.4 Calibrating the elements of the projection model


1.1 Introduction

The aim in computational vision is to take a number of 2D images, and from them obtain an understanding of the 3D environment: what is in it, and how it evolves over time.

What have we here ... ?

So it is easy ...


Wrong! It’s the mother of big data problems.
Some thoughts why?

From the hardware-engineering perspective?
The raw throughput is upsettingly large: a colour stereo pair at 30 Hz is ∼100 MB s⁻¹.
Now multiply by a non-trivial processing cost per byte ...
Image collections are huge.

Certainly challenging, but no longer frightening.

From the applied-maths perspective?
Scene information is made highly implicit, or lost, by projection.
Inverse mappings 2D→3D are ill-posed and ill-conditioned.
They can only be solved by introducing constraints:
(i) about the way the world actually works; or
(ii) about the way we expect it to work.

We now know about these. No longer frightening.


Some thoughts why ...

From the Information-Engineering/AI perspective ...

Images have uneven information content, both absolutely and contextually.

Computational visual semantics — what does visual stuff mean exactly?

If we are under time pressure, what is the important visual stuff right now?

Still a massive challenge — if we want genuine autonomy


Natural vision a hard problem

But we see effortlessly! Yep, spot on — if one neglects:

the ∼10¹¹ neurons involved;
aeons of evolution generating hardwired priors P(I);
that we sleep with our eyes shut, and avert gaze when thinking.

Hard work for our brains — does machine vision have a hope?

Perhaps building a “general” vision system is a flawed concept.

Evidence that the human visual system is a bag of tricks — specialized processing for specialized tasks.
Perhaps we should expect no more of computer vision?

However, the h.v.s. does give us a convincing magic show.
Each trick flows seamlessly into the next. Do we have an inkling how that will be achieved?


So why bother? What are the advantages?

From a sensor standpoint ...
Vision is a passive sensor (unlike sonar, radar, lasers).
Wide range of depths; overlaps and complements other sensors.
Wide diversity of methods of recovering 3D information: so vision has process redundancy.
Images provide data redundancy.

From a natural-communications standpoint ...
The world is awash with information-rich photons.
Because we have eyes, vision provides a natural language of communication. If we want robots/man-machine-interfaces to act and interact in our environment, they have to speak that language.


Organizing the tricks ...

Although human and computer vision might be bags of tricks, it is useful to place the tricks within larger processing paradigms.

For example:
a. Data-driven, bottom-up processing
b. Model-driven, top-down, generative processing
c. Dynamic vision (mixes bottom-up with top-down feedback)
d. Active vision (task oriented)
e. Data-driven discriminative approach (machine learning)

These are neither all-embracing nor exclusive


a. Data-driven, bottom-up processing

Image processing produces a map of salient 2D features.

Features are input into a range of Shape-from-X processes whose output was the 2½D sketch. This was a thumbnail sketch of the 3D world containing sparse depths, surface orientations and so on, so called because although it contained 3D information, the information is retinotopically mapped.

Only in the last stage was a fully 3D object-centred description achieved.

[Figure: the bottom-up pipeline — 2D image → 2D primal sketch (early vision) → retinotopic 2.5D sketch (Shape-from-X processes) → object-centred 3D objects (object recognition); the early stages are viewer-centred/retinotopic.]


b. Model-driven, and c. Dynamic vision

b. Model-driven, top-down, generative processing — a model of the scene is assumed known. Supply a pose for the object relative to the camera, and use projection to predict where its salient features should be found in image space.

Search for the features, and refine the pose by minimizing the observed deviation.

c. Dynamic vision mixes bottom-up/top-down by introducing feedback.

[Figure: left, top-down — 3D objects + pose are projected into image space and compared with the 2D image; right, dynamic — the bottom-up chain (2D image → 2D primal sketch → 2.5D sketch → 3D objects, via early vision, Shape-from-X and recognition) with feedback from later stages to earlier ones.]


d. Active Vision

A criticism of earlier paradigms was that vision was divorced from the actions that its “owner” might wish to take.

Actions A1 and A2 had to play “lucky dip” in the perceptual bran tub.

By introducing task-oriented sensing-perception-action loops:

1. Visual data need only be “good enough” to drive the particular action.
2. There is no need to build and maintain an overarching representation of the surroundings.
3. Computational resources are focussed where they are needed.

[Figure: two architectures — “Dynamic”: a single Sensing stage fills a percept pot from which actions A1 and A2 draw; “Active”: dedicated loops S1→P1→A1 and S2→P2→A2.]


e. Data-driven discriminative approach

The aim is to learn a description of the transformation between input and output using exemplars.

[Figure: training — example images and desired outputs are used to learn a transformation, which then maps a new image to an output image.]

Geometry is not forgotten, but implicit learned representations are favoured.


The importance of Geometry

A consistent thread through these paradigms is

The geometry of the scene, the lighting, and the properties of the imaging device are key to image formation, and so are also key to understanding reconstruction of the scene from imagery.

That geometry is paramount seems self-evident ...

Clear in the 1960’s (e.g. Larry Roberts’ 1965 system).

Message nearly lost in the 1970’s — expert systems in simple domains led to the “if above and blue then sky” rule-based approach. Disaster!

Geometrical modelling regained dominance in the 1980’s (calibrated) and 1990’s (uncalibrated).

Emphasis in 2000’s – avoiding explicit modelling of geometry by learning.


1.2 The perspective camera as a geometric device


The perspective camera as a geometric device

Standard cameras project images as though light rays enter via a pinhole at the optical centre.

The z-axis or optical axis is perpendicular to the image plane and passes through the optical centre. It is easy to use similar triangles to show that a scene point at X = (X, Y, Z)⊤ is imaged at x = (x, y)⊤, where

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} fX/Z \\ fY/Z \end{bmatrix}$$


Homogeneous coordinates

Awkward! The projection equation x = f X/Z is non-linear.

However, it turns out that the projection can be made linear by adopting homogeneous coordinates, which involves representing the image and scene in a higher-dimensional space.

They handle limiting cases better — e.g. points at infinity and vanishing points.

Using homogeneous coordinates also allows transformations to be concatenated more easily — and here we will start with that as motivation.


3D Euclidean transforms: inhomogeneous coords

A Euclidean transformation in 3D involves a rotation matrix and a translation. Using inhomogeneous coordinates:

$$X'_{3\times 1} = R_{3\times 3} X_{3\times 1} + t_{3\times 1}$$

where R is the orthogonal rotation matrix (R⊤R = I) and X′ etc. are column vectors.

Concatenation is a mess!

$$X_1 = R_1 X + t_1$$
$$X_2 = R_2 X_1 + t_2 = R_2(R_1 X + t_1) + t_2 = (R_2 R_1) X + (R_2 t_1 + t_2)$$


Euclidean transformations using homogeneous coords

If instead 3D points (X, Y, Z)⊤ are represented by a four-vector (X, Y, Z, 1)⊤, then the transformation can be represented by a 4×4 matrix:

$$\begin{bmatrix} X' \\ 1 \end{bmatrix} = E \begin{bmatrix} X \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} X \\ 1 \end{bmatrix}$$

Obviously the matrix has block form:

$$\begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_X \\ r_{21} & r_{22} & r_{23} & t_Y \\ r_{31} & r_{32} & r_{33} & t_Z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$


Combining successive transformations

Transformations can now be concatenated by matrix multiplication. If

$$\begin{bmatrix} X_1 \\ 1 \end{bmatrix} = E_{10} \begin{bmatrix} X_0 \\ 1 \end{bmatrix}, \qquad \begin{bmatrix} X_2 \\ 1 \end{bmatrix} = E_{21} \begin{bmatrix} X_1 \\ 1 \end{bmatrix}$$

then

$$\begin{bmatrix} X_2 \\ 1 \end{bmatrix} = E_{21} E_{10} \begin{bmatrix} X_0 \\ 1 \end{bmatrix}$$

Checking ...

$$\begin{bmatrix} X_1 \\ 1 \end{bmatrix} = \begin{bmatrix} R_1 & t_1 \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} X_0 \\ 1 \end{bmatrix} = \begin{bmatrix} R_1 X_0 + t_1 \\ 1 \end{bmatrix}$$

$$\begin{bmatrix} X_2 \\ 1 \end{bmatrix} = \begin{bmatrix} R_2 & t_2 \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} R_1 & t_1 \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} X_0 \\ 1 \end{bmatrix} = \begin{bmatrix} R_2 R_1 & R_2 t_1 + t_2 \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} X_0 \\ 1 \end{bmatrix} = \begin{bmatrix} (R_2 R_1) X_0 + (R_2 t_1 + t_2) \\ 1 \end{bmatrix}$$

as earlier.
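
A quick numerical check of this concatenation rule in MATLAB (the rotations, translations and test point are illustrative values, not from the lecture):

R1 = [0 -1 0; 1 0 0; 0 0 1];  t1 = [1; 2; 3];    % rotation by 90 deg about z
R2 = [1 0 0; 0 0 -1; 0 1 0];  t2 = [0; 1; 0];    % rotation by 90 deg about x
E10 = [R1 t1; 0 0 0 1];                          % first 4x4 Euclidean transform
E21 = [R2 t2; 0 0 0 1];                          % second
X0  = [4; 5; 6; 1];                              % homogeneous test point
X2a = E21 * (E10 * X0);                          % transform in two steps
X2b = [R2*R1, R2*t1 + t2; 0 0 0 1] * X0;         % closed form from the slide
disp(max(abs(X2a - X2b)))                        % 0, up to round-off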


Homogeneous notation — definition in R3

X = (X, Y, Z)⊤ is represented in homogeneous coordinates by ANY 4-vector (X₁, X₂, X₃, X₄)⊤ such that

X = X₁/X₄, Y = X₂/X₄, Z = X₃/X₄.

So the homogeneous vectors (X₁, X₂, X₃, X₄)⊤ and λ(X₁, X₂, X₃, X₄)⊤ represent the same point for any non-zero λ.

E.g. (2, 3, 5, 1)⊤, (4, 6, 10, 2)⊤ and (−3, −4.5, −7.5, −1.5)⊤ all represent the same inhomogeneous point (2, 3, 5)⊤.


Homogeneous notation – rules for use

1. Convert the inhomogeneous point to a homogeneous vector:

(X, Y, Z)⊤ → (X, Y, Z, 1)⊤

2. Apply a 4×4 transformation.

3. Dehomogenize the resulting vector:

(X₁, X₂, X₃, X₄)⊤ → (X₁/X₄, X₂/X₄, X₃/X₄)⊤

NB: general transformations are only defined up to scale. E.g.

$$\begin{bmatrix} 1 & 2 & 3 & 4 \\ -2 & 3 & -4 & 5 \\ 6 & 2 & 1 & 9 \\ 4 & -1 & 0 & 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 2 & 4 & 6 & 8 \\ -4 & 6 & -8 & 10 \\ 12 & 4 & 2 & 18 \\ 8 & -2 & 0 & 2 \end{bmatrix}$$

represent the SAME transformation.

The Euclidean transformation is a special case, as RR⊤ = I.
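
The three rules as a minimal MATLAB sketch, reusing the example projectivity above and the point (2, 3, 5)⊤ from the previous slide:

P   = [1 2 3 4; -2 3 -4 5; 6 2 1 9; 4 -1 0 1];   % the example projectivity above
Xin = [2; 3; 5];                 % inhomogeneous 3D point
Xh  = [Xin; 1];                  % 1. homogenize
Yh  = P * Xh;                    % 2. apply the 4x4 transformation
Yin = Yh(1:3) / Yh(4);           % 3. dehomogenize
Y2  = (2*P) * Xh;                % scaling P ...
disp(norm(Y2(1:3)/Y2(4) - Yin))  % ... leaves the result unchanged: 0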


Homogeneous notation — definition in R2

A 2D inhomogeneous vector x = (x, y)⊤ is represented in homogeneous coordinates by any 3-vector (x₁, x₂, x₃)⊤ such that

x = x₁/x₃, y = x₂/x₃.

E.g. (1, 2, 3)⊤ ≡ (3, 6, 9)⊤ ≡ (−2, −4, −6)⊤ all represent the same inhomogeneous 2D point (1/3, 2/3)⊤.


Projective transformations

A projective transformation is a linear transformation on homogeneous 4-vectors, represented by a non-singular 4×4 matrix:

$$\begin{bmatrix} X'_1 \\ X'_2 \\ X'_3 \\ X'_4 \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \\ p_{41} & p_{42} & p_{43} & p_{44} \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}$$

The effect on the homogeneous points is that the original and transformed points are linked through a projection centre.

The 4×4 matrix is defined up to scale, and so has 15 dof.


[**] More 3D-3D & 2D-2D Transformations

(Here ≃ denotes equality up to overall scale.)

3D, on homogeneous 4-vectors:

Projectivity (15 dof):
$$X' \simeq P_{4\times 4}\, X$$

Affine (12 dof):
$$\begin{bmatrix} X' \\ 1 \end{bmatrix} = \begin{bmatrix} A_{3\times 3} & t_3 \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} X \\ 1 \end{bmatrix}$$

Similarity (7 dof):
$$\begin{bmatrix} X' \\ 1 \end{bmatrix} = \begin{bmatrix} sR_{3\times 3} & st_3 \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} X \\ 1 \end{bmatrix}$$

Euclidean (6 dof):
$$\begin{bmatrix} X' \\ 1 \end{bmatrix} = \begin{bmatrix} R_{3\times 3} & t_3 \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} X \\ 1 \end{bmatrix}$$

2D, on homogeneous 3-vectors:

Projectivity (8 dof):
$$x' \simeq H_{3\times 3}\, x, \qquad H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$$

Affine (6 dof):
$$\begin{bmatrix} x' \\ 1 \end{bmatrix} = \begin{bmatrix} A_{2\times 2} & t_2 \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} x \\ 1 \end{bmatrix}$$

Similarity (4 dof):
$$\begin{bmatrix} x' \\ 1 \end{bmatrix} = \begin{bmatrix} sR_{2\times 2} & st_2 \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} x \\ 1 \end{bmatrix}$$

Euclidean (3 dof):
$$\begin{bmatrix} x' \\ 1 \end{bmatrix} = \begin{bmatrix} R_{2\times 2} & t \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} x \\ 1 \end{bmatrix}$$


Perspective 3D-2D transformation

Now constrain the transformed points to lie on the plane Z = f:

$$Z = f \;\Rightarrow\; X_{\text{image}} = \begin{bmatrix} x \\ y \\ f \\ 1 \end{bmatrix}$$

Then, because Z = f is fixed, it must be that

$$\lambda \begin{bmatrix} x \\ y \\ f \\ 1 \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ fp_{41} & fp_{42} & fp_{43} & fp_{44} \\ p_{41} & p_{42} & p_{43} & p_{44} \end{bmatrix} \begin{bmatrix} X \\ 1 \end{bmatrix}$$

The 3rd row is redundant! Remove it and re-label elements:

$$\lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} \begin{bmatrix} X \\ 1 \end{bmatrix}$$

This is a perspective transformation. The 3×4 matrix is the projection matrix.


1.3 Perspective using homogeneous coordinates

We now write down the form of the projection matrix for a simple pin-hole camera with focal length f, where the camera frame coincides with the world frame.

Image point = projection matrix × world point:

$$\lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

To check ...

λx = fX, λy = fY, λ = Z ⇒ x = fX/Z, y = fY/Z

— exactly what we wrote down using similar triangles.
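
A quick numerical check in MATLAB (the focal length and scene point are arbitrary test values):

f  = 2;
Pv = [f 0 0 0; 0 f 0 0; 0 0 1 0];   % vanilla pinhole projection matrix
X  = [1; 2; 4; 1];                  % scene point (X,Y,Z) = (1,2,4)
xh = Pv * X;                        % homogeneous image point, scale lambda = Z
x  = xh(1:2) / xh(3)                % = (fX/Z, fY/Z) = (0.5, 1.0)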


Perspective using homogeneous coordinates

It is useful to split the overall projection matrix into (i) a part that depends on the internals of the camera, (ii) a vanilla projection matrix, and (iii) a Euclidean transformation between the world and camera frames.

First assume that the scene or world coordinates are aligned with the camera coordinates, so that the extrinsic calibration matrix is an identity.

Image point = camera's intrinsic matrix × projection matrix (vanilla) × camera's extrinsic matrix × world point:

$$\lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} X$$


Perspective using homogeneous coordinates /ctd

Now make the projection matrix more general ...

1. Insert a rotation R and translation t between world and camera coordinates.
2. Insert extra terms in the intrinsic calibration matrix.

Intrinsic matrix × projection × extrinsic matrix:

$$\lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & sf & u_0 \\ 0 & \gamma f & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \\ 0 & 0 & 0 & 1 \end{bmatrix} X$$


The extrinsic matrix parameters

The camera's extrinsic matrix is just the rotation R and translation t that take points in the world or scene frame into the camera frame:

$$\begin{bmatrix} X_C \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} X_W \\ 1 \end{bmatrix}$$

The vector t is the origin of the world frame expressed in the cameraframe.

[Figure: the world frame (x, y, z)_W and the camera frame (x, y, z)_C, related by rotation R and translation t; a second sketch shows the two frames aligned.]


Building the Euclidean transformation in steps

It is convenient to build the transformation in stages.

Let A be a frame aligned with the camera.

Think about the rotation between the world frame and the aligned frame A ...

$$R_{CW} = R_{AW}$$

[Figure: the world frame W at origin O_W, the camera frame C at origin O_C, and the aligned frame A at O_W sharing C's orientation, related to W by rotation R.]


Building a Euclidean transformation in steps /ctd

Break the rotation into simple ones: through θ_z about z, then θ_y about y, and θ_x about x. Let frames ′ and ″ be intermediate frames ...

$$X' = R_z X_W = \begin{bmatrix} \cos\theta_z & \sin\theta_z & 0 \\ -\sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix} X_W$$

$$X'' = R_y X' = \begin{bmatrix} \cos\theta_y & 0 & -\sin\theta_y \\ 0 & 1 & 0 \\ \sin\theta_y & 0 & \cos\theta_y \end{bmatrix} X'$$

$$X_A = R_x X'' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & \pm\sin\theta_x \\ 0 & \mp\sin\theta_x & \cos\theta_x \end{bmatrix} X''$$

(You think about ±, ∓ in the last!)

$$R_{CW} = R_{AW} = R_x R_y R_z$$

NB: rotation does not commute ⇒ order is important.

[Figure: the intermediate frames ′ and ″ produced by the successive rotations through θ_z, θ_y and θ_x.]
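
These elementary rotations are easy to check numerically. A MATLAB sketch, with arbitrary test angles; the signs in R_x are one of the two choices the slide leaves as an exercise:

Rz = @(a) [ cos(a) sin(a) 0; -sin(a) cos(a) 0; 0 0 1];
Ry = @(a) [ cos(a) 0 -sin(a); 0 1 0; sin(a) 0 cos(a)];
Rx = @(a) [ 1 0 0; 0 cos(a) sin(a); 0 -sin(a) cos(a)];   % one sign choice
tz = 0.3; ty = -0.5; tx = 0.8;                           % test angles (rad)
Rcw = Rx(tx) * Ry(ty) * Rz(tz);                          % R_CW = Rx Ry Rz
disp(norm(Rcw'*Rcw - eye(3)))              % ~0: the product is orthogonal
disp(norm(Rcw - Rz(tz)*Ry(ty)*Rx(tx)))     % non-zero: order matters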


Build transformations in steps ...

Then deal with translation (t_CA = t_CW, because A shares the world frame's origin):

$$X_C = X_A + t_{CA} = X_A + t_{CW} = R_{CW} X_W + t_{CW}$$

$$E = \begin{bmatrix} R_{CW} & t_{CW} \\ 0^\top & 1 \end{bmatrix}$$

[Figure: the aligned frame A at O_W and the camera frame C at O_C, separated by the translation t.]


[**] Inverse of the Euclidean transformation

It must be the case that

$$\begin{bmatrix} R_{CW} & t_{CW} \\ 0^\top & 1 \end{bmatrix}^{-1} = \begin{bmatrix} R_{WC} & t_{WC} \\ 0^\top & 1 \end{bmatrix}$$

Now, we know that R_WC = (R_CW)⁻¹ = (R_CW)⊤, but what is t_WC? It is tempting to say −t_CW, but NO!

$$X_C = R_{CW} X_W + t_{CW} \qquad (t_{CW}\ \text{is the origin of}\ W\ \text{referred to}\ C)$$

$$\Rightarrow\; X_W = R_{WC}(X_C - t_{CW}) = R_{WC} X_C - R_{WC}\, t_{CW}$$

But by definition X_W = R_WC X_C + t_WC (t_WC is the origin of C referred to W)

$$\Rightarrow\; t_{WC} = -R_{WC}\, t_{CW}$$
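
A numerical check of this result in MATLAB (the R and t values are illustrative; any valid pair will do):

Rcw = [0 -1 0; 1 0 0; 0 0 1];  tcw = [1; 2; 3];
E   = [Rcw tcw; 0 0 0 1];
Rwc = Rcw';                     % inverse of a rotation is its transpose
twc = -Rwc * tcw;               % the result derived above
disp(norm(inv(E) - [Rwc twc; 0 0 0 1]))   % ~0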


The Intrinsic Matrix

1. In a real camera the image plane might be skewed.
2. The central axis of the lens might not line up with the optical axis.
3. The light-gathering elements might be rectangular, not square.

[Figure: an ideal framestore with origin (0,0) compared with real ones whose principal point is offset.]

The remedies:

1. Use parameter s ≈ 0 to account for axis skew.
2. Use an origin offset. (u₀, v₀) is the principal point, where the optic axis strikes the image plane. ((u₀/f, v₀/γf) is the offset before scaling.)
3. Allow different scaling in the x and y directions. γ is the aspect ratio.

$$K = \begin{bmatrix} f & 0 & 0 \\ 0 & \gamma f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & u_0/f \\ 0 & 1 & v_0/\gamma f \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & s & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} f & sf & u_0 \\ 0 & \gamma f & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$
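
Multiplying out the three factors confirms the result. A MATLAB sketch with illustrative values for f, γ, s, u₀ and v₀:

f = 1000; g = 1.1; s = 0.01; u0 = 320; v0 = 240;   % g plays gamma
K3 = [f 0 0; 0 g*f 0; 0 0 1] ...        % scaling
   * [1 0 u0/f; 0 1 v0/(g*f); 0 0 1] ...% principal-point offset
   * [1 s 0; 0 1 0; 0 0 1];             % skew
disp(K3 - [f s*f u0; 0 g*f v0; 0 0 1])  % zero matrix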


How can the aspect ratio γ be non-unity?

There is usually a 1-to-1 mapping between nominal camera pixels and framestore pixels. However, the camera pixels (that is, the elements gathering light) might not be square.

[Figure: a camera with rectangular pixels (w > h) alongside the framestore image it produces.]

Look at the framestore image. To take such an image the camera lens would have to be wide-angle in the x-direction, but narrow-angle in the y-direction.

⇒ f_x < f_y ⇒ f < γf ⇒ γ > 1 ⇒ γ = w/h

Good news! Modern digital cameras have γ ≈ 1.


Summary of steps from Scene to Image

Move the scene point (X_W, 1)⊤ into the camera coordinate frame by a 4×4 extrinsic Euclidean transformation:

$$\begin{bmatrix} X_C \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} X_W \\ 1 \end{bmatrix}$$

Project into an ideal camera (f = 1, γ = 1, s = 0, u₀ = v₀ = 0) via a vanilla perspective transformation:

$$\begin{bmatrix} x' \\ 1 \end{bmatrix} \simeq [I \,|\, 0] \begin{bmatrix} X_C \\ 1 \end{bmatrix}$$

Map the ideal image into the real image using the intrinsic matrix:

$$\begin{bmatrix} x \\ 1 \end{bmatrix} = K \begin{bmatrix} x' \\ 1 \end{bmatrix}$$

NB: the equals sign — the bottom-right element of K is K₃₃ = 1.
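
The whole chain as a MATLAB sketch, with made-up calibration values (K, R and t here are purely illustrative):

f = 800; u0 = 320; v0 = 240;            % gamma = 1, s = 0
K  = [f 0 u0; 0 f v0; 0 0 1];
R  = eye(3);  t = [0; 0; 10];           % world origin 10 units along the optic axis
Xw = [1; -1; 2];                        % world point
Xc = R * Xw + t;                        % 1. extrinsic: into the camera frame
xp = Xc(1:2) / Xc(3);                   % 2. vanilla projection (f = 1)
x  = K * [xp; 1];                       % 3. intrinsic mapping; x(3) = 1
P  = K * [R t];                         % equivalently, via P = K [R | t]:
xh = P * [Xw; 1];
disp(norm(xh(1:2)/xh(3) - x(1:2)))      % ~0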


1.4 Camera Calibration

An uncalibrated camera can only recover 3D information up to an unknown projective transformation of the scene. The X recovered then relates to the Euclidean X_E by an unknown 4×4 projectivity X_E ≃ H₄ₓ₄X.

To make Euclidean 3D measurements, the camera's intrinsic and extrinsic matrices must be calibrated.

There are a variety of methods for auto-calibration of cameras. These can recover H₄ₓ₄ on the fly.

Here we consider a method of pre-calibration using a specially made target.


Camera Calibration: Preamble

Multiply out the intrinsic, projection and extrinsic matrices:

$$\lambda x = K\, [I \,|\, 0] \begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix} X = K\,[R \,|\, t]\, X = P_{3\times 4}\, X$$

First we need to recover the overall projection matrix P₃ₓ₄. It has only 11 dof. (Why?)

It is possible to fix one of the elements (say p₃₄ = 1), but this is very dangerous if p₃₄ is small.

Instead, we'll adopt a null-space method, solving (at first, anyway) a system Ap = 0, where A is known and p is the unknown.

Of course, p can only be recovered up to scale.


Camera Calibration: Preamble

Suppose you know the image positions of a number of known scene points. For each point i:

$$\lambda_i \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix} = P X_i = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} X_i$$

Hence

$$\lambda_i = p_{31}X_i + p_{32}Y_i + p_{33}Z_i + p_{34}$$
$$(p_{31}X_i + p_{32}Y_i + p_{33}Z_i + p_{34})\, x_i = p_{11}X_i + p_{12}Y_i + p_{13}Z_i + p_{14}$$
$$(p_{31}X_i + p_{32}Y_i + p_{33}Z_i + p_{34})\, y_i = p_{21}X_i + p_{22}Y_i + p_{23}Z_i + p_{24}$$

Re-arrange to give, for point i,

$$\begin{bmatrix} X_i & Y_i & Z_i & 1 & 0 & 0 & 0 & 0 & -X_i x_i & -Y_i x_i & -Z_i x_i & -x_i \\ 0 & 0 & 0 & 0 & X_i & Y_i & Z_i & 1 & -X_i y_i & -Y_i y_i & -Z_i y_i & -y_i \end{bmatrix} \mathbf{p} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

where the vector p = (p₁₁, p₁₂, ..., p₃₃, p₃₄)⊤ contains the unknowns. So, use 6 points i = 1...6 to build up a known 12×12 matrix A ...


Calibration: Step 1

Step 0: Create a target with known scene points.

Step 1: Determine the projection matrix.

Measure the image positions x of n ≥ 6 KNOWN scene points X and build up the system Ap = 0.

With exactly 6 points and perfect image data, p is the null-space of A. That is, p = null(A).

If the 6 points are noisy, or if you have more points, a least-squares solution is found using SVD:

Find the singular value decomposition of A: USV⊤ ← A.
p is the column of V corresponding to the smallest singular value.
(See later for an explanation of least squares & SVD.)

The projection matrix P is now known (up to scale).
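
A sketch of Step 1 in MATLAB, assuming the correspondences are given as Xw (3×n world points) and x (2×n image points); the function name is ours, not from the slides:

function P = estimate_P(Xw, x)
  n = size(Xw, 2);
  A = zeros(2*n, 12);
  for i = 1:n
    Xi = [Xw(:,i)' 1];                           % row (Xi Yi Zi 1)
    A(2*i-1,:) = [Xi, zeros(1,4), -x(1,i)*Xi];   % x-equation for point i
    A(2*i,  :) = [zeros(1,4), Xi, -x(2,i)*Xi];   % y-equation for point i
  end
  [~, ~, V] = svd(A);                            % least-squares solution:
  p = V(:, end);                                 % column for smallest sing. value
  P = reshape(p, 4, 3)';                         % back to 3x4, rows (p11..p14) etc.
end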


Step 2: Invert left block of projection matrix

Step 2: Construct P_LEFT from the leftmost 3×3 block of P. Invert it to obtain P_LEFT⁻¹.

P_LEFT⁻¹ has the form [rotation matrix] × [upper triangular matrix].

Explanation:

Remember that P = K[R|t], so P_LEFT = KR. Hence P_LEFT⁻¹ = R⁻¹K⁻¹.
NB: you know that the inverse of a rotation matrix is a rotation matrix.
NB: the inverse of an upper triangular matrix is also upper triangular.


Step 3: QR decomposition

Step 3: Apply the QR-decomposition technique to P_LEFT⁻¹.

The outputs Q and R are R⁻¹ and K⁻¹ respectively.

Explanation:
The standard technique decomposes a matrix into a rotation matrix Q and an upper triangular matrix R.
Watch out!! Q is the rotation matrix — NOT R.
There is no need to know how the QR-decomposition works.
One of the tute sheet questions gets you to play with it in Matlab.


Steps 4–7: Mop up

Reminder: outputs Q and R correspond to R⁻¹ and K⁻¹.

Step 4. To find our rotation, set R ← Q⁻¹.

Step 5. You'd think that K = R⁻¹, but because the scale of P was unknown, we must re-scale every element of K so that, by convention, K₃₃ = 1:

Set Temp ← R⁻¹.
Each K_ij ← Temp_ij / Temp₃₃.
Each P_ij,new ← P_ij,old / Temp₃₃.

Step 6. Recall P = K[R|t]; hence t ← K⁻¹ (p₁₄, p₂₄, p₃₄)⊤.

(Note that you could save some time in Steps 5 and 6 by not rescaling P and then computing t = R (p₁₄, p₂₄, p₃₄)⊤.)

Step 7. Sort out remaining sign ambiguities. (See Q sheet.)
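
Steps 2-6 as a MATLAB sketch (the Step 7 sign fixes are omitted; see the Q sheet):

Pleft  = P(:, 1:3);              % Step 2: = K*R up to scale
[Q, U] = qr(inv(Pleft));         % Step 3: Q ~ R^-1, U ~ K^-1 (upper triangular)
Rot    = Q';                     % Step 4: R = Q^-1 = Q'
Temp   = inv(U);                 % Step 5: un-normalized K ...
K      = Temp / Temp(3,3);       % ... rescaled so K(3,3) = 1
P      = P / Temp(3,3);          % rescale P to match
t      = K \ P(:, 4);            % Step 6: t = K^-1 (p14 p24 p34)'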


Calibration Scheme

1. Compute corner features.

2. Hand-match 6 (or more) image points to world points.

3. Compute calibration K, R, t.

4. While unmatched points exist {
     4.1 Project all known world points into the image using the current calibration.
     4.2 Match to measurements (using a threshold on image distance).
     4.3 Recompute the calibration.
     4.4 Relax the threshold if match confusion is unlikely.
   }


Calibration Example

$$K = \begin{bmatrix} 4622 & 0.1 & 267 \\ 0.0 & 4651 & 558 \\ 0.0 & 0.0 & 1 \end{bmatrix} \qquad R = \begin{bmatrix} -0.03 & 0.34 & -0.94 \\ -0.92 & 0.36 & 0.16 \\ 0.39 & 0.87 & 0.30 \end{bmatrix} \qquad t = \begin{bmatrix} 0.7 & 7.5 & 115.8 \end{bmatrix}^\top$$


Radial distortion

So far, we have figured out the transformations that turn our camera into a notional camera with the world and camera coordinates aligned and an “ideal” image plane of the sort given in the first diagram.

One often has to correct for other optical distortions and aberrations. Radial distortion is the most common. See the Q sheet.

Corrections for this distortion are applied before carrying out calibration.

[Figure: barrel distortion, undistorted, pincushion distortion.]
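
The slides defer the distortion model to the Q sheet. For concreteness, here is one common low-order polynomial model, sketched in MATLAB with assumed, illustrative coefficients k1 and k2; (xu, yu) are ideal coordinates measured from the principal point:

k1 = -0.2; k2 = 0.05;                 % assumed distortion coefficients
xu = 0.3;  yu = -0.1;                 % ideal (undistorted) point
r2 = xu^2 + yu^2;                     % squared radius from the principal point
scale = 1 + k1*r2 + k2*r2^2;
xd = scale * xu;  yd = scale * yu;    % distorted point
% With this convention, k1 < 0 gives barrel and k1 > 0 pincushion distortion.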


Summary of Lecture 1

In the lecture we have

1 Introduced the aims of computer vision, and some paradigms (look on the web for videos)

2 Explored linear transformations and introduced homogeneous coordinates

3 Defined perspective projection from a scene, and saw that it could be made linear using homogeneous coordinates

4 Discussed how to pre-calibrate a camera using the image of six or more known scene points.


Some further notes

There follow some further notes on

1 Least squares using eigendecomposition or SVD. (Essential reading before lecture 3.)

2 Notes on how homogeneous coords handle points at infinity and vanishing points. (Essential reading before lecture 3.)

3 That there are other projection models. (For info only.)


A1: Least Squares, Eigendecomposition and SVD

Suppose you want to find the straight line p₁x + p₂y + p₃ = 0 from two known points (x, y)₁ and (x, y)₂ on it. p₁, p₂, p₃ are the unknowns. You can write

$$\begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\text{or}\quad A\mathbf{p} = \mathbf{0}$$

and solve for p, up to scale, as the vector spanning A's null-space.

Example: the points are (2, 1) and (1, 2). Find the row-echelon form

$$\begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \end{bmatrix} \to \begin{bmatrix} 2 & 1 & 1 \\ 0 & 3/2 & 1/2 \end{bmatrix}$$

so the rank and nullity of A are 2 and 1.

Treat p₁ = α and p₂ = β as free variables and p₃ as non-free:

$$\begin{bmatrix} 2 & 1 & 1 \\ 0 & 3/2 & 1/2 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \\ p_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Back-substitution gives p₃ = −3β and then β = α, so

$$\mathbf{p} = \alpha \begin{bmatrix} 1 \\ 1 \\ -3 \end{bmatrix}$$

and the line is x + y − 3 = 0 — obviously correct!


A1: Least Squares, Eigendecomposition and SVD

If you pile in more points exactly on the line, the rank of matrix A does not increase.

A = [  1   2  1
       2   1  1
       0   3  1 ];
>> rank(A)
ans = 2

A = [  1   2  1
       2   1  1
       0   3  1
     -99 102  1 ];
>> rank(A)
ans = 2

>> p = null(A)
p =
    0.3015
    0.3015
   -0.9045

Matlab returns p with unit modulus.


A1: Least Squares, Eigendecomposition and SVD

Suppose you are trying to solve Ap = 0 using more noisy data than there are degrees of freedom in the system, e.g. three or more points lying approximately on a straight line.

A non-trivial approximate solution can be found using least squares.

The vector of residuals r is defined such that Ap = r.

The sum of their squares is then r⊤r = p⊤[A⊤A]p.

The least-squares solution, the minimum of p⊤[A⊤A]p subject to |p| = 1, is found when

p is the eigenvector of [A⊤A] corresponding to the smallest eigenvalue.


A1: Proof

* [A⊤A] is an n×n, positive semi-definite, symmetric matrix, and hence has real and non-negative eigenvalues λᵢ. Why?

$$[A^\top A]\,e_i = \lambda_i e_i \;\Rightarrow\; e_i^\top A^\top A\, e_i = \lambda_i e_i^\top e_i = \lambda_i \;\Rightarrow\; \lambda_i = [Ae_i]^\top [Ae_i] \geq 0$$

* The eigendecomposition of [A⊤A] is by definition

$$[A^\top A] = [e_1|e_2|\ldots|e_n] \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix} \begin{bmatrix} e_1^\top \\ e_2^\top \\ \vdots \\ e_n^\top \end{bmatrix} = \sum_i \lambda_i\, [e_i e_i^\top]$$

where we order the eigenvalues λ₁ ≥ λ₂ ≥ ... ≥ λₙ ≥ 0.

* Then, noting that p⊤e e⊤p = (p⊤e)² is just a scalar, we have

$$r^\top r = p^\top [A^\top A]\, p = \lambda_1 (p^\top e_1)^2 + \lambda_2 (p^\top e_2)^2 + \ldots + \lambda_n (p^\top e_n)^2$$

* We conclude that

r⊤r = p⊤[A⊤A]p is minimized (and equal to λₙ) when p = eₙ.


A1: Least Squares, Eigendecomposition and SVD

Any m × n matrix with m > n can be decomposed as UΣV⊤ ← A, where

$$A_{m\times n} = U_{m\times m}\, [\mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)]_{m\times n}\, V^\top_{n\times n}$$

where U is column-orthogonal, V is fully orthogonal, and Σ contains the singular values, ordered so that σ₁ ≥ σ₂ ≥ ... ≥ σₙ ≥ 0.

The least-squares solution for p occurs as the column of V associated with the smallest singular value in the SVD of A. (Column n in our case.)


A1: Proof

* If UΣV⊤ ← A, then

$$[A^\top A] = V \Sigma^\top U^\top U \Sigma V^\top = V \Sigma^\top \Sigma V^\top$$

* But V is fully orthogonal and Σ⊤Σ is diagonal with elements σᵢ².

* Compare with the eigendecomposition

$$[A^\top A] = [e_1|e_2|\ldots|e_n]\; \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)\; [e_1|e_2|\ldots|e_n]^\top$$

* It is obvious that V = [e₁|e₂|...|eₙ] and λᵢ = σᵢ².

SVD is usually preferred over eigendecomposition because it avoids the relatively expensive computation of A⊤A.


A1: Least Squares, Eigendecomposition and SVD

SVD: 3 collinear points

A = [1.0 2.0 1; ...
     2.0 1.0 1; ...
     1.5 1.5 1];
>> [u,s,v] = svd(A)
u = -0.5774  0.7071 -0.4082
    -0.5774 -0.7071 -0.4082
    -0.5774  0.0000  0.8165
s =  4.0620  0       0
     0       1.0000  0
     0       0       0.0000
v = -0.6396 -0.7071 -0.3015
    -0.6396  0.7071 -0.3015
    -0.4264  0       0.9045

SVD: 3 non-collinear points

A = [1.0 2.0 1; ...
     2.0 1.0 1; ...
     1.5 1.4 1];
>> [u,s,v] = svd(A)
u = -0.5844  0.6810 -0.4412
    -0.5807 -0.7308 -0.3589
    -0.5669  0.0465  0.8225
s =  4.0257  0       0
     0       1.0016  0
     0       0       0.0248
v = -0.6308 -0.7143 -0.3032
    -0.6458  0.6999 -0.3052
    -0.4302 -0.0033  0.9027

EigD: 3 non-collinear points

A = [1.0 2.0 1; ...
     2.0 1.0 1; ...
     1.5 1.4 1];
>> [evec,eval] = eig(A'*A)
evec =  0.3052  0.6999  0.6458
        0.3032 -0.7143  0.6308
       -0.9027 -0.0033  0.4302
eval =  0.0006  0       0
        0       1.0033  0
        0       0      16.2061
% NB the eigenvectors & values are sorted the other way


A2: More advantages of homogeneous notation

There are several advantages of using homogeneous notation to represent perspective projection:

Non-linear projection equations are turned into linear equations.

The tools of linear algebra can be used.

Awkward limiting procedures are avoided, especially when dealing with points at infinity.

Let's look at 3D points at infinity and their projections into the image as 2D vanishing points.


A2: Vanishing points using homogeneous notation

A line in 3D through the point A with direction D is X(μ) = A + μD.

Writing this in homogeneous notation:

$$\begin{bmatrix} X_1(\mu) \\ X_2(\mu) \\ X_3(\mu) \\ X_4(\mu) \end{bmatrix} \simeq \begin{bmatrix} A \\ 1 \end{bmatrix} + \mu \begin{bmatrix} D \\ 0 \end{bmatrix} \simeq \frac{1}{\mu}\begin{bmatrix} A \\ 1 \end{bmatrix} + \begin{bmatrix} D \\ 0 \end{bmatrix}$$

In the limit μ → ∞, the point on the line is

$$\begin{bmatrix} D \\ 0 \end{bmatrix}$$

So homogeneous vectors with zero last entry represent points “at infinity”.

Points at infinity are equivalent to directions.


A2: Parallel 3D lines meet at a single 2D point

The special image point is called the vanishing point.

To find it, simply project the point-at-∞ into the image ...

$$v = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} D \\ 0 \end{bmatrix} = \begin{bmatrix} D_x \\ D_y \\ D_z \end{bmatrix}$$


A2: Vanishing points using homogeneous notation

Exercise: Compute the vanishing points of lines on an XZ plane: (1) parallel to the Z axis; (2) at 45° to the Z axis; (3) parallel to the X axis.

[Figure: the three directions on the XZ plane — D₁ = (0, 0, 1)⊤, D₂ = (1, 0, 1)⊤ and D₃ = (1, 0, 0)⊤.]

1: (0, 0, 1, 0)⊤ → v = (0, 0, 1)⊤
2: (1, 0, 1, 0)⊤ → v = (1, 0, 1)⊤
3: (1, 0, 0, 0)⊤ → v = (1, 0, 0)⊤
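
The exercise can be checked in a line of MATLAB by stacking the three directions as columns:

Pv = [1 0 0 0; 0 1 0 0; 0 0 1 0];     % vanilla projection
D  = [0 0 1; 1 0 1; 1 0 0]';          % directions 1, 2, 3 as columns
v  = Pv * [D; zeros(1,3)]             % columns (0,0,1), (1,0,1), (1,0,0)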


A3: There are other projection models

Using a perspective projection model is only sensible when the depth variation across a viewed object is comparable to or large compared with the distance to the object — i.e. when the image displays perspective distortion.

If the depth variation is small (say 10% of the range), trying to recover full perspective is not feasible. It is better to model the projection as “weak perspective”, akin to choosing an average depth for the scene.


A3: Weak perspective is an affine projection

Weak perspective is a special case of a general linear or affine projection from scene to image. Using inhomogeneous coordinates:

$$\begin{bmatrix} x \\ y \end{bmatrix} = M X + t.$$

Using homogeneous coordinates, one sees that the projection matrix takes a specialized form with fewer degrees of freedom:

$$\begin{bmatrix} x \\ 1 \end{bmatrix} \simeq P_{\text{affine}} \begin{bmatrix} X \\ 1 \end{bmatrix}, \qquad P_{\text{affine}} = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ 0 & 0 & 0 & p_{34} \end{bmatrix}$$

M is the 2×3 matrix with m_ij = p_ij / p₃₄, and t = (p₁₄/p₃₄, p₂₄/p₃₄)⊤.

Affine projection matrices correspond to parallel projections, or equivalently, projections for which the optic centre is at infinity.