image formation lec9 - Carnegie Mellon University16720.courses.cs.cmu.edu/lec/image_formation_lec9.pdf · Perspective projection Closer objects appear larger ... Parallel lines meet

Image formation

Agenda

• Perspective projection

• Rotations

• Camera models

Light as a wave + particle

Light as a wave (ignore for now)

Refraction Diffraction

Image Formation

Digital Camera

The Eye

Film

Digital Image

Human eye

Pixel brightness

CS 217 Lecture 1 — April 1 Spring 2009

2. We measure the amount of light leaving a surface as

$$\text{Radiance} = \frac{\text{power}}{\text{foreshortened area} \cdot \text{solid angle}} = \frac{\text{watt}}{\text{meter}^2 \cdot \text{sr}} = \frac{\delta^2 P}{\delta A \cdot \delta\omega} \approx \frac{P}{\Delta A \cdot \Delta\omega}$$

Steradian = surface area of a unit-radius sphere cut out by a solid angle (0 ∼ 4π)

In 1D, this reduces to the radian = arc length of a unit-radius circle cut out by an angle (0 ∼ 2π)

We need the foreshortened area because a patch directly overhead δA sees more of A

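1.1.3 Imaging a pixel

$$\text{pixel intensity} \propto \text{total irradiance} = \int_{x}^{x+\Delta x} \int_{y}^{y+\Delta y} \int_{t=0}^{1} \int_{\theta=-\pi}^{\pi} \int_{\phi=0}^{\pi/2} E[x, y, t, \theta, \phi] \cdot f(\theta, \phi) \; dx \, dy \, dt \, d\theta \, d\phi$$

where f(θ,φ) is the sensor response, 0 < f(θ,φ) < 1, which will tend to 1 for (θ,φ) directly overhead the patch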

Q: Why do we not see an image of a scene on a sheet of paper?

A: Restrict directions of incoming light with a pinhole.

Pinhole optics:

Right-handed coordinate system; place the scene at -z:

$$\frac{y'}{f'} = \frac{y}{z}, \quad \frac{x'}{f'} = \frac{x}{z} \;\Rightarrow\; y' = f' \cdot \frac{y}{z}, \quad x' = f' \cdot \frac{x}{z}$$

(More on “light as physics” at end of semester)

Pinhole optics

Pinhole camera

Camera Obscura

World’s largest photograph (2006, El Toro Marine Corps, Irvine, CA)

Accidental pinholes

(the view from Antonio’s hotel room)

what’s the dark stuff?

Torralba and Freeman, CVPR’12

Accidental pinhole and pinspeck cameras: revealing the scene outside the picture

Antonio Torralba, William T. Freeman
Computer Science and Artificial Intelligence Laboratory (CSAIL)

Abstract

We identify and study two types of “accidental” images that can be formed in scenes. The first is an accidental pinhole camera image. These images are often mistaken for shadows, but can reveal structures outside a room, or the unseen shape of the light aperture into the room. The second class of accidental images are “inverse” pinhole camera images, formed by subtracting an image with a small occluder present from a reference image without the occluder. The reference image can be an earlier frame of a video sequence. Both types of accidental images happen in a variety of different situations (an indoor scene illuminated by natural light, a street with a person walking under the shadow of a building, etc.). Accidental cameras can reveal information about the scene outside the image, the lighting conditions, or the aperture by which light enters the scene.

1. Introduction

Researchers in computer vision have explored numerous ways to form images, including novel lenses, mirrors, coded apertures, and light sources (e.g. [1, 2, 7, 10]). The novel cameras are, by necessity, carefully designed to control the light transport such that images can be viewed from the data recorded by the sensors. In this paper, we point out that in scenes, accidental images can also form, and can be revealed within still images or extracted from a video sequence using simple processing, corresponding to accidental real and “inverse” pinhole camera images, respectively. These images are typically of poorer quality than images formed by intentional cameras, but they are present in many scenes illuminated by indirect light and often occur without us noticing them.

A child might ask: why we don’t see an image of the world around us when we view a blank surface? In a sense we do: light rays yielding images of the world do land on surfaces and then reflect back to our eye. But there are too many of them and they all wash out to the ambient illumination we observe in a room or outdoors. Of course, if one restricts the set of light rays falling on a surface, we can reveal some particular one of the images. This is what a pinhole camera does. Only a restricted set of rays falls on the sensor, and we can observe an image if we look at a surface with light from only a pinhole falling on it. A second way to view an image when looking at a surface is to restrict the reflected rays from the surface by looking at a mirror surface. All rays impinge on the surface, but only those from a particular direction reflect properly into our eye and so we again see an image when viewing a surface.

Figure 1. a) Light enters the room via an open window. b) On the wall opposite the window, we can see a projected pattern of light and shadow. But, are the dark regions shadows? See Fig. 2

There are many ways in which pictures are formed around us. The most efficient mechanisms are to use lenses or narrow apertures to focus light into a picture of what is in front. So a set of occluders (to form a pinhole camera) or a mirror surface (to capture only a subset of the reflected rays) will let us see an image as we view a surface. For those cases, an image is formed by intentionally building a particular arrangement of surfaces that will result in a camera.

However, similar arrangements appear naturally by accidental arrangements of surfaces in many places. Often the observer is not aware of the images produced by those accidental cameras. Fig. 1.b shows one example of a picture in which one can see a pattern of shadows and reflections projected on the walls of different scenes. Indeed, at first, one could misinterpret some of the dark patterns on the

CVPR 2012

Perspective projection

Closer objects appear larger

Closer objects are lower in the image

Parallel lines meet

Great reference

https://www.youtube.com/watch?v=q8xsXFU7dK0&list=PLc0IeyeoGt2xtmfaF2ST_uNdeptre3f9s&index=2

Pinhole Camera

How do we compute P’? [on board]

optical axis

[Aside: right-handed coordinate system]

Pinhole Camera

Image inversion

Image inversion perplexed people for a while, but software (or the brain) can simply flip the image back.

Physical model that avoids inversion

COP = pinhole, camera center

Distance of COP to easel = focal length

Visual angle

$$\theta = \frac{L}{f}$$

θ = visual angle in radians (common unit in human vision)

L = length of projection on sphere

Note: math is easier for a spherical “easel” (e.g., retina)

Human head is 9 inches high. At a distance of 9 feet, it subtends 1/12 radians = 4.8 degrees, regardless of focal length

Field of view (FOV)

[Figure: the same scene photographed at focal lengths of 24mm, 50mm, and 135mm]

$$\text{FOV} = \frac{\text{total sensor size (diagonal)}}{\text{focal length}} \quad \text{(in radians)}$$
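The visual-angle and field-of-view relations above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and the 43.3 mm sensor diagonal (a 36 x 24 mm sensor) are assumptions for illustration. It also compares the slide's ratio with the exact flat-sensor expression 2·atan(d / 2f), which the ratio approximates for small angles.

```python
import math

def visual_angle(length, distance):
    """Visual angle in radians on a spherical easel: theta = L / f."""
    return length / distance

def fov_small_angle(sensor_size, focal_length):
    """The slide's ratio: FOV ~ sensor size / focal length (radians)."""
    return sensor_size / focal_length

def fov_exact(sensor_size, focal_length):
    """Exact pinhole FOV for a flat sensor: 2 * atan(d / (2 f))."""
    return 2.0 * math.atan(sensor_size / (2.0 * focal_length))

# 9-inch head seen from 9 feet (108 inches): 1/12 radian.
head_deg = math.degrees(visual_angle(9.0, 9.0 * 12.0))

SENSOR_DIAGONAL_MM = 43.3  # assumed sensor size, for illustration only
for f_mm in (24.0, 50.0, 135.0):
    approx = math.degrees(fov_small_angle(SENSOR_DIAGONAL_MM, f_mm))
    exact = math.degrees(fov_exact(SENSOR_DIAGONAL_MM, f_mm))
    print(f"{f_mm:5.0f} mm: ratio {approx:6.1f} deg, exact {exact:6.1f} deg")
```

The two expressions agree closely at 135mm and diverge at 24mm, so the ratio is best read as a telephoto (small-angle) approximation.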

Increasing the focal length and stepping back

© Marc Levoy

✦ changing the focal length lets us move back from a subject, while maintaining its size on the image

✦ but moving back changes perspective relationships

What happens to apparent object size and FOV when we double the distance to the object and double the focal length?

$$x_{new} = \frac{2fX}{2Z} = \frac{fX}{Z} = x_{old}$$

$$\text{FOV}_{new} = \frac{\text{sensor size}}{2f} = \frac{1}{2}\,\text{FOV}_{old}$$
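A short numeric sketch of this doubling argument follows. The function names and the specific numbers (focal length, depths, sensor size) are illustrative assumptions. The subject keeps its image size, the FOV halves, and a point behind the subject changes its projected position, which is the change in perspective relationships noted above.

```python
import math

def project_x(f, X, Z):
    """Perspective projection of the X coordinate: x = f X / Z."""
    return f * X / Z

def fov(sensor_size, f):
    """Small-angle field of view in radians: sensor size / focal length."""
    return sensor_size / f

f, sensor = 50.0, 36.0   # assumed focal length and sensor size
X, Z = 1.0, 10.0         # subject offset and depth
gap = 10.0               # a background point sits this far behind the subject

x_old = project_x(f, X, Z)
x_new = project_x(2 * f, X, 2 * Z)            # step back and zoom in

bg_old = project_x(f, X, Z + gap)
bg_new = project_x(2 * f, X, 2 * Z + gap)     # background moves relative to subject

print(x_old, x_new, bg_old, bg_new)
```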

Decreasing the focal length and moving forward

© Marc Levoy

Perspective projection

Closer objects appear larger

Closer objects are lower in the image

Parallel lines meet

All these can be simply derived with $x = f\frac{X}{Z}$!

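Vanishing point: proof

Consider a 3D line through a point A with direction D:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} A_x \\ A_y \\ A_z \end{bmatrix} + \lambda \begin{bmatrix} D_x \\ D_y \\ D_z \end{bmatrix}$$

Compute projected point (x,y) as lambda approaches infinity [on board]:

$$x = \frac{fX}{Z} = \frac{f(A_x + \lambda D_x)}{A_z + \lambda D_z} \to \frac{fD_x}{D_z} \quad \text{as } \lambda \to \infty$$

$$y = \frac{fY}{Z} = \frac{f(A_y + \lambda D_y)}{A_z + \lambda D_z} \to \frac{fD_y}{D_z} \quad \text{as } \lambda \to \infty$$

The limit depends only on D, not on A (assuming $D_z \neq 0$), so 3D lines with identical direction vectors converge to the same 2D image location

(parallel lines meet)

[Figure: COP, 3D point (X,Y,Z), and its image point (x,y,f)]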
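The limit can be checked numerically. In this minimal sketch (the function names and the sample points are assumptions for illustration), two lines with different starting points but the same direction are projected at a very large lambda and compared with (f Dx/Dz, f Dy/Dz).

```python
def project(P, f=1.0):
    """Perspective projection (x, y) = (f X / Z, f Y / Z)."""
    X, Y, Z = P
    return (f * X / Z, f * Y / Z)

def point_on_line(A, D, lam):
    """Point A + lam * D on a 3D line."""
    return tuple(a + lam * d for a, d in zip(A, D))

def vanishing_point(D, f=1.0):
    """Limit of the projection as lam -> infinity (requires D_z != 0)."""
    return (f * D[0] / D[2], f * D[1] / D[2])

D = (1.0, 0.5, 2.0)                             # shared direction
starts = [(0.0, -1.0, 4.0), (3.0, 2.0, 5.0)]    # two different lines

vp = vanishing_point(D)
far = [project(point_on_line(A, D, 1e9)) for A in starts]
print(vp, far)
```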

Special case: Manhattan world

[Figure: three vanishing points VP1, VP2, VP3]

Consider a “city-block” world where all lines follow one of 3 directions

Special case: horizon line

Funny things happen… Parallel lines aren’t…

Claim: all 3D lines on ground plane meet at a horizon line

Figure by David Forsyth

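Horizon line: proof

Equation of ground plane is Y = -h

For all points A on the ground plane (Ax,-h,Az) with a direction D along the ground plane (Dx,0,Dz), where will the vanishing points lie?

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} A_x \\ -h \\ A_z \end{bmatrix} + \lambda \begin{bmatrix} D_x \\ 0 \\ D_z \end{bmatrix}$$

$$(x, y) \to \left( \frac{fD_x}{D_z}, \frac{fD_y}{D_z} \right) = \left( \frac{fD_x}{D_z}, 0 \right) \quad \text{as } \lambda \to \infty$$

Every ground-plane direction therefore has its vanishing point on the image line y = 0, which is the horizon line.

[Figure: COP, 3D point (X,Y,Z), and its image point (x,y,f)]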

Why is horizon line not always at center of image?

Image y position: proof

Equation of ground plane is Y = -h

A point on ground plane will have y-coordinate=?

$$y = \frac{fY}{Z} = -\frac{fh}{Z}$$

[Figure: ground-plane points at depths Z1, Z2, Z3]

Image height: proof

Bottom of tree: (X,-h,Z)

Top of tree: (X,L-h,Z)

$$y_{top} - y_{bot} = \frac{f(L - h)}{Z} - \frac{-fh}{Z} = \frac{fL}{Z}$$
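Both results are easy to confirm with numbers. This is a minimal sketch under assumed values (unit focal length, camera height 1.5, a tree of height 2); the function names are hypothetical.

```python
import math

def ground_y(f, h, Z):
    """Image y-coordinate of a ground-plane point (Y = -h) at depth Z."""
    return -f * h / Z

def image_height(f, L, h, Z):
    """Projected height of an object of height L standing on the ground."""
    y_top = f * (L - h) / Z
    y_bot = -f * h / Z
    return y_top - y_bot

f, h = 1.0, 1.5
ys = [ground_y(f, h, Z) for Z in (5.0, 10.0, 100.0)]   # rises toward y = 0
heights = [image_height(f, 2.0, cam_h, 10.0) for cam_h in (0.5, 1.5, 3.0)]
print(ys, heights)
```

Closer ground points have more negative y (lower in the image), and the projected height fL/Z does not depend on the camera height h.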

Consequence of derivations for image height and parallel lines

distances and angles aren’t preserved in camera projection

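Orthographic projection

Orthographic: x = X, y = Y

Perspective (for comparison): x = fX/Z, y = fY/Z

Life would be much simpler; we could trust angles and distances

[Figure: orthographic rays versus perspective rays through the COP, from (X,Y,Z) to (x,y,f)]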

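Scaled orthographic projection

Consider two points (A,B) at different depths that are far away from camera:

$$A = \begin{bmatrix} A_x \\ A_y \\ Z \end{bmatrix}, \quad B = \begin{bmatrix} B_x \\ B_y \\ Z + \Delta Z \end{bmatrix}$$

If Z >> ΔZ, what happens to their image projections (e.g., $a_x$ and $b_x$)?

$$a_x = \frac{fA_x}{Z} = \alpha A_x$$

$$b_x = \frac{fB_x}{Z + \Delta Z} \approx \frac{fB_x}{Z} = \alpha B_x \quad \text{for } \Delta Z \ll Z$$

where α = f/Z is a single scale factor shared by both points.

We can approximate sets of such points with a scaled orthographic model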
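How good is the approximation? A minimal sketch with assumed numbers (function names are hypothetical): replacing the true depth Z + ΔZ by the reference depth Z introduces a relative error of exactly ΔZ/Z.

```python
def perspective_x(f, X, Z):
    """Exact perspective projection x = f X / Z."""
    return f * X / Z

def scaled_orthographic_x(f, X, Z_ref):
    """Scaled orthographic projection: one scale alpha = f / Z_ref for all points."""
    alpha = f / Z_ref
    return alpha * X

f, Z, dZ, B_x = 1.0, 100.0, 1.0, 2.0
exact = perspective_x(f, B_x, Z + dZ)
approx = scaled_orthographic_x(f, B_x, Z)
rel_err = abs(approx - exact) / exact
print(exact, approx, rel_err)
```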

Perspective vs Orthographic

[Figure: wide angle, standard, and telephoto views; the telephoto view is close to scaled orthographic]

Perspective tends to matter for large objects

Funny things happen… (change in depth of object large relative to distance from camera)

A look back: dominant effects of perspective

• Parallel lines meet at vanishing points

• Objects further away are smaller

• Foreshortening

Fronto-parallel view; foreshortened view; perspective view

Affine “linear” warp; homography “nonlinear” warp

Rotation of far-away plane; rotation of close-by plane

Computer Vision: Algorithms and Applications (September 3, 2010 draft), p. 164

Transformation       Matrix          # DoF   Preserves
translation          [I t]  (2×3)    2       orientation
rigid (Euclidean)    [R t]  (2×3)    3       lengths
similarity           [sR t] (2×3)    4       angles
affine               [A]    (2×3)    6       parallelism
projective           [H̃]    (3×3)    8       straight lines

Table 3.5 Hierarchy of 2D coordinate transformations. Each transformation also preserves the properties listed in the rows below it, i.e., similarity preserves not only angles but also parallelism and straight lines. The 2×3 matrices are extended with a third [0^T 1] row to form a full 3×3 matrix for homogeneous coordinate transformations.

... examples of such transformations, which are based on the 2D geometric transformations shown in Figure 2.4. The formulas for these transformations were originally given in Table 2.1 and are reproduced here in Table 3.5 for ease of reference.

In general, given a transformation specified by a formula x′ = h(x) and a source image f(x), how do we compute the values of the pixels in the new image g(x), as given in (3.88)? Think about this for a minute before proceeding and see if you can figure it out.

If you are like most people, you will come up with an algorithm that looks something like Algorithm 3.1. This process is called forward warping or forward mapping and is shown in Figure 3.46a. Can you think of any problems with this approach?

procedure forwardWarp(f, h, out g):

For every pixel x in f(x)

1. Compute the destination location x′ = h(x).

2. Copy the pixel f(x) to g(x′).

Algorithm 3.1 Forward warping algorithm for transforming an image f(x) into an image g(x′) through the parametric transform x′ = h(x).
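Algorithm 3.1 can be sketched in a few lines of NumPy. This is an illustrative version, not the textbook's code: rounding the destination to the nearest pixel and leaving unfilled pixels at 0 are assumptions made here. Warping with a 2x scale shows one problem with forward mapping: most destination pixels are never written, leaving holes.

```python
import numpy as np

def forward_warp(f, h):
    """Copy each source pixel f[y, x] to g at the location h(x, y)."""
    g = np.zeros_like(f)                  # unfilled pixels stay 0 (assumption)
    rows, cols = f.shape[:2]
    for y in range(rows):
        for x in range(cols):
            xp, yp = h(x, y)              # destination location x' = h(x)
            xi, yi = int(round(xp)), int(round(yp))   # nearest pixel (assumption)
            if 0 <= xi < cols and 0 <= yi < rows:
                g[yi, xi] = f[y, x]
    return g

src = np.arange(1, 17, dtype=np.uint8).reshape(4, 4)
shifted = forward_warp(src, lambda x, y: (x + 1, y))      # translate right by 1
scaled = forward_warp(src, lambda x, y: (2 * x, 2 * y))   # 2x zoom leaves holes
print(shifted)
print(scaled)
```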

Computer Vision: Algorithms and Applications (September 3, 2010 draft), p. 36

Figure 2.4 Basic set of 2D planar transformations (translation, Euclidean, similarity, affine, projective).

Translation. 2D translations can be written as $x' = x + t$ or

$$x' = \begin{bmatrix} I & t \end{bmatrix} \bar{x} \quad (2.14)$$

where I is the (2×2) identity matrix or

$$\bar{x}' = \begin{bmatrix} I & t \\ 0^T & 1 \end{bmatrix} \bar{x} \quad (2.15)$$

where 0 is the zero vector. Using a 2×3 matrix results in a more compact notation, whereas using a full-rank 3×3 matrix (which can be obtained from the 2×3 matrix by appending a [0^T 1] row) makes it possible to chain transformations using matrix multiplication. Note that in any equation where an augmented vector such as $\bar{x}$ appears on both sides, it can always be replaced with a full homogeneous vector $\tilde{x}$.

Rotation + translation. This transformation is also known as 2D rigid body motion or the 2D Euclidean transformation (since Euclidean distances are preserved). It can be written as $x' = Rx + t$ or

$$x' = \begin{bmatrix} R & t \end{bmatrix} \bar{x} \quad (2.16)$$

where

$$R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad (2.17)$$

is an orthonormal rotation matrix with $RR^T = I$ and $|R| = 1$.

Scaled rotation. Also known as the similarity transform, this transformation can be expressed as $x' = sRx + t$ where s is an arbitrary scale factor. It can also be written as

$$x' = \begin{bmatrix} sR & t \end{bmatrix} \bar{x} = \begin{bmatrix} a & -b & t_x \\ b & a & t_y \end{bmatrix} \bar{x}, \quad (2.18)$$

where we no longer require that $a^2 + b^2 = 1$. The similarity transform preserves angles between lines.
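The benefit of the full 3×3 form, chaining by matrix multiplication, can be sketched as follows. The helper names are assumptions for illustration; each returns the 2×3 matrix from (2.14), (2.16), or (2.18) extended with a [0^T 1] row.

```python
import numpy as np

def translation(tx, ty):
    """3x3 homogeneous form of [I t]."""
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

def rigid(theta, tx, ty):
    """3x3 homogeneous form of [R t]."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]])

def similarity(scale, theta, tx, ty):
    """3x3 homogeneous form of [sR t], with a = s cos(theta), b = s sin(theta)."""
    a, b = scale * np.cos(theta), scale * np.sin(theta)
    return np.array([[a, -b, tx], [b, a, ty], [0.0, 0.0, 1.0]])

p = np.array([1.0, 0.0, 1.0])   # augmented point (1, 0)
q = np.array([0.0, 0.0, 1.0])   # augmented point (0, 0)

# Chain: rotate by 90 degrees, then translate by (2, 3).
H = translation(2.0, 3.0) @ rigid(np.pi / 2, 0.0, 0.0)
S = similarity(2.0, 0.3, 1.0, -1.0)

rigid_len = np.linalg.norm((H @ p - H @ q)[:2])   # length preserved
sim_len = np.linalg.norm((S @ p - S @ q)[:2])     # length scaled by s
print(H @ p, rigid_len, sim_len)
```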

2D Geometric Transformations

Let’s define families of transformations by the properties that they preserve

Where we are headed….

Euclidean (trans + rot) preserves lengths + angles

Euclidean

Affine

Projective

Affine: preserves parallel lines

Projective: preserves lines

but first, we’ll need tools from geometry

Agenda


• Rotations

• Camera models

Orthogonal transformations

Defn: Orthogonal transformations are linear transformations that preserve distances and angles

$$a^T b = T(a)^T T(b) \quad \text{where } T(a) = Aa, \; a \in \mathbb{R}^n, \; A \in \mathbb{R}^{n \times n}$$

$$a^T b = a^T A^T A b \iff A^T A = I$$

[can conclude by setting a,b = coordinate vectors]

Defn: A is a rotation matrix if $A^T A = I$, $\det(A) = 1$

Defn: A is a reflection matrix if $A^T A = I$, $\det(A) = -1$
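As a quick sanity check of these definitions, the sketch below (the matrices and vectors are arbitrary choices made here) verifies that a 2D rotation and a reflection both satisfy AᵀA = I, differ in the sign of the determinant, and preserve dot products.

```python
import numpy as np

theta = 0.3
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])   # rotation: det = +1
refl = np.array([[1.0, 0.0],
                 [0.0, -1.0]])                      # reflection about x-axis: det = -1

a = np.array([1.0, 2.0])
b = np.array([-3.0, 0.5])

for A in (rot, refl):
    # Orthogonality and preservation of the dot product a^T b.
    print(np.allclose(A.T @ A, np.eye(2)), np.isclose((A @ a) @ (A @ b), a @ b))
```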

2D Rotations

$$R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

1 DOF

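3D Rotations

Think of as change of basis where $r_i = r(i,:)$ are orthonormal basis vectors

$$R \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$

[Figure: rotated coordinate frame with axes r1, r2, r3]

How many DOFs?

3 = (2 to point r1 + 1 to rotate about r1)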

Euler’s rotation theorem

Any rotation of a rigid body in a three-dimensional space is equivalent to a pure rotation about a single fixed axis

https://en.wikipedia.org/wiki/Euler's_rotation_theorem

3D Rotations

Lots of parameterizations that try to capture 3 DOFs

Helpful ones for vision: orthonormal matrix, axis-angle, exponential maps

Axis-angle: represent a 3D rotation with a unit vector pointed along the axis of rotation, and an angle of rotation about that vector

Shears

$$A = \begin{bmatrix} 1 & h_{xy} & h_{xz} & 0 \\ h_{yx} & 1 & h_{yz} & 0 \\ h_{zx} & h_{zy} & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

$h_{xy}$ shears y into x

Rotations

• 3D rotations are fundamentally more complex than in 2D!

• 2D: amount of rotation

• 3D: amount and axis of rotation

05-3DTransformations.key - February 9, 2015

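Review: dot and cross products

Dot product:

$$a \cdot b = \|a\| \, \|b\| \cos\theta$$

Cross product:

$$a \times b = \begin{bmatrix} a_2 b_3 - a_3 b_2 \\ b_1 a_3 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{bmatrix}$$

Cross product matrix:

$$a \times b = \hat{a} b = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$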
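The cross product matrix is easy to verify against NumPy's built-in cross product. A minimal sketch; the helper name `skew` and the test vectors are choices made here.

```python
import numpy as np

def skew(a):
    """Cross product matrix a_hat, so that skew(a) @ b equals a x b."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([-1.0, 0.5, 4.0])

via_matrix = skew(a) @ b
via_numpy = np.cross(a, b)
print(via_matrix, via_numpy)
```

The matrix is skew-symmetric (its transpose is its negative), and the result is orthogonal to both inputs.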

Approach

Rotate a point x by an angle θ about a unit axis $\omega \in \mathbb{R}^3$, $\|\omega\| = 1$

https://en.wikipedia.org/wiki/Axis%E2%80%93angle_representation

Rodrigues' rotation formula

Rotate x by θ about $\omega \in \mathbb{R}^3$, $\|\omega\| = 1$:

1. Write x as the sum of components parallel and perpendicular to omega: $x = x_\parallel + x_\perp$

2. Rotate the perpendicular component by a 2D rotation of theta in the plane orthogonal to omega

$$R = I + \hat{\omega} \sin\theta + \hat{\omega}\hat{\omega} (1 - \cos\theta)$$

where $\hat{\omega}$ is the cross product matrix of ω

[Rx can simplify to cross and dot product computations]

https://en.wikipedia.org/wiki/Rodrigues'_rotation_formula
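The two steps and the matrix form can be sketched side by side. The function names are hypothetical, and the axis is normalized inside each function as a convenience. `rotate_vector` uses only dot and cross products, as the bracketed note suggests.

```python
import numpy as np

def skew(w):
    """Cross product matrix of w."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(omega, theta):
    """R = I + sin(theta) w_hat + (1 - cos(theta)) w_hat @ w_hat."""
    w = np.asarray(omega, dtype=float)
    W = skew(w / np.linalg.norm(w))
    return np.eye(3) + np.sin(theta) * W + (1.0 - np.cos(theta)) * (W @ W)

def rotate_vector(x, omega, theta):
    """Same rotation via parallel/perpendicular components of x."""
    w = np.asarray(omega, dtype=float)
    w = w / np.linalg.norm(w)
    x_par = w * (w @ x)                  # component along the axis (unchanged)
    x_perp = x - x_par                   # component rotated in the orthogonal plane
    return x_par + np.cos(theta) * x_perp + np.sin(theta) * np.cross(w, x)

R = rodrigues([0.0, 0.0, 2.0], np.pi / 2)      # 90 degrees about the z-axis
x = np.array([0.3, -1.2, 0.7])
axis, angle = np.array([1.0, 2.0, -1.0]), 0.8
print(R @ np.array([1.0, 0.0, 0.0]))
```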

Exponential map representation

[Figure: x rotated by θ about ω, with components x∥ and x⊥]

$$R = \exp(\hat{v}), \quad \text{where } v = \omega\theta$$

$$= I + \hat{v} + \frac{1}{2!} \hat{v}^2 + \dots$$

[standard Taylor series expansion of exp(x) @ x=0 as 1 + x + (1/2!)x² + …]

Implies that we can approximate the change in position of x due to a small rotation v as: v × x, where v = ωθ

[reduces to Rodrigues' formula with Taylor series expansion of sine + cosine]
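Both bracketed claims can be checked numerically. In this minimal sketch (the function names and the 20-term truncation are choices made here), the truncated matrix Taylor series is compared with the closed-form Rodrigues expression, and the first-order term v × x is compared with the actual displacement for a tiny rotation.

```python
import numpy as np

def skew(v):
    """Cross product matrix of v."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_map_series(v, terms=20):
    """Truncated series I + V + V^2/2! + ... with V = skew(v)."""
    V = skew(v)
    R = np.eye(3)
    term = np.eye(3)
    for k in range(1, terms):
        term = term @ V / k          # V^k / k!
        R = R + term
    return R

def exp_map_closed_form(v):
    """Rodrigues' formula with theta = |v| and omega = v / |v|."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    W = skew(v / theta)
    return np.eye(3) + np.sin(theta) * W + (1.0 - np.cos(theta)) * (W @ W)

v = np.array([0.2, -0.4, 0.3])
x = np.array([1.0, 2.0, 3.0])
v_small = 1e-4 * v
displacement = exp_map_closed_form(v_small) @ x - x   # close to v_small x x
print(exp_map_series(v))
```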

Agenda


• Rotations

• Camera models

Recall perspective projection

$$x = \frac{f}{Z} X, \quad y = \frac{f}{Z} Y$$

[Figure: COP at the origin of axes x, y, z; 3D point (X,Y,Z) and its image point (x,y,1)]

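Perspective projection revisited

$$\lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$

Given (X,Y,Z) and f, compute (x,y) and lambda:

$$\lambda x = fX, \quad \lambda = Z, \quad x = \frac{\lambda x}{\lambda} = \frac{fX}{Z}$$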
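In code, the matrix form amounts to one multiplication followed by a division by the third component. A minimal sketch; the function name and the sample point are assumptions.

```python
import numpy as np

def project_homogeneous(P, f):
    """Return ((x, y), lambda) from lambda [x, y, 1]^T = diag(f, f, 1) [X, Y, Z]^T."""
    K = np.diag([f, f, 1.0])
    h = K @ P                 # (lambda x, lambda y, lambda)
    lam = h[2]                # lambda equals the depth Z
    return h[:2] / lam, lam

P = np.array([2.0, 1.0, 4.0])
xy, lam = project_homogeneous(P, f=2.0)
print(xy, lam)
```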

Special case: f = 1

$$Z \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$

Natural geometric intuition:

• 3D point is obtained by scaling the ray pointed at the image coordinate

• Scale factor = true depth of point

[Aside: given an image with a focal length ‘f’, resize by ‘1/f’ to obtain unit-focal-length image]

[Figure: COP, 3D point (X,Y,Z), and its image point (x,y,1)]

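Homogeneous notation

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} \sim \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$

For now, think of the above as shorthand notation for

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} \equiv \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \quad \text{i.e.} \quad \exists \lambda \text{ s.t. } \lambda \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$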

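Camera projection

$$\lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

First matrix: camera intrinsic matrix K (can include skew & non-square pixel size)

Second matrix: camera extrinsics (rotation and translation)

Vector: 3D point in world coordinates

[Figure: camera frame and world coordinate frame with axes r1, r2, r3, related by translation T]

Aside: homogeneous notation is shorthand for $x = \frac{\lambda x}{\lambda}$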

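Fancier intrinsics

Non-square pixels:

$$x_s = s_x x, \quad y_s = s_y y$$

Shifted origin:

$$x' = x_s + o_x, \quad y' = y_s + o_y$$

Skewed image axes:

$$x'' = x' + s_\theta y'$$

$$K = \begin{bmatrix} s_x & s_\theta & o_x \\ 0 & s_y & o_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} f s_x & f s_\theta & o_x \\ 0 & f s_y & o_y \\ 0 & 0 & 1 \end{bmatrix}$$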
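The matrix product defining K can be reproduced directly. This is a minimal sketch; the function name and all parameter values are made up for illustration.

```python
import numpy as np

def intrinsics(f, s_x, s_y, s_theta, o_x, o_y):
    """K = (pixel scaling, skew, origin shift) @ diag(f, f, 1)."""
    pixel = np.array([[s_x, s_theta, o_x],
                      [0.0, s_y, o_y],
                      [0.0, 0.0, 1.0]])
    focal = np.diag([f, f, 1.0])
    return pixel @ focal

# Illustrative values only: focal length 2, pixel scales 100 and 110,
# skew 0.5, principal point (320, 240).
K = intrinsics(f=2.0, s_x=100.0, s_y=110.0, s_theta=0.5, o_x=320.0, o_y=240.0)
print(K)
```

The focal length multiplies the scale and skew entries but leaves the origin offsets untouched, matching the right-hand side of the equation.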

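Notation

$$\lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f s_x & f s_\theta & o_x \\ 0 & f s_y & o_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

$$= K_{3 \times 3} \begin{bmatrix} R_{3 \times 3} & T_{3 \times 1} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = M_{3 \times 4} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

[Using Matlab’s rows x columns]

Claims (without proof):

1. A 3x4 matrix ‘M’ can be a camera matrix iff the determinant of its left 3x3 block is not zero (M itself is not square, so det(M) is undefined)

2. M is determined only up to a scale factor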
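Putting the pieces together, a sketch of M = K [R T] and its use follows. The function names and the numeric K, R, T are assumptions chosen so the result is easy to verify by hand. The last lines illustrate the second claim: scaling M does not change the projected pixel.

```python
import numpy as np

def camera_matrix(K, R, T):
    """Assemble the 3x4 projection matrix M = K [R | T]."""
    return K @ np.hstack([R, T.reshape(3, 1)])

def project(M, X_world):
    """Project a 3D world point to image coordinates (x, y)."""
    h = M @ np.append(X_world, 1.0)   # (lambda x, lambda y, lambda)
    return h[:2] / h[2]

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # no rotation (illustrative)
T = np.array([0.0, 0.0, 5.0])          # world origin sits 5 units in front

M = camera_matrix(K, R, T)
X = np.array([1.0, -1.0, 5.0])
uv = project(M, X)                     # camera-frame point is (1, -1, 10)
uv_scaled = project(3.0 * M, X)        # same pixel: M is defined up to scale
print(uv, uv_scaled)
```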

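Notation (more)

$$M_{3 \times 4} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = \begin{bmatrix} A_{3 \times 3} & b_{3 \times 1} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = A_{3 \times 3} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + b_{3 \times 1}$$

$$M = \begin{bmatrix} m_1^T \\ m_2^T \\ m_3^T \end{bmatrix}, \quad A = \begin{bmatrix} a_1^T \\ a_2^T \\ a_3^T \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$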

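Applying the projection matrix

$$\lambda = \begin{bmatrix} X & Y & Z \end{bmatrix} a_3 + b_3$$

$$x = \frac{1}{\lambda} \left( \begin{bmatrix} X & Y & Z \end{bmatrix} a_1 + b_1 \right), \quad y = \frac{1}{\lambda} \left( \begin{bmatrix} X & Y & Z \end{bmatrix} a_2 + b_2 \right)$$

Set of 3D points that project to x = 0:

$$\begin{bmatrix} X & Y & Z \end{bmatrix} a_1 + b_1 = 0$$

Set of 3D points that project to y = 0:

$$\begin{bmatrix} X & Y & Z \end{bmatrix} a_2 + b_2 = 0$$

Set of 3D points that project to x = inf or y = inf:

$$\begin{bmatrix} X & Y & Z \end{bmatrix} a_3 + b_3 = 0$$

Rows of the projection matrix describe the 3 planes defined by the image coordinate system

[Figure: image plane with axes x and y, COP, and the plane normals a1, a2, a3; image point (x,y) and 3D point (X,Y,Z)]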

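What’s the set of (X,Y,Z) points that project to the same (x,y)?

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \lambda w + \bar{b} \quad \text{where } w = A^{-1} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \quad \bar{b} = -A^{-1} b$$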

What’s the position of COP / pinhole?

$$A \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + b = 0 \;\Rightarrow\; \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = -A^{-1} b$$
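Both results, the back-projected ray and the COP, follow from splitting M into its left 3x3 block A and last column b. The sketch below uses hypothetical function names and an example M; it solves linear systems instead of forming the inverse explicitly, which is a design choice made here.

```python
import numpy as np

def camera_center(M):
    """COP: the point where A X + b = 0, i.e. X = -A^{-1} b."""
    A, b = M[:, :3], M[:, 3]
    return -np.linalg.solve(A, b)

def backproject_ray(M, x, y):
    """Return (origin, direction w) with w = A^{-1} [x, y, 1]^T."""
    w = np.linalg.solve(M[:, :3], np.array([x, y, 1.0]))
    return camera_center(M), w

def project(M, X_world):
    """Project a 3D point with the 3x4 matrix M."""
    h = M @ np.append(X_world, 1.0)
    return h[:2] / h[2]

# Example M = K [I | T] with T = (0, 0, 5); values are illustrative.
M = np.array([[500.0, 0.0, 320.0, 1600.0],
              [0.0, 500.0, 240.0, 1200.0],
              [0.0, 0.0, 1.0, 5.0]])

origin, w = backproject_ray(M, 370.0, 190.0)
points = [origin + lam * w for lam in (1.0, 2.0, 7.5)]   # all on the same ray
print(origin, [project(M, p) for p in points])
```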

Other geometric properties

Draw plane in front of pinhole. Write (x,y) for normalized coordinates and (u,v) for image coordinates?

Affine cameras

$$m_3^T = \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}$$

[Figure: perspective vs. weak perspective]

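Affine cameras

Captures 3D affine transformation + orthographic projection + 2D affine transformation

$$\begin{bmatrix} x \\ y \end{bmatrix} = \underbrace{\begin{bmatrix} \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \end{bmatrix}}_{\text{2D affine}} \underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{\text{orthographic}} \underbrace{\begin{bmatrix} \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{\text{3D affine}} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

$$= \begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$

$$x = AX + b$$

In homogeneous form, the full 3×4 matrix has the third row [0 0 0 1].

• Projection defined by 8 parameters

• Parallel lines project to parallel lines

• 2D points = linear projection of 3D points (+ 2D translation)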

Affine Cameras

• Example: Weak-perspective projection model

• Projection defined by 8 parameters

• Parallel lines project to parallel lines

• The transformation can be written as a direct linear transformation plus an offset

Image coordinates (x,y) are an affine function of world coordinates (X,Y,Z)

$$m_3^T = \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}$$

$$x = \begin{bmatrix} X & Y & Z \end{bmatrix} a_1 + b_1$$

$$y = \begin{bmatrix} X & Y & Z \end{bmatrix} a_2 + b_2$$
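The claim that parallel lines stay parallel can be checked with a small sketch. The 2×3 matrix A, the offset b, and the sample lines are arbitrary values chosen here. Under an affine camera the image of a step along direction D is A·D regardless of where the line starts, so there is no vanishing point.

```python
import numpy as np

def affine_project(A, b, X):
    """Affine camera: image point = A X + b, with A 2x3 and b 2x1."""
    return A @ X + b

A = np.array([[1.0, 0.2, 0.0],
              [0.1, 0.9, 0.3]])        # 6 parameters (illustrative values)
b = np.array([5.0, -2.0])              # 2 parameters

D = np.array([1.0, 0.5, 2.0])          # shared 3D direction
P0 = np.array([0.0, -1.0, 4.0])        # start of line 1
Q0 = np.array([3.0, 2.0, 50.0])        # start of line 2, much farther away

d1 = affine_project(A, b, P0 + D) - affine_project(A, b, P0)
d2 = affine_project(A, b, Q0 + D) - affine_project(A, b, Q0)
print(d1, d2)                          # identical image directions
```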

Geometric Transformations

Euclidean (trans + rot) preserves lengths + angles

Euclidean

Affine

Projective

Affine: preserves parallel lines

Projective: preserves lines
