2102642 Computer Vision and Video Electronics

Chapter 1: Introduction

Suree Pumrin, Ph.D.

(Forsyth & Ponce)

What is computer vision?

“Vision is the act of knowing what is where by looking.” – Aristotle

Definition: The goal of computer vision is to make useful decisions about real physical objects and scenes based on sensed images.


Why study Computer Vision?

Images and movies are everywhere
Fast-growing collection of useful applications
  • building representations of the 3D world from pictures
  • automated surveillance (who’s doing what)
  • movie post-processing
  • face finding
Various deep and attractive scientific mysteries
  • how does object recognition work?
Greater understanding of human vision


Fundamental Issues

Sensing: How do sensors obtain images of the world?
Encoded information: How do images yield information for understanding the 3D world?
Representations: What representations should be used for stored descriptions of objects?
Algorithms: What methods are there to process image information and construct descriptions of the world and its objects?


Fundamental Problems

What are the problems involved in vision that make it so easy for the eye but so difficult for the machine?

Individual objects
Object categories
Scenes: specific and category
Discrimination vs. detection vs. recognition
How many objects and categories?


Applications of Computer Vision

Biometric measurement: iris, fingerprint, face recognition
Manufacturing: quality control, robot assembly, automated inspection
Surveillance, security: detect/recognize people, objects
Military: tank, airplane identification
Traffic: monitoring and control, vehicle guidance


Properties of Vision

3D representations are easily constructed
There are many different cues.
Useful
  • to humans (avoid bumping into things; planning a grasp; etc.)
  • in computer vision (build models for movies).
Cues include
  • multiple views (motion, stereopsis)
  • texture
  • shading


Properties of Vision

People draw distinctions between what is seen
“Object recognition”
This could mean “is this a fish or a bicycle?”
It could mean “is this George Washington?”
It could mean “is this poisonous or not?”
It could mean “is this slippery or not?”
It could mean “will this support my weight?”
Great mystery
  • How to build programs that can draw useful distinctions based on image properties.


Part I: The Physics of Imaging

How images are formed
  • Cameras
      What a camera does
      How to tell where the camera was
  • Light
      How to measure light
      What light does at surfaces
      How the brightness values we see in cameras are determined
  • Color
      The underlying mechanisms of color
      How to describe it and measure it

Images are two-dimensional patterns of brightness values.

They are formed by the projection of 3D objects.

Figure from US Navy Manual of Basic Optics and Optical Instruments, prepared by Bureau of Naval Personnel. Reprinted by Dover Publications, Inc., 1969.


Animal eye: a long time ago.

Pinhole perspective projection: Brunelleschi, XVth Century.

Camera obscura: XVIth Century.

Photographic camera: Niepce, 1816.

Reproduced by permission, the American Society of Photogrammetry and Remote Sensing. A.L. Nowicki, “Stereoscopy.” Manual of Photogrammetry, Thompson, Radlinski, and Speert (eds.), third edition, 1966.

Figure from US Navy Manual of Basic Optics and Optical Instruments, prepared by Bureau of Naval Personnel. Reprinted by Dover Publications, Inc., 1969.



Pinhole Perspective Equation

$$
\begin{cases}
x' = f' \, \dfrac{x}{z} \\[6pt]
y' = f' \, \dfrac{y}{z}
\end{cases}
$$

NOTE: z is always negative.
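As a minimal sketch of the pinhole perspective equation above, the function below projects a 3D point onto the image plane. The name `project_pinhole` and the sample values are illustrative assumptions, not part of the original slides; the point is assumed to be given in the camera frame with z < 0.

```python
def project_pinhole(point, f_prime):
    """Project a 3D point (x, y, z) to image coordinates (x', y').

    Implements x' = f' * x / z and y' = f' * y / z.
    """
    x, y, z = point
    if z == 0:
        raise ValueError("point lies in the plane of the pinhole (z = 0)")
    return (f_prime * x / z, f_prime * y / z)


# Hypothetical example: f' = 1 and a point at depth z = -4.
print(project_pinhole((2.0, 1.0, -4.0), 1.0))
```

Because z is negative, the signs of x' and y' are opposite to those of x and y: the image is inverted.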


Affine projection models: Weak perspective projection

$$
\begin{cases}
x' = -m x \\
y' = -m y
\end{cases}
\qquad \text{where} \qquad m = -\frac{f'}{z_0} \ \text{is the magnification.}
$$

When the scene relief is small compared to its distance from the camera, m can be taken constant: weak perspective projection.
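The sketch below compares weak perspective with full pinhole perspective for points whose depths vary only slightly around a reference depth z0. The function names and the numbers (f' = 1, z0 = -100) are assumptions chosen for illustration.

```python
def project_perspective(point, f_prime):
    """Full pinhole perspective: x' = f' x / z, y' = f' y / z."""
    x, y, z = point
    return (f_prime * x / z, f_prime * y / z)


def project_weak_perspective(point, f_prime, z0):
    """Weak perspective: x' = -m x, y' = -m y with constant m = -f' / z0."""
    m = -f_prime / z0
    x, y, _ = point  # the point's own depth is ignored
    return (-m * x, -m * y)


# Hypothetical scene: relief of +/- 1 unit at a distance of 100 units.
z0 = -100.0
for p in [(1.0, 1.0, -99.0), (1.0, 1.0, -100.0), (1.0, 1.0, -101.0)]:
    exact = project_perspective(p, 1.0)
    approx = project_weak_perspective(p, 1.0, z0)
    print(p, exact, approx)
```

The two projections agree exactly at z = z0 and differ by about one percent of the image coordinate at the other two depths, which is the sense in which m "can be taken constant".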


Affine projection models: Orthographic projection

$$
\begin{cases}
x' = x \\
y' = y
\end{cases}
$$

When the camera is at a (roughly constant) distance from the scene, take m = -1.


Planar pinhole perspective

Orthographic projection

Spherical pinhole perspective


Lenses

Snell’s law

n1 sin α1 = n2 sin α2


Paraxial (or first-order) optics

Snell’s law:

n1 sin α1 = n2 sin α2

Small angles:

$$
n_1 \alpha_1 \approx n_2 \alpha_2,
\qquad
\frac{n_1}{d_1} + \frac{n_2}{d_2} = \frac{n_2 - n_1}{R}
$$
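To see how good the small-angle approximation is, the sketch below compares the exact refraction angle from Snell's law with the paraxial value. The function names and the indices n1 = 1.0 and n2 = 1.5 (roughly air to glass) are assumptions for illustration.

```python
import math


def refraction_angle_exact(alpha1, n1, n2):
    """Solve n1 sin(alpha1) = n2 sin(alpha2) for alpha2 (angles in radians)."""
    s = n1 * math.sin(alpha1) / n2
    if abs(s) > 1.0:
        raise ValueError("total internal reflection: no refracted ray")
    return math.asin(s)


def refraction_angle_paraxial(alpha1, n1, n2):
    """Small-angle approximation: n1 alpha1 = n2 alpha2."""
    return n1 * alpha1 / n2


for degrees in (1, 5, 20, 40):
    a1 = math.radians(degrees)
    exact = refraction_angle_exact(a1, 1.0, 1.5)
    approx = refraction_angle_paraxial(a1, 1.0, 1.5)
    print(degrees, math.degrees(exact), math.degrees(approx))
```

The two values are nearly identical at 1 degree and drift apart by more than a degree at 40 degrees, which is why paraxial optics is restricted to rays close to the optical axis.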


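Thin Lenses

$$
\begin{cases}
x' = z' \, \dfrac{x}{z} \\[6pt]
y' = z' \, \dfrac{y}{z}
\end{cases}
\qquad \text{where} \qquad
\frac{1}{z'} - \frac{1}{z} = \frac{1}{f}
\quad \text{and} \quad
f = \frac{R}{2(n - 1)}
$$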
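A small numeric sketch of the thin lens equation follows. The function names and the lens parameters (R = 0.1, n = 1.5) are assumptions for illustration; z is negative for points in front of the lens, as in the pinhole model.

```python
def focal_length(R, n):
    """Focal length of a thin lens: f = R / (2 (n - 1))."""
    return R / (2.0 * (n - 1.0))


def image_depth(z, f):
    """Solve 1/z' - 1/z = 1/f for z'."""
    inv = 1.0 / f + 1.0 / z
    if inv == 0:
        raise ValueError("object at the focal distance: image at infinity")
    return 1.0 / inv


def project_thin_lens(point, f):
    """Return (x', y', z') with x' = z' x / z and y' = z' y / z."""
    x, y, z = point
    z_prime = image_depth(z, f)
    return (z_prime * x / z, z_prime * y / z, z_prime)


f = focal_length(0.1, 1.5)                   # hypothetical lens
print(f)
print(project_thin_lens((0.2, 0.1, -0.3), f))
print(image_depth(-1e9, f))                  # distant point: z' approaches f
```

For a very distant point, z' approaches f, so the thin lens behaves like a pinhole camera with the image plane at the focal distance.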


Thick Lens


Spherical Aberration

Distortion

Chromatic Aberration

Figure from US Navy Manual of Basic Optics and Optical Instruments, prepared by Bureau of Naval Personnel. Reprinted by Dover Publications, Inc., 1969.

Vignetting


Photographs (Niepce, “La Table Servie,” 1822)

Milestones:
  • Daguerreotypes (1839)
  • Photographic Film (Eastman, 1889)
  • Cinema (Lumière Brothers, 1895)
  • Color Photography (Lumière Brothers, 1908)
  • Television (Baird, Farnsworth, Zworykin, 1920s)
  • CCD Devices (1970)

Collection Harlingue-Viollet.


The Human Eye

Helmholtz’s Schematic Eye

Reproduced by permission, the American Society of Photogrammetry and Remote Sensing. A.L. Nowicki, “Stereoscopy.” Manual of Photogrammetry, Thompson, Radlinski, and Speert (eds.), third edition, 1966.


The distribution of rods and cones across the retina

Reprinted from Foundations of Vision, by B. Wandell, Sinauer Associates, Inc., (1995). © 1995 Sinauer Associates, Inc.

Cones in the fovea

Rods and cones in the periphery

Reprinted from Foundations of Vision, by B. Wandell, Sinauer Associates, Inc., (1995). © 1995 Sinauer Associates, Inc.


Part II: Early Vision in One Image

Representing small patches of image
For three reasons
  • We wish to establish correspondence between (say) points in different images, so we need to describe the neighborhood of the points
  • Sharp changes are important in practice; these are known as “edges”
  • Representing texture by giving some statistics of the different kinds of small patch present in the texture.
      Tigers have lots of bars, few spots
      Leopards are the other way


Representing an image patch

Filter outputs
  • essentially form a dot-product between a pattern and an image, while shifting the pattern across the image
  • strong response -> image locally looks like the pattern
  • e.g. derivatives measured by filtering with a kernel that looks like a big derivative (bright bar next to dark bar)


Convolve this image

With this kernel

To get this
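The original slide shows this step as pictures. As a rough stand-in, the sketch below convolves a tiny synthetic image (a dark-to-bright vertical edge) with a horizontal derivative kernel. The image, the kernel values and the function name `convolve2d_valid` are assumptions for illustration, written in plain Python for readability rather than speed.

```python
def convolve2d_valid(image, kernel):
    """2D convolution over the region where the kernel fits entirely.

    The kernel is flipped in both directions (true convolution); each output
    value is a dot product between the flipped kernel and an image patch.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += flipped[u][v] * image[i + u][j + v]
            row.append(acc)
        out.append(row)
    return out


# Synthetic image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(3)]
# Horizontal derivative kernel (bright bar next to dark bar).
kernel = [[1, 0, -1]]

for row in convolve2d_valid(image, kernel):
    print(row)
```

The response is zero in the flat regions and nonzero only in the columns that straddle the edge, which is the "strong response where the image locally looks like the pattern" behaviour described on the previous slide.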


Texture

Many objects are distinguished by their texture
  • Tigers, cheetahs, grass, trees
We represent texture with statistics of filter outputs
  • For tigers, bar filters at a coarse scale respond strongly
  • For cheetahs, spots at the same scale
Objects with different textures can be segmented
The variation in textures is a cue to shape
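As a heavily simplified illustration of "statistics of filter outputs", the sketch below summarizes a patch by the mean absolute response of two difference filters, one horizontal and one vertical. Real texture descriptors use banks of bar and spot filters at several scales; the two-number descriptor, the function names and the synthetic striped patches here are assumptions chosen only to show the idea.

```python
def mean_abs_response(image, dy, dx):
    """Mean absolute difference between pixels offset by (dy, dx)."""
    h, w = len(image), len(image[0])
    total, count = 0.0, 0
    for i in range(h - dy):
        for j in range(w - dx):
            total += abs(image[i + dy][j + dx] - image[i][j])
            count += 1
    return total / count


def texture_descriptor(image):
    """(horizontal-difference statistic, vertical-difference statistic)."""
    return (mean_abs_response(image, 0, 1), mean_abs_response(image, 1, 0))


# Synthetic patches: stripes two pixels wide.
vertical_bars = [[(j // 2) % 2 for j in range(8)] for _ in range(8)]
horizontal_bars = [[(i // 2) % 2 for _ in range(8)] for i in range(8)]

print(texture_descriptor(vertical_bars))
print(texture_descriptor(horizontal_bars))
```

The two patches have the same brightness histogram but clearly different descriptors, so a classifier or segmenter working on such statistics could tell them apart.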



Part III: Early Vision in Multiple Images

The geometry of multiple views
  • Where could it appear in camera 2 (3, etc.) given it was here in 1 (1 and 2, etc.)?
Stereopsis
  • What we know about the world from having 2 eyes
Structure from motion
  • What we know about the world from having many eyes
      or, more commonly, our eyes moving.


Part IV: Mid-Level Vision

Finding coherent structure so as to break the image or movie into big units

Segmentation:
  • Breaking images and videos into useful pieces
  • E.g. finding video sequences that correspond to one shot
  • E.g. finding image components that are coherent in internal appearance

Tracking:
  • Keeping track of a moving object through a long sequence of views

Part V: High Level Vision (Geometry)

The relations between object geometry and image geometry

Model based vision
  • find the position and orientation of known objects

Smooth surfaces and outlines
  • how the outline of a curved object is formed, and what it looks like

Aspect graphs
  • how the outline of a curved object moves around as you view it from different directions

Range data


Part VI: High Level Vision (Probabilistic)

Using classifiers and probability to recognize objects
  • Templates and classifiers
      how to find objects that look the same from view to view with a classifier
  • Relations
      break up objects into big, simple parts, find the parts with a classifier, and then reason about the relationships between the parts to find the object.
  • Geometric templates from spatial relations
      extend this trick so that templates are formed from relations between much smaller parts

3D Reconstruction from multiple views

Multiple views arise from
  • stereo
  • motion

Strategy
  • “triangulate” from distinct measurements of the same thing

Issues
  • Correspondence: which points in the images are projections of the same 3D point?
  • The representation: what do we report?
  • Noise: how do we get stable, accurate reports?
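One simple way to "triangulate" is the midpoint method: each view gives a ray from the camera center through the observed image point, and the 3D estimate is the midpoint of the shortest segment joining the two rays. The slides do not name a specific method, so this choice, the function names and the sample cameras are assumptions for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is undetermined")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


# Hypothetical stereo pair: camera centers two units apart, both observing
# the point (0, 0, -5); ray directions are noise-free here.
c1, c2 = (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)
d1, d2 = (1.0, 0.0, -5.0), (-1.0, 0.0, -5.0)
print(triangulate_midpoint(c1, d1, c2, d2))
```

With noisy measurements the two rays generally do not intersect, which is one reason the "Noise" issue above matters; the midpoint is then only one of several possible estimates.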


Part VII: Some Applications in Detail

Finding images in large collections
  • searching for pictures
  • browsing collections of pictures

Image based rendering
  • often very difficult to produce models that look like real objects
      surface weathering, etc., create details that are hard to model
      Solution: make new pictures from old


Some applications of recognition

Digital libraries
  • Find me the pic of JFK and Marilyn Monroe embracing
  • NCMEC (The National Center for Missing and Exploited Children)

Surveillance
  • Warn me if there is a mugging in the grove

HCI (Human Computer Interaction)
  • Do what I show you

Military
  • Shoot this, not that


What are the problems in recognition?

Which bits of image should be recognized together?
  • Segmentation.

How can objects be recognized without focusing on detail?
  • Abstraction.

How can objects with many free parameters be recognized?
  • No popular name, but it’s a crucial problem anyhow.

How do we structure very large model bases?
  • again, no popular name; abstraction and learning come into this


History


Segmentation

Which image components “belong together”?
Belong together = lie on the same object
Cues
  • similar color
  • similar texture
  • not separated by contour
  • form a suggestive shape when assembled





Matching templates

Some objects are 2D patterns
  • e.g. faces

Build an explicit pattern matcher
  • discount changes in illumination by using a parametric model
  • changes in background are hard
  • changes in pose are hard
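A minimal sketch of an explicit pattern matcher follows. It uses zero-mean normalized cross-correlation, which discounts one simple parametric illumination model (a gain and an offset applied to brightness). The slides do not say which parametric model is meant, so this choice, the function names and the toy data are assumptions.

```python
import math


def normalized_correlation(patch, template):
    """Zero-mean normalized cross-correlation of two equal-sized 2D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den > 0 else 0.0  # flat patches score 0


def best_match(image, template):
    """Slide the template over the image; return (best score, (row, col))."""
    th, tw = len(template), len(template[0])
    best = (-2.0, (0, 0))
    for i in range(len(image) - th + 1):
        for j in range(len(image[0]) - tw + 1):
            patch = [row[j:j + tw] for row in image[i:i + th]]
            score = normalized_correlation(patch, template)
            if score > best[0]:
                best = (score, (i, j))
    return best


template = [[0, 1],
            [1, 0]]
# The pattern appears at row 1, column 2, brightened as 3 * template + 20.
image = [[10, 10, 10, 10],
         [10, 10, 20, 23],
         [10, 10, 23, 20],
         [10, 10, 10, 10]]
print(best_match(image, template))
```

The match is found despite the brightness change, but the score depends on every pixel under the template, which is why changes in background and pose remain hard for this kind of matcher.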


http://www.ri.cmu.edu/projects/project_271.html


Relations between templates

e.g. find faces by
  • finding eyes, nose, mouth
  • finding assembly of the three that has the “right” relations

http://www.ri.cmu.edu/projects/project_320.html


Representing the 3D world

Assemblies of primitives
  • fit parametric forms
  • Issues
      what primitives?
      uniqueness of representation
      few objects are actual primitives

Indexed collection of images
  • use interpolation to predict appearance between images
  • Issues
      occlusion is a mild nuisance
      structuring the collection can be tricky


People

Skin is characteristic; clothing hard to segment
  • hence, people wearing little clothing

Finding body segments:
  • finding skin-like (color, texture) regions that have nearly straight, nearly parallel boundaries

Grouping process constructed by hand, tuned by hand using small dataset.
When a sufficiently large group is found, assert a person is present


Horse grouper


Tracking

Use a model to predict next position and refine using next image
Model:
  • simple dynamic models (second order dynamics)
  • kinematic models
  • etc.

Face tracking and eye tracking now work rather well
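The predict-then-refine loop can be sketched with an alpha-beta filter, a very simple tracker built on a constant-velocity dynamic model. The slides do not name a specific tracker, so this is a swapped-in illustration; the function name, the gains `alpha` and `beta`, and the 1D measurements are all assumptions.

```python
def track_alpha_beta(measurements, dt=1.0, alpha=0.85, beta=0.005):
    """Track a 1D position through a sequence of noisy measurements.

    Each step predicts the next position from the current position and
    velocity, then refines both using the new measurement.
    """
    x, v = measurements[0], 0.0
    estimates = [x]
    for z in measurements[1:]:
        x_pred = x + v * dt                  # predict with the dynamic model
        residual = z - x_pred                # compare with the next image
        x = x_pred + alpha * residual        # refine position
        v = v + (beta / dt) * residual       # refine velocity
        estimates.append(x)
    return estimates


# Hypothetical measurements of an object moving about one unit per frame.
measured = [0.0, 1.1, 1.9, 3.2, 3.9, 5.1]
print(track_alpha_beta(measured, alpha=0.5, beta=0.1))
```

Larger gains trust the measurements more, smaller gains trust the model more. In an image-based tracker the prediction would also restrict where in the next frame to search for the object.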
