Suree Pumrin, Ph.D.
Chapter 1: Introduction
2102642 Computer Vision and Video Electronics
(Forsyth & Ponce)
What is computer vision?
“Vision is the act of knowing what is where by looking.” – Aristotle
Definition: The goal of computer vision is to make useful decisions about real physical objects and scenes based on sensed images.
Why study Computer Vision?
Images and movies are everywhere
Fast-growing collection of useful applications
• building representations of the 3D world from pictures
• automated surveillance (who’s doing what)
• movie post-processing
• face finding
Various deep and attractive scientific mysteries
• how does object recognition work?
Greater understanding of human vision
Fundamental Issues
Sensing: How do sensors obtain images of the world?
Encoded information: How do images yield information for understanding the 3D world?
Representations: What representations should be used for stored descriptions of objects?
Algorithms: What methods are there to process image information and construct descriptions of the world and its objects?
Fundamental Problems
What are the problems involved in vision that make it so easy for the eye but so difficult for the machine?
Individual objects
Object categories
Scenes: specific and category
Discrimination vs. detection vs. recognition
How many objects and categories?
Applications of Computer Vision
Biometric measurement: iris, fingerprint, face recognition
Manufacturing: quality control, robot assembly, automated inspection
Surveillance, security: detect/recognize people, objects
Military: tank, airplane identification
Traffic: monitoring and control, vehicle guidance
Properties of Vision
3D representations are easily constructed
There are many different cues.
Useful
• to humans (avoid bumping into things; planning a grasp; etc.)
• in computer vision (build models for movies).
Cues include
• multiple views (motion, stereopsis)
• texture
• shading
Properties of Vision
People draw distinctions between what is seen
“Object recognition”
• This could mean “is this a fish or a bicycle?”
• It could mean “is this George Washington?”
• It could mean “is this poisonous or not?”
• It could mean “is this slippery or not?”
• It could mean “will this support my weight?”
Great mystery
• How to build programs that can draw useful distinctions based on image properties.
Part I: The Physics of Imaging
How images are formed
• Cameras
    What a camera does
    How to tell where the camera was
• Light
    How to measure light
    What light does at surfaces
    How the brightness values we see in cameras are determined
• Color
    The underlying mechanisms of color
    How to describe it and measure it
Images are two-dimensional patterns of brightness values.
They are formed by the projection of 3D objects.
Figure from US Navy Manual of Basic Optics and Optical Instruments, prepared by Bureau of Naval Personnel. Reprinted by Dover Publications, Inc., 1969.
Animal eye: a long time ago.
Pinhole perspective projection: Brunelleschi, XVth Century.
Camera obscura: XVIth Century.
Photographic camera: Niepce, 1816.
Reproduced by permission, the American Society of Photogrammetry and Remote Sensing. A.L. Nowicki, “Stereoscopy.” Manual of Photogrammetry, Thompson, Radlinski, and Speert (eds.), third edition, 1966.
Figure from US Navy Manual of Basic Optics and Optical Instruments, prepared by Bureau of Naval Personnel. Reprinted by Dover Publications, Inc., 1969.
Pinhole Perspective Equation
\[
\begin{cases}
x' = f' \dfrac{x}{z} \\[4pt]
y' = f' \dfrac{y}{z}
\end{cases}
\]
NOTE: z is always negative.
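As a minimal sketch of the pinhole perspective equation (x' = f' x / z, y' = f' y / z), the function below projects a point given in camera coordinates. The function name and the numbers are illustrative assumptions, not part of the slides; the example keeps z negative, as the note requires.

```python
def project_pinhole(point, f_prime):
    """Project a 3D point (x, y, z), given in camera coordinates,
    using x' = f' * x / z and y' = f' * y / z."""
    x, y, z = point
    if z == 0:
        raise ValueError("point lies in the plane of the pinhole (z = 0)")
    return (f_prime * x / z, f_prime * y / z)


# Hypothetical values: a scene point at depth z = -2 and f' = 1.
# Because z is negative, the image coordinates change sign (inverted image).
image_point = project_pinhole((1.0, 0.5, -2.0), f_prime=1.0)
print(image_point)
```

Doubling the depth of the point halves both image coordinates, which is the familiar "distant objects look smaller" effect of perspective.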
Affine projection models: Weak perspective projection
\[
\begin{cases}
x' = -m x \\
y' = -m y
\end{cases}
\quad \text{where} \quad m = -\frac{f'}{z_0} \text{ is the magnification.}
\]
When the scene relief is small compared to its distance from the camera, m can be taken constant: weak perspective projection.
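The sketch below compares full pinhole perspective (x' = f' x / z) with weak perspective (x' = -m x, m = -f'/z0) for a few points whose depths vary slightly around a reference depth z0. All names and numbers are assumptions chosen for illustration.

```python
def project_perspective(point, f_prime):
    """Full pinhole perspective: x' = f' x / z, y' = f' y / z."""
    x, y, z = point
    return (f_prime * x / z, f_prime * y / z)


def project_weak_perspective(point, f_prime, z0):
    """Weak perspective: one magnification m = -f'/z0 for the whole scene."""
    m = -f_prime / z0
    x, y, _ = point  # the individual depth of the point is ignored
    return (-m * x, -m * y)


# Hypothetical scene: relief of +/-1 unit at a distance of 100 units.
f_prime, z0 = 1.0, -100.0
points = [(1.0, 1.0, -99.0), (1.0, 1.0, -100.0), (1.0, 1.0, -101.0)]
errors = []
for p in points:
    exact = project_perspective(p, f_prime)
    approx = project_weak_perspective(p, f_prime, z0)
    errors.append(abs(exact[0] - approx[0]))
    print(p, exact, approx)
```

With this small relief the two models differ by roughly one percent of the image coordinate; moving the points closer to the camera while keeping the same relief makes the discrepancy grow.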
Affine projection models: Orthographic projection
\[
\begin{cases}
x' = x \\
y' = y
\end{cases}
\]
When the camera is at a (roughly constant) distance from the scene, take m = -1.
Planar pinhole perspective
Orthographic projection
Spherical pinhole perspective
Lenses
Snell’s law
\[ n_1 \sin\alpha_1 = n_2 \sin\alpha_2 \]
Paraxial (or first-order) optics
Snell’s law:
\[ n_1 \sin\alpha_1 = n_2 \sin\alpha_2 \]
Small angles:
\[ n_1 \alpha_1 \approx n_2 \alpha_2 \]
\[ \frac{n_1}{d_1} + \frac{n_2}{d_2} = \frac{n_2 - n_1}{R} \]
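To make the small-angle approximation concrete, the following sketch compares the exact refraction angle from Snell's law with the paraxial one (n1 α1 ≈ n2 α2), and solves the paraxial refraction equation n1/d1 + n2/d2 = (n2 - n1)/R for d2. The function names and the sample values (air to glass, R = 0.1) are assumptions for illustration only.

```python
import math


def refraction_angle_exact(n1, n2, alpha1):
    """Angle alpha2 from Snell's law: n1 sin(alpha1) = n2 sin(alpha2)."""
    return math.asin(n1 * math.sin(alpha1) / n2)


def refraction_angle_paraxial(n1, n2, alpha1):
    """First-order approximation: n1 alpha1 = n2 alpha2."""
    return n1 * alpha1 / n2


def image_distance(n1, n2, d1, R):
    """Solve n1/d1 + n2/d2 = (n2 - n1)/R for d2."""
    rhs = (n2 - n1) / R - n1 / d1
    if rhs == 0:
        raise ValueError("refracted rays are parallel; d2 is infinite")
    return n2 / rhs


# Hypothetical air-to-glass interface.
n1, n2 = 1.0, 1.5
for alpha1 in (0.05, 0.2, 0.8):  # radians
    exact = refraction_angle_exact(n1, n2, alpha1)
    approx = refraction_angle_paraxial(n1, n2, alpha1)
    print(alpha1, exact, approx, abs(exact - approx))

print(image_distance(n1, n2, d1=1.0, R=0.1))
```

The printed differences grow with the incidence angle, which is why the paraxial equations are only trusted for rays close to the optical axis.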
Thin Lenses
\[
\begin{cases}
x' = z' \dfrac{x}{z} \\[4pt]
y' = z' \dfrac{y}{z}
\end{cases}
\quad \text{where} \quad \frac{1}{z'} - \frac{1}{z} = \frac{1}{f} \quad \text{and} \quad f = \frac{R}{2(n-1)}
\]
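A short sketch of the thin lens relations, assuming the sign convention used for the pinhole model (scene points have negative z): it computes f = R / (2(n - 1)), solves 1/z' - 1/z = 1/f for the in-focus depth z', and then applies x' = z' x / z. Names and numbers are hypothetical.

```python
def focal_length(R, n):
    """Thin lens focal length f = R / (2 (n - 1))."""
    return R / (2.0 * (n - 1.0))


def image_depth(z, f):
    """Solve 1/z' - 1/z = 1/f for z' (z is the negative scene depth)."""
    inv = 1.0 / f + 1.0 / z
    if inv == 0:
        raise ValueError("object in the focal plane; image forms at infinity")
    return 1.0 / inv


def project_thin_lens(point, f):
    """Return (x', y', z') for a scene point (x, y, z) that is in focus."""
    x, y, z = point
    z_img = image_depth(z, f)
    return (z_img * x / z, z_img * y / z, z_img)


# Hypothetical lens: radius R = 0.1, refractive index n = 1.5.
f = focal_length(0.1, 1.5)
print(f)
print(project_thin_lens((0.9, 0.0, -1.0), f))
print(image_depth(-1e9, f))  # very distant point: z' approaches f
```

For very distant points z' tends to f, so the thin lens behaves like a pinhole camera whose image plane sits at the focal distance.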
Thick Lens
Spherical Aberration
Distortion
Chromatic Aberration
Figure from US Navy Manual of Basic Optics and Optical Instruments, prepared by Bureau of Naval Personnel. Reprinted by Dover Publications, Inc., 1969.
Vignetting
Photographs (Niepce, “La Table Servie,” 1822)
Milestones:
• Daguerreotypes (1839)
• Photographic Film (Eastman, 1889)
• Cinema (Lumière Brothers, 1895)
• Color Photography (Lumière Brothers, 1908)
• Television (Baird, Farnsworth, Zworykin, 1920s)
• CCD Devices (1970)
Collection Harlingue-Viollet.
The Human Eye
Helmholtz’s Schematic Eye
Reproduced by permission, the American Society of Photogrammetry and Remote Sensing. A.L. Nowicki, “Stereoscopy.” Manual of Photogrammetry, Thompson, Radlinski, and Speert (eds.), third edition, 1966.
The distribution of rods and cones across the retina
Reprinted from Foundations of Vision, by B. Wandell, Sinauer Associates, Inc., (1995). © 1995 Sinauer Associates, Inc.
Cones in the fovea
Rods and cones in the periphery
Reprinted from Foundations of Vision, by B. Wandell, Sinauer Associates, Inc., (1995). © 1995 Sinauer Associates, Inc.
Part II: Early Vision in One Image
Representing small patches of image
For three reasons
• We wish to establish correspondence between (say) points in different images, so we need to describe the neighborhood of the points
• Sharp changes, known as “edges”, are important in practice
• Representing texture by giving some statistics of the different kinds of small patch present in the texture.
    Tigers have lots of bars, few spots
    Leopards are the other way
Representing an image patch
Filter outputs
• essentially form a dot-product between a pattern and an image, while shifting the pattern across the image
• strong response -> image locally looks like the pattern
• e.g. derivatives measured by filtering with a kernel that looks like a big derivative (bright bar next to dark bar)
Convolve this image
With this kernel
To get this
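The three captions above describe a convolution. As a rough sketch in pure Python (no image library assumed), the function below convolves a tiny synthetic image with a horizontal derivative kernel; the image, the kernel and the function name are illustrative assumptions, not the ones shown on the slide.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution of two lists of lists (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    # Convolution flips the kernel in both directions; sliding the
    # unflipped kernel would be correlation instead.
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for r in range(ih - kh + 1):
        out_row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * flipped[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out


# Hypothetical 4x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]
# Horizontal derivative kernel (a bright bar next to a dark bar).
kernel = [[1, 0, -1]]
response = convolve2d(image, kernel)
for row in response:
    print(row)
```

The response is nonzero only in the columns that straddle the dark-to-bright boundary, which is the sense in which a strong filter output means "the image locally looks like the pattern".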
Texture
Many objects are distinguished by their texture
• Tigers, cheetahs, grass, trees
We represent texture with statistics of filter outputs
• For tigers, bar filters at a coarse scale respond strongly
• For cheetahs, spots at the same scale
Objects with different textures can be segmented
The variation in textures is a cue to shape
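One simple way to turn "statistics of filter outputs" into numbers is to record the mean squared response of each filter in a small bank. The sketch below does this for two toy 3x3 bar filters on a striped patch; the filters, the choice of statistic and all names are assumptions made for illustration, and real systems use larger banks at several scales.

```python
def filter_valid(image, kernel):
    """Slide the kernel over the image, taking a dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(cols)]
            for r in range(rows)]


def mean_squared_response(image, kernel):
    """One statistic per filter: the average of the squared outputs."""
    out = filter_valid(image, kernel)
    values = [v * v for row in out for v in row]
    return sum(values) / len(values)


def texture_descriptor(image, filter_bank):
    """Describe a patch by one statistic per filter in the bank."""
    return [mean_squared_response(image, k) for k in filter_bank]


# Toy filter bank (hypothetical): a vertical-bar and a horizontal-bar detector.
vertical_bar = [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]
horizontal_bar = [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]]
bank = [vertical_bar, horizontal_bar]

# Two synthetic 6x6 textures: vertical stripes and horizontal stripes.
vertical_stripes = [[c % 2 for c in range(6)] for _ in range(6)]
horizontal_stripes = [[r % 2 for _ in range(6)] for r in range(6)]

print(texture_descriptor(vertical_stripes, bank))
print(texture_descriptor(horizontal_stripes, bank))
```

The two textures produce descriptors that are large in different entries, so comparing descriptors is enough to tell them apart; this is the idea behind segmenting regions by texture.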
Part III: Early Vision in Multiple Images
The geometry of multiple views
• Where could it appear in camera 2 (3, etc.) given it was here in 1 (1 and 2, etc.)?
Stereopsis
• What we know about the world from having 2 eyes
Structure from motion
• What we know about the world from having many eyes
    or, more commonly, our eyes moving.
Part IV: Mid-Level Vision
Finding coherent structure so as to break the image or movie into big units
Segmentation:
• Breaking images and videos into useful pieces
• E.g. finding video sequences that correspond to one shot
• E.g. finding image components that are coherent in internal appearance
Tracking:
• Keeping track of a moving object through a long sequence of views
Part V: High Level Vision (Geometry)
The relations between object geometry and image geometry
Model based vision
• find the position and orientation of known objects
Smooth surfaces and outlines
• how the outline of a curved object is formed, and what it looks like
Aspect graphs
• how the outline of a curved object moves around as you view it from different directions
Range data
Part VI: High Level Vision (Probabilistic)
Using classifiers and probability to recognize objects
• Templates and classifiers
    how to find objects that look the same from view to view with a classifier
Relations
• break up objects into big, simple parts, find the parts with a classifier, and then reason about the relationships between the parts to find the object.
Geometric templates from spatial relations
• extend this trick so that templates are formed from relations between much smaller parts
3D Reconstruction from multiple views
Multiple views arise from
• stereo
• motion
Strategy
• “triangulate” from distinct measurements of the same thing
Issues
• Correspondence: which points in the images are projections of the same 3D point?
• The representation: what do we report?
• Noise: how do we get stable, accurate reports?
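The "triangulate" strategy can be sketched with the midpoint method, one of several possible choices: each camera contributes a ray through its center, and the reported 3D point is the midpoint of the shortest segment joining the two rays. The sketch assumes the correspondence problem is already solved and that camera centers and ray directions are known; all names and numbers are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment between rays o1 + s*d1 and o2 + t*d2."""
    w = [p - q for p, q in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is undetermined")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + t * k for o, k in zip(o2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


# Hypothetical stereo pair: camera centers 2 units apart, both observing
# the point (0, 0, 5); the second ray is slightly perturbed to mimic noise.
o1, o2 = [-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]
clean = triangulate_midpoint(o1, [1.0, 0.0, 5.0], o2, [-1.0, 0.0, 5.0])
noisy = triangulate_midpoint(o1, [1.0, 0.0, 5.0], o2, [-1.0, 0.02, 5.0])
print(clean)
print(noisy)
```

With noise the rays no longer intersect, so the midpoint is only an estimate, and a narrow baseline makes the estimate more sensitive to small angular errors. This is the stability issue listed above.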
Part VII: Some Applications in Detail
Finding images in large collections
• searching for pictures
• browsing collections of pictures
Image based rendering
• often very difficult to produce models that look like real objects
    surface weathering, etc., create details that are hard to model
    Solution: make new pictures from old
Some applications of recognition
Digital libraries
• Find me the pic of JFK and Marilyn Monroe embracing
• NCMEC (The National Center for Missing and Exploited Children)
Surveillance
• Warn me if there is a mugging in the grove
HCI (Human Computer Interaction)
• Do what I show you
Military
• Shoot this, not that
What are the problems in recognition?
Which bits of image should be recognized together?
• Segmentation.
How can objects be recognized without focusing on detail?
• Abstraction.
How can objects with many free parameters be recognized?
• No popular name, but it’s a crucial problem anyhow.
How do we structure very large model bases?
• again, no popular name; abstraction and learning come into this
History
Segmentation
Which image components “belong together”?
Belong together = lie on the same object
Cues
• similar color
• similar texture
• not separated by contour
• form a suggestive shape when assembled
Matching templates
Some objects are 2D patterns
• e.g. faces
Build an explicit pattern matcher
• discount changes in illumination by using a parametric model
• changes in background are hard
• changes in pose are hard
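As one possible reading of "discount changes in illumination by using a parametric model", the sketch below assumes an affine brightness model (observed = gain * template + offset) and scores each window with normalized cross-correlation, which is unchanged by such a gain and offset. This is a swapped-in, commonly used technique rather than the specific matcher the slides refer to; names and data are hypothetical.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    dp = [v - mean_p for v in p]
    dt = [v - mean_t for v in t]
    denom = math.sqrt(sum(v * v for v in dp) * sum(v * v for v in dt))
    if denom == 0:
        return 0.0  # a constant patch carries no pattern information
    return sum(a * b for a, b in zip(dp, dt)) / denom


def best_match(image, template):
    """Return (score, (row, col)) of the best-scoring window."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos


template = [[1, 2], [3, 4]]
# Hypothetical image: the template appears at (1, 2) with gain 2 and offset 10.
image = [[5, 5, 5, 5, 5],
         [5, 5, 12, 14, 5],
         [5, 5, 16, 18, 5],
         [5, 5, 5, 5, 5]]
score, position = best_match(image, template)
print(score, position)
```

The match survives the brightness change, but nothing here handles a cluttered background inside the window or a change of pose, which is why the slide calls those cases hard.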
http://www.ri.cmu.edu/projects/project_271.html
Relations between templates
e.g. find faces by
• finding eyes, nose, mouth
• finding assembly of the three that has the “right” relations
http://www.ri.cmu.edu/projects/project_320.html
Representing the 3D world
Assemblies of primitives
• fit parametric forms
• Issues
    what primitives?
    uniqueness of representation
    few objects are actual primitives
Indexed collection of images
• use interpolation to predict appearance between images
• Issues
    occlusion is a mild nuisance
    structuring the collection can be tricky
People
Skin is characteristic; clothing hard to segment
• hence, people wearing little clothing
Finding body segments:
• finding skin-like (color, texture) regions that have nearly straight, nearly parallel boundaries
Grouping process constructed by hand, tuned by hand using small dataset.
When a sufficiently large group is found, assert a person is present
Horse grouper
Tracking
Use a model to predict next position and refine using next image
Model:
• simple dynamic models (second order dynamics)
• kinematic models
• etc.
Face tracking and eye tracking now work rather well
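The predict-then-refine loop can be illustrated with an alpha-beta filter, a deliberately simple stand-in for the dynamic models mentioned above: the state holds a position and a velocity, the model predicts the next position, and the measurement from the next image corrects both. The filter choice, the gains and the one-dimensional setting are all assumptions for this sketch; an image tracker would run it per coordinate or use a Kalman filter.

```python
def track(measurements, dt=1.0, alpha=0.5, beta=0.1):
    """Track a 1D position with a constant-velocity alpha-beta filter.

    alpha and beta are hypothetical gains controlling how strongly each
    measurement corrects the predicted position and the velocity.
    """
    position, velocity = measurements[0], 0.0
    estimates = [position]
    for z in measurements[1:]:
        # Predict the next position from the dynamic model.
        predicted = position + velocity * dt
        # Refine the prediction using the measurement from the next image.
        residual = z - predicted
        position = predicted + alpha * residual
        velocity = velocity + (beta / dt) * residual
        estimates.append(position)
    return estimates


# Hypothetical object moving 2 pixels per frame, observed without noise.
measurements = [2.0 * k for k in range(60)]
estimates = track(measurements)
print(estimates[:5])
print(estimates[-1], measurements[-1])
```

The early estimates lag behind because the initial velocity guess is zero; once the velocity estimate settles, prediction and measurement agree closely. With noisy measurements, smaller gains trade responsiveness for smoothness.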