Vision & Graphics Research in UCSD CSE
David Kriegman
Computer Science & Engineering, University of California, San Diego
The Pixel Lab
Serge Belongie Henrik Wann Jensen David Kriegman
Sameer Agarwal
Kristin Branson
Piotr Dollar
Craig Donner
Jeff Ho
Wojciech Jarosz
Arash Keshmirian
Tobias Klug
Anders Wang Kristensen
Kuang-chih Lee
Jongwoo Lim
Satya Mallick
Ben Ochoa
Vincent Rabaud
Andrew Rabinovich
John Rapp
Geoffrey Romer
Steve Rotenberg
Josh Wills
Computer Vision Research
Computer Vision Research Threads
Segmentation, Recognition, Reconstruction, Motion
A failed outline
I. Theoretical Contributions:
A. Object tracking
B. Measurement and segmentation of motion
C. Unsupervised learning (clustering) of objects from images
D. Object recognition and categorization
E. Helmholtz reciprocity stereopsis
F. Structure from motion
G. Illumination and reflectance modeling
H. Optical flow through refractive objects
II. Vision meets Graphics
A. Refractive optical flow for video compositing
B. Image-based rendering
C. Efficient rendering using environment maps
D. Modeling of rough dielectric surfaces
E. Texture synthesis on surfaces
III. Applications of Vision
A. Face recognition in images and video
B. Person tracking for video monitoring and ubiquitous vision systems
C. Visual monitoring of animal health and welfare (Smart Vivarium)
D. Tissue microarray analysis for cancer detection
E. Protein structure reconstruction from cryo-electron micrographs
N-dimensional Image Space
The Illumination Cone
Theorem: The set of images of any object in fixed pose, but under all lighting conditions, is a convex cone in the image space. (Belhumeur and Kriegman, IJCV '98)
Single-light-source images lie on the cone boundary; a 2-light-source image is a non-negative combination of single-source images and also lies in the cone (see the sketch below).
[Figure: the illumination cone inside the n-dimensional image space with axes x1, x2, ..., xn]
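The convexity claim follows from superposition of light. A minimal numpy sketch (with made-up pixel values, purely illustrative) showing that any non-negative combination of single-light-source images is again a valid image of the object, i.e., a point in the cone:

```python
import numpy as np

# Hypothetical stand-in data: three single-light-source images of the same
# object in fixed pose, flattened to vectors in n-dimensional image space.
single_source = np.array([
    [0.9, 0.1, 0.0, 0.3],   # image under light direction 1
    [0.2, 0.8, 0.1, 0.4],   # image under light direction 2
    [0.0, 0.3, 0.7, 0.5],   # image under light direction 3
]).T                        # columns are extreme rays of the cone

# Superposition: an image under a 2-light-source configuration is a
# non-negative combination of single-source images.
weights = np.array([0.6, 1.2, 0.0])        # lighting strengths (never negative)
two_source_image = single_source @ weights  # lies inside the illumination cone

assert np.all(weights >= 0)
print(two_source_image)
```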
3-D Face Modeling for Recognition
Training Images
[Georghiades, Belhumeur, Kriegman]
3-D Generative Model
Face Database
64 Lighting Conditions × 9 Poses ⇒ 576 Images per Person
Variable lighting: 0-12°, 12-25°, 25-50°, 50-77°
Face Recognition Error Rates
[Bar chart: error rate (%) vs. light direction (0-25°, 25-50°, 50-77°) for Eigenfaces, Eigenfaces (w/o first 3), Linear Subspace, Illumination Cones, and Our Method]
What Went Where: Motion Segmentation
[J. Wills, S. Agarwal, S. Belongie ]
1. Compute point correspondences by comparing filter response vectors
2. Partition these correspondences into groups using RANSAC and estimate the motion layer each group represents (sketched below)
3. Densely assign pixels to one of the detected motion layers using a Markov random field
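A rough numpy sketch of the RANSAC grouping in step 2, under the simplifying assumption that each layer moves by a pure 2-D translation (the actual system estimates richer layer motions and refines them); function and variable names are illustrative:

```python
import numpy as np

def ransac_motion_layers(p1, p2, n_iters=500, tol=2.0, min_inliers=20):
    """Greedily group correspondences p1[i] -> p2[i] into motion layers."""
    remaining = np.arange(len(p1))
    layers = []
    rng = np.random.default_rng(0)
    while len(remaining) >= min_inliers:
        best_inliers = np.array([], dtype=int)
        for _ in range(n_iters):
            j = rng.choice(remaining)                 # one match fixes a translation
            t = p2[j] - p1[j]
            err = np.linalg.norm(p1[remaining] + t - p2[remaining], axis=1)
            inliers = remaining[err < tol]
            if len(inliers) > len(best_inliers):
                best_inliers = inliers
        if len(best_inliers) < min_inliers:
            break
        # Re-estimate the layer's motion from all of its inliers, then peel it off.
        t = (p2[best_inliers] - p1[best_inliers]).mean(axis=0)
        layers.append((t, best_inliers))
        remaining = np.setdiff1d(remaining, best_inliers)
    return layers
```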
Motion Segmentation
[J. Wills, S. Agarwal, S. Belongie ]
Helmholtz Stereo: Arbitrary BRDF
Bi-directional Reflectance Distribution Function ρ(θ_in, φ_in; θ_out, φ_out)
[Figure: LEFT and RIGHT views of a surface point with normal n̂, incoming direction (θ_in, φ_in), and outgoing direction (θ_out, φ_out)]
Non-matte surfaces: a challenge for conventional stereo
[ Zickler, Kriegman, Belhumeur ]
Helmholtz Reciprocity & Stereo
[Figure: a reciprocal camera/light pair; swapping camera and light exchanges the incoming direction (θ_in, φ_in) and the outgoing direction (θ_out, φ_out) at a point with normal n̂]
ρ(θ_in, φ_in; θ_out, φ_out) = ρ(θ_out, φ_out; θ_in, φ_in)
[Helmholtz, 1910], [Minnaert, 1941], [ Nicodemus et al, 1977]
Using Multiple Helmholtz Stereo Pairs
A n̂ = 0, where row p of A is  i_lp w_lpᵀ − i_rp w_rpᵀ  (one row per reciprocal pair)
• Multiple views (at least three pairs) yield a matrix constraint equation.
• For a correct match, the matrix A must be rank 2.
• The surface normal lies in the null space of A (see the sketch below).
[ Zickler, Kriegman, Belhumeur ]
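A small numpy sketch of the null-space step (illustrative names; in the real system the w vectors come from the camera geometry and the hypothesized depth): stack one row per reciprocal pair, check that A is approximately rank 2 at a correct match, and read the normal off its null space.

```python
import numpy as np

def helmholtz_normal(i_left, i_right, w_left, w_right):
    """Recover a surface normal from >= 3 Helmholtz-reciprocal image pairs.

    i_left, i_right : (p,) intensities at the hypothesized match
    w_left, w_right : (p, 3) per-pair vectors determined by the camera geometry
    Row p of the constraint matrix is i_lp * w_lp - i_rp * w_rp.
    """
    A = i_left[:, None] * w_left - i_right[:, None] * w_right   # (p, 3)
    _, s, vt = np.linalg.svd(A)
    rank_ok = s[1] > 1e-6 * s[0] and s[2] < 1e-3 * s[0]   # approximately rank 2
    normal = vt[-1]                                        # null-space direction
    return normal / np.linalg.norm(normal), rank_ok
```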
Comparison to other methods
Method                  | Assumed Reflectance | Surface Information Recovered | Recovers in Constant-Intensity Regions | Active/Passive | Handles Depth Discontinuities | Handles Half-Occlusion | Robust to Cast Shadows
Photometric Stereopsis  | Lambertian or Known | Surface Normals               | Surface Normals                        | Active         | No                            | N/A                    | No
Multinocular Stereopsis | Lambertian          | Depth                         | Nothing                                | Passive        | Sometimes                     | Sometimes              | Yes
Helmholtz Stereopsis    | Arbitrary           | Depth + Surface Normals       | Surface Normals                        | Active         | Yes                           | Yes                    | Yes
[ Zickler, Kriegman, Belhumeur ]
II. Vision meets Graphics
A. Refractive optical flow for compositing
B. Modeling of rough dielectric surfaces
C. Efficient rendering using environment maps
D. Image-based rendering
E. Texture synthesis on surfaces
Refractive Optical Flow
We use the motion of the background to recover how light rays from the background are transformed as they travel through a refractive medium to form the foreground image.
[ Agarwal, Mallick, Belongie, Kriegman ]
Refractive Optical Flow Movies
With Water Without Water
[ Zickler, Kriegman, Belhumeur ]
Structured Importance Sampling
• A novel importance metric that combines area and illumination importance (see the sketch below)
• A new hierarchical stratification algorithm [Hochbaum & Shmoys]
  – Fast
  – Stable
  – Quality guarantees
[ Agarwal, Jensen, Belongie ]
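A hedged numpy sketch of the idea rather than the published algorithm: score each environment-map texel by luminance times a power of its solid angle, loosely echoing the combined area/illumination metric, and draw light samples from that distribution. The actual method replaces independent sampling with the hierarchical (Hochbaum-Shmoys) stratification above; the exponent and function names here are assumptions.

```python
import numpy as np

def sample_environment_map(radiance, n_samples=100, area_exponent=0.25, rng=None):
    """Pick light-sample texels from a latitude-longitude environment map."""
    rng = rng or np.random.default_rng(0)
    h, w = radiance.shape[:2]
    luminance = radiance.mean(axis=2) if radiance.ndim == 3 else radiance
    # Solid angle of each texel in a lat-long map shrinks toward the poles.
    theta = (np.arange(h) + 0.5) / h * np.pi
    solid_angle = np.sin(theta)[:, None] * (np.pi / h) * (2 * np.pi / w)
    # Illustrative combined importance: illumination * (area)^a.
    importance = luminance * solid_angle ** area_exponent
    p = (importance / importance.sum()).ravel()
    idx = rng.choice(p.size, size=n_samples, replace=False, p=p)
    return np.unravel_index(idx, (h, w))     # (rows, cols) of chosen texels
```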
Lambertian BRDF
[Renderings compared: area sampling, illumination sampling, LightGen, structured importance sampling]
[ Agarwal, Jensen, Belongie ]
Glossy BRDF
[Renderings at 1, 10, 100, and 300 samples: illumination importance sampling vs. structured importance sampling]
[ Agarwal, Jensen, Belongie ]
Texture Synthesis on Surfaces
[Figure: shape model + texture sample → synthesized result]
[ Magda, Kriegman ]
Making it fast
Run time: 1-2 seconds (bunny model: 8192 faces, 1.3 GHz Athlon)
• Blending to hide seams
• Fast methods for selecting texture triangles and textons
[ Magda, Kriegman ]
Image-based Modeling & Rendering
Goal: To render images of both natural and man-made objects under arbitrary pose and lighting.
Rendered Images
Textures that vary with viewpoint
[ Koudelka, Magda, Belhumeur, Kriegman ]
III. Applications of Vision
A. Person tracking for video monitoring and ubiquitous vision systems
B. Face recognition in images and video
C. Visual monitoring of animal health and welfare (Smart Vivarium)
D. Tissue microarray analysis for cancer detection.
E. Reconstruction of protein structure from cryo-electron micrographs
Visual Tracking with Learned Linear Subspaces
Main Challenges
1. 3-D Pose Variation
2. Occlusion of the target
3. Illumination variation
4. Camera jitter
5. Expression variation etc.
Approach
1. Initialize tracker with a single window
2. Target state: Affine warp to align with subspace
3. Model variation in target’s appearance as linear subspace
4. Learn linear subspace online while tracking
[ Ho, Lee, Kriegman ]
Representations of Objects
[Diagram: the n-dimensional image space (axes x1, x2, ..., xn) with a linear subspace; 3-D representations (shape, reflectance) vs. appearance-based representations]
Appearance-based representation: a linear subspace of image space built from recently observed images
Problem formulation: Given a collection of k data points {x1, x2, ..., xk}, find a linear subspace L that best approximates them.
Solution: Find a subspace L such that
max { dist(L, x1), dist(L, x2), ..., dist(L, xk) } < ε
for some ε > 0.
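A minimal numpy sketch of this formulation, assuming the data points are stored as rows of X; the SVD gives the least-squares-optimal subspace, a common practical stand-in for the minimax criterion above, and the returned worst-case distance can be checked against the chosen ε.

```python
import numpy as np

def fit_linear_subspace(X, dim):
    """Fit a dim-dimensional linear subspace to the rows x1..xk of X."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    basis = vt[:dim]                        # orthonormal basis of L
    resid = X - X @ basis.T @ basis         # components orthogonal to L
    dists = np.linalg.norm(resid, axis=1)   # dist(L, x_i) for each point
    return basis, dists.max()               # want the max distance < epsilon
```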
Learned Representation of Appearances
A state hypothesis is a location, size, and orientation of the tracking window.
Tracking Algorithm Outline
1. Generate multiple state hypotheses from the previous state.
2. Evaluate each hypothesis using the linear subspace model.
3. Choose the best state hypothesis.
4. Update the subspace model to adapt to changes (one iteration of this loop is sketched below).
Tracking loop
[ Ho, Lee, Kriegman ]
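A minimal sketch of one iteration of this loop, assuming a hypothetical helper frame_vec_fn(state) that warps the current frame by the affine state and returns the window as a flattened, normalized vector; the subspace update of step 4 is left outside the sketch.

```python
import numpy as np

def track_frame(frame_vec_fn, prev_state, subspace, n_hypotheses=200, rng=None):
    """One tracking iteration: propose states, score them on the subspace, pick the best."""
    rng = rng or np.random.default_rng()
    # 1. Generate hypotheses by perturbing the previous affine state.
    hypotheses = prev_state + rng.normal(scale=0.05,
                                         size=(n_hypotheses, prev_state.size))
    # 2. Evaluate each hypothesis by its reconstruction error on the subspace
    #    (subspace rows form an orthonormal basis).
    def error(state):
        x = frame_vec_fn(state)
        return np.linalg.norm(x - subspace.T @ (subspace @ x))
    errors = np.array([error(s) for s in hypotheses])
    # 3. Choose the best hypothesis; 4. the subspace update happens elsewhere.
    return hypotheses[errors.argmin()]
```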
Tracking Humans with Shape and Appearance
Goal: locate, identify, and monitor people
Challenges: lighting variation, full range of pose variation, articulation
Example output: Identity: Jon Doe; Dimensions: 5.1 x 3.2 x 9.7; Planar position: (5.3, -10.1); Activity: moving (-1.3, 2.0); ...
[ Lim, Kriegman ]
Tracking Humans with Shape & Appearance
Offline: train a shape model from training videos
Online: model the background; detect & track a person using the background, shape, and appearance models
Online: learn the person's appearance via K-means clustering (a small sketch follows below)
[Pipeline figure: raw image → foreground region → estimated state]
[ Lim, Kriegman ]
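A small sketch of the K-means appearance step under simplifying assumptions (RGB pixel samples taken from the foreground region; names are illustrative); the full system learns this online alongside the background and shape models.

```python
import numpy as np

def learn_appearance_model(foreground_pixels, k=5, n_iters=20, rng=None):
    """Cluster foreground pixel colors with K-means to summarize appearance."""
    rng = rng or np.random.default_rng(0)
    X = np.asarray(foreground_pixels, dtype=float)          # (N, 3) RGB samples
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(n_iters):
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)
        labels = d.argmin(axis=1)                            # assign to nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)     # recompute center
    return centers
```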
Tracking Humans with Shape & Appearance
[Initial and learned appearance at frames 88, 117, 225, and 312]
[ Lim, Kriegman ]
Face Tracking & Recognition Using Probabilistic Appearance Manifolds
Challenges: pose variation, misalignment, partial occlusion, illumination change, facial expression
[ Lee, Ho, Yang, Kriegman ]
Representing a Person
• Offline: learn subspaces using clustering
• Learn transition probabilities between subspaces from training videos
• Represent all appearances as a union of subspaces (a scoring sketch follows below)
[Diagram: a union of linear subspaces in the n-dimensional image space]
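A hedged sketch of how a new frame might be assigned to one of the pose subspaces while respecting the learned transition probabilities; the scoring here (a Gaussian in subspace distance times the transition prior) is illustrative, not the exact formulation in the paper.

```python
import numpy as np

def classify_frame(x, subspaces, prev_cluster, transition_prob, sigma=0.1):
    """Assign a flattened face window x to one pose subspace.

    subspaces: list of orthonormal bases, each of shape (d_i, n)
    transition_prob[i, j]: probability of moving from subspace i to subspace j
    """
    scores = []
    for j, B in enumerate(subspaces):
        resid = np.linalg.norm(x - B.T @ (B @ x))        # distance to subspace j
        likelihood = np.exp(-resid**2 / (2 * sigma**2))  # appearance term
        scores.append(likelihood * transition_prob[prev_cluster, j])
    return int(np.argmax(scores))
```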
Tracking Result
The red rectangle represents ground truth.
Comparison with a two-frame-based tracker:
Smart Vivarium Project Goal
Side-view video of multiple, identical mice → activity recognition statistics → animal health analysis
The First Challenge …
To track multiple, indistinguishable mice through severe occlusions:
Incorporate information from before and after the current frame using an acausal reasoning algorithm.
[Figure: frames before and after an occlusion event]
Tissue Microarrays
[Analysis pipeline: TMA → 1 core → 1 field of view → region-of-interest detection → spectral decomposition (e.g., DAB, Hematoxylin) → densitometry → diagnosis]
[A. Rabinovich, S. Belongie]
Inspiration
P. Viola and M. Jones. Robust real-time object detection. In ICCV Workshop on Statistical and Computational Theories of Vision, Vancouver, Canada, July 2001.
[ Mallick, Zhu, Kriegman ]
Algorithm: AdaBoost
• Training using the AdaBoost* learning algorithm (a minimal sketch follows below)
• Using a cascade to speed up the detection process
* Y. Freund and R. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, 1999.
[ Mallick, Zhu, Kriegman ]
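A minimal, generic AdaBoost-with-decision-stumps sketch; the feature matrix X and labels y stand in for rectangle-filter responses and face/non-face labels, and the detector additionally arranges the learned classifiers into a cascade that rejects easy negatives early (not shown here).

```python
import numpy as np

def adaboost_stumps(X, y, n_rounds=50):
    """Train a boosted classifier of threshold stumps; X is (n, d), y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                         # example weights
    stumps = []
    for _ in range(n_rounds):
        best = None
        for f in range(d):                          # exhaustively pick the best stump
            for thresh in np.unique(X[:, f]):
                for sign in (+1, -1):
                    pred = sign * np.where(X[:, f] > thresh, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, thresh, sign, pred)
        err, f, thresh, sign, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)       # weight of this weak learner
        w *= np.exp(-alpha * y * pred)              # up-weight misclassified examples
        w /= w.sum()
        stumps.append((alpha, f, thresh, sign))
    return stumps

def predict(stumps, X):
    score = sum(a * s * np.where(X[:, f] > t, 1, -1) for a, f, t, s in stumps)
    return np.sign(score)
```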
NSF IGERT: Vision and Learning in Humans and Machines
Garrison W. Cottrell (CSE, PI)
Geoff Boynton (Salk, co-PI)
Virginia de Sa (Cog Sci, co-PI)
Karen Dobkins (Psych, co-PI)
David Kriegman (CSE, co-PI)
Senior Personnel: Tom Albright (Salk), Serge Belongie (CSE), Leslie Carver (Psych, HDP), Sanjoy Dasgupta (CSE), Gedeon Deak (Cog Sci), Charles Elkan (CSE), Ione Fine (Psych), Don MacLeod (Psych), Javier Movellan (INC), Vilayanur Ramachandran (Psych), Marty Sereno (Cog Sci), Joan Stiles (Cog Sci, HDP), Emo Todorov (Cog Sci), Jochen Triesch (Cog Sci), Terry Sejnowski (Salk)
$3.4M over five years to support 17 grad students per year
FWGrid: NSF Research Infrastructure Grant
[Infrastructure diagram: high-bandwidth connection to campus; GigE switches; compute + data cluster "bricks"; high-bandwidth wireless cells on floors 1-4; head-mounted displays with video cameras; wireless laptops; handhelds with camera & wireless; IBM T221 display (3840 x 2400 pixels); plasma displays; special-purpose devices: range scanner, eye tracker, stereo camera, video camera, omnicam, network camera, IEEE 1394, GPU]
Domain Independent Vision-Based Navigation
• Color camera
• SICK laser scanner
• Fiber optic gyroscope
• Pan-tilt head
• GPS
• Digital compass
• Ultrasonic
• Contact bumpers
Goal: Robust vision-based exploration, mapping, and navigation for mobile robots over large environments.