Vision & Graphics Research in UCSD CSE
David Kriegman
Computer Science & Engineering, University of California, San Diego
The Pixel Lab
Serge Belongie Henrik Wann Jensen David Kriegman
Sameer Agarwal
Kristin Branson
Piotr Dollar
Craig Donner
Jeff Ho
Wojciech Jarosz
Arash Keshmirian
Tobias Klug
Anders Wang Kristensen
Kuang-chih Lee
Jongwoo Lim
Satya Mallick
Ben Ochoa
Vincent Rabaud
Andrew Rabinovich
John Rapp
Geoffrey Romer
Steve Rotenberg
Josh Wills
Computer Vision Research
Computer Vision Research Threads
Segmentation, Recognition, Reconstruction, Motion
A failed outline
I. Theoretical Contributions:
A. Object tracking
B. Measurement and segmentation of motion
C. Unsupervised learning (clustering) of objects from images
D. Object recognition and categorization
E. Helmholtz reciprocity stereopsis
F. Structure from motion
G. Illumination and reflectance modeling
H. Optical flow through refractive objects
II. Vision meets Graphics
A. Refractive optical flow for video compositing
B. Image-based rendering
C. Efficient rendering using environment maps
D. Modeling of rough dielectric surfaces
E. Texture synthesis on surfaces
III. Applications of Vision
A. Face recognition in images and video
B. Person tracking for video monitoring and ubiquitous vision systems
C. Visual monitoring of animal health and welfare (Smart Vivarium)
D. Tissue microarray analysis for cancer detection
E. Protein structure reconstruction from cryo-electron micrographs
N-dimensional Image Space
The Illumination Cone
Theorem: The set of images of any object in fixed pose, but under all lighting conditions, is a convex cone in the image space. (Belhumeur and Kriegman, IJCV '98)
Single-light-source images lie on the cone boundary; a 2-light-source image is a non-negative combination of single-source images and also lies in the cone (see the sketch below).
[Figure: the illumination cone inside the n-dimensional image space with axes x1, x2, ..., xn]
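The convexity claim follows from superposition of light. A minimal numpy sketch (with made-up pixel values, purely illustrative) showing that any non-negative combination of single-light-source images is again a valid image of the object, i.e., a point in the cone:

```python
import numpy as np

# Hypothetical stand-in data: three single-light-source images of the same
# object in fixed pose, flattened to vectors in n-dimensional image space.
single_source = np.array([
    [0.9, 0.1, 0.0, 0.3],   # image under light direction 1
    [0.2, 0.8, 0.1, 0.4],   # image under light direction 2
    [0.0, 0.3, 0.7, 0.5],   # image under light direction 3
]).T                        # columns are extreme rays of the cone

# Superposition: an image under a 2-light-source configuration is a
# non-negative combination of single-source images.
weights = np.array([0.6, 1.2, 0.0])        # lighting strengths (never negative)
two_source_image = single_source @ weights  # lies inside the illumination cone

assert np.all(weights >= 0)
print(two_source_image)
```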
3-D Face Modeling for Recognition
Training Images
[Georghiades, Belhumeur, Kriegman]
3-D Generative Model
Face Database
64 Lighting Conditions × 9 Poses ⇒ 576 Images per Person
Variable lighting: 0-12°, 12-25°, 25-50°, 50-77°
Face Recognition Error Rates
[Bar chart: error rate (%) vs. light direction (0-25°, 25-50°, 50-77°) for Eigenfaces, Eigenfaces (w/o first 3), Linear Subspace, Illumination Cones, and Our Method]
What Went Where: Motion Segmentation
[J. Wills, S. Agarwal, S. Belongie ]
1. Compute point correspondences by comparing filter response vectors
2. Partition these correspondences into groups using RANSAC and estimate the motion layer each group represents (sketched below)
3. Densely assign pixels to one of the detected motion layers using a Markov random field
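A rough numpy sketch of the RANSAC grouping in step 2, under the simplifying assumption that each layer moves by a pure 2-D translation (the actual system estimates richer layer motions and refines them); function and variable names are illustrative:

```python
import numpy as np

def ransac_motion_layers(p1, p2, n_iters=500, tol=2.0, min_inliers=20):
    """Greedily group correspondences p1[i] -> p2[i] into motion layers."""
    remaining = np.arange(len(p1))
    layers = []
    rng = np.random.default_rng(0)
    while len(remaining) >= min_inliers:
        best_inliers = np.array([], dtype=int)
        for _ in range(n_iters):
            j = rng.choice(remaining)                 # one match fixes a translation
            t = p2[j] - p1[j]
            err = np.linalg.norm(p1[remaining] + t - p2[remaining], axis=1)
            inliers = remaining[err < tol]
            if len(inliers) > len(best_inliers):
                best_inliers = inliers
        if len(best_inliers) < min_inliers:
            break
        # Re-estimate the layer's motion from all of its inliers, then peel it off.
        t = (p2[best_inliers] - p1[best_inliers]).mean(axis=0)
        layers.append((t, best_inliers))
        remaining = np.setdiff1d(remaining, best_inliers)
    return layers
```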
Motion Segmentation
[J. Wills, S. Agarwal, S. Belongie ]
Helmholtz Stereo: Arbitrary BRDF
Bi-directional Reflectance Distribution Function ρ(θ_in, φ_in; θ_out, φ_out)
[Figure: LEFT and RIGHT views of a surface point with normal n̂, incoming direction (θ_in, φ_in), and outgoing direction (θ_out, φ_out)]
Non-matte surfaces: a challenge for conventional stereo
[ Zickler, Kriegman, Belhumeur ]
Helmholtz Reciprocity & Stereo
[Figure: a reciprocal camera/light pair; swapping camera and light exchanges the incoming direction (θ_in, φ_in) and the outgoing direction (θ_out, φ_out) at a point with normal n̂]
ρ(θ_in, φ_in; θ_out, φ_out) = ρ(θ_out, φ_out; θ_in, φ_in)
[Helmholtz, 1910], [Minnaert, 1941], [ Nicodemus et al, 1977]
Using Multiple Helmholtz Stereo Pairs
A n̂ = 0, where row p of A is  i_lp w_lpᵀ − i_rp w_rpᵀ  (one row per reciprocal pair)
• Multiple views (at least three pairs) yield a matrix constraint equation.
• For a correct match, the matrix A must be rank 2.
• The surface normal lies in the null space of A (see the sketch below).
[ Zickler, Kriegman, Belhumeur ]
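A small numpy sketch of the null-space step (illustrative names; in the real system the w vectors come from the camera geometry and the hypothesized depth): stack one row per reciprocal pair, check that A is approximately rank 2 at a correct match, and read the normal off its null space.

```python
import numpy as np

def helmholtz_normal(i_left, i_right, w_left, w_right):
    """Recover a surface normal from >= 3 Helmholtz-reciprocal image pairs.

    i_left, i_right : (p,) intensities at the hypothesized match
    w_left, w_right : (p, 3) per-pair vectors determined by the camera geometry
    Row p of the constraint matrix is i_lp * w_lp - i_rp * w_rp.
    """
    A = i_left[:, None] * w_left - i_right[:, None] * w_right   # (p, 3)
    _, s, vt = np.linalg.svd(A)
    rank_ok = s[1] > 1e-6 * s[0] and s[2] < 1e-3 * s[0]   # approximately rank 2
    normal = vt[-1]                                        # null-space direction
    return normal / np.linalg.norm(normal), rank_ok
```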
Comparison to other methods
Method                  | Assumed Reflectance | Surface Information Recovered | Recovers in Constant-Intensity Regions | Active/Passive | Handles Depth Discontinuities | Handles Half-Occlusion | Robust to Cast Shadows
Photometric Stereopsis  | Lambertian or Known | Surface Normals               | Surface Normals                        | Active         | No                            | N/A                    | No
Multinocular Stereopsis | Lambertian          | Depth                         | Nothing                                | Passive        | Sometimes                     | Sometimes              | Yes
Helmholtz Stereopsis    | Arbitrary           | Depth + Surface Normals       | Surface Normals                        | Active         | Yes                           | Yes                    | Yes
[ Zickler, Kriegman, Belhumeur ]
II. Vision meets Graphics
A. Refractive optical flow for compositing
B. Modeling of rough dielectric surfaces
C. Efficient rendering using environment maps
D. Image-based rendering
E. Texture synthesis on surfaces
Refractive Optical Flow
We use the motion of the background to recover how light rays from the background are transformed as they travel through a refractive medium to form the foreground image.
[ Agarwal, Mallick, Belongie, Kriegman ]
Refractive Optical Flow Movies
With Water Without Water
[ Zickler, Kriegman, Belhumeur ]
Structured Importance Sampling
• A novel importance metric that combines area and illumination importance (see the sketch below)
• A new hierarchical stratification algorithm [Hochbaum & Shmoys]
  – Fast
  – Stable
  – Quality guarantees
[ Agarwal, Jensen, Belongie ]
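A hedged numpy sketch of the idea rather than the published algorithm: score each environment-map texel by luminance times a power of its solid angle, loosely echoing the combined area/illumination metric, and draw light samples from that distribution. The actual method replaces independent sampling with the hierarchical (Hochbaum-Shmoys) stratification above; the exponent and function names here are assumptions.

```python
import numpy as np

def sample_environment_map(radiance, n_samples=100, area_exponent=0.25, rng=None):
    """Pick light-sample texels from a latitude-longitude environment map."""
    rng = rng or np.random.default_rng(0)
    h, w = radiance.shape[:2]
    luminance = radiance.mean(axis=2) if radiance.ndim == 3 else radiance
    # Solid angle of each texel in a lat-long map shrinks toward the poles.
    theta = (np.arange(h) + 0.5) / h * np.pi
    solid_angle = np.sin(theta)[:, None] * (np.pi / h) * (2 * np.pi / w)
    # Illustrative combined importance: illumination * (area)^a.
    importance = luminance * solid_angle ** area_exponent
    p = (importance / importance.sum()).ravel()
    idx = rng.choice(p.size, size=n_samples, replace=False, p=p)
    return np.unravel_index(idx, (h, w))     # (rows, cols) of chosen texels
```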
Lambertian BRDF
[Renderings compared: area sampling, illumination sampling, LightGen, structured importance sampling]
[ Agarwal, Jensen, Belongie ]
Glossy BRDF
[Renderings at 1, 10, 100, and 300 samples: illumination importance sampling vs. structured importance sampling]
[ Agarwal, Jensen, Belongie ]
Texture Synthesis on Surfaces
[Figure: shape model + texture sample → synthesized result]
[ Magda, Kriegman ]
Making it fast
Run time: 1-2 seconds (bunny model: 8192 faces, 1.3 GHz Athlon)
• Blending to hide seams
• Fast methods for selecting texture triangles and textons
[ Magda, Kriegman ]
Image-based Modeling & Rendering
Goal: To render images of both natural and man-made objects under arbitrary pose and lighting.
Rendered Images
Textures that vary with viewpoint
[ Koudelka, Magda, Belhumeur, Kriegman ]
III. Applications of Vision
A. Person tracking for video monitoring and ubiquitous vision systems
B. Face recognition in images and video
C. Visual monitoring of animal health and welfare (Smart Vivarium)
D. Tissue microarray analysis for cancer detection.
E. Reconstruction of protein structure from cryo-electron micrographs
Visual Tracking with Learned Linear Subspaces
Main Challenges
1. 3-D Pose Variation
2. Occlusion of the target
3. Illumination variation
4. Camera jitter
5. Expression variation etc.
Approach
1. Initialize tracker with a single window
2. Target state: Affine warp to align with subspace
3. Model variation in target’s appearance as linear subspace
4. Learn linear subspace online while tracking
[ Ho, Lee, Kriegman ]
Representations of Objects
[Diagram: the n-dimensional image space (axes x1, x2, ..., xn) with a linear subspace; 3-D representations (shape, reflectance) vs. appearance-based representations]
Appearance-based representation: a linear subspace of image space built from recently observed images
Problem formulation: Given a collection of k data points {x1, x2, ..., xk}, find a linear subspace L that best approximates them.
Solution: Find a subspace L such that
max { dist(L, x1), dist(L, x2), ..., dist(L, xk) } < ε
for some ε > 0.
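A minimal numpy sketch of this formulation, assuming the data points are stored as rows of X; the SVD gives the least-squares-optimal subspace, a common practical stand-in for the minimax criterion above, and the returned worst-case distance can be checked against the chosen ε.

```python
import numpy as np

def fit_linear_subspace(X, dim):
    """Fit a dim-dimensional linear subspace to the rows x1..xk of X."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    basis = vt[:dim]                        # orthonormal basis of L
    resid = X - X @ basis.T @ basis         # components orthogonal to L
    dists = np.linalg.norm(resid, axis=1)   # dist(L, x_i) for each point
    return basis, dists.max()               # want the max distance < epsilon
```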
Learned Representation of Appearances
A state hypothesis is a location, size, and orientation of the tracking window.
Tracking Algorithm Outline
1. Generate multiple state hypotheses from the previous state.
2. Evaluate each hypothesis using the linear subspace model.
3. Choose the best state hypothesis.
4. Update the subspace model to adapt to changes (one iteration of this loop is sketched below).
Tracking loop
[ Ho, Lee, Kriegman ]
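A minimal sketch of one iteration of this loop, assuming a hypothetical helper frame_vec_fn(state) that warps the current frame by the affine state and returns the window as a flattened, normalized vector; the subspace update of step 4 is left outside the sketch.

```python
import numpy as np

def track_frame(frame_vec_fn, prev_state, subspace, n_hypotheses=200, rng=None):
    """One tracking iteration: propose states, score them on the subspace, pick the best."""
    rng = rng or np.random.default_rng()
    # 1. Generate hypotheses by perturbing the previous affine state.
    hypotheses = prev_state + rng.normal(scale=0.05,
                                         size=(n_hypotheses, prev_state.size))
    # 2. Evaluate each hypothesis by its reconstruction error on the subspace
    #    (subspace rows form an orthonormal basis).
    def error(state):
        x = frame_vec_fn(state)
        return np.linalg.norm(x - subspace.T @ (subspace @ x))
    errors = np.array([error(s) for s in hypotheses])
    # 3. Choose the best hypothesis; 4. the subspace update happens elsewhere.
    return hypotheses[errors.argmin()]
```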
Tracking Humans with Shape and Appearance
Goal: locate, identify, and monitor people
Challenges: lighting variation, full range of pose variation, articulation
Example output: Identity: Jon Doe; Dimensions: 5.1 x 3.2 x 9.7; Planar position: (5.3, -10.1); Activity: moving (-1.3, 2.0); ...
[ Lim, Kriegman ]
Tracking Humans with Shape & Appearance
Offline: train a shape model from training videos
Online: model the background; detect & track a person using the background, shape, and appearance models
Online: learn the person's appearance via K-means clustering (a small sketch follows below)
[Pipeline figure: raw image → foreground region → estimated state]
[ Lim, Kriegman ]
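A small sketch of the K-means appearance step under simplifying assumptions (RGB pixel samples taken from the foreground region; names are illustrative); the full system learns this online alongside the background and shape models.

```python
import numpy as np

def learn_appearance_model(foreground_pixels, k=5, n_iters=20, rng=None):
    """Cluster foreground pixel colors with K-means to summarize appearance."""
    rng = rng or np.random.default_rng(0)
    X = np.asarray(foreground_pixels, dtype=float)          # (N, 3) RGB samples
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(n_iters):
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)
        labels = d.argmin(axis=1)                            # assign to nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)     # recompute center
    return centers
```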
Tracking Humans with Shape & Appearance
[Initial and learned appearance at frames 88, 117, 225, and 312]
[ Lim, Kriegman ]
Face Tracking & Recognition Using Probabilistic Appearance Manifolds
Challenges: pose variation, misalignment, partial occlusion, illumination change, facial expression
[ Lee, Ho, Yang, Kriegman ]
Representing a Person
• Offline: learn subspaces using clustering
• Learn transition probabilities between subspaces from training videos
• Represent all appearances as a union of subspaces (a scoring sketch follows below)
[Diagram: a union of linear subspaces in the n-dimensional image space]
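A hedged sketch of how a new frame might be assigned to one of the pose subspaces while respecting the learned transition probabilities; the scoring here (a Gaussian in subspace distance times the transition prior) is illustrative, not the exact formulation in the paper.

```python
import numpy as np

def classify_frame(x, subspaces, prev_cluster, transition_prob, sigma=0.1):
    """Assign a flattened face window x to one pose subspace.

    subspaces: list of orthonormal bases, each of shape (d_i, n)
    transition_prob[i, j]: probability of moving from subspace i to subspace j
    """
    scores = []
    for j, B in enumerate(subspaces):
        resid = np.linalg.norm(x - B.T @ (B @ x))        # distance to subspace j
        likelihood = np.exp(-resid**2 / (2 * sigma**2))  # appearance term
        scores.append(likelihood * transition_prob[prev_cluster, j])
    return int(np.argmax(scores))
```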
Tracking Result
The red rectangle represents ground truth.
Comparison with a two-frame-based tracker:
Smart Vivarium Project Goal
Side-view video of multiple, identical mice → activity recognition statistics → animal health analysis
The First Challenge …
To track multiple, indistinguishable mice through severe occlusions:
Incorporate information from before and after the current frame using an acausal reasoning algorithm.
[Figure: frames before and after an occlusion event]
Tissue Microarrays
[Analysis pipeline: TMA → 1 core → 1 field of view → region-of-interest detection → spectral decomposition (e.g., DAB, Hematoxylin) → densitometry → diagnosis]
[A. Rabinovich, S. Belongie]
Inspiration
P. Viola and M. Jones. Robust real-time object detection. In ICCV Workshop on Statistical and Computational Theories of Vision, Vancouver, Canada, July 2001.
[ Mallick, Zhu, Kriegman ]
Algorithm: AdaBoost
• Training using the AdaBoost* learning algorithm (a minimal sketch follows below)
• Using a cascade to speed up the detection process
* Y. Freund and R. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, 1999.
[ Mallick, Zhu, Kriegman ]
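A minimal, generic AdaBoost-with-decision-stumps sketch; the feature matrix X and labels y stand in for rectangle-filter responses and face/non-face labels, and the detector additionally arranges the learned classifiers into a cascade that rejects easy negatives early (not shown here).

```python
import numpy as np

def adaboost_stumps(X, y, n_rounds=50):
    """Train a boosted classifier of threshold stumps; X is (n, d), y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                         # example weights
    stumps = []
    for _ in range(n_rounds):
        best = None
        for f in range(d):                          # exhaustively pick the best stump
            for thresh in np.unique(X[:, f]):
                for sign in (+1, -1):
                    pred = sign * np.where(X[:, f] > thresh, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, thresh, sign, pred)
        err, f, thresh, sign, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)       # weight of this weak learner
        w *= np.exp(-alpha * y * pred)              # up-weight misclassified examples
        w /= w.sum()
        stumps.append((alpha, f, thresh, sign))
    return stumps

def predict(stumps, X):
    score = sum(a * s * np.where(X[:, f] > t, 1, -1) for a, f, t, s in stumps)
    return np.sign(score)
```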
NSF IGERT: Vision and Learning in Humans and Machines
Garrison W. Cottrell (CSE, PI)
Geoff Boynton (Salk, co-PI)
Virginia de Sa (Cog Sci, co-PI)
Karen Dobkins (Psych, co-PI)
David Kriegman (CSE, co-PI)
Senior Personnel: Tom Albright (Salk), Serge Belongie (CSE), Leslie Carver (Psych, HDP), Sanjoy Dasgupta (CSE), Gedeon Deak (Cog Sci), Charles Elkan (CSE), Ione Fine (Psych), Don MacLeod (Psych), Javier Movellan (INC), Vilayanur Ramachandran (Psych), Marty Sereno (Cog Sci), Joan Stiles (Cog Sci, HDP), Emo Todorov (Cog Sci), Jochen Triesch (Cog Sci), Terry Sejnowski (Salk)
$3.4M over five years to support 17 grad students per year
FWGrid: NSF Research Infrastructure Grant
[Infrastructure diagram: high-bandwidth connection to campus; GigE switches; compute + data cluster "bricks"; high-bandwidth wireless cells on floors 1-4; head-mounted displays with video cameras; wireless laptops; handhelds with camera & wireless; IBM T221 display (3840 x 2400 pixels); plasma displays; special-purpose devices: range scanner, eye tracker, stereo camera, video camera, omnicam, network camera, IEEE 1394, GPU]
Domain Independent Vision-Based Navigation
• Color camera
• SICK laser scanner
• Fiber optic gyroscope
• Pan-tilt head
• GPS
• Digital compass
• Ultrasonic
• Contact bumpers
Goal: Robust vision-based exploration, mapping, and navigation for mobile robots over large environments.