Context-based vision system for place and object recognition

Antonio Torralba, Kevin Murphy, Bill Freeman, Mark Rubin

Presented by David Lee

Some slides borrowed from Kevin Murphy

Object out of context

Object in context

Wearable test-bed

System diagram

Computing the features

24 filtered images

Downsample each to 4x4

4x4x24 = 384 dim, reduced to 80 dim
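
The feature pipeline on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the 24 filter responses are already available as a NumPy array, averages response magnitudes over a 4x4 grid of blocks, and uses PCA for the 384-to-80 reduction (the slide does not name the reduction method, so PCA is an assumption here). The function names are hypothetical.

```python
import numpy as np

def downsample_to_grid(filtered, grid=4):
    """Average the magnitude of each filtered image over a grid x grid layout.

    filtered: array of shape (n_filters, height, width).
    Returns a vector of length n_filters * grid * grid (24 * 4 * 4 = 384).
    """
    n_filters, height, width = filtered.shape
    # Crop so that height and width divide evenly into the grid.
    h, w = height - height % grid, width - width % grid
    mag = np.abs(filtered[:, :h, :w])
    blocks = mag.reshape(n_filters, grid, h // grid, grid, w // grid)
    return blocks.mean(axis=(2, 4)).reshape(-1)

def fit_pca(features, n_components=80):
    """Fit PCA (assumed reduction step) on rows of `features` via SVD."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(feature, mean, components):
    """Project one 384-dim feature onto the retained components."""
    return components @ (feature - mean)
```

Fitting needs at least 80 training frames, otherwise fewer than 80 components are available.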

Visualizing the filter bank output

Images

80-dimensional representation

Place recognition system

Hidden Markov Model

Hidden states = location (63 values)

Observations = v_t^G ∈ R^80

Transition model encodes topology of environment

Observation model is a mixture of Gaussians (100 views per place)
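
One way to picture the observation model: each place stores a set of example views (about 100 per place on this slide), and the likelihood of a new gist vector is a mixture of Gaussians centered on those views. The sketch below assumes spherical Gaussians with a shared width `sigma` and uniform mixture weights; those choices, and the function name, are assumptions for illustration.

```python
import numpy as np

def log_obs_likelihood(v, prototypes, sigma=1.0):
    """log p(v | place) under a uniform mixture of spherical Gaussians.

    v: gist vector of shape (D,).
    prototypes: stored views for one place, shape (K, D).
    sigma: shared standard deviation (assumed, not taken from the slides).
    """
    k, d = prototypes.shape
    sq_dist = np.sum((prototypes - v) ** 2, axis=1)
    log_comp = -0.5 * sq_dist / sigma**2 - 0.5 * d * np.log(2 * np.pi * sigma**2)
    # Log-sum-exp over components for numerical stability.
    m = log_comp.max()
    return m + np.log(np.exp(log_comp - m).sum()) - np.log(k)
```

Working in log space matters here because 80-dimensional Gaussian densities underflow easily.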

Hidden Markov Model

Observation likelihood: mixture of Gaussians

Prediction prior

Transition matrix: MLE (counting)
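
The pieces named on this slide fit together in the standard HMM forward recursion: the previous posterior is pushed through the transition matrix to get the prediction prior, which is then reweighted by the observation likelihood. A minimal sketch follows, with the transition matrix estimated by counting transitions in a labeled sequence. The `pseudo_count` smoothing and the function names are assumptions added for illustration.

```python
import numpy as np

def estimate_transitions(state_sequence, n_states, pseudo_count=1.0):
    """MLE of the transition matrix by counting consecutive label pairs.

    pseudo_count is an assumed smoothing term that keeps unseen
    transitions from getting exactly zero probability.
    """
    counts = np.full((n_states, n_states), float(pseudo_count))
    for prev, cur in zip(state_sequence[:-1], state_sequence[1:]):
        counts[prev, cur] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def forward_step(prev_posterior, transition, log_lik):
    """One forward-filtering update.

    prev_posterior: P(Q_{t-1} | v_{1:t-1}), shape (n_states,).
    transition[i, j]: P(Q_t = j | Q_{t-1} = i).
    log_lik: log p(v_t | Q_t = q) for every state q.
    """
    prior = transition.T @ prev_posterior            # prediction prior
    unnorm = prior * np.exp(log_lik - log_lik.max())  # times observation likelihood
    return unnorm / unnorm.sum()
```

Running `forward_step` once per frame gives the filtered place estimate used on the following slides.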

Scene Categorization

17 Categories (Office, Corridor, Street, etc)

Train a separate HMM on category labels

Place recognition demo

Specific location

Location category

Indoor/outdoor

Ground truth vs. system estimate

Performance on known env.

Performance on new env.

Comparison of features

Recognition and categorization

Effect of HMM on recognition

With HMM vs. without HMM (but with temporal smoothing)

From place to object recognition

Object priming: predict object properties based on context (top-down signals):

Visual gist, v_t^G

Specific location, Q_t

Kind of location, C_t

Object Priming

Again…MLE

Probability of object i in image v_t given the entire video sequence

Probability of object i given the current observation and place

Estimate of current place (output of HMM)

Observation likelihood (mixture of Gaussians)

Prior probability of object i being in place q
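
Read together, these labels describe a two-step computation: apply Bayes' rule within each place to get the probability that the object is present given the current gist, then average over places using the HMM's place estimate. In words, P(object i | video) = sum over q of P(object i | gist, place q) x P(place q | video). The sketch below is a reconstruction from the slide labels, not the authors' code; the argument names are hypothetical, and the likelihoods are assumed to come from per-place mixtures of Gaussians fitted separately for frames with and without the object.

```python
import numpy as np

def object_presence_posterior(place_posterior, lik_present, lik_absent, prior_present):
    """P(object i present | v_{1:t}), marginalized over places.

    place_posterior: P(Q_t = q | v_{1:t}) from the HMM, shape (n_places,).
    lik_present: p(v_t | object present, place q), shape (n_places,).
    lik_absent: p(v_t | object absent, place q), shape (n_places,).
    prior_present: P(object present | place q), estimated by counting.
    """
    # Bayes' rule inside each place.
    num = lik_present * prior_present
    den = num + lik_absent * (1.0 - prior_present)
    per_place = num / den
    # Weight each place by the HMM's current belief.
    return float(per_place @ place_posterior)
```

If both likelihoods are zero for some place the division is undefined, so a real implementation would need to guard that case.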

Predicting object presence

ROC curves for object detection
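
For reference, an ROC curve of this kind is obtained by sweeping a threshold over the predicted presence probabilities and recording the true-positive and false-positive rates at each step. A minimal sketch, assuming binary ground-truth labels per frame and ignoring tied scores (the function name is hypothetical):

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays for thresholds swept from high to low.

    scores: predicted probability of object presence per frame.
    labels: 1 if the object is present in that frame, else 0.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(-scores)          # highest score first
    sorted_labels = labels[order]
    tp = np.cumsum(sorted_labels == 1)
    fp = np.cumsum(sorted_labels == 0)
    tpr = tp / max(tp[-1], 1)
    fpr = fp / max(fp[-1], 1)
    # Prepend the (0, 0) point for the strictest threshold.
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])
```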

Predicting object position and scale

Predicting object position and scale

Estimate of mask

Probability of object i being present and the location being q (output of previous system)

Estimate of mask given current gist, place, and object

delta Gaussian
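
One plausible reading of the "delta" and "Gaussian" labels is a mixture model in which each training example contributes a delta function on its mask and a Gaussian on its gist vector. Under that reading, the expected mask given a new gist is a kernel-weighted average of the training masks, and the final estimate weights the per-place masks by the joint probability of object and place. The sketch below implements that interpretation; the shared `sigma`, the function names, and the interpretation itself are assumptions.

```python
import numpy as np

def expected_mask(v, proto_gists, proto_masks, sigma=1.0):
    """Kernel-weighted average of training masks for one (place, object) pair.

    v: current gist, shape (D,).
    proto_gists: training gists, shape (K, D).
    proto_masks: matching training masks, shape (K, H, W).
    """
    sq_dist = np.sum((proto_gists - v) ** 2, axis=1)
    log_w = -0.5 * sq_dist / sigma**2
    w = np.exp(log_w - log_w.max())   # Gaussian weights, stabilized
    w /= w.sum()
    return np.tensordot(w, proto_masks, axes=1)

def combine_masks(joint_prob, masks_per_place):
    """Weight per-place masks by P(object present, place = q | video).

    joint_prob: shape (n_places,); masks_per_place: shape (n_places, H, W).
    """
    return np.tensordot(joint_prob, masks_per_place, axes=1)
```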

Predicted segmentation

Conclusion

Real world problem (and it works!)

Uses only global feature (context)

How much did {HMM / place prior} affect {place recognition / object detection}?

Can we really say "context" did the job?
