Problems in large-scale computer vision David Crandall School of Informatics and Computing Indiana University

Research questions

• Given huge collections of images online, how can we analyze images and non-visual metadata to:

– Help users organize, browse, and search?

– Mine information about the state of the world and human behavior?

Common computational problems

1. Image-by-image (e.g. recognition)

– Large-scale, but easily parallelizable

2. Iterative algorithms (e.g. learning)

– Sometimes few but long-running iterations

– Sometimes many lightweight iterations

3. Inference on graphs (e.g. reconstruction, learning)

– Small graphs with huge label spaces

– Large graphs with small label spaces

– Large graphs with large label spaces

Scene classification

• E.g.: Find images containing snow, in a collection of ~100 million images

– Typical approach: extract features and run a classifier (typically SVM) on each image

– We use Hadoop, typically with a trivial reducer, images stored in giant HDFS sequence files, and C++ MapReduce bindings
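The map-only pattern described above can be sketched in a few lines. This is a minimal illustration in Python rather than the C++ bindings mentioned on the slide; the feature extractor, the weights `WEIGHTS` and `BIAS`, and the record layout are all hypothetical stand-ins, not the actual pipeline.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical linear SVM parameters; a real system would load trained values.
WEIGHTS = [0.8, -0.5, 0.3]
BIAS = -0.2


def extract_features(pixels: List[float]) -> List[float]:
    """Toy stand-in for a real feature extractor: mean, min, max intensity."""
    return [sum(pixels) / len(pixels), min(pixels), max(pixels)]


def svm_score(features: List[float]) -> float:
    """Linear SVM decision value w.x + b."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def mapper(records: Iterable[Tuple[str, List[float]]]) -> Iterator[Tuple[str, float]]:
    """Map step: each image is scored independently of all others."""
    for image_id, pixels in records:
        yield image_id, svm_score(extract_features(pixels))


def reducer(pairs: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, float]]:
    """Trivial (identity) reducer: mapper output passes through unchanged."""
    yield from pairs


if __name__ == "__main__":
    images = [("img_a", [0.9, 0.95, 1.0]), ("img_b", [0.1, 0.2, 0.05])]
    for image_id, score in reducer(mapper(images)):
        print(image_id, "snow" if score > 0 else "no snow")
```

Because no record depends on any other, the work splits cleanly across mappers, which is why this class of problem is easy to parallelize.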

Geolocation

• Given an image, where on Earth was it taken?

– Match against thousands of place models, or against hundreds of attribute classifiers (e.g. indoor vs outdoor, city vs rural, etc.)

– Again use Hadoop with a trivial reducer

Learning these models

• Many recognition approaches use “bags-of-words”

– Using vector space model over “visual words”

• To learn, need to:

1. Generate vocabulary of visual words (e.g. with K-means)

2. Extract features from training images

3. Learn a classifier

• Our computational approach:

1. For k-means, use iterative Map-Reduce (Twister – J. Qiu)

2. For feature extraction, Map-Reduce with trivial reducer

3. For learning classifiers, we use off-the-shelf packages (can be quite slow)
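Step 1 above, k-means as iterative Map-Reduce, can be sketched as follows. This is a single-process Python illustration of the map/reduce split only; it does not use Twister or Hadoop, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_map(points: Iterable[Point], centers: Sequence[Point]) -> Iterator[Tuple[int, Point]]:
    """Map: emit (index of nearest center, descriptor) for each descriptor."""
    for p in points:
        idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        yield idx, p


def kmeans_reduce(pairs: Iterable[Tuple[int, Point]], centers: Sequence[Point]) -> List[Point]:
    """Reduce: replace each center by the mean of the descriptors assigned to it."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centers = list(centers)  # a center with no members stays where it is
    for idx, members in groups.items():
        dim = len(members[0])
        new_centers[idx] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
    return new_centers


def kmeans(points: List[Point], centers: List[Point], max_iters: int = 20,
           tol: float = 1e-12) -> List[Point]:
    """Driver: one map/reduce round per iteration until the centers stop moving."""
    for _ in range(max_iters):
        new_centers = kmeans_reduce(kmeans_map(points, centers), centers)
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers
```

Each iteration needs the centers produced by the previous one, which is why a runtime built for iteration is attractive here: the descriptors stay fixed while only the small set of centers changes between rounds.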

Inference on graphical models

• Statistical graphical models are widely used in vision

– Basic idea: vertices are variables, with some known and some unknown; edges are probabilistic relationships

– Inference is NP-hard in general

• Many approximation algorithms are based on message passing

– e.g. Loopy Discrete Belief Propagation

– # of messages proportional to # of edges in graph

– Messages can be large – size depends on variable label space

– # of iterations depends (roughly) on diameter of graph
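To make the message-passing structure concrete, here is a minimal sketch of min-sum (energy-minimizing) discrete belief propagation in Python. It assumes a pairwise model with a single shared pairwise cost function; the names `unary`, `pairwise`, and `min_sum_bp` are illustrative and are not taken from the talk.

```python
from typing import Callable, Dict, List, Tuple


def min_sum_bp(unary: Dict[int, List[float]],
               edges: List[Tuple[int, int]],
               pairwise: Callable[[int, int], float],
               iters: int = 10) -> Dict[int, int]:
    """Return a label per vertex. unary[v][l] is the cost of label l at vertex v."""
    n_labels = len(next(iter(unary.values())))
    # Two directed messages per undirected edge, each a vector over the label space.
    directed = list(edges) + [(v, u) for u, v in edges]
    msgs = {e: [0.0] * n_labels for e in directed}
    senders = {v: [] for v in unary}  # senders[v]: vertices that send to v
    for u, v in directed:
        senders[v].append(u)

    for _ in range(iters):
        new_msgs = {}
        for u, v in directed:
            # Cost at u from its unary term plus all incoming messages except v's.
            h = [unary[u][a] + sum(msgs[(w, u)][a] for w in senders[u] if w != v)
                 for a in range(n_labels)]
            # Brute-force minimization: O(|L|^2) work per message.
            m = [min(h[a] + pairwise(a, b) for a in range(n_labels))
                 for b in range(n_labels)]
            lo = min(m)
            new_msgs[(u, v)] = [x - lo for x in m]  # normalize to avoid drift
        msgs = new_msgs  # synchronous update: all messages swap at once

    labels = {}
    for v in unary:
        belief = [unary[v][l] + sum(msgs[(w, v)][l] for w in senders[v])
                  for l in range(n_labels)]
        labels[v] = belief.index(min(belief))
    return labels


if __name__ == "__main__":
    # Three-vertex chain; the middle vertex weakly prefers label 1 on its own.
    unary = {0: [0.0, 5.0], 1: [1.0, 0.5], 2: [0.0, 5.0]}
    potts = lambda a, b: 0.0 if a == b else 2.0
    print(min_sum_bp(unary, [(0, 1), (1, 2)], potts))
```

The sketch shows where the costs on the slide come from: there are two messages per edge, each message has one entry per label, and information travels one hop per iteration, so distant vertices influence each other only after roughly as many iterations as the graph's diameter.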

Pose and part-based recognition

• Represent objects in terms of parts

• Can be posed as graphical model inference problem

– Small number of variables (vertices) and constraints (edges), but large label space (millions++)

– We use single-node multi-threaded implementation, with barriers between iterations
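The "threads plus a barrier per iteration" pattern can be sketched as below. This is a generic Python illustration of the synchronization structure only, under the assumption that the label space is split into contiguous slices; the update rule is a toy neighbor relaxation, not the recognition algorithm from the talk, and Python threads would not give the speedup a native multi-threaded implementation does.

```python
import threading
from typing import List


def run_iterations(values: List[float], n_threads: int = 4, n_iters: int = 5) -> List[float]:
    """Apply n_iters synchronous updates, each thread owning one slice of the array."""
    n = len(values)
    bufs = [list(values), [0.0] * n]       # double buffer: read one, write the other
    barrier = threading.Barrier(n_threads)

    def worker(tid: int) -> None:
        lo = tid * n // n_threads
        hi = (tid + 1) * n // n_threads
        for it in range(n_iters):
            old, new = bufs[it % 2], bufs[(it + 1) % 2]
            for i in range(lo, hi):
                best = old[i]
                if i > 0:
                    best = min(best, old[i - 1] + 1)
                if i < n - 1:
                    best = min(best, old[i + 1] + 1)
                new[i] = best
            # No thread starts iteration it+1 until every thread has finished it,
            # so nobody overwrites a buffer that another thread is still reading.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bufs[n_iters % 2]
```

With a small graph and a huge label space, splitting the label space across threads on one machine keeps all messages in shared memory, and the barrier is the only coordination needed between iterations.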

Fine-grained object recognition

• Classify amongst similar objects (e.g. species of birds)

– How can we learn discriminative properties of these objects automatically?

– Model each training image as a node, edges between all pairs; goal is to label each image with a feature that is found in all positive examples and no negative examples

• We use off-the-shelf solver

– With some additional multi-threading on single node; still very slow

Large-scale 3D reconstruction

Pose as inference problem

• View reconstruction as statistical inference over a graphical model

– Vertices are cameras and points

– Edges are relative camera/point correspondences (estimated through point matching)

– Inference: Label each image with a camera pose and each point with a 3-d position, such that constraints are satisfied

Computation

• Our graphs have ~100,000 vertices, ~1,000,000 edges, ~100,000 possible discrete labels

– Reduce computation using exact algorithmic tricks (min convolutions) from O(|E| |L|^2) to O(|E| |L|)

– Huge amount of data: total message size >1TB per iteration

• Parallelize using iterative MapReduce

– Hadoop plus shell scripts for iteration

– Mappers take in messages from last iteration and compute outgoing messages

– Reducers collate and route messages

– Messages live on HDFS between iterations
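The min-convolution speedup mentioned above can be illustrated for one common special case. The sketch below assumes a one-dimensional label space and a linear pairwise cost c·|a − b|, which is an assumption made here for illustration; the cost model used in the actual reconstruction work is not stated on the slide. Under that assumption, each message m[b] = min over a of (h[a] + c·|a − b|) can be computed exactly with two linear passes instead of a double loop.

```python
from typing import List


def min_conv_brute(h: List[float], c: float) -> List[float]:
    """Reference O(|L|^2) computation of m[b] = min_a h[a] + c * |a - b|."""
    n = len(h)
    return [min(h[a] + c * abs(a - b) for a in range(n)) for b in range(n)]


def min_conv_linear(h: List[float], c: float) -> List[float]:
    """Exact O(|L|) computation of the same message using two sweeps."""
    m = list(h)
    # Forward sweep: best value reachable from any label to the left.
    for b in range(1, len(m)):
        m[b] = min(m[b], m[b - 1] + c)
    # Backward sweep: best value reachable from any label to the right.
    for b in range(len(m) - 2, -1, -1):
        m[b] = min(m[b], m[b + 1] + c)
    return m


if __name__ == "__main__":
    h = [5, 1, 4, 7, 0, 6]
    print(min_conv_brute(h, 2))
    print(min_conv_linear(h, 2))
```

Both functions return the same message, so the saving is exact rather than approximate. On data volume: assuming two directed messages per edge and 8-byte entries, 2 × 10^6 messages × 10^5 labels × 8 bytes is about 1.6 TB, the same order as the per-iteration figure quoted above, which is why messages are kept on HDFS between iterations.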

Common computational problems

1. Image-by-image (e.g. recognition)

– Large-scale, but easily parallelizable

2. Iterative algorithms (e.g. learning)

– Sometimes few but long-running iterations

– Sometimes many lightweight iterations

3. Inference on graphs (e.g. reconstruction, learning)

– Small graphs with huge label spaces

– Large graphs with small label spaces

– Large graphs with large label spaces
