Content Based Image Search

Embed Size (px)

Citation preview

  • 7/29/2019 Content Based Image Search

    1/41

    Content Based Image Search

    Mark Desnoyer

  • 7/29/2019 Content Based Image Search

    2/41

    Text Search - Bag of Words

    Gnome

    grass

    him

    caught

    moonwalked

    walksGnome

    him

    house

    internet

    walks

    shuttle

    PC

    Nord

  • 7/29/2019 Content Based Image Search

    3/41

    Text Search - TF-IDF

  • 7/29/2019 Content Based Image Search

    4/41

    Text Search - Query

  • 7/29/2019 Content Based Image Search

    5/41

    Current Image Search

    On the web uses contextaround image

    Words around itWords in the alt tag

    Those words are treatedas a documentSame as normal textsearch

    But we want pictures, nottext!

    Query: Horse

  • 7/29/2019 Content Based Image Search

    6/41

    Searching With Pictures

    How about searching with pictures instead

  • 7/29/2019 Content Based Image Search

    7/41

    Using Visual Words For Search

    Use visual words paradigmwe've seen beforeCan use all the text search

    machinery we already haveBut, we're searching withpictures now

  • 7/29/2019 Content Based Image Search

    8/41

    The Players

    O. Chum et al., Total recall: Automatic query expansionwith a generative feature model for object retrieval, in Proc.ICCV, vol. 2, 2007.

    N. Snavely, S. M. Seitz, and R. Szeliski, Photo tourism:exploring photo collections in 3D, in InternationalConference on Computer Graphics and InteractiveTechniques (ACM New York, NY, USA, 2006), 835-846.

    D. Nister and H. Stewenius, Scalable recognition with avocabulary tree, in Proc. CVPR, vol. 5, 2006.

  • 7/29/2019 Content Based Image Search

    9/41

    SIFT Features

    Succinct descriptorsScale invariantRobust to changes in lighting, viewpoint, blur etc.Therefore, normally used in this search context

  • 7/29/2019 Content Based Image Search

    10/41

    Bag of Words First Step - Build a

    Dictionary

    Must be big to be expressive enough to differentiate objects

    So, cluster SIFT features

    Each cluster is a word in the dictionary

    But K-means clustering 10M+ descriptors is O(NK)Hierarchical K-means (Nister)

    Approximate K-means (Chum)

  • 7/29/2019 Content Based Image Search

    11/41

    Vocabulary Tree

    Copyright David Nistr, Henrik Stewnius

    Vocabulary Tree (Nister)

  • 7/29/2019 Content Based Image Search

    12/41

    Vocabulary Tree

    Vocabulary Tree (Nister)

    Copyright David Nistr, Henrik Stewnius

  • 7/29/2019 Content Based Image Search

    13/41

    Vocabulary Tree

    Vocabulary Tree (Nister)

    Copyright David Nistr, Henrik Stewnius

  • 7/29/2019 Content Based Image Search

    14/41

    Vocabulary Tree (Nister)

    Copyright David Nistr, Henrik Stewnius

  • 7/29/2019 Content Based Image Search

    15/41

    Vocabulary Tree (Nister)

    Copyright David Nistr, Henrik Stewnius

  • 7/29/2019 Content Based Image Search

    16/41

    Approximate Nearest Neighbour (Chum)

    Most of the time in K-Means is spent doing NearestNeighbour

    Nearest Neighbour can be approximated using kd-trees

    O(N logK) vs. O(NK)

  • 7/29/2019 Content Based Image Search

    17/41

    Another Problem - Synonyms

    Visual Polysemy. Single visual word occurring on different (but locallysimilar) parts on different object categories.

    Visual Synonyms. Two different visual words representing a similar part ofan object (wheel of a motorbike).

  • 7/29/2019 Content Based Image Search

    18/41

    Video Google

    Search for recurring objectsin a movie

    Synonyms suppressed by

    enforcing consistency intime

    Stop list used to throw outwords that are too common

  • 7/29/2019 Content Based Image Search

    19/41

    Photo Tourism Overview

    Scenereconstruction

    Photo Explorer

    Inputphotographs

    Relative camerapositions and orientations

    Point cloud

    Sparse correspondence

    Copyright Noah Snavely

  • 7/29/2019 Content Based Image Search

    20/41

    Photo Tourism Scene Recontruction

    Automatically estimate

    position, orientation, and focal length of cameras

    3D positions of feature points

    Feature detection

    Pairwise

    feature matching

    Incrementalstructure from

    motion

    Correspondenceestimation

    Copyright Noah Snavely

  • 7/29/2019 Content Based Image Search

    21/41

    Photo Tourism

    Demo

    http://phototour.cs.washington.edu/applet/index.html
  • 7/29/2019 Content Based Image Search

    22/41

    Photo Tourism Limitations

    Matching is only performed between pairs of imagesDoes not scale to large datasets

  • 7/29/2019 Content Based Image Search

    23/41

    Chum et al. Experiment

    5K labeled images of sights at Oxford

    1M Flickr images from popular tags (distractors)

    Dictionary built from Oxford Images16M descriptions -> 1M word dictionary

    Query for landmarks and calculate PR curve using differentforms of query expansion

  • 7/29/2019 Content Based Image Search

    24/41

    Copyright James Philbin

  • 7/29/2019 Content Based Image Search

    25/41

    Copyright James Philbin

  • 7/29/2019 Content Based Image Search

    26/41

    Copyright James Philbin

  • 7/29/2019 Content Based Image Search

    27/41

    Copyright James Philbin

  • 7/29/2019 Content Based Image Search

    28/41

    Copyright James Philbin

  • 7/29/2019 Content Based Image Search

    29/41

    Copyright James Philbin

  • 7/29/2019 Content Based Image Search

    30/41

    Copyright James Philbin

  • 7/29/2019 Content Based Image Search

    31/41

    Copyright James Philbin

  • 7/29/2019 Content Based Image Search

    32/41

    Text Query Expansion

    In text search, some words are similar, but they are differentin the dictionary

    e.g. gray and grey

    Improve results by expanding the query to include similarwords

    e.g. "grey goose" -> "grey goose gray"

    Similar words are found by clustering on document data

    At query time, relevant clusters are found and pulled in

    False positives add a lot of noise to the results

  • 7/29/2019 Content Based Image Search

    33/41

    Image Query Expansion - Baseline

  • 7/29/2019 Content Based Image Search

    34/41

    Transitive Closure

  • 7/29/2019 Content Based Image Search

    35/41

    Average Query Expansion

  • 7/29/2019 Content Based Image Search

    36/41

    Recursive Average Query Expansion

  • 7/29/2019 Content Based Image Search

    37/41

    Multiple Image Resolution Expansion

  • 7/29/2019 Content Based Image Search

    38/41

    Copyright James Philbin

  • 7/29/2019 Content Based Image Search

    39/41

    Demo

    http://arthur.robots.ox.ac.uk:8080/search/?id=oxc1_hertford_000011

    http://arthur.robots.ox.ac.uk:8080/search/?id=oxc1_hertford_000011http://arthur.robots.ox.ac.uk:8080/search/?id=oxc1_hertford_000011
  • 7/29/2019 Content Based Image Search

    40/41

    Results - PR Curves Before & After

    Expansion

  • 7/29/2019 Content Based Image Search

    41/41

    Results - Effect Of Distractors

    My distractors: 9K images from searches like "building","cathedral", "library", "historic", "spire" etc.