Page 1: Bayesian Content-Based  Image Retrieval

Bayesian Content-Based Image Retrieval

research with:

Katherine A. Heller

based on Heller and Ghahramani (2006); Part IB, Paper 8, Lent term

Page 2: Bayesian Content-Based  Image Retrieval

What is Information Retrieval?

Finding material from within a large unstructured collection (e.g. the internet) that satisfies the user's information need (e.g. expressed via a query).

Well-known examples…

…but there are many specialist search systems as well:

Page 3: Bayesian Content-Based  Image Retrieval
Page 4: Bayesian Content-Based  Image Retrieval
Page 5: Bayesian Content-Based  Image Retrieval
Page 6: Bayesian Content-Based  Image Retrieval

Universe of items being searched

Imagine a universe of items. The items could be: images, music, documents, websites, publications, proteins, news stories, customer profiles, products, medical records, …or any other type of item one might want to query.

Page 7: Bayesian Content-Based  Image Retrieval

Illustrative example (images): two example query sets and the results retrieved for each.

Page 8: Bayesian Content-Based  Image Retrieval

Generalization from a small set

• The query is a set of items.
• Our information retrieval method should rank items x by how well x fits with the query set.

Page 9: Bayesian Content-Based  Image Retrieval

Bayesian Inference & Statistical Models

• Statistical model for data points x with model parameters θ: p(x | θ)
• Prior on model parameters: p(θ | m)
• Dataset D = {x_1, …, x_N} and model class m
• Marginal likelihood (evidence) for model m:

p(D | m) = ∫ p(D | θ, m) p(θ | m) dθ
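For the Beta-Bernoulli models used later in the talk, this integral has a closed form. A minimal sketch, assuming binary data and a Beta prior on the success probability (the helper name log_evidence_bernoulli is mine, not from the slides):

```python
import numpy as np
from scipy.special import betaln

def log_evidence_bernoulli(x, alpha, beta):
    """log p(D | m) for i.i.d. Bernoulli observations x (a 0/1 array)
    under a Beta(alpha, beta) prior on the success probability."""
    n1 = np.sum(x)            # number of ones
    n0 = len(x) - n1          # number of zeros
    # Closed-form marginal likelihood: B(alpha + n1, beta + n0) / B(alpha, beta)
    return betaln(alpha + n1, beta + n0) - betaln(alpha, beta)
```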

Page 10: Bayesian Content-Based  Image Retrieval

Illustrative example (images): example query sets and the corresponding retrieval results, revisited.

Page 11: Bayesian Content-Based  Image Retrieval

Ranking items

• Rank each item in the universe by how well it would "fit into" a set which includes the query set.
• Limit output to the top few items.

Query and ranking (images): an example query and the retrieved items ranked from best to worst.

Page 12: Bayesian Content-Based  Image Retrieval

A Criterion?

Having observed a query set D_c of items belonging to some concept c, how probable is it that an item x also belongs to that concept, i.e. p(x | D_c)?

What we really want to know is how this compares to p(x), the probability of the item before observing the query…

Page 13: Bayesian Content-Based  Image Retrieval

Bayesian Sets Criterion

So we compute:

score(x) = p(x | D_c) / p(x)

Assume a simple parameterized model, p(x | θ), and a prior on the parameters, p(θ).

Since θ is unknown, to compute the score we need to average over all values of θ:

p(x | D_c) = ∫ p(x | θ) p(θ | D_c) dθ    and    p(x) = ∫ p(x | θ) p(θ) dθ

Page 14: Bayesian Content-Based  Image Retrieval

Bayesian Sets Criterion (A Different Perspective)

We can rewrite this score as:

score(x) = p(x | D_c) / p(x) = p(x, D_c) / ( p(x) p(D_c) )

Page 15: Bayesian Content-Based  Image Retrieval

Bayesian Sets Criterion (A Different Perspective)

This has a nice intuitive interpretation: the numerator is the probability that x and the query set D_c were generated together, by a single model with shared parameters θ, while the denominator is the probability that they were generated independently, with separate parameters. The ratio is large exactly when x belongs with D_c.

Page 16: Bayesian Content-Based  Image Retrieval

Bayesian Sets Criterion

Page 17: Bayesian Content-Based  Image Retrieval

Bayesian Sets Algorithm

• For simple models, computing the score is tractable.
• For sparse binary data, computing all scores can be reduced to a single sparse matrix multiplication.
• Even with very simple models and almost no parameter tuning, one can get very competitive retrieval results.

Page 18: Bayesian Content-Based  Image Retrieval

Sparse Binary Data

If we use a multivariate Bernoulli model:

p(x | θ) = ∏_j θ_j^{x_j} (1 − θ_j)^{1 − x_j}

with conjugate Beta prior:

p(θ | α, β) = ∏_j Beta(θ_j | α_j, β_j)

we can compute the score in closed form:

score(x) = p(x | D_c) / p(x) = ∏_j [ B(α̃_j + x_j, β̃_j + 1 − x_j) B(α_j, β_j) ] / [ B(α̃_j, β̃_j) B(α_j + x_j, β_j + 1 − x_j) ]

where N = |D_c|, α̃_j = α_j + Σ_{i ∈ D_c} x_{ij}, β̃_j = β_j + N − Σ_{i ∈ D_c} x_{ij}, and B(·,·) is the Beta function.

This daunting expression can be dramatically simplified…

E.g.: B(a + 1, b) / B(a, b) = a / (a + b)

Page 19: Bayesian Content-Based  Image Retrieval

Sparse Binary Data

Reduces to:

score(x) = ∏_j [ (α_j + β_j) / (α_j + β_j + N) ] · (α̃_j / α_j)^{x_j} · (β̃_j / β_j)^{1 − x_j}

The log of the score is linear in x:

log score(x) = c + Σ_j q_j x_j

where:

c = Σ_j [ log(α_j + β_j) − log(α_j + β_j + N) + log β̃_j − log β_j ]

and

q_j = log α̃_j − log α_j − log β̃_j + log β_j
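Because the log score is linear in x, the scores for an entire collection stored as a sparse binary matrix X can be computed with a single sparse matrix-vector product, X·q, plus the constant c (this is the "single sparse matrix multiplication" of the previous slide). A minimal sketch in Python, assuming SciPy sparse matrices; the function name bayesian_sets_scores is my own:

```python
import numpy as np
from scipy import sparse

def bayesian_sets_scores(X, X_query, alpha, beta):
    """Log Bayesian Sets scores for every row of the binary matrix X.

    X           : (n_items, n_features) scipy.sparse CSR matrix of 0/1 features
    X_query     : (N, n_features) binary matrix holding the query set D_c
    alpha, beta : (n_features,) Beta prior hyperparameters
    """
    N = X_query.shape[0]
    s = np.asarray(X_query.sum(axis=0)).ravel()   # per-feature counts in the query set
    alpha_t = alpha + s                           # posterior alpha~_j
    beta_t = beta + N - s                         # posterior beta~_j

    # Constant term c and per-feature weights q_j from the log-linear form above
    c = np.sum(np.log(alpha + beta) - np.log(alpha + beta + N)
               + np.log(beta_t) - np.log(beta))
    q = np.log(alpha_t) - np.log(alpha) - np.log(beta_t) + np.log(beta)

    # One sparse matrix-vector product scores every item at once
    return c + X @ q
```

Ranking is then just a matter of sorting, e.g. np.argsort(-scores)[:k] for the top k items.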

Page 20: Bayesian Content-Based  Image Retrieval


Priors

• Broad empirical priors computed from the entire data set, chosen before observing any queries.
• Prior proportional to the mean frequency of each feature.
• Robust to changes in the overall prior strength (scale factor).
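As a concrete sketch of this empirical prior (my own code; the scale factor name kappa and the clipping step are my assumptions, not from the slides):

```python
import numpy as np

def empirical_beta_prior(X, kappa=2.0):
    """Broad empirical Beta prior from the whole data set.

    X     : (n_items, n_features) binary matrix (dense or scipy.sparse)
    kappa : overall prior strength; the slides note robustness to this choice
    """
    m = np.asarray(X.mean(axis=0)).ravel()   # mean frequency of each feature
    m = np.clip(m, 1e-6, 1 - 1e-6)           # guard against all-zero or all-one features
    return kappa * m, kappa * (1.0 - m)      # (alpha_j, beta_j) per feature
```

These alpha and beta vectors are what the scoring sketch above takes as input.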

Page 21: Bayesian Content-Based  Image Retrieval

Key Advantages of Our Approach

• Novel search paradigm for retrieval: queries are a small set of examples.
• Based on:
  – principled statistical methods (Bayesian machine learning)
  – recent psychological research into models of human categorization and generalization
• Extremely fast:
  – searches >100,000 records per second on a laptop computer
  – uses sparse matrix methods
  – easy to parallelize and to use inverted indices, to search billions of records/sec

Page 22: Bayesian Content-Based  Image Retrieval

Applications

• Retrieving movies from a database of movie preferences
  – EachMovie dataset: (person, movie) entry is 1 if the person gave the movie a rating above 3 stars out of a possible 0-5 stars (see the thresholding sketch after this list)
• Finding sets of authors who work on similar topics
  – NIPS authors dataset: (word, author) entry is 1 if the author uses that word more frequently than twice the mean across all authors
• Searching scientific literature
  – NIPS dataset: (word, paper) entry is 1 if the paper uses that word more frequently than twice the mean across all papers
• Image retrieval based on color and texture features only
  – Corel dataset: (image, feature) matrix contains 240 binary features per image: Gabor and Tamura texture features and HSV color features
• Searching a protein database
  – UniProt database: the "world's most comprehensive catalog of information on proteins"; binary features from GO annotations, PDB structural information, keywords, and primary sequences
• Patent search (Xyggy.com)
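These binary matrices are all built by simple thresholding of raw ratings or counts. A hypothetical sketch of the two rules quoted above (array names ratings and word_counts are my own, not from the slides):

```python
import numpy as np

def binarize_ratings(ratings):
    """EachMovie-style rule: entry is 1 if the rating is above 3 stars (0-5 scale)."""
    return (ratings > 3).astype(np.int8)

def binarize_word_counts(word_counts):
    """NIPS-style rule: (word, author/paper) entry is 1 if the count exceeds
    twice that word's mean across all columns."""
    mean_per_word = word_counts.mean(axis=1, keepdims=True)
    return (word_counts > 2 * mean_per_word).astype(np.int8)
```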

Page 23: Bayesian Content-Based  Image Retrieval

Retrieving Movies

EachMovie data: 1813 people by 1532 movies

Page 24: Bayesian Content-Based  Image Retrieval

Retrieving Movies: comparison to Google Sets

Page 25: Bayesian Content-Based  Image Retrieval

Retrieving Movies: comparison to Google Sets

Page 26: Bayesian Content-Based  Image Retrieval
Page 27: Bayesian Content-Based  Image Retrieval
Page 28: Bayesian Content-Based  Image Retrieval

Query Times

Page 29: Bayesian Content-Based  Image Retrieval

Content-Based Image Retrieval

We can use the Bayesian Sets method as the basis of a content-based image retrieval system…

Page 30: Bayesian Content-Based  Image Retrieval

The Image Retrieval Prototype System

A system for searching large collections of unlabelled images: you enter a word, e.g. "penguins", and it retrieves images that match this label, using only color and texture features of the images.

• A database of 32,000 images (from Corel)
• Labelled training images: 10,000 images with about 3-10 text labels per image
• Unlabelled test images: 22,000 images
• For each training and test image we store a vector of 240 binary color and texture features
• A vocabulary of about 2000 keywords
• For each keyword, we compute a query vector q from the labelled training images, as specified by the Bayesian Sets algorithm

Page 31: Bayesian Content-Based  Image Retrieval

Image features

• Texture features (75)
  – 48 Gabor features
  – 27 Tamura features
• Color features (165)
  – HSV histogram (8×5×5)
• Binarization
  – Compute the skewness of each feature
  – Assign value 1 to images in the heavier tail

Page 32: Bayesian Content-Based  Image Retrieval

The Image Retrieval Prototype System: The Algorithm

1. Input query word: w = "penguins"
2. Find all training images with label w
3. Take the binary feature vectors for these training images as the query set and run the Bayesian Sets algorithm: for each image x in the unlabelled test set, compute score(x), which measures the probability that x belongs in the set of images with the label w
4. Return the images with the highest score

The algorithm is very fast: about 0.2 sec on this laptop to query 22,000 test images.
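A minimal sketch of these steps, reusing the bayesian_sets_scores and empirical_beta_prior helpers sketched earlier (all names and the label lookup structure are my own assumptions, not the authors' code):

```python
import numpy as np

def retrieve_by_keyword(word, train_labels, X_train, X_test, alpha, beta, top_k=9):
    """Return indices of the top_k unlabelled test images for a query keyword.

    word         : query keyword, e.g. "penguins"
    train_labels : list of label sets, one per labelled training image
    X_train      : (n_train, 240) binary feature matrix of training images
    X_test       : (n_test, 240) binary feature matrix of unlabelled test images
    """
    # Step 2: find all training images carrying the query label
    query_rows = [i for i, labels in enumerate(train_labels) if word in labels]
    if not query_rows:
        return []

    # Step 3: use their feature vectors as the query set D_c and score every test image
    scores = bayesian_sets_scores(X_test, X_train[query_rows], alpha, beta)

    # Step 4: return the highest-scoring test images
    return np.argsort(-scores)[:top_k]
```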

Page 33: Bayesian Content-Based  Image Retrieval
Page 34: Bayesian Content-Based  Image Retrieval
Page 35: Bayesian Content-Based  Image Retrieval
Page 36: Bayesian Content-Based  Image Retrieval
Page 37: Bayesian Content-Based  Image Retrieval
Page 38: Bayesian Content-Based  Image Retrieval
Page 39: Bayesian Content-Based  Image Retrieval
Page 40: Bayesian Content-Based  Image Retrieval
Page 41: Bayesian Content-Based  Image Retrieval

Results on all 50 queries…

Page 42: Bayesian Content-Based  Image Retrieval

Results for Image Retrieval

NNall - nearest neighbors to any member of the query set
NNmean - nearest neighbors to the mean of the query set
BO - Behold Search online, www.beholdsearch.com; A. Yavlinsky, E. Schofield and S. Rüger (CIVR, 2005)

http://www.inference.phy.cam.ac.uk/vr237/

Page 43: Bayesian Content-Based  Image Retrieval
Page 44: Bayesian Content-Based  Image Retrieval

Conclusions

• Given a query of a small set of items, Bayesian Sets finds additional items that belong in this set.
• The score used for ranking items is based on the marginal likelihood of a probabilistic model.
• For binary data, the score can be computed exactly and efficiently using sparse matrices (e.g. ~1 sec for over 2 million non-zero entries).
• This approach can be extended to many probabilistic models and other forms of data.
• Where applicable, results are competitive with Google Sets:
  – Google Sets works well for lists that appear explicitly on the web
  – Bayesian Sets works well for finding more abstract set completions
• We have built prototype movie, author, paper, image and protein search systems.

Page 45: Bayesian Content-Based  Image Retrieval

Appendix

Page 46: Bayesian Content-Based  Image Retrieval

Image features

Texture features (75):

We represented images using two types of texture features: 48 Gabor texture features and 27 Tamura texture features. We computed coarseness, contrast and directionality Tamura features for each of 9 (3×3) tiles. We applied 6 scale-sensitive and 4 orientation-sensitive Gabor filters to each image point and computed the mean and standard deviation of the resulting distribution of filter responses.

Color features (165):

We computed an HSV 3D histogram with 8 bins for hue and 5 each for value and saturation. The lowest value bin was not partitioned into hues, since these are hard to distinguish.

Binarization:

Each feature was binarized by computing the skewness of its distribution and assigning a value of 1 to the 20% of images that fall furthest into the heavier tail of that distribution.
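A short sketch of this skewness-based binarization (my own code, following the description above):

```python
import numpy as np
from scipy.stats import skew

def binarize_by_skewness(features):
    """Binarize each real-valued feature column: 1 for the 20% of images lying
    furthest into the heavier (more skewed) tail of that feature, 0 otherwise.

    features : (n_images, n_features) array of real-valued feature responses
    """
    binary = np.zeros_like(features, dtype=np.int8)
    for j in range(features.shape[1]):
        col = features[:, j]
        if skew(col) >= 0:                       # heavier tail on the right
            binary[:, j] = col >= np.percentile(col, 80)
        else:                                    # heavier tail on the left
            binary[:, j] = col <= np.percentile(col, 20)
    return binary
```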