A decision-theoretic view of image retrieval


Nuno Vasconcelos, Compaq Computer Corporation, Cambridge Research Lab

http://www.media.mit.edu/~nuno


Content-based retrieval

allow users to express queries directly in the visual domain
– user provides a query image
– system extracts low-level features (texture, color, shape)
– signature is compared with those extracted from the database
– top matches are returned

[Figure: examples of texture similarity, color similarity, and shape similarity; query: horses]

Retrieval architecture

three main components
– feature transformation
– feature representation
– similarity function

previous solutions have concentrated on some components

two main strategies:
– texture: features
– color: representation

need: criteria to guide the design of all components

[Figure: retrieval architecture, showing sets of feature points (+) compared by a similarity function (=?)]

Decision-theoretic formulation

given: feature space $X$ and set $Y = \{1, \ldots, C\}$ of classes

goal: design a map $g^*: X \to Y$ that minimizes the probability of retrieval error

$g^* = \arg\min_g P_{x,y}(g(x) \neq y)$

Bayes classifier is optimal

$g^*(x) = \arg\max_i P(y = i \mid x) = \arg\max_i P(x \mid y = i)\, P(y = i)$

establishes an optimal criterion for image similarity
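As a minimal sketch of the Bayes decision rule $g^*(x) = \arg\max_i P(x \mid y = i) P(y = i)$, the following assumes 1-D Gaussian class likelihoods. The Gaussian form, the two classes, and all parameter values are illustrative assumptions, not part of the slides.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def bayes_classify(x, class_params, priors):
    """Return argmax_i P(x | y=i) P(y=i) for 1-D Gaussian class likelihoods."""
    scores = [gaussian_pdf(x, m, s) * p for (m, s), p in zip(class_params, priors)]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical two-class problem: class 0 ~ N(0, 1), class 1 ~ N(3, 1).
CLASS_PARAMS = [(0.0, 1.0), (3.0, 1.0)]
EQUAL_PRIORS = [0.5, 0.5]
SKEWED_PRIORS = [0.99, 0.01]
```

With equal priors the rule reduces to picking the class with the larger likelihood, so a point just past the midpoint between the two means goes to class 1. A strong prior for class 0 can move the same point back to class 0.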

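A unified view of image similarity

[Figure: relations among similarity functions]
Bayes → ML (equal priors)
Bayes → Bhattacharyya (2-way bound)
ML → Kullback-Leibler (large, iid query)
Kullback-Leibler → $\chi^2$ (linearization)
Kullback-Leibler → Quadratic (Gaussian)
Quadratic → Mahalanobis (orthogonal, $\Sigma_q = \Sigma_i$)
Mahalanobis → Euclidean ($\Sigma_i = I$)

Bayes: $g(x) = \arg\max_i P(y = i \mid x)$

Bhattacharyya: $g(x) = \arg\min_i -\log \int \sqrt{P(x \mid q)\, P(x \mid y = i)} \, dx$

ML: $g(x) = \arg\max_i P(x \mid y = i)$

Kullback-Leibler: $g(x) = \arg\min_i \int P(x \mid q) \log \frac{P(x \mid q)}{P(x \mid y = i)} \, dx$

$\chi^2$: $g(x) = \arg\min_i \int \frac{(P(x \mid q) - P(x \mid y = i))^2}{P(x \mid y = i)} \, dx$

Quadratic: $g(x) = \arg\min_i E_x[(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)]$

Mahalanobis: $g(x) = \arg\min_i (\mu_q - \mu_i)^T \Sigma_i^{-1} (\mu_q - \mu_i)$

Euclidean: $g(x) = \arg\min_i (\mu_q - \mu_i)^T (\mu_q - \mu_i)$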
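The density-based similarity functions can be illustrated on discrete densities (normalized histograms), where integrals become sums. This is a sketch under that assumption; the function names, the query, and the three database densities are hypothetical.

```python
import math

def kl_divergence(q, p):
    """Kullback-Leibler divergence sum_x q(x) log(q(x)/p(x)); assumes p > 0 where q > 0."""
    return sum(qx * math.log(qx / px) for qx, px in zip(q, p) if qx > 0)

def chi_square(q, p):
    """Chi-square distance sum_x (q(x) - p(x))^2 / p(x); assumes p > 0."""
    return sum((qx - px) ** 2 / px for qx, px in zip(q, p))

def bhattacharyya(q, p):
    """Bhattacharyya distance -log sum_x sqrt(q(x) p(x))."""
    return -math.log(sum(math.sqrt(qx * px) for qx, px in zip(q, p)))

def retrieve(query, database, distance):
    """Index of the database density closest to the query density."""
    return min(range(len(database)), key=lambda i: distance(query, database[i]))

# Hypothetical query density and database of three class densities.
QUERY = [0.5, 0.3, 0.2]
DATABASE = [[0.1, 0.2, 0.7], [0.45, 0.35, 0.2], [0.3, 0.3, 0.4]]
```

On this toy database all three distances select the same class, since one database density is much closer to the query than the others. They can disagree when the candidates are closer together.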

Feature transformation

probability of error is lower bounded by the Bayes error:

$L^* = 1 - E_x[\max_i P(y = i \mid x)]$

Theorem: for a retrieval system with observation space $Z$ and a feature transformation $T: Z \to X$, the Bayes error on $X$ can never be smaller than that on $Z$. Equality is achieved if and only if $T$ is invertible:

$L^*_X = L^*_Z$ if $T$ is invertible, $L^*_X > L^*_Z$ otherwise

suggests that emphasis on features is a bad idea
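The effect of a feature transformation on the Bayes error can be checked on a small discrete example, where $L^* = 1 - \sum_x \max_i P(x, y = i)$. The joint table and both transformations below are made up for illustration.

```python
def bayes_error(joint):
    """Bayes error L* = 1 - sum_x max_i P(x, y=i) for a discrete joint table.

    joint maps each feature value x to a list [P(x, y=0), P(x, y=1), ...].
    """
    return 1.0 - sum(max(probs) for probs in joint.values())

def transform(joint, T):
    """Joint distribution of (T(z), y), summing over all z that share T(z)."""
    out = {}
    for z, probs in joint.items():
        acc = out.setdefault(T(z), [0.0] * len(probs))
        for i, p in enumerate(probs):
            acc[i] += p
    return out

# Hypothetical observation space Z = {0, 1, 2, 3} with two classes.
JOINT_Z = {0: [0.30, 0.05], 1: [0.05, 0.20], 2: [0.15, 0.05], 3: [0.05, 0.15]}

L_Z = bayes_error(JOINT_Z)                                        # on Z
L_INVERTIBLE = bayes_error(transform(JOINT_Z, lambda z: 3 - z))   # relabeling
L_LOSSY = bayes_error(transform(JOINT_Z, lambda z: z // 2))       # merges pairs
```

The relabeling leaves the Bayes error at 0.20. The lossy map merges observations that favor different classes and raises it to 0.45.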

Feature representation

Theorem: for a retrieval system with class probabilities $P(y = i)$ and class-likelihood functions $P(x \mid y = i)$, and a decision function

$g(x) = \arg\max_i \tilde{p}(x \mid y = i)\, \tilde{p}(y = i)$

the difference between the real and Bayes errors is upper bounded by the L1 distance between the real and estimated probabilities

$P(g(x) \neq y) - L^* \leq \Delta_{est} = \sum_i \int | P(x \mid y = i) P(y = i) - \tilde{p}(x \mid y = i) \tilde{p}(y = i) | \, dx$
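The bound can be checked numerically on a discrete feature space, where the integral becomes a sum. This is a sketch with hypothetical true and estimated joint tables, not data from the talk.

```python
def decide(joint_est):
    """Plug-in decision g(x) = argmax_i of the estimated joint p~(x, y=i)."""
    return {x: max(range(len(p)), key=p.__getitem__) for x, p in joint_est.items()}

def error_probability(joint_true, g):
    """True probability of error P(g(x) != y) of decision function g."""
    return 1.0 - sum(probs[g[x]] for x, probs in joint_true.items())

def l1_distance(joint_true, joint_est):
    """sum_i sum_x |P(x, y=i) - p~(x, y=i)|, the discrete form of the bound."""
    return sum(abs(p - q)
               for x in joint_true
               for p, q in zip(joint_true[x], joint_est[x]))

# Hypothetical true and estimated joint tables over x in {0, 1, 2}, two classes.
TRUE = {0: [0.30, 0.10], 1: [0.15, 0.20], 2: [0.05, 0.20]}
ESTIMATE = {0: [0.25, 0.15], 1: [0.20, 0.15], 2: [0.05, 0.20]}

BAYES_ERROR = error_probability(TRUE, decide(TRUE))
REAL_ERROR = error_probability(TRUE, decide(ESTIMATE))
GAP = REAL_ERROR - BAYES_ERROR
BOUND = l1_distance(TRUE, ESTIMATE)
```

Here the estimate flips the decision at one feature value, so the real error exceeds the Bayes error by 0.05, well inside the L1 bound of 0.20.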

Feature representation

distance between the actual and ideal probability of error (estimation error $\Delta_{est}$) is upper bounded by a function of the quality of the density estimates

this means:
– good estimation is a sufficient condition for accurate retrieval
– from the theoretical viewpoint, there is no reason for features

caveat: estimation is difficult in high dimensions

[Figure: error scale marking $L^*$, $P(g(x) \neq y)$, and $\max P(g(x) \neq y)$, with the gap labeled $\Delta_{est}$]

Color (estimation)-based retrieval

no features, emphasis on representation (histograms)

problem: low-order statistics are not sufficient

spatial neighborhoods → high dimensionality

[Figure: Hist.eps]

Summary

low Bayes error: avoid features

good image discrimination:
– requires high-dimensional spaces
– estimation is difficult in high dimensions
– can lead to large estimation error

fundamental trade-off of image retrieval:
– a feature transformation will increase Bayes error but can also reduce estimation error

the two components have to be considered simultaneously!

Bayes error ≤ error ≤ Bayes error + estimation error

Example: texture recognition

emphasis: discriminant features
– simple representation ($\mu$, $\Sigma$) and similarity function (Mahalanobis distance, MD)
– years of research on “good” features, e.g. MRSAR
– problem: discriminant for texture but not generic
– can we get similar performance with a generic transform?
– for Bayesian retrieval the features are not so important

Designing retrieval systems

the retrieval trade-off:
– low Bayes error: invertible feature transformation
– low estimation error: expressive feature representation & low-dimensional feature space

directive 1: get the most expressive representation you can afford!

directive 2: role for feature transform is dimensionality reduction
– images live on a low-dimensional manifold embedded in high-dimensional space
– feature transformation should eliminate unnecessary dimensions
– while staying as close to invertible as possible

Feature representation

among expressive models (kernel estimators) we like Gaussian mixtures

$P(x \mid y = i) = \sum_k \frac{\pi_{ik}}{\sqrt{(2\pi)^n |\Sigma_{ik}|}} \exp\left(-\frac{1}{2}(x - \mu_{ik})^T \Sigma_{ik}^{-1} (x - \mu_{ik})\right)$

because they are:
– compact (computational efficiency)
– able to capture details of multi-modal densities (histogram)
– computationally tractable in high dimensions (Gaussian)
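A minimal sketch of evaluating a Gaussian mixture class density and using it for ML retrieval. To stay dependency-free it assumes diagonal covariances, which is a simplification of the general mixture; the model parameters and query vectors are hypothetical.

```python
import math

def diag_gaussian_pdf(x, mean, var):
    """Density of a Gaussian with diagonal covariance (per-dimension variances)."""
    log_p = 0.0
    for xd, md, vd in zip(x, mean, var):
        log_p += -0.5 * math.log(2.0 * math.pi * vd) - 0.5 * (xd - md) ** 2 / vd
    return math.exp(log_p)

def mixture_pdf(x, weights, means, variances):
    """P(x | y=i) = sum_k pi_k N(x; mu_k, Sigma_k) with diagonal Sigma_k."""
    return sum(w * diag_gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

def ml_retrieve(query_vectors, class_models):
    """ML retrieval: class maximizing the summed log-likelihood of an iid query."""
    def log_likelihood(model):
        return sum(math.log(mixture_pdf(x, *model)) for x in query_vectors)
    return max(range(len(class_models)), key=lambda i: log_likelihood(class_models[i]))

# Hypothetical 2-D models: (weights, means, variances) per image class.
MODELS = [
    ([0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]),
    ([1.0], [[2.0, -2.0]], [[1.0, 1.0]]),
]
```

A query whose feature vectors fall near both modes of the first model is assigned to class 0, which a single Gaussian per class could not represent.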

Feature transformation

dimensionality reduction has been thoroughly studied in the compression literature

“close to invertible” = minimum reconstruction error

$\min_T E[\| x - T^{-1} Q_i T x \|^2]$

where $Q_i(x_1, \ldots, x_i, \ldots, x_d) = (x_1, \ldots, x_i, 0, \ldots, 0)$

[Figure: block diagram $T \to Q_3 \to T^{-1}$]

Optimal transformation

optimal solution (squared error sense): principal component analysis

for $T(x) = \Phi x$

$\Phi^*_k = \arg\min_\Phi E[\| x - \Phi^{-1} Q_k \Phi x \|^2]$

iff $\Phi^*_k = [v_1, \ldots, v_k]$, $v_i$ = $i$th eigenvector of $\Sigma_x$, with eigenvalues ordered $\lambda_1 > \ldots > \lambda_n$

problems:
– squared error is not Bayes error
– PCA does not mimic well early human vision
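The PCA claim can be checked in two dimensions, where the eigen-decomposition of the covariance has a closed form. This sketch centers the data first and keeps one coefficient ($k = 1$); the centering step, the helper names, and the sample points are assumptions made for illustration.

```python
import math

def covariance_2d(points):
    """Sample mean and 2x2 covariance (1/n normalization) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)

def principal_axis(cov):
    """Largest eigenvalue, smallest eigenvalue, and unit top eigenvector (symmetric 2x2)."""
    sxx, sxy, syy = cov
    half_trace = 0.5 * (sxx + syy)
    root = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    lam_max, lam_min = half_trace + root, half_trace - root
    # Eigenvector for lam_max; fall back to a coordinate axis if already diagonal.
    if abs(sxy) > 1e-12:
        vx, vy = lam_max - syy, sxy
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return lam_max, lam_min, (vx / norm, vy / norm)

def reconstruction_error(points, mean, axis):
    """Mean squared error after keeping only the coefficient along `axis`."""
    total = 0.0
    for px, py in points:
        dx, dy = px - mean[0], py - mean[1]
        coeff = dx * axis[0] + dy * axis[1]        # transform, keep one coefficient
        rx, ry = coeff * axis[0], coeff * axis[1]  # map back to the original space
        total += (dx - rx) ** 2 + (dy - ry) ** 2
    return total / len(points)

POINTS = [(2.0, 1.9), (0.0, 0.1), (-2.0, -2.1), (1.0, 1.2), (-1.0, -1.1)]
MEAN, COV = covariance_2d(POINTS)
LAM_MAX, LAM_MIN, AXIS = principal_axis(COV)
PCA_ERROR = reconstruction_error(POINTS, MEAN, AXIS)
X_AXIS_ERROR = reconstruction_error(POINTS, MEAN, (1.0, 0.0))
```

Keeping the top eigenvector leaves a reconstruction error equal to the discarded eigenvalue, which is smaller than the error from keeping an arbitrary axis such as the x axis.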

Alternative transformations

defining a sparse representation as one where the coefficients are close to zero most of the time (high kurtosis)

Olshausen and Field have shown that if we add a sparseness constraint to PCA (with $T(x) = \Phi x$)

$\Phi^*_k = \arg\min_\Phi E[\| x - \Phi^{-1} Q_k \Phi x \|^2 + \lambda S(Q_k \Phi x)]$

where $S(x_1, \ldots, x_n) = \sum_i |x_i|$, $\lambda$ a Lagrange multiplier

the resulting basis functions are remarkably similar to the receptive fields of the cells found in V1.
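To make "close to zero most of the time (high kurtosis)" concrete, the sketch below compares the sample excess kurtosis of a mostly-zero coefficient sequence with that of a sequence that is never zero. Both sequences are hypothetical, and excess kurtosis is used as one common sparseness measure.

```python
def excess_kurtosis(values):
    """Sample excess kurtosis m4 / m2^2 - 3 (zero for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0

# Two hypothetical coefficient sequences.
SPARSE = [0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, -4.0]     # mostly zero
DENSE = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # never zero
```

The sparse sequence has positive excess kurtosis, while the dense one is negative.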

Basis functions

[Figure: basis functions]

In practice

early stages of vision: dimensionality reduction, but subject to “efficiency” constraints

sparse representations are computationally intensive

can be reasonably approximated by wavelets

we have obtained good results even with the DCT

in summary, this indicates it is possible to have feature transformations that:
– achieve a good balance between invertibility and dimensionality reduction
– capture the most important aspects of early human vision
– have reduced complexity

work needed to find the best transformation
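A rough sketch of the DCT used as the feature transformation $T$, followed by truncation to the first $k$ coefficients and inversion. It uses a 1-D orthonormal DCT-II written out directly; the 8-sample row of intensities and the helper names are hypothetical, and a real system would work on 2-D image blocks.

```python
import math

def dct(signal):
    """Orthonormal DCT-II of a 1-D signal."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(signal)))
    return out

def idct(coeffs):
    """Inverse of the orthonormal DCT-II."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = coeffs[0] * math.sqrt(1.0 / n)
        total += sum(c * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                     for k, c in enumerate(coeffs) if k > 0)
        out.append(total)
    return out

def truncate(coeffs, keep):
    """Keep the first `keep` coefficients and zero the rest."""
    return coeffs[:keep] + [0.0] * (len(coeffs) - keep)

def squared_error(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical 8-sample row of pixel intensities.
ROW = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
ERRORS = [squared_error(ROW, idct(truncate(dct(ROW), k))) for k in range(1, 9)]
```

Because the transform is orthonormal, the reconstruction error never increases as more coefficients are kept and vanishes when all are kept. That is the sense in which truncation trades dimensionality against invertibility.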

Invariance properties

Lemma: the restriction of a Gaussian mixture to a linear subspace is still a Gaussian mixture

Gaussian mixture on a multi-resolution feature space:
– family of embedded densities over multiple image scales
– each dimension adds higher resolution information
– DC only = histogram
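The lemma can be illustrated numerically if "restriction" is read as marginalization onto the first coordinates, which is an interpretation rather than something the slide states. Under that reading, the restricted mixture keeps the weights and truncates each mean and covariance. The 2-D mixture below is hypothetical.

```python
import math

def gaussian_1d(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gaussian_2d(x, y, mean, cov):
    """Density of a 2-D Gaussian; cov = (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mean[0], y - mean[1]
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def restrict_to_first_dim(mixture):
    """Mixture over the first coordinate: same weights, truncated means and covariances."""
    return [(w, mean[0], cov[0]) for w, mean, cov in mixture]

# Hypothetical 2-D mixture: (weight, mean, (sxx, sxy, syy)) per component.
MIXTURE = [(0.6, (0.0, 0.0), (1.0, 0.5, 2.0)), (0.4, (3.0, 1.0), (0.5, -0.2, 1.0))]

def marginal_numeric(x, step=0.01, lo=-12.0, hi=12.0):
    """Integrate the 2-D mixture density over the second coordinate (Riemann sum)."""
    n = int(round((hi - lo) / step))
    return step * sum(w * gaussian_2d(x, lo + j * step, mean, cov)
                      for j in range(n) for w, mean, cov in MIXTURE)

def marginal_closed_form(x):
    """Density of the restricted 1-D mixture at x."""
    return sum(w * gaussian_1d(x, m, v) for w, m, v in restrict_to_first_dim(MIXTURE))
```

The numerically integrated marginal and the truncated-parameter mixture agree to within integration error. This is why a single high-dimensional mixture yields a whole family of embedded lower-dimensional densities.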

Embedded multi-resolution mixture

explicit control over the trade-off between “invariant” and “invertible” (low Bayes error)

[Figure: scale ranging from invariant to invertible]

Impact on retrieval accuracy

overall, the embedded multi-resolution mixture (EMM) representation:
– extends histogram: accounts for spatial dependencies
– extends Gaussian: expressive power to capture density details

combines good properties of color and texture-based approaches

precision: % of retrieved images that are relevant to the query

recall: % of relevant images that are retrieved
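The two accuracy measures can be computed as follows. The image identifiers are hypothetical, and the values are returned as fractions rather than percentages.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against the set of relevant items."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical image identifiers.
RETRIEVED = ["img03", "img07", "img11", "img20"]
RELEVANT = ["img03", "img11", "img15", "img16", "img21"]
```

Here two of the four retrieved images are relevant (precision 0.5) and two of the five relevant images were retrieved (recall 0.4).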

Retrieval results

comparison: Corel DB
– 1500 images, 15 classes

methods:
– MRSAR + MD (texture)
– histogram intersection (color)
– color correlograms (both)
– DCT + Gaussian mixtures + ML

Bayesian retrieval with embedded mixtures is clearly superior: up to 10% better than the next best method (correlogram)
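For reference, a minimal sketch of the histogram intersection baseline, assuming both histograms are normalized to sum to one so that the score is the sum of bin-wise minima. The 4-bin histograms are made up, and this is not the evaluation code behind the reported results.

```python
def normalize(hist):
    """Scale a histogram of counts so that its bins sum to one."""
    total = float(sum(hist))
    return [h / total for h in hist]

def histogram_intersection(query, model):
    """Sum of bin-wise minima of two normalized histograms (1.0 means identical)."""
    return sum(min(q, m) for q, m in zip(normalize(query), normalize(model)))

def rank_by_intersection(query, database):
    """Database indices sorted from most to least similar to the query."""
    return sorted(range(len(database)),
                  key=lambda i: histogram_intersection(query, database[i]),
                  reverse=True)

# Hypothetical 4-bin color histograms (pixel counts per bin).
QUERY_HIST = [40, 30, 20, 10]
DATABASE_HISTS = [[10, 20, 30, 40], [38, 32, 18, 12], [25, 25, 25, 25]]
```

The score depends only on per-bin counts, so it ignores where in the image each color occurs.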

Conclusions

Probabilistic architecture for image similarity
– decision-theoretic formulation
– unifying view of similarity
– optimal guidelines for feature transformation and representation
– DCT + Gaussian mixtures works well across various types of databases

Object recognition

Bayesian + embedded multi-resolution mixture:

Color histograms + Histogram Intersection (Swain & Ballard):

Texture recognition

Bayesian + embedded multi-resolution mixture:

MRSAR model + Mahalanobis distance (Mao & Jain):