A decision-theoretic view of image retrieval
Nuno Vasconcelos
Compaq Computer Corporation, Cambridge Research Lab
http://www.media.mit.edu/~nuno

Content-based retrieval
allow users to express queries directly in the visual domain
– user provides query image
– system extracts low-level features (texture, color, shape)
– signature compared with those extracted from the database
– top matches returned
[Figure: "horses" example with panels for texture similarity, color similarity, and shape similarity]

Retrieval architecture
three main components
– feature transformation
– feature representation
– similarity function
previous solutions have concentrated on some components
two main strategies:
– texture: features
– color: representation
need: criteria to guide the design of all components
[Figure: retrieval architecture diagram with a "=?" comparison step]

Decision-theoretic formulation
given: feature space $\mathcal{X}$ and set $\mathcal{Y} = \{1, \ldots, C\}$ of classes
goal: design a map $g^*: \mathcal{X} \to \mathcal{Y}$ that minimizes the probability of retrieval error
$g^* = \arg\min_g P_{x,y}(g(x) \neq y)$
the Bayes classifier is optimal
$g^*(x) = \arg\max_i P(y = i \mid x) = \arg\max_i P(x \mid y = i)\, P(y = i)$
establishes an optimal criterion for image similarity
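A minimal sketch of the Bayes decision rule, assuming a discrete feature with three values and two classes. The table values and the helper name `bayes_decision` are hypothetical and only illustrate the formula; they are not taken from the slides.

```python
import numpy as np

def bayes_decision(likelihoods, priors):
    """Return g*(x) = argmax_i P(x | y=i) P(y=i) for each discrete value x."""
    joint = likelihoods * priors[:, None]   # P(x, y=i), shape (classes, values)
    return joint.argmax(axis=0)

# Hypothetical class-conditional likelihoods P(x | y=i); each row sums to 1.
likelihoods = np.array([[0.60, 0.30, 0.10],
                        [0.20, 0.35, 0.45]])
priors = np.array([0.3, 0.7])               # hypothetical P(y=i)

decisions = bayes_decision(likelihoods, priors)
joint = likelihoods * priors[:, None]
# Probability of error of the Bayes rule: 1 - sum_x max_i P(x, y=i).
error = 1.0 - joint.max(axis=0).sum()
print(decisions, error)
```

In a retrieval setting each class would correspond to a database image (or image class) and x to the query features.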
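
A unified view of image similarity
[Diagram: Bayes leads to Bhattacharyya (2-way bound) and to ML (equal priors); ML leads to Kullback-Leibler (large, iid query); Kullback-Leibler leads to $\chi^2$ (linearization) and, under the labels Gaussian, orthogonal, $\Sigma_q = \Sigma_i$, and $\Sigma_i = I$, to Quadratic, Mahalanobis, and Euclidean]
Bayes: $g(x) = \arg\max_i P(y = i \mid x)$
Bhattacharyya: $g(x) = \arg\min_i \, -\log \int \sqrt{P(x \mid q)\, P(x \mid y = i)}\, dx$
ML: $g(x) = \arg\max_i P(x \mid y = i)$
Kullback-Leibler: $g(x) = \arg\min_i \int P(x \mid q) \log \frac{P(x \mid q)}{P(x \mid y = i)}\, dx$
$\chi^2$: $g(x) = \arg\min_i \int \frac{\left(P(x \mid q) - P(x \mid y = i)\right)^2}{P(x \mid y = i)}\, dx$
Quadratic: $g(x) = \arg\min_i E\left[(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right]$
Mahalanobis: $g(x) = \arg\min_i (\mu_q - \mu_i)^T \Sigma_i^{-1} (\mu_q - \mu_i)$
Euclidean: $g(x) = \arg\min_i (\mu_q - \mu_i)^T (\mu_q - \mu_i)$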
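One link in this chain can be checked numerically: for an iid query, ranking classes by average log-likelihood (ML) is the same as ranking them by Kullback-Leibler divergence from the empirical query distribution. The sketch below assumes discrete (histogram) densities; the database, the query, and the function names are hypothetical.

```python
import numpy as np

def avg_log_likelihood(samples, p):
    """Average log-likelihood of integer-valued samples under histogram p."""
    return float(np.mean(np.log(p[samples])))

def kl_divergence(q, p):
    """KL(q || p) for discrete distributions, with 0 * log 0 taken as 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

rng = np.random.default_rng(0)
n_bins = 8
# Hypothetical database: three strictly positive class histograms.
classes = rng.dirichlet(np.ones(n_bins), size=3)
# Hypothetical query: 500 iid samples drawn from class 1.
samples = rng.choice(n_bins, size=500, p=classes[1])
q_hat = np.bincount(samples, minlength=n_bins) / samples.size

ml_scores = [avg_log_likelihood(samples, p) for p in classes]
kl_scores = [kl_divergence(q_hat, p) for p in classes]
ml_choice = int(np.argmax(ml_scores))
kl_choice = int(np.argmin(kl_scores))
print(ml_choice, kl_choice)
```

The two scores differ only by the entropy of the empirical query histogram, which does not depend on the class, so the two decisions coincide.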

Feature transformation
probability of error is lower bounded by the Bayes error:
$L^* = 1 - E_x\left[\max_i P(y = i \mid x)\right]$
Theorem: for a retrieval system with observation space $\mathcal{Z}$ and a feature transformation $T: \mathcal{Z} \to \mathcal{X}$, the Bayes error on $\mathcal{X}$ can never be smaller than that on $\mathcal{Z}$. Equality is achieved if and only if T is invertible.
$L^*_{\mathcal{X}} = L^*_{\mathcal{Z}}$ if T is invertible, $L^*_{\mathcal{X}} \geq L^*_{\mathcal{Z}}$ otherwise
suggests that emphasis on features is a bad idea
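A small numerical illustration of the theorem, assuming a discrete observation space with four values and two equiprobable classes. The probability table, the two mappings, and the helper names are hypothetical.

```python
import numpy as np

def bayes_error(joint):
    """Bayes error 1 - sum_x max_i P(x, y=i) for a table of shape (classes, values)."""
    return 1.0 - float(joint.max(axis=0).sum())

def transform_joint(joint, mapping, n_out):
    """Joint distribution of (y, T(z)) when T sends observation value z to mapping[z]."""
    out = np.zeros((joint.shape[0], n_out))
    for z, x in enumerate(mapping):
        out[:, x] += joint[:, z]
    return out

# Hypothetical likelihoods P(z | y=i) over four observation values, equal priors.
likelihoods = np.array([[0.7, 0.1, 0.1, 0.1],
                        [0.1, 0.1, 0.1, 0.7]])
joint_z = 0.5 * likelihoods

permutation = [2, 0, 3, 1]   # invertible T: relabels the four values
merge = [0, 1, 1, 0]         # non-invertible T: collapses the two informative values

err_z = bayes_error(joint_z)
err_perm = bayes_error(transform_joint(joint_z, permutation, 4))
err_merge = bayes_error(transform_joint(joint_z, merge, 2))
print(err_z, err_perm, err_merge)
```

The invertible relabeling leaves the Bayes error unchanged, while the merging transformation discards exactly the values that separate the classes and raises it.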

Feature representation
Theorem: for a retrieval system with class probabilities $P(y = i)$ and class-likelihood functions $P(x \mid y = i)$, and a decision function based on the estimates $\tilde{p}$
$g(x) = \arg\max_i \tilde{p}(x \mid y = i)\, \tilde{p}(y = i)$
the difference between real and Bayes error is upper bounded by the L1 distance between real and estimated probabilities
$\underbrace{P(g(x) \neq y) - L^*}_{\text{estimation error}} \leq \sum_i \int \left| P(x \mid y = i)\, P(y = i) - \tilde{p}(x \mid y = i)\, \tilde{p}(y = i) \right| dx$
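The bound can be checked on a toy problem. The sketch below assumes a discrete feature with three values and two classes; both probability tables are hypothetical, and the integral becomes a sum over the table entries.

```python
import numpy as np

def error_probability(joint, decisions):
    """P(g(x) != y) for a joint table P(x, y=i) of shape (classes, values)."""
    columns = np.arange(joint.shape[1])
    return 1.0 - float(joint[decisions, columns].sum())

# Hypothetical true joint P(x | y=i) P(y=i) and an imperfect estimate of it.
true_joint = np.array([[0.30, 0.15, 0.05],
                       [0.05, 0.20, 0.25]])
est_joint = np.array([[0.25, 0.25, 0.05],
                      [0.10, 0.15, 0.20]])

bayes_decisions = true_joint.argmax(axis=0)    # uses the true probabilities
plugin_decisions = est_joint.argmax(axis=0)    # uses the estimates

bayes_err = error_probability(true_joint, bayes_decisions)
actual_err = error_probability(true_joint, plugin_decisions)
gap = actual_err - bayes_err                   # estimation error
l1_distance = float(np.abs(true_joint - est_joint).sum())
print(gap, l1_distance)
```

Here the estimate flips the decision for one feature value, so the actual error exceeds the Bayes error, but the excess stays well below the L1 distance between the two tables.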

Feature representation
distance between actual and ideal probability of error (estimation error) is upper bounded by a function of the quality of density estimates
this means:
– good estimation is a sufficient condition for accurate retrieval
– from the theoretical viewpoint, no reason for features
caveat: estimation is difficult in high dimensions
[Figure: $L^*$, $P(g(x) \neq y)$, and $\max P(g(x) \neq y)$, with the estimation error marked]

Color (estimation)-based retrieval
no features, emphasis on representation (histograms)
problem: low-order statistics are not sufficient
spatial neighborhoods lead to high dimensionality
[Figure: Hist.eps]

Summary
low Bayes error: avoid features
good image discrimination:
– requires high dimensional spaces
– estimation is difficult in high dimensions
– can lead to large estimation error
fundamental trade-off of image retrieval:
– a feature transformation will increase Bayes error but can also reduce estimation error
the two components have to be considered simultaneously!
Bayes error ≤ error ≤ Bayes error + estimation error

Example: texture recognition
emphasis: discriminant features
– simple representation () and similarity function (Mahalanobis distance, MD)
– years of research on “good” features, e.g. MRSAR
– problem: discriminant for texture but not generic
– can we get similar performance with a generic transform?
– for Bayesian retrieval the features are not so important

Designing retrieval systems
the retrieval trade-off:
– low Bayes error: invertible feature transformation
– low estimation error: expressive feature representation & low-dimensional feature space
directive 1: get the most expressive representation you can afford!
directive 2: role for feature transform is dimensionality reduction
– images live on a low-dimensional manifold embedded in high dimensional space
– feature transformation should eliminate unnecessary dimensions
– while staying as close to invertible as possible

Feature representation
among expressive models (kernel estimators) we like Gaussian mixtures
$P(x \mid y = i) = \sum_k \frac{\pi_{ik}}{\sqrt{(2\pi)^n |\Sigma_{ik}|}}\, e^{-\frac{1}{2}(x - \mu_{ik})^T \Sigma_{ik}^{-1} (x - \mu_{ik})}$
because they are:
– compact (computational efficiency)
– able to capture details of multi-modal densities (histogram)
– computationally tractable in high dimensions (Gaussian)
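A minimal sketch of how the mixture likelihood could be evaluated and used for ML retrieval, assuming full covariance matrices and a query given as a set of feature vectors. The function names and the two toy models are hypothetical; fitting the mixtures (for example with EM) is not shown.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, covs):
    """log P(x | y=i) for one point x under a Gaussian mixture with full covariances."""
    n = x.shape[0]
    log_terms = []
    for w, mu, cov in zip(weights, means, covs):
        diff = x - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = diff @ np.linalg.solve(cov, diff)
        log_terms.append(np.log(w) - 0.5 * (n * np.log(2 * np.pi) + logdet + maha))
    log_terms = np.array(log_terms)
    peak = log_terms.max()                      # log-sum-exp for numerical stability
    return float(peak + np.log(np.exp(log_terms - peak).sum()))

def ml_retrieve(query, models):
    """Index of the mixture with the highest total log-likelihood over the query vectors."""
    scores = [sum(gmm_log_likelihood(x, *model) for x in query) for model in models]
    return int(np.argmax(scores))

# Two hypothetical one-component models in 2-D: (weights, means, covariances).
model_a = (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([np.eye(2)]))
model_b = (np.array([1.0]), np.array([[5.0, 5.0]]), np.array([np.eye(2)]))

query = np.array([[5.0, 5.0], [4.5, 5.2]])
best = ml_retrieve(query, [model_a, model_b])
origin_loglik = gmm_log_likelihood(np.zeros(2), *model_a)
print(best, origin_loglik)
```

Summing log-likelihoods over the query vectors treats them as iid samples, which is the assumption behind the ML criterion in the unified view.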

Feature transformation
dimensionality reduction has been thoroughly studied in the compression literature
“close to invertible” = minimum reconstruction error
$\min_T E\left[\| x - T^{-1} Q_i T x \|^2\right]$, where $Q_i(x_1, \ldots, x_i, \ldots, x_d) = (x_1, \ldots, x_i, 0, \ldots, 0)$
[Figure: pipeline $T$, $Q_3$, $T^{-1}$]

Optimal transformation
optimal solution (squared error sense): principal component analysis
for $T(x) = \Phi x$
$\Phi^*_k = \arg\min_{\Phi} E\left[\| x - \Phi^{-1} Q_k \Phi x \|^2\right]$
iff $\Phi^*_k = [v_1, \ldots, v_k]$, $v_i$ = i-th eigenvector of $\Sigma_x$, with eigenvalues ordered $\lambda_1 \geq \ldots \geq \lambda_n$
problems:
– squared error is not Bayes error
– PCA does not mimic early human vision well
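A sketch of the PCA solution on synthetic data, assuming an orthonormal basis stored as columns so that the forward transform is a multiplication by the transposed basis and the inverse is a multiplication by the basis itself. The data, the random comparison basis, and the function names are hypothetical.

```python
import numpy as np

def pca_basis(samples):
    """Eigenvalues and eigenvectors of the sample covariance, by decreasing eigenvalue."""
    centered = samples - samples.mean(axis=0)
    cov = centered.T @ centered / (len(samples) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def truncation_error(samples, basis, k):
    """Squared reconstruction error, divided by N - 1, when only k coefficients are kept."""
    centered = samples - samples.mean(axis=0)
    coeffs = centered @ basis                   # forward transform T
    coeffs[:, k:] = 0.0                         # Q_k: zero all but the first k coefficients
    residual = centered - coeffs @ basis.T      # inverse transform and difference
    return float((residual ** 2).sum() / (len(samples) - 1))

rng = np.random.default_rng(0)
mixing = rng.normal(size=(5, 5))
samples = rng.normal(size=(200, 5)) @ mixing    # hypothetical correlated data

eigvals, basis = pca_basis(samples)
random_basis, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # arbitrary orthonormal basis

k = 2
pca_err = truncation_error(samples, basis, k)
rand_err = truncation_error(samples, random_basis, k)
print(pca_err, rand_err)
```

With this normalization the PCA error equals the sum of the discarded eigenvalues, and no other orthonormal basis truncated to k coefficients does better.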

Alternative transformations
defining a sparse representation as one where the coefficients are close to zero most of the time (high kurtosis)
Olshausen and Field have shown that if we add a sparseness constraint to PCA
$\Phi^*_k = \arg\min_{\Phi} E\left[\| x - \Phi^{-1} Q_k \Phi x \|^2 + \lambda S(Q_k \Phi x)\right]$
where $S(x_1, \ldots, x_n)$ is a sparseness penalty summed over the coefficients $x_i$ and $\lambda$ is a Lagrange multiplier
the resulting basis functions are remarkably similar to the receptive fields of the cells found in V1.

Basis functions
[Figure: basis functions]

In practice
early stages of vision: dimensionality reduction, but subject to “efficiency” constraints
sparse representations are computationally intensive
can be reasonably approximated by wavelets
we have obtained good results even with the DCT
in summary, this indicates it is possible to have feature transformations that:
– achieve good balance between invertibility and dim. reduction
– capture the most important aspects of early human vision
– have reduced complexity
work needed to find the best transformation
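To make the DCT option concrete, here is a sketch of a block DCT used as a feature transformation. It assumes 8x8 image blocks and assumes that dimensionality reduction keeps the low-frequency corner of the coefficient matrix; the slides do not specify either choice, and the function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ v is the DCT of v and C.T @ c inverts it."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def dct_features(block, keep):
    """2-D DCT of a square block, truncated to the keep x keep low-frequency corner."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T
    return coeffs[:keep, :keep]

def reconstruct(features, n):
    """Zero-pad the kept coefficients to n x n and apply the inverse 2-D DCT."""
    c = dct_matrix(n)
    coeffs = np.zeros((n, n))
    k = features.shape[0]
    coeffs[:k, :k] = features
    return c.T @ coeffs @ c

# Hypothetical smooth 8x8 block (a diagonal intensity ramp).
block = np.add.outer(np.arange(8.0), np.arange(8.0))

full = dct_features(block, 8)
low = dct_features(block, 4)
err_full = float(((block - reconstruct(full, 8)) ** 2).sum())
err_low = float(((block - reconstruct(low, 8)) ** 2).sum())
discarded_energy = float((full ** 2).sum() - (low ** 2).sum())
print(err_full, err_low, discarded_energy)
```

Because the transform is orthonormal, keeping all coefficients is exactly invertible, and the reconstruction error after truncation equals the energy of the discarded coefficients, which is small for smooth blocks.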

Invariance properties
Lemma: restriction of a Gaussian mixture to a linear subspace is still a Gaussian mixture
Gaussian mixture on a multi-resolution feature space:
– family of embedded densities over multiple image scales
– each dimension adds higher resolution information
– DC only = histogram
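The lemma can be illustrated for the simplest case, a coordinate subspace, where restricting the mixture means marginalizing out the remaining coordinates. The sketch below assumes this reading; the mixture parameters and function names are hypothetical. It compares the truncated mixture with a numerical integration of the full density.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian at point x."""
    n = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return float(np.exp(-0.5 * (n * np.log(2 * np.pi) + logdet + maha)))

def mixture_pdf(x, weights, means, covs):
    """Density of a Gaussian mixture at point x."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))

def restrict_mixture(weights, means, covs, k):
    """Mixture over the first k coordinates: same weights, truncated means and covariances."""
    return weights, means[:, :k], covs[:, :k, :k]

# Hypothetical two-component mixture in 2-D.
weights = np.array([0.4, 0.6])
means = np.array([[0.0, 0.0], [2.0, 1.0]])
covs = np.array([[[1.0, 0.5], [0.5, 2.0]],
                 [[0.5, -0.2], [-0.2, 1.0]]])

x1 = 0.7
marginal = mixture_pdf(np.array([x1]), *restrict_mixture(weights, means, covs, 1))

# Numerical check: integrate the 2-D density over the second coordinate.
step = 0.01
grid = np.arange(-15.0, 15.0, step)
numeric = step * sum(mixture_pdf(np.array([x1, t]), weights, means, covs) for t in grid)
print(marginal, numeric)
```

This is what makes the embedded family cheap to obtain: once the mixture is fitted in the full feature space, the lower-resolution densities follow by dropping rows and columns of the parameters.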

Embedded multi-resolution mixture
explicit control over trade-off between “invariant” and “invertible” (low Bayes error)
[Figure: scale running from invariant to invertible]

Impact on retrieval accuracy
overall, the EMM representation:
– extends histogram: accounts for spatial dependencies
– extends Gaussian: expressive power to capture density details
combines good properties of color and texture-based approaches
precision: % of retrieved that are relevant to query
recall: % of relevant that are retrieved
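The two measures can be written down directly. A minimal sketch, assuming retrieved items are unique identifiers and returning fractions rather than percentages; the function name and the example identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved items and a set of relevant ones."""
    relevant_set = set(relevant)
    hits = sum(1 for item in retrieved if item in relevant_set)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical query: four images returned, three images in the database are relevant.
precision, recall = precision_recall([1, 2, 3, 4], {2, 4, 6})
print(precision, recall)
```

Two of the four returned images are relevant (precision 0.5) and two of the three relevant images are returned (recall about 0.67).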

Retrieval results
Bayesian retrieval with embedded mixtures is clearly superior: up to 10% better than the next best method (correlogram)
comparison: Corel DB, 1500 images, 15 classes
methods:
– MRSAR+MD (texture)
– histogram intersection (color)
– color correlograms (both)
– DCT + Gaussian mixtures + ML

Conclusions
Probabilistic architecture for image similarity
decision-theoretic formulation
unifying view of similarity
optimal guidelines for feature transformation and representation
DCT + Gaussian mixtures works well across various types of databases

Object recognition
Bayesian + embedded multi-resolution mixture: [Figure]
Color histograms + Histogram Intersection (Swain & Ballard): [Figure]

Texture recognition
Bayesian + embedded multi-resolution mixture: [Figure]
MRSAR model + Mahalanobis distance (Mao & Jain): [Figure]