View
56
Download
2
Category
Tags:
Preview:
DESCRIPTION
(Final Version). Simultaneous Image Classification and Annotation. Chong Wang, David Blei, Li Fei-Fei Computer Science Department Princeton University Published in CVPR 2009 Presented by Eric Wang 7-3-09. Outline. Introduction Review of sLDA and Corr -LDA Model description - PowerPoint PPT Presentation
Citation preview
Simultaneous Image Classification and Annotation
Chong Wang, David Blei, Li Fei-FeiComputer Science Department
Princeton University
Published in CVPR 2009
Presented by Eric Wang7-3-09
(Final Version)
Outline• Introduction
• Review of sLDA and Corr-LDA
• Model description
• Model Inference and Parameter Estimation
• Empirical Results
• Conclusion
Introduction• Images Classification refers to assigning a class label to each
image which globally describes the image.
• Image Annotation refers to assigning words which describe individual regions of the image.
• The images considered in this paper are both classified with a class label and annotated with free text.
• This paper will combine the basic framework of Corr-LDA, and a highly modified version of Supervised LDA (sLDA) to yield a model which simultaneously classifies images and annotates the individual regions.
Review of sLDA• For each document
• In this model, the response variable y is a continuous random variable.
• are treated as unknown constants to be estimated, rather than as random variables.Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.
Review of sLDA• An application of sLDA considered by Blei et. al was
regressing a corpus of textual movie reviews to number of stars given.
Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.
Review of Corr-LDA• For each document
• Corr-LDA is a simple extension of LDA for images to annotate image regions.
Source: D. M. Blei and M. I. Jordan. Modeling annotated data. In SIGIR, 2003.
Annotated sLDA Model
• This step is identical to LDA, the topic proportions are drawn once per document.
• In this paper, is not optimized.
Annotated sLDA Model
• A region is characterized by one of 240 codewords (quantized from 128 dimensional SIFT features).
• Regions are found by segmenting images using the N-cuts algorithm.
• parameterizes a particular multinomial distribution (topic) over the quantized codewords.
Annotated sLDA Model
• The class label c is completely determined by the topic indicators z_{1:N} using a modified sLDA framework.
• The total number of classes is known a priori and the class indicators are treated separately from the annotations. This is a simpler approach than the one taken by L.J. Li, R. Socher and L. Fei-Fei in that there is no “switch” variable which determines whether a word is an annotation or label.
• The softmax function is well studied and is also known as “multinomial logistic regression”
Annotated sLDA Model
• The annotations are assigned to specific regions in the same manner as in Corr-LDA.
• This will, for example, encourage words such as “blue” and “white” to be associated with regions (and thus, codewords) which capture sky.
• Though not explicitly shown in the graphical model, and have symmetric Dirichlet priors.
Inference of Latent Variables
These updates are identical to those used in Corr-LDA.
• Let parameterize a multinomial over the K topics
• Let parameterize a Dirichlet over topic distributions.
• Let parameterize a multinomial over image regions
• These updates are local to each document (thus the omission of d).
n
m
Inference of Latent Variables
• This equation updates the posterior distribution over topics.
• Note that this update depends on both class label c and the annotation information .mw
Inference of Latent Variables
Parameter Estimation
has no closed form solution and is optimized via conjugate gradient
Updates of codebook word f in codebook topic i (proportional to a constant).
Inference of Latent Variables
Parameter Estimation
has no closed form solution and is optimized via conjugate gradient
Updates annotation word w in annotation topic i (proportional to a constant).
Empirical Results• LableMe dataset
– 8 classes: “highway,” “inside city,” “tall building,” “street,” “forest,” “coast,” “mountain,” and “open country.”
– 200 256x256 training images per class.• UIUC dataset
– 8 types of sports: “badminton,” “bocce,” “croquet,” “polo,” “rockclimbing,” “rowing,” “sailing” and “snowboarding.”
– 1792 256x256 training images.• 240 codeword dictionary.• Annotations which appeared less than 3 times
were removed.
Empirical Results: Classification
• The black line represents of the performance of Bosch et. al 2006, which employs a non-annotated LDA on the image regions and a KNN to classify the images.
• The blue line is the performance of Fei-Fei and Perona 2005, which uses unannotated labeled images
• The models presented in this paper are much more resistant to overfitting than the models of Bosch et. al and Fei-Fei and Perona .
Emperical Results: Classification
• Confusion matrices comparing the performance of multi-class sLDA with annotations and multi-class sLDA using 100 topic models
• Annotations seem to improve performance slightly, although, as the last slide shows, the main benefit is more consistent performance as a function of the number of topics.
Empirical Results: Annotation• The F-Measure is used as a score.• Results are given over all numbers of topics considered
above.
• LabelMe: • 38.2% (corr-LDA) • 38.7% (multi-class sLDA with annotations)
• UIUC-Sport:• 34.7% (corr-LDA) • 35.0% (multiclass sLDA with annotations).
Conclusion• Combining image annotation with classification provides state
of the art image classification performance.
• However, the addition of the classification framework provides only a small improvement to the annotation performance.
• The authors’ primary contribution is showing that image classification and annotation are related and can be conducted simultaneously in the same framework.
• Inference was done in a Variational EM framework.
Recommended