Simultaneous Image Classification and Annotation

Chong Wang, David Blei, Li Fei-FeiComputer Science Department

Princeton University

Published in CVPR 2009

Presented by Eric Wang7-3-09

(Final Version)

Outline• Introduction

• Review of sLDA and Corr-LDA

• Model description

• Model Inference and Parameter Estimation

• Empirical Results

• Conclusion

Introduction• Images Classification refers to assigning a class label to each

image which globally describes the image.

• Image Annotation refers to assigning words which describe individual regions of the image.

• The images considered in this paper are both classified with a class label and annotated with free text.

• This paper will combine the basic framework of Corr-LDA, and a highly modified version of Supervised LDA (sLDA) to yield a model which simultaneously classifies images and annotates the individual regions.

Review of sLDA• For each document

• In this model, the response variable y is a continuous random variable.

• are treated as unknown constants to be estimated, rather than as random variables.Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.

Review of sLDA• An application of sLDA considered by Blei et. al was

regressing a corpus of textual movie reviews to number of stars given.

Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.

Review of Corr-LDA• For each document

• Corr-LDA is a simple extension of LDA for images to annotate image regions.

Source: D. M. Blei and M. I. Jordan. Modeling annotated data. In SIGIR, 2003.

Annotated sLDA Model

• This step is identical to LDA, the topic proportions are drawn once per document.

• In this paper, is not optimized.

• A region is characterized by one of 240 codewords (quantized from 128 dimensional SIFT features).

• Regions are found by segmenting images using the N-cuts algorithm.

• parameterizes a particular multinomial distribution (topic) over the quantized codewords.

• The class label c is completely determined by the topic indicators z_{1:N} using a modified sLDA framework.

• The total number of classes is known a priori and the class indicators are treated separately from the annotations. This is a simpler approach than the one taken by L.J. Li, R. Socher and L. Fei-Fei in that there is no “switch” variable which determines whether a word is an annotation or label.

• The softmax function is well studied and is also known as “multinomial logistic regression”

• The annotations are assigned to specific regions in the same manner as in Corr-LDA.

• This will, for example, encourage words such as “blue” and “white” to be associated with regions (and thus, codewords) which capture sky.

• Though not explicitly shown in the graphical model, and have symmetric Dirichlet priors.

Inference of Latent Variables

These updates are identical to those used in Corr-LDA.

• Let parameterize a multinomial over the K topics

• Let parameterize a Dirichlet over topic distributions.

• Let parameterize a multinomial over image regions

• These updates are local to each document (thus the omission of d).

• This equation updates the posterior distribution over topics.

• Note that this update depends on both class label c and the annotation information .mw

Parameter Estimation

has no closed form solution and is optimized via conjugate gradient

Updates of codebook word f in codebook topic i (proportional to a constant).

Parameter Estimation

has no closed form solution and is optimized via conjugate gradient

Updates annotation word w in annotation topic i (proportional to a constant).

Empirical Results• LableMe dataset

– 8 classes: “highway,” “inside city,” “tall building,” “street,” “forest,” “coast,” “mountain,” and “open country.”

– 200 256x256 training images per class.• UIUC dataset

– 8 types of sports: “badminton,” “bocce,” “croquet,” “polo,” “rockclimbing,” “rowing,” “sailing” and “snowboarding.”

– 1792 256x256 training images.• 240 codeword dictionary.• Annotations which appeared less than 3 times

were removed.

Empirical Results: Classification

• The black line represents of the performance of Bosch et. al 2006, which employs a non-annotated LDA on the image regions and a KNN to classify the images.

• The blue line is the performance of Fei-Fei and Perona 2005, which uses unannotated labeled images

• The models presented in this paper are much more resistant to overfitting than the models of Bosch et. al and Fei-Fei and Perona .

Emperical Results: Classification

• Confusion matrices comparing the performance of multi-class sLDA with annotations and multi-class sLDA using 100 topic models

• Annotations seem to improve performance slightly, although, as the last slide shows, the main benefit is more consistent performance as a function of the number of topics.

Empirical Results: Annotation• The F-Measure is used as a score.• Results are given over all numbers of topics considered

above.

• LabelMe: • 38.2% (corr-LDA) • 38.7% (multi-class sLDA with annotations)

• UIUC-Sport:• 34.7% (corr-LDA) • 35.0% (multiclass sLDA with annotations).

Conclusion• Combining image annotation with classification provides state

of the art image classification performance.

• However, the addition of the classification framework provides only a small improvement to the annotation performance.

• The authors’ primary contribution is showing that image classification and annotation are related and can be conducted simultaneously in the same framework.

• Inference was done in a Variational EM framework.

Simultaneous Image Classification and Annotation

Documents

Phylogeny, classification and evolution of ladybird ... · Phylogeny, classiﬁcation and evolution of ladybird beetles (Coleoptera: Coccinellidae) based on simultaneous analysis

English PropBank Annotation Guidelinesverbs.colorado.edu/~mpalmer/projects/ace/EPB-annotation...Chapter 1 Verb Annotation Instructions 1.1 PropBank Annotation Goals PropBank is a corpus

Annotation - 碁峰資訊epaper.gotop.com.tw/pdf/A155.pdf · Retention meta-annotation Java annotation annotation class class Java virtual machine annotation class class annotation

Simultaneous Feature Selection and Feature …sreevani_r/Dissertation_260.pdf1 Simultaneous Feature Selection and Feature Extraction for Pattern Classification A dissertation submitted

დიდაჭარის ხეობის ტოპონიმია bolqvadze.pdf · 3 Annotation Research of general theoretical issues of structural and semantic classification

---- TECHNICAL REPORT Security Classification DOCUMENT CONTROL DATA -R & D• (Scurity Classification of title. body of abstract a -d indexing annotation must be entered when the overall

English PropBank Annotation Guidelinesverbs.colorado.edu/propbank/EPB-Annotation-Guidelines.pdfChapter 1 Verb Annotation Instructions 1.1 PropBank Annotation Goals PropBank is a corpus

NeoVision 2 Annotation Guidelines - Welcome to iLab!ilab.usc.edu/neo2/dataset/neovision2-annotation-guidelines.pdf11. Annotation Quality Assurance Quality is assured throughout annotation

NCBI’s Genome Annotation: Overview Incremental processing Re-annotation ( batch ) Post-annotation review Case studies NOTE: limiting discussion to annotation

The PIRSF Protein Classification System as a Basis for Automated UniProt Protein Annotation

An Automated Video Classification and Annotation Using ... · multimedia information retrieval in recent years. Most previous work on multi-modal data retrieval and classification,

DRAFT - Version 1/27/2014 - Annotation Studio · Type your annotation into the Annotation Editor. !!!!! 3.5 Tags and privacy options! Below the annotation ﬁeld in the Annotation

Semi-Supervised Classiﬁcationpages.cs.wisc.edu/~jerryzhu/pub/AERFAI06ssl.pdf · Labels can be hard to get Human annotation is slow, boring Semi-supervised classification. Xiaojin

Formal Model and Conceptual Architecture of the Annotation ...ferro/papers/2004/PhD2004-F.pdfASI Annotation Service Integrator ASM Annotation Storing Manager ATIM Annotation Textual

Classification of hyperspectral urban data using adaptive ... · Classification of hyperspectral urban data using adaptive simultaneous orthogonal matching pursuit Jinyi Zou,a Wei

Image Classification for Automatic Annotation Jianping Fan Department of Computer Science University of North Carolina at Charlotte jfan

Annotation. Traditional genome annotation BLAST Similarities

Annotation consistency using annotation intersections

Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Boom Chameleon: Simultaneous capture of 3D viewpoint ... · annotation information. In effect, these systems constantly record all data streams removing the burden of the user having