SUPERVISED LEARNING OF SEMANTIC CLASSES FOR IMAGE ANNOTATION AND RETRIEVAL
G. Carneiro, A. Chan, P. Moreno, N. Vasconcelos
by: Lukáš Tencer
ECSE626 2012

DESCRIPTION

This is a presentation I gave for ECSE626 "Statistical Computer Vision" at McGill University. It presents a project inspired by the paper "Supervised Learning of Semantic Classes for Image Annotation and Retrieval" (PAMI 2007), covering my implementation of the paper and the results I achieved.

Page 2: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Outline

• Introduction
• Prior techniques
  • Supervised OVA Labeling
  • Unsupervised Labeling
• Methodology
  • Supervised Multiclass Labeling
  • Semantic Distribution Estimation
  • Density Estimation
• Algorithm
  • Learning, Annotation, Retrieval
• Results
  • Quantitative
  • Qualitative
• Conclusion

Page 3: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Introduction

• Task
  • Assign labels to unknown images
  • Retrieve relevant images given labels
• Supervised Learning
  • Learning from labeled training data
  • Training data consist of pairs $\{x_i, l_i\}$, $i = 1, \dots, n$
  • Multiple instance learning
• Semantic Classes
  • Labels representing common concepts (sky, bear, snow…)
• Image Annotation and Retrieval
  • Annotation: given the image D, what labels are present in the image?
  • Retrieval: given the label, what are the top n matching images?

Page 4: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Introduction

Datasets:
• Corel5K – 5000 images, 272 classes
• Corel30K – 30000 images, 1120 classes
• MIRFLICKR – 25000 images, 37 classes
• (PSU) – not available anymore
• ImageCLEF – the CLEF (Cross Language Evaluation Forum) Cross Language Image Retrieval Track
  • Medical Image Retrieval
  • Photo Annotation
  • Plant Identification
  • Wikipedia Retrieval
  • Patent Image Retrieval and Classification

Page 5: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Introduction

[Figure: example images. Corel 5K: Bear; Corel 30K: New Zealand; MIRFLICKR: Urban]

Page 6: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Prior Techniques

Supervised OVA
• Binary decision problem: concept present / absent
• Hidden variable $Y_i$
• Decision rule: $P_{X|Y_i}(X|1)\,P_{Y_i}(1) \geq P_{X|Y_i}(X|0)\,P_{Y_i}(0)$

Unsupervised Learning
• Modeling the dependency between text label and image features, expressed as hidden variable L
• Considering just positive examples, densities for $Y_i = 1$
• $P_{X,W}(x, w) = \sum_{l=1}^{D} P_{X|L}(x|l)\,P_{W|L}(w|l)\,P_L(l)$

[Figure: graphical model with hidden variable L, words W (W1, W2, W3) and features X; example node labels: bear; polar, grizzly; features]

Page 7: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Methodology

Supervised Multiclass Labeling (SML)
• Elements of the semantic vocabulary (W) are explicitly made the semantic classes (L)!
• Random variable W: $W \in \{1, \dots, T\}$, with $W = i$ if and only if $x$ is a sample from concept $w_i$, with class-conditional density $P_{X|W}(x|i)$
• Annotation and retrieval are then easy to do with Bayes' rule:

$$P_{W|X}(i|x) = \frac{P_{X|W}(x|i)\,P_W(i)}{P_X(x)}$$

• Annotation: $i^*(X) = \arg\max_i P_{W|X}(i|X)$
• Retrieval: $j^*(w_i) = \arg\max_j P_{X|W}(X_j|i)$
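To make the annotation rule concrete, the short derivation below spells out why the evidence term can be ignored when picking the best label. It is only a worked restatement of Bayes' rule in the slide's notation, not an additional result from the paper.

```latex
\begin{aligned}
i^*(X) &= \arg\max_i P_{W|X}(i \mid X) \\
       &= \arg\max_i \frac{P_{X|W}(X \mid i)\, P_W(i)}{P_X(X)} \\
       &= \arg\max_i \left[ \log P_{X|W}(X \mid i) + \log P_W(i) \right]
\end{aligned}
```

The last step holds because $P_X(X)$ does not depend on $i$ and the logarithm is monotonic, so only the class-conditional densities and the label priors need to be estimated.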

Page 8: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Methodology

Estimation of Semantic Class Distributions

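• Given $D_i$, the training set of images for class $i$, estimate $P_{X|W}(x|i)$
• Assumption: Gaussian distribution
• How to estimate?
  • Direct estimation
  • Model averaging
  • Naive averaging
• Averaging over the per-image densities:

$$P_{X|W}(x|i) = \frac{1}{D_i} \sum_{l=1}^{D_i} P_{X|L,W}(x|l,i)$$

• GMM model:

$$P_{X|L,W}(x|l,i) = \sum_{k} \pi_{i,l}^{k}\, G(x, \mu_{i,l}^{k}, \Sigma_{i,l}^{k})$$

• Averaged:

$$P_{X|W}(x|i) = \frac{1}{D_i} \sum_{l=1}^{D_i} \sum_{k} \pi_{i,l}^{k}\, G(x, \mu_{i,l}^{k}, \Sigma_{i,l}^{k})$$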
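The averaged density above is itself a mixture that pools every component of every image in the class, with each weight divided by the number of images. A minimal sketch of that pooling, assuming each image model is stored as a tuple `(pi, mu, var)` with diagonal covariances (the storage format and the name `average_mixtures` are assumptions made for illustration):

```python
import numpy as np

def average_mixtures(image_gmms):
    """Pool per-image GMMs into one class mixture.

    image_gmms: list of (pi, mu, var) tuples, one per training image,
    with shapes (K,), (K, d), (K, d). Diagonal covariances are an
    assumption of this sketch.
    """
    n_images = len(image_gmms)
    # Each image contributes with weight 1 / D_i, so its component
    # weights are scaled by the same factor.
    pi = np.concatenate([g[0] for g in image_gmms]) / n_images
    mu = np.vstack([g[1] for g in image_gmms])
    var = np.vstack([g[2] for g in image_gmms])
    return pi, mu, var
```

The pooled model has as many components as all image models combined, so its size grows with the training set; a compact per-label mixture avoids that growth.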

Page 9: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Methodology

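Mixture hierarchies
• First step: get a GMM for each image with regular soft EM

$$P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^{k}\, G(x, \mu_I^{k}, \Sigma_I^{k})$$

• Complete-data likelihood (K mixture components, K = 8 here):

$$P(x, z|\theta) = \prod_{i=1}^{n} \sum_{j=1}^{K} \mathbb{1}(z_i = j)\, \pi_j\, G(x_i; \mu_j, \Sigma_j)$$

• E: $Q(\theta, \theta^{t}) = E_{Z|X,\theta^{t}}[\log F(\theta; X, Z)]$
• M: $\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^{t})$
• EM loop (flowchart): Initialization (Euclidean distance / Mahalanobis distance) → Initial parameter estimate → Expectation → Maximization
• Stop when the maximum of 200 iterations is reached, or when the change in likelihood between $P(x, z|\theta^{t})$ and $P(x, z|\theta^{t+1})$ is too small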
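The soft EM loop for the per-image mixture can be sketched as follows. This is a minimal illustration, not the implementation behind the slides: it assumes diagonal covariances, initializes the means with random data points instead of the distance-based initialization named on the slide, and the name `em_gmm` and the relative tolerance `tol` are hypothetical.

```python
import numpy as np

def log_gauss_diag(x, mu, var):
    """Log N(x; mu, diag(var)) for x:(n,d), mu:(k,d), var:(k,d) -> (n,k)."""
    diff = x[:, None, :] - mu[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / var, axis=2)
                   + np.sum(np.log(2 * np.pi * var), axis=1))

def em_gmm(x, k=8, max_iter=200, tol=1e-6, seed=0):
    """Soft EM for a k-component diagonal GMM; returns (pi, mu, var, loglik)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Assumption: random data points as initial means, shared initial variance.
    mu = x[rng.choice(n, size=k, replace=False)]
    var = np.tile(x.var(axis=0) + 1e-6, (k, 1))
    pi = np.full(k, 1.0 / k)
    prev = -np.inf
    for _ in range(max_iter):
        # E-step: responsibilities r[i, j] = P(z_i = j | x_i, theta_t)
        log_p = np.log(pi) + log_gauss_diag(x, mu, var)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        r = np.exp(log_p - log_norm[:, None])
        # M-step: re-estimate weights, means and variances
        nk = r.sum(axis=0) + 1e-12
        pi = nk / n
        mu = (r.T @ x) / nk[:, None]
        var = (r.T @ x ** 2) / nk[:, None] - mu ** 2 + 1e-6
        # Stop when the change in log-likelihood is too small
        loglik = log_norm.sum()
        if abs(loglik - prev) < tol * abs(loglik):
            break
        prev = loglik
    return pi, mu, var, loglik
```

Working in the log domain with `np.logaddexp.reduce` keeps the responsibilities stable for 192-d feature vectors, where raw Gaussian densities underflow quickly.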

Page 10: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Methodology

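Mixture hierarchies for label
• Second step: get an HGMM for each label

$$P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^{k}\, G(x, \mu_w^{k}, \Sigma_w^{k})$$

• Complete-data likelihood (K mixture components, K = 64 here):

$$P(x, z|\theta) = \prod_{i=1}^{n} \sum_{j=1}^{K} \mathbb{1}(z_i = j)\, \pi_j\, G(x_i; \mu_j, \Sigma_j)$$

• E: $Q(\theta, \theta^{t}) = E_{Z|X,\theta^{t}}[\log F(\theta; X, Z)]$
• M: $\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^{t})$
• EM loop (flowchart): Initialization (Bhattacharyya distance) → Initial parameter estimate → Expectation → Maximization
• Stop when the maximum of 200 iterations is reached, or when the change in likelihood between $P(x, z|\theta^{t})$ and $P(x, z|\theta^{t+1})$ is too small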

Page 11: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

E and M step for HGMM

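Input: $\{\pi_j^{k}, \mu_j^{k}, \Sigma_j^{k}\}$, $j = 1, \dots, D_i$, $k = 1, \dots, K$

Output: $\{\pi_c^{m}, \mu_c^{m}, \Sigma_c^{m}\}$, $m = 1, \dots, M$

E-step:

$$h_{jk}^{m} = \frac{\left[ G(\mu_j^{k}, \mu_c^{m}, \Sigma_c^{m})\, e^{-\frac{1}{2} \mathrm{trace}\{(\Sigma_c^{m})^{-1} \Sigma_j^{k}\}} \right]^{\pi_j^{k} N} \pi_c^{m}}{\sum_{l} \left[ G(\mu_j^{k}, \mu_c^{l}, \Sigma_c^{l})\, e^{-\frac{1}{2} \mathrm{trace}\{(\Sigma_c^{l})^{-1} \Sigma_j^{k}\}} \right]^{\pi_j^{k} N} \pi_c^{l}}$$

M-step:

$$(\pi_c^{m})^{new} = \frac{\sum_{jk} h_{jk}^{m}}{D_i K}$$

$$(\mu_c^{m})^{new} = \sum_{jk} w_{jk}^{m}\, \mu_j^{k}, \quad \text{where } w_{jk}^{m} = \frac{h_{jk}^{m}\, \pi_j^{k}}{\sum_{jk} h_{jk}^{m}\, \pi_j^{k}}$$

$$(\Sigma_c^{m})^{new} = \sum_{jk} w_{jk}^{m} \left[ \Sigma_j^{k} + (\mu_j^{k} - \mu_c^{m})(\mu_j^{k} - \mu_c^{m})^{T} \right]$$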
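One iteration of the E and M steps for the hierarchical mixture can be sketched as below. This is an illustration under stated assumptions rather than the code used for the results: covariances are diagonal, the image-level components of all images of a label are pooled into flat arrays, the new mean is used inside the covariance update, and the names `hem_step` and `n_virtual` (for N in the exponent) are hypothetical.

```python
import numpy as np

def hem_step(pi_j, mu_j, var_j, pi_c, mu_c, var_c, n_virtual=1.0):
    """One hierarchical EM iteration with diagonal covariances.

    pi_j, mu_j, var_j: pooled image-level components, shapes (P,), (P,d), (P,d),
                       where P = D_i * K.
    pi_c, mu_c, var_c: current label-level components, shapes (M,), (M,d), (M,d).
    """
    # E-step, in the log domain:
    # log of [G(mu_j; mu_c, var_c) * exp(-0.5 trace(var_c^-1 var_j))]^(pi_j N) * pi_c
    diff = mu_j[:, None, :] - mu_c[None, :, :]                      # (P, M, d)
    log_g = -0.5 * (np.sum(diff ** 2 / var_c, axis=2)
                    + np.sum(np.log(2 * np.pi * var_c), axis=1))    # (P, M)
    trace = np.sum(var_j[:, None, :] / var_c[None, :, :], axis=2)   # (P, M)
    log_h = (pi_j[:, None] * n_virtual) * (log_g - 0.5 * trace) + np.log(pi_c)
    log_h -= np.logaddexp.reduce(log_h, axis=1, keepdims=True)
    h = np.exp(log_h)                                               # rows sum to 1

    # M-step
    new_pi = h.sum(axis=0) / len(pi_j)                              # sum_jk h / (D_i K)
    w = h * pi_j[:, None]
    w /= w.sum(axis=0, keepdims=True) + 1e-300                      # normalize per m
    new_mu = w.T @ mu_j                                             # (M, d)
    spread = (mu_j[:, None, :] - new_mu[None, :, :]) ** 2           # (P, M, d)
    new_var = np.einsum("pm,pmd->md", w, var_j[:, None, :] + spread)
    return new_pi, new_mu, new_var
```

The step only touches the parameters of the image-level mixtures, never the raw feature vectors, which is what makes learning a label model from many images tractable.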

Page 12: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Algorithm - learning

Training: for each image I in the training set for label w
• Decompose the image (192 px × 128 px) into 8x8 regions using a sliding window that moves 2 pixels at a time
• Calculate the DCT of each window (8*8*3), giving a 192-d feature vector
• Calculate a mixture of 8 Gaussians for each image using EM:

$$P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^{k}\, G(x, \mu_I^{k}, \Sigma_I^{k})$$

• Calculate a mixture of 64 Gaussians for each label using H-EM:

$$P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^{k}\, G(x, \mu_w^{k}, \Sigma_w^{k})$$
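The feature extraction step (8x8 sliding window, step of 2 pixels, DCT per channel) can be sketched with NumPy alone. This is a minimal sketch under assumptions: windows that would cross the image border are skipped, the orthonormal DCT-II is used, and the function names are hypothetical.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]          # frequency index
    i = np.arange(n)[None, :]          # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)         # the DC row has a different scale
    return c

def dct_features(image: np.ndarray, win: int = 8, step: int = 2) -> np.ndarray:
    """Return one (win*win*channels)-d DCT vector per sliding window.

    image: array of shape (height, width, channels).
    """
    c = dct_matrix(win)
    h, w, _ = image.shape
    feats = []
    for r in range(0, h - win + 1, step):
        for col in range(0, w - win + 1, step):
            block = image[r:r + win, col:col + win, :].astype(float)
            # 2-D DCT of every channel at once: C @ B @ C^T
            coeffs = np.einsum("ij,jkc,lk->ilc", c, block, c)
            feats.append(coeffs.reshape(-1))
    return np.array(feats)
```

For a three-channel image each row of the result has 8*8*3 = 192 entries, matching the 192-d feature vector used for the mixtures.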

Page 13: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Algorithm – annotation, retrieval

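Annotation: get the n (5) best labels for image I
• Get features from the image ((192*128/2)*192)
• Get the log-likelihood for each label and choose the best n, where $\mathcal{X}$ is the set of feature vectors of the image:

$$\log P_{X|W}(\mathcal{X}|w_i) = \sum_{x \in \mathcal{X}} \log P_{X|W}(x|w_i)$$

Retrieval: for images $I_T$ and label w
• Annotate $I_T$ and rank the images by decreasing score of the posterior $P_{W|X}(w_i|\mathcal{X})$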
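The annotation step sums the per-vector log-likelihoods under each label mixture and keeps the top n labels. A minimal sketch, assuming diagonal-covariance label models stored as `(pi, mu, var)` tuples in a dictionary and a uniform label prior (both are assumptions of this example, as are the function names):

```python
import numpy as np

def log_gmm(x, pi, mu, var):
    """Log density of each row of x under a diagonal GMM -> shape (n,)."""
    diff = x[:, None, :] - mu[None, :, :]
    log_g = -0.5 * (np.sum(diff ** 2 / var, axis=2)
                    + np.sum(np.log(2 * np.pi * var), axis=1))
    return np.logaddexp.reduce(np.log(pi) + log_g, axis=1)

def annotate(x, label_gmms, n_best=5):
    """Rank labels by the summed log-likelihood of the image features x.

    x: (n, d) feature vectors of one image.
    label_gmms: dict mapping label -> (pi, mu, var).
    """
    scores = {w: log_gmm(x, *gmm).sum() for w, gmm in label_gmms.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_best]
```

With a uniform prior the ranking by summed log-likelihood coincides with the ranking by posterior, so the same scores can be reused to order images for a given label during retrieval.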

Page 14: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Results-quantitative

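Database: Corel 5k, 4000 training images, 1000 testing images

• Precision: $\text{precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}$
• Recall: $\text{recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}$
• Per word w: $\text{recall} = \frac{|w_C|}{|w_H|}$, $\text{precision} = \frac{|w_C|}{|w_{auto}|}$
  • $w_H$: human annotated images
  • $w_{auto}$: automatically annotated images
  • $w_C$: correctly annotated images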
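The per-word precision and recall can be computed directly from the human and automatic annotations. The sketch below assumes both are given as dictionaries from an image id to a set of labels; that data layout and the function name are assumptions made for illustration.

```python
def per_word_precision_recall(human, auto):
    """Return {word: (precision, recall)} for every word used by a human.

    human, auto: dict mapping image id -> set of labels.
    """
    words = set().union(*human.values())
    result = {}
    for w in words:
        # |w_H|: images a human annotated with w
        w_h = sum(w in labels for labels in human.values())
        # |w_auto|: images automatically annotated with w
        w_auto = sum(w in labels for labels in auto.values())
        # |w_C|: images where the automatic label w agrees with the human one
        w_c = sum(w in human[i] and w in auto.get(i, set()) for i in human)
        precision = w_c / w_auto if w_auto else 0.0
        recall = w_c / w_h
        result[w] = (precision, recall)
    return result
```

Words that are never predicted get a precision of 0 here by convention; averaging only over words with recall > 0 gives the second kind of mean reported in the tables.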

Page 15: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Results-quantitative

Annotation

[Chart legend: non-zero recall, mean recall, mean precision]

                         1     2     3     4     5     6
w with Recall > 0      140   121   110   125    90   131
Mean Recall per w     0.27  0.25  0.25  0.26  0.23  0.27
Mean Precision per w  0.25  0.24  0.23  0.23   0.2  0.23

Page 16: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Results-quantitative

Retrieval

[Chart legend: Recall > 0 precision, all precision]

                            1     2     3     4     5     6
Mean Recall all w        0.23  0.21  0.20  0.21  0.19  0.24
Mean Recall per w R>0    0.45  0.40  0.40  0.41  0.37  0.41

Page 17: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Results-qualitative

Page 18: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Results-qualitative

plane jet f-14 sky
-----
sky plane clouds smoke snow

coast waves water hills
-----
water sky ocean mountain clouds

polar bear bars cage
-----
bear snow texture sunrise closeup

people cheese market street
-----
people wall sand flower bird

Page 19: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Results-qualitative

Page 20: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Results-qualitative

Blooms Mountain Pool Smoke Woman

Page 21: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Results-qualitative

Page 22: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Conclusions

Pros
• Nice segmentation as a byproduct of annotation
• Great for general concepts with lots of samples
• Only weakly annotated data is required (multi-instance learning)
• Allows hierarchical representation (adding images, speed)

Cons
• Fixed number of labels per image
• Learning is time consuming
• Parameter tuning is time consuming
• Weakly represented classes could be associated with wrong concepts

Page 23: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Resources

Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 29, 394–410 (2007).

Gudivada, V.N., Raghavan, V.V.: Content based image retrieval systems. Computer. 28, 18–22 (1995).

Belongie, S., Carson, C., Greenspan, H., Malik, J.: Color-and texture-based image segmentation using EM and its application to content-based image retrieval. Computer Vision, 1998. Sixth International Conference on. pp. 675–682. IEEE (1998).

Cappé, O., Moulines, E.: On-line expectation–maximization algorithm for latent data models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 71, 593–613 (2009).

Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image Retrieval: Ideas, Influences, and Trends of the New Age. ACM Computing Surveys. 40, 1-60 (2008).

Page 24: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Thank you for your attention
Questions?

tencer.hustej.net
@lukastencer
accuratelyrandom.blogspot.com
facebook.com/lukas.tencer

Page 25: Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Google labeling game