
Computer vision: models, learning and inference

Chapter 20 Models for visual words

Please send errata to s.prince@cs.ucl.ac.uk


Visual words

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

• Most models treat data as continuous
• Likelihood based on normal distribution
• Visual words = discrete representation of image
• Likelihood based on categorical distribution
• Useful for difficult tasks such as scene recognition and object recognition


Motivation: scene recognition


Structure


• Computing visual words
• Bag of words model
• Latent Dirichlet allocation
• Single author-topic model
• Constellation model
• Scene model
• Applications


Computing dictionary of visual words


1. For every one of the I training images, select a set of J_i spatial locations.
   • Interest points
   • Regular grid

2. Compute a descriptor at each spatial location in each image

3. Cluster all of these descriptor vectors into K groups using a method such as the K-Means algorithm

4. The means of the K clusters are used as the K prototype vectors in the dictionary.
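
The four steps above can be sketched as follows. This is a minimal illustration rather than the author's implementation: it assumes the descriptors from all training images have already been computed and pooled into one array, and it uses a plain NumPy K-means loop. The function name `build_dictionary` and the toy data are hypothetical.

```python
import numpy as np

def build_dictionary(descriptors, K, n_iters=20, seed=0):
    """Cluster descriptor vectors (N x D) into K groups with plain K-means.

    Returns the K cluster means, which serve as the dictionary prototypes.
    """
    rng = np.random.default_rng(seed)
    descriptors = np.asarray(descriptors, dtype=float)
    # Initialise the prototypes with K distinct training descriptors.
    idx = rng.choice(len(descriptors), size=K, replace=False)
    prototypes = descriptors[idx].copy()
    for _ in range(n_iters):
        # Squared Euclidean distance from every descriptor to every prototype.
        d2 = ((descriptors[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(K):
            members = descriptors[labels == k]
            if len(members) > 0:  # keep the old prototype if a cluster empties
                prototypes[k] = members.mean(axis=0)
    return prototypes

# Toy usage: two well-separated blobs of 2-D "descriptors" pooled over all images.
rng = np.random.default_rng(1)
toy = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
dictionary = build_dictionary(toy, K=2)
```

In practice the descriptors would be high-dimensional and a library K-means would usually replace the hand-written loop; the loop is kept here only to make each step visible.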


Encoding images as visual words


1. Select a set of J spatial locations in the image using the same method as for the dictionary

2. Compute the descriptor at each of the J spatial locations.

3. Compare each descriptor to the set of K prototype descriptors in the dictionary.

4. Assign a discrete index to this location that corresponds to the index of the closest word in the dictionary.

End result: each location is represented by a discrete feature index together with its x and y position.
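
A small sketch of this encoding step, assuming Euclidean distance is used to find the closest prototype (the slides do not name the distance measure). The function name `encode_image` and the toy arrays are illustrative only.

```python
import numpy as np

def encode_image(descriptors, positions, prototypes):
    """Map each descriptor to the index of its nearest dictionary prototype.

    descriptors: (J, D) array, positions: (J, 2) array of (x, y),
    prototypes: (K, D) array. Returns a list of (word_index, x, y) tuples.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    # Squared distance from each of the J descriptors to each of the K prototypes.
    d2 = ((descriptors[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return [(int(f), float(x), float(y)) for f, (x, y) in zip(words, positions)]

# Toy usage with a three-word dictionary of 2-D descriptors.
prototypes = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
descriptors = np.array([[0.2, -0.1], [4.8, 5.3], [0.1, 4.6]])
positions = np.array([[10, 20], [30, 40], [50, 60]])
encoded = encode_image(descriptors, positions, prototypes)
# encoded == [(0, 10.0, 20.0), (1, 30.0, 40.0), (2, 50.0, 60.0)]
```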


Bag of words model


Key idea:

• Abandon all spatial information
• Just represent image by relative frequency (histogram) of words from dictionary

where
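
The equation on this slide is missing from the extracted text. One form consistent with the description (a categorical likelihood over words that depends only on the word histogram) is sketched below. The notation is assumed rather than taken from the slide: f_{ij} is the j-th word in image i, w_i is the world state (category) of image i, \lambda_n holds the word probabilities for category n, and T_{ik} is the number of times word k occurs in image i.

```latex
Pr(\mathbf{X}_i \mid w_i = n)
  = \prod_{j=1}^{J_i} \mathrm{Cat}_{f_{ij}}\!\left[\boldsymbol{\lambda}_n\right]
  = \prod_{k=1}^{K} \lambda_{nk}^{\,T_{ik}}
```

The second equality holds because every occurrence of word k contributes the same factor \lambda_{nk}, so only the counts T_{ik} matter and word positions play no role.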


Bag of words


Structure


Learning (MAP solution):

Inference:
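
The learning and inference equations did not survive extraction. The sketch below shows one way both steps could look, assuming a symmetric Dirichlet prior with parameter `alpha` over each category's word probabilities (the slide says only "MAP solution") and a uniform prior over categories unless one is supplied. Function names and toy counts are hypothetical.

```python
import numpy as np

def learn_bag_of_words(T, w, N, alpha=2.0):
    """MAP estimate of per-category word probabilities.

    T: (I, K) word counts per training image, w: (I,) category labels in 0..N-1,
    alpha: assumed symmetric Dirichlet prior parameter (alpha >= 1 keeps the
    estimate non-negative). Returns lam with shape (N, K), rows summing to 1.
    """
    T = np.asarray(T, dtype=float)
    w = np.asarray(w)
    K = T.shape[1]
    lam = np.empty((N, K))
    for n in range(N):
        counts = T[w == n].sum(axis=0)  # pooled histogram for category n
        lam[n] = (counts + alpha - 1.0) / (counts.sum() + K * alpha - K)
    return lam

def infer_category(t, lam, prior=None):
    """Posterior over categories for one image with word counts t (length K)."""
    N = lam.shape[0]
    prior = np.full(N, 1.0 / N) if prior is None else np.asarray(prior, dtype=float)
    # log Pr(X | w = n) = sum_k T_k log lam_nk; work in logs for stability.
    log_post = np.asarray(t, dtype=float) @ np.log(lam).T + np.log(prior)
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

# Toy usage: two categories, three-word dictionary.
T_train = np.array([[8, 1, 1], [7, 2, 1], [1, 1, 8], [2, 1, 7]])
w_train = np.array([0, 0, 1, 1])
lam = learn_bag_of_words(T_train, w_train, N=2)
posterior = infer_category([5, 1, 0], lam)
```

With `alpha` greater than 1 no word probability is exactly zero, so the logarithm in `infer_category` stays finite even for words never seen in a category.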


Bag of words for object recognition


Problems with bag of words


Latent Dirichlet allocation


• Describes relative frequency of visual words in a single image (no world term)

• Words not generated independently (connected by hidden variable)

• Analogy to text documents
  – Each image contains a mixture of several topics (parts)
  – Each topic induces a distribution over words


Latent Dirichlet allocation


Generative equations

Marginal distribution over features

Conjugate priors over parameters
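
The equations under these three headings are missing from the extracted text. The sketch below samples from a standard LDA generative process to show how the pieces fit together. The symbols are assumptions, not taken from the slide: `pi` for an image's part proportions, `lam` for the per-part word distributions, and symmetric Dirichlet parameters `alpha` and `beta` for the conjugate priors.

```python
import numpy as np

def sample_lda_image(J, lam, alpha, rng):
    """Draw J visual words for one image from the LDA generative process.

    lam: (M, K) word distribution per part, alpha: symmetric Dirichlet
    parameter for the image's part proportions.
    """
    M, K = lam.shape
    pi = rng.dirichlet(np.full(M, alpha))   # part proportions: pi_i ~ Dir[alpha]
    parts = rng.choice(M, size=J, p=pi)     # part labels: p_ij ~ Cat[pi_i]
    # Each word is drawn from the distribution of its part: f_ij ~ Cat[lam_{p_ij}]
    words = np.array([rng.choice(K, p=lam[m]) for m in parts])
    return pi, parts, words

rng = np.random.default_rng(0)
M, K, beta = 3, 6, 0.5
lam = rng.dirichlet(np.full(K, beta), size=M)   # word distributions: lam_m ~ Dir[beta]
pi, parts, words = sample_lda_image(J=20, lam=lam, alpha=1.0, rng=rng)

# Marginal distribution over a single word with the part label summed out:
# Pr(f = k) = sum_m pi_m * lam_mk
marginal = pi @ lam
```

The marginal makes the coupling visible: every word in the image shares the same `pi`, which is why the words are not independent once `pi` is unknown.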


Learning LDA model


• Part labels p are hidden variables
• If we knew them, it would be easy to estimate the parameters

• How about the EM algorithm? Unfortunately, the parts within an image are not independent


Learning


Strategy:

1. Write an expression for posterior distribution over part labels

2. Draw samples from posterior using MCMC

3. Use samples to estimate parameters


1. Posterior over part labels


Can compute the two terms in the numerator in closed form

Denominator intractable
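
The posterior expression itself is missing from the extracted text. A reconstruction under assumed notation is sketched here: \mathbf{p} and \mathbf{f} collect all part labels and words, the Dirichlet priors are taken to be symmetric with parameters \alpha and \beta, N_{mk} counts how often word k is assigned to part m, and C_{im} counts the words in image i assigned to part m. None of these symbols is confirmed by the slide.

```latex
Pr(\mathbf{p} \mid \mathbf{f})
  = \frac{Pr(\mathbf{f} \mid \mathbf{p})\, Pr(\mathbf{p})}
         {\sum_{\mathbf{p}} Pr(\mathbf{f} \mid \mathbf{p})\, Pr(\mathbf{p})}

Pr(\mathbf{f} \mid \mathbf{p})
  = \prod_{m=1}^{M} \frac{\Gamma[K\beta]}{\Gamma[\beta]^{K}}
    \cdot \frac{\prod_{k=1}^{K} \Gamma[N_{mk} + \beta]}
               {\Gamma\!\left[\sum_{k=1}^{K} N_{mk} + K\beta\right]}

Pr(\mathbf{p})
  = \prod_{i=1}^{I} \frac{\Gamma[M\alpha]}{\Gamma[\alpha]^{M}}
    \cdot \frac{\prod_{m=1}^{M} \Gamma[C_{im} + \alpha]}
               {\Gamma[J_i + M\alpha]}
```

Both numerator terms follow from integrating a categorical likelihood against its conjugate Dirichlet prior. The denominator sums over every joint assignment of part labels, which is M raised to the total number of words, so it cannot be evaluated directly.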


2. Draw samples from posterior


Gibbs sampling: fix all part labels except one and sample from the conditional distribution

This can be computed in closed form
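
The conditional distribution is not reproduced in the extracted text. The sketch below uses the standard collapsed Gibbs update for LDA with symmetric Dirichlet parameters `alpha` and `beta`, which is an assumption about the form the slide intended. All names are illustrative.

```python
import numpy as np

def gibbs_sweep(words, parts, image_ids, M, K, alpha, beta, rng):
    """One sweep of collapsed Gibbs sampling over all part labels (in place).

    words, parts, image_ids: 1-D integer arrays with one entry per visual word.
    """
    I = image_ids.max() + 1
    # Count tables built from the current labels.
    n_mk = np.zeros((M, K))   # times word k is assigned to part m
    n_im = np.zeros((I, M))   # words of image i assigned to part m
    np.add.at(n_mk, (parts, words), 1)
    np.add.at(n_im, (image_ids, parts), 1)
    for t in range(len(words)):
        f, i, m_old = words[t], image_ids[t], parts[t]
        # Remove the current label so the counts describe all labels except this one.
        n_mk[m_old, f] -= 1
        n_im[i, m_old] -= 1
        # Conditional over the M parts, up to normalisation. The image term's
        # denominator is the same for every part, so it is dropped.
        cond = (n_mk[:, f] + beta) / (n_mk.sum(axis=1) + K * beta) * (n_im[i] + alpha)
        cond /= cond.sum()
        m_new = rng.choice(M, p=cond)
        parts[t] = m_new
        n_mk[m_new, f] += 1
        n_im[i, m_new] += 1
    return parts

# Toy usage: two images that use disjoint halves of a four-word dictionary.
rng = np.random.default_rng(0)
words = np.array([0, 0, 1, 1, 0, 2, 3, 3, 2, 3])
image_ids = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
parts = rng.integers(0, 2, size=len(words))
for _ in range(50):
    gibbs_sweep(words, parts, image_ids, M=2, K=4, alpha=0.5, beta=0.5, rng=rng)
```

Each update needs only the two count tables, which is what makes the conditional cheap to evaluate even though the joint posterior is intractable.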


3. Use samples to estimate parameters


Samples substitute for the real part labels in the update equations
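
The update equations are missing from the extracted text. One common choice, assumed here, is the smoothed relative frequency of the sampled labels, with symmetric Dirichlet parameters `alpha` and `beta` acting as pseudo-counts. The function name and toy data are hypothetical.

```python
import numpy as np

def estimate_parameters(words, parts, image_ids, M, K, alpha, beta):
    """Estimate part proportions pi (I x M) and word distributions lam (M x K)
    from one sample of part labels, treating the labels as if observed."""
    I = image_ids.max() + 1
    n_mk = np.zeros((M, K))   # times word k is assigned to part m
    n_im = np.zeros((I, M))   # words of image i assigned to part m
    np.add.at(n_mk, (parts, words), 1)
    np.add.at(n_im, (image_ids, parts), 1)
    pi = (n_im + alpha) / (n_im.sum(axis=1, keepdims=True) + M * alpha)
    lam = (n_mk + beta) / (n_mk.sum(axis=1, keepdims=True) + K * beta)
    return pi, lam

# Toy usage with hand-picked labels so the result is easy to check.
words = np.array([0, 0, 1, 2, 3, 3])
parts = np.array([0, 0, 0, 1, 1, 1])
image_ids = np.array([0, 0, 0, 1, 1, 1])
pi, lam = estimate_parameters(words, parts, image_ids, M=2, K=4, alpha=1.0, beta=1.0)
# pi[0] == [0.8, 0.2] and lam[0] == [3/7, 2/7, 1/7, 1/7]
```

With several samples from the Markov chain, the estimates from each sample could be averaged.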



Single author-topic model


Learning


1. Posterior over part labels

Likelihood same as before, prior becomes


2. Draw samples from posterior

3. Use samples to estimate parameters


Inference


Compute posterior over categories

Likelihood that words in this image are due to category n
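
The inference equations are missing from the extracted text. The sketch below assumes that, in the single author-topic model, each category n has its own part proportions `pi[n]` while the per-part word distributions `lam` are shared, and that the parameters have already been learned. Under those assumptions the part labels can be summed out word by word. Names and toy values are hypothetical.

```python
import numpy as np

def category_posterior(words, pi, lam, prior=None):
    """Posterior over categories for one image under a single author-topic model.

    words: visual word indices for the image, pi: (N, M) part proportions per
    category, lam: (M, K) word distribution per part.
    """
    N = pi.shape[0]
    prior = np.full(N, 1.0 / N) if prior is None else np.asarray(prior, dtype=float)
    # (N, K) table of Pr(f = k | w = n) with the part label summed out.
    word_probs = pi @ lam
    # Words are independent given the category once the parameters are fixed.
    log_lik = np.log(word_probs[:, words]).sum(axis=1)
    log_post = log_lik + np.log(prior)
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

# Toy usage: two categories, two parts, three-word dictionary.
pi = np.array([[0.9, 0.1], [0.1, 0.9]])
lam = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
posterior = category_posterior(np.array([0, 0, 1, 0]), pi, lam)
# posterior[0] == 64 / 65
```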


Problems with bag of words


Constellation model


Learning


1. Posterior over part labels

Prior same as before, likelihood becomes


2. Draw samples from posterior

3. Use samples to estimate parameters

Part and word probabilities as before
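
The slide's equations for the position parameters are missing. The sketch below assumes that the constellation model attaches a 2-D normal distribution over word positions to each part, so the extra parameters are a mean and covariance per part estimated from the sampled labels. That modelling choice and all names are assumptions.

```python
import numpy as np

def estimate_part_positions(positions, parts, M):
    """Mean and covariance of word positions for each part, given sampled labels.

    positions: (J, 2) array of (x, y), parts: (J,) sampled part labels.
    Assumes each part's positions are modelled by a 2-D normal distribution.
    """
    positions = np.asarray(positions, dtype=float)
    means = np.zeros((M, 2))
    covs = np.zeros((M, 2, 2))
    for m in range(M):
        x = positions[parts == m]
        if len(x) == 0:
            continue  # no words assigned to this part in this sample
        means[m] = x.mean(axis=0)
        diff = x - means[m]
        covs[m] = diff.T @ diff / len(x)  # maximum-likelihood covariance
    return means, covs

# Toy usage: two parts with two word positions each.
positions = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
parts = np.array([0, 0, 1, 1])
means, covs = estimate_part_positions(positions, parts, M=2)
# means == [[1, 0], [10, 11]]
```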


Inference


Compute posterior over categories

Likelihood that words in this image are due to category n


Problems with bag of words


Scene model


Video Google


Action recognition


Spatio-temporal bag of words model: 91.8% correct classification
