Estimation of the Number of Relevant Images in Infinite Databases

Presented by: Xiaoling Wang
Supervisor: Prof. Clement Leung


Introduction

As the Internet has grown in importance, image search engines have become increasingly widespread. However, it is difficult for users to decide which image search engine to use.

The more effective a retrieval system is, the more satisfied its users will be.

Retrieval effectiveness is therefore one of the most important parameters for measuring the performance of image retrieval systems.


Measures: Precision and Recall

$$P = \frac{\text{number of relevant images retrieved}}{\text{total number of images retrieved}}$$

$$R = \frac{\text{number of relevant images retrieved}}{\text{total number of relevant images}}$$

Significant challenge: the total number of relevant images, which is the denominator of R, is not directly observable in such a potentially infinite database.
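As a minimal illustration of the two measures (a sketch, not part of the original study), the hypothetical helper `precision_recall` below computes P from a list of 0/1 relevance judgments and computes R only when an estimate of the total number of relevant images is supplied, since that total is exactly the quantity that cannot be observed directly.

```python
def precision_recall(relevance, total_relevant=None):
    """Return (precision, recall) for a list of retrieved images.

    relevance: one 0/1 flag per retrieved image (1 = relevant).
    total_relevant: total number of relevant images in the database. In a
    potentially infinite database this must be an estimate; recall is None
    when no estimate is given.
    """
    retrieved = len(relevance)
    relevant_retrieved = sum(relevance)
    precision = relevant_retrieved / retrieved if retrieved else 0.0
    recall = relevant_retrieved / total_relevant if total_relevant else None
    return precision, recall


# Usage: 3 of 4 retrieved images are relevant, out of an assumed 6 relevant in total.
print(precision_recall([1, 1, 0, 1], total_relevant=6))
```

The estimation models that follow are one way of supplying the `total_relevant` argument.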


Objective

To investigate the probabilistic behavior of the distribution of relevant images among the results returned by image search engines, modeled as:

a) Independent Distribution

b) Markov Chain Distribution

From such models, we shall determine algorithms for the meaningful estimation of recall.


Independent Model

Let $p_k$ denote the probability that an image on page $k$ is relevant, i.e. the expected proportion of relevant images among all the images on page $k$.

For search engines, the first pages normally have a higher probability of relevance than later ones, so that

$$p_1 \ge p_2 \ge \cdots \ge p_k \ge p_{k+1} \ge \cdots$$

Since the relevance outcomes of differently ranked images are not mutually exclusive events, and the search results do not terminate in practice, in general

$$\sum_{k=1}^{\infty} p_k \ne 1, \qquad \text{and} \qquad p_k \to 0 \ \text{as} \ k \to \infty.$$


Independent Model

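Record the number of relevant images per page as a stochastic process $X_{i1}, X_{i2}, \ldots, X_{ik}, \ldots$, where $i = 1, 2, \ldots, 69$ indexes the 69 recorded result sequences and $k = 1, 2, \ldots$ indexes the pages.

Investigate the quadratic formula:

$$p_k = \beta_1 k^2 + \beta_2 k + \beta_0, \quad k = 1, 2, 3, \ldots$$

Determine the parameters using the least-squares method.

Calculate the proportion of relevant images among all the images on page $k$:

$$p_k = \frac{\bar{X}_k}{20}, \quad k = 1, 2, \ldots$$

Obtain the mean number of relevant images for each page:

$$\bar{X}_k = \frac{1}{69} \sum_{i=1}^{69} X_{ik}, \quad k = 1, 2, \ldots$$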
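The independent-model steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's code: `page_proportions` and `fit_quadratic` are hypothetical names, the divisor of 20 images per page is taken from the formula for $p_k$, and the least-squares fit solves the 3x3 normal equations directly instead of calling a numerical library.

```python
def page_proportions(counts, images_per_page=20):
    """p_k = mean_i(X_ik) / images_per_page for each page k.

    counts[i][k] is the number of relevant images on page k+1 of sequence i.
    """
    n_seq = len(counts)
    means = [sum(row[k] for row in counts) / n_seq for k in range(len(counts[0]))]
    return [m / images_per_page for m in means]


def fit_quadratic(ys):
    """Least-squares fit of y_k = b1*k^2 + b2*k + b0 for k = 1..len(ys) (needs >= 3 points)."""
    # Design matrix rows (k^2, k, 1).
    rows = [(k * k, k, 1.0) for k in range(1, len(ys) + 1)]
    # Normal equations M b = v with M = A^T A and v = A^T y.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Forward elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda row: abs(M[row][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for row in range(col + 1, 3):
            factor = M[row][col] / M[col][col]
            for c in range(col, 3):
                M[row][c] -= factor * M[col][c]
            v[row] -= factor * v[col]
    # Back substitution.
    b = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        b[i] = (v[i] - sum(M[i][j] * b[j] for j in range(i + 1, 3))) / M[i][i]
    return b[0], b[1], b[2]  # (beta_1, beta_2, beta_0)
```

Multiplying the proportions by 100 before fitting gives coefficients on the percentage scale used in the charts.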


Markov Chain Model

Since internet image search returns results in units of pages, we focus on the integer-valued stochastic process $X_1, X_2, \ldots$, where $X_J$ represents the aggregate relevance of all the images on page $J$, i.e. the number of relevant images on that page. The sequence $X = \{X_1, X_2, \ldots\}$ is modeled as a Markov chain.

Take the conditional probability of the number of relevant images in $X_J$, given the number of relevant images in $X_{J-1}$, to be the transition probability:

$$p_{(J-1),J} = \Pr\{X_J = x_J \mid X_{J-1} = x_{J-1}\}$$


Markov Chain Model

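From this, we construct the transition probability matrix

$$P = \begin{pmatrix} p_{00} & p_{01} & \cdots & p_{0n} \\ p_{10} & p_{11} & \cdots & p_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n0} & p_{n1} & \cdots & p_{nn} \end{pmatrix}$$

where $p_{ij} = \Pr\{X_J = j \mid X_{J-1} = i\}$ and $n$ is the number of images contained in a page, so the states are $0, 1, \ldots, n$ relevant images.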
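The slides do not say how the entries of $P$ are obtained from the recorded sequences. One common choice, assumed here, is the maximum-likelihood estimate: count the observed page-to-page transitions $i \to j$ and divide by the number of transitions leaving state $i$. The helper `estimate_transition_matrix` is hypothetical, and giving unobserved rows a uniform distribution is an arbitrary choice made only to keep every row a valid probability distribution.

```python
def estimate_transition_matrix(sequences, n):
    """Estimate P[i][j] = Pr{X_J = j | X_{J-1} = i} by counting transitions.

    sequences: per-query lists of relevant-image counts per page, values in 0..n.
    """
    counts = [[0] * (n + 1) for _ in range(n + 1)]
    for seq in sequences:
        # Each consecutive pair of pages is one observed transition.
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    P = []
    for row in counts:
        total = sum(row)
        if total:
            P.append([c / total for c in row])
        else:
            # State never observed as a predecessor: assume a uniform row.
            P.append([1.0 / (n + 1)] * (n + 1))
    return P
```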


Markov Chain Model

Calculate the initial probabilities. The probabilities are placed in a vector of state probabilities for page $J$:

$$\pi(J) = (\pi_0(J), \pi_1(J), \pi_2(J), \ldots, \pi_n(J))$$

where $\pi_k(J)$ is the probability of having $k$ relevant images on page $J$. Therefore, from this model, we can estimate the number of relevant images page by page using

$$\pi(J) = \pi(J-1)\,P, \quad J = 2, 3, \ldots$$

starting from the initial vector $\pi(1)$. The expected number of relevant images on page $J$ is then $E[X_J] = \sum_{k=0}^{n} k\,\pi_k(J)$.
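A minimal sketch of the recursion, assuming a time-homogeneous matrix $P$ (the same $P$ for every page, as the formula implies) and an initial vector $\pi(1)$ supplied by the user; `propagate` and `expected_relevant` are hypothetical helper names.

```python
def propagate(pi, P, steps):
    """Return [pi(J), pi(J+1), ..., pi(J+steps)] using pi(J) = pi(J-1) P."""
    out = [list(pi)]
    for _ in range(steps):
        prev = out[-1]
        # Row vector times matrix: new_j = sum_i prev_i * P[i][j].
        out.append([sum(prev[i] * P[i][j] for i in range(len(prev)))
                    for j in range(len(P[0]))])
    return out


def expected_relevant(pi):
    """Expected number of relevant images on a page: sum_k k * pi_k."""
    return sum(k * p for k, p in enumerate(pi))


# Usage with a toy 3-state chain (pages of n = 2 images), page 1 fully relevant.
P = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
for J, pi in enumerate(propagate([0.0, 0.0, 1.0], P, steps=2), start=1):
    print(J, pi, expected_relevant(pi))
```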


Experiment

Image search engine selection: Google, Yahoo, Msn

Query selection: the queries consist of one-word, two-word, and three-or-more-word queries, ranging from simple words such as "apple", to more specific queries such as "apple computers", and finally to still more specific queries such as "eagle catching fish".

Record the stochastic sequence $X = \{X_1, X_2, \ldots\}$ for each query.

Apply the models: Independent Model and Markov Chain Model.

Test the models on the results returned for the queries "volcano", "tibetan girl", and "desert camel shadow".


Independent Model and Testing Results for Google

Figure 1. Independent Model for Google

Figure 2. Testing Results and Independent Distribution Model for Google

[Figure 1 chart: x-axis No. of Page (1 to 10), y-axis Percentage of the Number of Relevant Images Per Page (0 to 100); series Google with quadratic trend line $y = -0.0189x^2 - 1.9129x + 97.25$, $R^2 = 0.9523$]

[Figure 2 chart: x-axis No. of Page (1 to 10), y-axis Number of Relevant Images (0 to 25); series Volcano, Tibetan Girl, Desert Camel Shadow, Independent Distribution]


Independent Model and Testing Results for Yahoo

Figure 3. Independent Model for Yahoo

Figure 4. Testing Results and Independent Distribution Model for Yahoo

[Figure 3 chart: x-axis No. of Page (1 to 10), y-axis Percentage of the Number of Relevant Images Per Page (0 to 100); series Yahoo with quadratic trend line $y = 0.3788x^2 - 6.1364x + 96.667$, $R^2 = 0.8559$]

[Figure 4 chart: x-axis No. of Page (1 to 10), y-axis Number of Relevant Images (0 to 25); series Volcano, Tibetan Girl, Desert Camel Shadow, Independent Distribution]


Independent Model and Testing Results for Msn

Figure 5. Independent Model for Msn

Figure 6. Testing Results and Independent Distribution Model for Msn

[Figure 5 chart: x-axis No. of Page (1 to 10), y-axis Percentage of the Number of Relevant Images Per Page (0 to 100); series MSN with quadratic trend line $y = 0.1894x^2 - 4.8409x + 93.833$, $R^2 = 0.961$]

[Figure 6 chart: x-axis No. of Page (1 to 10), y-axis Number of Relevant Images (0 to 25); series Volcano, Tibetan Girl, Desert Camel Shadow, Independent Distribution]
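The fitted curves in Figures 1, 3 and 5 can be turned into per-page predictions of the number of relevant images. The sketch below assumes, as in the formula for $p_k$, that each page holds 20 images and that the fitted $y$ is a percentage; the dictionary `FITS` simply copies the coefficients printed on the charts, and `expected_count` is a hypothetical helper.

```python
# Coefficients (a, b, c) of y = a*x^2 + b*x + c, copied from the charts;
# y is the percentage of relevant images on page x.
FITS = {
    "Google": (-0.0189, -1.9129, 97.25),
    "Yahoo": (0.3788, -6.1364, 96.667),
    "Msn": (0.1894, -4.8409, 93.833),
}


def expected_count(engine, page, images_per_page=20):
    """Predicted number of relevant images on `page` (assumes images_per_page per page)."""
    a, b, c = FITS[engine]
    percentage = a * page ** 2 + b * page + c
    return images_per_page * percentage / 100.0


for engine in FITS:
    print(engine, [round(expected_count(engine, k), 1) for k in range(1, 11)])
```

Note that the Yahoo and Msn curves have a positive $x^2$ coefficient, so they reach a minimum (near page 8.1 for Yahoo and page 12.8 for Msn) and then rise; they should not be extrapolated far beyond page 10.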


Markov Chain Model and Testing Results for Google

Figure 7. Search Result of Testing Queries and Markov Chain Model for Google

[Figure 7 chart: x-axis No. of Page (1 to 10), y-axis Number of Relevant Images (0 to 25); series Markov Chain Model, Volcano, Tibetan Girl, Desert Camel Shadow]


Markov Chain Model and Testing Results for Yahoo

Figure 8. Search Result of Testing Queries and Markov Chain Model for Yahoo

[Figure 8 chart: x-axis No. of Page (1 to 10), y-axis Number of Relevant Images (0 to 25); series Markov Chain Model, Volcano, Tibetan Girl, Desert Camel Shadow]


Markov Chain Model and Testing Results for Msn

Figure 9. Search Result of Testing Queries and Markov Chain Model for Msn

[Figure 9 chart: x-axis No. of Page (1 to 10), y-axis Number of Relevant Images (0 to 25); series Markov Chain Model, Volcano, Tibetan Girl, Desert Camel Shadow]


Measure of Accuracy

One measure of accuracy is the mean absolute deviation (MAD):

$$\text{MAD} = \frac{\sum |\text{forecast error}|}{n}$$

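MAD by image search engine (ISE) and query type (INDP = Independent Model, MC = Markov Chain Model):

| ISE | Query type | INDP Model | MC Model |
|---|---|---|---|
| Google | One-word | 1.2 | 1 |
| Google | Two-word | 2.4 | 0.4 |
| Google | Three-word | 1.1 | 2.3 |
| Yahoo | One-word | 2.9 | 2.9 |
| Yahoo | Two-word | 4.6 | 0 |
| Yahoo | Three-word | 2.5 | 2.1 |
| Msn | One-word | 2.7 | 1.4 |
| Msn | Two-word | 2.6 | 1.7 |
| Msn | Three-word | 11.8 | 15.8 |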
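The MAD formula can be sketched as follows. The hypothetical helper `mad` takes the observed and predicted numbers of relevant images per page for one query; exactly which pages and queries were averaged in the study is not stated, so that part is an assumption. The lists `INDP` and `MC` copy the table values, which makes it easy to compare the two models' average MAD.

```python
def mad(actual, forecast):
    """Mean absolute deviation: sum(|actual - forecast|) / n."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# MAD values from the table, ordered Google, Yahoo, Msn x (one-, two-, three-word).
INDP = [1.2, 2.4, 1.1, 2.9, 4.6, 2.5, 2.7, 2.6, 11.8]
MC = [1, 0.4, 2.3, 2.9, 0, 2.1, 1.4, 1.7, 15.8]

print(f"average MAD, Independent Model: {sum(INDP) / len(INDP):.2f}")
print(f"average MAD, Markov Chain Model: {sum(MC) / len(MC):.2f}")
```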


Conclusion

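In terms of MAD, we conclude that the Markov Chain Model estimates the number of relevant images for the ISEs better than the Independent Model does: averaged over the nine engine and query-type combinations, its MAD is about 3.07, compared with about 3.53 for the Independent Model.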

Except for three-word queries on Msn, both models estimate the number of relevant images returned by the image search engines quite well.


Future Work

Establish optimal stopping rules for the different models.

Time series modeling and exponential smoothing, since the previous models indicate that the situation may be modeled as a time series with the page number representing time.


Q & A