Using Artificial Queries to Evaluate Image Retrieval

Nicholas R. Howe

Department of Computer Science

Cornell University

June 12, 2000 Workshop on Content-Based Access of Image and Video Libraries

How Do We Compare Image Retrieval Algorithms?

• Different research groups use images from different sources.

• Image sets are of different sizes.

• Tasks are different.

– Each researcher identifies a set of queries and targets through subjective criteria.

– Can’t share keys because image sets are not standard.

Answer: Badly!


How It’s Usually Done

• Each researcher tests a proposed algorithm against a few baselines.

– e.g., Color Histograms.

• No data to compare the latest techniques.

– Test sets are different.

– Implementations of baselines may also differ.


Some Difficulties

• Given a query, which target is most relevant?

• Context will determine answer.


What Should a Good Test Do?

• Provide comparable results even with different image sets.

• Offer insight into the behavior of different retrieval algorithms.

• Run quickly.

• Allow for easy implementation.


Proposal: Altered-Image Queries

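[Figure: an image from the library is passed through an alteration f to produce the altered image, which serves as the query. The query is run against the image library, and the rank of the original among the retrieved images (1, 2, 3, etc.) is recorded.]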
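
The procedure in the diagram can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the names `rank_of_original` and `l1_distance`, and the use of plain feature vectors, are assumptions made for the example. Any retrieval algorithm's dissimilarity score could take the place of the distance function.

```python
from typing import Callable, Sequence

Feature = Sequence[float]


def l1_distance(a: Feature, b: Feature) -> float:
    """Hypothetical stand-in for a retrieval algorithm's dissimilarity score."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_of_original(
    query: Feature,
    library: Sequence[Feature],
    original_index: int,
    distance: Callable[[Feature, Feature], float] = l1_distance,
) -> int:
    """Return the 1-based rank at which the original image is retrieved.

    Rank 1 means the original is the best match for the altered query.
    Ties are resolved in the original's favor (an assumption of this sketch).
    """
    target = distance(query, library[original_index])
    closer = sum(1 for feat in library if distance(query, feat) < target)
    return closer + 1


# Toy library of three feature vectors; the query pretends to be an
# altered version of library[1].
library = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
altered = [1.2, 0.9]
print(rank_of_original(altered, library, original_index=1))  # prints 1
```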


The Crop Test

• Crop image to k% of its original area.

• Simulates close-up shot of same subject.

[Figure: original image alongside its Crop-50 version.]
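
One way to produce a Crop-k query is sketched below. The slides do not say where the crop window is placed or whether the aspect ratio is kept, so this sketch assumes a centered window with both sides scaled by sqrt(k/100). The function name `crop_to_area` and the nested-list image representation are illustrative only.

```python
import math
from typing import List

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image type


def crop_to_area(image: Image, k: float) -> Image:
    """Keep a centered window covering roughly k% of the original area."""
    if not 0 < k <= 100:
        raise ValueError("k must be in (0, 100]")
    height, width = len(image), len(image[0])
    # Scaling each side by sqrt(k/100) scales the area by k/100.
    scale = math.sqrt(k / 100.0)
    new_h = max(1, round(height * scale))
    new_w = max(1, round(width * scale))
    top = (height - new_h) // 2
    left = (width - new_w) // 2
    return [row[left:left + new_w] for row in image[top:top + new_h]]


image = [[r * 10 + c for c in range(10)] for r in range(10)]
crop = crop_to_area(image, 50)
print(len(crop), len(crop[0]))  # prints "7 7": 49 of 100 pixels, about 50%
```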


The Jumble Test

• Shuffle the tiles of an image divided on an h × k grid.

• Simulates image with similar elements in a different arrangement.

[Figure: original image alongside its Jumble-4×4 version.]
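
A Jumble query could be generated along the following lines. This is a sketch under stated assumptions: the grid is taken to be h rows by k columns, the image dimensions are required to divide evenly into tiles, and the name `jumble` and the `seed` parameter are hypothetical.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image type


def jumble(image: Image, h: int, k: int, seed: Optional[int] = None) -> Image:
    """Cut the image into an h-by-k grid of tiles and shuffle the tiles."""
    height, width = len(image), len(image[0])
    if height % h or width % k:
        raise ValueError("image dimensions must be divisible by the grid size")
    tile_h, tile_w = height // h, width // k

    # Collect the tiles in row-major order.
    tiles = [
        [row[c * tile_w:(c + 1) * tile_w]
         for row in image[r * tile_h:(r + 1) * tile_h]]
        for r in range(h)
        for c in range(k)
    ]
    random.Random(seed).shuffle(tiles)

    # Reassemble the shuffled tiles into a full image, one grid row at a time.
    out: Image = []
    for r in range(h):
        row_tiles = tiles[r * k:(r + 1) * k]
        for y in range(tile_h):
            out.append([px for tile in row_tiles for px in tile[y]])
    return out


image = [[r * 4 + c for c in range(4)] for r in range(4)]
for row in jumble(image, 2, 2, seed=0):
    print(row)
```

Because only the tile positions change, the set of pixel values is preserved while the spatial layout is not.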


The Low-Con Test

• Decrease contrast to k% of its original range.

• Simulates altered lighting conditions and/or camera differences.

[Figure: original image alongside its Low-Con-80 version.]
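
A Low-Con-k query might be generated as in the sketch below. The slides do not specify how the range is compressed, so this example assumes a linear compression around the midpoint of the image's own intensity range. The name `reduce_contrast` is illustrative.

```python
from typing import List

Image = List[List[float]]  # rows of intensities; a stand-in for a real image type


def reduce_contrast(image: Image, k: float) -> Image:
    """Compress intensities so their range is k% of the original range."""
    if not 0 <= k <= 100:
        raise ValueError("k must be in [0, 100]")
    pixels = [p for row in image for p in row]
    lo, hi = min(pixels), max(pixels)
    mid = (lo + hi) / 2.0
    factor = k / 100.0
    # Pull every pixel toward the midpoint by the same proportion.
    return [[mid + factor * (p - mid) for p in row] for row in image]


image = [[0.0, 100.0], [50.0, 200.0]]
low_con = reduce_contrast(image, 80)
values = [p for row in low_con for p in row]
print(min(values), max(values))  # range shrinks from 200 to about 160
```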


Typical results

• Most retrievals are at low rank.

• A few retrievals are at much higher rank.

Median: 26

Mean: 205

• Median and mean summarize the results of multiple repetitions.
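
The gap between median and mean follows from the skew described above: a handful of very high ranks pulls the mean up while leaving the median almost unchanged. The ranks below are made up for illustration and are not data from the talk. The helper name `summarize` is hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def summarize(ranks: Sequence[int]) -> Tuple[float, float]:
    """Return (median rank, mean rank) over a batch of altered-image queries."""
    return statistics.median(ranks), statistics.mean(ranks)


# Hypothetical ranks: mostly low, with one outlier at a much higher rank.
ranks = [1, 2, 3, 5, 400]
median_rank, mean_rank = summarize(ranks)
print(median_rank, mean_rank)  # median stays at 3; mean is pulled up to 82.2
```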


Difficulty of Altered-Image Queries

• Both mean and median increase with difficulty.

• Note order-of-magnitude changes.


How Stable are Altered-Image Queries?

• Ran Crop-50 on three entirely different sets of 6000 images.

• Some consistency even with different test sets.

• Look for order-of-magnitude change.

              Set 1   Set 2   Set 3   Mean   Dev.
Median Rank     5       5       7      5.7    1.2
Mean Rank      29.6    33.9    45.5   36.3    8.2
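
The Mean and Dev. columns can be reproduced from the three per-set values. The slide does not say which estimator was used. The check below assumes Dev. is the sample standard deviation, which matches the reported figures to one decimal place.

```python
import statistics

# Per-set values copied from the table above.
median_ranks = [5, 5, 7]
mean_ranks = [29.6, 33.9, 45.5]

for label, values in (("Median Rank", median_ranks), ("Mean Rank", mean_ranks)):
    mean = statistics.mean(values)
    dev = statistics.stdev(values)  # sample standard deviation (n - 1)
    print(f"{label}: mean={mean:.1f} dev={dev:.1f}")
# Median Rank: mean=5.7 dev=1.2
# Mean Rank: mean=36.3 dev=8.2
```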


Does the Number of Images Matter?

• Found linear dependence on number of images.


How Many Queries Must Be Run?

• Running queries on a small percentage of the total image set gives a decent estimate.
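
Choosing which images to alter and query with could look like the sketch below. The 5% fraction is an arbitrary illustrative value, not a figure from the talk, and `pick_query_indices` is a hypothetical helper.

```python
import random
from typing import List, Optional


def pick_query_indices(
    n_images: int, fraction: float, seed: Optional[int] = None
) -> List[int]:
    """Pick a random subset of library indices to use as altered-image queries."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_queries = max(1, round(n_images * fraction))
    # Sample without replacement so no image is queried twice.
    return sorted(random.Random(seed).sample(range(n_images), n_queries))


indices = pick_query_indices(6000, 0.05, seed=1)
print(len(indices))  # prints 300
```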


Comparing Algorithms Using Altered-Image Queries

• Three algorithms compared using altered-image queries.

• Especially good or bad performance can be identified.

                  Crop            Jumble          Low-Con
Histograms        18    126.6     1     1         86.5   350.3
Correlograms       1     12.4     1     2.0        5      83.6
STAIRS (Tuned)     1     17.0     1     1.2        1      22.6

(Two values per test; these appear to be median rank followed by mean rank.)


Final Thoughts

• Altered-Image Queries are...

– Well defined.

– Easy to implement.

– Consistent over different image sets.

• A useful addition to our evaluation toolkit.

• Also offer diagnostic potential.
