87
Tamara Berg Retrieval 790-133 Language and Vision

Tamara Berg Retrieval 790-133 Language and Vision

Embed Size (px)

Citation preview

Page 1: Tamara Berg Retrieval 790-133 Language and Vision

Tamara BergRetrieval

790-133Language and Vision

Page 2: Tamara Berg Retrieval 790-133 Language and Vision

How big is the web?

• The first Google index in 1998 already had 26 million pages

• By 2000 the Google index reached the one billion mark.

• July 25, 2008 – Google announced that search had discovered one trillion unique URLs

Page 3: Tamara Berg Retrieval 790-133 Language and Vision

Slide from Takis Metaxas

Page 4: Tamara Berg Retrieval 790-133 Language and Vision

How hard is it to go from one page to another?

• Over 75% of the time there is no directed pathfrom one random web page to another.

Kleinberg: The small-world phenomenon

Page 5: Tamara Berg Retrieval 790-133 Language and Vision

How hard is it to go from one page to another?

• Over 75% of the time there is no directed pathfrom one random web page to another.• When a directed path exists its average length

is 16 clicks.• When an undirected path exists its average

length is 7 clicks.

Kleinberg: The small-world phenomenon

Page 6: Tamara Berg Retrieval 790-133 Language and Vision

How hard is it to go from one page to another?

• Over 75% of the time there is no directed pathfrom one random web page to another.• When a directed path exists its average length

is 16 clicks.• When an undirected path exists its average

length is 7 clicks.• Short average path between pairs of nodes is

characteristic of a small-world network (“six degrees of separation” Stanley Milgram).

Kleiberg: The small-world phenomenon

Page 7: Tamara Berg Retrieval 790-133 Language and Vision

Information Retrieval

• Information retrieval (IR) is the science of searching for documents, for information within documents, and for metadata about documents, as well as that of searching relational databases and the World Wide Web

Wikipedia

Page 8: Tamara Berg Retrieval 790-133 Language and Vision

Slide from Takis Metaxas

Page 9: Tamara Berg Retrieval 790-133 Language and Vision

The Anatomy of a Large-Scale Hypertextual Web Search Engine

Sergey Brin and Lawrence Page

Page 10: Tamara Berg Retrieval 790-133 Language and Vision

“The ultimate search engine would understand exactly what you mean and give back exactly what you want.” - Larry Page

Google – misspelling of googol

Page 11: Tamara Berg Retrieval 790-133 Language and Vision

The Google Search EngineFounded 1998 (1996) by two Stanford students

Originally academic / research project that later became a commercial tool

Distinguishing features (then!?):

- Special (and better) ranking

- Speed

- Size

Slide from Jeff Dean

Page 12: Tamara Berg Retrieval 790-133 Language and Vision

The web in 1997

Internet was growing very quickly• “Junk results often wash out any results that a

user is interested in. In fact, as of November 1997, only one of the top four commercial search engines finds itself (returns its own search page in response to its name in the top ten results).”

Page 13: Tamara Berg Retrieval 790-133 Language and Vision

The web in 1997

Internet was growing very quickly• “Junk results often wash out any results that a

user is interested in. In fact, as of November 1997, only one of the top four commercial search engines finds itself (returns its own search page in response to its name in the top ten results).” Need high precision in

the top results

Page 14: Tamara Berg Retrieval 790-133 Language and Vision

Google’s first search engine

Page 15: Tamara Berg Retrieval 790-133 Language and Vision

Components of Web Search Service

Components

• Web crawler

• Indexing system

• Search system

• Advertising system

Considerations

• Economics

• Scalability

• Legal issues

Slide from William Y. Arms

Page 16: Tamara Berg Retrieval 790-133 Language and Vision

Web Searching: Architecture

Build indexSearch

Index to all Web pages

• Documents stored on many Web servers are indexed in a single central index.

• The central index is implemented as a single system on a very large number of computers

Examples: Google, Yahoo!

Web servers with Web pages

Crawl

Web pages retrieved by crawler

Slide from William Y. Arms

Page 17: Tamara Berg Retrieval 790-133 Language and Vision

What is a Web Crawler?

Web Crawler

• A program for downloading web pages.

• Given an initial set of seed URLs, it recursively downloads every page that is linked from pages in the set.

• A focused web crawler downloads only those pages whose content satisfies some criterion.

Also known as a web spider

Slide from William Y. Arms

Page 18: Tamara Berg Retrieval 790-133 Language and Vision

Simple Web Crawler Algorithm

Basic Algorithm

Let S be set of URLs to pages waiting to be indexed. Initially S is is a set of known seeds.

Take an element u of S and retrieve the page, p, that it references.

Parse the page p and extract the set of URLs L it has links to.

Update S = S + L - u

Repeat as many times as necessary.

[Large production crawlers may run continuously]

Slide from William Y. Arms

Page 19: Tamara Berg Retrieval 790-133 Language and Vision

Indexing the Web Goals: Precision

Short queries applied to very large numbers of items leads to large numbers of hits.

• Goal is that the first 10-100 hits presented should satisfy the user's information need

-- requires ranking hits in order that fits user's requirements

• Recall is not an important criterion

Completeness of index is not an important factor.

• Comprehensive crawling is unnecessary

Slide from William Y. Arms

Page 20: Tamara Berg Retrieval 790-133 Language and Vision

Concept of Relevance and Importance

Document measures

Relevance, as conventionally defined, is binary (relevant or not relevant). It is usually estimated by the similarity between the terms in the query and each document.

Importance measures documents by their likelihood of being useful to a variety of users. It is usually estimated by some measure of popularity.

Web search engines rank documents by a weighted combination of estimates of relevance and importance.

Slide from William Y. Arms

Page 21: Tamara Berg Retrieval 790-133 Language and Vision

Relevance

• Words in document (stored in inverted index)• Location information – for use of proximity in

multi-word search.• In page title, page url?• Visual presentation details – font size of

words, words in bold.

Page 22: Tamara Berg Retrieval 790-133 Language and Vision

Relevance

<a href="http://www.cis.cornell.edu/">The Faculty of Computing and Information Science</a>

The source of Document A contains the marked-up text:

The anchor text:

The Faculty of Computing and Information Science can be considered descriptive metadata about the document:

http://www.cis.cornell.edu/

Slide from William Y. Arms

Anchor Text

Page 23: Tamara Berg Retrieval 790-133 Language and Vision

Importance - PageRank Algorithm

Used to estimate popularity of documents

Concept:

The rank of a web page is higher if many pages link to it.

Links from highly ranked pages are given greater weight than links from less highly ranked pages.

Slide from William Y. Arms

Page 24: Tamara Berg Retrieval 790-133 Language and Vision

Intuitive Model (Basic Concept)

Basic (no damping)

A user:

1. Starts at a random page on the web

2. Selects a random hyperlink from the current page and jumps to the corresponding page

3. Repeats Step 2 a very large number of times

Pages are ranked according to the relative frequency with which they are visited.

Slide from William Y. Arms

Page 25: Tamara Berg Retrieval 790-133 Language and Vision

PageRank

Page 26: Tamara Berg Retrieval 790-133 Language and Vision

Example

1 2

4

3

5

6

Page 27: Tamara Berg Retrieval 790-133 Language and Vision

Basic Algorithm: Matrix Representation

Slide from William Y. Arms

Page 28: Tamara Berg Retrieval 790-133 Language and Vision

Basic Algorithm: Normalize by Number of Links from Page

Slide from William Y. Arms

Page 29: Tamara Berg Retrieval 790-133 Language and Vision

Basic Algorithm: Normalize by Number of Links from Page

Slide from William Y. Arms

Page 30: Tamara Berg Retrieval 790-133 Language and Vision

Basic Algorithm: Weighting of Pages

Initially all pages have weight 1/n

w0 = 0.17

0.17

0.17

0.17

0.17

0.17

Recalculate weights

w1 = Bw0 = 0.06

0.21

0.29

0.35

0.04

0.06

If the user starts at a random page, the jth element of w1 is the probability of reaching page j after one step.

Slide from William Y. Arms

Page 31: Tamara Berg Retrieval 790-133 Language and Vision

Basic Algorithm: Weighting of Pages

Initially all pages have weight 1/n

w0 = 0.17

0.17

0.17

0.17

0.17

0.17

Recalculate weights

w1 = Bw0 = 0.06

0.21

0.29

0.35

0.04

0.06

If the user starts at a random page, the jth element of w1 is the probability of reaching page j after one step.

Slide from William Y. Arms

Page 32: Tamara Berg Retrieval 790-133 Language and Vision

Basic Algorithm: Weighting of Pages

Initially all pages have weight 1/n

w0 = 0.17

0.17

0.17

0.17

0.17

0.17

Recalculate weights

w1 = Bw0 = 0.06

0.21

0.29

0.35

0.04

0.06

If the user starts at a random page, the jth element of w1 is the probability of reaching page j after one step.

Slide from William Y. Arms

Page 33: Tamara Berg Retrieval 790-133 Language and Vision

Basic Algorithm: Iterate

Iterate: wk = Bwk-1

0.01

0.32

0.46

0.19

0.01

0.01

0.01

0.47

0.34

0.17

0.00

0.00

->

->

->

->

->

->

0.00

0.40

0.40

0.20

0.00

0.00

w0 w1 w2 w3 ... converges to ... w

At each iteration, the sum of the weights is 1.

0.17

0.17

0.17

0.17

0.17

0.17

0.06

0.21

0.29

0.35

0.04

0.06

Slide from William Y. Arms

Page 34: Tamara Berg Retrieval 790-133 Language and Vision

Special Cases of Hyperlinks on the Web

There is no link out of {2, 3, 4}

1

2

34

5 6

Slide from William Y. Arms

Page 35: Tamara Berg Retrieval 790-133 Language and Vision

Google PageRank with Damping

A user:

1. Starts at a random page on the web

2a. With probability 1-d, selects any random page and jumps to it

2b. With probability d, selects a random hyperlink from the current page and jumps to the corresponding page

3. Repeats Step 2a and 2b a very large number of times

Pages are ranked according to the relative frequency with which they are visited.

[For dangling nodes, always follow 2a.]

Slide from William Y. Arms

Teleport!

Page 36: Tamara Berg Retrieval 790-133 Language and Vision

The PageRank Iteration

The basic method iterates using the normalized link matrix, B.

wk = Bwk-1

This w is an eigenvector of B

PageRank iterates using a damping factor. The method iterates:

wk = (1 - d)w0 + dBwk-1

w0 is a vector with every element equal to 1/n.

Slide from William Y. Arms

Page 37: Tamara Berg Retrieval 790-133 Language and Vision

Choice of d

Conceptually, values of d that are close to 1 are desirable as they emphasize the link structure of the Web graph, but...

• The rate of convergence of the iteration decreases as d approaches 1.

• The sensitivity of PageRank to small variations in data increases as d approaches 1.

It is reported that Google uses a value of d = 0.85 and that the computation converges in about 50 iterations

Slide from William Y. Arms

Page 38: Tamara Berg Retrieval 790-133 Language and Vision

Image retrieval

Page 39: Tamara Berg Retrieval 790-133 Language and Vision

Types of queries

1) Text query based retrieval

2) Image query based retrieval

Page 40: Tamara Berg Retrieval 790-133 Language and Vision

1) Text query retrieval

Page 41: Tamara Berg Retrieval 790-133 Language and Vision

2) Image query retrieval

Content based image retrieval:• Analyze visual content of images

– Extract features – Build visual descriptor of each image (query and database

images).

• For a query image, match descriptors between query and database images.

• Return closest matches in ranked order by similarity.

Page 42: Tamara Berg Retrieval 790-133 Language and Vision

Image query retrieval

Query Image

Page 43: Tamara Berg Retrieval 790-133 Language and Vision

Reminder: Image Representation

Represent the image as a spatial grid of average pixel colorsConvert data base of images to this representationRepresent query image in this representation.Find images from data base that are similar to query.

Photo by: marielito

Page 44: Tamara Berg Retrieval 790-133 Language and Vision

Image query retrieval

Query Image

Database Images

Page 45: Tamara Berg Retrieval 790-133 Language and Vision

Image query retrieval

Query Image

Ranked Results – database images ranked by similarity to query

Page 46: Tamara Berg Retrieval 790-133 Language and Vision

Image query retrieval

• What’s easy?

• What’s difficult?

Page 47: Tamara Berg Retrieval 790-133 Language and Vision

Image Retrieval

• Image relevance

• Image importance

Page 48: Tamara Berg Retrieval 790-133 Language and Vision

Image Retrieval

• Image relevance

• Image importance

Page 49: Tamara Berg Retrieval 790-133 Language and Vision

Text info

• Idea – most images have associated text.• Analyze words around picture & associated with picture (title,

words, links, etc). • For a query word return pictures based on standard web

search on text associated with image.

Page 50: Tamara Berg Retrieval 790-133 Language and Vision

Human info

Just leave the content analysis/labeling to people.

ESP gameLuis von Ahn, Ruoran Liu and Manuel Blum

Page 51: Tamara Berg Retrieval 790-133 Language and Vision

User data

Watch what people click on!

Page 52: Tamara Berg Retrieval 790-133 Language and Vision

Tags: banana, monkey banana, monkeys, primate, homeless, cardboard, funny photo, orangutan, hairy, brown, zoo, animal, brown fur …

Flickr – over 2 billion user uploaded pictures, 2+ million uploads per day. About half come with some sort of associated text.

Photo by haskins

Web – billions of web pages almost always containing text and images.

Text+Image info

Page 53: Tamara Berg Retrieval 790-133 Language and Vision

Image Retrieval

• Image relevance

• Image importance

Page 54: Tamara Berg Retrieval 790-133 Language and Vision
Page 55: Tamara Berg Retrieval 790-133 Language and Vision
Page 56: Tamara Berg Retrieval 790-133 Language and Vision
Page 57: Tamara Berg Retrieval 790-133 Language and Vision

PageRank

• For web pages – use links between two pages as a measure of their similarity.

• For images – use number of matching features between two images as a measure of their similarity.– Features – SIFT features (based on histograms of

edges in different directions).– Two features are considered matching if their SSD

distance is below a threshold.

Page 58: Tamara Berg Retrieval 790-133 Language and Vision
Page 59: Tamara Berg Retrieval 790-133 Language and Vision

Pros/Cons

• Where will it work well?

• Where will it fail?

• What happens to polysemous queries?

• What about logos?

Page 60: Tamara Berg Retrieval 790-133 Language and Vision

Text + Image PageRank

How could we extend this algorithm to incorporate image and text information?

Page 61: Tamara Berg Retrieval 790-133 Language and Vision

Animals on the Web

Tamara L. Berg & David Forsyth

Page 62: Tamara Berg Retrieval 790-133 Language and Vision

I want to find lots of good pictures of monkeys…

What can I do?

Page 63: Tamara Berg Retrieval 790-133 Language and Vision

Google Image Search -- monkey

Circa 2006

Page 64: Tamara Berg Retrieval 790-133 Language and Vision

Google Image Search -- monkey

Page 65: Tamara Berg Retrieval 790-133 Language and Vision

Google Image Search -- monkey

Page 66: Tamara Berg Retrieval 790-133 Language and Vision

Google Image Search -- monkey

Words alone won’t work

Page 67: Tamara Berg Retrieval 790-133 Language and Vision

Flickr Search - monkey

• Even with humans doing the labeling, the data is extremely noisy -- context, polysemy, photo sets

• Words alone still won’t work!

Page 68: Tamara Berg Retrieval 790-133 Language and Vision

Our Results

Page 69: Tamara Berg Retrieval 790-133 Language and Vision

General Approach

- Vision alone won’t solve the problem. - Text alone won’t solve the problem.

-> Combine the two!

Page 70: Tamara Berg Retrieval 790-133 Language and Vision

Animals on the Web

Extremely challenging visual categories.

Free text on web pages.

Take advantage of language advances.

Combine multiple visual and textual cues.

Page 71: Tamara Berg Retrieval 790-133 Language and Vision

Goal:

Classify images depicting semantic categories of animals in a wide range of aspects, configurations and appearances. Images typically portray multiple species that differ in appearance.

Page 72: Tamara Berg Retrieval 790-133 Language and Vision

Animals on the Web Outline:

Harvest pictures of animals from the web using Google Text Search.

Select visual exemplars using text based information.

Use visual and textual cues to extend to similar images.

Page 73: Tamara Berg Retrieval 790-133 Language and Vision

Harvested Pictures

14,051 images for 10 animal categories.

12,886 additional images for monkey category using related monkey queries (primate, species, old world, science…)

Page 74: Tamara Berg Retrieval 790-133 Language and Vision

Text Model

Latent Dirichlet Allocation (LDA) on the words in collected web pages to discover 10 latent topics for each category.

Each topic defines a distribution over words. Select the 50 most likely words for each topic.

1.) frog frogs water tree toad leopard green southern music king irish eggs folk princess river ball range eyes game species legs golden bullfrog session head spring book deep spotted de am free mouse information round poison yellow upon collection nature paper pond re lived center talk buy arrow common prince

Example Frog Topics:

2.) frog information january links common red transparent music king water hop tree pictures pond green people available book call press toad funny pottery toads section eggs bullet photo nature march movies commercial november re clear eyed survey link news boston list frogs bull sites butterfly court legs type dot blue

Page 75: Tamara Berg Retrieval 790-133 Language and Vision

Select ExemplarsRank images according to whether they have these likely

words near the image in the associated page (word score)

Select up to 30 images per topic as exemplars.

2.) frog information january links common red transparent music king water hop tree pictures pond green people available book call press ...

1.) frog frogs water tree toad leopard green southern music king irish eggs folk princess river ball range eyes game species legs golden bullfrog session head ...

Page 76: Tamara Berg Retrieval 790-133 Language and Vision

SensesThere are multiple senses of a category within

the Google search results.

Ask the user to identify which of the 10 topics are relevant to their search. Merge.

Optional second step of supervision – ask user to mark erroneously labeled exemplars.

Page 77: Tamara Berg Retrieval 790-133 Language and Vision

Image Model

Match Pictures of a category

Page 78: Tamara Berg Retrieval 790-133 Language and Vision

Geometric Blur Shape Feature

Sparse Signal Geometric Blur

(A.) Berg & Malik ‘01

Captures local shape, but allows for some deformation. Robust to differences in intra category object shape.

Used in current best object recognition systems Zhang et al, CVPR 2006Frome et al, NIPS 2006

Page 79: Tamara Berg Retrieval 790-133 Language and Vision

Image Model (cont.)Color Features: Histogram of what colors appear in the image

Texture Features: Histograms of 16 filters

* =

Page 80: Tamara Berg Retrieval 790-133 Language and Vision

++

++

+

++

+ ++

++

+

+

++

++

***

**

*

* *

**

*

**

**

* **

*

+ *+

+

+

+

+

+ +

+*

*

?

*

+++

*

?

+

+

Scoring ImagesRelevant Features

Irrelevant Features

Relevant Exemplar

For each query feature apply a 1-nearest neighbor classifier. Sum votes for relevant class. Normalize.

Combine 4 cue scores (word, shape, color, texture) using a linear combination.

Query

*

Irrelevant Exemplar

Page 81: Tamara Berg Retrieval 790-133 Language and Vision

Classification ComparisonW

ords

Wor

ds +

Pic

ture

Page 82: Tamara Berg Retrieval 790-133 Language and Vision

Cue Combination:Monkey

Page 83: Tamara Berg Retrieval 790-133 Language and Vision

Cue Combination:

FrogGiraffe

Page 84: Tamara Berg Retrieval 790-133 Language and Vision

Re-ranking Precision

Classification Performance

Google

Page 85: Tamara Berg Retrieval 790-133 Language and Vision

Re-ranking Precision

Monkey Category

Classification Performance

Google

Monkey

Page 86: Tamara Berg Retrieval 790-133 Language and Vision

Ranked Results:

http://tamaraberg.com/google/animals/index.html

Page 87: Tamara Berg Retrieval 790-133 Language and Vision

Commercial systems

• http://tineye.com/login• http://labs.systemone.at/retrievr/• http://www.polarrose.com/• http://images.google.com• http://www.picsearch.com/