Lecture #32 WWW Search

Lecture #32

WWW Search

Review: Data Organization

• Kinds of things to organize
  – Menu items
  – Text
  – Images
  – Sound
  – Videos
  – Records (e.g. a person’s name, address, & phone number, or a car’s year, make, & model)

Review: Data Organization

• Three ways to find things:
  – Lists (in-order search, binary search)
  – Trees (balance number of branches with time to decide which is correct branch)
  – Search

WWW Search

Search issues

• How do we say what we want?
  – I want a story about pigs
  – I want a picture of a rooster
  – How many televisions were sold in Vietnam during 2000?
  – Find a movie like this one

• How does the computer find what we said?

Things to search for

• Records

• Text

• Images

• Audio

• Video

Records

• Car
  – Price
  – Miles
  – Year
  – Make
  – Doors

• Queries
  – Price < 6000 & Miles < 100000
  – Make == Toyota & Year > 1993
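
The two queries above can be sketched as filters over a list of records. This is only an illustration: the `Car` class and the three cars in the inventory are made up, and a real database would use a query language rather than a Python loop.

```python
from dataclasses import dataclass


@dataclass
class Car:
    price: int  # dollars
    miles: int
    year: int
    make: str
    doors: int


# Made-up inventory, for illustration only.
cars = [
    Car(price=5500, miles=82000, year=1995, make="Toyota", doors=4),
    Car(price=7200, miles=60000, year=1998, make="Honda", doors=2),
    Car(price=2900, miles=140000, year=1990, make="Toyota", doors=4),
]

# Price < 6000 & Miles < 100000
cheap_low_miles = [c for c in cars if c.price < 6000 and c.miles < 100000]

# Make == Toyota & Year > 1993
newer_toyotas = [c for c in cars if c.make == "Toyota" and c.year > 1993]
```

Each query is a yes/no test applied to every record; the result is the list of records that pass.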

Queries

• Make == Toyota & Year >1993

Queries

• Year >1993 or Price < $3,000
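
An "or" query keeps a record when either condition holds. A minimal sketch, assuming records stored as plain dictionaries with made-up values:

```python
# Each record is a plain dict; the values are invented for illustration.
cars = [
    {"make": "Toyota", "year": 1995, "price": 5500},
    {"make": "Ford", "year": 1989, "price": 2500},
    {"make": "Honda", "year": 1991, "price": 4000},
]


def matches(car):
    # Year > 1993 or Price < $3,000
    return car["year"] > 1993 or car["price"] < 3000


results = [car["make"] for car in cars if matches(car)]
```

The Toyota passes on year, the Ford passes on price, and the Honda fails both tests.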

Databases

• Large collections of records

• Accessed by queries

Things to search for

• Records

• Text

• Images

• Audio

• Video

Text searching

• How do I say what I want?
  – Type some phrase
    • I want a story about pigs

• How will the computer match this?
  – What is text?
    • An array of characters
  – What can a computer do with text?
    • Match characters

Text searching

• People think in words not characters

• How do I convert an array of characters into an array of words?
  – Collect together sequences of letters
  – How do I know if character C is a letter?
    • (C >= “a” & C <= “z”) | (C >= “A” & C <= “Z”)
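
A minimal sketch of this conversion, using the same letter test. The function names `is_letter` and `to_words` are made up for illustration; the test only covers unaccented English letters, as on the slide.

```python
def is_letter(c):
    # Same test as above: c lies in a..z or in A..Z.
    return ("a" <= c <= "z") or ("A" <= c <= "Z")


def to_words(text):
    """Collect runs of consecutive letters into a list of words."""
    words = []
    current = ""
    for c in text:
        if is_letter(c):
            current += c
        elif current:
            # A non-letter ends the word collected so far.
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


words = to_words("I want a story about pigs.")
```

Spaces and punctuation act only as separators, so they never appear in the resulting word list.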

Convert to words

• Because people think in words

Every document is an array of words

• I want a story about pigs

• How will I find the right documents?
  – Find all documents that have the word “pigs”

Searching text

• How will I find pigs fast?
  – Create an index of all words
    • With each word store the name or address of each document that contains that word
  – Search the index for “pigs”
    • Return the list of documents
    • Use a binary search on the word list (50,000 words)
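
The index described above can be sketched as a mapping from each word to the documents that contain it, plus a sorted word list for binary search. The three documents and the name `lookup` are made up for illustration.

```python
from bisect import bisect_left

# Made-up documents, already split into lower-case words.
documents = {
    "doc1": ["three", "little", "pigs"],
    "doc2": ["farm", "animals", "pigs", "roosters"],
    "doc3": ["used", "cars"],
}

# Build the index: word -> set of document names containing it.
index = {}
for name, words in documents.items():
    for word in words:
        index.setdefault(word, set()).add(name)

# Sorted word list, so a word can be found by binary search.
sorted_words = sorted(index)


def lookup(word):
    """Binary-search the word list and return the matching documents."""
    i = bisect_left(sorted_words, word)
    if i < len(sorted_words) and sorted_words[i] == word:
        return sorted(index[word])
    return []
```

Binary search halves the list at every step, so a list of 50,000 words needs at most about 16 comparisons, since 2^16 = 65,536 is the first power of two above 50,000.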

Problems

• What if a document has the word “Pig” but not “pigs”?

• Normalize
  – Case - make all words lower case
    • Pig -> pig
  – Stemming - remove all suffixes and prefixes before putting a word into the index
    • pigs -> pig
    • piggy -> pig
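
A toy sketch of both normalization steps. The stemming rules here are invented just to cover the examples above (strip a final "s" or "y", then collapse a doubled final consonant); they handle suffixes only and would mangle many real words. Practical systems use a carefully designed stemmer such as the Porter algorithm.

```python
def normalize(word):
    # Case: make the word lower case.
    word = word.lower()
    # Toy stemming: strip one short suffix from words longer than 3 letters.
    for suffix in ("s", "y"):
        if len(word) > 3 and word.endswith(suffix):
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final consonant left behind, e.g. "pigg" -> "pig".
    if len(word) > 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


stems = [normalize(w) for w in ["Pig", "pigs", "piggy"]]
```

The same function must be applied both when building the index and when processing the query, so that "Pig", "pigs", and "piggy" all land on the same index entry.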

Problems

• I want a story about pigs
  – How does the computer know to search for pigs?
    • It doesn’t
  – How does the computer know what a story is?
    • It doesn’t

Searching

• I want a story about pigs

• Pick out the important words and search for them
  – Which words are important?
  – D = number of times a word appears in a document
  – A = average number of times a word appears in all documents
  – Importance = D/A
    • Why?
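
One way to see why D/A works is to compute it on a few made-up documents. In this sketch A is taken as the word's total count divided by the number of documents, which is one reasonable reading of "average number of times a word appears in all documents".

```python
# Made-up documents, already split into lower-case words.
documents = {
    "doc1": "i want a story about pigs and more pigs".split(),
    "doc2": "i want a picture of a rooster".split(),
    "doc3": "a story about a car".split(),
}


def importance(word, doc_name):
    # D = number of times the word appears in this document.
    d = documents[doc_name].count(word)
    # A = average number of times the word appears per document.
    total = sum(words.count(word) for words in documents.values())
    a = total / len(documents)
    return d / a if a else 0.0


pigs_score = importance("pigs", "doc1")
a_score = importance("a", "doc1")
```

Here "pigs" scores about 3.0 in doc1 while "a" scores about 0.6. A word that is frequent in one document but rare elsewhere tells us what that document is about; a word that is frequent everywhere tells us nothing.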

How do we create an index of all documents on the Web?

• Try = a list of URLs
• Seen = all URLs you have seen

While (Try is not empty) {
    Page = take a URL from Try
    Words = all the “important” words in Page
    add Page to the index using all of Words
    Links = all URLs in Page
    for every Link that is not in Seen
        add Link to Try and to Seen
}
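
The loop above can be sketched in Python as follows. To keep the example runnable without network access, `FAKE_WEB` is a made-up in-memory stand-in for the Web and `fetch` is a hypothetical helper that reads from it; a real crawler would download pages instead. The sketch also indexes every word rather than only the "important" ones, and it marks the starting URLs as seen, which the loop above leaves implicit.

```python
import re
from collections import deque

# Hypothetical pages standing in for the Web (URLs and contents are made up).
FAKE_WEB = {
    "http://a.example": 'pigs story <a href="http://b.example">b</a>',
    "http://b.example": 'rooster picture <a href="http://a.example">a</a> '
                        '<a href="http://c.example">c</a>',
    "http://c.example": "pigs farm",
}


def fetch(url):
    # Stand-in for downloading a page.
    return FAKE_WEB.get(url, "")


def crawl(start_urls):
    to_try = deque(start_urls)   # Try
    seen = set(start_urls)       # Seen
    index = {}                   # word -> set of page URLs
    while to_try:
        page_url = to_try.popleft()
        html = fetch(page_url)
        links = re.findall(r'href="([^"]+)"', html)
        text = re.sub(r"<[^>]*>", " ", html)        # drop the HTML tags
        for word in re.findall(r"[a-z]+", text.lower()):
            index.setdefault(word, set()).add(page_url)
        for link in links:
            if link not in seen:
                seen.add(link)
                to_try.append(link)
    return index


index = crawl(["http://a.example"])
```

The Seen set is what stops the crawler from looping forever: pages a and b link to each other, yet each is fetched only once.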

Other ways to find important words and important documents

• A Document is important if many other documents point to it

• A word is important in document D if that word occurs frequently in documents that link to document D.
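
The first idea can be sketched by counting inbound links. The link graph below is made up, and counting every link equally is a simplification; PageRank, for example, also weights each link by the importance of the page it comes from.

```python
# Made-up link graph: page -> pages it links to.
links = {
    "home": ["pigs", "roosters"],
    "blog": ["pigs"],
    "pigs": ["home"],
    "roosters": [],
}

# Count how many other pages point to each page.
inbound = {page: 0 for page in links}
for source, targets in links.items():
    for target in set(targets):
        if target != source:          # ignore links from a page to itself
            inbound[target] = inbound.get(target, 0) + 1

# Most-linked pages first; ties broken alphabetically.
ranked = sorted(inbound, key=lambda page: (-inbound[page], page))
```

With this graph the "pigs" page ranks first because two other pages point to it, while "blog" ranks last because nothing points to it.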

Images

• What will I say when searching for an image?
  – I want a rooster picture
  – Draw a picture of a rooster?

Search by picture?


Is this possible? If so, how?

What’s in a picture?

• Computers don’t understand the contents of images

• To a computer an image is a bunch of colored pixels

I want a picture of a rooster

• Label all of the pictures

• How does Google Images do it?
  – File name of the picture “rooster-crossingSt.jpg”
  – Words around the picture in the HTML

• Use “Safe Search” and set filters appropriately (http://www.youtube.com/watch?v=maWx-ApkBCs)

Audio

• Talking
  – Use speech recognition to convert audio to text
  – With each recognized word keep track of where in the audio it was recognized.

• Build an index using the recognized text
  – Normalize based on how words sound rather than how they are spelled.

Video

• Where in “Casablanca” does Bogart say “Play it again Sam” ?

  – He never does; he just says “play it”

• How can the computer find that?
  – Transcribe the audio
  – Speech recognition on the audio

Video

• Does Woody ever kiss Bo Peep?

• Exactly what color is a kiss?

Video

• Does Woody ever kiss Bo Peep?

• Annotate every frame with who is in the frame and search for frames with both Woody and Bo Peep.
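
A minimal sketch of searching such annotations. The frame numbers and the characters listed in each frame are made up, and `frames_with` is a hypothetical helper name.

```python
# Made-up annotations: frame number -> characters visible in that frame.
frame_annotations = {
    0: {"Woody"},
    1: {"Woody", "Buzz"},
    2: {"Woody", "Bo Peep"},
    3: {"Bo Peep"},
}


def frames_with(*characters):
    """Return the frames in which every requested character appears."""
    wanted = set(characters)
    return sorted(
        frame
        for frame, present in frame_annotations.items()
        if wanted <= present   # wanted is a subset of who is in the frame
    )


together = frames_with("Woody", "Bo Peep")
```

This only narrows the search to frames where both characters appear; it cannot say what they are doing there, which is the point of the question about the color of a kiss.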

So what’s with this?

Or this?

Is Woody cheating?

Search

• Records
  – Queries
    • < > = And Or

• Text
  – Normalized words (case, stemming, thesaurus)

• Images
  – Add words

• Audio
  – Transcribe or recognize as words

• Video
  – Transcribe
  – Annotate

“Re-Search” Directions in Image Recognition, Search and Retrieval

From R. Szeliski, Computer Vision Algorithms and Application, Course Notes CSE 576, U. Washington

Face Detection – Viola & Jones

Face Detection in Commercial Digital Cameras

Train on
  – 1000’s of faces
  – Millions of non-faces

Face Recognition (Eigenfaces [Turk and Pentland 1991])

[Figure: an N × N image unrolled into a vector of N² pixel values]

Project image into higher-dimensional space

“Recognize” by grouping unknown image with closest training example

Face Recognition (Picasa - Google)

• Image search/organization
• Automatically finds, crops and groups images of the same person from a collection of photos
• Allows user feedback (trainable) - user can indicate if it found the wrong person.

From R. Szeliski, Computer Vision Algorithms and Application, Course Notes CSE 576, U. Washington

Create visual “words” from image features.

Face/Object Recognition/Search: Feature-Based Technology

[Figure: Object; Extract Features; Bag of “words”*]

*Li Fei-Fei (Princeton)

From R. Szeliski, Computer Vision Algorithms and Application, Course Notes CSE 576, U. Washington

Do this for multiple objects

Face/Object Recognition/Search: Feature-Based Technology

*Li Fei-Fei (Princeton)

From R. Szeliski, Computer Vision Algorithms and Applications, p. 605

How to get matching images/documents?

Use “word” frequencies $n_{id} / n_d$, where $n_{id}$ = # times word i occurs in document d and $n_d$ = total # words in document d

Then combine word frequency with inverse document frequency weighting to downweight words that occur frequently (D = # of occurrences; A = average # of occurrences)
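
A minimal sketch of this weighting on made-up bags of visual "words". The word frequency follows the definition above (occurrences of word i in document d divided by the total words in d). For the inverse document frequency this sketch assumes the common form log(N / N_i), with N the number of documents and N_i the number containing word i; that formula is an assumption here, not taken from the slide.

```python
import math

# Made-up bags of visual "words", one per image.
documents = {
    "img1": ["wheel", "wheel", "window", "sky"],
    "img2": ["beak", "feather", "sky"],
    "img3": ["wheel", "door", "sky"],
}


def tf(word, doc):
    # Word frequency: occurrences of the word / total words in the document.
    words = documents[doc]
    return words.count(word) / len(words)


def idf(word):
    # Assumed form: log(N / N_i); 0 if no document contains the word.
    containing = sum(1 for words in documents.values() if word in words)
    return math.log(len(documents) / containing) if containing else 0.0


def weight(word, doc):
    return tf(word, doc) * idf(word)


sky_weight = weight("sky", "img1")
wheel_weight = weight("wheel", "img1")
```

Because "sky" occurs in every image, its inverse document frequency is log(1) = 0 and it contributes nothing to matching, whereas "wheel" keeps a positive weight.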

Face/Object Recognition/Search: Bag of Words

From R. Szeliski, Computer Vision Algorithms and Application, Course Notes CSE 576, U. Washington

Drop word features through a “vocabulary tree” to classify

Face/Object Recognition/Search: Feature-Based Technology

*Li Fei-Fei (Princeton)