Intelligent Search

DESCRIPTION

ApacheCon 2009 talk describing methods for doing intelligent (or at least really clever) search on items with poor or no meta-data. The video of the talk should be available shortly on the ApacheCon website.

Page 1: Intelligent Search

Intelligent Search

Page 2: Intelligent Search

Intelligent Search (or at least really clever)

Page 3: Intelligent Search

Some Preliminaries

• Text retrieval = matrix multiplication

A: our corpus (documents are rows, terms are columns)

Page 4: Intelligent Search

Some Preliminaries

• Text retrieval = matrix multiplication

for each document d:
    for each term t:
        s_d += a_{dt} q_t

A: our corpus (documents are rows, terms are columns)

Page 5: Intelligent Search

Some Preliminaries

• Text retrieval = matrix multiplication

A: our corpus (documents are rows, terms are columns)

$s_d = \sum_t a_{dt}\, q_t$

Page 6: Intelligent Search

Some Preliminaries

• Text retrieval = matrix multiplication

A: our corpus (documents are rows, terms are columns)

s = A q
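The build above (nested loops, then a sum, then s = A q) can be checked on a toy corpus. This is a minimal sketch in plain Python; the three-term vocabulary and the document rows are made up for illustration, and the function names are mine rather than anything from the talk.

```python
# Toy corpus matrix A: one row per document, one column per term.
# Hypothetical vocabulary (columns): "flamenco", "guitar", "paco".
A = [
    [1, 1, 0],  # doc 0 contains "flamenco" and "guitar"
    [0, 1, 1],  # doc 1 contains "guitar" and "paco"
    [0, 0, 0],  # doc 2 contains none of the terms
]
q = [0, 1, 1]  # query vector over the same terms: "guitar paco"


def score_loops(A, q):
    """Nested-loop form: s_d += a_dt * q_t."""
    s = [0] * len(A)
    for d, row in enumerate(A):
        for t, a_dt in enumerate(row):
            s[d] += a_dt * q[t]
    return s


def mat_vec(A, q):
    """Matrix-vector form: s = A q."""
    return [sum(a * x for a, x in zip(row, q)) for row in A]


scores = mat_vec(A, q)
# Both forms compute the same score vector.
assert scores == score_loops(A, q)
```

Ranking documents by `scores` is then the retrieval step; a real engine would use weighted rather than 0/1 entries.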

Page 7: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

A: our users’ histories (users are rows, items are columns)

Page 8: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

A: our users’ histories (users are rows, items are columns)

Users who bought items in the list h also bought items in the list r

Page 9: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

for each user u:
    for each item t1:
        for each item t2:
            r_{t_1} += a_{u,t_1} a_{u,t_2} h_{t_2}

A: our users’ histories (users are rows, items are columns)

Page 10: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

A: our users’ histories (users are rows, items are columns)

$s_{t_1} = \sum_{t_2} \sum_u a_{u,t_1}\, a_{u,t_2}\, q_{t_2}$ (with s and q in the roles of r and h)

Page 11: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

A: our users’ histories (users are rows, items are columns)

s = A’ (A q)

Page 12: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

A: our users’ histories (users are rows, items are columns)

s = (A’ A) q

Page 13: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

A: our users’ histories (users are rows, items are columns)

s = (A’ A) q ish!
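The two groupings, A’ (A q) and (A’ A) q, give the same answer; only the order of the work changes. The sketch below checks that on a tiny, made-up history matrix, ignoring all of the "ish". The helper names are illustrative only.

```python
# Toy history matrix A: rows are users, columns are items (0/1 = interacted).
A = [
    [1, 1, 0, 0],  # user 0
    [1, 0, 1, 0],  # user 1
    [0, 1, 1, 1],  # user 2
]
h = [1, 0, 0, 0]  # the history we want recommendations for: item 0 only


def transpose(M):
    return [list(col) for col in zip(*M)]


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_mul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


At = transpose(A)

# s = A' (A h): first find users who share the history, then add up their items.
r_two_step = mat_vec(At, mat_vec(A, h))

# s = (A' A) h: build the item-by-item matrix once, then apply it per request.
K = mat_mul(At, A)
r_precomputed = mat_vec(K, h)

assert r_two_step == r_precomputed
```

The second form is the one that can be computed offline, since A’ A does not depend on the incoming history.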

Page 14: Intelligent Search

Why so ish?

• In real life, ish happens because:

• Big data ... so we selectively sample

• Sparse data ... so we smooth

• Finite computers ... so we sparsify

• Top-40 effect ... so we use some stats

Page 15: Intelligent Search

The same in spite of ish

• The shape of the computation is unchanged

• The cost of the computation is unchanged

• Broad algebraic conclusions still hold

Page 16: Intelligent Search

Back to recommendations ...

Page 17: Intelligent Search

Dyadic Structure

● Functional
– Interaction: actor -> item*
● Relational
– Interaction ⊆ Actors x Items
● Matrix
– Rows indexed by actor, columns by item
– Value is count of interactions
● Predict missing observations

Page 18: Intelligent Search

Fundamental Algorithmics

● Cooccurrence: K = A’ A
● A is actors x items, K is items x items
● Product has general shape of matrix
● K tells us “users who interacted with x also interacted with y”
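Cooccurrence counts can also be built straight from (actor, item) pairs without forming A. The sketch below assumes each actor's history is treated as a set (a 0/1 matrix rather than interaction counts), which is my simplification, and it uses invented actors and items. It returns only the off-diagonal entries of the items x items matrix.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical interaction log: (actor, item) pairs.
interactions = [
    ("ann", "x"), ("ann", "y"),
    ("bob", "x"), ("bob", "y"), ("bob", "z"),
    ("cat", "z"),
]


def cooccurrence(pairs):
    """Count, for each ordered item pair (i, j), the actors who touched both."""
    items_by_actor = defaultdict(set)
    for actor, item in pairs:
        items_by_actor[actor].add(item)

    counts = defaultdict(int)
    for items in items_by_actor.values():
        # Every ordered pair of distinct items in one history cooccurs once.
        for i, j in permutations(sorted(items), 2):
            counts[(i, j)] += 1
    return dict(counts)


K = cooccurrence(interactions)
# K[("x", "y")] reads as: actors who interacted with x also interacted with y.
```

Because both orders of each pair are counted, the result is symmetric, as an items x items cooccurrence matrix should be.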

Page 19: Intelligent Search

Fundamental Algorithmic Structure

● Cooccurrence

● Matrix approximation by factoring

● LLR
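LLR here presumably means the log-likelihood ratio test from the paper listed under Resources, applied to a 2x2 table of cooccurrence counts. The sketch below uses the entropy formulation as I understand it; it is not code from the talk, and the function names are illustrative. The cell naming is an assumption: k11 counts actors who interacted with both items, k12 and k21 those who interacted with only one of them, and k22 those who interacted with neither.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb tiny negative values from floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


independent = llr(1, 1, 1, 1)    # rows and columns unrelated
associated = llr(10, 0, 0, 10)   # the two items always appear together or not at all
```

A score near zero means the cooccurrence is about what chance would give; large scores mark the pairs worth keeping when the matrix is sparsified.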

Page 20: Intelligent Search

But Wait ...

Page 21: Intelligent Search

But Wait ...

Does it have to be that way?

Page 22: Intelligent Search

What we have:

For a user who watched/bought/listened to this

Page 23: Intelligent Search

What we have:

For a user who watched/bought/listened to this

Sum over all other users who watched/bought/...

Page 24: Intelligent Search

What we have:

For a user who watched/bought/listened to this

Sum over all other users who watched/bought/...

Add up what they watched/bought/listened to

Page 25: Intelligent Search

What we have:

For a user who watched/bought/listened to this

Sum over all other users who watched/bought/...

Add up what they watched/bought/listened to

And recommend that

Page 26: Intelligent Search

What we have:

For a user who watched/bought/listened to this

Sum over all other users who watched/bought/...

Add up what they watched/bought/listened to

And recommend that

ish

Page 27: Intelligent Search

What we have:

Add up what they watched/bought/listened to

Page 28: Intelligent Search

What we have:

Add up what they watched/bought/listened to

But wait, we can do that faster

Page 29: Intelligent Search

What we have:

Add up what they watched/bought/listened to

But wait, we can do that faster

Page 30: Intelligent Search

But why not ...

Page 31: Intelligent Search

But why not ...

Page 32: Intelligent Search

But why not ...

Why just dyadic learning?

Page 33: Intelligent Search

But why not ...

Why just dyadic learning?

Why not triadic learning?

Page 34: Intelligent Search

But why not ...

Why just dyadic learning?

Why not p-adic learning?

Page 35: Intelligent Search

For example

● Users enter queries (A)
– (actor = user, item = query)
● Users view videos (B)
– (actor = user, item = video)
● AʼA gives query recommendation
– “did you mean to ask for”
● BʼB gives video recommendation
– “you might like these videos”

Page 36: Intelligent Search

The punch-line

● BʼA recommends videos in response to a query
– (isnʼt that a search engine?)
– (not quite, it doesnʼt look at content or meta-data)
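A small sketch of the cross-recommendation: with A as users x queries and B as users x videos, scoring videos for a query vector q is Bʼ (A q). All of the data below is invented for illustration (the query string borrows from the talk's example, but the users and video titles are hypothetical), and no sampling, LLR, or sparsification is applied.

```python
queries = ["paco de lucia", "van halen"]
videos = ["flamenco guitar", "eruption solo", "cooking show"]

# A: users x queries (1 = user entered that query).
A = [
    [1, 0],  # user 0
    [1, 1],  # user 1
    [0, 1],  # user 2
]
# B: users x videos (1 = user viewed that video).
B = [
    [1, 0, 0],  # user 0
    [1, 1, 0],  # user 1
    [0, 1, 1],  # user 2
]


def transpose(M):
    return [list(col) for col in zip(*M)]


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def search(query):
    """Score videos for one query via B' (A q); no content or meta-data is used."""
    q = [1 if name == query else 0 for name in queries]
    users = mat_vec(A, q)                  # how strongly each user matches the query
    scores = mat_vec(transpose(B), users)  # add up what those users viewed
    ranked = sorted(zip(scores, videos), reverse=True)
    return [video for score, video in ranked if score > 0]


results = search("paco de lucia")
```

The ranking comes entirely from user behavior, which is why the results can be sensible even when titles and tags are not.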

Page 37: Intelligent Search

Real-life example

● Query: “Paco de Lucia”
● Conventional meta-data search results:
– “hombres del paco” times 400
– not much else
● Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff

Page 38: Intelligent Search

Real-life example

Page 39: Intelligent Search

Real-life example

Page 40: Intelligent Search

System Diagram

[Diagram: processing pipeline running on Hadoop]
Inputs:
– Viewing logs: (t, user, video)
– Search logs: (t, user, query-term)
Steps: selective sampler and count on each log; join on user; count; llr + sparsify
Outputs:
– Related videos: v => v1 v2...
– Related terms: v => t1 t2...

Page 41: Intelligent Search

Indexing

[Diagram: Hadoop stage feeding a Lucene (+Katta?) stage]
Inputs:
– Related videos: v => v1 v2...
– Related terms: v => t1 t2...
– Video meta: v => url title...
Step: join on video
Output: Lucene Index

Page 42: Intelligent Search

Hypothetical Example

● Want a navigational ontology?
● Just put labels on a web page with traffic
– This gives A = users x label clicks
● Remember viewing history
– This gives B = users x items
● Cross recommend
– BʼA = click to item mapping
● After several users click, results are whatever users think they should be

Page 43: Intelligent Search

Resources

● My blog
– http://tdunning.blogspot.com/
● The original LLR in NLP paper
– Accurate Methods for the Statistics of Surprise and Coincidence (check on citeseer)
● Source code
– Mahout project
– contact me ([email protected])