32
Web Query Disambiguation from Short Sessions Lilyana Mihalkova* and Raymond Mooney University of Texas at Austin *Now at University of Maryland College Park

Web Query Disambiguation from Short Sessions

  • Upload
    selah

  • View
    34

  • Download
    0

Embed Size (px)

DESCRIPTION

Web Query Disambiguation from Short Sessions. Lilyana Mihalkova* and Raymond Mooney University of Texas at Austin *Now at University of Maryland College Park. Web Query Disambiguation. scrubs. ?. Existing Approaches. Well-studied problem: - PowerPoint PPT Presentation

Citation preview

Page 1: Web Query Disambiguation  from Short Sessions

Web Query Disambiguation from Short Sessions

Lilyana Mihalkova* and Raymond Mooney

University of Texas at Austin

*Now at University of Maryland College Park

Page 2: Web Query Disambiguation  from Short Sessions

2

Web Query Disambiguation

scrubs

?

Page 3: Web Query Disambiguation  from Short Sessions

3

Existing Approaches

• Well-studied problem: – [e.g., Sugiyama et al. 04, Sun et al. 05, Dou et al. 07]

• Build a user profile from a long history of that user’s interactions with the search engine

Page 4: Web Query Disambiguation  from Short Sessions

4

Concerns

• Privacy concerns– AOL Data Release– “Googling Considered Harmful” [Conti 06]

• Pragmatic concerns– Storing and protecting data– Identifying users across searches

Page 5: Web Query Disambiguation  from Short Sessions

5

Proposed Setting

• Base personalization only on short glimpses of user search activity captured in brief short-term sessions

• Do not assume across-session user identifiers of any sort

Page 6: Web Query Disambiguation  from Short Sessions

How Short is Short-Term?N

umbe

r of s

essio

ns w

ith th

at m

any

quer

ies

Number of queries before ambiguous query

Page 7: Web Query Disambiguation  from Short Sessions

7

Is This Enough Info?

98.7 fm

kroq

scrubs

www.star987.com

www.kroq.com

???

huntsville hospital

ebay.com

scrubs

www.huntsvillehospital.com

www.ebay.com

???scrubs-tv.com scrubs.com

Page 8: Web Query Disambiguation  from Short Sessions

More Closely Related Work

• [Almeida & Almeida 04]: Similar assumption of short sessions, but better suited for a specialized search engine (e.g. on computer science literature)

Page 9: Web Query Disambiguation  from Short Sessions

Main Challenge

• How to harness this small amount of potentially noisy information available for each user?– Exploit the relations among sessions, URLs, and

queries

Page 10: Web Query Disambiguation  from Short Sessions

Relationship are established by shared queries or clicks that

are themselves related

Query strings and URLs are related by sharing

identical keywords, or by being identical

The choices users make over the set of possible results are

also interdependent

Page 11: Web Query Disambiguation  from Short Sessions

Exploiting Relational Information

• To overcome sparsity in individual sessions, need techniques that can effectively combine effects of noisy relations of varying types among entities– Use statistical relational learning (SRL) [Getoor &

Taskar 07]– We used one particular SRL model: Markov logic

networks (MLNs) [Richardson & Domingos 06]

Page 12: Web Query Disambiguation  from Short Sessions

Rest of This Talk

• Background on MLNs• Details of our model

– Ways of relating users– Clauses defined in the model

• Experimental results• Future work

Page 13: Web Query Disambiguation  from Short Sessions

13

Markov Logic Networks (MLNs)

• Set of weighted first-order clauses• The larger the weight of a clause, the greater

the penalty for not satisfying a grounding of this clause– Clauses can be viewed as relational features

[Richardson & Domingos 06]

Page 14: Web Query Disambiguation  from Short Sessions

MLNs Continued

• Q: set of unknown query atoms• E: set of evidence atoms• What is P(Q|E), according to an MLN:

All formulas

Number of satisfied

groundings of formula i

Normalizing constant

Page 15: Web Query Disambiguation  from Short Sessions

MLN Learning and Inference

• A wide range of algorithms available in the Alchemy software package [Kok et al. 05]

• Weight learning: we used contrastive divergence [Lowd & Domingos 07]– Adapted it for streaming data

• Inference: we used MC-SAT [Poon & Domingos 06]

Page 16: Web Query Disambiguation  from Short Sessions

16

Re-Ranking of Search Results

0.1

0.4

0.9

0.5

0.30.7

MLN

0.1

0.4

0.9

0.5

0.3

0.7

•Hand-code a set of formulas to capture regularities in the domain

- Challenge: define an effective set of relations

•Learn weights over these formulas from the data

- Challenge: noisy data

Page 17: Web Query Disambiguation  from Short Sessions

17

Specific Relationships

huntsville hospital

ebay

scrubs

huntsvillehospital.org

ebay.com

???

huntsville school. . .

. . .hospitallink.com

scrubsscrubs-tv.com

…ebay.com

scrubsscrubs.com

Page 18: Web Query Disambiguation  from Short Sessions

18

Collaborative Clauses

• The user will click on result chosen by sessions related by:

ambiguous query www.someplace1.com

some query ambiguous query

www.clickedResult.com

. . .

www.someplace1.com

•Shared click•Shared keyword click-to-click, click-to-search, search-to-click, or search-to-search

ambiguous query www.aClick.com

some query ambiguous query

www.clickedResult.comwww.someplace1.comsome other

Page 19: Web Query Disambiguation  from Short Sessions

19

Popularity Clause

• User will choose result chosen by any previous session, regardless of whether it is related– Effect is that the most popular result becomes the

most likely pick

Page 20: Web Query Disambiguation  from Short Sessions

20

Local Clauses

• User will choose result that shares a keyword with a previous search or click in the current session

ambiguous query

some query

www.someplace1.com

www.someResult.com

www.anotherPossibility.com

www.yetAnother.com

Not effective because of data sparsity

Page 21: Web Query Disambiguation  from Short Sessions

21

Balance Clause

• If the user chooses one of the results, she will not choose another– Sets up a competition among possible results– Allows the same set of weights to work well for

different-size problems

Page 22: Web Query Disambiguation  from Short Sessions

22

Empirical Evaluation: Data

• Provided by MSR• Collected in May 2006 on MSN Search• First 25 days is training, last 6 days is test

×Does not specify which queries are ambiguous• Used DMOZ hierarchy of pages to establish ambiguity

×Only lists actually clicked results, not all that were seen• Assumed user saw all results that were clicked at least once

for that exact query string

Page 23: Web Query Disambiguation  from Short Sessions

Empirical Evaluation: Models Tested

• Random re-ranker• Popularity• Standard collaborative filtering baselines based on

preferences of a set of the most closely similar sessions– Collaborative-Pearson– Collaborative-Cosine

• MLNs– Collaborative + Popularity + Balance– Collaborative + Balance– Collaborative + Popularity + Local + Balance

In Paper

Page 24: Web Query Disambiguation  from Short Sessions

24

Empirical Evaluation: Measure

• Area under the ROC curve (mean average true negative rate)

In Paper: Area under precision-recall curve, aka mean average precision

Page 25: Web Query Disambiguation  from Short Sessions

25

AUC-ROC Intuitive Interpretation

0.1

0.4

0.9

0.5

0.3

0.7

• Assuming that user scans results from the top until a relevant one is found,

• AUC-ROC captures what proportion of the irrelevant results are not considered by the user

Page 26: Web Query Disambiguation  from Short Sessions

26

AUC-ROC

0.5

0.51

0.52

0.53

0.54

0.55

0.56

0.57

0.58

0.59

Random Collaborative-Pearson

Collaborative-Cosine

Popularity MLN

Results Collaborative + Popularity +

Balance

Page 27: Web Query Disambiguation  from Short Sessions

27

Difficulty Levels

1 2 3 4 5 6 7 80.45

0.47

0.49

0.51

0.53

0.55

0.57

0.59

0.61

0.63

0.65

Increasing Easiness

AUC-

ROC

Average Case

MLN

Popularity

Possible Resultsfor a given query

Prop

ortio

n Cl

icke

d

KL

Page 28: Web Query Disambiguation  from Short Sessions

28

Difficulty Levels WorstCase

1 2 3 4 5 6 7 80.3

0.35

0.4

0.45

0.5

0.55

0.6

0.65

Increasing Easiness

AUC-

ROC

Popularity

MLN

Page 29: Web Query Disambiguation  from Short Sessions

Future Directions

• Incorporate more info into the models– How to retrieve relevant information quickly

• Learn more nuanced models– Cluster shared clicks/keywords and learn separate

weight for clusters• More advanced measures of user relatedness

– Incorporate similarity measures– Use time spent on a page as an indicator of

interestedness

Page 30: Web Query Disambiguation  from Short Sessions

Thank you

• Questions?

Page 31: Web Query Disambiguation  from Short Sessions

31

First-Order Logic

• Relations and attributes are represented as predicates

Predicate Variable Predicate

Literal Literal

WorkedFor(brando, coppola)

Ground Literal, “Grounding” for short

WorkedFor(A, B) Actor(A)

Page 32: Web Query Disambiguation  from Short Sessions

32

Clauses• Dependencies and regularities among the predicates are

represented as clauses:

Premises Conclusion

Movie(T, A) Director(A) Actor(A)

• To obtain a grounding of a clause, replace all variables with entity names:

Movie(godfather, pacino) Director(pacino) Actor(pacino)