Fan Guo (Carnegie Mellon University) and Chao Liu (Microsoft Research-Redmond)

Statistical Models for Web Search Click Log Analysis


Page 1: Statistical Models for Web Search Click Log Analysis

Fan Guo (Carnegie Mellon University) and Chao Liu (Microsoft Research-Redmond)

Page 2: Statistical Models for Web Search Click Log Analysis

Search Results for “CIKM”

04/20/23 2CIKM'09 Tutorial, Hong Kong, China

# of clicks received

Page 3: Statistical Models for Web Search Click Log Analysis

Adapt ranking to user clicks?


# of clicks received

Page 4: Statistical Models for Web Search Click Log Analysis

Tools needed for non-trivial cases


# of clicks received

Page 5: Statistical Models for Web Search Click Log Analysis

One of the most extensive (yet indirect) surveys of user experience.

For researchers:
▪ Help understand human interaction with IR results
▪ Design and calibrate novel models and hypotheses

For practitioners:
▪ Measure, monitor and improve search engine performance.
▪ Attract more page views and clicks, boost profit

Page 6: Statistical Models for Web Search Click Log Analysis

Introduce problems and applications in web search click modeling.

Present latest development of click models in web search.

Provide examples and discuss trade-offs for model design, implementation and evaluation.


Page 7: Statistical Models for Web Search Click Log Analysis

Ph.D. Student (exp. 2011), Computer Science Department, Carnegie Mellon University

Advisor: Christos Faloutsos

Dissertation topic: graph mining for large bioinformatics image databases

2008, M.S., CMU

2005, B.E., Tsinghua University, Beijing, China

Page 8: Statistical Models for Web Search Click Log Analysis

Researcher, Internet Services Research Center (ISRC), MSR-Redmond.

Research focus: large-scale search/browsing log analysis for effective Web information access.

2007, Ph.D., UIUC

2005, M.S., UIUC

Advisor: Jiawei Han

Dissertation on statistical debugging and automated failure analysis

2003, B.S., Peking University, China

Page 9: Statistical Models for Web Search Click Log Analysis

Introduction
Designing click models
Bayesian click models
Selected topics on click models
Conclusion

Page 10: Statistical Models for Web Search Click Log Analysis

Introduction
▪ Web search click logs
▪ Interpret clicks as relevance feedback
▪ Building statistical models for clicks
▪ Applications of click models
Designing click models
Bayesian click models
Selected topics on click models
Conclusion

Page 11: Statistical Models for Web Search Click Log Analysis

Click-through
Browser action
Dwelling time
Explicit judgment
Other page elements

Page 12: Statistical Models for Web Search Click Log Analysis

Auto-generated data keeping important information about search activity.

Query: cikm
Session ID: f851c5af178384d12f3d

Position  URL                                                       Click
1         cikm2008.org                                              1
2         www.cikm.org                                              0
3         www.cikm.org/2002                                         0
4         www.fc.ul.pt/cikm2007                                     0
5         www.comp.polyu.edu.hk/conference/cikm2009                 1
6         cikmconference.org                                        0
7         Ir.iit.edu/cikm2004                                       0
8         www.informatik.uni-trier.de/~ley/db/conf/cikm/index.html  0
9         www.tzi.de/CIKM2005                                       0
10        www.cikm.com                                              0

Page 13: Statistical Models for Web Search Click Log Analysis

A real world example


Page 14: Statistical Models for Web Search Click Log Analysis

How large is the click log?

Search logs: 10+ TB/day

In existing publications:
▪ [Craswell+08]: 108k sessions
▪ [Dupret+08]: 4.5M sessions (21 subsets * 216k sessions)
▪ [Guo+09a]: 8.8M sessions from 110k unique queries
▪ [Guo+09b]: 8.8M sessions from 110k unique queries
▪ [Chapelle+09]: 58M sessions from 682k unique queries
▪ [Liu+09a]: 0.26PB data from 103M unique queries

Page 15: Statistical Models for Web Search Click Log Analysis

How large is one ?



Page 17: Statistical Models for Web Search Click Log Analysis

Clicks are good… Are these two clicks equally “good”?

Non-clicks may have excuses:
▪ Not relevant
▪ Not examined

Page 18: Statistical Models for Web Search Click Log Analysis


Page 19: Statistical Models for Web Search Click Log Analysis

Higher positions receive more user attention (eye fixation) and clicks than lower positions.

This is true even in the extreme setting where the order of positions is reversed.

“Clicks are informative but biased”.

[Joachims+07]

[Figure: panels labeled "Normal Position" and "Reversed Impression"; y-axis: Percentage]

Page 20: Statistical Models for Web Search Click Log Analysis

“Clicked > Skipped Above” [Joachims02]

Preference pairs: #5>#2, #5>#3, #5>#4.

Use Rank SVM to optimize the retrieval function.

Limitation:
▪ Confidence of judgments
▪ Little implication to user modeling
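
The “Clicked > Skipped Above” rule can be sketched in a few lines. This is only an illustration: it assumes a session is given as a list of 0/1 click flags ordered by position, and the function name `skip_above_pairs` is hypothetical rather than taken from [Joachims02].

```python
def skip_above_pairs(clicks):
    """Return (clicked_position, skipped_position) pairs; positions are 1-based."""
    pairs = []
    for i, clicked in enumerate(clicks, start=1):
        if not clicked:
            continue
        # Every unclicked result ranked above a clicked one is "skipped above".
        for j in range(1, i):
            if not clicks[j - 1]:
                pairs.append((i, j))
    return pairs


# Clicks at positions 1 and 5, as in the example click log for query "cikm".
session = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(skip_above_pairs(session))
```

For this session the pairs are #5>#2, #5>#3 and #5>#4; the click at position 1 yields no pair because nothing is ranked above it.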



Page 22: Statistical Models for Web Search Click Log Analysis

Given a set of web search click logs:
▪ Predict clicks: output the probability of click vectors given a new order of URLs.

2^10 possibilities!

Page 23: Statistical Models for Web Search Click Log Analysis

Given a set of web search click logs:
▪ Estimate relevance: measure how good a URL is with regard to the information need of the query/user.

Relevance score = 0.5

Page 24: Statistical Models for Web Search Click Log Analysis

Relevance score: the probability of a click if the document appears at the top position.
▪ Relevance score = 0.5 indicates that, on average, the document will be clicked once per 2 sessions.

Bayesian click models characterize relevance using a probability distribution.

[Figure: density function plotted against relevance score]

Page 25: Statistical Models for Web Search Click Log Analysis

Effective: aware of the position-bias and address it properly

Scalable: linear complexity for both time and space, easy to parallelize

Incremental: flexible for model update based on new data


Page 27: Statistical Models for Web Search Click Log Analysis

Optimizing the retrieval function
▪ Ranking alternation based on clicks [Liu+09b]

[Figure: example values 0.90, 0.10, 0.08, 0.05, 0.20, 0.72]

Page 28: Statistical Models for Web Search Click Log Analysis

Optimizing the retrieval function
▪ Ranking alternation based on clicks
▪ As a feature to a learning-to-rank system (e.g., RankNet [Burges+05])

Page 29: Statistical Models for Web Search Click Log Analysis

Online advertising
▪ User model for sponsored search auctions

Page 30: Statistical Models for Web Search Click Log Analysis

Online advertising
▪ User model for sponsored search auctions
▪ Click-through rate (CTR) prediction [Zhu+10]

Page 31: Statistical Models for Web Search Click Log Analysis

Search engine evaluation
▪ Pskip [Wang+09]: click-through rate above last clicks; dwelling time features could also be incorporated.

Page 32: Statistical Models for Web Search Click Log Analysis

Search engine evaluation
▪ Pskip [Wang+09]: click-through rate above last clicks;
▪ Search relevance score [Guo+09c]: average relevance score weighted by chance of examination

Page 33: Statistical Models for Web Search Click Log Analysis

User behavior analysis
▪ A preliminary work showing different user behavior patterns for navigational and informational queries [Guo+09c]

Page 34: Statistical Models for Web Search Click Log Analysis

Introduction
Designing click models
▪ Basic user hypotheses
▪ Modeling the first click
▪ Extending to multiple clicks
▪ Summary of model design
Bayesian click models
Selected topics on click models
Conclusion

Page 35: Statistical Models for Web Search Click Log Analysis

A document must be examined before a click.

The (conditional) probability of click upon examination depends on document relevance.


Page 36: Statistical Models for Web Search Click Log Analysis

The click probability could be decomposed:
▪ Global component: the examination probability, which reflects the position-bias
▪ Local component: depends on the (query, URL) pair only

The building block for every existing model!

Page 37: Statistical Models for Web Search Click Log Analysis

The first document is always examined.

First-order Markov property:
▪ Examination at position (i+1) depends on examination and click at position i only

Examination follows a strict linear order: position i, then position (i+1).


Page 39: Statistical Models for Web Search Click Log Analysis

Limitation: examination/click rate monotonically decreases with rank, which is not always true.

Some models do not follow this hypothesis (e.g., UBM)


Web search data in [Guo+09a]

Ads click data in [Zhu+10]


Page 41: Statistical Models for Web Search Click Log Analysis

Put together two hypotheses:

Cascade Model [Craswell+08] = examination hypothesis + cascade hypothesis, modeling a single click

Formal model specification:
$P(C_i=1 \mid E_i=0) = 0$, $P(C_i=1 \mid E_i=1) = r_{u_i}$
$P(E_1=1) = 1$, $P(E_{i+1}=1 \mid E_i=0) = 0$
$P(E_{i+1}=1 \mid E_i=1, C_i=0) = 1$
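
Under the cascade specification, position i is clicked only if every earlier URL was examined and skipped, so $P(C_i=1) = r_{u_i} \prod_{j<i} (1 - r_{u_j})$. The sketch below computes these marginals; it assumes the session stops at the first click (the model covers a single click), and the function name and the relevance values are illustrative only.

```python
def cascade_click_probs(relevance):
    """Marginal P(C_i = 1) per position under the cascade model.

    relevance: list of r_{u_i} values in ranked order (hypothetical inputs).
    """
    probs = []
    p_examine = 1.0  # P(E_1 = 1) = 1
    for r in relevance:
        probs.append(p_examine * r)
        # Position i+1 is examined only if position i was examined and not clicked.
        p_examine *= 1.0 - r
    return probs


print(cascade_click_probs([0.5, 0.5, 0.2]))
```

The examination probability shrinks at every step, which is how the model accounts for position bias without any global parameter.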

Page 42: Statistical Models for Web Search Click Log Analysis

The user behavior chart:

[Flowchart: Examine the URL (i is the index for the URL at position i). Click? Yes: Done. No: See Next URL? Yes: examine the URL at the next position.]

Page 43: Statistical Models for Web Search Click Log Analysis

First click in Click Chain Model [Guo+09b] as well as Dynamic Bayesian Network model [Chapelle+09]

[Flowchart: Examine the URL. Click? Yes: Done. No: See Next URL? Yes: examine the next URL. No: Done.]

The “No” branch of “See Next URL?” is the chance that the user may immediately abandon examination w/o a click.

Page 44: Statistical Models for Web Search Click Log Analysis

First click in User Browsing Model [Dupret+08]

[Flowchart: Examine the URL. Click? Yes: Done. No: i ← i+1, then See Next URL? (Yes / No), with position-dependent parameters.]


Page 46: Statistical Models for Web Search Click Log Analysis

Generalize the cascade model to 1+ clicks:
$P(C_i=1 \mid E_i=0) = 0$, $P(C_i=1 \mid E_i=1) = r_{u_i}$
$P(E_1=1) = 1$, $P(E_{i+1}=1 \mid E_i=0) = 0$
$P(E_{i+1}=1 \mid E_i=1, C_i=0) = 1$
$P(E_{i+1}=1 \mid E_i=1, C_i=1) = \lambda_i$

$\lambda$: global parameters characterizing user browsing behavior

Page 47: Statistical Models for Web Search Click Log Analysis

Generalize the cascade model to 1+ clicks:


Page 48: Statistical Models for Web Search Click Log Analysis

DCM Algorithms:
▪ Input: for each query session, the query term, with (URL, clicked) tuple for all top-10 positions.
▪ Output: relevance for each (query, URL) pair; global parameters for user behavior
▪ Method: approximate* maximum-likelihood estimation.

*Footnote: the algorithm maximizes a lower bound of the log-likelihood function.

Page 49: Statistical Models for Web Search Click Log Analysis

Query: cikm
Session ID: f851c5af178384d12f3d

Position  URL                             Click
1         cikm2008.org                    1
2         www.cikm.org                    0
3         www.cikm.org/2002               0
4         www.fc.ul.pt/cikm2007           0
5         www.comp.polyu.edu.hk/...       1    (last clicked position)
6         cikmconference.org              0
7         Ir.iit.edu/cikm2004             0
8         www.informatik.uni-trier.de...  0
9         www.tzi.de/CIKM2005             0
10        www.cikm.com                    0

Page 50: Statistical Models for Web Search Click Log Analysis

Query: cikm
Session ID: ab8dee4c4dd21e6aaf03

Position  URL                             Click
1         cikm2008.org                    0
2         www.cikm.org                    1
3         www.cikm.org/2002               0
4         www.fc.ul.pt/cikm2007           0
5         cikmconference.org              0
6         www.comp.polyu.edu.hk/...       1
7         Ir.iit.edu/cikm2004             0
8         www.informatik.uni-trier.de...  0
9         www.tzi.de/CIKM2005             1    (last clicked position)
10        www.cikm.com                    0

Page 51: Statistical Models for Web Search Click Log Analysis

The estimation formula for relevance:

empirical CTR measured before last clicked position

The estimation formula for global (user behavior) parameters:

empirical probability of “clicked-but-not-last”
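
The two estimates can be computed from simple counts in one pass over the log. The sketch below is a reading of the verbal descriptions above, not the authors' code: relevance is clicks divided by impressions at or before the last clicked position, and $\lambda_i$ is the fraction of clicks at position i that were not the last click of their session. Skipping sessions without clicks is an assumption made here, and the function name and the input layout are hypothetical.

```python
from collections import defaultdict


def estimate_dcm(sessions, num_positions=10):
    """sessions: iterable of (query, urls, clicks), clicks being 0/1 flags by position."""
    clicks_of = defaultdict(int)       # (query, url) -> number of clicks
    impressions_of = defaultdict(int)  # (query, url) -> impressions at or before the last click
    clicked_at = [0] * num_positions   # sessions with a click at position index i
    last_at = [0] * num_positions      # sessions whose last click is at position index i

    for query, urls, clicks in sessions:
        if not any(clicks):
            continue  # assumption: sessions without clicks are skipped
        last = max(i for i, c in enumerate(clicks) if c)
        last_at[last] += 1
        for i in range(last + 1):
            key = (query, urls[i])
            impressions_of[key] += 1
            if clicks[i]:
                clicks_of[key] += 1
                clicked_at[i] += 1

    relevance = {key: clicks_of[key] / n for key, n in impressions_of.items()}
    # "Clicked-but-not-last" frequency; None where position i was never clicked.
    lam = [1.0 - last_at[i] / clicked_at[i] if clicked_at[i] else None
           for i in range(num_positions)]
    return relevance, lam


# Two hypothetical sessions over three positions.
toy_log = [
    ("cikm", ["a", "b", "c"], [1, 0, 1]),
    ("cikm", ["a", "b", "c"], [0, 1, 0]),
]
relevance, lam = estimate_dcm(toy_log, num_positions=3)
print(relevance, lam)
```

Only counters are kept, so new sessions can be folded in later without revisiting old data.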


Page 52: Statistical Models for Web Search Click Log Analysis

Keep 3 counts for each (query, URL) pair

Then


Details

Page 53: Statistical Models for Web Search Click Log Analysis

The examine-next probability depends on the relevance of the URL clicked:

“Not what I want, go to examine the next.”

“Aha, this is the right one, and I’m done!”

Page 54: Statistical Models for Web Search Click Log Analysis

The examine-next probability depends on the relevance of the URL clicked:
$P(E_{i+1}=1 \mid E_i=1, C_i=1) = \alpha_2 (1 - r_{u_i}) + \alpha_3 r_{u_i}$
$P(E_{i+1}=1 \mid E_i=1, C_i=0) = \alpha_1$
where $0 < \alpha_1 \le 1$, $0 \le \alpha_3 < \alpha_2 \le 1$
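
A small sketch makes the interpolation concrete: after a click, the continuation probability moves linearly from $\alpha_2$ (irrelevant URL) down to $\alpha_3$ (perfectly relevant URL). The function name and the parameter values are hypothetical, chosen only to satisfy the stated constraints.

```python
def ccm_examine_next(clicked, relevance, alpha1, alpha2, alpha3):
    """P(E_{i+1} = 1 | E_i = 1, C_i) under the Click Chain Model transition rule."""
    if not (0 < alpha1 <= 1 and 0 <= alpha3 < alpha2 <= 1):
        raise ValueError("alpha values violate the model constraints")
    if clicked:
        # Linear interpolation between alpha2 (r = 0) and alpha3 (r = 1).
        return alpha2 * (1.0 - relevance) + alpha3 * relevance
    return alpha1


# Illustrative parameter values, not estimates from any data set.
for r in (0.0, 0.5, 1.0):
    print(r, ccm_examine_next(True, r, alpha1=0.9, alpha2=0.8, alpha3=0.2))
```

Because $\alpha_3 < \alpha_2$, a click on a more relevant URL always makes further examination less likely.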


Page 55: Statistical Models for Web Search Click Log Analysis

The full picture:


Page 56: Statistical Models for Web Search Click Log Analysis

There is a subtle difference between the relevance of the URL snippet and the landing page.

“hmmm…, this looks pretty nice”

“errr…, it’s way out of date”

Conclusion: attractive, but not satisfactory.

Page 57: Statistical Models for Web Search Click Log Analysis

The examine-next probability depends on the “satisfaction score”:
$P(E_{i+1}=1 \mid E_i=1, C_i=1) = \gamma (1 - s_{u_i}) + 0 \cdot s_{u_i}$
$P(E_{i+1}=1 \mid E_i=1, C_i=0) = \gamma$
where $0 < \gamma \le 1$

The click probability is associated with the “attractiveness score”:
$P(C_i=1 \mid E_i=1) = a_{u_i}$

Page 58: Statistical Models for Web Search Click Log Analysis

The full picture:


Page 59: Statistical Models for Web Search Click Log Analysis

The examine-next probability depends on both the preceding clicked position r, and the distance to this position d.

r = 0, d = 1

Position  URL                        Click
1         cikm2008.org               0
2         www.cikm.org               1
3         www.cikm.org/2002          0
4         www.fc.ul.pt/cikm2007      0
5         cikmconference.org         0
6         www.comp.polyu.edu.hk/...  1
…         …                          …

Page 60: Statistical Models for Web Search Click Log Analysis

r = 0, d = 2 (same click log as above)

Page 61: Statistical Models for Web Search Click Log Analysis

r = 2, d = 1 (same click log as above)

Page 62: Statistical Models for Web Search Click Log Analysis

r = 2, d = 2 (same click log as above)

Page 63: Statistical Models for Web Search Click Log Analysis

r = 2, d = 3 (same click log as above)

Page 64: Statistical Models for Web Search Click Log Analysis

The examine-next probability depends on both the preceding clicked position r, and the distance to this position d.
▪ Users would lose patience when they browse through without issuing a click.
▪ The probability monotonically drops as d increases and r remains the same.

Page 65: Statistical Models for Web Search Click Log Analysis

The examine-next probability depends on both the preceding clicked position r, and the distance to this position d.
$P(E_i=1 \mid C_{1:i-1}) = \beta_{r_i, d_i}$
where $r_i = \max\{j \mid j < i, C_j = 1\}$ (with $r_i = 0$ when no earlier position was clicked) and $d_i = i - r_i$

55 parameters are needed for top-10 positions ($0 \le r < r+d \le 10$).

Cascade hypothesis is not assumed.
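
The bookkeeping for $(r_i, d_i)$ is easy to get wrong by one, so a short sketch may help. It assumes 1-based positions and takes r = 0 when no earlier click exists, as in the example slides; the function names are hypothetical.

```python
def ubm_r_d(clicks):
    """Return the (r_i, d_i) pair for each position i = 1..len(clicks)."""
    pairs = []
    r = 0  # position of the preceding click; 0 means "no click yet"
    for i, clicked in enumerate(clicks, start=1):
        pairs.append((r, i - r))
        if clicked:
            r = i  # later positions measure their distance from this click
    return pairs


def num_ubm_parameters(max_position=10):
    """Count the (r, d) pairs with 0 <= r < r + d <= max_position."""
    return sum(1 for r in range(max_position)
               for d in range(1, max_position - r + 1))


# Click log from the example: clicks at positions 2 and 6.
print(ubm_r_d([0, 1, 0, 0, 0, 1]))
print(num_ubm_parameters(10))
```

For the example log the first five pairs are (0,1), (0,2), (2,1), (2,2), (2,3), and the count for ten positions is 55.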

Page 66: Statistical Models for Web Search Click Log Analysis

The full picture:



Page 68: Statistical Models for Web Search Click Log Analysis

Probability of examining the first URL

Model    $P(E_1)$
Cascade  1
DCM      1
CCM      1*
DBN      1*
UBM      $\beta_{0,1}$

*Footnote: it is flexible to add another parameter to specify this probability.

Page 69: Statistical Models for Web Search Click Log Analysis

Probability of click upon examination

Model    $P(C_i=1 \mid E_i=1)$
Cascade  $r_{d_i}$
DCM      $r_{d_i}$
CCM      $r_{d_i}$*
DBN      $a_{d_i}$
UBM      $r_{d_i}$

*Footnote: the mean of the relevance distribution, detailed in the next part

Page 70: Statistical Models for Web Search Click Log Analysis

Probability of examine-next w/o a click

Model    $P(E_{i+1}=1 \mid E_i=1, C_i=0)$
Cascade  1
DCM      1
CCM      $\alpha_1$
DBN      $\gamma$
UBM      $\beta_{r_{i+1}, d_{i+1}}$*

*Footnote: the probability does not depend on $E_i$

Page 71: Statistical Models for Web Search Click Log Analysis

Probability of examine-next after a click

Model    $P(E_{i+1}=1 \mid E_i=1, C_i=1)$
Cascade  --
DCM      $\lambda_i$
CCM      $\alpha_2 (1 - r_{d_i}) + \alpha_3 r_{d_i}$
DBN      $\gamma (1 - s_{d_i})$
UBM      $\beta_{i,1}$


Page 73: Statistical Models for Web Search Click Log Analysis

Size of parameter sets

Model    # of global params
Cascade  0
DCM      9
CCM      3
DBN      1
UBM      55

Page 74: Statistical Models for Web Search Click Log Analysis

Inference and estimation algorithms

Model  Single-Pass  Details
DCM                 Maximizing a lower bound of LL, fastest
CCM                 No iteration needed, thanks to the Bayesian framework
DBN                 EM-based, iterative algorithms
UBM                 EM-based, usually takes ~30 iterations to converge


Page 76: Statistical Models for Web Search Click Log Analysis

Introduction
Designing click models
Bayesian click models
▪ Bayesian framework and the rationale
▪ Bayesian Browsing Model: a case study
▪ Click Chain Model in a nutshell
Selected topics on click models
Conclusion

Page 77: Statistical Models for Web Search Click Log Analysis

Frequentist: p(H) = 0.8

Bayesian: Prior and Posterior over p(H)

[Figure: prior and posterior densities; x-axis: p(H) from 0 to 1; y-axis: “probability” of p(H)]

Page 78: Statistical Models for Web Search Click Log Analysis

Prior, Posterior

Density function (not normalized): $x$, $x^2$, $x^3$, $x^3(1-x)$, $x^4(1-x)$

Page 79: Statistical Models for Web Search Click Log Analysis

Prior, Posterior

Density function (not normalized): $x^1(1-x)^0$, $x^2(1-x)^0$, $x^3(1-x)^0$, $x^3(1-x)^1$, $x^4(1-x)^1$
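
The sequence of densities corresponds to updating two exponents, one per outcome. The sketch below replays the toss sequence H, H, H, T, H implied by the exponents; it assumes a uniform prior, so the normalized posterior is a Beta(heads + 1, tails + 1) distribution. The function names are hypothetical.

```python
from fractions import Fraction


def update_exponents(tosses):
    """Return the (heads, tails) exponents of x and (1 - x) after each toss."""
    heads = tails = 0
    history = []
    for toss in tosses:
        if toss == "H":
            heads += 1
        else:
            tails += 1
        history.append((heads, tails))
    return history


def posterior_mean(heads, tails):
    """Mean of Beta(heads + 1, tails + 1), the posterior under a uniform prior."""
    return Fraction(heads + 1, heads + tails + 2)


history = update_exponents("HHHTH")
print(history)
print(posterior_mean(*history[-1]))
```

After the five tosses the density is $x^4(1-x)^1$ and the posterior mean is 5/7, slightly below the empirical frequency 4/5 because the uniform prior pulls the estimate toward 1/2.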

Page 80: Statistical Models for Web Search Click Log Analysis

The graphical model for coin-toss

[Graphical model nodes: X; C1, C2, C3, C4, C5]


Page 82: Statistical Models for Web Search Click Log Analysis

Prior

Density function (not normalized), in sequence:

$x^1 (1-x)^0 (1-0.6x)^0 (1+0.3x)^1 (1-0.5x)^0 (1-0.2x)^0$
$x^1 (1-x)^1 (1-0.6x)^0 (1+0.3x)^1 (1-0.5x)^0 (1-0.2x)^0$
$x^2 (1-x)^1 (1-0.6x)^0 (1+0.3x)^2 (1-0.5x)^0 (1-0.2x)^0$
$x^3 (1-x)^1 (1-0.6x)^1 (1+0.3x)^2 (1-0.5x)^0 (1-0.2x)^0$
$x^3 (1-x)^1 (1-0.6x)^1 (1+0.3x)^2 (1-0.5x)^1 (1-0.2x)^0$

Page 83: Statistical Models for Web Search Click Log Analysis

Representation of relevance:
▪ A probability distribution on [0,1] for each (query, URL) pair
▪ The density function is in a polynomial form over a small set of linear factors.
▪ The coefficients of such linear factors are shared between different (query, URL) pairs.

Example: $x^3 (1-x)^1 (1-0.6x)^1 (1+0.3x)^2 (1-0.5x)^1 (1-0.2x)^0$

Page 84: Statistical Models for Web Search Click Log Analysis

Inference:
▪ Go over each query session once, update the exponents for the corresponding (query, URL) pairs impressed*
▪ Analytical or numerical integration may be needed to compute the normalization constant.

*Footnote: by virtue of Bayes' theorem and the conditional independence relationship/assumption
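
Numerical integration over such a density is a few lines of code. The sketch below uses the midpoint rule with a fixed number of bins to obtain the posterior mean; the representation (an exponent for x plus a list of (coefficient, exponent) pairs for the factors $(1 + c x)^e$) is an assumption made for illustration, not the authors' data layout.

```python
def density(x, x_exponent, linear_factors):
    """Unnormalized density x**e0 * prod((1 + c*x)**e) at a point x in (0, 1)."""
    value = x ** x_exponent
    for coeff, exponent in linear_factors:
        value *= (1.0 + coeff * x) ** exponent
    return value


def posterior_mean(x_exponent, linear_factors, bins=1000):
    """Posterior mean on [0, 1] via the midpoint rule with `bins` bins."""
    width = 1.0 / bins
    norm = 0.0
    first_moment = 0.0
    for b in range(bins):
        x = (b + 0.5) * width  # midpoint of bin b
        w = density(x, x_exponent, linear_factors)
        norm += w
        first_moment += x * w
    # The bin width cancels in the ratio, so it is not multiplied in.
    return first_moment / norm


# Example density from the slides: x^3 (1-x)(1-0.6x)(1+0.3x)^2 (1-0.5x).
factors = [(-1.0, 1), (-0.6, 1), (0.3, 2), (-0.5, 1)]
print(posterior_mean(3, factors))
```

The cost is proportional to the number of bins times the number of distinct factors, independent of how many sessions contributed to the exponents.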

Page 85: Statistical Models for Web Search Click Log Analysis

Key problems:
▪ Which is the right factor to update?
▪ How to estimate all the coefficients?

Page 86: Statistical Models for Web Search Click Log Analysis

Modeling benefits:
▪ Confidence for the URL relevance estimate
▪ Relative judgments: probability that URL i is more relevant to the query than URL j
▪ Easy to interpret: coefficients in linear factors reflect position-bias and user browsing patterns

Computational benefits:
▪ Single-pass, linear algorithms; no iterations
▪ A parallel version is easy to implement


Page 88: Statistical Models for Web Search Click Log Analysis

For a specific query session, let

where 1 ≤ i ≤ M=10.

[Graphical model nodes: S1, S2, S3, …, SM; E1, E2, E3, …, EM; C1, C2, C3, …, CM]

Page 89: Statistical Models for Web Search Click Log Analysis

[Graphical model: relevance nodes S1, S2, S3, …, SM; examination nodes E1, E2, E3, …, EM; click nodes C1, C2, C3, …, CM]

Page 90: Statistical Models for Web Search Click Log Analysis

Compute the posterior distribution

Conditional independence relationship induced from the graphical model

Annotations:
▪ How many times URL j was clicked
▪ How many times URL j was not clicked when it is at position (r + d) with the preceding click at position r

Details

Page 91: Statistical Models for Web Search Click Log Analysis

Only top M=3 positions are shown, 3 query sessions and 4 distinct URLs.

[Figure: URLs shown at positions 1 to 3 for Query Sessions 1, 2 and 3]

Page 92: Statistical Models for Web Search Click Log Analysis

Initialize M(M+1)/2+1 counts for each URL

URL  Clicks  r=0,d=1  r=0,d=2  r=0,d=3  r=1,d=1  r=1,d=2  r=2,d=1
4    0       0        0        0        0        0        0

Page 93: Statistical Models for Web Search Click Log Analysis

Update counts for URL 4:
▪ If not impressed, do nothing;
▪ If clicked, increment “clicks” by 1;
▪ Otherwise, locate the right r and d to increment.

URL  Clicks  r=0,d=1  r=0,d=2  r=0,d=3  r=1,d=1  r=1,d=2  r=2,d=1
4    0       0        0        0        0        0        0
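
The update rule can be sketched as a single loop over a session. The three sessions below are hypothetical (the original figure is not recoverable); they are chosen so that URL 4 is absent from the first session, unclicked at position 3 after a click at position 2 in the second, and clicked in the third. The function names are illustrative.

```python
from collections import defaultdict


def new_counts():
    """url -> {"clicks": n, (r, d): n}; missing entries default to 0."""
    return defaultdict(lambda: defaultdict(int))


def update_counts(counts, urls, clicks):
    """Apply the update rule to one session (urls and clicks ordered by position)."""
    r = 0  # preceding clicked position; 0 when nothing has been clicked yet
    for i, (url, clicked) in enumerate(zip(urls, clicks), start=1):
        if clicked:
            counts[url]["clicks"] += 1
            r = i
        else:
            counts[url][(r, i - r)] += 1  # non-click at position r + d


counts = new_counts()
hypothetical_sessions = [
    ([1, 3, 2], [1, 0, 0]),  # URL 4 not impressed: nothing changes for it
    ([1, 3, 4], [0, 1, 0]),  # URL 4 unclicked at position 3, preceding click at 2
    ([4, 1, 3], [1, 0, 0]),  # URL 4 clicked
]
for urls, clicks in hypothetical_sessions:
    update_counts(counts, urls, clicks)
print(dict(counts[4]))
```

URLs that never appear in a session are simply never touched, which is what keeps the pass linear in the size of the log.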

Page 94: Statistical Models for Web Search Click Log Analysis

Update counts for URL 4:
▪ If not impressed, do nothing;
▪ If clicked, increment “clicks” by 1;
▪ Otherwise, locate the right r and d to increment.

URL  Clicks  r=0,d=1  r=0,d=2  r=0,d=3  r=1,d=1  r=1,d=2  r=2,d=1
4    0       0        0        0        0        0        1

Page 95: Statistical Models for Web Search Click Log Analysis

Update counts for URL 4:
▪ If not impressed, do nothing;
▪ If clicked, increment “clicks” by 1;
▪ Otherwise, locate the right r and d to increment.

URL  Clicks  r=0,d=1  r=0,d=2  r=0,d=3  r=1,d=1  r=1,d=2  r=2,d=1
4    1       0        0        0        0        0        1

Page 96: Statistical Models for Web Search Click Log Analysis

The posterior for URL 4

Interpretation: the larger the probability of examination, the stronger the penalty for a non-click.

URL  Clicks  r=0,d=1  r=0,d=2  r=0,d=3  r=1,d=1  r=1,d=2  r=2,d=1
4    1       0        0        0        0        0        1

Page 97: Statistical Models for Web Search Click Log Analysis

Keep 2 counts for each parameter (one for click, and the other one for non-click)

Parameter      Click  Non-click
$\beta_{0,1}$  0      0
$\beta_{0,2}$  0      0
$\beta_{0,3}$  0      0
$\beta_{1,1}$  0      0
$\beta_{1,2}$  0      0
$\beta_{2,1}$  0      0

Page 98: Statistical Models for Web Search Click Log Analysis

For each position in a query session, locate the right r and d to increment.

Parameter      Click  Non-click
$\beta_{0,1}$  1      0
$\beta_{0,2}$  0      0
$\beta_{0,3}$  0      0
$\beta_{1,1}$  0      1
$\beta_{1,2}$  0      1
$\beta_{2,1}$  0      0

Page 99: Statistical Models for Web Search Click Log Analysis

For each position in a query session, locate the right r and d to increment.

Parameter      Click  Non-click
$\beta_{0,1}$  1      1
$\beta_{0,2}$  1      0
$\beta_{0,3}$  0      0
$\beta_{1,1}$  0      1
$\beta_{1,2}$  0      1
$\beta_{2,1}$  0      1

Page 100: Statistical Models for Web Search Click Log Analysis

For each position in a query session, locate the right r and d to increment.

Parameter      Click  Non-click
$\beta_{0,1}$  1      2
$\beta_{0,2}$  1      0
$\beta_{0,3}$  0      0
$\beta_{1,1}$  1      1
$\beta_{1,2}$  0      1
$\beta_{2,1}$  1      1

Page 101: Statistical Models for Web Search Click Log Analysis

Maximum-Likelihood Estimate:

Parameter      Click  Non-click
$\beta_{0,1}$  1      2
$\beta_{0,2}$  1      0
$\beta_{0,3}$  0      0
$\beta_{1,1}$  1      1
$\beta_{1,2}$  0      1
$\beta_{2,1}$  1      1

Page 102: Statistical Models for Web Search Click Log Analysis

Let

Initializing and updating the counts:
▪ Time: linear in the size of the click log
▪ Space: almost constant storage required

CIKM'09 Tutorial, Hong Kong, China

Details

Page 103: Statistical Models for Web Search Click Log Analysis

Let

Initializing and updating the counts:
▪ Time:
▪ Space:

Computing relevance scores using numerical integration with B bins:
▪ Time:
▪ Space:

Details

Page 104: Statistical Models for Web Search Click Log Analysis

Step 1: initialize counting statistics;
Step 2: scan through the click log once and update the counts for both inference and estimation;
Step 3: compute parameter values;
Step 4: use numerical integration to obtain relevance scores.

Step 2 also applies for (linear) incremental computation!


Page 106: Statistical Models for Web Search Click Log Analysis

The user behavior model:


Page 107: Statistical Models for Web Search Click Log Analysis

Graphical model:

[Graphical model: relevance nodes S1, S2, S3, …, SM; examination nodes E1, E2, E3, …, EM; click nodes C1, C2, C3, …, CM]

Page 108: Statistical Models for Web Search Click Log Analysis


Details

Page 109: Statistical Models for Web Search Click Log Analysis

Quantity                                     CCM  UBM
Number of user behavior parameters           3    55
Number of distinct factors for (query, URL)  22   56
Number of counts needed for parameters       5    110

Page 110: Statistical Models for Web Search Click Log Analysis

Introduction
Designing click models
Bayesian click models
Selected topics on click models
  Scaling click models for Petabyte-scale data
  Click model evaluation
  Tailoring user goals to click models
Conclusion

Page 111: Statistical Models for Web Search Click Log Analysis

Data collected in 8 weeks
Job k includes data between week 1 and week k
Both time and space costs are prohibitive for a single node.

Page 112: Statistical Models for Web Search Click Log Analysis

A Simple Task: counting the number of impressions for each (query, URL) pair

Page 113: Statistical Models for Web Search Click Log Analysis

[Figure: parallel processing pipeline across Machines #1 to #4; stages shown: Extent, GetPairs, Map, Sort, Count, Output.]

Page 114: Statistical Models for Web Search Click Log Analysis

[Figure: parallel processing pipeline across Machines #1 to #4; stages shown: Extent, GetPairs, Map, Sort, Count, Output.]

“Map” puts all of the same Pairs onto one machine. This allows you to group by various fields in subsequent processes.

Page 115: Statistical Models for Web Search Click Log Analysis

A Simple Task: counting the number of impressions for each (query, URL) pair

Map = Bucket: the intermediate key is the (query, URL) pair

Page 116: Statistical Models for Web Search Click Log Analysis

[Figure: parallel processing pipeline across Machines #1 to #4; stages shown: Extent, GetPairs, Map, Sort, Count, Output.]

“Count” carries out a standard increment-by-1 over each distinct Pair.

“Count” REDUCES the amount of data, since each Pair has only one output value.

Page 117: Statistical Models for Web Search Click Log Analysis

A Simple Task: counting the number of impressions for each (query, URL) pair

Map = Bucket: the intermediate key is the (query, URL) pair

Reduce = Count: it accepts a list of (key, value) tuples and outputs the final result for each distinct key
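The counting task above can be sketched on a single machine as three plain functions. This is only an illustration of the Map = Bucket, Reduce = Count idea; the record layout (a query plus the list of URLs shown), the function names, and the example data are assumptions, and a real cluster framework would perform the grouping step across machines.

```python
from collections import defaultdict

def map_phase(log):
    """Emit one ((query, url), 1) pair per impression."""
    for query, urls_shown in log:
        for url in urls_shown:
            yield (query, url), 1

def bucket(pairs):
    """Group values by intermediate key, as the framework would between phases."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return buckets

def reduce_phase(buckets):
    """Output one final count per distinct (query, url) key."""
    return {key: sum(values) for key, values in buckets.items()}

# Made-up log: each record is a query and the URLs shown for it.
log = [
    ("cikm", ["a.example", "b.example"]),
    ("cikm", ["a.example", "c.example"]),
    ("sigir", ["a.example"]),
]
impressions = reduce_phase(bucket(map_phase(log)))
```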


Page 118: Statistical Models for Web Search Click Log Analysis

[Figure: the same pipeline across Machines #1 to #4 (Extent, GetPairs, Map, Sort, Count, Output), annotated with MAP and REDUCE phase labels.]

Page 119: Statistical Models for Web Search Click Log Analysis

[Figure: example of factor indices for a query session; index 0 is used for clicks.]

Page 120: Statistical Models for Web Search Click Log Analysis

Map: scan the click log
  Intermediate key: (query, URL)
  Value: the index of linear factors (0~55 for top-10 positions)
Reduce: scan the list of (key, value)
  The key indicates which exponent vector to update
  The value indicates the index of the element in the exponent vector to increment
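The Map and Reduce roles described above might be sketched as follows. The index layout is an assumption chosen to give 56 indices for M = 10: index 0 for a click, and indices 1 to 55 for a non-click identified by the preceding click position r (0 if none) and the distance d to it. The function names and the example session are hypothetical.

```python
from collections import defaultdict

M = 10  # number of positions on the result page

def factor_index(r, d, m=M):
    """Map (preceding click position r in 0..m-1, distance d in 1..m-r)
    to an index in 1..m*(m+1)//2. The layout is an assumed convention."""
    offset = sum(m - k for k in range(r))
    return 1 + offset + (d - 1)

def map_session(query, urls, clicks):
    """Map: emit ((query, url), factor index) for each position of one session."""
    last_click = 0  # 0 means no click has been seen yet
    for pos, (url, clicked) in enumerate(zip(urls, clicks), start=1):
        if clicked:
            yield (query, url), 0
            last_click = pos
        else:
            yield (query, url), factor_index(last_click, pos - last_click)

def reduce_pairs(pairs, m=M):
    """Reduce: the key selects the exponent vector, the value selects
    which element of that vector to increment."""
    vectors = defaultdict(lambda: [0] * (1 + m * (m + 1) // 2))
    for key, index in pairs:
        vectors[key][index] += 1
    return dict(vectors)

# Made-up session: three results shown, the second one clicked.
pairs = list(map_session("q", ["a", "b", "c"], [0, 1, 0]))
vectors = reduce_pairs(pairs)
```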


Page 121: Statistical Models for Web Search Click Log Analysis

Linearly increasing computation load
Near-constant elapsed time

[Figure: single-machine computation load and elapsed time on SCOPE.]

• 3 hours
• 265 TB log data
• 1.15 billion (query, URL) pairs

Page 122: Statistical Models for Web Search Click Log Analysis

Introduction
Designing click models
Bayesian click models
Selected topics on click models
  Scaling click models for Petabyte-scale data
  Click model evaluation
  Tailoring user goals to click models
Conclusion

Page 123: Statistical Models for Web Search Click Log Analysis

[Figure: diagram with boxes for impression data and click data.]

Page 124: Statistical Models for Web Search Click Log Analysis

[Figure: diagram with boxes for impression data, click data, relevance scores, and global parameters; M = 10.]

Page 125: Statistical Models for Web Search Click Log Analysis

[Figure: diagram with relevance, global parameters, and a new impression vector from an existing query, leading to predicted examination and predicted clicks.]

Page 126: Statistical Models for Web Search Click Log Analysis

Data are collected from a commercial search engine after query term normalization and spam removal.

For each query term, split query sessions evenly into training and test sets according to the timestamp.

The most frequent and the least frequent query terms are removed.

Page 127: Statistical Models for Web Search Click Log Analysis

Most popular metrics:
  Average test data log-likelihood (LL): probability of accurately predicting the click vector, 2^10 possibilities [Guo+09a, Guo+09b, Liu+09a, Zhu+10]
  Perplexity of prediction for each position: 2^{average entropy} of the click/no-click binary prediction for each position independently [Dupret+08, Guo+09a, Guo+09b, Zhu+10]
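A small sketch of how the two metrics could be computed from model outputs. The function names and inputs are hypothetical. The base-10 logarithm is an assumption, chosen because it reproduces the random-guess value of -3.01 quoted with the log-likelihood results; the perplexity uses base-2 logs to match the 2^{average entropy} definition.

```python
import math

def avg_log_likelihood(session_probs):
    """Mean log10 of the probability the model assigned to each observed
    click vector (one probability per test session, each in (0, 1])."""
    return sum(math.log10(p) for p in session_probs) / len(session_probs)

def position_perplexity(clicks, predicted):
    """Perplexity at one position: 2 ** (average binary cross-entropy in bits).

    `clicks` holds the observed 0/1 outcomes and `predicted` the model's click
    probabilities for the same sessions; assumes 0 < q < 1 for every q.
    """
    total_bits = 0.0
    for c, q in zip(clicks, predicted):
        total_bits += math.log2(q) if c else math.log2(1.0 - q)
    return 2.0 ** (-total_bits / len(clicks))

# Random guessing over 10 positions: each click vector has probability 2**-10.
random_ll = avg_log_likelihood([2.0 ** -10, 2.0 ** -10])

# Random guessing at a single position: always predict 0.5.
random_perplexity = position_perplexity([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

A perplexity close to 1 means the predicted probabilities nearly match the observed outcomes at that position, while 2 is what a coin flip achieves.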


Page 128: Statistical Models for Web Search Click Log Analysis

Other metrics:
  Click-through-rate (CTR) prediction, especially for predicting CTR@1 [Chapelle+09, Zhu+10]
  Predicting first/last clicked positions [Guo+09a, Guo+09b]
  Position-bias sanity check: plot the click rate curve for top-10 positions vs. the ground truth [Guo+09a, Guo+09b]

Page 129: Statistical Models for Web Search Click Log Analysis

Average log-likelihood
  Random guess: log(2^{-10}) = -3.01
  Optimal value: 0

Model               CCM      UBM      DCM
LL                  -1.171   -1.264   -1.302
Improvement ratio            9.7%     14%

Page 130: Statistical Models for Web Search Click Log Analysis


Page 131: Statistical Models for Web Search Click Log Analysis


Page 132: Statistical Models for Web Search Click Log Analysis

Average perplexity over top 10 positions
  Random guess: 2
  Optimal value: 1

Model               CCM      UBM      DCM
Perplexity          1.1479   1.1577   1.1590
Improvement ratio            7.5%     8.3%

Page 133: Statistical Models for Web Search Click Log Analysis


Page 134: Statistical Models for Web Search Click Log Analysis


Page 135: Statistical Models for Web Search Click Log Analysis

For 1M query sessions, the estimated time in seconds:

Model      DCM   CCM*   BBM*   UBM**
Time (s)   80    150    165    5,000

* Time for CCM and BBM includes computing the posterior mean and variance using numerical integration with 100 bins.

** UBM converges in 34 iterations.

Page 136: Statistical Models for Web Search Click Log Analysis

Introduction
Designing click models
Bayesian click models
Selected topics on click models
  Scaling click models for Petabyte-scale data
  Click model evaluation
  Tailoring user goals to click models
Conclusion

Page 137: Statistical Models for Web Search Click Log Analysis

Queries can be categorized into 2 sets:
  Navigational: to find the link to an existing website, e.g., bing
  Informational: more exploration, multiple clicks may arise, e.g., iron man

Page 138: Statistical Models for Web Search Click Log Analysis

Different user goals result in different browsing and click patterns.

The straightforward mixture-modeling approach is not practical. [Dupret+08]

Solution:
  Classify query terms a priori based on user goals.
  Fit two sets of model parameters, one for navigational queries and one for informational queries.

Page 139: Statistical Models for Web Search Click Log Analysis

Two-way classification for query terms based on click data using:
  Median position of click distribution
  Mean position of click distribution
  Average # clicks per query session
  …

Pick the one which has the best click prediction.

If a position receives 50% of the clicks, then navigational, else informational.
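One way to read the 50% rule is sketched below. Treating the threshold as "at least 50%", labeling a query with no clicks as informational, and the function name itself are all assumptions made for illustration.

```python
from collections import Counter

def classify_query(click_positions, threshold=0.5):
    """Label a query from the positions of all clicks it received.

    Returns "navigational" if a single position holds at least `threshold`
    of the clicks, otherwise "informational".
    """
    if not click_positions:
        return "informational"  # assumption: no clicks gives no evidence of a single target
    counts = Counter(click_positions)
    top_share = max(counts.values()) / len(click_positions)
    return "navigational" if top_share >= threshold else "informational"

# Made-up click positions pooled over all sessions of each query.
concentrated = classify_query([1, 1, 1, 2])   # position 1 gets 75% of clicks
spread_out = classify_query([1, 2, 3, 4, 5])  # no position exceeds 20%
```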

Page 140: Statistical Models for Web Search Click Log Analysis

Improvement of click prediction for DCM:
  Log-likelihood: 4.0%
  Perplexity: 1.3%

Examination/click position-bias:

Page 141: Statistical Models for Web Search Click Log Analysis

Introduction
Designing click models
Bayesian click models
Selected topics on click models
Conclusion

Page 142: Statistical Models for Web Search Click Log Analysis

Click models
  A statistical tool to leverage valuable implicit user feedback in terabyte/petabyte search logs.
  Provide click prediction as well as relevance estimates.
  Application domains include learning to rank, measuring search performance, online advertising, user behavior analysis…

Page 143: Statistical Models for Web Search Click Log Analysis

Click models
  Different model designs reflect different assumptions about user behavior to explain the position-bias.
  The modeling choice may depend on the application scenario.

Page 144: Statistical Models for Web Search Click Log Analysis

Click models
  Efficient, single-pass, parallelizable algorithms are desired in real-world applications.
  The Bayesian framework can be applied to click models for both modeling and computational benefits.
  The Click Chain Model and the Bayesian Browsing Model represent state-of-the-art examples.

Page 145: Statistical Models for Web Search Click Log Analysis

Bigger context
  Query reformulations
  Personalization
Richer inputs
  Universal search
  Diverse user feedback
Click models vs. human judgments

Page 146: Statistical Models for Web Search Click Log Analysis

[Burges+05]: C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. ICML’05.

[Chapelle+09]: O. Chapelle and Y. Zhang. A dynamic Bayesian network click model for web search ranking. WWW’09.

[Craswell+08]: N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. WSDM ’08.

[Dean+04]: J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI’04.

[Dupret+08]: G. Dupret and B. Piwowarski. A user browsing model to predict search engine click data from past observations. SIGIR’08.


Page 147: Statistical Models for Web Search Click Log Analysis

[Guo+09a]: F. Guo, C. Liu, and Y.-M. Wang. Efficient multiple-click models in web search. WSDM’09.

[Guo+09b]: F. Guo, C. Liu, A. Kannan, T. Minka, M. Taylor, Y.-M. Wang, and C. Faloutsos. Click chain model in web search. WWW’09.

[Guo+09c]: F. Guo, L. Li, and C. Faloutsos. Tailoring click models to user goals. WSCD’09.

[Joachims02]: T. Joachims. Optimizing search engines using clickthrough data. KDD’02.

[Joachims+07]: T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. Accurately interpreting clickthrough data as implicit feedback. ACM TOIS, 25(2), 2007.

Page 148: Statistical Models for Web Search Click Log Analysis

[Lee+05]: U. Lee, Z. Liu, and J. Cho. Automatic identification of user goals in web search. WWW’05.

[Liu+09a]: C. Liu, F. Guo, and C. Faloutsos. BBM: Deriving click models from petabyte-scale data. KDD’09.

[Liu+09b]: C. Liu, M. Li, and Y.-M. Wang. Post-rank reordering: resolving preference misalignments between search engines and end users. CIKM’09.

[Richardson+07]: M. Richardson, E. Dominowska, and R. Ragno. Predicting clicks: estimating the click-through rate for new ads. WWW’07.

[Zhu+10]: Z. Zhu, W. Chen, T. Minka, C. Zhu and Z. Chen. A novel click model and its applications to online advertising. To appear in WSDM’10.


Page 149: Statistical Models for Web Search Click Log Analysis

Anitha Kannan (MSR, Search Lab)
Tom Minka (MSR, Cambridge)
Christos Faloutsos (Carnegie Mellon University)
Li-Wei He (MSR, ISRC-Redmond)
Nick Craswell (MSR, Cambridge)

Page 150: Statistical Models for Web Search Click Log Analysis

Yi-Min Wang (MSR, ISRC-Redmond)
Mike Taylor (MSR, Cambridge)
Ethan Tu (MSR, ISRC-Redmond)

Page 151: Statistical Models for Web Search Click Log Analysis
