1

Learning User Interaction Models for Predicting Web Search Result Preferences

Eugene Agichtein, Eric Brill, Susan Dumais, Robert Ragno

Microsoft Research


2

User Interactions

Goal: Harness rich user interactions with search results to improve quality of search

Millions of users submit queries daily and interact with the search results
– Clicks, query refinement, dwell time

User interactions with search engines are plentiful, but require careful interpretation

We will predict user preferences for results

3

Related Work

Linking implicit interactions and explicit judgments
– Fox et al. [TOIS 2005]
  Predict explicit satisfaction rating
– Joachims [SIGIR 2005]
  Predict preference (gaze studies, interpretation strategies)

Broader overview of analyzing implicit interactions: Kelly & Teevan [SIGIR Forum 2003]

4

Outline

Distributional model of user interactions
– User Behavior = Relevance + “Noise”

Rich set of user interaction features

Learning framework to predict user preferences

Large-scale evaluation

5

Interpreting User Interactions

Clickthrough and subsequent browsing behavior of individual users influenced by many factors
– Relevance of a result to a query
– Visual appearance and layout
– Result presentation order
– Context, history, etc.

General idea:
– Aggregate interactions across all users and queries
– Compute “expected” behavior for any query/page
– Recover relevance signal for a given query

6

Case Study: Clickthrough

[Figure: clickthrough frequency for all queries in sample. x-axis: result position (1 to 10); y-axis: relative click frequency (0 to 1); series: All queries]

Clickthrough(query q, document d, result position p) = expected(p) + relevance(q, d)
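The decomposition above can be illustrated with a minimal sketch. It assumes that expected(p) is estimated as the mean click frequency observed at position p across all queries; the slides do not spell out the estimator, so that choice, the function names, and the toy log are all hypothetical.

```python
from collections import defaultdict

def expected_click_frequency(click_log):
    """Mean click frequency at each result position, across all queries.

    click_log: iterable of (query, url, position, click_frequency) tuples,
    where click_frequency is the fraction of the query's clicks on that result.
    Averaging per position is an assumption made for this sketch.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _query, _url, position, freq in click_log:
        totals[position] += freq
        counts[position] += 1
    return {p: totals[p] / counts[p] for p in totals}

def relevance_component(click_log, expected):
    """relevance(q, d) = observed clickthrough - expected(p)."""
    return {(q, url): freq - expected[p] for q, url, p, freq in click_log}

# Toy log, invented purely for illustration.
log = [
    ("q1", "a.example", 1, 0.5), ("q1", "b.example", 2, 0.5),
    ("q2", "c.example", 1, 0.75), ("q2", "d.example", 2, 0.25),
]
expected = expected_click_frequency(log)
deviation = relevance_component(log, expected)
```

In this toy log, the result at position 2 for q1 receives more clicks than position 2 usually does, so its relevance component is positive even though its raw click frequency is no higher than that of the result above it.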

7

Clickthrough for Queries with Known Position of Top Relevant Result

[Figure: relative clickthrough for queries whose top relevant result is known to be at position 1. x-axis: result position (1, 2, 3, 5, 10); y-axis: relative click frequency; series: All queries, PTR=1]

8

Clickthrough for Queries with Known Position of Top Relevant Result

[Figure: relative clickthrough for queries with known relevant results in position 1 and 3 respectively. x-axis: result position (1, 2, 3, 5, 10); y-axis: relative click frequency; series: All queries, PTR=1, PTR=3]

Higher clickthrough at top non-relevant than at top relevant document

9

Deviation from Expected

Relevance component: deviation from “expected”:
Relevance(q, d) = observed - expected(p)

[Figure: click frequency deviation by result position. x-axis: result position (1, 2, 3, 5, 10); y-axis: click frequency deviation (-0.04 to 0.16); series: PTR=1, PTR=3. Data labels (series and position mapping not recoverable): -0.023, -0.029, -0.009, -0.001, -0.013, 0.010, -0.002, -0.001, 0.144, 0.063]

10

Beyond Clickthrough: Rich User Interaction Space

Observed and Distributional features
– Observed features: aggregated values over all user interactions for each query and result pair
– Distributional features: deviations from the “expected” behavior for the query

Represent user interactions as vectors in “Behavior Space”
– Presentation: what a user sees before click
– Clickthrough: frequency and timing of clicks
– Browsing: what users do after the click

11

Some User Interaction Features

Presentation
  ResultPosition       Position of the URL in current ranking
  QueryTitleOverlap    Fraction of query terms in result title

Clickthrough
  DeliberationTime     Seconds between query and first click
  ClickFrequency       Fraction of all clicks landing on page
  ClickDeviation       Deviation from expected click frequency

Browsing
  DwellTime            Result page dwell time
  DwellTimeDeviation   Deviation from expected dwell time for query
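As a rough illustration of how two of the features in this table might be computed, the sketch below assumes whitespace tokenization for QueryTitleOverlap and takes the "expected" dwell time to be the mean over the query's results. Both choices, and the function names, are assumptions made for the example rather than details given in the slides.

```python
def query_title_overlap(query, title):
    """Fraction of distinct query terms that also appear in the result title."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    title_terms = set(title.lower().split())
    return len(query_terms & title_terms) / len(query_terms)

def dwell_time_deviation(dwell_time, dwell_times_for_query):
    """Dwell time minus the mean dwell time over the query's results (assumed baseline)."""
    mean = sum(dwell_times_for_query) / len(dwell_times_for_query)
    return dwell_time - mean

# One point in "behavior space" for a hypothetical query/result pair.
features = {
    "ResultPosition": 2,
    "QueryTitleOverlap": query_title_overlap("web search ranking", "Ranking web pages"),
    "DwellTime": 30.0,
    "DwellTimeDeviation": dwell_time_deviation(30.0, [10.0, 20.0, 30.0]),
}
```

Here two of the three query terms appear in the title, and the page is viewed 10 seconds longer than the average result for the query.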

12

Outline

Distributional model of user interactions

Rich set of user interaction features

Models for predicting user preferences

Experimental results

13

Predicting Result Preferences

Task: predict pairwise preferences
– A user will prefer Result A > Result B

Models for preference prediction
– Current search engine ranking
– Clickthrough
– Full user behavior model

14

Clickthrough Model

SA+N: “Skip Above” and “Skip Next”
– Adapted from Joachims et al. [SIGIR ’05]
– Motivated by gaze tracking

Example
– Click on results 2, 4
– Skip Above: 4 > (1, 3), 2 > 1
– Skip Next: 4 > 5, 2 > 3

[Figure: result list, positions 1 to 8]
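The two strategies in the example above can be sketched as follows. This is one reading of the slide: a clicked result is preferred over every unclicked result above it (Skip Above) and over the immediately following result if that one was not clicked (Skip Next). The function names and the pair representation are hypothetical.

```python
def skip_above(clicked):
    """Clicked result preferred over each unclicked result ranked above it."""
    clicked = set(clicked)
    return {(c, p) for c in clicked for p in range(1, c) if p not in clicked}

def skip_next(clicked, num_results):
    """Clicked result preferred over the next result, if that one was not clicked."""
    clicked = set(clicked)
    return {(c, c + 1) for c in clicked
            if c + 1 <= num_results and c + 1 not in clicked}

def sa_n(clicked, num_results):
    """Union of Skip Above and Skip Next preferences as (preferred, other) pairs."""
    return skip_above(clicked) | skip_next(clicked, num_results)

# Clicks on results 2 and 4 out of 8, as in the slide's example.
preferences = sa_n([2, 4], num_results=8)
```

For clicks on results 2 and 4, this yields the pairs 4 > 1, 4 > 3, 2 > 1 from Skip Above and 4 > 5, 2 > 3 from Skip Next, matching the example.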

15

Distributional Model

CD: distributional model, extends SA+N
– Clickthrough considered iff its frequency exceeds the expected frequency by more than ε

[Figure: clickthrough frequency deviation by result position (1 to 5); y-axis from -0.8 to 0.4]

Click on result 2 likely “by chance”
4 > (1, 2, 3, 5), but not 2 > (1, 3)

[Figure: result list, positions 1 to 8]
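A minimal sketch of the CD idea, under the assumption that it amounts to discarding clicks whose deviation from the expected frequency is at most ε and then applying the Skip Above and Skip Next rules to the clicks that remain. The deviation values and the threshold below are invented for illustration.

```python
def cd_preferences(click_deviation, epsilon, num_results):
    """Preferences from clicks whose deviation from expected exceeds epsilon.

    click_deviation: dict mapping result position to
    (observed click frequency - expected click frequency at that position).
    Returns a set of (preferred, other) position pairs.
    """
    kept = {p for p, dev in click_deviation.items() if dev > epsilon}
    prefs = set()
    for c in kept:
        # Skip Above: preferred over every result above that was not kept.
        prefs |= {(c, p) for p in range(1, c) if p not in kept}
        # Skip Next: preferred over the next result if it was not kept.
        if c + 1 <= num_results and c + 1 not in kept:
            prefs.add((c, c + 1))
    return prefs

# Result 2 is clicked about as often as expected; result 4 far more often.
deviation = {2: 0.01, 4: 0.2}
prefs = cd_preferences(deviation, epsilon=0.05, num_results=8)
```

With these values the click on result 2 is treated as chance, so the output is 4 > (1, 2, 3, 5) and no preference is generated for result 2.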

16

User Behavior Model

Full set of interaction features
– Presentation, clickthrough, browsing

Train the model with explicit judgments
– Input: behavior feature vectors for each query-page pair in rated results
– Use RankNet (Burges et al. [ICML 2005]) to discover model weights
– Output: a neural net that can assign a “relevance” score to a behavior feature vector

17

RankNet for User Behavior

RankNet: general, scalable, robust Neural Net training algorithms and implementation

Optimized for ranking: predicting an ordering of items, not scores for each

Trains on pairs (where first point is to be ranked higher or equal to second)
– Extremely efficient
– Uses cross entropy cost (probabilistic model)
– Uses gradient descent to set weights
– Restarts to escape local minima
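The pairwise training idea can be sketched in a few lines. This is not the RankNet implementation: it swaps the neural net for a linear scorer, omits restarts, and fixes the target probability of every pair at 1. It only illustrates the cross entropy cost on score differences, C = log(1 + exp(-(s_i - s_j))), minimized by gradient descent. All names and data are hypothetical.

```python
import math

def score(weights, features):
    """Linear scorer (a stand-in for the neural net, assumed for this sketch)."""
    return sum(w * x for w, x in zip(weights, features))

def pair_cost(weights, preferred, other):
    """Cross entropy cost for a pair whose first item should rank higher."""
    diff = score(weights, preferred) - score(weights, other)
    return math.log1p(math.exp(-diff))

def train(pairs, num_features, learning_rate=0.1, epochs=200):
    """Gradient descent over (preferred, other) feature-vector pairs."""
    weights = [0.0] * num_features
    for _ in range(epochs):
        for preferred, other in pairs:
            diff = score(weights, preferred) - score(weights, other)
            p = 1.0 / (1.0 + math.exp(-diff))  # modeled P(preferred ranks higher)
            for k in range(num_features):
                # dC/dw_k = -(1 - p) * (preferred[k] - other[k]); step against it.
                weights[k] += learning_rate * (1.0 - p) * (preferred[k] - other[k])
    return weights

# Invented behavior feature vectors; the first of each pair is rated higher.
pairs = [([0.9, 0.2], [0.1, 0.3]), ([0.7, 0.5], [0.2, 0.4])]
weights = train(pairs, num_features=2)
```

Because the cost depends only on the difference between the two scores, the model learns an ordering rather than calibrated scores, which is the property the slide highlights.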

18

Outline

Distributional model of user interactions

Rich set of user interaction features

Models for predicting user preferences

Experimental evaluation

19

Evaluation Metrics

Task: predict user preferences

Pairwise agreement:
– For comparison with previous work
– Useful for ranking and other applications

Precision for a query:
– Fraction of pairs predicted that agree with preferences derived from human ratings

Recall for a query:
– Fraction of human-rated preferences predicted correctly

Average Precision and Recall across all queries
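The definitions above can be written out as a short sketch. It assumes preferences are represented as sets of (preferred, other) pairs and that a query with no predicted pairs contributes a precision of 0; the slides do not say how such queries are handled, so that convention is an assumption.

```python
def precision_recall(predicted, human):
    """Precision and recall of predicted preference pairs for one query.

    predicted, human: sets of (preferred, other) pairs.
    """
    correct = len(predicted & human)
    precision = correct / len(predicted) if predicted else 0.0  # assumed convention
    recall = correct / len(human) if human else 0.0
    return precision, recall

def average_over_queries(per_query):
    """Mean precision and recall over a list of (predicted, human) set pairs."""
    results = [precision_recall(p, h) for p, h in per_query]
    n = len(results)
    return sum(r[0] for r in results) / n, sum(r[1] for r in results) / n

# Invented example: three predicted pairs, four pairs derived from human ratings.
predicted = {(4, 1), (4, 3), (2, 1)}
human = {(4, 1), (4, 3), (4, 2), (2, 3)}
p, r = precision_recall(predicted, human)
```

In this example two of the three predicted pairs agree with the human-derived preferences, and two of the four human-derived preferences are recovered.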

20

Datasets

Explicit judgments
– 3,500 queries, top 10 results, relevance ratings converted to pairwise preferences for each query

User behavior data
– Opt-in client-side instrumentation
– Anonymized UserID, time, visited page
  Detect queries submitted to MSN Search engine
  Subsequent visited pages
  120,000 instances of these 3,500 queries submitted at least 2 times over 21 days
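Converting per-result relevance ratings into pairwise preferences for a query could look like the sketch below. It assumes numeric ratings where higher means more relevant and that equally rated results produce no preference; the rating scale and tie handling are not given in the slides, and the URLs are placeholders.

```python
from itertools import combinations

def ratings_to_preferences(ratings):
    """Turn one query's relevance ratings into (preferred, other) pairs.

    ratings: dict mapping result URL to a numeric rating (higher is better).
    """
    prefs = set()
    for a, b in combinations(ratings, 2):
        if ratings[a] > ratings[b]:
            prefs.add((a, b))
        elif ratings[b] > ratings[a]:
            prefs.add((b, a))
        # Equal ratings: no preference is emitted (assumed convention).
    return prefs

ratings = {"a.example": 3, "b.example": 1, "c.example": 3}
prefs = ratings_to_preferences(ratings)
```

The two results rated 3 are each preferred over the result rated 1, and no pair is produced between the two tied results.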

21

Methods Compared

Preferences inferred by:

Current search engine ranking: Baseline
– Result i > Result j iff i is ranked above j (position i < j)

Clickthrough model: SA+N

Clickthrough distributional model: CD

Full user behavior model: UserBehavior

22

Results: Predicting User Preferences

[Figure: precision vs. recall. x-axis: Recall (0 to 0.4); y-axis: Precision (0.6 to 0.8); series: SA+N, CD, UserBehavior, Baseline]

• Baseline < SA+N < CD << UserBehavior
• Rich user behavior features result in dramatic improvement

23

Contribution of Feature Types

• Presentation features not helpful
• Browsing features: higher precision, lower recall
• Clickthrough features > CD: due to learning

[Figure: precision vs. recall by feature type. x-axis: Recall (0.01 to 0.45); y-axis: Precision (0.5 to 0.8); series: Clickthrough, Presentation, Browsing]

24

Amount of Interaction Data

[Figure: precision vs. recall by number of interactions per query. x-axis: Recall (0.01 to 0.49); y-axis: Precision (0.65 to 0.85); series: 2 or more, 10 or more, 20 or more]

• Prediction accuracy for varying amount of user interactions per query
• Slight increase in Recall, substantial increase in Precision

25

Learning Curve

[Figure: recall vs. days of user interactions observed. x-axis: days (7, 12, 17, 21); y-axis: Recall (0 to 0.2); series: ClickDeviation, UserBehavior]

• Minimum precision of 0.7
• Recall increases substantially with more days of user interactions

26

Experiments Summary

Clickthrough distributional model: more accurate than previously published work

Rich user behavior features: dramatic accuracy improvement

Accuracy increases for frequent queries and longer observation period

27

Some Applications

Web search ranking (next talk):
– Can use preference predictions to re-rank results
– Can integrate features into ranking algorithms

Identifying and answering navigational queries
– Can tune model to focus on top 1 result
– Supports classification or ranking methods
– Details in Agichtein & Zheng [KDD 2006]

Automatic evaluation: augment explicit relevance judgments

28

Conclusions

General framework for training rich user interaction models

Robust techniques for inferring user relevance preferences

High-accuracy preference prediction in a large-scale evaluation

29

Thank you

Text Mining, Search, and Navigation group: http://research.microsoft.com/tmsn/

Adaptive Systems and Interaction group: http://research.microsoft.com/adapt/

30

Presentation Features

Query terms in Title, Summary, URL
Position of result
Length of URL
Depth of URL
…
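The URL-based presentation features could be extracted as in the sketch below. The slides do not define "depth", so counting non-empty path segments is an assumption, as are the function names.

```python
from urllib.parse import urlparse

def url_length(url):
    """Length of the URL in characters."""
    return len(url)

def url_depth(url):
    """Number of non-empty path segments (an assumed definition of depth)."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])

url = "http://research.microsoft.com/tmsn/"
presentation = {"UrlLength": url_length(url), "UrlDepth": url_depth(url)}
```

Under this definition a site root has depth 0 and each additional path segment adds 1.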

31

Clickthrough Features

Fraction of clicks on URL
Deviation from “expected” given result position
Time to click
Time to first click in “session”
Deviation from average time for query

32

Browsing Features

Time on URL
Cumulative time on URL (CuriousBrowser)
Deviation from average time on URL
– Averaged over the “user”
– Averaged over all results for the query
Number of subsequent non-result URLs

33

An Intelligent Baseline

[Figure: precision vs. recall. x-axis: Recall (0 to 0.5); y-axis: Precision (0.6 to 0.84); series: UserBehavior, Baseline++]