Aris Gionis: "Methods for analyzing user behavior and their application in web search and recommendation"

usage mining techniques with applications to web search and content recommendation

Aristides Gionis

Yahoo! Research, Barcelona

yandex aug 31, 2012

yahoo! research, barcelona

web mining

social media and multimedia

large-scale distributed systems

user engagement

semantic web


web mining in yahoo! research

themes

usage mining and query-log mining

social network analysis and graph mining

influence propagation

other data mining problems

data sources

- query logs (search) and toolbar (browsing)

- social networks (flickr, messenger, email, ...)

- question-answering (answers)

- micro-blogging (twitter)


overview of the talk

query-log mining

query graphs

query recommendations

yahoo! tips

news recommendations using real-time web


query-log mining

search engines collect a large amount of query logs

lots of interesting information

- analyzing users’ behavior
- creating user profiles and personalization
- creating knowledge bases and folksonomies
- finding similar concepts
- building systems for query recommendations
- using statistics for improving systems’ performance
- . . .


the click graph

[Craswell and Szummer, 2007]


applications of the click graph

[Craswell and Szummer, 2007]

query-to-document search

query-to-query suggestion

document-to-query annotation

document-to-document relevance feedback


the query-flow graph

[Boldi et al., 2008]

take into account temporal information

captures the “flow” of how users submit queries

definition:

- nodes V = Q ∪ {s, t}: the distinct set of queries Q, plus a starting state s and a terminal state t
- edges E ⊆ V × V
- weights w(q, q′) representing the probability that q and q′ are part of the same chain


building the query-flow graph

an edge (q, q′) if q and q′ are consecutive in at least one session

weights w(q, q′) learned by machine learning

features used

- textual features: cosine similarity, Jaccard coefficient, size of intersection, etc.
- session features: the number of sessions, the average session length, the average number of clicks in the sessions, the average position of the queries in the sessions, etc.
- time-related features: average time difference, etc.
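
The textual features listed above can be illustrated with a small sketch. The function name `textual_features` and the whitespace tokenization are assumptions made for this example; the slides do not say how the features were actually computed.

```python
import math
from collections import Counter


def textual_features(q1: str, q2: str) -> dict:
    """Term-overlap features for a pair of queries (whitespace tokens)."""
    t1, t2 = q1.lower().split(), q2.lower().split()
    s1, s2 = set(t1), set(t2)
    inter, union = s1 & s2, s1 | s2
    c1, c2 = Counter(t1), Counter(t2)
    # Cosine similarity between the term-frequency vectors.
    dot = sum(c1[t] * c2[t] for t in inter)
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(
        sum(v * v for v in c2.values())
    )
    return {
        "cosine": dot / norm if norm else 0.0,
        "jaccard": len(inter) / len(union) if union else 0.0,
        "intersection_size": len(inter),
    }


features = textual_features("barcelona hotels", "cheap barcelona hotels")
```

Such per-pair features would then be fed, together with the session and time features, to whatever learner estimates w(q, q′).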


query-flow graph

[figure: fragment of the query-flow graph around the query "barcelona", with nodes barcelona fc, barcelona fc website, barcelona fc fixtures, real madrid, barcelona weather, barcelona weather online, barcelona hotels, cheap barcelona hotels, luxury barcelona hotels, and the terminal node <T>; edges carry weights between 0.011 and 0.523]

[figure: query-flow graph fragment over the queries dog, cat, funny cat, funny dog, picture of a cat, picture of a dog, picture of a funny, cat and dog, breed of dog, dog for sale, with start node ^ and end node $]

query recommendations

the general theme:

given an input query q

identify similar queries q′

rank them and present them to the user

most query graphs can be used for both tasks: similarity and ranking


recommendations using the query-flow graph

[Boldi et al., 2008]

perform a random walk on the query-flow graph

teleportation to the submitted query

teleportation to previous queries to take into account the user history

normalize the PageRank scores to remove the bias toward very popular queries
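
A minimal sketch of this idea follows, under stated assumptions: the toy graph, the damping value `alpha=0.85`, and the normalization (dividing the query-specific score by the score of a walk with uniform teleportation) are illustrative choices, not necessarily those of [Boldi et al., 2008].

```python
def random_walk_scores(graph, restart, alpha=0.85, iters=50):
    """Power iteration for a random walk with teleportation.

    graph:   {node: {neighbor: weight}}, outgoing weights sum to 1
    restart: {node: probability}, teleportation distribution summing to 1
    """
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs} | set(restart)
    score = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart.get(n, 0.0) for n in nodes}
        for u in nodes:
            out = graph.get(u, {})
            if not out:
                # Dangling node: send its mass back to the restart set.
                for n, p in restart.items():
                    nxt[n] += alpha * score[u] * p
                continue
            for v, w in out.items():
                nxt[v] += alpha * score[u] * w
        score = nxt
    return score


# Hypothetical toy query-flow graph (weights are made up).
qfg = {
    "apple": {"apple ipod": 0.5, "apple store": 0.3, "google": 0.2},
    "apple ipod": {"apple store": 0.6, "google": 0.4},
    "apple store": {"apple": 1.0},
    "banana": {"google": 1.0},
    "google": {"apple": 0.5, "banana": 0.5},
}
uniform = {n: 1 / len(qfg) for n in qfg}
global_score = random_walk_scores(qfg, uniform)        # query-independent popularity
personal = random_walk_scores(qfg, {"apple": 1.0})     # teleport to the submitted query
ranked = sorted(
    (n for n in qfg if n != "apple"),
    key=lambda n: personal[n] / global_score[n],       # assumed de-biasing step
    reverse=True,
)
```

To use the session history, the restart distribution would put some of its mass on the earlier queries as well.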


example : apple

Max. weight      sq               sq               sq
t                t                apple            apple
apple ipod       apple            apple fruit      apple ipod
apple store      apple ipod       apple ipod       apple trailers
apple trailers   apple store      apple belgium    apple store
amazon           apple trailers   eating apple     apple mac
apple mac        google           apple.nl         apple fruit
itunes           amazon           apple monitor    apple usa
pc world         argos            apple usa        apple ipod nano
argos            itunes           apple jobs       apple.com/ipod...


example : banana → apple

banana → apple              banana
banana                      banana
apple                       eating bugs
usb no                      banana holiday
banana cs                   opening a banana
giant chocolate bar         banana shoe
where is the seed in anut   fruit banana
banana shoe                 recipe 22 feb 08
fruit banana                banana jules oliver
banana cloths               banana cs
eating bugs                 banana cloths


example : beatles → apple

beatles → apple          beatles
beatles                  beatles
apple                    scarring
apple ipod               paul mcartney
scarring                 yarns from ireland
srg peppers artwork      statutory instrument A55
ill get you              silver beatles tribute band
bashles                  beatles mp3
dundee folk songs        GHOST’S
the beatles love album   ill get you
place lyrics beatles     fugees triger finger remix


recommendations as shortcuts to qfg

[Anagnostopoulos et al., 2010]


the query-recommendation problem


the recommendation problem

model user behavior as a random walk on qfg

a user starts at query q0 and follows a path p of reformulations on qfg before terminating

consider a reward function w(q) on the nodes of qfg

goal: “nudge” users in order to maximize their reward

objectives:

1. collect a large reward along the way

2. end the session at a high-reward node

applications: a general problem formulation for suggesting shortcuts (web graph, social networks, etc.)


probabilistic model

we can only suggest, not order the user

we do not know how the user will act

random walk on qfg is modeled by stochastic matrix P

recommendations R modify P to P ′ = P + R

yandex aug 31, 2012

utility functions

reward function w(q) on queries

- quality of search results, user satisfaction, dwell time,monetization, etc.

utility function U(p) on paths $p = \langle q_0, \ldots, q_{k-1}, T \rangle$

- $U(p) = \sum_{q \in p} w(q)$ (Cavafy, “road to Ithaca”)
- $U(p) = w(q_{k-1})$ (Machiavelli, “the end justifies the means”)
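
Both utilities have simple expected values under the random-walk model, which the following sketch computes by fixed-point iteration. The function name, the toy transition matrix, and the rewards are hypothetical; only the two utility definitions come from the slide.

```python
def expected_utilities(P, w, terminal="T", iters=200):
    """Expected utility of a walk started at each query.

    P: {q: {next: prob}}, rows sum to 1 and may include the terminal state
    w: {q: reward}
    Returns (total, final): expected sum of rewards along the path, and
    expected reward of the last query before termination.
    """
    total = {q: 0.0 for q in P}
    final = {q: 0.0 for q in P}
    for _ in range(iters):
        total = {
            q: w[q] + sum(p * total[n] for n, p in P[q].items() if n != terminal)
            for q in P
        }
        final = {
            q: P[q].get(terminal, 0.0) * w[q]
            + sum(p * final[n] for n, p in P[q].items() if n != terminal)
            for q in P
        }
    return total, final


# Hypothetical two-query example.
P = {"a": {"b": 0.5, "T": 0.5}, "b": {"T": 1.0}}
w = {"a": 1.0, "b": 3.0}
total, final = expected_utilities(P, w)
```

A recommendation set R could then be compared by recomputing these values on the modified matrix P′, provided P′ remains stochastic.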


utility

[figure: sum of expected values (y-axis, 0.0 to 1.2) for the strategies w, ρ, ρw, 1-step heuristic]


qfg projections for diverse recommendations

[Bordino et al., 2010]


diverse recommendations

[Bordino et al., 2010]

we want not only relevant and high-quality recommendations, but also a diverse set

we want recommendations that lead in different “directions” in the qfg

need notions of distance of queries in the qfg

use spectral embeddings

project a graph in a low dimensional space, so that the embedding minimizes total edge distortion

finding diverse recommendations reduces to a geometric problem


example: time

Spectral projection on 2-hop neighborhood of the query “time”; pairwise values between queries (diagonal omitted)

                         (1)      (2)      (3)      (4)      (5)      (6)      (7)
(1) time magazine         -     0.9953   0.0162   0.1422   0.1049  -0.6071  -0.6056
(2) new york times     0.9953      -    -0.0051   0.1248   0.0893  -0.6478  -0.6462
(3) time zone          0.0162  -0.0051      -     0.9903   0.9891  -0.5234  -0.5254
(4) world time         0.1422   0.1248   0.9903      -     0.9970  -0.6263  -0.6282
(5) what time is it    0.1049   0.0893   0.9891   0.9970      -    -0.6244  -0.6263
(6) time warner       -0.6071  -0.6478  -0.5234  -0.6263  -0.6244      -     0.9999
(7) time warner cable -0.6056  -0.6462  -0.5254  -0.6282  -0.6263   0.9999      -
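
Once the queries are embedded, picking a diverse set becomes a geometric selection problem. The sketch below reads the values of the table as similarities (an assumption) and applies a greedy max-min rule, which is one simple heuristic and not necessarily the selection method of [Bordino et al., 2010].

```python
queries = [
    "time magazine", "new york times", "time zone", "world time",
    "what time is it", "time warner", "time warner cable",
]
# Upper triangle of the table, indexed by position in `queries`.
pairs = {
    (0, 1): 0.9953, (0, 2): 0.0162, (0, 3): 0.1422, (0, 4): 0.1049,
    (0, 5): -0.6071, (0, 6): -0.6056,
    (1, 2): -0.0051, (1, 3): 0.1248, (1, 4): 0.0893,
    (1, 5): -0.6478, (1, 6): -0.6462,
    (2, 3): 0.9903, (2, 4): 0.9891, (2, 5): -0.5234, (2, 6): -0.5254,
    (3, 4): 0.9970, (3, 5): -0.6263, (3, 6): -0.6282,
    (4, 5): -0.6244, (4, 6): -0.6263,
    (5, 6): 0.9999,
}


def sim(i, j):
    """Symmetric lookup into the table."""
    return 1.0 if i == j else pairs[(min(i, j), max(i, j))]


def pick_diverse(k, first=0):
    """Greedily add the query least similar to everything chosen so far."""
    chosen = [first]
    while len(chosen) < k:
        rest = [i for i in range(len(queries)) if i not in chosen]
        best = min(rest, key=lambda i: max(sim(i, j) for j in chosen))
        chosen.append(best)
    return [queries[i] for i in chosen]


diverse = pick_diverse(3)
```

Starting from "time magazine", the rule picks one query from each of the three visible clusters (press, clock time, the cable company).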


improving recommendation for long-tail queries via templates

[Szpektor et al., 2011]


motivation

goal: improve coverage of query-recommendation systems

observation: in a typical query log, 50% of the query volume consists of unique queries [Baeza-Yates et al., 2007]

most query-recommendation systems are based on finding queries that co-occur frequently

inherent limitation on using co-occurrences

need to develop methods that can reason about rare, and even previously unseen, queries


overview of the approach

1. generate candidate query-templates for each query

   Paris hotels → <city> hotels

   Paris hotels → <district> hotels

   Moscow hotels → <city> hotels

2. infer transitions between templates

   <city> hotels → <city> restaurants

3. infer recommendations for rare queries

   Yancheng hotels → Yancheng restaurants
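
The three steps can be sketched end to end on a toy example. The entity dictionary `ENTITY_TYPES` and the transition table `TEMPLATE_TRANSITIONS` are hypothetical stand-ins: in the approach described here, the first comes from an entity hierarchy and the second is inferred from the query log.

```python
# Hypothetical entity lookup (token -> entity types).
ENTITY_TYPES = {
    "paris": ["city", "district"],
    "moscow": ["city"],
    "yancheng": ["city"],
}
# Hypothetical template transitions, assumed to be mined from frequent queries.
TEMPLATE_TRANSITIONS = {"<city> hotels": ["<city> restaurants"]}


def candidate_templates(query):
    """Step 1: replace each known entity token by each of its types."""
    tokens = query.lower().split()
    out = []
    for i, tok in enumerate(tokens):
        for etype in ENTITY_TYPES.get(tok, []):
            template = " ".join(tokens[:i] + [f"<{etype}>"] + tokens[i + 1:])
            out.append((template, etype, tok))
    return out


def recommend(query):
    """Steps 2-3: follow template transitions, then re-insert the entity."""
    recs = []
    for template, etype, entity in candidate_templates(query):
        for target in TEMPLATE_TRANSITIONS.get(template, []):
            recs.append(target.replace(f"<{etype}>", entity))
    return recs


recs = recommend("Yancheng hotels")
```

The rare query gets a recommendation even though it never co-occurred with anything in the (imagined) log, because the transition was learned from frequent city queries.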


query templates

defined over a hierarchy of entity types

define a global set of templates over the whole query log

do not restrict to specific domains (such as travel, weather, or movies)

examples:

jaguar spare parts → <car> spare parts

name for salt → name for <compound>

a thousand miles notes → <song> notes


candidate templates – example

query: chocolate cookie recipe

[figure: the query tokens mapped into the entity hierarchy, with types food, dessert, drink, recipe, instruction, substance]

candidate templates:

- <food> cookie recipe
- <drink> cookie recipe
- <food> recipe
- <substance> recipe
- chocolate cookie <instruction>
- . . .


ranking candidate templates

ambiguity

Jaguar spare parts → <car> spare parts

Jaguar spare parts → <animal> spare parts

focus


name for salt → <description> for salt

right generalization level

Paris hotels → <capital> hotels


Paris hotels → <location> hotels


construction of query templates – details

hierarchy used: WordNet 3.0 hierarchy and Wikipedia category hierarchy, connected via YAGO mapping

queries are tokenized, and n-grams are looked up and mapped to entities in the hierarchy

enriched with heuristic generalizations for <email>, <url>, numbers, and noun-phrases not in the taxonomy


query-to-template edges

mapping from a query q to its set of templates T(q), viewed as query-to-template edges

associated edge scores

$s_{qt}(q, t) = \alpha^d$

when t is obtained by generalizing q at distance d in the hierarchy H

parameter α set experimentally to 0.9

set $s_{qt}(q, q') = 1$ if (q, q′) is an edge in the query-flow graph

normalize so that all $s_{qt}(q, \cdot)$ sum to 1
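
As a small worked instance of the scoring rule, the sketch below scores three candidate templates of one query and normalizes them. The generalization distances are made up for illustration, and the query-flow-graph edges with score 1 are left out of the normalization for brevity.

```python
ALPHA = 0.9  # value reported on the slide


def query_to_template_scores(distances, alpha=ALPHA):
    """distances: {template: generalization distance d in the hierarchy}."""
    raw = {t: alpha ** d for t, d in distances.items()}
    total = sum(raw.values())
    return {t: s / total for t, s in raw.items()}


# Hypothetical distances for the query "Paris hotels".
scores = query_to_template_scores(
    {"<capital> hotels": 1, "<city> hotels": 2, "<location> hotels": 4}
)
```

With α below 1, templates obtained by a more aggressive generalization receive a smaller share of the score.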


template-to-templates edges

reasoning about transitions between templates

<food> recipe → healthy <food> recipe

for templates (t1, t2) define the support set Sup(t1, t2) of query pairs {(q1, q2)}, s.t.

- t1 ∈ T(q1) and t2 ∈ T(q2)
- t1 and t2 substitute the same token in q1 and q2 (e.g., dosa recipe and healthy dosa recipe)

define template-to-template edge score as

$s_{tt}(t_1, t_2) = \sum_{(q_1, q_2) \in Sup(t_1, t_2)} s_{qq}(q_1, q_2)$

normalize so that all $s_{tt}(t, \cdot)$ sum to 1
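
The aggregation over the support set can be sketched as follows. The data layout (`query_edges`, `templates_of`) and the unit query-to-query scores are assumptions for the example; the matching rule implemented is only the "same substituted token" condition stated above.

```python
from collections import defaultdict


def template_transition_scores(query_edges, templates_of):
    """query_edges:  {(q1, q2): s_qq}
    templates_of: {query: [(template, substituted_token), ...]}
    Returns normalized s_tt as {(t1, t2): score}.
    """
    raw = defaultdict(float)
    for (q1, q2), s in query_edges.items():
        for t1, tok1 in templates_of.get(q1, []):
            for t2, tok2 in templates_of.get(q2, []):
                if tok1 == tok2:  # both templates generalize the same token
                    raw[(t1, t2)] += s
    totals = defaultdict(float)
    for (t1, _), s in raw.items():
        totals[t1] += s
    return {(t1, t2): s / totals[t1] for (t1, t2), s in raw.items()}


# Hypothetical log fragment with unit query-to-query scores.
query_edges = {
    ("bmw transmission", "bmw spare parts"): 1.0,
    ("audi transmission", "audi spare parts"): 1.0,
    ("jaguar transmission", "jaguar spare parts"): 1.0,
}
templates_of = {
    "bmw transmission": [("<car> transmission", "bmw")],
    "bmw spare parts": [("<car> spare parts", "bmw")],
    "audi transmission": [("<car> transmission", "audi")],
    "audi spare parts": [("<car> spare parts", "audi")],
    "jaguar transmission": [("<car> transmission", "jaguar"),
                            ("<animal> transmission", "jaguar")],
    "jaguar spare parts": [("<car> spare parts", "jaguar"),
                           ("<animal> spare parts", "jaguar")],
}
s_tt = template_transition_scores(query_edges, templates_of)
```

In this toy log the car reading of "jaguar" accumulates support from bmw and audi, while the animal reading is supported by the ambiguous query alone.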


example – ambiguity

consider query transition:
jaguar transmission → jaguar spare parts

template transition
<car> transmission → <car> spare parts
supported by
bmw transmission → bmw spare parts
audi transmission → audi spare parts
. . .

template transition
<animal> transmission → <animal> spare parts
will not be supported by
lion transmission → lion spare parts
tiger transmission → tiger spare parts
. . .


the query-template flow graph

extension of the query-flow graph

superposition of all the concepts we have seen so far:

set of nodes consists of queries and templates

set of edges consists of

- query to query edges
- query to template edges
- template to template edges

associated weights


generating recommendations

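[figure: the query q connects to templates t1, ..., t4 and to candidate recommendations q′ through edges with scores s1, ..., s7]

$r(q, q') = s_1 s_4 + s_2 s_5 + s_3 s_6 + s_3 s_7$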

interpretation: probability of a feasible path

dashed lines do not really exist, but discovered on-the-fly

queries q and q′ may not have been seen before

transitions in the query-flow graph ranked first
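
The path-sum score can be sketched for the template part of the graph: every path query → template → template contributes the product of its edge scores to the query obtained by re-inserting the entity. All scores below are hypothetical, and the sketch assumes each template has exactly one placeholder.

```python
from collections import defaultdict


def path_scores(s_qt, s_tt, entity):
    """s_qt:  {template: score} for the input query
    s_tt:  {template: {next_template: score}}
    entity: the token that the templates generalized
    Returns {recommended query: r(q, q')}.
    """
    r = defaultdict(float)
    for t1, a in s_qt.items():
        for t2, b in s_tt.get(t1, {}).items():
            placeholder = t2[t2.index("<"): t2.index(">") + 1]
            r[t2.replace(placeholder, entity)] += a * b
    return dict(r)


# Hypothetical scores for the query "yancheng hotels".
s_qt = {"<city> hotels": 0.6, "<location> hotels": 0.4}
s_tt = {
    "<city> hotels": {"<city> restaurants": 0.7, "<city> map": 0.3},
    "<location> hotels": {"<location> restaurants": 1.0},
}
r = path_scores(s_qt, s_tt, "yancheng")
```

Two different template paths lead to "yancheng restaurants", so their contributions add up, as in the formula above.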


methodology

methods:

query-template flow graph

query-flow graph

evaluation:

inspection of a sample of the results

editorial evaluation

automated evaluation


training dataset

                 queries                templates
# nodes          95 279 132             5 382 051 983
# edges          83 513 590             4 345 497 267
avg degree       0.88                   0.81
max out-degree   14 145 (craigslist)    34 249 (<album>)
max in-degree    14 317 (youtube)       133 874 (<institution>)


anecdotal evidence

{“guangzhou flights”, “guangzhou map”}
<capital> flights → <capital> map

{“a thousand miles notes”, “a thousand miles piano notes”}
<single> notes → <single> piano notes

{“8 week old weimaraner”, “8 week old weimaraner puppy”}
8 week old <breed> → 8 week old <breed> puppy

{“aaa office twin falls idaho”, “aaa twin falls idaho”}
aaa office <city> → aaa <city>

{“air force titles”, “air force ranks”}
<military service> titles → <military service> ranks

{“name for salt”, “chemical name for salt”}
name for <compound> → chemical name for <compound>


editorial evaluation

set-A: 300 pairs from each configuration, recommendation in the top-10

set-B: 100 pairs, same queries in each configuration, same position

set-C: 100 pairs for which query-flow graph has no recommendation

editors labeled query-recommendation pairs as: relevant, not relevant, cannot tell

two editors, 100 common queries, kappa-statistic 0.37

         qfg       qtfg
set-A    98.48%    97.84%
set-B    97.65%    98.86%
set-C    —         94.38%


automated evaluation – guiding principle

extract query pairs {qi, qi+1} from a testing dataset, such that the user submitted qi+1 after qi in the same session

measure if qi+1 is predicted by our methods, and in which position

assumption: qi+1 should be relevant and useful for qi
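
This protocol can be sketched in a few lines. The definitions used here are assumptions: coverage is taken as the fraction of test pairs for which the method returns any recommendation, and with a single relevant query per pair the average precision reduces to the reciprocal rank.

```python
def evaluate(test_pairs, recommend, ks=(1, 10, 100)):
    """test_pairs: list of (q_i, q_next); recommend: query -> ranked list."""
    covered = 0
    hits = {k: 0 for k in ks}
    rr_sum = 0.0
    for q, q_next in test_pairs:
        recs = recommend(q)
        if recs:
            covered += 1
        if q_next in recs:
            pos = recs.index(q_next) + 1  # 1-based rank of the true follow-up
            rr_sum += 1.0 / pos
            for k in ks:
                if pos <= k:
                    hits[k] += 1
    n = len(test_pairs)
    report = {"coverage": covered / n, "mean_reciprocal_rank": rr_sum / n}
    for k in ks:
        report[f"top-{k}"] = hits[k] / n
    return report


# Hypothetical recommender and test set.
toy_recs = {"a": ["b", "c"], "x": ["z", "y"]}
report = evaluate(
    [("a", "b"), ("x", "y"), ("m", "n")],
    lambda q: toy_recs.get(q, []),
)
```

Counting each distinct pair once instead of once per occurrence gives the "unique pairs" variant of the same measurement.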


results

pair occurrences

                 qfg        qtfg       relative increase
total pairs      3134388    3134388
coverage         22.65 %    28.17 %    24.37 %
# in top-100     16.97 %    25.49 %    50.23 %
# in top-10      9.49 %     20.74 %    118.49 %
# in top-1       2.86 %     10.01 %    249.5 %
MAP              0.050      0.137
avg. position    18.35      8.3

unique pairs

                 qfg        qtfg       relative increase
total pairs      2755922    2755922
coverage         13.28 %    19.38 %    45.87 %
# in top-100     12.06 %    17.25 %    42.96 %
# in top-10      8.41 %     13.52 %    60.68 %
# in top-1       2.86 %     6.5 %      127.32 %
MAP              0.047      0.089
avg. position    12.33      9.43

results

[figure: # test-pairs at top-100 (%) (y-axis, 0 to 20) versus query length in words (x-axis, 2 to 16), for QFG and QTFG]


conclusions

improve coverage of query recommendation systems

recommendations for rare or previously unseen queries

well suited for tail queries

complements rather than replaces existing methods

future work: improve quality of extracted templates


yahoo! tips

[Weber et al., 2011]


motivation

provide answers, not links

identify “how to” queries and provide tips

tip: piece of advice that is

1. short
2. concrete
3. self-contained
4. non-obvious


yahoo! tips


extract tips from yahoo! answers

tip: To tell if your eggs are fresh : place eggs in a bowl/glass of water.....if it floats it’s bad. if it sinks it’s good.


system diagram

offline (building the tip collection):

1. rule-based extraction yields 250k candidate tips
2. obtain quality labels for 20k candidate tips using CrowdFlower
3. machine learning selects 22k high-quality tips

online (answering a query, e.g., “zest lime without zester”):

1. machine learning: does the query have how-to intent? if no, show normal search results
2. are there relevant high-quality tips? if no, show normal search results
3. rank the matching tips and display the highest-ranking one

TIP: To zest a lime if you don’t have a zester : use a cheese grater


mining tips from yahoo! answers

consider tips of a specific structure: “X : Y ”

X : goal of the tip

Y : action of the tip

examples

- To get the mildew smell out of your towels : try soaking it in a salt water solution, then washing with soap and cold water, that tends to get rid of smells
- To style your hair without heat, gel or straighteners : try coconut oil mark k


mining tips from yahoo! answers

english

only literal “how to” queries

answer should start with a verb

consider only best answers

replace I, my, me, myself, etc. with you, your, you, yourself, etc.
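
The extraction rules above can be sketched with a regular expression and a pronoun table. The set `ACTION_VERBS` is a hypothetical stand-in for whatever verb detection the real system used, and the function name `make_tip` is invented for this example.

```python
import re

PRONOUNS = {"i": "you", "my": "your", "me": "you", "myself": "yourself"}
# Hypothetical stand-in for a proper part-of-speech check.
ACTION_VERBS = {"use", "try", "place", "put", "soak", "mix"}


def swap_pronouns(text):
    """Rewrite first-person pronouns into second person, word by word."""
    return re.sub(
        r"[A-Za-z]+",
        lambda m: PRONOUNS.get(m.group(0).lower(), m.group(0)),
        text,
    )


def make_tip(question, best_answer):
    """Return 'To X : Y' for a literal how-to question, else None."""
    m = re.match(r"\s*how to\s+(.+?)\s*\??\s*$", question, flags=re.IGNORECASE)
    if not m:
        return None  # only literal "how to" questions are considered
    words = best_answer.strip().split()
    if not words or words[0].lower() not in ACTION_VERBS:
        return None  # the answer should start with a verb
    return f"To {swap_pronouns(m.group(1))} : {swap_pronouns(best_answer.strip())}"


tip = make_tip("How to zest a lime if I don't have a zester?", "Use a cheese grater")
```

Candidates produced this way would still need the quality filtering step, since the rules say nothing about whether the advice is good.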


quality filtering

generated 249 675 tips

manually label 20 000 using CrowdFlower

classes: very good (25%), ok (48%), bad (27%)

algorithms

- svm (rbf)
- decision trees
- k-nn (Euclidean, k = 21 . . . 50)

feature families:

- 18 handcrafted features: e.g., style (Flesch-Kincaid reading level), sentiment, # urls, emoticons, etc.
- content: SVD on the tip×term matrix


quality filtering — machine learning results

       Method          handcrafted features   content features   both
Hard   SVM             0.63/0.13              0.60/0.09          0.63/0.16
       Decision Tree   0.67/0.07              0.61/0.06          0.66/0.13
       k-NN            0.62/0.23              0.56/0.11          0.63/0.11
Soft   SVM             0.95/0.11              0.93/0.05          0.95/0.08
       Decision Tree   0.95/0.03              0.92/0.03          0.94/0.06
       k-NN            0.94/0.11              0.91/0.05          0.94/0.05


quality filtering — machine learning results

Category                  P,R          VG     size
Beauty & Style            0.53,0.08    0.16   0.08
Business & Finance        0.57,0.20    0.20   0.03
Cars & Transportation     0.64,0.12    0.23   0.03
Computers & Internet      0.69,0.33    0.45   0.15
Consumer Electronics      0.70,0.23    0.38   0.06
Entertainment & Music     0.60,0.39    0.15   0.05
Family & Relationships    0.35,0.05    0.06   0.14
Games & Recreation        0.61,0.31    0.24   0.04
Health                    0.62,0.07    0.15   0.09
Home & Garden             0.43,0.06    0.27   0.04
Society & Culture         0.50,0.19    0.09   0.03
Sports                    0.68,0.24    0.19   0.03
Yahoo! Products           0.73,0.43    0.45   0.07


detecting “how to” queries

how many? 2-3% of volume, 3-4% of distinct queries

- start with “how to”, “how do i”, or “how can i”
  example: how do you fix keys on a laptop
  P: 96-99%, cover: 1.0%

- queries start with an action verb
  example: play my music on tool bar raido
  P: 7-14%, cover: 3.2%

- if exists “how to X” then “X”
  example: craft ideas for boys
  P: 87-94%, cover: 1.1%

- incoming queries to “how to” web sites
  example: fixing a wet cell phone
  P: 61-75%, cover: 0.08%


matching queries to tips

precision–recall trade-off

- index only the “goal” or also the “action”
- use AND or OR mode for the query
- require minimum “span” for the goal

ranking

rank by number of query tokens in goal, then tf·idf
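
A minimal sketch of the matching step follows, indexing only the goal of each tip. The definition of "span" used here (fraction of the goal's distinct tokens that appear in the query) is an assumption, and the tf·idf tie-break is omitted.

```python
def match_tips(query, tips, mode="AND", min_span=0.5):
    """tips: list of (goal, action) pairs. Returns matching tips, best first."""
    q_tokens = set(query.lower().split())
    scored = []
    for goal, action in tips:
        g_tokens = set(goal.lower().split())
        if not g_tokens:
            continue
        common = q_tokens & g_tokens
        if mode == "AND" and common != q_tokens:
            continue  # AND: every query token must occur in the goal
        if mode == "OR" and not common:
            continue  # OR: at least one query token must occur
        if len(common) / len(g_tokens) < min_span:
            continue  # assumed span: share of the goal covered by the query
        scored.append((len(common), goal, action))
    scored.sort(key=lambda item: -item[0])  # more shared tokens ranks first
    return [(goal, action) for _, goal, action in scored]


# Hypothetical tip collection.
tips = [
    ("zest a lime without a zester", "use a cheese grater"),
    ("juice a lime", "roll it on the counter first"),
]
strict = match_tips("zest lime without zester", tips)
loose = match_tips("zest lime without zester", tips, mode="OR", min_span=0.3)
```

The two calls show the trade-off in miniature: the AND mode returns only the exact-goal tip, while the OR mode with a low span also admits a loosely related one.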


matching queries to tips — evaluation

mode   min span   vol.     dist.    P@1         median
AND    .50        8.7%     2.7%     .428/.680   1
AND    .66        6.8%     1.8%     .557/.770   1
AND    1.0        4.4%     0.8%     .625/.835   1
OR     .50        87.4%    88.4%    .048/.110   18
OR     .66        36.8%    36.3%    .092/.200   2
OR     1.0        13.5%    10.3%    .160/.300   1


future work

mine tips from other resources

- twitter
- wikitravel

improve quality of existing system

- incorporating more features
- improving rule extraction
- classification


information dissemination in social networks


the information dissemination spectrum

- news sites, content-provider sites: editorially curated, users browse, no specific info need
- web search: url, images, music, ...; clear intent
- social media (twitter, facebook); recommendations (content- or context- or geo-aware); user-generated content (blogs, images, q/a)


social media


the information overload problem


social media and user-generated content

paradigm shift from a broadcast one-to-many mechanism to a many-to-many model

users in the role of information producers


benefits and opportunities

wealth of information of extreme volume and diversity

wisdom of crowd phenomena

accurate profiling and personalization (toolbar, search, clicks)

content- and context- information available

social and geo information available


challenges

heterogeneous sources

high variability in quality

needle-in-the-haystack problems

we want to:

support users to seek, filter, and disseminate information

build efficient platforms that support social-media functionalities


personalized news recommendations by harnessing the real-time web

[De Francisci Morales et al., 2012]


overview

a news recommendation system based on the real-time web, e.g., twitter

suggest news articles to twitter users

infer user preferences from twitter activity


yahoo! news


sources characteristics

news stream

+ high coverage

− sparse and noisy data for user profiling

− latency on collecting user feedback

twitter stream

+ much more accurate personalization

+ news spread very fast


From Chatter to Headlines: Harnessing the Real-Time Web for Personalized News Recommendation

[figure: T.Rex system overview; the user's tweets, the followees' tweets, the twitter stream, and the news article stream are mapped to entities and feed the T.Rex user model, which outputs a personalized ranked list of news articles]

Table 5.2: MRR, precision and coverage.

Algorithm     MRR     P@1     P@5     P@10    Coverage
RECENCY       0.020   0.002   0.018   0.036   1.000
CLICKCOUNT    0.059   0.024   0.086   0.135   1.000
SOCIAL        0.017   0.002   0.018   0.036   0.606
CONTENT       0.107   0.029   0.171   0.286   0.158
POPULARITY    0.008   0.003   0.005   0.012   1.000
T.REX         0.107   0.073   0.130   0.168   1.000
T.REX+        0.109   0.062   0.146   0.189   1.000

- RECENCY: it ranks news articles by time of publication (most recent first);
- CLICKCOUNT: it ranks news articles by click count (highest count first);
- SOCIAL: it ranks news articles by using T.REX with β = γ = 0;
- CONTENT: it ranks news articles by using T.REX with α = γ = 0;
- POPULARITY: it ranks news articles by using T.REX with α = β = 0.

5.6.5 Results

We report MRR, precision and coverage results in Table 5.2. The two variants of our system, T.REX and T.REX+, have the best results overall.

T.REX+ has the highest MRR of all the alternatives. This result means that our model has a good overall performance across the dataset. CONTENT also has a very high MRR. Unfortunately, the coverage level achieved by the CONTENT strategy is very low. This issue is mainly caused by the sparsity of the user profiles. It is well known that most twitter users belong to the “silent majority,” and do not tweet very much.

The SOCIAL strategy is affected by the same problem, albeit to a much lesser extent. The reason for this difference is that SOCIAL draws from a large social neighborhood of user profiles, instead of just one. So it has more chances to provide a recommendation. The quality of the recommendation is however quite low, probably because the social-based profile alone is not able to capture the specific user interests.

It is worth noting that in almost 20% of the cases T.REX+ was able to rank the clicked news in the top 10 results. Ranking by the CLICKCOUNT

Mean Reciprocal Rank, Precision and Coverage.

[figure: Average Discounted Cumulative Gain; average DCG (0 to 14) versus rank (1 to 20) for T.Rex+, T.Rex, Popularity, Content, Social, Recency, and Click count]

T.Rex
T.Rex builds user profiles from twitter. Parameters learned from click data in the Yahoo! toolbar log. Uses support vector machines and learns a ranking function. Heuristically identified a group of […] twitter users in the toolbar and used their clicks to train and test the system.

What
T.Rex is a new methodology for recommending interesting news to users by exploiting the information in their twitter persona.

Content Model Γ
Γ = A · T, where Γ(i, j) is the content relevance of news n_j for user u_i.

Social Model Σ
Σ = S* · A · T, where Σ(i, j) is the social relevance of news n_j for user u_i.

Popularity Model Π
Π = Z · N, where Π(j) is the popularity of news article n_j.

in updating the popularity counts is to take into account recency: new entities of interest should dominate the popularity counts of older entities. In this work, we choose to update the popularity counts using an exponential decay rule. We discuss the details in Section 5.3.1. However, note that the popularity update is independent of our recommendation model, and any other decaying function can be used.

Finally, we propose a ranking function for recommending news articles to users. The ranking function is a linear combination of the scoring components described above. We plan to investigate the effect of non-linear combinations in the future.

Definition 10 (Recommendation ranking Rτ(u, n)). Given the components Στ, Γτ and Πτ, resulting from a stream of news N and a stream of tweets T authored by users U up to time τ, the recommendation score of a news article n ∈ N for a user u ∈ U at time τ is defined as

Rτ(u, n) = α · Στ(u, n) + β · Γτ(u, n) + γ · Πτ(n),

where α, β, γ are coefficients that specify the relative weight of the components.

At any given time, the recommender system produces a set of news recommendations by ranking a set of candidate news, e.g., the most recent ones, according to the ranking function R. To motivate the proposed ranking function we note similarities with popular recommendation techniques. When β = γ = 0, the ranking function R resembles collaborative filtering, where user similarity is computed on the basis of their social circles. When α = γ = 0, the function R implements a content-based recommender system, where a user is profiled by the bag-of-entities occurring in the tweets of the user. Finally, when α = β = 0, the most popular items are recommended, regardless of the user profile.
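
The linear ranking function and its three special cases can be sketched directly. The per-user score dictionaries and their values are hypothetical; in the system described here the coefficients are learned from click data rather than set by hand.

```python
def rank_news(social, content, popularity, alpha=1.0, beta=1.0, gamma=1.0):
    """Rank candidate news for one user by R = alpha*Sigma + beta*Gamma + gamma*Pi.

    social, content: {news_id: score} for this user (Sigma and Gamma rows)
    popularity:      {news_id: score}, user independent (Pi)
    """
    candidates = set(social) | set(content) | set(popularity)
    score = {
        n: alpha * social.get(n, 0.0)
        + beta * content.get(n, 0.0)
        + gamma * popularity.get(n, 0.0)
        for n in candidates
    }
    # Highest score first; ties broken by id to keep the order deterministic.
    return sorted(candidates, key=lambda n: (-score[n], n))


# Hypothetical component scores for a single user.
social = {"n1": 0.2, "n2": 0.0}
content = {"n1": 0.1, "n2": 0.5}
popularity = {"n1": 0.3, "n2": 0.1, "n3": 0.9}

by_popularity = rank_news(social, content, popularity, alpha=0, beta=0, gamma=1)
by_content = rank_news(social, content, popularity, alpha=0, beta=1, gamma=0)
```

Setting two of the three coefficients to zero reproduces the POPULARITY, CONTENT, and SOCIAL baselines described in the text.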

Note that Σ, Γ, Π and R are all time dependent. At any given time τ the social network and the set of authored tweets vary, thus affecting Σ and Γ. More importantly, some entities may abruptly become popular, hence of interest to many users. This dependency is captured by Π. While the changes in Σ and Γ derive directly from the tweet stream T and the social network S, the update of Π is non-trivial, and plays a fundamental role in the recommendation system that we describe in the next section.

Recommendation Model R

T.Rex+
System trained with additional features:

- entity hotness (raw number of mentions in news and twitter)
- news click count
- news article age

Given: N = news stream, T = tweet stream, U = set of users

Find the top-k most relevant news for user u at time τ.

Why Twitter?
Timeliness and personalization. News become stale very fast and spread faster on twitter. Twitter is a good predictor of interest.

How
T.Rex uses a mix of signals to model relevance of news articles for users: the profile of the social neighborhood of the users, the content of their tweet stream, and topic popularity in the news and across twitter.

Results
T.Rex is able to predict with good accuracy the news articles clicked by the users and rank them higher than other news articles.

Data

- News: […]k articles from Yahoo! news
- Twitter: […] month of crawled tweets
- Clicks: users of twitter in Yahoo! toolbar logs

Evaluation
We evaluate T.Rex as a click prediction system. We train our model using a learning-to-rank approach and support vector machines. The train and test set are drawn from click logs.

Claudio […], Gianmarco De Francisci Morales, Aristides Gionis

Overwhelmed by information overload! Find interesting stories in an ocean of online news articles.

[figure: news-click delay distribution; x-axis: minutes (1 to 10000), y-axis: number of occurrences (0 to 45)]

[figure: Osama Bin Laden trends; normalized number of occurrences in news, twitter, and clicks, May-01 h20 to May-03 h08]

[figure: Joplin tornado trends; normalized number of occurrences in news, twitter, and clicks, May-22 h00 to May-26 h00]

A = authorship matrix (U × T): A(i, j) = 1 if u_i is the author of tweet t_j

S = social matrix (U × U): S(i, j) = 1 if u_i is interested in the content produced by u_j

in N according to a user-dependent relevance criteria. We also aim at incorporating time recency into our model, so that our recommendations favor the most recently published news articles.

We now proceed to model the factors that affect the relevance of news for a given user. We first model the social-network aspect. In our case, the social component is induced by the twitter following relationship. We define S to be the social network adjacency matrix, where S(i, j) is equal to 1 divided by the number of users followed by user ui if ui follows uj, and 0 otherwise. We also adopt a functional ranking (Baeza-Yates et al., 2006) that spreads the interests of a user among its neighbors recursively. By limiting the maximum hop distance d, we define the social influence in a network as follows.

Definition 4 (Social influence S∗). Given a set of users U = {u0, u1, . . .}, organized in a social network where each user may express an interest to the content published by another user, we define the social influence model S∗ as the |U| × |U| matrix where S∗(i, j) measures the interest of user ui to the content generated by user uj and it is computed as

$S^* = \left( \sum_{i=1}^{d} \sigma^i S^i \right),$

where S is the row-normalized adjacency matrix of the social network, d is the maximum hop-distance up to which users may influence their neighbors, and σ is a damping factor.
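
Definition 4 can be sketched with plain nested lists. The three-user follow graph and the values d = 2, σ = 0.5 are hypothetical; the helper names are invented for this example.

```python
def row_normalize(follows, n):
    """Build S from follow lists: S[i][j] = 1 / (number followed by i)."""
    S = [[0.0] * n for _ in range(n)]
    for i, followed in follows.items():
        for j in followed:
            S[i][j] = 1.0 / len(followed)
    return S


def mat_mul(A, B):
    """Dense matrix product of two square lists of lists."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def social_influence(S, d, sigma):
    """S* = sum over hops i = 1..d of sigma**i * S**i."""
    n = len(S)
    result = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for hop in range(1, d + 1):
        power = mat_mul(power, S)  # power now holds S**hop
        for i in range(n):
            for j in range(n):
                result[i][j] += sigma ** hop * power[i][j]
    return result


# Hypothetical network: user 0 follows user 1, user 1 follows user 2.
S = row_normalize({0: [1], 1: [2]}, n=3)
S_star = social_influence(S, d=2, sigma=0.5)
```

User 0 does not follow user 2 directly, yet receives a damped two-hop interest in user 2's content.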

Next we model the profile of a user based on the content that the user has generated. We first define a binary authorship matrix A to capture the relationship between users and the tweets they produce.

Definition 5 (Tweet authorship A). Let A be a |U| × |T| matrix where A(i, j) is 1 if ui is the author of tj, and 0 otherwise.

The matrix A can be extended to deal with different types of relationships between users and posts, e.g., weighing re-tweets or likes differently. In this work, we limit the concept of authorship to the posts actually written by the user.

social interest: S*(i, j) = level of interest of u_i in the content produced by u_j

Z = entity space onto which T and N are mapped; we use Wikipedia pages as our entity space

Z = popularity vector: Z(i) = popularity of entity e_i, updated by tracking mentions in news and twitter with exponential decay

T = tweet-to-news matrix (T × N): T(i, j) = relatedness of tweet t_i to news n_j, with T = M · N

M = tweet matrix (T × Z): M(i, j) = relatedness of tweet t_i to entity e_j

N = news matrix (Z × N): N(i, j) = relatedness of entity e_i to news n_j

yandex aug 31, 2012

What: T.Rex is a new methodology for recommending interesting news to users by exploiting the information in their twitter persona.

How: T.Rex uses a mix of signals to model the relevance of news articles for users: the profile of the social neighborhood of the users, the content of their tweet stream, and topic popularity in the news and across twitter.

Given: N = news stream, T = tweet stream, U = set of users

A(i, j) = 1 iff u_i is the author of tweet t_j; A is the |U| × |T| authorship matrix

S(i, j) = 1 iff u_i is interested in the content produced by u_j; S is the |U| × |U| social matrix

Content model: Γ = A · M, where Γ(i, j) is the content relevance of news n_j for user u_i

Social model: Σ = S∗ · A · M, where Σ(i, j) is the social relevance of news n_j for user u_i

T.Rex: builds user profiles from twitter. Parameters are learned from click data in the Yahoo! toolbar log, using support vector machines to learn a ranking function. A heuristically identified group of twitter users in the toolbar supplied the clicks used to train and test the system.

T.Rex+: system trained with additional features: entity hotness (raw number of mentions in news and twitter), news click count, and news article age.

Results: T.Rex is able to predict with good accuracy the news articles clicked by the users and rank them higher than other news articles.

[Figure: T.Rex architecture: user tweets, followee tweets, the twitter stream, and news articles feed the T.Rex user model]

[Figure: Average Discounted Cumulative Gain: average DCG vs. rank (1 to 20) for T.Rex+, T.Rex, Popularity, Content, Social, Recency, and Click count]

[Table: Mean Reciprocal Rank, Precision and Coverage]

[Figure: News-click delay distribution: number of occurrences vs. news-click delay in minutes (log scale, 1 to 10000)]

[Figures: Osama Bin Laden trends (May-01 to May-03) and Joplin tornado trends (May-22 to May-26): normalized number of occurrences in news, twitter, and clicks]

challenges

scale to large volumes of news and tweets

high dynamicity of news and tweets

news have short life-cycle

twitter users use jargon language

find the right degree of personalization

cope with inactive twitter users


relate users, tweets, and news articles


T.rex architecture

[Figure: T.Rex architecture: user tweets, followee tweets, the twitter stream, and news articles feed the T.Rex user model; diagram labels include Entities, News, and Tweets]

recommendation model

Rτ(u, n) = α · Στ(u, n) + β · Γτ(u, n) + γ · Πτ(n)

social model Σ(i, j): social relevance of news j to user i

content model Γ(i, j): content relevance of news j to user i

popularity model Π(j): popularity of news article j
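The linear combination on this slide can be sketched for a single user as follows. This is a minimal illustration, not the authors' implementation: the function name `recommend`, the per-article score lists, and the weight values are all made up for the example.

```python
def recommend(social, content, popularity, alpha, beta, gamma, k):
    """Rank news indices for one user by R = alpha*Sigma + beta*Gamma + gamma*Pi.

    social[n], content[n]: the user's Sigma and Gamma scores for article n.
    popularity[n]: the user-independent Pi score of article n.
    Returns the indices of the top-k articles and the full score list.
    """
    scores = [alpha * s + beta * c + gamma * p
              for s, c, p in zip(social, content, popularity)]
    ranked = sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)
    return ranked[:k], scores

# Hypothetical scores for three news articles and one user.
top, scores = recommend(
    social=[0.2, 0.0, 0.5],
    content=[0.1, 0.7, 0.0],
    popularity=[0.3, 0.3, 0.9],
    alpha=0.5, beta=0.3, gamma=0.2, k=2,
)
```

In this toy input article 2 comes first because it is both socially relevant and popular, while article 1 is carried mostly by the content score.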


popularity update rule

[Figure: News-click delay distribution: number of occurrences vs. news-click delay in minutes (log scale, 1 to 10000)]

[Figures: Osama Bin Laden trends (May-01 to May-03) and Joplin tornado trends (May-22 to May-26): normalized number of occurrences in news, twitter, and clicks]

news become stale after two days

track mentions in news and tweets with exponential decay

Z_τ = λ Z_{τ−1} + w_T H_T + w_N H_N
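The update rule can be sketched as below. The slide does not define H_T and H_N, so this sketch assumes they are per-entity mention counts observed in tweets and in news during the current time step; the function name and all numeric values are hypothetical.

```python
def update_popularity(z_prev, h_tweets, h_news, lam, w_t, w_n):
    """One step of Z_tau = lam * Z_{tau-1} + w_t * H_T + w_n * H_N, per entity."""
    return [lam * z + w_t * ht + w_n * hn
            for z, ht, hn in zip(z_prev, h_tweets, h_news)]

# Two entities; entity 0 gets a burst of mentions, then goes quiet.
history = []
z = [0.0, 0.0]
z = update_popularity(z, h_tweets=[4, 0], h_news=[1, 0], lam=0.5, w_t=1.0, w_n=2.0)
history.append(z)
for _ in range(2):
    # No new mentions: the old popularity decays by a factor lam per step.
    z = update_popularity(z, h_tweets=[0, 0], h_news=[0, 0], lam=0.5, w_t=1.0, w_n=2.0)
    history.append(z)
```

A smaller λ forgets old mentions faster, which is one way to reflect that news become stale quickly.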


model learning and evaluation

Rτ (u, n) = α · Στ (u, n) + β · Γτ (u, n) + γ · Πτ (n)

Yahoo! toolbar data

the recommendation model should rank highly the news articles that users click

learn the model using SVM

use clicks and twitter profiles of 3K users to train and test the system
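The slide only states that the model is learned with an SVM from click data. As a rough stand-in, the sketch below learns the weights from (clicked, not clicked) article pairs by taking subgradient steps on an L2-regularized pairwise hinge loss, which is the kind of objective a ranking SVM optimizes. It is not the authors' training setup; the function names, hyperparameters, and feature values are hypothetical.

```python
def score(w, features):
    """Linear relevance score, e.g. w = (alpha, beta, gamma)."""
    return sum(wi * fi for wi, fi in zip(w, features))

def train_pairwise(pairs, epochs=50, lr=0.1, reg=0.01):
    """Learn weights so that clicked articles outscore skipped ones.

    pairs: list of (clicked_features, skipped_features) tuples.
    Takes stochastic subgradient steps on a regularized pairwise hinge loss.
    """
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for clicked, skipped in pairs:
            diff = [c - s for c, s in zip(clicked, skipped)]
            violated = score(w, diff) < 1.0    # margin of 1 not yet reached
            for i, d in enumerate(diff):
                grad = reg * w[i] - (d if violated else 0.0)
                w[i] -= lr * grad
    return w

# Hypothetical (social, content, popularity) features for three click pairs.
pairs = [
    ((0.9, 0.2, 0.1), (0.1, 0.3, 0.2)),
    ((0.7, 0.8, 0.3), (0.2, 0.1, 0.4)),
    ((0.6, 0.5, 0.5), (0.3, 0.4, 0.6)),
]
weights = train_pairwise(pairs)
```

Training on pairs rather than on absolute labels matches the stated goal: only the relative order of clicked and non-clicked articles matters.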


systems evaluated

T.rex: basic model using only user profiles

Rτ (u, n) = α · Στ (u, n) + β · Γτ (u, n) + γ · Πτ (n)

T.rex+: additional features

entity hotness

news click count

news article age


results

[Figure: Average Discounted Cumulative Gain: average DCG vs. rank (1 to 20) for T.Rex+, T.Rex, Popularity, Content, Social, Recency, and Click count]

[Table: Mean Reciprocal Rank, Precision and Coverage]
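The results on this slide are reported as average DCG against rank. A minimal sketch of that metric follows, assuming the common log2(rank + 1) discount and binary relevance (1 for a clicked article, 0 otherwise); the exact variant used in the evaluation is not given on the slide, and the function names and sample rankings are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """DCG@k for a list of relevance values given in ranked order."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def average_dcg(ranked_lists, k):
    """Mean DCG@k over several recommendation lists."""
    return sum(dcg_at_k(r, k) for r in ranked_lists) / len(ranked_lists)

# Two hypothetical recommendation lists: the clicked article is ranked
# first in one list and third in the other.
lists = [[1, 0, 0], [0, 0, 1]]
avg = average_dcg(lists, k=3)
```

A clicked article at rank 1 contributes 1.0 and at rank 3 contributes 0.5, so a system that pushes clicked articles toward the top gets a higher average DCG.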


conclusions

real-time web information can be leveraged to deliver relevant information

future directions

LSI analysis on entities

models for different user clusters

geographic information


summary

review concepts on query-log mining

answering queries directly with useful tips

challenges and opportunities in information dissemination

news recommendations using real-time web

many nice problems and research opportunities


thank you!


references

Anagnostopoulos, A., Becchetti, L., Castillo, C., and Gionis, A. (2010). An optimization framework for query recommendation. In WSDM.

Baeza-Yates, R. A., Gionis, A., Junqueira, F., Murdock, V., Plachouras, V., and Silvestri, F. (2007). The impact of caching on search engines. In SIGIR.

Boldi, P., Bonchi, F., Castillo, C., Donato, D., Gionis, A., and Vigna, S. (2008). The query-flow graph: model and applications. In Proceedings of the 17th ACM conference on Information and knowledge management (CIKM).

Bordino, I., Castillo, C., Donato, D., and Gionis, A. (2010). Query similarity by projecting the query-flow graph. In SIGIR.

Craswell, N. and Szummer, M. (2007). Random walks on the click graph. In Proceedings of the 30th annual international ACM conference on Research and development in information retrieval (SIGIR).

De Francisci Morales, G., Gionis, A., and Lucchese, C. (2012). From chatter to headlines: Harnessing the real-time web for personalized news recommendation. In WSDM.

Szpektor, I., Gionis, A., and Maarek, Y. (2011). Improving recommendation for long-tail queries via templates. In WWW.

Weber, I., Ukkonen, A., and Gionis, A. (2011). Answers, not links: Extracting tips from Yahoo! Answers to address how-to web queries. In CIKM.
