Temporal Dynamics and Information Retrieval
Susan Dumais, Microsoft Research, http://research.microsoft.com/~sdumais
CCF ADL, Jul 31 2011
In collaboration with: Eric Horvitz, Jaime Teevan, Eytan Adar, Jon Elsas, Ed Cutrell, Dan Liebling, Richard Hughes, Merrie Ringel Morris, Evgeniy Gabrilovich, Krysta Svore, Anagha Kulkarni


Page 1: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Susan Dumais, Microsoft Research

http://research.microsoft.com/~sdumais

Temporal Dynamics and Information Retrieval

In collaboration with: Eric Horvitz, Jaime Teevan, Eytan Adar, Jon Elsas, Ed Cutrell, Dan Liebling, Richard Hughes, Merrie Ringel Morris, Evgeniy Gabrilovich, Krysta Svore, Anagha Kulkarni

Page 2: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Change is Everywhere in IR

Change is everywhere in digital information systems:
- New documents appear all the time
- Document content changes over time
- Queries and query volume change over time
- What's relevant to a query changes over time (e.g., U.S. Open 2011 in May vs. Sept)
- User interaction changes over time (e.g., tags, anchor text, social networks, query-click streams, etc.)
- Relations between entities change over time (e.g., President of the US in 2008 vs. 2004 vs. 2000)

Change is pervasive in digital information systems … yet we're not doing much about it!

Page 3: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Information Dynamics

[Timeline figure, 1996–2009: Content Changes and User Visitation/Revisitation]

Today's browse and search experiences ignore this history.

Page 4: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Digital Dynamics Easy to Capture

Easy to capture

But … few tools support dynamics

Page 5: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Overview
- Change on the desktop and news
  - Desktop: Stuff I've Seen; Memory Landmarks; LifeBrowser
  - News: analysis of novelty (e.g., NewsJunkie)
- Change on the Web
  - Content changes over time
  - User interaction varies over time (queries, re-visitation, anchor text, query-click stream, "likes")
  - Tools for understanding Web change (e.g., Diff-IE)
- Improving Web retrieval using dynamics
  - Query trends over time
  - Retrieval models that leverage dynamics
  - Task evolution over time

Page 6: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Stuff I’ve Seen (SIS)

- Many silos of information
- SIS: unified access to distributed, heterogeneous content (mail, files, web, tablet notes, RSS, etc.)
  - Index full content + metadata
  - Fast, flexible search
  - Information re-use

Stuff I’ve Seen

Windows-DS

[Dumais et al., SIGIR 2003]

SIS -> Windows Desktop Search

Page 7: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Example Desktop Searches

- Looking for: recent email from Fedor that contained a link to his new demo. Initiated from: Start menu. Query: from:Fedor
- Looking for: the pdf of a SIGIR paper on context and ranking (not sure it used those words) that someone (don't remember who) sent me about a month ago. Initiated from: Outlook. Query: SIGIR
- Looking for: meeting invite for the last intern handoff. Initiated from: Start menu. Query: intern handoff kind:appointment
- Looking for: C# program I wrote a long time ago. Initiated from: Explorer pane. Query: QCluster*.*

CCF ADL, Jul 31 2011

Lots of metadata… especially time

Stuff I’ve Seen

Page 8: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Stuff I've Seen: Findings

Evaluation:
- Internal to Microsoft, ~3000 users in 2004
- Methods: free-form feedback, questionnaires, usage patterns from log data, in situ experiments, lab studies for richer data

Personal store characteristics: 5k–1500k items

Information needs: Desktop search != Web search
- Short queries (1.6 words)
- Few advanced operators in the initial query (~7%)
- But … many advanced operators and query iteration in the UI (48%): filters (type, date, people); modify query; re-sort results

People know a lot about what they are looking for, and we need to provide a way to express it!

CCF ADL, Jul 31 2011

Susan's (laptop) world:

  Type    N           Size
  Web     3k          0.2 GB
  Files   28k         23.0 GB
  Mail    60k         2.2 GB
  Total   91k items   25.4 GB
  Index   —           190 MB (+1.5 MB/week)

Page 9: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Stuff I've Seen: Findings

Information needs:
- People are important – 29% of queries involve names/aliases
- Date is the most common sort order
  - Even with a "best-match" default, there are few searches for the "best" matching object
  - Many other criteria (e.g., time, people, type), depending on task
  - Need to support flexible access
- Abstraction is important – "useful" date, people, pictures
- Age of items retrieved: today (5%), last week (21%), last month (47%)
  - Need to support episodic access to memory

CCF ADL, Jul 31 2011

Page 10: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Memory Landmarks

Importance of episodes in human memory:
- Memory organized into episodes (Tulving, 1983)
- People-specific events as anchors (Smith et al., 1978)
- Time of events often recalled relative to other events, historical or autobiographical (Huttenlocher & Prohaska, 1997)

Identify and use landmarks to facilitate search and information management:
- Timeline interface, augmented with landmarks
- Learn Bayesian models to identify memorable events
- Extensions beyond search, e.g., LifeBrowser

CCF ADL, Jul 31 2011

Page 11: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Memory Landmarks: Search Results

[Screenshot: timeline of memory landmarks — general (world, calendar) and personal (appts, photos) — linked to search results by time, alongside the distribution of results over time]

[Ringel et al., 2003]

Page 12: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Memory Landmarks: Findings

[Chart: search time in seconds (0–30) for the Dates Only and Landmarks + Dates interfaces, comparing searches with landmarks vs. without landmarks]

Page 13: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Memory Landmarks: Learned models of memorability

CCF ADL, Jul 31 2011

[Horvitz et al., 2004]

Page 14: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

LifeBrowser [Horvitz & Koch, 2010]

[Screenshot labels: images & videos, appts & events, desktop & search activity, whiteboard capture, locations]

CCF ADL, Jul 31 2011

Page 15: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

LifeBrowser: Learned models of selective memory

CCF ADL, Jul 31 2011

Page 16: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

NewsJunkie: Personalized news via information novelty [Gabrilovich et al., WWW 2004]

News is a stream of information with evolving events, but it's hard to consume it as such.

Personalized news using information novelty:
- Identify clusters of related articles
- Characterize what a user knows about an event
- Compute the novelty of new articles, relative to this background knowledge (relevant & novel)
  - Novelty = KL divergence (article || current_knowledge)
- Use the novelty score and user preferences to guide what, when, and how to show new information (a sketch of the scoring step follows below)
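To make the novelty computation concrete, here is a minimal sketch, not the NewsJunkie implementation: it scores a new article by the KL divergence between its smoothed unigram distribution and the distribution of what the user has already seen. The helper names, the add-alpha smoothing, and the toy articles are all illustrative assumptions.

```python
import math
from collections import Counter

def unigram_dist(text, vocab, alpha=0.5):
    """Smoothed unigram distribution over a fixed vocabulary (add-alpha)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def novelty(article, background_articles):
    """KL(article || current_knowledge) over the combined vocabulary."""
    background = " ".join(background_articles)
    vocab = set(article.lower().split()) | set(background.lower().split())
    p = unigram_dist(article, vocab)      # what the new article says
    q = unigram_dist(background, vocab)   # what the user has already seen
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)

# Articles with mostly-new terms score higher than near-duplicates (toy data).
seen = ["pizza delivery man found with bomb", "police investigate bomb incident"]
print(novelty("copycat case reported in missouri", seen))       # higher novelty
print(novelty("police investigate the bomb incident", seen))    # lower novelty
```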

Page 17: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

NewsJunkie in Action

[Plot: novelty score vs. article sequence by time for the "pizza delivery man with bomb" story; labeled spikes include "Friends say Wells is innocent", "Looking for two people", "Gun disguised as cane", and "Copycat case in Missouri"]

Page 18: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

NewsJunkie in Action

CCF ADL, Jul 31 2011

Page 19: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

NewsJunkie Evaluation

Experiment to evaluate algorithms for detecting novelty:
- Task: given a background article, select the set of articles you would recommend to a friend who wants to find out what's new about the story
- KL and named-entity algorithms performed better than the temporal baseline

But there are many types of "differences":
- Recap: review of prior information
- Elaboration: new information
- Offshoot: related, but mostly about something else
- Irrelevant: not related to the main story

[Chart: number of times each algorithm (kl, ne_count, original) produced the most, medium, and least novel recommendations]

CCF ADL, Jul 31 2011

Page 20: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

NewsJunkie: Types of novelty, via intra-article novelty dynamics

[Four plots of novelty score vs. word position within an article:
- Offshoot: SARS impact on Asian stock markets
- On-topic, recap
- On-topic, elaboration: SARS patient's wife held under quarantine
- Offshoot: Swiss company develops SARS vaccine]

Page 21: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Overview
- Change on the desktop and news
  - Desktop: Stuff I've Seen; Memory Landmarks; LifeBrowser
  - News: analysis of novelty (e.g., NewsJunkie)
- Change on the Web
  - Content changes over time
  - User interaction varies over time (queries, re-visitation, anchor text, query-click stream, "likes")
  - Tools for understanding Web change (e.g., Diff-IE)
- Improving Web retrieval using dynamics
  - Query trends over time
  - Retrieval models that leverage dynamics
  - Task evolution over time

Questions?

Page 22: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Characterizing Web Change

[Timeline figure, 1996–2009: Content Changes and User Visitation/Revisitation]

Large-scale Web crawls, over time:
- Revisited pages: 55,000 pages crawled hourly for 18+ months; unique users, visits/user, time between visits
- Pages returned by a search engine (for ~100k queries): 6 million pages crawled every two days for 6 months

[Adar et al., WSDM 2009]

Page 23: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Measuring Web Page Change

- Summary metrics: number of changes, amount of change, time between changes
- Change curves: fix a starting point and measure similarity over different time intervals
- Within-page changes

Page 24: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Measuring Web Page Change

Summary metrics: number of changes, amount of change, time between changes
- 33% of Web pages change; 66% of visited Web pages change
  - 63% of these change every hour
- Avg. Dice coefficient = 0.80; avg. time between changes = 123 hrs.
- .edu and .gov pages change infrequently, and not by much
- Popular pages change more frequently, but not by much
(A small sketch of this page-level change measure follows below.)
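The Dice coefficient above compares successive crawls of the same page. Here is a minimal sketch assuming word shingles as the unit of comparison (the study used shingle prints; the shingle width and the example text are illustrative choices):

```python
def shingles(text, w=4):
    """Set of w-word shingles from a page's text (a simple stand-in for shingle prints)."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(max(len(words) - w + 1, 1))}

def dice(a, b):
    """Dice coefficient between two shingle sets: 1.0 = identical, 0.0 = disjoint."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Compare two crawls of the "same" page; a higher score means less change.
old_crawl = "weather report for seattle rain expected through friday"
new_crawl = "weather report for seattle sunshine expected through sunday"
print(dice(shingles(old_crawl), shingles(new_crawl)))
```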

Page 25: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Measuring Web Page Change

Summary metrics: number of changes, amount of change, time between changes

Change curves: fix a starting point and measure Dice similarity over increasing time intervals

[Chart: Dice similarity (0–1) vs. time from the starting point; the curve is labeled with a "knot point"]

Page 26: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Measuring Within-Page Change

- DOM-level changes
- Term-level changes
  - Divergence from the norm (e.g., cookbooks, salads, cheese, ingredient, bbq, …)
  - "Staying power" of a term in the page over time (e.g., Sep.–Dec.); a rough sketch follows below
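As an illustration of term "staying power" (not the authors' exact measure), one can compute the fraction of a page's crawled snapshots that contain a term; the snapshot data here is invented for the example.

```python
def staying_power(term, crawls):
    """Fraction of crawled snapshots of a page that contain the term.

    Terms near 1.0 are long-lived (always on the page); terms near 0.0
    are transient. This is only a rough proxy for term longevity.
    """
    term = term.lower()
    hits = sum(1 for snapshot in crawls if term in snapshot.lower().split())
    return hits / len(crawls) if crawls else 0.0

# Weekly snapshots of a hypothetical recipe page.
snapshots = [
    "cookbooks salads cheese bbq specials",
    "cookbooks salads cheese soup specials",
    "cookbooks salads cheese holiday menu",
]
print(staying_power("cookbooks", snapshots))  # 1.0 -> long-lived term
print(staying_power("bbq", snapshots))        # ~0.33 -> transient term
```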

Page 27: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Example Term Longevity Graphs

Page 28: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Revisitation on the Web

[Timeline figure, 1996–2009: Content Changes and User Visitation/Revisitation]

What was the last Web page you visited? Why did you visit (re-visit) the page?

Revisitation patterns:
- Log analyses: toolbar logs for revisitation; query logs for re-finding
- User survey to understand intent in revisitations

[Adar et al., CHI 2009]

Page 29: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

60-80% of Web pages you visit, you’ve visited before

Many motivations for revisits

CCF ADL, Jul 31 2011

Measuring Revisitation

- Summary metrics: unique visitors, visits/user, time between visits
- Revisitation curves: histogram of revisit intervals, normalized (a sketch follows below)

[Chart: normalized count (0–1) vs. time interval between revisits]
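The revisitation curves above are per-URL histograms of the time between consecutive visits, normalized so the tallest bar is 1. A minimal sketch under assumed bin edges (the bins and visit data are not from the study):

```python
import numpy as np

def revisitation_curve(visit_times_hours, bins=(1, 6, 24, 168, 720)):
    """Normalized histogram of inter-visit intervals for one URL.

    visit_times_hours: timestamps (in hours) of visits to the page.
    bins: interval edges (1h, 6h, 1 day, 1 week, 1 month) -- an
    illustrative choice, not the binning used in the study.
    """
    intervals = np.diff(np.sort(np.asarray(visit_times_hours, dtype=float)))
    counts, _ = np.histogram(intervals, bins=(0,) + bins)
    peak = counts.max()
    return counts / peak if peak > 0 else counts.astype(float)

# A page revisited mostly within a few hours (a "fast" revisitation pattern).
print(revisitation_curve([0, 0.5, 1.2, 1.5, 2.0, 26.0, 170.0]))
```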

Page 30: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Four Revisitation Patterns

- Fast: hub-and-spoke; navigation within a site
- Hybrid: high-quality fast pages
- Medium: popular homepages; mail and Web applications
- Slow: entry pages, bank pages; accessed via search engine

Page 31: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Relationships Between Change and Revisitation

- Interested in change: monitor
- Effect change: transact
- Change unimportant: re-find old content
- Change can interfere with re-finding

Page 32: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Revisitation and Search (Re-Finding)

                       Repeat Click (39%)   New Click (61%)
  Repeat Query (33%)          29%                  4%
  New Query (67%)             10%                 57%

- Repeat query (33%), e.g., Q: microsoft research, clicking the same and different URLs
- Repeat click (39%), e.g., http://research.microsoft.com/ reached via Q: microsoft research and Q: msr
- Big opportunity (43%); 24% are "navigational revisits"

[Teevan et al., SIGIR 2007] [Tyler et al., WSDM 2010][Teevan et al., WSDM 2011]

Page 33: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Building Support for Web Dynamics

[Timeline figure, 1996–2009: Content Changes and User Visitation/Revisitation, annotated with Diff-IE and Temporal IR]

Page 34: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Diff-IE

Changes to page since your last visit

Diff-IE toolbar

[Teevan et al., UIST 2009] [Teevan et al., CHI 2010]

Page 35: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Interesting Features of Diff-IE

Always on

In-situ

New to you

Non-intrusive

Try it: http://research.microsoft.com/en-us/projects/diffie/default.aspx

Page 36: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Examples of Diff-IE in Action

Page 37: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Expected New Content

Page 38: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Monitor

Page 39: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Serendipitous Encounters

Page 40: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Unexpected Important Content

Page 41: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Understand Page Dynamics

Page 42: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

[Diagram: Diff-IE usage scenarios arranged along an expected–unexpected axis: monitor, expected new content, attend to activity, edit, understand page dynamics, serendipitous encounters, unexpected important content, unexpected unimportant content]

Page 43: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Studying Diff-IE

- Feedback buttons
- Survey: prior to installation; after a month of use
- Logging: URLs visited; amount of change when revisited
- Experience interview

(Study properties: in situ, representative, experience, longitudinal)

Page 44: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

People Revisit More

Perception of revisitation remains constant:
- How often do you revisit?
- How often are revisits to view new content?

Actual revisitation increases (+14%):
- First week: 39.4% of visits are revisits
- Last week: 45.0% of visits are revisits

Why are people revisiting more with Diff-IE?

Page 45: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Revisited Pages Change More

Perception of change increases:
- What proportion of pages change regularly?
- How often do you notice unexpected change?

Amount of change seen increases:
- First week: 21.5% of revisits were to changed pages (changed by 6.2%)
- Last week: 32.4% of revisits were to changed pages (changed by 9.5%)

Diff-IE is driving visits to changed pages; it supports people in understanding change.

[Slide callouts: 51+%, 17%, 8%]

Page 46: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Other Examples of Dynamics and User Experience

Content changes:
- Diff-IE (Teevan et al., 2008)
- Zoetrope (Adar et al., 2008)
- Diffamation (Chevalier et al., 2010)
- Temporal summaries and snippets, …

Interaction changes:
- Explicit annotations, ratings, wikis, etc.
- Implicit interest via interaction patterns
- Edit wear and read wear (Hill et al., 1992)

CCF ADL, Jul 31 2011

Page 47: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Overview
- Change on the desktop and news
  - Desktop: Stuff I've Seen; Memory Landmarks; LifeBrowser
  - News: analysis of novelty (e.g., NewsJunkie)
- Change on the Web
  - Content changes over time
  - User interaction varies over time (queries, re-visitation, anchor text, query-click stream, "likes")
  - Tools for understanding Web change (e.g., Diff-IE)
- Improving Web retrieval using dynamics
  - Query trends over time
  - Retrieval models that leverage dynamics
  - Task evolution over time

Questions?

Page 48: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Leveraging Dynamics for Retrieval

[Timeline figure, 1996–2009: Content Changes and User Visitation/Revisitation, annotated with Temporal IR]

Page 49: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Temporal IR

- Query frequency over time
- Retrieval models that incorporate time
  - Ranking algorithms typically look only at a single snapshot in time
  - But both content and user interaction with the content change over time
  - Model content change on a page; model user interactions
- Tasks evolve over time

CCF ADL, Jul 31 2011

Page 50: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Query Dynamics

Queries sometimes mention time, but often don't:
- Explicit time (e.g., World Cup Soccer 2011)
- Explicit news (e.g., earthquake news)
- Implicit time (e.g., Harry Potter reviews; implicit "now")

Queries are not uniformly distributed over time; they are often triggered by events in the world.

Using temporal query patterns to:
- Cluster similar queries
- Identify events and find related news

Page 51: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Query Dynamics

Modeling query frequency over time (Vlachos et al., SIGMOD 2004), e.g., Q: cinema
- Discrete Fourier Transform: keep the best k components (vs. the first k); the best k significantly reduce reconstruction error
- Burst detection: bursts as deviations from a moving average
(A minimal sketch of both ideas follows below.)
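The following is a minimal sketch, not the method from the paper: numpy's FFT stands in for the DFT, and the burst rule (deviation from a trailing moving average) uses an assumed window and threshold; the synthetic query series is invented.

```python
import numpy as np

def best_k_reconstruction(series, k=5):
    """Reconstruct a query-frequency series from its k largest-magnitude
    DFT components (the 'best k', rather than the first k)."""
    coeffs = np.fft.rfft(series)
    keep = np.argsort(np.abs(coeffs))[-k:]    # indices of the best k components
    pruned = np.zeros_like(coeffs)
    pruned[keep] = coeffs[keep]
    return np.fft.irfft(pruned, n=len(series))

def bursts(series, window=7, threshold=2.0):
    """Flag days whose frequency exceeds the trailing moving average by
    `threshold` standard deviations (an illustrative burst criterion)."""
    series = np.asarray(series, dtype=float)
    flagged = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        if series[t] > history.mean() + threshold * history.std():
            flagged.append(t)
    return flagged

# A weekly-periodic query with one event-driven spike on day 45.
days = np.arange(90)
freq = 100 + 20 * np.sin(2 * np.pi * days / 7)
freq[45] += 300
print(bursts(freq))                                           # -> [45]
print(np.mean((freq - best_k_reconstruction(freq, k=5)) ** 2))  # error with 5 components
```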

Page 52: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Query Dynamics

Types of query popularity patterns:
- Number of spikes (0, 1, multiple)
- Periodic (yes, no)
- Shape of rise and fall (wedge, sail, castle)
- Trend (flat, up, down)

Changes in query popularity and content are related to changes in user intent (i.e., what is relevant to the query) … more on this later.

[Kulkarni et al., WSDM 2011]

Page 53: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Using Query Dynamics to Find Similar Queries [Chien & Immorlica, WWW 2005]

- Model query patterns using empirical query frequency (normalized by total queries per day)
- Examples: Q: movies, Q: scott peterson, Q: weather report
- Identify "similar" queries using the correlation coefficient between the normalized time series
- A nice use of time to identify semantic similarity between queries/entities, but not predictive
(A small sketch of the correlation step follows below.)
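A minimal sketch of the similarity computation, assuming daily counts are available; the helper names and toy data are invented for illustration:

```python
import numpy as np

def normalized_series(daily_counts, daily_totals):
    """Query frequency normalized by total query volume per day."""
    return np.asarray(daily_counts, dtype=float) / np.asarray(daily_totals, dtype=float)

def query_similarity(counts_a, counts_b, daily_totals):
    """Pearson correlation between two normalized query time series."""
    a = normalized_series(counts_a, daily_totals)
    b = normalized_series(counts_b, daily_totals)
    return np.corrcoef(a, b)[0, 1]

# Two queries that spike on the same days correlate highly,
# even if their absolute volumes differ (toy data).
totals  = [1000, 1000, 1200, 1000, 1000, 1500, 1000]
suspect = [  10,   12,  300,   15,   11,  400,   12]
trial   = [   5,    6,  150,    8,    5,  220,    6]
weather = [  50,   52,   49,   51,   50,   48,   52]
print(query_similarity(suspect, trial, totals))    # close to 1
print(query_similarity(suspect, weather, totals))  # much lower
```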

Page 54: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Using Query Dynamics to Identify "News" Queries [Diaz, 2009]

Many queries to Web search engines are motivated by events in the world. Should you show just Web results, or provide an integrated view of news and Web?

Example: learn a model to predict the "newsworthiness" of a query (i.e., will a user click on news results?)
- Is the query part of a burst? [content consumption]
- Are the top-ranked news results very recent? [content production]
- Improve the prediction using ongoing click data for this and related queries
(A hedged sketch of such a classifier follows below.)
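This sketch is not Diaz's model; the features, data, and threshold choices are invented. It only shows the general shape of training a newsworthiness classifier from burst and recency signals:

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

# Each query is described by a few temporal features (all hypothetical):
#   [in_burst, age_of_top_news_in_hours, recent_news_click_rate]
X_train = np.array([
    [1.0,  2.0, 0.40],   # bursting query, very fresh news, many news clicks
    [1.0,  6.0, 0.25],
    [0.0, 96.0, 0.02],   # stable query, stale news, almost no news clicks
    [0.0, 48.0, 0.05],
])
y_train = np.array([1, 1, 0, 0])   # 1 = users clicked news results

model = LogisticRegression().fit(X_train, y_train)

# Score an incoming query: show the news block only if predicted newsworthy.
candidate = np.array([[1.0, 3.0, 0.30]])
print(model.predict_proba(candidate)[0, 1])
```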

Page 55: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Temporal Retrieval Models 1

Current retrieval algorithms look only at a single snapshot of a page, but Web page content changes over time. Can we leverage this to improve retrieval?
- Pages have different rates of change: use different document priors (based on change vs. link structure)
- Terms have different longevity ("staying power"): some are always on the page, some are transient
- Language modeling approach to ranking:

  P(D|Q) ∝ P(D) · P(Q|D)

  with the change prior entering through P(D) and term longevity through P(Q|D).

CCF ADL, Jul 31 2011

[Elsas et al., WSDM 2010]

Page 56: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Relevance and Page Change

Page change is related to relevance judgments:
- Human relevance judgments on a 5-point scale (Perfect/Excellent/Good/Fair/Bad)
- Rate of change: 60% for Perfect pages vs. 30% for Bad pages

Use change rate as a document prior, P(D), in P(D|Q) ∝ P(D) · P(Q|D) (vs. priors based on link structure like PageRank); shingle prints are used to measure change.

CCF ADL, Jul 31 2011

Page 57: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Relevance and Term Change

Term patterns vary over time. Represent a document as a mixture of terms with different "staying power": long (L), medium (M), and short (S):

  P(Q|D) = λ_L · P(Q|D_L) + λ_M · P(Q|D_M) + λ_S · P(Q|D_S)

This term-longevity mixture supplies the P(Q|D) component of P(D|Q) ∝ P(D) · P(Q|D). (A small sketch of scoring with such a mixture follows below.)

CCF ADL, Jul 31 2011
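A minimal sketch of the idea, not the paper's estimator: it follows the slide's formula literally by mixing whole-query likelihoods over long/medium/short "views" of a document (the real model may interpolate at the term level). The smoothing, mixture weights, and document views are illustrative assumptions.

```python
import math
from collections import Counter

def lm_logprob(query_terms, text, mu=100.0):
    """Dirichlet-style smoothed unigram log P(Q|text) with a crude self-background."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    logp = 0.0
    for t in query_terms:
        p_bg = (counts[t] + 1) / (total + len(counts))     # crude background estimate
        logp += math.log((counts[t] + mu * p_bg) / (total + mu))
    return logp

def mixture_score(query, doc_views, lambdas=(0.5, 0.3, 0.2)):
    """P(Q|D) = sum_i lambda_i * P(Q|D_i) over long/medium/short views of one document."""
    q = query.lower().split()
    return sum(lam * math.exp(lm_logprob(q, view))
               for lam, view in zip(lambdas, doc_views))

views = ("acm sigir conference information retrieval",   # long-lived terms
         "beijing 2011 registration program",             # medium
         "deadline extended keynote announced today")     # short-lived terms
print(mixture_score("sigir 2011 beijing", views))
```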

Page 58: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Evaluation: Queries & Documents

- 18K queries, 2.5M judged documents; 5-level relevance judgments (Perfect … Bad)
- 2.5M documents crawled weekly for 10 weeks
- Navigational queries: 2K queries identified with a "Perfect" judgment; assume these relevance judgments are consistent over time

CCF ADL, Jul 31 2011

Page 59: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Experimental Results

[Chart comparing retrieval effectiveness of: baseline static model, dynamic model, dynamic model + change prior, and change prior alone]

CCF ADL, Jul 31 2011

Page 60: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Temporal Retrieval Models 2 [Kulkarni et al., WSDM 2011] [Radinsky et al., in prep.]

The initial evaluation used navigational queries and assumed relevance is "static" over time. But relevance often changes over time:
- E.g., Stanley Cup in 2011 vs. 2010
- E.g., US Open 2011 in May (golf) vs. in Sept (tennis)
- E.g., March Madness 2011
  - Before the event: schedule and tickets, e.g., stubhub
  - During the event: real-time scores, e.g., espn, cbssports
  - After the event: general sites, e.g., wikipedia, ncaa

Ongoing evaluation:
- Collecting explicit relevance judgments, query frequency, interaction data, and page content over time
- Developing temporal IR models and temporal snippets

CCF ADL, Jul 31 2011

Page 61: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Relevance over Time

Query: march madness [Mar 15 – Apr 4, 2010]

[Figure: top-ranked results during vs. after the event]

CCF ADL, Jul 31 2011

Page 62: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Relevance over Time

Query: sigir. Why is old content ranked higher? User interaction data is more prevalent for older documents (e.g., query-clicks, anchor text, etc.).

[Figure: results for the 2009, 2010, and 2011 conference sites]

CCF ADL, Jul 31 2011

Page 63: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Relevance over Time

Query: nfl. Again, the time of the query and the time of the event are not well modeled.

[Figure: results during the season (Sept–Jan) vs. the off-season (Feb–Aug)]

CCF ADL, Jul 31 2011

Page 64: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Time Series Modeling of Queries and Clicks

Model queries and URL clicks as time series:
- E.g., ny daily news: the query has a weekly period; URL clicks show a consistent ordering
- E.g., gold miner: the query has more spikes; the most-clicked URL varies over time

CCF ADL, Jul 31 2011

Page 65: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Time Series Modeling

Model search behavior as time series: given a time series of observations, we assume a process that generated it as a sequence of state vectors.

State Space Model (SSM): let x_t be the state vector at time t; a semi-linear state space model is defined by an observation equation and a state transition equation (an illustrative form is sketched below).

Predict:
- Jointly learn the model parameters along with internal states and residuals (parametric/non-parametric)
- Generate "future states" using the state transition equation
- Using the new states, produce a distribution over "future values"

CCF ADL, Jul 31 2011
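The slide's observation and state-transition equations did not survive extraction. As an illustration only, a standard linear-Gaussian state space model (the model here is described as semi-linear, so its exact form may differ) has the shape:

```latex
% Illustrative linear-Gaussian SSM; not necessarily the exact semi-linear form used in this work.
\begin{aligned}
y_t     &= C\,x_t + \varepsilon_t, &\quad \varepsilon_t &\sim \mathcal{N}(0, R) \quad \text{(observation eq.)}\\
x_{t+1} &= A\,x_t + \eta_t,        &\quad \eta_t        &\sim \mathcal{N}(0, Q) \quad \text{(state transition eq.)}
\end{aligned}
```

Here y_t is the observed query or click count at time t, x_t is the hidden state, and A, C, R, Q are parameters learned jointly with the states; forecasts come from rolling the state-transition equation forward.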

Page 66: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Experimental Results

Ground truth: actual user behavior over time.

Two search-related prediction tasks:
- Predict the number of URL clicks (prediction error)
- Rank URLs for search (correlation)

Three types of models: no historical features, historical average, time-series modeling.

Results (preliminary):

  Model                   Prediction error   Ranking URLs (correlation)
  Text only               —                  0.012
  Text + historical avg   17.61              0.150
  Text + time series      16.72              0.300

CCF ADL, Jul 31 2011

Page 67: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Task Evolution over Time

Most retrieval models look at queries in isolation, but people often use multiple queries to accomplish a task.
- Within-session tasks: use previous actions in the current session to improve understanding of the current query
- Cross-session tasks: develop methods to predict and support task resumption over time

CCF ADL, Jul 31 2011

Page 68: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Within-Session Tasks [White et al., CIKM 2010]

Examples:
- Q: [sigir] … given [information retrieval] vs. [iraq reconstruction]
- Q: [acl] … given [computational linguistics] vs. [knee injury] vs. [country music]

40% of sessions contain multiple queries; 60% of queries have at least one preceding query.

Use previous actions in the current session to improve understanding of the current query:
- Model queries and actions using ODP categories (e.g., Health/Medicine, Music, Computer Science, Sports, Business)
- Use the model to predict future actions, rank current results, etc.
(A minimal sketch of this kind of context model follows below.)

CCF ADL, Jul 31 2011
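A minimal sketch, not the paper's model: it interpolates the ODP-category distribution of the current query with the distribution built from earlier queries and clicks in the session. The interpolation weight and the toy category assignments are assumptions.

```python
from collections import Counter

def category_dist(odp_labels):
    """Normalize a multiset of ODP category labels into a distribution."""
    counts = Counter(odp_labels)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def contextual_intent(current_query_cats, session_cats, weight_on_query=0.6):
    """Interpolate the current query's category distribution with the
    session's category distribution; the weight is an assumed parameter."""
    q = category_dist(current_query_cats)
    s = category_dist(session_cats)
    cats = set(q) | set(s)
    mixed = {c: weight_on_query * q.get(c, 0.0) +
                (1 - weight_on_query) * s.get(c, 0.0) for c in cats}
    return max(mixed, key=mixed.get), mixed

# Ambiguous query "acl" after a computing session vs. after a sports session.
print(contextual_intent(["Computer Science", "Health/Medicine", "Sports"],
                        ["Computer Science", "Computer Science", "Science"]))
print(contextual_intent(["Computer Science", "Health/Medicine", "Sports"],
                        ["Sports", "Health/Medicine", "Sports"]))
```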

Page 69: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Within-Session Results

Context helps: using any context source improves accuracy, and using more sources improves it further.

Accuracy by context source (three models compared):
  None (current query only):                                  0.39 / 0.39
  Queries (all previous queries):                             0.39 / 0.42 / 0.43
  Queries + SERP clicks (previous queries / result clicks):   0.39 / 0.46 / 0.46
  Queries + SERP clicks + nav trails (all previous actions):  0.39 / 0.50 / 0.49

Percentage of queries for which each model is best:
  Queries (all previous queries):                             25% / 18% / 22%
  Queries + SERP clicks (previous queries / result clicks):   30% / 16% / 25%
  Queries + SERP clicks + nav trails (all previous actions):  34% / 11% / 30%

Differences across queries:
- Query model wins when the current query has a specific intent ([espn], [webmd]) or is the first action after a shift in interests
- Context model wins when the query is ambiguous ([amazon]) and the session has a consistent intent
- Intent model wins when the session has a consistent intent throughout

CCF ADL, Jul 31 2011

Page 70: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Research Missions [Donato et al., WWW 2010] [Jones & Klinkner, CIKM 2008]

Identify "research missions" (from Jones & Klinkner) on the fly during a search session.

Three general signals:
- Research_mission(q1, q2) classifier
- Same_mission(q1, q2) classifier
- Sim(topics(q1), topics(q2))

Many features:
- Textual: similarity of q1 and q2
- Session-based: queries, clicks, queries since the last click, etc.
- Time-based: time between q1 and q2, total session time, etc.

Trigger Yahoo! Scratch Pad if a research mission is detected.

CCF ADL, Jul 31 2011

Page 71: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Cross-Session Tasks [Kotov et al., SIGIR 2011]

Many tasks extend across sessions, e.g., medical diagnosis and treatment, event planning, how-to advice, shopping research, academic research, etc.
- 10–15% of tasks continue across multiple sessions
- 20–25% of queries are from multi-session tasks

Example: [figure]

Develop methods to support task resumption over time:
- Same task: find (previous) related queries/clicks
- Task resumption: predict whether the user will resume the task

CCF ADL, Jul 31 2011

Page 72: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Cross-Session Tasks [Kotov et al., SIGIR 2011]

Approach:
- Classification (logistic regression, MART)
- Features: query, pair-wise, session-based, history-based

Results: same-task identification and task-continuation prediction [tables on slide]

Develop support for task continuation. (A hedged sketch of the pairwise same-task setup follows below.)

CCF ADL, Jul 31 2011
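A hedged sketch of the pairwise "same task" classification setup; the features, training pairs, and choice of plain logistic regression are illustrative assumptions, not the paper's feature set or data.

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

def pair_features(q1, q2, hours_apart, same_clicked_domain):
    """Pairwise features for 'are these two queries part of the same task?'"""
    w1, w2 = set(q1.lower().split()), set(q2.lower().split())
    word_overlap = len(w1 & w2) / len(w1 | w2)          # textual feature
    return [word_overlap, np.log1p(hours_apart), float(same_clicked_domain)]

# Toy training pairs: 1 = same cross-session task, 0 = unrelated.
X = np.array([
    pair_features("knee injury treatment", "acl surgery recovery time", 30, True),
    pair_features("knee injury treatment", "knee brace reviews",        50, True),
    pair_features("knee injury treatment", "cheap flights to paris",     2, False),
    pair_features("acl surgery recovery",  "country music awards",      70, False),
])
y = np.array([1, 1, 0, 0])

clf = LogisticRegression().fit(X, y)
test = [pair_features("knee injury treatment", "physical therapy for acl tear", 90, False)]
print(clf.predict_proba(test)[0, 1])   # probability the pair belongs to the same task
```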

Page 73: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Other Examples of Dynamics and Information Systems

- Document dynamics, for crawling and indexing: Adar et al. (2009); Cho & Garcia-Molina (2000); Fetterly et al. (2003)
- Query dynamics: Kulkarni et al. (2011); Jones & Diaz (2004); Diaz (2009); Kotov et al. (2010)
- Temporal retrieval models: Elsas & Dumais (2010); Li & Croft (2004); Efron (2010); Aji et al. (2010)
- Extraction of temporal entities within documents
- Protocol extensions for retrieving versions over time, e.g., Memento (Van de Sompel et al., 2010)

CCF ADL, Jul 31 2011

Page 74: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Summary

- Web content changes: page-level and term-level
- User visitation/revisitation: people revisit and re-find Web content
- Relating revisitation and change allows us to identify pages for which change is important and interesting components within a page
- Diff-IE: supports (and influences) interaction and understanding
- Temporal IR: leverages change for improved retrieval
- Time on the desktop, news, and real-time media

[Timeline figure, 1996–2009: Content Changes and User Visitation/Revisitation]

Page 75: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

Challenges and Opportunities

Temporal dynamics are pervasive in information systems. They influence many aspects of these systems:
- Systems: protocols, crawling, indexing, caching
- Document representations: metadata generation, information extraction, sufficient statistics at the page and term level
- Retrieval models: term weights, document priors, etc.
- User experience and evaluation

Better supporting the temporal dynamics of information:
- Requires digital preservation and temporal metadata extraction
- Enables richer understanding of the evolution (and prediction) of key ideas, relations, and trends over time

Time is one important example of context in IR; others include location, individuals, tasks, …

CCF ADL, Jul 31 2011

Page 76: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Think Outside the (Search) Box: Search Research

[Diagram: query words and a ranked list, surrounded by user context, task/use context, and document context]

Page 77: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

CCF ADL, Jul 31 2011

Thank You ! Questions/Comments …

More info, http://research.microsoft.com/~sdumais

Diff-IE … try it! http://research.microsoft.com/en-us/projects/diffie/default.aspx

Page 78: Susan Dumais Microsoft Research http:// research.microsoft.com/~sdumais

References

Desktop and news:
- Dumais et al. (SIGIR 2003). Ringel et al. (Interact 2003). Horvitz & Koch (2010). Gabrilovich et al. (WWW 2004). Diaz (2009).

Document dynamics:
- Adar et al. (WSDM 2009). Cho & Garcia-Molina (VLDB 2000). Fetterly et al. (WWW 2003).

User interaction:
- Adar et al. (CHI 2009). Teevan et al. (SIGIR 2007). Tyler et al. (WSDM 2010). Teevan et al. (WSDM 2011). Adar et al. (CHI 2010). Teevan et al. (UIST 2009). Teevan et al. (CHI 2010).

Query dynamics:
- Kulkarni et al. (WSDM 2011). Jones & Diaz (TOIS 2007). Vlachos et al. (SIGMOD 2004). Chien & Immorlica (WWW 2005).

Temporal retrieval models:
- Elsas & Dumais (WSDM 2010). Radinsky et al. (in prep.). Li & Croft (2004). Efron & Golovchinsky (SIGIR 2011).

Tasks over time:
- White et al. (CIKM 2010). Donato et al. (WWW 2010). Kotov et al. (SIGIR 2011). Jones & Klinkner (CIKM 2008).

General:
- Kleinberg. Van de Sompel et al. (arXiv 2009).

CCF ADL, Jul 31 2011