Temporal Dynamics and Information Retrieval
Susan Dumais, Microsoft Research
http://research.microsoft.com/~sdumais
CCF ADL, Jul 31 2011

In collaboration with: Eric Horvitz, Jaime Teevan, Eytan Adar, Jon Elsas, Ed Cutrell, Dan Liebling, Richard Hughes, Merrie Ringel Morris, Evgeniy Gabrilovich, Krysta Svore, Anagha Kulkarni
Change is Everywhere in IR
Change is everywhere in digital information systems:
- New documents appear all the time
- Document content changes over time
- Queries and query volume change over time
- What's relevant to a query changes over time (e.g., U.S. Open 2011 in May vs. Sept)
- User interaction changes over time (e.g., tags, anchor text, social networks, query-click streams)
- Relations between entities change over time (e.g., President of the US in 2008 vs. 2004 vs. 2000)
Change is pervasive in digital information systems … yet, we're not doing much about it!
Information Dynamics
[Timeline graphic, 1996–2009: content changes and user visitation/revisitation over time]
Today's browse and search experiences ignore these dynamics.
Digital Dynamics Easy to Capture
Easy to capture … but few tools support dynamics.
Overview
- Change on the desktop and news
  - Desktop: Stuff I've Seen; Memory Landmarks; LifeBrowser
  - News: analysis of novelty (e.g., NewsJunkie)
- Change on the Web
  - Content changes over time
  - User interaction varies over time (queries, re-visitation, anchor text, query-click stream, "likes")
  - Tools for understanding Web change (e.g., Diff-IE)
- Improving Web retrieval using dynamics
  - Query trends over time
  - Retrieval models that leverage dynamics
  - Task evolution over time
Stuff I’ve Seen (SIS)
Many silos of information. SIS:
- Unified access to distributed, heterogeneous content (mail, files, web, tablet notes, RSS, etc.)
- Index full content + metadata
- Fast, flexible search
- Information re-use
SIS -> Windows Desktop Search (Windows-DS) [Dumais et al., SIGIR 2003]
Example Desktop Searches
Looking for: recent email from Fedor that contained a link to his new demo
  Initiated from: Start menu. Query: from:Fedor
Looking for: the pdf of a SIGIR paper on context and ranking (not sure it used those words) that someone (don't remember who) sent me about a month ago
  Initiated from: Outlook. Query: SIGIR
Looking for: meeting invite for the last intern handoff
  Initiated from: Start menu. Query: intern handoff kind:appointment
Looking for: C# program I wrote a long time ago
  Initiated from: Explorer pane. Query: QCluster*.*
Lots of metadata… especially time
Stuff I’ve Seen
Stuff I've Seen: Findings
Evaluation:
- Internal to Microsoft, ~3000 users in 2004
- Methods: free-form feedback, questionnaires, usage patterns from log data, in situ experiments, lab studies for richer data
Personal store characteristics: 5k–1500k items
Information needs: Desktop search != Web search
- Short queries (1.6 words)
- Few advanced operators in the initial query (~7%)
- But … many advanced operators and query iteration in the UI (48%): filters (type, date, people); modify query; re-sort results
People know a lot about what they are looking for, and we need to provide a way to express it!

Susan's (Laptop) World
Type    N           Size
Web     3k          0.2 GB
Files   28k         23.0 GB
Mail    60k         2.2 GB
Total   91k items   25.4 GB
Index               190 MB (+1.5 MB/week)
Stuff I've Seen: Findings
Information needs:
- People are important: 29% of queries involve names/aliases
- Date is the most common sort order, even with a "best-match" default
  - Few searches are for the single "best" matching object; many other criteria matter (e.g., time, people, type), depending on task
  - Need to support flexible access
- Abstraction is important: "useful" dates, people, pictures
- Age of items retrieved: today (5%), last week (21%), last month (47%)
  - Need to support episodic access to memory
Memory Landmarks
Importance of episodes in human memory:
- Memory is organized into episodes (Tulving, 1983)
- People-specific events serve as anchors (Smith et al., 1978)
- Time of events is often recalled relative to other events, historical or autobiographical (Huttenlocher & Prohaska, 1997)
Identify and use landmarks to facilitate search and information management:
- Timeline interface, augmented with landmarks
- Learn Bayesian models to identify memorable events
- Extensions beyond search, e.g., LifeBrowser
Memory Landmarks: Search Results
- Memory landmarks: general (world, calendar) and personal (appointments, photos)
- Linked to results by time
- Distribution of results over time
[Ringel et al., Interact 2003]
Memory Landmarks: Findings
[Chart: search time (s), with vs. without landmarks, for "Dates Only" vs. "Landmarks + Dates" interfaces]
Memory Landmarks: Learned models of memorability
[Horvitz et al., 2004]
LifeBrowser [Horvitz & Koch, 2010]
Data sources: images & videos, appointments & events, desktop & search activity, whiteboard capture, locations

LifeBrowser: Learned models of selective memory
NewsJunkie: Personalized news via information novelty [Gabrilovich et al., WWW 2004]
News is a stream of information with evolving events, but it's hard to consume it as such.
Personalized news using information novelty:
- Identify clusters of related articles
- Characterize what a user knows about an event
- Compute the novelty of new articles relative to this background knowledge (relevant & novel): Novelty = KL divergence(article || current_knowledge)
- Use the novelty score and user preferences to guide what, when, and how to show new information
(A sketch of the novelty computation appears after this list.)
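A minimal sketch of the KL-divergence novelty score under simplifying assumptions of mine (add-one-smoothed unigram language models over a shared vocabulary; the real NewsJunkie system uses richer models and named-entity features):

```python
# Sketch of NewsJunkie-style novelty scoring; smoothing and tokenization
# are illustrative assumptions, not the paper's actual implementation.
from collections import Counter
import math

def unigram_lm(words, vocab, alpha=1.0):
    """Smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(words)
    total = len(words) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def novelty(article_words, background_words):
    """KL(article || current_knowledge): how surprising the new article
    is given what the user has already read about the event."""
    vocab = set(article_words) | set(background_words)
    p = unigram_lm(article_words, vocab)      # new article
    q = unigram_lm(background_words, vocab)   # background knowledge
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)

background = "sars outbreak hong kong hospital quarantine".split()
article = "swiss company develops sars vaccine candidate".split()
print(f"novelty = {novelty(article, background):.3f}")
```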
NewsJunkie in Action: Pizza delivery man with bomb incident
[Chart: novelty score over the article sequence by time, with annotated spikes: "Gun disguised as cane", "Friends say Wells is innocent", "Looking for two people", "Copycat case in Missouri"]
NewsJunkie Evaluation
Experiment to evaluate algorithms for detecting novelty:
- Task: given a background article, select the set of articles that you would recommend for a friend who wants to find out what's new about the story
- KL and named-entity algorithms performed better than the temporal baseline
But there are many types of "differences":
- Recap: review of prior information
- Elaboration: new information
- Offshoot: related, but mostly about something else
- Irrelevant: not related to the main story
[Chart: for each algorithm (kl, ne_count, original), the number of times it was rated most, medium, and least novel]
NewsJunkie: Types of novelty, via intra-article novelty dynamics
[Four plots of novelty score vs. word position within an article:
- Offshoot: SARS impact on Asian stock markets
- On-topic, recap
- On-topic, elaboration: SARS patient's wife held under quarantine
- Offshoot: Swiss company develops SARS vaccine]
Overview
- Change on the desktop and news
  - Desktop: Stuff I've Seen; Memory Landmarks; LifeBrowser
  - News: analysis of novelty (e.g., NewsJunkie)
- Change on the Web
  - Content changes over time
  - User interaction varies over time (queries, re-visitation, anchor text, query-click stream, "likes")
  - Tools for understanding Web change (e.g., Diff-IE)
- Improving Web retrieval using dynamics
  - Query trends over time
  - Retrieval models that leverage dynamics
  - Task evolution over time

Questions?
[Timeline graphic, 1996–2009: content changes and user visitation/revisitation]
Characterizing Web Change
Large-scale Web crawls over time [Adar et al., WSDM 2009]:
- Revisited pages: 55,000 pages crawled hourly for 18+ months (unique users, visits/user, time between visits)
- Pages returned by a search engine (for ~100k queries): 6 million pages crawled every two days for 6 months
Measuring Web Page Change
- Summary metrics: number of changes, amount of change, time between changes
- Change curves: fixed starting point; measure similarity over different time intervals
- Within-page changes
Measuring Web Page Change: Findings
- 33% of Web pages change; 66% of visited Web pages change, and 63% of these change every hour
- Average Dice coefficient = 0.80; average time between changes = 123 hrs
- .edu and .gov pages change infrequently, and not by much
- Popular pages change more frequently, but not by much
(A sketch of the Dice similarity computation follows.)
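A sketch of the Dice coefficient between two page snapshots, computed here over word sets; the study's exact tokenization/shingling is an assumption on my part:

```python
# Dice similarity between two versions of a page: 1.0 = unchanged word set,
# 0.0 = completely disjoint. The example texts are invented.
def dice(old_text: str, new_text: str) -> float:
    a, b = set(old_text.split()), set(new_text.split())
    if not a and not b:
        return 1.0  # two empty pages count as identical
    return 2 * len(a & b) / (len(a) + len(b))

v1 = "cheap tickets hotels flights deals"
v2 = "cheap tickets hotels cars deals today"
print(f"Dice similarity = {dice(v1, v2):.2f}")
```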
Measuring Web Page Change: Change Curves
- Fix a starting point and measure Dice similarity over increasing time intervals
[Chart: Dice similarity vs. time from starting point, with an annotated "knot point" where the curve levels off]
Measuring Within-Page Change
- DOM-level changes
- Term-level changes:
  - Divergence from the norm (e.g., cookbooks, salads, cheese, ingredient, bbq, …)
  - "Staying power" of a term in the page over time (e.g., Sep.–Dec.)
Example Term Longevity Graphs
[Graphs of term staying power over time for sample terms; a sketch of one way to compute such profiles follows.]
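A minimal sketch of term staying-power profiles from a sequence of crawl snapshots. My assumption: a term's longevity graph is simply its presence/absence across crawls; the paper's actual measure may be more refined:

```python
# Term "staying power" profiles across page snapshots (ordered by crawl time).
def term_longevity(snapshots):
    """Returns {term: [0/1 per snapshot]} presence profiles."""
    word_sets = [set(text.split()) for text in snapshots]
    vocab = set().union(*word_sets)
    return {term: [1 if term in ws else 0 for ws in word_sets]
            for term in sorted(vocab)}

crawls = ["salads cheese bbq", "salads cheese sale", "salads cookbooks sale"]
for term, profile in term_longevity(crawls).items():
    print(term, profile)  # e.g., salads -> [1, 1, 1]: long staying power
```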
Revisitation on the Web
[Timeline graphic, 1996–2009: content changes and user visitation/revisitation]
What was the last Web page you visited? Why did you visit (re-visit) the page?
Revisitation patterns [Adar et al., CHI 2009]:
- Log analyses: toolbar logs for revisitation; query logs for re-finding
- User survey to understand intent in revisitations
60–80% of the Web pages you visit, you've visited before. There are many motivations for revisits.
Measuring Revisitation
- Summary metrics: unique visitors, visits/user, time between visits
- Revisitation curves: histogram of revisit intervals, normalized
[Chart: normalized count vs. time interval between revisits]
(A sketch of the revisitation-curve computation follows.)
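A sketch of a normalized revisitation curve for one URL: a histogram of inter-visit intervals, scaled so the peak bin is 1.0. The bin edges below are illustrative; the study's actual binning may differ:

```python
# Revisitation curve: histogram of inter-visit intervals, peak-normalized.
from collections import Counter

def revisit_curve(visit_times_hrs, bin_edges=(1, 24, 168, 720)):
    """visit_times_hrs: sorted visit timestamps in hours.
    Bins: <1h, 1h-1d, 1d-1wk, 1wk-1mo, >1mo."""
    intervals = [b - a for a, b in zip(visit_times_hrs, visit_times_hrs[1:])]
    counts = Counter()
    for dt in intervals:
        counts[sum(dt >= edge for edge in bin_edges)] += 1
    peak = max(counts.values(), default=1)
    return [counts.get(i, 0) / peak for i in range(len(bin_edges) + 1)]

visits = [0, 0.5, 1.2, 26, 30, 200]  # hours
print(revisit_curve(visits))  # fast revisits dominate this example
```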
Four Revisitation Patterns
- Fast: hub-and-spoke; navigation within a site
- Hybrid: high-quality fast pages
- Medium: popular homepages; mail and Web applications
- Slow: entry pages, bank pages; accessed via search engine
Relationships Between Change and Revisitation
- Interested in change: monitor the page
- Effect change: transact
- Change unimportant: re-find old content; change can interfere with re-finding
Revisitation and Search (ReFinding)
[Teevan et al., SIGIR 2007] [Tyler et al., WSDM 2010] [Teevan et al., WSDM 2011]

                     Repeat Click   New Click
Repeat Query (33%)       29%            4%
New Query (67%)          10%           57%
Total                    39%           61%

- Repeat query (33%): e.g., Q: microsoft research, clicking same and different URLs
- Repeat click (39%): e.g., http://research.microsoft.com/ reached via Q: microsoft research or Q: msr
- Big opportunity (43%): 24% are "navigational revisits"
Building Support for Web Dynamics
[Timeline graphic, 1996–2009: content changes and user visitation/revisitation, annotated with Diff-IE and Temporal IR]
Diff-IE
Changes to a page since your last visit: the Diff-IE toolbar
[Teevan et al., UIST 2009] [Teevan et al., CHI 2010]
Interesting Features of Diff-IE
- Always on
- In situ
- New to you
- Non-intrusive
Try it: http://research.microsoft.com/en-us/projects/diffie/default.aspx
Examples of Diff-IE in Action
[Screenshots: expected new content; monitoring; serendipitous encounters; unexpected important content; understanding page dynamics; unexpected unimportant content]
[Summary diagram arranging the examples along an expected–unexpected spectrum: edit, attend to activity, monitor, expected new content, serendipitous encounter, unexpected important content, understand page dynamics]
Studying Diff-IE
- Feedback buttons (in situ)
- Survey, prior to installation and after a month of use (representative)
- Logging: URLs visited, amount of change when revisited (longitudinal)
- Experience interview (experience)
People Revisit More
- Perception of revisitation remains constant: How often do you revisit? How often are revisits to view new content?
- Actual revisitation increases: first week, 39.4% of visits are revisits; last week, 45.0% (a 14% relative increase)
Why are people revisiting more with Diff-IE?
Revisited Pages Change More
- Perception of change increases: What proportion of pages change regularly? How often do you notice unexpected change?
- Amount of change seen increases: first week, 21.5% of revisits had changed, by 6.2%; last week, 32.4%, by 9.5%
[Survey callouts: 51+%, 17%, 8%]
Diff-IE is driving visits to changed pages and supports people in understanding change.
Other Examples of Dynamics and User Experience
- Content changes: Diff-IE (Teevan et al., 2008); Zoetrope (Adar et al., 2008); Diffamation (Chevalier et al., 2010); temporal summaries and snippets; …
- Interaction changes: explicit annotations, ratings, wikis, etc.; implicit interest via interaction patterns; edit wear and read wear (Hill et al., 1992)
Overview
- Change on the desktop and news
  - Desktop: Stuff I've Seen; Memory Landmarks; LifeBrowser
  - News: analysis of novelty (e.g., NewsJunkie)
- Change on the Web
  - Content changes over time
  - User interaction varies over time (queries, re-visitation, anchor text, query-click stream, "likes")
  - Tools for understanding Web change (e.g., Diff-IE)
- Improving Web retrieval using dynamics
  - Query trends over time
  - Retrieval models that leverage dynamics
  - Task evolution over time

Questions?
Leveraging Dynamics for Retrieval
[Timeline graphic, 1996–2009: content changes and user visitation/revisitation, annotated with Temporal IR]
Temporal IR
- Query frequency over time
- Retrieval models that incorporate time: ranking algorithms typically look at only a single snapshot, but both content and user interaction with the content change over time; model content change on a page and model user interactions
- Tasks evolve over time
Query Dynamics
Queries sometimes mention time, but often don't:
- Explicit time (e.g., World Cup Soccer 2011)
- Explicit news (e.g., earthquake news)
- Implicit time (e.g., Harry Potter reviews; implicit "now")
Queries are not uniformly distributed over time; they are often triggered by events in the world.
Using temporal query patterns to:
- Cluster similar queries
- Identify events and find related news
Query Dynamics: Modeling query frequency over time [Vlachos et al., SIGMOD 2004]
(Example: Q: cinema)
- Discrete Fourier Transform: the best k components (vs. the first k) significantly reduce reconstruction error
- Burst detection: bursts as deviations from a moving average
(A sketch of moving-average burst detection follows.)
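A sketch of burst detection as deviations from a trailing moving average, in the spirit of Vlachos et al.; the window size and threshold below are illustrative assumptions:

```python
# Flag days whose query volume exceeds the trailing moving average by more
# than num_sigmas trailing standard deviations.
import statistics

def detect_bursts(series, window=7, num_sigmas=2.0):
    bursts = []
    for t in range(window, len(series)):
        past = series[t - window:t]
        mean = statistics.mean(past)
        sd = statistics.stdev(past) or 1e-9  # guard against flat windows
        if series[t] > mean + num_sigmas * sd:
            bursts.append(t)
    return bursts

daily_queries = [10, 12, 11, 9, 10, 11, 10, 55, 60, 12, 11, 10, 10, 9]
print(detect_bursts(daily_queries))  # spike days, e.g., [7, 8]
```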
Query Dynamics: Types of query popularity patterns [Kulkarni et al., WSDM 2011]
- Number of spikes (0, 1, multiple)
- Periodic (yes, no)
- Shape of rise and fall (wedge, sail, castle)
- Trend (flat, up, down)
Relate changes in query popularity and content to changes in user intent (i.e., what is relevant to the query) … more on this later.
Using Query Dynamics to Find Similar Queries [Chien & Immorlica, WWW 2005]
- Model query patterns using empirical query frequency (normalized by total queries per day)
- Identify "similar" queries using the correlation coefficient between the normalized time series (examples: Q: movies, Q: scott peterson, Q: weather report)
- A nice use of time to identify semantic similarity between queries/entities, but not predictive
(A sketch of the time-series similarity follows.)
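A sketch of temporal query similarity as the Pearson correlation between two queries' normalized daily frequency series. Chien & Immorlica use a correlation-based measure; the normalization details and example data here are simplified assumptions:

```python
import statistics  # statistics.correlation requires Python 3.10+

def query_similarity(freq_a, freq_b, totals):
    """freq_a, freq_b: daily counts per query; totals: total queries/day."""
    a = [f / n for f, n in zip(freq_a, totals)]  # normalize by daily volume
    b = [f / n for f, n in zip(freq_b, totals)]
    return statistics.correlation(a, b)

totals     = [1000, 1100, 1050, 1200, 1000, 900, 950]
oscars     = [5, 6, 40, 80, 10, 6, 5]       # spikes around an event
red_carpet = [2, 3, 25, 50, 6, 3, 2]        # spikes on the same days
weather    = [20, 21, 20, 22, 21, 20, 21]   # flat
print(f"oscars vs red carpet: {query_similarity(oscars, red_carpet, totals):.2f}")
print(f"oscars vs weather:    {query_similarity(oscars, weather, totals):.2f}")
```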
Using Query Dynamics to Identify "News" Queries [Diaz, 2010]
Many queries to Web search engines are motivated by events in the world. Should you show just Web results, or provide an integrated view of news and Web results?
Learn a model to predict the "newsworthiness" of a query (i.e., will a user click on news results?):
- Is the query part of a burst? [content consumption]
- Are the top-ranked news results very recent? [content production]
- Improve prediction using ongoing click data for this and related queries
Temporal Retrieval Models 1 [Elsas et al., WSDM 2010]
Current retrieval algorithms look at only a single snapshot of a page, but Web page content changes over time. Can we leverage this to improve retrieval?
- Pages have different rates of change: use different document priors (based on change vs. link structure)
- Terms have different longevity ("staying power"): some are always on the page, some are transient
- Language modeling approach to ranking:

  P(D|Q) ∝ P(D) · P(Q|D)

  where the document prior P(D) captures page change and the query likelihood P(Q|D) captures term longevity.
Relevance and Page Change
- Page change is related to human relevance judgments (5-point scale: Perfect / Excellent / Good / Fair / Bad); rate of change: ~60% for Perfect pages vs. ~30% for Bad pages
- Use change rate as the document prior P(D) in P(D|Q) ∝ P(D) · P(Q|D), instead of priors based on link structure like PageRank
- Shingle prints are used to measure change (a sketch of shingle-based change measurement follows)
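A sketch of shingle-print change measurement, one way to estimate the change rate used as a document prior. The shingle width and the example text are my assumptions, not the paper's parameters:

```python
# Change between two page versions as 1 - Jaccard overlap of word shingles.
def shingles(text, w=4):
    words = text.split()
    if len(words) <= w:
        return {tuple(words)}
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def change_score(old_text, new_text, w=4):
    """1.0 = completely changed, 0.0 = identical."""
    a, b = shingles(old_text, w), shingles(new_text, w)
    return 1.0 - len(a & b) / len(a | b)

v1 = "the quick brown fox jumps over the lazy dog"
v2 = "the quick brown fox leaps over the lazy dog"
print(f"change = {change_score(v1, v2):.2f}")
```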
Relevance and Term Change
- Term patterns vary over time
- Represent a document as a mixture of terms with different "staying power": long (L), medium (M), short (S):

  P(Q|D) = λ_L P(Q|D_L) + λ_M P(Q|D_M) + λ_S P(Q|D_S)

  This defines the term-longevity component P(Q|D) of P(D|Q) ∝ P(D) · P(Q|D).
(A sketch of this mixture scoring follows.)
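A simplified sketch of the term-longevity mixture: each document is treated as three bags of words (long-, medium-, and short-staying terms), and query likelihood mixes the three. The lambda weights, smoothing, and example document are illustrative, not the learned values from Elsas & Dumais:

```python
import math
from collections import Counter

LAMBDAS = {"long": 0.6, "medium": 0.3, "short": 0.1}  # illustrative weights
MU = 100  # Dirichlet-style smoothing constant (assumption)

def bucket_lm_prob(term, bag, collection_prob):
    counts = Counter(bag)
    return (counts[term] + MU * collection_prob) / (len(bag) + MU)

def score(query, doc_buckets, collection_prob=1e-4):
    """doc_buckets: {'long': [...], 'medium': [...], 'short': [...]}"""
    log_p = 0.0
    for term in query.split():
        p = sum(LAMBDAS[b] * bucket_lm_prob(term, bag, collection_prob)
                for b, bag in doc_buckets.items())
        log_p += math.log(p)
    return log_p

doc = {"long": "weather forecast city".split(),
       "medium": "storm warning".split(),
       "short": "tuesday rain".split()}
print(score("weather forecast", doc))
```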
Evaluation: Queries & Documents
- 18K queries, 2.5M judged documents; 5-level relevance judgments (Perfect … Bad)
- 2.5M documents crawled weekly for 10 weeks
- Navigational queries: 2K queries identified with a "Perfect" judgment; assume these relevance judgments are consistent over time
Experimental Results
[Chart: retrieval effectiveness of the baseline static model vs. the dynamic model, the change prior alone, and the dynamic model + change prior]
Temporal Retrieval Models 2 [Kulkarni et al., WSDM 2011] [Radinsky et al., in prep]
- Initial evaluation used navigational queries and assumed relevance is "static" over time
- But relevance often changes over time:
  - E.g., Stanley Cup in 2011 vs. in 2010
  - E.g., US Open 2011 in May (golf) vs. in Sept (tennis)
  - E.g., March Madness 2011: before the event, schedules and tickets (e.g., stubhub); during the event, real-time scores (e.g., espn, cbssports); after the event, general sites (e.g., wikipedia, ncaa)
- Ongoing evaluation: collecting explicit relevance judgments, query frequency, interaction data, and page content over time; developing temporal IR models and temporal snippets
Relevance over Time. Query: march madness [Mar 15 – Apr 4, 2010]
[Charts: top results during vs. after the event]
Relevance over Time. Query: sigir
Why is old content ranked higher? User interaction data (e.g., query-clicks, anchor text) is more prevalent for older documents.
[Chart: ranks of the 2009, 2010, and 2011 SIGIR sites over time]
Relevance over Time. Query: nfl
Again, the time of the query and the time of the event are not well modeled.
[Chart: top results in-season (Sept–Jan) vs. off-season (Feb–Aug)]
Time Series Modeling of Queries and Clicks
Model queries and URL clicks as time series. Examples:
- ny daily news: query has a weekly period; URL clicks show a consistent ordering
- gold miner: query has more spikes; the most-clicked URL varies over time
Time Series Modeling
Model search behavior as a time series: given a sequence of observations, assume a process that generated it as a sequence of state vectors.
State Space Model (SSM): let x_t be the state vector at time t; a semi-linear state space model is defined by an observation equation (e.g., y_t = C x_t + v_t) and a state transition equation (e.g., x_{t+1} = A x_t + w_t).
To predict: jointly learn the model parameters along with internal states and residuals (parametric/non-parametric); generate "future states" using the state transition equation; use the new states to produce a distribution over "future values".
(A sketch of this prediction loop follows.)
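A minimal sketch of forecasting a query/click series. A full semi-linear SSM is beyond a slide, so this fits a simple AR(1)-style transition x_{t+1} = a·x_t + b by least squares and rolls it forward; illustrative only, not the model in Radinsky et al.:

```python
def fit_ar1(series):
    """Least-squares fit of x_{t+1} = a * x_t + b."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs) or 1e-9
    a = cov / var
    return a, mean_y - a * mean_x

def forecast(series, steps, a, b):
    """Roll the state transition equation forward for future values."""
    state, out = series[-1], []
    for _ in range(steps):
        state = a * state + b
        out.append(state)
    return out

clicks = [120, 130, 125, 140, 138, 150, 149]  # invented daily click counts
a, b = fit_ar1(clicks)
print(forecast(clicks, 3, a, b))
```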
Experimental Results
- Ground truth: actual user behavior over time
- Two search-related prediction tasks: predict the number of URL clicks (prediction error); rank URLs for search (correlation)
- Three types of models: no historical features; historical average; time-series modeling

Results (preliminary):
Model                   Prediction Error   Ranking URLs
Text Only                     –                0.012
Text + Historical Avg       17.61              0.150
Text + Time Series          16.72              0.300
Task Evolution over Time
- Most retrieval models look at queries in isolation, but people often use multiple queries to accomplish a task
- Within-session tasks: use previous actions in the current session to improve understanding of the current query
- Cross-session tasks: develop methods to predict and support task resumption over time
Within-Session Tasks [White et al., CIKM 2010]
Examples:
- Q: [sigir] … given [information retrieval] vs. [iraq reconstruction]
- Q: [acl] … given [computational linguistics] vs. [knee injury] vs. [country music]
40% of sessions contain multiple queries; 60% of queries have at least one preceding query.
Use previous actions in the current session to improve understanding of the current query:
- Model queries and sessions using ODP categories (e.g., Health/Medicine, Music, Computer Science, Sports, Business)
- Use to predict future actions, rank current results, etc.
(A sketch of this context blending follows.)
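A sketch of session-context blending in the spirit of White et al.: represent the current query and prior session actions as distributions over ODP-style categories, then interpolate. The interpolation weight, category names, and example values are my assumptions:

```python
def blend(query_dist, session_dists, w_query=0.5):
    """Interpolate the current query's category distribution with the
    average category distribution of earlier actions in the session."""
    cats = set(query_dist)
    for d in session_dists:
        cats |= set(d)
    n = len(session_dists) or 1
    session_avg = {c: sum(d.get(c, 0.0) for d in session_dists) / n
                   for c in cats}
    return {c: w_query * query_dist.get(c, 0.0)
               + (1 - w_query) * session_avg.get(c, 0.0) for c in cats}

# Ambiguous query "acl" after a computational-linguistics session:
query = {"Computer Science": 0.4, "Sports": 0.4, "Music": 0.2}
session = [{"Computer Science": 0.9, "Science": 0.1}]
print(blend(query, session))  # mass shifts toward Computer Science
```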
Within-Session Results
Context helps: using any context source improves accuracy, and using more sources improves it further.

Accuracy by context source (models: Query / Context / Intent):
- None (i.e., current query only): 0.39 / 0.39 / –
- Queries (i.e., all previous queries): 0.39 / 0.42 / 0.43
- Queries + SERP clicks (i.e., all previous queries and result clicks): 0.39 / 0.46 / 0.46
- Queries + SERP clicks + nav trails (i.e., all previous actions): 0.39 / 0.50 / 0.49

Percentage of queries for which each model is best, by context source:
- Queries: 25% / 18% / 22%
- Queries + SERP clicks: 30% / 16% / 25%
- Queries + SERP clicks + nav trails: 34% / 11% / 30%

Differences across queries:
- The query model wins when the current query has specific intent (e.g., [espn], [webmd]) or on the first actions after a shift in interests
- The context model wins when the query is ambiguous (e.g., [amazon]) and the session has a consistent intent
- The intent model wins when the session has consistent intent throughout
Research Missions [Jones & Klinkner, CIKM 2008] [Donato et al., WWW 2010]
Identify "research missions" on-the-fly during a search session.
Three general signals:
- Research_mission(q1, q2) classifier
- Same_mission(q1, q2) classifier
- Sim(topics(q1), topics(q2))
Many features:
- Textual: similarity of q1 and q2
- Session-based: queries, clicks, queries since last click, etc.
- Time-based: time between q1 and q2, total session time, etc.
Trigger Yahoo! Scratch Pad if a research mission is detected.
Cross-Session Tasks [Kotov et al., SIGIR 2011]
Many tasks extend across sessions, e.g., medical diagnosis and treatment, event planning, how-to advice, shopping research, academic research, etc.
- 10–15% of tasks continue across multiple sessions
- 20–25% of queries are from multi-session tasks
Develop methods to support task resumption over time:
- Same task: find (previous) related queries/clicks
- Task resumption: predict whether the user will resume the task
Cross-Session Tasks: Approach and Results [Kotov et al., SIGIR 2011]
- Approach: classification (logistic regression, MART)
- Features: query, pair-wise, session-based, history-based
[Results tables: same-task identification and task continuation]
Develop support for task continuation.
Other Examples of Dynamics and Information Systems
- Document dynamics, for crawling and indexing: Adar et al. (2009); Cho & Garcia-Molina (2000); Fetterly et al. (2003)
- Query dynamics: Kulkarni et al. (2011); Jones & Diaz (2004); Diaz (2009); Kotov et al. (2010)
- Temporal retrieval models: Elsas & Dumais (2010); Li & Croft (2003); Efron (2010); Aji et al. (2010)
- Extraction of temporal entities within documents
- Protocol extensions for retrieving versions over time, e.g., Memento (Van de Sompel et al., 2010)
Summary
[Timeline graphic, 1996–2009: content changes and user visitation/revisitation]
- Web content changes: page-level, term-level
- People revisit and re-find Web content
- Relating revisitation and change allows us to identify pages for which change is important, and interesting components within a page
- Diff-IE supports (and influences) interaction and understanding
- Temporal IR leverages change for improved retrieval
- Time also matters on the desktop, in news, and in real-time media
Challenges and Opportunities
Temporal dynamics are pervasive in information systems and influence many aspects of them:
- Systems: protocols, crawling, indexing, caching
- Document representations: metadata generation, information extraction, sufficient statistics at page and term level
- Retrieval models: term weights, document priors, etc.
- User experience and evaluation
Better supporting the temporal dynamics of information:
- Requires digital preservation and temporal metadata extraction
- Enables richer understanding of the evolution (and prediction) of key ideas, relations, and trends over time
Time is one important example of context in IR; others include location, individuals, tasks, …
Think Outside the (Search) Boxes: Search Research
[Diagram: query words and a ranked list situated within user context, task/use context, and document context]
Thank You ! Questions/Comments …
More info: http://research.microsoft.com/~sdumais
Diff-IE … try it! http://research.microsoft.com/en-us/projects/diffie/default.aspx
References
Desktop and news: Dumais et al. (SIGIR 2003); Ringel et al. (Interact 2003); Horvitz & Koch (2010); Gabrilovich et al. (WWW 2004); Diaz (2009).
Document dynamics: Adar et al. (WSDM 2009); Cho & Garcia-Molina (VLDB 2000); Fetterly et al. (WWW 2003).
User interaction: Adar et al. (CHI 2009); Teevan et al. (SIGIR 2007); Tyler et al. (WSDM 2010); Teevan et al. (WSDM 2011); Adar et al. (CHI 2010); Teevan et al. (UIST 2009); Teevan et al. (CHI 2010).
Query dynamics: Kulkarni et al. (WSDM 2011); Jones & Diaz (TOIS 2007); Vlachos et al. (SIGMOD 2004); Chien & Immorlica (WWW 2005).
Temporal retrieval models: Elsas & Dumais (WSDM 2010); Radinsky et al. (in prep); Li & Croft (CIKM 2003); Efron & Golovchinsky (SIGIR 2011).
Tasks over time: White et al. (CIKM 2010); Donato et al. (WWW 2010); Kotov et al. (SIGIR 2011); Jones & Klinkner (CIKM 2008).
General: Kleinberg; Van de Sompel et al. (arXiv 2009).