The Use of Provenance in Information Retrieval

Simone Stumpf, Erin Fitzhenry, Tom Dietterich


Defining Provenance

To us, provenance concerns:

The origin of content within documents

The relationships between documents

(Diagram: documents linked by provenance relationships such as AttachmentSave and SaveAs)


Why focus on Provenance for Information Retrieval?

People remember the relationships between documents!

Episodic vs. Semantic Memory Studies:

Blanc-Brude & Scapin (2007); Gonçalves & Jorge (2004)

No need to formulate keyword queries

Other common document attributes are often inaccurately remembered (Blanc-Brude & Scapin 2007):

Title (20% false recall)

Size (53.8% false recall)

Time (47.6% false recall)


Example Use Case: “Where did I save that again?”

I got an email from Tom…

I saved the attachment…

And I pasted some information from the attachment into a PowerPoint document…

Where did that presentation go??
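The use case can be read as a path through a graph of provenance relationships. A minimal Python sketch, assuming each relationship is stored as a hypothetical (from, to, event_type) triple and using made-up resource names, finds every document derived from Tom's email by following "From" → "To" links breadth-first:

```python
from collections import deque

# Hypothetical provenance edges: (from_resource, to_resource, event_type).
EDGES = [
    ("email: message from Tom", "report.doc", "AttachmentSave"),
    ("report.doc", "slides.ppt", "CopyPaste"),
]


def derived_documents(start, edges):
    """Return every resource reachable from `start` by following
    provenance edges in the From -> To direction, in BFS order."""
    outgoing = {}
    for src, dst, _event_type in edges:
        outgoing.setdefault(src, []).append(dst)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in outgoing.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(derived_documents("email: message from Tom", EDGES))
# ['report.doc', 'slides.ppt']
```

Starting from the email the user remembers, the presentation is found without any keyword query.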


Requirements for Tracking and Visualizing Provenance

Instrument all important document provenance events

Provenance events are NOT automatically captured by Windows

Develop a UI enabling users to locate documents via the provenance relationships they remember

Integrate the UI into the Windows Desktop


Capturing Provenance Events with TaskTracer

TaskTracer is a Personal Information Management system

User defines a hierarchy of Projects or Activities

As the user works, TaskTracer automatically tags (according to task/project):

Files, Folders, Email Messages, Email Contacts, Web pages
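The slide does not say exactly how tags are assigned. A minimal sketch, assuming (hypothetically) that each resource is tagged with the project the user has marked as active when the resource is used, and that querying a project also returns resources tagged with its sub-projects; `ProjectTagger` and its methods are illustrative names, not TaskTracer's API:

```python
from collections import defaultdict


class ProjectTagger:
    """Hypothetical project hierarchy that tags resources as they are used."""

    def __init__(self):
        self.parent = {}               # project -> parent project (None for a root)
        self.tags = defaultdict(set)   # project -> resources tagged with it
        self.active = None             # project the user is currently working on

    def add_project(self, name, parent=None):
        self.parent[name] = parent

    def set_active(self, name):
        self.active = name

    def on_resource_accessed(self, resource):
        # Assumption: tag with the currently active project; untagged if none.
        if self.active is not None:
            self.tags[self.active].add(resource)

    def resources_under(self, project):
        """Resources tagged with `project` or with any of its sub-projects."""
        found = set()
        for name in self.parent:
            node = name
            # Walk up from `name` towards the root looking for `project`.
            while node is not None:
                if node == project:
                    found |= self.tags[name]
                    break
                node = self.parent[node]
        return found


tagger = ProjectTagger()
tagger.add_project("Grant proposal")
tagger.add_project("Budget", parent="Grant proposal")
tagger.set_active("Budget")
tagger.on_resource_accessed("budget.xls")
tagger.set_active("Grant proposal")
tagger.on_resource_accessed("call-for-proposals.html")
print(sorted(tagger.resources_under("Grant proposal")))
# ['budget.xls', 'call-for-proposals.html']
```

Walking up the hierarchy at query time keeps tagging cheap: each access records only one tag.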


Instrumenting TaskTracer to Capture Provenance Events

TaskTracer already instruments many desktop events:

Open, Save, SaveAs, Close

EmailArrived, Email Open, Email Close

Open URL, Close URL, Follow Hyperlink

Idea: Extend existing instrumentation to cover key provenance events

CopyPaste, SaveAs, FileCopy/Rename

AttachmentAdd, AttachmentOpen, AttachmentSave

EmailForward*, EmailReply*

FileDownload, FileUpload*

*Coming soon
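One way to encode this event vocabulary is sketched below. The enum member names, their string values, and the split into "provenance" and "planned" sets are assumptions made for illustration; only the event names themselves come from the slide. The helper keeps the provenance events that are already instrumented from a raw event log:

```python
from enum import Enum


class EventType(Enum):
    # Desktop events TaskTracer already instruments (names from the slide)
    OPEN = "Open"
    SAVE = "Save"
    SAVE_AS = "SaveAs"
    CLOSE = "Close"
    EMAIL_ARRIVED = "EmailArrived"
    EMAIL_OPEN = "EmailOpen"
    EMAIL_CLOSE = "EmailClose"
    OPEN_URL = "OpenURL"
    CLOSE_URL = "CloseURL"
    FOLLOW_HYPERLINK = "FollowHyperlink"
    # Additional provenance events from the extension
    COPY_PASTE = "CopyPaste"
    FILE_COPY_RENAME = "FileCopy/Rename"
    ATTACHMENT_ADD = "AttachmentAdd"
    ATTACHMENT_OPEN = "AttachmentOpen"
    ATTACHMENT_SAVE = "AttachmentSave"
    EMAIL_FORWARD = "EmailForward"
    EMAIL_REPLY = "EmailReply"
    FILE_DOWNLOAD = "FileDownload"
    FILE_UPLOAD = "FileUpload"


# Events that relate one resource to another; SaveAs appears in both lists.
PROVENANCE_EVENTS = frozenset({
    EventType.COPY_PASTE, EventType.SAVE_AS, EventType.FILE_COPY_RENAME,
    EventType.ATTACHMENT_ADD, EventType.ATTACHMENT_OPEN, EventType.ATTACHMENT_SAVE,
    EventType.EMAIL_FORWARD, EventType.EMAIL_REPLY,
    EventType.FILE_DOWNLOAD, EventType.FILE_UPLOAD,
})

# Marked "coming soon" on the slide, so not yet captured.
PLANNED_EVENTS = frozenset({
    EventType.EMAIL_FORWARD, EventType.EMAIL_REPLY, EventType.FILE_UPLOAD,
})


def captured_provenance(log):
    """Keep the provenance events that are already instrumented, in log order."""
    return [e for e in log if e in PROVENANCE_EVENTS and e not in PLANNED_EVENTS]


log = [EventType.OPEN, EventType.SAVE_AS, EventType.EMAIL_REPLY, EventType.COPY_PASTE]
print([e.value for e in captured_provenance(log)])  # ['SaveAs', 'CopyPaste']
```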


Instrumenting TaskTracer to capture Provenance Events (cont.)

A Provenance Event:

Event_id: 10233; Event_type: SaveAs; Event_time: Jan 12

“From” Resource: oldFile.doc (Id: 1768, etc.)

“To” Resource: newFile.doc (Id: 1923, etc.)

Database of document-to-document provenance relationships
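An event record of this kind maps naturally onto one table row per event. A minimal sketch using Python's standard-library `sqlite3` module stores the slide's example event, reading 10233 as the event ID and 1768/1923 as the "From" and "To" document IDs. The table layout, column names, and the `related_docs` helper are assumptions, not TaskTracer's actual schema:

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    event_id: int
    event_type: str   # e.g. "SaveAs"
    event_time: str   # kept as text in this sketch; a real store would use a timestamp
    from_doc_id: int  # "From" resource
    to_doc_id: int    # "To" resource


def open_store():
    # In-memory database for the sketch; the schema is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE provenance (
               event_id    INTEGER PRIMARY KEY,
               event_type  TEXT NOT NULL,
               event_time  TEXT NOT NULL,
               from_doc_id INTEGER NOT NULL,
               to_doc_id   INTEGER NOT NULL)"""
    )
    return conn


def record(conn, ev):
    conn.execute(
        "INSERT INTO provenance VALUES (?, ?, ?, ?, ?)",
        (ev.event_id, ev.event_type, ev.event_time, ev.from_doc_id, ev.to_doc_id),
    )


def related_docs(conn, doc_id):
    """Documents directly linked to `doc_id` in either direction, with the event type."""
    rows = conn.execute(
        """SELECT to_doc_id, event_type FROM provenance WHERE from_doc_id = ?
           UNION
           SELECT from_doc_id, event_type FROM provenance WHERE to_doc_id = ?""",
        (doc_id, doc_id),
    ).fetchall()
    return sorted(rows)


conn = open_store()
record(conn, ProvenanceEvent(10233, "SaveAs", "Jan 12", 1768, 1923))
print(related_docs(conn, 1768))  # [(1923, 'SaveAs')]
print(related_docs(conn, 1923))  # [(1768, 'SaveAs')]
```

Querying both columns lets a user start from either end of a relationship, which matters when they remember the derived document but not its source.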


A tool for visualizing provenance

Developing a User Interface: TaskTrail

(Screenshot of TaskTrail with callouts: the user’s query; click to expand; mouse over for details; double-click to open)
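The click-to-expand interaction can be modelled as revealing a node's immediate neighbours in the provenance graph. A minimal sketch, assuming edges are hypothetical (from, to, event_type) triples and that expansion follows links in both directions; `ProvenanceView` and the "source"/"derived" labels are illustrative, not TaskTrail's implementation:

```python
from collections import defaultdict


class ProvenanceView:
    """Hypothetical model of an expandable provenance view."""

    def __init__(self, edges, query_doc):
        # Index every edge from both ends so expansion can go either way.
        self.neighbours = defaultdict(set)
        for src, dst, event_type in edges:
            self.neighbours[src].add((dst, event_type, "derived"))  # dst came from src
            self.neighbours[dst].add((src, event_type, "source"))   # src produced dst
        # Only the queried document is visible at first.
        self.visible = {query_doc}

    def expand(self, doc):
        """Simulate clicking `doc`: reveal direct neighbours not yet shown."""
        revealed = []
        for other, event_type, relation in sorted(self.neighbours[doc]):
            if other not in self.visible:
                self.visible.add(other)
                revealed.append((other, event_type, relation))
        return revealed


edges = [
    ("email from Tom", "report.doc", "AttachmentSave"),
    ("report.doc", "slides.ppt", "CopyPaste"),
    ("report.doc", "report-v2.doc", "SaveAs"),
]
view = ProvenanceView(edges, "report.doc")
for item in view.expand("report.doc"):
    print(item)
```

Expanding one hop at a time keeps the display small even when the underlying graph is large.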


Integrating TaskTrail into the Windows UI

Launch a query by right-clicking on an item within Windows Explorer, Outlook, or TaskExplorer


Research Questions

• Does TaskTrail help users find documents more quickly than other methods?

• How should the provenance graph be laid out?

• What kind of provenance events do users accurately recall?

• How large are the provenance graphs?

• What patterns exist (if any) in terms of the succession of provenance events?
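Two of these questions (graph size and succession patterns) become measurable once provenance events are logged. A minimal sketch of one possible operationalization, not the authors' analysis: graph size as the sizes of connected components when edge direction is ignored, and succession patterns as counts of consecutive event-type pairs. The function names and sample data are hypothetical:

```python
from collections import Counter


def component_sizes(edges):
    """Connected-component sizes, largest first, ignoring edge direction.

    Uses union-find over the document IDs that appear in `edges`."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst, _event_type in edges:
        parent[find(src)] = find(dst)  # merge the two components

    sizes = Counter(find(node) for node in list(parent))
    return sorted(sizes.values(), reverse=True)


def event_successions(event_types):
    """Count consecutive (earlier, later) pairs in a time-ordered event list."""
    return Counter(zip(event_types, event_types[1:]))


edges = [(1768, 1923, "SaveAs"), (1923, 2001, "CopyPaste"), (3005, 3006, "AttachmentSave")]
print(component_sizes(edges))  # [3, 2]
pairs = event_successions(["AttachmentSave", "CopyPaste", "SaveAs", "CopyPaste", "SaveAs"])
print(pairs[("CopyPaste", "SaveAs")])  # 2
```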


User Studies: Formative

• Observational Study (planned)
  • What provenance-related actions do users perform? Which of those do they remember?
  • Observe 12 participants in their workplaces
  • Record provenance-related actions performed
  • Interview participants after 1 week to see what they remember
    • Free recall
    • Cued recall

• How do users lay out their documents according to what they remember?


User Studies: Summative

• TaskTrail Study at Intel (in progress)

• 4 participants (so far) are using TaskTracer for at least 1 month each

• Then they will use TaskTrail to locate their own documents

• Measures of success:
  • Do users locate more documents using TaskTrail?
  • Do users locate documents more quickly using TaskTrail?
  • Do users prefer using TaskTrail?


Provenance-related User Studies are Hard!

Must be done “in the wild”

Involves:

Long time-scales, which increase the chances that participants will drop out and that the situation on site will change

Potentially sensitive information: emails to/from users not participating in the study, and documents regarding trade secrets

Installation of event-tracking software: software installation and maintenance can introduce compatibility, scheduling, and other problems