1
The Use of Provenance in Information Retrieval
Simone Stumpf, Erin Fitzhenry, Tom Dietterich
2
Defining Provenance
To us, provenance concerns:
The origin of content within documents
The relationships between documents
Examples (diagram labels): AttachmentSave, SaveAs
3
Why focus on Provenance for Information Retrieval?
People remember the relationships between documents!
Episodic vs. Semantic Memory Studies:
Blanc-Brude & Scapin (2007) Gonçalves & Jorge (2004)
No need to formulate keyword queries
Other common document attributes are often inaccurately remembered (Blanc-Brude & Scapin 2007):
Title (20% false recall)
Size (53.8% false recall)
Time (47.6% false recall)
4
Example Use Case: “Where did I save that again?”
I got an email from Tom…
I saved the attachment…
And I pasted some information from the attachment into a PowerPoint document…
Where did that presentation go??
5
Requirements for Tracking and Visualizing Provenance
Instrument all important document provenance events
Provenance events are NOT automatically captured by Windows
Develop a UI enabling users to locate documents via the provenance relationships they remember
Integrate the UI into the Windows Desktop
6
Capturing Provenance Events with TaskTracer
TaskTracer is a Personal Information Management system
User defines a hierarchy of Projects or Activities
As the user works, TaskTracer automatically tags (according to task/project):
Files, Folders, Email Messages, Email Contacts, Web pages
7
Instrumenting TT to Capture Provenance Events
TaskTracer already instruments many desktop events:
Open, Save, SaveAs, Close
EmailArrived, Email Open, Email Close
Open URL, Close URL, Follow Hyperlink
Idea: Extend existing instrumentation to cover key provenance events
CopyPaste, SaveAs, FileCopy/Rename
AttachmentAdd, AttachmentOpen, AttachmentSave, EmailForward*, EmailReply*
FileDownload, FileUpload*
*Coming soon
8
Instrumenting TaskTracer to capture Provenance Events (cont.)
A Provenance Event
Event_id: 10233
Event_type: SaveAs
Event_time: Jan 12
“From” Resource: oldFile.doc (Id: 1768, etc.)
“To” Resource: newFile.doc (Id: 1923, etc.)
Database of document-to-document provenance relationships
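The event record above could be modeled roughly as follows. This is a minimal sketch, not TaskTracer's actual schema: the class names `Resource`, `ProvenanceEvent`, and `ProvenanceStore`, and the methods `record`, `derived_from`, and `sources_of` are hypothetical names chosen for illustration, and an in-memory index stands in for the database.

```python
from dataclasses import dataclass
from collections import defaultdict


@dataclass(frozen=True)
class Resource:
    """A document, email, or web page (hypothetical minimal fields)."""
    resource_id: int
    name: str


@dataclass(frozen=True)
class ProvenanceEvent:
    """One document-to-document relationship, as on the slide."""
    event_id: int
    event_type: str      # e.g. "SaveAs", "CopyPaste", "AttachmentSave"
    event_time: str      # kept as the slide's free-form string
    from_resource: Resource
    to_resource: Resource


class ProvenanceStore:
    """In-memory stand-in for the provenance database (an assumption)."""

    def __init__(self):
        self.events = []
        self._outgoing = defaultdict(list)  # from-resource id -> events
        self._incoming = defaultdict(list)  # to-resource id -> events

    def record(self, event):
        """Store an event and index it by both of its endpoints."""
        self.events.append(event)
        self._outgoing[event.from_resource.resource_id].append(event)
        self._incoming[event.to_resource.resource_id].append(event)

    def derived_from(self, resource_id):
        """Resources that were created from the given resource."""
        return [e.to_resource for e in self._outgoing.get(resource_id, [])]

    def sources_of(self, resource_id):
        """Resources that the given resource was created from."""
        return [e.from_resource for e in self._incoming.get(resource_id, [])]


# The SaveAs example from the slide.
store = ProvenanceStore()
old_file = Resource(1768, "oldFile.doc")
new_file = Resource(1923, "newFile.doc")
store.record(ProvenanceEvent(10233, "SaveAs", "Jan 12", old_file, new_file))
```

Indexing each event by both endpoints keeps lookups cheap in either direction, which matters if users may start from either the original or the derived document.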
9
A tool for visualizing provenance
Developing a User Interface: TaskTrail
User’s Query
Click to expand
Mouse over for details
Double-click to open
10
Integrating TaskTrail into the Windows UI
Launch a query by right-clicking an item within Windows Explorer, Outlook, or TaskExplorer
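A query launched from an item can be thought of as a walk over the provenance relationships that touch it. The sketch below is one possible way to do this, not a description of how TaskTrail is implemented: the function `provenance_trail`, the tuple layout, and the file names are hypothetical, and following links in both directions is an assumption. It uses a breadth-first search over events shaped like the "Where did I save that again?" scenario.

```python
from collections import deque

# Hypothetical events: (event_type, from_resource, to_resource).
EVENTS = [
    ("AttachmentSave", "Email from Tom", "report.doc"),
    ("CopyPaste", "report.doc", "slides.ppt"),
    ("SaveAs", "slides.ppt", "slides_final.ppt"),
]


def provenance_trail(events, start):
    """Map each resource reachable from `start` to the chain of
    (event_type, resource) steps that leads to it."""
    # Build an undirected adjacency list so that a query can start
    # from either end of a relationship.
    neighbors = {}
    for event_type, src, dst in events:
        neighbors.setdefault(src, []).append((event_type, dst))
        neighbors.setdefault(dst, []).append((event_type, src))

    trails = {start: []}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for event_type, other in neighbors.get(current, []):
            if other not in trails:  # first visit is the shortest chain
                trails[other] = trails[current] + [(event_type, other)]
                queue.append(other)

    del trails[start]  # the starting item is not a search result
    return trails


trails = provenance_trail(EVENTS, "Email from Tom")
for resource, steps in trails.items():
    print(resource, "via", " -> ".join(kind for kind, _ in steps))
```

Breadth-first order returns the shortest chain of events to each document, which roughly matches expanding the graph one click at a time from the queried item.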
14
Research Questions
• Does TaskTrail help users find documents more quickly than other methods?
• How should the provenance graph be laid out?
• What kind of provenance events do users accurately recall?
• How large are the provenance graphs?
• What patterns exist (if any) in terms of the succession of provenance events?
15
User Studies: Formative
• Observational Study (planned)
• What provenance-related actions do users perform? Which of those do they remember?
• Observe 12 participants in their workplaces
• Record provenance-related actions performed
• Interview participants after 1 week to see what they remember
• Free Recall
• Cued Recall
• How do users lay out their documents according to what they remember?
16
User Studies: Summative
• TaskTrail Study at Intel (in progress)
• 4 participants (so far) are using TaskTracer for at least 1 month each
• Then they will use TaskTrail to locate their own documents
• Measures of success:
• Do users locate more documents using TaskTrail?
• Do users locate documents more quickly using TaskTrail?
• Do users prefer using TaskTrail?
17
Provenance-related User Studies are Hard!
Must be done “in the wild”
Involves:
Long time-scales, which increase chances that:
  Participants will drop out
  Situation on site will change
Potentially sensitive information
  Emails to/from users not participating in the study
  Documents regarding trade secrets
Installation of some event-tracking software
  Software installation/maintenance can introduce compatibility, scheduling and other problems