Attention-Based Information Retrieval
Georg Buscher
German Research Center for Artificial Intelligence (DFKI)
Knowledge Management Department
Kaiserslautern, Germany
SIGIR 07 Doctoral Consortium
Motivation
Magnetic Resonance Imaging uses magnetic fields and radio waves to produce high-quality two- or three-dimensional images of brain structures. Sensors read frequencies of radio waves and a computer uses the information to construct an image of the brain (see 2).
Positron Emission Tomography measures emissions from radioactively labeled metabolically active chemicals that have been injected into the bloodstream. The emission data are computer-processed to produce 2- or 3-dimensional images of the distribution of the chemicals throughout the brain. Especially useful is a wide array of chemicals used to map different aspects of neurotransmitter activity (see 3).
Homer's personality is one of frequent stupidity, laziness, and explosive anger. He also suffers from a short attention span which complements his intense but short-lived passion for hobbies, enterprises and various causes. Furthermore, he is prone to emotional outbursts.
1 2 3
Outline
Acquiring attention evidence
– Attention evidence through eye tracking
– Attention annotation and derivation with Dempster-Shafer
Applications in Information Retrieval
– Attention-based TfIdf
– Context elicitation
– Context-based Index
– Query Expansion / result re-ranking
Sources of Attention Data
There are many indications of attention from the user:
– Annotations (explicit)
– Reading evidence (implicit): read, skimmed, viewed longer
Attention Annotations Imply Different Levels of Attention
Attention evidence values
[0.7; 1.0] [0.5; 1.0] [0.2; 0.7] [1.0; 1.0] … … …
– Values range from 0 to 1
– The width of an interval expresses uncertainty
Dempster-Shafer Combination of Attention Evidence
read
One attention value is calculated from the combined evidence of each segment: att(t) = bel(t) – 0.2*bel(t) + 0.2*pl(t)
Segment                    Annotations   Combined evidence   att
[The demo … provide]       R             [0.5; 1]            0.6
[different]                R H           [0.85; 1]           0.88
[visualizations]           R H U         [0.96; 1]           0.97
[and interfaces]           R U           [0.85; 1]           0.88
[according … situation.]   R             [0.5; 1]            0.6
In that way, the function att provides an attention value for every term of the document.
att_{different,d} = 0.88
att_{according,d} = 0.6
att_{somethingElse,d} = 0
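The combination step and the att formula on this slide can be sketched as follows. This is a minimal sketch, not the author's implementation: it assumes each piece of evidence is an interval [bel; pl] for the proposition "the segment was attended", and applies Dempster's rule on the two-element frame {attended, not attended}. The names `combine`, `att`, `READ` and `ANNOTATED` are hypothetical. The value [0.5; 1] for a read segment is taken from the slide; assigning [0.7; 1.0] to the other annotation types is an assumption, chosen because it appears among the evidence values shown earlier.

```python
def combine(e1, e2):
    """Dempster's rule on the binary frame {attended, not attended}.

    Each piece of evidence is an interval (bel, pl) for 'attended'.
    """
    bel1, pl1 = e1
    bel2, pl2 = e2
    # Masses for: attended, not attended, unknown (the whole frame).
    a1, n1, t1 = bel1, 1.0 - pl1, pl1 - bel1
    a2, n2, t2 = bel2, 1.0 - pl2, pl2 - bel2
    conflict = a1 * n2 + n1 * a2
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    norm = 1.0 - conflict
    attended = (a1 * a2 + a1 * t2 + t1 * a2) / norm
    not_attended = (n1 * n2 + n1 * t2 + t1 * n2) / norm
    return (attended, 1.0 - not_attended)


def att(evidence, weight=0.2):
    """Collapse an interval into one value: bel - 0.2*bel + 0.2*pl."""
    bel, pl = evidence
    return bel - weight * bel + weight * pl


READ = (0.5, 1.0)       # from the slide: a segment that was only read
ANNOTATED = (0.7, 1.0)  # assumption: evidence of one additional annotation

print(round(att(READ), 2))                      # 0.6
print(round(att(combine(READ, ANNOTATED)), 2))  # 0.88
```

Under these assumptions, combining the read evidence with one further annotation gives a belief of 0.85, and with two further annotations about 0.955, which is consistent with the intervals [0.85; 1] and [0.96; 1] on the slide.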
Outline
Acquiring attention evidence
– Attention evidence through eye tracking
– Attention annotation and derivation with Dempster-Shafer
Applications in Information Retrieval
– Attention-based TfIdf Desktop Index
– Context elicitation
– Context-based Index
– Query Expansion
Attention-Based Desktop Index
A desktop index is used especially for re-finding known documents.
You remember best those parts of a document that you paid attention to.
Attended terms should therefore be weighted higher.
TfIdf-based modification
– Attention is a local factor (like tf)
– The higher the maximal intensity of an attended document part, the more weight should be assigned to the attention value.
– The lower the maximal intensity of an attended document part, the more weight should be assigned to tf.
[Formula with an attention part and a term frequency part]
tf_{t,d}: term frequency of term t in document d
att_{t,d}: attention value of term t in document d
α in [0; 1] is a balancing factor that defines the influence of attention relative to term frequency.
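The formula itself is not preserved in the slide text, so the following is only a hypothetical sketch of one weighting that satisfies the properties listed above. It is not the author's formula. The function name `attention_tfidf`, the normalisation of tf by the maximum term frequency, the logarithmic idf, and the way α is scaled by the maximal attention value in the document are all assumptions.

```python
import math


def attention_tfidf(tf, att, df, n_docs, alpha=0.5):
    """Hypothetical attention-weighted tf-idf for one document.

    tf:  term -> raw term frequency in the document (must be non-empty)
    att: term -> attention value in [0, 1] (missing terms count as 0)
    df:  term -> number of documents containing the term
    """
    max_tf = max(tf.values())
    max_att = max(att.values(), default=0.0)
    # Assumption: the share given to attention grows with the maximal
    # attention intensity in the document; the rest goes to tf.
    share = alpha * max_att
    weights = {}
    for term, freq in tf.items():
        local = share * att.get(term, 0.0) + (1.0 - share) * freq / max_tf
        weights[term] = local * math.log(n_docs / df[term])
    return weights


weights = attention_tfidf(
    tf={"different": 2, "situation": 2},
    att={"different": 0.88},
    df={"different": 5, "situation": 5},
    n_docs=10,
)
print(weights["different"] > weights["situation"])  # True
```

With α = 0 the sketch reduces to plain tf-idf, and an attended term outranks an unattended term with the same frequency and document frequency.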
Why Context? The Search for the Mental Model
If a knowledge worker tries to recall something concerning a topic, does he primarily think
– on the basis of documents and document structures or
– on the basis of former thematic contexts?
Rather the latter…
When re-finding information, one does not search primarily for the document, but for the former mental model. Documents mediate.
Elicitation and Representation of the Thematic Context
Some read sub-documents:
Document 1: Brain imaging
Document 2: Brain imaging
Document 3: The Simpsons
Document 4: Brain imaging
Thematic context: Brain imaging
Combination of the viewed sub-documents into one virtual context document (only those attended parts that overlap thematically)
Determination of Thematic Overlap
Determine buzzwords for each viewed document by using
– Attention value
– Idf of desktop index
Compare the buzzword vector with previous context vectors
– If there is a similarity, then merge with the context vector
– Else the buzzword vector is a new context
[Figure: currently viewed document (part) compared with previous contexts]
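The procedure on this slide can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the slide does not say how buzzwords are scored, which similarity measure is used, or how vectors are merged. Here the buzzword score is assumed to be attention value times idf, similarity is assumed to be cosine similarity with a hypothetical threshold of 0.3, and merging is assumed to be term-wise addition. All function names are hypothetical.

```python
import math


def buzzwords(att, idf, k=10):
    """Top-k terms of a viewed document, scored by attention * idf."""
    scored = {t: a * idf.get(t, 0.0) for t, a in att.items()}
    top = sorted(scored, key=scored.get, reverse=True)[:k]
    return {t: scored[t] for t in top if scored[t] > 0}


def cosine(u, v):
    """Cosine similarity of two sparse term vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_context(vector, contexts, threshold=0.3):
    """Merge the vector into the most similar context or open a new one.

    Returns the index of the context the vector ended up in.
    """
    best, best_sim = None, 0.0
    for i, ctx in enumerate(contexts):
        sim = cosine(vector, ctx)
        if sim > best_sim:
            best, best_sim = i, sim
    if best is not None and best_sim >= threshold:
        for term, weight in vector.items():
            contexts[best][term] = contexts[best].get(term, 0.0) + weight
        return best
    contexts.append(dict(vector))
    return len(contexts) - 1


contexts = []
assign_context({"brain": 1.0, "imaging": 0.8, "mri": 0.5}, contexts)
assign_context({"brain": 0.9, "pet": 0.6, "imaging": 0.4}, contexts)
assign_context({"homer": 1.0, "simpsons": 0.9}, contexts)
print(len(contexts))  # 2
```

In this toy run the two brain-imaging vectors end up in one context, while the Simpsons vector opens a second one, mirroring the example documents used in the slides.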
Context-Based Vector-Space Index
Idea: two indexes
1. Term – Context
2. Context – Document
A context is represented by a virtual context document.
The value for each term–context relation is influenced by the degree of attention.
[Figure: the common index structure, a term–document matrix (Term1–Term3 × Doc1–Doc3), contrasted with a term–context matrix (Term1–Term4 × C1–C3) and a context–document matrix (C1–C3 × Doc1, Doc2)]
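A lookup over the two indexes could work roughly as sketched below. This is only an illustration under assumptions: the slide does not specify how a query is scored, so contexts are scored here by summing term–context weights, and each document inherits the score of its best matching context. The index contents and the names `TERM_CONTEXT`, `CONTEXT_DOCS` and `search` are hypothetical and are not the values from the figure.

```python
from collections import defaultdict

# Index 1: term -> {context: attention-influenced weight} (made-up values)
TERM_CONTEXT = {
    "brain": {"C1": 5.0, "C3": 1.0},
    "imaging": {"C1": 2.0},
    "homer": {"C2": 3.0},
}

# Index 2: context -> documents that contributed attended parts to it
CONTEXT_DOCS = {
    "C1": ["Doc1", "Doc2"],
    "C2": ["Doc2"],
    "C3": ["Doc1"],
}


def search(query_terms, term_context=TERM_CONTEXT, context_docs=CONTEXT_DOCS):
    """Rank documents by the best-scoring context they belong to."""
    context_scores = defaultdict(float)
    for term in query_terms:
        for ctx, weight in term_context.get(term, {}).items():
            context_scores[ctx] += weight
    doc_scores = defaultdict(float)
    for ctx, score in context_scores.items():
        for doc in context_docs.get(ctx, ()):
            doc_scores[doc] = max(doc_scores[doc], score)
    # Highest score first; ties are broken by document name.
    return sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(search(["homer"]))  # [('Doc2', 3.0)]
```

Because documents are reached only through contexts, a query can retrieve a document via terms that occur in its context but not in the document itself.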
New Kinds of Search Tasks Possible
Local search: For the current task, find (parts of) documents that I formerly used for a similar task.
Enterprise-wide search: For the current task, find (parts of) documents that I do not know yet, but that have been used by some colleague for a similar task.
Evaluation of the Context-Based Index
The main advantage is expected to show up after several weeks.
Real-world eye tracking studies are not possible over such a long time.
Artificial experiment:
– Several different exploration tasks within some hours
– Then some re-finding tasks about previously viewed content
– Measuring the time or user satisfaction during the search process?
Comparison: context-based search vs. normal search
Contextual Attention-Based Relevance Feedback
Problem with the context-based index: it does not scale to web search. Therefore: query expansion.
The currently elicited context (i.e. a term vector) expresses the current interest of the user.
The topmost characteristic keywords are used for query expansion.
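The expansion step can be sketched as follows. This is a minimal sketch, assuming the elicited context is available as a term-to-weight mapping and that "topmost" means the k highest-weighted terms not already in the query. The function name `expand_query` and the parameter `k` are hypothetical.

```python
def expand_query(query, context_vector, k=3):
    """Append the k highest-weighted context terms to the query terms."""
    terms = query.lower().split()
    # Highest weight first; ties are broken alphabetically.
    ranked = sorted(context_vector.items(), key=lambda kv: (-kv[1], kv[0]))
    extra = [term for term, _ in ranked if term not in terms][:k]
    return terms + extra


context = {"brain": 0.9, "imaging": 0.8, "mri": 0.6, "pet": 0.4}
print(expand_query("brain scan", context, k=2))
# ['brain', 'scan', 'imaging', 'mri']
```

The expanded term list can then be sent to an ordinary web search engine, so no context-based index is needed on the server side.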