Per Ahlgren
On a cognitive search strategy
Overview
• Background
• Ingwersen’s example
• A cognitive search strategy
• Search formulation construction
• Four situations but two search formulations
• A remedy
• Concluding remarks
Background
• It is important to provide users of the digital library with tools that help them retrieve information relevant to their needs
• Search strategies - approaches for search problems
– Example. The building blocks strategy

Step 1   Concept 1                  Concept 2
Step 2   T11, . . . , T1i           T21, . . . , T2j
Step 3   T11 OR . . . OR T1i        T21 OR . . . OR T2j
Step 4   (T11 OR . . . OR T1i) AND (T21 OR . . . OR T2j)
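Steps 3 and 4 of the building blocks strategy can be sketched in a few lines of Python. This is only an illustration: the function name `building_blocks` and the sample terms are hypothetical, and real command languages differ in operator syntax.

```python
def building_blocks(concepts):
    """Build a Boolean query from a list of concepts.

    Each concept is a list of terms (step 2). The terms of one concept
    are ORed together (step 3), and the resulting blocks are ANDed (step 4).
    """
    blocks = ["(" + " OR ".join(terms) + ")" for terms in concepts]
    return " AND ".join(blocks)


# Hypothetical terms for two concepts.
query = building_blocks([["library", "libraries"], ["retrieval", "search"]])
print(query)  # (library OR libraries) AND (retrieval OR search)
```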
Ingwersen’s example
• Two terms, A and B
• Two fields, title field (TI) and descriptor/identifier field (DE,ID)
• Four atomic search formulations:
– A/TI; A/DE,ID; B/TI; B/DE,ID
• Assumed situation with regard to document frequency (df): df(A/TI) < df(B/TI) < df(A/DE,ID) < df(B/DE,ID).
• Principle (P) - atomic search formulations with lower frequencies should be combined before formulations with higher frequencies
– Idea behind P: a term’s value for retrieval purposes is inversely proportional to the number of documents in which the term occurs
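As a small sketch of principle P, the atomic formulations can be sorted by document frequency so that the rarest are combined first. The df values below are made up for illustration; they are chosen only to respect the ordering assumed in Ingwersen’s example.

```python
# Hypothetical document frequencies, consistent with the assumed ordering
# df(A/TI) < df(B/TI) < df(A/DE,ID) < df(B/DE,ID).
df = {"B/DE,ID": 210, "A/TI": 12, "A/DE,ID": 95, "B/TI": 40}

# Principle P: formulations with lower frequencies are combined first.
combination_order = sorted(df, key=df.get)
print(combination_order)  # ['A/TI', 'B/TI', 'A/DE,ID', 'B/DE,ID']
```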
A cognitive search strategy
• Different cognitive agents - author (TI) and indexer (DE,ID) - are involved in the assignment of terms to documents
• Occurrence of different cognitive agents
– Cognitive overlap
• When constructing Boolean search formulations in a two-term TI/DE,ID search, consider two factors, occurrence of different cognitive agents and cognitive overlap, when combining atomic formulations with the AND operator
– Optimal situation: both terms are present in both fields (the cognitive agents involved agree about the two access points). Expressed by the following formulation: A/TI*B/TI*A/DE,ID*B/DE,ID.
• A multiple evidence approach - the strategy combines evidence for the relevance of a document
Search formulation construction
• Purpose: stepwise retrieval of a number of subsets of the set D of documents that is retrieved by A+B. First formulation:
S1 A/TI*B/TI*A/DE,ID*B/DE,ID.
• Two methods
– (1) NOTPRESET (Ingwersen’s method)
• A new formulation is obtained by (1) combining atomic formulations by the AND operator, considering the factors (a) presence of A and B, (b) occurrence of different cognitive agents and (c) document frequency, (2) excluding all the preceding formulations by the NOT and OR operators, and (3) ANDing the results of (1) and (2).
• Should be fairly easy for the user to grasp
• Example. S2 A/TI*B/TI*A/DE,ID NOT S1
– (2) NOTATOMIC
• A new formulation is obtained by (1) combining as many atomic formulations as possible (in the light of earlier formulations) by the AND operator, considering the factors (a) presence of A and B, (b) occurrence of different cognitive agents and (c) document frequency, (2) excluding by the NOT and OR operators all the atomic formulations that are not part of the result of (1), and (3) ANDing the results of (1) and (2).
• Yields, in most cases, shorter formulations than NOTPRESET
• Should be fairly easy for the user to grasp
• Is abandoned in step 10 and step 11
• Example. (2) A/TI*B/TI*A/DE,ID NOT B/DE,ID
• Presence of A and B the most important factor
• Occurrence of different cognitive agents more important than document frequency
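The two construction methods can be sketched as string builders, reading * as AND and + as OR as the slides appear to do. The function names and the fixed atom order are assumptions made for this illustration; the sketch covers only steps (2) and (3) of each method and does not decide which atoms to combine or in which order the formulations are issued.

```python
ATOMS = ["A/TI", "B/TI", "A/DE,ID", "B/DE,ID"]


def _conj(present):
    """AND together the chosen atomic formulations, in a fixed atom order."""
    return "*".join(a for a in ATOMS if a in present)


def _disj(items):
    """OR together the items to exclude; parenthesize when there are several."""
    items = list(items)
    return items[0] if len(items) == 1 else "(" + "+".join(items) + ")"


def notpreset(present, earlier_labels):
    """NOTPRESET: exclude all preceding formulations (given by their labels)."""
    if not earlier_labels:
        return _conj(present)
    return f"{_conj(present)} NOT {_disj(earlier_labels)}"


def notatomic(present):
    """NOTATOMIC: exclude every atomic formulation not in the ANDed part."""
    absent = [a for a in ATOMS if a not in present]
    if not absent:
        return _conj(present)
    return f"{_conj(present)} NOT {_disj(absent)}"


step2 = {"A/TI", "B/TI", "A/DE,ID"}
print(notpreset(step2, ["S1"]))  # A/TI*B/TI*A/DE,ID NOT S1
print(notatomic(step2))          # A/TI*B/TI*A/DE,ID NOT B/DE,ID
```

With many earlier steps, the NOTPRESET exclusion list grows with every formulation, whereas the NOTATOMIC list never exceeds three atoms, which fits the remark that NOTATOMIC usually yields shorter formulations.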
Four situations but two search formulations
• Consider the (NOTATOMIC) formulations
(10) A/TI NOT (B/TI+B/DE,ID) and
(11) B/TI NOT (A/TI+A/DE,ID).
• (10) and (11) are indefinite with respect to A/DE,ID and B/DE,ID, respectively. They therefore do not distinguish between
(1) A is present in TI but not in DE,ID, and B is absent from both fields
and
(2) A is present in both fields, and B is absent from both fields,
or between
(3) B is present in TI but not in DE,ID, and A is absent from both fields
and
(4) B is present in both fields, and A is absent from both fields.
• To distinguish the four situations, we need four formulations, one per situation, instead of (10) A/TI NOT (B/TI+B/DE,ID) and (11) B/TI NOT (A/TI+A/DE,ID) (that is, instead of S10 and S11).
• Ingwersen’s formulations express only 11 of the 16 possible situations with regard to the presence of A and B in the two fields.
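The indefiniteness of (10) and (11) can be checked by enumerating all 16 presence/absence situations and testing each formulation against them. This is a sketch only: each situation is modelled as a mapping from atomic formulation to True/False, and the helper names are hypothetical.

```python
from itertools import product

ATOMS = ("A/TI", "B/TI", "A/DE,ID", "B/DE,ID")

# All 2**4 = 16 situations: each atom is either present or absent.
SITUATIONS = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=4)]


def f10(s):
    """(10) A/TI NOT (B/TI+B/DE,ID)"""
    return s["A/TI"] and not (s["B/TI"] or s["B/DE,ID"])


def f11(s):
    """(11) B/TI NOT (A/TI+A/DE,ID)"""
    return s["B/TI"] and not (s["A/TI"] or s["A/DE,ID"])


def present(s):
    """List the atoms that are present in a situation."""
    return [a for a in ATOMS if s[a]]


print(len(SITUATIONS))                          # 16
print([present(s) for s in SITUATIONS if f10(s)])
# [['A/TI'], ['A/TI', 'A/DE,ID']]
print([present(s) for s in SITUATIONS if f11(s)])
# [['B/TI'], ['B/TI', 'B/DE,ID']]
```

Each formulation matches two situations rather than one, which is exactly the ambiguity between (1) and (2), and between (3) and (4).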
Figure 1: The 16 possible situations with regard to the presence of A and B in the two fields. (Diagram regions: A/TI, B/TI, A/DE,ID, B/DE,ID.)
A remedy
• NOTATOMIC
– Substitute the following four formulations (in the given order)
9a A/TI*A/DE,ID NOT (B/TI+B/DE,ID)
9b B/TI*B/DE,ID NOT (A/TI+A/DE,ID)
10* A/TI NOT (A/DE,ID+B/TI+B/DE,ID)
11* B/TI NOT (B/DE,ID+A/TI+A/DE,ID)
for
(10) A/TI NOT (B/TI+B/DE,ID)
and
(11) B/TI NOT (A/TI+A/DE,ID).
– The new set of formulations expresses 15 of the 16 possible cases with regard to the presence of A and B in the two fields, not just 11.
– NOTATOMIC is not abandoned.
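A quick check of the four substitute formulations, under the same modelling assumption (a situation is a truth assignment to the four atomic formulations; the helper `matches` is hypothetical). It only verifies that each new formulation matches exactly one situation and that together they cover what (10) and (11) retrieved; it does not reproduce the full step list.

```python
from itertools import product

ATOMS = ("A/TI", "B/TI", "A/DE,ID", "B/DE,ID")
SITUATIONS = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=4)]


def matches(required, excluded):
    """Situations where all required atoms are present and no excluded atom is."""
    return [
        s for s in SITUATIONS
        if all(s[a] for a in required) and not any(s[a] for a in excluded)
    ]


# The four substitute formulations as (ANDed atoms, NOTed atoms).
remedy = {
    "9a": (["A/TI", "A/DE,ID"], ["B/TI", "B/DE,ID"]),
    "9b": (["B/TI", "B/DE,ID"], ["A/TI", "A/DE,ID"]),
    "10*": (["A/TI"], ["A/DE,ID", "B/TI", "B/DE,ID"]),
    "11*": (["B/TI"], ["B/DE,ID", "A/TI", "A/DE,ID"]),
}

for label, (required, excluded) in remedy.items():
    print(label, len(matches(required, excluded)))  # each prints 1

# The original (10) and (11) for comparison: two situations each.
old_10 = matches(["A/TI"], ["B/TI", "B/DE,ID"])
old_11 = matches(["B/TI"], ["A/TI", "A/DE,ID"])
print(len(old_10), len(old_11))  # 2 2
```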
• NOTPRESET
– It is also possible to use a special case of NOTPRESET, say NOTPRESET*, to construct formulations that express the four situations in question. When constructing a new formulation, the first step in NOTPRESET* is identical with the first step in NOTATOMIC: combine as many atomic formulations as possible (in the light of earlier formulations) by the AND operator, considering the factors presence of A and B, occurrence of different cognitive agents and document frequency.
– The new set of formulations expresses 15 of the 16 possible cases with regard to the presence of A and B in the two fields, not just 11.
Concluding remarks
• Ingwersen’s set of formulations should be modified to correspond to 15 of the 16 possible situations with regard to the presence of the terms A and B in the two fields.
• Ingwersen’s approach gives the Boolean searcher a hint concerning the order in which the parts of a (possibly large) document set should be retrieved.
• If the command language does not admit abbreviation of an OR formulation, NOTATOMIC is in my opinion preferable to NOTPRESET*.