Information Retrieval
Lebanese University, Faculty of Economics and Business Administration – 1st Branch
Class: M1
Instructor: Dr. Lina A. Nimri
Course Textbook
Modern Information Retrieval,
R. Baeza-Yates and B. Ribeiro-Neto,
Addison-Wesley and ACM Press, 1999,
ISBN: 0-201-39829-X
Introduction
Modern Information Retrieval, Chapter 1, Ricardo Baeza-Yates and Berthier Ribeiro-Neto
Examples of information needs in the context of the World Wide Web:
“Find all documents containing information on computer courses which: (1) are offered by universities in South England, and (2) are accredited by the BCS/IEE bodies. To be relevant, the document must include information on admission requirements, and an e-mail address and phone number for contact purposes.”
“Find all documents containing information on college tennis teams which:
(1) are maintained by a university in the USA, and
(2) participate in the NCAA tournament.”
Information Retrieval
[Figure: the user information need is expressed as a query to the retrieval system (search engine), which searches the documents and returns a set of retrieved documents, i.e. information useful or relevant to the user.]
Primary goal of an IR system: “Retrieve all the documents which are relevant to a user query, while retrieving as few non-relevant documents as possible.”
IR deals with the representation, storage, organisation of, and access to information items.
The representation is (usually) keyword-based.
Data Retrieval
Determining which documents contain the keywords in the user query is not always enough to satisfy the user's information need.
Data retrieval retrieves the objects that satisfy clearly defined conditions, such as regular expressions or relational algebra expressions.
A data retrieval system deals with data that has a well-defined structure and semantics.
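To make the contrast concrete, here is a minimal Python sketch of data retrieval: the query is a clearly defined condition (a regular expression, or a relational-style selection), and each object either satisfies it or does not, with no notion of partial relevance or ranking. The records, field names and helper functions are hypothetical.

```python
import re

# Hypothetical structured records with well-defined fields (the "data" in data retrieval).
records = [
    {"id": 1, "title": "Intro to Databases", "year": 1998},
    {"id": 2, "title": "Neural Networks", "year": 2001},
    {"id": 3, "title": "Speech Recognition Basics", "year": 1999},
]

def select_by_regex(rows, field, pattern):
    """Return every row whose `field` fully matches `pattern` (exact, all-or-nothing)."""
    rx = re.compile(pattern)
    return [r for r in rows if rx.fullmatch(str(r[field]))]

def select_where(rows, predicate):
    """Relational-algebra-style selection: keep the rows for which the predicate holds."""
    return [r for r in rows if predicate(r)]

# Titles starting with "Neural" or "Speech": a row matches or it does not.
print([r["id"] for r in select_by_regex(records, "title", r"(Neural|Speech).*")])  # → [2, 3]
# Selection with the condition year < 2000.
print([r["id"] for r in select_where(records, lambda r: r["year"] < 2000)])  # → [1, 3]
```

An IR system, by contrast, has to decide how relevant a free-text document is to a vaguely stated need, which is why it ranks instead of simply filtering.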
Information Retrieval System
Retrieves information about a subject.
Deals with natural language text, which is not well structured and may be semantically ambiguous.
Must interpret the contents of documents and rank them according to their degree of relevance to the user's need.
Area of interest
Digital libraries
Information experts
World Wide Web: a very difficult task
– The hyperspace is vast
– There is no well-defined data model (format or representation form)
Effective retrieval
The effective retrieval of relevant information is directly affected by:
– The user task
– The logical view of the document (the document's representation) adopted by the retrieval system
User tasks
Pull technology
– The user requests information in an interactive manner
– 3 retrieval tasks:
– Browsing (hypertext)
– Retrieval (classical IR systems)
– Browsing and retrieval (modern digital libraries and web systems)
Push technology
– Automatic and permanent pushing of information to the user
– Software agents
– Example: a news service
– Filtering (a retrieval task): relevant information is kept for later inspection by the user
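The filtering task in push mode can be sketched as matching each incoming item against a standing user profile. In this minimal Python sketch the profile is a hypothetical set of keywords, and an item is kept for later inspection when it shares at least `min_overlap` terms with the profile; the news items and the `min_overlap` parameter are illustrative assumptions, and real filtering systems use richer profiles and scoring.

```python
def tokenize(text):
    """Lower-case, whitespace-split tokenizer (deliberately simple)."""
    return set(text.lower().split())

def filter_stream(items, profile, min_overlap=1):
    """Yield the items that share at least `min_overlap` terms with the user profile."""
    profile_terms = {t.lower() for t in profile}
    for item in items:
        if len(tokenize(item) & profile_terms) >= min_overlap:
            yield item

# Hypothetical news stream and user profile.
news = [
    "university tennis team wins NCAA title",
    "stock markets fall sharply",
    "new speech recognition system released",
]
profile = {"tennis", "NCAA", "speech"}
inbox = list(filter_stream(news, profile))  # kept for later inspection by the user
print(inbox)
```

Here the first and third items are pushed to the user's inbox; raising `min_overlap` to 2 keeps only the tennis item.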
Pulling
The user can browse the documents when his main objectives are not clear at the beginning and his purpose might change during the interaction with the system.
The combination of retrieval and browsing is not yet a well-established approach.
[Figure: the user interacts with the database through two tasks, retrieval and browsing.]
Documents
Unit of retrieval
A passage of free text
– composed of text: strings of characters from an alphabet
– composed of natural language: a newspaper article, a journal paper, a dictionary definition, e-mail messages
– arbitrary document size: newspaper article vs. journal paper vs. e-mail
What is a document?
Representation of documents
Documents are represented through a set of index terms (keywords, or term descriptors):
– extracted directly from the text, or
– specified by human subjects (information science), i.e. metadata:
– most concise representation
– poor quality of retrieval
Full-text representation
– Most complete representation
– High computational cost
Large collections
– Reduce the set of representative keywords:
– elimination of stop words
– stemming
– identification of noun phrases
– further compression
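The keyword-reduction steps for large collections can be sketched as a small text-operations pipeline. This Python sketch assumes a tiny hand-picked stop-word list and a crude suffix-stripping rule that stands in for a real stemmer such as Porter's; noun-phrase identification and compression are omitted, and all names are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists contain hundreds of words.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are", "which"}
# Crude suffix rules, tried in this order; NOT the Porter algorithm.
SUFFIXES = ("ing", "ies", "es", "s", "ed")

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Full text -> index terms: tokenize, drop stop words, stem, deduplicate, sort."""
    return sorted({crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS})

print(index_terms("Retrieving documents on the indexing of text collections"))
# → ['collection', 'document', 'index', 'retriev', 'text']
```

The output shows the trade-off the slide describes: eight words of full text shrink to five index terms, at the price of stems such as `retriev` that are no longer proper words.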
Document term descriptors are used to access texts.
Descriptors for a text are generated:
• by hand
• by analysing the text
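Logical View of the Documents
[Figure: from full text to index terms. Operations on the documents (structure recognition, accents and spacing, stopwords, noun groups, stemming) and manual indexing move the logical view from structure plus full text to a set of index terms.]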
The retrieval functions
[Figure: an information need is turned into a query by query formulation; documents are turned into a document representation by indexing; retrieval functions match the two to produce the retrieved documents, and relevance feedback flows back to the query formulation.]
Queries
Information need:
Simple queries
– composed of two or three, perhaps even dozens of, keywords
– e.g., as in web retrieval
Boolean queries
– “neural networks AND speech recognition”
Context queries
– proximity search, phrase queries
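Boolean and phrase queries can be answered with set operations over a positional index. In this minimal Python sketch, which is an illustration rather than the course's reference method, each operand of “neural networks AND speech recognition” is treated as a phrase, a phrase matches when its words occur at consecutive positions, and AND is set intersection. The documents and function names are hypothetical.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index

def phrase_match(index, phrase):
    """Return ids of documents containing the words of `phrase` at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or terms[0] not in index:
        return set()
    matches = set()
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            # Each following term must appear exactly `offset` positions after the first one.
            if all(start + offset in index.get(term, {}).get(doc_id, [])
                   for offset, term in enumerate(terms[1:], start=1)):
                matches.add(doc_id)
                break
    return matches

def boolean_and(index, *phrases):
    """AND of phrase operands: intersect the document sets of all operands."""
    result = phrase_match(index, phrases[0])
    for phrase in phrases[1:]:
        result &= phrase_match(index, phrase)
    return result

docs = {
    1: "neural networks for speech recognition",
    2: "speech recognition with hidden markov models",
    3: "deep neural networks improve speech recognition accuracy",
    4: "networks of neural cells",
}
index = build_positional_index(docs)
print(sorted(boolean_and(index, "neural networks", "speech recognition")))  # → [1, 3]
```

Document 4 contains both “neural” and “networks” but not as a phrase, so a plain keyword AND would retrieve it while the phrase query does not.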
User term descriptors characterise the user need.
Best-match retrieval
Compare the terms in a document and in the query.
Compute the similarity between each document in the collection and the query, based on the terms they have in common.
Sort the documents in order of decreasing similarity with the query.
The output is a ranked list displayed to the user; the top documents are the ones the system judges most relevant.
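The best-match steps can be sketched in a few lines of Python. The similarity used here, the number of distinct terms a document shares with the query, is an assumption chosen because it follows the wording “based on the terms that they have in common” most directly; practical systems weight terms (for example with TF-IDF) and normalise by document length. The documents are hypothetical.

```python
def terms(text):
    """Represent a text by its set of lower-cased, whitespace-separated terms."""
    return set(text.lower().split())

def best_match(query, docs):
    """Rank documents by the number of terms they share with the query.

    Returns (doc_id, score) pairs in decreasing score order, ties broken by doc_id.
    Documents sharing no term with the query are left out.
    """
    q = terms(query)
    scored = [(doc_id, len(q & terms(text))) for doc_id, text in docs.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[1], pair[0]))
    return [(doc_id, score) for doc_id, score in ranked if score > 0]

docs = {
    "d1": "college tennis teams in the NCAA tournament",
    "d2": "tennis rackets for sale",
    "d3": "university computer courses",
}
for doc_id, score in best_match("college tennis NCAA", docs):
    print(doc_id, score)
# d1 3
# d2 1
```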
Conceptual view of text retrieval system
[Figure: queries (user term descriptors characterising the user need) and documents (document term descriptors used to access texts) are compared by a similarity computation, which outputs the retrieved documents.]
Expanded view of text retrieval system
[Figure: documents are indexed into indexed documents; a similarity computation compares the queries with the indexed documents to obtain the retrieved documents, which are then output as ranked documents.]
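Process of retrieving information
[Figure: components User Interface, Text Operations, Query Operations, Indexing, Similarity Computation (Searching), Ranking, Document Repository Manager and Index. The user need enters through the user interface; text operations produce the logical view of both the text and the query; indexing builds an inverted file from the text repository; searching returns the retrieved docs, ranking turns them into ranked docs, and user feedback flows back into the query operations.]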
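The Indexing and Searching components meet at the inverted file. As a minimal sketch under simplifying assumptions (hypothetical documents, whitespace tokenisation, no text operations), the inverted file below maps each term to the sorted ids of the documents containing it, so searching only visits the postings of the query terms rather than scanning every document; ranking then orders the candidates by how many query terms they contain.

```python
from collections import defaultdict

def build_inverted_file(docs):
    """Map each term to a sorted list of ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search(inverted_file, query):
    """Visit only the postings of the query terms and count matches per document."""
    counts = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in inverted_file.get(term, []):
            counts[doc_id] += 1
    # Ranking: more matching query terms first, ties broken by document id.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

docs = {
    1: "information retrieval systems",
    2: "data retrieval with relational algebra",
    3: "information extraction",
}
inverted_file = build_inverted_file(docs)
print(inverted_file["retrieval"])  # → [1, 2]
print(search(inverted_file, "information retrieval"))  # → [(1, 2), (2, 1), (3, 1)]
```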
Key Topics
Indexing text documents
Retrieving text documents
Evaluation
Query reformulations
Search engines = IR + link structure + name interpretation
Information Retrieval vs. Information Extraction
Information Retrieval
– Given a set of query terms and a set of document terms, select only the most relevant documents [precision], and preferably all the relevant ones [recall].
Information Extraction
– Extract from the text what the document means.
IR systems can FIND documents but need not “understand” them.
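The bracketed terms name the two standard evaluation measures. For a set of retrieved documents and a set of relevant documents, precision = |relevant ∩ retrieved| / |retrieved| and recall = |relevant ∩ retrieved| / |relevant|. A minimal Python sketch with hypothetical document ids; returning 0.0 for an empty set is a convention assumed here, not part of the definitions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for sets of retrieved and relevant document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    # Convention: report 0.0 when the denominator set is empty.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run: 4 documents retrieved, 3 of them relevant, out of 6 relevant overall.
p, r = precision_recall(retrieved={1, 2, 3, 4}, relevant={2, 3, 4, 7, 8, 9})
print(p, r)  # → 0.75 0.5
```

Retrieving more documents can only raise recall but usually lowers precision, which is the tension expressed in the IR system's primary goal.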