Special Topics in Computer Science Advanced Topics in Information Retrieval Chapter 1: Introduction Alexander Gelbukh

Special Topics in Computer Science

Advanced Topics in Information Retrieval

Chapter 1: Introduction

Alexander Gelbukh

www.Gelbukh.com

Motivation

First for libraries, but now: WWW!
Info: representation, storage, organization, access
Search Engines (IR systems)
User information need
  o Plain English description -> query
Concerns of modern IR:
  o modeling
  o classification, categorization, filtering
  o system architecture
  o user interfaces, visualization, query languages

Data vs. Information Retrieval

Data Retrieval
  o Precise description
  o Well-structured data
  o Precise results
  o Yes-or-no results
  o Science

Information Retrieval
  o Vague information need
  o Natural language, images, ...
  o Semantic interpretation
  o Approximate results
  o Relevance ranking
  o Art!
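
The contrast above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three documents, the function names, and the overlap-count score are all invented here and are not taken from the slides. Data retrieval is modeled as an all-terms-must-match test, information retrieval as a ranking by the number of shared terms.

```python
from collections import Counter

# Hypothetical toy collection; the documents are invented for illustration.
DOCS = {
    "d1": "information retrieval ranks documents by relevance",
    "d2": "a database returns records that match a precise query",
    "d3": "search engines retrieve information from the web",
}

def data_retrieval(query: str) -> list[str]:
    """Yes-or-no answer: keep a document only if it contains every query term."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in DOCS.items()
            if terms <= set(text.split())]

def information_retrieval(query: str) -> list[tuple[str, int]]:
    """Approximate answer: score by number of shared terms, best first."""
    terms = Counter(query.lower().split())
    scored = []
    for doc_id, text in DOCS.items():
        # Counter intersection keeps the minimum count of each shared term.
        overlap = sum((terms & Counter(text.split())).values())
        if overlap:
            scored.append((doc_id, overlap))
    # Sort by descending score; ties are broken by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

if __name__ == "__main__":
    vague_need = "retrieval of information by relevance"
    print(data_retrieval(vague_need))         # exact match finds nothing
    print(information_retrieval(vague_need))  # ranking still returns candidates
```

With a loosely worded query the exact-match function returns nothing, while the ranking function still orders the partially matching documents. That is the practical difference between precise and approximate results.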

Basic Concepts

User task (search)
  o Can formulate what they need: Retrieval (classical)
  o Can't (or does not know): Browsing (new to IR)
      Still not very well integrated
  o Filtering (user passive, contents active)
Logical view of docs
  o ... Added linguistic info ... not clear if it helps
  o Full text
  o Text operations: reduce complexity to index terms
      Keywords, stopwords
      Stemming, noun groups (linguistic processing needed)
  o Categories
  Tradeoff along this list: from slow, good (top) to fast, bad (bottom)
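
As a rough sketch of how text operations reduce full text to index terms, the Python below lowercases the text, drops stopwords, and strips a few suffixes. The stopword list, the suffix rules, and the function names are assumptions made for this example. The suffix stripper is a toy, not a real stemming algorithm such as Porter's.

```python
import re

# Assumed, deliberately tiny stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Assumed toy suffix rules; this is NOT the Porter stemmer.
SUFFIXES = ("ing", "es", "s")

def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text: str) -> list[str]:
    """Full text -> lowercase tokens -> drop stopwords -> stem -> unique terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [naive_stem(t) for t in tokens if t not in STOPWORDS]
    return sorted(set(stems))

if __name__ == "__main__":
    print(index_terms("Searching the indexes of documents"))
```

Each step discards information (case, function words, inflection), so the index becomes smaller and faster to search, at the cost of a coarser view of the document.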

Past, Present, and Future

Since clay tablets
  o Alphabetical index (formal)
  o Table of Contents (by storing order)
  o Classifications (by meaning)
Libraries
  o Automation of classical techniques. Catalogs.
  o Search by fields (exact match: author, title, keywords)
Web & Digital Libraries: interactive
  o Cheaper -> huge amount of data
  o Networks -> remote access, wider audience
  o Free publishing -> unprepared, heterogeneous data
Artificial Intelligence and Linguistic methods

Main concerns

Open audience
  o Help people to formulate their information need
  o Improve retrieval quality. Intelligent methods
Efficiency (speed)
  o Development of fast techniques
Interaction
  o Watch user behavior to improve quality
  o Privacy!
Open content
  o Legal issues. Copyright. Responsibility for info quality
  o Intelligent methods

Retrieval process

Database
  o Define the logical view: text operations, text model
Index (e.g., inverted file)
User query
  o Query operations (users are not good at this!)
Retrieved docs
  o Ranked by likelihood (relevance)
Feedback cycle
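
The index, query, and ranking stages of this process could look roughly like the Python sketch below. It makes several assumptions that are not in the slides: whitespace tokenization stands in for the logical view, and a count of matching query terms stands in for relevance. The function names and sample documents are hypothetical. The feedback cycle is not modeled.

```python
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Logical view used here: lowercase words split on whitespace."""
    return text.lower().split()

def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted file: map each term to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Rank documents by how many distinct query terms they contain."""
    scores: dict[str, int] = defaultdict(int)
    for term in set(tokenize(query)):
        # .get avoids creating empty postings for unknown terms.
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

if __name__ == "__main__":
    # Hypothetical documents, invented for illustration.
    docs = {
        "d1": "clay tablets and library catalogs",
        "d2": "web search engines index the web",
        "d3": "digital library search on the web",
    }
    index = build_inverted_index(docs)
    print(search(index, "web library search"))
```

The index is built once and each query only touches the postings of its own terms, which is why an inverted file keeps lookups fast even as the collection grows.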

The Textbook: Text IR

Models and Evaluation
  o Modeling (basic concepts)
  o Retrieval Evaluation
Improvements on Retrieval
  o Query Languages
  o Query Operations
  o Text Languages and Properties
  o Text Operations
Efficiency
  o Indexing and Searching

Conferences & Journals

Conferences on IR
  o IR
  o ACM SIGIR
  o TREC
  o SPIRE
Journal
  o IR
General conferences on text processing
  o ACL
  o COLING
  o CICLing
  o DEXA (databases)
  o NLDB

Conclusions

User Information Need
  o Vague
  o Semantic, not formal
Document Relevance
  o Order, not retrieve
Huge amount of information
  o Efficiency concerns
  o Tradeoffs
IR is art more than science

Thank you!