INFO624 -- Week 10 Futures of IR Systems Dr. Xia Lin Assistant Professor College of Information Science and Technology Drexel University

The Seven Ages of Information Retrieval

• Vannevar Bush's 1945 article set a goal of fast access to the contents of the world's libraries, which looks like it will be achieved by 2010, sixty-five years later.
• Bush's Prediction

IR Childhood (1945-1955)

• Ideas conceived
• Information explosion after World War II
• Possibility of information processing machine
• Memex
• The hardware seems mostly out of date: a user inserting 5000 pages per day into a personal repository, and it taking hundreds of years to fill it up.
• The software goals have not been achieved.

The Schoolboy (1960s)

• Many, many experiments
• Use of Precision and Recall
• Use of relevance feedback
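As a minimal sketch of the two measures named above (an illustration, not taken from the lecture): precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The function name `precision_recall` and the document ids are assumptions made up for this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: ids the system returned (hypothetical ids in this sketch)
    relevant:  ids judged relevant for the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data: 4 documents retrieved, 5 relevant, 2 in common.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7", "d8", "d9"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # prints precision=0.50 recall=0.40
```

Retrieving more documents tends to raise recall and lower precision, which is why the two are usually reported together.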

Adulthood (1970s)

• The invention of
  • word processing systems
  • time-sharing systems
• The beginning of the information industry
  • OCLC
  • DIALOG
  • BRS

Maturity (1980s)

Mid-Life Crisis (1990s)

• Internet put IR to the test.

• Better understanding of the limit of IR.

• Large scale evaluations

• Digital Libraries projects

Predictions

• Fulfillment (2000s)
• Retirement (2010)

IR research

[Diagram labels: IR System, Retrieval algorithms, Interface, Evaluation, User Satisfaction, System prototyping, Contents, Interaction, User]

Top Ten Research Issues

10. Relevance Feedback.
9. Information Extraction.
8. Multimedia Retrieval.
7. Effective Retrieval.
6. Routing and Filtering.
5. Interfaces and Browsing.
4. “Magic” (Vocabulary Mapping).
3. Efficient, Flexible Indexing and Retrieval.
2. Distributed IR.
1. Integrated Solutions.

A new Industry – Content Management

Examples of Advanced IR Systems

• Intelligent retrieval
• Natural language dialog
• Text mining
• Knowledge-based indexing

Intelligent Retrieval

• Expert systems
• Case-based reasoning
• Natural language understanding
• Automatic abstracting/summarizing

Natural Dialog with the Computer

• Use natural language
• Let the computer keep a profile of your life-long information seeking patterns
• Understand information received
• Understand user’s information needs

Text mining

• Information Summation
  • Automatic summary
  • Automatic translation
  • Automatic clustering and classification
• Knowledge Discovery
  • Associative query expansion
  • Use of domain knowledge
  • Explore document association
  • Generate visualization maps
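One way to picture associative query expansion is through term co-occurrence: terms that often appear in the same documents as a query term are treated as associated and added to the query. The sketch below is only an illustration under that assumption; the slides do not say which association measure is meant, and the names `build_cooccurrence` and `expand_query` and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(docs):
    """Count, for each term, how many documents it shares with every other term."""
    cooc = {}
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            cooc.setdefault(a, Counter())[b] += 1
            cooc.setdefault(b, Counter())[a] += 1
    return cooc


def expand_query(query, cooc, per_term=2):
    """Append up to `per_term` of the most strongly associated terms per query term."""
    original = query.lower().split()
    expanded = list(original)
    for term in original:
        # Sort by count (descending), then alphabetically so ties are deterministic.
        neighbours = sorted(cooc.get(term, {}).items(), key=lambda kv: (-kv[1], kv[0]))
        for candidate, _count in neighbours[:per_term]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Illustrative collection (made up for this sketch).
docs = [
    "heart attack symptoms",
    "heart attack treatment",
    "heart disease symptoms",
    "search engine ranking",
]
cooc = build_cooccurrence(docs)
print(expand_query("heart", cooc))  # prints ['heart', 'attack', 'symptoms']
```

A real system would more likely weight associations (for example by normalizing for term frequency) and draw on domain knowledge such as a thesaurus, rather than rely on raw counts.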

IR & Text Mining

• IR helps users find information that is already known and that exists in a document.
• TM attempts to examine a collection of documents and discover information not contained in any individual document in the collection.

Neural Network Computing

• Learn how the human brain works
• Create massive associative networks where each node has limited processing capability (intelligence)
• Make the network learn from its own processing patterns

Knowledge-based Indexing

• Automatic indexing + concept indexing.
• Index not only documents but also ideas.

From Internet to Semantic Web

• Internet
  • Digital files are connected together
• WWW
  • Web pages are connected
• Digital Library
  • Information collections are connected
• Knowledge Net
  • Ideas are connected

Personal Information Agent?

• Find me anything I ask it to find.
• Help me clarify what I really need.
• Let me know what information is available
• Help me convert information into knowledge

Human-Computer Collaboration

• The nature of information seeking requires the dialog between the user and the computer to be deeper than what HCI describes.
  • Interaction: to act upon one another
  • Collaboration: to work together purposefully in an intellectual endeavor
• Let the computer amplify and augment human intellect.

Search Engines in the News

• Searching the Web gets easier with engines that try to read your mind, US News & World Report, April 16, 2001
  Now Google saves lives! Someone wondering if they were having a heart attack did a search on Google, found a page explaining the symptoms, then got to a hospital for help. Without Google, "I'd be dead today," he's quoted as saying. A look at various search engines, search technology and tips.

Search Engines in the news

• Desperately Seeking Search Technology (Commentary), BusinessWeek Online, September 24, 2001, by Robert D. Hof
  After e-mail, people spend the most time online searching for stuff. Market researcher Jupiter Media Metrix Inc. found that 80% of online users will abandon a site if the search function doesn't work well. Says Martha M. Frey, an analyst at market researcher Patricia Seybold Group: "You could make a case that the main reason e-commerce is unprofitable is that the power of search has been overlooked."
• Building Web Sites With Depth, Web Techniques, February 2001, by Jakob Nielsen and Marie Tahir
  Search is one of the most common, and one of the least successful, ways that users look for things on the Web. Search is often as bad as the worst salesperson or customer service representative.

Google vs. SearchKing

• An ad marketer took Google to court over changes the popular search engine made to its page-ranking system.
  • When Google changed its PageRank, SearchKing lost business
• Should the power of Google be regulated?
  • Like a public utility or a highway?

Google vs. SearchKing

• Google vs. SearchKing: The United States District Court in Oklahoma gave Google First Amendment protection in SearchKing's suit against Google for harm to its business from changes in Google's page ranking algorithms. SearchKing is a link farm for its customers. Last year its position in Google results dropped dramatically as Google adjusted its ranking rules. Google argued that PageRanks are opinions and protected by free speech. It won that point, but the Court did say that Google manually lowered SearchKing's PageRank. (Jan. 27, 2003)

Google vs. Evil

• Daniel Brandt, who runs google-watch.org, attacked PageRank, accusing the company of being unfair and undemocratic. Brandt urged the FTC to investigate Google and regulate it as a public utility, as a company that, in effect, controls access to the Internet's natural resources.

Finally ….

• Recommended Readings on Futures of Search Engine Technologies