2004.10.12 - SLIDE 1 - IS 202 – FALL 2004
Lecture 13: Midterm Review
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2004
http://www.sims.berkeley.edu/academics/courses/is202/f04/
SIMS 202: Information Organization and Retrieval
2004.10.12 - SLIDE 2 - IS 202 – FALL 2004
Lecture Overview
• Midterm Review
  – The administrative details
  – The “Rules” for the exam
  – We will go through the sample questions and discuss them
  – Open question/answer period
2004.10.12 - SLIDE 4 - IS 202 – FALL 2004
Midterm Exam Details
• Date: 10/14/2004 Time: 10:30-12:00
• The exam is open-book, open-note AND open-computer
• There will be 8-10 questions on the exam
• You may use your own laptop, or one of the computers in the lab. Your work must be printed
• The exam can be hand-written if you wish; if so, be sure to bring:
  – Pens/Pencils
  – Calculator
  – (Paper will be provided on the exam itself, but you may want to bring scratch paper)
2004.10.12 - SLIDE 5 - IS 202 – FALL 2004
Midterm Exam Details
• The exam will cover the first half of the course; that is, it will focus primarily on the Information Retrieval topics
• Each question will be worth a specific number of points, stated on the exam itself
• Partial credit will be awarded for partial answers
• In your answers, please balance conciseness with coverage of all of the requested information
  – In other words, don't write a lot of things that aren't asked for, but try to address all of what is asked for
2004.10.12 - SLIDE 7 - IS 202 – FALL 2004
Rules
• Do your own work
• No discussion during the exam
  – Yes, IM counts as discussion!
  – Yes, email counts as discussion!
• You are on your honor not to look at other students' work (you may want to review the University policies on academic dishonesty)
• PROVIDE PROPER ATTRIBUTION for ideas taken from other sources (online or printed)
2004.10.12 - SLIDE 8 - IS 202 – FALL 2004
Rules
• Questions CAN and SHOULD be asked of me or the TAs
• Issues, corrections, and answers on details will be put up on the screens in 202
• We will also put these up on a web page for those in the Lab
2004.10.12 - SLIDE 10 - IS 202 – FALL 2004
Study Guide
• To study for the exam:
• Be sure you understand the material that was covered in lectures and have read and absorbed the corresponding material in the readings
• Be sure you can do activities similar to what was done in the homework assignments
• We will have questions that require you to generalize from what you've learned and synthesize ideas
  – So be sure you have thought about the ideas covered in lecture, readings, and homework assignments
2004.10.12 - SLIDE 11 - IS 202 – FALL 2004
Study Guide
• Alison suggests that you might want to bookmark online or printed resources so that you can quickly find the topics that you need
2004.10.12 - SLIDE 12 - IS 202 – FALL 2004
Example Questions
• These are available on the Class Web site
• Note that these examples are NOT the exact questions that will be on the exam but are similar to questions that have been used in the past
• There will be questions that ask you to do something with supplied data
  – For example, given some data, design an ER diagram describing the data elements and their relationships
2004.10.12 - SLIDE 13 - IS 202 – FALL 2004
Example Questions
• The example questions on the web site are organized (approximately) in the order that the topics were presented during the course:
  – Information
  – The Search Process
  – Documents and Statistics of Text
  – Queries, Ranking, and the Vector Space Model
  – IR Systems and Implementation
  – Relevance Feedback
  – Evaluation of IR Systems
  – Database Design
2004.10.12 - SLIDE 14 - IS 202 – FALL 2004
(Approximate) Course Schedule
• Organization
  – Phone Project Introduction
  – Categorization
  – Knowledge Representation
  – Lexical Relations and WordNet
  – Metadata Introduction
  – Controlled Vocabularies Introduction
  – Facetted Classification
  – Thesaurus Design and Construction
  – Semantic Web
  – Multimedia Information Organization and Retrieval
  – Metadata for Media
  – Phone Project Presentations
• Retrieval
  – Overview
  – Introduction to the Search Process
  – Boolean Queries and Text Processing
  – Web Search Issues and Architecture
  – Statistical Properties of Text and Vector Representation
  – Probabilistic Ranking & Relevance Feedback
  – Evaluation
  – Interfaces for Information Retrieval
  – Database Design
2004.10.12 - SLIDE 15 - IS 202 – FALL 2004
Review of Course Content
• We can draw on:
  – 14 sets of Slides (including this one and the Math Review slides)
  – Handout papers
  – The Reader
  – Textbooks
  – Assignments
  – Discussion questions and issues
2004.10.12 - SLIDE 16 - IS 202 – FALL 2004
Example Questions
• Topic: Information
• Example Questions:
  – What is the information life cycle?
  – What are different ways of measuring information? What are different ways of defining information?
2004.10.12 - SLIDE 17 - IS 202 – FALL 2004
Example Questions
• Topic: Document Representation and Statistical Properties of Text
• Example Questions:
  – What is the significance of Zipf's law for weighting of terms in information retrieval?
  – What kinds of errors can a stemming algorithm produce?
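As a study aid for the Zipf's law question, here is a minimal sketch, not the course's own method. Zipf's law says a word's frequency is roughly inversely proportional to its frequency rank, so a few terms occur almost everywhere and discriminate poorly between documents. One common response is inverse document frequency (IDF) weighting. The toy corpus and the function name `idf` are assumptions made up for illustration.

```python
import math
from collections import Counter

# Toy corpus (hypothetical documents, invented for this example).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

# Rank-frequency table: under Zipf's law, frequency falls off roughly as 1/rank.
counts = Counter(word for doc in docs for word in doc.split())
ranked = counts.most_common()  # list of (term, frequency), most frequent first


def idf(term, documents):
    """Inverse document frequency, log(N / df); 0.0 if the term never occurs."""
    df = sum(1 for doc in documents if term in doc.split())
    return math.log(len(documents) / df) if df else 0.0


for term, freq in ranked[:3]:
    print(term, freq, round(idf(term, docs), 3))
```

The top-ranked term occurs in every document, so its IDF is zero, while a term found in a single document gets the largest weight.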
2004.10.12 - SLIDE 18 - IS 202 – FALL 2004
Example Questions
• Topic: Queries, Ranking, and the Vector Space Model
• Example Questions:
  – What is the difference between a search engine that uses the vector space ranking algorithm on natural language queries and a system that uses Boolean queries?
  – What is the role of coordination level ranking in a faceted Boolean system?
  – Describe the following information need in terms of a faceted Boolean query. What kinds of weighting algorithms can be applied to a faceted query like this? “I would like to find articles about the effects of the passage of the independent investigator statute by Congress on how the U.S. president chooses an attorney general.”
  – Why do different web search engines return different sets of documents for the same query?
  – Redo the computations of Assignment 3 part 3 using different values for TF.
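To make the first question concrete, the sketch below contrasts the two approaches on a toy collection. It is an illustration under stated assumptions, not the ranking used in Assignment 3: vectors hold raw term frequencies (no IDF), similarity is the cosine measure, and the documents and helper names (`tf_vector`, `cosine`, `rank`, `boolean_and`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical mini-collection, invented for this example.
docs = {
    "d1": "congress passed the independent investigator statute",
    "d2": "the president chooses an attorney general",
    "d3": "statute limits how president chooses attorney general",
}


def tf_vector(text):
    """Bag-of-words vector using raw term frequency as the weight."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, documents):
    """Vector space retrieval: every document gets a score and a rank."""
    q = tf_vector(query)
    scores = {d: cosine(q, tf_vector(text)) for d, text in documents.items()}
    return sorted(scores, key=scores.get, reverse=True)


def boolean_and(query, documents):
    """Boolean AND retrieval: an unranked set of documents with all terms."""
    terms = set(query.split())
    return sorted(d for d, text in documents.items() if terms <= set(text.split()))


query = "statute president attorney general"
print(rank(query, docs))
print(boolean_and(query, docs))
```

The vector space version orders all three documents by partial match, while the Boolean AND version returns only the document that contains every query term and says nothing about order.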
2004.10.12 - SLIDE 19 - IS 202 – FALL 2004
Example Questions
• Topic: IR Systems and Implementation
• Example Questions:
  – Draw and label a diagram that shows the major components of an IR system.
  – What are the special features of the Cheshire II information access system?
  – What is the purpose of an inverted index? How is it used to generate answers to Boolean queries?
  – Convert the contents of a set of documents (short texts) into an inverted index representation.
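For the last two questions, a minimal sketch of an inverted index may help. It assumes whitespace tokenization, lowercasing, and no stemming or stop-word removal; the three short documents and the function names are hypothetical. Each term maps to the set of document IDs that contain it (its postings), and Boolean operators become set operations on those postings.

```python
from collections import defaultdict

# Hypothetical short texts keyed by document ID.
docs = {
    1: "information retrieval systems",
    2: "database systems design",
    3: "information systems design",
}


def build_inverted_index(documents):
    """Map each term to the set of document IDs in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents containing every term: intersect the postings sets."""
    postings = [index.get(t, set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


def boolean_or(index, *terms):
    """Documents containing at least one term: union the postings sets."""
    return sorted(set().union(*(index.get(t, set()) for t in terms)))


index = build_inverted_index(docs)
print(boolean_and(index, "information", "design"))
print(boolean_or(index, "retrieval", "database"))
```

The point of the structure is that a query touches only the postings of its own terms rather than scanning every document.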
2004.10.12 - SLIDE 20 - IS 202 – FALL 2004
Example Questions
• Topic: Evaluation of IR Systems
• Example Questions:
  – Define precision. Define recall. Define relevance. How are the three interrelated?
  – Under what circumstances is high recall desirable? Under what circumstances is high precision desirable?
  – What is the main purpose of TREC? How does it differ from earlier evaluation efforts?
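The standard set-based definitions can be sketched as follows: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The document IDs are made up, and the function name `precision_recall` is an assumption for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using set-based definitions."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical run: 4 documents retrieved, 5 judged relevant, 2 in common.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5", "d6", "d7"],
)
print(p, r)
```

Both measures depend on the relevance judgments, which is how the three concepts in the first question connect.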
2004.10.12 - SLIDE 21 - IS 202 – FALL 2004
Example Questions
• Topic: The Search Process
• Example Questions:
  – Search and retrieval is part of a larger process. Name some other components of that process.
  – How/why doesn't the Bates berry-picking model fit with the standard information retrieval model?
  – How (fundamentally) does search on a directory system like Yahoo differ from search on Altavista or Google?
2004.10.12 - SLIDE 22 - IS 202 – FALL 2004
Example Questions
• Topic: Relevance Feedback
• Example Questions:
  – What is the main difference between relevance feedback as defined in the literature and the more current web-based notion of "more like this"?
  – Given a query, three documents marked as relevant, and the Rocchio formula for relevance feedback given in class, compute the vector for the new query that results.
  – The Koenemann & Belkin study found results in three conditions for relevance feedback: opaque, transparent, and penetrable. Consider the different ways people have implemented systems for predicting which web page to show the user next. How do the differences in these systems correspond to the different relevance feedback
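For the Rocchio question, the sketch below uses the commonly cited form of the formula: the new query is alpha times the original query, plus beta times the centroid of the relevant documents, minus gamma times the centroid of the non-relevant documents. The exact variant and weights given in class may differ. The default weights, the clipping of negative weights to zero, and the example vectors are all assumptions made for illustration.

```python
def rocchio(query, relevant, nonrelevant=(), alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: alpha*q + beta*centroid(rel) - gamma*centroid(nonrel).

    Vectors are equal-length lists of term weights. The default alpha, beta,
    and gamma are assumed values, not necessarily those used in the course.
    """
    dims = len(query)

    def centroid(vectors):
        # Average the vectors dimension by dimension; zero vector if none given.
        if not vectors:
            return [0.0] * dims
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

    rel_c = centroid(relevant)
    non_c = centroid(nonrelevant)
    # Negative weights are clipped to zero, a common (assumed) convention.
    return [max(0.0, alpha * q + beta * r - gamma * n)
            for q, r, n in zip(query, rel_c, non_c)]


# Hypothetical 3-term vocabulary, one query, three documents marked relevant.
new_query = rocchio(
    query=[1, 0, 0],
    relevant=[[1, 1, 0], [0, 1, 0], [2, 1, 3]],
    alpha=1.0, beta=1.0, gamma=0.0,
)
print(new_query)
```

With beta set to 1 and no non-relevant documents, the new query is simply the original query plus the average of the three relevant document vectors.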
2004.10.12 - SLIDE 23 - IS 202 – FALL 2004
Example Questions
• Topic: Database Design
• Example Questions:
  – How is a database different from a file system?
  – What are the benefits of a database system?
  – What do we mean by data independence?
  – What are the benefits/drawbacks of the primary database models?
  – Entity-Relationship Diagrams: what are they for, and how do you create them?
  – How do you normalize a relational model database?
  – What is a join?
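As one way to picture the last question, the sketch below builds two small normalized tables in an in-memory SQLite database and joins them on a shared key. The schema (`student`, `enrollment`) and all data are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course     TEXT NOT NULL
    );
    INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO enrollment VALUES (1, 'Course A'), (2, 'Course A'), (1, 'Course B');
""")

# An inner join pairs each student row with the enrollment rows
# that carry the same student_id.
rows = conn.execute("""
    SELECT student.name, enrollment.course
    FROM student
    JOIN enrollment ON enrollment.student_id = student.student_id
    ORDER BY student.name, enrollment.course
""").fetchall()

for name, course in rows:
    print(name, course)
conn.close()
```

Keeping names in one table and enrollments in another avoids repeating a student's name per course, and the join recombines the two when a query needs both.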
2004.10.12 - SLIDE 25 - IS 202 – FALL 2004
Your Questions
• What other topics would you like more explanation for?