

2004.10.12 - SLIDE 1 IS 202 – FALL 2004

Lecture 13: Midterm Review

Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday 10:30 am - 12:00 pm

Fall 2004

http://www.sims.berkeley.edu/academics/courses/is202/f04/

SIMS 202: Information Organization and Retrieval

2004.10.12 - SLIDE 2 IS 202 – FALL 2004

Lecture Overview

• Midterm Review
– The administrative details
– The “Rules” for the exam
– We will go through the sample questions and discuss them
– Open question/answer period

2004.10.12 - SLIDE 3 IS 202 – FALL 2004

Lecture Overview

• Midterm Review
– The administrative details
– The “Rules” for the exam
– We will go through the sample questions and discuss them
– Open question/answer period

2004.10.12 - SLIDE 4 IS 202 – FALL 2004

Midterm Exam Details

• Date: 10/14/2004; Time: 10:30-12:00
• The exam is open-book, open-note AND open-computer
• There will be 8-10 questions on the exam
• You may use your own laptop or one of the computers in the lab. The results of your work are to be printed
• The exam can be hand-written if you wish; if so, be sure to bring:
– Pens/Pencils
– Calculator
– (Paper will be provided on the exam itself, but you may want to bring scratch paper)

2004.10.12 - SLIDE 5 IS 202 – FALL 2004

Midterm Exam Details

• The exam will cover the first half of the course; that is, it will focus primarily on the topics covered concerning Information Retrieval
• Questions will be worth a specific number of points, which will be stated on the exam itself
• Partial credit will be awarded for partial answers
• In your answers, please balance conciseness with illustration of all of the requested information
– In other words, don't write a lot of things that aren't asked for, but try to address all of what is asked for

2004.10.12 - SLIDE 6 IS 202 – FALL 2004

Lecture Overview

• Midterm Review
– The administrative details
– The “Rules” for the exam
– We will go through the sample questions and discuss them
– Open question/answer period

2004.10.12 - SLIDE 7 IS 202 – FALL 2004

Rules

• Do your own work
• No discussion during the exam
– Yes, IM counts as discussion!
– Yes, email counts as discussion!
• You are on your honor not to look at other students’ work (you may want to review the University policies on academic dishonesty)
• PROVIDE PROPER ATTRIBUTION for ideas taken from other sources (online or printed)

2004.10.12 - SLIDE 8 IS 202 – FALL 2004

Rules

• Questions CAN and SHOULD be asked of me or the TAs

• Issues/Corrections/Answers for details will be put up on the screens in 202

• We will also put these up on a web page for those in the Lab

2004.10.12 - SLIDE 9 IS 202 – FALL 2004

Lecture Overview

• Midterm Review
– The administrative details
– The “Rules” for the exam
– We will go through the sample questions and discuss them
– Open question/answer period

2004.10.12 - SLIDE 10 IS 202 – FALL 2004

Study Guide

• To study for the exam:
• Be sure you understand the material that was covered in lectures and have read and absorbed the corresponding material in the readings
• Be sure you can do activities similar to what was done in the homework assignments
• We will have questions that require you to generalize from what you've learned and synthesize ideas
– So be sure you have thought about the ideas covered in lecture, readings, and homework assignments

2004.10.12 - SLIDE 11 IS 202 – FALL 2004

Study Guide

• Alison suggests that you might want to bookmark online or printed resources so that you can quickly find the topics that you need

2004.10.12 - SLIDE 12 IS 202 – FALL 2004

Example Questions

• These are available on the Class Web site
• Note that these examples are NOT the exact questions that will be on the exam but are similar to questions that have been used in the past
• There will be questions that ask you to do something with supplied data
– For example, given some data, design an ER diagram describing the data elements and their relationships

2004.10.12 - SLIDE 13 IS 202 – FALL 2004

Example Questions

• The example questions on the web site are organized (approximately) in the order that the topics were presented during the course:
– Information
– The Search process
– Documents and Statistics of Text
– Queries, Ranking, and the Vector Space Model
– IR Systems and Implementation
– Relevance Feedback
– Evaluation of IR Systems
– Database Design

2004.10.12 - SLIDE 14 IS 202 – FALL 2004

(Approximate) Course Schedule

• Organization
– Phone Project Introduction
– Categorization
– Knowledge Representation
– Lexical Relations and WordNet
– Metadata Introduction
– Controlled Vocabularies Introduction
– Facetted Classification
– Thesaurus Design and Construction
– Semantic Web
– Multimedia Information Organization and Retrieval
– Metadata for Media
– Phone Project Presentations

• Retrieval
– Overview
– Introduction to the Search Process
– Boolean Queries and Text Processing
– Web Search Issues and Architecture
– Statistical Properties of Text and Vector Representation
– Probabilistic Ranking & Relevance Feedback
– Evaluation
– Interfaces for Information Retrieval
– Database Design

2004.10.12 - SLIDE 15 IS 202 – FALL 2004

Review of Course Content

• We can draw on:
– 14 sets of Slides (including this one and the Math Review slides)
– Handout papers
– The Reader
– Textbooks
– Assignments
– Discussion questions and issues

2004.10.12 - SLIDE 16 IS 202 – FALL 2004

Example Questions

• Topic: Information
• Example Questions:
– What is the information life cycle?
– What are different ways of measuring information? What are different ways of defining information?

2004.10.12 - SLIDE 17 IS 202 – FALL 2004

Example Questions

• Topic: Document Representation and Statistical Properties of Text
• Example Questions:
– What is the significance of Zipf's law for weighting of terms in information retrieval?
– What kinds of errors can a stemming algorithm produce?
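One common way to connect Zipf's law to term weighting is sketched below, assuming a tiny collection invented for illustration. Zipf's law says a term's frequency is roughly inversely proportional to its frequency rank, so a handful of terms (like "the") account for a large share of all occurrences. Such terms appear in nearly every document and discriminate poorly, which inverse document frequency (IDF) weighting, log(N/df), reflects by giving them weights near zero. The function names are hypothetical.

```python
import math
from collections import Counter

# Toy collection invented for illustration; not course data.
DOCS = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell",
]


def rank_frequency(docs):
    """Collection term frequencies, most to least frequent (Zipf's rank order)."""
    counts = Counter(word for doc in docs for word in doc.split())
    return counts.most_common()


def idf(term, docs):
    """Inverse document frequency log(N / df); 0.0 if the term is in every document."""
    df = sum(1 for doc in docs if term in doc.split())
    return math.log(len(docs) / df) if df else 0.0


if __name__ == "__main__":
    print(rank_frequency(DOCS)[:3])                # "the" heads the ranking
    print(idf("the", DOCS), idf("market", DOCS))   # 0.0 vs log(3)
```

The most frequent term gets the lowest IDF, while rarer terms such as "market" get higher weights.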

2004.10.12 - SLIDE 18 IS 202 – FALL 2004

Example Questions

• Topic: Queries, Ranking, and the Vector Space Model
• Example Questions:

– What is the difference between a search engine that uses the vector space ranking algorithm on natural language queries and a system that uses Boolean queries?

– What is the role of coordination level ranking in a faceted Boolean system?

– Describe the following information need in terms of a faceted Boolean query. What kinds of weighting algorithms can be applied to a faceted query like this? “I would like to find articles about the effects of the passage of the independent investigator statute by Congress on how the U.S. president chooses an attorney general.”

– Why do different web search engines return different sets of documents for the same query?

– Redo the computations of Assignment 3 part 3 using different values for TF.
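To make the contrast in these questions concrete, here is a minimal sketch under stated assumptions: a three-document toy collection (invented), whitespace tokenization, raw term frequency (TF) weights, and cosine similarity. Boolean AND returns an unranked set; coordination level ranking orders documents by how many query terms they match; the vector space model ranks by cosine similarity of TF vectors. The optional `tf` parameter is a hypothetical hook for trying a different TF scheme; it does not reproduce the assignment's actual data.

```python
import math
from collections import Counter

# Toy collection invented for illustration; not data from the course.
DOCS = {
    "d1": "congress passed the independent investigator statute",
    "d2": "the president chooses an attorney general",
    "d3": "congress and the president debate the attorney general",
}


def tokens(text):
    """Lowercase whitespace tokenization (no stemming, no stop list)."""
    return text.lower().split()


def boolean_and(query_terms, docs):
    """Unranked Boolean AND: the set of documents containing every query term."""
    return {d for d, text in docs.items() if set(query_terms) <= set(tokens(text))}


def coordination_level(query_terms, docs):
    """Rank documents by how many distinct query terms they contain."""
    scores = {d: len(set(query_terms) & set(tokens(text))) for d, text in docs.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def tf_vector(words, tf=lambda count: count):
    """Sparse term-weight vector; pass a different `tf` to try other TF schemes."""
    return {term: tf(count) for term, count in Counter(words).items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def vector_rank(query, docs, tf=lambda count: count):
    """Rank every document by cosine similarity to a natural-language query."""
    q = tf_vector(tokens(query), tf)
    scores = {d: cosine(q, tf_vector(tokens(text), tf)) for d, text in docs.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    query = "congress president attorney general"
    print(boolean_and(tokens(query), DOCS))         # only d3 matches all four terms
    print(coordination_level(tokens(query), DOCS))  # d3: 4, d2: 3, d1: 1
    print(vector_rank(query, DOCS))                 # raw TF weights
    print(vector_rank(query, DOCS, tf=lambda c: 1 + math.log(c)))  # log-scaled TF
```

Here Boolean AND returns only d3 and gives no ordering. Coordination level ranking and cosine ranking both order all three documents (d3, d2, d1), but cosine also takes term weights and document length into account.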

2004.10.12 - SLIDE 19 IS 202 – FALL 2004

Example Questions

• Topic: IR systems and Implementation
• Example Questions:

– Draw and label a diagram that shows the major components of an IR system.

– What are the special features of the Cheshire II information access system?

– What is the purpose of an inverted index? How is it used to generate answers to Boolean queries?

– Convert the contents of a set of documents (short texts) into an inverted index representation.
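Below is a minimal sketch of the last two questions, assuming whitespace tokenization and three hypothetical short texts. Each term maps to a sorted postings list of document IDs. A Boolean AND is answered by intersecting postings lists, OR by taking their union, and AND NOT by removing one list's documents from another, so the documents themselves never need to be rescanned. Real systems also store positions, frequencies, and compression, which this sketch omits.

```python
from collections import defaultdict

# Hypothetical short texts standing in for "a set of documents".
DOCS = {
    1: "information retrieval systems",
    2: "database systems and information organization",
    3: "retrieval evaluation with precision and recall",
}


def build_inverted_index(docs):
    """Map each term to a sorted postings list of the document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """AND: merge two sorted postings lists in O(len(p1) + len(p2)) steps."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def union(p1, p2):
    """OR: every document in either postings list."""
    return sorted(set(p1) | set(p2))


def and_not(p1, p2):
    """AND NOT: documents in p1 that are not in p2."""
    excluded = set(p2)
    return [d for d in p1 if d not in excluded]


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    print(index["systems"])                                        # [1, 2]
    print(intersect(index["information"], index["retrieval"]))     # [1]
    print(union(index["retrieval"], index.get("database", [])))    # [1, 2, 3]
    print(and_not(index["information"], index.get("database", [])))  # [1]
```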

2004.10.12 - SLIDE 20 IS 202 – FALL 2004

Example Questions

• Topic: Evaluation of IR Systems
• Example Questions:
– Define precision. Define recall. Define relevance. How are the three interrelated?
– Under what circumstances is high recall desirable? Under what circumstances is high precision?

– What is the main purpose of TREC? How does it differ from earlier evaluation efforts?
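The standard set-based definitions can be checked with a small sketch. Here `relevant` stands for the documents judged relevant (the relevance judgments) and `retrieved` for what the system returned. The document IDs are invented. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that were retrieved, so both depend on how relevance was judged.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    retrieved = range(1, 11)                 # system returned documents 1..10
    relevant = {2, 4, 6, 8, 20, 21, 22, 23}  # judged relevant (hypothetical)
    print(precision_recall(retrieved, relevant))  # (0.4, 0.5)
```

Retrieving more documents tends to raise recall at the cost of precision, which is why the "high recall vs. high precision" question depends on the user's task.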

2004.10.12 - SLIDE 21 IS 202 – FALL 2004

Example Questions

• Topic: The Search Process
• Example Questions:

– Search and retrieval is part of a larger process. Name some other components of that process.

– How/why doesn't the Bates berry-picking model fit with the standard information retrieval model?

– How (fundamentally) does search on a directory system like Yahoo differ from search on Altavista or Google?

2004.10.12 - SLIDE 22 IS 202 – FALL 2004

Example Questions

• Topic: Relevance Feedback
• Example Questions:

– What is the main difference between relevance feedback as defined in the literature and the more current web-based notion of "more like this"?

– Given a query, three documents marked as relevant, and the Rocchio formula for relevance feedback given in class, compute the vector for the new query that results.

– The Koenemann & Belkin study reported results for three relevance feedback conditions: opaque, transparent, and penetrable. Consider the different ways people have implemented systems for predicting which web page to show the user next. How do the differences in these systems correspond to the different relevance feedback conditions?
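The Rocchio computation in the second question can be sketched as follows, assuming the common form q' = α·q + β·(centroid of relevant documents) − γ·(centroid of non-relevant documents), with negative weights clipped to zero. The defaults α = 1, β = 0.75, γ = 0.15 are typical textbook values and not necessarily the constants given in class; substitute the class's formula when working the exam question. The three-term vectors are invented.

```python
def rocchio(query, relevant, nonrelevant=(), alpha=1.0, beta=0.75, gamma=0.15):
    """Return q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant).

    Vectors are equal-length lists of term weights; negative results are
    clipped to 0.0 (a common, but assumed, convention).
    """
    n = len(query)

    def centroid(vectors):
        # Component-wise mean; an all-zero vector when no documents are given.
        if not vectors:
            return [0.0] * n
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]

    rel = centroid(relevant)
    non = centroid(nonrelevant)
    return [max(0.0, alpha * query[i] + beta * rel[i] - gamma * non[i]) for i in range(n)]


if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]
    relevant_docs = [[2, 1, 0], [0, 1, 0], [1, 1, 3]]  # three documents marked relevant
    print(rocchio(query, relevant_docs))  # [1.75, 0.75, 0.75]
```

The relevant centroid here is [1, 1, 1], so the new query keeps its original term and gains weight on the two terms the relevant documents share.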

2004.10.12 - SLIDE 23 IS 202 – FALL 2004

Example Questions

• Topic: Database Design
• Example Questions:
– How is a database different from a file system?
– What are the benefits of a database system?
– What do we mean by data independence?
– What are the benefits/drawbacks of the primary database models?
– Entity-Relationship Diagrams: what are they for, and how do you create them?
– How do you normalize a relational model database?
– What is a join?
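One way to see what a join does is sketched below with Python's built-in `sqlite3` module and an in-memory database. The `author` and `document` tables and their rows are hypothetical. Because each author's name is stored only once in `author` (a normalized design), the join on `author_id` is what reassembles each document with its author's name.

```python
import sqlite3


def demo_join():
    """Build two hypothetical tables in memory and return their inner join."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE document (doc_id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                               author_id INTEGER REFERENCES author(author_id));
        INSERT INTO author VALUES (1, 'Smith'), (2, 'Jones');
        INSERT INTO document VALUES (10, 'Zipf and term weighting', 1),
                                    (11, 'Inverted files', 1),
                                    (12, 'Relevance feedback', 2);
    """)
    # Inner join: pair each document row with the author row whose key matches.
    rows = cur.execute("""
        SELECT d.title, a.name
        FROM document AS d
        JOIN author AS a ON d.author_id = a.author_id
        ORDER BY d.doc_id
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for title, name in demo_join():
        print(title, "-", name)
```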

2004.10.12 - SLIDE 24 IS 202 – FALL 2004

Lecture Overview

• Midterm Review
– The administrative details
– The “Rules” for the exam
– We will go through the sample questions and discuss them
– Open question/answer period

2004.10.12 - SLIDE 25 IS 202 – FALL 2004

Your Questions

• What other topics would you like more explanation for?

2004.10.12 - SLIDE 26 IS 202 – FALL 2004

Be prepared, and good luck!