Bringing Order to the Web: Automatically Categorizing Search Results

Advisor: Dr. Hsu
Graduate: Keng-Wei Chang
Authors: Hao Chen, Susan Dumais

Outline

Motivation
Objective
Introduction
Related Work
Text Classification
User Interface
User Study
Conclusions
Personal Opinion

Motivation

With the exponential growth of the Internet, it has become increasingly difficult to find information.

Most web search services return a ranked list of web pages in response to a user’s search request.

Web pages on different topics, or on different aspects of the same topic, are mixed together in the returned list.

Objective

To combine the advantages of the structured topic information in directories with the broad coverage of search engines, we built a system that takes the web pages returned by a search engine and classifies them into a known hierarchical structure such as LookSmart’s Web directory.

Introduction

Web search services such as AltaVista, InfoSeek, and MSNWebSearch help people to find information on the web.

Most of these systems return a ranked list of web pages in response to a user’s search request.

Introduction

The system consists of two main components:

A text classifier that categorizes web pages on the fly.
A user interface that presents the web pages within the category structure and allows the user to manipulate the structured view.

Related Work

Generating structure
Using structure to support search

Generating structure

Three general techniques have been used to organize documents into topical contexts:

Structural information (metadata) associated with each document
Clustering
Classification

Using structure to support search

A statistical text classification model is trained offline on a representative sample of Web pages with known category labels.

At query time, new search results are quickly classified on the fly into the learned category structure.

Benefits: the category labels are known and consistent, and new items are easily incorporated into the structure.
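The offline/online split described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system in the paper: a Rocchio-style nearest-centroid classifier stands in for the statistical model, search results are (title, snippet) pairs, and the function names (`train_offline`, `classify`, `organize_results`) and toy category labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real system would also normalize punctuation.
    return text.lower().split()


def train_offline(labeled_pages):
    """Build one averaged term-frequency vector (centroid) per category.

    labeled_pages: iterable of (text, category) pairs with known labels.
    A nearest-centroid model stands in for the statistical classifier (assumption).
    """
    sums = defaultdict(Counter)
    counts = Counter()
    for text, label in labeled_pages:
        sums[label].update(tokenize(text))
        counts[label] += 1
    return {label: {term: n / counts[label] for term, n in terms.items()}
            for label, terms in sums.items()}


def cosine(a, b):
    dot = sum(v * b.get(term, 0.0) for term, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(model, text):
    # Query time: pick the category whose centroid is most similar to the text.
    vector = Counter(tokenize(text))
    return max(model, key=lambda label: cosine(vector, model[label]))


def organize_results(model, results):
    """Group (title, snippet) search results by predicted category."""
    groups = defaultdict(list)
    for title, snippet in results:
        groups[classify(model, f"{title} {snippet}")].append(title)
    return dict(groups)


if __name__ == "__main__":
    model = train_offline([
        ("jaguar car engine dealer", "Autos"),
        ("luxury sedan car review", "Autos"),
        ("big cat rainforest habitat", "Animals"),
        ("jaguar cat species prey", "Animals"),
    ])
    print(organize_results(model, [
        ("Jaguar XJ review", "luxury sedan engine"),
        ("Jaguar facts", "big cat of the rainforest"),
    ]))  # {'Autos': ['Jaguar XJ review'], 'Animals': ['Jaguar facts']}
```

Because the model is built once, placing a new search result costs only one call to `classify`, which is what keeps classification at query time cheap.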

Text Classification Data Set

A collection of web pages from LookSmart’s Web Directory

13 top-level categories, 150 second-level categories, and over 17,000 categories in total.
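How the directory is stored is not described in the slides. As a hedged sketch, a category hierarchy can be held as a simple tree and counted level by level; the `Category` class, the `count_by_level` function, and the category names in the example are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a topic hierarchy such as a Web directory."""
    name: str
    children: list[Category] = field(default_factory=list)

    def add(self, name: str) -> Category:
        child = Category(name)
        self.children.append(child)
        return child


def count_by_level(root: Category) -> dict[int, int]:
    """Count categories at each depth below the root (1 = top level)."""
    counts: dict[int, int] = {}
    stack = [(child, 1) for child in root.children]
    while stack:
        node, depth = stack.pop()
        counts[depth] = counts.get(depth, 0) + 1
        stack.extend((child, depth + 1) for child in node.children)
    return counts


if __name__ == "__main__":
    # Hypothetical names; the full directory would give {1: 13, 2: 150, ...}.
    root = Category("Directory")
    first = root.add("Top-level A")
    first.add("Sub A1")
    first.add("Sub A2")
    root.add("Top-level B").add("Sub B1")
    print(count_by_level(root))  # {1: 2, 2: 3}
```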

Text Classification Pre-processing

Plain text was extracted from each web page. In addition, the title, description, keyword, and image tag fields were extracted if they existed.
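The field extraction above can be approximated with Python’s standard-library `html.parser`. This is a sketch under assumptions: “description” and “keyword” are read from `<meta name=...>` tags, “image tag” is read as the `alt` text of `<img>` elements, and text inside `<script>`/`<style>` is dropped. The `PageExtractor` class and `extract` function are hypothetical names.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect visible text plus title, meta description/keywords and <img alt> text."""

    def __init__(self):
        super().__init__()
        self.title, self.body, self.image_alt = [], [], []
        self.meta = {"description": "", "keywords": ""}
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in self.meta:
                self.meta[name] = attrs.get("content") or ""
        elif tag == "img" and attrs.get("alt"):
            self.image_alt.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        (self.title if self._in_title else self.body).append(text)


def extract(html):
    """Return the plain text and the optional fields of one page."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": " ".join(parser.title),
        "description": parser.meta["description"],
        "keywords": parser.meta["keywords"],
        "image_alt": " ".join(parser.image_alt),
        "body": " ".join(parser.body),
    }
```

Fields that are absent from a page come back as empty strings, matching “extracted if they existed”.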

Text Classification Classification

A Support Vector Machine (SVM) algorithm was used as the classifier.

13,352 pre-classified web pages were used to train the model for the 13 top-level categories, and between 1,985 and 10,431 pages were used for the second-level categories.
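The slides state only that an SVM was used. As a hedged sketch, the block below trains one linear SVM per category (one-vs-rest) on binary word-presence features, using the Pegasos stochastic sub-gradient method as a swapped-in solver; the original work may have used a different solver, feature set, and decision thresholds. The names `features`, `pegasos_train`, `train_one_vs_rest` and `predict`, and the toy documents, are hypothetical.

```python
import random

BIAS = "__bias__"  # constant feature so the bias is learned like any other weight


def features(text):
    # Binary word-presence features (an assumption of this sketch).
    x = {term: 1.0 for term in set(text.lower().split())}
    x[BIAS] = 1.0
    return x


def score(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def pegasos_train(examples, lam=0.1, epochs=50, seed=0):
    """Linear SVM via Pegasos sub-gradient descent on the regularized hinge loss.

    examples: list of (features, y) with y in {+1, -1}; returns a weight dict.
    The optional projection step of Pegasos is omitted for brevity.
    """
    rng = random.Random(seed)
    data = list(examples)
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * score(w, x) < 1  # margin measured before the update
            for f in w:  # regularization: shrink all weights by (1 - eta * lam)
                w[f] *= 1.0 - eta * lam
            if violated:  # hinge loss is active: step toward y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def train_one_vs_rest(docs, **kwargs):
    """Train one binary SVM per category from (text, category) pairs."""
    labeled = [(features(text), label) for text, label in docs]
    categories = sorted({label for _, label in labeled})
    return {
        c: pegasos_train([(x, 1 if label == c else -1) for x, label in labeled], **kwargs)
        for c in categories
    }


def predict(models, text):
    # Simplification: assign the single category whose SVM scores highest.
    x = features(text)
    return max(models, key=lambda c: score(models[c], x))
```

Taking the highest-scoring category keeps the example short; a system that can place a page in several categories would instead compare each score against a per-category threshold.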

User Interface

The search results were organized into hierarchical categories.

User Interface

Under each category, the web pages belonging to that category were listed. A category could be expanded (or collapsed) on demand by the user. To save screen space, only the title of each page was shown; the summary could be viewed as hover text.

User Interface

Only top-level categories are shown on the first screen. The reasons:

It helps the user identify domains of interest quickly.
It saves a lot of screen space.
Classification accuracy is usually higher at the top level.
It is computationally faster.
Subcategories add little when only a few pages fall into them.
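The category view described above can be approximated as static HTML. This is a sketch under assumptions, not the original interface: a `<details>` element provides expand/collapse, only titles are listed, each summary is exposed as hover text through the standard `title` attribute, and showing page counts and ordering categories by size are choices made for this sketch. The function name `render_category_view` is hypothetical.

```python
from html import escape


def render_category_view(groups):
    """Render categorized search results as collapsible HTML.

    groups: {top_level_category: [(title, url, summary), ...]}
    Each category is a <details> element, collapsed until the user clicks it;
    only page titles are listed, and each summary becomes hover text via the
    HTML title attribute.
    """
    parts = []
    # Assumption of this sketch: larger categories first, ties broken by name.
    for category, pages in sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        parts.append(f"<details><summary>{escape(category)} ({len(pages)})</summary><ul>")
        for title, url, summary in pages:
            parts.append(
                f'<li><a href="{escape(url)}" title="{escape(summary)}">{escape(title)}</a></li>'
            )
        parts.append("</ul></details>")
    return "\n".join(parts)
```

Only the top-level categories appear until one is expanded, which mirrors the screen-space argument above.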

User Study

Compare the Category interface to the List interface.

User Study

Example query: “jaguar”
Twenty items are shown initially.
Summaries are shown on hover.
Both interfaces contain a ShowMore button.
The Category interface also has a SubCategory button.
Eighteen subjects took part.

User Study

Each subject worked with three windows.

User Study Results

Subjective questionnaire measures (Category vs. List):

“Easy to use” (6.4 vs. 3.9, t(17) = 6.41, p << 0.001)
“Liked using it” (6.7 vs. 4.3, t(17) = 6.01, p << 0.001)
“Confident that I could find the information if it was there” (6.3 vs. 4.4, t(17) = 4.91, p << 0.001)
“Easy to get a good sense of the range of alternatives” (6.4 vs. 4.2, t(17) = 6.22, p << 0.001)
“Prefer this to my usual search engine” (6.4 vs. 4.3, t(17) = 4.13, p << 0.001)

On all of these measures, subjects much preferred the Category interface.
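Because each of the eighteen subjects rated both interfaces, the comparisons are paired, so each test has 18 − 1 = 17 degrees of freedom, matching t(17) above. A minimal sketch of the paired t statistic follows; `paired_t` is a hypothetical helper, and the ratings in the example are made-up numbers, not the study’s data. Turning t into a p-value additionally needs the t-distribution CDF (for example from SciPy), which is omitted here.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, df) for a paired t-test on per-subject scores a[i] vs. b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two subjects")
    diffs = [x - y for x, y in zip(a, b)]
    # t = mean difference / standard error of the mean difference
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Made-up 7-point ratings for four subjects (Category, List); not study data.
    print(paired_t([7, 6, 7, 5], [4, 4, 5, 3]))  # (9.0, 3)
```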

User Study Results

Subjective questionnaire measures (Category vs. List):

“Summaries in hover text were useful” (6.5 vs. 6.4, t(17) = 0.36, p < 0.72): rated useful in both interfaces
“ShowMore option was useful” (6.5 vs. 6.1, t(17) = 1.94, p < 0.07)

User Study Results

Objective measures

Search time: 56 s for Category vs. 85 s for List (F(1,16) = 12.94, p = .002)
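As a quick worked comparison of the two search times (simple arithmetic on the values reported above):

```latex
\frac{85\,\mathrm{s}}{56\,\mathrm{s}} \approx 1.52
\qquad\text{and}\qquad
\frac{85\,\mathrm{s} - 56\,\mathrm{s}}{85\,\mathrm{s}} = \frac{29}{85} \approx 0.34
```

Searches with the List interface therefore took about 52% longer; equivalently, the Category interface cut search time by about a third.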

User Study Results

Objective measures

There is no interaction between order and interface (F(1,16) = 1.23, p = 0.28).

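User Study Results

Objective measures

Search time by query difficulty: 57 s when the target was in the top 20 results (Top20) vs. 98 s when it was not (NotTop20) (F(1,56) = 16.5, p << .001)

There is no interaction between query difficulty and interface (F(1,56) = 2.52, p = .12).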

User Study Results

Objective measures

Subjects in the List interface performed more operations in the course of finding the items than those in the Category interface (4.60 vs. 2.99, t(17) = -5.54, p < .001).

The number of pages subjects actually viewed in the right window is somewhat larger in the List interface (1.41 vs. 1.23, t(17) = -2.08, p < .053).

Subjects in the Category interface used more expansion operations (0.78 vs. 0.48, t(17) = 3.54, p < .003).

Conclusions

An SVM classifier provides consistent category information that helps the user quickly focus on task-relevant information.

Across the user study, the results convincingly demonstrate that the Category interface is superior to the List interface on both subjective and objective measures.

Personal Opinion