Improving Search Engines using Online Communities Anatoliy Gruzd <[email protected]> Research Forum Graduate School of Library and Information Science University of Illinois, Urbana-Champaign, IL March 14, 2007 It takes an [Internet] village …

Improving Search Engines using Online Communities

Page 1: Improving Search Engines using Online Communities

Improving Search Engines using Online Communities

Anatoliy Gruzd

Research Forum, Graduate School of Library and Information Science

University of Illinois, Urbana-Champaign, IL

March 14, 2007

It takes an [Internet] village …

Page 2: Improving Search Engines using Online Communities

Anatoliy Gruzd Community-created metadata

Agenda

1. Common search problems

2. Online bookmarking - http://del.icio.us

3. Pilot Study

4. Future work

Page 3: Improving Search Engines using Online Communities

Common search problems

The main drawback of all modern search engines is that they force the user to guess words that are likely to appear in every relevant document and, at the same time, unlikely to appear in non-relevant documents.

1. A relevant page will not be retrieved if it does not contain the keywords that the user chose for searching.

2. Even if the user's search keywords are found inside a web page, the page is not necessarily relevant to the user.
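Both problems can be reproduced with a toy keyword matcher. The sketch below is only an illustration, not how any production search engine works: it assumes plain all-words matching without stemming or ranking, and the function names are hypothetical. The page snippets are taken from the Query #1 and Query #2 slides of this talk, and the second query is a simplified form of Query #2.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_match(query, page_text):
    """Return True when every query word occurs somewhere in the page text."""
    page_words = set(tokenize(page_text))
    return all(word in page_words for word in tokenize(query))

# Page snippets quoted on the Query #1 and Query #2 slides.
paleo_page = ("Recipes are: grain-free, bean-free, potato-free, "
              "dairy-free, and sugar-free.")
blog_page = ("Homeschoolers use selective socialization. "
             "Part Of Human Brain Functions Like A Digital Computer.")

# Problem 1: a page about a diet is missed because it never says "weight loss".
missed = keyword_match("weight loss", paleo_page)

# Problem 2: a page is retrieved because unrelated postings happen to
# contain all of the query words (simplified form of Query #2).
false_hit = keyword_match("human brain homeschoolers", blog_page)

print(missed, false_hit)
```

The first call fails to match a page a dieter might want, and the second matches a blog whose postings merely share words with the query.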

Page 4: Improving Search Engines using Online Communities

Query #1: weight loss

[Figure: Architecture of a typical search engine. Diagram labels: User's Query ("weight loss"), Web page ("weight loss ???"), Matching, Results.]

Page 5: Improving Search Engines using Online Communities

Query #1: weight loss

• http://www.paleofood.com/

Recipes are: grain-free, bean-free, potato-free, dairy-free, and sugar-free.

Page 6: Improving Search Engines using Online Communities

Query #2: assignment about "human brain" for homeschooling

The retrieved page is an instructor's blog for a Human Development class at The Evergreen State College. It was retrieved because of two unrelated postings titled “Homeschoolers use selective socialization” and “Part Of Human Brain Functions Like A Digital Computer”.

Page 7: Improving Search Engines using Online Communities

Agenda

1. Common search problems

2. Online bookmarking - http://del.icio.us

3. Pilot Study

4. Future work

Page 10: Improving Search Engines using Online Communities

Common tags for http://www.paleofood.com/

• ethnic
• evolutionary eating
• food
• allergies
• german
• naturopathic
• primitivism
• weight loss

Page 11: Improving Search Engines using Online Communities

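[Figure: The search engine architecture with community tags added. Diagram labels: User's Query ("weight loss"), Web page ("weight loss ???"), Tags ("weight loss"), Matching, Results.]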
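The idea behind the diagram can be sketched in a few lines: match the query against the tags that the community assigned to a page instead of against the page's own words. This is a minimal illustration under the assumption of exact, case-insensitive phrase matching; the function name `tag_match` is hypothetical, and the tag list is the one shown for http://www.paleofood.com/ in this talk.

```python
def tag_match(query, tags):
    """Return True when the whole query phrase equals one of the page's tags."""
    normalized = {tag.strip().lower() for tag in tags}
    return query.strip().lower() in normalized

# Common del.icio.us tags for http://www.paleofood.com/ as listed on the slide.
paleofood_tags = ["ethnic", "evolutionary eating", "food", "allergies",
                  "german", "naturopathic", "primitivism", "weight loss"]

# The page text never mentions "weight loss", but its tags do.
found = tag_match("weight loss", paleofood_tags)
print(found)
```

A page that keyword matching would miss becomes reachable because other users have already described it in the searcher's vocabulary.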

Page 12: Improving Search Engines using Online Communities

Agenda

1. Common search problems

2. Online bookmarking - http://del.icio.us

3. Pilot Study

4. Future work

Page 13: Improving Search Engines using Online Communities

Pilot Study

[Figure: Two systems receive the same user's query. System A matches it against the web page and returns Results A; System B matches it against the tags and returns Results B.]

Page 14: Improving Search Engines using Online Communities

Pilot Study

• Search engine – Indri, a cooperative effort between the University of Massachusetts and Carnegie Mellon University

• Search queries – ~20-30 real user questions found on the Internet

• Pilot dataset – 454 health-related web pages

Page 15: Improving Search Engines using Online Communities

115  /Neurological_Disorders
101  /Cancer
 54  /Immune_Disorders/Immune_Deficiency
 53  /Endocrine_Disorders
 35  /Cardiovascular_Disorders
 26  /Respiratory_Disorde
 23  /Digestive_Disorders

“The Open Directory Project is the largest, most comprehensive human-edited directory of the Web.”

http://dmoz.org

Started with ~64,000 URLs (from Top/Health/Conditions_and_Diseases)
-> only 544 were bookmarked by del.icio.us users
-> only 454 were accessible at the time of my experiment

Pilot dataset: 454 health-related web pages
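The size of each filtering step is easier to see as a share of the previous step. The short calculation below uses only the counts given on this slide; the starting figure is approximate ("~64,000"), so the shares that depend on it are approximate too. Variable names are illustrative.

```python
start_urls = 64_000   # approximate count from the slide ("~64,000")
bookmarked = 544      # URLs bookmarked by del.icio.us users
accessible = 454      # URLs still reachable at the time of the experiment

bookmarked_share = bookmarked / start_urls   # share of directory URLs with bookmarks
accessible_share = accessible / bookmarked   # share of bookmarked URLs still online
overall_share = accessible / start_urls      # share of directory URLs in the pilot set

print(f"bookmarked:  {bookmarked_share:.2%}")
print(f"accessible:  {accessible_share:.2%}")
print(f"overall:     {overall_share:.2%}")
```

Under these numbers, fewer than 1 in 100 of the directory's URLs had any bookmark, which is the main reason the pilot dataset is small.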

Page 16: Improving Search Engines using Online Communities

Noise in Tags

• toread
• todo
• interesting
• imported
• safari_export
• system:unfiled
• .imported
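One simple way to deal with such tags is a stop list. The sketch below assumes that dropping an exact, case-insensitive match is enough; the talk does not say how noise was handled in the pilot study, so this is an illustration only. The stop list holds exactly the tags shown above, and the sample input is made up.

```python
# Stop list built from the noise tags listed on this slide.
NOISE_TAGS = {"toread", "todo", "interesting", "imported",
              "safari_export", "system:unfiled", ".imported"}

def remove_noise(tags, noise=NOISE_TAGS):
    """Keep only the tags that are not on the noise list (case-insensitive)."""
    return [tag for tag in tags if tag.strip().lower() not in noise]

# Hypothetical tag list for one bookmarked page.
example = ["food", "toread", "weight loss", "system:unfiled", "Imported"]
cleaned = remove_noise(example)
print(cleaned)
```

A fixed list like this only catches known noise; tags that are personal reminders in other forms would need a different approach.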

Page 17: Improving Search Engines using Online Communities

Compound tags

• generalhealth
• computersoftware
• cancerpatients-supportgroups
• highbloodpressure
• whoiwanttosharewith
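Compound tags can be split back into words if a word list is available. The sketch below uses dictionary-based segmentation with dynamic programming; it is one possible technique, not one described in the talk, and the small vocabulary is a hypothetical stand-in for a real dictionary.

```python
def segment(compound, vocabulary):
    """Split a run-together string into vocabulary words, or return None.

    best[i] holds one valid segmentation of compound[:i], if any exists.
    """
    best = [None] * (len(compound) + 1)
    best[0] = []
    for end in range(1, len(compound) + 1):
        for start in range(end):
            word = compound[start:end]
            if best[start] is not None and word in vocabulary:
                best[end] = best[start] + [word]
                break
    return best[len(compound)]

def split_tag(tag, vocabulary):
    """Split on hyphens first, then segment each part; keep unknown parts whole."""
    words = []
    for part in tag.lower().split("-"):
        if part:
            words.extend(segment(part, vocabulary) or [part])
    return words

# Hypothetical vocabulary covering the example tags on this slide.
VOCAB = {"general", "health", "computer", "software", "cancer", "patients",
         "support", "groups", "high", "blood", "pressure",
         "who", "i", "want", "to", "share", "with"}

for tag in ["generalhealth", "cancerpatients-supportgroups", "whoiwanttosharewith"]:
    print(tag, "->", " ".join(split_tag(tag, VOCAB)))
```

Parts that cannot be segmented are returned unchanged, so an unfamiliar tag is never lost.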

Page 18: Improving Search Engines using Online Communities

Keywords-based

1. (---) /term "assignment"
2. (---) /term "brain [center]"
3. (+++) Neuroscience For Kids - Explore the nervous system

Tags-based

1. (+++) Neuroscience For Kids - Explore the nervous system
2. (+++)
3. (+++)

Common tags

• anatomy
• psychology
• biology
• cognitive
• education
• reference
• medical
• human
• homeschool

[Figure: System A (Web page, Matching, Results A) shown next to System B (Tags, Matching, Results B).]
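The two result lists can be summarized with precision at rank 3. The sketch below assumes that (+++) marks a result judged relevant and (---) one judged not relevant; the slide does not define these markers, so that reading is an assumption, and the metric itself is a standard one rather than something the talk reports.

```python
def precision_at_k(judgments, k):
    """Fraction of the top-k results that were judged relevant."""
    top = judgments[:k]
    return sum(top) / len(top) if top else 0.0

# Judgments read off the slide, assuming (+++) = relevant and (---) = not relevant.
keywords_based = [False, False, True]
tags_based = [True, True, True]

p_keywords = precision_at_k(keywords_based, 3)
p_tags = precision_at_k(tags_based, 3)
print(f"keywords-based P@3 = {p_keywords:.2f}")
print(f"tags-based P@3     = {p_tags:.2f}")
```

This covers a single query, so it illustrates the comparison method rather than an overall result.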

Page 19: Improving Search Engines using Online Communities

Agenda

1. Common search problems

2. Online bookmarking - http://del.icio.us

3. Pilot Study

4. Future work

Page 20: Improving Search Engines using Online Communities

Future work

• Use a larger dataset

• Compare results across different subject domains and genres

• Explore ways to combine tags and keywords, and determine whether the combination improves the quality of results (if at all)
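One possible way to combine the two sources of evidence is a weighted sum of a keyword score and a tag score. The sketch below is a hypothetical starting point, not a method proposed in the talk: the function names, the `tag_weight` parameter, and the scores are all made up for illustration and are not experimental results.

```python
def combined_score(keyword_score, tag_score, tag_weight=0.5):
    """Linear interpolation between a keyword score and a tag score."""
    if not 0.0 <= tag_weight <= 1.0:
        raise ValueError("tag_weight must be between 0 and 1")
    return (1.0 - tag_weight) * keyword_score + tag_weight * tag_score

def rank(pages, tag_weight=0.5):
    """Order page names by combined score, best first."""
    return sorted(pages,
                  key=lambda name: combined_score(*pages[name], tag_weight),
                  reverse=True)

# Hypothetical (keyword_score, tag_score) pairs, both scaled to [0, 1].
pages = {"page_a": (0.9, 0.1), "page_b": (0.4, 0.8)}

print(rank(pages, tag_weight=0.0))  # keywords only
print(rank(pages, tag_weight=1.0))  # tags only
```

Sweeping `tag_weight` between 0 and 1 on a judged query set would be one way to test whether the combination helps at all.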