Content Metadata and Search Remarks to the Dublin Core Workshop Marti Hearst SIMS, UC Berkeley September 28, 2003


Content Metadata and Search Remarks to the Dublin Core Workshop

Marti Hearst
SIMS, UC Berkeley

September 28, 2003

M. Hearst Faceted Metadata in Search

Resource Finding and the Web

• Web search vs. collection search
  – When a single page is all that’s needed, web search is fine
    • Although validity is an issue
  – Unsolved problem:
    • How to make source-focused search more intuitive on the web?
    • One idea (untested): task-based search

What about Content?

• Dublin Core takes stances on the “content-neutral” aspects of metadata

• Q: What about content?
  – The Metadata Marsh
    • Getting agreement on metadata terms is difficult
    • Even worse when talking about content!
• A: Domain-specific solutions
  – Don’t worry about cross-domain consistency (a necessary drawback)
  – Success: b-to-b protocols

Hypothesis (as yet untested):

Assuming we’ve focused on a domain, agreement on category assignment can converge much more quickly by:

1. Focusing on the applications that will use the category system.

2. Designing metadata to be used in interfaces that show items represented by many different categories in a highly flexible, but intuitive, manner.

One Example: Flamenco Project

• Goal: create intuitive, inviting search interfaces that make use of hierarchical faceted metadata

• Challenge: How to provide flexibility and power without overwhelming? (Answer: careful interface design)

The Flamenco Project Team

Brycen Chun
Ame Elliott
Jennifer English
Kevin Li
Rashmi Sinha
Kirsten Swearingen
Ping Yee

http://flamenco.berkeley.edu

Research funded by:
NSF CAREER Grant IIS-9984741
IBM Faculty Fellowship

Our Approach

• Integrate the search seamlessly into the information architecture.
  – Use proper HCI methodologies.
• Use faceted metadata:
  – More flexible than canned hyperlinks
  – Less complex than full search
  – Help users see where to go next and return to what happened previously
• What’s new?
  – Putting hierarchical facets into a usable interface.

Metadata: data about data
Facets: orthogonal categories

Time/Date, Topic, GeoRegion
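To make "orthogonal categories" concrete, here is a minimal sketch in Python. The three facet names come from the slide; the item titles, the facet values, and the `select` helper are hypothetical and only illustrate that each facet constrains the collection independently of the others.

```python
# Hypothetical items: each one has a value on every facet, and the facets
# describe independent aspects of the item.
ITEMS = [
    {"title": "Harbor at dawn", "Time/Date": "1890s", "Topic": "Ships", "GeoRegion": "Europe"},
    {"title": "Market street", "Time/Date": "1890s", "Topic": "Cities", "GeoRegion": "North America"},
    {"title": "Clipper ship", "Time/Date": "1850s", "Topic": "Ships", "GeoRegion": "North America"},
]


def select(items, constraints):
    """Return the items whose facet values match every constraint."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in constraints.items())
    ]


# Constraints on different facets combine freely and in any order.
ships = select(ITEMS, {"Topic": "Ships"})
ships_1890s = select(ships, {"Time/Date": "1890s"})
```

Because the facets are independent, narrowing by Topic and then by Time/Date gives the same items as narrowing in the opposite order.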

Hierarchical Faceted Metadata Example: Biological Subject Headings

1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]

Hierarchical Faceted Metadata

1. Anatomy [A]
     Body Regions [A01]
     Musculoskeletal System [A02]
     Digestive System [A03]
     Respiratory System [A04]
     Urogenital System [A05]
     ……
2. [B]
3. [C]
4. [D]
5. [E]
6. [F]
7. [G]
8. Physical Sciences [H]
9. [I]
10. [J]
11. [K]
12. [L]
13. [M]

Hierarchical Faceted Metadata

1. Anatomy [A]
     Body Regions [A01]
          Abdomen [A01.047]
          Back [A01.176]
          Breast [A01.236]
          Extremities [A01.378]
          Head [A01.456]
          Neck [A01.598]
          ….
     Musculoskeletal System [A02]
     Digestive System [A03]
     Respiratory System [A04]
     Urogenital System [A05]
     ……
2. [B]
3. [C]
4. [D]
5. [E]
6. [F]
7. [G]
8. Physical Sciences [H]
     Electronics
          Amplifiers
          Electronics, Medical
          Transducers
     Astronomy
     Nature
     Time
     Weights and Measures
          Calibration
          Metric System
          Reference Standard
     ….
9. [I]
10. [J]
11. [K]
12. [L]
13. [M]
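The bracketed codes encode the hierarchy directly, so a node's ancestors can be recovered from its code alone. The sketch below uses only codes shown on the slide. The `TREE` dictionary is a hypothetical fragment, and the parent rule (drop the last dotted segment, then fall back to the leading letter) is an assumption read off the notation, not a description of how Flamenco stores the hierarchy.

```python
# Fragment of the hierarchy, keyed by the codes shown on the slide.
TREE = {
    "A": "Anatomy",
    "A01": "Body Regions",
    "A01.047": "Abdomen",
    "A01.176": "Back",
    "A01.456": "Head",
    "A02": "Musculoskeletal System",
}


def parent(code):
    """Return the parent code, or None for a top-level category."""
    if "." in code:
        return code.rsplit(".", 1)[0]  # "A01.047" -> "A01"
    if len(code) > 1:
        return code[0]  # "A01" -> "A"
    return None


def path(code):
    """Labels from the top-level category down to the given node."""
    labels = []
    node = code
    while node is not None:
        labels.append(TREE[node])
        node = parent(node)
    return list(reversed(labels))


def children(code):
    """Codes one level below the given node, in sorted order."""
    return sorted(c for c in TREE if parent(c) == code)
```

Under these assumptions, `path("A01.047")` walks Anatomy, Body Regions, Abdomen, and `children("A01")` lists the body-region codes present in the fragment.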

The Interface Design

• Chess metaphor
  – Opening
  – Middle game
  – End game

The Interface Design

• Tightly Integrated Search
• Supports Expand as well as Refine
• Dynamically Generated Pages
  – Paths can be taken in any order
  – Links are idempotent
• Consistent Color Coding
• Consistent Backup and Bookmarking
• Standard HTML
  – No JavaScript
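One way to get order-independent paths and idempotent links is to encode the whole query state in each link rather than a change relative to the previous page. The sketch below illustrates that idea under stated assumptions: the function names, the `/search` path, and the parameter layout are hypothetical and are not Flamenco's actual URL scheme.

```python
from urllib.parse import parse_qsl, urlencode


def refine(state, facet, value):
    """Add a constraint; adding one that is already present changes nothing."""
    return state | {(facet, value)}


def expand(state, facet, value):
    """Drop a constraint, which broadens the result set."""
    return state - {(facet, value)}


def to_url(state, base="/search"):
    """Encode the full state, sorted so the order of selection does not matter."""
    query = urlencode(sorted(state))
    return f"{base}?{query}" if query else base


def from_url(url):
    """Recover the query state from a link or bookmark."""
    _, _, query = url.partition("?")
    return frozenset(parse_qsl(query))


start = frozenset()
by_topic_first = refine(refine(start, "Topic", "Ships"), "GeoRegion", "Europe")
by_region_first = refine(refine(start, "GeoRegion", "Europe"), "Topic", "Ships")
```

Both paths yield the same link, and following a link twice leaves the state unchanged. The same property would make the browser's back button and bookmarks behave consistently without any client-side script.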

What is Tricky About This?

• It is easy to do it poorly
  – Yahoo directory structure
• It is hard not to be overwhelming
  – Most users prefer simplicity unless complexity really makes a difference
• It is hard to “make it flow”
  – Can it feel like “browsing the shelves”?
  – Yes, but we iterated the design 3 times

Usability Study

• Participants & Collection
  – 32 Art History Students
  – ~35,000 images from SF Fine Arts Museum
• Study Design
  – Within-subjects
    • Each participant sees both interfaces
    • Balanced in terms of order and tasks
  – Participants assess each interface after use
  – Afterwards they compare them directly
• Data recorded in behavior logs, server logs, paper-surveys; one or two experienced testers at each trial.
• Used 9 point Likert scales.
• Session took about 1.5 hours; pay was $15/hour
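A within-subjects design that is "balanced in terms of order and tasks" can be sketched as a counterbalanced assignment. The slides do not give the actual assignment scheme, so everything here is an assumption: the two task-set labels are hypothetical, and the rule simply cycles participants through the four combinations of interface order and task-set order.

```python
from itertools import product

INTERFACES = ("FC", "Baseline")
TASK_SETS = ("A", "B")  # hypothetical labels for two matched task sets


def assignments(n_participants):
    """Cycle through the four order/task-set combinations so each is used equally often."""
    conditions = [
        {"first": first, "second": second, "first_tasks": t1, "second_tasks": t2}
        for (first, second), (t1, t2) in product(
            [INTERFACES, INTERFACES[::-1]], [TASK_SETS, TASK_SETS[::-1]]
        )
    ]
    return [conditions[i % len(conditions)] for i in range(n_participants)]


plan = assignments(32)
```

With 32 participants and four combinations, each combination is used by 8 participants, so neither interface is systematically seen first or paired with one task set.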

The Baseline System

• Floogle
• Take the best of the existing keyword-based image search systems

[Screenshot; query shown: “sword”]

Hypotheses

• We attempted to design tasks to test the following hypotheses:
  – Participants will experience greater search satisfaction, feel greater confidence in the results, produce higher recall, and encounter fewer dead ends using FC over Baseline
  – FC will be perceived to be more useful and flexible than Baseline
  – Participants will feel more familiar with the contents of the collection after using FC
  – Participants will use FC to create multi-faceted queries

Four Types of Tasks

– Unstructured (3): Search for images of interest
– Structured Task (11-14): Gather materials for an art history essay on a given topic, e.g.
  • Find all woodcuts created in the US
  • Choose the decade with the most
  • Select one of the artists in this period and show all of their woodcuts
  • Choose a subject depicted in these works and find another artist who treated the same subject in a different way.
– Structured Task (10): compare related images
  • Find images by artists from 2 different countries that depict conflict between groups.
– Unstructured (5): search for images of interest

Other Points

• Participants were NOT walked through the interfaces.

• The wording of Task 2 reflected the metadata; not the case for Task 3

• Within tasks, queries were not different in difficulty (t’s<1.7, p >0.05 according to post-task questions)

• Flamenco is an order of magnitude slower than Floogle on average.
  – In task 2 users were allowed 3 more minutes in FC than in Baseline.
  – Time spent in tasks 2 and 3 was significantly longer in FC (about 2 min more).

Post-Interface Assessments

All differences significant at p<.05 except “simple” and “overwhelming”

Post-Test Comparison

Which Interface Preferable For:             Baseline   FC
Find images of roses                           15      16
Find all works from a given period              2      30
Find pictures by 2 artists in same media        1      29

Overall Assessment:                         Baseline   FC
More useful for your tasks                      4      28
Easiest to use                                  8      23
Most flexible                                   6      24
More likely to result in dead ends             28       3
Helped you learn more                           1      31
Overall preference                              2      29

Study Results Summary

• Strongly positive results for the faceted metadata interface.

• Moderate use of multiple facets.
• Strong preference over the current state of the art.
  – Chair of Architecture Dept: “It felt like I was browsing the shelves!”
  – This kind of enthusiasm is not seen in similarity-based image search interfaces.
• Hypotheses are supported.

Study Summary

• Usability studies done on 3 collections:
  – Recipes: 13,000 items
  – Architecture Images: 40,000 items
  – Fine Arts Images: 35,000 items
• Conclusions:
  – Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
  – Very positive results, in contrast with studies on earlier iterations
  – Note: it seems you have to care about the contents of the collection to like the interface

Advantages of the Approach

• Supports different search types
  – Highly constrained known-item searches
  – Open-ended, browsing tasks
  – Can easily switch from one mode to the other midstream
  – Can both expand and refine

• Allows different people to add content without breaking things

• Can make use of standard technology

Metadata Availability

• Many collections already have rich metadata associated with them.

• Automated methods are improving.

• Have applied this to:
  – Tobacco documents archive
  – MEDLINE

Back to the Hypothesis

• This kind of tool may be helpful for resolving metadata creation wars.
  – Multiple paths to get to the same item
  – Different views on different subsets of items
  – No need to force everything into one hierarchy

• What do you think?