Castanet: Using WordNet to Build Facet Hierarchies Emilia Stoica and Marti Hearst School of Information, Berkeley

Page 1: Castanet: Using WordNet to Build Facet Hierarchies

Castanet: Using WordNet to Build Facet Hierarchies

Emilia Stoica and Marti Hearst
School of Information, Berkeley

Page 2: Castanet: Using WordNet to Build Facet Hierarchies

Motivation

Want to assign labels from multiple hierarchies

Page 3: Castanet: Using WordNet to Build Facet Hierarchies

Motivation

Hot and Sweet Chicken: 1 pepper, 2 apricots, 1 pound chicken breast, 1 Tbsp gingerroot

Meat: chicken
Vegetables: pepper
Fruit: apricot
Flavor: gingerroot

Page 4: Castanet: Using WordNet to Build Facet Hierarchies

Castanet

Carves out a structure from the hypernym (IS-A) relations within WordNet

Produces surprisingly good results for a wide range of subjects, e.g., arts, medicine, recipes, math, news, bibliographical records

Page 5: Castanet: Using WordNet to Build Facet Hierarchies

WordNet Challenges

A word may have more than one sense

- Fine granularity of word sense distinctions
  e.g., newspaper (#1) - daily publication on folded sheets
        newspaper (#3) - physical object

- Ambiguity for the same sense
  tuna #1: cactus
  tuna #2: fish (food fish, bony fish)

Page 6: Castanet: Using WordNet to Build Facet Hierarchies

WordNet Challenges (cont.)

The hypernym path may be quite long (e.g., sense #3 of tuna has 14 nodes)

Sparse coverage of proper names and noun phrases (not addressed)

Page 7: Castanet: Using WordNet to Build Facet Hierarchies

Algorithm Goals

Build a set of facet hierarchies

Balance depth and breadth
- Avoid “skinny” paths
- Don’t go too deep or too broad

Choose understandable labels

Disambiguate words
- Currently a word can take on only one sense

Page 8: Castanet: Using WordNet to Build Facet Hierarchies

Our Approach

[Pipeline diagram. Inputs: Documents, WordNet. Steps: Select terms => Build core tree => Augment core tree => Compress tree => Remove top-level categories => Divide into facets]

Page 9: Castanet: Using WordNet to Build Facet Hierarchies

1. Select Terms

Select well-distributed terms from the collection
- Eliminate stopwords
- Retain only those terms with a distribution higher than a threshold (default: top 10%)
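
The term-selection step could be sketched roughly as follows. This is only an illustration: it assumes that "distribution" means the number of documents containing a term, and the function name `select_terms`, the whitespace tokenizer, and the sample documents are hypothetical.

```python
from collections import Counter
import math

def select_terms(docs, stopwords, top_fraction=0.10):
    """Return the most widely distributed terms, excluding stopwords.

    Assumption: a term's "distribution" is its document frequency.
    """
    doc_freq = Counter()
    for doc in docs:
        # Count each term at most once per document.
        terms = {word.lower() for word in doc.split()} - stopwords
        doc_freq.update(terms)
    # Keep the top fraction of distinct terms (at least one).
    keep = max(1, math.ceil(len(doc_freq) * top_fraction))
    # Sort by descending document frequency, then alphabetically for stable ties.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:keep]]

# Made-up mini collection; top_fraction is raised so the tiny example keeps several terms.
docs = ["chicken with apricot", "chicken and pepper", "apricot tart", "chicken soup"]
selected = select_terms(docs, stopwords={"with", "and"}, top_fraction=0.5)
```

With the default `top_fraction=0.10`, only the top 10% of distinct terms would be kept, matching the default threshold stated on the slide.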

Page 10: Castanet: Using WordNet to Build Facet Hierarchies

2. Build Core Tree

Get hypernym path if term:
- has only one sense, or
- matches a pre-selected WordNet domain

Adding a new term increases a count at each node on its path by # of docs with the term.

Example hypernym paths:
sundae: entity => substance, matter => nutriment => dessert => frozen dessert => ice cream sundae => sundae
sherbet: entity => substance, matter => nutriment => dessert => frozen dessert => sherbet, sorbet => sherbet

Build a “backbone”
- Create paths from unambiguous terms only
- Bias the structure towards appropriate senses of words
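
A minimal sketch of the counting rule, under stated assumptions: the sense inventory `SENSES` is a hand-written stand-in for WordNet lookups, the "date" entry is abbreviated, the document counts in `DOC_COUNTS` are made up, and the domain-match condition is left out. Nodes are keyed by their full path from the root so that equal labels in different places stay distinct.

```python
from collections import Counter

BASE = ["entity", "substance, matter", "nutriment", "dessert", "frozen dessert"]

# Hypothetical sense inventory: term -> one hypernym path per sense (root first).
SENSES = {
    "sundae": [BASE + ["ice cream sundae", "sundae"]],
    "sherbet": [BASE + ["sherbet, sorbet", "sherbet"]],
    # Two senses, so this term is skipped when building the core tree.
    "date": [["entity", "edible fruit", "date"], ["abstraction", "calendar day", "date"]],
}
DOC_COUNTS = {"sundae": 12, "sherbet": 5, "date": 96}  # made-up document counts

def build_core_counts(senses, doc_counts):
    """Accumulate per-node counts using unambiguous terms only."""
    counts = Counter()
    for term, paths in senses.items():
        if len(paths) != 1:
            continue  # ambiguous terms are attached in a later step
        path = paths[0]
        # Every node on the path is credited with the term's document count.
        for depth in range(1, len(path) + 1):
            counts[tuple(path[:depth])] += doc_counts[term]
    return counts

core = build_core_counts(SENSES, DOC_COUNTS)
```

Here the shared prefix down to "frozen dessert" receives 12 + 5 = 17, while each branch keeps only its own term's count.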

Page 11: Castanet: Using WordNet to Build Facet Hierarchies

2. Build Core Tree (cont.)

Merge hypernym paths to build a tree

Before merging:
sundae: entity => substance, matter => nutriment => dessert => frozen dessert => ice cream sundae => sundae
sherbet: entity => substance, matter => nutriment => dessert => frozen dessert => sherbet, sorbet => sherbet

After merging:
entity
  substance, matter
    nutriment
      dessert
        frozen dessert
          ice cream sundae
            sundae
          sherbet, sorbet
            sherbet
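
Merging root-first paths into one tree can be illustrated with nested dictionaries. This is a sketch rather than the authors' implementation; `merge_paths` and `render` are hypothetical helper names.

```python
def merge_paths(paths):
    """Merge root-first label paths into a nested dict: label -> subtree."""
    tree = {}
    for path in paths:
        node = tree
        for label in path:
            # Reuse the existing child when the prefix is shared.
            node = node.setdefault(label, {})
    return tree

def render(tree, indent=0):
    """Return the tree as a list of indented lines."""
    lines = []
    for label, children in tree.items():
        lines.append("  " * indent + label)
        lines.extend(render(children, indent + 1))
    return lines

BASE = ["entity", "substance, matter", "nutriment", "dessert", "frozen dessert"]
tree = merge_paths([
    BASE + ["ice cream sundae", "sundae"],
    BASE + ["sherbet, sorbet", "sherbet"],
])
print("\n".join(render(tree)))
```

The two paths share everything down to "frozen dessert", so the merged tree branches only below that node.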

Page 12: Castanet: Using WordNet to Build Facet Hierarchies

3. Augment Core Tree

Attach to Core tree the terms with more than one sense

Favor the more common path over other alternatives

Page 13: Castanet: Using WordNet to Build Facet Hierarchies

Augment Core Tree (cont.)

Date (p1): entity => substance, matter => food, nutrient => nutriment => food => edible fruit (78) => date
Date (p2): abstraction => measure, quantity => fundamental quality => time period => calendar day (18) => date

Choose path p1 since it has more items assigned (78 vs. 18)
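
One way to read "favor the more common path" is to compare the counts already accumulated at the node directly above the ambiguous term, as the counts on the slide suggest. The sketch below assumes exactly that; `choose_path` and the path-keyed `NODE_COUNTS` mapping are hypothetical.

```python
def choose_path(candidate_paths, node_counts):
    """Pick the candidate whose parent node has the most items assigned.

    Assumption: the score of a path is the count at the term's direct parent.
    """
    def support(path):
        return node_counts.get(tuple(path[:-1]), 0)
    return max(candidate_paths, key=support)

P1 = ["entity", "substance, matter", "food, nutrient", "nutriment",
      "food", "edible fruit", "date"]
P2 = ["abstraction", "measure, quantity", "fundamental quality",
      "time period", "calendar day", "date"]

# Counts taken from the slide: edible fruit (78), calendar day (18).
NODE_COUNTS = {tuple(P1[:-1]): 78, tuple(P2[:-1]): 18}

chosen = choose_path([P1, P2], NODE_COUNTS)
```

For the slide's example this selects the "edible fruit" path for "date".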

Page 14: Castanet: Using WordNet to Build Facet Hierarchies

Optional Step: Domains

To disambiguate, use Domains
- WordNet has 212 Domains: medicine, mathematics, biology, chemistry, linguistics, soccer, etc.
- A better collection has been developed by Magnini 2000
  - Assigns a domain to every noun synset

Automatically scan the collection to see which domains apply

The user selects which of the suggested domains to use or may add their own

Paths for terms that match the selected domains are added to the core tree

Page 15: Castanet: Using WordNet to Build Facet Hierarchies

Using Domains

dip glosses:

Sense 1: A depression in an otherwise level surface
Sense 2: The angle that a magnet needle makes with horizon
Sense 3: Tasty mixture into which bite-size foods are dipped

dip hypernyms:

Sense 1: solid => concave shape => depression
Sense 2: shape, form => space => angle
Sense 3: food => ingredient, fixings => flavorer

Given domain “food”, choose sense 3
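
Domain-based sense selection might look like the sketch below. The hypernym chains come from the slide; the `domain` field is an assumption standing in for a per-synset domain label, and it is set to `None` where the slide gives no domain. `pick_sense` is a hypothetical helper.

```python
# Senses of "dip" as listed on the slide. Domain labels for senses 1 and 2
# are not given there, so they are left as None.
DIP_SENSES = [
    {"sense": 1, "hypernyms": ["solid", "concave shape", "depression"], "domain": None},
    {"sense": 2, "hypernyms": ["shape, form", "space", "angle"], "domain": None},
    {"sense": 3, "hypernyms": ["food", "ingredient, fixings", "flavorer"], "domain": "food"},
]

def pick_sense(senses, selected_domains):
    """Return the single sense whose domain was selected, or None if unresolved."""
    matches = [s for s in senses if s["domain"] in selected_domains]
    # Only an unambiguous match resolves the term.
    return matches[0] if len(matches) == 1 else None

food_sense = pick_sense(DIP_SENSES, {"food"})
unresolved = pick_sense(DIP_SENSES, {"medicine"})
```

A term resolved this way can then contribute its hypernym path to the core tree like any single-sense term.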

Page 16: Castanet: Using WordNet to Build Facet Hierarchies

4. Compress Tree

Rule 1: Eliminate a parent with fewer than k children unless it is the root or its distribution is larger than 0.1*maxdist

Before:
dessert
  frozen dessert
    ice cream sundae
      sundae
    sherbet, sorbet
      sherbet
    parfait

After:
dessert
  frozen dessert
    sundae
    parfait
    sherbet
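
Rule 1 can be sketched as a recursive pass that splices out weak intermediate nodes and promotes their children. The assumptions here: `maxdist` is taken to be the largest distribution in the whole tree and is passed in by the caller, leaves are never treated as parents, and the `Node` class and all counts are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    dist: int = 0  # number of documents assigned at or below this node
    children: list = field(default_factory=list)

def compress_rule1(node, k, maxdist):
    """Remove non-root parents with fewer than k children and low distribution."""
    kept = []
    for child in node.children:
        compress_rule1(child, k, maxdist)  # compress bottom-up
        too_few = 0 < len(child.children) < k
        protected = child.dist > 0.1 * maxdist
        if too_few and not protected:
            kept.extend(child.children)  # promote the grandchildren
        else:
            kept.append(child)
    node.children = kept
    return node  # the node passed in acts as the root and is never removed

root = Node("dessert", 20, [
    Node("frozen dessert", 20, [
        Node("ice cream sundae", 12, [Node("sundae", 12)]),
        Node("sherbet, sorbet", 5, [Node("sherbet", 5)]),
        Node("parfait", 3),
    ]),
])
# maxdist=200 is a made-up value standing in for the largest count in the full tree.
compress_rule1(root, k=2, maxdist=200)
frozen = root.children[0]
```

After the pass, "frozen dessert" has "sundae", "sherbet", and "parfait" as direct children.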

Page 17: Castanet: Using WordNet to Build Facet Hierarchies

4. Compress Tree (cont.)

Rule 2: Eliminate a child whose name appears within the parent’s name

Before:
dessert
  frozen dessert
    sundae
    parfait
    sherbet

After:
dessert
  sundae
  parfait
  sherbet
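
A rough sketch of Rule 2 on a nested-dict tree follows. Two assumptions are made here: labels are compared word by word in either direction (in the slide's example the parent's name "dessert" sits inside the child's name "frozen dessert"), and a matching child is removed only when it has children to promote, so no term is lost. Both helper names are hypothetical.

```python
def names_overlap(child_label, parent_label):
    """True if one label's words are all contained in the other's."""
    child_words = set(child_label.lower().split())
    parent_words = set(parent_label.lower().split())
    return child_words <= parent_words or parent_words <= child_words

def compress_rule2(tree, parent_label=None):
    """tree maps label -> subtree dict; returns a new compressed tree."""
    result = {}
    for label, subtree in tree.items():
        compressed = compress_rule2(subtree, label)
        redundant = (parent_label is not None
                     and bool(subtree)  # assumption: leaves are always kept
                     and names_overlap(label, parent_label))
        if redundant:
            result.update(compressed)  # promote the grandchildren
        else:
            result[label] = compressed
    return result

before = {"dessert": {"frozen dessert": {"sundae": {}, "parfait": {}, "sherbet": {}}}}
after = compress_rule2(before)
```

Word-level comparison is used instead of raw substring matching so that a short label is not falsely matched inside an unrelated longer word.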

Page 18: Castanet: Using WordNet to Build Facet Hierarchies

5. Divide into Facets


Page 19: Castanet: Using WordNet to Build Facet Hierarchies

5. Divide into Facets (Remove top levels)

[Diagram: before, the top levels entity; substance, matter; food, nutriment; ingredient, fixings; and food stuff, food product sit above flavorer, whose children are sweetening (sugar, syrup) and herb (parsley, oregano). After, only the flavorer subtree remains.]

Rule 1: Eliminate very general categories (e.g., entity, abstraction). If no paths are longer than threshold t, then done. Else:

Rule 2: Undo first step. Then eliminate all top levels until the maximum length of any path in the resulting hierarchy is t.
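
The two rules could be sketched as below on a nested-dict tree. Several assumptions are made: path length is counted in nodes, only roots whose subtree is still too deep are stripped in Rule 2, and the example tree is a simplified version of the one on the slide (some intermediate levels are omitted). The function names and the `GENERAL` set are hypothetical.

```python
GENERAL = {"entity", "abstraction"}  # very general categories named on the slide

def depth(forest):
    """Length, in nodes, of the longest root-to-leaf path."""
    return 0 if not forest else 1 + max(depth(sub) for sub in forest.values())

def strip_general(forest, general):
    """Rule 1: drop very general top-level categories, promoting their children."""
    out = {}
    for label, sub in forest.items():
        if label in general:
            out.update(strip_general(sub, general))
        else:
            out[label] = sub
    return out

def divide_into_facets(tree, t, general=GENERAL):
    facets = strip_general(tree, general)
    if depth(facets) <= t:
        return facets
    forest = tree  # Rule 2: undo Rule 1 and start again from the full tree
    while depth(forest) > t:
        stripped = {}
        for label, sub in forest.items():
            if 1 + depth(sub) > t:
                stripped.update(sub)  # remove this top level
            else:
                stripped[label] = sub
        forest = stripped
    return forest

tree = {"entity": {"substance, matter": {"food, nutriment": {"flavorer": {
    "sweetening": {"sugar": {}, "syrup": {}},
    "herb": {"parsley": {}, "oregano": {}},
}}}}}
facets = divide_into_facets(tree, t=3)
```

Each remaining root is then treated as one facet; with t = 3 the example reduces to the single facet rooted at "flavorer".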

Page 20: Castanet: Using WordNet to Build Facet Hierarchies

Example: Recipes (3500 docs)

Page 21: Castanet: Using WordNet to Build Facet Hierarchies

Castanet Output (shown in Flamenco)

Page 22: Castanet: Using WordNet to Build Facet Hierarchies

Castanet Output

Page 23: Castanet: Using WordNet to Build Facet Hierarchies

Castanet Output

Page 24: Castanet: Using WordNet to Build Facet Hierarchies

Castanet Output

Page 25: Castanet: Using WordNet to Build Facet Hierarchies

Castanet Output

Page 26: Castanet: Using WordNet to Build Facet Hierarchies
Page 27: Castanet: Using WordNet to Build Facet Hierarchies

Castanet Evaluation

This is a tool for information architects, so information architects did the evaluation

We compared output on:
- Recipes
- Biomedical journal titles

We compared to two state-of-the-art algorithms:
- LDA (Blei et al. 04)
- Subsumption (Sanderson & Croft ’99)

Page 28: Castanet: Using WordNet to Build Facet Hierarchies

Subsumption Output

Page 29: Castanet: Using WordNet to Build Facet Hierarchies

Subsumption Output

Page 30: Castanet: Using WordNet to Build Facet Hierarchies

Subsumption Output

Page 31: Castanet: Using WordNet to Build Facet Hierarchies

Subsumption Output

Page 32: Castanet: Using WordNet to Build Facet Hierarchies

LDA Output

Page 33: Castanet: Using WordNet to Build Facet Hierarchies

LDA Output

Page 34: Castanet: Using WordNet to Build Facet Hierarchies

LDA Output

Page 35: Castanet: Using WordNet to Build Facet Hierarchies

Evaluation Method

Information architects assessed the category systems

For each of 2 systems’ output:
- Examined and commented on top-level
- Examined and commented on two sub-levels

Then commented on overall properties:
- Meaningful?
- Systematic?
- Likely to use in your work?

Page 36: Castanet: Using WordNet to Build Facet Hierarchies

Evaluation (cont.)

Sample questions for top-level categories:
- Would you add/remove/rename any category?
- Did this category match your expectations?

Sample questions for a specific category:
- Would you add/move/remove any sub-categories?
- Would you promote any sub-category to top level?

General questions:
- Would you use Castanet?
- Would you use LDA?
- Would you use Subsumption?
- Would you use list of most frequent terms?

Page 37: Castanet: Using WordNet to Build Facet Hierarchies

Evaluation Results

Results on recipes collection for “Would you use this system in your work?”
Number answering “Yes in some cases” or “yes, definitely”:

Castanet: 29/34
LDA: 0/18
Subsumption: 6/16
Baseline: 25/34

Average response to questions about quality (4 = “strongly agree”)

Page 38: Castanet: Using WordNet to Build Facet Hierarchies

Evaluation Results

Average responses for top-level categories (4 = no changes, 1 = change many)

Average responses for 2 subcategories

Page 39: Castanet: Using WordNet to Build Facet Hierarchies

Needed Improvements

- Take spelling variations and morphological variants into account
- Use verbs and adjectives, not just nouns
- Normalize noun phrases
- Allow terms to have more than one sense
- Improve algorithm for assigning documents to categories

Page 40: Castanet: Using WordNet to Build Facet Hierarchies

Opportunities for Tagging

New opportunity: Tagging, folksonomies (Flickr, del.icio.us)
- People are creating facets in a decentralized manner
- They are assigning multiple facets to items
- This is done on a massive scale
- This leads naturally to meaningful associations

Page 41: Castanet: Using WordNet to Build Facet Hierarchies

Conclusions

Castanet builds a set of faceted hierarchies by finding IS-A relations between terms using WordNet.

The method has been tested on various domains: medicine, recipes, math, news, arts, bibliographical records.

Usability study shows:
- Castanet is preferred to other state-of-the-art solutions.
- Information architects want to use the tool in their work.

Page 42: Castanet: Using WordNet to Build Facet Hierarchies

Learn More

Funding: This work was supported in part by NSF (IIS-9984741)

For more information:
Stoica, E., Hearst, M., and Richardson, M., Automating Creation of Hierarchical Faceted Metadata Structures, NAACL/HLT 2007

See http://flamenco.berkeley.edu