ELPUB 2010, Helsinki, Finland 1
A Collaborative Faceted Categorization System – User Interactions
Kurt Maly; Harris Wu; Mohammad Zubair; Contact: [email protected]
Outline
• Introduction
  – What is the problem we are addressing?
  – What is the approach we are taking?
• Facet schema evolution – user interactions
  – Schema enrichment
  – Anomaly detection
  – Visual schema presentation rearrangement
  – User feedback on classifications
• Conclusions
  – Future improvements
Introduction - Problem
• Problem
  – Navigating a large, growing collection of digital objects, particularly non-textual collections such as images and photographs
• Possible Approaches
  – Categorize and classify the collection manually using human experts
    • Centralized, expensive, single perspective
    • Static, rigid structure; does not evolve
  – Use social tagging systems such as flickr.com
    • Low precision and recall, lack of structure in tags, ambiguity and noise in tags
Introduction - Approach
• We built a system that improves access to a large, growing collection by helping users build a faceted classification collaboratively
  – Challenge: continuously classify new objects, modify the facet schema, and reclassify existing objects into the modified facet schema
  – Collaborative approach: enable users to collaboratively build a schema with facets and categories, and to classify documents into this schema
    • Needs automated system support to create critical mass and make it easier for users to collaborate
The system front page
Facet and Category Enrichment
• Statistical co-occurrence model
  – Subsumption
    • A parent-child relationship holds between x and y if all documents tagged with y are also tagged with x
    • For an existing tag word t
      – Identify all documents with tag t
      – If these documents have a common category c, the rule of subsumption implies that t is a possible subcategory of c
• Example

  Category             Suggested sub-category
  American Civil War   military life
  China                boxer rebellion
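The subsumption rule above can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual implementation: the function name `suggest_subcategories`, the input layout (document id mapped to a pair of tag set and category set), and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def suggest_subcategories(docs):
    """Suggest tag -> candidate parent categories using the subsumption rule.

    `docs` maps a document id to (set of tags, set of categories).
    This layout is an assumption made for this sketch.
    """
    # Step 1: for each tag t, collect all documents carrying t.
    docs_by_tag = defaultdict(list)
    for doc_id, (tags, _categories) in docs.items():
        for tag in tags:
            docs_by_tag[tag].append(doc_id)

    # Step 2: if all documents with tag t share a category c,
    # t is a possible subcategory of c.
    suggestions = {}
    for tag, doc_ids in docs_by_tag.items():
        common = set.intersection(*(set(docs[d][1]) for d in doc_ids))
        if common:
            suggestions[tag] = common
    return suggestions


# Hypothetical sample data echoing the example on the slide.
docs = {
    "img1": ({"military life"}, {"American Civil War"}),
    "img2": ({"military life", "camp"}, {"American Civil War"}),
    "img3": ({"camp"}, {"China"}),
    "img4": ({"boxer rebellion"}, {"China"}),
}
suggestions = suggest_subcategories(docs)
```

In this sample, "camp" yields no suggestion because its documents do not share a category. A real deployment would probably also require a minimum number of supporting documents before suggesting a subcategory, since a tag used on a single document is trivially subsumed.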
Schema Cleansing
• Problem:
  – Categories are created under the wrong facet
  – Child categories might represent a broader concept than the parent category
• Solution:
  – Use WordNet’s hierarchical relationships among words to detect anomalies
Schema Cleansing
Hierarchy in WordNet (hyponymy, also known as the “is a” relationship)
dog, domestic dog, Canis familiaris
=> canine, canid
=> carnivore
=> placental, placental mammal, eutherian, eutherian mammal
=> mammal
=> vertebrate, craniate
=> chordate
=> animal, animate being, beast, brute, creature, fauna
=> ...
Anomaly detection algorithm

Category    Parent category   Grandparent category   Problem
President   Holiday           Politics               More closely related to grandparent than to parent
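One way to read the anomaly in the table is as a distance comparison in the "is a" hierarchy: a category is suspicious when it sits closer to its grandparent than to its parent. The sketch below illustrates that idea only. The slides do not specify the relatedness measure, so path length through the nearest common ancestor is an assumption, and the small `HYPERNYM` map is an invented stand-in for WordNet, not real WordNet data.

```python
# Toy "is a" hierarchy: word -> its hypernym. Invented for illustration;
# a real system would query WordNet instead.
HYPERNYM = {
    "president": "politician",
    "politician": "politics",
    "politics": "activity",
    "activity": "entity",
    "holiday": "time period",
    "time period": "entity",
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "entity",
}


def ancestors(word):
    """Return the chain [word, hypernym, hypernym of hypernym, ...]."""
    chain = [word]
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def distance(a, b):
    """Number of edges between a and b via their nearest common ancestor."""
    depth_in_b = {w: i for i, w in enumerate(ancestors(b))}
    for i, w in enumerate(ancestors(a)):
        if w in depth_in_b:
            return i + depth_in_b[w]
    return None  # no common ancestor in the toy hierarchy


def is_anomalous(category, parent, grandparent):
    """Flag a category that is closer to its grandparent than to its parent."""
    d_parent = distance(category, parent)
    d_grand = distance(category, grandparent)
    if d_parent is None or d_grand is None:
        return False
    return d_grand < d_parent
```

With this toy data, `is_anomalous("president", "holiday", "politics")` flags the row from the table, while `is_anomalous("dog", "canine", "carnivore")` does not.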
Ordering of Schema Display
• Problem:
  – A significant number of categories are created under a given facet (or another category)
  – A large number of facets are created
• Solution:
  – Limit the number of child categories/facets displayed
  – Configure administratively
  – Order the display by a popularity measure
Ordering of Schema Display
• Popularity (P) measure
  – Favours the biggest, most used, and fastest growing facets and categories

  P = 0.5*f(PN*PC) + 0.5*PR

  • f – normalizing factor
  • PN – total number of items in a category
  • PR – growth rate of a category: number of new (recent) items per unit of time
  • PC – number of clicks on the category link in the browsing menu over a period of time
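The popularity formula can be turned into a small ranking routine. The following is a sketch under stated assumptions: the slides do not define f precisely, so here it is taken to be a multiplicative factor equal to 1 divided by the largest PN*PC among the categories being compared, and the PR values in the sample are assumed to be already scaled to a comparable range. The function names and sample numbers are hypothetical.

```python
def popularity_scores(stats):
    """Compute P = 0.5*f*(PN*PC) + 0.5*PR for each category.

    `stats` maps a category name to (PN, PC, PR).
    Assumption: f = 1 / max(PN*PC) over the given categories.
    """
    max_product = max(pn * pc for pn, pc, _pr in stats.values()) or 1
    f = 1.0 / max_product
    return {
        name: 0.5 * f * pn * pc + 0.5 * pr
        for name, (pn, pc, pr) in stats.items()
    }


def top_categories(stats, limit):
    """Return the `limit` most popular category names, best first."""
    scores = popularity_scores(stats)
    return sorted(scores, key=scores.get, reverse=True)[:limit]


# Hypothetical sample: (PN items, PC clicks, PR growth rate).
stats = {
    "Politics": (200, 50, 0.10),
    "Holiday": (20, 5, 0.05),
    "Military": (100, 40, 0.60),
}
scores = popularity_scores(stats)
visible = top_categories(stats, 2)
```

The `limit` argument plays the role of the administratively configured display limit; categories beyond it would be the ones shown only after following the “more…” link.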
Expanding category display using the “more…” link
Limiting category display using the “more…” link
Quality Assessment through User Feedback
• “Thumb-up” and “thumb-down” buttons are available for every association
  – Users vote up or down on the association between an image and a category based on how relevant and accurate they think it is
• The value of this explicit feedback determines when a classification can be deleted or, conversely, when it becomes “hard”, i.e., confirmed
  – Each vote updates the confidence value of an association, increasing or decreasing it by 0.05 depending on whether the user believes the classification is correct
  – Confidence value reaches 1.00 -> association is hardened
  – Confidence value falls below a threshold -> association is deleted
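The voting rule above amounts to a small state update per vote. The sketch below illustrates it; the 0.05 step and the 1.00 hardening point come from the slide, while the deletion threshold of 0.20, the function name `apply_vote`, and the status labels are assumptions, since the slides do not give them.

```python
STEP = 0.05          # change per vote (from the slide)
HARDEN_AT = 1.00     # confidence at which an association is confirmed (from the slide)
DELETE_BELOW = 0.20  # assumed value; the slide only says "a threshold"


def apply_vote(confidence, thumbs_up):
    """Apply one vote and return (new_confidence, status).

    Status is "hardened", "deleted", or "open" (labels are hypothetical).
    """
    delta = STEP if thumbs_up else -STEP
    # Round to two decimals so repeated 0.05 steps do not drift
    # because of binary floating-point error.
    new_confidence = min(round(confidence + delta, 2), HARDEN_AT)
    if new_confidence >= HARDEN_AT:
        return new_confidence, "hardened"
    if new_confidence < DELETE_BELOW:
        return new_confidence, "deleted"
    return new_confidence, "open"
```

For example, a thumb-up on an association at 0.95 hardens it, and a thumb-down on one at 0.20 drops it below the assumed threshold so that it is deleted.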
Feedback on category associations
Conclusions
• Schema enrichment, cleansing and ordering are effective tools to remedy problems introduced by collaborative schema evolution
• Future improvements include recording actual administrator actions for training purposes