2004.10.19 - SLIDE 1 - IS 202 - FALL 2004
Lecture 15: Categorization
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2004
SIMS 202: Information Organization and Retrieval
Credits to Marti Hearst and Warren Sack for some of the slides in this lecture
2004.10.19 - SLIDE 2 - IS 202 - FALL 2004
Agenda
• Information Organization Overview
• Categorization
• Discussion Questions
• Action Items for Next Time
2004.10.19 - SLIDE 4 - IS 202 - FALL 2004
Information Organization Overview
Tuesday, October 19, 2004 Categorization
Thursday, October 21, 2004 Knowledge Representation
Tuesday, October 26, 2004 Project Introduction
Thursday, October 28, 2004 Lexical Relations and WordNet
Tuesday, November 02, 2004 Semantic Web and RDF
Thursday, November 04, 2004 Controlled Vocabularies Introduction
Tuesday, November 09, 2004 Facetted Classification and Thesaurus Design and Construction
Thursday, November 11, 2004 No Class -- Veteran's Day
2004.10.19 - SLIDE 5 - IS 202 - FALL 2004
Information Organization Overview
Tuesday, November 16, 2004 Metadata Standards
Thursday, November 18, 2004 Multimedia Information Organization and Retrieval
Tuesday, November 23, 2004 Metadata for Motion Pictures: Media Streams and MPEG-7
Thursday, November 25, 2004 No Class -- Thanksgiving Day
Tuesday, November 30, 2004 Mobile and Context-Aware Multimedia Information Systems
Thursday, December 02, 2004 Project Presentations
Tuesday, December 07, 2004 Looking Backward Looking Forward: Future of Information Systems
Thursday, December 09, 2004 Final Review
2004.10.19 - SLIDE 7 - IS 202 - FALL 2004
Categorization
2004.10.19 - SLIDE 8 - IS 202 - FALL 2004
Foucault on Borges
• This passage quotes “a certain Chinese encyclopedia” in which it is written that ‘animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) suckling pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies.’
  – Michel Foucault, The Order of Things, 1970
2004.10.19 - SLIDE 11 - IS 202 - FALL 2004
Why Study Categorization?
• Categorization is central to how we organize information and the world
• Categorization is a core cognitive process
• In recent years, centuries-old views of categorization have been revised
• Understanding how people categorize can help us design information systems that do a better job at organization and retrieval
2004.10.19 - SLIDE 12 - IS 202 - FALL 2004
Why Read Lakoff?
• Very influential figure in recent thinking about human categorization, metaphor, and cognition
• Provides summary of historical work and develops syncretic model of cognition and categorization
• Clear explanations using examples
• Professor at UC Berkeley (Department of Linguistics)
2004.10.19 - SLIDE 13 - IS 202 - FALL 2004
George Lakoff
• Lakoff’s research covers many areas of Conceptual Analysis within Cognitive Linguistics
  – The nature of human conceptual systems, especially metaphor systems for concepts such as time, events, causation, emotions, morality, the self, politics, etc.
  – The development of Cognitive Social Science, which applies ideas of Cognitive Semantics to the Social Sciences
  – The implications of Cognitive Science for Philosophy, in collaboration with Mark Johnson, Chair of Philosophy at the University of Oregon
  – Neural foundations of conceptual systems and language, in collaboration with Jerome Feldman, of the International Computer Science Institute, seeking to develop biologically-motivated structured connectionist systems to model both the learning of conceptual systems and their neural representations
  – The cognitive structure, especially the metaphorical structure, of mathematics, in collaboration with Rafael Núñez
2004.10.19 - SLIDE 14 - IS 202 - FALL 2004
George Lakoff
• Selected publications
  – Metaphors We Live By (with Mark Johnson). University of Chicago Press, 1980.
  – Women, Fire, and Dangerous Things. University of Chicago Press, 1987.
  – More Than Cool Reason (with Mark Turner). University of Chicago Press, 1989.
  – Moral Politics. University of Chicago Press, 1996.
  – Philosophy in The Flesh. Basic Books, 1999.
  – Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being (with Rafael Núñez). Basic Books, 2000.
  – Moral Politics: How Liberals and Conservatives Think. Second Edition. University of Chicago Press, 2002.
2004.10.19 - SLIDE 15 - IS 202 - FALL 2004
Objectivist Views
• Thought is mechanical manipulation of symbols
• The mind is an abstract machine
• Symbols get their meaning from correspondences to the external world
• Symbols are internal representations
• Abstract symbols stand in correspondence with the external world independent of the interpreting organism
• The human mind is a mirror of nature
• Human bodies play no role in characterizing concepts
• Thought is abstract and disembodied
• Exclusively symbolic machines are capable of thought
• Thought can be broken down into simple “building blocks”
• Thought is defined by mathematical logic
2004.10.19 - SLIDE 16 - IS 202 - FALL 2004
Experientialist Views
• Thought is embodied
• Thought is imaginative
• Thought has gestalt properties
• Thought utilizes basic-level categorization and basic-level primacy
• Thought uses prototypes and family resemblances as organizing structures
• Conceptual structure can be described using cognitive models that have the above properties
• The theory of cognitive models incorporates what was right about the traditional view of categorization, meaning, and reason, while accounting for the empirical data on categorization and fitting the new view overall
2004.10.19 - SLIDE 17 - IS 202 - FALL 2004
Central Conceptual Issue
• Do meaningful thought and reason concern merely the manipulations of abstract symbols and their correspondence to an objective reality, independent of any embodiment (except, perhaps, for limitations imposed by the organism)?
• Do meaningful thought and reason essentially concern the nature of the organism doing the thinking—including the nature of its body, its interaction in its environment, its social character, and so on?
2004.10.19 - SLIDE 18 - IS 202 - FALL 2004
Categorization
• Classical categorization
  – Necessary and sufficient conditions for membership
  – Generic-to-specific monohierarchical structure
• Modern categorization
  – Characteristic features (family resemblances)
  – Centrality/typicality (prototypes)
  – Basic-level categories
2004.10.19 - SLIDE 19 - IS 202 - FALL 2004
Defining Category Membership
• Necessary and sufficient conditions
  – Every condition must be met
  – No other conditions can be required
• Example: A prime number:
  – An integer divisible only by itself and 1.
    Source: Webster's Revised Unabridged Dictionary, © 1996, 1998 MICRA, Inc.
• Example: mother
  – A woman who has given birth to a child.
2004.10.19 - SLIDE 20 - IS 202 - FALL 2004
Defining Category Membership
• Necessary and sufficient conditions for Mother?
  – mother(A,B) -> female(A), gave-birth-to(A,B), same-species(A,B)
• What about
  – Birth mother vs. adoptive mother
  – Surrogate mother
  – Transgenic mother
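The mother(A,B) rule above can be read as a yes-or-no membership test. Below is a minimal Python sketch of that reading. The `Individual` record and its fields are hypothetical names introduced here for illustration, not part of the lecture. It shows how a necessary-and-sufficient definition either accepts or rejects a case outright, which is why the "What about" cases are awkward for it.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    # Hypothetical record type, invented for this illustration.
    name: str
    sex: str
    species: str
    gave_birth_to: set = field(default_factory=set)  # names of offspring


def is_mother_classical(a: Individual, b: Individual) -> bool:
    """Classical test: all three conditions must hold, and nothing else counts."""
    return (
        a.sex == "female"
        and b.name in a.gave_birth_to
        and a.species == b.species
    )


child = Individual("Sam", "male", "human")
birth_mother = Individual("Ann", "female", "human", {"Sam"})
adoptive_mother = Individual("Bea", "female", "human")

print(is_mother_classical(birth_mother, child))     # True
print(is_mother_classical(adoptive_mother, child))  # False
```

Under this rule an adoptive mother is simply not a mother, which is the kind of result that motivates the alternatives discussed next.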
2004.10.19 - SLIDE 21 - IS 202 - FALL 2004
Can Category Membership Be Defined?
• What are the necessary and sufficient conditions for something to be a game?
• Famous example by Wittgenstein
  – Classic categories assume clear boundaries defined by common properties (necessary and sufficient conditions)
• How do we categorize games?
2004.10.19 - SLIDE 22 - IS 202 - FALL 2004
Definition of Game
• Counterexample: “Game”
  – No common properties shared by all games
    • Card games, ball games, Olympic games, children’s games
      – Competition? Not in ring-around-the-rosy
      – Skill? Not in dice games
      – Luck? Not in chess
  – No fixed boundary to category
    • Can be extended to new games (e.g., video games)
• Alternative notion of category membership
  – Concepts related by family resemblances
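The contrast between "common properties" and "family resemblances" can be sketched in a few lines of Python. The feature sets below are invented for illustration (they are not from Wittgenstein or the lecture), and the names `GAME_FEATURES`, `common_features`, and `resembles` are hypothetical. The point is only that the set of features shared by every member can be empty even though each member overlaps with some other member.

```python
# Invented feature sets for a few games; illustration only.
GAME_FEATURES = {
    "chess": {"competition", "skill"},
    "dice game": {"competition", "luck"},
    "solitaire": {"skill", "luck", "amusement"},
    "ring-around-the-rosy": {"amusement"},
}


def common_features(features):
    """Features shared by every member (what a classical definition would need)."""
    return set.intersection(*features.values())


def resembles(features, a, b):
    """Two members resemble each other if they share at least one feature."""
    return bool(features[a] & features[b])


# No single feature is shared by all four games.
print(common_features(GAME_FEATURES))

# Yet every game resembles at least one other game.
for game in GAME_FEATURES:
    relatives = [
        other for other in GAME_FEATURES
        if other != game and resembles(GAME_FEATURES, game, other)
    ]
    print(game, "->", relatives)
```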
2004.10.19 - SLIDE 23 - IS 202 - FALL 2004
Properties of Categorization
• Family resemblance
  – Members of a category may be related to one another without all members having any property in common
    • Instead, they may share a large subset of traits
    • Some attributes are more likely given that others have been seen
  – Example: feathers, wings, twittering, ...
    • Likely to be a bird, but not all features apply to “emu”
    • Unlikely to see an association with “barks”
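The remark that some attributes are more likely given that others have been seen can be made concrete as a conditional frequency. This is a rough sketch over invented toy observations. The `OBSERVATIONS` list and the `conditional` helper are assumptions made for illustration, not data or a method from the lecture.

```python
# Each observation is the set of attributes seen on one (invented) animal.
OBSERVATIONS = [
    {"feathers", "wings", "twitters"},  # robin-like
    {"feathers", "wings", "twitters"},  # sparrow-like
    {"feathers", "wings"},              # emu-like: no twittering
    {"fur", "barks"},
    {"fur", "barks"},
    {"fur"},
]


def conditional(target, given, observations=OBSERVATIONS):
    """Fraction of observations having `given` that also have `target`."""
    with_given = [obs for obs in observations if given in obs]
    if not with_given:
        return 0.0
    return sum(target in obs for obs in with_given) / len(with_given)


for attribute in ("wings", "twitters", "barks"):
    print(attribute, "given feathers:", conditional(attribute, "feathers"))
```

In this toy data, wings always accompany feathers, twittering usually does, and barking never does, which mirrors the bird example on the slide.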
2004.10.19 - SLIDE 24 - IS 202 - FALL 2004
Properties of Categorization
• Example: Prime numbers
  – Definition: An integer divisible only by itself and 1
  – Examples: 2, 3, 5, 7, 11, 13, 17, …
• A very clear-cut category. Or is it?
  – Can one number be “more prime” than another?
• Centrality
  – Some members of a category may be “better examples” than others, i.e., “prototypical” members
    • Example: robins vs. chickens vs. emus
2004.10.19 - SLIDE 25 - IS 202 - FALL 2004
Properties of Categorization
• Characteristic features
  – Perceived degree of category membership has to do with which features help define the category
  – Members usually do not have ALL the necessary features, but have some subset
  – Those members that have more of the central features are seen as more central members
  – People have conceptions of typical members
2004.10.19 - SLIDE 26 - IS 202 - FALL 2004
Testing for Centrality/Typicality
• Ask a series of questions, compare how long it takes people to answer
  – True or false:
    • An apple is a fruit
    • A plum is a fruit
    • A coconut is a fruit
    • An olive is a fruit
    • A tomato is a fruit
• Rosch and Mervis
  – The more features a fruit shares with the other fruits, the more typical a member of the class it is
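One simple way to turn the Rosch and Mervis statement into a number is to weight each feature by how many category members have it, then sum those weights for each member. The sketch below loosely follows that idea. The feature lists are invented for illustration and are not experimental data, and `family_resemblance_scores` is a hypothetical helper name.

```python
from collections import Counter

# Invented toy feature lists, for illustration only.
FRUIT_FEATURES = {
    "apple": {"sweet", "round", "grows_on_tree", "eaten_raw", "juicy"},
    "plum": {"sweet", "round", "grows_on_tree", "eaten_raw", "juicy"},
    "coconut": {"round", "grows_on_tree", "hard_shell"},
    "olive": {"grows_on_tree", "salty", "small"},
    "tomato": {"round", "juicy", "eaten_raw", "savory"},
}


def family_resemblance_scores(features):
    """Score each member by summing, over its features,
    the number of category members that have that feature."""
    weights = Counter(f for feats in features.values() for f in feats)
    return {
        name: sum(weights[f] for f in feats)
        for name, feats in features.items()
    }


scores = family_resemblance_scores(FRUIT_FEATURES)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:8s} {score}")
```

With these made-up features, apple and plum score highest and olive lowest, matching the intuition that a member sharing more features with the rest of the category is a more typical member.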
2004.10.19 - SLIDE 27 - IS 202 - FALL 2004
Characteristic Features
• Is a cat on a mat a cat?
• Is a dead cat a cat?
• Is a photo of a cat a cat?
• Is a cat with three legs a cat?
• Is a cat that barks a cat?
• Is a cat with a dog’s brain a cat?
• Is a cat with every cell replaced by a dog’s cells a cat?
2004.10.19 - SLIDE 28 - IS 202 - FALL 2004
Properties of Categorization
• Basic-level categories
  – Categories are organized into a hierarchy from the most general to the most specific, but the level that is most cognitively basic is “in the middle” of the hierarchy
• Basic-level primacy
  – Basic-level categories are functionally primary with respect to factors including ease of cognitive processing (learning, reasoning, recognition, etc.)
2004.10.19 - SLIDE 29 - IS 202 - FALL 2004
Basic-Level Categories
• Brown 1958, 1965, Berlin et al., 1972, 1973
• Folk biology:
  – Unique beginner: plant, animal
  – Life form: tree, bush, flower
  – Generic name: pine, oak, maple, elm
  – Specific name: Ponderosa pine, white pine
  – Varietal name: Western Ponderosa pine
• No overlap between levels
• Level 3 is basic
  – Corresponds to genus
  – Folk biological categories correspond accurately to scientific biological categories only at the basic level
2004.10.19 - SLIDE 30 - IS 202 - FALL 2004
Psychologically Primary Levels
SUPERORDINATE animal furniture
BASIC LEVEL dog chair
SUBORDINATE terrier rocker
• Children take longer to learn superordinate categories above the basic level
• Superordinate categories above the basic level are not associated with mental images or motor actions
2004.10.19 - SLIDE 31 - IS 202 - FALL 2004
Basic-Level Categorization
• Perception
  – Overall perceived shape
  – Single mental image
  – Fast identification
• Function
  – General motor program
• Communication
  – Shortest, most commonly used and contextually neutral words
  – First learned by children
• Knowledge Organization
  – Most attributes of category members stored at this level
2004.10.19 - SLIDE 32 - IS 202 - FALL 2004
Middle-Out Categorization
• Top down
  – Object
    • Writing implement
      – Pen
• Bottom up
  – Sanford Uniball Black Pen
    • Ink Pen
      – Pen
• Middle out
  – Writing implement
    • Pen
      – Ink Pen
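One way to read the middle-out strategy is as a traversal that starts from a basic-level term and grows the hierarchy in both directions. The sketch below assumes that "pen" is the basic-level term and uses a hypothetical `PARENT` map assembled from the terms on the slide. The helper names `ancestors` and `descendants` are likewise invented for illustration.

```python
# Child -> parent links, assembled from the slide's terms (assumed structure).
PARENT = {
    "writing implement": "object",
    "pen": "writing implement",
    "ink pen": "pen",
    "Sanford Uniball black pen": "ink pen",
}


def ancestors(term):
    """Walk upward from a term to ever more general categories."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def descendants(term):
    """Walk downward from a term to ever more specific categories."""
    found = []
    for child, parent in PARENT.items():
        if parent == term:
            found.append(child)
            found.extend(descendants(child))
    return found


start = "pen"  # assumed basic-level starting point
print("more general:", ancestors(start))
print("more specific:", descendants(start))
```

Top-down and bottom-up construction correspond to starting the same walk at "object" or at the most specific product name instead.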
2004.10.19 - SLIDE 33 - IS 202 - FALL 2004
Summary
• Processes of categorization underlie many of the issues having to do with information organization
• Categorization is messier than our computer systems would like
• Human categories have graded membership, consisting of family resemblances
  – Family resemblance is expressed in part by which subset of features is shared
  – It is also determined by underlying understandings of the world that do not get represented in most systems
• Basic-level categories, as well as subordinate and superordinate categories, seem to be cognitively real and therefore important in the design of information organization and retrieval systems
2004.10.19 - SLIDE 35 - IS 202 - FALL 2004
Discussion Questions (Lakoff)
• Sarita Yardi on Lakoff
  – Doesn’t Lakoff’s prototype theory completely debunk IR (as we have learned it so far in this course)? The success of IR relies on its ability to categorize queries and documents such that they can be best matched for the user’s needs. However, if categorization is necessarily dependent on prototypes, embodiment, and human thought, then isn’t it impossible to develop an IR system that can be universally applicable?
  – I think Boolean matching might avoid the problem because it fits into the abstract and mathematical classical theory. All the other types (vector, probabilistic, relevance feedback, etc.) would require custom design after the effects of human reason had been factored into each individual usage.
  – Do you agree? Disagree? Couldn’t care less? Agree but nothing we can do about it? Feel that we should never ever try to apply theories to practical applications?
2004.10.19 - SLIDE 36 - IS 202 - FALL 2004
Discussion Questions (Lakoff)
• Sarita Yardi on Lakoff
  – Categorization plays an important role on the web. It sure would be nice if we had some super power index with categorized links to everything we could ever want to know.
  – Is there any hope for categorizing anything on the web at all or will increasing entropy (randomness and disorder) forever dominate the way we manage information on the web? Does the prototype theory give us more or less hope for the role of categorization on the web? (some random subjects to spur thought… blogs, personal home pages, university websites, Google versus Yahoo, media)
2004.10.19 - SLIDE 39 - IS 202 - FALL 2004
George Furnas Lecture
• Towards Framing the Convergence
  – Wednesday, October 20, 2004, 4:00 - 5:30 pm
  – 202 South Hall
• Abstract
  – The past several years have seen an increased convergence in disciplines working towards the goal of bringing people, information and technology together in more valuable ways. The goal has engaged participants ranging from Computer Science to Library Science, from Organizational Theory to Public Policy, and from Economics to Sociology, among many others. At the University of Michigan's School of Information we have been working for several years in an interdisciplinary effort, not just to participate in this convergence, but to understand it: Why is this broad suite of disciplines needed? What intellectual frameworks might we use for trying to understand how they fit together? How might we use such frameworks to leverage the disparate contributions better? This talk will describe those efforts and one take on some emerging results.
2004.10.19 - SLIDE 40 - IS 202 - FALL 2004
Homework (!)
• Course Reader
  – “The Vocabulary Problem in Human-System Communication” (G. W. Furnas, T. K. Landauer, L. M. Gomez, S. T. Dumais)
    • (Steve)
  – “CYC: A Large-Scale Investment in Knowledge Infrastructure” (D. B. Lenat)
    • (Rupa)
  – “Commonsense-Based Interfaces” (M. Minsky)
    • (Andrew)
  – Lakoff redux
    • (Morgan)