2004.10.19 - SLIDE 1 IS 202 - FALL 2004 Lecture 15: Categorization Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2004 SIMS 202: Information Organization and Retrieval Credits to Marti Hearst and Warren Sack for some of the slides in this lecture

2004.10.19 - SLIDE 1 IS 202 - FALL 2004

Lecture 15: Categorization

Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday 10:30 am - 12:00 pm

Fall 2004

SIMS 202: Information Organization and Retrieval

Credits to Marti Hearst and Warren Sack for some of the slides in this lecture

2004.10.19 - SLIDE 2 IS 202 - FALL 2004

Agenda

• Information Organization Overview

• Categorization

• Discussion Questions

• Action Items for Next Time

2004.10.19 - SLIDE 4 IS 202 - FALL 2004

Information Organization Overview

Tuesday, October 19, 2004 Categorization

Thursday, October 21, 2004 Knowledge Representation

Tuesday, October 26, 2004 Project Introduction

Thursday, October 28, 2004 Lexical Relations and WordNet

Tuesday, November 02, 2004 Semantic Web and RDF

Thursday, November 04, 2004 Controlled Vocabularies Introduction

Tuesday, November 09, 2004 Faceted Classification and Thesaurus Design and Construction

Thursday, November 11, 2004 No Class -- Veterans Day

2004.10.19 - SLIDE 5 IS 202 - FALL 2004

Information Organization Overview

Tuesday, November 16, 2004 Metadata Standards

Thursday, November 18, 2004 Multimedia Information Organization and Retrieval

Tuesday, November 23, 2004 Metadata for Motion Pictures: Media Streams and MPEG-7

Thursday, November 25, 2004 No Class -- Thanksgiving Day

Tuesday, November 30, 2004 Mobile and Context-Aware Multimedia Information Systems

Thursday, December 02, 2004 Project Presentations

Tuesday, December 07, 2004 Looking Backward Looking Forward: Future of Information Systems

Thursday, December 09, 2004 Final Review

2004.10.19 - SLIDE 7 IS 202 - FALL 2004

Categorization

Tuesday, October 19, 2004 Categorization

Thursday, October 21, 2004 Knowledge Representation

Tuesday, October 26, 2004 Project Introduction

Thursday, October 28, 2004 Lexical Relations and WordNet

Tuesday, November 02, 2004 Semantic Web and RDF

Thursday, November 04, 2004 Controlled Vocabularies Introduction

Tuesday, November 09, 2004 Faceted Classification and Thesaurus Design and Construction

2004.10.19 - SLIDE 8 IS 202 - FALL 2004

Foucault on Borges

• This passage quotes “a certain Chinese encyclopedia” in which it is written that ‘animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) suckling pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies.’
  – Michel Foucault, The Order of Things, 1970

2004.10.19 - SLIDE 9 IS 202 - FALL 2004

Yahoo! Categorization

2004.10.19 - SLIDE 10 IS 202 - FALL 2004

Yahoo! Categorization Detail

2004.10.19 - SLIDE 11 IS 202 - FALL 2004

Why Study Categorization?

• Categorization is central to how we organize information and the world

• Categorization is a core cognitive process

• In recent years, centuries-old views of categorization have been revised

• Understanding how people categorize can help us design information systems that do a better job at organization and retrieval

2004.10.19 - SLIDE 12 IS 202 - FALL 2004

Why Read Lakoff?

• Very influential figure in recent thinking about human categorization, metaphor, and cognition

• Provides summary of historical work and develops syncretic model of cognition and categorization

• Clear explanations using examples

• Professor at UC Berkeley (Department of Linguistics)

2004.10.19 - SLIDE 13 IS 202 - FALL 2004

George Lakoff

• Lakoff’s research covers many areas of Conceptual Analysis within Cognitive Linguistics
  – The nature of human conceptual systems, especially metaphor systems for concepts such as time, events, causation, emotions, morality, the self, politics, etc.
  – The development of Cognitive Social Science, which applies ideas of Cognitive Semantics to the Social Sciences
  – The implications of Cognitive Science for Philosophy, in collaboration with Mark Johnson, Chair of Philosophy at the University of Oregon
  – Neural foundations of conceptual systems and language, in collaboration with Jerome Feldman, of the International Computer Science Institute, seeking to develop biologically-motivated structured connectionist systems to model both the learning of conceptual systems and their neural representations
  – The cognitive structure, especially the metaphorical structure, of mathematics, in collaboration with Rafael Núñez

2004.10.19 - SLIDE 14 IS 202 - FALL 2004

George Lakoff

• Selected publications
  – Metaphors We Live By (with Mark Johnson). University of Chicago Press, 1980.
  – Women, Fire, and Dangerous Things. University of Chicago Press, 1987.
  – More Than Cool Reason (with Mark Turner). University of Chicago Press, 1989.
  – Moral Politics. University of Chicago Press, 1996.
  – Philosophy in the Flesh. Basic Books, 1999.
  – Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being (with Rafael Núñez). Basic Books, 2000.
  – Moral Politics: How Liberals and Conservatives Think. Second Edition. University of Chicago Press, 2002.

2004.10.19 - SLIDE 15 IS 202 - FALL 2004

Objectivist Views

• Thought is mechanical manipulation of symbols
• The mind is an abstract machine
• Symbols get their meaning from correspondences to the external world
• Symbols are internal representations
• Abstract symbols stand in correspondence with the external world independent of the interpreting organism
• The human mind is a mirror of nature
• Human bodies play no role in characterizing concepts
• Thought is abstract and disembodied
• Exclusively symbolic machines are capable of thought
• Thought can be broken down into simple “building blocks”
• Thought is defined by mathematical logic

2004.10.19 - SLIDE 16 IS 202 - FALL 2004

Experientialist Views

• Thought is embodied
• Thought is imaginative
• Thought has gestalt properties
• Thought utilizes basic-level categorization and basic-level primacy
• Thought uses prototypes and family resemblances as organizing structures
• Conceptual structure can be described using cognitive models that have the above properties
• The theory of cognitive models incorporates what was right about the traditional view of categorization, meaning, and reason, while accounting for the empirical data on categorization and fitting the new view overall

2004.10.19 - SLIDE 17 IS 202 - FALL 2004

Central Conceptual Issue

• Do meaningful thought and reason concern merely the manipulations of abstract symbols and their correspondence to an objective reality, independent of any embodiment (except, perhaps, for limitations imposed by the organism)?

• Do meaningful thought and reason essentially concern the nature of the organism doing the thinking—including the nature of its body, its interaction in its environment, its social character, and so on?

2004.10.19 - SLIDE 18 IS 202 - FALL 2004

Categorization

• Classical categorization
  – Necessary and sufficient conditions for membership
  – Generic-to-specific monohierarchical structure
• Modern categorization
  – Characteristic features (family resemblances)
  – Centrality/typicality (prototypes)
  – Basic-level categories

2004.10.19 - SLIDE 19 IS 202 - FALL 2004

Defining Category Membership

• Necessary and sufficient conditions
  – Every condition must be met
  – No other conditions can be required
• Example: A prime number
  – An integer divisible only by itself and 1.
    Source: Webster's Revised Unabridged Dictionary, © 1996, 1998 MICRA, Inc.
• Example: mother
  – A woman who has given birth to a child.

2004.10.19 - SLIDE 20 IS 202 - FALL 2004

Defining Category Membership

• Necessary and sufficient conditions for Mother?
  – mother(A,B) -> female(A), gave-birth-to(A,B), same-species(A,B)
• What about
  – Birth mother vs. adoptive mother
  – Surrogate mother
  – Transgenic mother
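The mother(A,B) rule on this slide can be read as a predicate that is true only when every condition holds. The following is a minimal sketch under stated assumptions: the `Person` record, its field names, and the function `is_classical_mother` are hypothetical and do not come from the lecture. It shows why a strict necessary-and-sufficient test leaves out a case such as the adoptive mother.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    # Hypothetical record for illustration; the field names are assumptions.
    name: str
    sex: str
    species: str = "human"
    gave_birth_to: set = field(default_factory=set)


def is_classical_mother(a: Person, b: Person) -> bool:
    """Classical test: all three conditions must hold, and nothing else counts."""
    return (
        a.sex == "female"
        and b.name in a.gave_birth_to
        and a.species == b.species
    )


child = Person("Sam", "male")
birth_mother = Person("Ana", "female", gave_birth_to={"Sam"})
adoptive_mother = Person("Bea", "female")

print(is_classical_mother(birth_mother, child))     # True
print(is_classical_mother(adoptive_mother, child))  # False
```

The adoptive mother fails the test even though ordinary usage calls her a mother, which is the kind of boundary problem the "What about" cases raise.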

2004.10.19 - SLIDE 21 IS 202 - FALL 2004

Can Category Membership Be Defined?

• What are the necessary and sufficient conditions for something to be a game?
• Famous example by Wittgenstein
  – Classic categories assume clear boundaries defined by common properties (necessary and sufficient conditions)
• How do we categorize games?

2004.10.19 - SLIDE 22 IS 202 - FALL 2004

Definition of Game

• Counterexample: “Game”
  – No common properties shared by all games
    • Card games, ball games, Olympic games, children’s games
    • Competition: absent in ring-around-the-rosy
    • Skill: absent in dice games
    • Luck: absent in chess
  – No fixed boundary to category
    • Can be extended to new games (e.g., video games)
• Alternative notion of category membership
  – Concepts related by family resemblances

2004.10.19 - SLIDE 23 IS 202 - FALL 2004

Properties of Categorization

• Family resemblance
  – Members of a category may be related to one another without all members having any property in common
    • Instead, they may share a large subset of traits
    • Some attributes are more likely given that others have been seen
  – Example: feathers, wings, twittering, ...
    • Likely to be a bird, but not all features apply to “emu”
    • Unlikely to see an association with “barks”
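Family resemblance can be illustrated with plain set operations. This is a minimal sketch under stated assumptions: the feature sets assigned to each game below are invented for illustration and are not data from the lecture, and `shares_with_some_other` is a hypothetical helper. It checks that no single feature is common to every member while each member still overlaps with at least one other.

```python
# Illustrative feature sets (assumptions, not lecture data).
games = {
    "chess": {"competition", "skill"},
    "dice": {"competition", "luck"},
    "solitaire": {"skill", "luck", "amusement"},
    "ring-around-the-rosy": {"amusement"},
}

# Features shared by ALL members: the classical view needs this to be non-empty.
common_to_all = set.intersection(*games.values())


def shares_with_some_other(name):
    """True if the named game shares at least one feature with another game."""
    return any(games[name] & games[other] for other in games if other != name)


print(common_to_all)                                   # set()
print(all(shares_with_some_other(g) for g in games))   # True
```

With these assumed features the category holds together through overlapping pairs rather than through one defining property.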

2004.10.19 - SLIDE 24 IS 202 - FALL 2004

Properties of Categorization

• Example: Prime numbers
  – Definition: An integer divisible only by itself and 1
  – Examples: 2, 3, 5, 7, 11, 13, 17, …
• A very clear-cut category. Or is it?
  – Can one number be “more prime” than another?
• Centrality
  – Some members of a category may be “better examples” than others, i.e., “prototypical” members
    • Example: robins vs. chickens vs. emus

2004.10.19 - SLIDE 25 IS 202 - FALL 2004

Properties of Categorization

• Characteristic features
  – Perceived degree of category membership has to do with which features help define the category
  – Members usually do not have ALL the necessary features, but have some subset
  – Those members that have more of the central features are seen as more central members
  – People have conceptions of typical members

2004.10.19 - SLIDE 26 IS 202 - FALL 2004

Testing for Centrality/Typicality

• Ask a series of questions, compare how long it takes people to answer
  – True or false:
    • An apple is a fruit
    • A plum is a fruit
    • A coconut is a fruit
    • An olive is a fruit
    • A tomato is a fruit
• Rosch and Mervis
  – The more features a fruit shares with the other fruits, the more typical a member of the class it is
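The idea that typicality grows with shared features can be turned into a simple count. The sketch below is a simplified score in the spirit of that statement, not a reproduction of Rosch and Mervis's procedure: the feature lists are invented for illustration, and `resemblance_score` is a hypothetical name.

```python
from collections import Counter

# Hypothetical feature lists; these are assumptions, not experimental data.
fruits = {
    "apple": {"sweet", "grows on trees", "eaten raw", "has seeds"},
    "plum": {"sweet", "grows on trees", "eaten raw", "has pit"},
    "coconut": {"grows on trees", "hard shell"},
    "olive": {"grows on trees", "has pit", "salty"},
    "tomato": {"has seeds", "eaten raw", "used in salads"},
}

# How many category members carry each feature.
feature_counts = Counter(f for feats in fruits.values() for f in feats)


def resemblance_score(name):
    """Sum, over the member's features, of how many OTHER members share each one."""
    return sum(feature_counts[f] - 1 for f in fruits[name])


for name in sorted(fruits, key=resemblance_score, reverse=True):
    print(name, resemblance_score(name))
```

Under these assumed features, apple and plum score highest and coconut and tomato lowest, matching the intuition that some fruits are better examples than others.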

2004.10.19 - SLIDE 27 IS 202 - FALL 2004

Characteristic Features

• Is a cat on a mat a cat?

• Is a dead cat a cat?

• Is a photo of a cat a cat?

• Is a cat with three legs a cat?

• Is a cat that barks a cat?

• Is a cat with a dog’s brain a cat?

• Is a cat with every cell replaced by a dog’s cells a cat?

2004.10.19 - SLIDE 28 IS 202 - FALL 2004

Properties of Categorization

• Basic-level categories
  – Categories are organized into a hierarchy from the most general to the most specific, but the level that is most cognitively basic is “in the middle” of the hierarchy
• Basic-level primacy
  – Basic-level categories are functionally primary with respect to factors including ease of cognitive processing (learning, reasoning, recognition, etc.)

2004.10.19 - SLIDE 29 IS 202 - FALL 2004

Basic-Level Categories

• Brown 1958, 1965, Berlin et al., 1972, 1973
• Folk biology:
  – Unique beginner: plant, animal
  – Life form: tree, bush, flower
  – Generic name: pine, oak, maple, elm
  – Specific name: Ponderosa pine, white pine
  – Varietal name: Western Ponderosa pine
• No overlap between levels
• Level 3 is basic
  – Corresponds to genus
  – Folk biological categories correspond accurately to scientific biological categories only at the basic level
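The five folk-biology levels can be written down as an ordered enumeration, with the basic level marked in the middle. This is a minimal sketch under stated assumptions: the names `FolkLevel`, `BASIC_LEVEL`, and `basic_level_name` are hypothetical, and chaining "plant", "tree", and "pine" into one path is an assumption made to combine the slide's separate examples.

```python
from enum import IntEnum


class FolkLevel(IntEnum):
    """The five folk-biology ranks listed on the slide, numbered top down."""
    UNIQUE_BEGINNER = 1
    LIFE_FORM = 2
    GENERIC = 3
    SPECIFIC = 4
    VARIETAL = 5


# The slide states that level 3 (generic name) is the basic level.
BASIC_LEVEL = FolkLevel.GENERIC

# One root-to-leaf path built from the slide's examples (the chaining is assumed).
path = [
    ("plant", FolkLevel.UNIQUE_BEGINNER),
    ("tree", FolkLevel.LIFE_FORM),
    ("pine", FolkLevel.GENERIC),
    ("Ponderosa pine", FolkLevel.SPECIFIC),
    ("Western Ponderosa pine", FolkLevel.VARIETAL),
]


def basic_level_name(path):
    """Return the name on the path that sits at the basic level, if any."""
    for name, level in path:
        if level == BASIC_LEVEL:
            return name
    return None


print(basic_level_name(path))  # pine
```

A system that labels or browses items could default to this middle rank and offer the more general and more specific ranks as options.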

2004.10.19 - SLIDE 30 IS 202 - FALL 2004

Psychologically Primary Levels

SUPERORDINATE   animal    furniture
BASIC LEVEL     dog       chair
SUBORDINATE     terrier   rocker

• Children take longer to learn superordinate categories above the basic level

• Superordinate categories above the basic level are not associated with mental images or motor actions

2004.10.19 - SLIDE 31 IS 202 - FALL 2004

Basic-Level Categorization

• Perception
  – Overall perceived shape
  – Single mental image
  – Fast identification
• Function
  – General motor program
• Communication
  – Shortest, most commonly used and contextually neutral words
  – First learned by children
• Knowledge Organization
  – Most attributes of category members stored at this level

2004.10.19 - SLIDE 32 IS 202 - FALL 2004

Middle-Out Categorization

• Top down
  – Object
    • Writing implement
      – Pen
• Bottom up
  – Sanford Uniball Black Pen
    • Ink Pen
      – Pen
• Middle out
  – Writing implement
    • Pen
      – Ink Pen

2004.10.19 - SLIDE 33 IS 202 - FALL 2004

Summary

• Processes of categorization underlie many of the issues having to do with information organization
• Categorization is messier than our computer systems would like
• Human categories have graded membership, consisting of family resemblances
  – Family resemblance is expressed in part by which subset of features is shared
  – It is also determined by underlying understandings of the world that do not get represented in most systems
• Basic-level categories, as well as subordinate and superordinate categories, seem to be cognitively real and therefore important in the design of information organization and retrieval systems

2004.10.19 - SLIDE 35 IS 202 - FALL 2004

Discussion Questions (Lakoff)

• Sarita Yardi on Lakoff
  – Doesn’t Lakoff’s prototype theory completely debunk IR (as we have learned it so far in this course)? The success of IR relies on its ability to categorize queries and documents such that they can be best matched for the user’s needs. However, if categorization is necessarily dependent on prototypes, embodiment, and human thought, then isn’t it impossible to develop an IR system that can be universally applicable?
  – I think Boolean matching might avoid the problem because it fits into the abstract and mathematical classical theory. All the other types (vector, probabilistic, relevance feedback, etc.) would require custom design after the effects of human reason had been factored in to each individual usage.
  – Do you agree? Disagree? Couldn’t care less? Agree but nothing we can do about it? Feel that we should never ever try to apply theories to practical applications?
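As background for this question, the contrast between Boolean and vector matching can be shown on a toy example. This is a minimal sketch under stated assumptions: the helper names `boolean_and_match` and `cosine_similarity` and the term lists are hypothetical, and the code uses raw term counts rather than any particular weighting scheme. Boolean AND gives a crisp in-or-out answer, much like a classical category, while cosine similarity gives a graded score.

```python
import math
from collections import Counter


def boolean_and_match(query_terms, doc_terms):
    """Crisp membership: the document matches only if it has every query term."""
    return set(query_terms) <= set(doc_terms)


def cosine_similarity(query_terms, doc_terms):
    """Graded membership: cosine of the angle between raw term-count vectors."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


# Toy data (assumptions for illustration only).
query = ["pine", "tree"]
doc_a = ["pine", "tree", "forest"]
doc_b = ["pine", "cone"]

print(boolean_and_match(query, doc_a), boolean_and_match(query, doc_b))
print(round(cosine_similarity(query, doc_a), 3), round(cosine_similarity(query, doc_b), 3))
```

Here the second document is rejected outright by the Boolean test but still receives a nonzero similarity score, so the two models draw the category boundary differently.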

2004.10.19 - SLIDE 36 IS 202 - FALL 2004

Discussion Questions (Lakoff)

• Sarita Yardi on Lakoff
  – Categorization plays an important role on the web. It sure would be nice if we had some super power index with categorized links to everything we could ever want to know.
  – Is there any hope for categorizing anything on the web at all or will increasing entropy (randomness and disorder) forever dominate the way we manage information on the web? Does the prototype theory give us more or less hope for the role of categorization on the web? (some random subjects to spur thought… blogs, personal home pages, university websites, Google versus Yahoo, media)

2004.10.19 - SLIDE 38 IS 202 - FALL 2004

Next Time

• Knowledge Representation

2004.10.19 - SLIDE 39 IS 202 - FALL 2004

George Furnas Lecture

• Towards Framing the Convergence
  – Wednesday, October 20, 2004, 4:00 - 5:30 pm
  – 202 South Hall
• Abstract
  – The past several years have seen an increased convergence in disciplines working towards the goal of bringing people, information and technology together in more valuable ways. The goal has engaged participants ranging from Computer Science to Library Science, from Organizational Theory to Public Policy, and from Economics to Sociology, among many others. At the University of Michigan's School of Information we have been working for several years in an interdisciplinary effort, not just to participate in this convergence, but to understand it: Why is this broad suite of disciplines needed? What intellectual frameworks might we use for trying to understand how they fit together? How might we use such frameworks to leverage the disparate contributions better? This talk will describe those efforts and one take on some emerging results.

2004.10.19 - SLIDE 40 IS 202 - FALL 2004

Homework (!)

• Course Reader
  – “The Vocabulary Problem in Human-System Communication” (G. W. Furnas, T. K. Landauer, L. M. Gomez, S. T. Dumais)
    • (Steve)
  – “CYC: A Large-Scale Investment in Knowledge Infrastructure” (D. B. Lenat)
    • (Rupa)
  – “Commonsense-Based Interfaces” (M. Minsky)
    • (Andrew)
  – Lakoff redux
    • (Morgan)