Categorization


Tom Griffiths

CogSci 131: Models of categorization

Spaces and features

•  Will show up in many contexts in this class:
   – similarity
   – semantic representation
   – categorization
   – neural networks
•  How can we use these representations?

Categorization

Outline

•  Prototype and exemplar theories
•  Break
•  Testing the theories

How can we explain typicality?

•  One answer: reject definitions, and adopt a new representation for categories
•  Prototype theory:
   – categories are represented by a prototype
   – other members share a family resemblance relation to the prototype
   – typicality is a function of similarity to the prototype

Family resemblance

[Figure: dot-pattern categories illustrating family resemblance among members of a category and their prototype (Posner & Keele, 1968)]

Posner and Keele (1968)

•  Prototype effect in categorization accuracy
•  Constructed categories by perturbing prototypical dot arrays
•  Ordering of categorization accuracy at test:
   – old exemplars
   – prototypes
   – new exemplars
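To make the design concrete, here is a minimal numpy sketch of constructing a category by perturbing a prototypical dot array. Posner and Keele used specific statistical distortion rules; the sketch below just adds Gaussian jitter, and the field size, number of dots, and noise level are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A prototypical "dot array": 9 dots in a 50x50 field.
# (Illustrative values, not the stimuli from Posner & Keele, 1968.)
prototype = rng.uniform(0, 50, size=(9, 2))

def perturb(prototype, noise_sd, rng):
    """Create a category member by jittering every dot of the prototype."""
    return prototype + rng.normal(0.0, noise_sd, size=prototype.shape)

# Ten training exemplars, all distortions of the same prototype.
training_exemplars = [perturb(prototype, noise_sd=2.0, rng=rng) for _ in range(10)]
```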

Formalizing prototype theories

Representation: each category (e.g., A, B) has a corresponding prototype (µA, µB)

Categorization: for a new stimulus x, choose the category that minimizes the distance (or, equivalently, maximizes the similarity) from x to its prototype

(e.g., Reed, 1972)
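A minimal sketch of this rule in Python, assuming numpy; the category members and the test point below are invented for illustration.

```python
import numpy as np

def prototype_classify(x, prototypes):
    """Assign x to the category whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: np.linalg.norm(x - prototypes[label]))

# Prototypes computed as the means of each category's members.
members_A = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 2.5]])
members_B = np.array([[5.0, 5.5], [6.0, 5.0], [5.5, 6.5]])
prototypes = {"A": members_A.mean(axis=0), "B": members_B.mean(axis=0)}

print(prototype_classify(np.array([2.0, 2.0]), prototypes))  # -> "A"
```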

Formalizing prototype theories

The prototype is the most frequent or "typical" member of the category.

For spaces:
   Prototype: e.g., the average of the members of the category
   Distance: e.g., Euclidean distance

$$d(x, \mu_A) = \left( \sum_k (x_k - \mu_{A,k})^2 \right)^{1/2}$$

For (binary) features:
   Prototype: e.g., the binary vector with the most frequent feature values
   Distance: e.g., Hamming distance

$$d(x, \mu_A) = \sum_k \left| x_k - \mu_{A,k} \right|$$

Distances

Euclidean distance (for spaces)

Hamming distance (for binary features), e.g.:

   01100100111
   01110100101

These two strings differ in two positions, so their Hamming distance is 2.

Formalizing prototype theories

[Figure: exemplars of Category A and Category B in a two-dimensional space, with prototypes at the category means; the decision boundary lies at equal distance from the two prototypes, and is always a straight line for two categories.]

More complex prototypes

•  Various extensions to simple prototype models have been explored…
•  For features, configural cue models:
   – compound features, such as "red and small"
   – results in a combinatorial explosion
•  For spaces, prototype models that incorporate information about variance:
   – category-specific measures of distance (a minimal sketch follows this list)
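One simple way to realize a category-specific distance, sketched below under the assumption of numpy arrays: scale each dimension by the category's variance along it. The function name is mine, not from the lecture; the full-covariance version of this idea is the Mahalanobis distance.

```python
import numpy as np

def variance_scaled_distance(x, members):
    """Distance from x to the category mean, with each dimension
    scaled by the category's variance on that dimension."""
    mu = members.mean(axis=0)
    var = members.var(axis=0)
    return float(np.sqrt(np.sum((x - mu) ** 2 / var)))
```

When the two categories have different variances, equal-distance boundaries under this measure are curved rather than straight, which is what the next figure shows.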

More complex prototypes

[Figure: exemplars of Category A and Category B with prototypes at the category means and category-specific measures of distance; the decision boundary at equal distance is no longer a straight line. Such boundaries are conic sections (parabolas, ellipses, and hyperbolas).]

Predicting prototype effects

•  Prototype effects are built into the model:
   – assume categorization becomes easier as proximity to the prototype increases…
   – …or as distance from the boundary increases
•  But what about the old exemplar advantage? (Posner & Keele, 1968)
•  Prototype models are not the only way to get prototype effects…

Exemplar theories

Store every member (“exemplar”) of the family

Formalizing exemplar theories

Representation: a set of stored exemplars y1, y2, …, yn, each with its own category label

Categorization: for a new stimulus x, choose category A with probability

$$P(A \mid x) = \frac{\beta_A \sum_{y \in A} \eta_{xy}}{\beta_A \sum_{y \in A} \eta_{xy} + \beta_B \sum_{y \in B} \eta_{xy}}$$

This is the "Luce-Shepard choice rule", where ηxy is the similarity of x to y and βA is the bias towards category A.

The context model (Medin & Schaffer, 1978)

Defined for stimuli with binary features (color, form, size, number):

1111 = (red, triangle, big, one)
0000 = (green, circle, small, two)

Similarity is defined as the product of the similarities on each dimension:

$$\eta_{xy} = \prod_k \eta_{x_k y_k}, \qquad \eta_{x_k y_k} = \begin{cases} 1 & \text{if } x_k = y_k \\ s_k & \text{otherwise} \end{cases}$$
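A sketch of the context model in Python, combining the product similarity with the Luce-Shepard choice rule from the previous slide; the exemplar sets and the mismatch parameters s are invented for illustration.

```python
import numpy as np

def context_similarity(x, y, s):
    """Product of per-dimension similarities: 1 on a match, s[k] on a mismatch."""
    x, y, s = np.asarray(x), np.asarray(y), np.asarray(s)
    return float(np.prod(np.where(x == y, 1.0, s)))

def prob_A(x, exemplars_A, exemplars_B, s, beta_A=0.5, beta_B=0.5):
    """Luce-Shepard choice rule over summed similarities to each category."""
    sim_A = sum(context_similarity(x, y, s) for y in exemplars_A)
    sim_B = sum(context_similarity(x, y, s) for y in exemplars_B)
    return beta_A * sim_A / (beta_A * sim_A + beta_B * sim_B)

# Invented binary exemplars (color, form, size, number) and mismatch parameters.
exemplars_A = [(1, 1, 1, 0), (1, 0, 1, 1)]
exemplars_B = [(0, 0, 0, 0), (0, 1, 0, 0)]
s = (0.2, 0.5, 0.3, 0.4)

print(prob_A((1, 1, 1, 1), exemplars_A, exemplars_B, s))
```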

Prototypes vs. exemplars

•  Exemplar models produce prototype effects:
   – if the prototype minimizes the distance to all exemplars in a category, then it has high probability
•  They also predict the old exemplar advantage:
   – being close (or identical) to an old exemplar of the category gives high probability
•  They predict new effects that prototype models cannot produce…
   – stimuli close to an old exemplar should have high probability, even far from the prototype

Break

Up next: Testing the theories


The 5-4 category structure (Medin & Schaffer, 1978)

[Table: five Category A exemplars and four Category B exemplars, defined over four binary features. Distances d(x,µ) from the Category A exemplars to the Category A prototype: 1, 2, 1, 1, 1. Distances from the Category B exemplars to the Category B prototype: 2, 2, 1, 0.]

Two test stimuli, "4" and "7", distinguish the theories:

Prototype: P(A|4) > P(A|7)
Exemplar: P(A|4) < P(A|7)

The generalized context model (Nosofsky, 1986)

Defined for stimuli in psychological space. Similarity decays with distance:

$$\eta_{xy} = \exp\{-c \, d(x, y)^p\}$$

where c is the "specificity"; p = 1 gives an exponential similarity gradient and p = 2 a Gaussian one. Categorization again uses the Luce-Shepard choice rule:

$$P(A \mid x) = \frac{\beta_A \sum_{y \in A} \eta_{xy}}{\beta_A \sum_{y \in A} \eta_{xy} + \beta_B \sum_{y \in B} \eta_{xy}}$$
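A sketch of the GCM in Python, following the same conventions as the context model sketch above; the default parameter values are illustrative.

```python
import numpy as np

def gcm_similarity(x, y, c=1.0, p=2):
    """eta_xy = exp(-c * d(x, y)^p): p = 1 exponential, p = 2 Gaussian."""
    d = np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.exp(-c * d ** p))

def gcm_prob_A(x, exemplars_A, exemplars_B, c=1.0, p=2, beta_A=0.5, beta_B=0.5):
    """Luce-Shepard choice rule with distance-based exemplar similarity."""
    sim_A = sum(gcm_similarity(x, y, c, p) for y in exemplars_A)
    sim_B = sum(gcm_similarity(x, y, c, p) for y in exemplars_B)
    return beta_A * sim_A / (beta_A * sim_A + beta_B * sim_B)
```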

The generalized context model

[Figure: exemplars of Category A and Category B with the decision boundary determined by the exemplars; contours mark the regions where the model chooses A with 90%, 50%, and 10% probability.]

Prototypes vs. exemplars

Exemplar models can capture complex boundaries.

Bells and whistles: distance metrics

The "weighted Minkowski r metric":

$$d(x, y) = \left( \sum_k w_k \, |x_k - y_k|^r \right)^{1/r}$$

where r determines the metric (Euclidean or city-block) and wk is the weight of dimension k (reflecting attention).
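A direct transcription of the metric as a Python sketch; the example points and weights are invented.

```python
import numpy as np

def weighted_minkowski(x, y, w, r=2):
    """Weighted Minkowski r metric: r = 2 is Euclidean, r = 1 is city-block."""
    diffs = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(w * diffs ** r) ** (1.0 / r))

# With equal weights the two metrics give the familiar values:
print(weighted_minkowski([0, 0], [3, 4], w=np.array([1.0, 1.0]), r=2))  # 5.0
print(weighted_minkowski([0, 0], [3, 4], w=np.array([1.0, 1.0]), r=1))  # 7.0
```

Increasing wk stretches differences along dimension k, which is how attention to a dimension is modeled.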

Using different metrics

•  r = 2: Euclidean distance, suited to "integral" dimensions (e.g., saturation & brightness)
•  r = 1: city-block distance, suited to "separable" dimensions (e.g., size & shape)

Dimensional attention

Allows rescaling of dimensions to aid in categorization (similar to capturing the variance of a category).

Evaluating models

•  Both prototype and exemplar models have lots of free parameters:
   – prototype locations
   – response biases, attention weights, r, p, c
•  Testing the models typically involves finding the best-fitting values of the parameters:
   – generic optimization methods (like gradient descent) are usually used to do this (a sketch follows this list)
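A sketch of this procedure, assuming the GCM form above and scipy's general-purpose optimizer; gradient descent or any similar method would do. The data arrays, the reparameterization βB = 1 − βA, and the parameter bounds are illustrative assumptions, not details from the lecture.

```python
import numpy as np
from scipy.optimize import minimize

def negative_log_likelihood(params, stimuli, said_A, exemplars_A, exemplars_B):
    """NLL of binary "A"/"B" responses under a GCM with parameters (c, beta_A).

    stimuli and exemplars are numpy arrays of points in psychological space;
    said_A[i] is 1 if the participant called stimulus i an "A", else 0.
    """
    c, beta_A = params
    nll = 0.0
    for x, resp in zip(stimuli, said_A):
        sim_A = sum(np.exp(-c * np.linalg.norm(x - y)) for y in exemplars_A)
        sim_B = sum(np.exp(-c * np.linalg.norm(x - y)) for y in exemplars_B)
        p_A = beta_A * sim_A / (beta_A * sim_A + (1.0 - beta_A) * sim_B)
        nll -= resp * np.log(p_A) + (1 - resp) * np.log(1.0 - p_A)
    return nll

# fit = minimize(negative_log_likelihood, x0=[1.0, 0.5],
#                args=(stimuli, said_A, exemplars_A, exemplars_B),
#                bounds=[(1e-3, 10.0), (1e-3, 1.0 - 1e-3)])
# fit.x holds the best-fitting (c, beta_A).
```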

Some questions…

•  Both prototype and exemplar models seem reasonable… are they "rational"?
   – are they solutions to the computational problem?
•  Should we use prototypes, or exemplars?
•  How can we define other models that handle more complex categorization problems?
•  Is this all that categories are?

Next week

•  Tuesday: Linear algebra
   – a way of computing with spaces
•  Thursday: Semantic networks
   – using some linear algebra!
   – Google and the mind…