Crowdsourcing Large-Scale
Semantic Feature Norms
Gabriel Recchia
Michael N. Jones
Semantic space models are computational models of
human semantic representation that typically operate on
distributional data (co-occurrence statistics)
A common criticism: Not grounded in perception
and action
Emergence of “perceptually grounded” computational
models integrating experiential and distributional data
• Andrews, M., Vigliocco, G., & Vinson, D. (2009). Integrating experiential and
distributional data to learn semantic representations.
• Durda, K., Buchanan, L., & Caron, R. (2009). Grounding co-occurrence: Identifying
features in a lexical co-occurrence model of semantic memory.
• Jones, M. N. & Recchia, G. (2010). You can't wear a coat rack: A binding framework
to avoid illusory feature migrations in perceptually grounded semantic models.
• Steyvers, M. (2010). Combining feature norms and text data with topic models.
• Vigliocco, G., Vinson, D. P., Lewis, W., & Garrett, M. F. (2004). Representing the
meanings of object and action words: The featural and unitary semantic space
hypothesis.
Where do features come from?
• For humans: Experience with the real world
• For models: Human-generated property norms
[Figure: diagram with the words bluebird, housefly, starling; the features has_feathers, has_wings; and the labels FEATURE VECTORS and MEMORY VECTORS]
feature examples
McRae, Cree, Seidenberg & McNorgan (2005), Appendix F
Issues with “grounded” distributional models
• Not enough grounded concepts
• Features represented as discrete entities
How to get data…
“In this experiment, you will describe various words…”
fun game
von Ahn, L. and L. Dabbish. (2004). Labeling images with a computer game.
ACM Conference on Human Factors in Computing Systems, CHI 2004.
Baroni, M. & Lenci, A. (2008). Concepts and properties in word spaces.
Making property generation into a game –
do participants generate usable data?
• 45 subjects generated ten features for each of 16 to 48
words, resulting in at least 30 subjects having generated
features for each of the 48 words
• For comparison to McRae norms, features manually
remapped
(“gives bad breath” → beh_-_causes_bad_breath,
“is a fruit” → a_fruit, etc.)
• Word by feature matrix constructed: cell at
<w, f> contains the number of participants listing feature
f for word w
• Square word-by-word matrix constructed: cell at
<w1, w2> contains the cosine between the rows for
word w1 and word w2 in the word-by-feature matrix
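The two matrices described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `listings` data, the function names, and the rule that a participant counts once per word and feature are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy listings: (participant, word, normalized feature).
listings = [
    ("p1", "bluebird", "has_feathers"),
    ("p1", "bluebird", "has_wings"),
    ("p2", "bluebird", "has_wings"),
    ("p1", "housefly", "has_wings"),
    ("p2", "starling", "has_feathers"),
    ("p2", "starling", "has_wings"),
]

def word_by_feature(listings):
    """For each word, count how many distinct participants listed each feature."""
    counts = {}
    # Assumption: a participant contributes at most once per (word, feature).
    for _, word, feature in set(listings):
        counts.setdefault(word, Counter())[feature] += 1
    return counts

def cosine(row_a, row_b):
    """Cosine between two sparse rows stored as feature -> count mappings."""
    dot = sum(n * row_b.get(f, 0) for f, n in row_a.items())
    norm_a = math.sqrt(sum(n * n for n in row_a.values()))
    norm_b = math.sqrt(sum(n * n for n in row_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def word_by_word(counts):
    """Square matrix keyed by (w1, w2), holding the cosine of the two rows."""
    words = sorted(counts)
    return {(w1, w2): cosine(counts[w1], counts[w2])
            for w1 in words for w2 in words}

wf = word_by_feature(listings)
ww = word_by_word(wf)
```

In this toy data, `wf["bluebird"]["has_wings"]` is 2 because two participants listed that feature, and `ww[("bluebird", "housefly")]` is the cosine of the bluebird and housefly count rows.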
Do participants in the “game” task generate usable data?
• Word-by-feature matrix: Rows had high correlations, on
average, with the corresponding rows in McRae matrix
(M = .83, SD = .08)
• Word-by-word matrix correlations similarly high
(M = .96, SD = .03)
• Word-by-word matrix correlations with the diagonal removed:
M = .82, SD = .23
• Higher-order statistics correlate as well
– Number of features
– % shared features
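One way to run a comparison of this kind is sketched below. The slides do not state which correlation coefficient was used or exactly how rows were paired, so Pearson correlation over corresponding rows is an assumption here, and the two small matrices are invented toy values rather than data from the study. Every word has a cosine of 1 with itself, so the diagonal cells agree trivially in both matrices, which is one plausible reason for also reporting the comparison with the diagonal removed.

```python
import math
from statistics import mean, stdev

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rowwise_correlations(a, b, drop_diagonal=False):
    """Correlate row i of a with row i of b, optionally skipping cell (i, i)."""
    results = []
    for i, (row_a, row_b) in enumerate(zip(a, b)):
        if drop_diagonal:
            row_a = row_a[:i] + row_a[i + 1:]
            row_b = row_b[:i] + row_b[i + 1:]
        results.append(pearson(row_a, row_b))
    return results

# Hypothetical word-by-word cosine matrices (game data vs. existing norms).
game = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
norms = [
    [1.0, 0.6, 0.3, 0.1],
    [0.6, 1.0, 0.1, 0.3],
    [0.3, 0.1, 1.0, 0.9],
    [0.1, 0.3, 0.9, 1.0],
]

with_diag = rowwise_correlations(game, norms)
without_diag = rowwise_correlations(game, norms, drop_diagonal=True)

# Summaries in the same form as the slide (mean and standard deviation).
print(f"with diagonal:    M = {mean(with_diag):.2f}, SD = {stdev(with_diag):.2f}")
print(f"diagonal removed: M = {mean(without_diag):.2f}, SD = {stdev(without_diag):.2f}")
```

The same `rowwise_correlations` helper applies to the word-by-feature comparison, provided both matrices use the same word order and the same feature columns.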
Still not much of a game…
• Participant testimonials
– “It was hard”
– “Took too long”
– “After a while I just wanted it to be done”
• Can something like this be made into
something people would willingly do?
Using Verbosity: Common Sense Data
from Games with a Purpose (Speer, Havasi, & Surana, 2010)
Speer, Havasi, & Surana (2010), Fig. 2
Adapted from Speer, Havasi, & Surana (2010), Fig. 5
Issues
• Predefined frames ignored
• Sound-alikes
• Effect of teammates’ guesses
leg has lower limb
toy is a kind of little
sail is a boat
servant has paid help
produce is a type of fruits vegetables
attack is a tack
belief is a kind of be leaf
chord is typically in rhymes sword
heat looks like feat meat
machine looks like mush sheen
passion looks like fashion
wander is a type of wonder
Desiderata
• Open-ended, as opposed to restricting the
player to predefined frames
• Incentives for player to provide actual
features, as opposed to associates or
sound-alikes
• Minimize the effect that teammates’
guesses have on player’s descriptions
http://mypage.iu.edu/~grecchia/FeatureGameInstaller.exe
Challenges
• Two main types of players…
– Descriptions are single-word associates
(can’t be normed automatically)
– Descriptions are rich and many words long
(can’t be normed automatically)
• Possible approaches: Restrict to two/three word
descriptions? Classify semantic relations via
another game?
• Other data of interest?
Thank You