Assessing the pragmatics of experiments with crowdsourcing: The case of scalar implicature
Pranav Anand, Caroline Andrews, Matthew Wagers
University of California, Santa Cruz
Experiments & Pragmatic Processing
Each of the critics reviewed some of the movies. (but not all?)
Depending on the study:
– evidence for embedded implicatures (EIs), with different response choices
– no evidence of EIs
Worry: How much do methodologies themselves influence judgements?
Worry: Are we adequately testing the influence of methodologies on our data?
Case Study: (Embedded) Implicatures
Previous Limitation: Lack of Subjects and Money
Crowdsourcing addresses both problems
Pragmatics of Experimental Situations
Teleological Curiosity – subjects hypothesize the “expected” behavior and try to match an ideal
Evaluation Apprehension – subjects know they are being judged
The experiment itself is part of the pragmatic context (see Rosenthal & Rosnow 1975, The Volunteer Subject).
Elements of Experimental Context
Response Structure – response choices available to the subject (e.g. True/False, Yes/No, 1–7 scale)
Prompt – the Question
Protocol – social context / task specification, including directions for the Response Structure
Immediate Linguistic/Visual Context
Our Goal: Explore variations of these elements in a systematic way
Experimental Design
Sample item: “Some of the spices have red lids.” Prompt: “Is this an accurate description?”
Linguistic Contexts – All-Relevant, All-Irrelevant, No Context
Protocol – Experimental: normal experiment instructions; Annotation: checking the work of unaffiliated annotators
Items – 4 implicature targets, 6 some/all controls, 20 fillers
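The design can be sketched as a condition grid plus a per-participant item list. This is a minimal illustration, not the authors' materials: it assumes that Experiment 1 fully crosses the three linguistic contexts with the two protocols (the Experiment 1 results show both protocols for every context) and that each participant sees all 30 items once in a shuffled order. The names `CONTEXTS`, `PROTOCOLS`, `ITEM_COUNTS`, `CONDITIONS` and `make_trial_list` are hypothetical.

```python
import itertools
import random

# Factors as listed on the design slide.
CONTEXTS = ["All-Relevant", "All-Irrelevant", "No Context"]
PROTOCOLS = ["Experimental", "Annotation"]

# Item inventory: 4 implicature targets, 6 some/all controls, 20 fillers.
ITEM_COUNTS = {"implicature_target": 4, "some_all_control": 6, "filler": 20}

# Hypothetical full crossing of context x protocol (3 x 2 = 6 cells).
CONDITIONS = list(itertools.product(CONTEXTS, PROTOCOLS))


def make_trial_list(seed: int) -> list[tuple[str, int]]:
    """Return a shuffled list of (item_type, item_index) pairs for one participant.

    Assumption: every participant sees every item exactly once, in random order.
    """
    trials = [
        (item_type, i)
        for item_type, n in ITEM_COUNTS.items()
        for i in range(n)
    ]
    rng = random.Random(seed)  # seeded so a participant's order is reproducible
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    print(len(CONDITIONS), "conditions")
    trials = make_trial_list(seed=1)
    print(len(trials), "trials; first three:", trials[:3])
```

Seeding one `random.Random` instance per participant keeps orders reproducible without touching global random state; a real study would likely also counterbalance item-to-context assignment, which this sketch leaves out.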
Experiment 1: Social Context
Focus on Protocol: Annotation vs. Experiment
Population: Undergraduates
[Figure: results by linguistic context (All-Irrelevant, No Story, All-Relevant) for the Experiment and Annotation protocols]
Accuracy Prompt – “Is this an accurate description?”
Response Categories – Yes, No, Don’t Know
Finding: Social context matters even when linguistic context does not.
Linguistic Context: no effect
Lower scalar implicature (SI) rate for Annotation (p < 0.05)
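The slides do not say which statistical test produced this comparison. As one simple way to compare two SI rates, here is a two-proportion z-test using only the standard library; `two_proportion_z_test` is a hypothetical name, and because each participant contributes several responses, a mixed-effects logistic regression is usually preferred for data like these.

```python
import math


def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int) -> tuple[float, float]:
    """Compare SI rates k1/n1 and k2/n2; return (z, two-sided p).

    Assumes independent responses, which repeated-measures data violate.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)  # common SI rate under the null of no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups are all-SI or all-literal: no evidence of a difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

To use it, pass the number of SI responses and the total number of target responses for each protocol (e.g. Experiment vs. Annotation); a negative z means the first group has the lower SI rate.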
Experiment 2: Prompt Type
Accuracy Prompt – “Is this an accurate description?”
Response Categories – Yes, No, Don’t Know
Informativity Prompt – “How informative is this sentence?”
Response Categories – Not Informative Enough, Informative Enough, Too Much Information, False
Population: Mechanical Turk Workers
Systematic Debriefing Survey
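Comparing the two prompts requires coding each response to an implicature target as an SI (pragmatic) or non-SI (literal) response. The slides do not spell out this coding, so the mapping below is an assumption: “No” under the Accuracy Prompt, and “Not Informative Enough” or “False” under the Informativity Prompt, count as SI responses, and every response (including “Don’t Know”) stays in the denominator. `SI_RESPONSES`, `VALID_RESPONSES` and `si_rate` are hypothetical names.

```python
from collections.abc import Iterable

# Hypothetical coding scheme: response categories treated as SI responses to a
# "some" sentence in an "all" situation, per prompt type.
SI_RESPONSES = {
    "accuracy": {"No"},
    "informativity": {"Not Informative Enough", "False"},
}

# Response categories offered under each prompt (from the slides).
VALID_RESPONSES = {
    "accuracy": {"Yes", "No", "Don't Know"},
    "informativity": {
        "Not Informative Enough",
        "Informative Enough",
        "Too Much Information",
        "False",
    },
}


def si_rate(responses: Iterable[str], prompt: str) -> float:
    """Proportion of target responses coded as SI for the given prompt type."""
    responses = list(responses)
    if not responses:
        raise ValueError("no responses to score")
    unknown = set(responses) - VALID_RESPONSES[prompt]
    if unknown:
        raise ValueError(f"unexpected response(s) for {prompt!r}: {sorted(unknown)}")
    si = sum(1 for r in responses if r in SI_RESPONSES[prompt])
    return si / len(responses)
```

For example, `si_rate(["No", "Yes", "No", "Don't Know"], "accuracy")` returns `0.5`. Keeping the mapping in one dictionary makes it easy to test alternative codings, such as excluding “False” or dropping “Don’t Know” responses.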
Effect for Prompt (p < 0.001)
Effect for Context (p < 0.001)
Weak interaction: Prompt × Context (p < 0.06)
No effect for Protocol
Low SI rates overall, but the debriefing survey indicates that roughly 70% of participants were aware of the some/all contrast
Populations
Turkers – More sensitive to Linguistic Context; less sensitive to changes in social context / evaluation apprehension
Undergraduates – More sensitive to Protocol
Take Home Points
• Methodological variables should be explored alongside conventional linguistic variables
  – Ideal: models of these processes (cf. Schütze 1996)
  – Crowdsourcing allows for cheap, fast exploration of parameter spaces
• New Normal: Don’t guess, test.
  – Controls, norming, confounds … all testable online
A potential check on exuberance
• Undergraduates may be WEIRD*, but crowdsourcing engenders its own weirdness
  – High evaluation apprehension
  – Uncontrolled backgrounds, skillsets, focus levels
  – Unknown motivations
• Ignorance does not necessarily mean diversity
  – This requires study if we rely on such participants more
* Henrich et al. (2010). The weirdest people in the world? Behavioral and Brain Sciences.
Acknowledgments
Thanks to Jaye Padgett and to the attendees of two Semantics Lab presentations and the XPRAG conference for their comments, to the HUGRA committee for their generous award and support, and to Rosie Wilson-Briggs for stimuli construction.