
Pranav Anand, Caroline Andrews, Matthew Wagers

Assessing the pragmatics of experiments with crowdsourcing: The case of scalar implicature

University of California, Santa Cruz

Experiments & Pragmatic Processing

Each of the critics reviewed some of the movies.

but not all?

Depending on the study:

– evidence for EIs (embedded implicatures), with different response choices

– no evidence of EIs

Worry: How much do methodologies themselves influence judgements?

Worry: Are we adequately testing the influence of methodologies on our data?

Case Study: (Embedded) Implicatures

Previous Limitation: Lack of Subjects and Money

Crowd-sourcing addresses both problems

Pragmatics of Experimental Situations

Teleological Curiosity – subjects hypothesize “expected” behavior and try to match an ideal


Evaluation Apprehension – subjects know they are being judged

The experiment itself is part of the pragmatic context.

See Rosenthal & Rosnow (1975), The Volunteer Subject.

Elements of Experimental Context

Response Structure – response choices available to the subject

e.g. True / False, Yes / No, 1-7 scale

Prompt – the question

Protocol – social context / task specification

directions for the Response Structure

Immediate Linguistic/Visual Context

Our Goal: Explore variations of these elements in a systematic way

Experimental Design

Is this an accurate description?

Some of the spices have red lids.

Linguistic Contexts – All Relevant, All Irrelevant, No Context

Protocol:

Experimental – normal experiment instructions

Annotation – checking the work of unaffiliated annotators

4 Implicature Targets, 6 Some/All Controls, 20 Fillers
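
The design described above can be sketched as a small enumeration. This is only an illustration: it assumes the three linguistic contexts and two protocols are fully crossed, and the item labels (`target-1`, etc.) are hypothetical placeholders rather than the actual stimuli.

```python
from itertools import product

# Factor levels named on the slides.
LINGUISTIC_CONTEXTS = ["All Relevant", "All Irrelevant", "No Context"]
PROTOCOLS = ["Experimental", "Annotation"]

# Item counts from the slides; the item labels themselves are hypothetical.
ITEM_COUNTS = {"target": 4, "control": 6, "filler": 20}


def design_cells():
    """Return every (linguistic context, protocol) combination."""
    return list(product(LINGUISTIC_CONTEXTS, PROTOCOLS))


def item_list():
    """Return placeholder item labels such as 'target-1', one per item."""
    return [
        f"{kind}-{i}"
        for kind, n in ITEM_COUNTS.items()
        for i in range(1, n + 1)
    ]


cells = design_cells()
items = item_list()
print(len(cells), "cells,", len(items), "items per list")
```

Under these assumptions the design has 6 cells and each list has 30 items, of which only 4 are implicature targets.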

Experiment 1: Social Context

Focus on Protocol: Annotation vs. Experiment

Population: Undergraduates

[Figure: results by linguistic context (All-Irrelevant, No Story, All-Relevant) for the Experiment and Annotation protocols]

Accuracy Prompt - “Is this an accurate description?”

Response Categories - Yes, No, Don’t Know


Finding: Social context matters even when linguistic context does not.

Linguistic Context: No Effect

Lower SI rate for Annotation (p < 0.05)
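
The slides report a p-value for the protocol difference but do not say which test produced it. As one possible way to compare SI rates between two groups, here is a minimal permutation-test sketch using only the standard library. The function name, the 0/1 coding, and the example data are all assumptions made for illustration; none of it is the study's data or necessarily its analysis.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean 0/1-coded responses."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up 0/1 codes (1 = implicature-consistent response), NOT the study's data.
experiment = [1, 1, 1, 0, 1, 1, 0, 1]
annotation = [0, 1, 0, 0, 1, 0, 0, 0]
p = permutation_p_value(experiment, annotation)
```

A permutation test is used here because it makes few distributional assumptions; with repeated measures per subject, a mixed-effects model would usually be the more appropriate choice.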

Experiment 2: Prompt Type

Accuracy Prompt - “Is this an accurate description?”

Response Categories - Yes, No, Don’t Know

Informativity Prompt - “How Informative is this sentence?”

Response Categories - Not Informative Enough, Informative Enough, Too Much Information, False

Population: Mechanical Turk Workers

Systematic Debriefing Survey
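
To compare prompts, responses under each response structure have to be reduced to a common measure such as an SI rate. The sketch below shows one way this could be done. The mapping from responses to "implicature-consistent" is an assumption (the slides do not state the coding scheme), and the example trials are made up.

```python
from collections import defaultdict

# ASSUMPTION: which response counts as implicature-consistent for each prompt.
# The slides do not give the coding scheme; adjust this mapping as needed.
SI_RESPONSE = {
    "accuracy": "No",
    "informativity": "Not Informative Enough",
}


def si_rates(trials):
    """Return {(prompt, context): share of SI-consistent responses}.

    Each trial is a (prompt, context, response) tuple for a target item.
    """
    counts = defaultdict(lambda: [0, 0])  # [SI-consistent, total]
    for prompt, context, response in trials:
        cell = counts[(prompt, context)]
        cell[0] += response == SI_RESPONSE[prompt]
        cell[1] += 1
    return {key: si / total for key, (si, total) in counts.items()}


# Made-up responses, for illustration only.
example = [
    ("accuracy", "No Context", "Yes"),
    ("accuracy", "No Context", "No"),
    ("informativity", "All Relevant", "Not Informative Enough"),
    ("informativity", "All Relevant", "Informative Enough"),
    ("informativity", "All Relevant", "Not Informative Enough"),
    ("informativity", "All Relevant", "False"),
]
rates = si_rates(example)
```

Keeping the mapping in a single dictionary makes the coding decision explicit and easy to vary, which matters if, for instance, "False" should also count as implicature-consistent under the informativity prompt.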


Effect for Prompt (p < 0.001)

Effect for Context (p < 0.001)

Weak Interaction: Prompt × Context (p < 0.06)

No Effect for Protocol

Low SI rates overall

But the debriefing survey indicates that roughly 70% of participants were aware of the some/all contrast

Populations

Turkers – More sensitive to Linguistic Context; less sensitive to changes in social context / evaluation apprehension

Undergraduates – More sensitive to Protocol

Take Home Points

• Methodological variables should be explored alongside conventional linguistic variables

– Ideal: models of these processes (cf. Schütze 1996)

– Crowdsourcing allows for cheap/fast exploration of parameter spaces

• New Normal: Don’t guess, test.

– Controls, norming, confounding … all testable online

A potential check on exuberance

• Undergraduates may be WEIRD*, but crowdsourcing engenders its own weirdness

– High evaluation apprehension

– Uncontrolled backgrounds, skillsets, focus levels

– Unknown motivations

• Ignorance does not necessarily mean diversity

– This requires study if we rely on such participants more

* Henrich et al. (2010) The Weirdest People in the World? BBS

Acknowledgments

Thanks to Jaye Padgett and to the attendees of two Semantics Lab presentations and the XPRAG conference for their comments, to the HUGRA committee for their generous award and support, and to Rosie Wilson-Briggs for stimuli construction.
