
A quantitative estimate of the intelligence gain in crowd-sourced solutions

How smart is the crowd?

Ans Vercammen Centre For Environmental Policy, Imperial College London

What makes a group (act) smart?

Extensive research in psychology, business, (behavioural) economics, ecology, computer science and artificial intelligence

–  Leadership and organisational structure
–  Individual intelligence, cognitive bias and thinking styles
–  Conformity and social influence
–  Big data and AI-assisted decision making
–  Swarm behaviour in animals

“Collective intelligence” emerges as a key theme

–  Record of the phrase since the 1800s
–  “A collective decision capability [that is] at least as good as or better than any single member of the group” (Hiltz & Turoff, 1978)
–  Collective capacity to adapt to the environment

Judgement and decision making in environmental management

Environmental management contexts are complex and decision processes error-prone

–  Volatile, uncertain socio-ecological systems presenting “wicked problems”
–  Human cognitive limitations

Decisions are entrusted to “the collective intelligence” of committees or panels of experts

–  Merits and challenges of group-based decision making rarely systematically addressed

–  Limited research on the cognitive aspects of decision making in environmental management/policy

Quantitative judgements & the wisdom of the crowd

Galton (1907)
–  “The vox populi is correct to within 1 per cent of the real value”
–  Diverse and independent opinions cancel out random error
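The error-cancelling claim can be illustrated with a small simulation. This is a minimal sketch, not Galton's data: it assumes each judge's guess is the true value plus independent, unbiased Gaussian noise, and takes the crowd estimate to be the median guess. The true value, noise level and function name are all hypothetical.

```python
import random
import statistics

def crowd_error(n_judges, true_value=1000.0, noise_sd=100.0, trials=2000, seed=0):
    """Mean absolute error of the crowd's median guess over many simulated crowds.

    Assumption: guesses are independent and unbiased (Gaussian noise around the truth).
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        guesses = [rng.gauss(true_value, noise_sd) for _ in range(n_judges)]
        errors.append(abs(statistics.median(guesses) - true_value))
    return statistics.fmean(errors)

# Larger crowds give smaller typical errors, as long as errors stay independent.
for n in (1, 10, 100):
    print(n, round(crowd_error(n), 1))
```

If the guesses shared a common bias, or if judges copied one another, the error would not shrink in this way; the sketch only covers the random, independent part of the error.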

➢  Expert judgment in environmental decision making and conservation

–  Structured elicitation, e.g. Delphi method and IDEA protocol
–  Awareness and mainstreaming

Group dynamics in complex decision making

Can we measure group intelligence in the way we measure individual IQ?
–  Group IQ test assessing a variety of group abilities (Woolley et al. 2010)
–  A single factor explains 43% of the variation in performance between groups
–  Group IQ predicts performance on other (real-life) tasks

Group intelligence predicted by…
–  Social perceptiveness (“theory of mind”) of group members (Engel et al. 2014)
–  Diversity in thinking styles (Aggarwal 2013)
–  Communication patterns & turn-taking (Woolley et al. 2015; Engel et al. 2014)
–  Not individual IQ of group members (Woolley et al. 2010)

➢  Call for greater diversity in policy & decision making circles (Buckingham, Science 2010)

Crowdsourcing big data

Unprecedented connectivity between individual human minds through advances in web-technology

Distribute functions once performed by single experts in an open call to an undefined network of individuals (Howe 2006)

–  Workers need not have topical knowledge
–  Workers can commit to simple repetitive tasks or…
–  Can be engaged in complex design and problem solving tasks

➢  Environmental data gathering/processing and complex problem solving
–  Citizen science, e.g. Zooniverse
–  Crowd-contests such as MIT’s Climate CoLab

•  Implications for

Crowdsourcing challenges

How to ensure high quality responses?
–  Contributors’ effort level cannot be observed directly
–  Aggregation/redundancy of responses
   •  Majority voting
   •  Manual selection of best solution
–  Improvement in quality of the output at the expense of increase in cost?

1. What is the gain in quality from aggregating crowdsourced responses?
2. What is the optimal size of the crowd for maximal return?
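A simplified calculation gives a feel for both questions. The sketch below is an idealisation rather than the study's method: it assumes a two-option question, an odd number of voters, and voters who are independent and each correct with the same probability `p`. The function name and the value `p = 0.6` are hypothetical.

```python
from math import comb

def majority_correct(n, p):
    """Probability that a strict majority of n independent voters is correct.

    Assumptions: binary question, each voter correct with probability p, n odd (no ties).
    """
    if n % 2 == 0:
        raise ValueError("use an odd n to avoid ties")
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

# Accuracy rises with crowd size, but each extra pair of voters adds less.
previous = None
for n in (1, 3, 5, 11, 21):
    acc = majority_correct(n, 0.6)
    gain = None if previous is None else round(acc - previous, 3)
    print(n, round(acc, 3), gain)
    previous = acc
```

Under these assumptions the accuracy gain flattens as the crowd grows while the cost keeps rising roughly linearly with the number of paid responses, which is why an optimal crowd size is worth estimating empirically.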

Crowdsourcing an intelligence test

Partial replication of Kosinski et al. (2012)

Raven’s Standard Progressive Matrices (RSPM)
•  60-item, non-verbal inductive reasoning test
•  5 sections with increasing difficulty
•  Raw test score converted to an IQ score
   –  Mean = 100 / SD = 15
   –  Norm-referenced: an IQ of 125 is better than 95% of the population
•  Total score correlates strongly with more elaborate tests of intelligence, e.g. WAIS
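The norm-referenced reading of an IQ score can be checked with a short calculation. This sketch assumes IQ scores follow a normal distribution with mean 100 and SD 15, as stated above; the function name is hypothetical.

```python
from statistics import NormalDist

# Assumed IQ scale: normal distribution, mean 100, SD 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)

def iq_percentile(iq):
    """Percentage of the population expected to score below the given IQ."""
    return 100 * IQ_SCALE.cdf(iq)

print(round(iq_percentile(125), 1))
```

An IQ of 125 lies 25/15 ≈ 1.67 standard deviations above the mean, which corresponds to roughly the 95th percentile.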

Crowdsourcing an intelligence test

Mechanical Turk
•  Commercial crowdsourcing platform
•  Individual workers get paid to perform “human intelligence tasks” (HITs)
•  Requested 100+ workers to complete each RSPM question
•  Workers do not interact
•  Random selections of N=1 to N=24 from the pool of workers for each test question
   –  Multiple choice questions
   –  Group’s collective answer is the most frequent answer (modal response)
•  Groups’ raw test scores calculated
•  Collective IQ is estimated based on norms for US adults
•  Resampled 1000x to obtain confidence intervals
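The resampling procedure described above might look roughly like the following. This is a sketch on synthetic data, not the study's code: the answer pool, the 70% individual accuracy, the eight answer options, the random tie-breaking rule and all function names are assumptions made for illustration. The conversion from raw score to IQ is left out because it needs the published norm tables.

```python
import random
from collections import Counter

def simulate_pool(n_items=60, n_workers=100, n_options=8, p_correct=0.7, seed=0):
    """Hypothetical pool: pool[i] holds every worker's answer to item i (option 0 is correct)."""
    rng = random.Random(seed)
    return [
        [0 if rng.random() < p_correct else rng.randrange(1, n_options)
         for _ in range(n_workers)]
        for _ in range(n_items)
    ]

def modal_answer(answers, rng):
    """Most frequent answer; ties are broken at random (an assumption)."""
    counts = Counter(answers)
    top = max(counts.values())
    return rng.choice(sorted(a for a, c in counts.items() if c == top))

def group_raw_score(pool, group_size, rng):
    """Draw a fresh random group for each item and score the group's modal answer."""
    return sum(
        modal_answer(rng.sample(item_answers, group_size), rng) == 0
        for item_answers in pool
    )

def resample(pool, group_size, n_resamples=1000, seed=1):
    """Mean raw score and a 95% percentile interval across resampled groups."""
    rng = random.Random(seed)
    scores = sorted(group_raw_score(pool, group_size, rng) for _ in range(n_resamples))
    lower = scores[int(0.025 * n_resamples)]
    upper = scores[int(0.975 * n_resamples) - 1]
    return sum(scores) / n_resamples, lower, upper

pool = simulate_pool()
for size in (1, 5, 10, 24):
    print(size, resample(pool, size))
```

With real data, `pool` would be replaced by the workers' recorded answers per question, and each raw score would then be mapped to an IQ estimate using the norms for US adults.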

[Figure: IQ test and procedure, with test sections labelled A to E]

Overall accuracy and response times

Raw scores

[Figure: raw SPM score (max = 60) against group size (1 to 24). Series: minimum, mean and maximum group score, plus the maximum individual raw score.]

Estimated IQ

[Figure: estimated IQ score (axis from 90 to 130) against group size (1 to 24). Series: mean with 95% confidence interval, plus the maximum individual IQ.]

How smart is the crowd…?

•  Only about 10% of the general population will outperform a random selection of 10 individuals

•  Only about 5% of the general population will outperform a random selection of 20 individuals

•  It only takes 5 randomly selected individuals to collectively outperform the best available individual from the pool
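These population statements can be translated back into approximate IQ values. The sketch below is a back-calculation under the assumption of a normal IQ scale with mean 100 and SD 15; the resulting numbers are implied by that assumption and are not read from the charts. The function name is hypothetical.

```python
from statistics import NormalDist

# Assumed IQ scale: normal distribution, mean 100, SD 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)

def iq_exceeded_by(fraction):
    """IQ score that only the given fraction of the population is expected to exceed."""
    return IQ_SCALE.inv_cdf(1 - fraction)

# "Only about 10%" and "only about 5%" of the population outperform the group.
for fraction in (0.10, 0.05):
    print(fraction, round(iq_exceeded_by(fraction), 1))
```

On this scale, being outperformed by only 10% of the population corresponds to an IQ of about 119, and by only 5% to an IQ of about 125.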

Next steps…

Scaling up: Does the finding hold with increased task complexity and in different sets of users?

–  Other aspects of reasoning and the crowd’s susceptibility to biases
–  Other platforms
–  More ecologically valid tests

Adding interaction: Does the finding hold if we go beyond simple aggregation?
–  Platforms designed to promote and guide interaction between contributors in wiki-style, e.g. SWARM
–  Account for influences that are top-down (emergent characteristics of the group) and bottom-up (individual personality and intelligence)

Thank you
•  Yan Ji, MSc
•  Prof. Mark Burgman

Questions?

References

Aggarwal I (2013) Cognitive Style Diversity in Teams. Doctoral thesis (Carnegie Mellon University).

Engel D, Woolley AW, Jing LX, Chabris CF & Malone TW (2014) Reading the Mind in the Eyes or Reading between the Lines? Theory of Mind Predicts Collective Intelligence Equally Well Online and Face-To-Face. PLOS ONE, 9, e115212.

Galton F (1907) Vox populi. Nature, 75: 450-451.

Hiltz SR & Turoff M (1978) The Network Nation: Human Communication via Computer. New York: Addison-Wesley.

Howe J (2006) The rise of crowdsourcing. Wired, 14.

Kosinski M, Bachrach Y, Kasneci H, Van-Gael J & Graepel T (2012) Crowd IQ: Measuring the Intelligence of Crowdsourcing Platforms. ACM Conference on Web Sciences.

Woolley AW, Chabris CF, Pentland A, Hashmi N & Malone TW (2010) Evidence for a Collective Intelligence Factor in the Performance of Human Groups. Science, 330: 686-688.

Additional slides (hidden)

Who are the MTurkers?

http://www.mturk-tracker.com





What about more complex reasoning?

•  Well-reasoned decisions rely on more than abstract reasoning
•  Individuals are susceptible to a range of cognitive biases
   –  Kahneman has highlighted a wide range of “irrational” patterns
   –  Bosetti et al. (2017) Paris COP21
      •  Asked policy makers about their expectations for global temperature increases by 2100
      •  Then revealed projections for the various climate change models based on best available evidence and asked experts again
      •  Conditional probabilities reported by policy makers anchored on their private prior estimates and failed to fully incorporate the scientific information received

1. Can crowdsourcing improve reasoning quality?
2. Is the collective mind more resistant to cognitive bias?

Procedure

Brief “reasoning test set”
–  Inductive reasoning
–  Deductive reasoning
–  Cognitive Reflection Test
–  Heuristics and Biases Test

Individual participants complete entire test

–  Individual performance vs. group performance

–  Group performance is modal response of random assembly of individuals

[Figure: for each test item (Item 1 to Item 60), a random selection of N = 1 to N = 24 individuals is drawn from a pool of N > 100. Individual answers are marked correct (√) or incorrect (X), and the group response is the mode.]

Accuracy

[Figure: accuracy (0 to 1) by test item, with items labelled A1 to E11 and grouped by section A to E.]
