A quantitative estimate of the intelligence gain in crowd-sourced solutions
How smart is the crowd?
Ans Vercammen Centre For Environmental Policy, Imperial College London
What makes a group (act) smart?
Extensive research in psychology, business, (behavioural) economics, ecology, computer science and artificial intelligence
– Leadership and organisational structure
– Individual intelligence, cognitive bias and thinking styles
– Conformity and social influence
– Big data and AI-assisted decision making
– Swarm behaviour in animals
“Collective intelligence” emerges as a key theme
– Record of the phrase since the 1800s
– “A collective decision capability [that is] at least as good as or better than any single member of the group” (Hiltz & Turoff, 1978)
– Collective capacity to adapt to the environment
Judgement and decision making in environmental management
Environmental management contexts are complex and decision processes error-prone
– Volatile, uncertain socio-ecological systems presenting “wicked problems”
– Human cognitive limitations
Decisions are entrusted to “the collective intelligence” of committees or panels of experts
– Merits and challenges of group-based decision making rarely systematically addressed
– Limited research on the cognitive aspects of decision making in environmental management/policy
Quantitative judgements & the wisdom of the crowd
Galton (1907)
– “The vox populi is correct to within 1 per cent of the real value”
– Diverse and independent opinions cancel out random error
→ Expert judgement in environmental decision making and conservation
– Structured elicitation, e.g. DELPHI method and IDEA protocol
– Awareness and mainstreaming
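The error-cancelling argument behind the wisdom of the crowd can be written out in a few lines. This is a textbook sketch, not Galton's own analysis: it assumes each of n judgements x_i equals the true value θ plus an independent, zero-mean error ε_i with common variance σ² (all symbols are introduced here for illustration).

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
        = \theta + \frac{1}{n}\sum_{i=1}^{n}\varepsilon_i,
\qquad
\mathbb{E}[\bar{x}] = \theta,
\qquad
\operatorname{Var}(\bar{x}) = \frac{\sigma^{2}}{n}
```

Under these assumptions the spread of the crowd average shrinks as n grows. A bias shared by all judges is not removed by averaging, which is why diversity and independence matter.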
Group dynamics in complex decision making
Can we measure group intelligence in the way we measure individual IQ?
– Group IQ test assessing variety of group abilities (Woolley et al. 2010)
– Single factor explains 43% of the variation in performance between groups
– Group IQ predicts performance on other (real-life) tasks
Group intelligence predicted by…
– Social perceptiveness (“theory of mind”) of group members (Engel et al. 2014)
– Diversity in thinking styles (Aggarwal 2013)
– Communication patterns & turn-taking (Woolley et al. 2015; Engel et al. 2014)
– Not individual IQ of group members (Woolley et al. 2010)
→ Call for greater diversity in policy & decision making circles (Buckingham, Science 2010)
Crowdsourcing big data
Unprecedented connectivity between individual human minds through advances in web-technology
Distribute functions once performed by single experts in an open call to an undefined network of individuals (Howe 2006)
– Workers need not have topical knowledge
– Workers can commit to simple repetitive tasks or…
– Can be engaged in complex design and problem solving tasks
→ Environmental data gathering/processing and complex problem solving
– Citizen science, e.g. Zooniverse
– Crowd-contests such as MIT’s ClimateCoLab
• Implications for
Crowdsourcing challenges
How to ensure high quality responses?
– Contributors’ effort level cannot be observed directly
– Aggregation/redundancy of responses
• Majority voting
• Manual selection of best solution
– Improvement in quality of the output at the expense of increase in cost?
1. What is the gain in quality from aggregating crowdsourced responses?
2. What is the optimal size of the crowd for maximal return?
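Both questions can be explored with a simple model before looking at data. The sketch below is a Condorcet-style calculation, not the method used in this study: it assumes a binary question, fully independent voters, and one shared probability `p` of answering correctly. The function name `majority_accuracy` and the value `p = 0.6` are illustrative only.

```python
from math import comb


def majority_accuracy(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes a binary question and the same accuracy p for every voter.
    Ties (possible when n is even) are counted as failures.
    """
    needed = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1)
    )


# Accuracy of the majority vote for a few odd group sizes (p is hypothetical).
for n in (1, 3, 5, 11, 21):
    print(n, round(majority_accuracy(n, 0.6), 3))
```

In this model each extra pair of voters adds less accuracy than the previous pair, so the gain has to be weighed against the cost of paying more contributors.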
Crowdsourcing an intelligence test
Partial replication of Kosinski et al. (2012)
Raven’s Standard Progressive Matrices
• 60-item, non-verbal inductive reasoning test
• 5 sections with increasing difficulty
• Raw test score converted to an IQ score
– Mean = 100 / SD = 15
– Norm-referenced: IQ of 125 is better than 95% of the population
• Total score correlates strongly with more elaborate tests of intelligence, e.g. WAIS
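The “125 is better than 95%” figure can be checked against the stated scale. A minimal sketch, assuming IQ scores follow a normal distribution with mean 100 and SD 15 (the variable names are illustrative):

```python
from statistics import NormalDist

# IQ scale as stated on the slide: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Share of the population scoring below an IQ of 125 under the normal model.
share_below_125 = iq.cdf(125)
print(f"{share_below_125:.3f}")
```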
Crowdsourcing an intelligence test
Mechanical Turk
• Commercial crowdsourcing platform
• Individual workers get paid to perform “human intelligence tasks” (HITs)
• Requested 100+ workers to complete each RSPM question
• Workers do not interact
• Random selections of N=1 to N=24 from the pool of workers for each test question
– Multiple choice questions
– Group’s collective answer is the most frequent answer (modal response)
• Groups’ raw test scores calculated
• Collective IQ is estimated based on norms for US adults
• Resampled 1000x to obtain confidence intervals
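The procedure above can be sketched in code. This is an illustration under stated assumptions, not the study's analysis script: the response data are synthetic (60 items, 100 workers per item, 8 answer options, each worker correct with probability 0.6), ties in the mode are broken at random, and the interval is a simple percentile interval. All function names are hypothetical. Converting raw scores to IQ requires the published norm tables and is not reproduced here.

```python
import random
from collections import Counter


def modal_answer(answers, rng):
    """Most frequent answer; ties are broken at random (an assumption)."""
    counts = Counter(answers)
    top = max(counts.values())
    return rng.choice(sorted(a for a, c in counts.items() if c == top))


def group_raw_score(responses, key, n, rng):
    """Raw score of one random group: per item, draw n workers and take the mode."""
    score = 0
    for item_answers, correct in zip(responses, key):
        sample = rng.sample(item_answers, n)
        score += int(modal_answer(sample, rng) == correct)
    return score


def resample(responses, key, n, draws=1000, seed=0):
    """Mean raw score and a 95% percentile interval over repeated random groups."""
    rng = random.Random(seed)
    scores = sorted(group_raw_score(responses, key, n, rng) for _ in range(draws))
    mean = sum(scores) / draws
    return mean, scores[int(0.025 * draws)], scores[int(0.975 * draws) - 1]


# Synthetic stand-in data (NOT the study's data).
data_rng = random.Random(42)
key = [data_rng.randrange(8) for _ in range(60)]
responses = [
    [
        c if data_rng.random() < 0.6
        else data_rng.choice([a for a in range(8) if a != c])
        for _ in range(100)
    ]
    for c in key
]

# Mean score and interval for a few group sizes (fewer draws to keep it quick).
for n in (1, 5, 24):
    print(n, resample(responses, key, n, draws=200))
```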
[Figure: IQ test procedure, test sections A to E]
Overall accuracy and response times
Raw scores
[Figure: raw SPM score (max = 60) against group size (1 to 24); series: minimum, mean and maximum; reference line: max individual raw score]
Estimated IQ
[Figure: estimated IQ score against group size (1 to 24); series: mean with 95% confidence interval; reference line: max individual IQ]
How smart is the crowd…?
• Only about 10% of the general population will outperform a random selection of 10 individuals
• Only about 5% of the general population will outperform a random selection of 20 individuals
• It only takes 5 randomly selected individuals to collectively outperform the best available individual from the pool
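The percentages above can be translated back into IQ thresholds. A rough sketch, assuming the same normal model as the test norms (mean 100, SD 15); the thresholds are properties of that model, not values read from the study's results:

```python
from statistics import NormalDist

# IQ scale assumed normal with mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# IQ level exceeded by only the top 10% and the top 5% of the population.
for top_share in (0.10, 0.05):
    threshold = iq.inv_cdf(1 - top_share)
    print(f"top {top_share:.0%}: IQ above about {threshold:.0f}")
```

Under this model the two statements correspond to collective IQ estimates of roughly 119 for groups of 10 and roughly 125 for groups of 20.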
Next steps…
Scaling up: Does the finding hold with increased task complexity and in different sets of users?
– Other aspects of reasoning and the crowd’s susceptibility to biases
– Other platforms
– More ecologically valid tests
Adding interaction: Does the finding hold if we go beyond simple aggregation?
– Platforms designed to promote and guide interaction between contributors in wiki-style, e.g. SWARM
– Account for influences that are top-down (emergent characteristics of the group) and bottom-up (individual personality and intelligence)
Thank you
• Yan Ji, MSc
• Prof. Mark Burgman
Questions?
References
Aggarwal I (2013) Cognitive Style Diversity in Teams. Doctoral thesis (Carnegie Mellon University).
Engel D, Woolley AW, Jing LX, Chabris CF & Malone TW (2014) Reading the Mind in the Eyes or Reading between the Lines? Theory of Mind Predicts Collective Intelligence Equally Well Online and Face-To-Face. PLOS ONE, 9, e115212.
Galton F (1907) Vox populi. Nature, 75:450-451.
Hiltz SR & Turoff M (1978) The Network Nation: Human Communication via Computer. New York: Addison-Wesley.
Howe J (2006) The rise of crowdsourcing. Wired, 14.
Kosinski M, Bachrach Y, Kasneci H, Van-Gael J & Graepel T (2012) Crowd IQ: Measuring the Intelligence of Crowdsourcing Platforms. ACM Conference on Web Sciences.
Woolley AW, Chabris CF, Pentland A, Hashmi N & Malone TW (2010) Evidence for a Collective Intelligence Factor in the Performance of Human Groups. Science, 330:686-688.
Additional slides (hidden)
Who are the MTurkers?
http://www.mturk-tracker.com
What about more complex reasoning?
• Well-reasoned decisions rely on more than abstract reasoning
• Individuals are susceptible to a range of cognitive biases
– Kahneman has highlighted a wide range of “irrational” patterns
– Bosetti et al. (2017), Paris COP21
• Asked policy makers about their expectations for global temperature increases by 2100
• Then revealed projections for the various climate change models based on best available evidence and asked experts again
• Conditional probabilities reported by policy makers anchored on their private prior estimates and failed to fully incorporate the scientific information received
1. Can crowdsourcing improve reasoning quality?
2. Is the collective mind more resistant to cognitive bias?
Procedure
Brief “reasoning test set”
– Inductive reasoning
– Deductive reasoning
– Cognitive Reflection Test
– Heuristics and Biases Test
Individual participants complete entire test
– Individual performance vs. group performance
– Group performance is modal response of random assembly of individuals
[Diagram: for each test item (Item 1 to Item 60), a random selection of N = 1 to N = 24 respondents is drawn from a pool of N > 100; each selected answer is marked correct (√) or incorrect (X), and the group response is the mode]
Accuracy
[Figure: accuracy (0 to 1) per test item, with items grouped by section A to E]