The Dynamics of Micro-Task Crowdsourcing
The Case of Amazon MTurk
Djellel Eddine Difallah, Michele Catasta, Gianluca Demartini, Panos Ipeirotis, Philippe Cudré-Mauroux
WWW’15 - 20th May 2015 - Florence
Background
Crowdsourcing is an effective solution to certain classes of problems
Background
A crowdsourcing platform allows requesters to publish a crowdsourcing request (batch) composed of multiple tasks (HITs)
Requesters can programmatically invoke the crowd through APIs
Background
Paid microtask crowdsourcing scales out but remains highly unpredictable
[Figure: batch throughput (#HITs/minute) over time]
SLAs are expensive
MTurk is a Marketplace for HITs
Direct factors: price, time of day, #workers, #HITs, etc.
Other factors: forums, reputation systems (TurkOpticon), recommendation systems (Openturk)
A Data-Driven Approach
...Five Years Later [2009-2014]
mturk-tracker collected 2.5 million distinct batches with over 130 million HITs
mturk-tracker.com
● Collects metadata about each visible batch (title, description, rewards, required qualifications, HITs available, etc.)
● Records batch progress (every ~20 minutes)
Note that the tracker reports data only periodically and does not capture fine-grained information (e.g., real-time variations)
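Given such periodic snapshots, a batch's throughput can be approximated as the number of HITs completed between two consecutive polls, divided by the elapsed minutes. A minimal sketch (the snapshot format and example numbers are illustrative, not actual tracker data):

```python
from datetime import datetime

def throughput_series(snapshots):
    """Estimate batch throughput (#HITs/minute) from periodic tracker
    snapshots of (timestamp, hits_available). Intervals where HITs were
    added back to the batch (negative completion) are skipped."""
    rates = []
    for (t0, h0), (t1, h1) in zip(snapshots, snapshots[1:]):
        minutes = (t1 - t0).total_seconds() / 60.0
        completed = h0 - h1
        if minutes > 0 and completed >= 0:
            rates.append(completed / minutes)
    return rates

# Example: three ~20-minute polls of a single batch.
polls = [
    (datetime(2014, 6, 1, 12, 0), 1000),
    (datetime(2014, 6, 1, 12, 20), 900),
    (datetime(2014, 6, 1, 12, 41), 816),
]
print(throughput_series(polls))  # [5.0, 4.0]
```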
Menu
1. Notable Facts Extracted from the Data
2. Large-scale HIT Type Classification
3. Analyzing the Features Affecting Batch Throughput
4. Market Analysis
1) Notable Facts Extracted from the Data
Country-Specific HITs
US and India?
Workers from the US, India, and Canada are the most sought after
Distribution of Batch Size
“Power-law”
Evolution of Batch Sizes
Very large batches start to appear
HIT Pricing
Is 1 cent per HIT the norm?
5 cents is the new 1 cent
Requesters and Reward Evolution
Increasing number of new and distinct requesters
2) Large-scale HIT Type Classification
Classify HITs into types (Gadiraju et al. 2014)
- Information Finding (IF)
- Verification and Validation (VV)
- Interpretation and Analysis (IA)
- Content Creation (CC)
- Surveys (SU)
- Content Access (CA)
HIT Classes
Supervised Classification with the Crowd
We trained a Support Vector Machine (SVM) model
- Features: HIT title, description, keywords, reward, date, allocated time, and batch size
- Created labeled data on MTurk for 5,000 uniformly sampled HITs
- Our HIT used 3 repetitions
- Consensus reached for 89% of the tasks
- 10-fold cross validation
  - Precision of 0.895
  - Recall of 0.899
  - F-measure of 0.895
- We then performed a large-scale classification for all 2.5M HITs
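The crowd-labeling step (3 repetitions per HIT, consensus for 89% of tasks) can be illustrated with a small majority-voting sketch. The exact aggregation rule is not stated on the slide, so a 2-out-of-3 quorum is assumed here, and the sample labels are made up:

```python
from collections import Counter

def consensus(labels, quorum=2):
    """Return the majority label among worker answers, or None
    when no label reaches the quorum (no consensus)."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= quorum else None

# Hypothetical worker answers for two HITs, 3 repetitions each.
answers = {
    "hit_1": ["CC", "CC", "SU"],   # 2/3 agree -> Content Creation
    "hit_2": ["IF", "VV", "IA"],   # all disagree -> no consensus
}
labeled = {h: consensus(v) for h, v in answers.items()}
print(labeled)  # {'hit_1': 'CC', 'hit_2': None}
```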
Distribution of HIT Types
Fewer Content Access batches
Content Creation is the most popular
3) Analyzing the Features Affecting Batch Throughput
[Figure: batch throughput (#HITs/minute) over time]
Batch Throughput Prediction
29 Features
HIT features: HITs available, start time, reward, description length, title length, keywords, requester_id, time allotted, task type, age (minutes), etc.
Market features: total HITs available, HITs arrived, rewards arrived, % HITs completed, etc.
Batch Throughput Prediction
[Figure: timeline with a training window of length delta ending at time T]
- Predict batch throughput at time T by training a Random Forest Regression model with samples taken in the [T-delta, T) time span
- 29 features (including the type of the batch)
- Hourly data in the range [June-October] 2014
- We sampled 50 time points for evaluation purposes
We are interested in cases where prediction works reasonably well
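The sliding-window setup can be sketched as follows. The Random Forest itself is out of scope for a short example, so a placeholder mean predictor stands in for it, and the sample data is synthetic:

```python
def window(samples, T, delta):
    """Keep only samples whose timestamp falls in [T - delta, T).
    samples: list of (t, features, throughput), with t, T, delta in hours."""
    return [s for s in samples if T - delta <= s[0] < T]

def predict_mean(train):
    """Placeholder for the Random Forest: predict the mean throughput
    of the training window."""
    ys = [y for _, _, y in train]
    return sum(ys) / len(ys) if ys else 0.0

# Synthetic hourly observations of one batch.
samples = [(t, {"hits_available": 100 - t}, 10.0 + t) for t in range(10)]
train = window(samples, T=8, delta=4)   # hours 4..7
print([t for t, _, _ in train])          # [4, 5, 6, 7]
print(predict_mean(train))               # 15.5
```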
Predicted vs. Actual Batch Throughput (delta = 4 hours)
Prediction works best for larger batches with large momentum
Significant Features
- Which features contribute most when the prediction works reasonably well?
- We proceed by feature ablation
  - Re-run prediction by removing 1 feature at a time
  - 1000 samples
Top features:
- HITs_Available (number of tasks in the batch)
- Age_Minutes (how long ago the batch was created)
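Feature ablation can be sketched as a loop that re-runs the same evaluation with one feature removed at a time and ranks features by the resulting error increase. The `train_and_eval` callback and the toy error model below are illustrative stand-ins, not the paper's actual pipeline:

```python
def ablate(features, train_and_eval):
    """Rank features by how much the error grows when each is removed.
    train_and_eval(kept_features) -> error on held-out data."""
    base = train_and_eval(features)
    impact = {}
    for f in features:
        kept = [g for g in features if g != f]
        impact[f] = train_and_eval(kept) - base  # error increase without f
    return sorted(impact.items(), key=lambda kv: kv[1], reverse=True)

# Toy error model: error shrinks with each (weighted) feature present.
WEIGHTS = {"HITs_Available": 5.0, "Age_Minutes": 3.0, "Reward": 1.0}
def toy_eval(kept):
    return 10.0 - sum(WEIGHTS[f] for f in kept)

print(ablate(list(WEIGHTS), toy_eval))
# [('HITs_Available', 5.0), ('Age_Minutes', 3.0), ('Reward', 1.0)]
```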
4) Market Analysis
Demand - the number of new tasks published on the platform by requesters
Supply - the workforce that the crowd provides
Supply Elasticity
How does the market react when new tasks arrive on the platform?
Supply Elasticity
We regressed the percentage of work done (within 1 hour) against the number of new HITs
Intercept = 2.5, Slope = 0.5%
20% of new work gets completed within an hour
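The elasticity regression is a simple ordinary least squares fit. A minimal sketch, with synthetic points constructed to lie exactly on an assumed line y = 2.5 + 0.005x (the reported intercept and a 0.5% slope, here taken as 0.005 per HIT for illustration):

```python
def ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope  # (intercept, slope)

# Synthetic (new HITs, % completed within 1h) points on y = 2.5 + 0.005x.
xs = [100, 500, 1000, 2000]
ys = [2.5 + 0.005 * x for x in xs]
intercept, slope = ols(xs, ys)
print(round(intercept, 3), round(slope, 5))  # 2.5 0.005
```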
Demand and Supply Periodicity
[Figure: demand and supply time series]
Strong weekly periodicity (7-10 days)
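Such periodicity can be detected with a simple autocorrelation scan over candidate lags. The daily series below is synthetic, with a built-in 7-day cycle, and the lag range 2-14 days is an assumption for illustration:

```python
import math

def autocorr(series, lag):
    """Autocorrelation of a series at the given lag (0 if variance is 0)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var if var else 0.0

# Synthetic daily demand: a 7-day cycle over 10 weeks.
daily = [10 + 5 * math.sin(2 * math.pi * d / 7) for d in range(70)]

# Pick the lag (in days) with the strongest autocorrelation.
best_lag = max(range(2, 15), key=lambda k: autocorr(daily, k))
print(best_lag)  # 7
```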
Conclusions
- Long-term data analysis uncovers hidden trends
- Large-scale HIT classification
- Important features in throughput prediction (HITs_Available, Age_Minutes)
- Supply is elastic (more work available -> more work done)
- Supply and demand are periodic (7-10 days)
Is a Crowdsourcing Marketplace the right paradigm for efficient and predictable crowdsourcing?
Q&A
Djellel Difallah