Effect of Award on Worker Behaviors in Competitive Crowdsourcing Tasks
Ye Yang, Razieh Saremi
Stevens Institute of Technology
CII Forum 2015, Nov. 16-18, 2015, Arlington, VA
Introduction and Definitions
• Crowdsourcing
― Coined in 2005 by Jeff Howe and Mark Robinson
― Howe and Robinson in 2006:
o "Simply defined, crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call."
― Merriam-Webster in 2012:
o "The process of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers."
• Software Crowdsourcing
― Stol and Fitzgerald in 2014:
o "The accomplishment of specified software development tasks on behalf of an organization by a large and typically undefined group of external people with the requisite specialist knowledge through an open call."
Introduction and Definitions (cont'd)
• New paradigm of crowdsourced software development (CSD)
― Reported benefits: shortened schedule, innovative solutions, reduced cost
• Challenges
― Predictive models don't work
[Figure: General Competitive CSD Processes]
CSD Decisions Needing Better Analytics
• Task Pricing (ICSE'13)
• Developer Recommendation (SOSE'15)
• Failure Prediction (under review, ICSE'16)
• Task Scheduling (ASE'15)
Conflicts with Traditional Model/Law
• Schedule reduction
― Compared with Parkinson's law: "Work expands so as to fill the time available for its completion."
• Cost reduction
― Compared with COCOMO: EFFORT = a * SIZE^b
Traditional predictive models don't work for CSD!
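The COCOMO relationship referenced above can be sketched as a tiny function; the defaults a = 2.4, b = 1.05 are the classic Basic-COCOMO "organic mode" constants, used here purely for illustration (the slide's point is that such models transfer poorly to CSD):

```python
def cocomo_effort(kloc: float, a: float = 2.4, b: float = 1.05) -> float:
    """Basic COCOMO: EFFORT (person-months) = a * SIZE^b, SIZE in KLOC.

    Defaults are the classic 'organic mode' constants, for illustration only.
    """
    return a * kloc ** b
```

Because b > 1, the model predicts superlinear effort growth with size, which sits awkwardly with per-competition pricing in crowdsourcing.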
Task Price Can Be Accurately Predicted! (ICSE-NIER'13)
Under-Priced Tasks Are More Failure-Prone
• Estimated vs. actual price on failed tasks: 82% under-priced
• Estimated vs. actual price on successful tasks: 56% under-priced
Example Decision Scenarios
• How can I estimate how many potential workers might sign up for my competition?
• How can I ensure that interested workers will work on my task and make final submissions for the money I pay?
• How can I predict whether I will get qualified submissions from the registered workers?
• How can I incentivize workers to obtain better results?
Conceptual Award-Worker Behavior Model
― Yerkes-Dodson law
Research Questions
• RQ1: How does the award correlate with workers' behavior in task selection and completion?
• RQ2: How consistently do workers behave from registering for tasks to submitting them?
• RQ3: How does the number of registrants correlate with the quality of submissions?
• RQ4: For similar tasks, will the numbers of registrants and submissions increase as the award increases?
Dataset
• Datasets
― 514 component development tasks from Sep 2003 to Sep 2012
― Extracted from the TopCoder website
― All tasks were completed successfully
• Data preprocessing (pipeline figure; recoverable elements):
― Original dataset (514) → Discretize / RemoveOutlier → "Main": 10 bins (494) → Analysis I
― Aggregate → "General" (10 bins) → Analysis I
― Stratification → APPL (33), COMM (34), DATA (142), DEVE (52) → Analyses II, III
― 30 bins (514) → Analysis IV
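The discretize / remove-outlier step above could look roughly like the following pandas sketch. The column names (`award`, `registrants`), the equal-frequency binning, and the 1.5×IQR outlier rule are all assumptions for illustration, not the authors' exact procedure:

```python
import pandas as pd


def preprocess(df: pd.DataFrame, n_bins: int = 10) -> pd.DataFrame:
    """Discretize tasks into equal-frequency award bins, then drop
    per-bin outliers in #registrants using the 1.5*IQR rule.

    Column names are illustrative, not from the original study.
    """
    df = df.copy()
    df["award_bin"] = pd.qcut(df["award"], q=n_bins, duplicates="drop")

    def drop_outliers(g: pd.DataFrame) -> pd.DataFrame:
        q1, q3 = g["registrants"].quantile([0.25, 0.75])
        iqr = q3 - q1
        return g[(g["registrants"] >= q1 - 1.5 * iqr) &
                 (g["registrants"] <= q3 + 1.5 * iqr)]

    return df.groupby("award_bin", group_keys=False,
                      observed=True).apply(drop_outliers)
```

Equal-frequency binning (`qcut`) keeps bin sizes comparable even though most awards cluster at a few price points.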
Metrics and Statistics

Metric | Min | Max | Median | Mean | STDEV
Award | 112.5 | 3000 | 750 | 754 | 372
Size | 310 | 21925 | 2290 | 2978 | 2268
#Reg | 1 | 72 | 16 | 18 | 11
#Sub | 1 | 44 | 4 | 5 | 5
Score | 75 | 100 | 94.16 | 92.5 | 6.2
[Figures: Award, Score, Size, #Reg, and #Sub (min, max, median, average) compared across the 4 subsets APPL, COMM, DATA, DEVE]
Summary of basic statistics: comparison across 4 subsets
Analysis I: Overall Correlation Analysis on Dataset "Main"

Region | Award | #Registrants | #Tasks
I | <=750 (cheaper) | <=18 (less competition) | 243
II | <=750 (cheaper) | >18 (broader competition) | 185
III | >750 (more expensive) | >18 (broader competition) | 22
IV | >750 (more expensive) | <=18 (less competition) | 44

Rationales and statistics of tasks in the four regions:
• 63% of all tasks were priced at $750.
• 85% of all tasks followed the top-7 award settings ($750, $450, $600, $900, $1050, $150, and $375).

[Scatter plot: #Registrants vs. Award; very weak negative correlation of -0.015]
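The four-region breakdown can be reproduced by splitting on the two cut points from the table (award $750, 18 registrants). A minimal sketch; the function name and signature are mine, not the authors':

```python
def classify_region(award: float, registrants: int,
                    award_cut: float = 750, reg_cut: int = 18) -> str:
    """Quadrant label per the slide's region table: $750 separates
    cheaper from more expensive tasks, 18 registrants separates less
    from broader competition."""
    cheap = award <= award_cut
    low = registrants <= reg_cut
    if cheap and low:
        return "I"    # cheaper, less competition (243 tasks)
    if cheap:
        return "II"   # cheaper, broader competition (185 tasks)
    if not low:
        return "III"  # more expensive, broader competition (22 tasks)
    return "IV"       # more expensive, less competition (44 tasks)
```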
Analysis I: Overall Correlation Analysis on Dataset "General"

Bin | #Tasks | Award | Size | #Reg | #Sub | Score
1 | 32 | 142 | 2562 | 14 | 4 | 94
2 | 17 | 226 | 2582 | 18 | 4 | 95
3 | 19 | 353 | 2886 | 19 | 5 | 95
4 | 24 | 447 | 2766 | 20 | 8 | 96
5 | 25 | 612 | 2663 | 21 | 6 | 95
6 | 311 | 750 | 3129 | 19 | 5 | 92
7 | 23 | 913 | 3468 | 15 | 4 | 94
8 | 19 | 1050 | 2286 | 22 | 5 | 91
9 | 15 | 1210 | 2597 | 17 | 4 | 95
10 | 9 | 1500 | 2509 | 14 | 3 | 87

Sum of #Tasks: 494
Correlation of Award with Size, #Reg, #Sub, Score: -0.09, -0.13, -0.40, -0.71

There is a bigger pool of workers for cheaper tasks, since such tasks are relatively easier and require less experience and fewer skill sets.
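The per-bin correlations can be spot-checked directly from the table. A minimal sketch using the award and score columns transcribed above; small deviations from the slide's -0.71 are expected because the table values are rounded:

```python
import numpy as np

# Per-bin values transcribed from the "General" table (rounded)
award = [142, 226, 353, 447, 612, 750, 913, 1050, 1210, 1500]
score = [94, 95, 95, 96, 95, 92, 94, 91, 95, 87]


def pearson(x, y) -> float:
    """Pearson correlation coefficient between two metric columns."""
    return float(np.corrcoef(np.asarray(x, float),
                             np.asarray(y, float))[0, 1])

r = pearson(award, score)  # close to the -0.71 reported on the slide
```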
Analysis II: Behavior Consistency
• #Registrations vs. #Submissions: strong positive correlation of 0.71
• #Registrations vs. Submission Ratio (SR): median 0.25, mean 0.30
[Scatter plots, #Registrations vs. #Submissions, linear fits per subset:
DEVE: y = 0.1995x + 0.8816, R² = 0.4273
APPL: y = 0.211x + 0.5918, R² = 0.4819
COMM: y = 0.2694x - 0.5749, R² = 0.614
DATA: y = 0.3116x - 0.1672, R² = 0.4303]
[Scatter plots, #Registrations vs. Submission Ratio, logarithmic fits per subset:
DEVE: y = -0.246 ln(x) + 1.0021, R² = 0.5123
DATA: y = -0.11 ln(x) + 0.6187, R² = 0.1148
COMM: y = -0.038 ln(x) + 0.3425, R² = 0.0245
APPL: y = -0.068 ln(x) + 0.4484, R² = 0.1028]
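The logarithmic model form used in the submission-ratio plots, SR = a·ln(x) + b, can be fitted with ordinary least squares on ln(#registrants). A minimal sketch (the function name is mine):

```python
import numpy as np


def fit_log_model(registrants, sub_ratio):
    """Least-squares fit of SR = a*ln(#registrants) + b, the model form
    shown in the submission-ratio plots; a < 0 means each individual
    worker's willingness to submit declines as competition grows."""
    a, b = np.polyfit(np.log(np.asarray(registrants, float)),
                      np.asarray(sub_ratio, float), deg=1)
    return float(a), float(b)
```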
Analysis III: Submission Quality
[Scatter plots: winning score vs. #Registrants for the DATA, DEVE, APPL, and COMM subsets]
Tasks with more than 20 registrants have a higher chance (71.5%) of receiving better-quality submissions (score > 93).
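The 71.5% figure is a conditional rate: among tasks with more than 20 registrants, the fraction whose winning score exceeds 93. A sketch of how it could be computed (parameter names and thresholds taken from the slide; everything else is illustrative):

```python
import numpy as np


def high_quality_rate(registrants, scores,
                      reg_cut: int = 20, score_cut: float = 93) -> float:
    """Fraction of tasks with more than reg_cut registrants whose
    winning score exceeds score_cut (the slide reports 71.5% for
    this dataset)."""
    registrants = np.asarray(registrants)
    scores = np.asarray(scores)
    mask = registrants > reg_cut
    if not mask.any():
        return float("nan")
    return float((scores[mask] > score_cut).mean())
```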
Analysis IV: Similar Tasks
• Further breakdown by size

Sub-Group | #Tasks | #Observed Awards | #Outliers Removed | Model Fitness (Registrants) | Model Fitness (Submissions)
<2 KLOC | 56 | 13 | 1 | 0.69499 | 0.44283
2-3 KLOC | 31 | 6 | 1 | 0.59802 | 0.85468
3-5 KLOC | 27 | 9 | 1 | 0.76909 | 0.0619
>5 KLOC* | 26 | 11 | 1 | 0.46303 | 0.57195
Discussions – (1)
• Correlation between award and worker behaviors
― Award negatively correlates with all four metrics in the General dataset.
― This indicates that, overall, as the award increases, the number of registrants, the number of submissions, and the quality of the final submission all decrease.
― This observation supports the negligible negative role that award plays in worker behavior in the conceptual model.
― It may not be cost-effective to simply raise the award in order to attract broader competition, especially for competitive, 2-winner software crowdsourcing tasks.
• Answer to RQ1: Generally, in task selection, the number of registrants decreases as the award increases; in task completion, the number of submissions and the winning score also decrease as the award increases.
Discussions – (2)
• Behavior Consistency
― Attracting more registrants increases the chances of receiving satisfactory submissions; however, each individual worker's willingness to submit decreases.
― This reflects the behavioral inconsistency from task registration to task completion, which supports the assumption of distracting factors in the conceptual model.
― Possible distracting factors include competition pressure, insufficient time to complete the task, etc.
• Answer to RQ2: There is a strong positive correlation of 0.71 between the number of submissions and the number of registrants. However, there is a decreasing tendency to submit as the number of registrants increases.
Discussions – (3)
• Quality of Submission
― The positive correlation between the number of registrants and submission scores confirms the quality improvement obtained by leveraging worker diversity.
― However, the low correlation of 0.19 indicates that the previously reported strong impact of team diversity can be limited or weakened by the many distracting factors that come with increased competition.
― A similar viewpoint is the "maximum point" reported by Tajedin et al.
• Answer to RQ3: There is a weak positive correlation of 0.19 between the number of registrants and the score of the winning submission.
Discussions – (4)
• Interaction of Award and Competition
― Analysis IV demonstrates some examples of optimal awards.
― Task requesters are advised to design hybrid competitions combining both collaborative and competitive tasks.
― Future decision-support research directions:
o Pricing strategies targeting broader competition or higher quality;
o Sensitivity analyses that let task requesters explore different options with respect to their needs and preferences.
• Answer to RQ4: For similar tasks, the relationship between award and worker behavior follows a variety of inverted U-shaped curves.
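The inverted-U relationship in the answer to RQ4 can be probed by fitting a quadratic and locating its vertex. A sketch under the assumption that a second-degree polynomial is an adequate local approximation of the Yerkes-Dodson-style curve; it is not the authors' model:

```python
import numpy as np


def fit_inverted_u(award, registrants):
    """Fit registrants ~ c2*award^2 + c1*award + c0. A negative c2
    indicates an inverted U; the vertex -c1/(2*c2) is then a candidate
    'optimal' award for maximizing registration."""
    c2, c1, c0 = np.polyfit(award, registrants, deg=2)
    return float(c2), float(-c1 / (2 * c2))
```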
Future Work
• Further evaluation
― Additional dataset from Jan. 2014 to Feb. 2015
• In-depth causality analysis using more attributes
― Task complexity, worker registration order, worker availability (multi-tasking), worker rating, and so on
• Predictive models to support strategic pricing