Effect of Award on Worker Behaviors in Competitive Crowdsourcing Tasks
Ye Yang, Razieh Saremi
Stevens Institute of Technology
CII Forum 2015, Nov. 16-18, 2015, Arlington, VA
Introduction and Definitions
• Crowdsourcing
― Coined in 2005 by Jeff Howe and Mark Robinson
― Howe and Robinson in 2006:
o "Simply defined, crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call."
― Merriam-Webster in 2012:
o "The process of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers."
• Software Crowdsourcing
― Stol and Fitzgerald in 2014:
o "The accomplishment of specified software development tasks on behalf of an organization by a large and typically undefined group of external people with the requisite specialist knowledge through an open call."
Introduction and Definitions (cont'd)
• New paradigm of crowdsourced software development (CSD)
― Reported benefits: shortened schedule, innovative solutions, reduced cost
• Challenges
― Predictive models don't work
[Figure: General Competitive CSD Processes]
CSD Decisions Needing Better Analytics
• Task Pricing (ICSE'13)
• Developer Recommendation (SOSE'15)
• Failure Prediction (under review, ICSE'16)
• Task Scheduling (ASE'15)
Conflicts with Traditional Model/Law
• Schedule reduction
― Compared with Parkinson's law: "Work expands so as to fill the time available for its completion."
• Cost reduction
― Compared with COCOMO: EFFORT = a * SIZE^b
Traditional predictive models don't work for CSD!
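The COCOMO relationship referenced above can be sketched as a tiny function; the defaults a = 2.4, b = 1.05 are the classic Basic-COCOMO "organic mode" constants, used here purely for illustration (the slide's point is that such models transfer poorly to CSD):

```python
def cocomo_effort(kloc: float, a: float = 2.4, b: float = 1.05) -> float:
    """Basic COCOMO: EFFORT (person-months) = a * SIZE^b, SIZE in KLOC.

    Defaults are the classic 'organic mode' constants, for illustration only.
    """
    return a * kloc ** b
```

Because b > 1, the model predicts superlinear effort growth with size, which sits awkwardly with per-competition pricing in crowdsourcing.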
Task Price Can Be Accurately Predicted! (ICSE-NIER'13)
Under-Priced Tasks Are More Failure-Prone
• Estimated vs. actual price on failed tasks: 82% under-priced
• Estimated vs. actual price on successful tasks: 56% under-priced
Example Decision Scenarios
• How can I estimate how many potential workers might sign up for my competition?
• How can I ensure that interested workers will work on my task and make final submissions for the money I pay?
• How can I predict whether I will get qualified submissions from the registered workers?
• How can I incentivize workers to obtain better results?
Conceptual Award-Worker Behavior Model
― Yerkes-Dodson law
Research Questions
• RQ1: How does the award correlate with workers' behavior in task selection and completion?
• RQ2: How consistently do workers behave from registering for tasks to submitting them?
• RQ3: How does the number of registrants correlate with the quality of submissions?
• RQ4: For similar tasks, will the numbers of registrants and submissions increase as the award increases?
Dataset
• Datasets
― 514 component development tasks from Sep 2003 to Sep 2012
― Extracted from the TopCoder website
― All tasks were completed successfully
• Data preprocessing (pipeline figure; recoverable elements):
― Original dataset (514) → Discretize / RemoveOutlier → "Main": 10 bins (494) → Analysis I
― Aggregate → "General" (10 bins) → Analysis I
― Stratification → APPL (33), COMM (34), DATA (142), DEVE (52) → Analyses II, III
― 30 bins (514) → Analysis IV
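The discretize / remove-outlier step above could look roughly like the following pandas sketch. The column names (`award`, `registrants`), the equal-frequency binning, and the 1.5×IQR outlier rule are all assumptions for illustration, not the authors' exact procedure:

```python
import pandas as pd


def preprocess(df: pd.DataFrame, n_bins: int = 10) -> pd.DataFrame:
    """Discretize tasks into equal-frequency award bins, then drop
    per-bin outliers in #registrants using the 1.5*IQR rule.

    Column names are illustrative, not from the original study.
    """
    df = df.copy()
    df["award_bin"] = pd.qcut(df["award"], q=n_bins, duplicates="drop")

    def drop_outliers(g: pd.DataFrame) -> pd.DataFrame:
        q1, q3 = g["registrants"].quantile([0.25, 0.75])
        iqr = q3 - q1
        return g[(g["registrants"] >= q1 - 1.5 * iqr) &
                 (g["registrants"] <= q3 + 1.5 * iqr)]

    return df.groupby("award_bin", group_keys=False,
                      observed=True).apply(drop_outliers)
```

Equal-frequency binning (`qcut`) keeps bin sizes comparable even though most awards cluster at a few price points.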
Metrics and Statistics

Metric | Min | Max | Median | Mean | STDEV
Award | 112.5 | 3000 | 750 | 754 | 372
Size | 310 | 21925 | 2290 | 2978 | 2268
#Reg | 1 | 72 | 16 | 18 | 11
#Sub | 1 | 44 | 4 | 5 | 5
Score | 75 | 100 | 94.16 | 92.5 | 6.2
[Figures: Award, Score, Size, #Reg, and #Sub (min, max, median, average) compared across the 4 subsets APPL, COMM, DATA, DEVE]
Summary of basic statistics: comparison across 4 subsets
Analysis I: Overall Correlation Analysis on Dataset "Main"

Region | Award | #Registrants | #Tasks
I | <=750 (cheaper) | <=18 (less competition) | 243
II | <=750 (cheaper) | >18 (broader competition) | 185
III | >750 (more expensive) | >18 (broader competition) | 22
IV | >750 (more expensive) | <=18 (less competition) | 44

Rationales and statistics of tasks in the four regions:
• 63% of all tasks were priced at $750.
• 85% of all tasks followed the top-7 award settings ($750, $450, $600, $900, $1050, $150, and $375).

[Scatter plot: #Registrants vs. Award; very weak negative correlation of -0.015]
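The four-region breakdown can be reproduced by splitting on the two cut points from the table (award $750, 18 registrants). A minimal sketch; the function name and signature are mine, not the authors':

```python
def classify_region(award: float, registrants: int,
                    award_cut: float = 750, reg_cut: int = 18) -> str:
    """Quadrant label per the slide's region table: $750 separates
    cheaper from more expensive tasks, 18 registrants separates less
    from broader competition."""
    cheap = award <= award_cut
    low = registrants <= reg_cut
    if cheap and low:
        return "I"    # cheaper, less competition (243 tasks)
    if cheap:
        return "II"   # cheaper, broader competition (185 tasks)
    if not low:
        return "III"  # more expensive, broader competition (22 tasks)
    return "IV"       # more expensive, less competition (44 tasks)
```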
Analysis I: Overall Correlation Analysis on Dataset "General"

Bin | #Tasks | Award | Size | #Reg | #Sub | Score
1 | 32 | 142 | 2562 | 14 | 4 | 94
2 | 17 | 226 | 2582 | 18 | 4 | 95
3 | 19 | 353 | 2886 | 19 | 5 | 95
4 | 24 | 447 | 2766 | 20 | 8 | 96
5 | 25 | 612 | 2663 | 21 | 6 | 95
6 | 311 | 750 | 3129 | 19 | 5 | 92
7 | 23 | 913 | 3468 | 15 | 4 | 94
8 | 19 | 1050 | 2286 | 22 | 5 | 91
9 | 15 | 1210 | 2597 | 17 | 4 | 95
10 | 9 | 1500 | 2509 | 14 | 3 | 87

Sum of #Tasks: 494
Correlation of Award with Size, #Reg, #Sub, Score: -0.09, -0.13, -0.40, -0.71

There is a bigger pool of workers for cheaper tasks, since such tasks are relatively easier and require less experience and fewer skill sets.
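The per-bin correlations can be spot-checked directly from the table. A minimal sketch using the award and score columns transcribed above; small deviations from the slide's -0.71 are expected because the table values are rounded:

```python
import numpy as np

# Per-bin values transcribed from the "General" table (rounded)
award = [142, 226, 353, 447, 612, 750, 913, 1050, 1210, 1500]
score = [94, 95, 95, 96, 95, 92, 94, 91, 95, 87]


def pearson(x, y) -> float:
    """Pearson correlation coefficient between two metric columns."""
    return float(np.corrcoef(np.asarray(x, float),
                             np.asarray(y, float))[0, 1])

r = pearson(award, score)  # close to the -0.71 reported on the slide
```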
Analysis II: Behavior Consistency
• #Registrations vs. #Submissions: strong positive correlation of 0.71
• #Registrations vs. Submission Ratio (SR): median 0.25, mean 0.30
[Scatter plots, #Registrations vs. #Submissions, linear fits per subset:
DEVE: y = 0.1995x + 0.8816, R² = 0.4273
APPL: y = 0.211x + 0.5918, R² = 0.4819
COMM: y = 0.2694x - 0.5749, R² = 0.614
DATA: y = 0.3116x - 0.1672, R² = 0.4303]
[Scatter plots, #Registrations vs. Submission Ratio, logarithmic fits per subset:
DEVE: y = -0.246 ln(x) + 1.0021, R² = 0.5123
DATA: y = -0.11 ln(x) + 0.6187, R² = 0.1148
COMM: y = -0.038 ln(x) + 0.3425, R² = 0.0245
APPL: y = -0.068 ln(x) + 0.4484, R² = 0.1028]
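The logarithmic model form used in the submission-ratio plots, SR = a·ln(x) + b, can be fitted with ordinary least squares on ln(#registrants). A minimal sketch (the function name is mine):

```python
import numpy as np


def fit_log_model(registrants, sub_ratio):
    """Least-squares fit of SR = a*ln(#registrants) + b, the model form
    shown in the submission-ratio plots; a < 0 means each individual
    worker's willingness to submit declines as competition grows."""
    a, b = np.polyfit(np.log(np.asarray(registrants, float)),
                      np.asarray(sub_ratio, float), deg=1)
    return float(a), float(b)
```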
Analysis III: Submission Quality
[Scatter plots: winning score vs. #Registrants for the DATA, DEVE, APPL, and COMM subsets]
Tasks with more than 20 registrants have a higher chance (71.5%) of receiving better-quality submissions (score > 93).
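The 71.5% figure is a conditional rate: among tasks with more than 20 registrants, the fraction whose winning score exceeds 93. A sketch of how it could be computed (parameter names and thresholds taken from the slide; everything else is illustrative):

```python
import numpy as np


def high_quality_rate(registrants, scores,
                      reg_cut: int = 20, score_cut: float = 93) -> float:
    """Fraction of tasks with more than reg_cut registrants whose
    winning score exceeds score_cut (the slide reports 71.5% for
    this dataset)."""
    registrants = np.asarray(registrants)
    scores = np.asarray(scores)
    mask = registrants > reg_cut
    if not mask.any():
        return float("nan")
    return float((scores[mask] > score_cut).mean())
```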
Analysis IV: Similar Tasks
• Further breakdown by size

Sub-Group | #Tasks | #Observed Awards | #Outliers Removed | Model Fitness (Registrants) | Model Fitness (Submissions)
<2 KLOC | 56 | 13 | 1 | 0.69499 | 0.44283
2-3 KLOC | 31 | 6 | 1 | 0.59802 | 0.85468
3-5 KLOC | 27 | 9 | 1 | 0.76909 | 0.0619
>5 KLOC* | 26 | 11 | 1 | 0.46303 | 0.57195
Discussions – (1)
• Correlation between award and worker behaviors
― Award negatively correlates with all four metrics in the General dataset.
― This indicates that, overall, as the award increases, the number of registrants, the number of submissions, and the quality of the final submission all decrease.
― This observation supports the negligible negative role that award plays in worker behavior in the conceptual model.
― It may not be cost-effective to simply raise the award in order to attract broader competition, especially for competitive, 2-winner software crowdsourcing tasks.
• Answer to RQ1: Generally, in task selection, the number of registrants decreases as the award increases; in task completion, the number of submissions and the winning score also decrease as the award increases.
Discussions – (2)
• Behavior Consistency
― Attracting more registrants increases the chances of receiving satisfactory submissions; however, each individual worker's willingness to submit decreases.
― This reflects the behavioral inconsistency from task registration to task completion, which supports the assumption of distracting factors in the conceptual model.
― Possible distracting factors include competition pressure, insufficient time to complete the task, etc.
• Answer to RQ2: There is a strong positive correlation of 0.71 between the number of submissions and the number of registrants. However, there is a decreasing tendency to submit as the number of registrants increases.
Discussions – (3)
• Quality of Submission
― The positive correlation between the number of registrants and submission scores confirms the quality improvement obtained by leveraging worker diversity.
― However, the low correlation of 0.19 indicates that the previously reported strong impact of team diversity can be limited or weakened by the many distracting factors that come with increased competition.
― A similar viewpoint is the "maximum point" reported by Tajedin et al.
• Answer to RQ3: There is a weak positive correlation of 0.19 between the number of registrants and the score of the winning submission.
Discussions – (4)
• Interaction of Award and Competition
― Analysis IV demonstrates some examples of optimal awards.
― Task requesters are advised to design hybrid competitions combining both collaborative and competitive tasks.
― Future decision-support research directions:
o Pricing strategies targeting broader competition or higher quality;
o Sensitivity analyses that let task requesters explore different options with respect to their needs and preferences.
• Answer to RQ4: For similar tasks, the relationship between award and worker behavior follows a variety of inverted U-shaped curves.
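The inverted-U relationship in the answer to RQ4 can be probed by fitting a quadratic and locating its vertex. A sketch under the assumption that a second-degree polynomial is an adequate local approximation of the Yerkes-Dodson-style curve; it is not the authors' model:

```python
import numpy as np


def fit_inverted_u(award, registrants):
    """Fit registrants ~ c2*award^2 + c1*award + c0. A negative c2
    indicates an inverted U; the vertex -c1/(2*c2) is then a candidate
    'optimal' award for maximizing registration."""
    c2, c1, c0 = np.polyfit(award, registrants, deg=2)
    return float(c2), float(-c1 / (2 * c2))
```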
Future Work
• Further evaluation
― Additional dataset from Jan. 2014 to Feb. 2015
• In-depth causality analysis using more attributes
― Task complexity, worker registration order, worker availability (multi-tasking), worker rating, and so on
• Predictive models to support strategic pricing