Mechanical Cheat: Spamming Schemes and Adversarial Techniques on Crowdsourcing Platforms
Djellel Eddine Difallah, Gianluca Demartini, and Philippe Cudré-Mauroux
University of Fribourg, Switzerland
Popularity and Monetary Incentives
Micro-task crowdsourcing is growing in popularity:
~500k registered workers on AMT
~200k HITs available (April 2012)
~$20k of rewards (April 2012)
Spam could be a threat to crowdsourcing
Some Experimental Results: Entity Link Selection (ZenCrowd – WWW 2012)
Evidence of participation by dishonest workers, who spend less time per task, complete more tasks, and achieve lower quality.
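This timing signal suggests a simple detection heuristic. Below is a minimal Python sketch, assuming a hypothetical log mapping each worker to task durations; the min_seconds threshold and log format are illustrative assumptions, not values from these experiments.

# A sketch (illustrative, not the paper's method): flag workers whose
# median time per task is anomalously low.
from statistics import median

def flag_fast_workers(task_log, min_seconds=5):
    """task_log maps worker_id -> list of task durations in seconds."""
    return [worker for worker, durations in task_log.items()
            if median(durations) < min_seconds]

# Example: "w2" answers in ~2 seconds per task and gets flagged.
log = {"w1": [30, 25, 40], "w2": [2, 1, 3, 2, 2]}
print(flag_fast_workers(log))  # ['w2']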
Dishonest Answers on Crowdsourcing Platforms
We define a dishonest answer, in a crowdsourcing context, as an answer that has been either:
Randomly posted
Artificially generated
Duplicated from another source
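As one illustration of the last category, a requester might look for identical free-text answers submitted by several distinct workers on the same task. The sketch below is a toy example; the submission format and the min_workers parameter are assumptions made for illustration.

# A sketch: answers duplicated verbatim across min_workers distinct
# workers on the same task are treated as suspect.
from collections import defaultdict

def find_duplicated_answers(submissions, min_workers=3):
    """submissions: iterable of (worker_id, task_id, answer_text)."""
    by_answer = defaultdict(set)
    for worker, task, answer in submissions:
        by_answer[(task, answer.strip().lower())].add(worker)
    return {key: workers for key, workers in by_answer.items()
            if len(workers) >= min_workers}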
How can requesters perform quality control?
Go over all the submissions?
Blindly accept all submissions?
Use selection and filtering algorithms.
Anti-adversarial techniques
Pre-selection and dissuasion: use built-in controls (e.g., acceptance rate), task design, qualification tests
Post-processing: task repetition and aggregation, test questions, machine learning (e.g., the probabilistic network in ZenCrowd); a minimal aggregation sketch follows below
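To make the aggregation step concrete, here is a minimal sketch of task repetition with majority voting. It is a generic illustration with made-up names and data, not ZenCrowd's probabilistic network.

# A sketch: each task is assigned to several workers and the most
# frequent answer is kept as the aggregated result.
from collections import Counter

def aggregate_by_majority(answers):
    """answers maps task_id -> list of answers from repeated assignments."""
    return {task: Counter(votes).most_common(1)[0][0]
            for task, votes in answers.items()}

answers = {"t1": ["A", "A", "B"], "t2": ["C", "C", "C"]}
print(aggregate_by_majority(answers))  # {'t1': 'A', 't2': 'C'}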
Countering adversarial techniques – Organization
Countering adversarial techniques – Individual attacks
Random answers: target tasks designed with a monetary incentive; countered with test questions (see the sketch after this list)
Automated answers: target tasks with a simple submission mechanism; countered with test questions (especially CAPTCHAs)
Semi-automated answers: target easy HITs achievable with some AI; can pass easy-to-answer test questions, and can detect CAPTCHAs and forward them to a human
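The test-question counter can be sketched as follows: workers are graded on a set of gold questions with known answers, and those below a threshold are rejected before aggregation. The 0.75 threshold and the data layout are illustrative assumptions; note that a random clicker on k-choice questions scores around 1/k and is filtered out, while a semi-automated attacker may still pass if the gold questions are too easy.

# A sketch: keep only workers whose accuracy on gold questions
# reaches min_accuracy.
def filter_by_gold_accuracy(worker_answers, gold, min_accuracy=0.75):
    """worker_answers: worker_id -> {task_id: answer}; gold: task_id -> answer."""
    kept = {}
    for worker, answers in worker_answers.items():
        graded = [answers[t] == a for t, a in gold.items() if t in answers]
        if graded and sum(graded) / len(graded) >= min_accuracy:
            kept[worker] = answers
    return kept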
Countering adversarial techniques – Group attacks
Agree on answers: targets naïve aggregation schemes like majority vote and may cause valid answers to be discarded; countered by shuffling the options (see the sketch below)
Answer sharing: targets repeated tasks; countered by creating multiple batches
Artificial clones: target repeated tasks
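The shuffling counter works because colluders who agree on a position (e.g., "always pick the first option") no longer produce agreeing answers once each worker sees a different ordering. A minimal sketch, with made-up option texts, where answers are recorded by option id rather than by position:

# A sketch: present the options of a task in a per-assignment random order.
import random

def present_task(options, rng=random):
    """options: list of (option_id, text); returns a shuffled copy."""
    shuffled = list(options)
    rng.shuffle(shuffled)
    return shuffled

task = [("o1", "Barack Obama"), ("o2", "Michelle Obama"), ("o3", "None of the above")]
print(present_task(task))  # a different ordering per assignment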
Conclusions and future work
We argue that some quality-control tools are ineffective against resourceful spammers.
Combine multiple techniques for post-filtering.
Crowdsourcing platforms should provide more tools.
Evaluation of future filtering algorithms must be repeatable and generic, hence the need for a crowdsourcing benchmark.
Conclusions and future work – Benchmark proposal
A collection of tasks with multiple-choice options
Each task is repeated multiple times
Unpublished expert judgments for all the tasks
Published answers collected in a controlled environment from the following categories of workers:
Honest workers
Random clicks
Semi-automated programs
Organized groups
Post-filtering methods are evaluated on their ability to achieve a high precision score; other parameters could include the money spent, etc.
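As a sketch of how the benchmark scoring could work, the precision of a post-filtering method would be computed against the withheld expert judgments; the exact metric definition below is an illustrative assumption, not part of the proposal.

# A sketch: fraction of a method's answers that match the expert judgment,
# over the tasks the method answered.
def precision(predicted, expert):
    """predicted and expert map task_id -> answer."""
    answered = [t for t in predicted if t in expert]
    if not answered:
        return 0.0
    return sum(predicted[t] == expert[t] for t in answered) / len(answered)

print(precision({"t1": "A", "t2": "B"}, {"t1": "A", "t2": "C"}))  # 0.5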
Discussion – Q&A