1
CROWDSEARCHING (AND BEYOND)
Stefano Ceri
Politecnico di Milano
Dipartimento di Elettronica, Informazione e BioIngegneria
Crowdsearcher
2
Crowd-based Applications
• Emerging crowd-based applications:
  • opinion mining
  • localized information gathering
  • marketing campaigns
  • expert response gathering
• General structure:
  • the requestor poses some questions
  • a wide set of responders (typically unknown to the requestor) are in charge of providing answers
  • the system organizes a response collection campaign
• Include crowdsourcing and crowdsearching
3
The “system” is a wide concept
• Crowd-based applications may use social networks and Q&A websites in addition to crowdsourcing platforms
• Our approach: a coordination engine which keeps overall control of the application deployment and execution
[Figure labels: CrowdSearcher, API Access]
4
A simple example of crowdsearching
5
Example: Find your job (social invitation)
6
Example: Find your job (social invitation)
Selected data items can be transferred to the crowd question
7
Find your job (response submission)
8
Crowdsearcher results (in the loop)
9
Deployment alternatives
• Multi-platform deployment
[Figure: deployment alternatives. Labels: generated query template; embedded application; external application; standalone application; API; embedding; native behaviours; social / crowd platform; community / crowd]
10
Deployment: search on a social network
• Multi-platform deployment
14
THE MODEL AND THE PROCESS
15
CrowdSearcher
• Combines a conceptual framework, a specification paradigm and a reactive execution control environment
• Supports designing, deploying, and monitoring applications on top of crowd-based systems
  • Design is top-down, platform-independent
  • Deployment turns declarative specifications into platform-specific implementations which include social networks and crowdsourcing platforms
  • Monitoring provides reactive control, which guarantees applications’ adaptation and interoperability
• Developed in the context of Search Computing (SeCo, ERC Advanced Grant, 2008-2013)
16
The Design Process
• A simple task design and deployment process, based on specific data structures
  • created using model-driven transformations
  • driven by the task specification
• Process steps: Task Specification, Task Planning, Control Specification
  • Task Specification: task operations, objects, and performers
  • Task Planning: work distribution
  • Control Specification: task control policies
17
DEMO !
18
Valuable ideas: 1. Operation types
• In a Task, performers are required to execute logical operations on input objects
  • e.g. Locate the faces of the people appearing in the following 5 images
• CrowdSearcher offers pre-defined operation types:
  • Like: Ask a performer to express a preference (true/false)
    • e.g. Do you like this picture?
  • Comment: Ask a performer to write a description / summary / evaluation
    • e.g. Can you summarize the following text using your own words?
  • Tag: Ask a performer to annotate an object with a set of tags
    • e.g. How would you label the following image?
  • Classify: Ask a performer to classify an object within a closed set of alternatives
    • e.g. Would you classify this tweet as pro-right, pro-left, or neutral?
  • Add: Ask a performer to add a new object conforming to the specified schema
    • e.g. Can you list the name and address of good restaurants near Politecnico di Milano?
  • Modify: Ask a performer to verify/modify the content of one or more input objects
    • e.g. Is this wine from Cinque Terre? If not, where does it come from?
  • Order: Ask a performer to order the input objects
    • e.g. Order the following books according to your taste
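The operation types listed above can be pictured as a small typed vocabulary. The sketch below is only an illustration in Python: the names `OperationType`, `Operation` and `accepts` are hypothetical and are not taken from CrowdSearcher, and the answer checks are assumptions about what each type would plausibly accept.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class OperationType(Enum):
    """The seven pre-defined operation types named on the slide."""
    LIKE = "like"
    COMMENT = "comment"
    TAG = "tag"
    CLASSIFY = "classify"
    ADD = "add"
    MODIFY = "modify"
    ORDER = "order"


@dataclass
class Operation:
    """Hypothetical container for one operation of a task."""
    op_type: OperationType
    prompt: str
    # Only meaningful for CLASSIFY: the closed set of alternatives.
    categories: List[str] = field(default_factory=list)

    def accepts(self, answer) -> bool:
        """Check that an answer has the shape this operation type expects."""
        if self.op_type is OperationType.LIKE:
            return isinstance(answer, bool)
        if self.op_type is OperationType.CLASSIFY:
            return answer in self.categories
        if self.op_type is OperationType.TAG:
            return isinstance(answer, (list, set)) and all(isinstance(t, str) for t in answer)
        if self.op_type is OperationType.COMMENT:
            return isinstance(answer, str) and answer.strip() != ""
        # ADD, MODIFY and ORDER depend on the object schema, which is not modelled here.
        return True


classify = Operation(
    OperationType.CLASSIFY,
    "Would you classify this tweet as pro-right, pro-left, or neutral?",
    ["pro-right", "pro-left", "neutral"],
)
like = Operation(OperationType.LIKE, "Do you like this picture?")
```

With these definitions, `classify.accepts("neutral")` holds while `classify.accepts("green")` does not, which is the closed-set property that distinguishes Classify from Tag.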
19
2. Platform-independent Meta-Model
20
3. Reactive Crowdsourcing
• A conceptual framework for controlling the execution of crowd-based computations. Based on:
  • Control Marts
  • Active Rules
• Classical forms of control:
  • Majority control (to close object computations)
  • Quality control (to check that quality constraints are met)
  • Spam detection (to detect / eliminate some performers)
  • Multi-platform adaptation (to change the deployment platform)
  • Social adaptation (to change the community of performers)
21
Why Active Rules?
• Ease of Use: control is easily expressible
  • Simple formalism, simple computation
• Power: arbitrarily complex controls are supported
  • Extensibility mechanisms
• Automation: active rules can be system-generated
  • Well-defined semantics
• Flexibility: localized impact of changes on the rule set
  • Control isolation
• Known formal properties descending from known theory
  • Termination, confluence
22
4. Control Mart
• Data structure for controlling application execution, inspired by data marts (for data warehousing); content is automatically built from task specification & planning
• Central entity: MicroTask Object Execution
• Dimensions: Task / Operations, Performer, Object
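The control mart described above resembles a star schema: one central fact entity surrounded by dimensions. The following Python sketch shows that shape under stated assumptions. All class and field names (`Task`, `Performer`, `CrowdObject`, `MicroTaskObjectExecution`, `answer`, and so on) are illustrative, not the actual CrowdSearcher schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Task:
    """Dimension: the task and the operation it asks for (assumed fields)."""
    task_id: str
    operation: str


@dataclass(frozen=True)
class Performer:
    """Dimension: who executed the micro-task (assumed fields)."""
    performer_id: str
    platform: str


@dataclass(frozen=True)
class CrowdObject:
    """Dimension: the object being evaluated (assumed fields)."""
    object_id: str


@dataclass
class MicroTaskObjectExecution:
    """Central entity: one performer's execution on one object within a task."""
    task: Task
    performer: Performer
    obj: CrowdObject
    answer: Optional[str] = None


def evaluations_per_object(rows: List[MicroTaskObjectExecution]) -> Dict[str, int]:
    """Roll the fact rows up along the Object dimension."""
    return dict(Counter(r.obj.object_id for r in rows if r.answer is not None))


task = Task("t1", "classify")
rows = [
    MicroTaskObjectExecution(task, Performer("p1", "Facebook"), CrowdObject("o1"), "Republican"),
    MicroTaskObjectExecution(task, Performer("p2", "AMT"), CrowdObject("o1"), "Democrat"),
    MicroTaskObjectExecution(task, Performer("p1", "Facebook"), CrowdObject("o2")),
]
```

Here `evaluations_per_object(rows)` counts only answered executions, so object `o1` has two evaluations and `o2` does not appear yet.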
23
Auxiliary Structures
• Object: tracking object responses
• Performer: tracking performer behavior (e.g. spammers)
• Task: tracking task status
27
Active Rules Language
• Active rules are expressed on the previous data structures
• Event-Condition-Action paradigm
• Events: data updates / timer
  • ROW-level granularity
  • OLD: before state of a row
  • NEW: after state of a row
• Condition: a predicate that must be satisfied (e.g. conditions on control mart attributes)
• Actions: updates on data structures (e.g. change attribute value, create new instances), special functions (e.g. replan)
e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
c: NEW.ClassifiedParty == 'Republican'
a: SET ObjectControl[oID == NEW.oID].#Eval += 1
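To make the Event-Condition-Action reading of the rule above concrete, here is a minimal Python sketch of a rule engine that fires on row-level updates. It is not the CrowdSearcher rule language or its engine: `Rule`, `Engine` and `increment_eval` are hypothetical names, the tables are plain dictionaries, and the table name is spelled in ASCII as `MicroTaskObjectExecution`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Rule:
    table: str                             # e: table whose updates fire the rule
    attribute: str                         # e: updated attribute
    condition: Callable[[Row, Row], bool]  # c: predicate over (OLD, NEW)
    action: Callable[[Row, Row], None]     # a: side effect on the control data


class Engine:
    """Toy engine: checks every registered rule after each row update."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def register(self, rule: Rule) -> None:
        self.rules.append(rule)

    def update(self, table: str, row: Row, attribute: str, value: object) -> None:
        old = dict(row)         # OLD: state of the row before the update
        row[attribute] = value  # NEW: state of the row after the update
        for rule in self.rules:
            if (rule.table, rule.attribute) == (table, attribute) and rule.condition(old, row):
                rule.action(old, row)


# Stand-in for the ObjectControl table, keyed by oID (assumed layout).
object_control: Dict[str, Dict[str, int]] = {"o1": {"#Eval": 0}}


def increment_eval(old: Row, new: Row) -> None:
    """a: SET ObjectControl[oID == NEW.oID].#Eval += 1"""
    object_control[str(new["oID"])]["#Eval"] += 1


engine = Engine()
engine.register(Rule(
    table="MicroTaskObjectExecution",
    attribute="ClassifiedParty",
    condition=lambda old, new: new["ClassifiedParty"] == "Republican",
    action=increment_eval,
))

execution: Row = {"oID": "o1", "ClassifiedParty": None}
engine.update("MicroTaskObjectExecution", execution, "ClassifiedParty", "Republican")
```

After the update at the end, the counter for object `o1` has been incremented once; an update to any other party value would leave it unchanged because the condition fails.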
28
Rule Example 1
e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
c: NEW.ClassifiedParty == 'Republican'
a: SET ObjectControl[oID == NEW.oID].#Eval += 1
31
5. Rule Programming Best Practices
• We define three classes of rules:
  • Control rules: modifying the control tables
  • Result rules: modifying the dimension tables (object, performer, task)
  • Execution rules: modifying the execution table, either directly or through re-planning
• With control and result rules only:
  • Top-to-bottom, left-to-right evaluation
  • Guaranteed termination
• With execution rules:
  • Termination must be proven (rule precedence graph has cycles)
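The termination remark can be illustrated with the classical triggering-graph check from active database theory: if no rule can (transitively) re-trigger itself, rule processing terminates; if the graph has a cycle, termination has to be argued separately. The Python sketch below assumes a simplified description of each rule as (triggering table, set of updated tables); the three example rules and their triggers are assumptions made for illustration, not rules from the talk.

```python
from typing import Dict, List, Set, Tuple

# rule name -> (table whose update triggers the rule, tables the rule updates)
RuleSpec = Tuple[str, Set[str]]


def triggering_graph(rules: Dict[str, RuleSpec]) -> Dict[str, List[str]]:
    """Edge r1 -> r2 when r1 updates the table that triggers r2."""
    return {
        name: [other for other, (trigger, _w) in rules.items() if trigger in writes]
        for name, (_t, writes) in rules.items()
    }


def has_cycle(graph: Dict[str, List[str]]) -> bool:
    """Depth-first search with three colours; a grey successor means a back edge."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for succ in graph[node]:
            if colour[succ] == GREY:
                return True
            if colour[succ] == WHITE and visit(succ):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in graph)


# Control rules react to the execution table and write the control tables;
# result rules react to the control tables and write the dimension tables.
control_and_result: Dict[str, RuleSpec] = {
    "control_rule": ("execution", {"control"}),
    "result_rule": ("control", {"dimension"}),
}
# An execution rule writes the execution table again, closing the loop.
with_execution = dict(control_and_result, execution_rule=("dimension", {"execution"}))
```

Under these assumptions `has_cycle(triggering_graph(control_and_result))` is false and `has_cycle(triggering_graph(with_execution))` is true, mirroring the distinction drawn on the slide.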
36
6. Dealing with interoperability
• Adaptation is any change of allocation of the application to crowd-based systems or to their performers.
• Migration is the moving of the application from a given system to a different one. (Migration is a special case of adaptation.)
• Cross-Platform Interoperability: applications change the underlying social network or crowdsourcing platforms, e.g., from Facebook to Twitter or to AMT.
• Cross-Community Interoperability: applications change the performers' community, e.g., from the students to the professors of a university.
37
Adaptation options
Adaptation may require:
• Re-planning: the process of generating new micro-tasks.
• Re-invitation: the process of generating new invitation messages for existing or re-planned micro-tasks, with the aim of getting new performers for them.
Adaptation occurs at different levels of granularity:
• Task granularity: re-planning or re-invitation occurs for the whole task.
• Object granularity: re-planning or re-invitation is focused on one (or a few) objects (for instance, objects on which it is harder to achieve agreement among performers with a majority-based decision mechanism).
38
EXPERIMENTS
39
Politician Affiliation
• Given the picture and name of a politician, specify his/her political affiliation
  • No time limit
  • Performers are encouraged to look up online
• Two sets of rules
  • Majority Evaluation
  • Spammer Detection
40
Movie Scenes
• Users can select the screenshot timeframe and whether it is a spoiler or not
• 20 still images each from 16 popular movies
• Each micro-task consists of evaluating one image
• Results are accepted, and the corresponding request is closed, when agreement among 5 performers is reached on both the temporal category and the spoiler option, independently of the number of executions.
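The closing condition for a movie scene can be sketched as a small agreement check. This is an illustration only: it assumes that the two attributes are counted independently (five matching votes on the temporal category and five on the spoiler flag), which is one possible reading of the slide, and the category names used in the example are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Answer = Tuple[str, bool]  # (temporal category, is_spoiler)


def agreed_result(answers: List[Answer], agreement: int = 5) -> Optional[Answer]:
    """Return the agreed (category, spoiler) pair, or None while the object stays open."""
    if not answers:
        return None
    category, category_votes = Counter(a[0] for a in answers).most_common(1)[0]
    spoiler, spoiler_votes = Counter(a[1] for a in answers).most_common(1)[0]
    if category_votes >= agreement and spoiler_votes >= agreement:
        return (category, spoiler)
    return None


# Four matching answers plus one dissenting answer: not yet closed.
answers: List[Answer] = [("beginning", False)] * 4 + [("end", True)]
still_open = agreed_result(answers)
# A fifth matching answer closes the object, regardless of the total count.
closed = agreed_result(answers + [("beginning", False)])
```

Because the threshold is on agreeing votes rather than on total executions, an object with many dissenting answers simply stays open longer.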
41
Professors’ images
• 16 professors within two research groups in our department (DB and AI groups)
• The top 50 images returned by the Google Image API for each query
• Each microtask consisted of evaluating 5 images regarding a professor.
• Results are accepted (and thus the corresponding object is closed) when enough agreement on the class of the image is reached
• Closed objects are removed from new executions.
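The planning step implied by the last two points, micro-tasks of 5 images with closed objects left out, can be sketched as follows. The function name `plan_microtasks` and the use of integer image identifiers are assumptions made for illustration.

```python
from typing import Iterable, List, Set


def plan_microtasks(object_ids: Iterable[int], closed: Set[int], size: int = 5) -> List[List[int]]:
    """Group the still-open objects into micro-tasks of at most `size` objects."""
    open_ids = [oid for oid in object_ids if oid not in closed]
    return [open_ids[i:i + size] for i in range(0, len(open_ids), size)]


# 12 candidate images, two of which are already closed.
microtasks = plan_microtasks(range(12), closed={0, 1})
```

Re-running the planner after more objects are closed yields fewer or smaller micro-tasks, so later executions concentrate on the images that still lack agreement.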
42
SINGLE PLATFORM
43
Query Type
• Engagement depends on the difficulty of the task
• Like vs. Add tasks:
44
Comparison of Execution Platforms
• Facebook vs. Doodle
45
Posting Time
• Facebook vs. Doodle
46
Majority Evaluation (1/3)
30 objects; object redundancy = 9; final object classification as simple majority after 7 evaluations
47
Majority Evaluation (2/3)
Final object classification as total majority after 3 evaluations.
Otherwise, re-plan of 4 additional evaluations, then simple majority at 7.
48
Majority Evaluation (3/3)
Final object classification as total majority after 3 evaluations.
Otherwise, simple majority at 5 or at 7 (with replan).
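The second majority strategy (close at 3 evaluations on total majority, otherwise re-plan 4 more and decide at 7) can be sketched as a decision function over the votes collected so far. Two interpretations are assumed here and are not stated in the slides: "total majority" is read as unanimity, and "simple majority" as strictly more than half of the votes. The status strings are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def simple_majority(votes: List[str]) -> Optional[str]:
    """Label chosen by strictly more than half of the votes, if any."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def decide(votes: List[str]) -> Tuple[str, Optional[str]]:
    """Return (status, label) for one object given its votes in arrival order."""
    if len(votes) < 3:
        return ("open", None)
    if len(set(votes[:3])) == 1:
        return ("closed", votes[0])   # total majority after 3 evaluations
    if len(votes) == 3:
        return ("replan", None)       # ask for 4 additional evaluations
    if len(votes) < 7:
        return ("open", None)         # waiting for the re-planned evaluations
    label = simple_majority(votes[:7])
    return ("closed", label) if label is not None else ("undecided", None)
```

For example, three identical votes close the object immediately, while a 2-1 split triggers a re-plan and the final label is taken from the first seven votes.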
49
Spammer Detection (1/2)
New rule for spammer detection without ground truth.
Performer correctness on final majority. Spammer if > 50% wrong classifications.
50
Spammer Detection (2/2)
New rule for spammer detection without ground truth.
Performer correctness on current majority. Spammer if > 50% wrong classifications.
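The spammer rule without ground truth can be sketched by comparing each performer's answers with the majority label computed so far. This is an illustrative Python version, not the active rules used in the experiments: the tuple layout `(performer, object, label)` and the function names are assumptions, and ties in the majority are broken arbitrarily by first occurrence.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Execution = Tuple[str, str, str]  # (performer id, object id, label)


def majority_labels(executions: List[Execution]) -> Dict[str, str]:
    """Most frequent label per object among the executions seen so far."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for _performer, obj, label in executions:
        votes[obj][label] += 1
    return {obj: counts.most_common(1)[0][0] for obj, counts in votes.items()}


def spammers(executions: List[Execution], threshold: float = 0.5) -> Set[str]:
    """Performers whose share of answers disagreeing with the majority exceeds the threshold."""
    majority = majority_labels(executions)
    total: Counter = Counter()
    wrong: Counter = Counter()
    for performer, obj, label in executions:
        total[performer] += 1
        if label != majority[obj]:
            wrong[performer] += 1
    return {p for p in total if wrong[p] / total[p] > threshold}


executions: List[Execution] = [
    ("p1", "o1", "A"), ("p2", "o1", "A"), ("p3", "o1", "B"),
    ("p1", "o2", "B"), ("p2", "o2", "B"), ("p3", "o2", "A"),
    ("p1", "o3", "A"), ("p2", "o3", "A"), ("p3", "o3", "A"),
]
flagged = spammers(executions)
```

In this example `p3` disagrees with the majority on two of three objects and is the only performer flagged. Evaluating on the current rather than the final majority only changes when `spammers` is called, not how it is computed.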
51
MULTI-PLATFORM & MULTI-COMMUNITY
52
Number of Executions per Platform
• Immediate engagement and then plateau.
• Higher engagement on AMT (paid) than on SN (unpaid and limited by the number of contacts of the inviter)
53
Precision of Performers per Platform
• AMT significantly lower in precision
54
Precision on Closed Objects
• Precision decreases on crowdsourcing platforms
• Agreement increases precision with respect to single performers
55
Number of performers per community
58
Precision for different engagement strategies
• Precision decreases with less expert communities
• Inside-out strategy (from expert to generic users) outperforms outside-in strategy (from generic to expert)
59
EXPERT FINDING IN CROWDSEARCHER
60
Problem
• Ranking the members of a social group according to the level of knowledge that they have about a given topic
• Application: crowd selection (for Crowd Searching or Sourcing)
• Available data
  • User profile
  • Behavioral trace that users leave behind them through their social activities
61
Most interesting aspect: Feature Organization Meta-Model
62
Main Results
• Profiles are less effective than level-1 resources
  • Resources produced by others help in describing each individual’s expertise
• Twitter is the most effective social network for expertise matching – sometimes it outperforms the other social networks
  • Twitter most effective in Computer Engineering, Science, Technology & Games, Sport
  • Facebook effective in Locations, Sport, Movies & TV, Music
  • LinkedIn never very helpful in locating expertise
63
CONCLUSIONS
64
Summary
• Results
  • An integrated framework for crowdsourcing task design and control
  • Well-structured control rules with guarantees of termination
  • Support for cross-platform crowd interoperability
  • A working prototype: crowdsearcher.search-computing.org
• Forthcoming
  • Publication of Web Interface + API
  • Support of declarative options for automatic rule generation
  • Integration with more social networks and human computation platforms
  • Providing vertical solutions for specific markets
  • More applications and experiments (e.g. in Expo 2015)
65
APPENDIX
66
Current «other» interest: Genomic Computing
• NGS changes biology & medicine
  • Massive DNA testing becoming available
  • DNA-based personalized medicine approaching
• NGS data management looks to me as the biggest and most important big-data problem, but:
• No high-level view of genome data supporting high-level query and search
• No scalable method for NGS data analysis
-> Data management research for NGS data has very good potential for impact!
Single-Gene Disease Mutations
69
Our results so far:
1. Intuition: genomic data management requires a «genometric space data processing system»
2. Data model for profiling NGS data
3. System Architecture for managing genometric queries
4. Genometric Data Model (GQM) and Genometric Query Language (GQL) + mapping to PIG LATIN (Query Language for Hadoop).
Long-Term Goal: INTERNET OF GENOMES
70
QUESTIONS?