Crowdsourcing Linked Data Quality Assessment
Amrapali Zaveri
Data Quality Tutorial, September 12, 2016
Linked Data: over a billion facts
What about the quality?
Motivation - Linked Data Quality
Varying quality of Linked Data sources: source, extraction, integration, etc.
Some quality issues require interpretation that can easily be performed by humans:
incompleteness, incorrectness, semantic accuracy
Motivation - Linked Data Quality
Solution: Include human verification in the process of LD quality assessment via crowdsourcing
• Human Intelligence Tasks (HITs)
• Labor market
• Monetary reward/incentive
• Time and cost effective
A large-scale problem-solving approach: the problem is divided into smaller tasks, independently solved by a large group of people.
Research questions
RQ1: Is it possible to detect quality issues in LD data sets via crowdsourcing mechanisms?
RQ2: What type of crowd is most suitable for each type of quality issue?
RQ3: Which types of errors are made by lay users and experts when assessing RDF triples?
Related work
Two dimensions, with our work at their intersection:
• Crowdsourcing & Linked Data: ZenCrowd (entity resolution), CrowdMAP (ontology alignment), GWAP for LD
• Web of data quality assessment: assessing LD mappings (automatic); quality characteristics of LD data sources (semi-automatic); WIQA, Sieve, DBpedia (manual)
• Our work: crowdsourcing the quality assessment of Linked Data
OUR APPROACH
Find-Verify Phases of Crowdsourcing
• Find: contest with LD experts (difficult task, final prize)
• Verify: microtasks with workers (easy task, micropayments)
Tools: TripleCheckMate [Kontokostas2013] for the Find phase; MTurk (http://mturk.com) for the Verify phase
The Find-Verify pattern is adapted from [Bernstein2010]; a minimal sketch of the two-stage flow follows.
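To make the two-stage flow concrete, here is a minimal Python sketch of a Find-Verify pipeline as described above. It is an illustration only: the type names, the stubbed expert and worker judgements, and the majority-voting aggregation with n = 5 mirror the setup reported later in this deck, not the actual TripleCheckMate or MTurk code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Triple:
    s: str
    p: str
    o: str

def find_stage(resources: List[Triple],
               expert: Callable[[Triple], Optional[str]]) -> List[Tuple[Triple, str]]:
    """Find (contest with LD experts): experts inspect triples and flag those
    with a quality issue, classifying the issue type (difficult task)."""
    findings = []
    for t in resources:
        issue = expert(t)                     # None means "no issue found"
        if issue is not None:
            findings.append((t, issue))
    return findings

def verify_stage(findings: List[Tuple[Triple, str]],
                 workers: List[Callable[[Triple, str], str]]) -> Dict[Triple, str]:
    """Verify (microtasks with workers): each flagged triple is judged by
    several workers ('correct'/'incorrect'); the majority vote is kept (easy task)."""
    verdicts = {}
    for triple, issue in findings:
        votes = [judge(triple, issue) for judge in workers]
        verdicts[triple] = Counter(votes).most_common(1)[0][0]
    return verdicts

# Illustrative run with stubbed judgements (n = 5 workers):
resources = [Triple("dbpedia:Dave_Dobbyn", "dbprop:dateOfBirth", '"3"')]
findings = find_stage(resources, lambda t: "incorrect/incomplete object")
print(verify_stage(findings, [lambda t, i: "incorrect"] * 5))
```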
Difference between LD experts & workers
• Type: contest-based (experts) vs. Human Intelligence Tasks (HITs) (workers)
• Participants: Linked Data (LD) experts vs. labor market
• Task: detect and classify quality issues in resources vs. detect quality issues in triples
• Reward: most resources evaluated vs. payment per task/triple
• Tool: TripleCheckMate vs. Amazon Mechanical Turk, CrowdFlower, etc.
Methodology
Crowdsource using:
• Linked Data experts - Find phase
• Amazon Mechanical Turk workers - Verify phase
Crowdsourcing using Linked Data Experts — Methodology
• Phase I: Creation of a quality problem taxonomy
• Phase II: Launching a contest
Zaveri et al. Quality assessment methodologies for Linked Open Data. Semantic Web Journal, 2015.
Crowdsourcing using Linked Data Experts — Quality Problem Taxonomy
D = Detectable, F = Fixable, E = Extraction Framework, M = Mappings Wiki
http://nl.dbpedia.org:8080/TripleCheckMate-Demo/
Crowdsourcing using Linked Data Experts — Contest
Crowdsourcing using Linked Data Experts — Results
Methodology
Crowdsource using:
• Linked Data experts - Find phase
• Amazon Mechanical Turk workers - Verify phase
Crowdsourcing using AMT Workers
Steps:
1. Selecting LD quality issues to crowdsource
2. Designing and generating the microtasks that present the data to the crowd (see the sketch after this list)
3. Collecting the judgements: each triple {s p o .} from the dataset is classified as correct or as incorrect plus its quality issue
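A rough sketch of step 2 follows: turning flagged triples into a CSV batch file of the kind the MTurk requester interface accepts, where each row becomes one HIT and column names can be referenced as ${placeholders} in the task template. The file name and column layout are assumptions for illustration, not the actual task layout used in the study.

```python
import csv

# Triples flagged in the Find stage (illustrative values).
flagged_triples = [
    {"subject": "dbpedia:Dave_Dobbyn",
     "predicate": "dbprop:dateOfBirth",
     "object": '"3"',
     "wikipedia_link": "http://en.wikipedia.org/wiki/Dave_Dobbyn"},
]

# One CSV row per HIT; ${subject}, ${predicate}, ${object}, ${wikipedia_link}
# can then be used as placeholders in the MTurk HIT template.
with open("verify_hits.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f, fieldnames=["subject", "predicate", "object", "wikipedia_link"])
    writer.writeheader()
    writer.writerows(flagged_triples)
```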
Selecting LD quality issues to crowdsource (step 1)
Three categories of quality problems occur pervasively in DBpedia and can be crowdsourced:
• Incorrect/incomplete object. Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth "3" .
• Incorrect data type or language tag. Example: dbpedia:Torishima_Izu_Islands foaf:name " "@en .
• Incorrect link to external Web pages. Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/> .
Presenting the data to the crowd (step 2)
• Selection of foaf:name or rdfs:label to extract human-readable descriptions (an extraction sketch follows after this list)
• Values extracted automatically from Wikipedia infoboxes
• Link to the Wikipedia article via foaf:isPrimaryTopicOf
• Preview of external pages via an embedded HTML iframe
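The bullets above describe the values shown to workers; as an illustration of how such values can be fetched, here is a minimal sketch using the SPARQLWrapper library against the public DBpedia endpoint. The endpoint, example resource, and library choice are assumptions, not the extraction pipeline used for the original tasks (which relied on values extracted from Wikipedia infoboxes).

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Fetch an English label and the linked Wikipedia article for one resource.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?label ?article WHERE {
      <http://dbpedia.org/resource/Dave_Dobbyn> rdfs:label ?label ;
                                                foaf:isPrimaryTopicOf ?article .
      FILTER (lang(?label) = "en")
    }
""")
sparql.setReturnFormat(JSON)
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["label"]["value"], binding["article"]["value"])
```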
Microtask interfaces (MTurk tasks):
• Incorrect object
• Incorrect data type or language tag
• Incorrect outlink
EXPERIMENTAL STUDY
Experimental design
• Crowdsourcing approaches:
  • Find stage: contest with LD experts
  • Verify stage: microtasks
• Creation of a gold standard:
  • Two of the authors (MA, AZ) generated the gold standard for all the triples obtained from the contest
  • Each author independently evaluated the triples
  • Conflicts were resolved via mutual agreement
• Metric: precision (computed as sketched below)
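A minimal sketch of this evaluation step, assuming precision is read as the fraction of triples flagged as erroneous that the gold standard also marks erroneous, and assuming worker answers are first aggregated by majority voting (n = 5) as in the result tables below; the data structures are illustrative, not the original evaluation scripts.

```python
from collections import Counter

def majority_vote(votes):
    """Aggregate n worker answers ('correct'/'incorrect') into one label."""
    return Counter(votes).most_common(1)[0][0]

def precision(predicted, gold):
    """Share of triples labelled 'incorrect' that are also 'incorrect' in the gold standard."""
    flagged = [t for t, label in predicted.items() if label == "incorrect"]
    if not flagged:
        return 0.0
    true_positives = sum(1 for t in flagged if gold[t] == "incorrect")
    return true_positives / len(flagged)

# Illustrative usage with toy data:
worker_votes = {"t1": ["incorrect"] * 4 + ["correct"], "t2": ["correct"] * 5}
predicted = {t: majority_vote(v) for t, v in worker_votes.items()}
gold = {"t1": "incorrect", "t2": "incorrect"}
print(precision(predicted, gold))  # -> 1.0 on this toy example
```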
Overall results
LD experts vs. microtask workers:
• Number of distinct participants: 50 vs. 80
• Total time: 3 weeks (predefined) vs. 4 days
• Total triples evaluated: 1,512 vs. 1,073
• Total cost: ~US$ 400 (predefined) vs. ~US$ 43
Precision results: Incorrect object task
• MTurk workers can be used to reduce the error rates of LD experts in the Find stage
• 117 DBpedia triples had predicates related to dates with incorrect/incomplete values: "2005 Six Nations Championship" Date 12 .
• 52 DBpedia triples had erroneous values from the source: "English (programming language)" Influenced by ? .
• Experts classified all these triples as incorrect; workers compared the values against Wikipedia and successfully classified these triples as "correct"
• Triples compared: 509; precision (LD experts): 0.7151; precision (MTurk, majority voting n=5): 0.8977
Precision results: Incorrect data type task
[Chart: number of triples per data type (Date, Millimetre, Number, Second, Year), split into expert true/false positives and crowd true/false positives]
• Triples compared: 341; precision (LD experts): 0.8270; precision (MTurk, majority voting n=5): 0.4752
Precision results: Incorrect link task
• We analyzed the 189 misclassifications by the experts [pie chart: Freebase links, Wikipedia images, external links; 50%, 39%, 11%]
• The misclassifications by the workers correspond to pages in a language other than English
• Triples compared: 223; precision (baseline): 0.2598; precision (LD experts): 0.1525; precision (MTurk, majority voting n=5): 0.9412
Final discussion
RQ1: Is it possible to detect quality issues in LD data sets via crowdsourcing mechanisms?
• Yes: LD experts for incorrect data types; AMT workers for incorrect/incomplete object values and incorrect interlinks
RQ2: What type of crowd is most suitable for each type of quality issue?
• The effort of LD experts should be spent on tasks demanding domain-specific skills; AMT workers were exceptionally good at performing data comparisons
RQ3: Which types of errors are made by lay users and experts?
• Lay users do not have the skills to solve domain-specific tasks, while experts' performance is very low on tasks that demand extra effort (e.g., checking an external page)
CONCLUSIONS, CHALLENGES & FUTURE WORK
Conclusions
• A crowdsourcing methodology for LD quality assessment: Find stage with LD experts, Verify stage with AMT workers
• The methodology and tool are generic enough to be applied to other scenarios
• Crowdsourcing approaches are feasible for detecting the studied quality issues
Challenges
• Lack of a gold standard
• Crowdsourcing design: how many workers? how many tasks? what reward?
• Microtask design
Future Work
• Combining semi-automated and crowdsourcing methods: predicted vs. crowdsourced metadata (entity, dataset, experimental metadata)
• Conducting new experiments (other domains)
• Fixing/improving quality using crowdsourcing: Find-Fix-Verify phases
References
• TripleCheckMate: A tool for crowdsourcing the quality assessment of Linked Data. D. Kontokostas, A. Zaveri, S. Auer, J. Lehmann. ISWC 2013.
• Crowdsourcing Linked Data quality assessment. M. Acosta, A. Zaveri, E. Simperl, D. Kontokostas, S. Auer, J. Lehmann. ISWC 2013.
• User-driven quality evaluation of DBpedia. A. Zaveri, D. Kontokostas, M. A. Sherif, L. Bühmann, M. Morsey, S. Auer, J. Lehmann. I-SEMANTICS 2013.
• Quality assessment for Linked Data: A survey. A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, S. Auer. Semantic Web Journal, 2015.
• Detecting Linked Data Quality Issues via Crowdsourcing: A DBpedia Study. M. Acosta, A. Zaveri, E. Simperl, D. Kontokostas, F. Flöck, J. Lehmann. Semantic Web Journal, 2016.
• ACRyLIQ: Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment. U. Hassan, A. Zaveri, E. Marx, E. Curry, J. Lehmann. EKAW 2016.