Crowdsourcing Linked Data Quality Assessment
Amrapali Zaveri
Data Quality Tutorial, September 12, 2016
Linked Data: over a billion facts
What about the quality?
Motivation - Linked Data Quality
Varying quality of Linked Data sources: source, extraction, integration, etc.
Some quality issues require interpretation that can easily be performed by humans:
incompleteness, incorrectness, semantic accuracy
Motivation - Linked Data Quality
Solution: Include human verification in the process of LD quality assessment via crowdsourcing
• Human Intelligence Tasks (HITs)
• Labor market
• Monetary reward/incentive
• Time and cost effective
A large-scale problem-solving approach: the problem is divided into smaller tasks, independently solved by a large group of people.
Research questions
RQ1: Is it possible to detect quality issues in LD data sets via crowdsourcing mechanisms?
RQ2: What type of crowd is most suitable for each type of quality issue?
RQ3: Which types of errors are made by lay users and experts when assessing RDF triples?
Related work
Two dimensions, with our work at their intersection:
• Crowdsourcing & Linked Data: ZenCrowd (entity resolution), CrowdMAP (ontology alignment), GWAP for LD
• Web of data quality assessment: assessing LD mappings (automatic); quality characteristics of LD data sources (semi-automatic); WIQA, Sieve, DBpedia (manual)
• Our work: crowdsourcing the quality assessment of Linked Data
OUR APPROACH
Find-Verify Phases of Crowdsourcing
• Find: contest with LD experts (difficult task, final prize)
• Verify: microtasks with workers (easy task, micropayments)
Tools: TripleCheckMate [Kontokostas2013] for the Find phase; MTurk (http://mturk.com) for the Verify phase
The Find-Verify pattern is adapted from [Bernstein2010]; a minimal sketch of the two-stage flow follows.
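To make the two-stage flow concrete, here is a minimal Python sketch of a Find-Verify pipeline as described above. It is an illustration only: the type names, the stubbed expert and worker judgements, and the majority-voting aggregation with n = 5 mirror the setup reported later in this deck, not the actual TripleCheckMate or MTurk code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Triple:
    s: str
    p: str
    o: str

def find_stage(resources: List[Triple],
               expert: Callable[[Triple], Optional[str]]) -> List[Tuple[Triple, str]]:
    """Find (contest with LD experts): experts inspect triples and flag those
    with a quality issue, classifying the issue type (difficult task)."""
    findings = []
    for t in resources:
        issue = expert(t)                     # None means "no issue found"
        if issue is not None:
            findings.append((t, issue))
    return findings

def verify_stage(findings: List[Tuple[Triple, str]],
                 workers: List[Callable[[Triple, str], str]]) -> Dict[Triple, str]:
    """Verify (microtasks with workers): each flagged triple is judged by
    several workers ('correct'/'incorrect'); the majority vote is kept (easy task)."""
    verdicts = {}
    for triple, issue in findings:
        votes = [judge(triple, issue) for judge in workers]
        verdicts[triple] = Counter(votes).most_common(1)[0][0]
    return verdicts

# Illustrative run with stubbed judgements (n = 5 workers):
resources = [Triple("dbpedia:Dave_Dobbyn", "dbprop:dateOfBirth", '"3"')]
findings = find_stage(resources, lambda t: "incorrect/incomplete object")
print(verify_stage(findings, [lambda t, i: "incorrect"] * 5))
```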
Difference between LD experts & workers
• Type: contest-based (experts) vs. Human Intelligence Tasks (HITs) (workers)
• Participants: Linked Data (LD) experts vs. labor market
• Task: detect and classify quality issues in resources vs. detect quality issues in triples
• Reward: most resources evaluated vs. payment per task/triple
• Tool: TripleCheckMate vs. Amazon Mechanical Turk, CrowdFlower, etc.
Methodology
Crowdsource using:
• Linked Data experts - Find phase
• Amazon Mechanical Turk workers - Verify phase
Crowdsourcing using Linked Data Experts — Methodology
• Phase I: Creation of a quality problem taxonomy
• Phase II: Launching a contest
Zaveri et al. Quality assessment methodologies for Linked Open Data. Semantic Web Journal, 2015.
Crowdsourcing using Linked Data Experts — Quality Problem Taxonomy
D = Detectable, F = Fixable, E = Extraction Framework, M = Mappings Wiki
http://nl.dbpedia.org:8080/TripleCheckMate-Demo/
Crowdsourcing using Linked Data Experts — Contest
Crowdsourcing using Linked Data Experts — Results
Methodology
Crowdsource using:
• Linked Data experts - Find phase
• Amazon Mechanical Turk workers - Verify phase
Crowdsourcing using AMT Workers
Steps:
1. Selecting LD quality issues to crowdsource
2. Designing and generating the microtasks that present the data to the crowd (see the sketch after this list)
3. Collecting the judgements: each triple {s p o .} from the dataset is classified as correct or as incorrect plus its quality issue
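A rough sketch of step 2 follows: turning flagged triples into a CSV batch file of the kind the MTurk requester interface accepts, where each row becomes one HIT and column names can be referenced as ${placeholders} in the task template. The file name and column layout are assumptions for illustration, not the actual task layout used in the study.

```python
import csv

# Triples flagged in the Find stage (illustrative values).
flagged_triples = [
    {"subject": "dbpedia:Dave_Dobbyn",
     "predicate": "dbprop:dateOfBirth",
     "object": '"3"',
     "wikipedia_link": "http://en.wikipedia.org/wiki/Dave_Dobbyn"},
]

# One CSV row per HIT; ${subject}, ${predicate}, ${object}, ${wikipedia_link}
# can then be used as placeholders in the MTurk HIT template.
with open("verify_hits.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f, fieldnames=["subject", "predicate", "object", "wikipedia_link"])
    writer.writeheader()
    writer.writerows(flagged_triples)
```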
Selecting LD quality issues to crowdsource (step 1)
Three categories of quality problems occur pervasively in DBpedia and can be crowdsourced:
• Incorrect/incomplete object. Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth "3" .
• Incorrect data type or language tag. Example: dbpedia:Torishima_Izu_Islands foaf:name " "@en .
• Incorrect link to external Web pages. Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/> .
Presenting the data to the crowd (step 2)
• Selection of foaf:name or rdfs:label to extract human-readable descriptions (an extraction sketch follows after this list)
• Values extracted automatically from Wikipedia infoboxes
• Link to the Wikipedia article via foaf:isPrimaryTopicOf
• Preview of external pages via an embedded HTML iframe
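The bullets above describe the values shown to workers; as an illustration of how such values can be fetched, here is a minimal sketch using the SPARQLWrapper library against the public DBpedia endpoint. The endpoint, example resource, and library choice are assumptions, not the extraction pipeline used for the original tasks (which relied on values extracted from Wikipedia infoboxes).

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Fetch an English label and the linked Wikipedia article for one resource.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?label ?article WHERE {
      <http://dbpedia.org/resource/Dave_Dobbyn> rdfs:label ?label ;
                                                foaf:isPrimaryTopicOf ?article .
      FILTER (lang(?label) = "en")
    }
""")
sparql.setReturnFormat(JSON)
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["label"]["value"], binding["article"]["value"])
```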
Microtask interfaces (MTurk tasks):
• Incorrect object
• Incorrect data type or language tag
• Incorrect outlink
EXPERIMENTAL STUDY
Experimental design
• Crowdsourcing approaches:
  • Find stage: contest with LD experts
  • Verify stage: microtasks
• Creation of a gold standard:
  • Two of the authors (MA, AZ) generated the gold standard for all the triples obtained from the contest
  • Each author independently evaluated the triples
  • Conflicts were resolved via mutual agreement
• Metric: precision (computed as sketched below)
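A minimal sketch of this evaluation step, assuming precision is read as the fraction of triples flagged as erroneous that the gold standard also marks erroneous, and assuming worker answers are first aggregated by majority voting (n = 5) as in the result tables below; the data structures are illustrative, not the original evaluation scripts.

```python
from collections import Counter

def majority_vote(votes):
    """Aggregate n worker answers ('correct'/'incorrect') into one label."""
    return Counter(votes).most_common(1)[0][0]

def precision(predicted, gold):
    """Share of triples labelled 'incorrect' that are also 'incorrect' in the gold standard."""
    flagged = [t for t, label in predicted.items() if label == "incorrect"]
    if not flagged:
        return 0.0
    true_positives = sum(1 for t in flagged if gold[t] == "incorrect")
    return true_positives / len(flagged)

# Illustrative usage with toy data:
worker_votes = {"t1": ["incorrect"] * 4 + ["correct"], "t2": ["correct"] * 5}
predicted = {t: majority_vote(v) for t, v in worker_votes.items()}
gold = {"t1": "incorrect", "t2": "incorrect"}
print(precision(predicted, gold))  # -> 1.0 on this toy example
```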
Overall results
LD experts vs. microtask workers:
• Number of distinct participants: 50 vs. 80
• Total time: 3 weeks (predefined) vs. 4 days
• Total triples evaluated: 1,512 vs. 1,073
• Total cost: ~US$ 400 (predefined) vs. ~US$ 43
Precision results: Incorrect object task
• MTurk workers can be used to reduce the error rates of LD experts in the Find stage
• 117 DBpedia triples had predicates related to dates with incorrect/incomplete values: "2005 Six Nations Championship" Date 12 .
• 52 DBpedia triples had erroneous values from the source: "English (programming language)" Influenced by ? .
• Experts classified all these triples as incorrect; workers compared the values against Wikipedia and successfully classified these triples as "correct"
• Triples compared: 509; precision (LD experts): 0.7151; precision (MTurk, majority voting n=5): 0.8977
Precision results: Incorrect data type task
[Chart: number of triples per data type (Date, Millimetre, Number, Second, Year), split into expert true/false positives and crowd true/false positives]
• Triples compared: 341; precision (LD experts): 0.8270; precision (MTurk, majority voting n=5): 0.4752
Precision results: Incorrect link task
• We analyzed the 189 misclassifications by the experts [pie chart: Freebase links, Wikipedia images, external links; 50%, 39%, 11%]
• The misclassifications by the workers correspond to pages in a language other than English
• Triples compared: 223; precision (baseline): 0.2598; precision (LD experts): 0.1525; precision (MTurk, majority voting n=5): 0.9412
Final discussion
RQ1: Is it possible to detect quality issues in LD data sets via crowdsourcing mechanisms?
• Yes: LD experts for incorrect data types; AMT workers for incorrect/incomplete object values and incorrect interlinks
RQ2: What type of crowd is most suitable for each type of quality issue?
• The effort of LD experts should be spent on tasks demanding domain-specific skills; AMT workers were exceptionally good at performing data comparisons
RQ3: Which types of errors are made by lay users and experts?
• Lay users do not have the skills to solve domain-specific tasks, while experts' performance is very low on tasks that demand extra effort (e.g., checking an external page)
CONCLUSIONS, CHALLENGES & FUTURE WORK
Conclusions
• A crowdsourcing methodology for LD quality assessment: Find stage with LD experts, Verify stage with AMT workers
• The methodology and tool are generic enough to be applied to other scenarios
• Crowdsourcing approaches are feasible for detecting the studied quality issues
Challenges
• Lack of a gold standard
• Crowdsourcing design: how many workers? how many tasks? what reward?
• Microtask design
Future Work
• Combining semi-automated and crowdsourcing methods: predicted vs. crowdsourced metadata (entity, dataset, experimental metadata)
• Conducting new experiments (other domains)
• Fixing/improving quality using crowdsourcing: Find-Fix-Verify phases
References
• TripleCheckMate: A tool for crowdsourcing the quality assessment of Linked Data. D. Kontokostas, A. Zaveri, S. Auer, J. Lehmann. ISWC 2013.
• Crowdsourcing Linked Data quality assessment. M. Acosta, A. Zaveri, E. Simperl, D. Kontokostas, S. Auer, J. Lehmann. ISWC 2013.
• User-driven quality evaluation of DBpedia. A. Zaveri, D. Kontokostas, M. A. Sherif, L. Bühmann, M. Morsey, S. Auer, J. Lehmann. I-SEMANTICS 2013.
• Quality assessment for Linked Data: A survey. A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, S. Auer. Semantic Web Journal, 2015.
• Detecting Linked Data Quality Issues via Crowdsourcing: A DBpedia Study. M. Acosta, A. Zaveri, E. Simperl, D. Kontokostas, F. Flöck, J. Lehmann. Semantic Web Journal, 2016.
• ACRyLIQ: Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment. U. Hassan, A. Zaveri, E. Marx, E. Curry, J. Lehmann. EKAW 2016.