
Page 1: Crowdsourcing Linked Data Quality Assessment

www.kit.edu

@Data Quality Tutorial, September 12, 2016

Crowdsourcing Linked Data Quality Assessment
Amrapali Zaveri

Page 2: Crowdsourcing Linked Data Quality Assessment

Linked Data - over a billion facts

What about the quality?

Page 3: Crowdsourcing Linked Data Quality Assessment

Motivation - Linked Data Quality

Varying quality of Linked Data sources: source, extraction, integration, etc.

Some quality issues require interpretation that can easily be performed by humans

Incompleteness, incorrectness, semantic accuracy

Page 4: Crowdsourcing Linked Data Quality Assessment

Motivation - Linked Data Quality

Solution: Include human verification in the process of LD quality assessment via crowdsourcing

Human Intelligence Tasks (HITs), labor market, monetary reward/incentive, time and cost effective

A large-scale problem-solving approach in which a problem is divided into smaller tasks that are independently solved by a large group of people.

Page 5: Crowdsourcing Linked Data Quality Assessment

Research questions

RQ1: Is it possible to detect quality issues in LD data sets via crowdsourcing mechanisms?

RQ2: What type of crowd is most suitable for each type of quality issue?

RQ3: Which types of errors are made by lay users and experts when assessing RDF triples?

Page 6: Crowdsourcing Linked Data Quality Assessment

Related work

[Diagram: our work sits at the intersection of two areas]

• Crowdsourcing & Linked Data: ZenCrowd (entity resolution), CrowdMAP (ontology alignment), GWAP for LD, assessing LD mappings
• Web of Data quality assessment: automatic, semi-automatic (quality characteristics of LD data sources), and manual (WIQA, Sieve) approaches; DBpedia
• Our work: crowdsourcing LD quality assessment, combining both areas

Page 7: Crowdsourcing Linked Data Quality Assessment

OUR APPROACH

Page 8: Crowdsourcing Linked Data Quality Assessment

Find-Verify Phases of Crowdsourcing

Find (contest): LD experts, difficult task, final prize. Tool: TripleCheckMate [Kontokostas2013]
Verify (microtasks): workers, easy task, micropayments. Tool: MTurk (http://mturk.com)

The Find-Verify pattern is adapted from [Bernstein2010].

Page 9: Crowdsourcing Linked Data Quality Assessment

Difference between LD experts & AMT workers:

• Type: contest-based (experts) vs. Human Intelligence Tasks (HITs) (workers)
• Participants: Linked Data (LD) experts vs. labor-market workers
• Task: detect and classify quality issues in resources vs. detect quality issues in triples
• Reward: most resources evaluated vs. payment per task/triple
• Tool: TripleCheckMate vs. Amazon Mechanical Turk, CrowdFlower, etc.

Page 10: Crowdsourcing Linked Data Quality Assessment

Methodology
Crowdsource using:
• Linked Data experts - Find phase
• Amazon Mechanical Turk workers - Verify phase

Page 11: Crowdsourcing Linked Data Quality Assessment

Crowdsourcing using Linked Data Experts — Methodology

Phase I: Creation of a quality problem taxonomy
Phase II: Launching a contest

Page 12: Crowdsourcing Linked Data Quality Assessment

Zaveri et al. Quality assessment methodologies for Linked Open Data. Semantic Web Journal, 2015.

D = Detectable, F = Fixable, E = Extraction Framework, M = Mappings Wiki

Crowdsourcing using Linked Data Experts — Quality Problem Taxonomy

Page 13: Crowdsourcing Linked Data Quality Assessment

D = Detectable, F = Fixable, E = Extraction Framework, M = Mappings Wiki

Crowdsourcing using Linked Data Experts — Quality Problem Taxonomy


Page 14: Crowdsourcing Linked Data Quality Assessment

http://nl.dbpedia.org:8080/TripleCheckMate-Demo/

Crowdsourcing using Linked Data Experts — Contest

Page 15: Crowdsourcing Linked Data Quality Assessment

Crowdsourcing using Linked Data Experts — Results

Page 16: Crowdsourcing Linked Data Quality Assessment

Methodology
Crowdsource using:
• Linked Data experts - Find phase
• Amazon Mechanical Turk workers - Verify phase

Page 17: Crowdsourcing Linked Data Quality Assessment

Crowdsourcing using AMT Workers

Steps:

1. Selecting the LD quality issues to crowdsource
2. Designing and generating the microtasks that present the data to the crowd

Workers then judge each triple {s p o .} from the dataset as "Correct" or "Incorrect + quality issue" (a sketch of a possible microtask structure follows below).
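To make the Verify stage concrete, here is a minimal sketch of how triples found by the experts could be grouped into small HITs. The data structure, batch size, and answer options are assumptions for illustration only, not the authors' actual implementation.

```python
from itertools import islice

# Triples found by the experts, tagged with the issue type to verify
# (made-up structure for this sketch).
found = [
    {"triple": ("dbpedia:Dave_Dobbyn", "dbprop:dateOfBirth", '"3"'),
     "issue": "object value"},
    {"triple": ("dbpedia:John-Two-Hawks", "dbpedia-owl:wikiPageExternalLink",
                "<http://cedarlakedvd.com/>"),
     "issue": "external link"},
]

def make_hits(found_triples, per_hit=5):
    """Group triples of the same issue type into small HITs for the Verify stage."""
    by_issue = {}
    for item in found_triples:
        by_issue.setdefault(item["issue"], []).append(item)
    hits = []
    for issue, items in by_issue.items():
        it = iter(items)
        while batch := list(islice(it, per_hit)):
            hits.append({"issue": issue,
                         "triples": [b["triple"] for b in batch],
                         # assumed answer options for the worker
                         "answers": ["Correct", "Incorrect", "I cannot tell"]})
    return hits

print(len(make_hits(found)), "HITs generated")
```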

Page 18: Crowdsourcing Linked Data Quality Assessment

Step 1: Selecting the LD quality issues to crowdsource

Three categories of quality problems occur pervasively in DBpedia and can be crowdsourced:

• Incorrect/incomplete object value. Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth "3" .
• Incorrect data type or language tag. Example: dbpedia:Torishima_Izu_Islands foaf:name " "@en .
• Incorrect link to external Web pages. Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/> .
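As a rough illustration only (not part of the original approach), a few simple heuristics can pre-flag candidate triples of these three kinds before handing them to the crowd; the checks and issue labels below are assumptions for the sketch.

```python
import re

# The example triples from the slide, written as (subject, predicate, object) strings.
triples = [
    ("dbpedia:Dave_Dobbyn", "dbprop:dateOfBirth", '"3"'),
    ("dbpedia:Torishima_Izu_Islands", "foaf:name", '" "@en'),
    ("dbpedia:John-Two-Hawks", "dbpedia-owl:wikiPageExternalLink",
     "<http://cedarlakedvd.com/>"),
]

def flag_candidate(s, p, o):
    """Return a suspected issue type for a triple, or None. Purely heuristic."""
    if "date" in p.lower() and not re.search(r"\d{4}", o):
        return "incorrect/incomplete object value"    # e.g. a bare "3" as a date of birth
    if o.startswith('"') and o.strip('"@en ').strip() == "":
        return "incorrect data type or language tag"  # empty or unreadable literal
    if "ExternalLink" in p and o.startswith("<http"):
        return "external link to verify"              # needs a human judgment
    return None

for s, p, o in triples:
    print(s, p, o, "->", flag_candidate(s, p, o))
```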

Page 19: Crowdsourcing Linked Data Quality Assessment

Step 2: Presenting the data to the crowd

• Selection of foaf:name or rdfs:label to extract human-readable descriptions

• Values extracted automatically from Wikipedia infoboxes

• Link to the Wikipedia article via foaf:isPrimaryTopicOf

• Preview of external pages embedded via an HTML iframe

Microtask interfaces (MTurk tasks) were built for the three issue types: incorrect object, incorrect data type or language tag, and incorrect outlink. A sketch of how the displayed values could be fetched follows below.
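A minimal sketch of fetching the values a microtask displays (label and linked Wikipedia article), assuming the public DBpedia SPARQL endpoint and the SPARQLWrapper library; this is illustrative only, not the authors' tooling, which also pulls values from Wikipedia infoboxes.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://dbpedia.org/sparql"  # public endpoint, assumed for this sketch

def hit_display_data(resource_uri):
    """Fetch an English label and the linked Wikipedia article for one resource."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?label ?article WHERE {{
          OPTIONAL {{ <{resource_uri}> rdfs:label ?label .
                      FILTER (lang(?label) = "en") }}
          OPTIONAL {{ <{resource_uri}> foaf:isPrimaryTopicOf ?article . }}
        }} LIMIT 1
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return rows[0] if rows else {}

print(hit_display_data("http://dbpedia.org/resource/Dave_Dobbyn"))
```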

Page 20: Crowdsourcing Linked Data Quality Assessment

EXPERIMENTAL STUDY

Page 21: Crowdsourcing Linked Data Quality Assessment

Experimental design

• Crowdsourcing approaches:
  • Find stage: contest with LD experts
  • Verify stage: microtasks
• Creation of a gold standard:
  • Two of the authors of this paper (MA, AZ) generated the gold standard for all the triples obtained from the contest
  • Each author independently evaluated the triples
  • Conflicts were resolved via mutual agreement
• Metric: precision (a computation sketch follows below)
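For clarity, here is a short sketch of the evaluation step as described on this slide and the result slides: aggregate the n = 5 worker answers per triple by majority voting, then compute precision against the gold standard. The toy answers are made up; only the procedure reflects the slides.

```python
from collections import Counter

# Toy data: gold-standard verdict and five worker verdicts per triple.
gold = {"t1": "incorrect", "t2": "correct", "t3": "incorrect"}
crowd = {"t1": ["incorrect", "incorrect", "correct", "incorrect", "incorrect"],
         "t2": ["correct", "correct", "incorrect", "correct", "correct"],
         "t3": ["correct", "correct", "correct", "incorrect", "correct"]}

def majority(answers):
    """Majority vote over the n = 5 answers collected for a triple."""
    return Counter(answers).most_common(1)[0][0]

# Precision: of the triples the crowd flagged as "incorrect",
# the fraction that the gold standard also marks as incorrect.
flagged = [t for t, answers in crowd.items() if majority(answers) == "incorrect"]
true_pos = sum(1 for t in flagged if gold[t] == "incorrect")
precision = true_pos / len(flagged) if flagged else 0.0
print(f"precision = {precision:.4f}")  # 1.0000 for this toy data
```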

Page 22: Crowdsourcing Linked Data Quality Assessment

Overall results

LD experts vs. microtask workers:

• Number of distinct participants: 50 vs. 80
• Total time: 3 weeks (predefined) vs. 4 days
• Total triples evaluated: 1,512 vs. 1,073
• Total cost: ~US$ 400 (predefined) vs. ~US$ 43

Page 23: Crowdsourcing Linked Data Quality Assessment

Precision results: incorrect object task

• MTurk workers can be used to reduce the error rates of LD experts for the Find stage
• 117 DBpedia triples had predicates related to dates with incorrect/incomplete values, e.g.: "2005 Six Nations Championship" Date 12 .
• 52 DBpedia triples had erroneous values from the source, e.g.: "English (programming language)" Influenced by ? .
• Experts classified all these triples as incorrect
• Workers compared the values against Wikipedia and successfully classified these triples as "correct"

Precision over 509 compared triples: LD experts 0.7151, MTurk (majority voting, n = 5) 0.8977

Page 24: Crowdsourcing Linked Data Quality Assessment

Precision results: incorrect data type task

[Chart: number of triples (0-150) per data type (Date, Millimetre, Number, Second, Year), broken down into Experts TP, Experts FP, Crowd TP, Crowd FP]

Precision over 341 compared triples: LD experts 0.8270, MTurk (majority voting, n = 5) 0.4752

Page 25: Crowdsourcing Linked Data Quality Assessment

Precision results: Incorrect link task

• We analyzed the 189 misclassifications by the experts
  [Pie chart: breakdown into Freebase links, Wikipedia images, and external links (50%, 39%, 11%)]
• The misclassifications by the workers correspond to pages in a language other than English

Precision over 223 compared triples: baseline 0.2598, LD experts 0.1525, MTurk (majority voting, n = 5) 0.9412

Page 26: Crowdsourcing Linked Data Quality Assessment

Final discussion

RQ1: Is it possible to detect quality issues in LD data sets via crowdsourcing mechanisms?
Yes: LD experts detected incorrect data types; AMT workers detected incorrect/incomplete object values and incorrect interlinks.

RQ2: What type of crowd is most suitable for each type of quality issue?
The effort of LD experts should be spent on tasks demanding domain-specific skills. AMT workers were exceptionally good at performing data comparisons.

RQ3: Which types of errors are made by lay users and experts?
Lay users do not have the skills to solve domain-specific tasks, while experts' performance is very low on tasks that demand extra effort (e.g., checking an external page).

Page 27: Crowdsourcing Linked Data Quality Assessment

CONCLUSIONS, CHALLENGES & FUTURE WORK

Page 28: Crowdsourcing Linked Data Quality Assessment

Conclusions

A crowdsourcing methodology for LD quality assessment:
• Find stage: LD experts
• Verify stage: AMT workers

The methodology and tool are generic enough to be applied to other scenarios. Crowdsourcing approaches are feasible for detecting the studied quality issues.

Page 29: Crowdsourcing Linked Data Quality Assessment

Challenges

• Lack of a gold standard
• Crowdsourcing design: how many workers? how many tasks? what reward?
• Microtask design

Page 30: Crowdsourcing Linked Data Quality Assessment

Future Work

• Combining semi-automated and crowdsourcing methods: predicted vs. crowdsourced metadata
• Conducting new experiments (other domains): entity, dataset, experimental metadata
• Fixing/improving quality using crowdsourcing: Find-Fix-Verify phases

Page 31: Crowdsourcing Linked Data Quality Assessment

References

• TripleCheckMate: A tool for crowdsourcing the quality assessment of Linked Data. D. Kontokostas, A. Zaveri, S. Auer, J. Lehmann. ISWC 2013.
• Crowdsourcing Linked Data quality assessment. M. Acosta, A. Zaveri, E. Simperl, D. Kontokostas, S. Auer, J. Lehmann. ISWC 2013.
• User-driven quality evaluation of DBpedia. A. Zaveri, D. Kontokostas, M. A. Sherif, L. Bühmann, M. Morsey, S. Auer, J. Lehmann. I-SEMANTICS 2013.
• Quality assessment for Linked Data: A survey. A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, S. Auer. Semantic Web Journal, 2015.
• Detecting Linked Data quality issues via crowdsourcing: A DBpedia study. M. Acosta, A. Zaveri, E. Simperl, D. Kontokostas, F. Flöck, J. Lehmann. Semantic Web Journal, 2016.
• ACRyLIQ: Leveraging DBpedia for adaptive crowdsourcing in Linked Data quality assessment. U. Hassan, A. Zaveri, E. Marx, E. Curry, J. Lehmann. EKAW 2016.

Page 32: Crowdsourcing Linked Data Quality Assessment

Thank You! Questions?

[email protected] @AmrapaliZ