Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
Programmatic Access to Crowdsourced Human Computation for Designing and Enhancing Interlinking
Cristina Sarasua
ESWC2015 Developers Workshop
CROWDKI · Cristina Sarasua
Current LOD
Mostly identity links, a few prominent interlinking hubs with a high number of in-links, and still 44% of the analyzed datasets do not contain out-links [Schmachtenberg et al., 2014]
To improve the heterogeneity and quantity of links we need:
– To overcome the computational limitations of automatic link discovery methods
– Methods that assist data publishers in deciding which datasets to target and how to define the interlinks
Crowdsourced Human Computation for Interlinking
Humans are involved systematically in processing data for interlinking
Human input is collected via microtask crowdsourcing:
– Online marketplaces (e.g. Clickworker)
– Anyone registered can take part (all around the world)
– Economic reward
– Work is divided into simple tasks (e.g. review a particular link between two resources)
– Large and dedicated workforce → fast completion time (hours / days)
Software to manage the generation and completion of microtasks related to interlinking: https://github.com/criscod/CROWDKI
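As a rough illustration of dividing interlinking work into simple tasks, the sketch below groups candidate links into small microtasks. The class and method names (`MicrotaskBatcher`, `CandidateLink`, `toMicrotasks`) and the idea of a fixed number of links per task are assumptions made for this example; they are not taken from the CROWDKI code base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not taken from the CROWDKI code base.
class MicrotaskBatcher {

    /** One candidate link to be reviewed: source resource, link predicate, target resource. */
    static final class CandidateLink {
        final String source;
        final String predicate;
        final String target;

        CandidateLink(String source, String predicate, String target) {
            this.source = source;
            this.predicate = predicate;
            this.target = target;
        }
    }

    /** Splits the links into consecutive microtasks of at most linksPerTask links each. */
    static List<List<CandidateLink>> toMicrotasks(List<CandidateLink> links, int linksPerTask) {
        if (linksPerTask <= 0) {
            throw new IllegalArgumentException("linksPerTask must be positive");
        }
        List<List<CandidateLink>> microtasks = new ArrayList<>();
        for (int start = 0; start < links.size(); start += linksPerTask) {
            int end = Math.min(start + linksPerTask, links.size());
            // Copy the sublist so each microtask is independent of the input list.
            microtasks.add(new ArrayList<>(links.subList(start, end)));
        }
        return microtasks;
    }
}
```

Small tasks keep each unit of work quick to review; the right size per task is a design choice that depends on the platform and the reward offered.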
UC1: Assessing the relevance of different interlinking possibilities
Input: list of interlinking possibilities and a particular context
– Car wasDesigned by Person
– Car wasDriven by Person
– Car wasRecommended by Person
Output: contextual relevance assessment by the crowd
UC2: Validating and enhancing automatically computed links
Input: set of candidate links, RDF data
Output: set of final links (extended / post-processed)
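One possible way to turn crowd judgments on candidate links into a set of final links is a simple majority vote, sketched below. Majority voting is an assumption chosen for illustration, not necessarily the aggregation used by CROWDKI, and `CrowdVoteAggregator` and `acceptedLinks` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical aggregation step; majority voting is an assumption of this sketch.
class CrowdVoteAggregator {

    /**
     * Returns the identifiers of the links that a strict majority of workers confirmed.
     * Keys are link identifiers; values are the yes/no judgments collected for that link.
     */
    static List<String> acceptedLinks(Map<String, List<Boolean>> judgmentsByLink) {
        List<String> accepted = new ArrayList<>();
        for (Map.Entry<String, List<Boolean>> entry : judgmentsByLink.entrySet()) {
            int yes = 0;
            for (boolean vote : entry.getValue()) {
                if (vote) {
                    yes++;
                }
            }
            // Strict majority: ties and links without judgments are not accepted.
            if (2 * yes > entry.getValue().size()) {
                accepted.add(entry.getKey());
            }
        }
        return accepted;
    }
}
```

Rejecting ties is the conservative choice here; a real setup might instead request additional judgments for tied links.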
Architecture
CrowdFlower REST API
Microtasks templates & config
Java
Jena, SPARQL
JSON
Guava IO
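Since microtask data is exchanged as JSON, the sketch below shows how one link-review unit could be serialized using only the Java standard library. The field names (`source`, `target`, `question`) and the `MicrotaskJson` class are hypothetical; they do not reflect the actual CROWDKI templates or the CrowdFlower API schema.

```java
// Hypothetical serializer for illustration; field names are assumptions.
class MicrotaskJson {

    /** Escapes the characters that must not appear raw inside a JSON string. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become four-digit hex escapes.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    /** Builds a single JSON object describing one link to review. */
    static String toJson(String source, String target, String question) {
        return "{\"source\":\"" + escape(source)
                + "\",\"target\":\"" + escape(target)
                + "\",\"question\":\"" + escape(question) + "\"}";
    }
}
```

In practice a JSON library would normally handle the escaping; the manual version only keeps the example dependency-free.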
Lessons Learned
Communication is key: data with typos may be interpreted wrongly
Communities of crowd workers are emerging
Not all interlinking scenarios require human computation (e.g. country ISO codes); the challenge is to decide automatically when it is really worthwhile
Drawbacks: no real-time crowdsourcing, and crowd workers cannot be selected accurately
CrowdFlower (the crowdsourcing platform used) gives access to more features via the UI than via the API
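One simple heuristic for deciding when the crowd is worth asking is to route each candidate link by the confidence score of the automatic tool: clear cases are decided automatically and only uncertain ones go to the crowd. This is a sketch under assumptions: the existence of a numeric score, the two thresholds, and the names `LinkRouter` and `route` are all hypothetical and not described in the slides.

```java
// Hypothetical routing rule; the score and thresholds are assumptions of this sketch.
class LinkRouter {

    enum Route { AUTO_ACCEPT, ASK_CROWD, AUTO_REJECT }

    /**
     * Routes a candidate link by its automatic confidence score.
     * Scores at or above acceptFrom are accepted, scores below rejectBelow are
     * rejected, and everything in between is sent to the crowd.
     */
    static Route route(double score, double rejectBelow, double acceptFrom) {
        if (rejectBelow > acceptFrom) {
            throw new IllegalArgumentException("rejectBelow must not exceed acceptFrom");
        }
        if (score >= acceptFrom) {
            return Route.AUTO_ACCEPT;
        }
        if (score < rejectBelow) {
            return Route.AUTO_REJECT;
        }
        return Route.ASK_CROWD;
    }
}
```

The width of the band between the two thresholds controls how many links are paid for; exact-match scenarios such as country ISO codes would rarely fall inside it.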
Conclusions
Hybrid approaches (automatic + crowd interlinking) can achieve better precision and recall (P, R) than purely automatic interlinking methods
CROWDKI could be used in combination with dataset recommendation methods that analyze the way data is already interlinked.
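For reference, precision and recall of a link set against a gold standard can be computed as in the sketch below, using the standard definitions: precision is the share of produced links that are correct, recall is the share of gold-standard links that were produced. The class name `LinkEvaluation` and the representation of links as strings are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical evaluation helper; links are represented as plain strings here.
class LinkEvaluation {

    /** Number of links present in both sets. */
    private static int overlap(Set<String> found, Set<String> gold) {
        Set<String> both = new HashSet<>(found);
        both.retainAll(gold);
        return both.size();
    }

    /** Fraction of produced links that are in the gold standard (0.0 if none were produced). */
    static double precision(Set<String> found, Set<String> gold) {
        if (found.isEmpty()) {
            return 0.0;
        }
        return (double) overlap(found, gold) / found.size();
    }

    /** Fraction of gold-standard links that were produced (0.0 if the gold standard is empty). */
    static double recall(Set<String> found, Set<String> gold) {
        if (gold.isEmpty()) {
            return 0.0;
        }
        return (double) overlap(found, gold) / gold.size();
    }
}
```

Comparing these two values for the automatic output and for the crowd-validated output is one way to check whether the hybrid pipeline actually helps on a given dataset.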
Questions for the audience
Which challenges did you face with state-of-the-art link discovery tools? What went wrong?
How do you think human computation can further help in interlinking?
What are, in your opinion, the pros and cons of this approach?