Scripting User Contributed Interlinking

Presentation about User Contributed Interlinking at the Scripting for the Semantic Web (SFSW) 2008 workshop at the European Semantic Web Conference (ESWC) 2008.

Institute of Information Systems & Information Management

Michael Hausenblas, Wolfgang Halb, and Yves Raimond

SFSW08, Tenerife, Spain

2008-06-02

Agenda
- Linked Data 101
- A first step in UCI – http://riese.joanneum.at
- Towards Generalising UCI
- Demo

Linked Data: Principles
- Items should be identified using URI references [URIrefs] (and: don’t use bNodes)
- URIrefs should be dereferenceable: using HTTP URIs allows looking up the items identified through URIrefs, cf. [http-range-14 TAG finding]
- Looking up a URIref leads to more data [follow-your-nose principle]
- Links to other URIrefs should be included in order to enable the discovery of more data [How to Publish Linked Data on the Web]
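To make the "dereferenceable" principle concrete, here is a minimal sketch of how a client might prepare a lookup that asks for RDF instead of HTML. The function name `build_rdf_request`, the Accept header weights, and the example.org URI are illustrative assumptions and are not taken from the talk.

```python
from urllib.request import Request

# Assumed preference order: RDF/XML first, then Turtle, then anything else.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9, */*;q=0.1"


def build_rdf_request(uriref: str) -> Request:
    """Prepare an HTTP GET for a URIref that prefers RDF representations."""
    if not uriref.startswith(("http://", "https://")):
        raise ValueError("only HTTP(S) URIrefs can be dereferenced this way")
    # The fragment is never sent to the server, so drop it before the lookup.
    document_uri = uriref.split("#", 1)[0]
    return Request(document_uri, headers={"Accept": RDF_ACCEPT})


# Hypothetical URIref, used only for illustration.
req = build_rdf_request("http://example.org/resource/Tenerife#this")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; that step is left out so the sketch runs without network access.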

Linked Data: Datasets (2008)

By courtesy of Richard Cyganiak, http://richard.cyganiak.de/2007/10/lod/

Linked Data: Issues
Building
- RDFising process (schema, mapping)
- Interlinking (automagically, manual)
- Deployment (SPARQL end point, dump, RDFa, etc.)
Using
- Provenance, trust, rights, etc.
- Access (depending on deployment)
- Performance (deref chain, reliability)
- Discovery (which is the right LOD dataset for my task?)

A first step in UCI - riese

http://riese.joanneum.at

riese: A first step in UCI
- riese, the ‘RDFizing and Interlinking the EuroStat Dataset Effort’, aims to offer an RDFised and interlinked version of the Eurostat data (http://ec.europa.eu/eurostat)
- Eurostat data is high-volume data (5 GB data dump in approx. 4,000 TSV files; 350 million data values; 80,000 different data codes)
- Currently we serve 3.6 million triples, interlinking with Geonames (DBpedia and Wordnet upcoming)
- Data is exposed as XHTML+RDFa, via a SPARQL end point, and as a dump (+ semantic sitemap description)
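As a rough illustration of the RDFising step, the sketch below turns a small tab-separated table into N-Triples, one statistical item per data cell. It is not the riese pipeline: the simplified TSV layout, the `http://example.org/riese-sketch/` namespace, the function name, and the treatment of ":" as a missing value are all assumptions, and the scovo terms used (`Item`, `dataset`, `dimension`) should be checked against the vocabulary itself.

```python
import csv
import io

SCOVO = "http://purl.org/NET/scovo#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://example.org/riese-sketch/"  # hypothetical namespace


def tsv_to_ntriples(tsv_text, dataset_id):
    """Yield N-Triples lines, describing one item per non-empty data cell."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(rows)
    periods = [p.strip() for p in header[1:]]  # first column holds the geo code
    dataset = f"<{BASE}dataset/{dataset_id}>"
    for row in rows:
        geo = row[0].strip()
        for period, cell in zip(periods, row[1:]):
            value = cell.strip()
            if value in ("", ":"):  # assumed markers for a missing value
                continue
            item = f"<{BASE}item/{dataset_id}/{geo}/{period}>"
            yield f"{item} <{RDF}type> <{SCOVO}Item> ."
            yield f"{item} <{SCOVO}dataset> {dataset} ."
            yield f"{item} <{SCOVO}dimension> <{BASE}geo/{geo}> ."
            yield f"{item} <{SCOVO}dimension> <{BASE}time/{period}> ."
            yield f'{item} <{RDF}value> "{value}" .'


# Made-up sample data in the assumed layout.
sample = "geo\t2005\t2006\nAT\t8.2\t8.3\nES\t:\t44.1\n"
triples = list(tsv_to_ntriples(sample, "demo"))
for line in triples[:5]:
    print(line)
```

Each data cell becomes five triples here, which gives a feel for how quickly a high-volume source grows once it is expressed as RDF.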

riese: architecture

riese: inside
Server
- Apache 2.2
- SWI-Prolog
- PHP 5
- p2r/Ceriese (see Yves’s blog post)
- (RDF/XML documents in the file system)
Client
- XHTML+RDFa
- Javascript/Yahoo! User Interface Library [YUI]
Vocabulary
- triggered the development of scovo, the Statistical Core Vocabulary, together with Talis and Lee Feigenbaum, see http://purl.org/NET/scovo

riese: User Contributed Interlinking

12

riese: issues
- Dynamic content (Ajax) vs. embedded metadata (RDFa). A local agent has the data in the DOM, but an external agent cannot access it. No real solution yet.
- Scalability & performance. When data is fine-grained and high-volume, how much should be embedded directly in a page?
- How to notify users about data updates? We are currently experimenting with AtomOwl deployed in RDFa (http://riese.joanneum.at/updates/)

Towards Generalising UCI
- Next step after riese was to decouple the UCI and generalise it.
- The result is: I R S (interlinking of resources with semantics, see also poster session)
- I R S features
  - query, add, remove semantic links (owl:sameAs, rdfs:seeAlso, foaf:topic, etc.)
  - subject and object can be set by user (restriction: URIs only)
  - resource preview (debug)
  - expose data in XHTML+RDFa + SPARQL end point
  - lookup in http://sindice.com for unknown resources
  - simple provenance tracking through named graphs
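The feature list above (query, add, and remove links; URIs only; provenance through named graphs) can be sketched as a tiny in-memory store. This is an illustrative model and not the I R S implementation: the `LinkStore` class, the `is_uri` check, and all example.org URIs are hypothetical, and each contributor is assumed to get a named graph of their own.

```python
from collections import defaultdict
from urllib.parse import urlparse

OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def is_uri(value):
    """Accept only absolute URIs that have both a scheme and an authority."""
    parts = urlparse(value)
    return bool(parts.scheme and parts.netloc)


class LinkStore:
    """Semantic links grouped by named graph, so each link keeps its source."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph, subject, predicate, obj):
        for term in (graph, subject, predicate, obj):
            if not is_uri(term):
                raise ValueError(f"not a URI: {term!r}")
        self._graphs[graph].add((subject, predicate, obj))

    def remove(self, graph, subject, predicate, obj):
        self._graphs[graph].discard((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return sorted (graph, s, p, o) tuples; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        hits = []
        for graph, triples in self._graphs.items():
            for triple in triples:
                if all(p is None or p == t for p, t in zip(pattern, triple)):
                    hits.append((graph, *triple))
        return sorted(hits)


store = LinkStore()
alice = "http://example.org/graphs/alice"  # hypothetical contributor graph
store.add(alice, "http://example.org/a", OWL_SAMEAS, "http://example.org/b")
store.add(alice, "http://example.org/a", RDFS_SEEALSO, "http://example.org/c")
store.remove(alice, "http://example.org/a", RDFS_SEEALSO, "http://example.org/c")
print(store.query(subject="http://example.org/a"))
```

Keeping the graph name in every query result is what makes the provenance tracking "simple": a consumer can decide per contributor whether to trust a link.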

Towards Generalising UCI: I R S

I R S issues
- Motivation for end-user to contribute has yet to be researched
- Trust issues arise (experimenting with OpenID)
- Generic UCI requires high level of abstraction (maybe only for geeks and not suitable for an end-user)
- To get an overview of what is available some other mechanism should be offered (currently only SPARQL end point)
- Validation of resources is desirable (e.g. type of target, information vs. non-information resource, etc.)
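One possible starting point for the validation mentioned in the last item is the rule of thumb from the http-range-14 TAG finding, which relates the HTTP status of a lookup to what can be said about the resource. The sketch below encodes only that rule; the function name and return strings are illustrative assumptions, and a real validator would also need to perform the request and inspect the returned description.

```python
def classify_lookup(status_code):
    """Map the HTTP status of a GET on a URIref to what it tells us."""
    if 200 <= status_code < 300:
        # A direct representation was served for the URI.
        return "information resource"
    if status_code == 303:
        # A See Other redirect: the URI may identify any kind of resource.
        return "any resource (follow the redirect for a description)"
    # Errors and all other responses say nothing about the resource's nature.
    return "unknown"


for code in (200, 303, 404):
    print(code, classify_lookup(code))
```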

Discussion
- UCI can help create high-quality semantic links
- Social process needs to be researched (might turn out that it is pretty similar to the Wiki ecosystem)
- Some types of content, such as multimedia content, might benefit more from UCI than others
- Is generic UCI only for geeks? To really be successful, the UCI likely needs to be embedded into a domain-specific application
- BTW, I R S is also a nice LOD debugger ;)
Questions?
