LOD2 Webinar . 29.11.2011 . Page 1 http://lod2.eu Creating Knowledge out of Interlinked Data

LOD2 Webinar: UnifiedViews

DESCRIPTION

UnifiedViews is a joint project currently maintained by Semantic Web Company (SWC) and Semantica.cz. It was mainly developed at Charles University in Prague as a student project called ODCleanStore (version 2). It is based on the experience SWC gained with the LOD Management Suite (LODMS), used in WP7, and on ODCleanStore (version 1), developed by Charles University in Prague for the WP9a use case of the LOD2 FP7 project. In the next release of the LOD2 stack, UnifiedViews will replace LODMS as the ETL tool, and the tool has already been adopted in other projects. The webinar gives a brief overview of the UnifiedViews project (Helmut Nagy); the main part is a presentation of the tool and its capabilities (Tomas Knap).

Page 1: LOD2 Webinar: UnifiedViews

Page 2: LOD2 Webinar: UnifiedViews

LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. This 4-year project comprises leading Linked Open Data technology researchers, companies, and service providers. The partners come from 12 countries and are coordinated by the Agile Knowledge Engineering and Semantic Web Research Group at the University of Leipzig, Germany.

LOD2 will integrate and syndicate Linked Data with existing large-scale applications. The project shows the benefits in the scenarios of Media and Publishing, Corporate Data intranets and eGovernment.

Page 3: LOD2 Webinar: UnifiedViews

Once per month the LOD2 webinar series offers a free webinar about tools and services along the Linked Open Data Life Cycle.

Stay with us and learn more about acquisition, editing, composing, connected applications – and finally publishing Linked Open Data.

Page 4: LOD2 Webinar: UnifiedViews

UnifiedViews

Tomáš Knap, Semantica.cz
Helmut Nagy, Semantic Web Company

Page 5: LOD2 Webinar: UnifiedViews

Agenda

• What is UnifiedViews
• Short History: From ODCleanStore & LOD Manager to UnifiedViews
• Presentation of UnifiedViews
• Outlook, Impact, the UnifiedViews project

Page 6: LOD2 Webinar: UnifiedViews

Motivation for UnifiedViews

• Suppose a Linked Data consumer is defining a data processing task: building a data mart that integrates information from various RDF and non-RDF sources.
– Tools are available for RDF data extraction, enrichment, linking, transformation, ...
– Any23, Virtuoso, Silk, …
• Still, the consumer has to (among other activities):
– Write a custom script that executes the tools in the required order and with the required configurations
– Schedule the script
– Add notification capabilities, such as sending an email in case of problems
• Maintaining such a task is challenging
– In case of problems, the consumer has to manually launch the problematic tool with the proper input data and the problematic configuration, load the output data into an RDF store, and browse/query the data
– The consumer can quickly get lost as the number of configurations and tools in use grows; as a result, duplicated configurations may be created.
– The consumer cannot share prepared configurations and cannot reuse configurations prepared by others

Page 7: LOD2 Webinar: UnifiedViews

Problem and Our Solution

• General Problem: Consumers have to write most of the logic to define, execute, monitor, schedule, and share the data processing tasks
• We propose UnifiedViews, an Extract-Transform-Load (ETL) framework
– The data processing task is a central concept
– Another central concept is the native support for the RDF data format and ontologies

Page 8: LOD2 Webinar: UnifiedViews

Short History: From ODCleanStore & LOD Manager to UnifiedViews

Two tools targeting the same purpose with different strengths

One tool aligning the ideas of both tools and going beyond that

Page 9: LOD2 Webinar: UnifiedViews

Presentation of UnifiedViews

• Basic Concepts
• Key Features
• Demo

Page 10: LOD2 Webinar: UnifiedViews

Basic Concepts in UnifiedViews – A Pipeline

• Every data processing task is modelled as a pipeline.

Page 11: LOD2 Webinar: UnifiedViews

Basic Concepts in UnifiedViews – Data Processing Unit (DPU)

• A DPU is a component (plugin, module) on the pipeline
• Every DPU has certain inputs, outputs, business logic and configuration. Based on the input and the configuration, the outputs are created.
– E.g., a DPU may apply a certain set of SPARQL Update queries to the input RDF data and produce output RDF data.
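
The DPU idea (inputs, outputs, business logic, configuration) can be illustrated with a minimal Python sketch. This is not the UnifiedViews API: real DPUs are Java OSGi bundles, and the names `RenamePredicateConfig`, `RenamePredicateDpu`, and the `"input"`/`"output"` keys are hypothetical. RDF data is stood in for by plain (subject, predicate, object) tuples, and a simple predicate rename takes the place of a SPARQL Update query.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

# Toy stand-in for an RDF triple: (subject, predicate, object).
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class RenamePredicateConfig:
    """Configuration of the toy DPU (hypothetical, not a UnifiedViews class)."""
    old_predicate: str
    new_predicate: str


class RenamePredicateDpu:
    """Toy DPU: its outputs depend only on its inputs and its configuration."""

    def __init__(self, config: RenamePredicateConfig) -> None:
        self.config = config

    def execute(self, inputs: Dict[str, Set[Triple]]) -> Dict[str, Set[Triple]]:
        # Business logic: rewrite one predicate, copy all other triples unchanged.
        cfg = self.config
        output = {
            (s, cfg.new_predicate if p == cfg.old_predicate else p, o)
            for (s, p, o) in inputs["input"]
        }
        return {"output": output}
```

Running the same DPU class with a different configuration object yields a different transformation, which is the property that makes DPUs reusable across pipelines.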

Page 12: LOD2 Webinar: UnifiedViews

Key Features

• Web administration interface:
– Define and manage pipelines
– Validate, execute, monitor and debug pipelines
– Possibility to schedule tasks and set up notifications about pipeline executions
– Define and manage DPUs
– Possibility to debug inputs to/outputs from DPUs
– Possibility to share pipelines and DPUs
– Possibility to get notifications about the result of the pipeline execution
– Multi-user environment
• Robust engine running the tasks
– Ensures that DPUs on the pipeline are executed in the proper order
– It may send notifications about the result of the pipeline execution
• Core DPUs to work with RDF data
• Easy way to extend UnifiedViews with your own DPUs
– Every DPU is an OSGi bundle; as a result, two DPUs that require different versions of the same library may coexist in the framework
– Possibility to reload DPUs on the fly
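
Executing DPUs "in the proper order" can be pictured as a topological ordering of the pipeline graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+) and is only an illustration under that assumption; the slides do not describe how the UnifiedViews engine is actually implemented, and `run_pipeline`, `steps`, and `depends_on` are hypothetical names.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# A step receives the results of the steps it depends on, keyed by step name.
Step = Callable[[Dict[str, object]], object]


def run_pipeline(steps: Dict[str, Step], depends_on: Dict[str, Set[str]]) -> List[str]:
    """Run every step after all steps it depends on; return the execution order."""
    # Steps without an entry in depends_on have no predecessors.
    graph = {name: depends_on.get(name, set()) for name in steps}
    results: Dict[str, object] = {}
    order: List[str] = []
    # static_order() raises graphlib.CycleError if the pipeline contains a cycle.
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = steps[name](upstream)
        order.append(name)
    return order
```

For a linear pipeline where a transformer depends on an extractor and a loader depends on the transformer, the returned order is extractor, transformer, loader, regardless of the order in which the steps were declared.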

Page 13: LOD2 Webinar: UnifiedViews

Demo

• Part A – instance http://odcs.xrg.cz:8080/unifiedviews
– Introduction to the Web user interface (2 mins)
– Simple pipeline and basic operations with the pipeline (5 mins)
– DPU templates, how they can be managed (1-2 mins)
• Part B – instance http://odcs.xrg.cz:8080/odcleanstore
– More complex pipelines (1-5 mins)

Page 14: LOD2 Webinar: UnifiedViews

Related Work

• Non-RDF ETL Frameworks
– Plenty of ETL frameworks, some of them open source
– No support for the RDF data format and ontologies in the framework itself
  • E.g., DPUs are not prepared to suggest ontological terms in DPU configurations
– No native support for exchanging RDF data between DPUs
– No RDF data processing units available out of the box
• Linked Data Integration Framework (LDIF)
• DERI Pipes
– When adding new DPUs, the core must be rebuilt
– It is not possible to reload DPUs on the fly
– Does not provide a solution for library version clashes
– No possibility to debug inputs/outputs of DPUs

Page 15: LOD2 Webinar: UnifiedViews

Impact

• Integrated into the LOD2 stack
– Replacing the existing LOD Manager integration
• Used in LOD2
– WP9a, to process public contracts data
– WP7, to enrich documents with links to DBpedia and WKD Thesauri
• Used by other projects
– OpenData.cz initiative
– INITLIB
– COMSODE FP7 project (2013-2015)
– OpenFridge project
• Used for commercial purposes by the companies Semantica.cz, Czech Republic, and Semantic Web Company, Austria, to help their customers prepare and process RDF data

Page 16: LOD2 Webinar: UnifiedViews

How to try UnifiedViews?

• UnifiedViews is available under an open source license
– GPLv3 + LGPLv3
• Hosted on GitHub
– Repository: https://github.com/UnifiedViews/Core
• Current latest version: UnifiedViews 1.0 Candidate
– Branch in the repository
• User Documentation:
– https://grips.semantic-web.at/display/UDDOC/UnifiedViews+User+Documentation

Page 17: LOD2 Webinar: UnifiedViews

How to develop new DPUs?

• Guide for Plugin (DPU) developers:
– https://grips.semantic-web.at/display/UDDOC/Creation+of+Plugins
• In short, every DPU typically consists of 4 main files
– Core DPU file
  • Implement execute() method
  • Define inputs, outputs
– pom.xml file
– DPU dialog
– DPU config object

Page 18: LOD2 Webinar: UnifiedViews

How to contribute?

• Guidelines for contributors:
– https://grips.semantic-web.at/display/UDDOC/Guidelines+for+Contributors

Page 19: LOD2 Webinar: UnifiedViews

Conclusions

• We presented UnifiedViews, an ETL framework with native support for processing RDF data, which addresses the problem of sustainable RDF data processing
– Users may define, execute, monitor, debug, schedule, and share data processing tasks (pipelines)
– Users may create their own plugins (data processing units)
• UnifiedViews has an active community around it and is already used in many projects
– It is maintained by Semantic Web Company and Semantica.cz

Page 20: LOD2 Webinar: UnifiedViews

Credits

Jingle: R.E.M., Martin Kaltenböck, Florian Kondert

Coordination: Thomas Thurner, Martin Kaltenböck

Moderation: Martin Kaltenböck

Presented by: Tomas Knap, Helmut Nagy

Page 21: LOD2 Webinar: UnifiedViews

Hope you enjoyed staying with us – if you need more detailed information, visit us at www.lod2.eu and let us know how we can improve to meet your expectations!

Don’t forget to register for our next webinar

20.12.2011 - Virtuoso (Open Link Software)
24.01.2012 - OntoWiki (University of Leipzig, Germany)

Have a great day and don’t forget ...

Page 22: LOD2 Webinar: UnifiedViews
