DBGroup @ UNIMO
LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources
Fabio Benedetti
Department of Engineering “Enzo Ferrari”
University of Modena & Reggio Emilia
D-Day 2015, Modena, Italy
[M. Schmachtenberg, C. Bizer, and H. Paulheim, “Adoption of the Linked Data Best Practices in Different Topical Domains,” 2014, The Semantic Web – ISWC 2014, Springer International Publishing, pp. 245-260]
Datasets per topical domain
                          2009              2014*
Domain                    Number  %         Number  %
Cross-domain              41      13.95%    41      4.04%
Geographic                31      10.54%    21      2.07%
Government                49      16.67%    183     18.05%
Life sciences             41      13.95%    83      8.19%
Media                     25      8.50%     22      2.17%
Publications              87      29.59%    96      9.47%
Social web                0       0.00%     520     51.28%
User-generated content    20      6.80%     48      4.73%
Total                     294               1014
* Only 570 datasets belong to the LOD cloud; the remaining datasets have no incoming or outgoing links to the LOD Cloud.
[Figure: charts of the dataset distribution by domain, 2009 and 2014]
Open Access trends encourage the publication of Open Data in the form of Linked Data.
But discovering LOD sources of interest is a complex task for a user.
Main issues
• No standard exists for documenting a Dataset
• The structure of a Dataset can be understood only by exploring it manually
• Semantic Web technologies are extremely complex for unskilled users
• Automatically extract and summarize a schema (Schema Summary) able to describe a LOD Dataset
• Use the Schema Summary to support the user in the information extraction task
Online & automatic extraction
• It does not require any additional information from the user
• It works with SPARQL endpoints
– We have to cope with the performance issues of these Datasets
The Schema Summary has to describe a Dataset
• Ontology/Vocabulary (OWL & RDFS constraints)
• Open Data (e.g. generated from an existing RDBMS)
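As a rough illustration of what online extraction against a SPARQL endpoint can look like, the sketch below builds a query that counts the instances of each class and prepares the HTTP GET request defined by the SPARQL 1.1 Protocol. The function names, the page size, and the example endpoint URL are assumptions made for illustration, not LODeX's actual code. Paging with LIMIT/OFFSET is shown as one common way to keep individual requests small on slow endpoints.

```python
from urllib.parse import urlencode
from urllib.request import Request


def class_count_query(limit: int = 100, offset: int = 0) -> str:
    """Build a SPARQL 1.1 query listing classes with their instance counts."""
    return (
        "SELECT ?class (COUNT(?s) AS ?instances)\n"
        "WHERE { ?s a ?class }\n"
        "GROUP BY ?class\n"
        "ORDER BY DESC(?instances)\n"
        f"LIMIT {limit} OFFSET {offset}"
    )


def build_request(endpoint_url: str, query: str) -> Request:
    """Prepare a GET request that asks for JSON results (SPARQL 1.1 Protocol)."""
    url = endpoint_url + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical endpoint URL, used only to show the call shape.
request = build_request("http://example.org/sparql", class_count_query(limit=50))
```

The prepared request could then be sent with `urllib.request.urlopen(request, timeout=30)`; an explicit timeout matters here because public endpoints are often slow or unreachable.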
Two main modules
• Extraction & Summarization
• Visualization & Querying
LODeX uses a NoSQL database as back-end
Input: URLs of SPARQL endpoints
Output: interactive Schema Summary
[Figure: LODeX architecture. Components shown: endpoint URLs, LODeX Indexes Extraction, SPARQL queries to the LOD Cloud, Statistical Indexes, LODeX Post-processing, Schema Summary (NoSQL), Schema Summary Visualization, Basic Query, Query Orchestrator, SPARQL queries, Results, Sgvizler]
Statistical Indexes
They consist of 9 indexes divided into three groups:
• General group
• Intensional group
• Extensional group
The index extraction (IE) process generates the SPARQL queries used to extract the different indexes.
• Iterative algorithm able to extract the intensional knowledge
• Pattern Strategy technique
– It produces a higher number of less complex SPARQL queries
The IE process performs online index extraction while handling the performance issues of the SPARQL endpoints
[F. Benedetti, S. Bergamaschi, and L. Po, “Online index extraction from linked open data sources,” 2014, Linked Data for Information Extraction (LD4IE) Workshop held at International Semantic Web Conference]
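The cited paper describes the Pattern Strategy itself; the following is only a minimal sketch of the general idea stated above, trading one expensive aggregate query for several cheaper ones. The per-class decomposition, the function names, and the FOAF class IRIs used as input are assumptions made for illustration.

```python
def single_complex_query() -> str:
    """One aggregate query over every class and property at once."""
    return (
        "SELECT ?class ?p (COUNT(*) AS ?n)\n"
        "WHERE { ?s a ?class ; ?p ?o }\n"
        "GROUP BY ?class ?p"
    )


def simpler_queries(class_iris: list[str]) -> list[str]:
    """One smaller query per class, each aggregating over a single class."""
    return [
        "SELECT ?p (COUNT(*) AS ?n)\n"
        f"WHERE {{ ?s a <{iri}> ; ?p ?o }}\n"
        "GROUP BY ?p"
        for iri in class_iris
    ]


# Example input: two well-known FOAF classes.
classes = [
    "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/foaf/0.1/Document",
]
queries = simpler_queries(classes)
```

Each small query touches only one class, so a single slow or failing request can be retried without repeating the whole extraction.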
The elements composing the Schema Summary are:
• Classes
• Properties
• Attributes
An algorithm combines the information contained in the Statistical Indexes to produce and store the Schema Summary
[F. Benedetti, S. Bergamaschi, and L. Po, “A visual summary for linked open data sources,” 2014, International Semantic Web Conference (Posters & Demos)]
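To make the three kinds of elements concrete, here is a minimal sketch of a Schema Summary as a small graph: classes are nodes, properties are edges between classes, and attributes are attached to a class. The shapes of the three input indexes and all names below are hypothetical; the slides do not specify how the Statistical Indexes are stored.

```python
from dataclasses import dataclass, field


@dataclass
class ClassNode:
    iri: str
    instances: int
    attributes: list[str] = field(default_factory=list)


@dataclass
class SchemaSummary:
    classes: dict[str, ClassNode] = field(default_factory=dict)
    # Each property is an edge: (subject class, property, object class).
    properties: list[tuple[str, str, str]] = field(default_factory=list)


def build_summary(class_counts, class_attributes, class_links) -> SchemaSummary:
    """Combine three hypothetical indexes into one summary graph."""
    summary = SchemaSummary()
    for iri, count in class_counts.items():
        summary.classes[iri] = ClassNode(iri, count)
    for iri, attrs in class_attributes.items():
        if iri in summary.classes:
            summary.classes[iri].attributes.extend(attrs)
    for subj, prop, obj in class_links:
        # Keep only edges whose endpoints are known classes.
        if subj in summary.classes and obj in summary.classes:
            summary.properties.append((subj, prop, obj))
    return summary


# Made-up index contents, for illustration only.
summary = build_summary(
    class_counts={"foaf:Person": 120, "foaf:Document": 45},
    class_attributes={"foaf:Person": ["foaf:name"]},
    class_links=[
        ("foaf:Person", "foaf:made", "foaf:Document"),
        ("foaf:Person", "foaf:knows", "foaf:Agent"),
    ],
)
```

In this example the `foaf:knows` edge is dropped because `foaf:Agent` does not appear in the class index; a real system might instead keep such edges and mark the target as unknown.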
[Figure: Schema Summary, Basic Query, SPARQL compiler, SPARQL query]
• The Web Application GUI guides the user in building a Basic Query
• A refinement panel helps the user refine the Basic Query
A SPARQL compiler automatically generates the corresponding SPARQL query
Operators supported by the compiler:
• AND
• OPTIONAL
• FILTER
• ORDER BY
• LIMIT
• OFFSET
The query is sent to the SPARQL endpoint and the results can be visualized in a tabular, map or chart view (pie, bar, etc.)
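The compilation step can be pictured with a small sketch that turns a structured query description into SPARQL text using the operators listed above. The `BasicQuery` fields and the `compile_query` function are hypothetical; LODeX's internal representation of a Basic Query is not described in these slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BasicQuery:
    select: list[str]
    patterns: list[str]  # mandatory triple patterns, combined with AND
    optionals: list[str] = field(default_factory=list)
    filters: list[str] = field(default_factory=list)
    order_by: Optional[str] = None
    limit: Optional[int] = None
    offset: Optional[int] = None


def compile_query(q: BasicQuery) -> str:
    """Render a BasicQuery as SPARQL text."""
    body = [f"  {pattern} ." for pattern in q.patterns]
    body += [f"  OPTIONAL {{ {pattern} }}" for pattern in q.optionals]
    body += [f"  FILTER ({cond})" for cond in q.filters]
    lines = ["SELECT " + " ".join(q.select), "WHERE {", *body, "}"]
    if q.order_by:
        lines.append(f"ORDER BY {q.order_by}")
    if q.limit is not None:
        lines.append(f"LIMIT {q.limit}")
    if q.offset is not None:
        lines.append(f"OFFSET {q.offset}")
    return "\n".join(lines)


query = compile_query(
    BasicQuery(
        select=["?person", "?name"],
        patterns=[
            "?person a <http://xmlns.com/foaf/0.1/Person>",
            "?person <http://xmlns.com/foaf/0.1/name> ?name",
        ],
        optionals=["?person <http://xmlns.com/foaf/0.1/mbox> ?mbox"],
        filters=['REGEX(?name, "^A")'],
        order_by="?name",
        limit=10,
        offset=0,
    )
)
```

Keeping the query as structured data until the last step means a refinement panel can add or remove an OPTIONAL or FILTER clause without any string manipulation of the SPARQL text.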
Try the LODeX demo at: http://dbgroup.unimo.it/lodex2
[F. Benedetti, S. Bergamaschi, and L. Po, “Visual Querying LOD sources with LODeX,” 2014, submitted to the Semantic Web journal]
Test Nov. 2014
Dataset URLs             559
Reachable datasets       302
SPARQL 1.1 compatible    206
Extraction completed     185
Online survey with 17 anonymous users:
• 8 skilled users
• 9 unskilled users
The survey is divided into two parts:
• Schema Summary browsing clarity
• Query generation
Task                       Correct Answers
Schema Summary browsing    94% (32/34)
Query generation           88% (60/68)
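The counts above can be turned into rates with a few lines of arithmetic. The sketch below only recomputes percentages from the numbers reported on the slide; the variable and function names are illustrative.

```python
# Counts reported for the Nov. 2014 test.
dataset_urls = 559
funnel = {
    "Reachable datasets": 302,
    "SPARQL 1.1 compatible": 206,
    "Extraction completed": 185,
}


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


# Each stage as a share of all dataset URLs tested.
share_of_all = {stage: percent(n, dataset_urls) for stage, n in funnel.items()}

# Survey results as reported: correct answers out of total answers.
browsing = percent(32, 34)
query_generation = percent(60, 68)
```

Computed this way, roughly a third of the tested dataset URLs reach a completed extraction, and the two survey rates match the rounded 94% and 88% shown in the table.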
• Modify the interface of LODeX according to the results of the online survey
• Extend the VoID descriptor vocabulary in order to represent the Statistical Indexes and publish our data as LOD
– Build an observatory for the LOD cloud
• Define clustering techniques to reduce the size of the Summary for huge datasets
Accepted papers
• D. Beneventano, S. Bergamaschi, S. Sorrentino, M. Vincini, and F. Benedetti, “Semantic annotation of the CEREALAB database by the AGROVOC linked dataset,” 2014, Ecological Informatics journal, article in press
• F. Benedetti, S. Bergamaschi, and L. Po, “Online index extraction from linked open data sources,” 2014, Linked Data for Information Extraction (LD4IE) Workshop held at International Semantic Web Conference
• F. Benedetti, S. Bergamaschi, and L. Po, “A visual summary for linked open data sources,” 2014, International Semantic Web Conference (Posters & Demos)
Submitted papers
• F. Benedetti, S. Bergamaschi, and L. Po, “Visual Querying LOD sources with LODeX,” 2014, submitted to Semantic Web – Interoperability, Usability, Applicability, an IOS Press journal
European projects & schools
• Web Science Summer School, Southampton University (20-26 July 2014)
• RDA (Research Data Alliance) Fourth Plenary Meeting, 22-24 September 2014, Amsterdam. I won an Early Career Scientist grant and I belong to the Big Data Analytics Interest Group.
• Keystone, COST Action IC1302: Autumn 2014 MC and WG Meetings “Querying the Semantic Web”, 17-18 October 2014, Riva del Garda, TN.