DESCRIPTION
The intensive growth of the Linked Open Data (LOD) Cloud has spawned a web of data in which a multitude of data sources provides huge amounts of valuable information across different domains. Nowadays, when accessing and using Linked Data, the challenging question is more and more often not whether relevant data is available, but where it can be found and how it is structured. Index structures therefore play an important role in making use of the information in the LOD cloud. In this talk I will address three aspects of Linked Data index structures: (1) a high-level view and categorization of index structures and how they can be queried and explored, (2) approaches for building index structures and the need to maintain them, and (3) some example applications which greatly benefit from indices over Linked Data.
Institute for Web Science & Technologies – WeST
Making Use of the Linked Data Cloud:
The Role of Index Structures
Thomas Gottron
March 20th, 2014 FGDB Frühjahrstreffen
Thomas Gottron FGDB Frühjahrstreffen 20.3.2014, 2 Role of Index Structures on LOD
Making Use of the Linked Data Cloud ...
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
LOD: a rich, huge, diverse, public and distributed knowledge base on the Web.
Pros: rich knowledge base; diverse; public
Cons: huge; on the Web; diverse; distributed
Shall I?
Challenges Underlying the "Cons"
• Volume
• Semi-structured
• No schema
• No central access point
• Multitude of data sources
• Quality
• Dynamics
• Availability
20 years ago ...
Making Use of the World Wide Web... Shall I?
Source: Chris 73 / Wikimedia Commons
Pros: rich document collection; diverse; public
Cons: huge; on the Internet; diverse; distributed
Technical solutions to the problems
Making Use of the Linked Data Cloud ... Shall I?
Pros: rich knowledge base; diverse; public
Cons: huge; on the Web; diverse; distributed
Index structures provide:
Solutions for the storage, management, organization of, and access to a rich, huge, diverse, distributed knowledge base on the Web.
Types of Indices
Building Indices
Using Indices
[Overview figure with three panels. Types of Indices: a search data structure for efficient storage and retrieval, mapping keys k1, k2, k3, ..., kn to their data items. Building Indices: quads aggregated per subject (s1: p1 p2; s2: p2) and then inverted (p1 p2: s1 s3; p2: s2). Using Indices: a graph pattern over rdf:type, dc:creator, dc:title, foaf:Document and swrc:InProceedings.]
Data Format
• Linked Data as N-Quads:
  - triple (s, p, o): what is the information?
  - context URI (c): where does it come from?
  - quad: (s, p, o, c)
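To make the quad format concrete, here is a minimal Python sketch that splits one N-Quads statement into its four parts. It is an illustration under assumptions, not a full parser: the `Quad` type and `parse_nquad` function are hypothetical names, and the pattern only covers IRIs, blank nodes and simple double-quoted literals. The N-Quads serialisation lists the terms as subject, predicate, object and an optional graph label, which plays the role of the context URI here.

```python
import re
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    subject: str
    predicate: str
    object: str
    context: Optional[str]  # "where does it come from?"; None if absent


# Simplified term pattern (an assumption of this sketch): IRIs in angle
# brackets, blank nodes, and double-quoted literals with an optional
# language tag or datatype IRI.
_TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^<[^>]*>)?)'
_LINE = re.compile(rf'^\s*{_TERM}\s+{_TERM}\s+{_TERM}(?:\s+{_TERM})?\s*\.\s*$')


def parse_nquad(line: str) -> Quad:
    """Split one statement into (subject, predicate, object, context)."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"not a (simplified) N-Quads statement: {line!r}")
    s, p, o, c = match.groups()
    return Quad(s, p, o, c)


example = parse_nquad(
    '<http://example.org/s> <http://example.org/p> "hello world"@en '
    '<http://example.org/graph> .'
)
```

The triple answers "what is the information?" through the first three fields, and `example.context` answers "where does it come from?".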
Index Models
(Abstract) Index Models
• D: data elements to be retrieved (payload)
• K: key elements to access the data (index elements)
• σ: selection function, how to get data for a key: σ : K → ℘(D)
[Figure: keys k1, k2, k3, ..., kn, each mapped by a search data structure for efficient storage and retrieval to its data items di,1, di,2, di,3, ...]
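The abstract model can be written down in a few lines of Python. This is only a sketch under assumptions: the class name `IndexModel` and its methods are hypothetical, and a hash table stands in for whatever search data structure a real implementation would use. `select` plays the role of σ, returning a subset of D for a key from K.

```python
from collections import defaultdict
from typing import Dict, Generic, Hashable, Set, TypeVar

D = TypeVar("D", bound=Hashable)  # data elements (payload)
K = TypeVar("K", bound=Hashable)  # key elements


class IndexModel(Generic[K, D]):
    """Keys mapped to sets of data elements."""

    def __init__(self) -> None:
        self._table: Dict[K, Set[D]] = defaultdict(set)

    def add(self, key: K, item: D) -> None:
        self._table[key].add(item)

    def select(self, key: K) -> Set[D]:
        """Selection function sigma: returns a (possibly empty) subset of D."""
        return set(self._table.get(key, set()))

    def keys(self) -> Set[K]:
        return set(self._table)


idx: IndexModel[str, str] = IndexModel()
idx.add("k1", "d1,1")
idx.add("k1", "d1,2")
idx.add("k2", "d2,1")
```

Returning an empty set for an unknown key keeps `select` total over K, which matches the powerset codomain of σ.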
Concrete Example: Subject Based Index Model
ukob:Gottron → (ukob:Gottron, rdf:type, foaf:Person), (ukob:Gottron, foaf:knows, ukob:Staab), ...
ukob:Staab → (ukob:Staab, swrc:institution, ukob:WeST), (ukob:Staab, foaf:name, "Steffen Staab"), ...
ukob:Schegi → (ukob:Schegi, rdf:type, foaf:Person), (ukob:Schegi, foaf:name, "Stefan Scheglmann")
...
tud:CGottron → (tud:CGottron, swrc:institution, tud:KOM), (tud:CGottron, foaf:knows, ukob:Gottron), ...
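As a rough illustration, the subject-based example can be reproduced with the triples shown on this slide. The helper `build_triple_index` is a hypothetical name. Its `position` parameter is an assumption of this sketch that lets the same function also key triples by predicate or object.

```python
from collections import defaultdict

# Triples taken from the slide, written as (subject, predicate, object).
triples = [
    ("ukob:Gottron", "rdf:type", "foaf:Person"),
    ("ukob:Gottron", "foaf:knows", "ukob:Staab"),
    ("ukob:Staab", "swrc:institution", "ukob:WeST"),
    ("ukob:Staab", "foaf:name", "Steffen Staab"),
    ("ukob:Schegi", "rdf:type", "foaf:Person"),
    ("ukob:Schegi", "foaf:name", "Stefan Scheglmann"),
    ("tud:CGottron", "swrc:institution", "tud:KOM"),
    ("tud:CGottron", "foaf:knows", "ukob:Gottron"),
]


def build_triple_index(triples, position=0):
    """Key each triple by the term at `position` (0 = subject, 1 = predicate, 2 = object)."""
    index = defaultdict(list)
    for triple in triples:
        index[triple[position]].append(triple)
    return dict(index)


subject_index = build_triple_index(triples, position=0)
predicate_index = build_triple_index(triples, position=1)
```

Looking up `subject_index["ukob:Schegi"]` returns the two triples listed for that key above.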
Schema-level Indices
Schema Information on the LOD Cloud
(No) Schema?
Guidelines / best practices
Automatic tools
Social effects
Emerging Schema!
Induce from data observations
Examples for Schema Information
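Property Set: an entity x with the properties p1, p2 and p3 has the property set {p1, p2, p3}; the index maps {p1, p2, p3} → x, ...
Type Set: an entity y with rdf:type cA and rdf:type cB has the type set {cA, cB}; the index maps {cA, cB} → y, ...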
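A type set index of this kind could be derived from plain triples roughly as follows. This is a sketch under assumptions: the function name `type_set_index` and the toy triples are hypothetical, and entities without any rdf:type statement are simply left out.

```python
from collections import defaultdict


def type_set_index(triples):
    """Map each complete type set to the entities that carry exactly that set."""
    # Collect the rdf:type objects of every subject.
    types_of = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types_of[s].add(o)
    # Group entities by their full type set.
    index = defaultdict(set)
    for entity, types in types_of.items():
        index[frozenset(types)].add(entity)
    return dict(index)


toy_triples = [
    ("x", "rdf:type", "cA"),
    ("y", "rdf:type", "cA"),
    ("y", "rdf:type", "cB"),
    ("y", "p1", "z"),
]
ts_index = type_set_index(toy_triples)
```

Using a `frozenset` as key makes the whole type set hashable, so {cA} and {cA, cB} are distinct keys even though they share a type.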
Indexing "Styles" for the Payload
[Figure: for each style, the payload kept locally versus the data left on the Web]
• Full Caching: (s, p, o, c)
• Triples: (s, p, o)
• Entities: s
• Data Sources: c
Schema-based Access to the LOD cloud
[Figure: graph pattern for an unknown entity ?x with rdf:type foaf:Document and swrc:InProceedings, linked via dc:creator to an entity of type fb:Computer_Scientist]
SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type swrc:InProceedings . ?x dc:creator ?y . ?y rdf:type fb:Computer_Scientist }
Schema-level Index
Where? • ACM • DBLP
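The "Where?" lookup can be sketched as a type set index whose payload is the data source (context) rather than the entity. Everything below is an assumption for illustration: the entity names, the source label `ex:blog` and both function names are hypothetical. The sketch only covers the rdf:type constraints on ?x and ignores the dc:creator join in the query.

```python
from collections import defaultdict


def sources_by_type_set(quads):
    """Map each type set to the contexts in which an entity with that type set is typed."""
    types_of = defaultdict(set)
    sources_of = defaultdict(set)
    for s, p, o, c in quads:
        if p == "rdf:type":
            types_of[s].add(o)
            sources_of[s].add(c)
    index = defaultdict(set)
    for entity, types in types_of.items():
        index[frozenset(types)] |= sources_of[entity]
    return dict(index)


def where(index, required_types):
    """Return all sources holding entities whose type set covers the required types."""
    required = frozenset(required_types)
    hits = set()
    for type_set, sources in index.items():
        if required <= type_set:
            hits |= sources
    return hits


quads = [
    ("ex:paper1", "rdf:type", "foaf:Document", "DBLP"),
    ("ex:paper1", "rdf:type", "swrc:InProceedings", "DBLP"),
    ("ex:paper2", "rdf:type", "foaf:Document", "ACM"),
    ("ex:paper2", "rdf:type", "swrc:InProceedings", "ACM"),
    ("ex:post1", "rdf:type", "foaf:Document", "ex:blog"),
    ("ex:post1", "rdf:type", "sioc:Post", "ex:blog"),
]
source_index = sources_by_type_set(quads)
answer = where(source_index, {"foaf:Document", "swrc:InProceedings"})
```

With this toy data, `answer` contains the two sources named on the slide, while the blog source is only returned for the weaker requirement `{"foaf:Document"}`.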
Building Indices
Index Construction
Building Indices: Operators
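• Combination of a few simple operations
  - Aggregate, Join, Invert
• Example: Property Set index
  Input quads (s, p, o, c):
    (s1, p1, o1, c1)
    (s1, p2, o1, c1)
    (s2, p2, o2, c1)
    (s3, p1, o3, c1)
    (s3, p2, o4, c1)
    (s4, p3, o1, c1)
  Aggregate (properties per subject):
    s1 → {p1, p2}
    s2 → {p2}
    s3 → {p1, p2}
    s4 → {p3}
  Invert (subjects per property set):
    {p1, p2} → s1, s3
    {p2} → s2
    {p3} → s4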
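The property set example can be traced with two small functions. This is a sketch only: the names `aggregate` and `invert` mirror the operator names on the slide, but their signatures are assumptions, and the quads are written here as (subject, predicate, object, context) tuples.

```python
from collections import defaultdict

# The six quads of the example, as (subject, predicate, object, context).
quads = [
    ("s1", "p1", "o1", "c1"),
    ("s1", "p2", "o1", "c1"),
    ("s2", "p2", "o2", "c1"),
    ("s3", "p1", "o3", "c1"),
    ("s3", "p2", "o4", "c1"),
    ("s4", "p3", "o1", "c1"),
]


def aggregate(quads):
    """Collect the set of predicates used by each subject."""
    props = defaultdict(set)
    for s, p, _o, _c in quads:
        props[s].add(p)
    return {s: frozenset(ps) for s, ps in props.items()}


def invert(mapping):
    """Swap keys and values, grouping keys that share the same value."""
    inverted = defaultdict(set)
    for key, value in mapping.items():
        inverted[value].add(key)
    return dict(inverted)


property_set_index = invert(aggregate(quads))
```

Composing the two steps yields the index from property sets to subjects: {p1, p2} leads to s1 and s3, {p2} to s2, and {p3} to s4.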
12 Implemented Index Models
• Triple based
  - Subject → Triple
  - Predicate → Triple
  - Object → Triple
• Meta data
  - Keywords → Triple
  - Context → Triple
  - PLD (pay-level domain) → Triple
• Schema-level
  - RDF Type → Entity
  - Type set (TS) → Entity
  - Property set (PS) → Entity
  - Incoming property set (IPS) → Entity
  - Type and properties (ECS) → Entity
  - SchemEX → Entity
https://github.com/gottron/lod-index-models
Indices over Evolving Data
Index Maintenance
[Figure: snapshots of the data from 2007, 2008, 2009, 2010 and 2011]
Not just growth, but also deletion and modification of data
How to Measure Accuracy?
• Queries?
  - No established query log for the data set
  - Different key elements require different queries
  - Cover all of the index
• Distributions!
  - Relevant to several applications
  - Established metrics for comparison
Quantifying Divergence of Index Accuracy over Time
[Figure: an index is constructed and its distributions are estimated for each snapshot T0 (Base), T1, T2, T3, ..., Tn-1, Tn; the "deviation" of each later snapshot T1 ... Tn is measured against T0.]
Evolving Data: Normalised Perplexity
[Figure: four plots of normalised perplexity (y-axis, 0.0 to 1.0) against the week of the data snapshot (x-axis, 0 to 70). Panels: (1) Subject, Predicate, Object; (2) Context, Keywords, PLD; (3) RDF Type, TS, PS, IPS; (4) ECS, SchemEX.]
Evolving Data: Normalised Perplexity (Zoom in)
[Figure: the same four plots, zoomed in to normalised perplexity values from 0.00 to 0.10 (y-axis) against the week of the data snapshot (x-axis, 0 to 70). Panels: (1) Subject, Predicate, Object; (2) Context, Keywords, PLD; (3) RDF Type, TS, PS, IPS; (4) ECS, SchemEX.]
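As a hedged illustration of comparing distributions, the sketch below computes a standard cross-entropy perplexity between the key distribution of a base snapshot and that of a later one. The function names, the additive smoothing and the toy key lists are assumptions of this example. The slides do not state how the perplexity is normalised, so that step is not reproduced here.

```python
import math
from collections import Counter


def distribution(keys, vocabulary, alpha=1.0):
    """Additively smoothed relative frequencies over a fixed vocabulary."""
    counts = Counter(keys)
    total = len(keys) + alpha * len(vocabulary)
    return {k: (counts[k] + alpha) / total for k in vocabulary}


def perplexity(observed, model):
    """2 ** cross-entropy of `model`, measured against the `observed` distribution."""
    cross_entropy = -sum(
        p * math.log2(model[k]) for k, p in observed.items() if p > 0
    )
    return 2 ** cross_entropy


vocabulary = {"a", "b", "c"}
base = distribution(["a", "a", "b"], vocabulary)   # estimated at T0
later = distribution(["b", "b", "c"], vocabulary)  # estimated at a later snapshot

drift = perplexity(later, base)      # base model evaluated on later data
baseline = perplexity(later, later)  # lower bound: the model matches the data
```

The further the later snapshot drifts from the base distribution, the further `drift` rises above `baseline`.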
Using Indices
Programming Support
LITEQ and NPQL
• Support programming with Linked Data sources
• NPQL (Node Path Query Language)
  - Intensional queries → class descriptions, properties
  - Extensional queries → instance data
• LITEQ
  - Implementation of NPQL (F# type provider)
  - Autocompletion
• RDF type and property navigation (intension)

  Autocompletion after .SubTypeNavigation. offers the subtypes:
    dC.``http://example.org/ns#creature``
      .SubTypeNavigation.``
          ``http://example.org/ns#dog``
          ``http://example.org/ns#cat``
          ``http://example.org/ns#person``
          ...

  After selecting a subtype:
    dC.``http://example.org/ns#creature``
      .SubTypeNavigation.``http://example.org/ns#dog``

  Autocompletion after .PropNavigation. offers the properties:
    dC.``http://example.org/ns#creature``
      .SubTypeNavigation.``http://example.org/ns#dog``
      .PropNavigation.``
          ``http://example.org/ns#hasOwner``
          ``http://example.org/ns#hasName``
          ``http://example.org/ns#taxNumber``
          ...

  After selecting a property:
    dC.``http://example.org/ns#creature``
      .SubTypeNavigation.``http://example.org/ns#dog``
      .PropNavigation.``http://example.org/ns#hasOwner``
• Accessing instances (extension)
    let allDogs = dC.``http://example.org/ns#creature``
                    .SubTypeNavigation.``http://example.org/ns#dog``
                    .Extension
• Accessing individuals
    let bello = dC.``http://example.org/ns#creature``
                  .SubTypeNavigation.``http://example.org/ns#dog``
                  .Individuals.``http://example.org/ns#bello``
                  .getRdfObject
    bello.get_hasName()
    bello.get_taxNumber()
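The difference between intensional and extensional queries can be mimicked over a toy triple set in Python. This is not the LITEQ API. The functions, the `ex:` prefix shorthand for http://example.org/ns# and the toy data are all hypothetical, as is the choice of rdfs:subClassOf and rdfs:domain as the source of subtype and property information.

```python
# Toy data; "ex:" abbreviates http://example.org/ns# for readability.
triples = [
    ("ex:dog", "rdfs:subClassOf", "ex:creature"),
    ("ex:cat", "rdfs:subClassOf", "ex:creature"),
    ("ex:person", "rdfs:subClassOf", "ex:creature"),
    ("ex:hasOwner", "rdfs:domain", "ex:dog"),
    ("ex:hasName", "rdfs:domain", "ex:dog"),
    ("ex:taxNumber", "rdfs:domain", "ex:dog"),
    ("ex:bello", "rdf:type", "ex:dog"),
    ("ex:bello", "ex:hasName", "Bello"),
]


def subtypes(cls):
    """Intension: classes declared as direct subclasses of `cls`."""
    return sorted(s for s, p, o in triples if p == "rdfs:subClassOf" and o == cls)


def properties(cls):
    """Intension: properties whose declared domain is `cls`."""
    return sorted(s for s, p, o in triples if p == "rdfs:domain" and o == cls)


def extension(cls):
    """Extension: instances typed with `cls`."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)


completions = subtypes("ex:creature")  # what autocompletion could offer
all_dogs = extension("ex:dog")
```

The intensional functions answer questions about the schema and could feed an autocompletion list, whereas `extension` returns instance data.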
Exploring Entity Descriptions
Schema-level Search of Relevant Data Sources
Searching for a Suitable Description
SELECT ?x WHERE { ?x rdf:type foaf:Document }
SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type foaf:PersonalProfileDocument }
SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type sioc:Post . }
Did you mean ...
Related Queries ...
So far: gentle, iterative modification
Parallel Indices Over the Data
[Figure: two indices over the same data. A type set index maps the keys ts1, ts2, ts3, ..., tsn to their data items (d1,1 d1,2 d1,3 ...), and a property set index maps the keys psA, psB, psC, ..., psM to theirs (dA,1 dA,2 dA,3 ...).]
General Idea for Mapping
[Figure: a description by types (c1, c2) selects an entity set through the type set index. From this entity set an alternative description by properties (p3, p4, p5) is derived through the parallel property set index. The alternative description is approximate: it selects an approximate entity set.]
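One simple way to picture the mapping is sketched below: select entities by types, derive the properties they all share, and see which entities that property description selects in turn. This is an assumption-laden illustration rather than the method from the talk. The toy data and all function names are hypothetical.

```python
# Toy data: the type set and property set of each entity (hypothetical).
type_sets = {
    "e1": frozenset({"c1", "c2"}),
    "e2": frozenset({"c1", "c2"}),
    "e3": frozenset({"c1"}),
}
property_sets = {
    "e1": frozenset({"p3", "p4", "p5"}),
    "e2": frozenset({"p3", "p4", "p5", "p6"}),
    "e3": frozenset({"p3", "p4", "p5"}),
}


def select(required, key_sets):
    """Entities whose key set covers every required element."""
    required = frozenset(required)
    return {e for e, keys in key_sets.items() if required <= keys}


def derive_description(entities, key_sets):
    """Elements shared by all given entities (empty if there are no entities)."""
    sets = [key_sets[e] for e in entities]
    return frozenset.intersection(*sets) if sets else frozenset()


entity_set = select({"c1", "c2"}, type_sets)                  # original description
alternative = derive_description(entity_set, property_sets)   # alternative description
approx_entity_set = select(alternative, property_sets)        # what it selects
```

In this toy case the alternative description {p3, p4, p5} also matches e3, so the approximate entity set is a superset of the original one.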
Summary
Pros: rich knowledge base; diverse; public
Cons: huge; on the Web; diverse; distributed
Index structures: technical solutions to some of the problems
Thank you!
Contact: Thomas Gottron WeST – Institute for Web Science and Technologies Universität Koblenz-Landau [email protected]
Sources
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/, This work is available under a CC-BY-SA license.
• WorldWideWeb Around Wikipedia – Wikipedia as part of the world wide web, This Wikipedia and Wikimedia Commons image is from the user Chris 73 and is freely available at //commons.wikimedia.org/wiki/File:WorldWideWebAroundWikipedia.png under the creative commons CC-BY-SA 3.0 license.