Presentation given at the 13th International Semantic Web Conference (ISWC), in which we propose a compressed format to represent RDF data streams. Presented by Alejandro Llaves (http://www.slideshare.net/allaves). See the original article at: http://dataweb.infor.uva.es/wp-content/uploads/2014/07/iswc14.pdf
Efficient RDF Interchange (ERI) Format for RDF Data Streams
Javier D. Fernández, Alejandro Llaves, Oscar Corcho
Ontology Engineering Group (OEG), Universidad Politécnica de Madrid, Spain
Outline
1. Introduction & Motivation
2. Background
3. Efficient RDF Interchange (ERI) Format
   i. Basic Concepts
   ii. ERI Streams
   iii. Practical Deployment
4. Evaluation
5. Conclusions and Next steps
INTRODUCTION - Static data versus RDF data streams
[Figure: static data pipeline (Extract, Transform, Load) producing dated dataset snapshots such as "My dataset, August 2014". Labels: Files, DBMS, Spatial Information, Web APIs, Linked Data discovery]
“Most semantic tools are focused on this static view”
RDF data streams are gaining momentum: they can be generated from any type of data stream, and they combine real-time and historical data.
Image credits: © Wilgengebroed on Flickr; Mr3641, ProtoplasmaKid and ISA Internationales Stadtbauatelier on Wikimedia Commons
RDF streams: potentially unbounded sequences of timestamped RDF statements or graphs.
user1_observation [t1]
weather1_observation [t1]
user2_observation [t3]
…
[Figure: stream timeline over t, with user observations u1 to u4 and weather observations w1 to w3]
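The definition above can be sketched in code. This is only an illustration, assuming a stream item is a small graph plus an integer timestamp; the names `StreamItem` and `observation_stream` and the `ex:` terms are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterator, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class StreamItem:
    """One element of an RDF stream: a small graph plus its timestamp."""
    timestamp: int
    triples: Tuple[Triple, ...]


def observation_stream() -> Iterator[StreamItem]:
    """Yield a potentially unbounded sequence of timestamped observations (toy data)."""
    t = 1
    while True:
        # Alternate between two hypothetical sources: users and a weather station.
        source = "user" if t % 2 else "weather"
        subject = f"ex:{source}_observation_{t}"
        rdf_type = f"ex:{source.capitalize()}Observation"
        yield StreamItem(t, ((subject, "rdf:type", rdf_type),))
        t += 1


# A consumer never sees "the whole dataset"; it takes items as they arrive.
first_three = list(islice(observation_stream(), 3))
```

Because the generator never terminates, any consumer has to work on a window or a prefix of the stream, which is the main difference from the static view.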
INTRODUCTION - Motivation
Achieve efficient transmission of RDF streams, a necessary step to ensure higher throughput for RDF Stream processors
[Figure: several stream sources feed a Stream Processor Engine, which also draws on historic information, receives queries and returns continuous results. Example engines: C-SPARQL, SPARQLStream, morph-streams, CQELS Cloud, Ztreamy, …]
INTRODUCTION – Motivation - Requirements
Efficient transmission of RDF streams:
• Streamable
• Scalable
• Easy (fast) to process (create and parse)
• Compact
• Parametrizable (several compression/time tradeoffs)
BACKGROUND
Requirement | Plain: Turtle/TriG/JSON-LD | Plain + compression (e.g. gzip) | HDT | Streaming HDT | RDSZ | RDF/XML + EXI | ERI
Streamable | Yes | Yes | No | Yes | Yes | Yes | Yes
Scalable | Limited | Yes | Yes | No | Yes | Yes | Yes
Easy (fast) to create and parse | Yes | Limited | Limited | Yes | Limited | Limited | Yes
Compact | No | Yes | Yes | Limited | Yes | Yes | Yes
Parametrizable: compression/time | No | Limited | Yes | No | Limited | Limited | Yes
EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts
• (Assumption) Most RDF streams are well structured:
  • the structure is well known by the data provider
  • the number of variations in the structure is limited
• The Efficient RDF Interchange (ERI) format encodes the information at two levels:
  • A sliding dictionary of structures: the Structural Dictionary
  • The concrete value for each predicate
[Figure: a stream over time t with user observations u1 to u4 and weather observations w1 to w3. Each item is marked as a molecule and linked to the Structural Dictionary (entries ID-30, ID-31, ID-32, ID-33, …; labels: temperature, casual user, annual pass, wind). Example entry: rdf:type weather:TemperatureObservation; ssn:observedProperty weather:AirTemperature; ex:CelsiusValue ???; … Concrete values shown for the stream: “7.7”^^xsd:float, “9.4”^^xsd:float]
• ERI processing model
  • The minimal information unit is a molecule
  • We initially restrict to subject molecules (triples grouped by their subject)

sens-obs:Observation_AirTemperature_4UT01_2003_3_31_6_55_00
    a weather:TemperatureObservation ;
    rdfs:label "Air temperature at 6:55:00", "Verified" ;
    om-owl:observedProperty weather:_AirTemperature ;
    om-owl:procedure sens-obs:System_4UT01 ;
    om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_6_55_00 ;
    om-owl:samplingTime sens-obs:Instant_2003_3_31_6_55_00 ;
    ex:CelsiusValue "7.7"^^xsd:float .

sens-obs:Observation_AirTemperature_4UT01_2003_3_31_7_45_00
    a weather:TemperatureObservation ;
    rdfs:label "Air temperature at 7:45:00", "Not Verified" ;
    om-owl:observedProperty weather:_AirTemperature ;
    om-owl:procedure sens-obs:System_4UT01 ;
    om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_7_45_00 ;
    om-owl:samplingTime sens-obs:Instant_2003_3_31_7_45_00 ;
    ex:CelsiusValue "9.4"^^xsd:float .
Both descriptions above are subject molecules with the same structure, which is kept as a single entry of the Structural Dictionary:

…
Structure ID30 =
    a (1, weather:TemperatureObservation)
    rdfs:label (2)
    om-owl:observedProperty (1, weather:_AirTemperature)
    om-owl:procedure (1, sens-obs:System_4UT01)
    om-owl:result (1)
    om-owl:samplingTime (1)
    ex:CelsiusValue (1)
…

This entry stands for the air temperature observations of the sensor “System_4UT01”.
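A minimal sketch of how such an entry could be derived from a molecule and reused. It assumes that each number in the entry is the count of objects for that predicate and that a value in the entry is an object repeated in every molecule; the names `structure_of`, `StructuralDictionary`, `FIXED_PREDICATES` and the helper `observation` are hypothetical, and the dictionary here never evicts entries, unlike a sliding one.

```python
from typing import Dict, List, Optional, Tuple

Molecule = Tuple[str, List[Tuple[str, str]]]            # (subject, [(predicate, object), ...])
Structure = Tuple[Tuple[str, int, Optional[str]], ...]  # ((predicate, count, fixed object or None), ...)

# Assumption: predicates whose object is stored inside the structure itself.
FIXED_PREDICATES = {"rdf:type", "om-owl:observedProperty", "om-owl:procedure"}


def structure_of(molecule: Molecule) -> Structure:
    """Summarise a molecule as its predicates, their object counts and fixed objects."""
    _, pairs = molecule
    counts: Dict[str, int] = {}  # dicts keep insertion order (Python 3.7+)
    fixed: Dict[str, str] = {}
    for predicate, obj in pairs:
        counts[predicate] = counts.get(predicate, 0) + 1
        if predicate in FIXED_PREDICATES:
            fixed[predicate] = obj
    return tuple((p, n, fixed.get(p)) for p, n in counts.items())


class StructuralDictionary:
    """Maps each distinct structure to a numeric ID (no eviction in this sketch)."""

    def __init__(self, first_id: int = 30) -> None:
        self._ids: Dict[Structure, int] = {}
        self._next_id = first_id

    def lookup_or_add(self, structure: Structure) -> Tuple[int, bool]:
        """Return (ID, is_new) for a structure, registering it if unseen."""
        if structure in self._ids:
            return self._ids[structure], False
        self._ids[structure] = self._next_id
        self._next_id += 1
        return self._ids[structure], True


def observation(time_tag: str, label: str, status: str, celsius: str) -> Molecule:
    """Build one of the two example molecules from the slide."""
    subject = f"sens-obs:Observation_AirTemperature_4UT01_2003_3_31_{time_tag}"
    return (subject, [
        ("rdf:type", "weather:TemperatureObservation"),
        ("rdfs:label", label),
        ("rdfs:label", status),
        ("om-owl:observedProperty", "weather:_AirTemperature"),
        ("om-owl:procedure", "sens-obs:System_4UT01"),
        ("om-owl:result", f"sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_{time_tag}"),
        ("om-owl:samplingTime", f"sens-obs:Instant_2003_3_31_{time_tag}"),
        ("ex:CelsiusValue", celsius),
    ])


dictionary = StructuralDictionary()
m1 = observation("6_55_00", "Air temperature at 6:55:00", "Verified", "7.7")
m2 = observation("7_45_00", "Air temperature at 7:45:00", "Not Verified", "9.4")
first = dictionary.lookup_or_add(structure_of(m1))   # new entry
second = dictionary.lookup_or_add(structure_of(m2))  # same structure, reused ID
```

Only the first molecule pays for describing the structure; the second one needs just the ID plus its own values.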
EFFICIENT RDF INTERCHANGE (ERI) FORMAT – ERI Streams
Based on: Efficient XML Interchange (EXI) format
[Figure: an ERI stream is a Stream Header followed by a Stream Body. The input is split into blocks of molecules; each block is multiplexed into channels (Structural Channels and Value Channels), each channel is compressed separately, and the block is written as METADATA followed by its compressed channels. Reading the stream demultiplexes and decompresses per channel.]
ERI follows an encoding procedure similar to that of the Efficient XML Interchange (EXI) format.
– Structural channels: They encode the subjects in each block and, for each one, the structural properties of the related triples, using the dynamic dictionary of structures.
  • Main Terms of molecules: subject of the grouping.
  • ID-Structures: ID of the structure of each molecule in the block. The ID points to the entry in the Structural Dictionary.
  • New Structures: New entries in the Structural Dictionary.
– Value channels: They encode the concrete data values held by each predicate in the block in a compact fashion.
  • One channel per different predicate in the block.
  • Each channel lists explicit values or uses IDs pointing to a sliding object dictionary.
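The channel layout above can be sketched as follows. This is a simplified illustration, not the ERI wire format: channel names, the newline-joined text serialization and the use of zlib for every channel are assumptions made for the example, and values containing newlines would need escaping.

```python
import zlib
from typing import Dict, List, Tuple

# One molecule after the structural lookup:
# (subject, structure ID, is_new_structure, {predicate: [object values]})
EncodedMolecule = Tuple[str, int, bool, Dict[str, List[str]]]


def multiplex(block: List[EncodedMolecule]) -> Dict[str, List[str]]:
    """Split a block of molecules into structural channels and one value channel per predicate."""
    channels: Dict[str, List[str]] = {
        "main_terms": [],
        "id_structures": [],
        "new_structure_marker": [],
    }
    for subject, structure_id, is_new, values in block:
        channels["main_terms"].append(subject)
        channels["id_structures"].append(str(structure_id))
        channels["new_structure_marker"].append("1" if is_new else "0")
        for predicate, objects in values.items():
            channels.setdefault(f"value:{predicate}", []).extend(objects)
    return channels


def compress_channels(channels: Dict[str, List[str]]) -> Dict[str, bytes]:
    """Compress every channel on its own, so similar items sit next to each other."""
    return {name: zlib.compress("\n".join(items).encode("utf-8"))
            for name, items in channels.items()}


def decompress_channels(compressed: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Inverse of compress_channels for non-empty channels."""
    return {name: zlib.decompress(data).decode("utf-8").split("\n")
            for name, data in compressed.items()}


block: List[EncodedMolecule] = [
    ("sens-obs:Observation_AirTemperature_4UT01_2003_3_31_6_55_00", 30, True,
     {"rdfs:label": ["Air temperature at 6:55:00", "Verified"], "ex:CelsiusValue": ["7.7"]}),
    ("sens-obs:Observation_AirTemperature_4UT01_2003_3_31_7_45_00", 30, False,
     {"rdfs:label": ["Air temperature at 7:45:00", "Not Verified"], "ex:CelsiusValue": ["9.4"]}),
]
channels = multiplex(block)
restored = decompress_channels(compress_channels(channels))
```

Grouping by channel puts values of the same kind together (all labels, all temperatures), which is what makes a specialised compressor per channel worthwhile.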
EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Practical Deployment
[Figure: example contents of the channels for the two air temperature molecules, with the potential compression for each channel]
Structural Channels
• Main Terms of Molecules [Strings]: … sens-obs:Observation_AirTemperature...55_00, sens-obs:Observation_AirTemperature...45_00 …
• ID-Structures [IDs of Structures]: … 30 30 …
• New Structure Marker [Bits]: … 10 …
• New Structures [Encoded Structures]: ID-pred1 weather:TemperatureObservation, ID-pred2, ID-pred3 weather:_AirTemperature, ID-pred4 sens-obs:System_4UT01, ID-pred5, ID-pred6, ID-pred7
• New Predicates [Strings]: … om-owl:samplingTime, ex:CelsiusValue …
Value Channels
• ID-pred2 [Object Values] [Meta: strings]: … Air temperature at 6:55:00, Verified, Air temperature at 7:45:00, Not Verified … (explicit list of values)
• ID-pred5 and ID-pred6 [Term IDs] [Meta: IDs], each with a New Object Marker [Bits], plus New Terms [Strings]: … sens-obs:MeasureData_Air…55_00, sens-obs:Instant_2003…55_00, sens-obs:MeasureData_Air…45_00, sens-obs:Instant_2003…55_00 … (IDs pointing to a sliding object dictionary)
• ID-pred7 [Object Values] [Meta: xsd:float]: … 7.7 9.4 … (extraction of types)
Potential compression (per channel): differential encoding, prefix compression, Zlib, Snappy, …
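Two of the compression options named on this slide, differential encoding and prefix compression, can be sketched as below. These are generic textbook versions under the assumption that IDs are integers and terms are plain strings; the function names are hypothetical and the slides do not specify ERI's exact variants.

```python
from typing import List, Tuple


def delta_encode(ids: List[int]) -> List[int]:
    """Differential encoding: keep the first value, then only the change from the previous one."""
    if not ids:
        return []
    return [ids[0]] + [current - previous for previous, current in zip(ids, ids[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original IDs by accumulating the differences."""
    ids: List[int] = []
    total = 0
    for position, delta in enumerate(deltas):
        total = delta if position == 0 else total + delta
        ids.append(total)
    return ids


def front_encode(strings: List[str]) -> List[Tuple[int, str]]:
    """Prefix compression: store the length of the prefix shared with the previous string, plus the rest."""
    encoded: List[Tuple[int, str]] = []
    previous = ""
    for text in strings:
        shared = 0
        limit = min(len(previous), len(text))
        while shared < limit and previous[shared] == text[shared]:
            shared += 1
        encoded.append((shared, text[shared:]))
        previous = text
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild each string from the previous one and its stored suffix."""
    strings: List[str] = []
    previous = ""
    for shared, suffix in encoded:
        text = previous[:shared] + suffix
        strings.append(text)
        previous = text
    return strings


structure_ids = [30, 30, 31, 33]
terms = ["sens-obs:Observation_6_55_00", "sens-obs:Observation_7_45_00"]
id_deltas = delta_encode(structure_ids)
term_codes = front_encode(terms)
```

Both transforms turn repetitive channels into many small numbers and short suffixes, which a general-purpose compressor such as Zlib or Snappy can then shrink further.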
EVALUATION - COMPRESSION
• ERI excels in space for streaming and statistical datasets.
• RDSZ remains comparable to our approach.
• The object dictionary can overload the representation, although it always obtains comparable compression ratios.
• A smaller buffer in ERI-1k slightly affects the efficiency.
EVALUATION - PARSING
• ERI always outperforms RDSZ in compression time (by 3 and 3.8 times on average for ERI-4k and ERI-4k-Nodict, respectively).
• ERI decompression is commonly slower (1.4 times on average in both ERI configurations), typically because several channels have to be decompressed.
• Channels could be grouped (as in EXI).
EVALUATION – CONSUMING SCENARIO
In parsing (transmission + decompression):
• ERI-4k and ERI-4k-Nodict outperform the baseline in transmission + decompression, except for those datasets with fewer regularities in the structure or the data values.
In a scenario in which we include the compression time:
• ERI-4k suffers an expected overhead, as we are always including the time to process the information.
• The time in which the client receives all data in ERI is comparable to the baseline.
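The two scenarios above differ only in which terms are added up. A minimal sketch of that cost model follows; the function name and all the numbers are made up for illustration and are not measurements from the evaluation.

```python
def delivery_time(payload_bytes: float, bandwidth_bytes_per_s: float,
                  decompress_s: float, compress_s: float = 0.0) -> float:
    """Time until the client holds the data: optional compression + transmission + decompression."""
    return compress_s + payload_bytes / bandwidth_bytes_per_s + decompress_s


# Hypothetical figures: an uncompressed baseline versus a stream compressed to a quarter of its size.
baseline = delivery_time(payload_bytes=1000, bandwidth_bytes_per_s=1000, decompress_s=0.0)
parsing_only = delivery_time(payload_bytes=250, bandwidth_bytes_per_s=1000, decompress_s=0.25)
with_compression = delivery_time(payload_bytes=250, bandwidth_bytes_per_s=1000,
                                 decompress_s=0.25, compress_s=0.5)
```

With these invented values the compressed stream wins when only transmission and decompression count, and ends up level with the baseline once compression time is included, which mirrors the shape of the trade-off described on the slides.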
Results
• Compressed, efficient RDF interchange (ERI) format
  • Exploits the regularity of RDF data streams in their structure and data values
  • Flexible and extensible ERI configurations
  • Minimizes transmission costs in RDF stream processing
• State-of-the-art compression
• Remains efficient in performance
  • Time overheads are relatively low and can be assumed in many scenarios.
Next steps
• Integration within RDF streaming engines
  • e.g. morph-streams, CQELS Cloud
  • 3 purposes:
    • scaling to higher input data rates
    • minimizing the data exchange among processing nodes
    • serving a small set of operators on the compressed data
• Parallel compression/decompression
  • preliminary proposal on Storm
• Align the proposal with the results of the W3C RSP group regarding streaming modeling and serialization
purl.org/net/ro-eri-ISWC14