Rethinking Online SPARQL Querying to Support Incremental Result Visualization Olaf Hartig http://olafhartig.de @olafhartig



Prologue


Live Querying the Web of Data

● Federated query processing
  – i.e., querying a federation of SPARQL endpoints
● Linked Data query processing
  – i.e., querying Linked Data by relying only on the Linked Data principles (interface: URI lookups)
  – e.g., traversal-based query execution
● Querying other Linked Data fragment servers
  – e.g., triple pattern fragments


Chapter 1


“Can the progress that has been made on (Read/Write) Linked Data change the way we interact with the Web […]?”


Information in Dynamic Web Pages

Support for such an incremental visualization has not received much attention in existing work on querying the Web of Data


“Can the progress that has been made on (Read/Write) Linked Data change the way we interact with the Web […]?”

I think we have not made enough progress to even enable well-understood interaction techniques that are widely applied in “traditional” Web applications


Topics

Opportunities to Optimize the Response Times of Traversal-based Query Executions

Making the Core Fragment of SPARQL Suitable for the Task


Chapter 2


Implementation Approach

[Diagram: Data Retrieval Operator, Dispatcher, and several Triple Pattern Operators; one operator is shown for the triple pattern ( ?v1, knows, ?v2 )]


Data Retrieval Operator

[Diagram: the Data Retrieval Operator issues URI lookups (GET http://example.org/...); a retrieved RDF triple such as ( Bob, knows, Alice ) is passed via the Dispatcher to the Triple Pattern Operators, e.g., the one for the triple pattern ( ?v1, knows, ?v2 )]


Triple Pattern Operator

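[Diagram: the Triple Pattern Operator for the triple pattern ( ?v1, knows, ?v2 ) receives the RDF triple ( Bob, knows, Alice ) from the Dispatcher and produces the following intermediate solution]

Intermediate Solution
  Timestamp: 1
  Bindings: ?v1 → Bob, ?v2 → Alice
  Flags: [ ∙ | √ | ∙ | ∙ ]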
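The step shown on this slide can be sketched in a few lines of Python. This is only an illustration, not the implementation behind the talk: the names `IntermediateSolution`, `match`, and `new_solution` are hypothetical, and it is an assumption that each flag records which triple pattern operator has already contributed to the solution.

```python
from dataclasses import dataclass


@dataclass
class IntermediateSolution:
    timestamp: int
    bindings: dict  # variable name -> RDF term
    flags: list     # assumption: flags[i] is True once operator i has contributed


def match(pattern, triple):
    """Return the bindings under which `triple` matches `pattern`, or None."""
    bindings = {}
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):  # variable position
            if bindings.setdefault(p_term, t_term) != t_term:
                return None         # same variable bound to two different terms
        elif p_term != t_term:      # constant position must agree
            return None
    return bindings


def new_solution(pattern, triple, op_index, num_ops, timestamp):
    """Create an intermediate solution for a matching triple, else None."""
    bindings = match(pattern, triple)
    if bindings is None:
        return None
    flags = [i == op_index for i in range(num_ops)]
    return IntermediateSolution(timestamp, bindings, flags)


# The example from the slide: operator 1 (of 4) handles ( ?v1, knows, ?v2 ).
sol = new_solution(("?v1", "knows", "?v2"), ("Bob", "knows", "Alice"),
                   op_index=1, num_ops=4, timestamp=1)
print(sol)
```

A triple that does not match the pattern (for example, one with a different predicate) yields `None` and therefore no intermediate solution.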


[Diagram: Dispatcher and Output, with the following intermediate solution]

Intermediate Solution
  Timestamp: 1
  Bindings: ?v1 → Alice, ?v2 → Bob
  Flags: [ ∙ | √ | ∙ | ∙ ]


Triple Pattern Operator cont'd

[Diagram: a Triple Pattern Operator and its Output; the first two intermediate solutions listed here are combined into the third]

Intermediate Solution
  Timestamp: 461
  Bindings: ?v1 → Bob, ?v2 → Steve
  Flags: [ ∙ | √ | ∙ | ∙ ]

Intermediate Solution
  Timestamp: 327
  Bindings: ?v1 → Bob, ?v3 → Berlin
  Flags: [ √ | ∙ | ∙ | ∙ ]

Intermediate Solution
  Timestamp: 461
  Bindings: ?v1 → Bob, ?v2 → Steve, ?v3 → Berlin
  Flags: [ √ | √ | ∙ | ∙ ]
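The way the third intermediate solution arises from the first two can be sketched as a join of compatible bindings. The sketch below is an assumption-laden illustration: the slide only shows the three solutions, so taking the larger of the two timestamps and OR-ing the flags is inferred from the example, and the function names are hypothetical.

```python
def compatible(b1, b2):
    """Two binding sets are compatible if they agree on every shared variable."""
    return all(b2[var] == term for var, term in b1.items() if var in b2)


def join(s1, s2):
    """Merge two intermediate solutions, or return None if they conflict."""
    if not compatible(s1["bindings"], s2["bindings"]):
        return None
    return {
        # Assumption: the merged solution carries the later timestamp.
        "timestamp": max(s1["timestamp"], s2["timestamp"]),
        "bindings": {**s1["bindings"], **s2["bindings"]},
        # Assumption: a flag is set if either input had it set.
        "flags": [f1 or f2 for f1, f2 in zip(s1["flags"], s2["flags"])],
    }


left = {"timestamp": 461,
        "bindings": {"?v1": "Bob", "?v2": "Steve"},
        "flags": [False, True, False, False]}
right = {"timestamp": 327,
         "bindings": {"?v1": "Bob", "?v3": "Berlin"},
         "flags": [True, False, False, False]}

merged = join(left, right)
print(merged)
```

Two solutions that bind a shared variable to different terms (say, `?v1 → Bob` and `?v1 → Alice`) are not compatible, and `join` returns `None` for them.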


Properties

[Diagram: Data Retrieval, Dispatcher, two TP Operators, and Output]

● Supports:
  – any reachability-based query semantics
● Highly flexible
  – routing of intermediate solutions
● Inspired by “Eddies”
  – Avnur & Hellerstein, SIGMOD 2000
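To make the idea of flexible routing concrete, here is a minimal sketch of a dispatcher decision in the spirit of Eddies. It is not the routing logic of the system described in the talk: the functions `eligible_operators`, `route`, `first_policy`, and `make_random_policy` are hypothetical, and it is an assumption that a solution whose flags are all set needs no further routing.

```python
import random


def eligible_operators(flags):
    """Indexes of the operators that have not yet contributed to a solution."""
    return [i for i, done in enumerate(flags) if not done]


def route(flags, policy):
    """Pick the next operator for a solution, or None if no operator is left."""
    candidates = eligible_operators(flags)
    if not candidates:
        return None  # assumption: such a solution is complete
    return policy(candidates)


def first_policy(candidates):
    """Always route to the eligible operator with the lowest index."""
    return candidates[0]


def make_random_policy(seed=None):
    """Route to a uniformly chosen eligible operator."""
    rng = random.Random(seed)
    return lambda candidates: rng.choice(candidates)


print(route([True, True, False, False], first_policy))
print(route([True, True, False, False], make_random_policy(seed=0)))
print(route([True, True, True, True], first_policy))
```

Because the policy is just a function over the eligible operators, different routing policies can be swapped in without touching the operators themselves.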


Hypothesis 1

Response times can be reduced by applying a suitable routing policy.


Test of Different Routing Policies

Setup:
● Data retrieval operator simply appends to its lookup queue
● Web simulation environment (test Web: W-62-47, test query: Q1, details: [Hartig and Özsu 2014])
● Each bar represents the geometric mean of 5 separate executions

[Bar charts: response time for the first reported solution and for the last reported solution, each relative to the overall query execution time (QET)]

Routing policy has no impact!
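The metric behind each bar can be written down as a short calculation: a response time divided by the overall QET of the same execution, aggregated over five executions with the geometric mean. The numbers below are made up purely for illustration and are not measurements from the talk; `relative_response_time` is a hypothetical helper.

```python
from statistics import geometric_mean


def relative_response_time(t_solution, qet):
    """Response time of a solution as a fraction of the overall QET."""
    if qet <= 0:
        raise ValueError("QET must be positive")
    return t_solution / qet


# Made-up (response time, QET) pairs in seconds for five executions.
runs = [(2.0, 8.0), (2.5, 10.0), (3.0, 12.0), (2.0, 8.0), (2.5, 10.0)]

relative = [relative_response_time(t, qet) for t, qet in runs]
bar_height = geometric_mean(relative)
print(bar_height)
```

The geometric mean is a common choice for averaging ratios, since it is less affected by a single unusually slow or fast execution than the arithmetic mean.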


Hypothesis 1

Response times can be reduced by applying a suitable routing policy.

No!

Why?


Data Retrieval Dominates!!!

[Bar chart: average query execution time in seconds (log scale!, axis from 0.1 to 100000) for Query 1, Query 4, Query 5, Query 9, and Query 10; series: 10 threads, 20 threads, cache]

Setup:
● 5 queries of the FedBench benchmark suite, executed over real Linked Data on the WWW
● Different number of lookup threads used by the data retrieval operator
● Data retrieval operator equipped with a cache
  – Cache populated by a first execution
  – Times measured for a 2nd, cache-only execution (i.e., data retrieval deactivated)


Hypothesis 2

Response times can be reduced by choosing a “good” strategy of prioritizing URI lookups.


Prioritizing Lookups Randomly

[Chart: time from the beginning of the query execution (in minutes, axis from 0 to 35) at which each result element (axis from 0 to 6) was reported, for five executions (exec1 to exec5), together with the overall QET; annotations: ca. 25% of QET, ca. 58%]

Setup:
● LD10 of the FedBench benchmark suite, over real Linked Data on the WWW
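One way to experiment with lookup prioritization is to give the data retrieval operator a lookup queue whose ordering is a pluggable function. The sketch below is a hypothetical illustration, not the system used for the measurements: `LookupQueue`, `fifo_priority`, and `make_random_priority` are invented names, the URIs are placeholders, and skipping URIs that were already queued is an assumption.

```python
import heapq
import itertools
import random


class LookupQueue:
    """Queue of URIs to look up, ordered by a pluggable priority function."""

    def __init__(self, priority_fn):
        self._heap = []
        self._counter = itertools.count()  # insertion order, also breaks ties
        self._seen = set()
        self._priority_fn = priority_fn

    def add(self, uri):
        if uri in self._seen:  # assumption: each URI is looked up at most once
            return
        self._seen.add(uri)
        seq = next(self._counter)
        heapq.heappush(self._heap, (self._priority_fn(uri, seq), seq, uri))

    def pop(self):
        """Remove and return the URI with the smallest priority value."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def fifo_priority(uri, seq):
    """Baseline: simply append, i.e., look up URIs in discovery order."""
    return seq


def make_random_priority(seed=None):
    """Random prioritization: every URI gets a random priority value."""
    rng = random.Random(seed)
    return lambda uri, seq: rng.random()


uris = ["http://example.org/a", "http://example.org/b", "http://example.org/c"]

fifo = LookupQueue(fifo_priority)
shuffled = LookupQueue(make_random_priority(seed=42))
for uri in uris + uris[:1]:  # the repeated URI is ignored
    fifo.add(uri)
    shuffled.add(uri)

fifo_order = [fifo.pop() for _ in range(len(fifo))]
random_order = [shuffled.pop() for _ in range(len(shuffled))]
print(fifo_order)
print(random_order)
```

Any other strategy, for instance one that scores URIs by how promising they look for the query, would only need a different priority function.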


Hypothesis 2

Response times can be reduced by choosing a “good” strategy of prioritizing URI lookups. √


Question

Response times can be reduced by choosing a “good” strategy of prioritizing URI lookups. √

What is “good”?


Chapter 3


Topics

Opportunities to Optimize the Response Times of Traversal-based Query Executions √

Making the Core Fragment of SPARQL Suitable for the Task (by making it monotonic)


Monotonicity?

● Query Q is monotonic if for every pair (D1, D2) of possible databases, it holds that:

    D1 ⊆ D2 ⟹ Q(D1) ⊆ Q(D2)

● Example: the SPARQL pattern

    P = (a, p, ?x) OPT (?x, p, ?y)

  is not monotonic
  – G1 = { (a, p, b) }
  – G2 = { (a, p, b), (b, p, c) }
  – ⟦P⟧G1 = { μ }, where μ = { ?x → b }
  – ⟦P⟧G2 = { μ' }, where μ' = { ?x → b, ?y → c } ≠ μ !
  – Hence, G1 ⊆ G2 but ⟦P⟧G1 ⊈ ⟦P⟧G2
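The example can be checked mechanically with a small evaluator for triple patterns and OPT under the standard set semantics (join of compatible solutions, plus those left-hand solutions that are compatible with no right-hand solution). This is a minimal sketch for this one example, not a SPARQL engine; all function names are hypothetical.

```python
def match(pattern, triple):
    """Solution mapping (as a frozenset of pairs) if the triple matches, else None."""
    mu = {}
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if mu.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return frozenset(mu.items())


def eval_tp(pattern, graph):
    """All solution mappings of a triple pattern over a graph."""
    return {mu for mu in (match(pattern, t) for t in graph) if mu is not None}


def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d2[var] == term for var, term in d1.items() if var in d2)


def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}


def diff(o1, o2):
    return {m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)}


def opt(o1, o2):
    return join(o1, o2) | diff(o1, o2)


def eval_P(graph):
    """P = (a, p, ?x) OPT (?x, p, ?y)"""
    return opt(eval_tp(("a", "p", "?x"), graph), eval_tp(("?x", "p", "?y"), graph))


G1 = {("a", "p", "b")}
G2 = {("a", "p", "b"), ("b", "p", "c")}
mu = frozenset({("?x", "b")})
mu_prime = frozenset({("?x", "b"), ("?y", "c")})

print(eval_P(G1) == {mu})        # result over G1
print(eval_P(G2) == {mu_prime})  # result over G2
print(G1 <= G2, eval_P(G1) <= eval_P(G2))
```

The last line contrasts the two subset checks: the graphs are in a subset relationship, while the query results are not.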


What is the Issue?

● For any non-monotonic query, elements of the result set can be output only after we have seen all query-relevant parts of the DB
  – Hence, since we discover our DB (the Web of Data) at runtime, we can output result elements only after completing the discovery process
● Good news: the AND-UNION-FILTER fragment of SPARQL is monotonic [Arenas and Perez 2011]
● Bad news: for the AND-UNION-FILTER-OPT fragment, monotonicity is undecidable [Hartig 2014]
  – i.e., queries with OPT may be non-monotonic


What is the Usage of OPT?

● DBpedia
  – 46.4% of ca. 1.3M unique queries (logs from Apr. – Jul. 2010); Picalausa and Vansummeren, in SWIM 2011
  – 16.6% (logs from USEWOD 2011 dataset); Gallego et al., in USEWOD 2011
  – 15% (logs from USEWOD 2011 dataset); Elbedweihy et al., in COLD 2011
● Semantic Web conference corpus (SWDF)
  – 0.4% (logs from USEWOD 2011 dataset); Gallego et al., in USEWOD 2011


A Proposal: The OPT+ Operator

● Query Q is monotonic if for every pair (D1, D2) of possible databases, it holds that:

    D1 ⊆ D2 ⟹ Q(D1) ⊆ Q(D2)

● Recall our example: the SPARQL pattern

    P = (a, p, ?x) OPT (?x, p, ?y)

  is not monotonic
  – G1 = { (a, p, b) }, G2 = { (a, p, b), (b, p, c) }
  – ⟦P⟧G1 = { μ }, where μ = { ?x → b }
  – ⟦P⟧G2 = { μ' }, where μ' = { ?x → b, ?y → c } ≠ μ !

● ⟦P1 OPT P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ( ⟦P1⟧G \ ⟦P2⟧G )

● ⟦P1 OPT+ P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ⟦P1⟧G

➔ P1 OPT+ P2 ≡ (P1 AND P2) UNION P1


A Proposal: The OPT+ Operator

● Query Q is monotonic if for every pair (D1, D2) of possible databases, it holds that:

    D1 ⊆ D2 ⟹ Q(D1) ⊆ Q(D2)

● Recall our example: with OPT+ in place of OPT, the SPARQL pattern

    P' = (a, p, ?x) OPT+ (?x, p, ?y)

  is monotonic √
  – G1 = { (a, p, b) }, G2 = { (a, p, b), (b, p, c) }
  – ⟦P'⟧G1 = { μ }, where μ = { ?x → b }
  – ⟦P'⟧G2 = { μ, μ' }, where μ' = { ?x → b, ?y → c }, so ⟦P'⟧G1 ⊆ ⟦P'⟧G2

● ⟦P1 OPT P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ( ⟦P1⟧G \ ⟦P2⟧G )

● ⟦P1 OPT+ P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ⟦P1⟧G

➔ P1 OPT+ P2 ≡ (P1 AND P2) UNION P1
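The proposed operator differs from OPT only in that the difference is replaced by the full left-hand result, which a few lines of Python can illustrate on the running example. As before, this is a sketch with hypothetical function names, restricted to single triple patterns on both sides.

```python
def match(pattern, triple):
    """Solution mapping (as a frozenset of pairs) if the triple matches, else None."""
    mu = {}
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if mu.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return frozenset(mu.items())


def eval_tp(pattern, graph):
    return {mu for mu in (match(pattern, t) for t in graph) if mu is not None}


def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d2[var] == term for var, term in d1.items() if var in d2)


def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}


def opt_plus(o1, o2):
    """[[P1 OPT+ P2]] = ([[P1]] join [[P2]]) union [[P1]]"""
    return join(o1, o2) | o1


def eval_P_prime(graph):
    """P' = (a, p, ?x) OPT+ (?x, p, ?y)"""
    left = eval_tp(("a", "p", "?x"), graph)
    right = eval_tp(("?x", "p", "?y"), graph)
    return opt_plus(left, right)


G1 = {("a", "p", "b")}
G2 = {("a", "p", "b"), ("b", "p", "c")}
mu = frozenset({("?x", "b")})
mu_prime = frozenset({("?x", "b"), ("?y", "c")})

print(eval_P_prime(G1) == {mu})
print(eval_P_prime(G2) == {mu, mu_prime})
print(eval_P_prime(G1) <= eval_P_prime(G2))
```

Since μ is never retracted when more data arrives, an engine can report it as soon as it is found and report μ' later if the optional part turns out to match.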


Epilogue


Conclusions

● Returning result elements early has not yet received sufficient attention in existing work on live querying the Web of Data
● Prioritizing data retrieval can reduce response times of traversal-based query executions
  – What approaches are suitable and effective?
  – Similar for federated query processing, LDFs?
● Language features have to be chosen with care
  – Their impact has to be studied
  – Dedicated optimization techniques are possible