SPARQL Querying Benchmarks
Muhammad Saleem, Ivan Ermilov, Axel-Cyrille Ngonga Ngomo, Ricardo Usbeck, Michael Röder
https://sites.google.com/site/sqbenchmarks/
Tutorial at ISWC 2016, Kobe, Japan, 17/10/2016
Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany


Page 2: SPARQL Querying Benchmarks ISWC2016

Agenda
• Why benchmarks?
• Components and design principles
• Key features and choke points
• Centralized SPARQL benchmarks
• Federated SPARQL benchmarks
• Hands-on
• HOBBIT introduction

Sessions: 9:00 – 10:00, 10:00 – 10:30, 10:30 – 12:00

Page 3: SPARQL Querying Benchmarks ISWC2016

Why Benchmarks?
• What tools can I use for my use case?
• Which tool best suits my use case, and why?
• Which are the relevant measures?
• What is the behavior of the existing engines?
• What are the limitations of the existing engines?
• How can existing engines be improved?

Page 4: SPARQL Querying Benchmarks ISWC2016

Benchmark Categories
• Micro benchmarks: specialized, detailed, very focused, and easy to run; neglect the larger picture; results are difficult to generalize; do not use standardized metrics. Example: a benchmark evaluating only joins.
• Standard benchmarks: generalized and well defined; use standard metrics; complicated to run; systems are often optimized for them. Example: the Transaction Processing Performance Council (TPC) benchmarks.
• Real-life applications

Page 5: SPARQL Querying Benchmarks ISWC2016

SPARQL Querying Benchmarks
• Centralized benchmarks: centralized repositories; queries span a single dataset; real or synthetic. Examples: LUBM, SP2Bench, BSBM, WatDiv, DBPSB, FEASIBLE
• Federated benchmarks: multiple interlinked datasets; queries span multiple datasets; real or synthetic. Examples: FedBench, LargeRDFBench

Page 6: SPARQL Querying Benchmarks ISWC2016

Querying Benchmark Components
• Datasets (real or synthetic)
• Queries (real or synthetic)
• Performance metrics
• Execution rules

Page 7: SPARQL Querying Benchmarks ISWC2016

Design Principles [L97]
• Relevant
• Understandable
• Good metrics
• Scalable
• Coverage
• Acceptance
• Repeatable
• Verifiable

Page 8: SPARQL Querying Benchmarks ISWC2016

Choke Points: Technological Challenges [BNE14]
• CP1: Aggregation performance
• CP2: Join performance
• CP3: Data access locality (materialized views)
• CP4: Expression calculation
• CP5: Correlated sub-queries
• CP6: Parallelism and concurrency

Page 9: SPARQL Querying Benchmarks ISWC2016

RDF Querying Benchmarks Choke Points [FK16]
• CP1: Join ordering
• CP2: Aggregation
• CP3: OPTIONAL and nested OPTIONAL clauses
• CP4: Reasoning
• CP5: Parallel execution of UNIONs
• CP6: FILTERs
• CP7: Ordering
• CP8: Geo-spatial predicates
• CP9: Full-text search
• CP10: Duplicate elimination
• CP11: Complex FILTER conditions

Page 10: SPARQL Querying Benchmarks ISWC2016

SPARQL Queries as Directed Labelled Hyper-graphs (DLH) [SNM15]


Page 11: SPARQL Querying Benchmarks ISWC2016

DLH of SPARQL Queries

SELECT ?president ?party ?page WHERE {
  ?president rdf:type dbpedia:President .
  ?president dbpedia:nationality dbpedia:United_States .
  ?president dbpedia:party ?party .
  ?x nyt:topicPage ?page .
  ?x owl:sameAs ?president .
}

[Figure, built up incrementally over Pages 11–15: the directed labelled hypergraph of this query. Vertices: ?president, dbpedia:President, dbpedia:United_States, ?party, ?x, ?page; edges labelled rdf:type, dbpedia:nationality, dbpedia:party, nyt:topicPage, owl:sameAs. The final slide marks star, simple, and hybrid join vertices and the tail of a hyperedge: ?x is a star join vertex and ?president a hybrid join vertex.]
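The join vertices of such a hypergraph can be computed directly from the basic graph pattern. The following is an illustrative sketch (not code from the tutorial); the pattern tuples mirror the presidents query above, and the star/sink/hybrid labels follow the DLH terminology of [SNM15]:

```python
# Sketch: finding and classifying join vertices of a basic graph pattern.
# A triple pattern is a (subject, predicate, object) tuple; a join vertex
# is a term appearing in more than one triple pattern.

from collections import defaultdict

def join_vertices(bgp):
    """Map each join vertex to its (out_degree, in_degree) over the BGP."""
    out_deg = defaultdict(int)  # appearances as subject
    in_deg = defaultdict(int)   # appearances as object
    for s, p, o in bgp:
        out_deg[s] += 1
        in_deg[o] += 1
    vertices = {}
    for v in set(out_deg) | set(in_deg):
        if out_deg[v] + in_deg[v] > 1:  # joins at least two patterns
            vertices[v] = (out_deg[v], in_deg[v])
    return vertices

def classify(out_deg, in_deg):
    """'star' = only outgoing edges, 'sink' = only incoming, else 'hybrid'."""
    if in_deg == 0:
        return "star"
    if out_deg == 0:
        return "sink"
    return "hybrid"

# The presidents query from the slides, as a BGP:
bgp = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]

for v, (od, idg) in sorted(join_vertices(bgp).items()):
    print(v, classify(od, idg))  # ?president is hybrid, ?x is a star
```

Running this recovers the labels from the figure: ?x joins only as a subject (star), while ?president appears both as subject and as object (hybrid).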

Page 16: SPARQL Querying Benchmarks ISWC2016

Key SPARQL Query Characteristics
FEASIBLE [SNM15], WatDiv [AHO+14], and LUBM [GPH05] identified:
• Query forms: SELECT, DESCRIBE, ASK, CONSTRUCT
• Constructs: UNION, DISTINCT, ORDER BY, REGEX, LIMIT, FILTER, OPTIONAL, GROUP BY, negation
• Features: result size, number of BGPs, number of triple patterns, number of join vertices, mean join vertex degree, mean triple pattern selectivity, join selectivity, query runtime, unbound predicates
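One of the listed features, mean triple pattern selectivity, is the average fraction of dataset triples matching each triple pattern of a query. A minimal sketch over a toy dataset (the `ex:` terms are made up for illustration):

```python
# Illustrative sketch (toy data, not a benchmark dataset): the selectivity
# of a triple pattern is the fraction of dataset triples that match it; the
# mean over a query's triple patterns is one of the features listed above.

def matches(pattern, triple):
    """A pattern term starting with '?' is a variable and matches anything."""
    return all(p.startswith("?") or p == t for p, t in zip(pattern, triple))

def selectivity(pattern, dataset):
    return sum(matches(pattern, t) for t in dataset) / len(dataset)

def mean_tp_selectivity(bgp, dataset):
    return sum(selectivity(tp, dataset) for tp in bgp) / len(bgp)

dataset = [
    ("ex:a", "rdf:type", "ex:Person"),
    ("ex:b", "rdf:type", "ex:Person"),
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:name", '"Bob"'),
]
bgp = [("?s", "rdf:type", "ex:Person"), ("?s", "ex:knows", "?o")]
print(mean_tp_selectivity(bgp, dataset))  # (0.5 + 0.25) / 2 = 0.375
```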

Page 17: SPARQL Querying Benchmarks ISWC2016

Centralized SPARQL Querying Benchmarks


Page 18: SPARQL Querying Benchmarks ISWC2016

Lehigh University Benchmark (LUBM) [GPH05]
• Synthetic RDF benchmark
• Tests the reasoning capabilities of triple stores
• Synthetic university data generator
• 15 SPARQL 1.0 queries
• Query design criteria: input size, selectivity, complexity, logical inferencing
• Performance metrics: load time, repository size, query runtime, query completeness and soundness, combined metric (runtime + completeness + soundness)
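Completeness and soundness can be phrased as recall and precision of the returned answers against a reference result set. A hedged sketch of these two answer-quality terms (the exact combined metric in [GPH05] also folds in runtime, which is not shown here):

```python
# Sketch of LUBM-style answer quality metrics: completeness is the share of
# expected answers that were returned; soundness is the share of returned
# answers that are correct.

def completeness(returned, expected):
    returned, expected = set(returned), set(expected)
    return len(returned & expected) / len(expected) if expected else 1.0

def soundness(returned, expected):
    returned, expected = set(returned), set(expected)
    return len(returned & expected) / len(returned) if returned else 1.0

expected = {"s1", "s2", "s3", "s4"}
returned = {"s1", "s2", "s5"}   # one wrong answer, two missing answers
print(completeness(returned, expected))  # 0.5
print(soundness(returned, expected))     # 2/3
```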

Page 19: SPARQL Querying Benchmarks ISWC2016

LUBM Queries Choke Points [FK16]

[Table: queries Q1–Q14 mapped to choke points CP1–CP11; the highlighted choke points are join ordering and reasoning.]

Page 20: SPARQL Querying Benchmarks ISWC2016

LUBM Queries Characteristics [SNM15]

Queries: 15
Query forms: SELECT 100.00%; ASK 0.00%; CONSTRUCT 0.00%; DESCRIBE 0.00%
Important SPARQL constructs: UNION 0.00%; DISTINCT 0.00%; ORDER BY 0.00%; REGEX 0.00%; LIMIT 0.00%; OFFSET 0.00%; OPTIONAL 0.00%; FILTER 0.00%; GROUP BY 0.00%
Result size: min 3; max 1.39E+04; mean 4.96E+03; S.D. 1.14E+04
BGPs: min 1; max 1; mean 1; S.D. 0
Triple patterns: min 1; max 6; mean 3; S.D. 1.8126539
Join vertices: min 0; max 4; mean 1.6; S.D. 1.4040757
Mean join vertex degree: min 0; max 5; mean 2.0222222; S.D. 1.2999796
Mean triple pattern selectivity: min 0.0003212; max 0.432; mean 0.01; S.D. 0.0745
Query runtime (ms): min 2; max 3200; mean 437.675; S.D. 320.34

Page 21: SPARQL Querying Benchmarks ISWC2016

SP2Bench [SHM+09]
• Synthetic RDF triple store benchmark
• DBLP bibliographic synthetic data generator
• 12 SPARQL 1.0 queries
• Query design criteria: SELECT and ASK SPARQL forms; covers the majority of SPARQL constructs
• Performance metrics: load time, per-query runtime, arithmetic and geometric mean of overall query runtimes, memory consumption

Page 22: SPARQL Querying Benchmarks ISWC2016

SP2Bench Queries Choke Points [FK16]

[Table: queries Q1–Q12 mapped to choke points CP1–CP11; the highlighted choke points are join ordering, FILTERs, and duplicate elimination.]

Page 23: SPARQL Querying Benchmarks ISWC2016

SP2Bench Queries Characteristics [SNM15]

Queries: 12
Query forms: SELECT 91.67%; ASK 8.33%; CONSTRUCT 0.00%; DESCRIBE 0.00%
Important SPARQL constructs: UNION 16.67%; DISTINCT 41.67%; ORDER BY 16.67%; REGEX 0.00%; LIMIT 8.33%; OFFSET 8.33%; OPTIONAL 25.00%; FILTER 58.33%; GROUP BY 0.00%
Result size: min 1; max 4.34E+07; mean 4.55E+06; S.D. 1.37E+07
BGPs: min 1; max 3; mean 1.5; S.D. 0.67419986
Triple patterns: min 1; max 13; mean 5.91666667; S.D. 3.82475985
Join vertices: min 0; max 10; mean 4.25; S.D. 3.79293602
Mean join vertex degree: min 0; max 9; mean 2.41342593; S.D. 2.26080826
Mean triple pattern selectivity: min 6.5597E-05; max 0.53980613; mean 0.22180428; S.D. 0.20831387
Query runtime (ms): min 7; max 7.13E+05; mean 2.83E+05; S.D. 5.26E+05

Page 24: SPARQL Querying Benchmarks ISWC2016

Berlin SPARQL Benchmark (BSBM) [BS09]
• Synthetic RDF triple store benchmark
• E-commerce use case synthetic data generator
• 20 queries: 12 SPARQL 1.0 queries for the explore and explore-and-update use cases; 8 SPARQL 1.1 analytical queries for the business intelligence use case
• Query design criteria: SELECT, DESCRIBE, and CONSTRUCT SPARQL forms; covers the majority of SPARQL constructs
• Performance metrics: load time, Query Mixes per Hour (QMpH), Queries per Second (QpS)
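As a rough sketch of how the two throughput metrics relate to measured runtimes (assuming QpS is computed per query type from its individual runtimes and QMpH from the wall-clock time of complete query mixes):

```python
# Sketch of BSBM-style throughput metrics under the stated assumptions.

def qps(runtimes_seconds):
    """Queries per Second for one query type: executions / total time."""
    return len(runtimes_seconds) / sum(runtimes_seconds)

def qmph(num_mixes, total_seconds):
    """Query Mixes per Hour over a whole benchmark run."""
    return num_mixes * 3600.0 / total_seconds

print(qps([0.2, 0.25, 0.15]))  # 3 queries in ~0.6 s -> ~5 queries/second
print(qmph(50, 1800.0))        # 50 mixes in half an hour -> 100 mixes/hour
```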

Page 25: SPARQL Querying Benchmarks ISWC2016

BSBM Queries Choke Points [FK16]

[Table: queries Q1–Q12 mapped to choke points CP1–CP11; the highlighted choke points are join ordering, FILTERs, and result ordering.]

Page 26: SPARQL Querying Benchmarks ISWC2016

BSBM Queries Characteristics [SNM15]

Queries: 20
Query forms: SELECT 80.00%; ASK 0.00%; CONSTRUCT 4.00%; DESCRIBE 16.00%
Important SPARQL constructs: UNION 8.00%; DISTINCT 24.00%; ORDER BY 36.00%; REGEX 0.00%; LIMIT 36.00%; OFFSET 4.00%; OPTIONAL 52.00%; FILTER 52.00%; GROUP BY 0.00%
Result size: min 0; max 31; mean 8.312; S.D. 9.0308
BGPs: min 1; max 5; mean 2.8; S.D. 1.7039
Triple patterns: min 1; max 15; mean 9.32; S.D. 5.18
Join vertices: min 0; max 6; mean 2.88; S.D. 1.8032
Mean join vertex degree: min 0; max 4.5; mean 3.05; S.D. 1.6375
Mean triple pattern selectivity: min 9E-08; max 0.0453; mean 0.0105; S.D. 0.0142
Query runtime (ms): min 5; max 99; mean 9.1; S.D. 14.564

Page 27: SPARQL Querying Benchmarks ISWC2016

DBpedia SPARQL Benchmark (DBPSB) [MLA+14]
• Real benchmark generation framework based on: the DBpedia dataset at different sizes; DBpedia query log mining
• Clustering of log queries: rename the variables in triple patterns; select frequently executed queries; remove SPARQL keywords and prefixes; compute query similarity using Levenshtein string matching; compute query clusters using a soft graph clustering algorithm [NS09]; take query templates (most frequently asked, using the most SPARQL constructs) from clusters with > 5 queries; generate any number of queries from the query templates
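The similarity step can be illustrated with a plain dynamic-programming Levenshtein distance. Note this is only a sketch of one stage: DBPSB's actual pipeline pairs it with the BorderFlow soft graph clustering algorithm [NS09], which is not reproduced here.

```python
# Sketch: normalized Levenshtein similarity between two query strings, as
# used to build the similarity graph that is then clustered.

def levenshtein(a, b):
    """Classic edit distance via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Normalized similarity in [0, 1] between two query strings."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

q1 = "SELECT ?x WHERE { ?x a dbo:City }"
q2 = "SELECT ?x WHERE { ?x a dbo:Town }"
print(round(similarity(q1, q2), 2))  # high similarity: only 4 chars differ
```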

Page 28: SPARQL Querying Benchmarks ISWC2016

DBPSB Query Features
• Number of triple patterns: tests the efficiency of join operations (CP1)
• SPARQL UNION and OPTIONAL constructs: handling of the parallel execution of UNIONs (CP5)
• Solution sequences and modifiers (DISTINCT): efficiency of duplicate elimination (CP10)
• Filter conditions and operators (FILTER, LANG, REGEX, STR): efficiency of engines at executing filters as early as possible (CP6)

Page 29: SPARQL Querying Benchmarks ISWC2016

DBPSB Limitations
• Queries are based on 25 templates
• Does not consider features such as the number of join vertices, join vertex degree, triple pattern selectivities, or query execution times
• Only considers SPARQL SELECT queries
• Not customizable for given use cases or the needs of an application

Page 30: SPARQL Querying Benchmarks ISWC2016

Recall: Key SPARQL Query Characteristics
FEASIBLE [SNM15], WatDiv [AHO+14], and LUBM [GPH05] identified:
• Query forms: SELECT, DESCRIBE, ASK, CONSTRUCT
• Constructs: UNION, DISTINCT, ORDER BY, REGEX, LIMIT, FILTER, OPTIONAL, GROUP BY, negation
• Features: result size, number of BGPs, number of triple patterns, number of join vertices, mean join vertex degree, mean triple pattern selectivity, join selectivity, query runtime, unbound predicates

Page 31: SPARQL Querying Benchmarks ISWC2016

DBPSB Queries Characteristics [SNM15]

Queries (from 25 templates): 125
Query forms: SELECT 100%; ASK 0%; CONSTRUCT 0%; DESCRIBE 0%
Important SPARQL constructs: UNION 36%; DISTINCT 100%; ORDER BY 0%; REGEX 4%; LIMIT 0%; OFFSET 0%; OPTIONAL 32%; FILTER 48%; GROUP BY 0%
Result size: min 197; max 4.62E+06; mean 3.24E+05; S.D. 9.56E+05
BGPs: min 1; max 9; mean 2.695652; S.D. 2.438979
Triple patterns: min 1; max 12; mean 4.521739; S.D. 2.79398
Join vertices: min 0; max 3; mean 1.217391; S.D. 1.126399
Mean join vertex degree: min 0; max 5; mean 1.826087; S.D. 1.435022
Mean triple pattern selectivity: min 1.19E-05; max 1; mean 0.119288; S.D. 0.226966
Query runtime (ms): min 11; max 5.40E+04; mean 1.07E+04; S.D. 1.73E+04

Page 32: SPARQL Querying Benchmarks ISWC2016

Waterloo SPARQL Diversity Test Suite (WatDiv) [AHO+14]
• Synthetic benchmark: synthetic data generator; synthetic query generator
• User-controlled data generator: entities to include; structuredness [DKS+11] of the dataset; probability of entity associations; cardinality of property associations
• Query design criteria: structural query features; data-driven query features

Page 33: SPARQL Querying Benchmarks ISWC2016

WatDiv Query Design Criteria
• Structural features: number of triple patterns; join vertex count; join vertex degree
• Data-driven features: result size; (filtered) triple pattern (f-TP) selectivity; BGP-restricted f-TP selectivity; join-restricted f-TP selectivity

Page 34: SPARQL Querying Benchmarks ISWC2016

WatDiv Query Generation
• Query template generator: user-specified number of templates; user-specified template characteristics
• Query generator: instantiates the query templates with terms (IRIs, literals, etc.) from the RDF dataset; user-specified number of queries produced
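A minimal sketch of the instantiation step, using a hypothetical `%name%` placeholder syntax (WatDiv's real template language differs; the `wsdbm:` terms are made-up examples):

```python
# Sketch: filling a query template with terms drawn from the dataset to
# produce any number of concrete benchmark queries.

import random

def instantiate(template, bindings, rng):
    """Replace each %placeholder% with a randomly chosen dataset term."""
    query = template
    for name, terms in bindings.items():
        query = query.replace(f"%{name}%", rng.choice(terms))
    return query

template = "SELECT ?v WHERE { ?v wsdbm:likes %product% . }"
bindings = {"product": ["wsdbm:Product12", "wsdbm:Product7", "wsdbm:Product42"]}
rng = random.Random(0)  # seeded for reproducible query sets
for _ in range(2):
    print(instantiate(template, bindings, rng))
```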

Page 35: SPARQL Querying Benchmarks ISWC2016

WatDiv Queries Characteristics [SNM15]

Query templates: 125
Query forms: SELECT 100.00%; ASK 0.00%; CONSTRUCT 0.00%; DESCRIBE 0.00%
Important SPARQL constructs: UNION 0.00%; DISTINCT 0.00%; ORDER BY 0.00%; REGEX 0.00%; LIMIT 0.00%; OFFSET 0.00%; OPTIONAL 0.00%; FILTER 0.00%; GROUP BY 0.00%
Result size: min 0; max 4.17E+09; mean 3.49E+07; S.D. 3.73E+08
BGPs: min 1; max 1; mean 1; S.D. 0
Triple patterns: min 1; max 12; mean 5.328; S.D. 2.60823
Join vertices: min 0; max 5; mean 1.776; S.D. 0.9989
Mean join vertex degree: min 0; max 7; mean 3.62427; S.D. 1.40647
Mean triple pattern selectivity: min 0; max 0.01176; mean 0.00494; S.D. 0.00239
Query runtime (ms): min 3; max 8.82E+08; mean 4.41E+08; S.D. 2.77E+07

Page 36: SPARQL Querying Benchmarks ISWC2016

FEASIBLE: Benchmark Generation Framework [SNM15]
• Customizable benchmark generation framework
• Generates real benchmarks from query logs
• Can be applied to any SPARQL query log
• Customizable for given use cases or the needs of an application

Page 37: SPARQL Querying Benchmarks ISWC2016

FEASIBLE Query Selection Criteria
• Query forms: SELECT, DESCRIBE, ASK, CONSTRUCT
• Constructs: UNION, DISTINCT, ORDER BY, REGEX, LIMIT, FILTER, OPTIONAL, GROUP BY, negation
• Features: result size, number of BGPs, number of triple patterns, number of join vertices, mean join vertex degree, mean triple pattern selectivity, join selectivity, query runtime, unbound predicates

Page 38: SPARQL Querying Benchmarks ISWC2016

FEASIBLE: Benchmark Generation Framework
• Dataset cleaning
• Feature vectors and normalization
• Selection of exemplars
• Selection of benchmark queries

Page 39: SPARQL Querying Benchmarks ISWC2016

Feature Vectors and Normalization

SELECT DISTINCT ?entita ?nome WHERE {
  ?entita rdf:type dbo:VideoGame .
  ?entita rdfs:label ?nome
  FILTER regex(?nome, "konami", "i")
}
LIMIT 100

Query type: SELECT; result size: 13; BGPs: 1; triple patterns: 2; join vertices: 1; mean join vertex degree: 2.0; mean triple pattern selectivity: 0.01709761619798973; UNION: no; DISTINCT: yes; ORDER BY: no; REGEX: yes; LIMIT: yes; OFFSET: no; OPTIONAL: no; FILTER: yes; GROUP BY: no; runtime (ms): 65

Feature vector:            13 1 2 1 2 0.017 0 1 0 1 1 0 0 1 0 65
Normalized feature vector: 0.11 0.53 0.67 0.14 0.08 0.017 0 1 0 1 1 0 0 1 0 0.14
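The normalization step can be sketched as column-wise scaling by the maximum observed value, which maps every feature into [0, 1]. This is one plausible reading of the slide; FEASIBLE's exact normalization scheme is defined in [SNM15], and the toy vectors below are made up.

```python
# Sketch: scale each feature column by its maximum over the query log so
# that all features contribute comparably to Euclidean distances.

def normalize(vectors):
    maxima = [max(col) or 1 for col in zip(*vectors)]  # avoid div by zero
    return [[v / m for v, m in zip(vec, maxima)] for vec in vectors]

vectors = [
    [13, 1, 2, 65],    # result size, BGPs, triple patterns, runtime (ms)
    [120, 2, 3, 460],
    [7, 1, 1, 20],
]
for vec in normalize(vectors):
    print([round(v, 2) for v in vec])
```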

Page 40: SPARQL Querying Benchmarks ISWC2016

FEASIBLE Exemplar Selection (Pages 40–59)

Suppose we need a benchmark of 3 queries. Plot the feature vectors in a multidimensional space; here, 10 queries with two normalized features:

Query  F1    F2
Q1     0.2   0.2
Q2     0.5   0.3
Q3     0.8   0.3
Q4     0.9   0.1
Q5     0.5   0.5
Q6     0.2   0.7
Q7     0.1   0.8
Q8     0.13  0.65
Q9     0.9   0.5
Q10    0.1   0.5

• Calculate the average point.
• Select the point with the minimum Euclidean distance to the average point; this is the first exemplar.
• Select the point farthest from the exemplars chosen so far; repeat until 3 exemplars are selected.
• For each remaining query, calculate its distance to each exemplar (for Q1: 0.60, 0.42, 0.70) and assign it to the minimum-distance exemplar; repeat for Q2, Q3, Q6, Q8, Q9, and Q10, yielding 3 clusters.
• Calculate the average point of each cluster.
• Calculate the distance of each point in a cluster to that average; select the minimum-distance query as the final benchmark query from that cluster: Q2 from the yellow cluster, Q3 from the green cluster, and Q8 from the brown cluster.

Our benchmark queries are Q2, Q3, and Q8.

[Figures on Pages 40–59: scatter plots illustrating each step.]
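The walkthrough above can be condensed into a short sketch, assuming plain Euclidean distance over the normalized features. On this toy data it recovers a benchmark close to the slides' Q2, Q3, and Q8, though floating-point ties can swap the middle pick between Q3 and Q4.

```python
# Sketch of the selection procedure: nearest-to-average first exemplar,
# farthest-point selection for the rest, nearest-exemplar clustering, then
# the point nearest each cluster's average as the final benchmark query.

from math import dist  # Euclidean distance (Python >= 3.8)

def avg(points):
    return [sum(c) / len(points) for c in zip(*points)]

def select_benchmark(queries, k):
    points = list(queries.values())
    center = avg(points)
    # First exemplar: minimum distance to the global average point.
    exemplars = [min(queries, key=lambda q: dist(queries[q], center))]
    # Remaining exemplars: maximum distance to the chosen exemplars.
    while len(exemplars) < k:
        exemplars.append(max(
            queries,
            key=lambda q: min(dist(queries[q], queries[e]) for e in exemplars)))
    # Assign every query to its nearest exemplar.
    clusters = {e: [] for e in exemplars}
    for q in queries:
        clusters[min(exemplars, key=lambda e: dist(queries[q], queries[e]))].append(q)
    # Pick the query nearest each cluster's average as the benchmark query.
    selected = []
    for members in clusters.values():
        c = avg([queries[q] for q in members])
        selected.append(min(members, key=lambda q: dist(queries[q], c)))
    return sorted(selected)

queries = {"Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3),
           "Q4": (0.9, 0.1), "Q5": (0.5, 0.5), "Q6": (0.2, 0.7),
           "Q7": (0.1, 0.8), "Q8": (0.13, 0.65), "Q9": (0.9, 0.5),
           "Q10": (0.1, 0.5)}
print(select_benchmark(queries, 3))
```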

Page 60: SPARQL Querying Benchmarks ISWC2016

Comparison of Composite Error

FEASIBLE's composite error is 54.9% less than DBPSB's.

[Figure: composite error comparison of FEASIBLE and DBPSB.]

Page 61: SPARQL Querying Benchmarks ISWC2016

Rank-wise Ranking of Triple Stores (all values are percentages)
• No system is the sole winner or loser for a particular rank.
• Virtuoso mostly lies in the higher ranks, i.e., ranks 1 and 2 (68.29%).
• Fuseki lies mostly in the middle ranks, i.e., ranks 2 and 3 (65.14%).
• OWLIM-SE is usually on the slower side, i.e., ranks 3 and 4 (60.86%).
• Sesame is either fast or slow: rank 1 (31.71% of the queries) and rank 4 (23.14%).

Page 62: SPARQL Querying Benchmarks ISWC2016

FEASIBLE(DBpedia) Queries Characteristics [SNM15]

Queries: 125
Query forms: SELECT 95.20%; ASK 0.00%; CONSTRUCT 4.00%; DESCRIBE 0.80%
Important SPARQL constructs: UNION 40.80%; DISTINCT 52.80%; ORDER BY 28.80%; REGEX 14.40%; LIMIT 38.40%; OFFSET 18.40%; OPTIONAL 30.40%; FILTER 58.40%; GROUP BY 0.80%
Result size: min 1; max 1.41E+06; mean 52183; S.D. 1.97E+05
BGPs: min 1; max 14; mean 3.176; S.D. 3.55841574
Triple patterns: min 1; max 18; mean 4.88; S.D. 4.396846377
Join vertices: min 0; max 11; mean 1.296; S.D. 2.39294662
Mean join vertex degree: min 0; max 11; mean 1.44906666; S.D. 2.13246612
Mean triple pattern selectivity: min 2.86693E-09; max 1; mean 0.140214337; S.D. 0.31899488
Query runtime (ms): min 2; max 3.22E+04; mean 2242.6; S.D. 6961.99191

Page 63: SPARQL Querying Benchmarks ISWC2016

FEASIBLE(SWDF) Queries Characteristics [SNM15]

Queries: 125
Query forms: SELECT 92.80%; ASK 2.40%; CONSTRUCT 3.20%; DESCRIBE 1.60%
Important SPARQL constructs: UNION 32.80%; DISTINCT 50.40%; ORDER BY 25.60%; REGEX 16.00%; LIMIT 45.60%; OFFSET 20.80%; OPTIONAL 32.00%; FILTER 29.60%; GROUP BY 19.20%
Result size: min 1; max 3.01E+05; mean 9091.512; S.D. 4.70E+04
BGPs: min 0; max 14; mean 2.688; S.D. 2.812460752
Triple patterns: min 0; max 14; mean 3.232; S.D. 2.76246734
Join vertices: min 0; max 3; mean 0.52; S.D. 0.65500554
Mean join vertex degree: min 0; max 4; mean 0.968; S.D. 1.09202386
Mean triple pattern selectivity: min 1.06097E-05; max 1; mean 0.29192835; S.D. 0.325138601
Query runtime (ms): min 4; max 4.13E+04; mean 1308.832; S.D. 5335.44123

Page 64: SPARQL Querying Benchmarks ISWC2016

Other Useful Benchmarks
• Semantic Publishing Benchmark (SPB)
• UniProt [RU09][UniProtKB]
• YAGO (Yet Another Great Ontology) [SKW07]
• Barton Library [Barton]
• Linked Sensor Dataset [PHS10]
• WordNet [WordNet]
• Publishing TPC-H as RDF [TPC-H]
• Apples and Oranges [DKS+11]

Page 65: SPARQL Querying Benchmarks ISWC2016

Summary of the centralized SPARQL querying benchmarks


Page 66: SPARQL Querying Benchmarks ISWC2016

Centralized SPARQL Querying Benchmarks Summary [SNM15]

Queries: LUBM 15; BSBM 125; SP2Bench 12; WatDiv 125; DBPSB 125; FEASIBLE(DBpedia) 125; DBpediaLog 130466; FEASIBLE(SWDF) 125; SWDFLog 64030

Basic query forms (LUBM / BSBM / SP2Bench / WatDiv / DBPSB / FEASIBLE(DBpedia) / DBpediaLog / FEASIBLE(SWDF) / SWDFLog):
SELECT:    100.00% / 80.00% / 91.67% / 100.00% / 100% / 95.20% / 97.96% / 92.80% / 58.71%
ASK:       0.00% / 0.00% / 8.33% / 0.00% / 0% / 0.00% / 1.93% / 2.40% / 0.09%
CONSTRUCT: 0.00% / 4.00% / 0.00% / 0.00% / 0% / 4.00% / 0.09% / 3.20% / 0.04%
DESCRIBE:  0.00% / 16.00% / 0.00% / 0.00% / 0% / 0.80% / 0.02% / 1.60% / 41.17%

Page 67: SPARQL Querying Benchmarks ISWC2016

Centralized SPARQL Querying Benchmarks Summary [SNM15]

Important SPARQL constructs (LUBM / BSBM / SP2Bench / WatDiv / DBPSB / FEASIBLE(DBpedia) / DBpediaLog / FEASIBLE(SWDF) / SWDFLog):
UNION:    0.00% / 8.00% / 16.67% / 0.00% / 36% / 40.80% / 7.97% / 32.80% / 29.32%
DISTINCT: 0.00% / 24.00% / 41.67% / 0.00% / 100% / 52.80% / 4.16% / 50.40% / 34.18%
ORDER BY: 0.00% / 36.00% / 16.67% / 0.00% / 0% / 28.80% / 0.30% / 25.60% / 10.67%
REGEX:    0.00% / 0.00% / 0.00% / 0.00% / 4% / 14.40% / 0.21% / 16.00% / 0.03%
LIMIT:    0.00% / 36.00% / 8.33% / 0.00% / 0% / 38.40% / 0.40% / 45.60% / 1.79%
OFFSET:   0.00% / 4.00% / 8.33% / 0.00% / 0% / 18.40% / 0.03% / 20.80% / 0.14%
OPTIONAL: 0.00% / 52.00% / 25.00% / 0.00% / 32% / 30.40% / 20.11% / 32.00% / 29.52%
FILTER:   0.00% / 52.00% / 58.33% / 0.00% / 48% / 58.40% / 93.38% / 29.60% / 0.72%
GROUP BY: 0.00% / 0.00% / 0.00% / 0.00% / 0% / 0.80% / 7.66E-06 / 19.20% / 1.34%

Page 68: SPARQL Querying Benchmarks ISWC2016

Centralized SPARQL Querying Benchmarks Summary [SNM15]

(Columns: LUBM / BSBM / SP2Bench / WatDiv / DBPSB / FEASIBLE(DBpedia) / DBpediaLog / FEASIBLE(SWDF) / SWDFLog)

Result size — Min: 3 / 0 / 1 / 0 / 197 / 1 / 1 / 1 / 1; Max: 1.39E+04 / 31 / 4.34E+07 / 4.17E+09 / 4.62E+06 / 1.41E+06 / 1.41E+06 / 3.01E+05 / 3.01E+05; Mean: 4.96E+03 / 8.312 / 4.55E+06 / 3.49E+07 / 3.24E+05 / 52183 / 404.0 / 9091.512 / 39.51; S.D.: 1.14E+04 / 9.0308 / 1.37E+07 / 3.73E+08 / 9.56E+05 / 1.97E+05 / 12932.25 / 4.70E+04 / 2208.7

BGPs — Min: 1 / 1 / 1 / 1 / 1 / 1 / 0 / 0 / 0; Max: 1 / 5 / 3 / 1 / 9 / 14 / 14 / 14 / 14; Mean: 1 / 2.8 / 1.5 / 1 / 2.6957 / 3.176 / 1.6763 / 2.688 / 2.286; S.D.: 0 / 1.7039 / 0.6742 / 0 / 2.439 / 3.5584 / 1.6608 / 2.8125 / 2.9406

Triple patterns — Min: 1 / 1 / 1 / 1 / 1 / 1 / 0 / 0 / 0; Max: 6 / 15 / 13 / 12 / 12 / 18 / 18 / 14 / 14; Mean: 3 / 9.32 / 5.9167 / 5.328 / 4.5217 / 4.88 / 1.7063 / 3.232 / 2.5093; S.D.: 1.8127 / 5.18 / 3.8248 / 2.6082 / 2.794 / 4.3968 / 1.6864 / 2.7625 / 3.2139

Join vertices — Min: 0 / 0 / 0 / 0 / 0 / 0 / 0 / 0 / 0; Max: 4 / 6 / 10 / 5 / 3 / 11 / 11 / 3 / 3; Mean: 1.6 / 2.88 / 4.25 / 1.776 / 1.2174 / 1.296 / 0.0228 / 0.52 / 0.1808; S.D.: 1.4041 / 1.8032 / 3.7929 / 0.9989 / 1.1264 / 2.3929 / 0.2338 / 0.655 / 0.4567

Page 69: SPARQL Querying Benchmarks ISWC2016

Centralized SPARQL Querying Benchmarks Summary [SNM15]

(Columns: LUBM / BSBM / SP2Bench / WatDiv / DBPSB / FEASIBLE(DBpedia) / DBpediaLog / FEASIBLE(SWDF) / SWDFLog)

Mean join vertex degree — Min: 0 / 0 / 0 / 0 / 0 / 0 / 0 / 0 / 0; Max: 5 / 4.5 / 9 / 7 / 5 / 11 / 11 / 4 / 5; Mean: 2.0222 / 3.05 / 2.4134 / 3.6243 / 1.8261 / 1.4491 / 0.0416 / 0.968 / 0.3701; S.D.: 1.3 / 1.6375 / 2.2608 / 1.4065 / 1.435 / 2.1325 / 0.3344 / 1.092 / 0.8738

Mean triple pattern selectivity — Min: 0.00032 / 9E-08 / 6.559E-05 / 0 / 1.19E-05 / 2.86693E-09 / 1.261E-05 / 1.06097E-05 / 1.1E-05; Max: 0.432 / 0.0453 / 0.5398 / 0.01176 / 1 / 1 / 1 / 1 / 1; Mean: 0.01 / 0.0105 / 0.2218 / 0.00494 / 0.1193 / 0.1402 / 0.0058 / 0.2919 / 0.0238; S.D.: 0.0745 / 0.0142 / 0.2083 / 0.00239 / 0.227 / 0.319 / 0.0367 / 0.3251 / 0.0786

Query runtime (ms) — Min: 2 / 5 / 7 / 3 / 11 / 2 / 1 / 4 / 3; Max: 3200 / 99 / 7.13E+05 / 8.82E+08 / 5.40E+04 / 3.22E+04 / 5.60E+04 / 4.13E+04 / 4.13E+04; Mean: 437.675 / 9.1 / 2.83E+05 / 4.41E+08 / 1.07E+04 / 2242.6 / 30.42 / 1308.832 / 16.16; S.D.: 320.34 / 14.564 / 5.26E+05 / 2.77E+07 / 1.73E+04 / 6961.99 / 702.52 / 5335.44 / 249.674

Page 70: SPARQL Querying Benchmarks ISWC2016

Federated SPARQL Querying Benchmarks


Page 71: SPARQL Querying Benchmarks ISWC2016

Federated Query
• Return the party membership and news pages about all US presidents.
• One source holds the party memberships of US presidents; another holds news pages about US presidents.
• Computation of the results requires data from both sources.

Page 72: SPARQL Querying Benchmarks ISWC2016

Federated SPARQL Query Processing

A federation engine answers a query over multiple RDF sources (S1–S4) in stages:
• Parsing/Rewriting: rewrite the query and obtain its individual triple patterns
• Source Selection: identify the capable/relevant sources for each triple pattern
• Optimizer: generate an optimized query execution plan
• Execution: execute the sub-queries against the selected sources
• Integrator: integrate the sub-query results

[Figure: federation engine architecture over RDF sources S1–S4.]
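The source selection stage can be simulated in a few lines, with each source modeled as an in-memory triple set and a SPARQL ASK request replaced by a membership probe (the endpoint names and triples below are made up for illustration):

```python
# Sketch: triple pattern-wise source selection. ask() stands in for a
# SPARQL ASK request against a real endpoint.

def ask(source, pattern):
    """Does the source contain at least one triple matching the pattern?"""
    def m(p, t):
        return all(x.startswith("?") or x == y for x, y in zip(p, t))
    return any(m(pattern, triple) for triple in source)

def select_sources(bgp, sources):
    """Map every triple pattern to the sources that can answer it."""
    return {tp: [name for name, data in sources.items() if ask(data, tp)]
            for tp in bgp}

sources = {
    "dbpedia": {("dbr:Obama", "dbo:party", "dbr:Democratic_Party")},
    "nytimes": {("nyt:obama", "nyt:topicPage", "nyt:page42"),
                ("nyt:obama", "owl:sameAs", "dbr:Obama")},
}
bgp = [("?p", "dbo:party", "?party"),
       ("?x", "nyt:topicPage", "?page"),
       ("?x", "owl:sameAs", "?p")]
print(select_sources(bgp, sources))
```

Each triple pattern is routed only to the sources that can contribute answers, which is exactly what the source-selection metrics on a later slide (triple pattern-wise sources selected, number of ASK requests) measure.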

Page 73: SPARQL Querying Benchmarks ISWC2016

SPARQL Query Federation Approaches• SPARQL Endpoint Federation (SEF)• Linked Data Federation (LDF)• Hybrid of SEF+LDF


Page 74: SPARQL Querying Benchmarks ISWC2016

SPLODGE [SP+12]
• Federated benchmark generation tool
• Query design criteria: query form; join type; result modifiers (DISTINCT, LIMIT, OFFSET, ORDER BY); variable triple patterns; triple pattern joins; cross-product triple patterns; number of sources; number of join vertices; query selectivity
• Non-conjunctive queries that use the SPARQL UNION or OPTIONAL constructs are not considered

Page 75: SPARQL Querying Benchmarks ISWC2016

FedBench [FB+11]
• Based on 9 real interconnected datasets: KEGG, DrugBank, and ChEBI from the life sciences; DBpedia, GeoNames, Jamendo, SWDF, NYT, and LMDB from the cross domain. The datasets vary in structuredness and size.
• Four sets of queries: 7 life-science queries; 7 cross-domain queries; 11 Linked Data queries; 14 queries from SP2Bench

Page 76: SPARQL Querying Benchmarks ISWC2016

FedBench Queries Characteristics

Queries: 25
Query forms: SELECT 100.00%; ASK 0.00%; CONSTRUCT 0.00%; DESCRIBE 0.00%
Important SPARQL constructs: UNION 12%; DISTINCT 0.00%; ORDER BY 0.00%; REGEX 0.00%; LIMIT 0.00%; OFFSET 0.00%; OPTIONAL 4%; FILTER 4%; GROUP BY 0.00%
Result size: min 1; max 9054; mean 529; S.D. 1764
BGPs: min 1; max 2; mean 1.16; S.D. 0.37
Triple patterns: min 2; max 7; mean 4; S.D. 1.25
Join vertices: min 0; max 5; mean 2.52; S.D. 1.26
Mean join vertex degree: min 0; max 3; mean 2.14; S.D. 0.56
Mean triple pattern selectivity: min 0.001; max 1; mean 0.05; S.D. 0.092
Query runtime (ms): min 50; max 1.2E+4; mean 1987; S.D. 3950

Page 77: SPARQL Querying Benchmarks ISWC2016

LargeRDFBench [LB+16]
• 32 queries: 14 simple, 10 complex, 8 large data
• 14 interlinked datasets

[Figure: the interlinked datasets, including DBpedia, LinkedMDB, New York Times, GeoNames, SW Dog Food, Jamendo (cross domain); KEGG, DrugBank, ChEBI, Affymetrix (life sciences); and Linked TCGA-M, Linked TCGA-E, Linked TCGA-A (large data), connected by predicates such as owl:sameAs, based_near, keggCompoundId, x-geneid, country/ethnicity/race, and bcr_patient_barcode, with link counts ranging from 1.3k to 251.3k.]

Page 78: SPARQL Querying Benchmarks ISWC2016

LargeRDFBench Datasets Statistics


Page 79: SPARQL Querying Benchmarks ISWC2016

LargeRDFBench Query Properties
• 14 simple: 2–7 triple patterns; subset of SPARQL clauses; query execution time around 2 seconds on average
• 10 complex: 8–13 triple patterns; use more SPARQL clauses; query execution time up to 10 minutes
• 8 large data: minimum 80459 results; large intermediate results; query execution times in hours

Page 80: SPARQL Querying Benchmarks ISWC2016

LargeRDFBench Queries Characteristics

Queries: 32
Query forms: SELECT 100.00%; ASK 0.00%; CONSTRUCT 0.00%; DESCRIBE 0.00%
Important SPARQL constructs: UNION 18.75%; DISTINCT 28.21%; ORDER BY 9.37%; REGEX 3.12%; LIMIT 12.5%; OFFSET 0.00%; OPTIONAL 25%; FILTER 31.25%; GROUP BY 0.00%
Result size: min 1; max 3.0E+5; mean 5.9E+4; S.D. 1.1E+5
BGPs: min 1; max 2; mean 1.43; S.D. 0.5
Triple patterns: min 2; max 12; mean 6.6; S.D. 2.6
Join vertices: min 0; max 6; mean 3.43; S.D. 1.36
Mean join vertex degree: min 0; max 6; mean 2.56; S.D. 0.76
Mean triple pattern selectivity: min 0.001; max 1; mean 0.10; S.D. 0.14
Query runtime (ms): min 159; max > 1 hour; mean undefined; S.D. undefined

Page 81: SPARQL Querying Benchmarks ISWC2016

FedBench vs. LargeRDFBench


Page 82: SPARQL Querying Benchmarks ISWC2016

Performance Metrics
• Efficient source selection, in terms of: total triple pattern-wise sources selected; total number of SPARQL ASK requests used during source selection; source selection time
• Query execution time
• Result completeness and correctness
• Number of remote requests during query execution
• Index compression ratio (1 − index size / data dump size)
• Number of intermediate results
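Two of these metrics are simple enough to state directly as code; a small sketch (the index/dump sizes and the selection mapping below are made-up example values):

```python
# Sketch: index compression ratio and total triple pattern-wise sources
# selected, two of the federated-benchmark metrics listed above.

def index_compression_ratio(index_size, dump_size):
    """1 - index size / data dump size; closer to 1 means a smaller index."""
    return 1.0 - index_size / dump_size

def total_tp_sources(selection):
    """selection maps each triple pattern to its list of selected sources."""
    return sum(len(srcs) for srcs in selection.values())

print(index_compression_ratio(5_000_000, 1_000_000_000))  # 0.995
selection = {"tp1": ["s1"], "tp2": ["s1", "s3"], "tp3": ["s2"]}
print(total_tp_sources(selection))  # 4
```

A lower total of triple pattern-wise selected sources (without losing answers) indicates more precise source selection, which in turn reduces remote requests and intermediate results.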

Page 83: SPARQL Querying Benchmarks ISWC2016

Future Directions
• Micro benchmarking
• Synthetic benchmark generation: synthetic data that is like real data; synthetic queries that are like real queries
• Customizable and flexible benchmark generation: fits user needs; fits the current use case
• What are the most important choke points for SPARQL querying benchmarks? How are they related to query performance?

Page 84: SPARQL Querying Benchmarks ISWC2016

References
• [L97] C. Levine. TPC-C: The OLTP Benchmark. In SIGMOD Industrial Session, 1997.
• [GPH05] Y. Guo, Z. Pan, and J. Heflin. LUBM: A Benchmark for OWL Knowledge Base Systems. Journal of Web Semantics 3(2–3), 2005, pages 158–182.
• [SHM+09] M. Schmidt, T. Hornung, M. Meier, C. Pinkel, and G. Lausen. SP2Bench: A SPARQL Performance Benchmark. In Semantic Web Information Management, 2009.
• [BS09] C. Bizer and A. Schultz. The Berlin SPARQL Benchmark. Int. J. Semantic Web and Inf. Sys., 5(2), 2009.
• [BSBM] Berlin SPARQL Benchmark (BSBM) Specification, V3.1. http://wifo5-3.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/index.html
• [RU09] N. Redaschi and the UniProt Consortium. UniProt in RDF: Tackling Data Integration and Distributed Annotation with the Semantic Web. In Biocuration Conference, 2009.

Page 85: SPARQL Querying Benchmarks ISWC2016

References
• [UniProtKB] UniProtKB Queries. http://www.uniprot.org/help/query-fields
• [SKW07] F. M. Suchanek, G. Kasneci, and G. Weikum. YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia. In WWW, 2007.
• [Barton] The MIT Barton Library dataset. http://simile.mit.edu/rdf-test-data/
• [PHS10] H. Patni, C. Henson, and A. Sheth. Linked Sensor Data. 2010.
• [TPC-H] The TPC-H Homepage. http://www.tpc.org/tpch/
• [WordNet] WordNet: A Lexical Database for English. http://wordnet.princeton.edu/
• [MLA+14] M. Morsey, J. Lehmann, S. Auer, and A.-C. Ngonga Ngomo. DBpedia SPARQL Benchmark.
• [SP+12] O. Görlitz, M. Thimm, and S. Staab. SPLODGE: Systematic Generation of SPARQL Benchmark Queries for Linked Open Data. In ISWC, 2012.
• [BNE14] P. Boncz, T. Neumann, and O. Erling. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. In TPCTC 2013, Revised Selected Papers.

Page 86: SPARQL Querying Benchmarks ISWC2016

References
• [NS09] A.-C. Ngonga Ngomo and D. Schumacher. BorderFlow: A Local Graph Clustering Algorithm for Natural Language Processing. In CICLing, 2009.
• [AHO+14] G. Aluç, O. Hartig, T. Özsu, and K. Daudjee. Diversified Stress Testing of RDF Data Management Systems. In ISWC, 2014.
• [SNM15] M. Saleem, Q. Mehmood, and A.-C. Ngonga Ngomo. FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework. In ISWC, 2015.
• [DKS+11] S. Duan, A. Kementsietsidis, K. Srinivas, and O. Udrea. Apples and Oranges: A Comparison of RDF Benchmarks and Real RDF Datasets. In SIGMOD, 2011.
• [FK16] I. Fundulaki and A. Kementsietsidis. Assessing the Performance of RDF Engines: Discussing RDF Benchmarks. Tutorial at ESWC 2016.
• [FB+11] M. Schmidt et al. FedBench: A Benchmark Suite for Federated Semantic Data Query Processing. In ISWC, 2011.
• [LB+16] M. Saleem, A. Hasnain, and A.-C. Ngonga Ngomo. LargeRDFBench: A Billion Triples Benchmark for SPARQL Query Federation. Submitted to the Journal of Web Semantics.

Page 87: SPARQL Querying Benchmarks ISWC2016

Thanks

{lastname}@informatik.uni-leipzig.de
AKSW, University of Leipzig, Germany

This work was supported by grants from the BMWi project SAKE and the EU H2020 Framework Programme project HOBBIT (GA no. 688227).