Page 1: gStore: Answering SPARQL Queries Via Subgraph Matching

gStore: Answering SPARQL Queries Via Subgraph Matching

Lei Zou 1, Jinghui Mo 1, Lei Chen 2, M. Tamer Özsu 3, Dongyan Zhao 1

1 Peking University, 2 Hong Kong University of Science and Technology, 3 University of Waterloo

Page 2: gStore: Answering SPARQL Queries Via Subgraph Matching

Outline

• Background & Related Work

• Overview of gStore

• Encoding Technique

• VS*-tree & Query Algorithm

• Experiments

• Conclusions

Page 4: gStore: Answering SPARQL Queries Via Subgraph Matching

Semantic Web

“Semantic Web Technologies” is a collection of standard technologies to realize a Web of Data.

Page 6: gStore: Answering SPARQL Queries Via Subgraph Matching

RDF Graph

[Figure: example RDF graph. Legend: Entity Vertex, Literal Vertex]
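
As a rough illustration of the RDF-graph view, the sketch below turns a handful of triples into adjacency lists with labelled edges. The subject identifier `y:Abraham_Lincoln`, the helper names `build_graph` and `is_literal`, and the convention that literals carry surrounding double quotes are all assumptions made for this example; the predicate/object pairs follow the running example used in these slides.

```python
from collections import defaultdict

# (subject, predicate, object) triples. The subject name is a hypothetical
# placeholder; literals are written with surrounding double quotes.
TRIPLES = [
    ("y:Abraham_Lincoln", "hasName", '"Abraham Lincoln"'),
    ("y:Abraham_Lincoln", "BornOnDate", '"1809-02-12"'),
    ("y:Abraham_Lincoln", "DiedOnDate", '"1865-04-15"'),
    ("y:Abraham_Lincoln", "DiedIn", "y:Washington_D.c"),
]


def is_literal(vertex):
    """Literal vertices are marked by a leading double quote (assumed convention)."""
    return vertex.startswith('"')


def build_graph(triples):
    """Map each vertex to its outgoing (edge_label, neighbour) pairs."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
        adjacency.setdefault(obj, [])  # objects are vertices too
    return adjacency


graph = build_graph(TRIPLES)
entity_vertices = [v for v in graph if not is_literal(v)]
literal_vertices = [v for v in graph if is_literal(v)]
```

Each triple becomes one directed edge from the subject vertex to the object vertex, labelled with the predicate.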

Page 7: gStore: Answering SPARQL Queries Via Subgraph Matching

SPARQL Queries

SPARQL Query:

    SELECT ?name WHERE {
      ?m <hasName> ?name .
      ?m <BornOnDate> "1809-02-12" .
      ?m <DiedOnDate> "1865-04-15" .
    }

Query Graph

Page 8: gStore: Answering SPARQL Queries Via Subgraph Matching

Subgraph Match vs. SPARQL Queries
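
To make the correspondence concrete, here is a minimal sketch that answers the running SPARQL query by matching its triple patterns against a tiny triple set with plain backtracking. It is not gStore's algorithm; the function names `match` and `unify`, the subject identifiers `e1` and `e2`, and the second person's data are hypothetical.

```python
# Toy data: e1 satisfies all three patterns, e2 only two of them.
TRIPLES = [
    ("e1", "hasName", "Abraham Lincoln"),
    ("e1", "BornOnDate", "1809-02-12"),
    ("e1", "DiedOnDate", "1865-04-15"),
    ("e2", "hasName", "Another Person"),
    ("e2", "BornOnDate", "1809-02-12"),
    ("e2", "DiedOnDate", "1900-01-01"),
]

# The query graph as triple patterns; terms starting with "?" are variables.
QUERY = [
    ("?m", "hasName", "?name"),
    ("?m", "BornOnDate", "1809-02-12"),
    ("?m", "DiedOnDate", "1865-04-15"),
]


def unify(term, value, binding):
    """Bind a variable to a value, or check that a constant equals it."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def match(patterns, triples, binding=None):
    """Yield every variable binding under which all patterns occur in `triples`."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = dict(binding)
        if all(unify(t, v, extended) for t, v in zip(first, triple)):
            yield from match(rest, triples, extended)


names = [b["?name"] for b in match(QUERY, TRIPLES)]
```

Every binding produced by `match` corresponds to one match of the query graph in the data graph, which is the sense in which a SPARQL query is a subgraph match.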

Page 9: gStore: Answering SPARQL Queries Via Subgraph Matching

Naïve Triple Store

SPARQL Query:

    SELECT ?name WHERE {
      ?m <hasName> ?name .
      ?m <BornOnDate> "1809-02-12" .
      ?m <DiedOnDate> "1865-04-15" .
    }

SQL:

    SELECT T3.Object
    FROM T AS T1, T AS T2, T AS T3
    WHERE T1.Predict = 'BornOnDate' AND T1.Object = '1809-02-12'
      AND T2.Predict = 'DiedOnDate' AND T2.Object = '1865-04-15'
      AND T3.Predict = 'hasName'
      AND T1.Subject = T2.Subject AND T2.Subject = T3.Subject

Too many Self-Joins
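
The self-join cost can be seen in a small runnable sketch using Python's built-in `sqlite3` module. The table name `T` and the column names `Subject`, `Predict`, `Object` follow the slide; the rows and the identifiers `e1`, `e2` are made up, and the query selects the object of the `hasName` row so that it returns `?name`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Subject TEXT, Predict TEXT, Object TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        ("e1", "hasName", "Abraham Lincoln"),
        ("e1", "BornOnDate", "1809-02-12"),
        ("e1", "DiedOnDate", "1865-04-15"),
        ("e2", "hasName", "Another Person"),      # hypothetical row
        ("e2", "BornOnDate", "1809-02-12"),       # hypothetical row
    ],
)

# One alias of T per triple pattern: three patterns need a three-way self-join.
SELF_JOIN = """
SELECT T3.Object
FROM T AS T1, T AS T2, T AS T3
WHERE T1.Predict = 'BornOnDate' AND T1.Object = '1809-02-12'
  AND T2.Predict = 'DiedOnDate' AND T2.Object = '1865-04-15'
  AND T3.Predict = 'hasName'
  AND T1.Subject = T2.Subject AND T2.Subject = T3.Subject
"""
names = [row[0] for row in conn.execute(SELF_JOIN)]
```

Each additional triple pattern in the SPARQL query adds another alias of `T` and another join condition.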

Page 10: gStore: Answering SPARQL Queries Via Subgraph Matching

Existing Solutions

Three categories of solutions have been proposed to speed up query processing:

1. Property Table: Jena [K. Wilkinson et al. SWDB 03], …

2. Vertically Partitioned Solution: SW-store [D. J. Abadi et al. VLDB 07], …

3. Exhaustive Indexing: RDF-3x [T. Neumann et al. VLDB 08], Hexastore [C. Weiss et al. VLDB 08], …

Page 11: gStore: Answering SPARQL Queries Via Subgraph Matching

Existing Solutions-Property Table

SPARQL Query:

    SELECT ?name WHERE {
      ?m <hasName> ?name .
      ?m <BornOnDate> "1809-02-12" .
      ?m <DiedOnDate> "1865-04-15" .
    }

SQL:

    SELECT People.hasName
    FROM People
    WHERE People.BornOnDate = '1809-02-12'
      AND People.DiedOnDate = '1865-04-15'

Reducing # of join steps
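
A small `sqlite3` sketch of the property-table idea, assuming a `People` table with one column per predicate as in the SQL above. The `Subject` column, the identifiers `e1` and `e2`, and the second row are hypothetical; the `NULL` value only illustrates that entities lacking a property leave cells empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE People ("
    "Subject TEXT PRIMARY KEY, hasName TEXT, BornOnDate TEXT, DiedOnDate TEXT)"
)
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?, ?)",
    [
        ("e1", "Abraham Lincoln", "1809-02-12", "1865-04-15"),
        ("e2", "Another Person", "1809-02-12", None),  # hypothetical, no death date
    ],
)

# All three triple patterns share the subject ?m, so one row answers them
# and no join is needed.
rows = conn.execute(
    "SELECT hasName FROM People WHERE BornOnDate = ? AND DiedOnDate = ?",
    ("1809-02-12", "1865-04-15"),
).fetchall()
```

The join disappears only because every pattern hangs off the same subject; patterns that connect different subjects would still need joins between rows.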

Page 12: gStore: Answering SPARQL Queries Via Subgraph Matching

Existing Solutions-Vertically Partitioned Solution

Fast Merge Join
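
A sketch of why vertical partitioning allows a merge join, assuming each predicate is stored as its own two-column (subject, object) table sorted by subject. The function name `merge_join`, the table contents, and the identifiers `e1` to `e3` are hypothetical.

```python
def merge_join(left, right):
    """Join two (subject, object) lists that are both sorted by subject."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            subject = left[i][0]
            # Find the run of rows sharing this subject on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == subject:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == subject:
                j_end += 1
            for _, left_obj in left[i:i_end]:
                for _, right_obj in right[j:j_end]:
                    out.append((subject, left_obj, right_obj))
            i, j = i_end, j_end
    return out


# One table per predicate, sorted by subject (toy data).
born_on_date = [("e1", "1809-02-12"), ("e2", "1809-02-12"), ("e3", "1900-01-01")]
died_on_date = [("e1", "1865-04-15"), ("e3", "1950-01-01")]

joined = merge_join(born_on_date, died_on_date)
```

Because both inputs are already ordered by subject, the join is a single linear pass over the two tables.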

Page 13: gStore: Answering SPARQL Queries Via Subgraph Matching

Existing Solutions- Exhaustive-Indexing

Each SPARQL query statement can be translated into one “range query”.

SPARQL Query:

    SELECT ?name WHERE {
      ?m <hasName> ?name .
      ?m <BornOnDate> "1809-02-12" .
      ?m <DiedOnDate> "1865-04-15" .
    }

Range query & Merge Join
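
The range-query idea can be sketched with sorted Python lists standing in for the permutation indexes. Only two orderings (SPO and POS) are built here, `range_scan` is a hypothetical helper, the data is made up, and a set intersection stands in for the merge join to keep the example short.

```python
from bisect import bisect_left

TRIPLES = [
    ("e1", "hasName", "Abraham Lincoln"),
    ("e1", "BornOnDate", "1809-02-12"),
    ("e1", "DiedOnDate", "1865-04-15"),
    ("e2", "hasName", "Another Person"),   # hypothetical entity
    ("e2", "BornOnDate", "1809-02-12"),
]

# Two of the possible orderings, each kept sorted.
SPO = sorted(TRIPLES)
POS = sorted((p, o, s) for s, p, o in TRIPLES)


def range_scan(index, prefix):
    """Return the contiguous run of entries in a sorted index starting with `prefix`."""
    start = bisect_left(index, prefix)
    end = start
    while end < len(index) and index[end][:len(prefix)] == prefix:
        end += 1
    return index[start:end]


# Each triple pattern with two constants becomes one range scan.
born = {s for _, _, s in range_scan(POS, ("BornOnDate", "1809-02-12"))}
died = {s for _, _, s in range_scan(POS, ("DiedOnDate", "1865-04-15"))}

# Intersect on ?m (standing in for the merge join), then look up the name.
names = [
    o
    for s in sorted(born & died)
    for _, _, o in range_scan(SPO, (s, "hasName"))
]
```

With every ordering of (subject, predicate, object) indexed, any pattern with bound positions maps to a prefix of some index, at the cost of storing the data several times.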

Page 14: gStore: Answering SPARQL Queries Via Subgraph Matching

Some Limitations

1. Difficult to handle “wildcard queries”.

2. Difficult to handle updates.

Page 15: gStore: Answering SPARQL Queries Via Subgraph Matching

Outline

• Background & Related Work

• Overview of gStore

• Encoding Technique

• VS*-tree & Query Algorithm

• Experiments

• Conclusions

Page 16: gStore: Answering SPARQL Queries Via Subgraph Matching

Intuition of gStore

Finding Matches over a Large Graph is not a trivial task.

Page 17: gStore: Answering SPARQL Queries Via Subgraph Matching

Preliminaries

[Figure legend: Entity Vertex, Literal Vertex]

Page 18: gStore: Answering SPARQL Queries Via Subgraph Matching

Storage Schema in gStore

Encoding all neighbors into a “bit-string”, called a signature.

Page 19: gStore: Answering SPARQL Queries Via Subgraph Matching

Encoding Technique (1)

( hasName, “Abraham Lincoln” )

    hasName                           0010 0000 0000

    “Abr”, “bra”, “rah”, “aha”, …     0000 0010 0000 0000
                                      1000 0000 0000 0000
                                      0000 0000 0100 0000
                                      0000 0000 0000 0001
    OR                                1000 0010 0100 0001

( BornOnDate, “1809-02-12” )          0100 0000 0000 0100 0010 0100 1000

( DiedOnDate, “1865-04-15” )          0000 1000 0000 0000 0010 0100 0000

( DiedIn, “y:Washington_D.c” )        0000 0010 0000 1000 0010 0100 0001

OR                                    0000 0010 0000 1100 0010 0100 1001
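
The encoding step can be sketched as follows: hash the edge label to one bit, hash each 3-gram of the neighbour label to one bit, and OR everything together. The 12-bit and 16-bit widths are read off the bit strings on the slide; the hash function (`zlib.crc32`), the one-bit-per-item choice, and all function names are assumptions, so the bit patterns produced will not match the slide's.

```python
import zlib

EDGE_BITS, LABEL_BITS = 12, 16  # widths assumed from the slide's bit strings


def one_bit(text, width):
    """Hash `text` to a bit string of `width` bits with exactly one bit set."""
    return 1 << (zlib.crc32(text.encode("utf-8")) % width)


def ngrams(text, n=3):
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def label_signature(label):
    """OR together the one-bit hashes of every 3-gram of the label."""
    signature = 0
    for gram in ngrams(label):
        signature |= one_bit(gram, LABEL_BITS)
    return signature


def edge_signature(edge_label, neighbour):
    """Concatenate the edge-label bits with the neighbour-label bits."""
    return (one_bit(edge_label, EDGE_BITS) << LABEL_BITS) | label_signature(neighbour)


def vertex_signature(edges):
    """OR the signatures of all (edge_label, neighbour) pairs of one vertex."""
    signature = 0
    for edge_label, neighbour in edges:
        signature |= edge_signature(edge_label, neighbour)
    return signature


EDGES = [
    ("hasName", "Abraham Lincoln"),
    ("BornOnDate", "1809-02-12"),
    ("DiedOnDate", "1865-04-15"),
    ("DiedIn", "y:Washington_D.c"),
]
vsig = vertex_signature(EDGES)
print(format(vsig, "028b"))
```

Because every 3-gram of a substring such as "Lincoln" is also a 3-gram of "Abraham Lincoln", the substring's signature is covered by the full label's signature, which is what lets this style of encoding serve wildcard queries.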

Page 20: gStore: Answering SPARQL Queries Via Subgraph Matching

Encoding Technique (2)

Page 21: gStore: Answering SPARQL Queries Via Subgraph Matching

Encoding Technique (3)

Page 22: gStore: Answering SPARQL Queries Via Subgraph Matching

Outline

• Background & Related Work

• Overview of gStore

• Encoding Technique

• VS-tree & Query Algorithm

• Experiments

• Conclusions

Page 23: gStore: Answering SPARQL Queries Via Subgraph Matching

A Straightforward Solution (1)

Candidate lists for query vertices u1 and u2:

u1 -> L1: 001, 004, 006

u2 -> L2: 002, 003, 006

Page 24: gStore: Answering SPARQL Queries Via Subgraph Matching

A Straightforward Solution (2)

L1: 001, 004, 006

L2: 002, 003, 006

Large Join Space!
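
A minimal sketch of this straightforward approach, under stated assumptions: a data vertex is kept as a candidate for a query vertex when its signature covers the query signature (bitwise AND leaves the query signature unchanged), and the candidate lists are then joined on the edges of the data graph. All signatures and edges below are made-up toy values, chosen so that the candidate lists come out as L1 and L2 above; the function names are hypothetical.

```python
# Toy 8-bit vertex signatures (hypothetical values).
VERTEX_SIGS = {
    "001": 0b1000_1001,
    "002": 0b0001_0100,
    "003": 0b0101_0000,
    "004": 0b0000_1010,
    "005": 0b0100_0001,
    "006": 0b0001_1000,
}
# Toy directed edges of the data graph (hypothetical).
EDGES = {("001", "002"), ("004", "005"), ("006", "003")}

U1_SIG = 0b0000_1000  # signature of query vertex u1 (assumed)
U2_SIG = 0b0001_0000  # signature of query vertex u2 (assumed)


def candidates(query_sig, vertex_sigs):
    """Scan every data vertex and keep those whose signature covers the query's."""
    return sorted(
        v for v, sig in vertex_sigs.items() if (sig & query_sig) == query_sig
    )


def join(list1, list2, edges):
    """Check every pair from the two candidate lists against the data edges."""
    return [(a, b) for a in list1 for b in list2 if (a, b) in edges]


L1 = candidates(U1_SIG, VERTEX_SIGS)
L2 = candidates(U2_SIG, VERTEX_SIGS)
join_space = len(L1) * len(L2)   # pairs the join has to examine
matches = join(L1, L2, EDGES)
```

The signature test only filters: it can admit false positives, so the join against the actual edges is still required, and its cost grows with the product of the candidate list sizes.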

Page 25: gStore: Answering SPARQL Queries Via Subgraph Matching

VS-tree

Page 26: gStore: Answering SPARQL Queries Via Subgraph Matching

Pruning Technique

[Figure: pruning for query vertices u1 and u2. Recoverable labels: d_1^3, d_4^3, d_2^3, G^3, 10010, data vertices 001, 004, 006 and 002, 003, 006, G*]

Reduced Join Space!
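
The pruning idea can be approximated with a plain signature tree: each internal node stores the OR of its children's signatures, so a whole subtree can be skipped when the node's signature does not cover the query signature. This is a simplified sketch, not the VS-tree itself, which carries additional structure beyond what is modelled here; the class `Node`, the grouping of leaves, and all signatures are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    signature: int
    vertex: Optional[str] = None            # set only on leaves
    children: List["Node"] = field(default_factory=list)


def make_parent(children):
    """An internal node's signature is the OR of its children's signatures."""
    signature = 0
    for child in children:
        signature |= child.signature
    return Node(signature, children=children)


def search(node, query_sig):
    """Return leaf vertices whose signature covers `query_sig`, pruning subtrees."""
    if (node.signature & query_sig) != query_sig:
        return []                            # nothing below can match
    if node.vertex is not None:
        return [node.vertex]
    found = []
    for child in node.children:
        found.extend(search(child, query_sig))
    return found


# Toy leaves with made-up 8-bit signatures, grouped arbitrarily in pairs.
leaves = {
    "001": 0b1000_1001, "004": 0b0000_1010,
    "002": 0b0001_0100, "003": 0b0101_0000,
    "005": 0b0100_0001, "006": 0b0001_1000,
}
group_a = make_parent([Node(leaves["001"], "001"), Node(leaves["004"], "004")])
group_b = make_parent([Node(leaves["002"], "002"), Node(leaves["003"], "003")])
group_c = make_parent([Node(leaves["005"], "005"), Node(leaves["006"], "006")])
root = make_parent([group_a, group_b, group_c])

u1_candidates = search(root, 0b0000_1000)
```

For the query signature used here, `group_b` is rejected at the internal node, so its two leaves are never inspected.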

Page 27: gStore: Answering SPARQL Queries Via Subgraph Matching

An Example for Pruning Effect

Query:

    ?x1 y:hasGivenName ?x5
    ?x1 y:hasFamilyName ?x6
    ?x1 rdf:type <wordnet_scientist_110560637>
    ?x1 y:bornIn ?x2
    ?x1 y:hasAcademicAdvisor ?x4
    ?x2 y:locatedIn <Switzerland>
    ?x3 y:locatedIn <Germany>
    ?x4 y:bornIn ?x3

    Vertex   Before Pruning   After Pruning
    x1       810              810
    x2       424              197
    x3       66               66
    x4       36187            6686

Page 28: gStore: Answering SPARQL Queries Via Subgraph Matching

Query Algorithm-Top-Down

Page 29: gStore: Answering SPARQL Queries Via Subgraph Matching

Outline

• Background & Related Work

• Overview of gStore

• Encoding Technique

• VS*-tree & Query Algorithm

• Experiments

• Conclusions

Page 30: gStore: Answering SPARQL Queries Via Subgraph Matching

Datasets

    Dataset   Triple #     Size
    Yago      20 million   3.1 GB
    DBLP      8 million    0.8 GB

Page 31: gStore: Answering SPARQL Queries Via Subgraph Matching

Exact Queries

Page 32: gStore: Answering SPARQL Queries Via Subgraph Matching

Wildcard Queries

Page 33: gStore: Answering SPARQL Queries Via Subgraph Matching

Outline

• Background & Related Work

• Overview of gStore

• Encoding Technique

• VS*-tree & Query Algorithm

• Experiments

• Conclusions

Page 34: gStore: Answering SPARQL Queries Via Subgraph Matching

Conclusions

• Vertex Encoding Technique;

• An Efficient Index Structure: VS-tree;

• A Novel Filtering Technique.

Page 36: gStore: Answering SPARQL Queries Via Subgraph Matching

Updates- Insertion in G*

Page 37: gStore: Answering SPARQL Queries Via Subgraph Matching

Updates- Insertion in VS*-tree

Page 38: gStore: Answering SPARQL Queries Via Subgraph Matching

Updates- Deletion in VS*-tree

To be deleted

Page 39: gStore: Answering SPARQL Queries Via Subgraph Matching

Framework in gStore

Page 40: gStore: Answering SPARQL Queries Via Subgraph Matching

A Straightforward Solution (1)

u: 0000 1000

u & 001 = u