Review: Scalable Semantic Web Data Management Using Vertical Partitioning

Abadi, Marcus, Madden, Hollenbach, VLDB 2007

Presented by: {Gui}llermo Cabrera, The University of Texas at Austin

Presented in the Semantic Web, Ontologies and the Cloud class, Department of Computer Science, The University of Texas at Austin, Spring 2010.

Page 2

Problem

Storage Goal

RDBMS use

RDF Physical Organization

Column Store vs. Row Store

Materialized Path Expressions

Experiment & Results

Discussion

Page 3

Performance: a single triples table forces many self-joins

Scalability: data sets contain very many triples

Page 4

Achieve scalability and performance in triple storage

Survey RDBMS-based approaches

Show the benefits of vertical partitioning and column stores

Page 5

Triple store: 1 table with 3 indexed columns (subject, property, object)?

Multi-layer architecture

◦ Translate -> Optimize -> Execute

Mapping tables for long URIs and literals

Jena, Oracle, Sesame, 3store (Hyunjun), Hexastore (Donghyuk)
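A minimal sketch of the single-table triple store with a mapping table, using Python's built-in `sqlite3` module as a stand-in for the RDBMS. The table names (`triples`, `dictionary`), the helper functions and the second author are hypothetical. Long URIs and literals are mapped to integer IDs, and even a two-hop question (books whose authors were born in 1860) becomes a self-join of the one triples table.

```python
# Sketch only: table names, helper names and the DoeJane data are hypothetical.
import sqlite3


def build_triple_store(triples):
    """Load (subject, property, object) strings into a dictionary-encoded triple table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dictionary (id INTEGER PRIMARY KEY, value TEXT UNIQUE)")
    conn.execute("CREATE TABLE triples (subj INTEGER, prop INTEGER, obj INTEGER)")

    def encode(value):
        # Map a long URI or literal to a compact integer ID, adding it if it is new.
        conn.execute("INSERT OR IGNORE INTO dictionary (value) VALUES (?)", (value,))
        return conn.execute(
            "SELECT id FROM dictionary WHERE value = ?", (value,)
        ).fetchone()[0]

    for s, p, o in triples:
        conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (encode(s), encode(p), encode(o)))
    # One index per column: subject, property, object.
    for col in ("subj", "prop", "obj"):
        conn.execute(f"CREATE INDEX idx_{col} ON triples ({col})")
    return conn


def books_by_authors_born_in(conn, year):
    """Two hops over one table: t1 (book, Author, person) joined to t2 (person, wasBorn, year)."""
    sql = """
        SELECT book.value
        FROM triples AS t1
        JOIN triples AS t2 ON t1.obj = t2.subj          -- the self-join
        JOIN dictionary AS book ON book.id = t1.subj
        JOIN dictionary AS p1 ON p1.id = t1.prop
        JOIN dictionary AS p2 ON p2.id = t2.prop
        JOIN dictionary AS born ON born.id = t2.obj
        WHERE p1.value = 'Author' AND p2.value = 'wasBorn' AND born.value = ?
        ORDER BY book.value
    """
    return [row[0] for row in conn.execute(sql, (year,))]


data = [
    ("BookID1", "Author", "http://preamble/FoxJoe"),
    ("BookID2", "Author", "http://preamble/DoeJane"),  # hypothetical second author
    ("http://preamble/FoxJoe", "wasBorn", "1860"),
    ("http://preamble/DoeJane", "wasBorn", "1901"),
]
conn = build_triple_store(data)
print(books_by_authors_born_in(conn, "1860"))  # ['BookID1']
```

Every extra hop in a query adds another copy of `triples` to the join, which is the self-join cost the review lists as the core performance problem.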

Page 6

Property tables

◦ Clustered property table

Denormalize RDF (wider tables)

Clustering algorithm decides which properties share a table

NULL values where a subject lacks a property
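A minimal sketch of how a clustered property table denormalizes triples, in plain Python. The clustering algorithm itself is not shown: the `cluster` argument stands in for its output, and the function name and sample rows are hypothetical. Subjects become rows, clustered properties become columns, missing values become NULL (`None`), and triples outside the cluster stay in a leftover triple table.

```python
# Sketch only: function name and sample data are hypothetical.
from collections import defaultdict


def clustered_property_table(triples, cluster):
    """Pivot (subject, property, object) triples into one wide row per subject.

    `cluster` is a tuple of property names assumed to come from a clustering
    algorithm. A second value for the same (subject, property) overwrites the
    first, so multivalued properties are not handled here.
    """
    rows = defaultdict(dict)
    leftover = []
    for s, p, o in triples:
        if p in cluster:
            rows[s][p] = o
        else:
            leftover.append((s, p, o))
    # dict.get returns None (NULL) when a subject lacks a clustered property.
    table = [(s, *(props.get(p) for p in cluster)) for s, props in sorted(rows.items())]
    return table, leftover


triples = [
    ("BookID1", "Title", "XYZ"),
    ("BookID1", "Author", "Fox, Joe"),
    ("BookID1", "Copyright", "2001"),
    ("BookID2", "Title", "ABC"),
    ("BookID2", "Language", "French"),
]
table, leftover = clustered_property_table(triples, ("Title", "Author", "Copyright"))
for row in table:
    print(row)
print(leftover)
```

Here BookID2 gets two NULL cells, and its `Language` triple falls outside the cluster, so a query touching it would still need the leftover triple table.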

Page 8

Property tables

◦ Property-class tables

Exploit the type property (rdf:type): one table per class of subject

Properties may exist in multiple tables
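A minimal sketch of property-class tables in plain Python, assuming each subject has exactly one type triple. The short property name `type` stands in for rdf:type, and the function name, the "Untyped" bucket and the CD data are hypothetical. It shows how the type value picks the table and why a property such as `Title` can land in several tables.

```python
# Sketch only: names, the "Untyped" bucket and sample data are hypothetical.
from collections import defaultdict

TYPE = "type"  # stands in for rdf:type (short name chosen for readability)


def property_class_tables(triples):
    """Build one table per class, using each subject's type value to pick the table.

    Each class table maps subject -> {property: object}. Assumes one type per
    subject; a later type triple overwrites an earlier one.
    """
    subject_type = {s: o for s, p, o in triples if p == TYPE}
    tables = defaultdict(dict)
    for s, p, o in triples:
        if p == TYPE:
            continue
        cls = subject_type.get(s, "Untyped")  # hypothetical bucket for subjects with no type
        tables[cls].setdefault(s, {})[p] = o
    return dict(tables)


triples = [
    ("ID1", TYPE, "BookType"),
    ("ID1", "Title", "XYZ"),
    ("ID2", TYPE, "CDType"),
    ("ID2", "Title", "Orange"),
    ("ID2", "Artist", "Orr"),
]
tables = property_class_tables(triples)
print(sorted(tables))    # ['BookType', 'CDType']
print(tables["CDType"])  # {'ID2': {'Title': 'Orange', 'Artist': 'Orr'}}
```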

Page 10

Advantage:

◦ Fewer joins

Disadvantage:

◦ NULL values

◦ Multivalued attributes are complicated
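The two disadvantages can be measured directly on a small set of triples. This sketch (hypothetical function names and data, including the co-author) counts the NULL cells a one-column-per-property table would contain and lists the (subject, property) pairs with several values, which cannot fit in one cell.

```python
# Sketch only: function names and sample data are hypothetical.
from collections import Counter


def null_cells(triples, columns):
    """Count the NULL cells in a wide table with one row per subject and the given columns."""
    present = {(s, p) for s, p, _ in triples if p in columns}
    subjects = {s for s, _ in present}
    return len(subjects) * len(columns) - len(present)


def multivalued_pairs(triples):
    """List (subject, property) pairs with more than one value.

    These do not fit in a single cell of a one-column-per-property table;
    they need duplicated rows or a separate table.
    """
    counts = Counter((s, p) for s, p, _ in triples)
    return sorted(pair for pair, n in counts.items() if n > 1)


triples = [
    ("BookID1", "Author", "Fox, Joe"),
    ("BookID1", "Author", "Doe, Jane"),  # hypothetical co-author
    ("BookID1", "Title", "XYZ"),
    ("BookID2", "Title", "ABC"),
]
print(null_cells(triples, ("Title", "Author")))  # 1
print(multivalued_pairs(triples))                # [('BookID1', 'Author')]
```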

Page 11

Vertical partitioning

◦ n two-column (subject, object) tables, n = # of unique properties

◦ Each table sorted by subject

Sorting by subject enables merge joins
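A minimal sketch of vertical partitioning and a subject merge join in plain Python (function names and sample data are hypothetical). Each property gets its own subject-sorted list of (subject, object) pairs, and two properties of the same subject are combined with a single forward pass over both lists.

```python
# Sketch only: function names and sample data are hypothetical.
from collections import defaultdict


def vertically_partition(triples):
    """Build one two-column (subject, object) table per unique property, sorted by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


def merge_join(left, right):
    """Join two subject-sorted (subject, object) tables on subject in one merge pass."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of rows for this subject on each side (multivalued properties).
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rs:
                j_end += 1
            for _, lobj in left[i:i_end]:
                for _, robj in right[j:j_end]:
                    out.append((ls, lobj, robj))
            i, j = i_end, j_end
    return out


triples = [
    ("ID1", "Title", "XYZ"), ("ID2", "Title", "ABC"), ("ID3", "Title", "MNO"),
    ("ID1", "Language", "English"), ("ID3", "Language", "French"),
]
tables = vertically_partition(triples)
print(len(tables))  # 2 tables, one per unique property
print(merge_join(tables["Title"], tables["Language"]))
# [('ID1', 'XYZ', 'English'), ('ID3', 'MNO', 'French')]
```

A query that touches only `Title` and `Language` reads only those two tables; any other property tables are never scanned.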

Page 13

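• Advantages

Multivalued attributes supported (one row per value)

No clustering algorithm needed (unlike property tables)

Only the properties a query accesses are read

• Disadvantages

Queries using multiple properties need table joins

Inserts are expensive (a subject's data is spread across many tables)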

Page 14

Triple Store

Property Table

Vertical Partition (Row Store)

Vertical Partition (Column Store)

Page 15

Why a column store?

Projection is free

Tuple headers (per-row metadata)

◦ 35 bytes in Postgres vs. 8 bytes in C-Store

Column-oriented compression

◦ Run-length encoding (e.g., 1,1,1,2,2 → 1x3, 2x2)

Optimized merge join

◦ Prefetching
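The run-length encoding example above can be reproduced with `itertools.groupby` from the Python standard library. A column sorted by subject tends to hold long runs of the same value, which is where this encoding pays off. The function names are hypothetical, and this only illustrates the encoding, not C-Store's actual on-disk format.

```python
# Sketch only: illustrates run-length encoding, not C-Store's storage format.
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, n in pairs for _ in range(n)]


column = [1, 1, 1, 2, 2]
encoded = rle_encode(column)
print(encoded)  # [(1, 3), (2, 2)]
```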

Page 16

<BookID1, Author, http://preamble/FoxJoe>

<http://preamble/FoxJoe, wasBorn, “1860”>

Find all books whose authors were born in 1860
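This query can be sketched over two vertically partitioned tables, Author and wasBorn. The join matches the object of Author against the subject of wasBorn (a subject-object join), so it cannot be answered by merging on the subject-sorted order alone; this sketch uses a hash-style set lookup instead. Function and table names and the second book and author are hypothetical.

```python
# Sketch only: names and the BookID2/DoeJane data are hypothetical.
def books_by_birth_year(author_table, was_born_table, year):
    """Answer 'books whose authors were born in <year>' over two property tables.

    author_table:   (book, author_uri) pairs, the Author partition
    was_born_table: (author_uri, year) pairs, the wasBorn partition
    Author's object must equal wasBorn's subject (a subject-object join).
    """
    authors_born = {a for a, y in was_born_table if y == year}
    return sorted(book for book, a in author_table if a in authors_born)


author_table = [
    ("BookID1", "http://preamble/FoxJoe"),
    ("BookID2", "http://preamble/DoeJane"),  # hypothetical second book and author
]
was_born_table = [
    ("http://preamble/FoxJoe", "1860"),
    ("http://preamble/DoeJane", "1901"),
]
print(books_by_birth_year(author_table, was_born_table, "1860"))  # ['BookID1']
```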

Page 18

Barton Libraries Dataset

Longwell Queries

◦ Calculating counts

◦ Filtering

◦ Inference
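As an illustration of a "calculating counts" query, here is a sketch that counts how often each value of one property occurs (for example, each type). With vertical partitioning only that property's table has to be scanned. The function name and table contents are hypothetical and are not taken from the Barton data.

```python
# Sketch only: function name and table contents are hypothetical.
from collections import Counter


def count_by_value(property_table):
    """Count each distinct object value in one (subject, object) property table."""
    return Counter(obj for _, obj in property_table)


type_table = [("ID1", "Text"), ("ID2", "Text"), ("ID3", "Audio")]
print(count_by_value(type_table).most_common())  # [('Text', 2), ('Audio', 1)]
```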

Page 19

8.3 GB – Triple Store (Postgres)

14 GB – Property Table (Postgres)

5.2 GB – Vertically Partitioned (Postgres)

2.7 GB – Vertically Partitioned (C-store)

Sizes include indices and the mapping table
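Putting the four sizes on one scale makes the comparison easier to read. This short calculation uses only the figures listed above and expresses each as a multiple of the Postgres triple store; the dictionary keys are labels chosen for this sketch.

```python
# Sizes from the slide (GB), including indices and the mapping table.
sizes_gb = {
    "Triple Store (Postgres)": 8.3,
    "Property Table (Postgres)": 14.0,
    "Vertically Partitioned (Postgres)": 5.2,
    "Vertically Partitioned (C-Store)": 2.7,
}
baseline = sizes_gb["Triple Store (Postgres)"]
for name, size in sizes_gb.items():
    # Ratio relative to the triple store, rounded to two decimals.
    print(f"{name}: {size / baseline:.2f}x the triple store")
```

The ratios come out to 1.00, 1.69, 0.63 and 0.33: the C-Store layout is roughly a third of the triple store's size, while the property tables are about 1.7 times larger.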

Page 23

Replace subject-object joins with subject-subject joins
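A sketch of the idea in plain Python, assuming hypothetical Author, wasBorn and Title property tables: the Author/wasBorn path is joined once and stored as its own subject-sorted (book, birth year) table, after which a query that also needs a book property such as the title joins on the book subject only. Function names, titles and the second book are hypothetical, and this only mimics the effect in application code.

```python
# Sketch only: names, titles and the BookID2/DoeJane data are hypothetical.
def materialize_path(author_table, was_born_table):
    """Precompute the Author/wasBorn path as a (book, birth_year) table sorted by book.

    The subject-object join runs once, ahead of query time.
    Assumes one birth year per author.
    """
    born = dict(was_born_table)
    return sorted((book, born[a]) for book, a in author_table if a in born)


def titles_by_author_birth_year(title_table, path_table, year):
    """Subject-subject join: both tables are keyed by the book subject."""
    books = {b for b, y in path_table if y == year}
    return sorted((b, t) for b, t in title_table if b in books)


author_table = [("BookID1", "http://preamble/FoxJoe"), ("BookID2", "http://preamble/DoeJane")]
was_born_table = [("http://preamble/FoxJoe", "1860"), ("http://preamble/DoeJane", "1901")]
title_table = [("BookID1", "XYZ"), ("BookID2", "ABC")]

path_table = materialize_path(author_table, was_born_table)
print(path_table)  # [('BookID1', '1860'), ('BookID2', '1901')]
print(titles_by_author_birth_year(title_table, path_table, "1860"))  # [('BookID1', 'XYZ')]
```

The trade-off is extra storage for every materialized path, in exchange for removing the subject-object join from query time.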

Page 24

Add 60 integer-valued columns

7 GB increase in size

Page 25

Great for reads; writes not considered

What about load times?

What about another benchmark (e.g., LUBM)?

Native XML databases for RDF/XML?

Test the triple store in Sesame
