
CROSS Symposium 2017

SkyhookDB: leveraging programmable storage toward DB elasticity

CROSS Incubator Project

PI: Jeff LeFevre ([email protected])


Growing your database

[Figure: a DB application on top of its DB storage]


Larger single machine route

[Figure: a single DB application with one DB storage volume]

● Static allocation / coarse upgrade path
● Storage variants
  ○ RAID, SAN, EBS
● Benefit: simple architecture
● Challenges
  ○ Single DB app does all processing
  ○ Scaling (**TidalScale)

Examples: MySQL, PostgreSQL, MS SQL Server, etc.


Classical MPP route

[Figure: multiple DB application nodes, each with its own DB storage]

● Static allocation / coarse upgrade path
● Storage variants
  ○ RAID, NAS, EBS
  ○ Specialized file systems
● Benefit: parallelism
● Challenges
  ○ Vendor lock-in
  ○ Complexity of multi-node coordination
  ○ $$

Examples: Teradata, Oracle RAC, IBM, Vertica; newer: Redshift, Postgres-XL


“Hybrid” MPP route

[Figure: data sharded across multiple independent DB instances, each with its own DB storage]

● Static allocation / coarse upgrade path
● Data distributed among multiple independent DB instances
● Benefits
  ○ Simpler than classical MPP
● Challenges
  ○ Multi-shard transactions are difficult
  ○ Data still tied to a node

Examples: CitusData (Postgres), Greenplum, Postgres FDW + PMPP, other sharded databases


Many other approaches...

• AWS Redshift Spectrum (ParAccel)
  – MPP over local storage + query S3 data in place
• Apache Kudu (Cloudera)
  – Hadoop ecosystem; Impala, Spark
• Snowflake Computing
  – Dynamic MPP over S3 buckets in AWS
• OpenStack Swift “Storlets” project (Scoop)


Growing your database: SkyhookDB

[Figure: a single DB application node (CPU) on top of DB storage in which every storage node also has a CPU available for processing]


Elasticity: SkyhookDB

[Figure: a single PostgreSQL node in front of a Ceph distributed object store; as the store grows or shrinks, the CPUs on its storage nodes are used for query processing]


SkyhookDB Guiding Principles

• Leverage and extend storage capabilities
  – Processing++ {elasticity*, indexing, stats, local optimizations}
• Single DB node architecture
  – Avoid the complexity of MPP, still get the parallelism
• Import trust by building only upon trusted software
  – PostgreSQL, Ceph
• Keep approach generic enough for other applications
  – Allows for application-specific customizations

*marketing benefits


Key features enabling elasticity

1. Intrinsic elasticity of the storage system
   – Distributed storage systems are dynamic!
2. Repartition data dynamically (see the sketch after this list)
   – Subdivide data partitions for elasticity and load balancing/skew
3. Scale-out I/O and compute
   – Remote data processing + metadata management
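A minimal, storage-agnostic sketch of the repartitioning idea in item 2 above: when a partition object grows past a threshold (or becomes a hotspot), its rows are split into two smaller objects that the storage layer can then place independently. The object names, the threshold, and the dict standing in for the object store are illustrative assumptions, not SkyhookDB code.

# Illustrative sketch only (not SkyhookDB's implementation): split an oversized
# partition object into two halves so placement and load can be rebalanced.
MAX_ROWS_PER_OBJECT = 100_000   # illustrative threshold

def maybe_split(object_name, rows, store):
    """rows: list of row tuples for one partition; store: dict standing in for the object store."""
    if len(rows) <= MAX_ROWS_PER_OBJECT:
        store[object_name] = rows
        return [object_name]
    mid = len(rows) // 2
    left, right = object_name + ".a", object_name + ".b"
    store[left], store[right] = rows[:mid], rows[mid:]
    store.pop(object_name, None)           # retire the old, oversized object
    return [left, right]

store = {}
parts = maybe_split("lineitem1B.obj.0042",
                    [(i, i % 7, 100.0 + i) for i in range(250_000)], store)
print(parts)    # ['lineitem1B.obj.0042.a', 'lineitem1B.obj.0042.b']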


Architecture

[Architecture figure: a client DBMS application runs on PostgreSQL with a Foreign Data Wrapper (repos: skyhook-db, skyhook-fdw); Ceph extensions (repo: skyhook-kraken) are reached through a Storage Access Layer API; each storage server runs an OS and an Object Storage Device, and together the servers form the Ceph distributed object store]
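To make the client path concrete, here is a hedged sketch of attaching a storage-backed table through PostgreSQL's foreign data wrapper layer and querying it from a client. Only the generic FDW DDL shapes and the psycopg2 calls are standard; the extension name (skyhook_fdw), server name, pool/object OPTIONS, database name, and the abbreviated column list are assumptions for illustration.

# Sketch: attach a storage-backed foreign table and query it (psycopg2 client).
# Extension/server names and OPTIONS are hypothetical; DDL shapes are standard PostgreSQL.
import psycopg2

conn = psycopg2.connect("dbname=skyhook_demo")                      # hypothetical DB name
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS skyhook_fdw")           # hypothetical FDW extension
cur.execute("""
    CREATE SERVER IF NOT EXISTS skyhook_ceph
        FOREIGN DATA WRAPPER skyhook_fdw
        OPTIONS (pool 'lineitem_pool')                              -- hypothetical option
""")
cur.execute("""
    CREATE FOREIGN TABLE IF NOT EXISTS lineitem1B (                 -- column list abbreviated
        l_orderkey      bigint,
        l_linenumber    int,
        l_extendedprice numeric,
        l_comment       text
    ) SERVER skyhook_ceph
      OPTIONS (num_objects '10000')                                 -- hypothetical option
""")
conn.commit()

# Ordinary SQL from here on; filters/projections may be pushed down into the OSDs.
cur.execute("SELECT count(*) FROM lineitem1B WHERE l_extendedprice > 71000.0")
print(cur.fetchone())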

SkyhookDB Architecture

● OSDs store and manage objects
  ○ Database table partitions (see the sketch after the OSD figure)
● Objects are extensible
  ○ Remote filter, projection
● OSDs have an embedded KV store (RocksDB)
  ○ Query-able metadata
● ALL data, metadata, and indexes are stored in objects

Object Storage Device (OSD): Ceph

[Figure: a Ceph OSD with an embedded RocksDB instance managing its set of objects]
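As a concrete (and heavily simplified) illustration of "table partitions stored as objects", the sketch below writes one small partition into a Ceph pool with the stock python-rados bindings and reads it back. The pool name, object naming scheme, and CSV-style row encoding are assumptions; SkyhookDB's actual on-object format, object-class methods, and use of the OSD-local RocksDB are not shown.

# Sketch only: one table partition written as a named RADOS object (python-rados).
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("skyhook_pool")            # hypothetical pool name

rows = [(5, 3, 71100.50, "slyly furious deposits"),   # (l_orderkey, l_linenumber, price, comment)
        (6, 1, 43211.25, "quietly regular packages")]
payload = "\n".join(",".join(map(str, r)) for r in rows).encode()   # illustrative encoding

ioctx.write_full("lineitem1B.obj.0000", payload)      # one partition -> one object
print(ioctx.read("lineitem1B.obj.0000", length=len(payload)).decode())

ioctx.close()
cluster.shutdown()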


Design Details

[Figure: a DB table's rows are split into N partitions, stored as object-1 … object-n; each Ceph OSD pairs its object storage with a local RocksDB instance]

Ceph OSD query-able metadata (illustrated in the sketch below):
● Index values
● Counts
● Min/max/size
● Stats, etc.
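The toy example below models the query-able metadata idea in plain Python: each object carries min/max/count statistics for a column, so a range predicate can skip whole objects without reading their rows. In SkyhookDB this metadata lives in the OSD-local key/value store (RocksDB); the dictionaries here are only a stand-in.

# Stand-in for per-object metadata (min/max/count) kept in the OSD-local KV store.
objects = {
    "obj.0": {"min": 50_000.0, "max": 62_000.0, "count": 2,
              "rows": [(1, 50_000.0), (2, 62_000.0)]},
    "obj.1": {"min": 70_500.0, "max": 90_000.0, "count": 2,
              "rows": [(3, 70_500.0), (4, 90_000.0)]},
    "obj.2": {"min": 95_000.0, "max": 99_000.0, "count": 2,
              "rows": [(5, 95_000.0), (6, 99_000.0)]},
}

def range_query(threshold):
    """SELECT * WHERE l_extendedprice > threshold, skipping non-overlapping objects."""
    hits, skipped = [], 0
    for name, obj in objects.items():
        if obj["max"] <= threshold:     # metadata proves no row can qualify
            skipped += 1
            continue                    # object skipped without touching its rows
        hits += [r for r in obj["rows"] if r[1] > threshold]
    return hits, skipped

print(range_query(71_000.0))            # obj.0 is skipped via metadata alone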

DB-specific customizations

• Object functionality
  – SQL processing (pushdowns; sketch below)
  – Complex functions (application specific)
  – Repartition/move (copy partial rows)
• Collection-level functionality
  – Indexing at object or collection level
  – IO batching, request reordering (local state)
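A plain-Python stand-in for the filter/projection pushdown in the first bullet: a function playing the role of a storage-side object method evaluates the predicate next to the data and returns only the projected columns of matching rows, so the single DB node and the network see far less data. The real mechanism is a Ceph object-class extension; only the shape of the interface is sketched here.

# Stand-in for a storage-side object method: filter + project next to the data.
def object_scan(rows, predicate, projection):
    """rows: decoded rows of one partition object; runs where the object lives."""
    return [tuple(row[i] for i in projection) for row in rows if predicate(row)]

# Example: SELECT l_extendedprice WHERE l_orderkey = 5 AND l_linenumber = 3
rows = [(5, 3, 71100.50, "slyly furious"),
        (5, 4, 100.00, "bold ideas"),
        (6, 1, 43211.25, "quiet packages")]
result = object_scan(rows,
                     predicate=lambda r: r[0] == 5 and r[1] == 3,
                     projection=[2])     # index of l_extendedprice in the row tuple
print(result)                            # [(71100.5,)] -- only this crosses the network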


Implementation Status



Breadth-first approach

• Tested prototype functionality at scale
• Try to ensure scalability benefits are clear
  – IO, CPU
  – Collection-level optimizations possible
  – Server-local runtime optimizations
• Many experiments for each aspect
  – Pushdown processing, indexing, repartition/split


But Does it Scale?

• Experiments across many dimensions
  – Data set size
  – Number of machines
  – Number of objects per dataset
  – Repartition + new query time
  – Index creation per object
  – Number of concurrent users (not yet tested)


Selected Experimental Results


Experimental Setup

• Dataset (rough arithmetic below)
  – TPC-H lineitem table, 1 billion rows, ~140 GB
  – Ceph: 10K objects, ~4 MB each (each with its own index)
• Queries
  – Range query, regex query (LIKE), point query
• Machines
  – CloudLab: 20 cores, 500 GB SSDs, 10 GbE
• Parallelism: 12 IO dispatches per node
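Rough arithmetic implied by the setup above (nothing measured, just the stated row and object counts): each object and its per-object index cover about 100K rows, and a 10% selectivity query touches on the order of 100 million rows overall.

# Back-of-the-envelope figures from the stated setup (no measurements involved).
total_rows  = 1_000_000_000          # 1 billion lineitem rows
num_objects = 10_000                 # ~4 MB objects, each with its own index
print(total_rows // num_objects)     # 100000 rows covered by each per-object index
print(int(total_rows * 0.10))        # ~100000000 rows qualify at 10% selectivity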


Store + Index data


Queries

Qa: Count query with 10% selectivity:
SELECT count(*) FROM lineitem1B WHERE l_extendedprice > 71000.0

Qb: Range query with 10% selectivity:
SELECT * FROM lineitem1B WHERE l_extendedprice > 71000.0

Qd: Point query (unique row), issued with and without index:
SELECT l_extendedprice FROM lineitem1B WHERE l_orderkey=5 AND l_linenumber=3

Qf: Regex query with 10% selectivity:
SELECT * FROM lineitem1B WHERE l_comment ILIKE '%uriously%'

Range query (selectivity=10%)


Point query (unique row)


Regex query (selectivity=10%)


[Results figures compare client-side, server-side, and optimized server-side execution (using cache locality knowledge)]

Parallel Query Plan in Postgres

EXPLAIN SELECT l_extendedprice FROM lineitem1B WHERE l_orderkey=5 AND l_linenumber=3;

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Gather  (cost=1000.00..21420167.15 rows=439 width=8)
   Workers Planned: 9
   ->  Parallel Seq Scan on lineitem1bssd  (cost=0.00..21419123.25 rows=49 width=8)
         Filter: ((l_orderkey = 5) AND (l_linenumber = 3))
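For reference, the parallelism in a plan like the one above is controlled by ordinary PostgreSQL settings. The sketch below shows one way a client might request a comparable plan and print it; the database name is hypothetical, while max_parallel_workers_per_gather and EXPLAIN are standard PostgreSQL.

# Inspect the parallel plan from a client (standard PostgreSQL knobs only).
import psycopg2

conn = psycopg2.connect("dbname=skyhook_demo")            # hypothetical database name
cur = conn.cursor()
cur.execute("SET max_parallel_workers_per_gather = 9")    # standard PostgreSQL setting
cur.execute("""
    EXPLAIN SELECT l_extendedprice FROM lineitem1B
    WHERE l_orderkey = 5 AND l_linenumber = 3
""")
for (line,) in cur.fetchall():                            # EXPLAIN returns one text column
    print(line)                                           # expect Gather over Parallel Seq Scan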

Initial comparison with single-node Postgres (selectivity=10%)

[Results figures: query time vs. number of storage servers (OSDs), for the filter query (l_extendedprice > 71000) and the regex query (l_comment LIKE '%uriously%')]


What’s Next And Challenges

• More functionality
  – Joins in storage (critical for analytics)
  – Multi-shard transactions (several options here)
• Collection-level optimizations
  – IO batching and indexing
• Application-specific customizations
  – Physics or genomics queries?
• Cost-based elasticity policies
  – Minimize time or impact to grow/shrink


Acknowledgements

• Noah Watkins
• CROSS Industry Board members
• Other SRL/CROSS: Ivo, Carlos, Shel, Michael
• More info:
  – https://sites.google.com/site/skyhookdb/
  – https://cross.ucsc.edu
  – [email protected]

References (1)

[Oracle 2016a] Oracle database appliance, Nov 2016.

[Oracle 2012] A technical overview of the Oracle Exadata database machine and Exadata Storage Server, Jun 2012.

[Oracle 2016b] Oracle database appliance evolves as an enterprise private cloud building block, Jan 2016.

[Kee] K. Keeton, D. Patterson, J. Hellerstein, "A Case for Intelligent Disks (IDISKs)", SIGMOD Record, Sep 1998.

[Bor] H. Boral, D. DeWitt, "Database Machines: An Idea Whose Time Has Passed?", Workshop on Database Machines, 1983.

[Woo] L. Woods, Z. Istvan, G. Alonso, "Ibex: An Intelligent Storage Engine with Support for Advanced SQL Offloading", VLDB 2014.

[Teradata 2016] Marketing statement: "Multi-dimensional scalability through independent scaling of processing power and storage capacity via our next generation massively parallel processing (MPP) architecture".

[Sevilla+Watkins 2017] M. Sevilla, N. Watkins, I. Jimenez, P. Alvaro, S. Finkelstein, J. LeFevre, C. Maltzahn, "Malacology: A Programmable Storage System", EuroSys 2017.

[LeFevre 2014] J. LeFevre, J. Sankaranarayanan, H. Hacigumus, J. Tatemura, N. Polyzotis, M. J. Carey, "MISO: Souping Up Big Data Query Processing with a Multistore System", SIGMOD 2014.

[LeFevre 2016] J. LeFevre, R. Liu, C. Inigo, M. Castellanos, L. Paz, E. Ma, M. Hsu, "Building the Enterprise Fabric for Big Data with Vertica and Spark", SIGMOD 2016.

References (2)

[Anantharayanan 2011] Ganesh Anantharayanan, Ali Ghodsi, Scott Shenker, Ion Stoica, "Disk-Locality in Datacenter Computing Considered Irrelevant", USENIX HotOS, May 2011.

[Dageville 2016] Benoit Dageville, Thierry Cruanes, Marcin Zukowski, et al., "The Snowflake Elastic Data Warehouse", SIGMOD 2016.

[FlashGrid 2017] White paper on mission-critical databases in the cloud, "Oracle RAC on Amazon EC2 Enabled by FlashGrid® Software", rev. Jan 6, 2017.

[Gupta 2015] Anurag Gupta, Deepak Agarwal, Derek Tan, et al., "Amazon Redshift and the Case for Simpler Data Warehouses", SIGMOD 2015.

[Srinivasan 2016] Vidhya Srinivasan, "What's New with Amazon Redshift", AWS re:Invent, Nov 2016.

[Dynamo 2017] Amazon DynamoDB, https://aws.amazon.com/dynamodb/, accessed Jan 2017.

[Brantner 2008] Matthias Brantner, Daniela Florescu, David Graf, Donald Kossmann, Tim Kraska, "Building a Database on S3", SIGMOD 2008.

[Shuler 2015] Bill Shuler, "CitusDB Architecture for Real-time Big Data", Jun 2015.

[Citusdata 2017] Citus Data, https://www.citusdata.com, accessed Jan 2017.

[Pavlo 2016] Andrew Pavlo, Matthew Aslett, "What's Really New with NewSQL?", SIGMOD Record, Jun 2016.

References (3)

[Xi 2015] Sam (Likun) Xi, Oreoluwa Babarinsa, Manos Athanassoulis, Stratos Idreos, "Beyond the Wall: Near-Data Processing for Databases", DAMON 2015.

[Kim 2015] Sungchan Kim, Hyunok Oh, Chanik Park, Sangyeun Cho, Sang-Won Lee, Bongki Moon, "In-storage Processing of Database Scans and Joins", Information Sciences, 2015.

[Do 2013] Jaeyoung Do, Yang-Suk Kee, Jignesh M. Patel, Chanik Park, Kwanghyun Park, David J. DeWitt, "Query Processing on Smart SSDs: Opportunities and Challenges", SIGMOD 2013.

[Gkantsidis] Christos Gkantsidis, Dimitrios Vytiniotis, Orion Hodson, Dushyanth Narayanan, Florin Dinu, Antony Rowstron, "Rhea: Automatic Filtering for Unstructured Cloud Storage", NSDI 2013.

[Balasubramonian 2014] Rajeev Balasubramonian, Jichuan Chang, Troy Manning, Jaime H. Moreno, Richard Murphy, Ravi Nair, Steven Swanson, "Near-Data Processing: Insights from a MICRO-46 Workshop", IEEE Micro, Aug 2014.

[Kudu 2017] Apache Kudu, "Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data".

[Kudu 2015] Lipcon et al., "Kudu: Storage for Fast Analytics on Fast Data", white paper.

[Moatti 2017] Moatti et al., "Too Big to Eat: Boosting Analytics Data Ingestion from Object Stores with Scoop", ICDE 2017.


Initial comparison with single-node Postgres (point query)