316
VLDB 2009 Tutorial Column-Oriented Database Systems 1 Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) Column-Oriented Database Systems Part 1: Stavros Harizopoulos (HP Labs) Part 2: Daniel Abadi (Yale) Part 3: Peter Boncz (CWI) VLDB 2009 Tutorial

Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

  • Upload
    lamnga

  • View
    214

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems

1

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Oriented Database Systems

Part 1: Stavros Harizopoulos (HP Labs)

Part 2: Daniel Abadi (Yale)Part 3: Peter Boncz (CWI)

VLDB 2009

Tutorial

Page 2: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 2

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

What is a column-store?

VLDB 2009 Tutorial Column-Oriented Database Systems 2

row-store column-store

Date CustomerProductStore

+ easy to add/modify a record

- might read in unnecessary data

+ only need to read in relevant data

- tuple writes require multiple accesses

=> suitable for read-mostly, read-intensive, large data repositories

Date Store Product Customer Price Price

Page 3: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 3

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Are these two fundamentally different?

l The only fundamental difference is the storage layoutl However: we need to look at the big picture

VLDB 2009 Tutorial Column-Oriented Database Systems 3

‘70s ‘80s ‘90s ‘00s today

row-stores row-stores++ row-stores++

different storage layouts proposed

new applicationsnew bottleneck in hardware

column-stores

converge?

l How did we get here, and where we are headingl What are the column-specific optimizations?l How do we improve CPU efficiency when operating on Cs

Part 2

Part 1

Part 3

Page 4: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 4

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Outline

l Part 1: Basic concepts — Stavrosl Introduction to key featuresl From DSM to column-stores and performance tradeoffsl Column-store architecture overviewl Will rows and columns ever converge?

l Part 2: Column-oriented execution — Daniel

l Part 3: MonetDB/X100 and CPU efficiency — Peter

VLDB 2009 Tutorial Column-Oriented Database Systems 4

Page 5: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 5

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Telco Data Warehousing example

l Typical DW installation

l Real-world example

VLDB 2009 Tutorial Column-Oriented Database Systems 5

usage source

toll

account

star schema

fact table

dimension tables

or RAM

QUERY 2SELECT account.account_number,sum (usage.toll_airtime),sum (usage.toll_price)FROM usage, toll, source, accountWHERE usage.toll_id = toll.toll_idAND usage.source_id = source.source_idAND usage.account_id = account.account_idAND toll.type_ind in (‘AE’. ‘AA’)AND usage.toll_price > 0AND source.type != ‘CIBER’AND toll.rating_method = ‘IS’AND usage.invoice_date = 20051013GROUP BY account.account_number

Column-store Row-storeQuery 1 2.06 300Query 2 2.20 300Query 3 0.09 300Query 4 5.24 300Query 5 2.88 300

Why? Three main factors (next slides)

“One Size Fits All? - Part 2: Benchmarking Results” Stonebraker et al. CIDR 2007

Page 6: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 6

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Telco example explained (1/3):read efficiency

read pages containing entire rows

one row = 212 columns!

is this typical? (it depends)

VLDB 2009 Tutorial Column-Oriented Database Systems 6

row store column store

read only columns needed

in this example: 7 columns

caveats:• “select * ” not any faster• clever disk prefetching• clever tuple reconstruction What about vertical partitioning?

(it does not work with ad-hoc queries)

Page 7: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 7

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Telco example explained (2/3):compression efficiencyl Columns compress better than rows

l Typical row-store compression ratio 1 : 3l Column-store 1 : 10

l Why?l Rows contain values from different domains

=> more entropy, difficult to dense-packl Columns exhibit significantly less entropyl Examples:

l Caveat: CPU cost (use lightweight compression)

VLDB 2009 Tutorial Column-Oriented Database Systems 7

Male, Female, Female, Female, Male1998, 1998, 1999, 1999, 1999, 2000

Page 8: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 8

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Telco example explained (3/3):sorting & indexing efficiencyl Compression and dense-packing free up space

l Use multiple overlapping column collectionsl Sorted columns compress betterl Range queries are fasterl Use sparse clustered indexes

VLDB 2009 Tutorial Column-Oriented Database Systems 8

What about heavily-indexed row-stores?(works well for single column access,cross-column joins become increasingly expensive)

Page 9: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 9

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Additional opportunities for column-stores

l Block-tuple / vectorized processingl Easier to build block-tuple operators

l Amortizes function-call cost, improves CPU cache performance

l Easier to apply vectorized primitivesl Software-based: bitwise operationsl Hardware-based: SIMD

l Opportunities with compressed columnsl Avoid decompression: operate directly on compressedl Delay decompression (and tuple reconstruction)

l Also known as: late materialization

l Exploit columnar storage in other DBMS componentsl Physical design (both static and dynamic)

VLDB 2009 Tutorial Column-Oriented Database Systems 9

Part 3

morein Part 2

See: Database Cracking, from CWI

Page 10: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 10

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Effect on C-Store performance

VLDB 2009 Tutorial Column-Oriented Database Systems 10

“Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Hachem, and Madden. SIGMOD 2008.

Tim

e (s

ec)

Average for SSBM queries on C-store

enablelate

materializationenablecompression &

operate on compressed

originalC-store

column-orientedjoin algorithm

Page 11: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 11

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Summary of column-store key features

l Storage layout

l Execution engine

l Design tools, optimizer

VLDB 2009 Tutorial Column-Oriented Database Systems 11

columnar storage

header/ID elimination

compression

multiple sort orders

column operators

avoid decompression

late materialization

vectorized operations Part 3

Part 2

Part 2

Part 1

Part 1

Part 2

Part 3

Page 12: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 12

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Outline

l Part 1: Basic concepts — Stavrosl Introduction to key featuresl From DSM to column-stores and performance tradeoffsl Column-store architecture overviewl Will rows and columns ever converge?

l Part 2: Column-oriented execution — Daniel

l Part 3: MonetDB/X100 and CPU efficiency — Peter

VLDB 2009 Tutorial Column-Oriented Database Systems 12

Page 13: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 13

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

From DSM to Column-stores

70s -1985:TOD: Time Oriented Database – Wiederhold et al."A Modular, Self-Describing Clinical Databank System," Computers and Biomedical Research, 1975More 1970s: Transposed files, Lorie, Batory,

Page 14: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 14

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cantor - Column Store System early 80s•“An overview of cantor: a new system for data analysis”

Karasalo, Svensson, SSDBM 1983•“The Design of Cantor - a new system for data analysis”

Karasalo, Svensson, SSDBM 1986

Page 15: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 15

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cantor - Column Store System early 80s

J

•“An overview of cantor: a new system for data analysis”Karasalo, Svensson, SSDBM 1983

•“The Design of Cantor - a new system for data analysis”Karasalo, Svensson, SSDBM 1986

1983 vector graphics

J

Page 16: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 16

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cantor - Column Store System early 80s

zero suppression

delta coding

RLE

delta RLE

l Dynamic programming algorithm to choose compression method and parameters:

•“An overview of cantor: a new system for data analysis”Karasalo, Svensson, SSDBM 1983

•“The Design of Cantor - a new system for data analysis”Karasalo, Svensson, SSDBM 1986

Page 17: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 17

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

From DSM to Column-stores

70s -1985:

1985: DSM paper

1990s: Commercialization through SybaseIQLate 90s – 2000s: Focus on main-memory performance

l DSM “on steroids” [1997 – now]l Hybrid DSM/NSM [2001 – 2004]

2005 – : Re-birth of read-optimized DSM as “column-store”

VLDB 2009 Tutorial Column-Oriented Database Systems 17

“A decomposition storage model”Copeland and Khoshafian. SIGMOD 1985.

CWI: MonetDB

Wisconsin: PAX, Fractured Mirrors

Michigan: Data Morphing CMU: Clotho

MIT: C-Store CWI: MonetDB/X100 10+ startups

TOD: Time Oriented Database – Wiederhold et al."A Modular, Self-Describing Clinical Databank System," Computers and Biomedical Research, 1975More 1970s: Transposed files, Lorie, Batory, Svensson.“An overview of cantor: a new system for data analysis”Karasalo, Svensson, SSDBM 1983

Page 18: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 18

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

The original DSM paper

l Proposed as an alternative to NSMl 2 indexes: clustered on ID, non-clustered on valuel Speeds up queries projecting few columnsl Requires more storage

VLDB 2009 Tutorial Column-Oriented Database Systems 18

“A decomposition storage model” Copeland and Khoshafian. SIGMOD 1985.

1 2 3 4 ..ID

value0100 0962 1000 ..

Page 19: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 19

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Memory wall and PAX

l 90s: Cache-conscious research

l PAX: Partition Attributes Acrossl Retains NSM I/O patternl Optimizes cache-to-RAM communication

VLDB 2009 Tutorial Column-Oriented Database Systems 19

“DBMSs on a modern processor: Where does time go?” Ailamaki, DeWitt, Hill, Wood. VLDB 1999.

“Weaving Relations for Cache Performance.”Ailamaki, DeWitt, Hill, Skounakis, VLDB 2001.

“Cache Conscious Algorithms for Relational Query Processing.”Shatdal, Kant, Naughton. VLDB 1994.

from:

“Database Architecture Optimized for the New Bottleneck: Memory Access.”Boncz, Manegold, Kersten. VLDB 1999.

to:and:

Page 20: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 20

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

More hybrid NSM/DSM schemes

l Dynamic PAX: Data Morphing

l Clotho: custom layout using scatter-gather I/O

l Fractured mirrorsl Smart mirroring with both NSM/DSM copies

VLDB 2009 Tutorial Column-Oriented Database Systems 20

“Data morphing: an adaptive, cache-conscious storage technique.” Hankins, Patel, VLDB 2003.

“Clotho: Decoupling Memory Page Layout from Storage Organization.”Shao, Schindler, Schlosser, Ailamaki, and Ganger. VLDB 2004.

“A Case For Fractured Mirrors.” Ramamurthy, DeWitt, Su, VLDB 2002.

Page 21: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 21

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

MonetDB (more in Part 3)

l Late 1990s, CWI: Boncz, Manegold, and Kerstenl Motivation:

l Main-memoryl Improve computational efficiency by avoiding expression

interpreterl DSM with virtual IDs natural choicel Developed new query execution algebra

l Initial contributions:l Pointed out memory-wall in DBMSsl Cache-conscious projections and joinsl …

VLDB 2009 Tutorial Column-Oriented Database Systems 21

Page 22: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 22

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

2005: the (re)birth of column-stores

l New hardware and application realitiesl Faster CPUs, larger memories, disk bandwidth limitl Multi-terabyte Data Warehouses

l New approach: combine several techniquesl Read-optimized, fast multi-column access,

disk/CPU efficiency, light-weight compression

l C-store paper:l First comprehensive design description of a column-store

l MonetDB/X100l “proper” disk-based column store

l Explosion of new productsVLDB 2009 Tutorial Column-Oriented Database Systems 22

Page 23: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 23

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Performance tradeoffs: columns vs. rowsDSM traditionally was not favored by technology trendsHow has this changed?

l Optimized DSM in “Fractured Mirrors,” 2002l “Apples-to-apples” comparison

l Follow-up study

l Main-memory DSM vs. NSM

l Flash-disks: a come-back for PAX?

VLDB 2009 Tutorial Column-Oriented Database Systems 23

“Performance Tradeoffs in Read-Optimized Databases”Harizopoulos, Liang, Abadi, Madden, VLDB’06

“Read-Optimized Databases, In-Depth” Holloway, DeWitt, VLDB’08

“Query Processing Techniques for Solid State Drives”Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD’09

“Fast Scans and Joins Using Flash Drives” Shah, Harizopoulos, Wiener, Graefe. DaMoN’08

“ DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing” Boncz, Zukowski, Nes, DaMoN’08

Page 24: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 24

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Fractured mirrors: a closer look

l Store DSM relations inside a B-treel Leaf nodes contain valuesl Eliminate IDs, amortize header overheadl Custom implementation on Shore

VLDB 2009 Tutorial Column-Oriented Database Systems 24

3

sparseB-tree on ID

1 a1

“Efficient columnar storage in B-trees” Graefe. Sigmod Record 03/2007.

Similar: storage densitycomparableto column stores

“A Case For Fractured Mirrors” Ramamurthy, DeWitt, Su, VLDB 2002.

11

22

33

TIDTID

a1a1

a2a2

a3a3

ColumnData

ColumnData

TupleHeaderTuple

Header

44 a4a4

55 a5a5

2 a2 3 a3 4 a4 5 a5

1 a1 a3a2 4 a4 a5

Page 25: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 25

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Fractured mirrors: performance

l Chunk-based tuple mergingl Read in segments of M pagesl Merge segments in memoryl Becomes CPU-bound after 5 pages

VLDB 2009 Tutorial Column-Oriented Database Systems 25

From PAX paper:

optimizedDSM

columns projected:1 2 3 4 5

timerow

column?

column?

regular DSM

Page 26: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 26

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-scannerimplementation

VLDB 2009 Tutorial Column-Oriented Database Systems 26

1 Joe 452 Sue 37… … …

JoeSue

4537…

…prefetch ~100ms

worth of data

Sapplypredicate(s)

Joe 45… …

S

#POS 45#POS …

S

Joe 45… …

applypredicate #1

SELECT name, ageWHERE age > 40

Direct I/O

row scanner column scanner

“Performance Tradeoffs in Read-Optimized Databases”Harizopoulos, Liang, Abadi, Madden, VLDB’06

Page 27: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 27

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Scan performance

l Large prefetch hides disk seeks in columnsl Column-CPU efficiency with lower selectivityl Row-CPU suffers from memory stallsl Memory stalls disappear in narrow tuplesl Compression: similar to narrow

VLDB 2009 Tutorial Column-Oriented Database Systems 27

not shown,details in the paper

Page 28: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 28

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Even more results

Non-selective queries, narrow tuples, favor well-compressed rowsMaterialized views are a winScan times determine early materialized joins

VLDB 2009 Tutorial Column-Oriented Database Systems 28

“Read-Optimized Databases, In-Depth” Holloway, DeWitt, VLDB’08

0

5

10

15

20

25

30

35

1 2 3 4 5 6 7 8 9 10Tim

e (s)

Columns Returned

C-25%

C-10%

R-50%

Column-joins arecovered in part 2!

• Same engine as before• Additional findings

wide attributes:same as before

narrow & compressed tuple:CPU-bound!

Page 29: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 29

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Speedup of columns over rows

l Rows favored by narrow tuples and low cpdbl Disk-bound workloads have higher cpdb

VLDB 2009 Tutorial Column-Oriented Database Systems 29

tuple width

cycl

es p

er d

isk

byte

(cpdb)

8 12 16 20 24 28 32 369

18

36

72

144

_ + ++=

+++

“Performance Tradeoffs in Read-Optimized Databases”Harizopoulos, Liang, Abadi, Madden, VLDB’06

Page 30: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 30

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Varying prefetch size

l No prefetching hurts columns in single scans

VLDB 2009 Tutorial Column-Oriented Database Systems 30

0

10

20

30

40

4 8 12 16 20 24 28 32

time

(sec

)

selected bytes per tuple

Row (any prefetch size)

Column 48 (x 128KB)Column 16

Column 8

Column 2

no competingdisk traffic

Page 31: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 31

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Varying prefetch size

l No prefetching hurts columns in single scansl Under competing traffic, columns outperform rows for

any prefetch sizeVLDB 2009 Tutorial Column-Oriented Database Systems 31

with competing disk traffic

0

10

20

30

40

4 12 20 28

Column, 48Row, 48

0

10

20

30

40

4 12 20 28

Column, 8Row, 8

selected bytes per tuple

time

(sec

)

Page 32: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 32

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

CPU Performance

l Benefit in on-the-fly conversion between NSM and DSMl DSM: sequential access (block fits in L2), random in L1l NSM: random access, SIMD for grouped Aggregation

VLDB 2009 Tutorial Column-Oriented Database Systems 32

“ DSM vs. NSM: CPU performance trade offs in block-oriented query processing”Boncz, Zukowski, Nes, DaMoN’08

Page 33: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 33

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

VLDB 2009 Tutorial Column-Oriented Database Systems 33

New storage technology: Flash SSDsl Performance characteristics

l very fast random reads, slow random writesl fast sequential reads and writes

l Price per bit (capacity follows)l cheaper than RAM, order of magnitude more expensive than Disk

l Flash Translation Layer introduces unpredictabilityl avoid random writes!

l Form factors not ideal yetl SSD (Ł small reads still suffer from SATA overhead/OS limitations)l PCI card (Ł high price, limited expandability)

l Boost Sequential I/O in a simple packagel Flash RAID: very tight bandwidth/cm3 packing (4GB/sec inside the box)

l Column Store Updatesl useful for delta structures and logs

l Random I/O on flash fixes unclustered index accessl still suboptimal if I/O block size > record sizel therefore column stores profit mush less than horizontal stores

l Random I/O useful to exploit secondary, tertiary table orderingsl the larger the data, the deeper clustering one can exploit

Page 34: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 34

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Even faster column scans on flash SSDs

l New-generation SSDsl Very fast random reads, slower random writesl Fast sequential RW, comparable to HDD arrays

l No expensive seeks across columnsl FlashScan and Flashjoin: PAX on SSDs, inside Postgres

VLDB 2009 Tutorial Column-Oriented Database Systems 34

“Query Processing Techniques for Solid State Drives” Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD’09

mini-pages with no qualified attributes are not accessed

30K Read IOps, 3K Write Iops250MB/s Read BW, 200MB/s Write

Page 35: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 35

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-scan performance over time

VLDB 2009 Tutorial Column-Oriented Database Systems 35

from 7x slower

..to 1.2x slower

..to same

and 3x faster!

regular DSM (2001)

optimized DSM (2002)

column-store (2006)

SSD Postgres/PAX (2009)

..to 2x slower

Page 36: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 36

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Outline

l Part 1: Basic concepts — Stavrosl Introduction to key featuresl From DSM to column-stores and performance tradeoffsl Column-store architecture overviewl Will rows and columns ever converge?

l Part 2: Column-oriented execution — Daniel

l Part 3: MonetDB/X100 and CPU efficiency — Peter

VLDB 2009 Tutorial Column-Oriented Database Systems 36

Page 37: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 37

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Architecture of a column-storestorage layout

l read-optimized: dense-packed, compressedl organize in extends, batch updatesl multiple sort ordersl sparse indexes

VLDB 2009 Tutorial Column-Oriented Database Systems 37

enginel block-tuple operatorsl new access methodsl optimized relational operatorssystem-level

l system-wide column supportl loading / updatesl scaling through multiple nodesl transactions / redundancy

Page 38: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 38

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

C-Store

l Compress columnsl No alignmentl Big disk blocksl Only materialized views (perhaps many)l Focus on Sorting not indexingl Data ordered on anything, not just timel Automatic physical DBMS designl Optimize for grid computingl Innovative redundancyl Xacts – but no need for Mohanl Column optimizer and executor

VLDB 2009 Tutorial Column-Oriented Database Systems 38

“C-Store: A Column-Oriented DBMS.” Stonebraker et al. VLDB 2005.

Page 39: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 39

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

C-Store: only materialized views (MVs)

l Projection (MV) is some number of columns from a fact tablel Plus columns in a dimension table – with a 1-n join between

Fact and Dimension tablel Stored in order of a storage key(s)l Several may be stored!l With a permutation, if necessary, to map between theml Table (as the user specified it and sees it) is not stored!l No secondary indexes (they are a one column sorted MV plus

a permutation, if you really want one)

VLDB 2009 Tutorial Column-Oriented Database Systems 39

User view :EMP (name, age, salary, dept)Dept (dname, floor)

Possible set of MVs :MV-1 (name, dept, floor) in floor orderMV-2 (salary, age) in age orderMV-3 (dname, salary, name) in salary order

Page 40: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 40

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

AsynchronousData Transfer

TUPLE MOVER

> Read Optimized Store (ROS)

• On disk• Sorted / Compressed• Segmented• Large data loaded direct

Continuous Load and Query (Vertica)

Hybrid Storage Architecture

(A B C | A)

A B C

Trickle Load

> Write Optimized Store (WOS)

§Memory based

§Unsorted / Uncompressed

§Segmented

§Low latency / Small quick inserts

A B C

VLDB 2009 Tutorial Column-Oriented Database Systems 40

Page 41: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 41

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Loading Data (Vertica)

Write-OptimizedStore (WOS)In-memory

Read-Optimized Store (ROS)

On-disk

Automatic Tuple Mover

> INSERT, UPDATE, DELETE

> Bulk and Trickle Loads

§COPY

§COPY DIRECT

> User loads data into logical Tables

> Vertica loads atomically into storage

Page 42: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 42

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Applications for column-storesl Data Warehousing

l High end (clustering)l Mid end/Mass Marketl Personal Analytics

l Data Miningl E.g. Proximity

l Google BigTablel RDF

l Semantic web data managementl Information retrieval

l Terabyte TREC

l Scientific datasetsl SciDB initiativel SLOAN Digital Sky Survey on MonetDB

VLDB 2009 Tutorial Column-Oriented Database Systems 42

Page 43: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 43

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

List of column-store systems

l Cantor (history)l Sybase IQl SenSage (former Addamark Technologies)l Kdbl 1010datal MonetDBl C-Store/Vertical X100/VectorWisel KickFirel SAP Business Acceleratorl Infobrightl ParAccell Exasol

VLDB 2009 Tutorial Column-Oriented Database Systems 43

Page 44: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 44

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Outline

l Part 1: Basic concepts — Stavrosl Introduction to key featuresl From DSM to column-stores and performance tradeoffsl Column-store architecture overviewl Will rows and columns ever converge?

l Part 2: Column-oriented execution — Daniel

l Part 3: MonetDB/X100 and CPU efficiency — Peter

VLDB 2009 Tutorial Column-Oriented Database Systems 44

Page 45: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 45

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

VLDB 2009 Tutorial Column-Oriented Database Systems 45

Simulate a Column-Store inside a Row-Store

Date Store Product Customer Price

01/01

01/01

01/01

1

2

3

1

2

3

1

2

3

Option A: Vertical Partitioning

Option B:Index Every Column

Date Index

Store Index

1

2

3

1

2

3

01/01

01/01

01/01

BOS

NYC

BOS

Table

Chair

Bed

Mesa

Lutz

Mudd

$20

$13

$79

BOS

NYC

BOS

Table

Chair

Bed

Mesa

Lutz

Mudd

$20

$13

$79

TID Value

StoreTID Value

ProductTID Value

CustomerTID Value

PriceTID Value

Date

Page 46: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 46

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

VLDB 2009 Tutorial Column-Oriented Database Systems 46

Simulate a Column-Store inside a Row-Store

Date Store Product Customer Price

1

2

3

1

2

3

Option A: Vertical Partitioning

Option B:Index Every Column

Date Index

Store Index

1

2

3

1

2

3

01/01

01/01

01/01

BOS

NYC

BOS

Table

Chair

Bed

Mesa

Lutz

Mudd

$20

$13

$79

BOS

NYC

BOS

Table

Chair

Bed

Mesa

Lutz

Mudd

$20

$13

$79

TID Value

StoreTID Value

ProductTID Value

CustomerTID Value

Price

01/01 1 3

StartPosValue

DateLength

Can explicitly run-length encode date

“Teaching an Old Elephant New Tricks.”Bruno, CIDR 2009.

Page 47: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 47

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

VLDB 2009 Tutorial Column-Oriented Database Systems 47

Experiments

0.0

50.0

100.0

150.0

200.0

250.0

Time (seconds)

Average 25.7 79.9 221.2

Normal Row-StoreVertically Partitioned

Row-Store

Row-Store With All

Indexes

“Column-Stores vs Row-Stores: How Different are They Really?”Abadi, Hachem, and Madden. SIGMOD 2008.

l Star Schema Benchmark (SSBM)

l Implemented by professional DBAl Original row-store plus 2 column-store

simulations on same row-store product

Adjoined Dimension Column Index (ADC Index) to Improve Star Schema Query Performance”. O’Neil et. al. ICDE 2008.

Page 48: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 48

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

What’s Going On? Vertical Partitions

l Vertical partitions in row-stores:l Work well when workload is knownl ..and queries access disjoint sets of columnsl See automated physical design

l Do not work well as full-columnsl TupleID overhead significantl Excessive joins

VLDB 2009 Tutorial Column-Oriented Database Systems 48

“Column-Stores vs. Row-Stores: How Different Are They Really?”Abadi, Madden, and Hachem. SIGMOD 2008.

Queries touch 3-4 foreign keys in fact table, 1-2 numeric columns

Complete fact table takes up ~4 GB (compressed)

Vertically partitioned tables take up 0.7-1.1 GB (compressed)

11

22

33

TID ColumnData

TupleHeader

Page 49: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 49

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

VLDB 2009 Tutorial Column-Oriented Database Systems 49

What’s Going On? All Indexes Case

l Tuple constructionl Common type of query:

l Result of lower part of query plan is a set of TIDs that passed all predicates

l Need to extract SELECT attributes at these TIDsl BUT: index maps value to TIDl You really want to map TID to value (i.e., a vertical partition)

à Tuple construction is SLOW

SELECT store_name, SUM(revenue)FROM Facts, StoresWHERE fact.store_id = stores.store_id

AND stores.country = “Canada”GROUP BY store_name

Page 50: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 50

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

VLDB 2009 Tutorial Column-Oriented Database Systems 50

So….

l All indexes approach is a poor way to simulate a column-storel Problems with vertical partitioning are NOT fundamental

l Store tuple header in a separate partitionl Allow virtual TIDsl Combine clustered indexes, vertical partitioning

l So can row-stores simulate column-stores?l Might be possible, BUT:

l Need better support for vertical partitioning at the storage layerl Need support for column-specific optimizations at the executer levell Full integration: buffer pool, transaction manager, ..

l When will this happen?l Most promising features = soon

l ..unless new technology / new objectives change the game(SSDs, Massively Parallel Platforms, Energy-efficiency)

See Part 2, Part 3 for most promising

features

Page 51: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 51

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

End of Part 1

l Basic concepts — Stavrosl Introduction to key featuresl From DSM to column-stores and performance tradeoffsl Column-store architecture overviewl Will rows and columns ever converge?

l Part 2: Column-oriented execution — Daniel

l Part 3: MonetDB/X100 and CPU efficiency — Peter

VLDB 2009 Tutorial Column-Oriented Database Systems 51

Page 52: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 52

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Part 2 Outline

l Compression

l Tuple Materialization

l Joins

Page 53: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Oriented Database Systems

Compression

VLDB 2009

Tutorial

“Integrating Compression and Execution in Column-Oriented Database Systems” Abadi, Madden, and Ferreira, SIGMOD ’06

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

•Query optimization in compressed database systems” Chen, Gehrke, Korn, SIGMOD’01

Page 54: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 54

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Compression

l Trades I/O for CPUl Increased column-store opportunities:

l Higher data value locality in column storesl Techniques such as run length encoding far more

usefull Can use extra space to store multiple copies of

data in different sort orders

Page 55: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 55

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Run-length Encoding

Q1Q1Q1Q1Q1Q1Q1

Q2Q2Q2Q2

1111122

1112

Product IDQuarter

(value, start_pos, run_length)

(1, 1, 5)

Product IDQuarter

(Q2, 301, 350)

(Q3, 651, 500)

(Q4, 1151, 600)

(2, 6, 2)

(1, 301, 3)(2, 304, 1)

5729685

3814

Price

5729685

3814

Price

(Q1, 1, 300)

(value, start_pos, run_length)

Page 56: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 56

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Bit-vector Encoding

1111122

1123

Product ID

1111100

1100

ID: 1 ID: 2 ID: 3

0000000

0000

0000000

0001

0000011

0010

“Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et. al, SIGMOD ’06

l For each unique value, v, in column c, create bit-vector bl b[i] = 1 if c[i] = v

l Good for columns with few unique values

l Each bit-vector can be further compressed if sparse

Page 57: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 57

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Q1Q2Q4Q1Q3Q1Q1

Q2Q4Q3Q3…

Quarter

Q1

0130200

1322

Quarter

0

0: Q11: Q22: Q33: Q4

Dictionary Encoding

Dictionary

+OR

24128122

Quarter

24: Q1, Q2, Q4, Q1

128: Q3, Q1, Q1, Q1

122: Q2, Q4, Q3, Q3

Dictionary

+

“Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et. al, SIGMOD ’06

l For each unique value create dictionary entry

l Dictionary can be per-block or per-column

l Column-stores have the advantage that dictionary entries may encode multiple values at once

Page 58: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 58

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

45544855515340

49625250

Price

50

Frame: 50

4-2513

2

0

0…

Price

-1

Frame Of Reference Encoding

-5

40

62

4 bits per value

Exceptions (see part 3 for a better way to deal with exceptions)

l Encodes values as b bit offset from chosen frame of reference

l Special escape code (e.g. all bits set to 1) indicates a difference larger than can be stored in b bitsl After escape code,

original (uncompressed) value is written

“Compressing Relations and Indexes ” Goldstein, Ramakrishnan, Shaft, ICDE’98

Page 59: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 59

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Differential Encoding

5:005:025:035:035:045:065:07

5:105:155:165:16…

Time

5:08

21012

1

1

0

Time

2

5:00

1

5:15

2 bits per value

Exception (see part 3 for a better way to deal with exceptions)

l Encodes values as b bit offset from previous value

l Special escape code (just like frame of reference encoding) indicates a difference larger than can be stored in b bitsl After escape code, original

(uncompressed) value is written l Performs well on columns

containing increasing/decreasing sequencesl inverted listsl timestampsl object IDsl sorted / clustered columns

“Improved Word-Aligned Binary Compression for Text Indexing”Ahn, Moffat, TKDE’06

Page 60: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 60

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

What Compression Scheme To Use?

Does column appear in the sort key?

Is the averagerun-length > 2

Are number of unique values < ~50000

DifferentialEncoding

RLE

Is the data numericaland exhibit good locality?

Frame of ReferenceEncoding

HeavyweightCompression

Leave DataUncompressed

OR

Does this column appear frequently in selection predicates?

Bit-vectorCompression

DictionaryCompression

yes

yes

yes

yes

yes

no

no

no

no

no

Page 61: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 61

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Heavy-Weight Compression Schemes

l Modern disk arrays can achieve > 1GB/sl 1/3 CPU for decompression Ł 3GB/s needed

Ł Lightweight compression schemes are better

Ł Even better: operate directly on compressed data

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 62: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 62

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l I/O - CPU tradeoff is no longer a tradeoffl Reduces memory–CPU bandwidth requirementsl Opens up possibility of operating on multiple

records at once

Operating Directly on Compressed Data

“Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et. al, SIGMOD ’06

Page 63: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 63

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

10011001000

Product IDQuarter

(1, 3)

(2, 1)

(3, 2)

1

00100010010

2

01000100101

3ProductID, COUNT(*))

(Q1, 1, 300)

(Q2, 301, 6)

(Q3, 307, 500)

(Q4, 807, 600)

Index Lookup + Offset jump

SELECT ProductID, Count(*)FROM tableWHERE (Quarter = Q2)GROUP BY ProductID

301-306

Operating Directly on Compressed Data

“Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et. al, SIGMOD ’06

Page 64: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 64

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

(Q1, 1, 300)

(Q2, 301, 6)

(Q3, 307, 500)

(Q4, 807, 600)

Data

isOneValue()isValueSorted()isPosContiguous()isSparse()

getNext()decompressIntoArray()getValueAtPosition(pos)

getMin()getMax()getSize()

Block API

Compression-Aware Scan

Operator

SelectionOperator

AggregationOperator

SELECT ProductID, Count(*)FROM tableWHERE (Quarter = Q2)GROUP BY ProductID

“Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et. al, SIGMOD ’06

Operating Directly on Compressed Data

Page 65: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Oriented Database Systems

Tuple Materialization andColumn-Oriented Join Algorithms

VLDB 2009

Tutorial

“Materialization Strategies in a Column-Oriented DBMS” Abadi, Myers, DeWitt, and Madden. ICDE 2007.

“Query Processing Techniques for Solid State Drives” Tsirogiannis, Harizopoulos Shah, Wiener, and Graefe. SIGMOD 2009.

“Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008.

“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

“Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09

Page 66: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 66

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

When should columns be projected?

l Where should column projection operators be placed in a query plan?l Row-store:

l Column projection involves removing unneeded column s from tuples

l Generally done as early as possiblel Column-store:

l Operation is almost completely opposite from a row- storel Column projection involves reading needed columns f rom storage

and extracting values for a listed set of tuples§ This process is called “materialization”

l Early materialization: project columns at beginning of query plan§ Straightforward since there is a one-to-one mapping across columns

l Late materialization: wait as long as possible for projecting columns§ More complicated since selection and join operators on one column obfuscates

mapping to other columns from same tablel Most column-stores construct tuples and column proj ection time

§ Many database interfaces expect output in regular t uples (rows)§ Rest of discussion will focus on this case

Page 67: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 67

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

When should tuples be constructed?

l Solution 1: Create rows first (EM). But:l Need to construct ALL tuplesl Need to decompress datal Poor memory bandwidth

utilization

2131

2333

7134280

(4,1,4)

Construct

2

3

3

3

7

13

42

80

Select + Aggregate

2

1

3

1

4

4

4

4

prodID storeID custID price

QUERY:

SELECT custID,SUM(price)FROM tableWHERE (prodID = 4) AND

(storeID = 1) ANDGROUP BY custID

Page 68: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 68

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

DataSourceprodID

DataSourcestoreID

AND

AGG

DataSourcecustID

DataSourceprice

1111

4444

0101

2131

prodID storeID

DataSource

DataSource

QUERY:

SELECT custID,SUM(price)FROM tableWHERE (prodID = 4) AND

(storeID = 1) ANDGROUP BY custID

2131

2333

7134280

4444

prodID storeID custID price

Solution 2: Operate on columns

Page 69: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 69

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Solution 2: Operate on columns

1111

0101

AND

0101

QUERY:

SELECT custID,SUM(price)FROM tableWHERE (prodID = 4) AND

(storeID = 1) ANDGROUP BY custID

AND

AGG

DataSourcecustID

DataSourceprice

2131

2333

7134280

4444

prodID storeID custID price

DataSourceprodID

DataSourcestoreID

Page 70: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 70

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

33

Solution 2: Operate on columns

0101

0101

DataSource

DataSource

0101

7134280

0101

2333

111380

custID price

QUERY:

SELECT custID,SUM(price)FROM tableWHERE (prodID = 4) AND

(storeID = 1) ANDGROUP BY custID

AND

AGG

DataSourcecustID

DataSourceprice

2131

2333

7134280

4444

prodID storeID custID price

DataSourceprodID

DataSourcestoreID

Page 71: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 71

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Solution 2: Operate on columns

1133

111380

AGG

13 193

QUERY:

SELECT custID,SUM(price)FROM tableWHERE (prodID = 4) AND

(storeID = 1) ANDGROUP BY custID

AND

AGG

DataSourcecustID

DataSourceprice

2131

2333

7134280

4444

prodID storeID custID price

DataSourceprodID

DataSourcestoreID

Page 72: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 72

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

For plans without joins, late materialization is a win

l Ran on 2 compressed columns from TPC-H scale 10 data

QUERY:

SELECT C1, SUM(C2)FROM tableWHERE (C1 < CONST) AND

(C2 < CONST)GROUP BY C1

0

1

2

3

45

6

7

8

9

10

Low selectivity Mediumselectivity

High selectivity

Tim

e (s

econ

ds)

Early Materialization Late Materialization

“Materialization Strategies in a Column-Oriented DBMS”Abadi, Myers, DeWitt, and Madden. ICDE 2007.

Page 73: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 73

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Materializing late still works best

QUERY:

SELECT C1, SUM(C2)FROM tableWHERE (C1 < CONST) AND

(C2 < CONST)GROUP BY C1

0

2

4

6

8

10

12

14

Low selectivity Mediumselectivity

High selectivity

Tim

e (s

econ

ds)

Early Materialization Late Materialization

Even on uncompressed data, late materialization is still a win

“Materialization Strategies in a Column-Oriented DBMS”Abadi, Myers, DeWitt, and Madden. ICDE 2007.

Page 74: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 74

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

What about for plans with joins?

74

Select R1.B, R1.C, R2.E, R2.H, R3.FFrom R1, R2, R3Where R1.A = R2.D AND R2.G = R3.K

R1 (A, B, C) R2 (D, E, G, H)

Scan Scan

Join 1

A = D

Join 2

G = K

Scan

R3 (F, K, L)

B, C, E, H, F

A, B, C

B, C, E, G, H

D, E, G, H

F, K

R1 (A, B, C) R2 (D, E, G, H)

Scan Scan

JoinA = D

A D

Scan

R3 (F, K, L)

Project

G

JoinG = K

Project

B, C, E, H, F

KG

E, H

FB, C

Page 75: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 75

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

What about for plans with joins?

75

Select R1.B, R1.C, R2.E, R2.H, R3.FFrom R1, R2, R3Where R1.A = R2.D AND R2.G = R3.K

R1 (A, B, C) R2 (D, E, G, H)

Scan Scan

Join 1

A = D

Join 2

G = K

Scan

R3 (F, K, L)

B, C, E, H, F

A, B, C

B, C, E, G, H

D, E, G, H

F, K

R1 (A, B, C) R2 (D, E, G, H)

Scan Scan

JoinA = D

A D

Scan

R3 (F, K, L)

Project

G

JoinG = K

Project

B, C, E, H, F

KG

E, H

FB, C

Early

materialization

Late

materialization

Page 76: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 76

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Early Materialization Example

121111

6121

2333

7134280

Construct

2

3

3

3

7

13

42

80

prodID storeID quantity custID price

123

Construct

1

2

3

custID lastName

Green

White

Brown

GreenWhiteBrown

QUERY:

SELECT C.lastName,SUM(F.price)FROM facts AS F, customers AS CWHERE F.custID = C.custIDGROUP BY C.lastName

Facts Customers

(4,1,4)

Page 77: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 77

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Early Materialization Example

2

3

3

3

7

13

42

80

1

2

3

Green

White

Brown

QUERY:

SELECT C.lastName,SUM(F.price)FROM facts AS F, customers AS CWHERE F.custID = C.custIDGROUP BY C.lastName

Join

7

13

42

80

White

Brown

Brown

Brown

Page 78: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 78

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Late Materialization Example

121111

6121

2313

7134280

prodID storeID quantity custID price

123

custID lastName

GreenWhiteBrown

QUERY:

SELECT C.lastName,SUM(F.price)FROM facts AS F, customers AS CWHERE F.custID = C.custIDGROUP BY C.lastName

Facts Customers

Join

2313

1234

(4,1,4)

Late materialized join causes out of order probing of projected columns from the inner relation

Page 79: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 79

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Late Materialized Join Performance

l Naïve LM join about 2X slower than EM join on typical queries (due to random I/O)l This number is very dependent on

l Amount of memory availablel Number of projected attributesl Join cardinality

l But we can do betterl Invisible Joinl Jive/Flash Joinl Radix cluster/decluster join

Page 80: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 80

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Invisible Join

l Designed for typical joins when data is modeled using a star schemal One (“fact”) table is joined with multiple dimension tables

l Typical query:select c_nation, s_nation, d_year,

sum(lo_revenue) as revenuefrom customer, lineorder, supplier, datewhere lo_custkey = c_custkey

and lo_suppkey = s_suppkeyand lo_orderdate = d_datekeyand c_region = 'ASIA‘and s_region = 'ASIA‘and d_year >= 1992 and d_year <= 1997

group by c_nation, s_nation, d_yearorder by d_year asc, revenue desc;

“Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008.

Page 81: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 81

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

custkey region nation …1 ASIA CHINA …2 ASIA INDIA …3 ASIA INDIA …

Apply “region = ‘Asia’” On Customer Table

Hash Table (or bit-map) Containing Keys 1, 2 and 3

suppkey region nation …1 ASIA RUSSIA …2 EUROPE SPAIN …

Apply “region = ‘Asia’” On Supplier Table

Hash Table (or bit-map)Containing Keys 1, 3

dateid year …01011997 1997 …01021997 1997 …01031997 1997 …

Apply “year in [1992,1997]” On Date Table

Hash Table Containing Keys 01011997, 01021997,

and 01031997

Invisible Join

“Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008.

4 EUROPE FRANCE …

3 ASIA JAPAN …

Page 82: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 82

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Column-Stores vs Row-Stores: How Different are They Really?”Abadi et. al. SIGMOD 2008.

custkey suppkey orderdate revenueorderkey3 1 01011997 4325613 2 01011997 3333324 3 01021997 1212131 1 01021997 2323344 2 01021997 4545651 2 01031997 4325163 2 01031997 342357

Original Fact Table

custkey3341413

Hash Table Containing

Keys 1, 2 and 3

+1101011

Hash Table Containing

Keys 1 and 3

1011000

+suppkey

1231222

Hash Table Containing Keys 01011997,

01021997, and 01031997

1111111

+orderdate01011997010119970102199701021997010219970103199701031997

1101011

1011000

1111111

& & =1001000

Page 83: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 83

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

custkey3341413

+

suppkey1231222

orderdate01011997010119970102199701021997010219970103199701031997

1001000

31

1001000

11

1001000

0101199701021997

+

+

+

+

JOIN

CHINAINDIAINDIA = CHINA

INDIA

RUSSIASPAIN = RUSSIA

RUSSIA

199719971997 =01011997

0102199701031997

19971997

FRANCE

JAPAN

Page 84: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 84

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

custkey region nation …1 ASIA CHINA …2 ASIA INDIA …3 ASIA INDIA …

Apply “region = ‘Asia’” On Customer Table

Hash Table (or bit-map) Containing Keys 1, 2 and 3

suppkey region nation …1 ASIA RUSSIA …2 EUROPE SPAIN …

Apply “region = ‘Asia’” On Supplier Table

Hash Table (or bit-map)Containing Keys 1, 3

dateid year …01011997 1997 …01021997 1997 …01031997 1997 …

Apply “year in [1992,1997]” On Date Table

Hash Table Containing Keys 01011997, 01021997,

and 01031997

Invisible Join

“Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008.

4 EUROPE FRANCE …

3 ASIA JAPAN …

Range [1-3](between-predicate rewriting)

Page 85: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 85

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Invisible Join

l Bottom Linel Many data warehouses model data using star/snowflak e

schemesl Joins of one (fact) table with many dimension table s is

commonl Invisible join takes advantage of this by making su re

that the table that can be accessed in position ord er is the fact table for each join

l Position lists from the fact table are then interse cted (in position order)

l This reduces the amount of data that must be access ed out of order from the dimension tables

l “Between-predicate rewriting” trick not relevant for this discussion

Page 86: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 86

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

custkey3341413

+

suppkey1231222

orderdate01011997010119970102199701021997010219970103199701031997

1001000

31

1001000

11

1001000

0101199701021997

+

+

+

+

JOIN

CHINAINDIAINDIA = CHINA

INDIA

RUSSIASPAIN = RUSSIA

RUSSIA

199719971997 =01011997

0102199701031997

19971997

FRANCE

JAPAN

Still accessing table out of order

Page 87: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 87

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Jive/Flash Join

31 + CHINA

INDIAINDIA = CHINA

INDIA

FRANCE

Still accessing table out of order

“Fast Joins using Join Indices”. Li and Ross, VLDBJ 8:1-24, 1999.

“Query Processing Techniques for Solid State Drives”. Tsirogiannis, Harizopoulos et. al. SIGMOD 2009.

Page 88: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 88

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Jive/Flash Join

1. Add column with dense ascending integers from 1

2. Sort new position list by second column

3. Probe projected column in order using new sorted position list, keeping first column from position list around

4. Sort new result by first column

31 + CHINA

INDIAINDIA = CHINA

INDIA

FRANCE

12

13

21 + CHINA

INDIAINDIA = INDIA

CHINA

FRANCE

21

CHINAINDIA1

2

Page 89: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 89

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Jive/Flash Join

l Bottom Linel Instead of probing projected columns from inner tab le out of

order:l Sort join indexl Probe projected columns in orderl Sort result using an added column

l LM vs EM tradeoffs:l LM has the extra sorts (EM accesses all columns in order)l LM only has to fit join columns into memory (EM nee ds join

columns and all projected columns)§ Results in big memory and CPU savings (see part 3 f or why there is

CPU savings)

l LM only has to materialize relevant columnsl In many cases LM advantages outweigh disadvantages

l LM would be a clear winner if not for those pesky s orts … can we do better?

Page 90: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 90

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Radix Cluster/Decluster

l The full sort from the Jive join is actually overkilll We just want to access the storage blocks in order (we

don’t mind random access within a block)l So do a radix sort and stop earlyl By stopping early, data within each block is accessed out of

order, but in the order specified in the original join indexl Use this pseudo-order to accelerate the post-probe sort as well

“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

•“Database Architecture Optimized for the New Bottleneck: Memory Access”VLDB’99•“Generic Database Cost Models for Hierarchical Memory Systems”, VLDB’02 (all Manegold, Boncz, Kersten)

Page 91: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 91

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Radix Cluster/Decluster

l Bottom linel Both sorts from the Jive join can be significantly reduced in

overheadl Only been tested when there is sufficient memory for the

entire join index to be stored three timesl Technique is likely applicable to larger join indexes, but utility will

go down a little

l Only works if random access within a storage blockl Don’t want to use radix cluster/decluster if you have variable-

width column values or compression schemes that can only be decompressed starting from the beginning of the block

Page 92: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 92

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

LM vs EM joins

l Invisible, Jive, Flash, Cluster, Decluster techniques contain a bag of tricks to improve LM joins

l Research papers show that LM joins become 2X faster than EM joins (instead of 2X slower) for a wide array of query types

Page 93: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 93

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Tuple Construction Heuristics

l For queries with selective predicates, aggregations, or compressed data, use late materialization

l For joins:l Research papers:

l Always use late materializationl Commercial systems:

l Inner table to a join often materialized before joi n (reduces system complexity):

l Some systems will use LM only if columns from inner table can fit entirely in memory

Page 94: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 94

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Outline

l Computational Efficiency of DB on modern hardwarel how column-stores can help herel Keynote revisited: MonetDB & VectorWise in more depth

l CPU efficient column compressionl vectorized decompression

l Conclusionsl future work

Page 95: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Oriented Database Systems

40 years of hardware evolution

vs.DBMS computational efficiency

VLDB 2009

Tutorial

Page 96: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 96

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

CPU Architecture

Elements:l Storage

l CPU caches L1/L2/L3

l Registersl Execution Unit(s)

l Pipelinedl SIMD

Page 97: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 97

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

CPU Metrics

Page 98: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 98

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

CPU Metrics

Page 99: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 99

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

DRAM Metrics

Page 100: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 100

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Super-Scalar Execution (pipelining)

Page 101: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 101

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Hazardsl Data hazards

l Dependencies between instructions

l L1 data cache misses

Result: bubbles in the pipeline

l Control Hazardsl Branch mispredictions

l Computed branches (late binding)l L1 instruction cache misses

Out-of-order execution addresses data hazardsl control hazards typically more expensive

Page 102: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 102

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SIMD

l Single Instruction Multiple Datal Same operation applied on a vector of valuesl MMX: 64 bits, SSE: 128bits, AVX: 256bitsl SSE, e.g. multiply 8 short integers

Page 103: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 103

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SCAN

SELECT

PROJECT

alice 22101

next()

next()

next()

ivan 37102

ivan 37102

ivan 37102

ivan 350102

alice 22101

SELECT id, name (age-30)*50 AS bonus

FROM employeeWHERE age > 30

350

FALSETRUE

22 > 30 ?37 > 30 ?

37 – 30 7 * 50

7

A Look at the Query Pipeline

Page 104: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 104

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SCAN

SELECT

PROJECT

next()

next()

next()

ivan 350102

22 > 30 ?37 > 30 ?

37 – 30 7 * 50

A Look at the Query Pipeline

Operators

Iterator interface-open()-next(): tuple-close()

Page 105: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 105

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SCAN

SELECT

PROJECT

alice 22101

next()

next()

next()

ivan 37102

ivan 37102

ivan 37102

ivan 350102

alice 22101

350

FALSETRUE

22 > 30 ?37 > 30 ?

37 – 30 7 * 50

7

A Look at the Query Pipeline

Primitives

Provide computationalfunctionality

All arithmetic allowed in expressions, e.g. Multiplication

mult(int,int) Ł int

7 * 50

Page 106: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 106

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Operators

Iterator interface-open()-next(): tuple-close()SCAN

SELECT

PROJECT

Database Architecture causes Hazardsl Data hazards

l Dependencies between instructions

l L1 data cache misses

l Control Hazardsl Branch mispredictions

l Computed branches (late binding)l L1 instruction cache misses

Complex NSM record navigation

Large Tree/Hash Structures

Code footprint of all operators in query plan exceeds L1 cache

Data-dependent conditions

Next() late binding method calls

Tree, List, Hash traversal

SIMD Out-of-order Execution

work on one tuple at a time

“DBMSs On A Modern Processor: Where Does Time Go? ”Ailamaki, DeWitt, Hill, Wood, VLDB’99

Page 107: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 107

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

DBMS Computational Efficiency TPC-H 1GB, query 1l selects 98% of fact table, computes net prices and

aggregates alll Results:

l C program: ?l MySQL: 26.2s l DBMS “X”: 28.1s

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 108: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 108

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

DBMS Computational Efficiency TPC-H 1GB, query 1l selects 98% of fact table, computes net prices and

aggregates alll Results:

l C program: 0.2sl MySQL: 26.2s l DBMS “X”: 28.1s

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 109: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Oriented Database Systems

MonetDB

VLDB 2009

Tutorial

Page 110: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 110

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l “save disk I/O when scan-intensive queries need a few columns”

l “avoid an expression interpreter to improve computational efficiency”

a column-store

Page 111: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 111

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SELECT id, name, (age-30)*50 as bonusFROM people

WHERE age > 30

RISC Database Algebra

Page 112: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 112

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SELECT id, name, (age-30)*50 as bonusFROM people

WHERE age > 30

RISC Database Algebra

batcalc_minus_int(int* res, int* col, int val,int n)

{for(i=0; i<n; i++)

res[i] = col[i] - val;}

CPU happy? Give it “nice” code !

- few dependencies (control,data)- CPU gets out-of-order execution- compiler can e.g. generate SIMD

One loop for an entire column- no per-tuple interpretation- arrays: no record navigation- better instruction cache locality

Simple, hard-coded semantics in operators

Page 113: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 113

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SELECT id, name, (age-30)*50 as bonusFROM people

WHERE age > 30

RISC Database Algebra

batcalc_minus_int(int* res, int* col, int val,int n)

{for(i=0; i<n; i++)

res[i] = col[i] - val;}

CPU happy? Give it “nice” code !

- few dependencies (control,data)- CPU gets out-of-order execution- compiler can e.g. generate SIMD

One loop for an entire column- no per-tuple interpretation- arrays: no record navigation- better instruction cache locality

MATERIALIZED intermediate

results

Page 114: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 114

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l “save disk I/O when scan-intensive queries need a few columns”

l “avoid an expression interpreter to improve computational efficiency”

How?

l RISC query algebra: hard-coded semantics

l Decompose complex expressions in multiple operations

l Operators only handle simple arrays

l No code that handles slotted buffered record layout

l Relational algebra becomes array manipulation language

l Often SIMD for free

l Plus: use of cache-conscious algorithms for Sort/Aggr/Join

a column-store

Page 115: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 115

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l You want efficiencyl Simple hard-coded operators

l I take scalabilityl Result materialization

n C program: 0.2s

n MonetDB: 3.7s

n MySQL: 26.2s

n DBMS “X”: 28.1s

a Faustian pact

Page 116: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Oriented Database Systems

as a research platform

VLDB 2009

Tutorial

Page 117: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 117

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SIGMOD 1985

MonetDB supports

SQL, XML, ODMG,

..RDF

RDF support on C-

STORE / SW-Store

MonetDB

BAT Algebra

•“MIL Primitives for Querying a Fragmented World”, Boncz, Kersten, VLDBJ’98•“Flattening an Object Algebra to Provide Performance”Boncz, Wilschut, Kersten, ICDE’98•“MonetDB/XQuery: a fast XQuery processor powered by a relational engine” Boncz, Grust, vanKeulen, Rittinger, Teubner, SIGMOD’06•“SW-Store: a vertically partitioned DBMS for Semantic Web data management“ Abadi, Marcus, Madden, Hollenbach, VLDBJ’09

Page 118: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 118

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SIGMOD 1985

MonetDB never

materializes tuples

“Materialization Strategies in a Column-Oriented DBMS” Abad, Myers, DeWitt, Madden, ICDE’07

Page 119: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 119

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

XQuery

MonetDB 4 MonetDB 5

MonetDB kernel

SQL 03

Optimizers

RDF

SOAP

Arrays

The Software Stack

Extensible query langfrontend framework

.. SGL?Extensible DynamicRuntime QOPTFramework!

OGIS

compile

ExtensibleArchitecture-Conscious

Execution platform

Page 120: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 120

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Cache-Conscious Joinsl Cost Models, Radix-cluster,

Radix-decluster

l MonetDB/XQuery: l structural joins exploiting

positional column access

l Cracking:l on-the-fly automatic indexing

without workload knowledge

l Recycling:l using materialized intermediates

l Run-time Query Optimization:l correlation-aware run-time

optimization without cost model

as a research platform

“MonetDB/XQuery: a fast XQuery processor powered by a relational engine” Boncz, Grust, vanKeulen, Rittinger, Teubner, SIGMOD’06

•“Database Architecture Optimized for the New Bottleneck: Memory Access” VLDB’99•“Generic Database Cost Models for Hierarchical Memory Systems”, VLDB’02 (all Manegold, Boncz, Kersten)•“Cache-Conscious Radix-Decluster Projections”, Manegold, Boncz, Nes, VLDB’04

“Database Cracking”, CIDR’07“Updating a cracked database “, SIGMOD’07“Self-organizing tuple reconstruction in column-stores“, SIGMOD’09 (all Idreos, Manegold, Kersten)

“An architecture for recycling intermediates in a column-store”, Ivanova, Kersten, Nes, Goncalves, SIGMOD’09

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

optimizer frontend backend

Page 121: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 121

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Radix-Partitioned Hash Joinl create partitions << CPU cachel small partitions Ł many partitionsl many partitions Ł multiple passes needed

Cache-Conscious JoinsRadix-Cluster

•“Database Architecture Optimized for the New Bottleneck: Memory Access”VLDB’99•“Generic Database Cost Models for Hierarchical Memory Systems”, VLDB’02 (all Manegold, Boncz, Kersten)

Page 122: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 122

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Radix-Partitioned Hash Joinl create partitions << CPU cachel small partitions Ł many partitionsl many partitions Ł multiple passes needed

l Radix-Clusterl Radix-Sort with early stopping

l Each pass looks at Bi higher-most radix bitsl Splitting each input cluster into 2Bi output clusters

l leaves relation partially ordered

Cache-Conscious JoinsRadix-Cluster

•“Database Architecture Optimized for the New Bottleneck: Memory Access”VLDB’99•“Generic Database Cost Models for Hierarchical Memory Systems”, VLDB’02 (all Manegold, Boncz, Kersten)

Page 123: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 123

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Multiple clustering passesl Limit number of clusters

per passl Avoid cache/TLB trashingl Trade memory cost for

CPU costl Any data type (hashing)

Cache-Conscious JoinsRadix-Cluster

•“Database Architecture Optimized for the New Bottleneck: Memory Access”VLDB’99•“Generic Database Cost Models for Hierarchical Memory Systems”, VLDB’02 (all Manegold, Boncz, Kersten)

Page 124: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 124

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsAccurate Cache Miss Cost Modeling

Page 125: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 125

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsPartitioned Hash Join

Page 126: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 126

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

(64,000,000 tuples)

Cache-Conscious JoinsRadix-Clustered Hash-Join: overall perf

Page 127: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 127

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsPre-Projection vs. Post-Projection

Page 128: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 128

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Radix-Decluster = cache-conscious post-projection

Cache-Conscious JoinsPre-Projection vs. Post-Projection

Page 129: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 129

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Partitioned Hash-Join

Cache-Conscious JoinsPost-Projection

Join Index!

“Join Indices”Valduriez, TODS’87

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 130: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 130

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Partitioned Hash-Joinl Cluster on Left

Cache-Conscious JoinsPost-Projection

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 131: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 131

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Partitioned Hash-Joinl Cluster on Left

Cache-Conscious JoinsPost-Projection

00001111222233334444........

NNNN----1111

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 132: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 132

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Partitioned Hash-Joinl Cluster on Leftl Project Left

Cache-Conscious JoinsPost-Projection

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 133: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 133

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Partitioned Hash-Joinl Cluster on Leftl Project Leftl Cluster on Right

Cache-Conscious JoinsPost-Projection

222244446666111133337777000088885555

111133337777000055558888

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 134: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 134

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Partitioned Hash-Joinl Cluster on Leftl Project Leftl Cluster on Rightl Project Right

Ł Radix-Decluster

Cache-Conscious JoinsPost-Projection

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 135: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 135

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Red column forms a dense domainl {0,1,2,…,N-1}

l All subsequences are orderedl e.g.

l Task: merge subsequences into dense sequence

Cache-Conscious JoinsRadix-Decluster

222244446666111133337777000088885555

111133337777000055558888

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 136: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 136

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Red column forms a dense domainl {0,1,2,…,N-1}

l All subsequences are orderedl e.g.

l Task: merge subsequences into dense sequencel Approach 1: merge H=2B lists

l L cost O(N log(H))l many (H) cursors needed Ł cache thrashing

Cache-Conscious JoinsRadix-Decluster

222244446666111133337777000088885555

111133337777000055558888

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 137: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 137

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Red column forms a dense domainl {0,1,2,…,N-1}

l All subsequences are orderedl e.g.

l Task: merge subsequences into dense sequencel Approach 1: merge H=2B lists

l L cost O(N log(H)), l many (H) cursors needed Ł cache thrashing

l Approach 2: insert by positionl L many (H) sparse passes over the result Ł no cache reuse

Cache-Conscious JoinsRadix-Decluster

222244446666111133337777000088885555

111133337777000055558888

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 138: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 138

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Red column forms a dense domainl {0,1,2,…,N-1}

l All subsequences are orderedl e.g.

l Task: merge subsequences into dense sequencel Approach 1: merge H=2B lists

l L cost O(N log(H))l many (H) cursors needed Ł cache thrashing

l Approach 2: insert by positionl L many (H) sparse passes over the result Ł no cache reuse

l Radix-Decluster: insert by position with sliding window

Cache-Conscious JoinsRadix-Decluster

222244446666111133337777000088885555

111133337777000055558888

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 139: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 139

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 140: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 140

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

3 clusters3 clusters3 clusters3 clusters

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 141: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 141

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

destination positionsdestination positionsdestination positionsdestination positions

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 142: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 142

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Projection column (still)Projection column (still)Projection column (still)Projection column (still)

in wrong orderin wrong orderin wrong orderin wrong order

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 143: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 143

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Result column to fillResult column to fillResult column to fillResult column to fill

insertion window of size 2insertion window of size 2insertion window of size 2insertion window of size 2

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 144: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 144

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 145: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 145

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 146: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 146

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 147: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 147

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 148: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 148

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 149: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 149

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 150: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 150

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 151: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 151

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 152: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 152

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsRadix-Decluster In Action

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 153: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 153

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Only compulsary misses Ł cache-conscious

n Random access only to sliding window(<< cache size)

Cache-Conscious JoinsRadix-Decluster Memory Access Pattern

•“Cache-Conscious Radix-DeclusterProjections”, Manegold, Boncz, Nes, VLDB’04

Page 154: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 154

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Radix-Declusterprefers small W

(i.e. vertical fragmentation)

Cache-Conscious JoinsRadix-Decluster Performance Tradeoff

Read Also:- Jive Join- Slam Join

“Fast Joins UsingJoin Indices”

Ross, Lei, VLDBJ’98

Page 155: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 155

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cache-Conscious JoinsPre-Projection vs. Post-Projection

“Query Processing Techniques for Solid State Drives” Tsirogiannis, Harizopoulos,Shah, Wiener, Graefe, SIGMOD’09

Page 156: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 156

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

CrackingSelf Organizing Databases

“Database Cracking”, Idreos, Manegold, Kersten, CIDR’07

Page 157: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 157

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

CrackingSelf Organizing Databases

“Database Cracking”, Idreos, Manegold, Kersten, CIDR’07

Page 158: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 158

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Crackinghow it works

“Updating a cracked database “, Idreos, Manegold, Kersten, SIGMOD’07

Page 159: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 159

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Crackinghow it works

“Updating a cracked database “, Idreos, Manegold, Kersten, SIGMOD’07

Page 160: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 160

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Crackinghow it works

“Updating a cracked database “, Idreos, Manegold, Kersten, SIGMOD’07

Page 161: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 161

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Crackinghow it works

“Updating a cracked database “, Idreos, Manegold, Kersten, SIGMOD’07

Page 162: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 162

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Crackinghow it works

“Updating a cracked database “, Idreos, Manegold, Kersten, SIGMOD’07

Page 163: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 163

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Crackinghow it works

“Updating a cracked database “, Idreos, Manegold, Kersten, SIGMOD’07

Page 164: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 164

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Crackinghow it works

“Updating a cracked database “, Idreos, Manegold, Kersten, SIGMOD’07

Page 165: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 165

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Crackinghow it works

“Updating a cracked database “, Idreos, Manegold, Kersten, SIGMOD’07

Page 166: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 166

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09Cracking

sideways cracking in column stores

Page 167: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 167

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09Cracking

sideways cracking in column stores

Page 168: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 168

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09Cracking

sideways cracking in column stores

Page 169: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 169

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09Cracking

sideways cracking in column stores

Page 170: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 170

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09Cracking

sideways cracking in column stores

Page 171: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 171

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09Cracking

sideways cracking in column stores

Page 172: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 172

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09Cracking

sideways cracking in column stores

Page 173: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 173

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09Cracking

sideways cracking in column stores

Page 174: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 174

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09Cracking

sideways cracking in column stores

Page 175: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 175

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09Cracking

sideways cracking in column stores

Page 176: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 176

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09Cracking

sideways cracking in column stores

Page 177: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 177

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

CrackingTPC-H Performance

“Self-organizing tuple reconstruction in column-stores“, Idreos, Manegold, Kersten, SIGMOD’09

Page 178: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 178

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

CrackingFuture Work

Page 179: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 179

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXrun-time query optimization

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 180: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 180

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXsampling & incremental materialization

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 181: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 181

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXon-the-fly correlation detection & exploitation

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 182: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 182

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXavoid local minima using Chain Sampling

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 183: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 183

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXavoid local minima using Chain Sampling

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 184: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 184

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXavoid local minima using Chain Sampling

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 185: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 185

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXavoid local minima using Chain Sampling

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 186: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 186

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXavoid local minima using Chain Sampling

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 187: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 187

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXavoid local minima using Chain Sampling

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 188: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 188

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXavoid local minima using Chain Sampling

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 189: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 189

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXavoid local minima using Chain Sampling

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 190: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 190

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXavoid local minima using Chain Sampling

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 191: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 191

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXavoid local minima using Chain Sampling

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 192: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 192

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXavoid local minima using Chain Sampling

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 193: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 193

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ROXdetecting deep correlations

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 194: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 194

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l >800 DBLP co-authorship 4-way join queriesl Grouped by area DB, IR, Graphics, Bio

l 2:2, 3:1, 4:0 cross-area combinations

ROXrobust query performance

“ROX: run-time optimization of XQueries”, Abdelkader, Boncz, Manegold, vanKeulen, SIGMOD’09

Page 195: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 195

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Motivation: l scientific databases, data analytics

l Terabytes of data (observational , transactional)

l Prevailing read-only workload

l Ad-hoc queries with commonalities

Background:l Operator-at-a-time execution paradigm

Ø Automatic materialization of intermediates

l Canonical column-store organizationØ Intermediates have reduced dimensionality and finer granularityØ Simplified overlap analysis

l Recycling idea: instead of garbage collecting, keep the intermediates and reuse theml speed up query streams with commonalities

l low cost and self-organization

Recyclermotivation & idea

“An architecture for recycling intermediates in a column-store”, Ivanova, Kersten, Nes, Goncalves, SIGMOD’09

Page 196: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 196

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Run-time Support

Recycler

Optimizer

SQL

MonetDB

Server

Tactical Optimizer

MonetDB Kernel

XQuery

MAL

MAL

Recycle Pool

function user.s1_2(A0:date, ...):void;

X5 := sql.bind("sys","lineitem",...);

X10 := algebra.select(X5,A0);

X12 := sql.bindIdx("sys","lineitem",...);

X15 := algebra.join(X10,X12);

X25 := mtime.addmonths(A1,A2);

...

function user.s1_2(A0:date, ...):void;

X5 := sql.bind("sys","lineitem",...);

X10 := algebra.select(X5,A0);

X12 := sql.bindIdx("sys","lineitem",...);

X15 := algebra.join(X10,X12);

X25 := mtime.addmonths(A1,A2);

...

Admission & Eviction

Recyclerfit into MonetDB

“An architecture for recycling intermediates in a column-store”, Ivanova, Kersten, Nes, Goncalves, SIGMOD’09

Page 197: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 197

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Run time comparison of l instruction types l argument values

Name Value Data type Size

X1 10 :bat[:oid,:date]

T1 “sys” :str

T2 “orders” :str

X1 := sql.bind("sys","orders","o_orderdate",0);…

Y3 := sql.bind("sys","orders","o_orderdate",0);

Exact

matching

Recyclerinstruction matching

“An architecture for recycling intermediates in a column-store”, Ivanova, Kersten, Nes, Goncalves, SIGMOD’09

Page 198: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 198

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009) “An architecture for recycling intermediates in a column-store”, Ivanova, Kersten, Nes, Goncalves, SIGMOD’09

Name Value Data type Size

X1 10 :bat[:oid,:int] 2000

X3 130 :bat[:oid,:int] 700

X5 150 :bat[:oid,:int] 350

X3 := algebra.select(X1,10,80);…

Y3 := algebra.select(X1,20,45);

X5 := algebra.select(X1,20,60);X5

Recyclerinstruction subsumption

Page 199: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 199

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

algebra.join

sql.bind(“C1“)

algebra.select

sql.bind(“C2“)

sql.bind(“C1“)X1 :=

algebra.select(X1)

X2 :=

sql.bind(“C2“)

X3 :=

algebra.join(X2,X3)

X4 :=

Q1

• Entire sub-tree cached

• Operator dependencies represented through intermediates

(operator lineage)

Recyclermotivation & idea

“An architecture for recycling intermediates in a column-store”, Ivanova, Kersten, Nes, Goncalves, SIGMOD’09

Page 200: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 200

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

algebra.join

sql.bind(“C1“)

algebra.select

sql.bind(“C2“)

X1 := sql.bind(“C1“)

X2 := algebra.select(X1)

X3 := sql.bind(“C2“)

X4 := algebra.join(X2,X3)

algebra.join

sql.bind(“C3“)

X1

X2

X3

X4

Q2

• Entire sub-tree reused

• Successful matching of an operator depends on successful

matching of its predecessors (operator lineage)

Recyclerexample

algebra.join

“An architecture for recycling intermediates in a column-store”, Ivanova, Kersten, Nes, Goncalves, SIGMOD’09

Page 201: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 201

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Decide about storing the results l KEEPALL

l all instructions advised by the optimizerl CREDIT

l instructions supplied with creditsl storage ‘paid’ with 1 creditl reuse returns creditsl lack of reuse limits admission and resource claims

Recycleradmission policies

“An architecture for recycling intermediates in a column-store”, Ivanova, Kersten, Nes, Goncalves, SIGMOD’09

Page 202: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 202

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Decide about eviction of intermediatesl Filter ‘top’ instructions without dependentsl Pick instructions with smallest utility

l LRU : time of computation or last reusel BENEFIT : estimated contribution to performance:

CPU and I/O costs, recycling

l Triggered by resource limitations (memory or entries)

Recyclercache policies

“An architecture for recycling intermediates in a column-store”, Ivanova, Kersten, Nes, Goncalves, SIGMOD’09

Page 203: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 203

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Sloan Digital Sky Servey/ SkyServer

http://cas.sdss.org

l 100 GB subset of DR4

l 100-query batch from January 2008 log

l 1.5GB intermediates, 99% reuse

l Join intermediates major consumer of memory and major contributor to savings

785

296

140

200

400

600

800

Tim

e(se

c)

Naïve CRD/1GB KeepAll/Unlim

RecyclerSkyServer evaluation

“An architecture for recycling intermediates in a column-store”, Ivanova, Kersten, Nes, Goncalves, SIGMOD’09

Page 204: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 204

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

l Provides self-organizing cache of intermediates l Significant performance gains in SkyServer and TPC-Hl Transforms materialization overhead into benefit

l future work:l Refining and developing admission and cache policiesl Automatic switch of suitable policies

“An architecture for recycling intermediates in a column-store”, Ivanova, Kersten, Nes, Goncalves, SIGMOD’09

Recyclersummary and future work

Page 205: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Oriented Database Systems

“MonetDB/X100”

vectorized query processing

VLDB 2009

Tutorial

Page 206: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 206

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Materialization vs Pipelining

MonetDB spin-off: MonetDB/X100

Page 207: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 207

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SCAN

SELECT

PROJECT

next()

next()

101102104105

aliceivanpeggyvictor

22374525

next()

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 208: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 208

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SCAN

SELECT

PROJECT

next()

next()

101102104105

aliceivanpeggyvictor

22374525

FALSETRUETRUEFALSE

> 30 ?

22374525

aliceivanpeggyvictor

101102104105

next()

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 209: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 209

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SCAN

SELECT

PROJECT

next()

next()

101102104105

aliceivanpeggyvictor

22374525

715

FALSETRUETRUEFALSE

3745

ivanpeggy

102104

350750

ivanpeggy

102104

350750

Observations:

next() called much less often Łmore time spent in primitivesless in overhead

primitive calls process an array of

values in a loop:> 30 ?

- 30 * 50

22374525

aliceivanpeggyvictor

101102104105

“Vectorized In Cache Processing”

vector = array of ~100

processed in a tight loop

CPU cache Resident next()

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 210: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 210

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SCAN

SELECT

PROJECT

next()

next()

101102104105

aliceivanpeggyvictor

22374525

715

FALSETRUETRUEFALSE

3745

ivanpeggy

102104

350750

ivanpeggy

102104

350750

Observations:

next() called much less often Łmore time spent in primitivesless in overhead

primitive calls process an array of

values in a loop:> 30 ?

- 30 * 50

CPU Efficiency depends on “nice” code- out-of-order execution- few dependencies (control,data)- compiler support

Compilers like simple loops over arrays- loop-pipelining- automatic SIMD

22374525

aliceivanpeggyvictor

101102104105

next()

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 211: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 211

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SCAN

SELECT

PROJECT

FALSETRUETRUEFALSE

350750

Observations:

next() called much less often Łmore time spent in primitivesless in overhead

primitive calls process an array of

values in a loop:> 30 ?

* 50

CPU Efficiency depends on “nice” code- out-of-order execution- few dependencies (control,data)- compiler support

Compilers like simple loops over arrays- loop-pipelining- automatic SIMD

FALSETRUETRUEFALSE

> 30 ?

715

- 30

350750

* 50

for(i=0; i<n; i++)

res[i] = (col[i] > x)

for(i=0; i<n; i++)

res[i] = (col[i] - x)

for(i=0; i<n; i++)

res[i] = (col[i] * x)

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 212: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 212

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SCAN

SELECT

PROJECT

next()101102104105

aliceivanpeggyvictor

22374525

12

Tricks being played:

- Late materialization

- Materialization avoidance using selection vectors

> 30 ?next()

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 213: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 213

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SCAN

SELECT

PROJECT

next()

next()

101102104105

aliceivanpeggyvictor

22374525

715

12

350750

Tricks being played:

- Late materialization

- Materialization avoidance using selection vectors

> 30 ?

- 30 * 50

next()

map_mul_flt_val_flt_col(float *res,int* sel,float val,float *col, int n)

{for(int i=0; i<n; i++)

res[i] = val * col[sel[i]];}

selection vectors used to reduce vector copying

contain selected positions

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 214: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 214

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

SCAN

SELECT

PROJECT

next()

next()

101102104105

aliceivanpeggyvictor

22374525

715

12

350750

Tricks being played:

- Late materialization

- Materialization avoidance using selection vectors

> 30 ?

- 30 * 50

next()

map_mul_flt_val_flt_col(float *res,int* sel,float val,float *col, int n)

{for(int i=0; i<n; i++)

res[i] = val * col[sel[i]];}

selection vectors used to reduce vector copying

contain selected positions

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 215: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 215

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

MonetDB/X100

l Both efficiencyl Vectorized primitives

l and scalability..l Pipelined query evaluation

n C program: 0.2s

n MonetDB/X100: 0.6s

n MonetDB: 3.7s

n MySQL: 26.2s

n DBMS “X”: 28.1s

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 216: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 216

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Memory Hierarchy

ColumnBM

(buffer manager)

X100 query engine

CPU

cache

(raid)

Disk(s)

RAM

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 217: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 217

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Memory Hierarchy

Vectors are only the in-cache representation

RAM & disk representation mightactually be different

(vectorwise uses both PAX & DSM)

ColumnBM

(buffer manager)

X100 query engine

CPU

cache

(raid)

Disk(s)

RAM

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 218: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 218

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Optimal Vector size?

All vectors together should fit the CPU cache

Optimizer should tune this,given the query characteristics.

ColumnBM

(buffer manager)

X100 query engine

CPU

cache

(raid)

Disk(s)

RAM

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 219: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 219

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Varying the Vector size

Less and less iterator.next() and

primitive function calls (“interpretation overhead”)

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 220: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 220

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Varying the Vector size

Vectors start to exceed theCPU cache, causing

additional memory traffic

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 221: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 221

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Varying the Vector size

The benefit of selection vectors

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 222: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 222

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

MonetDB/MIL materializes columns

ColumnBM

(buffer manager)

CPU

cache

(raid)

Disk(s)

RAM

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 223: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 223

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Benefits of Vectorized Processingl 100x less Function Calls

l iterator.next(), primitives

l No Instruction Cache Misses

l High locality in the primitives

l Less Data Cache Misses

l Cache-conscious data placement

l No Tuple Navigation

l Primitives are record-oblivious, only see arrays

l Vectorization allows algorithmic optimization

l Move activities out of the loop (“strength reduction”)

l Compiler-friendly function bodies

l Loop-pipelining, automatic SIMD

“Buffering Database Operations for Enhanced Instruction Cache Performance”Zhou, Ross, SIGMOD’04

“Block oriented processing of relational database operations in modern computer architectures”Padmanabhan, Malkemus, Agarwal, ICDE’01

“MonetDB/X100: Hyper-Pipelining Query Execution ” Boncz, Zukowski, Nes, CIDR’05

Page 224: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 224

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Vectorizing Relational Operatorsl Project

l Select

l Exploit selectivities, test buffer overflow

l Aggregation

l Ordered, Hashed

l Sort

l Radix-sort nicely vectorizes

l Join

l Merge-join + Hash-join

“Balancing Vectorized Query Execution with Bandwidth Optimized Storage ” Zukowski, CWI 2008

Page 225: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Oriented Database Systems

Efficient Column Store Compression

VLDB 2009

Tutorial

Page 226: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 226

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Compression in Column Stores

Column Storees:l More CPU bound thanks to less I/Ol Provide good compression opportunities

Reduce CPU effort by operating on compressed data immediately

•“Integrating Compression and Execution in Column-Oriented Database Systems” Abadi, Madden, Ferreira. SIGMOD’06•Query optimization in compressed database systems” Chen, Gehrke, Korn, SIGMOD’01

Page 227: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 227

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Compression in MonetDB/X100

MonetDB/X100 highly I/O bound:l 1GB/s query consumption per 2GHz corel 1/3 CPU for decompression Ł 3GB/s needed

ŁŁŁŁ new lightweight compression schemes

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 228: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 228

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Key Ingredients

l Compress relations on a per-column basisl Easy to exploit redundancy

l Decompress small vectors of tuples from a column into the CPU cachel Minimize main-memory overhead

l Use light-weight, CPU-efficient algorithmsl Exploit processing power of modern CPUs

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 229: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 229

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Key Ingredients

l Compress relations on a per-column basisl Easy to exploit redundancy

l Decompress small vectors of tuples from a column into the CPU cachel Minimize main-memory overhead

l Use light-weight, CPU-efficient algorithmsl Exploit processing power of modern CPUs

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 230: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 230

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Patched Frame Of Reference (PFOR)

“Compressing Relations and Indexes ” Goldstein, Ramakrishnan, Shaft, ICDE’98

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 231: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 231

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

PFOR-Delta

l PFOR on the differences (deltas) between subsequent values in anattribute column (a[i] – a[i-1])

l Performs well on columns containing increasing/decreasing sequencesl inverted listsl timestampsl object IDsl sorted / clustered columns

“Improved Word-Aligned Binary Compression for Text Indexing” Ahn, Moffat, TKDE’06

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 232: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 232

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Patched Dictionary (PDICT)

l Code words representindices into a dictionary (hash table) containing the 2^b most frequent source values.l Dictionary stored in block

headerl Store colliding values as

exceptions

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 233: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 233

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Key Ingredients

l Compress relations on a per-column basisl Columns compress well

l Decompress small vectors of tuples from a column into the CPU cachel Minimize main-memory overhead

l Use light-weight, CPU-efficient algorithmsl Exploit processing power of modern CPUs

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 234: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 234

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Key Ingredients

l Compress relations on a per-column basisl Columns compress well

l Decompress small vectors of tuples from a column into the CPU cachel Minimize main-memory overhead

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 235: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 235

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Vectorized Decompression

ColumnBM

(buffer manager)

X100 query engine

CPU

cache

(raid)

Disk(s)

RAM

Idea:

decompress a vector only

compression:-between CPU and RAM-Instead of disk and RAM (classic)

Page 236: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 236

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

RAM-Cache Decompression

l Decompress vectorson-demand into thecache

l RAM-Cache boundary only crossed once

l More (compressed)data cached in RAM

l Less bandwidth use

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 237: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 237

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Multi-Core Bandwidth & Compression

Performance Degradation with Concurrent Queries

0

0.2

0.4

0.6

0.8

1

1.2

1 2 3 4 5 6 7 8

number of concurrent queries

avg

quer

y sp

eed

(clo

ck n

orm

aliz

ed)

Intel® Xeon E5410 (Harpertown) - Q6b Normal

Intel® Xeon E5410 (Harpertown) - Q6b Compressed

Intel® Xeon X5560 (Nehalem) - Q6b Normal

Intel® Xeon X5560 (Nehalem) - Q6b Compressed

Page 238: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 238

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

CPU Efficient Decompression

l Decoding loop over cache-resident vectors of code wordsl Avoid control dependencies within decoding loop

l no if-then-else constructs in loop body

l Avoid data dependencies between loop iterations

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 239: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 239

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Disk Block Layoutl Forward growing

section of arbitrarysize code words(code word size fixed per block)

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 240: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 240

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Disk Block Layout

l Forward growingsection of arbitrarysize code words(code word sizefixed per block)

l Backwards growingexception list

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 241: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 241

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Naïve Decompression Algorithm

l Use reserved value from code word domain (MAXCODE) to mark exception positions

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 242: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 242

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Deterioriation With Exception%

l 1.2GB/s deteriorates to 0.4GB/s

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 243: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 243

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Deterioriation With Exception%

l Perf Counters: CPU mispredicts if-then-else

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 244: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 244

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Patchingl Maintain a patch-list

through code wordsection that linksexception positions

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 245: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 245

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Patchingl Maintain a patch-list

through code wordsection that linksexception positions

l After decoding, patch up the exception positionswith the correct values

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 246: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 246

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Patched Decompression

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 247: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 247

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Patched Decompression

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 248: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 248

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Decompression Bandwidth

l Patching makes two passes, but is faster!

Patching can be applied to:

• Frame of Reference (PFOR)

• Delta Compression (PFOR-DELTA)

• Dictionary Compression (PDICT)

Makes these methods much more applicable to noisy data

Ł without additional cost

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 249: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 249

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

TPC-H 100GB – Slow RAID

l Gains equal to compression ratio

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 250: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 250

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

TPC-H 100GB – Faster RAID

l Still some gains with 450MB/s RAID

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 251: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 251

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

TPC-H 100GB – Final Result

l Compares well to DBMS with 8x more resources

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 252: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 252

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Terabyte TREC 2005

23110.541Zetdir

143230.342pisaEff4

5880.530Zetdist

4280.562MU05TBy1

2720.390uwmtEwteD10

2480.555MU05TBy3

3.2160.547MonetDB/X100

msec/qCPUsP@20Run

l Many systems used BM25 (X100 also)l Kept 9GB index in 8x2GB RAM thanks to PFOR-

delta compression (ratio~=3)

“Super-Scalar RAM-CPU Cache Compression”Zukowski, Heman, Nes, Boncz, ICDE’06

Page 253: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Oriented Database Systems

Column Store Disk I/O

VLDB 2009

Tutorial

Page 254: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 254

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

TPC-H 100GB cost per component

Magnetic Disk Trends

The trend for more spindles per CPU core (8-20) is hard to sustain

given many-core CPUs.In OLTP, situation is worse.

DRAM Trend

Page 255: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 255

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Disk Scans: Unit Size

Only 1 disk used in RAID

Stripe size2MB: all 12 disks used

Non-sequential scans: 1-8MB needed

Page 256: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 256

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Using Disk in Data Warehousing

Goals:

1. large-chunk disk access *only*

2. minimize bandwidth, prepare for concurrency

database strategies:

l replicate tables in multiple orders (goal 1)

l clustering join-tables in foreign-key order (goal 1)

l keep dimension tables in RAM (goal 1&2)

l use a column-store (goal 1&2)

l increase disk bandwidth with lightweight compression (goal 2)

l coordinate concurrent disk access (goal 1&2)

•“Adjoined Dimension Column Clustering to Improve Data Warehouse Query Performance” O’Neil2, Chen, ICDE’08•“C-Store: A Column-oriented DBMS”, Stonebraker et al, VLDB’05

Page 257: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 257

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Store I/Ol Columns with various

physical sizes

Page 258: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 258

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Store I/Ol Columns with various

physical sizes: where are corresponding tuples located?

Page 259: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 259

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Store I/Ol Columns with various

physical sizes: where are corresponding tuples located?

l More prefetch buffers needed than for rows .

Page 260: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 260

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Store I/Ol Columns with various

physical sizes: where are corresponding tuples located?

l More prefetch buffers needed than for rows.

l Multiple cursors: columns fight among themselves for disk

“Performance Tradeoffs in Read-Optimized Databases” Harizopoulos, Liang, Abadi, Madden, VLDB’06

Page 261: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 261

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Scans in a DBMS

l Scan-based processing:l Large queriesl Clustered indicesl No useful indices

l Types of scans:l Full-table scansl Range-scansl Multi-range scans

Page 262: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 262

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Concurrent scans

l Multiple queries scanning the same table

l Different start times

l Different scan ranges

l Compete for disk access and buffer space

l FCFS request scheduling: poor latency

Page 263: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 263

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Chunks

l Pages clustered on diskl Large I/O units l Amortizes random-seek with large-readsl Result: “random” system bandwidth close to sequential

Page 264: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 264

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Normal” policy

l Strictly sequential read order

Page 265: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 265

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Normal” policy

l Strictly sequential read order

Page 266: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 266

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Normal” policy

l Strictly sequential read order

Page 267: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 267

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Normal” performance

l Data read many times – poor sharingl Many scans fight for bandwidthl Both latency and throughput bad

Page 268: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 268

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Normal” in real life

Page 269: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 269

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Shared scans

l Observation: queries often do not need data in a sequential order

l Idea: make queries “share” the scanning process

l Two existing types:

l Attach

l Elevator

Page 270: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 270

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Attach” policy

l Attach to a running query with data overlap

Page 271: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 271

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Attach” policy

l Attach to a running query with data overlap

Page 272: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 272

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Attach” policy

l Attach to a running query with data overlap

Page 273: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 273

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Attach” policy

l Attach to a running query with data overlap

Page 274: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 274

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Attach” policy

l Attach to a running query with data overlap

Page 275: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 275

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Attach” performance

l Better than Normall Only one overlapping range is usedl Queries with different speeds can “detach”

“Increasing buffer-locality for multiple relational table scans through grouping and throttling” Lang, Bhattacharjee, Malkemus, Padmanabhan, Wong. ICDE’07

Page 276: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 276

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Attach” in real life

Page 277: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 277

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Elevator” policy

l A single sliding window over a table

Page 278: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 278

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Elevator” policy

l A single sliding window over a table

Page 279: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 279

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Elevator” policy

l A single sliding window over a table

Page 280: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 280

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Elevator” policy

l A single sliding window over a table

Page 281: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 281

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Elevator” policy

l A single sliding window over a table

Page 282: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 282

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Elevator” policy

l A single sliding window over a table

Page 283: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 283

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Elevator” performance

l Maximizes sharing, minimizes I/Osl Good for long queries and uniform loadsl Short queries wait for the windowl Fast queries wait for the slow ones

Page 284: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 284

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Elevator” in real life

Page 285: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 285

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Shared scans – main problem

l Query read sequence in shared scans:l Broken into 2 partsl Then fully staticl Misses opportunities in a dynamic environment

Page 286: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 286

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cooperative scans

l Core ideasl Dynamically adapt to the current situationl Allow fully arbitrary chunk order

l Goals:l Maximize data sharingl Optimize latency and throughputl Work for different types of queries

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 287: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 287

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Active Buffer Manager

l ABM knows the status of all the queries

l Queries investigate ABM content:

l If some chunks buffered, choose one to use

l If not, wait for ABM to provide a new chunk

l ABM in a loop:

l Chooses a query to serve

l Chooses a chunk to load

l If out of space, chooses what to keep

l Loads a chunk and notifies the queries

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 288: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 288

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Relevance” functions

l ABM knows the status of all the queries

l Queries investigate ABM content:

l If some chunks buffered, choose one to use

l If not, wait for ABM to provide a new chunk

l ABM in a loop:

l Chooses a query to serve

l Chooses a chunk to load

l If out of space, chooses what to keep

l Loads a chunk and notifies the queries

useRelevance()

queryRelevance()

keepRelevance()

loadRelevance()

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 289: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 289

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

useRelevance()

l Choose a chunk with a minimal number of queries interestedl Allows early chunk eviction

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 290: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 290

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

queryRelevance()

l Choose “starved” queries onlyl Queries having data are doing fine

l Promote short queriesl Better query-stream throughputl Avoid round-robin request scheduling

l Promote long-waiting queriesl Don’t let short queries starve the long ones

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 291: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 291

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

loadRelevance()

l Load chunks interesting for the maximum number of starved queriesl Keep many queries busyl Amortize loading cost among many queries

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 292: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 292

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

keepRelevance()

l Keep chunks interesting for the maximum number of (almost) starved queriesl Avoid blocking queries

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 293: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 293

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Relevance policy”

l Follow the relevance functions

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 294: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 294

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Relevance policy”

l Follow the relevance functions

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 295: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 295

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Relevance policy”

l Follow the relevance functions

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 296: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 296

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Relevance policy”

l Follow the relevance functions

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 297: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 297

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Relevance policy”

l Follow the relevance functions

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 298: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 298

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

“Relevance” in real life

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 299: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 299

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Extensive benchmark

l TPC-H SF-10 datasetl MonetDB/X100, PAX storagel Two query speeds: Fast (Q6), Slow (Q1)l Varying scan ranges: 1%, 10%, 50%, 100%, at random

positionsl 16 concurrent streams, 3 seconds delayl 4 random queries in each stream

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 300: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 300

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Global results

1842140423254186Number of I/Os

93.9490.2081.3153.20CPU usage (%)

238244281453Total time (sec)

1.9613.523.726.42Avg. normalized query latency

99138160283Avg. stream time (sec)

RelevanceElevatorAttachNormal

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 301: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 301

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Results

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 302: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 302

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Cooperative Scans with DSM

l Reduced sharing opportunitiesl Physical-logical data mismatchl More complex ABM implementation and relevance

functions

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 303: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 303

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Sharing opportunities in DSM

l Both vertical and horizontal overlap needed

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 304: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 304

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Chunks in DSM

l Larger I/O requirementsl Columns with various

physical sizesl Logical chunks overlap

physicallyl Chunks as logical concepts

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 305: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 305

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

ABM in DSM

l More complex policiesl 2-dimensional decisions: chunk + columnl Columns with different queries interestedl Even column loading order matters!

l Still, it worksl Results depend on the overlap (paper)

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 306: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 306

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

DSM – global results

3639229744136490Number of I/Os

92827761CPU usage (%)

515562621805Total time (sec)

2.9615.114.777.05Avg. normalized query latency

264352338536Avg. stream time (sec)

RelevanceElevatorAttachNormal

“Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”, Zukowski, Heman, Nes, Boncz VLDB’07

Page 307: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 307

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Conclusion

l Presented MonetDB/X100 (now VectorWise)

l A new database kernel developed at CWI

l Uses Blocked Iterator Model (Vectorization)

l works amazingly well

l So fast Ł must reduce hunger for hard disks

l Storage manager specializes large sequential column I/O

l + Lightweight compression schemes (give ~~ factor 3)

l + Cooperative Bandwidth Sharing (gives ~~ factor 2)

l Good performance results

l Very fast 100GB TPC-H performance

l Beats IR systems on Terabyte TREC

Page 308: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Column-Oriented Database Systems

Conclusion

VLDB 2009

Tutorial

Page 309: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 309

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Summary (1/2)

l Columns and Row-Stores: different?l No fundamental differencesl Can current row-stores simulate column-stores now?

l not efficiently: row-stores need change

l On disk layout vs execution layoutl actually independent issues, on-the-fly conversion pays offl column favors sequential access, row random

l Mixed Layout schemes l Fractured mirrorsl PAX, Clothol Data morphing

Page 310: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 310

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Summary (2/2)

l Crucial Columnar Techniquesl Storage

l Lean headers, sparse indices, fast positional access

l Compression l Operating on compressed datal Lightweight, vectorized decompression

l Late vs Early materializationl Non-join: LM always wins

l Naïve/Invisible/Jive/Flash/Radix Join (LM often wins)

l Executionl Vectorized in-cache executionl Exploiting SIMD

Page 311: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 311

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Future Work

l looking at write/load tradeoffs in column-storesl read-only vs batch loads vs trickle updates vs OLTP

Page 312: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 312

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Updates (1/3)

l Column-stores are update-in-place aversel In-place: I/O for each columnl + re-compression l + multiple sorted replicasl + sparse tree indices

Update-in-place is infeasible!

Page 313: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 313

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Updates (2/3)

l Column-stores use differential mechanisms insteadl Differential lists/files or more advanced (e.g. PDTs)l Updates buffered in RAM, merged on each queryl Checkpointing merges differences in bulk sequentially

l I/O trends favor this anyway § trade RAM for converting random into sequential I/O

§ this trade is also needed in Flash (do not write randomly!)

l How high loads can it sustain?§ Depends on available RAM for buffering (how long until full)

§ Checkpoint must be done within that time

§ The longer it can run, the less it molests queries§ Using Flash for buffering differences buys a lot of time

§ Hundreds of GBs of differences per server

Page 314: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 314

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Updates (3/3)l Differential transactions favored by hardware trendsl Snapshot semantics accepted by the user community

l can always convert to serialized

Ł Row stores could also use differential transactions and be efficient!Ł Implies a departure from ARIESŁ Implies a full rewrite

My conclusion: a system that combines row- and columns needs differentially implemented transactions.Starting from a pure column-store, this is a limited add-on.Starting from a pure row-store, this implies a full rewrite.

“Serializable Isolation For Snapshot Databases” Alomari, Cahill, Fekete, Roehm, SIGMOD’08

Page 315: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 315

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Future Work

l looking at write/load tradeoffs in column-storesl read-only vs batch loads vs trickle updates vs OLTP

l database design for column-storesl column-store specific optimizers

l compression/materialization/join tricks Ł cost models?

l hybrid column-row systemsl can row-stores learn new column tricks?

l Study of the minimal number changes one needs to make to a row store to get the majority of the benefits of a column-store

l Alternative: add features to column-stores that make them more like row stores

Page 316: Friday Column-Oriented Database Systems - CWIboncz/Friday_ColumnOriented_Database... · MonetDB (more in Part 3) l Late 1990s, CWI: Boncz, Manegold, and Kersten l Motivation: l Main-memory

VLDB 2009 Tutorial Column-Oriented Database Systems 316

Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

Conclusion

l Columnar techniques provide clear benefits for:l Data warehousing, BIl Information retrieval, graphs, e-science

l A number of crucial techniques make them effectivel Without these, existing row systems do not benefit

l Row-Stores and column-stores could be combinedl Row-stores may adopt some column-store techniquesl Column-stores add row-store (or PAX) functionality

l Many open issues to do research on!