Technical Deep-Dive in a Column-Oriented In-Memory Database Martin Faust [email protected] Research Group of Prof. Hasso Plattner Hasso Plattner Institute for Software Engineering University of Potsdam


Page 1: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Technical Deep-Dive in a

Column-Oriented In-Memory Database

Martin Faust [email protected]

Research Group of Prof. Hasso Plattner

Hasso Plattner Institute for Software Engineering University of Potsdam


Enterprise Platform and Integration Concepts Group - Topics

Research Group of Prof. Dr. h.c. Hasso Plattner
§ Research focuses on the technical aspects of enterprise software and design of complex applications
  § In-Memory Data Management for Enterprise Applications
  § Human-Centered Software Design and Engineering
  § Programming Model for Enterprise Applications
  § Maintenance and Evolution of Service-Oriented Enterprise Software
  § RFID Technology in Enterprise Platforms
§ Teaching activities:
  § Lectures, Exercises, Seminars
  § Bachelor/Master projects
  § Master thesis


The Design Thinking Methodology

[Figure: three overlapping factors: Desirability (Human Factors), Feasibility (Technical Factors), and Viability (Business Factors), with the labels "Solutions" and "Needs & Opportunities".]


Setup Enables Leading Research in the Field of In-Memory Data Management

[Figure: research setup diagram. Labels: Hasso Plattner Institute, Enterprise Systems & Applications Chair; Massachusetts Institute of Technology; UC Berkeley; Stanford University; End-Users; Demand Prediction; Supply Chain Innovation; Database Technology; Software; Hardware; Cloud Computing; Design Research; In-Memory Data Management.]


Learning Map of our Online Lecture @ openHPI.de

□ The Future of Enterprise Computing
□ Foundations of Database Storage Techniques
□ In-Memory Database Operators
□ Advanced Database Storage Techniques
□ Foundations for a New Enterprise Application Development Era


Goals

Deep technical understanding of a column-oriented, dictionary-encoded in-memory database and its application in enterprise computing

Chapters
□ The future of enterprise computing
□ Foundations of database storage techniques
□ In-memory database operators
□ Advanced database storage techniques
□ Implications on application development


Chapter 1:

The Future of Enterprise Computing


New Requirements for Enterprise Computing

□ Sensors
□ Events
□ Structured + unstructured data
□ Social networks
□ ...

Enterprise Application Characteristics


OLTP vs. OLAP

Online Transaction Processing vs. Online Analytical Processing

¨ Modern enterprise resource planning (ERP) systems are challenged by mixed workloads, including OLAP-style queries. For example:
  § OLTP-style: create sales order, invoice, accounting documents; display customer master data or sales order
  § OLAP-style: dunning, available-to-promise, cross selling, operational reporting (list open sales orders)
¨ But: Today's data management systems are optimized either for daily transactional or for analytical workloads, storing their data along rows or columns


Drawbacks of the Separation

¨ OLAP systems do not have the latest data
¨ OLAP systems only have a predefined subset of the data
¨ Cost-intensive ETL processes have to sync both systems
¨ Separation introduces data redundancy
¨ Different data schemas introduce complexity for applications combining sources


Enterprise Workloads are Read Dominated

[Figure: two stacked bar charts of workload share, 0 % to 100 %. One chart shows OLTP and OLAP, the other TPC-C. Legends: Read (Lookup, Table Scan, Range Select) and Write (Insert, Modification, Delete); Read (Select) and Write (Insert, Modification, Delete).]

§ Workload in enterprise applications consists of
  § Mainly read queries (OLTP 83%, OLAP 94%)
  § Many queries that access large sets of data


Few Updates in OLTP

[Figure: chart with y-axis "Percentage of rows updated".]


Vision

Combine OLTP and OLAP data using modern hardware and database systems to create a single source of truth, enable real-time analytics, and simplify applications and database structures.

Additionally,
¨ Extraction, transformation, and loading (ETL) processes
¨ Pre-computed aggregates and materialized views
become (almost) obsolete.


Enterprise Data Characteristics

¨ Many columns are not used even once
¨ Many columns have a low cardinality of values
¨ NULL values/default values are dominant
¨ Sparse distribution facilitates high compression

Standard enterprise software data is sparse and wide.


Many Columns are not Used Even Once

55% unused columns per company on average
40% unused columns across all 12 analyzed companies

[Figure: bar chart, % of columns by number of distinct values]

Number of Distinct Values   Inventory Management   Financial Accounting
1 - 32                      78%                    64%
33 - 1023                   9%                     12%
1024 - 100000000            13%                    24%


Changes in Hardware


Changes in Hardware…

… give an opportunity to re-think the assumptions of yesterday because of what is possible today.

§ Main memory becomes cheaper and larger
■ Multi-core architecture (128 cores per server)
■ Large main memories: 2 TB/blade
■ One blade ~$50.000 = 1 enterprise class server
■ Parallel scaling across blades
■ 64-bit address space
■ 12 TB in current servers
■ Cost-performance ratio rapidly declining
■ Memory hierarchies


In the Meantime Research has come up with…

… several advances in software for processing data

§ Column-oriented data organization (the column store)
  § Sequential scans allow best bandwidth utilization between CPU cores and memory
  § Independence of tuples allows easy partitioning and therefore parallel processing
§ Lightweight compression
  § Reducing data amount
  § Increasing processing speed through late materialization
§ And more, e.g., parallel scan/join/aggregation


A Blueprint of SanssouciDB


SanssouciDB: An In-Memory Database for Enterprise Applications

In-Memory Database (IMDB)
¨ Data resides permanently in main memory
¨ Main memory is the primary "persistence"
¨ Still: logging and recovery to/from flash
¨ Main memory access is the new bottleneck
¨ Cache-conscious algorithms/data structures are crucial (locality is king)

[Figure: SanssouciDB architecture. Distribution Layer at Server (Blade) i: Interface Services and Session Management; Query Execution, Metadata, TA Manager. Main Memory at Server (Blade) i: Active Data in a Main Store and a Differential Store (Column, Combined Column) connected by Merge; Indexes (Inverted, Object Data Guide); Data aging, Time travel. Non-Volatile Memory: Log, Snapshots, Passive Data (History); Logging, Recovery.]


Chapter 2:

Foundations of Database Storage Techniques


Learning Map of our Online Lecture @ openHPI.de

□ The Future of Enterprise Computing
□ Foundations of Database Storage Techniques
□ In-Memory Database Operators
□ Advanced Database Storage Techniques
□ Foundations for a New Enterprise Application Development Era


Data Layout in Main Memory


Memory Basics (1)

□ Memory in today's computers has a linear address layout: addresses start at 0x0 and go to 0xFFFFFFFFFFFFFFFF for 64-bit
□ Each process has its own virtual address space
□ Virtual memory allocated by the program can be distributed over multiple physical memory locations
□ Address translation is done in hardware by the CPU


Memory Basics (2)

□ Memory layout is only linear
□ Every higher-dimensional access (like two-dimensional database tables) is mapped to this linear band

[Figure: linear address band from 0x0 to 0xFFFF FFFF FFFF FFFF.]


Memory Hierarchy

[Figure: two Nehalem quad-core CPUs (Core 0 to Core 3 each), connected via QPI, each attached to its own main memory. Each CPU shows TLB, L1, L2, and an L3 cache. Data is transferred in units of L1, L2, and L3 cache lines and memory pages.]


Physical Data Representation

§ Row store:
  § Rows are stored consecutively
  § Optimal for row-wise access (e.g. SELECT *)
§ Column store:
  § Columns are stored consecutively
  § Optimal for attribute-focused access (e.g. SUM, GROUP BY)
§ Note: concept is independent from storage type

[Figure: a table with columns Doc Num, Doc Date, Sold-To, Value, Status, Sales Org and rows Row 1 to Row 4, shown once as a row store and once as a column store.]


Row Data Layout

□ Data is stored tuple-wise
□ Leverage co-location of attributes for a single tuple
□ Low cost for tuple reconstruction, but higher cost for sequential scan of a single attribute

[Figure: memory band with the attributes A, B, C of consecutive rows interleaved; a row operation and a column operation are marked on the band.]


Columnar Data Layout

□ Data is stored attribute-wise
□ Leverage sequential scan speed in main memory for predicate evaluation
□ Tuple reconstruction is more expensive

[Figure: memory band with all values of attribute A, then B, then C stored contiguously; a column operation and a row operation are marked on the band.]


Row-oriented storage

Table:
A1 B1 C1
A2 B2 C2
A3 B3 C3
A4 B4 C4

Linear memory layout (rows are appended one after another):
A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4


Column-oriented storage

Table:
A1 B1 C1
A2 B2 C2
A3 B3 C3
A4 B4 C4

Linear memory layout (columns are appended one after another):
A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4
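The two storage orders shown on the row-oriented and column-oriented storage slides can be sketched in a few lines of Python. This is an illustrative sketch, not code from the lecture: the function names (`row_layout`, `column_layout`, `row_offset`, `column_offset`) are made up for this example, and the 4 x 3 table mirrors the A/B/C cells used on the slides.

```python
# Hypothetical 4x3 table matching the slides: rows 1..4, attributes A, B, C.
table = [[f"{attr}{row}" for attr in "ABC"] for row in range(1, 5)]

def row_layout(rows):
    """Row store: append complete tuples one after another."""
    return [value for row in rows for value in row]

def column_layout(rows):
    """Column store: append complete columns one after another."""
    return [value for column in zip(*rows) for value in column]

def row_offset(row, col, n_cols):
    """Index of cell (row, col) in the row-oriented linear band."""
    return row * n_cols + col

def column_offset(row, col, n_rows):
    """Index of cell (row, col) in the column-oriented linear band."""
    return col * n_rows + row

print(" ".join(row_layout(table)))
print(" ".join(column_layout(table)))
```

In the row layout the values of one attribute are `n_cols` positions apart, while in the column layout they are adjacent, which is what makes sequential scans of a single attribute cheap.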


Dictionary Encoding


Motivation

□ Main memory access is the new bottleneck
□ Idea: Trade CPU time to compress and decompress data
□ Compression reduces the number of memory accesses
□ Leads to fewer cache misses because more information fits on a cache line
□ Operation directly on compressed data possible
□ Offsetting with bit-encoded fixed-length data types
□ Based on limited value domain


Dictionary Encoding Example

8 billion humans

□ Attributes
  § first name
  § last name
  § gender
  § country
  § city
  § birthday
  → 200 byte
□ Each attribute is stored dictionary encoded


Sample Data

rec ID   fname   lname     gender   city        country   birthday
…        …       …         …        …           …         …
39       John    Smith     m        Chicago     USA       12.03.1964
40       Mary    Brown     f        London      UK        12.05.1964
41       Jane    Doe       f        Palo Alto   USA       23.04.1976
42       John    Doe       m        Palo Alto   USA       17.06.1952
43       Peter   Schmidt   m        Potsdam     GER       11.11.1975
…        …       …         …        …           …         …


Dictionary Encoding a Column

□ A column is split into a dictionary and an attribute vector
□ Dictionary stores all distinct values with implicit value ID
□ Attribute vector stores value IDs for all entries in the column
□ Position is implicit, not stored explicitly
□ Enables offsetting with fixed-length data types

Column "fname"
Rec ID   fname
…        …
39       John
40       Mary
41       Jane
42       John
43       Peter
…        …

Dictionary for "fname"
Value ID   Value
…          …
23         John
24         Mary
25         Jane
26         Peter
…          …

Attribute Vector for "fname"
position   Value ID
…          …
39         23
40         24
41         25
42         23
43         26
…          …
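The split of a column into a dictionary and an attribute vector can be sketched as follows. This is a minimal illustration under stated assumptions, not the database's implementation: `dictionary_encode` is a hypothetical helper, value IDs are assigned in order of first appearance (as on the slide, where the dictionary is not yet sorted), and because only the five visible records are encoded the IDs start at 0 rather than at 23.

```python
def dictionary_encode(column):
    """Split a column into (dictionary, attribute_vector).

    The value ID of a dictionary entry is its list index, so it is
    implicit and never stored. The same holds for positions in the
    attribute vector.
    """
    dictionary = []        # value ID -> value
    value_ids = {}         # value -> value ID (helper used only while encoding)
    attribute_vector = []  # position -> value ID
    for value in column:
        if value not in value_ids:
            value_ids[value] = len(dictionary)
            dictionary.append(value)
        attribute_vector.append(value_ids[value])
    return dictionary, attribute_vector

# The five "fname" values of records 39..43 from the sample data.
fname = ["John", "Mary", "Jane", "John", "Peter"]
dictionary, attribute_vector = dictionary_encode(fname)

# Decoding is a plain index lookup per position.
decoded = [dictionary[value_id] for value_id in attribute_vector]
print(dictionary, attribute_vector)
```

Both occurrences of "John" map to the same value ID, so the string itself is stored only once.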


Sorted Dictionary

□ Dictionary entries are sorted either by their numeric value or lexicographically
  § Dictionary lookup complexity: O(log(n)) instead of O(n)
□ Dictionary entries can be compressed to reduce the amount of required storage
□ Selection criteria with ranges are less expensive (order-preserving dictionary)
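A sorted dictionary allows binary search for lookups, and because it preserves order a range predicate on values becomes a range predicate on value IDs. The sketch below uses Python's standard `bisect` module; the helper names (`encode_sorted`, `lookup`, `range_scan`) are assumptions made for this illustration.

```python
from bisect import bisect_left, bisect_right

def encode_sorted(column):
    """Build a sorted dictionary and the matching attribute vector."""
    dictionary = sorted(set(column))
    attribute_vector = [bisect_left(dictionary, value) for value in column]
    return dictionary, attribute_vector

def lookup(dictionary, value):
    """Binary search, O(log n): return the value ID or None if absent."""
    i = bisect_left(dictionary, value)
    return i if i < len(dictionary) and dictionary[i] == value else None

def range_scan(dictionary, attribute_vector, low, high):
    """Positions whose value lies in [low, high].

    The bounds are translated to value IDs once; the scan then only
    compares integers and never touches the strings.
    """
    low_id = bisect_left(dictionary, low)
    high_id = bisect_right(dictionary, high)  # exclusive upper bound
    return [pos for pos, vid in enumerate(attribute_vector) if low_id <= vid < high_id]

fname = ["John", "Mary", "Jane", "John", "Peter"]
dictionary, attribute_vector = encode_sorted(fname)
print(lookup(dictionary, "Mary"))
print(range_scan(dictionary, attribute_vector, "Jane", "John"))
```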


Data Size Examples

Column       Cardinality   Bits Needed   Item Size   Plain Size   Size with Dictionary (Dictionary + Column)   Compression Factor
First name   5 million     23 bit        50 Byte     373 GB       238.4 MB + 21.4 GB                           ≈ 17
Last name    8 million     23 bit        50 Byte     373 GB       381.5 MB + 21.4 GB                           ≈ 17
Gender       2             1 bit         1 Byte      7 GB         2 bit + 953.7 MB                             ≈ 8
City         1 million     20 bit        50 Byte     373 GB       47.7 MB + 18.6 GB                            ≈ 20
Country      200           8 bit         47 Byte     350 GB       9.2 KB + 7.5 GB                              ≈ 47
Birthday     40,000        16 bit        2 Byte      15 GB        78.1 KB + 14.9 GB                            ≈ 1
Totals                                   200 Byte    ≈ 1.6 TB     ≈ 92 GB                                      ≈ 17
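The figures in the table follow from the cardinality and item size of each column. The sketch below recomputes them under the assumption that a value ID needs ceil(log2(cardinality)) bits and that the dictionary stores each distinct value once at its full item size; `bits_needed` and `sizes` are hypothetical helper names. Under that assumption the gender dictionary comes out as 2 bytes, whereas the slide lists 2 bit; the difference does not affect the totals.

```python
N_ROWS = 8_000_000_000  # 8 billion humans

# (column, cardinality, item size in bytes) as listed in the table
COLUMNS = [
    ("First name", 5_000_000, 50),
    ("Last name", 8_000_000, 50),
    ("Gender", 2, 1),
    ("City", 1_000_000, 50),
    ("Country", 200, 47),
    ("Birthday", 40_000, 2),
]

def bits_needed(cardinality):
    """Smallest bit width that can address `cardinality` distinct value IDs."""
    return max(1, (cardinality - 1).bit_length())

def sizes(cardinality, item_size, n_rows=N_ROWS):
    """Return (plain, dictionary, attribute vector) sizes in bytes."""
    plain = n_rows * item_size
    dictionary = cardinality * item_size
    attribute_vector = n_rows * bits_needed(cardinality) / 8
    return plain, dictionary, attribute_vector

total_plain = total_encoded = 0
for name, cardinality, item_size in COLUMNS:
    plain, dictionary, av = sizes(cardinality, item_size)
    total_plain += plain
    total_encoded += dictionary + av
    factor = plain / (dictionary + av)
    print(f"{name:<10} {bits_needed(cardinality):>2} bit  compression factor {factor:5.1f}")
print(f"Total compression factor {total_plain / total_encoded:.1f}")
```

Note that the per-column sizes in the table use binary units (for example 8 billion x 23 bit ≈ 21.4 GiB), while the totals of 1.6 TB and 92 GB match decimal units.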


Chapter 3:

In-Memory Database Operators


Learning Map of our Online Lecture @ openHPI.de

□ The Future of Enterprise Computing
□ Foundations of Database Storage Techniques
□ In-Memory Database Operators
□ Advanced Database Storage Techniques
□ Foundations for a New Enterprise Application Development Era


Scan Performance (1)

8 billion humans

¨ Attributes
  § First Name
  § Last Name
  § Gender
  § Country
  § City
  § Birthday
  → 200 byte
¨ Question: How many men/women?
¨ Assumed scan speed: 2 MB/ms/core


Scan Performance (2)

Row Store – Layout

[Figure: table "humans" with columns First Name, Last Name, Gender, Country, City, Birthday and rows Row 1, Row 2, Row 3, ..., Row 8 x 10^9.]

¨ Table size = 8 billion tuples x 200 bytes per tuple → ~1.6 TB
¨ Scan through all rows with 2 MB/ms/core → ~800 seconds with 1 core


Scan Performance (3)

Row Store – Full Table Scan

[Figure: table "humans" with columns First Name, Last Name, Gender, Country, City, Birthday and rows Row 1 to Row 8 x 10^9. Legend: data loaded and used; data loaded but not used.]

¨ Table size = 8 billion tuples x 200 bytes per tuple → ~1.6 TB
¨ Scan through all rows with 2 MB/ms/core → ~800 seconds with 1 core


Scan Performance (4)

Row Store – Stride Access "Gender"

[Figure: table "humans" with columns First Name, Last Name, Gender, Country, City, Birthday and rows Row 1 to Row 8 x 10^9. Legend: data not loaded; data loaded and used; data loaded but not used.]

¨ 8 billion cache accesses of 64 bytes each → ~512 GB
¨ Read with 2 MB/ms/core → ~256 seconds with 1 core


Scan Performance (5)

Column Store – Layout

[Figure: table "humans" stored as separate columns First Name, Last Name, Gender, Country, City, Birthday.]

¨ Table size
  § Attribute vectors: ~91 GB
  § Dictionaries: ~700 MB
  → Total: ~92 GB
¨ Compression factor: ~17


Scan Performance (6)

Column Store – Full Column Scan on "Gender"

[Figure: table "humans" stored as separate columns First Name, Last Name, Gender, Country, City, Birthday. Legend: data not loaded; data loaded and used; data loaded but not used.]

¨ Size of attribute vector "gender" = 8 billion tuples x 1 bit per tuple → ~1 GB
¨ Scan through attribute vector with 2 MB/ms/core → ~0.5 seconds with 1 core


Scan Performance (7)

Column Store – Full Column Scan on "Birthday"

[Figure: table "humans" stored as separate columns First Name, Last Name, Gender, Country, City, Birthday. Legend: data not loaded; data loaded and used; data loaded but not used.]

¨ Size of attribute vector "birthday" = 8 billion tuples x 2 Byte per tuple → ~16 GB
¨ Scan through column with 2 MB/ms/core → ~8 seconds with 1 core


Scan Performance – Summary

¨ How many women, how many men?

                            Column Store   Row Store (full table scan)   Row Store (stride access)
Time in seconds             0.5            800                           256
Compared to column store                   1,600x slower                 512x slower
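The times in the summary follow directly from the data volume each access pattern has to move and the assumed scan speed of 2 MB/ms/core. The short sketch below redoes that arithmetic; `scan_seconds` is a hypothetical helper and all sizes use decimal units, as on the slides.

```python
SCAN_BYTES_PER_MS = 2_000_000  # assumed scan speed from the slides: 2 MB/ms/core
N_ROWS = 8_000_000_000         # 8 billion tuples
TUPLE_BYTES = 200              # row-store tuple width
CACHE_LINE_BYTES = 64

def scan_seconds(n_bytes, cores=1):
    """Time to stream n_bytes through `cores` cores at the assumed speed."""
    return n_bytes / (SCAN_BYTES_PER_MS * cores) / 1000

full_table_scan = scan_seconds(N_ROWS * TUPLE_BYTES)       # every byte of every row
stride_access = scan_seconds(N_ROWS * CACHE_LINE_BYTES)    # one cache line per row
gender_column = scan_seconds(N_ROWS * 1 / 8)               # 1 bit per tuple
birthday_column = scan_seconds(N_ROWS * 2)                 # 2 bytes per tuple

print(full_table_scan, stride_access, gender_column, birthday_column)
print(full_table_scan / gender_column, stride_access / gender_column)
```

Passing a larger `cores` value shows the effect of parallel scans, under the simplifying assumption that scan speed scales linearly with the number of cores.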


Tuple Reconstruction


Tuple Reconstruction (1)

Accessing a record in a row store

¨ All attributes are stored consecutively
¨ 200 byte → 4 cache accesses of 64 bytes each → 256 byte
¨ Read with 2 MB/ms/core → ~0.128 μs with 1 core

[Figure: table world_population]


Tuple Reconstruction (2)

¨ All attributes are stored in separate columns
¨ Implicit record IDs are used to reconstruct rows

[Figure: table world_population with virtual record IDs]


Tuple Reconstruction (3)

¨ 1 cache access for each attribute
¨ 6 cache accesses of 64 bytes each → 384 byte
¨ Read with 2 MB/ms/core → ~0.192 μs with 1 core

[Figure: table world_population with virtual record IDs]
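Tuple reconstruction in a column store means reading the value ID at the same position in every attribute vector and resolving it through that column's dictionary. The sketch below shows this for two records of the sample data and repeats the cache-line arithmetic from the slides. It is an illustration only: `reconstruct`, `access_microseconds`, and the tiny `columns` structure are assumptions, and only three of the six attributes are included.

```python
import math

CACHE_LINE_BYTES = 64
SCAN_BYTES_PER_MS = 2_000_000  # assumed 2 MB/ms/core, as on the slides

# Each column is (sorted dictionary, attribute vector); positions are implicit.
columns = {
    "fname": (["Jane", "John"], [1, 0]),
    "lname": (["Doe", "Smith"], [1, 0]),
    "city": (["Chicago", "Palo Alto"], [0, 1]),
}

def reconstruct(columns, position):
    """Rebuild one tuple: one attribute-vector read plus one dictionary read per column."""
    return {
        name: dictionary[attribute_vector[position]]
        for name, (dictionary, attribute_vector) in columns.items()
    }

def access_microseconds(cache_lines):
    """Time to load the given number of cache lines at the assumed speed."""
    return cache_lines * CACHE_LINE_BYTES / SCAN_BYTES_PER_MS * 1000

row_store_lines = math.ceil(200 / CACHE_LINE_BYTES)  # 200-byte tuple, stored contiguously
column_store_lines = 6                               # one cache access per attribute

print(reconstruct(columns, 0))
print(access_microseconds(row_store_lines), access_microseconds(column_store_lines))
```

The count of six cache accesses follows the slide and ignores the additional dictionary reads needed to turn value IDs back into values.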


Select


SELECT Example

SELECT fname, lname FROM world_population
WHERE country="Italy" and gender="m"

fname       lname       country       gender
Gianluigi   Buffon      Italy         m
Lena        Gercke      Germany       f
Mario       Balotelli   Italy         m
Manuel      Neuer       Germany       m
Lukas       Podolski    Germany       m
Klaas-Jan   Huntelaar   Netherlands   m


Query Plan

SELECT fname, lname FROM world_population
WHERE country="Italy" and gender="m"

¨ Multiple plans exist to execute the query
¨ Query optimizer decides which one is executed
¨ Based on cost model, statistics, and other parameters
¨ Alternatives
  ¨ Scan "country" and "gender", positional AND
  ¨ Scan over "country" and probe into "gender"
  ¨ Indices might be used
¨ Decision depends on data and query parameters, e.g. selectivity


Query Plan (i)

Positional AND:
¨ Predicates are evaluated and generate position lists
¨ Intermediate position lists are logically combined
¨ Final position list is used for materialization

[Figure: query plan. The scans for country = "Italy" and gender = "m" each produce a position list; a positional AND combines both lists; π projects fname, lname.]


Query Execution (i)

Dictionary for "country"
Value ID   Value
0          Algeria
1          France
2          Germany
3          Italy
4          Netherlands
…

Dictionary for "gender"
Value ID   Value
0          f
1          m

Table (country and gender shown as value IDs)
fname       lname       country   gender
Gianluigi   Buffon      3         1
Lena        Gercke      2         0
Mario       Balotelli   3         1
Manuel      Neuer       2         1
Lukas       Podolski    2         1
Klaas-Jan   Huntelaar   4         1

Position lists
country = 3 ("Italy") → positions 0, 2
gender = 1 ("m") → positions 0, 2, 3, 4, 5
AND → positions 0, 2
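The positional AND plan can be traced with the example data from this slide. The sketch below is one possible way to write it, assuming sorted position lists that are intersected with a merge; the function names `scan` and `positional_and` are made up for this example.

```python
COUNTRY_DICT = ["Algeria", "France", "Germany", "Italy", "Netherlands"]
GENDER_DICT = ["f", "m"]

fname = ["Gianluigi", "Lena", "Mario", "Manuel", "Lukas", "Klaas-Jan"]
lname = ["Buffon", "Gercke", "Balotelli", "Neuer", "Podolski", "Huntelaar"]
country = [3, 2, 3, 2, 2, 4]  # attribute vector of value IDs
gender = [1, 0, 1, 1, 1, 1]   # attribute vector of value IDs

def scan(attribute_vector, value_id):
    """Full column scan: positions whose value ID matches the predicate."""
    return [pos for pos, vid in enumerate(attribute_vector) if vid == value_id]

def positional_and(left, right):
    """Intersect two sorted position lists with a linear merge."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            result.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return result

# The predicates are translated to value IDs once, before scanning.
italy = scan(country, COUNTRY_DICT.index("Italy"))
male = scan(gender, GENDER_DICT.index("m"))
positions = positional_and(italy, male)

# Materialization: only now are fname and lname touched.
result = [(fname[p], lname[p]) for p in positions]
print(italy, male, positions, result)
```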


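Query Plan (ii)

Based on the position list produced by the first selection, the gender column is probed.

[Figure: query plan. The scan for country = "Italy" produces a position list; a Probe / Positional Filter checks gender = "m" on the Gender column at those positions; π projects fname, lname.]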
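The second plan scans only one column and then probes the other column at the resulting positions. A minimal sketch, reusing the value-ID encoded example data from the query execution slide; `scan_then_probe` is a hypothetical name.

```python
country = [3, 2, 3, 2, 2, 4]  # value IDs, 3 = "Italy"
gender = [1, 0, 1, 1, 1, 1]   # value IDs, 1 = "m"

def scan_then_probe(scan_column, probe_column, scan_id, probe_id):
    """Scan one column fully, then check the second predicate only at
    the positions that survived the first one.

    Returns the final position list and the number of probes performed.
    """
    candidates = [pos for pos, vid in enumerate(scan_column) if vid == scan_id]
    matches = [pos for pos in candidates if probe_column[pos] == probe_id]
    return matches, len(candidates)

positions, probes = scan_then_probe(country, gender, scan_id=3, probe_id=1)
print(positions, probes)
```

With a selective first predicate, here two candidates out of six rows, the gender column is read at two positions instead of being scanned completely. This is why the choice between the plans depends on selectivity.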


Insert


Insert

□ Insert is the dominant modification operation
  § Delete/Update can be modeled as Inserts as well (insert-only approach)
□ Inserting into a compressed in-memory persistence can be expensive
  § Updating sorted sequences (e.g. dictionaries) is a challenge
  § Inserting into columnar storages is generally more expensive than inserting into row storages

Page 69: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Insert Example

Table: world_population

rowID   fname     lname      gender   country   city        birthday
0       Martin    Albrecht   m        GER       Berlin      08-05-1955
1       Michael   Berg       m        GER       Berlin      03-05-1970
2       Hanna     Schulze    f        GER       Hamburg     04-04-1968
3       Anton     Meyer      m        AUT       Innsbruck   10-20-1992
4       Sophie    Schulze    f        GER       Potsdam     09-03-1977
...     ...       ...        ...      ...       ...         ...

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

69  

Page 70: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

INSERT (1) w/o new Dictionary entry

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

Table world_population: rows 0 to 4 as on the Insert Example slide.

Column lname

Attribute Vector (AV)      Dictionary (D)
position   valueID         valueID   value
0          0               0         Albrecht
1          1               1         Berg
2          3               2         Meyer
3          2               3         Schulze
4          3

70  

Page 71: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

INSERT (1) w/o new Dictionary entry

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

1.  Look-up on D → entry found

Table world_population: rows 0 to 4 as on the Insert Example slide.

Column lname

Attribute Vector (AV)      Dictionary (D)
position   valueID         valueID   value
0          0               0         Albrecht
1          1               1         Berg
2          3               2         Meyer
3          2               3         Schulze
4          3

71  

Page 72: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

INSERT (1) w/o new Dictionary entry

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

1.  Look-up on D → entry found
2.  Append ValueID to AV

Table world_population: rows 0 to 4 as on the Insert Example slide; row 5 so far: lname = Schulze.

Column lname

Attribute Vector (AV)      Dictionary (D)
position   valueID         valueID   value
0          0               0         Albrecht
1          1               1         Berg
2          3               2         Meyer
3          2               3         Schulze
4          3
5          3

72  

Page 73: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

INSERT (2) with new Dictionary Entry I/II

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

Table world_population: rows 0 to 4 as on the Insert Example slide; row 5 so far: lname = Schulze.

Column city

Attribute Vector (AV)      Dictionary (D)
position   valueID         valueID   value
0          0               0         Berlin
1          0               1         Hamburg
2          1               2         Innsbruck
3          2               3         Potsdam
4          3

73  

Page 74: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

INSERT (2) with new Dictionary Entry I/II

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

1.  Look-up on D → no entry found

Table world_population: rows 0 to 4 as on the Insert Example slide; row 5 so far: lname = Schulze.

Column city

Attribute Vector (AV)      Dictionary (D)
position   valueID         valueID   value
0          0               0         Berlin
1          0               1         Hamburg
2          1               2         Innsbruck
3          2               3         Potsdam
4          3

74  

Page 75: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

INSERT (2) with new Dictionary Entry I/II

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

1.  Look-up on D → no entry found
2.  Append new value to D (no re-sorting necessary)

Table world_population: rows 0 to 4 as on the Insert Example slide; row 5 so far: lname = Schulze.

Column city

Attribute Vector (AV)      Dictionary (D)
position   valueID         valueID   value
0          0               0         Berlin
1          0               1         Hamburg
2          1               2         Innsbruck
3          2               3         Potsdam
4          3               4         Rostock

75  

Page 76: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

INSERT (2) with new Dictionary Entry I/II

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

1.  Look-up on D → no entry found
2.  Append new value to D (no re-sorting necessary)
3.  Append ValueID to AV

Table world_population: rows 0 to 4 as on the Insert Example slide; row 5 so far: lname = Schulze, city = Rostock.

Column city

Attribute Vector (AV)      Dictionary (D)
position   valueID         valueID   value
0          0               0         Berlin
1          0               1         Hamburg
2          1               2         Innsbruck
3          2               3         Potsdam
4          3               4         Rostock
5          4

76  

Page 77: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

INSERT (2) with new Dictionary Entry I/II

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

Table world_population: rows 0 to 4 as on the Insert Example slide; row 5 so far: lname = Schulze, city = Rostock.

Column fname

Attribute Vector (AV)      Dictionary (D)
position   valueID         valueID   value
0          2               0         Anton
1          3               1         Hanna
2          1               2         Martin
3          0               3         Michael
4          4               4         Sophie

77  

Page 78: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

INSERT (2) with new Dictionary Entry II/II

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

1.  Look-up on D → no entry found

Table world_population: rows 0 to 4 as on the Insert Example slide; row 5 so far: lname = Schulze, city = Rostock.

Column fname

Attribute Vector (AV)      Dictionary (D)
position   valueID         valueID   value
0          2               0         Anton
1          3               1         Hanna
2          1               2         Martin
3          0               3         Michael
4          4               4         Sophie

78  

Page 79: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

INSERT (2) with new Dictionary Entry II/II

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

1.  Look-up on D → no entry found
2.  Insert new value to D

Table world_population: rows 0 to 4 as on the Insert Example slide; row 5 so far: lname = Schulze, city = Rostock.

Column fname

Attribute Vector (AV)      Dictionary (D)
position   valueID         valueID   value
0          2               0         Anton
1          3               1         Hanna
2          1               2         Karen
3          0               3         Martin
4          4               4         Michael
                           5         Sophie

79  

Page 80: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

INSERT (2) with new Dictionary Entry II/II

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

1.  Look-up on D → no entry found
2.  Insert new value to D

Table world_population: rows 0 to 4 as on the Insert Example slide; row 5 so far: lname = Schulze, city = Rostock.

Column fname

AV                         D (old)               D (new)
position   valueID         valueID   value       valueID   value
0          2               0         Anton       0         Anton
1          3               1         Hanna       1         Hanna
2          1               2         Martin      2         Karen
3          0               3         Michael     3         Martin
4          4               4         Sophie      4         Michael
                                                 5         Sophie

80  

Page 81: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

INSERT (2) with new Dictionary Entry II/II

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

1.  Look-up on D → no entry found
2.  Insert new value to D
3.  Change ValueIDs in AV

Table world_population: rows 0 to 4 as on the Insert Example slide; row 5 so far: lname = Schulze, city = Rostock.

Column fname

AV (old)                   AV (new)                   D (new)
position   valueID         position   valueID         valueID   value
0          2               0          3               0         Anton
1          3               1          4               1         Hanna
2          1               2          1               2         Karen
3          0               3          0               3         Martin
4          4               4          5               4         Michael
                                                      5         Sophie

81  

Page 82: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

INSERT (2) with new Dictionary Entry II/II

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

1.  Look-up on D → no entry found
2.  Insert new value to D
3.  Change ValueIDs in AV
4.  Append new ValueID to AV

Table world_population: rows 0 to 4 as on the Insert Example slide; row 5 so far: fname = Karen, lname = Schulze, city = Rostock.

Column fname

Attribute Vector (AV)      Dictionary (D)
position   valueID         valueID   value
0          3               0         Anton
1          4               1         Hanna
2          1               2         Karen
3          0               3         Martin
4          5               4         Michael
5          2               5         Sophie

Changed Value IDs: positions 0, 1 and 4

82
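The three insert cases walked through above (value already in the dictionary, new value that sorts last, new value that lands in the middle) can be covered by one routine. The following is a minimal sketch, not the implementation behind the slides: `insert_value` is a hypothetical name, and Python lists replace the compressed structures.

```python
import bisect


def insert_value(dictionary, attribute_vector, value):
    """Insert `value` into a column with a sorted dictionary.

    Returns the value ID that was appended to the attribute vector.
    """
    pos = bisect.bisect_left(dictionary, value)  # 1. look-up on D
    found = pos < len(dictionary) and dictionary[pos] == value
    if not found:
        dictionary.insert(pos, value)            # 2. insert new value into D
        # 3. every value ID at or above the insert position shifts by one
        for i, vid in enumerate(attribute_vector):
            if vid >= pos:
                attribute_vector[i] = vid + 1
    attribute_vector.append(pos)                 # 4. append value ID to AV
    return pos


# lname: "Schulze" already exists, so only the attribute vector grows.
lname_d = ["Albrecht", "Berg", "Meyer", "Schulze"]
lname_av = [0, 1, 3, 2, 3]
insert_value(lname_d, lname_av, "Schulze")

# city: "Rostock" sorts last, so no existing value ID changes.
city_d = ["Berlin", "Hamburg", "Innsbruck", "Potsdam"]
city_av = [0, 0, 1, 2, 3]
insert_value(city_d, city_av, "Rostock")

# fname: "Karen" lands in the middle, so larger value IDs are rewritten.
fname_d = ["Anton", "Hanna", "Martin", "Michael", "Sophie"]
fname_av = [2, 3, 1, 0, 4]
insert_value(fname_d, fname_av, "Karen")
```

The loop in step 3 touches the whole attribute vector, which is why inserts that change a sorted dictionary are expensive on large columns.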

Page 83: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Result

Table: world_population

rowID   fname     lname      gender   country   city        birthday
0       Martin    Albrecht   m        GER       Berlin      08-05-1955
1       Michael   Berg       m        GER       Berlin      03-05-1970
2       Hanna     Schulze    f        GER       Hamburg     04-04-1968
3       Anton     Meyer      m        AUT       Innsbruck   10-20-1992
4       Sophie    Schulze    f        GER       Potsdam     09-03-1977
5       Karen     Schulze    f        GER       Rostock     11-15-2012

INSERT INTO world_population VALUES (Karen, Schulze, f, GER, Rostock, 11-15-2012)

83  

Page 84: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Chapter 4:

Advanced Database Storage Techniques

Page 85: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Learning Map of our Online Lecture @ openHPI.de

[Figure: learning map with the following chapters]
□  Foundations for a New Enterprise Application Development Era
□  Foundations of Database Storage Techniques
□  The Future of Enterprise Computing
□  Advanced Database Storage Techniques
□  In-Memory Database Operators

Page 86: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Differential Buffer

Page 87: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Motivation

□  Inserting new tuples directly into a compressed structure can be expensive
    §  Especially when using sorted structures
    §  New values can require reorganizing the dictionary
    §  The number of bits required to encode all dictionary values can change, so the attribute vector has to be reorganized

87  
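To make the last point concrete: with fixed-width bit packing, a dictionary of n distinct values needs ceil(log2(n)) bits per value ID. The sketch below assumes this common encoding (the slides do not spell out the exact scheme) and uses hypothetical function names; the row count of 8 billion is the one used for world_population elsewhere in this deck.

```python
def bits_per_value_id(distinct_values):
    """Bits needed to encode value IDs 0 .. distinct_values - 1."""
    if distinct_values < 1:
        raise ValueError("a dictionary needs at least one entry")
    # Assumption: at least one bit is used even for a single-entry dictionary.
    return max(1, (distinct_values - 1).bit_length())


def attribute_vector_bytes(rows, distinct_values):
    """Size of a bit-packed attribute vector, rounded up to whole bytes."""
    return (rows * bits_per_value_id(distinct_values) + 7) // 8


# Growing a dictionary from 4 to 5 entries crosses a power of two:
before = bits_per_value_id(4)
after = bits_per_value_id(5)
# For 8 billion rows, that single new value widens every entry of the vector.
size_before = attribute_vector_bytes(8_000_000_000, 4)
size_after = attribute_vector_bytes(8_000_000_000, 5)
```

Going from 5 to 6 entries, by contrast, leaves the width unchanged, so only inserts that cross a power of two force the attribute vector to be rewritten.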

Page 88: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Differential Buffer

¨  New values are written to a dedicated differential buffer (Delta)
¨  Cache Sensitive B+ Tree (CSB+) used for fast search on Delta

[Figure: table world_population, split into Main Store and Differential Buffer/Delta. Read operations access both parts; data-modifying operations (writes) go to the Differential Buffer.]

Main Store (compressed, 8 billion entries), column fname

Dictionary              Attribute Vector
valueID   value         position   valueID
0         Anton         0          0
1         Hanna         1          1
2         Michael       2          1
3         Sophie        3          3
                        4          2
                        5          1

Differential Buffer/Delta (up to 50,000 entries), column fname, with CSB+ tree

Dictionary              Attribute Vector
valueID   value         position   valueID
0         Angela        0          0
1         Klaus         1          1
2         Andre         2          1
                        3          2

88  

Page 89: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Differential Buffer

□  Inserts of new values are fast, because dictionary and attribute vector do not need to be re-sorted
□  Range selects on the differential buffer are expensive
    §  Unsorted dictionary allows no direct comparison of value IDs
    §  Scans with range selection need to look up values in the dictionary for comparisons
□  Differential buffer requires more memory:
    §  Attribute vector not bit-compressed
    §  Additional CSB+ tree for dictionary

89  
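The trade-off described above can be sketched as follows. Note the substitution: a plain Python `dict` stands in for the CSB+ tree as the search index over the unsorted dictionary, and the class and method names are hypothetical. The inserted names are the delta values shown on the previous slide.

```python
class DeltaColumn:
    """One column of a differential buffer with an unsorted dictionary."""

    def __init__(self):
        self.dictionary = []        # values in arrival order (unsorted)
        self.index = {}             # value -> value ID; stand-in for the CSB+ tree
        self.attribute_vector = []  # uncompressed value IDs

    def insert(self, value):
        """Append-only insert: no re-sorting, no value ID rewriting."""
        value_id = self.index.get(value)
        if value_id is None:
            value_id = len(self.dictionary)
            self.dictionary.append(value)
            self.index[value] = value_id
        self.attribute_vector.append(value_id)
        return value_id

    def range_select(self, low, high):
        """Positions with low <= value <= high.

        Value IDs carry no order here, so each one must be translated
        back to its value before it can be compared.
        """
        return [
            pos
            for pos, value_id in enumerate(self.attribute_vector)
            if low <= self.dictionary[value_id] <= high
        ]


delta = DeltaColumn()
for name in ["Angela", "Klaus", "Klaus", "Andre"]:
    delta.insert(name)
starts_with_a = delta.range_select("A", "B")
```

In the main store the same range query could compare value IDs directly, because a sorted dictionary preserves the order of the values.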

Page 90: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Tuple Lifetime

Michael moves from Berlin to Potsdam

UPDATE "world_population" SET city="Potsdam" WHERE fname="Michael" AND lname="Berg"

Main Table: world_population

Main Store

recId      fname       lname        gender   country   city        birthday
0          Martin      Albrecht     m        GER       Berlin      08-05-1955
1          Michael     Berg         m        GER       Berlin      03-05-1970
2          Hanna       Schulze      f        GER       Hamburg     04-04-1968
3          Anton       Meyer        m        AUT       Innsbruck   10-20-1992
4          Ulrike      Schulze      f        GER       Potsdam     09-03-1977
5          Sophie      Schulze      f        GER       Rostock     06-20-2012
...        ...         ...          ...      ...       ...         ...
8 * 10^9   Zacharias   Perdopolus   m        GRE       Athen       03-12-1979

Differential Buffer

(empty)

90  


Page 92: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Tuple Lifetime

Michael moves from Berlin to Potsdam

UPDATE "world_population" SET city="Potsdam" WHERE fname="Michael" AND lname="Berg"

Main Table: world_population

Main Store: rows as on the previous Tuple Lifetime slide.

Differential Buffer

recId   fname     lname   gender   country   city      birthday
0       Michael   Berg    m        GER       Potsdam   03-05-1970

92

Page 93: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Tuple Lifetime

□  Tuples are now available in Main Store and Differential Buffer
□  Tuples of a table are marked by a validity vector to reduce the required amount of reorganization steps
    §  Additional attribute vector for validity
    §  1 bit required per database tuple
□  Invalidated tuples stay in the database table until the next reorganization takes place
□  Query results
    §  Main and delta have to be queried
    §  Results are filtered using the validity vector

93  
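A minimal sketch of this insert-only update scheme, under simplifying assumptions: rows are kept as uncompressed Python dicts instead of dictionary-encoded columns, a list of booleans plays the role of the one-bit validity vector, and all function names are hypothetical.

```python
def new_store():
    """A store is a list of rows plus one validity flag per row."""
    return {"rows": [], "valid": []}


def insert(store, row):
    store["rows"].append(row)
    store["valid"].append(True)


def update_city(main, delta, fname, lname, new_city):
    """Invalidate the current version of the tuple and append the new
    version to the delta. Nothing is changed in place."""
    for store in (main, delta):
        for pos, row in enumerate(store["rows"]):
            if store["valid"][pos] and row["fname"] == fname and row["lname"] == lname:
                store["valid"][pos] = False
                insert(delta, dict(row, city=new_city))
                return True
    return False


def select_cities(main, delta, fname):
    """Query main and delta, then filter by the validity vector."""
    return [
        row["city"]
        for store in (main, delta)
        for row, is_valid in zip(store["rows"], store["valid"])
        if is_valid and row["fname"] == fname
    ]


main, delta = new_store(), new_store()
insert(main, {"fname": "Martin", "lname": "Albrecht", "city": "Berlin"})
insert(main, {"fname": "Michael", "lname": "Berg", "city": "Berlin"})
insert(main, {"fname": "Hanna", "lname": "Schulze", "city": "Hamburg"})

update_city(main, delta, "Michael", "Berg", "Potsdam")
michael_cities = select_cities(main, delta, "Michael")
```

The outdated Berlin tuple stays in the main store with its validity flag cleared and is only dropped or archived during the next reorganization.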

Page 94: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Tuple Lifetime

Michael moves from Berlin to Potsdam

UPDATE "world_population" SET city="Potsdam" WHERE fname="Michael" AND lname="Berg"

Main Table: world_population

Main Store

recId      fname       lname        gender   country   city        birthday     valid
0          Martin      Albrecht     m        GER       Berlin      08-05-1955   1
1          Michael     Berg         m        GER       Berlin      03-05-1970   0
2          Hanna       Schulze      f        GER       Hamburg     04-04-1968   1
3          Anton       Meyer        m        AUT       Innsbruck   10-20-1992   1
4          Ulrike      Schulze      f        GER       Potsdam     09-03-1977   1
5          Sophie      Schulze      f        GER       Rostock     06-20-2012   1
...        ...         ...          ...      ...       ...         ...
8 * 10^9   Zacharias   Perdopolus   m        GRE       Athen       03-12-1979   1

Differential Buffer

recId      fname       lname        gender   country   city        birthday     valid
0          Michael     Berg         m        GER       Potsdam     03-05-1970   1

94


Page 96: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Merge

Page 97: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Handling Write Operations

¨  All write operations (INSERT, UPDATE) are stored within a differential buffer (delta) first
¨  Read operations on the differential buffer are more expensive than on the main store
□  The differential buffer is merged periodically with the main store
    §  To avoid performance degradation caused by a large delta
    §  Merge is performed asynchronously

97  

Page 98: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Merge Overview I/II

¨  The merge process is triggered for single tables
¨  It is triggered by:
    §  Amount of tuples in buffer
    §  Cost model to
        §  Schedule
        §  Take query cost into account
    §  Manually

98  

Page 99: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Merge Overview II/II

¨  Working on data copies allows asynchronous merge
¨  Very limited interruption due to short lock
¨  At least twice the memory of the table needed!

[Figure: the table before, during, and after the merge process, each panel with data-modifying and read operations. Before: Main Store and Differential Buffer. During the merge process: Main Store, Differential Buffer, and Differential Buffer (new); the merge operation produces Main Store (new). After: Main Store (new) and Differential Buffer (new). Phases: Prepare, Attribute Merge, Commit.]

99

Page 100: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Attribute Merge

Step 1: Dictionary Merge
¨  Merge main and delta dictionary
¨  Optionally remove unused values
¨  Merge of two sorted arrays
¨  Create mapping if valueIDs changed

Step 2: Update Attribute Vector
¨  Create new merged main partition
¨  Update valueIDs reflecting changed dictionary

100  

Page 101: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Example

Main Store

Dictionaries                 Attribute Vectors       Validity Vector
valueID   fname     city     recID   fname   city    recID   valid
0         Albert    Berlin   0       2       0       0       1
1         Michael   London   1       1       1       1       1
2         Nadja              2       0       0       2       1

Delta

Dictionaries                 Attribute Vectors       Validity Vector
valueID   fname     city     recID   fname   city    recID   valid
(empty)

101

Page 102: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Example

Michael moves from London to Berlin

Main Store

Dictionaries                 Attribute Vectors       Validity Vector
valueID   fname     city     recID   fname   city    recID   valid
0         Albert    Berlin   0       2       0       0       1
1         Michael   London   1       1       1       1       0
2         Nadja              2       0       0       2       1

Delta

Dictionaries                 Attribute Vectors       Validity Vector
valueID   fname     city     recID   fname   city    recID   valid
0         Michael   Berlin   0       0       0       0       1

102

Page 103: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Example

Nadja moves from Berlin to Potsdam

Main Store

Dictionaries                 Attribute Vectors       Validity Vector
valueID   fname     city     recID   fname   city    recID   valid
0         Albert    Berlin   0       2       0       0       0
1         Michael   London   1       1       1       1       0
2         Nadja              2       0       0       2       1

Delta

Dictionaries                  Attribute Vectors       Validity Vector
valueID   fname     city      recID   fname   city    recID   valid
0         Michael   Berlin    0       0       0       0       1
1         Nadja     Potsdam   1       1       1       1       1

103

Page 104: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Example

Michael moves from Berlin to Potsdam

Main Store

Dictionaries                 Attribute Vectors       Validity Vector
valueID   fname     city     recID   fname   city    recID   valid
0         Albert    Berlin   0       2       0       0       0
1         Michael   London   1       1       1       1       0
2         Nadja              2       0       0       2       1

Delta

Dictionaries                  Attribute Vectors       Validity Vector
valueID   fname     city      recID   fname   city    recID   valid
0         Michael   Berlin    0       0       0       0       0
1         Nadja     Potsdam   1       1       1       1       1
                              2       0       1       2       1

104

Page 105: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

First Step: Validity Detection

Tuples with valid = 0 will be deleted / moved into the history partition.

Main Store

Dictionaries                 Attribute Vectors       Validity Vector
valueID   fname     city     recID   fname   city    recID   valid
0         Albert    Berlin   0       2       0       0       0
1         Michael   London   1       1       1       1       0
2         Nadja              2       0       0       2       1

Delta

Dictionaries                  Attribute Vectors       Validity Vector
valueID   fname     city      recID   fname   city    recID   valid
0         Michael   Berlin    0       0       0       0       0
1         Nadja     Potsdam   1       1       1       1       1
                              2       0       1       2       1

105

Page 106: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

First Step: Dictionary Merge

Main Store dictionaries        Delta dictionaries
valueID   fname     city       valueID   fname     city
0         Albert    Berlin     0         Michael   Berlin
1         Michael   London     1         Nadja     Potsdam
2         Nadja

Mappings

fname: old to new Main         fname: Delta to Main
valueID (old)   0   1   2      valueID (Delta)   0   1
valueID (new)   0   1   2      valueID (Main)    1   2

city: old to new Main          city: Delta to Main
valueID (old)   0   1          valueID (Delta)   0   1
valueID (new)   0   NA         valueID (Main)    0   1

New Combined Main Store dictionaries
valueID   fname     city
0         Albert    Berlin
1         Michael   Potsdam
2         Nadja

Page 107: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Second Step: New Main Column

Main Store (valid tuples only)

Dictionaries                 Attribute Vectors       Validity Vector
valueID   fname     city     recID   fname   city    recID   valid
0         Albert    Berlin   0       0       0       0       1
1         Michael   London
2         Nadja

Delta (valid tuples only)

Dictionaries                  Attribute Vectors       Validity Vector
valueID   fname     city      recID   fname   city    recID   valid
0         Michael   Berlin    0       1       1       0       1
1         Nadja     Potsdam   1       0       1       1       1

Mappings

fname: old to new Main         fname: Delta to Main
valueID (old)   0   1   2      valueID (Delta)   0   1
valueID (new)   0   1   2      valueID (Main)    1   2

city: old to new Main          city: Delta to Main
valueID (old)   0   1          valueID (Delta)   0   1
valueID (new)   0   NA         valueID (Main)    0   1

Page 108: Technical Deep-Dive in a Column-Oriented In … Model for Enterprise Applications ! Maintenance and Evolution of Service-Oriented Enterprise Software ! RFID Technology in Enterprise

Second Step: New Main Column

Main Store

Dictionaries                  Attribute Vectors       Validity Vector
valueID   fname     city      recID   fname   city    recID   valid
0         Albert    Berlin    0       0       0       0       1
1         Michael   Potsdam   1       2       1       1       1
2         Nadja               2       1       1       2       1

Delta

Dictionaries                  Attribute Vectors       Validity Vector
valueID   fname     city      recID   fname   city    recID   valid
(empty)

108  
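The two attribute-merge steps from this example can be sketched per column as follows. This is a simplified illustration, not the merge algorithm of an actual system: the function name `merge_column` is hypothetical, the inputs are assumed to contain valid tuples only (validity detection has already run), and `sorted()` over the used values replaces the linear merge of two sorted arrays described on the Attribute Merge slide.

```python
def merge_column(main_dict, main_av, delta_dict, delta_av):
    """Merge one column of main store and delta into a new main column."""
    # Step 1: dictionary merge. Only values that are still referenced survive.
    used = {main_dict[v] for v in main_av} | {delta_dict[v] for v in delta_av}
    new_dict = sorted(used)
    new_id = {value: value_id for value_id, value in enumerate(new_dict)}
    # Mappings from old value IDs to new ones; None corresponds to "NA".
    main_map = [new_id.get(value) for value in main_dict]
    delta_map = [new_id.get(value) for value in delta_dict]
    # Step 2: new attribute vector, main tuples first, then delta tuples.
    new_av = [main_map[v] for v in main_av] + [delta_map[v] for v in delta_av]
    return new_dict, new_av, main_map, delta_map


# Valid tuples after validity detection:
#   main:  (Albert, Berlin)
#   delta: (Nadja, Potsdam), (Michael, Potsdam)
fname = merge_column(["Albert", "Michael", "Nadja"], [0], ["Michael", "Nadja"], [1, 0])
city = merge_column(["Berlin", "London"], [0], ["Berlin", "Potsdam"], [1, 1])
```

Because the new main column is built as a copy, readers can keep using the old main store and delta until the merge commits, at the price of temporarily holding both versions in memory.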

Page 109

Chapter 5: Implications on Application Development

Page 110

Learning Map of our Online Lecture @ openHPI.de

Foundations for a New Enterprise Application Development Era
Foundations of Database Storage Techniques
The Future of Enterprise Computing
Advanced Database Storage Techniques
In-Memory Database Operators

Page 111

How does it all come together?

1. Mixed workload combining OLTP and analytic-style queries
■  Column stores are best suited for analytic-style queries
■  In-memory databases enable fast tuple reconstruction
■  In-memory column store allows aggregation on-the-fly

2. Sparse enterprise data
■  Lightweight compression schemes are optimal
■  Compression speeds up query execution
■  Compression improves the feasibility of in-memory databases

3. Mostly read workload
■  Read-optimized stores provide best throughput, i.e. compressed in-memory column store
■  Write-optimized store as delta partition to handle data changes is sufficient

[Figure labels: Changed Hardware; Advances in data processing (software); Complex Enterprise Applications; Our focus]
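To make the points about lightweight compression and on-the-fly aggregation concrete, here is a small Python sketch of a dictionary-encoded column. It is an illustration only: the helper names (`dictionary_encode`, `bits_per_value_id`, `sum_where`) and the sample data are assumptions, and a real column store would bit-pack the attribute vector instead of using a Python list.

```python
from bisect import bisect_left
from math import ceil, log2


def dictionary_encode(values):
    """Return (sorted dictionary, attribute vector of valueIDs)."""
    dictionary = sorted(set(values))
    attribute_vector = [bisect_left(dictionary, v) for v in values]
    return dictionary, attribute_vector


def bits_per_value_id(dictionary):
    """Bits needed to store one valueID for this dictionary."""
    return max(1, ceil(log2(len(dictionary))))


def sum_where(dictionary, attribute_vector, measure, wanted):
    """Aggregate `measure` on the fly for rows whose column value equals `wanted`."""
    pos = bisect_left(dictionary, wanted)
    if pos == len(dictionary) or dictionary[pos] != wanted:
        return 0  # value does not occur in the column at all
    # The scan compares small integers only; no string comparison per row.
    return sum(m for vid, m in zip(attribute_vector, measure) if vid == pos)


# Hypothetical sample data.
cities = ["Berlin", "Potsdam", "Berlin", "Berlin", "London", "Potsdam"]
amounts = [10, 20, 30, 40, 50, 60]

city_dict, city_av = dictionary_encode(cities)
print(city_dict)                     # ['Berlin', 'London', 'Potsdam']
print(city_av)                       # [0, 2, 0, 0, 1, 2]
print(bits_per_value_id(city_dict))  # 2
print(sum_where(city_dict, city_av, amounts, "Berlin"))  # 80
```

With few distinct values, each entry of the attribute vector needs only a couple of bits, and the aggregate is computed during the scan rather than read from a maintained sum table.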

Page 112

An In-Memory Database for Enterprise Applications

□  In-Memory Database (IMDB)
  §  Data resides permanently in main memory
  §  Main memory is the primary “persistence”
  §  Still: logging and recovery from/to flash
  §  Main memory access is the new bottleneck
  §  Cache-conscious algorithms/data structures are crucial (locality is king)
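Why locality matters can be shown with simple arithmetic: count how many cache lines a scan of one attribute touches in a row layout versus a column layout. The sketch below is a back-of-the-envelope model, not a measurement; the 64-byte cache line, the 200-byte tuple width, the 4-byte attribute, and its offset are all assumed values chosen for illustration.

```python
def cache_lines_touched(offsets, width, line_size=64):
    """Count distinct cache lines covered by values of `width` bytes at the given byte offsets."""
    lines = set()
    for off in offsets:
        first = off // line_size
        last = (off + width - 1) // line_size
        lines.update(range(first, last + 1))
    return len(lines)


n_rows = 1000
row_width = 200    # assumed bytes per tuple in a row layout
attr_width = 4     # assumed bytes of the scanned attribute
attr_offset = 16   # assumed position of the attribute inside the tuple

# Row layout: the attribute of tuple i sits at i * row_width + attr_offset.
row_layout = cache_lines_touched(
    (i * row_width + attr_offset for i in range(n_rows)), attr_width
)
# Column layout: the attribute values are stored contiguously.
column_layout = cache_lines_touched(
    (i * attr_width for i in range(n_rows)), attr_width
)

print(row_layout)     # 1000
print(column_layout)  # 63
```

Under these assumptions the row layout loads one cache line per tuple, while the column layout packs 16 values into each line, so the scan moves roughly a sixteenth of the data through the memory hierarchy.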

[Figure: IMDB architecture at blade i. Layers: Interface Services and Session Management; Query Execution, Metadata, TA Manager; Distribution Layer. Main memory holds the active data (Main Store and Differential Store with columns and combined columns, connected by Merge) and indexes (Inverted, Object Data Guide). Non-volatile memory holds Log, Snapshots, and Passive Data (History). Further labels: Recovery, Logging, Time travel, Data aging.]

Page 113

Simplified Application Development

[Figure: Traditional vs. Column-oriented, with the layers Application cache, Database cache, Prebuilt aggregates, Raw data]

□  Fewer caches necessary
□  No redundant data (OLAP/OLTP, LiveCache)
□  No maintenance of materialized views or aggregates
□  Minimal index maintenance

Page 114

Examples for Implications on Enterprise Applications

Page 115

SAP ERP Financials on In-Memory Technology

In-memory column database for an ERP system

□  Combined workload (parallel OLTP/OLAP queries)
□  Leverage in-memory capabilities to
  §  Reduce amount of data
  §  Aggregate on-the-fly
  §  Run analytic-style queries (to replace materialized views)
  §  Execute stored procedures
□  Use Case: SAP ERP Financials solution
  §  Post and change documents
  §  Display open items
  §  Run dunning job
  §  Analytical queries, such as balance sheet

Page 116

Current Financials Solutions

Page 117

The Target Financials Solution

Only base tables, algorithms, and some indices

Page 118

Feasibility of Financials on In-Memory Technology in 2009

□  Modifications on SAP Financials
  §  Removed secondary indices, sum tables, and pre-calculated and materialized tables
  §  Reduced code complexity and simplified locks
  §  Insert-only to enable history (change document replacement)
  §  Added stored procedures with business functionality

□  European division of a retailer
  §  ERP 2005 ECC 6.0 EhP3
  §  5.5 TB system database size
  §  Financials:
    §  23 million headers / 1.5 GB in main memory
    §  252 million items / 50 GB in main memory (including inverted indices for join attributes and insert-only extension)

Page 119

In-Memory Financials on SAP ERP

accounting documents: BKPF, BSEG
sum tables: LFC1, KNC1, GLT0
secondary indices: BSAD, BSAK, BSAS, BSID, BSIK, BSIS
dunning data: MHNK, MHND
change documents: CDHDR, CDPOS

Page 120

In-Memory Financials on SAP ERP

accounting documents: BKPF, BSEG

Page 121

                     Classic Row-Store (w/o compr.)   IMDB
BKPF                 8.7 GB                           1.5 GB
BSEG                 255 GB                           50 GB
Secondary Indices    255 GB                           -
Sum Tables           0.55 GB                          -
Complete             519.25 GB                        51.5 GB
BKPF + BSEG          263.7 GB                         51.5 GB

Reduction by a factor of 10
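The totals in this comparison can be re-derived from the individual rows. The short Python check below uses only the sizes listed on the slide; the variable names are chosen for illustration, and the interpretation of the 263.7 GB row as BKPF plus BSEG follows from the arithmetic.

```python
# Sizes in GB as listed on the slide.
row_store = {"BKPF": 8.7, "BSEG": 255.0, "Secondary Indices": 255.0, "Sum Tables": 0.55}
imdb = {"BKPF": 1.5, "BSEG": 50.0}

row_total = round(sum(row_store.values()), 2)                # complete row-store footprint
base_only = round(row_store["BKPF"] + row_store["BSEG"], 2)  # base tables only
imdb_total = round(sum(imdb.values()), 2)

print(row_total, base_only, imdb_total)   # 519.25 263.7 51.5
print(round(row_total / imdb_total, 1))   # 10.1
print(round(base_only / imdb_total, 1))   # 5.1
```

The factor of about 10 therefore compares the complete row-store footprint, including secondary indices and sum tables, against the in-memory tables; against the base tables alone the factor is about 5.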

Page 122

Booking an accounting document

□  Insert into BKPF and BSEG only
□  Lack of updates reduces locks
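The insert-only idea behind this slide can be sketched as an append-only table in which a change is a new version rather than an in-place update. The class below is a hypothetical illustration (the name `InsertOnlyTable`, the version counter, and the sample document are assumptions), not the mechanism used in the SAP system described here.

```python
class InsertOnlyTable:
    """Append-only store: every change is a new row version, nothing is overwritten."""

    def __init__(self):
        self.rows = []    # list of (key, payload, version), in insertion order
        self.version = 0  # monotonically increasing version counter

    def insert(self, key, payload):
        """Record a new version of `key`; older versions stay readable."""
        self.version += 1
        self.rows.append((key, payload, self.version))
        return self.version

    def as_of(self, key, version):
        """Return the payload of `key` that was current at `version`, or None."""
        result = None
        for k, payload, row_version in self.rows:
            if k == key and row_version <= version:
                result = payload
        return result

    def current(self, key):
        return self.as_of(key, self.version)

    def history(self, key):
        """All versions of `key`; this replaces a separate change-document table."""
        return [(v, p) for k, p, v in self.rows if k == key]


table = InsertOnlyTable()
v1 = table.insert("doc-1", {"amount": 100})
v2 = table.insert("doc-1", {"amount": 120})  # a "change" is just another insert

print(table.current("doc-1"))       # {'amount': 120}
print(table.as_of("doc-1", v1))     # {'amount': 100}
print(len(table.history("doc-1")))  # 2
```

Since existing rows are never modified, readers do not need to be protected against concurrent in-place updates, and the history is available without maintaining separate change documents.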

Page 123

Wrap Up (I)

□  The future of enterprise computing
  §  Big data challenges
  §  Changes in hardware
  §  OLTP and OLAP in one single system
□  Foundations of database storage techniques
  §  Data layout optimized for memory hierarchies
  §  Light-weight compression techniques
□  In-memory database operators
  §  Operators on dictionary-compressed data
  §  Query execution: Scan, Insert, Tuple Reconstruction

Page 124

Wrap Up (II)

□  Advanced database storage techniques
  §  Differential buffer accumulates changes
  §  Merge combines changes periodically with main storage
□  Implications on Application Development
  §  Move data-intensive operations closer to the data
  §  New analytical applications on transactional data possible
  §  Less data redundancy, more on-the-fly calculation
  §  Reduced code complexity

Page 125

References

□  A Course in In-Memory Data Management, H. Plattner
   http://epic.hpi.uni-potsdam.de/Home/InMemoryBook
□  Publications of our Research Group:
  §  Papers about the inner workings of in-memory databases
  §  http://epic.hpi.uni-potsdam.de/Home/Publications

Page 126

Thank You!

Technical Deep-Dive in a Column-Oriented In-Memory Database

Martin Faust
Research Group of Prof. Hasso Plattner
Hasso Plattner Institute for Software Engineering, University of Potsdam

Page 127

Dunning Run

□  Dunning run determines all open and due invoices
□  Customer-defined queries on 250M records
□  Current system: 20 min
□  New logic: 1.5 sec
  §  In-memory column store
  §  Parallelized stored procedures
  §  Simplified Financials

Page 128

Bring Application Logic Closer to the Storage Layer

□  Select accounts to be dunned, for each:
  §  Select open account items from BSID, for each:
    §  Calculate due date
    §  Select dunning procedure, level and area
  §  Create MHNK entries
□  Create and write dunning item tables

Page 129

Bring Application Logic Closer to the Storage Layer

□  Select accounts to be dunned, for each:
  §  Select open account items from BSID, for each:
    §  Calculate due date
    §  Select dunning procedure, level and area
  §  Create MHNK entries
□  Create and write dunning item tables

Annotations: 1 SELECT; 10000 SELECTs; 10000 SELECTs; 31000 Entries

Page 130

Bring Application Logic Closer to the Storage Layer

□  Select accounts to be dunned, for each:
  §  Select open account items from BSID, for each:
    §  Calculate due date
    §  Select dunning procedure, level and area
  §  Create MHNK entries
□  Create and write dunning item tables

Annotations: 1 SELECT; 10000 SELECTs; 10000 SELECTs; 31000 Entries

One single stored procedure executed within IMDB

Page 131

Bring Application Logic Closer to the Storage Layer

□  Select accounts to be dunned, for each:
  §  Select open account items from BSID, for each:
    §  Calculate due date
    §  Select dunning procedure, level and area
  §  Create MHNK entries
□  Create and write dunning item tables

Annotation: Calculated on-the-fly

One single stored procedure executed within IMDB
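The difference between issuing one SELECT per account from the application and running a single set-based statement inside the database can be illustrated with Python's built-in `sqlite3` module. This is only a sketch of the pattern: SQLite stands in for the in-memory database, the tables `accounts` and `open_items` and their columns are hypothetical (not the SAP schema), and the "dunning" logic is reduced to summing overdue amounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, dunning_enabled INTEGER);
CREATE TABLE open_items (item_id INTEGER PRIMARY KEY, account_id INTEGER,
                         amount REAL, due_date TEXT);
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 1), (2, 1), (3, 0)])
conn.executemany("INSERT INTO open_items VALUES (?, ?, ?, ?)", [
    (10, 1, 100.0, "2024-01-10"),
    (11, 1, 50.0, "2024-03-01"),
    (12, 2, 75.0, "2024-01-20"),
    (13, 3, 20.0, "2024-01-05"),
])
KEY_DATE = "2024-02-01"  # ISO dates compare correctly as strings


def overdue_loop(conn, key_date):
    """Application-side loop: one SELECT for the accounts plus one per account."""
    statements = 1
    accounts = conn.execute(
        "SELECT account_id FROM accounts WHERE dunning_enabled = 1").fetchall()
    result = {}
    for (account_id,) in accounts:
        statements += 1
        rows = conn.execute(
            "SELECT amount, due_date FROM open_items WHERE account_id = ?",
            (account_id,)).fetchall()
        total = sum(amount for amount, due in rows if due < key_date)
        if total > 0:
            result[account_id] = total
    return result, statements


def overdue_set_based(conn, key_date):
    """Same result with one statement; filtering and summing happen in the database."""
    rows = conn.execute("""
        SELECT a.account_id, SUM(i.amount)
        FROM accounts a JOIN open_items i ON i.account_id = a.account_id
        WHERE a.dunning_enabled = 1 AND i.due_date < ?
        GROUP BY a.account_id
    """, (key_date,)).fetchall()
    return dict(rows), 1


print(overdue_loop(conn, KEY_DATE))       # ({1: 100.0, 2: 75.0}, 3)
print(overdue_set_based(conn, KEY_DATE))  # ({1: 100.0, 2: 75.0}, 1)
```

With two accounts the loop already needs three statements; with the 10,000 accounts annotated on the slides, the statement count and the data shipped to the application grow accordingly, which is the overhead a single stored procedure avoids.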

Page 132

Dunning Application

Page 133

Dunning Application

Page 134

Available-to-Promise Check

□  Can I get enough quantities of a requested product on a desired delivery date?
□  Goal: Analyze and validate the potential of in-memory and highly parallel data processing for Available-to-Promise (ATP)
□  Challenges
  §  Dynamic aggregation
  §  Instant rescheduling in minutes vs. nightly batch runs
  §  Real-time and historical analytics
□  Outcome
  §  Real-time ATP checks without materialized views
  §  Ad-hoc rescheduling
  §  No materialized aggregates
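An ATP check without materialized aggregates boils down to aggregating the raw stock movements at query time. The following Python sketch shows one simple way to do that; the function name `atp_check`, the movement representation, and the sample data are assumptions for illustration and do not reflect the prototype's actual logic.

```python
from datetime import date


def atp_check(movements, requested_qty, requested_date):
    """Return True if requested_qty can be promised on requested_date.

    movements: list of (date, quantity); positive = stock or planned receipt,
    negative = demand that has already been promised.
    """
    # Aggregate on the fly: balance available on the requested date.
    balance = sum(q for d, q in movements if d <= requested_date) - requested_qty
    if balance < 0:
        return False
    # The new promise must not break promises that are due later.
    # Sorting puts demand before receipts on the same day (conservative).
    for d, q in sorted(m for m in movements if m[0] > requested_date):
        balance += q
        if balance < 0:
            return False
    return True


# Hypothetical movements for one product.
movements = [
    (date(2024, 1, 1), 100),   # initial stock
    (date(2024, 1, 5), -60),   # promised order
    (date(2024, 1, 10), 50),   # planned receipt
    (date(2024, 1, 12), -70),  # promised order
]

print(atp_check(movements, 20, date(2024, 1, 6)))  # True
print(atp_check(movements, 30, date(2024, 1, 6)))  # False
print(atp_check(movements, 50, date(2024, 1, 3)))  # False
```

Because the balance is recomputed from the movements on each request, rescheduling only requires changing the movements themselves; there is no aggregate table to keep consistent.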

Page 135

In-Memory Available-to-Promise

Page 136

Demand Planning

□  Flexible analysis of demand planning data
□  Zooming to choose granularity
□  Filter by certain products or customers
□  Browse through time spans
□  Combination of location-based geo data with planning data in an in-memory database
□  External factors such as the temperature or the level of cloudiness can be overlaid to incorporate them in planning decisions

Page 137

GORFID

□  HANA for Streaming Data Processing
□  Use Case: In-Memory RFID Data Management
□  Evaluation of SAP OER
□  Prototypical implementation of:
  §  RFID Read Event Repository on HANA
  §  Discovery Service on HANA (10 billion data records with ~3 seconds response time)
  §  Front ends for iPhone & iPad
□  Key Findings:
  §  HANA is suited for streaming data (using bulk inserts)
  §  Analytics on streaming data is now possible

Page 138

GORFID: “Near Real-Time” as a Concept

[Figure: Read Event Repositories, Discovery Service, and Verification Services on SAP HANA. Up to 8,000 read event notifications per second; up to 2,000 requests per second. Bulk load every 2-3 seconds: > 50,000 inserts/s.]
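The "near real-time" approach of collecting read events and loading them in bulk every few seconds can be sketched as a small micro-batching buffer. The example below is a hypothetical illustration: it uses Python's `sqlite3` instead of SAP HANA, the class name `BulkLoader` and the table `read_events` are assumptions, and the clock is passed in explicitly so the behavior is deterministic.

```python
import sqlite3


class BulkLoader:
    """Buffer incoming events and write them with one bulk insert per interval."""

    def __init__(self, conn, flush_interval=2.0):
        self.conn = conn
        self.flush_interval = flush_interval  # seconds between bulk loads
        self.buffer = []
        self.last_flush = None

    def add(self, event, now):
        """Queue one event; flush if the interval since the last flush has passed."""
        if self.last_flush is None:
            self.last_flush = now
        self.buffer.append(event)
        if now - self.last_flush >= self.flush_interval:
            self.flush(now)

    def flush(self, now):
        """Insert all buffered events in a single transaction."""
        if self.buffer:
            with self.conn:  # commits on success
                self.conn.executemany(
                    "INSERT INTO read_events VALUES (?, ?, ?)", self.buffer)
            self.buffer.clear()
        self.last_flush = now


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE read_events (tag_id TEXT, reader_id TEXT, read_at REAL)")

loader = BulkLoader(conn, flush_interval=2.0)
for t in (0.0, 0.5, 1.0):
    loader.add(("tag-%d" % int(t * 10), "reader-1", t), now=t)
rows_before = conn.execute("SELECT COUNT(*) FROM read_events").fetchone()[0]

loader.add(("tag-21", "reader-1", 2.1), now=2.1)  # interval elapsed: triggers the bulk load
rows_after = conn.execute("SELECT COUNT(*) FROM read_events").fetchone()[0]

print(rows_before, rows_after)  # 0 4
```

Events become visible to queries with a delay of at most one flush interval, which is the trade-off behind "near real-time": a small bounded latency in exchange for far fewer, larger insert operations.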

Page 139

POS Explorer I

Page 140

POS Explorer II

Page 141

POS Explorer III