Datenbanksystemimplementierung
Prof. Dr. Viktor Leis
Professur für Datenbanken und Informationssysteme
DBMS Architecture
• so far, we mostly looked at the traditional database architecture
• developed between 1980 and 2010
• in the past decade many new systems have emerged that are very different from traditional ones
• many concepts are similar though
• change is mostly driven by hardware trends (e.g., multi-core CPUs, CPU caches, SIMD, NVMe SSDs)
• in this lecture, we will look at some historical and current developments
1
Information Retrieval P. BAXENDALE, Editor
A Relational Model of Data for Large Shared Data Banks
E. F. CODD IBM Research Laboratory, San Jose, California
Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information. Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user’s model.
KEY WORDS AND PHRASES: data bank, data base, data structure, data
organization, hierarchies of data, networks of data, relations, derivability,
redundancy, consistency, composition, join, retrieval language, predicate
calculus, security, data integrity
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22, 4.29
1. Relational Model and Normal Form
1.1. INTRODUCTION

This paper is concerned with the application of elementary relation theory to systems which provide shared access to large banks of formatted data. Except for a paper by Childs [1], the principal application of relations to data systems has been to deductive question-answering systems. Levien and Maron [2] provide numerous references to work in this area.
In contrast, the problems treated here are those of data independence-the independence of application programs and terminal activities from growth in data types and changes in data representation-and certain kinds of data inconsistency which are expected to become troublesome even in nondeductive systems.
Volume 13 / Number 6 / June, 1970
The relational view (or model) of data described in Section 1 appears to be superior in several respects to the graph or network model [3,4] presently in vogue for noninferential systems. It provides a means of describing data with its natural structure only-that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation and organization of data on the other.
A further advantage of the relational view is that it forms a sound basis for treating derivability, redundancy, and consistency of relations-these are discussed in Section 2. The network model, on the other hand, has spawned a number of confusions, not the least of which is mistaking the derivation of connections for the derivation of relations (see remarks in Section 2 on the “connection trap”).
Finally, the relational view permits a clearer evaluation of the scope and logical limitations of present formatted data systems, and also the relative merits (from a logical standpoint) of competing representations of data within a single system. Examples of this clearer perspective are cited in various parts of this paper. Implementations of systems to support the relational model are not discussed.
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS

The provision of data description tables in recently developed information systems represents a major advance toward the goal of data independence [5,6,7]. Such tables facilitate changing certain characteristics of the data representation stored in a data bank. However, the variety of data representation characteristics which can be changed without logically impairing some application programs is still quite limited. Further, the model of data with which users interact is still cluttered with representational properties, particularly in regard to the representation of collections of data (as opposed to individual items). Three of the principal kinds of data dependencies which still need to be removed are: ordering dependence, indexing dependence, and access path dependence. In some systems these dependencies are not clearly separable from one another.
1.2.1. Ordering Dependence. Elements of data in a data bank may be stored in a variety of ways, some involving no concern for ordering, some permitting each element to participate in one ordering only, others permitting each element to participate in several orderings. Let us consider those existing systems which either require or permit data elements to be stored in at least one total ordering which is closely associated with the hardware-determined ordering of addresses. For example, the records of a file concerning parts might be stored in ascending order by part serial number. Such systems normally permit application programs to assume that the order of presentation of records from such a file is identical to (or is a subordering of) the
Communications of the ACM 377
2
The “Dinosaurs”
• the earliest commercially available relational systems were Oracle, IBM System R, Ingres (end of 1970s)
• in 1990s and 2000s Oracle, MS SQL Server, IBM DB2 were the dominant players (in terms of mind- and market share)
• general-purpose systems: OLTP and OLAP
  • behavior similar (all speak SQL), but different enough to make switching hard
  • similar internal architecture
  • stable, great functionality (ACID transactions, SQL)
  • sometimes hard to use (database administrator needed), many tuning knobs to get better performance
• database systems research was kind of boring
3
[Poster: “Genealogy of Relational Database Management Systems”, a timeline (1970s to 2010s) tracing versions, acquisitions, and discontinued branches of relational DBMSs, from System R, Berkeley Ingres, and Oracle through DB2, Sybase/SQL Server, Postgres/PostgreSQL, MySQL/MariaDB, Teradata, MonetDB/VectorWise, and in-memory systems such as SAP HANA and HyPer. Crossing lines have no special semantics.
Felix Naumann, Jana Bauckmann, Claudia Exeler, Jan-Peer Rudolph, Fabian Tschirschnitz. Contact: Hasso Plattner Institut, University of Potsdam, [email protected]. Design: Alexander Sandt Grafik-Design, Hamburg. Version 6.0, October 2018. https://hpi.de/naumann/projects/rdbms-genealogy.html]
4
OLAP vs. OLTP
• Online Analytical Processing (OLAP)
  • mostly reads
  • long-running queries (one transaction per query)
  • many full table scans
  • batch inserts
  • example: compute revenue per month for last year
  • benchmark: TPC-H (also TPC-DS)
• Online Transactional Processing (OLTP)
  • many point writes and reads
  • short-running transactions (multiple statements per transaction)
  • heavy reliance on indexes
  • example: order processing (online shop)
  • benchmark: TPC-C (also TPC-E)
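The OLAP example above (revenue per month) boils down to a full table scan feeding an aggregation, with no index involved. A minimal sketch in Python, over a made-up toy order table:

```python
from collections import defaultdict

# Toy order table: (month, revenue) rows for one year. Data is invented.
orders = [
    (1, 120.0), (1, 80.0), (2, 200.0), (3, 50.0), (3, 75.0), (12, 300.0),
]

def revenue_per_month(rows):
    """Full-scan aggregation: the typical OLAP access pattern."""
    totals = defaultdict(float)
    for month, revenue in rows:   # touches every row, no index used
        totals[month] += revenue
    return dict(totals)

print(revenue_per_month(orders))  # {1: 200.0, 2: 200.0, 3: 125.0, 12: 300.0}
```

An OLTP workload would instead do indexed point lookups and small writes inside short transactions, rather than scanning all rows.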
5
The Anatomy of a Dinosaur
feature                 technique
transaction isolation   2 Phase Locking
synchronization         lock coupling
large data sets         buffer management
durability              ARIES-style logging
indexing                B+tree
storage                 slotted pages (row-wise)
SQL                     iterator model (interpreter)
parallelization         Exchange operators
query optimization      cost-based DP
• assumption: the only thing that matters for performance is minimizing disk I/O operations
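The iterator model listed in the table above can be sketched in a few lines. This is a minimal illustration (operator names and the tuple format are invented, not any real engine's interface): each operator pulls one tuple at a time from its child, which is exactly the per-tuple interpretation overhead that later work attacks.

```python
# Minimal sketch of the iterator (Volcano) model: each operator exposes
# iteration and pulls tuples one at a time from its child.

class Scan:
    """Leaf operator: produces base-table tuples."""
    def __init__(self, rows):
        self.rows = rows
    def __iter__(self):
        yield from self.rows

class Select:
    """Filters tuples by a predicate, one tuple per call."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def __iter__(self):
        for t in self.child:
            if self.predicate(t):
                yield t

class Project:
    """Applies a per-tuple transformation."""
    def __init__(self, child, fn):
        self.child, self.fn = child, fn
    def __iter__(self):
        for t in self.child:
            yield self.fn(t)

rows = [(1, "a"), (2, "b"), (3, "c")]
plan = Project(Select(Scan(rows), lambda t: t[0] > 1), lambda t: t[1])
print(list(plan))  # ['b', 'c']
```

The cost of this design is one (virtual) call chain per tuple; when data fit on disk that was negligible next to I/O, which is why the assumption above held for so long.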
6
7
OLTP Through the Looking Glass, and What We Found There

Stavros Harizopoulos (HP Labs, Palo Alto, CA)
Daniel J. Abadi (Yale University, New Haven, CT)
Samuel Madden, Michael Stonebraker (Massachusetts Institute of Technology, Cambridge, MA)
{madden, stonebraker}@csail.mit.edu
ABSTRACT

Online Transaction Processing (OLTP) databases include a suite of features — disk-resident B-trees and heap files, locking-based concurrency control, support for multi-threading — that were optimized for computer technology of the late 1970’s. Advances in modern processors, memories, and networks mean that today’s computers are vastly different from those of 30 years ago, such that many OLTP databases will now fit in main memory, and most OLTP transactions can be processed in milliseconds or less. Yet database architecture has changed little.

Based on this observation, we look at some interesting variants of conventional database systems that one might build that exploit recent hardware trends, and speculate on their performance through a detailed instruction-level breakdown of the major components involved in a transaction processing database system (Shore) running a subset of TPC-C. Rather than simply profiling Shore, we progressively modified it so that after every feature removal or optimization, we had a (faster) working system that fully ran our workload. Overall, we identify overheads and optimizations that explain a total difference of about a factor of 20x in raw performance. We also show that there is no single “high pole in the tent” in modern (memory resident) database systems, but that substantial time is spent in logging, latching, locking, B-tree, and buffer management operations.
Categories and Subject Descriptors
H.2.4 [Database Management]: Systems — transaction processing; concurrency.

General Terms
Measurement, Performance, Experimentation.

Keywords
Online Transaction Processing, OLTP, main memory transaction processing, DBMS architecture.
1. INTRODUCTION

Modern general purpose online transaction processing (OLTP) database systems include a standard suite of features: a collection of on-disk data structures for table storage, including heap files and B-trees, support for multiple concurrent queries via locking-based concurrency control, log-based recovery, and an efficient buffer manager. These features were developed to support transaction processing in the 1970’s and 1980’s, when an OLTP database was many times larger than the main memory, and when the computers that ran these databases cost hundreds of thousands to millions of dollars.

Today, the situation is quite different. First, modern processors are very fast, such that the computation time for many OLTP-style transactions is measured in microseconds. For a few thousand dollars, a system with gigabytes of main memory can be purchased. Furthermore, it is not uncommon for institutions to own networked clusters of many such workstations, with aggregate memory measured in hundreds of gigabytes — sufficient to keep many OLTP databases in RAM.

Second, the rise of the Internet, as well as the variety of data intensive applications in use in a number of domains, has led to a rising interest in database-like applications without the full suite of standard database features. Operating systems and networking conferences are now full of proposals for “database-like” storage systems with varying forms of consistency, reliability, concurrency, replication, and queryability [DG04, CDG+06, GBH+00, SMK+01].

This rising demand for database-like services, coupled with dramatic performance improvements and cost reduction in hardware, suggests a number of interesting alternative systems that one might build with a different set of features than those provided by standard OLTP engines.

1.1 Alternative DBMS Architectures

Obviously, optimizing OLTP systems for main memory is a good idea when a database fits in RAM. But a number of other database variants are possible; for example:

• Logless databases. A log-free database system might either not need recovery, or might perform recovery from other sites in a cluster (as was proposed in systems like Harp [LGG+91], Harbor [LM06], and C-Store [SAB+05]).

• Single threaded databases. Since multi-threading in OLTP databases was traditionally important for latency hiding in the
SIGMOD’08, June 9–12, 2008, Vancouver, BC, Canada. Copyright 2008 ACM 978-1-60558-102-6/08/06.
8
OLTP Through the Looking Glass [SIGMOD 2008]
• even a decade ago, the working set of many applications fit into main memory
• research question: Where does time go in OLTP?
• approach: disable/rip out components step by step (+ additional micro-optimizations)
• use Shore system
  • open source storage engine
  • developed at University of Wisconsin in early 1990s
  • architecturally similar to Dinosaurs
  • the assumption is that the results should be similar too
9
General Setup
• single-core Pentium 4, 3.2 GHz
• Linux
• measure instructions, cycles
• use TPC-C (standard OLTP benchmark)
10
TPC-C Schema
Figure 3. TPC-C Schema. [Diagram: Warehouse (size W); District (size W × 10), 10 districts per warehouse; Customer (size W × 30k), 3k customers per district; History (size > W × 30k), ≥ 1 history record per customer; Stock (size W × 100k), 100k stocks per warehouse; Item (size 100k), W stocks per item; Order (size > W × 30k), ≥ 1 order per customer; Order-Line (size > W × 300k), 5-15 order-line entries per order; New-Order (size > W × 9k), 0 or 1 new orders per order]
11
TPC-C Transactions
New Order:
  begin
  for loop (10)
    Btree lookup (I), pin
  Btree lookup (D), pin
  Btree lookup (W), pin
  Btree lookup (C), pin
  update rec (D)
  for loop (10)
    Btree lookup (S), pin
    update rec (S)
    create rec (O-L)
    insert Btree (O-L)
  create rec (O)
  insert Btree (O)
  create rec (N-O)
  insert Btree (N-O)
  insert Btree 2ndary (N-O)
  commit

Payment:
  begin
  Btree lookup (D), pin
  Btree lookup (W), pin
  Btree lookup (C), pin
  update rec (C)
  update rec (D)
  update rec (W)
  create rec (H)
  commit

Figure 4. Calls to Shore’s methods for New Order and Payment transactions.
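The New Order control flow from Figure 4 can be approximated in a few lines. This is a hedged sketch using plain dicts as stand-in B-tree indexes; the table names, key layouts, and fields are invented for illustration and are not Shore's actual API.

```python
# Sketch of TPC-C New Order: lookups on Item/District/Warehouse/Customer,
# an order-id taken from the district, then per-line Stock updates and
# Order-Line inserts, followed by Order and New-Order inserts.

def new_order(db, w_id, d_id, c_id, items):
    for i_id, _qty in items:
        _ = db["item"][i_id]                         # for loop: Btree lookup (I)
    district = db["district"][(w_id, d_id)]          # Btree lookup (D)
    _ = db["warehouse"][w_id]                        # Btree lookup (W)
    _ = db["customer"][(w_id, d_id, c_id)]           # Btree lookup (C)
    o_id = district["next_o_id"]
    district["next_o_id"] += 1                       # update rec (D)
    for line_no, (i_id, qty) in enumerate(items):    # for loop over order lines
        stock = db["stock"][(w_id, i_id)]            # Btree lookup (S)
        stock["quantity"] -= qty                     # update rec (S)
        db["order_line"][(w_id, d_id, o_id, line_no)] = (i_id, qty)  # create + insert (O-L)
    db["order"][(w_id, d_id, o_id)] = {"c_id": c_id}   # create rec (O) + insert
    db["new_order"][(w_id, d_id, o_id)] = True         # create rec (N-O) + insert
    return o_id

db = {
    "warehouse": {1: {}},
    "district": {(1, 1): {"next_o_id": 3000}},
    "customer": {(1, 1, 42): {}},
    "item": {7: {}, 8: {}},
    "stock": {(1, 7): {"quantity": 100}, (1, 8): {"quantity": 50}},
    "order_line": {}, "order": {}, "new_order": {},
}
o_id = new_order(db, 1, 1, 42, [(7, 5), (8, 2)])
print(o_id, db["stock"][(1, 7)]["quantity"])  # 3000 95
```

In a real engine every dict access above is a B-tree traversal with pinning, latching, locking, and logging around it, which is exactly the overhead the paper measures.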
12
Instruction Breakdown
13
Transactions Per Second
• out-of-the-box Shore: 640
• disable log flushing: 1,700
• disable components: 12,700
• standalone C implementation: 46,500
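Simple arithmetic on these throughput numbers gives the cumulative speedups relative to out-of-the-box Shore:

```python
# Throughput (transactions per second) at each stage, from the slide above.
tps = [
    ("out-of-the-box Shore", 640),
    ("log flushing disabled", 1700),
    ("components disabled", 12700),
    ("standalone C implementation", 46500),
]
base = tps[0][1]
speedups = {name: round(v / base, 1) for name, v in tps}
print(speedups)
# the end-to-end gain is 46500 / 640, i.e. roughly 73x
```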
14
OLTP Through the Looking Glass: Conclusions
• traditional database implementation and architecture can be extremely inefficient
• 10× to 100× performance gains are achievable
• all components are slow (“no high pole in the tent”)
15
The End of an Architectural Era (It’s Time for a Complete Rewrite)
Michael Stonebraker
Samuel Madden Daniel J. Abadi
Stavros Harizopoulos MIT CSAIL
{stonebraker, madden, dna, stavros}@csail.mit.edu
Nabil Hachem AvantGarde Consulting, LLC
Pat Helland Microsoft Corporation
ABSTRACT

In previous papers [SC05, SBC+07], some of us predicted the end of “one size fits all” as a commercial relational DBMS paradigm. These papers presented reasons and experimental evidence that showed that the major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines in the data warehouse, stream processing, text, and scientific database markets. Assuming that specialized engines dominate these markets over time, the current relational DBMS code lines will be left with the business data processing (OLTP) market and hybrid markets where more than one kind of capability is required. In this paper we show that current RDBMSs can be beaten by nearly two orders of magnitude in the OLTP market as well. The experimental evidence comes from comparing a new OLTP prototype, H-Store, which we have built at M.I.T., to a popular RDBMS on the standard transactional benchmark, TPC-C.
We conclude that the current RDBMS code lines, while attempting to be a “one size fits all” solution, in fact, excel at nothing. Hence, they are 25 year old legacy code lines that should be retired in favor of a collection of “from scratch” specialized engines. The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow’s requirements, not continue to push code lines and architectures designed for yesterday’s needs.
1. INTRODUCTION

The popular relational DBMSs all trace their roots to System R from the 1970s. For example, DB2 is a direct descendent of System R, having used the RDS portion of System R intact in their first release. Similarly, SQL Server is a direct descendent of Sybase System 5, which borrowed heavily from System R. Lastly, the first release of Oracle implemented the user interface from System R.
All three systems were architected more than 25 years ago, when hardware characteristics were much different than today. Processors are thousands of times faster and memories are thousands of times larger. Disk volumes have increased enormously, making it possible to keep essentially everything, if one chooses to. However, the bandwidth between disk and main memory has increased much more slowly. One would expect this relentless pace of technology to have changed the architecture of database systems dramatically over the last quarter of a century, but surprisingly the architecture of most DBMSs is essentially identical to that of System R. Moreover, at the time relational DBMSs were conceived, there was only a single DBMS market, business data processing. In the last 25 years, a number of other markets have evolved, including data warehouses, text management, and stream processing. These markets have very different requirements than business data processing.
Lastly, the main user interface device at the time RDBMSs were architected was the dumb terminal, and vendors imagined operators inputting queries through an interactive terminal prompt. Now it is a powerful personal computer connected to the World Wide Web. Web sites that use OLTP DBMSs rarely run interactive transactions or present users with direct SQL interfaces. In summary, the current RDBMSs were architected for the business data processing market in a time of different user interfaces and different hardware characteristics. Hence, they all include the following System R architectural features:
• Disk oriented storage and indexing structures
• Multithreading to hide latency
• Locking-based concurrency control mechanisms
• Log-based recovery
Of course, there have been some extensions over the years, including support for compression, shared-disk architectures, bitmap indexes, support for user-defined data types and operators, etc. However, no system has had a complete redesign since its inception. This paper argues that the time has come for a complete rewrite. A previous paper [SBC+07] presented benchmarking evidence that the major RDBMSs could be beaten by specialized architectures by an order of magnitude or more in several application areas, including:
VLDB ’07, September 23-28, 2007, Vienna, Austria. Copyright 2007 VLDB Endowment, ACM 978-1-59593-649-3/07/09.
16
The End of an Architectural Era (It’s Time for a Complete Rewrite) [VLDB 2007]
• Stonebraker’s lesson:
  • existing code bases are hopeless, rewrite needed
  • specialized, simplified systems for OLTP, OLAP, text, etc.
  • let’s start lots of startups building specialized systems (OLTP: H-Store, OLAP: C-Store/Vertica, Data Integration/Cleaning: Tamr)
• German lesson:
  • general-purpose relational systems are kind of nice
  • DRAM is cheap (1 TB RAM for less than 50K EUR), let’s go in-memory only
  • SAP HANA, TUM HyPer
17
Modern Systems
• Column Stores for OLAP
  • Actian Vector (“Vectorwise”)
  • Microsoft Apollo (part of SQL Server)
  • IBM BLU
• OLTP (in-memory)
  • Microsoft Hekaton (part of SQL Server)
  • VoltDB
• OLTP and OLAP (in-memory)
  • SAP HANA
  • TUM HyPer
18
DBMS Evolution
feature                 old                            new
transaction isolation   2 Phase Locking                MVCC
synchronization         lock coupling                  optimistic lock coupling
large data sets         buffer management              pointer swizzling
durability              ARIES-style logging            scalable logging
indexing                B+tree                         B+tree/trie
storage                 slotted pages (row-wise)       column stores
SQL                     iterator model (interpreter)   compilation or vectorization
parallelization         Exchange operators             morsel-driven parallelism
query optimization      cost-based DP                  cost-based DP
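Vectorization, one of the "new" techniques in the table above, can be sketched as follows. This is a minimal illustration (function names and the batch size are invented, not any specific engine's API): operators consume and produce batches of column values, so interpretation overhead is paid once per vector instead of once per tuple.

```python
# Sketch of vectorized execution over a single column: each operator call
# processes a batch (vector) of values rather than one tuple.

VECTOR_SIZE = 1024

def scan(column, vector_size=VECTOR_SIZE):
    """Produce the column in fixed-size batches."""
    for i in range(0, len(column), vector_size):
        yield column[i:i + vector_size]

def select_gt(batches, threshold):
    """Filter each batch in one call; the per-call overhead is amortized."""
    for batch in batches:
        yield [v for v in batch if v > threshold]

def sum_all(batches):
    """Aggregate across all batches."""
    return sum(sum(batch) for batch in batches)

prices = list(range(10_000))                 # one column with 10k values
total = sum_all(select_gt(scan(prices), 9_000))
print(total)  # 9490500 (sum of 9001..9999)
```

Compared with the tuple-at-a-time iterator model from earlier slides, the inner loops here run over plain arrays, which is also what makes them amenable to SIMD in a real engine.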
19
Conclusions
• fast changes in hardware drive evolution of database systems
• many new techniques
• concepts stay similar, but need to be rethought
• big trend: cloud
• database systems can never be fast enough
20