D1 Solutions AG a Netcetera Company Real Life Performance of In-Memory Database Systems for BI 10th European TDWI Conference Munich, June 2010


  • D1 Solutions AG, a Netcetera Company

    Real Life Performance of In-Memory Database Systems for BI

    10th European TDWI Conference

    Munich, June 2010

  • 10th European TDWI Conference

    Munich, June 2010

    Authors:

    Dr. Andreas Hauenstein Dr. Simon Hefti Dr. Andrej Vckovski

  • In-Memory Database Systems

    Buzzwords: Column-Orientation, In-Memory, Shared Nothing

    Meaning: Looks like Oracle/DB2/SQL Server from the outside, just much faster

    We are talking about relational systems, queryable in SQL

    We are not talking about client side caching (Microstrategy or QlikView do this)

    There is a new generation of DB systems, for example MonetDB, Exasol, Greenplum, LucidDB

  • Business Intelligence Data Warehouse

    We are not looking at transactional systems

    Any DB of an online shop or any DB driving a web site is transactional

    Typically BI applications are driven by a non-transactional data store that is bulk loaded in intervals by an ETL process. This is called a data warehouse.

    Next generation DB systems also exist for transactional systems. An example is Oracle TimesTen. This is a different subject.

    General Purpose DB Systems (e.g. Oracle, SQL Server)

    DB Systems Specialized for Analytics (e.g. Teradata)

    DB Systems Specialized for Transactions (e.g. TimesTen)

  • Business Intelligence Generated SQL

    Tools with a GUI that generate SQL statements

    Examples: Business Objects, OBIEE, Microstrategy, Cognos

    No SQL tuning possible

    Bad SQL

    Non-technical users

    Frequently changing queries

    Lots of averages and sums, groupings, consolidation

  • Real Life Problem (1)

    Consolidation of numbers along a hierarchy

    Use a Parent-Child Table with a bridge table to do this in a relational DB
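    The parent-child plus bridge table approach can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the schema from the talk: the table names dim_org, dim_org_flat and t_facts appear in the slides, but the column names (parent_id, ancestor_id, leaf_id, amount) and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_org      (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    CREATE TABLE dim_org_flat (ancestor_id INTEGER, leaf_id INTEGER);
    CREATE TABLE t_facts      (org_id INTEGER, amount REAL);
""")

# Hypothetical hierarchy: 1 is the root, 2 and 3 are its children,
# 4 and 5 are children of 2. Leaves are 3, 4 and 5.
org_rows = [
    (1, None, "Group"), (2, 1, "Europe"), (3, 1, "Asia"),
    (4, 2, "Zurich"), (5, 2, "Munich"),
]
conn.executemany("INSERT INTO dim_org VALUES (?, ?, ?)", org_rows)

# Build the bridge table: one row per (ancestor, leaf) pair,
# where every leaf is also paired with itself.
parent = {node_id: parent_id for node_id, parent_id, _name in org_rows}
leaves = set(parent) - set(parent.values())
bridge_rows = []
for leaf in sorted(leaves):
    node = leaf
    while node is not None:
        bridge_rows.append((node, leaf))
        node = parent[node]
conn.executemany("INSERT INTO dim_org_flat VALUES (?, ?)", bridge_rows)

# Facts are booked on leaves only.
conn.executemany("INSERT INTO t_facts VALUES (?, ?)",
                 [(3, 10.0), (4, 20.0), (5, 30.0), (4, 5.0)])

# Consolidation: a plain join through the bridge table yields the total
# for every node of the hierarchy, without recursion at query time.
totals = dict(conn.execute("""
    SELECT o.name, SUM(f.amount)
    FROM dim_org o
    JOIN dim_org_flat b ON b.ancestor_id = o.id
    JOIN t_facts f      ON f.org_id = b.leaf_id
    GROUP BY o.id, o.name
    ORDER BY o.id
""").fetchall())

for name, total in totals.items():
    print(name, total)
```

    The bridge table trades storage for query simplicity: the hierarchy is flattened once at load time, so the BI tool only has to generate joins and a GROUP BY.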

  • Real Life Problem (2)

    Every company has this sort of problem

    The most important people (CEO) experience the worst performance

    OLAP tools exist because this sort of query is traditionally slow on relational systems

    At one customer, 6 GB of data resulted in a 20 minute wait for the CEO

    Even precalculating all reports overnight became difficult

  • The Data Model

    levels: 12

    leaves: 4096

    nodes: 8191

    Bridge Table

    400 K Rows 300 K Rows

    500 K Rows

  • Size of the Data

    Table               Rows        Blocks
    DIM_PRODUCT         344380      11875
    DIM_TIME            501         11
    DIM_TRANS           3001        77
    DIM_UNIT            81          5
    DIM_BUSINESSTYPE    181         10
    DIM_CLIENT          453392      29819
    DIM_ORG_FLAT        53248       118
    T_FACTS             16019518    723739
    DIM_MEASURE         81          6
    DIM_ACCOUNTING      532067      9780
    DIM_ORG             8916        123
    Total               17415366    775561

    Quite small data volume

    Bad performance on several platforms

    Realistic scenario

    775561 blocks * 8192 Bytes = 6 GB
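    The volume estimate above is simple arithmetic and can be checked directly. The block count and block size are taken from the slide; the variable names are only illustrative.

```python
BLOCK_SIZE_BYTES = 8192      # block size stated on the slide
total_blocks = 775_561       # total block count stated on the slide

total_bytes = total_blocks * BLOCK_SIZE_BYTES
size_gb = total_bytes / 10**9    # decimal gigabytes
size_gib = total_bytes / 2**30   # binary gibibytes

print(total_bytes)               # 6353395712
print(round(size_gb, 2))         # 6.35
print(round(size_gib, 2))        # 5.92
```

    Either way of counting rounds to the 6 GB quoted on the slide.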

  • Data Generation

    One function call creates complete dimension table dim_org

    Generates id column, parent pointer, bridge table dim_org_flat

    Generated from a helper table with just integers and random numbers

    Similar function to generate fact table

    Started out as PL/SQL, now a Perl script that works with any DB

    It is easy to model any scenario with this tool

    create_dim(
        p_bf    => 2,
        p_depth => 12,
        p_name  => 'org',
        p_cols  => 'org01,org02,org03,org04,org05,org06,org07,org08,org09,org10',
        p_types => 't10,t10,t10,t10,t10,t10,t10,t10,t10,t10');
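    What such a generator does can be sketched in a few lines of Python. This is not the authors' PL/SQL or Perl tool: the function below is a hypothetical stand-in that only produces the id column, the parent pointer and the bridge rows for a complete tree, and it omits the attribute columns (org01 to org10) and the random-number helper table. It assumes that depth counts the levels below the root and that the bridge table links each leaf to all of its ancestors and to itself.

```python
def create_dim(branching_factor, depth):
    """Generate a complete tree.

    Returns (dim_rows, bridge_rows, leaf_ids) where
    dim_rows    is a list of (id, parent_id, level) tuples and
    bridge_rows is a list of (ancestor_id, leaf_id) tuples.
    """
    dim_rows = [(1, None, 0)]          # the root has no parent
    parent_of = {1: None}
    current_level = [1]
    next_id = 2

    for level in range(1, depth + 1):
        new_level = []
        for parent in current_level:
            for _ in range(branching_factor):
                dim_rows.append((next_id, parent, level))
                parent_of[next_id] = parent
                new_level.append(next_id)
                next_id += 1
        current_level = new_level

    # Walk from every leaf up to the root to build the bridge table.
    bridge_rows = []
    for leaf in current_level:
        node = leaf
        while node is not None:
            bridge_rows.append((node, leaf))
            node = parent_of[node]

    return dim_rows, bridge_rows, current_level


dim_org, dim_org_flat, leaf_ids = create_dim(branching_factor=2, depth=12)
print(len(dim_org))       # 8191 nodes
print(len(leaf_ids))      # 4096 leaves
print(len(dim_org_flat))  # 53248 bridge rows
```

    With a branching factor of 2 and a depth of 12, the sketch yields 4096 leaves and 8191 nodes, the figures shown for the data model, and 4096 * 13 = 53248 bridge rows.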

  • The Test Query

    Generated by BI tool

  • Initial Tests on Oracle and SQL Server

    All the same order of magnitude

    Adding RAM does not help a traditional DB

    PCs are better than you think

    Query time by number of aggregated fact rows:

    System                       DBMS                 3500      1 Mio     16 Mio
    Home PC                      Oracle 10g           159 sec   205 sec   1023 sec
    Home PC                      MS SQL Server 2005   293 sec   699 sec   741 sec
    Expensive Production Server  Oracle 10g           167 sec   168 sec   1200 sec
    Linux with little RAM        Oracle 10g           386 sec   413 sec   1432 sec

    Machines:

    Home PC: Dell Dimension E521, 4 GB RAM, Windows 2003 Server
    Expensive Production Server: IBM 9117-570, 8 GB RAM, 1.9 GHz, 4 CPUs, AIX
    Linux with little RAM: HP DL 380 ProLiant Server, 0.5 GB RAM, Intel Xeon 3.2 GHz, Red Hat Linux

  • A New Generation DB System

    In-memory DB: a factor of 30-50 faster

    That's the speed of sound relative to a bicycle

    With generic Intel hardware

    Worth looking at several of these new systems

    Query time by number of aggregated fact rows:

    System                       DBMS                 3500      1 Mio     16 Mio
    Home PC                      Oracle 10g           159 sec   205 sec   1023 sec
    Home PC                      MS SQL Server 2005   293 sec   699 sec   741 sec
    Expensive Production Server  Oracle 10g           167 sec   168 sec   1200 sec
    Linux with little RAM        Oracle 10g           386 sec   413 sec   1432 sec
    In Memory DB                 Exasol               0 sec     2 sec     22 sec

    Machines:

    Home PC: Dell Dimension E521, 4 GB RAM, Windows 2003 Server
    Expensive Production Server: IBM 9117-570, 8 GB RAM, 1.9 GHz, 4 CPUs, AIX
    Linux with little RAM: HP DL 380 ProLiant Server, 0.5 GB RAM, Intel Xeon 3.2 GHz, Red Hat Linux
    In Memory DB: Exasol Test System, 2 Quad Core Intel CPU, 32 GB RAM, 2 nodes, Exacluster (Linux Microkernel)
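    The speed-up can be recomputed from the measurements on this slide. The sketch below assumes that the largest time in each row belongs to the 16 Mio row case (1023, 741, 1200 and 1432 sec for the traditional systems, 22 sec for Exasol); the dictionary keys are shorthand labels, not names used in the talk.

```python
# Times in seconds for aggregating 16 Mio fact rows (assumed column mapping).
times_16_mio_sec = {
    "Home PC, Oracle 10g": 1023,
    "Home PC, MS SQL Server 2005": 741,
    "Production server, Oracle 10g on AIX": 1200,
    "Linux with little RAM, Oracle 10g": 1432,
}
exasol_sec = 22

# Speed-up factor of the in-memory system relative to each traditional setup.
speedups = {name: round(t / exasol_sec, 1) for name, t in times_16_mio_sec.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

    Under this reading the ratios range from about 34 to about 65, depending on which traditional setup is used as the baseline.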

  • A New Generation DB System

    In-memory DB: a factor of 30-50 faster

    That's the speed of sound relative to a bicycle

    With generic Intel hardware

    Worth looking at several of these new systems

    [Bar chart: vertical axis from 0 to 1600 (sec); bars labeled DD SQL, DD CRA, HP, IBM, Exa]

  • The Contenders

    Oracle 11 G

    MySQL

    MonetDB

    LucidDB

    Greenplum (their own hardware)

    Exasol (their own hardware)

  • The Test Server

    Intel Dual Xeon E 5205

    16 GB RAM

    2 x 250 GB SATA Disk

    64 Bit Debian Linux

  • Interesting DB Systems That Were Not Tested

    Teradata

    Oracle ExaData

    Netezza

    Vertica

    Infobright

    Kognitio

    The field is very active and new products and approaches keep entering the market.

  • MonetDB

    Origin: Result of research at CWI in the Netherlands

    Open Source: Yes

    Free of Charge: Yes

    Remarks:

    o Recent publicity through a paper in Communications of the ACM: Breaking the Memory Wall in MonetDB

    o Constantly changing as research progresses

    o Easy to get into direct contact with the developers

    Quote from the website: MonetDB is an open-source database system for high-performance applications in data mining, OLAP, GIS, XML Query, text and multimedia retrieval.

  • LucidDB

    Origin: Formerly part of LucidEra in San Mateo, California

    Open Source: Yes

    Free of Charge: Yes

    Remarks:

    o Emphasizes ease of configuration and maintenance

    o Mostly written in Java

    Quote from the website: LucidDB is the first and only open-source RDBMS purpose-built entirely for data warehousing and business intelligence. It is based on architectural cornerstones such as column-store, bitmap indexing, hash join/aggregation, and page-level multiversioning.

  • Greenplum

    Origin: Located in San Mateo, California. Postgres based.

    Open Source: Based on Open Source Technology

    Free of Charge: No

    Remarks:

    o Based on similar hardware architecture as Exasol

    o Highly configurable and tunable, lots of features

    o Column store is an option, default is row store

    Quote from the website: Greenplum Database utilizes a shared-nothing MPP (massively parallel processing) architecture that has been designed from the ground up for BI and analytical processing using commodity hardware. In this architecture, data is automatically partitioned across multiple 'segment' servers, and each 'segment' owns and manages a distinct portion of the overall data. All communication is via a network interconnect -- there is no disk-level sharing or contention to be concerned with (i.e. it is a 'shared-nothing' architecture).

  • Exasol

    Origin: Developed from scratch in Nürnberg, Germany

    Open Source: No

    Free of Charge: No

    Remarks:

    o Based on similar hardware architecture as Greenplum

    o Pure column store DB

    o Emphasizes ease of administration

    o No need to create indexes or gather statistics

    o Imitates some Oracle-isms for compatibility

    Quote from the website:The database has been specially developed for analysis and is being used successfully for data warehousing, Web analytics, data mining applications and more. In contrast with universal databases, this specialization means that the data to be analyzed can be made available to analysis tools virtually in real time.

  • Typical Shared Nothing Node

    Combine many of these, connected by GB Ethernet
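    How a shared-nothing cluster answers an aggregation query can be illustrated with a small simulation. This is a generic sketch, not the implementation of Greenplum or Exasol: the function names, the modulo distribution rule and the sample rows are all assumptions. Each node aggregates only the rows it owns, and only the small partial results travel over the network to be merged.

```python
from collections import defaultdict


def partition(rows, n_segments):
    """Distribute rows across segments by row id (simple modulo distribution)."""
    segments = [[] for _ in range(n_segments)]
    for row in rows:
        row_id = row[0]
        segments[row_id % n_segments].append(row)
    return segments


def local_aggregate(segment_rows):
    """Runs on one node: sum the amounts per group using only local rows."""
    sums = defaultdict(float)
    for _row_id, group, amount in segment_rows:
        sums[group] += amount
    return dict(sums)


def merge(partials):
    """Runs on the coordinating node: combine the per-node partial sums."""
    total = defaultdict(float)
    for partial in partials:
        for group, subtotal in partial.items():
            total[group] += subtotal
    return dict(total)


# Hypothetical fact rows: (row_id, group, amount).
rows = [(i, "A" if i % 3 == 0 else "B", float(i)) for i in range(12)]

segments = partition(rows, n_segments=4)
result = merge(local_aggregate(segment) for segment in segments)
print(result)
```

    Because sums can be merged from partial sums, adding nodes reduces the rows each node has to scan, which is the basis of the linear scaling the vendors claim.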

  • Results With 16 Mio Rows in the Fact Table

    Oracle on a new 64 Bit box is 4 times faster than on an average 32 bit box

    Both Oracle and LucidDB were twice as fast after dropping all indexes on the fact table (those are the times in the chart)

    We did not manage to tune MySQL to get acceptable performance

    For a free system, LucidDB has good performance and little hassle

    MonetDB needed a fix in the optimizer before coping with the query

    Next generation in memory DBs are at least one order of magnitude faster

    Query time with 16 Mio fact rows (values from the bar chart, in sec):

    Oracle      226
    MySQL       2280
    LucidDB     460
    MonetDB     31
    Greenplum   13
    Exasol      10

  • Performance Scaling

    Both systems scale linearly

    It is possible to query at least ten times the data volume efficiently

    The vendors claim unlimited linear scaling by adding commodity hardware

    [Chart: query time in sec (axis up to 400) for 16, 160 and 320 Mio fact rows. Series: Exasol (public demo system); Exasol (untuned, comparable hardware); Exasol (local dimensions, comparable hardware); Greenplum. Data labels as extracted: 183, 364, 105, 210, 3, 54, 97, 13, 133, 288, 26, 60]

  • Conclusion

    Big Lessons

    Database technology is in upheaval at the moment

    By adopting the new technologies, you can totally revolutionize the way you access your data

    Prices will fall rapidly. This is like the PC revolution.

    Small Lessons

    If you run Oracle on a 32 Bit system, move to a 64 Bit architecture. It will give you a factor of 4 without any pain

    If your table scans are slow, drop all indexes

    If you move to a new technology, you will get a factor of 50

    The commercial systems are worth their money. Their SQL is more compatible, and they are more stable