
Informix warehouse and accelerator overview


Describes recent features in Informix warehouse and the new query acceleration technology.


Page 1: Informix warehouse and accelerator overview

1

Informix Warehouse & Informix Warehouse Accelerator

Overview

• Scripted for Tech Sales audience

• March 2011


Disclaimer

– © Copyright IBM Corporation 2011. All rights reserved.

– U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

– THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.  WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.  IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.

• IBM, the IBM logo, ibm.com, Cognos, SPSS and Informix are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

• Other company, product, or service names may be trademarks or service marks of others.


Agenda

• Data Warehouse Industry Trends

• Data Warehousing on Informix

– History & Roadmap

• Informix Data Warehouse

– Informix Warehouse Tooling – ETL

– IDS 11.70 Server Features

• Informix Warehouse Accelerator

• Q&A


Data Warehousing Industry Trends


State of Data Warehousing in 2011

DBMS Market in 2011:

• DBMS market at the close of 2009 was approximately $21.2 billion (2010 data not yet available)

• Data Warehouse DBMS market was approximately 35% of the DBMS market or $7.42 billion

Key Findings:

• Data warehouse DBMSs have evolved to a broader analytics infrastructure supporting operational analytics, corporate performance management and other new applications and uses.

• Cost is driving interest in alternative architectures, while performance optimization is driving multi-tiered data architectures and a variety of deployment options, notably a strong interest in in-memory data mart deployments.


State of Data Warehousing, Cont’d

Market Dynamics for 2011

• Today, smaller data warehouses (those with less than 5 TB of source system extracted data, or SSED) are the only "data warehouse" for the entire organization and commonly solve the organization's analytic needs. Gartner estimates that between 70% and 75% of all systems referred to as EDWs are actually single-department systems.

Analysis:

• Optimization techniques such as summaries, aggregates and indexes are simply the result of performance restrictions inherent to normalized data and the way the RDBMS manages rows and columns.


State of Data Warehousing, Cont’d

A Glimpse Into the Future

• Vendor solutions began to focus even more on the ability to isolate and prioritize workload types including strategies for dual warehouse deployments and mixing OLTP and OLAP on the same platform.

• In-memory DBMS solutions provide a technology that enables combined OLTP/OLAP solutions. During 2011 and into 2012, organizations should place more emphasis on financial viability and should align their analytics strategies with vendor road maps when choosing a solution.


Data Warehouse Trends for the CIO, 2011-2012

Data Warehouse Appliances:

• DW appliances are not a new concept. Most vendors have developed an appliance offering or promote certified configurations. Main reason for consideration is simplicity.

The Resurgence of Data Marts:

• Data marts can be used to optimize DW by offloading part of the workload, returning greater performance to the warehousing environment

Column-Store DBMSs:

• CIOs should be aware that their current DBMS vendor may offer a column-store solution. Don’t just buy a column-store-only DBMS because a column store was recommended by your team.

In-Memory DBMSs:

• IMDBMS technology also introduces a higher probability that analytics and transactional systems can share the same database.


Informix Warehouse History

Informix has 3 database products:

• XPS for MPP data warehousing

• Red Brick for star schema data marts/data warehousing

• Informix Dynamic Server (IDS) for OLTP & (now) data warehousing


Existing IDS Warehousing Features

• Performance & Scalability

– Inherent SMP Multi-threading

– Parallel Data Query (PDQ)

– Light Scan for fast table scans

– Online Index build

– Efficient Hash Joins

– Auto Fragment Elimination

– Memory Grant Manager (MGM)

– High Performance Loader

– Optimistic Concurrency

• Ease of Management

– Time cyclic data management using Range Partitioning

– Sophisticated Query Optimizer for OLTP and Warehousing


Informix Warehousing Moving Forward

Goal is to provide a comprehensive warehousing platform that is highly competitive in the marketplace

Incorporating the best features of XPS and Red Brick into IDS for OLTP/Warehousing and Mixed-Workload

Using the latest Informix technology in:

Continuous Availability and Flexible Grid

Data Warehouse Accelerator using latest industry technology

Integration of IBM’s BI software stack


Informix Warehouse Roadmap

• Informix Warehouse Feature: SQW, data modeling, ELT/ETL

• Informix Warehouse with Storage Optimization/Compression

• Cognos integration: native content store on IDS

• SQL MERGE

• External Tables

• Star join optimization, multi-index scan, new fragmentation, fragment-level stats, storage provisioning

• Warehouse Accelerator


Informix Warehouse 11.70 Features


Typical Data Warehouse Architecture


11.70 Warehousing Features

[Figure: typical data warehouse architecture (source: Forrester). It shows layers for applications (query tools, analytics, BPS apps, BI apps, LOB apps), I/O & data loading, query processing, DBMS & storage management, and data sources (databases, other transactional data sources).]

• Data Loading: HPL, DB utilities, ON utilities, DataStage, External Tables, online attach/detach

• Data & Storage Management: Deep Compression, Interval and List Fragmentation, online attach/detach, fragment-level stats, storage provisioning, table defragmenter

• Query Processing (access performance): Light Scans, MERGE, Hierarchical Queries, Multi-Index Scan, Skip Scan, Bitmap Technology, Star and Snowflake join optimization, Implicit PDQ


Informix Warehouse Tooling - SQW

[Figure: SQW architecture in three stages.
DESIGN: Design Center (Eclipse) / Design Studio for data flows + control flows, with execution and debugging against an IDS execution database and data source databases.
DEPLOY: deployment preparation produces a deployment package (code units, build profile, user scripts), which is deployed through the Admin Console.
RUNTIME: the SQW runtime runs as an HTTP service on WebSphere Application Server (WAS), alongside an IDS control DB, the SQW execution DB on IDS, applications, other servers (DataStage), and warehouse databases on IDS, DB2, Oracle, or SQL Server.]


SQW: Design Studio

• Design Studio: Eclipse-based IDE

• Integrated tools, shell sharing

– Team development: CVS or ClearCase for check-in/check-out of projects and flows

• Data Warehousing Project

– Data Models

– Data Flows

– Control Flows

– Warehouse Applications (deployment packages)

– Subflows & Subprocesses (reusable flow modules)

– Variables

• Data Source Explorer: database connections to multiple vendors, e.g. Informix, DB2 LUW, Oracle, SQL Server, MySQL, DB2 z/OS

• DataStage Servers: integration with IBM DataStage


SQW: Data Modeling

• Physical Data Model

– Visualized data modeling

– Impact analysis

– Reverse engineering or new from scratch

– Compare & sync

– Generate DDL

– Overview diagram

• Shell sharing with Rational Data Architect & other Data Studio products


SQW: Data Flows

Data Flow Operators:

• Source & target operators (table, file)

• SQL transformation operators

• Warehousing operators

[Figure: example data flow built from file source, table source, table join, aggregation, and table target operators.]


SQW: Data Flows

[Figure: a simple flow.]

• Generated SQL code

• Optimization across SQL statements

• Optimized staging strategy

• In-database transformation


SQW: Control Flows

Control flow:

• Common utility operators

• Control logic, parallel execution, loop iteration

• Error handling


SQW Overview

• Design Studio: Eclipse-based design environment; creates the application package

• Application package (zip file): generated code plus a deployment profile (database connections, machine resources, variable definitions, DDL files, etc.)

• Admin Console: production environment in WebSphere; deploys and manages warehouse applications, including scheduling and monitoring


Admin Console

• Flex RIA-based warehouse Admin Console

• Manages common resources (e.g. database connections, FTP servers, DataStage servers)

• Schedules & monitors warehouse processes


Informix 11.70 Feature: Warehouse Time-Cyclic Data Management

• Time-cyclic data management (roll-on, roll-off)

• Attach and detach fragments online, without requiring an exclusive lock on (or exclusive access to) the table

• Automatically kicks off a background process to recollect statistics

• Interval and List Fragmentation

• Automatic fragment-level statistics

[Figure: a table fragmented by month from Dec 2010 through May 2011, which enables storing data over time.]


Interval Fragmentation

• Fragments data based on an interval value

– E.g. fragment for every month or every million customer records

• Tables have an initial set of fragments defined by a range expression

• When a row is inserted that does not fit in the initial range fragments, IDS automatically creates a fragment to hold the row (no DBA intervention)

• No exclusive lock is required for fragment addition

• All the benefits of fragment by expression
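The automatic fragment creation described above fits in a few lines of code. This is a minimal Python sketch, not IDS code. It assumes integer keys (matching the "every million customer records" example), a single initial range fragment, and hypothetical names (`IntervalFragmentedTable`, `fragment_id`). Fragment naming, storage, and dbspace placement are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class IntervalFragmentedTable:
    """Toy model of interval fragmentation (hypothetical names, not an IDS API)."""
    initial_upper: int   # exclusive upper bound of the initial range fragment
    interval: int        # width of each automatically created fragment
    fragments: dict = field(default_factory=dict)  # fragment id -> list of rows

    def fragment_id(self, key: int) -> int:
        if key < self.initial_upper:
            return 0  # the row fits the initial range fragment
        # Otherwise the row belongs to the interval that contains its key
        return 1 + (key - self.initial_upper) // self.interval

    def insert(self, key: int, row) -> int:
        fid = self.fragment_id(key)
        # setdefault creates the fragment on first use, mirroring automatic
        # fragment creation without DBA intervention
        self.fragments.setdefault(fid, []).append(row)
        return fid


customers = IntervalFragmentedTable(initial_upper=1_000_000, interval=1_000_000)
print(customers.insert(42, "early customer"))         # 0: initial fragment
print(customers.insert(2_500_000, "newer customer"))  # 2: created on demand
```

The slide's monthly example would use dates and a one-month interval in place of integer keys, but the lookup is the same arithmetic.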


Informix 11.70 Feature: Multi-Index Scan

• Make use of all available indices

• Use set operations on the rowid lists returned by each index

• Use bitmap operations like union and intersection

• Bitmap can also be used for Skip Scan operations


Multi-Index Scan – An Example

• Handling common Data Warehouse queries more efficiently

• Large dimension tables, e.g. customer table

• Multiple low-selectivity attributes like gender, age group, zip code, etc.

• Example

SELECT COUNT(customer_id)
FROM customer_table
WHERE gender = 'male'
AND income_category = 'HIGH'
AND education_level = 'MASTERS'
AND zip_code = '95032';


Multi-Index Scan Example

• Method #1:

– Evaluates the most selective constraint

– Generates a list of rows that qualify, and

– Evaluates the remaining constraints for each of those rows, which produces the answer to the query

This method retrieves rows based on the most selective constraint, using only the index for that column, and then evaluates each of the other constraints sequentially after retrieval.
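Method #1 can be sketched in Python, with in-memory dictionaries standing in for the B-tree indexes and for the table itself. The column names and rows are made-up sample data shaped like the customer query above.

```python
def build_indexes(rows, columns):
    """One single-column index per column: value -> set of rowids."""
    indexes = {c: {} for c in columns}
    for rid, row in rows.items():
        for c in columns:
            indexes[c].setdefault(row[c], set()).add(rid)
    return indexes


def method1(rows, indexes, predicates):
    """Use only the most selective index, then check the other predicates per row."""
    def hits(col):
        return indexes[col].get(predicates[col], set())

    best = min(predicates, key=lambda col: len(hits(col)))
    result = []
    for rid in sorted(hits(best)):      # rows from the single index lookup
        row = rows[rid]                 # fetch the row...
        if all(row[c] == v for c, v in predicates.items() if c != best):
            result.append(rid)          # ...and filter it after retrieval
    return best, result


customers = {
    1: {"gender": "m", "income": "high", "edu": "masters", "zip": "95032"},
    2: {"gender": "m", "income": "low", "edu": "masters", "zip": "10001"},
    3: {"gender": "f", "income": "high", "edu": "masters", "zip": "10001"},
    4: {"gender": "m", "income": "high", "edu": "phd", "zip": "10001"},
    5: {"gender": "m", "income": "high", "edu": "masters", "zip": "10001"},
}
query = {"gender": "m", "income": "high", "edu": "masters", "zip": "95032"}
idx = build_indexes(customers, query.keys())
print(method1(customers, idx, query))  # ('zip', [1])
```

Only one index is used here. Every row it returns must still be fetched and re-checked, which becomes expensive when even the best single predicate is not very selective.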


Multi-Index Scan Example

• Method #2:

– Evaluates each constraint by using a different B-tree index on each attribute, which results in a list of qualifying rows for each constraint

– Merges the lists to form one master list that satisfies all the constraints

– Retrieves the qualifying rows to produce the answer

[Figure: the row lists for Gender='m', Zipcode='95032', Income_Category="high" and Education_level="masters" are ANDed into a list of sorted RIDs, which drives a sequential skip scan of the records.]
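Method #2 can be sketched with Python integers used as bitmaps: each index lookup becomes a bitmap, the bitmaps are ANDed (an OR would give the union), and the surviving bits come out as sorted RIDs for the skip scan. The sample data is the same made-up customer table as in the Method #1 sketch, and dictionaries again stand in for B-tree indexes.

```python
def build_indexes(rows, columns):
    """One single-column index per column: value -> set of rowids."""
    indexes = {c: {} for c in columns}
    for rid, row in rows.items():
        for c in columns:
            indexes[c].setdefault(row[c], set()).add(rid)
    return indexes


def to_bitmap(rowids):
    """Bitmap of a rowid list: bit i is set when rowid i qualifies."""
    bitmap = 0
    for rid in rowids:
        bitmap |= 1 << rid
    return bitmap


def method2(indexes, predicates):
    """Evaluate every predicate through its own index, then AND the bitmaps."""
    combined = None
    for col, value in predicates.items():
        bitmap = to_bitmap(indexes[col].get(value, ()))
        combined = bitmap if combined is None else combined & bitmap
    # Scanning bits from low to high yields the RIDs already sorted, so the
    # table can be read in one forward pass that skips non-qualifying rows.
    return [rid for rid in range(combined.bit_length()) if (combined >> rid) & 1]


customers = {
    1: {"gender": "m", "income": "high", "edu": "masters", "zip": "95032"},
    2: {"gender": "m", "income": "low", "edu": "masters", "zip": "10001"},
    3: {"gender": "f", "income": "high", "edu": "masters", "zip": "10001"},
    4: {"gender": "m", "income": "high", "edu": "phd", "zip": "10001"},
    5: {"gender": "m", "income": "high", "edu": "masters", "zip": "10001"},
}
query = {"gender": "m", "income": "high", "edu": "masters", "zip": "95032"}
idx = build_indexes(customers, query.keys())
print(method2(idx, query))                              # [1]
print(method2(idx, {"gender": "m", "income": "high"}))  # [1, 4, 5]
```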


Informix 11.70 Feature: Push Down Hash Join

First, a standard hash join for typical warehousing queries involving a “large” fact table with multiple dimension tables:

• Build a hash table on the left input

• Probe it with the right input

• Typically, build on the smaller input, which avoids hash table overflow to disk

[Figure: a build scan feeds the build side and a probe scan feeds the probe side of the hash join.]
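Here is a minimal in-memory Python sketch of the build/probe pattern. The hash table is built on whichever input is smaller (here the dimension table), and the larger fact input only probes it. Spilling to disk, which the build-on-the-smaller-input rule is meant to avoid, is not modeled. Table and column names are made up.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """In-memory hash join that builds on the smaller input and probes with the other.

    Returns (left_row, right_row) pairs whichever side was used for the build."""
    build_on_right = len(right) < len(left)
    build, probe = (right, left) if build_on_right else (left, right)
    build_key, probe_key = (right_key, left_key) if build_on_right else (left_key, right_key)

    table = defaultdict(list)
    for row in build:                              # build phase
        table[row[build_key]].append(row)

    result = []
    for row in probe:                              # probe phase
        for match in table.get(row[probe_key], ()):
            result.append((row, match) if build_on_right else (match, row))
    return result


dim_product = [{"pk": 1, "name": "boots"}, {"pk": 2, "name": "socks"}]
fact_sales = [{"fkp": 1, "revenue": 10}, {"fkp": 2, "revenue": 20},
              {"fkp": 1, "revenue": 30}, {"fkp": 3, "revenue": 40}]
pairs = hash_join(fact_sales, dim_product, "fkp", "pk")
print([(f["revenue"], d["name"]) for f, d in pairs])
# [(10, 'boots'), (20, 'socks'), (30, 'boots')]
```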


Typical Star Schema: An Example

• Large central “fact” table

• Smaller “dimension” tables

• Restrictions on dimension tables (assume independence)

• Small fraction of the fact table in the result

[Figure: fact table F with 1M rows (overall selectivity 1/1000) joined to dimension tables D1, D2 and D3, each with 10K rows and selectivity 1/10.]


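Prior to 11.70: Standard Left Deep Tree Solution

[Figure: a left-deep tree of hash joins. Scan F (1M rows) is joined with the first dimension scan (1K rows after its restriction), producing 100K rows. That result is joined with the next dimension (1K), producing 10K rows, and the final join with the third dimension (1K) produces 1K rows. The diagram marks the "Problem Join": "Second Join Build Too Large".]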
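The row counts in the left-deep plan follow directly from the example's numbers. Each dimension keeps 10K × 1/10 = 1K rows, and under the independence assumption each hash join passes on 1/10 of its fact-side input. A short Python check of that arithmetic, using exact fractions:

```python
from fractions import Fraction


def left_deep_sizes(fact_rows, selectivities):
    """Rows produced by each join of a left-deep star join, assuming the
    dimension restrictions are independent (as the example does)."""
    sizes = [fact_rows]
    for sel in selectivities:
        sizes.append(int(sizes[-1] * sel))  # exact, because sel is a Fraction
    return sizes


tenth = Fraction(1, 10)
print(int(10_000 * tenth))                       # 1000 rows left in each dimension
print(left_deep_sizes(1_000_000, [tenth] * 3))   # [1000000, 100000, 10000, 1000]
```

The first join's 100K-row output is 100 times the final 1K-row result. That intermediate volume is what pushing the join keys down to the fact scan is meant to reduce.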


11.70 Feature: Pushdown Hash-Join Solution

[Figure: the join keys from the scans of D1, D2 and D3 (1K rows each) are pushed down into a multi-index scan of the fact table F, using the join keys and single-column indexes. Every hash join is then probed with only 1K fact rows.]

• Join keys pushed down to reduce probe size
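The pushdown idea can be sketched as a semi-join reduction. Collect the join keys of the qualifying dimension rows, then use them to discard fact rows before any hash join is probed. In this Python sketch, set membership stands in for the multi-index scan over single-column fact-table indexes, and the data is synthetic at 1/1000 of the example's scale. The function name is hypothetical.

```python
def pushdown_filter(fact, qualifying_keys):
    """Keep only fact rows whose foreign keys all appear among the qualifying
    dimension keys (a semi-join reduction before any hash join runs).

    fact:            list of dicts holding foreign-key columns
    qualifying_keys: dict foreign-key column -> keys of dimension rows that
                     passed that dimension's restriction
    """
    key_sets = {fk: set(keys) for fk, keys in qualifying_keys.items()}
    return [row for row in fact
            if all(row[fk] in keys for fk, keys in key_sets.items())]


# 1,000 fact rows covering every (fk1, fk2, fk3) combination of 10 keys each
fact = [{"fk1": i % 10, "fk2": (i // 10) % 10, "fk3": i // 100} for i in range(1000)]
# Each dimension restriction keeps 1 key out of 10 (selectivity 1/10)
dims = {"fk1": [0], "fk2": [0], "fk3": [0]}

print(len(pushdown_filter(fact, {"fk1": [0]})))  # 100: one dimension applied
print(len(pushdown_filter(fact, dims)))          # 1: all three applied
```

Applying every dimension's keys before probing shrinks the probe input to the final result size (1/1000 of the fact table). The left-deep plan still carries 1/10 of the fact table out of its first join.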


Informix Warehouse Accelerator (IWA)


Agenda

• 3rd Generation Data Base Technology

• Overview of the Informix Warehouse Accelerator (IWA)

• Target Market

• Beta Customer Experience

• IWA vs. Row/Column/Hybrid Stores

• Loading IWA

• Referenced Hardware & Software Configuration


Third Generation of Database Technology

According to IDC’s Article (Carl Olofson) – Feb. 2010

1st Generation:

- Vendor proprietary databases such as IMS, IDMS, Datacom

2nd Generation:

- RDBMS for Open Systems, dependent on disk layout, limitations in scalability and disk I/O

- Database tuning by updating stats, creating/dropping indexes, data partitioning, summary tables & cubes, forcing query plans, resource governing

3rd Generation: IDC predicts that within 5 years:

• Most data warehouses will be stored in a columnar fashion

• Most OLTP databases will either be augmented by an in-memory database (IMDB) or reside entirely in memory

• Most large-scale database servers will achieve horizontal scalability through clustering


Example of 2nd Generation Database Disk I/O Issue


How Oracle/Exadata Solves That Problem: Add an I/O Layer


Sun Oracle Database Machine Full Rack

• Each Exadata cell is a self-contained server which houses disk storage and runs the Exadata software

• Databases are deployed across multiple Exadata cells

• Database enhanced to work in cooperation with Exadata intelligent storage

• 14 Exadata Storage Cells (storage servers), each with 8 cores, 24 GB memory, and 12 disks (600 GB/2 TB); up to 1.5 GB/sec I/O bandwidth per cell => 21 GB/sec per DB machine

• 8 Oracle RAC database servers, each with 8 cores and 72 GB memory

• InfiniBand switches/network, 16 Gigabit per channel


Cost of Oracle/Exadata Solution

• Database Machine price – Full Rack

$1,115,000 Hardware (same price for 600GB or 2TB drives)

$1,680,000 Oracle Exadata Storage Server software

$1,520,000 Oracle 11gR2 Enterprise Edition

$736,000 Oracle Real Application Clusters

$368,000 Oracle Partitioning

$368,000 Advanced Compression

$160,000 Enterprise Manager Diagnostic Pack (recommended)

$160,000 Enterprise Manager Tuning Pack (recommended)

$1,098,240 1st year software support and maintenance

---------------------------------------------------------------------------------------------------------

$7,240,240 Total Price

• Excludes OLAP option, Data Mining option, ETL option

• Installation is extra and requires a custom quote
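Adding up the price lines is a useful sanity check before quoting this comparison. A short Python tally of the figures listed above:

```python
hardware = 1_115_000
software = {
    "Exadata Storage Server software": 1_680_000,
    "Oracle 11gR2 Enterprise Edition": 1_520_000,
    "Real Application Clusters": 736_000,
    "Partitioning": 368_000,
    "Advanced Compression": 368_000,
    "EM Diagnostic Pack": 160_000,
    "EM Tuning Pack": 160_000,
}
first_year_support = 1_098_240

software_total = sum(software.values())
line_item_total = hardware + software_total + first_year_support

print(f"software list price: ${software_total:,}")                        # $4,992,000
print(f"support / software:  {first_year_support / software_total:.0%}")  # 22%
print(f"sum of line items:   ${line_item_total:,}")                       # $7,205,240
```

The listed items sum to $7,205,240. That is $35,000 less than the $7,240,240 total shown on the slide, and the slide does not show where the difference comes from. The first-year support line is exactly 22% of the software list prices.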


Agenda

• 3rd Generation Data Base Technology

• Overview of the Informix Warehouse Accelerator (IWA)

• Target Market

• Beta Customer Experience

• IWA vs. Row/Column/Hybrid Stores

• Loading IWA

• Referenced Hardware & Software Configuration


Informix Warehouse Accelerator 3rd Generation Database Technology is Here

How is it different?

• Performance: unprecedented response times enable 'train of thought' analysis that is frequently blocked by poor query performance.

• Integration: connects to IDS through deep integration, providing transparency to all applications.

• Self-managed workloads: queries are executed in the most efficient way.

• Transparency: applications connected to IDS are entirely unaware of IWA.

• Simplified administration: appliance-like, hands-free operations, eliminating many database tuning tasks.

What is it?

The Informix Warehouse Accelerator (IWA) is a workload-optimized, appliance-like add-on that enables the integration of business insights into operational processes to drive winning strategies. It accelerates SELECT queries with unprecedented response times.

Breakthrough Technology Enabling New Opportunities


Breakthrough technologies for performance

• Row & Columnar Database: row format within IDS for transactional workloads, and columnar data access via the accelerator for OLAP queries.

• Extreme Compression: required because RAM is the limiting factor.

• Massive Parallelism: all cores are used for queries.

• Predicate evaluation on compressed data: scans often run without decompression during evaluation.

• Frequency Partitioning: the enabler for effective parallel access to the compressed data during scans; horizontal and vertical partition elimination.

• In-Memory Database: 3rd generation database technology avoids I/O; compression allows huge databases to be completely memory resident.

• Multi-core and Vector Optimized Algorithms: avoid locking or synchronization.


Informix Warehouse Accelerator Configuration

IDS:

• Routes SQL queries to the accelerator

• User need not change SQL or apps

• Can always run a query in IDS, e.g., if its estimated execution time is too short

Informix Warehouse Accelerator:

• Connects to IDS via TCP/IP & DRDA

• Analyzes, compresses, and loads a copy of (a portion of) the warehouse

• Processes routed SQL queries and returns answers to IDS

[Figure: applications send SQL queries to IDS, whose query router passes IDS SQL (via DRDA) to the accelerator's query processor. A bulk loader fills the accelerator's compressed DB partition from the data warehouse, and results flow back to IDS.]
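The routing decision can be sketched as a small policy function. Everything in this Python sketch is an assumption for illustration: the 1-second threshold, the requirement that every referenced table be accelerated, and the names `QueryInfo` and `route`. The slide says only that IDS routes queries and can always run a query itself, for example when the estimated execution time is too short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryInfo:
    estimated_seconds: float   # optimizer's cost estimate for running in IDS
    tables: frozenset          # tables the query reads


def route(query: QueryInfo, accelerated: frozenset, min_seconds: float = 1.0) -> str:
    """Decide where a SELECT runs (hypothetical policy, not the IDS optimizer's)."""
    if not query.tables <= accelerated:
        return "IDS"   # assumption: only queries fully covered by accelerated tables are offloaded
    if query.estimated_seconds < min_seconds:
        return "IDS"   # too short an estimated execution time: not worth offloading
    return "IWA"


mart = frozenset({"fact_sales", "dim_product", "dim_region", "dim_time"})
print(route(QueryInfo(300.0, frozenset({"fact_sales", "dim_time"})), mart))  # IWA
print(route(QueryInfo(0.02, frozenset({"fact_sales"})), mart))               # IDS
```

Because the fallback is always IDS, a wrong routing guess costs only performance, never correctness. This is what lets applications stay unaware of the accelerator.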


Informix Warehouse Accelerator Overview

• Coordinator Process: orchestrates distributed tasks like load or query execution

• Worker Processes: hold all the data in main memory, spread across all cores; perform the compression and query execution

• IDS: query parsing and matching to the optimizer; routing query blocks


Target Market: Business Intelligence (BI)

• Characterized by:

– “Star” or “snowflake” schema

– Complex, ad hoc queries that typically:

― Look for trends and exceptions to make actionable business decisions

― Touch a large subset of the database (unlike OLTP)

― Involve aggregation functions (e.g., COUNT, SUM, AVG, …)

• The “Sweet Spot” for IWA!

[Figure: example schema with a SALES fact table surrounded by dimensions (Store, Product, Period) and their hierarchy levels (City, Region, Brand, Category, Month, Quarter).]


What IWA is Designed For

• Selective, fast scans over large (fact) tables

• Joins with smaller Dimension tables

• OLAP-style queries over large fact tables in relational star schema with grouping and aggregations

SELECT PRODUCT_DEPARTMENT, REGION, SUM(REVENUE)
FROM FACT_SALES F
INNER JOIN DIM_PRODUCT P ON F.FKP = P.PK
INNER JOIN DIM_REGION R ON F.FKR = R.PK
LEFT OUTER JOIN DIM_TIME T ON F.FKT = T.PK
WHERE T.YEAR = 2009
AND R.GEOID = 17
AND P.TYPEID = 3
GROUP BY PRODUCT_DEPARTMENT, REGION


Case Study #1: Major U.S. Shoe Retailer

• Top 7 time-consuming queries in Retail BI and Warehouse (against a 1-billion-row fact table):

Query IDS 11.5 IDS 11.7 IWA

1 22 mins 4 secs

2 1 min 3 secs 2 secs

3 3 mins 40 secs 2 secs

4 30 mins & up 4 secs

5 2 mins 2 secs

6 30 mins 2 secs

7 45 mins & up 2 secs

Our Retail users will be really happy to see such a huge improvement in the queries processing timings.

This IWA extension to IDS will really bring value to the Retail BI environment.


Case Study #2: Datamart at a Government Agency

• A MicroStrategy report was run, which generates 667 SQL statements, of which 537 were SELECT statements

• The datamart for this report has 250 tables and 30 GB of data

• The original report on XPS and a Sun SPARC M9000 took 90 mins

• With IDS 11.7 on a Linux Intel box, it took 40 mins

• With IWA, it took 67 seconds


Case Study #3: U.S. Government Agency

Query | Description | Informix (h:mm:ss) | Informix w/ IWA (h:mm:ss) | Notes | Improvement
1 | Find Top 100 Entities | 1:28:22 | 0:01:28 | Fact Table Scan | 6023.23%
2 | Find Top 100 Members | 1:22:32 | 0:01:05 | Fact Table Scan | 7640.45%
3 | Summarize all transactions by State and County | 1:34:37 | 0:00:14 | Fact Table Scan | 41708.49%
4 | Detailed Report on Specific Programs in a Date Range | 0:00:06 | 0:00:06 | Index Read | 108.41%
5 | Summarize all transactions by State, County, City, State, Zip, Program, Program Year, Commodity and Fiscal Year | 1:48:58 | 0:00:41 | Fact Table Scan | 15800.89%
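The Improvement column appears to express the Informix time divided by the IWA time, as a percentage: query 4, with identical 6-second timings, shows about 100%. The Python sketch below recomputes the ratios from the rounded h:mm:ss values under that assumption.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string from the table into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def speedup(before: str, after: str) -> float:
    """Informix time divided by Informix-with-IWA time."""
    return to_seconds(before) / to_seconds(after)


timings = {
    1: ("1:28:22", "0:01:28"),
    2: ("1:22:32", "0:01:05"),
    3: ("1:34:37", "0:00:14"),
    4: ("0:00:06", "0:00:06"),
    5: ("1:48:58", "0:00:41"),
}
for query, (before, after) in timings.items():
    print(f"query {query}: {speedup(before, after):.2%}")
```

This gives 6025% for query 1 and 40550% for query 3. Both are close to, but not exactly, the table's figures, presumably because the table was computed from unrounded timings.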

           


Agenda

• 3rd Generation Data Base Technology

• Overview of the Informix Warehouse Accelerator (IWA)

• Target Market

• Beta Customer Experience

• IWA vs. Row/Column/Hybrid Stores

• Loading IWA

• Referenced Hardware & Software Configuration


Row-Oriented Data Store: each row stored sequentially

• Optimized for record I/O

• Fetch and decompress the entire row, every time

• Result:

– Very efficient for transactional workloads

– Not always efficient for analytical workloads: if only a few columns are required, the complete row is still fetched and decompressed


Columnar Data Store: data is stored sequentially by column

• If attributes are not required for a specific query execution, they are skipped completely.

• Data is compressed sequentially for each column:

– Aids sequential scan

– Slows random access
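The trade-off between the two layouts can be shown by counting the values each one reads. This Python sketch uses uncompressed lists, treats "values read" as a stand-in for I/O and decompression work, and uses made-up sales data.

```python
def to_columns(rows):
    """Convert a row-oriented table (list of dicts) into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows, col):
    """Row store: every field of every row is fetched, even if unused."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)
        total += row[col]
    return total, values_read


def sum_column_store(columns, col):
    """Column store: only the referenced column is read; the rest are skipped."""
    values = columns[col]
    return sum(values), len(values)


def fetch_row_column_store(columns, i):
    """Random access in a column store touches one value in every column list."""
    return {name: values[i] for name, values in columns.items()}


sales = [
    {"id": 1, "store": "A", "product": "shoes", "revenue": 120},
    {"id": 2, "store": "B", "product": "boots", "revenue": 80},
    {"id": 3, "store": "A", "product": "socks", "revenue": 15},
]
cols = to_columns(sales)
print(sum_row_store(sales, "revenue"))    # (215, 12)
print(sum_column_store(cols, "revenue"))  # (215, 3)
```

Both layouts give the same sum, but the row store reads every field of every row. Rebuilding a single row from the column store needs one lookup per column, which is the "slows random access" cost.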


Compression: Frequency Partitioning

[Figure: a Trade Info table (volume, product, origin country). A histogram on Origin (number of occurrences) separates common values (China, USA), further values (GER, FRA, …) and the rest. A histogram on Product separates the top 64 traded goods (6-bit code) from the rest. The column partitions divide the table into cells (Cell 1 to Cell 6).]

• Field lengths vary between cells

• Higher frequencies get shorter codes (approximate Huffman)

• Field lengths are fixed within cells
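A minimal Python sketch of the partitioning idea follows. It assumes just two partitions per column (the origin histogram above has three), fixed-width dictionary codes within each partition, and made-up trade data. It reproduces the "top 64 traded goods – 6 bit code" case but is not IWA's actual algorithm, and the function names are hypothetical.

```python
from collections import Counter


def code_width(n: int) -> int:
    """Bits needed for fixed-width codes over n distinct values (at least 1)."""
    return (n - 1).bit_length() if n > 1 else 1


def frequency_partition(values, top_k):
    """Split a column into a 'common' partition (top_k most frequent values)
    and a 'rare' partition, each with its own fixed-width dictionary codes."""
    counts = Counter(values)
    common = [v for v, _ in counts.most_common(top_k)]
    rare = sorted(set(counts) - set(common))
    return {
        "common": ({v: i for i, v in enumerate(common)}, code_width(len(common))),
        "rare": ({v: i for i, v in enumerate(rare)}, code_width(len(rare))),
    }


def cell_of(row, partitions):
    """A row's cell is the combination of the partitions its values fall into."""
    return tuple(
        "common" if row[col] in partitions[col]["common"][0] else "rare"
        for col in sorted(partitions)
    )


# Hypothetical trade data: 64 heavily traded products plus 200 rarely traded ones
products = [f"p{i}" for i in range(64) for _ in range(10)] + [f"r{i}" for i in range(200)]
origins = ["China"] * 50 + ["USA"] * 40 + ["GER", "FRA", "ITA"]
parts = {"product": frequency_partition(products, 64),
         "origin": frequency_partition(origins, 2)}

print(parts["product"]["common"][1], parts["product"]["rare"][1])  # 6 8
print(cell_of({"product": "p7", "origin": "China"}, parts))        # ('common', 'common')
```

Every row in a given cell has the same field widths (for example 1 + 6 bits in the common/common cell here), so a scan can locate fields at fixed offsets within that cell.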


Compression Process: Step 1

[Figure: the input tuple (Male, John, 08/10/06, Mango) is encoded column by column. Columns 1 and 2 are co-coded (Male/John). Column 3, the date, goes through a type-specific transform into parts 3.A and 3.B (such as w35 and Sat), and the figure shows the intermediate co-coded values Male/John/Sat and w35/Mango. Each column code is Huffman-encoded against its own dictionary, giving 101101011, 001 and 01011101 with probabilities p = 1/512, 1/8 and 1/512. The column codes are concatenated into the tuple code 10110101100101011101.]

Example first-name frequencies: Michael 4.2%, David 3.8%, James 3.6%, Robert 3.5%, John 3.5%, William 2.5%, Mark 2.4%, Richard 2.3%, Thomas 1.9%, Steven 1.5%

Example day-of-week distribution by gender:

Gender | Mon | Tue | Wed | Thu | Fri | Sat | Sun
Male | 3% | 4% | 10% | 6% | 23% | 42% | 12%
Female | 4% | 5% | 9% | 15% | 17% | 28% | 22%
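The Huffman-encoding step can be illustrated with the textbook algorithm built on Python's `heapq`. The weights are the top-ten first-name frequencies listed above, in tenths of a percent. This is a generic Huffman coder for illustration only: the dictionaries IWA builds, its co-coding of correlated columns, and its type-specific transforms are not modeled.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """Build a prefix-free Huffman code from a {symbol: weight} mapping."""
    if len(weights) == 1:
        return {sym: "0" for sym in weights}  # a lone symbol still needs one bit
    tiebreak = count()                         # keeps heap entries comparable on ties
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, lighter = heapq.heappop(heap)
        w2, _, heavier = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lighter.items()}
        merged.update({s: "1" + c for s, c in heavier.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols, codes):
    return "".join(codes[s] for s in symbols)  # codes are simply concatenated


def decode(bits, codes):
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:                 # prefix-free, so the first match is right
            out.append(inverse[current])
            current = ""
    return out


names = {"Michael": 42, "David": 38, "James": 36, "Robert": 35, "John": 35,
         "William": 25, "Mark": 24, "Richard": 23, "Thomas": 19, "Steven": 15}
codes = huffman_codes(names)
bits = encode(["John", "Steven", "Michael"], codes)
print(codes["Michael"], codes["Steven"], bits)
print(decode(bits, codes))  # ['John', 'Steven', 'Michael']
```

Because no code is a prefix of another, concatenated codes decode unambiguously without separators. This is the property that lets the column codes above be glued into one tuple code.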


Compression Process: Step 2

[Figure: the tuple codes are sorted. The first tuple code is kept as is, and each later tuple code is replaced by its delta from the previous tuple code. The small deltas are Huffman-encoded through a dictionary (delta codes such as 000, 010 and 1110) and appended to the compression block, so no delimiters are needed between tuples: "Look Ma, no delimiters!"]
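In Python, the sort-and-delta step looks like the sketch below, with tuple codes treated as unsigned integers. The tuple codes are made up, and the follow-on Huffman encoding of the deltas is not repeated. Sorting is allowed because row order within a block carries no meaning in a relational table, and it is what keeps the deltas small.

```python
def delta_encode(tuplecodes):
    """Sort the tuple codes, keep the first, and store every later code as its
    difference from the previous one."""
    ordered = sorted(tuplecodes)
    deltas = [b - a for a, b in zip(ordered, ordered[1:])]
    return ordered[0], deltas


def delta_decode(first, deltas):
    """Rebuild the sorted tuple codes by running sums over the deltas."""
    codes = [first]
    for d in deltas:
        codes.append(codes[-1] + d)
    return codes


# Hypothetical 20-bit tuple codes that share a long common prefix
block = [int(b, 2) for b in ("10110101110001011111",
                             "10110101110001011101",
                             "10110101110001011110")]
first, deltas = delta_encode(block)
print(bin(first), deltas)   # 0b10110101110001011101 [1, 1]
assert delta_decode(first, deltas) == sorted(block)
```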


Register Stores Facilitate SIMD Parallelism

• Access only the banks referenced in the query (like a column store):

SELECT SUM(T.G)
FROM T
WHERE T.A > 5
GROUP BY T.D

• Pack multiple rows from the same bank into the 128-bit register

• Enables yet another layer of parallelism: SIMD (Single-Instruction, Multiple-Data)!

[Figure: a cell block stored as banks. Bank β1 (32 bits) holds columns A, D, G; bank β2 (32 bits) holds B, E, F; bank β3 (16 bits) holds C, H. Rows 1 to 4 of a bank are loaded as four 32-bit operands into a 128-bit register, and one vector operation yields Result1 to Result4.]
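Python has no SIMD registers, but the lane idea can be imitated with SWAR ("SIMD within a register") on a 128-bit integer. Four 32-bit values from the same bank are packed side by side and added lane by lane, without carries leaking between lanes. The sketch covers only the SUM(T.G) part of the query (no WHERE filter or grouping) and assumes each lane's running sum fits in 32 bits.

```python
LANES, LANE_BITS = 4, 32
LANE_MASK = (1 << LANE_BITS) - 1
# Top bit of every 32-bit lane in the 128-bit word, and all the other bits
HIGH_BITS = sum(1 << (LANE_BITS * i + LANE_BITS - 1) for i in range(LANES))
LOW_BITS = ((1 << (LANES * LANE_BITS)) - 1) ^ HIGH_BITS


def pack(values):
    """Place up to four 32-bit values side by side in one 128-bit word (lane 0 lowest)."""
    word = 0
    for lane, value in enumerate(values):
        word |= (value & LANE_MASK) << (LANE_BITS * lane)
    return word


def unpack(word):
    return [(word >> (LANE_BITS * lane)) & LANE_MASK for lane in range(LANES)]


def lane_add(a, b):
    """Add all four lanes with one integer addition (each lane wraps mod 2**32).

    Clearing each lane's top bit first means no carry can cross into the next
    lane; the correct top bits are then restored with XOR."""
    partial = (a & LOW_BITS) + (b & LOW_BITS)
    return partial ^ ((a ^ b) & HIGH_BITS)


def packed_sum(values):
    """SUM over a column, four rows per 'register' at a time."""
    acc = 0
    for start in range(0, len(values), LANES):
        # A short final chunk simply leaves the unused lanes at zero
        acc = lane_add(acc, pack(values[start:start + LANES]))
    return sum(unpack(acc))


print(packed_sum(list(range(1, 11))))  # 55
```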


Simultaneous Evaluation of Equality Predicates

• CPU operates on 128-bit units

• Lots of fields fit in 128 bits

• These fields are at fixed offsets

• Apply predicates to all columns simultaneously!

[Figure: the value query State=='CA' && Quarter=='Q4' is translated to the code query State==01001 && Quarter==1110. Each row is ANDed with the mask 11111 0 1111 0, which covers the State and Quarter fields, and compared for equality with 01001 0 1110 0 to produce the selection result.]
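Here is a Python sketch of the masked comparison. The bit layout, the extra one-bit fields, and the helper names are hypothetical; only the codes 01001 (CA) and 1110 (Q4) come from the slide. A real implementation would test several tuples per 128-bit register, whereas this sketch tests one tuple per integer.

```python
# Hypothetical 11-bit tuple layout, high to low:
#   State (5 bits) | other column (1 bit) | Quarter (4 bits) | other column (1 bit)
STATE_SHIFT, QUARTER_SHIFT = 6, 1
STATE_FIELD = 0b11111 << STATE_SHIFT
QUARTER_FIELD = 0b1111 << QUARTER_SHIFT

STATE_CODES = {"CA": 0b01001}      # dictionary codes taken from the slide
QUARTER_CODES = {"Q4": 0b1110}


def make_tuple(state, quarter, other_hi=0, other_lo=0):
    return (state << STATE_SHIFT) | (other_hi << 5) | (quarter << QUARTER_SHIFT) | other_lo


def compile_query(state, quarter):
    """Translate the value query into a (mask, target) pair over codes."""
    mask = STATE_FIELD | QUARTER_FIELD
    target = (STATE_CODES[state] << STATE_SHIFT) | (QUARTER_CODES[quarter] << QUARTER_SHIFT)
    return mask, target


def select(tuples, mask, target):
    # One AND plus one comparison tests both equality predicates at once
    return [i for i, t in enumerate(tuples) if (t & mask) == target]


mask, target = compile_query("CA", "Q4")
print(f"{mask:011b} {target:011b}")   # 11111011110 01001011100
rows = [make_tuple(0b01001, 0b1110, 1, 1),  # CA, Q4 (other bits set)
        make_tuple(0b01001, 0b0001),        # CA, another quarter
        make_tuple(0b00011, 0b1110),        # another state, Q4
        make_tuple(0b01001, 0b1110)]        # CA, Q4
print(select(rows, mask, target))     # [0, 3]
```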


Agenda

• 3rd Generation Data Base Technology

• Overview of the Informix Warehouse Accelerator (IWA)

• Target Market

• Beta Customer Experience

• IWA vs. Row/Column/Hybrid Stores

• Loading IWA

• Referenced Hardware & Software Configuration


Defining What Data to Accelerate

• A MART is a logical collection of related tables. For example, all tables of a single star schema would belong to the same MART.

• The administrator uses a rich client interface to define the tables that belong to a MART, together with information about their relationships.

• IDS creates definitions for these MARTs in its own catalog. The related data is read from the IDS tables and transferred to IWA.

• IWA transforms the data into a highly compressed, scan-optimized format that is kept locally (in memory) on the accelerator.

[Figure: the MART is defined in IDS + IWA, and its data flows to the coordinator process and the worker processes.]
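To make the notion concrete, the information a MART carries (its tables and their relationships) can be modeled as a small Python data structure. This is purely illustrative. The class and field names are hypothetical, the table names follow the FACT_SALES star-schema query used earlier, and IWA's actual mart definitions are created through its GUI and stored by IDS, not through code like this.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    """Join relationship: child.fk_column references parent.pk_column."""
    child: str
    fk_column: str
    parent: str
    pk_column: str


@dataclass
class Mart:
    name: str
    fact_table: str
    dimension_tables: set
    relationships: list = field(default_factory=list)

    def tables(self):
        return {self.fact_table} | self.dimension_tables

    def problems(self):
        """Relationships pointing outside the mart, and dimensions never joined."""
        tables = self.tables()
        dangling = [r for r in self.relationships
                    if r.child not in tables or r.parent not in tables]
        joined = {r.child for r in self.relationships} | {r.parent for r in self.relationships}
        return dangling, self.dimension_tables - joined


sales_mart = Mart(
    name="sales",
    fact_table="fact_sales",
    dimension_tables={"dim_product", "dim_region", "dim_time"},
    relationships=[
        Relationship("fact_sales", "fkp", "dim_product", "pk"),
        Relationship("fact_sales", "fkr", "dim_region", "pk"),
        Relationship("fact_sales", "fkt", "dim_time", "pk"),
    ],
)
print(sales_mart.problems())  # ([], set())
```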


IWA Design Studio


Distributing data from IDS (Fact tables)

[Figure: IDS stored procedures UNLOAD each data fragment of the fact table and copy it through the coordinator process to the worker processes, each of which holds compressed data.]

A copy of the IDS data is transferred to the worker processes. Each worker process holds a subset of the data (compressed) in main memory and can execute queries on this subset. The data is evenly distributed (no value-based partitioning) across the CPUs.
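The slide says only that fact data is spread evenly, without value-based partitioning. This Python sketch assumes round-robin dealing as one simple way to achieve that, and shows why it suits aggregation: each worker sums its own slice and the coordinator adds the partial results. Function names and data are hypothetical.

```python
def distribute_round_robin(rows, n_workers):
    """Deal rows to workers in turn; no row's values influence its placement."""
    slices = [[] for _ in range(n_workers)]
    for i, row in enumerate(rows):
        slices[i % n_workers].append(row)
    return slices


def worker_sum(rows, column):
    """Work done inside one worker process on its in-memory slice."""
    return sum(row[column] for row in rows)


def coordinator_sum(slices, column):
    """The coordinator merges the partial results from all workers."""
    return sum(worker_sum(s, column) for s in slices)


fact = [{"revenue": r} for r in range(10)]
slices = distribute_round_robin(fact, 3)
print([len(s) for s in slices])            # [4, 3, 3]
print(coordinator_sum(slices, "revenue"))  # 45
```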


Distributing data from IDS (Dimension tables)

[Figure: an IDS stored procedure UNLOADs each dimension table, and the coordinator process sends a full copy of every dimension table to each worker process.]

All dimension tables are transferred to every worker process.
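Replicating the dimension tables means every worker can join its fact rows locally, with no data exchanged between workers. Only small partial aggregates travel to the coordinator. This Python sketch illustrates that flow with hypothetical names and data.

```python
from collections import Counter


def worker_aggregate(fact_slice, dim_region):
    """Runs inside one worker: join its fact slice with its own full copy of
    the dimension table, then aggregate locally."""
    region_of = {d["pk"]: d["region"] for d in dim_region}  # local replica, no data exchange
    partial = Counter()
    for f in fact_slice:
        partial[region_of[f["fkr"]]] += f["revenue"]
    return partial


def coordinator_merge(partials):
    """Only the small per-worker results travel to the coordinator."""
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds values for matching keys
    return dict(total)


dim_region = [{"pk": 1, "region": "East"}, {"pk": 2, "region": "West"}]
fact = [{"fkr": 1, "revenue": 10}, {"fkr": 2, "revenue": 5},
        {"fkr": 1, "revenue": 7}, {"fkr": 2, "revenue": 3}]
fact_slices = [fact[:2], fact[2:]]  # two workers, fact rows split evenly
partials = [worker_aggregate(s, dim_region) for s in fact_slices]  # each worker has all of dim_region
print(coordinator_merge(partials))  # {'East': 17, 'West': 8}
```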


Mapping Data from IDS to IWA

[Figure: inside IDS, a fact table made up of data fragments plus four dimension tables; inside IWA, the same fact table fragments and dimension tables, held in compressed form.]


Agenda

• 3rd Generation Data Base Technology

• Overview of the Informix Warehouse Accelerator (IWA)

• Target Market

• Beta Customer Experience

• IWA vs. Row/Column/Hybrid Stores

• Loading IWA

• Referenced Hardware & Software Configuration


IWA Referenced Hardware Configuration

Intel(R) Xeon(R) CPU X7560 @ 2.27 GHz, 4 × 8 cores

Memory: 512 GB

6 × 300 GB SAS hard disk drives

Options:

- 4-processor, 4U rack-optimized enterprise server with Intel® Xeon® processors

- 8-core, 6-core and 4-core processor options with up to 2.26 GHz (8-core), 2.66 GHz (six-core) and 1.86 GHz (four-core) speeds with up to 16 MB L3 cache

- Scalable from 4 sockets and 64 DIMMs to 8 sockets and 128 DIMMs

- Optional MAX5 32-DIMM memory expansion

- 16x 1.8" SAS SSDs with eXFlash or 8x 2.5" SAS HDDs


IWA Software Components

• Linux on Intel x86_64 (RHEL 5 or SUSE SLES 11)

• IDS 11.70 + IWA code modules including IDS Stored Procedures

• ISAO Studio Plug-in – GUI for Mart definition

• OnIWA – On Utilities for Monitoring IWA


(Fred Ho)
