What is InfiniDB?
Massively parallel MySQL storage engine for fast analytics
Linear scale to handle exponential growth
Open source
Runs on premises, on the AWS cloud, or on a Hadoop HDFS cluster
Standard ANSI SQL compliance
First MySQL storage engine to support ANSI SQL11-compliant windowing functions
Copyright 2014 InfiniDB. All Rights Reserved.
Slide 3
Custom Handler Class
Diagram labels: User Connections, InfiniDB Server, User Module, MySQL, InfiniDB ExeMgr, Performance Module(s), Storage
MySQL Functions
  o MySQL client
  o MySQL connectivity (JDBC, ODBC)
  o MySQL security
  o Initial SQL statement parsing
  o Initial SQL optimization
  o Execute final sort and final limit
  o Display final results
InfiniDB ExeMgr Functions
  o SQL optimization
  o Distribute work for scan, filter, join, functions, expressions, group by, aggregation, etc. to all available Performance Modules to be run in parallel
  o Collect the results returned by the Performance Modules
  o Return the final results to MySQL for display
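The ExeMgr role listed above (distribute work, collect partial results, return one final result) can be sketched as a scatter/gather aggregation. This is only an illustration under stated assumptions: `performance_module_sum`, `exemgr_sum`, and the list-of-lists data layout are hypothetical names chosen for the sketch, not InfiniDB APIs, and Python threads stand in for separate Performance Module servers.

```python
from concurrent.futures import ThreadPoolExecutor


def performance_module_sum(rows, predicate):
    """Hypothetical PM-side step: scan, filter, and partially aggregate local rows."""
    return sum(value for value in rows if predicate(value))


def exemgr_sum(partitions, predicate):
    """Hypothetical ExeMgr-side step: fan the work out, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(
            pool.map(lambda rows: performance_module_sum(rows, predicate), partitions)
        )
    # Only small partial results travel back; the final merge is cheap.
    return sum(partials)


# Three made-up data slices, one per hypothetical Performance Module.
partitions = [[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]
total = exemgr_sum(partitions, lambda value: value > 2)
```

The point of the split is that filtering and partial aggregation happen next to the data, so only one number per module has to be merged centrally.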
Slide 4
InfiniDB Design Principles
Scalable
Fast
Simple
Slide 5
InfiniDB Parallelism
User Module: processes SQL requests
Performance Module: executes the queries
Deployment: Single Server or MPP
Slide 6
Tiered MPP Building Blocks
(Table columns: Module, Process, Functionality, Value)
Process: MySQL
  o Functionality: hosts MySQL; connection management; SQL parsing & optimization
  o Value: familiar DBMS interface; leverages existing partner integrations; delivers full SQL syntax support
Process: Extent Map
  o Functionality: abstracts physical and logical storage; metadata store
  o Value: enables shared-nothing and shared-everything storage; enables partition elimination; built-in failover
Process: ExeMgr
  o Functionality: work distribution; final results management and aggregation
  o Value: independent scalability and tunable concurrency; multi-threaded to take advantage of multi-core HW platforms
Slide 7
Tiered MPP Building Blocks
(Table columns: Module, Process, Functionality, Value)
Process: PrimProc
  o Functionality: scale-out cache management; distributed scan, filter, join and aggregation operations; resource management
  o Value: independent scalability and tunable performance; multi-threaded to take advantage of multi-core HW platforms
Process: Data
  o Functionality: high-speed bulk load; transactional DML and DDL; online schema extensions
  o Value: enables concurrent reads and writes, non-blocking read enabled; multi-threaded to take advantage of multi-core HW platforms
Slide 8
InfiniDB Foundation - Parallelism
Purpose-built C++ engine
Parallelism is at the thread level
  o Example: 12 PM (Performance Module) servers with 8 cores each yield 96 parallel processing engines.
SQL is translated into thousands or tens of thousands of discrete jobs, or primitives.
The UM (User Module) sends primitives to the processing engines.
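The arithmetic in the example, and the idea of breaking one scan into many small primitives, can be sketched as follows. The block counts and the `(start, end)` job shape are assumptions made for illustration; the slides do not say how InfiniDB sizes a primitive.

```python
def parallel_engines(pm_servers, cores_per_server):
    """Thread-level parallelism: one processing engine per core."""
    return pm_servers * cores_per_server


def split_into_primitives(total_blocks, blocks_per_primitive):
    """Split a scan into (start, end) block ranges, end exclusive.

    Both sizes are hypothetical; they only show how one query
    turns into thousands of small, independent jobs.
    """
    return [
        (start, min(start + blocks_per_primitive, total_blocks))
        for start in range(0, total_blocks, blocks_per_primitive)
    ]


engines = parallel_engines(pm_servers=12, cores_per_server=8)
primitives = split_into_primitives(total_blocks=100_000, blocks_per_primitive=8)
```

With these made-up sizes a single column scan becomes 12,500 primitives, far more jobs than the 96 engines, which is what lets the engines stay evenly loaded.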
Slide 9
InfiniDB Parallelism
Fixed Thread Pool
Primitives are issued into a thread queue within each performance module.
User Module: processes SQL requests
Performance Module: executes the queries
Deployment: Single Server or MPP
Storage: local disk / EBS, GlusterFS / HDFS
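A fixed thread pool fed from a queue can be sketched like this. It is a minimal illustration of the pattern, not InfiniDB's implementation: `run_primitives` is a hypothetical helper, and each "primitive" here is just a Python callable.

```python
import queue
import threading


def run_primitives(primitives, num_threads):
    """Run callables from one shared queue on a fixed number of worker threads."""
    work = queue.Queue()
    for index, primitive in enumerate(primitives):
        work.put((index, primitive))
    results = [None] * len(primitives)

    def worker():
        while True:
            try:
                index, primitive = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this thread is done
            # A thread is tied to one primitive only for as long as it runs.
            results[index] = primitive()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Ten made-up scan primitives: count the even values in each 1,000-row chunk.
chunks = [range(start, start + 1000) for start in range(0, 10_000, 1000)]
jobs = [lambda chunk=chunk: sum(1 for v in chunk if v % 2 == 0) for chunk in chunks]
partial_counts = run_primitives(jobs, num_threads=4)
```

Because the pool size is fixed, adding more queries lengthens the queue rather than creating more threads.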
Slide 10
Architectural Differentiation
Greenplum, Netezza, etc.
Database Layer 1 - Executing SQL
Database Layer 2 - Executing SQL
Database Layer - Executing SQL
Block Processing Layer - Custom DoW
Diagram labels: Parent Process (x2), Worker Process (x3)
Slide 11
Architectural Differentiation
InfiniDB: threads operate from a queue and are dedicated for a fraction of a second.
Greenplum, Netezza, etc.: threads are dedicated for the duration of a query.
Diagram labels: Parent Process (x2), Worker Process (x3)
Slide 12
InfiniDB Design Principles
Scalable
Fast
Simple
Slide 13
Row-Oriented vs. Column-Oriented
Row-oriented: rows are stored sequentially.
Column-oriented: each column is stored in a separate file. The value of each column for a given row is at the same offset.

Row-oriented:
Key  Fname     Lname  State  Zip    Phone           Age  Sex
1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M
2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
5    Witch     Hazel  MA     01970  (978) 744-0991  57   F

Column-oriented:
Key:    1, 2, 3, 4, 5
Fname:  Bugs, Yosemite, Daffy, Elmer, Witch
Lname:  Bunny, Sam, Duck, Fudd, Hazel
State:  NY, CA, NY, ME, MA
Zip:    11217, 95389, 10013, 04578, 01970
Phone:  (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age:    34, 52, 35, 43, 57
Sex:    M, M, M, M, F
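The two layouts can be sketched with the sample data from this slide. In this illustration each Python list stands in for one column file; `to_columns`, `fetch_row`, and `average` are hypothetical helper names.

```python
COLUMN_NAMES = ("Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex")

# Row-oriented: each record is stored together.
rows = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]


def to_columns(rows, names):
    """Column-oriented: one list per column (standing in for one file per column)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def fetch_row(columns, offset):
    """Rebuild a row by reading the same offset from every column."""
    return tuple(column[offset] for column in columns.values())


def average(columns, name):
    """An aggregate touches only the one column it needs."""
    values = columns[name]
    return sum(values) / len(values)


columns = to_columns(rows, COLUMN_NAMES)
third_row = fetch_row(columns, 2)
average_age = average(columns, "Age")
```

Averaging `Age` reads one list of five integers, while rebuilding a full row has to touch all eight columns. That is the trade-off between analytic scans and single-record lookups.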
Slide 14
2-Dimensional Data Partitioning
Vertical partitioning by column
  o Not column-family (no relation to HBase)
  o Only do I/O for columns requested
Horizontal partitioning by range of rows
  o Metadata stored within an in-memory structure
10 TB of data maps to ~150k-300k discrete files.
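How in-memory metadata over row ranges can avoid I/O (the partition elimination attributed to the Extent Map earlier in the deck) can be sketched as below. The sketch assumes each extent records the minimum and maximum value of its column. The slides do not spell out what the metadata contains, so the `Extent` structure and both helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Hypothetical extent: a horizontal slice of one column plus its metadata."""
    first_row: int
    values: list
    min_value: int
    max_value: int


def build_extents(values, rows_per_extent):
    """Cut one column into fixed-size row ranges and record min/max for each."""
    extents = []
    for start in range(0, len(values), rows_per_extent):
        chunk = values[start:start + rows_per_extent]
        extents.append(Extent(start, chunk, min(chunk), max(chunk)))
    return extents


def scan_range(extents, low, high):
    """Return matching row numbers and how many extents actually had to be read."""
    matches, extents_read = [], 0
    for extent in extents:
        # Metadata check only: skip extents that cannot contain a match.
        if extent.max_value < low or extent.min_value > high:
            continue
        extents_read += 1
        for offset, value in enumerate(extent.values):
            if low <= value <= high:
                matches.append(extent.first_row + offset)
    return matches, extents_read


column = list(range(100))  # made-up, naturally ordered column
extents = build_extents(column, rows_per_extent=10)
matching_rows, extents_read = scan_range(extents, low=42, high=47)
```

On this ordered sample only one of the ten extents is read. The benefit shrinks when values are scattered so that every extent's range overlaps the predicate.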
Slide 16
InfiniDB Design Principles
Scalable
Fast
Simple
Slide 17
Simplicity
Automated Everything
  o Column storage
  o Compression / compression type
  o No index build or maintenance required
  o Extent Map partitioning: vertical / horizontal
  o Distribution of data across server/disk resources
  o Distribution of work
  o Ad hoc performance
Slide 18
InfiniDB: What's New
Scalable, Fast, Simple
  o Open source (GPL v2)
  o New company name
  o Funding
  o InfiniDB for Hadoop
  o Windowing analytic functions
Slide 19
What is InfiniDB for Hadoop?
Fast SQL-for-Hadoop offering for real-time and ad hoc reporting and analytics
  o Non-map/reduce engine for real-time SQL
  o 40x to 100x faster than Hive SQL in Hadoop
  o Reads and writes directly to HDFS/GPFS
  o Best-of-breed SQL in Hadoop
  o Superior ad hoc usage and syntax vs. Impala/Presto
MySQL compatibility
  o InfiniDB presents Hadoop as a MySQL data source
Slide 20
InfiniDB Background: InfiniDB for Hadoop
InfiniDB is a non-map/reduce engine
Reads and writes natively to HDFS
Diagram labels: Pig/Hive, Map Reduce, HBase, InfiniDB for Hadoop, Hadoop Distributed File System
Slide 21
Value Proposition for InfiniDB for Hadoop
Enables access to Hadoop data via a familiar interface
Response to the competitive challenge from Cloudera Impala
Complete the Hadoop checklist
  o Cost-effective storage
  o Robust transforms via map/reduce
  o Real-time SQL for analytics with InfiniDB for Hadoop
Slide 22
Benchmark: Hive, Presto, Impala, InfiniDB
http://infinidb.co/system/files/RadiantAdvisors_Benchmark_SQL-on-Hadoop_2014Q1.pdf
Slide 23
PARTITION and FRAME
For each row, the aggregate is calculated over a FRAME of rows.
The PARTITION of a row is the group of rows that have the same value for a specific column as the current row.
The FRAME for each row is a subset of the PARTITION for that row.

    SELECT x, y, SUM(x) OVER (PARTITION BY y ORDER BY x
                              RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    FROM a

Row  X   Y  PARTITION                    FRAME
1    1   1  Partition for rows 1 to 4    Frame for row 1: sum(x) = 22
2    4   1                               Frame for row 2: sum(x) = 21
3    7   1                               Frame for row 3: sum(x) = 17
4    10  1                               Frame for row 4: sum(x) = 10
5    2   2  Partition for rows 5 to 7    Frame for row 5: sum(x) = 15
6    5   2                               Frame for row 6: sum(x) = 13
7    8   2                               Frame for row 7: sum(x) = 8
8    3   3  Partition for rows 8 to 10   Frame for row 8: sum(x) = 18
9    6   3                               Frame for row 9: sum(x) = 15
10   9   3                               Frame for row 10: sum(x) = 9
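The frame sums on this slide can be reproduced with a short sketch. It assumes the rows of each partition are ordered by x, which the slide's totals imply, and that x values are distinct within a partition, so no two rows are peers. `frame_sums` is a hypothetical helper, not part of InfiniDB.

```python
from collections import defaultdict


def frame_sums(rows):
    """For (x, y) rows, sum x from the current row to the end of its partition.

    Mirrors SUM(x) OVER (PARTITION BY y ... CURRENT ROW AND UNBOUNDED FOLLOWING),
    assuming rows are ordered by x and x is unique within each partition.
    """
    partitions = defaultdict(list)
    for x, y in rows:
        partitions[y].append(x)  # PARTITION: rows sharing the same y

    result = []
    for y in sorted(partitions):
        xs = sorted(partitions[y])
        remaining = sum(xs)  # frame of the first row = the whole partition
        for x in xs:
            result.append((x, y, remaining))
            remaining -= x  # the next row's frame no longer includes this row
    return result


# The ten (x, y) rows from the slide's table.
table_a = [(1, 1), (4, 1), (7, 1), (10, 1), (2, 2), (5, 2), (8, 2), (3, 3), (6, 3), (9, 3)]
windowed = frame_sums(table_a)
sums = [frame_sum for _, _, frame_sum in windowed]
```

Each partition's first row sees the whole partition (22, 15, 18), and each later row drops the rows before it from its frame.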
Slide 24
InfiniDB Use Cases
Scalable, Fast, Simple
  o Who is using it?
  o When to use it?
Slide 25
InfiniDB Customers
Slide 26
InfiniDB's Place in the Big Data World
Designed for high-performance analytics
Provides flexibility for ad hoc queries
Not suited for OLTP, NoSQL, or key-value workloads
Copyright 2014 Calpont. All Rights Reserved.
Slide 27
Workload
General DBMS missed the target (dated database technology, generally suboptimal)
[Chart: Query Vision/Scope axis with ticks at 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000. Regions: OLTP/NoSQL Workloads, Analytic Workloads]
Slide 28
What is your typical query?
[Chart: Query Vision/Scope axis with ticks at 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000. Regions: OLTP/NoSQL Workloads, Analytic Workloads]
There is no average query.
The challenges are at the extremes:
  o The challenge of high concurrency levels with OLTP/NoSQL.
  o The challenge of latency for very large queries.
Most use cases imply multiple data technologies.
Slide 29
Columnar-Appropriate Workloads
[Chart: Query Vision/Scope axis with ticks at 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000. Regions: OLTP/NoSQL Workloads, ROLAP/Analytic/Reporting Workloads]
Pure columnar: about 10x worse I/O for single-record lookups
Pure columnar: about 10x better I/O for large data access patterns
Slide 30
Benefits of InfiniDB
Real-time, consistent query performance
Linear scale for massive data
Removes limits to dimensions and granularity
Easy to deploy and maintain
Slide 31
Core Features of InfiniDB
Scalable MPP architecture
Performant ad hoc analysis
Consistent query response time
Simplified data administration
Analytic window functions
Native MySQL driver support
Open source license
Deployable on premises, in the cloud, and on Apache Hadoop
Optional Enterprise support subscription