InfiniDB Overview

  • Slide 1
  • InfiniDB Overview
  • Slide 2
  • What is InfiniDB?
    o Massively parallel MySQL storage engine for fast analytics
    o Linear scale to handle exponential growth
    o Open source
    o Runs on premise, on AWS cloud, or on a Hadoop HDFS cluster
    o Standard ANSI SQL compliance
    o First MySQL storage engine to support ANSI SQL11-compliant windowing functions
    Copyright 2014 InfiniDB. All Rights Reserved.
  • Slide 3
  • Custom Handler Class: InfiniDB Server
    o Diagram labels: User Connections; User Module (MySQL, InfiniDB ExeMgr); Performance Module(s); Storage
    o MySQL functions: MySQL client; MySQL connectivity (JDBC, ODBC); MySQL security; initial SQL statement parsing; initial SQL optimization; execute final sort and final limit; display final results
    o InfiniDB ExeMgr functions: SQL optimization; distribute work for scan, filter, join, functions, expressions, group by, aggregation, etc. to all available Performance Modules to be run in parallel; collect the results returned by the Performance Modules; return the final results to MySQL for display
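
The division of labor on this slide (ExeMgr distributes work to the Performance Modules and collects partial results, MySQL applies the final sort and limit) can be sketched as a small scatter-gather program. This is only an illustration under stated assumptions: the data slices, function names, and the use of Python threads are hypothetical and are not InfiniDB's actual interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each "Performance Module" holds one slice of a table
# of (state, age) rows. The layout is illustrative only.
PM_SLICES = [
    [("NY", 34), ("CA", 52)],
    [("NY", 35), ("ME", 43)],
    [("MA", 57)],
]


def pm_partial_aggregate(rows, min_age):
    """Work done on one PM: scan, filter, and partially aggregate (count per state)."""
    counts = {}
    for state, age in rows:
        if age >= min_age:
            counts[state] = counts.get(state, 0) + 1
    return counts


def exemgr_run(min_age):
    """ExeMgr role: send the same job to every PM in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(PM_SLICES)) as pool:
        partials = list(
            pool.map(lambda rows: pm_partial_aggregate(rows, min_age), PM_SLICES)
        )
    merged = {}
    for partial in partials:
        for state, n in partial.items():
            merged[state] = merged.get(state, 0) + n
    return merged


def mysql_front_end(min_age, limit):
    """MySQL role: final sort and final limit on the merged result."""
    merged = exemgr_run(min_age)
    ordered = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:limit]


print(mysql_front_end(min_age=30, limit=2))
```

The point of the split is that the expensive steps (scan, filter, partial aggregation) run where the data lives, while the front end only handles a small merged result.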
  • Slide 4
  • InfiniDB Design Principles: Scalable, Fast, Simple
  • Slide 5
  • InfiniDB Parallelism: the User Module processes SQL requests; the Performance Module executes the queries. Single server or MPP.
  • Slide 6
  • Tiered MPP Building Blocks (table columns: Module, Process, Functionality, Value)
    o MySQL. Functionality: hosts MySQL; connection management; SQL parsing & optimization. Value: familiar DBMS interface; leverages existing partner integrations; delivers full SQL syntax support.
    o Extent Map. Functionality: abstracts physical and logical storage; metadata store. Value: enables shared-nothing and shared-everything storage; enables partition elimination; built-in failover.
    o ExeMgr. Functionality: work distribution; final results management and aggregation. Value: independent scalability and tunable concurrency; multi-threaded to take advantage of multi-core HW platforms.
  • Slide 7
  • Tiered MPP Building Blocks, continued (table columns: Module, Process, Functionality, Value)
    o PrimProc. Functionality: scale-out cache management; distributed scan, filter, join and aggregation operations; resource management. Value: independent scalability and tunable performance; multi-threaded to take advantage of multi-core HW platforms.
    o Data. Functionality: high-speed bulk load; transactional DML and DDL; online schema extensions. Value: enables concurrent reads and writes, non-blocking read enabled; multi-threaded to take advantage of multi-core HW platforms.
  • Slide 8
  • InfiniDB Foundation - Parallelism
    o Purpose-built C++ engine
    o Parallelism is at the thread level
    o Example: 12 PM servers with 8 cores each yields 96 parallel processing engines.
    o SQL is translated into thousands or tens of thousands of discrete jobs or primitives.
    o The UM sends primitives to the processing engines.
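
The arithmetic in the slide's example, and the idea of breaking one large scan into many small primitives, can be sketched as follows. The block counts and the block-range shape of a primitive are hypothetical choices for illustration; the slide does not say how a primitive is actually defined.

```python
def processing_engines(pm_servers, cores_per_server):
    """Thread-level parallelism: one processing engine per core."""
    return pm_servers * cores_per_server


def split_into_primitives(total_blocks, blocks_per_primitive):
    """Split a scan over blocks [0, total_blocks) into small block-range jobs."""
    return [
        (start, min(start + blocks_per_primitive, total_blocks))
        for start in range(0, total_blocks, blocks_per_primitive)
    ]


engines = processing_engines(12, 8)  # the slide's example: 12 servers x 8 cores
# Hypothetical sizes: a 1,000,000-block scan cut into 100-block primitives.
primitives = split_into_primitives(total_blocks=1_000_000, blocks_per_primitive=100)

print(engines, len(primitives), primitives[0], primitives[-1])
```

With far more primitives than engines, every engine stays busy until the scan is nearly finished, which is what makes the work easy to spread evenly.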
  • Slide 9
  • InfiniDB Parallelism: Fixed Thread Pool
    o Deployment: single server or MPP. Storage: local disk / EBS, GlusterFS / HDFS.
    o Primitives are issued into a thread queue within each Performance Module.
    o The User Module processes SQL requests; the Performance Module executes the queries.
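
A fixed thread pool fed from a queue can be sketched as below. This is a minimal illustration, assuming a primitive is a hypothetical (query id, values) pair and that the work is a simple sum; it is not InfiniDB's C++ implementation. Each worker holds a primitive only briefly, so primitives from different queries interleave on the same threads.

```python
import queue
import threading


def run_fixed_pool(primitives, num_threads):
    """Run small jobs on a fixed number of worker threads fed from one queue."""
    work = queue.Queue()
    for primitive in primitives:
        work.put(primitive)

    results = []
    lock = threading.Lock()

    def worker():
        # All primitives are queued before the workers start, so an empty
        # queue means there is no work left.
        while True:
            try:
                query_id, values = work.get_nowait()
            except queue.Empty:
                return
            partial = sum(values)  # the short-lived unit of work
            with lock:
                results.append((query_id, partial))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Merge the partial results per query.
    totals = {}
    for query_id, partial in results:
        totals[query_id] = totals.get(query_id, 0) + partial
    return totals


# Hypothetical primitives belonging to two concurrent queries.
primitives = [("q1", [1, 2, 3]), ("q2", [10, 20]), ("q1", [4, 5]), ("q2", [30])]
totals = run_fixed_pool(primitives, num_threads=2)
print(totals)
```

Because the number of threads is fixed, adding concurrent queries lengthens the queue rather than multiplying the number of threads.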
  • Slide 10
  • Architectural Differentiation (diagram labels): Greenplum, Netezza, etc.; Database Layer 1 - Executing SQL; Database Layer 2 - Executing SQL; Database Layer - Executing SQL; Block Processing Layer - Custom DoW; Parent Process; Worker Process
  • Slide 11
  • Architectural Differentiation: InfiniDB threads operate from a queue and are dedicated to a job for a fraction of a second. In Greenplum, Netezza, etc., threads (parent and worker processes) are dedicated for the duration of a query.
  • Slide 12
  • InfiniDB Design Principles: Scalable, Fast, Simple
  • Slide 13
  • Row-Oriented vs. Column-Oriented
    o Row-oriented: rows stored sequentially.
    o Column-oriented: each column is stored in a separate file. Each column for a given row is at the same offset.

    Row-oriented:
    Key | Fname    | Lname | State | Zip   | Phone          | Age | Sex
    1   | Bugs     | Bunny | NY    | 11217 | (718) 938-3235 | 34  | M
    2   | Yosemite | Sam   | CA    | 95389 | (209) 375-6572 | 52  | M
    3   | Daffy    | Duck  | NY    | 10013 | (212) 227-1810 | 35  | M
    4   | Elmer    | Fudd  | ME    | 04578 | (207) 882-7323 | 43  | M
    5   | Witch    | Hazel | MA    | 01970 | (978) 744-0991 | 57  | F

    Column-oriented:
    Key:   1, 2, 3, 4, 5
    Fname: Bugs, Yosemite, Daffy, Elmer, Witch
    Lname: Bunny, Sam, Duck, Fudd, Hazel
    State: NY, CA, NY, ME, MA
    Zip:   11217, 95389, 10013, 04578, 01970
    Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
    Age:   34, 52, 35, 43, 57
    Sex:   M, M, M, M, F
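
The two layouts on this slide can be sketched with the slide's own sample data. This is an in-memory illustration only: Python lists stand in for files, and the function names are hypothetical. It shows why an aggregate over one column touches less data in the column layout, and how a full row is rebuilt by reading every column at the same offset.

```python
COLUMNS = ("key", "fname", "lname", "state", "zip", "phone", "age", "sex")

# Row-oriented: each row's fields are stored together, rows one after another.
ROWS = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]

# Column-oriented: one sequence per column; row i sits at offset i in each.
column_store = {name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)}


def average_age_row_store(rows):
    """Must walk every full row even though only one field is needed."""
    age_index = COLUMNS.index("age")
    ages = [row[age_index] for row in rows]
    return sum(ages) / len(ages)


def average_age_column_store(store):
    """Reads only the 'age' column; the other seven columns are never touched."""
    ages = store["age"]
    return sum(ages) / len(ages)


def fetch_row(store, offset):
    """Rebuild one row by reading every column at the same offset."""
    return tuple(store[name][offset] for name in COLUMNS)


print(average_age_column_store(column_store))
print(fetch_row(column_store, 2))
```

The same sketch also hints at the trade-off: `fetch_row` needs one read per column, which is why single-record lookups cost more in a columnar layout.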
  • Slide 14
  • 2-Dimensional Data Partitioning
    o Vertical partitioning by column: not column-family (no relation to HBase); only do I/O for columns requested
    o Horizontal partitioning by range of rows: metadata stored within an in-memory structure
    o 10 TB of data maps to ~150k-300k discrete files.
  • Slide 15
  • Column Restriction and Projection
    o Automatic vertical partitioning + horizontal partitioning
    o Just-in-time materialization
    o Diagram labels: Column #4; Column #6; Extent #5; Column #17; Extent #27; Filter 1; Filter 2; Filter 3; Projection
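
Partition elimination followed by late projection can be sketched as below. The sketch assumes the in-memory metadata records a minimum and maximum value per extent, which the slides do not spell out; the `Extent` class, column names, and extent size are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """A horizontal slice of one column: a contiguous range of rows."""
    first_row: int
    values: list

    @property
    def lo(self):
        return min(self.values)

    @property
    def hi(self):
        return max(self.values)


def build_extents(values, rows_per_extent):
    return [
        Extent(start, values[start:start + rows_per_extent])
        for start in range(0, len(values), rows_per_extent)
    ]


def scan_with_elimination(filter_extents, lo, hi):
    """Return (matching row offsets, number of extents actually read)."""
    matches = []
    extents_read = 0
    for ext in filter_extents:
        if ext.hi < lo or ext.lo > hi:
            continue  # skipped using metadata alone: no I/O for this extent
        extents_read += 1
        for i, value in enumerate(ext.values):
            if lo <= value <= hi:
                matches.append(ext.first_row + i)
    return matches, extents_read


def project(column_values, offsets):
    """Just-in-time materialization: fetch other columns only for surviving rows."""
    return [column_values[i] for i in offsets]


# Hypothetical two-column table, loaded in order_day order.
order_day = list(range(1, 13))
amount = [10 * d for d in order_day]

extents = build_extents(order_day, rows_per_extent=4)
offsets, extents_read = scan_with_elimination(extents, lo=6, hi=7)
projected = project(amount, offsets)
print(offsets, extents_read, projected)
```

Here only one of the three extents is read for the filter, and the `amount` column is touched only at the two surviving offsets.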
  • Slide 16
  • InfiniDB Design Principles: Scalable, Fast, Simple
  • Slide 17
  • Simplicity: Automated Everything
    o Column storage
    o Compression / compression type
    o No index build or maintenance required
    o Extent Map partitioning (vertical/horizontal)
    o Distribution of data across server/disk resources
    o Distribution of work
    o Ad-hoc performance
  • Slide 18
  • InfiniDB: What's New
    o Open Source GPL v2
    o New Company Name
    o Funding
    o InfiniDB for Hadoop
    o Windowing Analytic Functions
  • Slide 19
  • What is InfiniDB for Hadoop?
    o Fast SQL for Hadoop offering for real-time and ad-hoc reporting and analytics
    o Non-map/reduce engine for real-time SQL
    o 40x to 100x faster than Hive SQL in Hadoop
    o Reads and writes directly to HDFS/GPFS
    o Best-of-breed SQL in Hadoop: superior ad-hoc usage and syntax vs. Impala/Presto
    o MySQL compatibility: InfiniDB presents Hadoop as a MySQL data source
  • Slide 20
  • InfiniDB Background: InfiniDB for Hadoop
    o InfiniDB is a non-map/reduce engine
    o Reads and writes natively to HDFS
    o Diagram labels: Map Reduce; HBase; InfiniDB for Hadoop; Pig/Hive; Hadoop Distributed File System
  • Slide 21
  • Value Proposition for InfiniDB for Hadoop
    o Enables access to Hadoop data via a familiar interface
    o Response to competitive challenge from Cloudera Impala
    o Complete the Hadoop checklist: cost-effective storage; robust transforms via map/reduce; real-time SQL for analytics with InfiniDB for Hadoop
  • Slide 22
  • Benchmark: Hive, Presto, Impala, InfiniDB. http://infinidb.co/system/files/RadiantAdvisors_Benchmark_SQL-on-Hadoop_2014Q1.pdf
  • Slide 23
  • PARTITION and FRAME
    o For each row, the calculation for an aggregation is done over a FRAME of rows.
    o The PARTITION of a row is the group of rows that have the same value as the current row for a specific column.
    o The FRAME for each row is a subset of the PARTITION for that row.

    SELECT x, y, sum(x) OVER (PARTITION BY y RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM a

    Row Number | X  | Y | PARTITION                  | FRAME
    1          | 1  | 1 | Partition for rows 1 to 4  | Frame for row 1: sum(x) = 22
    2          | 4  | 1 |                            | Frame for row 2: sum(x) = 21
    3          | 7  | 1 |                            | Frame for row 3: sum(x) = 17
    4          | 10 | 1 |                            | Frame for row 4: sum(x) = 10
    5          | 2  | 2 | Partition for rows 5 to 7  | Frame for row 5: sum(x) = 15
    6          | 5  | 2 |                            | Frame for row 6: sum(x) = 13
    7          | 8  | 2 |                            | Frame for row 7: sum(x) = 8
    8          | 3  | 3 | Partition for rows 8 to 10 | Frame for row 8: sum(x) = 18
    9          | 6  | 3 |                            | Frame for row 9: sum(x) = 15
    10         | 9  | 3 |                            | Frame for row 10: sum(x) = 9
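
The frame sums on this slide can be reproduced with a short pure-Python sketch. It assumes the rows within each partition are ordered by x, which is the ordering the slide's sums correspond to even though the query as shown has no ORDER BY clause. The function name is hypothetical, and this emulates the window semantics rather than calling any database.

```python
# (x, y) pairs in the slide's row order, rows 1 to 10.
ROWS = [
    (1, 1), (4, 1), (7, 1), (10, 1),
    (2, 2), (5, 2), (8, 2),
    (3, 3), (6, 3), (9, 3),
]


def frame_sums(rows):
    """sum(x) over each row's frame: the current row plus all following rows
    (by x) within the same partition (same y)."""
    partitions = {}
    for x, y in rows:
        partitions.setdefault(y, []).append(x)

    out = []
    for x, y in rows:
        # RANGE ... CURRENT ROW AND UNBOUNDED FOLLOWING, assuming ORDER BY x:
        # every value in the partition that is >= the current row's x.
        frame = [value for value in partitions[y] if value >= x]
        out.append((x, y, sum(frame)))
    return out


for x, y, total in frame_sums(ROWS):
    print(x, y, total)
```

Each partition starts with the sum of all its x values, and the sum shrinks as the frame's starting point moves down the partition.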
  • Slide 24
  • InfiniDB Use Cases: Who is using it? When to use it?
  • Slide 25
  • InfiniDB Customers
  • Slide 26
  • InfiniDB's place in the Big Data world
    o Designed for high-performance analytics
    o Provides flexibility for ad hoc queries
    o Not suited for OLTP, NoSQL, or key-value workloads
    Copyright 2014 Calpont. All Rights Reserved.
  • Slide 27
  • Workload vs. Query Vision/Scope: General DBMS missed the target (dated database technology generally suboptimal).
    [Chart: Query Vision/Scope axis with ticks at 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000. Regions labeled OLTP/NoSQL Workloads and Analytic Workloads.]
  • Slide 28
  • What is your typical query?
    [Chart: Query Vision/Scope axis with ticks at 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000. Regions labeled OLTP/NoSQL Workloads and Analytic Workloads.]
    There is no average query. The challenges are at the extremes:
    o The challenge of high concurrency levels with OLTP/NoSQL.
    o The challenge of latency for very large queries.
    Most use cases imply multiple data technologies.
  • Slide 29
  • Columnar Appropriate Workloads
    [Chart: Query Vision/Scope axis with ticks at 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000. Regions labeled OLTP/NoSQL Workloads and ROLAP/Analytic/Reporting Workloads.]
    o Pure columnar: about 10x worse I/O for single-record lookups
    o Pure columnar: about 10x better I/O for large data access patterns
  • Slide 30
  • Benefits of InfiniDB
    o Real-time, consistent query performance
    o Linear scale for massive data
    o Removes limits to dimensions and granularity
    o Easy to deploy and maintain
  • Slide 31
  • Core Features of InfiniDB
    o Scalable MPP architecture
    o Performant ad hoc analysis
    o Consistent query response time
    o Simplified data administration
    o Analytic window functions
    o Native MySQL driver support
    o Open source license
    o Deployable on premise, in the cloud, and on Apache Hadoop
    o Optional Enterprise support subscription