The Hadoop RDBMS
Replace Oracle with Hadoop
John Leach, CTO and Co-Founder
who we are
The Hadoop RDBMS
  Standard ANSI SQL
  Horizontal Scale-Out
  Real-Time Updates
  ACID Transactions
  Powers OLAP and OLTP
  Seamless BI Integration
Splice Machine Proprietary and Confidential
serialization and write pipelining
Serialization Goals
  Disk Usage Parity with Data Supplied
  Predicate evaluation uses byte[] comparisons (sorted)
  Memory and CPU efficient (fast)
  Lazy Serialization and Deserialization
Write Pipelining Goals
  Non-blocking Writes
  Transactional Awareness
  Small Network Footprint
  Handle Failure, Location, and Retry Semantics
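The goal of evaluating predicates with sorted byte[] comparisons depends on an order-preserving encoding. The slides do not show the actual byte layout, so the following is only a minimal sketch of one common technique (flip the sign bit, write big-endian), with hypothetical function names:

```python
import struct


def encode_int32_sortable(value: int) -> bytes:
    """Encode a signed 32-bit int so unsigned byte order matches numeric order.

    Assumption: this is a generic order-preserving scheme, not the encoding
    used by the system described in the slides.
    """
    # Flipping the sign bit maps [-2^31, 2^31) monotonically onto [0, 2^32).
    return struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)


def decode_int32_sortable(data: bytes) -> int:
    """Invert encode_int32_sortable."""
    (raw,) = struct.unpack(">I", data)
    raw ^= 0x80000000
    # Reinterpret the unsigned value as a signed 32-bit integer.
    return raw - 0x100000000 if raw >= 0x80000000 else raw


values = [-5, 3, 0, -(2**31), 2**31 - 1, 42]
# Sorting by the encoded bytes gives the same order as sorting the numbers,
# so a range predicate can be checked without deserializing the value.
by_bytes = sorted(values, key=encode_int32_sortable)
```

With an encoding like this, a predicate such as `a < 10` can be evaluated as a plain byte comparison against the encoded constant.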
Single Column Encoding
All Columns encoded in a single cell
  separated by 0x00 byte
Nulls are encoded either as "explicit null" or as an absent field
Cell value prefixed by an Index containing
  which fields are present in cell
  whether the field is
    Scalar (1-9 Bytes)
    Float (4 Bytes)
    Double (8 Bytes)
    Other (1-N Bytes)
Example Insert
Table Schema: (a int, b string)
Insert row (1,'bob'):
All columns packed together
  1 0x00 'bob'
Index prepended
  {1(s),2(o)} 0x00 1 0x00 'bob'
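The packing step can be sketched in a few lines. This is an illustration of the layout only: the index is written here as readable text in the slide's `{1(s),2(o)}` notation and integers as ASCII digits, both of which are simplifying assumptions. The real index and value encodings are binary, and "explicit null" is not modeled. `pack_row` and `type_code` are hypothetical names.

```python
SEP = b"\x00"  # field separator shown on the slides


def type_code(value) -> str:
    """Return 's' (scalar) for ints and 'o' (other) for everything else."""
    return "s" if isinstance(value, int) else "o"


def pack_row(fields: dict) -> bytes:
    """Pack {1-based column position: value} into a single cell value.

    Columns whose value is None are left absent, as in the null example.
    """
    present = {pos: v for pos, v in sorted(fields.items()) if v is not None}
    index = "{" + ",".join(f"{pos}({type_code(v)})" for pos, v in present.items()) + "}"
    body = SEP.join(str(v).encode("utf-8") for v in present.values())
    return index.encode("ascii") + SEP + body


insert_cell = pack_row({1: 1, 2: "bob"})  # {1(s),2(o)} 0x00 1 0x00 'bob'
null_cell = pack_row({1: 1, 2: None})     # {1(s)} 0x00 1
update_cell = pack_row({1: 2})            # {1(s)} 0x00 2  (set a = 2)
```

The same function covers the insert, the insert with nulls, and the update: an update simply packs only the columns it touches.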
Example Insert w/ nulls
Row (1,null)
nulls left absent
  1
Index prepended (field B is not present)
  {1(s)} 0x00 1
Example: Update
Row already present: {1(s),2(o)}
set a = 2
Pack entry
  2
Prepend index (field B is not present)
  {1(s)} 0x00 2
Decoding
Indexes are cached
  Most data looks like its predecessor
Values are read in reverse timestamp order
  Updates before inserts
Seek through bytes for fields of interest
  Once a field is populated, ignore all other values for that field.
Example Decoding
Start with (NULL,NULL)
2 KeyValues present:
  {1(s)} 0x00 2
  {1(s),2(o)} 0x00 1 0x00 'bob'
Read first KeyValue, fill field 1
  Row: (2,NULL)
Read second KeyValue, skip field 1 (already filled), fill field 2
  Row: (2,'bob')
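The decoding walk above can be sketched as a newest-first merge. As an assumption for readability, cells use a text form of the slide notation (index as `{1(s),2(o)}`, integers as ASCII digits) rather than the real binary encoding, and `parse_cell` and `decode_row` are hypothetical names.

```python
SEP = b"\x00"


def parse_cell(cell: bytes) -> dict:
    """Split one cell into {1-based field position: value}."""
    index, _, body = cell.partition(SEP)
    entries = index.decode("ascii").strip("{}").split(",")
    fields = {}
    for entry, raw in zip(entries, body.split(SEP)):
        pos, code = entry.rstrip(")").split("(")
        text = raw.decode("utf-8")
        fields[int(pos)] = int(text) if code == "s" else text
    return fields


def decode_row(cells_newest_first, num_fields: int) -> tuple:
    """Fill each field from the newest cell that contains it."""
    row = [None] * num_fields
    filled = set()
    for cell in cells_newest_first:
        for pos, value in parse_cell(cell).items():
            if pos not in filled:  # older values for a filled field are ignored
                row[pos - 1] = value
                filled.add(pos)
        if len(filled) == num_fields:
            break  # every field is populated, so stop reading
    return tuple(row)


# The update (a = 2) is newer than the insert (1, 'bob'), so it is read first.
row = decode_row([b"{1(s)}\x002", b"{1(s),2(o)}\x001\x00bob"], num_fields=2)
```

Here `row` ends up as `(2, 'bob')`, matching the slide.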
Index Decoding
Index encoded differently depending on number of columns present and type
  Uncompressed: 1 bit for present, 2 bits for type
  Compressed: Run-length encoded (field 1-3, scalar, 5-8 double…)
  Sparse: Delta encoded (index,type) pairs
  Sparse compressed: Run-length encoded (index,type) pairs
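To make the uncompressed variant concrete, here is a rough sketch that spends 3 bits per field: 1 presence bit followed by 2 type bits. The bit order, the numeric type codes, and the padding are all assumptions made for illustration; the slides state only the bit counts.

```python
# Hypothetical 2-bit type codes; the slides do not give the actual values.
TYPE_CODES = {"scalar": 0, "float": 1, "double": 2, "other": 3}
TYPE_NAMES = {code: name for name, code in TYPE_CODES.items()}


def encode_index(num_fields: int, present: dict) -> bytes:
    """Encode {1-based field position: type name} using 3 bits per field."""
    bits = 0
    for pos in range(1, num_fields + 1):
        bits <<= 3
        if pos in present:
            bits |= 0b100 | TYPE_CODES[present[pos]]  # presence bit + type
    return bits.to_bytes((3 * num_fields + 7) // 8, "big")


def decode_index(data: bytes, num_fields: int) -> dict:
    """Invert encode_index."""
    bits = int.from_bytes(data, "big")
    present = {}
    for pos in range(num_fields, 0, -1):  # the last field sits in the low bits
        chunk = bits & 0b111
        bits >>= 3
        if chunk & 0b100:
            present[pos] = TYPE_NAMES[chunk & 0b011]
    return present


full = encode_index(2, {1: "scalar", 2: "other"})  # like {1(s),2(o)}
partial = encode_index(2, {1: "scalar"})           # like {1(s)}
```

A layout like this costs a fixed 3 bits per column, which is why the run-length and sparse variants pay off for wide or mostly empty rows.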
Write Pipeline
Asynchronous but guaranteed delivery
Operate in Bulk
  Row or Size bounded
  Highly Configurable
Utilizes Cached Region Locations
Server component modeled after Java's NIO
  Attach Handlers for different RDBMS features
Handle retries, failure, and SQL semantics
  Wrong Region, Region Too Busy, Primary Key Violation, Unique Constraint Violation
Write Pipeline Base Element
Rows are encoded into custom KVPairs
  all rows for a family and column are grouped together
  <byte[],byte[]>
Exploded into Put only to write to HBase
  Timestamps added on server side
Supports Snappy compression
Write Pipeline Client
Tree Based Buffer
  Table -> Region -> N Buffers
  Rows are buffered on client side in memory
  N is configurable
When buffer fills
  asynchronously write batch to Region
Handles HBase "difficulties" gracefully
  Wrong Region
    Re-bucket
  Too Busy
    Add delay and possibly back-off
  etc.
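A rough sketch of the client-side idea: one row- or size-bounded buffer that hands off a batch when it fills, plus a retry helper that backs off when a region reports it is too busy. All names here (`WriteBuffer`, `write_with_retry`, `RegionTooBusy`) are hypothetical, the flush is synchronous for simplicity where the real pipeline writes asynchronously, and re-bucketing on a wrong region is omitted because it needs a region lookup.

```python
import time


class RegionTooBusy(Exception):
    """Hypothetical signal that the target region rejected the batch."""


class WriteBuffer:
    """Buffer rows in memory and flush when a row or byte limit is reached."""

    def __init__(self, flush, max_rows=1000, max_bytes=1 << 20):
        self.flush = flush  # callable that receives a list of (key, value)
        self.max_rows = max_rows
        self.max_bytes = max_bytes
        self.rows = []
        self.size = 0

    def add(self, key: bytes, value: bytes) -> None:
        self.rows.append((key, value))
        self.size += len(key) + len(value)
        if len(self.rows) >= self.max_rows or self.size >= self.max_bytes:
            self.flush_now()

    def flush_now(self) -> None:
        if self.rows:
            batch, self.rows, self.size = self.rows, [], 0
            self.flush(batch)


def write_with_retry(send, batch, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Retry a batch with exponential back-off while the region is too busy."""
    for attempt in range(max_attempts):
        try:
            return send(batch)
        except RegionTooBusy:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("batch not delivered after %d attempts" % max_attempts)
```

In the tree-based layout from the slide, there would be N such buffers per region, and `flush` would submit the batch to a background writer instead of blocking the caller.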
Write Pipeline Server Side
Coprocessor based
Limited number of concurrent writes to a server
  excess write requests are rejected
  prevents IPC thread starvation
SQL Based Handlers for parallel writes
  Indexes, Primary Key Constraints, Unique Constraints
Writes occur in a single WALEdit on each region
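The "reject excess writes" policy can be illustrated with a non-blocking semaphore: a request that cannot get a permit is refused immediately instead of waiting and tying up a thread. This is a minimal sketch in Python, not the coprocessor code; `WriteGate` and its return values are hypothetical.

```python
import threading


class WriteGate:
    """Cap concurrent writes; reject instead of queueing when at the limit."""

    def __init__(self, max_concurrent: int):
        self._permits = threading.BoundedSemaphore(max_concurrent)

    def try_write(self, apply_batch, batch) -> str:
        # Non-blocking acquire: a full server answers immediately, so request
        # threads are never parked waiting for a write slot.
        if not self._permits.acquire(blocking=False):
            return "REJECTED"
        try:
            apply_batch(batch)
            return "OK"
        finally:
            self._permits.release()


gate = WriteGate(max_concurrent=1)
nested_results = []


def busy_write(batch):
    # While this write holds the only permit, a second write is rejected.
    nested_results.append(gate.try_write(lambda b: None, batch))


outer_result = gate.try_write(busy_write, [])
```

A rejected client would then treat the response like the "Too Busy" case on the client side: delay, possibly back off, and retry.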
Interests
Other items we have done or are interested in…
  Burstable Tries Implementation of Memstore
  Pluggable Cost Based Genetic Algorithm for Assignment Manager
  Columnar Representations and in-memory processing
  Concurrent Bloom Filter (i.e. Thread Safe BitSet)
We are hiring
Just Completed $15M Series B
[email protected]