Upload
sunnygleason
View
142
Download
2
Tags:
Embed Size (px)
DESCRIPTION
Citation preview
g414-innoEmbedded InnoDB, Voldemort, St8,
and a Few Other Tidbits
Sunny Gleason
What’s in this Preso
• What is InnoDB?
• Relation to MySQL & Other Products
• InnoDB Model
• g414-inno: a Java Access Library for InnoDB
What else is in this Preso
• Creating a Voldemort Storage Engine with Embedded InnoDB
• St8: A REST-based Storage Server
• Faban Benchmark Results
What is InnoDB?
• High-Performance “guts” of MySQL
• Finely Tuned B-Tree Storage Engine
• MVCC Transactional Store a la Jim Gray (“Transactional Processing Systems”)
• Available Stand-Alone as Embedded InnoDB (stagnant) or HailDB (drizzle)
Relation to MySQL
• One of many MySQL storage engines
• Transactional, in contrast to MYISAM
• Well-known, Bullet-Proof Backup, Failure & Recovery Modes
• Advanced Buffer Pool Management (adaptive hash index, tunable LRU)
• Online Backup Support (Xtrabackup / Hot)
Other Products
• Tokyo BDB, Oracle BDB & BDB-JE
• Schema-Free (No Structure / Data Types)
• Lower Concurrency (fewer writers)
• Performance Degradation in Larger DBs
• (TODO: quantify performance gap - in meantime, see Dynamo & Voldemort)
InnoDB Model (Logical)• Database == Tablespace
• Tablespace has Table(s) and Log(s)
• Table has columns (rich datatypes)
• Tables have a PRIMARY clustered index
• Tables may have SECONDARY indexes
• Row == Tuple
• Tuples are stored / clustered by index sort
• Secondary index stores full Primary Key
InnoDB Model (Txns)• Everything uses a Transaction
• Isolation Level: Serialized, Read Committed, Read Uncommitted
• Locks: Shared (Read-only), Exclusive (Read/Write)
• Cursors provide access to tables: Lookup by index, Iteration / Traversal
• Secondary index contains partial Tuples
• Secondary cursor can access primary (full tuple)
InnoDB Model (Physical)
• Tablespace is a collection of pages (16K)
• Pages organized as a B-Tree: infimum & supremum keys, pointers to children
• Pages contain row or index tuple data, or blob overflow data
• Pages written to log first and flushed to tablespace based on ‘sync’ policy
Physical Considerations
• New pages requested from OS in extend_size increments
• OS Assigns space from file system / partition “free list”
• Temporal Locality (pages close together)
• Spatial Locality / Fragmentation from Updates
• Prefer “narrow” rows / indexes: faster scan, keeps working set in-memory
• Secondary “covering” indexes can save primary index access
!""#$%&$'(')'*+&,-.+*
!"#$%$ &!'()
!"#$%&'$()*%#+(,%
!"#$%"&'(&#&
(!)#!*"&%+
-"!./'0($('012%,$32"
*++,-.%$"'()
/0 !""*(,-.!'$-/$%-#&,'$
0!,( .!'$1
0.%2 .!'$1
3"(*'*41
!"1$%#5,3..$%5
*source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
!""#$%&'()*+,-(.+,
!"#$%#
&$'($%#
!"#$%#
!"#$%# !"#$%#
)%*$"#$%#*+*,-*.)'$/
!"#$%#
01"*23
456
72$83*9
4588*.52%#$1
72$83*.52%#$1/
72$83*: 72$83*%
456
;)'$
456
456
456 456
<$)=*%53$*/$'($%#
0)>8$/.)?$
4588>)?@*/$'($%#
A5%B8$)=*%53$*/$'($%#
456456
*source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
!""#$%&'#()
!"#$%&'()*+, **
#+,-./#(012,
-./0-12$3"456-./0-12$3"456
!"#$%&'(&%'''')%*'+,''''!$--'./%''''0-&'./%1'''$2"%3-$45.67"'./%'88'09"-&'26-:"1
#+,-./#(012,
* *7890/:-2$3"4567890/:-2$3"456
!"#$%&'(
*source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
!""#$%&!"'()(*&+ ,-./0-1
!$020&-#3*&0-(&*2#-('&."&24(&%+2-((&5(06&"#'(*&0&758*2(-('&."'()!%+2-((&.*&#-90".:('&;1&<-./0-1&=(1&#-&"#"+"855&8".>8(&=(1&20;5(?&.6&'(6."('@&(5*(?&0"&."2(-"05&7#58/"&3.24&A+;12(&BCDE!$&.*&0''('F
!"#$%&'&()*'+,-'./0&.1
+2(&3
!!
"#$%&#'()*+,-
*source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
!""#$%&!"'()(*&+ ,(-#"'./0
! ,(-#"'./0&1"'()&%+2/((&3(.4&"#'(*&-#"2.1"5&4#/&(.-6&7(0&8.39(5&26(&:/1;./0&7(0*&26(&-#//(*:#"'1"<&/#=*5&9*('&2#&.--(**&-39*2(/1"<&1"'()&2#&#>2.1"&26(&'.2.
!"#$%&'&()*'+,-'./0&.1
+2(&3
!"#$%&'&()*'+,-'./0&.1
+2(&3
4&!52(-'./+2(&3
67%'&&/"&-8/25(&$9/!52%-+2+2:/(-%-
67%'&&/"&-8/25(&$9/!52%-+2+2: ;<$
4&!52(-'./+2(&3
67%'&&/"&-8/25(&$9/!52%-+2+2: ;<$
,(-#"'./0 !"#$%
*source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
How can we use InnoDB?
• Download Embedded InnoDB or HailDB
• Use C-API for access to InnoDB tables
• Innostore: Erlang library for InnoDB access (from Basho’s Riak NoSQL project)
• g414-inno: Open-Source Java access library for Embedded InnoDB
g414-inno Foundations
• Uses JNA (Java Native Access): Like JNI, but doesn’t provoke (as much) insanity
• JNAerate: creates thin Java Class wrapper from a C-based header file (innodb.h)
• But, complex C API’s are super ugly in Java
• Need to clean that up a bit...
g414-inno Library
• Provides a more Object-Oriented API to mask all of the JNA “Pointer” madness
• Transaction Objects, Cursors, Table Builder, Tuple Builder, Datatype Validation
• Java Enum Types for ‘int’ enums in C API
• inTransaction() templates (like Spring, JDBI)
• Contains sanity checks to prevent common errors (mostly C API order of operations)
Use Case: Voldemort• Voldemort: High-Performance Key-Value
Store (Amazon Dynamo clone)
• Nokia: good results with Voldemort on MySQL with InnoDB
• Typical features of DB (network connectivity, SQL language) not really necessary
• Thought: why bother with DB layer?The g414-inno project is born ...
Voldemort Storage Engines
• Trivial to integrate new persistence mechanisms with Voldemort
• 2 Classes: Config & Storage Engine
• Trivial InnoDB Table:key_ VARBINARY(200) NOT NULLversion_ VARBINARY(200) NOT NULLvalue_ BLOBPRIMARY KEY (key_, version_)
• 3 Operations: put(k, v), get(k), delete(k)
• Complication: k is Versioned<Key>
V Storage Engine: put• put(byte[] key, byte[] version, byte[] value)
• Start transaction, open table cursor
• Create search tuple for key
• Cursor.find(key)
• Foreach row matching key if row.version is below, delete row if row.version is above, throw exception
• Cursor.insert(key, version, value)
V Storage Engine: get• get(byte[] key, byte[] version)
• Start transaction, open table cursor
• Create search tuple for key
• Cursor.find(key)
• Foreach row matching key add to results
• Return results
V Storage Engine: delete
• delete(byte[] key)
• Start transaction, open table cursor
• Create search tuple for key
• Cursor.find(key)
• Foreach row matching key delete row
V Storage Engine: TODO
• Perform Benchmarks (in EC2, local)
• Tuning / Optimization
• Clarify licenses (GPLv2 + Apache == ouch)
• Organize & streamline distribution
St8
• Simple, Open Source REST-based Storage Server
• Wraps InnoDB with thin “but pleasant” HTTP API
• Custom Tables using JSON table definitions
• Natural, JSON-based access to tables: CRUD, Index-based Query & Iteration
• Under the hood: Jetty, Jersey, Guice, Jackson, g414-inno, Embedded InnoDB
St8 Table Def{"columns":[ {"name":"key1","type":"INT","length":4}, {"name":"key2","type":"VARCHAR","length":50}, {"name":"val","type":"BLOB","length":0}],"indexes":[ { "name":"PRIMARY", "clustered":true,"unique":true, "indexColumns":[{"name":"key1"}] }, { "name":"key2", "clustered":false,"unique":false, "indexColumns":[{"name":"key2"}] }]}
St8 Interface• Operations for Table Management: create,
describe, delete, truncate
• Operations for Data Management: Create, Retrieve, Update, Delete
• Influences g414-inno design: template methods for inTransaction(), insert, update, insertOrUpdate, delete, load
• Coming Soon: Query & Iteration APIs
St8: Sample Requests
SIMPLE GET:
curl "http://localhost:8080/d/atable;key1=123"
INSERT:
curl -X PUT "http://localhost:8080/d/atable;key1=123;key2=ABC;val=AVERYLONGDATA"
UPDATE:
curl -X POST "http://localhost:8080/d/atable;key1=123;key2=CDE;val=NEWDATA"
DELETE:
curl -X DELETE "http://localhost:8080/d/atable;key1=123"
g414-inno: Faban Benchmark
• Row: 4-byte Key, 4096-byte value
• Insert Sequential, Random
• Single disk, 3-disk RAID 0, SSD
• TODO: Concurrent Benchmarks, Mixed Read/Write
Benchmark Results
0
5
10
15
20
InsertSeq InsertRnd SelectRnd
Embedded InnoDB Latency (ms)
Single Disk (OS X 1) 3-Disk Raid 0 (OS X 1) SSD (OS X 2)
Single-Threaded Benchmarks InsertSeq InsertRnd SelectRndSingle Disk (OS X 1) 9.0 9.3 163-Disk Raid 0 (OS X 1) 0.47 1.4 5.2SSD (OS X 2) 0.51 1.2 0.71
Next Steps / Future Work
• Finish St8: Queries & Iteration, Benchmark
• Package / Qualify Voldemort Storage Engine
• Integrate with Xtrabackup (hot backup)
• Integrate with Sqoop (hadoop export)
• Explore more advanced App-Level Replication Support
Questions?
• Thank you for listening!
References / More Info
• Embedded InnoDB, HailDB (drizzle)
• InnoDB Performance
• GitHub: g414-inno, st8, voldemort, xfaban
• Java Native Access (JNA)
• Tokyo BDB, Oracle BDB & BDB-JE
• Amazon Dynamo; Voldemort Project