55
<Insert Picture Here> The InnoDB Storage Engine for MySQL Morgan Tocker, MySQL Community Manager http://www.tocker.ca/

The InnoDB Storage Engine for MySQL

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: The InnoDB Storage Engine for MySQL

<Insert Picture Here>

The InnoDB Storage Engine for MySQL Morgan Tocker, MySQL Community Managerhttp://www.tocker.ca/

Page 2: The InnoDB Storage Engine for MySQL

Safe Harbor Statement

The  following  is  intended  to  outline  our  general  product  direction.  It  is  intended  for  information  purposes  only,  and  may  not  be  incorporated  into  any  contract.  It  is  not  a  commitment  to  deliver  any  material,  code,  or  functionality,  and  should  not  be  relied  upon  in  making  purchasing  decisions.  The  development,  release,  and  timing  of  any  features  or  functionality  described  for  Oracle’s  products  remains  at  the  sole  discretion  of  Oracle.

Page 3: The InnoDB Storage Engine for MySQL

<Insert Picture Here> MySQL 5.5MySQL Cluster 7.3

MySQL Enterprise Monitor 2.3 & 3.0MySQL Enterprise Backup Security Scalability HA Audit

MySQL 5.6MySQL Workbench 6.0

M y S Q L U t i l i t i e s

M y S Q L A p p l i e r f o r

H a d o o p

MySQL Workbench 5.2 & 6.0M y S Q L E n t e r p r i s e O r a c l e C e r t i f i c a t i o n s

4 Years of MySQL Innovation

M y S Q L C l u s t e r M a n a g e r

Windows installer & Tools

MySQL Cluster 7.2MySQL Cluster 7.1

MySQL Migration Wizard

MySQL 5.7

Page 4: The InnoDB Storage Engine for MySQL

Hello and Welcome!

• I will be talking about InnoDB’s internal behaviour. • Not talking (much) about MySQL. • Aim of this talk is to give you X-ray vision.

• i.e. not so many direct takeaways, but one day it will help you debug a problem.

Page 5: The InnoDB Storage Engine for MySQL

Copyright  ©  2012  Oracle  and/or  its  affiliates.  All  rights  reserved.

Prerequisites

Page 6: The InnoDB Storage Engine for MySQL

MySQL Architecture

Page 7: The InnoDB Storage Engine for MySQL

L1 cache reference 0.5 ns!Branch mispredict 5 ns!L2 cache reference 7 ns!Mutex lock/unlock 25 ns!Main memory reference 100 ns!Compress 1K bytes with Zippy 3,000 ns!Send 2K bytes over 1 Gbps network 20,000 ns!Read 1 MB sequentially from memory 250,000 ns!Round trip within same datacenter 500,000 ns!Disk seek 10,000,000 ns!Read 1 MB sequentially from disk 20,000,000 ns!Send packet CA->Netherlands->CA 150,000,000 ns

IO Performance

See: http://www.linux-mag.com/cache/7589/1.html and Google http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf

Page 8: The InnoDB Storage Engine for MySQL

• 5-10ms per disk IO. • Maybe 50us for a high end SSD.

• Still not “memory speed”.

IO Performance (cont.)

Page 9: The InnoDB Storage Engine for MySQL

• Operating Systems compensate well already. • Reads are cached with free memory. • Writes don’t happen instantly.

• A step is introduced to rewrite and merge.

Buffered IO

Block 9, 10, 1, 4, 200, 5.

Block 1, 4, 5, 9, 10, 200

Page 10: The InnoDB Storage Engine for MySQL

fsync

Synopsis #include <unistd.h> int fsync(int fd); int fdatasync(int fd); !Description fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) where that file resides. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)).

Page 11: The InnoDB Storage Engine for MySQL

Copyright  ©  2012  Oracle  and/or  its  affiliates.  All  rights  reserved.

Basic  Operation

Page 12: The InnoDB Storage Engine for MySQL

InnoDB High Level OverviewTransa

ctio

nSystem

Storage

Caching

SYS_TABLES

ibda

ta1

spac

e 0

Page Cache

A.ibd

B.ibd

C.ibd

IBUF_HEADERIBUF_TREETRX_SYS

FIRST_RSEGDICT_HDR

Data Dict.

SYS_COLUMNSSYS_INDEXESSYS_FIELDS

Block 1 (64 pages)Block 2 (64 pages)

iblogfile0 iblogfile1 iblogfile2

Tables withfile_per_tableDoublewrite Buffer

Buffe

r Poo

l Data Dictionary Cache

Adaptive Hash Indexes

Buffer Pool LRU

Additional Mem Pool

Log Buffer

Log

Gro

up

Buffer Pool Flush List

In Memory On Disk

Page 13: The InnoDB Storage Engine for MySQL

Query (pages not in buffer pool)

InnoDB

Transa

ctio

nSystem

Storage

Caching

SYS_TABLES

ibda

ta1

spac

e 0

Page Cache

A.ibd

B.ibd

C.ibd

IBUF_HEADERIBUF_TREETRX_SYS

FIRST_RSEGDICT_HDR

Data Dict.

SYS_COLUMNSSYS_INDEXESSYS_FIELDS

Block 1 (64 pages)Block 2 (64 pages)

iblogfile0 iblogfile1 iblogfile2

Tables withfile_per_tableDoublewrite Buffer

Buffe

r Poo

l Data Dictionary Cache

Adaptive Hash Indexes

Buffer Pool LRU

Additional Mem Pool

Log Buffer

Log

Gro

upBuffer Pool Flush List

SELECT * FROM a WHERE id = 10;

mysqld

Not Found

Page 14: The InnoDB Storage Engine for MySQL

Query (pages in buffer pool)

InnoDB

Transa

ctio

nSystem

Storage

Caching

SYS_TABLES

ibda

ta1

spac

e 0

Page Cache

A.ibd

B.ibd

C.ibd

IBUF_HEADERIBUF_TREETRX_SYS

FIRST_RSEGDICT_HDR

Data Dict.

SYS_COLUMNSSYS_INDEXESSYS_FIELDS

Block 1 (64 pages)Block 2 (64 pages)

iblogfile0 iblogfile1 iblogfile2

Tables withfile_per_tableDoublewrite Buffer

Buffe

r Poo

l Data Dictionary Cache

Adaptive Hash Indexes

Buffer Pool LRU

Additional Mem Pool

Log Buffer

Log

Gro

upBuffer Pool Flush List

SELECT * FROM a WHERE id = 10;

mysqld

Page 15: The InnoDB Storage Engine for MySQL

Update Query in a Transaction (simplified)

InnoDB

Transa

ctio

nSystem

Storage

Caching

SYS_TABLES

ibda

ta1

spac

e 0

Page Cache

A.ibd

B.ibd

C.ibd

IBUF_HEADERIBUF_TREETRX_SYS

FIRST_RSEGDICT_HDR

Data Dict.

SYS_COLUMNSSYS_INDEXESSYS_FIELDS

Block 1 (64 pages)Block 2 (64 pages)

iblogfile0 iblogfile1 iblogfile2

Tables withfile_per_tableDoublewrite Buffer

Buffe

r Poo

l Data Dictionary Cache

Adaptive Hash Indexes

Buffer Pool LRU

Additional Mem Pool

Log Buffer

Log

Gro

upBuffer Pool Flush List

UPDATE a SET col1 = ‘new’ WHERE id = 10;

mysqld

commit;

Page 16: The InnoDB Storage Engine for MySQL

Log files

• Provide recovery. • Only written to in regular operation. • Read only required if there is a crash.

• Are rewritten over-and-over again. • Think of it like a tank tread.

Page 17: The InnoDB Storage Engine for MySQL

Log files (cont.)

• Are an optimization! • 512B aligned sequential writes. • Tablespace writes are 16KiB random writes.

• Tablespace writes to same pages in close time window can be merged. • Just need a large enough log file.

Page 18: The InnoDB Storage Engine for MySQL

Checkpoint (Background Activity)

InnoDB

Transa

ctio

nSystem

Storage

Caching

SYS_TABLES

ibda

ta1

spac

e 0

Page Cache

A.ibd

B.ibd

C.ibd

IBUF_HEADERIBUF_TREETRX_SYS

FIRST_RSEGDICT_HDR

Data Dict.

SYS_COLUMNSSYS_INDEXESSYS_FIELDS

Block 1 (64 pages)Block 2 (64 pages)

iblogfile0 iblogfile1 iblogfile2

Tables withfile_per_tableDoublewrite Buffer

Buffe

r Poo

l Data Dictionary Cache

Adaptive Hash Indexes

Buffer Pool LRU

Additional Mem Pool

Log Buffer

Log

Gro

upBuffer Pool Flush List

(nothing) mysqld

Page 19: The InnoDB Storage Engine for MySQL

• Q: What do we write to the log - is it committed data only, or can we write uncommitted data as well?

• A: Both.

FAQ

Page 20: The InnoDB Storage Engine for MySQL

• Q: How do you unapply transactions? • A: UNDO space.

Think of it like a hidden table internally stored in ibdata1.

FAQ

Page 21: The InnoDB Storage Engine for MySQL

Update Query (More Accurate*)

InnoDB

Transa

ctio

nSystem

Storage

Caching

SYS_TABLES

ibda

ta1

spac

e 0

Page Cache

A.ibd

B.ibd

C.ibd

IBUF_HEADERIBUF_TREETRX_SYS

FIRST_RSEGDICT_HDR

Data Dict.

SYS_COLUMNSSYS_INDEXESSYS_FIELDS

Block 1 (64 pages)Block 2 (64 pages)

iblogfile0 iblogfile1 iblogfile2

Tables withfile_per_tableDoublewrite Buffer

Buffe

r Poo

l Data Dictionary Cache

Adaptive Hash Indexes

Buffer Pool LRU

Additional Mem Pool

Log Buffer

Log

Gro

upBuffer Pool Flush List

UPDATE a SET col1 = ‘new’ WHERE id = 10;

mysqld

commit;Page Cache

Update any indexes to hold both versions.Modify row in place

Redirect older version of row to undo space

Page 22: The InnoDB Storage Engine for MySQL

• Background purge process is able to clean old rows from UNDO as soon as oldest transaction advances forward.

Update Query (cont.)

Page 23: The InnoDB Storage Engine for MySQL

Summarized Performance Characteristics

• Log Files: • Are short sequential writes. • They permit InnoDB to delay tablespace writes -

enabling more merging/optimization. • Buffer Pool:

• “In memory version of the tablespace”. • Loading/unloading via modified LRU algorithm.

Page 24: The InnoDB Storage Engine for MySQL

• Indexes and “data” in InnoDB are B+Trees. • Clustered Index design means that data itself is

stored in an index.

Index Structure

Page 25: The InnoDB Storage Engine for MySQL

Index Structure (cont.)

Infimum Supremum

Level 0Root

NextRecord

Page 3

Empty root

Page 26: The InnoDB Storage Engine for MySQL

Index Structure (cont.)

Infimum Supremum

1B*

Level 0Root

Page 3

Insert: 1

Page size is 16KBB* is 2KB

Page 27: The InnoDB Storage Engine for MySQL

Index Structure (cont.)

Level 0Root

I S

1B*

Page 3

2B*

3B*

4B*

5B*

6B*

7B*

Insert: 1 to 7

Page 28: The InnoDB Storage Engine for MySQL

Index Structure (cont.)

Infimum Supremum

14

Level 1Root

Page 3

Level 0Leaf

I S

1B*

Page 4

2B*

3B*

4B*

5B*

6B*

7B*

Insert: 8

Allocate new page and link in rootMove records to new pageSplit new page

Page 29: The InnoDB Storage Engine for MySQL

Index Structure (cont.)

Infimum Supremum

14

Level 1Root

Page 3

Level 0Leaf

45

I S

4B*

Page 5

5B*

6B*

7B*

I S

1B*

Page 4

2B*

3B*

Insert: 8 (Cont.)

Split at the middle of original page

Page 30: The InnoDB Storage Engine for MySQL

Index Structure (cont.)

Infimum Supremum

14

Level 1Root

Page 3

Level 0Leaf

45

I S

4B*

Page 5

5B*

6B*

10B*

I S

1B*

Page 4

2B*

3B*

7B*

8B*

9B*

Insert: 9 and 10

Page 31: The InnoDB Storage Engine for MySQL

Index Structure (cont.)

Level 1Root

Level 0Leaf

I S

1B*

Page 4

2B*

3B*

I S

4B*

Page 5

5B*

6B*

10B*

7B*

8B*

9B*

Infimum Supremum

14

Page 3

116

45

I S

11B*

Page 6

Insert: 11

Insert leads to a split at the insertion point

Page 32: The InnoDB Storage Engine for MySQL

Index Structure

Infimum Supremum

≥0→4

≥4→5

I S≥2→7

≥0→6

I S≥6→9

≥4→8

I S

1B

0A

I S

3D

2C

I S

5F

4E

I S

7H

6GL

eve

l 0Leaf

Leve

l 1In

tern

al

Leve

l 2R

oo

t

Next Page

Prev Page

NextRecord

Page 3

Page 4 Page 5

Page 6 Page 7 Page 8 Page 9

Page 33: The InnoDB Storage Engine for MySQL

Page Format

FIL Header (38)

FIL Trailer (8)

0

38

16384

16376

Other headers and page data,depending on page type.

Total usable space: 16,338 bytes.

Page 34: The InnoDB Storage Engine for MySQL

Row Format

N-2

Info Flags (4 bits)Number of Records Owned (4 bits)

Order (13 bits)Record Type (3 bits)

Next Record Offset (2)Cluster Key Fields (k)

N

N+k

Transaction ID (6)Roll Pointer (7)

Non-Key Fields (j)

N-4

N-5Variable field lengths (1-2 bytes per var. field)

N+k+6

N+k+13

N+k+13+j

Page 35: The InnoDB Storage Engine for MySQL

• Page is basic unit of storage. • Default is 16KiB • Rows of variable length.

Conclusion

Page 36: The InnoDB Storage Engine for MySQL

• Adaptive hash - Partial hash index to accelerate secondary key lookups.

• Change buffering - when non-unique indexes are not in memory, changes can be temporarily buffered until they are.

Two more useful features

Page 37: The InnoDB Storage Engine for MySQL

Query (by secondary key)

InnoDB

Transa

ctio

nSystem

Storage

Caching

SYS_TABLES

ibda

ta1

spac

e 0

Page Cache

A.ibd

B.ibd

C.ibd

IBUF_HEADERIBUF_TREETRX_SYS

FIRST_RSEGDICT_HDR

Data Dict.

SYS_COLUMNSSYS_INDEXESSYS_FIELDS

Block 1 (64 pages)Block 2 (64 pages)

iblogfile0 iblogfile1 iblogfile2

Tables withfile_per_tableDoublewrite Buffer

Buffe

r Poo

l Data Dictionary Cache

Adaptive Hash Indexes

Buffer Pool LRU

Additional Mem Pool

Log Buffer

Log

Gro

upBuffer Pool Flush List

SELECT * FROM a WHERE b_key = 10;

mysqld

Page 38: The InnoDB Storage Engine for MySQL

Update Query (large table)

InnoDB

Transa

ctio

nSystem

Storage

Caching

SYS_TABLES

ibda

ta1

spac

e 0

Page Cache

A.ibd

B.ibd

C.ibd

IBUF_HEADERIBUF_TREETRX_SYS

FIRST_RSEGDICT_HDR

Data Dict.

SYS_COLUMNSSYS_INDEXESSYS_FIELDS

Block 1 (64 pages)Block 2 (64 pages)

iblogfile0 iblogfile1 iblogfile2

Tables withfile_per_tableDoublewrite Buffer

Buffe

r Poo

l Data Dictionary Cache

Adaptive Hash Indexes

Buffer Pool LRU

Additional Mem Pool

Log Buffer

Log

Gro

upBuffer Pool Flush List

UPDATE a SET col1 = ‘new’ WHERE id = 10;

mysqld

commit;

Not Required

Page 39: The InnoDB Storage Engine for MySQL

Copyright  ©  2012  Oracle  and/or  its  affiliates.  All  rights  reserved.

New  Features

Page 40: The InnoDB Storage Engine for MySQL

• IO Scalability • Async IO • Multiple Buffer Pools • Adaptive Flushing • Scan Resistant LRU • Compressed Pages • CPU Scalability • Improved Atomics • Spin Loops with PAUSE

MySQL 5.5+

Page 41: The InnoDB Storage Engine for MySQL

• LRU Dump and Restore

• Improved Group Commit

• Fulltext Search • Fast Read-only

Transactions • Memcached Interface • Information Schema

metadata tables

• Persistent Statistics • Variable Page Size • Online DDL • Transportable

Tablespace • Transactional

ReplicationUsing InnoDB

MySQL 5.6+

Page 42: The InnoDB Storage Engine for MySQL

• Faster Temporary Tables • Index Lock Contention Reduction • More Online DDL

• Extend VARCHAR • Rename Index

• Improved Read-Only Transactions • Improved CPU Scalability

MySQL 5.7+

Page 43: The InnoDB Storage Engine for MySQL

Copyright  ©  2012  Oracle  and/or  its  affiliates.  All  rights  reserved.

Configuration

Page 44: The InnoDB Storage Engine for MySQL

1. innodb-buffer-pool-size 2. innodb-log-file-size 3. innodb_flush_log_at_trx_commit

The Top 3

Page 45: The InnoDB Storage Engine for MySQL

• Really only one major buffer/cache settings to set. • Responsible for all pages types (data, indexes, undo,

insert buffer..)

innodb-buffer-pool-size

Page 46: The InnoDB Storage Engine for MySQL

• Recommendation is 50-80% of RAM. • Default is 128M of RAM. • Please allow 5-10% on top for other meta data to

grow.

innodb-buffer-pool-size (cont.)

Page 47: The InnoDB Storage Engine for MySQL

• Log files are on disk, but this contributes to how many unflushed (dirty) pages you can hold in memory.

• In theory larger log files = longer crash recovery. • In MySQL 5.5 -2G max. • In MySQL 5.6 - 4G is usually safe. • Early versions should be much smaller.

• Default is 48M Log Files.

innodb-log-file-size

Page 48: The InnoDB Storage Engine for MySQL

• Default is full ACID Compliance (=1) • Can be set to 0/2 if you do not mind some data loss.

innodb_flush_log_at_trx_commit

Page 49: The InnoDB Storage Engine for MySQL

• 0 = Log buffer written + synced once per second. Nothing done at commit.

• 1 = Log buffer written + synced once per second + written and synced on commit.

• 2 = Log buffer written + synced once per second + written (not synced) on commit.

!2 is a slightly safer version of 0.

innodb_flush_log_at_trx_commit

Page 50: The InnoDB Storage Engine for MySQL

Basic Configuration

InnoDBTransa

ctio

nSystem

Storage

Caching

SYS_TABLES

ibda

ta1

spac

e 0

Page Cache

A.ibd

B.ibd

C.ibd

IBUF_HEADERIBUF_TREETRX_SYS

FIRST_RSEGDICT_HDR

Data Dict.

SYS_COLUMNSSYS_INDEXESSYS_FIELDS

Block 1 (64 pages)Block 2 (64 pages)

iblogfile0 iblogfile1 iblogfile2

Tables withfile_per_tableDoublewrite Buffer

Buffe

r Poo

l Data Dictionary Cache

Adaptive Hash Indexes

Buffer Pool LRU

Additional Mem Pool

Log BufferLo

g G

roup

Buffer Pool Flush List

Innodb_buffer_pool_size. Recommendation is 50-80% RAM.

Requires about 5-10% of buffer pool size as overhead (not directly configurable).

innodb_log_buffer_size. Typical values 1-8M. Flushing behaviour influenced by innodb_flush_log_at_trx_commit.

innodb_log_file_size. Typical values 256M+. Default of 2 files (innodb_log_files_in_group).

innodb_file_per_table (Default: ON in 5.6+)

Page 51: The InnoDB Storage Engine for MySQL

1. innodb-buffer-pool-size 2. innodb-log-file-size 3. innodb-log-buffer-size 4. innodb_flush_log_at_trx_commit 5. innodb_flush_method 6. innodb_flush_neighbors 7. innodb_io_capacity, innodb_io_capacity_max,

innodb_lru_scan_depth 8. innodb-buffer-pool-instances 9. innodb_read_io_threads and innodb_write_io_threads

The ~Top 10

Page 52: The InnoDB Storage Engine for MySQL

• innodb_thread_concurrency • innodb_concurrency_tickets • innodb_max_pct_dirty_pages • innodb_use_native_aio (always on) • innodb_old_blocks_time (5.6 default: 1000)

Less-likely to need configuration

Page 53: The InnoDB Storage Engine for MySQL

• Typically “remove on sight” from config files: • innodb_additional_mempool_size • innodb_use_sys_malloc

Deprecated Settings

Page 54: The InnoDB Storage Engine for MySQL

Credits

• InnoDB Architecture Diagrams via https://github.com/jeremycole/innodb_diagrams

• Available under (3-clause) BSD licenseCopyright (c) 2013, Twitter, Inc.Copyright (c) 2013, Jeremy Cole <[email protected]>Copyright (c) 2013, Davi Arnaut <[email protected]>

Page 55: The InnoDB Storage Engine for MySQL