25

Наш ответ Uber’у

Embed Size (px)

Citation preview

Page 1: Наш ответ Uber’у
Page 2: Наш ответ Uber’у

Russian developers of PostgreSQL:Alexander Korotkov, Teodor Sigaev, Oleg Bartunov

! Speakers at PGCon, PGConf: 20+ talks! GSoC mentors! PostgreSQL commi ers (1+1 in progress)! Conference organizers! 50+ years of PostgreSQL expertship:

development, audit, consul ng! Postgres Professional co-founders

PostgreSQL CORE! Locale support! PostgreSQL extendability:

GiST(KNN), GIN, SP-GiST! Full Text Search (FTS)! NoSQL (hstore, jsonb)! Indexed regexp search! Create AM & Generic WAL! Table engines (WIP)

Extensions! intarray! pg_trgm! ltree! hstore! plantuner! jsquery! RUM

Alexander Korotkov Our answer to Uber 2 / 25

Page 3: Наш ответ Uber’у

Disclaimer

! I’m NOT a MySQL expert. I didn’t even touch MySQLsince 2011...

! This talk express my own opinion, not PostgreSQLcommunity posi on, not even Postgres Professionalofficial posi on.

! Uber’s guys knows be er which database theyshould use.

Alexander Korotkov Our answer to Uber 3 / 25

Page 4: Наш ответ Uber’у

Why did it happen?

! Uber migrated from MySQL to PostgreSQL for “abunch of reasons, but one of the most importantwas availability of PostGIS” ¹

! Uber migrated from PostgreSQL to MySQL “some ofthe drawbacks they found with Postgres” ²

¹https://www.yumpu.com/en/document/view/53683323/migrating-uber-from-mysql-to-postgresql

²https://eng.uber.com/mysql-migration/Alexander Korotkov Our answer to Uber 4 / 25

Page 5: Наш ответ Uber’у

Uber’s complaints to PostgreSQL

Uber claims following “PostgreSQL limita ons”:! Inefficient architecture for writes! Inefficient data replica on! Issues with table corrup on! Poor replica MVCC support! Difficulty upgrading to newer releases

Alexander Korotkov Our answer to Uber 5 / 25

Page 6: Наш ответ Uber’у

PostgreSQL’s vs. InnoDB’s storage formats

Alexander Korotkov Our answer to Uber 6 / 25

Page 7: Наш ответ Uber’у

PostgreSQL storage format

! Both primary and secondary indexes point toloca on (blkno, offset) of tuple (row version) in theheap.

! When tuple is moved to another loca on, allcorresponding index tuples should be inserted to theindexes.

! Heap contains both live and dead tuples.! VACUUM cleans up dead tuples and correspondingindex tuples in a bulk manner.

Alexander Korotkov Our answer to Uber 7 / 25

Page 8: Наш ответ Uber’у

Update in PostgreSQL

! New tuple is inserted to the heap, previous tuple ismarked as deleted.

! Index tuples poin ng to new tuple are inserted to allindexes.

Alexander Korotkov Our answer to Uber 8 / 25

Page 9: Наш ответ Uber’у

Heap-Only-Tuple (HOT) PostgreSQL

! When no indexed columns are updated and newversion of row can fit the same page, then HOT isused and only heap is updated.

! Microvacuum can be used to free required space inthe page for HOT.Alexander Korotkov Our answer to Uber 9 / 25

Page 10: Наш ответ Uber’у

Update in MySQL

! Table rows are placed in the primary index itself. Updates areperformed in-place. Old version of rows are placed to specialsegment (undo log).

! When secondary indexed column is updated, then new indextuple is inserted while previous index tuple is marked asdeleted.Alexander Korotkov Our answer to Uber 10 / 25

Page 11: Наш ответ Uber’у

Updates: InnoDB in comparison withPostgreSQL

Pro:! Update of few of indexed columns is cheaper.! Update, which don’t touch indexed columns, doesn’tdepend on page free space in the page

Cons:! Update of majority of indexed columns is moreexpensive.

! Primary key update is disaster.

Alexander Korotkov Our answer to Uber 11 / 25

Page 12: Наш ответ Uber’у

Pending patches: WARM(write-amplifica on reduc on method)

! Behaves like HOT, but works also when some ofindex columns are updated.

! New index tuples are inserted only for updatedindex columns.

https://www.postgresql.org/message-id/flat/20170110192442.

ocws4pu5wjxcf45b%40alvherre.pgsqlAlexander Korotkov Our answer to Uber 12 / 25

Page 13: Наш ответ Uber’у

Pending patches: indirect indexes

! Indirect indexes are indexes which points to primary key valueinstead of pointer to heap.

! Indirect index is not updates un l corresponding column isupdated.

https://www.postgresql.org/message-id/20161018182843.xczrxsa2yd47pnru@

alvherre.pgsqlAlexander Korotkov Our answer to Uber 13 / 25

Page 14: Наш ответ Uber’у

Ideas: RDS (recently dead store)

! Recently dead tuples (deleted but visible for sometransac ons) are displaced into special storage: RDS.

! Heap tuple headers are le in the heap.

Alexander Korotkov Our answer to Uber 14 / 25

Page 15: Наш ответ Uber’у

Idea: undo log

! Displace old version of rows to undo log.! New index tuples are inserted only for updated index columns.

Old index tuples are marked as expired.! Move row to another page if new version doesn’t fit the page.

https://www.postgresql.org/message-id/flat/CA%2BTgmoZS4_

CvkaseW8dUcXwJuZmPhdcGBoE_GNZXWWn6xgKh9A%40mail.gmail.comAlexander Korotkov Our answer to Uber 15 / 25

Page 16: Наш ответ Uber’у

Idea: pluggable table engines

Owns! Ways to scan and modify tables.! Access methods implementa ons.

Shares! Transac ons, snapshots.! WAL.

https://www.pgcon.org/2016/schedule/events/920.en.htmlAlexander Korotkov Our answer to Uber 16 / 25

Page 17: Наш ответ Uber’у

Types of replica on

! Statement-level – stream wri ng queries to theslave. IMHO, never worked, broken by design.

! Row-level – stream updated rows to the slave.! Block-level – stream blocks and/or block deltas tothe slave.

Alexander Korotkov Our answer to Uber 17 / 25

Page 18: Наш ответ Uber’у

Replica on

Replica on Type MySQL PostgreSQLStatement-level buil n pgPool-IIRow-level buil n pgLogical

LondisteSlony ...

Block-level N/A buil n

Alexander Korotkov Our answer to Uber 18 / 25

Page 19: Наш ответ Uber’у

Uber’s replica on comparison

! Uber compares MySQL replica on versus PostgreSQL replica on.! Actually, Uber compares MySQL row-level replica on versus

PostgreSQL block-level replica on.! Thus, Uber compares not two implementa ons of the same design,

but two completely different designs.

Alexander Korotkov Our answer to Uber 19 / 25

Page 20: Наш ответ Uber’у

Uber’s complaints to PostgreSQL block-levelreplica on

! Replica on stream transfers all the changes at block-level including“write-amplifica on”. Thus, it requires very high-bandwidth channel.In turn, that makes geo-distributed replica on harder.

! There are MVCC limita ons for read-only requires on replica. Applyof VACUUM changes conflicts with read-only queries which could seethe data VACUUM is going to delete.

Alexander Korotkov Our answer to Uber 20 / 25

Page 21: Наш ответ Uber’у

Relica read-only query MVCC conflict withVACUUM

Possible op ons:! Delay the replica on,! Cancel read-only query on replica,! Provide a feedback to master about row versionswhich could be demanded.Alexander Korotkov Our answer to Uber 21 / 25

Page 22: Наш ответ Uber’у

Is row-level replica on superior overblock-level replica on?

Alibaba works on adding block-level replica on to InnoDB. Zhai Weixiang,database developer from Alibaba considers following advantages ofblock-level replica on: ³

! Be er performance: higher throughput and lower response me! Write less data (turn off binary log and g d), and only one fsync to

make transac on durable! Less recovery me

! Replica on! Less replica on latency! Ensure data consistency (most important for some sensi ve clients)

³https://www.percona.com/live/data-performance-conference-2016/sessions/physical-replication-based-innodb

Alexander Korotkov Our answer to Uber 22 / 25

Page 23: Наш ответ Uber’у

About replica on and write-amplifica on

MySQL row-level replica on

PostgreSQL row-level replica on (pgLogical)

Alexander Korotkov Our answer to Uber 23 / 25

Page 24: Наш ответ Uber’у

Other Uber notes

! PostgreSQL 9.2 had data corrup on bug. It was fixed long me ago.Since that me PostgreSQL automated tests system was significantlyimproved to evade such bugs in future.

! pread is faster than seek + read. Thats really gives 1.5% accelera onon read-only benchmark. ⁴

! PostgreSQL advises to setup rela vely small shared_buffers and relyon OS cache, while “InnoDB storage engine implements its own LRUin something it calls the InnoDB buffer pool”. PostgreSQL alsoimplements its own LRU in something it calls the shared buffers. Andyou can setup any shared buffers size.

! PostgreSQL uses mul process model. So, connec on is moreexpensive since unless you use pgBouncer or other externalconnec on pool.

⁴https://www.postgresql.org/message-id/flat/a86bd200-ebbe-d829-e3ca-0c4474b2fcb7%40ohmu.fi

Alexander Korotkov Our answer to Uber 24 / 25

Page 25: Наш ответ Uber’у

Thank you for a en on!

Alexander Korotkov Our answer to Uber 25 / 25