How Booking avoids and deals with replication lag (and how you can too) Eric Herman (Principal Developer) [email protected]



In this talk
1. Booking.com
2. MySQL/MariaDB replication (@Booking.com)
3. Replication lag: what/how/why
4. Bad solutions to cope with lag
5. A Booking.com solution to cope with lag
6. Improving on our current solution
7. Take home
8. Links, questions, and closing


Booking.com at a glance
● Started in 1996; still based in Amsterdam
● Member of the Priceline Group since 2005 (stock: PCLN)
● Amazing growth; continuous scaling challenges
● Online Hotel/Accommodation/Travel Agent (OTA):
  ● Over 1.2 million active properties in 227 countries
  ● Over 1.2 million room nights reserved daily
  ● 40+ languages (website and customer service)
  ● Over 13,000 people working in 187 offices in 70 countries
● We use a lot of MySQL and MariaDB:
  ● Thousands of servers, ~90% replicating
  ● More than 150 masters: ~30 with more than 50 slaves each, and ~10 with more than 100 slaves each


Reminder on replication
● Typically one read/write master with one or more read-only slaves
● The master records all its writes in a journal: the binary logs
● Each slave:
  ● Downloads the journal and saves it locally (IO thread): relay logs
  ● Executes the relay logs on the local copy of the database (SQL thread)
  ● May produce binary logs to be itself a master (log-slave-updates)
  ● Intermediate binlogs have different file names and positions from the original binlogs
● Replication is:
  ● Asynchronous, thus lag
  ● Slave replay is typically single-threaded, or at least less parallel than on the master
  ● Overall execution may be slower on slaves than on the master


MySQL replication at Booking.com
● Typical Booking.com MySQL replication deployment:

         +-----+
         | PM  |
         +-----+
            |
       +----+----+-------- ... ------+
       |         |                   |
    +-----+   +-----+             +-----+
    | IM1 |   | IM2 |             | IMn |
    +-----+   +-----+             +-----+
       |                             |
   +---+---+--- ... ---+         +---+---+--- ...
   |       |           |         |       |
+-----+ +-----+     +-----+   +-----+ +-----+
| SA1 | | SA2 |     | SAm |   | SX1 | | SX2 |
+-----+ +-----+     +-----+   +-----+ +-----+


Replication Management
● Lots of home-grown tooling for monitoring, alerting, management
● We are using and contributing to Orchestrator:


Orchestrator’s lag visualization


Extreme lag


Why does lag happen?
● Under which conditions can lag be experienced?
  ● Too many transactions for replication to keep up
    ● capacity problem
    ● fix by scaling (splitting, sharding, parallel replication, …)
  ● Very large and long transactions or DDL (self-induced)
    ● fix by a developer in the application
    ● on-line schema change
  ● Overly aggressive “batch” workload on the master
    ● optimize the batch sizes
    ● slow down, consider a “backpressure” mechanism


Lag consequences (1)
● What really are the consequences of lag?
  ● Yes, stale reads on slaves, but this is not necessarily a problem


Lag consequences (2)
● What really are the consequences of lag?
  ● Yes, stale reads on slaves, but this is not necessarily a problem
    ● e.g.: A product description is updated, we see the old value for two minutes
  ● Examples of stale read problems
    ● A user changes their email address but still sees the old one
    ● A hotel changes its inventory but still sees old availability
    ● A user books a hotel but does not see it in their reservations


Bad solution #1 to cope with lag
● Bad solution #1: falling back to reading from the master
  ● If slaves are lagging, maybe we should read from the master
  ● Maybe this looks like an attractive solution to avoid stale reads
  ● But this does not scale
    ● Consider why we are reading from slaves in the first place
    ● May cause a sudden load on the master (in case of lag)
    ● And it might cause an outage on the master (yikes!)
  ● It might be better to fail a read than to fall back to (and kill) the master
  ● Reading from the master might be okay in very specific cases


Bad solution #2 to cope with lag
● Bad solution #2: retry on another slave
  ● When reading from a slave: if lag, then retry on another slave
  ● Scales “better” and is maybe okay-ish
    ● if only ever a few slaves are lagging, never many
  ● But what happens if all slaves are lagging?
    ● Increased load (retries) can slow down replication
    ● This might overload the slaves and cause a good slave to start lagging
    ● In the worst case, this might kill slaves and cause a domino effect
  ● Again: probably better to fail a read than to cause a bigger problem
  ● Truly, I do not know of a special case where this tactic makes sense.
  ● Consider separate “interactive” and “reporting” pools of slaves instead
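
One way to read the advice on the two "bad solution" slides is as a routing policy: each workload gets its own pool of slaves, and a read that finds its slave lagging fails instead of falling back to the master or retrying elsewhere. The sketch below is only an illustration of that idea. The pool layout, the host names, the `lag_of` callback, and `StaleReadError` are all hypothetical and do not come from the talk.

```python
import random


class StaleReadError(Exception):
    """Raised when the chosen slave is too far behind to serve the read."""


def pick_slave(pools, workload, lag_of, max_lag_seconds):
    """Pick one slave from the workload's own pool, or fail fast if it lags.

    pools:   dict mapping a workload name to a list of slave host names
    lag_of:  callable returning the current lag of a host, in seconds
    """
    host = random.choice(pools[workload])
    lag = lag_of(host)
    if lag > max_lag_seconds:
        # Deliberately no fallback to the master and no retry on another
        # slave: both can turn a lag problem into an outage.
        raise StaleReadError(f"{host} lags by {lag}s (limit {max_lag_seconds}s)")
    return host
```

A caller would catch `StaleReadError` and decide what to show the user; how lag is measured is left to `lag_of` because the talk does not prescribe it here.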


Some tools already cope with lag
● Percona’s pt-online-schema-change
  ● small chunks
  ● slows down when lag is present
● GitHub’s gh-ost
  ● built-in heartbeat mechanism which it utilizes to examine replication lag
● Many others, of course.


Coping with lag @ Booking.com (1)
● A Booking.com solution: “waypoint”
  ● Place a “waypoint” marker in the replication stream
  ● Waiting for a waypoint is similar to waiting for a slave to catch up
  ● Presence of that marker on the slave tells us the slave is caught up
    ● Maybe there is still lag, but we know we are caught up enough
  ● Creating a waypoint is similar to creating a “read view”


Coping with lag @ Booking.com (2)
● Booking.com waypoint implementation:
  ● Table: db_waypoint (a single waypoint is a row in that table)
  ● API function: commit_wait(timeout) → (err_code, waypoint)
    ● INSERTs a waypoint, then executes the COMMIT
    ● polls – until timeout – for row arrival on a slave
    ● Does this look a bit similar to semi-sync?
  ● API function: waypoint_wait(waypoint, timeout) → err_code
    ● Waits for a waypoint – until timeout – on a slave
    ● This is waiting for a slave to catch up enough
  ● Garbage collection: cleanup job that DELETEs old waypoints
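
The two API functions and the cleanup job described above could be sketched roughly as follows. This is not Booking.com's implementation. The table columns, the use of a UUID as the waypoint, the error-code values, and the extra `master`/`slave` connection arguments are assumptions made for illustration, and `sqlite3` stands in for MySQL only so the sketch runs without a server.

```python
import sqlite3
import time
import uuid

# Hypothetical schema; the talk only says a waypoint is a row in db_waypoint.
SCHEMA = "CREATE TABLE IF NOT EXISTS db_waypoint (id TEXT PRIMARY KEY, created REAL)"

OK, TIMED_OUT = 0, 1  # illustrative err_code values


def waypoint_wait(slave, waypoint, timeout, poll_interval=0.01):
    """Poll the slave until the waypoint row arrives or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        row = slave.execute(
            "SELECT 1 FROM db_waypoint WHERE id = ?", (waypoint,)
        ).fetchone()
        if row is not None:
            return OK
        if time.monotonic() >= deadline:
            return TIMED_OUT
        time.sleep(poll_interval)


def commit_wait(master, slave, timeout):
    """INSERT a waypoint into the open transaction, COMMIT, then wait for it."""
    waypoint = uuid.uuid4().hex
    master.execute(
        "INSERT INTO db_waypoint (id, created) VALUES (?, ?)",
        (waypoint, time.time()),
    )
    master.commit()
    return waypoint_wait(slave, waypoint, timeout), waypoint


def purge_waypoints(master, max_age_seconds):
    """Garbage collection: DELETE waypoints older than max_age_seconds."""
    cutoff = time.time() - max_age_seconds
    cursor = master.execute("DELETE FROM db_waypoint WHERE created < ?", (cutoff,))
    master.commit()
    return cursor.rowcount
```

With a timeout of zero, `commit_wait` checks the slave once and returns immediately, which is the variant the read-after-write use case relies on.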


Coping with lag @ Booking.com (3)
● Booking.com waypoint use-cases
  ● Throttling batches
    ● use commit_wait with a high timeout (backpressure)
    ● use “small” transactions (chunks of 100 to 1000 rows)
    ● sometimes add some extra sleep() between chunks
  ● Protect from stale reads after writing
    ● store changed data in session
    ● commit_wait with zero timeout
    ● store the waypoint in web session
    ● and waypoint_wait when reading
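
Both use cases can be sketched on top of the two API functions, passed in here as plain callables with the signatures shown on the previous slide. Everything else is an assumption for illustration: the helper names, the default chunk size and timeouts, the dict used as a web session, and the choice to merely count timeouts rather than abort the batch.

```python
import time
from itertools import islice


def chunked(rows, size):
    """Yield lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def run_batch(rows, apply_chunk, commit_wait, chunk_size=500,
              timeout=60.0, extra_sleep=0.0):
    """Throttled batch: small transactions, each followed by commit_wait."""
    timed_out = 0
    for chunk in chunked(rows, chunk_size):
        apply_chunk(chunk)                        # writes, not yet committed
        err_code, _waypoint = commit_wait(timeout)  # COMMIT + wait = backpressure
        if err_code != 0:
            timed_out += 1                        # policy choice: count and go on
        if extra_sleep:
            time.sleep(extra_sleep)
    return timed_out


def after_write(session, commit_wait):
    """Commit with a zero timeout and keep the waypoint in the web session."""
    _err_code, waypoint = commit_wait(0)
    session["waypoint"] = waypoint


def before_read(session, waypoint_wait, timeout=1.0):
    """True if the slave is known to contain this session's last write."""
    waypoint = session.get("waypoint")
    if waypoint is None:
        return True
    return waypoint_wait(waypoint, timeout) == 0
```

When `before_read` returns False, the changed data stored in the session (the first bullet of that use case) is one thing the application could show instead of the stale row.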


Improving Booking.com waypoints
● The waypoint design and implementation still suit us.
● Although, sometimes we have a “fast slave” problem:
  ● Throttling batches on a fast slave is sub-optimal
  ● It would be easy-ish to fix: “find the slowest (or any slow) slave”
  ● But for us, this does not arise very often in practice
● Yet, starting from scratch, we might do things differently:
  ● Inserting, deleting, and purging waypoints could be simplified
  ● And we could get rid of the waypoint table


GTID waypoints (1)
● Global Transaction IDs as waypoint
  ● Get the GTID of the last transaction
    ● last_gtid session variable in MariaDB Server

From https://mariadb.com/kb/en/mariadb/master_gtid_wait/:

MASTER_GTID_WAIT() can also be used in client applications together with the last_gtid session variable. This is useful in a read-scaleout replication setup, where the application writes to a single master but divides the reads out to a number of slaves to distribute the load. In such a setup, there is a risk that an application could first do an update on the master, and then a bit later do a read on a slave, and if the slave is not fast enough, the data read from the slave might not include the update just made, possibly confusing the application and/or the end-user. One way to avoid this is to request the value of last_gtid on the master just after the update. Then before doing the read on the slave, do a MASTER_GTID_WAIT() on the value obtained from the master; this will ensure that the read is not performed until the slave has replicated sufficiently far for the update to have become visible.


GTID waypoints (2)
● Global Transaction IDs as waypoint:
  ● Get the GTID of the last transaction
    ● last_gtid session variable in MariaDB Server
    ● gtid_executed global variable in Oracle MySQL (get all executed GTIDs)
    ● the last GTID can also be requested in the OK packet (only Oracle MySQL) (session_track_gtids variable and mysql_session_track_get_{first,next} API functions)
  ● Waiting for GTID:
    ● MASTER_GTID_WAIT in MariaDB Server
    ● WAIT_FOR_EXECUTED_GTID_SET in Oracle MySQL
  ● Not portable
    ● replicating from MySQL to MariaDB or vice-versa
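
The portability problem shows up as soon as the two flavors are put side by side. The sketch below wraps the variables and functions named on this slide behind two helpers. The helper names and the flavor keys are hypothetical, and the `%s` placeholders assume a DB-API driver that uses the "format" paramstyle.

```python
# SQL per server flavor, using the variables and functions named on the slide.
LAST_GTID_SQL = {
    "mariadb": "SELECT @@last_gtid",           # last GTID of this session
    "mysql": "SELECT @@GLOBAL.gtid_executed",  # all executed GTIDs (a superset)
}
GTID_WAIT_SQL = {
    "mariadb": "SELECT MASTER_GTID_WAIT(%s, %s)",
    "mysql": "SELECT WAIT_FOR_EXECUTED_GTID_SET(%s, %s)",
}


def read_waypoint(master_cursor, flavor):
    """Run on the master right after COMMIT; returns the GTID waypoint."""
    master_cursor.execute(LAST_GTID_SQL[flavor])
    return master_cursor.fetchone()[0]


def wait_for_waypoint(slave_cursor, flavor, gtid, timeout_seconds):
    """Run on a slave before reading; True if the slave reached the waypoint."""
    slave_cursor.execute(GTID_WAIT_SQL[flavor], (gtid, timeout_seconds))
    # Both functions return 0 once the GTID has been applied; on timeout
    # MariaDB returns -1 and MySQL returns 1.
    return slave_cursor.fetchone()[0] == 0
```

Note that on Oracle MySQL this waits for the whole executed set rather than a single transaction, which is safe but may wait for more than the session's own write.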


BinLog waypoints
● Binary log file and position as waypoint:
  ● MASTER_POS_WAIT
  ● However, this breaks when using intermediate masters
  ● But it is OK with Binlog Servers[1]
    ● digress for a moment: what is a Binlog Server?
    ● with a binlog server, the binlog file and position is a GTID
  ● But currently no way of getting file and position after committing

[1]: https://blog.booking.com/abstracting_binlog_servers_and_mysql_master_promotion_wo_reconfiguring_slaves.html
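
Assuming a binlog coordinate has been obtained somehow (the slide points out that there is currently no way to get it for the transaction just committed), waiting for it on a slave could look like the sketch below. The helper names are hypothetical, the `%s` placeholders assume a DB-API driver with the "format" paramstyle, and the coordinate is only meaningful on slaves that see the same file names and positions as the server that produced it.

```python
def binlog_key(log_file, log_pos):
    """Sortable key for a binlog coordinate such as ("binlog.000042", 1337)."""
    sequence = int(log_file.rsplit(".", 1)[1])  # numeric suffix of the file name
    return (sequence, log_pos)


def pos_wait(slave_cursor, log_file, log_pos, timeout_seconds):
    """Block on a slave until it has applied the given master coordinate."""
    slave_cursor.execute(
        "SELECT MASTER_POS_WAIT(%s, %s, %s)",
        (log_file, log_pos, timeout_seconds),
    )
    result = slave_cursor.fetchone()[0]
    # NULL: the SQL thread is not running (or another error); -1: timeout;
    # otherwise the number of events the slave had to wait for.
    return result is not None and result >= 0
```

`binlog_key` is included because a (file, position) pair only orders correctly when the file sequence number is compared before the position.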


Lag info on the master?
● Information about slave lag on the master would be great
  ● Potentially interesting for monitoring and alerting
  ● Enables a better solution for throttling
    ● Connecting to the right slave is a challenge
● A plugin exists for something close: semi-sync
  ● Using this to track transaction execution on slaves?
  ● Is this the No-Slave-Left-Behind MariaDB Server patch?


No-Slave-Left-Behind
● No-Slave-Left-Behind MariaDB Server patch[1]
  ● Thanks Jonas Oreland and Google
  ● The semi-sync reply also reports SQL-thread position
  ● Transactions are kept in the master plugin until slaves execute
    ● Slave lag can be approximated?
● Maybe this could easily be modified to implement commit_wait
  ● wait until lag is acceptable
  ● without connecting to any slave

[1]: https://jira.mariadb.org/browse/MDEV-8112


Feature requests
● Bug#84747: Expose last transaction GTID in a session variable
● Bug#84748: Request transaction GTID in OK packet on COMMIT (without needing a round-trip)
● MDEV-11956: Get last_gtid in OK packet
● Bug#84779: Expose binlog file and position of last transaction
● MDEV-11970: Expose binlog file and position of last transaction
● MDEV-8112: No Slave Left Behind


What about you?
● Right now you will need to roll your own, like we have
  ● Rolling your own “waypoint” (even today) is not too hard
  ● Use these and other ideas
  ● Adapt the ideas to your environment and needs
  ● Collaborate
  ● Contribute? (GitHub? Or maybe even just a talk like this)
● Add yourself to the feature requests
  ● “Affects Me” on bugs.mysql.com
  ● “Vote for this issue” on jira.mariadb.org
  ● “Does this bug affect you?” on bugs.launchpad.net/percona-server

Together, let’s expand what can be done


Oh, and Booking.com is hiring!
● Almost any role:
  ● MySQL Engineer / DBA
  ● System Administrator
  ● System Engineer
  ● Site Reliability Engineer
  ● Developer
  ● Designer
  ● Technical Team Lead
  ● Product Owner
  ● Data Scientist
  ● And many more…
● https://workingatbooking.com/


Links
● Booking.com:
  ● https://blog.booking.com/
  ● https://workingatbooking.com/
  ● https://secure.booking.com/general.en-gb.html?tmpl=docs/about
● MariaDB Server last_gtid (thanks Kristian Nielsen for implementing this):
  ● https://mariadb.com/kb/en/mariadb/master_gtid_wait/
● MySQL GTIDs in OK packet:
  ● session_track_gtids …
  ● mysql_session_track_get_{first,next} …
● No-Slave-Left-Behind MariaDB Server patch:
  ● https://jira.mariadb.org/browse/MDEV-8112 (thanks Jonas Oreland and Google)


More Links
● Pull request to extend Perl-DBI for reading GTID in OK packet:
  ● https://github.com/perl5-dbi/DBD-mysql/pull/77 (thanks Daniël van Eeden)
● Bug reports/Feature requests:
  ● Bug#84747: Expose last transaction GTID in a session variable.
  ● Bug#84748: Request transaction GTID in OK packet on COMMIT (without needing a round-trip).
  ● Bug#84779: Expose binlog file and position of last transaction.
  ● MDEV-11956: Get last_gtid in OK packet.
  ● MDEV-11970: Expose binlog file and position of last transaction.


Questions


Thank You!

And special thanks with sugar on top to:

Percona

Booking.com

Jean-François Gagné

Eric Herman <[email protected]>