Percona XtraDB Cluster 5.6
By Jay Janssen and Jervin Real
Copyright © 2006-2014 Percona LLC
Field Guide Issue No. 2


  • Table of Contents

    Percona XtraDB Cluster 5.6

This is the second in a series of short Percona eBooks containing useful tips, examples and best practices for enterprise users of Percona XtraDB Cluster. If there is a topic you would like us to include in our next Issue, please let us know at [email protected] or give us a call at 1-888-316-9775.


Chapter 1: Finding a good IST donor
Chapter 2: keepalived with reader and writer VIPs
Chapter 3: New wsrep_provider_options
Chapter 4: Useful MySQL 5.6 features you get for free


Chapter 1: Finding a good IST donor

    Gcache and IST

The Gcache is a memory-based cache of recent Galera transactions that is local to each node in a cluster. If a node leaves and rejoins the cluster, it can use the Gcache from another node that stayed in the cluster (i.e., its donor node) to fetch the transactions it missed (IST) as opposed to doing a full state snapshot transfer (SST). However, there are a few nuances that are not obvious to the beginner:

- The Gcache is lost when a node restarts.
- The Gcache is a fixed size and implemented as an LRU. Once it is full, older transactions roll off.
- Donor selection is made regardless of the Gcache state.
- If the given donor for a restarting node doesn't have all the transactions needed, a full SST (read: full backup) is done instead.
- Until recent developments, there was no way to tell what, precisely, was in the Gcache.
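For context, the Gcache size is tunable through wsrep_provider_options; the fragment below is a sketch (the 1G value is an arbitrary example, not from the text):

```ini
[mysqld]
# Enlarge the Gcache ring buffer so more writesets stay available
# for IST after a node restart; 1G is an arbitrary example value.
wsrep_provider_options = "gcache.size=1G"
```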

So, with (somewhat) arbitrary donor selection, it was hard to be certain that a node restart would not trigger an SST. For example:

- A node crashed overnight or was otherwise down for some length of time. How do you know if the Gcache on any node is big enough to contain all the transactions necessary for IST?
- If you brought down two nodes in your cluster simultaneously, the second one you restart might select the first one as its donor and be forced to SST.

    Along comes Percona XtraDB Cluster 5.6

    Astute readers of the Percona XtraDB Cluster 5.6.15 release notes will have noticed this little tidbit:

New wsrep_local_cached_downto status variable has been introduced. This variable shows the lowest sequence number in gcache. This information can be helpful with determining IST and/or SST.


    By Jay Janssen


Until this release there was no visibility into any node's Gcache and what was likely to happen when restarting a node. You could make some assumptions, but now it is a bit easier to:

1. Tell if a given node would be a suitable donor
2. And hence select a donor manually using wsrep_sst_donor instead of leaving it to chance.

    What it looks like

    Suppose I have a 3 node cluster where load is hitting node1. I execute the following in sequence:

1. Shut down node2
2. Shut down node3
3. Restart node2

At step 3, node1 is the only viable donor for node2. Because our restart was quick, we can have some reasonable assurance that node2 will IST correctly (and it does).

However, before we restart node3, let's check the oldest transaction in the gcache on nodes 1 and 2:
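A check along these lines shows the new status variable on each candidate donor (host names and output are illustrative):

```shell
# Compare the oldest seqno still cached on each candidate donor;
# a node can serve IST only if this value is <= the joiner's
# last committed seqno.
for node in node1 node2; do
  echo -n "$node: "
  mysql -h "$node" -N -B -e \
    "SHOW GLOBAL STATUS LIKE 'wsrep_local_cached_downto'"
done
```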

So we can see that node1 has a much more complete gcache than node2 does (i.e., a much smaller seqno). Node2's gcache was wiped when it restarted, so it only has transactions from after its restart.



To check node3's GTID, we can either check the grastate.dat, or (if it has crashed and the grastate is zeroed) use wsrep_recover:
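Both methods can be sketched as follows (paths are the usual defaults and may differ on your system):

```shell
# Option 1: read the saved state from the cluster state file.
cat /var/lib/mysql/grastate.dat

# Option 2: if the node crashed and grastate.dat has seqno -1,
# recover the last committed position from InnoDB:
mysqld_safe --wsrep-recover
# ...then look for a line like this in the error log:
#   WSREP: Recovered position: <cluster-uuid>:<seqno>
```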

    So, armed with this information, we can tell what would happen to node3, depending on which donor was selected:

    So, we can instruct node3 to use node1 as its donor on restart with wsrep_sst_donor:
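For example, on node3 (assuming the donor's wsrep_node_name is node1):

```shell
# Start node3 with an explicit donor; the name must match the
# donor node's wsrep_node_name setting.
service mysql start --wsrep_sst_donor=node1
```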

Note that passing mysqld options on the command line is only supported in RPM packages; Debian requires you to put that setting in your my.cnf. We can see from node3's log that it does properly IST:


Sometime in the future, this may be handled automatically on donor selection, but for now it is very useful that we can at least see the status of the gcache.

Chapter 2: keepalived with reader and writer VIPs

We had a request recently in which the customer had 2 VIPs (Virtual IP addresses), one for a reader and one for a writer, for a cluster of 3 nodes. They wanted to keep it simple, with low latency, and without requiring an external resource like HAProxy would.

keepalived is a simple load balancer with HA capabilities, which means it can proxy TCP services behind it and, at the same time, keep itself highly available using VRRP as a failover mechanism. This chapter is about taking advantage of the VRRP capabilities built into keepalived to intelligently manage your PXC VIPs.

While Yves Trudeau also wrote a very interesting and somewhat similar solution using ClusterIP and Pacemaker to load balance VIPs, they have different use cases. Both solutions reduce latency compared to an external proxy or load balancer, but unlike ClusterIP, connections to the desired VIP with keepalived go to a single node, which means a little less work for each node trying to see if it should respond to the request. ClusterIP is good if you want to send writes to all nodes in a calculated distribution, while with our keepalived option each VIP is at best assigned to only a single node. Depending on your workload, each will have advantages and disadvantages.

The OS I used was CentOS 6.4 with keepalived 1.2.7, available in the yum repositories. However, it is difficult to troubleshoot failover behavior with VRRP_Instance weights without seeing them from keepalived directly, so I used a custom build with a patch for a vrrp-status option that allows me to monitor something like this:


    By Jervin Real


So first, let's compile keepalived from source; the Github branch here is where the status patch is available.
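A build along these lines should work on CentOS 6 (the dependency list is an assumption; substitute the actual patched repository for $KEEPALIVED_REPO):

```shell
# Build dependencies (package names are assumptions for CentOS 6).
yum install -y gcc make openssl-devel popt-devel
# Clone the branch carrying the vrrp-status patch.
git clone "$KEEPALIVED_REPO" keepalived && cd keepalived
./configure --prefix=/usr/local
make && make install
```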

Install the custom tracker script below; because compiling keepalived above installs to /usr/local/bin, I put this script there as well. One would note that this script is completely redundant, it's true, but beware that keepalived does not validate its configuration, especially track_scripts, so I prefer to have it in a separate bash script so I can easily debug misbehavior. Of course, when all is working well, you can always merge this into the keepalived.conf file.
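A minimal sketch of such a tracker script, assuming the weights are driven by whether the local node is Synced (this is a reconstruction, not the author's exact script):

```shell
#!/bin/bash
# /usr/local/bin/chk_pxc.sh (sketch): succeed only when the local
# PXC node reports wsrep_local_state = 4 (Synced), so keepalived
# can add the track_script weight for healthy nodes.
state=$(mysql -N -B -e "SHOW STATUS LIKE 'wsrep_local_state'" 2>/dev/null | awk '{print $2}')
[ "$state" = "4" ]
```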

    7

And here is my /etc/keepalived.conf:
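A minimal sketch of the shape such a configuration takes (interface names, router IDs, and addresses are placeholders; the author's actual file also wires in nopreempt_* track_scripts and notify_* hooks):

```ini
vrrp_script chk_pxc {
    script "/usr/local/bin/chk_pxc.sh"
    interval 2
    weight 50
}

vrrp_instance writer_vip {
    state BACKUP          # let runtime voting place the VIP
    interface eth1
    virtual_router_id 51
    priority 101          # slight preference for pxc01
    virtual_ipaddress {
        192.168.70.100
    }
    track_script {
        chk_pxc
    }
}

vrrp_instance reader_vip {
    state BACKUP
    interface eth1
    virtual_router_id 52
    priority 100
    virtual_ipaddress {
        192.168.70.101
    }
    track_script {
        chk_pxc
    }
}
```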


There are a number of things you can change here, like removing or modifying the notify_* clauses to fit your needs or sending SMTP notifications during VIP failovers. I also prefer the initial state of the VRRP_Instances to be BACKUP instead of MASTER, and let the voting at runtime dictate where the VIPs should go.

The configuration ensures that the reader and writer will not share a single node if more than one is available in the cluster. Even though the writer VIP prefers pxc01 in my example, this does not really matter much and only makes a difference when the reader VIP is not in the picture; there is no automatic failback, with the help of the nopreempt_* track_scripts.

Now, to see it in action: after starting the cluster and keepalived in the order pxc01, pxc02, pxc03, I have these statuses and weights:

The writer is on pxc01 and the reader on pxc02. Even though the reader VIP score between pxc02 and pxc03 matches, it remains on pxc02 because of our nopreempt_* script. Let's see what happens if I stop MySQL on pxc02:


The reader VIP moved to pxc03 and the weights changed: pxc02's reader score dropped by 100, and pxc03's gained 50 (again, we set this higher for nopreempt). Now let's stop MySQL on pxc03:


Our reader is back on pxc02 and the writer remains intact. When both VIPs end up on a single node (i.e., last node standing) and a second node comes up, the reader moves, not the writer; this is to prevent any risk of breaking connections that may be writing to the node currently owning the VIP.


Chapter 3: New wsrep_provider_options

Now that Percona XtraDB Cluster 5.6 is out, I wanted to talk about some of the new features in Galera 3 and Percona XtraDB Cluster 5.6. On the surface, Galera 3 doesn't reveal a lot of new features yet, but there has been a lot of refactoring of the system in preparation for great new features in the future.

    Galera vs. MySQL options

wsrep_provider_options is a semicolon-separated list of key => value configurations that set low-level Galera library configuration. These tweak the actual cluster communication and replication in the group communication system. By contrast, other Percona XtraDB Cluster global variables (like wsrep_%) are set like other mysqld options and generally have more to do with MySQL/Galera integration. This chapter will cover the Galera options; mysql-level changes will have to wait for another post.

    Here are the differences in the wsrep_provider_options between 5.5 and 5.6:

    gmcast.segment=0

This is a new setting in 3.x and allows us to distinguish between nodes in different WAN segments. For example, all nodes in a single datacenter would be configured with the same segment number, but each datacenter would have its own segment.

    Segments are currently used in two main ways:


    By Jay Janssen

1. Replication traffic between segments is minimized. Writesets originating in one segment should be relayed through only one node in every other segment. From those local relays, replication is propagated to the rest of the nodes in each segment respectively.

2. Segments are used in donor selection. Yes, donors in the same segment are preferred, but not required.
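As a sketch, segment assignment is just another provider option in my.cnf (segment numbers here are arbitrary examples):

```ini
# my.cnf on nodes in the first datacenter
[mysqld]
wsrep_provider_options = "gmcast.segment=1"
# nodes in a second datacenter would instead use gmcast.segment=2
```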


    replicator -> repl

The older replicator tag is now renamed to repl, and the causal_read_timeout and commit_order settings have moved there. No news here really.

    repl.key_format = FLAT8

Every writeset in Galera has associated keys. These keys are effectively a list of primary, unique, and foreign keys associated with all rows modified in the writeset. In Galera 2 these keys were replicated as literal values, but in Galera 3 they are hashed into either 8 or 16 byte values (FLAT8 vs FLAT16). This should generally make the key sizes smaller, especially with large CHAR keys.

Because the keys are now hashed, there can be collisions where two distinct literal key values result in the same 8-byte hashed value. This means, practically, that the places in Galera that rely on keys may falsely believe that there is a match between two writesets when there really is not. This should be quite rare. This false positive could affect:

- Local certification failures (deadlocks on commit) that are unnecessary.
- Parallel apply: things could be done in a stricter order (i.e., less parallelization) than necessary.

Neither case affects data consistency. The tradeoff is more efficiency in keys and key operations, generally making writesets smaller and certification faster.

    repl.proto_max

Limits the Galera protocol version that can be used in the cluster. Codership's documentation states it is for debugging only.


    socket.checksum = 2

This modifies the previous network packet checksum algorithm (CRC32) to support CRC32-C, which is hardware accelerated on supported gear. Packet checksums can also now be completely disabled (=0).

Chapter 4: Useful MySQL 5.6 features you get for free


    By Jay Janssen

I get a lot of questions about Percona XtraDB Cluster 5.6 (PXC 5.6), specifically about whether such and such MySQL 5.6 Community Edition feature is in PXC 5.6. The short answer is: yes, all features in community MySQL 5.6 are in Percona Server 5.6 and, in turn, are in PXC 5.6. Whether or not a new feature is useful in PXC 5.6 really depends on how useful it is in general with Galera. I thought it would be useful to highlight a few features and try to show them working:

    Innodb Fulltext Indexes

Yes, FTS works in Innodb in MySQL 5.6, so why wouldn't it work in Percona XtraDB Cluster 5.6? To test this I used the Sakila database, which contains a single table with FULLTEXT. In the sakila-schema.sql file, it is still designated a MyISAM table:

    I edited that file to change MyISAM to Innodb, loaded the schema and data into my 3 node cluster:
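The edit and load can be sketched as follows (file names follow the standard Sakila distribution; node1 is illustrative):

```shell
# Swap the storage engine in the schema file, then load schema and
# data through any node; Galera replicates them to the rest.
sed -i 's/ENGINE=MyISAM/ENGINE=InnoDB/' sakila-schema.sql
mysql -h node1 < sakila-schema.sql
mysql -h node1 < sakila-data.sql
```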

    and it works seamlessly:


    Sure enough, I can run this query on any node and it works fine:
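A query along these lines works, assuming the Sakila film_text table (which carries the FULLTEXT index on title and description):

```sql
-- Runs identically on any node in the cluster.
SELECT film_id, title
  FROM sakila.film_text
 WHERE MATCH (title, description) AGAINST ('drama');
```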

There might be a few caveats and differences from how FTS works in Innodb vs MyISAM, but it is there.

    Minimal replication images

Galera relies heavily on RBR events, but until 5.6 those were entire row copies, even if you only changed a single column in the table. In 5.6 you can change this to send only the updated data using the variable binlog_row_image=minimal.

Using a simple sysbench update test for 1 minute, I can determine the baseline size of the replicated data:
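One way to take this measurement, sketched with sysbench 0.4-era options (the exact flags and connection settings are assumptions), is to diff the wsrep_replicated_bytes counter around the run:

```shell
# Snapshot the counter, run 60s of single-row updates, snapshot
# again; the delta is the volume replicated during the test.
mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_replicated_bytes'"
sysbench --test=oltp --oltp-test-mode=nontrx \
  --oltp-nontrx-mode=update_key --max-time=60 --max-requests=0 run
mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_replicated_bytes'"
```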

    This results in 62.3 MB of data replicated in this test.


    If I set binlog_row_image=minimal on all nodes and do a rolling restart, I can see how this changes:

This yields a mere 13.4MB; that's roughly 80% smaller, quite a savings! This benefit, of course, fully depends on the types of workloads you are doing.

    Durable Memcache Cluster

It turns out this feature does not work properly with Galera; see below for an explanation:

5.6 introduces a Memcached interface for Innodb. This means any standard memcache client can talk to our PXC nodes with the memcache protocol and the data is:

To set this up, we simply need to load the innodb_memcache schema from the example and restart the daemon to get a listening memcached port:
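The setup can be sketched as follows (the schema file path varies by distribution):

```shell
# Load the demo schema shipped with MySQL 5.6, then enable the
# plugin; the daemon listens on the default memcached port 11211.
mysql < /usr/share/mysql/innodb_memcached_config.sql
mysql -e 'INSTALL PLUGIN daemon_memcached SONAME "libmemcached.so"'
```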


This all appears to work, and I can fetch the sample AA row from all the nodes with the memcached interface:
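For example, using nc against each node's memcached port (host names are illustrative):

```shell
# Fetch the sample row (key "AA") from every node in turn.
for node in node1 node2 node3; do
  printf 'get AA\r\nquit\r\n' | nc "$node" 11211
done
```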

    However, if I try to update a row, it does not seem to replicate (even if I set innodb_api_enable_binlog):

So unfortunately the memcached plugin must use some backdoor to Innodb that Galera is unaware of. I've filed a bug on the issue, but it's not clear if there will be an easy solution or if a whole lot of code will be necessary to make this work properly.

In the short term, however, you can at least read data from all nodes with the memcached plugin as long as data is only written using the standard SQL interface.


    Async replication GTID Integration

Async GTIDs were introduced in 5.6 in order to make CHANGE MASTER easier. You have always been able to use async replication from any cluster node, but now with this new GTID support it is much easier to failover to another node in the cluster as a new master.

If we take one node out of our cluster to be a slave and enable GTID binary logging on the other two by adding these settings:
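Those settings look roughly like this in my.cnf (a sketch; server_id must be unique on each node):

```ini
[mysqld]
server_id                = 1        # unique per node
log_bin                  = mysql-bin
log_slave_updates
gtid_mode                = ON
enforce_gtid_consistency = 1
```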

    If I generate some writes on the cluster, I can see GTIDs are working:

Notice that we're at GTID 1505 on both nodes, even though the binary log position happens to be different.

    I set up my slave to replicate from node1 (.70.2):
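The CHANGE MASTER itself can be sketched as (user and password are placeholders):

```sql
-- On the slave: with GTID auto-positioning there is no need to
-- specify a binary log file and position.
CHANGE MASTER TO
  MASTER_HOST = 'node1',
  MASTER_USER = 'repl',          -- placeholder credentials
  MASTER_PASSWORD = 'replpass',
  MASTER_AUTO_POSITION = 1;
START SLAVE;
```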


And it's all caught up. If I put some load on the cluster, I can easily change to node2 as my master without needing to stop writes:

So this seems to work pretty well.

Conclusion

MySQL 5.6 introduces a lot of interesting new features that are even more compelling in the PXC/Galera world. If you want to experiment for yourself, I pushed the Vagrant environment I used to Github at: https://github.com/jayjanssen/pxc_56_features

    About the authors


Jay Janssen: Percona principal consultant. Jay joined Percona in 2011 after 7 years at Yahoo, working in a variety of fields including High Availability architectures, MySQL training, tool building, global server load balancing, multi-datacenter environments, operationalization, and monitoring. He holds a B.S. in Computer Science from Rochester Institute of Technology.

Jervin Real: Percona support engineer. When you come to Percona for consulting and support, chances are he'll be greeting you first. His primary role is to make sure customer issues are handled efficiently and professionally. Jervin joined Percona in May 2010.


About Percona

Percona has made MySQL faster and more reliable for over 2,000 consulting and support customers worldwide since 2006. Percona provides enterprise-grade MySQL support, Consulting, Training, Remote DBA, and Server Development services to companies such as Cisco Systems, Alcatel-Lucent, Groupon, and the BBC. Percona's founders authored the definitive book High Performance MySQL from O'Reilly Press and the widely read MySQL Performance Blog. Percona also develops software for MySQL users, including Percona Server, Percona XtraBackup, Percona XtraDB Cluster, and Percona Toolkit. The popular Percona Live conferences draw attendees and acclaimed speakers from around the world. For more information, visit www.percona.com.
