Percona XtraDB Cluster Tutorial

Preview:

Citation preview

Percona XtraDB Cluster Tutorialhttp://www.percona.com/training/

Table of Contents

1. Overview & Setup 6. Incremental State Transfer2. Application HA 7. Node Failure Scenarios3. Monitoring Galera withPMM 8 Avoiding SST

4. Galera ReplicationInternals 9. Tuning Replication

5. Online Schema Changes

© 2011 - 2018 Percona, Inc.2 / 107

XtraDB Cluster Tutorial

OVERVIEW

© 2011 - 2018 Percona, Inc.3 / 107

Resourceshttp://www.percona.com/doc/percona-xtradb-cluster/5.7/http://www.percona.com/blog/category/percona-xtradb-cluster/http://galeracluster.com/documentation-webpages/reference.html

© 2011 - 2018 Percona, Inc.4 / 107

Lab InstructionsGet a USB stick (there are only 5)Copy the 2018-pxc-tutorial.ova to your laptopCopy VirtualBox installer to your laptop, if necessaryEJECT USB STICK and had over to someone elseInstall VirtualBox, if necessaryOpen VirtualBox and import the OVAStart all 4 virtual machinesHide/Minimize the console windows; you will not usethemWindows users - you need an SSH client (download onenow!)

© 2011 - 2018 Percona, Inc.5 / 107

Standard ReplicationReplication is a mechanism for recording a series ofchanges on one MySQL server and applying the samechanges to one, or more, other MySQL server.The source is the "master" and its replica is a "slave."

© 2011 - 2018 Percona, Inc.6 / 107

Galera ReplicationDeveloped by CodershipSynchronous multi-master database clusterWrite to any node; every node stays in sync.

© 2011 - 2018 Percona, Inc.7 / 107

Quick Tutorial Setup3 unconfigured MySQL Servers (mysql1, mysql2, mysql3)app will run sysbench

Sample application, simulating activity

© 2011 - 2018 Percona, Inc.8 / 107

Verify Connectivity and SetupConnect to all VMs

Use separate terminal windows; we will be switchingbetween them frequently.Username and password is vagrantDO NOT USE VIRTUAL BOX CONSOLE WINDOWS!

ssh -p 2200 vagrant@localhostssh -p 2201 vagrant@localhostssh -p 2202 vagrant@localhostssh -p 2222 vagrant@localhost

© 2011 - 2018 Percona, Inc.9 / 107

Verify Connectivity and Setup (cont)Be sure to sudo to rootStop MySQL if running on all nodes

# systemctl stop mysql

When you restart the Virtual Machines, they auto-startMySQL and create a default setupErase MySQL's $DATADIR

# rm -rf /var/lib/mysql/*

© 2011 - 2018 Percona, Inc.10 / 107

Configure mysql3 for PXC/etc/percona-xtradb-cluster.conf.d/wsrep.cnf

...wsrep_provider = /usr/lib64/galera3/libgalera_smm.sowsrep_cluster_address = gcomm://mysql1,mysql2,mysql3wsrep_node_address = 192.168.70.30wsrep_cluster_name = myclusterwsrep_node_name = mysql3wsrep_sst_auth = "sstuser:s3cretPass"

Make sure any of the above are NOT COMMENTED!Start mysql3 with:

mysql3# systemctl start mysql@bootstrap

© 2011 - 2018 Percona, Inc.11 / 107

Checking Cluster StateMySQL 5.7 generates random root password.

Find it and reset it.

$ grep password /var/log/mysqld.log... password is generated for root@localhost: :#xfgy*aD9ea

$ mysql -e "SET PASSWORD='Percona1234#'" -uroot -p \--connect-expired-password

Persist password in ~/.my.cnf on all 3 nodes to makethings easier

$ cat ~/.my.cnf[client]password="Percona1234#"

© 2011 - 2018 Percona, Inc.12 / 107

Checking Cluster StateCan you tell:

How many nodes are in the cluster?Is the cluster Primary?

Use 'myq_status'* to check the state of Galera on mysql3:

mysql3# /usr/local/bin/myq_status wsrep

Check status counters manually

mysql> SHOW GLOBAL STATUS LIKE 'wsrep%'

© 2011 - 2018 Percona, Inc.(*) https://github.com/jayjanssen/myq-tools/releases/tag/1.0.4

13 / 107

mysql2's Config/etc/percona-xtradb-cluster.conf.d/wsrep.cnf

...wsrep_provider = /usr/lib64/galera3/libgalera_smm.sowsrep_cluster_address = gcomm://mysql1,mysql2,mysql3wsrep_node_address = 192.168.70.20wsrep_cluster_name = myclusterwsrep_node_name = mysql2wsrep_sst_auth = "sstuser:s3cretPass"

Don't start MySQL yet!

© 2011 - 2018 Percona, Inc.14 / 107

What's going to happen?Don't start MySQL yet!When mysql2 is started, what's going to happen?How does mysql2 get a copy of the dataset?

Can it use the existing data it already has?Do we have to bootstrap mysql2?

© 2011 - 2018 Percona, Inc.15 / 107

State Snapshot Transfer (SST)Transfer a full backup of an existing cluster member(donor) to a new node entering the cluster (joiner).We configured our SST method to use ‘xtrabackup-v2'.

© 2011 - 2018 Percona, Inc.16 / 107

Additional Xtrabackup SST SetupWhat about that sst user in your wsrep.cnf?

[mysqld]...wsrep_sst_auth = "sstuser:s3cretPass"...

User/Grants need to be added for Xtrabackup

mysql3> CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 's3cretPass';mysql3> GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENTON *.* TO 'sstuser'@'localhost';

The good news is that once you get things figured out thefirst time, it's typically very easy to get an SST the firsttime on subsequent nodes.

© 2011 - 2018 Percona, Inc.17 / 107

Up and Running mysql2Watch mysql3's error log for progress

mysql3# tail -f /var/log/mysqld.log

Now you can start mysql2

mysql2# systemctl start mysql

© 2011 - 2018 Percona, Inc.18 / 107

mysql1's Config...wsrep_provider = /usr/lib64/galera3/libgalera_smm.sowsrep_cluster_address = gcomm://mysql1,mysql2,mysql3wsrep_node_address = 192.168.70.10wsrep_cluster_name = myclusterwsrep_node_name = mysql1wsrep_sst_auth = "sstuser:s3cretPass"

Now you can start mysql1

mysql1# systemctl start mysql

© 2011 - 2018 Percona, Inc.19 / 107

Where are We Now?

© 2011 - 2018 Percona, Inc.20 / 107

XtraDB Cluster Tutorial

APPLICATION HIGH AVAILABILITY

© 2011 - 2018 Percona, Inc.21 / 107

ProxySQL Architecture

© 2011 - 2018 Percona, Inc.22 / 107

Setting up ProxySQLCreate monitor user

mysql3> CREATE USER 'monitor'@'%' IDENTIFIED BY 'monit0r';

Create Cluster Admin User

mysql3> CREATE USER 'cuser'@'%' IDENTIFIED BY 'cpass';mysql3> GRANT ALL ON *.* TO 'cuser'@'%';

© 2011 - 2018 Percona, Inc.23 / 107

Setting up ProxySQLCreate monitor user

mysql3> CREATE USER 'monitor'@'%' IDENTIFIED BY 'monit0r';

Create Cluster Admin User

mysql3> CREATE USER 'cuser'@'%' IDENTIFIED BY 'cpass';mysql3> GRANT ALL ON *.* TO 'cuser'@'%';

Do we have to create these users on the other nodes?

© 2011 - 2018 Percona, Inc.23 / 107

Configure ProxySQLCreate and edit the file

app# nano /etc/proxysql-admin.cnf

export PROXYSQL_USERNAME="admin"export PROXYSQL_PASSWORD="admin"export PROXYSQL_HOSTNAME="localhost"export PROXYSQL_PORT="6032"export CLUSTER_USERNAME="cuser"export CLUSTER_PASSWORD="cpass"export CLUSTER_HOSTNAME="mysql3"export CLUSTER_PORT="3306"export MONITOR_USERNAME="monitor"export MONITOR_PASSWORD="monit0r"export CLUSTER_APP_USERNAME="sbuser"export CLUSTER_APP_PASSWORD="sbPass1234#"export WRITE_HOSTGROUP_ID="10"export READ_HOSTGROUP_ID="11"export MODE="singlewrite"export HOST_PRIORITY_FILE="/var/lib/proxysql/host_priority.conf"

© 2011 - 2018 Percona, Inc.24 / 107

Tibi

Start ProxySQLStart ProxySQL with the following command:

app# systemctl restart proxysql

app# proxysql-admin --config-file=/etc/proxysql-admin.cnf \--write-node=192.168.70.30:3306 --enable

© 2011 - 2018 Percona, Inc.25 / 107

Start ProxySQLStart ProxySQL with the following command:

app# systemctl restart proxysql

app# proxysql-admin --config-file=/etc/proxysql-admin.cnf \--write-node=192.168.70.30:3306 --enable

mysql3 is the writer nodeGrant all privileges to app user

mysql3> CREATE DATABASE sysbench;

mysql3> GRANT ALL ON sysbench.* TO 'sbuser'@'192.%';

© 2011 - 2018 Percona, Inc.25 / 107

Check ProxySQL Statsapp# mysql -P6032 -uadmin -padmin -h 127.0.0.1

proxysql> SELECT * FROM stats_mysql_connection_pool ORDER BY hostgroup, srv_host DESC;

+-----------+---------------+----------+--------+...------------+| hostgroup | srv_host | srv_port | status |... Latency_us |+-----------+---------------+----------+--------+...------------+| 10 | 192.168.70.30 | 3306 | ONLINE |... 461 || 11 | 192.168.70.20 | 3306 | ONLINE |... 820 || 11 | 192.168.70.10 | 3306 | ONLINE |... 101 |+-----------+---------------+----------+--------+...------------+

© 2011 - 2018 Percona, Inc.26 / 107

Prepare Test DataCreate some test data for our sample application

app$ /usr/local/bin/prepare_sysbench.shsysbench 1.0.14 (using bundled LuaJIT 2.1.0-beta2)

Initializing worker threads...

Creating table 'sbtest1'...Inserting 1000000 records into 'sbtest1'Creating a secondary index on 'sbtest1'...

© 2011 - 2018 Percona, Inc.27 / 107

Start SysbenchSending Reads and Writes

Writes goes to Hostgroup 10 (mysql3)Reads goes to Hostgroup 11 (mysql1,mysql2)

app# screen /usr/local/bin/run_sysbench_oltp.sh

[ 1s] threads: 1, tps: 9.99, reads: 139.89, writes: 39.97, ...[ 2s] threads: 1, tps: 7.00, reads: 98.04, writes: 28.01, ...[ 3s] threads: 1, tps: 7.00, reads: 98.03, writes: 28.01, ...[ 4s] threads: 1, tps: 14.00, reads: 195.93, writes: 55.98, ...

Check ProxySQL Stats

app# mysql -P6032 -uadmin -padmin -h 127.0.0.1

proxysql> SELECT * from stats_mysql_connection_poolORDER BY hostgroup, srv_host DESC;

© 2011 - 2018 Percona, Inc.28 / 107

Test mysql2 ShutdownStart myq_status on mysql3Shut down mysql2

systemctl stop mysqlCheck ProxySQL Stats

proxysql> SELECT * from stats_mysql_connection_poolORDER BY hostgroup, srv_host DESC;+-----------+---------------+----------+--------..| hostgroup | srv_host | srv_port | status ..+-----------+---------------+----------+--------..| 10 | 192.168.70.30 | 3306 | ONLINE ..| 11 | 192.168.70.20 | 3306 | OFFLINE_HARD ..| 11 | 192.168.70.10 | 3306 | ONLINE ..+-----------+---------------+----------+--------..

Start mysql2 back upmysql2# systemctl restart mysql

© 2011 - 2018 Percona, Inc.29 / 107

Crash TestStop mysql1 destructively:

mysql1# killall -9 mysqld_safe; killall -9 mysqld

Watch sysbech and myq_statusCheck ProxySQL statusRestart mysql2

mysql1# systemctl restart mysql

© 2011 - 2018 Percona, Inc.30 / 107

Stop writer nodeStop MySQL on mysql3Check Sysbench and myq_statusCheck ProxySQL status

mysql3# systemctl stop mysql@bootstrap

proxysql> SELECT * from stats_mysql_connection_poolORDER BY hostgroup, srv_host DESC;

+-----------+--------------+----------+--------...| hostgroup | srv_host | srv_port | status ...+-----------+--------------+----------+--------...| 10 | 192.168.70.2 | 3306 | ONLINE ...| 11 | 192.168.70.3 | 3306 | ONLINE ...+-----------+--------------+----------+--------...

mysql1 is in the write hostgroup now.

© 2011 - 2018 Percona, Inc.31 / 107

Restarting mysql3mysql3 was bootstrapped earlier

What did that mean?Do we have to bootstrap mysql3 again?

© 2011 - 2018 Percona, Inc.32 / 107

Restarting mysql3mysql3 was bootstrapped earlier

What did that mean?Do we have to bootstrap mysql3 again?

No, we already having a running cluster.Bootstrapping creates a new cluster.Restart mysql3 normally

mysql3# systemctl restart mysql

© 2011 - 2018 Percona, Inc.32 / 107

XtraDB Cluster Tutorial

MONITORING GALERA with PMM

© 2011 - 2018 Percona, Inc.33 / 107

Percona Monitoring and Management

© 2011 - 2018 Percona, Inc.(*) https://pmmdemo.percona.com/

34 / 107

Create PMM containerStart DockerCreate PMM Container

app# docker create \ -v /opt/prometheus/data \ -v /opt/consul-data \ -v /var/lib/mysql \ -v /var/lib/grafana \ --name pmm-data \ percona/pmm-server:latest /bin/true

© 2011 - 2018 Percona, Inc.35 / 107

Start PMM containerapp# docker run -d \ -p 80:80 \ --volumes-from pmm-data \ --name pmm-server \ --restart always \ percona/pmm-server:latest

Check the dashboardOpen http://192.168.70.200/graph/

© 2011 - 2018 Percona, Inc.36 / 107

Configure PMM Clients on All nodesRun the following on all 3 cluster nodes and app server

mysql*# pmm-admin config --server 192.168.70.200mysql*# pmm-admin add mysql

app# pmm-admin config --server 127.0.0.1app# pmm-admin add proxysql:metrics

Check the dashboard againOpen http://192.168.70.200/graph/

© 2011 - 2018 Percona, Inc.37 / 107

XtraDB Cluster Tutorial

GALERA REPLICATION INTERNALS

© 2011 - 2018 Percona, Inc.38 / 107

Group ReplicationBased on Totem Single-ring OrderingAll nodes have to ACK messageUses binlog row events

But does not require binary loggingWrites events to Gcache (configurable size)Nodes self-subscribe to communication channel

© 2011 - 2018 Percona, Inc.39 / 107

Replication RolesWithin the cluster, all nodes are equalmaster/donor node

The source of the transaction.slave/joiner node

Receiver of the transaction.Writeset: A transaction; one or more row changes

© 2011 - 2018 Percona, Inc.40 / 107

State Transfer RolesNew nodes joining an existing cluster get provisionedautomatically

Joiner = New nodeDonor = Node giving a copy of the data

State Snapshot transferFull backup of Donor to Joiner

Incremental Snapshot transferOnly changes since node left cluster

© 2011 - 2018 Percona, Inc.41 / 107

What's inside a write-set?The payload; Generated from the RBR eventKeys; Generated by the source node

Primary KeysUnique KeysForeign Keys

Keys are how certification happesA PRIMARY KEY on every table is required.

© 2011 - 2018 Percona, Inc.42 / 107

Replication StepsAs Master/Donor

ApplyReplicateCertifyCommit

As Slave/JoinerReplicate (receive)Certify (includes reply to master/donor)ApplyCommit

© 2011 - 2018 Percona, Inc.43 / 107

What is Certification?In a nutshell: "Can this transaction be applied?"Happens on every node for all write-setsDeterministicResults of certification are not relayed back tomaster/donor

In event of failure, drop transaction locally; Possiblyremove self from cluster.On success, write-set enters apply queue

Serialized via group communication protocol

© 2011 - 2018 Percona, Inc.(*) https://www.percona.com/blog/2016/04/17/how-percona-xtradb-cluster-certification-works/

44 / 107

The ApplyApply is done locally, asynchronously, on each node, aftercertification.Can be parallelized like standard async replication

wsrep_slave_threads > 1If there is a conflict, will create brute force aborts (bfa) onlocal node.

© 2011 - 2018 Percona, Inc.45 / 107

Last but not least, Commit StageLocal, on each node, InnoDB commit.Applier thread executes on slave/joiner nodes.Regular connection thread for master/donor node.

© 2011 - 2018 Percona, Inc.46 / 107

Galera Replication Example

© 2011 - 2018 Percona, Inc.47 / 107

Galera Replication Example

© 2011 - 2018 Percona, Inc.48 / 107

Galera Replication Example

© 2011 - 2018 Percona, Inc.49 / 107

Galera Replication Example

© 2011 - 2018 Percona, Inc.50 / 107

Galera Replication Example

© 2011 - 2018 Percona, Inc.51 / 107

Galera Replication Example

© 2011 - 2018 Percona, Inc.52 / 107

Galera Replication Example

© 2011 - 2018 Percona, Inc.53 / 107

Galera Replication Example

© 2011 - 2018 Percona, Inc.54 / 107

Galera Replication Example

© 2011 - 2018 Percona, Inc.55 / 107

Galera Replication Facts"Replication" is synchronous; slave apply is notA successful commit reply to client means a transactionWILL get applied across the entire cluster.A node that tries to commit a conflicting transactionbefore will get a deadlock error.

This is called a local certification failure or 'lcf'When our transaction is applied, any competingtransactions that haven't committed will get a deadlockerror.

This is called a brute force abort or 'bfa'

© 2011 - 2018 Percona, Inc.56 / 107

Brute Force Abort Example

© 2011 - 2018 Percona, Inc.57 / 107

Brute Force Abort Example

© 2011 - 2018 Percona, Inc.58 / 107

Brute Force Abort Example

© 2011 - 2018 Percona, Inc.59 / 107

Brute Force Abort Example

© 2011 - 2018 Percona, Inc.60 / 107

Brute Force Abort Example

© 2011 - 2018 Percona, Inc.61 / 107

Brute Force Abort Example

© 2011 - 2018 Percona, Inc.62 / 107

Brute Force Abort Example

© 2011 - 2018 Percona, Inc.63 / 107

Brute Force Abort ExerciseCreate a test table

mysql1> CREATE DATABASE test;mysql1> CREATE TABLE test.deadlocks (i INT UNSIGNED NOT NULLPRIMARY KEY, j VARCHAR(32) );

mysql1> INSERT INTO test.deadlocks VALUES (1, NULL);mysql1> BEGIN; -- Start explicit transactionmysql1> UPDATE test.deadlocks SET j = 'mysql1' WHERE i = 1;

Before we commit on mysql1, go to mysql3 in a separatewindow:

mysql3> BEGIN; -- Start transactionmysql3> UPDATE test.deadlocks SET j = 'mysql3' WHERE i = 1;

mysql3> COMMIT;

mysql1> COMMIT;

mysql1> SELECT * FROM test.deadlocks;

© 2011 - 2018 Percona, Inc.64 / 107

Unexpected Deadlocks (cont.)Which commit succeeds?Will only a COMMIT generate the deadlock error?Is this a lcf or bfa?How would you diagnose this error?

SHOW STATUS LIKE 'wsrep_local_bf%';

© 2011 - 2018 Percona, Inc.65 / 107

When could you multi-node write?When your application can handle the deadlocksgracefully

wsrep_retry_autocommit can helpApplication-specific schemasIsolated traffic to specific tables

© 2011 - 2018 Percona, Inc.66 / 107

AUTO_INCREMENT HandlingPick one node to work on:

mysql3> CREATE TABLE sysbench.autoinc (i INT NOT NULL AUTO_INCREMENT PRIMARY KEY, j VARCHAR(32) );

mysql3> INSERT INTO sysbench.autoinc (j) VALUES ('mysql3');mysql3> INSERT INTO sysbench.autoinc (j) VALUES ('mysql3');

mysql3> SELECT * FROM sysbench.autoinc;

What is odd about the values for 'i'?What happens if you do the inserts on each node inorder?Check wsrep_auto_increment_control andauto_increment% global variables

© 2011 - 2018 Percona, Inc.67 / 107

Encrypt all the Things!MySQL 5.7+ out-of-box SSL ready

SSL certificates created if not presentSST/IST/Galera traffic not encrypted by defaultSet pxc-encrypt-cluster-traffic=ON

Passes generated certs by MySQL to GaleraConfigure manually:

[mysqld]wsrep_provider_options="socket.ssl=yes;socket.ssl_key=/path/to/server-key.pem;socket.ssl_cert=/path/to/server-cert.pem;socket.ssl_ca=/path/to/cacert.pem"

[sst]encrypt=4ssl-ca=<PATH>/ca.pemssl-cert=<PATH>/server-cert.pemssl-key=<PATH>/server-key.pem

Must bootstrap cluster using SSL

© 2011 - 2018 Percona, Inc.68 / 107

XtraDB Cluster Tutorial

ONLINE SCHEMA CHANGES

© 2011 - 2018 Percona, Inc.69 / 107

How do we ALTER TABLEStandard MySQL: Execute directly

Lock table, create new table, copy rows, swap, releaselocks

© 2011 - 2018 Percona, Inc.70 / 107

How do we ALTER TABLEStandard MySQL: Execute directly

Lock table, create new table, copy rows, swap, releaselocks

How do you maintain consistency in XtraDB Cluster?"Total Order Isolation" - TOI (default)

© 2011 - 2018 Percona, Inc.70 / 107

How do we ALTER TABLEStandard MySQL: Execute directly

Lock table, create new table, copy rows, swap, releaselocks

How do you maintain consistency in XtraDB Cluster?"Total Order Isolation" - TOI (default)

Executed across all nodes, simultaneouslyCan impact applications

© 2011 - 2018 Percona, Inc.70 / 107

How do we ALTER TABLEStandard MySQL: Execute directly

Lock table, create new table, copy rows, swap, releaselocks

How do you maintain consistency in XtraDB Cluster?"Total Order Isolation" - TOI (default)

Executed across all nodes, simultaneouslyCan impact applications

"Rolling Schema Upgrades" - RSU

© 2011 - 2018 Percona, Inc.70 / 107

How do we ALTER TABLEStandard MySQL: Execute directly

Lock table, create new table, copy rows, swap, releaselocks

How do you maintain consistency in XtraDB Cluster?"Total Order Isolation" - TOI (default)

Executed across all nodes, simultaneouslyCan impact applications

"Rolling Schema Upgrades" - RSUDoes not replicate DDLRun schema change on each node

© 2011 - 2018 Percona, Inc.70 / 107

How do we ALTER TABLEStandard MySQL: Execute directly

Lock table, create new table, copy rows, swap, releaselocks

How do you maintain consistency in XtraDB Cluster?"Total Order Isolation" - TOI (default)

Executed across all nodes, simultaneouslyCan impact applications

"Rolling Schema Upgrades" - RSUDoes not replicate DDLRun schema change on each node

pt-online-schema-change

© 2011 - 2018 Percona, Inc.70 / 107

Direct ALTER TABLE - TOIEnsure sysbench is running on app as usualTry these ALTER statements on the nodes listed and notethe effect on sysbench throughput.

mysql1> ALTER TABLE sysbench.sbtest1 ADD COLUMN `m` varchar(32);

© 2011 - 2018 Percona, Inc.71 / 107

Direct ALTER TABLE - TOIEnsure sysbench is running on app as usualTry these ALTER statements on the nodes listed and notethe effect on sysbench throughput.

mysql1> ALTER TABLE sysbench.sbtest1 ADD COLUMN `m` varchar(32);

What about altering a table that sysbench is not using?

© 2011 - 2018 Percona, Inc.71 / 107

Direct ALTER TABLE - TOIEnsure sysbench is running on app as usualTry these ALTER statements on the nodes listed and notethe effect on sysbench throughput.

mysql1> ALTER TABLE sysbench.sbtest1 ADD COLUMN `m` varchar(32);

What about altering a table that sysbench is not using?

mysql2> ALTER TABLE sysbench.large ADD COLUMN `o` varchar(32);

© 2011 - 2018 Percona, Inc.71 / 107

Direct ALTER TABLE - TOIEnsure sysbench is running on app as usualTry these ALTER statements on the nodes listed and notethe effect on sysbench throughput.

mysql1> ALTER TABLE sysbench.sbtest1 ADD COLUMN `m` varchar(32);

What about altering a table that sysbench is not using?

mysql2> ALTER TABLE sysbench.large ADD COLUMN `o` varchar(32);

TOI blocks everything. Doesn't matter if the application isusing the table or not, everything halts while the alterhappens.

© 2011 - 2018 Percona, Inc.71 / 107

Direct ALTER TABLE - RSUChange to RSU and observe sysbench

mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 ADD COLUMN `q` varchar(32);

© 2011 - 2018 Percona, Inc.72 / 107

Direct ALTER TABLE - RSUChange to RSU and observe sysbench

mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 ADD COLUMN `q` varchar(32);

There's now an inconsistency in the schema betweenmysql1 and the others.In a deployment, you would have to run the ALTER oneach node BEFORE executing new application code.

© 2011 - 2018 Percona, Inc.72 / 107

Direct ALTER TABLE - RSU (cont.)What happens if we drop a column the application isactively using?

Make sure you are in RSU mode

mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 DROP COLUMN `c`;

© 2011 - 2018 Percona, Inc.73 / 107

Direct ALTER TABLE - RSU (cont.)What happens if we drop a column the application isactively using?

Make sure you are in RSU mode

mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 DROP COLUMN `c`;

What happens after dropping 'c'?

© 2011 - 2018 Percona, Inc.73 / 107

Direct ALTER TABLE - RSU (cont.)What happens if we drop a column the application isactively using?

Make sure you are in RSU mode

mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 DROP COLUMN `c`;

What happens after dropping 'c'?

mysql1# tail -1000 /var/log/mysqld.log

© 2011 - 2018 Percona, Inc.73 / 107

Direct ALTER TABLE - RSU (cont.)What happens if we drop a column the application isactively using?

Make sure you are in RSU mode

mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 DROP COLUMN `c`;

What happens after dropping 'c'?

mysql1# tail -1000 /var/log/mysqld.log

Why?

© 2011 - 2018 Percona, Inc.73 / 107

Direct ALTER TABLE - RSU (cont.)What happens if we drop a column the application isactively using?

Make sure you are in RSU mode

mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 DROP COLUMN `c`;

What happens after dropping 'c'?

mysql1# tail -1000 /var/log/mysqld.log

Why?How do you recover?When is RSU safe and unsafe to use?

© 2011 - 2018 Percona, Inc.73 / 107

pt-online-schema-change

Verify mysql1 is in TOI mode

mysql1> SELECT @@wsrep_osu_method;+--------------------+| @@wsrep_osu_method |+--------------------+| TOI |+--------------------+

© 2011 - 2018 Percona, Inc.74 / 107

pt-online-schema-change

Verify mysql1 is in TOI mode

mysql1> SELECT @@wsrep_osu_method;+--------------------+| @@wsrep_osu_method |+--------------------+| TOI |+--------------------+

Let's add one final column and observe sysbench results

mysql1# pt-online-schema-change --chunk-size=1000 --alter \ "ADD COLUMN z varchar(32)" --execute D=sysbench,t=sbtest1

© 2011 - 2018 Percona, Inc.74 / 107

pt-online-schema-change

Verify mysql1 is in TOI mode

mysql1> SELECT @@wsrep_osu_method;+--------------------+| @@wsrep_osu_method |+--------------------+| TOI |+--------------------+

Let's add one final column and observe sysbench results

mysql1# pt-online-schema-change --chunk-size=1000 --alter \ "ADD COLUMN z varchar(32)" --execute D=sysbench,t=sbtest1

Was pt-online-schema-change any better at doingDDL?Was the application blocked at all?What are the caveats to using pt-osc?

© 2011 - 2018 Percona, Inc.74 / 107

XtraDB Cluster Tutorial

INCREMENTAL STATE TRANSFER

© 2011 - 2018 Percona, Inc.75 / 107

Incremental State Transfer (IST)Helps to avoid full SSTWhen a node starts, it knowns the UUID of the cluster towhich it belongs and the last sequence number it applied.Upon connecting to other cluster nodes, broadcast thisinformation.If another node has cached write-sets, trigger IST

If no other node has, trigger SSTFirewall Alert:

IST - port 4568SST - port 4444

© 2011 - 2018 Percona, Inc.76 / 107

Galera Cache - "gcache"Write-sets are contained in the "gcache" on each nodeRepresents a single-file circular bufferPreallocated file with a specific size.

Default size is 128MCan be increase via provider option[mysqld]

wsrep_provider_options = "gcache.size=2G";

Cache is mmaped (I/O buffered to memory)wsrep_local_cached_downto - The first seqnopresent in the cache for that node

© 2011 - 2018 Percona, Inc.77 / 107

Simple IST TestMake sure sysbench is still running on app

© 2011 - 2018 Percona, Inc.78 / 107

Simple IST TestMake sure sysbench is still running on appWatch myq_status on mysql1 and mysql3

© 2011 - 2018 Percona, Inc.78 / 107

Simple IST TestMake sure sysbench is still running on appWatch myq_status on mysql1 and mysql3Stop mysql2

Wait a few minutes, then start mysql2

© 2011 - 2018 Percona, Inc.78 / 107

Simple IST TestMake sure sysbench is still running on appWatch myq_status on mysql1 and mysql3Stop mysql2

Wait a few minutes, then start mysql2How did the cluster react when the node was stoppedand restarted?Which node was the donor?

Is it easy to tell when a node is a donor for SST vs IST?

mysql3# tail -50 /var/log/mysqld.log

© 2011 - 2018 Percona, Inc.78 / 107

Tracking ISTPercona XtraDB Cluster 5.7+ adds IST monitoring

mysql> SHOW STATUS LIKE 'wsrep_ist_receive_status'\G*************************** 1. row ***************************Variable_name: wsrep_ist_receive_status Value: 52% complete, received seqno 1506799 of 1415410-15896761 row in set (0.01 sec)

© 2011 - 2018 Percona, Inc.79 / 107

XtraDB Cluster Tutorial

NODE FAILURE SCENARIOS

© 2011 - 2018 Percona, Inc.80 / 107

Node Failure ExercisesRun sysbench on app as usual

© 2011 - 2018 Percona, Inc.81 / 107

Node Failure ExercisesRun sysbench on app as usual

Experiment 1: Clean node restart

mysql1# systemctl restart mysql

© 2011 - 2018 Percona, Inc.81 / 107

Node Failure Exercises (cont.)Experiment 2: Partial network outageSimulate a network outage by using iptables to droptraffic

mysql3# tail -f /var/log/mysqld.log

mysql1# iptables -A INPUT -s mysql3 \ -j DROP; iptables -A OUTPUT -s mysql3 -j DROP

© 2011 - 2018 Percona, Inc.82 / 107

Node Failure Exercises (cont.)Experiment 2: Partial network outageSimulate a network outage by using iptables to droptraffic

mysql3# tail -f /var/log/mysqld.log

mysql1# iptables -A INPUT -s mysql3 \ -j DROP; iptables -A OUTPUT -s mysql3 -j DROP

After observation, clear rules

mysql1# iptables -F

© 2011 - 2018 Percona, Inc.82 / 107

Node Failure Exercises (cont.)Experiment 3: mysql1 stops responding to the clusterAnother network outage affecting 1 node

-- Pay Attention to the command continuation backslashes!-- This should all be 1 line!

mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A INPUT -s mysql2 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql2 -j DROP

© 2011 - 2018 Percona, Inc.83 / 107

Node Failure Exercises (cont.)Experiment 3: mysql1 stops responding to the clusterAnother network outage affecting 1 node

-- Pay Attention to the command continuation backslashes!-- This should all be 1 line!

mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A INPUT -s mysql2 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql2 -j DROP

Attempt DML on mysql1

mysql1> SELECT COUNT(*) FROM sysbench.sbtest1;ERROR 1047 (08S01): WSREP has not yet prepared node for application use

© 2011 - 2018 Percona, Inc.83 / 107

Node Failure Exercises (cont.)Experiment 3: mysql1 stops responding to the clusterAnother network outage affecting 1 node

-- Pay Attention to the command continuation backslashes!-- This should all be 1 line!

mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A INPUT -s mysql2 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql2 -j DROP

Attempt DML on mysql1

mysql1> SELECT COUNT(*) FROM sysbench.sbtest1;ERROR 1047 (08S01): WSREP has not yet prepared node for application use

After observation, clear rules

mysql1# iptables -F

© 2011 - 2018 Percona, Inc.83 / 107

A Safe Stop ClusterKeep sysbench runningExperiment 1: Clean stop of 2 of 3 nodes

mysql1# systemctl stop mysql

mysql2# systemctl stop mysql

© 2011 - 2018 Percona, Inc.84 / 107

A Safe Stop ClusterKeep sysbench runningExperiment 1: Clean stop of 2 of 3 nodes

mysql1# systemctl stop mysql

mysql2# systemctl stop mysql

Is the cluster online?

© 2011 - 2018 Percona, Inc.84 / 107

A Safe Stop ClusterKeep sysbench runningExperiment 1: Clean stop of 2 of 3 nodes

mysql1# systemctl stop mysql

mysql2# systemctl stop mysql

Is the cluster online?Why?

© 2011 - 2018 Percona, Inc.84 / 107

A Non-Primary ClusterExperiment 2: Failure of 50% cluster (leave mysql2 down)

mysql1# systemctl start mysql

-- Wait a few minutes for IST to finish and for the-- cluster to become stable. Watch the error logs and-- SHOW GLOBAL STATUS LIKE 'wsrep%'

mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP

© 2011 - 2018 Percona, Inc.85 / 107

A Non-Primary ClusterExperiment 2: Failure of 50% cluster (leave mysql2 down)

mysql1# systemctl start mysql

-- Wait a few minutes for IST to finish and for the-- cluster to become stable. Watch the error logs and-- SHOW GLOBAL STATUS LIKE 'wsrep%'

mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP

Force mysql3 to become primary like this:

mysql3> set global wsrep_provider_options="pc.bootstrap=true";

© 2011 - 2018 Percona, Inc.85 / 107

A Non-Primary ClusterExperiment 2: Failure of 50% cluster (leave mysql2 down)

mysql1# systemctl start mysql

-- Wait a few minutes for IST to finish and for the-- cluster to become stable. Watch the error logs and-- SHOW GLOBAL STATUS LIKE 'wsrep%'

mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP

Force mysql3 to become primary like this:

mysql3> set global wsrep_provider_options="pc.bootstrap=true";

Clear iptables when done

mysql1# iptables -F

© 2011 - 2018 Percona, Inc.85 / 107

Using garbd as a third nodemysql2's mysql is still stopped!Start the Galera Arbitrator

mysql2# garbd -a gcomm://mysql1,mysql3 -g mycluster

© 2011 - 2018 Percona, Inc.86 / 107

Using garbd as a third nodemysql2's mysql is still stopped!Start the Galera Arbitrator

mysql2# garbd -a gcomm://mysql1,mysql3 -g mycluster

Experiment 3: Partial mysql1 network failure

mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP

© 2011 - 2018 Percona, Inc.86 / 107

Using garbd as a third nodemysql2's mysql is still stopped!Start the Galera Arbitrator

mysql2# garbd -a gcomm://mysql1,mysql3 -g mycluster

Experiment 3: Partial mysql1 network failure

mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP

Clear iptables when done

mysql1# iptables -F

© 2011 - 2018 Percona, Inc.86 / 107

Using garbd as a third nodemysql2's mysql is still stopped!Start the Galera Arbitrator

mysql2# garbd -a gcomm://mysql1,mysql3 -g mycluster

Experiment 3: Partial mysql1 network failure

mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP

Clear iptables when done

mysql1# iptables -F

Stop garbd by pressing ctrl-c

© 2011 - 2018 Percona, Inc.86 / 107

Recovering from a clean all-down stateStop mysql on all 3 nodes

$ systemctl stop mysql -- may need to stop mysql@bootstrap

© 2011 - 2018 Percona, Inc.87 / 107

Recovering from a clean all-down stateStop mysql on all 3 nodes

$ systemctl stop mysql -- may need to stop mysql@bootstrap

Use the grastate.dat to find the node with the highestseqno

$ cat /var/lib/mysql/grastate.dat

© 2011 - 2018 Percona, Inc.87 / 107

Recovering from a clean all-down stateStop mysql on all 3 nodes

$ systemctl stop mysql -- may need to stop mysql@bootstrap

Use the grastate.dat to find the node with the highestseqno

$ cat /var/lib/mysql/grastate.dat

Bootstrap that node and then start the others normally

$ systemctl start mysql@bootstrap -- or systemctl start mysql

© 2011 - 2018 Percona, Inc.87 / 107

Recovering from a clean all-down stateStop mysql on all 3 nodes

$ systemctl stop mysql -- may need to stop mysql@bootstrap

Use the grastate.dat to find the node with the highestseqno

$ cat /var/lib/mysql/grastate.dat

Bootstrap that node and then start the others normally

$ systemctl start mysql@bootstrap -- or systemctl start mysql

Why bootstrap the node with the highest seqno?Why can't the cluster just "figure it out"?

© 2011 - 2018 Percona, Inc.87 / 107

Recovering a dirty all-down stateKill -9 mysqld on all 3 nodes

$ killall -9 mysqld_safe; killall -9 mysqld

© 2011 - 2018 Percona, Inc.88 / 107

Recovering a dirty all-down stateKill -9 mysqld on all 3 nodes

$ killall -9 mysqld_safe; killall -9 mysqld

Check the grastate.dat again. Is it useful?

$ cat /var/lib/mysql/grastate.dat

© 2011 - 2018 Percona, Inc.88 / 107

Recovering a dirty all-down stateKill -9 mysqld on all 3 nodes

$ killall -9 mysqld_safe; killall -9 mysqld

Check the grastate.dat again. Is it useful?

$ cat /var/lib/mysql/grastate.dat

Nope! Why?

© 2011 - 2018 Percona, Inc.88 / 107

Recovering a dirty all-down stateKill -9 mysqld on all 3 nodes

$ killall -9 mysqld_safe; killall -9 mysqld

Check the grastate.dat again. Is it useful?

$ cat /var/lib/mysql/grastate.dat

Nope! Why?Ungraceful shutdown. No opportunity to writeinformation to file.

© 2011 - 2018 Percona, Inc.88 / 107

Recovering a dirty all-down state (cont)Thankfully, Galera GTID:seqno is written to the InnoDBREDO log as part of each transactionTry wsrep_recover on each node to get the seqno

$ mysqld_safe --wsrep_recover

© 2011 - 2018 Percona, Inc.89 / 107

Recovering a dirty all-down state (cont)Thankfully, Galera GTID:seqno is written to the InnoDBREDO log as part of each transactionTry wsrep_recover on each node to get the seqno

$ mysqld_safe --wsrep_recover

Fix grastate.dat with recovery infoBootstrap the highest node and then start the othersnormally

© 2011 - 2018 Percona, Inc.89 / 107

Recovering a dirty all-down state (cont)Thankfully, Galera GTID:seqno is written to the InnoDBREDO log as part of each transactionTry wsrep_recover on each node to get the seqno

$ mysqld_safe --wsrep_recover

Fix grastate.dat with recovery infoBootstrap the highest node and then start the othersnormallyWhat would happen if we started the wrong node?

© 2011 - 2018 Percona, Inc.89 / 107

Weighted QuorumAll nodes have a default weight of 1.Primary state is determined by being strictly greater than 50%of last stable weighted total.Change weight of individual nodes to affect calculation:

SET GLOBAL wsrep_provider_options ='pc.weight = 2';

mysql1: pc.weight = 2mysql2: pc.weight = 1mysql3: pc.weight = 0

Killing mysql2 and mysql3 simultaneously preserves thePrimary state on mysql1.Killing mysql1 causes mysql2 and mysql3 to become non-primary components.

© 2011 - 2018 Percona, Inc.90 / 107

XtraDB Cluster Tutorial

AVOIDING SST

© 2011 - 2018 Percona, Inc.91 / 107

Why Avoid an SST?SST is a slow network process

Approximately 100GB/hour over gigabit networkSST is a CPU/disk consumer

Node is processing write-sets while also trying tostream a backup

SST can cause nodes to fall behindor return inaccurate data

Use xtrabackup to create a "seed" from which to IST

© 2011 - 2018 Percona, Inc.92 / 107

Manual State Transfer with XtrabackupTake a backup on mysql3 (sysbench on app)

mysql3$ xtrabackup --backup --galera-info --target-dir /var/backup/

Stop mysql3 and prepare the backup

mysql3$ systemctl stop mysql

mysql3$ xtrabackup --prepare --target-dir /var/backup/

mysql3$ rm -rf /var/lib/mysql/*

© 2011 - 2018 Percona, Inc.93 / 107

Manual State Transfer with XtrabackupCopy back the backup

mysql3$ xtrabackup --copy-back --target-dir /var/backup/

Get GTID from backup and recreate grastate.dat

mysql3$ cd /var/lib/mysql

mysql3$ cp xtrabackup_galera_info grastate.dat

mysql3$ nano grastate.dat

# GALERA saved stateversion: 2.1uuid: 8797f811-7f73-11e2-0800-8b513b3819c1seqno: 22809safe_to_bootstrap: 0

mysql3$ chown -R mysql:mysql /var/lib/mysql

mysql3$ systemctl start mysql

© 2011 - 2018 Percona, Inc.94 / 107

When you WANT an SSTmysql3# cat /var/lib/mysql/grastate.dat

Turn off replication for the session:

mysql3> SET wsrep_on = OFF;

sysbench should be running on appRepeat the DELETE until mysql3 crashes

mysql3> DELETE FROM sysbench.sbtest1 LIMIT 50000;..after crash..mysql3# cat /var/lib/mysql/grastate.dat

mysql3# tail -500 /var/log/mysqld.log

What is in the error log after line the crash?Should we try to avoid SST in this case?

© 2011 - 2018 Percona, Inc.95 / 107

XtraDB Cluster Tutorial

TUNING REPLICATION

© 2011 - 2018 Percona, Inc.96 / 107

What is Flow Control?Ability for any node in the cluster to tell the other nodes topause writes while it catches up

Cry for help!Feedback mechanism for the replication processONLY caused by wsrep_local_recv_queue exceedinga node's fc_limitThis can pause the entire cluster and look like a clusterstall!

© 2011 - 2018 Percona, Inc.97 / 107

Observing Flow ControlRun sysbench on app as usualTake a read lock on mysql1 and observe its effect on thecluster

mysql1> SET GLOBAL pxc_strict_mode = 0;mysql1> LOCK TABLE sysbench.sbtest1 WRITE;

What happens to sysbench throughput?What are the symptoms of flow control?How can you detect flow control?

SHOW GLOBAL STATUS LIKE 'wsrep_flow%';Cleanup:

UNLOCK TABLES;

© 2011 - 2018 Percona, Inc.98 / 107

Flow Control Parametersgcs.fc_limit

Pause replication if recv queue exceeds this manywritesetsDefault: 100. For master-slave setups this number canbe increased considerably

gcs.fc_master_slaveWhen this is NO, effective gcs.fc_limit is multipledby sqrt( # of cluster members ). Default: NOEx: gcs.fc_limit = 100 * sqrt(3) = 173

gcs.fc_factorControls the high and low mark for sending andstopping of flow control messages. Default: 1.0

© 2011 - 2018 Percona, Inc.99 / 107

Avoiding Flow Control during BackupsMost backups use FLUSH TABLES WITH READ LOCKThis should cause flow control almost immediatelyIn XtraDB Cluster 5.7+, FTWRL automatically DESYNC'sthe node:

DESYNC blocks flow-control messageNo incoming writesets are processedWritesets are still appended to gcache for processingafter lock is releasedFlow-control may be issued during backlog processing

© 2011 - 2018 Percona, Inc.100 / 107

More Notes on Flow ControlIf you set it too low:

Too many cries for help from random nodes in thecluster

If you set it too high:Increases in replication conflicts in multi-node writingsIncreases in apply/commit lag on nodes

That one node with FC issues:Deal with that node: bad hardware? not same specs?

wsrep_flow_control_status ON/OFF - Easyindication if FC is currently happening

© 2011 - 2018 Percona, Inc.101 / 107

Wsrep Sync WaitApply is asynchronous, reads after writes on differentnodes may see older view of the databaseIn reality, flow control prevents this from becoming likeregular Async replicationTighter consistency can be enforced by ensuring thereplication queue is flushed before a read

SET [GLOBAL] wsrep_sync_wait=[0-7];Setting it ensures synchronous read view beforeexecuting an operation:

0 - Disabled / 1 - READ (SELECT, BEGIN)2 - UPDATE and DELETE3 - READ, UPDATE and DELETE4 - INSERT and REPLACE

© 2011 - 2018 Percona, Inc.102 / 107

XtraDB Cluster Tutorial

CONCLUSION

© 2011 - 2018 Percona, Inc.103 / 107

LimitationsDoes not work as expected:

InnoDB/XtraDB Onlytx_isolation = SERIALIZABLEGET_LOCK()LOCK TABLESSELECT ... FOR UPDATECareful with ALTER TABLE ... IMPORT/EXPORT.Capped maximum transaction size.XA transactions.

© 2011 - 2018 Percona, Inc.104 / 107

Common Configuration Parameterswsrep_sst_auth = un:pw Set SST Authentication

wsrep_osu_method = 'TOI' Set DDL Method

wsrep_desync = ON; Enable "Replication"

wsrep_slave_threads = 4; Number of applier threads

wsrep_retry_autocommit = 4; Number of commit retries

wsrep_provider_options:  

* gcs.fc_limit = 500 Increase flow control threshold

* pc.bootstrap = true Put node into single mode

* gcache.size = 2G Increase Galera Cache file

* pc.weight = 2 Increase voting of a node

wsrep_sst_donor = mysql1 Specify donor node

© 2011 - 2018 Percona, Inc.105 / 107

Common Status Counters

wsrep_cluster_status Primary / Non-Primarywsrep_local_state_comment Synced / Donor / Desyncwsrep_cluster_size Number of online nodes

wsrep_flow_control_paused Number of seconds clusterhas paused

wsrep_local_recv_queue Current number of write-setswaiting to be applied

© 2011 - 2018 Percona, Inc.106 / 107

Questions?

© 2011 - 2018 Percona, Inc.(*) Slide content authors: Jay Janssen, Kenny Gryp, Fred Descamps, Matthew Boehm

107 / 107

Recommended