Upload
others
View
17
Download
0
Embed Size (px)
Citation preview
Percona XtraDB Cluster Tutorialhttp://www.percona.com/training/
Table of Contents
1. Overview & Setup 6. Incremental State Transfer2. Application HA 7. Node Failure Scenarios3. Monitoring Galera withPMM 8 Avoiding SST
4. Galera ReplicationInternals 9. Tuning Replication
5. Online Schema Changes
© 2011 - 2018 Percona, Inc.2 / 107
XtraDB Cluster Tutorial
OVERVIEW
© 2011 - 2018 Percona, Inc.3 / 107
Resourceshttp://www.percona.com/doc/percona-xtradb-cluster/5.7/http://www.percona.com/blog/category/percona-xtradb-cluster/http://galeracluster.com/documentation-webpages/reference.html
© 2011 - 2018 Percona, Inc.4 / 107
Lab InstructionsGet a USB stick (there are only 5)Copy the 2018-pxc-tutorial.ova to your laptopCopy VirtualBox installer to your laptop, if necessaryEJECT USB STICK and had over to someone elseInstall VirtualBox, if necessaryOpen VirtualBox and import the OVAStart all 4 virtual machinesHide/Minimize the console windows; you will not usethemWindows users - you need an SSH client (download onenow!)
© 2011 - 2018 Percona, Inc.5 / 107
Standard ReplicationReplication is a mechanism for recording a series ofchanges on one MySQL server and applying the samechanges to one, or more, other MySQL server.The source is the "master" and its replica is a "slave."
© 2011 - 2018 Percona, Inc.6 / 107
Galera ReplicationDeveloped by CodershipSynchronous multi-master database clusterWrite to any node; every node stays in sync.
© 2011 - 2018 Percona, Inc.7 / 107
Quick Tutorial Setup3 unconfigured MySQL Servers (mysql1, mysql2, mysql3)app will run sysbench
Sample application, simulating activity
© 2011 - 2018 Percona, Inc.8 / 107
Verify Connectivity and SetupConnect to all VMs
Use separate terminal windows; we will be switchingbetween them frequently.Username and password is vagrantDO NOT USE VIRTUAL BOX CONSOLE WINDOWS!
ssh -p 2200 vagrant@localhostssh -p 2201 vagrant@localhostssh -p 2202 vagrant@localhostssh -p 2222 vagrant@localhost
© 2011 - 2018 Percona, Inc.9 / 107
Verify Connectivity and Setup (cont)Be sure to sudo to rootStop MySQL if running on all nodes
# systemctl stop mysql
When you restart the Virtual Machines, they auto-startMySQL and create a default setupErase MySQL's $DATADIR
# rm -rf /var/lib/mysql/*
© 2011 - 2018 Percona, Inc.10 / 107
Configure mysql3 for PXC/etc/percona-xtradb-cluster.conf.d/wsrep.cnf
...wsrep_provider = /usr/lib64/galera3/libgalera_smm.sowsrep_cluster_address = gcomm://mysql1,mysql2,mysql3wsrep_node_address = 192.168.70.30wsrep_cluster_name = myclusterwsrep_node_name = mysql3wsrep_sst_auth = "sstuser:s3cretPass"
Make sure any of the above are NOT COMMENTED!Start mysql3 with:
mysql3# systemctl start mysql@bootstrap
© 2011 - 2018 Percona, Inc.11 / 107
Checking Cluster StateMySQL 5.7 generates random root password.
Find it and reset it.
$ grep password /var/log/mysqld.log... password is generated for root@localhost: :#xfgy*aD9ea
$ mysql -e "SET PASSWORD='Percona1234#'" -uroot -p \--connect-expired-password
Persist password in ~/.my.cnf on all 3 nodes to makethings easier
$ cat ~/.my.cnf[client]password="Percona1234#"
© 2011 - 2018 Percona, Inc.12 / 107
Checking Cluster StateCan you tell:
How many nodes are in the cluster?Is the cluster Primary?
Use 'myq_status'* to check the state of Galera on mysql3:
mysql3# /usr/local/bin/myq_status wsrep
Check status counters manually
mysql> SHOW GLOBAL STATUS LIKE 'wsrep%'
© 2011 - 2018 Percona, Inc.(*) https://github.com/jayjanssen/myq-tools/releases/tag/1.0.4
13 / 107
mysql2's Config/etc/percona-xtradb-cluster.conf.d/wsrep.cnf
...wsrep_provider = /usr/lib64/galera3/libgalera_smm.sowsrep_cluster_address = gcomm://mysql1,mysql2,mysql3wsrep_node_address = 192.168.70.20wsrep_cluster_name = myclusterwsrep_node_name = mysql2wsrep_sst_auth = "sstuser:s3cretPass"
Don't start MySQL yet!
© 2011 - 2018 Percona, Inc.14 / 107
What's going to happen?Don't start MySQL yet!When mysql2 is started, what's going to happen?How does mysql2 get a copy of the dataset?
Can it use the existing data it already has?Do we have to bootstrap mysql2?
© 2011 - 2018 Percona, Inc.15 / 107
State Snapshot Transfer (SST)Transfer a full backup of an existing cluster member(donor) to a new node entering the cluster (joiner).We configured our SST method to use ‘xtrabackup-v2'.
© 2011 - 2018 Percona, Inc.16 / 107
Additional Xtrabackup SST SetupWhat about that sst user in your wsrep.cnf?
[mysqld]...wsrep_sst_auth = "sstuser:s3cretPass"...
User/Grants need to be added for Xtrabackup
mysql3> CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 's3cretPass';mysql3> GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENTON *.* TO 'sstuser'@'localhost';
The good news is that once you get things figured out thefirst time, it's typically very easy to get an SST the firsttime on subsequent nodes.
© 2011 - 2018 Percona, Inc.17 / 107
Up and Running mysql2Watch mysql3's error log for progress
mysql3# tail -f /var/log/mysqld.log
Now you can start mysql2
mysql2# systemctl start mysql
© 2011 - 2018 Percona, Inc.18 / 107
mysql1's Config...wsrep_provider = /usr/lib64/galera3/libgalera_smm.sowsrep_cluster_address = gcomm://mysql1,mysql2,mysql3wsrep_node_address = 192.168.70.10wsrep_cluster_name = myclusterwsrep_node_name = mysql1wsrep_sst_auth = "sstuser:s3cretPass"
Now you can start mysql1
mysql1# systemctl start mysql
© 2011 - 2018 Percona, Inc.19 / 107
Where are We Now?
© 2011 - 2018 Percona, Inc.20 / 107
XtraDB Cluster Tutorial
APPLICATION HIGH AVAILABILITY
© 2011 - 2018 Percona, Inc.21 / 107
ProxySQL Architecture
© 2011 - 2018 Percona, Inc.22 / 107
Setting up ProxySQLCreate monitor user
mysql3> CREATE USER 'monitor'@'%' IDENTIFIED BY 'monit0r';
Create Cluster Admin User
mysql3> CREATE USER 'cuser'@'%' IDENTIFIED BY 'cpass';mysql3> GRANT ALL ON *.* TO 'cuser'@'%';
© 2011 - 2018 Percona, Inc.23 / 107
Setting up ProxySQLCreate monitor user
mysql3> CREATE USER 'monitor'@'%' IDENTIFIED BY 'monit0r';
Create Cluster Admin User
mysql3> CREATE USER 'cuser'@'%' IDENTIFIED BY 'cpass';mysql3> GRANT ALL ON *.* TO 'cuser'@'%';
Do we have to create these users on the other nodes?
© 2011 - 2018 Percona, Inc.23 / 107
Configure ProxySQLCreate and edit the file
app# nano /etc/proxysql-admin.cnf
export PROXYSQL_USERNAME="admin"export PROXYSQL_PASSWORD="admin"export PROXYSQL_HOSTNAME="localhost"export PROXYSQL_PORT="6032"export CLUSTER_USERNAME="cuser"export CLUSTER_PASSWORD="cpass"export CLUSTER_HOSTNAME="mysql3"export CLUSTER_PORT="3306"export MONITOR_USERNAME="monitor"export MONITOR_PASSWORD="monit0r"export CLUSTER_APP_USERNAME="sbuser"export CLUSTER_APP_PASSWORD="sbPass1234#"export WRITE_HOSTGROUP_ID="10"export READ_HOSTGROUP_ID="11"export MODE="singlewrite"export HOST_PRIORITY_FILE="/var/lib/proxysql/host_priority.conf"
© 2011 - 2018 Percona, Inc.24 / 107
Start ProxySQLStart ProxySQL with the following command:
app# systemctl restart proxysql
app# proxysql-admin --config-file=/etc/proxysql-admin.cnf \--write-node=192.168.70.30:3306 --enable
© 2011 - 2018 Percona, Inc.25 / 107
Start ProxySQLStart ProxySQL with the following command:
app# systemctl restart proxysql
app# proxysql-admin --config-file=/etc/proxysql-admin.cnf \--write-node=192.168.70.30:3306 --enable
mysql3 is the writer nodeGrant all privileges to app user
mysql3> CREATE DATABASE sysbench;
mysql3> GRANT ALL ON sysbench.* TO 'sbuser'@'192.%';
© 2011 - 2018 Percona, Inc.25 / 107
Check ProxySQL Statsapp# mysql -P6032 -uadmin -padmin -h 127.0.0.1
proxysql> SELECT * FROM stats_mysql_connection_pool ORDER BY hostgroup, srv_host DESC;
+-----------+---------------+----------+--------+...------------+| hostgroup | srv_host | srv_port | status |... Latency_us |+-----------+---------------+----------+--------+...------------+| 10 | 192.168.70.30 | 3306 | ONLINE |... 461 || 11 | 192.168.70.20 | 3306 | ONLINE |... 820 || 11 | 192.168.70.10 | 3306 | ONLINE |... 101 |+-----------+---------------+----------+--------+...------------+
© 2011 - 2018 Percona, Inc.26 / 107
Prepare Test DataCreate some test data for our sample application
app$ /usr/local/bin/prepare_sysbench.shsysbench 1.0.14 (using bundled LuaJIT 2.1.0-beta2)
Initializing worker threads...
Creating table 'sbtest1'...Inserting 1000000 records into 'sbtest1'Creating a secondary index on 'sbtest1'...
© 2011 - 2018 Percona, Inc.27 / 107
Start SysbenchSending Reads and Writes
Writes goes to Hostgroup 10 (mysql3)Reads goes to Hostgroup 11 (mysql1,mysql2)
app# screen /usr/local/bin/run_sysbench_oltp.sh
[ 1s] threads: 1, tps: 9.99, reads: 139.89, writes: 39.97, ...[ 2s] threads: 1, tps: 7.00, reads: 98.04, writes: 28.01, ...[ 3s] threads: 1, tps: 7.00, reads: 98.03, writes: 28.01, ...[ 4s] threads: 1, tps: 14.00, reads: 195.93, writes: 55.98, ...
Check ProxySQL Stats
app# mysql -P6032 -uadmin -padmin -h 127.0.0.1
proxysql> SELECT * from stats_mysql_connection_poolORDER BY hostgroup, srv_host DESC;
© 2011 - 2018 Percona, Inc.28 / 107
Test mysql2 ShutdownStart myq_status on mysql3Shut down mysql2
systemctl stop mysqlCheck ProxySQL Stats
proxysql> SELECT * from stats_mysql_connection_poolORDER BY hostgroup, srv_host DESC;+-----------+---------------+----------+--------..| hostgroup | srv_host | srv_port | status ..+-----------+---------------+----------+--------..| 10 | 192.168.70.30 | 3306 | ONLINE ..| 11 | 192.168.70.20 | 3306 | OFFLINE_HARD ..| 11 | 192.168.70.10 | 3306 | ONLINE ..+-----------+---------------+----------+--------..
Start mysql2 back upmysql2# systemctl restart mysql
© 2011 - 2018 Percona, Inc.29 / 107
Crash TestStop mysql1 destructively:
mysql1# killall -9 mysqld_safe; killall -9 mysqld
Watch sysbech and myq_statusCheck ProxySQL statusRestart mysql2
mysql1# systemctl restart mysql
© 2011 - 2018 Percona, Inc.30 / 107
Stop writer nodeStop MySQL on mysql3Check Sysbench and myq_statusCheck ProxySQL status
mysql3# systemctl stop mysql@bootstrap
proxysql> SELECT * from stats_mysql_connection_poolORDER BY hostgroup, srv_host DESC;
+-----------+--------------+----------+--------...| hostgroup | srv_host | srv_port | status ...+-----------+--------------+----------+--------...| 10 | 192.168.70.2 | 3306 | ONLINE ...| 11 | 192.168.70.3 | 3306 | ONLINE ...+-----------+--------------+----------+--------...
mysql1 is in the write hostgroup now.
© 2011 - 2018 Percona, Inc.31 / 107
Restarting mysql3mysql3 was bootstrapped earlier
What did that mean?Do we have to bootstrap mysql3 again?
© 2011 - 2018 Percona, Inc.32 / 107
Restarting mysql3mysql3 was bootstrapped earlier
What did that mean?Do we have to bootstrap mysql3 again?
No, we already having a running cluster.Bootstrapping creates a new cluster.Restart mysql3 normally
mysql3# systemctl restart mysql
© 2011 - 2018 Percona, Inc.32 / 107
XtraDB Cluster Tutorial
MONITORING GALERA with PMM
© 2011 - 2018 Percona, Inc.33 / 107
Percona Monitoring and Management
© 2011 - 2018 Percona, Inc.(*) https://pmmdemo.percona.com/
34 / 107
Create PMM containerStart DockerCreate PMM Container
app# docker create \ -v /opt/prometheus/data \ -v /opt/consul-data \ -v /var/lib/mysql \ -v /var/lib/grafana \ --name pmm-data \ percona/pmm-server:latest /bin/true
© 2011 - 2018 Percona, Inc.35 / 107
Start PMM containerapp# docker run -d \ -p 80:80 \ --volumes-from pmm-data \ --name pmm-server \ --restart always \ percona/pmm-server:latest
Check the dashboardOpen http://192.168.70.200/graph/
© 2011 - 2018 Percona, Inc.36 / 107
Configure PMM Clients on All nodesRun the following on all 3 cluster nodes and app server
mysql*# pmm-admin config --server 192.168.70.200mysql*# pmm-admin add mysql
app# pmm-admin config --server 127.0.0.1app# pmm-admin add proxysql:metrics
Check the dashboard againOpen http://192.168.70.200/graph/
© 2011 - 2018 Percona, Inc.37 / 107
XtraDB Cluster Tutorial
GALERA REPLICATION INTERNALS
© 2011 - 2018 Percona, Inc.38 / 107
Group ReplicationBased on Totem Single-ring OrderingAll nodes have to ACK messageUses binlog row events
But does not require binary loggingWrites events to Gcache (configurable size)Nodes self-subscribe to communication channel
© 2011 - 2018 Percona, Inc.39 / 107
Replication RolesWithin the cluster, all nodes are equalmaster/donor node
The source of the transaction.slave/joiner node
Receiver of the transaction.Writeset: A transaction; one or more row changes
© 2011 - 2018 Percona, Inc.40 / 107
State Transfer RolesNew nodes joining an existing cluster get provisionedautomatically
Joiner = New nodeDonor = Node giving a copy of the data
State Snapshot transferFull backup of Donor to Joiner
Incremental Snapshot transferOnly changes since node left cluster
© 2011 - 2018 Percona, Inc.41 / 107
What's inside a write-set?The payload; Generated from the RBR eventKeys; Generated by the source node
Primary KeysUnique KeysForeign Keys
Keys are how certification happesA PRIMARY KEY on every table is required.
© 2011 - 2018 Percona, Inc.42 / 107
Replication StepsAs Master/Donor
ApplyReplicateCertifyCommit
As Slave/JoinerReplicate (receive)Certify (includes reply to master/donor)ApplyCommit
© 2011 - 2018 Percona, Inc.43 / 107
What is Certification?In a nutshell: "Can this transaction be applied?"Happens on every node for all write-setsDeterministicResults of certification are not relayed back tomaster/donor
In event of failure, drop transaction locally; Possiblyremove self from cluster.On success, write-set enters apply queue
Serialized via group communication protocol
© 2011 - 2018 Percona, Inc.(*) https://www.percona.com/blog/2016/04/17/how-percona-xtradb-cluster-certification-works/
44 / 107
The ApplyApply is done locally, asynchronously, on each node, aftercertification.Can be parallelized like standard async replication
wsrep_slave_threads > 1If there is a conflict, will create brute force aborts (bfa) onlocal node.
© 2011 - 2018 Percona, Inc.45 / 107
Last but not least, Commit StageLocal, on each node, InnoDB commit.Applier thread executes on slave/joiner nodes.Regular connection thread for master/donor node.
© 2011 - 2018 Percona, Inc.46 / 107
Galera Replication Example
© 2011 - 2018 Percona, Inc.47 / 107
Galera Replication Example
© 2011 - 2018 Percona, Inc.48 / 107
Galera Replication Example
© 2011 - 2018 Percona, Inc.49 / 107
Galera Replication Example
© 2011 - 2018 Percona, Inc.50 / 107
Galera Replication Example
© 2011 - 2018 Percona, Inc.51 / 107
Galera Replication Example
© 2011 - 2018 Percona, Inc.52 / 107
Galera Replication Example
© 2011 - 2018 Percona, Inc.53 / 107
Galera Replication Example
© 2011 - 2018 Percona, Inc.54 / 107
Galera Replication Example
© 2011 - 2018 Percona, Inc.55 / 107
Galera Replication Facts"Replication" is synchronous; slave apply is notA successful commit reply to client means a transactionWILL get applied across the entire cluster.A node that tries to commit a conflicting transactionbefore will get a deadlock error.
This is called a local certification failure or 'lcf'When our transaction is applied, any competingtransactions that haven't committed will get a deadlockerror.
This is called a brute force abort or 'bfa'
© 2011 - 2018 Percona, Inc.56 / 107
Brute Force Abort Example
© 2011 - 2018 Percona, Inc.57 / 107
Brute Force Abort Example
© 2011 - 2018 Percona, Inc.58 / 107
Brute Force Abort Example
© 2011 - 2018 Percona, Inc.59 / 107
Brute Force Abort Example
© 2011 - 2018 Percona, Inc.60 / 107
Brute Force Abort Example
© 2011 - 2018 Percona, Inc.61 / 107
Brute Force Abort Example
© 2011 - 2018 Percona, Inc.62 / 107
Brute Force Abort Example
© 2011 - 2018 Percona, Inc.63 / 107
Brute Force Abort ExerciseCreate a test table
mysql1> CREATE DATABASE test;mysql1> CREATE TABLE test.deadlocks (i INT UNSIGNED NOT NULLPRIMARY KEY, j VARCHAR(32) );
mysql1> INSERT INTO test.deadlocks VALUES (1, NULL);mysql1> BEGIN; -- Start explicit transactionmysql1> UPDATE test.deadlocks SET j = 'mysql1' WHERE i = 1;
Before we commit on mysql1, go to mysql3 in a separatewindow:
mysql3> BEGIN; -- Start transactionmysql3> UPDATE test.deadlocks SET j = 'mysql3' WHERE i = 1;
mysql3> COMMIT;
mysql1> COMMIT;
mysql1> SELECT * FROM test.deadlocks;
© 2011 - 2018 Percona, Inc.64 / 107
Unexpected Deadlocks (cont.)Which commit succeeds?Will only a COMMIT generate the deadlock error?Is this a lcf or bfa?How would you diagnose this error?
SHOW STATUS LIKE 'wsrep_local_bf%';
© 2011 - 2018 Percona, Inc.65 / 107
When could you multi-node write?When your application can handle the deadlocksgracefully
wsrep_retry_autocommit can helpApplication-specific schemasIsolated traffic to specific tables
© 2011 - 2018 Percona, Inc.66 / 107
AUTO_INCREMENT HandlingPick one node to work on:
mysql3> CREATE TABLE sysbench.autoinc (i INT NOT NULL AUTO_INCREMENT PRIMARY KEY, j VARCHAR(32) );
mysql3> INSERT INTO sysbench.autoinc (j) VALUES ('mysql3');mysql3> INSERT INTO sysbench.autoinc (j) VALUES ('mysql3');
mysql3> SELECT * FROM sysbench.autoinc;
What is odd about the values for 'i'?What happens if you do the inserts on each node inorder?Check wsrep_auto_increment_control andauto_increment% global variables
© 2011 - 2018 Percona, Inc.67 / 107
Encrypt all the Things!MySQL 5.7+ out-of-box SSL ready
SSL certificates created if not presentSST/IST/Galera traffic not encrypted by defaultSet pxc-encrypt-cluster-traffic=ON
Passes generated certs by MySQL to GaleraConfigure manually:
[mysqld]wsrep_provider_options="socket.ssl=yes;socket.ssl_key=/path/to/server-key.pem;socket.ssl_cert=/path/to/server-cert.pem;socket.ssl_ca=/path/to/cacert.pem"
[sst]encrypt=4ssl-ca=<PATH>/ca.pemssl-cert=<PATH>/server-cert.pemssl-key=<PATH>/server-key.pem
Must bootstrap cluster using SSL
© 2011 - 2018 Percona, Inc.68 / 107
XtraDB Cluster Tutorial
ONLINE SCHEMA CHANGES
© 2011 - 2018 Percona, Inc.69 / 107
How do we ALTER TABLEStandard MySQL: Execute directly
Lock table, create new table, copy rows, swap, releaselocks
© 2011 - 2018 Percona, Inc.70 / 107
How do we ALTER TABLEStandard MySQL: Execute directly
Lock table, create new table, copy rows, swap, releaselocks
How do you maintain consistency in XtraDB Cluster?"Total Order Isolation" - TOI (default)
© 2011 - 2018 Percona, Inc.70 / 107
How do we ALTER TABLEStandard MySQL: Execute directly
Lock table, create new table, copy rows, swap, releaselocks
How do you maintain consistency in XtraDB Cluster?"Total Order Isolation" - TOI (default)
Executed across all nodes, simultaneouslyCan impact applications
© 2011 - 2018 Percona, Inc.70 / 107
How do we ALTER TABLEStandard MySQL: Execute directly
Lock table, create new table, copy rows, swap, releaselocks
How do you maintain consistency in XtraDB Cluster?"Total Order Isolation" - TOI (default)
Executed across all nodes, simultaneouslyCan impact applications
"Rolling Schema Upgrades" - RSU
© 2011 - 2018 Percona, Inc.70 / 107
How do we ALTER TABLEStandard MySQL: Execute directly
Lock table, create new table, copy rows, swap, releaselocks
How do you maintain consistency in XtraDB Cluster?"Total Order Isolation" - TOI (default)
Executed across all nodes, simultaneouslyCan impact applications
"Rolling Schema Upgrades" - RSUDoes not replicate DDLRun schema change on each node
© 2011 - 2018 Percona, Inc.70 / 107
How do we ALTER TABLEStandard MySQL: Execute directly
Lock table, create new table, copy rows, swap, releaselocks
How do you maintain consistency in XtraDB Cluster?"Total Order Isolation" - TOI (default)
Executed across all nodes, simultaneouslyCan impact applications
"Rolling Schema Upgrades" - RSUDoes not replicate DDLRun schema change on each node
pt-online-schema-change
© 2011 - 2018 Percona, Inc.70 / 107
Direct ALTER TABLE - TOIEnsure sysbench is running on app as usualTry these ALTER statements on the nodes listed and notethe effect on sysbench throughput.
mysql1> ALTER TABLE sysbench.sbtest1 ADD COLUMN `m` varchar(32);
© 2011 - 2018 Percona, Inc.71 / 107
Direct ALTER TABLE - TOIEnsure sysbench is running on app as usualTry these ALTER statements on the nodes listed and notethe effect on sysbench throughput.
mysql1> ALTER TABLE sysbench.sbtest1 ADD COLUMN `m` varchar(32);
What about altering a table that sysbench is not using?
© 2011 - 2018 Percona, Inc.71 / 107
Direct ALTER TABLE - TOIEnsure sysbench is running on app as usualTry these ALTER statements on the nodes listed and notethe effect on sysbench throughput.
mysql1> ALTER TABLE sysbench.sbtest1 ADD COLUMN `m` varchar(32);
What about altering a table that sysbench is not using?
mysql2> ALTER TABLE sysbench.large ADD COLUMN `o` varchar(32);
© 2011 - 2018 Percona, Inc.71 / 107
Direct ALTER TABLE - TOIEnsure sysbench is running on app as usualTry these ALTER statements on the nodes listed and notethe effect on sysbench throughput.
mysql1> ALTER TABLE sysbench.sbtest1 ADD COLUMN `m` varchar(32);
What about altering a table that sysbench is not using?
mysql2> ALTER TABLE sysbench.large ADD COLUMN `o` varchar(32);
TOI blocks everything. Doesn't matter if the application isusing the table or not, everything halts while the alterhappens.
© 2011 - 2018 Percona, Inc.71 / 107
Direct ALTER TABLE - RSUChange to RSU and observe sysbench
mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 ADD COLUMN `q` varchar(32);
© 2011 - 2018 Percona, Inc.72 / 107
Direct ALTER TABLE - RSUChange to RSU and observe sysbench
mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 ADD COLUMN `q` varchar(32);
There's now an inconsistency in the schema betweenmysql1 and the others.In a deployment, you would have to run the ALTER oneach node BEFORE executing new application code.
© 2011 - 2018 Percona, Inc.72 / 107
Direct ALTER TABLE - RSU (cont.)What happens if we drop a column the application isactively using?
Make sure you are in RSU mode
mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 DROP COLUMN `c`;
© 2011 - 2018 Percona, Inc.73 / 107
Direct ALTER TABLE - RSU (cont.)What happens if we drop a column the application isactively using?
Make sure you are in RSU mode
mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 DROP COLUMN `c`;
What happens after dropping 'c'?
© 2011 - 2018 Percona, Inc.73 / 107
Direct ALTER TABLE - RSU (cont.)What happens if we drop a column the application isactively using?
Make sure you are in RSU mode
mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 DROP COLUMN `c`;
What happens after dropping 'c'?
mysql1# tail -1000 /var/log/mysqld.log
© 2011 - 2018 Percona, Inc.73 / 107
Direct ALTER TABLE - RSU (cont.)What happens if we drop a column the application isactively using?
Make sure you are in RSU mode
mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 DROP COLUMN `c`;
What happens after dropping 'c'?
mysql1# tail -1000 /var/log/mysqld.log
Why?
© 2011 - 2018 Percona, Inc.73 / 107
Direct ALTER TABLE - RSU (cont.)What happens if we drop a column the application isactively using?
Make sure you are in RSU mode
mysql1> SET wsrep_osu_method = 'RSU';mysql1> ALTER TABLE sysbench.sbtest1 DROP COLUMN `c`;
What happens after dropping 'c'?
mysql1# tail -1000 /var/log/mysqld.log
Why?How do you recover?When is RSU safe and unsafe to use?
© 2011 - 2018 Percona, Inc.73 / 107
pt-online-schema-change
Verify mysql1 is in TOI mode
mysql1> SELECT @@wsrep_osu_method;+--------------------+| @@wsrep_osu_method |+--------------------+| TOI |+--------------------+
© 2011 - 2018 Percona, Inc.74 / 107
pt-online-schema-change
Verify mysql1 is in TOI mode
mysql1> SELECT @@wsrep_osu_method;+--------------------+| @@wsrep_osu_method |+--------------------+| TOI |+--------------------+
Let's add one final column and observe sysbench results
mysql1# pt-online-schema-change --chunk-size=1000 --alter \ "ADD COLUMN z varchar(32)" --execute D=sysbench,t=sbtest1
© 2011 - 2018 Percona, Inc.74 / 107
pt-online-schema-change
Verify mysql1 is in TOI mode
mysql1> SELECT @@wsrep_osu_method;+--------------------+| @@wsrep_osu_method |+--------------------+| TOI |+--------------------+
Let's add one final column and observe sysbench results
mysql1# pt-online-schema-change --chunk-size=1000 --alter \ "ADD COLUMN z varchar(32)" --execute D=sysbench,t=sbtest1
Was pt-online-schema-change any better at doingDDL?Was the application blocked at all?What are the caveats to using pt-osc?
© 2011 - 2018 Percona, Inc.74 / 107
XtraDB Cluster Tutorial
INCREMENTAL STATE TRANSFER
© 2011 - 2018 Percona, Inc.75 / 107
Incremental State Transfer (IST)Helps to avoid full SSTWhen a node starts, it knowns the UUID of the cluster towhich it belongs and the last sequence number it applied.Upon connecting to other cluster nodes, broadcast thisinformation.If another node has cached write-sets, trigger IST
If no other node has, trigger SSTFirewall Alert:
IST - port 4568SST - port 4444
© 2011 - 2018 Percona, Inc.76 / 107
Galera Cache - "gcache"Write-sets are contained in the "gcache" on each nodeRepresents a single-file circular bufferPreallocated file with a specific size.
Default size is 128MCan be increase via provider option[mysqld]
wsrep_provider_options = "gcache.size=2G";
Cache is mmaped (I/O buffered to memory)wsrep_local_cached_downto - The first seqnopresent in the cache for that node
© 2011 - 2018 Percona, Inc.77 / 107
Simple IST TestMake sure sysbench is still running on app
© 2011 - 2018 Percona, Inc.78 / 107
Simple IST TestMake sure sysbench is still running on appWatch myq_status on mysql1 and mysql3
© 2011 - 2018 Percona, Inc.78 / 107
Simple IST TestMake sure sysbench is still running on appWatch myq_status on mysql1 and mysql3Stop mysql2
Wait a few minutes, then start mysql2
© 2011 - 2018 Percona, Inc.78 / 107
Simple IST TestMake sure sysbench is still running on appWatch myq_status on mysql1 and mysql3Stop mysql2
Wait a few minutes, then start mysql2How did the cluster react when the node was stoppedand restarted?Which node was the donor?
Is it easy to tell when a node is a donor for SST vs IST?
mysql3# tail -50 /var/log/mysqld.log
© 2011 - 2018 Percona, Inc.78 / 107
Tracking ISTPercona XtraDB Cluster 5.7+ adds IST monitoring
mysql> SHOW STATUS LIKE 'wsrep_ist_receive_status'\G*************************** 1. row ***************************Variable_name: wsrep_ist_receive_status Value: 52% complete, received seqno 1506799 of 1415410-15896761 row in set (0.01 sec)
© 2011 - 2018 Percona, Inc.79 / 107
XtraDB Cluster Tutorial
NODE FAILURE SCENARIOS
© 2011 - 2018 Percona, Inc.80 / 107
Node Failure ExercisesRun sysbench on app as usual
© 2011 - 2018 Percona, Inc.81 / 107
Node Failure ExercisesRun sysbench on app as usual
Experiment 1: Clean node restart
mysql1# systemctl restart mysql
© 2011 - 2018 Percona, Inc.81 / 107
Node Failure Exercises (cont.)Experiment 2: Partial network outageSimulate a network outage by using iptables to droptraffic
mysql3# tail -f /var/log/mysqld.log
mysql1# iptables -A INPUT -s mysql3 \ -j DROP; iptables -A OUTPUT -s mysql3 -j DROP
© 2011 - 2018 Percona, Inc.82 / 107
Node Failure Exercises (cont.)Experiment 2: Partial network outageSimulate a network outage by using iptables to droptraffic
mysql3# tail -f /var/log/mysqld.log
mysql1# iptables -A INPUT -s mysql3 \ -j DROP; iptables -A OUTPUT -s mysql3 -j DROP
After observation, clear rules
mysql1# iptables -F
© 2011 - 2018 Percona, Inc.82 / 107
Node Failure Exercises (cont.)Experiment 3: mysql1 stops responding to the clusterAnother network outage affecting 1 node
-- Pay Attention to the command continuation backslashes!-- This should all be 1 line!
mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A INPUT -s mysql2 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql2 -j DROP
© 2011 - 2018 Percona, Inc.83 / 107
Node Failure Exercises (cont.)Experiment 3: mysql1 stops responding to the clusterAnother network outage affecting 1 node
-- Pay Attention to the command continuation backslashes!-- This should all be 1 line!
mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A INPUT -s mysql2 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql2 -j DROP
Attempt DML on mysql1
mysql1> SELECT COUNT(*) FROM sysbench.sbtest1;ERROR 1047 (08S01): WSREP has not yet prepared node for application use
© 2011 - 2018 Percona, Inc.83 / 107
Node Failure Exercises (cont.)Experiment 3: mysql1 stops responding to the clusterAnother network outage affecting 1 node
-- Pay Attention to the command continuation backslashes!-- This should all be 1 line!
mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A INPUT -s mysql2 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql2 -j DROP
Attempt DML on mysql1
mysql1> SELECT COUNT(*) FROM sysbench.sbtest1;ERROR 1047 (08S01): WSREP has not yet prepared node for application use
After observation, clear rules
mysql1# iptables -F
© 2011 - 2018 Percona, Inc.83 / 107
A Safe Stop ClusterKeep sysbench runningExperiment 1: Clean stop of 2 of 3 nodes
mysql1# systemctl stop mysql
mysql2# systemctl stop mysql
© 2011 - 2018 Percona, Inc.84 / 107
A Safe Stop ClusterKeep sysbench runningExperiment 1: Clean stop of 2 of 3 nodes
mysql1# systemctl stop mysql
mysql2# systemctl stop mysql
Is the cluster online?
© 2011 - 2018 Percona, Inc.84 / 107
A Safe Stop ClusterKeep sysbench runningExperiment 1: Clean stop of 2 of 3 nodes
mysql1# systemctl stop mysql
mysql2# systemctl stop mysql
Is the cluster online?Why?
© 2011 - 2018 Percona, Inc.84 / 107
A Non-Primary ClusterExperiment 2: Failure of 50% cluster (leave mysql2 down)
mysql1# systemctl start mysql
-- Wait a few minutes for IST to finish and for the-- cluster to become stable. Watch the error logs and-- SHOW GLOBAL STATUS LIKE 'wsrep%'
mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP
© 2011 - 2018 Percona, Inc.85 / 107
A Non-Primary ClusterExperiment 2: Failure of 50% cluster (leave mysql2 down)
mysql1# systemctl start mysql
-- Wait a few minutes for IST to finish and for the-- cluster to become stable. Watch the error logs and-- SHOW GLOBAL STATUS LIKE 'wsrep%'
mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP
Force mysql3 to become primary like this:
mysql3> set global wsrep_provider_options="pc.bootstrap=true";
© 2011 - 2018 Percona, Inc.85 / 107
A Non-Primary ClusterExperiment 2: Failure of 50% cluster (leave mysql2 down)
mysql1# systemctl start mysql
-- Wait a few minutes for IST to finish and for the-- cluster to become stable. Watch the error logs and-- SHOW GLOBAL STATUS LIKE 'wsrep%'
mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP
Force mysql3 to become primary like this:
mysql3> set global wsrep_provider_options="pc.bootstrap=true";
Clear iptables when done
mysql1# iptables -F
© 2011 - 2018 Percona, Inc.85 / 107
Using garbd as a third nodemysql2's mysql is still stopped!Start the Galera Arbitrator
mysql2# garbd -a gcomm://mysql1,mysql3 -g mycluster
© 2011 - 2018 Percona, Inc.86 / 107
Using garbd as a third nodemysql2's mysql is still stopped!Start the Galera Arbitrator
mysql2# garbd -a gcomm://mysql1,mysql3 -g mycluster
Experiment 3: Partial mysql1 network failure
mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP
© 2011 - 2018 Percona, Inc.86 / 107
Using garbd as a third nodemysql2's mysql is still stopped!Start the Galera Arbitrator
mysql2# garbd -a gcomm://mysql1,mysql3 -g mycluster
Experiment 3: Partial mysql1 network failure
mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP
Clear iptables when done
mysql1# iptables -F
© 2011 - 2018 Percona, Inc.86 / 107
Using garbd as a third nodemysql2's mysql is still stopped!Start the Galera Arbitrator
mysql2# garbd -a gcomm://mysql1,mysql3 -g mycluster
Experiment 3: Partial mysql1 network failure
mysql1# iptables -A INPUT -s mysql3 -j DROP; \ iptables -A OUTPUT -s mysql3 -j DROP
Clear iptables when done
mysql1# iptables -F
Stop garbd by pressing ctrl-c
© 2011 - 2018 Percona, Inc.86 / 107
Recovering from a clean all-down stateStop mysql on all 3 nodes
$ systemctl stop mysql -- may need to stop mysql@bootstrap
© 2011 - 2018 Percona, Inc.87 / 107
Recovering from a clean all-down stateStop mysql on all 3 nodes
$ systemctl stop mysql -- may need to stop mysql@bootstrap
Use the grastate.dat to find the node with the highestseqno
$ cat /var/lib/mysql/grastate.dat
© 2011 - 2018 Percona, Inc.87 / 107
Recovering from a clean all-down stateStop mysql on all 3 nodes
$ systemctl stop mysql -- may need to stop mysql@bootstrap
Use the grastate.dat to find the node with the highestseqno
$ cat /var/lib/mysql/grastate.dat
Bootstrap that node and then start the others normally
$ systemctl start mysql@bootstrap -- or systemctl start mysql
© 2011 - 2018 Percona, Inc.87 / 107
Recovering from a clean all-down stateStop mysql on all 3 nodes
$ systemctl stop mysql -- may need to stop mysql@bootstrap
Use the grastate.dat to find the node with the highestseqno
$ cat /var/lib/mysql/grastate.dat
Bootstrap that node and then start the others normally
$ systemctl start mysql@bootstrap -- or systemctl start mysql
Why bootstrap the node with the highest seqno?Why can't the cluster just "figure it out"?
© 2011 - 2018 Percona, Inc.87 / 107
Recovering a dirty all-down stateKill -9 mysqld on all 3 nodes
$ killall -9 mysqld_safe; killall -9 mysqld
© 2011 - 2018 Percona, Inc.88 / 107
Recovering a dirty all-down stateKill -9 mysqld on all 3 nodes
$ killall -9 mysqld_safe; killall -9 mysqld
Check the grastate.dat again. Is it useful?
$ cat /var/lib/mysql/grastate.dat
© 2011 - 2018 Percona, Inc.88 / 107
Recovering a dirty all-down stateKill -9 mysqld on all 3 nodes
$ killall -9 mysqld_safe; killall -9 mysqld
Check the grastate.dat again. Is it useful?
$ cat /var/lib/mysql/grastate.dat
Nope! Why?
© 2011 - 2018 Percona, Inc.88 / 107
Recovering a dirty all-down stateKill -9 mysqld on all 3 nodes
$ killall -9 mysqld_safe; killall -9 mysqld
Check the grastate.dat again. Is it useful?
$ cat /var/lib/mysql/grastate.dat
Nope! Why?Ungraceful shutdown. No opportunity to writeinformation to file.
© 2011 - 2018 Percona, Inc.88 / 107
Recovering a dirty all-down state (cont)Thankfully, Galera GTID:seqno is written to the InnoDBREDO log as part of each transactionTry wsrep_recover on each node to get the seqno
$ mysqld_safe --wsrep_recover
© 2011 - 2018 Percona, Inc.89 / 107
Recovering a dirty all-down state (cont)Thankfully, Galera GTID:seqno is written to the InnoDBREDO log as part of each transactionTry wsrep_recover on each node to get the seqno
$ mysqld_safe --wsrep_recover
Fix grastate.dat with recovery infoBootstrap the highest node and then start the othersnormally
© 2011 - 2018 Percona, Inc.89 / 107
Recovering a dirty all-down state (cont)Thankfully, Galera GTID:seqno is written to the InnoDBREDO log as part of each transactionTry wsrep_recover on each node to get the seqno
$ mysqld_safe --wsrep_recover
Fix grastate.dat with recovery infoBootstrap the highest node and then start the othersnormallyWhat would happen if we started the wrong node?
© 2011 - 2018 Percona, Inc.89 / 107
Weighted QuorumAll nodes have a default weight of 1.Primary state is determined by being strictly greater than 50%of last stable weighted total.Change weight of individual nodes to affect calculation:
SET GLOBAL wsrep_provider_options ='pc.weight = 2';
mysql1: pc.weight = 2mysql2: pc.weight = 1mysql3: pc.weight = 0
Killing mysql2 and mysql3 simultaneously preserves thePrimary state on mysql1.Killing mysql1 causes mysql2 and mysql3 to become non-primary components.
© 2011 - 2018 Percona, Inc.90 / 107
XtraDB Cluster Tutorial
AVOIDING SST
© 2011 - 2018 Percona, Inc.91 / 107
Why Avoid an SST?SST is a slow network process
Approximately 100GB/hour over gigabit networkSST is a CPU/disk consumer
Node is processing write-sets while also trying tostream a backup
SST can cause nodes to fall behindor return inaccurate data
Use xtrabackup to create a "seed" from which to IST
© 2011 - 2018 Percona, Inc.92 / 107
Manual State Transfer with XtrabackupTake a backup on mysql3 (sysbench on app)
mysql3$ xtrabackup --backup --galera-info --target-dir /var/backup/
Stop mysql3 and prepare the backup
mysql3$ systemctl stop mysql
mysql3$ xtrabackup --prepare --target-dir /var/backup/
mysql3$ rm -rf /var/lib/mysql/*
© 2011 - 2018 Percona, Inc.93 / 107
Manual State Transfer with XtrabackupCopy back the backup
mysql3$ xtrabackup --copy-back --target-dir /var/backup/
Get GTID from backup and recreate grastate.dat
mysql3$ cd /var/lib/mysql
mysql3$ cp xtrabackup_galera_info grastate.dat
mysql3$ nano grastate.dat
# GALERA saved stateversion: 2.1uuid: 8797f811-7f73-11e2-0800-8b513b3819c1seqno: 22809safe_to_bootstrap: 0
mysql3$ chown -R mysql:mysql /var/lib/mysql
mysql3$ systemctl start mysql
© 2011 - 2018 Percona, Inc.94 / 107
When you WANT an SSTmysql3# cat /var/lib/mysql/grastate.dat
Turn off replication for the session:
mysql3> SET wsrep_on = OFF;
sysbench should be running on appRepeat the DELETE until mysql3 crashes
mysql3> DELETE FROM sysbench.sbtest1 LIMIT 50000;..after crash..mysql3# cat /var/lib/mysql/grastate.dat
mysql3# tail -500 /var/log/mysqld.log
What is in the error log after line the crash?Should we try to avoid SST in this case?
© 2011 - 2018 Percona, Inc.95 / 107
XtraDB Cluster Tutorial
TUNING REPLICATION
© 2011 - 2018 Percona, Inc.96 / 107
What is Flow Control?Ability for any node in the cluster to tell the other nodes topause writes while it catches up
Cry for help!Feedback mechanism for the replication processONLY caused by wsrep_local_recv_queue exceedinga node's fc_limitThis can pause the entire cluster and look like a clusterstall!
© 2011 - 2018 Percona, Inc.97 / 107
Observing Flow ControlRun sysbench on app as usualTake a read lock on mysql1 and observe its effect on thecluster
mysql1> SET GLOBAL pxc_strict_mode = 0;mysql1> LOCK TABLE sysbench.sbtest1 WRITE;
What happens to sysbench throughput?What are the symptoms of flow control?How can you detect flow control?
SHOW GLOBAL STATUS LIKE 'wsrep_flow%';Cleanup:
UNLOCK TABLES;
© 2011 - 2018 Percona, Inc.98 / 107
Flow Control Parametersgcs.fc_limit
Pause replication if recv queue exceeds this manywritesetsDefault: 100. For master-slave setups this number canbe increased considerably
gcs.fc_master_slaveWhen this is NO, effective gcs.fc_limit is multipledby sqrt( # of cluster members ). Default: NOEx: gcs.fc_limit = 100 * sqrt(3) = 173
gcs.fc_factorControls the high and low mark for sending andstopping of flow control messages. Default: 1.0
© 2011 - 2018 Percona, Inc.99 / 107
Avoiding Flow Control during BackupsMost backups use FLUSH TABLES WITH READ LOCKThis should cause flow control almost immediatelyIn XtraDB Cluster 5.7+, FTWRL automatically DESYNC'sthe node:
DESYNC blocks flow-control messageNo incoming writesets are processedWritesets are still appended to gcache for processingafter lock is releasedFlow-control may be issued during backlog processing
© 2011 - 2018 Percona, Inc.100 / 107
More Notes on Flow ControlIf you set it too low:
Too many cries for help from random nodes in thecluster
If you set it too high:Increases in replication conflicts in multi-node writingsIncreases in apply/commit lag on nodes
That one node with FC issues:Deal with that node: bad hardware? not same specs?
wsrep_flow_control_status ON/OFF - Easyindication if FC is currently happening
© 2011 - 2018 Percona, Inc.101 / 107
Wsrep Sync WaitApply is asynchronous, reads after writes on differentnodes may see older view of the databaseIn reality, flow control prevents this from becoming likeregular Async replicationTighter consistency can be enforced by ensuring thereplication queue is flushed before a read
SET [GLOBAL] wsrep_sync_wait=[0-7];Setting it ensures synchronous read view beforeexecuting an operation:
0 - Disabled / 1 - READ (SELECT, BEGIN)2 - UPDATE and DELETE3 - READ, UPDATE and DELETE4 - INSERT and REPLACE
© 2011 - 2018 Percona, Inc.102 / 107
XtraDB Cluster Tutorial
CONCLUSION
© 2011 - 2018 Percona, Inc.103 / 107
LimitationsDoes not work as expected:
InnoDB/XtraDB Onlytx_isolation = SERIALIZABLEGET_LOCK()LOCK TABLESSELECT ... FOR UPDATECareful with ALTER TABLE ... IMPORT/EXPORT.Capped maximum transaction size.XA transactions.
© 2011 - 2018 Percona, Inc.104 / 107
Common Configuration Parameterswsrep_sst_auth = un:pw Set SST Authentication
wsrep_osu_method = 'TOI' Set DDL Method
wsrep_desync = ON; Enable "Replication"
wsrep_slave_threads = 4; Number of applier threads
wsrep_retry_autocommit = 4; Number of commit retries
wsrep_provider_options:
* gcs.fc_limit = 500 Increase flow control threshold
* pc.bootstrap = true Put node into single mode
* gcache.size = 2G Increase Galera Cache file
* pc.weight = 2 Increase voting of a node
wsrep_sst_donor = mysql1 Specify donor node
© 2011 - 2018 Percona, Inc.105 / 107
Common Status Counters
wsrep_cluster_status Primary / Non-Primarywsrep_local_state_comment Synced / Donor / Desyncwsrep_cluster_size Number of online nodes
wsrep_flow_control_paused Number of seconds clusterhas paused
wsrep_local_recv_queue Current number of write-setswaiting to be applied
© 2011 - 2018 Percona, Inc.106 / 107
Questions?
© 2011 - 2018 Percona, Inc.(*) Slide content authors: Jay Janssen, Kenny Gryp, Fred Descamps, Matthew Boehm
107 / 107