Buytaert kris my_sql-pacemaker

MySQL HA with Pacemaker

Kris Buytaert

Kris Buytaert
● CTO and Open Source Consultant @inuits.eu
● "Infrastructure Architect"
● I don't remember when I started using MySQL
● Specializing in Automated, Large Scale Deployments, Highly Available infrastructures, since 2008 also known as "the Cloud"
● Surviving the 10th floor test
● Cofounded devopsdays.org

In this presentation
● High Availability?
● MySQL HA Solutions
● MySQL Replication
● Linux HA / Pacemaker

What is HA Clustering?
● One service goes down => others take over its work
● IP address takeover, service takeover
● Not designed for high performance
● Not designed for high throughput (load balancing)

Does it Matter?
● Downtime is expensive
● You miss out on $$$
● Your boss complains
● New users don't return

Lies, Damn Lies, and Statistics
Counting nines (slide by Alan R)

Availability   Downtime per year
99.9999%       30 sec
99.999%        5 min
99.99%         52 min
99.9%          9 hr
99%            3.5 day
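The figures in the table follow from simple arithmetic on the fraction of a year that may be lost. A minimal sketch, assuming a 365-day year; the function names are made up for this illustration:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: leap years ignored


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Allowed downtime per year, in seconds, for a given availability."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


def humanize(seconds: float) -> str:
    """Render a duration in the largest unit that fits (day, hr, min, sec)."""
    for unit, size in (("day", 86400), ("hr", 3600), ("min", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} sec"


if __name__ == "__main__":
    for nines in (99.9999, 99.999, 99.99, 99.9, 99.0):
        print(f"{nines}% -> {humanize(downtime_seconds_per_year(nines))}")
```

The printed values should land close to the rounded numbers on the slide.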

The Rules of HA
● Keep it Simple
● Keep it Simple
● Prepare for Failure
● Complexity is the enemy of reliability
● Test your HA setup

You care about?
● Your data?
•Consistent
•Realtime
•Eventually consistent
● Your connection
•Always
•Most of the time

Eliminating the SPOF
● Find out what Will Fail
•Disks
•Fans
•Power (Supplies)
● Find out what Can Fail
•Network
•Going Out Of Memory

Split Brain
● Communications failures can lead to separated partitions of the cluster
● If those partitions each try to take control of the cluster, then it's called a split-brain condition
● If this happens, then bad things will happen
•http://linux-ha.org/BadThingsWillHappen
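One common guard against split brain is a majority quorum: a partition may only take control if it can see more than half of the cluster. The slide does not spell this out, so treat the sketch below as an illustration with a hypothetical `has_quorum` helper, not as how any particular cluster stack implements it:

```python
def has_quorum(partition_size: int, cluster_size: int) -> bool:
    """True if this partition holds a strict majority of the cluster's nodes."""
    return partition_size > cluster_size / 2


if __name__ == "__main__":
    # Three-node cluster split 2 / 1: only the larger side may continue.
    print(has_quorum(2, 3), has_quorum(1, 3))
    # Two-node cluster split 1 / 1: neither side has a majority.
    print(has_quorum(1, 2))
```

In a two-node cluster neither half can ever win a majority vote, which is one reason fencing (STONITH) comes up later in the deck.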

Historical MySQL HA
● Replication
•1 read write node
•Multiple read only nodes
•Application needed to be modified

Solutions Today
● BYO
● DRBD
● MySQL Cluster NDBD
● Multi Master Replication
● MySQL Proxy
● MMM / Flipper
● Galera
● Percona XtraDB Cluster

Data vs Connection
● Data:
•Replication
•DRBD
● Connection:
•LVS
•Proxy
•Heartbeat / Pacemaker

Shared Storage
● 1 MySQL instance
● Monitor MySQL node
● Stonith
● $$$ 1+1 <> 2
● Storage = SPOF
● Split Brain :(

DRBD
● Distributed Replicated Block Device
● In the Linux Kernel (as of very recent)
● Usually only 1 mount
•Multi mount as of 8.X
•Requires GFS / OCFS2
● Regular FS ext3 ...
● Only 1 MySQL instance active accessing data
● Upon failover MySQL needs to be started on the other node

DRBD(2)
● What happens when you pull the plug of a physical machine?
•Minimal timeout
•Why did the crash happen?
•Is my data still correct?
•InnoDB consistency checks?
•Lengthy?
•Check your binlog size

MySQL Cluster NDBD
● Shared-nothing architecture
● Automatic partitioning
● Synchronous replication
● Fast automatic fail-over of data nodes
● In-memory indexes
● Not suitable for all query patterns (multi-table JOINs, range scans)


MySQL Cluster NDBD
● All indexed data needs to be in memory
● Good and bad experiences
•Better experiences when using the API
•Bad when using the MySQL Server
● Test before you deploy
● Does not fit all apps

How replication works
● Master server keeps track of all updates in the Binary Log
•Slave requests to read the binary update log
•Master acts in a passive role, not keeping track of which slave has read what data
● Upon connecting the slaves do the following:
•The slave informs the master of where it left off
•It catches up on the updates
•It waits for the master to notify it of new updates
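The pull model described above can be sketched as a toy in-memory simulation. This is not MySQL code; the `Master` and `Slave` classes and their methods are invented for illustration, and the binary log is reduced to a Python list:

```python
class Master:
    """Keeps an ordered log of updates and serves it on request."""

    def __init__(self):
        self.binlog = []

    def write(self, event):
        self.binlog.append(event)

    def read_from(self, position):
        # Passive role: the master does not remember what any slave has read;
        # it simply returns everything from the position the slave asks for.
        return self.binlog[position:]


class Slave:
    """Tracks its own position in the master's log."""

    def __init__(self, master):
        self.master = master
        self.position = 0  # where this slave left off
        self.applied = []

    def catch_up(self):
        # Tell the master where we left off, then apply whatever is new.
        events = self.master.read_from(self.position)
        self.applied.extend(events)
        self.position += len(events)
        return len(events)
```

The point of the sketch is that the replication position lives on the slave, so a slave that was disconnected can resume by asking again from its own last position.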

Two Slave Threads
● How does it work?
•The I/O thread connects to the master and asks for the updates in the master's binary log
•The I/O thread copies the statements to the relay log
•The SQL thread implements the statements in the relay log
● Advantages
•Long running SQL statements don't block log downloading
•Allows the slave to keep up with the master better
•In case of master crash the slave is more likely to have all statements
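The split between downloading and applying can be illustrated with two Python threads and a queue standing in for the relay log. This is a toy model under that assumption, not how mysqld is implemented; `replicate` is a hypothetical name:

```python
import queue
import threading


def replicate(binlog):
    """Copy events through a relay queue using separate I/O and SQL threads."""
    relay_log = queue.Queue()  # stand-in for the on-disk relay log
    applied = []
    done = object()  # sentinel marking the end of the master's log

    def io_thread():
        # Downloads events without waiting for them to be applied.
        for event in binlog:
            relay_log.put(event)
        relay_log.put(done)

    def sql_thread():
        # Applies events in order, at its own pace.
        while True:
            event = relay_log.get()
            if event is done:
                break
            applied.append(event)

    threads = [
        threading.Thread(target=io_thread),
        threading.Thread(target=sql_thread),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return applied
```

Because the I/O side never waits for the SQL side, a slow statement delays applying but not downloading, which is the first advantage listed above.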

Replication commands
Slave commands
● START|STOP SLAVE
● RESET SLAVE
● SHOW SLAVE STATUS
● CHANGE MASTER TO…
● LOAD DATA FROM MASTER
● LOAD TABLE tblname FROM MASTER
Master commands
● SHOW MASTER STATUS
● PURGE MASTER LOGS…

Show slave status\G
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.0.1
                  Master_User: repli
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: XMS-1-bin.000014
          Read_Master_Log_Pos: 106
               Relay_Log_File: XMS-2-relay.000033
                Relay_Log_Pos: 251
        Relay_Master_Log_File: XMS-1-bin.000014
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: xpol
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 106
              Relay_Log_Space: 547
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
1 row in set (0.00 sec)
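Output in this `\G` layout is easy to check from a script. A minimal sketch that parses the text form and looks at the three fields most relevant to replication health; the function names and the 30-second lag threshold are assumptions made for this example:

```python
def parse_slave_status(text: str) -> dict:
    """Turn 'Key: value' lines from SHOW SLAVE STATUS\\G into a dict."""
    status = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skips row separators and the '1 row in set' footer
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def replication_healthy(status: dict, max_lag_seconds: int = 30) -> bool:
    """Both slave threads must run and the reported lag must be small."""
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    lag = status.get("Seconds_Behind_Master", "NULL")
    # The field reads NULL when the lag is unknown; treat that as unhealthy.
    return lag.isdigit() and int(lag) <= max_lag_seconds
```

Checking only that the server accepts connections would miss a stopped SQL thread, which is why both thread flags are tested.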

Row vs Statement
Statement-based replication
● Pro
•Proven (around since MySQL 3.23)
•Smaller log files
•Auditing of actual SQL statements
•No primary key requirement for replicated tables
● Con
•Non-deterministic functions and UDFs
Row-based replication
● Pro
•All changes can be replicated
•Similar technology used by other RDBMSes
•Fewer locks required for some INSERT, UPDATE or DELETE statements
● Con
•More data to be logged
•Log file size increases (backup/restore implications)
•Replicated tables require explicit primary keys
•Possible different result sets on bulk INSERTs
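The problem with non-deterministic functions can be shown with a toy model: under statement-based replication the slave re-executes the statement, under row-based replication it receives the resulting row. The functions below are invented for illustration, with Python's `uuid.uuid4()` standing in for a call such as `UUID()` inside an INSERT:

```python
import uuid


def insert_with_uuid() -> str:
    """Stand-in for a statement whose result differs on every execution."""
    return str(uuid.uuid4())


def statement_based(statement):
    """The slave runs the same statement again and may get a different row."""
    master_row = statement()
    slave_row = statement()
    return master_row, slave_row


def row_based(statement):
    """The slave applies the row image produced on the master."""
    master_row = statement()
    slave_row = master_row
    return master_row, slave_row
```

With `row_based` the two copies always match; with `statement_based` they diverge whenever the statement is not deterministic.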

Multi Master Replication
● Replicating the same table data both ways can lead to race conditions
•Auto_increment, unique keys, etc. could cause problems if you write them 2x
● Both nodes are master
● Both nodes are slave
● Write in 1, get updates on the other
(Diagram: two nodes, each acting as both master and slave: M|S and M|S)
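One mitigation often used for the auto_increment collision, not mentioned on the slide, is to give each master a distinct `auto_increment_offset` with a shared `auto_increment_increment`. The effect of those two MySQL server variables can be sketched as follows; the helper name is hypothetical:

```python
from itertools import count, islice


def auto_increment_ids(offset: int, increment: int):
    """Yield the ID sequence a node would generate with these settings."""
    return count(start=offset, step=increment)


if __name__ == "__main__":
    node_a = list(islice(auto_increment_ids(offset=1, increment=2), 5))
    node_b = list(islice(auto_increment_ids(offset=2, increment=2), 5))
    print(node_a)  # odd IDs
    print(node_b)  # even IDs
```

This only keeps generated keys apart. Concurrent writes to the same row or to other unique keys can still conflict, so writing to one node at a time remains the safer pattern.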

MySQL Proxy
● Man in the middle
● Decides where to connect to
•Lua
● Write rules to
•Redirect traffic

Master Slave & Proxy
● Split Read and Write Actions
● No Application change required
● Sends specific queries to a specific node
● Based on
•Customer
•User
•Table
•Availability
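The read/write split can be sketched in a few lines. MySQL Proxy rules are written in Lua; the Python class below is only a stand-in to show the routing decision, and `ReadWriteRouter` and its node names are hypothetical:

```python
import itertools


class ReadWriteRouter:
    """Send reads to slaves in round-robin order, everything else to the master."""

    READ_VERBS = ("SELECT", "SHOW")

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, query: str) -> str:
        words = query.lstrip().split(None, 1)
        verb = words[0].upper() if words else ""
        if verb in self.READ_VERBS and self._slaves is not None:
            return next(self._slaves)
        # Writes, unknown statements and the no-slave case go to the master.
        return self.master
```

A real rule set needs more care than a first-word check, for example `SELECT ... FOR UPDATE`, statements inside a transaction, or reads that must see a write that has not replicated yet.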

MySQL Proxy
● Your new SPOF
● Make your Proxy HA too!
•Heartbeat OCF Resource

Breaking Replication
● If the master and slave get out of sync
● Updates on slave with identical index id
•Check error log for disconnections and issues with replication

Monitor your Setup
● Not just connectivity
● Also functional
•Query data
•Check resultset is correct
● Check replication
•Maatkit
•OpenARK
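A functional check goes one step beyond "can I connect": run a known query and compare the rows. The sketch below uses Python's built-in `sqlite3` purely as a runnable stand-in for a MySQL connection; the `monitor` table and its contents are invented for the example:

```python
import sqlite3


def functional_check(conn, query, expected_rows) -> bool:
    """Run a query and report whether it returns exactly the expected rows."""
    try:
        rows = conn.execute(query).fetchall()
    except sqlite3.Error:
        return False  # a failing query counts as a failed check
    return rows == expected_rows


def make_demo_connection():
    """In-memory database holding one known row to probe."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE monitor (id INTEGER PRIMARY KEY, value TEXT)")
    conn.execute("INSERT INTO monitor (id, value) VALUES (1, 'ok')")
    return conn


if __name__ == "__main__":
    conn = make_demo_connection()
    print(functional_check(conn, "SELECT value FROM monitor WHERE id = 1", [("ok",)]))
```

Against MySQL the same idea applies with whichever client library is in use; only the connection and the exception type would change.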

Pulling Traffic
● E.g. for Cluster, MultiMaster setups
•DNS
•Advanced Routing
•LVS
•Flipper / MMM

MMM
● Multi-Master Replication Manager for MySQL
•Perl scripts to perform monitoring/failover and management of MySQL master-master replication configurations
● Balance master / slave configs based on replication state
•Map Virtual IP to the Best Node
● http://mysql-mmm.org/
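"Map Virtual IP to the Best Node" boils down to ranking nodes by their replication state. The following is not MMM's actual algorithm, just a hedged sketch of one plausible rule: ignore unreachable or non-replicating nodes, then prefer the lowest lag. `NodeState` and `best_node` are names made up here:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class NodeState:
    name: str
    reachable: bool
    replication_running: bool
    lag_seconds: int


def best_node(nodes: Iterable[NodeState]) -> Optional[str]:
    """Pick the healthy node with the least replication lag, if any."""
    candidates = [n for n in nodes if n.reachable and n.replication_running]
    if not candidates:
        return None  # nothing safe to point the virtual IP at
    return min(candidates, key=lambda n: n.lag_seconds).name
```

Returning `None` when no node qualifies makes the "do nothing" case explicit, rather than moving the IP to a node that is known to be broken.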

Flipper
● Flipper is a Perl tool for managing read and write access pairs of MySQL servers
● master-master MySQL Servers
● Client machines do not connect "directly" to either node; instead,
● One IP for read,
● One IP for write.
● Flipper allows you to move these IP addresses between the nodes in a safe and controlled manner.
● http://provenscaling.com/software/flipper/

Linux-HA Pacemaker
● Plays well with others
● Manages more than MySQL
● ...v3 .. don't even think about the rest anymore
● http://clusterlabs.org/

Heartbeat
● Heartbeat v1
•Max 2 nodes
•No fine-grained resources
•Monitoring using "mon"
● Heartbeat v2
•XML usage was a consulting opportunity
•Stability issues
•Forking?

Pacemaker Architecture
● stonithd: The Heartbeat fencing subsystem.
● lrmd: Local Resource Management Daemon. Interacts directly with resource agents (scripts).
● pengine: Policy Engine. Computes the next state of the cluster based on the current state and the configuration.
● cib: Cluster Information Base. Contains definitions of all cluster options, nodes, resources, their relationships to one another and current status. Synchronizes updates to all cluster nodes.
● crmd: Cluster Resource Management Daemon. Largely a message broker for the PEngine and LRM, it also elects a leader to co-ordinate the activities of the cluster.
● openais: Messaging and membership layer.
● heartbeat: Messaging layer, an alternative to OpenAIS.
● ccm: Short for Consensus Cluster Membership. The Heartbeat membership layer.

Pacemaker?
● Not a fork
● Only CRM code taken out of Heartbeat
● As of Heartbeat 2.1.3
•Support for both OpenAIS / Heartbeat
•Different release cycle than Heartbeat

Heartbeat, OpenAIS?
● Both Messaging Layers
● Initially only Heartbeat
● OpenAIS
● Heartbeat got unmaintained
● OpenAIS has heisenbugs :(
● Heartbeat maintenance taken over by LinBit
● CRM detects which layer
(Diagram: Pacemaker and Cluster Glue on top of OpenAIS or Heartbeat)

Configuring Heartbeat
● /etc/ha.d/ha.cf
•Use crm = yes
● /etc/ha.d/authkeys

Configuring Heartbeat
heartbeat::hacf {"clustername":
  hosts   => ["host-a","host-b"],
  hb_nic  => ["bond0"],
  hostip1 => ["10.0.128.11"],
  hostip2 => ["10.0.128.12"],
  ping    => ["10.0.128.4"],
}
heartbeat::authkeys {"ClusterName":
  password => "ClusterName",
}
http://github.com/jtimberman/puppet/tree/master/heartbeat/

Heartbeat Resources
● LSB
● Heartbeat resource (+status)
● OCF (Open Cluster Framework) (+monitor)
● Clones (don't use in HAv2)
● Multi State Resources

A MySQL Resource
● OCF
•Clone
  •Where do you hook up the IP?
•Multi State
  •But we have Master Master replication
•Meta Resource
  •Dummy resource that can monitor
    •Connection
    •Replication state
    •...
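The monitor action of such a meta resource has to map what it observes onto OCF exit codes. The three code values below are the standard OCF ones; how connection and replication failures map onto them is an assumption made for this sketch, and the `monitor` function itself is hypothetical:

```python
OCF_SUCCESS = 0      # resource is running and healthy
OCF_ERR_GENERIC = 1  # resource is running but something is wrong
OCF_NOT_RUNNING = 7  # resource is cleanly stopped or not reachable


def monitor(connection_ok: bool, replication_ok: bool) -> int:
    """Translate the two health checks into an OCF monitor exit code."""
    if not connection_ok:
        # Assumption: an unreachable server is reported as not running.
        return OCF_NOT_RUNNING
    if not replication_ok:
        # Assumption: broken replication is reported as a generic failure.
        return OCF_ERR_GENERIC
    return OCF_SUCCESS
```

A real resource agent would usually be a shell script that ends with `exit` and one of these codes; the cluster manager then decides whether to restart or move the resource.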

CRM
● Cluster Resource Manager
● Keeps Nodes in Sync
● XML Based
● cibadmin
● CLI manageable
● crm

configure
property $id="cib-bootstrap-options" \
    stonith-enabled="FALSE" \
    no-quorum-policy=ignore \
    start-failure-is-fatal="FALSE"
rsc_defaults $id="rsc_defaults-options" \
    migration-threshold="1" \
    failure-timeout="1"
primitive d_mysql ocf:local:mysql \
    op monitor interval="30s" \
    params test_user="sure" test_passwd="illtell" test_table="test.table"
primitive ip_db ocf:heartbeat:IPaddr2 \
    params ip="172.17.4.202" nic="bond0" \
    op monitor interval="10s"
group svc_db d_mysql ip_db
commit

Adding MySQL to the stack
(Diagram: Node A and Node B. Layers: Hardware; Cluster Stack (HeartBeat, Pacemaker); Resource ("MySQLd" on each node, MySQL Replication, Service IP MySQL).)

Pitfalls & Solutions
● Monitor, Monitor,
•Replication state
•Replication lag
● Maatkit
● OpenARK

Conclusion
● Plenty of Alternatives
● Think about your Data
● Think about getting Queries to that Data
● Complexity is the enemy of reliability
● Keep it Simple
● Monitor inside the DB

Contact
Kris Buytaert
[email protected]

Further Reading
@krisbuytaert
http://www.krisbuytaert.be/blog/
http://www.inuits.be/

Inuits
't Hemeltje
Duboistraat 50
2060 Antwerpen
Belgium
891.514.231
+32 475 961221

•Or the upcoming slides
