MySQL High Availability: Managing Farms of Distributed Servers (MySQL Fabric)


Copyright © 2013, Oracle and/or its affiliates. All rights reserved.1



Mats Kindahl, Alfranio Correia, Narayanan Venkateswaran

3 | 21/09/2013 | Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Agenda

MySQL High Availability Options

MySQL Fabric – New kid on the block

MySQL Fabric – Failure detection and Failover

MySQL Fabric-aware connectors

MySQL Fabric – Playing with the new kid


MySQL High Availability Options


What Causes Downtime?

System Failures
– Server faults
– Software bugs or crashes

Physical Disasters

Scheduled Maintenance

User Errors


Effect and Impact

Effect:
– Service unavailability

– Bad response time

Impact:

– Revenue loss

– Negative impact on customer relationships

– Reduced employee productivity

– Regulatory issues


Another Amazon Outage Exposes the Cloud's Dark Lining
By Brad Stone - Bloomberg Businessweek

“The entire incident lasted all of 49 minutes...”


Causes of Downtime in Production MySQL Servers
By Baron Schwartz – Percona

“It is ironic but true that high-availability tools can cause downtime.”


Failures are inevitable, so design your systems with this in mind.


High Availability Solutions

Primary-Secondary

Shared Nothing Clusters

Tightly-coupled Clusters


Primary-Secondary: MySQL Replication in 5.6

Characteristics
Simple to configure
Different Platforms
Configured over LAN or WAN
No Shared Storage or Virtual IP required

[Diagram: a master replicating to several slaves]


Asynchronous replication: risk of data loss (unless using semi-synchronous replication)
Performance overhead on the master
No automatic failover or switchover (unless using MySQL Utilities)


Shared Nothing Clusters: MySQL Cluster

Characteristics
Multi-master architecture
No single point of failure
Support for SQL and NoSQL Interfaces
Synchronous replication

[Diagram: MySQL Cluster data nodes accessed through MySQL Servers]


Tightly Coupled Clusters

Provide Active/Passive Solution

Examples:

– DRBD

– WSFC

– Solaris Clustering

– Oracle Virtual Machines


Distributed Replicated Block Device: DRBD (Regular Operation)

Characteristics
Linux kernel module integrated into Oracle Linux
Synchronous replication
Only one MySQL server operational

[Diagram: two nodes, each running MySQL on top of DRBD, with cluster services provided by Pacemaker and Corosync]


Distributed Replicated Block Device: DRBD (Failover)

Characteristics
Cluster Management System required
Virtual IP migration

[Diagram: the same two-node DRBD setup with Pacemaker and Corosync cluster services, during failover]


Shared Storage: Windows Server Failover Clustering (Regular Operation)

Characteristics
Required:
– Windows Clustering
– Shared Storage
Only one MySQL server operational
Virtual IP migration
Shared storage used to vote

[Diagram: two servers, each running MySQL and Windows Clustering services, attached to shared storage that holds the vote, the data and the binary log]


MySQL Fabric – New kid on the block


MySQL Fabric: Farms of MySQL 5.6 Servers

Distributed framework
Extensions are first-class citizens
Supported by a variety of connectors
Fault-tolerant solution
You can suggest features, report bugs and contribute patches
Still early alpha, long journey ahead


Bird's-eye View: Key Components

Characteristics
Support for Primary-Secondary
Focus on MySQL 5.6 and later
Written in Python

[Diagram: the application talks to MySQL Fabric over XML-RPC and to the High Availability Groups over SQL]


Bird's-eye View: Fabric-aware Connectors

Characteristics
Fabric-aware connectors:
– Route transactions
– Cache information
– Currently Python, Java, PHP


Architecture

Characteristics
XML-RPC is widely available
Extensible framework
Failures taken into account

[Diagram: the MySQL Fabric framework, with an executor and a state store (persister) kept in a MySQL backing store; extensions such as HA and sharding (Sh), with room for more; protocols: MySQL, AMQP, XML-RPC]
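One reason XML-RPC is "widely available" is that many languages ship a client in their standard library. As a minimal sketch, Python's xmlrpc.client can encode a call into the XML document that would travel over HTTP to a Fabric node, and decode it again. The method name group.create and its argument are assumptions modeled on the mysqlfabric group create YYZ command, not a documented Fabric endpoint.

```python
import xmlrpc.client

# Hypothetical method name modeled on "mysqlfabric group create YYZ";
# it is an assumption, not a documented MySQL Fabric XML-RPC endpoint.
request = xmlrpc.client.dumps(("YYZ",), methodname="group.create")
print(request)

# The receiving side decodes the same document back into Python values.
params, method = xmlrpc.client.loads(request)
print(method, params)  # -> group.create ('YYZ',)
```

Against a running node, the same encoding is what xmlrpc.client.ServerProxy sends, pointed at the address configured under [protocol.xmlrpc]; the exact command names exposed there should be checked against the Fabric release in use.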


MySQL Fabric: Prerequisites

MySQL Servers 5.6.10 (or later):
– Backing Store
– Managed Servers
Python 2.6 or 2.7
MySQL Utilities 1.4.0
– Available at labs (http://labs.mysql.com)


MySQL Fabric – Failure Detection and Failover


HA Overview

Characteristics
Fabric keeps information on groups
Application defines the group that it will use
Connection failures regularly propagated

[Diagram: the application and the operator interact with MySQL Fabric, which manages a High Availability Group]


Failure Detection and Failover

Current Status:
– Simple failure detector/recovery per group
Considering:
– Make connectors report failures

– Support external/custom failure detectors

– Improve failover/switchover algorithm

– Extend servers/system to avoid the split-brain problem


Failure Detection
Enabled per group

group = Group.fetch(self.__group_id)
for server in group.servers():
    if server.is_alive():
        continue
    if group.master == server.uuid:
        # Failover if master has gone
        trigger("FAIL_OVER", [], self.__group_id)
    else:
        # Notification if not master
        trigger("SERVER_LOST", [], self.__group_id,
                server.uuid)
    # Server marked as faulty
    server.status = MySQLServer.FAULTY
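The detector loop above depends on Fabric internals, so it cannot run on its own. The sketch below replays the same decisions against hypothetical stand-ins (Server, Group, trigger and the RUNNING/FAULTY constants are all invented for illustration): a dead master triggers FAIL_OVER, a dead slave triggers SERVER_LOST, and either way the server is marked faulty. It additionally skips servers already marked faulty, an assumption added so that repeated passes do not fire the same event twice.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for Fabric internals (Group, MySQLServer, trigger).
RUNNING, FAULTY = "RUNNING", "FAULTY"


@dataclass
class Server:
    uuid: str
    alive: bool = True       # what a real probe of the server would report
    status: str = RUNNING

    def is_alive(self):
        return self.alive


@dataclass
class Group:
    group_id: str
    master: str              # uuid of the current master
    servers: list = field(default_factory=list)


events = []                  # triggered events are recorded, not executed


def trigger(event, group_id, *args):
    events.append((event, group_id) + args)


def check_group(group):
    """One pass of the per-group failure detector."""
    for server in group.servers:
        # Skip healthy servers and ones already reported on a previous pass.
        if server.is_alive() or server.status == FAULTY:
            continue
        if group.master == server.uuid:
            trigger("FAIL_OVER", group.group_id)                 # master gone
        else:
            trigger("SERVER_LOST", group.group_id, server.uuid)  # notify only
        server.status = FAULTY


g = Group("YYZ", master="m1", servers=[Server("m1"), Server("s1"), Server("s2")])
g.servers[0].alive = False   # the master crashes
g.servers[2].alive = False   # one slave becomes unreachable
check_group(g)
print(events)  # -> [('FAIL_OVER', 'YYZ'), ('SERVER_LOST', 'YYZ', 's2')]
```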


Failover

[Diagram: a master that has executed transactions T1, T2 and T3, replicating to four slaves that have received different subsets of them]

Shown in three stages:
– Master fails
– Choosing a candidate
– Pointing to the new master
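The slides name the steps but not the rule for "Choosing a candidate". A common heuristic, assumed here rather than taken from Fabric's source, is to promote the slave that has applied the most of the old master's transactions, then point the others at it so they fetch only what they are missing. In this sketch each slave's applied transactions are a plain set standing in for its GTID set; the slave names and contents are hypothetical.

```python
def choose_candidate(slaves):
    """Return the name of the slave that has applied the most transactions.

    `slaves` maps a slave name to the set of transaction ids it has applied,
    a stand-in for the GTID set each server reports. Ties go to the
    alphabetically first name so the choice is deterministic.
    """
    if not slaves:
        raise ValueError("no slave available for promotion")
    return max(sorted(slaves), key=lambda name: len(slaves[name]))


def repoint(slaves, new_master):
    """For every other slave, list the transactions it must fetch from the new master."""
    source = slaves[new_master]
    missing = {}
    for name, applied in slaves.items():
        if name == new_master:
            continue
        if not applied <= source:
            # The slave has something the candidate lacks: promoting the
            # candidate would silently drop it, so refuse instead.
            raise ValueError("%s has transactions missing on %s" % (name, new_master))
        missing[name] = sorted(source - applied)
    return missing


slaves = {
    "slave1": {"T1", "T2", "T3"},
    "slave2": {"T1"},
    "slave3": {"T1", "T2"},
    "slave4": {"T1"},
}
candidate = choose_candidate(slaves)
print(candidate)                    # -> slave1
print(repoint(slaves, candidate))
# -> {'slave2': ['T2', 'T3'], 'slave3': ['T3'], 'slave4': ['T2', 'T3']}
```

With GTIDs, the "what is missing" computation is what lets each remaining slave catch up from the new master without manual binary log positions.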


Making Fabric Itself HA

Current Status:
– Fabric can automatically resume ongoing activities
– Backing store is not left in an inconsistent state
– Information is cached in the connector
Considering:
– Replicated State Machine among Fabric nodes
– Use MySQL Cluster as backing store
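Because information is cached in the connector, applications keep working for a while even if the Fabric node is briefly unavailable. The sketch below shows one way such a cache could behave; the time-to-live policy, the stale fallback and the fetch callable are all assumptions for illustration, not the connectors' actual implementation.

```python
import time


class GroupCache:
    """Cache the servers of each group, as a connector might, with a time-to-live.

    `fetch(group_id)` is a hypothetical callable that asks a Fabric node for
    a group's servers; it is not a real connector API.
    """

    def __init__(self, fetch, ttl=60.0, clock=time.monotonic):
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        self._entries = {}            # group_id -> (expires_at, servers)

    def servers(self, group_id):
        now = self.clock()
        entry = self._entries.get(group_id)
        if entry is not None and entry[0] > now:
            return entry[1]           # still fresh: no round trip to Fabric
        try:
            servers = self.fetch(group_id)
        except ConnectionError:
            if entry is not None:
                return entry[1]       # Fabric unreachable: fall back to stale data
            raise
        self._entries[group_id] = (now + self.ttl, servers)
        return servers

    def invalidate(self, group_id):
        """Forget a group, e.g. after a connection to one of its servers fails."""
        self._entries.pop(group_id, None)


calls = []


def fake_fetch(group_id):
    calls.append(group_id)
    return {"master": "server-a:3306", "slaves": ["server-b:3306"]}


now = [0.0]
cache = GroupCache(fake_fetch, ttl=60.0, clock=lambda: now[0])
cache.servers("YYZ")
cache.servers("YYZ")                  # answered from the cache
now[0] = 61.0
cache.servers("YYZ")                  # expired, so Fabric is asked again
print(calls)  # -> ['YYZ', 'YYZ']
```

Invalidating on a failed connection is the piece that pairs with connectors reporting failures: the next lookup goes back to Fabric, which by then may have promoted a new master.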


Crash-safe Procedures

[Diagram: the MySQL Fabric framework (executor; state store (persister) in MySQL; HA and sharding (Sh) extensions; MySQL, AMQP and XML-RPC protocols) running a procedure made of Step 1, Step 2 and Step 3, shown in three stages:]
– Regular execution
– Failover/recovery execution
– Resuming execution


Writing a procedure

@_events.on_event(STEP_1)
def do_something(group_id):
    _do_it(group_id)
    # Trigger the next step
    _events.trigger_within_procedure(STEP_2, group_id)

@do_something.undo
def undo_something(group_id):
    # Compensate operation
    _undo_it(group_id)

Transactional Context
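The idea behind crash-safe procedures is that progress is persisted step by step, so a restarted node skips finished steps and resumes where it stopped, while a step that fails triggers the compensating (undo) operations. The sketch below illustrates that idea with a dict standing in for the state store; the Procedure class, the Crash exception and the compensation policy (undo completed steps, newest first) are assumptions, not Fabric's executor.

```python
import json


class Crash(BaseException):
    """Simulates the Fabric node dying: it bypasses the compensation logic."""


class Procedure:
    """Run named steps in order, persisting progress after each step.

    `store` stands in for the state store kept in the backing MySQL server;
    here it is a plain dict holding JSON so the sketch is self-contained.
    """

    def __init__(self, proc_id, steps, store):
        self.proc_id = proc_id
        self.steps = steps            # list of (name, do, undo) tuples
        self.store = store

    def run(self, *args):
        done = json.loads(self.store.get(self.proc_id, "[]"))
        for name, do, _undo in self.steps:
            if name in done:
                continue              # finished before a crash: never repeat it
            try:
                do(*args)
            except Exception:
                # Compensate the completed steps, newest first, then give up.
                for prev_name, _, prev_undo in reversed(self.steps):
                    if prev_name in done:
                        prev_undo(*args)
                self.store.pop(self.proc_id, None)
                raise
            done.append(name)
            # Assumption: a real store would commit this record together
            # with the step's own changes, in one transaction.
            self.store[self.proc_id] = json.dumps(done)


log = []
armed = [True]


def step2(group_id):
    if armed[0]:
        armed[0] = False
        raise Crash()                 # the node dies in the middle of step 2
    log.append(("step2", group_id))


steps = [
    ("step1", lambda g: log.append(("step1", g)), lambda g: log.append(("undo1", g))),
    ("step2", step2, lambda g: log.append(("undo2", g))),
    ("step3", lambda g: log.append(("step3", g)), lambda g: log.append(("undo3", g))),
]
store = {}
try:
    Procedure("p1", steps, store).run("YYZ")
except Crash:
    pass                              # the node restarts with the same store
Procedure("p1", steps, store).run("YYZ")
print(log)  # -> [('step1', 'YYZ'), ('step2', 'YYZ'), ('step3', 'YYZ')]
```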


MySQL Fabric: Using MySQL Cluster

[Diagram: two MySQL Fabric nodes, each with its own executor, framework, state store (persister), HA and sharding (Sh) extensions and MySQL, AMQP and XML-RPC protocols, sharing MySQL Cluster as their backing store]


MySQL Fabric: Using Replicated State Machine

[Diagram: three MySQL Fabric nodes, each with its own executor, state store (persister) and MySQL backing store, kept consistent through a replicated state machine (RSM)]


MySQL Fabric-aware Connectors


Writing an Application: Connecting to a Group

import mysql.connector.fabric as connector

# Use MySQLFabricConnection
conn = connector.MySQLFabricConnection(
    fabric={"host": "fabric.example.com", "port": 8080},
    user='mats', passwd='passwd', database="employees")
# Define a group
conn.set_property(group='YYZ')
# Get a cursor to the master in YYZ
cur = conn.cursor()


Connectors cannot hide failures: multi-statement transaction


Connectors cannot hide failures: single-statement transaction


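Writing an Application: Handling Connection Failures

from mysql.connector import InterfaceError

try:
    conn.start_transaction()
    cur.execute('INSERT...')
    cur.execute('UPDATE...')
    conn.commit()
except InterfaceError as error:
    # The connection failed: the application must retry the whole
    # transaction, starting from a new cursor
    cur = conn.cursor()

Connectors cannot safely retry or reconnect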


Plan your application to retry after a failure.
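Since the connector cannot safely retry, the retry loop belongs to the application: the whole transaction is wrapped in a function and replayed from its first statement on a fresh connection. The sketch below is driver-agnostic; connect, the transaction function and the LostConnection exception are hypothetical stand-ins. With Connector/Python, retry_on would presumably be the InterfaceError caught above, possibly together with other connection errors.

```python
import time


def run_with_retry(connect, transaction, retry_on, attempts=3, delay=0.5,
                   sleep=time.sleep):
    """Run transaction(conn); on a lost connection, replay it on a new connection.

    connect     -- callable returning a fresh connection
    transaction -- callable running every statement of the transaction plus commit
    retry_on    -- exception type (or tuple of types) that signals a lost connection
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        conn = connect()
        try:
            return transaction(conn)
        except retry_on as error:
            # Uncommitted work died with the old connection, so the
            # transaction is replayed from its first statement.
            last_error = error
            if attempt < attempts:
                sleep(delay * attempt)   # linear back-off between attempts
    raise last_error


class LostConnection(Exception):
    pass


tries = []


def connect():
    return {"id": len(tries) + 1}        # stand-in for a real connection


def transfer(conn):
    tries.append(conn["id"])
    if len(tries) < 3:
        raise LostConnection("server went away")   # fail the first two tries
    return "committed on connection %d" % conn["id"]


print(run_with_retry(connect, transfer, LostConnection, sleep=lambda s: None))
# -> committed on connection 3
```

One caveat: if the connection drops while the commit itself is in flight, the application cannot tell whether it succeeded, so replaying is only safe for transactions that are idempotent or whose outcome is checked first.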


Good practices

Handle session information in the retry logic:
– Temporary tables
– Session variables
– Prepared statements
Check the server's wait_timeout setting
Do not set connection_timeout
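Temporary tables, session variables and prepared statements live in the server-side session, so they vanish when the connection is lost, or when the server closes a connection that stayed idle longer than wait_timeout. A minimal sketch, assuming the application records the statements that build that state and replays them on the new connection before retrying. SessionState, RecordingCursor and the table, statement and column names are hypothetical; the cursor only needs a DB-API style execute method.

```python
class SessionState:
    """Record the statements that build per-session state so they can be replayed.

    Temporary tables, session variables and prepared statements live in the
    server-side session, so a new connection starts without them.
    """

    def __init__(self):
        self._setup = []

    def add(self, sql):
        self._setup.append(sql)

    def restore(self, cursor):
        """Re-create the recorded state through a cursor on a new connection."""
        for sql in self._setup:
            cursor.execute(sql)


class RecordingCursor:
    """Stand-in for a DB-API cursor: it only records the SQL it is given."""

    def __init__(self):
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)


state = SessionState()
state.add("SET SESSION sql_mode = 'STRICT_ALL_TABLES'")
state.add("CREATE TEMPORARY TABLE tmp_ids (id INT PRIMARY KEY)")
state.add("PREPARE find_emp FROM 'SELECT * FROM employees WHERE emp_no = ?'")

cursor = RecordingCursor()            # cursor opened after reconnecting
state.restore(cursor)
print(len(cursor.executed))           # -> 3
```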


References

Blogs
– http://alfranio-distributed.blogspot.com/2013/09/writing-fault-tolerant-database.html
– http://alfranio-distributed.blogspot.com/2013/09/tips-to-build-fault-tolerant-database.html

Documents
– http://www.mysql.com/why-mysql/white-papers/mysql-guide-to-high-availability-solutions/
– http://dev.mysql.com/doc/workbench/en/mysql-utilities.html

Code
– MySQL Fabric available at http://labs.mysql.com/


MySQL Fabric – Playing with the new kid


Starting MySQL Servers
Use MTR
Do it manually, use a sandbox, whatever you like

Quick Setup

rpl_fabric_gtid.cnf:

!include ../my.cnf

[mysqld.n]
report-host=localhost
log-slave-updates
innodb
gtid-mode=on
enforce-gtid-consistency
master-info-repository=TABLE

rpl_fabric_gtid.test:

--source include/have_innodb.inc


MySQL Fabric Installation
Python 2.6 or 2.7
MySQL Utilities 1.4.0
Check configuration file

Quick Setup

fabric.cfg:

[storage]
address = localhost:3306
user = fabric
password =
database = fabric
connection_timeout = 6

[protocol.xmlrpc]
address = localhost:8080
threads = 5
url = file:///var/log/fabric.log
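"Check configuration file" can be partly automated. A minimal sketch, assuming the fabric.cfg settings from this slide: it parses them with Python's standard configparser and reports missing sections, missing keys and malformed host:port addresses. The REQUIRED table is this sketch's own choice of keys, not Fabric's validation rules, and the script is standalone Python 3 rather than part of the Python 2 Fabric installation.

```python
import configparser

# The Quick Setup fabric.cfg from this slide, embedded so the sketch runs alone.
FABRIC_CFG = """
[storage]
address = localhost:3306
user = fabric
password =
database = fabric
connection_timeout = 6

[protocol.xmlrpc]
address = localhost:8080
threads = 5
url = file:///var/log/fabric.log
"""

# Keys this sketch chooses to require; an assumption, not Fabric's own rules.
REQUIRED = {
    "storage": ["address", "user", "password", "database"],
    "protocol.xmlrpc": ["address"],
}


def check_config(text):
    """Return a list of problems found in a fabric.cfg-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []
    for section, keys in REQUIRED.items():
        if not parser.has_section(section):
            problems.append("missing section [%s]" % section)
            continue
        for key in keys:
            if not parser.has_option(section, key):
                problems.append("missing %s.%s" % (section, key))
        if parser.has_option(section, "address"):
            # Addresses are expected in host:port form with a numeric port.
            host, _, port = parser.get(section, "address").rpartition(":")
            if not host or not port.isdigit():
                problems.append("bad address in [%s]" % section)
    return problems


print(check_config(FABRIC_CFG))  # -> []
```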


Run MySQL Fabric
Configure the state store
Start fabric
Manage your groups

Quick Setup

Terminal 1:

mysqlfabric manage setup
mysqlfabric manage start

Terminal 2:

mysqlfabric listcommands
mysqlfabric group create YYZ
mysqlfabric group add localhost:1300 root ''


Thoughts for the Future

● Connector multi-cast
  – Scatter-gather
● Internal interfaces
  – Improve extension support
  – Improve procedures support
● Command-line interface
  – Improving usability
  – Focus on ease-of-use
● More protocols
  – MySQL-RPC Protocol?
  – AMQP?
● More frameworks?
● More HA group types
  – DRBD
  – MySQL Cluster
● Fabric-unaware connectors?


Thoughts for the Future

● “More transparent” sharding
  – Single-query transactions
  – Cross-shard joins are a problem
● Multiple shard mappings
  – Independent tables
● Multi-way shard split
  – Efficient initial sharding
  – Better use of resources
● High-availability executor
  – Node failure stops execution
  – Replicated State Machine
  – Fail over to another Fabric node
● Distributed failure detector
  – Connectors report failures
  – Custom failure detectors


Thank you!


Your Feedback is Highly Appreciated!

http://forums.mysql.com/list.php?144
