Building Scalable High Availability Systems using MySQL Fabric

Copyright © 2014, Oracle and/or its affiliates. All rights reserved.

Mats Kindahl, Senior Principal Software Developer

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

2015-10-28

Session Outline

1 Introduction to MySQL Fabric
2 Elements of Scalable High-Availability Systems
3 Building a High Availability System
4 Scaling High Availability Systems
5 Closing Remarks & Thoughts for the Future


Introduction to MySQL Fabric


What is MySQL Fabric?

An extensible and easy-to-use framework for managing a farm of MySQL servers supporting high-availability and sharding


What does all that mean?

● Management System
– Manages a MySQL Farm
– Distributed Framework
● Framework
– Resilient executor
– State store
– Transaction routing
● Extensible
– High-availability
– “Semi-automatic” sharding
● Written in Python
● Open Source
– You can participate
– Suggest features
– Report bugs
– Contribute patches
● MySQL 5.6 is focus
– For now


MySQL Fabric: Goals & Features

● Decision logic in connector
– Eliminates network hop
– Reduce network load
– Eliminate single point of failure
– Scale naturally: not a bottleneck
● Connector API extensions
– Support transactions
– Support full SQL
● MySQL Router
– Support for legacy connectors
● Load balancing
– Read-write split
– Weighted round-robin
● Cloud integration
– Support elasticity
– Servers on-demand
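The weighted round-robin load balancing listed above can be illustrated with a minimal sketch. This is not how Fabric or its connectors implement it; the server names and weights are hypothetical, and the naive "expand into slots" approach is chosen only because it is easy to read.

```python
import itertools

def weighted_round_robin(weights):
    """Return an endless iterator that yields each server in proportion to its weight."""
    # Expand every server into `weight` slots, then cycle over the slots.
    slots = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(slots)

# Hypothetical servers and integer weights, for illustration only.
picker = weighted_round_robin({"srv1": 3, "srv2": 1})
first_eight = [next(picker) for _ in range(8)]
```

With these weights, srv1 receives three out of every four picks. A production balancer would typically interleave the picks more smoothly instead of sending bursts to one server.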


MySQL Fabric: Goals & Features

● Multi-table sharding
– Co-locate related rows
● Sharding functions
– Range
– (Consistent) hash
● Shard operations
– Shard move and split
● Global updates
– Global tables
– Schema updates
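To make the idea of a consistent hash sharding function concrete, here is a small generic sketch. It is not Fabric's implementation: the ring layout, the use of SHA-256, and the shard names are all assumptions made for illustration. It shows the property that matters for shard operations: when a shard is added, the only keys that change location are the ones that move to the new shard.

```python
import bisect
import hashlib

def ring_position(value):
    """Hash any value to an integer position on the ring."""
    return int(hashlib.sha256(str(value).encode()).hexdigest(), 16)

class HashRing:
    """Generic consistent-hash ring with one point per shard (illustrative only)."""

    def __init__(self, shards):
        self._ring = sorted((ring_position(shard), shard) for shard in shards)
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key):
        # Pick the first shard at or after the key's position, wrapping around.
        index = bisect.bisect_left(self._points, ring_position(key))
        return self._ring[index % len(self._ring)][1]

# Hypothetical shard names.
before = HashRing(["shard-1", "shard-2", "shard-3"])
after = HashRing(["shard-1", "shard-2", "shard-3", "shard-4"])
moved = [key for key in range(1000) if before.shard_for(key) != after.shard_for(key)]
```

Every key in `moved` now maps to shard-4; keys on the other shards stay where they were.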


MySQL Fabric 1.6 (alpha): What is new

● MySQL Router (2.0 GA)
– Distributed separately
– Support for legacy connectors
– Connection-based routing
– Connection fail-over support
– Connection load balancing
● Multi-node Fabric Control
– Geographical Redundancy
– Resilient Execution
– Load distribution
● Improved Password Management
– Management passwords in state store
– Management passwords encrypted using AES


Birds-eye View

[Figure: application, operator, MySQL Fabric node, and high-availability groups (shards)]


High-Level Components

● Connecting to the Farm
– Enhanced Connector API
– MySQL Router
● MySQL Fabric Node
– Manage information about farm
– Provide status information
– Execute procedures
● MySQL Servers
– Organized in high-availability groups
– Handling application data

[Figure: application with connectors, MySQL Fabric node, and a high-availability group]


MySQL Fabric Node Architecture

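[Figure: MySQL Fabric node architecture. Connectors talk to the node over the MySQL-RPC and XML-RPC protocols; the MySQL Fabric framework contains the executor and the state store (persister), hosts extensions such as HA and sharding, and uses MySQL as its backing store]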


MySQL Router

● Plugin-based architecture
– Harness with general functionality
● Connection-based Routing
– Connection-based routing plugin
– Routing decision on connecting
– Raw packet copy: very fast
● Fabric Info Plugin
– Meta-data Cache
– Use Fabric node to get meta-data

[Figure: MySQL Router with its routing module and its Fabric info module (with cache), alongside MySQL Fabric and MySQL]


Elements of Scalable High-Availability Systems


High-Availability Concepts

● Redundancy
– Duplicate components
● Monitoring
– Detecting failing components
– Monitor load
● Procedures
– Activate replacements
– Distribute load
– Decommission components
– Deploy new components


Database Scalability Concepts

● Scalability and Elasticity
– The ability to adapt to changes in the load on the system
● Scaling Reads
– Being able to cope with an increasing read load
● Scaling Writes
– Being able to cope with an increasing write load
● Scaling Storage
– Being able to cope with an increasing database size


What is Elasticity?

“[Elasticity] is defined as the degree to which a system is able to adapt to workload changes by provisioning and deprovisioning resources in an autonomic manner”


Database Read Scalability

● Coping with increased read traffic
– Typical use-case for many web applications
– Usually multiple tiers: web caches and read servers
● Adding result set caches
– Can offload database server for repeated queries
– Queries have to be identical
– Cannot handle complex queries
● Adding read servers
– Replicate from primary server to dedicated read servers
– Can handle complex queries


Database Write Scalability

● Coping with increased write traffic
– Typical use-case for many large-scale monitoring applications
  ● Monitoring database servers
  ● Data collection systems
● Buying better machines
– Straightforward solution
– Cost?


Database Storage Scalability

● Coping with increased amount of data
– Large-scale monitoring applications
  ● Monitoring network
– Business Analysis Systems
  ● Collecting and analyzing click-streams
● Sharding the data
– Database needs to be sharded into independent partitions
– Application needs to be tailored to sharded data


Building a High Availability System: A case study


Basic system without redundancy

● Application on the Web
– Deployed on separate hardware
– Connector built into application
– Connections opened for each request
● Database Backend
– Store data from many applications
● High Availability?


Basic system without redundancy

● No High Availability
– No redundancy
– No monitoring
– No way to activate replacement
● What can you do?
– Add redundancy
– Add monitoring
– Add activation of replacement


Basic system with redundancy

● Server Redundancy
– Keep a secondary around
– Replicate from primary
● MySQL Router
– Deployed as intermediate
– Monitor servers
– Re-direct connections on primary failure


Router Configuration for Static Failover

● Section header
– Name + Key
● Bind port
– Port number to listen on
– Connections from localhost only by default
● Destinations
– Servers to open connections to
● Mode
– Read-write queries on the connection

[routing:failover]
bind_port = 13306
destinations = srv1.example.com,srv2.example.com
mode = read-write
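The structure of the configuration above (section name plus key, bind port, destinations, mode) can be made explicit with a short sketch that reads it using Python's standard configparser module. MySQL Router has its own configuration reader; configparser is used here only as an assumption-free way to show how the pieces fit together, and the `routes` dictionary is a hypothetical in-memory representation.

```python
import configparser

CONFIG_TEXT = """
[routing:failover]
bind_port = 13306
destinations = srv1.example.com,srv2.example.com
mode = read-write
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

routes = {}
for section in parser.sections():
    # The section header is "name:key", for example "routing:failover".
    name, _, key = section.partition(":")
    if name != "routing":
        continue
    routes[key] = {
        "bind_port": parser.getint(section, "bind_port"),
        "destinations": parser.get(section, "destinations").split(","),
        "mode": parser.get(section, "mode"),
    }
```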



● Server crashes
– Replication stops
● Notices server is gone
– Router notices server disappears
– Server is put in quarantine
● Find alternative server
– Router picks new server from static list of candidates



● Activate replacement
– New connections are sent to secondary
– Existing connections will time out



● Server Restored
– The primary is restored into working state
– Replication is set up again
● Server unquarantined
– Router notices server is available again
– Server is unquarantined
– New connections go to primary
– Old connections still go to secondary
  ● Closed once the request is finished
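The crash, quarantine, and restore sequence described above can be simulated in a few lines. This is a sketch of the behavior as the slides describe it, not the router's actual code; the class and method names are hypothetical.

```python
class StaticFailover:
    """First-available routing over a static destination list (illustrative)."""

    def __init__(self, destinations):
        self.destinations = list(destinations)
        self.quarantined = set()

    def mark_down(self, server):
        # The router noticed that the server disappeared.
        self.quarantined.add(server)

    def mark_up(self, server):
        # The router noticed that the server is available again.
        self.quarantined.discard(server)

    def pick(self):
        # New connections go to the first destination not in quarantine.
        for server in self.destinations:
            if server not in self.quarantined:
                return server
        raise RuntimeError("no destination available")

router = StaticFailover(["srv1.example.com", "srv2.example.com"])
before_crash = router.pick()
router.mark_down("srv1.example.com")
during_outage = router.pick()
router.mark_up("srv1.example.com")
after_restore = router.pick()
```

Only new connections are affected by `pick()`; connections that were already open keep their original destination until they are closed.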


Using Fabric for Server Management

● Fabric Control Node
– Provide Information about farm to router
– Monitor servers
– Execute procedures
– Add and remove servers
● High Availability Group
– Collection of servers
– All servers in group have same data

[Figure: Fabric node and a high-availability group]


Router Configuration with Fabric

● Fabric cache
– Address of Fabric node
● Destinations
– Reference to Fabric cache instead of servers
– Fabric cache is authority in the URL
– Groups under group key

[fabric_cache:my_cache]
address = fabric.example.com:32275

[routing:failover]
bind_port = 13306
destinations = fabric+cache://my_cache/group/my_group
mode = read-write
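The destination URL in the configuration above has the cache name as its authority and the group name under the `group` path key. A small sketch using Python's urllib.parse shows how those parts separate; the function name and the returned dictionary are hypothetical, and the router itself does not use this code.

```python
from urllib.parse import urlsplit

def parse_destination(url):
    """Split a fabric+cache destination into cache name and group name."""
    parts = urlsplit(url)
    if parts.scheme != "fabric+cache":
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    segments = parts.path.strip("/").split("/")
    if len(segments) != 2 or segments[0] != "group":
        raise ValueError(f"unexpected path: {parts.path!r}")
    # The authority names the [fabric_cache:...] section; the path names the group.
    return {"cache": parts.netloc, "group": segments[1]}

target = parse_destination("fabric+cache://my_cache/group/my_group")
```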



● Initially
– Secondaries have @@READ_ONLY = 1
– Primary has @@READ_ONLY = 0
● Primary is lost
● Fabric node notices
– Built-in Failure Detector



● Fabric node performs fail-over
– Secondary is selected as new primary
– Other secondaries are changed to replicate from new primary
– @@READ_ONLY = 0 on new primary
– Router informed of change
● Router changes primary
– New connections go to new primary
– Old connections error out
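The fail-over steps above can be traced on an in-memory model of a group. This is a sketch under stated assumptions: the dictionary layout and status strings are illustrative, and picking the first available secondary is a placeholder, since the slides do not say how the new primary is chosen.

```python
def fail_over(group, failed_primary):
    """Promote a secondary after the primary is lost and return its name."""
    candidates = [
        name for name, server in group.items()
        if name != failed_primary and server["status"] == "SECONDARY"
    ]
    if not candidates:
        raise RuntimeError("no secondary available for promotion")
    # Assumption: take the first candidate; real selection criteria may differ.
    new_primary = candidates[0]
    group[failed_primary]["status"] = "FAULTY"
    # The new primary accepts writes: @@READ_ONLY = 0.
    group[new_primary].update(status="PRIMARY", read_only=0, replicates_from=None)
    # The remaining secondaries are changed to replicate from the new primary.
    for name in candidates[1:]:
        group[name]["replicates_from"] = new_primary
    return new_primary

# Hypothetical three-server group.
group = {
    "srv1": {"status": "PRIMARY", "read_only": 0, "replicates_from": None},
    "srv2": {"status": "SECONDARY", "read_only": 1, "replicates_from": "srv1"},
    "srv3": {"status": "SECONDARY", "read_only": 1, "replicates_from": "srv1"},
}
new_primary = fail_over(group, "srv1")
```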



● Availability
– Router caches information
– Service not interrupted on Fabric node failure
– Note: Failovers not done while Fabric node is down
● Meta Data Durability
– Information stored in database backend
– Fabric node just has to be restarted
  ● For example, using init.d script


Geographical Redundancy

● Multiple data centers
– One Fabric node in each
● Meta data locality
– Node in data center to provide meta-data
● Resilient execution
– Long-running procedures are not interrupted



● Fabric Control Cluster
– Multiple Fabric control nodes
– One active node (leader), others passive
– Passive nodes can be used as read replicas
– … but also act as standby
● Elastic Control Cluster
– Adding or removing nodes on demand
– Added nodes automatically brought up to date when joining



● Fabric Control Cluster Leader
– There is one leader in the cluster
– The leader can change
  ● Leader dies
  ● Leader is explicitly changed
– Commands changing meta data have to be executed on leader
● Internal Forwarding
– Commands that change meta-data are automatically forwarded to leader
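The leader and forwarding rules above can be sketched with a toy model. Everything here is hypothetical (class name, node names, the way changes reach the passive nodes); it only illustrates that a change received by any node is executed on the current leader, while reads can be served by any node.

```python
class ControlCluster:
    """Toy model of a control cluster with one leader (illustrative only)."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}  # per-node metadata copy
        self.leader = node_names[0]
        self.executed_on = []  # which node executed each change

    def change(self, received_by, key, value):
        if received_by not in self.nodes:
            raise KeyError(received_by)
        # Internal forwarding: the change always executes on the leader,
        # whichever node received it.
        self.executed_on.append(self.leader)
        # Assumption: the change then becomes visible on every node.
        for metadata in self.nodes.values():
            metadata[key] = value

    def read(self, node, key):
        # Passive nodes can answer reads from their own copy.
        return self.nodes[node].get(key)

cluster = ControlCluster(["node-a", "node-b", "node-c"])
cluster.change("node-b", "my_group.primary", "srv1.example.com")
cluster.leader = "node-c"  # the leader is explicitly changed
cluster.change("node-a", "my_group.primary", "srv2.example.com")
```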


Scaling High Availability Systems: A case study


Scaling Reads

● When to scale for reads
– Read-heavy workloads
– Most web applications
● Result set caching
– If queries are identical
– Done at application layer
– Not covered in this presentation
● Read servers
– For more complex queries


Scaling Reads

● Adding read servers to group
– Servers added as spares
– Fabric monitors spares as well
● Setup router
– Add a read-only query port to router
– Distribute load in round-robin fashion
● Split traffic
– Updates sent to update port
– Queries sent to query port

[Figure: Fabric node and router, with updates and queries routed separately]


Router Configuration for Static Load Balancing

● Section header
– Name + Key
● Bind port
– Port number to listen on
– Connections from localhost only by default
● Destinations
– Servers to open connections to
● Mode
– Read-only queries on the connection
– … can be load balanced

[routing:load_balancing]
bind_port = 13307
destinations = slave1.example.com,slave2.example.com
mode = read-only
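Combining this read-only route with the earlier read-write route gives the traffic split described above: the application sends updates to one port and queries to the other. The sketch below models that choice; the `ROUTES` table simply restates the two example configurations, and the function name is hypothetical.

```python
import itertools

# Port numbers and hosts restate the example configurations in the slides.
ROUTES = {
    13306: {"mode": "read-write",
            "destinations": ["srv1.example.com", "srv2.example.com"]},
    13307: {"mode": "read-only",
            "destinations": ["slave1.example.com", "slave2.example.com"]},
}
_read_cycle = itertools.cycle(ROUTES[13307]["destinations"])

def destination_for(port):
    """Pick the server for a new connection arriving on the given port."""
    route = ROUTES[port]
    if route["mode"] == "read-only":
        # Read-only connections are spread round-robin over the read servers.
        return next(_read_cycle)
    # Read-write connections go to the first destination in the list.
    return route["destinations"][0]

query_targets = [destination_for(13307) for _ in range(3)]
update_target = destination_for(13306)
```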


Scaling Writes

● Write-heavy workloads
– Data collection networks
– Large-scale monitoring applications
● Workload Type
– Large number of updates
– Each update changes few rows
● Replication does not help
– Writes are replicated as well


Scaling Writes

● “Sharding” database
– Horizontal partitioning of rows in tables
– Each shard updated independently of others
● “Sharding” write stream
– Write stream partitioned by key
– Write streams are independent

[Figure: four shards holding key ranges 1-999, 1000-1999, 2000-2999, and 3000-3999]
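Partitioning a write stream by key can be sketched using the key ranges from the figure. The shard names are hypothetical, and the lookup treats each range's lower bound as the start of a shard, so keys above 3999 would also land on the last shard in this simplified version.

```python
import bisect
from collections import defaultdict

LOWER_BOUNDS = [1, 1000, 2000, 3000]  # from the figure: 1-999, 1000-1999, ...
SHARDS = ["shard-1", "shard-2", "shard-3", "shard-4"]  # hypothetical names

def partition_writes(writes):
    """Group (key, row) writes into independent per-shard batches."""
    batches = defaultdict(list)
    for key, row in writes:
        # Find the largest lower bound that is <= key.
        index = bisect.bisect_right(LOWER_BOUNDS, key) - 1
        if index < 0:
            raise KeyError(f"key {key} is below the lowest range")
        batches[SHARDS[index]].append((key, row))
    return dict(batches)

batches = partition_writes([(5, "a"), (1500, "b"), (999, "c"), (3999, "d")])
```

Each batch can then be applied to its shard without coordinating with the others, which is what lets write capacity grow with the number of shards.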


Scaling Storage

● Data-intensive applications
– Data collection networks
– Business analysis applications
  ● On-Line Analytical Processing (OLAP) of large data sets
● Just too much data
– Indexes too large
– Backup time excessive
– Analysis prohibitive
● “Sharding” the data
– Horizontal partitioning of database based on sharding key
– Same solution as for scaling writes
● Analytic Queries
– Parallelization possible
– Inter-shard queries?
– Cross-shard queries?


Elasticity

● Scale hardware on demand
– Automatically adapting to changes in workload
– Useful for changes in read load, write load, and storage requirements
– Requires monitoring the load of the system
● Read load change
– Adapt number of read slaves to read load
● Write load change
– Adapt number of shards to incoming write traffic
● Storage requirement change
– Adapt number of shards to storage requirements
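As a simple illustration of adapting the number of read slaves to the read load, the sketch below computes a target replica count from a monitored load figure. The function, its parameters, and the capacity numbers are assumptions made for this example; a real policy would also smooth the measurements and limit how quickly it scales.

```python
import math

def replicas_needed(reads_per_second, capacity_per_replica, minimum=1):
    """Return how many read servers are needed for the observed read load."""
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    return max(minimum, math.ceil(reads_per_second / capacity_per_replica))

# Hypothetical numbers: each read server handles about 1000 reads per second.
peak = replicas_needed(2500, 1000)
idle = replicas_needed(0, 1000)
```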


Scaling through Sharding in Fabric

● Use Fabric-aware connectors
– … or doing key mapping in application
– Sending transactions to correct shard
– Key needs to be given explicitly
  ● Provided together with the query
● Dispatching Analytical Queries
– Done at application level
– Fetch shard information from Fabric
  ● Using XML-RPC or MySQL-RPC interface
– Dispatch queries to the shards
– Collect and merge the results

[Figure: application, MySQL Fabric server, and global group]
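The dispatch, collect, and merge steps for analytical queries can be sketched as a scatter-gather at application level. The per-shard data below is a stand-in: a real application would fetch the shard list from Fabric and run a query against each shard, which this example does not attempt. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the rows held by each shard (hypothetical data).
SHARD_ROWS = {
    "shard-1": [("A", 10), ("B", 5)],
    "shard-2": [("A", 7)],
    "shard-3": [("B", 1), ("C", 4)],
}

def query_shard(shard):
    """Run the per-shard part of the query: sum the amounts per key."""
    totals = {}
    for key, amount in SHARD_ROWS[shard]:
        totals[key] = totals.get(key, 0) + amount
    return totals

def scatter_gather(shards):
    """Dispatch the query to every shard in parallel and merge the results."""
    merged = {}
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(query_shard, shards):
            for key, amount in partial.items():
                merged[key] = merged.get(key, 0) + amount
    return merged

totals = scatter_gather(list(SHARD_ROWS))
```

Sums merge cleanly because they can be combined from partial results; aggregates such as averages need the application to carry extra state (for example, sum and count) from each shard.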


Scaling through Sharding in Fabric

● Schema designed for sharding
– Sharding key needs to be picked carefully
● Sharded Tables
– Application-level “objects” usually spread over several tables
– Rows for one “object” need to be co-located on same shard
● Global Tables
– Tables present on all shards
– Replicated to all shards using the Global Group

[Figure: application, MySQL Fabric server, and global group]


Schema for sharding

employee
  emp_id INT
  birth_date DATE
  first_name VARCHAR(14)
  last_name VARCHAR(16)
  gender ENUM('M','F')
  hire_date DATE

titles
  emp_no INT
  title VARCHAR(50)
  from_date DATE
  to_date DATE

salaries
  emp_no INT
  salary INT
  from_date DATE
  to_date DATE

departments
  dept_no CHAR(4)
  dept_name VARCHAR(40)

dept_emp
  emp_no INT
  dept_no CHAR(4)
  from_date DATE
  to_date DATE

dept_manager
  dept_no CHAR(4)
  emp_no INT
  from_date DATE
  to_date DATE

Table          Rows
salaries       284 404 700
titles         44 330 800
employees      30 002 400
dept_emp       33 160 300
dept_manager   2 400
departments    900

In desperate need of sharding!

[Figure legend: foreign key; global table (replicated on all shards)]


Elasticity through Virtualization

● OpenStack Controller Node
– Manages Identities
– Manages Images
– Manages Block Storage
● OpenStack Compute Node
– (Virtual) machine host
– Contains hypervisor
– Spawns and destroys virtual machines

[Figure: controller node and compute node (VM host)]


MySQL Fabric and OpenStack

[Figure: application, MySQL Fabric node, OpenStack instance, and high-availability groups / shards]


Closing remarks


Where are we now?

● High Availability Features
– Slave promotion
– Multi-node Fabric Control Cluster
– Failure detectors
● Failure detection
– Built-in Failure Detector
– Failure reporting
– Custom failure detectors
● Connector APIs
– Transaction properties
– “Virtual” connections
– Legacy connector support
  ● Using router
● Interfaces
– Command-line
– XML-RPC
– MySQL-RPC


Where are we now?

● Execution
– Node failure stops execution
  ● Execution restart on recovery
– Replicated State Machine (RSM)
– Fail-over execution
● Security
– RFC 2617 for HTTP authentication
– SSL support over XML-RPC
– Secure Password Management
● Cloud integration
– “Server providers”
– OpenStack Nova
● Sharding
– Range and hash sharding
– Shard move and split


Thoughts for the Future

● Custom Procedures
– Better support for custom procedures
– Easy way to write high availability procedures
● Cloud support
– OpenStack Trove
– Amazon AWS
– Amazon RDS
● Frameworks
– Django?
– Symphony?
● More Fabric-aware connectors
– C/C++?
● Improved provisioning
– Using on-line backup?
– Using filesystem snapshots?



● Integration Router-Fabric
– Better integration of Router and Fabric
– Failure reporting in Router
– Multi-node Fabric Support in Router
● Integration with Group Replication
– Router to balance write load
● Enterprise Monitoring
– Monitor farm status
– Monitor Fabric nodes
● Dynamic Router Configuration
– On-line configuration changes
● Transaction Multiplexing
– Allow several incoming clients to use a single connection to the server
– Transactions are interleaved



● Router Monitoring
– Statistics collection
– Monitoring interfaces
● Query Consolidation
– Consolidate result sets from identical queries “in flight”
● Router Logging
– Logging plugin interface
– Sending logs to different sinks
● Result set caching
– Allow router to cache result sets
– Transparent to application
● Session consistency
– We have a distributed database
– It should look like a single database
– Read monotonicity
– Write monotonicity



● Automatic sharding
– Single-query transactions?
– Speculative execution?
– Cross-shard queries?
● Shard aggregation
– Cross-shard queries
● Connection-based Sharding
– Sharding OLTP workload in router
– Assign shard key range to router port
– Router sends session to shard based on port
● Multi-way shard split
– Efficient initial sharding
– Better use of resources

