MySQL Cluster Deployment Best Practices

Johan Andersson, MySQL Cluster Practice Manager

Mat Keep, MySQL Product Team

Agenda

• Suitable Applications

• MySQL Cluster compared to InnoDB – main differences

• Network & Hardware Selection

• Disk Data Deployment

• Configuration

• Administration & Implementation Best Practices

• Online/Offline Operations

• Backup and restore

• Monitoring

• Services available to get started

MySQL Cluster – Users & Applications

HA, Transactional Services: Web & Telecoms

http://www.mysql.com/customers/cluster/

• User & Subscriber Databases

• Service Delivery Platforms

• Application Servers

• Web Session Stores

• eCommerce

• VoIP, IPTV & VoD

• Mobile Content Delivery

• On-Line app stores and portals

• On-Line Gaming

• DNS/DHCP for Broadband

• Payment Gateways

• Data Store for LDAP Directories

Suitable Applications

• Good fit:

• OLTP apps with short-running queries

• Applications with real-time characteristics and requirements

• A lot of concurrent requests

• Write-intensive applications

• Typically a poor fit:

• Heavy reporting (OLAP)

• Data warehouses

• Complex JOINs, which scale badly (a couple of tables with about 1000 records meeting the JOIN criteria is just fine)

• However, you can replicate from MySQL Cluster to a regular MySQL server (InnoDB) and run the reporting there.

Realtime and Reporting Architecture

[Diagram labels: Realtime Apps (App Servers; SQL Layer; Storage); Reporting System (complex reporting queries)]

• Ways to move data to the reporting system:

• Replication

• mysqldump

• ndb_restore → CSV → LOAD DATA INFILE

• Don't mix real-time operations with reporting - separate them!

• How to set this up with replication: http://johanandersson.blogspot.com/2009/05/ha-mysql-write-scaling-using-cluster-to.html

Data Collection/Aggregation Architecture

Aggregate data from peripheral systems (sources)

http://johanandersson.blogspot.com/2009/04/multi-source-replication-with-mysql.html

HA Shard Catalog

[Diagram labels: App Servers; Memcached / caching layer; SQL Layer; Storage; Shard Catalog; MySQL Cluster; Shard_0; Shard_n]

• The Shard Catalog stores user_id → shard_id and other indexes/mappings (user_id → friend_id:shard_id).

• The Shard Catalog can grow online.

MySQL Cluster compared to InnoDB / Other databases

• Every database has its characteristics

• MySQL Cluster is designed for:

• Many short, parallel transactions

• High volume

• High degree of concurrency

• High availability (99.999%)

• Let’s look at how MySQL Cluster compares to InnoDB (and most other traditional databases)

MySQL Cluster compared to InnoDB - Table Locks

• Table locks are usually taken before an offline operation (e.g. an ALTER that changes a data type). During normal traffic a finer granularity, such as row-level locking, is preferred.

• InnoDB

• LOCK TABLES tablename READ will lock the table for writes on the mysql server.

• MySQL Cluster

• LOCK TABLES tablename READ will lock the table for writes only on the mysql server where the command is issued!!

• To lock 'tablename' on the entire cluster you must do LOCK TABLES tablename READ on every mysql server.

• Or, if you use the Configurator scripts:

• cd tools

• ./execute-all-mysql.sh -e "LOCK TABLES tablename READ"

Cluster compared to InnoDB - ALTER

• InnoDB

• ALTER TABLE is blocking: the altered table is locked.

• Cluster

• Online (non-blocking): add column (ALTER ONLINE TABLE … ADD COLUMN x BIGINT), add index, drop index.

• Other ALTERs (changing column size, data type, column name etc) are not online.

• A non-online ALTER TABLE is not blocking!

• There is no table lock distributed across all mysql servers.

• You can do the ALTER TABLE on one MySQL server and still write to the table on another MySQL server → inconsistent data.

• Use LOCK TABLES manually on all mysql servers first, then ALTER, then UNLOCK TABLES on all mysql servers.

MySQL Cluster compared to InnoDB - FOREIGN KEYS

• Considerations for Foreign Keys

• FKs simplify business logic, but FKs incur a performance overhead

• What is the role of your data? What is the role of the application?

• InnoDB

• Is the only storage engine currently supporting Foreign Keys

• MySQL Cluster

• Workaround is to use TRIGGERs to emulate Foreign Keys

• For more info: http://forge.mysql.com/wiki/ForeignKeySupport#Appendix_A:_Triggers_implementing_foreign_key_constraints

MySQL Cluster compared to InnoDB - Transactions

• Failed transactions must be retried by the application

• Also true for InnoDB (and most other databases on the market)

• If the redo log or the RedoBuffer becomes full, the transaction is aborted

• This differs from InnoDB, which will instead run slower (and potentially grind to a virtual halt)

• There are also other resources / timeouts:

• "Lock wait timeout" – the transaction aborts after TransactionDeadlockDetectionTimeout

• MaxNoOfConcurrentOperations / MaxNoOfConcurrentTransactions

• A node failure / node restart will cause transactions to abort
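The retry requirement above can be sketched in Python. This is a minimal illustration, not a connector API: `TemporaryError` and `run_with_retry` are hypothetical names, and a real application would map its driver's temporary errors (lock wait timeout, node failure, full redo buffer) onto such an exception.

```python
import random
import time


class TemporaryError(Exception):
    """Hypothetical marker for errors that are worth retrying."""


def run_with_retry(transaction, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call transaction() until it succeeds or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except TemporaryError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            # Exponential backoff with jitter so that retries do not collide again.
            sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))
```

The `transaction` callable is assumed to begin, execute and commit one complete transaction, so that a retry always starts from a clean state.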

MySQL Cluster compared to InnoDB – System Tables

• System tables are stored in MyISAM format (same as for InnoDB)

• The system tables are local to each MySQL Server

• You must issue/create the following on all MySQL Servers connected to the Cluster:

• GRANTs

• Triggers

• SPROCs

• Views

• Events (mysql’s internal cron)

Example Setup

[Diagram labels: Clients; Load Balancer(s); two hosts running SQL + Mgm (+ AppServer, + WebServer ...); Bonding; Redundant switches; two Data nodes]

Recommendation

• Start with four computers ..

• 2 x Data Nodes

• 2 x MySQL servers

• 2 x Management servers

• … and scale it from there.

[Diagram: two hosts each running MYSQLD + NDB_MGMD, and two hosts each running NDBMTD]

Hardware Selection : Network I

• Dedicated >= 1Gb/s networking

• On Oracle Sun CMT it may be necessary to bond 4 or more NICs together because typically many data nodes are on the same physical host.

• Prevent network failures (NIC x 2, Bonding, dual switches)

• Use dedicated network for cluster communication

• Put Data nodes and MySQL Servers on e.g. the 10.0.1.0 network and let MySQL listen on a "public" interface.

• No security layer to management node

• Enable port 1186 access only from cluster nodes and administrators

Hardware Selection : Network II

• The speed of the network greatly affects the performance

• ping <hostname>

• If the ping time is > 0.200 ms (on 1Gig-E), check:

• routes – do you have >1 switch hop from one data node to another?

• Do you have full duplex?

• NAPI enabled (should be)?

• On my machines I have 0.150ms (on 1Gig-E), but if the switches are good then 0.080-0.100 is also possible

• Jumbo frames: you can try enabling them, but I have not seen any noticeable improvement from this.
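The 0.200 ms check above can be automated with a short Python sketch. It assumes the summary line printed by Linux `ping` (`rtt min/avg/max/mdev = ... ms`); `avg_rtt_ms` and `latency_ok` are hypothetical helper names, and the sample line is made up for illustration.

```python
import re


def avg_rtt_ms(ping_output):
    """Return the average round-trip time from a Linux ping summary line."""
    match = re.search(r"= [\d.]+/([\d.]+)/[\d.]+/[\d.]+ ms", ping_output)
    if match is None:
        raise ValueError("no rtt summary line found")
    return float(match.group(1))


def latency_ok(ping_output, threshold_ms=0.200):
    """True if the average latency is within the threshold from the slide."""
    return avg_rtt_ms(ping_output) <= threshold_ms


sample = "rtt min/avg/max/mdev = 0.080/0.150/0.210/0.030 ms"
print(latency_ok(sample))  # True
```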

Hardware Selection - RAM & CPU

• Storage Layer (Data nodes)

• One data node can (7.0+) use up to 8 cores

• CPU: 2 x 4 core (Nehalem works really well). Faster CPU → faster processing of messages.

• RAM: As much as you need

• A 10GB data set will require 20GB of RAM in total (because of redundancy)

• Each node will then need 2 x 10GB / # of data nodes (2 data nodes → 10GB of RAM each → 16GB of RAM per node is good)

• SQL Layer (MySQL Servers)

• CPU: 2 – 16 cores

• RAM: Not as important – 4GB enough (depends on connections and buffers)
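The data node RAM rule above (2 x data set / number of data nodes) can be written as a small Python sketch. The function name is hypothetical, and the result covers the data set only; headroom for indexes, buffers and the OS is what the "16GB is good" remark adds on top.

```python
def ram_per_data_node_gb(dataset_gb, data_nodes, replicas=2):
    """RAM needed on each data node for the data set alone (no headroom)."""
    if data_nodes < replicas or data_nodes % replicas != 0:
        raise ValueError("data_nodes must be a positive multiple of replicas")
    return replicas * dataset_gb / data_nodes


print(ram_per_data_node_gb(10, 2))  # 10.0
print(ram_per_data_node_gb(10, 4))  # 5.0
```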

Hardware Selection - Disk Subsystem

• Low-end: 1 x SATA 7200RPM

• For read-mostly workloads with few writes

• No redundancy (but the other data node is the mirror)

• Mid-end: 1 x SAS 10KRPM

• Heavy duty (many MB/s)

• No redundancy (but the other data node is the mirror)

• High-end: 4 x SAS 10KRPM

• Heavy duty (many MB/s)

• Disk redundancy (RAID1+0), hot swap

• REDO, LCP, BACKUP – written sequentially in small chunks (256KB)

• If possible, use ODirect=1

Hardware Selection - Disk Data Storage

• Minimal recommended: 2 x SAS 10KRPM (preferably)

• High-end: 4 x SAS 10-15KRPM (preferably)

[Diagram labels: TABLESPACE; REDOLOG; UNDOLOG; (REDO LOG / UNDO LOG); TABLESPACE 1; TABLESPACE 2]

• Use the high-end setup for heavy read/write workloads (1000s of 10KB records per second), e.g. Content Delivery platforms

• SSD for the TABLESPACE is also interesting – not much experience with this yet

• Having the TABLESPACE on a separate disk is good for read performance

• Enable WRITE_CACHE on devices

Disk Space Usage

• The data nodes use the disk for:

• LCP: 3 x sizeof(used DataMemory)

• REDO: 4-6 x DataMemory

• Use a larger (6x) REDO log for write-intensive workloads

• Don’t make the REDO log too small (e.g. 2x or 3x)

• Backups: sizeof(used DataMemory)

• TableSpace (if disk data tables): Must fit dataset.
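The disk space rules above add up as in the following Python sketch. The function name and the example figures (16GB DataMemory, 10GB of it used) are assumptions for illustration, and a single retained backup is assumed.

```python
def data_node_disk_gb(data_memory_gb, used_data_memory_gb, redo_factor=6, tablespace_gb=0):
    """Estimate the disk space one data node needs, following the slide's rules."""
    usage = {
        "lcp": 3 * used_data_memory_gb,        # LCP: 3 x used DataMemory
        "redo": redo_factor * data_memory_gb,  # REDO: 4-6 x DataMemory
        "backup": used_data_memory_gb,         # one backup: used DataMemory
        "tablespace": tablespace_gb,           # only for disk data tables
    }
    usage["total"] = sum(usage.values())
    return usage


print(data_node_disk_gb(16, 10)["total"])  # 136
```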

Choosing the Filesystem

• Most customers use EXT3 (Linux) and UFS (Solaris)

• EXT2 is an option (but recovery takes longer)

• Mount with noatime

• ZFS

• You must separate the journal (ZIL) and the filesystem

• Raw devices are not supported

• EXT4, XFS – we don't have much experience with these yet

Configuration : Disk Data Storage

• Use Disk Data tables for:

• Simple accesses (read/write on PK)

• As with InnoDB, you can easily become I/O bound (check with iostat)

• Set:

• DiskPageBufferMemory=3072M

• A good start if you rely a lot on disk data. It plays the same role as the InnoDB buffer pool, so set it as high as you can!

• A larger buffer increases the chance that a page is already cached

• SharedGlobalMemory=384M-1024M

• UNDO_BUFFER_SIZE=64M to 128M (if you write a lot)

• You cannot change this buffer later!

• It is specified at LOGFILE GROUP creation time

• DiskIOThreadPool=[ 8 .. 16 ] (introduced in 7.0)

Configuration : General

• Set:

• MaxNoOfExecutionThreads <= number of cores

• Otherwise contention will occur → unexpected behaviour.

• RedoBuffer=32-64M

• If you need to set it higher → your disks are probably too slow

• FragmentLogFileSize=256M

• NoOfFragmentLogFiles = 6 x DataMemory (in MB) / (4 x 256MB)

• Most common issue – customers never configure a large enough redo log

• The above parameters (and others, also for MySQL) are set for production usage at:

• www.severalnines.com/config
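The NoOfFragmentLogFiles rule above works out as in this Python sketch. The function name and the 4096MB DataMemory are assumptions for illustration; the result is rounded up so the redo log is never smaller than the target.

```python
import math


def no_of_fragment_log_files(data_memory_mb, redo_factor=6, fragment_log_file_size_mb=256):
    """NoOfFragmentLogFiles = redo_factor x DataMemory / (4 x FragmentLogFileSize)."""
    return math.ceil(redo_factor * data_memory_mb / (4 * fragment_log_file_size_mb))


data_memory_mb = 4096  # example value, not taken from the slides
print("FragmentLogFileSize=256M")
print(f"NoOfFragmentLogFiles={no_of_fragment_log_files(data_memory_mb)}")  # NoOfFragmentLogFiles=24
```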

Administration

• Data nodes – designed for zero maintenance.

• Logs

• Writes error logs and trace files in its data directory.

• The number of error messages/trace files to keep is configurable

• Memory Fragmentation

• Free pages are reclaimed and can be reused

• If you do a lot of insert/delete on VAR* attributes (of different sizes) you can get fragmentation

• OPTIMIZE TABLE / Rolling restart of data nodes can help reduce fragmentation

• See http://johanandersson.blogspot.com/2009/03/memory-deallocationdefragmentation-and.html

• Management servers

• Writes cluster log (rotating, size configurable) in its data directory

• Cluster logs can be sent to Syslog if desired

• http://www.clusterdb.com/mysql/using-syslog-with-mysql-cluster/

Administration

• MySQL Servers

• Binary logs (if enabled) must be removed manually (this can be done with --expire_logs_days, but are you sure all of them have been applied on the slave?)

• General log / error log / slow log do not rotate automatically. A script called mysql-log-rotate can help.

• Or move/copy the logs manually (or with a script) and do FLUSH LOGS

• For MySQL Cluster it is also good to have a dedicated MySQL Server for administration purposes.

• Perform offline ALTER TABLE (like change data type etc)

Administration Layer

• Introduce a MySQL Server for administration purposes!

• It should never get application requests

• Simplifies heavy (non online) schema changes

[Diagram labels: Storage layer; SQL layer; Application layer; Admin layer; Synchronous Replication]

# give the admin mysqld an explicit nodeid in config.ini:
[mysqld]
NodeId=8
hostname=X

# in my.cnf:
ndb_connectstring="nodeid=8;x,y"
ndb_cluster_connection_pool=1

Administration Layer

• Modifying the schema is NOT online when you perform the following:

• Rename a table

• Change data type

• Change storage size

• Drop column

• Rename column

• Add/Drop a PRIMARY KEY

• Altering a 1GB table requires 1GB of free DataMemory (copying)

• Online (and ok to do with transactions ongoing):

• Add column (ALTER ONLINE …)

• CREATE INDEX

• Online add node

Administration Layer

• ALTER TABLE etc (non-online DDL) is performed on the Admin Layer!

[Diagram labels: Storage; SQL layer; App layer; Admin layer; Synchronous Replication; the admin mysqld uses an explicit nodeid (8) in config.ini and my.cnf]

• 1. Block traffic from the SQL layer to the data nodes (STOP!! No traffic now!)

• ndb_mgm> ENTER SINGLE USER MODE 8

• Only the Admin mysqld (node id 8) is now connected to the data nodes

• Or do LOCK TABLES on the whole SQL Layer!

• 2. Perform the heavy ALTER on the admin layer

• 3. Allow traffic from the SQL layer to the data nodes again

• ndb_mgm> EXIT SINGLE USER MODE

• Or do UNLOCK TABLES on the whole SQL Layer!

Administration Layer

• You can also set up MySQL Replication from the Admin layer to the SQL layer

• Replicate the mysql database

• GRANTs, SPROCs etc will be replicated.

• Keeps the SQL Layer aligned

[Diagram labels: Storage layer; SQL layer; App layer; Admin layer; Synchronous Replication]

binlog_do_db=mysql

Online Upgrades

• You can change online:

• OS, SW version (7.0.x → 7.1.x)

• Configuration (e.g. increase DataMemory (DM), IndexMemory (IM), buffers, redo log, [mysqld] slots etc)

• Hardware (e.g. add more RAM)

• These procedures require a Rolling Restart:

• Change config.ini, copy it over to all ndb_mgmd

• Stop ndb_mgmd , start ndb_mgmd with --reload

• Restart one data node at a time

• Restart one mysqld at a time

• Adding data nodes (7.0 and above)

• Adding MySQL Servers

• Make sure you have free [mysqld] slots

• Start the new mysqld
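The rolling restart order described above can be written down as a checklist generator. This Python sketch only produces human-readable steps; the host and node names are hypothetical, and waiting for each data node to finish starting before touching the next one is an assumption added here because restarting both nodes of a node group at once would take the cluster down.

```python
def rolling_restart_plan(mgm_nodes, data_nodes, mysql_servers):
    """Return the rolling restart steps in the order given on the slide."""
    steps = ["change config.ini and copy it to: " + ", ".join(mgm_nodes)]
    steps += [f"stop ndb_mgmd on {n}, then start it with --reload" for n in mgm_nodes]
    steps += [f"restart data node {n} and wait until it has started" for n in data_nodes]
    steps += [f"restart mysqld on {n}" for n in mysql_servers]
    return steps


for step in rolling_restart_plan(["mgm1", "mgm2"], ["ndb3", "ndb4"], ["sql1", "sql2"]):
    print(step)
```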

Scaling

• One data node can (7.0+) use up to 8 cores

• CPU: Reaches bottleneck at about 370% CPU

• add another node group (to spread load)

• DISK: iostat -kx 1 : check %util, await, svctm etc.

• Add disks

• NETWORK: iftop (linux)

• add another node group (to spread load)

• MySQL Server

• CPU: About the same – 300-500%

• Add another MySQL Server to offload query processing

• DISK: Should not be a factor if you are using only NDB tables

• NETWORK:

• Add another MySQL Server to offload query processing

Monitoring

• Mandatory to monitor

• CPU/Network/Memory usage

• Disk capacity (I/O) usage

• Network latency between nodes

• Node status ...

• Used Index/Data Memory

• www.severalnines.com/cmon - monitors data nodes and mysql servers

• New in MySQL Cluster 7.1 :

• NDB$INFO (shipped in 7.1 as the ndbinfo database)

• Check node status

• Check buffer status etc

• Statistics

Best Practice : Use a good config

• http://www.severalnines.com/config

• Config for data nodes as well as mysql servers

• Scripts to manage the cluster

• MySQL Cluster Manager can be used for management

• Available as part of MySQL Cluster Carrier Grade Edition only

• http://www.mysql.com/products/database/cluster/mcm/

• MySQL Enterprise Monitor can monitor the MySQL Servers (custom scripts can be written to also monitor data nodes)

Best Practice : Primary Keys

• To avoid problems with

• Cluster-to-Cluster replication

• Recovery

• Application behavior (KEY NOT FOUND etc)

• ALWAYS DEFINE A PRIMARY KEY ON THE TABLE!

• A hidden PRIMARY KEY is added if no PK is specified, BUT this is NOT recommended

• The hidden primary key is, for example, not replicated (between Clusters)!!

• There are problems in this area, so avoid them!

• So always have at least: id BIGINT AUTO_INCREMENT PRIMARY KEY

• Even if you don't "need" it for your applications

Best Practice : Query Cache

• Don't enable the Query Cache!

• It is very expensive to invalidate over X mysql servers

• A write on one server will force the others to purge their cache.

• If you have tables that are read only (or change very seldom):

• my.cnf:

• query_cache_type=2 (ON DEMAND)

• SELECT SQL_CACHE <cols> .. FROM table;

• Cache only queries with SQL_CACHE

• This can be good for STATIC data

Best Practice : Large Transactions

• Remember NDB is designed for many and short transactions

• You are recommended to UPDATE / DELETE in small chunks

• Use LIMIT 10000 until all records are UPDATED/DELETED

• MaxNoOfConcurrentOperations sets the upper limit for how many records can be modified simultaneously on one data node.

• MaxNoOfConcurrentOperations=1000000 will use 1GB of RAM

• Although large transactions are possible, we recommend DELETE/UPDATE in smaller chunks.
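The "LIMIT 10000 until done" advice above amounts to a simple loop, sketched here in Python. `modify_in_chunks` and the `execute` callable are hypothetical: `execute` is assumed to run one statement in its own transaction and return the number of affected rows. For an UPDATE, the WHERE clause must stop matching rows once they have been updated, otherwise the loop never ends.

```python
def modify_in_chunks(execute, statement, chunk_size=10000):
    """Run `statement LIMIT chunk_size` repeatedly until a chunk comes back short."""
    total = 0
    while True:
        affected = execute(f"{statement} LIMIT {chunk_size}")
        total += affected
        if affected < chunk_size:
            return total  # last (possibly empty) chunk processed
```

A call could look like `modify_in_chunks(run_sql, "DELETE FROM t1 WHERE created < '2010-01-01'")`, where `run_sql` and the table are placeholders for the application's own code and schema.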

Best Practice : Table logging

• Some types of tables account for a lot of WRITEs, but do not need to be recovered (e.g. session tables)

• For a session table, REDO logging and checkpointing are often unnecessary

• Create these tables as 'NO LOGGING' tables:

mysql> set @ndb_curr_val=@@ndb_table_no_logging;
mysql> set ndb_table_no_logging=1;
mysql> create table session_table(..) engine=ndb;
mysql> set ndb_table_no_logging=@ndb_curr_val;

• 'session_table' will not be REDO logged or checkpointed → no disk activity for this table!

• After a System Restart it will be there, but empty!

Best Practice : Backup

• Backup of NDB tables

• Online – can have ongoing transactions

• Consistent – only committed data and changes are backed up

• ndb_mgm -e "START BACKUP"

• Copy the backup files from the data nodes to a safe location

• Non-NDB tables must be backed up separately

• MySQL system tables are stored only in MyISAM.

• You want to back up (for each mysql server):

• mysql database

• Triggers, routines, events ...

• Use 'mysqldump':

• mysqldump mysql > mysql.sql

• mysqldump --no-data --no-create-info -R <database> > routines.sql

• Copy my.cnf & config.ini files

Best Practice: Restore

• ndb_restore is in many cases the MOST write-intensive operation on Cluster

• The problem is that ndb_restore produces REDO LOG

• This is unnecessary, but a fact for now

• It restores many records in parallel, with no throttling

• So 128 or more small records in parallel may be fine, but 128 BLOBs ...

• If you run into this error during restore:

Temporary error: 410: REDO log buffers overloaded, consult online manual (increase RedoBuffer, and|or decrease TimeBetweenLocalCheckpoints, and|or increase NoOfFragmentLogFiles)

• Try increasing RedoBuffer (a value higher than 64MB is seldom practical or needed)

• Run only one instance of ndb_restore

• Lower the parallelism: ndb_restore -p10 ....

• Or use an even lower value, e.g. -p1

• If this does not help → faster disk(s) are needed

Questions?

• Email: johan.andersson@oracle.com

• Blog: johanandersson.blogspot.com

Resources

• Getting Started with MySQL Cluster – 5 Steps, <15 minutes

• http://www.mysql.com/products/database/cluster/get-started.html#quickstart

• Getting Started & Scaling Webinar, Sept 8th

• http://www.mysql.com/news-and-events/web-seminars/display-566.html

• MySQL Cluster Evaluation Guide

• http://www.mysql.com/why-mysql/white-papers/mysql_cluster_eval_guide.php

• MySQL Cluster Performance Tuning Best Practices

• http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster_perfomance.php
