MySQL Cluster Deployment Best Practices
Johan Andersson, MySQL Cluster Practice Manager
Mat Keep, MySQL Product Team
Agenda
• Suitable Applications
• MySQL Cluster compared to InnoDB – main differences
• Network & Hardware Selection
• Disk Data Deployment
• Configuration
• Administration & Implementation Best Practices
• Online/Offline Operations
• Backup and Restore
• Monitoring
• Services available to get started
MySQL Cluster – Users & Applications
HA, Transactional Services: Web & Telecoms
http://www.mysql.com/customers/cluster/
• User & Subscriber Databases
• Service Delivery Platforms
• Application Servers
• Web Session Stores
• eCommerce
• VoIP, IPTV & VoD
• Mobile Content Delivery
• On-Line app stores and portals
• On-Line Gaming
• DNS/DHCP for Broadband
• Payment Gateways
• Data Store for LDAP Directories
Suitable Applications
• Good fit:
  • OLTP apps with short-running queries
  • Applications with real-time characteristics and requirements
  • A lot of concurrent requests
  • Write-intensive applications
• Typically a poor fit:
  • Heavy reporting (OLAP)
  • Data warehouses
  • Complex JOINs, which scale badly (a JOIN across a couple of tables with about 1,000 records meeting the JOIN criteria is just fine)
• However, you can replicate from MySQL Cluster to a regular MySQL Server (InnoDB) that runs the reporting.
Realtime and Reporting Architecture
[Diagram: Realtime Apps (App Servers → SQL Layer → Storage Layer) feeding a separate Reporting System that runs the complex reporting queries]
• Ways to move data to the reporting system:
  • Replication
  • mysqldump
  • ndb_restore → CSV → LOAD DATA INFILE
• Don't mix real-time operations with reporting: separate them!
• How to set this up with replication: http://johanandersson.blogspot.com/2009/05/ha-mysql-write-scaling-using-cluster-to.html
Data Collection/Aggregation Architecture
Aggregate data from peripheral systems (sources)
http://johanandersson.blogspot.com/2009/04/multi-source-replication-with-mysql.html
HA Shard Catalog
[Diagram: App Servers with a Memcached / caching layer, SQL Layer and Storage Layer; a MySQL Cluster Shard Catalog in front of the shards Shard_0 … Shard_n]
• The Shard Catalog stores user_id → shard_id and other indexes/mappings (e.g. user_id → friend_id:shard_id).
• The Shard Catalog can grow online.
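The lookup path implied by the diagram can be sketched as follows, assuming the application consults the caching layer first and falls back to the Shard Catalog on a miss. Plain dicts stand in for memcached and for the catalog table; `ShardRouter`, `catalog` and `cache` are hypothetical names, not part of any MySQL API.

```python
class ShardRouter:
    """Resolve user_id -> shard_id via a cache, falling back to the catalog.

    `catalog` stands in for the Shard Catalog table in MySQL Cluster and
    `cache` for memcached; both are plain dicts here (hypothetical wiring).
    """

    def __init__(self, catalog, cache=None):
        self.catalog = catalog
        self.cache = {} if cache is None else cache
        self.catalog_lookups = 0  # counts round trips to the catalog

    def shard_for(self, user_id):
        shard_id = self.cache.get(user_id)
        if shard_id is None:
            # Cache miss: read the mapping from the Shard Catalog
            # (in production, a primary-key lookup on the catalog table).
            self.catalog_lookups += 1
            shard_id = self.catalog[user_id]
            self.cache[user_id] = shard_id
        return shard_id

    def add_user(self, user_id, shard_id):
        # New mappings are plain inserts, so the catalog grows online.
        self.catalog[user_id] = shard_id
        self.cache[user_id] = shard_id
```

Because every catalog access is a primary-key read or insert, this is exactly the short, highly concurrent workload the earlier "Good fit" list describes.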
MySQL Cluster compared to InnoDB / Other databases
• Every database has its own characteristics
• MySQL Cluster is designed for
  • Short, but many, parallel transactions
  • High volume
  • High degree of concurrency
  • High availability (99.999%)
• Let’s look at how MySQL Cluster compares to InnoDB (and most other traditional databases)
MySQL Cluster compared to InnoDB – Table Locks
• Table locks are usually taken before an offline operation (e.g. an ALTER to change a data type). During normal traffic a finer granularity, such as row-level locking, is preferred.
• InnoDB
  • LOCK TABLES tablename READ locks the table for writes on the MySQL Server.
• MySQL Cluster
  • LOCK TABLES tablename READ locks the table for writes only on the MySQL Server where the command is issued!!
  • To lock 'tablename' on the entire cluster you must run LOCK TABLES tablename READ on every MySQL Server.
  • Or, if you use the Configurator scripts:
    • cd tools
    • ./execute-all-mysql.sh -e "LOCK TABLES tablename READ"
MySQL Cluster compared to InnoDB – ALTER
• InnoDB
  • ALTER TABLE is blocking: the altered table is locked.
• MySQL Cluster
  • Online (non-blocking): add column (ALTER ONLINE TABLE … ADD COLUMN x BIGINT), add index, drop index.
  • Other ALTERs (changing column size, data type, column name etc.) are not online.
  • A non-online ALTER TABLE does not block the other MySQL Servers!
    • You can run the ALTER TABLE on one MySQL Server and still write to the table through another MySQL Server → inconsistent data.
    • There is no table lock distributed across all MySQL Servers.
  • Use LOCK TABLES manually on all MySQL Servers first, then run the ALTER, then UNLOCK TABLES on all MySQL Servers.
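The LOCK / ALTER / UNLOCK sequence can be sketched in Python, assuming DB-API 2.0 connections (for example from a MySQL driver such as PyMySQL) to each SQL-layer mysqld plus one connection to a mysqld that is not in the locked list (for example an admin mysqld). `alter_with_cluster_wide_lock` is a hypothetical helper. The SQL-layer sessions must stay open until the ALTER finishes, because MySQL releases LOCK TABLES locks when the session ends, and the ALTER runs through a separate session because a session holding a READ lock on a table cannot ALTER it.

```python
def alter_with_cluster_wide_lock(sql_layer_conns, alter_conn, table, alter_sql):
    """Lock `table` for writes on every SQL-layer mysqld, run the ALTER
    through `alter_conn`, then unlock everywhere (hypothetical helper).

    All connections are DB-API 2.0 connections. `table` is interpolated
    into SQL (identifiers cannot be bound as parameters), so it must come
    from trusted code, never from user input.
    """
    locked = []
    try:
        for conn in sql_layer_conns:
            cur = conn.cursor()
            cur.execute(f"LOCK TABLES {table} READ")
            locked.append(cur)  # keep the session (and its lock) alive
        alter_conn.cursor().execute(alter_sql)
    finally:
        # Always release the locks, even if a LOCK or the ALTER failed.
        for cur in locked:
            cur.execute("UNLOCK TABLES")
```

For example: `alter_with_cluster_wide_lock([conn_a, conn_b], admin_conn, "t1", "ALTER TABLE t1 MODIFY c BIGINT")`, where all four names are placeholders.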
MySQL Cluster compared to InnoDB – FOREIGN KEYS
• Considerations for foreign keys
  • FKs simplify business logic, but they incur a performance overhead
  • What is the role of your data? What is the role of the application?
• InnoDB
  • Is currently the only storage engine supporting foreign keys
• MySQL Cluster
  • The workaround is to use TRIGGERs to emulate foreign keys
  • For more info: http://forge.mysql.com/wiki/ForeignKeySupport#Appendix_A:_Triggers_implementing_foreign_key_constraints
MySQL Cluster compared to InnoDB – Transactions
• Failed transactions must be retried by the application
  • This is also true for InnoDB (and most other databases on the market)
• If the REDO log or REDO buffer becomes full, the transaction is aborted
  • This differs from InnoDB, which will instead run slower (and potentially grind to a virtual halt)
• There are also other resources / timeouts:
  • "Lock wait timeout" – the transaction is aborted after TransactionDeadlockDetectionTimeout
  • MaxNoOfConcurrent[Operations/Transactions]
• A node failure / node restart will cause the transaction to abort
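Because a failed NDB transaction is simply aborted, the application needs a retry loop. A minimal sketch follows, assuming the driver exposes the MySQL error number on its exceptions. `TransientError` is a hypothetical stand-in for the driver's exception class, and the set of retryable codes (1205 lock wait timeout, 1213 deadlock, 1297 temporary error from NDB) is an assumption to check against your server version.

```python
import time

# Assumed retryable MySQL error numbers; verify for your version:
# 1205 = lock wait timeout, 1213 = deadlock, 1297 = temporary NDB error.
RETRYABLE_ERRNOS = {1205, 1213, 1297}


class TransientError(Exception):
    """Hypothetical stand-in for a driver exception carrying a MySQL errno."""

    def __init__(self, errno):
        super().__init__(f"MySQL error {errno}")
        self.errno = errno


def run_with_retry(txn, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Run `txn` and retry it on temporary errors with exponential backoff.

    `txn` must execute and commit the WHOLE transaction and roll back on
    error, so that a retry starts from a clean state.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except TransientError as exc:
            if exc.errno not in RETRYABLE_ERRNOS or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Non-retryable errors (for example a duplicate key) are re-raised immediately, so the loop only absorbs the transient aborts listed above.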
MySQL Cluster compared to InnoDB – System Tables
• System tables are stored in MyISAM format (same as with InnoDB)
• The system tables are local to each MySQL Server
• You must issue/create …:
  • GRANTs
  • Triggers
  • SPROCs
  • Views
  • Events (MySQL's internal cron)
• … on all MySQL Servers connected to the Cluster.
Example Setup
[Diagram: Clients → Load Balancer(s) → two hosts, each running SQL + Mgm (+ AppServer + WebServer ...), connected to two Data nodes through bonded NICs and redundant switches]
Recommendation
• Start with four computers...
  • 2 x Data Nodes
  • 2 x MySQL Servers
  • 2 x Management servers
• … and scale it from there.
[Diagram: two hosts each running MYSQLD + NDB_MGMD, two hosts each running NDBMTD]
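The four-host starting point can be expressed as a config.ini. The sketch below generates a skeleton, assuming the MySQL Server and management server share two hosts and the data nodes run on the other two; the host addresses and the `make_config_ini` helper are hypothetical, and only structural parameters (NoOfReplicas, NodeId, HostName) are set. The [ndbd] section is also used for multi-threaded data nodes (ndbmtd).

```python
def make_config_ini(mgm_hosts, data_hosts, sql_hosts, spare_mysqld_slots=2):
    """Build a skeleton config.ini for the recommended starting layout.

    Only structural settings are emitted; DataMemory, IndexMemory, REDO
    log and buffer parameters still need to be added.
    """
    lines = ["[ndbd default]", "NoOfReplicas=2", ""]
    node_id = 1
    for section, hosts in (("ndb_mgmd", mgm_hosts),
                           ("ndbd", data_hosts),
                           ("mysqld", sql_hosts)):
        for host in hosts:
            lines += [f"[{section}]", f"NodeId={node_id}", f"HostName={host}", ""]
            node_id += 1
    # Empty [mysqld] slots let extra MySQL Servers (e.g. an admin mysqld)
    # connect later without first changing config.ini and restarting.
    lines += ["[mysqld]", ""] * spare_mysqld_slots
    return "\n".join(lines)
```

For example, `make_config_ini(["10.0.1.1", "10.0.1.2"], ["10.0.1.3", "10.0.1.4"], ["10.0.1.1", "10.0.1.2"])` gives the management servers node ids 1-2, the data nodes 3-4 and the MySQL Servers 5-6, plus two free [mysqld] slots.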
Hardware Selection : Network I
• Dedicated >= 1Gb/s networking
  • On Oracle Sun CMT it may be necessary to bond 4 or more NICs together, because typically many data nodes run on the same physical host.
• Prevent network failures (2 x NIC, bonding, dual switches)
• Use a dedicated network for cluster communication
  • Put the data nodes and MySQL Servers on e.g. the 10.0.1.0 network and let MySQL listen on a "public" interface.
• There is no security layer on the management node
  • Allow access to port 1186 only from cluster nodes and administrators
Hardware Selection : Network II
• The speed of the network greatly affects performance
• Check it with: ping <hostname>
• If the ping time is > 0.200ms (on 1Gig-E), check:
  • Routes – do you have more than one switch hop from one data node to another?
  • Do you have full duplex?
  • Is NAPI enabled (it should be)?
• On my machines I get 0.150ms (on 1Gig-E), but with good switches 0.080-0.100ms is also possible
• JUMBO frames: you can try enabling them, but I have not seen any noticeable improvement from this.
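The 0.200ms rule can be checked automatically by parsing the summary line that ping prints. A sketch, assuming Linux iputils output of the form `rtt min/avg/max/mdev = a/b/c/d ms` (other platforms print a similar `round-trip ... = a/b/c/d ms` line, which the regex also matches); `avg_rtt_ms` and `needs_network_check` are hypothetical helpers.

```python
import re

# Matches ping's summary numbers, e.g.
# "rtt min/avg/max/mdev = 0.101/0.150/0.201/0.030 ms"
_RTT_SUMMARY = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def avg_rtt_ms(ping_output):
    """Return the average round-trip time (ms) from ping's summary line."""
    match = _RTT_SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no rtt summary line found in ping output")
    return float(match.group(2))


def needs_network_check(ping_output, threshold_ms=0.200):
    """True if the average RTT exceeds the 0.200ms (1Gig-E) rule of thumb."""
    return avg_rtt_ms(ping_output) > threshold_ms
```

The input could come from `subprocess.run(["ping", "-c", "10", "datanode2"], capture_output=True, text=True).stdout`, where `datanode2` is a placeholder host name.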
Hardware Selection - RAM & CPU
• Storage Layer (Data nodes)
  • One data node can (7.0+) use 8 cores
  • CPU: 2 x 4 cores (Nehalem works really well). Faster CPU → faster processing of messages.
  • RAM: As much as you need
    • A 10GB data set requires 20GB of RAM in total (because of redundancy: two replicas)
    • Each node then needs 2 x 10GB / # of data nodes: with 2 data nodes → 10GB of RAM per node → 16GB RAM is good
• SQL Layer (MySQL Servers)
  • CPU: 2 – 16 cores
  • RAM: Not as important – 4GB is enough (depends on connections and buffers)
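The RAM rule of thumb for data nodes can be written as a small helper. It assumes NoOfReplicas=2 (the redundancy referred to above) and covers only the data set itself; the gap between 10GB and 16GB in the example is headroom for indexes, buffers and the OS. `ram_per_data_node_gb` is a hypothetical name.

```python
def ram_per_data_node_gb(dataset_gb, data_nodes, replicas=2):
    """RAM each data node needs for the data set alone:
    replicas x data set size / number of data nodes."""
    if data_nodes <= 0 or data_nodes % replicas:
        raise ValueError("data node count must be a positive multiple of NoOfReplicas")
    return replicas * dataset_gb / data_nodes
```

With the numbers from the slide, `ram_per_data_node_gb(10, 2)` is 10GB per node; doubling to four data nodes halves it to 5GB.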
Hardware Selection - Disk Subsystem
• Low-end: 1 x SATA 7200RPM
  • For read-mostly workloads, not write-heavy ones
  • No redundancy (but the other data node is the mirror)
• Mid-end: 1 x SAS 10KRPM
  • Heavy duty (many MB/s)
  • No redundancy (but the other data node is the mirror)
• High-end: 4 x SAS 10KRPM
  • Heavy duty (many MB/s)
  • Disk redundancy (RAID1+0), hot swap
• In each setup the disks hold the LCP and the REDO log
• REDO, LCP, BACKUP – written sequentially in small chunks (256KB)
• If possible, use ODirect=1
Hardware Selection - Disk Data Storage
• Minimal: one disk holding the TABLESPACE, LCP, REDO log and UNDO log
• Recommended: 2 x SAS 10KRPM (preferably), with the TABLESPACE separated from the LCP / REDO log / UNDO log
• High-end: 4 x SAS 10-15KRPM (preferably), with TABLESPACE 1 and TABLESPACE 2 on disks separate from the LCP, REDO log and UNDO log
• Use high-end for heavy read/write workloads (1000's of 10KB records per sec), e.g. content delivery platforms
• SSD for the TABLESPACE is also interesting – not much experience of this yet
• Having the TABLESPACE on a separate disk is good for read performance
• Enable WRITE_CACHE on devices
Disk Space Usage
• The data nodes use the disk for:
  • LCP: 3 x sizeof(used DataMemory)
  • REDO: [4-6] x DataMemory
    • More (6x) REDO log for write-intensive applications
    • Don't configure too short a REDO log (e.g. 2x or 3x)
  • Backups: sizeof(used DataMemory)
  • TableSpace (if using disk data tables): must fit the dataset.
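Adding up the items above gives a rough per-data-node disk budget. A minimal sketch, assuming all sizes are per data node in GB and using the 4-6x REDO range from the slide; `data_node_disk_gb` is a hypothetical helper and it ignores the OS, error logs and trace files. For example, DataMemory=10GB with 8GB used and 6x REDO gives 24 + 60 + 8 = 92GB.

```python
def data_node_disk_gb(data_memory_gb, used_data_memory_gb,
                      redo_factor=6, tablespace_gb=0):
    """Rough disk budget for one data node:
    LCP 3 x used DataMemory, REDO 4-6 x DataMemory, one backup of used
    DataMemory, plus the disk data tablespace (if any)."""
    if not 4 <= redo_factor <= 6:
        raise ValueError("use 4-6 x DataMemory of REDO log")
    lcp = 3 * used_data_memory_gb
    redo = redo_factor * data_memory_gb
    backup = used_data_memory_gb
    return lcp + redo + backup + tablespace_gb
```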
Choosing the Filesystem
• Most customers use EXT3 (Linux) and UFS (Solaris)
  • EXT2 is an option (but recovery takes longer)
  • Mount with noatime
• ZFS
  • You must separate the journal (ZIL) and the filesystem
• Raw devices are not supported
• EXT4, XFS – we don't have much experience with these yet
Configuration : Disk Data Storage
• Use Disk Data tables for
  • Simple accesses (read/write on PK)
  • Same as for InnoDB – you can easily become IO bound (check with iostat)
• Set
  • DiskPageBufferMemory=3072M
    • A good start if you rely a lot on disk data. It works like the InnoDB buffer pool, so set it as high as you can!
    • Increases the chance that a page will be cached
  • SharedGlobalMemory=384M-1024M
  • UNDO_BUFFER_SIZE=64M to 128M (if you write a lot)
    • You cannot change this buffer later!
    • It is specified at LOGFILE GROUP creation time
  • DiskIOThreadPool=[8 .. 16] (introduced in 7.0)
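Because UNDO_BUFFER_SIZE is fixed when the log file group is created, it is worth scripting the disk data DDL. The sketch below only builds the SQL strings; the object names (`lg_1`, `ts_1`), file names and sizes are hypothetical placeholders, while CREATE LOGFILE GROUP / CREATE TABLESPACE is standard MySQL Cluster DDL. Tables then opt in with `TABLESPACE ts_1 STORAGE DISK`.

```python
def disk_data_ddl(undo_buffer_mb=128, undo_file_gb=4, data_file_gb=16):
    """Build DDL for one log file group and one tablespace.

    Names ('lg_1', 'ts_1', file names) are hypothetical placeholders.
    UNDO_BUFFER_SIZE cannot be changed after CREATE LOGFILE GROUP, so
    choose it (64M-128M for write-heavy loads) before running this.
    """
    return [
        "CREATE LOGFILE GROUP lg_1 "
        "ADD UNDOFILE 'undo_1.log' "
        f"INITIAL_SIZE {undo_file_gb}G "
        f"UNDO_BUFFER_SIZE {undo_buffer_mb}M "
        "ENGINE NDBCLUSTER",
        "CREATE TABLESPACE ts_1 "
        "ADD DATAFILE 'data_1.dat' "
        "USE LOGFILE GROUP lg_1 "
        f"INITIAL_SIZE {data_file_gb}G "
        "ENGINE NDBCLUSTER",
    ]
```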
Configuration : General
• Set
  • MaxNoOfExecutionThreads <= #cores
    • Otherwise contention will occur → unexpected behaviour.
  • RedoBuffer=32-64M
    • If you need to set it higher → your disks are probably too slow
  • FragmentLogFileSize=256M
  • NoOfFragmentLogFiles = 6 x DataMemory (in MB) / (4 x 256MB)
    • Most common issue – customers never configure a large enough REDO log
• The above parameters (and others, also for MySQL) are set for production usage at:
  • www.severalnines.com/config
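The NoOfFragmentLogFiles formula as a helper, with a worked value: for DataMemory=4096M, 6 × 4096 / (4 × 256) = 24 files, i.e. 24 × 4 × 256MB = 24GB of REDO log (6 × DataMemory). The 4 in the denominator reflects that each fragment log file set contains 4 files of FragmentLogFileSize; the function names are hypothetical.

```python
import math


def no_of_fragment_log_files(data_memory_mb, redo_factor=6,
                             fragment_log_file_size_mb=256):
    """NoOfFragmentLogFiles = redo_factor x DataMemory / (4 x FragmentLogFileSize).

    Rounded up so the REDO log is never smaller than intended.
    """
    return math.ceil(redo_factor * data_memory_mb / (4 * fragment_log_file_size_mb))


def total_redo_mb(no_of_files, fragment_log_file_size_mb=256):
    """Total REDO log size: 4 files per set x number of sets x file size."""
    return 4 * no_of_files * fragment_log_file_size_mb
```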
Administration
• Data nodes – designed for zero maintenance.
  • Logs
    • Each data node writes error logs and trace files in its data directory.
    • How many error messages/trace files are saved is configurable (MaxNoOfSavedMessages).
  • Memory fragmentation
    • Free pages are reclaimed and can be reused
    • If you do a lot of inserts/deletes on VAR* attributes (of different sizes) you can get fragmentation
    • OPTIMIZE TABLE or a rolling restart of the data nodes can help reduce fragmentation
    • See http://johanandersson.blogspot.com/2009/03/memory-deallocationdefragmentation-and.html
• Management servers
  • Write the cluster log (rotating, size configurable) in their data directory
  • Cluster logs can be sent to syslog if desired
    • http://www.clusterdb.com/mysql/using-syslog-with-mysql-cluster/
Administration
• MySQL Servers
  • Binary logs (if enabled) must be removed manually (this can be done with --expire_logs_days, but are you sure all of them have been applied on the slave?)
  • The general log / error log / slow log do not rotate automatically. A script called mysql-log-rotate can help.
    • Or move/copy the logs manually (or scripted) and run FLUSH LOGS
• For MySQL Cluster it is also good to have a dedicated MySQL Server for administration purposes.
  • Perform offline ALTER TABLE (like changing a data type etc.) there
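The question in the binary log bullet (has every slave applied the log?) can be answered before purging. A sketch, assuming the list of binary logs comes from SHOW BINARY LOGS on the master and each slave's position from the Relay_Master_Log_File column of SHOW SLAVE STATUS; `safe_purge_statement` is a hypothetical helper that only builds the statement.

```python
def safe_purge_statement(master_logs, slave_positions):
    """Build a PURGE BINARY LOGS statement that keeps every log a slave
    still needs, or return None if nothing can be purged.

    master_logs: binary log names from SHOW BINARY LOGS, oldest first.
    slave_positions: Relay_Master_Log_File from SHOW SLAVE STATUS on each
    slave, i.e. the master log holding the last event that slave executed.
    """
    if not slave_positions:
        return None  # no slave information: do not purge blindly
    # Oldest log still needed by any slave; raises ValueError if a slave
    # reports a log name the master does not list.
    oldest_needed = min(slave_positions, key=master_logs.index)
    if master_logs.index(oldest_needed) == 0:
        return None  # nothing older than what a slave still needs
    # PURGE ... TO 'x' removes the logs *before* x and keeps x itself.
    return f"PURGE BINARY LOGS TO '{oldest_needed}'"
```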
Administration Layer
• Introduce a MySQL Server for administration purposes!
  • It should never get application requests
  • Simplifies heavy (non-online) schema changes
[Diagram: Application layer → SQL layer → Storage layer (synchronous replication between the data nodes), plus a separate Admin layer mysqld connected to the storage layer]
# give an explicit nodeid in config.ini:
[mysqld]
id=8
hostname=X
# in my.cnf:
ndb_connectstring="nodeid=8;x,y"
ndb_cluster_connection_pool=1
Administration Layer
• Modifying the schema is NOT online when you:
  • Rename a table
  • Change a data type
  • Change storage size
  • Drop a column
  • Rename a column
  • Add/drop a PRIMARY KEY
• Altering a 1GB table requires 1GB of free DataMemory (the table is copied)
• Online (and OK to do with transactions ongoing):
  • Add column (ALTER ONLINE …)
  • CREATE INDEX
  • Online add node
Administration Layer
• ALTER TABLE etc. (non-online DDL) is performed on the admin layer!
• 1. Block traffic from the SQL layer to the data nodes
  • ndb_mgm> ENTER SINGLE USER MODE 8
  • Only the admin mysqld is now connected to the data nodes
  • Or do LOCK TABLES on the SQL layer!
• STOP!! No traffic now!
• 2. Perform the heavy ALTER on the admin layer
• 3. Allow traffic from the SQL layer to the data nodes again
  • ndb_mgm> EXIT SINGLE USER MODE
  • Or do UNLOCK TABLES on the whole SQL layer!
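The three steps can be scripted around the ndb_mgm and mysql command-line clients. A minimal sketch, assuming both clients are on PATH and the admin mysqld has node id 8 as in the config above; `heavy_alter_in_single_user_mode` and `admin_host` are hypothetical, and the ALTER statement should use a fully qualified table name (db.table) because no default database is selected.

```python
import subprocess


def heavy_alter_in_single_user_mode(admin_node_id, admin_host, alter_sql,
                                    run=subprocess.run):
    """Enter single user mode for the admin mysqld, run the ALTER there,
    and always leave single user mode again, even if the ALTER fails.

    A production script should also wait until ndb_mgm reports that single
    user mode is active before running the ALTER (not shown here).
    """
    def step(cmd):
        run(cmd, check=True)  # subprocess.run raises CalledProcessError on failure

    step(["ndb_mgm", "-e", f"ENTER SINGLE USER MODE {admin_node_id}"])
    try:
        step(["mysql", "-h", admin_host, "-e", alter_sql])
    finally:
        step(["ndb_mgm", "-e", "EXIT SINGLE USER MODE"])
```

The `try`/`finally` is the important design choice: a failed ALTER must not leave the SQL layer locked out of the data nodes.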
Administration Layer
• You can also set up MySQL Replication from the admin layer to the SQL layer
  • Replicate the mysql database
  • GRANTs, SPROCs etc. will be replicated
  • Keeps the SQL layer aligned
[Diagram: the Admin layer replicates to the SQL layer; App layer → SQL layer → Storage layer (synchronous replication)]
# in my.cnf on the admin mysqld:
binlog_do_db=mysql
Online Upgrades
• Change online:
  • OS, SW version (7.0.x → 7.1.x)
  • Configuration (e.g. increase DataMemory, IndexMemory, buffers, REDO log, [mysqld] slots etc.)
  • Hardware (upgrade RAM etc.)
• These procedures require a rolling restart:
  • Change config.ini and copy it to all ndb_mgmd hosts
  • Stop ndb_mgmd, then start ndb_mgmd with --reload
  • Restart one data node at a time
  • Restart one mysqld at a time
• Adding data nodes (7.0 and above)
• Adding MySQL Servers
  • Make sure you have free [mysqld] slots
  • Start the new mysqld
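The rolling-restart order can be captured as an explicit plan so that no step is skipped or reordered. A sketch that only produces the ordered steps, assuming node ids and host names are known from config.ini; `rolling_restart_plan` and the example names are hypothetical, and each step must complete (node back in "started" state) before the next begins.

```python
def rolling_restart_plan(mgm_hosts, data_node_ids, mysqld_hosts):
    """Ordered steps for a rolling restart after editing config.ini.

    Each step must finish (and the node must be back up) before the next
    one starts, so that only one data node is down at a time.
    """
    plan = []
    # 1. Management servers first, so they serve the new configuration.
    for host in mgm_hosts:
        plan.append(("ndb_mgmd", host, "stop; start again with --reload"))
    # 2. Data nodes one at a time; wait for 'started' (e.g. ndb_mgm -e "ALL STATUS").
    for node_id in data_node_ids:
        plan.append(("data node", node_id, f'ndb_mgm -e "{node_id} RESTART"'))
    # 3. Finally the MySQL Servers, one at a time.
    for host in mysqld_hosts:
        plan.append(("mysqld", host, "restart"))
    return plan
```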
Scaling
• Data nodes: one data node can (7.0+) use up to 8 cores
  • CPU: reaches a bottleneck at about 370% CPU
    • Add another node group (to spread the load)
  • DISK: iostat -kx 1: check %util, await, svctm etc.
    • Add disks
  • NETWORK: iftop (Linux)
    • Add another node group (to spread the load)
• MySQL Server
  • CPU: about the same – 300-500%
    • Add another MySQL Server to offload query processing
  • DISK: should not be a factor if you are using only NDB tables
  • NETWORK:
    • Add another MySQL Server to offload query processing
Monitoring
• Mandatory to monitor
• CPU/Network/Memory usage
• Disk capacity (I/O) usage
• Network latency between nodes
• Node status ...
• Used Index/Data Memory
• www.severalnines.com/cmon - monitors data nodes and mysql servers
• New in MySQL Cluster 7.1:
  • The ndbinfo database (NDB$INFO tables)
    • Check node status
    • Check buffer status etc.
    • Statistics
Best Practice : Use a good config
• http://www.severalnines.com/config
  • Config for data nodes as well as MySQL Servers
  • Scripts to manage the cluster
• MySQL Cluster Manager can be used for management
  • Available as part of MySQL Cluster Carrier Grade Edition only
  • http://www.mysql.com/products/database/cluster/mcm/
• MySQL Enterprise Monitor can monitor the MySQL Servers (custom scripts can be written to also monitor data nodes)
Best Practice : Primary Keys
• To avoid problems with
  • Cluster-to-Cluster replication
  • Recovery
  • Application behavior (KEY NOT FOUND.. etc)
• ALWAYS DEFINE A PRIMARY KEY ON THE TABLE!
  • A hidden PRIMARY KEY is added if no PK is specified. BUT..
  • .. it is NOT recommended
  • The hidden primary key is, for example, not replicated (between Clusters)!!
  • There are problems in this area, so avoid them!
• So always have at least: id BIGINT AUTO_INCREMENT PRIMARY KEY
  • Even if you don't "need" it for your application
Best Practice : Query Cache
• Don't enable the Query Cache!
  • It is very expensive to invalidate over X MySQL Servers
  • A write on one server forces the others to purge their cache.
• If you have tables that are read-only (or change very seldom):
  • my.cnf:
    • query_cache_type=2 (DEMAND), with query_cache_size > 0
  • SELECT SQL_CACHE <cols> .. FROM table;
    • Only queries with SQL_CACHE are cached
  • This can be good for STATIC data
Best Practice : Large Transactions
• Remember NDB is designed for many, short transactions
• We recommend running UPDATE / DELETE in small chunks
  • Use LIMIT 10000 until all records are UPDATED/DELETED
• MaxNoOfConcurrentOperations sets the upper limit for how many records can be modified simultaneously on one data node.
  • MaxNoOfConcurrentOperations=1000000 will use 1GB of RAM
  • Even though large transactions are possible, we recommend DELETE/UPDATE in smaller chunks.
Best Practice : Table logging
• Some types of tables account for a lot of WRITEs but do not need to be recovered (e.g. session tables)
  • A session table often does not need to be REDO logged or CHECKPOINTed
• Create these tables as 'NO LOGGING' tables:
mysql> set @ndb_curr_val=@@ndb_table_no_logging;
mysql> set ndb_table_no_logging=1;
mysql> create table session_table(..) engine=ndb;
mysql> set ndb_table_no_logging=@ndb_curr_val;
• 'session_table' will not be REDO logged or checkpointed → no disk activity for this table!
• After a system restart the table will still be there, but empty!
Best Practice : Backup
• Backup of NDB tables
  • Online – can have ongoing transactions
  • Consistent – only committed data and changes are backed up
  • ndb_mgm -e "START BACKUP"
  • Copy the backup files from the data nodes to a safe location
• Non-NDB tables must be backed up separately
  • MySQL system tables are stored only in MyISAM.
  • You want to back up (for each MySQL Server):
    • the mysql database
    • triggers, routines, events ...
  • Use mysqldump:
    • mysqldump mysql > mysql.sql
    • mysqldump --no-data --no-create-info -R -E --all-databases > routines.sql
• Copy the my.cnf & config.ini files
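Because the NDB backup is cluster-wide but the system tables are per server, a backup script runs START BACKUP once and the mysqldump commands once per MySQL Server. A sketch that only builds the command lines (run them with e.g. `subprocess.run`); the host names, `out_dir` and file naming are hypothetical, and mysqldump's `--result-file` option is used instead of shell redirection.

```python
def backup_commands(mysqld_hosts, out_dir="."):
    """Commands for one cluster-wide NDB backup plus per-mysqld dumps.

    START BACKUP covers all NDB tables once; the mysql database and the
    routines/events live on each MySQL Server, so they are dumped per host.
    `out_dir` and the file naming scheme are hypothetical.
    """
    cmds = [["ndb_mgm", "-e", "START BACKUP"]]
    for host in mysqld_hosts:
        # The mysql system database (GRANTs etc.) of this server.
        cmds.append(["mysqldump", "-h", host,
                     f"--result-file={out_dir}/{host}-mysql.sql", "mysql"])
        # Triggers, routines (-R) and events (-E), without table definitions or data.
        cmds.append(["mysqldump", "-h", host, "--no-data", "--no-create-info",
                     "-R", "-E", "--all-databases",
                     f"--result-file={out_dir}/{host}-routines.sql"])
    return cmds
```

Remember that the backup files written by START BACKUP stay on the data nodes until they are copied away.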
Best Practice: Restore
• ndb_restore is in many cases the MOST write-intensive operation on Cluster
  • The problem is that ndb_restore produces REDO log
  • This is unnecessary, but a fact for now
  • It restores many records in parallel, with no throttling
  • So 128 or more small records in parallel may be fine, but 128 BLOBs….
• If you run into the error below during restore:
  • Try increasing RedoBuffer (a value higher than 64MB is seldom practical or needed)
  • Run only one instance of ndb_restore
  • ndb_restore -p10 ....
    • Or an even lower value, e.g. -p1
  • If this does not help → faster disk(s) are needed
Temporary error: 410: REDO log buffers overloaded, consult online manual
(increase RedoBuffer, and|or
decrease TimeBetweenLocalCheckpoints, and|or increase NoOfFragmentLogFiles)
[Diagram: the RedoBuffer (RB) is synced to disk every TimeBetweenGlobalCheckpoints (TBGCP)]
Questions?
• Email: johan.andersson@oracle.com
• Blog: johanandersson.blogspot.com
Resources
• Getting Started with MySQL Cluster – 5 Steps, <15 minutes
• http://www.mysql.com/products/database/cluster/get-started.html#quickstart
• Getting Started & Scaling Webinar, Sept 8th
• http://www.mysql.com/news-and-events/web-seminars/display-566.html
• MySQL Cluster Evaluation Guide
• http://www.mysql.com/why-mysql/white-papers/mysql_cluster_eval_guide.php
• MySQL Cluster Performance Tuning Best Practices
• http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster_perfomance.php