James Sturrock, Operations Manager February 15, 2018 MySQL at Mastercard

MySQL At Mastercard - 2018 MySQL Days

Page 1: MySQL At Mastercard - 2018 MySQL Days

James Sturrock, Operations Manager February 15, 2018

MySQL at Mastercard

Page 2: MySQL At Mastercard - 2018 MySQL Days

©2018 Mastercard. Proprietary and Confidential.

2 JANUARY 19, 2018

About us

• We employ over 13,000 people worldwide

• One of the most recognizable brands in the world

• Our vision is “a World Beyond Cash™”

• Our mission is: Every day, everywhere, we use our technology and expertise to make payments safe, simple and smart

Page 3: MySQL At Mastercard - 2018 MySQL Days

Who am I?

• James Sturrock

• Operations Manager

• With Mastercard for over 7 years

• Part of the Payment Gateway Services division

Page 4: MySQL At Mastercard - 2018 MySQL Days

What we do

• Our payment gateways process financial transactions for merchants globally, across a variety of sectors such as:
– Ecommerce: major online brands
– Airlines
– Cardholder Present: pub and restaurant chains, high-street stores etc.

• We bridge the gap between your bank authorizing a payment and the merchant receiving the funds

• Due to the nature of our business, operationally we must focus on maintaining three key objectives:
1. Security – we handle people's personal data as well as cardholder data
2. Stability – huge financial and reputational cost to merchants if people can’t buy things
3. Scalability – we need to ensure we can always cope with unexpected surges in traffic (Black Friday, sporting events etc.)

Page 5: MySQL At Mastercard - 2018 MySQL Days

Why MySQL

• MySQL was a good fit for our Linux environment and open source approach

• Flexibility: you can use it in whatever way you need to

• Stability: MySQL is almost never the problem!

• Simplicity: MySQL can be used in as simple or as complex a way as you want

Page 6: MySQL At Mastercard - 2018 MySQL Days

Why MySQL Enterprise

• […] ensure that 3rd party vendor releases patches for security vulnerabilities in a timely manner. This can’t be guaranteed from the open source community.

• Traditionally we have failed to take advantage of the full suite of Enterprise tools such as:
– Enterprise Monitor
– Enterprise Authentication
– Enterprise Scalability

• These are all products which we are now using or evaluating!

Page 7: MySQL At Mastercard - 2018 MySQL Days

General Overview

• Around 40 servers running MySQL

• Anywhere between 1 and 12 running instances of MySQL on a single machine

• Vast majority are running MySQL Enterprise Edition

• All running on Red Hat Enterprise Linux (64 bit)

[Diagram labels: Hardware, Presentation, Operating System, Database]

Page 8: MySQL At Mastercard - 2018 MySQL Days

Monitoring

• We use MySQL Enterprise Monitor along with some legacy in-house log monitoring tools

• Nagios is used for system-level monitoring as well as basic MySQL checks (for example: are instances running, is replication stalled, how far behind is replication)

• Grafana is used for monitoring the “user experience” of the platform, which is often the best indicator of whether there is an actual problem
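
The basic replication checks described on this slide could be sketched as a Nagios-style plugin function. This is an illustration only, not the checks actually in use: the function name, its parameters and the 60s/300s thresholds are assumptions, and the inputs are meant to correspond to the Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master fields of SHOW SLAVE STATUS.

```python
# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_replication(io_running, sql_running, seconds_behind,
                      warn_after=60, crit_after=300):
    """Classify replica health from values like those in SHOW SLAVE STATUS.

    warn_after / crit_after are illustrative thresholds in seconds.
    Returns (exit_code, message).
    """
    if not (io_running and sql_running):
        # "Is replication stalled?": one of the replication threads is down.
        return CRITICAL, "CRITICAL: replication thread stopped"
    if seconds_behind is None:
        # MySQL reports NULL when the lag cannot be determined.
        return UNKNOWN, "UNKNOWN: lag not reported"
    if seconds_behind >= crit_after:
        return CRITICAL, f"CRITICAL: {seconds_behind}s behind master"
    if seconds_behind >= warn_after:
        return WARNING, f"WARNING: {seconds_behind}s behind master"
    return OK, f"OK: {seconds_behind}s behind master"
```

A real plugin would print the message and exit with the returned code so that Nagios can pick up the state.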

Page 9: MySQL At Mastercard - 2018 MySQL Days

Replication

• Classic “upside down tree” replication chain: a single read/write master replicates down the chain, one host at a time

• Having too many slaves replicating off one master can slow down the master!

• When carrying out a failover, there is much less remastering to be done

• Allows us to carry out major schema upgrades on all slaves, then fail over with no downtime

[Diagram labels: Host 1, Host 2, Host 3, Host 4, Host X]
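
To illustrate why a chain needs less remastering on failover than a layout where every slave reads from the master, here is a small sketch. The host names, the dictionary representation and the function name are hypothetical and only model the idea; they do not describe the real environment.

```python
def repoint_after_failover(topology, failed_master, new_master):
    """Return the replicas that must be re-pointed after a failover.

    topology maps each replica to the host it replicates from.
    Only replicas that read directly from the failed master (other than
    the promoted host itself) need a new source.
    """
    return sorted(
        replica
        for replica, source in topology.items()
        if source == failed_master and replica != new_master
    )


# Chain: host1 -> host2 -> host3 -> host4 (hypothetical host names).
chain = {"host2": "host1", "host3": "host2", "host4": "host3"}
# Star: every replica reads directly from host1.
star = {"host2": "host1", "host3": "host1", "host4": "host1"}

print(repoint_after_failover(chain, "host1", "host2"))  # []
print(repoint_after_failover(star, "host1", "host2"))   # ['host3', 'host4']
```

In the chain, promoting the second host leaves every downstream link untouched, while in the star each remaining replica has to be given a new source.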

Page 10: MySQL At Mastercard - 2018 MySQL Days

High Availability

• Our replication structure and database design don’t give us high availability out of the box; there is still a single read/write master

• Red Hat Cluster Suite is layered on top of MySQL to provide automated failure detection and failover

• Built-in clustering and quorum functionality

• Essentially it manages a VIP and ensures it is running on the correct host

• Custom health checks are run by the cluster software to determine if a MySQL instance or the entire host has crashed
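
The health-check and VIP logic described above might be reduced to something like the following sketch. It is not Red Hat Cluster Suite code: the function names, the state labels and the policy of moving the VIP on any failure are assumptions made purely for illustration.

```python
def classify(host_reachable, instance_responding):
    """Distinguish a crashed host from a crashed MySQL instance."""
    if not host_reachable:
        return "host_crashed"
    if not instance_responding:
        return "instance_crashed"
    return "healthy"


def vip_target(current_host, standby_host, state):
    """Keep the VIP in place while healthy; otherwise move it to the standby.

    Assumption: any failure state triggers a move. A real cluster may
    first try to restart the instance on the same host.
    """
    return current_host if state == "healthy" else standby_host


state = classify(host_reachable=True, instance_responding=False)
print(state, vip_target("host1", "host2", state))  # instance_crashed host2
```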

Page 11: MySQL At Mastercard - 2018 MySQL Days

Current Challenges…

• Replication lag during peak processing periods
– Potentially could be fixed by parallel replication

• Cumbersome process to isolate databases for Kernel patching
– Potentially could be fixed by using GTID replication
– Potentially could be fixed by using tools like salt/fabric to automate

• Length of time for a cold started database to become “hot” and fast enough to use
– Potentially could be fixed by migrating to InnoDB

Page 12: MySQL At Mastercard - 2018 MySQL Days

The Future…

• Compliance considerations:
– MySQL 8?
– RHEL 7?

• Performance/usability improvements:
– Implement GTID replication
– Test parallel replication

• Tighter integration of MySQL into our “DevOps” toolkit
– Puppet
– Fabric/salt

Page 13: MySQL At Mastercard - 2018 MySQL Days

Lessons Learned…

• Use SSDs where possible!

• Always test schema changes on a dataset equivalent to production (and test the rollback as well as the rollout)

• You can never have too many monitoring metrics across your platform

• Having a production-like stress test environment is invaluable

• Historically MySQL has not been the problem; hardware and software bottlenecks are more common

• Disconnect database connections when reaching out to 3rd party services (avoids rapidly reaching the max_connections limit)
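
The last lesson can be sketched as follows: hold a database connection only while actually talking to the database, and release it before any slow external call. To keep the example self-contained, Python's built-in sqlite3 stands in for MySQL; the table, the `process_order` function and the `call_third_party` callback are all hypothetical.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def process_order(connect, order_id, call_third_party):
    """Hold a database connection only while actually using the database."""
    # 1. Read what we need, then release the connection.
    with closing(connect()) as conn:
        row = conn.execute(
            "SELECT amount FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
    amount = row[0]

    # 2. The slow external call happens with no connection held.
    result = call_third_party(amount)

    # 3. Reconnect briefly to store the outcome.
    with closing(connect()) as conn:
        conn.execute(
            "UPDATE orders SET status = ? WHERE id = ?", (result, order_id)
        )
        conn.commit()
    return result


# Demo setup: a temporary SQLite file stands in for the real database.
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)


def connect():
    return sqlite3.connect(path)


with closing(connect()) as conn:
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT)"
    )
    conn.execute("INSERT INTO orders VALUES (1, 9.99, 'new')")
    conn.commit()

print(process_order(connect, 1, lambda amount: "approved"))  # approved
```

If the third party is slow or hangs, requests pile up waiting on it without each one also pinning a database connection, so the connection limit is reached far less quickly.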