Scaling MySQL at Venmo

Dong Wang (PayPal Inc), Van Pham (Venmo), Heidi Wang (PayPal Inc)

Percona Live 2019, Austin TX

© 2019 PayPal Inc. Confidential and proprietary.

Agenda

• Introduction
• Venmo History
• Application Ecosystem and MySQL Architecture
• Scalability Challenges
• Short Term Tactical Improvements
• Long Term Strategic Improvements
• Wrap up: Q & A

The History of Venmo

The History of Venmo

How Venmo Started

• Venmo was founded by Andrew Kortina and Iqram Magdon-Ismail, who met as freshman roommates at the University of Pennsylvania
• The original prototype sent money through text messages and eventually transitioned to a smartphone app
• In 2012, the company was acquired by Braintree for $26.2 million
• In December 2013, PayPal acquired Braintree and, with it, Venmo

The History of Venmo

How Venmo Works

As of Q1 2019:
• Venmo has 40 million users
• Venmo accounts for 14.5% of PayPal's total user base
• Venmo posted $21 billion in volume
• Volume is growing 73% annually

Pay with Venmo
Just Venmo Me

Venmo Growing Pains

Challenges
• Growth in partnership with vendors
• Exponential growth in the user base
• Exponential growth in payment volume
• Exponential growth in data volume

Business Requirements
• Add new features and initiatives
• Increase fraud detection and security
• Scale the payment volume
• Increase user satisfaction
• Keep what makes Venmo unique

Venmo Payment Volume 2 Year Trend

[Figure: Venmo payment volume, 2-year trend]

Application Ecosystem and Database Architecture

Venmo Application Ecosystem

Full Stack in AWS

[Architecture diagram. Component labels: Web (Shabu), Android, iOS, Amazon CDN (CloudFront), Web, Mobile, Routing, Amazon Route 53, Nginx, Envoy, REST, Developer API, Orch, Admin (Scope/VU), Core, Analytics, Auth, OFAC, Task Workers (risk/fraud/comp, etc.), Misc (cron, etc.), celery, brokers, Locks, DB, Feed, Social Graph, Pub/Friend/User Feeds, Queries, Login, Events]

Venmo Application Framework

MVT framework for both web and web services

[Diagram labels: Model, View, Template, ORM-generated SQL, DB]

Database Architecture

Amazon Aurora

Amazon Aurora - Pros and Cons

Pros
• Managed service
• Low latency read replicas
• Stable and better performance
• Easy provisioning and scaling
• Faster backup and cloning
• Point-in-time recovery
• Custom endpoints
• Monitoring tools

Cons
• Less visibility into the system and storage layers
• Limited vertical and horizontal scalability
• Maximum cluster volume of 64 TB
• Writer restart causes all readers to reboot

Scalability Challenges at Venmo

2018 Infrastructure Challenges

Area: Scalability
Symptoms:
• Limited horizontal scalability with more read-only nodes for MongoDB and MySQL
• Uneven CPU/connection distribution
• Read traffic not using read replicas effectively
Impact:
• Can’t handle increase in payment volume
• Bad user experience
• Higher call volume to customer support
• Low user ratings in the app stores

Area: Platform
Symptoms:
• Old versions of MySQL, MongoDB and Cassandra
Impact:
• Slow performance
• Inconsistent data

Area: DR Readiness (in progress)
Symptoms:
• Limited distribution of DB nodes in US-East AZs
• Lack of DB regional parity in US-West
Impact:
• Degraded performance in a regional failure scenario

2018 Architecture and Performance Challenges

Area: Query Performance
Symptoms:
• Bad performance from queries generated by ORM
• Top 10 slow queries > 75% of slow query time
Impact:
• High latency during peak time
• High CPU usage

Area: Changes in Access Pattern
Symptoms:
• Too many indexes not used by the application
• Can’t add covering index to large tables
Impact:
• Slow queries
• Slower updates & payments per second

Area: Data Model
Symptoms:
• No data retention policy
• Heavily skewed data distribution in MySQL
• Unoptimized keyspace in Cassandra and MongoDB
Impact:
• Payment failure or low payments per second
• A high rate of timeouts
• Maintenance challenge

Area: Transaction Model
Symptoms:
• Multiple DBs involved
• High number of deadlocks
• High number of blocking reads (SELECT ... FOR UPDATE)
Impact:
• Payment failure
• Reduced payments per second
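The "Top 10 slow queries > 75% of slow query time" symptom is a concentration measure that is easy to compute once slow-query time has been aggregated per query fingerprint. The sketch below shows one way to do that; the function name and the sample numbers are made up for illustration, and the slides do not say which tool produced the original figure (a slow-log digest or the Performance Schema statement summaries would both be plausible inputs).

```python
def top_n_share(total_time_by_query, n=10):
    """Fraction of all slow-query time spent in the n most expensive queries.

    total_time_by_query maps a query fingerprint to its accumulated
    slow-query time (any unit, as long as it is consistent).
    """
    ranked = sorted(total_time_by_query.values(), reverse=True)
    total = sum(ranked)
    return sum(ranked[:n]) / total if total else 0.0


# Hypothetical sample: seconds of slow-query time per fingerprint.
sample = {"q1": 500.0, "q2": 300.0, "q3": 100.0, "q4": 60.0, "q5": 40.0}
share = top_n_share(sample, n=2)
print(f"top 2 queries account for {share:.0%} of slow query time")
```

A high share means that tuning a handful of queries should recover most of the slow-query time, which is consistent with the targeted query tuning described later in the deck.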

2018 Operation Challenges

Area: Monitoring
Symptoms:
• Many monitoring tools, legacy and new: Grafana, New Relic, DataDog, Sumo Logic, PMM, MongoDB Cloud Manager
• Only basic metrics monitored for all datastores
Impact:
• Low confidence in metrics validity
• Lack of notification for critical metrics
• Harder to troubleshoot problems

Area: Release Process
Symptoms:
• No dedicated release engineering org
• Limited QA review of releases
• Lack of sufficient DBA review
• Lack of sufficient testing before production release
Impact:
• Frequent incidents
• Bad user experience
• Higher call volume to customer support
• Low user ratings in the app stores

Short Term Tactical Improvements

Short Term Tactical Improvements

2.5x pps

Principles
• Aim for peak traffic during Super Bowl
• Target availability improvement
• Minimize code change
• No big surgery on data models
• Align with strategic moves
• Provide foundational benefits for both short/long term

Infrastructure Scaling

• MySQL upgrade and row-based replication
• Vertical scale of writer node
• Vertical and horizontal scale of reader nodes
• Read/write traffic separation
• Domain isolation
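Since the application framework shown earlier is a Django-style MVT stack, read/write traffic separation can be sketched with a Django database router. This is a minimal illustration and not Venmo's actual code: the class name and the aliases `default` (writer endpoint) and `reader` (reader endpoint) are assumptions. The method names and signatures follow Django's documented router interface, and the class has no Django imports so it runs standalone.

```python
class ReadWriteRouter:
    """Sketch of a Django database router: reads go to a reader alias,
    writes go to the writer alias. Alias names are hypothetical."""

    WRITER = "default"  # assumed alias for the cluster writer endpoint
    READER = "reader"   # assumed alias for the cluster reader endpoint

    def db_for_read(self, model, **hints):
        # Send read queries to the read replicas.
        return self.READER

    def db_for_write(self, model, **hints):
        # All writes must go to the single writer node.
        return self.WRITER

    def allow_relation(self, obj1, obj2, **hints):
        # Writer and readers hold the same data, so objects loaded from
        # either alias may be related to each other.
        cluster = {self.WRITER, self.READER}
        if obj1._state.db in cluster and obj2._state.db in cluster:
            return True
        return None  # no opinion about other databases

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Schema changes are applied through the writer only.
        return db == self.WRITER


router = ReadWriteRouter()
print(router.db_for_read(object), router.db_for_write(object))
```

In a Django project a router like this would be registered through the `DATABASE_ROUTERS` setting. Reads that must see a just-committed write still need to be pinned to the writer, because replicas can lag slightly behind.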

Infrastructure Scaling

[Charts: improved DML latency; reduced blocked transactions]

Infrastructure Scaling

[Charts: improvement in CPU usage; improvement in available RAM]

Application Optimization/Query Tuning

Task: Django code optimization on single-table select
Outcome:
• Avoid querying millions of rows and then throwing them away
• 25% CPU reduction across the board

Task: Tuning of 7-table join queries
Outcome:
• Work around the plan instability by avoiding ORDER BY on the PK column
• Query execution time reduced from >60 seconds to milliseconds

Task: Django code optimization to reduce DB round trips
Outcome:
• Get the one-row result set directly without counting the number of rows first
• Queries to DB reduced by 50% for a GET call

Task: Fixing slow queries of critical jobs due to plan instability
Outcome:
• Root cause analysis using advanced techniques
• Avoid cascading slow queries
• No more critical job failures
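The round-trip reduction can be illustrated without Django. The sketch below uses Python's built-in `sqlite3` module and a hypothetical `payment` table to compare "count first, then fetch" (two statements) against fetching the single row directly (one statement). The table, column names, and wrapper class are assumptions made for this example only.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts the statements sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.round_trips = 0

    def execute(self, sql, params=()):
        self.round_trips += 1
        return self.conn.execute(sql, params)


def fetch_with_count_first(db, user_id):
    """Anti-pattern: one query to count, a second one to fetch."""
    (n,) = db.execute(
        "SELECT COUNT(*) FROM payment WHERE user_id = ?", (user_id,)
    ).fetchone()
    if n == 0:
        return None
    return db.execute(
        "SELECT id, amount FROM payment WHERE user_id = ? LIMIT 1", (user_id,)
    ).fetchone()


def fetch_directly(db, user_id):
    """Fetch the row in a single query; None means there was no match."""
    return db.execute(
        "SELECT id, amount FROM payment WHERE user_id = ? LIMIT 1", (user_id,)
    ).fetchone()


raw = sqlite3.connect(":memory:")
raw.execute(
    "CREATE TABLE payment (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
raw.execute("INSERT INTO payment (user_id, amount) VALUES (7, 12.5)")

db = CountingConnection(raw)
row_a = fetch_with_count_first(db, 7)
trips_a = db.round_trips

db.round_trips = 0
row_b = fetch_directly(db, 7)
trips_b = db.round_trips

print(f"count-first: {trips_a} queries, direct: {trips_b} query")
```

In Django terms this roughly corresponds to replacing a `count()` check followed by indexing into the queryset with a single call such as `first()`, which returns `None` when nothing matches.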

Optimization on Single Table Select

Avoid unnecessary queries whose fetched data is then thrown away

Time                    CPU %   Payments/Sec (PPS)   CPU % for 100 PPS
12/26/2018 19:00 UTC    32.9    45                   32.9/45*100 = 73.1
12/28/2018 19:00 UTC    30.1    55                   30.1/55*100 = 54.7
Net Reduction                                        (73.1 - 54.7) / 73.1 = 25%

    def _function(cls, users):
        """
        :param users: A list of `User` objects.
        :return: A tuple of `User` objects which meet the condition
        """

        # Return early on empty input so that no query is issued.
        if not users:
            return []
        # (the rest of the function is not shown on the slide)
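The normalization in the table can be reproduced in a few lines. The sketch below recomputes the slide's numbers. It assumes, as the slide's arithmetic implicitly does, that CPU usage scales linearly with payments per second. The helper name is hypothetical.

```python
def cpu_per_100_pps(cpu_percent, pps):
    """CPU % normalized to a load of 100 payments per second,
    assuming CPU usage scales linearly with PPS."""
    return cpu_percent / pps * 100


before = cpu_per_100_pps(32.9, 45)  # 12/26/2018 19:00 UTC
after = cpu_per_100_pps(30.1, 55)   # 12/28/2018 19:00 UTC
reduction = (before - after) / before

print(f"before: {before:.1f}, after: {after:.1f}, net reduction: {reduction:.0%}")
# prints "before: 73.1, after: 54.7, net reduction: 25%"
```

Normalizing by throughput matters here because raw CPU fell only from 32.9% to 30.1%, while the payment rate rose from 45 to 55 PPS over the same interval.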

Working Around Plan Instability of a Slow 7-Table Join Query

Avoid ORDER BY on the PK column

Root Cause of Ever-Growing Cost Estimate on Using Secondary Indexes

Avoid blocking of index page merges

Operational Enhancements

Task: Elasticity of Dev API pool
Outcome:
• Proactive expansion of application server pools during peak

Task: Improve capacity planning
Outcome:
• Simulate peak handling in load test environment
• Realistic projections of system capacity for peak traffic

Task: Improved monitoring
Outcome:
• Enable Performance Insights and Performance Schema
• Single-pane-of-glass view of all databases

Task: Knowing all levers and knobs for peak handling
Outcome:
• Optimize cron job timing
• Turn features off/on

Task: PayPal risk calls capacity preparation
Outcome:
• Vastly improved the risk integration with PayPal and reduced losses

Task: Playbooks, event coordination/communication
Outcome:
• Planned execution of preemptive steps
• Anticipation of issue handling with playbooks
• War room, Slack channels, dedicated bridge, points of contact
• Creation of the Performance & Scalability Team

Release Process Enhancements

Task: Improve release process
Outcome:
• Fewer release rollbacks

Task: Improve incident management
Outcome:
• Reduction in the number of incidents

Task: Improve root cause analysis process
Outcome:
• Analysis results in actionable tickets

Task: Improve QA & load test
Outcome:
• Drastic reduction in Serv1 & Serv2

Task: Mandatory DBA review of new data models and queries
Outcome:
• Major improvement in query performance
• Drastic reduction in the number of slow queries

Core MySQL DB Performance

Long Term Strategic Improvements

Long Term Strategic Improvements

100x pps

Principles
• Business outcome prioritized
• Evolution rather than revolution
• Horizontal scalability
• Data lifecycle management (DLM) enabled
• Tight transactional integrity
• Domain isolation
• Foundation for linear scalability

Questions?

Thank You