(ISM304) Oracle to Amazon RDS MySQL & Aurora: How Gallup Made the Move

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Swapan Golla, Technical Architect, Gallup

October 2015

ISM304

From Oracle to Amazon RDS MySQL and Amazon Aurora

How Gallup Made the Move

What to Expect from the Session

- Introduction

- Problem statement

- Why AWS?

- Non-database considerations

- RDS MySQL: Benefits and challenges

- Solution architecture

- Process and DevOps

- Amazon RDS / Amazon Aurora

- Conclusions

Introduction – Our Company

GALLUP Inc. has studied human nature and behavior for more than 70 years. Gallup employs many of the world's leading scientists in management, economics, psychology, and sociology. Gallup performance management systems help organizations boost organic growth by increasing customer engagement and maximizing employee productivity through measurement tools, coursework, and strategic advisory services. Gallup's 2,000 professionals deliver services at client organizations, through the Web, at Gallup University’s campuses, and in 40 offices around the world.

Problem Statement

- Scalable reporting & analytics platform

- Cost effective

- Rich analytics capabilities

- Security & encryption (compliance)

- 24x7 availability (HA)

- Replication

- Same & multi-region data segregation

- Ease of administration

Why AWS?

- Cost effective

- Traditional/existing model

- Software licensing costs upfront

- Hardware investments

- Hardware/database administration overhead

- Multi-region support

- Patriot Act

- Cross border data transfer

Why AWS?

- High availability (replication)

- Resource scalability

- Peak loads (temporary spikes) and Auto Scaling

- Analytical workloads

- Real-time/batch requirements

- Non-continuous loads/demands

- Rich supporting ecosystem

- For example: Amazon RDS (relational DB), Amazon EMR, Amazon Redshift, Amazon S3, AWS KMS, etc.

Non-Database Considerations: Process

- On-premises

- Existing stable processes

- Optimized over a decade

- Legacy overhead

- Cloud

- New processes

- New toolsets

- Cultural change (data is no longer on premises)

- Data segregation

Non-Database Considerations: Process

- Data migration

- VPC vs. public

- Bandwidth (VPN - Gallup Network <<>> Amazon VPC)

- Secure data migration

- Data encryption

- Database

- ETL

Non-Database Considerations: Technical

- Resource challenges/skillset gaps

- Experience with MySQL procedures/functions, etc.

- AWS skillsets

- Service layer mindset (HTTP, web services, etc.)

- Oracle skills are portable

- Lots of deficiencies and peculiarities

- Data migration

- Data synchronization issues

- On-premises vs. cloud

- Automate - build vs. buy

Non-Database Considerations: Technical

- Data migration

- Amazon RDS reporting repository

- Data lakes

- Amazon S3 data repository (unified/global)

- Ad-hoc custom data & analytical deliverables

- Ease of cross-domain data analysis

- AWS Gotchas

- Amazon SQS: Not a conventional queue

- Amazon S3: eventual consistency

- Variable latency/performance of services

Amazon RDS MySQL: Benefits

- Relational DB (Oracle alternative)

- Cost effective & ease of administration

- Scalable

- Hardware resizing seamless

- Read instances

- Scalability

- Majority reads for reporting

- Ad-hoc needs

- Replication & HA (multi-AZ, region, AWS KMS, etc.)

- Security & encryption

Amazon RDS MySQL: Challenges (Database)

- Oracle is far more productive and feature-rich

- No AWS component integrations from the DB

- Tough to support primary database applications

- Developer productivity

- Package support non-existent

- Package level variables

- Codebase is scattered

- Better data structure support (e.g., collections)

- Temporary tables

Amazon RDS MySQL: Challenges (Database)

- Cursor parameters in procedures

- Dynamic SQL (execute immediate)

- Debugging/logging

- Declare cursors with dynamic SQL

- Global temporary tables

- Support for subqueries in FROM clause

Amazon RDS MySQL: Challenges (Integrations)

- HTTP endpoint (Amazon SNS)

- Email/notification capability

- Two-way integration with Amazon S3

- Integration with Amazon SQS (enqueue/dequeue)

Solution Architecture

[Architecture diagram. Components shown: Oracle DB, shared directories, Tomcat/Java (QA & Prod), developer VMs/Jenkins, S3, ELB, ElastiCache, Amazon Kinesis, SES/SNS, EC2 Tomcat cluster, CloudFront-S3, EC2 Tomcat data server/RDS++, RDS MySQL, SQS, external reporting, external data integrations. Network zones shown: Gallup Network, VPN, Amazon VPC (QA/PROD).]

Solution Architecture

- Amazon RDS MySQL

- Currently the relational data store for reporting

- Stored routines/procedures extensively used

- RDS++

- AWS integrations with DB procedures

- XML-based definitions

- Java application

- Tomcat/Java instances (reporting infrastructure)

- Amazon EC2/Elastic Load Balancing/Auto Scaling/Amazon VPC

Solution Architecture

- Tomcat/Java instances (data infrastructure)

- ETL/SWS/S3/SQS/AWS Java SDK/RDS++ host

- Amazon ElastiCache (distributed context mgmt.)

- Data collection

- SQS/S3

- ETL/S3 (Aggregated data from on-premises)

- Tomcat/Java instances (data on-premises)

- ETL/S3/CLI (VPN - Gallup Network <<>> Amazon VPC)

- Oracle exports to shared directory

Solution Architecture – MySQL Workarounds

- Package scope variables

- Session variables to share between stored procedures

- SET @SUPPRESSION_VAL = -1 etc.

- Cursors with dynamic SQL

- Create temporary table and open a cursor

- DECLARE outCursor CURSOR FOR SELECT * FROM test_tmp_tab;
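The package-scope workaround on this slide can be sketched as two routines that share one session variable. This is a minimal illustration, not Gallup's code: only `@SUPPRESSION_VAL` comes from the slide; the procedure names, the `pkg_report_` prefix convention, and the suppression rule are assumptions.

```sql
-- DELIMITER is a mysql client command, needed so the procedure bodies
-- can contain semicolons.
DELIMITER //

-- Hypothetical "package initializer": sets the shared value once per connection.
CREATE PROCEDURE pkg_report_init()
BEGIN
  -- User-defined variables (@name) live for the whole connection,
  -- so every routine called on this connection sees the same value.
  SET @SUPPRESSION_VAL = -1;
END//

-- Hypothetical consumer: reads the shared value set by pkg_report_init.
CREATE PROCEDURE pkg_report_suppress(
  IN  i_n     INT,
  IN  i_score DECIMAL(10,2),
  OUT o_score DECIMAL(10,2))
BEGIN
  -- Illustrative rule (an assumption): hide scores for groups smaller than 5.
  IF i_n < 5 THEN
    SET o_score = @SUPPRESSION_VAL;
  ELSE
    SET o_score = i_score;
  END IF;
END//

DELIMITER ;

CALL pkg_report_init();
CALL pkg_report_suppress(3, 4.20, @out);
SELECT @out;
```

Session variables are untyped and scoped to the connection rather than to a package, so a naming prefix is the only grouping available, and an application using a connection pool would need to run the initializer on each connection before relying on the value.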

Solution Architecture – MySQL Workarounds

- Cursors with dynamic SQL (contd.)

- Write dynamic SQL (populates temporary table)

- SET @v_dyn_sql = CONCAT("INSERT INTO test_tmp_tab SELECT CONCAT_WS(@TEST1, D1, D2, D3, D4, 'High', ", IFNULL(i_measure_list, '""'), ") out_val FROM test.test_vw WHERE D1 IN (", i_d1_list, ") AND D2 = ", i_d2_id, IF(i_measure_list IS NULL, ' AND 1 = 0', ' AND 1 = 1'));

Solution Architecture – MySQL Workarounds

- Execute dynamic SQL, which populates temporary table

- PREPARE stmt FROM @v_dyn_sql;

- EXECUTE stmt; DEALLOCATE PREPARE stmt;

- OPEN outCursor;

- Loop through the cursor and build output

- Execute immediate

- Build dynamic SQL

- SET @v_var = CONCAT('SELECT GROUP_CONCAT(D1 ORDER BY D1 SEPARATOR '','') INTO @o_list FROM (SELECT D1 FROM D WHERE D1 IN (', i_D_list, ')) t');
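The cursor workaround described on this slide and the previous ones (temporary table, dynamic INSERT, PREPARE/EXECUTE, then OPEN) can be assembled into one procedure roughly as follows. This is a sketch under assumptions: `test_tmp_tab`, `test.test_vw`, `D1`, `D2`, `out_val`, and `outCursor` follow the slides, while the procedure name, parameter types, the `'|'` separator, and the output format are hypothetical. It also binds `D2` with a `?` placeholder, a small variation on the slide's string concatenation.

```sql
DELIMITER //

CREATE PROCEDURE build_report(
  IN  i_d1_list VARCHAR(1000),
  IN  i_d2_id   INT,
  OUT o_result  TEXT)
BEGIN
  DECLARE v_val  VARCHAR(4000);
  DECLARE v_done INT DEFAULT 0;
  -- A cursor cannot be declared over dynamic SQL, so it is declared over
  -- a fixed temporary table name. The table only has to exist at OPEN time.
  DECLARE outCursor CURSOR FOR SELECT out_val FROM test_tmp_tab;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done = 1;

  DROP TEMPORARY TABLE IF EXISTS test_tmp_tab;
  CREATE TEMPORARY TABLE test_tmp_tab (out_val VARCHAR(4000));

  -- The dynamic statement fills the temporary table. i_d1_list is spliced
  -- in as text, so the caller must pass a validated list such as '1,2,3'.
  SET @v_dyn_sql = CONCAT(
    'INSERT INTO test_tmp_tab ',
    'SELECT CONCAT_WS(''|'', D1, D2) FROM test.test_vw ',
    'WHERE D1 IN (', i_d1_list, ') AND D2 = ?');
  SET @v_d2_id = i_d2_id;   -- EXECUTE ... USING only accepts user variables

  PREPARE stmt FROM @v_dyn_sql;
  EXECUTE stmt USING @v_d2_id;
  DEALLOCATE PREPARE stmt;

  -- Loop through the cursor and build the output, one row per line.
  SET o_result = '';
  OPEN outCursor;
  read_loop: LOOP
    FETCH outCursor INTO v_val;
    IF v_done = 1 THEN
      LEAVE read_loop;
    END IF;
    SET o_result = CONCAT(o_result, v_val, '\n');
  END LOOP;
  CLOSE outCursor;

  DROP TEMPORARY TABLE test_tmp_tab;
END//

DELIMITER ;
```

Because MySQL temporary tables are private to the connection, concurrent sessions can run the procedure without seeing each other's rows.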

Solution Architecture – MySQL Workarounds

- Execute immediate (contd.)

- SET @o_list = NULL;

- Executing the dynamic SQL

- PREPARE stmt FROM @v_var; EXECUTE stmt;

- DEALLOCATE PREPARE stmt;

- SET o_flist = @o_list;
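Put together, the EXECUTE IMMEDIATE substitute above might look like the procedure below. It is a sketch, not the production code: `D`, `D1`, `@v_var`, `@o_list`, `i_D_list`, and `o_flist` follow the slides; the procedure name, parameter types, and the derived-table alias `t` are assumptions.

```sql
DELIMITER //

CREATE PROCEDURE get_d1_list(
  IN  i_D_list VARCHAR(1000),
  OUT o_flist  TEXT)
BEGIN
  -- PREPARE accepts only a string literal or a user variable, and the
  -- prepared statement can return values only through user variables.
  -- i_D_list is spliced in as text and must be validated by the caller.
  SET @v_var = CONCAT(
    'SELECT GROUP_CONCAT(D1 ORDER BY D1 SEPARATOR '','') INTO @o_list ',
    'FROM (SELECT D1 FROM D WHERE D1 IN (', i_D_list, ')) t');

  SET @o_list = NULL;        -- clear any value left by an earlier call

  PREPARE stmt FROM @v_var;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;

  SET o_flist = @o_list;     -- copy the result to the OUT parameter
END//

DELIMITER ;

CALL get_d1_list('1,2,3', @result);
SELECT @result;
```

Resetting `@o_list` before EXECUTE matters because the variable persists for the whole connection and would otherwise keep the previous call's value if the dynamic statement fails before assigning it.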

Solution Architecture – MySQL

- 400+ stored procedures (first phase)

- 200+ tables/views (first phase)

- Support for aggregation data from on-premises

- Support for reporting configuration

- Brand new products (first phase)

- Amazon RDS++

- Amazon SQS/Amazon S3/Amazon SNS/Amazon SES support from MySQL

- Post stored procedure integrations

Process & DevOps

- GitHub (On-premises)

- VPN (Gallup Network <<>> Amazon VPC)

- Jenkins (Java deployment)

- DB code deployment

- Stored procedure deployment

- EC2/Chef

- Auto Scaling

- Stress environment (clone of production)

- Automated deployment (sysadmins)

- Ease of multi-region deployment

Process & DevOps

- Amazon S3 intermediary deployment repository steps

- Jenkins - Check out Git repo (on-premises)

- Jenkins - Build WAR and deploy to appropriate S3 buckets

- Jenkins - Run scripts on QA EC2 instances to sync WAR files

- Manual script deployment on PROD EC2 instances

- Auto Scaling

- Create an EC2 machine

- Install/deploy (Chef)

- Sync with S3 for WAR files

- Add to ELB

[Deployment diagram. Components shown: Jenkins (SSH/Git, AWS keys, S3 plugins), Amazon S3 (QA & Prod deploy buckets), QA EC2 (AWS CLI), Prod EC2 (AWS CLI).]

Amazon RDS / Amazon Aurora

- Early adopter

- More read instances / lower lag times

- Replication & HA

- Better integration with AWS components in future

- Better DevOps tools for database development in future

- Encryption

- Awaiting this functionality to go forward for our production rollout

Conclusions

- AWS is the right fit for our future

- Cost-effective

- Scalable

- Meets challenging overall business needs

- Amazon RDS MySQL/Amazon Aurora

- A cost-effective alternative to Oracle in the cloud for supporting scalable applications/workloads

- Better integration with other AWS components (Aurora)

Remember to complete your evaluations!

Thank you!

Email if you have any questions [email protected]