© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Swapan Golla, Technical Architect, Gallup
October 2015
ISM304
From Oracle to Amazon RDS MySQL and Amazon Aurora:
How Gallup Made the Move
What to Expect from the Session
- Introduction
- Problem statement
- Why AWS?
- Non-database considerations
- RDS MySQL: Benefits and challenges
- Solution architecture
- Process and DevOps
- Amazon RDS / Amazon Aurora
- Conclusions
Introduction – Our Company
GALLUP Inc. has studied human nature and behavior for more than 70
years. Gallup employs many of the world's leading scientists in
management, economics, psychology, and sociology. Gallup performance
management systems help organizations boost organic growth by
increasing customer engagement and maximizing employee productivity
through measurement tools, coursework, and strategic advisory services.
Gallup's 2,000 professionals deliver services at client organizations,
through the Web, at Gallup University’s campuses, and in 40 offices
around the world.
Problem Statement
- Scalable reporting & analytics platform
- Cost effective
- Rich analytics capabilities
- Security & encryption (compliance)
- 24x7 availability (HA)
- Replication
- Same & multi-region data segregation
- Ease of administration
Why AWS?
- Cost effective
- Traditional/existing model
- Software licensing costs upfront
- Hardware investments
- Hardware/database administration overhead
- Multi-region support
- Patriot Act
- Cross border data transfer
Why AWS?
- High availability (replication)
- Resource scalability
- Peak loads (temporary spikes) and Auto Scaling
- Analytical workloads
- Real-time/batch requirements
- Non-continuous loads/demands
- Rich supporting ecosystem
- E.g., Amazon RDS (relational DB), Amazon EMR, Amazon
Redshift, Amazon S3, AWS KMS, etc.
Non-Database Considerations: Process
- On-premises
- Existing stable processes
- Optimized over a decade
- Legacy overhead
- Cloud
- New processes
- New toolsets
- Cultural change (data is not within premises)
- Data segregation
Non-Database Considerations: Process
- Data migration
- VPC vs. public
- Bandwidth (VPN - Gallup Network <<>> Amazon VPC)
- Secure data migration
- Data encryption
- Database
- ETL
Non-Database Considerations: Technical
- Resource challenges/skillset gaps
- Experience with MySQL procedures/functions, etc.
- AWS skillsets
- Service layer mindset (HTTP, web services, etc.)
- Oracle skills are portable
- Lots of deficiencies and peculiarities
- Data migration
- Data synchronization issues
- On-premises vs cloud
- Automate - build vs. buy
Non-Database Considerations: Technical
- Data migration
- Amazon RDS reporting repository
- Data lakes
- Amazon S3 data repository (unified/global)
- Ad-hoc custom data & analytical deliverables
- Ease of cross-domain data analysis
- AWS Gotchas
- Amazon SQS: Not a conventional queue
- Amazon S3: eventual consistency
- Variable latency/performance of services
Amazon RDS MySQL: Benefits
- Relational DB (Oracle alternative)
- Cost effective & ease of administration
- Scalable
- Hardware resizing seamless
- Read instances
- Scalability
- Majority reads for reporting
- Ad-hoc needs
- Replication & HA (multi-AZ, region, AWS KMS, etc.)
- Security & encryption
Amazon RDS MySQL: Challenges (Database)
- Oracle is far more productive and feature-rich
- No AWS component integrations from the DB
- Tough to support primary database applications
- Developer productivity
- Package support non-existent
- Package level variables
- Codebase is scattered
- Better data structure support (e.g., collections)
- Temporary tables
Amazon RDS MySQL: Challenges (Database)
- Cursor parameters in procedures
- Dynamic SQL (execute immediate)
- Debugging/logging
- Declare cursors with dynamic SQL
- Global temporary tables
- Support for subqueries in FROM clause
Amazon RDS MySQL: Challenges (Integrations)
- HTTP endpoint (Amazon SNS)
- Email/notification capability
- Two-way integration with Amazon S3
- Integration with Amazon SQS (enqueue/dequeue)
Solution Architecture
[Architecture diagram; only its labels survive.]
- Gallup Network (on-premises): Oracle DB, shared directories, Tomcat/Java (QA & Prod), developer VMs/Jenkins
- VPN link between the Gallup Network and the Amazon VPC (QA/PROD)
- AWS: ELB, EC2 Tomcat cluster, EC2 Tomcat data server/RDS++, RDS MySQL, SQS, S3, CloudFront-S3, ElastiCache, Amazon Kinesis, SES/SNS
- External: reporting, data integrations
Solution Architecture
- Amazon RDS MySQL
- Currently reporting relational data store
- Stored routines/procedures extensively used
- RDS++
- AWS integrations with DB procedures
- XML-based definitions
- Java application
- Tomcat/Java instances (reporting infrastructure)
- Amazon EC2/Elastic Load Balancing/Auto Scaling/
Amazon VPC
Solution Architecture
- Tomcat/Java instances (data infrastructure)
- ETL/SWS/S3/SQS/AWS Java SDK/RDS++Host
- Amazon ElastiCache (distributed context mgmt.)
- Data collection
- SQS/S3
- ETL/S3 (Aggregated data from on-premises)
- Tomcat/Java instances (data on-premises)
- ETL/S3/CLI (VPN - Gallup Network <<>> Amazon VPC)
- Oracle exports to shared directory
Solution Architecture – MySQL Workarounds
- Package scope variables
- Session variables to share between stored procedures
- SET @SUPPRESSION_VAL = -1; (etc.)
- Cursors with dynamic SQL
- Create temporary table and open a cursor
- DECLARE outCursor CURSOR FOR
SELECT * FROM test_tmp_tab;
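The session-variable workaround above can be sketched as follows. This is a minimal illustration, not Gallup's code: the procedure names `pkg_init` and `mask_small_count` and the threshold of 5 are assumptions made for the example. Only `@SUPPRESSION_VAL` comes from the slide. `DELIMITER` is a `mysql` command-line client directive.

```sql
DELIMITER //

-- Hypothetical "package initializer": sets the shared state once per session.
CREATE PROCEDURE pkg_init()
BEGIN
  -- User-defined variables live for the rest of the connection.
  SET @SUPPRESSION_VAL = -1;
END //

-- Hypothetical consumer: reads the value set by pkg_init().
CREATE PROCEDURE mask_small_count(IN i_count INT, OUT o_value INT)
BEGIN
  -- Counts below the (assumed) threshold of 5 are replaced by the shared marker.
  SET o_value = IF(i_count < 5, @SUPPRESSION_VAL, i_count);
END //

DELIMITER ;

CALL pkg_init();
CALL mask_small_count(3, @result);
SELECT @result;  -- returns -1
```

Unlike Oracle package variables, user-defined variables are untyped and scoped to the connection rather than to a package, so with a connection pool a value set by one request may still be visible to the next request that reuses the connection.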
Solution Architecture – MySQL Workarounds
- Cursors with dynamic SQL (contd.)
- Write dynamic SQL (populates temporary table)
- SET @v_dyn_sql = CONCAT('INSERT INTO test_tmp_tab
  SELECT CONCAT_WS(@TEST1, D1, D2, D3, D4, ''High'', ',
  QUOTE(IFNULL(i_measure_list, '""')), ') out_val FROM test.test_vw
  WHERE D1 IN (', i_d1_list, ') AND D2 = ', i_d2_id,
  IF(i_measure_list IS NULL, ' AND 1 = 0', ' AND 1 = 1'));
Solution Architecture – MySQL Workarounds
- Execute dynamic SQL, which populates temporary table
- PREPARE stmt FROM @v_dyn_sql;
- EXECUTE stmt; DEALLOCATE PREPARE stmt;
- OPEN outCursor;
- Loop through the cursor and build output
- Execute immediate
- Build dynamic SQL
- SET @v_var = CONCAT('SELECT GROUP_CONCAT(D1
  ORDER BY D1 SEPARATOR '','') INTO @o_list FROM (
  SELECT D1 FROM D WHERE D1 IN (', i_D_list, ')) AS t');
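The cursor-over-dynamic-SQL steps above can be assembled into one procedure, sketched below under stated assumptions. The source table `test_src`, its columns, the `|` separator, and the procedure name `build_report` are hypothetical; `test_tmp_tab` and `outCursor` follow the slides. The sketch also assumes `i_d1_list` is a trusted, already validated comma-separated list of integers, since it is concatenated straight into the SQL text.

```sql
-- Hypothetical source data standing in for the reporting view.
CREATE TABLE test_src (D1 INT, D2 INT);
INSERT INTO test_src VALUES (1, 10), (2, 20), (3, 30);

DELIMITER //

CREATE PROCEDURE build_report(IN i_d1_list VARCHAR(1000), OUT o_result TEXT)
BEGIN
  DECLARE v_done INT DEFAULT 0;
  DECLARE v_val  VARCHAR(4000);
  -- A cursor cannot be declared over a dynamic statement, so it is
  -- declared over a fixed temporary table that dynamic SQL fills later.
  DECLARE outCursor CURSOR FOR
    SELECT out_val FROM test_tmp_tab ORDER BY out_val;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done = 1;

  DROP TEMPORARY TABLE IF EXISTS test_tmp_tab;
  CREATE TEMPORARY TABLE test_tmp_tab (out_val VARCHAR(4000));

  -- Build and run the dynamic INSERT that populates the temporary table.
  SET @v_dyn_sql = CONCAT(
    'INSERT INTO test_tmp_tab ',
    'SELECT CONCAT_WS(''|'', D1, D2) FROM test_src ',
    'WHERE D1 IN (', i_d1_list, ')');
  PREPARE stmt FROM @v_dyn_sql;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;

  -- Loop through the cursor and build the output, one row per line.
  SET o_result = '';
  OPEN outCursor;
  read_loop: LOOP
    FETCH outCursor INTO v_val;
    IF v_done = 1 THEN
      LEAVE read_loop;
    END IF;
    SET o_result = CONCAT(o_result, v_val, '\n');
  END LOOP;
  CLOSE outCursor;

  DROP TEMPORARY TABLE test_tmp_tab;
END //

DELIMITER ;

CALL build_report('1,3', @report);
SELECT @report;
```

Because temporary tables are private to the connection, concurrent sessions can each use the same `test_tmp_tab` name without interfering with one another.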
Solution Architecture – MySQL Workarounds
- Execute immediate (contd.)
- SET @o_list = NULL;
- Executing the dynamic SQL
- PREPARE stmt FROM @v_var; EXECUTE stmt;
- DEALLOCATE PREPARE stmt;
- SET o_flist = @o_list;
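Put together, the `EXECUTE IMMEDIATE` emulation might look like the sketch below. The table definition for `D`, the extra `D2` filter, and the procedure name `get_filtered_list` are assumptions added for illustration. The sketch also uses a technique the slides do not show: the scalar filter is bound through a `?` placeholder with `EXECUTE ... USING` rather than concatenated, while the list still has to be concatenated because a placeholder stands for a single value.

```sql
-- Hypothetical table standing in for D on the slides.
CREATE TABLE D (D1 INT, D2 INT);
INSERT INTO D VALUES (1, 7), (2, 7), (3, 8);

DELIMITER //

CREATE PROCEDURE get_filtered_list(
  IN  i_D_list VARCHAR(1000),  -- assumed to be validated, comma-separated integers
  IN  i_d2_id  INT,
  OUT o_flist  TEXT)
BEGIN
  -- Reset the session variable so a stale value from an earlier call
  -- is never returned if the statement matches no rows.
  SET @o_list = NULL;

  -- Build the dynamic SQL; the derived table needs an alias in MySQL.
  SET @v_var = CONCAT(
    'SELECT GROUP_CONCAT(D1 ORDER BY D1 SEPARATOR '','') INTO @o_list ',
    'FROM (SELECT D1 FROM D WHERE D1 IN (', i_D_list, ') AND D2 = ?) AS t');

  -- USING accepts only user-defined variables, so copy the parameter first.
  SET @p_d2 = i_d2_id;
  PREPARE stmt FROM @v_var;
  EXECUTE stmt USING @p_d2;
  DEALLOCATE PREPARE stmt;

  -- Hand the result back through the OUT parameter.
  SET o_flist = @o_list;
END //

DELIMITER ;

CALL get_filtered_list('1,2,3', 7, @flist);
SELECT @flist;
```

Prepared statements cannot see a procedure's local variables or parameters, which is why both the input and the result travel through `@` session variables.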
Solution Architecture – MySQL
- 400+ stored procedures (first phase)
- 200+ tables/views (first phase)
- Support for aggregated data from on-premises
- Support for reporting configuration
- Brand new products (first phase)
- Amazon RDS++
- Amazon SQS/Amazon S3/Amazon SNS/Amazon SES
support from MySQL
- Post stored procedure integrations
Process & DevOps
- GitHub (On-premises)
- VPN (Gallup Network <<>> Amazon VPC)
- Jenkins (Java deployment)
- DB code deployment
- Stored procedure deployment
- EC2/Chef
- Auto Scaling
- Stress environment (clone of production)
- Automated deployment (sysadmins)
- Ease of multi-region deployment
Process & DevOps
- Amazon S3 intermediary deployment repository steps
- Jenkins: check out Git repo (on-premises)
- Jenkins: build WAR and deploy to appropriate S3 buckets
- Jenkins: run scripts on QA EC2 instances to sync WAR files
- Manual script deployment on PROD EC2 instances
- Auto Scaling
- Create an EC2 machine
- Install/deploy (Chef)
- Sync with S3 for WAR files
- Add to ELB
[Deployment diagram; only its labels survive: Jenkins (SSH/Git, AWS keys, S3 plugins), Amazon S3 (QA & Prod deploy buckets), QA EC2 (AWS CLI), Prod EC2 (AWS CLI).]
Amazon RDS / Amazon Aurora
- Early adopter
- More read instances / shorter lag times
- Replication & HA
- Better integration with AWS components in future
- Better DevOps tools for database development in future
- Encryption
- Awaiting this functionality before going forward with our
production rollout
Conclusions
- AWS is the right fit for our future
- Cost-effective
- Scalable
- Meets challenging overall business needs
- Amazon RDS MySQL/Amazon Aurora
- A cost-effective alternative to Oracle in the cloud for
supporting scalable applications/workloads
- Better integration with other AWS components (Aurora)