Automatic Data Migration into the Cloud
MS Thesis: Computer Science & Software Engineering
By: Kushal Mehra
Supervisors: Dr. Yuhong Yan, Dr. Daniel Lemire
Concordia University
Well-known applications (Facebook, Google Blogger, Twitter) depend on NoSQL databases.
Some advantages of NoSQL databases:
High Scalability.
High reading and writing Performance.
Availability at low cost.
Suitable applications:
Big data.
Geographical data.
MOTIVATION
Agenda
• Introduction
• Review of Previous Studies
• Proposed Model
• Experiment and Results
• Future work and Conclusion
Section 1: Introduction
Relational Database Vs. NoSQL Database
Cloud database definition.
Existing problems in relational databases:
• Scale up
• Scale out
INTRODUCTION AND PROBLEM
Relational Database Vs. NoSQL Database: Scale up
Relational Database Vs. NoSQL Database: Scale out
Data Migration
Enterprises seek to migrate their massive relational databases to NoSQL databases.
Data migration is the process of transferring data between storage types, formats, or computer systems.
Importance of Data Migration
One survey estimated that the data migration market would reach $906 million by 2012.
Previous Studies
A large body of work on data migration is available.
Approaches include:
Schema Conversion.
ETL.
Integrated Model
REVIEW OF PREVIOUS STUDIES
Previous Studies
Thakar et al. and Chanchary et al. migrated a large relational database to a cloud database (2010).
Calil et al. proposed SimpleSQL, a relational layer over Amazon SimpleDB (2012).
Limitations of Existing Work
Existing migration methods are not sufficient for migrating relational data to cloud databases. They lack:
A migration strategy.
Application adaptation.
Sharding.
Existing methods migrate data from legacy systems to relational databases.
Amazon SimpleDB
SimpleDB is a web service that provides structured data storage in the cloud.
Multi-valued attributes.
Amazon SimpleDB
Table 1: Relational database and SimpleDB equivalence
Relational Database | SimpleDB
Table | Domain
Row | Item
Column | Attribute
Value | Value(s)
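The equivalence in Table 1 can be illustrated with a short Python sketch. It is not the thesis implementation (which uses C#); `row_to_item`, the sample row, and the choice to use the primary key as the item name are assumptions made for illustration. Values are stored as lists of strings because SimpleDB stores values as strings and an attribute can hold more than one value.

```python
from typing import Any, Dict, List


def row_to_item(row: Dict[str, Any], key_column: str) -> Dict[str, Any]:
    """Map one relational row to a SimpleDB-style item (hypothetical helper).

    Column names become attribute names. NULL columns are skipped, since a
    schemaless item can simply omit an attribute.
    """
    attributes: Dict[str, List[str]] = {}
    for column, value in row.items():
        if column == key_column or value is None:
            continue
        # One value per column here; the list leaves room for multiple values.
        attributes[column] = [str(value)]
    # Assumption: the primary key value serves as the item name.
    return {"name": str(row[key_column]), "attributes": attributes}


# Illustrative row from a hypothetical "book" table.
book_row = {"book_id": 17, "title": "Dune", "price": 9.99, "subtitle": None}
item = row_to_item(book_row, key_column="book_id")
```

The table name would supply the domain, so the row above would become one item in a `book` domain under this mapping.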
Characteristics of NoSQL Databases
No Normalization.
No Joins.
Schemaless.
Data Type.
Characteristics of NoSQL Databases
Several cloud databases share the same data model and characteristics.
CLOUD DATABASE
Amazon SimpleDB
MongoDB
CouchDB
Oracle NoSQL
Section 2: Proposed Model
Data Migration Model
Relational-Cloud Mapping
PROPOSED MODEL
Migration Methods
We propose four migration methods:
• Type 1: complete relational database to one domain.
• Type 2: multiple tables to one domain.
• Type 3: a table to one domain.
• Type 4: normalized tables are denormalized, then migrated to a domain.
Each method is independent of the others and is capable of migrating an entire relational database.
Migration Methods
Mapping Strategies
Mapping Strategy 1 (MS1)
Mapping Strategy 2 (MS2)
Mapping Strategy 3 (MS3)
Type 1 Migration
Uses Mapping Strategy 2 (MS2).
Migrates the entire relational database.
Only a single domain exists in the cloud database.
Number of items = number of rows in the entire relational database.
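A minimal Python sketch of a Type 1 migration follows, showing why the item count equals the total row count. The function name, the `table:key` item naming, and the `_table` attribute are assumptions made for illustration; the actual MS2 rules are given in the thesis figures and are not reproduced here.

```python
from typing import Any, Dict, List


def migrate_type1(database: Dict[str, List[Dict[str, Any]]],
                  keys: Dict[str, str]) -> Dict[str, Dict[str, List[str]]]:
    """Place every row of every table into one domain (a dict of items)."""
    domain: Dict[str, Dict[str, List[str]]] = {}
    for table, rows in database.items():
        for row in rows:
            # Assumption: prefix the key with the table name so that rows
            # from different tables never collide inside the single domain.
            item_name = f"{table}:{row[keys[table]]}"
            attrs = {col: [str(val)] for col, val in row.items()
                     if val is not None}
            # Assumption: remember the source table as an extra attribute.
            attrs["_table"] = [table]
            domain[item_name] = attrs
    return domain


# Illustrative two-table database with three rows in total.
database = {
    "author": [{"author_id": 1, "name": "Herbert"}],
    "book": [{"book_id": 17, "title": "Dune", "author_id": 1},
             {"book_id": 18, "title": "Emma", "author_id": None}],
}
domain = migrate_type1(database, keys={"author": "author_id", "book": "book_id"})
```

Because each row yields exactly one item, `len(domain)` matches the number of rows across all tables, provided primary keys are unique within each table.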
Type 2 Migration
Uses Mapping Strategy 1 (MS1) and Mapping Strategy 2 (MS2).
Migrates multiple tables and their data to one domain.
Migrates a single table to one domain.
Type 3 Migration
Uses Mapping Strategy 1 (MS1).
Migrates a table to one domain in a cloud database.
Implicit conversion.
Type 4 Migration
Uses Mapping Strategy 1 (MS1) and Mapping Strategy 3 (MS3).
Migrates denormalized tables to one domain in a cloud database.
Migrates a single table and its data to one domain.
Explicit conversion of columns.
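One way to picture the denormalization step in a Type 4 migration is to fold child rows into their parent item as a multi-valued attribute. The Python sketch below is only an illustration under that assumption; `denormalize`, its parameters, and the sample tables are hypothetical, and the MS3 rules used in the thesis may differ.

```python
from typing import Any, Dict, List


def denormalize(parents: List[Dict[str, Any]], children: List[Dict[str, Any]],
                parent_key: str, foreign_key: str,
                multi_valued: str) -> Dict[str, Dict[str, List[str]]]:
    """Fold child rows into their parent rows as a multi-valued attribute."""
    items: Dict[str, Dict[str, List[str]]] = {}
    for row in parents:
        items[str(row[parent_key])] = {
            col: [str(val)] for col, val in row.items()
            if col != parent_key and val is not None
        }
    for child in children:
        parent_name = str(child[foreign_key])
        if parent_name not in items:
            continue  # orphan child row; a real tool would report it
        # Each matching child row adds one more value to the same attribute.
        items[parent_name].setdefault(multi_valued, []).append(
            str(child[multi_valued]))
    return items


# Illustrative normalized tables: one book with two tags.
books = [{"book_id": 17, "title": "Dune"}]
book_tags = [{"book_id": 17, "tag": "sci-fi"}, {"book_id": 17, "tag": "classic"}]
domain = denormalize(books, book_tags, "book_id", "book_id", "tag")
```

After this step a lookup that needed a join in the relational schema reads a single item, which is consistent with joins being limited to one domain for this method.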
Migration Method Usage
Type 1
Data size is less than 10 GB.
Type 2
Data size is more than 10 GB and joins are to be performed.
Type 3
The same semantics as the relational database are needed and the database size is more than 10 GB.
Type 4
Denormalization.
Data size is more than 10 GB and joins are to be performed.
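The usage rules above can be read as a small decision procedure. The Python sketch below is one possible reading, not the thesis's own selection logic: the function name, the parameter names, the order in which the conditions are checked, and the fallback to Type 3 are all assumptions. The slide does not say which method applies at exactly 10 GB; the sketch treats it as the larger case.

```python
def choose_migration_type(size_gb: float,
                          needs_joins: bool = False,
                          keep_relational_semantics: bool = False,
                          accept_denormalization: bool = False) -> str:
    """Pick a migration method from the usage rules (hypothetical helper)."""
    if size_gb < 10:
        return "Type 1"  # small database: everything fits in one domain
    if keep_relational_semantics:
        return "Type 3"  # one table per domain, as in the relational schema
    if needs_joins and accept_denormalization:
        return "Type 4"  # denormalize first, then migrate
    if needs_joins:
        return "Type 2"  # group the joined tables into one domain
    # Assumption: with no joins required, fall back to one table per domain.
    return "Type 3"


# Illustrative call for a 25 GB database whose tables are joined often.
choice = choose_migration_type(25, needs_joins=True)
```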
Sharding and Redundancy in Migration Methods
Sharding: the process of storing data records across multiple domains.
Type 1 does not support sharding.
Type 2, Type 3, and Type 4 support sharding.
Redundancy: data redundancy is the storage of the same data more than once.
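The slides do not state how records are assigned to domains. A common technique, shown here only as an assumed stand-in, is hash-based sharding: hash the item name and take the result modulo the number of domains. The function name and the `base_index` domain naming are hypothetical.

```python
import hashlib


def shard_domain(base_name: str, item_name: str, shard_count: int) -> str:
    """Return the domain that should hold an item (hash-based sharding)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # A stable hash keeps the same item in the same domain across runs,
    # unlike Python's built-in hash(), which is salted per process.
    digest = hashlib.sha256(item_name.encode("utf-8")).hexdigest()
    return f"{base_name}_{int(digest, 16) % shard_count}"


# Illustrative call: spread items of a "book" table over four domains.
target = shard_domain("book", "17", shard_count=4)
```

Spreading items this way keeps each domain smaller, which matters when a single domain has a size cap, at the price of querying several domains for operations that span shards.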
Implementation Details
Source system: Oracle, MySQL, or Microsoft SQL Server.
Destination system: a cloud database that supports key-value pairs.
We use Microsoft .NET Framework 3.5, Microsoft IIS 7.0, and Microsoft SQL Server 2008 R2.
We use the C# library for SimpleDB to perform all actions needed to migrate the data.
EXPERIMENTS
Experiment
We migrated a relational database to Amazon SimpleDB.
The relational database belongs to an “online bookstore” application.
The sample database consists of thirteen tables with sample data.
Type 1 Migration
Type 2 Migration
Type 3 Migration
Type 4 Migration
Code Generation
We propose an interface that helps the developer generate code automatically.
It covers basic usage of the following queries:
Select.
Insert.
Delete.
Update.
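As an illustration of what generated code for the Select case could look like, the Python sketch below builds a SimpleDB-style select expression from a domain, a column list, and equality conditions. It is not the proposed interface; `build_select` and its helpers are hypothetical, and the quoting (backticks around names, doubled single quotes inside values) follows the SimpleDB select syntax as the editor understands it, so it should be checked against the service documentation before use.

```python
from typing import Dict, List, Optional


def quote_name(name: str) -> str:
    """Quote a domain or attribute name with backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_value(value: object) -> str:
    """Quote a value with single quotes, doubling any embedded quote."""
    return "'" + str(value).replace("'", "''") + "'"


def build_select(domain: str,
                 columns: Optional[List[str]] = None,
                 where: Optional[Dict[str, object]] = None) -> str:
    """Build a select expression with optional equality conditions."""
    output = ", ".join(quote_name(c) for c in columns) if columns else "*"
    query = f"select {output} from {quote_name(domain)}"
    if where:
        conditions = [f"{quote_name(attr)} = {quote_value(val)}"
                      for attr, val in where.items()]
        query += " where " + " and ".join(conditions)
    return query


# Illustrative query against a hypothetical "book" domain.
query = build_select("book", ["title"], {"author": "O'Brien"})
```

Insert, update, and delete have no query-string form in SimpleDB; they go through API operations such as PutAttributes and DeleteAttributes, so generated code for those cases would wrap library calls instead.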
Application Adaptation
Performance Analysis
Performance model:
Computation time.
Storage Cost.
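The slides do not spell out the storage cost formula. The Python sketch below estimates billable storage under an assumption taken from Amazon's published SimpleDB billing description: raw bytes of item names, attribute names, and attribute values, plus 45 bytes of overhead per item, per distinct attribute name, and per attribute-value pair. The function name is hypothetical, the overhead is a parameter so it can be adjusted, and the thesis's own model may differ.

```python
from typing import Dict, List


def billable_bytes(domain: Dict[str, Dict[str, List[str]]],
                   overhead: int = 45) -> int:
    """Estimate billable storage for one domain (assumed billing model)."""
    total = 0
    attribute_names = set()
    for item_name, attrs in domain.items():
        total += len(item_name.encode("utf-8")) + overhead  # per item
        for attr, values in attrs.items():
            attribute_names.add(attr)
            for value in values:
                # Each attribute-value pair carries its own overhead.
                total += len(value.encode("utf-8")) + overhead
    # Attribute names are counted once per distinct name in the domain.
    total += sum(len(name.encode("utf-8")) + overhead
                 for name in attribute_names)
    return total


# Illustrative single-item domain: 47 (item) + 49 (pair) + 50 (name) bytes.
size = billable_bytes({"17": {"title": ["Dune"]}})
```

Under this model, fewer items and fewer attribute-value pairs mean less per-entry overhead, which is one plausible reason a denormalizing method could store the same data more cheaply.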
Average Computation Time
Storage Cost of 10GB
Amazon SimpleDB 2013
Storage Cost of 25GB
Amazon SimpleDB 2013
Comparison of Migration Methods
Migration Methods | Type 1 | Type 2 | Type 3 | Type 4
Storage Space | < 10 GB | > 10 GB | > 10 GB | > 10 GB
Sharding | No | Yes | Yes | Yes
Joins | Limited to one domain | Limited to one domain | Cross domain | Limited to one domain
Denormalized Data | No | No | No | Yes
Storage Cost | Nearly the same as Type 2 and Type 3 | Nearly the same as Type 1 and Type 3 | Nearly the same as Type 1 and Type 2 | Less than Type 1, Type 2, and Type 3
Computation Time | Smallest | Larger than Type 1 | Highest | Larger than Type 2
Limitations
Stored Procedure.
User Defined Functions.
Triggers.
Conclusion and Future Direction
This thesis proposes four diverse methods to migrate relational databases to cloud databases.
Each method is independent of the other.
A relational database was successfully migrated to a NoSQL database.
An interface for code generation is proposed.
CONCLUSION AND FUTURE WORK
Future Direction
Migration of:
Stored procedures.
Triggers.
User-Defined Functions.
Publications
K. Mehra, Y. Yan and D. Lemire. Automatic data migration to the cloud. In the Sixth International workshop on Cloud Data Management (CloudDB2014), submitted.
K. Mehra, Y. Yan and D. Lemire. Automatic data migration into the cloud. IEEE Services 2014, Manuscript.