Page 1: Thesis presentation

Automatic Data Migration into the Cloud

MS Thesis: Computer Science & Software Engineering

By: Kushal Mehra

Supervisors: Dr. Yuhong Yan, Dr. Daniel Lemire

Concordia University


Page 2: Thesis presentation

Well-known applications (Facebook, Google Blogger, Twitter) depend on NoSQL.

Some advantages of NoSQL databases:

High scalability.

High read and write performance.

Availability at low cost.

Suitable applications:

Big data.

Geographical data.

MOTIVATION

Page 3: Thesis presentation

Agenda

• Introduction

• Review of Previous Studies

• Proposed Model

• Experiment and Results

• Future Work and Conclusion

Page 4: Thesis presentation

Section 1: Introduction

Page 5: Thesis presentation

Relational Database Vs. NoSQL Database

Cloud database definition.

Existing Problem in Relational Database.

• Scale up

• Scale out

INTRODUCTION AND PROBLEM

Page 6: Thesis presentation

Relational Database Vs. NoSQL Database: Scale up

Page 7: Thesis presentation

Relational Database Vs. NoSQL Database: Scale out

Page 8: Thesis presentation

Data Migration

Enterprises seek to migrate their massive relational databases to NoSQL databases.

The process of transferring data between storage types, formats, or computer systems is called data migration.

Page 9: Thesis presentation

Importance of Data Migration

One survey estimated that the data migration market would reach $906 million by 2012.

Page 10: Thesis presentation

Previous Studies

A large body of work exists on data migration.

Approaches include:

Schema Conversion.

ETL.

Integrated Model


REVIEW OF PREVIOUS STUDIES

Page 11: Thesis presentation

Previous Studies

Thakar et al. and Chanchary et al. migrated a large relational database to a cloud database (2010).

Calil et al. proposed SimpleSQL, a relational layer over Amazon SimpleDB (2012).

Page 12: Thesis presentation

Limitations of Existing Work

Existing migration methods are not sufficient for data migration into the cloud:

Lack of a migration strategy.

Application adaptation.

Sharding.

Existing methods migrate data from legacy systems to relational databases.

Page 13: Thesis presentation

Amazon SimpleDB

SimpleDB is a web service which provides structured data storage in the cloud.

Multi-valued attributes.

Page 14: Thesis presentation

Amazon SimpleDB

Table 1: Relational database and SimpleDB equivalence

Relational Database    SimpleDB
-------------------    --------
Table                  Domain
Row                    Item
Column                 Attribute
Value                  Value(s)
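The equivalence in Table 1 can be illustrated with a small Python sketch. It works on in-memory dictionaries only and is not the thesis implementation; the function name `row_to_item`, the sample `Book` row, and the choice of the primary key as the item name are assumptions made for illustration.

```python
def row_to_item(table, row, key_column):
    """Map one relational row to a SimpleDB-style (domain, item name, attributes) triple."""
    domain = table                      # table  -> domain
    item_name = str(row[key_column])    # row    -> item, named by its primary key (assumption)
    attributes = {}
    for column, value in row.items():   # column -> attribute
        if value is None:
            continue                    # schemaless: a NULL column is simply omitted
        # value -> value(s): values are stored as strings, and an attribute may hold several
        values = value if isinstance(value, (list, tuple)) else [value]
        attributes[column] = [str(v) for v in values]
    return domain, item_name, attributes


domain, name, attrs = row_to_item(
    "Book",
    {"BookId": 7, "Title": "Dune", "Author": ["Herbert"], "Subtitle": None},
    "BookId",
)
print(domain, name, attrs)
```

Storing every attribute as a list of strings keeps single-valued and multi-valued attributes uniform, which mirrors the "Value(s)" row of the table.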

Page 15: Thesis presentation

Characteristics of NoSQL Databases

No Normalization.

No Joins.

Schemaless.

Data Type.


Page 16: Thesis presentation

Characteristics of NoSQL Databases

Some cloud databases that share the same data model and characteristics:

Amazon SimpleDB

MongoDB

CouchDB

Oracle NoSQL

CLOUD DATABASE

Page 17: Thesis presentation

Section 2: Proposed Model


Page 18: Thesis presentation

Data Migration Model

Relational-Cloud Mapping

PROPOSED MODEL

Page 19: Thesis presentation

Migration Methods

We propose four migration methods.

• Type 1: complete relational database to one domain.

• Type 2: multiple tables to one domain.

• Type 3: a table to one domain.

• Type 4: normalization to denormalization and tables to domain.

Each method is independent of the others and is capable of migrating an entire relational database.

Page 20: Thesis presentation

Migration Methods


Page 21: Thesis presentation

Mapping Strategies


Page 22: Thesis presentation

Mapping Strategy 1 (MS1)


Page 23: Thesis presentation

Mapping Strategy 2 (MS2)


Page 24: Thesis presentation

Mapping Strategy 3 (MS3)


Page 25: Thesis presentation


Type 1 Migration

Uses Mapping Strategy 2 (MS2).

Migrates the entire relational database.

Only a single domain exists in the cloud database.

Number of items = number of rows in the entire relational database.
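A minimal Python sketch of the Type 1 idea follows. It models the database as a dictionary of tables and the cloud database as a dictionary of domains. The item-naming scheme (`table:index`), the `_table` marker attribute, and the domain name `bookstore` are hypothetical; the slides do not show how MS2 names items.

```python
def migrate_type1(database, domain="bookstore"):
    """database: {table_name: [row_dict, ...]} -> {domain: {item_name: attributes}}"""
    items = {}
    for table, rows in database.items():
        for index, row in enumerate(rows):
            item_name = f"{table}:{index}"     # hypothetical naming; keeps items unique across tables
            attrs = {col: str(val) for col, val in row.items() if val is not None}
            attrs["_table"] = table            # hypothetical marker recording the source table
            items[item_name] = attrs
    return {domain: items}


db = {
    "Book": [{"BookId": 1, "Title": "Dune"}, {"BookId": 2, "Title": "Emma"}],
    "Author": [{"AuthorId": 10, "Name": "Herbert"}],
}
cloud = migrate_type1(db)
total_rows = sum(len(rows) for rows in db.values())
print(len(cloud), len(cloud["bookstore"]), total_rows)
```

The sketch ends with one domain whose item count equals the total row count of the source database, which is the property stated on the slide.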

Page 26: Thesis presentation


Type 2 Migration

Uses Mapping Strategy 1 (MS1) and Mapping Strategy 2 (MS2).

Migrates multiple tables and their data to one domain.

Can also migrate a single table to one domain.

Page 27: Thesis presentation


Type 3 Migration

Uses Mapping Strategy 1 (MS1).

Migrates a table to one domain in a cloud database.

Implicit conversion.

Page 28: Thesis presentation


Type 4 Migration

Uses Mapping Strategy 1 (MS1) and Mapping Strategy 3 (MS3).

Migrates denormalized tables to one domain in a cloud database.

Migrates a single table and its data to one domain.

Explicit conversion of columns.
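The denormalization step behind Type 4 can be sketched as a pre-computed join: parent columns are copied into each child row before the rows are stored in a single domain. This is an illustrative reading, not the thesis's MS3 definition; the `denormalize` helper, the `Author_` prefix, and the sample tables are assumptions.

```python
def denormalize(child_rows, parent_rows, foreign_key, parent_key, prefix):
    """Copy each parent's columns into the child rows that reference it (a pre-computed join)."""
    parents = {parent[parent_key]: parent for parent in parent_rows}
    merged = []
    for child in child_rows:
        row = dict(child)
        parent = parents.get(child[foreign_key], {})
        for column, value in parent.items():
            if column != parent_key:           # the key is already present as the foreign key
                row[prefix + column] = value
        merged.append(row)
    return merged


books = [{"BookId": 1, "Title": "Dune", "AuthorId": 10}]
authors = [{"AuthorId": 10, "Name": "Herbert"}]
flat = denormalize(books, authors, "AuthorId", "AuthorId", "Author_")

# One domain holds the denormalized rows; values are converted to strings.
domain = {str(row["BookId"]): {c: str(v) for c, v in row.items()} for row in flat}
print(domain)
```

Because the author's name now travels with every book item, a query that previously needed a join can be answered from one domain, at the price of duplicated parent data.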

Page 29: Thesis presentation


Migration Method Usage

Type 1

Data size is less than 10 GB.

Type 2

Data size is more than 10 GB and joins are to be performed.

Type 3

The same semantics as the relational database are needed and the database size is more than 10 GB.

Type 4

Denormalization.

Data size is more than 10 GB and joins are to be performed.
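The usage guidance above can be read as a small decision procedure. The following Python sketch is one possible encoding, not a rule stated in the thesis: the function name, the flag names, the order in which the conditions are tested, the treatment of a database of exactly 10 GB, and the final fallback to Type 3 are all assumptions.

```python
def choose_migration_type(size_gb, needs_joins=False,
                          needs_relational_semantics=False,
                          allow_denormalization=False):
    """Suggest a migration method from the criteria listed on the slide."""
    if size_gb < 10:
        return "Type 1"                 # small database: a single domain is enough
    if needs_relational_semantics:
        return "Type 3"                 # one table per domain keeps the relational layout
    if needs_joins and allow_denormalization:
        return "Type 4"                 # joins are pre-computed by denormalizing
    if needs_joins:
        return "Type 2"                 # related tables share a domain
    return "Type 3"                     # fallback chosen for this sketch (assumption)


print(choose_migration_type(4))
print(choose_migration_type(25, needs_joins=True))
print(choose_migration_type(25, needs_joins=True, allow_denormalization=True))
```

Types 2 and 4 share the same size and join criteria on the slide, so the sketch separates them only by whether denormalization is acceptable.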

Page 30: Thesis presentation

Sharding and Redundancy in Migration Methods

Sharding: sharding is the process of storing data records across multiple domains.

Type 1 does not support sharding.

Type 2, Type 3, and Type 4 support sharding.

Redundancy: data redundancy is the unnecessary duplication of data.
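As a rough illustration of sharding across domains, the Python sketch below assigns each item to one of several domains by hashing its name. Hash-based placement is an assumption made for this example; the slides do not say which sharding rule the thesis uses. The helper names and the `Orders_<n>` domain naming are likewise hypothetical.

```python
import zlib


def shard_domain(base_domain, item_name, shard_count):
    """Pick the domain that stores an item; crc32 keeps the choice stable across runs."""
    shard = zlib.crc32(item_name.encode("utf-8")) % shard_count
    return f"{base_domain}_{shard}"


def shard_items(base_domain, items, shard_count):
    """items: {item_name: attributes} -> {domain_name: {item_name: attributes}}"""
    domains = {f"{base_domain}_{i}": {} for i in range(shard_count)}
    for item_name, attributes in items.items():
        domains[shard_domain(base_domain, item_name, shard_count)][item_name] = attributes
    return domains


orders = {f"order-{n}": {"Total": str(n)} for n in range(1000)}
sharded = shard_items("Orders", orders, 4)
print({name: len(content) for name, content in sharded.items()})
```

A reader or writer only needs the item name and the shard count to locate the right domain, so no lookup table has to be maintained.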

Page 31: Thesis presentation

Implementation Details

Source system: can be Oracle, MySQL, or Microsoft SQL Server.

Destination system: a cloud database that supports key-value pairs.

We use Microsoft .NET Framework 3.5, Microsoft IIS 7.0, and Microsoft SQL Server 2008 R2.

We use the C# library for SimpleDB to perform all actions necessary for migrating the data.

EXPERIMENTS

Page 32: Thesis presentation

Experiment

We migrated a relational database to Amazon SimpleDB.

The source is the relational database of an “online bookstore” application.

The sample database consists of thirteen tables and sample data.

Page 33: Thesis presentation

Type 1 Migration


Page 34: Thesis presentation

Type 2 Migration


Page 35: Thesis presentation

Type 3 Migration


Page 36: Thesis presentation

Type 4 Migration


Page 37: Thesis presentation

Code Generation

We propose an interface that helps the developer generate code automatically.

This includes the basic usage of:

Select.

Insert.

Delete.

Update queries.

Application Adaptation
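To give a flavor of what generated code might look like, here is a small Python sketch that builds a SimpleDB select expression from a domain name, a column list, and equality conditions. It is not the interface proposed in the thesis, which the slides do not show; the function names are hypothetical. The quoting follows SimpleDB's select syntax, where names are quoted with backticks and values with single quotes, each escaped by doubling.

```python
def quote_name(name):
    """Quote a domain or attribute name with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_value(value):
    """Quote an attribute value with single quotes, doubling embedded quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def generate_select(domain, columns=None, where=None):
    """Build a select expression; `where` maps attribute names to required values."""
    output = ", ".join(quote_name(c) for c in columns) if columns else "*"
    expression = f"select {output} from {quote_name(domain)}"
    if where:
        conditions = [f"{quote_name(c)} = {quote_value(v)}" for c, v in where.items()]
        expression += " where " + " and ".join(conditions)
    return expression


print(generate_select("Book", ["Title"], {"Author": "O'Brien"}))
# select `Title` from `Book` where `Author` = 'O''Brien'
```

Insert, update, and delete have no query-string form in SimpleDB; they go through the PutAttributes and DeleteAttributes operations, so a generator for those would emit API calls rather than expressions.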

Page 38: Thesis presentation

Performance Analysis

Performance Model

Computation time.

Storage cost.
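The storage-cost side of such a model can be approximated in a few lines of Python. The sketch below assumes the billable-size rule that Amazon documents for SimpleDB: raw bytes of item names, distinct attribute names, and attribute values, plus 45 bytes of overhead for each item, each distinct attribute name, and each attribute-value pair. Whether the thesis uses exactly this formula is not stated on the slides, and the price passed to `monthly_cost` is a placeholder, not a quoted rate.

```python
OVERHEAD = 45  # bytes added per item, per distinct attribute name, and per attribute-value pair


def billable_bytes(items):
    """items: {item_name: {attribute_name: [value, ...]}} for one domain."""
    size = 0
    attribute_names = set()
    for item_name, attributes in items.items():
        size += len(item_name.encode("utf-8")) + OVERHEAD
        for name, values in attributes.items():
            attribute_names.add(name)          # names are counted once per domain
            for value in values:
                size += len(value.encode("utf-8")) + OVERHEAD
    size += sum(len(n.encode("utf-8")) + OVERHEAD for n in attribute_names)
    return size


def monthly_cost(total_bytes, price_per_gb_month):
    """Convert a byte count to a monthly charge at the given (assumed) price."""
    return total_bytes / 1024**3 * price_per_gb_month


sample = {"1": {"Title": ["Dune"]}}
print(billable_bytes(sample))   # 1 + 45 (item) + 4 + 45 (value) + 5 + 45 (name) = 145
```

Because every item and attribute-value pair carries fixed overhead, a method that stores fewer, wider items can cost less than one that stores many small ones, even for the same raw data.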

Page 39: Thesis presentation

Average Computation Time


Page 40: Thesis presentation

Storage Cost of 10GB


Amazon SimpleDB 2013

Page 41: Thesis presentation

Storage Cost of 25GB


Amazon SimpleDB 2013

Page 42: Thesis presentation

Comparison of Migration Methods

Migration Methods    Type 1                  Type 2                  Type 3                  Type 4
-----------------    ------                  ------                  ------                  ------
Storage space        <10 GB                  >10 GB                  >10 GB                  >10 GB
Sharding             No                      Yes                     Yes                     Yes
Joins                Limited to one domain   Limited to one domain   Cross domain            Limited to one domain
Denormalized data    No                      No                      No                      Yes
Storage cost         Nearly the same as      Nearly the same as      Nearly the same as      Less than
                     Type 2, Type 3          Type 1, Type 3          Type 2, Type 3          Type 1, Type 2, Type 3
Computation time     Smallest                Larger than Type 1      Highest                 Larger than Type 2

Page 43: Thesis presentation

Limitations

Stored procedures.

User-defined functions.

Triggers.

Page 44: Thesis presentation

Conclusion and Future Direction

This thesis proposes four diverse methods to migrate relational databases to cloud databases.

Each method is independent of the other.

We successfully migrated a relational database to a NoSQL database.

The thesis also proposes an interface for code generation.

CONCLUSION AND FUTURE WORK

Page 45: Thesis presentation

Future Direction

Migration of:

Stored procedures.

Triggers.

User-defined functions.

Page 46: Thesis presentation


Publications

K. Mehra, Y. Yan and D. Lemire. Automatic data migration to the cloud. In the Sixth International Workshop on Cloud Data Management (CloudDB 2014), submitted.

K. Mehra, Y. Yan and D. Lemire. Automatic data migration into the cloud. IEEE Services 2014, Manuscript.
