
ATHABASCA UNIVERSITY

A Survey of Data Consistency in Cloud DBs

by

Hank (Yonghang) Lin

A project submitted in partial fulfillment

of the requirements for the degree of

MASTER OF SCIENCE in INFORMATION SYSTEMS

Athabasca, Alberta

August, 2014

© Hank Lin, 2014


DEDICATION

This paper is dedicated to my wife for her encouragement and support in pursuing my educational goal. Special thanks to my employer, Enabil Solution, which supported me and partially sponsored my tuition fees.


ABSTRACT

This essay is a survey of data consistency in cloud databases. It explores the five

essential characteristics of a cloud computing platform identified by NIST (National Institute of

Standards and Technology). They are: on-demand self-service, broad network access, resource

pooling, rapid elasticity and measured service. Those characteristics of a cloud platform

differentiate it from the traditional data center and make it an attractive infrastructure solution for

enterprises. Most cloud platforms utilize commodity machines to build a server farm and can deliver

high scalability and availability at low cost. The common architecture of a traditional RDBMS (Relational DataBase Management System) cluster adopts a shared-disk design that is well-suited to a big machine. On the other hand, most cloud DBs choose a shared-nothing architecture, which allows the DB to scale out to thousands of nodes, as the nodes do not interfere with one another.

This essay reviews a traditional RDBMS cluster, Oracle RAC, along with a number of DBs that can offer strong data consistency in the cloud environment, and analyzes how each handles data consistency, and at what level, from an architectural perspective.

Strong data consistency is a must for many enterprise applications. This essay can help IT

architects and DBAs understand the difference between traditional RDBMS and cloud DBs so that

they can evaluate the changes and effort required when they are considering migrating their applications to the cloud. It can also serve as material for university students interested in the data management domain.

The remainder of this essay is organized as follows: Chapter I introduces the research

background and states my research objective and methodology. Chapter II presents a literature

review of the state-of-the-art research into the cloud data management domain today. The foundation

knowledge of the cloud computing domain, database consistency theories, common cloud DB categories

and traditional RDBMS are discussed as well, so the readers can understand the subject better and

learn its latest developments and challenges. The research methodology is detailed in Chapter III.

In Chapter IV, seven cloud DBs are selected for analysis with a focus on their architectures. Issues,


challenges and opportunities are discussed in Chapter V and Chapter VI concludes this essay.

References are provided at the end.


ACKNOWLEDGMENTS

This research was guided by my supervisor, Professor Qing Tan of Athabasca University. I am thankful to Professor Tan for his valuable suggestions, guidance and support.


TABLE OF CONTENTS

ABSTRACT

ACKNOWLEDGMENTS

TABLE OF CONTENTS

CHAPTER I: INTRODUCTION

1. Research Background

2. Research Purpose and Objective

3. Research Methodology

4. Research Scope and Contribution

CHAPTER II: LITERATURE REVIEW

1. Cloud Computing

2. ACID and CAP theorem

3. NoSQL and NewSQL

4. RDBMS Cluster – Oracle RAC

5. Cloud Computing Database and Data Management

CHAPTER III: METHODOLOGY

CHAPTER IV: CASE STUDIES

1. Megastore

2. SAP HANA

3. VoltDB

4. MySQL Cluster


5. ScaleDB

6. NuoDB

7. ClustrixDB

CHAPTER V: CHALLENGE, OPPORTUNITY AND TREND

CHAPTER VI: CONCLUSIONS AND RECOMMENDATIONS

References


LIST OF FIGURES

Figure 1: Research methodology diagram

Figure 2: PACELC Tradeoffs for Distributed Data Services (Abadi, 2012)

Figure 3: Trend Popularity Data provided by DB-Engines

Figure 4: Operations across entity groups (Baker et al., 2011)

Figure 5: The SAP HANA database architecture (Färber, 2012)

Figure 6: MySQL Node Architecture (Ronstrom, 2004)

Figure 7: ScaleDB Cluster with Mirrored Storage (Shadmon, 2009)

Figure 8: NuoDB architecture (NuoDB, 2013)

Figure 9: Clustrix Distributed Query Processing (Clustrix, 2014)

LIST OF TABLES

Table 1: Reviewed DB comparison


CHAPTER I: INTRODUCTION

1. Research Background

Cloud computing is becoming a trend. Gartner (2008) describes cloud computing as a style

of computing in which scalable and elastic IT-enabled capabilities are delivered “as a service” using

Internet technologies. The advantages and benefits of cloud computing are well known; the major ones include cost efficiency, scalability, continuous availability and on-demand

provisioning. Even though enterprises still have some concerns regarding the cloud platform, such

as security and privacy in the cloud, its user base is growing constantly and most major IT players

have bet on it and heavily invested in it.

With the continued development of globalization, more enterprises now access global

markets and their workforces and client bases are spread over multiple regions. It is essential that

their applications are available all the time and can be accessed from anywhere. Also, user tolerance

for application response latency is approaching zero. Cloud computing can address the

concerns of high availability, easy access and fast response time. But the majority of data

management solutions in the cloud today cannot meet the mandatory data consistency requirement

of many enterprise applications.

In this essay, the consistency of a distributed database refers to the level of guarantee as to when

a committed write would be visible to other clients/users in a distributed, concurrently accessible

system. Doug Terry defines six possible consistency guarantees in his paper (Terry, 2013).

1. Eventual Consistency

2. Consistent Prefix

3. Bounded Staleness

4. Monotonic Reads

5. Read My Own Writes

6. Strong Consistency

Most NoSQL databases are designed around the eventual consistency principle, while strong consistency is

one of the key characteristics of traditional relational databases. Between eventual consistency and


strong consistency, Consistent Prefix guarantees that reads observe an ordered prefix of the sequence of writes. Bounded Staleness guarantees that retrieved data is stale by no more than a defined time period. Monotonic Reads guarantees Consistent Prefix within a session and is also called a "session guarantee." Read My Own Writes offers strong consistency for a single client: it guarantees that all writes performed by the client are visible to the client's subsequent reads. The middle four consistency models are all forms of eventual consistency, but stronger than basic eventual consistency.
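To make the session-oriented guarantees concrete, here is a minimal Python sketch (my own illustration, not code from Terry's paper) of a client session that records the highest write version it has seen; refusing to read from replicas behind that version enforces both Monotonic Reads and Read My Own Writes.

    import random

    class Replica:
        """A replica that has applied all writes up to some version."""
        def __init__(self, version):
            self.version = version

    class Session:
        """Client session enforcing monotonic reads and read-my-writes."""
        def __init__(self):
            self.max_read = 0    # highest version this session has read
            self.max_write = 0   # highest version this session has written

        def read(self, replicas):
            floor = max(self.max_read, self.max_write)
            # Only replicas that have caught up to the session floor qualify.
            fresh = [r for r in replicas if r.version >= floor]
            if not fresh:
                raise RuntimeError("no replica fresh enough; wait or retry")
            chosen = random.choice(fresh)
            self.max_read = max(self.max_read, chosen.version)
            return chosen.version

    replicas = [Replica(3), Replica(5), Replica(7)]
    s = Session()
    s.max_write = 5            # this session wrote version 5 earlier
    print(s.read(replicas))    # never returns 3: read-my-writes holds
    print(s.read(replicas))    # never goes backward: monotonic reads hold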

The trade-off between consistency and availability or scalability is well explained in Eric

Brewer’s CAP theorem. For many enterprise applications, strong consistency is a must in some use cases, but not all use cases demand it. For the benefit of availability

and scalability, it is possible that data consistency can be relaxed in some use cases. For example,

the account balance of prepaid cellular service is critical and has to be tracked in real time for

authorization before allowing a subscriber’s call request. But the account balance is not so critical for the majority of postpaid accounts, as the carrier collects payment at the end of the customer’s bill cycle. As long as the account balance of a postpaid account is consistent before the bill cycle closes,

it is good enough for the carrier to bill the customer correctly. Even for a prepaid account, not all the

changes within its transactions demand strong consistency. The account balance must remain consistent at all times, but the call detail does not. A customer is unlikely to notice if call details are posted with a delay of seconds or even minutes. From a business perspective, some data

in some transactions require absolute consistency while others do not. Even different customers may

have different consistency needs. If database consistency can be customized and the job of choosing

the desired consistency level for each transaction or user can be left with business analysts, then the

system does not have to pay the consistency costs for those cases where consistency is not really

needed.

Many new databases claiming to offer strong data consistency in the cloud have emerged in recent years; some label themselves as NewSQL. It is interesting to understand their architectures and how they overcome the overhead of maintaining strong consistency while keeping their systems highly scalable and available in the cloud environment.


2. Research Purpose and Objective

The purpose of this research is to survey the difference between the traditional RDBMS and

cloud-based databases and investigate how different databases handle the data consistency challenge

from an architecture perspective. In this essay, seven data management solutions were selected for

review through a systematic approach: Google Megastore, SAP HANA, VoltDB, MySQL Cluster, ScaleDB, NuoDB and ClustrixDB.

Although databases in the cloud can deliver much better scalability and availability along

with higher performance than traditional relational databases, the consistency model is a major obstacle to migrating traditional databases into the cloud.

The objective of this essay is to provide sufficient background for people who want to know more about cloud databases, so that they can understand the complexity of data consistency in cloud databases.

3. Research Methodology

In this essay, a systematic approach is employed to select the solutions for review. The Gartner

Magic Quadrant for Operational Database Management Systems is used as an important reference

when selecting solutions for review. The DB-Engines ranking is used as a secondary factor.

My research interests are focused on the potential that a traditional OLTP (OnLine

Transaction Processing) system can be migrated to the cloud and achieve the high scalability that

the cloud platform can deliver, while its mandatory strong consistency attribute is protected in the

cloud. The solutions chosen for review are closely associated with this interest: highly scalable cloud-based databases that guarantee strong consistency. Also, it is

impractical to check all the existing solutions, as there are many novel solutions emerging all the

time. It makes more sense to study the solutions that have been recognized by the market instead of

diverting effort to investigation of the niche players.

The major selection criteria for review include:

Solutions must have the potential to scale in the cloud for OLTP transactions.

Solutions must be able to provide strong consistency.


Solutions should be either well-known or have great growth potential.

The research methodology diagram in Figure 1 illustrates how my research was conducted.

Figure 1: Research methodology diagram. [The diagram lists the research steps: define a research area (migrating an OLTP DB to a cloud DB while protecting its consistency attribute); define criteria to select major players that claim to have those abilities; review related papers on the selected DBs and investigate their architectures; synthesize the findings from the individual studies; follow the defined criteria and use the Gartner data to select DBs for inclusion; interpret the findings and offer my recommendation.]

4. Research Scope and Contribution

While many researchers have studied various database solutions suited or designed for the

cloud, they studied individual DBs’ unique advantages and made full comparisons. My research assumes that cloud-based databases, inheriting the characteristics of the cloud platform, offer better availability and scalability than traditional RDBMS, and that the data consistency issue becomes a user’s major concern when planning to move applications

to the cloud. While there is no doubt that each solution has its best-use cases, my research tries to

investigate solutions that address a user’s mandatory data consistency requirement first.


The limitations of traditional relational databases cause major concerns when a company is

thinking of moving to a cloud-based database. Many new solutions have emerged in recent years to

tackle the issues of data management in the cloud. However, it is not realistic to study all of them.

In this essay, I use a systematic approach to select seven DBs, explore their design and present an

unbiased view of the architecture of those solutions and their pros and cons.

The outcomes of this research will assist practitioners who are thinking of migrating their

traditional application into the cloud to understand the differences in architecture between

traditional RDBMS and available databases suitable for the cloud and to assess different database

management solutions, especially from the perspective of their consistency needs.

In the following chapters of this essay, the Literature Review chapter presents an overview

of the state-of-the-art research in the cloud data management domain. Readers can learn the latest

developments and challenges. Foundation knowledge of the cloud computing domain, database

consistency theories, common cloud DB categories, and traditional RDBMS are also discussed in

this chapter. The chapter helps the readers understand why strong consistency matters. The Case

Studies chapter focuses on how individual solutions in the cloud address the data consistency issue.

The chapters on issues, challenges and recommendations summarize what I found in my research.


CHAPTER II: LITERATURE REVIEW

Cloud-based data management solutions live in the cloud. It is important to know the

characteristics of a cloud platform so as to understand why the traditional RDBMS cannot perform

well in the cloud. An overview of cloud computing is covered in this chapter. The most popular cloud-

based data management solutions, such as Cassandra, MongoDB and Apache HBase, are referred to

as NoSQL. These NoSQL solutions mostly spawn from the recognition and application of the CAP

theorem, while traditional RDBMS stick to the ACID principle. The difference between the CAP

theorem and ACID principle is discussed in this chapter. Trying to overcome the pitfalls of NoSQL

solutions, NewSQL is emerging. NewSQL aims to preserve the traditional RDBMS’s

characteristics in the cloud environment. Before studying the cloud databases, the popular RDBMS

cluster solution, Oracle Real Application Cluster (RAC), is used as an example to illustrate the

architecture of a traditional RDBMS cluster. At the end of this chapter, an extensive review of related research is provided.

1. Cloud Computing

One of the most prominent IT trends of the last decade has been the emergence of cloud

computing. Cloud computing is the delivery model for providing pervasive, readily available, on-

demand network access to a shared pool of configurable computing resources. These resources can

be quickly provisioned with minimal management. The technological foundation consists of the

collection of hardware and software required to support the delivery model. It can be divided into a

physical layer and an abstraction layer. The physical layer consists of the network, storage and

server infrastructure, while the abstraction layer is composed of the software implemented across

the physical layer. This cloud model promotes availability and is composed of five essential

characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and

measured service. (Mell & Grance, 2009)

Virtualization, high speed Internet and cloud management are key enabling technologies

behind the emerging cloud computing paradigm.


In cloud computing, virtualization is the creation of a virtual version of something, such as a

hardware platform, operating system (OS), storage device, or network resources. The usual goal of

virtualization is to centralize administrative tasks while improving scalability and overall hardware-

resource utilization. By leveraging virtualization technology, a company can pool IT assets into

resource pools to be carved up, consumed, and released back into the pool as workloads require, making IT resources more utility-like. Physical and logical resources are made available through a virtual

service layer across the enterprise. The concept of cloud computing has captured the attention and

imagination of organizations of all sizes because its service delivery model converts the power of

virtualization into measurable business value by adding provisioning and billing capabilities.

When people switch on computers, they expect the application powered by cloud computing

to work just like locally installed software. They want information to be served up immediately.

Cloud computing requires not just high speed, but also high quality broadband connections that are

always connected. While many websites are usable on non-broadband connections or slow

broadband connections, cloud-based applications are often not usable in these environments.

Connection speed, measured in kilobits per second (or Mbps and Gbps), is important in the use of cloud

computing services. Also important is Quality of Service (QoS), indicators for which include the

number of times the connections are dropped, response time, and the extent of delays in the

processing of network data (latency) and loss of data (packet loss). Cloud computing can impose high costs due to its requirement for both an “always on” connection and large volumes of data transfer. This is hard for users who pay by the megabyte or gigabyte or are limited by a data cap. The arrival of superfast network links (such as fiber optics) delivered just that: access to the worldwide web at near the speed of light. This opened the door to cloud computing, with its high expectations of

accessing cloud platforms from any location in an instant.

Cloud management is software and technologies designed for operating and monitoring the

applications, data and services residing in the cloud. Cloud management tools help ensure that a company’s cloud-based resources are working optimally and properly interacting with

users and other services. Cloud management strategies typically involve numerous tasks, including


performance monitoring (response times, latency, uptime, etc.), security and compliance auditing

and management, and initiating and overseeing disaster recovery and contingency plans. With cloud

computing growing more complex and a wide variety of private, hybrid, and public cloud-based

systems and infrastructure already in use, a more flexible and scalable cloud management system is

required.

Cloud computing represents a shift of application architecture from traditional vertical

scalability to horizontal scalability. A typical cloud platform utilizes commodity hardware rather

than big machines such as mainframes. Vertical scaling is the process of beefing up a server by

adding more CPUs, more memory or faster disks. These machines are not just expensive, but are

also limited by their designed capacity. Horizontal scalability is a new way of scaling, since it is no

longer bound by the physical size of the server. It scales by adding more nodes. As commodity machines are less reliable than big machines, cloud platforms often use virtualization technology to pool hardware resources so the system can tolerate failures.

There are three primary models of cloud computing services: Infrastructure as a Service

(IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). (Hogan et al., 2011)

The IaaS provider delivers and manages infrastructure for end users, including storage,

network and computing resources. Some examples of these service providers are AWS, OpenStack, Rackspace and IBM.

The PaaS provider delivers a platform for the end user to create and deploy applications.

Some examples of these service providers are Amazon's AWS Elastic Beanstalk, Force.com and

Engine Yard.

SaaS is a service model in which many users make use of software hosted by the service provider and pay only for the time it is used. It can be better than buying hardware and software, since this model removes the burden of updating the software to the latest version and licensing it, and it is, of course, more economical. Some examples of these service

providers are Salesforce and Google Apps.


A cloud computing platform can come in four different deployment models known as public

clouds, private clouds, community clouds and hybrid clouds. (Hogan et al., 2011) Public cloud

providers offer computing resources to the general public. Resources can include hardware, a

platform for application development or whole applications. Amazon Elastic Compute Cloud (EC2),

Google AppEngine and Windows Azure Services Platform are among the most well-known public

clouds. Private clouds are built by a single company for their own applications. The private cloud is

typically on premise, but may be 100% owned by the business yet located at a third party hosting

facility. The initial cost of private clouds can be high, and they are suitable for large corporations that have concerns around data security and want to maintain control of their own

infrastructure. Community clouds are shared within a specific community who have similar

concerns. They are like private clouds, but built and serviced for a number of members within a

community. Hybrid clouds can be a composite of any two or more of the cloud models described

above. They can provide high availability backup for critical systems and meet different required

levels of security and management.

Cloud computing is seen by many as the next wave of information technology for

individuals, companies and governments. Many scientists highlighted the large potential benefits of

adoption, ranging from economic growth and potentially sizeable improvements in employment to

enabling innovation and collaboration. Virtualization, high-speed Internet and agile cloud

management are the fundamental technologies that drive cloud computing to grow.

With the way the world is embracing the cloud, it will become one of the revolutionary

technologies in the near future.

In general, a cloud computing platform is built on top of commodity hardware through

virtualization technology. The benefits of cloud computing include no up-front investment, lower operating costs, high scalability, easy access, and reduced business risks and maintenance expenses.

(Zhang, 2010)

Traditional RDBMS do not fit the new cloud environments well. The cloud-based data

management system should be able to run on commodity machines, as they form a typical cloud


environment. However, commodity hardware is prone to failure, so cloud-based databases have to

be fault tolerant. The cloud platform is highly scalable and elastic and the cloud-based databases are

expected to take advantage of the cloud platform and offer high speed data processing while being

easily scaled. Security and privacy concerns are always a hot topic regarding cloud computing, as

the data may now be stored on third-party premises on resources shared among different tenants.

Cloud-based databases have to address the same concerns, so that an enterprise can have sufficient

trust to embrace it.

2. ACID and CAP theorem

ACID is an acronym for Atomicity, Consistency, Isolation, and Durability. Those four

properties guarantee that database transactions are processed reliably and are the foundation of

traditional RDBMS. A transaction is a single logic operation on the data composed of a series of

read and write operations. RDBMS must ensure ACID properties in the face of concurrent access or

even system failure.

Atomicity states that each transaction can be regarded as atomic: either all or none of the transaction’s operations are performed. Even when a system crash occurs, the RDBMS can roll back an incomplete transaction to guarantee atomicity.

Consistency states that only valid data will be written to the database without breaking

certain pre-set constraints. If a transaction is executed that violates the database’s consistency rules

such as constraints, cascades and triggers for some reason, the entire transaction will be rolled back.

The consistency property ensures that every transaction will bring the database from one valid state

to another. The rule must be satisfied for all nodes, even in a cluster environment.

Isolation refers to the requirement that other sessions cannot see data that has yet to be committed by a transaction. Each transaction is unaware of other transactions executing concurrently in the system. Isolation means that if several transactions are executed concurrently, the results must be the same as if they were executed serially in some order.


Durability refers to the guarantee that once a transaction is committed, all its changes will be

persisted. The common technique is to write all transactions into a redo log that can be replayed to

restore the system state during a failure. A transaction can only be deemed committed after all the

changes are written into the log successfully.
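These properties can be observed on any single-node RDBMS. The short sketch below, using Python's standard sqlite3 module with an illustrative table and data of my own invention, shows atomicity and consistency together: when the second statement violates a constraint, rolling back also discards the first, already-executed statement.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
                "balance INTEGER CHECK (balance >= 0))")
    con.execute("INSERT INTO account VALUES (1, 100)")
    con.commit()

    try:
        # Transfer 150 from account 1 to a new account 2 in one transaction.
        con.execute("INSERT INTO account VALUES (2, 150)")  # succeeds
        con.execute("UPDATE account SET balance = balance - 150 WHERE id = 1")
        con.commit()                                        # never reached
    except sqlite3.IntegrityError:
        con.rollback()  # the CHECK constraint fired: undo the whole transaction

    print(con.execute("SELECT COUNT(*) FROM account").fetchone())
    # -> (1,): account 2 was never created, so atomicity is preserved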

The way ACID properties can be guaranteed across distributed participants in the presence of failures is through the so-called two-phase commit protocol. In the first phase, the transaction coordinator sends a commit request (prepare) to each participant; every participant must send back an acknowledgment vote and lock the related resources. The coordinator sends a commit command in the second phase after it receives a green light from all the participants. Any resources used by a participant are unavailable to other atomic transactions between the first phase and the second phase, and if the coordinator fails before delivering the second-phase message, those resources remain blocked until it recovers.
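The following Python sketch is a simplified in-memory illustration of the protocol just described (a real implementation adds write-ahead logging, timeouts and recovery): the coordinator commits only if every participant votes yes in the first phase, and otherwise aborts all prepared participants.

    class Participant:
        def __init__(self, name, can_commit=True):
            self.name = name
            self.can_commit = can_commit
            self.state = "init"

        def prepare(self):
            # Phase 1: vote, holding locks on the affected resources.
            self.state = "prepared" if self.can_commit else "aborted"
            return self.can_commit

        def commit(self):
            self.state = "committed"   # Phase 2: make the changes permanent

        def abort(self):
            self.state = "aborted"     # Phase 2: undo and release the locks

    def two_phase_commit(participants):
        votes = [p.prepare() for p in participants]   # phase 1: prepare
        if all(votes):
            for p in participants:                    # phase 2: commit
                p.commit()
            return "committed"
        for p in participants:                        # phase 2: abort
            if p.state == "prepared":
                p.abort()
        return "aborted"

    nodes = [Participant("A"), Participant("B", can_commit=False)]
    print(two_phase_commit(nodes))   # -> aborted: one "no" vote aborts all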

The CAP theorem was proposed in 2000 by Eric Brewer, a professor at the University of California, Berkeley. The CAP theorem basically states that a distributed system cannot guarantee Consistency, Availability and Partition tolerance simultaneously. Consistency in the CAP theorem refers to all nodes in the distributed system seeing the same data at the same time. It is not the same as the consistency in the ACID properties of RDBMS; it is actually closer to the Atomicity property of ACID. Availability refers to the services being accessible: the system as a whole continues to operate in spite of some nodes failing, and an available system can respond to all requests in a timely fashion. Partition tolerance means that the system can continue to operate despite arbitrary message loss or failure of part of the system. In a distributed environment, network communication is critical for multiple nodes to work as a single system; a partition-tolerant application allows the system to continue functioning unless a total network failure occurs. In the case of partitioning, where a distributed data store is split into two sets of nodes, both partitions have to deny all write requests in order to guarantee data consistency; otherwise, the data will likely become inconsistent if either partition remains available and allows updates. The first approach protects data


consistency, but sacrifices availability; the second approach maximizes availability, but cannot

guarantee data consistency.

The CAP theorem suggests that a system can have at most two of the three desirable properties. Scalability has become an essential facet of cloud computing, allowing the system to scale horizontally. In today’s business environment, more and more applications have to be highly available, and even long latency is not acceptable. Consistency becomes the only option to be

compromised and most NoSQL systems trade off consistency for the other two properties. The

common consistency model of NoSQL is known as BASE (Basically Available, Soft state, Eventual

consistency).

BASE is, in essence, eventual consistency. Its consistency is weaker than ACID’s, but it makes the system easier to scale and preserves availability. ACID and BASE adopt two opposite design

philosophies. While ACID is pessimistic and requires consistency at the end of every operation,

BASE is optimistic and acknowledges that data might become inconsistent, but it will become

consistent eventually.
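As a rough illustration of the BASE philosophy, the toy sketch below (my own simplification, not any particular product's replication algorithm) lets two replicas accept writes independently and later converge through an anti-entropy exchange that resolves conflicts with last-writer-wins timestamps.

    def anti_entropy(a, b):
        """Merge two replica maps of key -> (timestamp, value); the entry
        with the higher timestamp wins, so both replicas converge."""
        for key in set(a) | set(b):
            winner = max(a.get(key, (0, None)), b.get(key, (0, None)))
            a[key] = b[key] = winner

    r1 = {"x": (1, "old")}
    r2 = {"x": (1, "old")}
    r1["x"] = (2, "from-client-A")   # divergent writes during a partition
    r2["y"] = (3, "from-client-B")
    anti_entropy(r1, r2)             # replicas exchange state and converge
    print(r1 == r2, r1["x"])         # -> True (2, 'from-client-A')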

3. NoSQL and NewSQL

The NoSQL database is a whole new approach to managing very large sets of distributed data. The data created today is so large and complex that it is very difficult to process in a traditional RDBMS. The term “big data” is used to describe that massive volume of data and was named one of the most overused buzzwords of 2013 by FactSet Research Systems Inc. As a result,

cloud-based NoSQL databases emerged to tackle the big data.

Compared to traditional relational databases, a NoSQL database provides a mechanism for

storage and retrieval of data that uses looser consistency models. Motivations for that approach

include simplicity of design, horizontal scaling and finer control over availability. NoSQL databases

are finding significant and growing industry use in big data and real-time web applications. (NoSQL Wikipedia, 2013)

NoSQL is widely used for exploring big data. Many Internet giants found that the traditional RDBMS could not handle their unprecedented volume of data within an acceptable time, so they developed their own in-house solutions, such as Amazon Dynamo, Google Bigtable, LinkedIn Voldemort, Twitter FlockDB and Facebook Cassandra. Those databases are designed and optimized for their specific use cases.

The common genres of NoSQL DB include key-value, columnar, document-oriented, and

graph databases.

Key-value is a very simple structure, similar to a dictionary. All the values have to be looked

up by a unique key. Values are isolated and independent of each other, so relationships must be

handled in application logic. (Hecht, 2011) The key-value store is suited to simple operations.

Amazon Dynamo, LinkedIn Voldemort and Riak are all key-value stores.

Columnar databases (aka column-oriented) share some similarity with key-value databases

in that values are queried by matching keys. Unlike in a key-value database, the value in a column-oriented DB is composed of many columns, so all the related column values can be retrieved in one

lookup. Google Bigtable is the most representative columnar database. HBase is inspired by Google

Bigtable and is Bigtable’s open source implementation.

Compared to a column-oriented database, instead of grouping a number of columns, a document-oriented database packs a whole object as a single value. JavaScript Object Notation

(JSON) and Extensible Markup Language (XML) are common representations of document-

oriented objects. The most prominent document stores are CouchDB, MongoDB and Riak. (Hecht,

2011)
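To contrast the first three genres informally, the snippet below (with invented sample data) models the same customer record as an opaque key-value blob, as a set of named columns retrievable in one lookup, and as a nested document.

    import json

    # Key-value: the value is opaque to the store; the application parses it.
    kv_store = {"customer:42": json.dumps({"name": "Ann", "city": "Calgary"})}

    # Columnar: one lookup by row key returns all the related columns.
    column_store = {"customer:42": {"name": "Ann", "city": "Calgary",
                                    "plan": "prepaid"}}

    # Document: a whole nested object is stored and queried as one document.
    doc_store = {"customer:42": {"name": "Ann",
                                 "calls": [{"to": "555-0101", "secs": 80}]}}

    print(json.loads(kv_store["customer:42"])["name"])   # -> Ann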

Graph databases focus on the free interrelation of data; therefore they are very efficient in

traversing relationships between different entities. Existing graph databases are not very scalable

compared to other genres of NoSQL DBs, because it is expensive to traverse multiple distributed nodes to retrieve data. According to the DB-Engines Ranking, Neo4j is the most popular graph database in

use today.

The common technique used to partition and distribute data across nodes in the cloud is

Distributed Hash Tables (DHT). DHTs, which distribute lookup and storage over a number of peers

with no central coordination required, offer a scalable alternative to the central server lookup.
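A minimal sketch of the idea follows (a bare consistent-hashing ring, not any specific product's DHT): keys and node names are hashed onto the same ring, each key is owned by the first node at or after its position, and no central directory is consulted on lookup.

    import bisect
    import hashlib

    def ring_hash(name):
        # Stable 32-bit position on the ring for a key or a node name.
        return int(hashlib.md5(name.encode()).hexdigest(), 16) % (2 ** 32)

    class HashRing:
        def __init__(self, nodes):
            self.ring = sorted((ring_hash(n), n) for n in nodes)

        def lookup(self, key):
            """Return the node owning this key: the first node clockwise."""
            pos = ring_hash(key)
            i = bisect.bisect(self.ring, (pos,)) % len(self.ring)
            return self.ring[i][1]

    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.lookup("customer:42"))   # every peer computes the same owner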


As most NoSQL databases do not provide strong data consistency, which is critical for

many OLTP systems, NoSQL technology is complementary to RDBMS, not a replacement.

(Pokorny, 2013)

NewSQL was first defined by Matthew Aslett, an analyst with the 451 Group. NewSQL has

the scalability and flexibility promised by NoSQL while retaining support for SQL queries and/or

ACID (atomicity, consistency, isolation and durability). It claims to improve performance for

appropriate workloads to the extent that the advanced scalability promised by some NoSQL

databases becomes irrelevant (Aslett, 2011). Like NoSQL, NewSQL uses shared-nothing

architecture and can scale out to a large number of nodes without suffering a performance

bottleneck. Unlike NoSQL, which sacrifices consistency, NewSQL uses the relational data model

and primarily has an SQL interface. It supports the ACID properties for application transactions.

The traditional OLTP solution cannot handle massive OLTP workloads and is incapable of feeding real-time analytics data warehouses, as the traditional way of transferring data from OLTP to OLAP (OnLine Analytical Processing) typically takes tens of minutes to hours (Stonebraker,

2011).

NewSQL preserves SQL and data consistency while offering high performance and

scalability.

Many OLTP applications require a strong consistency property. NewSQL is an alternative that brings the relational data model into the NoSQL world, and it can make it easier for traditional applications to migrate to a highly scalable cloud environment.

4. RDBMS Cluster – Oracle RAC

A traditional RDBMS cluster adopts a shared-storage architecture. Oracle RAC is one of the most successful products for enterprise mission-critical applications. Here I use Oracle RAC as an example to illustrate the mechanism of a traditional RDBMS cluster.

Oracle Real Application Clusters (RAC) allows multiple instances to access a single

database. RAC evolved from Oracle Parallel Server (OPS), and Oracle introduced RAC in 2001 with its cache fusion technology in the Oracle 9i release. In 2004, Oracle extended RAC


with its own Clusterware server software, entering a field previously controlled by IBM HACMP (High Availability Cluster Multi-Processing), HP ServiceGuard and Sun Cluster. Clusterware allows multiple nodes to work together and be viewed as a single system. The Oracle RAC

infrastructure is a key component for implementing the Oracle enterprise grid computing

architecture. The database of the Oracle RAC system is stored in a shared storage; all nodes must be

able to access the shared storage simultaneously.

In order to maintain data consistency across multiple instances, Oracle utilizes its proprietary cache fusion technology to ensure data consistency across all the nodes. Cache fusion technology, also known as cache coherence, maintains the consistency of data blocks in the buffer caches within multiple instances. Any node in the RAC must acquire a cluster-wide data lock before a

block can be modified or read. Oracle Global Cache Service (GCS) is implemented to maintain the

buffer cache coherence through a high-speed interconnect.

Oracle built a robust mechanism to protect data integrity when facing system failure. Each

node has a daemon process, CSSD (Cluster Synchronization Services Daemon), to monitor the health

of the system and communicate with other nodes. When a severe issue is detected by the local

CSSD, the notification is broadcast to the other nodes. The RAC cluster can evict failed nodes from membership. When a network failure occurs, the CSSD in each node cannot communicate with the others. RAC uses voting disks to decide on node eviction. Voting disks are located on shared storage and should be visible to all nodes. They are used to monitor disk heartbeats: every node must update its block on the voting disk periodically. If the disk block is not updated within a short timeout period, then

that node is considered unhealthy. The cluster can evict the unhealthy node or reboot it, depending

upon the quorum of that node, to avoid a split-brain situation.
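The disk-heartbeat idea can be sketched as follows; this is a much-simplified illustration of the concept only, not Oracle's actual CSSD implementation: every node periodically writes a timestamp to its own block on the shared voting disk, and any node whose block has gone stale beyond the timeout becomes an eviction candidate.

    import time

    TIMEOUT = 30      # seconds without a heartbeat before a node is suspect
    voting_disk = {}  # node name -> last heartbeat time (on shared storage)

    def heartbeat(node):
        voting_disk[node] = time.time()   # each node updates its own block

    def eviction_candidates(now=None):
        if now is None:
            now = time.time()
        return [n for n, ts in voting_disk.items() if now - ts > TIMEOUT]

    heartbeat("rac1")
    voting_disk["rac2"] = time.time() - 60   # rac2 stopped heartbeating
    print(eviction_candidates())             # -> ['rac2']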

Oracle RAC can prevent the server from being a single point of failure and provide high

availability and scalability. It can combine smaller commodity servers into a cluster to create

scalable environments that support mission-critical business applications.

However, compared to cloud-based data management solutions, the pitfall of Oracle RAC is

obvious. First, RAC can only prevent server failure. The communication latency between RAC


nodes is critical, so it is rare to deploy RAC nodes in different locations. The storage is shared; if it

fails, all the nodes will go down. The high availability of RAC is limited compared to the cloud

databases.

Second, Oracle RAC cannot provide linear scale-out performance. The performance of a RAC system will degrade as more nodes are added to the cluster. Oracle implemented a

Global Resource Directory (GRD) to record information about how resources are used within a

cluster database. It also introduced cache fusion technology to speed the data block movement

around a cluster. When multiple nodes are trying to access the same data set, the global locking

mechanism is used to protect the consistency property. This causes contention and slows down performance. Oracle suggests partitioning the data between different applications. The shared

storage architecture prevents RAC from scaling further. Since all reads and writes go to the same storage, the cluster will sooner or later hit the storage speed limit as more nodes are added. RAC does offer

some scalability, but not much. Theoretically, Oracle RAC can have up to 255 nodes; however, it

has only been tested with up to 16 nodes. Actually, it is not common to see RAC on more than 6 or

8 nodes. Most instances of the RAC database are two-node. Oracle recognized the problem with

their RAC. In the latest Oracle 12c, Oracle introduced a new architecture called Flex Clusters,

which divide the nodes into two different types: Hub and Leaf. The Hub nodes are the same as the

traditional cluster nodes in its previous version. The Leaf nodes are connected only with the

corresponding attached Hub Nodes and they are not connected with each other. (Hussain et al.,

2013) The new architecture greatly reduces interconnect traffic and provides room to scale the cluster beyond what the traditional architecture allowed.

Finally, RAC implementation is expensive. As the speed of interconnection is critical for

RAC’s performance, Oracle suggests a high-end switch to reduce the latency of cache fusion. Storage performance is a common bottleneck when all the nodes access the same storage, and high-end storage is the usual remedy to improve overall RAC performance. Moreover, Oracle charges its license fee based on the number of nodes. Oracle packs its RAC software and hardware together

into one appliance named Oracle Exadata. The list price for the basic eighth rack two-node Exadata


starts from $220,000 plus $55,000 storage cost, with an additional support fee, based on Oracle’s July 17, 2014 price list (http://www.oracle.com/us/corporate/pricing/exadata-pricelist-070598.pdf).

The top configuration of Exadata is eight-node with a list price of around $1.5 million.

Despite the fact that Oracle RAC is an expensive solution with limited improvement on scalability and availability, it is still a good choice for existing Oracle customers who do not want to change their code and want to run their applications seamlessly.

5. Cloud Computing Database and Data Management

Many early papers have discussed cloud computing and the distributed cloud database. NIST

defines cloud computing as a model for enabling ubiquitous, convenient, on-demand network access

to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications,

and services) that can be rapidly provisioned and released with minimal management effort or

service provider interaction (Mell & Grance, 2011). Its key characteristics include: on-demand self-

service, broad network access, resource pooling, rapid elasticity and measured service.

A data management system can be migrated to the cloud platform and even become

Database-as-a-Service (DBaaS). A DBaaS must be able to scale out elastically and support databases and workloads of different sizes. Curino et al. believe database partitioning is essential to allow multi-node load balancing and scale-out, and that partitioning should be done in a way that minimizes the number of cross-node distributed transactions. They proposed a graph-based partitioning method

to spread large databases across many machines. (Curino, 2011)

Jaroslav Pokorny states that traditional applications use vertical scaling to support a larger system, while cloud computing uses horizontal scaling to scale out in a more effective and cheaper way. NoSQL databases in the cloud relax some of the usual database constraints to achieve horizontal scaling. The author discusses the ACID principle and the CAP theorem; traditional RDBMS always try to guarantee consistency at all costs. Distributed Hash Tables (DHT) are a common technique to partition the data so a DBMS can scale out in the cloud. The author selected two NoSQL DBs, Cassandra and Google DB, as typical examples and discussed their data models, how they are queried, and how the data is stored. Cassandra even allows the developer to choose the consistency


degree in their client application to enable real-time transaction processing in the cloud. Also, the

author lists 10 popular NoSQL DBs and makes a comparison. The author concluded that the current NoSQL solutions are good for unstructured data but, since they have difficulty providing even simple ACID properties, they are not a replacement for traditional RDBMSs. (Pokorny, 2013)

Rick Cattell walked through over twenty scalable data stores (Cattell, 2011) and compared

them from concurrency control, data storage, replication mechanism and transaction consistency

perspectives. He summarized six key features of a NoSQL DB. One of NoSQL’s key characteristics

is shared-nothing horizontal scaling architecture. Shared-nothing architecture allows NoSQL to

replicate and partition data over many servers, in order to scale and process data at much faster

speeds. It trades ACID constraints for performance and scalability. In his study, MySQL Cluster,

VoltDB, Clustrix, ScaleDB, ScaleBase and NuoDB are tagged as high consistency databases.

Not all NoSQL systems sacrifice consistency for high availability. Facebook Cassandra is a

key-value NoSQL store. It extends the concept of eventual consistency and implements a tunable consistency mechanism that allows a developer to decide how consistent the requested data should

be. The developer can make trade-offs between consistency and latency. Cassandra allows clients

to specify a desired consistency level, ZERO, ONE, QUORUM and ALL, with each read or write

operation. ZERO indicates no consistency guarantee but offers the lowest possible latency, and ALL

is the highest consistency but it sacrifices availability. QUORUM is a middle-ground, ensuring

strong consistency. Use of these consistency levels should be tuned in order to strike the appropriate

balance between consistency and latency for the application. In addition to reduced latency, lower

consistency requirements mean that read and write services remain more highly available in the

event of a network partition. (Featherston, 2010)
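The reason quorum reads and writes can ensure strong consistency is simple arithmetic: with N replicas, a write acknowledged by W replicas and a read contacting R replicas must overlap whenever R + W > N, so the read always sees the latest acknowledged write. The check below is a generic illustration of that rule, not Cassandra's code.

    def quorums_overlap(n, r, w):
        """True if every read set must intersect every write set."""
        return r + w > n

    N = 3
    quorum = N // 2 + 1                        # 2 of 3 replicas
    print(quorums_overlap(N, quorum, quorum))  # -> True: QUORUM/QUORUM is strong
    print(quorums_overlap(N, 1, 1))            # -> False: ONE/ONE can read stale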

Dr. Michael Stonebraker and Rick Cattell, leading researchers and technology entrepreneurs

in the database field, claim that one size does not fit all. (Stonebraker & Cattell, 2011) They studied

many scalable SQL and NoSQL data stores introduced over the past five years for consistency

guarantees, per-server performance, scalability for read versus write loads, automatic recovery from

failure of a server, programming convenience, and administrative simplicity. These data stores can


manage data volume that exceeds the capacity of single-server RDBMSs. These researchers think

that a developer needs to redesign the application for scalability, partition application data into

“shards,” avoid operations that span partitions, design for parallelism, and weigh requirements for

consistency guarantees. They presented ten rules that a system should follow in order to achieve

scalable performance. Some of the rules are: shared-nothing architecture, leverage of fast memory,

high availability, automatic recovery, avoidance of multi-node operations, not trying to build ACID

yourself, and recognition that per-node performance matters.

At the Symposium on Principles of Distributed Computing (PODC) 2000, Eric Brewer

presented his CAP theorem, also known as Brewer’s CAP Theorem. The three key properties of

distributed databases are tolerance for Network Partition, Consistency and Availability. Brewer

states that any shared-data system can only have, at most, two of those properties. Daniel Abadi of

Yale University says that consistency has become adjustable in modern distributed database systems. He criticized the CAP theorem for focusing on the trade-off between consistency and availability when a partition occurs while ignoring latency, and argued that a system must trade off latency and consistency even in the absence of partitions. He went further and proposed his PACELC

theorem: if there is a partition (P), how does the system trade off availability and consistency (A and

C); else (E), when the system is running normally in the absence of partitions, how does the system

trade off latency (L) and consistency (C)? (Abadi, 2012) Figure 2 below illustrates the PACELC trade-offs.

Figure 2: PACELC Tradeoffs for Distributed Data Services (Abadi, 2012)


Eric Brewer recognized the limitation of his CAP theorem in the evolving environment. The

CAP theorem asserts that any networked shared-data system can have only two of three desirable

properties. Twelve years after proposing the theorem, Brewer thinks that designers can optimize consistency and availability by explicitly handling partitions, thereby achieving some trade-off of all three. (Brewer, 2012)

The CAP theorem has become a principle for designing NoSQL databases. However, many transactional applications cannot give up a strong consistency requirement. Tim Kraska et al. found through their experiments that there is a non-trivial trade-off between cost, consistency and availability.

They advocate finding a balance between cost, consistency, and availability and present a number of

techniques that let the system dynamically adapt the consistency level by monitoring the data and/or

gathering temporal statistics of the data. (Kraska, 2009) They acknowledge that high consistency

implies high cost per transaction and reduced availability, and propose dividing the data into three categories: Category A (Serializable), Category B (Adaptive) and Category C (Session Consistency). They understand that the challenge in relaxed consistency models is to provide the best

possible cost/benefit ratio while still providing understandable behavior to the developer. The

different categories bear different consistency level guarantees. The system only pays the

consistency cost when it matters.
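In code, consistency rationing amounts to routing each operation through a guarantee chosen by its data category. The sketch below is my own schematic of that idea; the category names follow Kraska et al. (2009), while the policy table and function are hypothetical.

    # Hypothetical per-category policy in the spirit of Kraska et al. (2009).
    POLICY = {
        "A": "serializable",   # e.g., account balances
        "B": "adaptive",       # switches level based on observed conflicts
        "C": "session",        # e.g., call detail records
    }

    def consistency_for(category, conflict_rate=0.0, threshold=0.1):
        level = POLICY[category]
        if level == "adaptive":
            # Pay the consistency cost only when conflicts make it matter.
            return "serializable" if conflict_rate > threshold else "session"
        return level

    print(consistency_for("A"))                      # -> serializable
    print(consistency_for("B", conflict_rate=0.02))  # -> session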


CHAPTER III: METHODOLOGY

Cloud computing is becoming one of the hottest trends in the current IT industry. The traditional RDBMS is not designed for the cloud environment and does not fit its unique characteristics. Novel data management solutions have boomed in recent years, and it is not feasible to review them all. I employ a systematic approach to select the solutions studied in this essay.

The major selection criteria for review include:

- Solutions must have potential to scale in the cloud for OLTP transactions.

- Solutions must be able to provide strong consistency.

- Solutions should be either well-known or have great growth potential.

The Gartner Magic Quadrant for Operational Database Management Systems is used as an important reference to narrow down the candidate solutions. This essay focuses on solutions that already can, or have the potential to, scale in the cloud for OLTP transactions. The ability to provide strong consistency is a key factor. The Gartner Magic Quadrant is a research methodology and visualization tool for monitoring and evaluating the progress and position of companies in a specific, technology-based market. Gartner's reports are based on surveys of hundreds of customers, and Gartner evaluates each vendor's completeness of vision and ability to execute. Gartner also sets inclusion criteria for the companies listed in a Magic Quadrant: for the 2013 Magic Quadrant for Operational Database Management Systems, a company had to have over 100 customers across at least two major geographic regions and a minimum revenue of $20 million.

SAP, Oracle, NuoDB, Clustrix and VoltDB, whose products I review, are listed in the 2013 Gartner Magic Quadrant for Operational Database Management Systems. Megastore from Google is used only within Google and does not meet Gartner's inclusion criteria, but given Google's influence in the cloud and Megastore's unique design, it is selected as one of the DBs to be reviewed. ScaleDB is selected mostly because it is one of the MySQL-variant databases. MySQL Cluster adopts a shared-nothing architecture, whereas ScaleDB tries to solve the scalability issue with a seemingly backward shared-disk architecture.


The DB-Engines ranking is a good reference that helped me understand the popularity of the reviewed DBs. DB-Engines ranks databases by popularity, which is scored through a number of parameters such as the number of mentions on websites, general interest according to Google Trends, the number of job offers, etc. As I focus on cloud-based OLTP databases, a relatively new domain, it is understandable that most of the reviewed DBs have a relatively low ranking. But their recent growth trends can provide additional insight beyond Gartner's Magic Quadrant.

Below is a graph of the reviewed DBs' growth trends, drawn using the tool provided by DB-Engines. MySQL ranks very high on the popularity scale and was second in the overall ranking, but that entry is an umbrella for all MySQL products; the MySQL Cluster discussed in this essay should account for only a small portion of the MySQL market. As the graph shows, five of the six DBs are gaining popularity, the exception being MySQL. Again, Google's Megastore is not listed, as it is used only in internal projects.

Figure 3: Trend popularity data provided by DB-Engines.


The selected data management solutions are reviewed with a focus on their architecture and

the data consistency they can provide.


CHAPTER IV: CASE STUDIES

1. Megastore

Megastore is a storage system developed by Google and has been widely deployed within

Google for many years. It blends the scalability of a NoSQL data store with the convenience of a

traditional RDBMS in a novel way and provides both strong consistency guarantees and high

availability. (Baker et al., 2011)

The goal of Megastore is to overcome the weakness of common NoSQL solutions, which can only provide eventual consistency. Megastore is designed for highly available, scalable and low-latency data management. More importantly, it can provide full ACID semantics, which is critical for many applications. It is built upon Google's BigTable (Google's NoSQL key-value store) and adds ACID transactions, secondary indexes, queues and other primitives. It optimizes the Paxos algorithm to achieve low-latency replication across geographically distributed datacenters, in order to provide reasonable latencies for interactive applications in a highly distributed environment.

The data in Megastore is partitioned into so-called entity groups which are hierarchically

linked sets of entities. In one entity group, all the entities share the common prefix of the primary

key. Megastore tables are either entity group root tables or child tables. Each child table must

declare a single distinguished foreign key referencing a root table. (Baker et al., 2011) Each record

in the root table along with all the associated data in the child table is deemed as one single entity

group. Google created transaction logs for each single entity group which are replicated to other

copies, so it can ensure the ACID properties within an entity group. Megastore supports two-phase commit for operations across entity groups, but two-phase commit is expensive. Google recommends using asynchronous messaging whenever consistency across entity groups is not absolutely required.

The figure below illustrates the two-phase commit and asynchronous messaging for across-

entity groups operations.


Figure 4: Operations across entity groups (Baker et al., 2011)
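To make the entity-group idea concrete, here is a minimal Python sketch of my own (not Megastore's actual API) showing how root and child records share a key prefix, so that the rows of one entity group can be covered by a single replicated transaction log:

    # Hypothetical keys: every row of a group shares its root entity's key
    # prefix, so the group's rows are contiguous and one per-group replicated
    # transaction log covers all of them.
    rows = {
        ("user:1",): {"name": "alice"},             # root table record
        ("user:1", "photo:10"): {"tag": "beach"},   # child rows of user:1
        ("user:1", "photo:11"): {"tag": "city"},
        ("user:2",): {"name": "bob"},               # a separate entity group
    }

    def entity_group(root_key):
        # All rows whose key starts with the root key form one entity group.
        return {k: v for k, v in rows.items() if k[0] == root_key}

    # A transaction touching only entity_group("user:1") is covered by that
    # group's log and gets full ACID; spanning user:1 and user:2 would need
    # two-phase commit or asynchronous messaging instead.
    print(entity_group("user:1"))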

Megastore provides three types of read methods: current, snapshot, and inconsistent reads, to meet users' various requirements. As it is built on top of Google's BigTable NoSQL technology,

Megastore stores multiple values with different timestamps to achieve multiversion concurrency

control (MVCC). The current read fetches the latest committed version of data. The snapshot read

fetches the data as of a given timestamp. The inconsistent read is similar to the current read in that it tries to get the latest version of data, but unlike the current read, it does not check whether all the

committed transaction logs have been applied, so it does not guarantee that it can fetch the latest

version of data. The inconsistent read is faster than the other two types of reads and can be used

when stale or partially applicable data can be tolerated.
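The Python sketch below illustrates the difference between the three read methods, assuming a hypothetical MVCC store that keeps timestamped versions and tracks how far the transaction log has been applied locally:

    # Hypothetical MVCC store: each key maps to [(timestamp, value), ...].
    versions = {"k": [(1, "v1"), (2, "v2"), (3, "v3")]}
    last_applied_ts = 2   # log entries up to ts=2 are applied on this replica
    committed_ts = 3      # log entries up to ts=3 are committed cluster-wide

    def snapshot_read(key, ts):
        # Newest version at or before the requested timestamp.
        return max((v for v in versions[key] if v[0] <= ts), key=lambda v: v[0])

    def current_read(key):
        # Would first apply pending committed log entries (omitted here),
        # then read at the committed timestamp: always the latest data.
        return snapshot_read(key, committed_ts)

    def inconsistent_read(key):
        # Reads whatever is applied locally: fast, but possibly stale.
        return snapshot_read(key, last_applied_ts)

    print(inconsistent_read("k"))   # (2, 'v2'): stale, no catch-up performed
    print(current_read("k"))        # (3, 'v3')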

Megastore is built upon BigTable and adds traditional RDBMS primitives such as ACID,

indexes and schemas. It provides transparent replication and failover between data centers through

Paxos. It stores the transaction in the log first and applies it for each entity group. The ACID

properties are protected within an entity group for any transaction. For transactions across multiple

entity groups, users can choose light-weight asynchronous messaging or expensive two-phase

commit to suit their needs. Writes need a majority of replicas to be up in order to commit, so they may be more costly. Global consistency comes at a cost, and Megastore writes are relatively slow.


Megastore is scalable, partition tolerant and provides adjustable ACID control. It is suitable for a large-scale transaction processing system where data can be easily partitioned. It performs well largely for read-heavy applications with small updates.

2. SAP HANA

SAP HANA is an in-memory, columnar, relational database developed by the German software giant SAP AG. The name HANA is short for "High-Performance Analytic Appliance". SAP HANA employs a hybrid engine that can process both a column-based store and a row-based store. The row-based engine is suitable for OLTP and the column-based engine is suitable for OLAP. SAP merged the two engines into one DB in order to provide real-time analytics functionality. Both

engines share a common persistency layer, which provides data persistency consistent across both

engines (Krutov et al., 2014). It has a logging system that records all changes in in-memory pages.

The logger writes all the committed transactions on persistent storage. The key component that is

responsible for ensuring the transactional ACID properties is the Transaction Manager. It coordinates database

transactions, controls transactional isolation, and keeps track of running and closed transactions.

The transaction manager works with the persistency layer to achieve atomic and durable

transactions and provide consistent views of data.


Figure 5: The SAP HANA database architecture (Färber, 2012)

SAP HANA adopted the shared-nothing architecture and is designed for a highly distributed

system. The system can be split into three different functional components: Name server, Index

server and Statistics server. The Name server stores the topology information of the SAP HANA

system and provides directory service. The Index server contains the actual data stores and the

engines for processing the data. The Statistics server collects historical performance data for alerting

and performance analysis purposes. All the servers can be deployed on multiple nodes. One of the index servers becomes the master of the index servers' cluster, and the other index servers act as slaves. If application data is not partitioned properly, the master index server has to forward a request to all the slave index servers, resulting in a multi-hop process. That would increase the processing latency. When the data is well partitioned, the master index server only

needs to forward a request to the right index server hosting the corresponding data partition.
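The following Python sketch (my illustration, not HANA's internals) shows why partitioning matters here: a request carrying a usable partition key is routed to a single index server, while anything else must fan out to all of them:

    # Hypothetical partition map from partition key to index server.
    partition_map = {"EMEA": "index_server_1", "APAC": "index_server_2"}

    def route(request):
        key = request.get("partition_key")
        if key in partition_map:
            return [partition_map[key]]        # single hop to the right server
        return list(partition_map.values())    # fan-out: every server is asked

    print(route({"partition_key": "EMEA"}))          # ['index_server_1']
    print(route({"predicate": "salary > 100000"}))   # both index servers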

Since SAP HANA relies on MVCC as the underlying concurrency control mechanism, the

system provides distributed snapshot isolation and distributed locking to synchronize multiple


writers. This distributed locking scheme includes a global deadlock

detection mechanism, avoiding a centralized lock server as a potential single point of failure (Wada

et al., 2011).

SAP HANA provides true ACID guarantees. If the data can be partitioned properly, SAP HANA can scale to hundreds of nodes with minimal performance penalty. Its architecture is

good for performing both OLTP and OLAP in one place. It eliminates the traditional Extract,

Transform and Load (ETL) process that pulls the data from OLTP into OLAP for data mining and

reduces data redundancy.

SAP HANA can be scaled out for read-heavy data analytics processing. It suffers performance degradation if online transaction processing has to access remote data in a distributed environment.

3. VoltDB

VoltDB is an open source OLTP database that implements the design of the academic H-

Store project. It is an in-memory DB that adopts the shared-nothing architecture. It is designed to run on a cluster rather than a single big machine. Tables can be partitioned across multiple servers in the cluster. It can scale out on commodity machines and deliver ultra-high performance while

protecting the ACID properties.

VoltDB is designed to run much faster than traditional RDBMSs. Its main founder, Michael

Stonebraker, identified the four major overheads of traditional RDBMSs: buffer pool overhead,

multi-threading overhead, record-level locking and write-ahead logging. He proposed a novel design that

can get rid of all those four major overheads, in order to make the new DB run much faster than

traditional RDBMSs.

The interface that VoltDB exposes is the stored procedure. A transaction is a stored procedure

call and the transaction is executed sequentially and exclusively against its data. That is how

VoltDB protects the transaction atomicity and isolation properties. VoltDB is a pure SQL system. It

supports a large subset of SQL-92, including most SQL data types, along with filtering, joins and


aggregates (VoltDB whitepaper, 2010). VoltDB can accept ad-hoc queries, but it simply compiles the query on the fly into a temporary stored procedure and invokes it in the same way (Stonebraker, 2011).
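The Python sketch below illustrates this execution model in deliberately simplified form: one queue and one worker per partition, so each "stored procedure" runs to completion alone and needs no locks; the data and procedure are invented for illustration:

    from queue import Queue

    partition_data = {"alice": 100}        # one partition's private data
    work_queue = Queue()                   # that partition's transaction queue

    def debit(amount):                     # a "stored procedure"
        partition_data["alice"] -= amount  # runs start to finish, alone

    # Clients only enqueue work; the single worker below executes serially,
    # so transactions are atomic and isolated without locks or latches.
    work_queue.put(lambda: debit(10))
    work_queue.put(lambda: debit(5))

    while not work_queue.empty():
        txn = work_queue.get()
        txn()

    print(partition_data)                  # {'alice': 85}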

VoltDB achieves high availability through automatic intra-cluster and inter-cluster

replication. Data is synchronously committed to replica partitions within the cluster before transactions commit. This provides durability against single-node failures. For inter-cluster replication over a WAN, transactions are asynchronously committed to a replica cluster. VoltDB implements a concept called command logging for transaction-level durability (VoltDB whitepaper,

2010). When a disaster occurs, VoltDB simply replays the logged commands to restore the data for

recovery.

For now, VoltDB only supports hash partitioning. When a new node is added to the cluster, VoltDB has to redistribute the data across the servers, and it cannot provide service until the redistribution is complete. Also, hash partitioning is not good for range searches. VoltDB is best suited for applications with a high volume of small transactions.
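A minimal Python sketch of why hash partitioning behaves this way: point lookups map to exactly one node, but hash order is unrelated to key order, so a range search must query every node:

    import zlib

    NODES = ["node0", "node1", "node2"]

    def node_for(key):
        # Stable hash routing: each key maps to exactly one node.
        return NODES[zlib.crc32(key.encode()) % len(NODES)]

    print(node_for("order:42"))   # a point lookup touches one node
    # A range scan, e.g. keys "order:100".."order:199", cannot be narrowed:
    # hash order is unrelated to key order, so all nodes must be queried.
    print({node_for(f"order:{i}") for i in range(100, 200)})  # likely all three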

4. MySQL Cluster

MySQL Cluster enhances the standard open source MySQL with an in-memory clustered

storage engine known as Network DataBase engine (NDB). It employs shared-nothing clustering

architecture and an automatic sharding technique. These enable much greater scalability for MySQL Cluster than for a traditional RDBMS.

Unlike Oracle RAC, in which all the nodes are treated equally, nodes in a MySQL Cluster are assigned one of three roles: Storage Node (SN), Management Server Node

(MGM) and MySQL Server Node. The SN stores all the data and replicates the data between nodes

to ensure high availability. It also handles all the database transactions. The MGM handles the

system configuration and is only used at start-up and system re-configuration (Ronstrom & Thalmann, 2004).

The MySQL server node sits in between the application and the SN. It knows how the data is

partitioned in the SN and acts as a broker that takes the application SQL request and sends it to the

appropriate SN.


Figure 6: MySQL Node Architecture (Ronstrom & Thalmann, 2004)

MySQL Cluster uses the two-phase commit protocol to guarantee data consistency. All the changes of a transaction are replicated synchronously to the nodes that hold other copies of the data before the transaction commits. MySQL Cluster supports the read-committed transaction isolation level and will not read uncommitted data of other transactions.
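For illustration, here is a minimal Python sketch of the generic two-phase commit protocol (not NDB's actual implementation): the coordinator asks every replica to prepare, and commits only if all of them vote yes:

    class Replica:
        def __init__(self):
            self.staged, self.data = None, {}
        def prepare(self, change):      # phase 1: stage the change and vote
            self.staged = change
            return True                 # vote "yes" (no conflicts modeled)
        def commit(self):               # phase 2: make the change durable
            self.data.update(self.staged)
            self.staged = None
        def abort(self):
            self.staged = None

    def two_phase_commit(replicas, change):
        if all(r.prepare(change) for r in replicas):
            for r in replicas:
                r.commit()
            return "committed"
        for r in replicas:              # any "no" vote aborts everywhere
            r.abort()
        return "aborted"

    print(two_phase_commit([Replica(), Replica()], {"x": 1}))  # committed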

5. ScaleDB

ScaleDB is a pluggable storage engine that transforms MySQL into a cluster of database servers. It uses a shared-disk architecture similar to that of Oracle RAC.

ScaleDB is composed of three different types of nodes: Database Node, Cluster Manager

Node and ScaleDB Storage Node (Shadmon, 2009). Its design is almost the same as MySQL

Cluster except for the shared-disk architecture. All the storage nodes form a global cache and


persistency layer. The global cache manages caching of shared data and guarantees cache

coherency.

Figure 7: ScaleDB Cluster with Mirrored Storage (Shadmon, 2009)

ScaleDB uses a locking mechanism to guarantee ACID properties. The local lock manager

maintains locks at the node level, while the distributed lock manager manages cluster level locks.
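A minimal Python sketch of the cluster-level half of such a scheme (my illustration, not ScaleDB's actual engine): the distributed lock manager records which node owns each lock, so a second node's request is refused until the owner releases it:

    cluster_owner = {}   # resource -> node currently holding the cluster lock

    def acquire_cluster_lock(resource, node):
        owner = cluster_owner.get(resource)
        if owner is None or owner == node:
            cluster_owner[resource] = node   # grant the lock to this node
            return True
        return False                         # another node holds it: wait

    def release_cluster_lock(resource, node):
        if cluster_owner.get(resource) == node:
            del cluster_owner[resource]

    print(acquire_cluster_lock("page:7", "nodeA"))   # True: granted
    print(acquire_cluster_lock("page:7", "nodeB"))   # False: nodeA owns it
    release_cluster_lock("page:7", "nodeA")
    print(acquire_cluster_lock("page:7", "nodeB"))   # True: now granted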

ScaleDB can offer high availability. The cluster manager node detects the failure of other

nodes and takes action to resolve the issue. The standby cluster manager will kick in if the master

cluster manager fails.

The shared-disk architecture has experienced scalability challenges and performance

bottlenecks. That explains why most of the new databases choose shared-nothing architecture. But

network connection speeds have increased drastically in recent years and the cost of high-

performance storage has been significantly reduced, especially in the cloud. ScaleDB is suitable for

the cloud environment and delivers high availability. There is no single point of failure in ScaleDB.

As all the storage nodes need to share their storage with other nodes, its scalability is limited.


6. NuoDB

NuoDB is a distributed database designed with global application deployment challenges in

mind. It is a true SQL service, with all the properties of ACID transactions, standard SQL language

support, and relational logic (NuoDB, 2013).

NuoDB is composed of three layers: an administrative tier, a transactional tier and a storage

tier. The transaction layer maintains atomicity, consistency and isolation, while the storage layer is

responsible for durability. These layers can be set up on one server, but they are meant to run on different servers by design. The node for the transaction layer is known as the transaction engine (TE) and the

node for the storage layer is known as the storage management node (SM). Unlike a typical hub-

and-spoke design, NuoDB uses peer-to-peer service to move data between its nodes. The

administrator node is responsible for monitoring, managing and automating database activity. As all

processes are peers, all the hosts are the same and there is no single point of failure. The peer-to-

peer communication is formed through a local management agent that installs on all the hosts. The

administrator node has a global view of all the nodes.

NuoDB introduced a new concept called Atoms. Atoms are chunks of data used for

simplifying internal communication and caching. A NuoDB database is simply a collection of Atoms that can be easily stored in key-value stores or other kinds of storage.

NuoDB uses MVCC to avoid conflicts between concurrent reads and writes. For write-write conflicts, NuoDB picks a host as the chairman of the object to act as a tie-breaker. Only a TE that

caches the object can be selected as chairman, so most mediation is done locally. NuoDB sends

asynchronous update messages to all peers that have a copy of the object when a TE commits a

transaction.
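A rough Python sketch of chairman-style mediation, with invented names and a deliberately simplified protocol: the chairman grants a competing write to the first requester that saw the current version, and rejects the rest:

    class Chairman:
        """Mediates write-write conflicts on one object."""
        def __init__(self, current_version):
            self.current_version = current_version
        def request_write(self, te, version_seen):
            if version_seen != self.current_version:
                return "reject"           # based on a stale version: retry
            self.current_version += 1     # first valid requester wins
            return "grant"

    chair = Chairman(current_version=4)
    print(chair.request_write("TE1", version_seen=4))  # grant: version -> 5
    print(chair.request_write("TE2", version_seen=4))  # reject: TE2 must retry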


Figure 8: NuoDB architecture (NuoDB, 2013)

NuoDB is a high performance and resilient database solution that supports SQL and ACID

properties. The peer-to-peer architecture makes NuoDB very easy to scale-out and achieve high

availability. The unique Atom design allows different data stores to be selected for storage, such as

a local file system, Amazon S3 or a Hadoop Distributed Filesystem.

Every SM node of NuoDB holds a complete copy of the database. NuoDB relies on the underlying storage to provide the data distribution. In order to survive a single SM failure, NuoDB allows installation of multiple SM nodes, but each additional SM node means additional storage for

another copy of the dataset. If there is a limit on the underlying storage, NuoDB cannot go beyond

that limit, since it cannot combine local disks into a big pool.

For consistency during write-write conflict, NuoDB uses a tie-breaker mechanism that is

similar to the locking system of traditional RDBMSs. It might become a bottleneck in a highly concurrent OLTP system. NuoDB is best suited for a hybrid workload that combines

both OLTP and OLAP requirements.

NuoDB adopted a tunable commit protocol that allows the user to trade off durability for performance.

NuoDB is well suited for applications that need transactional consistency, highly scalable

performance and simple operation.


7. ClustrixDB

ClustrixDB is designed from the ground up for scale-out in the cloud. It adopts fully-

distributed, shared-nothing architecture and can easily be scaled out by adding more nodes. It fully supports SQL and is fully ACID compliant. It has the ability to handle massive ACID

transactions.

Just like many other NewSQL DBs, ClustrixDB stores multi-version rows with timestamps,

so there is no conflict between providing consistent reads and updating a record. The older versions are

garbage-collected when no longer used. For concurrent updates, ClustrixDB uses two-phase locking

(2PL) to order updates. Writers always read the latest committed information and acquire locks

before making any changes (Clustrix, 2014).
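The following Python sketch shows this general pattern (MVCC reads plus locked writes) in simplified form; it is a generic illustration, not Clustrix's actual code:

    row_versions = [(1, "v1"), (2, "v2")]   # (commit_timestamp, value)
    write_locked = False

    def read(ts):
        # Lock-free MVCC read: newest committed version visible at ts.
        return max((v for v in row_versions if v[0] <= ts), key=lambda v: v[0])

    def write(new_value):
        global write_locked
        assert not write_locked, "a second writer must wait for the lock"
        write_locked = True                          # 2PL growing phase
        next_ts = row_versions[-1][0] + 1
        row_versions.append((next_ts, new_value))    # install a new version
        write_locked = False                         # release at commit

    write("v3")
    print(read(ts=2))   # (2, 'v2'): readers are unaffected by the new version
    print(read(ts=3))   # (3, 'v3')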

ClustrixDB distributes multiple copies of data intelligently across nodes and can parallelize

the query automatically by using multiple nodes and multiple cores on each node, so it can

accelerate the queries without sharding the database.

Figure 9: Clustrix Distributed Query Processing (Clustrix, 2014)

ClustrixDB is suitable for applications that run both OLTP and OLAP at the same time.

ClustrixDB claims it can scale linearly to hundreds of cores since it employs shared-nothing

architecture, intelligent data distribution, and distributed query processing.


CHAPTER V: CHALLENGE, OPPORTUNITY AND TREND

In this essay, I have reviewed eight databases: Google Megastore, SAP HANA, VoltDB, Oracle MySQL Cluster, ScaleDB, NuoDB and ClustrixDB, along with Oracle RAC. In the latest

2013 Gartner Magic Quadrant for Operational Database Management Systems, Oracle and SAP

were recognized as market leaders, while VoltDB, NuoDB and Clustrix were listed as niche players

that have potential to challenge the leaders. Google’s Megastore is only used internally and cannot

be evaluated by third parties.

Most current databases provide multi-version concurrency control (MVCC) for read

consistency. But the way the DBs I reviewed implement MVCC is different from that of traditional

RDBMSs. Traditional RDBMSs such as Oracle RAC only keep the latest version of the data in the

database and try to reconstruct older versions of data dynamically as required. The new approach for

MVCC is to store multiple versions of data in the database and garbage-collect records when they

are no longer needed. All the reviewed DBs except VoltDB choose the new approach for MVCC. VoltDB

tries to serialize all the transactions, so there is no concurrency issue at all. It is obvious that storing

multiple versions of records requires more disk space, but it can avoid the expensive operation of

reconstructing older versions of data. Storing multiple versions of records can achieve better

performance and scalability and opens a door toward eventual consistency.
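A minimal Python sketch of the store-then-garbage-collect approach (a generic illustration with invented data): versions older than the oldest active reader's snapshot can never be read again and can be reclaimed:

    versions = [(1, "v1"), (2, "v2"), (3, "v3")]   # (commit_timestamp, value)
    oldest_active_snapshot = 2                     # oldest reader still active

    def garbage_collect(versions, horizon):
        # Keep the newest version at or below the horizon (some reader may
        # still need it) plus everything newer; older versions are dead.
        keep_from = max(i for i, v in enumerate(versions) if v[0] <= horizon)
        return versions[keep_from:]

    print(garbage_collect(versions, oldest_active_snapshot))
    # [(2, 'v2'), (3, 'v3')]: version 1 is invisible to every reader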

Application consistency requirements might vary for different types of applications.

Applications such as social networking and commenting can tolerate a different view of data for a

short period of time. For mission critical applications, such as bank account management, strong

consistency has to be guaranteed.

Among the reviewed DBs, only Megastore allows the developer to choose among three different read consistency levels. SAP HANA, MySQL Cluster, NuoDB and ScaleDB all enforce ACID

compliance. VoltDB sequences all operations. There is no concurrency issue for VoltDB, so no

need to worry about data consistency.

In a distributed cloud environment, providing consistency is much more complex than in a

traditional database. It requires a great deal of communication involving remote locks, which force


systems to wait on each other before proceeding to mutate shared data. It is common to employ a

data sharding technique to avoid consistency issues. The inevitable communication delays and reduced reliability between networks increase the complexity of guaranteeing complete consistency

in a cloud environment. Megastore uses Paxos to support two-phase commit for operations across entity groups. Google recognized the overhead of two-phase commit, so it recommends using asynchronous messaging whenever consistency across entity groups is not absolutely required.

The common strategy is to employ two-phase commit, but its complexity and costs reduce the

system scalability and performance even when using the best practices and the most advanced

technologies of the time.

There is always a cost attached to strong consistency. For many enterprise applications, strong consistency is a must in some use cases. But the consistency requirement in many other use

cases can be relaxed for the benefit of availability and scalability. For example, the account balance

of prepaid cellular service is critical and has to be tracked in real time in order to authorize a subscriber's call request. But the account balance is not that important for the majority

of postpaid accounts since the carrier collects payment at the end of the customer’s bill cycle. As

long as the account balance of a postpaid account becomes consistent before the bill cycle ends, it is good

enough for the carrier to bill the customer correctly. Even for a prepaid account, not all the changes

within its transactions demand strong consistency. Account balance must maintain consistency all

the time, but the call detail does not have to be online right after the customer hangs up the phone.

The customer is unlikely to notice if the posting of call details is delayed by seconds or even minutes. From a business perspective, some data in some transactions require absolute consistency

while others do not. Even different customers may have different consistency needs. If a database

allows its consistency to be customized and leaves the business analyst with the job of choosing the

desired consistency level for different transactions or users, then the system does not have to pay the

consistency costs for transactions where consistency is not really needed. Google’s Megastore

allows prioritization of consistency over performance. It provides three levels of read consistency:

current, snapshot, and inconsistent reads. If the database can let the user customize the consistency


model for the content of data on top of session level consistency control, it should be able to reduce

the cost of maintaining strong consistency in transactions. For instance, a money transfer in a

bank must be executed in the strong consistency model. If the data management system can impose

strong consistency on the account balance information while applying eventual consistency on the

auditing and logging information, it might still meet the banking system's requirements. But it significantly reduces the amount of data that must be synchronized, and thus the cost of maintaining the strongly consistent transaction.
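A Python sketch of what such per-field consistency control could look like, with an entirely hypothetical policy table and write path: the balance takes the synchronous path, while audit records are queued for eventually consistent replication:

    POLICY = {"balance": "strong", "audit_log": "eventual"}
    async_queue = []                       # drained by a background replicator

    def write(field, value, replicas):
        if POLICY[field] == "strong":
            for r in replicas:             # synchronous: every replica before ack
                r[field] = value
        else:
            async_queue.append((field, value))   # ack now, replicate later

    replicas = [{}, {}]
    write("balance", 90, replicas)         # strongly consistent everywhere
    write("audit_log", "transfer -10", replicas)
    print(replicas)      # [{'balance': 90}, {'balance': 90}]
    print(async_queue)   # [('audit_log', 'transfer -10')]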

Compared to the traditional major RDBMS products, Oracle, Microsoft SQL Server and

IBM DB2 rank first, third and fifth respectively in the DB-Engines popularity ranking. Cloud

DBs have a relatively lower ranking of popularity. Table 1 compares the seven reviewed DBs in

terms of popularity, storage architecture and consistency. The popularity is based on the DB-Engines ranking of relational DBMSs for July 2014. The latest result can be found at http://db-engines.com/en/ranking/relational+dbms

MySQL is ranked as the second-most popular database after Oracle, thanks to great adoption

of the regular MySQL database. The MySQL Cluster I discuss in this essay is quite different from the single-machine MySQL database, but DB-Engines ranks the whole MySQL family as a single product. The number two ranking does not reflect the popularity of MySQL Cluster at all.

Database        Popularity    Storage Architecture    Consistency

Megastore       Google only   Shared-nothing          Adjustable ACID

SAP HANA        13            Shared-nothing          ACID

VoltDB          40            Shared-nothing          ACID

MySQL Cluster   2             Shared-nothing          ACID

ScaleDB         78            Shared-disk             ACID

NuoDB           43            Shared-nothing          Tunable ACID

ClustrixDB      52            Shared-nothing          ACID

Table 1: Reviewed DB comparison


Speed, scalability and availability are all interesting areas for comparison, but each DB

might vary in performance in different use cases; there are no standard criteria to rank these

systems. Most benchmarks are conducted internally and in the vendor's favoured use-case environments. VoltDB is touted as a super-fast OLTP engine. It targets high-velocity data to provide real-time

analytics on it. SAP HANA is well-known for its ability to combine OLTP and OLAP engines in

one. MySQL Cluster is a pioneer of shared-nothing architecture and has the ability to

scale to tens of nodes. ScaleDB is a pluggable engine for MySQL and provides a cloud-based

solution for scalability. NuoDB claims to provide high scalability through its three-tier structure

combining features of administrative, transactional and storage layers. ClustrixDB is designed from

the ground up for scale-out in the cloud and can plausibly deliver great scalability there.

According to the InformationWeek 2014 State of Database Technology Survey of 955

business technology professionals, the traditional RDBMS is still widely used today, but the cloud

DBs are becoming attractive. MongoDB is ranked tenth in use today. The popularity trend depicted

in Chapter III shows that five of the reviewed cloud DBs gained popularity, while the other two (Megastore and MySQL Cluster) have no data to measure.

Traditional RDBMSs are suitable for processing structured data in a transactional fashion,

but cannot handle massive data with decent performance. NoSQL can scale easily to process high

volumes of all kinds of data, but it cannot guarantee strong consistency. Ensuring strong consistency

in the cloud environment is expensive and comes at the cost of performance and availability. It is a

challenge for the data management solution providers to come up with a solution that can overcome

the limitations of both RDBMSs and NoSQL DBs. NewSQL is emerging to tackle the issue.


CHAPTER VI: CONCLUSIONS AND RECOMMENDATIONS

Cloud computing offers significant benefits and presents great opportunities for the

information technology industry. The traditional RDBMS is designed to run on a single big

machine. Its cluster solutions are based on shared-disk architecture and group a number of servers

into one cluster. It can protect the system when any part of the server experiences a hardware issue,

but it cannot scale linearly, since the shared storage is a single point of failure and a performance bottleneck. These RDBMSs are not natively suited to a cloud environment. Many NoSQL solutions

have been invented by Internet companies to tackle their scalability and performance challenges in

the cloud. They use partitioning or a sharding technique to distribute data into multiple nodes so

they can better scale with near-linear performance. But they have to relax the notion of strong

consistency for the benefit of improved scalability and only offer eventual consistency. For many

web applications, eventual consistency is good enough to satisfy their requirements. For instance, a

social website or job searching website, in order to maintain high availability, avoids the overhead

of synchronizing the data between distant nodes. It is common to keep multiple replicas in the cloud.

The replica approach can be master/replica, master/master, or distributed peers. Most NoSQLs do

not guarantee the strong data consistency that traditional RDBMSs offer. Strong data consistency is

still critical for many enterprise OLTP applications, so there is tremendous opportunity for new

vendors to introduce novel solutions that address these enterprises' concerns. NewSQL systems are emerging, trying to combine the advantages of both NoSQL and the traditional RDBMS.

In this essay, I have surveyed cloud computing technology, cloud NoSQL, NewSQL, data

consistency theory, the traditional RDBMS cluster Oracle RAC and seven selected data management

solutions. All seven database solutions recognized the importance of strong data consistency and

tried to solve the issue through their unique architectures.

This essay explored the common architecture of cloud-based OLTP data management

solutions, such as shared-nothing storage, MVCC, and partitioning or sharding to avoid transactions across distant nodes. In addition, this work has identified the challenges for future development in the


domain, especially fine-grained data consistency control. I hope my work will provide a better

understanding of data consistency challenges in the cloud environment and will assist system

architects in making better decisions when they are thinking of migrating their application to the

cloud. It can also be used as material for university students who have an interest in the data

management domain.


References

Abadi, D. J. (2012). Consistency tradeoffs in modern distributed database system design. IEEE Computer, 45(2), 37.

Aslett, M. (2011). How will the database incumbents respond to NoSQL and NewSQL? San Francisco: The 451 Group, 1-5.

Baker, J., Bond, C., Corbett, J. C., Furman, J. J., Khorlin, A., Larson, J., & Yushprakh, V. (2011,

January). Megastore: Providing Scalable, Highly Available Storage for Interactive Services.

In CIDR (Vol. 11, pp. 223-234).

Brewer, E. A. (2000, July). Towards robust distributed systems. In PODC (p. 7).

Brewer, E. (2012). CAP twelve years later: How the "rules" have changed. Computer, 45(2), 23-29.

Cattell, R. (2011). Scalable SQL and NoSQL data stores. ACM SIGMOD Record, 39(4), 12-27.

Curino, C., Jones, E. P., Popa, R. A., Malviya, N., Wu, E., Madden, S., ... & Zeldovich, N. (2011).

Relational cloud: A database-as-a-service for the cloud.

Färber, F., Cha, S. K., Primsch, J., Bornhövd, C., Sigg, S., & Lehner, W. (2012). SAP HANA

database: data management for modern business applications. ACM Sigmod Record, 40(4),

45-51.

Featherston, D. (2010). Cassandra: Principles and application. University of Illinois, 7, 28.

Feinberg, D., Adrian, M., & Heudecker, N. (2013). Magic Quadrant for Operational Database Management Systems. Gartner Research Note.

Hecht, R., & Jablonski, S. (2011). NoSQL Evaluation. In International Conference on Cloud and

Service Computing.


Hogan, M., Liu, F., Sokol, A., & Tong, J. (2011). NIST cloud computing standards roadmap. NIST

Special Publication, 35.

Clustrix. (2014). Clustrix Documentation. [Online] Available at: http://docs.clustrix.com/display/CLXDOC/Home

Hussain, S. J., Farooq, T., Shamsudeen, R., & Yu, K. (2013). New Features in RAC 12c. In Expert

Oracle RAC 12c (pp. 97-122). Apress.

Kraska, T., Hentschel, M., Alonso, G., & Kossmann, D. (2009). Consistency Rationing in the

Cloud: Pay only when it matters. Proceedings of the VLDB Endowment, 2(1), 253-264.

Kossmann, D., Kraska, T., & Loesing, S. (2010, June). An evaluation of alternative architectures for

transaction processing in the cloud. In Proceedings of the 2010 ACM SIGMOD International

Conference on Management of data (pp. 579-590). ACM.

Krutov, I., Vey, G., & Bachmaier, M. (2014). In-memory Computing with SAP HANA on IBM

eX5 Systems. IBM Redbooks.

Kumar, R., Gupta, N., Maharwal, H., Charu, S., & Yadav, K. (2014). Critical Analysis of Database

Management Using NewSQL.

Mell, P., & Grance, T. (2009). The NIST definition of cloud computing. National Institute of

Standards and Technology, 53(6), 50.

NoSQL. (2013). In Wikipedia, the free encyclopedia. [Online] Available at: http://en.wikipedia.org/wiki/NoSQL

NuoDB, Inc. (2013). NuoDB: A Technical Whitepaper.

Pokorny, J. (2013). NoSQL databases: a step to database scalability in web environment.

International Journal of Web Information Systems, 9(1), 69-82.


Ronstrom, M., & Thalmann, L. (2004). MySQL cluster architecture overview. MySQL Technical

White Paper.

Shadmon, M. (2009). The ScaleDB Storage Engine.

Solid IT, “DB-Engines Ranking of database management systems” [Online]. Available: http://db-

engines.com/en/ranking

Stonebraker, M. (2011). NewSQL: An Alternative to NoSQL and Old SQL for New OLTP Apps.

Stonebraker, M., & Cattell, R. (2011). 10 rules for scalable performance in simple operation

datastores. Communications of the ACM, 54(6), 72-80.

Stonebraker, M., & Weisberg, A. (2013). The VoltDB Main Memory DBMS. IEEE Data Eng. Bull.,

36(2), 21-27.

Terry, D. (2013). Replicated data consistency explained through baseball. Communications of the

ACM, 56(12), 82-89.

VoltDB, LLC. (2010). VoltDB Technical Overview. Whitepaper.

Wada, H., Fekete, A., Zhao, L., Lee, K., & Liu, A. (2011, January). Data Consistency Properties

and the Trade-offs in Commercial Cloud Storage: the Consumers' Perspective. In CIDR

(Vol. 11, pp. 134-143).

Zhang, Q., Cheng, L., & Boutaba, R. (2010). Cloud computing: state-of-the-art and research

challenges. Journal of internet services and applications, 1(1), 7-18.