“ALEXANDRU IOAN CUZA” UNIVERSITY OF IAȘI
FACULTY OF COMPUTER SCIENCE
DISSERTATION THESIS
In-Memory Relational Database
Systems
proposed by
-- Róbert Kristó--
Session: July, 2016
Scientific coordinator
Asist. Dr. Vasile Alaiba
STATEMENT REGARDING ORIGINALITY AND COPYRIGHT
I hereby declare that the dissertation thesis with the title In-Memory Relational
Database Systems is written by me and was not submitted to another university or higher
education institution in the country or abroad. I also hereby declare that all the sources,
including the ones retrieved online, are appropriately cited in this paper, respecting the
rules for avoiding copyright infringement:
- all exactly reproduced excerpts, including translations from other languages, are
written using quotes and hold an accurate reference to their source;
- rephrased texts written by other authors hold an accurate reference to the
source;
- the source code, images, etc. taken from open-source projects or other sources
are used by respecting copyright ownership and hold accurate references;
- summarizing ideas of other authors hold an accurate reference to the original
text.
Iași, date
Graduate: Róbert Kristó
__________________________________________
STATEMENT OF CONSENT
I hereby declare that I agree that the dissertation thesis titled In-Memory
Relational Database Systems, the application source code and other content (graphics,
multimedia, test data, etc.) presented in this paper can be used by the Faculty of
Computer Science.
I also agree that the Faculty of Computer Science, “Alexandru Ioan Cuza”
University of Iași can use, modify, reproduce and distribute, for non-commercial
purposes, the application, executable and source code created by me for the current thesis.
Iași, date
Graduate: Róbert Kristó
__________________________________________
Contents
1. Abstract
2. Introduction
2.1. The Computation Problem
2.2. RDBMS Terminology
2.3. Relational Database Management Systems
2.4. In-Memory Relational Database Management Systems
2.5. The Durability of the In-Memory RDBMS
3. OLTP vs OLAP
3.1. Online Transaction Processing
3.2. Online Analytical Processing
3.3. Hybrid Transactional/Analytical Processing
4. Row Store vs Column Store
4.1. Row Store
4.2. Column Store
4.3. Row Store vs Column Store
4.4. Row Stores and Column Stores in RDBMS
5. RDBMS Indexes
5.1. B+ Tree Indexes
5.2. Bitmap Indexes
5.3. Skip Lists
5.4. Hash Indexes
5.5. Column Store Indexes
6. High Level Architecture of In-Memory RDBMS
6.1. MemSQL
6.2. VoltDB
6.3. Oracle
7. Experimental Test Case
7.1. Test Case Description
7.2. Test Case Setup
7.3. Running the Queries
7.4. Test Case Comparison
8. Conclusions
8.1. MemSQL Results
8.2. VoltDB Results
8.3. Oracle Results
8.4. Final Conclusion
8.5. Further Research
9. Bibliography
1. Abstract
This thesis aims to demonstrate the advantages of using In-Memory relational database
management systems in comparison with, or alongside, classical relational database
management systems which persist their information on disk.
It contains a short introduction to In-Memory databases, followed by some of the
specific technologies they use. I will present the main differences between the row store
and column store ways of saving data, followed by the indexes used by these kinds of
databases. After that, the general high-level architecture of some of the current vendors which
implement In-Memory data stores is presented.
In the last chapter the reader will also see some experimental results obtained on
MemSQL, VoltDB, Oracle and NuoDB, followed by the thesis conclusion and a further
research proposal.
2. Introduction
In-Memory relational database management systems appeared after a large drop in
hardware prices and sizes, data access in RAM being much faster than access on a hard
disk drive.
2.1. The Computation Problem
Until recently, most data processing was done overnight or over weekends and the
results were saved to the hard disk. This type of computing was performed in batches and
could take from a few hours up to almost an entire weekend. Database users therefore had to
work with data from the previous day and could not check how their changes would affect
the outcome until the next day. A lot of productivity was lost, which could also affect
overall revenue when the results were not the expected ones. Another issue with this old
approach was that the results could not take into consideration important events that
occurred during the day.
In the past few years the price of RAM hardware has dropped considerably, which
means that the most relevant data can now be stored in RAM. In order to take advantage of
these new prices, most RDBMS (Relational Database Management System) vendors started
to provide an In-Memory solution, and new specialized In-Memory RDBMSs started to
appear. The most critical tables can now be stored in RAM, so the data can be accessed and
processed much faster.
2.2. RDBMS Terminology
Relational database management systems, both the ones which store the records in
memory and the ones which store them on disk, use a few specialized terms. A few of the
words from that vocabulary are presented below:
○ Batch - a method of data processing in which operations are run grouped
together in order to finish a certain type of calculation
○ Data Warehouse - a database management system which stores old data in a
denormalized form. It usually stores historical data which is processed in
batches, and it also contains archived data
○ CRUD Operations - CRUD (create, read, update, delete) represents the basic
database operations
○ Query - the operation of reading data from the database
○ Primary Key - assures the uniqueness of the records on a given column and
does not allow null values (a table can have at most one primary key)
○ Unique Key - almost the same as the primary key, the differences being that it
allows null values and that more than one can be created per table
○ Foreign Key - links a record from the current table with a record from another
table using a unique key or primary key
○ I/O - Input/Output, the operations of reading from and writing to disk
○ Cache - the place in RAM where the most accessed and most recent data is
saved
2.3. Relational Database Management Systems
The first implementation of a Relational Database Management System was in 1974
by IBM. It was called System R and it was only a prototype. The first commercial Relational
Database Management System was released in 1979 by Relational Software (now Oracle
Corporation) and it was called Oracle.
Relational Database Management Systems are based on the relational model invented
by Edgar F. Codd. In these kinds of databases the information is stored in tables which are
connected with each other through relationships, and the records in the tables are stored as
tuples.
The tables are linked together using relations (Foreign Keys). The tables are
theoretically normalized using the third normal form, but in practice they can be
denormalized in order to achieve better performance for corner-case use cases. Almost all
tables have a primary key, which is used to uniquely identify records.
RDBMS vendors usually provide the SQL language as a method to query the tables
and also try to respect the ANSI SQL standard in order for the queries to be compatible
with other vendors and to offer developers an easier transition from one database to
another.
2.4. In-Memory RDBMS
In order to better understand what In-Memory Relational Database Management
Systems are, it must first be explained what they are not.
First of all, they are not In-Memory NoSQL (Not only SQL) databases. These
databases are not relational, most of them storing the data in a key-value store or
document-store format. One of the most popular databases of this kind is Redis. It is indeed
an In-Memory database, but it does not store the data in a relational way, as it uses
key-value storage. It offers isolation by using only one execution thread which serves all
the users in order.
Another database type with which they must not be confused are the classic database
systems which are deployed as virtual machines directly in memory [7]. These can of
course be faster than being deployed on disk, but they do not take advantage of the different
memory type (RAM instead of hard disk). Another disadvantage of this kind of database is
that in case of an outage the user can very easily lose all the information stored in it.
These databases must also not be mistaken for distributed caching systems. Caching
systems save the most frequently accessed data of an application, and if the user has enough
machines at their disposal they can hold all the information from the database. These
caching mechanisms do not offer durability, which is why they are mostly not used to store
information that is crucial to the user.
The principal characteristic of an In-Memory Relational Database Management
System is that it offers much faster access to the data, thanks to the much lower latency of
RAM access in comparison with disk access, and it also uses specialized data structures in
order to reach the information faster. Because the information is saved directly in RAM, a
caching mechanism is no longer needed; the exception is when the data is saved in dual
mode (both on the hard drive and in RAM), in which case the database may choose the
faster way to access the data (it may use the cache when it chooses to access the data from
the hard disk). As will be presented in the next chapter, In-Memory Relational Database
Management Systems try to offer a hybrid system between OLTP and OLAP, named HTAP
or NewSQL.
2.5. The durability of the In-Memory RDBMS
Since the data is saved in RAM, memory which is lost in case of an outage,
durability becomes an important problem which needs to be solved. It can be addressed by
using backups, which are made over longer periods of time, when the number of users is
lower. These are used alongside transaction logs, which will be replayed on top of the
backup in case of an outage.
Another method is implementing a high-availability system, which consists in
using another machine in parallel (most of the time located in another location, called the
disaster recovery site) which continually receives instructions from the primary server; in
case the primary server has an outage, a failover process will start, making the secondary
machine the new primary. The clients will automatically connect to the newly assigned
primary machine, so the loss of information will be very small, the failover usually being
done automatically and very fast.
Durability can also be assured by some In-Memory RDBMSs by saving the data in
dual format (both on disk and in memory), this way no information being lost in case of a
server restart, as the data will still be available on the hard disk.
3. OLTP vs OLAP
Historically, databases are split in two categories: OLTP and OLAP. In this chapter I
will highlight the main differences between the two and the way in which In-Memory
RDBMSs try to unify them under a single system.
3.1. Online Transaction Processing
Online transaction processing, most often abbreviated as OLTP, is the most widely
used kind of database. The most important characteristics of online transaction processing
databases are:
● they contain real-time data which comes from a lot of short and fast CRUD
(create, read, update and delete) operations
● they use simple queries which usually return or modify very few records
● most of the applications using this kind of database must be able to provide
high user concurrency
● the schema is highly normalized and contains a lot of tables
3.2. Online Analytical Processing
Online Analytical Processing, abbreviated as OLAP, represents the batch processing
kind of database. It is mostly used in data warehouse environments. The most typical
attributes of Online Analytical Processing databases are:
● it usually helps in planning and decision making
● it provides multi-dimensional views of consolidated historical data
● the data is usually consolidated from multiple online transaction processing
systems by periodic batches, or from other user-provided systems
● the queries are much more complex than the ones from online transaction
processing systems and usually contain a lot of aggregations and involve a
far larger number of records
● it must provide timely answers to questions involving high resource usage
● the schema is denormalized using a star/snowflake design
3.2.1. Star Schema
The Star Schema is mainly used in Data Warehouse systems. It usually has one or
more fact tables which reference a number of dimension tables.
The fact tables contain transaction records with a transaction id, together with ids
which are foreign keys to the dimension tables.
The dimension tables contain the actual information which needs to be accessed. This
way, when doing some computation, the user only accesses records from the tables which
actually hold the needed information, and by doing so the system reduces the I/O needed,
which is the bottleneck in many database systems.
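The fact/dimension split above can be illustrated with a small aggregation sketch. The table and column names here are hypothetical, chosen only for illustration; the dimension table is touched solely for the foreign-key lookup, while the other dimensions never need to be read.

```python
# Star-schema sketch: a fact table referencing one dimension table.
dim_department = {1: "SALES", 2: "HR"}       # dimension: id -> department name

fact_salaries = [                            # fact rows: (dept_id, salary)
    (1, 1000), (1, 1500), (2, 1200),
]

def salary_by_department(facts, departments):
    """Aggregate the fact table, resolving names via the dimension table."""
    totals = {}
    for dept_id, salary in facts:
        name = departments[dept_id]          # foreign-key lookup into the dimension
        totals[name] = totals.get(name, 0) + salary
    return totals

print(salary_by_department(fact_salaries, dim_department))
```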
3.2.2. Snowflake Schema
The Snowflake Schema closely resembles the Star Schema, the main difference
between the two being that the Star Schema contains only dimensions directly linked to the
fact tables, while the Snowflake Schema can also contain dimensions linked to each other.
A dimension may even be linked only to another dimension.
3.3. Hybrid Transactional/Analytical Processing
As pointed out above, the two have very different use cases; historically, the data has
been saved in both formats, usually in different systems. This meant a lot of duplicated
data, which until a few years ago translated into much higher expenses for the customers.
As technology advanced and hardware became much cheaper, a new kind of
database system started to be used, called Hybrid Transactional/Analytical Processing
(HTAP). This kind of Relational Database Management System has the capabilities of both
OLTP and OLAP systems. The term was coined by Gartner, Inc., a firm specialized in
comparing products by category and placing them in its four "magic quadrant" positions. It
is used mainly by In-Memory database systems.
4. Row Store vs Column Store
Currently, the records in an RDBMS can be stored in row store or column store
format.
4.1. Row Store
In the first database implementations the records were saved in a row store format.
This means that for each record in the database a tuple is saved with a value for each field,
the values being stored next to each other and separated by a special character.
An example of how the records are saved is depicted below, in Table 1.
EMP_NAME JOB HIRE_DATE SALARY BONUS
SMITH SALESMAN 14/05/2015 1000 20
MACBETH MANAGER 13/01/2014 1000 25
MACBETH SALESMAN 20/06/2015 1500 20
Table 1 : Row Store Data Saving
A low-level view of the row store format of the same records:
1: SMITH, SALESMAN, 14/05/2015, 1000, 20 -
2: MACBETH, MANAGER, 13/01/2014, 1000, 25 -
3: MACBETH, SALESMAN, 20/06/2015, 1500, 20
The first visible advantage is that a record can easily be read, added, deleted or
updated, as the database can perform a continuous read at the respective row id.
A disadvantage is that if the user only wants to read, for example, the EMP_NAME
and BONUS fields, the whole record still has to be read, which means a lot more I/O is
needed; this can greatly affect performance, especially when a large number of records is
involved.
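The difference in access patterns can be sketched as follows, using the employee rows from Table 1. This is an illustrative model only; real engines work with pages and byte offsets, not Python lists.

```python
# Row store: each record is one contiguous tuple.
rows = [
    ("SMITH",   "SALESMAN", "14/05/2015", 1000, 20),
    ("MACBETH", "MANAGER",  "13/01/2014", 1000, 25),
    ("MACBETH", "SALESMAN", "20/06/2015", 1500, 20),
]

# Column store: one contiguous list per field.
columns = {
    "EMP_NAME":  ["SMITH", "MACBETH", "MACBETH"],
    "JOB":       ["SALESMAN", "MANAGER", "SALESMAN"],
    "HIRE_DATE": ["14/05/2015", "13/01/2014", "20/06/2015"],
    "SALARY":    [1000, 1000, 1500],
    "BONUS":     [20, 25, 20],
}

# Reading one whole record is a single contiguous read in the row store...
record = rows[1]

# ...while summing a single field touches only one column in the column
# store, instead of scanning every full record.
total_salary = sum(columns["SALARY"])
print(record, total_salary)
```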
4.2. Column Store
The column store format is a new kind of saving the data which started to gain
popularity in RDBMS implementations only recently. In this format the user has separately
each column with the values of all the records from it.
An example of how the records are saved in a column store format can be seen bellow,
in Table 2.
EMP_NAME SMITH MACBETH MACBETH
JOB SALESMAN MANAGER SALESMAN
HIRE_DATE 14/05/2015 13/01/2014 20/06/2015
SALARY 1000 1000 1500
BONUS 20 25 20
Table 2 - Column Store Data Saving
A high-level view of the column store format:
SMITH: 1, MACBETH: 2, MACBETH: 3 -
SALESMAN: 1, MANAGER: 2, SALESMAN: 3 -
14/05/2015: 1, 13/01/2014: 2, 20/06/2015: 3 -
1000: 1, 1000: 2, 1500: 3 -
20: 1, 25: 2, 20: 3
As can be seen in the example above, some values appear multiple times. In order to
reduce the space used by the column store format and also to reduce the I/O operations
needed, the database can compress the values at the expense of CPU [10].
A simple compression example is provided below:
SMITH: 1, MACBETH: 2; 3 -
SALESMAN: 1; 3, MANAGER: 2 -
14/05/2015: 1, 13/01/2014: 2, 20/06/2015: 3 -
1000: 1; 2, 1500: 3 -
20: 1; 3, 25: 2
The first visible advantage of the column store is that the database can perform
fewer I/O operations by reading only the data of interest. For example, if the user wants to
read only the data from the EMP_NAME field, the database does not need to also read the
values of the other fields. A second advantage is, as noted above, that the database can use
compression to reduce both the I/O operations and the consumed space.
A disadvantage of using the column store format is that reconstructing whole tuples
might require multiple seeks.
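The compression idea above can be illustrated with a simple run-length encoding over a sorted column. This is a toy sketch of one of several possible schemes (dictionary encoding and run-length encoding are mentioned later in the Oracle discussion), not any vendor's actual format.

```python
def rle_encode(column):
    """Collapse runs of equal consecutive values into [value, count] pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1                 # extend the current run
        else:
            runs.append([value, 1])          # start a new run
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the full column."""
    return [value for value, count in runs for _ in range(count)]

# Sorting first groups duplicates together, which makes the runs longer.
job_column = sorted(["SALESMAN", "MANAGER", "SALESMAN"])
encoded = rle_encode(job_column)
print(encoded)                               # fewer entries than the column
assert rle_decode(encoded) == job_column     # compression is lossless
```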
4.3. Row Store vs Column Store
In conclusion, column-oriented organizations are more efficient when aggregation
operations are performed over many rows but only a small subset of the fields is selected,
as the amount of data read is much smaller. The column store format is also faster when
many new values are provided for a single field at once in an update statement, as only the
values of that field are accessed in order to change the data.
Row-oriented organizations are more efficient when a larger number of fields is
accessed at once from a record, as the field values are next to each other; if the record is
small enough, a single database seek may suffice. The same holds when a new record is
added to the database, as the whole tuple can be written in one operation.
These advantages and disadvantages make the row store format well suited for
OLTP kinds of applications, while the column store format is best used in an OLAP
environment.
Attributes Accessed    Row-Store Execution Time    Column-Store Execution Time
2                      257.462 sec                 128.731 sec
3                      257.326 sec                 128.899 sec
4                      259.526 sec                 153.923 sec
9                      273.445 sec                 288.565 sec
15                     280.778 sec                 8543.667 sec
25                     290.199 sec                 20899.542 sec
Table 3 - PostgreSQL Row-Store vs Column-Store [12]
A study on the PostgreSQL database was made in order to compare the response
time of the row store vs the column store depending on the number of columns selected in
the query. A plot was also provided in the study:
Figure 1- Column-Store vs Row-Store Performance in PostgreSQL [12]
4.4. Row Store and Column Store in RDBMSs
Each RDBMS vendor provides a different implementation of these storage types.
Oracle Database 12c provides both the row store and the column store format, but
the row store is kept only on disk while the column store is kept only in RAM. The same
table is used, in the sense that the records on disk coincide with the ones in RAM. This
greatly improves read performance, as the Oracle Optimizer (a process which helps the
database find the best execution plan for retrieving the records) can now choose to read the
data from either the column store or the row store, depending on the type of calculation.
Oracle also provides different column store compression options. The two main
compression types are Warehouse compression and Archive compression. Warehouse
compression is ideal for query performance and is specifically oriented towards
scan-oriented queries, which are mostly used in data warehouses. It is suited for tables that
are queried frequently, as they occupy more space than with archive compression. Archive
compression, in contrast, is better for tables which are queried less frequently, as they need
more CPU in order to decompress the information but occupy less space. A few of the
algorithms used to compress the information are: Dictionary Encoding, Run Length
Encoding, Bit-Packing, and the Oracle proprietary compression technique called OZIP,
which offers extremely fast decompression tuned specifically for Oracle Database.
MemSQL also provides both row store and column store formats, but instead it
keeps the row store only in RAM and the column store only on disk. In this situation the
tables are not linked to each other, meaning the database does not have to update the values
in both locations on an update, delete or insert operation; the only loss is that the data from
the in-memory storage can be lost in case of an outage if no high-availability solution is in
place.
The most flexible solution is provided by SAP HANA, which supports all
combinations: the row store can be kept both on disk and in RAM, and the column store
can also be kept both on disk and in RAM. It also offers the option to choose whether the
tables are linked or not. Another good functionality provided by SAP HANA is that, when
inserting into a column store table, it has the option to initially save the new record in
cache in a row store format in order to improve the overall insert performance, the record
being moved into the column store when the database is less busy [4].
5. RDBMS Indexes
An index is an auxiliary data structure used to access data in the tables faster. The
cost of maintaining this extra structure is small in comparison with the read performance
gains. Usually the indexes contain some of the data from the table, on which the search can
be based, and also a row identifier which acts like a pointer to the values from the table, as
the fastest way to access records in a table is by using the row id.
The first indexes introduced in early RDBMSs were the B+ Tree indexes. After the
introduction of in-memory databases, new lock-free index structures have been invented
which are a better fit for RAM storage.
5.1. B+ Tree Indexes
The B+ Tree concept was first introduced in the 1970s. The B+ Tree is a balanced
tree which contains values only in the leaf nodes, the rest of the nodes being called
decision nodes. Another important difference from a balanced binary tree is that there are
bidirectional pointers between the leaf nodes, which greatly helps when scanning
consecutive values or when the database wants to scan the whole index (an operation
called full index scan), as it does not need to also traverse the decision nodes. The decision
nodes contain values which are compared with the searched value in order to find out
whether it is stored on the left or the right branch. The search cost is O(log n).
An example of a B+ Tree index can be seen below:
Figure 2 - B+ Tree [8]
As can be seen above, to find the value 19 it must be compared with only two other
values. If the database searched for the value directly in the table, it would have had to
compare it with 20 other values, and it would also have had to read whole rows during the
search if a row store format was used.
The main disadvantage is that the B+ Tree is not a lock-free structure: after each
update, insert or delete operation the database needs to rebalance the tree in order to retain
the read performance.
B+ Tree indexes are useful only when the selectivity is high (if the proportion of
duplicates is around 10% or more, the query will be faster accessing the table directly) or
when all the needed fields are found inside the index.
This index is good at searching for unique values and also at searching for a range
of values, as the records are stored in consecutive order.
5.2. Bitmap Indexes
Bitmap indexes were also introduced early in most RDBMSs. They were first
invented in 1985 and are best used when the number of distinct values is very low.
An example of a bitmap index when storing cardinalities:
Figure 3 - Bitmap Index [16]
As can be seen above, when a search is made, for example for a given region, the
rows which contain the value can easily be found by reading the corresponding bitmap
row, e.g. North (the cells which have a dot inside them contain the value).
The bitmap index can be updated easily, but in practice it is not used very often, as
there are not many searches done on fields with low selectivity; also, for example, Oracle
Database provides bitmap indexes only in its Enterprise Edition.
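The bitmap idea can be sketched with one bit vector per distinct value of a low-cardinality column. Python integers stand in for the bit vectors here; the column values are hypothetical examples.

```python
# Bitmap index sketch: one bit vector per distinct column value.
region_column = ["North", "South", "North", "East", "South", "North"]

bitmaps = {}
for row_id, value in enumerate(region_column):
    bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)   # set bit row_id

def matching_rows(bitmap):
    """Decode a bit vector back into the row ids whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

# A point query is a single bitmap read, and combining predicates becomes
# cheap bitwise arithmetic on the vectors.
north = bitmaps["North"]
north_or_east = bitmaps["North"] | bitmaps["East"]
print(matching_rows(north), matching_rows(north_or_east))
```

Combining predicates with `|` and `&` on whole vectors is the main reason bitmap indexes shine on low-cardinality fields.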
5.3. Skip Lists
This type of index was introduced in 1990, almost 20 years after the well known B+
Tree index. Unlike the B+ Tree index, it does not guarantee a fixed number of steps to find
the results. It relies on probabilities, but the chance of not returning a result in a satisfying
time is very low, the average performance being comparable with that of a B+ Tree index.
This type of index fits well with the In-Memory database architecture because it can take
advantage of the RAM hardware. Another big advantage of this type of index is that,
unlike its B+ Tree counterpart, it does not block resources, as it does not need to be
rebalanced like the B+ Tree. However, if the user sees that the performance of the skip list
is degrading, it can be rebuilt in order to have the data better scattered.
Depending on the number of records in the table, the database builds this index
using a number of lanes, representing the index depth. After the number of lanes is
established by the DBMS, it starts doing the inserts. The first record is added, and after
that a “coin toss” is performed in order to establish the level on which the record will be
placed. It starts with the lowest level and uses the “coin toss” technique on each level until
the toss returns a negative result. Depending on the DBMS implementation, each higher
level can have a higher chance of a negative result, in order to keep the skip list well
balanced. This step represents the probabilistic stage which determines the level on which
each record is placed. To add the rest of the records, the database also proceeds from left to
right and descends until it finds the searched value in the index or reaches the first level.
The leftmost side is considered minus infinity and the rightmost side plus infinity. Starting
from the left (as the value is bigger than minus infinity), the database compares the value
with each of the values in the towers on that level until it finds one which is bigger than
(or equal to) the searched value. At that moment it goes one level lower and repeats the
same steps as on the level above, until it finds the searched value or reaches the first level.
In case the searched value is already in the index, the record is added to the list of the
tower in which it was found; otherwise a new tower is created, in which case the
previously described coin toss technique is restarted in order to determine the tower’s
height.
As can be seen, a new tower is simply added between two others, for which there is
no need to block the whole index; this way concurrent users can still use the skip list.
When searching for a value in this index, a technique related to the one used at
insert is applied, doing a left-to-right and top-to-bottom search until a tower with the
searched value is found or the index is exhausted. If the searched value is found, all the
records saved in that tower are returned.
A search example is given below, in Figure 4.
In case the database wants to delete records, it uses the same technique as the one
from the search, the only difference being that the tower is deleted after it is found. As in
the case of the insert, the database does not need to block the whole index, the RDBMS
simply deleting the searched tower.
This type of index is implemented by the In-Memory RDBMS MemSQL, being the
one recommended by this database vendor. It has good performance at both unique value
searches and range searches, as all the values in the skip list are placed in consecutive
order.
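The insert and search procedure described above can be sketched as follows. This is a minimal, single-threaded illustration of the coin-toss technique; MemSQL's production skip list is lock-free and far more involved.

```python
import random

class SkipNode:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height          # one forward pointer per lane

class SkipList:
    """Toy skip list: coin-toss tower heights, search from the top lane down."""
    MAX_HEIGHT = 8

    def __init__(self):
        self.head = SkipNode(None, self.MAX_HEIGHT)  # acts as minus infinity

    def _toss_height(self):
        h = 1
        while h < self.MAX_HEIGHT and random.random() < 0.5:
            h += 1                           # each extra lane needs another win
        return h

    def insert(self, key):
        update = [self.head] * self.MAX_HEIGHT
        node = self.head
        for lane in reversed(range(self.MAX_HEIGHT)):
            while node.next[lane] and node.next[lane].key < key:
                node = node.next[lane]       # move right while keys are smaller
            update[lane] = node              # remember where we descended
        new = SkipNode(key, self._toss_height())
        for lane in range(len(new.next)):    # splice the tower into its lanes
            new.next[lane] = update[lane].next[lane]
            update[lane].next[lane] = new

    def contains(self, key):
        node = self.head
        for lane in reversed(range(self.MAX_HEIGHT)):
            while node.next[lane] and node.next[lane].key < key:
                node = node.next[lane]
        node = node.next[0]                  # candidate on the bottom lane
        return node is not None and node.key == key

sl = SkipList()
for k in [19, 4, 31, 9, 23, 15]:
    sl.insert(k)
print(sl.contains(19), sl.contains(20))
```

Note that the splice in `insert` only rewires the pointers around the new tower, which is why a real implementation can avoid locking the whole index.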
5.4. Hash Index
This type of index has been implemented both by classic RDBMSs like SQL Server
and by new competitors like MemSQL. It performs well only when the database is doing a
unique value search and performs very badly on range scans, which is why some RDBMS
vendors allow customers to create this index only on primary or unique keys. The records
of the index are stored in buckets which contain a number of different values. When a new
value is inserted, a new bucket is created for it, unless the hash function returns the value of
an already existing bucket. When a search is made, the hash function is applied to the
searched value, and after the database establishes in which bucket it is saved, it searches
for the record there (it is possible that the returned hash value exists but the record does
not exist in that bucket) [17].
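The bucket mechanism above can be sketched as follows. This is an illustrative model only, not SQL Server's or MemSQL's actual structure; the keys and row ids are hypothetical.

```python
# Hash index sketch: the hash of a key picks a bucket; the bucket is then
# scanned, since different keys can hash to the same bucket.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]       # bucket -> (key, row_id) list

def insert(key, row_id):
    buckets[hash(key) % NUM_BUCKETS].append((key, row_id))

def lookup(key):
    """Constant-time bucket pick, then a short scan inside the bucket."""
    for k, rid in buckets[hash(key) % NUM_BUCKETS]:
        if k == key:                             # hash matched; confirm the key
            return rid
    return None                                  # bucket exists, key absent

insert("SMITH", "r1")
insert("MACBETH", "r2")
print(lookup("MACBETH"), lookup("JONES"))
```

Because the hash scatters keys with no regard for their order, a range scan would have to visit every bucket, which is exactly why this index performs badly on range queries.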
Figure 4 - Skip List Search [9]
5.5. Column Store Indexes
Different DBMS vendors have built proprietary indexes to enable faster searches of
the data. An index of this type was implemented by Oracle Database in order to help
searching for records in its In-Memory column store and in its Exadata column store.
5.5.1. Column Store Indexes used by Oracle Database
As presented above, Oracle Database uses a proprietary column store index. For
storing the column store data, Oracle uses a new storage unit called the IMCU (In-Memory
Compression Unit). Each IMCU contains a number of records, based on their size. When a
search is done over these IMCUs, the database first accesses the SMU (Snapshot Metadata
Unit), which contains information about the IMCU; more exactly, it is interested in the
minimum and maximum values from that IMCU, in order to know whether it needs to
access it or not, this way saving I/O time and accelerating the retrieval of the information.
This type of index is created automatically by Oracle Database. The current documentation
does not specify the criteria by which tables are chosen to have such indexes built, so one
can only speculate that the indexes are created for the most used tables, presumably when
the database has a lower load.
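The min/max pruning described for the SMU can be sketched as follows. This is only an illustration in the spirit of that metadata check; Oracle's actual IMCU/SMU layout is proprietary and the unit contents here are made up.

```python
# Min/max pruning sketch: each unit stores a chunk of column values plus
# its minimum and maximum, mimicking the SMU metadata check.
units = []
for chunk in ([3, 8, 5], [12, 15, 11], [40, 33, 37]):
    units.append({"min": min(chunk), "max": max(chunk), "values": chunk})

def scan_equal(target):
    """Scan only the units whose [min, max] range can contain the target."""
    hits, units_read = [], 0
    for unit in units:
        if unit["min"] <= target <= unit["max"]:  # metadata check, no data I/O
            units_read += 1
            hits += [v for v in unit["values"] if v == target]
    return hits, units_read

print(scan_equal(15))
```

Only one of the three units is actually read for the value 15; the other two are skipped on the strength of their metadata alone, which is where the I/O saving comes from.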
6. High Level Architecture of In-Memory RDBMS
This chapter presents the general architecture implemented by various In-Memory
database vendors. The architectures studied in this chapter will be the ones provided by
MemSQL, VoltDB, Oracle and NuoDB.
6.1. MemSQL
MemSQL is regarded as one of the fastest In-Memory database solutions, one of its main competitors being the NoSQL (Not only SQL) database provider Redis. It is a two-layered database: the first layer is called the aggregator layer and the second one the leaf layer. The simplest implementation contains one aggregator node and one leaf node, but the configuration recommended by the vendor is five leaf nodes for each aggregator node. The actual data is stored on the leaf nodes. Because the architecture is of the shared-nothing type, the information is automatically distributed between the leaf nodes in such a way that each record is stored on only one leaf node; this way no duplicate data is stored and redundancy is avoided. The aggregator layer contains the metadata about the leaf nodes and, using partitioning, automatically knows which leaf nodes need to be queried in order to return the desired data [6]. For the best performance gain it is recommended to use many different machines in order to have as many leaves as possible. To achieve high availability and avoid losing data in case of an outage, the user can create additional availability groups, which contain the same data as the first availability group that is normally accessed; if the main availability group fails, an automatic failover is performed. MemSQL automatically rebalances the data in case one of the partitions is no longer available, thus avoiding a single point of failure. The leaf nodes are the ones that execute the queries provided by the user, and only when all the data cannot be found on a single leaf node does the aggregator combine the data from the other leaf nodes. The only way the database can ensure that related data is all saved in the same partition is by using shard keys. For example, a table can have a shard key on the Department column and a primary key on the Employee and Department columns; this way the database ensures that all the employees from the same department are saved on the same leaf node.
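The co-location guarantee given by shard keys can be sketched as follows: rows are routed by hashing only the shard key column, so two rows with the same department always land on the same leaf, regardless of the other key columns. The routing function below is illustrative; MemSQL's actual partitioning scheme is internal to the product.

```java
// Sketch of shard-key routing: the leaf node for a row is derived from the
// shard key column alone, so all rows sharing that value are co-located.
public class ShardRouting {
    public static int leafFor(String shardKey, int leafCount) {
        // Non-negative leaf index derived from the shard key's hash.
        return Math.floorMod(shardKey.hashCode(), leafCount);
    }
}
```

With this routing, the rows ("Alice", "Sales") and ("Bob", "Sales") are guaranteed to be stored on the same leaf, because only the department value participates in the routing decision.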
The connection is made to the primary aggregator node, and when this node fails MemSQL automatically assigns another aggregator node as primary. In each cluster group there is only one primary aggregator node [1]. Its tasks include monitoring the cluster group and performing operations on it, and it is also the only aggregator node which executes DDL (data definition language) operations. The aggregator nodes can be considered a type of load balancer or network proxy.
6.2. VoltDB
VoltDB is also a new In-Memory Relational Database Management System, which had its first version available in 2010. It is the only official implementation of the academic H-Store project. The H-Store project was implemented by a team whose members come from Brown University, Carnegie Mellon University, the Massachusetts Institute of Technology and Yale University. The database researchers involved in the system's design are Michael Stonebraker, Sam Madden, Andy Pavlo and Daniel Abadi [15]. The H-Store project is the first implementation of the new class of parallel database management systems called NewSQL (also known by others as HTAP). This technology tries to provide the high throughput and high availability of NoSQL systems while still retaining the transactional guarantees of relational database management systems [13].
Figure 5 - MemSQL High Level Architecture
VoltDB saves all the data only in memory, a big difference from MemSQL and Oracle Database. If the database is shut down, it can recover the data from continuous snapshots and from command logging files [2], which assure the user that no transaction is committed without its commands being added to the log. It is ACID-compliant and has a shared-nothing architecture, meaning that each node holds different data from the same table, the data being automatically partitioned. When a query is submitted it can run separately on each node as a different transaction. This way the total query time is comparable with other databases, but the number of transactions it can run is much larger, as other transactions can run in parallel on other nodes [14]. VoltDB execution is single-threaded within each partition, meaning that locks, latches and concurrent transaction management are no longer needed; this helps the overall performance considerably, as these operations are among the most costly in classic database implementations. In order for the system to run optimally, it can replicate smaller tables in full on each node, so that when joins are needed each node can still run the query concurrently. Currently VoltDB supports ad-hoc DML (data manipulation language) queries only with auto-commit mode turned on. Transactions are supported only as stored procedures. The query modifications from a stored procedure act as a transaction, all the modifications being saved, or rolled back in case of an error.
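The command-logging durability mentioned above can be sketched as a log that records every command before it is applied, so that replaying the log after a crash reproduces the state. Real command logs are written to disk and combined with snapshots; this purely in-memory version, with our own class names, only shows the principle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.HashMap;
import java.util.Map;

// Sketch of command logging: every command is appended to a "durable" log
// before it is applied, so the state can be rebuilt by replaying the log.
public class CommandLogSketch {
    private final Map<String, Integer> state = new HashMap<>();
    private final List<String[]> log; // each entry: {key, value}

    public CommandLogSketch(List<String[]> log) {
        this.log = log;
    }

    public void put(String key, int value) {
        // Log BEFORE applying: no committed change can be missing from the log.
        log.add(new String[]{key, Integer.toString(value)});
        state.put(key, value);
    }

    public Integer get(String key) {
        return state.get(key);
    }

    // Recovery: rebuild the in-memory state by replaying the surviving log.
    public static CommandLogSketch recover(List<String[]> survivingLog) {
        CommandLogSketch db = new CommandLogSketch(new ArrayList<>(survivingLog));
        for (String[] cmd : survivingLog) {
            db.state.put(cmd[0], Integer.parseInt(cmd[1]));
        }
        return db;
    }
}
```

After a simulated crash that discards the in-memory state but keeps the log, replaying yields the last committed value for every key, which is the guarantee the text describes.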
6.3. Oracle Database
Oracle Database is at the moment one of the most used databases, and it was also the first commercially available relational database system. In its latest release, Oracle Database 12c, it offers an In-Memory database solution as an option to its Enterprise Edition [5]. To take advantage of the new option it is almost not necessary to make any changes to the database; the main change is that RAM needs to be allocated for the In-Memory store, a change which also requires a database restart. After the allocation is done, the user can choose which tables to add In-Memory, and also which of their columns should be loaded (columns the user deems unnecessary can be excluded). As a parameter for the respective tables the user can select a priority with which they will be added In-Memory. The options provided are: Critical, High, Medium, Low and None. The first four options represent the priority with which the tables will be loaded into memory after the database is started, while tables using the last option will be loaded into memory only after the first time they are queried. Another parameter which can be used when a table is added in memory lets the user choose the compression type of the records.
The options from which the user may choose are: MEMCOMPRESS, MEMCOMPRESS FOR DML, MEMCOMPRESS FOR QUERY LOW, MEMCOMPRESS FOR QUERY HIGH, MEMCOMPRESS FOR CAPACITY LOW and MEMCOMPRESS FOR CAPACITY HIGH [3], [11]. Depending on the chosen option the records can occupy less space at the cost of more CPU cycles for decompression, or occupy more space and require fewer CPU cycles. The compression can be very efficient, with records occupying between 2 and 20 times less space than on disk [11]; this is very important, as RAM capacity is in general much smaller than the space provided by the hard disk drive.
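For illustration, the per-table priority and compression settings described above combine into DDL statements along the following lines. The table name is hypothetical and the statement is only assembled as a string, not executed against a database; the clause shapes follow the documentation cited in [11].

```java
// Assemble an ALTER TABLE statement combining an In-Memory compression
// level and a load priority, as a string (no database connection involved).
public class InMemoryDdl {
    public static String alterInMemory(String table, String priority, String compression) {
        return "ALTER TABLE " + table
             + " INMEMORY MEMCOMPRESS " + compression
             + " PRIORITY " + priority;
    }
}
```

For example, `alterInMemory("sales", "HIGH", "FOR QUERY LOW")` produces a statement that loads the hypothetical `sales` table with high priority and query-optimized compression.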
The durability of the records is provided by storing the information both on disk and in RAM. A new process was added in Oracle Database 12c which takes care of storing the data in RAM, running in parallel with the classic process which saves the data to the hard drive. Thanks to the dual format in which the data is saved, the optimizer can decide whether it wants to read the data from the hard disk or from RAM.
Table 4 - Oracle Compression Size and Elapsed Time
7. Experimental Results
In order to compare the performance of saving the data in memory against saving the data on disk, a few virtual machine instances will be created in which the data is saved in both formats. Virtual machines are used in order to provide a clean operating system and to make it easy to reproduce the results in further research.
7.1. Tests Description
The tests contain queries which follow the ANSI (American National Standards Institute) standard format. The main query types tested were combinations of the following: WHERE-filtered records, aggregations, selecting all rows, pagination, joins and counting rows. For each combination the tests were run selecting all columns, one column or some of the columns. The use of indexes was also tested in order to see the performance gains.
As in most cases all or almost all of the rows were selected, the queries that were run are of CTAS (Create Table As Select) type. This means that the queries share a common elapsed time of writing to disk, as the same type of table is created in every case, in order to have relevant results.
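The CTAS wrapping described above amounts to prefixing each measured SELECT with a CREATE TABLE clause. A minimal helper (the naming is our own, not part of any vendor API) could look like this:

```java
// Wrap a SELECT into a CREATE TABLE ... AS statement so that result rows
// are written to a table instead of being fetched and displayed.
public class CtasWrapper {
    public static String wrap(String targetTable, String select) {
        return "CREATE TABLE " + targetTable + " AS " + select;
    }
}
```

Every measured query then pays the same table-creation cost, which is the property that keeps the in-memory and on-disk timings comparable.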
In order to compare in-memory storage against the cache, some of the queries were run multiple times to see whether the cache can improve the table creation time.
The tests will be made on the following databases: MemSQL, VoltDB, Oracle and NuoDB. The tests compare the disk performance versus the in-memory performance of the same database (the performance between different vendors will not be compared).
A sample query from each of the tested databases is provided below.
7.2. Test Case Setup
The hardware on which the virtual machines were deployed has the following configuration: Intel Core i5-4590 at 3.30 GHz, Kingston HyperX BEAST 32GB (4x8GB) DDR3, 1600MHz, CL9, 1.5V, XMP, and a Western Digital Blue 1TB hard disk, 7200rpm, 64MB cache, SATA 3. The operating system used on the physical machine was Windows 10 Pro.
The virtualization solution used was VirtualBox, provided by Oracle. Each deployed VM was assigned 4 cores and 24 GB of RAM. The operating systems used were Oracle Linux 6.7 and Oracle Linux 7.1; the server edition was used in order to reduce as much as possible the hardware resources consumed by the operating system itself.
Although some of the databases work better with multiple instances or nodes, for test and comparison purposes one node should be enough, as both the disk tests and the in-memory tests run on a single node.
The database versions used in the tests are: MemSQL, VoltDB 6.3, Oracle Database 12.1.0.2 and NuoDB 2.4.1.2.
The main tables in which the data is saved have 10 columns and either 100,000 or 5,000,000 records, while the smaller tables have 10,000 or 500,000 records. Tests were made with both the small tables and the big tables in order to see if there is any significant difference between the two. The records were generated randomly, from a Java application for MemSQL and VoltDB and from inside Oracle for Oracle Database.
7.3. Running the Queries
In order to run the queries, a simple Java program was made which ran them through JDBC (Java Database Connectivity). Even though JDBC does not offer the best connectivity to the database (in comparison with the command line provided by the database vendor), the tests should not be affected, as the same connection type was used both for the in-memory calls and for the on-disk tables. The Java application runs each query type five times and saves the average execution time. The queries were the same for each database used, the single difference being query compatibility issues, in which case the query was rewritten so as to produce the same result. I connected directly to the virtual machine using PuTTY and ran the Java program directly on the machine in order to avoid network connectivity issues.
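The measurement loop can be sketched as follows. The query execution is abstracted behind a `Runnable`, since the actual JDBC connection details depend on each vendor's driver; the helper name is our own.

```java
// Run a task a fixed number of times and return the average elapsed time
// in milliseconds, mirroring the five-run averaging used in the tests.
public class QueryTimer {
    public static double averageMillis(Runnable task, int runs) {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run(); // e.g. statement.execute(ctasQuery) over JDBC
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (runs * 1_000_000.0);
    }
}
```

In the real harness the task would execute one CTAS statement over an open JDBC connection; averaging several runs smooths out one-off scheduling noise.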
For each query a table was created, which was deleted at the end of the run. This was done in order to avoid display-time overhead, as a big number of rows was selected in each query. Only the average run time of the queries was recorded, and in order to avoid better performance in consecutive runs the database cache was cleared after each query.
Another batch of tests was run in which 10 concurrent users executed the same query, in order to also test query concurrency. These batches of tests were implemented using Apache JMeter, a performance testing tool. In these tests no tables were created.
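The 10-user concurrency scenario can also be reproduced without JMeter by using a plain thread pool, along these lines; the query is again abstracted as a task, and the class name is our own.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Run the same task once per simulated user, all in parallel, and report
// how many runs completed without throwing an exception.
public class ConcurrentRun {
    public static int run(Runnable task, int users) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        CountDownLatch done = new CountDownLatch(users);
        AtomicInteger ok = new AtomicInteger();
        for (int i = 0; i < users; i++) {
            pool.submit(() -> {
                try {
                    task.run();
                    ok.incrementAndGet();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return ok.get();
    }
}
```

Counting only the runs that finished cleanly is useful for exactly the failure mode observed later in the thesis, where some concurrent queries did not complete.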
The list of queries used:
● Query 1: Small table (10,000) all rows, all columns selected
● Query 2: Small table all rows, only 2 columns selected
● Query 3: Small table count rows
● Query 4: Small table with a like filter
● Query 5: Small table with a range filter
● Query 6: Small table with a where filter
● Query 7: Small table with average on one column
● Query 8: Small table with sum on one column and a where filter
● Query 9: Small table with group by and an average
● Query 10: Join between two small tables (10,000 and 1,000), all columns
selected
● Query 11: Join between two small tables with count
● Query 12: Join between two small tables with a group by and group functions
● Query 13: Big table (5,000,000) all rows, all columns selected
● Query 14: Big table all rows, only 2 columns selected
● Query 15: Big table count rows
● Query 16: Big table with a like filter
● Query 17: Big table with a range filter
● Query 18: Big table with a where filter
● Query 19: Big table with average on one column
● Query 20: Big table with sum on one column and a where filter
● Query 21: Big table with group by and an average
● Query 22: Join between two big tables (10,000 and 1,000), all columns
selected
● Query 23: Join between two big tables with count
● Query 24: Join between two big tables with a group by and group functions
7.4. Test Case Comparison
As stated above, the thesis proposes to compare results only within the same database, in-memory versus disk storage. The result times between databases will not be compared; one of the main reasons is that the table creation time may differ between the solutions, and the types of solutions provided by the vendors also differ, so the different architectures make a direct comparison between them meaningless.
Only the read speeds were compared; the write, update and delete speeds were not.
A number of different query types were used in order to also observe some of the drawbacks of using tables saved in memory. Some differences were also caused by the type of storage used, row store versus column store, meaning that in-memory storage is not the only factor to be taken into account when comparing the results. All the queries have a CREATE TABLE AS added at the beginning, because the number of records to display is too big and because the creation and insertion of rows should cost the same in both cases.
8. Conclusions
In order to better illustrate the results, the query elapsed times were saved in the same table layout. The first row holds the query number, the second row the elapsed time of the query which accessed the disk, and the third row the elapsed time of the query which used in-memory access. VoltDB has only two rows in its table because it does not offer querying from disk, meaning it has only in-memory tables.
8.1. MemSQL Results
Query Number Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12
Disk elapsed time (ms)
102 124 22 64 55 57 45 30 58 134 72 66
In-Memory elapsed time (ms)
75 53 74 82 38 49 47 27 84 83 71 91
Table 5 - MemSQL small tables elapsed time
Query Number Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22 Q23 Q24
Disk elapsed time (ms)
18757 10570 1999 1159 1478 4235 1560 215 447 26063 7453 2170
In-Memory elapsed time (ms)
14118 6352 2550 880 387 6872 1022 567 2285 33772 8712 3172
Table 6 - MemSQL big tables elapsed time
Figure 6 - MemSQL query comparison (On Disk vs In-Memory, queries 1-24)
As seen in the results above, most MemSQL queries have about the same execution times, but a few queries run better on disk while most of them ran better in-memory. The reason for this distribution of results is most probably the data store type used by the compared queries, row store versus column store. This means that database administrators, developers and architects must be very careful when choosing the data store type for each table in order to obtain the best performance gains. The solution recommended by MemSQL is to use in-memory tables for OLTP operations and on-disk tables for OLAP operations. This gives the best performance most of the time, but it also matters because tables used in OLAP can grow huge with historical data that does not need to be accessed too often and would occupy too much space in RAM.
But as we can see in Figure 7 and Figure 8, when the queries were run concurrently by 10 users the MemSQL database performed much better in-memory than from disk. The difference is very evident, meaning that the In-Memory part of MemSQL has been tuned to work better in an OLTP environment where a lot of concurrent users are present.
Figure 7 - MemSQL disk elapsed time
Figure 8 - MemSQL In-Memory elapsed time
An issue with MemSQL appeared, as seen above, when the same queries were run by ten concurrent users: the queries on the huge tables were not able to finish, as the heap space was depleted. This is due to the MemSQL architecture, which is optimized to work on many nodes at a time, so it should not be an issue when deploying MemSQL in production; this hypothesis should be tested further in order to confirm or refute it. Another reason why this should not be an issue in production is that most of the time users do not concurrently run expensive operations; these are mostly done in batches in OLAP databases, and administrators can schedule such queries as needed after testing them first.
8.2. VoltDB Results
Query Number Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12
In-Memory elapsed time (ms)
283 240 220 233 233 235 242 235 271 965 890 915
Table 7 - VoltDB single user query elapsed time (default 100 MB temp space)
Figure 9-VoltDB multi-user query elapsed time
As we can see, in comparison with the other database results, VoltDB has many queries which did not complete successfully, as it did not have enough memory to finish them. Initially it had only the default 100 MB temp tablespace (a default chosen for better overall performance), but even after raising the limit it still did not have enough memory to finish the queries successfully.
As also stated in the VoltDB documentation, the system is specialized mainly in OLTP-style queries, and when heavy calculations are needed they require a good partitioning of the big tables across as many nodes as possible.
In conclusion, the VoltDB architecture is clearly not made to scale up, but it may still have very good scale-out capabilities. This should be tested in the future by using at least four commodity servers across which the computation is distributed, in order to see whether there are scenarios where VoltDB can accommodate bigger result sets with the desired performance.
8.3. Oracle Results
In order to properly run the Oracle queries, optimizer hints had to be used: INMEMORY and NO_INMEMORY were added to the queries in order to ensure that the Oracle optimizer used the intended access path in each case.
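As an illustration of the hint placement, the two variants of a query can be generated as below; the table name is hypothetical, and the strings are only assembled, not executed.

```java
// The same query forced down each access path via Oracle optimizer hints:
// INMEMORY reads from the In-Memory column store, NO_INMEMORY from disk.
public class HintedQueries {
    public static String inMemory(String table) {
        return "SELECT /*+ INMEMORY(" + table + ") */ COUNT(*) FROM " + table;
    }

    public static String onDisk(String table) {
        return "SELECT /*+ NO_INMEMORY(" + table + ") */ COUNT(*) FROM " + table;
    }
}
```

Running both variants of each query makes the measured difference attributable to the access path rather than to the optimizer's own choice.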
Query Number Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12
Disk elapsed time (ms)
47 15 9 15 11 15 12 12 13 67 11 19
In-Memory elapsed time (ms)
118 15 12 19 17 16 9 8 12 51 54 19
Table 8-Oracle small tables query elapsed time
Query Number Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22 Q23 Q24
Disk elapsed time (ms)
8761 2421 174 584 229 258 377 242 2040 13714 916 3102
In-Memory elapsed time (ms)
10867 2594 41 428 11 10 262 8 1924 14629 642 2750
Table 9-Oracle big tables query elapsed time
As we can see in the results table, most of the queries had very close execution times, but unlike the MemSQL results there were some queries with a very big time difference. As expected, the main differences favoured the in-memory side, with faster execution times.
Figure 10 - Oracle single user elapsed time comparison (On Disk vs In-Memory, queries 1-24)
Figure 11 - Oracle on disk multi-user scenario
As we can see in the two JMeter results above, the main difference can be found at queries 14, 16, 17 and 19, where the in-memory solution had a much better performance. These were all queries which had a filter. For the rest of the queries the time difference is not as obvious, but overall the in-memory queries seem to be a little faster.
8.4. Final conclusion
Although the difference between most of the use cases tested in this thesis is not so obvious, it is clear that there are still instances in which the in-memory databases have a much better performance. A more thorough analysis of these instances can be made in the future in order to find the best fit for in-memory databases.
Another issue, which will probably be solved in the future, is the lack of online documentation for most of the IMDBs, as well as the smaller set of features implemented in them in comparison with more mature databases. This was to be expected, as this is a new type of database with only a few years of development behind it.
From comparing the test results of the different databases, we can see that Oracle Database had the fastest row retrieval. In the tests created we used only one machine, meaning vertical scaling, a scenario for which Oracle Database is tuned. The other tested in-memory databases are designed to be horizontally scalable, meaning that they could easily surpass the Oracle Database performance if enough machines are used to process the results.
Figure 12 - Oracle In-Memory multi-user scenario
8.5. Further Research
As further research, the In-Memory Relational Database Systems must also be tested using more than one machine, as their architecture is more suitable for that kind of design. The number of machines should be varied in order to find the improvement threshold of adding them. For those tests the virtual machines should be installed on different physical machines, as a single host cannot easily accommodate the parallel running of the machines and its specification is most of the time not sufficient.
In the future they should also be compared with other in-memory databases; the NoSQL databases in particular are fit for such tests.
Another aspect which was not covered in this thesis is the use of more complex database queries involving more than two tables, in order to test the partitioning and replication power of the IMDBSs.
9. Bibliography
[1] MemSQL official site, http://docs.memsql.com/docs/concepts-overview
[2] VoltDB official site, https://docs.voltdb.com/UsingVoltDB/
[3] Arup Nanda. Compressing Columns. Oracle Magazine, January/February 2010
[4] Hasso Plattner. The Impact of Columnar In-Memory Databases on Enterprise Systems,
2014
[5] Oracle official site, http://docs.oracle.com/database/121/index.htm
[6] MemSQL official site, http://docs.memsql.com/docs/distributed-sql
[7] Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, Michael Stonebraker. OLTP
Through the Looking Glass, and What We Found There, 2008
[8] Christophe. How does a relational database work, http://coding-geek.com/how-databases-work/
[9] Richard Clayton. Data Structures & Algorithms Lecture Notes,
http://bluehawk.monmouth.edu/rclayton/web-pages/f10-305/skiplists.html
[10] Daniel J. Abadi, Samuel R. Madden, Nabil Hachem. Column-Stores vs. Row-Stores:
How Different Are They Really?, 2008
[11] Maria Colgan. White Paper Oracle Database In-Memory, July 2015
[12] Aditi D. Andurkar. Implementation of Column-Oriented Database in PostgreSQL for
Optimization of Read-Only Queries, 2012
[13] VoltDB official site, https://docs.voltdb.com/
[14] VoltDB official site, https://voltdb.com/sites/default/files/tn-transactions.pdf
[15] https://en.wikipedia.org/wiki/H-Store
[16] Don Burleson. Oracle bitmap index maximum distinct values, http://www.dba-oracle.com/t_bitmap_index_maximum_distinct_values_cardinality.htm
[17] Greg Larsen. Review Your BUCKET_COUNT Statistics with DMV, http://www.databasejournal.com/features/mssql/review-your-bucketcount-statistics-with-dmv.html