“ALEXANDRU IOAN CUZA” UNIVERSITY OF IAȘI
FACULTY OF COMPUTER SCIENCE
DISSERTATION THESIS
In-Memory Relational Database
Systems
proposed by
-- Róbert Kristó--
Session: July, 2016
Scientific coordinator
Asist. Dr. Vasile Alaiba
STATEMENT REGARDING ORIGINALITY AND COPYRIGHT
I hereby declare that the dissertation thesis with the title In-Memory Relational
Database Systems is written by me and was not submitted to another university or higher
education institution in the country or abroad. I also hereby declare that all the sources,
including the ones retrieved online, are appropriately cited in this paper, respecting the
rules for avoiding copyright infringement:
- all exactly reproduced excerpts, including translations from other languages, are
written using quotes and hold an accurate reference to their source;
- rephrased texts written by other authors hold an accurate reference to the
source;
- the source code, images, etc. taken from open-source projects or other sources
are used by respecting copyright ownership and hold accurate references;
- summarizing ideas of other authors hold an accurate reference to the original
text.
Iași, date
Graduate: Róbert Kristó
__________________________________________
STATEMENT OF CONSENT
I hereby declare that I agree that the dissertation thesis titled In-Memory
Relational Database Systems, the application source code and other content (graphics,
multimedia, test data, etc.) presented in this paper can be used by the Faculty of
Computer Science.
I also agree that the Faculty of Computer Science, “Alexandru Ioan Cuza”
University of Iași can use, modify, reproduce and distribute, for non-commercial
purposes, the application, executable and source code created by me for the current thesis.
Iași, date
Graduate: Róbert Kristó
__________________________________________
Contents
1. Abstract
2. Introduction
2.1. The Computation Problem
2.2. RDBMS Terminology
2.3. Relational Database Management Systems
2.4. In-Memory Relational Database Management Systems
2.5. The Durability of the In-Memory RDBMS
3. OLTP vs OLAP
3.1. Online Transaction Processing
3.2. Online Analytical Processing
3.3. Hybrid Transactional/Analytical Processing
4. Row Store vs Column Store
4.1. Row Store
4.2. Column Store
4.3. Row Store vs Column Store
4.4. Row Stores and Column Stores in RDBMS
5. RDBMS Indexes
5.1. B+ Tree Indexes
5.2. Bitmap Indexes
5.3. Skip Lists
5.4. Hash Indexes
5.5. Column Store Indexes
6. High Level Architecture of In-Memory RDBMS
6.1. MemSQL
6.2. VoltDB
6.3. Oracle
7. Experimental Test Case
7.1. Test Case Description
7.2. Test Case Setup
7.3. Running the Queries
7.4. Test Case Comparison
8. Conclusions
8.1. MemSQL Results
8.2. VoltDB Results
8.3. Oracle Results
8.4. Final Conclusion
8.5. Further Research
9. Bibliography
1. Abstract
This thesis aims to demonstrate the advantages of using In-Memory relational database
management systems in comparison with, or alongside, classical relational database
management systems which persist their information on disk.
It contains a short introduction to In-Memory databases, followed by some of the
specific technologies they use. I will present the main differences between the row store
and column store ways of saving data, followed by the indexes used by these kinds of
databases. After that, the general high-level architecture of some of the current vendors which
implement In-Memory data stores is presented.
In the last chapter the reader will also see some experimental results obtained on
MemSQL, VoltDB, Oracle and NuoDB, followed by the thesis conclusion and a further
research proposal.
2. Introduction
In-Memory relational database management systems appeared after a large drop in
hardware prices and sizes, data access in RAM being much faster than access on a hard
disk drive.
2.1. The Computation Problem
Until recently, most data processing was done overnight or over weekends and the
results were saved to the hard disk. This type of computing was performed in batches and
could take from a few hours up to almost an entire weekend. Database users therefore had to
work with data from the previous day and could not check how their changes would affect
the outcome until the next day. A lot of productivity was lost, which could also affect
overall revenue when the results were not the expected ones. Another issue with this old
approach was that the results could not take into consideration important events that
occurred during the day.
In the past few years the price of RAM hardware has dropped considerably, which
means that the most relevant data can now be stored in RAM. In order to take advantage of
these new prices, most RDBMS (Relational Database Management System) vendors started
to provide an In-Memory solution, and new specialized In-Memory RDBMSs started to
appear. The most critical tables can now be stored in RAM, so the data can be accessed and
processed much faster.
2.2. RDBMS Terminology
Relational database management systems, both the ones which store the records in
memory and the ones which store them on disk, use a few specialized terms. A few of the
words from that vocabulary are presented below:
○ Batch - a method of data processing in which operations are run grouped
together in order to finish a certain type of calculation
○ Data Warehouse - a database management system which stores old data in a
denormalized form. It usually stores historical data which is processed in
batches, and it also contains archived data
○ CRUD Operations - CRUD (create, read, update, delete) represents the basic
database operations
○ Query - the operation of reading data from the database
○ Primary Key - assures the uniqueness of the records on a given column and
does not allow null values (a table can have at most one primary key)
○ Unique Key - almost the same as the primary key, the differences being that it
allows null values and that more than one can be created per table
○ Foreign Key - links a record from the current table with a record from another
table using a unique key or primary key
○ I/O - Input/Output, the operations of reading from and writing to disk
○ Cache - the place in RAM where the most accessed and most recent data is
saved
2.3. Relational Database Management Systems
The first implementation of a Relational Database Management System was in 1974
by IBM. It was called System R and it was only a prototype. The first commercial Relational
Database Management System was released in 1979 by Relational Software (now Oracle
Corporation) and it was called Oracle.
Relational Database Management Systems are based on the relational model invented
by Edgar F. Codd. In these kinds of databases the information is stored in tables which are
connected with each other through relationships, and the records in the tables are stored as
tuples.
The tables are linked together using relations (Foreign Keys). The tables are
theoretically normalized using the third normal form, but in practice they can be
denormalized in order to achieve better performance for corner-case use cases. Almost all
tables have a primary key, which is used to uniquely identify records.
RDBMS vendors usually provide the SQL language as a method to query the tables
and also try to respect the ANSI SQL standard in order for the queries to be compatible
with other vendors and to offer developers an easier transition from one database to
another.
2.4. In-Memory RDBMS
In order to better understand what In-Memory Relational Database Management
Systems are, it must first be explained what they are not.
First of all, they are not In-Memory NoSQL (Not only SQL) databases. These
databases are not relational, most of them storing the data in a key-value store or
document-store format. One of the most popular databases of this kind is Redis. It is indeed
an In-Memory database, but it does not store the data in a relational way, as it uses
key-value storage. It offers isolation by using only one execution thread which serves all
the users in order.
Another database type with which they must not be confused are the classic database
systems which are deployed as virtual machines directly in memory [7]. These can of
course be faster than being deployed on disk, but they do not take advantage of the different
memory type (RAM instead of hard disk). Another disadvantage of this kind of database is
that in case of an outage the user can very easily lose all the information stored in it.
These databases must also not be mistaken for distributed caching systems. Caching
systems save the most frequently accessed data of an application, and if the user has enough
machines at their disposal they can hold all the information from the database. These
caching mechanisms do not offer durability, which is why they are mostly not used to store
information that is crucial to the user.
The principal characteristic of an In-Memory Relational Database Management
System is that it offers much faster access to the data, thanks to the much lower latency of
RAM access in comparison with disk access, and it also uses specialized data structures in
order to reach the information faster. Because the information is saved directly in RAM, a
caching mechanism is no longer needed; the exception is when the data is saved in dual
mode (both on the hard drive and in RAM), in which case the database may choose the
faster way to access the data (it may use the cache when it chooses to access the data from
the hard disk). As will be presented in the next chapter, In-Memory Relational Database
Management Systems try to offer a hybrid system between OLTP and OLAP, named HTAP
or NewSQL.
2.5. The durability of the In-Memory RDBMS
Since the data is saved in RAM, memory which is lost in case of an outage,
durability becomes an important problem which needs to be solved. It can be addressed by
using backups, which are made over longer periods of time, when the number of users is
lower. These are used alongside transaction logs, which will be replayed on top of the
backup in case of an outage.
Another method is implementing a high-availability system, which consists in
using another machine in parallel (most of the time located in another location, called the
disaster recovery site) which continually receives instructions from the primary server; in
case the primary server has an outage, a failover process will start, making the secondary
machine the new primary. The clients will automatically connect to the newly assigned
primary machine, so the loss of information will be very small, the failover usually being
done automatically and very fast.
Durability can also be assured by some In-Memory RDBMSs by saving the data in
dual format (both on disk and in memory), this way no information being lost in case of a
server restart, as the data will still be available on the hard disk.
3. OLTP vs OLAP
Historically, databases are split in two categories: OLTP and OLAP. In this chapter I
will highlight the main differences between the two and the way in which In-Memory
RDBMSs try to unify them under a single system.
3.1. Online Transaction Processing
Online transaction processing, most often abbreviated as OLTP, is the most widely
used kind of database. The most important characteristics of online transaction processing
databases are:
● they contain real-time data which comes from a lot of short and fast CRUD
(create, read, update and delete) operations
● they use simple queries which usually return or modify very few records
● most of the applications using this kind of database must be able to provide
high user concurrency
● the schema is highly normalized and contains a lot of tables
3.2. Online Analytical Processing
Online Analytical Processing, abbreviated as OLAP, represents the batch processing
kind of database. It is mostly used in data warehouse environments. The most typical
attributes of Online Analytical Processing databases are:
● it usually helps in planning and decision making
● it provides multi-dimensional views of consolidated historical data
● the data is usually consolidated from multiple online transaction processing
systems by periodic batches, or from other user-provided systems
● the queries are much more complex than the ones from online transaction
processing systems and usually contain a lot of aggregations and involve a
far larger number of records
● it must provide timely answers to questions involving high resource usage
● the schema is denormalized using a star/snowflake design
3.2.1. Star Schema
The Star Schema is mainly used in Data Warehouse systems. It usually has one or
more fact tables which reference a number of dimension tables.
The fact tables contain transaction records with a transaction id, together with ids
which are foreign keys to the dimension tables.
The dimension tables contain the actual information which needs to be accessed. This
way, when doing some computation, the user only accesses records from the tables which
actually hold the needed information, and by doing so the system reduces the I/O needed,
which is the bottleneck in many database systems.
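The fact/dimension split above can be illustrated with a small aggregation sketch. The table and column names here are hypothetical, chosen only for illustration; the dimension table is touched solely for the foreign-key lookup, while the other dimensions never need to be read.

```python
# Star-schema sketch: a fact table referencing one dimension table.
dim_department = {1: "SALES", 2: "HR"}       # dimension: id -> department name

fact_salaries = [                            # fact rows: (dept_id, salary)
    (1, 1000), (1, 1500), (2, 1200),
]

def salary_by_department(facts, departments):
    """Aggregate the fact table, resolving names via the dimension table."""
    totals = {}
    for dept_id, salary in facts:
        name = departments[dept_id]          # foreign-key lookup into the dimension
        totals[name] = totals.get(name, 0) + salary
    return totals

print(salary_by_department(fact_salaries, dim_department))
```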
3.2.2. Snowflake Schema
The Snowflake Schema closely resembles the Star Schema, the main difference
between the two being that the Star Schema contains only dimensions directly linked to the
fact tables, while the Snowflake Schema can also contain dimensions linked to each other.
A dimension may even be linked only to another dimension.
3.3. Hybrid Transactional/Analytical Processing
As pointed out above, the two have very different use cases; historically, the data has
been saved in both formats, usually in different systems. This meant a lot of duplicated
data, which until a few years ago translated into much higher expenses for the customers.
As technology advanced and hardware became much cheaper, a new kind of
database system started to be used, called Hybrid Transactional/Analytical Processing
(HTAP). This kind of Relational Database Management System has the capabilities of both
OLTP and OLAP systems. The term was coined by Gartner, Inc., a firm specialized in
comparing products by category and placing them in its four "magic quadrant" positions. It
is used mainly by In-Memory database systems.
4. Row Store vs Column Store
Currently, the records in an RDBMS can be stored in row store or column store
format.
4.1. Row Store
In the first database implementations the records were saved in a row store format.
This means that for each record in the database a tuple is saved with a value for each field,
the values being stored next to each other and separated by a special character.
An example of how the records are saved is depicted below, in Table 1.
EMP_NAME JOB HIRE_DATE SALARY BONUS
SMITH SALESMAN 14/05/2015 1000 20
MACBETH MANAGER 13/01/2014 1000 25
MACBETH SALESMAN 20/06/2015 1500 20
Table 1 : Row Store Data Saving
A low-level view of the row store format of the same records:
1: SMITH, SALESMAN, 14/05/2015, 1000, 20 -
2: MACBETH, MANAGER, 13/01/2014, 1000, 25 -
3: MACBETH, SALESMAN, 20/06/2015, 1500, 20
The first visible advantage is that a record can easily be read, added, deleted or
updated, as the database can perform a continuous read at the respective row id.
A disadvantage is that if the user only wants to read, for example, the EMP_NAME
and BONUS fields, the whole record still has to be read, which means a lot more I/O is
needed; this can greatly affect performance, especially when a large number of records is
involved.
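The difference in access patterns can be sketched as follows, using the employee rows from Table 1. This is an illustrative model only; real engines work with pages and byte offsets, not Python lists.

```python
# Row store: each record is one contiguous tuple.
rows = [
    ("SMITH",   "SALESMAN", "14/05/2015", 1000, 20),
    ("MACBETH", "MANAGER",  "13/01/2014", 1000, 25),
    ("MACBETH", "SALESMAN", "20/06/2015", 1500, 20),
]

# Column store: one contiguous list per field.
columns = {
    "EMP_NAME":  ["SMITH", "MACBETH", "MACBETH"],
    "JOB":       ["SALESMAN", "MANAGER", "SALESMAN"],
    "HIRE_DATE": ["14/05/2015", "13/01/2014", "20/06/2015"],
    "SALARY":    [1000, 1000, 1500],
    "BONUS":     [20, 25, 20],
}

# Reading one whole record is a single contiguous read in the row store...
record = rows[1]

# ...while summing a single field touches only one column in the column
# store, instead of scanning every full record.
total_salary = sum(columns["SALARY"])
print(record, total_salary)
```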
4.2. Column Store
The column store format is a new kind of saving the data which started to gain
popularity in RDBMS implementations only recently. In this format the user has separately
each column with the values of all the records from it.
An example of how the records are saved in a column store format can be seen bellow,
in Table 2.
EMP_NAME SMITH MACBETH MACBETH
JOB SALESMAN MANAGER SALESMAN
HIRE_DATE 14/05/2015 13/01/2014 20/06/2015
SALARY 1000 1000 1500
BONUS 20 25 20
Table 2 - Column Store Data Saving
A high-level view of the column store format:
SMITH: 1, MACBETH: 2, MACBETH: 3 -
SALESMAN: 1, MANAGER: 2, SALESMAN: 3 -
14/05/2015: 1, 13/01/2014: 2, 20/06/2015: 3 -
1000: 1, 1000: 2, 1500: 3 -
20: 1, 25: 2, 20: 3
As can be seen in the example above, some values appear multiple times. In order to
reduce the space used by the column store format and also to reduce the I/O operations
needed, the database can compress the values at the expense of CPU [10].
A simple compression example is provided below:
SMITH: 1, MACBETH: 2; 3 -
SALESMAN: 1; 3, MANAGER: 2 -
14/05/2015: 1, 13/01/2014: 2, 20/06/2015: 3 -
1000: 1; 2, 1500: 3 -
20: 1; 3, 25: 2
The first visible advantage of the column store is that the database can perform
fewer I/O operations by reading only the data of interest. For example, if the user wants to
read only the data from the EMP_NAME field, the database does not need to also read the
values of the other fields. A second advantage is, as noted above, that the database can use
compression to reduce both the I/O operations and the consumed space.
A disadvantage of using the column store format is that reconstructing whole tuples
might require multiple seeks.
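The compression idea above can be illustrated with a simple run-length encoding over a sorted column. This is a toy sketch of one of several possible schemes (dictionary encoding and run-length encoding are mentioned later in the Oracle discussion), not any vendor's actual format.

```python
def rle_encode(column):
    """Collapse runs of equal consecutive values into [value, count] pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1                 # extend the current run
        else:
            runs.append([value, 1])          # start a new run
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the full column."""
    return [value for value, count in runs for _ in range(count)]

# Sorting first groups duplicates together, which makes the runs longer.
job_column = sorted(["SALESMAN", "MANAGER", "SALESMAN"])
encoded = rle_encode(job_column)
print(encoded)                               # fewer entries than the column
assert rle_decode(encoded) == job_column     # compression is lossless
```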
4.3. Row Store vs Column Store
In conclusion, column-oriented organizations are more efficient when aggregation
operations are performed over many rows but only a small subset of the fields is selected,
as the amount of data read is much smaller. The column store format is also faster when
many new values are provided for a single field at once in an update statement, as only the
values of that field are accessed in order to change the data.
Row-oriented organizations are more efficient when a larger number of fields is
accessed at once from a record, as the field values are next to each other; if the record is
small enough, a single database seek may suffice. The same holds when a new record is
added to the database, as the whole tuple can be written in one operation.
These advantages and disadvantages make the row store format well suited for
OLTP kinds of applications, while the column store format is best used in an OLAP
environment.
Attributes Accessed    Row-Store Execution Time    Column-Store Execution Time
2                      257.462 sec                 128.731 sec
3                      257.326 sec                 128.899 sec
4                      259.526 sec                 153.923 sec
9                      273.445 sec                 288.565 sec
15                     280.778 sec                 8543.667 sec
25                     290.199 sec                 20899.542 sec
Table 3 - PostgreSQL Row-Store vs Column-Store [12]
A study on the PostgreSQL database was made in order to compare the response
time of the row store vs the column store depending on the number of columns selected in
the query. A plot was also provided in the study:
Figure 1- Column-Store vs Row-Store Performance in PostgreSQL [12]
4.4. Row Store and Column Store in RDBMSs
Each RDBMS vendor provides a different implementation of these storage types.
Oracle Database 12c provides both the row store and the column store format, but
the row store is kept only on disk while the column store is kept only in RAM. The same
table is used, in the sense that the records on disk coincide with the ones in RAM. This
greatly improves read performance, as the Oracle Optimizer (a process which helps the
database find the best execution plan for retrieving the records) can now choose to read the
data from either the column store or the row store, depending on the type of calculation.
Oracle also provides different column store compression options. The two main
compression types are Warehouse compression and Archive compression. Warehouse
compression is ideal for query performance and is specifically oriented towards
scan-oriented queries, which are mostly used in data warehouses. It is suited for tables that
are queried frequently, as they occupy more space than with archive compression. Archive
compression, in contrast, is better for tables which are queried less frequently, as they need
more CPU in order to decompress the information but occupy less space. A few of the
algorithms used to compress the information are: Dictionary Encoding, Run Length
Encoding, Bit-Packing, and the Oracle proprietary compression technique called OZIP,
which offers extremely fast decompression tuned specifically for Oracle Database.
MemSQL also provides both row store and column store formats, but instead it
keeps the row store only in RAM and the column store only on disk. In this situation the
tables are not linked to each other, meaning the database does not have to update the values
in both locations on an update, delete or insert operation; the only loss is that the data from
the in-memory storage can be lost in case of an outage if no high-availability solution is in
place.
The most flexible solution is provided by SAP HANA, which supports all
combinations: the row store can be kept both on disk and in RAM, and the column store
can also be kept both on disk and in RAM. It also offers the option to choose whether the
tables are linked or not. Another good functionality provided by SAP HANA is that, when
inserting into a column store table, it has the option to initially save the new record in
cache in a row store format in order to improve the overall insert performance, the record
being moved into the column store when the database is less busy [4].
5. RDBMS Indexes
An index is an auxiliary data structure used to access data in the tables faster. The
cost of maintaining this extra structure is small in comparison with the read performance
gains. Usually the indexes contain some of the data from the table, on which the search can
be based, and also a row identifier which acts like a pointer to the values from the table, as
the fastest way to access records in a table is by using the row id.
The first indexes introduced in early RDBMSs were the B+ Tree indexes. After the
introduction of in-memory databases, new lock-free index structures have been invented
which are a better fit for RAM storage.
5.1. B+ Tree Indexes
The B+ Tree concept was first introduced in the 1970s. The B+ Tree is a balanced
tree which contains values only in the leaf nodes, the rest of the nodes being called
decision nodes. Another important difference from a balanced binary tree is that there are
bidirectional pointers between the leaf nodes, which greatly helps when scanning
consecutive values or when the database wants to scan the whole index (an operation
called full index scan), as it does not need to also traverse the decision nodes. The decision
nodes contain values which are compared with the searched value in order to find out
whether it is stored on the left or the right branch. The search cost is O(log n).
An example of a B+ Tree index can be seen below:
Figure 2 - B+ Tree [8]
As can be seen above, to find the value 19 it must be compared with only two other
values. If the database searched for the value directly in the table, it would have had to
compare it with 20 other values, and it would also have had to read whole rows during the
search if a row store format was used.
The main disadvantage is that the B+ Tree is not a lock-free structure: after each
update, insert or delete operation the database needs to rebalance the tree in order to retain
the read performance.
B+ Tree indexes are useful only when the selectivity is high (if the proportion of
duplicates is around 10% or more, the query will be faster accessing the table directly) or
when all the needed fields are found inside the index.
This index is good at searching for unique values and also at searching for a range
of values, as the records are stored in consecutive order.
5.2. Bitmap Indexes
Bitmap indexes were also introduced early in most RDBMSs. They were first
invented in 1985 and are best used when the number of distinct values is very low.
An example of a bitmap index when storing cardinalities:
Figure 3 - Bitmap Index [16]
As can be seen above, when a search is made, for example for a given region, the
rows which contain the value can easily be found by reading the corresponding bitmap
row, e.g. North (the cells which have a dot inside them contain the value).
The bitmap index can be updated easily, but in practice it is not used very often, as
there are not many searches done on fields with low selectivity; also, for example, Oracle
Database provides bitmap indexes only in its Enterprise Edition.
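The bitmap idea can be sketched with one bit vector per distinct value of a low-cardinality column. Python integers stand in for the bit vectors here; the column values are hypothetical examples.

```python
# Bitmap index sketch: one bit vector per distinct column value.
region_column = ["North", "South", "North", "East", "South", "North"]

bitmaps = {}
for row_id, value in enumerate(region_column):
    bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)   # set bit row_id

def matching_rows(bitmap):
    """Decode a bit vector back into the row ids whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

# A point query is a single bitmap read, and combining predicates becomes
# cheap bitwise arithmetic on the vectors.
north = bitmaps["North"]
north_or_east = bitmaps["North"] | bitmaps["East"]
print(matching_rows(north), matching_rows(north_or_east))
```

Combining predicates with `|` and `&` on whole vectors is the main reason bitmap indexes shine on low-cardinality fields.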
5.3. Skip Lists
This type of index was introduced in 1990, almost 20 years after the well known B+
Tree index. Unlike the B+ Tree index, it does not guarantee a fixed number of steps to find
the results. It relies on probabilities, but the chance of not returning a result in a satisfying
time is very low, the average performance being comparable with that of a B+ Tree index.
This type of index fits well with the In-Memory database architecture because it can take
advantage of the RAM hardware. Another big advantage of this type of index is that,
unlike its B+ Tree counterpart, it does not block resources, as it does not need to be
rebalanced like the B+ Tree. However, if the user sees that the performance of the skip list
is degrading, it can be rebuilt in order to have the data better scattered.
Depending on the number of records in the table, the database builds this index
using a number of lanes, representing the index depth. After the number of lanes is
established by the DBMS, it starts doing the inserts. The first record is added, and after
that a “coin toss” is performed in order to establish the level on which the record will be
placed. It starts with the lowest level and uses the “coin toss” technique on each level until
the toss returns a negative result. Depending on the DBMS implementation, each higher
level can have a higher chance of a negative result, in order to keep the skip list well
balanced. This step represents the probabilistic stage which determines the level on which
each record is placed. To add the rest of the records, the database also proceeds from left to
right and descends until it finds the searched value in the index or reaches the first level.
The leftmost side is considered minus infinity and the rightmost side plus infinity. Starting
from the left (as the value is bigger than minus infinity), the database compares the value
with each of the values in the towers on that level until it finds one which is bigger than
(or equal to) the searched value. At that moment it goes one level lower and repeats the
same steps as on the level above, until it finds the searched value or reaches the first level.
In case the searched value is already in the index, the record is added to the list of the
tower in which it was found; otherwise a new tower is created, in which case the
previously described coin toss technique is restarted in order to determine the tower’s
height.
As can be seen, a new tower is simply added between two others, for which there is
no need to block the whole index; this way concurrent users can still use the skip list.
When searching for a value in this index, a technique related to the one used at
insert is applied, doing a left-to-right and top-to-bottom search until a tower with the
searched value is found or the index is exhausted. If the searched value is found, all the
records saved in that tower are returned.
A search example is given below, in Figure 4.
In case the database wants to delete records, it uses the same technique as the one
from the search, the only difference being that the tower is deleted after it is found. As in
the case of the insert, the database does not need to block the whole index, the RDBMS
simply deleting the searched tower.
This type of index is implemented by the In-Memory RDBMS MemSQL, being the
one recommended by this database vendor. It has good performance at both unique value
searches and range searches, as all the values in the skip list are placed in consecutive
order.
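The insert and search procedure described above can be sketched as follows. This is a minimal, single-threaded illustration of the coin-toss technique; MemSQL's production skip list is lock-free and far more involved.

```python
import random

class SkipNode:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height          # one forward pointer per lane

class SkipList:
    """Toy skip list: coin-toss tower heights, search from the top lane down."""
    MAX_HEIGHT = 8

    def __init__(self):
        self.head = SkipNode(None, self.MAX_HEIGHT)  # acts as minus infinity

    def _toss_height(self):
        h = 1
        while h < self.MAX_HEIGHT and random.random() < 0.5:
            h += 1                           # each extra lane needs another win
        return h

    def insert(self, key):
        update = [self.head] * self.MAX_HEIGHT
        node = self.head
        for lane in reversed(range(self.MAX_HEIGHT)):
            while node.next[lane] and node.next[lane].key < key:
                node = node.next[lane]       # move right while keys are smaller
            update[lane] = node              # remember where we descended
        new = SkipNode(key, self._toss_height())
        for lane in range(len(new.next)):    # splice the tower into its lanes
            new.next[lane] = update[lane].next[lane]
            update[lane].next[lane] = new

    def contains(self, key):
        node = self.head
        for lane in reversed(range(self.MAX_HEIGHT)):
            while node.next[lane] and node.next[lane].key < key:
                node = node.next[lane]
        node = node.next[0]                  # candidate on the bottom lane
        return node is not None and node.key == key

sl = SkipList()
for k in [19, 4, 31, 9, 23, 15]:
    sl.insert(k)
print(sl.contains(19), sl.contains(20))
```

Note that the splice in `insert` only rewires the pointers around the new tower, which is why a real implementation can avoid locking the whole index.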
5.4. Hash Index
This type of index has been implemented both by classic RDBMSs like SQL Server
and by new competitors like MemSQL. It performs well only when the database is doing a
unique value search and performs very badly on range scans, which is why some RDBMS
vendors allow customers to create this index only on primary or unique keys. The records
of the index are stored in buckets which contain a number of different values. When a new
value is inserted, a new bucket is created for it, unless the hash function returns the value of
an already existing bucket. When a search is made, the hash function is applied to the
searched value, and after the database establishes in which bucket it is saved, it searches
for the record there (it is possible that the returned hash value exists but the record does
not exist in that bucket) [17].
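The bucket mechanism above can be sketched as follows. This is an illustrative model only, not SQL Server's or MemSQL's actual structure; the keys and row ids are hypothetical.

```python
# Hash index sketch: the hash of a key picks a bucket; the bucket is then
# scanned, since different keys can hash to the same bucket.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]       # bucket -> (key, row_id) list

def insert(key, row_id):
    buckets[hash(key) % NUM_BUCKETS].append((key, row_id))

def lookup(key):
    """Constant-time bucket pick, then a short scan inside the bucket."""
    for k, rid in buckets[hash(key) % NUM_BUCKETS]:
        if k == key:                             # hash matched; confirm the key
            return rid
    return None                                  # bucket exists, key absent

insert("SMITH", "r1")
insert("MACBETH", "r2")
print(lookup("MACBETH"), lookup("JONES"))
```

Because the hash scatters keys with no regard for their order, a range scan would have to visit every bucket, which is exactly why this index performs badly on range queries.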
Figure 4 - Skip List Search [9]
5.5. Column Store Indexes
Different DBMS vendors have built proprietary indexes to enable faster searches of
the data. An index of this type was implemented by Oracle Database in order to help
searching for records in its In-Memory column store and in its Exadata column store.
5.5.1. Column Store Indexes used by Oracle Database
As presented above, Oracle Database uses a proprietary column store index. For
storing the column store data, Oracle uses a new storage unit called the IMCU (In-Memory
Compression Unit). Each IMCU contains a number of records, based on their size. When a
search is done over these IMCUs, the database first accesses the SMU (Snapshot Metadata
Unit), which contains information about the IMCU; more exactly, it is interested in the
minimum and maximum values from that IMCU, in order to know whether it needs to
access it or not, this way saving I/O time and accelerating the retrieval of the information.
This type of index is created automatically by Oracle Database. The current documentation
does not specify the criteria by which tables are chosen to have such indexes built, so one
can only speculate that the indexes are created for the most used tables, presumably when
the database has a lower load.
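The min/max pruning described for the SMU can be sketched as follows. This is only an illustration in the spirit of that metadata check; Oracle's actual IMCU/SMU layout is proprietary and the unit contents here are made up.

```python
# Min/max pruning sketch: each unit stores a chunk of column values plus
# its minimum and maximum, mimicking the SMU metadata check.
units = []
for chunk in ([3, 8, 5], [12, 15, 11], [40, 33, 37]):
    units.append({"min": min(chunk), "max": max(chunk), "values": chunk})

def scan_equal(target):
    """Scan only the units whose [min, max] range can contain the target."""
    hits, units_read = [], 0
    for unit in units:
        if unit["min"] <= target <= unit["max"]:  # metadata check, no data I/O
            units_read += 1
            hits += [v for v in unit["values"] if v == target]
    return hits, units_read

print(scan_equal(15))
```

Only one of the three units is actually read for the value 15; the other two are skipped on the strength of their metadata alone, which is where the I/O saving comes from.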
6. High Level Architecture of In-Memory RDBMS
This chapter presents the general architecture implemented by various In-Memory
database vendors. The architectures studied in this chapter will be the ones provided by
MemSQL, VoltDB, Oracle and NuoDB.
6.1. MemSQL
MemSQL is regarded as one of the fastest In-Memory database solutions, one of its main competitors being the NoSQL (Not only SQL) database provider Redis. It is a two-layered database: the first layer is called the aggregator layer and the second one the leaf layer. The simplest implementation contains one aggregator node and one leaf node, but the configuration recommended by the vendor is five leaf nodes for each aggregator node. The actual data is stored on the leaf nodes. Because the architecture is of the shared-nothing type, the information is automatically distributed between the leaf nodes in such a way that each record is stored on only one leaf node; this way no duplicate data is stored and redundancy is avoided. The aggregator layer contains the metadata about the leaf nodes and, using partitioning, automatically knows which leaf nodes need to be queried in order to return the desired data [6]. For the best performance gain it is recommended to use many different machines in order to have as many leaves as possible. To achieve high availability and avoid losing data in case of an outage, the user can create additional availability groups, which contain the same data as the first availability group that is normally accessed; if the main availability group fails, an automatic failover is performed. MemSQL automatically rebalances the data in case one of the partitions is no longer available, thus avoiding a single point of failure. The leaf nodes are the ones that execute the queries provided by the user, and only when all the data cannot be found on a single leaf node does the aggregator combine the data from the other leaf nodes. The only way the database can ensure that related data is all saved in the same partition is by using shard keys. For example, a table can have a shard key on the Department column and a primary key on the Employee and Department columns; this way the database ensures that all the employees from the same department are saved on the same leaf node.
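The co-location guarantee given by shard keys can be sketched as follows: rows are routed by hashing only the shard key column, so two rows with the same department always land on the same leaf, regardless of the other key columns. The routing function below is illustrative; MemSQL's actual partitioning scheme is internal to the product.

```java
// Sketch of shard-key routing: the leaf node for a row is derived from the
// shard key column alone, so all rows sharing that value are co-located.
public class ShardRouting {
    public static int leafFor(String shardKey, int leafCount) {
        // Non-negative leaf index derived from the shard key's hash.
        return Math.floorMod(shardKey.hashCode(), leafCount);
    }
}
```

With this routing, the rows ("Alice", "Sales") and ("Bob", "Sales") are guaranteed to be stored on the same leaf, because only the department value participates in the routing decision.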
The connection is made to the primary aggregator node, and when this node fails MemSQL automatically assigns another aggregator node as primary. In each cluster group there is only one primary aggregator node [1]. Its tasks include monitoring the cluster group and performing operations on it, and it is also the only aggregator node which executes DDL (data definition language) operations. The aggregator nodes can be considered a type of load balancer or network proxy.
6.2. VoltDB
VoltDB is also a new In-Memory Relational Database Management System, which had its first version available in 2010. It is the only official implementation of the academic H-Store project. The H-Store project was implemented by a team whose members come from Brown University, Carnegie Mellon University, the Massachusetts Institute of Technology and Yale University. The database researchers involved in the system's design are Michael Stonebraker, Sam Madden, Andy Pavlo and Daniel Abadi [15]. The H-Store project is the first implementation of the new class of parallel database management systems called NewSQL (also known by others as HTAP). This technology tries to provide the high throughput and high availability of NoSQL systems while still retaining the transactional guarantees of relational database management systems [13].
Figure 5 - MemSQL High Level Architecture
VoltDB saves all the data only in memory, a big difference from MemSQL and Oracle Database. If the database is shut down, it can recover the data from continuous snapshots and from command logging files [2], which assure the user that no transaction is committed without its commands being added to the log. It is ACID-compliant and has a shared-nothing architecture, meaning that each node holds different data from the same table, the data being automatically partitioned. When a query is submitted it can run separately on each node as a different transaction. This way the total query time is comparable with other databases, but the number of transactions it can run is much larger, as other transactions can run in parallel on other nodes [14]. VoltDB execution is single-threaded within each partition, meaning that locks, latches and concurrent transaction management are no longer needed; this helps the overall performance considerably, as these operations are among the most costly in classic database implementations. In order for the system to run optimally, it can replicate smaller tables in full on each node, so that when joins are needed each node can still run the query concurrently. Currently VoltDB supports ad-hoc DML (data manipulation language) queries only with auto-commit mode turned on. Transactions are supported only as stored procedures. The query modifications from a stored procedure act as a transaction, all the modifications being saved, or rolled back in case of an error.
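The command-logging durability mentioned above can be sketched as a log that records every command before it is applied, so that replaying the log after a crash reproduces the state. Real command logs are written to disk and combined with snapshots; this purely in-memory version, with our own class names, only shows the principle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.HashMap;
import java.util.Map;

// Sketch of command logging: every command is appended to a "durable" log
// before it is applied, so the state can be rebuilt by replaying the log.
public class CommandLogSketch {
    private final Map<String, Integer> state = new HashMap<>();
    private final List<String[]> log; // each entry: {key, value}

    public CommandLogSketch(List<String[]> log) {
        this.log = log;
    }

    public void put(String key, int value) {
        // Log BEFORE applying: no committed change can be missing from the log.
        log.add(new String[]{key, Integer.toString(value)});
        state.put(key, value);
    }

    public Integer get(String key) {
        return state.get(key);
    }

    // Recovery: rebuild the in-memory state by replaying the surviving log.
    public static CommandLogSketch recover(List<String[]> survivingLog) {
        CommandLogSketch db = new CommandLogSketch(new ArrayList<>(survivingLog));
        for (String[] cmd : survivingLog) {
            db.state.put(cmd[0], Integer.parseInt(cmd[1]));
        }
        return db;
    }
}
```

After a simulated crash that discards the in-memory state but keeps the log, replaying yields the last committed value for every key, which is the guarantee the text describes.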
6.3. Oracle Database
Oracle Database is at the moment one of the most used databases, and it was also the first commercially available relational database system. In its latest release, Oracle Database 12c, it offers an In-Memory database solution as an option to its Enterprise Edition [5]. To take advantage of the new option it is almost not necessary to make any changes to the database; the main change is that RAM needs to be allocated for the In-Memory store, a change which also requires a database restart. After the allocation is done, the user can choose which tables to add In-Memory, and also which of their columns should be loaded (columns the user deems unnecessary can be excluded). As a parameter for the respective tables the user can select a priority with which they will be added In-Memory. The options provided are: Critical, High, Medium, Low and None. The first four options represent the priority with which the tables will be loaded into memory after the database is started, while tables using the last option will be loaded into memory only after the first time they are queried. Another parameter which can be used when a table is added in memory lets the user choose the compression type of the records.
The options from which the user may choose are: MEMCOMPRESS, MEMCOMPRESS FOR DML, MEMCOMPRESS FOR QUERY LOW, MEMCOMPRESS FOR QUERY HIGH, MEMCOMPRESS FOR CAPACITY LOW and MEMCOMPRESS FOR CAPACITY HIGH [3], [11]. Depending on the chosen option the records can occupy less space at the cost of more CPU cycles for decompression, or occupy more space and require fewer CPU cycles. The compression can be very efficient, with records occupying between 2 and 20 times less space than on disk [11]; this is very important, as RAM capacity is in general much smaller than the space provided by the hard disk drive.
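For illustration, the per-table priority and compression settings described above combine into DDL statements along the following lines. The table name is hypothetical and the statement is only assembled as a string, not executed against a database; the clause shapes follow the documentation cited in [11].

```java
// Assemble an ALTER TABLE statement combining an In-Memory compression
// level and a load priority, as a string (no database connection involved).
public class InMemoryDdl {
    public static String alterInMemory(String table, String priority, String compression) {
        return "ALTER TABLE " + table
             + " INMEMORY MEMCOMPRESS " + compression
             + " PRIORITY " + priority;
    }
}
```

For example, `alterInMemory("sales", "HIGH", "FOR QUERY LOW")` produces a statement that loads the hypothetical `sales` table with high priority and query-optimized compression.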
The durability of the records is provided by storing the information both on disk and in RAM. A new process was added in Oracle Database 12c which takes care of storing the data in RAM, running in parallel with the classic process which saves the data to the hard drive. Thanks to the dual format in which the data is saved, the optimizer can decide whether it wants to read the data from the hard disk or from RAM.
Table 4 - Oracle Compression Size and Elapsed Time
7. Experimental Results
In order to compare the performance of saving the data in memory against saving the data on disk, a few virtual machine instances will be created in which the data is saved in both formats. Virtual machines are used in order to provide a clean operating system and to make it easy to reproduce the results in further research.
7.1. Tests Description
The tests contain queries which follow the ANSI (American National Standards Institute) standard format. The main query types tested were combinations of the following: WHERE-filtered records, aggregations, selecting all rows, pagination, joins and counting rows. For each combination the tests were run selecting all columns, one column or some of the columns. The use of indexes was also tested in order to see the performance gains.
As in most cases all or almost all of the rows were selected, the queries that were run are of CTAS (Create Table As Select) type. This means that the queries share a common elapsed time of writing to disk, as the same type of table is created in every case, in order to have relevant results.
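The CTAS wrapping described above amounts to prefixing each measured SELECT with a CREATE TABLE clause. A minimal helper (the naming is our own, not part of any vendor API) could look like this:

```java
// Wrap a SELECT into a CREATE TABLE ... AS statement so that result rows
// are written to a table instead of being fetched and displayed.
public class CtasWrapper {
    public static String wrap(String targetTable, String select) {
        return "CREATE TABLE " + targetTable + " AS " + select;
    }
}
```

Every measured query then pays the same table-creation cost, which is the property that keeps the in-memory and on-disk timings comparable.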
In order to compare in-memory storage against the cache, some of the queries were run multiple times to see whether the cache can improve the table creation time.
The tests will be made on the following databases: MemSQL, VoltDB, Oracle and NuoDB. The tests compare the disk performance versus the in-memory performance of the same database (the performance between different vendors will not be compared).
A sample query from each of the tested databases is provided below.
7.2. Test Case Setup
The hardware on which the virtual machines were deployed has the following configuration: Intel Core i5-4590 at 3.30 GHz, Kingston HyperX BEAST 32GB (4x8GB) DDR3, 1600MHz, CL9, 1.5V, XMP, and a Western Digital Blue 1TB hard disk, 7200rpm, 64MB cache, SATA 3. The operating system used on the physical machine was Windows 10 Pro.
The virtualization solution used was VirtualBox, provided by Oracle. Each deployed VM was assigned 4 cores and 24 GB of RAM. The operating systems used were Oracle Linux 6.7 and Oracle Linux 7.1; the server edition was used in order to reduce as much as possible the hardware resources consumed by the operating system itself.
Although some of the databases work better with multiple instances or nodes, for test and comparison purposes one node should be enough, as both the disk tests and the in-memory tests run on a single node.
The database versions used in the tests are: MemSQL, VoltDB 6.3, Oracle Database 12.1.0.2 and NuoDB 2.4.1.2.
The main tables in which the data is saved have 10 columns and either 100,000 or 5,000,000 records, while the smaller tables have 10,000 or 500,000 records. Tests were made with both the small tables and the big tables in order to see if there is any significant difference between the two. The records were generated randomly, from a Java application for MemSQL and VoltDB and from inside Oracle for Oracle Database.
7.3. Running the Queries
In order to run the queries, a simple Java program was made which ran them through JDBC (Java Database Connectivity). Even though JDBC does not offer the best connectivity to the database (in comparison with the command line provided by the database vendor), the tests should not be affected, as the same connection type was used both for the in-memory calls and for the on-disk tables. The Java application runs each query type five times and saves the average execution time. The queries were the same for each database used, the single difference being query compatibility issues, in which case the query was rewritten so as to produce the same result. I connected directly to the virtual machine using PuTTY and ran the Java program directly on the machine in order to avoid network connectivity issues.
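The measurement loop can be sketched as follows. The query execution is abstracted behind a `Runnable`, since the actual JDBC connection details depend on each vendor's driver; the helper name is our own.

```java
// Run a task a fixed number of times and return the average elapsed time
// in milliseconds, mirroring the five-run averaging used in the tests.
public class QueryTimer {
    public static double averageMillis(Runnable task, int runs) {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run(); // e.g. statement.execute(ctasQuery) over JDBC
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (runs * 1_000_000.0);
    }
}
```

In the real harness the task would execute one CTAS statement over an open JDBC connection; averaging several runs smooths out one-off scheduling noise.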
For each query a table was created, which was deleted at the end of the run. This was done in order to avoid display-time overhead, as a big number of rows was selected in each query. Only the average run time of the queries was recorded, and in order to avoid better performance in consecutive runs the database cache was cleared after each query.
Another batch of tests was run in which 10 concurrent users executed the same query, in order to also test query concurrency. These batches of tests were implemented using Apache JMeter, a performance testing tool. In these tests no tables were created.
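The 10-user concurrency scenario can also be reproduced without JMeter by using a plain thread pool, along these lines; the query is again abstracted as a task, and the class name is our own.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Run the same task once per simulated user, all in parallel, and report
// how many runs completed without throwing an exception.
public class ConcurrentRun {
    public static int run(Runnable task, int users) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        CountDownLatch done = new CountDownLatch(users);
        AtomicInteger ok = new AtomicInteger();
        for (int i = 0; i < users; i++) {
            pool.submit(() -> {
                try {
                    task.run();
                    ok.incrementAndGet();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return ok.get();
    }
}
```

Counting only the runs that finished cleanly is useful for exactly the failure mode observed later in the thesis, where some concurrent queries did not complete.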
The list of queries used:
● Query 1: Small table (10,000) all rows, all columns selected
● Query 2: Small table all rows, only 2 columns selected
● Query 3: Small table count rows
● Query 4: Small table with a like filter
● Query 5: Small table with a range filter
● Query 6: Small table with a where filter
● Query 7: Small table with average on one column
● Query 8: Small table with sum on one column and a where filter
● Query 9: Small table with group by and an average
● Query 10: Join between two small tables (10,000 and 1,000), all columns
selected
● Query 11: Join between two small tables with count
● Query 12: Join between two small tables with a group by and group functions
● Query 13: Big table (5,000,000) all rows, all columns selected
● Query 14: Big table all rows, only 2 columns selected
● Query 15: Big table count rows
● Query 16: Big table with a like filter
● Query 17: Big table with a range filter
● Query 18: Big table with a where filter
● Query 19: Big table with average on one column
● Query 20: Big table with sum on one column and a where filter
● Query 21: Big table with group by and an average
● Query 22: Join between two big tables (10,000 and 1,000), all columns
selected
● Query 23: Join between two big tables with count
● Query 24: Join between two big tables with a group by and group functions
7.4. Test Case Comparison
As stated above, the thesis proposes to compare results only within the same database, in-memory versus disk storage. The result times between databases will not be compared; one of the main reasons is that the table creation time may differ between the solutions, and the types of solutions provided by the vendors also differ, so the different architectures make a direct comparison between them meaningless.
Only the read speeds were compared; the write, update and delete speeds were not.
A number of different query types were used in order to also observe some of the drawbacks of using tables saved in memory. Some differences were also caused by the type of storage used, row store versus column store, meaning that in-memory storage is not the only factor to be taken into account when comparing the results. All the queries have a CREATE TABLE AS added at the beginning, because the number of records to display is too big and because the creation and insertion of rows should cost the same in both cases.
8. Conclusions
In order to better illustrate the results, the query elapsed times were saved in the same table layout. The first row holds the query number, the second row the elapsed time of the query which accessed the disk, and the third row the elapsed time of the query which used in-memory access. VoltDB has only two rows in its table because it does not offer querying from disk, meaning it has only in-memory tables.
8.1. MemSQL Results
Query Number Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12
Disk elapsed time (ms)
102 124 22 64 55 57 45 30 58 134 72 66
In-Memory elapsed time (ms)
75 53 74 82 38 49 47 27 84 83 71 91
Table 5 - MemSQL small tables elapsed time
Query Number Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22 Q23 Q24
Disk elapsed time (ms)
18757 10570 1999 1159 1478 4235 1560 215 447 26063 7453 2170
In-Memory elapsed time (ms)
14118 6352 2550 880 387 6872 1022 567 2285 33772 8712 3172
Table 6 - MemSQL big tables elapsed time
Figure 6 - MemSQL query comparison (On Disk vs In-Memory, queries 1-24)
As seen in the results above, most MemSQL queries have about the same execution times, but a few queries run better on disk while most of them ran better in-memory. The reason for this distribution of results is most probably the data store type used by the compared queries, row store versus column store. This means that database administrators, developers and architects must be very careful when choosing the data store type for each table in order to obtain the best performance gains. The solution recommended by MemSQL is to use in-memory tables for OLTP operations and on-disk tables for OLAP operations. This gives the best performance most of the time, but it also matters because tables used in OLAP can grow huge with historical data that does not need to be accessed too often and would occupy too much space in RAM.
But as we can see in Figure 7 and Figure 8, when the queries were run concurrently by 10 users the MemSQL database performed much better in-memory than from disk. The difference is very evident, meaning that the In-Memory part of MemSQL has been tuned to work better in an OLTP environment where a lot of concurrent users are present.
Figure 7 - MemSQL disk elapsed time
Figure 8 - MemSQL In-Memory elapsed time
An issue with MemSQL appeared, as seen above, when the same queries were run by ten concurrent users: the queries on the huge tables were not able to finish, as the heap space was depleted. This is due to the MemSQL architecture, which is optimized to work on many nodes at a time, so it should not be an issue when deploying MemSQL in production; this hypothesis should be tested further in order to confirm or refute it. Another reason why this should not be an issue in production is that most of the time users do not concurrently run expensive operations; these are mostly done in batches in OLAP databases, and administrators can schedule such queries as needed after testing them first.
8.2. VoltDB Results
Query Number Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12
In-Memory elapsed time (ms)
283 240 220 233 233 235 242 235 271 965 890 915
Table 7 - VoltDB single user query elapsed time (default 100 MB temp space)
Figure 9-VoltDB multi-user query elapsed time
As we can see, in comparison with the other database results, VoltDB has many queries which did not complete successfully, as it did not have enough memory to finish them. Initially it had only the default 100 MB temp tablespace (a default chosen for better overall performance), but even after raising the limit it still did not have enough memory to finish the queries successfully.
As also stated in the VoltDB documentation, the system is specialized mainly in OLTP-style queries, and when heavy calculations are needed they require a good partitioning of the big tables across as many nodes as possible.
In conclusion, the VoltDB architecture is clearly not made to scale up, but it may still have very good scale-out capabilities. This should be tested in the future by using at least four commodity servers across which the computation is distributed, in order to see whether there are scenarios where VoltDB can accommodate bigger result sets with the desired performance.
8.3. Oracle Results
In order to properly run the Oracle queries, optimizer hints had to be used: INMEMORY and NO_INMEMORY were added to the queries in order to ensure that the Oracle optimizer used the intended access path in each case.
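As an illustration of the hint placement, the two variants of a query can be generated as below; the table name is hypothetical, and the strings are only assembled, not executed.

```java
// The same query forced down each access path via Oracle optimizer hints:
// INMEMORY reads from the In-Memory column store, NO_INMEMORY from disk.
public class HintedQueries {
    public static String inMemory(String table) {
        return "SELECT /*+ INMEMORY(" + table + ") */ COUNT(*) FROM " + table;
    }

    public static String onDisk(String table) {
        return "SELECT /*+ NO_INMEMORY(" + table + ") */ COUNT(*) FROM " + table;
    }
}
```

Running both variants of each query makes the measured difference attributable to the access path rather than to the optimizer's own choice.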
Query Number Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12
Disk elapsed time (ms)
47 15 9 15 11 15 12 12 13 67 11 19
In-Memory elapsed time (ms)
118 15 12 19 17 16 9 8 12 51 54 19
Table 8-Oracle small tables query elapsed time
Query Number Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22 Q23 Q24
Disk elapsed time (ms)
8761 2421 174 584 229 258 377 242 2040 13714 916 3102
In-Memory elapsed time (ms)
10867 2594 41 428 11 10 262 8 1924 14629 642 2750
Table 9-Oracle big tables query elapsed time
As we can see in the results table, most of the queries had very close execution times, but unlike the MemSQL results there were some queries with a very big time difference. As expected, the main differences favoured the in-memory side, with faster execution times.
Figure 10 - Oracle single user elapsed time comparison (On Disk vs In-Memory, queries 1-24)
Figure 11 - Oracle on disk multi-user scenario
As we can see in the two JMeter results above, the main difference can be found at queries 14, 16, 17 and 19, where the in-memory solution had a much better performance. These were all queries which had a filter. For the rest of the queries the time difference is not as obvious, but overall the in-memory queries seem to be a little faster.
8.4. Final conclusion
Although the difference between most of the use cases tested in this thesis is not so obvious, it is clear that there are still instances in which the in-memory databases have a much better performance. A more thorough analysis of these instances can be made in the future in order to find the best fit for in-memory databases.
Another issue, which will probably be solved in the future, is the lack of online documentation for most of the IMDBs, as well as the smaller set of features implemented in them in comparison with more mature databases. This was to be expected, as this is a new type of database with only a few years of development behind it.
From comparing the test results of the different databases, we can see that Oracle Database had the fastest row retrieval. In the tests created we used only one machine, meaning vertical scaling, a scenario for which Oracle Database is tuned. The other tested in-memory databases are designed to be horizontally scalable, meaning that they could easily surpass the Oracle Database performance if enough machines are used to process the results.
Figure 12 - Oracle In-Memory multi-user scenario
8.5. Further Research
As further research, the In-Memory Relational Database Systems must also be tested using more than one machine, as their architecture is more suitable for that kind of design. The number of machines should be varied in order to find the improvement threshold of adding them. For those tests the virtual machines should be installed on different physical machines, as a single host cannot easily accommodate the parallel running of the machines and its specification is most of the time not sufficient.
In the future they should also be compared with other in-memory databases; the NoSQL databases in particular are fit for such tests.
Another aspect which was not covered in this thesis is the use of more complex database queries involving more than two tables, in order to test the partitioning and replication power of the IMDBSs.
9. Bibliography
[1] MemSQL official site, http://docs.memsql.com/docs/concepts-overview
[2] VoltDB official site, https://docs.voltdb.com/UsingVoltDB/
[3] Arup Nanda. Compressing Columns. Oracle Magazine, January/February 2010
[4] Hasso Plattner. The Impact of Columnar In-Memory Databases on Enterprise Systems,
2014
[5] Oracle official site, http://docs.oracle.com/database/121/index.htm
[6] MemSQL official site, http://docs.memsql.com/docs/distributed-sql
[7] Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, Michael Stonebraker. OLTP
Through the Looking Glass, and What We Found There, 2008
[8] Christophe. How does a relational database work, http://coding-geek.com/how-databases-work/
[9] Richard Clayton. Data Structures & Algorithms Lecture Notes,
http://bluehawk.monmouth.edu/rclayton/web-pages/f10-305/skiplists.html
[10] Daniel J. Abadi, Samuel R. Madden, Nabil Hachem. Column-Stores vs. Row-Stores:
How Different Are They Really?, 2008
[11] Maria Colgan. White Paper Oracle Database In-Memory, July 2015
[12] Aditi D. Andurkar. Implementation of Column-Oriented Database in PostgreSQL for
Optimization of Read-Only Queries, 2012
[13] VoltDB official site, https://docs.voltdb.com/
[14] VoltDB official site, https://voltdb.com/sites/default/files/tn-transactions.pdf
[15] https://en.wikipedia.org/wiki/H-Store
[16] Don Burleson. Oracle bitmap index maximum distinct values, http://www.dba-oracle.com/t_bitmap_index_maximum_distinct_values_cardinality.htm
[17] Greg Larsen. Review Your BUCKET_COUNT Statistics with DMV, http://www.databasejournal.com/features/mssql/review-your-bucketcount-statistics-with-dmv.html