What is the future of the RDBMS in the Enterprise?
School of Computer Science and Statistics
TRINITY COLLEGE
What is the future of the RDBMS in the Enterprise?
Stuart Clancy
Edward Fitzpatrick
Degree Year
BSc (Hons) Information Systems
11th April 2011
A Dissertation submitted to the University of Dublin in partial fulfilment of the
requirements for the degree of BSc (Hons) Information Systems
Date of Submission: 11th April 2011
Declaration
We declare that the work described in this dissertation is, except where otherwise stated,
entirely our own work, and has not been submitted as an exercise for a degree at this or any
other university.
Signed:___________________
Stuart Clancy
Date of Submission:
Signed:___________________
Edward Fitzpatrick
Date of Submission:
Permission to lend and/or copy
We agree that the School of Computer Science and Statistics, Trinity College may lend or
copy this dissertation upon request.
Signed:___________________
Stuart Clancy
Date of Submission:
Signed:___________________
Edward Fitzpatrick
Date of Submission:
Acknowledgements
We would like to acknowledge and thank Ronan Donagher, our project supervisor, and Diana Wilson, the acting course director, for their support, guidance and understanding throughout our research project.
We would also like to acknowledge the unfailing support of our families, who have encouraged us throughout the years of our study; our employers and work colleagues, who have been patient and flexible with working arrangements in order to allow us to complete our studies; and close friends, who were on occasion called upon to provide a welcome distraction and perspective.
Signed:___________________
Stuart Clancy
11th April 2011
Signed:___________________
Edward Fitzpatrick
11th April 2011
Abstract
Managing data and information has been a feature of human activity since the first acknowledged symbols were etched onto stones by Neolithic humans. Since the emergence of the Internet, the volume of data available to man and machine has been growing rapidly. This dissertation looks at what this means for the traditional relational database management system (RDBMS). It asks whether there is a future for the RDBMS in enterprise information system architecture. It also examines the early developmental years of the RDBMS in order to gain an insight into why it has enjoyed relative longevity within a rapidly changing technology environment. New types of database and data management systems are discussed, such as NoSQL and other open source non-relational DBMS including Hadoop and Cassandra. The data volume and data type problem is absorbed into various sections under the umbrella term ‘Big Data’. Utility companies and social networking sites, two sectors where the management of large data volumes is a growing concern, are examined in the two case studies. A separate chapter on our chosen research methodology is included. It provides the necessary balance between subject matter and method as set out in the initial requirements.
Keywords:
Relational Theory, DBMS, RDBMS History, NoSQL, Hadoop, Cassandra, Database Market,
Big Data, Research Methodology.
Table of Contents
Abstract....................................................................................................................................VI
List of Figures...........................................................................................................................X
List of Tables.............................................................................................................................X
List of Abbreviations...............................................................................................................XI
Chapter One - Introduction.................................................................................................... 1
1.1 The Research Question ........................................................................................... 1
1.2 Document Roadmap ................................................................................................ 2
Chapter Two - Literature review, findings and analysis ......................................................... 4
2.1 Introduction ............................................................................................................. 4
2.2 RDBMS................................................................................................................... 4
2.2.1 History of the RDBMS ....................................................................................... 10
2.2.2 Main Features of ‘true’ RDBMS......................................................................... 13
2.2.3 IBM, Ellison and the University of California, Berkeley....................................... 15
2.3 New Databases ...................................................................................................... 19
2.3.1 Features of NoSQL Databases ............................................................................ 20
2.3.2 Hadoop............................................................................................................... 23
2.3.2.1 Components of Hadoop ................................................................................... 24
2.3.3 Cassandra ........................................................................................................... 25
2.4 The market for RDBMS’ and Non-Relational DBMS’........................................... 27
2.4.1 Introduction ........................................................................................................ 27
2.4.2. RDBMS Market................................................................................................. 27
2.4.2.1 Vendor Offerings ............................................................................................. 28
2.4.4 Open Source Databases....................................................................................... 32
2.4.4.1 Non-RDBMS Market....................................................................................... 32
2.5 Case Studies .......................................................................................................... 36
2.5.1 Case Study 1- Utility Companies and the Data Management challenge ............... 36
2.5.1.1 Introduction ..................................................................................................... 36
2.5.1.2 Utilities............................................................................................................ 36
2.5.1.3 Smart Grid - The ESB case .............................................................................. 39
2.5.1.4 The Data Volume Problem............................................................................... 41
2.5.1.5 How one utility company is meeting the data volume challenge ....................... 44
2.5.1.6 What is the ESB doing? ................................................................................... 45
2.5.1.7 Conclusion....................................................................................................... 46
2.5.2 Case Study 2 - Social Networks – The migration to Non-SQL database models .. 47
2.5.2.1 Facebook Messages ......................................................................................... 48
2.5.2.2 Twitter - The use of NoSQL databases at Twitter............................................. 49
Chapter Three - Research Methodology .............................................................................. 52
3.1 Introduction ........................................................................................................... 52
3.2 The strategy adopted for researching the question. ................................................. 53
3.3 A Theoretical Framework ...................................................................................... 55
3.4 Research Design .................................................................................................... 57
3.5 Methodology - A Qualitative Approach ................................................................. 58
3.6 Methods................................................................................................................. 58
3.6.1 Method - Analytic Induction ............................................................................... 59
3.6.2 Method - Content Analysis ................................................................................. 59
3.6.3 Method - Historical Research.............................................................................. 59
3.6.4 Method - Case Study........................................................................................... 60
3.6.5 Method - Grounded Theory ................................................................................ 60
3.7 Ethics Approval .................................................................................................... 61
3.8 Audience .............................................................................................................. 61
3.9 Significance of research......................................................................................... 61
3.10 Limitations of the research methodology ............................................................. 62
3.11 Conclusion....................................................................................................... 62
Chapter Four - Conclusions, Limitations of Research and Future Work............................... 63
4.1 Introduction ........................................................................................................... 63
4.2 Conclusions ........................................................................................................... 64
4.2.1 RDBMS.............................................................................................................. 64
4.2.2 New DB’s........................................................................................................... 64
4.2.3 Market ................................................................................................................ 65
4.2.4.1 Case Study 1 - Utility Companies .................................................................... 66
4.2.4.2 Case study 2 - Social Networks........................................................................ 66
4.3 Future Research..................................................................................................... 67
4.3.1 NoSQL ............................................................................................................... 67
4.3.2 Case Studies ....................................................................................................... 68
4.3.3 Business Intelligence .......................................................................................... 68
4.3.4 Research Methodology ....................................................................................... 68
4.4 Limitations of the Research ................................................................................... 69
4.5 Final thoughts........................................................................................................ 70
REFERENCES............................................................................................................ 71
APPENDIX 1.............................................................................................................. 85
List of Figures
Figure 2.1 - A simplified DBMS .......................................................................................... 9
Figure 2.2 – Overview of a generic Smart Grid ................................................................... 40
Figure 2.3 - ESB proposed implementation of Advanced Metering ..................................... 41
Figure 2.4 – Smart Meters transaction rate ......................................................................... 42
Figure 2.5 – Smart Meters data size .................................................................................... 42
Figure 2.6 - Sources of Smart Grid data with time dependencies ....................................... 43
List of Tables
Table 2.1 - Impact of unstructured data on productivity............................................................8
Table 2.2 – Example of redundant rows in a database.............................................................14
Table 3.1 - Key concepts in Qualitative and Quantitative research methodologies.................54
Table A.1 - Edgar Codd’s original relational model terms ................................................. 85
List of Abbreviations
ACID – Atomicity, Consistency, Isolation and Durability.
ACM – Association for Computing Machinery.
BA - Business Analytics.
BASE - Basically Available, Soft state, Eventual consistency.
BI - Business Intelligence.
BSD - Berkeley Software Distribution.
CA - Computer Associates.
CAP - Consistency, Availability and Partition tolerance.
CIS – Customer Information System.
CODASYL – Conference on Data Systems Language.
CRM - Customer Relationship Management.
DBMS – Database Management System.
DMS – Distribution Management System.
DW- Data Warehousing.
ERM - Enterprise Relationship Management.
GB - Gigabyte
GBT - Google Big Table.
GFS - Google File System.
GIS - Geographical Information System.
HA - High Availability.
HDFS - Hadoop Distributed File System.
IA – IBM’s Information Architecture.
IBM - International Business Machines.
ISV - Independent Software Vendor.
IT – Information Technology.
KB - Kilobyte
MB - Megabyte
MDM - Meter Data Management.
MPL - Mozilla Public Licence.
MR - MapReduce.
NoSQL – ‘No’ SQL or more often ‘Not Only’ SQL.
OEM - Original Equipment Manufacturer.
OLAP - Online Analytical Processing.
OLTP - Online Transaction Processing.
OMS - Outage Management System.
OS - Operating System.
OSI - Open Source Initiative.
PB - Petabyte
PDC - Phasor Data Concentrators
PLM - Product Life-cycle Management.
RDBMS - Relational Database Management System.
SCADA - Supervisory Control and Data Acquisition.
SOA - Service Oriented Architecture.
SQL - Structured Query Language.
TB - Terabyte
Chapter One - Introduction
Humans have been storing information outside of the brain since probably before the first known consistent markings on bone, found in Bulgaria and dating from more than a million years ago; certainly since the later Neolithic clay calculi bearing symbols representing quantities and the cave paintings at Lascaux over 17,000 years ago, through to the invention of the movable type printing press and eventually the first computers. Since the emergence of the information age over the last fifty years or so, the amount of data transferred and stored in computers has grown rapidly. Research from the International Data Corporation (IDC) in 2008 puts that growth at 60% per annum (The Economist, 2010).
An added complexity is that executive strategies now hold up business intelligence for competitive edge as a key goal. Data management systems, for many years the old reliable workhorses toiling away somewhere in the back end, are once again playing a key role in driving business growth. The question is whether they are still capable of carrying out this new and challenging task. This dissertation asks that question, and more specifically: what is the future for the Relational Database Management System (RDBMS) in the Enterprise?
The data volume problem now has a name: ‘Big Data’. Its nascence coincides with the growth of the Internet. Alternative solutions to the traditional RDBMS for dealing with ‘Big Data’ soon followed. Many of these solutions are based either on massively parallel processing (MPP, a.k.a. distributed computing) or on flipping the row store of the RDBMS into a column store. More recently, MPP solutions are being positioned not as alternatives but as complements to the RDBMS (Stonebraker et al., 2010). Added to this mix is a dynamic data management market in which vendors are acquiring new technology, merging with each other, adopting open source and creating hybrid stacks in an effort to gain advantage in a market forecast to grow to $32 billion by 2013 (Yuhanna, 2009).
1.1 The Research Question
Time was taken to frame our research question carefully so as to provide a clear path of exploration of the subject. The subject could have been framed as a predictive hypothesis such as: “The future for the RDBMS in the Enterprise is looking bright” or a contrary statement, “The end is nigh for the RDBMS”. We chose instead to frame our research as an open-ended question to
allow for a broad exploration of the subject with no preconception of the outcome. The broadness of scope, however, is necessarily tempered by restricting our research to organisations defined as enterprises. There is a difficulty here in that there is no overarching definition of an enterprise organisation. It is nevertheless necessary to draw some clearly defined boundaries around the term. For this dissertation, an enterprise is defined not by size or function alone.
Enterprises, for us, are organisations where the scale of control is large. They include companies with a large number of customers and employees, as well as companies that control a large infrastructure or several functional units. Enterprises have one top-level strategy to which all other functional units are aligned. This last point is an important characteristic of an enterprise for our dissertation, as it applies to decision making for acquiring information management systems.
The presence of the word 'future' is central to locating the research in an exploratory and intuitive research domain. It prompts looking into the past in an attempt to explain the present and predict the future. It forces an open mind and a questioning approach. It enables the creation of new ideas which are either taken up or set aside for another time. The chapters and sections set out below attempt to follow this map, in the view that the journey is the objective rather than the destination.
1.2 Document Roadmap
In writing this dissertation, a balance was sought between addressing the issues raised by the initial question and the research methodology chosen; the bulk of the dissertation therefore centres on those two areas. In this chapter we introduce our research and explain why we find it interesting. The research question is explained and the objective is put in context. Chapter two contains the literature review. It begins with an outline of the RDBMS, its features and the history of its development. Particular attention is given to the role of IBM in the development of the RDBMS. The chapter moves on to discuss new databases and data management systems. A section on the DBMS market follows, presenting an overview of current vendor offerings. The market section does not attempt a comparison of available systems, as that work has been carried out in greater detail by others more expert than us. Throughout the dissertation we refer the reader to such work where it is not feasible for us to reproduce it.
Two case studies are included to put the research question in a practical context. The two areas chosen involve contrasting enterprises: on one hand the relatively long-established utilities sector, and on the other the new phenomenon of social networking and its associated companies. Even though they operate in widely different markets and generate different types of data, both share similar problems when it comes to managing large amounts of data. Likewise, both are trying to get to grips with extracting value from data for competitive edge.
Chapter three discusses the research methodology we chose. It deserves a chapter to itself in view of the objective of this dissertation. The chapter begins with an introduction to research theory. It then moves to a discussion of our research strategy. A research framework is introduced as a model of our strategy. The different methodologies available are outlined and our chosen option is explained. Next, a group of related research methods is outlined and the reasons for their selection are stated. Short sections on ethics approval, audience and the significance of the research follow before a final section on the limitations of our chosen research methodology closes the chapter.
The final chapter pulls together the conclusions and findings from all the previous sections. Relevant research threads and ideas not covered in sufficient detail in the dissertation are mentioned. The last sections present a summary of the limitations of the overall research and our final concluding thoughts.
Chapter Two - Literature review, findings and analysis
2.1 Introduction
In this section the focus is on the RDBMS. The intention is to provide an overview of its defining features. It is not an in-depth technical analysis of the RDBMS; for that we would refer the reader to better papers on the subject, such as those published in the Communications of the Association for Computing Machinery (ACM), which we cite several times. The section also sets out the background to the development of the RDBMS. Within that context an interesting discovery is made with respect to IBM's initial role in the development of database management systems. For the purpose of exploring the question of the future of the RDBMS, some associated concepts are discussed, such as data types, 'true' RDBMS, and whether or not the past can teach us something about the future.
2.2 RDBMS
Databases
It is unfortunate that in the realm of Information Technology (IT) acronyms are not always self-explanatory. Many acronyms do not travel well outside of their specific domain. Take for example DQDB, or Distributed Queue Dual Bus; outside of the world of high-speed networks this might seem to be a very efficient urban transport vehicle. Luckily, the term RDBMS contains within itself the individual components which define it: a system (S) composed of a database (DB) in which information is stored by creating relationships (R) between data elements, and which can be managed (M) by users. It is helpful at this point to explain the hierarchy of each of these components.
Throughout this dissertation, data (singular: datum) and information are taken to be classifications of entities stored in a system. Data is lowest in the sense of the taxonomy data - information - knowledge - wisdom (sometimes called understanding), but not lower in
real value; a single-digit integer may be enough data to invoke the required wisdom to make an important decision. For the purpose of simplicity, data here means a binary entry (such as yes or no, 1 or 0) or a nominal entry (such as dog, 470, Smith, XRA9000, etc.). An analogy from biology might see data as the molecules which make up a cell of information. The word ‘molecules’ is carefully suggested instead of ‘atoms’, given that ‘atomicity’ has particular significance for relational databases. Extending the analogy, a body of knowledge is built from the cells of information. It would be unwise to stretch the analogy further to address wisdom. Unhelpfully, the words ‘data’ and ‘information’ are often used interchangeably in research literature. Examples of this are the concepts ‘Big Data’ and ‘unstructured data’, for what really ought to be called information. For this reason, and for the sake of consistency, this dissertation will hold with the literature and treat the two terms as one, except where a distinction is required.
A database has been defined in a number of sources as a “collection of related data or
information” (Bocij et al. 2006, p. 153; Elmasri and Navathe, 1989, p. 3).
The Oxford English dictionary defines a database as a “structured set of data held in a
computer” (OED). However, the Cambridge Advanced Learner’s online dictionary (2011)
definition is perhaps closer to a contemporary definition:
“A large amount of information stored in a computer system in such a way that it can
be easily looked at or changed”.
It is noted that the Cambridge (2011) online definition makes no explicit reference to relational, structured or organised data. This looser definition reflects the changing nature of data management as newer types and larger volumes of data are captured.
Finally, a definition from the business world expands on the above, mentioning different types of data and hinting at the issues regarding scale:
A database is “a systematically organized or structured repository of indexed information
(usually as a group of linked data files) that allows easy retrieval, updating, analysis, and
output of data. Stored usually in a computer, this data could be in the form of graphics,
reports, scripts, tables, text, etc., representing almost every kind of information.” (Business
Dictionary, 2011).
Structured and unstructured data.
The last definition above alludes to unstructured data. Unstructured data is data in the form of text (words, messages, symbols, emails, SMS texts, reports) or bitmaps (images, graphics). A good example of the growing relevance of unstructured information is a Facebook page containing images, short messages, links and chunks of text that can be altered at any time.
Structured data, by contrast, is any data “that has an enforced composition to the atomic data types” (Weglarz, 2004). Atomicity is the characteristic of a stored entity that is not divisible (Elmasri and Navathe, 1989, p. 41). Atomicity is a key requirement for defining structured data and is what relational databases rely on to make relationships. A database designer can decide on the exact rules for the structured data and the level of atomicity required. As an aside, it is often this small amount of flexibility in the design of the data model which is responsible for the creation of many ‘bad’ databases. Structured data is data that is consistent, unambiguous and conforms to a predefined standard. It is examined in more detail later in the section discussing the RDBMS. A third type is semi-structured data. This is data held in a standard format such as forms, spreadsheets and XML files. It can be parsed by computer programs more easily than unstructured data because the data is generally located in a fixed and known place, even if the data itself is not atomic.
The problem of structured versus unstructured data types can be illustrated with the example of two schools. One school grades students in the traditional way, giving a numerical grade following examination. The other gives no numerical grade, preferring a method whereby students are furnished with a qualitative report on their overall performance. The former produces structured data: the meaning of a grade of 82% is consistent in the context of the school's grading system. It can easily be recorded, measured and compared to other grades, internally or from other schools using the same system. The report format, however, is unstructured, and comparison with a numerical grading system is not so easy. Gleaning relevant information from a text report is complex and involves semantic analysis, with or without the help of technology.
What does this mean for enterprises?
Eighty percent of information relevant to business is unstructured, and most of it is in textual form (Langseth in Grimes, 2011). Seth Grimes, an analytics expert with the Alta Plana Corporation, has investigated this claim. He concludes that even if the origins of the 80% figure are elusive (Grimes tracks it back as far as the 1990s), experience supports the claim (Grimes, 2011). Patricia Selinger (IBM and ACM Fellow), who has worked on query optimisation for 27 years, puts unstructured data in companies at about 85% (Selinger, 2005). Even assuming a figure lower than 80% for larger enterprises, where much information is held in structured form in traditional transaction-based databases, there remains the problem of how to leverage competitive advantage out of the nuggets of information buried in the rich seams of unstructured data. Businesses are realising that the chances of extracting valuable wisdom from traditional data stores using stale analysis methods and tools are diminishing, and that new ideas are needed.
Unstructured data is growing faster than structured data. According to the "IDC Enterprise Disk Storage Consumption Model" 2008 report, “while transactional data is projected to grow at a compound annual growth rate (CAGR) of 21.8%, it's far outpaced by a 61.7% CAGR prediction for unstructured data” (Pariseau, 2008).
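The quoted rates compound quickly. As a worked illustration of what they imply (the 100 TB starting volumes and five-year horizon are assumptions for the sake of the arithmetic, not IDC figures):

```python
# With compound annual growth, volume after n years is
# start * (1 + rate) ** n. Both stores begin at an assumed 100 TB.
def project(start_tb, cagr, years):
    return start_tb * (1 + cagr) ** years

structured = project(100, 0.218, 5)    # 21.8% CAGR -> roughly 2.7x
unstructured = project(100, 0.617, 5)  # 61.7% CAGR -> roughly 11x
```

At those rates the unstructured store, starting from the same size, ends the five years roughly four times larger than the structured one.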
Kevin McIssac (2007) of Computer World magazine puts it into perspective:
“Unfortunately business is drowning in unstructured data and does not yet have the
applications to transform that data into information and knowledge. As a result staff
productivity around unstructured data is still relatively low.”
McIssac gives examples of the impact of unstructured data on productivity, citing research from various sources. Table 2.1 below summarises those impacts:
Time/Volume          | Impact on productivity                                              | Research Source
9.5 hours per week   | Average time an office worker spends searching, gathering and analysing information (60% of that on the Internet) | Outsell
10% of working time  | Time professionals in the creative industry spend on file management | GISTICS
600 e-mails per week | Sent and received by a typical business person                       | Ferris Research
49 minutes per day   | Time an office worker spends managing e-mail; longer for middle and upper management | ePolicy Institute

Table 2.1 - Impact of unstructured data on productivity.
Where are the joins?
It seems that a reappraisal of what a database is or needs to do is well under way. If this is so,
then this reappraisal logically extends to the database management system. Structured data
can be joined to other structured data to form concatenations of information using a query
language based on mathematical operations. Things get a little more ‘fuzzy’ with
unstructured data. Stock market analysts might like to try querying an online media sources
for all posts where the word ‘oil’ is used but only in the context of the recent crises in Libya.
How unstructured and unrelated data is to be stored in the system and how meaningful
information can be retrieved back out of that same system are questions many organisations
are now asking – but, similar questions were asked before and the past may hold some
lessons for us.
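The contrast can be sketched as follows; the tickers, prices and posts are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Structured data: a join concatenates related rows through shared
# atomic key values, mirroring the underlying relational operations.
conn.executescript("""
    CREATE TABLE company (ticker TEXT PRIMARY KEY, sector TEXT);
    CREATE TABLE trade (ticker TEXT, price REAL);
    INSERT INTO company VALUES ('XOM', 'oil'), ('AAPL', 'tech');
    INSERT INTO trade VALUES ('XOM', 84.2), ('AAPL', 341.5);
""")
oil_trades = conn.execute(
    "SELECT t.ticker, t.price FROM trade t "
    "JOIN company c ON c.ticker = t.ticker WHERE c.sector = 'oil'"
).fetchall()

# Unstructured data: a naive keyword match finds every post that
# mentions 'oil', but cannot tell a supply crisis from a recipe;
# recovering that context needs semantic analysis, not a join.
posts = [
    "Unrest in Libya pushes oil prices higher",
    "Ten recipes that use olive oil",
]
matches = [p for p in posts if "oil" in p.lower()]
```

The join returns exactly the rows the analyst asked for; the keyword scan returns both posts and leaves the hard part of the question unanswered.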
A DBMS
In its simplest definition, a DBMS is a set of computer programs that allows users to create and maintain a database (Elmasri and Navathe, 1989, p. 4). Bocij et al. (2006, p. 154) expand on this definition a little: “One or more computer programs that allow users to enter, store, organise, manipulate and retrieve data from a database.”
(Source: Elmasri and Navathe, 1989 p. 5)
Figure 2.1 - A simplified DBMS
Figure 2.1 above shows the key components of a data management system. A detailed description of each component is not necessary for our purpose, but briefly they are:
• Application programs through which users interact with the stored data.
• Software programs for processing and accessing the stored data.
• A high-level declarative language interface for executing commands (commonly
known as a query language).
• A repository for storing data.
• A store of information related to the data for classifying or indexing purposes (meta-data).
• Hardware suitable for each of the above functions.
• Users (including database administrators and designers).
2.2.1 History of the RDBMS
To understand why newer types of databases and data management systems are emerging and
taking hold, it seems reasonable to explore why RDBMSs came into existence, as well as their
usefulness and relative longevity.
The 1960s BC (Before Codd)
Data management systems existed before Edgar Codd, while at IBM, wrote his seminal paper
published in 1970, "A Relational Model of Data for Large Shared Data Banks". Codd's
paper presented a new database model and so introduced the world of database
management to relational theory (Codd, 1970). In his paper Codd discusses the limitations of
the existing hierarchical and network data systems and introduces a query language based on
relational algebra and predicate calculus.
In a later important paper he described 12 rules for a relational database management system
(Codd, 1985). Systems that satisfy all 12 rules are rare. In fact, it is argued that no truly
relational database system existed in wide commercial production even a decade after
Codd's vision (Don Heitzmann in Thiel, 1982), nor even more recently (Anthes, 2010).
A brief description of the two data management systems whose limitations Codd
addressed is a useful precursor to a broader description of relational DBMSs.
Hierarchical Data Models
Hierarchical data models are similar to tree-structured file systems in that the data is stored as
parent-child relationships. Codd asserted that hierarchical and network-based DBMSs were not
data models in comparison to his more formalised relational model (Codd, 1991). For
simplicity the word 'model' is retained for the data structure of all systems under
discussion here. The model made sense to organisations that were naturally hierarchical in
nature - a legacy of Henri Fayol and his 14 management principles, popular in the 1960s and
still used in organisations today (Stoner and Freeman, 1989; Tiernan et al., 2006). A
hierarchical data model can be represented as a tree structure of parent-child relationships or
as an adjacency list. For example: a root entity with no parent might be SCHOOL; STUDENT
is a child of SCHOOL; GRADE is a child of STUDENT. STUDENT is also a child of COURSE.
In this type of structure data can be replicated many times in different branches of the tree, a
relationship of 'one to many' or 1:N. A 'modified preorder tree traversal' algorithm is used to
number each entity on the way down through the tree structure (left value) and again on the
way back up to the root (right value), making query operations more efficient in
navigating around the data (Van Tulder, 2003).
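The numbering scheme Van Tulder describes can be sketched in a few lines. This is a minimal illustration only; the function name, the nested-dict representation of the tree, and the example entities are ours, not Van Tulder's.

```python
# A minimal sketch of 'modified preorder tree traversal' numbering:
# each node receives a left value on the way down and a right value
# on the way back up. A node X is a descendant of Y exactly when
# Y.left < X.left < X.right < Y.right, so subtree queries need no
# recursive navigation.

def number_tree(tree, counter=None):
    """Assign left/right values to every node of a nested-dict tree."""
    if counter is None:
        counter = [1]
    numbered = {}
    for name, children in tree.items():
        left = counter[0]           # visited on the way down
        counter[0] += 1
        child_nums = number_tree(children, counter)
        right = counter[0]          # visited on the way back up
        counter[0] += 1
        numbered[name] = {"left": left, "right": right,
                          "children": child_nums}
    return numbered

# SCHOOL is the root; STUDENT a child of SCHOOL; GRADE a child of STUDENT.
nums = number_tree({"SCHOOL": {"STUDENT": {"GRADE": {}}}})
```

With these values, "every entity under SCHOOL" becomes a simple range test on the left value rather than a tree walk.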
The first hierarchical DBMS was developed by IBM and North American Aviation in the late
1960s (Elmasri and Navathe, 1989, p. 278). IBM imaginatively called it Information
Management System (IMS), and Frank Hayes dates its roll-out to 1968 (Hayes, 2002). Elmasri
and Navathe (1989, p. 278) cite McGee (1977) for a good overview of IMS.
Network Data Models
As can be seen in the hierarchical data model above, a child may logically have many parents.
A STUDENT, for instance, can take more than one MODULE in any COURSE year. In a
hierarchical structure the same STUDENT would appear under each of the MODULE trees; in
other words, many students can take many modules. The network data model was a further
development of the hierarchical model to address the issue of managing 'many to many' (M:N)
relationships. The Conference on Data Systems Languages (CODASYL) defined the network
model in 1971 (Elmasri and Navathe, 1989).
Where the underlying principle of the hierarchical model was parent-child tree structures, in a
network model it is set theory. Records are classified into record types and given names.
These records are sets of related data; record types are akin to tables in a relational database
model. The intricacies of set theory are beyond the scope of this dissertation; however, it
suffices to say that complex data combinations can be achieved by nesting record types
within other record types - data sets as members of other data sets. If this were possible in a
relational database it would be like having tables within tables within tables.
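For contrast, the M:N STUDENT/MODULE relationship described above is resolved in a relational system not by nested record types but by a linking (junction) table. A minimal sketch using Python's built-in sqlite3 module follows; the schema and data are invented for the example.

```python
import sqlite3

# Sketch: resolving a many-to-many STUDENT/MODULE relationship with a
# junction table, so no student record is duplicated under each module.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE module  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student(id),
        module_id  INTEGER REFERENCES module(id),
        PRIMARY KEY (student_id, module_id)
    );
""")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO module VALUES (?, ?)",
                 [(10, "Databases"), (11, "Statistics")])
conn.executemany("INSERT INTO enrolment VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 10)])

# Many students per module, many modules per student.
rows = conn.execute("""
    SELECT m.title, COUNT(*) FROM enrolment e
    JOIN module m ON m.id = e.module_id
    GROUP BY m.title ORDER BY m.title
""").fetchall()
```

Each side of the M:N relationship is stored once, and the join reconstructs the combinations on demand.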
The earliest work on a network data model was carried out by Charles Bachman in 1961
while working for General Electric. His work resulted in the first commercial DBMS, called
Integrated Data Store (IDS), which ran on GE mainframes. The system was cumbersome and
was eventually redeveloped by an IDS customer, the BF Goodrich Chemical Company, into
what became IDMS (Hayes, 2002). With Bachman on board as a consultant, IDMS was
eventually commercialised by Cullinane/Cullinet Software in the 1980s. Cullinet was bought
by Computer Associates (CA) in 1989, and IDMS remains a current CA offering for mainframe
database management today. Charles Bachman received the Turing Award in 1973
for his pioneering work in developing the first commercially available data management
system, for being one of the founders of CODASYL, and for his work on representation
methods for data structures (Canning in Bachman, 1973).
The 1970s
The Adabas DBMS was developed in the early 1970s by Software AG. It has an interesting
feature of relevance to this dissertation. Adabas was designed to run on mainframes for
enterprises with large data sets requiring fast response times for multiple users. One of its
main features is that it indexes data using inverted-list indexing.
Adabas also features a data storage address converter, which avoids data fragmentation. Data
fragmentation can occur when a record is updated with additional data and becomes too
large to be stored in its original location. The data can be moved to a new location, but the
indexes still expect the data to be in the same place, so they too would have to be updated;
the address converter does this instead. The alternative, as used by other systems, is data
fragmentation: part of the data is stored in the original location with a pointer to where the
remainder is stored. Fragmentation and pointer methods, however, require additional
processing and hence give slower response times. The problem of using pointers in systems
predating the RDBMS, instead of storing data directly (in tuples, as is done in an RDBMS),
is referred to by IBM's Irv Traiger (in McJones, 1997, pp. 16-17).
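The indirection idea behind the address converter can be sketched as follows. The names and structures are purely illustrative, not Adabas internals: the point is only that indexes hold a stable logical record number, and one small table maps it to a physical location.

```python
# Sketch of address indirection: indexes store a stable logical record
# number; only the address converter maps it to a physical block, so a
# record can move without any index being rewritten.

address_converter = {1: "block-07"}   # logical record number -> block
name_index = {"Thomas": [1]}          # index entries hold record numbers

def relocate(rec_no, new_block):
    """Move a grown record; the indexes need no update."""
    address_converter[rec_no] = new_block

def lookup(name):
    """Resolve index hits to physical blocks via the converter."""
    return [address_converter[rec_no] for rec_no in name_index[name]]

before = lookup("Thomas")    # record found in its original block
relocate(1, "block-42")      # record outgrows its block and moves
after = lookup("Thomas")     # same index entry, new physical address
```

Without the converter, every index mentioning the record would have to be updated on each move, or chased through pointers at read time.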
According to Curt Monash, Adabas' inverted-list indexing is the favoured method for
searching textual content. New ideas regarding the management of text (unstructured data)
have, according to Monash, "at least the potential of being retrofitted to ADABAS, should the
payoff be sufficiently high" (Monash, 2007).
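An inverted list can be sketched in a few lines, which makes clear why the structure suits textual search: each term maps directly to the documents containing it. This is a toy index with invented documents; Adabas' actual implementation differs.

```python
# Minimal inverted-list (inverted index) sketch: each term maps to the
# list of document ids that contain it, so a term lookup never scans
# the documents themselves.

docs = {
    1: "oil prices rise amid crisis",
    2: "olive oil exports grow",
    3: "markets steady today",
}

inverted = {}
for doc_id, text in docs.items():
    for term in set(text.split()):          # index each distinct word
        inverted.setdefault(term, []).append(doc_id)

hits = sorted(inverted.get("oil", []))      # documents mentioning 'oil'
```

Finding every document containing 'oil' is a single dictionary lookup, regardless of how many documents exist.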
Edgar Codd and the birth of the Relational Model
Codd's text 'The Relational Model for Database Management' of 1990 (version 2, 1991)
brings together the ideas set out in his previous papers on a relational data model for
managing databases. In it he places his model as solidly based on two areas of mathematics:
predicate logic and relational theory. In order for the mathematics to work effectively, there are
four essential concepts associated with the relational model: domains, primary keys, foreign
keys and no duplicate rows. In particular, the importance of domains was not fully
understood or adopted by later commercial versions of his RDBMS (Codd, 1991, p. 18).
Also, two early prototypes (IBM's System R and Michael Stonebraker's INGRES at the
University of California, Berkeley) were not concerned with the need to address the issue of
duplicated rows. The designers of both systems felt that the additional processing required to
eliminate duplicate rows was unnecessary given their relatively benign presence
(Codd, 1991, p. 18). Codd's purer model, based on mathematical principles, gave way to
the more pragmatic needs of the commercial world.
2.2.2 Main Features of a 'true' RDBMS
The main features of a relational DBMS as proposed by Codd distinguish a 'true'
relational DBMS from other DBMSs. Based on his earlier paper setting out his 12 rules
(1985), they are summarised as follows:
• Database information is values only and ordering is not essential (meta-data, while
required, should not be of concern to the everyday user; pointers are not used).
• Data management is not dependent on position within the structure (contrast with
the hierarchical and network models).
• Duplicate rows are not allowed.
• Information should be capable of being moved without impact on the user.
• Three level architecture of the RDBMS – base relations, storage, views (derived
tables).
• Declarations of domains as extended data types.
• Column description should be akin to the domain it belongs to (i.e. a good naming
convention).
• Each base relation (R-Table) should have one and only one primary key column,
where null value entries are not allowed.
• RDBMS must allow one or more columns to be assigned as foreign keys.
• Relationships are based on comparing values from common domains.
This last point is crucial to understanding Codd's intention. Only values from common
domains can be properly compared - currency with currency, euro with euro, date with date,
integer with integer, and so on. The basis for this lies in the nature of the mathematical
operators used in the system. Consistency of data types and strict rules are therefore vital for
the effective operation of the system. Herein lies one of the difficulties presented to designers
of commercial versions of Codd's RDBMS. Users of data management systems are presented
with real-world scenarios where consistency is not always practical. It would be ridiculous to
ask members of a social networking site to use standard forms for communicating so that the
DBMS could store the relevant information appropriately. Even closer to the relational
database world, a transaction record could be created for a person called William Thomas as
follows:
Instance  Surname  Forename    Address               DOB         ID    Order No
1         Thomas   William     22, Greenview Street  12/06/1945  1234  104
2         Thomas   Bill        22 Greenview St.      12/06/1945  1365  104
3         Thomas   William H.  22, Greenview Street  12/06/1945  3456  104

Table 2.2 – Example of redundant rows in a database
As can be seen in this simple example, the database treats these as three distinct and
unique records, even though the intention is that only one record for this person should exist.
The result impacts the size, processing speed and integrity of the system. Techniques to
address such problems (primarily data normalisation) were developed almost from the
beginning, in the early 1970s, by Codd and later by Raymond Boyce and Codd (Elmasri and
Navathe, 1989, p. 371). Database normalisation is beyond the scope of this dissertation;
however, the salient point (and the reason for our initial hypothesis) is that the nature and
amount of unstructured data flowing in the electronic ether has pushed the RDBMS and its
associated control and optimisation processes to the limits of their capabilities.
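The normalised alternative to the redundancy in Table 2.2 can be sketched with Python's built-in sqlite3 module: the person is stored once, and each order references that single row through a foreign key. The schema and values below are illustrative only.

```python
import sqlite3

# Sketch: normalising the Table 2.2 scenario so one person record
# exists, referenced by orders via a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        surname TEXT, forename TEXT, address TEXT, dob TEXT
    );
    CREATE TABLE customer_order (
        order_no INTEGER,
        person_id INTEGER REFERENCES person(id)
    );
""")
conn.execute("INSERT INTO person VALUES (1234, 'Thomas', 'William', "
             "'22 Greenview Street', '1945-06-12')")
conn.execute("INSERT INTO customer_order VALUES (104, 1234)")

# Whether a clerk later types 'Bill' or 'William H.', order 104 still
# resolves to exactly one authoritative person record.
count = conn.execute("""
    SELECT COUNT(DISTINCT p.id) FROM customer_order o
    JOIN person p ON p.id = o.person_id WHERE o.order_no = 104
""").fetchone()[0]
```

The duplication problem is prevented structurally, rather than cleaned up after the fact.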
Debashish Ghosh of Anshin Software, while advocating the merits of non-relational models,
nevertheless puts it fairly:
“A relational data management system (RDBMS) engine is the right tool for handling
relational data used in transactions requiring atomicity, consistency, isolation, and
durability (ACID). However, an RDBMS isn’t an ideal platform for modelling
complicated social data networks that involve huge volumes, network partitioning,
and replication”. (Ghosh, 2010)
The above discussion is intended to draw an important distinction between Edgar Codd's
original theory of a relational data management system and the subsequent versions developed
for the commercial enterprise market (the mainframe computer market at that time). The
importance of the mathematical principles (relational algebra and calculus) behind Codd's
ideas should not be underestimated, nor should the associated operations based upon those
principles; in fact they are key to understanding why Codd at the time persisted in pushing for
a full and true implementation of his model, and may also explain why he stepped back from
the first experiments in commercialising his ideas (Chamberlin and Blasgen in McJones, 1997,
p. 13). Brevity here forces us to move on to look at two of the earliest commercial versions of
the RDBMS that, by no accident, are also the two market leaders today.
As an aside, Appendix 1 presents a useful comparison of the key terms from Codd's original
intended meaning and their relationship to other systems.
2.2.3 IBM, Ellison and the University of California, Berkeley
IBM
One artefact cited several times in this section on the history of data management systems is a
transcript from a reunion meeting in 1995 of some of the original IBM research employees
who, during the 1970s and 1980s, were at the coal face of data management development. The
article, edited by Paul McJones, is entitled "The 1995 SQL Reunion: People, Projects, and
Politics" (McJones, 1997). What at first seems like the convivial reminiscences of middle-aged
ex-IBM colleagues in fact turns out to be a rather more interesting illumination of the
context around the timelines for the development of some of the most important ideas to
emerge, as well as the historically important players and products, from the realm of database
management. Some of the key people attending the reunion and contributing to the discussion
are Donald Chamberlin, Jim Gray, Raymond Lorie, Gianfranco Putzolu, Patricia Selinger
and Irving Traiger. All are IBM and ACM Fellows and award winners for their work. Jim
Gray, a fellow Berkeley graduate and mentor to Michael Stonebraker, was given the ACM
Turing Award in 1998 for his work on transaction processing (ACID) (Stonebraker, 2008).
Patricia Selinger was awarded the ACM Edgar Codd Innovation Award for her work in query
optimisation. Their contributions were vital to the features of the commercial RDBMS which
have ensured its longevity thus far, and possibly for many years yet.
IBM and System R
Midway through the 1970s, IBM's San Jose research lab began working on a project
called System R. Like many IBM research projects at the time, it came out of different task
groups working on related areas such as data language, data storage, optimisation, concurrent
users and system recovery. System R was relational and combined work from these various
groups. System R as a commercial RDBMS was installed at the Pratt & Whitney Aircraft
Company in Hartford, Connecticut in 1977, where it was used for inventory control. However,
IBM was not yet interested in releasing it as a fully featured product. At that time the big IBM
cash cow was IMS (its mainframe hierarchical DBMS, mentioned earlier), and the
research focus was on a project called Eagle - a replacement for IMS with all the new
features of recent discoveries. With the pressure off, the System R developers plugged away,
aiming it towards the lower midrange product line (Jolls in McJones, 1997, p. 31). Two
things then happened which brought the focus back onto System R and getting
it ready for market (McJones, 1997, pp. 33-34). Firstly, IBM was starting to lose ground to
new minicomputers (Gray in McJones, 1997, p. 20), and secondly, the Eagle project was
hitting a wall. System R, unlike Eagle, was relational and already pitched towards the smaller
computer range. The System R star did not shine for long and it was replaced by DB2, with
Release 1 in 1980. IBM fully embraced the relational DBMS with Release 2 around 1985
(Miller in McJones, 1997, p. 43). DB2 is IBM's current offering and is mentioned again under
the section on the RDBMS market.
The Birth of SQL
Around the same time that System R was being developed, the language research team
at IBM, Relational Data Systems (RDS), took on Codd's two mathematically based languages
for data management: relational algebra and relational calculus. By their own admission they
found these mathematical notations too abstract and complex for general use, and so developed
a notation which they called SQUARE (Specifying Queries As Relational Expressions)
(Chamberlin in McJones, 1997, p. 11).
SQUARE used some odd subscripts, so a regular keyboard could not be used. RDS further
developed it to be closer to common English words, calling the new version Structured
English Query Language, or SEQUEL. The intention was to make interaction with databases
easier for non-programmers. However, its biggest impact came later, when Larry Ellison (co-
founder and CEO of Oracle) read the IBM-published papers on SEQUEL and realised that
this query language could act as an intermediary between different systems (Chamberlin in
McJones, 1997, p. 15). It was the RDS team at IBM who renamed it SQL following a
trademark challenge to the term SEQUEL from an aircraft company (McJones, 1997, p. 20).
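The English-like, declarative style that SEQUEL pioneered survives in every modern SQL dialect: the query states what is wanted, not how to navigate to it. A minimal illustration using Python's built-in sqlite3 module follows; the table and data are invented for the example.

```python
import sqlite3

# A declarative query: no navigation of parent-child trees or record
# sets, just a statement of the desired result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO parts VALUES (?, ?)",
                 [("bolt", 120), ("rivet", 40), ("washer", 500)])

low_stock = conn.execute(
    "SELECT name FROM parts WHERE qty < 100"
).fetchall()
```

The system, not the programmer, decides how to locate the matching rows, which is precisely the property that made the language portable between systems.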
INGRES
In parallel with the work going on at IBM, the University of California at Berkeley had a
project developing a system called INGRES (short for Interactive Graphics Retrieval
System). Michael Stonebraker, who was at Berkeley in 1972, was developing a query language
called QUEL. Stonebraker knew fellow Berkeley graduates at IBM San Jose and, more
importantly, knew of their work. INGRES used QUEL, whereas IBM and Larry Ellison's
project at Software Development Laboratories (later Oracle) used SQL. Subsequent offspring
of the INGRES family are Sybase and Postgres (post-Ingres). Incidentally, Microsoft
struck a deal with Sybase to use their code for their new extended operating system.
Recalling that the Sybase people were brought up in the QUEL tradition under Stonebraker,
Microsoft preferred SQL. The two eventually fell out, and Microsoft, which now owned the
Sybase code, ended up developing Microsoft SQL Server (Gray in McJones, 1997, p. 56).
Oracle
In 1977 Larry Ellison, Bob Miner and Ed Oates founded Software Development Laboratories
(SDL), the precursor to Oracle Corporation. SDL based its system on a technical paper in an
IBM journal (Oracle History, 2011): Edgar Codd's 1970 seminal paper setting out
his model for an RDBMS (Traiger in McJones, 1997). SDL's first contract was to
develop a database management system for the Central Intelligence Agency (CIA); the
project was called 'Oracle'. SDL finished that project a year early and used the time to
develop a commercial RDBMS, drawing together the work done by IBM Research on
relational databases and, as mentioned above, on the query language called SEQUEL.
While Ellison and SDL benefited from the work done at IBM, they still had to do all the
coding themselves. The resulting product was faster and a lot smaller than IBM's System R.
The first officially released version of Oracle was version 2, in 1979.
Brad Wade jokes about Edgar Codd's influence on Oracle; on Codd being made an IBM
Fellow in 1976: "It's the first time that I recall of someone being made an IBM Fellow for
someone else's product" (Wade in McJones, 1997, p. 49).
It appears that many new enterprises sprang from the well of knowledge existing at IBM
during the 1970s and 1980s. Had the IBM research units not had so much talent, or not
allowed publication of key papers at the time, the database world might look very different
today. Patents on software were prohibited by IBM, and indeed by Supreme Court law, until
1980 (Bocchino, 1995). According to Franco Putzolu, IBM Research at that time and up
until 1979 was "publishing everything that would come to mind" (in McJones, 1997, p. 16).
Mike Blasgen argues that outside interest in the published research was one reason why
the corporate machine of IBM began to notice some of the lesser research projects (in
McJones, 1997, p. 16).
It is hoped that the above overview gives the reader some understanding of the related threads
that developed out of Charles Bachman's initial work on data management systems, through
IBM via Edgar Codd, and out into the wider world via the IBM research department's open
attitude to sharing knowledge, from which Larry Ellison's Oracle benefited greatly. Berkeley
played its role too, providing a common alma mater for young, enthusiastic developers
to discuss ideas. It is an interesting irony that when we think of 'open source' we envision a
recent phenomenon; however, IBM during the 1970s would appear to have been a little
more open, for whatever reason, than is usually credited to them.
2.3 New Databases
This section explores the development of the new databases ('new DBs') that have emerged on
the market over the past decade, and the impact these databases will have on the database
market as a whole.
What are 'new DBs'?
Traditional databases rely on a relational model in order to function. That is, they follow a set
of rigid rules to ensure the integrity of the data in the database. Most RDBMS models follow
the set of rules originally outlined by Edgar Codd (1970).
New NoSQL database models do not follow all of the rules set down by Codd. While
RDBMS models follow the set of properties called ACID, as previously stated, NoSQL
database models do not. They instead follow other sets of properties, including BASE
(Basically Available, Soft state, Eventual consistency) (Cattell, 2011), making trade-offs
described by the CAP theorem (Consistency, Availability and Partition tolerance).
Why were NoSQL model databases developed?
The development of NoSQL databases was a result of the evolution of the World Wide Web,
and of the desire of individuals and companies/organisations to generate large amounts of
data (White, 2010, p. 2). Having collected that data, organisations then had to extract value
from it in order to be successful in whatever field they participated in.
The problems organisations faced in extracting value from that data were twofold:
1. As storage capacities increased, the means of transferring the data to and from the
drive(s) did not keep up. Twenty years ago, a hard drive could store 1.3 GB of data,
while the speed at which the entirety of the data could be accessed was 4.4 MB per
second - about five minutes to access it all. Today, 1 TB hard drives are the norm, but access
speeds are only about 100 MB per second, meaning it takes roughly 30 times longer,
relative to capacity, to access the entire drive (White, 2010, p. 3).
A means of getting around this bottleneck was the introduction of disk arrays,
whereby data could be written to and read from multiple disks in parallel. The drawback
was the increased possibility of hardware failure, whereby a disk or machine might fail and
the data be lost (White, 2010, p. 3). Redundancy (the various RAID schemes being the
most famous examples) solved some of these problems but not all (Patterson, 1988).
2. The second problem is that, with multiple disks, relational database models, with their
in-built consistency requirements, are unable to access data quickly enough when the
data is spread across multiple disk drives. An RDBMS may not allow a
query to access certain data if that data is already in use by another program or user
(Chamberlin, 1976).
2.3.1 Features of NoSQL Databases
In order for a database to be considered a NoSQL database, it must first not comply with the
entirety of the ACID properties. The features that define NoSQL databases include
scalability, eventual consistency and low latency (Dimitrov, 2010). A key feature of
NoSQL databases is a 'shared-nothing' architecture: databases can replicate and
partition data across multiple servers, which in turn allows them to support a large
number of simple read/write operations per second (Cattell, 2011).
Scalability
With traditional RDBMS systems, a database was usually required to scale up, that is, switch
over to a newer, larger-capacity machine, if the database was to expand capacity (Cattell, 2011).
One of the features designed into some NoSQL databases is their ability to scale to large data
volumes without losing the integrity of the data. With NoSQL, as systems are required to
expand with an influx of additional data, they scale out by adding more machines to the data
cluster. With this scaling, NoSQL systems can process data faster than an RDBMS, as
they are capable of spreading the processing workload over numerous machines
(Cattell, 2011).
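The scale-out idea can be sketched as follows. This is a toy placement scheme using simple modulo hashing; production systems typically use consistent hashing so that adding a node relocates only a fraction of the keys.

```python
import hashlib

# Sketch of scaling out: records are spread over a cluster by hashing
# the key, and capacity grows by adding machines rather than by
# replacing one machine with a larger one.

def node_for(key, nodes):
    """Pick the node responsible for a key (simple modulo placement)."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

cluster = ["node-a", "node-b"]
placement_before = node_for("user:42", cluster)

cluster.append("node-c")                 # scale out: add a machine
placement_after = node_for("user:42", cluster)
```

Every key always maps to exactly one live node, and the routing function is cheap enough to evaluate on every request.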
Eventual Consistency
Eventual consistency was pioneered by Amazon with its Dynamo database. The purpose of
its introduction was to ensure high availability (HA) and scalability of the data. Ultimately,
data that is fetched by a query is not guaranteed to be up to date, but all updates to the data
are guaranteed eventually to be propagated to all copies of the data on all nodes of the cluster
(Cattell, 2011).
This ensures that databases remain accessible to programs and individuals who wish to read or
modify data, without the constraint of being locked out of a database or data field while the
data is being updated or read, as is the case with RDBMS database models.
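The behaviour can be sketched in a few lines. This is a toy model of asynchronous replication, not Dynamo's actual protocol: a write is acknowledged after updating one replica, and the change propagates to the others later.

```python
# Sketch of eventual consistency: the write is acknowledged once one
# replica is updated; the others are updated asynchronously, so a read
# in the meantime may see stale data, but all copies converge.

replicas = [{"x": 1}, {"x": 1}, {"x": 1}]
pending = []                       # queue of not-yet-applied updates

def write(key, value):
    replicas[0][key] = value       # acknowledged immediately
    pending.extend((i, key, value) for i in (1, 2))

def propagate():
    """Apply queued updates; in a real system this runs continuously."""
    while pending:
        i, key, value = pending.pop()
        replicas[i][key] = value

write("x", 2)
stale_read = replicas[2]["x"]      # a read may still see the old value
propagate()                        # ...but every copy converges
consistent = all(r["x"] == 2 for r in replicas)
```

The trade-off is visible directly: availability for writes is preserved at the cost of a window in which reads may return out-of-date values.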
Low Latency
Latency is an element of the speed of a network; it refers to any of a number of delays that
typically occur in the processing of data (Mitchell, no date). In the case of NoSQL databases,
it means that queries can access the data and return answers more quickly than an RDBMS
because the data is distributed across multiple nodes of a cluster instead of one machine,
resulting in a faster response time. Causes of high latency in traditional RDBMS model
databases include the seek time of hard disks (Mitchell, no date), the speed of the network
cables connecting the machines, and badly programmed queries (Stevens, 2004;
Souders, 2009).
NoSQL database models
Unlike RDBMS models, NoSQL data models vary considerably from system to system. For
storage purposes, NoSQL databases fall into a number of data model categories, listed below:
Key-value Stores
Databases using this model employ a single key-value index for all the data. These systems
provide persistence mechanisms as well as additional functions such as replication, locking,
transactions and sorting. NoSQL databases such as Voldemort and Riak use multi-version
concurrency control (MVCC) for updates. They update data asynchronously, so they cannot
guarantee consistent data (Cattell, 2011).
Key-value store databases can support traditional SQL-like functionality, such as
delete, insert and lookup operations (Cattell, 2011).
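A key-value store's core interface can be sketched in a few lines. This is an in-memory toy, without the persistence, replication or MVCC features mentioned above; the class and method names are ours.

```python
# Minimal key-value store sketch: a single key indexes each value, and
# the whole interface is insert / lookup / delete.

class KVStore:
    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        self._data[key] = value

    def lookup(self, key):
        return self._data.get(key)    # None if the key is absent

    def delete(self, key):
        self._data.pop(key, None)

store = KVStore()
store.insert("user:1", {"name": "Ann"})
found = store.lookup("user:1")
store.delete("user:1")
gone = store.lookup("user:1")
```

The simplicity of this interface is what makes the model so easy to partition and replicate: every operation touches exactly one key.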
Document Stores
This model supports more complex data than key-value stores. Document stores can support
secondary indexes and multiple types of documents per database. Databases using
this model include Amazon's SimpleDB and CouchDB.
Document store databases provide a querying mechanism for the data they contain using
multiple attribute values and constraints (Cattell, 2011).
Extensible Record Stores
Influenced by Google's Bigtable, extensible record store databases consist of rows and
columns which are scaled across multiple nodes. Rows are split across nodes by 'sharding' on
the primary key, which means that querying a range of values does not have to go to every
node. Columns are distributed over multiple nodes by using 'column groups', which allow the
database customer to specify which columns are best stored together. This has the added
advantage of faster queries, as the most relevant data for a query, e.g. name and address, is
likely to be close at hand (Cattell, 2011).
The most famous examples of extensible record store databases available, apart from Google's
proprietary Bigtable, are HBase and Cassandra. Additional databases using the model are
Hypertable, sponsored by Baidu (Hypertable, 2011), and PNUTS (Yahoo Research, 2011).
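The benefit of sharding rows by primary key ranges can be sketched as follows; the key ranges and node names are illustrative. A range query is routed only to the nodes whose ranges overlap it, rather than to the whole cluster.

```python
# Sketch of range-based sharding: each node holds a contiguous range
# of primary keys, so a range query touches only overlapping nodes.

shards = {
    "node-1": range(0, 100),      # primary keys 0..99
    "node-2": range(100, 200),    # primary keys 100..199
    "node-3": range(200, 300),    # primary keys 200..299
}

def nodes_for_range(lo, hi):
    """Which nodes must serve keys lo..hi (inclusive)?"""
    return sorted(node for node, keys in shards.items()
                  if lo <= keys[-1] and hi >= keys[0])

touched = nodes_for_range(90, 110)   # spans two shards only
```

Contrast this with hash-based placement, where a range query would have to visit every node because adjacent keys are scattered across the cluster.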
Graph Databases
A graph database maintains one single structure - a graph (Rodriguez, 2010). A graph is a
flexible data structure that allows for a more agile and rapid style of development (Neo4j,
2011).
A graph database has three main elements:
1. Node - an entity in the graph, in which the data is stored.
2. Relationship - a labelled connection between nodes, which determines how one
data item is related to data in the same or another node.
3. Property - an attribute of the data (Neubauer, 2010).
The purpose of graph databases is to quickly determine the relationships between different
items of data. Examples of graph databases include the Neo4j database and Twitter's
FlockDB, which is used to join up tweets between those who post them and all of their
followers (Weil, 2010).
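The three elements can be sketched as follows, in the style of the Twitter/FlockDB use case. This is a toy in-memory structure with invented data; real graph databases such as Neo4j store and index nodes, relationships and properties natively.

```python
# Sketch of the node / relationship / property structure: nodes carry
# properties, and labelled relationships connect nodes to each other.

nodes = {
    "ann": {"kind": "user"},                  # properties per node
    "bob": {"kind": "user"},
    "t1":  {"kind": "tweet", "text": "hello"},
}
relationships = [                             # (source, label, target)
    ("bob", "FOLLOWS", "ann"),
    ("ann", "POSTED", "t1"),
]

def related(source, label):
    """Follow edges with a given label out of a node."""
    return [dst for src, rel, dst in relationships
            if src == source and rel == label]

def followers(user):
    """Traverse FOLLOWS edges in reverse to find a user's followers."""
    return [src for src, rel, dst in relationships
            if rel == "FOLLOWS" and dst == user]

ann_followers = followers("ann")
ann_tweets = related("ann", "POSTED")
```

Queries like "who follows Ann?" become direct traversals of labelled edges rather than joins over tables.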
2.3.2 Hadoop
Hadoop/MapReduce
Hadoop is a distributed data storage and processing framework originally developed by Doug
Cutting, later at Yahoo (White, 2010, p. 9), using Google's proprietary MapReduce and file
system infrastructure as a model (Apache, 2011). Throughout its short history, developers have
added components that allow Hadoop to process the data that it collects more efficiently.
Hadoop contains a number of components that allow the system to scale to large clusters of
machines without impacting the overall integrity of the data stored on those machines. The
main component of Hadoop is MapReduce.
MapReduce is a framework for processing large datasets that are distributed across multiple
nodes/servers. The 'map' part of the framework takes the original input data and partitions
it, distributing the original input to different nodes. The individual nodes can then, if
necessary, redistribute the data again to other sub-nodes. MapReduce then applies the map
function in parallel to every item in the dataset, producing a list of key-value pairs for each
query (White, 2010, p. 19). The 'reduce' part of the framework then collects all of the common
keys, sums up their values, and returns a single output for each key. The reduce
function, in effect, removes duplication within the system, allowing queries to return results
more speedily (White, 2010, p. 19).
Hadoop is designed for distributed data, with a dataset split between multiple nodes if
necessary. If MapReduce must query data located on multiple nodes, then the map
function will map all the data for the query that is located on a single node and return the
result, doing the same on every node on which the relevant data is located. The reduce
function will then take all those map results and reduce them down to single values, again
returning the query result(s) (White, 2010, p. 31).
Both functions are oblivious to the size of the dataset they are working on. As such, they
can remain the same irrespective of whether the dataset is large or small. Additionally, if you
double the input data, a job will take twice as long; however, if you also double the size of the
cluster, the job will run as fast as the original one (White, 2010, p. 6).
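The map/reduce pattern can be illustrated in miniature with a word count, the canonical example. The splits below stand in for data held on different nodes; in real Hadoop the map calls run in parallel across the cluster.

```python
# Sketch of map/reduce: 'map' emits (key, value) pairs from each input
# split, and 'reduce' collects common keys and sums their values.

def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]

def reduce_phase(pairs):
    """Collect common keys and sum their values into one output each."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals

splits = ["oil market news", "oil prices"]    # data on two 'nodes'
mapped = []
for split in splits:                          # maps would run in parallel
    mapped.extend(map_phase(split))
word_counts = reduce_phase(mapped)
```

Neither function knows or cares how many splits exist, which is exactly why the same job scales from one machine to a large cluster unchanged.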
HDFS
HDFS is the file system that allows Hadoop to distribute data across multiple
nodes/machines. HDFS stores data in blocks, in a fashion similar to other file systems.
However, while other file systems use small blocks, HDFS by default uses large blocks. This
reduces the number of seeks that Hadoop must make in order to answer a query, speeding
up the process (White, 2010, p. 43).
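The effect of block size on seek count can be shown with simple arithmetic. The figures below (a 1 GB file, a 4 KB block as is typical of local file systems, and a 64 MB block as an illustrative HDFS default) are assumptions for the purpose of the example.

```python
def blocks_needed(file_size_bytes, block_size_bytes):
    """Each block implies at least one seek, so fewer blocks means fewer seeks."""
    return -(-file_size_bytes // block_size_bytes)  # ceiling division

GB = 1024 ** 3
small = blocks_needed(1 * GB, 4 * 1024)           # 4 KB blocks
large = blocks_needed(1 * GB, 64 * 1024 * 1024)   # 64 MB blocks
print(small, large)  # 262144 16
```

Reading the same file therefore costs on the order of a quarter of a million seeks with small blocks but only sixteen with HDFS-style large blocks.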
2.3.2.1 Components of Hadoop
HBase
Based on Google’s Bigtable, HBase was developed by Chad Walters and Jim Kellerman at
Powerset. The purpose of HBase’s development was to give Hadoop a fault-tolerant means of
storing large quantities of data. It can also sit on top of Amazon’s Simple Storage
Service (S3) (Wilson, 2009). HBase was developed from the ground up to allow databases to
scale just by adding more nodes – machines – to the cluster that HBase/Hadoop is installed
on. As it does not support SQL, it can do what an RDBMS cannot: host data in
sparsely populated tables located on clusters built from commodity hardware (White, 2010,
p. 411). The structure of HBase is designed with a ‘master node’, which has control of any
number of ‘slave nodes’, called Region Servers. The master node is responsible for assigning
regions of the data to the region servers, as well as being responsible for the recovery of data
in the event of a region server failing (White, 2010, p. 413). In addition to this setup, HBase
is designed with fault tolerance built in – HBase, thanks to HDFS, creates three different
copies of the data spread across different data nodes (Dimitrov, 2010).
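The master/region-server arrangement described above can be sketched as follows. This is a toy illustration of region assignment and failover, not HBase's actual API; all class, method and server names are invented for the example.

```python
class Master:
    """Assigns key ranges (regions) to region servers; reassigns on failure."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.assignments = {}  # region -> region server

    def assign(self, region):
        # Simple round-robin placement across the live region servers.
        server = self.servers[len(self.assignments) % len(self.servers)]
        self.assignments[region] = server
        return server

    def handle_failure(self, dead_server):
        # Recover by moving the dead server's regions to the survivors;
        # the data itself survives because HDFS holds replicated copies.
        self.servers.remove(dead_server)
        for region, server in list(self.assignments.items()):
            if server == dead_server:
                self.assign(region)

master = Master(["rs1", "rs2"])
master.assign("rows a-m")
master.assign("rows n-z")
master.handle_failure("rs1")
print(master.assignments)  # both regions now served by "rs2"
```

The key point the sketch captures is that the master holds only the mapping of regions to servers; losing a region server costs availability of its regions only until they are reassigned.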
Hive
Hive is a scalable data processing platform developed by Jeff Hammerbacher at Facebook
(White, 2010, p. 365). The purpose of Hive is to allow individuals who have strong SQL
skills to run queries on data that is stored in HDFS.
When querying the dataset, Hive first converts SQL queries into MapReduce jobs, together
with custom commands that allow it to target different partitions within the HDFS dataset,
allowing users to query specific data within the Hadoop cluster (White, 2010, p. 514). This
allows Hive to provide users with the traditional query model of older RDBMS
environments within the newer distributed NoSQL database environments.
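The partition targeting mentioned above works because each partition is stored as its own directory of files in HDFS, so a query whose predicate names a partition key never has to read the rest of the dataset. The sketch below illustrates the pruning idea only; the data, file names and function are invented, not Hive internals.

```python
# Each partition key value maps to a separate directory of files in HDFS.
partitions = {
    "dt=2011-04-09": ["hits_a.log", "hits_b.log"],
    "dt=2011-04-10": ["hits_c.log"],
    "dt=2011-04-11": ["hits_d.log", "hits_e.log"],
}

def prune(partitions, predicate):
    """Return only the files in partitions matching the predicate, so the
    MapReduce job launched for the query never scans the excluded data."""
    return [f for part, files in partitions.items() if predicate(part)
            for f in files]

# Equivalent in spirit to: SELECT ... FROM hits WHERE dt = '2011-04-10'
files_to_scan = prune(partitions, lambda p: p == "dt=2011-04-10")
print(files_to_scan)  # ['hits_c.log']
```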
2.3.3 Cassandra
Cassandra is a fault-tolerant, decentralised database that can be scaled and distributed across
multiple nodes (Apache, 2011; Lakshman, 2008). Developed by Avinash Lakshman at
Facebook (Lakshman, 2008), Cassandra is now an open source project run by the Apache
Foundation (Apache, 2011).
Initially designed to solve a search indexing problem, Cassandra was designed to scale to
very large sizes across multiple commodity servers. Additionally, the ability to have no single
point of failure was built into the system (Lakshman, 2008). Since Cassandra was designed to
scale across multiple servers, it had to overcome the possibility of failure at any given
location within each server, such as the possibility of a drive failure.
To guard against such a possibility, Cassandra was developed with the following functions:
Replication
Cassandra replicates data across different nodes when written to. When data is
requested, the system accesses the closest node that contains the data. This ensures
that data stored using Cassandra maintains High-Availability (HA), one of the core
attributes of a NoSQL database. Once data is written to a server, a duplicate copy of
the data is then written to another node within the database (Lakshman, 2008).
Eventual Consistency
Cassandra uses BASE to determine the consistency of the database. In order for data
to be accessible to users, an individual who is reading the data accesses it on one
node. At the same time, another individual can be making changes to another copy of
the data on another node. As the data is replicated, newer versions of the data are
sitting on one node, while older versions are still active on other nodes (Apache wiki,
2011).
Users of Cassandra can also determine the level of consistency, allowing writes to add
or edit data to a single copy of the data in a node, or, if possible, to write to all copies
of the data across all nodes (Apache wiki, 2011).
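Tunable consistency can be sketched as follows: each replica holds a value with a timestamp, a read at level ONE returns whatever a single replica has, and a read at level ALL reconciles every replica by newest timestamp. This is an illustration of the idea only, not Cassandra's API; the level names mirror Cassandra's terminology but the code is invented.

```python
def read(replicas, level):
    """replicas: list of (value, timestamp) pairs, one per node."""
    if level == "ONE":
        return replicas[0][0]                        # fast, possibly stale
    if level == "ALL":
        return max(replicas, key=lambda r: r[1])[0]  # newest write wins
    raise ValueError("unknown consistency level")

# One node has received an update; two still hold the older value.
replicas = [("old", 100), ("new", 200), ("old", 100)]
print(read(replicas, "ONE"))  # 'old'  (stale but immediately available)
print(read(replicas, "ALL"))  # 'new'  (reconciled across every node)
```

The trade-off shown here is exactly the BASE bargain: the ONE read is faster and survives node failures, at the cost of sometimes returning a version that replication has not yet overwritten.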
Scalability
Data that is stored on Cassandra is scalable across multiple machines. Such elasticity
is possible because Cassandra allows the adding of additional machines to the cluster
when required (Apache, 2011).
2.4 The market for RDBMS’ and Non-Relational DBMS’
2.4.1 Introduction
This section gives an overview of the current market for both relational databases and
newer non-relational databases, investigating traditional vendor database offerings as well
as the proliferation over the past few years of a number of community-developed open
source database offerings.
The literature review for determining the current market for both traditional relational
databases and ‘future’ non-relational databases drew on a variety of sources, including
Internet search queries to find relevant research material, as well as the University of
Dublin (DU) library facilities, which provide access to academic and commercial research.
2.4.2 RDBMS Market
Today, many executives want business to grow based on data-driven decisions. As such,
analytics of data has become a valuable tool in Business Intelligence (BI). Many of the top
performing companies use analytics to formulate future strategies and guide them on the
implementation of day-to-day operations (LaValle et al, 2010). However, organisations are
gaining more and more data without the means of extracting value from it (LaValle et al,
2010). This has created a requirement for companies to adopt
enterprise solutions that can give an overview of the data being generated, using Online
Analytical Processing (OLAP) databases.
The Database Management Systems market is split into two segments: OnLine Transaction
Processing (OLTP) and OLAP / Data Warehousing (DW). The RDBMS options available
from vendors in the market will generally target one of these two segments.
The OLTP market targets clients that require fast query processing, maintaining of data
integrity in multi-access environments and a business model that has data measured by the
number of transactions per second that the database can handle. In an OLTP model database,
there is an emphasis on detailed current data, with the data typically stored in a normalised
entity-model schema such as BCNF (Datawarehouse4u, 2009).
OLAP databases are characterised by a low volume of transactions and are primarily
designed for data warehousing. As such, they are particularly useful for data
mining, whereby applications access the data to give an overview of current trends, business
performance and informational advantage. Consequently, OLAP databases are increasingly seen as
important for making Business Intelligence (BI) decisions (Feinberg, Beyer, 2010).
2.4.2.1 Vendor Offerings
Within the enterprise database market, the industry is dominated by a few big corporations
which include Oracle, IBM, Microsoft, Sybase and Teradata. Many of the database offerings
from these firms operate in the Data Warehousing sector, which contains most of the market
for enterprise database management systems. While the big players will have comprehensive
database offerings for their clients, the market is currently being disrupted by new entrants
who are targeting niche areas, either focusing on performance issues related to their
offerings, or single-point offerings (Feinberg, Beyer, 2010).
Oracle
According to Gartner, Oracle is currently the No. 1 vendor of RDBMS’ worldwide (Gartner
in Graham et al, 2010), with a 50% share of the market for the year 2010 (Trefis, 2011). They
are forecast to improve this figure to 60% by 2016, driven by their sales of the Exadata
hardware platform. Leveraging the use of the high-end Exadata servers in conjunction with
Oracle’s database software is estimated to result in more efficient and faster Online
Transaction Processing (Graham et al, 2010).
Currently, Oracle generates 86% of revenues from its database software portfolio, with 8%
from its hardware portfolio. The future strategy of the company is to have clients purchase
complete systems – hardware and software – thus leveraging the power of the Exadata system
to get the most out of Oracle’s database technology. The result will be an increase in Oracle’s
revenues and its market share (Crane et al, 2011).
IBM
IBM is one of the main vendors in the market, and is the only vendor that offers its clients
an Information Architecture (IA) spanning all systems, including OLTP, DW, and the
retirement of data (Optim tapes) (Henschen, 2011a). IBM’s main offering in the RDBMS
market is the DB2 database. DB2 runs on a number of platforms, including Unix, Linux and
Windows OS. DB2 can also run on the z/OS platform, where it is used to deploy applications
for SOA, CRM, DW and operational BI.
IBM’s RDBMS solutions are ranked No. 2 behind Oracle worldwide (Finkle, 2008); however,
they are slowly losing market share to Microsoft and Oracle due to uncompetitive pricing for
their database as well as greater functionality that can be found from rival offerings.
Recently, IBM acquired Netezza (Evans, 2011), a company that provides a DW appliance
called TwinFin to clients. TwinFin is a purpose-built appliance that integrates servers, storage
and database into a single managed system (Netezza, 2011a). The reason IBM acquired
Netezza is the expected increase in revenues that Netezza will generate from its portfolio
(Dignan, 2010), as well as a lack of overlap in the customer base between IBM’s current
client list and that of Netezza (Henschen, 2011b). Additionally, the acquisition fits in with
IBM’s overall business analytics strategy, as IBM has marked BI as the key driver for IT
infrastructure needs (Gartner, 2010).
Microsoft
SQL Server from Microsoft is a complete database platform designed for applications of
various sizes. It can be deployed on conventional servers as well as in the ‘cloud’, allowing
clients to scale SQL Server to their respective needs. Purely a software player, Microsoft
requires hardware partners to deploy its database offerings (Mackie, 2011).
Microsoft, however, finds itself more under threat from low-cost or ‘free’ open source
alternatives such as MySQL and PostgreSQL, as it operates primarily in the low-end and
mid-market segments (Finkle, 2008). As such, if its clients are looking at alternative options,
SQL Server may not be priced competitively enough for Microsoft to compete with open
source RDBMSs.
SAP/Sybase
Sybase, recently acquired by SAP, has three main business areas: OLTP using the Sybase
ASE database, Analytic Technology using Sybase IQ, and, interestingly, Mobile Technology
(Monash, 2010). This deal was required by SAP as it was coming under increasing pressure
due to Oracle’s recent acquisition of Sun Microsystems, which gave Oracle a stronger focus
on integrated products based around databases, middleware and applications (Yuhanna,
2010).
The deal between SAP and Sybase gives both companies a lot of synergies – SAP finally
acquires an enterprise-class database in the form of Sybase IQ, a database with columnar
store and advanced compression capabilities that SAP can now offer to its hundreds of
client companies (Yuhanna, 2010).
A differentiator from SAP peers now comes with the acquisition of Sybase in the form of a
mobile offering. Sybase has a number of mobile products for enterprises, including the
Sybase Unwired Platform and iAnywhere Mobile Office suite. These technologies allow
companies to connect mobile devices to a number of back-end data sources (Sybase, 2011).
SAP now has the ability to offer its applications embedded in Sybase mobile platforms, using
the synergy between the two to improve its competitive advantage and expand to other
markets (Yuhanna, 2010). Indeed, efforts are now being made to cement Sybase’s lead in this
segment of the market, with an initiative to make the Android OS platform enterprise ready.
This involves porting Afaria, Sybase’s mobile device management and security solution, to
the Android platform (Neil, 2011). With the growth of Android now reaching 30% of the
smartphone market share in the United States (Warren, 2011), the future growth for Sybase in
the mobile enterprise market looks strong.
Finally, although big in the database market in the early 1990’s (Greenbaum, 2010), Sybase
has been considered the fourth database vendor behind Oracle, Microsoft and IBM for the
past decade. The main market for Sybase’s OLTP offering, Sybase ASE, has been the financial
services sector, with little penetration in other enterprise sectors. It is expected that SAP will
make Sybase ASE more cost effective, and make another push in this segment of the market,
maybe at the expense of the big three (Yuhanna, 2010).
Teradata
Teradata is a database vendor specialising in data warehousing and analytical applications
(Prickett Morgan, 2010). During the last year, it was considered the best placed amongst its
peers as a market leader in Data Warehousing (Feinberg, Beyer, 2011). This will be a hard
position for competitors to dislodge as products in the DW market are considered difficult to
replace (Bylund, 2011). Amongst its clients are multinational corporations such as 3M and
PayPal (Teradata, 2011).
One of Teradata’s products, the Teradata parallel database, designed for DW and OLAP
functions, has an update and support revenue stream, as well as additional functions that
customers are willing to pay for (Prickett Morgan, 2010).
However, Teradata specialises in a single area of the database market – DW and analytics
(Prickett Morgan, 2010). As such it is exposed to any weakness that may occur within that
segment of the market. The company recently acquired Aprimo, an enterprise marketing
firm with a strong emphasis on Marketing Research Management (MRM) and Campaign
Management (CM). CM is considered by some as mission critical, as it allows marketers to
unlock the value of customer data to develop multi-channel communications. Such an
acquisition adds value to Teradata’s product portfolio, without competing with Teradata’s
current product range, allowing the company to diversify its offerings to clients and future
customers (Vittal, 2010).
EMC/Greenplum
Greenplum, a DW and Analytics firm acquired by EMC in 2010, is the foundation of EMC’s
Data Computing division. Greenplum specialises in DW in the ‘cloud’, through its Chorus
platform (Greenplum, 2011).
EMC’s strategy for gaining market share is to release a free community version of its
database for testing, with the intent that users eventually purchase a commercial licence. Its
recently released ‘free’ Community Edition database, a heavily customised version of
PostgreSQL, is targeted at companies and developers for whom Greenplum’s previous
offering was not useful for creating parallel databases for DW and Analytics (Prickett
Morgan, 2011). The purpose of the release is to allow developers to build and test Massive
Parallel Processing (MPP) databases. In the event that clients who develop these systems
wish to use the software in a commercial environment, they will be required to purchase
a licence for the Greenplum Grade 4.0 database, EMC’s commercial DW offering
(Kanaracus, 2011).
EMC hopes that customers wishing for greater functionality with Greenplum’s database
will upgrade to the Greenplum Grade 4.0 database (Kanaracus, 2011).
2.4.3 Non-RDBMS Market
Open Source Databases
There are a number of open source, community-developed database solutions available on the
market today. However, because these offerings are generally ‘free’, they do not rank highly
when databases in use are measured by revenues earned, even though the total number of
open source database deployments can rival the deployments of traditional vendors (Von
Finck, 2009).
All RDBMS applications hold a consistency model that can be inflexible for certain
applications. The requirement for a record or table to be locked out from being viewed or
otherwise accessed while changes are being made slows down queries that are attempting to
generate results for end-users.
Additionally, due to atomicity and consistency, not all RDBMS applications are scalable to
the requirements of organisations that hold large quantities of data, such as Google and
Facebook.
With databases now deployed that have tables in excess of 10 TB, querying all that data
requires speed and processing power that traditional RDBMS offerings cannot deliver to the
satisfaction of user companies. Newer non-relational database offerings designed to meet
these requirements usually come in two forms: MPP systems and column-store databases
(Henschen, 2010).
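The column-store option mentioned above stores each column contiguously rather than each row, so an analytic query that touches one column reads only that column's data. A minimal sketch of the layout difference, using invented data:

```python
# Row store: one record per row, as in a traditional OLTP RDBMS.
rows = [
    {"id": 1, "region": "EU", "sales": 100},
    {"id": 2, "region": "US", "sales": 250},
    {"id": 3, "region": "EU", "sales": 175},
]

# Column store: one array per column, as in analytic/DW engines.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "sales": [100, 250, 175],
}

# SELECT SUM(sales): the row store touches every field of every row,
# while the column store scans a single contiguous array.
total_row_store = sum(r["sales"] for r in rows)
total_col_store = sum(columns["sales"])
print(total_row_store, total_col_store)  # 525 525
```

Both layouts give the same answer; the column layout simply does far less I/O per analytic query, which is why it dominates the DW segment.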
With the introduction of the Bigtable Distributed Storage System on top of the Google File
System (GFS) in 2006 (Chang, et al, 2006), Google has demonstrated that non-relational
databases can be scalable over multiple machines. Due to Bigtable’s proprietary nature
however, efforts have been made over the past five years to develop open source versions of
Google’s software, resulting in the arrival of the Apache Foundation’s Hadoop, initially
developed by Yahoo (Bryant and Kwan, 2008). A number of companies have now utilised
Hadoop and associated software to allow themselves to scale their database offerings to their
own requirements.
The growth of Hadoop can be inferred from some unusual indicators. From 2007 through to
early-to-mid 2009, demand for expertise in Hadoop or MapReduce accounted for 0.4% of the
London IT jobs market. By January 2011 the figure had grown to 1.2%, a threefold increase
in the demand for such expertise within two years (IT Jobs Watch, 2011). Additionally, there was a
49% increase in Hadoop job postings in the United States from 2008 to 2009, with most of
the job offerings being in California (Lorica, 2009).
However, there is currently a lack of suitably qualified Hadoop and HBase engineers within
the industry, and the resulting staff shortage has affected development projects at a number
of companies. Within Silicon Valley, Google and Facebook are two companies that can
afford to remunerate staff competitively due to their large sources of revenue. This has left
Cloudera, the start-up cloud database company, unable to offer top engineers remuneration
at similar levels to these competitors, so it has had to be imaginative in how it rewards
staff. This includes setting up offices within
downtown San Francisco, with the intention that staff would prefer to work in that location
than Palo Alto or Mountain View, both 30 miles from the centre of San Francisco (Metz,
2011a).
Such constraints will result in fewer projects for new NoSQL databases until an adequate
supply of qualified engineers becomes available, slowing the development and adoption of
this new technology for the foreseeable future.
Cassandra
Cassandra is a distributed, column-family database, developed at Facebook to solve an Inbox
Search problem (Lakshman, 2008). It is now an open source project from the Apache
Foundation (Apache, 2011).
In addition to Facebook, additional users of the Cassandra database include the social news
website Digg (Higginbotham, 2010), who decided to switch from MySQL to Cassandra due
to scalability issues with MySQL. The rationale behind the move was the decentralised nature
of Cassandra and the fact that it has no single point of failure (Kerner, 2010). Unfortunately,
the changeover to Cassandra did not run smoothly, forcing Digg to revert to MySQL to
ensure data integrity and keep its services available to its clients. The
episode highlighted the pitfalls of switching from one architecture framework to another
(Woods, 2010).
Taking advantage of Cassandra’s introduction to the market is DataStax – formerly Riptano
(DBMS2, 2011) – a start-up founded by the Cassandra project’s chair, Jonathan Ellis. The
purpose of DataStax is to take commercial advantage of Cassandra by selling expertise and
technical support (Kerner, 2010), following the examples of Red Hat (Linux)
and Cloudera (Cloud Computing) (Subramanian, 2010).
HBase
HBase is a non-relational database built on top of the Hadoop framework, using the Hadoop
Distributed File System (HDFS). Originally developed out of a need to process large amounts
of data, HBase is now a top-level Apache Foundation project (Zawodny, 2007).
Due to HBase’s ability to scale to large sizes, the database has received attention within IT as
a platform that can meet various companies’ requirements. Recent corporate announcements
of HBase deployments have increased the marketplace viability of HBase as a
NoSQL database option (Metz, 2011b). These include both Facebook and Yahoo, two
companies with large repositories of data.
Facebook announced a new messaging platform in which email, text messages and Instant
Messages (IM), as well as Facebook’s own messaging system, would be integrated together
(Metz, 2010). Facebook experimented with a number of database offerings, including its own
Cassandra database to see if it could handle the new system. Additionally, they excluded
MySQL due to scalability issues. Eventually, they chose HBase, due to its consistency, as
well as ability to scale across multiple machines (Muthukkaruppan, 2010).
HBase was deployed by Yahoo to handle its news aggregation algorithm. The purpose of the
new system is to data-mine content in order to optimise what the viewer sees on Yahoo’s web
portal. To deploy the most relevant news stories to the front page at any given moment,
Yahoo required a database that could quickly query, in real time, the items people are most
interested in based on the number of clicks each story receives. Deployment of this new
system has resulted in an increase in traffic to the Yahoo web portal, and subsequently
resulted in an increase in revenues (Metz, 2008).
2.5 Case Studies
2.5.1 Case Study 1- Utility companies and the data management challenge
Introduction
Utility companies are known to be one of the most conservative of enterprises when it comes
to investing in technology (Fink, 2010; Fehrenbacher, 2010). There are many reasons why
this might be so: security of supply, regulatory compliance and financial austerity, together
with a lack of business drivers, often leave the risk-averse utility treading water when it
comes to IT investment (Tony Giroti, CEO Bridge Energy, 2011). However, things have been
changing over the last few years. According to recent research by Lux, utilities (mainly
power and water) will invest up to $34 billion in technology by the year 2020 (St. John,
2011). The reason lies mainly in Smart Grid projects and the growing avalanche of
associated data which utilities will need to manage (St. John, 2011). For utilities, the business
drivers required to justify investment in the kind of technology which enables integration of
data across key business units have only recently emerged. Real-time applications just
weren’t necessary before now (Giroti, 2011).
Utilities
History has shown how utilities are by and large reactionary when it comes to new ideas. For
example, a snapshot of energy utilities related articles in the Pro Quest database (available
through the TCD Library’s online resources) at various times over the last few decades shows
flurries of activity around key moments of change in the industry. Cyclical changes from
regulation to deregulation of the energy sector in the early 1990s, begun in the US,
kick-started reactionary strategy changes within the energy industry. Ireland followed the
pattern with the Electricity Regulation Act of 1999, a programme which is nearing
completion. Fifty-six articles on related subjects between 1992 and 1994, in contrast to just
eighteen in the following six years to the year 2000 (Pro Quest database), would seem to
support this assertion.
In the last decade or so innovation for utilities centred around the technology enabling Smart
Grid and again an upsurge in articles on this subject stands out in a normally ‘steady state’
sector. More recently the pressures of a diminishing supply and subsequent higher prices of
raw material for energy production have propagated a sustainability drive.
Compliance, however, has been a steady influence on energy utilities. What makes the Smart
Grid attractive is the way it forces efficiency throughout the energy supply chain, from
generation to distribution, resulting in lower CO2 emissions – a major deliverable of the Kyoto
agreement. Related to this has been the drive towards sustainable energy generation and
supply. Vice President of Technology at Cobb Energy, Bob Arnett sums it up:
“In today’s world, where utilities are focused on environmental concerns, resource
constraints, and intelligent grids, it is sometimes hard to remember that in the mid-
Nineties, the word of the day was ‘deregulation’.”
(Arnett, 2011)
This case study looks at utility companies in the context of these three key drivers:
Regulation/Deregulation; Smart Grid and Sustainability. The case is stated in general terms
initially but quickly moves to more specific Smart Grid applications in electricity supply
companies, focusing on one Irish energy company’s use of databases in its implementation of
Smart Grid applications. As the ESB’s (Electricity Supply Board) Tom Geraghty said of
Smart Metering in a recent interview with Silicon Republic:
“How you get data back from the electronic metre to a utility central point where it is
aggregated and the bill is sent out to simply allowing people to top up their metre at
home as if it were a mobile phone shows you the complexity that lies ahead. There are
many imaginative options emerging and the opportunities are endless,”
(in Kennedy, 2011)
One estimate from Lux Research puts the increase in data coming from the Smart Grid at
900% by 2020 (St. John, 2011). Tony Giroti puts this in more tangible terms: 1 million smart
meters passing data every 15 minutes equates to 30 TB of data per year to be handled, stored
and harvested (Giroti, 2011). This figure doesn’t include the real-time data flowing through the
system as part of the self-healing attribute of Smart Grids.
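Giroti's figures can be sanity-checked with simple arithmetic. The per-reading size in the second step is derived by us from his two totals, not stated by him, and should be read as an implied order of magnitude only.

```python
METERS = 1_000_000
READS_PER_DAY = 24 * 4  # one reading every 15 minutes
readings_per_year = METERS * READS_PER_DAY * 365
print(readings_per_year)  # 35040000000 readings a year

# 30 TB/year spread over those readings implies roughly 856 bytes per
# reading once message, protocol and storage overheads are included.
TB = 10 ** 12
print(round(30 * TB / readings_per_year))  # 856
```

Some 35 billion readings a year from just one million meters, before any self-healing grid telemetry is counted, makes clear why back-end data management dominates Smart Grid planning.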
The problem can be placed within the wider question asked in this dissertation, that is, what
is the future of the traditional RDBMS in the enterprise? To this end, this case study starts
from the premise that the general feeling towards newer database management solutions such
as open source and NoSQL is that while they are attractive for certain non-core applications,
they are not yet up to the task of the more serious mission-critical functions of control
systems, financial transactions and customer management within enterprises. This study
investigates the problem in the context of traditionally risk averse utility companies and
questions if new business drivers (of which the Smart Grid is key) are forcing a rethink on
this issue.
A public utility company is an enterprise which provides key services to the public, most
typically electricity, gas, water and transportation. It may be state or privately owned, and
may operate in a regulated, deregulated or even semi-regulated market (Legal Dictionary).
The energy sector in Ireland is currently undergoing dramatic change. The two largest
energy companies in Ireland, the Electricity Supply Board (ESB) and Bord Gáis, are
commercially run enterprises and are both majority owned by the state. Both companies have
recently entered each other’s markets as a result of the state’s requirement (driven by the EU)
to open up the energy market in an attempt to improve the competitiveness of the sector for
the benefit of consumers (Irish Government White Paper, 2007).
One result of this restructuring of the sector is that the separate electricity and gas markets
have been combined and the sector is now generally referred to as the energy market. The
functions carried out by utility companies differ according to the services they provide.
Energy suppliers are similar in the functions they carry out such as generation, transmission
and distribution of energy. Water utilities in other countries have moved towards a
revenue-generating model for water supply, and Ireland, rightly or wrongly, may soon follow suit.
Each core function contains a number of supporting IT applications. Each of these in turn is
supported by a suitable data management system. Some of the major solutions used in energy
utilities include: Geographical Information System (GIS); Meter Data Management (MDM);
Customer Information System (CIS); Distribution Management System (DMS); Supervisory
Control and Data Acquisition (SCADA); and Outage Management System (OMS). Figure 2.3
shows where some of these systems fit into the overall network.
Each of these systems provides support for the specific needs of the different business functions,
such as supply, generation, distribution, trading and operations. As such, they may or may not
not be integrated. In relation to meter data management (MDM) Giroti again states the
problem succinctly in his paper entitled “You’ve Got the Meter Data – Now What?” (2011),
where he gives two options:
1. Have a proactive strategy for integrating and managing data coming from the Grid; or
2. Be reactive in response to problems as they appear, at the risk of being left behind by
competitors adopting the former strategy.
Smart Grid - The ESB case
The European Technology Platform definition of smart grids is:
“electricity networks that can intelligently integrate the behaviour and actions of all users
connected to it - generators, consumers and those that do both – in order to efficiently deliver
sustainable, economic and secure electricity supplies” (Smart Grids: European Technology
Platform, 2010)
Successful smart grid implementation depends on how enterprises utilise information systems
in managing the torrent of data heading their way. This issue puts data management systems
right back in the foreground of the IT game.
The ESB plans to invest up to €11 billion in sustainable projects including a Smart Grid
(Strategy Framework 2020). The ESB began a pilot project for advanced metering in 2007.
Advanced meters occupy what is termed the ‘head end’ of the smart grid. They reside on
customer premises or at the company’s own locations, typically at the edge of the distribution
network. The ESB has to date installed 6,500 smart meters. The estimated total installations
required for full implementation is over two million. The data consists of messages to and
from a central management system called a meter data management system (MDM). The
message can be meter data relating to load readings, voltage and temperature measurements,
outages, faults and other events.
The ESB’s existing data management platforms include solutions from Oracle, IBM and
Microsoft. Currently no open source or NoSQL solutions exist in any official way in the
company. A preliminary evaluation of the open source database solution MySQL was carried
What is the future of the RDBMS in the Enterprise?
Page 40
out by the IT department in 2010, but no decision on implementation has been made as yet.
MySQL is now owned by Oracle following its acquisition of Sun Microsystems, completed in
2010 (Lohr, 2009).
Image source: http://www.consumerenergyreport.com/wpcontent/uploads/2010/04/smartgrid.jpg
Figure 2.2 – Overview of a generic Smart Grid
(Image source: EPRI)
Figure 2.3 - ESB proposed implementation of Advanced Metering (Key area of interest is circled)
The Data Volume Problem
A traditional electricity grid is made up of electro-mechanical components that link electricity
generation, transmission and distribution to consumers. A smart grid builds on advanced
digital SCADA devices, enabling two-way communication of data of interest to utilities,
consumers and government (Financial Times, Nov 2010).
Figures for how much data will flow vary depending on the implementation of the smart grid.
Estimates from the ESB’s trials involving 6,500 meters show a substantial increase in the
amount of data to be stored and analysed at the back end.
Utilities, it seems, are not immune to ‘Big Data’. Tony Giroti is well qualified to comment on
the issue: he is one of only 13 elected members of the GridWise Architecture Council, formed
by the US Department of Energy to articulate the way forward for intelligent energy systems.
In his article for the e-magazine Electric Energy Online “You’ve Got the Meter Data – Now
What?”, (2011), Giroti states the data volume problem as such:
Figure 2.4 – Smart Meters transaction rate (1 million smart meters, one read every 15 minutes: 1 million meter reads per 15 mins x 60 secs ≈ 1,111 transactions per second)
Giroti foresees the storage and processing concerns associated with this volume of data.
Figure 2.5 – Smart Meters data size (at 1 KB per transaction per meter ≈ 1.1 MB/s; hourly collections amount to 3.6 gigabytes of data per day to be stored, analysed and backed up)
Processing of this data also presents a challenge to system architects. Gathering data from
a million smart meters at 15-minute intervals, as per the example above, equates to 1,111
transactions per second, or roughly 96 million transactions per day. The problem is further
compounded by the critical requirement for the system to analyse network event transactions
in real time when responding to fluctuations in demand and to faults (Giroti, 2011).
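The arithmetic behind these figures can be checked in a few lines of Python. The meter count, read interval and per-reading sizes below are the figures quoted in the text, not measured values:

```python
# Sanity-check of the smart-meter volume arithmetic quoted in the text.
# Assumed inputs (from Giroti's example): 1 million meters, one reading
# per meter every 15 minutes, i.e. 96 readings per meter per day.
METERS = 1_000_000
INTERVAL_SEC = 15 * 60            # one reading per meter every 15 minutes
READS_PER_DAY = 24 * 4            # 96 readings per meter per day

tx_per_sec = METERS / INTERVAL_SEC
tx_per_day = METERS * READS_PER_DAY

print(round(tx_per_sec))          # 1111 transactions per second
print(tx_per_day)                 # 96000000 transactions per day

# At the "conservative" 128 bytes per reading quoted later in the text:
per_meter_per_day = 128 * READS_PER_DAY
print(per_meter_per_day)          # 12288 bytes, i.e. roughly 12 KB per meter per day
```

Note that the daily total works out at roughly 96 million transactions, confirming the per-second figure Giroti quotes.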
One limitation of Giroti’s claim is that the article gives no indication of how the one
kilobyte per transaction figure is calculated. This is an important factor for vendors of back-end
processing systems running on relational databases: the lower this number, the better. Some
systems rely on filtering out less important data at the source, that is, at the meter itself, rather
than storing superfluous data at the back end. For example, meter location information does
not change and can be sent only once.
not change and can be sent only once. Even at a conservative data size of 128 bytes per
1 Million
Smart
Meters
Hourly Collections of
data =>
3.6Gigabytes of data
per day to be stored,
analysed and backed
up
1Kb per transaction
per meter = 1.1Mbs
1 Million
Smart
Meters
1 read every 15
mins
1 Million meter reads
15 mins x 60 secs
1,111
Transactions
per sec
What is the future of the RDBMS in the Enterprise?
Page 43
transaction for basic household usage data only at 15 min intervals, that’s 1.2 Megabytes of
data per meter per day to be stored, backed up and processed.
(Image source: Accenture, 2010)
Figure 2.6 - Sources of Smart Grid data with time dependencies
Figure 2.6 shows the different types of data involved. At one end there is critical, time-dependent
event data. Some of the data at this end has very low latency requirements, measured in
milliseconds, the kind of response times involved in the safe operation of self-healing networks. At
the other end there is the data for business intelligence. Processing of this data does not need to
be immediate. This end gets interesting, however, when the business tries to wade through
its data warehouse, clustering data to form information and using that information for
knowledge and, hopefully, wisdom. Then there is the middle layer: meter data arriving at 15-minute
or half-hourly intervals. Efficient processing of this data is critical if utilities want to
offer real-time billing to their customers.
Further sources of data can be found at the opposite end of the system, the home and car.
Successful interoperability between domestic devices, electric vehicles and supplier
equipment (data collectors) will create the intelligent home. Many commentators are focusing
on open source as the most viable platform for this development (Fehrenbacher, 2009;
Rosenberg, 2010).
So in the end, the business case for the return on investment in smart networks, and
ultimately competitiveness, depends on how well all of this data is gathered, stored, processed
and analysed.
Tom Geraghty, IT governance and strategy manager for the ESB, explained: “In terms of
where the ESB is and where Ireland is in terms of the smart-grid agenda, we have just
completed a year-long technical trial of smart metering and a decision is due mid-year from
the energy regulator on how we should proceed.” Emphasising the IT challenges ahead, he
added that “data collection, storage, transfer and billing are the key issues for utility firms”
(in Kennedy, 2011).
How one utility company is meeting the data volume challenge
Dave Rosenberg of CNET interviewed Ritchie Carroll and Josh Patterson of Tennessee
Valley Authority (TVA) about their use of the open source solution Hadoop in addressing the
data volume problem (Rosenberg, 2009). TVA uses devices called Super Phasor Data
Concentrators (SuperPDC) to collect data on the health of their electricity network. TVA
expects the stored data from all their PDCs to grow to half a petabyte in the next few years.
TVA’s Josh Patterson says that “data is collected directly from field devices at 30 times per
second. This data is then time-aligned and processed in real-time….all data gets captured into
a binary data file as time-series data for mass processing by Hadoop.”
When Rosenberg asked why TVA chose Hadoop over more mature solutions, Patterson
responded:
“We considered several technologies including SAN, NAS devices, and RDBM systems.
Hadoop gave us a commodity based hardware solution that offered superior reliability at a
minimal cost using HDFS, but it also had the added processing benefit using Map Reduce
over large scale data for fast analysis”
TVA’s Ritchie Carroll adds that the Hadoop techniques already developed allow for faster
processing of ‘Big Data’ (Rosenberg, 2009).
What is the ESB doing?
ESB engineers working on the Advanced Metering project (the first step towards the Smart
Grid) looked for organisations that had implemented projects of similar scale. They found
that the US utility company Oncor Electric Delivery had planned for data from 3.5 million
meters over a two-year period, a scale comparable to the ESB’s project at full
implementation. Access to this information was valuable in assessing the right type of
data management system required.
As the ESB already uses well-established data management solutions from Oracle and IBM,
an assessment of those vendors’ Meter Data Management (MDM) solutions seemed a good
place to start.
Oracle Utilities Meter Data Management is described by Oracle as an ‘off the shelf’ solution
for managing the influx of data from Advanced Metering Infrastructure (Oracle).
Oracle strengthened its position in the utilities area through its acquisition of Lodestar
Corporation in 2007. The ESB Customer Supply business already uses Lodestar products for
demand forecasting (PR Newswire, 2011). The Lodestar Customer Choice Suite includes the
Oracle 10g database (Oracle, 2007).
Smart DTS is AMT Sybex’s MDM solution for the UK and Ireland utility sector. Smart DTS
claims unrivalled performance for processing large volumes of data (AMT Sybex).
IBM’s Informix TimeSeries DataBlade is used as the enterprise-scale time-series RDBMS for
meter data loads. In 2005, AMT Sybex provided products and services to the ESB Market
Opening Project (MOIP) (AMT Sybex Case Study).
The ESB Power Generation business uses OSIsoft’s PI solution for managing real-time
operational information. The associated database is Microsoft SQL Server.
The ESB also looked at Aclara Software’s Meter Data Management System. It uses a star
schema for time-series data and a “wide storage” model to reduce storage needs and improve
processing.
The star schema is common in RDBMS data warehouse design. The schema is organised
around a central fact table joined by foreign keys to dimension tables (Aclara Software Inc.,
2008).
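A star schema of this kind can be sketched with an in-memory SQLite database. The table and column names below are invented for illustration; they are not Aclara’s actual schema:

```python
import sqlite3

# Illustrative star schema for meter readings: a central fact table
# joined by foreign keys to dimension tables. (Table and column names
# are invented for this sketch, not Aclara's actual schema.)
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE dim_meter (meter_id INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE dim_date  (date_id  INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE fact_reading (            -- the central fact table
        meter_id INTEGER REFERENCES dim_meter(meter_id),
        date_id  INTEGER REFERENCES dim_date(date_id),
        interval_no INTEGER,               -- 0..95 for 15-minute intervals
        kwh REAL
    );
""")
db.execute("INSERT INTO dim_meter VALUES (1, 'Dublin')")
db.execute("INSERT INTO dim_date VALUES (20110411, '2011-04-11')")
db.execute("INSERT INTO fact_reading VALUES (1, 20110411, 0, 0.42)")

# A typical warehouse query joins the fact table to its dimensions.
row = db.execute("""
    SELECT m.location, d.day, f.kwh
    FROM fact_reading f
    JOIN dim_meter m ON m.meter_id = f.meter_id
    JOIN dim_date  d ON d.date_id  = f.date_id
""").fetchone()
print(row)   # ('Dublin', '2011-04-11', 0.42)
```

The dimension tables hold descriptive attributes once each, while the fact table holds only keys and measurements, which is what makes the design compact for large volumes of interval data.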
In “wide storage”, a meter’s interval data for a day is stored in a single daily record (row),
with the attributes spread over many columns. In the more traditional “tall storage” database
architecture, 15-minute interval data for one day would be stored in 96 separate rows. Aclara
estimates that “wide storage” will take up about 10% of the space “tall storage” would require
for the same data (Aclara Software Inc., 2008). The wide-table design was originally
developed to better handle sparse data sets in which many attributes may be null, such as
user-provided information from the web (Chu et al, 2007).
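The difference between the two layouts can be demonstrated with a small SQLite sketch (again with invented column names, not Aclara’s actual design): one meter-day of 15-minute data takes 96 rows in “tall” form but a single row in “wide” form.

```python
import sqlite3

# "Tall" vs "wide" storage of one meter-day of 15-minute interval data.
# (Column layout invented for illustration, not Aclara's actual design.)
db = sqlite3.connect(":memory:")

# Tall: one row per interval, so 96 rows per meter per day.
db.execute("CREATE TABLE tall (meter_id INT, day TEXT, interval_no INT, kwh REAL)")
db.executemany("INSERT INTO tall VALUES (1, '2011-04-11', ?, 0.4)",
               [(i,) for i in range(96)])

# Wide: one row per meter per day, with one column per interval.
cols = ", ".join(f"kwh_{i:02d} REAL" for i in range(96))
db.execute(f"CREATE TABLE wide (meter_id INT, day TEXT, {cols})")
db.execute("INSERT INTO wide VALUES (1, '2011-04-11', "
           + ", ".join(["0.4"] * 96) + ")")

tall_rows = db.execute("SELECT COUNT(*) FROM tall").fetchone()[0]
wide_rows = db.execute("SELECT COUNT(*) FROM wide").fetchone()[0]
print(tall_rows, wide_rows)   # 96 1  -- the same data as 96 rows vs 1 row
```

The wide row avoids repeating the meter and date keys 96 times, which is where much of Aclara’s claimed space saving comes from.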
Conclusion
The ESB has not yet concluded its trials, and no decision on its preferred MDM system was
available at the time of writing. It is the opinion of the authors that the ESB will follow the
line advised by Keith Broad (Director of Information Technology at Bluewater Power
Distribution Corporation, Ontario, Canada): engage a trusted partner that will be around for
a long time, one that will use its strong capabilities to develop product evolutions but can also
implement at local level (Broad, 2011).
Another view comes from Forrester’s Jeffrey Hammond. He says that enterprises are now
less afraid of open source solutions. “For large companies its tech OS savvy people who just
want to solve a problem without the burden of procurement and licensing on their backs and
they have the time...for smaller co’s its just money” (Hammond, 2009).
Tennessee Valley Authority is one of the rare utilities to embrace open source and would
seem to be the exception to the rule. (Different research methods, such as interviews and
surveys, may reveal other utility companies that are using open source solutions for Smart
Network applications; however, based on the inductive research so far, the number would
appear to be low.)
For utilities that are regulated to any extent, decision making on investment is tightly
controlled, and network infrastructure upgrades can be slower and more cautious. As a result,
large, well-established vendors are more attractive (Fehrenbacher, 2010).
Things may be changing, however. As initiatives to reduce costs while investing in smart
technologies dovetail with the availability and uptake of technically robust and
commercially sound new offerings, open source and non-relational database management
platforms are making inroads into utility companies. Careful matching of systems with
functional requirements is still needed, but open source, it seems, is becoming part of the
business case for technology-savvy utilities. In reality, for utilities anyway, this may happen
by proxy as their traditional and trusted vendors acquire the more robust of the open source
solutions.
Andy Roehr of Capgemini sums up the importance of new technologies and methods for
managing data:
“Just storing the data is also only the first step in gleaning benefits from smart meters
and smart grids…the industry as a whole has a challenge in front of it as we learn how
to use the data. Right now, we're trying to figure out what the right technologies are
and appropriate data collection intervals. People are going to learn as they go.”
(Pariseau, 2009)
2.5.2 Case Study 2 - Social Networks – The migration to NoSQL database models
The most active adopters of ‘Big Data’ NoSQL database technologies have been social
network websites, as these are companies that generate large amounts of data from their
users. For this case study, we investigate the adoption of NoSQL databases by two prominent
social networking sites, and why they chose these database models over traditional
RDBMSs.
Facebook Messages - Choosing the scalability of HBase over the constraints of MySQL
Facebook Messages is a new application from Facebook that integrates all of a user’s emails,
text messages (SMS) and instant messages (IM), as well as messages sent using Facebook’s
own messaging system, between that user and another person into one conversation
(Muthukkaruppan, 2010). To reply to or forward a message, the user can select the method of
communication by which the message should be received – Facebook, e-mail, IM or SMS
(Seligstein, 2011).
Technology behind Facebook Messages
During the initial development of Messages, the engineering team realised that they would
require a large and robust storage platform in which to store all the messages that would be
generated. However, the team first needed to know what the requirements for the system
were. To get an idea of how the new Messages system was likely to be used, they monitored
the usage of their current users in relation to responding to message ‘chats’ on their own
profiles. This was from a total pool of 300 million users sending 120 billion messages a
month. After careful study, the engineering team realised that two patterns emerged:
● “A short set of temporal data that tends to be volatile”.
● “An ever-growing set of data that rarely gets accessed”. (Muthukkaruppan, 2010)
The team then proceeded to evaluate different database technologies to determine which
system would be most suitable for the new Messages service. Depending on the outcome of
this evaluation, Facebook would adopt one of the options, or possibly build their own (Hoff,
Nov 2010).
Facebook already has large clusters of MySQL servers (Cohen, Nov 2010), and was the
original developer of the Apache Foundation’s Cassandra database (Lakshman, Malik,
2008).
The team tested the system on clusters of MySQL databases to determine whether or not it
would scale to the size necessary for the new service. They dismissed MySQL as an option
because, when dealing with a large amount of data, indexes take a long time to update and
statistics on the data rarely, if ever, get updated (Peschka, 2010). As such, the performance of
the system suffered (Muthukkaruppan, 2010).
Having tested the system on Cassandra, they found that Cassandra’s eventual-consistency
model was a difficult pattern to reconcile with the new Messages infrastructure
(Muthukkaruppan, 2010). Under Cassandra’s consistency model, old versions of a post or
message may still be returned after the updated post has been written (Apache, 2010a). With
the instant messaging component of Messages required to return immediate results to the
user, eventual consistency could result in users reading older messages that are still in the
system, making Cassandra unacceptable for Messages.
They realised that HBase was ideal for Messages. To start with, it has a simpler consistency
model than Cassandra (Muthukkaruppan, 2010). HBase is built on top of Hadoop/HDFS,
which replicates data using a technique called replication pipelining (Peschka, 2010). In
replication pipelining, data received by the first copy of a file is immediately forwarded to the
second copy, located on another node, and so on (Apache, 2010b). This ensures that the data
is consistent across the whole system, and that any user accessing the data sees the most
up-to-date information, irrespective of which copy of the data is accessed.
The strength of the HDFS/Hadoop consistency model was required if Messages was to
provide the real-time updates users expect of the system.
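The pipelining idea can be illustrated with a toy model in Python. This is a deliberate simplification of HDFS’s behaviour (real pipelines operate on file blocks and packets, with failure handling), and the class below is invented purely for illustration:

```python
# Toy model of HDFS-style replication pipelining (greatly simplified):
# a write is forwarded synchronously along the replica chain before it
# is acknowledged, so a read from ANY replica afterwards sees the new
# value. This is an illustration of the idea, not HDFS's actual API.

class Replica:
    def __init__(self, next_replica=None):
        self.store = {}
        self.next = next_replica

    def write(self, key, value):
        self.store[key] = value
        if self.next:                 # forward down the pipeline first...
            self.next.write(key, value)
        return "ack"                  # ...and only then acknowledge

# Three replicas chained into a pipeline: r1 -> r2 -> r3
r3 = Replica()
r2 = Replica(r3)
r1 = Replica(r2)

r1.write("msg:42", "hello")
# After the ack, every replica already holds the latest version.
print([r.store["msg:42"] for r in (r1, r2, r3)])   # ['hello', 'hello', 'hello']
```

Contrast this with an eventually consistent store, where the ack could be returned before the forwarding completes, leaving a window in which some replicas still serve the old value.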
Twitter - The use of NoSQL databases at Twitter
Twitter currently generates 12 terabytes of data per day, a volume that is doubling every year
(Weil, 2010). Originally, to store the large amounts of data it was generating, Twitter used a
logging system called syslog-ng. Syslog-ng eventually stopped scaling, resulting in the
system dropping data; in effect, that data was lost forever (Weil, 2010).
To resolve this problem, Twitter adopted a new system called Scribe (Weil, 2010),
developed and open-sourced by Facebook (Hoff, 2008). Scribe works well in a distributed
system, with only minimal loss of data under certain circumstances, such as timeouts
(Hoff, 2008). Scribe solved Twitter’s initial problem of logging and saving all the
data it generated. The system worked so well that Twitter now knew more about what
was happening across its entire technology ecosystem (Weil, 2010).
With this increased amount of stored data, Twitter realised that it needed to be able to use the
data productively. To analyse the data, Twitter turned to Hadoop and MapReduce.
Hadoop distributes the data over large clusters of machines; using MapReduce, Twitter was
then able to compress the data by removing duplicate and unnecessary records, e.g. repeated
User_IDs.
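The duplicate-elimination step can be sketched with a single-process version of the map/shuffle/reduce pattern in plain Python. This is a stand-in for what Hadoop distributes across a cluster, not Twitter’s actual jobs, and the record fields are invented for illustration:

```python
from itertools import groupby

# Single-process sketch of the map/shuffle/reduce pattern used to
# collapse duplicate records. (A stand-in for Hadoop, not Twitter's
# actual jobs; record fields invented for illustration.)
records = [
    ("user_7", "2010-11-01T09:00", "tweet A"),
    ("user_7", "2010-11-01T09:00", "tweet A"),   # duplicate log entry
    ("user_9", "2010-11-01T09:05", "tweet B"),
]

# Map: emit (key, record) pairs, keyed on the whole record.
mapped = [(rec, rec) for rec in records]

# Shuffle: bring identical keys together (Hadoop does this across the
# cluster; here a sort followed by groupby has the same effect).
mapped.sort(key=lambda kv: kv[0])
groups = groupby(mapped, key=lambda kv: kv[0])

# Reduce: emit one record per key, discarding the duplicates.
deduped = [key for key, _ in groups]
print(len(records), "->", len(deduped))   # 3 -> 2
```

Because each key’s group is processed independently, the reduce step parallelises naturally, which is what lets Hadoop apply it to billions of log records.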
Attempting to analyse the data using MySQL would require joining a table of User_IDs with
a table of tweets. The resulting query would have to read through both tables, involving
millions of users (at the time of writing, 130 million) and all of the tweets generated by those
users (billions). The length of time required to generate the results would make MySQL too
slow to be useful for such a query (Weil, 2010).
Because Hadoop was designed with parallel computing in mind, it is possible to query
all of the tweets across the Twitter infrastructure in minutes. Additionally, with Hadoop’s
emphasis on scalability, adding more machines to the cluster speeds up the process.
However, Hadoop will also read data that is not required for the query. Solving this
problem involves writing specialised queries against the Hadoop infrastructure in Java,
the language in which Hadoop is written. This resulted in queries that were not optimised,
slowing the system down.
To overcome this problem, Twitter uses Pig, a high-level language designed for extracting
results from Hadoop (Anand, 2008). Built on top of Hadoop, Pig can express a query in 5%
of the code of a raw MapReduce job, written in 5% of the time (Weil, 2010). The ease of use
of Pig also allows more individuals within Twitter to customise queries to their specific
needs, helping different departments obtain the answers they need to improve their
performance and productivity.
Using HBase, Twitter started to build products within the Twitter infrastructure. Take, for
example, Twitter’s People Search utility: the old system would scan through the User_ID
table for the relevant name, but in an offline process on a single node. The system was prone
to failure due to the length of time it took to process the query, in addition to listing
irrelevant results.
Because People Search is built in HBase on top of Hadoop, Twitter is able to scale People
Search across multiple machines. This not only improves the overall performance of People
Search, it also gives it built in redundancy, as Hadoop writes multiple copies of the file across
the cluster to prevent a loss of data (Weil, 2010). Additionally, People Search is mutable,
allowing users of Twitter to remain ‘findable’ by People Search even if they have changed
their user names (Weil, 2010). This contrasts with the problem of structured data discussed in
the section on RDBMSs.
Twitter performs a large number of queries based on degrees of separation (Weil, 2010). For
example, when an individual sends a tweet to another individual, every one of the sender’s
followers must be informed that the tweet was sent, in addition to all of the followers of the
individual who received the tweet; two sets of updates must be performed. Twitter originally
used MySQL in third normal form to ensure that everyone who was meant to receive the
tweet update did so. Unfortunately, as the number of followers for an individual grew,
MySQL ran out of RAM when the indices overflowed, resulting in updates not reaching all
followers. Even when Twitter denormalised the table into lists, the approach became
inefficient for individuals with too many followers (Weil, 2010). It also created data
consistency challenges, especially on deletion, as the system would then have to rewrite the
whole update.
To overcome these problems, Twitter built an architecture on top of MySQL called FlockDB.
FlockDB, developed by Twitter (Pointer et al, 2010), is a database that stores graph data and
is optimised for fast reads and writes to adjacency lists. In conjunction with Gizzard, a
middleware networking application that handles and routes queries between the back-end
data store and the database, FlockDB stores User_IDs and Tweet_IDs as sets of integers in
these lists, sorted with the most recent first (Pointer et al, 2010). By reducing the amount of
data the system has to examine, a result of representing IDs as plain integers, FlockDB can
query any page of indexed data very quickly, allowing tweets to be delivered faster than if
full User_ID or Tweet_ID records were used as the primary source of query data.
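The adjacency-list idea can be sketched in a few lines of Python. This is a greatly simplified, single-process model of the concept, not FlockDB’s actual storage engine or API; the class and method names are invented, and the lists here are sorted by ascending ID rather than recency:

```python
from bisect import insort

# Sketch of a FlockDB-style adjacency list: each user's followers are
# kept as a sorted list of plain integer IDs, so paging through them is
# a cheap list slice. (Greatly simplified; invented names, not
# FlockDB's actual storage format, which orders by recency.)
class FollowerGraph:
    def __init__(self):
        self.followers = {}            # user_id -> sorted list of follower IDs

    def follow(self, user_id, follower_id):
        # insort keeps the adjacency list sorted on insert.
        insort(self.followers.setdefault(user_id, []), follower_id)

    def page(self, user_id, offset, limit):
        # Paging is a slice over small integers, not a table join.
        return self.followers.get(user_id, [])[offset:offset + limit]

g = FollowerGraph()
for follower in (42, 7, 99, 13):
    g.follow(1, follower)
print(g.page(1, 0, 3))   # [7, 13, 42]
```

Storing bare integers rather than full user records is what keeps each adjacency list compact enough to read and write at tweet fan-out rates.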
Chapter Three - Research Methodology
3.1 Introduction
“Somewhere, something incredible is waiting to be known.”
Dr Carl Sagan (American astronomer, writer and scientist, 1934-1996)
“Knowledge is the death of research.”
Walther Hermann Nernst’s motto, in C. C. Gillispie (ed.), The Dictionary of
Scientific Biography (1981), Supplement, Vol. 15, p. 450
“Inquiry is fatal to certainty.”
Will Durant (American writer and historian, 1885-1981)
The quotes above perhaps illustrate both the wonder and the frustration of conducting
research. In the beginning there is an idea that, before investigation, may be imagined to be
wonderfully true. The following statement will serve to illustrate the point: ‘My cat must be
cleverer than all other cats because I have seen him do things that other cats cannot do.’
Science demands that such a statement, with all its assumptions, be challenged (if deemed
worthy of investigation in the first place); this is what Karl Popper calls ‘falsifiability’ (cited
in Burns, 2000, p. 7). Research is the way to test the hypotheses contained in the statement.
Proper research must test all findings on many fronts; for example, assumptions based on
culture, class, gender, language, values and history must be challenged before facts can be
accepted as true. In this respect, Nernst’s motto is only half the story. As knowledge is
acquired it must be retested in various contexts, and hence Durant’s reverse of Nernst’s motto
is also true. The only path towards certainty is further inquiry.
This chapter attempts to show how the research question at the heart of this dissertation was
addressed and why it was placed in a particular context. It also discusses the reasoning behind
the selection of a research methodology and the specific methods used.
Chapter 1 introduced the research question, and Chapter 2 discussed this question in detail by
reviewing the current literature, analysing the findings of the research and, finally, using case
studies to support our research. This chapter presents the following:
• The strategy adopted for researching the question in the form of a framework.
• Explanation of why we chose a particular research strategy.
• The methodology chosen to best fit that strategy.
• The research methods used.
• How the overall strategy fits with current research practices.
• Strengths and weaknesses of the research strategy.
3.2 The strategy adopted for researching the question
A definition of research from Robert B. Burns (not the poet) states that “Research is a
systematic investigation to find answers to a problem.” (Burns, 2000, p. 3). The Cambridge
Dictionary definition is a little closer to the work undertaken for this dissertation: “Research -
a detailed study of a subject, especially in order to discover (new) information or reach a
(new) understanding” (Cambridge, 2008).
Research theory tends to centre on two contrasting approaches – the scientific/positivist
approach and the qualitative/ interpretive approach. The starting positions of both approaches
(albeit simplified) can be summed up in the two quotes below:
"There's no such thing as qualitative data. Everything is either 1 or 0"
(Fred Kerlinger in Miles & Huberman, 1994, p. 40)
"All research ultimately has a qualitative grounding"
(Donald Campbell in Miles & Huberman, 1994, p. 40)
Ideally, all researchers would like their data to be easily validated. Numerical data, the basis
of statistical analysis, should under correct control conditions produce valid answers to a
hypothesis (that is, either true or false) (Burns, 2000, pp. 8-9).
On the other hand, there can be an inherent weakness in the way humans measure things. A
controlled environment in which an event or behaviour is observed is, in a sense, a contrived
environment, and cannot present an exact replica of the conditions in which the same
phenomenon occurs naturally. The Hawthorne effect (from Elton Mayo’s experiments at the
Western Electric Hawthorne Works, Illinois, 1927-1932) is sometimes cited as an example of
this problem (BioMed Central, 2011).
Quantitative/Scientific/Positivist          Qualitative/Interpretivist
Objective/Rational                          Subjective/Argumentative
Deductive                                   Inductive
Observe and Measure                         Observe
Closed/Controlled                           Open/Natural
Statistical                                 Descriptive/Interpretive
Behaviourist                                Cognitive
Logical argument - ‘This is so’             Hypothesis - ‘This seems to be so’
Table 3.1 - Key concepts in Qualitative and Quantitative research methodologies
Table 3.1 above shows some of the keywords associated with each approach. It has
been compiled from various sources (Burns, 2000; Web Center for Social Research
Methods; and EJBRM, 2003).
The last words in the table are perhaps pertinent to understanding the place of each approach
in the way arguments are made. In the logical method, the verified validity of a premise leads
to a valid conclusion. However, without the cyclical testing and retesting of hypotheses, the
validity of a premise may be compromised. Take, for example, the following syllogism:
All the cats I have observed have four legs.
Therefore, all cats have four legs.
The above is a weak argument: the premise may be true, but the conclusion does not take
account of cats that have unfortunately lost limbs and are still alive. In a more relevant and
less abstract way, the selection of information and the testing of assumptions have been key
features of the research methods for this dissertation.
The qualitative approach, especially in the initial stages of research can direct focus to where
quantitative study efforts should be made. This iterative approach of using the appropriate
methodology according to the required goal ensures better focus and therefore less waste of
research resources.
The last sentence of the paragraph above is itself a syllogism (an enthymeme) (Rapp, 2010).
It contains an initial premise, followed by an inferred premise and then a conclusion, and
hence requires further testing. It is not practical, given the scope of this dissertation, to stop
and test every premise of an argument to its end (even if it were possible, given that the data
is not quantitative in nature). The philosopher and author Bertrand Russell recognises the
problem with syllogisms in general when he says that all we can really expect to understand
is how words are used (Russell, 1995, p. 208). We, as researchers, accept that limitation and
present our research as contained within the framework set out below. What is required is a
solid foundation on which to build and enhance the existing knowledge of the subject matter.
The following sections deal with the basis for the research, which has hopefully kept it on the
correct track while bounded by its limitations.
3.3 A Theoretical Framework
The scope of this chapter does not merit dwelling too much on the scientific approach,
because the research question and subject of this dissertation are better served by a
qualitative analysis approach. The presence of the word ‘future’ in the question alone forces a
hypothetical and inductive method. At the end of this chapter, the focus returns briefly to
research theory, relative to our experience of researching this dissertation. Particular attention
is given to future study of whether the availability of large quantities of information on the
Internet, together with advancements in analytical tools and skills, is bringing the two
contrasting sides of research theory together.
Figure 3.1 – A Research Framework
The diagram above presents a framework for researching the questions that arose during the
investigation of the central question – What is the future of the RDBMS in the Enterprise? It
is based on a qualitative and inductive research methodology. The framework has been
constructed from information from several sources including Robert Burns, (2000); Trochim
(2006); and, Ellis and Levy, (2008).
[Figure 3.1 flowchart: question for investigation → observation and investigation of
instances of the phenomenon in a similar context (literature review) → emerging patterns →
form hypothesis → test hypothesis → form theories, new ideas and/or conclusions; if a
hypothesis is no longer valid, the cycle returns to observation.]
The main concept of the framework design is the cyclical return to the source data (in the
case of the research for this dissertation, the literature review) as hypotheses are validated,
modified or discarded. It is noted that Burns differentiates between searching for data which
supports or refutes a hypothesis, and the open-minded analysis of collected data to develop
theories and ideas, from which new directions for continuing research can emerge (Burns,
2000, p. 390). For this dissertation a combined approach was taken. An initial swell of
information relevant to the subject was collected, and analysis of this information gave rise to
theories. Where new ideas resulted, they were either allowed to develop into further research
or discarded for study elsewhere. Ideas which were interesting but not used are recorded in
Chapter Four.
3.4 Research Design
More specific to the purpose of this dissertation, the following research design was selected.
• Research question: What is the future of the RDBMS in the Enterprise?
• Collection and classification of relevant information
• Hypothesis – There is a future for the traditional RDBMS in the enterprise.
• Historical investigation of traditional RDBMS.
• Investigation of new types of databases and data management systems, and what they
offer.
• Comparison and contrast of both traditional and new models.
• Question - Are there any drivers for change and if so what are they?
• Hypotheses of drivers for change - new applications, social behaviour, the data
volume problem.
• The marketplace for data management systems traditional and new
• Selected Case Studies supporting or refuting our hypotheses.
• Research on research methodology – are we doing it well enough?
• Conclusions - hypothesis true or false?
3.5 Methodology - A Qualitative Approach
The aim of the research is to provide new knowledge regarding the future role of
relational database management systems in enterprise organisations. The inquiry hypothesises
that new applications of technology (business intelligence for competitive edge, the advances
of science, information highways and more), and changes in social behaviour (social
networking and sustainable living for example) are presenting organisations with a potential
data deluge (The Economist, 2010, Vol. 394, No. 8671, p. 11). 'Big Data', as it has come to be known, is a problem requiring new data management systems. Over the last few years the
big names in data processing and managing technology, Oracle, IBM, SAP and Microsoft
have spent over €15 billion in acquiring software companies specialising in data analysis and
management (Cukier, 2010, p. 4). This dissertation looks at what ‘Big Data’ could mean for
the traditional relational type data management systems.
To investigate the questions raised and the emerging hypotheses derived from the above
exploration a qualitative research approach was used.
3.6 Methods
Research methods were chosen in keeping with the framework, design and methodology
previously stated. Donald Ratcliff's paper provides a useful compilation of qualitative research methods, from which the following were chosen. Ratcliff (2000) summarises the approach in terms similar to other authors:
• Observation of events and behaviours within a common context – examples: industry sector, academia, sustainability, social networking
• Recognition of patterns of similar events or behaviours
• Answers induced from the findings.
• Content analysis – looking for emerging themes
3.6.1 Method - Analytic Induction
The steps of the analytic induction method are summarised by D.R. Cressey (cited by Ratcliff, 1994): (1) an idea to be explored is formed; (2) a hypothesis is developed; (3) an occurrence of an event or behaviour (phenomenon) related to the hypothesis is sought; (4) the phenomenon either supports or refutes the hypothesis; (5) additional phenomena are sought; (6) the hypothesis is tested every time a new instance of the phenomenon is found.
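The cyclical testing described in steps (3) to (6) is essentially a loop over observed instances, revising the hypothesis whenever an instance refutes it. The toy Python sketch below makes the cycle concrete (our illustration only; the function names and the set-based representation of a hypothesis are invented for the example and are not part of Cressey's or Ratcliff's work):

```python
def analytic_induction(hypothesis, instances, supports, revise):
    """Sketch of the analytic induction cycle (steps 3-6): test the
    hypothesis against each new instance of the phenomenon and revise
    it whenever an instance refutes it."""
    for instance in instances:
        if not supports(hypothesis, instance):
            hypothesis = revise(hypothesis, instance)
    return hypothesis

# Toy run: the hypothesis is a set of properties claimed to hold for
# every observed system; a refuting instance narrows the claim.
initial = {"relational", "transactional"}
observed = [{"relational", "transactional"},
            {"transactional", "key-value"}]       # refutes "relational"
final = analytic_induction(
    initial, observed,
    supports=lambda h, inst: h <= inst,           # all claimed properties present?
    revise=lambda h, inst: h & inst)              # keep only shared properties
print(final)                                      # -> {'transactional'}
```

The point of the sketch is only that the hypothesis is never tested once and filed away: every new instance re-opens it, which is exactly the cyclical behaviour of the framework above.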
3.6.2 Method - Content Analysis
Content Analysis looks for emerging trends in documents, interviews, text, articles, academic
and industry papers which help the researcher to classify the information (Ratcliff, 1994). For
this dissertation, the data/information repository and categorisation features of Zotero were
used. Zotero, in its basic version, is a freely available Internet browser tool which allows researchers to collect, categorise, organise, and share research sources. It also provides an easy method for citing sources. Zotero is produced by the Center for History and New Media at George Mason University, which cites the United States Institute of Museum and Library Services, the Andrew W. Mellon Foundation, and the Alfred P. Sloan Foundation among its sponsors.
It was not only pleasing but also relevant, from a learning perspective at least, that we could interact with a database in this very practical way while using the fruits of that labour to write this dissertation.
3.6.3 Method - Historical Research
Historical research involves the collection and analysis of information from past events as
an aid to exploring present and future events (Gay, 1996). Again, trends and patterns
provide the researcher with useful pointers towards further courses of exploration of
hypotheses under test. An initial hypothesis may be modified or even abandoned if findings
arising from steps 5 and 6 of the analytic induction method above suggest it. The historical
analysis stage was critical to proper research for this dissertation, as the initial question involved looking at past events and evaluating their relevance to a hypothesis regarding the future. It is here that assumptions were tested and where disappointment and frustration lurked at the end of every blind alley. Exceptions to any trends or patterns discovered were also investigated and accounted for. In other words, an open mind was required in reaching a conclusion from the findings.
3.6.4 Method - Case Study
There is difficulty in pinning down an exact definition of a case study. According to Burns (2000, p. 459) it has become a 'catch-all' term for research that does not fit into any of the other categories of research methods.
The Collins English dictionary definition is useful: “the act or an instance of analysing one
or more particular cases or case histories with a view to making generalizations”. The three
case studies in this dissertation investigate the ideas raised elsewhere in the research with
specific contexts. The exploration of themes, trends or patterns within the bounded scope of
the case study allows for a general theory about the subject under investigation to be formed.
For example, if similar events or behaviours are found to occur for different parties within the same industry sector, can a general theory be formed, and do any exceptions alter that theory?
3.6.5 Method - Grounded Theory
Grounded Theory was originally defined by Glaser and Strauss in 1967 (in Burns, 2000, p. 433) and is perhaps the overriding method associated with qualitative research. Burns defines it as the theory that emerges from the body of data as it is analysed, together with that which was discovered previously, and includes the testing of speculative ideas (2000, p. 433).
It is related to the cyclical testing of hypotheses as they are formed in the context of the
research framework discussed above.
3.7 Ethics Approval
Research for this dissertation relied on available sources of data and information already in
existence. No interviews or surveys were conducted during the course of the research. Any interviews or surveys cited in this dissertation were carried out by third parties previously and are freely available within the public domain. Early in the research process, ethics approval was sought from the college on the basis that interviewing experts might be required as part of the case studies. The application was rejected on the grounds of insufficient information. Following the decision to take a qualitative approach to the research, involving a 'grounded theory' method, it became clear that the detail sought by the college ethics approval process, specifically the interview questions, could not be formed until later in the research. It is felt that this is a limitation of the ethics approval process, especially for qualitative analysis, where iterative and exploratory methods of research are used to generate ideas.
3.8 Audience
The intended audience in the first instance is the academic staff of the college and any
external readers chosen by the college, who may read this as part of the final year
undergraduate examination. Other students may find this dissertation informative in itself, or
as a starting point for further research into database management systems, the emerging data
volume problem, the case studies, and/or the research methods used. It may also be useful to
students of business who wish to expand their knowledge of database systems as well as the
current and future database market.
3.9 Significance of research
This dissertation aspires to add to the knowledge of some of the areas discussed. There are
many artefacts (articles, documents, papers, online discussions) available on the subject of
RDBMS, new and emerging databases, and ‘Big Data’. This dissertation attempts to distil the
most relevant information to extract the most essential ideas about the future role of the
RDBMS within enterprises.
3.10 Limitations of the research methodology
The limitations of the research have been mentioned earlier in this chapter. These limitations
are to a large degree associated with the qualitative approach taken. Reliability and validity
testing of data is more easily applied to quantitative analysis. The sources of information for
this dissertation are textual in form and exist in various contexts. This together with the
subjective interpretations of the researchers means that key assumptions must be questioned
and generalisations must be supported. The scope and purpose of the dissertation have prompted an approach whereby enough data is collected to fairly represent the corpus of information on the subject, so as to make a holistic examination possible.
One possible area for future study of research theory is related to our topic of 'Big Data'. The huge amount of data (specifically in textual forms) now available to researchers via the Internet presents the potential for new hybrid methods of research combining quantitative analysis tools and qualitative approaches. In this respect the growing area of Business Intelligence can provide nourishment to research intelligence.
3.11 Conclusion
The chapter opened with two quotes, from Hermann Nernst and Will Durant, illustrating the symbiotic nature of knowledge and research. It could be said that new knowledge exists in the brief period between the death of one inquiry and the rebirth of another. While epistemology has not been the subject of this chapter, proper research should not be undertaken without a basic understanding of the underlying theory. The chapter began with a brief discussion of research theory. It then proceeded to put some of that theory into practice for the specific purpose of this dissertation. A theoretical framework was illustrated and a research design set out. The chosen methodology was explained along with its associated methods. The limitations of the research were stated, together with areas for future research.
Chapter Four - Conclusions, Limitations of Research and Future
Work.
4.1 Introduction
In the first chapter we proposed a research question. That question was examined and defined. This formed the basis of our hypothesis, which we framed in such a way as to predicate that there is a future for the RDBMS in the Enterprise. The process of examining and testing the validity of that predicate occurred during the literature review in chapter two. Chapter three looked at the research methodology that best suited our purpose. The research question prompted a qualitative approach. A selection of qualitative methods was chosen as being most appropriate to the nature of the secondary research being undertaken.
We found the application Zotero useful for organising and classifying our Internet sourced
information. The tagging features in particular provide a flexible method of cross-referencing
material supporting or refuting the hypotheses under examination.
This final chapter is focused on bringing the various threads of our research to a conclusion. It also presents possible topics for future research related to the content of this dissertation. This future research is work which, for various reasons, we could not cover in sufficient detail to do it justice. These topics emerged as branches off the core of our research stem. The framework of our research methodology enabled us to assess each branch as a new idea against our central theme. The last part of the chapter acknowledges that the decision to exclude certain topics from detailed discussion presents one limitation of our research. That and other limitations are summarised here.
4.2 Conclusions
The following conclusions section is presented in the same sequence as the topics discussed
in the literature review of chapter two.
4.2.1 RDBMS
The discussion earlier on RDBMS presented some interesting findings. In particular, some of the historical aspects concerning the development of RDBMS revealed insights into why RDBMS has had such a durable and sustained presence within IT. The strength of the early work on RDBMS carried out by certain IBM research groups was, it seems, bolstered by the lack of focus IBM gave to RDBMS initially. With the corporate eyes of IBM on other, more commercial DBMS goals, the research teams had the time to develop their relational version without the compromising constraints of getting a product ready for market. When IBM did finally turn its attention towards relational DBMSs in the early 1980s, it already had the basis of a good product in System R. One of the key disadvantages of System R at the time was its physical size (large enough to fill a room) compared to the newly emerging mid-scale systems. SDL's (now Oracle) DBMS offering, for example, was comparable in size to a desktop PC today. However, it is the portability of System R's logical design that has left a lasting legacy.
4.2.2 New DB’s
Currently, there is a lot of development of new database technologies, with different
companies developing different systems to meet, initially, their own internal requirements.
Some of these database technologies have then been open sourced to the community, to allow
others, be they companies, organisations or individuals, to benefit from the experience of the
pioneer developers and users of those systems.
It does, however, result in a fragmented market of databases, with any number of compatible and complementary database systems being developed concurrently. This could result in confusion for any organisations or individuals who wish to use a database but are unsure of which model is right for them.
4.2.3 Market
Currently, the database market is dominated by a few large proprietary vendors, many of
whom have complete system offerings. They are however, being challenged by smaller
players with innovative offerings. As these challengers are gaining traction in the markets,
they are showing that big vendors have gaps in their portfolios.
One of the interesting gaps being filled is in the area of 'Big Data' analytics. This is an area
that is of importance for BI and the ability of companies to gain value from all the data that
they have accumulated, thus allowing them to implement new strategies or make day-to-day
decisions. Many of the disruptive technologies that new companies are developing are
targeting this area of the database market for growth. In turn, this has made them targets for
the big players looking to enter new markets or to add value to their portfolios. When the best features of one entity are combined with those of another to create a single entity, the result is usually called a 'hybrid'. Large vendors, through their acquisition of new technologies, are now
beginning to offer combinations of data management systems in what is often referred to as a
‘stack’ offering. RDBMS would seem to remain an important ingredient within those stacks
as its best features are put to work alongside newer technologies. Trusted vendors together
with stacks incorporating older but reliable technology provide an absorption path for less
mature open source products into the traditionally cautious enterprise DBMS market.
The jobs market for employees with the skills needed to develop new NoSQL databases is currently in its infancy. The numbers may be small but, given the relative youth of the sector and a three-fold increase in demand for staff with the relevant expertise, the market for non-relational databases like Hadoop and HBase/MapReduce is likely to increase substantially with time.
The decision making around the suitability of NoSQL databases or SQL-based RDBMSs for enterprises is dependent on a number of factors. The relative immaturity of open source systems and the unstable nature of open source development can evoke caution amongst Chief Information Officers or their equivalents. However, established vendors are starting to incorporate open source into their product offerings. CA's well-established product Ingres has already moved to an open source platform in the hope of leveraging competitive advantage by coupling open development with Ingres' reliability and brand. There are many mid- and small-scale data
management systems worth considering even within enterprises for achieving certain project
objectives.
4.2.4.1 Case Study 1 - Utility Companies
Utility companies, primarily energy, water and transport suppliers, are conservative in
general due to the historical constraints of state regulation. Even when that regulation is
relaxed or removed altogether, safety, cost and control measures tend to make these
enterprises more vigilant when it comes to investing in IT. However, the landscape for
utilities is changing rapidly and 'Big Data' looms large. For electricity providers such as the ESB, investment in smart grid networks means investing in information systems that can manage large amounts of data in real time as well as providing for off-line analytics. For the more risk-averse companies the decision making may involve uncomfortable choices as solution providers move towards open source technologies as part of their offerings. The fact that most of the large-scale and trusted vendors have utilities solutions for managing large data volumes may provide some cushion of comfort, albeit at a higher cost.
4.2.4.2 Case study 2 - Social Networks
It is clear that new systems at social networking companies such as Facebook and Twitter require both speed and consistency for their new services to be effective and adopted by their users. As such, architectures based on older RDBMS models no longer provide the performance necessary for them to be utilised on a large scale for day-to-day applications within the social networking community.
Utilising open source NoSQL database architectures such as Hadoop/MapReduce allows ‘Big
Data’ users such as Facebook and Twitter to run applications on websites and crunch data
more efficiently than using traditional RDBMS model databases such as MySQL. With their
successful use by high profile adopters, it will provide a platform from which NoSQL
databases can attract the attention of other companies and organisations wishing to utilise the
power and functionality of the likes of Hadoop/MapReduce and Cassandra for their own
database requirements.
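To illustrate the programming model behind Hadoop/MapReduce, the canonical example is a word count: a map phase emits a (word, 1) pair for every word it sees, and a reduce phase groups the pairs by word and sums the counts. The Python sketch below simulates both phases in a single process; it is an illustration of the model only, not Hadoop's actual Java API, and the sample documents are invented:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group the pairs by key and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
print(reduce_phase(map_phase(docs)))
# -> {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

What makes the real systems attractive to 'Big Data' users is that the map and reduce functions are this simple, while the framework distributes them across many machines and handles partitioning and failure.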
4.3 Future Research
4.3.1 NoSQL
NoSQL databases have only been in existence for a few years, no more than a decade at best.
Despite the dominance of RDBMS offerings in the market, NoSQL's youth and its adoption by the open source community should eventually see it compete with traditional database models. However, there are caveats on the horizon. According to Market Research Media, the
future market for NoSQL databases will grow from ‘no value’ today to $1.8bn by 2015.
Additionally, this growth will eventually be dependent on NoSQL databases incorporating
transactional consistency into their infrastructures, as this is seen as an obstacle to mass
adoption of NoSQL (Market Research Media, 2010). This requirement for transactional
consistency opens a new area for future research: to determine whether or not NoSQL developers and vendors can incorporate into their designs a property that the constraints (or lack of them) of current NoSQL architectures do not readily support.
Brisk, a recently announced NoSQL database from Datastax (Datastax, 2011), is a hybrid
design of Cassandra that incorporates elements of Hadoop and Hive into its design. The
purpose of Brisk is to marry the low-latency capabilities of Cassandra with the analytical
capabilities of Hadoop/Hive (Datastax, 2011). According to Matthew Pfeil, Brisk is capable of providing a tight feedback loop between real-time applications and the analytics that follow (Pfeil in Harris, 2011). If Brisk performs as advertised by Datastax, it has the potential to provide companies and organisations with a fast, real-time means of analysing 'Big Data'. However, alternative systems are being developed by other vendors, including big players such as IBM and EMC, who are unhappy with the
current performance of Hadoop/MapReduce (Harris, 2011). Additional research into the state
of the NoSQL market to reflect the current and future developments of Hadoop/Cassandra,
and to determine, if possible, a trend in the market that will indicate a preferred platform for
NoSQL databases, is recommended. Although current NoSQL databases are not transactional in nature, incorporating their different capabilities may pave the way for the adoption of transactional qualities at a later date.
4.3.2 Case Studies
During the case study on utility companies it was found that the Tennessee Valley Authority has adopted an open source platform for its data management. There may be other utility companies that have taken this path. Further primary research is required to ascertain if this is the case and, if so, to evaluate the results.
4.3.3 Business Intelligence
One of the areas of growth relating to data management is Business Intelligence (BI). The market for BI platforms has seen much consolidation recently as large vendors acquire new technologies. Oracle, IBM, Microsoft and SAP now offer complete 'stacks' for BI (Sallam, 2010). The fit of the RDBMS within the architecture of the various BI solutions available is one research thread which requires further time and study to follow to a conclusion.
4.3.4 Research Methodology
On the matter of the research methodology itself, there is scope for further study. As mentioned earlier, we found that by applying appropriate tags to our Internet-sourced research material, a not insubstantial amount, we could quickly retrieve information relevant to whatever idea or research thread we happened to be following at the time. Tagging data is not new and is related to the metadata concept used by many information systems today. What is interesting, and possibly merits further study, is how our tagging enabled us to correlate information based on context. By carefully combining keywords into multiple tags, complex relationships could be built. As relationships are at the core of RDBMS, perhaps Edgar Codd's concept of related domains is worth re-examination. Blowing the dust off some of Codd's earliest papers might be a good starting point.
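The tag-based correlation described above amounts to a small inverted index: each tag maps to the set of sources carrying it, and intersecting those sets retrieves the sources relevant to a combination of keywords. The following toy Python sketch shows the idea; the tag names and the handful of entries borrowed from our reference list are illustrative, not our actual Zotero library:

```python
# Toy inverted index: tag -> set of tagged sources (illustrative only).
index = {
    "nosql":      {"Cattell 2011", "Chang et al. 2006", "Cohen 2010"},
    "big-data":   {"Cukier 2010", "Chang et al. 2006"},
    "case-study": {"Cohen 2010", "Broad 2011"},
}

def sources_for(*tags):
    """Return the sources carrying every one of the given tags."""
    sets = [index[t] for t in tags]
    return set.intersection(*sets) if sets else set()

print(sources_for("nosql", "big-data"))    # -> {'Chang et al. 2006'}
```

Each extra tag in the query narrows the result set, which is exactly how combinations of tags expressed context for us during the research.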
Another case for future study concerns the Internet search engines used by us (Google in
particular) for the purpose of retrieving information. Google presents a list of items ranked in order of popularity. Care was taken not to blindly accept the first few items returned on a page. Also, it should be noted that the particular algorithm used by a search engine may introduce patterns into the returned information that may or may not be reliable. Common sense, we hope, prevailed in our assessments of key assumptions. In this matter further study is necessary if we are to maintain confidence in information search systems, especially those operating under competitive business models.
4.4 Limitations of the Research
The aim of this dissertation is to present the reader with a review of the available information selected by us from various sources in relation to the research question, and to propose some conclusions based on our findings. In the final section of chapter three the limitations of our chosen research methodology were discussed. This section outlines the limiting factors which impacted on the overall content of the dissertation.
Due to the relative youth of the NoSQL database industry, research material is limited and much of it has not been robustly tested over a sufficiently long period of time to determine NoSQL's effectiveness and impact on the database market outside a select few segments of the industry. There is a lack of published academic literature to support the conclusions of database vendors that have NoSQL offerings, save published literature from the developers and users of the databases themselves, much of which may be biased.
Given that the NoSQL industry is youthful in comparison to RDBMS, the greatest source for
research material has been via the Internet. This has caused its own problems, as the great body of material available has made it difficult to determine which material and articles would be of most benefit to our research, as almost anyone can have an opinion on any given topic. As such, we have tried to restrict ourselves to original source material, as well as sources with journalistic and research integrity, such as Forrester and Gartner.
Additionally, having taken on a research topic in what has proved to be a dynamic, youthful and fluid industry, developments can occur almost daily that can influence and/or affect conclusions we have already made, requiring regular readjustment of our findings throughout the research. This occurred most recently with the announcement by a company called Datastax of Brisk, a hybrid of the two most relevant open source NoSQL database offerings. This development occurred after we had completed gathering our research material. Given additional time, we would have incorporated it into the main body of our dissertation, instead of leaving it for others to follow.
“The outcome of any serious research can only be to make two questions grow where
only one grew before”
Thorstein Veblen (1857 – 1929)
The quotation above is from the US economist and social philosopher Thorstein Veblen. It sums up the notion that research is bounded only by the number of questions there are to ask. It is in this spirit that we pass the baton to others, to ask more questions of the answers found.
4.5 Final thoughts
“Technology presumes there's just one right way to do things and there never is.”
Robert M. Pirsig
Having reached the end of our dissertation, we do not propose any one overriding conclusion. However, if one were forthcoming it might be stated in the form of advice. Anybody involved in decision making regarding technology should bear in mind the above quotation from Pirsig. Enterprises embarking on new information systems projects, or on a review of their data management requirements, should understand the strengths and weaknesses of the various offerings. They should be careful to analyse their choices with respect to the overall business objective. This means understanding where critical performance is required and where sacrifices can be made. There is no 'one size fits all' solution. RDBMSs and newer, emerging non-relational DBMSs each have merits for certain objectives, and much depends on where the overhead is placed. Trust and reliability are key factors applying to both vendors and their products. It is hoped that this dissertation has presented a case for maintaining an open mind when it comes to choosing data management systems for the enterprise. The scope and quality of solutions available to manage the increasing scale and complexity of information in the enterprise is growing, and by no means excludes legacy systems such as the RDBMS. The RDBMS may in future no longer hold the bulk of the enterprise's data, but what it does, it does well. The good work carried out at the beginning will ensure that the RDBMS still has much to offer.
REFERENCES
Aclara Software MDMS (2008) Scalability White Paper. http://www.aclaratech.com/AclaraSoft/WhitePapers/Aclara%20Software%20MDMS.pdf [Accessed on 05/02/2011]

AMT Sybex (no year) SMART DTS – AMT-Sybex' Smart Utility Solution. http://www.amt-sybex.com/media/pdf/SmartDTS2010.pdf [Accessed on 01/03/2011]

AMT Sybex case study (2009) ESB Market Opening Information Exchange Project. http://www.amt-sybex.com/OurEurope/ESBCaseStudy.aspx [Accessed on 17/02/2011]

Anthes, Gary (2010) Happy Birthday, RDBMS. Communications of the ACM, Vol. 53, No. 5, May 2010.

Bachman, Charles (1973) Programmer as Navigator. 1973 ACM Turing Award speech. http://awards.acm.org/images/awards/140/articles/1896680.pdf [Accessed on 27/03/2011]

Anand, A. (2008) PIG – The Road to an Efficient High-Level Language for Hadoop. http://developer.yahoo.com/blogs/hadoop/posts/2008/10/pig_-_the_road_to_an_efficient_high-level_language_for_hadoop/ [Accessed on 07/03/2011]

Apache (2011) http://cassandra.apache.org [Accessed on 06/03/2011]

Apache (2010a) Architecture Overview – Cassandra Wiki. http://wiki.apache.org/cassandra/ArchitectureOverview [Accessed on 07/03/2011]

Apache (2010b) HDFS Architecture. http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html#Replication+Pipelining [Accessed on 07/03/2011]

Arnett, Bob (2011) "Extreme IT Makeover" Transforms Georgia Utility. Electric Energy Online. http://www.electricenergyonline.com/?page=show_article&mag=54&article=404 [Accessed on 04/02/2011]

Barrass, Robert (2005) Students Must Write, 3rd ed. Routledge, UK.

BioMed Central. The Hawthorne Effect: a randomised, controlled trial. http://www.biomedcentral.com/1471-2288/7/30 [Accessed on 01/03/2011]
Bayes, Pere Urbon (2010) Graph Theory and Databases. http://nosql.mypopescu.com/post/2316706732/graph-theory-and-databases [Accessed on 30/03/2011]

Bednarz, A. (2006) Utility tackles data integration, analysis. Network World, 13 February 2006, Vol. 6 (23), p. 34. ABI/INFORM Global.

Bocchino, Robert L. Jr. (1996) Book Review – 'Software Patents' by Gregory A. Stobbs (New York: John Wiley & Sons, Inc., 1995, p. 623). Harvard Journal of Law & Technology, Vol. 9, No. 1, Winter 1996. http://jolt.law.harvard.edu/articles/pdf/v09/09HarvJLTech213.pdf [Accessed on 05/02/2011]

Bocij, Paul et al. (2006) Business Information Systems. Essex: Pearson Education Ltd.

Bord Gáis Strategy. http://www.bordgais.ie/corporate/aboutus [Accessed on 05/02/2011]

Broad, Keith (2011) Case Study – Bluewater Power Goes ERP Route to Address Deregulation. http://www.electricenergyonline.com/?page=show_article&mag=57&article=374 [Accessed on 10/02/2011]

Brown, J. M. (2010) Energy policy: It pays to be smart when planning for power needs. Financial Times, 2 November 2010. http://www.ft.com/cms/s/0/4e4d638c-e611-11df-9cdd-00144feabdc0.html#ixzz1DyGcqQLR [Accessed on 04/02/2011]

Bryant, Randal E. & Kwan, Thomas T. (2008) Milestone Week in Evolving History of Data-Intensive Scalable Computing. Carnegie Mellon University and Yahoo! www.cra.org/ccc/docs/bigdata_highlights.pdf [Accessed on 10/04/2010]

Business Dictionary (2011) WebFinance, Inc. http://www.businessdictionary.com/definition/database.html [Accessed on 14/03/2011]

Bylund, A. (2011) Teradata: A Study of Contrasts. http://www.fool.com/investing/general/2011/02/14/teradata-a-study-in-contrasts.aspx [Accessed on 06/03/2011]

Cambridge (2011) Cambridge Advanced Learner's Dictionary, Online Edition. Cambridge University Press, UK.

Cambridge (2008) Cambridge Advanced Learner's Dictionary, 3rd ed. Cambridge University Press, UK.
Carole, J. The Surprise Winners in the $34 Billion Smart Grid Market. Lux Research, Inc. http://www.businesswire.com/news/home/20110126005312/en/Surprise-Winners-34-Billion-Smart-Grid-Market [Accessed on 10/04/2011]

Cassandra (2011) http://cassandra.apache.org/ [Accessed on 28/03/2011]

Cassandra wiki (2011) Cassandra: Architecture Overview. http://wiki.apache.org/cassandra/ArchitectureOverview [Accessed on 28/03/2011]

Cattell, Rick (2011) Scalable SQL and NoSQL Data Stores. http://cattell.net/datastores/Datastores.pdf [Accessed on 29/03/2011]

Chamberlin, Donald D. (1976) Relational Data-Base Management Systems. IBM Research Laboratory, San Jose, California. Computing Surveys, Vol. 8, No. 1, March 1976. http://www.dpi.inpe.br/cursos/ser303/relational_csur.pdf [Accessed on 10/04/2011]

Chang, F., Dean, J., et al. (2006) Bigtable: A Distributed Storage System for Structured Data. http://labs.google.com/papers/bigtable.html [Accessed on 06/03/2011]

Chu, Eric; Beckmann, Jennifer; and Naughton, Jeffrey (2007) The Case for a Wide-Table Approach to Manage Sparse Relational Data Sets. http://www.citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.107.36 [Accessed on 17/02/2011]

Codd, Edgar F. (1985) "Is Your DBMS Really Relational?" (Part 1) and "Does Your DBMS Run By the Rules?" (Part 2). ComputerWorld, October 14 and October 21.

Codd, Edgar F. (1991) The Relational Model for Database Management, Version 2. Reading: Addison-Wesley.

Codd, Edgar F. (1970) A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, Vol. 13, No. 6, June 1970. http://cacm.acm.org/magazines/1970/6/12368-a-relational-model-of-data-for-large-shared-data-banks/pdf?dl=no [Accessed on 10/04/2011]

Cohen, J. (2010) Facebook Showcasing Its Open Source Database. http://www.allfacebook.com/facebook-showcasing-its-open-source-database-2010-11 [Accessed on 07/03/2011]

Cukier, Kenneth (2010) 'Data, data everywhere – A Special Report on Managing Information'. The Economist, Vol. 394, No. 8671.
Datastax (2011). DataStax Rewires Hadoop for Low-Latency Applications with Apache Cassandra. http://www.datastax.com/news/press-releases/datastax-rewires-hadoop-for-low-latency-applications-with-apache-cassandra [Accessed on 07/04/2011]
Datawarehouse4u (2009). OLTP vs. OLAP. Datawarehouse4u.info. http://datawarehouse4u.info/OLTP-vs-OLAP.html [Accessed on 05/03/2011]
Davison, Robert M. (1998). ‘Chapter 3: Research Methodology’ in An Action Research Perspective of Group Support Systems: How to Improve Meetings in Hong Kong (submitted PhD). City University, Hong Kong. http://www.is.cityu.edu.hk/staff/isrobert/phd/phd.htm [Accessed on 22/02/2011]
DBMS2 (2011). DataStax OpsCenter for Apache Cassandra announced. http://www.dbms2.com/2011/02/01/datastax-opscenter-cassandra/ [Accessed on 06/03/2011]
Dimitrov, Marin (2010). NoSQL Databases. http://www.slideshare.net/marin_dimitrov/nosql-databases-3584443?from=ss_embed [Accessed on 14/03/2010]
EJBRM (2003). e-Journal of Business Research Methods, July 2003, Vol. 2, Issue 2. http://www.ejbrm.com/volume2/issue2 [Accessed on 20/02/2011]
Ellis, Timothy J. & Levy, Yair (2008). ‘Framework of Problem-Based Research: A Guide for Novice Researchers on the Development of a Research-Worthy Problem’. International Journal of an Emerging Transdiscipline, Volume 11.
Elmasri, Ramez and Navathe, Shamkant B. (1989). Fundamentals of Database Systems. California: The Benjamin/Cummings Publishing Company.
Engage Consultants (2010). High-level Smart Meter Data Traffic Analysis. For: ENA, Document Ref: ENA-CR008-001-1.4, May 2010. Available from: http://www.energynetworks.org/ena_energyfutures/ENA-CR008-001-1%204%20_Data%20Traffic%20Analysis_.pdf [Accessed on 17/02/2011]
ESB Strategy Framework 2020. Available from: http://www.esb.ie/main/sustainability/strategy-to-2020.jsp [Accessed on 05/02/2011]
ESB Strategy on Sustainability. Available from: http://www.esb.ie/main/sustainability/smart-meters.jsp [Accessed on 05/02/2011]
Evans, B. (2011). Global CIO: IBM’s Most Disruptive Acquisition Of 2010 Is Netezza. http://www.informationweek.com/news/global-cio/interviews/showArticle.jhtml?articleID=229201238&queryText=Jim%20Baum [Accessed on 06/03/2011]
Feinberg, D., Beyer, M. A. (2010). Magic Quadrant for Data Warehouse Database Management Systems. http://www.businessintelligence.info/docs/estudios/Gartner-Magic-Quadrant-for-Datawarehouse-Systems-2010.pdf [Accessed on 05/03/2011]
Fehrenbacher, Katie (2010). Smart Grid 101: Utilities Are Very Risk Averse. Jan. 24, 2010. Available from: http://gigaom.com/cleantech/smart-grid-101-utilities-are-very-risk-averse/ [Accessed on 15/02/2010]
Fehrenbacher, Katie (2009). Why Open Source for the Smart Grid Needs a Kick-Start. Nov. 11, 2009. Available from: http://gigaom.com/cleantech/why-open-source-for-the-smart-grid-needs-a-kick-start/ [Accessed on 15/02/2011]
Fink, D. (2010). A Little Gray Switch Is a Hot-Button Issue for Solar Homeowner [Internet]. The Solar Home & Business Journal. Available from: http://solarhbj.com/2010/02/disconnect-switch-hot-button-issue-for-solar-owners-000100.php [Accessed on 15/02/2010]
Finkle, J. (2008). Microsoft, Oracle databases gain share vs IBM. http://www.reuters.com/article/2008/08/26/software-database-idUSN2634118720080826 [Accessed on 06/03/2011]
Gardner, D. (2010). IBM acquires Netezza as big data market continues to consolidate around appliances, middle market, new architecture. ZDNet blog, 21 September. http://www.zdnet.com/blog/gardner/ibm-acquires-netezza-as-big-data-market-continues-to-consolidate-around-appliances-middle-market-new-architecture/3861?tag=content;search-results-rivers [Accessed on 06/03/2011]
Ghosh, Debasish (2010). Multiparadigm Data Storage for Enterprise Applications. IEEE Software, Sep/Oct 2010, Vol. 27, Iss. 5, p. 57. http://proquest.umi.com.elib.tcd.ie/pqdlink?index=2&did=2110653061&SrchMode=1&sid=4&Fmt=6&VInst=PROD&VType=PQD&RQT=309&VName=PQD&TS=1300019837&clientId=11502 [Accessed on 14/03/2011]
Giroti, Tony (2011). You’ve Got the Meter Data – Now What? Electric Energy Online. Available at: http://www.electricenergyonline.com/?page=show_article&mag=66&article=524 [Accessed on 04/02/2011]
Graham, C., Sood, B., Sommer, D., Horiuchi, H. (2010). Cited in: Market Share: RDBMS Software by Operating System, Worldwide, 2009. http://www.oracle.com/us/products/database/number-one-database-069037.html [Accessed on 05/03/2011]
Greenbaum, J. (2010). SAP Buys Sybase, and History is Re-Written. http://www.enterpriseirregulars.com/17887/sap-buys-sybase-and-history-is-re-written/ [Accessed on 06/03/2011]
Greenplum (2010). About page. http://www.greenplum.com/about-us/ [Accessed on 06/03/2011]
Grimes, Seth (2011). BridgePoint Article: Unstructured Data and the 80 Percent Rule. Clarabridge. http://clarabridge.com/default.aspx?tabid=137&ModuleID=635&ArticleID=551 [Accessed on 14/03/2011]
Hammond, J. (2009). Open Source Adoption: What Your Peers Are Up To. Forrester Research, 24 March 2009. Available from: http://www.linux.com/news/enterprise/biz-enterprise/345132-forrester-analyst-says-open-source-has-won [Accessed on 04/02/2010]
Harris, Derrick, citing Pfeil, Matthew (2011). DataStax Shakes Up Hadoop with NoSQL-Based Distro. http://gigaom.com/cloud/datastax-shakes-up-hadoop-with-nosql-based-distro/ [Accessed on 07/04/2011]
Harris, Derrick (2011). Yahoo Suggests MapReduce Overhaul to Improve Hadoop Performance. http://gigaom.com/cloud/yahoo-suggests-mapreduce-overhaul-to-improve-hadoop-performance/ [Accessed on 07/04/2011]
Hayes, Frank (2002). The Story So Far. Computerworld, April 15, 2002. http://www.computerworld.com/s/article/70102/The_Story_So_Far [Accessed on 14/03/2011]
Henschen, Doug (2010). The Big Data Era: How Data Strategy Will Change. InformationWeek, August 7, 2010. http://www.informationweek.com/news/showArticle.jhtml?articleID=226600216&cid=tab_art_entsoft [Accessed on 13/03/2011]
Henschen, Doug (2011a). Gartner Ranks Data Warehousing Leaders. http://www.informationweek.com/news/software/info/managemnt/showArticle.jhtml?articleID=229215658&cid=RSSfeed_IWK_All [Accessed on 05/03/2011]
Henschen, Doug (2011b). Global CIO: IBM’s Most Disruptive Acquisition Of 2010 Is Netezza. http://www.informationweek.com/news/global-cio/interviews/showArticle.jhtml?articleID=229201238&pgno=2&queryText=Jim+Baum&isPrev= [Accessed on 06/03/2011]
Higginbotham, S. (2010). Digg Not Likely to Give Up on Cassandra. http://gigaom.com/2010/09/08/digg-not-likely-to-give-up-on-cassandra/ [Accessed on 06/03/2011]
Hoff, Todd (2010). Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month. HighScalability blog, November 16. http://highscalability.com/blog/2010/11/16/facebooks-new-real-time-messaging-system-hbase-to-store-135.html [Accessed on 07/03/2011]
Hoff, Todd (2009). Neo4j Graph Database Kicks Buttox. http://highscalability.com/neo4j-graph-database-kicks-buttox [Accessed on 30/03/2011]
Hoff, T. (2008). Product: Scribe – Facebook's Scalable Logging System. http://highscalability.com/product-scribe-facebooks-scalable-logging-system [Accessed on 07/03/2011]
Hypertable (2011). http://www.hypertable.org/sponsors.html [Accessed on 30/03/2011]
Irish Government Energy White Paper. Available from: http://www.dcenr.gov.ie/NR/rdonlyres/54C78A1E-4E96-4E28-A77A-3226220DF2FC/27356/EnergyWhitePaper12March2007.pdf [Accessed on 04/02/2011]
IT Jobs Watch (2011). http://www.itjobswatch.co.uk/jobs/uk/hadoop.do [Accessed on 06/03/2011]
Kanaracus, C. (2011). EMC’s Greenplum Offers ‘Big Data’ Tools at No Charge. http://www.pcworld.com/businesscenter/article/218355/emcs_greenplum_offers_big_data_tools_at_no_charge.html [Accessed on 06/03/2011]
Kennedy, J. (2011). Utility firms lead US$950bn charge towards sustainable IT. 4 Feb 2011. Available at: http://www.siliconrepublic.com/strategy/item/20251-utility-firms-lead-the/ [Accessed on 15/02/2011]
Kerner, S. M. (2010), citing Quinn, J. Digg Moves From MySQL to NoSQL. http://itmanagement.earthweb.com/datbus/article.php/3870116/Digg-Moves-From-MySQL-to-NoSQL.htm [Accessed on 06/03/2011]
Krill, Paul; Leon, Mark (1996). Dual force. InfoWorld, Dec 2, 1996, 18(49), pg. 37. ABI/INFORM Global. http://proquest.umi.com.elib.tcd.ie/pqdlink?index=4&did=10472333&SrchMode=1&sid=9&Fmt=6&VInst=PROD&VType=PQD&RQT=309&VName=PQD&TS=1300022938&clientId=11502 [Accessed on 14/03/2011]
Lakshman, A. (2008). Cassandra – A structured storage system on a P2P Network. http://www.facebook.com/note.php?note_id=24413138919 [Accessed on 07/03/2011]
LaValle, S., Hopkins, M., Lesser, E., Shockley, R., Krushwitz, N. (2010). Analytics: The new path to value. http://www.informationweek.com/whitepaper/download/showPDF.jhtml?id=180700006&site_id=300001&profileCreated=&k=IWKREG [Accessed on 05/03/2011]
Lawton, C. (2008). Business Technology: Open-Source Databases Make Headway; Scaling Challenges Remain as Upstarts Carve Bigger Niche. Wall Street Journal (Eastern Edition), 8 April 2008, p. B.5. Retrieved February 16, 2011, from ABI/INFORM Global. (Document ID: 1458385931).
Legal dictionary. Available from: http://legaldictionary.thefreedictionary.com/Public+Utilities [Accessed on 15/02/2011]
Lindenberger, M. (2007). ‘Open Source Still Earning Trust as CIOs Consider Enterprise Software Solutions’ [Blog], posted 24-Apr-2007. Available from: http://www.itbusinessedge.com/cm/blogs/bentley/open-source-still-earning-trust-as-cios-consider-enterprise-software-solutions/?cs=14422 [Accessed on 04/02/2011]
Lohr, Steve (2009). In Sun, Oracle Sees a Software Gem. New York Times, 20 April 2009. Available from: http://www.nytimes.com/2009/04/21/technology/companies/21sun.html?_r=1&partner=rss&emc=rss [Accessed on 15/02/2011]
Lorica, B. (2009). Most Hadoop jobs are in California. http://radar.oreilly.com/2009/06/most-hadoop-jobs-are-in-california.html [Accessed on 06/03/2011]
Mackie, Kurt (2011). Microsoft Unveils SQL Server Fast Track Data Warehouse 3.0. http://tdwi.org/articles/2011/02/17/sql-server-fast-track-data-warehouse.aspx?admgarea=news [Accessed on 10/04/2011]
Market Research Media (2010). NoSQL Market Forecast 2011-2015, Tabular Analysis. Publication: 11/2010. http://www.marketresearchmedia.com/2010/11/11/nosql-market/ [Accessed on 10/04/2011]
McGoveran, David and Date, C.J. (2010). Letter to the Editor: How to Celebrate Codd’s RDBMS Vision. Communications of the ACM, Volume 53, Number 10, October 2010. http://mags.acm.org/communications/201010/?folio=7&CFID=14255234&CFTOKEN=78521835#pg9 [Accessed on 10/04/2011]
McHugh, Josh (1997). Michael Stonebraker: The ultimate database. Forbes, New York, Jul 7, 1997, Vol. 160, Iss. 1, pg. 326. [Accessed on 14/03/2011]
McIssac, Kevin (2007). The data deluge: The growth of unstructured data. Computerworld, 12 September 2007. http://www.computerworld.com.au/article/195150/data_deluge_growth_unstructured_data/ [Accessed on 10/04/2011]
McJones, Paul (Ed.) et al. (1997). The 1995 SQL Reunion: People, Projects, and Politics. SRC Technical Note 1997-018, August 20, 1997. http://www.mcjones.org/System_R/SQL_Reunion_95/SRC-1997-018.pdf [Accessed on 10/04/2011]
Metz, C. (2011a). Google v Facebook salary inflation riles Big Data startup. http://www.theregister.co.uk/2011/02/04/cloudera_caught_between_facebook_and_google/ [Accessed on 03/06/2011]
Metz, C. (2011b). HBase: Shops swap MySQL for open source Google mimic. http://www.theregister.co.uk/2011/01/19/hbase_on_the_rise/ [Accessed on 06/03/2011]
Metz, C. (2010). Facebook unveils 'next-gen' messaging system. http://www.theregister.co.uk/2010/11/15/facebook_announcement/ [Accessed on 06/03/2011]
Metz, C. (2008). Cokeheads slip AI onto Yahoo! front page. http://www.theregister.co.uk/2008/07/10/yahoo_front_page_ai/ [Accessed on 06/03/2011]
Miles, M. B. and Huberman, A. M. (1994). Qualitative Data Analysis: A Sourcebook of New Methods. SAGE, USA.
Mitchell, Bradley (no date). http://compnetworking.about.com/od/speedtests/a/network_latency.htm [Accessed on 31/03/2011]
Monash, Curt (2010a). NoSQL Basics, Benefits and Best-Fit Scenarios. InformationWeek, October 10, 2010. http://www.informationweek.com/news/software/info_management/showArticle.jhtml?articleID=227701021&pgno=1&queryText=&isPrev [Accessed on 13/03/2011]
Monash, Curt (2010b). Quick reactions to SAP acquiring Sybase. http://www.dbms2.com/2010/05/12/sap-acquire-sybase/ [Accessed on 06/03/2011]
Moran, Aidan P. (2000). Managing Your Own Learning At University – A Practical Guide. UCD Press, Dublin.
Muthukkaruppan, K. (2010). The Underlying Technology of Messages. http://www.facebook.com/note.php?note_id=454991608919 [Accessed on 06/03/2011]
Neil, S. (2011), citing Gillespie, M. Sybase Adds Enterprise Security to Android. http://www.managingautomation.com/maonline/exclusive/read/Sybase_Adds_Enterprise_Security_to_Android_27756683 [Accessed on 06/03/2011]
Netezza (2011). http://www.netezza.com/data-warehouse-appliance-products/twinfin.aspx [Accessed on 06/03/2011]
Neubauer, Peter (2010). Graph Databases, NOSQL and Neo4j. http://www.infoq.com/articles/graph-nosql-neo4j [Accessed on 30/03/2011]
Neo4j (2011). http://neo4j.org/ [Accessed on 31/03/2011]
Oracle History (2011). Oracle. http://www.oracle.com/us/corporate/timeline/index.html [Accessed on 27/03/2011]
Oracle White Paper (2011). Smart Metering for Electric and Gas Utilities. January 2011. Available from: http://www.oracle.com/us/industries/utilities/046593.pdf [Accessed on 17/02/2011]
Oracle (2007). Oracle Buys Lodestar. Available from: http://www.oracle.com/us/corporate/Acquisitions/lodestar/oracle-lodestar-faq-072227.pdf [Accessed on 15/02/2011]
Oxford English Dictionary (8th Edition) (1990). The Oxford English Dictionary of Current English. Oxford: Clarendon Press.
Pariseau, Beth (2009). Energy IT sees smart-grid boon. SearchStorageChannel.com. Available from: http://searchstoragechannel.techtarget.com/news/1355355/Energy-IT-sees-smart-grid-boon-for-data-storage [Accessed on 15/02/2011]
Pariseau, Beth (2008). IDC: Unstructured data will become the primary task for storage. IT Knowledge Exchange, Oct 29, 2008. http://itknowledgeexchange.techtarget.com/storage-soup/idc-unstructured-data-will-become-the-primary-task-for-storage/ [Accessed on 14/03/2011]
Patterson, David A. et al. (1988). A Case for Redundant Arrays of Inexpensive Disks (RAID). University of California, Berkeley.
Peschka, J. (2010). HBase and Hadoop at Facebook. http://facility9.com/2010/11/18/facebook-messaging-hbase-comes-of-age [Accessed on 07/03/2011]
Pointer, R., et al. (2010). Twitter Engineering: Introducing FlockDB. http://engineering.twitter.com/2010/05/introducing-flockdb.html [Accessed on 07/03/2011]
PR Newswire Europe Limited (2011). Available from: http://www.prnewswire.co.uk/cgi/news/release?id=160381
Prickett Morgan, T. (2010). Teradata pumps data warehouses with six-core Xeons. http://www.channelregister.co.uk/2010/10/25/teradata_appliance_refresh/ [Accessed on 06/03/2011]
Prickett Morgan, T. (2011). EMC lets go of Greenplum Community Edition. http://www.theregister.co.uk/2011/02/01/emc_greenplum_community_edition/ [Accessed on 06/03/2011]
ProQuest Database, via TCD Library. Available from: http://stella.catalogue.tcd.ie/iii/encore/search/C%7CSutilities%7COrightresult%7CU1?lang=eng&suite=pearl [Accessed on 10/04/2011]
Rapp, Christof (2010). ‘Aristotle's Rhetoric’, in Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Spring 2010 Edition), Stanford. http://plato.stanford.edu/archives/spr2010/entries/aristotle-rhetoric [Accessed on 22/02/2011]
Ratcliff, Donald (compiled by). 15 Methods of Data Analysis in Qualitative Research. http://qualitativeresearch.ratcliffs.net/15methods.pdf [Accessed on 10/04/2011]
Ratcliff, Donald E. (1994). Analytic Induction as a Qualitative Research Method of Analysis. The University of Georgia. http://qualitativeresearch.ratcliffs.net/analytic.pdf [Accessed on 10/04/2011]
Rodriguez, Marko A. (2010). Graph Databases: More than an Introduction. http://nosql.mypopescu.com/post/1173828185/graph-databases-more-than-an-introduction [Accessed on 30/03/2011]
Rosenberg, Dave (2009). Open-source Hadoop powers Tennessee smart grid. 9 Nov. 2009. Available from: http://news.cnet.com/8301-13846_3-10393259-62.html [Accessed on 16/02/2011]
Russell, Bertrand (1995). History of Western Philosophy. Routledge, London.
Sallam, Rita L. (2010). Q&A: The Benefits and Perils of Buying Into the Megavendor Stack. Gartner Research, 30 April 2010, ID No: G00200485.
Selinger, Pat (2005). ‘A Conversation with Pat Selinger’. Interview by Jim Hamilton, ACM Queue, April 18, 2005. http://delivery.acm.org/10.1145/1060000/1059803/p18-hamilton.pdf?key1=1059803&key2=7717841031&coll=DL&dl=ACM&ip=109.255.147.37&CFID=14259874&CFTOKEN=11948404
SmartGrids: European Technology Platform (2006). ‘What is Smart Grids’. Available from: http://www.smartgrids.eu/?q=node/163 [Accessed on 12/02/2011]
Souders, Steve (2009). Even Faster Web Sites. http://stevesouders.com/docs.velocity-20090622.ppt [Accessed on 31/03/2011]
Stedman, C. (1997). Scottish utility tries to end the vendor blame game. Computerworld, Vol. 31(11), pgs 75, 80. Retrieved February 4, 2011, from ABI/INFORM Global. (Document ID: 11269142).
Seligstein, J. (2011). See the Messages that Matter. The Facebook Blog, February 11. http://blog.facebook.com/blog.php?post=452288242130 [Accessed on 07/03/2011]
St. John, J. (2011). There Will Be Nine Times the Smart Grid Data by 2020. Cleantech News and Analysis. Available at: http://gigaom.com/cleantech/there-will-be-nine-times-the-smart-grid-data-by-2020/ [Accessed on 06/02/2011]
Stevens, Tim (2004). Overcoming High-Latency Database Access with Java Stored Procedures. http://www.informit.com/articles/article.aspx?p=170870 [Accessed on 10/04/2011]
Stonebraker, Michael, et al. (2010). MapReduce and Parallel DBMSs: Friends or Foes? Communications of the ACM, Volume 53, Number 1, January 2010.
Stonebraker, Michael, and DeWitt, David J. (2008). A Tribute to Jim Gray. Communications of the ACM, Volume 51, Number 11, November 2008. http://rptcd.catalogue.tcd.ie/ebsco-web/ehost/pdfviewer/pdfviewer?vid=3&hid=10&sid=a54188b1-6075-4543-a891-3df2451bd16d%40sessionmgr14
Stoner, James A.F., & Freeman, Edward R. (1989). Management (Fourth Edition). New Jersey: Prentice-Hall International, Inc.
Subramanian, K. (2010). Riptano, Cloudera For Cassandra. http://www.cloudave.com/450/riptana-cloudera-for-cassandra/ [Accessed on 06/03/2011]
Sybase (2011). http://www.sybase.com/products/mobileenterprise/ianywheremobileoffice [Accessed on 06/03/2011]
Teradata (2011). Customer List. http://www.teradata.com/t/customers-list/browse/ [Accessed on 06/03/2011]
Tiernan et al. (3rd Edition) (2006). Modern Management – Theory and Practice for Irish Students. Dublin: Gill & Macmillan.
Thiel, Carol Tomme (1982). Relational DBMS: What's in a Name. Infosystems, 29(9), 52. ABI/INFORM Global. (Document ID: 1356651). http://proquest.umi.com.elib.tcd.ie/pqdlink?index=4&did=1356651&SrchMode=1&sid=14&Fmt=2&VInst=PROD&VType=PQD&RQT=309&VName=PQD&TS=1300026039&clientId=11502 [Accessed on 13/03/2011]
Trefis Team (2011). Oracle Exadata Software Gives Oracle Upside. http://www.trefis.com/articles/33739/oracles-exadata-software-give-oracle-20-upside/2011-01-18 [Accessed on 05/03/2011]
Trochim, William M.K. (2006). Deduction and Induction. Web Center for Social Research Methods. http://www.socialresearchmethods.net/kb/dedind.php [Accessed on 22/02/2011]
Van Tulder, Gijs (2003). Storing Hierarchical Data in a Database. SitePoint, April 30, 2003. http://articles.sitepoint.com/article/hierarchical-data-database [Accessed on 10/04/2011]
Vittal, S. (2010). The Marketing Software Convergence Continues: Teradata Acquires Aprimo For $525 Million. Forrester blog, December 23. http://blogs.forrester.com/suresh_vittal/10-12-22-the_marketing_software_convergence_continues_teradata_acquires_aprimo_for_525_million [Accessed on 06/03/2011]
Von Finck, K. (2009). The Global Database Market. Kurt Von Finck’s blog, September 03, 2009. http://blogs.gnome.org/mneptok/2009/09/03/the-global-database-market/ [Accessed on 06/03/2011]
Warren, C. (2011). Android, BlackBerry & iOS Tied for U.S. Market Share. http://mashable.com/2011/02/01/nielsen-smartphone-marketshare/ [Accessed on 06/03/2011]
Weglarz, Geoffrey (2004). Two Worlds of Data – Unstructured and Structured. Information Management Magazine, September 2004. http://www.information-management.com/issues/20040901/1009161-1.html [Accessed on 14/03/2011]
Weil, K. (2010). NoSQL at Twitter. Strange Loop Conference, December 23. http://www.infoq.com/presentations/NoSQL-at-Twitter [Accessed on 07/03/2011]
White, Tom (2010). Hadoop: The Definitive Guide (2nd Edition). Surrey: O’Reilly Media.
Woods, D. (2010). How Digg’s Cassandra Debacle Could Have Been Avoided. http://www.forbes.com/2010/09/21/cassandra-mysql-software-technology-cio-network-digg.html [Accessed on 06/03/2011]
Yahoo Research (2011). PNUTS – Platform for Nimble Universal Table Storage. http://research.yahoo.com/project/212 [Accessed on 09/04/2011]
Yuhanna, Noel (2010). Sybase Acquisition By SAP – A Great Move. Forrester blog, May 17, 2010. http://blogs.forrester.com/noel_yuhanna/10-05-17-sybase_acquisition_sap_great_move [Accessed on 06/03/2011]
Yuhanna, Noel (2009). The Forrester Wave: Enterprise Database Management Systems, Q2 2009. Forrester, June 30, 2009.
Zawodny, J. (2007). Open Source Distributed Computing: Yahoo’s Hadoop Support. Jeremy Zawodny blog, July 25, 2007. http://developer.yahoo.com/blogs/ydn/posts/2007/07/yahoo-hadoop/ [Accessed on 06/03/2011]
APPENDIX
Table A.1 - Edgar Codd’s original relational model terms and their equivalent or alternative meaning.
Domain
  Meaning in Codd’s RDBMS model: Both a basic data type and an extended data type, in order to convey meaning.
  Equivalent in commercial RDBMS: Metadata - not imposed; the basic data type is used.
  Equivalent in non-RDBMS: Metadata - not imposed.
  Example: Basic data type (rules) - Currency, €, Integer, etc.; extended data type (meaning) - Financial, Location, etc.

Relation
  Meaning in Codd’s RDBMS model: A subset of the Cartesian product (set theory); a set of composite data.
  Equivalent in commercial RDBMS: A table.
  Equivalent in non-RDBMS: Table, record, file...
  Example: NA

R-Table
  Meaning in Codd’s RDBMS model: A base table, usually created in the beginning as a root source.
  Equivalent in commercial RDBMS: NA
  Equivalent in non-RDBMS: NA
  Example: PARTS_TABLE, SUPPLIER_TABLE

Derived R-Table
  Meaning in Codd’s RDBMS model: A table created by combining data from two or more tables.
  Equivalent in commercial RDBMS: NA
  Equivalent in non-RDBMS: NA
  Example: PARTS_SUPPLIER_TABLE

Note 1: A naming convention is also specified by Codd.
Note 2: By combining two relations we are also combining two domains (Parts and Suppliers), creating a new composite domain; let’s call it Capabilities.
Atomic Value
  Meaning in Codd’s RDBMS model: A data value which should not be sub-divided.
  Equivalent in commercial RDBMS: Same.
  Equivalent in non-RDBMS: Same.
  Example: Price, €100

Composite Value
  Meaning in Codd’s RDBMS model: A data value which can be sub-divided.
  Equivalent in commercial RDBMS: Same.
  Equivalent in non-RDBMS: Same.
  Example: An address field, text, an audio file.

Tuple
  Meaning in Codd’s RDBMS model: A row in a table containing a set of related data.
  Equivalent in commercial RDBMS: Record, row.
  Equivalent in non-RDBMS: Record, column, file.
  Example: Smith, John, 12 Greenview Road… 31/12/1945… etc.

Primary Key
  Meaning in Codd’s RDBMS model: One column in a table assigned to hold the unique identifier value for each row (not null).
  Equivalent in commercial RDBMS: Same, but not imposed.
  Equivalent in non-RDBMS: NA
  Example: PART_ID - 123456, 345636

Foreign Key
  Meaning in Codd’s RDBMS model: One or more columns in a table assigned to hold a reference to the primary key field in another table; used to link tables together and maintain the integrity of the information and relationships.
  Equivalent in commercial RDBMS: Foreign, secondary.
  Example: As above, but existing in a derived table.
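The primary key, foreign key and derived R-Table entries in Table A.1 can be sketched in SQL, here run through Python's sqlite3 module. This is an illustrative sketch only, not taken from the dissertation: the NAME columns and the sample values ('Widget', 'Acme') are assumptions chosen for the example; the table names follow Table A.1.

```python
import sqlite3

# In-memory database; SQLite requires foreign keys to be switched on per connection.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Base R-Tables: each holds a primary key (unique, not null).
conn.execute("CREATE TABLE PARTS_TABLE (PART_ID INTEGER PRIMARY KEY, NAME TEXT)")
conn.execute("CREATE TABLE SUPPLIER_TABLE (SUPPLIER_ID INTEGER PRIMARY KEY, NAME TEXT)")

# Derived R-Table: its foreign keys reference the primary keys of the base
# tables, linking the two relations (combining the Parts and Suppliers domains).
conn.execute("""
    CREATE TABLE PARTS_SUPPLIER_TABLE (
        PART_ID INTEGER REFERENCES PARTS_TABLE(PART_ID),
        SUPPLIER_ID INTEGER REFERENCES SUPPLIER_TABLE(SUPPLIER_ID)
    )""")

conn.execute("INSERT INTO PARTS_TABLE VALUES (123456, 'Widget')")
conn.execute("INSERT INTO SUPPLIER_TABLE VALUES (1, 'Acme')")
conn.execute("INSERT INTO PARTS_SUPPLIER_TABLE VALUES (123456, 1)")

# A tuple (row) produced by joining the two base relations via the derived table.
row = conn.execute("""
    SELECT p.NAME, s.NAME
    FROM PARTS_SUPPLIER_TABLE ps
    JOIN PARTS_TABLE p ON p.PART_ID = ps.PART_ID
    JOIN SUPPLIER_TABLE s ON s.SUPPLIER_ID = ps.SUPPLIER_ID
""").fetchone()
print(row)  # ('Widget', 'Acme')
```

With foreign keys enabled, inserting a PART_ID into PARTS_SUPPLIER_TABLE that does not exist in PARTS_TABLE raises an integrity error, which is the referential-integrity guarantee the Foreign Key row of the table describes.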