The Customized Database Fragmentation Technique in
Distributed Database Systems
Mohammed Ibrahim Shareef
Aus Wail Al-Rawi
MASTER THESIS 2011
INFORMATICS
Postal address: Box 1026, 551 11 Jönköping
Visiting address: Gjuterigatan 5
Telephone: 036-10 10 00 (switchboard)
This thesis was carried out at the School of Engineering, Jönköping University, in the subject area of informatics. The work is part of the master's programme in information technology and management. The authors themselves are responsible for the opinions, conclusions and results presented.
Supervisor: Anders Carstensen
Examiner: Vladimir Tarasov
Scope: 30 credits (D-level)
Date:
Archive number:
Abstract
Today, many companies use a centralized database system for daily business transactions in different domains. Critical issues have been observed concerning the complexity, maintenance, performance and communication cost of data held in a centralized repository when queries are processed on demand for end-users at different locations. Enterprises are therefore striving to implement efficient distributed database systems in their business environments for scalability. A distributed database architecture covers several factors, such as transparent management, replication, fragmentation and allocation. This dissertation focuses on database fragmentation and on the techniques that are useful for performing it.
The objective of this research is to investigate an efficient algorithm and technique for database fragmentation in a distributed environment. After a comparative study of the most suitable techniques, we propose a customized ISUD (Insert, Select, Update, Delete) technique, which is selected for implementation. The customized ISUD technique helps to obtain the precedence of the attributes of a relation, for horizontal fragmentation, from the various sites or locations.
The practical objective of this dissertation is to design the architecture, to develop and implement a customized ISUD (Insert, Select, Update, Delete) user interface, and to test the selected algorithm or technique using that interface. We used C#.Net as the development tool. The user interface accepts ISUD frequencies as input and produces ALP (attribute locality precedence) values as output. We followed the design science research (DSR) method in developing the customized ISUD technique.
The customized ISUD technique can be considered a foundation for implementing horizontal database fragmentation in a distributed environment, so that the database administrator can make a proper decision about allocating the fragmented data to the various sites at the initial stage of distributed database design.
Abstract (translated from the Swedish)
Today, various companies use a centralized database system for daily business transactions in different domains. Some critical issues have been observed in connection with the complexity, maintenance, performance and communication cost of data in a centralized data repository for query processing, according to the demand of end-users at different locations. Various companies therefore strive to implement efficient distributed database systems in their business environments for scalability. The distributed database architecture covers different factors such as a transparent management system, replication, fragmentation and allocation. This thesis focuses on database fragmentation and on techniques that are useful for performing database fragmentation.
The purpose of this research is to investigate an efficient algorithm and technique for database fragmentation in a distributed environment. We proposed a customized ISUD (Insert, Select, Update, Delete) technique after a comparative study of the most suitable techniques, which was selected for implementation. The functionality of the customized ISUD technique helps to obtain the precedence of the attributes of a relation horizontally in the database from different sites.
The practical purpose of this thesis is to design the architecture, to develop and implement a customized ISUD (Insert, Select, Update, Delete) user interface, and to test the selected algorithm or technique by means of the interface. We used C#.Net as the development tool. This user interface accepts ISUD frequencies as input and produces ALP (attribute locality precedence) values as output. We have incorporated the design science research (DSR) method for developing the customized ISUD technique. This customized ISUD technique can be regarded as a foundation for implementing horizontal database fragmentation in a distributed environment, so that the database administrator can make a proper decision about allocating the fragmented data to different sites at the initial stage of distributed database design.
Acknowledgements
With immense pleasure we take this opportunity to thank everyone who helped to make this project possible.
First of all, we would like to thank almighty God, the Most Beneficent and the Most Merciful. We would like to thank Jönköping University for giving us the opportunity to work on a thesis as a part of our curriculum. We would also like to thank our supervisor Anders Carstensen for his advice, support and facilitating role throughout this final project. We would also like to thank our examiner and professor Dr. Vladimir Tarasov for his valuable suggestions and guidance throughout our thesis. We also thank Mr Markus Milerup, representing Jordbruksverket (Swedish Department of Agriculture), Sweden, for providing information about the company's problems within the scope of this thesis project. Finally, we would like to thank our family and friends, who gave us the social and moral support needed to complete this thesis.
Key words
Distributed database, Database Fragmentation, Attribute Locality
precedence, Customized ISUD.
Contents
1 Introduction
1.1 Background
1.1.1 Contribution of the thesis
1.2 Case Study
1.2.1 Swedish Board of Agriculture
1.2.2 Case Study for Testing Purpose or for Evaluation of Proposed Technique
1.3 Purpose/Objectives
1.3.1 Research Question
1.3.2 Theoretical Purpose
1.3.3 Practical Purpose
1.3.4 Assumption
1.4 Limitations
1.5 Thesis Outline
2 Theoretical Background
2.1 General Description of Distributed Database
2.1.1 What is a Distributed Database System?
2.1.2 Application of Distributed database technology
2.2 Distributed Database Architecture
2.2.1 Architectural Models for Distributed database system
2.3 Unsolved Problems in DDBS
2.3.1 Distribution design
2.3.2 Network scaling problems
2.4 Distribution Design Problems
2.4.1 The Complexity of the Problems
2.4.2 Interdependencies with Query Optimization
2.4.3 Improvised Solution for the problems mentioned
2.5 Initial Design Approach for Distributed Database Design
2.5.1 Requirements analysis
2.5.2 Conceptual project
2.5.3 Logical project
2.5.4 Distribution project
2.5.5 Physical project
2.6 Fragmentation in Distributed Database Design
2.6.1 Horizontal Fragmentation
2.7 Previous Works on Fragmentation in DDBS
2.7.1 Database Fragmentation Technique by Shahidul Islam Khan and Dr. A. S. M. Latiful Hoque
2.8 Generic Five Steps for Data Fragmentation and Allocation in Distributed Database Systems
2.8.1 Collection of Global Relations
2.8.2 Frequently Asked Question (FAQs)
2.8.3 Data Allocation Goals
3 Research Method
3.1 Categories of Research Methods
3.2 High Level Research Method for Data Inquiry
3.3 Low Level Method for Research Design
3.3.1 Constructive Research
3.3.2 Phases of Constructive Research
3.4 Low Level Design Research Methodology (DSR) for Implementation
3.4.1 Steps of the Design Science Research Method (DSR)
4 Results
4.1 Theoretical Results
4.2 Practical Results
4.2.1 Proposed 5-Layer Architecture
4.2.2 Testing the Proposed Algorithmic approach
5 Discussion
5.1 Contribution of the Work
6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
7 References
8 Appendix
8.1 Case Study Application
8.2 Log File Code for Generating Customized ISUD Matrix Table
8.3 Algorithm for ISUD Application Interface
List of Figures
Figure 1: Database management system implementation alternatives [1]
Figure 2: Stages of the top-down approach in distributed databases [3] [5]
Figure 3: Block diagram of the system [24]
Figure 4: Algorithm for fragmentation [24]
Figure 5: ALP table construction pseudo-code [24]
Figure 6: Research design method [7]
Figure 7: Constructive research methodology for research design
Figure 8: The general methodology of design science research [22]
Figure 9: 5-layer architecture for proposed fragmentation technique
Figure 10: Application of a case study
Figure 11: Database of case study application
Figure 12: CISUD matrix table
Figure 13: User interface for CISUD application
Figure 14: Interface for setting and getting the predicate set for individual highest attribute
Figure 15: Predicate set for highest attribute precedence at individual site
Figure 16: Allocation of fragments
Figure 17: ISUD user interface for total cost of attribute from all sites
Figure 18: Results retrieved for total ALP (attribute locality precedence) value from three sites
Figure 19: ISUD user interface for individual cost of attribute from individual sites
Figure 20: Individual ALP results from individual sites
Figure 21: Allocation of data to different sites
Figure 22: ISUD input values (1)
Figure 23: Interpretation of result 1
Figure 24: Graphical interpretation of result 1
Figure 25: ISUD input values (2)
Figure 26: Interpretation of result 2
Figure 27: Graphical interpretation of result 2
Figure 28: Bharat Transport Service application (case study)
List of Tables
Table 1: Project S1
Table 2: Project S2
Table 3: Comparison framework of different techniques with respect to key characteristics
List of Abbreviations
DDBS: Distributed Database Systems
DDBMS: Distributed Database Management Systems
ALP: Attribute Locality Precedence
CISUD: Customized (Insert, Select, Update, Delete)
DSR: Design Science Research
HF: Horizontal Fragmentation
1 Introduction
This introductory chapter presents the chosen research domain, explains the importance of the research, and states the objectives and limitations of the work in this dissertation. It also gives the background of the problem domain and identifies the potential problems in the research area found through the literature investigation.
1.1 Background
Distributed database systems are becoming more and more important for sharing and managing information within large corporations, companies and organizations. The emergence of distributed database management systems (DDBMS) is based on the maturing of database management systems (DBMS) together with significant developments in computer networks and distributed computing technologies [1]. A distributed database (DDB) is defined as a collection of multiple, logically interrelated databases distributed over a computer network [1]. Distributed database activities are controlled by a distributed database management system (DDBMS). “A distributed database management system (DDBMS) is the software system that permits the management of the distributed database and makes the distribution transparent to the users” [1, p.3].
Before going further, it is useful to give a brief overview of the various kinds of distributed database systems. They are categorized as homogeneous distributed database systems (Home-DBS) and heterogeneous distributed database systems (Hetro-DBS) [2]. A homogeneous distributed database uses the same data models, schemas and databases throughout, whereas a heterogeneous distributed database has to deal with schema integration, distributed query processing, distributed transaction management, administrative functions and different types of heterogeneity [2]. Heterogeneity can arise in computer hardware, operating systems, communication links, data models, protocols and database management systems [2].
Distributed and parallel processing in database management systems (DBMS) is regarded as an efficient way of improving the performance of applications that manipulate large volumes of data in an organization [8]. The design of a distributed database serves goals such as eliminating irrelevant data accesses when queries are executed from various locations and reducing the communication cost of data shared among sites. Distribution design also involves making decisions about data fragmentation and placement across the different sites of the distributed environment [8].
A distributed database allows data to be fragmented, replicated and distributed [9] over an intranet or the internet, within an organization and across organizations. The client/server architecture provides a platform in which a number of client machines can access a single database server; it also helps to distribute and allocate data across multiple sites that have to communicate with each other when responding to users’ queries and executing transactions remotely [1].
Distributed database design involves a number of issues [5] that complicate the design architecture. In a distributed database system it is often required to allocate data in a fragmented, replicated and decentralized manner [9]. Fragmentation means that a relation is divided into several parts that are stored at several sites. A relation can be fragmented horizontally, vertically, or in a mixed form [9]. Replication means that copies of the same data are stored at several sites; a copy may be a fragment of a relation or the whole relation. Replicated data has been observed to cause many problems for update operations [9]. A decentralized database is one in which the data is distributed over a LAN/WAN environment, with the relations stored at different sites [9].
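To make the three notions concrete, the following minimal Python sketch shows a horizontal fragment (a subset of rows), a vertical fragment (a subset of columns that keeps the key) and a replica (a full copy). The relation, its attribute names and its rows are invented for illustration and are not taken from the thesis or from [9].

```python
# Hypothetical relation; attribute names and rows are illustrative only.
EMP = [
    {"eno": 1, "ename": "Anna", "title": "Engineer", "site": "A"},
    {"eno": 2, "ename": "Bo", "title": "Analyst", "site": "B"},
    {"eno": 3, "ename": "Cia", "title": "Engineer", "site": "A"},
]


def horizontal_fragment(relation, predicate):
    """Return the rows (tuples) of the relation that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def vertical_fragment(relation, key, attributes):
    """Return a projection on the given attributes; the key is kept in every
    fragment so the original relation could be rebuilt by a join."""
    return [{a: row[a] for a in (key, *attributes)} for row in relation]


def reconstruct_horizontal(*fragments):
    """Union of horizontal fragments, ordered by key for comparison."""
    rows = [row for fragment in fragments for row in fragment]
    return sorted(rows, key=lambda row: row["eno"])


emp_a = horizontal_fragment(EMP, lambda r: r["site"] == "A")  # stored at site A
emp_b = horizontal_fragment(EMP, lambda r: r["site"] == "B")  # stored at site B
emp_names = vertical_fragment(EMP, "eno", ["ename"])          # columns eno, ename
emp_replica = [dict(row) for row in EMP]                      # replication: full copy
```

Mixed fragmentation would apply one kind of fragmentation to the result of the other, for example projecting `emp_a` onto a subset of its attributes.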
Various approaches [10] [11] have been proposed for database partitioning and fragment allocation in distributed databases. The design of a distributed database aims to enhance application performance by minimizing the amount of irrelevant data accessed by applications and by minimizing the cost of transferring data when applications are processed at different sites [12].
This dissertation focuses on strategies [12] for propagating data over the network between sites within one organization or across several organizations. These strategies are based on fragmentation [1]. Fragmentation is basically applied to a relational database schema in the form of horizontal fragmentation and vertical fragmentation [1]. The main advantage of introducing fragmentation into the distributed database system architecture is that data can be placed in close proximity to its place of use, which helps to reduce the transmission cost and also the size of the relations involved in user queries [1].
1.1.1 Contribution of the thesis
The first contribution of this thesis is an investigation of algorithms for database fragmentation, using a comparative framework of the techniques proposed by different researchers; this is explained in detail in section 5.1. The second contribution is the design of the architecture and the implementation of the customized ISUD technique, which is taken from [24] and explained in detail in section 4. The main contribution is the proposed 5-layered architecture, which adds the creation of an individual ALP table for each individual site, whereas [24] only considers the summarized total cost of attribute locality precedence (ALP) over all sites. A detailed explanation can be found in sections 5.2.1 and 5.2.2.
1.2 Case Study
1.2.1 Swedish Board of Agriculture
This research work is related to the Swedish Department of Agriculture, an organization that has a centralized database system providing the infrastructure for end-users to access data remotely from all over Sweden. End-users of the organization can access information from the centralized database system over a restricted internet connection (extranet) by using the Internet Authentication Service (IAS). The centralized database is developed in Oracle for a homogeneous environment at different sites. Many resources in the organization are engaged in maintaining the centralized data, disseminating it within the company and providing access to different services in response to end-user queries, so the cost is high as far as quality assurance is concerned. The organization’s centralized database system contains data that comes from different relational databases, such as the customer database and the administrative/employee database.
1.2.2 Case Study for Testing Purpose or for Evaluation of Proposed Technique
In order to test the technique developed in this thesis, a separate case study was set up. The case study uses an information system previously developed for Bharat Transport Service, an Indian logistics company situated in Hyderabad, India. The software offers different applications, such as vehicle billing information, daily loading reports, vehicle payment details and the generation of different reports. For the purpose of testing our technique, only the billing information application was used. This application has many functions: it retrieves data according to selected bill numbers and employee names, saves information in the database, and also supports updating and deleting information. The application uses DML (Data Manipulation Language) operations such as create, update, delete and select. Because these DML operations are available, we selected this application to test our technique.
1.3 Purpose/Objectives
1.3.1 Research Question
An analytical assessment of the literature [5] shows that there are issues in distributed database development that relate to database fragmentation. Within distributed database design, this thesis addresses the following questions.
Q.1. What algorithms exist for uniformly fragmenting the relations in a distributed database?
Q.2. How can the architecture of the algorithm selected in Q.1 be designed, and how can the proposed algorithmic approach be implemented and tested?
1.3.2 Theoretical Purpose
This dissertation contributes to the field of distributed databases by providing one solution to the question of how a traditional centralized database system can be transformed into a distributed database system. The theoretical purpose of this research is therefore to address data fragmentation problems and to investigate efficient algorithms and techniques for horizontal database fragmentation in a distributed environment.
1.3.3 Practical Purpose
The practical purpose of the research work was to design the architecture of a carefully selected algorithm (described in [24]) in a real-world scenario (using the case study of Bharat Transport Service), and to implement and test the proposed algorithmic approach. The result is intended to help the database administrator or end-users to make proper fragmentation decisions at the initial stage of distributed database system design by using the ISUD (Insert, Select, Update, Delete) matrix table, which is shown in detail in section 4.
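As a rough illustration of how an ISUD matrix table could feed such a decision, the Python sketch below turns per-site Insert/Select/Update/Delete frequencies into a per-site score and a total score for each attribute. It is only a sketch under stated assumptions: the operation weights, the weighted-sum scoring rule, and the attribute and site names are all hypothetical and are not the cost model of [24] or of this thesis.

```python
# Illustrative only: the weights and scoring rule are assumptions,
# not the cost model of [24] or of the thesis.
WEIGHTS = {"I": 2, "S": 1, "U": 3, "D": 2}  # hypothetical operation weights

# isud[attribute][site] -> frequencies of Insert, Select, Update, Delete
# (attribute names, site names and numbers are made up for the example)
isud = {
    "bill_no": {"site1": {"I": 5, "S": 20, "U": 1, "D": 0},
                "site2": {"I": 1, "S": 4, "U": 0, "D": 0}},
    "employee": {"site1": {"I": 2, "S": 3, "U": 1, "D": 1},
                 "site2": {"I": 2, "S": 6, "U": 2, "D": 0}},
}


def site_cost(freq):
    """Weighted sum of the four operation frequencies at one site."""
    return sum(WEIGHTS[op] * freq.get(op, 0) for op in WEIGHTS)


def score_table(matrix):
    """Per-attribute, per-site scores plus the total over all sites."""
    table = {}
    for attribute, sites in matrix.items():
        costs = {site: site_cost(freq) for site, freq in sites.items()}
        table[attribute] = {"per_site": costs, "total": sum(costs.values())}
    return table


table = score_table(isud)
# Attribute with the highest total score under the assumed weights.
best = max(table, key=lambda a: table[a]["total"])
```

Keeping both the per-site scores and the total mirrors the distinction drawn in section 1.1.1 between an individual table for each site and a summarized total over all sites.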
1.3.4 Assumption
The assumptions concern things that had already been developed before this work. In our research work:
1. The database of the case study used in this research work had already been created before the distributed database system architecture was developed for testing.
2. The techniques for fragmenting the database discussed in this research work are taken from the literature review; they support the creation of distributed data in distributed environments.
1.4 Limitations
The limitations restrict the scope of the study and identify the boundaries and functionality considered in this research work.
1. Our research work focuses on implementing the algorithm [24] for distributed databases using the horizontal fragmentation technique.
2. We are not concerned with vertical fragmentation or mixed fragmentation.
3. We are also not concerned with the allocation of data to different sites in the distributed environment.
1.5 Thesis outline
The first chapter introduces the research work, its importance and purpose, and identifies the problems, assumptions and limitations of the research. The second chapter presents previous approaches, techniques and strategies for fragmentation in distributed databases. The third chapter describes the methodology used to conduct the research and to implement fragmentation in the distributed database architecture. The fourth chapter covers the design and implementation of the algorithm using the horizontal fragmentation technique. The fifth chapter presents the results and analysis of the research work. The sixth chapter gives the conclusion and future work.
2 Theoretical Background
A distributed database is based on different architecture layers that describe a logical collection of data from inter-linked databases [2]. Before going into a detailed discussion, we need the basic fundamentals of distributed database theory. Below are some basic definitions related to database management systems.
A database “is a collection of data, typically containing the information about one or more related organizations” [33, p.11].
A database management system (DBMS) “is a software package designed to store and manage databases” [33, p.11].
A data model “is a collection of concepts for describing data. Data model in database vs. type system in programming language” [33, p.12].
A schema “is a description of a particular collection of data, using the given data model. Schemas in database vs. types in programming language” [33, p.12].
Several data models exist [33], shown here with examples: the relational data model, which is the most commonly used (relational database systems such as SQL Server, Oracle and Sybase); the object-oriented data model (ObjectStore, O2); the object-relational model (UniSQL, Informix Universal Server); and the semi-structured data model (XML) [33].
2.1 General Description of Distributed Database
In real-world scenarios, people need to access different company databases, whether they are employees, customers, potential customers, vendors or suppliers of any kind. Until now, companies have been able to keep their databases concentrated at a single server site that is accessed worldwide by means of telecommunication networks and the internet [5]. With centralized database systems, companies have been able to disseminate data within the organization in a very structured manner. However, because of new business needs and demands and the adoption of new database architectures for scalability, they need new ways to propagate data over distributed locations. Using a distributed database system has many benefits, as explained in the following section. There are also associated complexities, some of which are described in sections 2.3 and 2.4.
2.1.1 What is a Distributed Database System?
Several definitions of distributed database systems (DDBS) have been given by different authors. A basic and generic definition is that a distributed database system is a “collection of multiple, logically interrelated databases distributed over a computer network” [1, p.3]. A distributed database management system (DDBMS) is defined as the “software system that permits the management of the distributed database and makes the distribution transparent to the users” [1, p.3].
Theoritical Background
19
2.1.2 Application of Distributed Database Technology
Many advantages of DDBSs have been listed from different perspectives. The following
sections describe some fundamental promises of DDBSs as presented by Özsu [1].
2.1.2.1 Transparent Management of Distributed System
Distributed database technology is intended to extend the concept of data
independence to environments in which data is distributed and replicated over a
number of machines connected by a network [13]. Data independence is provided by
several forms of transparency: network (and therefore distribution) transparency,
replication transparency, and fragmentation transparency. Transparent access to data
separates a system’s higher-level semantics from lower-level implementation issues
[13].
A transparent system hides the implementation details from the users. The main
benefit of a fully transparent DBMS is that it supports the development of complex
applications. This can be explained further with an example adapted from Özsu [1].
Consider Jönköping University, which has different schools such as the Engineering
School (JTH), Jönköping International Business School (JIBS) and the Health Science
School. The university runs projects at each school's site and maintains a database of
its employees, program information and related data. We assume that the database is
relational and stores this information in two relations, EMP(ENO, ENAME, TITLE) and
PROG(PNO, PNAME, PROGDETAIL). We add a third relation, SAL(TITLE, AMT), to store the
salary information of employees, and a fourth relation, ASG(ENO, PNO, RESP, DUR), which
records which employees are assigned to which program, for what duration and with what
responsibility. If this data is stored in a centralized DBMS and we want to find the
names and salaries of the employees who worked on a project for more than 6 months, we
would use the following SQL query [1].
Example
SELECT Ename, Amt
FROM Emp, Asg, Sal
WHERE Asg.Dur > 6
  AND Emp.Eno = Asg.Eno
  AND Sal.Title = Emp.Title
In the above example the query retrieves its results from a centralized database
system; how the relations named in the query are stored is transparent to the user.
The university database can be changed from centralized to distributed by localizing
the data, so that the data of JTH employees is stored at the JTH office, the data of
JIBS employees is stored at the JIBS office, and so on. The same can be applied to the
other relations holding program and salary information. What we intend to do here is
to partition the relations and store each partition at a different site, which is
known as fragmentation. Fully transparent access means that the user can use the same
query as in the above example without any concern about the fragmentation or location
of the data, relying on the system to resolve these issues [1].
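The idea of issuing the same global query over fragmented data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a single in-memory SQLite database rather than separate sites, the fragment tables `emp_jth` and `emp_jibs`, the view `emp` and all sample rows are hypothetical, and a union view stands in for the mapping that a real DDBMS would maintain internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical fragments of EMP, one per school. In a real DDBS each
# fragment would be stored at its own site; here they share one database.
cur.executescript("""
CREATE TABLE emp_jth  (eno TEXT PRIMARY KEY, ename TEXT, title TEXT);
CREATE TABLE emp_jibs (eno TEXT PRIMARY KEY, ename TEXT, title TEXT);
CREATE TABLE asg (eno TEXT, pno TEXT, resp TEXT, dur INTEGER);
CREATE TABLE sal (title TEXT PRIMARY KEY, amt INTEGER);

INSERT INTO emp_jth  VALUES ('E1', 'Anna', 'Lecturer'), ('E2', 'Bo', 'Professor');
INSERT INTO emp_jibs VALUES ('E3', 'Cara', 'Lecturer');
INSERT INTO asg VALUES ('E1', 'P1', 'Teacher', 12),
                       ('E2', 'P1', 'Manager', 4),
                       ('E3', 'P2', 'Teacher', 8);
INSERT INTO sal VALUES ('Lecturer', 40000), ('Professor', 60000);

-- The global relation EMP is reconstructed as the union of its fragments.
CREATE VIEW emp AS
    SELECT * FROM emp_jth
    UNION ALL
    SELECT * FROM emp_jibs;
""")

# The user writes the query against EMP exactly as in the centralized case.
GLOBAL_QUERY = """
SELECT ename, amt FROM emp, asg, sal
WHERE asg.dur > 6
  AND emp.eno = asg.eno
  AND sal.title = emp.title
ORDER BY ename
"""
rows = cur.execute(GLOBAL_QUERY).fetchall()
print(rows)
```

The query text never mentions the fragments; only the view definition knows how EMP is partitioned.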
Özsu [1] explains different types of transparency in a distributed environment, such
as fragmentation transparency, network transparency and replication transparency. As
our research work is concerned with fragmentation, we explain fragmentation
transparency below.
2.1.2.2 Fragmentation Transparency
The form of transparency most relevant to this work is fragmentation transparency in
a distributed database system. With our proposed technique in chapter 4 we show that
it is possible to fragment a relation horizontally into smaller fragments and to treat
each fragment as a separate database object or relation. The motive for fragmentation
is to increase performance, availability and reliability [1]. Fragmentation is
generally of two types: horizontal fragmentation (HF) and vertical fragmentation (VF).
In HF, a relation is divided into sub-relations, each holding a subset of the rows
(tuples) of the original relation, whereas in VF a relation is divided into
sub-relations, each defined on a subset of the columns (attributes) of the original
relation.
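The difference between the two types can be illustrated with a minimal Python sketch. The sample EMP rows and the helper names `horizontal_fragment`, `vertical_fragment` and `rejoin` are assumptions made for this illustration; HF is modelled as a selection on rows and VF as a projection that always keeps the key.

```python
# EMP(ENO, ENAME, TITLE) with illustrative rows (not taken from the thesis).
EMP = [
    {"ENO": "E1", "ENAME": "Anna", "TITLE": "Lecturer"},
    {"ENO": "E2", "ENAME": "Bo", "TITLE": "Professor"},
    {"ENO": "E3", "ENAME": "Cara", "TITLE": "Lecturer"},
]

def horizontal_fragment(relation, predicate):
    """Selection: keep the rows (tuples) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def vertical_fragment(relation, key, attributes):
    """Projection: keep the key plus a subset of the columns."""
    return [{name: row[name] for name in (key, *attributes)} for row in relation]

def rejoin(left, right, key):
    """Rebuild the original rows by joining two vertical fragments on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

# Horizontal fragments: disjoint subsets of rows.
hf_lecturers = horizontal_fragment(EMP, lambda r: r["TITLE"] == "Lecturer")
hf_others = horizontal_fragment(EMP, lambda r: r["TITLE"] != "Lecturer")

# Vertical fragments: subsets of columns that share the key ENO.
vf_names = vertical_fragment(EMP, "ENO", ["ENAME"])
vf_titles = vertical_fragment(EMP, "ENO", ["TITLE"])

# Reconstruction: union for HF, join on the key for VF.
hf_union = sorted(hf_lecturers + hf_others, key=lambda r: r["ENO"])
vf_join = rejoin(vf_names, vf_titles, "ENO")
```

In both cases the original relation can be rebuilt without loss, by a union of the horizontal fragments or by a join of the vertical fragments on the key.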
When the relations of a database are fragmented, user queries must be handled
according to the sub-relations. This is done by finding a query processing strategy
based on the fragments rather than on the relations [1]; in other words, a global
query is converted into several fragment queries. Fragmentation transparency is
therefore also a query processing issue [1].
2.1.2.3 Availability and Reliability
Availability can be defined as the probability that the system is continuously up
during a given time period [12], whereas reliability is defined as the probability
that the system is up at a specified time [12]. Both improve with a DDBS. In a
centralized DBS, if the single site goes down, the entire system goes down, whereas in
a DDBS a failure affects only the site that is down; the other sites and the rest of
the system are not affected. If data is additionally replicated at different sites,
the effect of a site failure is reduced further [12].
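The effect of replication can be quantified with a small back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that sites fail independently and that each site is up with the same probability; under those assumptions a data item replicated at n sites is reachable with probability 1 - (1 - p)^n. The function name and the value 0.9 are hypothetical.

```python
def prob_item_reachable(p_site_up: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent sites is up."""
    if not 0.0 <= p_site_up <= 1.0:
        raise ValueError("p_site_up must be a probability between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The item is unreachable only if every replica site is down at once.
    return 1.0 - (1.0 - p_site_up) ** replicas

centralized = prob_item_reachable(0.9, 1)  # a single copy at a single site
replicated = prob_item_reachable(0.9, 3)   # the same item copied to three sites
print(centralized, replicated)
```

With these illustrative numbers, three replicas raise the probability of reaching the item from 0.9 to roughly 0.999.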
2.1.2.4 Improved Performance
If a very large database is distributed over a number of sites, the local subset of
the database at each site is much smaller, which reduces the amount of data a local
transaction has to process and therefore its processing time. Response time also
improves for transactions that access more than one site, because the processing at
the different sites can be performed in parallel [12].
2.2 Distributed Database Architecture
A distributed database system allows applications to access data from local and
remote databases. In a homogeneous distributed system, every site runs the same DBMS.
In a heterogeneous distributed system, at least one of the sites runs a different
DBMS.
2.2.1 Architectural Models for Distributed database system
A DDBMS can be constructed in several ways. The classification in figure-1 organizes
the alternatives along three dimensions: (1) autonomy, (2) distribution and (3)
heterogeneity [1].
Figure 1: Database Management System Implementation Alternatives[1]
(1) Autonomy. It refers to the distribution of control, not of data, and indicates
the degree to which individual DBMSs can operate independently [1]. It is a function
of factors such as whether the component systems exchange information, whether they
can independently execute transactions, and whether they are allowed to modify one
another. It imposes the following requirements [1].
According to Gligor and Popescu-Zeletin [1]
(i) Local operations are not affected by participation in the global
multi-database system.
(ii) Local query processing and optimization are likewise not affected by global
queries that access multiple databases.
(iii) System consistency and operation are not compromised when individual
databases are added to or removed from the global database.
According to Du and Elmagarmid [1][13]
(i) Design autonomy: each DBMS is free to use the data models and transaction
management techniques it prefers.
(ii) Communication autonomy: each DBMS decides what information it provides to
the other DBMSs.
(iii) Execution autonomy: each DBMS can execute the transactions submitted to it
in whatever way it wants.
Autonomy can be classified as follows.
-Tight integration: - A single image of the entire database is available to all users
who want to share the information.
-Semiautonomous systems: - They consist of DBMSs that can operate independently but
determine which parts of their own databases are shared with others, and which are
modified so that they can exchange information with each other.
(2) Distribution. It refers to the physical distribution of data and of the different
software components over multiple sites, while the user sees the data as a single
logical pool [1]. Distribution falls into two classes: client/server distribution and
peer-to-peer distribution [1].
Client/server distribution: - Data management is concentrated at the server, where
the data is primarily stored, while the clients generate requests and fetch data
whenever it is needed [1].
Peer-to-peer distribution: - The data is fully distributed and there is no
distinction between clients and servers; every machine has full DBMS functionality
and can communicate with other machines to execute queries and transactions. Each
site maintains a portion of the database [1].
(3) Heterogeneity. It occurs in various forms in distributed systems, such as
differences in hardware, communication protocols and operating systems. With respect
to databases it concerns differences in data model, data format, query language and
transaction management algorithms. Accessing a remote DBS that differs in any of
these therefore requires conversion [1].
2.3 Unsolved problems in DDBS
2.3.1 Distribution design
Distributed database design methodology varies depending upon the system
architecture. For tightly integrated distributed databases, the design process is
top-down, from requirements analysis and logical design of the global database to
physical design of each local database [13]. For distributed multi-database systems,
the design process is bottom-up and involves the integration of existing databases
[13].
The step of interest in the top-down process is distribution design, described in
[13], which involves designing local conceptual schemas by distributing global
entities over the sites of the distributed system. The global entities are specified
within the global conceptual schema. In the relational model, both the global and the
local entities are relations, so distribution design maps global relations to local
ones [13]. One of the most important research issues that requires attention is the
development of a practical distribution design methodology and its integration into
the general data-modeling process [13].
The two main aspects of distribution design are fragmentation and allocation. In
fragmentation, each global relation is partitioned into a set of fragment relations
[13], whereas allocation concerns the (possibly replicated) distribution of these
local relations across the sites of the distributed system [13]. Research on
fragmentation has focused on horizontal (selecting) and vertical (projecting)
fragmentation of global relations [13]. Many algorithms based on mathematical
optimization formulations have been proposed for allocation [13]. However, there is
no underlying design methodology that combines the fragmentation and allocation
techniques; they are typically treated independently.
2.3.2 Network scaling problems
The database community does not yet have an overall understanding of the performance
implications of all the distributed DBMS design alternatives [13]. Consequently,
questions have been raised about the scalability of some protocols and algorithms as
systems become geographically distributed or as the number of system components
increases [13]. One concern is the suitability of the distributed
transaction-processing mechanisms, i.e. two-phase locking (2PL) and, particularly,
the two-phase commit (2PC) protocol, for distributed database systems based on wide
area networks [13]. These protocols carry an overhead, and implementing them over a
slow wide area network may pose difficulties [13].
2.4 Distribution Design Problems
For distributed databases, fragmentation and allocation are the major problems of
database distribution design. Current research often uses design methods such as
mathematical programming to minimize the combined cost of storing the database,
processing transactions against it, and communication [28]. In practice it is very
difficult to study database distribution design together with other problems, because
each problem is difficult enough to study on its own.
2.4.1 The Complexity of the Problems
The combined problem of fragmentation and allocation has been proven to be hard [28].
Fragmentation and allocation are distribution design techniques used to improve
system performance, and each of them has a massive search space for the best
solution.
Because of this complexity, allocation is usually treated independently of
fragmentation [28]. Most allocation methods in the literature assume that
fragmentation has already been done, so the output of fragmentation becomes the input
to allocation. Separating the two simplifies the formulation of the problem by
reducing the decision space, although the separation also adds to the complexity of
the allocation models [28]. Both steps take user applications as input and aim to
improve system performance; they differ only in that fragmentation works on the
global database schema while allocation works on fragments. As a result, the
application information and the relationships between fragments must be specified
again for allocation [28]. It would be worthwhile to develop a methodology that takes
the interdependence of fragmentation and allocation into account [28].
2.4.2 Interdependencies with Query Optimization
Designing distributed database systems is a complex task because many other issues
are involved, such as query processing and optimization, data replication,
concurrency control, directory management, reliability and recovery [28]. Of these,
query processing and optimization is closely interrelated with fragmentation and
allocation. Query optimization in distributed systems depends on how data is
fragmented and allocated, since query processing defines the sequence of operations
of a query and assigns those operations to sites according to the allocation of
fragments [28].
2.4.3 Improvised Solution for the problems mentioned
To reduce the complexity of the problem and make it more tractable, researchers in
the literature have used the following simplifications.
- Fragmentation and allocation are mostly treated as two separate steps.
Fragmentation is performed first, without considering how the resulting fragments
will be allocated, while allocation is performed on the assumption that
fragmentation has already been decided [28]. Likewise, allocation is studied under
the assumption that a fixed query optimization method generates the processing
schedule [28], while query optimization is studied under the assumption of a fixed
data allocation [28].
- Both a simple query environment and a query site strategy are assumed while
studying allocation. Under the first assumption, network information is not
considered [28]. Under the second, queries that need to be processed in a
distributed way are not considered; query trees are therefore not used and the
allocation of intermediate nodes is not considered [28].
- Query optimization is disregarded while studying allocation, although a realistic
fragment allocation can only be achieved when distributed query optimization is
performed after fragmentation [28].
Other ad hoc solutions proposed in the literature make the overall system design
manageable by ignoring the interdependencies between the individual problems, which
makes these approaches inefficient in the sense of obtaining an optimal database
distribution design [28].
2.5 Initial Design Approach for Distributed Database Design
One of the prime tasks of this research work is to investigate and develop a
fragmentation technique for a distributed database environment that manages data
from various locations. We chose the top-down design process for database
fragmentation in the initial stage of the design. A framework for this process is
shown in figure-2 [3] [5].
The top-down approach is used frequently in different areas of computer science. The
top-down design process consists of a sequence of stages for designing the
distributed database. Each stage passes its results on to the next, so that a
homogeneous distributed database system is constructed incrementally from scratch
[5].
[Diagram stages: Requirement analysis, Conceptual project, Logical project,
Distribution project, Physical project; annotated with user input, integration and
correction]
Figure 2: Stages of the top-down approach in distributed databases [3] [5]
The stages of the top-down approach in distributed databases are described below.
2.5.1 Requirements analysis
In this stage, information about the data, restrictions and relationships within the
organization is collected. The requirements are gathered through meetings with the
users, in which it can be observed how the organization operates. The analysis
results in a requirement specification document.
2.5.2 Conceptual project
In this stage the data and its relationships are modeled independently of the
structure of the distributed database system (conceptual modeling). The conceptual
project is carried out by analyzing the requirement specification. Its result is a
conceptual schema with the data integrity restrictions.
2.5.3 Logical project
In this stage the conceptual schema is converted into a logical schema of the
distributed database system. The conversion is carried out by applying rules that
translate the conceptual schema to the relational model of the distributed database.
The result of the logical project is a logical schema with tables, stored procedures,
views, access authorizations, etc. [5].
2.5.4 Distribution project
In this stage it is decided how the data and programs are to be fragmented and
allocated across the nodes of the computer network. In a few cases the network itself
is designed and built to satisfy the needs of the distributed database project. This
stage is considered the most critical and important one in a distributed database
project. To support this stage of the top-down approach, we relate it to five generic
steps for data distribution with respect to fragmentation and allocation in a
distributed environment, which are explained in detail in section 2.8.
2.5.5 Physical project
In this stage the logical schema is defined in a DDS suited to the data model. The
physical project is carried out by means of SQL instructions. The result is a
physical schema that follows the decisions made in the distribution project. Once the
physical project has been completed for each node of the computer network, the
distributed database is ready for use. A monitoring process is set up to discover
errors. Such errors constitute the system feedback and are sent to the people
responsible for the construction of the distributed database [5].
2.6 Fragmentation in Distributed Database Design
Fragmentation: “Fragmentation is a design technique to divide a single relation or
class of a database into two or more partitions such that the combination of the
partitions provides the original database without any loss of information” [28,p.3].
“A fragment i.e. horizontal or vertical of a database object in an object-oriented
database system contains subsets of its instance objects (or class extents) reflecting
the way applications access the database objects” [34.p.1].
Distributed processing in a DBMS is an effective way of improving the performance of
applications that operate on huge amounts of data [2]. The major goals of distributed
database design are to avoid accessing irrelevant data while executing queries and to
reduce the data exchanged among sites. To achieve them, distributed database design
fragments the relations (in a relational DBMS) or the classes (in an object-oriented
database), and allocates and replicates the fragments at the different sites of the
distributed system, with local optimization at each site.
Fragmentation is a promising design technique that divides a single relation or class
in a database schema into two or more partitions such that the combination of the
partitions provides the original database without loss of information [28][4].
Horizontal fragmentation (HF) partitions a relation or class into disjoint sets of
tuples or instances [2]. Vertical fragmentation (VF) partitions a relation or class
into sets of columns or attributes that are disjoint except for the primary key [2].
Previous HF, VF and mixed fragmentation (MF) techniques have the following problems
in common:
- Most of them use the frequency of queries, minterm predicate affinity or the
attribute affinity matrix (AAM) as a basis for fragmentation. These require
sufficient empirical data, which is not available in most cases at the initial stage
[24][28].
- Most of them concentrate only on the fragmentation problem and overlook the
allocation problem in order to reduce complexity [24].
- Minimizing distributed joins is a fundamental fragmentation issue [3].
- A further problem is related to semantic data control, specifically to integrity
checking [3].
2.6.1 Horizontal Fragmentation
Horizontal fragmentation is of two types: primary and derived. Primary horizontal
fragmentation of a relation or class is performed using predicates of the queries
that access the relation or class, while derived horizontal fragmentation of a
relation or class is based on the horizontal fragmentation of another relation or
class [28].
2.6.1.1 Primary Horizontal Fragmentation for Relational Databases
In the context of the relational data model, primary horizontal fragmentation using
minterm predicates was first proposed by Ceri et al. in 1982 [29]. The existing
approaches include the following.
Minterm-predicate-based approaches: approaches “which perform primary horizontal
fragmentation using a set of minterm predicates”, e.g., [28.p.11][29].
Later, [24][30] proposed a technique based on the attribute usage matrix (AUM) for
vertical fragmentation.
Affinity-based approaches: approaches “which first group predicates according to
predicate affinities and then perform primary horizontal fragmentation using
conjunctions of the grouped predicates, e.g., [28] [30]. The way of grouping
predicates is either graph-based or using an objective function [28] [30]”.
The following definitions related to minterm predicates are taken from the
literature [28].
Definition 1. “For a given relation $R = \{A_1 : D_1, \ldots, A_n : D_n\}$, a simple
predicate is in the form of $p_k : A_i \,\theta\, \mathit{Value}$, with $A_i$ as an
attribute defined over $D_i$ and $\theta \in \{=, <, \neq, \leq, >, \geq\}$” [28].
Definition 2. “Minterm predicates $M = \{m_1, m_2, \ldots, m_z\}$ over a set $Pr$ of
simple predicates are the conjunctions of simple predicates and their negations:
$M = \{ m_j \mid m_j = \bigwedge_{p_k \in Pr} p_k^{*} \}$, $k = 1, \ldots, m$,
$j = 1, \ldots, z$, where $p_k^{*} = p_k$ or $p_k^{*} = \neg p_k$. Note that all simple
predicates in $Pr$ appear (positively or negatively) in each minterm predicate”
[28.p.12].
Definition 3. “A set of simple predicates $Pr$ is said to be complete if and only if
there is an equal probability of access by every application to any tuple belonging to
any fragment that is defined according to $Pr$” [28.p.12].
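To make Definitions 1 and 2 concrete, the following Python sketch enumerates the minterm predicates for two simple predicates and uses them to split a small relation. The rows, the budget values and the two predicates are illustrative assumptions, and the helper names `minterm_predicates` and `fragment` are hypothetical; the sketch does not check completeness in the sense of Definition 3.

```python
from itertools import product

# Illustrative project rows (values assumed for this example only).
PROJECTS = [
    {"Project_No": "P1", "Budget": 150000, "Location": "Jönköping"},
    {"Project_No": "P2", "Budget": 200000, "Location": "Stockholm"},
    {"Project_No": "P3", "Budget": 250000, "Location": "Göteborg"},
    {"Project_No": "P4", "Budget": 100000, "Location": "Växjö"},
]

# Simple predicates of the form "attribute theta value" (Definition 1).
SIMPLE_PREDICATES = {
    "Budget <= 150000": lambda r: r["Budget"] <= 150000,
    "Location = 'Jönköping'": lambda r: r["Location"] == "Jönköping",
}

def minterm_predicates(simple):
    """Build every conjunction in which each simple predicate appears
    either positively or negated (Definition 2)."""
    names = list(simple)
    minterms = []
    for signs in product((True, False), repeat=len(names)):
        label = " AND ".join(
            name if sign else f"NOT({name})" for name, sign in zip(names, signs)
        )

        def holds(row, signs=signs):
            return all(simple[n](row) == s for n, s in zip(names, signs))

        minterms.append((label, holds))
    return minterms

def fragment(relation, minterms):
    """Assign each row to the minterm it satisfies; drop empty fragments."""
    fragments = {}
    for label, holds in minterms:
        rows = [row for row in relation if holds(row)]
        if rows:
            fragments[label] = rows
    return fragments

minterms = minterm_predicates(SIMPLE_PREDICATES)
fragments = fragment(PROJECTS, minterms)
for label, rows in fragments.items():
    print(label, [row["Project_No"] for row in rows])
```

Because exactly one minterm holds for any given row, the resulting fragments are disjoint and together cover the whole relation.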
The use of minterm predicates for horizontal fragmentation was first proposed by Ceri
and Pelagatti in 1982 [28]: files are fragmented horizontally to optimize the
frequency of accesses that application programs perform at the different sites.
According to this work, minterm fragments contain records that are accessed
homogeneously by all transactions and can therefore be used as proper units of
allocation.
Several researchers have adapted affinity-based vertical fragmentation algorithms to
horizontal fragmentation. Because checking the completeness of the set of simple
predicates used for horizontal fragmentation is complex, Zhang [28] adapted an
affinity-based vertical fragmentation approach to horizontal fragmentation. This
approach takes a predicate usage matrix and a predicate affinity matrix as input and
employs the bond energy algorithm to cluster predicates. However, the fragments in
the resulting fragmentation schema may overlap each other and therefore may not
satisfy the correctness criteria of fragmentation.
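How a predicate affinity matrix can be derived from predicate usage is sketched below in Python. The affinity measure used here, the summed access frequency of all queries that use both predicates, is carried over from the usual attribute affinity measure and is an assumption of this sketch rather than the exact formula of the cited approach; the queries, predicates and frequencies are made up, and the clustering step (bond energy algorithm) is not shown.

```python
def predicate_affinity(usage, frequency):
    """Return aff[p][q]: total frequency of the queries that use both p and q.

    usage[query] is the set of predicates the query uses;
    frequency[query] is how often the query is issued.
    """
    predicates = sorted({p for used in usage.values() for p in used})
    affinity = {p: {q: 0 for q in predicates} for p in predicates}
    for query, used in usage.items():
        for p in used:
            for q in used:
                affinity[p][q] += frequency[query]
    return affinity

# Hypothetical predicate usage and access frequencies.
USAGE = {
    "q1": {"p1", "p2"},
    "q2": {"p2", "p3"},
    "q3": {"p1", "p2"},
}
FREQUENCY = {"q1": 10, "q2": 5, "q3": 20}

affinity = predicate_affinity(USAGE, FREQUENCY)
for p, row in affinity.items():
    print(p, row)
```

Predicates with a high mutual affinity, such as p1 and p2 here, are the ones a clustering step would tend to group together.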
2.6.1.2 Derived Horizontal Fragmentation
Derived fragmentation in the relational data model refers to a form of horizontal
fragmentation. Derived horizontal fragmentation splits a relation in dependence on
the fragmentation of another relation by applying semi-join operations [28].
The dependence among relations reflects the binary relationships between them. A
direct link is based on an equi-join operation and represents a one-to-many
relationship [28]. When a relation depends on more than one other relation, [28]
suggests two criteria for selecting the fragmentation to derive from: choose the
fragmentation with better join characteristics, or the fragmentation used in more
applications [28]. Derived horizontal fragmentation is explained here by an example
with four relations: employee, assignment, projects and salary. Every relation has
its own primary key for selecting records according to the predicate constraint
values.
Relations:
Employee: Employee ID, Employee Name, Title
Assignment: Employee Number, Project No, Duration
Projects: Project No, Project Name, Budget, Location
Salary: Title, Salary
The primary key attributes are Employee ID for Employee, Employee Number together
with Project No for Assignment, Project No for Projects, and Title for Salary. The
foreign key relationships, written from the referenced key to the referencing
attribute, are:
Employee.Employee_ID → Assignment.Employee_No
Projects.Project_No → Assignment.Project_No
Salary.Title → Employee.Title
The horizontal fragmentation of a relation S is derived from the fragmentation of
another relation R, where R is already fragmented into R1, R2, R3, ..., Rn. Using the
semi-join operator,
$S_i = S \ltimes R_i = S \ltimes \sigma_{p_i}(R) = \pi_{S.*}(S \bowtie \sigma_{p_i}(R))$
where $p_i$ is the predicate that defines $R_i$, so the fragmentation expression
refers only to R. The following example shows the mechanism of derived horizontal
fragmentation: a relation is divided into fragments, each of which depends on one
primary horizontal fragment of another relation.
Project S1
Project_No | Project_Name | Budget | Location
P1 | Database Development | 150.000 | Jönköping
P2 | Ontology based Portal | 200.000 | Stockholm
Table 1: Project S1
Project S2
Project_No | Project_Name | Budget | Location
P3 | Web Development | 250.000 | Göteborg
P4 | Maintenance | 100.00 | Växjö
Table 2: Project S2
Similarly, the Assignment relation is fragmented according to the fragments Project
S1 and Project S2:
$\text{Assignment S1} = \text{Assignment} \ltimes \text{Project S1}$
$\text{Assignment S2} = \text{Assignment} \ltimes \text{Project S2}$
Assignment
Employee_No | Project_No | Duration
E1 | P1 | 5
E2 | P4 | 4
E2 | P1 | 3
E3 | P4 | 5
E4 | P1 | 4
E4 | P3 | 5
E5 | P2 | 7
Assignment S1
Employee_No | Project_No | Duration
E1 | P1 | 5
E2 | P1 | 3
E4 | P1 | 4
E5 | P2 | 7
Assignment S2
Employee_No | Project_No | Duration
E2 | P4 | 4
E3 | P4 | 5
E4 | P3 | 5
With the above mechanism of derived horizontal fragmentation we obtain the desired
fragmentation with good join characteristics. The benefit of derived fragmentation is
that join operations in the distributed database can efficiently retrieve the desired
tuples or records according to the predicate or minterm. A real-world scenario that
illustrates the mechanism of fragmentation and allocation in a distributed database
system is presented in section 2.8.
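The derivation of the Assignment fragments can be reproduced with a short Python sketch of the semi-join. The rows are those of the Assignment relation and the project fragments in this section (only Project_No is kept for the projects, since the semi-join needs nothing else); the function name `semi_join` and the list-of-dictionaries representation are assumptions of this illustration.

```python
# Project fragments from primary horizontal fragmentation (key column only).
PROJECT_S1 = [{"Project_No": "P1"}, {"Project_No": "P2"}]
PROJECT_S2 = [{"Project_No": "P3"}, {"Project_No": "P4"}]

ASSIGNMENT = [
    {"Employee_No": "E1", "Project_No": "P1", "Duration": 5},
    {"Employee_No": "E2", "Project_No": "P4", "Duration": 4},
    {"Employee_No": "E2", "Project_No": "P1", "Duration": 3},
    {"Employee_No": "E3", "Project_No": "P4", "Duration": 5},
    {"Employee_No": "E4", "Project_No": "P1", "Duration": 4},
    {"Employee_No": "E4", "Project_No": "P3", "Duration": 5},
    {"Employee_No": "E5", "Project_No": "P2", "Duration": 7},
]

def semi_join(s, r, attribute):
    """Return the rows of s that have a matching value of `attribute` in r.
    Only columns of s are kept, which is what distinguishes a semi-join
    from a full join."""
    matching_values = {row[attribute] for row in r}
    return [row for row in s if row[attribute] in matching_values]

assignment_s1 = semi_join(ASSIGNMENT, PROJECT_S1, "Project_No")
assignment_s2 = semi_join(ASSIGNMENT, PROJECT_S2, "Project_No")

for name, frag in (("Assignment S1", assignment_s1), ("Assignment S2", assignment_s2)):
    print(name, [(r["Employee_No"], r["Project_No"], r["Duration"]) for r in frag])
```

Every assignment row lands in exactly one derived fragment because every project belongs to exactly one project fragment.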
2.7 Previous works on Fragmentation in DDBS
The two main design techniques of distributed database design are fragmentation and
allocation. The database distribution problem has been studied since the 1970s: first
as the problem of file distribution, then as the problem of distributing relations or
relation fragments. After the emergence of the object-oriented data model, some
existing approaches to fragmentation and allocation were adapted to that model. To
give an overall picture of database distribution design, we present an overview of
previous work with respect to horizontal fragmentation and allocation.
In 1999 Özsu and Valduriez proposed an iterative algorithm called COM_MIN, which
generates a complete and minimal set of predicates from a given set of simple
predicates [1]. After the minterm predicates have been obtained, access frequencies
are defined, and the data is fragmented using the access frequency table as explained
by Özsu [1].
In 2002 Cheng et al. [28] [31] proposed a genetic algorithm-based clustering approach
that takes a predicate matrix as input and treats horizontal fragmentation as a
traveling salesman problem (TSP). Horizontal fragmentation is achieved by performing
selection operations using the sets of grouped predicates, which are grouped
according to their distances. The distance between a pair of attributes measures the
access frequencies of the transactions that do not access the pair together.
Additional analysis is needed to simplify the clusters of predicates. None of the
affinity-based horizontal fragmentation approaches takes data locality into
consideration while clustering predicates.
In 2004 Baião et al. proposed a technique that takes a predicate affinity matrix as
input and builds a predicate affinity graph, which then defines horizontal class
fragments [24].
In 2006 H. Ma and K. D. Schewe proposed a technique that takes an attribute usage
frequency matrix (AUFM) as input; vertical fragmentation is performed based on this
matrix and a cost model [24]. In 2007 M. Alfares et al. extended the technique of
H. Ma, using an AAM (Attribute Affinity Matrix) as input to generate groups based on
affinity values [24].
In 2008 Marwa et al. extended the technique of M. Alfares et al. with an instance
request matrix that is used to fragment the data horizontally for object-oriented
databases [24] [32]. Their paper introduces a new algorithm for horizontal
fragmentation in an Object Oriented Distributed Database System (OODDBS) [32].
In 2009 H. Mahboubi and J. Darmont proposed a technique that uses predicate affinity
for HF in data warehouses [24] [33]. Their paper addresses XML warehouse
fragmentation: it focuses on the initial horizontal fragmentation of the dimensions’
XML documents and exploits two alternative algorithms [33].
Most relevant to our study is the research paper by Shahidul Islam Khan and Dr. A. S.
M. Latiful Hoque [24], published in 2010, which provides a fragmentation technique
that can be applied at the initial stage of the design of a distributed database
system. The authors propose a single algorithm that performs fragmentation and
allocation simultaneously, and they state that the technique can be used for the
initial fragmentation of a relational database in any distributed database system.
From the literature review we found this technique to be the most suitable to
implement according to the characteristics we were searching for, as shown in table-3
in chapter 5.
2.7.1 Database Fragmentation Technique by Shahidul Islam Khan and Dr. A. S. M. Latiful Hoque
This technique fragments a relation horizontally based on the locality precedence of
its attributes. “Attribute locality precedence (ALP) can be defined as the value of
importance of an attribute with respect to sites of distributed database” [24, p.2].
The following block diagram of their system depicts the fragmentation technique.
[Block diagram components: Relation, MCRUD Frequency Matrix, ALP Table, Predicate Set,
Fragmented Sub-Relation, Allocation]
Figure 3. Block diagram of the system [24]
The block diagram presents the working pattern of their technique in sequential form. First, the relation that needs to be fragmented is taken from the database. Then a modified CRUD (Create, Read, Update, Delete) frequency matrix, MCRUD, is created according to the predicates (queries) of the selected relation. “A data-to-location MCRUD matrix is a table of which rows indicate attributes of the entities of a relation and column indicate different locations of the applications” [24, p.2]. It is used by database designers and system analysts during requirement analysis to decide how data should be mapped to different locations [24]. We customized the existing Modified Create, Read, Update, and Delete (MCRUD) matrix according to our requirements and named it the Customized Insert, Select, Update, Delete (CISUD) matrix. The reason for customizing the MCRUD matrix into the CISUD matrix is to implement the technique practically in a real-time scenario. The MCRUD technique provides an algorithm and pseudo code that calculate the total ALP value over all three sites. We customized and extended this technique by calculating the ALP value for individual sites and by providing an architecture to implement the technique in practice.
2.7.1.1 Fragmentation Allocation algorithm
The algorithm generates the ALP (Attribute Locality Precedence) table, i.e. it calculates the importance of an attribute at a particular location. An overview of the fragmentation allocation algorithm is given in figure-4. The inputs of the algorithm are the total number of sites, the relation of the database that needs to be fragmented and the CISUD matrix of that relation; the output is the set of fragments F1, F2, F3, etc. In step 1, the ALP table is constructed from the ISUD matrix on the basis of the cost functions. In step 2, a predicate set is created for the attribute with the highest value in the ALP table; this set is then rearranged and used to fragment the relation, and the fragments are allocated to the different sites.
Figure 4: Algorithm for Fragmentation [24]
Figure-4 reproduces the algorithm for fragmentation as expressed in [24]; the pseudo code for the construction of the ALP (Attribute Locality Precedence) table is also explained in [24]. Our contribution in this research work is a customization of the MCRUD (Modified Create, Read, Update, and Delete) technique.
“Input: Total number of sites: S = {S1, S2, …, Sn}
        Relation to be fragmented: R
        ISUD matrix: ISUD[R]
Output: Fragments F = {F1, F2, F3, …, Fn}
Step 1: Construct ALP[R] from ISUD[R] based on cost functions
Step 2: For the highest valued attribute of ALP table
        a. Generate predicate set P = {P1, P2, …, Pm}
        b. Rearrange P so that #P = #S
        c. Fragment R using P as selection predicate σ_p(R)
        d. Allocate F to S” [24].
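Steps 2c and 2d of the algorithm quoted above can be sketched in Python as follows. This is a minimal illustration, not the implementation of [24] or of this thesis: it assumes that step 2b has already left exactly one selection predicate per site, and the relation, the attribute name `branch`, the site names and the catch-all predicate for the last site are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def fragment_and_allocate(
    relation: List[Row], site_predicates: Dict[str, Predicate]
) -> Dict[str, List[Row]]:
    """Apply each site's selection predicate to R (step c) and store the
    resulting fragment under that site's name (step d)."""
    return {
        site: [row for row in relation if pred(row)]
        for site, pred in site_predicates.items()
    }


def is_complete_and_disjoint(
    relation: List[Row], site_predicates: Dict[str, Predicate]
) -> bool:
    """Check that every row satisfies exactly one predicate, so that no row
    is lost and no row is stored twice."""
    return all(
        sum(1 for pred in site_predicates.values() if pred(row)) == 1
        for row in relation
    )


# Hypothetical relation and predicates, for illustration only.
relation = [
    {"id": 1, "branch": "A"},
    {"id": 2, "branch": "B"},
    {"id": 3, "branch": "A"},
    {"id": 4, "branch": "C"},
]
site_predicates = {
    "S1": lambda row: row["branch"] == "A",
    "S2": lambda row: row["branch"] == "B",
    # Assumed catch-all so that the number of predicates equals the number of sites.
    "S3": lambda row: row["branch"] not in ("A", "B"),
}

fragments = fragment_and_allocate(relation, site_predicates)
for site, rows in fragments.items():
    print(site, [row["id"] for row in rows])
```

The completeness check is not part of the quoted algorithm; it is included here only because a horizontal fragmentation is normally expected to cover every row exactly once.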
The pseudo code has two parts: its input is the CISUD (Customized Insert, Select, Update, Delete) matrix of the relation that needs to be fragmented, and its output is the ALP table of that relation. It uses nested for loops over the attributes, predicates, sites and applications to calculate the cost of each attribute, i.e. the ALP of the relation [24].
Figure 5. ALP-table-construction pseudo code [24]
Input: ISUD of a relation that is to be fragmented
Output: ALP table for that relation
for (i = 1; i <= TotalAttributes; i++)
{
    for (j = 1; j <= TotalPredicates[i]; j++)
    {
        MAX[i][j] = 0;
        for (k = 1; k <= TotalSites; k++)
        {
            S[i][j][k] = 0;
            /* sum of all applications' cost of predicate j of attribute i at site k */
            for (r = 1; r <= TotalApplications[k]; r++)
            {
                C[i][j][k][r] = fc*C + fr*R + fu*U + fd*D;
                S[i][j][k] += C[i][j][k][r];
            } // end of fourth loop
            /* find out at which site the cost of predicate j is maximum */
            if (S[i][j][k] > MAX[i][j])
            {
                MAX[i][j] = S[i][j][k];
                POS[i][j] = k;
            }
        } // end of third loop
        /* sum the cost of predicate j at every site other than POS[i][j] */
        SumOther = 0;
        for (r = 1; r <= TotalSites; r++)
        {
            if (r != POS[i][j])
                SumOther += S[i][j][r];
        }
        /* actual cost for predicate j of attribute i */
        ALPsingle[i][j] = S[i][j][POS[i][j]] - SumOther;
    } // end of second loop
    ALP[i] = 0;
    /* total cost for attribute i (locality precedence) */
    for (j = 1; j <= TotalPredicates[i]; j++)
    {
        ALP[i] += ALPsingle[i][j];
    }
} // end of first loop
The figure above shows the pseudo code of the fragmentation allocation algorithm given in [24]. We chose to test this algorithm against our requirements and to fragment the database accordingly.
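The ALP table construction can also be sketched in Python. This is a minimal sketch under stated assumptions rather than the implementation used in [24] or in this thesis: the nested dictionary layout of the ISUD matrix, the attribute and predicate names and all frequencies are invented for illustration, and the weights are the constants I = 2, S = 4, U = 3, D = 1 quoted in section 2.7.1.2.

```python
from typing import Dict, List, Tuple

# Weights quoted in this thesis for insert, select, update and delete.
WEIGHTS = {"I": 2, "S": 4, "U": 3, "D": 1}

# Assumed layout: isud[attribute][predicate][site] is a list holding one
# (f_insert, f_select, f_update, f_delete) tuple per application at that site.
Freq = Tuple[int, int, int, int]
IsudMatrix = Dict[str, Dict[str, Dict[str, List[Freq]]]]


def application_cost(freq: Freq) -> int:
    """Cost of one application: weighted sum of its four operation frequencies."""
    fi, fs, fu, fd = freq
    return fi * WEIGHTS["I"] + fs * WEIGHTS["S"] + fu * WEIGHTS["U"] + fd * WEIGHTS["D"]


def alp_single(sites: Dict[str, List[Freq]]) -> Tuple[int, str]:
    """For one predicate, return (cost at the maximum-cost site minus the
    cost at all other sites, name of the maximum-cost site)."""
    site_cost = {
        site: sum(application_cost(f) for f in apps) for site, apps in sites.items()
    }
    best_site = max(site_cost, key=site_cost.get)
    others = sum(cost for site, cost in site_cost.items() if site != best_site)
    return site_cost[best_site] - others, best_site


def alp_table(isud: IsudMatrix) -> Dict[str, int]:
    """Total locality precedence of each attribute: sum over its predicates."""
    return {
        attr: sum(alp_single(sites)[0] for sites in predicates.values())
        for attr, predicates in isud.items()
    }


# Made-up frequencies, one application per site, for illustration only.
example: IsudMatrix = {
    "branch": {
        "branch = 'A'": {"site1": [(2, 5, 1, 0)], "site2": [(0, 2, 0, 0)], "site3": [(1, 1, 0, 1)]},
        "branch = 'B'": {"site1": [(0, 1, 0, 0)], "site2": [(3, 4, 2, 1)], "site3": [(0, 0, 0, 0)]},
    },
    "salary": {
        "salary < 1000": {"site1": [(1, 2, 0, 0)], "site2": [(1, 2, 0, 0)], "site3": [(1, 2, 0, 0)]},
    },
}

print(alp_table(example))  # {'branch': 37, 'salary': -10}
```

In this made-up example the attribute `branch` has the highest precedence because each of its predicates is accessed mostly from a single site, whereas `salary` is accessed equally from all three sites and therefore gets a negative value.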
2.7.1.2 Mathematical Measurement of the algorithm
To execute the algorithm, [24] uses a set of mathematical formulas that calculate the cost of ALP and that can also be used to test the algorithm under different operational changes. Using these linear combination equations, we test the algorithm with different frequencies retrieved from the customized ISUD matrix table. Cost is treated as the effort of access and modification of a certain attribute of a relation by an application from a particular site [24]. To calculate the precedence of an attribute of a relation, we take the CISUD matrix of the relation as input together with the following cost functions. Equation (1) calculates the cost of one application as the weighted sum of its operation frequencies, equation (2) calculates the total cost of all applications at a particular site, equation (3) gives the maximum cost among the sites for predicate j of attribute i, equation (4) gives the actual cost for predicate j of attribute i, and equation (5) calculates the total cost of attribute i (i.e. its locality precedence) [24]. All of the following equations are implemented in the code of the user interface application. The customized ISUD frequencies are retrieved automatically from the CISUD matrix table with the help of the user interface.
$$C_{i,j,k,r} = f_i I + f_s S + f_u U + f_d D \tag{1}$$
$$S_{i,j,k} = \sum_{r=1}^{A_{i,j,k}} C_{i,j,k,r} \tag{2}$$
$$S_{i,j,m} = \max_{k} \left( S_{i,j,k} \right) \tag{3}$$
$$ALP_{ij} = S_{i,j,m} - \sum_{k \neq m} S_{i,j,k} \tag{4}$$
$$ALP_{i} = \sum_{j=1}^{l} ALP_{ij} \tag{5}$$
Here $f_i$ = frequency of Insert operation
$f_s$ = frequency of Select operation
$f_u$ = frequency of Update operation
$f_d$ = frequency of Delete operation
$I$ = weight of Insert operation
$S$ = weight of Select operation
$U$ = weight of Update operation
$D$ = weight of Delete operation
$C_{i,j,k,r}$ = cost of predicate j of attribute i accessed by application r at site k
$A_{i,j,k}$ = number of applications accessing predicate j of attribute i at site k (TotalApplications in Figure 5)
$S_{i,j,k}$ = sum of all applications’ cost of predicate j of attribute i at site k
$S_{i,j,m}$ = maximum cost among the sites for predicate j of attribute i
$ALP_{ij}$ = actual cost for predicate j of attribute i
$l$ = number of predicates of attribute i (TotalPredicates in Figure 5)
$ALP_{i}$ = total cost of attribute i (locality precedence)
Using the above functions, the designer can calculate the actual cost (ALP) of a particular attribute. Generally, the update operation is assumed to incur a higher cost than the other operations, and the delete operation the lowest cost, across the different sites of the applications. The constant weights given to the ISUD operations are I = 2 for insert, S = 4 for select, U = 3 for update and D = 1 for delete. The justification for using constant weights is that, at the design time of a DDB, the designer does not know the frequencies with which Insert, Select, Update and Delete operations on a particular attribute will occur at the different sites.
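As a worked illustration of the cost functions with these weights, consider one predicate j of one attribute i that is accessed by a single application at each of three sites. The frequencies used below are invented for illustration and are not taken from [24] or from the case study; with one application per site, the site cost of equation (2) equals the application cost of equation (1).

```latex
\begin{aligned}
S_{i,j,1} &= 2\cdot 2 + 5\cdot 4 + 1\cdot 3 + 0\cdot 1 = 27 \\
S_{i,j,2} &= 0\cdot 2 + 2\cdot 4 + 0\cdot 3 + 0\cdot 1 = 8 \\
S_{i,j,3} &= 1\cdot 2 + 1\cdot 4 + 0\cdot 3 + 1\cdot 1 = 7 \\
S_{i,j,m} &= \max(27,\, 8,\, 7) = 27 \qquad (m = 1) \\
ALP_{ij}  &= 27 - (8 + 7) = 12
\end{aligned}
```

A positive value such as this one indicates that the predicate is accessed mainly from one site (here site 1); if the accesses were spread evenly over the sites, the subtraction in equation (4) would drive the value towards zero or below.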
The Airline's Reservation System database example [18], used in the following section, illustrates the mechanism of data fragmentation and allocation over a distributed environment in a real-time scenario, to give a better understanding from the implementation point of view.
2.8 Generic Five Steps for Data Fragmentation and
Allocation in Distributed Database Systems
The five steps method is a systematic approach to constructing the data allocation with respect to fragments in a distributed database environment [18]. One objective of presenting these steps is to give an overview of a concrete example from the literature, to show readers how data fragmentation is possible in the real world. The steps are taken from [18], which explains distributed data fragmentation and the allocation of data at various sites. We relate them to our own research work, which we present in a later part of this report. The steps are as follows [18].
Step 1: Collect Existing Global Relations
Step 2: Analyse Frequently Asked Queries (FAQs)
Step 3: Set Data Allocation Objectives
Step 4: Transform Global Relations into Fragment Relations
Step 5: Allocate Fragment Relations to Sites
In our research work we adopted some of the steps of the generic five steps approach for data fragmentation and allocation in a distributed environment, to show readers how data fragmentation can be performed in the real world.
2.8.1 Collection of Global Relations
The first step of the five steps method for data allocation in distributed database systems is the collection of global relations. The relations correspond to the tables of the database system. The design of global relations follows the procedures of entity-relationship modelling and normalization [18]. We apply this step to our case study (Bharat Transport Services) in a real-time setting, so that we can illustrate how data is allocated by fragmenting, replicating and distributing it [18].
2.8.2 Frequently Asked Queries (FAQs)
The second step of this approach analyses the frequently asked queries of the end-users. In the airline's reservation example [18], the FAQs of the airline's reservation system are classified into various categories, see [18]. This classification helps users understand what type of data could be retrieved at the different sites of the distributed database. We correlated our case study (Bharat Transport Services) with this classification of frequently asked queries at different sites, based on the end-users' demands and needs. A user query can be executed as an SQL query to retrieve data from different relations. The answer to this SQL query can be retrieved at any destination site.
2.8.3 Data Allocation Goals
The third step of the five steps approach sets the data allocation goals in the distributed environment. These goals are to increase the availability and reliability of the data for end-user queries at different sites and to reduce the communication cost of data transfer over the distributed environment. This step also highlights the importance of storage cost and emphasizes how the generic steps contribute to reducing it.
This step also exploits parallelism by using the resources of other sites at the time of query processing whenever possible. For this purpose, data replication is the technique of choice for achieving the data allocation goals [18]. We apply the data allocation step to our case study (Bharat Transport Services) in a later section, to achieve the data fragmentation objectives of our research work.
The first step describes the relations in the database of Bharat Transport Service, which is explained in detail in section 1.1.2. The second step explains the results retrieved from the database of Bharat Transport Service in response to end-user queries. The third step sets the data allocation objectives: increase availability and reliability, minimize communication cost and minimize storage cost in the distributed environment.
The fourth step shows how relations can be transformed into fragment relations at a single site with the help of a data fragmentation technique, e.g. horizontal data fragmentation. The fifth step explains how the fragmented data is distributed, i.e. how fragment relations are allocated to different sites. The last three steps are outside the scope of our research work, so we only give the reader an overview of the allocation part, using the example given in [18], since it is connected with the first two steps.
The generic five steps for data fragmentation and allocation are described in detail and in a structured manner in [18], which explains how fragment relations are allocated over different sites concurrently. In this thesis work, we chose the data fragmentation technique of [24], whose implementation we explain in the following parts of this report.
3 Research Method
In this chapter, we describe the research methods adopted in this thesis. The research methods are categorized into three levels: a high level research method for data inquiry, a low level method for research design and a low level method for implementation. The high level research method is used for data inquiry during system requirement assessment with the domain experts and knowledge mentors. Of the two low level methods, one is used for conducting the overall research design of this thesis work and the other (Design Science Research method) is used for the development and implementation of the 5-layer architecture for database fragmentation in a distributed environment in a constructive and systematic way.
The purpose of a research methodology is to support a diligent, rigorous and systematic process of investigating a specific problem, in order to describe effective solutions and to develop and test explanatory concepts, theories and applications [16]. Figure-6 describes the abstraction of the research design method.
[Diagram; boxes: Domain's Contextual Problem; Research Design Process (High Level Research Method for Data Inquiry, Low Level Method for Research Design, Low Level Method for Implementation (DSR)); Optimal Solution to Domain's Contextual Problems]
Figure 6: Research Design Method [7]
3.1 Categories of Research Methods
Three research methods are used in this thesis work to investigate the problems at different levels and to reach an effective and optimal solution in a systematic way.
1. High Level Research Method for Data Inquiry
2. Low Level Method for Research Design
3. Low Level Method for Implementation (Design Science Research Methodology)
These three methodologies are used in our research work, respectively, to collect data from the domain experts, to conduct the overall design of our research work and to implement the proposed technique identified from the literature review. Each is explained in detail in the following sections.
3.2 High Level Research Method for Data Inquiry
For information collection, we chose the following high level method for acquiring information from the domain experts.
1. Meeting Session
The meeting session is a high level method that is very useful for acquiring information from domain experts. We held a meeting session at the start of our research work with our supervisor and the knowledge mentors at the Swedish Board of Agriculture, to grasp the problem context and to focus on what we wanted to achieve by the end of this thesis. The primary agenda of this meeting session was to discuss the problems of data fragmentation in a distributed database environment identified in the literature review and to specify the scope of the work. The meeting session with our supervisor and knowledge mentors was motivating and helpful for understanding the data fragmentation and allocation problems in a distributed environment and how we could focus on achieving optimal results.
3.3 Low Level Method for Research Design
We chose the constructive research method for designing the research work in a systematic way.
3.3.1 Constructive Research
Constructive research is considered one of the most popular methodologies for designing research, because it supports problem solving by selecting and combining previously learned theories, procedures, declarative knowledge and cognitive strategies to solve unknown problems in a specific subject area [6][7]. We derived certain steps from [6] that are useful in our research work.
3.3.1.1 Constructive Research Steps
The following steps are necessary to conduct constructive research [15][16]:
Step 1: Prepare the case study according to the discussion in the meeting session with the help of domain experts
Step 2: Define the domain problems
Step 3: Define the scope of the domain’s problem
Step 4: Develop and describe the design for solutions
Step 5: Deploy proposed solution for implementation and testing
Step 6: Evaluate scope of the solution with knowledge mentors
Step 7: Refine the design structure of the solution after getting feedback from domain users and domain experts
3.3.2 Phases of Constructive Research
The phases of constructive research are described below and summarized in figure-7.
3.3.2.1 Preparing Case Study
The purpose of the case study is to give a thorough understanding of the domain’s problems as a basis for quality work. To prepare the case study, it is very important to understand the contextual knowledge of the domain, in order to analyze the nature of the problems and to propose effective solutions in the various contexts.
3.3.2.2 Define the domain problems
To conduct good research, it is important to define the domain problems so that the researchers can take initiatives to address them.
3.3.2.3 Define the scope of the domain’s problem
It is necessary to mark the boundaries of the domain’s area of concentration in order to find an optimal, efficient solution. This helps to control the usage of the resources that are spent on the domain’s problem [16].
3.3.2.4 Develop and describe the design for solutions
This phase describes the designs of the solutions that address the problems. At this stage we developed the design models that illustrate the problem domain [16].
3.3.2.5 Deploy proposed solution for implementation and testing
This phase covers the development strategies, in terms of the design models, for addressing the problems in the given context of the organization. It also covers testing, to ensure that the prototype fulfils the requirements of the domain’s users [16].
3.3.2.6 Evaluate scope of the solution with knowledge mentors
This phase evaluates the scope defined in the research design process. At this stage, the scope of the solution was evaluated with the help of domain experts, who evaluate the design model by running different queries against the prototype according to specific requirements.
3.3.2.7 Refine the design structure of the solution after getting feedback from domain users and domain experts
This phase refines the design structure of the solution after feedback has been obtained from the domain’s users and the domain experts. It gives the researcher the opportunity to improve the design of the model in future work.
[Diagram; participants: Domain Experts & Researcher; labels: Prepare Case Study, Construct Case Study, Observe, Define the domain problems, Define, Define the scope of the problem’s domain, Design, Develop and describe the design for solutions, Deploy solution, Deploy proposed solution for implementation, Evaluate Scope, Evaluate the scope of the solution, Feedback, Feedback after evaluation, Refine the Design Structure]
Figure 7: Constructive Research Methodology for Research Design
3.4 Low Level Design Research Methodology (DSR) for Implementation
Design Science Research (DSR) methodology is considered one of the promising methods for conducting systematic design research in various science disciplines or in a developing industry. DSR has contributed to natural science research and generally proposes four outputs of design science research: 1) constructs, 2) models, 3) methods, 4) instantiations [22]. The methodology consists of several steps that guide practitioners and researchers in making design decisions rationally. We use it for the implementation of the defined algorithm. The following steps are explained in the context of our research questions, i.e. from the point of view of problem awareness, after which the implementation of the suggested solutions is presented. The testing of the algorithm is shown in the development and evaluation steps.
3.4.1 Steps of the Design Science Research Method (DSR)
The steps of the design science research method (DSR) are as follows:
1. Awareness of the problem
2. Suggestion
3. Development
4. Evaluation
5. Conclusion
These steps are shown in the diagram in figure-8.
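[Diagram; columns: Knowledge Flows, Process Steps, Outputs. Process steps: Awareness of the Problem, Suggestion, Development, Evaluation, Conclusion. Outputs: Proposal, Tentative Design, Artifact, Performance Measures, Results. Knowledge flows: Circumscription, Operation and Goal Knowledge]
Figure 8: The General Methodology of Design Science Research [22]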
3.4.1.1 Awareness of the Problem
Awareness of the problem may come from various information channels, such as new developments in industry or in a reference discipline [22]. The output of this phase is a proposal, formal or informal, for a new research effort, which encourages researchers and practitioners to give serious input to understanding the context of the problem in different domains [22]. In this research work, we obtained awareness of the problem from the knowledge mentors of the Swedish Board of Agriculture (details of the case study can be found in section 1.2.1) and from the literature review [24], which assesses the data fragmentation and allocation problem.
3.4.1.2 Suggestion
The suggestion phase follows immediately behind the proposal and builds on the awareness of the problem [22]. In any formal proposal for design science research (DSR), a tentative design is an integral part of the proposal. “Tentative design is an essentially creative step wherein new functionality is envisioned based on a novel configuration of either existing or new and existing elements. There are different approaches to address the problems of software system complexity. Some of the alternatives that were discarded included development of a new software development methodology specifically focused on operation support systems, automation of the maintenance function, and development of a high-level programming environment” [22].
Suggestion is an essentially creative step wherein new functionality is envisioned based on a novel configuration of either existing or new elements [22]. In the suggestion phase, we included some creative steps after analysing the literature [24] on the problem of fragmentation and allocation in distributed database architecture; these were verified by applying and testing the ISUD matrix technique on the Bharat Transport Services case.
3.4.1.3 Development
In the development phase, the tentative design is further developed and implemented. Elaborating the tentative design into a complete design requires creative effort. The mechanisms for developing and implementing the techniques vary depending on the artifacts to be constructed. For a formal proof, an algorithm may be required for the construction of the technique [22].
In the development phase, we proposed a 5-layer architecture to describe the pattern of horizontal fragmentation, which can be seen in section 4.2. The development activities involved in the various phases are described in the following steps.
Step 1: We chose the small scale relational database of Bharat Transport Service, mentioned in section 1.2.2, which represents the real-time scenario in the initial phase of implementation.
Step 2: The database designer designs the ISUD (insert, select, update, and delete) matrix with the help of the cost functions in the distributed environment. This ISUD matrix, with its cost functions, is used to test the algorithm. In a real-time scenario, the variation in the algorithm can be checked with various approaches, as described in section 4.2.
Step 3: The attribute locality precedence (ALP) table, which gives the value of importance of an attribute with respect to the sites of the distributed database, is generated for each relation by running the algorithm on the ISUD matrix.
Step 4: The predicate set (P) is defined for each relation. It is generated for the attribute with the highest precedence value in the ALP table and defines the behavior of the information retrieved from the relation.
Step 5: According to the highest valued attribute of the ALP table, the relation (R) is fragmented using the predicates (P) as selection predicates.
Step 6: The fragmented data is allocated, according to the predicate or query, over the various sites (S) in the distributed environment.
3.4.1.4 Evaluation
In the evaluation phase, the artifact is evaluated according to criteria that are always implicit and frequently made explicit in the proposal or awareness of the problem phase [22]. The results and additional information gained in the construction and running of the artifact are brought together and fed back into another round of suggestion [22]. The evaluation phase emphasizes measuring the performance of the algorithm or design technique so that the results can be judged in different ways. These ways are defined in the proposal or awareness of the problem phase. We have provided a prototype which demonstrates the horizontal fragmentation.
The evaluation is carried out by executing the algorithm, which demonstrates data fragmentation and allocation over different sites in the defined distributed environment of the real-time scenario.
1. The evaluation can be verified through changes in the frequencies of the ISUD (Insert, Select, Update, Delete) table, which are defined in the implementation part. It can also be assessed by the domain users, who run different transactional queries with different frequencies against the algorithm.
2. The algorithm is tested on the basis of changes in the frequencies. A detailed explanation is given in the implementation part of the proposed model in section 4.2.
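One way to picture testing by changing frequencies is the following Python sketch. It is an illustration under stated assumptions, not the test procedure of this thesis: it handles a single predicate with one application per site, uses the constant weights I = 2, S = 4, U = 3, D = 1 from section 2.7.1.2, and all frequencies and site names are invented. Because the cost function is a linear combination of the frequencies, multiplying every frequency by the same factor multiplies the resulting value by that factor, while changing the frequencies at one site can move the maximum-cost site.

```python
from typing import Dict, Tuple

# Constant weights for insert, select, update and delete, in that order.
WEIGHTS = (2, 4, 3, 1)
Freq = Tuple[int, int, int, int]


def site_cost(freq: Freq) -> int:
    """Weighted sum of the four operation frequencies of one application."""
    return sum(f * w for f, w in zip(freq, WEIGHTS))


def alp_single(sites: Dict[str, Freq]) -> Tuple[int, str]:
    """Return (maximum site cost minus the cost of all other sites, best site)."""
    costs = {site: site_cost(freq) for site, freq in sites.items()}
    best = max(costs, key=costs.get)
    return costs[best] - sum(c for s, c in costs.items() if s != best), best


# Hypothetical frequencies for one predicate at three sites.
base = {"site1": (2, 5, 1, 0), "site2": (0, 2, 0, 0), "site3": (1, 1, 0, 1)}

# Every frequency multiplied by 3: the value scales by 3, the site is unchanged.
tripled = {site: tuple(3 * f for f in freq) for site, freq in base.items()}

# Select frequency at site2 raised from 2 to 10: the preferred site changes.
shifted = dict(base, site2=(0, 10, 0, 0))

print(alp_single(base))     # (12, 'site1')
print(alp_single(tripled))  # (36, 'site1')
print(alp_single(shifted))  # (6, 'site2')
```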
3.4.1.5 Conclusion
The conclusion phase is the final stage of a specific research effort. The results address the data fragmentation problems. The main contribution of the conclusion is to establish that the results defined in the purpose or objective of the proposal have been achieved. After the evaluation by the domain experts and knowledge mentors, we conclude that the results are authentic and correspond to the purpose of this thesis.
4 Results
In this chapter, we present the results in two categories, theoretical results and practical results, which address the research questions in our discussion and focus on achieving the objective of this research work. The results presented in the following sections are based on the purpose of our research work explained in section 1.3. We used constructive research to justify the theoretical findings in the form of a comparison framework from the literature review. We chose the design science research method (DSR) to achieve the practical results in a systematic way. To develop the practical results, we used Microsoft Visual Studio 2008 with C#.Net for the execution of our proposed CISUD matrix technique. These results, and how we achieved them, are explained in the following parts.
One of the significant contributions of our practical results is to test the linearity of the algorithm. We used the proposed customized ISUD matrix technique, which helps to test the algorithm on the basis of the frequencies of particular attributes in a distributed environment.
4.1 Theoretical Results
In this section, we answer the first research question, “What algorithms do exist in order to uniformly fragment the relations in a distributed database?”, by means of a comparative study framework of the techniques that different researchers have proposed to support data fragmentation. The characteristics listed in table-3 highlight the importance of these techniques as found in an extensive literature review, and the table also assesses the methods used by the different researchers. The comparative study covers the techniques, algorithms, approaches and methods used to fragment and allocate data over a distributed database environment.
| Characteristics | Cheng et al. (2002) | Baioo et al. (2004) | H. Ma, K. D et al. (2006) | H. Ma, K. D et al. (2007) | Marwa et al. (2008) | Mahboubi H. and Darmont J. (2009) | Dr. A. S. M. Latiful Hoque (2010) | Customized ISUD Tech. |
|---|---|---|---|---|---|---|---|---|
| Distributed Database designing at initial stage for partitioning the relations | No | No | No | No | No | Yes | Yes | Yes |
| Horizontal Fragmentation algorithm | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes |
| Affinity matrix to build a predicate | No | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Relational Database | Yes | No | Yes | Yes | No | No | Yes | Yes |
| Complexity of the Technique | Yes | Yes | Yes | Yes | Yes | Yes | No | No |
| Allocation decision support | No | No | Yes | Yes | Yes | No | No | Yes |
| Performance | - | - | - | - | - | - | Yes | Yes |
| Efficiency | - | - | - | - | - | - | Yes | Yes |
Table 3: Comparison Framework of different techniques with respect to key characteristics
The characteristics above are explained in detail in the respective research articles and are presented in two-dimensional form in table-3. The comparison framework covers the techniques, algorithms, methods and approaches described by the various researchers. On the basis of this framework, we analyzed the precedence and credibility of the characteristics in the work of the various authors on data fragmentation in a distributed environment and then judged which specific approach, method or algorithmic technique is suitable for addressing the questions defined in section 1.3.
Table-3 gives a holistic view of the following characteristics: Distributed Database designing at initial stage for partitioning the relations, Horizontal Fragmentation algorithm, Affinity matrix to build a predicate, Relational Database, Complexity of the Technique, Allocation decision support, Performance, and Efficiency, which are explained in detail by the various authors in their respective publications. If a characteristic is present in the respective research paper we have marked it “Yes”, and if it is not present we have marked it “No” in table-3. Based on this comparison, we selected the technique of Latiful et al. for implementation, as we found it to be more efficient and easier to implement. These characteristics fully support our proposed customized ISUD matrix technique, which is a modified version of the technique of Latiful et al., in addressing data fragmentation in a distributed environment efficiently. After analysing this framework, we are confident in our answer to the question: what algorithms do exist in order to uniformly fragment the relations in a distributed database?
4.2 Practical Results
This section explains the practical results and highlights how the customized ISUD matrix technique was developed and how we achieved these results after the practical implementation, which is explained in section 4.2.1. These results address the second research question:
“How to design the architecture of designated algorithm from Q1?
How to implement and test the proposed algorithmic approach?”
4.2.1 Proposed 5-Layer Architecture
Before implementing the distributed database, and after detailed conversations with the domain experts and knowledge mentors, we assessed and analyzed the nature of the domain problems to gain a good understanding. This section therefore addresses
the part of our second research question i.e. “How to design the architecture of
designated algorithm from Q1?”. We have chosen the top-down approach, explained in
detail in section 2.5, for designing the distributed database architecture in our
research work. The top-down approach lets database designers build a homogeneous
distributed database system from scratch and share information at different levels in
an incremental style. We have also related the top-down approach to the five generic
steps for data fragmentation and allocation explained in detail in section 2.8. This
five-step scheme is the starting point for the 5-layer architecture that we developed
for data fragmentation and allocation in a distributed database system.
Here, we had to decide how to solve the fragmentation problem. We chose the technique
of [24], explained in section 2.7.1, as the foundation for database fragmentation and
customized it for data fragmentation in a distributed environment. The attribute
locality precedence (ALP) table can be designed and developed by the database designer
for each relation of a specific database system in a distributed environment. This can
be done at database design time with the help of the modified ISUD (Insert, Select,
Update, and Delete) matrix and the cost functions explained in section 2.7.1.2. The
architecture is shown as a block diagram in Figure-9 and consists of five layers: the
Application Layer, Database Layer, Mediator Layer, Fragmentation Layer, and
Allocation Layer.
This architecture gives a holistic view that explains the functionalities of the different
layers which are useful for fragmentation of the database in the distributed
environment.
The Application layer provides a generic overview of the different sites and of the
specific application that runs on them. It also provides the communication between the
user interface and the backend database repository.
The Database layer provides an object view of the database and organizes the data by
applying database operations such as DDL (data definition language) and DML (data
manipulation language).
The Mediator layer (user interface) serves as middleware connecting the database layer
and the fragmentation layer. It provides the overall functionality of the algorithmic
approach in our research work: it lets end-users retrieve the ALP table and the
individual ALP table, and it presents these data in graphical form for better analysis
and understanding.
The Fragmentation layer is responsible for deciding how to fragment the relation on
the basis of the highest attribute values retrieved from the mediator layer (user
interface). This layer also communicates with the database administrator, who decides
how to fragment the relation across the different sites at the start of database design
in a distributed environment.
The Allocation layer allocates the fragmented data over the distributed sites. This
layer is excluded: its functionality is outside the scope of our research work.
[Figure 9 diagram: 1. Application Layer: sites S1, S2 and S3, each running App1.
2. Database Layer: database repository (ISUD matrix). 3. Mediator Layer: ISUD user
interface with the algorithmic approach (Get ALP Table, Get Individual ALP Table, Get
Graph). 4. Fragmentation Layer: decision process based on attribute value at different
sites, with the database administrator. 5. Allocation Layer: allocation at different
sites based on the decision process (excluded).]
Figure 9: 5-Layer Architecture for Proposed Fragmentation Technique
The functionalities of the above-mentioned layers, as they apply to our research work,
are discussed in detail in the following sections.
4.2.1.1 Application Layer
To test the proposed technique, we have chosen the case study of Bharat Transport
Service, which is explained in section 1.2.2. We have taken one of the applications
from the Bharat Transport Service software system, shown in figure-10. The
application layer contains three sites: S1, S2, and S3. For simplicity, we have chosen
the same application from the case study, named billing information, at each site. The
application lets end-users store and modify data through insertion, update, and
deletion in the database system of Bharat Transport Service. End users can use the
application and its functionality according to their needs from different sites in the
distributed environment. Each application has its own relation (table) in a local
database of Bharat Transport Service, with attributes that match the end-users'
requirements.
Figure 10: Application of a Case Study
We constructed our own set-up to evaluate the proposed technique. For testing, we
placed three sites on one local machine to avoid network connectivity problems over
the intranet or internet. We used three local drives, C, D and E, as the three
sites. The application from the case study, shown in figure-10, is installed on all
three drives: the C: drive is used as site1, the D: drive as site2 and the E: drive as
site3. All three sites share a common database, which is stored on drive C. The
application provides insert, update, delete and select operations that depend on a set
of predicates or conditions given by the database administrator. Whenever an end user
runs a query of the application from any site, the access record of the query is saved
in the ISUD matrix table in the database at site1 of the local machine.
4.2.1.2 Database Layer
The database layer consists of three sub-layers, described below. The relations shown
in figure-11 and figure-12 define the physical storage of data in the database
system.
4.2.1.2.1 Database of case study application
In this task, we have taken the already defined database of the case study application,
which was developed in MS Access. A relation in the database consists of attributes of
different types, which describe the properties of the relation according to the
requirements of the case study.
Figure 11: Database of Case Study Application
Figure-11 above shows the database of the case study application in MS Access, which
has various relations. To test the customized ISUD technique we use the “Description”
table, which can be seen in figure-11.
4.2.1.2.2 Constructing of Log Files for Customized ISUD Matrix
In this task, we create the customized ISUD (CISUD) matrix with the help of log-file
code embedded in the application code of the case study. The log-file code is the core
element for creating the CISUD matrix and the user interface. It is responsible for
saving each access from the different sites in the database, together with its attribute
name, attribute value, predicate name and time of access. The CISUD matrix table created
with the help of the log file is shown in figure-12.
4.2.1.2.3 Database for Customized ISUD User-Interface
A database for the customized ISUD user interface is created to hold the CISUD
information. The CISUD (Customized Insertion, Selection, Updation, Deletion) matrix
table can be built in any database management system; we have used MS Access. A
data-to-location CISUD matrix is a table in which rows indicate attributes of the
entities of a relation and columns indicate the different locations of the
applications [24]. The log file at each site is responsible for creating the entries of
the CISUD matrix table in the database for each end-user query, recording the site
name, attribute name, attribute value, predicate name and time of access. The log-file
code is shown in detail in section 8.2.
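As a minimal sketch of what one log entry might hold, the following Python fragment models an access record with the fields named above. It is not the log-file code of section 8.2: the class name `AccessRecord`, the function `log_access`, and the `operation` field (added because ISUD frequencies need the operation type) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical record layout; field names follow the items listed in the text.
@dataclass(frozen=True)
class AccessRecord:
    site: str              # site name, e.g. "S1"
    operation: str         # assumed field: "I", "S", "U" or "D"
    attribute: str         # attribute name, e.g. "BillNos"
    value: str             # attribute value used in the query
    predicate: str         # predicate name, e.g. "BillNos = 1"
    accessed_at: datetime  # time of access

def log_access(log, site, operation, attribute, value, accessed_at=None):
    """Append one access record to an in-memory log and return it."""
    if operation not in ("I", "S", "U", "D"):
        raise ValueError(f"unknown operation: {operation!r}")
    record = AccessRecord(
        site=site,
        operation=operation,
        attribute=attribute,
        value=str(value),
        predicate=f"{attribute} = {value}",
        accessed_at=accessed_at or datetime.now(),
    )
    log.append(record)
    return record

log = []
log_access(log, "S1", "S", "BillNos", 1)
log_access(log, "S2", "U", "BillNos", 2)
```

In the thesis set-up the records are written to an MS Access table rather than kept in a list; the list only keeps the sketch self-contained.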
Figure 12: CISUD matrix table
Figure-12 above depicts the construction of the CISUD matrix in the database in our
research. Starting from the existing technique [24], we have customized the ISUD
(Insert, Select, Update, and Delete) matrix technique according to the requirements of
the case study. The customized ISUD matrix is a table constructed by placing the
predicates of the attributes of a relation in the rows and the applications at the
sites of a DDBMS in the columns.
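The row and column layout described above can be sketched as a nested dictionary that counts operation frequencies per predicate (row) and site (column). This is an illustrative assumption about the data structure, not the MS Access table of figure-12; the sample records and the name `build_cisud_matrix` are hypothetical.

```python
from collections import defaultdict

# Hypothetical access records: (site, operation, predicate).
records = [
    ("S1", "S", "BillNos = 1"),
    ("S1", "U", "BillNos = 1"),
    ("S2", "S", "BillNos = 2"),
    ("S1", "S", "BillNos = 1"),
    ("S3", "I", "BillNos = 3"),
]

def build_cisud_matrix(records):
    """Count I/S/U/D frequencies per predicate (row) and site (column)."""
    matrix = defaultdict(
        lambda: defaultdict(lambda: {"I": 0, "S": 0, "U": 0, "D": 0})
    )
    for site, operation, predicate in records:
        matrix[predicate][site][operation] += 1
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {p: {s: dict(c) for s, c in cols.items()} for p, cols in matrix.items()}

cisud = build_cisud_matrix(records)
```

A cell such as `cisud["BillNos = 1"]["S1"]` then holds the four frequencies that the user interface reads as ISUD inputs.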
4.2.1.3 Mediator Layer (Algorithmic Approach)
The mediator layer is the core of the proposed technique, because this layer contains
the novel contribution of our research work. The user interface of the CISUD
application is designed and implemented in the mediator layer. The mediator layer takes
the CISUD matrix table from the database layer as input. The general algorithmic
approach of our technique is as follows.
Algorithmic Approach
1. Input:
   1a. Total number of sites: S = {S1, S2, …, Sn}; relation to be fragmented: R
   1b. Select the attribute and its value
   1c. Select ISUD (Insert, Select, Update, Delete) frequencies from the CISUD matrix table: ISUD[R]
2. Output:
   2a. Total ALP value
   2b. Individual ALP value
   2c. Fragments F = {F1, F2, F3, …, Fn}
   2d. Graphical representation of ALP values
3. Construct ALP[R] from ISUD[R] based on cost functions
4. For the highest valued attribute of the ALP table, select the individual ALP value, then
   4a. Generate predicate set P = {P1, P2, …, Pm}
   4b. Rearrange P so that #P = #S
   4c. Fragment R using P as selection predicate: $\sigma_{p}(R)$
   4d. Allocate F to S.
The pseudo code of the above algorithm can be seen in section 8.3.
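Step 3 of the approach can be illustrated with a small Python sketch. The cost functions of section 2.7.1.2 are not reproduced here, so the sketch substitutes a plain weighted sum of the four frequencies; the weights in `WEIGHTS`, the sample frequencies and the function names are all assumptions, and the ALP computation in [24] may differ.

```python
# Assumed operation weights; stand-ins for the cost functions of section 2.7.1.2.
WEIGHTS = {"I": 2, "S": 1, "U": 3, "D": 2}

# ISUD[R]: attribute -> predicate -> site -> operation frequencies (made-up numbers).
isud = {
    "BillNos": {
        "BillNos = 1": {"S1": {"I": 1, "S": 4, "U": 1, "D": 0}},
        "BillNos = 2": {"S2": {"I": 0, "S": 3, "U": 0, "D": 1}},
    },
    "NameofReceipent": {
        "NameofReceipent = 'A'": {"S1": {"I": 0, "S": 2, "U": 0, "D": 0}},
    },
}

def access_cost(freq):
    """Weighted cost of one matrix cell (one predicate at one site)."""
    return sum(WEIGHTS[op] * count for op, count in freq.items())

def build_alp(isud):
    """Total ALP per attribute: sum of cell costs over its predicates and sites."""
    return {
        attribute: sum(
            access_cost(freq)
            for sites in predicates.values()
            for freq in sites.values()
        )
        for attribute, predicates in isud.items()
    }

alp = build_alp(isud)
highest = max(alp, key=alp.get)  # attribute chosen for fragmentation in step 4
```

With these invented numbers the attribute `BillNos` has the highest total and would be carried into step 4.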
4.2.1.3.1 Designing the User Interface for Customized ISUD Application
The user interface of the ISUD application is developed as a C#.Net Windows
application. It calculates the precedence of attributes according to the given
predicate, called ALP (attribute locality precedence), with the help of the existing
algorithm defined in [24]. The customized ISUD table is the input to the user
interface. The user interface also calculates the frequencies of ISUD (Insert, Select,
Update, and Delete) and generates an ALP table, which shows the attribute with the
highest precedence value; this attribute is then treated as the most important
attribute for fragmentation. The user interface lets end-users retrieve the ALP table
according to a set of predicates and the individual ALP table according to the site. It
also provides a graphical representation of the ALP table results, which is used to
test the algorithm. The user interface for the customized ISUD application is shown in
figure-13.
Figure 13: User Interface for CISUD Application.
4.2.1.4 Fragmentation Layer
In the fragmentation layer, the decision on horizontal fragmentation is taken by the
database administrator, based on the attribute locality precedence (ALP) generated by
the user interface. A predicate set is used to take the fragmentation decision after
the ALP table values are retrieved. The predicate set is constructed from the highest
individual ALP table values per site for each relation of the case study application.
The selected predicate set is the starting point for horizontal fragmentation of each
relation of the case study application. The rest of this section explains how the
predicate set is constructed with the help of the ALP table.
4.2.1.4.1 Constructing Predicates Set
The predicate set is constructed from two things: the highest individual ALP values
and the total ALP values. The highest individual ALP describes the importance of the
predicate, or attribute value, at an individual site, as shown in figure-14. The total
ALP gives the total attribute values from all sites. After obtaining the ALP values of
the relation with the help of the user interface, the end user has the flexibility to
construct the predicate set at an individual site. The end user or the database
administrator can decide to fragment the data horizontally at a particular site on the
basis of the predicate set. The following figures show how to set and get the
predicate set value for each site.
Figure 14: Interface for setting and getting the Predicate set for individual
highest attribute.
Figure-14 above shows the interface for getting the values of the predicate set
individually for the selected sites. This interface was built as a C#.Net Windows
application.
Figure 15: Predicate Set for Highest Attribute Precedence at individual site
P = {site1: BillNos = 1, site2: BillNos = 2, site3: BillNos = 3}
Using figure-15 above, we have constructed the predicate set P, which describes the
importance of the attribute values at the different sites for taking the fragmentation
decision at a particular site according to the highest ALP value. The blue lines in
figure-15 show the highest predicate value of a particular attribute at a particular
site.
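To make the use of P concrete, the following sketch applies the predicate set above as selection predicates on a small in-memory relation, producing one horizontal fragment per site. The rows and the `Amount` column are invented for illustration; only the attribute `BillNos` and the predicate set come from the text.

```python
# Hypothetical rows of the relation; only BillNos comes from the text.
relation = [
    {"BillNos": 1, "Amount": 120},
    {"BillNos": 2, "Amount": 75},
    {"BillNos": 3, "Amount": 40},
    {"BillNos": 1, "Amount": 10},
]

# Predicate set P from the text: one selection predicate per site.
predicate_set = {
    "site1": ("BillNos", 1),
    "site2": ("BillNos", 2),
    "site3": ("BillNos", 3),
}

def fragment(relation, predicate_set):
    """Return one horizontal fragment per site: the rows satisfying its predicate."""
    return {
        site: [row for row in relation if row[attribute] == value]
        for site, (attribute, value) in predicate_set.items()
    }

fragments = fragment(relation, predicate_set)
```

Because the three predicates are mutually exclusive and cover every `BillNos` value in the sample, the fragments are disjoint and together reconstruct the relation.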
4.2.1.5 Allocation Layer
The allocation layer is beyond the scope of our thesis and is excluded from our
proposed technique, but it could be taken up as further work to extend the technique to
the allocation of fragmented data. This layer plays a vital role in allocating data
fragments to different sites, as illustrated by the example in figure-16. We do not show
the allocation process here, but with the help of the predicate set the end user or the
database administrator can allocate the data to particular sites. A detailed
theoretical description of allocation is given in section 2.8 as five generic steps,
which could be helpful in further developing the distributed database environment.
Figure 16: Allocation of Fragments
[Figure 16 diagram: Site1, Site2 and Site3 holding the fragments BillNos = 1,
BillNos = 2 and BillNos = 3.]
The above figure-16 shows the allocation of data at different sites based on the
attribute locality precedence.
4.2.2 Testing the Proposed Algorithmic approach
To test our proposed algorithmic approach in practice, we have developed a user
interface, explained in detail in section 4.2.1.3, which makes it possible to test the
algorithm against changes in the ISUD frequencies used as inputs. In this section, we
address the other part of our second research question, i.e. “How to implement and test
the proposed algorithmic approach?”, and explain the development of the user interface
tool and its results. The tool helps the database administrator or end user to take
proper horizontal fragmentation decisions at the initial stage of database design. It
also calculates the frequencies of the ISUD matrix and generates the ALP table. After
the ALP table is generated, a predicate set is constructed for each relation, based on
the highest precedence value in the ALP table. Using the predicate sets, a
fragmentation decision is taken at the different sites according to the highest
precedence value. To develop the tool, we have used the C#.Net language for the front
end and MS Access databases as the back end, according to the needs of the case study
(Bharat Transport Service). The results are divided into the total ALP value and the
individual ALP value, which are presented in graphical form in the following sections.
4.2.2.1 The Retrieved Result of Total ALP Value from all the sites
Figure-17 shows the user interface of the CISUD application, which provides the
functionality to get the CISUD (Insert, Select, Update, Delete) frequencies from the
CISUD matrix table in the database. These frequencies are retrieved automatically when
a value is selected in the combo box reserved for the total predicate name from the
relation in the database. The frequencies that appear in the textboxes are retrieved
from the CISUD matrix table in the database.
Figure 17: ISUD User Interface for Total cost of Attribute From all Sites
Figure-17 also shows the total frequencies for all the sites. Once the frequencies
appear in the respective textboxes, the user can click the button “GetALPTable” and
retrieve the results in the interface shown in figure-18.
Figure 18: Results Retrieved for Total ALP (Attribute Locality Precedence) Value from
Three Sites
Figure-18 above shows the ALP values across all three sites: the total cost value of
each attribute occurrence of the case study application (Bharat Transport Service)
from all three sites. This work is also supported by the ISUD technique in [24].
In the retrieved ALP values from all the sites, the blue line marks the attribute with
the highest value compared with the ALP values of the other attributes, i.e.
“Billnos”. From this, the database administrator can easily see which attribute has
the highest ALP value over all sites. The attribute with the highest total ALP value,
“Billnos”, is then chosen for calculating the ALP value at each site. In this way the
total ALP values of the attributes of a relation help the database administrator
decide, on the basis of the highest attribute value, for which attribute to obtain the
individual ALP values at the individual sites.
4.2.2.2 The Retrieved Result of ALP Value from individual sites
This section explains the retrieval of ALP values from individual sites and highlights
a drawback of the technique in [24], which does not provide the ALP value and the
cost factor of a predicate at an individual site. Our contribution is therefore to
retrieve the ALP value from the individual sites for the end-users. On the basis of
this ALP value, the database administrator can decide to fragment the relation
horizontally according to the predicates used.
Figure 19: ISUD User Interface for individual cost of Attribute from individual sites
The user interface shown in figure-19 is one solution for getting the ALP value from
individual sites. The user selects the ISUD frequencies through the combo box
“individual predicate name” and selects the site through the combo box “select site”.
The user can then click the button “IndividualALPTable” and retrieve the results in the
interface shown in figure-20, which gives a sample result.
Figure 20: Individual ALP Results from individual sites
Figure-20 above shows the predicate cost together with the attribute name and its
value at specific sites. The blue lines form the predicate sets and show the number of
occurrences, or ALP value, of the same attribute name with different values at the
individual sites. After analyzing the predicate sets, the database administrator can
decide which site to select for data fragmentation and allocation.
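The selection of a predicate per site from the individual ALP table can be sketched as follows. The ALP numbers are invented, and the function name `predicate_set_for` is hypothetical; the sketch only shows the rule stated in the text, namely keeping, for each site, the attribute value with the highest individual ALP.

```python
# Individual ALP values for the chosen attribute: (site, attribute value) -> ALP.
# The numbers are invented for illustration.
individual_alp = {
    ("site1", 1): 9, ("site1", 2): 2,
    ("site2", 2): 7, ("site2", 3): 1,
    ("site3", 3): 5, ("site3", 1): 3,
}

def predicate_set_for(attribute, individual_alp):
    """For each site, keep the attribute value with the highest individual ALP."""
    best = {}
    for (site, value), alp in individual_alp.items():
        if site not in best or alp > best[site][1]:
            best[site] = (value, alp)
    return {site: f"{attribute} = {value}" for site, (value, _) in sorted(best.items())}

P = predicate_set_for("BillNos", individual_alp)
```

With these sample values the result has one predicate per site, so the condition #P = #S from the algorithmic approach holds without rearranging P.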
Figure 21: Allocation of data to different sites
The figure above shows that the ALP results retrieved from the different sites can be
used to allocate data based on the highest ALP value at a particular site.
4.2.2.3 Graphical Representation of the ISUD application Results
The graphical representation of the ISUD application results presents the various
results achieved in the development phase in section 4. The graphical representation
helps to test the linearity of the application or algorithm using various ISUD
frequency inputs. We present two results as interpretations of the ISUD application
results in the following sections.
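A linearity check of this kind can be sketched in a few lines, under the assumption that the cost of a cell is a weighted sum of its four frequencies. The weights and the base frequencies are invented and do not come from the thesis; the sketch only shows what testing linearity against scaled ISUD inputs means.

```python
# Assumed weights, not the thesis's cost functions.
WEIGHTS = {"I": 2, "S": 1, "U": 3, "D": 2}

def alp_value(freq):
    """Weighted sum of the four operation frequencies."""
    return sum(WEIGHTS[op] * count for op, count in freq.items())

base = {"I": 1, "S": 4, "U": 1, "D": 0}
factors = (1, 2, 3)

# Scale every frequency by k and recompute the ALP value.
values = [alp_value({op: k * n for op, n in base.items()}) for k in factors]

# Under a weighted-sum cost, scaling the inputs by k scales the result by k.
is_linear = all(v == k * values[0] for k, v in zip(factors, values))
```

If the real cost functions contain non-linear terms, the same check would report `False`, which is the kind of behaviour the graphs are meant to expose.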
4.2.2.3.1 Interpretation of Result-1
The variation of the ALP results is directly proportional to the number of predicates
and the ISUD frequencies. Mathematical expression 1 below is an array that contains the
number of predicates of each individual attribute.
totPredicates = new int[15] { 4, 2, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };   (Mathematical Expression 1)
Figure-22 shows the ISUD input values entered in the ISUD entry interface. Changing the
ISUD inputs produces different results.
Figure 22: ISUD input values (1)
The following figure-23 shows result 1: the total ALP values per attribute name, in
tabular (two-dimensional) form, for the ISUD input values shown in figure-22.
Figure 23: Interpretation of Result 1
Figure-24 shows result 1 in graphic form. The graph gives a holistic view of the
ISUD application results and of the relation between the attributes and the frequencies
from all sites in the distributed environment. It has two variables: the attribute
names on the x-axis and the total ALP values on the other axis. The graph in figure-24
also supports testing and measuring the algorithm, or the proposed ISUD technique, and
helps practitioners (database administrators or end-users) take decisions efficiently
at the initial stage of database fragmentation. When the inputs change, the graph
changes accordingly, so the behaviour of the algorithm under different inputs can be
observed. In the graphic interpretation of result 1, the ALP value of the attribute
“Billnos” is much greater than that of the attribute “NameofReceipent” over all
three sites.
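The kind of chart described here can be approximated in plain text. The sketch below is not the charting code of the C#.Net tool: the function name `text_bar_chart` and the two ALP values are assumptions, chosen only so that “Billnos” dominates as in result 1.

```python
def text_bar_chart(alp, width=40):
    """Render one bar per attribute name, scaled to the largest ALP value."""
    top = max(alp.values()) or 1  # avoid division by zero when all values are 0
    lines = []
    for name, value in sorted(alp.items(), key=lambda item: -item[1]):
        bar = "#" * round(width * value / top)
        lines.append(f"{name:<16} {bar} {value}")
    return "\n".join(lines)

# Invented ALP totals; only the attribute names come from the text.
alp = {"Billnos": 14, "NameofReceipent": 2}
chart = text_bar_chart(alp)
print(chart)
```

Re-running the function with different ALP totals changes the bar lengths, which mirrors how the graph in the tool responds to new ISUD inputs.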
Figure 24: Graphical Interpretation of Result 1
4.2.2.3.2 Interpretation of Result-2
Result 2 is another example, with a different total number of predicates and
different ISUD frequencies, as shown in mathematical expression 2 and in figure-25.
totPredicates = new int[15] { 3, 2, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };   (Mathematical Expression 2)
Figure 25: ISUD input values (2)
The procedure for interpreting the results is discussed in detail in section
4.2.2.3.1 and is not repeated here. The following figure-26 shows result 2: the total
ALP values per attribute name, in tabular (two-dimensional) form, for the ISUD input
values shown in figure-25.
Figure 26: Interpretation of Result 2
Figure 27: Graphical Interpretation of Result 2
Figure-27 shows result 2 in graphical form. As explained for result 1, the graph gives
a holistic view of the ISUD application results and of the relation between the
attributes and the frequencies from all sites in the distributed environment. In the
graphical interpretation of result 2, the ALP value of the attribute
“NameofReceipent” is much greater than that of the attribute “Billnos” over all three
sites.
The two graphical interpretations above are one way to answer the part of our second
research question “How to implement and test the proposed algorithmic approach?”.
Overall, the practical results address the complete research question: “How to design
the architecture of designated algorithm from Q1? How to implement and test the
proposed algorithmic approach?”
5 Discussion
In this dissertation, our objective has been to investigate an efficient algorithm or
technique for database fragmentation and to develop and test it in a distributed
environment. We constructed a comparative framework in section 4.1 on the basis of the
following characteristics from the literature review: distributed database design at
the initial stage for partitioning the relations, horizontal fragmentation algorithm,
affinity matrix to build a predicate, relational database, complexity of the
techniques, allocation algorithm, performance, and efficiency. This comparative
framework highlights the most efficient algorithm or technique for developing database
fragmentation in a distributed environment.
We have presented the proposed five-layer architecture and described its layers: the
application layer, database layer, mediator layer, fragmentation layer and allocation
layer. These layers perform the functions explained in detail in section 4.2.1. The
implemented technique can be used to develop a user interface for calculating ALP
values, which are useful for horizontal database fragmentation and for testing the
algorithm. The following section describes the main contribution of this
dissertation.
5.1 Contribution of the Work
Our customized approach gives developers and end-users a way to address the
aforementioned characteristics. The proposed ISUD approach is one solution for
fragmenting the database at the initial stage of distributed database design. The
customized technique also gives the developer confidence in taking the decision on
horizontal fragmentation by using a specific algorithmic approach.
The customized ISUD technique creates the ALP table in the user interface from the
ISUD matrix table, which is generated by log files. The technique supports horizontal
database fragmentation in a distributed database environment. Compared with the
previous techniques mentioned in table-1, it is less complex to execute for data
fragmentation in distributed database systems, and it supports efficiency, performance
with respect to the cost factor, and allocation.
The structure of the customized ISUD technique is taken from [24], which provides a
theoretical solution for horizontal database fragmentation; we have developed it
further for implementation in our research work. Our customized ISUD technique adds the
creation of an individual ALP table for each site, because [24] only emphasizes the
summarized total cost of attribute locality precedence (ALP) from all the sites. Our
technique therefore gives the database administrator the individual cost of an
attribute at each site, so that the administrator can take proper decisions on database
fragmentation at the site where the attribute locality precedence (ALP) is maximum.
We have also presented the results of the ISUD algorithm in graphic form, so that the
database administrator can analyze the performance of the algorithm by supplying ISUD
frequencies as input to the ISUD user interface application.
6 Conclusion and Future Work
6.1 Conclusion
This research work focuses on horizontal fragmentation in distributed database
environments. It has a theoretical purpose and a practical purpose. The theoretical
purpose is to explore the existing algorithms that help to uniformly fragment the
relations over a distributed environment. The practical purpose is to develop and
implement a user interface application that supports horizontal database fragmentation,
and to test the performance of the algorithm selected from the literature review in a
distributed database environment.
The main contribution of our dissertation is the development and implementation of the
5-layer architecture described in section 4.2.1, which uses the existing fragmentation
technique [24] and helps the database administrator or end-users to take efficient
decisions on fragmenting relations in a distributed environment. To address the
complexity of database fragmentation, we have proposed the customized ISUD technique
(5-layer architecture) as one solution for database fragmentation in a distributed
environment.
We held session meetings as a high-level method of data acquisition from the domain
experts and knowledge mentors, to better understand the problems of the domain. We
chose design science research (DSR), explained in section 3.4, as the development
methodology for the customized ISUD technique for horizontal fragmentation of relations
at the initial stage of a distributed database environment. The customized ISUD
application (user interface) calculates the total cost of an attribute from the
different sites and the individual cost of an attribute with respect to a defined
predicate at a nominated site.
One of the main objectives of the proposed customized ISUD technique is to show the
highest precedence value of the attribute (ALP value) in graphic form, which helps the
database administrator or end-users take decisions on fragmenting the relations at the
initial stage of a distributed database environment. By observing the graphical
statistics of the ALP (attribute locality precedence) table, we can evaluate the
behaviour of the algorithm under different inputs in the ISUD frequencies.
We have discussed the existing techniques and algorithms explained in detail in
section 2.5, each of which has its pros and cons. After analysing the comparative
framework of existing techniques, we have suggested one solution for fragmenting
relations efficiently in a distributed database environment. The technique can be
evaluated with respect to performance and cost factor by applying different changes to
the ISUD frequencies matrix.
This dissertation can be considered a foundation for horizontal fragmentation of
relations at the initial stage of a distributed database, and it also supports already
distributed database environments.
6.2 Future Work
This research work has laid the foundation for further work on vertical
fragmentation and on heterogeneous distributed database environments. The customized
technique adopted in this research is also useful for data allocation at different
sites, based on the results obtained from the ALP (attribute locality precedence) table
through the ISUD application user interface. It can be extended to support
fragmentation in distributed object-oriented databases.
7 References
[1] M. Tamer Özsu, (1998) (2011), “Principles of Distributed Database Systems”,
University of Waterloo, Ontario, Canada N2L 3G1.
[2] Gomer Thomas, Glenn R. Thompson, Chin-wan Chung, Edward Barkmeyer,
Fred Carter, Marjorie Templeton, Stephen Fox, Berl Hartman, (1990),
“Heterogeneous Distributed Database Systems for Production Use”, © ACM
Computing Surveys, Vol. 22, No. 3.
[3] Amit P. Sheth, James A. Larson, (1990), “Federated Database Systems for
Managing Distributed, Heterogeneous, and Autonomous Databases”, © ACM
Computing Surveys, Vol. 22, No. 3.
[4] Jacob Slonim, Dave Schmidt, Paul Fisher, (1979), “Considerations for
Determining the Degree of Centralization or Decentralization in the
Computing Environment”, © North-Holland Publishing Company, Information
& Management 2, pp. 15-29, USA.
[5] Haroun Rababaah, “Distributed Databases Fundamentals and Research”,
Advanced Database – B561. Spring 2005. Dr. H. Hakimzadeh, Department of
Computer and Information Sciences, Indiana University South Bend.
[6] Jason Durbin and Lance Ashdown. Oracle8i Distributed Database
Systems, Release 2 (8.1.6). Oracle Corporation, 1999.
[7] Stephen Smith, article on Accpac and It’s Databases in Stephen Smith's
Blog. http://smist08.wordpress.com/2010/07/10/accpac-and-it%E2%80%99s-databases/
[8] Marton Trencseni, Attila Gazso (2009). "Keyspace: A Consistently Replicated,
Highly-Available Key-Value Store". http://scalien.com/whitepapers. Retrieved
2010-04-18.
[9] Mike Burrows (2006). "The Chubby Lock Service for Loosely-Coupled
Distributed Systems". http://labs.google.com/papers/chubby.html. Retrieved
2010-04-18.
[10] Dr. George Schussel, DCI's founder, is Chairman of Database & Client/Server
World and a world-renowned authority on information systems and
client/server technology. http://www.dciexpo.com/geos/replica.htm
[11] Ed Boyajian, President and Chief Executive Officer
http://www.enterprisedb.com/docs/en/8.4/repserver/Postgres_Plus_Advanced_
Server_Replication_Server_Users_Guide-08.htm#TopOfPage
[12] Advanced Database – B561. Spring 2005. Dr. H. Hakimzadeh, Department of
Computer and Information Sciences, Indiana University South Bend.
[13] M. Tamer Özsu, GTE Laboratories, and Patrick Valduriez, INRIA, “Distributed
Database Systems: Where Are We Now?”.
[14] Huang Y.F., Chen J., (2001), “Fragment Allocation in Distributed Database
Design”, Journal of Information Science and Engineering, Vol. 17, pp. 491-506.
[15] Lindholm A., (2008), “A Constructive Study on Creating Core Business
relevant CREM Strategy and Performance Measures”, Printed by ©
Emerald Group, Facilities, Vol. 26, No. 7-8, pp. 343-358.
[16] Fareedi A.A., (2010) “Ontology-based Model for the “Ward-round” Process in
Healthcare (OMWRP)”, Master’s thesis, School of Engineering of Jönköping
University (2010).
[17] Baiao F., Mattoso M., Zaverucha G., (2002) “A Framework for the Design of
Distributed Databases”, Computer Science Department, COPPE/UFRJ, Federal
University of Rio de Janeiro, Brazil.
[18] Daudpota N.H., (1998) “Five Steps to Construct a Model of Data Allocation
for Distributed Database Systems”, Journal of Intelligent Information Systems,
Vol. 11, pp. 153-168, Netherlands.
[19] Son J.H, Kim M.H., (2003) “An Adaptable Vertical Partitioning Method in
Distributed Systems”, Journal of Systems and Software, Elsevier.
[20] Yee W.W.G, Donahoo M.J., Navathe S.B., (2000) “A Framework for Server
Data Fragment Grouping to Improve Server Scalability in Intermittently
Synchronized Databases”, CIKM.
[21] Hababeh I.O, Bowring N., (2003) “A Method for Fragment Allocation Design
in the Distributed Database Systems”, UGRU-4, The Sixth Annual U.A.E
University Research Conference.
[22] Vaishnavi V.K, Kuechler Jr. W., (2008) “Design Science Research Methods
and Patterns: Innovating Information and Communication Technology”,
Auerbach Publications, Taylor and Francis Group, ISBN 978-1-4200-5932-8.
New York, USA.
[23] Dynamic Object Fragmentation and Replication Algorithm In Distributed
Database Systems by Azzam Sleit, Wesam AlMobaideen, Samih Al-Areqi, and
Abdulaziz Yahya, King Abdullah II School for Information Technology,
University of Jordan, Amman, Jordan.
[24] A New Technique for Database Fragmentation in Distributed Systems by
Shahidul Islam Khan and Dr. A. S. M. Latiful Hoque Department of Computer
Science & Engineering, Bangladesh University of Engineering & Technology.
[25] M. T. Ozsu and P. Valduriez, Principles of Distributed Database Systems, 3rd
ed., New Jersey: Prentice-Hall, 2011.
[26] S. Navathe, K. Karlapalem, and M. Ra, “A mixed fragmentation methodology
for initial distributed database design,” Journal of Computer and Software
Engineering Vol. 3, No. 4 pp 395–426, 1995.
[27] Leniel Braz de Oliveira Macaferi, “Top-Down Approach in
Distributed Databases”, Barra Mansa, November 2007.
[28] Hui Ma, “Distribution Design for Complex Value Databases”, dissertation
presented in partial fulfilment of the requirements for the degree of Doctor of
Philosophy in Information Systems at Massey University, 2007.
[29] Ceri, S., and Pelagatti, G. Distributed Databases: Principles and Systems.
McGraw-Hill, New York, 1984.
[30] S. B. Navathe, S. Ceri, G. Wiederhold, and J. Dou, “Vertical partitioning
algorithms for database design,” ACM Transactions on Database Systems
(TODS), Vol. 9, No. 4, pp. 680–710, 1984.
[31] C. H. Cheng, W. K. Lee, and K. F. Wong, “A genetic algorithm-based
clustering approach for database partitioning,” IEEE Transactions on Systems,
Man, and Cybernetics, Vol. 32, No. 3, pp. 215–230, 2002.
[32] F. F. Marwa, I. E. Ali, A. A. Hesham, “A heuristic approach for horizontal
fragmentation and allocation in DOODB,” in Proc. INFOS2008, 2008, pp. 9-16.
[33] Hadj Mahboubi and Jérôme Darmont, University of Lyon (ERIC), “Enhancing
XML Data Warehouse Query Performance by Fragmentation,” in Proc. ACM
SAC09, 2009, pp. 1555-1562.
[34] Wenfei Fan, Introduction to XML and Relational Databases,
http://homepages.inf.ed.ac.uk/wenfei/cs2/lecture/ln1.pdf, lecture note 1, pp. 11-17,
Spring 2005.
[35] Ezeife, C.I. and Zheng, J., 1998. Measuring the Performance of Database Object
Horizontal Fragmentation Schemes, Supported by NSERC of Canada.
8 Appendix:
8.1 Case study Application
Figure 28: Case Study Application
8.2 Log File Code for generating Customized ISUD matrix table
Log records are used to build the ISUD matrix table in the database. The logging
function below is called each time a query is executed through the application
interface.
public void writeToLogFile(string user_name, string query_name,
    string attribute_name, string attribute_value, DateTime time)
{
    System.Console.WriteLine("I am in writeToLogFile function.");
    // Open the Access database that holds the Access_Record table.
    con = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0; Data Source="
        + "C:\\Documents and Settings\\All Users\\Documents\\"
        + "mydatabasethesis\\bharattransportservice.mdb");
    System.Console.WriteLine("Database:[" + System.Environment.CurrentDirectory
        + "\\bharattransportservice.mdb]");
    con.Open();
    // Record which user ran which query on which attribute value, and when.
    string query = "insert into Access_Record values('" + user_name + "','"
        + query_name + "','" + attribute_name + "','" + attribute_value
        + "','" + time + "')";
    System.Console.WriteLine(query);
    cmd = new OleDbCommand(query, con);
    cmd.CommandType = CommandType.Text;
    int i = cmd.ExecuteNonQuery();
    con.Close();
}
8.3 Algorithm for ISUD application Interface
Below is sample code implementing our proposed technique in a C#.NET Windows
application, based on the pseudo code algorithm of [24].
/* Code to generate the ALP table */
for (int i = 0; i < totAttribues; i++)
{
    for (int j = 0; j < totPredicates[i]; j++)
    {
        MAX[i, j] = 0;
        for (int k = 0; k < totSites; k++)
        {
            System.Console.WriteLine("1..K:[" + k + "]");
            for (int r = 0; r < totApplications[k]; r++)
            {
                // Cost of application r at site k, built from its
                // insert, select, update and delete terms.
                C[i, j, k, r] = (fi * I) + (fs * S) + (fu * U) + (fd * D);
                SS[i, j, k] += C[i, j, k, r];
            }//end of 4th loop
            // Remember the site with the highest total cost so far.
            if (SS[i, j, k] > MAX[i, j])
            {
                MAX[i, j] = SS[i, j, k];
                POS[i, j] = k;
            }
            System.Console.WriteLine("POS[i, j][" + POS[i, j] + "]");
            tempSite = k;
            Dgsiteview.Rows.Add(k, AttNames[i], totPredicates[i]);
        }//end of 3rd loop
        // Sum the costs at every site other than the maximum-cost site.
        SumOther = 0;
        for (int k = 0; k < totSites; k++)
        {
            if (k != POS[i, j])
            {
                SumOther += SS[i, j, k];
            }
        }
        ALPsingle[i, j] = SS[i, j, POS[i, j]] - SumOther;
    }//end of 2nd loop
    // The ALP of an attribute is the sum over all of its predicates.
    ALP[i] = 0;
    for (int j = 0; j < totPredicates[i]; j++)
    {
        ALP[i] += ALPsingle[i, j];
        System.Console.WriteLine("tempSite = " + tempSite);
    }
}//end of 1st loop
// Show the ALP value of each attribute in the result grid.
for (int i = 0; i < ALP.Length; i++)
{
    DGcostofPredofatt.Rows.Add(AttNames[i], ALP[i]);
}
System.Console.WriteLine("ALP");
}// end of enclosing block
// End of ISUD application
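As a compact summary of what the listing in 8.3 is meant to compute, the cost and precedence values can be written as follows. This is a sketch based on our reading of the pseudo code of [24], not a formal definition. The symbols $f_I, f_S, f_U, f_D$ and $I, S, U, D$ mirror the variables fi, fs, fu, fd and I, S, U, D in the code, and are assumed here to be the operation frequencies and operation weights respectively. The indices $i$ (attribute), $j$ (predicate), $k$ (site) and $r$ (application) follow the loop variables.

```latex
\begin{align*}
C_{i,j,k,r} &= f_I \cdot I + f_S \cdot S + f_U \cdot U + f_D \cdot D \\
\mathit{SS}_{i,j,k} &= \sum_{r} C_{i,j,k,r} \\
\mathit{POS}_{i,j} &= \arg\max_{k} \mathit{SS}_{i,j,k} \\
\mathit{ALPsingle}_{i,j} &= \mathit{SS}_{i,j,\mathit{POS}_{i,j}}
    - \sum_{k \neq \mathit{POS}_{i,j}} \mathit{SS}_{i,j,k} \\
\mathit{ALP}_{i} &= \sum_{j} \mathit{ALPsingle}_{i,j}
\end{align*}
```

For example, if a predicate is accessed from two sites with total costs 10 and 4, the maximum-cost site contributes 10 and the other site 4, so ALPsingle for that predicate is 10 - 4 = 6.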