PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University


Lecture 06 Distributed Database Design

Outline

Distributed Database Design
Distributed Design Problem
Distributed Design Issues
  o Fragmentation
  o Data Allocation

Distributed Design Problem

The design of a distributed computer system involves making decisions on the placement of data and programs across the sites of a computer network, as well as possibly designing the network itself.

In the case of distributed DBMSs, the distribution of applications involves two things:
  o distribution of the distributed DBMS software
  o distribution of the application programs that run on it

Neither is treated as a significant problem here, because we assume that:
  o a copy of the distributed DBMS software exists at each site where data are stored
  o the network has already been designed

We therefore concentrate on the distribution of data.

Distribution Design Issues

The following set of interrelated questions covers the entire issue.

Why fragment at all?

How should we fragment?

How much should we fragment?

How to test correctness?

How should we allocate?

What is the necessary information for fragmentation and allocation?

Reasons for Fragmentation

The important issue is the appropriate unit of distribution. A relation is not a suitable unit, for a number of reasons.
  o First, application views are usually subsets of relations.
  o Therefore, the locality of accesses of applications is defined not on entire relations but on their subsets.
  o Hence it is natural to consider subsets of relations as distribution units.

Reasons for Fragmentation

If the whole relation is the unit of distribution, there are two alternatives:
  o The relation is not replicated and is stored at only one site, which results in an unnecessarily high volume of remote data accesses.
  o The relation is replicated at all or some of the sites where the applications reside. This may lead to unnecessary replication, which causes problems in executing updates and may not be desirable if storage is limited.

Finally, the decomposition of a relation into fragments, each being treated as a unit, permits a number of transactions to execute concurrently. Thus fragmentation typically increases the level of concurrency and therefore the system throughput.

Fragmentation Alternatives

Relation instances are essentially tables, so the issue is one of finding alternative ways of dividing a table into smaller ones.

There are clearly two alternatives for this: dividing it horizontally or dividing it vertically.

Fragmentation Alternatives – Horizontal

PROJ1: projects with budgets less than $200,000

PROJ2: projects with budgets greater than or equal to $200,000

PROJ
PNO   PNAME               BUDGET   LOC
P1    Instrumentation     150000   Montreal
P2    Database Develop.   135000   New York
P3    CAD/CAM             250000   New York
P4    Maintenance         310000   Paris
P5    CAD/CAM             500000   Boston

PROJ1
PNO   PNAME               BUDGET   LOC
P1    Instrumentation     150000   Montreal
P2    Database Develop.   135000   New York

PROJ2
PNO   PNAME               BUDGET   LOC
P3    CAD/CAM             250000   New York
P4    Maintenance         310000   Paris
P5    CAD/CAM             500000   Boston

Example of Horizontal Partitioning
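The horizontal split can be sketched in a few lines of Python. This is only an illustration under assumptions: the relation is held as a hypothetical in-memory list of tuples, and the names `PROJ`, `PROJ1`, `PROJ2` and `BUDGET` are chosen here to mirror the slide, not taken from any DBMS.

```python
# Hypothetical in-memory copy of PROJ: (PNO, PNAME, BUDGET, LOC).
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
    ("P5", "CAD/CAM", 500000, "Boston"),
]

BUDGET = 2  # position of the BUDGET attribute in each tuple

# Each horizontal fragment is a selection over the tuples of PROJ.
PROJ1 = [t for t in PROJ if t[BUDGET] < 200000]
PROJ2 = [t for t in PROJ if t[BUDGET] >= 200000]

print([t[0] for t in PROJ1])  # project numbers placed in PROJ1
print([t[0] for t in PROJ2])  # project numbers placed in PROJ2
```

Every tuple keeps all of its attributes; only the set of rows is divided.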

Fragmentation Alternatives – Vertical

PROJ1: information about project budgets

PROJ2: information about project names and locations

PROJ
PNO   PNAME               BUDGET   LOC
P1    Instrumentation     150000   Montreal
P2    Database Develop.   135000   New York
P3    CAD/CAM             250000   New York
P4    Maintenance         310000   Paris
P5    CAD/CAM             500000   Boston

PROJ1
PNO   BUDGET
P1    150000
P2    135000
P3    250000
P4    310000
P5    500000

PROJ2
PNO   PNAME               LOC
P1    Instrumentation     Montreal
P2    Database Develop.   New York
P3    CAD/CAM             New York
P4    Maintenance         Paris
P5    CAD/CAM             Boston

Example of Vertical Partitioning
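A minimal Python sketch of the same vertical split, assuming the relation is a hypothetical list of dictionaries. The helper name `project` is introduced here for illustration. The key PNO is repeated in both fragments so that a join on PNO can rebuild the original rows.

```python
# Hypothetical in-memory copy of PROJ, one dictionary per tuple.
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
    {"PNO": "P5", "PNAME": "CAD/CAM", "BUDGET": 500000, "LOC": "Boston"},
]

def project(rows, attributes):
    """Keep only the listed attributes of every row (relational projection)."""
    return [{a: r[a] for a in attributes} for r in rows]

# Vertical fragments: both carry the key PNO.
PROJ1 = project(PROJ, ("PNO", "BUDGET"))
PROJ2 = project(PROJ, ("PNO", "PNAME", "LOC"))

# Reconstruction: join the two fragments on PNO.
by_pno = {r["PNO"]: r for r in PROJ2}
rebuilt = [{**r, **by_pno[r["PNO"]]} for r in PROJ1]

print(rebuilt == PROJ)
```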

Correctness of Fragmentation

Completeness
Decomposition of relation R into fragments R1, R2, ..., Rn is complete if and only if each data item in R can also be found in some Ri.

This property, which is identical to the lossless decomposition property of normalization, ensures that the data in a global relation are mapped into fragments without any loss.

Correctness of Fragmentation

Reconstruction
If relation R is decomposed into fragments FR = {R1, R2, ..., Rn}, then there should exist some relational operator ∇ such that

$R = \nabla R_i, \quad \forall R_i \in F_R$

The reconstructability of the relation from its fragments ensures that constraints defined on the data in the form of dependencies are preserved.

Correctness of Fragmentation

Disjointness
If relation R is horizontally decomposed into fragments FR = {R1, R2, ..., Rn}, and data item dj is in Rj, then dj should not be in any other fragment Rk (k ≠ j).

This criterion ensures that the horizontal fragments are disjoint.

If relation R is vertically decomposed, its primary key attributes are typically repeated in all its fragments (for reconstruction).

Therefore, in the case of vertical partitioning, disjointness is defined only on the non-primary key attributes of a relation.
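The three correctness rules can be checked mechanically for a horizontal fragmentation. The sketch below is an assumption-laden illustration: relations are hypothetical lists of tuples, the function names `is_complete`, `is_disjoint` and `reconstruct` are invented here, and union plays the role of the operator ∇.

```python
# Hypothetical relation and horizontal fragments: (PNO, PNAME, BUDGET, LOC).
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
    ("P5", "CAD/CAM", 500000, "Boston"),
]
PROJ1 = [t for t in PROJ if t[2] < 200000]
PROJ2 = [t for t in PROJ if t[2] >= 200000]

def is_complete(relation, fragments):
    """Completeness: every tuple of the relation appears in some fragment."""
    return all(any(t in f for f in fragments) for t in relation)

def is_disjoint(fragments):
    """Disjointness: no tuple appears in more than one fragment."""
    seen = set()
    for fragment in fragments:
        for t in fragment:
            if t in seen:
                return False
            seen.add(t)
    return True

def reconstruct(fragments):
    """Reconstruction for horizontal fragments: the union of all fragments."""
    return set().union(*fragments)

fragments = [PROJ1, PROJ2]
print(is_complete(PROJ, fragments))
print(is_disjoint(fragments))
print(reconstruct(fragments) == set(PROJ))
```

For vertical fragments the same idea applies with a join on the key in place of the union, and with disjointness tested on the non-key attributes only.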

Allocation Alternatives

Non-replicated
  o partitioned: each fragment resides at only one site

Replicated
  o fully replicated: each fragment at each site
  o partially replicated: each fragment at some of the sites

Information Requirements

The information needed for distribution design can be divided into four categories:

Database information

Application information

Communication network information

Computer system information

Fragmentation

Horizontal Fragmentation (HF)

Vertical Fragmentation (VF)

Hybrid Fragmentation

Horizontal Fragmentation (HF)

Horizontal fragmentation partitions a relation along its tuples. Each fragment has a subset of the tuples of the relation.

There are two versions of horizontal partitioning:
  o Primary horizontal fragmentation
  o Derived horizontal fragmentation

Primary horizontal fragmentation of a relation is performed using predicates that are defined on that relation.

Derived horizontal fragmentation is the partitioning of a relation that results from predicates being defined on another relation.

Information Requirements of HF

Database Information
  o how the database relations are connected to one another, especially with joins
  o in the relational model, these relationships are also depicted as relations
  o cardinality of each relation: card(R)

Figure: Expression of Relationships Among Relations Using Links
  SKILL(TITLE, SAL)
  EMP(ENO, ENAME, TITLE)
  PROJ(PNO, PNAME, BUDGET, LOC)
  ASG(ENO, PNO, RESP, DUR)
  L1: SKILL → EMP
  L2: EMP → ASG
  L3: PROJ → ASG
  (each link is directed from its owner relation to its member relation)

Information Requirements of HF

Application Information
simple predicates: Given R[A1, A2, …, An], a simple predicate pj is

$p_j : A_i \ \theta \ \mathit{Value}$

where $\theta \in \{=, <, \le, >, \ge, \ne\}$, $\mathit{Value} \in D_i$, and $D_i$ is the domain of $A_i$.

For relation R we define Pr = {p1, p2, …, pm}.

Example:
  PNAME = "Maintenance"
  BUDGET ≤ 200000

Information Requirements of HF

Application Information
minterm predicates: Given R and Pr = {p1, p2, …, pm}, define M = {m1, m2, …, mz} as

$M = \{\, m_i \mid m_i = \bigwedge_{p_j \in Pr} p_j^{*} \,\}, \quad 1 \le j \le m, \ 1 \le i \le z$

where $p_j^{*} = p_j$ or $p_j^{*} = \neg(p_j)$.

Example
  m1: PNAME = "Maintenance" ∧ BUDGET ≤ 200000
  m2: NOT(PNAME = "Maintenance") ∧ BUDGET ≤ 200000
  m3: PNAME = "Maintenance" ∧ NOT(BUDGET ≤ 200000)
  m4: NOT(PNAME = "Maintenance") ∧ NOT(BUDGET ≤ 200000)
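Enumerating minterm predicates amounts to choosing, for each simple predicate, either its natural or its negated form. The following Python sketch shows one way to do that; the dictionary `SIMPLE_PREDICATES`, the function `minterms` and the sample tuples are hypothetical, and the order in which the minterms come out is not meant to match the numbering m1 to m4 on the slide.

```python
from itertools import product

# Hypothetical simple predicates over a tuple represented as a dictionary.
SIMPLE_PREDICATES = {
    'PNAME = "Maintenance"': lambda t: t["PNAME"] == "Maintenance",
    "BUDGET <= 200000": lambda t: t["BUDGET"] <= 200000,
}

def minterms(predicates):
    """Return (label, test) pairs, one per combination of natural/negated forms."""
    names = list(predicates)
    result = []
    for signs in product([True, False], repeat=len(names)):
        label = " AND ".join(
            name if sign else f"NOT({name})" for name, sign in zip(names, signs)
        )

        def test(t, signs=signs):
            # A tuple satisfies the minterm when every simple predicate
            # evaluates to the truth value chosen for it.
            return all(predicates[n](t) == s for n, s in zip(names, signs))

        result.append((label, test))
    return result

M = minterms(SIMPLE_PREDICATES)
for label, _ in M:
    print(label)

sample = {"PNAME": "Maintenance", "BUDGET": 310000}
print([label for label, test in M if test(sample)])
```

With m simple predicates this produces 2^m minterms, and any given tuple satisfies exactly one of them.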

Information Requirements of HF

Application Information
In terms of quantitative information about user applications, we need to have two sets of data.

minterm selectivities: sel(mi)
  o The number of tuples of the relation that would be accessed by a user query which is specified according to a given minterm predicate mi.

access frequencies: acc(qi)
  o The frequency with which a user application qi accesses data.
  o Access frequency for a minterm predicate can also be defined.

Primary Horizontal Fragmentation

Definition: A primary horizontal fragmentation is defined by a selection operation on the owner relations of a database schema. Therefore, given relation R, its horizontal fragments are given by

$R_i = \sigma_{F_i}(R), \quad 1 \le i \le w$

where Fi is a selection formula used to obtain fragment Ri. If Fi is in conjunctive normal form, it is a minterm predicate (mi).

A horizontal fragment Ri of relation R consists of all the tuples of R which satisfy a minterm predicate mi.

Given a set of minterm predicates M, there are as many horizontal fragments of relation R as there are minterm predicates.

The set of horizontal fragments is also referred to as minterm fragments.

Primary Horizontal Fragmentation

$PROJ_1 = \sigma_{LOC = \text{``Montreal''}}(PROJ)$

$PROJ_2 = \sigma_{LOC = \text{``New York''}}(PROJ)$

$PROJ_3 = \sigma_{LOC = \text{``Paris''}}(PROJ)$

Primary Horizontal Fragmentation of Relation PROJ

PHF – Algorithm

Given: A relation R, the set of simple predicates Pr

Output: The set of fragments of R = {R1, R2, …, Rw} which obey the fragmentation rules.

Preliminaries:
  o Pr should be complete
  o Pr should be minimal

Completeness of Simple Predicates

A set of simple predicates Pr is said to be complete if and only if there is an equal probability of access by every application to any tuple belonging to any minterm fragment that is defined according to Pr.

Example:
Assume PROJ[PNO, PNAME, BUDGET, LOC] has two applications defined on it.
  (1) Find the budgets of projects at each location.
  (2) Find projects with budgets less than $200000.

Completeness of Simple Predicates

According to (1),

Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”}

which is not complete with respect to (2).

Modify

Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET≤200000, BUDGET>200000}

which is complete.

Minimality of Simple Predicates

If a predicate influences how fragmentation is performed (i.e., causes a fragment f to be further fragmented into, say, fi and fj), then there should be at least one application that accesses fi and fj differently.

In other words, the simple predicate should be relevant in determining a fragmentation.

If all the predicates of a set Pr are relevant, then Pr is minimal.

Let mi and mj be two minterm predicates that are identical except that mi contains pi in its natural form while mj contains ¬pi, and let fi and fj be the fragments they define. Then pi is relevant if and only if

$\frac{acc(m_i)}{card(f_i)} \ne \frac{acc(m_j)}{card(f_j)}$

where acc(mi) is the access frequency of a minterm mi.
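The relevance test compares the access frequency per tuple of the two fragments that a predicate separates. A small Python sketch follows, with the assumptions that both fragments are non-empty and that access frequencies and cardinalities are given as integers; the function name `is_relevant` and the sample numbers are hypothetical.

```python
from fractions import Fraction

def is_relevant(acc_mi, card_fi, acc_mj, card_fj):
    """A predicate is relevant when acc(mi)/card(fi) differs from acc(mj)/card(fj).

    Assumes both fragments are non-empty (card > 0). Fractions avoid
    floating-point rounding when comparing the two ratios.
    """
    return Fraction(acc_mi, card_fi) != Fraction(acc_mj, card_fj)

# Same access rate per tuple on both sides: the split is not justified.
print(is_relevant(30, 3, 20, 2))

# Different access rate per tuple: the predicate is relevant.
print(is_relevant(30, 3, 5, 2))
```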

Minimality of Simple Predicates

Example:

Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET≤200000, BUDGET>200000}

is minimal (in addition to being complete). However, if we add

PNAME = “Instrumentation”

then Pr is not minimal.

COM_MIN Algorithm

Given: a relation R and a set of simple predicates Pr

Output: a complete and minimal set of simple predicates Pr' for Pr

Rule 1: a relation or fragment is partitioned into at least two parts which are accessed differently by at least one application.

COM_MIN Algorithm

Initialization:
  o find a pi ∈ Pr such that pi partitions R according to Rule 1
  o set Pr' = {pi}; Pr ← Pr − {pi}; F ← {fi}

Iteratively add predicates to Pr' until it is complete:
  o find a pj ∈ Pr such that pj partitions some fk, defined according to a minterm predicate over Pr', according to Rule 1
  o set Pr' = Pr' ∪ {pj}; Pr ← Pr − {pj}; F ← F ∪ {fj}
  o if ∃ pk ∈ Pr' which is nonrelevant then
      Pr' ← Pr' − {pk}
      F ← F − {fk}

PHORIZONTAL Algorithm

Makes use of COM_MIN to perform fragmentation.

Input: a relation R and a set of simple predicates Pr

Output: a set of minterm predicates M according to which relation R is to be fragmented

  o Pr' ← COM_MIN(R, Pr)
  o determine the set M of minterm predicates
  o determine the set I of implications among pi ∈ Pr'
  o iteratively eliminate the contradictory minterms from M: M ← M − mi
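The elimination step can be illustrated on the PROJ predicates. This Python sketch makes two assumptions that are not in the slides: the implications I are encoded as groups of predicates of which exactly one can hold, and the domain of LOC is taken to be the three listed cities. The names `PREDICATES`, `EXCLUSIVE_GROUPS` and `is_contradictory` are hypothetical.

```python
from itertools import product

PREDICATES = [
    'LOC = "Montreal"',
    'LOC = "New York"',
    'LOC = "Paris"',
    "BUDGET <= 200000",
    "BUDGET > 200000",
]

# Hypothetical encoding of the implications I: within each group of
# predicate positions, exactly one predicate is true for any tuple.
EXCLUSIVE_GROUPS = [(0, 1, 2), (3, 4)]

def is_contradictory(signs):
    """A minterm is contradictory if some group has zero or several true predicates."""
    return any(sum(signs[i] for i in group) != 1 for group in EXCLUSIVE_GROUPS)

# Every minterm is a choice of natural (True) or negated (False) form per predicate.
all_minterms = list(product([True, False], repeat=len(PREDICATES)))
surviving = [m for m in all_minterms if not is_contradictory(m)]

print(len(all_minterms), len(surviving))
for signs in surviving:
    print(" AND ".join(p for p, s in zip(PREDICATES, signs) if s))
```

Only the natural-form predicates are printed for each surviving minterm, since the negated ones are implied by them under the stated assumptions.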

PHF – Example

Two candidate relations: PAY and PROJ.

Fragmentation of relation PAY
  o Application: Check the salary info and determine raise.
  o Employee records are kept at two sites, so the application is run at two sites.
  o Simple predicates
      p1 : SAL ≤ 30000
      p2 : SAL > 30000
      Pr = {p1, p2}, which is complete and minimal, so Pr' = Pr
  o Minterm predicates
      m1 : (SAL ≤ 30000)
      m2 : NOT(SAL ≤ 30000) = (SAL > 30000)

PHF – Example

PAY1
TITLE         SAL
Mech. Eng.    27000
Programmer    24000

PAY2
TITLE         SAL
Elect. Eng.   40000
Syst. Anal.   34000

PHF – Example

Fragmentation of relation PROJ
  o Applications:
      (1) Find the name and budget of projects given their location. Issued at three sites.
      (2) Access project information according to budget. One site accesses ≤ 200000, the other accesses > 200000.
  o Simple predicates
      For application (1)
        p1 : LOC = “Montreal”
        p2 : LOC = “New York”
        p3 : LOC = “Paris”
      For application (2)
        p4 : BUDGET ≤ 200000
        p5 : BUDGET > 200000
  o Pr = Pr' = {p1, p2, p3, p4, p5}

PHF – Example

Fragmentation of relation PROJ continued

Minterm fragments left after elimination
  m1 : (LOC = “Montreal”) ∧ (BUDGET ≤ 200000)
  m2 : (LOC = “Montreal”) ∧ (BUDGET > 200000)
  m3 : (LOC = “New York”) ∧ (BUDGET ≤ 200000)
  m4 : (LOC = “New York”) ∧ (BUDGET > 200000)
  m5 : (LOC = “Paris”) ∧ (BUDGET ≤ 200000)
  m6 : (LOC = “Paris”) ∧ (BUDGET > 200000)

PHF – Example

PROJ1
PNO   PNAME               BUDGET   LOC
P1    Instrumentation     150000   Montreal

PROJ3
PNO   PNAME               BUDGET   LOC
P2    Database Develop.   135000   New York

PROJ4
PNO   PNAME               BUDGET   LOC
P3    CAD/CAM             250000   New York

PROJ6
PNO   PNAME               BUDGET   LOC
P4    Maintenance         310000   Paris

The primary horizontal fragmentation of PROJ forms six fragments, FPROJ = {PROJ1, PROJ2, PROJ3, PROJ4, PROJ5, PROJ6}, according to the minterm predicates M.

Since fragments PROJ2 and PROJ5 are empty, they are not depicted in the figure.
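The six minterm fragments can be reproduced with a short Python sketch. It assumes a hypothetical in-memory PROJ holding only the four projects P1 to P4 that appear in this example, and it numbers the fragments in the order of the minterms m1 to m6 listed earlier; `LOCATIONS` and `BUDGET_RANGES` are names invented here.

```python
# Hypothetical in-memory PROJ for this example: (PNO, PNAME, BUDGET, LOC).
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
]

LOCATIONS = ["Montreal", "New York", "Paris"]
BUDGET_RANGES = [lambda b: b <= 200000, lambda b: b > 200000]

# One fragment per surviving minterm: a location combined with a budget range.
fragments = {}
index = 1
for loc in LOCATIONS:
    for in_range in BUDGET_RANGES:
        fragments[f"PROJ{index}"] = [
            t for t in PROJ if t[3] == loc and in_range(t[2])
        ]
        index += 1

for name, rows in fragments.items():
    print(name, [t[0] for t in rows])

empty = [name for name, rows in fragments.items() if not rows]
print(empty)
```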

PHF – Correctness

Completeness
  o Since Pr' is complete and minimal, the selection predicates are complete.

Reconstruction
  o If relation R is fragmented into FR = {R1, R2, …, Rr}, then

$R = \bigcup_{\forall R_i \in F_R} R_i$

Disjointness
  o Minterm predicates that form the basis of fragmentation should be mutually exclusive.

Derived Horizontal Fragmentation

Defined on a member relation of a link according to a selection operation specified on its owner.
  o Each link is an equijoin.
  o Equijoin can be implemented by means of semijoins.

Figure: links among the relations
  SKILL(TITLE, SAL)
  EMP(ENO, ENAME, TITLE)
  PROJ(PNO, PNAME, BUDGET, LOC)
  ASG(ENO, PNO, RESP, DUR)
  L1: SKILL → EMP
  L2: EMP → ASG
  L3: PROJ → ASG

DHF – Definition

Given a link L where owner(L) = S and member(L) = R, the derived horizontal fragments of R are defined as

$R_i = R \ltimes S_i, \quad 1 \le i \le w$

where w is the maximum number of fragments that will be defined on R and

$S_i = \sigma_{F_i}(S)$

where Fi is the formula according to which the primary horizontal fragment Si is defined.

DHF – Example

Given link L1 where owner(L1) = SKILL and member(L1) = EMP

$EMP_1 = EMP \ltimes SKILL_1$

$EMP_2 = EMP \ltimes SKILL_2$

where

$SKILL_1 = \sigma_{SAL \le 30000}(SKILL)$

$SKILL_2 = \sigma_{SAL > 30000}(SKILL)$

EMP1
ENO   ENAME       TITLE
E3    A. Lee      Mech. Eng.
E4    J. Miller   Programmer
E7    R. Davis    Mech. Eng.

EMP2
ENO   ENAME       TITLE
E1    J. Doe      Elect. Eng.
E2    M. Smith    Syst. Anal.
E5    B. Casey    Syst. Anal.
E6    L. Chu      Elect. Eng.
E8    J. Jones    Syst. Anal.
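A semijoin keeps the tuples of the member relation that have a matching tuple in the owner fragment. The Python sketch below reproduces the EMP example under assumptions: the relations are hypothetical in-memory lists, the salary values are the ones shown for PAY1 and PAY2 earlier, and the helper name `semijoin` is invented for illustration.

```python
# Hypothetical owner relation SKILL(TITLE, SAL) and member relation EMP(ENO, ENAME, TITLE).
SKILL = [
    ("Elect. Eng.", 40000),
    ("Syst. Anal.", 34000),
    ("Mech. Eng.", 27000),
    ("Programmer", 24000),
]
EMP = [
    ("E1", "J. Doe", "Elect. Eng."),
    ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."),
    ("E4", "J. Miller", "Programmer"),
    ("E5", "B. Casey", "Syst. Anal."),
    ("E6", "L. Chu", "Elect. Eng."),
    ("E7", "R. Davis", "Mech. Eng."),
    ("E8", "J. Jones", "Syst. Anal."),
]

def semijoin(member, owner_fragment):
    """EMP semijoin SKILL_i on TITLE: member tuples whose TITLE occurs in the owner fragment."""
    titles = {title for title, _ in owner_fragment}
    return [e for e in member if e[2] in titles]

# Primary horizontal fragments of the owner relation.
SKILL1 = [s for s in SKILL if s[1] <= 30000]
SKILL2 = [s for s in SKILL if s[1] > 30000]

# Derived horizontal fragments of the member relation.
EMP1 = semijoin(EMP, SKILL1)
EMP2 = semijoin(EMP, SKILL2)

print([e[0] for e in EMP1])
print([e[0] for e in EMP2])
```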

DHF – Correctness

Completeness
  o Referential integrity ensures that the tuples of any fragment of the member relation are also in the owner relation.
  o Let R be the member relation of a link whose owner is relation S, which is fragmented as FS = {S1, S2, ..., Sn}. Furthermore, let A be the join attribute between R and S. Then, for each tuple t of R, there should be a tuple t' of S such that

$t[A] = t'[A]$

Reconstruction
  o Let R be a relation with fragmentation FR = {R1, R2, …, Rw}. Then

$R = \bigcup_{\forall R_i \in F_R} R_i$

Disjointness
  o Easy to establish when the join graph between the owner and the member fragments is simple.


Vertical Fragmentation

More difficult than horizontal fragmentation, because more alternatives exist.

Example:
  o In horizontal partitioning, if the total number of simple predicates in Pr is n, there are 2^n possible minterm predicates that can be defined on it.
  o Some of these will contradict the existing implications, further reducing the candidate fragments that need to be considered.
  o In the case of vertical partitioning, if a relation has m non-primary key attributes, the number of possible fragments is equal to B(m), which is the m-th Bell number.
  o For large values of m, B(m) ≈ m^m: for m = 10, B(m) ≈ 115,000; for m = 15, B(m) ≈ 10^9; for m = 30, B(m) ≈ 10^23.
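The growth of B(m) is easy to check numerically. The sketch below computes Bell numbers with the Bell triangle recurrence; the function name `bell` is chosen here for illustration.

```python
def bell(m):
    """Return the m-th Bell number using the Bell triangle.

    Each new row starts with the last entry of the previous row, and every
    following entry is the sum of its left neighbour and the entry above it.
    The first entry of row m is B(m).
    """
    row = [1]
    for _ in range(m):
        nxt = [row[-1]]
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[0]

for m in (10, 15, 30):
    print(m, bell(m))
```

The exact values grow from the order of 10^5 at m = 10 to the order of 10^23 at m = 30, which is why exhaustive search over vertical fragmentations is impractical.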

Hybrid Fragmentation

In most cases a simple horizontal or vertical fragmentation of a database schema will not be sufficient to satisfy the requirements of user applications.

In this case a vertical fragmentation may be followed by a horizontal one, or vice versa, producing a tree-structured partitioning.

Since the two types of partitioning strategies are applied one after the other, this alternative is called hybrid fragmentation.

It has also been named mixed fragmentation or nested fragmentation.

Hybrid Fragmentation

Figure: partitioning tree. R is split by HF into R1 and R2; R1 is split by VF into R11 and R12; R2 is split by VF into R21, R22, and R23.

Correctness of Hybrid Fragmentation

To reconstruct the original global relation in case of hybrid fragmentation, one starts at the leaves of the partitioning tree and moves upward by performing joins and unions.

The fragmentation is complete if the intermediate and leaf fragments are complete.

Similarly, disjointness is guaranteed if intermediate and leaf fragments are disjoint.
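Bottom-up reconstruction of a hybrid fragmentation can be sketched as joins at the vertical level followed by a union at the horizontal level. The Python example below is an illustration under assumptions: a hypothetical PROJ is first split horizontally on BUDGET and each part is then split vertically into two fragments (a smaller tree than the one in the figure), and the helpers `hsplit`, `vsplit`, `join` and `union` are names invented here.

```python
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
    {"PNO": "P5", "PNAME": "CAD/CAM", "BUDGET": 500000, "LOC": "Boston"},
]

def hsplit(rows, predicate):
    """Horizontal split into the rows that satisfy the predicate and the rest."""
    return ([r for r in rows if predicate(r)],
            [r for r in rows if not predicate(r)])

def vsplit(rows, attribute_groups, key="PNO"):
    """Vertical split: one fragment per attribute group, each repeating the key."""
    return [[{key: r[key], **{a: r[a] for a in group}} for r in rows]
            for group in attribute_groups]

def join(fragments, key="PNO"):
    """Rebuild rows from vertical fragments by joining them on the key."""
    merged = {}
    for fragment in fragments:
        for r in fragment:
            merged.setdefault(r[key], {}).update(r)
    return list(merged.values())

def union(*fragments):
    """Rebuild a relation from disjoint horizontal fragments."""
    return [r for fragment in fragments for r in fragment]

# Build the tree: HF at the root, VF on each horizontal fragment.
R1, R2 = hsplit(PROJ, lambda r: r["BUDGET"] < 200000)
R11, R12 = vsplit(R1, [("BUDGET",), ("PNAME", "LOC")])
R21, R22 = vsplit(R2, [("BUDGET",), ("PNAME", "LOC")])

# Reconstruct from the leaves upward: joins first, then the union.
rebuilt = union(join([R11, R12]), join([R21, R22]))
print(sorted(rebuilt, key=lambda r: r["PNO"]) == PROJ)
```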

Allocation

Allocation Problem

Given
  F = {F1, F2, …, Fn} fragments
  S = {S1, S2, …, Sm} network sites
on which a set of applications Q = {q1, q2, …, qq} is running.

The allocation problem involves finding the “optimal” distribution of F to S. Optimality can be defined with respect to two measures:

Minimal cost
  o The cost function consists of the cost of storing each Fi at a site Sj,
  o the cost of querying Fi at site Sj,
  o the cost of updating Fi at all sites where it is stored,
  o the cost of data communication.

Performance
  o minimize the response time
  o maximize the system throughput at each site

Allocation Model

General Form

min(Total Cost)

subject to
  o response time constraint
  o storage constraint
  o processing constraint

Decision Variable

$x_{ij} = \begin{cases} 1 & \text{if fragment } F_i \text{ is stored at site } S_j \\ 0 & \text{otherwise} \end{cases}$

Allocation Model

Total Cost

$\sum_{\text{all queries}} \text{query processing cost} \;+\; \sum_{\text{all sites}} \sum_{\text{all fragments}} \text{cost of storing a fragment at a site}$

Storage Cost (of fragment Fj at Sk)

$(\text{unit storage cost at } S_k) \times (\text{size of } F_j) \times x_{jk}$

Query Processing Cost

In this model of the database allocation problem (DAP), the query processing cost is specified as consisting of the processing cost (PC) and the transmission cost (TC).

Thus the query processing cost (QPC) for application qi is:

processing component + transmission component

Allocation Model

Query Processing Cost

Processing component PC consists of three cost factors: the access cost (AC) + the integrity enforcement cost (IE) + the concurrency control cost (CC).

Access cost

$\sum_{\text{all sites}} \sum_{\text{all fragments}} (\text{no. of update accesses} + \text{no. of read accesses}) \times x_{jk} \times \text{local processing cost at a site}$

  o The first two terms calculate the number of accesses of user query qi to fragment Fj.
  o We assume that the local costs of processing them are identical.
  o The summation gives the total number of accesses for all the fragments referenced by qi. Multiplication by LPCk gives the cost of this access at site Sk.
  o We again use xjk (the decision variable, indexed here by fragment j and site k) to select only those cost values for the sites where fragments are stored.

Integrity enforcement and concurrency control costs
  o Can be similarly calculated

Allocation Model

Query Processing Cost

Transmission component: cost of processing updates + cost of processing retrievals
  o In update queries it is necessary to inform all the sites where replicas exist, while in retrieval queries it is sufficient to access only one of the copies.
  o In addition, at the end of an update request, there is no data transmission back to the originating site other than a confirmation message, whereas the retrieval-only queries may result in significant data transmission.

Cost of updates

$\sum_{\text{all sites}} \sum_{\text{all fragments}} \text{update message cost} \;+\; \sum_{\text{all sites}} \sum_{\text{all fragments}} \text{acknowledgment cost}$

Retrieval Cost

$\sum_{\text{all fragments}} \min_{\text{all sites}} (\text{cost of retrieval command} + \text{cost of sending back the result})$

Allocation Model

Constraints

Response Time

execution time of query ≤ max. allowable response time for that query

Storage Constraint (for a site)

$\sum_{\text{all fragments}} \text{storage requirement of a fragment at that site} \le \text{storage capacity at that site}$

Processing Constraint (for a site)

$\sum_{\text{all queries}} \text{processing load of a query at that site} \le \text{processing capacity of that site}$
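The storage part of the model can be evaluated directly for a candidate allocation. The Python sketch below covers only the storage cost and the storage constraint, not the query processing cost. All numbers (`SIZE`, `UNIT_STORAGE_COST`, `CAPACITY`) and function names are hypothetical, and `x[j][k]` stands for the decision variable of fragment j at site k.

```python
# Hypothetical inputs: three fragments and two sites.
SIZE = [100, 50, 80]          # size of each fragment Fj
UNIT_STORAGE_COST = [2, 3]    # unit storage cost at each site Sk
CAPACITY = [200, 150]         # storage capacity of each site Sk

def storage_cost(x):
    """Sum over all sites and fragments of unit cost * fragment size * x[j][k]."""
    return sum(
        UNIT_STORAGE_COST[k] * SIZE[j] * x[j][k]
        for j in range(len(SIZE))
        for k in range(len(CAPACITY))
    )

def satisfies_storage_constraint(x):
    """For every site, the total size of the fragments stored there fits its capacity."""
    return all(
        sum(SIZE[j] * x[j][k] for j in range(len(SIZE))) <= CAPACITY[k]
        for k in range(len(CAPACITY))
    )

# x[j][k] = 1 if fragment j is stored at site k, 0 otherwise.
partially_replicated = [[1, 0], [1, 1], [0, 1]]
fully_replicated = [[1, 1], [1, 1], [1, 1]]

for name, x in (("partial", partially_replicated), ("full", fully_replicated)):
    print(name, storage_cost(x), satisfies_storage_constraint(x))
```

With these made-up numbers, full replication both costs more to store and exceeds the capacity of each site, which is the kind of trade-off the constrained minimization is meant to resolve.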

Thank You