Distributed Database Systems Dr. Mohamed Osman Hegazi

Definitions

Distributed database (DDB): a collection of multiple, logically interrelated databases distributed over a computer network.

Distributed database management system (DDBMS): the software that permits the management of the DDB and makes the distribution transparent to the users.

Distributed database system (DDBS) = DDB + D-DBMS

The two important terms in these definitions are:
- Logically interrelated (the application)
- Distributed over a network

Motivation for Distributed Database

1. The development of computer networks promotes decentralization.

2. In a company, the database organization might reflect the organizational structure, which is distributed into units. Each unit maintains its own database.

3. Sharing of data can be achieved by developing a distributed database system which:
• Makes data accessible by all units
• Stores data close to where it is most frequently used

DDBMS Advantages

• Data are located near the "greatest demand" site
• Faster data access
• Faster data processing
• Growth facilitation
• Improved communications
• Reduced operating costs
• User-friendly interface
• Less danger of a single-point failure
• Processor independence

DDBMS Disadvantages

• Complexity of management and control
• Security
• Lack of standards
• Increased storage requirements
• Greater difficulty in managing the data environment
• Increased training cost

The concept of DDB

A DDBS is not a collection of files that can be individually stored at each node of a computer network. To form a DDBS, the files should not only be logically related; there should also be structure among the files, and access should be via a common interface.

Distributed Database Management Systems

An Example

EMP(ENO, ENAME, TITLE)
ASG(ENO, PNO, DUR, RESP)
PROJ(PNO, PNAME, BUDGET)
PAY(TITLE, SAL)

Distributed Query

• If these tables are stored in one place, then we can, for example, use the following query to get the name and the salary of each employee who has worked on a project for more than 12 months:

    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE ASG.DUR > 12
    AND EMP.ENO = ASG.ENO
    AND PAY.TITLE = EMP.TITLE

• But if these tables are distributed over different sites, then executing this query requires a lot of additional processing. The DDBMS does this processing and lets the end user work as if he or she were the database's only user (transparency).
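
As a rough illustration of the centralized case, the sketch below loads the example schema into a single in-memory SQLite database and runs the query. The sample rows are invented for illustration and are not part of the slides; a DDBMS would have to produce the same answer when EMP, ASG and PAY live at different sites.

```python
import sqlite3

# In-memory database standing in for one centralized site.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
    CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER, RESP TEXT);
    CREATE TABLE PAY (TITLE TEXT PRIMARY KEY, SAL INTEGER);
""")

# Hypothetical sample rows, invented for illustration only.
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", [
    ("E1", "Ali", "Engineer"),
    ("E2", "Sara", "Analyst"),
])
conn.executemany("INSERT INTO ASG VALUES (?, ?, ?, ?)", [
    ("E1", "P1", 18, "Manager"),
    ("E2", "P2", 6, "Analyst"),
])
conn.executemany("INSERT INTO PAY VALUES (?, ?)", [
    ("Engineer", 4000),
    ("Analyst", 3000),
])


def long_assignments(connection):
    """Name and salary of employees assigned for more than 12 months."""
    query = """
        SELECT ENAME, SAL
        FROM EMP, ASG, PAY
        WHERE ASG.DUR > 12
          AND EMP.ENO = ASG.ENO
          AND PAY.TITLE = EMP.TITLE
    """
    return connection.execute(query).fetchall()


result = long_assignments(conn)
print(result)
```

Only the employee with an 18-month assignment satisfies the DUR condition, so the other sample row is filtered out.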

Distributed Database Transparency

• The concept of a DDB is to fragment the data and store each fragment at its own site (fragmentation).

• Data may be replicated at different sites (replication).

• The DDBMS hides these details from the users and makes the distribution transparent to them.

Distributed database transparency features:
• Distribution transparency
• Transaction transparency
• Failure transparency
• Performance transparency
• Heterogeneity transparency

Distributed DB Design

Top-down approach
• have a database
• how to split and allocate it to individual sites

Two issues in top-down design:
• Fragmentation
• Allocation

Multi-databases (or bottom-up)
• combine existing databases
• how to deal with heterogeneity & autonomy

Fragmentation

• Horizontal
  - Primary: depends on local attributes
  - Derived: depends on a foreign relation
• Vertical

Example

Employee relation E(name, loc, sal, …)

40% of queries:              40% of queries:
Qa: select *                 Qb: select *
    from E                       from E
    where loc = Sa               where loc = Sb
    and …                        and …

Motivation: two sites, Sa and Sb, with Qa issued at Sa and Qb issued at Sb.

E
#   Name   Loc   Sal
5   Joe    Sa    10
7   Sally  Sb    25
8   Tom    Sa    15

F = {F1, F2}

F1 = σ_{loc=Sa}(E), stored at Sa
#   Name   Loc   Sal
5   Joe    Sa    10
8   Tom    Sa    15

F2 = σ_{loc=Sb}(E), stored at Sb
#   Name   Loc   Sal
7   Sally  Sb    25

This is primary horizontal fragmentation.
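
The fragmentation above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: relations are lists of tuples, and the helper name `select` is hypothetical. It shows that selecting on loc yields F1 and F2, that the fragments are disjoint, and that their union rebuilds E.

```python
# Rows of E from the example: (id, name, loc, sal).
E = [
    (5, "Joe", "Sa", 10),
    (7, "Sally", "Sb", 25),
    (8, "Tom", "Sa", 15),
]


def select(relation, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Primary horizontal fragmentation on the local attribute loc.
F1 = select(E, lambda row: row[2] == "Sa")  # stored at site Sa
F2 = select(E, lambda row: row[2] == "Sb")  # stored at site Sb

# Reconstruction: the union of the fragments gives back E.
reconstructed = sorted(F1 + F2)

# Disjointness: no row belongs to both fragments.
overlap = set(F1) & set(F2)

print(F1, F2, reconstructed == sorted(E), overlap)
```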

[Figure: relation E divided by the predicates loc = SA, loc = SB, sal < 10 and sal ≥ 10; the candidate fragmentations are labelled F1, F2 and F3]

Qa: Select … loc = SA

Qb: Select … loc = SB

Prefer F2 to F1 and F3

Horizontal Fragmentation: peer-to-peer relationship (brothers)

[Figure: the University Faculties system fragmented horizontally into the systems of Faculty1, Faculty2 and Faculty3]

Vertical fragmentation

Example

E
#   NM     Loc   Sal
5   Joe    Sa    10
7   Sally  Sb    25
8   Fred   Sa    15
…

E1
#   NM     Loc
5   Joe    Sa
7   Sally  Sb
8   Fred   Sa
…

E2
#   Sal
5   10
7   25
8   15
…

R[T] ⇒ R1[T1], R2[T2], …, Rn[Tn], with Ti ⊆ T

Just like normalization of relations.

Vertical Fragmentation example

PROJ
PNO   PNAME               BUDGET   LOC
P1    Instrumentation     150000   Montreal
P2    Database Develop.   135000   New York
P3    CAD/CAM             250000   New York
P4    Maintenance         310000   Paris
P5    CAD/CAM             500000   Boston

PROJ1: information about project budgets
PNO   BUDGET
P1    150000
P2    135000
P3    250000
P4    310000
P5    500000

PROJ2: information about project names and locations
PNO   PNAME               LOC
P1    Instrumentation     Montreal
P2    Database Develop.   New York
P3    CAD/CAM             New York
P4    Maintenance         Paris
P5    CAD/CAM             Boston
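
A small sketch of the PROJ example, assuming relations are lists of tuples and using hypothetical helper names (`project`, `join_on_key`). Each vertical fragment keeps the key PNO, which is what allows PROJ to be rebuilt by joining the fragments.

```python
# PROJ rows from the example: (PNO, PNAME, BUDGET, LOC).
COLUMNS = ("PNO", "PNAME", "BUDGET", "LOC")
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
    ("P5", "CAD/CAM", 500000, "Boston"),
]


def project(relation, columns, wanted):
    """Relational projection: keep only the wanted columns, in that order."""
    positions = [columns.index(name) for name in wanted]
    return [tuple(row[i] for i in positions) for row in relation]


# Both fragments repeat the key PNO.
PROJ1 = project(PROJ, COLUMNS, ("PNO", "BUDGET"))
PROJ2 = project(PROJ, COLUMNS, ("PNO", "PNAME", "LOC"))


def join_on_key(left, right):
    """Join two fragments on their first column (the shared key)."""
    right_by_key = {row[0]: row[1:] for row in right}
    return [row + right_by_key[row[0]] for row in left if row[0] in right_by_key]


# The join yields (PNO, BUDGET, PNAME, LOC); reorder to PROJ's column order.
rebuilt = [
    (pno, pname, budget, loc)
    for pno, budget, pname, loc in join_on_key(PROJ1, PROJ2)
]

print(rebuilt == PROJ)
```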

Grouping Attributes

Example: E(#, NM, LOC, SAL)

E1(#, NM, LOC), E2(#, SAL)

or

E1(#, NM), E2(#, LOC), E3(#, SAL)

Which is the right vertical fragmentation?

Vertical Fragmentation: branch relationship (parents and son)

[Figure: a relation fragmented vertically into two fragments, one holding (Salary, allowances, Tax) and one holding (Name, address, grade)]

Hybrid Fragmentation

R
  HF → R1
    VF → R11
    VF → R12
  HF → R2
    VF → R21
    VF → R22
    VF → R23

Allocation

Example: E is fragmented into F1 = σ_{loc=Sa}(E) and F2 = σ_{loc=Sb}(E).

• Do we replicate fragments?
• Where do we place each copy of each fragment?

[Figure: the fragments of E placed on Site a, Site b and Site c; F1 appears twice and F2 once]

Allocation Alternatives

• Non-replicated
  - partitioned: each fragment resides at only one site
• Replicated
  - fully replicated: each fragment at each site
  - partially replicated: each fragment at some of the sites
• Rule: if (read-only queries) / (update queries) ≥ 1, replication is advantageous; otherwise replication may cause problems.
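
The rule can be written as a one-line check. The sketch below assumes the comparison is "greater than or equal to 1" and that the two query counts come from some workload measurement; the function name and the treatment of a workload with no updates are illustrative choices, not part of the slides.

```python
def replication_advisable(read_only_queries, update_queries):
    """Rule of thumb: replicate when read-only / update queries >= 1."""
    if update_queries == 0:
        # Assumption: with no updates, copies can never diverge.
        return True
    return read_only_queries / update_queries >= 1


print(replication_advisable(900, 100))  # read-mostly workload
print(replication_advisable(100, 900))  # update-heavy workload
```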

Optimization problem

• What is the best placement of fragments and/or best number of copies to:
  - minimize query response time
  - maximize throughput
  - minimize "some cost"
  - …
• Subject to constraints:
  - Available storage
  - Available bandwidth, processing power, …
  - Keep 90% of response time below X
  - …

Very hard problem
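
To make the shape of the problem concrete, here is a brute-force sketch for a tiny, non-replicated instance. Everything in it is an assumption made for illustration: the sizes, capacities and access counts are invented, and the cost model simply counts accesses issued from a site that does not hold the fragment. The number of candidate placements grows as (number of sites) to the power (number of fragments), which is one reason exhaustive search does not scale.

```python
from itertools import product

SITES = ["a", "b", "c"]
FRAGMENT_SIZE = {"F1": 2, "F2": 1}        # storage units (hypothetical)
SITE_CAPACITY = {"a": 2, "b": 2, "c": 1}  # storage units (hypothetical)
# ACCESSES[site][fragment]: how often queries issued at site touch fragment.
ACCESSES = {
    "a": {"F1": 40, "F2": 5},
    "b": {"F1": 10, "F2": 40},
    "c": {"F1": 5, "F2": 10},
}


def remote_accesses(placement):
    """Cost model: each access from a site not holding the fragment costs 1."""
    return sum(
        count
        for site, per_fragment in ACCESSES.items()
        for fragment, count in per_fragment.items()
        if placement[fragment] != site
    )


def fits(placement):
    """Storage constraint: fragments placed on a site must fit its capacity."""
    used = {site: 0 for site in SITES}
    for fragment, site in placement.items():
        used[site] += FRAGMENT_SIZE[fragment]
    return all(used[site] <= SITE_CAPACITY[site] for site in SITES)


def best_placement():
    """Enumerate every placement and keep the cheapest feasible one."""
    fragments = list(FRAGMENT_SIZE)
    candidates = (
        dict(zip(fragments, sites))
        for sites in product(SITES, repeat=len(fragments))
    )
    feasible = [placement for placement in candidates if fits(placement)]
    return min(feasible, key=remote_accesses)


best = best_placement()
print(best, remote_accesses(best))
```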

Static data allocation & Dynamic data allocation

Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow in size).

Dynamic data allocation: the location of the data changes dynamically as the data expands, which usually follows from the nature of the systems producing the data.

Data placement problems can be treated through two types of models:

• Adaptive models: apply when the location changes because of system activity (online systems with online data storage, for example airline bookings). These models keep additional temporary copies of the data, and processors then work with these duplicate copies (replication).

• Non-adaptive models: solve the dynamic allocation problem when the system is established or when it is reorganized.

Replication

Replication stores copies of the same data in more than one location (site); these copies must be updated consistently despite their distance from each other.

When the copies are updated is governed by one of two techniques:
• Lazy replication: the other copies are updated after work on one of the copies (the master copy) has completed. The update is therefore done outside the boundaries of the transaction.
• Eager replication: the replicated data is updated within the transaction boundaries, while working on one of the copies.

Where updates are applied is governed by one of two techniques:
- Central update (initial copy, primary copy): update the primary copy first and then update the secondary copies. Because every update passes through one copy, updates do not have to be synchronized across sites, which makes consistency easier to control but may create a bottleneck.
- Update everywhere: updates may be applied to the copy at any site, so all copies have an equal opportunity to be updated.
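
The difference between eager and lazy propagation can be sketched with a toy primary-copy model. This is an illustration only: copies are plain dictionaries, a "transaction" is a single write, and there is no real atomicity, failure handling or networking. The class and method names are hypothetical.

```python
class PrimaryCopyReplica:
    """A primary copy plus secondaries, each modelled as a plain dict."""

    def __init__(self, secondary_count, eager):
        self.primary = {}
        self.secondaries = [{} for _ in range(secondary_count)]
        self.eager = eager
        self.pending = []  # updates not yet propagated (lazy mode only)

    def write(self, key, value):
        """One 'transaction' that updates a single item on the primary."""
        self.primary[key] = value
        if self.eager:
            # Eager: secondaries are updated inside the transaction.
            for copy in self.secondaries:
                copy[key] = value
        else:
            # Lazy: remember the update and propagate it after commit.
            self.pending.append((key, value))

    def propagate(self):
        """Lazy propagation step, run outside any transaction."""
        for key, value in self.pending:
            for copy in self.secondaries:
                copy[key] = value
        self.pending.clear()


eager = PrimaryCopyReplica(secondary_count=2, eager=True)
eager.write("x", 1)

lazy = PrimaryCopyReplica(secondary_count=2, eager=False)
lazy.write("x", 1)
stale_read = lazy.secondaries[0].get("x")  # secondary has not seen the update
lazy.propagate()
fresh_read = lazy.secondaries[0].get("x")

print(eager.secondaries, stale_read, fresh_read)
```

In the lazy case a read from a secondary between the write and the propagation step sees old data, which is the consistency cost paid for keeping the transaction short.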

Where \ When         Eager                                  Lazy

Primary copy         Early solutions in Ingres              Sybase/IBM/Oracle
                                                            Placement strategies
                                                            Serialization-graph based

Update everywhere    ROWA/ROWAA                             Oracle Advanced Replication
                     Quorum based                           Weak consistency strategies
                     Oracle Synchronous Replication

Distributed Query Processing

The aim of query processing in a distributed database is to make work on distributed data appear like work on a single database system. The problem can be divided into several levels, each dealing with a different aspect of the data. Query processing takes an SQL or OQL statement as input and passes it through several stages until the query is executed.

Query Decomposition & Data Localization
• The first stage of distributed query processing decomposes the query into relational algebra; the second stage localizes the data by mapping the query onto the fragments.

Query Optimization
• The third stage seeks an optimal execution of the query by making the execution cost as small as possible and removing unneeded expressions.
• Query optimization is one of the most important aspects of dealing with queries; many algorithms have been proposed for it.
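
A minimal sketch of the localization step, reusing the horizontal fragmentation of E by loc from earlier in the slides. The fragment schema dictionary and the function names are assumptions made for illustration. A query on E becomes the union of the same query on each fragment, and fragments whose defining predicate contradicts the query's filter are pruned because they cannot contribute rows.

```python
# Fragment schema: each fragment of E, its site, and its defining loc value.
FRAGMENTS = {
    "F1": {"site": "Sa", "loc": "Sa"},
    "F2": {"site": "Sb", "loc": "Sb"},
}


def localize(loc_filter=None):
    """Return the fragments that a query on E has to visit."""
    return [
        name
        for name, info in FRAGMENTS.items()
        if loc_filter is None or info["loc"] == loc_filter
    ]


def fragment_query(loc_filter=None):
    """Render the localized query as text, one sub-query per fragment."""
    where = f" WHERE loc = '{loc_filter}'" if loc_filter else ""
    parts = [f"SELECT * FROM {name}{where}" for name in localize(loc_filter)]
    return " UNION ALL ".join(parts)


print(fragment_query())
print(fragment_query("Sb"))
```

With a filter on loc, only one site has to be contacted, which is the kind of saving the optimization stage then builds on.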

[Figure: layers of distributed query processing]

Control site:
  Calculus query on distributed relations
    → Query decomposition (uses the global schema)
  Algebraic query on distributed relations
    → Data localization (uses the fragment schema)
  Fragment query
    → Global optimization (uses statistics on fragments)
  Optimized fragment query with communication operations

Local sites:
    → Local optimization (uses the local schema)
  Optimized local queries

Concurrency Control in distributed database

• Concurrency control in databases is the set of activities that keep transactions consistent across all of the system's data.

• The DDBMS is responsible for synchronizing data that is distributed over the sites of the network, each of which runs programs that work on its own data. In this situation, controlling concurrent access to the distributed data is one of the more complex issues.

• Four techniques are used to control concurrency in a distributed database:
  - Locking techniques
  - Timestamps
  - Optimistic algorithms: all operations on the data are performed without restriction, except for update operations, which update the local data first
  - Complex algorithms for timestamps

Distributed Concurrency Control

• Nonreplicated Scheme
  - Each site maintains a local lock manager to administer lock and unlock requests for local data
  - Deadlock handling is more complex
• Single-Coordinator Approach
  - The system maintains a single lock manager that resides in a single chosen site
  - Can be used with replicated data
  - Advantages
    • simple implementation
    • simple deadlock handling
  - Disadvantages
    • bottleneck
    • vulnerability

Distributed Concurrency Control

• Majority Protocol
  - A lock manager at each site
  - When a transaction wishes to lock a data item Q, which is replicated in n different sites, it must send a lock request to more than half of the n sites in which Q is stored
  - Complex to implement
  - Difficult to handle deadlocks
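
The majority rule can be sketched as follows. This is a simplified, single-process illustration with hypothetical class and function names: requests are synchronous, a denied request does not wait, and there is no deadlock detection. For simplicity it asks every site and counts the grants, whereas the protocol only requires grants from more than half of the n sites.

```python
class SiteLockManager:
    """Lock manager of one site: at most one holder per data item."""

    def __init__(self):
        self.holder = {}

    def request(self, item, txn):
        """Grant the lock if the item is free or already held by txn."""
        if self.holder.get(item) in (None, txn):
            self.holder[item] = txn
            return True
        return False

    def release(self, item, txn):
        """Release the lock if txn is the current holder."""
        if self.holder.get(item) == txn:
            del self.holder[item]


def majority_lock(sites, item, txn):
    """Try to lock item at more than half of the sites that store it."""
    granted = [site for site in sites if site.request(item, txn)]
    if len(granted) * 2 > len(sites):
        return True
    # No majority: give back the partial locks so others can proceed.
    for site in granted:
        site.release(item, txn)
    return False


sites = [SiteLockManager() for _ in range(5)]
first = majority_lock(sites, "Q", "T1")   # T1 obtains a majority
second = majority_lock(sites, "Q", "T2")  # T2 cannot while T1 holds Q

for site in sites:
    site.release("Q", "T1")
third = majority_lock(sites, "Q", "T2")   # succeeds once T1 has released

print(first, second, third)
```

Because two majorities of the same n sites always overlap in at least one site, two transactions can never both hold a majority lock on Q at the same time.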

Page 3: Distributed Database Systems Dr. Mohamed Osman Hegazi

Dr Mohamed Osman Hegazi

1 The development of computer network promotes de-centralization

2 In a company the database organization might reflect the organizational structure which is distributed into units Each unit maintains its own database

3 Sharing of data can be achieved by developing a distributed database system whichbull Makes data accessible by all unitsbull Stores data close to where it is most

frequently used

Motivation for Distributed Database

Dr Mohamed Osman Hegazi

DDBMS Advantages

bull Data are located near ldquogreatest demandrdquo sitebull Faster data accessbull Faster data processingbull Growth facilitationbull Improved communicationsbull Reduced operating costsbull User-friendly interfacebull Less danger of a single-point failurebull Processor independence

Dr Mohamed Osman Hegazi

DDBMS Disadvantages

bull Complexity of management and controlbull Securitybull Lack of standardsbull Increased storage requirementsbull Greater difficulty in managing the data

environmentbull Increased training cost

Dr Mohamed Osman Hegazi

The concept of DDBA DDBS is not a collection of files that can be individually stored at each node of computer network To form a DDBS files should not only be logically related but there should be structure among the files and access should be via a common interface

Dr Mohamed Osman Hegazi

Distributed Database Management Systems

Dr Mohamed Osman Hegazi

An ExampleEMP(ENO ENAME TITLE)

ASG(ENO PNO DUR RESP)

PROJ(PNO PNAME BUDGET)PAY(TITLESAL)

Dr Mohamed Osman Hegazi

Distributed Query

bullIf these table is stored in one place then we can ldquofor examplerdquo using the following query to get the name and the salary of the employee who works more

than 12 months SELECT ENAME SALFROM EMP ASG PAYWHERE ASG DUR gt12AND EMPENO=ASGENOAND PAYTITLE=EMPTITLE

But if these table are distributed over deferent site then the execution of this query needs allot of process to be done DDMS do this process and let the end user feel like databasersquos only user (transparence)

Dr Mohamed Osman Hegazi

bullThe concepts of DDB is to fragment the data and store each fragment on its site

bullData may be replicated on different site (replication)

bullDDBMS hide these details from the user and makes the distribution transparent to the users Distributed Database Transparency FeaturesDistribution transparency Transaction transparencyFailure transparency Performance transparency Heterogeneity transparency

Distributed Database Transparency

Dr Mohamed Osman Hegazi

Distributed DB Design

Top-down approach bull have a databasebull how to split and allocate to individual sitesTwo issues in top-down designFragmentationAllocation

Multi-databases (or bottom-up)bull combine existing databasesbull how to deal with heterogeneity amp autonomy

Dr Mohamed Osman Hegazi

Fragmentationbull Horizontal Primary

depends on local attributesR Derived

depends on foreign relation

bull Vertical

R

Dr Mohamed Osman Hegazi

Example

Employee relation E (namelocsalhellip) 40 of queries 40 of queries Qa select Qb select

from E from Ewhere loc=Sa where loc=Sbandhellip and

Motivation Two sites Sa Sb Qa Qb

Sa Sb

Dr Mohamed Osman Hegazi

Name Loc Sal578

Sa 10Sally Sb 25Tom Sa 15

Joe

58

Sa 10Tom Sa 15Joe 7 Sb 25Sally

F = F1F2

At Sa At Sb

E

F1 = loc=Sa(E) F2 = loc=Sb(E)

primary horizontal fragmentation

Dr Mohamed Osman Hegazi

Loc=SA sal lt 10

Loc=SA

sal 10

Loc=SB sal lt 10

Loc=SB

sal 10

F1

F3F2

Qa Select hellip loc = SA

Qb Select hellip loc = SB

Prefer F2 to F1 and F3

Dr Mohamed Osman Hegazi

(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty1(

(ΔϴόϣΎΠϟΕΎϴϠϜϟϢψϧUniversity Faculties system(

(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty2(

(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty3(

Horizontal Fragmentation Peer to peer relationship ndash brothers

Dr Mohamed Osman Hegazi

Vertical fragmentation

E1

NM Loc Sal5 Joe Sa 107 Sally Sb 258 Fred Sa 15hellip

NM Loc5 Joe Sa7 Sally Sb8 Fred Sahellip

Sal5 107 258 15hellip

E

E2

Example

R[T] R1[T1] R2[T2]hellip Rn[Tn] Ti T

Just like normalization of relations

Dr Mohamed Osman Hegazi

Vertical Fragmentation example

PROJ1 information about project budgets

PROJ2 information about project names and locations

PNO BUDGET

P1 150000

P3 250000P2 135000

P4 310000P5 500000

PNO PNAME LOC

P1 Instrumentation Montreal

P3 CADCAM New YorkP2 Database DevelopNew York

P4 Maintenance ParisP5 CADCAM Boston

PROJ1 PROJ2

New YorkNew York

PROJ

PNO PNAME BUDGET LOC

P1 Instrumentation 150000 Montreal

P3 CADCAM 250000P2 Database Develop135000

P4 Maintenance 310000 ParisP5 CADCAM 500000 Boston

New YorkNew York

Dr Mohamed Osman Hegazi

E1(NMLOC)E2(SAL)

ExampleE(NMLOCSAL) E1(NM)

E2(LOC)E3(SAL)

Which is the right vertical fragmentationhellip

Grouping Attributes

Dr Mohamed Osman Hegazi

Vertical Fragmentation branch relationship ndash parents and son

ΩήϓϷϥϮΌη

ΕΎΒΗή

Ϥϟ

(Sal

ary

allo

wan

ces

Tax

(

ΎϨϴϴόΘ

ϟΕ

(N

ame ad

dre

ss g

rade(

Dr Mohamed Osman Hegazi

Hybrid Fragmentation

R

HFHF

R1

VF VFVFVFVF

R11 R12 R21 R22 R23

R2

Dr Mohamed Osman Hegazi

AllocationExample E F1 = loc=Sa(E) F2 = loc=Sb(E)

Site a

Site b

Fragment E

Do we replicate fragments

Where do we place each copy of each fragment

Site c

F1F1

F2

Dr Mohamed Osman Hegazi

Allocation Alternativesbull Non-replicated

ndash partitioned each fragment resides at only one site

bull Replicatedndash fully replicated each fragment at each sitendash partially replicated each fragment at some of the

sitesbull Rule

If replication is advantageous

otherwise replication may cause problems

read - only queriesupdate queries

1

Dr Mohamed Osman Hegazi

Optimization problem

bull What is the best placement of fragments andor best number of copies tondash minimize query response timendash maximize throughputndash minimize ldquosome costrdquondash

bull Subject to constraintsndash Available storagendash Available bandwidth processing powerhellipndash Keep 90 of response time below Xndash

Very hard problem

Dr Mohamed Osman Hegazi

Static Data Allocation & Dynamic Data Allocation

Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow in size).

Dynamic data allocation: the location of the data changes dynamically as the data expands, which usually follows from the nature of the systems that produce the data. Data placement problems can be treated with two types of models:

• Adaptive models: apply when the location changes because of system activity (online systems with online data storage, for example airline bookings). These models keep additional temporary copies of the data and then manage those copies with the techniques used for duplicate copies (replication).
• Non-adaptive models: solve the allocation when the system is established or when it is reorganized.


Replication

Replication stores copies of the same data at more than one location (site). The copies must then be updated consistently, regardless of the distance between them. When the copies are updated is controlled by one of two techniques:

• Lazy replication: the other copies are updated after work on one of the copies (the master copy) has completed. The update is therefore propagated outside the transaction boundaries.
• Eager replication: the replicated data is updated within the transaction boundaries, while work on one of the copies is still in progress.

Where the update is applied is decided in one of two ways:

– Central update (primary copy): update the primary copy first and then the secondary copies. Because every update passes through one copy, no synchronization among sites is needed, which makes consistency easier to control, but the primary copy may become a bottleneck.
– Update everywhere: any copy may be updated, so all copies have an equal opportunity to receive updates.
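The difference between eager and lazy propagation under a primary copy can be sketched as follows. This is a toy, single-process model: the class name `PrimaryCopyStore` and its methods are hypothetical, the copies are plain in-memory dictionaries, and failures and concurrent transactions are ignored.

```python
class PrimaryCopyStore:
    """Toy primary-copy replication over in-memory dicts (hypothetical model)."""

    def __init__(self, n_secondaries, eager):
        self.primary = {}
        self.secondaries = [{} for _ in range(n_secondaries)]
        self.eager = eager
        self.pending = []  # updates not yet propagated (used in lazy mode only)

    def write(self, key, value):
        """One 'transaction' that updates a single item on the primary copy."""
        self.primary[key] = value
        if self.eager:
            # Eager: secondaries are updated before the transaction is considered done.
            for copy in self.secondaries:
                copy[key] = value
        else:
            # Lazy: the transaction ends here; propagation happens later.
            self.pending.append((key, value))

    def propagate(self):
        """Apply queued updates to every secondary, in the order they were written."""
        for key, value in self.pending:
            for copy in self.secondaries:
                copy[key] = value
        self.pending.clear()

    def read_secondary(self, index, key):
        """Read an item from one secondary copy (None if it has not arrived yet)."""
        return self.secondaries[index].get(key)


eager_store = PrimaryCopyStore(n_secondaries=2, eager=True)
eager_store.write("x", 1)

lazy_store = PrimaryCopyStore(n_secondaries=2, eager=False)
lazy_store.write("x", 1)
stale_read = lazy_store.read_secondary(0, "x")  # secondary has not seen the update yet
lazy_store.propagate()
fresh_read = lazy_store.read_secondary(0, "x")
```

In the lazy store a read from a secondary can return stale data until `propagate` runs, which is the price paid for shorter transactions.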


Where / when: Eager vs. Lazy

Primary copy
– Eager: early solutions in Ingres
– Lazy: Sybase/IBM/Oracle, placement strategies, serialization-graph based

Update everywhere
– Eager: ROWA/ROWAA, quorum based, Oracle synchronous replication
– Lazy: Oracle Advanced Replication, weak consistency strategies


Distributed Query Processing

The aim of query processing over distributed data is to make work on the distributed data look like work on a single database system. The problem can be broken into several levels according to the data issues involved. Query processing takes SQL or OQL statements as input and passes them through several stages until the query is executed.

Query Decomposition & Data Localization
• The first stage analyzes the query and translates it into relational algebra. The second stage localizes the data by mapping the query onto the fragments.

Query Optimization
• The third stage seeks an efficient execution of the query by keeping the execution cost as low as possible and removing unneeded expressions.
• Query optimization is one of the most important aspects of query handling, and many algorithms have been proposed for it.
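As a rough illustration of the localization stage, the sketch below replaces the global relation E by its fragments and skips every fragment whose defining predicate contradicts the query. The catalog layout and the names `FRAGMENTS`, `localize`, and `run_query` are hypothetical, and the rows are sample data modeled on the employee relation E from the slides.

```python
# Hypothetical fragment catalog: each fragment of E holds the rows for one LOC value.
FRAGMENTS = {
    "F1": {"site": "Sa", "loc": "Sa",
           "rows": [{"NM": "Joe", "LOC": "Sa", "SAL": 10},
                    {"NM": "Fred", "LOC": "Sa", "SAL": 15}]},
    "F2": {"site": "Sb", "loc": "Sb",
           "rows": [{"NM": "Sally", "LOC": "Sb", "SAL": 25}]},
}


def localize(loc=None):
    """Data localization: E becomes the union of its fragments, minus the
    fragments whose defining predicate contradicts the query's LOC condition."""
    return [name for name, frag in FRAGMENTS.items()
            if loc is None or frag["loc"] == loc]


def run_query(loc=None, min_sal=0):
    """Evaluate SELECT NM FROM E WHERE LOC = loc AND SAL >= min_sal,
    scanning only the fragments kept by localize()."""
    result = []
    for name in localize(loc):
        # In a real system this scan would be shipped to FRAGMENTS[name]["site"].
        for row in FRAGMENTS[name]["rows"]:
            if row["SAL"] >= min_sal:
                result.append(row["NM"])
    return result
```

For example, `run_query("Sa", 12)` touches only F1, so no request has to be sent to site Sb.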


(Figure: layers of distributed query processing)

Control site:
• Calculus query on distributed relations → Query Decomposition (uses the global schema)
• Algebraic query on distributed relations → Data Localization (uses the fragment schema)
• Fragment query → Global Optimization (uses statistics on fragments)
• Optimized fragment query with communication operations

Local sites:
• Local Optimization (uses the local schema) → Optimized local queries


Concurrency Control in Distributed Databases

• Concurrency control in databases is the set of activities that keep transactions consistent across all of the system's data.
• A DDBMS must synchronize data that is distributed over network sites, each of which runs programs that work on its own data. This makes concurrency control over distributed data one of the more complex issues.
• Four techniques are used to control concurrency in a distributed database:
  – Locking techniques
  – Timestamps
  – Optimistic algorithms: all operations on the data are allowed to proceed, except that an operation that updates data applies the update to the local data first
  – Complex algorithms for timestamps


Distributed Concurrency Control

• Nonreplicated Scheme
  – Each site maintains a local lock manager to administer lock and unlock requests for local data
  – Deadlock handling is more complex
• Single-Coordinator Approach
  – The system maintains a single lock manager that resides at a single chosen site
  – Can be used with replicated data
  – Advantages
    • simple implementation
    • simple deadlock handling
  – Disadvantages
    • bottleneck
    • vulnerability


Distributed Concurrency Control

• Majority Protocol
  – A lock manager at each site
  – When a transaction wishes to lock a data item Q that is replicated at n different sites, it must send a lock request to more than half of the n sites at which Q is stored
  – Complex to implement
  – Difficult to handle deadlocks
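A minimal sketch of the majority rule, assuming exclusive locks only and an in-process stand-in for the network. The names `SiteLockManager` and `majority_lock` are hypothetical. As a simplification, the sketch asks every replica site and releases its partial locks when no majority is reached, whereas a real implementation would typically wait for the missing grants, which is where the deadlock difficulty comes from.

```python
class SiteLockManager:
    """Lock manager at one site; grants at most one holder per data item."""

    def __init__(self):
        self.locks = {}  # data item -> id of the transaction holding the lock

    def request(self, item, txn):
        """Grant the lock if the item is free or already held by the same transaction."""
        if self.locks.get(item, txn) == txn:
            self.locks[item] = txn
            return True
        return False

    def release(self, item, txn):
        """Release the lock only if this transaction actually holds it."""
        if self.locks.get(item) == txn:
            del self.locks[item]


def majority_lock(item, txn, replica_sites):
    """Try to lock `item` at more than half of the sites that store a replica."""
    granted = [site for site in replica_sites if site.request(item, txn)]
    if len(granted) > len(replica_sites) // 2:
        return True
    # No majority: give back the partial locks so other transactions can proceed.
    for site in granted:
        site.release(item, txn)
    return False


replicas = [SiteLockManager() for _ in range(3)]
t1_ok = majority_lock("Q", "T1", replicas)  # T1 locks Q at all three sites
t2_ok = majority_lock("Q", "T2", replicas)  # T2 cannot reach a majority
```

Because any two majorities of the same n sites overlap in at least one site, two transactions can never both hold a majority lock on Q at the same time.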




Which is the right vertical fragmentationhellip

Grouping Attributes

Dr Mohamed Osman Hegazi

Vertical Fragmentation: branch relationship (parent and children)

Figure: Personnel Affairs is fragmented vertically into
• Payroll (salary, allowances, tax)
• Appointments (name, address, grade)

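Hybrid Fragmentation

Figure: R is first fragmented horizontally (HF) and each result is then fragmented vertically (VF).
• R → HF → R1
  - R1 → VF → R11, R12
• R → HF → R2
  - R2 → VF → R21, R22, R23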

Allocation

Example: $E \Rightarrow F_1 = \sigma_{loc=Sa}(E)$, $F_2 = \sigma_{loc=Sb}(E)$

Figure: E is fragmented, and two copies of F1 and one copy of F2 are placed across Site a, Site b, and Site c.

• Do we replicate fragments?
• Where do we place each copy of each fragment?

Allocation Alternatives
• Non-replicated
  - partitioned: each fragment resides at only one site
• Replicated
  - fully replicated: each fragment at each site
  - partially replicated: each fragment at some of the sites
• Rule
  - If $\frac{\text{read-only queries}}{\text{update queries}} \geq 1$, replication is advantageous; otherwise replication may cause problems

Optimization problem

• What is the best placement of fragments and/or best number of copies to
  - minimize query response time
  - maximize throughput
  - minimize "some cost"
  - …
• Subject to constraints
  - Available storage
  - Available bandwidth, processing power, …
  - Keep 90% of response time below X
  - …

Very hard problem
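To make the shape of this optimization problem concrete, here is a toy brute-force sketch in Python. Everything in it is an assumption made for illustration: the fragment sizes, site capacities, access frequencies, and the cost model (one unit per remote access, no replication) are hypothetical and do not come from the slides. Real allocation models are far richer, which is why the problem is very hard.

```python
from itertools import product

# Hypothetical inputs: fragment sizes, storage capacity per site, and how often
# queries issued at a site read a given fragment.
fragments = {"F1": 2, "F2": 1}
capacity = {"a": 2, "b": 2, "c": 1}
access = {("a", "F1"): 40, ("b", "F2"): 40, ("c", "F1"): 10, ("c", "F2"): 10}


def cost(placement):
    """Assumed cost model: one unit of communication per remote access."""
    return sum(
        freq for (site, frag), freq in access.items() if placement[frag] != site
    )


def feasible(placement):
    """Check the storage constraint at every site."""
    used = {site: 0 for site in capacity}
    for frag, site in placement.items():
        used[site] += fragments[frag]
    return all(used[site] <= capacity[site] for site in capacity)


def best_placement():
    """Enumerate every non-replicated placement and keep the cheapest feasible one."""
    names = list(fragments)
    candidates = (
        dict(zip(names, sites)) for sites in product(capacity, repeat=len(names))
    )
    return min((p for p in candidates if feasible(p)), key=cost)


best = best_placement()
print(best, cost(best))
```

Exhaustive search only works here because there are nine candidate placements. The number of candidates grows exponentially with the number of fragments, and allowing replication enlarges the search space further.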

Static data allocation & Dynamic data allocation

Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow in size).

Dynamic data allocation: the location of the data changes dynamically as the data expands, which usually follows from the nature of the systems producing the data. Data-location problems can be treated through two types of models:

• Adaptive models: apply when the change of location is caused by system activity (online systems with online data storage, for example airline bookings). These models keep additional temporary copies of the data and process them as duplicate copies (replication).

• Non-adaptive models: solve dynamic allocation when the system is established or when it is reorganized.

Replication

Replication stores copies of the same data at more than one location (site); these copies must be updated consistently despite their distance from each other. When the copies are updated is controlled by one of two techniques:

• Lazy replication: the other copies are updated after work on one of the copies (the master copy) has completed, so the update is done outside the transaction boundaries.
• Eager replication: the replicated data is updated within the transaction boundaries, while the transaction works on one of the copies.

Where updates are applied follows one of two approaches:

- Central update (initial copy, primary copy): update the primary copy first and then the secondary copies. Because every update goes through the primary copy, updates need not be synchronized between sites, which simplifies consistency control, but the primary copy may become a bottleneck.
- Update everywhere: any copy may be updated, so all copies have an equal opportunity for update.
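The difference between eager and lazy propagation with a primary copy can be sketched as follows. This is a single-process illustration only; the class `PrimaryCopyStore` and its methods are hypothetical names, and a real system would propagate updates over the network and deal with failures.

```python
class PrimaryCopyStore:
    """Toy primary-copy replication with eager or lazy propagation."""

    def __init__(self, n_secondaries, eager):
        self.primary = {}
        self.secondaries = [{} for _ in range(n_secondaries)]
        self.eager = eager
        self.pending = []  # updates applied at the primary but not yet propagated

    def write(self, key, value):
        """Apply an update at the primary copy first."""
        self.primary[key] = value
        if self.eager:
            # Eager: secondaries are updated as part of the same operation.
            for copy in self.secondaries:
                copy[key] = value
        else:
            # Lazy: remember the update and propagate it later.
            self.pending.append((key, value))

    def propagate(self):
        """Lazy propagation step, run after the updating work has finished."""
        for key, value in self.pending:
            for copy in self.secondaries:
                copy[key] = value
        self.pending.clear()


lazy = PrimaryCopyStore(n_secondaries=2, eager=False)
lazy.write("x", 1)
print(lazy.secondaries)  # secondaries are still stale here
lazy.propagate()
print(lazy.secondaries)
```

In the lazy case a reader of a secondary copy can see stale data until `propagate` runs, which is the price paid for shorter update operations.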

Where / When        | Eager                                           | Lazy
Primary Copy        | Early solutions in Ingres                       | Sybase/IBM/Oracle; Placement Strat.; Serialization-Graph based
Update Everywhere   | ROWA/ROWAA; Quorum based; Oracle Synchr. Repl.  | Oracle Advanced Repl.; Weak consistency Strat.

Distributed Query Processing

The aim of query processing over distributed data is to make working with the distributed data look like working with a single database system. The problem can be divided into several levels according to the problems of the data. Query processing takes an SQL or OQL statement as input and passes it through several stages until the query is executed.

Query Decomposition & Data Localization
• The first stage of distributed query processing decomposes the query into relational algebra; the second stage localizes the data by mapping the query onto the fragments it needs.

Query Optimization
• The third stage seeks an optimal execution of the query by keeping the execution cost as low as possible and deleting unneeded expressions.
• Query optimization is one of the most important aspects of dealing with queries, and many algorithms have been proposed for it.

Figure: layers of distributed query processing

Input: calculus query on distributed relations

Control site
1. Query decomposition (uses the global schema) → algebraic query on distributed relations
2. Data localization (uses the fragment schema) → fragment query
3. Global optimization (uses statistics on fragments) → optimized fragment query with communication operations

Local sites
4. Local optimization (uses the local schema) → optimized local queries
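The data localization step can be illustrated with a small Python sketch that prunes fragments whose defining predicate contradicts the query. The catalog `FRAGMENT_SCHEMA` and the function `localize` are assumptions made for this example, reusing the earlier fragments F1 and F2 of the employee relation E.

```python
# Hypothetical fragment schema: each horizontal fragment of E records the
# value of loc that defines it and the site where it is stored.
FRAGMENT_SCHEMA = {
    "F1": {"loc": "Sa", "site": "Sa"},
    "F2": {"loc": "Sb", "site": "Sb"},
}


def localize(query_loc=None):
    """Return the (fragment, site) pairs a query on E must visit.

    query_loc is the constant in a `loc = ...` condition, or None when the
    query has no condition on loc.
    """
    return sorted(
        (name, info["site"])
        for name, info in FRAGMENT_SCHEMA.items()
        if query_loc is None or info["loc"] == query_loc
    )


print(localize("Sa"))  # only the fragment whose predicate matches the query
print(localize())      # no condition on loc: every fragment is needed
```

A query with `loc = Sa` only needs F1, so nothing has to be fetched from site Sb; this kind of pruning is what the fragment schema makes possible.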

Concurrency Control in distributed database

• Concurrency control in databases comprises the activities that keep transactions consistent across all of the system's data.
• The DDBMS takes care of synchronizing data that is distributed over the sites of the network, each of which runs programs that work on its own data. In this situation, controlling concurrency over the distributed data is one of the more complex issues.
• Four techniques are used to control concurrency in a distributed database:
  - Locking techniques
  - Timestamps
  - Optimistic algorithm: all operations on the data are performed, except that an operation that updates the data first updates the local data
  - Complex algorithm for timestamps

Distributed Concurrency Control
• Nonreplicated Scheme
  - Each site maintains a local lock manager to administer lock and unlock requests for local data
  - Deadlock handling is more complex
• Single-Coordinator Approach
  - The system maintains a single lock manager that resides in a single chosen site
  - Can be used with replicated data
  - Advantages
    • simple implementation
    • simple deadlock handling
  - Disadvantages
    • bottleneck
    • vulnerability

Distributed Concurrency Control
• Majority Protocol
  - A lock manager at each site
  - When a transaction wishes to lock a data item Q that is replicated at n different sites, it must send a lock request to more than half of the n sites at which Q is stored
  - complex to implement
  - difficult to handle deadlocks
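The majority rule can be sketched in Python as follows. This is a simplified, single-process illustration: the names `SiteLockManager` and `majority_lock` are hypothetical, the sketch asks every site rather than a chosen subset, and it ignores message delays, failures, and deadlock handling.

```python
class SiteLockManager:
    """Local lock manager of one site holding a replica of the data."""

    def __init__(self):
        self.locks = {}  # data item -> transaction currently holding the lock

    def request(self, item, txn):
        """Grant the lock if the item is free or already held by txn."""
        holder = self.locks.get(item)
        if holder is None or holder == txn:
            self.locks[item] = txn
            return True
        return False

    def release(self, item, txn):
        """Release the lock only if txn is the holder."""
        if self.locks.get(item) == txn:
            del self.locks[item]


def majority_lock(sites, item, txn):
    """Lock item for txn if more than half of the replica sites grant it."""
    granted = [site for site in sites if site.request(item, txn)]
    if len(granted) > len(sites) // 2:
        return True
    # No majority: give back the locks that were obtained.
    for site in granted:
        site.release(item, txn)
    return False


sites = [SiteLockManager() for _ in range(3)]  # Q is replicated at n = 3 sites
print(majority_lock(sites, "Q", "T1"))
print(majority_lock(sites, "Q", "T2"))
```

Since two different transactions cannot both hold locks at more than half of the same n sites, at most one of them holds the majority lock on Q at any time.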


Distributed Database Management Systems

An Example

EMP(ENO, ENAME, TITLE)
ASG(ENO, PNO, DUR, RESP)
PROJ(PNO, PNAME, BUDGET)
PAY(TITLE, SAL)

Distributed Query

• If these tables are stored in one place, we can use, for example, the following query to get the name and salary of each employee who has worked for more than 12 months:

    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE ASG.DUR > 12
    AND EMP.ENO = ASG.ENO
    AND PAY.TITLE = EMP.TITLE

• But if these tables are distributed over different sites, executing this query requires a lot of additional processing. The DDBMS does this processing and lets end users feel as if they were the database's only users (transparency).
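For the centralized case, the query can be tried out with Python's built-in `sqlite3` module. The column types and sample rows below are made up for illustration; only the table and column names come from the example schema.

```python
import sqlite3

# In-memory database holding all three tables at one place (the centralized case).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT);
    CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER, RESP TEXT);
    CREATE TABLE PAY (TITLE TEXT, SAL INTEGER);
    -- Hypothetical sample rows, not taken from the slides.
    INSERT INTO EMP VALUES ('E1', 'Ann', 'Engineer'), ('E2', 'Bob', 'Analyst');
    INSERT INTO ASG VALUES ('E1', 'P1', 18, 'Manager'), ('E2', 'P2', 6, 'Analyst');
    INSERT INTO PAY VALUES ('Engineer', 40000), ('Analyst', 34000);
    """
)

# Name and salary of employees assigned for more than 12 months.
rows = conn.execute(
    """
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE ASG.DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
    """
).fetchall()

print(rows)
conn.close()
```

In a distributed setting the same SQL text would be submitted unchanged; it is the DDBMS that has to find the sites holding EMP, ASG, and PAY and move the intermediate results between them.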

Distributed Database Transparency

• The concept of a DDB is to fragment the data and store each fragment at its own site
• Data may be replicated at different sites (replication)
• The DDBMS hides these details and makes the distribution transparent to the users

Distributed Database Transparency Features
• Distribution transparency
• Transaction transparency
• Failure transparency
• Performance transparency
• Heterogeneity transparency

Distributed DB Design

Top-down approach
• have a database
• how to split and allocate to individual sites

Two issues in top-down design
• Fragmentation
• Allocation

Multi-databases (or bottom-up)
• combine existing databases
• how to deal with heterogeneity & autonomy



Dr Mohamed Osman Hegazi

Distributed Query

bullIf these table is stored in one place then we can ldquofor examplerdquo using the following query to get the name and the salary of the employee who works more

than 12 months SELECT ENAME SALFROM EMP ASG PAYWHERE ASG DUR gt12AND EMPENO=ASGENOAND PAYTITLE=EMPTITLE

But if these table are distributed over deferent site then the execution of this query needs allot of process to be done DDMS do this process and let the end user feel like databasersquos only user (transparence)

Dr Mohamed Osman Hegazi

bullThe concepts of DDB is to fragment the data and store each fragment on its site

bullData may be replicated on different site (replication)

bullDDBMS hide these details from the user and makes the distribution transparent to the users Distributed Database Transparency FeaturesDistribution transparency Transaction transparencyFailure transparency Performance transparency Heterogeneity transparency

Distributed Database Transparency

Dr Mohamed Osman Hegazi

Distributed DB Design

Top-down approach bull have a databasebull how to split and allocate to individual sitesTwo issues in top-down designFragmentationAllocation

Multi-databases (or bottom-up)bull combine existing databasesbull how to deal with heterogeneity amp autonomy

Dr Mohamed Osman Hegazi

Fragmentationbull Horizontal Primary

depends on local attributesR Derived

depends on foreign relation

bull Vertical

R

Dr Mohamed Osman Hegazi

Example

Employee relation E (namelocsalhellip) 40 of queries 40 of queries Qa select Qb select

from E from Ewhere loc=Sa where loc=Sbandhellip and

Motivation Two sites Sa Sb Qa Qb

Sa Sb

Dr Mohamed Osman Hegazi

Name Loc Sal578

Sa 10Sally Sb 25Tom Sa 15

Joe

58

Sa 10Tom Sa 15Joe 7 Sb 25Sally

F = F1F2

At Sa At Sb

E

F1 = loc=Sa(E) F2 = loc=Sb(E)

primary horizontal fragmentation

Dr Mohamed Osman Hegazi

Loc=SA sal lt 10

Loc=SA

sal 10

Loc=SB sal lt 10

Loc=SB

sal 10

F1

F3F2

Qa Select hellip loc = SA

Qb Select hellip loc = SB

Prefer F2 to F1 and F3

Dr Mohamed Osman Hegazi

(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty1(

(ΔϴόϣΎΠϟΕΎϴϠϜϟϢψϧUniversity Faculties system(

(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty2(

(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty3(

Horizontal Fragmentation Peer to peer relationship ndash brothers

Dr Mohamed Osman Hegazi

Vertical fragmentation

E1

NM Loc Sal5 Joe Sa 107 Sally Sb 258 Fred Sa 15hellip

NM Loc5 Joe Sa7 Sally Sb8 Fred Sahellip

Sal5 107 258 15hellip

E

E2

Example

R[T] R1[T1] R2[T2]hellip Rn[Tn] Ti T

Just like normalization of relations

Dr Mohamed Osman Hegazi

Vertical Fragmentation example

PROJ1 information about project budgets

PROJ2 information about project names and locations

PNO BUDGET

P1 150000

P3 250000P2 135000

P4 310000P5 500000

PNO PNAME LOC

P1 Instrumentation Montreal

P3 CADCAM New YorkP2 Database DevelopNew York

P4 Maintenance ParisP5 CADCAM Boston

PROJ1 PROJ2

New YorkNew York

PROJ

PNO PNAME BUDGET LOC

P1 Instrumentation 150000 Montreal

P3 CADCAM 250000P2 Database Develop135000

P4 Maintenance 310000 ParisP5 CADCAM 500000 Boston

New YorkNew York

Grouping Attributes

Example: E(#, NM, LOC, SAL) ⇒ E1(#, NM), E2(#, LOC), E3(#, SAL)
                          or ⇒ E1(#, NM, LOC), E2(#, SAL)

Which is the right vertical fragmentation? …
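One rough way to compare candidate groupings is to count how many fragments each query has to touch, weighted by how often the query runs. The workload figures and the cost measure below are assumptions made for illustration only; the slides do not give a workload or a cost model. The key column is left out because every fragment carries it.

```python
def fragments_touched(query_attrs, fragmentation):
    """Number of fragments a query must read to see all its attributes."""
    wanted = set(query_attrs)
    return sum(1 for frag in fragmentation if wanted & set(frag))


def weighted_cost(workload, fragmentation):
    """Sum over all queries of frequency * fragments touched."""
    return sum(freq * fragments_touched(attrs, fragmentation)
               for attrs, freq in workload)


# Hypothetical workload: (attributes used, relative frequency).
workload = [
    (["NM", "LOC"], 70),
    (["SAL"], 20),
    (["NM", "LOC", "SAL"], 10),
]

option_a = [["NM", "LOC"], ["SAL"]]    # E1(NM, LOC), E2(SAL)
option_b = [["NM"], ["LOC"], ["SAL"]]  # E1(NM), E2(LOC), E3(SAL)

cost_a = weighted_cost(workload, option_a)
cost_b = weighted_cost(workload, option_b)
```

Under this assumed workload, grouping NM with LOC is cheaper because the most frequent query reads one fragment instead of two. A different workload could reverse the result.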

Vertical Fragmentation: branch relationship (parents and son)

[Figure: a Personnel Affairs system vertically fragmented into a Salaries part (Salary, allowances, Tax) and an Appointments part (Name, address, grade)]

Hybrid Fragmentation

R
├─ HF → R1
│       ├─ VF → R11
│       └─ VF → R12
└─ HF → R2
        ├─ VF → R21
        ├─ VF → R22
        └─ VF → R23

Allocation

Example: E ⇒ F1 = σ_{loc=Sa}(E), F2 = σ_{loc=Sb}(E)

Fragment E, then decide:
• Do we replicate fragments?
• Where do we place each copy of each fragment?

[Figure: copies of fragments F1, F1, and F2 placed across Site a, Site b, and Site c]

Allocation Alternatives
• Non-replicated
  - partitioned: each fragment resides at only one site
• Replicated
  - fully replicated: each fragment at each site
  - partially replicated: each fragment at some of the sites
• Rule:
  If (read-only queries) / (update queries) ≥ 1, replication is advantageous;
  otherwise replication may cause problems

Optimization problem

• What is the best placement of fragments and/or best number of copies to:
  - minimize query response time
  - maximize throughput
  - minimize "some cost"
  - ...
• Subject to constraints:
  - Available storage
  - Available bandwidth, processing power, …
  - Keep 90% of response time below X
  - ...

Very hard problem
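To see why the problem is hard, consider the simplest version: no replication, one cost figure, one storage constraint. The sketch below tries every placement. The cost model (local access costs 1, remote access costs 10), the access counts, sizes, and capacities are all assumptions made for illustration.

```python
from itertools import product


def best_allocation(fragments, sites, access, size, capacity,
                    local_cost=1, remote_cost=10):
    """Try every non-replicated placement and return (cost, placement).

    access[(site, fragment)] is how often queries at `site` read `fragment`.
    A placement is feasible when no site exceeds its storage capacity.
    Returns None when no placement is feasible.
    """
    best = None
    for choice in product(sites, repeat=len(fragments)):
        placement = dict(zip(fragments, choice))
        used = {s: 0 for s in sites}
        for frag, site in placement.items():
            used[site] += size[frag]
        if any(used[s] > capacity[s] for s in sites):
            continue  # violates the storage constraint
        cost = sum(
            freq * (local_cost if placement[frag] == site else remote_cost)
            for (site, frag), freq in access.items()
        )
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


fragments = ["F1", "F2"]
sites = ["Sa", "Sb"]
access = {("Sa", "F1"): 40, ("Sb", "F2"): 40, ("Sa", "F2"): 5, ("Sb", "F1"): 5}
size = {"F1": 2, "F2": 1}
capacity = {"Sa": 2, "Sb": 2}

cost, placement = best_allocation(fragments, sites, access, size, capacity)
```

The loop visits len(sites) ** len(fragments) placements, so exhaustive search is only practical for tiny inputs. Allowing replicas or adding more constraints enlarges the search space further.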

Static data allocation & Dynamic data allocation

Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow in size).

Dynamic data allocation: the location of the data changes dynamically as the data expands, which usually follows from the nature of the systems producing the data. Data-location problems can be treated through two types of models:

• Adaptive models: apply when the location changes because of system activity (online systems with online data storage, for example airline bookings). These models keep additional temporary copies of the data and then process these duplicate copies (replication).

• Non-adaptive models: solve the dynamic allocation problem when the system is established or when it is reorganized.

Replication

Replication stores copies of the same data at more than one location (site). These copies must be updated consistently, despite the distance between them. Updating the copies is controlled by one of two techniques:

• Lazy replication: the other copies are updated after work on one of the copies (the master copy) has completed. This means the update is done outside the boundaries of the transaction.
• Eager replication: the replicated data is updated within the transaction boundaries, while working on one of the copies.

The update itself can be applied in one of two places:

- Central update (initial copy, primary copy): update the primary copy first and then update the secondary copies. Because every update goes through one copy, the sites do not have to synchronize updates among themselves, which makes consistency easier to control but may create a bottleneck.
- Update everywhere: an update may be made at any copy, so all copies have an equal opportunity to be updated.
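The eager/lazy distinction can be sketched for the primary-copy case with an in-memory key-value store. The class and method names are hypothetical, and a real system would ship updates over a network and handle failures, which this sketch ignores.

```python
class PrimaryCopyStore:
    """One primary copy plus secondary copies of a key-value store."""

    def __init__(self, n_secondaries, eager):
        self.primary = {}
        self.secondaries = [{} for _ in range(n_secondaries)]
        self.eager = eager
        self.pending = []  # updates not yet propagated (lazy mode only)

    def write(self, key, value):
        """Run an update 'transaction' against the primary copy."""
        self.primary[key] = value
        if self.eager:
            # Eager: secondaries are updated inside the transaction boundary.
            for copy in self.secondaries:
                copy[key] = value
        else:
            # Lazy: the transaction ends here; propagation happens later.
            self.pending.append((key, value))

    def propagate(self):
        """Lazy mode: push queued updates to the secondaries after commit."""
        for key, value in self.pending:
            for copy in self.secondaries:
                copy[key] = value
        self.pending.clear()


lazy = PrimaryCopyStore(2, eager=False)
lazy.write("x", 1)
stale = lazy.secondaries[0].get("x")  # still missing: not propagated yet
lazy.propagate()
fresh = lazy.secondaries[0].get("x")
```

In lazy mode a reader of a secondary copy can see old data between write() and propagate(). Eager mode closes that window at the price of a slower write.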

Where \ When         Eager                          Lazy

Primary Copy         Early Solutions in Ingres      Sybase/IBM/Oracle
                                                    Placement Strat.
                                                    Serialization-Graph Based

Update Everywhere    ROWA/ROWAA                     Oracle Advanced Repl.
                     Quorum based                   Weak consistency Strat.
                     Oracle Synchr. Repl.

Distributed Query Processing

The aim of query processing over distributed data is to make work on the distributed data appear like work on a single database system. The problem can be broken into several levels according to the data problems involved. Query processing takes an SQL or OQL statement as input and processes it through several stages until the query is executed.

Query Decomposition & Data Localization
• The first stage of distributed query processing analyzes the query and translates it into relational algebra; the second stage localizes the data by distributing the query over the fragments.

Query Optimization
• The third stage aims at an optimal execution of the query by keeping the execution cost as low as possible and deleting unneeded expressions.
• Query optimization is one of the important aspects of dealing with queries; many algorithms have been developed for it.
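Data localization can be illustrated with the employee fragments F1 (loc = Sa) and F2 (loc = Sb). A query on E is first rewritten over all fragments; fragments whose defining predicate contradicts the query predicate are then dropped as unneeded expressions. The fragment-schema layout and the function name are assumptions for this sketch.

```python
# Assumed fragment schema: fragment name -> (site, loc value it holds).
FRAGMENT_SCHEMA = {"F1": ("Sa", "Sa"), "F2": ("Sb", "Sb")}


def localize(query_loc=None):
    """Return the (fragment, site) pairs a query on E must read.

    query_loc is the constant in a `loc = ...` predicate, or None when
    the query has no predicate on loc.
    """
    needed = []
    for name, (site, frag_loc) in FRAGMENT_SCHEMA.items():
        # A fragment defined by loc = frag_loc cannot hold rows with a
        # different loc, so it is pruned when the predicates conflict.
        if query_loc is None or query_loc == frag_loc:
            needed.append((name, site))
    return needed


plan_qa = localize("Sa")  # Qa: ... where loc = Sa
plan_all = localize()     # no predicate on loc: every fragment is needed
```

For Qa the plan contains only F1 at Sa, so nothing has to be shipped from Sb.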

[Figure: layers of distributed query processing]

Control site
  Calculus query on distributed relations
    ↓ Query Decomposition (uses the Global Schema)
  Algebraic query on distributed relations
    ↓ Data Localization (uses the Fragment Schema)
  Fragment query
    ↓ Global Optimization (uses Statistics on fragments)
  Optimized fragment query with communication operations

Local sites
    ↓ Local Optimization (uses the Local Schema)
  Optimized local queries

Concurrency Control in distributed database

• Concurrency control in databases is the set of activities that keep transactions consistent across all the data in the system.
• A DDBMS must synchronize data that is distributed over the network sites, each of which runs programs that work on its own data. In this situation, controlling concurrency over the distributed data is one of the more complex issues.
• Four techniques are used to control concurrency in a distributed database:
  - Locking techniques
  - Timestamps
  - Optimistic algorithm: all operations on the data are performed, except that an operation that updates the data updates the local data first
  - Complex algorithm for timestamps
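As an illustration of the timestamp technique, here is a sketch of basic timestamp ordering on a single data item. This is the common textbook variant and is not necessarily the exact algorithm the slides have in mind; the class and its fields are assumptions. Each transaction carries a unique timestamp, and an operation that arrives "too late" is rejected so that its transaction can restart.

```python
class TimestampOrdering:
    """Basic timestamp ordering for a single data item."""

    def __init__(self):
        self.read_ts = 0    # largest timestamp that has read the item
        self.write_ts = 0   # largest timestamp that has written the item
        self.value = None

    def read(self, ts):
        """Return (ok, value); ok is False when the reader must abort."""
        if ts < self.write_ts:
            return False, None  # a younger transaction already wrote
        self.read_ts = max(self.read_ts, ts)
        return True, self.value

    def write(self, ts, value):
        """Return True when the write is accepted, False on abort."""
        if ts < self.read_ts or ts < self.write_ts:
            return False  # a younger transaction already used the item
        self.write_ts = ts
        self.value = value
        return True


item = TimestampOrdering()
accepted = item.write(2, "a")       # accepted: nothing conflicts
late_read = item.read(1)            # rejected: older than the last write
ok_read = item.read(3)              # accepted, read_ts becomes 3
late_write = item.write(2, "b")     # rejected: ts 3 already read the item
```

No locks are held, so this scheme cannot deadlock, but rejected transactions have to be restarted with a new timestamp.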

Distributed Concurrency Control
• Nonreplicated Scheme
  - Each site maintains a local lock manager to administer lock and unlock requests for local data
  - Deadlock handling is more complex
• Single-Coordinator Approach
  - The system maintains a single lock manager that resides in a single chosen site
  - Can be used with replicated data
  - Advantages
    • simple implementation
    • simple deadlock handling
  - Disadvantages
    • bottleneck
    • vulnerability

Distributed Concurrency Control

• Majority Protocol
  - A lock manager at each site
  - When a transaction wishes to lock a data item Q, which is replicated in n different sites, it must send a lock request to more than half of the n sites in which Q is stored
  - complex to implement
  - difficult to handle deadlocks
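The majority rule can be sketched as follows. This is a simplified illustration: it uses exclusive locks only, asks every replica site rather than just a majority, and fails immediately instead of waiting. The class and function names are hypothetical.

```python
class SiteLockManager:
    """Exclusive locks held by the lock manager at one site."""

    def __init__(self):
        self.owner = {}  # data item -> transaction holding the lock

    def try_lock(self, txn, item):
        """Grant the lock if the item is free or already held by txn."""
        if self.owner.get(item, txn) == txn:
            self.owner[item] = txn
            return True
        return False

    def unlock(self, txn, item):
        if self.owner.get(item) == txn:
            del self.owner[item]


def majority_lock(txn, item, replica_sites):
    """Lock `item` if more than half of its replica sites grant the request."""
    granted = [site for site in replica_sites if site.try_lock(txn, item)]
    if len(granted) > len(replica_sites) // 2:
        return True
    for site in granted:  # no majority: release whatever was obtained
        site.unlock(txn, item)
    return False


sites = [SiteLockManager() for _ in range(3)]
t1_ok = majority_lock("T1", "Q", sites)  # T1 gets all three site locks
t2_ok = majority_lock("T2", "Q", sites)  # T2 gets none, so no majority
```

Two transactions cannot both hold a majority of the same n sites, which is what makes the lock exclusive. If two transactions each grab some sites and then wait instead of releasing, neither reaches a majority, which is one source of the deadlocks the slide mentions.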


Distributed Database Transparency

• The concept of a DDB is to fragment the data and store each fragment at its own site
• Data may be replicated at different sites (replication)
• The DDBMS hides these details from the user and makes the distribution transparent to the users

Distributed Database Transparency Features
• Distribution transparency
• Transaction transparency
• Failure transparency
• Performance transparency
• Heterogeneity transparency

Page 12: Distributed Database Systems Dr. Mohamed Osman Hegazi

Dr Mohamed Osman Hegazi

Fragmentationbull Horizontal Primary

depends on local attributesR Derived

depends on foreign relation

bull Vertical

R

Dr Mohamed Osman Hegazi

Example

Employee relation E (namelocsalhellip) 40 of queries 40 of queries Qa select Qb select

from E from Ewhere loc=Sa where loc=Sbandhellip and

Motivation Two sites Sa Sb Qa Qb

Sa Sb

Dr Mohamed Osman Hegazi

Name Loc Sal578

Sa 10Sally Sb 25Tom Sa 15

Joe

58

Sa 10Tom Sa 15Joe 7 Sb 25Sally

F = F1F2

At Sa At Sb

E

F1 = loc=Sa(E) F2 = loc=Sb(E)

primary horizontal fragmentation

Dr Mohamed Osman Hegazi

Loc=SA sal lt 10

Loc=SA

sal 10

Loc=SB sal lt 10

Loc=SB

sal 10

F1

F3F2

Qa Select hellip loc = SA

Qb Select hellip loc = SB

Prefer F2 to F1 and F3

Dr Mohamed Osman Hegazi

(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty1(

(ΔϴόϣΎΠϟΕΎϴϠϜϟϢψϧUniversity Faculties system(

(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty2(

(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty3(

Horizontal Fragmentation Peer to peer relationship ndash brothers

Dr Mohamed Osman Hegazi

Vertical fragmentation

E1

NM Loc Sal5 Joe Sa 107 Sally Sb 258 Fred Sa 15hellip

NM Loc5 Joe Sa7 Sally Sb8 Fred Sahellip

Sal5 107 258 15hellip

E

E2

Example

R[T] R1[T1] R2[T2]hellip Rn[Tn] Ti T

Just like normalization of relations

Dr Mohamed Osman Hegazi

Vertical Fragmentation example

PROJ1 information about project budgets

PROJ2 information about project names and locations

PNO BUDGET

P1 150000

P3 250000P2 135000

P4 310000P5 500000

PNO PNAME LOC

P1 Instrumentation Montreal

P3 CADCAM New YorkP2 Database DevelopNew York

P4 Maintenance ParisP5 CADCAM Boston

PROJ1 PROJ2

New YorkNew York

PROJ

PNO PNAME BUDGET LOC

P1 Instrumentation 150000 Montreal

P3 CADCAM 250000P2 Database Develop135000

P4 Maintenance 310000 ParisP5 CADCAM 500000 Boston

New YorkNew York

Dr Mohamed Osman Hegazi

E1(NMLOC)E2(SAL)

ExampleE(NMLOCSAL) E1(NM)

E2(LOC)E3(SAL)

Which is the right vertical fragmentationhellip

Grouping Attributes

Dr Mohamed Osman Hegazi

Vertical Fragmentation branch relationship ndash parents and son

ΩήϓϷϥϮΌη

ΕΎΒΗή

Ϥϟ

(Sal

ary

allo

wan

ces

Tax

(

ΎϨϴϴόΘ

ϟΕ

(N

ame ad

dre

ss g

rade(

Dr Mohamed Osman Hegazi

Hybrid Fragmentation

R

HFHF

R1

VF VFVFVFVF

R11 R12 R21 R22 R23

R2

Dr Mohamed Osman Hegazi

AllocationExample E F1 = loc=Sa(E) F2 = loc=Sb(E)

Site a

Site b

Fragment E

Do we replicate fragments

Where do we place each copy of each fragment

Site c

F1F1

F2

Dr Mohamed Osman Hegazi

Allocation Alternativesbull Non-replicated

ndash partitioned each fragment resides at only one site

bull Replicatedndash fully replicated each fragment at each sitendash partially replicated each fragment at some of the

sitesbull Rule

If replication is advantageous

otherwise replication may cause problems

read - only queriesupdate queries

1

Dr Mohamed Osman Hegazi

Optimization problem

bull What is the best placement of fragments andor best number of copies tondash minimize query response timendash maximize throughputndash minimize ldquosome costrdquondash

bull Subject to constraintsndash Available storagendash Available bandwidth processing powerhellipndash Keep 90 of response time below Xndash

Very hard problem

Dr Mohamed Osman Hegazi

Static data allocation No change on allocation sites or no need for extra storage space (no expanding on the size No increasing on data )Dynamic data allocation dynamically changed the location of the data as a result of expansion in the data which usually results because of the nature of the systems producing data Problems of data sites can be treated through two types of models

bull Adaptive Models models that apply when the reason for the change of location due to system activity( online systems- data storage on line( Example airline bookings) These models saves the additional temporary copies of data and then dealing with these copies by processors duplicate copies (replication)

bull non-adaptive models These models solve dynamically allocation at the stage of establishing the system or at the stage of reorganization the system

Static data allocation amp Dynamic data allocation

Dr Mohamed Osman Hegazi

Replication

Replication stores copies of the same data at more than one location (site). The copies must be updated consistently, regardless of the distance between them. When the copies are updated is controlled by one of two techniques:

• Lazy replication: the other copies are updated after work on one of the copies (the master copy) has completed, so the update is propagated outside the transaction boundaries.
• Eager replication: the replicated data is updated within the transaction boundaries, while the transaction is working on one of the copies.

Where the update takes place is one of:

- Central update (initial copy, primary copy): the primary copy is updated first and the secondary copies afterwards. Updates need not be synchronized across sites, which makes consistency easier to control, but the primary copy may become a bottleneck.
- Update everywhere: an update may be applied at any copy, so all copies have an equal opportunity to be updated.
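The difference between eager and lazy propagation can be sketched with a small single-process simulation in Python. The class and method names are hypothetical, there is no real transaction manager or network here, and the model uses a primary copy only; it is meant to show when the secondary copies change, nothing more.

```python
class ReplicatedItem:
    """One logical data item with a primary copy and secondary copies."""

    def __init__(self, value, n_secondaries):
        self.primary = value
        self.secondaries = [value] * n_secondaries
        self.pending = []  # updates not yet propagated (lazy mode only)

    def write_eager(self, value):
        # Eager: every copy changes before the "transaction" returns.
        self.primary = value
        self.secondaries = [value] * len(self.secondaries)

    def write_lazy(self, value):
        # Lazy: only the primary changes inside the transaction;
        # propagation to the secondaries is queued for later.
        self.primary = value
        self.pending.append(value)

    def propagate(self):
        # Runs outside the transaction boundary; applies queued updates in order.
        for value in self.pending:
            self.secondaries = [value] * len(self.secondaries)
        self.pending.clear()


item = ReplicatedItem(10, n_secondaries=2)
item.write_lazy(20)
stale = list(item.secondaries)  # secondaries still hold the old value
item.propagate()
fresh = list(item.secondaries)  # secondaries have caught up
```

Between `write_lazy` and `propagate` a reader of a secondary copy sees the old value, which is the consistency price of lazy replication; `write_eager` avoids that window at the cost of touching every copy inside the transaction.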

Where / when        Eager                             Lazy
Primary copy        Early solutions in Ingres         Sybase/IBM/Oracle; placement strategies; serialization-graph based
Update everywhere   ROWA/ROWAA; quorum based;         Oracle Advanced Replication; weak-consistency strategies
                    Oracle synchronous replication

Distributed Query Processing

The aim of query processing over distributed data is to make working with the distributed data look like working with a single database system. The problem can be divided into several levels, according to the data problem handled at each level. The query processor takes SQL or OQL statements as input and passes them through several stages until the query is executed.

Query Decomposition & Data Localization
• The first stage of distributed query processing analyzes the query and translates it into relational algebra; the second stage localizes the data by distributing the query over the fragments.

Query Optimization
• The third stage aims at an optimal execution of the query by making the execution cost as small as possible and deleting unneeded expressions.
• Query optimization is one of the most important aspects of dealing with queries, and many algorithms have been proposed for it.
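A minimal Python sketch of the localization step, assuming the employee relation E is horizontally fragmented on `loc` as in the earlier example. The catalog layout and the function name are hypothetical, and only equality predicates on `loc` are handled.

```python
# Hypothetical fragment catalog: each horizontal fragment of E is described
# by the value of `loc` that defines it and the site that stores it.
FRAGMENTS = {
    "F1": {"loc": "Sa", "site": "Sa"},
    "F2": {"loc": "Sb", "site": "Sb"},
}


def localize(query_loc=None):
    """Return the fragments that a selection on E must read.

    query_loc is the constant in a predicate `loc = <constant>`,
    or None when the query does not restrict loc.
    """
    # Localization: E is replaced by the union of all its fragments.
    needed = list(FRAGMENTS)
    # Reduction: drop fragments whose defining predicate contradicts the query.
    if query_loc is not None:
        needed = [f for f in needed if FRAGMENTS[f]["loc"] == query_loc]
    return needed
```

For example, `localize("Sa")` returns only `F1`, so the query never has to contact site Sb, while `localize()` returns both fragments.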

[Figure: layers of distributed query processing]
Calculus query on distributed relations
  → Query decomposition (uses the global schema)
Algebraic query on distributed relations
  → Data localization (uses the fragment schema)
Fragment query
  → Global optimization (uses statistics on fragments)
Optimized fragment query with communication operations
  → Local optimization (uses the local schema)
Optimized local queries
The first three layers run at the control site; local optimization runs at the local sites.

Concurrency Control in Distributed Databases

• Concurrency control in databases comprises the activities that keep transactions consistent across all of the system's data.
• The DDBMS is responsible for synchronizing data that is distributed over the network sites, each of which runs programs that work on its own data. In this setting, controlling concurrency on distributed data is one of the more complex issues.
• Four techniques are used to control concurrency in a distributed database:
  - locking techniques
  - timestamps
  - optimistic algorithms, which let all operations on the data proceed except updates; an update is applied to the local data first
  - complex timestamp algorithms

Distributed Concurrency Control

• Nonreplicated scheme
  - Each site maintains a local lock manager to administer lock and unlock requests for local data.
  - Deadlock handling is more complex.
• Single-coordinator approach
  - The system maintains a single lock manager that resides at a single chosen site.
  - Can be used with replicated data.
  - Advantages:
    • simple implementation
    • simple deadlock handling
  - Disadvantages:
    • bottleneck
    • vulnerability

Distributed Concurrency Control

• Majority protocol
  - A lock manager runs at each site.
  - When a transaction wishes to lock a data item Q that is replicated at n different sites, it must send a lock request to more than half of the n sites at which Q is stored.
  - Complex to implement.
  - Difficult to handle deadlocks.
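The majority rule can be sketched in Python as follows. This is a simplified, single-process model with hypothetical class and function names: requests are answered immediately instead of waiting, only exclusive locks exist, and there is no deadlock detection, which is exactly the part the slide calls difficult.

```python
class Site:
    """A site with its own lock manager for the replicas it stores."""

    def __init__(self):
        self.locks = {}  # item -> id of the transaction holding the lock

    def request(self, item, txn):
        # Grant if the item is free or already held by the same transaction.
        if self.locks.get(item, txn) == txn:
            self.locks[item] = txn
            return True
        return False

    def release(self, item, txn):
        if self.locks.get(item) == txn:
            del self.locks[item]


def majority_lock(sites, item, txn):
    """Try to lock `item` at more than half of the sites that store it."""
    granted = [s for s in sites if s.request(item, txn)]
    if len(granted) > len(sites) // 2:
        return True
    # No majority: give back the locks that were obtained.
    for s in granted:
        s.release(item, txn)
    return False
```

Because two transactions cannot both hold more than half of the n replicas, at most one of them can lock Q at a time, even though no single site decides alone.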


Example

Employee relation E(#, name, loc, sal, …)

• 40% of queries are of the form
  Qa: select … from E where loc = Sa and …
• 40% of queries are of the form
  Qb: select … from E where loc = Sb and …

Motivation: there are two sites, Sa and Sb; Qa is issued at Sa and Qb at Sb.

Primary horizontal fragmentation

E
#   Name    Loc   Sal
5   Joe     Sa    10
7   Sally   Sb    25
8   Tom     Sa    15

F = {F1, F2}

F1 = $\sigma_{loc=Sa}(E)$, stored at Sa
#   Name    Loc   Sal
5   Joe     Sa    10
8   Tom     Sa    15

F2 = $\sigma_{loc=Sb}(E)$, stored at Sb
#   Name    Loc   Sal
7   Sally   Sb    25
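The horizontal fragmentation of E can be reproduced in a few lines of Python. The rows are the ones on the slide; the column name `id` for the unlabeled number column and the helper `select` are assumptions of this sketch.

```python
E = [
    {"id": 5, "name": "Joe", "loc": "Sa", "sal": 10},
    {"id": 7, "name": "Sally", "loc": "Sb", "sal": 25},
    {"id": 8, "name": "Tom", "loc": "Sa", "sal": 15},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


F1 = select(E, lambda r: r["loc"] == "Sa")  # stored at Sa
F2 = select(E, lambda r: r["loc"] == "Sb")  # stored at Sb

# Reconstruction: the union of the fragments gives back E.
reconstructed = sorted(F1 + F2, key=lambda r: r["id"])
```

Every tuple of E lands in exactly one fragment because the two predicates are disjoint and together cover all values of `loc` that occur, which is why the plain union restores E without duplicates.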

[Figure: candidate fragmentations F1, F2, and F3 of E; the figure shows the predicates below]
  loc = SA ∧ sal < 10
  loc = SA ∧ sal ≥ 10
  loc = SB ∧ sal < 10
  loc = SB ∧ sal ≥ 10

Qa: select … loc = SA
Qb: select … loc = SB

Prefer F2 to F1 and F3.

Horizontal fragmentation: peer-to-peer relationship (brothers)

[Figure: the University Faculties system is fragmented horizontally into the Faculty1, Faculty2, and Faculty3 systems.]

Vertical fragmentation

Example

E
#   NM      Loc   Sal
5   Joe     Sa    10
7   Sally   Sb    25
8   Fred    Sa    15
…

E1
#   NM      Loc
5   Joe     Sa
7   Sally   Sb
8   Fred    Sa
…

E2
#   Sal
5   10
7   25
8   15
…

$R[T] \Rightarrow R_1[T_1], R_2[T_2], \ldots, R_n[T_n]$, with $T_i \subseteq T$

Just like normalization of relations.
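Vertical fragmentation is a projection that keeps the key in every fragment, and reconstruction is a join on that key. The Python sketch below uses the rows from the slide; the column name `id` for the number column and the helper functions are assumptions of this example.

```python
E = [
    {"id": 5, "nm": "Joe", "loc": "Sa", "sal": 10},
    {"id": 7, "nm": "Sally", "loc": "Sb", "sal": 25},
    {"id": 8, "nm": "Fred", "loc": "Sa", "sal": 15},
]


def project(relation, attrs):
    """Relational projection onto the given attributes."""
    return [{a: row[a] for a in attrs} for row in relation]


def join_on_key(left, right, key):
    """Join two fragments that share the key attribute."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


E1 = project(E, ["id", "nm", "loc"])
E2 = project(E, ["id", "sal"])
rebuilt = join_on_key(E1, E2, "id")
```

Keeping `id` in both fragments is what makes the join lossless; without a shared key there would be no way to tell which salary belongs to which employee.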

Vertical Fragmentation example

PROJ
PNO   PNAME               BUDGET   LOC
P1    Instrumentation     150000   Montreal
P2    Database Develop.   135000   New York
P3    CAD/CAM             250000   New York
P4    Maintenance         310000   Paris
P5    CAD/CAM             500000   Boston

PROJ1: information about project budgets
PNO   BUDGET
P1    150000
P2    135000
P3    250000
P4    310000
P5    500000

PROJ2: information about project names and locations
PNO   PNAME               LOC
P1    Instrumentation     Montreal
P2    Database Develop.   New York
P3    CAD/CAM             New York
P4    Maintenance         Paris
P5    CAD/CAM             Boston

Grouping Attributes

Example: E(#, NM, LOC, SAL) can be split as
  E1(#, NM, LOC), E2(#, SAL)
or as
  E1(#, NM), E2(#, LOC), E3(#, SAL)

Which is the right vertical fragmentation?

Vertical fragmentation: branch relationship (parent and children)

[Figure: a parent system is fragmented vertically into one fragment holding (Salary, allowances, Tax) and another holding (Name, address, grade).]

Hybrid Fragmentation

[Figure: relation R is first fragmented horizontally (HF) into R1 and R2; R1 is then fragmented vertically (VF) into R11 and R12, and R2 into R21, R22, and R23.]

Allocation

Example: E is fragmented into F1 = $\sigma_{loc=Sa}(E)$ and F2 = $\sigma_{loc=Sb}(E)$.

[Figure: the fragments of E placed on Site a, Site b, and Site c; F1 appears twice and F2 once.]

• Do we replicate fragments?
• Where do we place each copy of each fragment?

Allocation Alternatives

• Non-replicated
  - partitioned: each fragment resides at only one site
• Replicated
  - fully replicated: each fragment at each site
  - partially replicated: each fragment at some of the sites
• Rule: if $\frac{\text{read-only queries}}{\text{update queries}} \geq 1$, replication is advantageous; otherwise replication may cause problems.

Page 16: Distributed Database Systems Dr. Mohamed Osman Hegazi

Dr Mohamed Osman Hegazi

(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty1(

(ΔϴόϣΎΠϟΕΎϴϠϜϟϢψϧUniversity Faculties system(

(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty2(

(ŚŕƔƆƄƅŔƐŧţŌƇŕŴƊFaculty3(

Horizontal Fragmentation Peer to peer relationship ndash brothers

Dr Mohamed Osman Hegazi

Vertical fragmentation

E1

NM Loc Sal5 Joe Sa 107 Sally Sb 258 Fred Sa 15hellip

NM Loc5 Joe Sa7 Sally Sb8 Fred Sahellip

Sal5 107 258 15hellip

E

E2

Example

R[T] R1[T1] R2[T2]hellip Rn[Tn] Ti T

Just like normalization of relations

Vertical Fragmentation example

PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

PROJ1: information about project budgets
PNO  BUDGET
P1   150000
P2   135000
P3   250000
P4   310000
P5   500000

PROJ2: information about project names and locations
PNO  PNAME              LOC
P1   Instrumentation    Montreal
P2   Database Develop.  New York
P3   CAD/CAM            New York
P4   Maintenance        Paris
P5   CAD/CAM            Boston

Grouping Attributes

Example: E(#, NM, LOC, SAL) can be fragmented as
  E1(#, NM, LOC), E2(#, SAL)
or as
  E1(#, NM), E2(#, LOC), E3(#, SAL)

Which is the right vertical fragmentation?
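One common way to approach the grouping question is to count how often attributes are accessed together and keep frequently co-accessed attributes in the same fragment. The Python sketch below illustrates that idea only; the workload figures and the function name `affinity` are hypothetical and do not come from the slides.

```python
from collections import Counter
from itertools import combinations

def affinity(workload):
    """Count, for each attribute pair, how often queries use both attributes.

    workload is a list of (set_of_attributes, frequency) pairs.
    """
    counts = Counter()
    for attrs, freq in workload:
        for pair in combinations(sorted(attrs), 2):
            counts[pair] += freq
    return counts

# Hypothetical workload over E(NM, LOC, SAL): attribute sets and frequencies.
workload = [
    ({"NM", "LOC"}, 30),   # directory-style lookups
    ({"SAL"}, 20),         # payroll runs
    ({"NM", "SAL"}, 2),    # occasional reports
]

aff = affinity(workload)
```

With these made-up numbers NM and LOC are used together far more often than any pair involving SAL, which would favor E1(#, NM, LOC), E2(#, SAL) over splitting every attribute into its own fragment.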

Vertical Fragmentation: branch relationship (parent and child)

[Figure: a parent relation split vertically into two branch fragments, one holding (Salary, allowances, Tax) and one holding (Name, address, grade)]

Hybrid Fragmentation

[Figure: R is split by horizontal fragmentation (HF) into R1 and R2; R1 is then split by vertical fragmentation (VF) into R11 and R12, and R2 into R21, R22, and R23]

Allocation

Example: E ⇒ F1 = σ loc=Sa (E), F2 = σ loc=Sb (E)

• Do we replicate fragments?
• Where do we place each copy of each fragment?

[Figure: the fragments of E allocated across Site a, Site b, and Site c, with two copies of F1 and one copy of F2]
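The two fragments in this example are horizontal fragments: each is a selection on the Loc attribute of E. A minimal Python sketch, assuming the rows of E shown on the vertical fragmentation slide and a key column named `id` (both the column name and the helper `select` are assumptions):

```python
def select(rows, predicate):
    """Horizontal fragment: keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

E = [
    {"id": 5, "NM": "Joe", "Loc": "Sa", "Sal": 10},
    {"id": 7, "NM": "Sally", "Loc": "Sb", "Sal": 25},
    {"id": 8, "NM": "Fred", "Loc": "Sa", "Sal": 15},
]

F1 = select(E, lambda r: r["Loc"] == "Sa")   # selection loc = Sa
F2 = select(E, lambda r: r["Loc"] == "Sb")   # selection loc = Sb

# The fragments are disjoint and together cover E, so the union rebuilds it.
reconstructed = sorted(F1 + F2, key=lambda r: r["id"])
```

Allocation is then a separate decision: which site stores F1, which stores F2, and whether either fragment is copied to more than one site.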

Allocation Alternatives

• Non-replicated
  – partitioned: each fragment resides at only one site
• Replicated
  – fully replicated: each fragment at each site
  – partially replicated: each fragment at some of the sites
• Rule: if (read-only queries) / (update queries) ≥ 1, replication is advantageous; otherwise replication may cause problems
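The rule of thumb compares how often a fragment is read with how often it is updated. A small Python sketch of the check follows; the function name and the treatment of a fragment with no updates at all are assumptions made for illustration.

```python
def replication_advisable(read_only_queries: int, update_queries: int) -> bool:
    """Apply the ratio rule: replicate when reads / updates is at least 1."""
    if read_only_queries < 0 or update_queries < 0:
        raise ValueError("query counts must be non-negative")
    if update_queries == 0:
        # Assumption: data that is never updated is always safe to replicate.
        return True
    return read_only_queries / update_queries >= 1

mostly_read = replication_advisable(900, 100)    # ratio 9.0
mostly_written = replication_advisable(50, 200)  # ratio 0.25
```

The intuition is that every extra copy makes reads cheaper, because they can be served nearby, but makes updates dearer, because each copy has to be brought up to date.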

Optimization problem

• What is the best placement of fragments and/or the best number of copies to
  – minimize query response time
  – maximize throughput
  – minimize "some cost"
  – …
• Subject to constraints
  – Available storage
  – Available bandwidth, processing power, …
  – Keep 90% of response time below X
  – …

Very hard problem
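To make the search space concrete, the Python sketch below enumerates every placement of two fragments on three sites under a toy cost model and a storage constraint. Everything in it is an assumption for illustration: the cost model (one unit per remote read, one unit per remote copy touched by an update), the workload numbers, and the per-site capacity. Exhaustive search like this only works for tiny instances, which is one way to see why the general problem is hard.

```python
from itertools import combinations, product

def nonempty_subsets(sites):
    """All non-empty sets of sites that could hold copies of a fragment."""
    for k in range(1, len(sites) + 1):
        yield from combinations(sites, k)

def cost(placement, reads, updates):
    """Toy cost: remote reads plus remote copies written by each update."""
    total = 0
    for (site, frag), n in reads.items():
        if site not in placement[frag]:
            total += n
    for (site, frag), n in updates.items():
        total += n * sum(1 for s in placement[frag] if s != site)
    return total

def best_allocation(sites, fragments, reads, updates, capacity):
    """Exhaustively search placements that respect per-site capacity."""
    best = None
    options = list(nonempty_subsets(sites))
    for choice in product(options, repeat=len(fragments)):
        placement = dict(zip(fragments, choice))
        load = {s: 0 for s in sites}
        for holders in placement.values():
            for s in holders:
                load[s] += 1
        if any(load[s] > capacity[s] for s in sites):
            continue
        c = cost(placement, reads, updates)
        if best is None or c < best[0]:
            best = (c, placement)
    return best

sites = ("a", "b", "c")
fragments = ("F1", "F2")
reads = {("a", "F1"): 10, ("c", "F1"): 8, ("b", "F2"): 5}    # hypothetical
updates = {("a", "F1"): 1, ("b", "F2"): 4}                   # hypothetical
capacity = {"a": 1, "b": 1, "c": 1}                          # fragments per site

best_cost, best_placement = best_allocation(sites, fragments, reads, updates, capacity)
```

With m sites and k fragments this loop visits (2^m - 1)^k placements, so practical systems rely on heuristics rather than enumeration.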

Static data allocation & Dynamic data allocation

Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow in size or volume).

Dynamic data allocation: the location of the data changes dynamically as the data expands, which usually follows from the nature of the systems producing the data. Data placement problems can be handled with two types of models:

• Adaptive models: apply when the location changes because of system activity, as in online systems with online data storage (for example, airline bookings). These models keep additional temporary copies of the data, and processors then work on these duplicate copies (replication).
• Non-adaptive models: solve the dynamic allocation problem when the system is established or when it is reorganized.

Replication

Replication stores copies of the same data at more than one location (site); these copies must be updated consistently despite their distance from each other. When the copies are updated is controlled by one of two techniques:

• Lazy replication: the other copies are updated after work on one copy (the master copy) has completed, so the update is propagated outside the transaction boundaries.
• Eager replication: the replicated data is updated within the transaction boundaries, while the transaction works on one of the copies.

Where an update may be applied follows one of two approaches:

– Central update (initial copy, primary copy): the primary copy is updated first, then the secondary copies. Updates need no synchronization among sites, which makes consistency easier to control, but the primary copy may become a bottleneck.
– Update everywhere: an update may be applied at any copy, so all copies have an equal opportunity to be updated.
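The difference between eager and lazy propagation under a primary-copy scheme can be sketched in Python as follows. The class and method names are hypothetical, and the sketch ignores failures, concurrency, and real transaction management; it only shows when the secondary copies see a new value.

```python
class ReplicatedItem:
    """One data item with a primary copy and several secondary copies."""

    def __init__(self, n_secondaries, initial=0):
        self.primary = initial
        self.secondaries = [initial] * n_secondaries
        self.pending = []  # updates committed at the primary, not yet propagated

    def write_eager(self, value):
        """Eager: all copies are updated before the write (transaction) ends."""
        self.primary = value
        self.secondaries = [value] * len(self.secondaries)

    def write_lazy(self, value):
        """Lazy: only the primary is updated inside the transaction."""
        self.primary = value
        self.pending.append(value)

    def propagate(self):
        """Runs later, outside the transaction, to refresh the secondaries."""
        for value in self.pending:
            self.secondaries = [value] * len(self.secondaries)
        self.pending.clear()

item = ReplicatedItem(n_secondaries=2, initial=100)
item.write_lazy(120)
stale = list(item.secondaries)      # secondaries still hold the old value
item.propagate()
refreshed = list(item.secondaries)  # now they match the primary
```

Eager propagation keeps every copy identical at commit time at the price of a slower write; lazy propagation returns sooner but leaves a window in which reads at a secondary may see old data.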

Where / when        Eager                                            Lazy
Primary Copy        Early Solutions in Ingres                        Sybase/IBM/Oracle, Placement Strat., Serialization-Graph Based
Update Everywhere   ROWA/ROWAA, Quorum based, Oracle Synchr. Repl.   Oracle Advanced Repl., Weak consistency Strat.

Distributed Query Processing

The aim of query processing over distributed data is to make work on the distributed data appear like work on a single database system. The problem can be divided into several levels according to the data problems involved. The query processor takes SQL or OQL statements as input and passes them through several stages until the query is executed.

Query Decomposition & Data Localization
• The first stage analyzes the query and translates it into relational algebra; the second stage localizes the data by mapping the query onto the fragments where the data is stored.

Query Optimization
• The third stage seeks an optimal execution of the query by making the execution cost as small as possible and removing unneeded expressions.
• Query optimization is one of the most important aspects of query processing, and many algorithms have been proposed for it.

[Figure: layers of distributed query processing]

Input: calculus query on distributed relations

Control site:
1. Query decomposition (uses the global schema) → algebraic query on distributed relations
2. Data localization (uses the fragment schema) → fragment query
3. Global optimization (uses statistics on fragments) → optimized fragment query with communication operations

Local sites:
4. Local optimization (uses the local schema) → optimized local queries
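The data localization layer can be illustrated with the fragments F1 (loc = Sa) and F2 (loc = Sb) from the allocation example. A query on E is rewritten as a query on the fragments, and any fragment whose defining condition contradicts the query condition is dropped. In the Python sketch below the fragment catalog, the site assignments, and the function name `localize` are assumptions for illustration.

```python
# Hypothetical fragment schema: defining predicate value and storage site.
FRAGMENT_SCHEMA = {
    "F1": {"loc": "Sa", "site": "a"},
    "F2": {"loc": "Sb", "site": "b"},
}

def localize(query_loc=None):
    """Return the fragments (and their sites) a query on E must visit.

    query_loc is the value the query demands for Loc, or None when the
    query places no condition on Loc.
    """
    return [
        (name, info["site"])
        for name, info in FRAGMENT_SCHEMA.items()
        if query_loc is None or info["loc"] == query_loc
    ]

needs_one_site = localize("Sa")   # only F1 can contain matching rows
needs_all_sites = localize()      # no condition on Loc: visit every fragment
```

Pruning fragments at this stage is what lets the later optimization layers avoid shipping data from sites that cannot contribute to the result.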

Concurrency Control in distributed database

• Concurrency control in databases comprises the activities that keep transactions consistent across all the data in the system.
• A DDBMS must synchronize data that is distributed over the network sites, each of which runs programs that work on its own data. Controlling concurrency over distributed data is therefore one of the more complex issues.
• Four techniques are used to control concurrency in a distributed database:
  – locking techniques
  – timestamps
  – the optimistic algorithm, which lets all operations on the data proceed except updates; an update is applied to the local data first
  – a complex algorithm for timestamps

Distributed Concurrency Control

• Nonreplicated Scheme
  – Each site maintains a local lock manager to administer lock and unlock requests for local data
  – Deadlock handling is more complex
• Single-Coordinator Approach
  – The system maintains a single lock manager that resides in a single chosen site
  – Can be used with replicated data
  – Advantages
    • simple implementation
    • simple deadlock handling
  – Disadvantages
    • bottleneck
    • vulnerability

Distributed Concurrency Control

• Majority Protocol
  – A lock manager at each site
  – When a transaction wishes to lock a data item Q, which is replicated in n different sites, it must send a lock request to more than half of the n sites in which Q is stored
  – Complex to implement
  – Difficult to handle deadlocks
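The majority rule can be sketched in Python as follows. This is a simplified illustration, not a full protocol: it handles exclusive locks only, asks every replica site instead of just a majority, and gives up immediately instead of waiting, which sidesteps the deadlock handling that makes the real protocol hard. The class and function names are hypothetical.

```python
class SiteLockManager:
    """Local lock manager at one site: at most one holder per data item."""

    def __init__(self):
        self.holder = {}

    def try_lock(self, item, tx):
        if self.holder.get(item) in (None, tx):
            self.holder[item] = tx
            return True
        return False

    def unlock(self, item, tx):
        if self.holder.get(item) == tx:
            del self.holder[item]

def majority_lock(tx, item, sites):
    """Lock item for tx if more than half of its replica sites grant it."""
    granted = [s for s in sites if s.try_lock(item, tx)]
    if len(granted) > len(sites) // 2:
        return True
    for s in granted:          # no majority: give back what was obtained
        s.unlock(item, tx)
    return False

def majority_unlock(tx, item, sites):
    for s in sites:
        s.unlock(item, tx)

replicas = [SiteLockManager() for _ in range(3)]   # Q is stored at n = 3 sites
t1_first = majority_lock("T1", "Q", replicas)
t2_blocked = majority_lock("T2", "Q", replicas)
majority_unlock("T1", "Q", replicas)
t2_retry = majority_lock("T2", "Q", replicas)
```

Because any two majorities of the same n sites share at least one site, two transactions can never both hold a majority lock on Q at the same time.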

Page 20: Distributed Database Systems Dr. Mohamed Osman Hegazi

Dr Mohamed Osman Hegazi

Vertical Fragmentation branch relationship ndash parents and son

ΩήϓϷϥϮΌη

ΕΎΒΗή

Ϥϟ

(Sal

ary

allo

wan

ces

Tax

(

ΎϨϴϴόΘ

ϟΕ

(N

ame ad

dre

ss g

rade(

Dr Mohamed Osman Hegazi

Hybrid Fragmentation

R

HFHF

R1

VF VFVFVFVF

R11 R12 R21 R22 R23

R2

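Allocation

Example: relation E with fragments F1 = σ_loc=Sa(E) and F2 = σ_loc=Sb(E)

Figure: the fragments of E are to be placed on Site a, Site b and Site c; the figure shows two copies of F1 and one copy of F2.

  • Do we replicate fragments?
  • Where do we place each copy of each fragment?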
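The allocation example can be sketched in a few lines of Python. This is only an illustration: the rows of E, the `select` helper and the particular placement of F1 and F2 are assumptions, not taken from the slide; only the fragment definitions (selection on `loc`) follow it.

```python
# Hypothetical relation E; the attribute name "loc" follows the slide.
E = [
    {"name": "Amal", "loc": "Sa"},
    {"name": "Omar", "loc": "Sb"},
    {"name": "Sara", "loc": "Sa"},
]

def select(relation, attr, value):
    """Selection: keep the rows whose attribute equals the given value."""
    return [row for row in relation if row[attr] == value]

# Horizontal fragments of E, one per location value.
F1 = select(E, "loc", "Sa")
F2 = select(E, "loc", "Sb")

# One possible answer to the two allocation questions (an assumption):
# F1 is replicated at sites a and c, F2 is stored only at site b.
allocation = {"a": ["F1"], "b": ["F2"], "c": ["F1"]}

def sites_holding(fragment, allocation):
    """Return the sorted list of sites that store a copy of the fragment."""
    return sorted(site for site, frags in allocation.items() if fragment in frags)
```

Every row of E lands in exactly one fragment, so E can be rebuilt as the union of F1 and F2 regardless of where the copies are placed.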

Allocation Alternatives

  • Non-replicated
    – partitioned: each fragment resides at only one site
  • Replicated
    – fully replicated: each fragment at each site
    – partially replicated: each fragment at some of the sites
  • Rule: if (read-only queries / update queries) ≥ 1, replication is advantageous; otherwise replication may cause problems
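As a rough sketch, the rule can be written as a one-line check, assuming it compares the ratio of read-only queries to update queries against 1. The function name and the handling of a workload with no updates are choices made for this example.

```python
def replication_is_advantageous(read_only_queries: int, update_queries: int) -> bool:
    """Apply the rule: read-only / update >= 1 favours replication.

    Written as a comparison rather than a division so that a workload
    with zero update queries does not divide by zero.
    """
    return read_only_queries >= update_queries
```

For example, a workload with 900 read-only queries and 100 updates favours replication, while 100 read-only queries against 900 updates does not.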

Optimization problem

  • What is the best placement of fragments and/or best number of copies to
    – minimize query response time
    – maximize throughput
    – minimize “some cost”
  • Subject to constraints
    – available storage
    – available bandwidth, processing power, …
    – keep 90% of response times below X

Very hard problem
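A toy version of the problem can be solved by exhaustive search, which also shows why it gets hard quickly. The sketch below assumes a non-replicated allocation, a cost of one unit per remote access, and made-up fragment sizes, site capacities and access counts; none of these numbers come from the slides.

```python
from itertools import product

# Hypothetical inputs: fragment sizes, site storage capacities, and how often
# each site accesses each fragment.
sizes = {"F1": 2, "F2": 3}
capacity = {"a": 3, "b": 3, "c": 2}
accesses = {
    "F1": {"a": 10, "b": 1, "c": 4},
    "F2": {"a": 2, "b": 8, "c": 1},
}
REMOTE_COST = 1  # assumed cost of one access from a site that lacks the fragment

def cost(placement):
    """Total remote-access cost when each fragment lives at exactly one site."""
    return sum(
        count * REMOTE_COST
        for frag, site in placement.items()
        for other, count in accesses[frag].items()
        if other != site
    )

def feasible(placement):
    """Check the storage constraint at every site."""
    used = {site: 0 for site in capacity}
    for frag, site in placement.items():
        used[site] += sizes[frag]
    return all(used[s] <= capacity[s] for s in capacity)

def best_placement():
    """Try every non-replicated placement and keep the cheapest feasible one."""
    frags = list(sizes)
    candidates = (
        dict(zip(frags, sites)) for sites in product(capacity, repeat=len(frags))
    )
    return min((p for p in candidates if feasible(p)), key=cost)
```

The search visits (number of sites) to the power (number of fragments) placements, and allowing replication enlarges the space further, which is consistent with the slide calling this a very hard problem.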

Static data allocation & Dynamic data allocation

Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow in size).

Dynamic data allocation: the location of the data is changed dynamically as the data expands, which usually follows from the nature of the systems producing the data. Data placement problems can be treated through two types of models:

  • Adaptive models: apply when the change of location is caused by system activity (online systems with online data storage, for example airline bookings). These models keep additional temporary copies of the data and have the processors work on these duplicate copies (replication).
  • Non-adaptive models: solve the dynamic allocation when the system is established or when it is reorganized.

Replication

Replication stores copies of the same data at more than one location (site); these copies must be updated consistently despite the distance between them. When the copies are updated is controlled by one of two techniques:

  • Lazy replication: the other copies are updated after work on one of the copies (the master copy) has completed, so the update is done outside the boundaries of the transaction.
  • Eager replication: the replicated data is updated within the transaction boundaries, while working on one of the copies.

Where the update is made is one of:

  – Central update (initial copy, primary copy): update the primary copy first and then the secondary copies. Updates do not have to be synchronized between sites, which makes consistency easier to control, but the primary copy may become a bottleneck.
  – Update everywhere: the copies can be updated at any site, so all copies have an equal opportunity to receive the update.
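The difference between eager and lazy propagation under a primary copy can be sketched as follows. The class names and the in-memory dictionaries standing in for sites are assumptions made for illustration; a real system would also need transactions, failure handling and network communication.

```python
class Replica:
    """One copy of the data at one site."""
    def __init__(self):
        self.data = {}

class PrimaryCopy:
    """Primary-copy replication: every write goes to the primary first."""
    def __init__(self, secondaries, eager):
        self.primary = Replica()
        self.secondaries = secondaries
        self.eager = eager
        self.pending = []  # updates not yet propagated (lazy mode only)

    def write(self, key, value):
        self.primary.data[key] = value
        if self.eager:
            # Eager: secondaries are updated before the write returns,
            # i.e. inside the transaction boundary.
            for replica in self.secondaries:
                replica.data[key] = value
        else:
            # Lazy: remember the update and propagate it later.
            self.pending.append((key, value))

    def propagate(self):
        """Lazy mode: push queued updates to the secondaries after commit."""
        for key, value in self.pending:
            for replica in self.secondaries:
                replica.data[key] = value
        self.pending.clear()
```

In lazy mode a read at a secondary site can return stale data until `propagate` runs, which is the price paid for keeping the update outside the transaction.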

Where / when: Eager vs. Lazy

  • Primary copy
    – Eager: early solutions in Ingres
    – Lazy: Sybase/IBM/Oracle, placement strategies, serialization-graph based
  • Update everywhere
    – Eager: ROWA/ROWAA, quorum based, Oracle synchronous replication
    – Lazy: Oracle Advanced Replication, weak consistency strategies

Distributed Query Processing

The aim of query processing on distributed data is to make work on the distributed data look like work on a single database system. The problem can be broken into several levels according to the data issues involved. Query processing takes SQL or OQL statements as input and passes them through several stages until the query is executed.

Query Decomposition & Data Localization
  • The first stage of distributed query processing analyzes the query and translates it into relational algebra; the second stage localizes the data by distributing the query over the fragments.

Query Optimization
  • The third stage seeks an optimal execution of the query by keeping the execution cost as low as possible and deleting unneeded expressions.
  • Query optimization is one of the most important aspects of dealing with queries; many algorithms have been developed for it.
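Data localization and the removal of unneeded expressions can be illustrated with a small sketch. It assumes a relation E that is horizontally fragmented on an attribute `loc`; the fragment catalogue, the site names and both helper functions are hypothetical.

```python
# Hypothetical fragment catalogue: each fragment of relation E is described by
# the value of "loc" it holds and the site where it is stored.
fragments = {
    "F1": {"loc": "Sa", "site": "a", "rows": [{"name": "Amal", "loc": "Sa"}]},
    "F2": {"loc": "Sb", "site": "b", "rows": [{"name": "Omar", "loc": "Sb"}]},
}

def localize(loc_filter=None):
    """Return the fragments a query on E must touch.

    A query with no filter on "loc" becomes the union of all fragments;
    a query with loc = value only needs the fragments defined for that
    value, so the other branches of the union are dropped.
    """
    return [
        name for name, frag in fragments.items()
        if loc_filter is None or frag["loc"] == loc_filter
    ]

def run_query(loc_filter=None):
    """Evaluate the localized query as a union over the selected fragments."""
    result = []
    for name in localize(loc_filter):
        result.extend(fragments[name]["rows"])
    return result
```

A query restricted to `loc = "Sb"` touches only F2 at site b, so no data from site a has to be read or shipped.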

Figure: layers of distributed query processing.

At the control site:
  1 Input: calculus query on distributed relations
  2 Query decomposition, using the global schema, produces an algebraic query on distributed relations
  3 Data localization, using the fragment schema, produces a fragment query
  4 Global optimization, using statistics on fragments, produces an optimized fragment query with communication operations

At the local sites:
  5 Local optimization, using the local schema, produces optimized local queries

Concurrency Control in distributed database

  • Concurrency control in databases is the set of activities that keeps transactions consistent across all the data in the system.
  • A DDBMS takes care of synchronizing data that is distributed over the network sites, each of which runs programs that deal with its own data. In this situation, controlling concurrency on the distributed data is one of the more complex issues.
  • There are four techniques used to control concurrency in a distributed database:
    – locking techniques
    – timestamps
    – optimistic algorithms: all operations on the data are performed, except for operations that update the data; in that case the operation updates the local data first
    – complex algorithms for timestamps
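One common way to realize the optimistic idea is to let a transaction write to a private local copy and validate at commit time. The sketch below is an assumption about how that could look, using per-item version numbers for validation; the class and method names are invented for this example and the slides do not prescribe this scheme.

```python
class Store:
    """Shared data: key -> (value, version)."""
    def __init__(self):
        self.items = {}

    def read(self, key):
        return self.items.get(key, (None, 0))

class OptimisticTransaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # versions seen by this transaction
        self.local_writes = {}   # updates applied to a private local copy first

    def read(self, key):
        if key in self.local_writes:
            return self.local_writes[key]
        value, version = self.store.read(key)
        self.read_versions[key] = version
        return value

    def write(self, key, value):
        # Remember the version the write is based on, then update locally.
        self.read_versions.setdefault(key, self.store.read(key)[1])
        self.local_writes[key] = value

    def commit(self):
        """Validate, then publish local writes; return False on conflict."""
        for key, seen in self.read_versions.items():
            if self.store.read(key)[1] != seen:
                return False
        for key, value in self.local_writes.items():
            _, version = self.store.read(key)
            self.store.items[key] = (value, version + 1)
        return True
```

If two transactions update the same item, the first to commit wins and the second fails validation and has to be restarted.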

Distributed Concurrency Control

  • Nonreplicated Scheme
    – Each site maintains a local lock manager to administer lock and unlock requests for local data
    – Deadlock handling is more complex
  • Single-Coordinator Approach
    – The system maintains a single lock manager that resides in a single chosen site
    – Can be used with replicated data
    – Advantages
      • simple implementation
      • simple deadlock handling
    – Disadvantages
      • bottleneck
      • vulnerability
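A minimal sketch of the single-coordinator approach, assuming exclusive locks only and ignoring the network: all lock and unlock requests go through one object, which stands in for the lock manager at the chosen site. The class name and its methods are illustrative.

```python
class LockCoordinator:
    """Single lock manager at one chosen site; exclusive locks only."""
    def __init__(self):
        self.owner = {}  # data item -> transaction holding the lock

    def lock(self, txn, item):
        """Grant the lock if the item is free or already held by txn."""
        holder = self.owner.setdefault(item, txn)
        return holder == txn

    def unlock(self, txn, item):
        """Release the lock, but only if txn is the current holder."""
        if self.owner.get(item) == txn:
            del self.owner[item]
```

Because every request passes through this one table, implementation and deadlock detection stay simple, but the coordinator is also the bottleneck and the single point of failure listed under the disadvantages.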

Distributed Concurrency Control

  • Majority Protocol
    – A lock manager at each site
    – When a transaction wishes to lock a data item Q that is replicated at n different sites, it must send a lock request to more than half of the n sites at which Q is stored
    – Complex to implement
    – Difficult to handle deadlocks
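The majority rule can be sketched as follows, assuming exclusive locks and one in-memory lock manager per site. For simplicity this sketch asks every site that stores Q and then checks whether more than half granted the lock; the names `SiteLockManager` and `majority_lock` are invented for the example.

```python
class SiteLockManager:
    """Lock manager for the replicas stored at one site (exclusive locks only)."""
    def __init__(self):
        self.owner = {}  # data item -> transaction holding the lock here

    def lock(self, txn, item):
        return self.owner.setdefault(item, txn) == txn

    def unlock(self, txn, item):
        if self.owner.get(item) == txn:
            del self.owner[item]

def majority_lock(txn, item, sites):
    """Try to lock item at more than half of the sites that store it.

    Returns True if a majority granted the lock; otherwise releases
    whatever was granted and returns False.
    """
    granted = [site for site in sites if site.lock(txn, item)]
    if len(granted) > len(sites) // 2:
        return True
    for site in granted:
        site.unlock(txn, item)
    return False
```

Since two transactions cannot both hold more than half of the n locks, at most one of them can succeed at a time. When several transactions each hold a minority of the locks and wait for the rest, none can proceed, which hints at why deadlock handling is difficult here.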

  • Slide 6
  • Distributed Database Management Systems
  • An Example
  • Distributed Query
  • Slide 10
  • Distributed DB Design
  • Fragmentation
  • Example
  • Slide 14
  • Slide 15
  • Slide 16
  • Vertical fragmentation
  • Vertical Fragmentation example
  • Grouping Attributes
  • Slide 20
  • Hybrid Fragmentation
  • Allocation
  • Allocation Alternatives
  • Optimization problem
  • Slide 25
  • Replication
  • Slide 27
  • Distributed Query Processing
  • Slide 29
  • Slide 30
  • Distributed Concurrency Control
  • Distributed Concurrency Control
Page 25: Distributed Database Systems Dr. Mohamed Osman Hegazi

Static data allocation & Dynamic data allocation

Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow in size).

Dynamic data allocation: the location of the data changes dynamically as the data expands, which usually follows from the nature of the systems producing the data. Data placement problems can be treated through two types of models:

• Adaptive models: these apply when the change of location is caused by system activity (online systems with online data storage, for example airline bookings). These models keep additional temporary copies of the data and then process these duplicate copies (replication).

• Non-adaptive models: these solve dynamic allocation when the system is established or when the system is reorganized.

Page 26: Distributed Database Systems Dr. Mohamed Osman Hegazi

Replication

Replication stores copies of the same data at more than one location (site). These copies must be updated consistently despite their distance from each other.

When the copies are updated is controlled by one of two techniques:

– Lazy replication: the other copies are updated after work on one of the copies (the master copy) has completed. The update is therefore propagated outside the boundaries of the transaction.

– Eager replication: the replicated data is updated within the transaction boundaries, while the transaction works on one of the copies.

Where the update is applied is also one of two choices:

– Central update (initial copy, primary copy): update the primary copy first and then update the secondary copies. Because every update passes through the primary copy, the sites do not need to synchronize their updates with each other, which makes consistency easier to control but may create a bottleneck.

– Update everywhere: an update may be applied at any copy, so all copies have an equal opportunity to be updated.
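The difference between eager and lazy propagation under the primary-copy scheme can be sketched in a few lines of Python. This is a minimal illustration, not a real replication engine: the `Site` and `PrimaryCopyReplication` classes, the `propagate` step and the `seat_12A` key are all hypothetical names invented for the example, and there is no networking, failure handling or concurrency.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One site holding a copy of the replicated data (hypothetical model)."""
    name: str
    data: dict = field(default_factory=dict)


class PrimaryCopyReplication:
    """Primary-copy replication: every write goes to the primary first."""

    def __init__(self, primary, secondaries, eager):
        self.primary = primary
        self.secondaries = secondaries
        self.eager = eager
        self.pending = []  # updates not yet propagated (lazy mode only)

    def write(self, key, value):
        """Run one update transaction against the primary copy."""
        self.primary.data[key] = value
        if self.eager:
            # Eager: secondaries are updated inside the transaction boundary.
            for site in self.secondaries:
                site.data[key] = value
        else:
            # Lazy: the transaction ends here; propagation happens later.
            self.pending.append((key, value))

    def propagate(self):
        """Apply queued updates to the secondaries, in commit order."""
        for key, value in self.pending:
            for site in self.secondaries:
                site.data[key] = value
        self.pending.clear()


def demo():
    """Record what a secondary sees before and after propagation."""
    results = {}
    for mode, eager in (("eager", True), ("lazy", False)):
        primary, copy = Site("primary"), Site("secondary")
        repl = PrimaryCopyReplication(primary, [copy], eager=eager)
        repl.write("seat_12A", "booked")
        before = copy.data.get("seat_12A")
        repl.propagate()
        after = copy.data.get("seat_12A")
        results[mode] = (before, after)
    return results
```

In the lazy case a reader at the secondary site sees stale data until `propagate` runs; in the eager case the secondary is already up to date when `write` returns, at the cost of a longer transaction.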

Page 27: Distributed Database Systems Dr. Mohamed Osman Hegazi

Where / when      | Eager                                           | Lazy
Primary Copy      | Early Solutions in Ingres                       | Sybase/IBM/Oracle; Placement Strat.; Serialization-Graph Based
Update Everywhere | ROWA/ROWAA; Quorum based; Oracle Synchr. Repl.  | Oracle Advanced Repl.; Weak consistency Strat.

Page 28: Distributed Database Systems Dr. Mohamed Osman Hegazi

Distributed Query Processing

The aim of query processing over distributed data is to make work on distributed data appear like work on a single database system. The problem of query processing over distributed data can be divided into several levels according to the data involved. Query processing takes an SQL or OQL statement as input and passes it through several stages until the query is executed.

Query Decomposition & Data Localization

• The first stage of distributed query processing analyzes the query and translates it into relational algebra; the second stage localizes the data by distributing the query over the fragments.

Query Optimization

• The third stage seeks an optimal execution of the query by making the execution cost as small as possible and deleting unneeded expressions.

• Query optimization is one of the most important aspects of dealing with queries; many algorithms have been developed for it.
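The localization stage can be illustrated with a small Python sketch. It assumes a hypothetical relation EMP that is horizontally fragmented by ranges of `emp_id`; the `Fragment` class, the `localize` function, the fragment names and the site names are all invented for this example. The idea shown is only the pruning step: fragments whose range cannot satisfy the query predicate are dropped, so the query is sent only to the sites that may hold matching rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A horizontal fragment holding rows with low <= emp_id < high."""
    name: str
    site: str
    low: int
    high: int


def localize(fragments, query_low, query_high):
    """Keep the fragments that can hold rows with
    query_low <= emp_id < query_high; all other fragments are pruned."""
    return [f for f in fragments
            if f.low < query_high and query_low < f.high]


# Hypothetical fragmentation of EMP over three sites.
FRAGMENTS = [
    Fragment("EMP1", "site_A", 0, 100),
    Fragment("EMP2", "site_B", 100, 200),
    Fragment("EMP3", "site_C", 200, 300),
]

# SELECT * FROM EMP WHERE emp_id >= 120 AND emp_id < 180
needed = localize(FRAGMENTS, 120, 180)
```

Here only EMP2 at `site_B` has to be queried. A real optimizer would also use statistics on the fragments to choose join orders and data transfers, which this sketch leaves out.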

Page 29: Distributed Database Systems Dr. Mohamed Osman Hegazi

[Figure: layers of distributed query processing]

Control site:
1. Query Decomposition (uses the Global Schema): Calculus query on distributed relations → Algebraic query on distributed relations
2. Data Localization (uses the Fragment Schema): Algebraic query on distributed relations → Fragment query
3. Global Optimization (uses Statistics on fragments): Fragment query → Optimized fragment query with communication operations

Local sites:
4. Local Optimization (uses the Local Schema): Optimized fragment query with communication operations → Optimized local queries

Page 30: Distributed Database Systems Dr. Mohamed Osman Hegazi

Concurrency Control in distributed database

• Concurrency control in databases is the set of activities that keep transactions consistent across all the data in the system.

• A DDBMS must synchronize data that is distributed over the sites of the network; each of these sites runs programs that deal with its own data. In this situation, controlling the concurrency of the distributed data is one of the more complex issues.

• Four techniques are used to control concurrency in a distributed database:
– Locking techniques
– Timestamps
– Optimistic algorithm: all operations on the data are performed except the operation that updates the data; in that case the operation updates the local data first
– Complex algorithm for timestamps
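The slide names timestamps as one technique without giving details. As an illustration only, the classic basic timestamp-ordering rule can be sketched in Python as follows. The `TimestampOrdering` class and its method names are hypothetical, timestamps are assumed to be positive integers assigned to transactions, and a rejected operation simply returns `False` where a real system would abort and restart the transaction.

```python
class TimestampOrdering:
    """Basic timestamp ordering for a single site (illustrative sketch)."""

    def __init__(self):
        self.read_ts = {}   # item -> largest timestamp that has read it
        self.write_ts = {}  # item -> timestamp of the last write

    def read(self, ts, item):
        """Allow the read unless a younger transaction already wrote item."""
        if ts < self.write_ts.get(item, 0):
            return False
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        """Allow the write unless a younger transaction already read or
        wrote item."""
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False
        self.write_ts[item] = ts
        return True
```

For example, after a transaction with timestamp 2 reads item Q, a write to Q by a transaction with timestamp 1 is rejected, because accepting it would contradict the timestamp order.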

Page 31: Distributed Database Systems Dr. Mohamed Osman Hegazi

Distributed Concurrency Control

• Nonreplicated Scheme
– Each site maintains a local lock manager to administer lock and unlock requests for local data
– Deadlock handling is more complex

• Single-Coordinator Approach
– The system maintains a single lock manager that resides in a single chosen site
– Can be used with replicated data
– Advantages
  • simple implementation
  • simple deadlock handling
– Disadvantages
  • bottleneck
  • vulnerability
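A minimal Python sketch of the single-coordinator approach shows why deadlock handling is simple there: all lock requests pass through one object, so the whole wait-for relation is visible in one place. The `CentralLockManager` class and its methods are hypothetical, only exclusive locks are modeled, and a blocked transaction is assumed to wait for exactly one other transaction at a time.

```python
class CentralLockManager:
    """Single lock manager at one chosen site (exclusive locks only)."""

    def __init__(self):
        self.owner = {}      # data item -> transaction holding the lock
        self.waits_for = {}  # blocked transaction -> transaction it waits for

    def lock(self, txn, item):
        """Grant the lock if the item is free; otherwise record the wait."""
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            self.waits_for.pop(txn, None)
            return True
        self.waits_for[txn] = holder
        return False

    def unlock(self, txn, item):
        """Release the lock if txn holds it."""
        if self.owner.get(item) == txn:
            del self.owner[item]

    def deadlocked(self, txn):
        """Return True if following the wait-for chain leads back to txn."""
        seen = set()
        current = txn
        while current in self.waits_for:
            current = self.waits_for[current]
            if current == txn:
                return True
            if current in seen:
                return False  # a cycle exists, but txn is not part of it
            seen.add(current)
        return False
```

The same single object is also the bottleneck and the single point of failure listed under the disadvantages: every request from every site goes through it.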

Page 32: Distributed Database Systems Dr. Mohamed Osman Hegazi

Distributed Concurrency Control

• Majority Protocol
– A lock manager at each site
– When a transaction wishes to lock a data item Q, which is replicated in n different sites, it must send a lock request to more than half of the n sites in which Q is stored
– Complex to implement
– Difficult to handle deadlocks
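The majority rule can be sketched in Python as follows. The `SiteLockManager` class and the `majority_lock` function are hypothetical names for this example. As a simplification, the sketch asks every replica site for the lock and succeeds once more than half have granted it; messages, timeouts and shared locks are not modeled.

```python
class SiteLockManager:
    """Local lock manager at one replica site (exclusive locks only)."""

    def __init__(self, name):
        self.name = name
        self.owner = {}  # data item -> transaction holding the local lock

    def try_lock(self, txn, item):
        """Grant the local lock if it is free or already held by txn."""
        holder = self.owner.setdefault(item, txn)
        return holder == txn

    def unlock(self, txn, item):
        """Release the local lock if txn holds it."""
        if self.owner.get(item) == txn:
            del self.owner[item]


def majority_lock(txn, item, replicas):
    """Lock item for txn if more than half of the replica sites grant it.

    On failure, every local lock obtained in this attempt is released so
    that another transaction can still reach a majority.
    """
    granted = [site for site in replicas if site.try_lock(txn, item)]
    if 2 * len(granted) > len(replicas):
        return True
    for site in granted:
        site.unlock(txn, item)
    return False
```

Because any two majorities of the same n sites overlap in at least one site, two transactions can never both hold a majority lock on Q at the same time. Two transactions that each hold some local locks and refuse to release them could block each other, which is one reason deadlock handling is listed as difficult; the sketch avoids this by releasing on failure.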
