Chapter 11
Grid Concurrency Control
11.1 A Grid Database Environment
11.2 An Example
11.3 Grid Concurrency Control (GCC)
11.4 Correctness of GCC
11.5 Features of GCC Protocol
11.6 Summary
11.7 Bibliographical Notes
11.8 Exercises
Grid Concurrency Control
A concurrency control protocol helps to maintain the consistency of data in a database
Concurrency control protocols address the ‘C’ and ‘I’ of the ACID properties
Serializability is the most widely accepted correctness criterion
Different DB architectures need different concurrency control protocols, i.e. a concurrency control protocol for a centralized DBMS will be different from that of a distributed DBMS
D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008
11.1 A Grid Database Environment
Data is geographically distributed in a Grid environment
The typical operation of a database in a Grid architecture is shown in the figure
[Figure: Transactions T1 and T2 and their sub-transactions ST12, ST13, ST22 and ST23, with sites DB1, DB2 and DB3 connected through the Grid middleware. Legend: T1 = Transaction 1; T2 = Transaction 2; STij = sub-transaction of transaction i at site j]
A distributed Grid database with three sites, DB1, DB2 and DB3, is shown (connected via Grid middleware)
Transactions can be submitted at any site and may need to access data from all the sites
The originator (coordinator) is the site where a transaction is submitted
Transactions T1 and T2 are submitted to DB1 and need to access data from DB2 and DB3 as well
Transaction and site identifiers are suffixed, e.g. T1 will have sub-transactions ST12 and ST13, and T2 will have sub-transactions ST22 and ST23
11.1 A Grid Database Environment (Cont’d)
Data access must be synchronized to maintain correctness of data
Global lock tables, global logs, etc. cannot be implemented in a Grid environment
Different DB sites may implement different concurrency control protocols, e.g. one site may use locking whereas another site may use an optimistic concurrency control protocol
This situation is unavoidable in a Grid architecture because the DB sites are heterogeneous
11.2 An Example
The following example shows that using traditional concurrency control protocols in the Grid environment may corrupt the data
Example
Consider four data objects stored in two databases, DB2 and DB3:
  DB2 = O1 and O2
  DB3 = O3 and O4
Two transactions are submitted to the database DB1, as shown below:
  T1 = r1(O1) r1(O2) w1(O3) w1(O1) C1
  T2 = r2(O1) r2(O3) w2(O4) w2(O1) C2
The transactions are submitted to the Grid middleware, and the metadata service forms the required sub-transactions as follows:
Sub-transactions of T1:
  ST12 = r12(O1) r12(O2) w12(O1) C12    (11.1)
  ST13 = w13(O3) C13    (11.2)
Sub-transactions of T2:
  ST22 = r22(O1) w22(O1) C22    (11.3)
  ST23 = r23(O3) w23(O4) C23    (11.4)
11.2 An Example (Cont’d)
The sub-transactions are submitted to their respective sites, i.e. ST12 and ST22 are submitted to DB2, and ST13 and ST23 are submitted to DB3
All DB sites are autonomous, hence schedules/histories are created independently. Say DB2 creates the following history:
  H2 = r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) C22    (11.5)
and DB3 creates the following history:
  H3 = r23(O3) w23(O4) C23 w13(O3) C13    (11.6)
From equation 11.5 the serializability order is T1 before T2, and from equation 11.6 the serializability order is T2 before T1
Each of the histories H2 and H3 is serializable in isolation, but when both histories are combined the serializability graph contains a cycle T1 → T2 → T1
A traditional distributed DB handles this situation by implementing a global management layer, which is not possible in Grid databases. The Grid Concurrency Control protocol is discussed next
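The cycle can be checked mechanically. The following is a minimal Python sketch, not taken from the book: it encodes histories (11.5) and (11.6) as tuples, derives the precedence arcs at each site from conflicting operations, and tests for a cycle. The names precedence_edges and has_cycle and the tuple encoding are assumptions made for illustration.

```python
from itertools import combinations

# Histories (11.5) and (11.6) as (operation, global transaction, data object).
# Commit operations are omitted because they do not conflict with anything.
H2 = [("r", "T1", "O1"), ("r", "T1", "O2"), ("w", "T1", "O1"),
      ("r", "T2", "O1"), ("w", "T2", "O1")]
H3 = [("r", "T2", "O3"), ("w", "T2", "O4"), ("w", "T1", "O3")]


def precedence_edges(history):
    """Return arcs (Ti, Tj) where an operation of Ti conflicts with a later one of Tj."""
    edges = set()
    # combinations() keeps list order, so the first element of each pair is earlier.
    for (op1, t1, x1), (op2, t2, x2) in combinations(history, 2):
        # Conflict: different transactions, same object, at least one write.
        if t1 != t2 and x1 == x2 and "w" in (op1, op2):
            edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as arc pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "visiting":
            return True          # reached a node that is still on the DFS path
        if state.get(node) == "done":
            return False
        state[node] = "visiting"
        if any(visit(nxt) for nxt in graph[node]):
            return True
        state[node] = "done"
        return False

    return any(visit(node) for node in list(graph))


edges_db2 = precedence_edges(H2)   # arcs visible to DB2 only
edges_db3 = precedence_edges(H3)   # arcs visible to DB3 only
print(has_cycle(edges_db2), has_cycle(edges_db3), has_cycle(edges_db2 | edges_db3))
```

Each site sees an acyclic graph on its own; only the union of the two arc sets, which no single autonomous site holds, reveals the cycle.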
11.3 Grid Concurrency Control (GCC)
The above example is the motivation for GCC: although individual sites generate serializable schedules, in the global view the transactions may be ordered incorrectly
Functions required by GCC:
  DB_Accessed(T): takes a global transaction as argument and returns the set of databases where sub-transactions of the global transaction are submitted
  Split_Trans(T): takes a global transaction as argument and returns a set of sub-transactions
  Active_Trans(DB): takes a database as argument and returns the set of global transactions having any sub-transaction running in the database
  Cardinality(Any Set): takes any set, e.g. a set of databases or a set of sub-transactions, and returns the number of elements in the set
  Append_TS(Subtransaction): takes a sub-transaction as argument and attaches a unique timestamp to it. Sub-transactions of the same global transaction have the same timestamp value
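A rough Python sketch of these helper functions follows. It is an illustration under stated assumptions, not the book's implementation: the LOCATION catalogue, the SubTransaction class, the extra ops and running parameters, and the counter-based timestamp are all hypothetical choices made so that the example runs on the data of Section 11.2.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Hypothetical metadata catalogue: which site stores which data object.
LOCATION = {"O1": "DB2", "O2": "DB2", "O3": "DB3", "O4": "DB3"}


@dataclass
class SubTransaction:
    txn: str                          # identifier of the global transaction
    site: str                         # database site that executes it
    ops: list                         # (kind, object) pairs, e.g. ("r", "O1")
    timestamp: Optional[int] = None   # filled in by append_ts


def split_trans(txn, ops):
    """Split_Trans(T): group the operations of T by the site holding the data."""
    by_site = {}
    for kind, obj in ops:
        by_site.setdefault(LOCATION[obj], []).append((kind, obj))
    return [SubTransaction(txn, site, site_ops)
            for site, site_ops in sorted(by_site.items())]


def db_accessed(txn, ops):
    """DB_Accessed(T): set of databases that receive a sub-transaction of T."""
    return {sub.site for sub in split_trans(txn, ops)}


def cardinality(any_set):
    """Cardinality(Any Set): number of elements in the set."""
    return len(any_set)


_clock = count(1)   # assumed timestamp source: a simple counter
_issued = {}        # global transaction -> timestamp already handed out


def append_ts(sub):
    """Append_TS: siblings of one global transaction share one timestamp."""
    if sub.txn not in _issued:
        _issued[sub.txn] = next(_clock)
    sub.timestamp = _issued[sub.txn]
    return sub


def active_trans(db, running):
    """Active_Trans(DB): global transactions with a sub-transaction running at db."""
    return {sub.txn for sub in running if sub.site == db}


T1 = [("r", "O1"), ("r", "O2"), ("w", "O3"), ("w", "O1")]
subs_t1 = [append_ts(sub) for sub in split_trans("T1", T1)]
```

With the operations of T1 from Section 11.2, split_trans yields one sub-transaction for DB2 and one for DB3, matching ST12 and ST13.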
11.3 Grid Concurrency Control (GCC) (Cont’d)
Grid Serializability Theorem
Traditional conflict serializability is not sufficient to ensure consistency in a Grid database environment
A Grid serializability theorem is needed to ensure correctness of data
Global transactions can be classified into two categories:
  Global transactions with only one sub-transaction
  Global transactions with more than one sub-transaction
Total order is defined as below:
11.3 Grid Concurrency Control (GCC) (Cont’d)
In traditional serializability theory, a serial history is considered correct. On the same grounds, a Grid-serial history is considered correct in the Grid architecture
A Grid-serial history is defined as below:
Condition (1) of Definition 11.2 is very strict and does not allow interleaving of operations
Hence a more practical notion, the Grid-serializable history, is used
11.3 Grid Concurrency Control (GCC) (Cont’d)
Grid serializable history:
Grid serializability is analysed by the grid serializability graph
If the graph is acyclic the history is Grid serializable
11.3 Grid Concurrency Control (GCC) (Cont’d)
Grid Serializability graph is defined as below:
11.3 Grid Concurrency Control (GCC) (Cont’d)
Condition (1) considers local transactions in the Grid serializability graph
Condition (2) only considers those global transactions having more than one sub-transaction
Condition (3) shows the arc between conflicting transactions
The Grid serializability graph is stored at local sites, as there is no global management layer
The following types of conflicts are possible:
  Conflict between global transactions (global-global conflict)
  Conflict between a global transaction and a local transaction (global-local conflict)
  Conflict between local transactions (local-local conflict)
An acyclic Grid serializability graph is used to resolve global-local conflicts
Total order is used to resolve global-global conflicts
11.3 Grid Concurrency Control (GCC) (Cont’d)
Based on the Grid serializability graph and the total order, the Grid serializability theorem is as follows:
11.3 Grid Concurrency Control (GCC) (Cont’d)
Example of Grid serializability graph:
In addition to the global transactions of the earlier example, consider the following local transactions (LT12 is read as local transaction 1 at database site DB2):
  LT12 = lr12(O1) lw12(O2) lC12
  LT13 = lw13(O3) lC13
Now consider the following modified histories:
  H2 = lr12(O1) r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) lw12(O2) C22 lC12
  H3 = r23(O3) w23(O4) lw13(O3) C23 w13(O3) C13 lC13
11.3 Grid Concurrency Control (GCC) (Cont’d)
The following figure shows the Grid serializability graph at sites DB2 and DB3
[Figure: Grid serializability graphs. At site DB2: nodes ST12, ST22 and LT12. At site DB3: nodes ST13, ST23 and LT13]
The three possible types of conflicts are discussed below:
Global-global conflict: At site DB2, ST12 precedes ST22 (i.e. T1 precedes T2), and at site DB3, ST23 precedes ST13 (i.e. T2 precedes T1). Thus a cycle is formed across different sites, and it may be impossible to identify the cycle without a global management layer. The total order used in Grid serializability avoids the formation of cycles at distributed sites
Global-local conflict: Can be identified and resolved by the local DBMS, e.g. in DB2 between ST12 and LT12
Local-local conflict: Can be identified and resolved by the local DBMS, as in a traditional DBMS
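The arcs of the two local graphs can be derived from the modified histories. The Python sketch below is only an illustration: it encodes the modified H2 and H3 as tuples (an assumed encoding, with commits left out) and labels every arc with its conflict type. The names site_graph and conflict_kind are hypothetical.

```python
from itertools import combinations

# Modified histories as (operation, transaction label, data object); commits omitted.
H2 = [("r", "LT12", "O1"), ("r", "ST12", "O1"), ("r", "ST12", "O2"),
      ("w", "ST12", "O1"), ("r", "ST22", "O1"), ("w", "ST22", "O1"),
      ("w", "LT12", "O2")]
H3 = [("r", "ST23", "O3"), ("w", "ST23", "O4"), ("w", "LT13", "O3"),
      ("w", "ST13", "O3")]


def conflict_kind(a, b):
    """Classify an arc by how many of its endpoints are local transactions."""
    local_count = sum(label.startswith("LT") for label in (a, b))
    return ("global-global", "global-local", "local-local")[local_count]


def site_graph(history):
    """Map each arc (earlier, later) of the local graph to its conflict type."""
    arcs = {}
    for (op1, t1, x1), (op2, t2, x2) in combinations(history, 2):
        # Conflict: different transactions, same object, at least one write.
        if t1 != t2 and x1 == x2 and "w" in (op1, op2):
            arcs[(t1, t2)] = conflict_kind(t1, t2)
    return arcs


graph_db2 = site_graph(H2)
graph_db3 = site_graph(H3)
for arc, kind in sorted(graph_db2.items()):
    print("DB2", arc, kind)
for arc, kind in sorted(graph_db3.items()):
    print("DB3", arc, kind)
```

Under this encoding, DB2 ends up with arcs in both directions between ST12 and LT12, which is a global-local conflict the local DBMS can see by itself. The global-global arcs ST12 → ST22 at DB2 and ST23 → ST13 at DB3 are each harmless locally and only contradict each other across sites.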
11.3 Grid Concurrency Control (GCC) (Cont’d)
Grid Concurrency Control Protocol
Has two phases: submission and termination
The site where a transaction is submitted is called the originator
The Split_Trans(T) function is used to generate the sub-transactions of a global transaction
Sub-transactions are then submitted to the participating sites
A unique timestamp is attached to each sub-transaction before submission
Sub-transactions at local databases are executed in total order
A local scheduler does not distinguish between a local transaction and a sub-transaction of a global transaction
A global transaction with only one sub-transaction does not need to be in the total order, as it cannot conflict with another global transaction at more than one site
GCC (Cont’d)
Submission phase of GCC
11.3 Grid Concurrency Control (GCC) (Cont’d)
Step-1) Check whether data from multiple sites needs to be accessed
  If data from only the originator is required, treat the transaction as a local transaction
  If more than one DB needs to be accessed, the transaction is submitted to the metadata services. The Split_Trans(T) function is used to create sub-transactions
Step-2) Global transactions are added to a set which stores all the currently executing global transactions. The set is named Active_Trans
Step-3) The middleware appends a timestamp to all sub-transactions before submitting them to their respective databases
Step-4) If more than one active global transaction that accesses more than one database exists simultaneously, then sub-transactions are executed in total order (according to the timestamp)
Step-5) When all sub-transactions of a global transaction finish execution, the global transaction is removed from the Active_Trans set (details in the termination phase of GCC)
Note: Active_Trans is the set of currently active global transactions, and Active_Trans(DB) is a function that takes a DB site as argument and returns the active transactions executing in that database
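Steps 1 to 4 of the submission phase can be sketched in Python as below. This is a simplified illustration under several assumptions, not the published algorithm: the Middleware class, the LOCATION catalogue, the counter used as timestamp source and the per-site priority queue are all hypothetical. As a further simplification, a single-site transaction is queued like any other but is not tracked in Active_Trans.

```python
import heapq
from itertools import count

# Hypothetical metadata catalogue: which site stores which data object.
LOCATION = {"O1": "DB2", "O2": "DB2", "O3": "DB3", "O4": "DB3"}


class Middleware:
    def __init__(self):
        self.clock = count(1)      # assumed timestamp service: a counter
        self.active_trans = set()  # Active_Trans: running multi-site transactions
        self.queues = {}           # site -> heap of (timestamp, txn, ops)

    def submit(self, txn, ops):
        # Step 1: split the transaction by the site that holds each object.
        by_site = {}
        for kind, obj in ops:
            by_site.setdefault(LOCATION[obj], []).append((kind, obj))
        # Step 2: only transactions spanning several sites are tracked.
        if len(by_site) > 1:
            self.active_trans.add(txn)
        # Step 3: one timestamp, shared by all sub-transactions of txn.
        ts = next(self.clock)
        for site, site_ops in by_site.items():
            heapq.heappush(self.queues.setdefault(site, []), (ts, txn, site_ops))
        return ts

    def next_at(self, site):
        # Step 4: each site runs the waiting sub-transaction with the lowest timestamp.
        return heapq.heappop(self.queues[site])


T1 = [("r", "O1"), ("r", "O2"), ("w", "O3"), ("w", "O1")]
T2 = [("r", "O1"), ("r", "O3"), ("w", "O4"), ("w", "O1")]
mw = Middleware()
mw.submit("T1", T1)
mw.submit("T2", T2)
order_db3 = [mw.next_at("DB3")[1] for _ in range(2)]
print(order_db3)
```

Because T1 receives the smaller timestamp, DB3 is handed T1's sub-transaction first, the same order that DB2 uses. Step 5 belongs to the termination phase and is left out here.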
11.3 Grid Concurrency Control (GCC) (Cont’d)
Termination phase of GCC
A global transaction is active as long as even one of its sub-transactions is executing
The steps of termination are as follows:
  When a sub-transaction finishes execution, the originator is informed
  The Active Transactions, Conflicting Active Transactions and databases-accessed-by-global-transaction sets are updated accordingly
  Check whether the completed sub-transaction is the last sub-transaction of the global transaction
    If it is not the last, then sub-transactions waiting in the queue cannot be scheduled
    If it is the last sub-transaction of the global transaction, then other conflicting sub-transactions can be scheduled. Sub-transactions from the queue then follow the normal submission steps
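The termination steps might be sketched as follows. This Python fragment is an assumption-laden illustration rather than the book's algorithm: the Originator class and its method names are invented here, and for brevity every waiting transaction is released when the last sub-transaction completes, whereas the protocol only releases conflicting ones.

```python
class Originator:
    def __init__(self):
        self.pending = {}          # txn -> sites whose sub-transaction still runs
        self.active_trans = set()  # Active_Trans
        self.waiting = []          # queued global transactions, in timestamp order

    def start(self, txn, sites):
        """Record a global transaction whose sub-transactions were submitted."""
        self.pending[txn] = set(sites)
        self.active_trans.add(txn)

    def enqueue(self, txn):
        """Park a transaction that must wait for an earlier one to terminate."""
        self.waiting.append(txn)

    def sub_finished(self, txn, site):
        """Handle the message that the sub-transaction of txn at site is done.

        Returns the waiting transactions that may now be resubmitted."""
        self.pending[txn].discard(site)
        if self.pending[txn]:
            return []              # not the last sub-transaction: queue stays blocked
        # Last sub-transaction: the global transaction is no longer active.
        del self.pending[txn]
        self.active_trans.discard(txn)
        released, self.waiting = self.waiting, []
        return released


originator = Originator()
originator.start("T1", {"DB2", "DB3"})
originator.enqueue("T2")
after_first = originator.sub_finished("T1", "DB2")
after_last = originator.sub_finished("T1", "DB3")
print(after_first, after_last)
```

The released transactions would then go through the normal submission steps again.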
11.3 Grid Concurrency Control (GCC) (Cont’d)
Termination phase of GCC
11.3 Grid Concurrency Control (GCC) (Cont’d)
Revisiting the example of Section 11.2
Say transaction T1’s timestamp is 1 and T2’s timestamp is 2
History H2, produced by site DB2, is a serial history (equation 11.5) with T1 preceding T2
GCC will not schedule transactions as in H3 (equation 11.6) because of Step-4) of the submission phase. It always follows the total order based on timestamps, hence sub-transactions of T1 are always scheduled before sub-transactions of T2. GCC will generate histories H2 and H3 as follows:
  H2 = r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) C22    (same as (11.5))
  H3 = w13(O3) C13 r23(O3) w23(O4) C23    (execution order corrected by the GCC protocol)
Thus both schedules order the transactions in total order, with T1 preceding T2
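The effect of the total order on this example can be reproduced with a few lines of Python. This is a toy illustration: sub-transactions are kept as plain strings, and the SUBTRANSACTIONS table and gcc_history function are names made up for the sketch.

```python
# Timestamps assumed in the example: T1 = 1, T2 = 2.
TIMESTAMP = {"T1": 1, "T2": 2}

# Sub-transactions (11.1) to (11.4), grouped by the site that receives them.
# DB3 lists T2 first on purpose, mimicking the arrival order behind history (11.6).
SUBTRANSACTIONS = {
    "DB2": {"T1": "r12(O1) r12(O2) w12(O1) C12", "T2": "r22(O1) w22(O1) C22"},
    "DB3": {"T2": "r23(O3) w23(O4) C23", "T1": "w13(O3) C13"},
}


def gcc_history(site):
    """Serial history at a site when sub-transactions run in timestamp order."""
    subs = SUBTRANSACTIONS[site]
    ordered = sorted(subs, key=TIMESTAMP.get)   # total order by timestamp
    return " ".join(subs[txn] for txn in ordered)


print("H2 =", gcc_history("DB2"))
print("H3 =", gcc_history("DB3"))
```

Whatever order the sub-transactions arrive in, sorting by the shared timestamp places T1 before T2 at both sites.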
11.3 Grid Concurrency Control (GCC) (Cont’d)
Comparison with traditional concurrency control protocols
[Figure: Operations of a general centralised locking protocol (e.g. centralised two-phase locking) in a homogeneous distributed DBMS. Parties: coordinator site (typically where the transaction is submitted); central site managing global information (e.g. global lock table); all participating sites (1, 2, …, n). Labels: lock request, lock granted, operation command, operation decision, release lock request]
11.3 Grid Concurrency Control (GCC) (Cont’d)
[Figure: Operations of a general distributed locking protocol (e.g. decentralised two-phase locking) in a homogeneous distributed DBMS. Parties: coordinator site (typically where the transaction is submitted); all participating sites (1, 2, …, n), with the participant’s image of global information. Labels: operation command embedded with lock request, operation, end of operation, release lock request]
11.3 Grid Concurrency Control (GCC) (Cont’d)
[Figure: Operations of a general Multi-DBMS protocol. Parties: originator site (where the transaction is submitted); multidatabase management system (global management layer); all participants (1, 2, …, n). Labels: operation request embedded with global information, talk to participant depending on its local protocol, check with multi-DBMS layer if required, MDBS reply, final decision, forward final decision to the originator]
11.3 Grid Concurrency Control (GCC) (Cont’d)
[Figure: Operations of the GCC protocol. Parties: originator site (where the transaction is submitted); Grid middleware services (metadata and timestamp services for this purpose); all participants (1, 2, …, n). Labels: operation request, forward operation request to participants, final decision, forward final decision to the originator]
11.4 Correctness of GCC Protocol
A Grid-serializable schedule is considered correct in the Grid environment
A concurrency control protocol conforming to Theorem 11.1 is Grid-serializable and thus correct
Proposition 11.1: All local transactions and global sub-transactions submitted to any local scheduler are scheduled in serializable order.
Proposition 11.2: Any two global transactions having more than one sub-transaction actively executing simultaneously must follow the total order.
Based on Propositions 11.1 and 11.2, the following theorem can be proved:
Theorem 11.2: Every schedule produced by the GCC protocol is Grid-serializable.
11.5 Features of GCC Protocol
Concurrency control in a heterogeneous environment - GCC does not use a global lock table, etc., and hence can work in an autonomous, heterogeneous environment
Reducing the load on the originator site - As GCC does not use a centralized scheduling scheme, originator sites have reduced load
Reducing the number of messages in the inter-network - Communication between the originator and the other participating sites is reduced
However, due to the absence of a global management layer, some valid interleavings may not be possible, which may result in a strict schedule
11.6 Summary
A global management layer cannot be used in a Grid environment
The GCC protocol maintains the correctness of data in a Grid environment
The GCC protocol can work in a heterogeneous environment
Optimizing the scheduling process may be hard
The focus was to maintain the consistency of data in Grid databases
Continue to Chapter 12…