November 17, 2016 Sam Siewert
CS317 File and Database Systems
Lecture 15 – Introduction to Distributed DBMS
http://dilbert.com/strips/comic/2007-01-13/
Exam #2 Results
Average = TBD
Std Deviation = TBD
No Clear Trends
Some Confusion on Normalization – See Solutions
Focus on A#6 and Final Oral Exam (25% of Grade)
Turn in all Slides (after Presentation)
Turn in final Report (Note new Due Date)
Reminders
Assignment #6, DBMS Project of Your Interest – POSTED
– Choose one of the BOLDED Options, or Come Talk to Me or E-mail me with your Ideas
– Self-Directed – Autonomy, Mastery, Purpose and Life-long Learning; Requires Some Research on Your Part
This Completes Assignments for the Class
Assignment #6 Assessed with Final Grading, Due 12/1, Late 12/4, Finals Start on 12/6
Interdisciplinary Nature of DBMS
[Figure: DBMS at the center of related areas]
– File Systems
– Operating Systems
– Programming Languages (SQL, OOP)
– Security
– Networking (Clusters, DR, Client/Server)
– Storage (SAN, NAS, DAS)
– Big Data Analytics?
– CS332 – "R" (Data Mining)
– CS332 – C++ & Java
– SE300/310 – OOA/OOD/OOP
– Final Lecture – Week 14
ACID – ATOMICITY, CONSISTENCY, ISOLATION, DURABILITY
TRANSACTION MANAGEMENT
ACID Features
A – Atomicity means all-or-nothing (non-interruptible) updates to a tuple and to logically related tuples in other tables [E.g. Delete/Promote an Employee and Re-assign their Clients]
C – Consistency means the DB state should never become inconsistent [E.g. a Deleted Employee still has Clients assigned to their Employee-ID foreign key]
I – Isolation means concurrent transactions do not interfere with one another; the outcome should match some serial order [E.g. Roll-back of one should not cascade into roll-backs of others]
D – Durability means committed transactions are stored to persistent storage and not lost!
State Transition Diagram for Transaction
Pearson Education © 2014 7
Written to Storage (may still sit in the drive's cache unless flushed with ATA FLUSH CACHE or written with SCSI FUA, Force Unit Access)
http://en.wikipedia.org/wiki/Disk_buffer
Completed Transaction – Buffered for Write-back
Partially Completed Transaction – Cleared from Buffer, No Write-back
If we NEVER used the cache and ALWAYS used FUA, this would help, but it is WAY TOO SLOW
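The transaction states can be encoded as a transition table. A minimal sketch, assuming the usual textbook states (ACTIVE, PARTIALLY COMMITTED, COMMITTED, FAILED, ABORTED) and hypothetical event names (`end`, `commit`, `abort`, `rollback`). The write buffer loosely mirrors the annotations above: a completed transaction's buffered writes are written back, a partially completed one's are discarded. Whether the write-back really reaches persistent media (flush/FUA) is not modelled.

```python
from enum import Enum


class TxState(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"


# (current state, hypothetical event name) -> next state
TRANSITIONS = {
    (TxState.ACTIVE, "end"): TxState.PARTIALLY_COMMITTED,
    (TxState.ACTIVE, "abort"): TxState.FAILED,
    (TxState.PARTIALLY_COMMITTED, "commit"): TxState.COMMITTED,
    (TxState.PARTIALLY_COMMITTED, "abort"): TxState.FAILED,
    (TxState.FAILED, "rollback"): TxState.ABORTED,
}


class Transaction:
    def __init__(self):
        self.state = TxState.ACTIVE  # "begin transaction"
        self.buffer = []             # writes held back until commit

    def write(self, item, value):
        if self.state is not TxState.ACTIVE:
            raise RuntimeError("writes are only allowed while ACTIVE")
        self.buffer.append((item, value))

    def fire(self, event, storage=None):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event!r} in state {self.state.name}")
        self.state = TRANSITIONS[key]
        if self.state is TxState.COMMITTED and storage is not None:
            # Completed transaction: buffered writes are written back
            for item, value in self.buffer:
                storage[item] = value
            self.buffer.clear()
        elif self.state is TxState.ABORTED:
            # Partially completed transaction: buffer cleared, no write-back
            self.buffer.clear()
        return self.state
```

For example, `t.write("x", 1)`, then `t.fire("end")` and `t.fire("commit", disk)` leave `disk == {"x": 1}`, while an aborted transaction never touches `disk`.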
Comparison of Methods
Conflict serializable: orders conflicting operations as in "some" serial schedule
All schedules: including those with data corruption and inconsistent DB states
View serializable: less restrictive operation ordering, still equivalent to "some" serial schedule [Impractical to test – NP-Complete]
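Unlike view serializability, conflict serializability is cheap to test: build a precedence graph with an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj (same item, different transactions, at least one write), then check for a cycle. A minimal sketch, assuming a schedule is a hypothetical list of `(transaction, op, item)` tuples with `op` either `"R"` or `"W"`:

```python
def precedence_graph(schedule):
    """Edge Ti -> Tj for each conflicting pair where Ti's operation comes first."""
    edges = {t: set() for t, _, _ in schedule}
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges[ti].add(tj)
    return edges


def has_cycle(graph):
    """Depth-first search; reaching a node still on the current path means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            if colour.get(nxt, WHITE) == GREY:
                return True
            if colour.get(nxt, WHITE) == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))


def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))
```

The classic lost-update interleaving `R1(x) R2(x) W1(x) W2(x)` produces edges T1 → T2 and T2 → T1, so it is rejected; running T1 fully before T2 produces only T1 → T2 and is accepted.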
Summary
– Concurrency due to Multiple Clients and Transactions Mapped to OS Threads [Readers, Writers]
– Causes Problems with Split (Interleaved) Transactions
  – Lost Update
  – Dirty Read or Uncommitted Dependency
  – Fuzzy Read, Phantom Read
– Serialization Using Locks – Not Sufficient for Consistency if Lock Scope is Too Small, and Can Cause Deadlock!
– Recoverability – Atomicity Requirement
– 2PL – Growing and Shrinking Phase – Helps with Lock Scope, but Still have Deadlock/Livelock
– Cascading Roll-back and Rigorous/Strict 2PL
– Deadlock Problem – Prevention, Detect & Break
– Alternative Timestamp Methods
Rigorous/Strict 2PL with Deadlock Prevention or Detect & Break, or Timestamp Methods, is Best
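"Detect & Break" can be sketched with a wait-for graph: an edge Ti → Tj means Ti waits for a lock held by Tj, and any cycle is a deadlock. A minimal sketch, assuming the victim policy is "abort the youngest transaction on the cycle" (one common choice, not the only one) and that `start_ts` is a hypothetical map of transaction start timestamps:

```python
def find_cycle(wait_for):
    """Return the transactions on one cycle of the wait-for graph, or None."""
    visited, on_path, path = set(), set(), []

    def dfs(t):
        visited.add(t)
        on_path.add(t)
        path.append(t)
        for u in wait_for.get(t, ()):
            if u in on_path:
                return path[path.index(u):]  # the cycle runs from u back to t
            if u not in visited:
                cycle = dfs(u)
                if cycle:
                    return cycle
        path.pop()
        on_path.discard(t)
        return None

    for t in list(wait_for):
        if t not in visited:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None


def detect_and_break(wait_for, start_ts):
    """Abort the youngest transaction on each cycle until no deadlock remains."""
    graph = {t: set(waits) for t, waits in wait_for.items()}  # don't mutate input
    victims = []
    while (cycle := find_cycle(graph)) is not None:
        victim = max(cycle, key=lambda t: start_ts[t])  # youngest = latest start
        victims.append(victim)
        graph.pop(victim, None)       # victim rolls back and releases its locks
        for waits in graph.values():
            waits.discard(victim)     # nobody waits on the victim any more
    return victims
```

With `{"T1": {"T2"}, "T2": {"T1"}, "T3": {"T1"}}` and T2 the younger of the two deadlocked transactions, only T2 is aborted; T3 was merely blocked, not deadlocked.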
DDBMS CONCEPTS AND DESIGN
Distributed DBMS
MySQL Replication
Used for DR [Disaster Recovery] and for Geo Content and Services Distribution
– East Coast / West Coast – ERAU Daytona, ERAU Prescott
– ACTIVE-ACTIVE for Reads, Writes to MASTER
http://dev.mysql.com/doc/refman/5.0/en/replication.html
Asynchronous, Compared to Clustering [Synchronous]
One Server is Master – MySQL Replication Solutions
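A toy in-memory model shows why asynchronous replication allows reads at every site but writes only at the MASTER, and why a replica can return stale data. This is not MySQL's implementation: the class names and the explicit `pump()` step, standing in for the replica applying events from the master's binary log "sometime later", are hypothetical.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.binlog = []  # ordered log of committed writes, shipped to replicas

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))  # commit returns without waiting for replicas


class Replica:
    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0  # how many log events this replica has applied

    def read(self, key):
        return self.data.get(key)  # may be stale while the replica lags

    def write(self, key, value):
        raise PermissionError("replicas are read-only; send writes to the MASTER")

    def pump(self, max_events=None):
        """Apply pending log events (asynchronously, in a real system)."""
        pending = self.master.binlog[self.applied:]
        if max_events is not None:
            pending = pending[:max_events]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)

    def lag(self):
        return len(self.master.binlog) - self.applied
```

Until `pump()` runs, a read at the replica returns the old value (or nothing). That lag is the price paid for not making every MASTER write wait on a remote site, which is what the synchronous cluster does instead.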
MySQL Replication – DR
Fail-Over Master in Case of Disaster
BEFORE MASTER FAILURE
MySQL Replication – DR Fail-Over
Fail-Over Master in Case of Disaster
AFTER MASTER FAILURE
Challenges – Fail-Back [Manual] and Split-Brain
To Restore MASTER We Must:
1) Repair Issue [E.g. HW fix]
2) Re-Sync Writes
3) Fail-Back to Original MASTER
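The three restore steps can be sketched as a function, assuming each server is a hypothetical dict with an ordered write `log` and a `role`. Re-sync is only safe if the original master's log is a prefix of the acting master's log. With asynchronous replication the original may hold writes that were never shipped before it failed, and this divergence is the split-brain case. The sketch refuses to continue in that case rather than guessing a merge. A real fail-back would use the DBMS's own replication tooling.

```python
def fail_back(original, acting, repaired=True):
    """Restore `original` as MASTER after `acting` (a promoted replica) took over."""
    # 1) Repair issue: a human step; here we only insist that it happened
    if not repaired:
        raise RuntimeError("repair the original master before failing back")

    old_log, new_log = original["log"], acting["log"]
    # Split-brain check: the original must not hold writes the acting master lacks
    if new_log[:len(old_log)] != old_log:
        raise RuntimeError("split-brain: logs diverged, manual reconciliation needed")

    # 2) Re-sync writes the acting master accepted while the original was down
    missing = new_log[len(old_log):]
    old_log.extend(missing)

    # 3) Fail-back: swap roles once both logs match
    original["role"], acting["role"] = "MASTER", "REPLICA"
    return len(missing)
```

For example, if the original went down after write `a` and the promoted replica then accepted `b` and `c`, fail-back replays two writes and swaps roles. If the original had also committed an unshipped write `x`, the function raises instead.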
Distributed Processing
A centralized database that can be accessed over a computer network.
Parallel or Cluster DBMS
Main architectures for parallel DBMSs are:
– Shared memory,
– Shared disk,
– Shared nothing.
Parallel DBMS
(a) shared memory
(b) shared disk
(c) shared nothing
MySQL Cluster
MySQL Shared Nothing Cluster – NDBD, Network DB Daemon
Synchronous – So Ideally a Gigabit, 10GE, 40G IB or Better Cluster Interconnect
Similar to the pNFS (Parallel Network File System) Concept (NDB Mgt Bottleneck?)
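In a shared-nothing cluster each node owns a disjoint partition of each table, so every row must be routed to exactly one node. A minimal sketch, assuming plain hash partitioning on the primary key modulo the node count. NDB also distributes rows by a hash of the primary key by default, but its hash function and partition layout differ, so this is only an illustration. `zlib.crc32` is used because Python's built-in `hash()` of strings is randomized per process.

```python
import zlib


class SharedNothingCluster:
    def __init__(self, n_nodes):
        # Each node has private storage; nothing is shared between nodes
        self.nodes = [dict() for _ in range(n_nodes)]

    def node_for(self, key):
        # Deterministic hash of the primary key picks the owning node
        return zlib.crc32(str(key).encode("utf-8")) % len(self.nodes)

    def insert(self, key, row):
        self.nodes[self.node_for(key)][key] = row

    def get(self, key):
        # A primary-key lookup touches exactly one node
        return self.nodes[self.node_for(key)].get(key)

    def scan(self, predicate):
        # A non-key query must fan out to every node and merge the results
        results = []
        for node in self.nodes:
            results.extend(row for row in node.values() if predicate(row))
        return results
```

Key lookups scale with the number of nodes, while full scans need every node plus the interconnect. This is one reason the slide suggests a fast cluster network.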
Advantages of DDBMSs
– Reflects organizational structure
– Improved shareability and local autonomy
– Improved availability
– Improved reliability
– Improved performance
– Economics
– Modular growth
Disadvantages of DDBMSs
– Complexity
– Cost
– Security
– Integrity control more difficult
– Lack of standards
– Lack of experience
– Database design more complex
Homogeneous DDBMS
All sites use the same DBMS product.
Much easier to design and manage.
Approach provides incremental growth and allows increased performance.
Overview of Networking
Network – Interconnected collection of autonomous computers, capable of exchanging information.
• Local Area Network (LAN) intended for connecting computers at same site.
• Wide Area Network (WAN) used when computers or LANs need to be connected over long distances.
• WANs are relatively slow and less reliable than LANs. A DDBMS using a LAN provides much faster response time than one using a WAN.
Overview of Networking
[Figure labels: Asynchronous Replication; Synchronous Cluster]
Functions of a DDBMS
Expect a DDBMS to have at least the functionality of a DBMS.
Also to have the following functionality:
– Extended communication services.
– Extended Data Dictionary.
– Distributed query processing.
– Extended concurrency control.
– Extended recovery services.
Reference Architecture for DDBMS
Due to diversity, there is no accepted architecture equivalent to the ANSI/SPARC 3-level architecture.
A reference architecture consists of:
– Set of global external schemas.
– Global conceptual schema (GCS).
– Fragmentation schema and allocation schema.
– Set of schemas for each local DBMS conforming to 3-level ANSI/SPARC.
Some levels may be missing, depending on the levels of transparency supported.
Reference Architecture for MDBS
In a DDBMS, the GCS is the union of all local conceptual schemas.
In an FMDBS, the GCS is a subset of the local conceptual schemas (LCS), consisting of data that each local system agrees to share.
The GCS of a tightly coupled system involves integration of either parts of LCSs or local external schemas.
An FMDBS with no GCS is called loosely coupled.
Reference Architecture for Tightly-Coupled FMDBS
Components of a DDBMS
Distributed Database Design
Fragmentation
A relation may be divided into a number of sub-relations, which are then distributed.
Allocation
Each fragment is stored at the site with “optimal” distribution.
Replication
A copy of a fragment may be maintained at several sites.
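Fragmentation is only correct if the original relation can be rebuilt from its fragments. Horizontal fragments (selections) are recombined by union. Vertical fragments (projections that each keep the primary key) are recombined by a join on that key. A minimal sketch on a hypothetical `STAFF` relation, with rows as Python dicts and made-up site names:

```python
STAFF = [
    {"staffNo": "S1", "name": "Ann", "branch": "Daytona", "salary": 30000},
    {"staffNo": "S2", "name": "Bob", "branch": "Prescott", "salary": 25000},
    {"staffNo": "S3", "name": "Cy", "branch": "Daytona", "salary": 40000},
]


def horizontal(rows, predicate):
    """Selection: the subset of tuples satisfying the predicate."""
    return [r for r in rows if predicate(r)]


def vertical(rows, attrs, key="staffNo"):
    """Projection that always keeps the primary key so it can be re-joined."""
    keep = [key] + [a for a in attrs if a != key]
    return [{a: r[a] for a in keep} for r in rows]


def join_on_key(left, right, key="staffNo"):
    """Reconstruct a relation from two vertical fragments."""
    by_key = {r[key]: r for r in right}
    return [{**l, **by_key[l[key]]} for l in left if l[key] in by_key]


# Horizontal fragments (e.g. one per site); complementary predicates keep them disjoint
daytona = horizontal(STAFF, lambda r: r["branch"] == "Daytona")
prescott = horizontal(STAFF, lambda r: r["branch"] != "Daytona")

# Vertical fragments (e.g. payroll data stored apart from public data)
public = vertical(STAFF, ["name", "branch"])
payroll = vertical(STAFF, ["salary"])
```

`daytona + prescott` and `join_on_key(public, payroll)` each contain exactly the tuples of `STAFF`. Allocation then decides which site stores each fragment, and replication decides how many copies each fragment has.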
Data Allocation
Four alternative strategies regarding placement of data:
– Centralized – Bottleneck Issue
– Partitioned (or Fragmented) – Cluster Shared Nothing
– Complete Replication – DR
– Selective Replication – Split-Brain Challenges
Data Allocation
Centralized: Consists of a single database and DBMS stored at one site, with users distributed across the network.
Partitioned: Database partitioned into disjoint fragments, each fragment assigned to one site.
Complete Replication: Consists of maintaining a complete copy of the database at each site.
Selective Replication: Combination of partitioning, replication, and centralization.
Comparison of Strategies for Data Distribution
Concurrency Transparency
Replication makes concurrency control more complex.
If a copy of a replicated data item is updated, the update must be propagated to all copies.
Could propagate changes as part of the original transaction, making it an atomic operation.
However, if one site holding a copy is not reachable, then the transaction is delayed until that site is reachable.
Concurrency Transparency
Could limit update propagation to only those sites currently available. Remaining sites are updated when they become available again.
Could allow updates to copies to happen asynchronously, sometime after the original update. The delay in regaining consistency may range from a few seconds to several hours.
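The propagation policies differ in what happens when a copy is unreachable. A minimal sketch, assuming a hypothetical `Site` object with an `up` flag. The synchronous policy updates every copy within the original transaction or none of them. The available-copies policy updates reachable sites now and queues the update for the others, which catch up when they recover.

```python
class Site:
    def __init__(self, name, up=True):
        self.name, self.up, self.data = name, up, {}


def update_all_or_none(sites, key, value):
    """Synchronous propagation: atomic across copies, blocked if any site is down."""
    down = [s.name for s in sites if not s.up]
    if down:
        return False, down  # transaction delayed until these sites are reachable
    for s in sites:
        s.data[key] = value
    return True, []


def update_available(sites, key, value, pending):
    """Update reachable copies now; queue the update for unreachable sites."""
    for s in sites:
        if s.up:
            s.data[key] = value
        else:
            pending.setdefault(s.name, []).append((key, value))


def recover(site, pending):
    """When a site comes back, replay its queued updates in order."""
    site.up = True
    for key, value in pending.pop(site.name, []):
        site.data[key] = value
```

Between the update and `recover()`, the two copies disagree. That window is the "delay in regaining consistency" described above.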
Failure Transparency
The DDBMS must ensure atomicity and durability of the global transaction.
This means ensuring that the subtransactions of a global transaction either all commit or all abort.
Thus, the DDBMS must synchronize the global transaction to ensure that all subtransactions have completed successfully before recording a final COMMIT for the global transaction.
It must do this in the presence of site and network failures.
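A standard protocol for this is two-phase commit (2PC), which the slide does not name. The coordinator first asks every participant to prepare and vote. It records a global COMMIT only if all participants vote yes, and otherwise every subtransaction aborts. A minimal sketch, assuming in-process participants and treating an unreachable site as a "no" vote. Real 2PC also force-writes log records at each site and handles the blocking case where the coordinator itself fails, and neither is modelled here. The `Participant` class is hypothetical.

```python
class Participant:
    def __init__(self, name, can_commit=True, reachable=True):
        self.name = name
        self.can_commit = can_commit
        self.reachable = reachable
        self.state = "ACTIVE"

    def prepare(self):
        """Phase 1: vote. A yes vote promises the subtransaction can commit."""
        if not self.reachable:
            raise ConnectionError(self.name)
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def finish(self, decision):
        """Phase 2: apply the coordinator's global decision."""
        if self.reachable:  # unreachable sites learn the outcome on recovery (not modelled)
            self.state = decision


def two_phase_commit(participants, log):
    # Phase 1: collect votes; an unreachable site counts as a "no" vote
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except ConnectionError:
            votes.append(False)
    decision = "COMMITTED" if all(votes) else "ABORTED"
    log.append(decision)  # the global decision is recorded before anyone is told
    # Phase 2: broadcast the decision to every participant
    for p in participants:
        p.finish(decision)
    return decision
```

A single "no" vote or one unreachable site is enough to abort the global transaction at every reachable site. This gives the all-or-nothing behavior the slide requires.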
Performance Transparency
The DDBMS must perform as if it were a centralized DBMS.
– The DDBMS should not suffer any performance degradation due to the distributed architecture.
– The DDBMS should determine the most cost-effective strategy to execute a request.
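To make "most cost-effective strategy" concrete, a common textbook simplification models the time to send one message as a fixed access delay plus message size divided by transmission rate. The optimizer then picks the strategy with the smallest total. A minimal sketch, assuming hypothetical relation sizes, a 1 s access delay and a 10,000 bit/s link. The "filter locally" strategy also assumes a highly selective query. None of these numbers come from the slide.

```python
def comm_time(bits, access_delay_s, rate_bps):
    """Time for one message: fixed access delay plus bits / transmission rate."""
    return access_delay_s + bits / rate_bps


def cheapest(strategies, access_delay_s, rate_bps):
    """Pick the strategy whose messages take the least total communication time.

    `strategies` maps a strategy name to a list of message sizes in bits.
    """
    costs = {
        name: sum(comm_time(b, access_delay_s, rate_bps) for b in messages)
        for name, messages in strategies.items()
    }
    best = min(costs, key=costs.get)
    return best, costs


# Hypothetical: R has 100,000 tuples and S has 10,000 tuples, 100 bits each;
# a selective query leaves about 100 tuples at each site after local filtering.
strategies = {
    "ship R to S's site": [100_000 * 100],
    "ship S to R's site": [10_000 * 100],
    "filter locally, ship results": [100 * 100, 100 * 100],
}
best, costs = cheapest(strategies, access_delay_s=1.0, rate_bps=10_000)
```

Under these assumptions the costs are 1001 s, 101 s and 4 s. Moving the work to the data, rather than the data to the work, is what the optimizer should discover.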
Date’s 12 Rules for a DDBMS
0. Fundamental Principle
To the user, a distributed system should look exactly like a nondistributed system.
1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence
Date’s 12 Rules for a DDBMS
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence
• Last four rules are ideals.