November 17, 2016 Sam Siewert CS317 File and Database Systems Lecture 15 – Introduction to Distributed DBMS http://dilbert.com/strips/comic/2007-01-13/

CS317 File and Database Systems · mercury.pr.erau.edu/~siewerts/cs317/documents/Lectures/Fall-2016… · Focus on A#6 and Final Oral Exam (25% of Grade) · Turn in all Slides (after Presentation)




Exam #2 Results

Average = TBD
Std Deviation = TBD

No Clear Trends

Some Confusion on Normalization – See Solutions

Focus on A#6 and Final Oral Exam (25% of Grade)

Turn in all Slides (after Presentation)

Turn in final Report (Note new Due Date)


Reminders

Assignment #6, DBMS Project of Your Interest – POSTED
– Choose one of the BOLDED Options or Come Talk to Me or E-mail me with your Ideas
– Self-Directed – Autonomy, Mastery, Purpose and Life-long Learning and Requires Some Research on Your Part

This Completes Assignments for the Class

Assignment #6 Assessed with Final Grading, Due 12/1, Late 12/4, Finals Start on 12/6


Interdisciplinary Nature of DBMS

[Figure: DBMS shown alongside related areas]
– File Systems
– Operating Systems
– Programming Languages (SQL, OOP)
– Security
– Networking (Clusters, DR, Client/Server)
– Storage (SAN, NAS, DAS)
– Big Data Analytics ?

Other labels on the slide:
– CS332 – “R” (Data Mining)
– CS332 – C++ & Java
– Final Lecture – Week 14
– SE300/310 – OOA/OOD/OOP


ACID – ATOMICITY, CONSISTENCY, ISOLATION, DURABILITY

TRANSACTION MANAGEMENT


ACID Features

A – Atomicity: Non-interruptible Updates for Tuples and Logically Related Table Tuples [E.g. Delete/Promote an Employee and Re-assign Clients]

C – Consistency: the DB state should not become inconsistent [E.g. of Inconsistency: Deleted Employee has Clients Still assigned to their Employee-ID foreign key]

I – Isolation: Transactions Do Not Interfere with One Another, as if Serialized [E.g. Roll-back of One Should Not Cascade into Roll-backs of Others]

D – Durability: Committed Transactions Should Be Stored to Persistent Storage and Not Lost!
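The employee example under Atomicity can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not course code: the `employee` and `client` tables, their columns, and the function names are all assumptions made up for the example. The existence check is placed after the UPDATE on purpose, so that a failure leaves partial work for the roll-back to undo.

```python
import sqlite3

def make_db():
    """Build a tiny in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT, emp_id INTEGER);
        INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO client VALUES (10, 'Acme', 1), (11, 'Zenith', 1);
    """)
    return conn

def remove_employee(conn, old_id, new_id):
    """Re-assign old_id's clients to new_id and delete old_id as one unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE client SET emp_id = ? WHERE emp_id = ?", (new_id, old_id)
            )
            found = conn.execute(
                "SELECT 1 FROM employee WHERE id = ?", (new_id,)
            ).fetchone()
            if found is None:
                # Clients would point at a non-existent employee: abort.
                raise LookupError("new employee does not exist")
            conn.execute("DELETE FROM employee WHERE id = ?", (old_id,))
        return True
    except LookupError:
        return False

def client_owners(conn):
    """Return the emp_id of every client, ordered by client id."""
    return [row[0] for row in conn.execute("SELECT emp_id FROM client ORDER BY id")]
```

Calling `remove_employee(conn, 1, 99)` fails and leaves both clients assigned to employee 1, because the UPDATE is rolled back with the rest of the transaction. Calling `remove_employee(conn, 1, 2)` applies both changes together.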


State Transition Diagram for Transaction

[Figure: state transition diagram for a transaction. Pearson Education © 2014]

Notes on the diagram:
– Completed Transaction: Buffered for Write-back
– Written to Storage (Cached w/o ATA Flush, SCSI FUA); see http://en.wikipedia.org/wiki/Disk_buffer
– Partially Completed Transaction: Cleared from Buffer – No Write-back
– If We NEVER used Cache and ALWAYS used FUA, this helps, But WAY TOO SLOW
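The trade-off between buffered write-back and forced writes can be seen from application code. The sketch below is an assumption-laden illustration: `durable_append` is a hypothetical helper, and `os.fsync` only asks the operating system to push its cache to the device. Whether the drive's own write cache is bypassed (the ATA Flush / FUA point above) depends on the OS and hardware.

```python
import os

def durable_append(path, record):
    """Append one record (bytes) and return only after the OS reports a flush."""
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()              # push Python's user-space buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push its cache to the device
```

Forcing a flush after every record is the "always write through" extreme. It is simple to reason about but slow, which is why a DBMS typically buffers and forces only what a commit needs.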


Comparison of Methods

Notes on the comparison:
– Orders conflicting operations as “some” serial schedule
– Including those with data corruption and inconsistent DB states
– Less Restrictive Operation Ordering as “some” serial schedule [Impractical – NP-Complete]


Summary

Concurrency due to Multiple Clients and Transactions Mapped to OS Threads [Readers, Writers]

Causes Problems with Split Transactions
– Lost Update
– Dirty Read or Uncommitted Dependency
– Fuzzy Read, Phantom Read

Serialization Using Locks – Not Sufficient for Consistency if Lock Scope Too Small and Can Cause Deadlock!
Recoverability – Atomicity Requirement
2PL – Growing and Shrinking Phase – Helps with Lock Scope, but Still have Deadlock/Livelock
Cascading Roll-back and Rigorous/Strict 2PL
Deadlock Problem – Prevention, Detect & Break
Alternative Timestamp Methods

Rigorous/Strict 2PL with Deadlock Prevention or Detect & Break or Timestamp is Best
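The "Detect & Break" option is commonly implemented by looking for a cycle in a wait-for graph, where an edge T1 → T2 means T1 is waiting for a lock held by T2. The following is a minimal sketch of that idea. The function names and the "abort the youngest transaction" victim policy are assumptions chosen for illustration, not something stated on the slide.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle, or None if there is none.

    wait_for maps each transaction to the transactions it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {}
    path = []

    def visit(t):
        color[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            state = color.get(u, WHITE)
            if state == GREY:                 # back edge: u is on the current path
                return path[path.index(u):]
            if state == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None

def pick_victim(cycle, start_time):
    """Break the deadlock by choosing the youngest transaction (assumed policy)."""
    return max(cycle, key=lambda t: start_time[t])
```

With `{"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}` the detector reports the cycle T1, T2, T3. Rolling back the chosen victim releases its locks so the others can proceed.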


DDBMS CONCEPTS AND DESIGN

Distributed DBMS


Distributed DBMS


MySQL Replication

Used for DR [Disaster Recovery] and for Geo Content and Services Distribution
– East Coast / West Coast – ERAU Daytona, ERAU Prescott
– ACTIVE-ACTIVE for Reads, Writes to MASTER

http://dev.mysql.com/doc/refman/5.0/en/replication.html
Asynchronous Compared to Clustering [Synchronous]
One Server is Master - MySQL Replication Solutions
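The "reads from any server, writes to the master" arrangement can be modelled in a few lines. This is a toy in-memory sketch, not MySQL's API: `Node`, `ReplicatedStore`, and the `log` list standing in for the master's change log are all hypothetical names. It shows why asynchronous replication can return stale reads until the replica catches up.

```python
import itertools

class Node:
    """One database server holding a key/value copy of the data."""
    def __init__(self, name):
        self.name = name
        self.data = {}

class ReplicatedStore:
    """Writes go to the master; reads rotate across master and replicas."""
    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas
        self.log = []                                  # master's change log
        self.applied = {r.name: 0 for r in replicas}   # log position per replica
        self._readers = itertools.cycle([master] + replicas)

    def write(self, key, value):
        self.master.data[key] = value
        self.log.append((key, value))

    def read(self, key):
        node = next(self._readers)
        return node.name, node.data.get(key)

    def replicate(self):
        """Asynchronous step: each replica applies log entries it has not seen."""
        for r in self.replicas:
            for key, value in self.log[self.applied[r.name]:]:
                r.data[key] = value
            self.applied[r.name] = len(self.log)
```

A read served by the replica before `replicate()` runs returns `None` for a freshly written key, while the same read after `replicate()` returns the new value.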


MySQL Replication - DR

Fail-Over Master in Case of Disaster
BEFORE MASTER FAILURE


MySQL Replication – DR Fail-Over

Fail-Over Master in Case of Disaster
AFTER MASTER FAILURE
Challenges – Fail-Back [Manual] and Split-Brain

To Restore MASTER We Must:
1) Repair Issue [E.g. HW fix]
2) Re-Sync Writes
3) Fail-Back to Original MASTER
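The fail-over and three-step fail-back sequence can be walked through with a small simulation. Everything here is hypothetical (`Server`, `ReplicaPair`, and their methods are illustration names, not MySQL commands), and split-brain is not modelled: the sketch assumes only one server ever accepts writes at a time.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.rows = []

class ReplicaPair:
    """A master and a standby, with manual fail-over and fail-back."""
    def __init__(self, master, standby):
        self.original = master
        self.master = master
        self.standby = standby

    def write(self, row):
        if not self.master.up:
            raise RuntimeError("master is down; fail over first")
        self.master.rows.append(row)

    def replicate(self):
        """Copy the master's rows to the standby (asynchronous in practice)."""
        if self.standby.up:
            self.standby.rows = list(self.master.rows)

    def fail_over(self):
        """Promote the standby; the failed master becomes the standby."""
        self.master, self.standby = self.standby, self.master

    def fail_back(self):
        old = self.original
        if self.master is old:
            return                             # nothing to do
        if not old.up:                         # 1) repair must come first
            raise RuntimeError("original master not repaired yet")
        old.rows = list(self.master.rows)      # 2) re-sync writes it missed
        self.master, self.standby = old, self.master   # 3) fail back
```

Any write that reached the master but was not replicated before the failure would be missing on the promoted standby, which is one reason asynchronous replication alone does not guarantee zero data loss.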


Distributed Processing

A centralized database that can be accessed over a computer network.


Parallel or Cluster DBMS

Main architectures for parallel DBMSs are:
– Shared memory,
– Shared disk,
– Shared nothing.


Parallel DBMS

(a) shared memory

(b) shared disk

(c) shared nothing


MySQL Cluster

MySQL Shared Nothing Cluster – NDBD, Network DB Daemon
Synchronous – So Ideally Gigabit, 10GE, 40G IB or Better Cluster
Similar to PNFS Concept – Parallel Network File System (NDB Mgt Bottleneck?)
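In a shared-nothing design each node owns a disjoint slice of the rows, so a lookup by key needs to contact only one node. The sketch below shows generic hash partitioning under stated assumptions: `node_for`, `partition`, and `lookup` are hypothetical helpers, and CRC32 is used only because it is deterministic across runs. It is not a description of how NDB itself places rows.

```python
import zlib

def node_for(key, n_nodes):
    """Map a key to a node index with a deterministic hash."""
    return zlib.crc32(str(key).encode("utf-8")) % n_nodes

def partition(rows, key_name, n_nodes):
    """Split rows into n_nodes disjoint lists, one per data node."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[node_for(row[key_name], n_nodes)].append(row)
    return nodes

def lookup(nodes, key_name, key_value):
    """Find a row by key, touching only the node that owns that key."""
    owner = nodes[node_for(key_value, len(nodes))]
    return next((r for r in owner if r[key_name] == key_value), None)
```

Because every row lives on exactly one node, adding nodes spreads both storage and work, at the cost of network traffic for queries that span partitions.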


Advantages of DDBMSs

– Reflects organizational structure
– Improved shareability and local autonomy
– Improved availability
– Improved reliability
– Improved performance
– Economics
– Modular growth


Disadvantages of DDBMSs

– Complexity
– Cost
– Security
– Integrity control more difficult
– Lack of standards
– Lack of experience
– Database design more complex


Homogeneous DDBMS

All sites use same DBMS product.
Much easier to design and manage.
Approach provides incremental growth and allows increased performance.


Overview of Networking

Network - Interconnected collection of autonomous computers, capable of exchanging information.

• Local Area Network (LAN) intended for connecting computers at same site.

• Wide Area Network (WAN) used when computers or LANs need to be connected over long distances.

• WAN relatively slow and less reliable than LANs. DDBMS using LAN provides much faster response time than one using WAN.
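A rough way to see the LAN/WAN gap is to model the time to send one message as a fixed per-message cost plus the message size divided by the transmission rate. The figures below are invented round numbers used only to make the arithmetic concrete. They are assumptions, not measurements from the lecture.

```python
def transfer_time(message_bits, fixed_cost_s, rate_bps):
    """Estimated seconds to send one message: fixed cost + bits / rate."""
    return fixed_cost_s + message_bits / rate_bps

# Assumed example links (illustrative values only).
LAN = {"fixed_cost_s": 0.001, "rate_bps": 1e9}   # 1 ms, 1 Gbit/s
WAN = {"fixed_cost_s": 0.05, "rate_bps": 1e7}    # 50 ms, 10 Mbit/s

ONE_MEGABYTE_BITS = 8e6
lan_seconds = transfer_time(ONE_MEGABYTE_BITS, **LAN)
wan_seconds = transfer_time(ONE_MEGABYTE_BITS, **WAN)
```

Under these assumed numbers a 1 MB result set takes about 0.009 s on the LAN and about 0.85 s on the WAN, so a query plan that ships less data between sites matters far more over a WAN.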


Overview of Networking

[Figure labels: Asynchronous Replication; Synchronous Cluster]


Functions of a DDBMS

Expect DDBMS to have at least the functionality of a DBMS.
Also to have following functionality:
– Extended communication services.
– Extended Data Dictionary.
– Distributed query processing.
– Extended concurrency control.
– Extended recovery services.


Reference Architecture for DDBMS

Due to diversity, no accepted architecture equivalent to ANSI/SPARC 3-level architecture.
A reference architecture consists of:
– Set of global external schemas.
– Global conceptual schema (GCS).
– Fragmentation schema and allocation schema.
– Set of schemas for each local DBMS conforming to 3-level ANSI/SPARC.
Some levels may be missing, depending on levels of transparency supported.


Reference Architecture for DDBMS


Reference Architecture for MDBS

In DDBMS, GCS is union of all local conceptual schemas.
In FMDBS, GCS is subset of local conceptual schemas (LCS), consisting of data that each local system agrees to share.
GCS of tightly coupled system involves integration of either parts of LCSs or local external schemas.
FMDBS with no GCS is called loosely coupled.


Reference Architecture for Tightly-Coupled FMDBS


Components of a DDBMS


Distributed Database Design

Fragmentation
Relation may be divided into a number of sub-relations, which are then distributed.

Allocation
Each fragment is stored at site with “optimal” distribution.

Replication
Copy of fragment may be maintained at several sites.
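Fragmentation can be illustrated on a small relation held as a list of dictionaries. The slide does not name the kinds of fragment; the sketch uses the standard textbook terms horizontal (a subset of rows) and vertical (a subset of columns plus the key). The `staff` relation, its attributes, and the helper names are made up for the example.

```python
# Hypothetical relation for illustration.
staff = [
    {"staff_no": "S1", "name": "Ann", "branch": "B3", "salary": 30000},
    {"staff_no": "S2", "name": "Bob", "branch": "B5", "salary": 12000},
    {"staff_no": "S3", "name": "Cy", "branch": "B3", "salary": 18000},
]

def horizontal(relation, predicate):
    """Horizontal fragment: the rows that satisfy a predicate."""
    return [row for row in relation if predicate(row)]

def vertical(relation, key, attributes):
    """Vertical fragment: the key plus a chosen subset of attributes."""
    return [{k: row[k] for k in (key, *attributes)} for row in relation]

def join(left, right, key):
    """Rebuild rows from two vertical fragments by matching on the key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

# Horizontal fragments by branch; their union is the original relation.
b3 = horizontal(staff, lambda r: r["branch"] == "B3")
others = horizontal(staff, lambda r: r["branch"] != "B3")

# Vertical fragments; joining them on the key gives back the original rows.
names = vertical(staff, "staff_no", ["name"])
pay = vertical(staff, "staff_no", ["branch", "salary"])
rebuilt = join(names, pay, "staff_no")
```

Each fragment could then be allocated to the site that uses it most, for example the `b3` rows to the B3 branch's server, and selected fragments could be replicated to several sites.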


Data Allocation

Four alternative strategies regarding placement of data:
– Centralized -- Bottleneck Issue
– Partitioned (or Fragmented) -- Cluster Shared Nothing
– Complete Replication -- DR
– Selective Replication -- Split Brain Challenges


Data Allocation

Centralized: Consists of single database and DBMS stored at one site with users distributed across the network.

Partitioned: Database partitioned into disjoint fragments, each fragment assigned to one site.

Complete Replication: Consists of maintaining complete copy of database at each site.

Selective Replication: Combination of partitioning, replication, and centralization.


Comparison of Strategies for Data Distribution


Concurrency Transparency

Replication makes concurrency more complex.
If a copy of a replicated data item is updated, update must be propagated to all copies.
Could propagate changes as part of original transaction, making it an atomic operation.
However, if one site holding copy is not reachable, then transaction is delayed until site is reachable.


Concurrency Transparency

Could limit update propagation to only those sites currently available. Remaining sites updated when they become available again.
Could allow updates to copies to happen asynchronously, sometime after the original update. Delay in regaining consistency may range from a few seconds to several hours.
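The propagation choices described on these two slides can be compared side by side in a toy model. The `Site` class and the three functions are hypothetical names for illustration: one strategy refuses to proceed unless every copy is reachable, the other updates reachable copies now and queues the change for the rest.

```python
class Site:
    """One site holding a copy of the replicated data."""
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

def update_all_copies(sites, key, value):
    """Propagate inside the original transaction: all copies or none.

    Returns False (transaction delayed) if any site is unreachable.
    """
    if not all(site.up for site in sites):
        return False
    for site in sites:
        site.data[key] = value
    return True

def update_available(sites, key, value, pending):
    """Update reachable copies now; queue the change for unreachable sites."""
    for site in sites:
        if site.up:
            site.data[key] = value
        else:
            pending.setdefault(site.name, []).append((key, value))

def catch_up(site, pending):
    """Apply queued changes once a site becomes available again."""
    for key, value in pending.pop(site.name, []):
        site.data[key] = value
```

The first strategy keeps every copy identical but blocks while a site is down. The second keeps the system available but leaves the queued site stale until `catch_up` runs, which is the consistency delay the slide mentions.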


Failure Transparency

DDBMS must ensure atomicity and durability of global transaction.
Means ensuring that subtransactions of global transaction either all commit or all abort.
Thus, DDBMS must synchronize global transaction to ensure that all subtransactions have completed successfully before recording a final COMMIT for global transaction.
Must do this in presence of site and network failures.
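The all-commit-or-all-abort requirement is usually met with a two-phase commit protocol, which this slide does not name. The sketch below is a simplified illustration under stated assumptions: `Participant` and `two_phase_commit` are hypothetical, an unreachable site is treated as a "no" vote, and logging, timeouts, and coordinator failure are left out.

```python
class Participant:
    """A site running one subtransaction of the global transaction."""
    def __init__(self, name, vote=True, reachable=True):
        self.name = name
        self.vote = vote
        self.reachable = reachable
        self.state = "ACTIVE"

    def prepare(self):
        if not self.reachable:
            raise ConnectionError(self.name)
        self.state = "PREPARED" if self.vote else "ABORTED"
        return self.vote

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"

def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes yes."""
    # Phase 1 (voting): ask every site whether it can commit.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except ConnectionError:
            votes.append(False)        # assumption: unreachable counts as "no"
    # Phase 2 (decision): tell every site the global outcome.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMIT"
    for p in participants:
        if p.reachable:
            p.abort()                  # an unreachable site aborts on recovery
    return "ABORT"
```

No participant reaches `COMMITTED` unless all of them prepared successfully, which is the synchronization before the final COMMIT that the slide describes.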


Performance Transparency

DDBMS must perform as if it were a centralized DBMS.
– DDBMS should not suffer any performance degradation due to distributed architecture.
– DDBMS should determine most cost-effective strategy to execute a request.


Date’s 12 Rules for a DDBMS

0. Fundamental Principle
To the user, a distributed system should look exactly like a nondistributed system.

1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence


Date’s 12 Rules for a DDBMS

7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence

• Last four rules are ideals.
