
CS742 – Distributed & Parallel DBMS M. Tamer Özsu Page 1.1

Outline

Introduction & architectural issues
  What is a distributed DBMS
  Problems
  Current state-of-affairs
Data distribution
Distributed query processing
Distributed query optimization
Distributed transactions & concurrency control
Distributed reliability
Database replication
Parallel database systems
Database integration & querying
Advanced topics


File Systems

[Figure: programs 1, 2, and 3, each with its own data description (1, 2, 3), and Files 1, 2, and 3]


Database Management

[Figure: application programs 1, 2, and 3 (with data semantics) reach the database through the DBMS, which handles description, manipulation, and control]


Motivation

[Figure: Database Technology (integration) and Computer Networks (distribution) combine into Distributed Database Systems (integration)]

integration ≠ centralization


Distributed Computing

A number of autonomous processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks.

What is being distributed?
  Processing logic
  Function
  Data
  Control


What is a Distributed Database System?

A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.

A distributed database management system (D–DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.

Distributed database system (DDBS) = DDB + D–DBMS


What is not a DDBS?

A timesharing computer system

A loosely or tightly coupled multiprocessor system

A database system which resides at one of the nodes of a network of computers - this is a centralized database on a network node


Centralized DBMS on a Network

[Figure: Sites 1 to 5 connected by a communication network, with the database residing at a single site]


Distributed DBMS Environment

[Figure: Sites 1 to 5 connected by a communication network, with data distributed across the sites]


Implicit Assumptions

Data stored at a number of sites: each site logically consists of a single processor.

Processors at different sites are interconnected by a computer network: not a multiprocessor system
  Parallel database systems

Distributed database is a database, not a collection of files: data logically related as exhibited in the users' access patterns
  Relational data model

D-DBMS is a full-fledged DBMS
  Not remote file system, not a TP system


Data Delivery Alternatives

Delivery modes
  Pull-only
  Push-only
  Hybrid

Frequency
  Periodic
  Conditional
  Ad-hoc or irregular

Communication Methods
  Unicast
  One-to-many

Note: not all combinations make sense
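
The three dimensions on this slide span a small design space. As a rough sketch, it can be enumerated as follows. The lists simply restate the slide's options, the variable and function names are illustrative, and nothing here decides which combinations actually make sense.

```python
from itertools import product

# Options copied from the slide; names are illustrative assumptions.
DELIVERY_MODES = ["pull-only", "push-only", "hybrid"]
FREQUENCIES = ["periodic", "conditional", "ad-hoc or irregular"]
COMMUNICATION_METHODS = ["unicast", "one-to-many"]


def delivery_alternatives():
    """Return every (mode, frequency, method) combination."""
    return list(product(DELIVERY_MODES, FREQUENCIES, COMMUNICATION_METHODS))


if __name__ == "__main__":
    for combination in delivery_alternatives():
        print(combination)
```

The cross product has 3 x 3 x 2 = 18 entries; a real system would filter out the combinations that are not meaningful.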


Distributed DBMS Promises

Transparent management of distributed, fragmented, and replicated data

Improved reliability/availability through distributed transactions

Improved performance

Easier and more economical system expansion


Transparency

Transparency is the separation of the higher level semantics of a system from the lower level implementation issues.

Fundamental issue is to provide data independence in the distributed environment
  Network (distribution) transparency
  Replication transparency
  Fragmentation transparency
    horizontal fragmentation: selection
    vertical fragmentation: projection
    hybrid
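
The link between fragmentation and the relational operators can be illustrated with a minimal sketch. It assumes a toy EMP relation held as a list of dictionaries; the rows, the LOC attribute, and all function names are made up for illustration and are not taken from the slides.

```python
# Toy EMP relation; rows and the LOC attribute are invented for illustration.
EMP = [
    {"ENO": "E1", "ENAME": "Ann", "TITLE": "Engineer", "LOC": "Paris"},
    {"ENO": "E2", "ENAME": "Bob", "TITLE": "Analyst", "LOC": "Boston"},
    {"ENO": "E3", "ENAME": "Cho", "TITLE": "Engineer", "LOC": "Paris"},
]


def horizontal_fragment(relation, predicate):
    """Selection: keep whole tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def vertical_fragment(relation, attributes, key="ENO"):
    """Projection: keep some attributes, always retaining the key."""
    columns = [key] + [a for a in attributes if a != key]
    return [{c: row[c] for c in columns} for row in relation]


def join_on_key(left, right, key="ENO"):
    """Rebuild tuples from two vertical fragments that share the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Horizontal fragments are rebuilt by union, vertical ones by join.
paris = horizontal_fragment(EMP, lambda r: r["LOC"] == "Paris")
others = horizontal_fragment(EMP, lambda r: r["LOC"] != "Paris")
names = vertical_fragment(EMP, ["ENAME"])
jobs = vertical_fragment(EMP, ["TITLE", "LOC"])
```

Under fragmentation transparency the user would query EMP directly and never name `paris`, `others`, `names`, or `jobs`.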


Example


Transparent Access

SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE

[Figure: the query is issued over a communication network connecting Tokyo, Boston, Montreal, Paris, and New York. Data groups shown at the sites:
  Paris projects, Paris employees, Paris assignments, Boston employees
  Montreal projects, Paris projects, New York projects with budget > 200000, Montreal employees, Montreal assignments
  Boston projects, Boston employees, Boston assignments
  Boston projects, New York employees, New York projects, New York assignments]
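
From the user's side, the query on this slide is written as if EMP, ASG, and PAY were ordinary local relations. The sketch below runs it against a single in-memory SQLite database as a stand-in for that user view. The table schemas and rows are assumptions chosen only to make the query runnable; a real distributed DBMS would resolve the same query against fragments and replicas at several sites.

```python
import sqlite3

# The query from the slide, unchanged apart from layout.
QUERY = """
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
  AND EMP.ENO = ASG.ENO
  AND PAY.TITLE = EMP.TITLE
"""


def build_toy_database():
    """Create a single-site stand-in; schemas and rows are assumptions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT);
        CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER);
        CREATE TABLE PAY (TITLE TEXT, SAL INTEGER);
        INSERT INTO EMP VALUES ('E1', 'Ann', 'Engineer'), ('E2', 'Bob', 'Analyst');
        INSERT INTO ASG VALUES ('E1', 'P1', 24), ('E2', 'P2', 6);
        INSERT INTO PAY VALUES ('Engineer', 40000), ('Analyst', 34000);
    """)
    return conn


def run_query(conn):
    """Execute the slide's query and return all result rows."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    print(run_query(build_toy_database()))
```

Only the employee whose assignment lasts longer than 12 qualifies in this toy data set.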


Distributed Database - User View

[Figure: the user's view of a single Distributed Database]


Distributed DBMS - Reality

[Figure: user queries and user applications attach to DBMS software at several sites; the DBMS software instances are linked by a communication subsystem]


Types of Transparency

Data independence
Network transparency (or distribution transparency)
  Location transparency
  Fragmentation transparency
Replication transparency
Fragmentation transparency


Reliability Through Transactions

Replicated components and data should make distributed DBMS more reliable.

Distributed transactions provide
  Concurrency transparency
  Failure atomicity

Distributed transaction support requires implementation of
  Distributed concurrency control protocols
  Commit protocols

Data replication
  Great for read-intensive workloads, problematic for updates
  Replication protocols
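
The slide names commit protocols without specifying one. Two-phase commit (2PC) is a widely used example, and the sketch below illustrates only its vote-then-decide structure. The class and function names are hypothetical, everything runs in one process, and logging, timeouts, and failure recovery are deliberately left out.

```python
class Participant:
    """A site holding part of a distributed transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return 'commit' if every participant votes yes, else 'abort'."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): the same outcome is applied at every site.
    if all(votes):
        for p in participants:
            p.commit()
        return "commit"
    for p in participants:
        p.abort()
    return "abort"
```

The all-or-nothing decision in the second phase is what gives the failure atomicity listed on the slide: either every site commits or every site aborts.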


Potentially Improved Performance

Proximity of data to its points of use
  Requires some support for fragmentation and replication

Parallelism in execution
  Inter-query parallelism
  Intra-query parallelism
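
Intra-query parallelism means that a single query is split into pieces that run at the same time. The sketch below is one possible illustration: a selection is applied to each fragment concurrently and the partial results are merged. Threads stand in for sites, and the fragment contents and function names are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of an assignment relation: (ENO, DUR) pairs per site.
FRAGMENTS = {
    "Paris": [("E1", 24), ("E3", 10)],
    "Boston": [("E2", 6), ("E4", 18)],
}


def scan_fragment(rows, min_duration):
    """Local selection on one fragment."""
    return [eno for eno, dur in rows if dur > min_duration]


def parallel_select(fragments, min_duration):
    """Run the same selection on all fragments at once and merge results."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = pool.map(
            lambda rows: scan_fragment(rows, min_duration),
            fragments.values(),
        )
        return sorted(eno for part in partials for eno in part)


if __name__ == "__main__":
    print(parallel_select(FRAGMENTS, 12))
```

Inter-query parallelism, by contrast, would run several independent queries concurrently rather than splitting one.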


Parallelism Requirements

Have as much of the data required by each application at the site where the application executes

Full replication

How about updates?

Mutual consistency

Freshness of copies


System Expansion

Issue is database scaling

Emergence of microprocessor and workstation technologies

Demise of Grosch's law

Client-server model of computing

Data communication cost vs telecommunication cost


Distributed DBMS Issues

Distributed Database Design
  How to distribute the database
  Replicated & non-replicated database distribution
  A related problem in directory management

Query Processing
  Convert user transactions to data manipulation instructions
  Optimization problem: min{cost = data transmission + local processing}
  General formulation is NP-hard
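
The objective min{cost = data transmission + local processing} can be made concrete with a small sketch. It assumes a handful of candidate plans whose names and cost figures are invented; a real optimizer would estimate these costs from statistics, and exhaustive comparison like this is only workable for tiny plan spaces, in line with the NP-hardness noted on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate execution plan with invented cost estimates."""
    name: str
    transmission_cost: float
    local_processing_cost: float

    @property
    def total_cost(self):
        # cost = data transmission + local processing
        return self.transmission_cost + self.local_processing_cost


def cheapest_plan(plans):
    """Pick the plan with the minimum total cost."""
    return min(plans, key=lambda p: p.total_cost)


# Hypothetical alternatives for joining EMP and ASG stored at two sites.
CANDIDATES = [
    Plan("ship EMP to the ASG site", 120.0, 30.0),
    Plan("ship ASG to the EMP site", 80.0, 45.0),
    Plan("ship both to the query site", 200.0, 20.0),
]

if __name__ == "__main__":
    best = cheapest_plan(CANDIDATES)
    print(best.name, best.total_cost)
```

Note that the plan with the lowest local processing cost is not the winner here, because its transmission cost dominates.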


Distributed DBMS Issues

Concurrency Control
  Synchronization of concurrent accesses
  Consistency and isolation of transactions' effects
  Deadlock management

Reliability
  How to make the system resilient to failures
  Atomicity and durability


Relationship Between Issues

[Figure: relationships among Directory Management, Distribution Design, Query Processing, Concurrency Control, Deadlock Management, and Reliability]


Related Issues

Operating System Support
  Operating system with proper support for database operations
  Dichotomy between general purpose processing requirements and database processing requirements

Open Systems and Interoperability
  Distributed Multidatabase Systems
  More probable scenario
  Parallel issues


Architecture

Defines the structure of the system
  components identified
  functions of each component defined
  interrelationships and interactions between components defined


ANSI/SPARC Architecture

[Figure: users work with external views (External Schema), which map to the conceptual view (Conceptual Schema), which maps to the internal view (Internal Schema)]


Generic DBMS Architecture


DBMS Implementation Alternatives


Dimensions of the Problem

Distribution
  Whether the components of the system are located on the same machine or not

Heterogeneity
  Various levels (hardware, communications, operating system)
  DBMS important one
    data model, query language, transaction management algorithms

Autonomy
  Not well understood and most troublesome
  Various versions
    Design autonomy: Ability of a component DBMS to decide on issues related to its own design.
    Communication autonomy: Ability of a component DBMS to decide whether and how to communicate with other DBMSs.
    Execution autonomy: Ability of a component DBMS to execute local operations in any manner it wants to.


Client/Server Architecture


Advantages of Client-Server Architectures

More efficient division of labor
Horizontal and vertical scaling of resources
Better price/performance on client machines
Ability to use familiar tools on client machines
Client access to remote data (via standards)
Full DBMS functionality provided to client workstations
Overall better system price/performance


Database Server


Distributed Database Servers


Datalogical Distributed DBMS Architecture

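[Figure: schema layers, top to bottom: external schemas ES1, ES2, ..., ESn; the global conceptual schema GCS; local conceptual schemas LCS1, LCS2, ..., LCSn; local internal schemas LIS1, LIS2, ..., LISn]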


Peer-to-Peer Component Architecture

[Figure: the USER sends user requests to and receives system responses from the USER PROCESSOR, which works with the DATA PROCESSOR and the Database. Components shown: User Interface Handler, Semantic Data Controller, Global Query Optimizer, Global Execution Monitor, Local Query Processor, Local Recovery Manager, Runtime Support Processor. Schemas and stores shown: External Schema, Global Conceptual Schema, GD/D, Local Conceptual Schema, System Log, Local Internal Schema]


Datalogical Multi-DBMS Architecture

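[Figure: schema layers of a multi-DBMS: global external schemas GES1, GES2, ..., GESn; the global conceptual schema GCS; local external schemas LES11, ..., LES1n, ..., LESn1, ..., LESnm; local conceptual schemas LCS1, LCS2, ..., LCSn; local internal schemas LIS1, LIS2, ..., LISn]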


MDBS Components & Execution

[Figure: a global user request enters the Multi-DBMS layer, which issues global subrequests to DBMS1, DBMS2, and DBMS3; local user requests go directly to individual DBMSs]


Mediator/Wrapper Architecture