Parallel and Distributed Databases
• CS263 Lecture 16
LECTURE PLAN
Parallel DBMS - What and Why?
What is a Client/Server DBMS?
Why do we need Distributed DBMSs?
Date’s rules for a Distributed DBMS
Benefits of a Distributed DBMS
Issues associated with a Distributed DBMS
Disadvantages of a Distributed DBMS
PARALLEL DATABASE SYSTEM
PARALLEL DBMSs: WHY DO WE NEED THEM?
• More and More Data!
We have databases that hold very large amounts of data, in the order of 10^12 bytes:
1,000,000,000,000 bytes!
• Faster and Faster Access!
We have data applications that need to process data at very high speeds:
10,000s of transactions per second!
SINGLE-PROCESSOR DBMSs AREN’T UP TO THE JOB!
PARALLEL DBMSs: BENEFITS OF A PARALLEL DBMS
INTERQUERY PARALLELISM
It is possible to process a number of transactions in parallel with each other.
Improves Throughput.
INTRAQUERY PARALLELISM
It is possible to process ‘sub-tasks’ of a transaction in parallel with each other.
Improves Response Time.
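The difference between the two forms of parallelism can be sketched in Python with a thread pool. This is only an illustration of the structure: `scan`, the `accounts` list and the predicates are hypothetical stand-ins for real query work, not part of any DBMS, and threads in CPython will not necessarily run CPU-bound work faster.

```python
from concurrent.futures import ThreadPoolExecutor

def scan(records, predicate):
    """Stand-in for one unit of query work: count the matching records."""
    return sum(1 for r in records if predicate(r))

def interquery(queries, workers=4):
    """Interquery parallelism: run several independent queries side by side."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan, recs, pred) for recs, pred in queries]
        return [f.result() for f in futures]

def intraquery(records, predicate, workers=4):
    """Intraquery parallelism: split ONE query into sub-tasks over partitions."""
    size = max(1, -(-len(records) // workers))  # ceiling division
    parts = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(scan, parts, [predicate] * len(parts))
        return sum(partial_counts)  # combine the sub-task results

# Hypothetical data: 10,000 account numbers and two simple predicates.
accounts = list(range(10_000))
is_even = lambda n: n % 2 == 0
is_large = lambda n: n >= 9_000

inter_results = interquery([(accounts, is_even), (accounts, is_large)])
intra_result = intraquery(accounts, is_even)
```

In `interquery` each whole query is one task, so more queries finish per second (throughput). In `intraquery` a single query is partitioned, so that one query can finish sooner (response time).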
PARALLEL DBMSs: HOW TO MEASURE THE BENEFITS
Speed-Up.
As you multiply resources by a certain factor, the time taken to execute a transaction should be reduced by the same factor:
10 seconds to scan a DB of 10,000 records using 1 CPU
1 second to scan a DB of 10,000 records using 10 CPUs
Scale-up.
As you multiply resources by a certain factor, the size of a task that can be executed in a given time should increase by the same factor:
1 second to scan a DB of 1,000 records using 1 CPU
1 second to scan a DB of 10,000 records using 10 CPUs
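The two measures can be written as simple ratios. The sketch below uses the usual textbook definitions (elapsed time on the smaller system divided by elapsed time on the larger one). The function names are chosen here for illustration, and the figures are the scan times quoted above.

```python
def speed_up(time_small_system, time_large_system):
    """Same task on both systems: how many times faster is the larger one?"""
    return time_small_system / time_large_system

def scale_up(time_small_task_small_system, time_large_task_large_system):
    """Task and system grow by the same factor: 1.0 means linear scale-up."""
    return time_small_task_small_system / time_large_task_large_system

def is_linear_speed_up(measured, resource_factor, tolerance=1e-9):
    """Linear speed-up means the speed-up equals the resource factor."""
    return abs(measured - resource_factor) <= tolerance

# Speed-up example: 10,000 records, 10 s on 1 CPU versus 1 s on 10 CPUs.
ideal_speed_up = speed_up(10.0, 1.0)

# Scale-up example: 1,000 records on 1 CPU versus 10,000 records on 10 CPUs,
# both taking 1 s.
ideal_scale_up = scale_up(1.0, 1.0)
```

A speed-up equal to the resource factor (here 10) or a scale-up of 1.0 is the linear, ideal case. Smaller values indicate sub-linear behaviour.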
PARALLEL DBMSs: SPEED-UP
[Figure: number of transactions/second plotted against number of CPUs, comparing linear speed-up (ideal) with sub-linear speed-up. Labelled points: 5 CPUs, 1000/sec; 10 CPUs, 2000/sec; 16 CPUs, 1600/sec.]
PARALLEL DBMSs: SCALE-UP
[Figure: number of transactions/second plotted against number of CPUs and database size, comparing linear scale-up (ideal) with sub-linear scale-up. Labelled points: 5 CPUs with a 1 GB database, 1000/sec; 10 CPUs with a 2 GB database, 900/sec.]
[Figure: Shared Memory – Parallel Database Architecture. Several CPUs share a single MEMORY.]
[Figure: Shared Disk – Parallel Database Architecture. Each CPU has its own memory (M); the disks are shared.]
[Figure: Shared Nothing – Parallel Database Architecture. Each CPU has its own memory (M) and its own disk.]
MAINFRAME DATABASE SYSTEM
[Figure: DUMB TERMINALS linked by a SPECIALISED NETWORK CONNECTION to a MAINFRAME COMPUTER, which holds the PRESENTATION LOGIC, BUSINESS LOGIC and DATA LOGIC.]
CLIENT/SERVER DATABASE SYSTEM
CLIENT/SERVER DBMS: CLIENT PROCESS
Manages user interface
Accepts user data
Processes application/business logic
Generates database requests (SQL)
Transmits database requests to server
Receives results from server
Formats results according to application logic
Presents results to the user
CLIENT/SERVER DBMS: SERVER PROCESS
Accepts database requests
Processes database requests
Performs integrity checks
Handles concurrent access
Optimises queries
Performs security checks
Enacts recovery routines
Transmits result of database request to client
CLIENT/SERVER DBMS ARCHITECTURE (FAT CLIENT)
[Figure: CLIENT#1, CLIENT#2 and CLIENT#3 exchange data requests and data responses with a D/BASE SERVER. The clients hold the PRESENTATION LOGIC and BUSINESS LOGIC; the server holds the DATA LOGIC.]
CLIENT/SERVER DBMS ARCHITECTURE (THIN CLIENT)
[Figure: CLIENT#1, CLIENT#2 and CLIENT#3 exchange data requests and data responses with a D/BASE SERVER. The clients hold only the PRESENTATION LOGIC; the server holds the BUSINESS LOGIC (PL/SQL) and DATA LOGIC.]
DISTRIBUTED PROCESSING ARCHITECTURE
[Figure: CLIENTs on LANs at Leyton, Stratford, Barking and Leytonstone, connected over a WIDE AREA NETWORK to a single DBMS.]
DISTRIBUTED DATABASE SYSTEM
A distributed database system is a collection of logically related databases that co-operate in a transparent manner.
Transparent implies that each user within the system may access all of the data within all of the databases as if they were a single database.
There should be ‘location independence’: because the user is unaware of where the data is located, it is possible to move the data from one physical location to another without affecting the user.
DISTRIBUTED DATABASES: WHAT IS A DISTRIBUTED DATABASE?
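Location independence can be pictured with a toy model in Python. Everything here is hypothetical (the `TransparentDatabase` class, the site names used as keys, the `account-200` item); it only illustrates that a user names the data and never the site.

```python
class TransparentDatabase:
    """Toy model: users name data items, never the sites that hold them."""

    def __init__(self):
        self._sites = {}      # site name -> {item name: value}
        self._location = {}   # item name -> site name (a tiny global catalog)

    def store(self, site, item, value):
        self._sites.setdefault(site, {})[item] = value
        self._location[item] = site

    def read(self, item):
        # The caller supplies no site: the lookup is hidden from the user.
        return self._sites[self._location[item]][item]

    def move(self, item, new_site):
        # Relocate the data physically; user-facing reads are unaffected.
        old_site = self._location[item]
        value = self._sites[old_site].pop(item)
        self.store(new_site, item, value)

db = TransparentDatabase()
db.store("Stratford", "account-200", 1000.00)
before = db.read("account-200")
db.move("account-200", "Barking")
after = db.read("account-200")
```

The call `db.read("account-200")` is identical before and after the move, which is the property the definition asks for.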
DISTRIBUTED DATABASE ARCHITECTURE
[Figure: sites at Leytonstone, Stratford, Barking and Leyton, each with its own CLIENTs and DBMS, linked by LANs and a WIDE AREA NETWORK.]
M:N CLIENT/SERVER DBMS ARCHITECTURE
[Figure: CLIENT#1, CLIENT#2 and CLIENT#3 connect to D/BASE SERVER #1 and D/BASE SERVER #2.]
NOT TRANSPARENT!
COMPONENTS OF A DDBMS
[Figure: Site 1 and Site 2 joined by a Computer Network. Site 1 comprises GSC, DDBMS, DC, LDBMS and DB; Site 2 comprises GSC, DDBMS and DC.]
LDBMS = Local DBMS
DC = Data Communications
GSC = Global Systems Catalog
DDBMS = Distributed DBMS
DISTRIBUTED DATABASES: ADVANTAGES
• Reduced Communication Overhead
Most data access is local, which is less expensive and performs better.
• Improved Processing Power
Instead of one server handling the full database, we now have a collection of machines handling the same database.
• Removal of Reliance on a Central Site
If a server fails, then the only part of the system that is affected is the relevant local site. The rest of the system remains functional and available.
• Expandability
It is easier to accommodate growth in the size of the global (logical) database.
• Local autonomy
The database is brought nearer to its users. This can effect a cultural change, as it allows potentially greater control over local data.
DISTRIBUTED DATABASES: DATE’S TWELVE RULES FOR A DDBMS
A distributed system looks exactly like a non-distributed system to the user!
1. Local autonomy
2. No reliance on a central site
3. Continuous operation
4. Location independence
5. Fragmentation independence
6. Replication independence
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence
DISTRIBUTED DATABASES: ISSUES
Data Allocation
Data Fragmentation
Distributed Catalogue Management
Distributed Transactions
Distributed Queries (see chapter 20)
DISTRIBUTED DATABASES: DATA ALLOCATION METRICS
1. Locality of reference: Is the data near to the sites that need it?
2. Reliability and availability: Does the strategy improve fault tolerance and accessibility?
3. Performance: Does the strategy result in bottlenecks or under-utilisation of resources?
4. Storage costs: How does the strategy affect the availability and cost of data storage?
5. Communication costs: How much network traffic will result from the strategy?
DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES
CENTRALISED
Locality of Reference: Lowest
Reliability/Availability: Lowest
Storage Costs: Lowest
Performance: Unsatisfactory
Communication Costs: Highest
PARTITIONED/FRAGMENTED
Locality of Reference: High
Reliability/Availability: Low (item) – High (system)
Storage Costs: Lowest
Performance: Satisfactory
Communication Costs: Low
COMPLETE REPLICATION
Locality of Reference: Highest
Reliability/Availability: Highest
Storage Costs: Highest
Performance: High
Communication Costs: High (update) – Low (read)
SELECTIVE REPLICATION
Locality of Reference: High
Reliability/Availability: Low (item) – High (system)
Storage Costs: Average
Performance: Satisfactory
Communication Costs: Low
DISTRIBUTED DATABASES: WHY FRAGMENT DATA?
Usage: Applications are usually interested in ‘views’, not whole relations.
Efficiency: It is more efficient if data is close to where it is frequently used.
Parallelism: It is possible to run several ‘sub-queries’ in parallel.
Security: Data not required by local applications is not stored at the local site.
DISTRIBUTED DATABASES: HORIZONTAL DATA FRAGMENTATION
ACCOUNT  CUSTOMER  BRANCH     BALANCE
200      JONES     STRATFORD  1000.00
324      GRAY      BARKING     200.00
345      SMITH     STRATFORD    23.17
350      GREEN     BARKING     340.14
400      ONO       BARKING     500.00
456      KHAN      STRATFORD   333.00
Horizontal Fragmentation: Consists of a Restriction on a Relation.
e.g., σ branch = ‘Stratford’ (Account)
DISTRIBUTED DATABASES: HORIZONTAL DATA FRAGMENTATION
STRATFORD BRANCH
ACCT NO.  CUSTOMER  BRANCH     BALANCE
200       JONES     STRATFORD  1000.00
345       SMITH     STRATFORD    23.17
456       KHAN      STRATFORD   333.00
BARKING BRANCH
ACCT NO.  CUSTOMER  BRANCH   BALANCE
324       GRAY      BARKING   200.00
350       GREEN     BARKING   340.14
400       ONO       BARKING   500.00
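Horizontal fragmentation of the Account relation can be sketched in Python as a restriction over a list of rows. The dictionary keys and the `horizontal_fragment` helper are naming choices made for this illustration, and the rows are the six accounts from the slide.

```python
# The Account relation from the slide, one dict per tuple.
ACCOUNT = [
    {"account": 200, "customer": "JONES", "branch": "STRATFORD", "balance": 1000.00},
    {"account": 324, "customer": "GRAY", "branch": "BARKING", "balance": 200.00},
    {"account": 345, "customer": "SMITH", "branch": "STRATFORD", "balance": 23.17},
    {"account": 350, "customer": "GREEN", "branch": "BARKING", "balance": 340.14},
    {"account": 400, "customer": "ONO", "branch": "BARKING", "balance": 500.00},
    {"account": 456, "customer": "KHAN", "branch": "STRATFORD", "balance": 333.00},
]

def horizontal_fragment(relation, predicate):
    """Restriction (selection): keep whole tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

stratford = horizontal_fragment(ACCOUNT, lambda r: r["branch"] == "STRATFORD")
barking = horizontal_fragment(ACCOUNT, lambda r: r["branch"] == "BARKING")

# The original relation is the union of its horizontal fragments.
reconstructed = sorted(stratford + barking, key=lambda r: r["account"])
```

Each fragment keeps every column but only some of the rows, and the union of the fragments gives back the full relation.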
DISTRIBUTED DATABASES: VERTICAL DATA FRAGMENTATION
S#   NAME   SITE       PHONE NO       LOGIN    PASSWORD
200  JONES  STRATFORD  0208-500-9000  JON200T  XXYY22
324  GRAY   BARKING    0208-545-7528  GRA324S  ZZEE56
456  KHAN   STRATFORD  0208-500-5821  KHA456T  KJTR78
Vertical Fragmentation: Consists of a Projection on a Relation.
e.g., Π S#, NAME, SITE, PHONE NO (Student)
DISTRIBUTED DATABASES: VERTICAL DATA FRAGMENTATION
STUDENT ADMINISTRATION
S#   NAME   SITE       PHONE NO.
200  JONES  STRATFORD  0208-500-9000
324  GRAY   BARKING    0208-545-7528
456  KHAN   STRATFORD  0208-500-5821
NETWORK ADMINISTRATION
S#   LOGIN-ID  PASSWORD
200  JON200T   XXYY22
324  GRA324S   ZZEE56
456  KHA456T   KJTR78
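Vertical fragmentation of the Student relation can be sketched in the same way, as a projection onto chosen attributes. The helper names `vertical_fragment` and `rejoin` are hypothetical, and the rows are the three students from the slide. Both fragments keep the key S# so that the original tuples can be rebuilt by a join.

```python
# The Student relation from the slide, one dict per tuple.
STUDENT = [
    {"S#": 200, "NAME": "JONES", "SITE": "STRATFORD",
     "PHONE NO": "0208-500-9000", "LOGIN": "JON200T", "PASSWORD": "XXYY22"},
    {"S#": 324, "NAME": "GRAY", "SITE": "BARKING",
     "PHONE NO": "0208-545-7528", "LOGIN": "GRA324S", "PASSWORD": "ZZEE56"},
    {"S#": 456, "NAME": "KHAN", "SITE": "STRATFORD",
     "PHONE NO": "0208-500-5821", "LOGIN": "KHA456T", "PASSWORD": "KJTR78"},
]

def vertical_fragment(relation, attributes):
    """Projection: keep only the named attributes of every tuple."""
    return [{a: row[a] for a in attributes} for row in relation]

# Student Administration and Network Administration each get their own columns.
student_admin = vertical_fragment(STUDENT, ["S#", "NAME", "SITE", "PHONE NO"])
network_admin = vertical_fragment(STUDENT, ["S#", "LOGIN", "PASSWORD"])

def rejoin(left, right, key="S#"):
    """Join on the shared key to rebuild the original tuples."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

rebuilt = rejoin(student_admin, network_admin)
```

Here the Student Administration fragment never holds login details, which matches the security argument for fragmenting data.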
DISTRIBUTED DATABASES: DISTRIBUTED CATALOG MANAGEMENT
• Centralised Global Catalog
One site maintains the full global catalog. All changes to any local system catalog have to be propagated to the site maintaining the global catalog. This gives poor performance, creates a single point of failure and compromises site autonomy.
• Dispersed Catalog
There is no physical global catalog. Each time a remote data item is required, the catalogs from ALL other sites are examined for the item. This has severe performance penalties.
• Replicated Global Catalog
Each site maintains its own copy of the global catalog. Although this greatly speeds up remote data location, it is very inefficient to maintain: details of every data item added, changed or deleted locally have to be propagated to ALL other sites.
• Local-Master Catalog
Each site maintains both its local system catalog and a catalog of all of its data items that are replicated at other sites. This avoids compromising site autonomy, is fairly efficient, and is not a single point of failure.
DISTRIBUTED DATABASES: DISTRIBUTED TRANSACTIONS
[Figure: ATOMIC DISTRIBUTED TRANSACTION. Stratford Clients submit a Global Transaction to the Stratford DBMS, which works with the Barking DBMS and Leyton DBMS; the databases are Stratford DB, Barking DB and Leyton DB.]
Global Transaction
(a) Debit Stratford A/C £500
(b) Credit Barking A/C £350
(c) Credit Leyton A/C £150
[Figure: TWO-PHASE COMMIT (2PC) - OK]
[Figure: TWO-PHASE COMMIT (2PC) - ABORT, ending in a ‘Global Abort’]
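The two outcomes of two-phase commit can be sketched in Python using the global transaction above. This is a minimal illustration of the usual textbook protocol (a voting phase followed by a decision phase), not a description of any particular DBMS. The `Participant` class, the starting balances and the `can_commit` flag are assumptions made for the example, and logging, timeouts and recovery are left out.

```python
class Participant:
    """One site's DBMS taking part in a global transaction (toy model)."""

    def __init__(self, name, balance, can_commit=True):
        self.name = name
        self.balance = balance
        self.can_commit = can_commit   # set False to simulate a local failure
        self._pending = None

    def prepare(self, delta):
        # Phase 1: vote. Refuse if the site has failed or would be overdrawn.
        if not self.can_commit or self.balance + delta < 0:
            return False
        self._pending = delta
        return True

    def commit(self):
        # Phase 2 (global commit): make the pending change permanent.
        self.balance += self._pending
        self._pending = None

    def abort(self):
        # Phase 2 (global abort): discard any pending change.
        self._pending = None

def two_phase_commit(work):
    """work: list of (participant, delta). Returns True on global commit."""
    votes = [site.prepare(delta) for site, delta in work]   # phase 1
    if all(votes):
        for site, _ in work:
            site.commit()
        return True
    for site, _ in work:
        site.abort()                                         # global abort
    return False

# Hypothetical starting balances for the three sites.
stratford = Participant("Stratford", 1000)
barking = Participant("Barking", 200)
leyton = Participant("Leyton", 0)
ok = two_phase_commit([(stratford, -500), (barking, +350), (leyton, +150)])

# Same transfer again, but Leyton votes to abort: no balance changes anywhere.
leyton.can_commit = False
aborted = two_phase_commit([(stratford, -500), (barking, +350), (leyton, +150)])
```

The first call corresponds to the OK case: every site votes yes and all three updates are applied. In the second call a single no vote leads to a global abort, so the transaction stays atomic.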
DISTRIBUTED DATABASES: DISADVANTAGES OF DDBMSs
Architectural complexity.
Cost.
Security.
Integrity control more difficult.
Lack of standards.
Lack of experience.
Database design more complex.