Physical Database Design
Barry Floyd
BUS 498 Advanced Database Management Systems
Introduction
The Physical Database Design Process
Goal is to translate our conceptual designs into physical reality
Draw on requirements analysis and our conceptual data model
Agenda
Data Volume and Usage Analysis
Data Distribution Strategy
  discuss this later in the quarter
Indexes
Denormalization
Overview
Important step in the database design process (also the last step)
Decisions made here impact ...
  data accessibility
  response times
  usability
Vocabulary
Data volume - how many records
Data usage - how often and in what manner the records are used
Data Volume Analysis
Use volume analysis to
  select physical storage devices
  estimate costs of storage
Data Volume Analysis
[Diagram: entity volumes, given values only]
TREATMENT
PATIENT
PHYSICIAN  50 (GIVEN)
CHARGE
ITEM       500 (GIVEN)
LOCATION   100 (GIVEN)
Data Volume Analysis
[Diagram: entity volumes, PATIENT derived]
TREATMENT
PATIENT    1000 (DERIVE)
PHYSICIAN  50
CHARGE
ITEM       500
LOCATION   100
Relationship annotations on the diagram: (10), (20)
* Keep patient record active for 30 days
* Average length of stay for a patient is 3 days
100 X 30 / 3 => 1000
Data Volume Analysis
[Diagram: entity volumes, TREATMENT derived]
TREATMENT  4000 (DERIVE)
PATIENT    1000
PHYSICIAN  50
CHARGE
ITEM       500
LOCATION   100
Relationship annotations on the diagram: (10), (20), (4)
* Each patient has 4 treatments on average.
1000 X 4 => 4000
Data Volume Analysis
[Diagram: entity volumes, CHARGE derived]
TREATMENT  4000
PATIENT    1000
PHYSICIAN  50
CHARGE     10,000 (DERIVE)
ITEM       500
LOCATION   100
Relationship annotations on the diagram: (20), (4), (20), (10)
* Each patient has 10 charges on average.
1000 X 10 => 10,000
Data Volume Analysis
[Diagram: completed volume analysis]
TREATMENT  4000
PATIENT    1000
PHYSICIAN  50
CHARGE     10,000
ITEM       500
LOCATION   100
Relationship annotations on the diagram: (10), (20), (4), (20), (10)
KNOW ... Number of records and relationships
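The derivations on the preceding slides can be sketched as a few lines of arithmetic. This is only an illustration: the given counts, the 30-day retention, the 3-day stay, and the per-patient averages come from the slides, while the function and variable names are hypothetical.

```python
# Given volumes and usage assumptions, as stated on the slides.
GIVEN = {"PHYSICIAN": 50, "ITEM": 500, "LOCATION": 100}
RETENTION_DAYS = 30          # keep patient record active for 30 days
AVG_STAY_DAYS = 3            # average length of stay for a patient
TREATMENTS_PER_PATIENT = 4   # average
CHARGES_PER_PATIENT = 10     # average


def derive_volumes(given):
    """Return given plus derived record counts (hypothetical helper)."""
    volumes = dict(given)
    # 100 X 30 / 3 => 1000
    volumes["PATIENT"] = given["LOCATION"] * RETENTION_DAYS // AVG_STAY_DAYS
    # 1000 X 4 => 4000
    volumes["TREATMENT"] = volumes["PATIENT"] * TREATMENTS_PER_PATIENT
    # 1000 X 10 => 10,000
    volumes["CHARGE"] = volumes["PATIENT"] * CHARGES_PER_PATIENT
    return volumes


volumes = derive_volumes(GIVEN)
for entity, count in sorted(volumes.items()):
    print(f"{entity:<10} {count:>6}")
```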
Data Usage Analysis
Want to identify major transactions and processes that access the database
Analyze each transaction and process to determine the access paths used and the frequency of use
Create a composite map from the individual analyses
Transaction Analysis Form
TRANSACTION NUMBER: MVCH-4
TRANSACTION NAME: CREATE PATIENT BILL
TRANSACTION VOLUME: AVERAGE 2/HR, PEAK 10/HR

[Diagram: PATIENT (1000), CHARGE (10,000), ITEM (500), with access steps (1), (2), (3)]

NO.  NAME            ACCESS TYPE  TRAN REF  PERIOD REF
(1)  ENTRY-PATIENT   READ         1         10
Transaction Analysis Form
NO.  NAME            ACCESS TYPE  TRAN REF  PERIOD REF
(1)  ENTRY-PATIENT   READ         1         10
(2)  PATIENT-CHARGE  READ         10        100
(3)  CHARGE-ITEM     READ         10        100

[Diagram: PATIENT (1000), CHARGE (10,000), ITEM (500), with access steps (1), (2), (3)]
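The PERIOD REF column on the form is consistent with multiplying each step's TRAN REF by the peak transaction volume of 10/HR. The sketch below assumes that relationship, which is inferred from the numbers on the form rather than stated on it; the names are hypothetical.

```python
PEAK_TRANSACTIONS_PER_HOUR = 10  # PEAK: 10/HR from the form

# (step number, access path, access type, references per transaction)
STEPS = [
    (1, "ENTRY-PATIENT", "READ", 1),
    (2, "PATIENT-CHARGE", "READ", 10),
    (3, "CHARGE-ITEM", "READ", 10),
]


def period_references(steps, transactions_per_period):
    """Scale per-transaction references to references per period."""
    return [
        (no, name, access, tran_ref, tran_ref * transactions_per_period)
        for no, name, access, tran_ref in steps
    ]


rows = period_references(STEPS, PEAK_TRANSACTIONS_PER_HOUR)
for no, name, access, tran_ref, period_ref in rows:
    print(f"({no}) {name:<15} {access:<5} {tran_ref:>3} {period_ref:>4}")
```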
Composite Usage Map
Determine how the data structures are accessed for each transaction and process
Include
  programs
  standard queries
    programmed
    ad hoc
Composite Usage Map
[Diagram: composite usage map, first access paths added]
TREATMENT  4000
PATIENT    1000
PHYSICIAN  50
CHARGE     10,000
ITEM       500
LOCATION   100
Access frequencies shown on the paths: (25), (50), (50), (50)
NUMBER IS PER HOUR AT PEAK VOLUME
Composite Usage Map
[Diagram: composite usage map, further access paths added]
TREATMENT  4000
PATIENT    1000
PHYSICIAN  50
CHARGE     10,000
ITEM       500
LOCATION   100
Access frequencies shown on the paths: (75), (25), (30), (200), (20), (50), (50), (100)
Composite Usage Map
[Diagram: completed composite usage map]
TREATMENT  4000
PATIENT    1000
PHYSICIAN  50
CHARGE     10,000
ITEM       500
LOCATION   100
Access frequencies shown on the paths: (75), (25), (30), (25), (200), (20), (50), (50), (50), (50), (50), (100)
Summary
Given volume and usage knowledge we can consider different physical implementation strategies, including ...
  INDEXES
  DENORMALIZATION
  CLUSTERING
Indexes
Purpose: To speed up access to a particular row or a group of rows in a table.
Also used to enforce uniqueness
Eliminates the necessity of re-sorting the table each time we need to create a sequenced list
Indexes
[Diagram: index on name pointing to row numbers in the table]
Index             Table
Allen   3         1  Marvin ...
Brian   6         2  John ...
Carole  7         3  Allen ...
John    2         4  Sue ...
Karen   5         5  Karen ...
Marvin  1         6  Brian ...
Sharon  8         7  Carole ...
Sue     4         8  Sharon ...
Example
SELECT NAME, DEPT, RATING FROM EMP WHERE RATING = 10;
Indexing on RATING improves performance. Without an index, the DBMS must do a full table scan.
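The difference between a full table scan and an indexed lookup can be observed in a query plan. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for the DBMS; the lowercase EMP table, its sample rows, and the index name idx_emp_rating are assumptions made for illustration, and other systems report their plans differently.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(f"emp{i}", f"dept{i % 5}", i % 11) for i in range(1000)],
)

QUERY = "SELECT name, dept, rating FROM emp WHERE rating = 10"

plan_before = query_plan(conn, QUERY)   # no index yet: table is scanned
conn.execute("CREATE INDEX idx_emp_rating ON emp (rating)")
plan_after = query_plan(conn, QUERY)    # index on rating is now available

print("without index:", plan_before)
print("with index:   ", plan_after)
```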
Costs of an index?
Storage space
Maintenance
  Indexes must be changed for each add/delete or change in value of an indexed field.
One benchmark ... insert into table w/o indexes, 0.11 seconds, w/ 8 indexes, 0.94 seconds.
Access Indexes
Automatically created on primary key.
You must create other indexes as needed.
Note, creating a unique index on a foreign key turns the relationship into a 1 - 1 relationship rather than a 1 - m relationship.
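The effect of a unique index on a foreign key can be reproduced in any SQL database. The sketch below uses SQLite rather than Access, with hypothetical parent and child tables: before the unique index a parent may have many children, and afterwards a second child for the same parent is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (parent_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child ("
    " child_id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES parent(parent_id))"
)
conn.execute("INSERT INTO parent VALUES (1)")

# Without a unique index the relationship is 1 - m: two children, one parent.
conn.execute("INSERT INTO child VALUES (1, 1)")
conn.execute("INSERT INTO child VALUES (2, 1)")
children_before = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]

# Remove the second child, then make the foreign key unique.
conn.execute("DELETE FROM child WHERE child_id = 2")
conn.execute("CREATE UNIQUE INDEX idx_child_parent ON child (parent_id)")

try:
    conn.execute("INSERT INTO child VALUES (2, 1)")
    second_child_allowed = True
except sqlite3.IntegrityError:
    # The unique index now permits at most one child per parent (1 - 1).
    second_child_allowed = False

print("children before unique index:", children_before)
print("second child allowed after unique index:", second_child_allowed)
```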
Let’s consider Oracle indexes and performance ...
Oracle Indexes
[Chart: seconds to execute versus % of file read, with a BREAK-EVEN point marked]
INDEX ONLY:     SELECT COUNT(*) FROM EMP WHERE EMP_NO > 0
INDEX + TABLE:  SELECT EMP_NAME FROM EMP WHERE EMP_NO > 0

26,000 Rows, 7 Rows per Block (times in seconds)
% OF FILE READ  INDEX ONLY  INDEX + TABLE  FULL TABLE SCAN
8.5             0.66        12.03          35.70
15.5            1.04        16.21          35.70
25.2            1.54        25.45          35.70
50.7            2.80        33.89          35.70
100             5.72        87.23          35.70

26,000 Rows, 258 Rows per Block (times in seconds)
% OF FILE READ  INDEX ONLY  INDEX + TABLE  FULL TABLE SCAN
8.5             0.66        2.31           4.52
15.5            1.05        4.01           4.52
25.2            1.59        6.37           4.52
50.7            2.91        12.69          4.52
100             6.01        25.37          4.52
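One way to estimate the break-even point from the benchmark figures is to interpolate between the two measurements where the INDEX + TABLE time crosses the FULL TABLE SCAN time. The sketch below assumes the time grows linearly between measured points, which the slides do not claim, so the results (roughly 52% and 18% of the file) are only approximate.

```python
PERCENT_READ = [8.5, 15.5, 25.2, 50.7, 100.0]

# Seconds for the INDEX + TABLE query and the FULL TABLE SCAN, from the slides.
INDEX_PLUS_TABLE_7 = [12.03, 16.21, 25.45, 33.89, 87.23]   # 7 rows per block
FULL_SCAN_7 = 35.70
INDEX_PLUS_TABLE_258 = [2.31, 4.01, 6.37, 12.69, 25.37]    # 258 rows per block
FULL_SCAN_258 = 4.52


def break_even(percent, index_times, full_scan_time):
    """Linearly interpolate the % of file read at which both costs are equal."""
    points = list(zip(percent, index_times))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= full_scan_time <= y1:
            return x0 + (full_scan_time - y0) * (x1 - x0) / (y1 - y0)
    return None  # the two curves do not cross within the measured range


be_7 = break_even(PERCENT_READ, INDEX_PLUS_TABLE_7, FULL_SCAN_7)
be_258 = break_even(PERCENT_READ, INDEX_PLUS_TABLE_258, FULL_SCAN_258)
print(f"7 rows per block:   break-even near {be_7:.1f}% of file read")
print(f"258 rows per block: break-even near {be_258:.1f}% of file read")
```

Under this assumption, the more rows fit in a block, the smaller the fraction of the file at which the full table scan becomes the cheaper choice.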
Oracle Indexes
Rules of thumb
Use indexes generously for applications which are decision support/retrieval based.
Use indexes judiciously for transaction processing applications.
Places to use indexes
PRIMARY KEY
FOREIGN KEYS
Non-key attributes that are referred to in qualification, sorting, and grouping (WHERE, ORDER BY, GROUP BY)
Denormalization
Goal is to reduce the number of physical reads to the storage devices by reducing the number of joins.
Costs of Denormalization
Makes coding more complex
Often sacrifices flexibility
Will speed up retrieval but slow updates
Including children in the parent record
Multiple addresses in the personnel record
Absolute number of children for a parent is known (e.g., 2 addresses)
The number won’t change over time
The number is not very large
Clusters in Oracle
Clustering stores records from two tables into the same physical storage space
Only useful for EQUI-JOINS
Improves performance by 2-3 times
Storing most recent child data in the parent record
Multiple children, but children have an ordering (e.g., date of order)
For example, perhaps storing amount of last order.
Amount of last dividend paid to a particular account
Store running totals / Create extract tables
Store summary data from a child record
  Year to date sales
Create a summary table which contains aggregate values over some period (say, one month)
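A monthly summary table of this kind might be built as follows. The sketch uses SQLite through Python's sqlite3 module; the orders table, its columns, the sample rows, and the monthly_sales name are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER,"
    " order_date TEXT,"      # ISO format: YYYY-MM-DD
    " amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2024-01-05", 100.0),
        (2, 1, "2024-01-20", 50.0),
        (3, 1, "2024-02-03", 25.0),
        (4, 2, "2024-01-11", 10.0),
    ],
)

# Extract table: one row per customer per month, aggregated from the detail rows.
conn.execute(
    "CREATE TABLE monthly_sales AS"
    " SELECT customer_id,"
    "        substr(order_date, 1, 7) AS month,"
    "        SUM(amount) AS total_amount"
    " FROM orders"
    " GROUP BY customer_id, substr(order_date, 1, 7)"
)

summary = conn.execute(
    "SELECT customer_id, month, total_amount"
    " FROM monthly_sales ORDER BY customer_id, month"
).fetchall()
for row in summary:
    print(row)
```

Reports can then read monthly_sales without touching the detail rows, at the cost of refreshing the summary whenever orders change.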
Duplicating a key beyond an immediate child record
[Diagram: CLASS is the parent of PARTS, and PARTS is the parent of ORDERS]
CLASS:   CLASS_ID
PARTS:   PART_ID, CLASS_ID
ORDERS:  ORDER_ID, PART_ID, CLASS_ID   <- ADD THIS KEY
Consider SQL statement for previous example
Before adding the key (two joins):
SELECT O.PART_NO, O.ORDER_NO, C.CLASS, C.CLASS_DESC
FROM CLASS C, PARTS P, ORDERS O
WHERE O.PART_NO = P.PART_NO
AND P.CLASS = C.CLASS;

After adding the key (one join):
SELECT O.PART_NO, O.ORDER_NO, C.CLASS, C.CLASS_DESC
FROM CLASS C, ORDERS O
WHERE O.CLASS = C.CLASS;
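The two statements can be checked against each other on a small data set. The sketch below uses SQLite through Python's sqlite3 module; the sample rows are invented, the orders table is named in the plural because ORDER is a reserved word, and column references are qualified with table aliases so they are unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE class (class TEXT PRIMARY KEY, class_desc TEXT);
    CREATE TABLE parts (part_no INTEGER PRIMARY KEY,
                        class TEXT REFERENCES class(class));
    -- orders carries the duplicated CLASS key from its grandparent table
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY,
                         part_no INTEGER REFERENCES parts(part_no),
                         class TEXT REFERENCES class(class));
    INSERT INTO class VALUES ('A', 'Fasteners'), ('B', 'Bearings');
    INSERT INTO parts VALUES (10, 'A'), (20, 'B');
    INSERT INTO orders VALUES (1, 10, 'A'), (2, 20, 'B'), (3, 10, 'A');
    """
)

# Normalized access path: ORDERS -> PARTS -> CLASS (two joins).
two_join_rows = conn.execute(
    "SELECT o.part_no, o.order_no, c.class, c.class_desc"
    " FROM class c, parts p, orders o"
    " WHERE o.part_no = p.part_no AND p.class = c.class"
    " ORDER BY o.order_no"
).fetchall()

# Denormalized access path: ORDERS -> CLASS directly (one join).
one_join_rows = conn.execute(
    "SELECT o.part_no, o.order_no, c.class, c.class_desc"
    " FROM class c, orders o"
    " WHERE o.class = c.class"
    " ORDER BY o.order_no"
).fetchall()

print(two_join_rows == one_join_rows)
```

The results match only while the duplicated key in orders agrees with the class recorded for the part, which is the extra update burden that denormalization introduces.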
Record Partitioning
Breaking up a record into two parts
[Diagram: one record split into two]
A,B,C,D,E,F,G
  A,B,C,D
  E,F,G
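A record split of this kind can be sketched with plain dictionaries. The sketch assumes that field A is the key and that it is repeated in the second part so the two halves can be rejoined; the slide does not show this, and the function names are hypothetical.

```python
def partition_record(record, first_fields, second_fields, key):
    """Split one record into two parts; the key is copied into the second part."""
    first = {field: record[field] for field in first_fields}
    second = {key: record[key]}
    second.update({field: record[field] for field in second_fields})
    return first, second


def rejoin(first, second):
    """Rebuild the full record from its two parts (both share the key)."""
    return {**first, **second}


record = dict(zip("ABCDEFG", range(1, 8)))   # fields A..G with sample values
first, second = partition_record(record, "ABCD", "EFG", key="A")

print(first)
print(second)
print(rejoin(first, second) == record)
```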
Summary
Logical design gives you information about ‘how’ to build the system.
Good physical design takes into account the performance of the final design. To know how best to do this, you must understand how the system is being used!