06/20/22 Lecture 8 Slide 1 Lecture 8: Schema Refinement and Normal Forms; Physical Design and Tuning Schema Refinement Motivation Anomalies, Redundancy Decomposition: a good solution Keys and Functional Dependencies (FDs) BCNF and Redundancy Lossless Decompositions Dependency Preserving Decompositions, Projections Third Normal Form Physical Design Performance and the workload Choosing Indexes Identifying useful indexes, Too many indexes, How indexes are chosen More Schema Refinement • Denormalization, Vertical and Horizontal Decomposition Tuning the database and tuning queries


04/18/23 Lecture 8 Slide 1

Lecture 8: Schema Refinement and Normal Forms; Physical Design and Tuning

• Schema Refinement
  – Motivation
  – Anomalies, Redundancy
  – Decomposition: a good solution
  – Keys and Functional Dependencies (FDs)
  – BCNF and Redundancy
  – Lossless Decompositions
  – Dependency Preserving Decompositions, Projections
  – Third Normal Form
• Physical Design
  – Performance and the workload
  – Choosing Indexes
    • Identifying useful indexes, Too many indexes, How indexes are chosen
  – More Schema Refinement
    • Denormalization, Vertical and Horizontal Decomposition
• Tuning the database and tuning queries

04/18/23 Lecture 8 Slide 2

Learning Objectives

LO8.1: Identify update, insertion and deletion anomalies

LO8.2: Identify possible keys given an instance

LO8.3: Identify possible functional dependencies in a relation

LO8.4: Determine all keys in a schema

LO8.5: Decompose a schema into BCNF schemas

04/18/23 Lecture 8 Slide 3

Review

• We began the course with the life cycle of database applications

• First came Requirements Analysis from the customer

• We learned how to transform an RA into an ER diagram

• Then we transformed ER diagrams into relational schemas
  – and went on to implement the application by loading the data and writing SQL statements

• But different ER diagrams can lead to different relational schemas. This week we study which schemas are best.

04/18/23 Lecture 8 Slide 4

What is Schema Refinement?

• Schema Refinement is the study of what should go where in a DBMS, or, which schemas are best to describe an application.
• For example, consider this schema:

EmpDept
EID   Name    DeptID   DeptName
A01   Ali     12       Wing
A12   Eric    10       Tail
A13   Eric    12       Wing
A03   Tyler   12       Wing

• Versus this one:

Emp
EID   Name    DeptID
A01   Ali     12
A12   Eric    10
A13   Eric    12
A03   Tyler   12

Dept
DeptID   DeptName
12       Wing
10       Tail

• Which schema do you think is best? Why?

04/18/23 Lecture 8 Slide 5

What’s wrong?*

• The first problem students usually identify with the EmpDept schema is that it combines two different ideas: employee information and department information. But what is wrong with this?

1. If we separated the two concepts we could save space.

2. Combining the two ideas leads to some bad anomalies.

• These two problems occur because DeptID determines DeptName, but DeptID is not a key. Let’s look into the anomalies further.

04/18/23 Lecture 8 Slide 6

Anomalies, Redundancy*

• What anomalies are associated with EmpDept?
• Update Anomalies:
• Insertion Anomalies:
• Deletion Anomalies:

EmpDept
EID   Name    DeptID   DeptName
A01   Ali     12       Wing
A12   Eric    10       Tail
A13   Eric    12       Wing
A03   Tyler   12       Wing
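One way to see the three kinds of anomaly is to replay them against the EmpDept instance. The following is a minimal sketch using Python's built-in sqlite3 module; the new department name 'Fuselage' and the department (14, 'Engine') are hypothetical values chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmpDept ("
    " EID TEXT PRIMARY KEY NOT NULL, Name TEXT, DeptID INTEGER, DeptName TEXT)"
)
conn.executemany(
    "INSERT INTO EmpDept VALUES (?, ?, ?, ?)",
    [("A01", "Ali", 12, "Wing"), ("A12", "Eric", 10, "Tail"),
     ("A13", "Eric", 12, "Wing"), ("A03", "Tyler", 12, "Wing")],
)

# Update anomaly: renaming department 12 in only one of its rows
# leaves the table with two different names for the same DeptID.
conn.execute("UPDATE EmpDept SET DeptName = 'Fuselage' WHERE EID = 'A01'")
names_for_12 = {
    row[0] for row in conn.execute("SELECT DeptName FROM EmpDept WHERE DeptID = 12")
}

# Deletion anomaly: deleting the only employee of department 10
# also deletes the fact that department 10 is named Tail.
conn.execute("DELETE FROM EmpDept WHERE EID = 'A12'")
dept_10_rows = conn.execute(
    "SELECT COUNT(*) FROM EmpDept WHERE DeptID = 10"
).fetchone()[0]

# Insertion anomaly: a department without employees cannot be stored,
# because the key EID would have to be NULL.
try:
    conn.execute("INSERT INTO EmpDept VALUES (NULL, NULL, 14, 'Engine')")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

print(names_for_12, dept_10_rows, insert_rejected)
```

All three problems disappear once department names live in their own Dept table keyed by DeptID.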

04/18/23 Lecture 8 Slide 7

LO8.1:Practice Anomalies, Redundancies*

• Identify anomalies associated with this schema. Include update, insertion and deletion anomalies.

EnrollStud(StudID, ClassID, Grade, ProfID, StudName)

• Why do these anomalies occur?

04/18/23 Lecture 8 Slide 8

Decomposition: A good solution

• The intergalactic standard solution to the redundancy problem is to decompose redundant schemas, e.g., EmpDept becomes

Emp
EID   Name    DeptID
A01   Ali     12
A12   Eric    10
A13   Eric    12
A03   Tyler   12

Dept
DeptID   DeptName
12       Wing
10       Tail

• The secret to understanding when and how to decompose schemas is Functional Dependencies, a generalization of keys.
• When we say "X determines Y" we are stating a functional dependency.

04/18/23 Lecture 8 Slide 9

Review Keys

• Note that EID being a key* of EmpDept means that the values of EID are unique, and EID is minimal.

• Remember: you cannot determine keys from an instance, only from “natural” information or from a domain expert.

• Let’s practice keys by identifying possible keys in an instance.

*sometimes called a candidate key

EmpDept
EID   Name    DeptID   DeptName
A01   Ali     12       Wing
A12   Eric    10       Tail
A13   Eric    12       Wing
A03   Tyler   12       Wing

04/18/23 Lecture 8 Slide 10

LO8.2:Identify Possible Keys*

• Identify all possible Keys based on this instance:

Time      Flight   Plane   Origin   Destination
9:57AM    157      abc     SEA      PDX
10:42AM   233      def     PDX      SEA
11:44AM   155      des     ORD      ATL
12:44PM   244      xdy     ATL      PDX
1:43PM    074      xyz     SEA      ATL
2:44PM    233      def     PDX      ATL
3:55PM    455      eff     MSP      SEA
5:44PM    120      ikk     MSP      PDX
7:55PM    233      abf     CHI      SEA
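A hand search for possible keys can be checked mechanically. The sketch below is a brute-force search, not an efficient algorithm: it tries every attribute combination, smallest first, and keeps those whose values are unique in this instance and that do not contain a smaller key. The function names are illustrative.

```python
from itertools import combinations

ATTRS = ("Time", "Flight", "Plane", "Origin", "Destination")
ROWS = [
    ("9:57AM", "157", "abc", "SEA", "PDX"),
    ("10:42AM", "233", "def", "PDX", "SEA"),
    ("11:44AM", "155", "des", "ORD", "ATL"),
    ("12:44PM", "244", "xdy", "ATL", "PDX"),
    ("1:43PM", "074", "xyz", "SEA", "ATL"),
    ("2:44PM", "233", "def", "PDX", "ATL"),
    ("3:55PM", "455", "eff", "MSP", "SEA"),
    ("5:44PM", "120", "ikk", "MSP", "PDX"),
    ("7:55PM", "233", "abf", "CHI", "SEA"),
]


def is_unique(cols):
    """True if no two rows agree on all of the given column positions."""
    seen = {tuple(row[i] for i in cols) for row in ROWS}
    return len(seen) == len(ROWS)


def possible_keys():
    """Minimal attribute sets whose values are unique in this instance."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for cols in combinations(range(len(ATTRS)), size):
            # A superset of a smaller key is not minimal, so skip it.
            if any(set(k) <= set(cols) for k in keys):
                continue
            if is_unique(cols):
                keys.append(cols)
    return [tuple(ATTRS[i] for i in k) for k in keys]


print(possible_keys())
```

The result only says which keys are possible: a later row could still break any of them, which is why keys cannot be determined from an instance.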

04/18/23 Lecture 8 Slide 11

Functional Dependencies

• A key like EID has another property: If two rows have the same EID, then they have the same value of every other attribute. We say EID functionally determines all other attributes and write this Functional Dependency (FD):

EID → Name, DeptID, DeptName

• Is Name → DeptID true?
  – No, because rows 2 and 3 have the same Name but not the same DeptID.

EmpDept
EID   Name    DeptID   DeptName
A01   Ali     12       Wing
A12   Eric    10       Tail
A13   Eric    12       Wing
A03   Tyler   12       Wing
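The "two rows agree on the left side but differ on the right side" test is easy to automate. This is a minimal sketch with an illustrative helper name, violates, that returns the first offending pair of rows, or None when the instance contains no counterexample.

```python
ATTRS = ("EID", "Name", "DeptID", "DeptName")
EMP_DEPT = [
    ("A01", "Ali", 12, "Wing"),
    ("A12", "Eric", 10, "Tail"),
    ("A13", "Eric", 12, "Wing"),
    ("A03", "Tyler", 12, "Wing"),
]


def violates(rows, lhs, rhs):
    """Return a pair of rows that agree on lhs but differ on rhs, or None."""
    left_idx = [ATTRS.index(a) for a in lhs]
    right_idx = [ATTRS.index(a) for a in rhs]
    first_seen = {}
    for row in rows:
        left = tuple(row[i] for i in left_idx)
        right = tuple(row[i] for i in right_idx)
        # setdefault stores the first row seen for this left-hand value
        # and hands that stored entry back on every later visit.
        earlier_row, earlier_right = first_seen.setdefault(left, (row, right))
        if earlier_right != right:
            return earlier_row, row
    return None


print(violates(EMP_DEPT, ["Name"], ["DeptID"]))      # rows 2 and 3
print(violates(EMP_DEPT, ["DeptID"], ["DeptName"]))  # None
```

A result of None means only that the FD is possible on this instance; a counterexample, on the other hand, proves the FD does not hold.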

04/18/23 Lecture 8 Slide 12

Functional Dependencies, ctd.

• Do you see any more FDs in EmpDept?
  – Yes, the FD DeptID → DeptName
• DEFINITION: If A and B are sets of attributes in a relation, we say that A (functionally) determines B, or A → B is a Functional Dependency (FD), if whenever two rows agree on A, they agree on B. In other words, the value of a row on A functionally determines its value on B.
• There are two special kinds of FDs:
  – Key FDs, X → A where X contains a key
  – Trivial FDs, such as Name → Name, or Name, DeptID → DeptID

EmpDept
EID   Name    DeptID   DeptName
A01   Ali     12       Wing
A12   Eric    10       Tail
A13   Eric    12       Wing
A03   Tyler   12       Wing

04/18/23 Lecture 8 Slide 13

Identify (natural) FDs*

• What are the (natural) FDs in these relations? Identify the key FDs but ignore trivial FDs.

Customer(CustID, Address, City, Zip, State)

EnrollStud(StudID, ClassID, Grade, ProfID, StudName, ProfName)

04/18/23 Lecture 8 Slide 14

What are FDs?

• An FD is a generalization of the concept of key.
• FDs, like keys and foreign keys, are a kind of integrity constraint (IC).
• Like other ICs, FDs are part of a relation’s schema.
• For example, a schema might be:

Assigned(EmpID Int,
         JobID Int,
         EmpName varchar(20),
         percent real,
         EmpID references…, JobID references…,
         PRIMARY KEY (EmpID, JobID))

FDs: EmpID → EmpName

04/18/23 Lecture 8 Slide 15

How to determine FDs

• So far we have dealt with “natural” FDs. Sometimes it’s not clear what FDs apply in a relation, e.g., zip codes vs cities, or Supplier(Name, Address, Crating, Discount), where it is unclear what the FDs are.
• There are two ways to determine FDs:
  – Infer them as “natural” FDs from your experience
  – You may be given them as part of the schema, by the instructor or by the customer.
• As with keys, you cannot determine FDs from an instance!
  – But you can tell if something is not an FD

04/18/23 Lecture 8 Slide 16

LO8.3:Identify Possible FDs*

• Identify two possible non-key FDs based on this instance (identical to slide 10). Remember the possible keys for this instance are {Time}, {Plane, Dest}, {Origin, Dest}

Time      Flight   Plane   Origin   Destination
9:57AM    157      abc     SEA      PDX
10:42AM   233      def     PDX      SEA
11:44AM   155      des     ORD      ATL
12:44PM   244      xdy     ATL      PDX
1:43PM    074      xyz     SEA      ATL
2:44PM    233      def     PDX      ATL
3:55PM    455      eff     MSP      SEA
5:44PM    120      ikk     MSP      PDX
7:55PM    233      abf     CHI      SEA

04/18/23 Lecture 8 Slide 17

Reasoning about FDs

EmpDept(EID, Name, DeptID, DeptName)

• Two natural FDs are EID → DeptID and DeptID → DeptName
• These two FDs imply the FD EID → DeptName
  – Because if two tuples agree on EID, then by the first FD they agree on DeptID, then by the second FD they agree on DeptName.
• The set of FDs implied by a given set F of FDs is called the closure of F and is denoted F+

04/18/23 Lecture 8 Slide 18

Armstrong’s Axioms

• The closure of F can be computed using these axioms:
  Reflexivity: If X ⊇ Y, then X → Y
  Augmentation: If X → Y, then XZ → YZ for any Z
  Transitivity: If X → Y and Y → Z, then X → Z

• Armstrong’s axioms are sound (they generate only FDs in F+ when applied to FDs in F) and complete (repeated application of these axioms will generate all FDs in F+).

04/18/23 Lecture 8 Slide 19

Determining Keys

• To determine whether X is a key of a relation R, use this algorithm, which computes the attribute closure of X:

AttClos = X;   // Note: X is a set of attributes
Repeat until there is no change:
    If there is an FD U → V with U ⊆ AttClos, then set AttClos = AttClos ∪ V

AttClos = R if and only if X contains a key (X is a superkey). X is itself a key if, in addition, no proper subset of X has closure R.
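The attribute closure algorithm translates almost line for line into code. This is a minimal sketch in which FDs are represented as (left side, right side) pairs of Python sets; that representation and the name attribute_closure are choices made here for illustration. The example uses the EmpDept FDs EID → Name, DeptID and DeptID → DeptName.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:  # "repeat until there is no change"
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its left side is already inside the closure
            # and it still has something new to contribute.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


R = {"EID", "Name", "DeptID", "DeptName"}
FDS = [({"EID"}, {"Name", "DeptID"}), ({"DeptID"}, {"DeptName"})]

print(attribute_closure({"EID"}, FDS) == R)       # EID determines everything
print(sorted(attribute_closure({"DeptID"}, FDS)))  # DeptID does not
```

Because {EID} is a single attribute whose closure is all of R, it is minimal and therefore a key; the closure of {DeptID} stops at DeptID and DeptName.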

04/18/23 Lecture 8 Slide 20

LO8.4:Determining the keys of R*

• Given the schema R(A,B,C,D,E) with FDs BC → A, DE → C.
• What are all the keys of this schema?
• Hint: any key must include A, BC or DE. Why?
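An answer to this kind of exercise can be checked by brute force: compute the attribute closure of every attribute subset, smallest first, and keep the subsets whose closure is all of R and that contain no smaller key. This sketch is exponential in the number of attributes, which is acceptable for classroom-sized schemas; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(relation, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    attrs = sorted(relation)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(relation):
                keys.append(candidate)
    return keys


R = set("ABCDE")
FDS = [(set("BC"), set("A")), (set("DE"), set("C"))]  # BC -> A, DE -> C

print(candidate_keys(R, FDS))
```

A useful shortcut when working by hand: an attribute that appears on no right-hand side (here B, D and E) can never be derived, so it must belong to every key.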

04/18/23 Lecture 8 Slide 21

Redundancy and FDs

• Consider the FDs in these examples:

EmpDept(EID, Name, DeptID, DeptName)

Assigned(EmpID, JobID, EmpName, percent)

EnrollStud(StudID, ClassID, Grade)

• Remember that every non-key FD is associated with some redundancy, or anomalies, and vice-versa.

• Our game plan is to use non-key FDs to decompose any relation into a form that has no redundancy, a so-called normal form.

04/18/23 Lecture 8 Slide 22

Boyce-Codd Normal Form (BCNF)*

• A relation is said to be in Boyce-Codd Normal Form if all its FDs are either trivial FDs or key FDs.

• Which of these relations is in BCNF?

EmpDept(EID, Name, DeptName)

Assigned(EmpID, JobID, EmpName, percent)

EnrollStud(StudID, ClassID, grade)

• Each BCNF relation with a single key looks like this:

Key | Nonkey Attr1 | Nonkey Attr2 | … | Nonkey Attrk

04/18/23 Lecture 8 Slide 23

BCNF and Redundancy

• Theorem: BCNF relations have no redundancy.

Proof: A relation has redundancy if there is an FD between two sets of attributes, say DeptID → DeptName, and there can be repeated entries of data for those attributes.

For example, consider (12, Wing) in this example:

DeptID   DeptName   (Other attributes)
12       Wing
10       Tail
12       Wing

But if the relation is BCNF, then the FD must be a key FD, and DeptID must be a key. Thus any pair such as (12, Wing) can appear only once.

04/18/23 Lecture 8 Slide 24

Decomposition into BCNF

• Here is an algorithm for decomposing an arbitrary relation R into a collection of BCNF relations:

1. If R is not in BCNF and X → A is a non-key FD, then decompose R into R − A and XA.
2. If R − A and/or XA is not in BCNF, recursively apply step 1.
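The two steps can be sketched as a short recursive function. This is one possible reading of the algorithm, with two assumptions stated up front: violations are found by brute force over attribute subsets (exponential, fine for small schemas), and the right side A is taken to be everything X determines inside the relation rather than a single attribute. The names find_violation and bcnf_decompose are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, A) for a non-trivial, non-key FD X -> A inside rel, or None."""
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            determined = closure(x, fds) & rel
            # X determines something new, yet not the whole relation.
            if determined != x and determined != rel:
                return x, determined - x
    return None


def bcnf_decompose(rel, fds):
    """Step 1 and step 2: split on a violating FD, then recurse on both parts."""
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, a = violation
    return bcnf_decompose(rel - a, fds) + bcnf_decompose(x | a, fds)


R = {"EID", "Name", "DeptID", "DeptName"}
FDS = [({"EID"}, {"Name", "DeptID"}), ({"DeptID"}, {"DeptName"})]

print(bcnf_decompose(R, FDS))
```

On EmpDept the function splits on DeptID → DeptName and returns the Emp and Dept schemas used earlier in the lecture. Different choices of violating FD can lead to different, equally valid BCNF decompositions.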

04/18/23 Lecture 8 Slide 25

Decomposing to BCNF*

• Given the schema

EnrollStud(StudID, ClassID, Grade, ProfID, StudName)

including its natural FDs, decompose it into BCNF relations.

04/18/23 Lecture 8 Slide 26

LO8.5: Decomposing into BCNF*

• Given the schema MedsLabelDrug(Prescr#, CustID, Label, Drug), with FDs Prescr# → Label, Label → Drug, decompose it into BCNF relations.

04/18/23 Lecture 8 Slide 27

Where are we?

• We’ve accomplished a lot!
  – We began with a relational schema
  – We identified (redundancy, anomaly) problems with it
  – We learned how to use FDs to eliminate those problems with decompositions into BCNF.
  – Along the way, we learned a powerful tool: how to determine keys from FDs.
• There are two steps left
  – Showing that the BCNF decompositions do not lose information.
  – Discovering that they may lose FDs, and how to deal with that.

04/18/23 Lecture 8 Slide 28

Lossless Decompositions

• Some decompositions lose information. Suppose we got carried away and further decomposed

Enroll(StudID,ClassID,Grade) into

StudGrade(StudID, Grade) and ClassGrade(ClassID, Grade)

• Here a row (123,B) in StudGrade means that student 123 got a B in some course, and (386,A) in ClassGrade means that some student got an A in course 386.

• But now we have no way of knowing which student got which grade in which class.

• This decomposition is lossy. It contains less information than the original schema. We want to generate only lossless decompositions when we design our databases.

04/18/23 Lecture 8 Slide 29

Lossless Decompositions

Definition: A decomposition of a schema R with FDs F, into attribute sets X and Y, is lossless with respect to F if for every instance r of R that satisfies F

r = π_X(r) ⋈ π_Y(r)

where π_X(r) is the projection of r onto the attributes in X. In other words, we can recover r from the natural join of the decomposed versions of r.

04/18/23 Lecture 8 Slide 30

Example of a Lossless Decomposition

R = EmpDept, with instance r:

EID   Name    DeptID   DeptName
A01   Ali     12       Wing
A12   Eric    10       Tail
A13   Eric    12       Wing
A03   Tyler   12       Wing

X = EID, Name, DeptID, with π_X(r):

EID   Name    DeptID
A01   Ali     12
A12   Eric    10
A13   Eric    12
A03   Tyler   12

Y = DeptID, DeptName, with π_Y(r):

DeptID   DeptName
12       Wing
10       Tail

r = π_X(r) ⋈ π_Y(r):

EID   Name    DeptID   DeptName
A01   Ali     12       Wing
A12   Eric    10       Tail
A13   Eric    12       Wing
A03   Tyler   12       Wing

04/18/23 Lecture 8 Slide 31

Example of a Lossy Decomposition

R = Enroll, with instance r:

StudID   ClassID   Grade
123      CS386     A
456      CS410     A

X = StudID, Grade, with π_X(r):

StudID   Grade
123      A
456      A

Y = ClassID, Grade, with π_Y(r):

ClassID   Grade
CS386     A
CS410     A

π_X(r) ⋈ π_Y(r) ≠ r:

StudID   ClassID   Grade
123      CS386     A
123      CS410     A
456      CS410     A
456      CS386     A

Note that the join has extra rows. This always happens in lossy decompositions.
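Both examples can be replayed in a few lines by projecting an instance onto X and Y and joining the projections back together. This is a minimal sketch with rows represented as Python dicts; it tests one instance only, whereas the definition quantifies over every instance that satisfies F. The helper names are illustrative.

```python
def project(rows, attrs):
    """Projection: keep only attrs and drop duplicate rows."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two sets of rows; each row is a frozenset of (attr, value)."""
    result = set()
    for l_row in left:
        for r_row in right:
            merged = dict(l_row)
            # Rows join when they agree on every attribute they share.
            if all(merged.get(a, v) == v for a, v in r_row):
                merged.update(r_row)
                result.add(frozenset(merged.items()))
    return result


def is_lossless_on(rows, x, y):
    """True if joining the two projections gives back exactly the original rows."""
    original = {frozenset(row.items()) for row in rows}
    return natural_join(project(rows, x), project(rows, y)) == original


EMP_DEPT = [
    {"EID": "A01", "Name": "Ali", "DeptID": 12, "DeptName": "Wing"},
    {"EID": "A12", "Name": "Eric", "DeptID": 10, "DeptName": "Tail"},
    {"EID": "A13", "Name": "Eric", "DeptID": 12, "DeptName": "Wing"},
    {"EID": "A03", "Name": "Tyler", "DeptID": 12, "DeptName": "Wing"},
]
ENROLL = [
    {"StudID": 123, "ClassID": "CS386", "Grade": "A"},
    {"StudID": 456, "ClassID": "CS410", "Grade": "A"},
]

print(is_lossless_on(EMP_DEPT, ["EID", "Name", "DeptID"], ["DeptID", "DeptName"]))
print(is_lossless_on(ENROLL, ["StudID", "Grade"], ["ClassID", "Grade"]))
lossy_join = natural_join(project(ENROLL, ["StudID", "Grade"]),
                          project(ENROLL, ["ClassID", "Grade"]))
print(len(lossy_join))  # more rows than the 2 we started with
```

The Enroll join produces every student and class combination that shares a grade, which is exactly the set of spurious rows shown in the table.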

04/18/23 Lecture 8 Slide 32

Producing only Lossless Decompositions

• In our design of database schemas we certainly want to produce only lossless decompositions. Fortunately this is easy to guarantee.

Theorem: The decomposition of R with respect to FDs F into attribute sets R1 and R2 is lossless if and only if R1 ∩ R2 contains a key for either R1 or R2.

Proof: Page 620 in the text.

Corollary: The BCNF decomposition algorithm produces only lossless decompositions.

Proof: In this case F includes the FD X → A and the decomposition is into R1 = R − A and R2 = XA. Then R1 ∩ R2 = X is a key for XA.
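The theorem gives a test that needs only the FDs, not an instance: compute the attribute closure of R1 ∩ R2 and see whether it covers R1 or R2. A minimal sketch follows; the FD StudID, ClassID → Grade used for Enroll is assumed here as its natural FD, and the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """The shared attributes must determine all of r1 or all of r2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


EMP_FDS = [({"EID"}, {"Name", "DeptID"}), ({"DeptID"}, {"DeptName"})]
ENROLL_FDS = [({"StudID", "ClassID"}, {"Grade"})]  # assumed natural FD

# Emp / Dept split: the shared attribute DeptID is a key of Dept.
print(is_lossless({"EID", "Name", "DeptID"}, {"DeptID", "DeptName"}, EMP_FDS))
# StudGrade / ClassGrade split: Grade alone determines nothing.
print(is_lossless({"StudID", "Grade"}, {"ClassID", "Grade"}, ENROLL_FDS))
```

The first call mirrors the corollary: splitting on DeptID → DeptName leaves X = DeptID as the intersection, and X is a key of XA.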

04/18/23 Lecture 8 Slide 33

Where are we?

• In CS 3/586 we have learned how to transform a Requirements Analysis into an ER Diagram into a Relational Schema, and to transform that losslessly into a BCNF schema.

• We recall from a previous picture that BCNF tables are particularly simple, so this looks like a perfect solution to a very general problem.

• But real schemas are not always BCNF. There is one more complexity to deal with.

04/18/23 Lecture 8 Slide 34

Dependency Preserving Decompositions

• Decompositions should preserve FDs.
• FDs are business requirements that must be enforced.
• Consider an example:
  – Emp(Addr, City, State, Zip) with FDs ACS → Z, Z → S
  – Keys are ACS and ACZ. Consider the BCNF decomposition:
      (Address, City, Zip)   (Zip, State)
  – This is BCNF but it does not preserve ACS → Z
  – Consider the values
      (7315 SW84, Portland, 97223), (97223, OR),
      (7315 SW84, Portland, 00000), (00000, OR)
  – Each table looks fine on its own, but the join contains two rows that agree on Address, City and State and differ on Zip, so ACS → Z is violated.
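The lost dependency becomes visible once the two decomposed tables are joined again. A minimal sketch using the values from the slide; the variable names are illustrative.

```python
# The two decomposed tables, holding exactly the values from the example.
addr_city_zip = [
    ("7315 SW84", "Portland", "97223"),
    ("7315 SW84", "Portland", "00000"),
]
zip_state = [("97223", "OR"), ("00000", "OR")]

# Neither table can detect a problem by itself: the (Address, City, Zip)
# rows are distinct, and Zip -> State holds in zip_state.
joined = [
    (addr, city, state, zip_code)
    for (addr, city, zip_code) in addr_city_zip
    for (z, state) in zip_state
    if z == zip_code
]

# Check ACS -> Z on the joined rows: collect the zips seen for each
# (Address, City, State) combination.
zips_per_acs = {}
for addr, city, state, zip_code in joined:
    zips_per_acs.setdefault((addr, city, state), set()).add(zip_code)

violations = {acs: zips for acs, zips in zips_per_acs.items() if len(zips) > 1}
print(violations)
```

Enforcing ACS → Z in the decomposed schema therefore requires a join on every update, which is the practical cost of a decomposition that does not preserve dependencies.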

04/18/23 Lecture 8 Slide 35

Third Normal Form

• Some schemas do not have a lossless, dependency preserving decomposition into BCNF schemas.
• Because of this dilemma, researchers created another normal form called Third Normal Form (3NF), with the property that every schema has a lossless, dependency preserving decomposition into 3NF schemas.
• A schema R with FDs F is in Third Normal Form if for every X → A in F, one of these is true:
  – X → A is a trivial FD (i.e., X contains A)
  – X → A is a key FD (i.e., X contains a key)
  – A is a part of some key for R
• The first two conditions are the definition of BCNF, so every BCNF schema is also in 3NF.

04/18/23 Lecture 8 Slide 36

Conclusion

• Almost all schemas in real life can be decomposed into BCNF schemas that preserve all FDs. In this case, life is wonderful.
• But every once in a while we get a schema like
  Emp(Addr, City, State, Zip) with FDs ACS → Z, Z → S
• Recall that its keys are ACS and ACZ. There is no decomposition into BCNF that preserves FDs!
• On the other hand, this schema is 3NF. Check it!
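The "check it" can be done mechanically by applying the three 3NF conditions to each FD in F. This is a minimal sketch: keys are found by brute force, and the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(relation, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    attrs = sorted(relation)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue
            if closure(candidate, fds) == set(relation):
                keys.append(candidate)
    return keys


def is_bcnf(relation, fds):
    """Every FD in fds is trivial or has a left side that contains a key."""
    return all(rhs <= lhs or closure(lhs, fds) >= relation for lhs, rhs in fds)


def is_3nf(relation, fds):
    """Like BCNF, but an FD may also pass if its new attributes are part of a key."""
    prime = set().union(*candidate_keys(relation, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) >= relation:
            continue  # key FD
        if not (rhs - lhs) <= prime:
            return False  # determines an attribute that is in no key
    return True


R = {"Addr", "City", "State", "Zip"}
FDS = [({"Addr", "City", "State"}, {"Zip"}), ({"Zip"}, {"State"})]

print(candidate_keys(R, FDS))
print(is_bcnf(R, FDS), is_3nf(R, FDS))
```

Z → S is the FD that breaks BCNF, since Zip is not a key; it passes the 3NF test only because State is part of the key ACS.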

04/18/23 Lecture 8 Slide 37

Conclusion, ctd.

• So in the rare case that we don’t have an ideal decomposition (lossless, dependency preserving, into BCNF), rest assured that we can decompose into 3NF instead of BCNF and get both losslessness and dependency preservation.

• The proof of this assertion is in section 19.6.2.

04/18/23 Lecture 8 Slide 38

Physical Database Design

• Database development involves three steps:
  1. ER design
  2. Schema refinement (normalization) and view definition
     – This defines the conceptual and external schemas
  3. Physical Design
     – Choose indexes
     – More schema refinement
       • Consider denormalizing
       • Vertical and horizontal decomposition
     – Tuning the database and tuning queries
     – Deciding how the data will be stored on disks (omitted)

04/18/23 Lecture 8 Slide 39

Performance and the Workload

• Note that ER design and normalization are logical concepts, while physical design is driven by performance needs.

• First the user tells you what information (logical) should be in the database, then s/he tells you how efficiently the database should perform (physical).

• We'll start the physical design process by learning how to choose indexes for a workload.

• We want to know: What Indexes might improve performance? What algorithms would they enable? What indexes are not useful together?

04/18/23 Lecture 8 Slide 40

Example

• A B+ tree index on amount enables an index retrieval of tuples satisfying I.amount > 1000.
  – But if there are many such tuples (the index is not selective) it may need to be clustered.
• An index on party enables an index retrieval of tuples satisfying C.party='IND'.
  – Again, selectivity matters.
• An index on C.commid or I.commid
  – enables an Index Nested Loop Join, but it might not be efficient if there are many tuples in the outer table.
  – speeds up a sort-merge join if one or both indexes are clustered.
• Given an index on C.commid, an index on C.party is not useful, and similarly for indexes on I.commid and I.amount.

SELECT C.commname, I.donorname
FROM comm C JOIN indiv I USING (commid)
WHERE I.amount>1000 AND C.party='IND';
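To experiment with these indexes, the query can be run against a toy database and its plan inspected. The sketch below uses SQLite through Python's sqlite3 module as a stand-in DBMS; the column types, index names and sample rows are assumptions made here, and SQLite's plans will not match those of the systems discussed in class (in particular, it has no clustered secondary indexes).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comm  (commid INTEGER PRIMARY KEY, commname TEXT, party TEXT);
    CREATE TABLE indiv (donorname TEXT, amount REAL,
                        commid INTEGER REFERENCES comm(commid));
    CREATE INDEX indiv_amount ON indiv(amount);
    CREATE INDEX comm_party   ON comm(party);
    CREATE INDEX indiv_commid ON indiv(commid);
""")
# Hypothetical sample data.
conn.executemany("INSERT INTO comm VALUES (?, ?, ?)",
                 [(1, "Alpha", "IND"), (2, "Beta", "DEM")])
conn.executemany("INSERT INTO indiv VALUES (?, ?, ?)",
                 [("Ann", 1500.0, 1), ("Bob", 500.0, 1), ("Cy", 2000.0, 2)])

QUERY = """
    SELECT C.commname, I.donorname
    FROM comm C JOIN indiv I USING (commid)
    WHERE I.amount > 1000 AND C.party = 'IND'
"""
rows = conn.execute(QUERY).fetchall()
print(rows)

# Each plan row ends with a human-readable description of one step,
# including which index (if any) the optimizer chose.
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(step[-1])
```

Dropping one index at a time and rerunning the plan is a quick way to see which indexes the optimizer actually uses and which are redundant together.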

04/18/23 Lecture 8 Slide 41

Too many indexes

• Why not declare all useful indexes?
  1. The optimizer may not be able to support the plans you have in mind
     – Get to know your optimizer: use EXPLAIN
  2. Indexes take up space
     – Though nowadays this is not a big problem
  3. Indexes slow updates
  4. Some indexes are not useful together
  5. The optimizer will be slower because it has more choices
  6. Indexes take time to create

04/18/23 Lecture 8 Slide 42

Choosing indexes in the real world

• As illustrated on the previous two pages, choosing indexes is an extremely complex task.

• The big 3 commercial DBMSs provide utilities to do the work for you
  – Microsoft: AutoAdmin
  – DB2: Autonomic Computing
  – Oracle: Automatic Database Diagnostic Monitor

04/18/23 Lecture 8 Slide 43

Automated Index Selection

• An algorithm for choosing indexes:
  – Input: schema, workload, performance requirements
  – Output: An index configuration whose cost (to execute the workload) is minimal.
  – Complexity: For a single table with 10 attributes, there are 30,240 different 5-attribute indexes.
• How do we choose among all those possibilities?
  – Consider only single- or two-attribute indexes.
  – Consider indexes only on relevant attributes
  – Still need to prune search space intelligently

• Computing the cost of a workload is very expensive – why?
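The 30,240 figure is the number of ordered selections of 5 attributes out of 10, because the order of attributes in a composite index matters. A short check, assuming Python 3.8 or later for math.perm:

```python
import math

attributes = 10
index_width = 5

# Ordered selections: 10 * 9 * 8 * 7 * 6
five_attribute_indexes = math.perm(attributes, index_width)
print(five_attribute_indexes)

# Restricting candidates to one- or two-attribute indexes shrinks the
# search space dramatically: 10 single-attribute plus 10 * 9 two-attribute.
small_candidates = math.perm(attributes, 1) + math.perm(attributes, 2)
print(small_candidates)
```

Even the reduced space is per table, and an index configuration is a subset of candidates across all tables, which is why pruning is still needed.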

04/18/23 Lecture 8 Slide 44

More Schema Refinement

• We have studied one kind of schema refinement, namely normalizing a schema by decomposing it into 3NF or BCNF schemas. This is part of logical design.

• Physical design, driven by performance goals, includes other types of schema refinement, which we will study now. These include de-normalization (!), vertical decomposition and horizontal decomposition.

04/18/23 Lecture 8 Slide 45

De-Normalization

• Recall the relation
  – CustState(CustID, Address, City, Zip, State)
• Here is its BCNF decomposition/Normalization
  – Cust(CustID, Address, City, Zip)   State(Zip, State)
• Suppose we have done the normalization and the query

SELECT C.CustID, C.Address, C.City, C.Zip, S.State
FROM Cust C, State S
WHERE C.Zip = S.Zip;

is a frequent and important query in the company.

04/18/23 Lecture 8 Slide 46

De-Normalization, ctd.

• The join query will be expensive, even if we declare indexes (which will be costly too).
• A possible solution is to denormalize the tables back to CustState.
  – Then the previous query will run much more quickly
• What are the disadvantages of denormalization?
  – Space wasted
    • But space is cheap nowadays
  – Anomalies when data changes
    • But zip codes and states are unlikely to change
• In real shops, denormalization is done to improve performance, even when data is likely to change.

04/18/23 Lecture 8 Slide 47

Vertical Decomposition

• Consider the BCNF relation

Emp(EID, Address, City, State, Wage, DeptID)

• Suppose that the HR department issues queries about EID, Address, City and State and the rest of the company issues queries about EID, Wage and DeptID.

• What is the advantage of storing the Emp information in these two relations?

EmpHR(EID, Address, City, State)

EmpComp(EID, Wage, DeptID)

  – All the queries will run faster because they process smaller tables.

• For obvious reasons this is called a vertical decomposition.

04/18/23 Lecture 8 Slide 48

Horizontal Decomposition

• Consider again the relation

Emp(EID, Address, City, State, Wage, DeptID)

• Now suppose that most Emp queries are from the Washington or Oregon branches of the company, who issue queries about Washington or Oregon employees, respectively.

• Surely you see the advantage of storing the Emp information in two relations, EmpOR and EmpWA, consisting of OR and WA employees, respectively.

• Why is this called a horizontal decomposition?

04/18/23 Lecture 8 Slide 49

Masking Decompositions with Views

• If someone in the company wants to issue a query about the old Emp relation, or if there is old software that uses the Emp relation, this is possible with the use of a view, for example

CREATE VIEW Emp AS

SELECT * FROM EmpOR

UNION

SELECT * FROM EmpWA;
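A quick way to confirm that the view behaves like the old Emp relation is to build the two fragments in a scratch database and query the view. This sketch uses SQLite through Python's sqlite3 module; the column types and the two sample employees are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmpOR (EID TEXT PRIMARY KEY, Address TEXT, City TEXT,
                        State TEXT, Wage REAL, DeptID INTEGER);
    CREATE TABLE EmpWA (EID TEXT PRIMARY KEY, Address TEXT, City TEXT,
                        State TEXT, Wage REAL, DeptID INTEGER);
    -- Hypothetical sample rows, one per branch.
    INSERT INTO EmpOR VALUES ('A01', '1 Main St', 'Portland', 'OR', 50.0, 12);
    INSERT INTO EmpWA VALUES ('A12', '2 Pine St', 'Seattle',  'WA', 55.0, 10);

    CREATE VIEW Emp AS
        SELECT * FROM EmpOR
        UNION
        SELECT * FROM EmpWA;
""")

# Old software can keep querying Emp as if it were still one table.
eids = [row[0] for row in conn.execute("SELECT EID FROM Emp ORDER BY EID")]
wa_wages = conn.execute("SELECT Wage FROM Emp WHERE State = 'WA'").fetchall()
print(eids, wa_wages)
```

Because the two fragments are disjoint, UNION ALL would return the same rows and usually spares the DBMS a duplicate-elimination step. Updates through such a view are a separate question and are not supported by every system.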

04/18/23 Lecture 8 Slide 50

Tuning the Database

• We have described the steps a DBA takes during initial physical design of a database, driven by performance requirements: choosing indexes, refining schemas (including denormalization), and deciding on physical storage.
• These steps continue throughout the life of a database, because everything about the database changes: queries and their importance, schemas, and data.
• Changing the design of a database during the life of a database is called tuning.
  – Tuning also involves other steps such as updating statistics and reclustering tables.
• Tuning is driven by two kinds of information
  – Utilities that generate performance statistics
    • E.g., disk usage, response times
  – User complaints
• Hopefully utilities will warn the DBA of problems before users complain.

04/18/23 Lecture 8 Slide 51

Tuning Queries

• Sometimes a utility or a customer will identify a specific query as a problem (poor response time and/or excessive use of resources). What should you do?
• The first step: is it the fault of the DBMS?
  – Check to see how much time/resources the DBMS is using vs the network, the OS, etc.
• The next step is to use EXPLAIN/SHOW PLAN, etc. to find out what plan the optimizer is using to execute the query, then tune the query.
• There are various techniques to tune queries:
  – Rewrite the query to use existing indexes
  – Simplify the query, e.g., by eliminating DISTINCT, GROUP BY/HAVING clauses, or eliminating temporary relations
  – Flatten nested queries (already studied)
  – Alter the index configuration (already studied)

04/18/23 Lecture 8 Slide 52

Rewriting a query to use existing indexes

• Consider the query

SELECT E.EID
FROM Emp E
WHERE E.salary=1000 OR E.age=25;

• Suppose there are selective indexes on salary and age, but the optimizer is scanning the entire table.
• You could rewrite the query as a UNION, so that each branch can use one of the indexes:

SELECT E.EID
FROM Emp E
WHERE E.salary=1000
UNION
SELECT E.EID
FROM Emp E
WHERE E.age=25;
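Before relying on a rewrite like this, it is worth confirming on test data that both forms return the same rows. The sketch below does so with SQLite through Python's sqlite3 module; the Emp columns, index names and sample rows are assumptions made for illustration, and both branches read from Emp alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Emp (EID TEXT PRIMARY KEY, salary INTEGER, age INTEGER);
    CREATE INDEX emp_salary ON Emp(salary);
    CREATE INDEX emp_age    ON Emp(age);
    -- Hypothetical rows: A13 satisfies both conditions.
    INSERT INTO Emp VALUES ('A01', 1000, 30), ('A12', 2000, 25),
                           ('A13', 1000, 25), ('A03', 3000, 40);
""")

or_rows = sorted(conn.execute(
    "SELECT E.EID FROM Emp E WHERE E.salary = 1000 OR E.age = 25"
))
union_rows = sorted(conn.execute(
    "SELECT E.EID FROM Emp E WHERE E.salary = 1000 "
    "UNION "
    "SELECT E.EID FROM Emp E WHERE E.age = 25"
))
print(or_rows == union_rows, union_rows)
```

UNION (rather than UNION ALL) matters here: an employee who satisfies both conditions, such as A13, must appear only once, as in the original query.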

04/18/23 Lecture 8 Slide 53

Practice: Simplifying Queries*

• Can you simplify these queries?

SELECT DISTINCT E.EID
FROM Emp E
WHERE E.salary > 1000;

SELECT AVG(E.salary)
FROM Emp E
WHERE E.salary > 1000
GROUP BY E.age
HAVING E.age=25;

04/18/23 Lecture 8 Slide 54

Practice: Eliminate temp relations

• Usually (not always) an optimizer is more efficient without temporary relations. Can you combine these into one query?

SELECT E.sal, D.dno INTO Temp
FROM Emp E, Dept D
WHERE E.dno=D.dno
  AND D.mgrname='Joe';

SELECT T.dno, AVG(T.sal)
FROM Temp T
GROUP BY T.dno;

04/18/23 Lecture 8 Slide 55

LO8.1:Exercise*

• Identify anomalies associated with this schema. Include update, insertion and deletion anomalies.

Assigned(EmpID, JobID, EmpName, percent)

• Why do these anomalies occur?

04/18/23 Lecture 8 Slide 56

LO8.2: Exercise*

• Identify some possible keys based on this instance. Include only keys with one or two attributes:

T   W   X   Y   Z
s   A   1   B   2
t   X   5   X   4
u   Z   9   Z   2
s   A   2   B   1
r   X   1   B   2

04/18/23 Lecture 8 Slide 57

LO8.3: EXERCISE*

• Identify two possible non-key FDs based on this instance (identical to the previous slide):

T   W   X   Y   Z
s   A   1   B   2
t   X   5   X   4
u   Z   9   Z   2
s   A   2   B   1
r   X   1   B   2

04/18/23 Lecture 8 Slide 58

LO8.4: EXERCISE*

• Given the schema R(A,B,C,D,E) ABD, CDAE

• What are all the keys of this schema?

04/18/23 Lecture 8 Slide 59

LO8.5: EXERCISE*

• Given the schema LoansBC(Branch#, Loan#, Amt, Assets, Cust#, CustName)

including the FDs Branch# → Assets, Cust# → CustName, decompose it into BCNF relations.