Relational Model & Normalization
Relational terminology
Anomalies and the need for normalization
Normal forms
Relation synthesis
De-normalization
Why the Relational Model?
General model
DBMS-independent design
Widely used in DBMS products
But we must deal with anomalies
When is a table a relation?
Single-valued cells: no repeating groups or arrays
Each attribute has a unique name
All values in a column are of the same kind
Order of columns is not significant
No identical rows
Order of rows is not significant
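Some of these conditions can be checked mechanically on a small in-memory table. The sketch below is illustrative only: the function name `is_relation` and the sample rows in the usage note are assumptions, not part of the slides. It tests only what is visible in the data (unique attribute names, single-valued cells, no identical rows); whether all values in a column are "of the same kind" depends on the domain definition and is not checked.

```python
def is_relation(columns, rows):
    """Check the data-visible conditions for a table to be a relation."""
    # Each attribute must have a unique name.
    if len(set(columns)) != len(columns):
        return False
    for row in rows:
        # Every row supplies exactly one value per attribute.
        if len(row) != len(columns):
            return False
        # Single-valued cells: no list, tuple, set, or dict inside a cell.
        if any(isinstance(cell, (list, tuple, set, dict)) for cell in row):
            return False
    # No identical rows.
    return len({tuple(row) for row in rows}) == len(rows)
```

For example, `is_relation(["SID", "Activity"], [(100, "Golf"), (100, "Skiing")])` is true, while a row such as `(100, ["Golf", "Skiing"])` fails because the second cell holds a repeating group.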
Approaches to Relation Design
Analysis
– start with table structure and normalize (eliminate anomalies)
– Entity Relationship Model (3rd Normal Form)
Synthesis
– construct relations from attributes
Basic Concepts
Functional Dependency
– relationship between or among attributes
Key
– group of one or more attributes that uniquely identifies a row
Functional Dependency
Y is functionally dependent on X (written X → Y) if the value of X determines the value of Y
– if we know the value of X, we can obtain (look up, compute, …) the value for Y
– determined by the user model and business rules
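A table instance can refute a functional dependency but never prove it, since the dependency comes from the business rules. The sketch below shows that check under stated assumptions: the function name `fd_holds` and the sample ACTIVITY rows are hypothetical, chosen only for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the rows (dicts) do not contradict lhs -> rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x appears,
        # then returns the stored value on later occurrences.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample data.
activity = [
    {"SID": 100, "Activity": "Skiing", "Fee": 200},
    {"SID": 150, "Activity": "Swimming", "Fee": 50},
    {"SID": 175, "Activity": "Squash", "Fee": 50},
    {"SID": 200, "Activity": "Swimming", "Fee": 50},
]
print(fd_holds(activity, ["Activity"], ["Fee"]))  # True
print(fd_holds(activity, ["Fee"], ["Activity"]))  # False
```

The second call fails because the fee 50 appears with both Swimming and Squash, so knowing the fee does not tell us the activity.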
Keys
A single attribute or a group of attributes
Depend on the user model
Example: why {SID, Activity}?
Is there another option?
Functional Dependencies, Keys and Uniqueness
A key is always unique
A key functionally determines the entire row
A determinant need not be unique, hence is not necessarily a key
Example: Activity → Fee
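The difference between a determinant and a key can be seen by testing uniqueness on sample data. This is a sketch with hypothetical rows and a hypothetical helper `is_unique`; it assumes a student may take several activities and an activity may have several students.

```python
def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)

# Hypothetical sample data.
rows = [
    {"SID": 100, "Activity": "Skiing", "Fee": 200},
    {"SID": 100, "Activity": "Golf", "Fee": 65},
    {"SID": 150, "Activity": "Swimming", "Fee": 50},
    {"SID": 200, "Activity": "Swimming", "Fee": 50},
]
print(is_unique(rows, ["Activity"]))         # False: a determinant of Fee, but not a key
print(is_unique(rows, ["SID"]))              # False: one student, several activities
print(is_unique(rows, ["SID", "Activity"]))  # True: candidate for the key
```

As with any check on an instance, uniqueness in the sample only makes {SID, Activity} a candidate; the user model decides whether it really is the key.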
Reality check
ProjectID → EmployeeName?
ProjectID → EmployeeSalary?
(ProjectID, EmployeeName) → EmployeeSalary?
EmployeeName → EmployeeSalary?
EmployeeSalary → ProjectID?
EmployeeSalary → (ProjectID, EmployeeName)?
What is the key?
Normalization
Modification Anomalies
Referential Integrity Constraint
Normal Forms
Golden Rule: “A relation should have a single theme; if it has more, break it into more relations.”
Modification Anomalies
What happens when you want to
– add a new book?
– change the address of a patron?
– delete a patron record?
PatronName  PatronAddress  BookID  BookTitle  BookAuthor  BorrowDate  DueDate  ReturnDate
Smith       12 Elk         AAA     Peace      Bart        2/4         2/18     2/15
Jones       25 Sun         BBB     War        Hine        2/4         2/18     2/19
Hart        73 Sera        CCC     System     Vang        2/5         2/19     2/23
Hicks       22 Main        AAA     Peace      Bart        2/12        2/25     2/28
Rice        69 Witt        DDD     Spring     Lyon        2/6         2/20     2/8
Jones       25 Sun         CCC     System     Vang        1/26        2/7      2/6
Modification Anomalies
Deletion anomaly
– deleting one fact about an entity deletes a fact about another entity
Insertion anomaly
– cannot insert one fact about an entity unless a fact about another entity is also added
Update anomaly
– changing one fact about an entity requires multiple changes to a table
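The three anomalies can be reproduced on the patron/book table from the previous slide. The sketch below copies the first five columns of that table (the date columns are left out for brevity); the new book "EEE" is a hypothetical row added only to illustrate the insertion case.

```python
# (PatronName, PatronAddress, BookID, BookTitle, BookAuthor)
checkouts = [
    ("Smith", "12 Elk", "AAA", "Peace", "Bart"),
    ("Jones", "25 Sun", "BBB", "War", "Hine"),
    ("Hart", "73 Sera", "CCC", "System", "Vang"),
    ("Hicks", "22 Main", "AAA", "Peace", "Bart"),
    ("Rice", "69 Witt", "DDD", "Spring", "Lyon"),
    ("Jones", "25 Sun", "CCC", "System", "Vang"),
]

# Update anomaly: changing Jones's address touches every row that mentions Jones.
rows_to_change = [r for r in checkouts if r[0] == "Jones"]
print(len(rows_to_change))  # 2

# Deletion anomaly: removing Rice's only checkout also removes
# the only record of book DDD.
remaining = [r for r in checkouts if r[0] != "Rice"]
books_left = {r[2] for r in remaining}
print("DDD" in books_left)  # False

# Insertion anomaly: a new book has no patron yet, so the patron fields
# must be left empty (None here) just to record that the book exists.
new_book_row = (None, None, "EEE", "Dawn", "Cole")  # hypothetical book
```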
Referential Integrity Constraint
When we split a relation, we must pay attention to the references across the newly formed relations
E.g., a book must exist before it can be checked out:
– CHECKOUT[BookID] ⊆ BOOK[BookID]
The DBMS or the applications will have to check/enforce constraints
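When the application has to enforce the constraint itself, the check amounts to a subset test on the referenced column. A minimal sketch, assuming rows are held as dictionaries; the helper name `dangling_references` and the sample rows are hypothetical.

```python
def dangling_references(child_rows, parent_rows, attr):
    """Return the child values of attr that have no matching parent row."""
    parent_values = {row[attr] for row in parent_rows}
    child_values = {row[attr] for row in child_rows}
    return sorted(child_values - parent_values)

# Hypothetical sample data.
book = [{"BookID": "AAA"}, {"BookID": "BBB"}]
checkout = [
    {"BookID": "AAA", "Patron": "Smith"},
    {"BookID": "ZZZ", "Patron": "Hart"},
]
print(dangling_references(checkout, book, "BookID"))  # ['ZZZ']
```

An empty result means every checkout refers to an existing book, so the constraint is satisfied.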
Second Normal Form
Single-attribute key, or all non-key attributes are dependent on the entire key
– ACTIVITY(SID, Activity, Fee)
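If the key of ACTIVITY is (SID, Activity) and Fee depends on Activity alone, Fee depends on only part of the key, so the relation is not in 2NF. The sketch below splits it by projection. The dependency Activity → Fee is taken from the earlier determinant example; the variable names `enrollment` and `activity_fee` and the sample rows are hypothetical.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs and drop duplicate rows, keeping order."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

# Hypothetical sample data.
activity = [
    {"SID": 100, "Activity": "Skiing", "Fee": 200},
    {"SID": 150, "Activity": "Swimming", "Fee": 50},
    {"SID": 200, "Activity": "Swimming", "Fee": 50},
]
enrollment = project(activity, ["SID", "Activity"])    # key: (SID, Activity)
activity_fee = project(activity, ["Activity", "Fee"])  # key: Activity
print(len(enrollment), len(activity_fee))  # 3 2
```

After the split, the Swimming fee is stored once instead of once per enrolled student.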
Third Normal Form
No transitive dependencies
– WORKER(Employee, Dept, Location)
– WORKER(Employee, Dept), OFFICE(Dept, Location)
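The decomposition is safe only if joining the two new relations gives back the original rows. The sketch below checks this on sample data, assuming Employee → Dept and Dept → Location; the employee names and locations are hypothetical.

```python
# WORKER(Employee, Dept, Location), hypothetical sample rows.
worker = {
    ("Ann", "Sales", "Bldg A"),
    ("Bob", "Sales", "Bldg A"),
    ("Cy", "IT", "Bldg B"),
}

# Decompose by projecting onto the two attribute sets.
worker_dept = {(emp, dept) for emp, dept, _ in worker}
office = {(dept, loc) for _, dept, loc in worker}

# Natural join on Dept.
rejoined = {
    (emp, dept, loc)
    for emp, dept in worker_dept
    for d, loc in office
    if d == dept
}
print(rejoined == worker)  # True
print(len(office))         # 2
```

The location of the Sales department now appears once in OFFICE rather than once per employee.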
Quick Quiz
Determine if the following relations are in 1NF, 2NF or 3NF
Rewrite each relation in 3NF
– EMPLOYEE(EmpID, EmpName, JobCode)
– EMPLOYEE(EmpID, EmpName, JobCode, JobDesc)
– EMPLOYEE(EmpID, EmpName, ProjectID, HrsWorked)
Boyce-Codd Normal Form
Every determinant is a candidate key
– ADVISER(SID, Major, Fname)
– STU-ADV(SID, Fname), ADV-SUBJ(Fname, Subject)
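A BCNF check can be sketched with attribute closure: a non-trivial dependency violates BCNF when the closure of its determinant does not cover the whole relation. The slide does not list the dependencies for ADVISER, so the sketch assumes (SID, Major) → Fname and Fname → Major; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the FDs whose determinant is not a superkey of schema."""
    return [(lhs, rhs) for lhs, rhs in fds if not set(schema) <= closure(lhs, fds)]

adviser = {"SID", "Major", "Fname"}
# Assumed dependencies, not stated on the slide.
fds = [({"SID", "Major"}, {"Fname"}), ({"Fname"}, {"Major"})]
violations = bcnf_violations(adviser, fds)
```

Under these assumptions the only violation is Fname → Major: Fname is a determinant, but its closure is just {Fname, Major}, so it is not a candidate key.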
Multi-valued Dependency
Two or more functionally independent multi-valued attributes are dependent on another attribute
– EMPLOYEE(Name, Dependent, Project)
Data redundancy and modification anomalies
4NF: BCNF & no multi-valued dependencies
– EMPLOYEE(Name, Dependent)
– EMPLOYEE(Name, Project)
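The redundancy comes from having to pair every dependent with every project in the single relation. A small sketch with hypothetical data for one employee makes the row counts concrete; the variable names are illustrative.

```python
from itertools import product

# Hypothetical data: one employee with two dependents and three projects.
dependents = {"Jones": ["Ann", "Ben"]}
projects = {"Jones": ["P1", "P2", "P3"]}

# Single relation EMPLOYEE(Name, Dependent, Project): every dependent is
# paired with every project, otherwise the table would suggest a link
# between a particular dependent and a particular project.
combined = [
    (name, dep, proj)
    for name in dependents
    for dep, proj in product(dependents[name], projects[name])
]

# 4NF decomposition: one relation per multi-valued attribute.
emp_dependent = [(name, dep) for name, deps in dependents.items() for dep in deps]
emp_project = [(name, proj) for name, projs in projects.items() for proj in projs]

print(len(combined), len(emp_dependent) + len(emp_project))  # 6 5
```

The gap widens quickly: the combined table grows with the product of the two lists, while the decomposed tables grow with their sum.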
Domain/Key Normal Form
Every constraint on the relation is a logical consequence of the definitions of keys and domains
Constraints: rules, functional and multi-valued dependencies, anything that can be statically ascertained as true or false
Enforcing key and domain restrictions causes all of the constraints to be met
De-Normalization
Many databases are not normalized, or are poorly normalized, which implies bad design
We may also want to de-normalize to improve efficiency or ease of use
Consider the alternatives:
– CUSTOMER(CustNo, CustName, City, State, Zip)
– CUSTOMER(CustNo, CustName, Zip), CODES(Zip, City, State)
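The trade-off between the two designs is a lookup on every read versus repeated City and State values in every row. The sketch below rebuilds the de-normalized rows from the normalized pair; the customer names and the sample Zip entry are hypothetical data.

```python
# Normalized design: CUSTOMER(CustNo, CustName, Zip) and CODES(Zip, City, State).
customer = [
    {"CustNo": 1, "CustName": "Acme", "Zip": "98101"},
    {"CustNo": 2, "CustName": "Birch", "Zip": "98101"},
]
codes = {"98101": {"City": "Seattle", "State": "WA"}}

# Reading a full address needs a join (here, a dictionary lookup per row).
joined = [{**c, **codes[c["Zip"]]} for c in customer]

# The joined rows are exactly what the de-normalized
# CUSTOMER(CustNo, CustName, City, State, Zip) would store directly:
# no lookup on read, but City and State repeat for every customer in a Zip.
for row in joined:
    print(row["CustNo"], row["CustName"], row["City"], row["State"], row["Zip"])
```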
Optimization
There may be more than one way to normalize a table
– COLLEGE(CollegeName, Dean, AsstDean)
» DEAN(CollegeName, Dean), ASSTDEAN(CollegeName, AsstDean)
» COLLEGE(CollegeName, Dean, AsstDean1, AsstDean2, AsstDean3)
Which is best depends on efficiency considerations