Designing of Databases
By Parteek Bhatia
Assistant Professor, Department of Computer Science & Engineering
Thapar University, Patiala
Designing of Databases PCTE, Ludhiana, 7/3/2014 2
Objectives
Why Relational Database?
Why are there so many keys like Primary Key and Foreign Key?
How to design a database?
Entity-Relationship Model
Conversion of E-R Diagram to Tables
Normalization of Database
Database
Related information, when placed in an organized form, makes a database. The organization of data/information is necessary because unorganized information has no meaning.
A database is a computer-based record-keeping system whose overall purpose is to record and maintain information.
Operations on Database
Insertion: to add new information (e.g., to add the address of a new friend to your address book).
Retrieval: to view or retrieve the stored information (e.g., to find the address of one of your old friends).
Updation: to modify or edit the existing information (e.g., your friend has shifted to a new place, so his address has to be changed).
Deletion: to remove or delete unwanted information (e.g., your friend has changed his/her mobile number, so the old number has to be removed from the list).
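The four operations correspond directly to SQL statements. Below is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the address_book and mobile_list tables, their columns and the sample values are hypothetical, chosen only to mirror the address-book examples above.

```python
import sqlite3

# In-memory database; table names, columns and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address_book (name TEXT PRIMARY KEY, address TEXT)")
conn.execute("CREATE TABLE mobile_list (name TEXT, mobile TEXT)")

# Insertion: add the address (and mobile number) of a new friend.
conn.execute("INSERT INTO address_book VALUES (?, ?)", ("Amit", "Model Town, Patiala"))
conn.execute("INSERT INTO mobile_list VALUES (?, ?)", ("Amit", "98140-11111"))

# Retrieval: find the stored address of a friend.
old_address = conn.execute(
    "SELECT address FROM address_book WHERE name = ?", ("Amit",)
).fetchone()[0]

# Updation: the friend has shifted, so the stored address is modified.
conn.execute(
    "UPDATE address_book SET address = ? WHERE name = ?",
    ("Civil Lines, Ludhiana", "Amit"),
)

# Deletion: the old mobile number is removed from the list.
conn.execute(
    "DELETE FROM mobile_list WHERE name = ? AND mobile = ?", ("Amit", "98140-11111")
)
conn.commit()

print(old_address)  # Model Town, Patiala
```

Note that changing a value inside a row is an UPDATE, while DELETE removes whole rows; that is why the mobile numbers are kept in their own list here.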
Database Management System
A database management system is the software system that allows users to define, create and maintain a database and provides controlled access to the data.
Applications: computerized library systems, automated teller machines, flight reservation systems, computerized parts inventory systems
Software: dBase, FoxPro, IMS, SQL Server, MySQL and Oracle
Why Relational Database? Why is an RDBMS called a Relational DBMS?
Relational DBMS
The relational model stores data in the form of tables.
The name comes from the mathematical term relation: each table is a relation, that is, a set of rows (tuples) over a fixed set of columns (attributes).
In simple words, for every row-column combination there must be at most a single value.
This concept was proposed by Dr. E. F. Codd, a researcher at IBM, in 1970.
Example of a Relation
Keys of a Relation
Candidate Key
Primary Key
Super Key
Alternate Key
Foreign Key
Artificial Key
Why are there so many keys like Primary Key and Foreign Key?
Candidate Key
Candidate keys are those attributes (or sets of attributes) K of a relation that have the properties of uniqueness and irreducibility.
Uniqueness: no two tuples of the relation have the same value of K.
Irreducibility: no proper subset of K has the uniqueness property.
Super Key
A super key has the uniqueness property but not necessarily the irreducibility property. For example, if Roll_number is unique in relation STUDENT, then the set of attributes (Roll_number, Name, Class) is a super key for STUDENT. This set of attributes is also unique, but the combination (a composite key) does not have the irreducibility property, because Roll_number, which is a proper subset of the composite key, is itself unique. Thus this composite key is called a super key: it has the property of uniqueness but not of irreducibility.
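The uniqueness and irreducibility properties can be checked against a sample of rows. The sketch below assumes rows are Python dicts and uses hypothetical STUDENT data. A key is really a rule the designer states for all possible data, so a check on one sample can only show that the sample is consistent with a key, not prove it.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """Uniqueness: no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_super_key(rows, attrs):
    """A super key only needs the uniqueness property."""
    return is_unique(rows, attrs)

def is_candidate_key(rows, attrs):
    """A candidate key is unique and irreducible: no proper subset is unique."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical STUDENT rows.
student = [
    {"Roll_number": 1, "Name": "Rahat", "Class": "BE-1"},
    {"Roll_number": 2, "Name": "Raju", "Class": "BE-1"},
    {"Roll_number": 3, "Name": "Rahat", "Class": "BE-2"},
    {"Roll_number": 4, "Name": "Raju", "Class": "BE-1"},
]

print(is_super_key(student, ("Roll_number", "Name", "Class")))      # True
print(is_candidate_key(student, ("Roll_number", "Name", "Class")))  # False: Roll_number alone is unique
print(is_candidate_key(student, ("Roll_number",)))                  # True
```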
Primary Key
A primary key is a candidate key chosen by the designer for unique identification of the records of a relation.
A primary key cannot contain a null value, because a null cannot be used to tell one record apart from another.
Alternate Key
The alternate keys of any table are simply those candidate keys that are not currently selected as the primary key.
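In SQL the chosen primary key is declared with PRIMARY KEY, and an alternate key is usually enforced with UNIQUE. This is a minimal sketch using Python's sqlite3 module; the Email column is a hypothetical second candidate key added for illustration. NOT NULL is spelled out because SQLite, unlike standard SQL, does not imply it for a non-INTEGER primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Roll_number: candidate key chosen as the primary key.
# Email (hypothetical): candidate key not chosen, i.e. an alternate key.
conn.execute("""
    CREATE TABLE student (
        Roll_number TEXT PRIMARY KEY NOT NULL,
        Email       TEXT UNIQUE NOT NULL,
        Name        TEXT
    )
""")
conn.execute("INSERT INTO student VALUES ('R1', 'r1@example.com', 'Rahat')")

def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(("R1", "x@example.com", "Raju")),   # duplicate primary key
    try_insert((None, "y@example.com", "Raju")),   # null primary key
    try_insert(("R2", "r1@example.com", "Raju")),  # duplicate alternate key
    try_insert(("R2", "r2@example.com", "Raju")),  # accepted
]
print(results)  # [False, False, False, True]
```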
Artificial Key
Foreign Key
Foreign keys are attributes of a table that refer to the primary key of another table. A foreign key permits only values that appear in the primary key of the table to which it refers, or null. Foreign keys are used to link together two or more tables that have some form of relationship with each other. A foreign key value is a reference to the tuple of the table from which it was taken; this tuple is called the referenced or target tuple, and the table containing the referenced tuple is called the target table.
The integrity of foreign keys, that is, the requirement that every non-null foreign key value matches an existing primary key value, is referred to as Referential Integrity.
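Referential integrity can be seen in action with a declared foreign key. This sketch uses Python's sqlite3 module; the dept and emp tables and their data are hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute("CREATE TABLE dept (Dno TEXT PRIMARY KEY NOT NULL, Dname TEXT)")
conn.execute("""
    CREATE TABLE emp (
        Eno   TEXT PRIMARY KEY NOT NULL,
        Ename TEXT,
        Dno   TEXT REFERENCES dept (Dno)   -- foreign key into the target table dept
    )
""")
conn.execute("INSERT INTO dept VALUES ('D1', 'Accounts')")

def try_insert_emp(row):
    """Return True if the row is accepted, False if referential integrity rejects it."""
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert_emp(("E1", "Rahat", "D1")),  # D1 exists in dept
    try_insert_emp(("E2", "Raju", None)),   # a null foreign key is allowed
    try_insert_emp(("E3", "Amit", "D9")),   # no dept row has Dno = 'D9'
]
print(results)  # [True, True, False]
```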
Foreign Key (Contd.)
Foreign Key (Contd.)
How to design the database?
How to design the database?
There are two approaches:
E-R Modeling: identifying entities and relationships
Normalization: refinement of the database design
Entity-Relationship Model
E-R Model
The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976.
The ER model is a conceptual data model that views the real world as entities and relationships.
A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects.
Basic Constructs of E-R Modeling
A database can be modeled as a collection of entities and the relationships among entities.
An entity is an object that exists and is distinguishable from other objects. Example: a specific person, company, event or plant.
Entities have attributes. Example: people have names and addresses.
An entity set is a set of entities of the same type that share the same properties. Example: the set of all persons, companies, trees or holidays.
Entity Sets customer and loan
Relationship Set borrower
Attributes
Attributes describe the properties of the entity with which they are associated. We can classify attributes as follows:
Simple
Composite
Single-valued
Multi-valued
Derived
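The attribute kinds can be mirrored in a record type. This is a minimal sketch with Python dataclasses; the Customer and Name classes, their fields and the sample values are hypothetical. A composite attribute becomes a nested record, a multi-valued attribute a list, and a derived attribute a method computed from stored data rather than a stored field.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Name:
    """Composite attribute: built from component attributes."""
    first: str
    last: str

@dataclass
class Customer:
    customer_id: str       # simple, single-valued
    name: Name             # composite
    date_of_birth: date    # simple, single-valued
    phones: List[str] = field(default_factory=list)  # multi-valued

    def age(self, on: date) -> int:
        """Derived attribute: computed from date_of_birth instead of being stored."""
        birthday_pending = (on.month, on.day) < (self.date_of_birth.month, self.date_of_birth.day)
        return on.year - self.date_of_birth.year - int(birthday_pending)

c = Customer("C1", Name("Rahat", "Singh"), date(1990, 7, 3), ["98140-11111", "0161-2222222"])
print(c.age(date(2014, 7, 3)))  # 24
print(c.age(date(2014, 7, 2)))  # 23
```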
Example
Degree of a Relationship
The degree of a relationship is the number of entity sets associated with the relationship. The n-ary relationship is the general form for degree n. Special cases are binary and ternary relationships, where the degree is 2 and 3, respectively.
Connectivity and Cardinality
The connectivity of a relationship describes the mapping of associated entity instances in the relationship. The values of connectivity are "one" or "many". The cardinality of a relationship is the actual number of related occurrences for each of the two entities. The basic types of connectivity for relations are:
One to One (1:1)
One to Many (1:M)
Many to One (M:1)
Many to Many (M:M)
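Given the pairs that make up a binary relationship set, the connectivity can be read off by counting how many partners each instance has. A sketch, assuming the relationship set is a list of (left, right) pairs such as (customer, loan); the data is hypothetical, and as with keys, a sample can only be consistent with a design's connectivity, not prove it.

```python
from collections import defaultdict

def connectivity(pairs):
    """Classify a binary relationship set given as (left, right) instance pairs."""
    rights_of = defaultdict(set)  # right-side partners of each left instance
    lefts_of = defaultdict(set)   # left-side partners of each right instance
    for left, right in pairs:
        rights_of[left].add(right)
        lefts_of[right].add(left)
    # "M" on a side means some instance on the other side has several partners there.
    left_side = "M" if any(len(s) > 1 for s in lefts_of.values()) else "1"
    right_side = "M" if any(len(s) > 1 for s in rights_of.values()) else "1"
    return f"{left_side}:{right_side}"

print(connectivity([("C1", "L1"), ("C2", "L2")]))                # 1:1
print(connectivity([("C1", "L1"), ("C1", "L2")]))                # 1:M
print(connectivity([("C1", "L1"), ("C2", "L1")]))                # M:1
print(connectivity([("C1", "L1"), ("C1", "L2"), ("C2", "L1")]))  # M:M
```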
Direction
The direction of a relationship indicates the originating entity of a relationship. The entity from which a relationship originates is the parent entity; the entity where the relationship terminates is the child entity.
The type of the relationship is shown by the direction of the line connecting the relationship set and the entity set. To distinguish the different types, we draw either a directed line or an undirected line between the relationship set and the entity set: a directed line indicates one occurrence and an undirected line indicates many occurrences in a relationship, as shown in the next case.
E-R Notation
Entities are represented by labeled rectangles. The label is the name of the entity. Entity names should be singular nouns.
Attributes are represented by ellipses.
A solid line connecting two entities represents a relationship. The name of the relationship is written above the line. Relationship names should be verbs, and a diamond is used to represent relationship sets.
Attributes, when included, are listed inside the entity rectangle. Attributes that are identifiers are underlined. Attribute names should be singular nouns.
Multi-valued attributes are represented by double ellipses.
A directed line is used to indicate one occurrence and an undirected line is used to indicate many occurrences in a relationship.
E-R Notation
Example
Customer-Loan Relationship
Exercise
Consider the following database:
S (S#, SNAME, STATUS, CITY)
P (P#, PNAME, COLOR, WEIGHT, CITY)
J (J#, JNAME, CITY)
SPJ (S#, P#, J#, QTY)
Here, S holds information about suppliers, P about parts and J about projects, while SPJ records the quantity of each part supplied by a supplier to a project.
Total Participation
Total participation (indicated by a double line): every entity in the entity set participates in at least one relationship in the relationship set. E.g., participation of loan in borrower is total: every loan must have a customer associated with it via borrower.
Partial participation: some entities may not participate in any relationship in the relationship set. Example: participation of customer in borrower is partial.
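Whether a sample of data respects total participation is a simple set check: every entity key must occur on its side of the relationship set. A sketch with hypothetical customer, loan and borrower data, where borrower pairs are (customer, loan).

```python
def participation(entity_keys, relationship_pairs, position):
    """'total' if every entity appears in the relationship set, otherwise 'partial'."""
    participating = {pair[position] for pair in relationship_pairs}
    return "total" if set(entity_keys) <= participating else "partial"

# Hypothetical data: borrower pairs are (customer, loan).
customers = ["C1", "C2", "C3"]
loans = ["L1", "L2"]
borrower = [("C1", "L1"), ("C2", "L2"), ("C1", "L2")]

print(participation(loans, borrower, position=1))      # total
print(participation(customers, borrower, position=0))  # partial: C3 has no loan
```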
Some More Examples
Some More Examples
Some More Examples
Strong and Weak Entity Sets
An entity set that does not have sufficient attributes to form a primary key is called a weak entity set. An entity set that has a primary key is called a strong entity set.
Consider an entity set Payment, which has three attributes: payment_number, payment_date and payment_amount. Although each payment entity is distinct, payments for different loans may share the same payment number. Thus this entity set does not have a primary key, and it is a weak entity set. Each weak entity set must be part of a one-to-many relationship set.
Strong and Weak Entity Sets
A member of a strong entity set is called a dominant entity, and a member of a weak entity set is called a subordinate entity. A weak entity set does not have a primary key, but we need a means of distinguishing among all those entities in the entity set that depend on one particular strong entity. The discriminator of a weak entity set is a set of attributes that allows this distinction to be made. For example, payment_number acts as the discriminator for the payment entity set. It is also called the partial key of the entity set.
The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set’s discriminator. In the above example, {loan_number, payment_number} acts as the primary key for the payment entity set.
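As a table, the weak entity set payment borrows loan_number from its strong entity set and combines it with the discriminator payment_number. A sketch with Python's sqlite3 module; the column types, the amount column of loan and all sample values are hypothetical, and ON DELETE CASCADE is one common way (an assumption here, not part of the slides) to express existence dependence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE loan (loan_number TEXT PRIMARY KEY NOT NULL, amount REAL)")
# Primary key = strong entity's key + discriminator.
conn.execute("""
    CREATE TABLE payment (
        loan_number    TEXT NOT NULL REFERENCES loan (loan_number) ON DELETE CASCADE,
        payment_number INTEGER NOT NULL,
        payment_date   TEXT,
        payment_amount REAL,
        PRIMARY KEY (loan_number, payment_number)
    )
""")
conn.executemany("INSERT INTO loan VALUES (?, ?)", [("L-17", 1000.0), ("L-23", 2000.0)])
# The same payment_number may repeat for different loans.
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?)", [
    ("L-17", 1, "2014-01-05", 50.0),
    ("L-23", 1, "2014-01-07", 75.0),
])
# Existence dependence: deleting a loan also deletes its payments.
conn.execute("DELETE FROM loan WHERE loan_number = 'L-17'")
remaining = conn.execute("SELECT loan_number, payment_number FROM payment").fetchall()
print(remaining)  # [('L-23', 1)]
```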
E-R Diagram of Banking System
Conversion of E-R Diagram to Tables
Entity Set to Table
For each entity set and relationship set there is a unique table, which is assigned the name of the corresponding entity set or relationship set. Each table has a number of columns (generally corresponding to attributes), which have unique names. Primary keys allow entity sets and relationship sets to be expressed uniformly as tables, which represent the contents of the database.
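For the customer, loan and borrower example this rule gives one table per entity set and one for the relationship set. A sketch with Python's sqlite3 module; the non-key columns are hypothetical, and borrower is assumed to be many-to-many, so its primary key combines the primary keys of both participating entity sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per entity set, named after it.
    CREATE TABLE customer (
        customer_id   TEXT PRIMARY KEY NOT NULL,
        customer_name TEXT,
        customer_city TEXT
    );
    CREATE TABLE loan (
        loan_number TEXT PRIMARY KEY NOT NULL,
        amount      REAL
    );
    -- One table for the relationship set, keyed by the participants' keys.
    CREATE TABLE borrower (
        customer_id TEXT NOT NULL REFERENCES customer (customer_id),
        loan_number TEXT NOT NULL REFERENCES loan (loan_number),
        PRIMARY KEY (customer_id, loan_number)
    );
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['borrower', 'customer', 'loan']
```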
Representing Entity sets as Tables
Composite and Multi-Value Attributes
In order to convert an entity having composite attributes, the composite attributes are flattened out by creating a separate attribute for each component attribute.
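Flattening a composite attribute simply replaces it with one column per component. A sketch, assuming an entity is a dict in which composite attributes are nested dicts; the attribute names and values are hypothetical.

```python
def flatten(entity, composite_attrs):
    """Replace each composite attribute by one column per component attribute."""
    row = {}
    for attr, value in entity.items():
        if attr in composite_attrs:
            row.update(value)  # components become ordinary columns
        else:
            row[attr] = value
    return row

customer = {
    "customer_id": "C1",
    "name": {"first_name": "Rahat", "last_name": "Singh"},
    "city": "Ludhiana",
}
print(flatten(customer, {"name"}))
# {'customer_id': 'C1', 'first_name': 'Rahat', 'last_name': 'Singh', 'city': 'Ludhiana'}
```

In this sketch a component whose name clashes with another column would silently overwrite it, so component columns should be given distinct names.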
Composite Attributes
Multi-Value Attributes
Multi-Value Attributes
Relationship Sets as Tables
Many-to-one and One-to-Many Relationship Sets
Normalization for Refinement of Database
Normalization
Normalization is the process of decomposing a larger table into smaller tables so that each satisfies a series of tests. If the database satisfies a test, it is considered normalized according to that test (rule, or degree). There are five such tests that we apply to the database, so there are five degrees, or rules, of normalization, known as First Normal Form, Second Normal Form and so on.
When a test fails, the relation violating that test must be decomposed into relations that individually meet the normalization tests.
Objectives of Normalization
To create a formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes.
To obtain powerful relational retrieval algorithms based on a collection of primitive relational operators.
To free relations from undesirable insertion, update and deletion anomalies.
To reduce the need for restructuring the relations as new data types are introduced.
To carry out a series of tests on individual relation schemas so that the relational database can be normalized to some degree. When a test fails, the relation violating that test must be decomposed into relations that individually meet the normalization tests.
Functional Dependence (FD)
In a relation R with attributes X and Y, if each value of X is associated with exactly one value of Y, then Y is functionally dependent on X. X is called the determinant and Y the determined attribute; we say that X functionally determines Y and represent this as X → Y. The expression X → Y can also be read as "Y is functionally determined by X".
For each value of the determinant there is associated one and only one value of the determined.
Functional Dependence (FD)
The following table illustrates A → B:
A B
1 1
2 4
3 9
4 16
2 4
7 9
Functional Dependence (FD)
The following table illustrates that A does not functionally determine B:
A B
1 1
2 4
3 9
4 16
3 10
Since A = 3 is associated with more than one value of B, A ↛ B.
Functional dependency can also be defined as follows: an attribute in a relational model is said to be functionally dependent on another attribute in the table if it can take only one value for a given value of the attribute on which it is functionally dependent.
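The definition translates into a check over a sample of rows: remember the Y value seen for each X value and fail on the first conflict. A sketch using the two A, B tables above; the helper names are illustrative only.

```python
def holds_fd(rows, X, Y):
    """True if, in these rows, every value of X is associated with one value of Y."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False
    return True

def table(pairs):
    return [{"A": a, "B": b} for a, b in pairs]

first = table([(1, 1), (2, 4), (3, 9), (4, 16), (2, 4), (7, 9)])
second = table([(1, 1), (2, 4), (3, 9), (4, 16), (3, 10)])

print(holds_fd(first, ["A"], ["B"]))   # True
print(holds_fd(second, ["A"], ["B"]))  # False: A = 3 goes with both 9 and 10
print(holds_fd(first, ["B"], ["A"]))   # False: B = 9 goes with both 3 and 7
```

The last line shows that A → B does not imply B → A. As with keys, a sample can refute a functional dependency but cannot prove that it holds for all possible data.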
Fully Functional Dependence (FFD)
Fully Functional Dependence (FFD) is defined as follows: attribute Y is FFD on attribute X if it is FD on X and not FD on any proper subset of X. For X = (X1, X2):
(X1, X2) → Y
X1 ↛ Y
X2 ↛ Y
Fully Functional Dependence (FFD)
For example, in relation Supplier, different cities may have the same status: cities like Amritsar and Jalandhar may both have status 10. So City is not FD on Status. But the combination (Sno, Status) gives only one corresponding City, because Sno is unique. Thus
(Sno, Status) → City
It means City is FD on the composite attribute (Sno, Status). However, City is not fully functionally dependent on this composite attribute, because City is also FD on Sno alone, which is a proper subset of (Sno, Status).
Fully Functional Dependence (FFD)
Consider another case, the SP table. Here Qty is FD on the combination (Sno, Pno), so X has two proper subsets, Sno and Pno.
Qty is not FD on Sno, because one Sno can supply more than one quantity.
Qty is also not FD on Pno, because one Pno may be supplied many times by different suppliers, with different or the same quantities.
So Qty is FFD on the composite attribute (Sno, Pno), that is, (Sno, Pno) → Qty.
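Full functional dependence adds one more test to the FD check: the dependency must fail on every proper subset of X. A sketch over hypothetical sample rows for the Supplier and SP examples above; the row values are invented for illustration.

```python
from itertools import combinations

def holds_fd(rows, X, Y):
    """True if every value of X is associated with exactly one value of Y."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False
    return True

def is_full_fd(rows, X, Y):
    """Y is FFD on X: X -> Y holds and Y is not FD on any proper subset of X."""
    if not holds_fd(rows, X, Y):
        return False
    return not any(
        holds_fd(rows, subset, Y)
        for size in range(1, len(X))
        for subset in combinations(X, size)
    )

# Hypothetical sample rows for the two examples above.
supplier = [
    {"Sno": "S1", "Status": 10, "City": "Amritsar"},
    {"Sno": "S2", "Status": 10, "City": "Jalandhar"},
    {"Sno": "S3", "Status": 20, "City": "Patiala"},
]
sp = [
    {"Sno": "S1", "Pno": "P1", "Qty": 300},
    {"Sno": "S1", "Pno": "P2", "Qty": 200},
    {"Sno": "S2", "Pno": "P1", "Qty": 200},
    {"Sno": "S2", "Pno": "P2", "Qty": 400},
]

print(holds_fd(supplier, ["Status"], ["City"]))           # False
print(is_full_fd(supplier, ["Sno", "Status"], ["City"]))  # False: Sno -> City already holds
print(is_full_fd(sp, ["Sno", "Pno"], ["Qty"]))            # True
```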
First Normal Form
Definition of First Normal Form: A relation is said to be in First Normal Form (1NF) if and only if every entry of the relation (the intersection of a tuple and a column) has at most a single value. In other words, “a relation is in First Normal Form if and only if all underlying domains contain atomic values or single value only.”
First Approach: Flattening the table
The first approach, known as “flattening the table”, removes repeating groups by filling in the “missing” entries of each “incomplete row” of the table with copies of their corresponding non-repeating attributes.
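Flattening produces one row per entry of the repeating group and copies the non-repeating values into each of them. A sketch, assuming each unnormalized row is a dict whose repeating group is a list under one key; the student and course values are hypothetical.

```python
# Hypothetical unnormalized rows: each student carries a repeating group of courses.
unnormalized = [
    {"Rollno": "R1", "Name": "Rahat", "courses": [
        {"Course_Code": "C1", "Course_Name": "DBMS"},
        {"Course_Code": "C2", "Course_Name": "Java"},
    ]},
    {"Rollno": "R2", "Name": "Raju", "courses": [
        {"Course_Code": "C1", "Course_Name": "DBMS"},
    ]},
]

def flatten_table(rows, group):
    """One output row per repeating-group entry, with the non-repeating values copied in."""
    flat = []
    for row in rows:
        fixed = {k: v for k, v in row.items() if k != group}
        for entry in row[group]:
            flat.append({**fixed, **entry})
    return flat

for r in flatten_table(unnormalized, "courses"):
    print(r)
```

The copied Name values are exactly the redundancy that causes the anomalies of 1NF relations. In this sketch a student with an empty course list would disappear from the result.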
Second Approach: Decomposition of the table
The second approach for normalizing a table requires that the table be decomposed into two new tables that will replace the original table.
However, before decomposing the original table it is necessary to identify an attribute or a set of its attributes that can be used as table identifiers.
Rule of decomposition
One of the two tables contains the table identifier of the original table and all the non-repeating attributes.
The other table contains a copy of the table identifier and all the repeating attributes.
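The decomposition rule can be sketched on the same kind of hypothetical data: Rollno is assumed to be the table identifier and "courses" the repeating group.

```python
unnormalized = [
    {"Rollno": "R1", "Name": "Rahat", "courses": [
        {"Course_Code": "C1", "Course_Name": "DBMS"},
        {"Course_Code": "C2", "Course_Name": "Java"},
    ]},
    {"Rollno": "R2", "Name": "Raju", "courses": [
        {"Course_Code": "C1", "Course_Name": "DBMS"},
    ]},
]

def decompose(rows, identifier, group):
    """Split into the non-repeating part and the repeating part; both keep the identifier."""
    outer, inner = [], []
    for row in rows:
        # Table 1: identifier plus all non-repeating attributes.
        outer.append({k: v for k, v in row.items() if k != group})
        # Table 2: a copy of the identifier plus the repeating attributes.
        for entry in row[group]:
            inner.append({identifier: row[identifier], **entry})
    return outer, inner

students, enrolments = decompose(unnormalized, "Rollno", "courses")
print(students)
print(enrolments)
```

Unlike flattening, each student's Name is stored once, and a student with no courses still keeps a row in the first table. In this hypothetical data the second table would be identified by (Rollno, Course_Code).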
Anomalies in 1NF Relations (Considering STUDENT table)
Second Normal Form
A relation R is in Second Normal Form (2NF) if and only if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.
The relation COURSE_CODE obtained from First Normal Form does not satisfy this rule: its non-key attributes Name, System_Used and Hourly_Rate are not fully functionally dependent on the primary key (Course_Code, Rollno). They are functionally dependent on Rollno alone, and Rollno is a proper subset of the primary key, so full functional dependence does not hold. To convert COURSE_CODE into Second Normal Form, the following rule is used.
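The conversion can be sketched as moving each partially dependent group of attributes, together with its determinant, into a relation of its own. This sketch uses only the five attributes and the single dependency Rollno → Name, System_Used, Hourly_Rate named in the paragraph above; the remaining attributes of the FD diagram are left out because their dependencies are not spelled out in the text.

```python
def split_for_2nf(attributes, primary_key, fds):
    """Move attributes that depend on only part of the key into their own relation."""
    key = set(primary_key)
    remaining = list(attributes)
    new_relations = []
    for lhs, rhs in fds:
        if set(lhs) < key:  # determinant is a proper subset of the key
            moved = [a for a in rhs if a not in key]
            new_relations.append(list(lhs) + moved)
            remaining = [a for a in remaining if a not in moved]
    return [remaining] + new_relations

attributes = ["Course_Code", "Rollno", "Name", "System_Used", "Hourly_Rate"]
primary_key = ["Course_Code", "Rollno"]
fds = [(["Rollno"], ["Name", "System_Used", "Hourly_Rate"])]

print(split_for_2nf(attributes, primary_key, fds))
# [['Course_Code', 'Rollno'], ['Rollno', 'Name', 'System_Used', 'Hourly_Rate']]
```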
FD Diagram
(Figure: functional dependencies among Rollno, Name, System_Used, Hourly_Rate, Course_Code, Course_Name, Teacher_Name and Total_hrs.)
Rule to convert First Normal Form to Second Normal Form
Data Anomalies in 2NF Relations?
Third Normal Form
A relation R is in Third Normal Form (3NF) if and only if the following conditions are satisfied simultaneously:
R is already in 2NF.
No nonprime attribute is transitively dependent on the key.
Another way of expressing the conditions for Third Normal Form is as follows:
R is already in 2NF.
No nonprime attribute functionally determines any other nonprime attribute.
These two sets of conditions are equivalent.
Transitive Dependencies
Assume that A, B and C are sets of attributes of a relation R. Further assume that the following functional dependencies are satisfied simultaneously: A → B, B ↛ A (B does not functionally determine A), B → C, A → C and C ↛ A (C does not functionally determine A). Observe that C → B is neither prohibited nor required. If all these conditions are true, we say that attribute C is transitively dependent on attribute A. It should be clear that these functional depend
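The five conditions can be checked mechanically from a list of FDs. This sketch uses the attribute-closure algorithm, a standard technique that is not described in these slides, to decide whether X → Y follows from the given FDs. The Supplier dependencies Sno → City and City → Status are hypothetical inputs chosen for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def determines(X, Y, fds):
    return set(Y) <= closure(X, fds)

def is_transitive(A, B, C, fds):
    """Check the five conditions: A->B, not B->A, B->C, A->C, not C->A."""
    return (determines(A, B, fds) and not determines(B, A, fds)
            and determines(B, C, fds) and determines(A, C, fds)
            and not determines(C, A, fds))

# Hypothetical FDs for a Supplier relation (Sno, City, Status).
fds = [(["Sno"], ["City"]), (["City"], ["Status"])]
print(is_transitive(["Sno"], ["City"], ["Status"], fds))  # True
print(is_transitive(["Sno"], ["Status"], ["City"], fds))  # False
```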
Transitive Dependencies
Rule to Convert a Relation to Third Normal Form
Case Study
F-D Diagram
(Figure: functional dependencies among Sno, Pno, City, Qty and Status.)
Special Case
For example, consider a relation SSP (Sno, Sname, Pno, Qty).
Boyce/Codd N/F (BCNF)
BCNF states that a relation R is in Boyce/Codd N/F (BCNF) if and only if every determinant is a candidate key. Here a determinant is a simple or composite attribute on which some other attribute is fully functionally dependent.
For example, Qty is FFD on (Sno, Pno), that is, (Sno, Pno) → Qty; here (Sno, Pno) is a composite determinant.
Similarly, in Sno → Sname, Sno is a simple-attribute determinant.
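The BCNF test can be sketched by checking each determinant: it must determine every attribute of the relation, and no proper subset of it may do so. The sketch again relies on the attribute-closure algorithm (a standard technique not covered in these slides) and takes as input the two dependencies from the SSP example above, (Sno, Pno) → Qty and Sno → Sname, assuming each listed FD already has a minimal determinant.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_candidate_key(attrs, all_attrs, fds):
    """attrs determines every attribute, and no proper subset of attrs does."""
    if closure(attrs, fds) != set(all_attrs):
        return False
    return not any(
        closure(subset, fds) == set(all_attrs)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

def bcnf_violations(all_attrs, fds):
    """Determinants that are not candidate keys; an empty list means BCNF."""
    return [lhs for lhs, _ in fds if not is_candidate_key(lhs, all_attrs, fds)]

ssp = ["Sno", "Sname", "Pno", "Qty"]
fds = [(("Sno",), ("Sname",)), (("Sno", "Pno"), ("Qty",))]
print(bcnf_violations(ssp, fds))  # [('Sno',)]
```

Sno is a determinant but not a candidate key of SSP, so SSP is not in BCNF.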
Overlapping of Candidate keys
In order to show the difference between 3NF and BCNF, relations having overlapping of candidate keys are considered in detail.
Two candidate keys overlap if they involve two or more attributes each and have an attribute in common.
(Id_no, Item_No) → Quantity
(Name, Item_No) → Quantity
Item_No → Name
Name → Item_No
Another Case
SSP (Sno, Sname, Pno, Qty)
Here, let us suppose that Sname (supplier name) is unique for each Sno (supplier number) as shown below:
Sno Sname Pno Qty
S1 Rahat P1 300
S2 Raju P2 200
S1 Rahat P3 100
S2 Raju P1 200
DENORMALIZATION
Denormalization is the process of attempting to optimize the performance of a database by adding redundant data or by grouping data. In some cases, denormalization helps cover up the inefficiencies inherent in relational database software.
A normalized design will often store different but related pieces of information in separate logical tables (called relations). If these relations are stored physically as separate disk files, completing a database query that draws information from several relations (a join operation) can be slow. If many relations are joined, it may be prohibitively slow.
Uses of Denormalization
Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). OLTP applications are characterized by a high volume of small transactions, such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate Business Intelligence applications. Specifically, dimensional tables in a star schema often contain denormalized data.
Denormalization is helpful in retrieval-based applications.
Available Web Resources
Website www.parteek.co.cc
Group http://groups.google.com/group/parteek-bhatia
Personal Blog www.parteekbhatia.co.cc
Thanks
All the Best