40
M1G505190 Introduction to Database Development 4. Improving the database design

M1G505190 Introduction to Database Development 4. Improving the database design

Embed Size (px)

Citation preview

Page 1: M1G505190 Introduction to Database Development 4. Improving the database design

M1G505190

Introduction to Database Development

4. Improving the database design

Page 2: M1G505190 Introduction to Database Development 4. Improving the database design

Representing the data model

You’ve now seen how to take a data model and represent it in a relational database. To recap: each class is represented by a table each object becomes a row in the table each table has a primary key which is a field or

set of fields which uniquely identifies each row a relationship between two tables is represented

by the foreign key field(s) which refer to the primary key of the related table

Introduction to Database Development 4. Improving the database design2

Page 3: M1G505190 Introduction to Database Development 4. Improving the database design

Database normalisation

However, there may still be problems which can result in data in the database becoming inaccurate or difficult to retrieve

We need to do some further checking of the database design to help prevent such problems

The process of formally checking and modifying a relational database design is called normalisation

Introduction to Database Development 4. Improving the database design3

Page 4: M1G505190 Introduction to Database Development 4. Improving the database design

Rules of normalisation

Normalisation uses a set of rules to check: that all fields are in the right tables whether we need to restructure or add tables to

the schema These rules were first proposed by E.F. Codd

in about 1970 Codd’s rules have become a key part of

relational database design

Introduction to Database Development 4. Improving the database design4

Page 5: M1G505190 Introduction to Database Development 4. Improving the database design

Using normalisation to design a database

Start with the “no-design” database Put everything you can think of into one

enormous table Following the rules of normalisation can fix

the database Data will be separated out into more

workable tables Not a recommended approach!

Introduction to Database Development 4. Improving the database design5

Page 6: M1G505190 Introduction to Database Development 4. Improving the database design

Data model and normalisation

As a general rule, a well thought-out data model tends to lead to a pretty well normalised database schema

However, any flaws in the data model design will have been translated in the database schema

Also, there may be problems which arise because of the way we have chosen to represent some features of the data model

Introduction to Database Development 4. Improving the database design6

Page 7: M1G505190 Introduction to Database Development 4. Improving the database design

Problems with un-normalised data

Problems with databases are usually a result of having attributes in the wrong tables

The solution usually involves moving attributes to different tables and creating additional tables

The GCUTours database is fairly well normalised, so we’ll look at some other examples, based on an IT Consultancy company

Introduction to Database Development 4. Improving the database design7

Page 8: M1G505190 Introduction to Database Development 4. Improving the database design

Repeated data

Assignments contains repeated data Repeated data can often become inaccurate

as values need to be entered repeatedly Which is correct contact no. for Acme Ltd.?

Would we have spotted this problem by looking at first 3 rows only?

Introduction to Database Development 4. Improving the database design8

Page 9: M1G505190 Introduction to Database Development 4. Improving the database design

Update anomalies

These are problems which arise when you try to add or remove data from a database

Update anomalies can make it impossible to get data into the database

Can cause important data to be lost from the database

Introduction to Database Development 4. Improving the database design9

Page 10: M1G505190 Introduction to Database Development 4. Improving the database design

Problems with inserting data

Possible primary key: (consultantID, clientnumber)

What if we add a new client, but don’t want to assign a consultant right away?

Can’t do this as primary key fields cannot be NULL

Introduction to Database Development 4. Improving the database design10

Page 11: M1G505190 Introduction to Database Development 4. Improving the database design

Problems with deleting data

Consultant 1001 is no longer to work for SuperPrint

We want to delete that assignment Whoops! We have just deleted all the

information we have in the database about SuperPrint

Introduction to Database Development 4. Improving the database design11

Page 12: M1G505190 Introduction to Database Development 4. Improving the database design

A better design

All of these problems arise because some of the fields in the Assignments table should really not be in that table

What we need is an additional table to store information about clients

Introduction to Database Development 4. Improving the database design12

Page 13: M1G505190 Introduction to Database Development 4. Improving the database design

Design flaws

In this example, the problem has arisen because the data model was flawed

The purpose of the Assignment entity simply wasn’t clear enough

Had we identified the need for a Client entity when designing the data model, and then the database problems would not have arisen

However, normalisation will pick up problems like this too

Introduction to Database Development 4. Improving the database design13

Page 14: M1G505190 Introduction to Database Development 4. Improving the database design

Functional dependencies

Normalisation is based on the idea of a functional dependency

The following statement is a functional dependency in the Consultants table

If we know the value of a consultant’s consultantID we

can tell the value of his or her last name 

Introduction to Database Development 4. Improving the database design14

Page 15: M1G505190 Introduction to Database Development 4. Improving the database design

Functional dependencies

Does it work the other way round? No If we know a consultant’s last name is Jones,

can we say for certain what her ID is? It looks like it from the data shown. However, the ID is unique and the last name

is not, so we could add another row later for a consultant called Bob Jones with ID 1006

Given the last name Jones the ID could be either 1002 or 1006

Introduction to Database Development 4. Improving the database design15

Page 16: M1G505190 Introduction to Database Development 4. Improving the database design

Think about these examples

Introduction to Database Development 4. Improving the database design16

Page 17: M1G505190 Introduction to Database Development 4. Improving the database design

Functional dependencies and keys

Keys are closely related to functional dependencies as follows: The key fields of a table should functionally

determine all the other fields in the table lastname does not functionally determine

consultantID, so it cannot be a key field However:

consultantID → (firstname, lastname) consultantID is a key

Introduction to Database Development 4. Improving the database design17

Page 18: M1G505190 Introduction to Database Development 4. Improving the database design

Primary keys

If we know that the ID is 1001 and the last name is Smith, can we say for sure what the first name is?

Yes we can – we can write this functional dependency as: (consultantID, lastname) → firstname

So the combination of these two fields is a key

But...

Introduction to Database Development 4. Improving the database design18

Page 19: M1G505190 Introduction to Database Development 4. Improving the database design

Primary keys

..we don’t actually need lastname in order to know firstname

The ID is sufficient as it is itself a key Therefore, this isn’t a primary key – a

primary key must have no unnecessary fields. The rule is:

A primary key has no subset of its fields that is also a key

Introduction to Database Development 4. Improving the database design19

Page 20: M1G505190 Introduction to Database Development 4. Improving the database design

Foreign keys

Remember that foreign keys must match primary keys

If we defined (consultantID, lastname) as the primary key of Consultants would need an additional lastname field in Assignments

This is an example of redundant data

Introduction to Database Development 4. Improving the database design20

need both fields as foreign keyneed both fields as foreign key

Page 21: M1G505190 Introduction to Database Development 4. Improving the database design

Normalisation and normal forms

Now that we know all about functional dependencies and primary keys, we are ready to do some normalisation

There are several levels of normalisation, called normal forms

We proceed through the forms, refining the tables and addressing additional problems each time

Introduction to Database Development 4. Improving the database design21

Page 22: M1G505190 Introduction to Database Development 4. Improving the database design

First Normal Form (1NF)

Ensures that we are not trying to cram several pieces of data into a single field the data in a table should be atomic

The following example is not in 1NF How would we query for all consultants with

‘databases’ skills?

Introduction to Database Development 4. Improving the database design22

Page 23: M1G505190 Introduction to Database Development 4. Improving the database design

Not the solution

Each field only contains one piece of information now

What if Jane Lee adds web design to her skills? Would have to add a new field to the table

To be properly atomic, a table can’t have multiple fields with the same kind of data

Introduction to Database Development 4. Improving the database design23

Page 24: M1G505190 Introduction to Database Development 4. Improving the database design

1NF rule

General rule for checking for 1NF: 

A table is not in first normal form if it contains data which is not atomic – that is, it keeps multiple values for a piece of information

Introduction to Database Development 4. Improving the database design24

Page 25: M1G505190 Introduction to Database Development 4. Improving the database design

1NF fix

Normalisation gives us rules – it also gives us ways to fix tables which don’t obey the rules

The fix for a table not in 1NF is:

Remove the multivalued information from the table. Create a new table with that information and the primary key of the original table.

Introduction to Database Development 4. Improving the database design25

Page 26: M1G505190 Introduction to Database Development 4. Improving the database design

1NF solution

Now have two tables:

This is a one-to-many relationship

Introduction to Database Development 4. Improving the database design26

Page 27: M1G505190 Introduction to Database Development 4. Improving the database design

Second Normal Form (2NF)

Getting to first normal form is a good start, but there can still be plenty of problems lurking

The following table is in 1NF:

Introduction to Database Development 4. Improving the database design27

Primary key is(consultantID, skill)

Table is prone to updateanomalies

Primary key is(consultantID, skill)

Table is prone to updateanomalies

Page 28: M1G505190 Introduction to Database Development 4. Improving the database design

2NF rule

General rule for checking for 2NF:

A table is in second normal form if it is in first normal form AND we need ALL the fields in the key to determine to values of the non-key fields

Introduction to Database Development 4. Improving the database design28

Page 29: M1G505190 Introduction to Database Development 4. Improving the database design

Why is the example not in 2NF?

Value of hourlyrate is functionally dependent only on the value of skill only part of the primary key. skill → hourlyrate

hourlyrate does not depend on consultantID database work is charged at £31 no matter who is

doing the work So, we don’t need all the fields in the key to

determine the value of the non-key field, hourlyrate

Introduction to Database Development 4. Improving the database design29

Page 30: M1G505190 Introduction to Database Development 4. Improving the database design

2NF fix

The fix for a table not in 2NF is:

Remove the non-key fields that are not dependent on the whole of the primary key. Create another table with those fields and the part of the primary key they do depend on.

Introduction to Database Development 4. Improving the database design30

Page 31: M1G505190 Introduction to Database Development 4. Improving the database design

2NF solution

Create extra table Skills The Consultants table is shown here also to

illustrate the relationships between the full set of tables

Introduction to Database Development 4. Improving the database design31

Page 32: M1G505190 Introduction to Database Development 4. Improving the database design

Many-to-many relationships

Careful design of the data model would probably have led to the same set of tables

This situation might have been modelled in the design process as a many-to-many relationship between Consultant and Skill

Solution is a typical representation in the database of this type relationship

Introduction to Database Development 4. Improving the database design32

Page 33: M1G505190 Introduction to Database Development 4. Improving the database design

Third Normal Form (3NF)

Tables in 2NFcan still cause problems Now, the company has two offices and each

consultant is based in one office

2NF, as the primary key is the single field consultantID

Still a problem with repeated data Introduction to Database Development 4. Improving the database design

33

inaccurate repeated datainaccurate repeated data

Page 34: M1G505190 Introduction to Database Development 4. Improving the database design

Functional dependencies and 3NF

The problem arises because the values of address and phone are dependent on the values of more than one field given the consultantID is 1001, you know that the

address is “Cowcaddens Road” given the office is Glasgow, you also know that the

address is “Cowcaddens Road”  So:

consultantID → address, phone office → address, phone

  Introduction to Database Development 4. Improving the database design

34

office is NOT part of the primary key

Page 35: M1G505190 Introduction to Database Development 4. Improving the database design

3NF rule

General rule for checking for 3NF:

A table is in third normal form if it is in second normal form AND no non-key fields depend on any fields that are not the primary key

Introduction to Database Development 4. Improving the database design35

Page 36: M1G505190 Introduction to Database Development 4. Improving the database design

3NF fix

The fix for a table not in 3NF is:

Remove the non-key fields that are dependent on a field (or fields) that is not the primary key. Create another table with those fields and the field(s) that they do depend on

Introduction to Database Development 4. Improving the database design36

Page 37: M1G505190 Introduction to Database Development 4. Improving the database design

3NF solution

The values of address and phone for each office are stored once only

It’s not possible to enter an invalid value for office in Consultants because each value must match a value in the Offices table

Introduction to Database Development 4. Improving the database design37

Page 38: M1G505190 Introduction to Database Development 4. Improving the database design

Summing up

A table is based on: the key; the whole key; and nothing but the key (so help me Codd)

Introduction to Database Development 4. Improving the database design38

Page 39: M1G505190 Introduction to Database Development 4. Improving the database design

Higher normal forms

For most cases, normalising to third normal form will take care of the likely problems

There are two more normal forms, 4NF and 5NF, which deal with more subtle problems

There is also Boyce-Codd normal form, which privdes a single statement which approximately encapsulates the first three normal forms

Introduction to Database Development 4. Improving the database design39

Page 40: M1G505190 Introduction to Database Development 4. Improving the database design

Why normalise?

To check the database schema and highlight any flaws in the data model design or the way it has been represented

or To fix a database which has already been

created without the aid of a suitable data modelling process

Introduction to Database Development 4. Improving the database design40