Relational Data Analysis

Learning outcomes:

  • understand the process of normalisation;
  • perform Relational Data Analysis;
  • recognise the importance of normalised databases;
  • recognise first, second and third normal forms;
  • augment and formalise your understanding of Logical Data Modelling;
  • decide when in the development cycle to perform normalisation;
  • evaluate the integrity of a data structure.


Relational Data Analysis: Concepts

  • Relations, Tables and Entity Types
  • Repeating Groups and Levels
  • Functional Dependencies
  • Advantages of Normalisation
  • Unnormalised Data
  • First, Second and Third Normal Forms
  • Rationalising 3NF Tables
  • Converting 3NF Tables into an LDS

Relational Data Analysis: Relations and Tables

1970s: Edgar Codd, IBM

Supplier Number  Supplier Name        Supplier Address        Supplier Tel. No.  SupCon
1463             SRW                  115 Lancelot St         020-75630254       John
3621             Off Beat Recordings  12 High St              020-8…             ...
2327             Bella Sonic          Lake Industrial Estate  ...                ...
6762             …                    3 Lot’s Corner          ...                ...

Diagram labels: Rows; Columns; Attribute Names (the column headings); Primary Key (Supplier Number).

Relational Data Analysis: Tables and Entity Types

Entity type: Supplier

Attributes:

  • Supplier Number
  • Supplier Name
  • Supplier Address
  • Supplier Tel. Number
  • Supplier Contact Name

The same structure written as a relation:

SUPPLIER (Supplier Number, Supplier Name, Supplier Address, Supplier Tel. Number, Supplier Contact Name)

Relational Data Analysis: Repeating Groups and Levels

Product’s Suppliers List

Product Number:    993201
Product Name:      Unbranded Blank 3hr Video Tapes
Product Type Code: BV
Product Type Name: Blank Video

Supplier Number  Supplier Name  Supplier’s Product Ref No  Cost Price  Main Supplier Y/N
1463             SRW            3HVHS                      54p         Y
3628             Videos Are Us  438893                     57p         N
2327             Bella Sonic    3485VHS/3                  53p         N

PRODUCT’S SUPPLIERS

Attribute            Level
Product Number       1
Product Name         1
Product Type Code    1
Product Type Name    1
Supplier Number      2
Supplier Name        2
S/P Ref. Number      2
Cost Price           2
Main Supplier Y/N    2

Relational Data Analysis: Functional Dependencies

An attribute X is said to be functionally dependent on an attribute Y if each value of Y is associated with only one value of X.

For example, each product is of one product type. This means that each product number is associated with only one product type code. Product type code is therefore functionally dependent on product number. The opposite is not true, as each product type code may be associated with many product numbers.

Another way of phrasing functional dependency is to say that the value of X can be determined from the value of Y, or that Y functionally determines X.

So product number functionally determines the value of product type code, or the value of product type code can be determined from the value of the product number, i.e. given the value of a product number we can always establish the value of the associated product type code.
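The definition above can be checked mechanically against sample data. The following is a minimal sketch, not part of the original material: the function name, the field names, and the second and third product rows are hypothetical. Note that sample rows can only refute a dependency; a passing check does not prove that the dependency holds in the business rules.

```python
def functionally_determines(rows, determinant, dependent):
    """Return True if every value of `determinant` is paired with only one
    value of `dependent` in the supplied rows."""
    seen = {}
    for row in rows:
        key = row[determinant]
        value = row[dependent]
        if key in seen and seen[key] != value:
            return False  # one determinant value, two dependent values
        seen[key] = value
    return True


# Product 993201 / type BV comes from the slides; the other rows are invented.
products = [
    {"product_number": 993201, "product_type_code": "BV"},
    {"product_number": 993202, "product_type_code": "BV"},
    {"product_number": 993203, "product_type_code": "CD"},
]

print(functionally_determines(products, "product_number", "product_type_code"))
print(functionally_determines(products, "product_type_code", "product_number"))
```

The first call checks that product number determines product type code; the second shows that the reverse fails, because type BV is associated with more than one product number.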

Relational Data Analysis: Normalisation

The process of normalisation involves applying a series of refinements to groups of data items in order to produce tables that conform to specified standards, known as normal forms.

Unnormalised tables are converted to First Normal Form by removing repeating groups into separate tables. Second and Third Normal Forms are achieved by reducing and splitting tables so that the only functional dependencies which exist are between the primary keys and the remaining non-key attributes.

Relational Data Analysis: Advantages of Normalisation

Before describing normalisation in detail it is worth mentioning some of its advantages briefly. Data in Third Normal Form (3NF) consists of tables of closely associated attributes which are entirely dependent on ‘the key, the whole key, and nothing but the key’.

This has the effect of minimising data duplication across different tables, thereby resolving many of the problems associated with data redundancy. In particular it should reduce the incidence of ‘update anomalies’.

Update anomalies is the collective term for problems with modifying, inserting and deleting data from a database.

These can be illustrated by considering the ‘unnormalised’ contents of the list of a product’s suppliers again.

If we were to implement this data structure as it stands, and to use it as the only place in which product details were stored, we would encounter the following problems:

  • Insertion Anomalies. No new suppliers could be added to the system without adding a product.
  • Deletion Anomalies. If the last remaining product for a given supplier were deleted, then all information on that supplier would be lost.
  • Amendment Anomalies. Any change to a supplier’s details (e.g. to the telephone number) would mean that every product for that supplier would need amending to keep it in line.
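The deletion and amendment anomalies can be illustrated with a small sketch. It assumes the unnormalised list is stored as one flat row per product and supplier pair; the helper names, the field names and product 993202 are hypothetical and not taken from the original material.

```python
# One row per product/supplier pair; supplier details are repeated on each row.
rows = [
    {"product_number": 993201, "supplier_number": 1463, "supplier_name": "SRW"},
    {"product_number": 993201, "supplier_number": 3628, "supplier_name": "Videos Are Us"},
    {"product_number": 993202, "supplier_number": 1463, "supplier_name": "SRW"},  # invented row
]


def rename_supplier(rows, supplier_number, new_name):
    """Amendment anomaly: every row that carries the supplier must be changed."""
    changed = 0
    for row in rows:
        if row["supplier_number"] == supplier_number:
            row["supplier_name"] = new_name
            changed += 1
    return changed


def delete_product(rows, product_number):
    """Deletion anomaly: supplier facts held only on this product's rows vanish."""
    return [r for r in rows if r["product_number"] != product_number]


def known_suppliers(rows):
    """Supplier numbers that the structure still knows about."""
    return {r["supplier_number"] for r in rows}


working = [dict(r) for r in rows]
print(rename_supplier(working, 1463, "SRW Ltd"))     # rows touched for one change
print(known_suppliers(delete_product(rows, 993201)))  # supplier 3628 is no longer known
```

In a normalised design the supplier name would be held once in a supplier table, so the rename would touch one row and deleting a product would leave the supplier intact.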

Relational Data Analysis: UNF to 1NF

Steps:

  • Choose data items
  • Identify keys
  • Split groups

UNF

Attribute            Level
Product Number       1
Product Name         1
Product Type Code    1
Product Type Name    1
Supplier Number      2
Supplier Name        2
S/P Ref. Number      2
Cost Price           2
Main Supplier Y/N    2

1NF

  • (Product Number, Product Name, Product Type Code, Product Type Name)
  • (Product Number, Supplier Number, Supplier Name, S/P Ref. Number, Cost Price, Main Supplier Y/N)

Note key: the table formed from the repeating group carries Product Number down from level 1, giving it the compound key Product Number + Supplier Number.
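The move from UNF to 1NF can be sketched as splitting the level-2 repeating group into its own table while copying the level-1 key down into each row. This is an illustrative sketch only: the function name and the snake_case field names are assumptions, while the values come from the Product's Suppliers List.

```python
unf_record = {
    "product_number": 993201,
    "product_name": "Unbranded Blank 3hr Video Tapes",
    "product_type_code": "BV",
    "product_type_name": "Blank Video",
    "suppliers": [  # the level-2 repeating group
        {"supplier_number": 1463, "supplier_name": "SRW",
         "sp_ref_number": "3HVHS", "cost_price": "54p", "main_supplier": "Y"},
        {"supplier_number": 3628, "supplier_name": "Videos Are Us",
         "sp_ref_number": "438893", "cost_price": "57p", "main_supplier": "N"},
        {"supplier_number": 2327, "supplier_name": "Bella Sonic",
         "sp_ref_number": "3485VHS/3", "cost_price": "53p", "main_supplier": "N"},
    ],
}


def split_repeating_group(record, group_field, parent_key):
    """Return (parent row, child rows); each child row carries the parent key."""
    parent = {k: v for k, v in record.items() if k != group_field}
    children = [{parent_key: record[parent_key], **item}
                for item in record[group_field]]
    return parent, children


product, product_supplier = split_repeating_group(
    unf_record, "suppliers", "product_number")
print(product)
for row in product_supplier:
    print(row)
```

Each child row is identified by the pair (product_number, supplier_number), which corresponds to the compound key of the second 1NF table.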

Relational Data Analysis: 1NF to 2NF

Test applied to each non-key attribute: does this attribute depend on the whole of the primary key?

1NF

  • (Product Number, Product Name, Product Type Code, Product Type Name)
  • (Product Number, Supplier Number, Supplier Name, S/P Ref. Number, Cost Price, Main Supplier Y/N)

2NF

  • (Product Number, Product Name, Product Type Code, Product Type Name)
  • (Product Number, Supplier Number, S/P Ref. Number, Cost Price, Main Supplier Y/N)
  • (Supplier Number, Supplier Name)
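The 1NF to 2NF step can be sketched in code: Supplier Name depends only on Supplier Number, which is one part of the compound key, so it moves to its own table. The function name and field names below are assumptions for illustration, and the row for product 993202 is invented to show the duplication being removed.

```python
first_nf = [
    {"product_number": 993201, "supplier_number": 1463, "supplier_name": "SRW",
     "sp_ref_number": "3HVHS", "cost_price": "54p", "main_supplier": "Y"},
    {"product_number": 993201, "supplier_number": 3628, "supplier_name": "Videos Are Us",
     "sp_ref_number": "438893", "cost_price": "57p", "main_supplier": "N"},
    {"product_number": 993201, "supplier_number": 2327, "supplier_name": "Bella Sonic",
     "sp_ref_number": "3485VHS/3", "cost_price": "53p", "main_supplier": "N"},
    # Hypothetical second product from the same supplier (not in the slides).
    {"product_number": 993202, "supplier_number": 1463, "supplier_name": "SRW",
     "sp_ref_number": "4HVHS", "cost_price": "60p", "main_supplier": "Y"},
]


def split_partial_dependency(rows, part_key, dependent_attrs):
    """Move attributes that depend on only `part_key` into a separate table."""
    lookup = {}
    remainder = []
    for row in rows:
        # One row per distinct part-key value in the new table.
        lookup[row[part_key]] = {part_key: row[part_key],
                                 **{a: row[a] for a in dependent_attrs}}
        # The original table keeps everything except the moved attributes.
        remainder.append({k: v for k, v in row.items()
                          if k not in dependent_attrs})
    return remainder, list(lookup.values())


supplier_product, supplier = split_partial_dependency(
    first_nf, "supplier_number", ["supplier_name"])
print(len(supplier_product), len(supplier))
```

The supplier name is now stored once per supplier rather than once per product and supplier pair.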

Relational Data Analysis: 2NF to 3NF

Test applied to each non-key attribute: is this attribute dependent on any other non-key attribute(s)?

2NF

  • (Product Number, Product Name, Product Type Code, Product Type Name)
  • (Product Number, Supplier Number, S/P Ref. Number, Cost Price, Main Supplier Y/N)
  • (Supplier Number, Supplier Name)

3NF

  • (Product Number, Product Name, *Product Type Code)
  • (Product Type Code, Product Type Name)
  • (Product Number, Supplier Number, S/P Ref. Number, Cost Price, Main Supplier Y/N)
  • (Supplier Number, Supplier Name)

Note foreign key: the asterisk on *Product Type Code marks it as a foreign key.
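As a sketch of the 2NF to 3NF step, Product Type Name depends on Product Type Code, a non-key attribute, so it moves to its own table and Product Type Code stays behind as a foreign key. The function names and field names are assumptions, and the second product row is invented. The `rejoin` helper checks that the split loses no information.

```python
product_2nf = [
    {"product_number": 993201, "product_name": "Unbranded Blank 3hr Video Tapes",
     "product_type_code": "BV", "product_type_name": "Blank Video"},
    # Hypothetical second product of the same type (not in the slides).
    {"product_number": 993202, "product_name": "Hypothetical 4hr Video Tapes",
     "product_type_code": "BV", "product_type_name": "Blank Video"},
]


def to_3nf(rows):
    """Split out the attribute that depends on a non-key attribute."""
    product = [{"product_number": r["product_number"],
                "product_name": r["product_name"],
                "product_type_code": r["product_type_code"]}  # foreign key
               for r in rows]
    product_type = {r["product_type_code"]: r["product_type_name"] for r in rows}
    return product, product_type


def rejoin(product, product_type):
    """Rebuild the 2NF rows by following the foreign key."""
    return [{**p, "product_type_name": product_type[p["product_type_code"]]}
            for p in product]


product, product_type = to_3nf(product_2nf)
print(product_type)
print(rejoin(product, product_type) == product_2nf)
```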

Relational Data Analysis: Naming the 3NF tables

3NF tables with their table names:

  • PRODUCT (Product Number, Product Name, *Product Type Code)
  • PRODUCT TYPE (Product Type Code, Product Type Name)
  • SUPPLIER PRODUCT (Product Number, Supplier Number, S/P Ref. Number, Cost Price, Main Supplier Y/N)
  • SUPPLIER (Supplier Number, Supplier Name)

Relational Data Analysis: Rationalising 3NF tables

Once we have carried out normalisation on a number of Functions we will have several sets of tables in 3NF, which we now rationalise into a single, larger set.

Any tables that share a primary key should be merged, as should tables with matching candidate keys.

We will also look for attributes which now act as foreign keys when compared with primary keys in other 3NF sets. A little care is needed to ensure that any synonyms or homonyms are identified, as failure to do so could lead to missing or spurious merges.
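The merge rule can be sketched as follows, assuming each 3NF table is represented as a pair of (primary key attributes, non-key attributes). The function name, this representation and the second table set are hypothetical; its attribute names are borrowed from the SUPPLIER relation shown earlier. Matching here is by exact attribute name, so synonyms and homonyms would still need to be resolved by hand beforehand.

```python
def rationalise(table_sets):
    """Merge tables from several 3NF sets when their primary keys match."""
    merged = {}
    for tables in table_sets:
        for key, attrs in tables:
            key = frozenset(key)  # attribute order within a key is irrelevant
            pooled = merged.setdefault(key, [])
            for attr in attrs:
                if attr not in pooled:
                    pooled.append(attr)
    return merged


# Set produced from the Product's Suppliers List.
from_product_list = [
    (["Supplier Number"], ["Supplier Name"]),
    (["Product Number", "Supplier Number"],
     ["S/P Ref. Number", "Cost Price", "Main Supplier Y/N"]),
]
# Hypothetical set produced from another Function.
from_supplier_details = [
    (["Supplier Number"],
     ["Supplier Name", "Supplier Address", "Supplier Tel. Number"]),
]

result = rationalise([from_product_list, from_supplier_details])
print(result[frozenset(["Supplier Number"])])
```

The two tables keyed on Supplier Number collapse into one table holding the union of their attributes, while the table with the compound key is left unchanged.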

Relational Data Analysis: Converting 3NF tables into LDSs

3NF

  • PRODUCT (Product Number, Product Name, *Product Type Code)
  • PRODUCT TYPE (Product Type Code, Product Type Name)
  • SUPPLIER PRODUCT (Product Number, Supplier Number, S/P Ref. Number, Cost Price, Main Supplier Y/N)
  • SUPPLIER (Supplier Number, Supplier Name)

Steps:

  • Represent each table as an entity type box
  • List primary and foreign key attributes

Entity type boxes with their key attributes:

  • PRODUCT: Product Number, *Product Type Code
  • PRODUCT TYPE: Product Type Code
  • SUPPLIER: Supplier Number
  • SUPPLIER PRODUCT: *Supplier Number, *Product Number

Rules for drawing the relationships:

  • Small Keys Grab Large Keys
  • Crow’s Feet Grab Asterisks
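One possible reading of the two drawing rules is that every asterisked attribute points at the table whose simple primary key it matches, with the crow's foot (the "many" end) placed on the table that holds the asterisk. The sketch below encodes that reading; it is an interpretation, not a rule stated in the original material, and the function name and dictionary layout are hypothetical.

```python
tables = {
    "PRODUCT": {"key": ["Product Number"],
                "foreign": ["Product Type Code"]},
    "PRODUCT TYPE": {"key": ["Product Type Code"], "foreign": []},
    "SUPPLIER": {"key": ["Supplier Number"], "foreign": []},
    "SUPPLIER PRODUCT": {"key": ["Supplier Number", "Product Number"],
                         "foreign": ["Supplier Number", "Product Number"]},
}


def derive_relationships(tables):
    """Return sorted (master, detail) pairs: one master to many details."""
    # Tables with a single-attribute key can act as masters for that attribute.
    owners = {t["key"][0]: name
              for name, t in tables.items() if len(t["key"]) == 1}
    relationships = []
    for detail, t in tables.items():
        for attr in t["foreign"]:
            master = owners.get(attr)
            if master is not None and master != detail:
                relationships.append((master, detail))
    return sorted(relationships)


for master, detail in derive_relationships(tables):
    print(f"{master} (one) --< {detail} (many)")
```

Under this reading, SUPPLIER PRODUCT becomes the link entity between PRODUCT and SUPPLIER, and PRODUCT TYPE is the master of PRODUCT.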

Relational Data Analysis: Comparing LDSs

We now compare our two data structures (the extract we have just produced using RDA and the Required System LDM we produced earlier in the development) and decide whether any discrepancies are due to errors in Logical Data Modelling or whether they represent redundant information resulting from RDA.

In practice there may be large numbers of entities involved in the comparison, so a fair amount of time is likely to be spent in identifying corresponding entities in the two models. Probably the best starting point is to look for common attributes, in particular common primary keys or candidate keys.
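The suggested starting point, matching entities on common primary keys, can be sketched as below. The function name and the contents of the Required System LDM are hypothetical (including the entity name PRODUCT SUPPLIER); only the RDA extract comes from the worked example. Entities left unmatched are the ones that need manual investigation.

```python
def match_entities(rda_model, ldm_model):
    """Pair entities from the two models whose primary keys are identical."""
    ldm_by_key = {frozenset(key): name for name, key in ldm_model.items()}
    matched, unmatched = {}, []
    for name, key in rda_model.items():
        partner = ldm_by_key.get(frozenset(key))
        if partner is None:
            unmatched.append(name)  # needs a closer look by the analyst
        else:
            matched[name] = partner
    return matched, unmatched


rda_extract = {
    "PRODUCT": ["Product Number"],
    "PRODUCT TYPE": ["Product Type Code"],
    "SUPPLIER": ["Supplier Number"],
    "SUPPLIER PRODUCT": ["Supplier Number", "Product Number"],
}
# Hypothetical Required System LDM: same link entity under another name,
# and no PRODUCT TYPE entity.
required_ldm = {
    "PRODUCT": ["Product Number"],
    "SUPPLIER": ["Supplier Number"],
    "PRODUCT SUPPLIER": ["Product Number", "Supplier Number"],
}

matched, unmatched = match_entities(rda_extract, required_ldm)
print(matched)
print(unmatched)
```

Matching on keys rather than names lets SUPPLIER PRODUCT pair with the differently named link entity, while PRODUCT TYPE is flagged as a discrepancy to resolve.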

Relational Data Analysis: Summary

Relational Data Analysis (RDA) is based on material published in the 1970s by Edgar Codd of IBM, proposing the application of mathematical set theory and algebra to the organisation of data.

RDA is used to create data model extracts from collections of individual data items, which can then be used to enhance or confirm the Required System LDM and to provide the basis for database design.

The process of normalisation involves applying a series of refinements to groups of data items in order to produce tables that conform to specified standards, known as normal forms.

Data in Third Normal Form (3NF) consists of tables of closely associated attributes which are entirely dependent on ‘the key, the whole key, and nothing but the key’. This has the effect of minimising data duplication across different tables, thereby resolving many of the problems associated with data redundancy.

Finally, the only way to learn normalisation is to practise.

This is especially true of the process of rationalisation and of comparing LDSs.

The Place of Relational Data Analysis

[Diagram. Labels: Decision Structure; Policies and Procedures; User Organisation; Investigation; Specification; Construction; Conceptual Model; Internal Design; External Design; BAM; RD; WPM; DFM; FD; LDM; RDA; BS; O]