View
216
Download
0
Tags:
Embed Size (px)
Citation preview
Relational Data Analysis
Learning outcomes understand the process of normalisation; perform Relational Data Analysis; recognise the importance of normalised databases; recognise first, second and third normal forms; augment and formalise your understanding of Logical
Data Modelling; decide when in the development cycle to perform
normalisation; evaluate the integrity of a data structure.
Relational Data AnalysisConcepts
Relations, Tables and Entity Types Repeating Groups and Levels Functional Dependencies Advantages of Normalisation Unnormalised Data First, Second and Third Normal Forms Rationalising 3NF Tables Converting 3NF Tables into an LDS
Relational Data AnalysisRelations and Tables
1970s
Edgar Codd
IBM
Supplier Name Supplier Address SupplierTel. No.
SupCon
SRW 115 Lancelot St 020-75630254 John
Off Beat Recordings
12 High St 020-8… ...
Bella Sonic Lake Industrial Estate
... ...
Supplier Number
1463
3621
2327
6762 … 3 Lot’s Corner ... ...
Rows
Columns
Attribute Names
Primary Key
Relational Data AnalysisTables and Entity Types
SupplierSupplier Number
Supplier Name
Supplier Address
Supplier Tel. Number
Supplier Contact Name
Supplier
Supplier NumberSupplier NameSupplier AddressSupplier Tel. NumberSupplier Contact Name
SUPPLIER (Supplier Number, Supplier Name, Supplier Address,Supplier Tel. Number, Supplier Contact Name)
Relational Data AnalysisRepeating Groups and Levels
Product’s Suppliers List
Product Number: 993201Product Name: Unbranded Blank 3hr Video TapesProduct Type Code: BVProduct Type Name: Blanc Video
Supplier Supplier Supplier’s Cost Main SupplierNumber Name Product Ref No Price Y/N 1463 SRW 3HVHS 54p Y 3628 Videos Are Us 438893 57p N 2327 Bella Sonic 3485VHS/3 53p N
Product’s Suppliers List
Product Number: 993201Product Name: Unbranded Blank 3hr Video TapesProduct Type Code: BVProduct Type Name: Blanc Video
Supplier Supplier Supplier’s Cost Main SupplierNumber Name Product Ref No Price Y/N 1463 SRW 3HVHS 54p Y 3628 Videos Are Us 438893 57p N 2327 Bella Sonic 3485VHS/3 53p N
Relational Data AnalysisRepeating Groups and Levels
PRODUCT’S SUPPLIERS
Product Number
Product Name
Product Type Code
Product Type Name
Supplier Number
Supplier Name
S/P Ref. Number
Cost Price
Main Supplier Y/N
level
1
1
1
1
2
2
2
2
2
Relational Data AnalysisFunctional Dependencies
An attribute X is said to be functionally dependent on an attribute Y if each value of Y is associated with only one value of X
For example, each product is of one product type. This means that each product number is associated with only one product type code. Product type code is therefore functionally dependent on product number. The opposite is not true as each product type code may be associated with many product numbers
Relational Data AnalysisFunctional Dependencies
Another way of phrasing functional dependency is to say that the value of X can be determined from the value of Y, or that Y functionally determines X
So product number functionally determines the value of product type code, or the value of product type code can be determined from the value of the product number, i.e. given the value of a product number we can always establish the value of the associated product type code.
Relational Data AnalysisNormalisation
The process of normalisation involves applying a series of refinements to groups of data items in order to produce tables that conform to specified standards, known as normal forms
Unnormalised tables are converted to First Normal Form by removing repeating groups into separate tables. Second and Third Normal Forms are achieved by reducing and splitting tables so that the only functional dependencies which exist are between the primary keys and the remaining non-key attributes.
Relational Data AnalysisAdvantages of Normalisation
Before describing normalisation in detail it is worth mentioning some of its advantages briefly. Data in Third Normal Form (3NF) consists of tables of closely associated attributes which are entirely dependent on ‘the key, the whole key, and nothing but the key’
This has the effect of minimising data duplication across different tables, thereby resolving many of the problems associated with data redundancy. In particular it should reduce the incidence of ‘update anomalies’
Relational Data AnalysisAdvantages of Normalisation
Update anomalies is the collective term for problems with modifying, inserting and deleting data from a database
These can be illustrated by considering the ‘unnormalised’ contents of the list of a product’s suppliers again
If we were to implement this data structure as it stands, and to use it as the only place in which product details were stored we would encounter the following problems:
Relational Data AnalysisAdvantages of Normalisation
Insertion Anomalies. No new suppliers could be added to the system without adding a product
Deletion Anomalies. If the last remaining product for a given supplier were deleted, then all information on that supplier would be lost
Amendment Anomalies. Any change to a supplier’s details (e.g. to the telephone number) would mean that every product for that supplier would need amending to keep it in line.
Relational Data Analysis UNF to 1NF
Product Number
Product Name
Product Type Code
Product Type Name
Supplier Number
Supplier Name
S/P Ref. Number
Cost Price
Main Supplier Y/N
1
1
1
1
2
2
2
2
2
UNF
Choose data items
Identify keys
Split groups
levelProduct NumberProduct NameProduct Type CodeProduct Type Name
Product NumberSupplier NumberSupplier NameS/P Ref. NumberCost PriceMain Supplier Y/N
1NF
Note key
Relational Data Analysis 1NF to 2NF
1NF
Does this attribute depend on the whole of the primary key?
Product NumberProduct NameProduct Type CodeProduct Type Name
2NFProduct NumberProduct NameProduct Type CodeProduct Type Name
Product NumberSupplier NumberSupplier NameS/P Ref. NumberCost PriceMain Supplier Y/N
Product NumberSupplier NumberS/P Ref. NumberCost PriceMain Supplier Y/N
Supplier NumberSupplier Name
Relational Data Analysis 2NF to 3NF
2NF
Is this attribute dependent on any other non-key attribute(s)?
Product Number Product Name*Product Type Code
Product Type Code Product Type Name
3NFProduct NumberProduct NameProduct Type CodeProduct Type Name
Product NumberSupplier NumberS/P Ref. NumberCost PriceMain Supplier Y/N
Supplier NumberSupplier Name
Product NumberSupplier NumberS/P Ref. NumberCost PriceMain Supplier Y/N
Supplier NumberSupplier Name
Note foreign
key
Relational Data AnalysisNaming the 3NF tables
3NF
PRODUCT
PRODUCT TYPE
SUPPLIER PRODUCT
SUPPLIER
Table Names Product Number Product Name*Product Type Code
Product Type Code Product Type Name
Product Number Supplier Number S/P Ref. Number Cost Price Main Supplier Y/N
Supplier Number Supplier Name
Relational Data AnalysisRationalising 3NF tables
Once we have carried out normalisation on a number of Functions we will have several sets of tables in 3NF, which we now rationalise into a single, larger set
Any tables that share a primary key should be merged, as should tables with matching candidate keys
We will also look for attributes which now act as foreign keys when compared with primary keys in other 3NF sets. A little care is needed to ensure that any synonyms or homonyms are identified, as failure to do so could lead to missing or spurious merges.
Relational Data AnalysisConverting 3NF tables into LDSs
3NF
PRODUCT Product Number Product Name*Product Type Code
Product Type Code Product Type Name
Product Number Supplier Number S/P Ref. Number Cost Price Main Supplier Y/N
Supplier Number Supplier Name
PRODUCT
Product Number *Product Type Code
PRODUCT TYPE
Product Type Code
SUPPLIER
Supplier Number
SUPPLIER PRODUCT
*Supplier Number *Product Number
PRODUCT TYPE
SUPPLIERPRODUCT
SUPPLIER
Small Keys Grab Large Keys
Crows Feet Grab Asterisks
Represent each table as an entity type box
List primary and foreign key attributes
Relational Data AnalysisComparing LDSs
We now compare our two data structures (the extract we just produced using RDA with the Required System LDM we have produced earlier in the development and decide whether any discrepancies are due to errors in Logical Data Modelling or whether they represent redundant information resulting from RDA
In practice there may be large numbers of entities involved in the comparison, so a fair amount of time is likely to be spent in identifying corresponding entities in the two models. Probably the best starting point is to look for common attributes, in particular common primary keys or candidate keys.
Relational Data AnalysisSummary
Relational Data Analysis (RDA) is based on material published in the 1970s by Edgar Codd of IBM, proposing the application of mathematical set theory and algebra to the organisation of data
RDA is used to create data model extracts from collections of individual data items, which can then be used to enhance or confirm the Required System LDM and to provide the basis for database design
Relational Data AnalysisSummary
The process of normalisation involves applying a series of refinements to groups of data items in order to produce tables that conform to specified standards, known as normal forms
Data in Third Normal Form (3NF) consists of tables of closely associated attributes which are entirely dependent on ‘the key, the whole key, and nothing but ‘the key, the whole key, and nothing but the key’the key’. This has the effect of minimising data duplication across different tables, thereby resolving many of the problems associated with data redundancy
Relational Data AnalysisSummary
Finally, the only way to learn normalisation is to practise.
This is specially true in order to understand the process of rationalisation and how to compare LDSs
The Place of Relational Data AnalysisD
ecis
ion
Str
uct
ure
Dec
isio
n S
tru
ctu
re
Pol
icie
s an
d P
roce
du
res
Pol
icie
s an
d P
roce
du
res
Use
r O
rgan
isat
ion
Use
r O
rgan
isat
ion
InvestigationInvestigation
ConstructionConstruction
SpecificationSpecification
Conceptual Model
Internal design
External Design
BAM
RD
WPM
DFM
FD
LDMRDA
BS
O