Lecture 8 Logical Database Design SFDV2002 - Principles of Information Systems



Levels of Information Design

[Figure: abstraction spectrum running from High to Low across three levels: Conceptual, Logical, Physical (Business, System, Technology).]

1: Specification

Employee, Salary
Employee, Project, Role
Project, Budget

2: Implementation

CREATE TABLE department
( dept_code   CHAR(4),
  name        VARCHAR2(30) NOT NULL,
  PRIMARY KEY (dept_code),
  UNIQUE (name)
);

CREATE TABLE employee
( emp_id      NUMBER(7),
  firstnames  VARCHAR2(50) NOT NULL,
  surname     VARCHAR2(50) NOT NULL,
  phone       VARCHAR2(15),
  sex         CHAR(1) DEFAULT 'F',
  dept_code   CHAR(4) NOT NULL,
  PRIMARY KEY (emp_id),
  FOREIGN KEY (dept_code) REFERENCES department
);


Overview

• Databases
  • Choosing databases
  • Features
• Relational Model
  • Relations, keys, etc.
  • Integrity constraints
  • Referential integrity
• Transformation (ERD to database)


Databases

“… is a collection of persistent data that is used by the application systems of some given enterprise.”

• Logical organisation of data
• Requires a DBMS to be of any use

What are they good for?

Advantages over paper:
• Compact: no need for tonnes of paper files
• Speed: computers can retrieve and update information faster than humans
• Drudgery: the tedium of maintaining files is removed
• Currency: accurate, up-to-date information is available at any time
• Protection: information is better protected against unintentional loss

[Date, 2004]


Database Models

• Hierarchical
  • Tree structure: data is organised top-down
  • Suited to one-to-many relationships
  • Advantage: fast access
  • Disadvantages: non-hierarchical data retrieval is difficult; other relationships are difficult to represent; the data structure is hard to change (modify)
• Network
  • Can represent many-to-many relationships
  • Advantage: fast access
  • Disadvantage: inflexible
• Relational
• Object
  • Fits well with OOP; hybrid
• XML


Which DBMS to use?

• Database size
• Number of concurrent users: scalability
• Performance: how fast
• Integration: ability to export and import data between applications
• Features: security, etc.
• Vendor: reputation and financial stability of the vendor
• Cost


DBMS Features

• Data storage management
• Data dictionary
• Data independence
• Security management
• Multi-user access control
• Backup and recovery management
• Data integrity management
• Performance monitoring and optimisation
• Standardisation of data access


1. Data storage management (plus creation)
• Controls the storing, retrieval, and updating of data.
• Data independence: how the data is modelled is independent of the actual physical storage.
• Accessed (and defined) by the user through a query language, SQL (covered in the next lecture).

2. Data dictionary management
• Definitions of the data elements and their relationships (metadata).
• Data types and structure (what is modelled by entities, attributes, and relationships).
• Provides a standard definition of terms and data elements.

3. Data independence
• Automatic and invisible transformation and presentation (physical storage).
• Data independence (logical and physical): the DBMS needs to control these transformations and make them as transparent as possible.

4. Security management
• Enforces user security and data privacy within a database.
• Rules determine which users can access the database, which data items they can access, and which operations (read, add, delete, or modify) they may perform.
• More important in multi-user databases.

5. Multi-user access control
• More than one person may try to access the database at the same time.
• Procedures and processes are required to maintain data integrity and data consistency (e.g. what happens when two people try to change the same record at the same time?).

6. Backup and recovery management
• The DBMS provides backup, both onsite (other computers) and offsite.
• Recovery plans for when data is lost.


Recall Quality Information?

• Accurate
• Complete
• Economical
• Current
• Relevant


Poor Database Design

• Inconsistent data
• Incorrect data
• Missing data
• Lost data
• Data redundancy

Employee  Salary  Project Name  Budget (M)  Role
Brown     20      Alpha         2           Technician
Green     35      Gamma         15          Designer
Green     35      Epsilon       9           Designer
Hoskins   55      Epsilon       9           Manager
Hoskins   55      Gamma         15          Consultant
Moore     48      Gamma         15          Manager
Moore     48      Epsilon       9           Designer

Inconsistent data: contradictory facts are stored in the database. It is not always easy to identify which fact is the correct one, and which should be changed or removed. Example: two different dates of birth are stored for one person.

Incorrect data: facts do not reflect the real world. Errors could result from poor data entry, or could be caused by data corruption after entry.

Missing data: a desired fact was never captured. Usually indicated with a NULL.

Lost data: a previously stored fact has been deleted, either deliberately or accidentally.

Data redundancy: the same fact is stored more than once, which can lead to inconsistencies (anomalies). Example: the relation on this slide associates employees with projects (assume no nulls are allowed). Note that this doesn't violate the rule that relations cannot have duplicate rows.


Anomaly 1: Update

Action: Update salary

Employee  Salary    Project  Budget  Role
Brown     20        Alpha    2       Technician
Green     35 -> 37  Gamma    15      Designer
Green     35 -> 37  Epsilon  9       Designer
Hoskins   55        Epsilon  9       Manager
Hoskins   55        Gamma    15      Consultant
Moore     48 -> 50  Gamma    15      Manager
Moore     48        Epsilon  9       Designer

Green: both values updated: OK
Moore: only one value updated: ANOMALY!

Each person's salary is repeated for each project they are involved with. What does this imply when we need to increase someone's salary?
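The update anomaly can be reproduced in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module rather than the Oracle-style SQL shown earlier; the table name `assignment` and the key (employee, project) are assumptions made for illustration, while the rows come from the slide.

```python
import sqlite3

# Rows copied from the slide's employee/project relation.
ROWS = [
    ("Brown", 20, "Alpha", 2, "Technician"),
    ("Green", 35, "Gamma", 15, "Designer"),
    ("Green", 35, "Epsilon", 9, "Designer"),
    ("Hoskins", 55, "Epsilon", 9, "Manager"),
    ("Hoskins", 55, "Gamma", 15, "Consultant"),
    ("Moore", 48, "Gamma", 15, "Manager"),
    ("Moore", 48, "Epsilon", 9, "Designer"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE assignment (            -- hypothetical table name
        employee TEXT    NOT NULL,
        salary   INTEGER NOT NULL,
        project  TEXT    NOT NULL,
        budget   INTEGER NOT NULL,
        role     TEXT    NOT NULL,
        PRIMARY KEY (employee, project)  -- assumed key
    )
    """
)
conn.executemany("INSERT INTO assignment VALUES (?, ?, ?, ?, ?)", ROWS)

# Correct update: every row for Green changes (35 -> 37).
conn.execute("UPDATE assignment SET salary = 37 WHERE employee = 'Green'")

# Faulty update: only one of Moore's two rows changes (48 -> 50).
conn.execute(
    "UPDATE assignment SET salary = 50 "
    "WHERE employee = 'Moore' AND project = 'Gamma'"
)


def salaries(name):
    """Return the distinct salaries stored for one employee."""
    rows = conn.execute(
        "SELECT DISTINCT salary FROM assignment "
        "WHERE employee = ? ORDER BY salary",
        (name,),
    )
    return [row[0] for row in rows]


green_salaries = salaries("Green")  # one value: consistent
moore_salaries = salaries("Moore")  # two values: the update anomaly
print(green_salaries, moore_salaries)
```

Nothing in the table definition stops the second update, which is why the database now holds two contradictory salaries for Moore.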


Anomaly 2: Deletion

Action: Delete project Alpha

Employee  Salary  Project  Budget  Role
Brown     20      Alpha    2       Technician   <- row deleted
Green     35      Gamma    15      Designer
Green     35      Epsilon  9       Designer
Hoskins   55      Epsilon  9       Manager
Hoskins   55      Gamma    15      Consultant
Moore     48      Gamma    15      Manager
Moore     48      Epsilon  9       Designer

What happens to (Brown, 20)? ANOMALY!

If a project ends (i.e., is deleted), what happens to the data for employees on that project? Project Alpha ends and the corresponding row for it is deleted. We now can't store any data about employee Brown, because they are no longer assigned to any projects.
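As a rough illustration of the deletion anomaly, the sketch below deletes project Alpha from the single-table design and then looks for Brown. It uses Python's sqlite3 module; the table name `assignment` and its key are assumptions, and the rows are those on the slide.

```python
import sqlite3

# Rows copied from the slide's employee/project relation.
ROWS = [
    ("Brown", 20, "Alpha", 2, "Technician"),
    ("Green", 35, "Gamma", 15, "Designer"),
    ("Green", 35, "Epsilon", 9, "Designer"),
    ("Hoskins", 55, "Epsilon", 9, "Manager"),
    ("Hoskins", 55, "Gamma", 15, "Consultant"),
    ("Moore", 48, "Gamma", 15, "Manager"),
    ("Moore", 48, "Epsilon", 9, "Designer"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE assignment (            -- hypothetical table name
        employee TEXT    NOT NULL,
        salary   INTEGER NOT NULL,
        project  TEXT    NOT NULL,
        budget   INTEGER NOT NULL,
        role     TEXT    NOT NULL,
        PRIMARY KEY (employee, project)  -- assumed key
    )
    """
)
conn.executemany("INSERT INTO assignment VALUES (?, ?, ?, ?, ?)", ROWS)

# Project Alpha ends, so its row is deleted.
conn.execute("DELETE FROM assignment WHERE project = 'Alpha'")

# Brown's salary was stored only in that row, so it is gone too.
brown_rows = conn.execute(
    "SELECT COUNT(*) FROM assignment WHERE employee = 'Brown'"
).fetchone()[0]
print("rows left for Brown:", brown_rows)
```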


Anomaly 3: Insertion

Action: Hire Johnson on a salary of 36, but they haven't been assigned to any project yet.

Employee  Salary  Project  Budget  Role
Brown     20      Alpha    2       Technician
Green     35      Gamma    15      Designer
Green     35      Epsilon  9       Designer
Hoskins   55      Epsilon  9       Manager
Hoskins   55      Gamma    15      Consultant
Moore     48      Gamma    15      Manager
Moore     48      Epsilon  9       Designer
Johnson   36      ???      ???     ???

Where do we store (Johnson, 36) until then? ANOMALY!

We aren't allowed to store nulls, which means we can't add Johnson until they've been assigned to a project. This is effectively the inverse of the problem on the previous slide.
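The insertion anomaly can be sketched the same way. Here the "no nulls" rule is modelled with NOT NULL constraints, which is an assumption about how the rule would be implemented; the table name `assignment` is also hypothetical. The example uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE assignment (            -- hypothetical table name
        employee TEXT    NOT NULL,
        salary   INTEGER NOT NULL,
        project  TEXT    NOT NULL,       -- "no nulls allowed"
        budget   INTEGER NOT NULL,
        role     TEXT    NOT NULL,
        PRIMARY KEY (employee, project)  -- assumed key
    )
    """
)
conn.execute(
    "INSERT INTO assignment VALUES ('Brown', 20, 'Alpha', 2, 'Technician')"
)

# Johnson has a salary but no project, budget, or role yet.
try:
    conn.execute(
        "INSERT INTO assignment VALUES (?, ?, ?, ?, ?)",
        ("Johnson", 36, None, None, None),
    )
    johnson_stored = True
except sqlite3.IntegrityError as error:
    # The NOT NULL constraints reject the incomplete row.
    johnson_stored = False
    print("insert rejected:", error)
```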


Reduce Redundancy

Employee  Salary
Brown     20
Green     35
Hoskins   55
Moore     48

Employee  Project  Role
Brown     Alpha    Technician
Green     Gamma    Designer
Green     Epsilon  Designer
Hoskins   Epsilon  Manager
Hoskins   Gamma    Consultant
Moore     Gamma    Manager
Moore     Epsilon  Designer

Project  Budget
Alpha    2
Gamma    15
Epsilon  9

[Diagram labels: Employee, Project, Role]

Breaking up the relation eliminates the worst of the redundancy. Normalisation is a process that groups logically related data into structures that have minimal redundancy and no update anomalies (covered in a later course, SFDV3003).
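A minimal sketch of the decomposed design, again using Python's sqlite3 module. The table names, the use of the employee and project names as primary keys, and the foreign keys are assumptions for illustration; the data is taken from the three relations on the slide. Each of the three earlier actions is repeated to show that it no longer causes an anomaly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript(
    """
    CREATE TABLE employee (
        name   TEXT PRIMARY KEY,      -- assumes names are unique
        salary INTEGER NOT NULL
    );
    CREATE TABLE project (
        name   TEXT PRIMARY KEY,
        budget INTEGER NOT NULL
    );
    CREATE TABLE assignment (
        employee TEXT NOT NULL REFERENCES employee (name),
        project  TEXT NOT NULL REFERENCES project (name),
        role     TEXT NOT NULL,
        PRIMARY KEY (employee, project)
    );
    """
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Brown", 20), ("Green", 35), ("Hoskins", 55), ("Moore", 48)],
)
conn.executemany(
    "INSERT INTO project VALUES (?, ?)",
    [("Alpha", 2), ("Gamma", 15), ("Epsilon", 9)],
)
conn.executemany(
    "INSERT INTO assignment VALUES (?, ?, ?)",
    [
        ("Brown", "Alpha", "Technician"),
        ("Green", "Gamma", "Designer"),
        ("Green", "Epsilon", "Designer"),
        ("Hoskins", "Epsilon", "Manager"),
        ("Hoskins", "Gamma", "Consultant"),
        ("Moore", "Gamma", "Manager"),
        ("Moore", "Epsilon", "Designer"),
    ],
)

# Update: Moore's salary is stored once, so exactly one row changes.
rows_changed = conn.execute(
    "UPDATE employee SET salary = 50 WHERE name = 'Moore'"
).rowcount

# Insertion: Johnson can be hired before being assigned to a project.
conn.execute("INSERT INTO employee VALUES ('Johnson', 36)")

# Deletion: ending project Alpha removes its assignments, not employee Brown.
conn.execute("DELETE FROM assignment WHERE project = 'Alpha'")
conn.execute("DELETE FROM project WHERE name = 'Alpha'")

remaining = [
    row[0] for row in conn.execute("SELECT name FROM employee ORDER BY name")
]
print(rows_changed, remaining)
```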


Relational Databases

• Devised in 1969 by Edgar Codd
• Three aspects:
  1. Structural
  2. Integrity
  3. Manipulation

[Figure labels: Attribute, Tuples, Relation]


Table / Relation:
• Entities are transformed into relations (physically, tables in the database).
• The data model is independent of the physical storage.
• "Table" is often used as a synonym for relation.
• At the physical level: record type or file.

Rows / Tuples:
• Entity occurrences = tuples (rows) = MS Access records.
• A tuple contains all the attribute values for a particular occurrence of a relation.
• In the relational model the order of tuples is not important (i.e. ordering is irrelevant).
• Tuples must be unique (i.e. no duplicates allowed).

Attributes:
• Attributes are referred to by name, not position (order is not significant).
• Each attribute value (the intersection of a row and a column) is atomic: it contains just one value.
• Attribute types (domains) are the set or pool of values that the attribute can contain (often represented as a data type).

Aspects:
1. Structural: data in the DB is perceived by the user as tables, and nothing but tables.
2. Integrity: tables satisfy certain integrity constraints (later in the lecture).
3. Manipulative: operators are available to users for manipulating (create, read, update, delete) tables.


Relational Keys

Name    Birth_date  Paper_code
Mickey  3/4/1963    COMP102
Pluto   3/4/1963    PSYC101
Mickey  6/11/1975   COMP102

(labelled: Composite PK, FK)

Paper    Title               Description
COMP102  Software Enginee …  …
PSYC101  …                   …

(labelled: Non-composite PK)

Types:
• Candidate key
• Primary key
• Alternate key
• Composite key
• Surrogate key
• Foreign key


Candidate key
Any key that is unique, stable, and minimal. There can be more than one for any given relation.

Primary key
Exactly one per relation, chosen from the candidate keys.

Alternate key
Candidate keys that do not become the primary key.

Composite key (compound)
A key (any of the above types, plus foreign keys) that has more than one attribute.

Surrogate key (artificial)
An "invented" key (e.g. a customer ID that is just a number).

Foreign key
Used to form relationships between entities (by referring to primary keys). Not necessarily unique.
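The key types above can be sketched as table definitions. This is an illustrative example using Python's sqlite3 module, loosely based on the student and paper figure; the table and column names, the surrogate `student_id`, and the assumption that (name, birth_date) is unique are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript(
    """
    CREATE TABLE paper (
        paper_code TEXT PRIMARY KEY        -- non-composite primary key
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,    -- surrogate key: an invented number
        name       TEXT NOT NULL,
        birth_date TEXT NOT NULL,
        UNIQUE (name, birth_date)          -- composite alternate key (assumed unique)
    );
    CREATE TABLE enrolment (
        student_id INTEGER NOT NULL REFERENCES student (student_id),  -- foreign key
        paper_code TEXT    NOT NULL REFERENCES paper (paper_code),    -- foreign key
        PRIMARY KEY (student_id, paper_code)                          -- composite primary key
    );
    """
)
conn.executemany(
    "INSERT INTO paper VALUES (?)", [("COMP102",), ("PSYC101",)]
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Mickey", "3/4/1963"), (2, "Pluto", "3/4/1963"), (3, "Mickey", "6/11/1975")],
)
conn.executemany(
    "INSERT INTO enrolment VALUES (?, ?)",
    [(1, "COMP102"), (2, "PSYC101"), (3, "COMP102")],
)

# A foreign key value need not be unique: COMP102 appears twice.
comp102_count = conn.execute(
    "SELECT COUNT(*) FROM enrolment WHERE paper_code = 'COMP102'"
).fetchone()[0]

# A primary key value must be unique: repeating (1, COMP102) is rejected.
try:
    conn.execute("INSERT INTO enrolment VALUES (1, 'COMP102')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(comp102_count, duplicate_rejected)
```

In `student`, both `student_id` and (name, birth_date) are candidate keys under these assumptions; choosing the surrogate as the primary key leaves the other as an alternate key.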


Referential Integrity Example

STUDENT

StudentNo  Name       …  CourseID (FK)
5467346    Jenny      …  BBW
1676349    Mun Chan   …  DRC
9437316    Alexander  …  DFA
4346786    Richard    …  BBW
7643465    Monique    …  <null>
134675     Sarah      …  DJK     ? Violation of referential integrity
…          …          …  …

COURSE

CourseID (PK)  Title                         Length
BKE            Bachelor of Kite Engineering  36
DRC            Diploma in Rock Climbing      12
BBW            Bachelor of Bird Watching     24
DFA            Diploma in Flower Arranging   18

[Source: D'Orazio and Happel, 1996]
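The example above can be checked mechanically. The sketch below, using Python's sqlite3 module, declares CourseID in STUDENT as a foreign key and tries to insert each student row from the slide; the lower-case table and column names are assumptions, and the elided columns are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript(
    """
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        title     TEXT NOT NULL,
        length    INTEGER NOT NULL
    );
    CREATE TABLE student (
        student_no INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        course_id  TEXT REFERENCES course (course_id)  -- nullable foreign key
    );
    """
)
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [
        ("BKE", "Bachelor of Kite Engineering", 36),
        ("DRC", "Diploma in Rock Climbing", 12),
        ("BBW", "Bachelor of Bird Watching", 24),
        ("DFA", "Diploma in Flower Arranging", 18),
    ],
)

students = [
    (5467346, "Jenny", "BBW"),
    (1676349, "Mun Chan", "DRC"),
    (9437316, "Alexander", "DFA"),
    (4346786, "Richard", "BBW"),
    (7643465, "Monique", None),   # a null foreign key is allowed
    (134675, "Sarah", "DJK"),     # DJK does not exist in course
]

rejected = []
for row in students:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        # The foreign key value has no matching primary key value.
        rejected.append(row[1])

stored = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print("rejected:", rejected, "stored:", stored)
```

Monique's null CourseID is accepted because a null foreign key refers to no course at all, whereas Sarah's DJK refers to a course that does not exist.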


Review referential integrity

Ticking the "Enforce Referential Integrity" check box makes the DBMS match the related records of the two tables, so that there are zero anomalous records (no foreign key value is left without a matching primary key value).

When the "Cascade Update Related Fields" check box is selected, changing a primary key value in the primary (main) table automatically updates the matching values in all related records.

When the "Cascade Delete Related Records" check box is selected, deleting a record in the primary table deletes any related records in the related table.
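The two cascade options correspond roughly to the SQL clauses ON UPDATE CASCADE and ON DELETE CASCADE. The sketch below shows their effect using Python's sqlite3 module; it is not how the check boxes are implemented, and the table names and the new course code 'BBX' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript(
    """
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        title     TEXT NOT NULL
    );
    CREATE TABLE student (
        student_no INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        course_id  TEXT REFERENCES course (course_id)
                       ON UPDATE CASCADE   -- like "Cascade Update Related Fields"
                       ON DELETE CASCADE   -- like the cascade delete option
    );
    INSERT INTO course VALUES
        ('BBW', 'Bachelor of Bird Watching'),
        ('DRC', 'Diploma in Rock Climbing');
    INSERT INTO student VALUES
        (5467346, 'Jenny', 'BBW'),
        (1676349, 'Mun Chan', 'DRC'),
        (4346786, 'Richard', 'BBW');
    """
)

# Changing the primary key value updates the matching foreign key values.
conn.execute("UPDATE course SET course_id = 'BBX' WHERE course_id = 'BBW'")

# Deleting the course deletes the students enrolled in it.
conn.execute("DELETE FROM course WHERE course_id = 'DRC'")

students = conn.execute(
    "SELECT name, course_id FROM student ORDER BY name"
).fetchall()
print(students)
```

Cascading deletes are convenient but remove data silently, so they deserve care: here Mun Chan's record disappears along with the course.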


The Transformation Process

General rules:
1. Each entity becomes a relation
2. Each attribute becomes an attribute in the corresponding relation
3. Unique identifiers become primary keys (PK) in the corresponding relation
4. Implement relationships through foreign key (FK) placements

Conceptual ERD -> Candidate relations -> Database tables

[Source: D'Orazio and Happel, 1996]


Relationship Transformation Rules

• 1:1
  • Place the PK of the first relation into the second relation as a foreign key (or vice versa)
• 1:M
  • Place the PK of the '1' end relation into the 'M' end relation as a FK
• M:M
  • Create a new 'all key relation' to represent the M:M relationship
  • Follow the 1:M transformation rules
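As a rough sketch of the 1:1 and 1:M rules, the example below uses Python's sqlite3 module. The `department` and `employee` tables echo the earlier CREATE TABLE example in simplified form; the `locker` table is hypothetical, and using a UNIQUE constraint on the foreign key to enforce 1:1 is one common approach rather than something the slide prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript(
    """
    CREATE TABLE department (
        dept_code TEXT PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    -- 1:M: the PK of the '1' end (department) is placed in the 'M' end.
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        surname   TEXT NOT NULL,
        dept_code TEXT NOT NULL REFERENCES department (dept_code)
    );
    -- 1:1: the PK of one relation is placed in the other, and UNIQUE
    -- stops the same employee appearing twice (hypothetical table).
    CREATE TABLE locker (
        locker_no INTEGER PRIMARY KEY,
        emp_id    INTEGER NOT NULL UNIQUE REFERENCES employee (emp_id)
    );
    INSERT INTO department VALUES ('SALE', 'Sales');
    INSERT INTO employee VALUES (1, 'Brown', 'SALE'), (2, 'Green', 'SALE');
    INSERT INTO locker VALUES (10, 1);
    """
)

# 1:M allows many employees to share one department.
sales_staff = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE dept_code = 'SALE'"
).fetchone()[0]

# 1:1 rejects a second locker for the same employee.
try:
    conn.execute("INSERT INTO locker VALUES (11, 1)")
    second_locker_rejected = False
except sqlite3.IntegrityError:
    second_locker_rejected = True

print(sales_staff, second_locker_rejected)
```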


M:M Transformation Example

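Before:

Genre (G#, desc, …)  M:M  CD (CD#, title, …)

After:

Genre (G#, desc, …)  1:M  Classification (G#, CD#)  M:1  CD (CD#, title, …)

Classification is the intersecting relation. The Genre and CD ends of the two new relationships are always one and mandatory.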
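The Genre, CD, and Classification example can be sketched as follows, using Python's sqlite3 module. The column names `g_no`, `cd_no`, and `descr` stand in for G#, CD#, and desc (DESC is an SQL keyword), and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript(
    """
    CREATE TABLE genre (
        g_no  INTEGER PRIMARY KEY,
        descr TEXT NOT NULL
    );
    CREATE TABLE cd (
        cd_no INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    -- Intersecting "all key" relation: every attribute is part of the PK,
    -- and each attribute is also a mandatory FK to one of the two ends.
    CREATE TABLE classification (
        g_no  INTEGER NOT NULL REFERENCES genre (g_no),
        cd_no INTEGER NOT NULL REFERENCES cd (cd_no),
        PRIMARY KEY (g_no, cd_no)
    );
    -- Invented sample data.
    INSERT INTO genre VALUES (1, 'Blues'), (2, 'Jazz');
    INSERT INTO cd VALUES (1, 'Sample CD A'), (2, 'Sample CD B');
    INSERT INTO classification VALUES (1, 1), (2, 1), (1, 2);
    """
)

# One CD can have many genres and one genre can classify many CDs.
rows = conn.execute(
    """
    SELECT cd.title, genre.descr
    FROM classification
    JOIN cd    ON cd.cd_no   = classification.cd_no
    JOIN genre ON genre.g_no = classification.g_no
    ORDER BY cd.title, genre.descr
    """
).fetchall()
for title, descr in rows:
    print(title, "-", descr)
```

The NOT NULL foreign keys reflect the "always one and mandatory" ends: every Classification row must refer to exactly one Genre and exactly one CD.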


References

Date, An Introduction to Database Systems, 8th Edition, Addison Wesley, 2004

Rob and Coronel, Database Systems: Design, Implementation, and Management, 7th Edition, Thomson, 2007

-------------------------------------------------------

Note: Start Practical Sessions 4