DBMAN 1. (V 1.0, OE NIK 2013)

Expectations
Data modeling basics
Normalization
Correct decomposition


Schedule

Expectations, ER, Normalization | Verification, SQL intro, Where
Grouping, advanced grouping | Practicing grouping queries
Joins, SubQueries | Practicing joins, subqueries
Views, LIMIT/ROWNUM | Practicing joins, views
DDL/DML, Transactions | Practicing DML
Relational algebra, hashing, B-trees | Practice test # 1
Administration, Analytic queries | Practicing analytic queries
PL SQL (control structures, principles) | Practicing PL SQL basics
PL SQL (exceptions, triggers) | Practicing PL SQL triggers
Accessing a database from C# | Accessing a database from C#
NO LESSON | Practice test # 2
No SQL databases: MongoDB | Accessing MongoDB from C#
Theory test | Practice re-test / Project

Expectations
• Two tests during the semester
– 1. SQL (DQL, DML, DDL; mid-semester, ~ week 7)
– 2. Advanced SQL (PL SQL, Analytics; end of the semester)
– Both are required for the "signature"; one can be rewritten in the last week of the semester
– Both can be rewritten in the first 2 weeks of the exam season ("signature re-test")
• Exam requirements
– Theory test (ER, Algebra, Normalization, Indices, hash, etc.; can be written in the last week too)
– Project work: create demo database + queries

Project Expectations
• Create a database that can be used to demonstrate SQL commands
– Choose a topic
– Create an ER diagram to plan the data (at least three inter-connected entities)
– Create the table structure diagram
– Verify that we made good tables (BCNF + correct decomposition)
– Create an SQL file that creates the database
– Create test queries

Project Expectations
• Test queries
– Use DDL constraints (NOT NULL, PK, FK)
– 3 simple single-table selects
– 3 simple single-table GROUP BY queries
– 5 complex multi-table selects
– 5 complex subquery selects
– 5 analytics/advanced grouping queries
– Must use at least 10 different row-level functions and all grouping aggregate functions
– Demonstrate DML operations: 2 insert, 2 update, 2 delete statements; 3 of those must use sub-queries

Project Expectations
• Documentation
– Task description
– Diagrams
– For every SQL command: id number of the query, task description, SQL command, output results
– Deadline: last Monday of the semester OR the day before the exam date

Data modeling basics
(http://en.wikipedia.org/wiki/Entity–relationship_model
http://okaram.spsu.edu/~curri/classes/12/spring-12/DB/StudyNotes/Ch6-FromERToTables/RelationalModelER2Tables.pdf)

Logical data modelling
• Hierarchical model: tree
• Network model (sets): graph
• Relational data model: relations between tables (~Excel worksheets)
• Object-oriented data model: encapsulation of data + commands (exists as a principle, can be well approached using ORM systems, but the storage is still relational)

Relational data model

• Columns: data fields (field, attribute)

• Rows: records

• Several RELATED tables: database

- The order of rows doesn't matter
- The order of columns doesn't matter
- There can't be two completely identical rows (entities must be different)

Relations
• One-to-one relation (1:1)
• Quite rare (people + entrepreneurs – dependencies???)

Relations, keys

One-to-many relation (1:N)

Keys
• Primary key, foreign key
• Simple key, complex (composite) key
• When we use complex keys, the key must not be split up (complex keys shouldn't be used this semester)
• Relations between keys = relations between tables = database
• We'll discuss the actual SQL implementation of keys later

Relations

Many-to-many relation (M:N)

Elements of an ER diagram

• Relation + Entities

• Attribute

• Relation-attribute

• Primary key


Programmers, Projects, Modules
• Entities → ER Diagram → Relations → Fields
  or
  Entities → Fields → ER Diagram → Relations
• Fields of Coders?
• Fields of Modules?
• Fields of Projects?
• Relations
– 1 module: 1 project, 1 project: several modules → 1:N
– 1 coder: several modules, 1 module: several coders → N:M; extra attributes for the relation?

From ER to Tables (examples)
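As a rough sketch of how the Programmers, Projects, Modules example might be turned into tables, the snippet below builds one possible schema in an in-memory SQLite database. All table names, column names, the `hours` relation-attribute and the sample rows are assumptions made for illustration; the slides only name the three entities and the 1:N and N:M relations. Note that the junction table uses a two-column primary key, which the course asks to avoid this semester; a separate id column would be the alternative.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE project (
    project_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE module (
    module_id  INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    project_id INTEGER NOT NULL REFERENCES project(project_id)  -- 1:N side
);
CREATE TABLE coder (
    coder_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
-- N:M relation resolved by a junction table; 'hours' is an assumed relation-attribute
CREATE TABLE coder_module (
    coder_id  INTEGER NOT NULL REFERENCES coder(coder_id),
    module_id INTEGER NOT NULL REFERENCES module(module_id),
    hours     INTEGER,
    PRIMARY KEY (coder_id, module_id)
);
"""


def build_demo():
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO project VALUES (1, 'P1')")
    conn.executemany("INSERT INTO module VALUES (?, ?, ?)",
                     [(1, "M1", 1), (2, "M2", 1)])
    conn.executemany("INSERT INTO coder VALUES (?, ?)",
                     [(1, "Ann"), (2, "Bob")])
    conn.executemany("INSERT INTO coder_module VALUES (?, ?, ?)",
                     [(1, 1, 10), (1, 2, 5), (2, 1, 7)])
    return conn


def coders_per_module(conn):
    """Count how many coders work on each module (uses the N:M junction table)."""
    return conn.execute(
        "SELECT m.name, COUNT(*) FROM module m "
        "JOIN coder_module cm ON cm.module_id = m.module_id "
        "GROUP BY m.name ORDER BY m.name"
    ).fetchall()
```

The foreign key in `module` carries the 1:N relation, while the N:M relation needs its own table, which is also where a relation-attribute would live.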

Table structure diagram


Example for a typical Excel spreadsheet

Order number  Item number  Item name    Amount  Deadline  Address           Name
991201        0001         Brown bread  25      991201    Bpest. Barna u.1  Julius Meinl
991201        0001         Brown bread  35      991201    Bpest. Barna u.1  Julius Meinl
991202        0002         White bread  24      991201    Bpest Fehér u.2   Penny Market

REDUNDANCY, INCONSISTENCY

AIM: decrease these factors by introducing new tables based on the relations between the fields

Normalization
(http://en.wikipedia.org/wiki/Database_normalization)

Normalization
• The method described here has a practical approach, and uses tables to show how the tables are split up
• Usually normalization is done BEFORE we have any data, so BEFORE we have tables
• In addition, it is basically a mathematical process to find DEPENDENCIES amongst the attributes in a relation:
  Frelation_name: {key1, key2} → {field1, field2, field3}
• The same relation can be drawn as an ER diagram too: 1 square (the entity) + circles (the attributes), with the two key attributes underlined

Base model
• List of available/possible data fields that we have at our disposal – without any system or principles, just "look around and list what we have"
• For example: we want to store the orders in our food shop
• Item number, Item name, Price, VAT category, VAT percentage, Order number, Amount, Shipping address, Color, Weight, Country, Packaging mode, etc.

0NF
• We eliminate the unnecessary data and keep only the data we really want to store; then we try to store these data in a table
• Kept: Item number, Item name, Price, VAT category, VAT percentage, Order number, Amount, Shipping address
• Eliminated: Color, Weight, Country, Packaging mode

0NF

Forders: {Item number, Item name, Price, VAT category, VAT percentage, Amount, {Order number, Shipping address}}

Item No.  Item name  Price  VAT cat.  VAT pct.  Order No.  Amt.  Ship. Addr.
C1        N1         500    2         20        R1         5     A1
C2        N2         600    2         20                   5
C3        N3         700    1         10                   10
C4        N4         1000   3         25        R2         2     A2
C3        N3         700    1         10                   5

(The Order No. and Ship. Addr. cells are merged: R1/A1 span the first three rows, R2/A2 the last two.)

1NF
"In every row of the relation, there must be only atomic attributes."
• The table must not have merged rows/columns
• We eliminate the merged cells
  or
• We eliminate the complex data attributes

1NF

Item No.  Item name  Price  VAT cat.  VAT pct.  Order No.  Amt.  Ship. Addr.
C1        N1         500    2         20        R1         5     A1
C2        N2         600    2         20        R1         5     A1
C3        N3         700    1         10        R1         10    A1
C4        N4         1000   3         25        R2         2     A2
C3        N3         700    1         10        R2         5     A2

Anomalies: modification, deletion, insertion!

2NF

Functional Dependency (FD): We speak about an FD if the value of an attribute is uniquely determined by the value of another attribute (or set of attributes). It is always a one-way relationship.

For example: by knowing the passport number, we can determine all other data of a person (name, mother's name, birthdate, birthplace, etc.), so it determines all data for the entity person. In the absence of such a number, we have to use an alternative combination of attributes (e.g. Full name + Birthdate + Mother's maiden name).
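To make the FD definition concrete, here is a minimal sketch that tests whether a candidate FD is contradicted by the rows of the 1NF order table. The function name `fd_holds` and the short column names are assumptions chosen for this example. Keep in mind that an FD is a rule about all possible data: sample rows can disprove an FD, but they can never prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# The 1NF table from the slides; column names are shortened for the example.
COLUMNS = ("item", "name", "price", "vat_cat", "vat_pct", "order", "amount", "addr")
ROWS = [dict(zip(COLUMNS, r)) for r in [
    ("C1", "N1", 500, 2, 20, "R1", 5, "A1"),
    ("C2", "N2", 600, 2, 20, "R1", 5, "A1"),
    ("C3", "N3", 700, 1, 10, "R1", 10, "A1"),
    ("C4", "N4", 1000, 3, 25, "R2", 2, "A2"),
    ("C3", "N3", 700, 1, 10, "R2", 5, "A2"),
]]

item_determines_name = fd_holds(ROWS, ["item"], ["name", "price"])
item_determines_amount = fd_holds(ROWS, ["item"], ["amount"])
pair_determines_amount = fd_holds(ROWS, ["item", "order"], ["amount"])
```

Item C3 appears with two different amounts, so the item number alone cannot determine the amount; the pair (item, order) is not contradicted by the sample.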

2NF

"Every secondary (non-key) attribute must fully functionally depend on the whole key of the relation"

• Expectation: in a table, a field must be either a key field or a field that functionally depends on the key field(s); with a complex key, it must depend on the whole key, not just on a part of it
• We must eliminate the multiple FDs: we have to restructure the tables, and create several tables based on the FDs that we find
• To do that, we have to write up the FDs

2NF
• Fitems: {Item number} → {Item name, Price, VAT category, VAT percentage}
• Forders: {Order number} → {Shipping address}
• Amount???
• ForderItems: {Item number, Order number} → {Amount}
• FD = Table
• Primary key, Foreign key, Simple key, Complex key!

Item No.  Item name  Price  VAT cat.  VAT pct.  Order No.  Amt.  Ship. Addr.

2NF

Items:
Item No.  Item name  Price  VAT cat.  VAT pct.
C1        N1         500    2         20
C2        N2         600    2         20
C3        N3         700    1         10
C4        N4         1000   3         25

Order items:
Order No.  Item No.  Amt.
R1         C1        5
R1         C2        5
R1         C3        10
R2         C4        2
R2         C3        5

Orders:
Order No.  Ship. Addr.
R1         A1
R2         A2

Keys???
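The "FD = table" step can be sketched as a projection of the 1NF table onto the fields of each FD, with duplicate rows dropped. This is only an illustration: the helper `project` and the short column names are assumptions, not something taken from the slides.

```python
COLUMNS = ("item", "name", "price", "vat_cat", "vat_pct", "order", "amount", "addr")
ROWS_1NF = [
    ("C1", "N1", 500, 2, 20, "R1", 5, "A1"),
    ("C2", "N2", 600, 2, 20, "R1", 5, "A1"),
    ("C3", "N3", 700, 1, 10, "R1", 10, "A1"),
    ("C4", "N4", 1000, 3, 25, "R2", 2, "A2"),
    ("C3", "N3", 700, 1, 10, "R2", 5, "A2"),
]


def project(rows, columns, wanted):
    """Keep only the wanted columns and drop duplicate rows (first-seen order)."""
    positions = [columns.index(c) for c in wanted]
    result = []
    for row in rows:
        projected = tuple(row[i] for i in positions)
        if projected not in result:
            result.append(projected)
    return result


# One table per FD: key fields followed by the fields they determine.
items = project(ROWS_1NF, COLUMNS, ("item", "name", "price", "vat_cat", "vat_pct"))
orders = project(ROWS_1NF, COLUMNS, ("order", "addr"))
order_items = project(ROWS_1NF, COLUMNS, ("order", "item", "amount"))
```

The item C3 is stored once in `items` although it was ordered twice, which is exactly the redundancy the 2NF step removes.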

3NF

• Transitive dependency: a non-key attribute is transitively dependent on the relation's key if it is functionally dependent on another non-key attribute, which in turn depends on the key

"Secondary attributes must NOT have transitive dependencies on the relation's key(s)"

• Every field must depend on the key field, AND ONLY on the key field – no other dependencies are allowed in a table!
• Currently: Fvat: {VAT category} → {VAT percentage}
• New FD = new table → new FK in the items table

3NF

Items:
Item No.  Item name  Price  VAT cat.
C1        N1         500    2
C2        N2         600    2
C3        N3         700    1
C4        N4         1000   3

Order items:
Order No.  Item No.  Amt.
R1         C1        5
R1         C2        5
R1         C3        10
R2         C4        2
R2         C3        5

Orders:
Order No.  Ship. Addr.
R1         A1
R2         A2

VAT:
VAT cat.  VAT pct.
1         10
2         20
3         25

Anomalies???
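One way to convince ourselves that nothing was lost by the 3NF split is to join the four small tables back together and compare the result with the 1NF table. The sketch below does this with a hand-written natural join; the function `natural_join` and the short column names are assumptions made for the example.

```python
def natural_join(left, right):
    """Join two lists of dict rows on every column name they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                joined.append({**l_row, **r_row})
    return joined


# The four 3NF tables from the slides.
items = [
    {"item": "C1", "name": "N1", "price": 500, "vat_cat": 2},
    {"item": "C2", "name": "N2", "price": 600, "vat_cat": 2},
    {"item": "C3", "name": "N3", "price": 700, "vat_cat": 1},
    {"item": "C4", "name": "N4", "price": 1000, "vat_cat": 3},
]
vat = [
    {"vat_cat": 1, "vat_pct": 10},
    {"vat_cat": 2, "vat_pct": 20},
    {"vat_cat": 3, "vat_pct": 25},
]
order_items = [
    {"order": "R1", "item": "C1", "amount": 5},
    {"order": "R1", "item": "C2", "amount": 5},
    {"order": "R1", "item": "C3", "amount": 10},
    {"order": "R2", "item": "C4", "amount": 2},
    {"order": "R2", "item": "C3", "amount": 5},
]
orders = [
    {"order": "R1", "addr": "A1"},
    {"order": "R2", "addr": "A2"},
]

# order_items -> items (via item) -> vat (via vat_cat) -> orders (via order)
rebuilt = natural_join(natural_join(natural_join(order_items, items), vat), orders)
```

Each join follows one foreign key, so every order line picks up exactly one item, one VAT row and one order row.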

Benefits
• Almost no redundancy, absolutely no unnecessary redundancy
• The overall size is usually smaller
• No anomalies, no data loss
• Consistent and easy to expand or modify
• Every field must depend on the key field AND only the key field → data modification is easier

To summarize

0NF: if there are no unnecessary data
1NF: if 0NF and there are no complex attributes
2NF: if 1NF and every field is fully functionally dependent on the key(s)
3NF: if 2NF and there is no field that is transitively dependent on the key(s) [or: if 2NF and every field is functionally dependent on the key(s), BUT NOT ON OTHER FIELDS]
BCNF: if 3NF and there is no inner FD between any of the fields of a complex key
4NF, 5NF, DKNF ...

To summarize

If we have a decomposition, we have to verify whether it is BCNF or not.

A decomposition is BCNF if
• Every relation has PK field(s)
• There is no FD that crosses tables
• Inside a table, the only FD is Frelation_name: {key1, key2} → {field1, field2, field3}
• There is no dependency inside any of the PK fields (thus, it's good if we don't use complex keys)
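The checklist above can be approximated in code with the usual textbook criterion: inside a table, the left side of every non-trivial FD must determine all fields of that table. The sketch below is a simplification under stated assumptions: it only inspects the FDs that were written up explicitly (not every FD implied by them), and the names `closure` and `bcnf_violations` as well as the short column names are illustrative.

```python
def closure(attrs, fds):
    """All attributes that can be reached from attrs using the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Listed FDs that apply inside the relation but whose left side is not a key."""
    rel = set(relation)
    bad = []
    for lhs, rhs in fds:
        lhs_set = set(lhs)
        rhs_inside = (set(rhs) & rel) - lhs_set  # non-trivial part inside the table
        if lhs_set <= rel and rhs_inside and not rel <= closure(lhs_set, fds):
            bad.append((sorted(lhs_set), sorted(rhs_inside)))
    return bad


# The FDs of the order example, with shortened column names.
FDS = [
    (["item"], ["name", "price", "vat_cat"]),
    (["order"], ["addr"]),
    (["item", "order"], ["amount"]),
    (["vat_cat"], ["vat_pct"]),
]

items_2nf = ["item", "name", "price", "vat_cat", "vat_pct"]
items_3nf = ["item", "name", "price", "vat_cat"]
```

For the 2NF items table the check reports the VAT dependency; after moving VAT percentage to its own table, the list of violations is empty.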

Correct decomposition

Armstrong axioms
• A functional dependency:
– Is reflexive: the simple key functionally depends on itself. Any subset of a complex key functionally depends on the complex key (if Y ⊆ X, then X → Y)
– Is transitive: if X → Y and Y → Z, then X → Z
– Is expandable (augmentation): if X → Y, then XZ → YZ
• After splitting up the relation into smaller tables, we have to use these axioms to verify if our decomposition is usable/correct
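As a small worked example of how the axioms are used, the derivation below shows that two FDs with the same left side can be merged into one (often called the union rule). This is why writing {Item number} → {Item name, Price} is equivalent to listing the two FDs separately. The symbols X, Y, Z stand for arbitrary sets of attributes.

```latex
\begin{align*}
&\text{Given: } X \to Y, \quad X \to Z \\
&1.\; X \to XY   && \text{(augment } X \to Y \text{ with } X\text{; note } XX = X\text{)} \\
&2.\; XY \to YZ  && \text{(augment } X \to Z \text{ with } Y\text{)} \\
&3.\; X \to YZ   && \text{(transitivity applied to 1 and 2)}
\end{align*}
```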

Correct decomposition
• Two aspects:
– Dependency preserving decomposition: we had the original relation, and then we wrote up the FDs. After the decomposition, can we still access all the original FDs?
– Lossless decomposition: we had the original relation with the original fields. After the decomposition, using the FDs that we wrote up, can we reconstruct the original relation?
• A lossless decomposition into 3NF and into BCNF can ALWAYS be found!
• A dependency preserving decomposition into 3NF can always be found; there are some cases where this is impossible with BCNF!

Dependency preserving decomposition
• Given a relation R and a set of dependencies F, we say that a decomposition of R into R1, R2, …, Rk is dependency preserving if, by projecting the original functional dependencies onto R1, R2, …, Rk, we can get back the original set F (every FD in F follows from the union of the projected FDs)
• Given:
– R=(A, B, C), F={A→B, B→C}
– Decomposition: R1=(A, C), R2=(B, C)
• Projected FDs in R1: F1={A→C} (transitive)
• Projected FDs in R2: F2={B→C}
• F1 union F2 = {B→C, A→C}. The original A→B is lost

Dependency preserving decomposition
• Given:
– R=(A, B, C), F={A→B, B→C}
– Decomposition: R1'=(A, B), R2'=(B, C)
• Projected FDs in R1': F1={A→B}
• Projected FDs in R2': F2={B→C}
• F1 union F2 = {A→B, B→C}; this is correct
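The two A, B, C examples can be reproduced with a brute-force check. The sketch below projects the FDs onto each sub-relation by computing attribute closures for every subset of its fields, which is fine for slide-sized schemas but grows exponentially with the number of fields. The function names `closure`, `project_fds` and `preserves` are assumptions for this illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes that can be reached from attrs using the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(sub, fds):
    """FDs X -> Y that hold inside sub, for every non-empty subset X of sub."""
    sub = sorted(sub)
    projected = []
    for size in range(1, len(sub) + 1):
        for lhs in combinations(sub, size):
            rhs = (closure(lhs, fds) & set(sub)) - set(lhs)
            if rhs:
                projected.append((set(lhs), rhs))
    return projected


def preserves(decomposition, fds):
    """True if every original FD follows from the union of the projected FDs."""
    union = [fd for sub in decomposition for fd in project_fds(sub, fds)]
    return all(set(rhs) <= closure(lhs, union) for lhs, rhs in fds)


FDS = [(["A"], ["B"]), (["B"], ["C"])]
bad_split = preserves([["A", "C"], ["B", "C"]], FDS)   # A -> B gets lost
good_split = preserves([["A", "B"], ["B", "C"]], FDS)  # both FDs survive
```

In the first decomposition no sub-relation contains both A and B, so A→B cannot be recovered from the projected FDs.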

Dependency preserving decomposition
• R={Item number, Item name, Price, VAT category, VAT percentage, Amount, Order number, Shipping address}
• F={ Fitems: {Item number} → {Item name, Price, VAT category, VAT percentage},
  Forders: {Order number} → {Shipping address},
  ForderItems: {Item number, Order number} → {Amount},
  Ftax: {VAT category} → {VAT percentage} }
• Decomposition:
  R1'=(Item number, Item name, Price, VAT category)
  R2'=(Order number, Shipping address)
  R3'=(Item number, Order number, Amount)
  R4'=(VAT category, VAT percentage)
• Projected FDs are the same as the original FDs, since we used the FD=table approach → no FD is lost

Lossless decomposition

• In a lossy decomposition, the original data is lost → BAD structure; a usable decomposition must always be lossless!
• We use a simple algorithm to determine whether we can get back from the small tables to the original big table
• Basically, we check which other fields can be "attached" to our existing fields using the FDs that we have written up

EXAMPLE
R1 (F1, F2, F3, F5)
R2 (F1, F3, F4)
R3 (F4, F5)
FD1: {F1} → {F3, F5}
FD2: {F5} → {F1, F4}
FD3: {F3, F4} → {F2}
• Starting point: a 3x5 table with unique values B(i,j) (rows=relations, columns=attributes)
• Then modify the rows so that they represent the relations: if field j is present in relation i, then B(i,j) is replaced with A(j)

EXAMPLE
R1 (F1, F2, F3, F5)
R2 (F1, F3, F4)
R3 (F4, F5)

Starting table:
      F1      F2      F3      F4      F5
R1    B(1,1)  B(1,2)  B(1,3)  B(1,4)  B(1,5)
R2    B(2,1)  B(2,2)  B(2,3)  B(2,4)  B(2,5)
R3    B(3,1)  B(3,2)  B(3,3)  B(3,4)  B(3,5)

After marking the fields that are present in each relation:
      F1      F2      F3      F4      F5
R1    A(1)    A(2)    A(3)    B(1,4)  A(5)
R2    A(1)    B(2,2)  A(3)    A(4)    B(2,5)
R3    B(3,1)  B(3,2)  B(3,3)  A(4)    A(5)

EXAMPLE
FD1: {F1} → {F3, F5}

Before:
      F1      F2      F3      F4      F5
R1    A(1)    A(2)    A(3)    B(1,4)  A(5)
R2    A(1)    B(2,2)  A(3)    A(4)    B(2,5)
R3    B(3,1)  B(3,2)  B(3,3)  A(4)    A(5)

After (rows R1 and R2 agree on F1, so their F3 and F5 values must agree too: B(2,5) becomes A(5)):
      F1      F2      F3      F4      F5
R1    A(1)    A(2)    A(3)    B(1,4)  A(5)
R2    A(1)    B(2,2)  A(3)    A(4)    A(5)
R3    B(3,1)  B(3,2)  B(3,3)  A(4)    A(5)

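EXAMPLE
FD2: {F5} → {F1, F4}

Before:
      F1      F2      F3      F4      F5
R1    A(1)    A(2)    A(3)    B(1,4)  A(5)
R2    A(1)    B(2,2)  A(3)    A(4)    A(5)
R3    B(3,1)  B(3,2)  B(3,3)  A(4)    A(5)

After (all three rows agree on F5, so F1 and F4 are made equal: B(3,1) becomes A(1), B(1,4) becomes A(4)):
      F1      F2      F3      F4      F5
R1    A(1)    A(2)    A(3)    A(4)    A(5)
R2    A(1)    B(2,2)  A(3)    A(4)    A(5)
R3    A(1)    B(3,2)  B(3,3)  A(4)    A(5)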

EXAMPLE
FD3: {F3, F4} → {F2}

Before:
      F1      F2      F3      F4      F5
R1    A(1)    A(2)    A(3)    A(4)    A(5)
R2    A(1)    B(2,2)  A(3)    A(4)    A(5)
R3    A(1)    B(3,2)  B(3,3)  A(4)    A(5)

After (rows R1 and R2 agree on F3 and F4, so B(2,2) becomes A(2)):
      F1      F2      F3      F4      F5
R1    A(1)    A(2)    A(3)    A(4)    A(5)
R2    A(1)    A(2)    A(3)    A(4)    A(5)
R3    A(1)    B(3,2)  B(3,3)  A(4)    A(5)

Lossless?

• Decision: when there is no more FD to apply, we check whether the table has a row with nothing but A(j) values
• If there is a row like that, then the decomposition is lossless. If not, then the decomposition is lossy
• We don't even have to loop through all FDs: we can stop as soon as a good row is created with nothing but A(j) values
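The table-filling procedure from the previous slides can be sketched in a few lines. This is one possible implementation under stated assumptions: the function name `is_lossless` is made up, the string "a" plays the role of A(j) in column j, and the tuple ("b", i) plays the role of B(i,j) in row i.

```python
def is_lossless(attributes, relations, fds):
    """Fill the table as on the slides and look for a row of only 'a' values."""
    # One row per relation: 'a' where the field is present, a unique marker otherwise.
    table = [{attr: "a" if attr in rel else ("b", i) for attr in attributes}
             for i, rel in enumerate(relations)]

    def has_full_row():
        return any(all(v == "a" for v in row.values()) for row in table)

    changed = True
    while changed and not has_full_row():
        changed = False
        for lhs, rhs in fds:
            # Group the rows that agree on every field of the left side.
            groups = {}
            for row in table:
                groups.setdefault(tuple(row[a] for a in lhs), []).append(row)
            for rows in groups.values():
                for attr in rhs:
                    values = {row[attr] for row in rows}
                    if len(values) > 1:
                        # Make the values equal, preferring 'a' over any 'b' marker.
                        target = "a" if "a" in values else min(values, key=str)
                        for row in table:
                            if row[attr] in values:
                                row[attr] = target
                        changed = True
    return has_full_row()


ATTRS = ["F1", "F2", "F3", "F4", "F5"]
RELATIONS = [{"F1", "F2", "F3", "F5"}, {"F1", "F3", "F4"}, {"F4", "F5"}]
FDS = [(["F1"], ["F3", "F5"]), (["F5"], ["F1", "F4"]), (["F3", "F4"], ["F2"])]
example_is_lossless = is_lossless(ATTRS, RELATIONS, FDS)
```

With the FDs of the example, the first row becomes all "a" after FD2 is applied, so the loop stops early, just as the last bullet describes.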

BCNF

• Fitems: {Item number} → {Item name, Price, VAT category}
• Forders: {Order number} → {Shipping address}
• ForderItems: {Item number, Order number} → {Amount}
• Fvat: {VAT category} → {VAT percentage}

Item No.  Item name  Price  VAT cat.  VAT pct.  Order No.  Amt.  Ship. Addr.

EXAMPLE

Starting table:
      Item No.  Item name  Price   VAT cat.  VAT pct.  Order No.  Amt.    Ship. Addr.
R1'   B(1,1)    B(1,2)     B(1,3)  B(1,4)    B(1,5)    B(1,6)     B(1,7)  B(1,8)
R2'   B(2,1)    B(2,2)     B(2,3)  B(2,4)    B(2,5)    B(2,6)     B(2,7)  B(2,8)
R3'   B(3,1)    B(3,2)     B(3,3)  B(3,4)    B(3,5)    B(3,6)     B(3,7)  B(3,8)
R4'   B(4,1)    B(4,2)     B(4,3)  B(4,4)    B(4,5)    B(4,6)     B(4,7)  B(4,8)

After marking the fields that are present in each relation:
      Item No.  Item name  Price   VAT cat.  VAT pct.  Order No.  Amt.    Ship. Addr.
R1'   A(1)      A(2)       A(3)    A(4)      B(1,5)    B(1,6)     B(1,7)  B(1,8)
R2'   B(2,1)    B(2,2)     B(2,3)  B(2,4)    B(2,5)    A(6)       B(2,7)  A(8)
R3'   A(1)      B(3,2)     B(3,3)  B(3,4)    B(3,5)    A(6)       A(7)    B(3,8)
R4'   B(4,1)    B(4,2)     B(4,3)  A(4)      A(5)      B(4,6)     B(4,7)  B(4,8)

EXAMPLE

Fitems: {Item number} → {Item name, Price, VAT category}

Before:
      Item No.  Item name  Price   VAT cat.  VAT pct.  Order No.  Amt.    Ship. Addr.
R1'   A(1)      A(2)       A(3)    A(4)      B(1,5)    B(1,6)     B(1,7)  B(1,8)
R2'   B(2,1)    B(2,2)     B(2,3)  B(2,4)    B(2,5)    A(6)       B(2,7)  A(8)
R3'   A(1)      B(3,2)     B(3,3)  B(3,4)    B(3,5)    A(6)       A(7)    B(3,8)
R4'   B(4,1)    B(4,2)     B(4,3)  A(4)      A(5)      B(4,6)     B(4,7)  B(4,8)

After (rows R1' and R3' agree on Item No., so R3' gets A(2), A(3), A(4)):
      Item No.  Item name  Price   VAT cat.  VAT pct.  Order No.  Amt.    Ship. Addr.
R1'   A(1)      A(2)       A(3)    A(4)      B(1,5)    B(1,6)     B(1,7)  B(1,8)
R2'   B(2,1)    B(2,2)     B(2,3)  B(2,4)    B(2,5)    A(6)       B(2,7)  A(8)
R3'   A(1)      A(2)       A(3)    A(4)      B(3,5)    A(6)       A(7)    B(3,8)
R4'   B(4,1)    B(4,2)     B(4,3)  A(4)      A(5)      B(4,6)     B(4,7)  B(4,8)

EXAMPLE

Forders: {Order number} → {Shipping address}

Before:
      Item No.  Item name  Price   VAT cat.  VAT pct.  Order No.  Amt.    Ship. Addr.
R1'   A(1)      A(2)       A(3)    A(4)      B(1,5)    B(1,6)     B(1,7)  B(1,8)
R2'   B(2,1)    B(2,2)     B(2,3)  B(2,4)    B(2,5)    A(6)       B(2,7)  A(8)
R3'   A(1)      A(2)       A(3)    A(4)      B(3,5)    A(6)       A(7)    B(3,8)
R4'   B(4,1)    B(4,2)     B(4,3)  A(4)      A(5)      B(4,6)     B(4,7)  B(4,8)

After (rows R2' and R3' agree on Order No., so B(3,8) becomes A(8)):
      Item No.  Item name  Price   VAT cat.  VAT pct.  Order No.  Amt.    Ship. Addr.
R1'   A(1)      A(2)       A(3)    A(4)      B(1,5)    B(1,6)     B(1,7)  B(1,8)
R2'   B(2,1)    B(2,2)     B(2,3)  B(2,4)    B(2,5)    A(6)       B(2,7)  A(8)
R3'   A(1)      A(2)       A(3)    A(4)      B(3,5)    A(6)       A(7)    A(8)
R4'   B(4,1)    B(4,2)     B(4,3)  A(4)      A(5)      B(4,6)     B(4,7)  B(4,8)

EXAMPLE

ForderItems: {Item number, Order number} → {Amount}

No two rows agree on both Item No. and Order No., so the table does not change:
      Item No.  Item name  Price   VAT cat.  VAT pct.  Order No.  Amt.    Ship. Addr.
R1'   A(1)      A(2)       A(3)    A(4)      B(1,5)    B(1,6)     B(1,7)  B(1,8)
R2'   B(2,1)    B(2,2)     B(2,3)  B(2,4)    B(2,5)    A(6)       B(2,7)  A(8)
R3'   A(1)      A(2)       A(3)    A(4)      B(3,5)    A(6)       A(7)    A(8)
R4'   B(4,1)    B(4,2)     B(4,3)  A(4)      A(5)      B(4,6)     B(4,7)  B(4,8)

EXAMPLE

Fvat: {VAT category} → {VAT percentage}

Before:
      Item No.  Item name  Price   VAT cat.  VAT pct.  Order No.  Amt.    Ship. Addr.
R1'   A(1)      A(2)       A(3)    A(4)      B(1,5)    B(1,6)     B(1,7)  B(1,8)
R2'   B(2,1)    B(2,2)     B(2,3)  B(2,4)    B(2,5)    A(6)       B(2,7)  A(8)
R3'   A(1)      A(2)       A(3)    A(4)      B(3,5)    A(6)       A(7)    A(8)
R4'   B(4,1)    B(4,2)     B(4,3)  A(4)      A(5)      B(4,6)     B(4,7)  B(4,8)

After (rows R1', R3' and R4' agree on VAT cat., so B(1,5) and B(3,5) become A(5); row R3' now contains only A(j) values, so the decomposition is lossless):
      Item No.  Item name  Price   VAT cat.  VAT pct.  Order No.  Amt.    Ship. Addr.
R1'   A(1)      A(2)       A(3)    A(4)      A(5)      B(1,6)     B(1,7)  B(1,8)
R2'   B(2,1)    B(2,2)     B(2,3)  B(2,4)    B(2,5)    A(6)       B(2,7)  A(8)
R3'   A(1)      A(2)       A(3)    A(4)      A(5)      A(6)       A(7)    A(8)
R4'   B(4,1)    B(4,2)     B(4,3)  A(4)      A(5)      B(4,6)     B(4,7)  B(4,8)