O. Günther: Database Management Systems
Database Management Systems
Prof. Oliver Günther, Ph.D.

Databases = Electronic Filing Cabinets?
• online access vs. applications
• difference DB-WWW?

Requirements for a Database System
• large capacity - huge data sets:
  - banking/insurance apps.: gigabytes of data (10^9 - 10^11 bytes)
  - environmental apps.: terabytes of data (> 10^12 bytes)
• user-friendly read/write access
• efficient processing - short response times
• data security
• privacy
• persistence, robustness towards hardware problems
• control of redundancy
• consistency
• multiple users (including concurrency)
• integrated data management
• structured data management (logical, physical)
• low cost
• role of standards
• data independence

3-Layer Architecture
• External layers (user views)
    PASCAL:
      record emp of
        pno: string;
        ...
        salary: integer;
      end
    COBOL:
      01 Ang
        02 P-NR PIC X(6)
        02 ABT PIC X(4)
• Conceptual layer (common logical view)
    EMPLOYEE
      PNO CHAR(6)
      DEPT CHAR(4)
      SALARY INT
• Internal layer (common physical view)
    STORED_EMP LENGTH=20
      PREFIX TYPE=BYTE(6), OFFSET=0
      EMP# TYPE=BYTE(6), OFFSET=6
      DEPT# TYPE=BYTE(4), OFFSET=12
      WAGE TYPE=FULLWORD, OFFSET=16

3-Layer Architecture (cont.)
• External layer
  - one external layer per user view or application program
  - application program: embedded database commands
  - user: ad hoc query languages, menus, frames
• Conceptual layer
  - logical view of the complete database
  - often union of all external views
• Internal layer
  - oriented along the physical storage structure (pages/blocks)
  - data independence??

Database Administration
• Database administrator (DBA)
  - user contact
  - definition of external views
  - definition of conceptual view
  - definition of internal view
  - security mechanisms
  - backup and recovery mechanisms
  - monitoring of response behavior
• Data dictionary (metadata)
  - which data are known?
  - how are the data structured logically?
  - how are the data structured physically?

Abstraction Layers: Logical vs. Physical
• logical modeling
  - entities
  - relationships
• data modeling
  - hierarchical
  - network
  - relational
  - object-oriented
• physical modeling
  - storage structures
  - access methods

Entity Relationship Model
• ER = Entity - Relationship
• entity: object, "thing"
• attribute: property
• entity set/entity class: object class
• relationship
• Example:
  - entity classes: supplier, part
  - attributes: supplier number, supplier name, address; part number, part name, color
  - entities: Miller, Smith, Shultz <supplier>; screw, nail <part>
  - relationships: supplies
  - attributes (of relationships): capacity

Data Models
• Hierarchical
  - 1:n relationships
  - tree-like data structures
  - Products: IMS, ...
• Ex.: Company, Supplier, Product, Part
• Problems
  - n:m relationships (Ex.: Product-Supplier)
  - redundancies
  - tight coupling logical-physical

Hierarchical Data Model - Example (figure)

Hierarchical Data Model - a Concrete Database (figure)

Data Models (2)
• Network ("CODASYL")
  - n:m relationships
  - graph-like data structures
  - Products: IDMS, ADABAS (Software AG), ...
  - Ex.: Supplier - Part (n:m relationship)
• figure: database schema and a concrete database for Supplier - Part
• Problem: confusing, inefficient

Data Models (3)
• Relational
  - n:m relationships
  - table as data structure
  - Products: Oracle 8i, Informix Universal Server, IBM DB2, SYBASE, Microsoft Access, Microsoft SQL Server, ...
  - market share still growing
• Problem
  - legacy problems
  - migration strategies (Y2K)?

The Relational Data Model
• Ex.: Supplier - Part
    Supplier (S-No, Name, Address)
    Part (P-No, P-Name, Color)
    SP_R (S-No, P-No, Capacity)

  Supplier
    S-No  Name     Address
    S1    Miller   M-Town
    S2    Smith    S-Town
    S3    Günther  G-Town

  Part
    P-No  P-Name   Color
    P1    bracket  blue
    P2    plate    black
    P3    screw    grey

  SP_R
    S-No  P-No  Capacity
    S1    P1    300
    S1    P2    200
    S1    P3    400
    S2    P2    300
    S2    P3    500
    S3    P3    100

The Relational Data Model: Formal Calculus
• relation (table): subset of the Cartesian product of a list of domains
• domain: set of possible values for one column
  - Ex.: INTEGER, {0,1}, {grey, blue, red}
  - similar to a data type in programming languages

The Relational Data Model: Formal Calculus (cont.)
• Cartesian product of a set of domains Di (D1 × D2 × ... × Dk): set of all k-tuples (v1, ..., vk), where vi ∈ Di (i = 1, ..., k)
  - Ex.: k = 2, D1 = {0,1}, D2 = {a,b,c}
    D1 × D2 = {(0,a), (0,b), (0,c), (1,a), (1,b), (1,c)}
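
The definition can be checked mechanically. The following is a minimal Python sketch, assuming domains are modeled as Python sets; the variable names are chosen here for illustration only.

```python
from itertools import product

# Domains from the example: D1 = {0,1}, D2 = {a,b,c}.
D1 = {0, 1}
D2 = {"a", "b", "c"}

# Cartesian product D1 x D2: all 2-tuples (v1, v2) with v1 in D1 and v2 in D2.
cartesian = set(product(D1, D2))

# A relation over (D1, D2) is any finite subset of that product.
relation = {(0, "a"), (0, "b"), (1, "b")}

print(sorted(cartesian))
print(relation <= cartesian)
```

The subset test in the last line is exactly the condition a set of tuples has to meet to count as a relation over these two domains.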

The Relational Data Model: Formal Calculus (cont.)
• relation: finite subset of the Cartesian product
  - Ex.:
      Column-1  Column-2
      0         a
      0         b
      1         b
• tuple: element (line) of a relation
• arity or degree: number of attributes (columns) of a relation
• a tuple (v1, ..., vk) has k components (k-tuple)
• schema: the collection of the relation name, the attribute names, and the domains
  Ex.: Supplier (S-No, Name, Address)
• relations are sets (in principle ...)
  - no tuple can appear more than once
  - not sorted

Entities vs. Relationships vs. Relations
• entity set → relation
• attribute → column
• ex.: Supplier (S-No, Name, Address, ...)
• entity → line (tuple)
• relationship between entity sets E1, ..., Ek → relation whose schema consists of the key attributes of E1, ..., Ek (+ possibly additional information)
  Ex. 1: Supplier (S-No, Name, Address)
         Part (P-No, P-Name, Color)
         SP_R (S-No, P-No, Capacity)
  Ex. 2: Student (Student-No, Name, Birthdate, ...)
         Lecture (Department, Lecture-No, ...)
         Takes (Student-No, Lecture-No, Grade, ...)

Key of a Relation
• a set S of attributes of a relation R is a key of R if
  (1) no instance of R may contain two different tuples that have the same values for all attributes in S (uniqueness)
  (2) there is no proper subset of S that has property (1) (minimality)
• often depends on the application:
  Ex. 1: Supplier (S-No, Name, Address)
         Part (P-No, P-Name, Color)
         SP_R (S-No, P-No, Capacity)
  Ex. 2: Student (Student-No, Name, Birthdate, ...)
         Lecture (Department, Lecture-No, ...)
         Takes (Student-No, Lecture-No, Grade, ...)

Key of a Relation (cont.)
• the question of what is the key of a relation R depends on R's schema, not on the current instance (Ex.: Supplier.Name)
• relations can have more than one key
  Ex.: Department (Name, Address, Dept_Code): Name and Dept_Code are both unique (in one company) and therefore keys
  But: Employee (Name, P-No, Salary)??
• if there is more than one key, one selects one of these candidate keys as primary key, depending on the application
• if one has more than one relation with the same key, one may consider merging them
  Ex.: Department (Name, Address, Dept_Code)
       Manager (Emp_No, Dept_Name)
       Dept (Name, Address, Dept_Code, Manager)

Relational Algebra
• set of mathematical operations on relations [Codd, 1970s]
• Ex.: relations R and S
    R:  A  B  C
        a  b  c
        d  a  f
        c  b  d
    S:  D  E  F
        b  g  a
        d  a  f
• union R ∪ S and difference R - S
  - R and S have to have the same arity
  - domains have to be compatible

Relational Algebra (cont.)
• Cartesian product R × S (result columns: A B C D E F)
• projection (subset of columns)
  - Ex.: π_{A,C}(R) (result columns: A C)
• selection (subset of tuples (lines))
  - Ex.: σ_{B=b}(R) (result columns: A B C)

Relational Algebra (cont.)
• θ-join R ⋈_{iθj} S
  - i, j: names of columns (R.i, S.j)
  - θ: arithmetic comparison operator (=, <, ≤, ...)
  - subset of the Cartesian product R × S for which iθj is true
  - Ex.: R ⋈_{B<D} S (result columns: A B C D E F)

Relational Algebra (cont.)
• equijoin: special kind of θ-join where θ is =
  Ex.: R ⋈_{B=D} S (result columns: A B C D E F)
• natural join: special kind of equijoin, applicable if the two input relations have columns with the same name
  Ex.: T ⋈ U (result columns: A B C D)
    T:  A  B  C
        a  b  c
        d  b  c
        b  b  f
        c  a  d
    U:  B  C  D
        b  c  d
        b  c  e
        a  d  b
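
The operators above can be tried out on the example relations R, T and U. This is a minimal Python sketch, assuming a relation is represented as a pair (column names, set of tuples); that representation and the function names `select`, `project` and `natural_join` are illustrative choices, not part of the slides.

```python
# Example relations from the slides, as (column names, set of tuples).
R = (("A", "B", "C"), {("a", "b", "c"), ("d", "a", "f"), ("c", "b", "d")})
T = (("A", "B", "C"), {("a", "b", "c"), ("d", "b", "c"),
                       ("b", "b", "f"), ("c", "a", "d")})
U = (("B", "C", "D"), {("b", "c", "d"), ("b", "c", "e"), ("a", "d", "b")})

def select(rel, pred):
    """Selection: keep the tuples for which pred(row as dict) is true."""
    cols, rows = rel
    return cols, {r for r in rows if pred(dict(zip(cols, r)))}

def project(rel, keep):
    """Projection: keep only the named columns; the set removes duplicates."""
    cols, rows = rel
    idx = [cols.index(c) for c in keep]
    return tuple(keep), {tuple(r[i] for i in idx) for r in rows}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common column names."""
    c1, rows1 = r1
    c2, rows2 = r2
    common = [c for c in c1 if c in c2]
    extra = [c for c in c2 if c not in c1]
    out = set()
    for a in rows1:
        da = dict(zip(c1, a))
        for b in rows2:
            db = dict(zip(c2, b))
            if all(da[c] == db[c] for c in common):
                out.add(a + tuple(db[c] for c in extra))
    return c1 + tuple(extra), out

sel = select(R, lambda row: row["B"] == "b")   # sigma_{B=b}(R)
proj = project(R, ["A", "C"])                  # pi_{A,C}(R)
joined = natural_join(T, U)                    # T natural-join U
```

The nested loop in `natural_join` is the naive evaluation (filter the Cartesian product); a real DBMS would pick a cheaper join method.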

SQL: Structured Query Language
• 4th Generation Language (4GL) for data querying and manipulation
• 4GL: user only has to specify which data are needed, not how they can be obtained (data independence!)
• DBMS (Database Management System) takes care of (efficient) computation
• SQL: IBM Research (San Jose, California), '70s

Toy Database

  Customers
    Name   Address       Balance
    Peter  1 Sybel St.   -200
    Jane   5 Kant St.    -50
    Ruth   7 Miller St.  43

  Orders
    O_No  Date   Customer
    1024  Jan 3  Peter
    1025  Jan 3  Ruth
    1026  Jan 4  Peter

  Contains
    O_No  Product  Amount
    1024  Brie     3
    1024  Perrier  6
    1025  Brie     5
    1025  Oysters  12
    1025  Onions   1
    1026  Peanuts  2048

  Supplies
    Name   Product  Price
    Chris  Brie     3.49
    Chris  Perrier  1.19
    Chris  Peanuts  .06
    Chris  Oysters  .25
    Jack   Brie     3.98
    Jack   Perrier  1.19
    Jack   Onions   .69

SQL: Projection
• Ex.: Find name and account balance of all customers
• relational algebra:
    π_{Name,Balance}(Customers)
• in SQL:
    SELECT Name, Balance FROM Customers
• projection in SQL in general:
    SELECT Ri1.A1, Ri2.A2, ..., Rir.Ar
    FROM R1, R2, ..., Rk

SQL: Selection
• Ex.: Find all customers with negative balance
• SQL:
    SELECT * FROM Customers WHERE Balance < 0
• relational algebra:
    σ_{Balance<0}(Customers)
• selection in SQL in general:
    SELECT * {all attributes}
    FROM R
    WHERE <condition>

SQL: Uniqueness of Names
• if attribute names are unique, one can drop the relation name(s) in the SELECT and the WHERE clause
• Ex.:
    SELECT Customers.Name
    FROM Customers
    WHERE Customers.Balance < 0

SQL: Aliases
• alias = second name for an attribute
• to be attached to the original name of the column
• Ex.:
    SELECT Name Client, Address, Balance Deficit
    FROM Customers
    WHERE Balance < 0

SQL: Equijoins
• Ex.: Find the products ordered by Peter
• in relational algebra
• in SQL
    SELECT ...
    FROM ...
    WHERE ...
• Ex.: Find the names of all suppliers that carry at least one of the products that have been ordered by Peter
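
Both example queries can be run against the toy database. The sketch below uses Python's `sqlite3` module with an in-memory database; the column types and the second query are one possible formulation assumed here, not necessarily the solution intended in the lecture.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Orders   (O_No INTEGER, Date TEXT, Customer TEXT);
CREATE TABLE Contains (O_No INTEGER, Product TEXT, Amount INTEGER);
CREATE TABLE Supplies (Name TEXT, Product TEXT, Price REAL);
INSERT INTO Orders VALUES (1024,'Jan 3','Peter'),(1025,'Jan 3','Ruth'),
  (1026,'Jan 4','Peter');
INSERT INTO Contains VALUES (1024,'Brie',3),(1024,'Perrier',6),(1025,'Brie',5),
  (1025,'Oysters',12),(1025,'Onions',1),(1026,'Peanuts',2048);
INSERT INTO Supplies VALUES ('Chris','Brie',3.49),('Chris','Perrier',1.19),
  ('Chris','Peanuts',0.06),('Chris','Oysters',0.25),('Jack','Brie',3.98),
  ('Jack','Perrier',1.19),('Jack','Onions',0.69);
""")

# Products ordered by Peter: selection on Customer plus equijoin on O_No.
products = [row[0] for row in con.execute("""
    SELECT Product FROM Orders, Contains
    WHERE Customer = 'Peter' AND Orders.O_No = Contains.O_No
    ORDER BY Product""")]

# Suppliers carrying at least one of those products: one more equijoin.
suppliers = [row[0] for row in con.execute("""
    SELECT DISTINCT Supplies.Name FROM Orders, Contains, Supplies
    WHERE Customer = 'Peter' AND Orders.O_No = Contains.O_No
      AND Contains.Product = Supplies.Product
    ORDER BY Supplies.Name""")]

print(products, suppliers)
```

The ORDER BY clauses are only there to make the printed result deterministic; relations themselves are unordered.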

SQL: Processing a Join Query
• Ex.:
    SELECT Product
    FROM Orders, Contains
    WHERE Customer = 'Peter' AND Orders.O_No = Contains.O_No

SQL: Processing a Join Query (cont.)
• Starting with Orders
  - Selection: Customer = 'Peter'
  - Equijoin: Orders.O_No = Contains.O_No

SQL: Processing a Join Query (cont.)
  - Projection: SELECT Product
• Operations can sometimes be exchanged: efficiency?

SQL: Deleting Multiple Copies of a Tuple
• why do they exist?
• keyword DISTINCT
• Ex.: SELECT DISTINCT Customer FROM Orders
• without DISTINCT?
SQL: Tuple Variables
• necessary if one needs to address several different tuples of the same relation in the same query
• Ex.: Find names and addresses of all customers that have less money on their account than Jane
    SELECT C1.Name, C1.Address
    FROM Customers C1, Customers C2
    WHERE C1.Balance < C2.Balance AND C2.Name = 'Jane'
• tuple variables are relations, i.e., sets of tuples
• they serve to represent intermediate results
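
The query with the two tuple variables C1 and C2 can be executed as written. A minimal sketch with Python's `sqlite3` module and the Customers table of the toy database; the column types are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (Name TEXT, Address TEXT, Balance INTEGER);
INSERT INTO Customers VALUES ('Peter','1 Sybel St.',-200),
  ('Jane','5 Kant St.',-50),('Ruth','7 Miller St.',43);
""")

# C1 ranges over the candidate customers, C2 is pinned to Jane's tuple.
rows = con.execute("""
    SELECT C1.Name, C1.Address
    FROM Customers C1, Customers C2
    WHERE C1.Balance < C2.Balance AND C2.Name = 'Jane'
""").fetchall()

print(rows)
```

Without the two names for the same relation there would be no way to compare one customer's balance with another customer's balance in a single query.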

SQL: Subqueries
• nesting of queries
• reference to intermediate results via the keyword IN
• Ex.: Find all suppliers that carry at least one of the products ordered by Peter
    SELECT Name
    FROM Supplies
    WHERE Product IN
      (SELECT Product
       FROM Contains
       WHERE O_No IN
         (SELECT O_No
          FROM Orders
          WHERE Customer = 'Peter'))
• IN corresponds to the element operator ∈

SQL: Subqueries (cont.)
• Instead of IN: ALL
    SELECT Product
    FROM Supplies
    WHERE Price >= ALL (SELECT Price FROM Supplies)
• ALL corresponds to the universal quantifier

SQL: Subqueries (cont.)
• Instead of IN: ANY
    SELECT *
    FROM Orders
    WHERE O_No < ANY (SELECT O_No FROM Orders WHERE Customer = 'Peter')
• ANY corresponds to the existential quantifier

SQL: Subqueries (cont.)
• Instead of IN: =
    SELECT Product
    FROM Contains
    WHERE O_No = (SELECT O_No FROM Orders WHERE Customer = 'Ruth')
• If the cardinality of the subquery's result is greater than 1: ERROR

SQL: Aggregates
• Functions for the aggregation of single values
    AVG - average
    COUNT - number
    SUM - sum
    MIN - minimum
    MAX - maximum
    STDDEV - standard deviation
    VARIANCE - variance
• Ex.: SELECT AVG(Balance) FROM Customers
• Or: SELECT AVG(Balance) Average FROM Customers

SQL: Aggregates (cont.)
• Ex.: SELECT COUNT (DISTINCT Name) No-Suppliers FROM Supplies
  - result: No-Suppliers = 2
• Ex.: SELECT COUNT(Name) No-Brie-Suppliers FROM Supplies WHERE Product = 'Brie'
  - no duplicate elimination required
  - result: No-Brie-Suppliers = 2

SQL: Aggregation and Grouping
• GROUP BY A1, A2, ..., Ak
• two tuples are in the same group if they have the same values for the attributes A1, A2, ..., Ak
• Ex.:
    SELECT Product, AVG(Price) Average-Price
    FROM Supplies
    GROUP BY Product
  (result columns: Product, Average-Price)

SQL: Aggregation and Grouping (cont.)
• Ex.:
    SELECT Customer, AVG(Amount)
    FROM Orders, Contains
    WHERE Orders.O_No = Contains.O_No
    GROUP BY Customer

SQL: GROUP BY ... HAVING
• general format:
    GROUP BY A1, A2, ..., Ak
    HAVING <condition>
• <condition> is a boolean expression that is applied to each group separately
• one selects only those groups where the condition is true
• Ex.:
    SELECT Product, AVG(Price) Average-Price
    FROM Supplies
    GROUP BY Product
    HAVING COUNT(*) > 1
• Or: HAVING COUNT (DISTINCT Price) > 1
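
The two HAVING variants select different groups on the toy data, which a quick run makes visible. The sketch below uses Python's `sqlite3` module; the alias is written `Average_Price` with an underscore because a hyphen is not a valid identifier character in SQLite, which is a deviation from the slide made here for runnability.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Supplies (Name TEXT, Product TEXT, Price REAL);
INSERT INTO Supplies VALUES ('Chris','Brie',3.49),('Chris','Perrier',1.19),
  ('Chris','Peanuts',0.06),('Chris','Oysters',0.25),('Jack','Brie',3.98),
  ('Jack','Perrier',1.19),('Jack','Onions',0.69);
""")

# Groups with more than one supplier tuple.
by_count = con.execute("""
    SELECT Product, AVG(Price) AS Average_Price FROM Supplies
    GROUP BY Product HAVING COUNT(*) > 1 ORDER BY Product""").fetchall()

# Groups in which the suppliers actually charge different prices.
by_distinct_price = con.execute("""
    SELECT Product, AVG(Price) AS Average_Price FROM Supplies
    GROUP BY Product HAVING COUNT(DISTINCT Price) > 1 ORDER BY Product""").fetchall()

print(by_count)
print(by_distinct_price)
```

Perrier passes the first condition (two suppliers) but not the second, because both suppliers charge 1.19.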

SQL: Insertion of Tuples
• in general: INSERT INTO R VALUES (V1, ..., Vk)
• ex.: INSERT INTO Supplies VALUES ('Jack', 'Oysters', .24)
• null values: INSERT INTO Supplies (Name, Product) VALUES ('Jack', 'Oysters')
• nested insertions:
    INSERT INTO Sales-Chris
    SELECT Product, Price
    FROM Supplies
    WHERE Name = 'Chris'

SQL: Deletion of Tuples
• in general: DELETE FROM R WHERE <condition>
• ex.: DELETE FROM Supplies WHERE Name = 'Chris' AND Product = 'Perrier'
• ex.: Delete all orders containing Brie

SQL: Updating Tuples
• in general:
    UPDATE R
    SET A1 = x1, ..., Ak = xk
    WHERE <condition>
• ex.: UPDATE Supplies SET Price = 1.00 WHERE Name = 'Chris' AND Product = 'Perrier'
• ex.: Chris reduces all prices by 10 percent.
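
The two open exercises (deleting all orders containing Brie, and Chris's 10 percent price cut) admit several formulations. The following Python `sqlite3` sketch shows one possible answer to each; the statements are suggestions made here, not the lecture's reference solutions, and only the tables needed are created.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Orders   (O_No INTEGER, Date TEXT, Customer TEXT);
CREATE TABLE Contains (O_No INTEGER, Product TEXT, Amount INTEGER);
CREATE TABLE Supplies (Name TEXT, Product TEXT, Price REAL);
INSERT INTO Orders VALUES (1024,'Jan 3','Peter'),(1025,'Jan 3','Ruth'),
  (1026,'Jan 4','Peter');
INSERT INTO Contains VALUES (1024,'Brie',3),(1024,'Perrier',6),(1025,'Brie',5),
  (1025,'Oysters',12),(1025,'Onions',1),(1026,'Peanuts',2048);
INSERT INTO Supplies VALUES ('Chris','Brie',3.49),('Chris','Perrier',1.19),
  ('Jack','Brie',3.98);
""")

# Delete all orders containing Brie; the subquery finds their order numbers.
con.execute("""DELETE FROM Orders
               WHERE O_No IN (SELECT O_No FROM Contains WHERE Product = 'Brie')""")
# Then remove the order lines whose order no longer exists.
con.execute("DELETE FROM Contains WHERE O_No NOT IN (SELECT O_No FROM Orders)")

# Chris reduces all prices by 10 percent: the new value refers to the old one.
con.execute("UPDATE Supplies SET Price = Price * 0.9 WHERE Name = 'Chris'")

remaining_orders = [r[0] for r in con.execute("SELECT O_No FROM Orders")]
remaining_lines = con.execute("SELECT * FROM Contains").fetchall()
chris_brie = con.execute(
    "SELECT Price FROM Supplies WHERE Name = 'Chris' AND Product = 'Brie'"
).fetchone()[0]
```

The order of the two DELETE statements matters: deleting from Contains first would destroy the information about which orders contained Brie.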

SQL - DDL
• DDL: Data Definition Language
• so far we only discussed the DML - Data Manipulation Language
• typical DDL command: CREATE TABLE
• general format:
    CREATE TABLE R (A1 T1 [NOT NULL], ..., Ak Tk [NOT NULL])
• ex.:
    CREATE TABLE Supplies
      (Name CHAR(20) NOT NULL,
       Product CHAR(10) NOT NULL,
       Price NUMBER (6,2))
• to delete a table: DROP TABLE Supplies

Views
• logical relations
• so far we only discussed physical relations (stored on disk), also called base relations
• views serve to represent specific user views
• view contents are not stored physically but computed on demand
• one can query (i.e., read only) views just like base relations
• updates (write access) are not so easy

Views (cont.)
• view definition - general form:
    CREATE VIEW V (A1, ..., Ak) AS <SELECT Query>
• Ex.:
    CREATE VIEW Offer-Chris (Product, Price) AS
    SELECT Product, Price
    FROM Supplies
    WHERE Name = 'Chris'

    Product  Price
    Brie     3.49
    Perrier  1.19
    Peanuts  .06
    Oysters  .25
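
That a view is computed on demand from the current base relation can be observed directly. A minimal Python `sqlite3` sketch; the view is named `Offer_Chris` (underscore instead of hyphen) and defined without an explicit column list, both of which are adaptations assumed here so that the statement runs in SQLite.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Supplies (Name TEXT, Product TEXT, Price REAL);
INSERT INTO Supplies VALUES ('Chris','Brie',3.49),('Chris','Perrier',1.19),
  ('Chris','Peanuts',0.06),('Chris','Oysters',0.25),('Jack','Brie',3.98);
CREATE VIEW Offer_Chris AS
  SELECT Product, Price FROM Supplies WHERE Name = 'Chris';
""")

def perrier_price():
    """Read Chris's Perrier price through the view."""
    return con.execute(
        "SELECT Price FROM Offer_Chris WHERE Product = 'Perrier'").fetchone()[0]

before = perrier_price()
# Update the base relation only; the view itself is never touched.
con.execute("UPDATE Supplies SET Price = 1.00 "
            "WHERE Name = 'Chris' AND Product = 'Perrier'")
after = perrier_price()
view_size = con.execute("SELECT COUNT(*) FROM Offer_Chris").fetchone()[0]

print(before, after, view_size)
```

Jack's tuple never shows up in the view, which is how views support different user groups and privacy.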

View Update Problem
• ex.: Offer-Chris
  - DELETE
  - INSERT
  - UPDATE (Price)
  - UPDATE (Product)
• more complex example:
    CREATE VIEW Customer-Order (Name, Date, Product, Amount) AS
    SELECT Customer, Date, Product, Amount
    FROM Orders, Contains
    WHERE Orders.O_No = Contains.O_No
  - DELETE
  - INSERT
  - UPDATE (Name)
  - UPDATE (Date)
  - UPDATE (Product)
  - UPDATE (Amount)

View Update Problem (cont.)
• ex.:
    CREATE VIEW X AS
    SELECT Product, AVG(Price) DP
    FROM Supplies
    GROUP BY Product
  - UPDATE (DP)
  - UPDATE (Product)
  - INSERT
  - DELETE

View Update Problem (cont.)
• ex.:
    CREATE VIEW Y AS
    SELECT C2.Name, C2.Address
    FROM Customers C1, Customers C2
    WHERE C2.Balance < C1.Balance AND C1.Name = 'Jane'
  - INSERT
  - DELETE
  - UPDATE (Name)
  - UPDATE (Address)

View Update Problem (cont.)
• Views can be updated if
  (1) the corresponding base relations can be updated (i.e., no non-updatable views)
  (2) the SELECT command is a combination of only projections (column subsets) and selections (row subsets) (i.e., no joins, subqueries, tuple variables, aggregates, etc.). In case of projections, the key has to be preserved.

View Update Problem (cont.)
• figure (nested sets), labels:
  - all possible views
  - views that can be updated
  - views according to (1) and (2)
  - views that can be updated in SQL (version-dependent)

Views - Summary
• logical relations
• defined using physical base relations (and possibly other views)
• (typically) not stored physically but computed on demand using the current content of the base relations
• same data can be "viewed" in different shapes
• supports different user groups and privacy
• view updates: problematic because not all updates can be mapped to base relations

Databases - Programming Languages
• collision of two different paradigms
  - PL: one tuple at a time
  - DB: many tuples at a time
• interface tuple - variable: communication via "cursors" (buffer)
• queries are preformulated using variables
• instantiation at run-time with real values

Ex: Embedded SQL
    exec sql begin declare section;
      int O_No, Amount;
      char Date[10], Customer[20], Product[10];
    exec sql end declare section;
    exec sql connect;
    exec sql prepare order_insert from
      insert into Orders values (:O_No, :Date, :Customer);
    exec sql prepare cont_insert from
      insert into Contains values (:O_No, :Product, :Amount);
    write ('Enter Order No., Date, and Customer');
    read (O_No); read (Date); read (Customer);
    exec sql execute order_insert using :O_No, :Date, :Customer;
    write ('Enter a list of tuples Product-Amount, terminate with "end"');
    read (Product);
    while (Product != 'end') {
      read (Amount);
      exec sql execute cont_insert using :O_No, :Product, :Amount;
      read (Product);
    }
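
The same pattern (a preformulated statement, instantiated at run time with real values) exists in most call-level database interfaces. As a rough analogue, here is a Python `sqlite3` sketch; it is not embedded SQL, the helper `insert_order` is a hypothetical name, and the order data (1027, Jan 5, Jane) are made-up illustration values.

```python
import sqlite3

def insert_order(con, o_no, date, customer, items):
    """Insert one Orders tuple and its Contains tuples via parameterized statements."""
    # The '?' placeholders play the role of the host variables :O_No, :Date, ...
    con.execute("INSERT INTO Orders VALUES (?, ?, ?)", (o_no, date, customer))
    # executemany re-runs the same statement once per (product, amount) pair.
    con.executemany("INSERT INTO Contains VALUES (?, ?, ?)",
                    [(o_no, product, amount) for product, amount in items])
    con.commit()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (O_No INTEGER, Date TEXT, Customer TEXT)")
con.execute("CREATE TABLE Contains (O_No INTEGER, Product TEXT, Amount INTEGER)")
insert_order(con, 1027, "Jan 5", "Jane", [("Brie", 2), ("Onions", 4)])

# A cursor hands the set-valued query result to the program one tuple at a time.
lines = []
for product, amount in con.execute(
        "SELECT Product, Amount FROM Contains ORDER BY Product"):
    lines.append((product, amount))
print(lines)
```

The final loop illustrates the cursor idea: the database produces many tuples at once, the program consumes them one by one.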

Integrity in Databases
• maintenance of a correct relationship between database and real world
• (possibly automatic) identification of invalid states of the database (i.e., states without correspondence in the real world)
• three kinds of integrity
  - domain-specific integrity (application-specific, ex.: date)
  - key integrity
  - schema integrity

Integrity in Databases (cont.)
• key integrity
  - rule 1 (entity integrity): each relation must have a key, and each tuple in the relation must have a key value that is unique and non-NULL.
  - rule 2 (referential integrity): for each foreign key FK there is another relation with a primary key PK such that each non-NULL value of FK is identical to an existing value of PK.
  - Ex.: foreign key O_No in relation Contains, foreign key Customer in relation Orders
• schema integrity
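
Both key integrity rules can be delegated to the DBMS through declarations. The sketch below uses Python's `sqlite3` module; the table definitions are assumptions made here, only the foreign key O_No in Contains is modeled, and the helper `try_insert` is a hypothetical name. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
con.executescript("""
CREATE TABLE Orders (O_No INTEGER PRIMARY KEY, Date TEXT, Customer TEXT);
CREATE TABLE Contains (
    O_No    INTEGER NOT NULL REFERENCES Orders(O_No),  -- rule 2: foreign key
    Product TEXT    NOT NULL,
    Amount  INTEGER,
    PRIMARY KEY (O_No, Product));                      -- rule 1: unique, non-NULL key
INSERT INTO Orders VALUES (1024, 'Jan 3', 'Peter');
INSERT INTO Contains VALUES (1024, 'Brie', 3);
""")

def try_insert(sql):
    """Return True if the insertion is accepted, False if integrity is violated."""
    try:
        con.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

dangling = try_insert("INSERT INTO Contains VALUES (9999, 'Brie', 1)")    # no such order
duplicate = try_insert("INSERT INTO Contains VALUES (1024, 'Brie', 7)")   # key already used
valid = try_insert("INSERT INTO Contains VALUES (1024, 'Perrier', 6)")

print(dangling, duplicate, valid)
```

Invalid states are rejected at insertion time, so the application does not have to detect them after the fact.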

Database Design
• ex. for bad database design: Suppliers-Info

    Name   Address      Product  Price
    Chris  24 Kant St.  Brie     3.49
    Chris  24 Kant St.  Perrier  1.19
    Jack   2 Main St.   Brie     3.98
    ...    ...          ...      ...

• disadvantages
  - redundancies
  - update anomalies
  - insertion anomalies (ex: supplier without products)
  - deletion anomalies (NULL in key)

Database Design by Decomposition
• approach: decomposition into relations with fewer columns
  Careful: no information loss
• Ex.: Suppliers (L-Name, L-Address)
       Supplies (L-Name, Product, Price)
• disadvantage: may require additional join operations at query time
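
Whether a decomposition loses information can be tested by joining the parts again. A minimal Python sketch on the three visible tuples of Suppliers-Info, with relations modeled as sets of tuples; the second, deliberately bad split into (Name, Address) and (Product, Price) is an assumption added here for contrast.

```python
# The three fully visible tuples of the Suppliers-Info example.
info = {
    ("Chris", "24 Kant St.", "Brie", 3.49),
    ("Chris", "24 Kant St.", "Perrier", 1.19),
    ("Jack", "2 Main St.", "Brie", 3.98),
}

# Decomposition from the slide: both parts keep the supplier name.
suppliers = {(name, addr) for name, addr, _, _ in info}
supplies = {(name, prod, price) for name, _, prod, price in info}

# Natural join on the common column re-assembles the original relation.
rejoined = {(n, a, p, pr)
            for n, a in suppliers
            for n2, p, pr in supplies if n == n2}

# Contrast: parts without a common column can only be combined by a
# Cartesian product, which invents supplier-product combinations.
offers = {(prod, price) for _, _, prod, price in info}
lossy = {(n, a, p, pr) for n, a in suppliers for p, pr in offers}

print(rejoined == info, len(lossy))
```

The redundancy is gone as well: Chris's address is stored once in `suppliers` instead of once per product.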

Functional Dependencies
• logical dependencies between columns
• cause many of the problems discussed above
  - redundancies
  - update anomalies
  - ...
• Definition: If for a relation R there is a functional dependency (FD) X → Y (where X and Y may represent one or several columns of R), then the following holds for two arbitrary tuples t1 and t2 in R:
    t1[X] = t2[X] ⇒ t1[Y] = t2[Y]
• A functional dependency defined on relation R holds for all instances of R

Functional Dependencies (cont.)
• Ex.:
    Customers: Name → Address
               Name → Balance
    Orders:    O_No → Date
               O_No → Customer
    Customers: Address → Address
    Supplies:  {Name, Product} → Price
• for each key S of a relation R and each subset T of columns of R we have: S → T
• Some FDs imply other FDs
• Ex.: F = {A → B, B → C} |= A → C

Closure of FD Sets
• F+ := {X → Y : F |= X → Y}
• the closure F+ of a set F of FDs contains all functional dependencies implied by the FDs in F
• Ex.: F = {A → B; B → C; AB → C}
    F+ =
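
Listing F+ explicitly is tedious; in practice one tests whether a single FD X → Y belongs to F+ by computing the attribute closure of X. The following Python sketch uses that standard technique (it is not spelled out on the slides) and assumes single-character attribute names; `closure` and `implies` are names chosen here.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its whole left side is already derivable.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """F |= lhs -> rhs holds exactly when rhs lies in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

F = [("A", "B"), ("B", "C"), ("AB", "C")]

print(closure("A", F))
print(implies(F, "A", "C"), implies(F, "C", "A"))
# AB -> C already follows from the first two FDs, so it is superfluous in F.
print(implies(F[:2], "AB", "C"))
```

The last check hints at why a set of FDs can often be shrunk without changing its closure.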

Minimal Cover of a Set F of FDs
• given a set F of FDs, F' is a minimal cover of F if and only if:
  (1) F'+ = F+, i.e., all FDs in F are implied by the FDs in F'. F and F' are equivalent.
  (2) the right side of each FD in F' is a single attribute
  (3) there is no (X → A) ∈ F' with (F' - {X → A})+ = F+, i.e., there are no superfluous FDs in F'
  (4) there is no (X → A) ∈ F' and Z ⊂ X with ((F' - {X → A}) ∪ {Z → A})+ = F+, i.e., no FD in F' can be replaced by a simpler FD

FDs and Database Design
• potential problem: too many FDs in a relation
• may lead to anomalies and redundancies
• solution: decomposition into several simple relations
    Ri ⊆ R (i = 1, ..., k)
    R = R1 || R2 || ... || Rk
• fewer redundancies but possibly more joins
• important for preservation of information:
  - one has to be able to re-assemble R by joining the Ri (lossless join)
  - the FDs defined in R have to be definable on the Ri (preservation of dependencies)

Database Design and Normal Forms
• why normal forms?
  - format standardization (1NF)
  - reduction/elimination of redundancies (2NF, 3NF, ...)
• theoretical tool for improving/maintaining database design quality
• in practice, however: redundancy vs. efficiency
  - redundant data may lead to inconsistencies after updates
  - but useful for efficiency reasons (shorter response times)
• tradeoff problem: to be decided case by case

1st Normal Form (1NF)
• all attributes have to be atomic
• no "repeating groups"
• important foundation of the relational model
• but: may lead to increased redundancy
• Ex.: relation Supplies

  (a) not in 1NF (repeating groups)
    Name   Product  Price
    Chris  Brie     3.49
           Perrier  1.19
           Peanuts  .06
           Oysters  .25
    Jack   Brie     3.98
           Perrier  1.19
           Lettuce  .69

  (b) in 1NF
    Name   Product  Price
    Chris  Brie     3.49
    Chris  Perrier  1.19
    Chris  Peanuts  .06
    Chris  Oysters  .25
    Jack   Brie     3.98
    Jack   Perrier  1.19
    Jack   Lettuce  .69

2nd Normal Form (2NF)
• 1NF + for all attributes A and attribute sets X in relation R:
    (X → A in R AND A not in X)
    ⇒ (X is not a proper subset of any key of R
       OR A is a key attribute, i.e., it belongs to at least one key of R)
• note: if R has only one key, this is equivalent to: 1NF + each non-key attribute is fully functionally dependent on the key, i.e., it cannot be inferred from part of the key
• trivially true for one-column keys
• Ex.: relation Supplies
  - Supplies (Name, Product, Price) is in 2NF if and only if Price depends on both Name and Product (free pricing)
  - with fixed prices (e.g. books in Germany), Supplies is no longer in 2NF
  - possibly decomposition into Supplies' (Name, Product) and Costs (Product, Price)

3rd Normal Form (3NF)
• 2NF + for all attributes A and attribute sets X in relation R:
    (X → A in R AND A not in X)
    ⇒ (X is a key of R OR X contains a key of R OR A is a key attribute)
• note: if there is only one key, this is equivalent to: 2NF + non-key attributes are mutually independent
• sufficient (but not necessary) condition: if an FD in the minimal cover contains all attributes of R, then R is in 3NF
• Ex.: relation Customers (Name, Address, Balance)
  - all attributes atomic ⇒ 1NF
  - keys have only one column ⇒ 2NF
  - Address and Balance are mutually independent ⇒ 3NF
3NF - An Example
• relation R = (C, S, Z)
• functional dependencies: F = {CS → Z, Z → C}
• R in 3NF?
• keys of R: CS, ZS
• key attributes of R: C, S, Z
• 1NF: no problem
• 2NF: o.k. because Z and C are key attributes
• 3NF: o.k. for the same reason
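
The keys stated for R = (C, S, Z) can be confirmed by brute force: try attribute sets in order of increasing size and keep those whose closure is the whole schema. A minimal Python sketch, assuming single-character attribute names; `closure` and `candidate_keys` are names chosen here, and the exhaustive search is only reasonable for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure covers the whole schema."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # Supersets of a key found earlier violate minimality.
            if any(k <= candidate for k in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys

F = [("CS", "Z"), ("Z", "C")]
keys = candidate_keys("CSZ", F)
key_attributes = set().union(*keys)

print(keys, key_attributes)
```

Since every attribute of R turns out to be a key attribute, the 2NF and 3NF conditions hold for every FD via their "A is a key attribute" branch.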

Decomposition into 3NF
• given: relation R, set F of FDs
• find: decomposition of R into a set of 3NF relations Ri
• algorithm:
    IF R in 3NF THEN stop
    ELSE
      compute minimal cover F' of F;
      create a separate relation Ri = A for each attribute A that does not occur in any FD in F';
      create a relation Ri = XA for each FD X → A in F';
      if the key K of R does not occur in any relation Ri, create one more relation Ri = K.
• decomposition fulfills
  - lossless join
  - preservation of dependencies

Decomposition into 3NF - Example
Attributes:
    L ... Lecture      R ... Room
    I ... Instructor   S ... Student
    T ... Time         G ... Grade
Relational Schema: R = (L, I, T, R, S, G)
Functional Dependencies:
    F = {L → I, TR → L, TI → R, LS → G, TS → R, TRI → LR}

Decomposition into 3NF - Example (cont.)
• Keys: ST
• Key Attributes: S, T

Decomposition into 3NF - Example (cont.)
• F = {L → I, TR → L, TI → R, LS → G, TS → R, TRI → LR}
• Minimal Cover
    F' = {L → I, TR → L, TI → R, LS → G, TS → R}
• Decomposition into Ri
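
The minimal cover and the resulting relations can be derived mechanically. The Python sketch below follows the minimal-cover conditions (single-attribute right sides, left sides reduced, superfluous FDs dropped) and then the ELSE branch of the decomposition algorithm; it assumes single-character attribute names, skips the initial "IF R in 3NF" test, and takes the key as an input. All function names are chosen here, and the relations it produces are the output of this sketch, not a result quoted from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    """Return FDs (frozenset lhs, single attribute rhs) forming a minimal cover."""
    # Split right sides into single attributes; drop trivial FDs.
    cover = [(frozenset(l), a) for l, r in fds for a in r if a not in l]
    # Shrink each left side while the FD still follows from the full set.
    reduced = []
    for lhs, a in cover:
        lhs = set(lhs)
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and a in closure(smaller, cover):
                lhs = smaller
        reduced.append((frozenset(lhs), a))
    reduced = list(dict.fromkeys(reduced))  # remove duplicates, keep order
    # Drop every FD that is implied by the remaining ones.
    result = list(reduced)
    for fd in reduced:
        rest = [g for g in result if g != fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return result

def decompose(attributes, fds, key):
    """One relation per FD of the minimal cover, plus leftovers and the key."""
    cover = minimal_cover(fds)
    used = set().union(*(lhs | {a} for lhs, a in cover))
    relations = [frozenset(x) for x in set(attributes) - used]
    relations += [lhs | {a} for lhs, a in cover]
    if not any(set(key) <= r for r in relations):
        relations.append(frozenset(key))
    return relations

F = [("L", "I"), ("TR", "L"), ("TI", "R"), ("LS", "G"), ("TS", "R"), ("TRI", "LR")]
cover = minimal_cover(F)
relations = decompose("LITRSG", F, "ST")

for lhs, a in cover:
    print("".join(sorted(lhs)), "->", a)
print(["".join(sorted(r)) for r in relations])
```

On this input the key ST is already contained in the relation built from TS → R, so the last step of the algorithm adds nothing.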

Indices
• data structures (often tree structures) that serve to accelerate database searches
• frequent synonyms: index structures, access methods
• Ex.: Supplies (Name, Product, Price)

    Name   Product  Price
    Chris  Brie     3.49
    Chris  Perrier  1.19
    Chris  Peanuts  .06
    Chris  Oysters  .25
    Jack   Brie     3.98
    Jack   Perrier  1.19
    Jack   Lettuce  .69

Indices (cont.)
• Name and Product are the indexed columns
• Index on Name is primary index
  - indexed column is part of the primary key
  - relation is sorted by increasing primary key
  - well suited for processing range queries (Ex.: Find all suppliers whose name starts with B, C or D)
• all other indices: secondary indices
• tradeoff: queries vs. updates
  - indices accelerate many queries ...
  - ... but slow down updates

Dense vs. Sparse Indices
• relations are stored in blocks (pages) on the magnetic disk
• crucial cost factor: how many blocks do I have to transfer from disk to main memory in order to answer the query?
• non-dense (or sparse) index: one index entry per block
  - for a primary index it suffices to store the smallest key value per block
  - index supports the system when looking for the relevant block(s)
  - inside each block: local search (cf. telephone directory)
  - useful for large relations because very compact
  - only possible for columns according to which the relation has been sorted (cf. phone directory)
  - therefore: at most one sparse index per relation
• dense index: one index entry per tuple
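
The lookup procedure for a sparse index (find the block via the index, then search inside the block) can be sketched in a few lines of Python. The block size of two tuples and the supplier tuples used as data are assumptions for illustration, as is the function name `lookup`.

```python
from bisect import bisect_right

# A relation sorted by its key and cut into blocks (pages) of two tuples each.
blocks = [
    [(100, "Miller"), (200, "Smith")],
    [(300, "Meyer"), (400, "Kuntze")],
    [(500, "Smith"), (1400, "Miller")],
]

# Sparse index: one entry per block, namely the smallest key in that block.
sparse_index = [block[0][0] for block in blocks]

# Dense index for comparison: one entry per tuple (key -> block number).
dense_index = {key: i for i, block in enumerate(blocks) for key, _ in block}

def lookup(key):
    """Find the tuple with the given key using the sparse index."""
    # Last block whose smallest key is <= key; this is the only candidate block.
    i = bisect_right(sparse_index, key) - 1
    if i < 0:
        return None
    # Local search inside the single block that was "read from disk".
    for k, name in blocks[i]:
        if k == key:
            return name
    return None

print(lookup(400), lookup(250), len(sparse_index), len(dense_index))
```

The sparse index only works because the blocks are sorted by the indexed column, which is why a relation can have at most one of them.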

How Does a Disk Access Work?
• figure: blocks are transferred between disk drive and main memory (read block, write block)

Dense vs. Sparse Indices: An Example
• figure: index on Name (sparse) and index on Product (dense) for the relation Supplies

Layered Indices
• large relations → large indices
• indexing a large index leads to a smaller index, etc.
• tree structure
• root fits on one page (= one block)
• figure: index (often dense) on top of the file (relation)

B+ Tree
• tree structure as described above (figure)

B+ Tree (cont.)
• B+ trees are balanced (i.e., all leaves are on the same level)
• lowest level (leaves): dense, otherwise: sparse
• each node fits on one page (≤ N entries)
• N = page size / space requirements per entry (Ex. above: N = 3)
• minimal page utilization (guaranteed): N/2 entries

B+ Tree (cont.)
• each node has between N/2 and N entries
• problems: overflow, underflow
• Ex.: N = 3 (figure)

B+ Tree (cont.) (figures)

Hashing - An Alternative to Indices
• hash function h: data value → storage address
• Ex.: storage address = data value MOD p (p typically a prime number)
• Ex.: p = 13, hash field Supp_No

    Supp_No  Name    ...
    100      Miller  ...
    200      Smith   ...
    300      Meyer   ...
    400      Kuntze  ...
    500      Smith   ...
    1400     Miller  ...
O. Günther: Database Management Systems
104
Hashing (cont.): Storage Structure
• only one hash field per relation!
• advantage: very fast access
• disadvantage:
 - relation dispersed across the disk
 - collisions
O. Günther: Database Management Systems
105
Hashing (cont.): Collision Chains
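The supplier example with p = 13 can be worked through in a few lines. This sketch models each storage address as a Python list that serves as the collision chain; the names `buckets` and `find` are illustrative assumptions.

```python
P = 13  # modulus from the example

def h(supp_no):
    """Hash function: storage address = data value MOD p."""
    return supp_no % P

suppliers = [(100, "Miller"), (200, "Smith"), (300, "Meyer"),
             (400, "Kuntze"), (500, "Smith"), (1400, "Miller")]

# Each address holds a chain (list) of all tuples hashing to it.
buckets = {}
for supp_no, name in suppliers:
    buckets.setdefault(h(supp_no), []).append((supp_no, name))

def find(supp_no):
    """Follow the collision chain at address h(supp_no)."""
    for key, name in buckets.get(h(supp_no), []):
        if key == supp_no:
            return name
    return None
```

Supplier numbers 100 and 1400 both map to address 9, so they end up in the same chain; all other suppliers get an address of their own.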
O. Günther: Database Management Systems
106
Query Optimization
• Ex.:
SELECT DISTINCT Orders.Customer
FROM Orders, Contains
WHERE Orders.O_No = Contains.O_No
  AND Contains.Product = 'Brie'
• Assumptions:
 - 100,000 tuples in Orders, 1000 bytes each
 - 1,000,000 tuples in Contains, 100 bytes each
 - 1,000 tuples in Contains concern Brie
 - 100 MB main memory
O. Günther: Database Management Systems
107
Query Optimization (cont.)
• Strategy 1:
 1) Compute cartesian product Orders × Contains
 2) Select all tuples with Orders.O_No = Contains.O_No
 3) Select all tuples with Contains.Product = 'Brie'
 4) Project to Customer
• Strategy 2:
 1) Select all tuples from Contains with Product = 'Brie'
 2) Compute cartesian product with Orders
 3) Select all tuples with Orders.O_No = Contains.O_No
 4) Project to Customer
O. Günther: Database Management Systems
108
Query Optimization (cont.)
• Analysis Strategy 1:
 (1)+(2): Tuple-I/Os for Orders:
          Tuple-I/Os for Contains:
 (3)+(4): Tuple-I/Os:
 Tuple-I/Os in total:
• Analysis Strategy 2:
 (1): Tuple-I/Os for Contains:
 (2)-(4): Tuple-I/Os:
 Tuple-I/Os in total:
• Strategy 2 is clearly superior (Factor?)
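The slide leaves the I/O counts open. One possible way to estimate them is sketched below; the cost model is an assumption, not the lecture's own analysis. It assumes that each relation is read once, that an intermediate result too large for main memory is written to disk once and read back once, and that every Contains tuple matches exactly one order.

```python
# Figures from the stated assumptions.
ORDERS_TUPLES, ORDERS_BYTES = 100_000, 1_000
CONTAINS_TUPLES, CONTAINS_BYTES = 1_000_000, 100
BRIE_TUPLES = 1_000
MEMORY_BYTES = 100 * 10**6  # 100 MB

def strategy1(materialize_product=False):
    """Tuple I/Os when the Brie selection happens last."""
    reads = ORDERS_TUPLES + CONTAINS_TUPLES
    if materialize_product:
        # Naive variant: the full cartesian product is stored on disk.
        intermediate = ORDERS_TUPLES * CONTAINS_TUPLES
    else:
        # Assumed variant: the join condition is applied on the fly, and
        # each Contains tuple joins with exactly one Orders tuple.
        intermediate = CONTAINS_TUPLES
    # The intermediate result exceeds main memory: write once, read once.
    assert intermediate * (ORDERS_BYTES + CONTAINS_BYTES) > MEMORY_BYTES
    return reads + 2 * intermediate

def strategy2():
    """Tuple I/Os when the Brie selection happens first."""
    # The selected Brie tuples fit in main memory, so nothing is spilled.
    assert BRIE_TUPLES * CONTAINS_BYTES <= MEMORY_BYTES
    return CONTAINS_TUPLES + ORDERS_TUPLES
```

Under these assumptions `strategy1()` yields 3,100,000 tuple I/Os and `strategy2()` yields 1,100,000. If the full product is materialized, `strategy1(True)` rises to 200,001,100,000, so the factor depends heavily on the cost model chosen.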
O. Günther: Database Management Systems
109
Query Optimization (cont.)
• Which (meta)data should be stored? (Statistics)
 - number of tuples for each relation
 - number of columns for each relation
 - number of different values per column
 - occurrence frequencies of particular values
• More information facilitates query optimization but slows down updates
• Automatic optimization preferable because
 - statistics always up-to-date
 - more cost-efficient
 - dynamic
• Important strength of relational systems
O. Günther: Database Management Systems
110
Transaction Processing
• Transaction (TA)
 - logical unit of work
 - should be executed either completely or not at all
 - atomic, i.e., not decomposable
• Recovery
• Concurrency
O. Günther: Database Management Systems
111
Recovery
• Recovery: restart after system fault
• System faults
 - program crash
 - arithmetic mistakes (e.g. overflow)
 - disk crash
 - power failure
• Ex.: DELETE FROM Contains WHERE O_No = 1024
• What happens in case of a system fault "in the middle"?
O. Günther: Database Management Systems
112
Recovery (cont.)
• COMMIT
 - operation to terminate a TA successfully
 - all updates are stored in the database permanently
 - storage on "safe" storage medium
 - transaction is finalized
 - bundling of several COMMIT operations in checkpoints
• ROLLBACK
 - operation to abort a TA in case of system fault
 - changes in CPU registers and storage are reversed
• Important for ROLLBACK
 - logging each single modification
 - storing the log on a "safe" medium
O. Günther: Database Management Systems
113
Recovery (cont.)
[Figure: timeline with three checkpoints, followed by an error and the subsequent recovery. (Updates are stored on some "safe" medium)]
O. Günther: Database Management Systems
114
Recovery (cont.)
• 3 types of transactions
 - transactions that already completed and whose results have been made permanent: T1
 - transactions that have already completed but whose results have not yet been made permanent: T2, T4 → REDO (i.e. re-run; after recovery these transactions will have completed)
 - transactions that started but that did not finish: T3, T5 → UNDO (i.e. reversal of all modifications, ROLLBACK of each transaction concerned; after recovery these transactions will not have completed)
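The three classes can be derived mechanically from a log. The sketch below assumes a simplified log of (event, transaction) pairs and a hypothetical sequence of events chosen so that the outcome matches T1 to T5 above; the actual timing in the lecture's figure is not known.

```python
def classify(log):
    """Split transactions into done / redo / undo at the time of the crash.

    log: list of (event, tx) pairs in time order, where event is
    "begin", "commit" or "checkpoint" (tx is None for checkpoints).
    """
    permanent = set()        # committed before the last checkpoint
    committed_since = set()  # committed after the last checkpoint
    active = set()           # begun but not yet committed
    for event, tx in log:
        if event == "begin":
            active.add(tx)
        elif event == "commit":
            active.discard(tx)
            committed_since.add(tx)
        elif event == "checkpoint":
            # A checkpoint makes all earlier commits permanent.
            permanent |= committed_since
            committed_since = set()
    return {"done": sorted(permanent),
            "redo": sorted(committed_since),
            "undo": sorted(active)}

# Hypothetical log ending in a crash.
log = [("begin", "T1"), ("commit", "T1"), ("begin", "T2"), ("begin", "T3"),
       ("checkpoint", None), ("commit", "T2"), ("begin", "T4"),
       ("commit", "T4"), ("begin", "T5")]
result = classify(log)
```

For this log, `result` lists T1 as done, T2 and T4 for REDO, and T3 and T5 for UNDO.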
O. Günther: Database Management Systems
115
Concurrency: Dirty Read Problem
transaction A: update R.X
transaction B: read from R.X
transaction B: action on basis of R.X
transaction B: commit B
transaction A: ROLLBACK A → Problem!
R .. relation
R.X .. attributes of R
O. Günther: Database Management Systems
116
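Concurrency: Lost Update Problem
[Figure: two schedules of transactions A and B shown on a time axis; the steps of each transaction are listed below]
Schedule 1
transaction A: A reads R.X; double R.X; A writes new value of R.X; Commit A
transaction B: B reads R.X; B adds 2 to R.X; B writes new value of R.X; Commit B
Schedule 2
transaction A: A changes R.X; A reads R.X; A Rollback
transaction B: B changes R.X; B Commit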
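The lost update can be reproduced with a small simulation. The interleaving used here (both TAs read before either writes) and the start value of 10 are assumptions for illustration.

```python
def run(schedule, x=10):
    """Execute a schedule of (tx, operation) steps against one field R.X."""
    db = {"X": x}
    local = {}  # private working copy of R.X per transaction
    for tx, op in schedule:
        if op == "read":
            local[tx] = db["X"]
        elif op == "double":
            local[tx] *= 2
        elif op == "add2":
            local[tx] += 2
        elif op == "write":
            db["X"] = local[tx]
    return db["X"]

# Both TAs read the old value before either one writes.
interleaved = [("A", "read"), ("B", "read"), ("A", "double"),
               ("A", "write"), ("B", "add2"), ("B", "write")]
serial_ab = [("A", "read"), ("A", "double"), ("A", "write"),
             ("B", "read"), ("B", "add2"), ("B", "write")]
serial_ba = [("B", "read"), ("B", "add2"), ("B", "write"),
             ("A", "read"), ("A", "double"), ("A", "write")]
```

Starting from 10, the interleaved schedule ends with 12: A's doubling is overwritten by B. The two serial orders give 22 and 24, so the interleaved result matches neither.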
O. Günther: Database Management Systems
117
Concurrency: Possible Solutions
• Timestamps to coordinate transactions
• Locks: temporary blocking of parts of the database
 - exclusive lock (X-Lock): blocks reads and writes, i.e., no other TA is allowed to read or write the locked data
 - shared lock (S-Lock): blocks writes only, i.e., other TAs can read but not write the locked data
• If a TA wants to read, it first has to ask for an S-lock for the required data
• If a TA wants to write, it first has to ask for an X-lock for the required data
• compatibility of locks
 S+S ... OK
 S+X ... Not OK
 X+X ... Not OK
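The compatibility rules translate directly into a lookup table. The `LockTable` class below is a hypothetical, minimal lock manager: it only grants or rejects requests and does no queueing.

```python
# (lock held by another TA, lock requested) -> compatible?
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockTable:
    def __init__(self):
        self.held = {}  # item -> list of (tx, mode)

    def request(self, tx, item, mode):
        """Grant the lock unless another TA holds an incompatible one."""
        for holder, held_mode in self.held.get(item, []):
            if holder != tx and not COMPATIBLE[(held_mode, mode)]:
                return False
        self.held.setdefault(item, []).append((tx, mode))
        return True

    def release_all(self, tx):
        """Drop every lock of tx, e.g. on COMMIT or ROLLBACK."""
        for item in list(self.held):
            self.held[item] = [(t, m) for t, m in self.held[item] if t != tx]

locks = LockTable()
granted_a = locks.request("A", "R.X", "X")     # A prepares an update
rejected_b = locks.request("B", "R.X", "S")    # B wants to read
locks.release_all("A")                         # A rolls back
granted_b = locks.request("B", "R.X", "S")     # now B may read
```

In this run, B's first read request is rejected while A holds the X-lock and is granted once A's locks are released.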
O. Günther: Database Management Systems
118
Locks: Application to Dirty Read
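Lock compatibility matrix (rows: lock held by one TA; columns: lock requested by another TA; "-" = no lock)
      X     S     -
X     N     N     Yes
S     N     Yes   Yes
-     Yes   Yes   Yes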
O. Günther: Database Management Systems
119
Locks: Application to Dirty Read (cont.)
• Ex. 1:
TA A obtains an X-lock for the field R.X to prepare for the planned update
TA B asks for an S-lock to prepare for the planned read operation → REJECTED
ROLLBACK A → locks are released
TA B obtains S-lock
TA B performs read operation + COMMIT
restart TA A
O. Günther: Database Management Systems
120
Locks: Application to Dirty Read (cont.)
• Ex. 2:
TA A requests X-Lock for R.X
TA A obtains X-Lock, updates R.X
TA B requests S-Lock → REJECTED, TA B waits
TA A ROLLBACK
TA B obtains S-Lock, reads R.X
TA B COMMIT
restart TA A, re-obtains X-Lock
O. Günther: Database Management Systems
121
Locks: Application to Lost Update
TA A wants to read R.X, asks for S-lock
TA A obtains S-lock, reads R.X
TA B also wants to read R.X, asks for S-Lock
TA B obtains S-Lock, reads R.X
TA A wants to update R.X, asks for X-Lock
TA A does not obtain X-Lock because TA B holds an S-Lock → A waits
TA B wants to update R.X, asks for X-Lock
TA B does NOT obtain X-Lock → B waits
DEADLOCK → break via Rollback of some TA
O. Günther: Database Management Systems
122
Deadlocks
• Problem: How to recognize deadlocks?
• How to treat deadlocks involving several TAs?
• Searching for cycles in the WAIT-FOR graph
[Figure: WAIT-FOR graph; edges labeled "wait for"]
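Cycle search in a WAIT-FOR graph can be sketched with a depth-first traversal. The graph representation (a dict mapping each TA to the set of TAs it waits for) and the function name `find_cycle` are assumptions for this example.

```python
def find_cycle(wait_for):
    """Return a list of TAs forming a cycle, or None if there is no deadlock.

    wait_for: dict mapping each TA to the set of TAs it is waiting for.
    """
    UNSEEN, ON_PATH, FINISHED = 0, 1, 2
    state = {}
    path = []

    def visit(tx):
        state[tx] = ON_PATH
        path.append(tx)
        for nxt in sorted(wait_for.get(tx, ())):
            seen = state.get(nxt, UNSEEN)
            if seen == ON_PATH:
                # Back edge to a TA on the current path: a cycle.
                return path[path.index(nxt):]
            if seen == UNSEEN:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[tx] = FINISHED
        return None

    for tx in sorted(wait_for):
        if state.get(tx, UNSEEN) == UNSEEN:
            cycle = visit(tx)
            if cycle:
                return cycle
    return None
```

For the lost-update situation, where A waits for B and B waits for A, `find_cycle({"A": {"B"}, "B": {"A"}})` returns `["A", "B"]`; rolling back any TA on the returned cycle breaks the deadlock.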
O. Günther: Database Management Systems
123
Serializability
• Given a set of TAs, which possible events should be considered correct?
• Convention: a schedule is considered correct if it is serializable
• Serializability means that the result of the schedule is identical to the result of some serial schedule
• Ex.:
(TA1) A := A + 1
 read A into main memory
 add 1
 write A back into the DB
(TA2) A := 2 * A
 read A into main memory
 multiply by 2
 write A back into the DB
(TA3) write A
 read A into main memory
 display A on the screen
 set A to 1 in the DB
O. Günther: Database Management Systems
124
Serializability - An Example
• Assumption: A = 1
TA1, TA2, TA3:
TA1, TA3, TA2:
TA2, TA3, TA1:
TA2, TA1, TA3:
TA3, TA1, TA2:
TA3, TA2, TA1:
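The outcomes of the six serial schedules can be computed by enumeration. The sketch follows the TA definitions from the previous slide; modeling the screen as a list of displayed values is an assumption of this example.

```python
from itertools import permutations

def ta1(a, screen):
    return a + 1          # A := A + 1

def ta2(a, screen):
    return 2 * a          # A := 2 * A

def ta3(a, screen):
    screen.append(a)      # display A on the screen
    return 1              # set A to 1 in the DB

TAS = {"TA1": ta1, "TA2": ta2, "TA3": ta3}

def run_serial(order, a=1):
    """Run the TAs one after another; return (final A, displayed values)."""
    screen = []
    for name in order:
        a = TAS[name](a, screen)
    return a, screen

results = {order: run_serial(order) for order in permutations(TAS)}
```

Each entry of `results` maps one serial order to the final value of A and the value shown on the screen. A concurrent schedule counts as serializable only if its outcome equals one of these six.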
O. Günther: Database Management Systems
125
Concurrency: 2-Phase Locking
• 2-Phase locking protocol
 - for each transaction one first asks for all required locks (phase I)
 - processing ...
 - then all locks are (gradually) released (phase II)
[Figure: number of locks held over time; label: TA2: no 2-phase-locking]
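Whether a transaction follows the protocol can be checked from its sequence of lock operations: once the first lock has been released, no further lock may be acquired. The operation format below is an assumption for illustration.

```python
def is_two_phase(ops):
    """ops: list of ("lock" | "unlock", item) in execution order."""
    released = False
    for action, _item in ops:
        if action == "unlock":
            released = True          # phase II has started
        elif action == "lock" and released:
            return False             # acquiring after a release violates 2PL
    return True

ta1_ops = [("lock", "X"), ("lock", "Y"), ("unlock", "X"), ("unlock", "Y")]
ta2_ops = [("lock", "X"), ("unlock", "X"), ("lock", "Y"), ("unlock", "Y")]
```

Here `ta1_ops` obeys the protocol, while `ta2_ops` acquires a lock after releasing one and therefore does not.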
O. Günther: Database Management Systems
126
Concurrency and 2-Phase Locking
Theorem: 2-phase locking protocol for each transaction ⇒ serializability of the schedule
[Figure: set diagram of schedules; labels: serial, equivalent to FIFO, 2-phase-locking, serializable, all "reasonable" possibilities]
O. Günther: Database Management Systems
127
Environmental Data Modeling: An Example
• Constraints and Properties
 - Minimum distance between roads and biotopes
 - River width varies widely - line vs. polygon
 - Roads are not necessarily connected
 - River and road shapes are independent of each other
 - Biotope shape depends on river shape
O. Günther: Database Management Systems
128
Environmental Data Modeling: An Example (2)
• Queries
 - What is the distance between the planned road and the biotope?
 - Which roads have a distance of less than x meters from a biotope?
 - Where do we need an intersection?
 - Where do we need a bridge?
 - How much area is enclosed between roads and river?
 - Which roads go along the river?
• Updates
 - An intersection is built.
 - The road is built.
 - A bridge is built.
 - Generate a class bridge dynamically.
O. Günther: Database Management Systems
129
Spatial Data Types
• Points
• Lines
• Polygons
• Curves
• Polyhedra in arbitrary dimensions
• Applications
 - Computer graphics
 - Robotics
 - CAD/CAM
 - Geography
 - Computer vision
 - Environmental information systems
O. Günther: Database Management Systems
130
Spatial Operators (1): Set Operators
• Union • Intersection • Difference
O. Günther: Database Management Systems
131
Spatial Operators (2): Search Operators
Point Query: find all spatial objects that contain/are near a given point
Range Query: find all objects that contain/intersect/are contained in a given spatial object, such as a polygon
O. Günther: Database Management Systems
132
Spatial Operators (3): Similarity Operators
• Translation • Rotation
• Scaling
O. Günther: Database Management Systems
133
Spatial Operators (4): Spatial Joins
• Join between different classes of objects
• Examples
 - Find all houses that are within 10 km from a lake
 - Find all buildings that are located within a biotope
 - Find all schools that are more than 5 km away from a firestation
• Related: general map overlay
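A distance-based spatial join can be sketched as a nested loop over both object classes. For simplicity this example treats houses and lakes as points with coordinates in km, which is an assumption; real lakes would be polygons and would need a point-to-polygon distance.

```python
from math import hypot

def within_distance_join(left, right, max_dist):
    """Return all (left_id, right_id) pairs at most max_dist apart.

    left, right: dicts mapping an object id to an (x, y) point.
    """
    pairs = []
    for l_id, (lx, ly) in left.items():
        for r_id, (rx, ry) in right.items():
            if hypot(lx - rx, ly - ry) <= max_dist:
                pairs.append((l_id, r_id))
    return pairs

# Hypothetical data, coordinates in km.
houses = {"h1": (0, 0), "h2": (30, 40)}
lakes = {"lake1": (3, 4)}
near = within_distance_join(houses, lakes, 10)
```

House `h1` is 5 km from `lake1` and qualifies; `h2` is far outside the 10 km radius. The nested loop compares every pair, which is the cost that spatial access methods aim to reduce.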
O. Günther: Database Management Systems
134
Spatial Data Structures (1): Vertex Lists
• List of polygon vertices
• Supported operators: Similarity operators (Set operators)
• Problems:
 - Not unique
 - No invariants
 - List vs. set
 - Simple polygons - invalid representations
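The following sketch shows why similarity operators are easy on vertex lists and why the representation is not unique. The function names and the sample square are assumptions for illustration.

```python
def translate(vertices, dx, dy):
    return [(x + dx, y + dy) for x, y in vertices]

def scale(vertices, factor):
    return [(x * factor, y * factor) for x, y in vertices]

def area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

def same_polygon(a, b):
    """True if b lists the vertices of a from another start or direction."""
    if len(a) != len(b):
        return False
    rotations = [a[i:] + a[:i] for i in range(len(a))]
    return b in rotations or b[::-1] in rotations

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
shifted_start = [(2, 0), (2, 2), (0, 2), (0, 0)]
```

Translation and scaling touch each vertex exactly once. The lists `square` and `shifted_start` differ as lists but describe the same polygon, so equality needs an extra check such as `same_polygon`.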
O. Günther: Database Management Systems
135
Spatial Data Structures (2): B-Rep (Boundary Representation)
O. Günther: Database Management Systems
136
Spatial Data Structures (3): B-Rep (Boundary Representation)
• 3D: DAG of height 3
• Supported operators: Similarity operators
• Problems: not unique, invalid representations, search / set operators, redundancy
O. Günther: Database Management Systems
137
What's the problem with commercial GIS?
• GIS = Geographic Information Systems
• Originally oriented towards file systems
• Scaling problems
• No ad hoc query facility
• Semantic integrity problems
• Single user environment, little or no concurrency
• No distributed GIS
• Little support for application-specific data types or operators
• Possible solution: use commercial databases
O. Günther: Database Management Systems
138
And what about commercial databases? (1)
• No geometric data types: point, line, polygon, ...
• Geometric representation may be hidden in a long field
  ID    Color   Shape
  2     blue    ((1,1) (2,7) (3,9) ...)
  ...   ...     ...
• ... or in an external file
  ID    Color   Shape
  2     blue    /usr/john/pol2
  ...   ...     ...
  (In both tables, Shape refers to a polygon.)
• Inflexible
• No database support for geometric operations
• No notion of topology
• Redundancy
O. Günther: Database Management Systems
139
And what about commercial databases? (2)
• Objects may be decomposed onto different relations
• No spatial clustering
• Shared objects → less redundancy
• Example: boundary representation
part
  ID        Faces
  cuboid    f1
  cuboid    f2
  pyramid   f101
  ...       ...
faces
  ID    Edges
  f1    e1
  f1    e2
  ...   ...
edges
  ID    Vertices
  e1    v1
  e1    v2
  e2    v2
  ...   ...
vertices
  ID    X     Y     Z
  v1    0     1     0
  v2    0     3     2
  v3    ...   ...   ...
O. Günther: Database Management Systems
140
And what about commercial databases? (3)
• No spatial access methods
• Little support for application-specific object types
 - Cities
 - Rivers
 - ...
• ... or for application-specific operations
 - Build a bridge
 - Modify a shape
 - ...
O. Günther: Database Management Systems
141
Database Extensions (1): Abstract Data Types
• Abstract data types (ADTs)
 - Encapsulation of a (user-defined) data structure
 - Collection of (user-defined) operators on this structure
 - Implementation details hidden from the user
• ADTs in databases: BOX example
create boxes (ID = i4, layer = c15, box-desc = Box)
append to boxes (ID = 99, layer = "polysilicon", box-desc = "0,0 : 2,3")
range of b is boxes
replace b (box-desc = b.box-desc INT "0,0 : 4,1") where b.ID = 99
retrieve (boxes.ID) where AREA(boxes.box-desc) > 100
O. Günther: Database Management Systems
142
Database Extensions (2): Implementation of Abstract Data Types
define type Box is
  (Internal length = 16, Input Proc = CharToBox, Output Proc = BoxToChar, Default = " ")
define operator INT (Box,Box) returns Box is
  (Proc = BoxInt, Precedence = 3, Associativity = "left", Sort = left X)
define operator AE (Box,Box) returns boolean is
  (Proc = BoxAE, Precedence = 3, Associativity = "left", Sort = BoxArea, Hashes,
   Restrict = AERSelect, Join = AEJSelect, Negator = BoxAreaNE)
• C-Procedures BoxArea, AERSelect, AEJSelect, etc.
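What such an ADT encapsulates can be sketched outside the DBMS. The Python class below is only an analogy to the C procedures named above: it assumes that a box literal such as "0,0 : 2,3" means "xmin,ymin : xmax,ymax", and the method names `parse`, `intersect` and `area` are hypothetical stand-ins for the roles of CharToBox, INT and AREA.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    @classmethod
    def parse(cls, text):
        """Input procedure: build a Box from text like "0,0 : 2,3" (assumed format)."""
        lower, upper = text.split(":")
        x1, y1 = (float(v) for v in lower.split(","))
        x2, y2 = (float(v) for v in upper.split(","))
        return cls(x1, y1, x2, y2)

    def intersect(self, other):
        """Box intersection; returns None if the boxes do not overlap."""
        x1, y1 = max(self.x1, other.x1), max(self.y1, other.y1)
        x2, y2 = min(self.x2, other.x2), min(self.y2, other.y2)
        if x1 > x2 or y1 > y2:
            return None
        return Box(x1, y1, x2, y2)

    def area(self):
        return (self.x2 - self.x1) * (self.y2 - self.y1)

stored = Box.parse("0,0 : 2,3")
updated = stored.intersect(Box.parse("0,0 : 4,1"))
```

This mirrors the append and replace statements for box 99: under the assumed literal format, the stored box becomes "0,0 : 2,1" with area 2, so it would not satisfy a condition of area greater than 100.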
O. Günther: Database Management Systems
143
Database Extensions (3): Implementation of Abstract Data Types
• Advantages
 - Very flexible
 - Data structures and operators can be very complex
• Disadvantages
 - Two programming paradigms: DBMS and C
 - ADT maps into only one column: structural information gets lost
 - Complexity hidden in the "black box"
 - Problems for query optimization: what's inside?
O. Günther: Database Management Systems
144
Database Extensions (4): Spatial Access Methods
• Point query
• Range query
O. Günther: Database Management Systems
145
Database Extensions (5): R-Trees
• Features
 - Hierarchy of d-dimensional boxes
 - Balanced tree
 - One node per disk page
 - Fully dynamic
• Problems
 - Overlap of sibling boxes - bad for point searches
 - Arbitrary shapes: additional computations and disk accesses (clustering!)
O. Günther: Database Management Systems
146
Object-Oriented Database Systems
The OODBS Manifesto (Atkinson et al. 1989): OODBS = DBS + ...
• Complex objects (PART-OF) - Structural OO
• User-defined data types - Behavioral OO
• Object identity
• Encapsulation
• Types/Classes
• Inheritance (IS-A)
• Operators: overloading / overriding / late binding
O. Günther: Database Management Systems
147
Behavioral Object-Orientation for Geometric Modeling
• Integration of complex geometric data types and operators
add class Point
  type tuple (x: real
              y: real)
add method DistOrigin: real in class Point
  return (sqrt(sqr(self->x) + sqr(self->y)))
O. Günther: Database Management Systems
148
Structural Object-Orientation for Geometric Modeling (1)
• Complex geometric objects
• Boundary representation: 3D → 2D → 1D → 0D
• Shared subobjects: faces, lines, points
O. Günther: Database Management Systems
149
Structural Object-Orientation for Geometric Modeling (2)
add class River
  type tuple (rname: string
              rshape: list(PolylineOrPolygon))
add class PolylineOrPolygon
  type list(Point)
add class Polyline inherits PolylineOrPolygon ...
add class Polygon inherits PolylineOrPolygon ...
add class Point
  type tuple (x: real
              y: real)
O. Günther: Database Management Systems
150
Structural Object-Orientation for Application Modeling (1)
• Complex geo-objects
• Example: city - districts - streets
O. Günther: Database Management Systems
151
Structural Object-Orientation for Application Modeling (2)
add class City
  type tuple (cname: string
              cpopulation: integer
              districts: set(District)
              cshape: Polygon)
add class District
  type tuple (dname: string
              dpopulation: integer
              dshape: Polygon
              streets: set(Street))
add class Street
  type tuple (sname: string
              sshape: Polyline)
O. Günther: Database Management Systems
152
Behavioral Object-Orientation for Application Modeling
• Integration of application-specific data types and operations
add method CompPop: integer in class City
  d: District
  p: integer
  for each d in self->districts { p = p + d->dpopulation }
  return(p)
add method CompShape ...
add method CompStreets ...
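For readers more familiar with a general-purpose language, the CompPop method can be mirrored in Python. This is an analogy only: the shapes and streets are omitted, `comp_pop` is a hypothetical name, and the sample city data is invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class District:
    dname: str
    dpopulation: int

@dataclass
class City:
    cname: str
    districts: list = field(default_factory=list)

    def comp_pop(self):
        """Sum the populations of all districts (counterpart of CompPop)."""
        total = 0
        for d in self.districts:
            total += d.dpopulation
        return total

# Hypothetical sample data.
city = City("Sampletown", [District("North", 1200), District("South", 800)])
population = city.comp_pop()
```

The method lives with the class it operates on, so callers ask the city for its population without knowing how districts are stored.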