O. Günther: Database Management Systems

Page 1: O. Günther: Database Management Systems

Database Management Systems

Prof. Oliver Günther, Ph.D.

Page 2: O. Günther: Database Management Systems

Databases = Electronic Filing Cabinets?

• online access vs. applications
• difference between databases and the WWW?

Page 4: O. Günther: Database Management Systems

Requirements for a Database System

• large capacity - huge data sets:
  - banking/insurance apps.: gigabytes of data (10^9 - 10^11 bytes)
  - environmental apps.: terabytes of data (> 10^12 bytes)
• user-friendly read/write access
• efficient processing - short response times
• data security
• privacy
• persistency, robustness towards hardware problems
• control of redundancy
• consistency
• multiple users (including concurrency)
• integrated data management
• structured data management (logical, physical)
• low cost
• role of standards
• data independence

Page 5: O. Günther: Database Management Systems

3-Layer Architecture

• External layers (user views)

  PASCAL:
    record emp of
      pno: string;
      ...
      salary: integer;
    end

  COBOL:
    01 Ang
      02 P-NR PIC X(6)
      02 ABT PIC X(4)

• Conceptual layer (common logical view)

  EMPLOYEE
    PNO    CHAR(6)
    DEPT   CHAR(4)
    SALARY INT

• Internal layer (common physical view)

  STORED_EMP LENGTH=20
    PREFIX TYPE=BYTE(6),  OFFSET=0
    EMP#   TYPE=BYTE(6),  OFFSET=6
    DEPT#  TYPE=BYTE(4),  OFFSET=12
    WAGE   TYPE=FULLWORD, OFFSET=16

Page 6: O. Günther: Database Management Systems

3-Layer Architecture (cont.)

• External layer
  - one external layer per user view or application program
  - Application program: embedded database commands
  - User: ad hoc query languages, menus, frames

• Conceptual layer
  - logical view of the complete database
  - often union of all external views

• Internal layer
  - oriented along the physical storage structure (pages/blocks)
  - data independence??

Page 7: O. Günther: Database Management Systems

Database Administration

• Database administrator (DBA)
  - user contact
  - definition of external views
  - definition of conceptual view
  - definition of internal view
  - security mechanisms
  - backup and recovery mechanisms
  - monitoring of response behavior
• Data dictionary (metadata)
  - which data are known?
  - how are the data structured logically?
  - how are the data structured physically?

Page 8: O. Günther: Database Management Systems

Abstraction Layers: Logical vs. Physical

• logical modeling
  - entities
  - relationships
• data modeling
  - hierarchical
  - network
  - relational
  - object-oriented
• physical modeling
  - storage structures
  - access methods

Page 9: O. Günther: Database Management Systems

Entity Relationship Model

• ER = Entity - Relationship
• entity: object, "thing"
• attribute: property
• entity set/entity class: object class
• relationship
• Example:
  - entity classes: supplier, part
  - attributes: supplier number, supplier name, address; part number, part name, color
  - entities: Miller, Smith, Shultz (suppliers); screw, nail (parts)
  - relationships: supplies
  - attributes (of relationships): capacity

Page 10: O. Günther: Database Management Systems

Data Models

• Hierarchical
  - 1:n relationships
  - tree-like data structures
  - Products: IMS, ...
• Ex.: Company, Supplier, Product, Part
• Problems
  - n:m relationships (Ex.: Product-Supplier)
  - redundancies
  - tight coupling logical-physical

Page 11: O. Günther: Database Management Systems

Hierarchical Data Model - Example

Page 13: O. Günther: Database Management Systems

Hierarchical Data Model - a Concrete Database

Page 17: O. Günther: Database Management Systems

Data Models (2)

• Network ("CODASYL")
  - n:m relationships
  - graph-like data structures
  - Products: IDMS, ADABAS (Software AG), ...
  - Ex.: Supplier - Part (n:m relationship)

[Figure: database schema and a concrete database for Supplier - Part]

• Problem: confusing, inefficient

Page 18: O. Günther: Database Management Systems

Data Models (3)

• Relational
  - n:m relationships
  - table as data structure
  - Products: Oracle 8i, Informix Universal Server, IBM DB2, SYBASE, Microsoft Access, Microsoft SQL Server, ...
  - market share still growing
• Problem
  - legacy problems
  - migration strategies (Y2K)?

Page 19: O. Günther: Database Management Systems

The Relational Data Model

• Ex.: Supplier - Part

  Supplier (S-No, Name, Address)
  Part (P-No, P-Name, Color)
  SP_R (S-No, P-No, Capacity)

  Supplier
  S-No  Name     Address
  S1    Miller   M-Town
  S2    Smith    S-Town
  S3    Günther  G-Town

  Part
  P-No  P-Name   Color
  P1    bracket  blue
  P2    plate    black
  P3    screw    grey

  SP_R
  S-No  P-No  Capacity
  S1    P1    300
  S1    P2    200
  S1    P3    400
  S2    P2    300
  S2    P3    500
  S3    P3    100

Page 20: O. Günther: Database Management Systems

The Relational Data Model: Formal Calculus

• relation (table): subset of the Cartesian product of a list of domains
• domain: set of possible values for one column
  - Ex.: INTEGER, {0,1}, {grey, blue, red}
  - similar to a data type in programming languages

Page 21: O. Günther: Database Management Systems

The Relational Data Model: Formal Calculus (cont.)

• Cartesian product of a set of domains Di

  D1 × D2 × ... × Dk: set of all k-tuples (v1, ..., vk), where vi ∈ Di (i = 1, ..., k)

  - Ex.: k = 2, D1 = {0,1}, D2 = {a,b,c}

    D1 × D2 = {(0, a), (0, b), (0, c), (1, a), (1, b), (1, c)}
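The definition above can be tried out directly. The following is a minimal Python sketch, not part of the original slides, that enumerates D1 × D2 for the example domains and checks that a small relation is a subset of the product. The variable names are chosen here for illustration only.

```python
from itertools import product

# Example domains from the slide
D1 = [0, 1]
D2 = ["a", "b", "c"]

# D1 × D2: all 2-tuples (v1, v2) with v1 in D1 and v2 in D2
cartesian = list(product(D1, D2))

# A relation over (D1, D2) is any finite subset of that product
relation = {(0, "a"), (0, "b"), (1, "b")}
is_relation = relation <= set(cartesian)
```

With k domains, `product(D1, ..., Dk)` yields k-tuples in the same way.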

Page 22: O. Günther: Database Management Systems

The Relational Data Model: Formal Calculus (cont.)

• relation: finite subset of the Cartesian product
  - Ex.:

    Column-1  Column-2
    0         a
    0         b
    1         b

• tuple: element (row) of a relation
• arity or degree: number of attributes (columns) of a relation
• a tuple (v1 ... vk) has k components (k-tuple)
• schema: the collection of the relation name, the attribute names, and the domains
  Ex.: Supplier (S-No, Name, Address)
• relations are sets (in principle ...)
  - no tuple can appear more than once
  - not sorted

Page 23: O. Günther: Database Management Systems

Entities vs. Relationships vs. Relations

• entity set → relation
• attribute → column
• ex.: Supplier (S-No, Name, Address, ...)
• entity → row (tuple)
• relationship between entity sets E1, ..., Ek → relation whose schema consists of the key attributes of E1, ..., Ek (+ possibly additional information)

  Ex. 1: Supplier (S-No, Name, Address)
         Part (P-No, P-Name, Color)
         SP_R (S-No, P-No, Capacity)

  Ex. 2: Student (Student-No, Name, Birthdate, ...)
         Lecture (Department, Lecture-No, ...)
         Takes (Student-No, Lecture-No, Grade, ...)

Page 24: O. Günther: Database Management Systems

Key of a Relation

• a set S of attributes of a relation R is a key of R if
  (1) no instance of R may contain two different tuples that have the same values for all attributes in S (uniqueness)
  (2) there is no proper subset of S that has property (1) (minimality)

• often depends on the application:

  Ex. 1: Supplier (S-No, Name, Address)
         Part (P-No, P-Name, Color)
         SP_R (S-No, P-No, Capacity)

  Ex. 2: Student (Student-No, Name, Birthdate, ...)
         Lecture (Department, Lecture-No, ...)
         Takes (Student-No, Lecture-No, Grade, ...)

Page 25: O. Günther: Database Management Systems

Key of a Relation (cont.)

• the question of what is the key of a relation R depends on R's schema, not on the current instance (Ex.: Supplier.Name)
• relations can have more than one key

  Ex.: Department (Name, Address, Dept_Code): Name and Dept_Code are both unique (in one company) and therefore keys
  But: Employee (Name, P-No, Salary)??

• if there is more than one key, one selects one of these candidate keys as primary key, depending on the application
• if one has more than one relation with the same key, one may consider merging them

  Ex.: Department (Name, Address, Dept_Code)
       Manager (Emp_No, Dept_Name)
       → Dept (Name, Address, Dept_Code, Manager)

Page 26: O. Günther: Database Management Systems

Relational Algebra

• set of mathematical operations on relations [Codd, 1970s]
• Ex.: relations R and S

  R                S
  A  B  C          D  E  F
  a  b  c          b  g  a
  d  a  f          d  a  f
  c  b  d

• union R ∪ S and difference R - S
  - R and S have to have the same arity
  - domains have to be compatible

Page 27: O. Günther: Database Management Systems

Relational Algebra (cont.)

• Cartesian product R × S
  (result columns: A B C D E F)

• projection (subset of columns)
  - Ex.: π_{A,C}(R)   (result columns: A C)

• selection (subset of tuples (rows))
  - Ex.: σ_{B=b}(R)   (result columns: A B C)
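The operators introduced so far can be sketched with Python sets of tuples. This is an illustrative sketch under the assumption that a relation is modelled as a set of tuples with columns addressed by position; R and S are the example relations from the relational algebra slide, and all variable names are chosen here.

```python
# Example relations R(A, B, C) and S(D, E, F) as sets of tuples
R = {("a", "b", "c"), ("d", "a", "f"), ("c", "b", "d")}
S = {("b", "g", "a"), ("d", "a", "f")}

# Union and difference (both relations have arity 3)
union = R | S
difference = R - S

# Cartesian product R × S: concatenate every tuple of R with every tuple of S
cartesian = {r + s for r in R for s in S}

# Projection π_{A,C}(R): keep columns A (index 0) and C (index 2)
projection = {(r[0], r[2]) for r in R}

# Selection σ_{B=b}(R): keep the tuples whose B value (index 1) is "b"
selection = {r for r in R if r[1] == "b"}
```

Because the results are Python sets, duplicate tuples disappear automatically, which matches the set semantics of the relational model.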

Page 28: O. Günther: Database Management Systems

Relational Algebra (cont.)

• θ-join: R ⋈_{i θ j} S
  - i, j: names of columns (R.i, S.j)
  - θ: arithmetic comparison operator (=, <, ≤, ...)
  - subset of the Cartesian product R × S for which i θ j is true
  - Ex.: R ⋈_{B<D} S   (result columns: A B C D E F)

Page 29: O. Günther: Database Management Systems

Relational Algebra (cont.)

• equijoin: special kind of θ-join where θ is =
  - Ex.: R ⋈_{B=D} S   (result columns: A B C D E F)

• natural join: special kind of equijoin, applicable if the two input relations have columns with the same name
  - Ex.: T ⋈ U   (result columns: A B C D)

  T                U
  A  B  C          B  C  D
  a  b  c          b  c  d
  d  b  c          b  c  e
  b  b  f          a  d  b
  c  a  d
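The join variants can be sketched in the same style. This is a minimal Python sketch assuming relations are sets of tuples; `theta_join` and `natural_join` are hypothetical helper names introduced here, and R, S, T, U are the example relations from the slides.

```python
import operator

R = {("a", "b", "c"), ("d", "a", "f"), ("c", "b", "d")}      # columns A, B, C
S = {("b", "g", "a"), ("d", "a", "f")}                        # columns D, E, F
T = {("a", "b", "c"), ("d", "b", "c"), ("b", "b", "f"), ("c", "a", "d")}
U = {("b", "c", "d"), ("b", "c", "e"), ("a", "d", "b")}

def theta_join(left, right, i, j, theta):
    """Subset of left × right where theta(left column i, right column j) holds."""
    return {l + r for l in left for r in right if theta(l[i], r[j])}

def natural_join(left_cols, left, right_cols, right):
    """Equijoin on all equally named columns; shared columns appear once."""
    common = [c for c in left_cols if c in right_cols]
    extra = [c for c in right_cols if c not in common]
    rows = set()
    for l in left:
        for r in right:
            if all(l[left_cols.index(c)] == r[right_cols.index(c)] for c in common):
                rows.add(l + tuple(r[right_cols.index(c)] for c in extra))
    return left_cols + extra, rows

less_than_join = theta_join(R, S, 1, 0, operator.lt)   # R ⋈_{B<D} S
equijoin = theta_join(R, S, 1, 0, operator.eq)         # R ⋈_{B=D} S
columns, joined = natural_join(["A", "B", "C"], T, ["B", "C", "D"], U)
```

The string comparison in `B < D` is lexicographic here, which is only one possible reading of "<" on the letter values used in the example.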

Page 30: O. Günther: Database Management Systems

SQL: Structured Query Language

• 4th Generation Language (4GL) for data querying and manipulation
• 4GL: user only has to specify which data are needed, not how they can be obtained (data independence!)
• DBMS (Database Management System) takes care of (efficient) computation
• SQL: IBM Research (San Jose, California), '70s

Page 31: O. Günther: Database Management Systems

Toy Database

Customers
Name   Address        Balance
Peter  1 Sybel St.    -200
Jane   5 Kant St.     -50
Ruth   7 Miller St.   43

Orders
O_No  Date   Customer
1024  Jan 3  Peter
1025  Jan 3  Ruth
1026  Jan 4  Peter

Contains
O_No  Product  Amount
1024  Brie     3
1024  Perrier  6
1025  Brie     5
1025  Oysters  12
1025  Onions   1
1026  Peanuts  2048

Supplies
Name   Product  Price
Chris  Brie     3.49
Chris  Perrier  1.19
Chris  Peanuts   .06
Chris  Oysters   .25
Jack   Brie     3.98
Jack   Perrier  1.19
Jack   Onions    .69
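To experiment with the toy database, the four tables can be loaded into an in-memory SQLite database from Python. This is a sketch, not part of the original slides: SQLite and the column types (TEXT, INTEGER, REAL) are assumptions made here, since the slides do not name a system or give types at this point.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Customers (Name TEXT, Address TEXT, Balance INTEGER);
CREATE TABLE Orders    (O_No INTEGER, Date TEXT, Customer TEXT);
CREATE TABLE Contains  (O_No INTEGER, Product TEXT, Amount INTEGER);
CREATE TABLE Supplies  (Name TEXT, Product TEXT, Price REAL);
INSERT INTO Customers VALUES
  ('Peter', '1 Sybel St.', -200), ('Jane', '5 Kant St.', -50),
  ('Ruth', '7 Miller St.', 43);
INSERT INTO Orders VALUES
  (1024, 'Jan 3', 'Peter'), (1025, 'Jan 3', 'Ruth'), (1026, 'Jan 4', 'Peter');
INSERT INTO Contains VALUES
  (1024, 'Brie', 3), (1024, 'Perrier', 6), (1025, 'Brie', 5),
  (1025, 'Oysters', 12), (1025, 'Onions', 1), (1026, 'Peanuts', 2048);
INSERT INTO Supplies VALUES
  ('Chris', 'Brie', 3.49), ('Chris', 'Perrier', 1.19), ('Chris', 'Peanuts', 0.06),
  ('Chris', 'Oysters', 0.25), ('Jack', 'Brie', 3.98), ('Jack', 'Perrier', 1.19),
  ('Jack', 'Onions', 0.69);
""")

# Number of rows per table, as a quick sanity check of the load
row_counts = {
    table: db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("Customers", "Orders", "Contains", "Supplies")
}

# A first query: names of all customers with a negative balance
in_debt = [name for (name,) in db.execute(
    "SELECT Name FROM Customers WHERE Balance < 0 ORDER BY Name")]
```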

Page 32: O. Günther: Database Management Systems

SQL: Projection

• Ex.: Find name and account balance of all customers
• relational algebra: π_{Name, Balance}(Customers)
• in SQL: SELECT Name, Balance FROM Customers
• projection in SQL in general:
  SELECT Ri1.A1, Ri2.A2, ..., Rir.Ar
  FROM R1, R2, ..., Rk

Page 33: O. Günther: Database Management Systems

SQL: Selection

• Ex.: Find all customers with negative balance
• SQL: SELECT * FROM Customers WHERE Balance < 0
• relational algebra: σ_{Balance<0}(Customers)
• selection in SQL in general:
  SELECT * {all attributes}
  FROM R
  WHERE <condition>

Page 34: O. Günther: Database Management Systems

SQL: Uniqueness of Names

• if attribute names are unique, one can drop the relation name(s) in the SELECT and the WHERE clause
• Ex.:

  SELECT Customers.Name
  FROM Customers
  WHERE Customers.Balance < 0

  can be shortened to

  SELECT Name
  FROM Customers
  WHERE Balance < 0

Page 35: O. Günther: Database Management Systems

SQL: Aliases

• alias = second name for an attribute

• to be attached to the original name of the column

• Ex.:

SELECT Name Client, Address, Balance Deficit FROM Customers WHERE Balance < 0

Page 36: O. Günther: Database Management Systems

SQL: Equijoins

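• Ex.: Find the products ordered by Peter
• in relational algebra: π_{Product}(σ_{Customer='Peter'}(Orders ⋈ Contains))
• in SQL:
  SELECT Product
  FROM Orders, Contains
  WHERE Customer = 'Peter' AND Orders.O_No = Contains.O_No

• Ex.: Find the names of all suppliers that carry at least one of the products that have been ordered by Peter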
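One possible answer to the second exercise joins three tables. The sketch below runs it against SQLite from Python; the join formulation is a suggestion made here, not necessarily the solution intended on the slide, and the column types are assumed.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Orders   (O_No INTEGER, Date TEXT, Customer TEXT);
CREATE TABLE Contains (O_No INTEGER, Product TEXT, Amount INTEGER);
CREATE TABLE Supplies (Name TEXT, Product TEXT, Price REAL);
INSERT INTO Orders VALUES
  (1024, 'Jan 3', 'Peter'), (1025, 'Jan 3', 'Ruth'), (1026, 'Jan 4', 'Peter');
INSERT INTO Contains VALUES
  (1024, 'Brie', 3), (1024, 'Perrier', 6), (1025, 'Brie', 5),
  (1025, 'Oysters', 12), (1025, 'Onions', 1), (1026, 'Peanuts', 2048);
INSERT INTO Supplies VALUES
  ('Chris', 'Brie', 3.49), ('Chris', 'Perrier', 1.19), ('Chris', 'Peanuts', 0.06),
  ('Chris', 'Oysters', 0.25), ('Jack', 'Brie', 3.98), ('Jack', 'Perrier', 1.19),
  ('Jack', 'Onions', 0.69);
""")

# Products ordered by Peter: equijoin of Orders and Contains on O_No
products = sorted(p for (p,) in db.execute("""
    SELECT Product
    FROM Orders, Contains
    WHERE Customer = 'Peter' AND Orders.O_No = Contains.O_No"""))

# Suppliers carrying at least one of those products: one more equijoin
suppliers = sorted(n for (n,) in db.execute("""
    SELECT DISTINCT Supplies.Name
    FROM Orders, Contains, Supplies
    WHERE Customer = 'Peter'
      AND Orders.O_No = Contains.O_No
      AND Contains.Product = Supplies.Product"""))
```

`Supplies.Name` has to be qualified in the second query only if another joined table also had a `Name` column; it is qualified here for readability.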

Page 37: O. Günther: Database Management Systems

SQL: Processing a Join Query

• Ex.:
  SELECT Product
  FROM Orders, Contains
  WHERE Customer = 'Peter' AND Orders.O_No = Contains.O_No

Page 38: O. Günther: Database Management Systems

SQL: Processing a Join Query (cont.)

Selection: Customer = 'Peter'

Equijoin: Orders.O_No = Contains.O_No

• Starting with Orders

Page 39: O. Günther: Database Management Systems

SQL: Processing a Join Query (cont.)

Projection: SELECT Product

• Operations can sometimes be exchanged → efficiency?

Page 40: O. Günther: Database Management Systems

SQL: Deleting Multiple Copies of a Tuple

• why do they exist?
• keyword DISTINCT
• Ex.: SELECT DISTINCT Customer FROM Orders
• without DISTINCT?
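The effect of DISTINCT can be observed on the Orders table. The following Python sketch uses SQLite (an assumption made here) and compares the two variants of the query.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Orders (O_No INTEGER, Date TEXT, Customer TEXT);
INSERT INTO Orders VALUES
  (1024, 'Jan 3', 'Peter'), (1025, 'Jan 3', 'Ruth'), (1026, 'Jan 4', 'Peter');
""")

# Projection on Customer keeps one output row per order, so Peter appears twice
without_distinct = [c for (c,) in db.execute(
    "SELECT Customer FROM Orders ORDER BY O_No")]

# DISTINCT removes the duplicate rows from the result
with_distinct = sorted(c for (c,) in db.execute(
    "SELECT DISTINCT Customer FROM Orders"))
```

Duplicates arise because the projection drops the column (O_No) that made the rows different.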

Page 42: O. Günther: Database Management Systems

SQL: Tuple Variables

• necessary if one needs to address several different tuples of the same relation in the same query
• Ex.: Find names and addresses of all customers that have less money in their account than Jane

  SELECT C1.Name, C1.Address
  FROM Customers C1, Customers C2
  WHERE C1.Balance < C2.Balance AND C2.Name = 'Jane'

• tuple variables are relations, i.e., sets of tuples
• they serve to represent intermediate results

Page 43: O. Günther: Database Management Systems

SQL: Subqueries

• nesting of queries
• reference to intermediate results via the keyword IN
• Ex.: Find all suppliers that carry at least one of the products ordered by Peter

  SELECT Name
  FROM Supplies
  WHERE Product IN
    (SELECT Product
     FROM Contains
     WHERE O_No IN
       (SELECT O_No
        FROM Orders
        WHERE Customer = 'Peter'))

• IN corresponds to the element operator (∈)

Page 44: O. Günther: Database Management Systems

SQL: Subqueries (cont.)

• Instead of IN: ALL

  SELECT Product
  FROM Supplies
  WHERE Price >= ALL
    (SELECT Price
     FROM Supplies)

• ALL corresponds to the universal quantifier

Page 45: O. Günther: Database Management Systems

SQL: Subqueries (cont.)

• Instead of IN: ANY

  SELECT *
  FROM Orders
  WHERE O_No < ANY
    (SELECT O_No
     FROM Orders
     WHERE Customer = 'Peter')

• ANY corresponds to the existential quantifier

Page 46: O. Günther: Database Management Systems

SQL: Subqueries (cont.)

• Instead of IN: =

  SELECT Product
  FROM Contains
  WHERE O_No =
    (SELECT O_No
     FROM Orders
     WHERE Customer = 'Ruth')

• If the cardinality of the subquery's result is greater than 1: ERROR

Page 47: O. Günther: Database Management Systems

SQL: Aggregates

• Functions for the aggregation of single values

  AVG      - average
  COUNT    - number
  SUM      - sum
  MIN      - minimum
  MAX      - maximum
  STDDEV   - standard deviation
  VARIANCE - variance

• Ex.: SELECT AVG(Balance) FROM Customers

• Or: SELECT AVG(Balance) Average FROM Customers

Page 48: O. Günther: Database Management Systems

SQL: Aggregates (cont.)

• Ex.: SELECT COUNT(DISTINCT Name) No_Suppliers FROM Supplies

  No_Suppliers
  2

• Ex.: SELECT COUNT(Name) No_Brie_Suppliers FROM Supplies WHERE Product = 'Brie'
  - no duplicate elimination required

  No_Brie_Suppliers
  2

Page 49: O. Günther: Database Management Systems

SQL: Aggregation and Grouping

• GROUP BY A1, A2, ..., Ak

• two tuples are in the same group if they have the same values for the attributes A1, A2, ..., Ak

• Ex.:
  SELECT Product, AVG(Price) Average_Price
  FROM Supplies
  GROUP BY Product

  (result columns: Product, Average_Price)

Page 50: O. Günther: Database Management Systems

SQL: Aggregation and Grouping (cont.)

• Ex:

SELECT Customer, AVG(Amount) FROM Orders, Contains WHERE Orders.O_No = Contains.O_No GROUP BY Customer

Page 51: O. Günther: Database Management Systems

SQL: GROUP BY ... HAVING

• general format:
  GROUP BY A1, A2, ..., Ak
  HAVING <condition>
• <condition> is a boolean expression that is applied to each group separately
• one selects only those groups where the condition is true
• Ex.:
  SELECT Product, AVG(Price) Average_Price
  FROM Supplies
  GROUP BY Product
  HAVING COUNT(*) > 1

• Or: HAVING COUNT(DISTINCT Price) > 1
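The grouping queries can be run against the Supplies table to see which groups survive each HAVING condition. This is a Python sketch using SQLite (an assumption made here); the alias is written with an underscore because an unquoted hyphen is not valid in an SQL identifier.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Supplies (Name TEXT, Product TEXT, Price REAL);
INSERT INTO Supplies VALUES
  ('Chris', 'Brie', 3.49), ('Chris', 'Perrier', 1.19), ('Chris', 'Peanuts', 0.06),
  ('Chris', 'Oysters', 0.25), ('Jack', 'Brie', 3.98), ('Jack', 'Perrier', 1.19),
  ('Jack', 'Onions', 0.69);
""")

# One group per product; AVG is evaluated inside each group
average_price = dict(db.execute(
    "SELECT Product, AVG(Price) Average_Price FROM Supplies GROUP BY Product"))

# Keep only the products offered by more than one supplier
several_offers = [p for (p,) in db.execute("""
    SELECT Product FROM Supplies
    GROUP BY Product HAVING COUNT(*) > 1 ORDER BY Product""")]

# Keep only the products that are offered at more than one price
several_prices = [p for (p,) in db.execute("""
    SELECT Product FROM Supplies
    GROUP BY Product HAVING COUNT(DISTINCT Price) > 1 ORDER BY Product""")]
```

The two HAVING variants differ for Perrier: both suppliers offer it, but at the same price.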

Page 52: O. Günther: Database Management Systems

SQL: Insertion of Tuples

• in general: INSERT INTO R VALUES (V1, ..., Vk)
• ex.: INSERT INTO Supplies VALUES ('Jack', 'Oysters', .24)
• null values: INSERT INTO Supplies (Name, Product) VALUES ('Jack', 'Oysters')
• nested insertions:
  INSERT INTO Sales_Chris
  SELECT Product, Price
  FROM Supplies
  WHERE Name = 'Chris'

Page 53: O. Günther: Database Management Systems

SQL: Deletion of Tuples

• in general: DELETE FROM R WHERE <condition>
• ex.: DELETE FROM Supplies WHERE Name = 'Chris' AND Product = 'Perrier'

• ex.: Delete all orders containing Brie

Page 54: O. Günther: Database Management Systems

SQL: Updating Tuples

• in general:
  UPDATE R
  SET A1 = x1, ..., Ak = xk
  WHERE <condition>
• ex.:
  UPDATE Supplies
  SET Price = 1.00
  WHERE Name = 'Chris' AND Product = 'Perrier'

• ex.: Chris reduces all prices by 10 percent.
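The two open exercises (deleting all orders that contain Brie, and Chris's 10 percent price cut) could be solved as in the following Python sketch with SQLite. The statements are one possible reading of the exercises, not the slide's own solution; in particular, removing the matching rows from Contains as well is an assumption made here.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Orders   (O_No INTEGER, Date TEXT, Customer TEXT);
CREATE TABLE Contains (O_No INTEGER, Product TEXT, Amount INTEGER);
CREATE TABLE Supplies (Name TEXT, Product TEXT, Price REAL);
INSERT INTO Orders VALUES
  (1024, 'Jan 3', 'Peter'), (1025, 'Jan 3', 'Ruth'), (1026, 'Jan 4', 'Peter');
INSERT INTO Contains VALUES
  (1024, 'Brie', 3), (1024, 'Perrier', 6), (1025, 'Brie', 5),
  (1025, 'Oysters', 12), (1025, 'Onions', 1), (1026, 'Peanuts', 2048);
INSERT INTO Supplies VALUES
  ('Chris', 'Brie', 3.49), ('Chris', 'Perrier', 1.19), ('Chris', 'Peanuts', 0.06),
  ('Chris', 'Oysters', 0.25), ('Jack', 'Brie', 3.98), ('Jack', 'Perrier', 1.19),
  ('Jack', 'Onions', 0.69);
""")

# Delete all orders containing Brie: the subquery finds the affected order numbers
db.execute("""DELETE FROM Orders
              WHERE O_No IN (SELECT O_No FROM Contains WHERE Product = 'Brie')""")
# Then drop the order lines whose order no longer exists
db.execute("DELETE FROM Contains WHERE O_No NOT IN (SELECT O_No FROM Orders)")

# Chris reduces all prices by 10 percent: the new value is computed from the old one
db.execute("UPDATE Supplies SET Price = Price * 0.9 WHERE Name = 'Chris'")

remaining_orders = [o for (o,) in db.execute("SELECT O_No FROM Orders")]
remaining_lines = db.execute("SELECT * FROM Contains").fetchall()
brie_prices = dict(db.execute(
    "SELECT Name, Price FROM Supplies WHERE Product = 'Brie'"))
```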

Page 55: O. Günther: Database Management Systems

SQL - DDL

• DDL: Data Definition Language
• so far we only discussed the DML - Data Manipulation Language
• typical DDL command: CREATE TABLE
• general format:
  CREATE TABLE R (A1 T1 [NOT NULL], ..., Ak Tk [NOT NULL])
• ex.:
  CREATE TABLE Supplies
    (Name    CHAR(20) NOT NULL,
     Product CHAR(10) NOT NULL,
     Price   NUMBER(6,2))

• to delete a table: DROP TABLE Supplies

Page 56: O. Günther: Database Management Systems

Views

• logical relations
• so far we only discussed physical relations (stored on disk), also called base relations
• views serve to represent specific user views
• view contents are not stored physically but computed on demand
• one can query (i.e., read only) views just like base relations
• updates (write access) are not so easy

Page 57: O. Günther: Database Management Systems

Views (cont.)

• view definition - general form:
  CREATE VIEW V (A1, ..., Ak) AS <SELECT query>
• Ex.:
  CREATE VIEW Offer_Chris (Product, Price) AS
  SELECT Product, Price
  FROM Supplies
  WHERE Name = 'Chris'

  Product  Price
  Brie     3.49
  Perrier  1.19
  Peanuts   .06
  Oysters   .25
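That a view is computed on demand can be checked by changing the base relation and querying the view again. The sketch below uses SQLite from Python (an assumption made here); the view name is written with an underscore so that it is a valid identifier, and the inserted Onions row is a made-up example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Supplies (Name TEXT, Product TEXT, Price REAL);
INSERT INTO Supplies VALUES
  ('Chris', 'Brie', 3.49), ('Chris', 'Perrier', 1.19), ('Chris', 'Peanuts', 0.06),
  ('Chris', 'Oysters', 0.25), ('Jack', 'Brie', 3.98), ('Jack', 'Perrier', 1.19),
  ('Jack', 'Onions', 0.69);
CREATE VIEW Offer_Chris (Product, Price) AS
  SELECT Product, Price FROM Supplies WHERE Name = 'Chris';
""")

# The view is queried like a base relation
before = db.execute("SELECT Product FROM Offer_Chris ORDER BY Product").fetchall()

# Hypothetical new offer by Chris, inserted into the base relation
db.execute("INSERT INTO Supplies VALUES ('Chris', 'Onions', 0.59)")

# The same view query now reflects the changed base relation
after = db.execute("SELECT Product FROM Offer_Chris ORDER BY Product").fetchall()
```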

Page 58: O. Günther: Database Management Systems

View Update Problem

• ex.: Offer_Chris
  - DELETE
  - INSERT
  - UPDATE (Price)
  - UPDATE (Product)

• more complex example:
  CREATE VIEW Customer_Order (Name, Date, Product, Amount) AS
  SELECT Customer, Date, Product, Amount
  FROM Orders, Contains
  WHERE Orders.O_No = Contains.O_No

  - DELETE
  - INSERT
  - UPDATE (Name)
  - UPDATE (Date)
  - UPDATE (Product)
  - UPDATE (Amount)

Page 59: O. Günther: Database Management Systems

View Update Problem (cont.)

• ex.:
  CREATE VIEW X AS
  SELECT Product, AVG(Price) DP
  FROM Supplies
  GROUP BY Product

  - UPDATE (DP)
  - UPDATE (Product)
  - INSERT
  - DELETE

Page 60: O. Günther: Database Management Systems

View Update Problem (cont.)

• ex.:
  CREATE VIEW Y AS
  SELECT C2.Name, C2.Address
  FROM Customers C1, Customers C2
  WHERE C2.Balance < C1.Balance AND C1.Name = 'Jane'

  - INSERT
  - DELETE
  - UPDATE (Name)
  - UPDATE (Address)

Page 61: O. Günther: Database Management Systems

View Update Problem (cont.)

• Views can be updated if
  (1) the corresponding base relations can be updated (i.e., no non-updatable views)
  (2) the SELECT command is a combination of only projections (column subsets) and selections (row subsets) (i.e., no joins, subqueries, tuple variables, aggregates, etc.). In case of projections, the key has to be preserved.

Page 62: O. Günther: Database Management Systems

View Update Problem (cont.)

[Figure: nested sets of views, labeled "all possible views", "views that can be updated", "views that can be updated in SQL (version-dependent)", and "views according to (1) and (2)"]

Page 63: O. Günther: Database Management Systems

Views - Summary

• logical relations
• defined using physical base relations (and possibly other views)
• (typically) not stored physically but computed on demand using the current content of the base relations
• same data can be "viewed" in different shapes
• supports different user groups and privacy
• view updates: problematic because not all updates can be mapped to base relations

Page 64: O. Günther: Database Management Systems

Databases - Programming Languages

• collision of two different paradigms
  - PL: one tuple at a time
  - DB: many tuples at a time

• interface tuple - variable: communication via "cursors" (buffer)

• queries are preformulated using variables

• instantiation at run-time with real values

Page 65: O. Günther: Database Management Systems

Ex: Embedded SQL

exec sql begin declare section;
  int O_No, Amount;
  char Date[10], Customer[20], Product[10];
exec sql end declare section;
exec sql connect;
exec sql prepare order_insert from
  insert into Orders values (:O_No, :Date, :Customer);
exec sql prepare cont_insert from
  insert into Contains values (:O_No, :Product, :Amount);
write ('Enter Order No., Date, and Customer');
read (O_No); read (Date); read (Customer);
exec sql execute order_insert using :O_No, :Date, :Customer;
write ('Enter a list of tuples "Product-Amount", terminate with "end"');
read (Product);
while (Product != 'end') {
  read (Amount);
  exec sql execute cont_insert using :O_No, :Product, :Amount;
  read (Product);
}
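The same idea (statements formulated once with placeholders, instantiated at run time, results read one tuple at a time through a cursor) can be sketched with Python's `sqlite3` module. This is an analogy, not a translation of the embedded SQL dialect above; the function `insert_order` and order 1027 are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Orders   (O_No INTEGER, Date TEXT, Customer TEXT);
CREATE TABLE Contains (O_No INTEGER, Product TEXT, Amount INTEGER);
""")

def insert_order(connection, o_no, date, customer, items):
    """Insert one order and its (product, amount) lines using placeholders."""
    # The "?" placeholders play the role of the host variables :O_No, :Date, ...
    connection.execute("INSERT INTO Orders VALUES (?, ?, ?)", (o_no, date, customer))
    connection.executemany(
        "INSERT INTO Contains VALUES (?, ?, ?)",
        [(o_no, product, amount) for product, amount in items],
    )

insert_order(db, 1027, "Jan 5", "Jane", [("Brie", 2), ("Onions", 4)])

# The cursor hands the set-valued query result to the program tuple by tuple
cursor = db.execute(
    "SELECT Product, Amount FROM Contains WHERE O_No = ? ORDER BY Product", (1027,))
lines = []
for product, amount in cursor:
    lines.append((product, amount))
```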

Page 66: O. Günther: Database Management Systems

Integrity in Databases

• maintenance of a correct correspondence between the database and the real world
• (possibly automatic) identification of invalid states of the database (i.e., states without correspondence in the real world)
• three kinds of integrity
  - domain-specific integrity (application-specific, ex.: date)
  - key integrity
  - schema integrity

Page 67: O. Günther: Database Management Systems

Integrity in Databases (cont.)

• key integrity
  - rule 1 (entity integrity): each relation must have a key, and each tuple in the relation must have a key value that is unique and non-NULL.
  - rule 2 (referential integrity): for each foreign key FK there is another relation with a primary key PK such that each non-NULL value of FK is identical to an existing value of PK.
  - Ex.: foreign key O_No in relation Contains, foreign key Customer in relation Orders

• schema integrity
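The two key integrity rules can be declared in SQL and are then enforced by the DBMS. The following Python sketch uses SQLite, which is an assumption made here; SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`, and `try_insert` is a hypothetical helper.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for rule 2
db.executescript("""
CREATE TABLE Customers (Name TEXT PRIMARY KEY NOT NULL, Address TEXT, Balance INTEGER);
CREATE TABLE Orders (
  O_No INTEGER PRIMARY KEY NOT NULL,
  Date TEXT,
  Customer TEXT REFERENCES Customers (Name));
INSERT INTO Customers VALUES ('Peter', '1 Sybel St.', -200);
""")

def try_insert(sql, params):
    """Run one INSERT; report whether the DBMS accepted or rejected it."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

valid_order = try_insert("INSERT INTO Orders VALUES (?, ?, ?)", (1024, "Jan 3", "Peter"))
# Rule 2: 'Nobody' does not exist as a primary key value in Customers
dangling_order = try_insert("INSERT INTO Orders VALUES (?, ?, ?)", (1030, "Jan 6", "Nobody"))
# Rule 2 only constrains non-NULL foreign key values
null_customer = try_insert("INSERT INTO Orders VALUES (?, ?, ?)", (1031, "Jan 6", None))
# Rule 1: key values have to be unique
duplicate_key = try_insert("INSERT INTO Customers VALUES (?, ?, ?)", ("Peter", "2 Main St.", 0))
```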

Page 68: O. Günther: Database Management Systems

Database Design

• ex. for bad database design:

  Suppliers-Info
  Name   Address      Product  Price
  Chris  24 Kant St.  Brie     3.49
  Chris  24 Kant St.  Perrier  1.19
  Jack   2 Main St.   Brie     3.98
  ...    ...          ...      ...

• disadvantages
  - redundancies
  - update anomalies
  - insertion anomalies (ex.: supplier without products)
  - deletion anomalies (NULL in key)

Page 69: O. Günther: Database Management Systems

Database Design by Decomposition

• approach: decomposition into relations with fewer columns
  Careful: no information loss

• Ex.: Suppliers (L-Name, L-Address)
       Supplies (L-Name, Product, Price)

• disadvantage: may require additional join operations at query time

Page 70: O. Günther: Database Management Systems

Functional Dependencies

• logical dependencies between columns
• cause many of the problems discussed above
  - redundancies
  - update anomalies
  - ...
• Definition: If for a relation R there is a functional dependency (FD) X → Y (where X and Y may represent one or several columns of R), then the following holds for two arbitrary tuples t1 and t2 in R:
  t1[X] = t2[X] ⇒ t1[Y] = t2[Y]
• A functional dependency defined on relation R holds for all instances of R
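The definition suggests a simple test on a concrete instance: group the tuples by their X values and look for two tuples that disagree on Y. The Python sketch below does this for the Supplies table; `fd_violations` is a hypothetical helper. Since an FD has to hold for all instances, such a test can refute an FD but never prove it.

```python
COLUMNS = ("Name", "Product", "Price")
SUPPLIES = [
    ("Chris", "Brie", 3.49), ("Chris", "Perrier", 1.19), ("Chris", "Peanuts", 0.06),
    ("Chris", "Oysters", 0.25), ("Jack", "Brie", 3.98), ("Jack", "Perrier", 1.19),
    ("Jack", "Onions", 0.69),
]
rows = [dict(zip(COLUMNS, values)) for values in SUPPLIES]

def fd_violations(instance, X, Y):
    """Return the X values for which two tuples of the instance disagree on Y."""
    first_seen = {}
    violations = set()
    for row in instance:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        # setdefault stores y for a new x and returns the stored value otherwise
        if first_seen.setdefault(x, y) != y:
            violations.add(x)
    return violations

# {Name, Product} → Price is not contradicted by this instance
free_pricing = fd_violations(rows, ["Name", "Product"], ["Price"])
# Product → Price is contradicted: Brie is offered at two different prices
fixed_pricing = fd_violations(rows, ["Product"], ["Price"])
```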

Page 71: O. Günther: Database Management Systems

Functional Dependencies (cont.)

• Ex.:
  Customers: Name → Address
             Name → Balance
  Orders:    O_No → Date
             O_No → Customer
  Customers: Address → Address
  Supplies:  {Name, Product} → Price

• for each key S of a relation R and each subset T of columns of R we have: S → T
• Some FDs imply other FDs
• Ex.: F = {A → B, B → C} |= A → C

Page 72: O. Günther: Database Management Systems

Closure of FD Sets

• F+ := {X → Y : F |= X → Y}

• the closure F+ of a set F of FDs contains all functional dependencies implied by the FDs in F

• Ex.: F = {A → B; B → C; AB → C}

  F+ =
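Writing out F+ by hand is tedious, so in practice one usually tests whether a single FD X → Y is in F+ by computing the attribute closure of X. The following Python sketch shows this standard technique, which the slides do not spell out; it assumes single-letter attribute names, and `closure` and `implies` are names chosen here.

```python
def closure(attributes, fds):
    """All attributes functionally determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs → rhs follows from fds, i.e., lhs → rhs is in F+."""
    return set(rhs) <= closure(lhs, fds)

# F = {A → B; B → C; AB → C}, each FD written as a (left side, right side) pair
F = [("A", "B"), ("B", "C"), ("AB", "C")]

closure_of_A = closure("A", F)
a_determines_c = implies(F, "A", "C")
b_determines_a = implies(F, "B", "A")
```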

Page 73: O. Günther: Database Management Systems

Minimal Cover of a Set F of FDs

• given a set F of FDs, F' is a minimal cover of F if and only if:
  (1) F'+ = F+, i.e., all FDs in F are implied by the FDs in F'. F and F' are equivalent.
  (2) the right side of each FD in F' is a single attribute
  (3) there is no (X → A) ∈ F' with (F' - {X → A})+ = F+, i.e., there are no superfluous FDs in F'
  (4) there is no (X → A) ∈ F' and Z ⊂ X with ((F' - {X → A}) ∪ {Z → A})+ = F+, i.e., no FD in F' can be replaced by a simpler FD
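A minimal cover can be computed by enforcing conditions (2), (4), and (3) in that order. The Python sketch below follows the usual textbook procedure, which the slides do not give explicitly; it assumes single-letter attribute names, and the example F is the FD set of the lecture/room decomposition example that appears later in these slides.

```python
def closure(attributes, fds):
    """All attributes functionally determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def fd_order(fd):
    """Sort key that makes the processing order deterministic."""
    return (sorted(fd[0]), fd[1])

def minimal_cover(fds):
    # Condition (2): split right sides into single attributes
    cover = {(frozenset(lhs), a) for lhs, rhs in fds for a in rhs}
    # Condition (4): remove left-side attributes that are not needed
    for lhs, a in sorted(cover, key=fd_order):
        reduced = set(lhs)
        for b in sorted(lhs):
            if len(reduced) > 1 and a in closure(reduced - {b}, cover):
                reduced.discard(b)
        cover.discard((lhs, a))
        cover.add((frozenset(reduced), a))
    # Condition (3): remove FDs that are implied by the remaining ones
    for fd in sorted(cover, key=fd_order):
        rest = cover - {fd}
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return cover

F = [("L", "I"), ("TR", "L"), ("TI", "R"), ("LS", "G"), ("TS", "R"), ("TRI", "LR")]
cover = minimal_cover(F)
```

A minimal cover is in general not unique; a different processing order may yield a different but equivalent result.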

Page 74: O. Günther: Database Management Systems

FDs and Database Design

• potential problem: too many FDs in a relation
• may lead to anomalies and redundancies
• solution: decomposition into several simple relations

  Ri ⊆ R (i = 1, ..., k)
  R = R1 || R2 || ... || Rk

• fewer redundancies but possibly more joins
• important for preservation of information:
  - one has to be able to re-assemble R by joining the Ri (lossless join)
  - the FDs defined in R have to be definable on the Ri (preservation of dependencies)

Page 75: O. Günther: Database Management Systems

Database Design and Normal Forms

• why normal forms?
  - format standardization (1NF)
  - reduction/elimination of redundancies (2NF, 3NF, ...)
• theoretical tool for improving/maintaining database design quality
• in practice, however: redundancy vs. efficiency
  - redundant data may lead to inconsistencies after updates
  - but useful for efficiency reasons (shorter response times)
• tradeoff problem: to be decided case by case

Page 76: O. Günther: Database Management Systems

1st Normal Form (1NF)

• all attributes have to be atomic
• no "repeating groups"
• important foundation of the relational model
• but: may lead to increased redundancy
• Ex.: relation Supplies

  (a) not in 1NF (repeating groups)

  Name   Product  Price
  Chris  Brie     3.49
         Perrier  1.19
         Peanuts   .06
         Oysters   .25
  Jack   Brie     3.98
         Perrier  1.19
         Lettuce   .69

  (b) in 1NF

  Name   Product  Price
  Chris  Brie     3.49
  Chris  Perrier  1.19
  Chris  Peanuts   .06
  Chris  Oysters   .25
  Jack   Brie     3.98
  Jack   Perrier  1.19
  Jack   Lettuce   .69

Page 77: O. Günther: Database Management Systems

2nd Normal Form (2NF)

• 1NF + for all attributes A and attribute sets X in relation R:

  (X → A in R AND A not in X)
    ⇒ (X is not a proper subset of any key of R
        OR A is a key attribute, i.e., it belongs to at least one key of R)

• note: if R has only one key, this is equivalent to: 1NF + each non-key attribute is fully functionally dependent on the key, i.e., it cannot be inferred from part of the key
• trivially true for one-column keys
• Ex.: relation Supplies
  - Supplies (Name, Product, Price) is in 2NF if and only if Price depends on both Name and Product (free pricing)
  - with fixed prices (e.g., books in Germany), Supplies is no longer in 2NF
  - possibly decomposition into Supplies' (Name, Product) and Costs (Product, Price)

Page 78: O. Günther: Database Management Systems

3rd Normal Form (3NF)

• 2NF + for all attributes A and attribute sets X in relation R:

  (X → A in R AND A not in X)
    ⇒ (X is a key of R
        OR X contains a key of R
        OR A is a key attribute)

• note: if there is only one key, this is equivalent to: 2NF + non-key attributes are mutually independent
• sufficient (but not necessary) condition: if an FD in the minimal cover contains all attributes of R, then R is in 3NF
• Ex.: relation Customers (Name, Address, Balance)
  - all attributes atomic ⇒ 1NF
  - keys have only one column ⇒ 2NF
  - Address and Balance are mutually independent ⇒ 3NF

Page 84: O. Günther: Database Management Systems

3NF - An Example

• relation R = (C, S, Z)
• functional dependencies: F = {CS → Z, Z → C}
• R in 3NF?
• keys of R: CS, ZS
• key attributes of R: C, S, Z
• 1NF: no problem
• 2NF: o.k. because Z and C are key attributes
• 3NF: o.k. for the same reason
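The reasoning in this example can be checked mechanically. The following is a minimal sketch, assuming FDs are written as pairs of attribute strings; the helper names (closure, candidate_keys, is_3nf) are illustrative and not part of the lecture. It tests the 3NF condition only on the FDs given in F.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, hence not minimal
            if closure(cand, fds) == set(schema):
                keys.append(cand)
    return keys

def is_3nf(schema, fds):
    """For each FD X -> A: X is a superkey or A is a key attribute."""
    key_attrs = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        for a in set(rhs) - set(lhs):  # ignore trivial parts
            if closure(lhs, fds) != set(schema) and a not in key_attrs:
                return False
    return True

SCHEMA = "CSZ"
FDS = [("CS", "Z"), ("Z", "C")]
KEYS = candidate_keys(SCHEMA, FDS)
print(sorted("".join(sorted(k)) for k in KEYS), is_3nf(SCHEMA, FDS))
```

Enumerating all attribute subsets is exponential, which is acceptable for small classroom schemas only.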

Decomposition into 3NF

• given: relation R, set of FDs F
• find: decomposition of R into a set of 3NF relations Ri
• algorithm:
    IF R in 3NF THEN stop
    ELSE
      compute a minimal cover F' of F;
      create a separate relation Ri = A for each attribute A that does not occur in any FD in F';
      create a relation Ri = XA for each FD X → A in F';
      if the key K of R does not occur in any relation Ri, create one more relation Ri = K.
• decomposition fulfills
  - lossless join
  - preservation of dependencies


Decomposition into 3NF - Example

Attributes:
  L ... Lecture      R ... Room
  I ... Instructor   S ... Student
  T ... Time         G ... Grade

Relational schema: R = (L, I, T, R, S, G)

Functional dependencies: F = {L → I, TR → L, TI → R, LS → G, TS → R, TRI → LR}


Decomposition into 3NF - Example (cont.)

• Keys: ST

• Key Attributes: S, T


Decomposition into 3NF - Example (cont.)

• F = {L → I, TR → L, TI → R, LS → G, TS → R, TRI → LR}

• Minimal cover: F' = {L → I, TR → L, TI → R, LS → G, TS → R}

• Decomposition into Ri
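The minimal cover and the decomposition step can be traced with a short program. This is a sketch under assumptions: FDs are pairs of attribute strings, the key ST is taken from the earlier slide rather than computed, and the function names are made up for illustration. It follows the slide's algorithm literally (one relation per FD of the minimal cover, no merging of FDs with equal left-hand sides).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: one attribute per right-hand side, trivial parts dropped.
    cover = []
    for lhs, rhs in fds:
        for a in rhs:
            if a not in lhs and (lhs, a) not in cover:
                cover.append((lhs, a))
    # Step 2: remove extraneous attributes from left-hand sides.
    for i in range(len(cover)):
        lhs, a = cover[i]
        for b in lhs:
            reduced = cover[i][0].replace(b, "")
            if reduced and a in closure(reduced, cover):
                cover[i] = (reduced, a)
    # Step 3: remove FDs that follow from the remaining ones.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return cover

def synthesize_3nf(schema, fds, key):
    cover = minimal_cover(fds)
    used = set().union(*(set(l) | set(a) for l, a in cover))
    relations = [{a} for a in sorted(set(schema) - used)]  # unused attributes
    for lhs, a in cover:
        rel = set(lhs) | {a}
        if rel not in relations:
            relations.append(rel)
    if not any(set(key) <= r for r in relations):
        relations.append(set(key))  # make sure the key survives somewhere
    return relations

F = [("L", "I"), ("TR", "L"), ("TI", "R"), ("LS", "G"), ("TS", "R"), ("TRI", "LR")]
COVER = minimal_cover(F)
RELATIONS = synthesize_3nf("LITRSG", F, key="ST")
print(COVER)
print(["".join(sorted(r)) for r in RELATIONS])
```

A minimal cover is not unique in general; a different elimination order can produce a different, equally valid cover.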

Indices

• data structures (often tree structures) that serve to accelerate database searches
• frequent synonyms: index structures, access methods
• Ex.: Supplies (Name, Product, Price)

  Name    Product   Price
  Chris   Brie      3.49
  Chris   Perrier   1.19
  Chris   Peanuts    .06
  Chris   Oysters    .25
  Jack    Brie      3.98
  Jack    Perrier   1.19
  Jack    Lettuce    .69

Indices (cont.)

• Name and Product are the indexed columns
• Index on Name is primary index
  - indexed column is part of the primary key
  - relation is sorted by increasing primary key
  - well suited for processing range queries (Ex.: Find all suppliers whose name starts with B, C or D)
• all other indices: secondary indices
• tradeoff: queries vs. updates
  - indices accelerate many queries ...
  - ... but slow down updates

Dense vs. Sparse Indices

• relations are stored in blocks (pages) on the magnetic disk
• crucial cost factor: how many blocks do I have to transfer from disk to main memory in order to answer the query?
• non-dense (or sparse) index: one index entry per block
  - for a primary index it suffices to store the smallest key value per block
  - index supports the system when looking for the relevant block(s)
  - inside each block: local search (cf. telephone directory)
  - useful for large relations because very compact
  - only possible for columns according to which the relation has been sorted (cf. phone directory)
  - therefore: at most one sparse index per relation
• dense index: one index entry per tuple
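A sparse index lookup can be sketched in a few lines. The example below is only an illustration: blocks are modeled as Python lists, the block size of two tuples is an assumption, and counting "blocks read" stands in for real disk transfers.

```python
from bisect import bisect_left

# Rows of Supplies, sorted by Name (the indexed, sorted column).
ROWS = [
    ("Chris", "Brie", 3.49), ("Chris", "Oysters", 0.25),
    ("Chris", "Peanuts", 0.06), ("Chris", "Perrier", 1.19),
    ("Jack", "Brie", 3.98), ("Jack", "Lettuce", 0.69),
    ("Jack", "Perrier", 1.19),
]
BLOCK_SIZE = 2  # assumption: two tuples per disk block

BLOCKS = [ROWS[i:i + BLOCK_SIZE] for i in range(0, len(ROWS), BLOCK_SIZE)]
SPARSE_INDEX = [block[0][0] for block in BLOCKS]  # smallest key per block

def lookup(name):
    """Return (matching rows, number of blocks read)."""
    # The block before the first index entry >= name may still end with name.
    start = max(bisect_left(SPARSE_INDEX, name) - 1, 0)
    hits, blocks_read = [], 0
    for block in BLOCKS[start:]:
        if block[0][0] > name:
            break  # all later blocks hold larger keys
        blocks_read += 1
        hits.extend(row for row in block if row[0] == name)  # local search
    return hits, blocks_read

print(lookup("Jack"))
```

The index holds one entry per block instead of one per tuple, which is why it only works on the column the relation is sorted by.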

How Does a Disk Access Work?

[Figure: blocks are transferred between the disk drive and main memory by read-block and write-block operations]

Dense vs. Sparse Indices: An Example

[Figure: a sparse index on Name and a dense index on Product pointing into the stored relation]

Layered Indices

• large relations ⇒ large indices
• indexing a larger index leads to a smaller index etc.
• tree structure
• root fits on one page (= one block)

[Figure: layers of indices on top of the file (relation); the index directly above the file is often dense]

B+ Tree

• tree structure as described above

[Figure: example B+ tree]

B+ Tree (cont.)

• B+ trees are balanced (i.e., all leaves are on the same level)
• lowest level (leaves): dense, otherwise: sparse
• each node fits on one page (at most N entries)
• N = page size / space requirements per entry (Ex. above: N = 3)
• minimal page utilization (guaranteed): N/2 entries
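The formula for N and the effect of page utilization on tree height can be worked through numerically. The page size, entry size, and number of keys below are hypothetical values chosen for illustration, and the level count is a simplified estimate that assumes every node holds the same number of entries.

```python
import math

def fanout(page_size, entry_size):
    """N = page size / space requirements per entry."""
    return page_size // entry_size

def levels(num_keys, entries_per_node):
    """Number of tree levels if every node holds entries_per_node entries."""
    nodes = math.ceil(num_keys / entries_per_node)  # dense leaf level
    count = 1
    while nodes > 1:
        nodes = math.ceil(nodes / entries_per_node)  # sparse level above
        count += 1
    return count

N = fanout(page_size=4096, entry_size=16)   # hypothetical sizes in bytes
FULL = levels(1_000_000, N)                 # every page completely filled
HALF = levels(1_000_000, N // 2)            # guaranteed minimum utilization
print(N, FULL, HALF)
```

Even at the guaranteed minimum utilization the tree stays shallow, which is why a lookup needs only a handful of page accesses.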

B+ Tree (cont.)

• each node has between N/2 and N entries
• problems: overflow, underflow
• Ex.: N = 3

[Figure: B+ tree example with N = 3]


B+ Tree (cont.)

Hashing - An Alternative to Indices

• hash function h: data value → storage address
• Ex.: storage address = data value MOD p (p typically a prime number)
• Ex.: p = 13

  Supp_No   Name     ...
  100       Miller   ...
  200       Smith    ...
  300       Meyer    ...
  400       Kuntze   ...
  500       Smith    ...
  1400      Miller   ...

  (Supp_No is the hash field)

Hashing (cont.): Storage Structure

• only one hash field per relation!
• advantage: very fast access
• disadvantage:
  - relation dispersed across the disk
  - collisions

Hashing (cont.): Collision Chains
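Hashing with collision chains can be sketched using the supplier numbers and p = 13 from the earlier slide. The in-memory list of buckets is an assumption that stands in for storage addresses on disk; the function names are illustrative.

```python
P = 13  # prime modulus from the example

SUPPLIERS = [
    (100, "Miller"), (200, "Smith"), (300, "Meyer"),
    (400, "Kuntze"), (500, "Smith"), (1400, "Miller"),
]

def h(supp_no):
    """Hash function: storage address = data value MOD p."""
    return supp_no % P

def build_table(rows):
    table = [[] for _ in range(P)]      # one collision chain per address
    for row in rows:
        table[h(row[0])].append(row)    # colliding rows share a chain
    return table

def find(table, supp_no):
    for row in table[h(supp_no)]:       # search only the relevant chain
        if row[0] == supp_no:
            return row
    return None

TABLE = build_table(SUPPLIERS)
print(h(100), h(1400), find(TABLE, 1400))
```

Suppliers 100 and 1400 map to the same address, so the second one is reached only by following the chain.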

Query Optimization

• Ex.:

    SELECT DISTINCT Orders.Customer
    FROM Orders, Contains
    WHERE Orders.O_No = Contains.O_No
      AND Contains.Product = 'Brie'

• Assumptions:
  - 100,000 tuples in Orders, 1000 bytes each
  - 1,000,000 tuples in Contains, 100 bytes each
  - 1,000 tuples in Contains concern Brie
  - 100 MB main memory

Query Optimization (cont.)

• Strategy 1:
  1) Compute cartesian product Orders × Contains
  2) Select all tuples with Orders.O_No = Contains.O_No
  3) Select all tuples with Contains.Product = 'Brie'
  4) Project to Customer

• Strategy 2:
  1) Select all tuples from Contains with Product = 'Brie'
  2) Compute cartesian product with Orders
  3) Select all tuples with Orders.O_No = Contains.O_No
  4) Project to Customer

Query Optimization (cont.)

• Analysis Strategy 1:
  (1)+(2): Tuple-I/Os for Orders:
           Tuple-I/Os for Contains:
  (3)+(4): Tuple-I/Os:
  Tuple-I/Os in total:

• Analysis Strategy 2:
  (1):     Tuple-I/Os for Contains:
  (2)-(4): Tuple-I/Os:
  Tuple-I/Os in total:

• Strategy 2 is clearly superior (Factor?)
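One part of the comparison follows directly from the stated assumptions: the size of the cartesian product each strategy has to form. The sketch below computes only that. It is not the tuple-I/O analysis asked for on the slide, which also depends on how the relations are read and buffered.

```python
ORDERS = 100_000        # tuples in Orders
CONTAINS = 1_000_000    # tuples in Contains
BRIE = 1_000            # tuples in Contains that concern Brie

def product_size(left, right):
    """Number of tuples in the cartesian product of two relations."""
    return left * right

STRATEGY_1 = product_size(ORDERS, CONTAINS)  # product first, select later
STRATEGY_2 = product_size(ORDERS, BRIE)      # select Brie first, then product
print(STRATEGY_1, STRATEGY_2, STRATEGY_1 // STRATEGY_2)
```

Applying the selection before the product shrinks the intermediate result by the selectivity of the Brie predicate.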

Query Optimization (cont.)

• Which (meta)data should be stored? (Statistics)
  - number of tuples for each relation
  - number of columns for each relation
  - number of different values per column
  - occurrence frequencies of particular values
• More information facilitates query optimization but slows down updates
• Automatic optimization preferable because
  - statistics always up-to-date
  - more cost-efficient
  - dynamic
• Important strength of relational systems

Transaction Processing

• Transaction (TA)
  - logical unit of work
  - should be executed either completely or not at all
  - atomic, i.e., not decomposable
• Recovery
• Concurrency

Recovery

• Recovery: restart after system fault
• System faults
  - program crash
  - arithmetic mistakes (e.g. overflow)
  - disk crash
  - power failure
• Ex.: DELETE FROM Contains WHERE O_No = 1024
• What happens in case of a system fault "in the middle"?

Recovery (cont.)

• COMMIT
  - operation to terminate a TA successfully
  - all updates are stored in the database permanently
  - storage on "safe" storage medium
  - transaction is finalized
  - bundling of several COMMIT operations in checkpoints
• ROLLBACK
  - operation to abort a TA in case of system fault
  - changes in CPU registers and storage are reversed
• Important for ROLLBACK
  - logging each single modification
  - storing the log on a "safe" medium
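COMMIT and ROLLBACK can be tried out with Python's built-in sqlite3 module. This is a sketch only: the in-memory database, the sample rows, and the simulated fault (a raised exception) are assumptions made for illustration, and SQLite is used here merely as a convenient stand-in for a DBMS.

```python
import sqlite3

def delete_order(conn, o_no, fail=False):
    """Delete all items of one order as a single transaction."""
    try:
        conn.execute("DELETE FROM Contains WHERE O_No = ?", (o_no,))
        if fail:
            raise RuntimeError("simulated fault in the middle")
        conn.commit()        # make the deletion permanent
        return True
    except RuntimeError:
        conn.rollback()      # undo the uncommitted deletion
        return False

def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM Contains").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contains (O_No INTEGER, Product TEXT)")
conn.executemany(
    "INSERT INTO Contains VALUES (?, ?)",
    [(1024, "Brie"), (1024, "Perrier"), (1025, "Lettuce")],
)
conn.commit()

ok_first = delete_order(conn, 1024, fail=True)
rows_after_rollback = count_rows(conn)
ok_second = delete_order(conn, 1024)
rows_after_commit = count_rows(conn)
print(ok_first, rows_after_rollback, ok_second, rows_after_commit)
```

After the rollback the relation is unchanged; only the committed run removes the rows of order 1024.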

Recovery (cont.)

[Figure: timeline with three checkpoints, an error, and the subsequent recovery; at each checkpoint, updates are stored on some "safe" medium]

Recovery (cont.)

• 3 types of transactions
  - transactions that already completed and whose results have been made permanent: T1
  - transactions that have already completed but whose results have not yet been made permanent: T2, T4
    ⇒ REDO (i.e. re-run; after recovery these transactions will have completed)
  - transactions that started but that did not finish: T3, T5
    ⇒ UNDO (i.e. reversal of all modifications, ROLLBACK of each transaction concerned; after recovery these transactions will not have completed)
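The three cases can be expressed as a small classification routine. The start and commit times below are invented so that T1 to T5 fall into the groups named above; the original timeline figure is not available, and the function name is hypothetical.

```python
def classify(transactions, last_checkpoint, crash):
    """Map each TA to 'nothing', 'REDO', or 'UNDO'.

    transactions: name -> (start time, commit time or None if unfinished)
    """
    plan = {}
    for name, (start, commit) in transactions.items():
        if commit is not None and commit <= last_checkpoint:
            plan[name] = "nothing"   # results already permanent
        elif commit is not None and commit < crash:
            plan[name] = "REDO"      # completed, but not yet permanent
        else:
            plan[name] = "UNDO"      # still running at the time of the fault
    return plan

# Hypothetical timeline: last checkpoint at t = 5, system fault at t = 10.
TIMES = {"T1": (1, 4), "T2": (3, 7), "T3": (2, None), "T4": (6, 9), "T5": (8, None)}
PLAN = classify(TIMES, last_checkpoint=5, crash=10)
print(PLAN)
```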

Concurrency: Dirty Read Problem

R .. relation
R.X .. attribute of R

  1) transaction A: update R.X
  2) transaction B: read from R.X
  3) transaction B: action on basis of R.X
  4) transaction B: commit B
  5) transaction A: ROLLBACK A ⇒ Problem!

Concurrency: Lost Update Problem

[Figure: two diagrams, each showing the timelines of transaction A and transaction B]

Case 1:
  transaction A: A reads R.X; double R.X; A writes new value of R.X; Commit A
  transaction B: B reads R.X; B adds 2 to R.X; B writes new value of R.X; Commit B

Case 2:
  A changes R.X; B changes R.X; A reads R.X; A Rollback; B Commit
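The lost update effect can be replayed with a tiny schedule interpreter. The interleaving used here (both TAs read before either writes) is an assumption, since the exact order in the original diagram is not recoverable; the starting value 10 is arbitrary.

```python
def run(schedule, x):
    """Execute (transaction, action) steps against one shared value R.X."""
    local = {}                       # private working copy per transaction
    for ta, action in schedule:
        if action == "read":
            local[ta] = x
        elif action == "double":
            local[ta] *= 2
        elif action == "add2":
            local[ta] += 2
        elif action == "write":
            x = local[ta]            # overwrite the shared value
    return x

SERIAL_AB = [("A", "read"), ("A", "double"), ("A", "write"),
             ("B", "read"), ("B", "add2"), ("B", "write")]
INTERLEAVED = [("A", "read"), ("B", "read"), ("A", "double"),
               ("B", "add2"), ("A", "write"), ("B", "write")]

print(run(SERIAL_AB, 10), run(INTERLEAVED, 10))
```

In the interleaved run B overwrites A's result with a value computed from the old R.X, so A's doubling disappears.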

Concurrency: Possible Solutions

• Timestamps to coordinate transactions
• Locks: temporary blocking of parts of the database
  - exclusive lock (X-lock): read/write lock, i.e., no other TA is allowed to read or write the blocked data
  - shared lock (S-lock): write lock, i.e., other TAs can still read but not write the blocked data
• If a TA wants to read, it first has to ask for an S-lock for the required data
• If a TA wants to write, it first has to ask for an X-lock for the required data
• compatibility of locks
  - S+S ... OK
  - S+X ... Not OK
  - X+X ... Not OK
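The compatibility rules translate into a very small lock table. This is a sketch under assumptions: the class and method names are invented, waiting is modeled by simply returning False, and a TA's own locks never block its own requests.

```python
# Compatibility of a held lock with a requested lock, as listed above.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockTable:
    def __init__(self):
        self.held = {}  # item -> list of (transaction, mode)

    def request(self, ta, item, mode):
        """Grant the lock and return True, or return False (TA must wait)."""
        others = [(t, m) for t, m in self.held.get(item, []) if t != ta]
        if all(COMPATIBLE[(m, mode)] for _, m in others):
            self.held.setdefault(item, []).append((ta, mode))
            return True
        return False

    def release_all(self, ta):
        """Release every lock of one TA, e.g. at COMMIT or ROLLBACK."""
        for item in self.held:
            self.held[item] = [(t, m) for t, m in self.held[item] if t != ta]

locks = LockTable()
print(locks.request("A", "R.X", "S"), locks.request("B", "R.X", "S"),
      locks.request("A", "R.X", "X"))
```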

Locks: Application to Dirty Read

[Table: compatibility of a requested lock with the lock another TA already holds on the same data]

          X     S     none
  X       N     N     Yes
  S       N     Yes   Yes
  none    Yes   Yes   Yes

Locks: Application to Dirty Read (cont.)

• Ex. 1:
  1) TA A obtains an X-lock for the field R.X to prepare for the planned update
  2) TA B asks for an S-lock to prepare for the planned read operation ⇒ REJECTED
  3) ROLLBACK A ⇒ locks are released
  4) TA B obtains S-lock
  5) TA B performs read operation + COMMIT
  6) restart TA A

Locks: Application to Dirty Read (cont.)

• Ex. 2:
  1) TA A requests X-lock for R.X
  2) TA A obtains X-lock, updates R.X
  3) TA B requests S-lock ⇒ REJECTED, TA B waits
  4) TA A ROLLBACK
  5) TA B obtains S-lock, reads R.X
  6) TA B COMMIT
  7) restart TA A, re-obtains X-lock

Locks: Application to Lost Update

  1) TA A wants to read R.X, asks for S-lock
  2) TA A obtains S-lock, reads R.X
  3) TA B also wants to read R.X, asks for S-lock
  4) TA B obtains S-lock, reads R.X
  5) TA A wants to update R.X, asks for X-lock
  6) TA A does not obtain X-lock because TA B holds an S-lock ⇒ A waits
  7) TA B wants to update R.X, asks for X-lock
  8) TA B does NOT obtain X-lock ⇒ B waits
  9) DEADLOCK ⇒ break via Rollback of some TA

Deadlocks

• Problem: How to recognize deadlocks?
• How to treat deadlocks involving several TAs?
• Searching for cycles in the WAIT-FOR graph

[Figure: WAIT-FOR graph; an edge means that one TA waits for another]
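Searching for cycles in the WAIT-FOR graph is a standard depth-first search. The sketch below assumes the graph is given as a dictionary from each TA to the TAs it waits for; this representation and the function name are illustrative choices.

```python
def find_cycle(wait_for):
    """Return a list of TAs forming a cycle in the WAIT-FOR graph, or None."""
    visiting, done = [], set()

    def visit(ta):
        if ta in visiting:                       # back edge: cycle found
            return visiting[visiting.index(ta):]
        if ta in done:
            return None
        visiting.append(ta)
        for other in wait_for.get(ta, []):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        done.add(ta)
        return None

    for ta in wait_for:
        cycle = visit(ta)
        if cycle:
            return cycle
    return None

# Lost update with locks: A waits for B and B waits for A.
print(find_cycle({"A": ["B"], "B": ["A"]}))
```

Once a cycle is found, rolling back any one TA on it breaks the deadlock.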

Serializability

• Given a set of TAs, which possible events should be considered correct?
• Convention: a schedule is considered correct if it is serializable
• Serializability means that the result of the schedule is identical to the result of some serial schedule
• Ex.:

  (TA1) A := A + 1
        read A into main memory
        add 1
        write A back into the DB

  (TA2) A := 2 * A
        read A into main memory
        multiply by 2
        write A back into the DB

  (TA3) write A
        read A into main memory
        display A on the screen
        set A to 1 in the DB


Serializability - An Example

• Assumption: A = 1

TA1, TA2, TA3:

TA1, TA3, TA2:

TA2, TA3, TA1:

TA2, TA1, TA3:

TA3, TA1, TA2:

TA3, TA2, TA1:
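The six serial schedules can be evaluated by enumeration. In this sketch each TA is modeled as a function on the value of A, and the "screen" is a list that records what TA3 displays; both modeling choices are assumptions made for illustration.

```python
from itertools import permutations

def ta1(a, screen):
    return a + 1            # A := A + 1

def ta2(a, screen):
    return 2 * a            # A := 2 * A

def ta3(a, screen):
    screen.append(a)        # display A on the screen
    return 1                # set A to 1 in the DB

TAS = {"TA1": ta1, "TA2": ta2, "TA3": ta3}

def serial_results(initial):
    """Map each serial order to (value displayed, final value of A)."""
    results = {}
    for order in permutations(TAS):
        a, screen = initial, []
        for name in order:
            a = TAS[name](a, screen)
        results[order] = (screen[0], a)
    return results

RESULTS = serial_results(1)
for order, outcome in RESULTS.items():
    print(", ".join(order), "->", outcome)
```

Any interleaved schedule whose outcome matches one of these results counts as serializable under the definition given earlier.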

Concurrency: 2-Phase Locking

• 2-Phase locking protocol
  – for each transaction one first asks for all required locks (phase I)
  – processing ...
  – then all locks are (gradually) released (phase II)

[Figure: number of locks held over time; TA2 does not follow 2-phase locking]
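Whether a single transaction follows the protocol can be checked from its sequence of lock and unlock steps. The list-of-strings representation below is an assumption chosen to keep the sketch short.

```python
def is_two_phase(operations):
    """True if no lock is requested after the first unlock."""
    releasing = False
    for op in operations:
        if op == "unlock":
            releasing = True          # phase II has begun
        elif op == "lock" and releasing:
            return False              # acquiring again violates the protocol
    return True

print(is_two_phase(["lock", "lock", "unlock", "unlock"]),
      is_two_phase(["lock", "unlock", "lock", "unlock"]))
```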

Concurrency and 2-Phase Locking

Theorem: 2-phase locking protocol for each transaction ⇒ serializability of the schedule

[Figure: diagram of sets of schedules with the labels: 2-phase locking; all "reasonable" possibilities; equivalent to FIFO; serial; serializable]

Environmental Data Modeling: An Example

• Constraints and Properties
  - Minimum distance between roads and biotopes
  - River width varies widely - line vs. polygon
  - Roads are not necessarily connected
  - River and road shapes are independent of each other
  - Biotope shape depends on river shape

Environmental Data Modeling: An Example (2)

• Queries
  - What is the distance between the planned road and the biotope?
  - Which roads have a distance of less than x meters from a biotope?
  - Where do we need an intersection?
  - Where do we need a bridge?
  - How much area is enclosed between roads and river?
  - Which roads go along the river?

• Updates
  - An intersection is built.
  - The road is built.
  - A bridge is built. Generate a class bridge dynamically.

Spatial Data Types

• Points
• Lines
• Polygons
• Curves
• Polyhedra in arbitrary dimensions

• Applications
  - Computer graphics
  - Robotics
  - CAD/CAM
  - Geography
  - Computer vision
  - Environmental information systems

Spatial Operators (1): Set Operators

• Union
• Intersection
• Difference

Spatial Operators (2): Search Operators

Point Query: find all spatial objects that contain/are near a given point

Range Query: find all objects that contain/intersect/are contained in a given spatial object, such as a polygon
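Both search operators can be illustrated on axis-parallel rectangles. This is a deliberately simplified sketch: real spatial objects may be arbitrary polygons, the object names and coordinates are made up, and the linear scan ignores any spatial index.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains_point(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

    def intersects(self, other):
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

def point_query(objects, x, y):
    """Names of all objects that contain the point (x, y)."""
    return [name for name, box in objects.items() if box.contains_point(x, y)]

def range_query(objects, window):
    """Names of all objects that intersect the query window."""
    return [name for name, box in objects.items() if box.intersects(window)]

# Hypothetical objects, each approximated by its bounding box.
OBJECTS = {"lake": Box(0, 0, 4, 3), "biotope": Box(3, 2, 6, 5), "road": Box(7, 0, 8, 9)}
print(point_query(OBJECTS, 3.5, 2.5), range_query(OBJECTS, Box(5, 4, 7.5, 6)))
```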

Spatial Operators (3): Similarity Operators

• Translation
• Rotation
• Scaling

Spatial Operators (4): Spatial Joins

• Join between different classes of objects
• Examples
  - Find all houses that are within 10 km of a lake
  - Find all buildings that are located within a biotope
  - Find all schools that are more than 5 km away from a fire station
• Related: general map overlay

Spatial Data Structures (1): Vertex Lists

• List of polygon vertices
• Supported operators:
  - Similarity operators
  - (Set operators)
• Problems:
  - Not unique
  - No invariants
  - List vs. set
  - Simple polygons - invalid representations

Spatial Data Structures (2): B-Rep (Boundary Representation)

Spatial Data Structures (3): B-Rep (Boundary Representation)

• 3D: DAG of height 3
• Supported operators: Similarity operators
• Problems: not unique, invalid representations, search / set operators, redundancy

What's the problem with commercial GIS?

• GIS = Geographic Information Systems
• Originally oriented towards file systems
• Scaling problems
• No ad hoc query facility
• Semantic integrity problems
• Single user environment, little or no concurrency
• No distributed GIS
• Little support for application-specific data types or operators
• Possible solution: use commercial databases

And what about commercial databases? (1)

• No geometric data types: point, line, polygon, ...
• Geometric representation may be hidden in a long field

    polygon
    ID    Color   Shape
    2     blue    ((1,1) (2,7) (3,9) ...)
    ...   ...     ...

• ... or in an external file

    polygon
    ID    Color   Shape
    2     blue    /usr/john/pol2
    ...   ...     ...

• Inflexible
• No database support for geometric operations
• No notion of topology
• Redundancy

And what about commercial databases? (2)

• Objects may be decomposed onto different relations
• No spatial clustering
• Shared objects ⇒ less redundancy
• Example: boundary representation

    part
    ID        Faces
    cuboid    f1
    cuboid    f2
    pyramid   f101
    ...       ...

    faces
    ID    Edges
    f1    e1
    f1    e2
    ...   ...

    edges
    ID    Vertices
    e1    v1
    e1    v2
    e2    v2
    ...   ...

    vertices
    ID    X     Y     Z
    v1    0     1     0
    v2    0     3     2
    v3    ...   ...   ...

And what about commercial databases? (3)

• No spatial access methods
• Little support for application-specific object types
  - Cities
  - Rivers
  - ...
• ... or for application-specific operations
  - Build a bridge
  - Modify a shape
  - ...

Database Extensions (1): Abstract Data Types

• Abstract data types (ADTs)
  - Encapsulation of a (user-defined) data structure
  - Collection of (user-defined) operators on this structure
  - Implementation details hidden from the user

• ADTs in databases: BOX - example

    create boxes (ID = i4, layer = c15, box-desc = Box)
    append to boxes (ID = 99, layer = "polysilicon", box-desc = "0,0 : 2,3")
    range of b is boxes
    replace b (box-desc = b.box-desc INT "0,0 : 4,1") where b.ID = 99
    retrieve (boxes.ID) where AREA(boxes.box-desc) > 100

Database Extensions (2): Implementation of Abstract Data Types

    define type Box is
      (Internal length = 16,
       Input Proc = CharToBox,
       Output Proc = BoxToChar,
       Default = " ")

    define operator INT (Box, Box) returns Box is
      (Proc = BoxInt,
       Precedence = 3,
       Associativity = "left",
       Sort = left X)

    define operator AE (Box, Box) returns boolean is
      (Proc = BoxAE,
       Precedence = 3,
       Associativity = "left",
       Sort = BoxArea,
       Hashes,
       Restrict = AERSelect,
       Join = AEJSelect,
       Negator = BoxAreaNE)

• C-Procedures BoxArea, AERSelect, AEJSelect, etc.
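What the procedures behind such a Box type might do can be sketched in Python rather than C. Everything here is an assumption for illustration: the external form "x1,y1 : x2,y2" is read as lower-left and upper-right corner, and the function names only mirror CharToBox, BoxInt, and BoxArea; they are not the actual procedures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

def char_to_box(text):
    """Input procedure: parse the external form 'x1,y1 : x2,y2'."""
    low, high = text.split(":")
    x1, y1 = (float(v) for v in low.split(","))
    x2, y2 = (float(v) for v in high.split(","))
    return Box(x1, y1, x2, y2)

def box_int(a, b):
    """Operator INT: intersection of two boxes, or None if they are disjoint."""
    x1, y1 = max(a.x1, b.x1), max(a.y1, b.y1)
    x2, y2 = min(a.x2, b.x2), min(a.y2, b.y2)
    return Box(x1, y1, x2, y2) if x1 <= x2 and y1 <= y2 else None

def box_area(box):
    """Procedure behind AREA."""
    return (box.x2 - box.x1) * (box.y2 - box.y1)

# The replace statement from the BOX example: "0,0 : 2,3" INT "0,0 : 4,1".
RESULT = box_int(char_to_box("0,0 : 2,3"), char_to_box("0,0 : 4,1"))
print(RESULT, box_area(RESULT))
```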

Database Extensions (3): Implementation of Abstract Data Types

• Advantages
  - Very flexible
  - Data structures and operators can be very complex
• Disadvantages
  - Two programming paradigms: DBMS and C
  - ADT maps into only one column: structural information gets lost
  - Complexity hidden in the "black box"
  - Problems for query optimization: what's inside?

Database Extensions (4): Spatial Access Methods

• Point query
• Range query

Database Extensions (5): R-Trees

• Features
  - Hierarchy of d-dimensional boxes
  - Balanced tree
  - One node per disk page
  - Fully dynamic
• Problems
  - Overlap of sibling boxes - bad for point searches
  - Arbitrary shapes: additional computations and disk accesses (clustering!)

Object-Oriented Database Systems

The OODBS Manifesto (Atkinson et al. 1989): OODBS = DBS + ...

• Complex objects (PART-OF) - Structural OO
• User-defined data types - Behavioral OO
• Object identity
• Encapsulation
• Types/Classes
• Inheritance (IS-A)
• Operators: overloading / overriding / late binding

Behavioral Object-Orientation for Geometric Modeling

• Integration of complex geometric data types and operators

    add class Point
      type tuple (x: real
                  y: real)
    add method DistOrigin: real in class Point
      return (sqrt(sqr(self->x) + sqr(self->y)))
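For comparison, the same class and method can be written in Python. This is only an analogy to the database-language definition above, not a translation of any particular OODBS syntax.

```python
import math
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

    def dist_origin(self):
        """Euclidean distance of the point from the origin."""
        return math.sqrt(self.x ** 2 + self.y ** 2)

print(Point(3, 4).dist_origin())
```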

Structural Object-Orientation for Geometric Modeling (1)

• Complex geometric objects
• Boundary representation: 3D → 2D → 1D → 0D
• Shared subobjects: faces, lines, points

Structural Object-Orientation for Geometric Modeling (2)

    add class River
      type tuple (rname: string
                  rshape: list(PolylineOrPolygon))
    add class PolylineOrPolygon
      type list(Point)
    add class Polyline inherits PolylineOrPolygon ...
    add class Polygon inherits PolylineOrPolygon ...
    add class Point
      type tuple (x: real
                  y: real)

Structural Object-Orientation for Application Modeling (1)

• Complex geo-objects
• Example: city - districts - streets

Structural Object-Orientation for Application Modeling (2)

    add class City
      type tuple (cname: string
                  cpopulation: integer
                  districts: set(District)
                  cshape: Polygon)
    add class District
      type tuple (dname: string
                  dpopulation: integer
                  dshape: Polygon
                  streets: set(Street))
    add class Street
      type tuple (sname: string
                  sshape: Polyline)

Behavioral Object-Orientation for Application Modeling

• Integration of application-specific data types and operations

    add method CompPop: integer in class City
      d: District
      p: integer
      p = 0
      for each d in self->districts
        { p = p + d->dpopulation }
      return (p)
    add method CompShape ...
    add method CompStreets ...
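The structure and the CompPop method can be mirrored in Python. The sketch keeps only the attributes needed for the population sum; the shape and street attributes are left out, and the city and district names are invented sample data.

```python
from dataclasses import dataclass, field

@dataclass
class District:
    dname: str
    dpopulation: int

@dataclass
class City:
    cname: str
    districts: list = field(default_factory=list)

    def comp_pop(self):
        """Population of the city as the sum over its districts."""
        return sum(d.dpopulation for d in self.districts)

# Hypothetical sample data.
city = City("Example City", [District("North", 1200), District("South", 800)])
print(city.comp_pop())
```

Deriving the city population from its districts avoids storing the same figure twice and keeps both levels consistent.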