Database: Review

Sept. 2009, Yangjun Chen, ACS-3902

Database
- Introduction
- system architecture, basic concepts
- ER-model, data modeling
- B+-tree, hashing
- Relational algebra, relational data model
- SQL: DDL, DML

not included

Introduction to database systems

- What is a database?
- The main characteristics of a database
- The basic database design method
- The entity-relationship data model for application modeling

The main characteristics of the database approach:

• single repository of data
• sharable by multiple users
• concurrency control and transaction concept
• security and integrity constraints
• self-describing - system catalogue contains meta data
• program-data independence
• some changes to the database are transparent to programs/users
• multiple views of data - to support individual needs of programs/users

Data modeling using ER-model

Entity-relationship model
- Entity types
  - strong entities
  - weak entities
- Relationships among entities
- Attributes - attribute classification
- Constraints
  - cardinality constraints
  - participation constraints

ER-to-Relation-mapping

ER-model:

[Figure: ER diagram of the company database. Entity types: employee, department, project, dependent. Relationship types: works for, manages, works on, dependents of, controls, supervision (roles: supervisor, supervisee). Attributes shown: ssn, bdate, name (fname, minit, lname), sex, address, salary; name, number, location; number of employees; startdate; hours; name, sex, birthdate, relationship. Cardinality ratios (1:1, 1:N, M:N) are marked on the relationship edges.]

Concepts and Architecture

- Database schema, schema evolution, database state
- Working process with a database system
- Database system architecture
- Data independence concept

Database schema
Relation schema
Schema evolution
Database state

Student
Name    StNo  Class  Major
Smith   17    1      CS
Brown   8     2      CS

Course
CName     CNo   CrHrs  Dept
Database  8803  3      CS
C         2606  3      CS

Section
SId  CNo   Semester  Yr    Instructor
32   8803  Spring    2000  Smith
25   8803  Winter    2000  Smith
43   2606  Spring    2000  Jones

Grades
StNo  SId  Grade
17    25   A
17    43   B

Working process with a database system:

Definition
• record structure
• data elements
• names
• data types
• constraints, etc.

Construction
• create database files
• populate the database with records

Manipulation
• querying
• updating

Database Management System (DBMS)

• collection of software facilitating the definition, construction and manipulation of databases

[Figure: DBMS block diagram showing users/actors, the request manager, query evaluation, the storage manager, the meta data and the stored database.]

Three-schema architecture

• External views: a specific user's or group's view of the database
• Conceptual schema: describes the whole database for all users
• Internal schema: physical storage structures and details

Hashing technique

- external hashing
- static hashing & dynamic hashing
- hash function: mathematical function that maps a key to a bucket address
- collisions
- collision resolution scheme
  - open addressing
  - chaining
  - multiple hashing
- linear hashing

External hashing: the data are on the disk.

Static hashing:
- using a hashing function to map keys to bucket addresses
- primary area cannot be changed
- collision resolution scheme:
  - open addressing
  - chaining
  - multiple hashing

Dynamic hashing:
- primary area can be changed
- linear hashing
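To make the static case concrete, here is a minimal in-memory sketch of static hashing with chaining. The class name `StaticHashFile`, the hash function `key mod M`, and the bucket capacity of 2 are assumptions chosen for illustration; a real external hash file would keep the buckets on disk pages.

```python
class StaticHashFile:
    """Static hashing sketch: a fixed primary area of M buckets, chaining on overflow."""

    def __init__(self, num_buckets=4, capacity=2):
        self.m = num_buckets                  # size of the primary area, never changes
        self.capacity = capacity              # records per primary bucket
        self.primary = [[] for _ in range(num_buckets)]
        self.overflow = [[] for _ in range(num_buckets)]  # one overflow chain per bucket

    def address(self, key):
        # Assumed hash function: key mod M.
        return key % self.m

    def insert(self, key):
        b = self.address(key)
        if len(self.primary[b]) < self.capacity:
            self.primary[b].append(key)
        else:
            self.overflow[b].append(key)      # collision: record goes to the chain

    def search(self, key):
        b = self.address(key)
        return key in self.primary[b] or key in self.overflow[b]


f = StaticHashFile()
for k in [3, 2, 4, 1, 8, 14, 5, 10]:
    f.insert(k)
```

Keys 2, 14 and 10 all map to bucket 2, so the third one lands in that bucket's overflow chain; the primary area itself never grows.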



Linear hashing:

1. What is a phase?

2. When to split a bucket?

3. How to split a bucket?

4. What bucket will be chosen to split next?

5. How do we find a record inserted into a linear hashing file?

Linear hashing:
- initially the hash file contains M buckets
- hi = key mod (2^i M) (i = 0, 1, 2, ...)
- the insertion process can be divided into several phases

phase 1:
- insertion using h0 = key mod M
- splitting using h1 = key mod (2M)
- splitting rule: overflow of a bucket, or load factor > constant (e.g., 0.70)
- overflow records are put in the overflow area or redistributed through splitting a bucket
- buckets are split from n = 0 to n = M - 1 (after each split, n is increased by 1)
- phase 1 finishes when n = M (in this case, the primary area becomes 2M buckets long)

phase 2:
- insertion using h1 = key mod (2M)
- splitting using h2 = key mod (4M)
- splitting rule: overflow of a bucket, or load factor > constant (e.g., 0.70)
- overflow records are put in the overflow area or redistributed through splitting a bucket
- buckets are split from n = 0 to n = 2M - 1 (after each split, n is increased by 1)
- phase 2 finishes when n = 2M (in this case, the primary area will contain 4M buckets)

phase 3: ... h2 = …, h3 = …, ...

Linear hashing including two phases:

- collision resolution strategy: chaining
- split rule: load factor > 0.7
- initially M = 4 (M: size of the primary area)
- hash functions: hi(key) = key mod (2^i M) (i = 0, 1, 2, …)
- bucket capacity = 2

Trace the insertion process of the following keys into a linear hashing file:

3, 2, 4, 1, 8, 14, 5, 10, 7, 24, 17, 13, 15.

The first phase – phase 0

• when inserting the sixth record we would have

bucket | 0    | 1 | 2     | 3
keys   | 4, 8 | 1 | 2, 14 | 3

n = 0 before the split (n is the pointer to the bucket to be split.)

• but the load factor 6/8 = 0.75 > 0.70 and so bucket 0 must be split (using h1 = key mod 2M):

bucket | 0 | 1 | 2     | 3 | 4
keys   | 8 | 1 | 2, 14 | 3 | 4

n = 1 after the split
load factor: 6/10 = 0.6
no split

bucket | 0 | 1 | 2     | 3 | 4
keys   | 8 | 1 | 2, 14 | 3 | 4

insert(5)

bucket | 0 | 1    | 2     | 3 | 4
keys   | 8 | 1, 5 | 2, 14 | 3 | 4

n = 1
load factor: 7/10 = 0.7
no split

bucket | 0 | 1    | 2     | 3 | 4
keys   | 8 | 1, 5 | 2, 14 | 3 | 4

insert(10)

bucket   | 0 | 1    | 2     | 3 | 4
keys     | 8 | 1, 5 | 2, 14 | 3 | 4
overflow |   |      | 10    |   |

n = 1
load factor: 8/10 = 0.8
split using h1.

bucket   | 0 | 1 | 2     | 3 | 4 | 5
keys     | 8 | 1 | 2, 14 | 3 | 4 | 5
overflow |   |   | 10    |   |   |

n = 2
load factor: 8/12 = 0.66
no split

bucket   | 0 | 1 | 2     | 3 | 4 | 5
keys     | 8 | 1 | 2, 14 | 3 | 4 | 5
overflow |   |   | 10    |   |   |

insert(7)

bucket   | 0 | 1 | 2     | 3    | 4 | 5
keys     | 8 | 1 | 2, 14 | 3, 7 | 4 | 5
overflow |   |   | 10    |      |   |

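After insert(7) the load factor is 9/12 = 0.75 > 0.7, so bucket 2 is split using h1:

bucket | 0 | 1 | 2     | 3    | 4 | 5 | 6
keys   | 8 | 1 | 2, 10 | 3, 7 | 4 | 5 | 14

n = 3
load factor: 9/14 = 0.642
no split.

insert(24)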

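After insert(24):

bucket | 0     | 1 | 2     | 3    | 4 | 5 | 6
keys   | 8, 24 | 1 | 2, 10 | 3, 7 | 4 | 5 | 14

The load factor is 10/14 = 0.714 > 0.7, so bucket 3 is split using h1:

bucket | 0     | 1 | 2     | 3 | 4 | 5 | 6  | 7
keys   | 8, 24 | 1 | 2, 10 | 3 | 4 | 5 | 14 | 7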

n = 4 = M, so the first phase is finished.

bucket | 0     | 1 | 2     | 3 | 4 | 5 | 6  | 7
keys   | 8, 24 | 1 | 2, 10 | 3 | 4 | 5 | 14 | 7

The second phase – phase 1

n = 0; using h1 = key mod 2M to insert and h2 = key mod 4M to split.

insert(17)


bucket | 0     | 1     | 2     | 3 | 4 | 5 | 6  | 7
keys   | 8, 24 | 1, 17 | 2, 10 | 3 | 4 | 5 | 14 | 7

n = 0
load factor: 11/16 = 0.687
no split.

insert(13)

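bucket | 0     | 1     | 2     | 3 | 4 | 5     | 6  | 7
keys   | 8, 24 | 1, 17 | 2, 10 | 3 | 4 | 5, 13 | 14 | 7

n = 0
load factor: 12/16 = 0.75
split bucket 0, using h2.

bucket | 0 | 1     | 2     | 3 | 4 | 5     | 6  | 7 | 8
keys   | - | 1, 17 | 2, 10 | 3 | 4 | 5, 13 | 14 | 7 | 8, 24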

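n = 1

bucket | 0 | 1     | 2     | 3 | 4 | 5     | 6  | 7 | 8
keys   | - | 1, 17 | 2, 10 | 3 | 4 | 5, 13 | 14 | 7 | 8, 24

insert(15)

bucket | 0 | 1     | 2     | 3 | 4 | 5     | 6  | 7     | 8
keys   | - | 1, 17 | 2, 10 | 3 | 4 | 5, 13 | 14 | 7, 15 | 8, 24

load factor: 13/18 = 0.722
split bucket 1, using h2. Keys 1 and 17 both stay in bucket 1 (1 mod 16 = 17 mod 16 = 1), so the new bucket 9 is empty:

bucket | 0 | 1     | 2     | 3 | 4 | 5     | 6  | 7     | 8     | 9
keys   | - | 1, 17 | 2, 10 | 3 | 4 | 5, 13 | 14 | 7, 15 | 8, 24 | -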
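The trace can be checked mechanically. The following is a rough sketch of linear hashing under the parameters of this example (M = 4, bucket capacity 2, split when the load factor exceeds 0.7). The class and attribute names are hypothetical, overflow chains are simply modelled as part of each bucket's list, and at most one split is performed per insertion, which is an assumption that happens to match the trace.

```python
class LinearHashFile:
    """Linear hashing sketch: h_i(key) = key mod (2^i * M), split pointer n."""

    def __init__(self, m=4, capacity=2, max_load=0.7):
        self.m = m
        self.capacity = capacity
        self.max_load = max_load
        self.level = 0            # i: insert with h_i, split with h_(i+1)
        self.n = 0                # next bucket to be split
        self.count = 0            # number of stored records
        self.buckets = [[] for _ in range(m)]

    def h(self, i, key):
        return key % (2 ** i * self.m)

    def address(self, key):
        a = self.h(self.level, key)
        if a < self.n:            # bucket a was already split in this phase
            a = self.h(self.level + 1, key)
        return a

    def load_factor(self):
        return self.count / (len(self.buckets) * self.capacity)

    def insert(self, key):
        self.buckets[self.address(key)].append(key)
        self.count += 1
        if self.load_factor() > self.max_load:
            self.split()

    def split(self):
        old = self.buckets[self.n]
        stay = [k for k in old if self.h(self.level + 1, k) == self.n]
        move = [k for k in old if self.h(self.level + 1, k) != self.n]
        self.buckets[self.n] = stay
        self.buckets.append(move)             # new bucket has index n + 2^i * M
        self.n += 1
        if self.n == 2 ** self.level * self.m:  # every bucket of this phase was split
            self.level += 1
            self.n = 0


f = LinearHashFile()
for k in [3, 2, 4, 1, 8, 14, 5, 10, 7, 24, 17, 13, 15]:
    f.insert(k)
```

After the thirteen insertions, `f.buckets` holds ten buckets and `f.n` is 2, in line with the final state of the trace.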

Multi-level index

tree
- root, internal, leaf, subtree
- parent, child, sibling
- balanced, unbalanced

B+-tree
- splits on overflow; merge on underflow
- in practice it is usually 3 or 4 levels deep
- search, insert, delete algorithms

B+-tree Structure

non-leaf node (internal node or a root)

• < P1, K1, P2, K2, …, Pq-1, Kq-1, Pq > (q ≤ pinternal)
• K1 < K2 < ... < Kq-1 (i.e. it's an ordered set)
• For any key value, X, in the subtree pointed to by Pi
  • Ki-1 < X ≤ Ki for 1 < i < q
  • X ≤ K1 for i = 1
  • Kq-1 < X for i = q
• Each internal node has at most pinternal pointers.
• Each node except root must have at least ⌈pinternal/2⌉ pointers.
• The root, if it has some children, must have at least 2 pointers.
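The ordering condition on X determines which pointer a search follows inside a non-leaf node. As a small sketch (the function name `child_index` is hypothetical), Python's `bisect_left` returns exactly the position of the first key that is greater than or equal to X, which is the 0-based index of the pointer to follow:

```python
from bisect import bisect_left


def child_index(keys, x):
    """Return the 0-based index of the pointer to follow for search key x.

    keys is the sorted key list K1..Kq-1 of a non-leaf node; pointer i (0-based)
    covers the values with keys[i-1] < x <= keys[i].
    """
    return bisect_left(keys, x)


node_keys = [7, 8]                       # a node with three pointers
followed = [child_index(node_keys, x) for x in (6, 7, 8, 9)]
```

Here keys 6 and 7 follow the first pointer, 8 follows the second, and 9 follows the third.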

B+-tree Structure

leaf node (terminal node)

• < (K1, Pr1), (K2, Pr2), …, (Kq-1, Prq-1), Pnext >
• K1 < K2 < ... < Kq-1
• Pri points to a record with key value Ki, or Pri points to a page containing a record with key value Ki.
• Maximum of pleaf key/pointer pairs.
• Each leaf has at least ⌈pleaf/2⌉ keys.
• All leaves are at the same level (balanced).
• Pnext points to the next leaf node for key sequencing.

A B+-tree (pinternal = 3, pleaf = 2)

root:            [5]
internal nodes:  [3]          [7, 8]
leaf nodes:      [1, 3] [5]   [6, 7] [8] [9, 12]

The leaf entries point to the records in a file.
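As an illustration of searching, the sketch below builds a tree with root key 5, internal nodes [3] and [7, 8], and leaves [1, 3], [5], [6, 7], [8], [9, 12], which is how the figure reads with pinternal = 3 and pleaf = 2. The class names `Leaf` and `Internal` and both functions are hypothetical, and record pointers are omitted so that only keys are stored.

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # Pnext: link to the next leaf


class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def find_leaf(node, x):
    """Descend from the root to the leaf that may contain key x."""
    while isinstance(node, Internal):
        node = node.children[bisect_left(node.keys, x)]
    return node


def range_scan(root, low, high):
    """Collect all keys in [low, high] by walking the leaf chain."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


leaves = [Leaf([1, 3]), Leaf([5]), Leaf([6, 7]), Leaf([8]), Leaf([9, 12])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([5], [Internal([3], leaves[0:2]), Internal([7, 8], leaves[2:5])])

found_8 = 8 in find_leaf(root, 8).keys
found_4 = 4 in find_leaf(root, 4).keys
between_5_and_9 = range_scan(root, 5, 9)
```

A point lookup touches one node per level, while the range scan descends once and then follows the Pnext links.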

B+-tree insertion: leaf node splitting, internal node splitting

Leaf splitting

When a leaf splits, a new leaf is allocated
• the original leaf is the left sibling, the new one is the right sibling
• key and pointer pairs are redistributed: the left sibling will have smaller keys than the right sibling
• a 'copy' of the key value which is the largest of the keys in the left sibling is promoted to the parent

Example: insert 31

before:  parent [33];      leaves [12, 22, 33] [44, 48, 55]
after:   parent [22, 33];  leaves [12, 22] [31, 33] [44, 48, 55]
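The leaf-splitting rule can be sketched as follows. The function name and the choice to give the left sibling the larger half when the number of keys is odd are assumptions; record pointers are left out and pleaf = 3 is assumed to match the "insert 31" example.

```python
from bisect import insort


def insert_into_leaf(keys, new_key, p_leaf):
    """Insert new_key into a sorted leaf; split if more than p_leaf keys result.

    Returns (left_keys, right_keys, promoted_key); right_keys and promoted_key
    are None when no split is needed.
    """
    keys = list(keys)
    insort(keys, new_key)
    if len(keys) <= p_leaf:
        return keys, None, None
    mid = (len(keys) + 1) // 2            # left sibling gets the smaller keys
    left, right = keys[:mid], keys[mid:]
    return left, right, left[-1]          # a copy of the largest left key goes up


left, right, promoted = insert_into_leaf([12, 22, 33], 31, p_leaf=3)
```

With these inputs the leaf splits into [12, 22] and [31, 33], and 22 is copied into the parent.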

Internal node splitting

If an internal node splits and it is not the root,
• insert the key and pointer and then determine the middle key
• a new 'right' sibling is allocated
• everything to its left stays in the left sibling
• everything to its right goes into the right sibling
• the middle key value along with the pointer to the new right sibling is promoted to the parent (the middle key value 'moves' to the parent to become the discriminator between this left and right sibling)

Example: insert 26

before:  parent [55];      internal node [22, 33]
after:   parent [26, 55];  internal nodes [22] [33]
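A minimal sketch of the internal-node split, assuming the node is given as a key list that already contains the newly inserted key plus a pointer list with one more entry than keys. The function name and the pointer labels `p1` to `p4` are hypothetical placeholders.

```python
def split_internal(keys, pointers):
    """Split an overfull internal node around its middle key.

    Returns (left_node, middle_key, right_node), where each node is a
    (keys, pointers) pair. The middle key moves up; it is not kept in either half.
    """
    assert len(pointers) == len(keys) + 1
    mid = len(keys) // 2
    left = (keys[:mid], pointers[:mid + 1])
    right = (keys[mid + 1:], pointers[mid + 1:])
    return left, keys[mid], right


left_node, middle, right_node = split_internal([22, 26, 33], ["p1", "p2", "p3", "p4"])
```

Unlike a leaf split, the middle key 26 is moved rather than copied, so it appears only in the parent afterwards.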

Internal node splitting

When a new root is formed, a key value and two pointers must be placed into it.

Example: insert 40

before:  root [26, 55]
after:   new root [40];  internal nodes [26] [55]

Deleting nodes from a B+-tree:

1. When deleting a key from a node A, check whether the number of the remaining keys (or pointers) is at least ⌈p/2⌉.

2. If it is not the case, redistribute the keys with the left sibling B or with the right sibling C if it is possible. Otherwise, merge A and B or merge A and C.

3. When redistributing or merging, change the key values in the parent node so that the following condition is satisfied:

• < P1, K1, P2, K2, …, Pq-1, Kq-1, Pq >
• K1 < K2 < ... < Kq-1 (i.e. it is an ordered set)
• for the key values, X, in the subtree pointed to by Pi
  • Ki-1 < X <= Ki for 1 < i < q
  • X <= K1 for i = 1
  • Kq-1 < X for i = q

A B+-tree (pinternal = 3, pleaf = 2)

root:            [5]
internal nodes:  [3]          [7, 8]
leaf nodes:      [1, 3] [5]   [6, 7] [8] [9, 12]

The leaf entries point to the records.

Entry deletion

- deletion sequence: 8, 12, 9, 7

root:            [5]
internal nodes:  [3]          [7, 9]
leaf nodes:      [1, 3] [5]   [6, 7] [9] [12]

Deleting 8 causes the nodes to be redistributed.

Entry deletion

root:            [5]
internal nodes:  [3]          [7]
leaf nodes:      [1, 3] [5]   [6, 7] [9]

12 is removed.

Entry deletion

root:            [5]
internal nodes:  [3]          [6]
leaf nodes:      [1, 3] [5]   [6] [7]

9 is removed.

Entry deletion

root:            [5]
internal nodes:  [3]          [6]
leaf nodes:      [1, 3] [5]   [6]

Deleting 7 leaves the right pointer of the node [6] with no use. Therefore, a merge at the level above the leaf level occurs.

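Entry deletion

[Figure: the two internal nodes are merged into a node A = [3, 5] over the leaves [1, 3], [5] and [6]; B and C mark the parts to the left and to the right of key 5; the old root [5] is left above A.]

For this merge, 5 will be taken as a key value in A since any key value in B is less than or equal to 5 but any key value in C is larger than 5.

The pointer in the old root becomes useless. The corresponding node should also be removed.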

Entry deletion

root:        [3, 5]
leaf nodes:  [1, 3] [5] [6]

Data modeling using the relational model; relational algebra

Relational Data Model
- relation schema, relations
- database schema (relational schema), database state
- integrity constraints and updating

Relational algebra
- select, project, join, cartesian product
- division
- set operations: union, intersection, difference

Integrity Constraints

• any database will have some number of constraints that must be applied to ensure correct data (valid states)

1. domain constraints
   • a domain is a restriction on the set of valid values
   • domain constraints specify that the value of each attribute A must be an atomic value from the domain dom(A).

2. key constraints
   • a superkey is any combination of attributes that uniquely identifies a tuple: t1[superkey] ≠ t2[superkey] for any two distinct tuples t1 and t2.
     - Example: <Name, SSN> (in Employee)
   • a key is a superkey that has a minimal set of attributes
     - Example: <SSN> (in Employee)

Integrity Constraints

• If a relation schema has more than one key, each of them is called a candidate key.
• one candidate key is chosen as the primary key (PK)
• foreign key (FK) is defined as follows:
   i) Consider two relation schemas R1 and R2;
   ii) The attributes in FK in R1 have the same domain(s) as the primary key attributes PK in R2; the attributes FK are said to reference or refer to the relation R2;
   iii) A value of FK in a tuple t1 of the current state r(R1) either occurs as a value of PK for some tuple t2 in the current state r(R2) or is null. In the former case, we have t1[FK] = t2[PK], and we say that the tuple t1 references or refers to the tuple t2.

Example: Employee(SSN, …, Dno), Dept(Dno, …), where Dno in Employee is the FK.

Integrity Constraints

3. entity integrity
   • no part of a PK can be null

4. referential integrity
   • domain of FK must be same as domain of PK
   • FK must be null or have a value that appears as a PK value

5. semantic integrity
   • other rules that the application domain requires:
     • state constraint: gross salary > net income
     • transition constraint: Widowed can only follow Married; salary of an employee cannot decrease

Updating and constraints

insert

• Insert the following tuple into EMPLOYEE:
  <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F, 40000, null, 4>

• When inserting, the integrity constraints should be checked: domain, key, entity, referential, semantic integrity

update

• Update the SALARY of the EMPLOYEE tuple with ssn = ‘999887777’ to 30000.

• When updating, the integrity constraints should be checked: domain, key, entity, referential, semantic integrity

Updating and constraints

delete

• Delete the WORKS_ON tuple with Essn = ‘999887777’ and pno = 10.

• When deleting, the referential constraint will be checked.
  - The following deletion is not acceptable as it stands, because other tuples reference the deleted one:
    Delete the EMPLOYEE tuple with ssn = ‘999887777’
  - possible strategies: reject, cascade, modify

cascade – a strategy to enforce referential integrity

[Figure: deleting a tuple of Employee (ssn) also deletes the tuples of Works-on (Essn, Pno) that reference it.]

cascade – a strategy to enforce referential integrity

[Figure: Employee (ssn, supervisor). Deleting an employee cascades along the supervisor references, so the tuples of the supervised employees are deleted as well. This is not reasonable.]

Modify – a strategy to enforce referential integrity

[Figure: deleting a tuple of Employee (ssn) sets Essn to null in the tuples of Works-on (Essn, Pno) that reference it.]

This violates the entity constraint, because Essn is part of the primary key of Works-on.
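The three strategies (reject, cascade, modify) map onto the referential actions of SQL. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the reduced table layouts, the `dependent` table contents and the ssn value '888665555' are assumptions made up for illustration. SET NULL is applied only to the supervisor column, where a null is allowed, and not to a primary-key column such as Essn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE employee (
    ssn      TEXT PRIMARY KEY,
    superssn TEXT REFERENCES employee(ssn) ON DELETE SET NULL      -- modify
);
CREATE TABLE works_on (
    essn TEXT NOT NULL REFERENCES employee(ssn) ON DELETE CASCADE, -- cascade
    pno  INTEGER NOT NULL,
    PRIMARY KEY (essn, pno)
);
CREATE TABLE dependent (
    essn TEXT NOT NULL REFERENCES employee(ssn) ON DELETE RESTRICT, -- reject
    name TEXT NOT NULL
);
INSERT INTO employee VALUES ('888665555', NULL);
INSERT INTO employee VALUES ('999887777', '888665555');
INSERT INTO works_on VALUES ('888665555', 20);
INSERT INTO works_on VALUES ('999887777', 10);
INSERT INTO dependent VALUES ('999887777', 'Alice');
""")

# Reject: the employee is still referenced by a dependent tuple.
try:
    conn.execute("DELETE FROM employee WHERE ssn = '999887777'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Cascade and modify: deleting the supervisor removes the supervisor's own
# works_on tuples and nulls the supervisor reference of the remaining employee.
conn.execute("DELETE FROM employee WHERE ssn = '888665555'")
remaining_works_on = conn.execute("SELECT essn, pno FROM works_on").fetchall()
supervisor = conn.execute(
    "SELECT superssn FROM employee WHERE ssn = '999887777'"
).fetchone()[0]
```

Which action fits depends on the meaning of the reference: cascading suits dependent detail rows, while a nullable reference such as the supervisor is better set to null.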

Relational Algebra

a set of relations

a set of operations
- set operations: union, intersection, difference, cartesian product
- relation specific: select, project, join, division

Relational algebra

Retrieve for each female employee a list of the names of her dependents:

FEMALE_EMPS ← σ_{SEX = ‘F’}(EMPLOYEE)

EMPNAMES ← π_{FNAME, LNAME, SSN}(FEMALE_EMPS)

ACTUAL_DEPENDENTS ← EMPNAMES ⋈_{SSN = ESSN} DEPENDENT

RESULT ← π_{FNAME, LNAME, DEPENDENT_NAME}(ACTUAL_DEPENDENTS)

Query: Retrieve the name of employees who work on all the projects that ‘John Smith’ works on.

SMITH ← σ_{FNAME = ‘John’ and LNAME = ‘Smith’}(EMPLOYEE)

SMITH_PNOs ← π_{PNO}(WORKS_ON ⋈_{ESSN = SSN} SMITH)

SSN_PNO ← π_{ESSN, PNO}(WORKS_ON)

SSNS(SSN) ← SSN_PNO ÷ SMITH_PNOs

RESULT ← π_{FNAME, LNAME}(SSNS * EMPLOYEE)

Division

The DIVISION operator can be expressed as a sequence of π, ×, and − operations as follows:

Z = {A1, …, An, B1, …, Bm}, X = {B1, …, Bm}, Y = Z - X = {A1, …, An},

R(Z) ÷ S(X):

T1 ← π_Y(R)

T2 ← π_Y((S × T1) − R)

T ← T1 − T2   (result)
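The three steps T1, T2 and T translate directly into set operations. In this sketch a relation R(Y, X) is modelled as a Python set of (y, x) pairs and S(X) as a set of x values; the function name `divide` and the sample data are hypothetical.

```python
def divide(r, s):
    """Relational division R(Y, X) / S(X) for single-attribute Y and X.

    r is a set of (y, x) pairs, s is a set of x values.
    """
    t1 = {y for (y, _) in r}                              # T1 = project Y from R
    t2 = {y for y in t1 for x in s if (y, x) not in r}    # T2 = project Y from ((S x T1) - R)
    return t1 - t2                                        # T  = T1 - T2


# (essn, pno) pairs: employees 1 and 3 work on both projects 'a' and 'b'.
works_on = {(1, "a"), (1, "b"), (2, "a"), (3, "a"), (3, "b"), (3, "c")}
smith_pnos = {"a", "b"}
ssns = divide(works_on, smith_pnos)
```

T2 collects every y that misses at least one required x, so subtracting it from T1 leaves only the values paired with all of S.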

SQL

DDL
- creating schemas
- modifying schemas

DML
- select-from-where clause
- group by, having, order by
- update
- view

DDL - Examples:

• Create schema:

  CREATE SCHEMA COMPANY AUTHORIZATION JSMITH;

• Create table:

  CREATE TABLE EMPLOYEE
    (FNAME     VARCHAR(15)    NOT NULL,
     MINIT     CHAR,
     LNAME     VARCHAR(15)    NOT NULL,
     SSN       CHAR(9)        NOT NULL,
     BDATE     DATE,
     ADDRESS   VARCHAR(30),
     SEX       CHAR,
     SALARY    DECIMAL(10, 2),
     SUPERSSN  CHAR(9),
     DNO       INT            NOT NULL,
     PRIMARY KEY (SSN),
     FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE(SSN),
     FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER));

DDL - Examples:

• drop schema

  DROP SCHEMA COMPANY CASCADE;
  DROP SCHEMA COMPANY RESTRICT;

• drop table

  DROP TABLE DEPENDENT CASCADE;
  DROP TABLE DEPENDENT RESTRICT;

• alter table

  ALTER TABLE COMPANY.EMPLOYEE ADD JOB VARCHAR(12);
  ALTER TABLE COMPANY.EMPLOYEE DROP ADDRESS CASCADE;

DML - select-from-where clause

Retrieve a list of employees and the projects they are working on, ordered by department and, within each department, ordered alphabetically by last name, first name:

  SELECT DNAME, LNAME, FNAME, PNAME
  FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
  WHERE DNUMBER = DNO AND SSN = ESSN AND PNO = PNUMBER
  ORDER BY DNAME, LNAME, FNAME
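To see the query run, it can be tried against an in-memory SQLite database through Python's `sqlite3` module. The tables below are cut down to the columns the query uses, and all rows are invented sample data, not taken from the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER PRIMARY KEY);
CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY, DNO INTEGER);
CREATE TABLE PROJECT (PNAME TEXT, PNUMBER INTEGER PRIMARY KEY);
CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER, PRIMARY KEY (ESSN, PNO));

-- Hypothetical sample rows.
INSERT INTO DEPARTMENT VALUES ('Research', 5);
INSERT INTO DEPARTMENT VALUES ('Administration', 4);
INSERT INTO EMPLOYEE VALUES ('John', 'Smith', '1', 5);
INSERT INTO EMPLOYEE VALUES ('Ann', 'Jones', '2', 5);
INSERT INTO EMPLOYEE VALUES ('Bob', 'Adams', '3', 4);
INSERT INTO PROJECT VALUES ('ProductX', 1);
INSERT INTO PROJECT VALUES ('Reorganization', 2);
INSERT INTO WORKS_ON VALUES ('1', 1);
INSERT INTO WORKS_ON VALUES ('2', 1);
INSERT INTO WORKS_ON VALUES ('3', 2);
""")

# The query from the slide, unchanged apart from layout.
rows = conn.execute("""
    SELECT DNAME, LNAME, FNAME, PNAME
    FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
    WHERE DNUMBER = DNO AND SSN = ESSN AND PNO = PNUMBER
    ORDER BY DNAME, LNAME, FNAME
""").fetchall()
```

The WHERE clause carries the three join conditions, and ORDER BY sorts first by department name, then by last and first name within each department.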
