Database: Review
Sept. 2009 Yangjun Chen ACS-3902 1
Database
Introduction, system architecture, basic concepts
ER-model, data modeling
B+-tree, hashing
Relational algebra, relational data model
SQL: DDL, DML
not included
Introduction to the database systems
What is a database?
The main characteristics of a database
The basic database design method
The entity-relationship data model for application modeling
The main characteristics of the database approach:
• single repository of data
• sharable by multiple users
• concurrency control and transaction concept
• security and integrity constraints
• self-describing: the system catalogue contains metadata
• program-data independence: some changes to the database are transparent to programs/users
• multiple views of data, to support the individual needs of programs/users
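The self-describing property can be seen in SQLite, whose system catalogue is an ordinary table named sqlite_master. A minimal sketch with Python's built-in sqlite3 module, assuming an in-memory database and a hypothetical student table, reads the metadata with the same SQL that is used for data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, stno INTEGER PRIMARY KEY)")

# The catalogue (sqlite_master) is itself a table, queried like user data.
catalog = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(catalog)
```

The stored sql column holds the original CREATE TABLE text, so a program can discover the record structure without any hard-coded description.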
Data modeling using the ER-model
Entity-relationship model
- Entity types
  - strong entities
  - weak entities
- Relationships among entities
- Attributes
  - attribute classification
- Constraints
  - cardinality constraints
  - participation constraints
ER-to-relation mapping
[Figure: ER diagram of the COMPANY database. Entity types: employee (ssn, name (fname, minit, lname), bdate, sex, address, salary), department (name, number, location, number of employees), project (name, number, location), dependent (name, sex, birthdate, relationship). Relationship types: works for, manages (startdate), works on (hours), dependents of, controls, supervision (supervisor/supervisee), annotated with 1, N and M cardinality labels.]
Concepts and Architecture
- Database schema, schema evolution, database state
- Working process with a database system
- Database system architecture
- Data independence concept
Database schema, relation schema, schema evolution and database state, illustrated on an example database:

Student
Name    StNo  Class  Major
Smith   17    1      CS
Brown   8     2      CS

Course
CName     CNo   CrHrs  Dept
Database  8803  3      CS
C         2606  3      CS

Section
SId  CNo   Semester  Yr    Instructor
32   8803  Spring    2000  Smith
25   8803  Winter    2000  Smith
43   2606  Spring    2000  Jones

Grades
StNo  Sid  Grade
17    25   A
17    43   B
Working process with a database system:
• Definition: record structure, data elements (names, data types), constraints, etc.
• Construction: create database files, populate the database with records
• Manipulation: querying, updating
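The three steps can be sketched with Python's built-in sqlite3 module on the Student relation from the example database. This is a minimal sketch assuming an in-memory SQLite database; the column types and the update of Smith's class are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Definition: record structure, data element names, data types, constraints
conn.execute("""
    CREATE TABLE student (
        name  TEXT    NOT NULL,
        stno  INTEGER PRIMARY KEY,
        class INTEGER,
        major TEXT
    )""")

# Construction: populate the database with records
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)",
                 [("Smith", 17, 1, "CS"), ("Brown", 8, 2, "CS")])

# Manipulation: updating, then querying
conn.execute("UPDATE student SET class = 2 WHERE stno = 17")
rows = conn.execute("SELECT name, class FROM student ORDER BY stno").fetchall()
print(rows)  # [('Brown', 2), ('Smith', 2)]
```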
Database Management System (DBMS)
• collection of software facilitating the definition, construction and manipulation of databases
[Figure: users/actors interact with the DBMS, whose components include a request manager, query evaluation and a storage manager, working on the metadata and the stored database.]
Three-schema architecture
• External level: several external views, each describing a specific user's or group's view of the database
• Conceptual level: the conceptual schema describes the whole database for all users
• Internal level: the internal schema describes the physical storage structures and details
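In SQL, an external view is commonly defined with CREATE VIEW over the relations of the conceptual schema, while the DBMS itself takes care of the internal level. A minimal sketch in Python's sqlite3, with a hypothetical employee table and a hypothetical emp_directory view that hides salaries from one user group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the whole relation, for all users (hypothetical columns)
conn.execute("""CREATE TABLE employee (
    ssn TEXT PRIMARY KEY, fname TEXT, lname TEXT, salary REAL, dno INTEGER)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", [
    ("123456789", "John", "Smith", 30000, 5),   # hypothetical rows
    ("999887777", "Ann", "Lee", 25000, 4),
])

# External level: a view for a group of users that should not see salaries
conn.execute("CREATE VIEW emp_directory AS SELECT fname, lname, dno FROM employee")

cur = conn.execute("SELECT * FROM emp_directory ORDER BY lname")
columns = [d[0] for d in cur.description]
directory = cur.fetchall()
print(columns)    # ['fname', 'lname', 'dno']
print(directory)  # [('Ann', 'Lee', 4), ('John', 'Smith', 5)]
```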
Hashing technique
• external hashing
• static hashing & dynamic hashing
• hash function: a mathematical function that maps a key to a bucket address
• collisions
• collision resolution scheme
  - open addressing
  - chaining
  - multiple hashing
• linear hashing
External hashing: the data are on the disk.
Static hashing:
- a hash function maps keys to bucket addresses
- the primary area cannot be changed
- collision resolution scheme: open addressing, chaining, multiple hashing
Dynamic hashing:
- the primary area can be changed
- example: linear hashing
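Static hashing with chaining can be sketched as follows. The class name StaticHashFile, the choice h(key) = key mod M, and keeping each bucket's overflow chain as a separate list are assumptions made for illustration; the primary area stays at M buckets no matter how many keys arrive:

```python
class StaticHashFile:
    """Static hashing: the primary area has a fixed number M of buckets."""

    def __init__(self, m: int, capacity: int):
        self.m = m                                # size of the primary area (never changes)
        self.capacity = capacity                  # records per primary bucket
        self.primary = [[] for _ in range(m)]
        self.overflow = [[] for _ in range(m)]    # chained overflow records per bucket

    def h(self, key: int) -> int:
        return key % self.m                       # maps a key to a bucket address

    def insert(self, key: int) -> None:
        b = self.h(key)
        if len(self.primary[b]) < self.capacity:
            self.primary[b].append(key)
        else:                                     # collision on a full bucket: chain it
            self.overflow[b].append(key)

    def search(self, key: int) -> bool:
        b = self.h(key)
        return key in self.primary[b] or key in self.overflow[b]


f = StaticHashFile(m=4, capacity=2)
for k in [3, 2, 4, 1, 8, 14, 10]:
    f.insert(k)
print(f.primary)   # [[4, 8], [1], [2, 14], [3]]
print(f.overflow)  # [[], [], [10], []]
```

Because M never grows, long overflow chains build up as the file fills; this is the motivation for dynamic schemes such as linear hashing.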
Linear hashing:
1. What is a phase?
2. When to split a bucket?
3. How to split a bucket?
4. What bucket will be chosen to split next?
5. How do we find a record inserted into a linear hashing file?
Linear hashing:
• initially the hash file contains M buckets
• $h_i(\text{key}) = \text{key} \bmod 2^i M$ $(i = 0, 1, 2, \ldots)$
• the insertion process can be divided into several phases
phase 1:
• insertion using $h_0(\text{key}) = \text{key} \bmod M$
• splitting using $h_1(\text{key}) = \text{key} \bmod 2M$
• splitting rule: overflow of a bucket, or load factor > a constant (e.g., 0.70)
• an overflow record is put in the overflow area or redistributed by splitting a bucket
• buckets are split in order from n = 0 to n = M - 1 (after each split, n is increased by 1)
• phase 1 finishes when n = M; the primary area is then 2M buckets long
phase 2:
• insertion using $h_1(\text{key}) = \text{key} \bmod 2M$
• splitting using $h_2(\text{key}) = \text{key} \bmod 4M$
• splitting rule: overflow of a bucket, or load factor > a constant (e.g., 0.70)
• an overflow record is put in the overflow area or redistributed by splitting a bucket
• buckets are split in order from n = 0 to n = 2M - 1 (after each split, n is increased by 1)
• phase 2 finishes when n = 2M; the primary area then contains 4M buckets
phase 3: … $h_2$ = …, $h_3$ = …, …
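The answer to question 5 (how to find a record) follows from the split pointer n: buckets below n have already been split in the current phase. A minimal sketch of the address computation, using the hypothetical helper names h and bucket_address, where level = i means that insertion currently uses h_i (so the slides' "phase 1" corresponds to level 0):

```python
def h(i, key, m):
    """h_i(key) = key mod (2^i * M)."""
    return key % (2 ** i * m)

def bucket_address(key, m, level, n):
    """Bucket of `key` while inserting with h_level; n is the next bucket to split.

    Buckets 0 .. n-1 were already split in the current phase, so a key that
    h_level sends below n is rehashed with h_(level+1).
    """
    b = h(level, key, m)
    return h(level + 1, key, m) if b < n else b

# State from the worked example that follows: M = 4, first phase, n = 3
print(bucket_address(24, 4, 0, 3))   # 0  (24 mod 4 = 0 < 3, so 24 mod 8 = 0)
print(bucket_address(14, 4, 0, 3))   # 6  (14 mod 4 = 2 < 3, so 14 mod 8 = 6)
print(bucket_address(7, 4, 0, 3))    # 3  (7 mod 4 = 3, bucket 3 not yet split)
```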
Linear hashing including two phases:
- collision resolution strategy: chaining
- split rule: load factor > 0.7
- initially M = 4 (M: size of the primary area)
- hash functions: $h_i(\text{key}) = \text{key} \bmod 2^i M$ $(i = 0, 1, 2, \ldots)$
- bucket capacity = 2
Trace the insertion process of the following keys into a linear hashing file:
3, 2, 4, 1, 8, 14, 5, 10, 7, 24, 17, 13, 15.
The first phase – phase 0
• When inserting the sixth record (14), we would have
  0: 4, 8 | 1: 1 | 2: 2, 14 | 3: 3
  (n = 0 before the split; n points to the bucket to be split next)
• but the load factor 6/8 = 0.75 > 0.70, so bucket 0 must be split (using $h_1(\text{key}) = \text{key} \bmod 2M$):
  0: 8 | 1: 1 | 2: 2, 14 | 3: 3 | 4: 4
  n = 1 after the split; load factor: 6/10 = 0.6, no split
insert(5): n = 1, load factor: 7/10 = 0.7, no split
  0: 8 | 1: 1, 5 | 2: 2, 14 | 3: 3 | 4: 4
insert(10): bucket 2 is full, so 10 goes into its overflow chain
  0: 8 | 1: 1, 5 | 2: 2, 14 (overflow: 10) | 3: 3 | 4: 4
n = 1, load factor: 8/10 = 0.8 > 0.7, so bucket 1 is split using $h_1$.
After splitting bucket 1:
  0: 8 | 1: 1 | 2: 2, 14 (overflow: 10) | 3: 3 | 4: 4 | 5: 5
n = 2, load factor: 8/12 ≈ 0.67, no split
insert(7):
  0: 8 | 1: 1 | 2: 2, 14 (overflow: 10) | 3: 3, 7 | 4: 4 | 5: 5
n = 2, load factor: 9/12 = 0.75 > 0.7, so bucket 2 is split using $h_1$.
After splitting bucket 2:
  0: 8 | 1: 1 | 2: 2, 10 | 3: 3, 7 | 4: 4 | 5: 5 | 6: 14
n = 3, load factor: 9/14 ≈ 0.64, no split.
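insert(24): 24 mod 4 = 0 < n = 3, so $h_1$ is used and 24 goes to bucket 0
  0: 8, 24 | 1: 1 | 2: 2, 10 | 3: 3, 7 | 4: 4 | 5: 5 | 6: 14
n = 3, load factor: 10/14 ≈ 0.71 > 0.7, so bucket 3 is split using $h_1$:
  0: 8, 24 | 1: 1 | 2: 2, 10 | 3: 3 | 4: 4 | 5: 5 | 6: 14 | 7: 7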
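n = 4 = M, so the first phase ends with 8 buckets:
  0: 8, 24 | 1: 1 | 2: 2, 10 | 3: 3 | 4: 4 | 5: 5 | 6: 14 | 7: 7
The second phase – phase 1
n = 0; using $h_1(\text{key}) = \text{key} \bmod 2M$ to insert and $h_2(\text{key}) = \text{key} \bmod 4M$ to split.
insert(17)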
After insert(17):
  0: 8, 24 | 1: 1, 17 | 2: 2, 10 | 3: 3 | 4: 4 | 5: 5 | 6: 14 | 7: 7
n = 0, load factor: 11/16 ≈ 0.69, no split.
insert(13)
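After insert(13):
  0: 8, 24 | 1: 1, 17 | 2: 2, 10 | 3: 3 | 4: 4 | 5: 5, 13 | 6: 14 | 7: 7
n = 0, load factor: 12/16 = 0.75 > 0.7, so bucket 0 is split using $h_2$:
  0: (empty) | 1: 1, 17 | 2: 2, 10 | 3: 3 | 4: 4 | 5: 5, 13 | 6: 14 | 7: 7 | 8: 8, 24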
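insert(15):
  0: (empty) | 1: 1, 17 | 2: 2, 10 | 3: 3 | 4: 4 | 5: 5, 13 | 6: 14 | 7: 7, 15 | 8: 8, 24
n = 1, load factor: 13/18 ≈ 0.72 > 0.7, so bucket 1 is split using $h_2$.
Splitting bucket 1 moves no record (1 mod 16 = 1 and 17 mod 16 = 1), so the new bucket 9 stays empty and n becomes 2.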
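The whole trace can be replayed with a small simulation, assuming the rules of the exercise: chaining, a split whenever the load factor exceeds 0.7 after an insertion, M = 4, bucket capacity 2, and a load factor computed as records / (primary buckets × capacity). LinearHashFile is a hypothetical name, and a bucket's overflow chain is modeled simply as the entries beyond its capacity:

```python
class LinearHashFile:
    """Linear hashing with chaining. A bucket is a Python list: its first
    `capacity` entries are the primary bucket, any further entries stand for
    the chained overflow records."""

    def __init__(self, m=4, capacity=2, max_load=0.7):
        self.m, self.capacity, self.max_load = m, capacity, max_load
        self.level = 0    # phase i: insert with h_i, split with h_(i+1)
        self.n = 0        # next bucket to split
        self.count = 0    # records stored, overflow records included
        self.buckets = [[] for _ in range(m)]

    def h(self, i, key):
        return key % (2 ** i * self.m)

    def address(self, key):
        b = self.h(self.level, key)
        # buckets below n were already split in this phase
        return self.h(self.level + 1, key) if b < self.n else b

    def load_factor(self):
        # only the primary area counts as available space
        return self.count / (len(self.buckets) * self.capacity)

    def insert(self, key):
        self.buckets[self.address(key)].append(key)
        self.count += 1
        if self.load_factor() > self.max_load:
            self.split()

    def split(self):
        old = self.buckets[self.n]
        self.buckets[self.n] = []
        self.buckets.append([])                  # new bucket n + 2^level * M
        for key in old:
            self.buckets[self.h(self.level + 1, key)].append(key)
        self.n += 1
        if self.n == 2 ** self.level * self.m:   # phase finished
            self.level, self.n = self.level + 1, 0


f = LinearHashFile()
for k in [3, 2, 4, 1, 8, 14, 5, 10, 7, 24, 17, 13, 15]:
    f.insert(k)
print(f.buckets)     # [[], [1, 17], [2, 10], [3], [4], [5, 13], [14], [7, 15], [8, 24], []]
print(f.n, f.level)  # 2 1
```

The final state matches the last step of the trace: ten buckets, bucket 0 and bucket 9 empty, and n = 2 in the second phase.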
Multi-level index
• tree
  - root, internal, leaf, subtree
  - parent, child, sibling
  - balanced, unbalanced
• B+-tree
  - splits on overflow; merges on underflow
  - in practice it is usually 3 or 4 levels deep
  - search, insert, delete algorithms
B+-tree Structure
non-leaf node (internal node or a root)
• $\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$ with $q \le p_{\text{internal}}$
• $K_1 < K_2 < \ldots < K_{q-1}$ (i.e., it is an ordered set)
• for any key value $X$ in the subtree pointed to by $P_i$:
  • $K_{i-1} < X \le K_i$ for $1 < i < q$
  • $X \le K_1$ for $i = 1$
  • $K_{q-1} < X$ for $i = q$
• Each internal node has at most $p_{\text{internal}}$ pointers.
• Each node except the root must have at least $\lceil p_{\text{internal}}/2 \rceil$ pointers.
• The root, if it has some children, must have at least 2 pointers.
B+-tree Structure
leaf node (terminal node)
• $\langle (K_1, Pr_1), (K_2, Pr_2), \ldots, (K_{q-1}, Pr_{q-1}), P_{next} \rangle$
• $K_1 < K_2 < \ldots < K_{q-1}$
• $Pr_i$ points to a record with key value $K_i$, or to a page containing a record with key value $K_i$.
• Each leaf holds at most $p_{\text{leaf}}$ key/pointer pairs.
• Each leaf has at least $\lceil p_{\text{leaf}}/2 \rceil$ keys.
• All leaves are at the same level (balanced).
• $P_{next}$ points to the next leaf node for key sequencing.
A B+-tree ($p_{\text{internal}} = 3$, $p_{\text{leaf}} = 2$)
root: [5]
internal nodes: [3] [7 8]
leaves: [1 3] [5] [6 7] [8] [9 12], pointing to the records in a file
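A search in this tree applies the subtree conditions $K_{i-1} < X \le K_i$: at each internal node, follow the pointer just left of the first key that is ≥ X. The sketch below (hypothetical class and function names; record pointers replaced by placeholder strings) builds the example tree and also follows P_next for a range scan:

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional

@dataclass
class Leaf:
    keys: list                      # K1 < K2 < ... < K_{q-1}
    records: list                   # Pr_i: stand-in for a record (or page) pointer
    next: Optional["Leaf"] = None   # P_next: the next leaf in key order

@dataclass
class Internal:
    keys: list                      # K1 < K2 < ... < K_{q-1}
    children: list                  # P1 .. Pq, one more than the keys

def find_leaf(node, x):
    # P_i covers K_{i-1} < x <= K_i, i.e. take the first key >= x
    while isinstance(node, Internal):
        node = node.children[bisect_left(node.keys, x)]
    return node

def search(root, x):
    leaf = find_leaf(root, x)
    i = bisect_left(leaf.keys, x)
    return leaf.records[i] if i < len(leaf.keys) and leaf.keys[i] == x else None

def range_scan(root, lo, hi):
    # descend once, then follow P_next from leaf to leaf
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out

# The example tree (p_internal = 3, p_leaf = 2); records are placeholders.
leaves = [Leaf(ks, [f"rec{k}" for k in ks]) for ks in ([1, 3], [5], [6, 7], [8], [9, 12])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Internal([5], [Internal([3], leaves[:2]), Internal([7, 8], leaves[2:])])

print(search(root, 8), search(root, 4))   # rec8 None
print(range_scan(root, 4, 9))             # [5, 6, 7, 8, 9]
```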
B+-tree insertion: leaf node splitting, internal node splitting
Leaf splitting
When a leaf splits, a new leaf is allocated:
• the original leaf is the left sibling, the new one is the right sibling
• key and pointer pairs are redistributed: the left sibling will have smaller keys than the right sibling
• a 'copy' of the largest key value in the left sibling is promoted to the parent
Example (insert 31): before, the parent holds [33] over the leaves [12 22 33] and [44 48 55]; after the split, the parent holds [22 33] over the leaves [12 22] [31 33] [44 48 55].
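Leaf splitting can be sketched on key lists alone (record pointers omitted). The rule that the left sibling keeps ⌈n/2⌉ of the n keys is an assumption consistent with the insert-31 example; split_leaf is a hypothetical name:

```python
def split_leaf(keys, new_key):
    """Insert new_key into a full leaf and split it in two.

    Returns (left, right, promoted): the left sibling keeps the smaller keys,
    and a copy of its largest key is promoted to the parent.
    """
    keys = sorted(keys + [new_key])
    mid = (len(keys) + 1) // 2       # left sibling keeps ceil(n/2) keys
    left, right = keys[:mid], keys[mid:]
    return left, right, left[-1]     # the promoted key also stays in the leaf

print(split_leaf([12, 22, 33], 31))  # ([12, 22], [31, 33], 22)
```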
Internal node splitting
If an internal node splits and it is not the root:
• insert the key and pointer, then determine the middle key
• a new 'right' sibling is allocated
• everything to the left of the middle key stays in the left sibling
• everything to its right goes into the right sibling
• the middle key value, along with the pointer to the new right sibling, is promoted to the parent (the middle key value 'moves' to the parent to become the discriminator between the left and right sibling)
Example (insert 26): the internal node [22 33], whose parent holds [55], becomes [22 26 33]; it splits into [22] and [33], and the middle key 26 moves up, so the parent becomes [26 55].
Internal node splitting
When a new root is formed, a key value and two pointers must be placed into it.
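Example (insert 40): the root [26 55] becomes [26 40 55] and splits; the middle key 40 forms the new root, with the children [26] and [55].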
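Internal node splitting differs from leaf splitting in that the middle key moves up instead of being copied. A sketch under the assumption that a node is a sorted key list plus a child list one element longer; split_internal is a hypothetical name. When the node being split is the root, the caller forms a new root from the returned middle key and the two siblings:

```python
from bisect import bisect_left

def split_internal(keys, children, new_key, new_child):
    """Insert (new_key, new_child) into an internal node and split it.

    new_child is the pointer to the right of new_key. Returns
    (left_keys, left_children, middle_key, right_keys, right_children);
    middle_key moves up to the parent and is kept in neither sibling.
    """
    i = bisect_left(keys, new_key)
    keys = keys[:i] + [new_key] + keys[i:]
    children = children[:i + 1] + [new_child] + children[i + 1:]
    mid = len(keys) // 2
    return (keys[:mid], children[:mid + 1], keys[mid],
            keys[mid + 1:], children[mid + 1:])

# Insert 26 into [22 33] (children c0, c1, c2; "new" is hypothetical)
print(split_internal([22, 33], ["c0", "c1", "c2"], 26, "new"))
# ([22], ['c0', 'c1'], 26, [33], ['new', 'c2'])
```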
Deleting nodes from a B+-tree:
1. When deleting a key from a node A, check whether the number of remaining keys (or pointers) is still at least $\lceil p/2 \rceil$.
2. If it is not, redistribute keys with the left sibling B or the right sibling C, if possible. Otherwise, merge A and B, or merge A and C.
3. When redistributing or merging, change the key values in the parent node so that the following conditions are satisfied:
• $\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$
• $K_1 < K_2 < \ldots < K_{q-1}$ (i.e., it is an ordered set)
• for the key values $X$ in the subtree pointed to by $P_i$:
  • $K_{i-1} < X \le K_i$ for $1 < i < q$
  • $X \le K_1$ for $i = 1$
  • $K_{q-1} < X$ for $i = q$
A B+-tree ($p_{\text{internal}} = 3$, $p_{\text{leaf}} = 2$)
root: [5]
internal nodes: [3] [7 8]
leaves: [1 3] [5] [6 7] [8] [9 12], pointing to the records
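Entry deletion
- deletion sequence: 8, 12, 9, 7
After deleting 8: root [5]; internal nodes [3] [7 9]; leaves [1 3] [5] [6 7] [9] [12]
Deleting 8 empties its leaf, so keys are redistributed with the right sibling [9 12] and the parent key changes from 8 to 9.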
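After deleting 12: root [5]; internal nodes [3] [7]; leaves [1 3] [5] [6 7] [9]
12 is removed: its leaf becomes empty and the left sibling [9] has no key to spare, so the two leaves are merged and the parent keeps only the key 7.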
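After deleting 9: root [5]; internal nodes [3] [6]; leaves [1 3] [5] [6] [7]
9 is removed: its leaf becomes empty and takes 7 from the left sibling [6 7], so the parent key changes from 7 to 6.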
After deleting 7: root [5]; internal nodes [3] [6]; leaves [1 3] [5] [6]
Deleting 7 makes the pointer to its (now empty) leaf useless, leaving the internal node [6] with a single pointer. Therefore, a merge at the level above the leaf level occurs.
Merging: the two internal nodes are merged into a node A, whose subtrees are B (keys ≤ 5) and C (keys > 5). For this merge, 5 is taken as a key value in A, since any key value in B is less than or equal to 5 but any key value in C is larger than 5. A becomes [3 5] over the leaves [1 3] [5] [6].
The root pointer then becomes useless, so the corresponding node (the old root [5]) should also be removed.
Final tree: root [3 5]; leaves [1 3] [5] [6]
Data modeling using the relational model; relational algebra
Relational Data Model
- relation schema, relations
- database schema (relational schema), database state
- integrity constraints and updating
Relational algebra
- select, project, join, Cartesian product
- division
- set operations: union, intersection, difference
Integrity Constraints
• any database will have some number of constraints that must be applied to ensure correct data (valid states)
1. domain constraints
  • a domain is a restriction on the set of valid values
  • domain constraints specify that the value of each attribute A must be an atomic value from the domain dom(A)
2. key constraints
  • a superkey is any combination of attributes that uniquely identifies a tuple: for any two distinct tuples $t_1$ and $t_2$, $t_1[\text{superkey}] \ne t_2[\text{superkey}]$
    - Example: <Name, SSN> (in Employee)
  • a key is a superkey that has a minimal set of attributes
    - Example: <SSN> (in Employee)
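The superkey and key definitions can be checked against one relation state. This is a sketch under stated assumptions: tuples are Python dicts, the Employee state is hypothetical, and a check on a single state can only refute a superkey, since the constraint must hold for every valid state:

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """No two distinct tuples of this state agree on all attributes in attrs."""
    seen = set()
    for t in rows:
        value = tuple(t[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key(rows, attrs):
    """A superkey none of whose proper (non-empty) subsets is a superkey."""
    return is_superkey(rows, attrs) and not any(
        is_superkey(rows, sub)
        for size in range(1, len(attrs))
        for sub in combinations(attrs, size))

# Hypothetical Employee state
employee = [
    {"Name": "Smith", "SSN": "123456789"},
    {"Name": "Smith", "SSN": "987654321"},
    {"Name": "Wong",  "SSN": "333445555"},
]
print(is_superkey(employee, ("Name", "SSN")), is_key(employee, ("Name", "SSN")))  # True False
print(is_superkey(employee, ("Name",)), is_key(employee, ("SSN",)))              # False True
```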
Integrity Constraints
• If a relation schema has more than one key, each of them is called a candidate key.
• One candidate key is chosen as the primary key (PK).
• A foreign key (FK) is defined as follows:
  i) Consider two relation schemas R1 and R2;
  ii) The attributes in FK in R1 have the same domain(s) as the primary key attributes PK in R2; the attributes FK are said to reference or refer to the relation R2;
  iii) A value of FK in a tuple t1 of the current state r(R1) either occurs as a value of PK for some tuple t2 in the current state r(R2) or is null. In the former case, we have t1[FK] = t2[PK], and we say that the tuple t1 references or refers to the tuple t2.
Example: Employee(SSN, …, Dno) and Dept(Dno, … ), where Dno in Employee is an FK referencing Dept.
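Rule iii) can be checked directly on two relation states. A minimal sketch with hypothetical Employee and Dept states stored as lists of dicts; fk_violations is a hypothetical name, and a composite FK value is treated as null here if any of its attributes is None, which is an assumption:

```python
def fk_violations(r1, fk, r2, pk):
    """Tuples of r1 whose FK value is not null and matches no PK value of r2."""
    pk_values = {tuple(t[a] for a in pk) for t in r2}
    violations = []
    for t in r1:
        value = tuple(t[a] for a in fk)
        if any(v is None for v in value):   # null FK: allowed by rule iii)
            continue
        if value not in pk_values:
            violations.append(t)
    return violations

# Hypothetical states of Employee(SSN, ..., Dno) and Dept(Dno, ...)
dept = [{"Dno": 4}, {"Dno": 5}]
employee = [
    {"SSN": "111", "Dno": 5},
    {"SSN": "222", "Dno": None},
    {"SSN": "333", "Dno": 7},
]
print(fk_violations(employee, ("Dno",), dept, ("Dno",)))  # [{'SSN': '333', 'Dno': 7}]
```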
Integrity Constraints
3. entity integrity
  • no part of a PK can be null
4. referential integrity
  • the domain of FK must be the same as the domain of PK
  • FK must be null or have a value that appears as a PK value
5. semantic integrity
  • other rules that the application domain requires:
    - state constraint: gross salary > net income
    - transition constraint: Widowed can only follow Married; the salary of an employee cannot decrease
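In SQL, a state constraint maps naturally onto a CHECK clause, while a transition constraint needs both the old and the new value of a row, which a trigger can see. A sketch in Python's sqlite3 with a hypothetical employee table (gross_salary, net_income) and a hypothetical trigger name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        ssn          TEXT PRIMARY KEY,
        gross_salary REAL NOT NULL,
        net_income   REAL NOT NULL,
        CHECK (gross_salary > net_income)      -- state constraint
    );
    -- transition constraint: compares the old and the new value of a row
    CREATE TRIGGER salary_cannot_decrease
    BEFORE UPDATE OF gross_salary ON employee
    WHEN NEW.gross_salary < OLD.gross_salary
    BEGIN
        SELECT RAISE(ABORT, 'salary of an employee cannot decrease');
    END;
    INSERT INTO employee VALUES ('999887777', 40000, 30000);
""")

def attempt(sql):
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"

r1 = attempt("INSERT INTO employee VALUES ('123456789', 20000, 25000)")
r2 = attempt("UPDATE employee SET gross_salary = 35000 WHERE ssn = '999887777'")
r3 = attempt("UPDATE employee SET gross_salary = 45000 WHERE ssn = '999887777'")
print(r1)   # rejected by the CHECK clause
print(r2)   # rejected: salary of an employee cannot decrease
print(r3)   # accepted
```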
Updating and constraints
insert
• Insert the following tuple into EMPLOYEE: <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F, 40000, null, 4>
• When inserting, the integrity constraints should be checked: domain, key, entity, referential, semantic integrity
update
• Update the SALARY of the EMPLOYEE tuple with ssn = ‘999887777’ to 30000.
• When updating, the integrity constraints should be checked: domain, key, entity, referential, semantic integrity
Updating and constraints
delete
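• Delete the WORKS_ON tuple with Essn = ‘999887777’ and pno = 10.
• When deleting, the referential constraint will be checked.
- The following deletion is not acceptable, because tuples in other relations (e.g., WORKS_ON) still reference this employee:
Delete the EMPLOYEE tuple with ssn = ‘999887777’
- strategies for handling such a deletion: reject, cascade, modify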
cascade – a strategy to enforce referential integrity
[Figure: deleting an Employee tuple (key ssn) also deletes the Works-on tuples whose Essn references it.]
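In SQL, the foreign key declaration itself chooses the strategy. A sketch in Python's sqlite3 with hypothetical minimal employee and works_on tables (company_db is a hypothetical helper), comparing ON DELETE RESTRICT (reject) with ON DELETE CASCADE:

```python
import sqlite3

def company_db(on_delete):
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only if enabled
    conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY)")
    conn.execute(f"""CREATE TABLE works_on (
        essn TEXT NOT NULL REFERENCES employee(ssn) ON DELETE {on_delete},
        pno  INTEGER NOT NULL,
        PRIMARY KEY (essn, pno))""")
    conn.execute("INSERT INTO employee VALUES ('999887777')")
    conn.execute("INSERT INTO works_on VALUES ('999887777', 10)")
    return conn

# reject: a referenced employee cannot be deleted
reject = company_db("RESTRICT")
try:
    reject.execute("DELETE FROM employee WHERE ssn = '999887777'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# cascade: deleting the employee also deletes the referencing works_on tuple
cascade = company_db("CASCADE")
cascade.execute("DELETE FROM employee WHERE ssn = '999887777'")
left = cascade.execute("SELECT COUNT(*) FROM works_on").fetchone()[0]
print(rejected, left)   # True 0
```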
cascade – a strategy to enforce referential integrity
[Figure: in Employee, supervisor references ssn of the same relation. Cascading the deletion of a supervisor would also delete every employee he or she supervises, which is not reasonable.]
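Modify – a strategy to enforce referential integrity
[Figure: deleting an Employee tuple and modifying the referencing Works-on tuples by setting their Essn to null.]
This violates the entity constraint, because Essn is part of the primary key of Works-on.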
Relational Algebra
• a set of relations and a set of operations
• relation-specific operations: select, project, join, division
• set operations: union, intersection, difference, Cartesian product
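Relational algebra
Retrieve for each female employee a list of the names of her dependents:
$\text{FEMALE\_EMPS} \leftarrow \sigma_{\text{SEX} = \text{'F'}}(\text{EMPLOYEE})$
$\text{EMPNAMES} \leftarrow \pi_{\text{FNAME, LNAME, SSN}}(\text{FEMALE\_EMPS})$
$\text{ACTUAL\_DEPENDENTS} \leftarrow \text{EMPNAMES} \bowtie_{\text{SSN} = \text{ESSN}} \text{DEPENDENT}$
$\text{RESULT} \leftarrow \pi_{\text{FNAME, LNAME, DEPENDENT\_NAME}}(\text{ACTUAL\_DEPENDENTS})$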
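The same query can be evaluated with small list-based implementations of σ, π and the θ-join. The helpers select, project and join and the sample tuples are hypothetical; relations are lists of dicts without duplicates:

```python
def select(r, pred):              # sigma: keep the tuples satisfying pred
    return [t for t in r if pred(t)]

def project(r, attrs):            # pi: keep attrs, drop duplicate tuples
    out = []
    for t in r:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out

def join(r, s, cond):             # theta-join: combine pairs satisfying cond
    return [{**t, **u} for t in r for u in s if cond(t, u)]

# Hypothetical sample states
EMPLOYEE = [
    {"FNAME": "Ann", "LNAME": "Lee", "SSN": "111", "SEX": "F"},
    {"FNAME": "John", "LNAME": "Smith", "SSN": "222", "SEX": "M"},
]
DEPENDENT = [
    {"ESSN": "111", "DEPENDENT_NAME": "Tom"},
    {"ESSN": "111", "DEPENDENT_NAME": "Joy"},
    {"ESSN": "222", "DEPENDENT_NAME": "Alice"},
]

FEMALE_EMPS = select(EMPLOYEE, lambda t: t["SEX"] == "F")
EMPNAMES = project(FEMALE_EMPS, ["FNAME", "LNAME", "SSN"])
ACTUAL_DEPENDENTS = join(EMPNAMES, DEPENDENT, lambda t, u: t["SSN"] == u["ESSN"])
RESULT = project(ACTUAL_DEPENDENTS, ["FNAME", "LNAME", "DEPENDENT_NAME"])
print(RESULT)
```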
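Query: Retrieve the names of employees who work on all the projects that ‘John Smith’ works on.
$\text{SMITH} \leftarrow \sigma_{\text{FNAME} = \text{'John'} \text{ and } \text{LNAME} = \text{'Smith'}}(\text{EMPLOYEE})$
$\text{SMITH\_PNOs} \leftarrow \pi_{\text{PNO}}(\text{WORKS\_ON} \bowtie_{\text{ESSN} = \text{SSN}} \text{SMITH})$
$\text{SSN\_PNO} \leftarrow \pi_{\text{ESSN, PNO}}(\text{WORKS\_ON})$
$\text{SSNS(SSN)} \leftarrow \text{SSN\_PNO} \div \text{SMITH\_PNOs}$
$\text{RESULT} \leftarrow \pi_{\text{FNAME, LNAME}}(\text{SSNS} * \text{EMPLOYEE})$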
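Division
The DIVISION operator can be expressed as a sequence of $\pi$, $\times$ and $-$ operations as follows:
$Z = \{A_1, \ldots, A_n, B_1, \ldots, B_m\}$, $X = \{B_1, \ldots, B_m\}$, $Y = Z - X = \{A_1, \ldots, A_n\}$
$R(Z) \div S(X)$:
$T_1 \leftarrow \pi_Y(R)$
$T_2 \leftarrow \pi_Y((S \times T_1) - R)$
$T \leftarrow T_1 - T_2$ (result)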
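The π/×/− formulation of division translates almost line by line into Python sets. The function name divide and the SSN_PNO and SMITH_PNOs states are hypothetical; tuples of R list the Y attributes first:

```python
def divide(r, s, ny):
    """R(Z) ÷ S(X) via T1 = pi_Y(R), T2 = pi_Y((S x T1) - R), T = T1 - T2.

    r: set of tuples over Z, with the ny Y attributes first and the X
    attributes after them; s: set of tuples over X.
    """
    t1 = {t[:ny] for t in r}                 # T1 <- pi_Y(R)
    s_x_t1 = {y + x for y in t1 for x in s}  # S x T1, in the same (Y, X) order as R
    t2 = {t[:ny] for t in s_x_t1 - r}        # T2 <- pi_Y((S x T1) - R)
    return t1 - t2                           # T  <- T1 - T2

# Hypothetical SSN_PNO and SMITH_PNOs states
SSN_PNO = {("111", 1), ("111", 2), ("222", 1), ("333", 1), ("333", 2), ("333", 3)}
SMITH_PNOS = {(1,), (2,)}
print(sorted(divide(SSN_PNO, SMITH_PNOS, 1)))   # [('111',), ('333',)]
```

T2 collects exactly the Y values that are missing at least one X value of S, so subtracting it from T1 leaves those paired with every tuple of S.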
SQL
DDL
- creating schemas
- modifying schemas
DML
- select-from-where clause
- group by, having, order by
- update
- view
DDL - Examples:
• Create schema:
  CREATE SCHEMA COMPANY AUTHORIZATION JSMITH;
• Create table:
  CREATE TABLE EMPLOYEE
    (FNAME    VARCHAR(15) NOT NULL,
     MINIT    CHAR,
     LNAME    VARCHAR(15) NOT NULL,
     SSN      CHAR(9)     NOT NULL,
     BDATE    DATE,
     ADDRESS  VARCHAR(30),
     SEX      CHAR,
     SALARY   DECIMAL(10, 2),
     SUPERSSN CHAR(9),
     DNO      INT         NOT NULL,
     PRIMARY KEY (SSN),
     FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE(SSN),
     FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER));
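The CREATE TABLE statement can be tried in SQLite through Python's sqlite3 module, with two adaptations: SQLite has no CREATE SCHEMA statement, and it enforces foreign keys only after PRAGMA foreign_keys = ON. The DEPARTMENT table below is a hypothetical minimal stand-in so that the DNO foreign key has a target:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks FKs only when enabled
conn.executescript("""
    CREATE TABLE DEPARTMENT (          -- minimal stand-in, columns assumed
        DNAME   VARCHAR(15) NOT NULL,
        DNUMBER INT         NOT NULL,
        PRIMARY KEY (DNUMBER));
    CREATE TABLE EMPLOYEE (
        FNAME VARCHAR(15) NOT NULL, MINIT CHAR, LNAME VARCHAR(15) NOT NULL,
        SSN CHAR(9) NOT NULL, BDATE DATE, ADDRESS VARCHAR(30), SEX CHAR,
        SALARY DECIMAL(10, 2), SUPERSSN CHAR(9), DNO INT NOT NULL,
        PRIMARY KEY (SSN),
        FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE(SSN),
        FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER));
""")
conn.execute("INSERT INTO DEPARTMENT VALUES ('Research', 5)")
conn.execute("INSERT INTO EMPLOYEE (FNAME, LNAME, SSN, DNO) "
             "VALUES ('John', 'Smith', '123456789', 5)")
try:
    # no DEPARTMENT tuple has DNUMBER = 9, so the FK on DNO rejects this row
    conn.execute("INSERT INTO EMPLOYEE (FNAME, LNAME, SSN, DNO) "
                 "VALUES ('Ann', 'Lee', '111111111', 9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)   # True
```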
DDL - Examples:
• drop schema
  DROP SCHEMA COMPANY CASCADE;
  DROP SCHEMA COMPANY RESTRICT;
• drop table
  DROP TABLE DEPENDENT CASCADE;
  DROP TABLE DEPENDENT RESTRICT;
• alter table
  ALTER TABLE COMPANY.EMPLOYEE ADD JOB VARCHAR(12);
  ALTER TABLE COMPANY.EMPLOYEE DROP ADDRESS CASCADE;
DML - select-from-where clause
Retrieve a list of employees and the projects they are working on, ordered by department and, within each department, ordered alphabetically by last name, first name:
SELECT   DNAME, LNAME, FNAME, PNAME
FROM     DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
WHERE    DNUMBER = DNO AND SSN = ESSN AND PNO = PNUMBER
ORDER BY DNAME, LNAME, FNAME;
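Running the query on a few hypothetical rows in SQLite (via Python's sqlite3) shows the ordering; only the columns the query needs are created, which is a simplification of the COMPANY schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INT PRIMARY KEY);
    CREATE TABLE EMPLOYEE   (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY, DNO INT);
    CREATE TABLE PROJECT    (PNAME TEXT, PNUMBER INT PRIMARY KEY);
    CREATE TABLE WORKS_ON   (ESSN TEXT, PNO INT, PRIMARY KEY (ESSN, PNO));
    -- hypothetical sample rows
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Admin', 4);
    INSERT INTO EMPLOYEE   VALUES ('John', 'Smith', '111', 5),
                                  ('Ann', 'Lee', '222', 4),
                                  ('Bob', 'Adams', '333', 5);
    INSERT INTO PROJECT    VALUES ('ProductX', 1), ('ProductY', 2);
    INSERT INTO WORKS_ON   VALUES ('111', 1), ('222', 2), ('333', 2);
""")

rows = conn.execute("""
    SELECT   DNAME, LNAME, FNAME, PNAME
    FROM     DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
    WHERE    DNUMBER = DNO AND SSN = ESSN AND PNO = PNUMBER
    ORDER BY DNAME, LNAME, FNAME""").fetchall()
for row in rows:
    print(row)
# ('Admin', 'Lee', 'Ann', 'ProductY')
# ('Research', 'Adams', 'Bob', 'ProductY')
# ('Research', 'Smith', 'John', 'ProductX')
```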