Database: Review

Sept. 2012, Yangjun Chen, ACS-3902

Database
- Introduction: system architecture, basic concepts
- ER-model, data modeling
- B+-tree, hashing
- Relational data model, relational algebra
- (Database application in Java)
- Normalization
- Lossless join
- Multi-dimensional indexes
- Hierarchical databases (not covered)

Database: Review

Introduction to database systems

What is a database?

The main characteristics of a database

The basic database design method

The entity-relationship data model

for application modeling

Database: Review

The main characteristics of the database approach:

• single repository of data
• sharable by multiple users
• concurrency control and transaction concept
• security and integrity constraints
• self-describing - system catalogue contains meta data
• program-data independence
• some changes to the database are transparent to programs/users
• multiple views of data - to support individual needs of programs/users

Database: Review

Database schema, Schema evolution,

Database state

Working process with a database system

Database system architecture

Data independence concept

Concepts and Architecture

Database: Review

Database schema

Relation schema

Schema evolution

Database state

Student
Name    StNo   Class   Major
Smith   17     1       CS
Brown   8      2       CS

Course
CName      CNo    CrHrs   Dept
Database   8803   3       CS
C          2606   3       CS

Section
SId   CNo    Semester   Yr     Instructor
32    8803   Spring     2000   Smith
25    8803   Winter     2000   Smith
43    2606   Spring     2000   Jones

Grades
StNo   SId   Grade
17     25    A
17     43    B

Database: Review

Working process with a database system:

Definition
• record structure
• data elements
• names
• data types
• constraints

Construction
• create database files
• populate the database with records

Manipulation
• querying
• updating

Database: Review

Database Management System (DBMS)

•collection of software facilitating the definition, construction and manipulation of databases

[Figure: DBMS components: users/actors, request manager, query evaluation, storage manager, meta data, stored database]

Database: Review

Three-schema architecture

External view: a specific user's or group's view of the database

Conceptual schema: describes the whole database for all users

Internal schema: physical storage structures and details

Database: Review

Data modeling using ER-model

Entity-relationship model
- Entity types
  - strong entities
  - weak entities
- Relationships among entities
- Attributes - attribute classification
- Constraints
  - cardinality constraints
  - participation constraints
- is-a relationship

ER-to-Relation-mapping

Database: Review

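ER-model:

[Figure: ER diagram. Entity types: employee, department, project, dependent. Relationship types: works for, manages, works on, dependents of, controls, supervision (roles supervisor and supervisee). Attributes shown: fname, minit, lname, sex, address, salary, birthdate; name, sex, relationship; name, number, location; number of employees; startdate. Cardinalities N and M are marked on the diagram.]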

Database: Review

student

graduate undergraduate

The arc implies graduate and undergraduate are subtypes of student

The bubble and the d imply disjoint subtypes (o - overlapping subtypes)

A student must be a graduate or undergraduate

• Participation of supertype may be mandatory or optional

• Subtypes may be disjoint or overlapping

• A predicate (on an attribute) determines the subtype, e.g. on the attribute Student_class:
  Student_class = 'graduate'; Student_class = 'undergraduate'

The subtype is determined by the Student_class attribute.

Database: Review

Mapping to a relational database

• 4 choices:

1. Create separate relations for the supertype and each of the subtypes.

2. Create relations for the subtypes only - each contains attributes from the supertype.

3. (disjoint subtypes) Create only one relation - includes all of the attributes for the supertype and all for the subtypes, and one discriminator attribute.

4. (overlapping subtypes) Create only one relation - includes all of the attributes for the supertype and all for the subtypes, and one logical discriminator attribute per subtype.

PK is always the same - determined from the supertype

Database: Review

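[Figure: supertype EMPLOYEE (fname, minit, lname, Ssn, bDate, Address, JobType) with subtypes SECRETARY (TypingSpeed), TECHNICIAN (TGrade) and ENGINEER (EngType)]

Separate relations for the supertype and each of the subtypes:

EMPLOYEE(fname, minit, lname, ssn, bdate, address, JobType)
SECRETARY(Essn, TypingSpeed)
TECHNICIAN(Essn, TGrade)
ENGINEER(Essn, EngType)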

Database: Review

Example for super- & sub-types: choice 2

[Figure: supertype Vehicle (VehicleId, Price, LicensePlate) with subtypes CAR (MaxSpeed, NoOfPassengers) and TRUCK (NoOfAxles, Tonnage)]

CAR(VehicleId, LicensePlate, Price, MaxSpeed, NoOfPassengers)
TRUCK(VehicleId, LicensePlate, Price, NoOfAxles, Tonnage)

Database: Review

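[Figure: supertype EMPLOYEE (fname, minit, lname, Ssn, bDate, Address, JobType) with subtypes SECRETARY (TypingSpeed), TECHNICIAN (TGrade) and ENGINEER (EngType)]

One relation with JobType as the discriminator attribute:

EMPLOYEE(fname, minit, lname, ssn, bdate, address, JobType, TypingSpeed, TGrade, EngType)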

Database: Review

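[Figure: supertype with attributes PartNo and Description, and overlapping subtypes Manufacture_Part (DrawingNo, ManufactureDate, BatchNo) and Purchased_Part (Supplier, ListPrice)]

One relation with one logical discriminator attribute per subtype (MFlag, PFlag):

(PartNo, Description, MFlag, DrawingNo, ManufactureDate, BatchNo, PFlag, Supplier, ListPrice)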

Database: Review

Hashing technique

external hashing
static hashing & dynamic hashing
hash function
- mathematical function that maps a key to a bucket address
collisions
collision resolution scheme
- open addressing
- chaining
- multiple hashing
linear hashing

Database: Review

External hashing: the data are on the disk.

Static hashing:
• uses a hashing function to map keys to bucket addresses
• the primary area cannot be changed
• collision resolution schemes:
  - open addressing
  - chaining
  - multiple hashing

Dynamic hashing:
• the primary area can be changed
• linear hashing

Database: Review

Linear hashing:

1. What is a phase?

2. How to split a bucket?

3. When to split a bucket?

4. What bucket will be chosen to split next?

Database: Review

Linear hashing:
• initially the hash file contains M buckets
• h_i = key mod (2^i × M) (i = 0, 1, 2, ...)
• the insertion process can be divided into several phases

phase 1:
• insertion using h_0 = key mod M
• splitting using h_1 = key mod 2M
• splitting rule: overflow of a bucket, or load factor > constant (e.g., 0.70)
• an overflow is put in the overflow area or redistributed by splitting a bucket
• buckets are split in order from n = 0 to n = M - 1 (after each split, n is increased by 1)
• phase 1 finishes when n = M (in this case, the primary area becomes 2M buckets long)

Database: Review

phase 2:
• insertion using h_1 = key mod 2M
• splitting using h_2 = key mod 4M
• splitting rule: overflow of a bucket, or load factor > constant (e.g., 0.70)
• an overflow is put in the overflow area or redistributed by splitting a bucket
• buckets are split in order from n = 0 to n = 2M - 1 (after each split, n is increased by 1)
• phase 2 finishes when n = 2M (in this case, the primary area will contain 4M buckets)

phase 3: ... … h_2 = …, h_3 = …, ...
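The phases above can be sketched in a few lines of Java. This is only an illustrative in-memory sketch, not the course's implementation: the class name LinearHashSketch, the bucketCapacity parameter and the use of growing lists instead of a separate overflow area are assumptions, and a split is triggered by the load factor only.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal in-memory sketch of linear hashing (hypothetical class, not from the slides). */
final class LinearHashSketch {
    private final int m;               // initial number of buckets M
    private final int bucketCapacity;  // assumed number of keys per bucket
    private final double maxLoad;      // split when the load factor exceeds this constant
    private int phase = 0;             // i: h_i is used for insertion, h_(i+1) for splitting
    private int next = 0;              // n: the bucket that will be split next
    private int count = 0;             // number of stored keys
    private final List<List<Integer>> buckets = new ArrayList<>();

    LinearHashSketch(int m, int bucketCapacity, double maxLoad) {
        this.m = m;
        this.bucketCapacity = bucketCapacity;
        this.maxLoad = maxLoad;
        for (int i = 0; i < m; i++) buckets.add(new ArrayList<>());
    }

    /** h_level(key) = key mod (2^level * M). */
    private int h(int level, int key) {
        return Math.floorMod(key, (1 << level) * m);
    }

    /** Buckets below n were already split in this phase, so they use h_(i+1). */
    int address(int key) {
        int a = h(phase, key);
        return a < next ? h(phase + 1, key) : a;
    }

    void insert(int key) {
        buckets.get(address(key)).add(key);
        count++;
        if ((double) count / (buckets.size() * bucketCapacity) > maxLoad) split();
    }

    /** Splits bucket n with h_(i+1); the new bucket gets address n + 2^i * M. */
    private void split() {
        List<Integer> stay = new ArrayList<>();
        List<Integer> moved = new ArrayList<>();
        for (int key : buckets.get(next)) {
            if (h(phase + 1, key) == next) stay.add(key); else moved.add(key);
        }
        buckets.set(next, stay);
        buckets.add(moved);
        next++;
        if (next == (1 << phase) * m) {  // every bucket of this phase has been split
            phase++;
            next = 0;
        }
    }

    boolean contains(int key) {
        return buckets.get(address(key)).contains(key);
    }

    int bucketCount() {
        return buckets.size();
    }
}
```

Because buckets are always split in the order n = 0, 1, 2, ..., the bucket that is split is not necessarily the one that overflowed; the address method only needs the phase number and n to find any key.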

Database: Review

Multi-level index

- root, internal, leaf, subtree
- parent, child, sibling
- balanced, unbalanced

B+-tree
- splits on overflow; merges on underflow
- in practice it is usually 3 or 4 levels deep
- search, insert, delete algorithms

Database: Review

Motivation

• B+-tree provides a short access path.

[Figure: an index (inverted index, signature file, B+-tree, hashing, ...) gives access to the pages of a file of records]

Database: Review

A B+-tree

[Figure: a B+-tree with p_internal = 3 and p_leaf = 2. The leaf nodes hold the keys (1, 3), (5), (6, 7), (8), (9, 12) and point to the records in a file.]

Database: Review

B+-tree insertion: leaf node splitting, internal node splitting

Leaf splitting

When a leaf splits, a new leaf is allocated
• the original leaf is the left sibling, the new one is the right sibling
• key and pointer pairs are redistributed: the left sibling will have smaller keys than the right sibling
• a 'copy' of the key value which is the largest of the keys in the left sibling is promoted to the parent

[Figure: insert 31. Before: leaves (12, 22, 33) and (44, 48, 55). After: leaves (12, 22), (31, 33) and (44, 48, 55).]
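The leaf-splitting rule can be sketched as follows. This is a minimal illustration that handles keys only (no record pointers and no parent update); the class and method names are hypothetical, and giving the left sibling the larger half when the number of keys is odd is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a B+-tree leaf split on keys only (hypothetical names, not from the slides). */
final class LeafSplitSketch {
    /** The two siblings after a split and the key that is copied into the parent. */
    static final class Split {
        final List<Integer> left;
        final List<Integer> right;
        final int promoted;

        Split(List<Integer> left, List<Integer> right, int promoted) {
            this.left = left;
            this.right = right;
            this.promoted = promoted;
        }
    }

    /** Inserts newKey into a full leaf and splits the leaf into two siblings. */
    static Split insertAndSplit(List<Integer> leafKeys, int newKey) {
        List<Integer> keys = new ArrayList<>(leafKeys);
        int pos = 0;
        while (pos < keys.size() && keys.get(pos) < newKey) pos++;
        keys.add(pos, newKey);                 // keep the keys sorted

        int leftSize = (keys.size() + 1) / 2;  // assumed: left sibling keeps the larger half
        List<Integer> left = new ArrayList<>(keys.subList(0, leftSize));
        List<Integer> right = new ArrayList<>(keys.subList(leftSize, keys.size()));

        // a copy of the largest key of the left sibling goes to the parent
        return new Split(left, right, left.get(left.size() - 1));
    }
}
```

Calling insertAndSplit on a leaf holding 12, 22, 33 with the new key 31 yields the siblings (12, 22) and (31, 33), and a copy of 22 is promoted to the parent.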

Database: Review

Internal node splitting

If an internal node splits and it is not the root,
• insert the key and pointer and then determine the middle key
• a new 'right' sibling is allocated
• everything to its left stays in the left sibling
• everything to its right goes into the right sibling
• the middle key value along with the pointer to the new right sibling is promoted to the parent (the middle key value 'moves' to the parent to become the discriminator between this left and right sibling)

[Figure: insert 26]

Database: Review

Internal node splitting

When a new root is formed, a key value and two pointers must be placed into it.

Insert 40

Database: Review

Deleting nodes from a B+-tree:

1. When deleting a key from a node A, check whether the number of remaining keys (or pointers) is still at least p/2.

2. If it is not, redistribute keys with the left sibling B or with the right sibling C if possible. Otherwise, merge A and B, or merge A and C.

3. When redistributing or merging, change the key values in the parent node so that the following condition is satisfied:

• each internal node has the form < P1, K1, P2, K2, …, Pq-1, Kq-1, Pq >
• K1 < K2 < ... < Kq-1 (i.e. it is an ordered set)
• for the key values X in the subtree pointed to by Pi:
  - Ki-1 < X <= Ki for 1 < i < q
  - X <= K1 for i = 1
  - Kq-1 < X for i = q

Database: Review

A B+-tree

[Figure: a B+-tree with p = 3 and p_leaf = 2. The leaf nodes hold the keys (1, 3), (5), (6, 7), (8), (9, 12) and point to the records.]

Database: Review

Entry deletion

- deletion sequence: 8, 12, 9, 7

[Figure: the tree after deleting 8; the leaf nodes hold (1, 3), (5), (6, 7), (9), (12)]

Deleting 8 causes a redistribution of keys among the leaf nodes.

Database: Review

Entry deletion

[Figure: the leaf nodes hold (1, 3), (5), (6, 7), (9)]

12 is removed.

Database: Review

Entry deletion

[Figure: the leaf nodes hold (1, 3), (5), (6), (7)]

9 is removed.

Database: Review

Entry deletion

Deleting 7 leaves this pointer unused. Therefore, a merge at the level above the leaf level occurs.

Database: Review

Entry deletion

For this merge, 5 will be taken as a key value in A since any key value in B is less than or equal to 5 but any key value in C is larger than 5.

This pointer becomes useless. The corresponding node should also be removed.

Database: Review

Entry deletion

Database: Review

Data modeling using the relational model; relational algebra

Relational Data Model
- relational schema (database schema)
- relation schema, relations, database state
- integrity constraints and updating

Relational algebra
- select, project, join, cartesian product
- division
- set operations: union, intersection, difference

Database: Review

Integrity Constraints

• any database will have some number of constraints that must be applied to ensure correct data (valid states)

1. domain constraints
• a domain is a restriction on the set of valid values
• domain constraints specify that the value of each attribute A must be an atomic value from the domain dom(A).

2. key constraints
• a superkey is any combination of attributes that uniquely identifies a tuple: t1[superkey] ≠ t2[superkey] for any two distinct tuples t1 and t2.
  - Example: <Name, SSN> (in Employee)
• a key is a superkey that has a minimal set of attributes
  - Example: <SSN> (in Employee)

Database: Review

Integrity Constraints
• If a relation schema has more than one key, each of them is called a candidate key.
• One candidate key is chosen as the primary key (PK).
• A foreign key (FK) is defined as follows:
  i) Consider two relation schemas R1 and R2;
  ii) The attributes in FK in R1 have the same domain(s) as the primary key attributes PK in R2; the attributes FK are said to reference or refer to the relation R2;
  iii) A value of FK in a tuple t1 of the current state r(R1) either occurs as a value of PK for some tuple t2 in the current state r(R2) or is null. In the former case, we have t1[FK] = t2[PK], and we say that the tuple t1 references or refers to the tuple t2.

Example:

Employee(SSN, …, Dno)   Dept(Dno, … )

Database: Review

Integrity Constraints

3. entity integrity

• no part of a PK can be null

4. referential integrity

• domain of FK must be same as domain of PK

• FK must be null or have a value that appears as a PK value

5. semantic integrity
• other rules that the application domain requires:
  • state constraint: gross salary > net income
  • transition constraint: Widowed can only follow Married; salary of an employee cannot decrease

Database: Review

Other SQL capabilities

• Assertions can be used for some constraints

• e.g. Create Assertion ... ... (executed and enforced by the DBMS)

Constraint: The salary of an employee must not be greater than the salary of the manager of the department that the employee works for.

CREATE ASSERTION salary_constraint
CHECK (NOT EXISTS (SELECT *
                   FROM employee e, employee m, department d
                   WHERE e.salary > m.salary AND
                         e.dno = d.dnumber AND
                         d.mgrssn = m.ssn));

Database: Review

Relational algebra

Retrieve for each female employee a list of the names of her dependents:

FEMALE_EMPS ← σ_{SEX = 'F'}(EMPLOYEE)

EMPNAMES ← π_{FNAME, LNAME, SSN}(FEMALE_EMPS)

ACTUAL_DEPENDENTS ← EMPNAMES ⋈_{SSN = ESSN} DEPENDENT

RESULT ← π_{FNAME, LNAME, DEPENDENT_NAME}(ACTUAL_DEPENDENTS)
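The same query can be traced with plain Java collections to see what each operator does. This is only a sketch: the classes Employee and Dependent, their fields and the sample names in the usage note are hypothetical stand-ins for the EMPLOYEE and DEPENDENT relations.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: selection, join and projection written as nested loops (hypothetical classes). */
final class FemaleDependentsSketch {
    static final class Employee {
        final String fname, lname, ssn, sex;

        Employee(String fname, String lname, String ssn, String sex) {
            this.fname = fname;
            this.lname = lname;
            this.ssn = ssn;
            this.sex = sex;
        }
    }

    static final class Dependent {
        final String essn, dependentName;

        Dependent(String essn, String dependentName) {
            this.essn = essn;
            this.dependentName = dependentName;
        }
    }

    static List<String> result(List<Employee> employees, List<Dependent> dependents) {
        List<String> out = new ArrayList<>();
        for (Employee e : employees) {
            if (!"F".equals(e.sex)) continue;        // selection: SEX = 'F'
            for (Dependent d : dependents) {
                if (e.ssn.equals(d.essn)) {          // join condition: SSN = ESSN
                    // projection: FNAME, LNAME, DEPENDENT_NAME
                    out.add(e.fname + " " + e.lname + ": " + d.dependentName);
                }
            }
        }
        return out;
    }
}
```

With two employees (one female with SSN 1, one male with SSN 2) and one dependent for each, result returns a single line for the female employee's dependent.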

Database: Review

- creating schemas

- modifying schemas

- select-from-where clause

- group by, having, order by

- update

- view

Database: Review

DDL - Examples:

• Create schema:
Create schema COMPANY authorization JSMITH;

• Create table:
Create table EMPLOYEE
  (FNAME VARCHAR(15) NOT NULL,
   MINIT CHAR,
   LNAME VARCHAR(15) NOT NULL,
   SSN CHAR(9) NOT NULL,
   BDATE DATE,
   ADDRESS VARCHAR(30),
   SEX CHAR,
   SALARY DECIMAL(10, 2),
   SUPERSSN CHAR(9),
   DNO INT NOT NULL,
   PRIMARY KEY(SSN),
   FOREIGN KEY(SUPERSSN) REFERENCES EMPLOYEE(SSN),
   FOREIGN KEY(DNO) REFERENCES DEPARTMENT(DNUMBER));

Database: Review

DDL - Examples:

• drop schema
DROP SCHEMA COMPANY CASCADE;
DROP SCHEMA COMPANY RESTRICT;

• drop table
DROP TABLE DEPENDENT CASCADE;
DROP TABLE DEPENDENT RESTRICT;

• alter table
ALTER TABLE COMPANY.EMPLOYEE ADD JOB VARCHAR(12);
ALTER TABLE COMPANY.EMPLOYEE DROP ADDRESS CASCADE;

Database: Review

DDL - Examples:

• Specifying constraints:
Create table EMPLOYEE
  (…,
   DNO INT NOT NULL DEFAULT 1,
   CONSTRAINT EMPPK
     PRIMARY KEY(SSN),
   CONSTRAINT EMPSUPERFK
     FOREIGN KEY(SUPERSSN) REFERENCES EMPLOYEE(SSN)
     ON DELETE SET NULL ON UPDATE CASCADE,
   CONSTRAINT EMPDEPTFK
     FOREIGN KEY(DNO) REFERENCES DEPARTMENT(DNUMBER)
     ON DELETE SET DEFAULT ON UPDATE CASCADE);

• Create domain:
CREATE DOMAIN SSN_TYPE AS CHAR(9);

Database: Review

set null or cascade: strategies to maintain data consistency

[Figure: Employee(ssn, supervisor). With cascade, deleting an employee tuple also deletes the employee tuples whose supervisor value references it: not reasonable.]

Database: Review

set null or cascade: strategies to maintain data consistency

[Figure: Employee(ssn, supervisor). With set null, deleting an employee tuple sets the supervisor value of the referencing tuples to null: reasonable.]

Database: Review

set default: strategy to maintain data consistency

[Figure: Department(DNUMBER, …) and Employee(ssn, DNO). When a department tuple is deleted, the DNO value of the referencing employee tuples is changed to the default value 1.]

Database: Review

DML - select-from-where clause

Retrieve a list of employees and the projects they are working on, ordered by department and, within each department, ordered alphabetically by last name, first name:

SELECT DNAME, LNAME, FNAME, PNAME
FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
WHERE DNUMBER = DNO AND SSN = ESSN AND PNO = PNUMBER
ORDER BY DNAME, LNAME, FNAME

• order by clause
• group by clause
• having clause
• aggregation functions: max, min, average, count, sum

Database: Review

DML - insert, update, delete

• Insert
• Update
• Delete

INSERT INTO employee (fname, lname, ssn, dno)
VALUES ("Joe", "Smith", 909, 1);

UPDATE employee SET salary = 100000
WHERE ssn = 909;

DELETE FROM employee WHERE ssn = 909;

Note that Access changes the above INSERT to read:
INSERT INTO employee (fname, lname, ssn, dno)
SELECT "Joe", "Smith", 909, 1;

Database: Review

View definition

• Use a Create View command

• essentially a select specifying the data that makes up the view

• Create View Enames as select lname, fname from employee

CREATE VIEW Enames (lname, fname)AS SELECT LNAME, FNAME FROM EMPLOYEE

Database: Review

CREATE VIEW DEPT_INFO (DEPT_NAME, NO_OF_EMPS, TOTAL_SAL)
AS SELECT DNAME, COUNT(*), SUM(SALARY)
   FROM DEPARTMENT, EMPLOYEE
   WHERE DNUMBER = DNO
   GROUP BY DNAME;

Database: Review

(Database application in Java)

Database: Review

To develop a database application, JDBC or ODBC should be used.

JDBC – JAVA Database Connectivity

ODBC – Open Database Connectivity

[Figure: on the client, the database client uses the JDBC-ODBC bridge and the ODBC driver to reach the database server]

Database: Review

Connection to a database:

1. Loading driver class

Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

2. Connection to a database

String url = "jdbc:odbc:<databaseName>";

Connection con = DriverManager.getConnection(url, <userName>, <password>);

Database: Review

3. Sending SQL statements

Statement stmt = con.createStatement();

// Information is a table name
ResultSet rs = stmt.executeQuery("SELECT * FROM Information WHERE Balance >= 5000");

4. Getting results

while (rs.next()) { ... }

Database: Review

import java.sql.*;

public class DataSourceDemo1 {
    public static void main(String[] args) {
        Connection con = null;
        try {
            // load driver class
            Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

            // data source
            String url = "jdbc:odbc:Customers";

            // get connection (user name "sa"; the last argument is the password)
            con = DriverManager.getConnection(url, "sa", " ");

            // create SQL statement
            Statement stmt = con.createStatement();

            // execute query
            ResultSet rs = stmt.executeQuery(
                "SELECT * FROM Information WHERE Balance >= 5000");

            String firstName, lastName;
            Date birthDate;
            float balance;
            int accountLevel;

            // read the result row by row
            while (rs.next()) {
                firstName = rs.getString("FirstName");
                lastName = rs.getString("lastName");
                balance = rs.getFloat("Balance");
                System.out.println(firstName + " " + lastName + ", balance = " + balance);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // always release the connection
            try { con.close(); } catch (Exception e) { }
        }
    }
}

Database: Review

Programming in a dynamic environment:

Disadvantage of DataSourceDemo1:

If the JDBC-ODBC driver, database, user name, or password is changed, the program has to be modified.

Solution:

Configuration file (file name: datasource.config; the prefix "config" is the data source name):

config.driver=sun.jdbc.odbc.JdbcOdbcDriver
config.protocol=jdbc
config.subprotocol=odbc
config.dsname=Customers
config.username=sa
config.password=

Database: Review

import java.sql.*;
import java.io.*;
import java.util.Properties;

public class DatabaseAccess {
    private String configDir;   // directory for configuration file
    private String dsDriver = null;
    private String dsProtocol = null;
    private String dsSubprotocol = null;
    private String dsName = null;
    private String dsUsername = null;
    private String dsPassword = null;

    public DatabaseAccess(String configDir) { this.configDir = configDir; }

    public DatabaseAccess() { this("."); }

    // source: data source name
    // configFile: source configuration file
    // example call: getConnection("config", "datasource.config")
    public Connection getConnection(String source, String configFile)
            throws SQLException, Exception {
        Connection con;
        try {
            Properties prop = loadConfig(configDir, configFile);
            if (prop != null) {
                dsDriver = prop.getProperty(source + ".driver");
                dsProtocol = prop.getProperty(source + ".protocol");
                dsSubprotocol = prop.getProperty(source + ".subprotocol");
                // property keys are case-sensitive: the file uses "dsname"
                if (dsName == null)
                    dsName = prop.getProperty(source + ".dsname");
                if (dsUsername == null)
                    dsUsername = prop.getProperty(source + ".username");
                if (dsPassword == null)
                    dsPassword = prop.getProperty(source + ".password");

                // load driver class
                Class.forName(dsDriver);

                // connect to data source
                String url = dsProtocol + ":" + dsSubprotocol + ":" + dsName;
                con = DriverManager.getConnection(url, dsUsername, dsPassword);
            } else
                throw new Exception("* Cannot find property file " + configFile);
            return con;
        } catch (ClassNotFoundException e) {
            throw new Exception("* Cannot find driver class " + dsDriver + "!");
        }
    }

    // dir: directory of configuration file
    // filename: file name
    public Properties loadConfig(String dir, String filename) throws Exception {
        File inFile = new File(dir, filename);
        Properties prop = null;
        if (inFile.exists()) {
            prop = new Properties();
            prop.load(new FileInputStream(inFile));
        } else
            // the exception propagates to the caller (a return inside finally would discard it)
            throw new Exception("* Error in finding " + inFile.toString());
        return prop;
    }
}

Database: Review

Using class DatabaseAccess, DataSourceDemo1 should be modified a little bit:

DatabaseAccess db = new DatabaseAccess();

con = db.getConnection("config", "datasource.config");
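How the connection URL is assembled from the configuration entries can be checked without a database. The following is a small sketch under assumptions: the class ConfigUrlSketch and its methods are hypothetical helpers, and the configuration text is passed in as a string instead of being read from datasource.config.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: building the JDBC URL from "<source>.<key>" properties (hypothetical helper). */
final class ConfigUrlSketch {
    /** Parses configuration text in the java.util.Properties key=value format. */
    static Properties parse(String text) {
        Properties prop = new Properties();
        try {
            prop.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return prop;
    }

    /** Builds protocol:subprotocol:dsname for the given data source name. */
    static String buildUrl(Properties prop, String source) {
        return prop.getProperty(source + ".protocol") + ":"
             + prop.getProperty(source + ".subprotocol") + ":"
             + prop.getProperty(source + ".dsname");
    }
}
```

For the configuration entries shown on the earlier slide, buildUrl(prop, "config") gives the same URL that DataSourceDemo1 hard-codes. Note that property keys are case-sensitive, so the key used in the program has to match the spelling in the file.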

Database: Review

Normalization

functional dependencies
- data redundancy, update anomalies
- what is a functional dependency?
- inference rules, minimal set of FDs

normal forms
- first normal form
- second normal form
- third normal form
- Boyce Codd normal form

Database: Review

Data redundancy and update anomalies:

EmployeeDepartment(ename, ssn, bdate, address, dnumber, dname)

This is similar to Employee, but we have included dname.

Database: Review

EmployeeProject(ssn, pnumber, hours, ename, plocation)

This is similar to Works_on, but we have included ename and plocation.

Database: Review

In the two prior cases with EmployeeDepartment and EmployeeProject, we have redundant information in the database …

• if two employees work in the same department, then that department name is replicated

• if more than one employee works on a project then the project location is replicated

• if an employee works on more than one project his/her name is replicated

Redundant data leads to

• additional space requirements

• update anomalies

Database: Review

Suppose EmployeeDepartment is the only relation where department name is recorded

insert anomalies

• adding a new department is complicated unless there is also an employee for that department

deletion anomalies

• if we delete all employees for some department, what should happen to the department information?

modification anomalies

• if we change the name of a department, then we must change it in all tuples referring to that department

Database: Review

Functional dependencies:

Suppose we have a relation R comprising attributes X, Y, …

We say a functional dependency X → Y exists between the attributes X and Y if, whenever a tuple exists with the value x for X, it will always have the same value y for Y.

X is the left-hand side (LHS) and Y is the right-hand side (RHS) of the dependency.

Database: Review

Student(student_no, student_name, course_no, gender)

Given a specific student number, there is only one value for student name and only one value for gender found with it.

student_no → student_name
student_no → gender

Database: Review

Inference Rules for Functional Dependencies

• From a set of FDs, we can derive some other FDs

Example:

F = {ssn → {ename, bdate, address, dnumber},
     dnumber → {dname, dmgrssn}}

By inference: ssn → {dname, dmgrssn}, ssn → dnumber, dnumber → dname.

• F+ (closure of F): the set of all FDs that can be deduced from F (together with F itself) is called the closure of F.

Database: Review

Inference Rules for Functional Dependencies

• Inference rules:

- IR1 (reflexive rule): If X ⊇ Y, then X → Y. (X → X.)

- IR2 (augmentation rule): {X → Y} |= ZX → ZY.

- IR3 (transitive rule): {X → Y, Y → Z} |= X → Z.

- IR4 (decomposition, or projective, rule): {X → YZ} |= X → Y, X → Z.

- IR5 (union, or additive, rule): {X → Y, X → Z} |= X → YZ.

- IR6 (pseudotransitive rule): {X → Y, WY → Z} |= WX → Z.
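Whether a dependency X → Y follows from a set F can be checked mechanically by computing the closure of the attribute set X under F: X → Y is in F+ exactly when Y is contained in that closure. The sketch below uses this standard attribute-closure technique; the class FdClosureSketch and its helper methods are hypothetical names, and attribute sets are written as comma-separated strings for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of the attribute-closure test for functional dependencies (hypothetical names). */
final class FdClosureSketch {
    /** One functional dependency lhs -> rhs over attribute names. */
    static final class FD {
        final Set<String> lhs;
        final Set<String> rhs;

        FD(String lhsCsv, String rhsCsv) {
            lhs = attrs(lhsCsv);
            rhs = attrs(rhsCsv);
        }
    }

    /** Turns "a,b,c" into the attribute set {a, b, c}. */
    static Set<String> attrs(String csv) {
        return new TreeSet<>(Arrays.asList(csv.split(",")));
    }

    /** Repeatedly adds the RHS of every FD whose LHS is already contained in the result. */
    static Set<String> closure(Set<String> start, List<FD> fds) {
        Set<String> result = new TreeSet<>(start);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (FD fd : fds) {
                if (result.containsAll(fd.lhs) && result.addAll(fd.rhs)) changed = true;
            }
        }
        return result;
    }

    /** lhs -> rhs is implied by fds iff rhs is a subset of the closure of lhs. */
    static boolean implies(List<FD> fds, String lhsCsv, String rhsCsv) {
        return closure(attrs(lhsCsv), fds).containsAll(attrs(rhsCsv));
    }

    /** The set F from the example: ssn determines the employee data, dnumber the department data. */
    static List<FD> example() {
        List<FD> f = new ArrayList<>();
        f.add(new FD("ssn", "ename,bdate,address,dnumber"));
        f.add(new FD("dnumber", "dname,dmgrssn"));
        return f;
    }
}
```

With the example set, implies(example(), "ssn", "dname,dmgrssn") holds (by transitivity through dnumber), while implies(example(), "dnumber", "ssn") does not.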

Database: Review

Equivalence of Sets of FDs

E and F are equivalent if E+ = F+.

Minimal sets of FDs

• every dependency has a single attribute on the RHS

• the attributes on the LHS of a dependency are minimal

• we cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.

Example: {ssn, pnumber} → hours, ssn → ename, pnumber → plocation.

Database: Review

Normal Forms

• A series of normal forms are known that have, successively, better update characteristics.

• We’ll consider 1NF, 2NF, 3NF, and BCNF.

• A technique used to improve a relation is decomposition, where one relation is replaced by two or more relations. When we do so, we want to eliminate update anomalies without losing any information.

Database: Review

1NF - First Normal Form

The domain of an attribute must only contain atomic values.

• This disallows repeating values, sets of values, relations within relations, nested relations, …

• In the example database we have a department located in possibly several locations: department 5 is located in Bellaire, Sugarland, and Houston.

• If we had the relation

Department
dnumber   dname      dmgrssn     dlocations
5         Research   333445555   Bellaire, Sugarland, Houston

then it would not be in 1NF because there are multiple values to be kept in dlocations.

Database: Review

1NF - First Normal Form

If we have a non-1NF relation we can decompose it, or modify it appropriately, to generate 1NF relations.

There are 3 options:

• option 1: split off the problem attribute into a new relation (create a DepartmentLocation relation).

Department
dnumber   dname      dmgrssn
5         Research   333445555

DepartmentLocation
dnumber   dlocation
5         Bellaire
5         Sugarland
5         Houston

Generally considered the best solution.

Database: Review

2NF - Second Normal Form

• full functional dependency

X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.

EmployeeProject:

{ssn, pnumber} → hours is a full dependency

(neither ssn → hours nor pnumber → hours holds).

Database: Review

• partial functional dependency

X → Y is a partial functional dependency if removal of some attribute A from X does not affect the dependency.

EmployeeProject:

{ssn, pnumber} → ename is a partial dependency

(because ssn → ename holds).

Database: Review

A relation schema is in 2NF if

(1) it is in 1NF and

(2) every non-key attribute is fully functionally dependent on the primary key.

If we had the relation

EmployeeProject(ssn, pnumber, hours, ename, plocation)

then this relation would not be in 2NF because of two separate violations of the 2NF definition: ssn → ename and pnumber → plocation are partial dependencies on the key {ssn, pnumber}.

Database: Review

• We correct this by decomposing the relation into three relations - splitting off the offending attributes - splitting off partial dependencies on the key.

EmployeeProject(ssn, pnumber, hours, ename, plocation) is replaced by

(ssn, pnumber, hours)
(ssn, ename)
(pnumber, plocation)

Database: Review

3NF - Third Normal Form

• Transitive dependency

A functional dependency X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is not a subset of any key of R, and both X → Z and Z → Y hold.

EmployeeDept:

ssn → dnumber and dnumber → dname

Database: Review

A relation schema is in 3NF if

(1) it is in 2NF and

(2) each non-key attribute must not be fully functionally dependent on another non-key attribute (there must be no transitive dependency of a non-key attribute on the PK)

• If we had the relation

EmployeeDept(ename, ssn, bdate, address, dnumber, dname)

then this relation would not be in 3NF because dname is functionally dependent on dnumber and neither is a key attribute.

Database: Review

• We correct this by decomposing - splitting off the transitive dependencies

EmployeeDept(ename, ssn, bdate, address, dnumber, dname) is replaced by two relations in 3NF:

(ename, ssn, bdate, address, dnumber)
(dnumber, dname)

Database: Review

Boyce Codd Normal Form, BCNF

• Consider a different definition of 3NF, which is equivalent to the previous one.

A relation schema R is in 3NF if, whenever a functional dependency X → A holds in R, either

(a) X is a superkey of R, or

(b) A is a prime attribute of R.

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal state r of R will have t1[S] = t2[S].

An attribute is called a prime attribute if it is a member of any key.

Database: Review

• If we remove (b) from the previous definition for 3NF, we have the definition for BCNF.

• A relation schema is in BCNF if every determinant is a superkey. BCNF is stronger than 3NF:

- no partial dependencies

- no transitive dependencies where a non-key attribute is dependent on another non-key attribute

- no non-key attributes appear in the LHS of a functional dependency.

Database: Review

Consider:

(student_no, course_no, instr_no)

Instructor teaches one course only.

Student takes a course and has one instructor.

{student_no, course_no} → instr_no
instr_no → course_no

In 3NF!

Database: Review

Some sample data:

121 1803 99

121 1903 77

222 1803 66

222 1903 77

Instructor 99 teaches 1803

Database: Review

Some sample data:

121 1803 99

121 1903 77

222 1803 66

222 1903 77

Database: Review

121 1803 99

121 1903 77

222 1803 66

222 1903 77

Deletion anomaly: If we delete all rows for course 1803, we lose the information that instructor 99 teaches student 121 and instructor 66 teaches student 222.

Insertion anomaly: How do we add the fact that instructor 55 teaches course 2906?

Database: Review

Which decomposition preserves all the information?

Decomposition into (student_no, course_no) and (course_no, instr_no):

student_no   course_no        course_no   instr_no
121          1803             1803        99
121          1903             1903        77
222          1803             1803        66
222          1903

Joining these two tables leads to spurious tuples - the result includes

121 1803 66
222 1803 99

Database: Review

student_no   course_no   instr_no
121          1803        99
121          1903        77
222          1803        66
222          1903        77

is decomposed into

student_no   course_no        course_no   instr_no
121          1803             1803        99
121          1903             1903        77
222          1803             1803        66
222          1903

Database: Review

Which decomposition preserves all the information?

Decomposition into (student_no, course_no) and (student_no, instr_no):

Joining these two tables leads to spurious tuples - the result includes

121 1803 77
121 1903 99
222 1803 77
222 1903 66

in addition to the original tuples

121 1803 99
121 1903 77
222 1803 66
222 1903 77

Database: Review

student_no   course_no   instr_no
121          1803        99
121          1903        77
222          1803        66
222          1903        77

is decomposed into

student_no   course_no        student_no   instr_no
121          1803             121          99
121          1903             121          77
222          1803             222          66
222          1903             222          77

Database: Review

This decomposition preserves all the information.

Decomposition into (course_no, instr_no) and (student_no, instr_no); joining the two tables gives back exactly

121 1803 99
121 1903 77
222 1803 66
222 1903 77

The only FD is instr_no → course_no, but the join preserves {student_no, course_no} → instr_no.

Database: Review

student_no   course_no   instr_no
121          1803        99
121          1903        77
222          1803        66
222          1903        77

is decomposed into

course_no   instr_no        student_no   instr_no
1803        99              121          99
1903        77              121          77
1803        66              222          66
                            222          77

Database: Review

Lossless join

Definition of lossless join property
- relation decomposition
- lossless join property

Testing algorithm
- matrix construction
- matrix initialization
- matrix modification

Database: Review

• Basic definition of Lossless-join

A decomposition D = {R1, R2, ..., Rm} of R has the lossless join property with respect to the set of dependencies F on R if, for every relation r of R that satisfies F, the following holds:

⋈(π_R1(r), ..., π_Rm(r)) = r,

where ⋈ is the natural join of all the relations in D.

The word loss in lossless refers to loss of information, not to loss of tuples.
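The definition can be tried out directly on a small relation: project r onto the two attribute sets, join the projections, and compare the result with r. The sketch below does this for two-way decompositions of the (student_no, course_no, instr_no) sample data; the class name and the encoding of attributes as column positions 0, 1, 2 are assumptions made for the example. It tests one particular relation state only, whereas the definition quantifies over every r that satisfies F.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: natural join of two projections of a relation (hypothetical names). */
final class SpuriousTupleSketch {
    /** Sample rows (student_no, course_no, instr_no) from the slides. */
    static List<int[]> sample() {
        return Arrays.asList(
            new int[] {121, 1803, 99},
            new int[] {121, 1903, 77},
            new int[] {222, 1803, 66},
            new int[] {222, 1903, 77});
    }

    static Set<Integer> cols(Integer... c) {
        return new HashSet<>(Arrays.asList(c));
    }

    /**
     * Computes the natural join of the projections of r onto column sets a and b.
     * a and b together must cover all columns; shared columns are the join columns.
     */
    static Set<List<Integer>> joinOfProjections(List<int[]> r, Set<Integer> a, Set<Integer> b) {
        Set<List<Integer>> out = new HashSet<>();
        for (int[] t : r) {          // t supplies the a-columns
            for (int[] u : r) {      // u supplies the remaining columns
                boolean agree = true;
                for (int c : a) {
                    if (b.contains(c) && t[c] != u[c]) agree = false;
                }
                if (!agree) continue;
                List<Integer> row = new ArrayList<>();
                for (int c = 0; c < t.length; c++) row.add(a.contains(c) ? t[c] : u[c]);
                out.add(row);
            }
        }
        return out;
    }

    /** The join always contains r, so it equals r exactly when it has the same number of tuples. */
    static boolean joinGivesBackR(List<int[]> r, Set<Integer> a, Set<Integer> b) {
        Set<List<Integer>> original = new HashSet<>();
        for (int[] t : r) {
            List<Integer> row = new ArrayList<>();
            for (int v : t) row.add(v);
            original.add(row);
        }
        return joinOfProjections(r, a, b).size() == original.size();
    }
}
```

On the sample data, joining (student_no, course_no) with (course_no, instr_no) produces six tuples, two of them spurious, while joining (course_no, instr_no) with (student_no, instr_no) returns exactly the four original tuples.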

Database: Review

Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)

F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}

Decomposition:

R1(SSN, ENAME)
R2(PNUM, PNAME, PLOCATION)
R3(SSN, PNUM, hours)

Lossless join

Database: Review

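• decomposition-1

[Matrix: one row per relation of the decomposition and one column per attribute: A1 SSN, A2 ENAME, A3 PNUM, A4 PNAME, A5 PLOCATION, A6 hours]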

Database: Review

[Figure: the matrix is modified by applying SSN → ENAME and PNUM → {PNAME, PLOCATION}]

Database: Review

• Example: decomposition-2

Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)

F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}

Decomposition:

R1(ENAME, PLOCATION)
R2(SSN, PNUM, hours, PNAME, PLOCATION)

Not lossless join

Database: Review

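• decomposition-2

[Matrix: one row per relation of the decomposition and one column per attribute: A1 SSN, A2 ENAME, A3 PNUM, A4 PNAME, A5 PLOCATION, A6 hours]

The matrix cannot be changed!

SSN → ENAME
PNUM → {PNAME, PLOCATION}
{SSN, PNUM} → hours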
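The matrix test (construction, initialization, modification) can be sketched compactly. This follows the standard textbook procedure rather than any code from the course: the class name and the integer encoding (0 stands for the symbol a_j, the value i + 1 stands for a b-symbol of row i) are assumptions, as is the exact content of the two relations in the second example call in the usage note.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of the matrix test for the lossless join property (hypothetical names). */
final class LosslessJoinTestSketch {
    /**
     * attrsCsv: all attributes of R, e.g. "SSN,ENAME,PNUM".
     * relations: one comma-separated attribute list per relation of the decomposition.
     * fds: pairs {lhs, rhs}, each a comma-separated attribute list.
     */
    static boolean isLossless(String attrsCsv, String[] relations, String[][] fds) {
        List<String> attrs = Arrays.asList(attrsCsv.split(","));
        int m = relations.length;
        int n = attrs.size();

        // matrix construction and initialization
        int[][] s = new int[m][n];
        for (int i = 0; i < m; i++) {
            Arrays.fill(s[i], i + 1);                        // b-symbols, distinct per row
            for (int j : indexes(attrs, relations[i])) s[i][j] = 0;   // a-symbols
        }

        // matrix modification: apply the FDs until nothing changes
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] fd : fds) {
                int[] lhs = indexes(attrs, fd[0]);
                int[] rhs = indexes(attrs, fd[1]);
                for (int i = 0; i < m; i++) {
                    for (int k = i + 1; k < m; k++) {
                        if (!agree(s[i], s[k], lhs)) continue;
                        for (int j : rhs) {
                            if (s[i][j] == s[k][j]) continue;
                            int keep = Math.min(s[i][j], s[k][j]);   // an a-symbol (0) always wins
                            int drop = Math.max(s[i][j], s[k][j]);
                            for (int[] row : s) if (row[j] == drop) row[j] = keep;
                            changed = true;
                        }
                    }
                }
            }
        }

        // lossless iff some row consists of a-symbols only
        for (int[] row : s) {
            if (Arrays.stream(row).allMatch(v -> v == 0)) return true;
        }
        return false;
    }

    private static int[] indexes(List<String> attrs, String csv) {
        String[] names = csv.split(",");
        int[] idx = new int[names.length];
        for (int i = 0; i < names.length; i++) idx[i] = attrs.indexOf(names[i]);
        return idx;
    }

    private static boolean agree(int[] r1, int[] r2, int[] cols) {
        for (int c : cols) if (r1[c] != r2[c]) return false;
        return true;
    }
}
```

For the attributes SSN, ENAME, PNUM, PNAME, PLOCATION, hours and the three FDs above, the decomposition into (SSN, ENAME), (PNUM, PNAME, PLOCATION), (SSN, PNUM, hours) passes the test. A decomposition into (ENAME, PLOCATION) and (SSN, PNUM, hours, PNAME, PLOCATION) fails it, because no two rows ever agree on the left-hand side of an FD and the matrix never changes.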

Database: Review

Multi-Dimensional Indexes

• Multiple-key indexes

• kd-trees

• Quad trees

• R-trees

• Bit map

• Inverted files

Database: Review

Multiple-key indexes

(Indexes over more than one attribute)

Employee(ename, ssn, age, salary, dnumber)

ename
Aaron, Ed
Abbott, Diane
Adams, John
Adams, Robin

Database: Review

(Indexes over more than one attribute)

[Figure: an index on age and an index on salary]

Database: Review

kd-Trees

(A generalization of binary trees)

A kd-tree is a binary tree in which interior nodes have an associated attribute a and a value v that splits the data points into two parts: those with a-value less than v and those with a-value equal to or larger than v.
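A small sketch of such a tree follows. Interior nodes carry the attribute a and the value v; leaves hold data points. Everything beyond that definition is an assumption made for the example: the class name, a leaf capacity of 2 points, choosing the splitting attribute by alternating with the depth, and using the median of the overflowing leaf as v.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch of a kd-tree with interior split nodes and leaf blocks (hypothetical names). */
final class KdTreeSketch {
    private static final int LEAF_CAPACITY = 2;   // assumed block size

    private static final class Node {
        int attr = -1;                            // splitting attribute a (interior nodes only)
        int value;                                // splitting value v
        Node left, right;                         // left: a-value < v, right: a-value >= v
        List<int[]> points = new ArrayList<>();   // data points (leaves only)

        boolean isLeaf() {
            return attr < 0;
        }
    }

    private final Node root = new Node();

    /** Follows the split values down to a leaf and splits the leaf if it overflows. */
    void insert(int... point) {
        Node n = root;
        int depth = 0;
        while (!n.isLeaf()) {
            n = point[n.attr] < n.value ? n.left : n.right;
            depth++;
        }
        n.points.add(point);
        if (n.points.size() > LEAF_CAPACITY) split(n, depth % point.length);
    }

    /** Turns an overflowing leaf into an interior node with two leaf children. */
    private void split(Node leaf, int attr) {
        leaf.points.sort(Comparator.comparingInt((int[] p) -> p[attr]));
        int v = leaf.points.get(leaf.points.size() / 2)[attr];
        Node l = new Node();
        Node r = new Node();
        for (int[] p : leaf.points) {
            (p[attr] < v ? l : r).points.add(p);
        }
        if (l.points.isEmpty()) return;   // cannot separate on this attribute; keep the big leaf
        leaf.attr = attr;
        leaf.value = v;
        leaf.left = l;
        leaf.right = r;
        leaf.points = new ArrayList<>();
    }

    boolean contains(int... point) {
        Node n = root;
        while (!n.isLeaf()) n = point[n.attr] < n.value ? n.left : n.right;
        for (int[] p : n.points) {
            if (Arrays.equals(p, point)) return true;
        }
        return false;
    }
}
```

A lookup follows exactly one path from the root to a leaf, because each interior node sends a point either left (a-value less than v) or right (a-value equal to or larger than v).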

Database: Review

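kd-Trees

[Figure: a kd-tree over (age, salary). Interior nodes: salary 150 (root), age 60, age 47, salary 80, salary 300, age 38. Leaf blocks: (70, 110), (85, 140); (50, 275), (60, 260); (50, 100), (50, 120); (30, 260); (25, 400), (45, 350); (25, 60); (45, 60), (50, 75).]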

Database: Review

Insert a new entry into a kd-tree:

insert(35, 500):

[Figure: the kd-tree before the insertion. Following salary 150, age 47 and salary 300, the new point (35, 500) belongs in the leaf block that holds (25, 400), (45, 350).]

Database: Review

Insert a new entry into a kd-tree:

salary 150

age 60 age 47

age 38

70, 11085, 140

50, 27560, 260

50, 10050, 120

30, 260

35, 50045, 350

25, 60 45, 6050, 75

insert(35, 500):

25, 400

age 35
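The insertion can be sketched in Python as follows. This is a hedged illustration, not the slides' algorithm verbatim: it assumes a leaf block holds two points, that the splitting attribute alternates salary, age, salary, ... with depth, and that an overflowing leaf is split at its median value. The class names, the insert function and the tree shape (read off the figure labels) are assumptions.

```python
from dataclasses import dataclass, field

CAPACITY = 2          # assumed: a leaf block holds two points
AGE, SALARY = 0, 1    # positions inside a point (age, salary)


@dataclass
class Leaf:
    points: list = field(default_factory=list)


@dataclass
class Node:
    attr: int      # attribute tested at this node
    value: float   # points with point[attr] < value go left
    left: object
    right: object


def insert(tree, point, depth=0, order=(SALARY, AGE)):
    """Insert point and return the (possibly replaced) subtree."""
    if isinstance(tree, Node):
        if point[tree.attr] < tree.value:
            tree.left = insert(tree.left, point, depth + 1, order)
        else:
            tree.right = insert(tree.right, point, depth + 1, order)
        return tree
    tree.points.append(point)
    if len(tree.points) <= CAPACITY:
        return tree
    # Overflow: replace the leaf by an interior node split at the median.
    # (A sketch only: it does not handle points that all share one value.)
    attr = order[depth % len(order)]
    values = sorted(p[attr] for p in tree.points)
    value = values[len(values) // 2]
    left = [p for p in tree.points if p[attr] < value]
    right = [p for p in tree.points if p[attr] >= value]
    return Node(attr, value, Leaf(left), Leaf(right))


# The example tree, reconstructed from the figure labels (an assumption).
tree = Node(SALARY, 150,
            Node(AGE, 60,
                 Node(SALARY, 80,
                      Node(AGE, 38, Leaf([(25, 60)]), Leaf([(45, 60), (50, 75)])),
                      Leaf([(50, 100), (50, 120)])),
                 Leaf([(70, 110), (85, 140)])),
            Node(AGE, 47,
                 Node(SALARY, 300, Leaf([(30, 260)]), Leaf([(25, 400), (45, 350)])),
                 Leaf([(50, 275), (60, 260)])))

tree = insert(tree, (35, 500))
new_node = tree.right.left.right   # the node that replaced the full leaf
print(new_node.attr == AGE, new_node.value)
```

Under these assumptions the full leaf is replaced by a node that splits on age at 35, which agrees with the "age 35" node in the figure.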

Quad-trees

In a Quad-tree, each node corresponds to a square region in two dimensions, or to a k-dimensional cube in k dimensions.

• If the number of data entries in a square is not larger than what will fit in a block, then we can think of this square as a leaf node.

• If there are too many data entries to fit in one block, then we treat the square as an interior node, whose children correspond to its four quadrants.

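[Figure: data points plotted with salary on one axis, next to a table with columns name, age, salary and a sample row (…, 25, 400)]

[Figure: quad-tree with root (50, 200); other node labels: (75, 100), (25, 300); point groups shown at the leaves: {(50, 75), (50, 100)}, {(25, 60), (45, 60)}, {(50, 275), (60, 260)}, {(85, 140)}, {(50, 120), (70, 110)}, {(30, 260)}, {(25, 400), (45, 350)}]

Edge labels: SW – south-west, SE – south-east, NW – north-west, NE – north-east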
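The leaf and interior-node rule for quad-trees can be sketched in Python. This is a minimal illustration under assumptions: a block holds two points, a region is split at its centre, and points on a dividing line go to the north or east side. The class QuadTree and its method names are hypothetical.

```python
CAPACITY = 2  # assumed: a block holds two points


class QuadTree:
    def __init__(self, x0, y0, x1, y1):
        self.bounds = (x0, y0, x1, y1)
        self.points = []       # data entries while this node is a leaf
        self.children = None   # quadrant name -> QuadTree once split

    def _centre(self):
        x0, y0, x1, y1 = self.bounds
        return (x0 + x1) / 2, (y0 + y1) / 2

    def _quadrant(self, p):
        cx, cy = self._centre()
        return ("N" if p[1] >= cy else "S") + ("E" if p[0] >= cx else "W")

    def insert(self, p):
        if self.children is not None:          # interior node: descend
            self.children[self._quadrant(p)].insert(p)
            return
        self.points.append(p)
        if len(self.points) > CAPACITY:        # too many entries: split
            self._split()

    def _split(self):
        # A sketch only: more than CAPACITY identical points would
        # make this recurse without end.
        x0, y0, x1, y1 = self.bounds
        cx, cy = self._centre()
        self.children = {
            "SW": QuadTree(x0, y0, cx, cy),
            "SE": QuadTree(cx, y0, x1, cy),
            "NW": QuadTree(x0, cy, cx, y1),
            "NE": QuadTree(cx, cy, x1, y1),
        }
        pending, self.points = self.points, []
        for p in pending:
            self.children[self._quadrant(p)].insert(p)


# Age on the x-axis (0..100), salary on the y-axis (0..400): centre (50, 200).
qt = QuadTree(0, 0, 100, 400)
for point in [(25, 60), (45, 60), (30, 260), (50, 275)]:
    qt.insert(point)
print(sorted(qt.children))
```

With the assumed bounds the root is split at (50, 200), the label shown at the root of the quad-tree figure.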

R-trees

An R-tree is an extension of B-trees for multidimensional data.

• In an R-tree, any interior node corresponds to some interior regions, or just regions, each of which is usually a rectangle.

• An R-tree corresponds to a whole area (a rectangle for two-dimensional data).

• Each region x in an interior node n is associated with a link to a child of n, which corresponds to all the subregions within x.

Suppose that the local cellular phone company adds a POP (point of presence, or base station) at the position shown below.

[Figure: map showing school, POP, house1, house2, road1, road2, pipeline]

Root regions: ((0, 0), (60, 50)) and ((20, 20), (100, 80))

Leaf under ((0, 0), (60, 50)): road1, road2, house1

Leaf under ((20, 20), (100, 80)): school, house2, pipeline, pop

Insert a new region r into an R-tree.

[Figure: map showing school, POP, house1, house2, road1, road2, pipeline, and the new region house3]

house3: ((70, 5), (80, 15))

1. Search the R-tree, starting at the root.

2. If the encountered node is internal, find a subregion into which r fits.

• If there is more than one such region, pick one and go to its corresponding child.

• If there is no subregion that contains r, choose any subregion such that it needs to be expanded as little as possible to contain r.

[Figure: R-tree with root regions ((0, 0), (60, 50)) and ((20, 20), (100, 80)), leaves {road1, road2, house1} and {school, house2, pipeline, pop}, and the new region house3 ((70, 5), (80, 15))]

[Figure: R-tree after the insertion, with root regions ((0, 0), (80, 50)) and ((20, 20), (100, 80)) and leaves {road1, road2, house1, house3} and {school, house2, pipeline, pop}]

Two choices:

• If we expand the lower subregion, corresponding to the first leaf, then we add 1000 square units to the region.

• If we extend the other subregion by lowering its bottom by 15 units, then we add 1200 square units.
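The comparison of the two choices can be checked with a short Python sketch. It assumes rectangles are written as ((x_low, y_low), (x_high, y_high)) and takes house3 as ((70, 5), (80, 15)), so that the second region's bottom is lowered from 20 to 5. The function names are hypothetical.

```python
def area(rect):
    (x1, y1), (x2, y2) = rect
    return (x2 - x1) * (y2 - y1)


def expand(rect, r):
    """Smallest rectangle that contains both rect and r."""
    (x1, y1), (x2, y2) = rect
    (rx1, ry1), (rx2, ry2) = r
    return ((min(x1, rx1), min(y1, ry1)), (max(x2, rx2), max(y2, ry2)))


def enlargement(rect, r):
    """Area that must be added to rect so that it contains r."""
    return area(expand(rect, r)) - area(rect)


def choose_subregion(subregions, r):
    """Pick the subregion that needs the least expansion to contain r."""
    return min(subregions, key=lambda s: enlargement(s, r))


lower = ((0, 0), (60, 50))
upper = ((20, 20), (100, 80))
house3 = ((70, 5), (80, 15))   # assumed coordinates, see the note above

print(enlargement(lower, house3), enlargement(upper, house3))
print(expand(choose_subregion([lower, upper], house3), house3))
```

The smaller enlargement belongs to the lower subregion, which is why house3 ends up in the first leaf and that region grows to ((0, 0), (80, 50)).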

3. If the encountered node v is a leaf, insert r into it. If there is no room for r, split the leaf into two and distribute all subregions in them as evenly as possible. Calculate the ‘parent’ regions for the new leaf nodes and insert them into v’s parent. If there is room at v’s parent, we are done. Otherwise, we recursively split nodes going up the tree.

Example: suppose that each leaf has room for 6 regions, and we add POP (point of presence, or base station) to the following R-tree.

[Figure: R-tree with a single region ((0, 0), (100, 100)) whose leaf holds road1, road2, house1, school, house2, pipeline]

Bit map

1. Imagine that the records of a file are numbered 1, …, n.

2. A bitmap for a data field F is a collection of bit-vectors of length n, one for each possible value that may appear in the field F.

3. The vector for a specific value v has 1 in position i if the ith record has v in the field F, and it has 0 there if not.

Example

Employee

ename            ssn   age   salary   dnumber
Aaron, Ed        …     30    60       …
Abbott, Diane    …     30    60       …
Adams, John      …     40    75       …
Adams, Robin     …     50    75       …
Brian, Robin     …     55    78       …
Brian, Mary      …     55    80       …
Widom, Jones     …     60    100      …

Bit maps for age:

30: 1100000
40: 0010000
50: 0001000
55: 0000110
60: 0000001

Bit maps for salary:

60: 1100000
75: 0011000
78: 0000100
80: 0000010
100: 0000001
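The construction of these bit-vectors can be sketched in Python. This is a minimal illustration: the function name build_bitmap is hypothetical, and the age and salary columns are the ones implied by the bit maps of the example.

```python
def build_bitmap(values):
    """Return {value: bit-vector string}, one vector per distinct value."""
    n = len(values)
    vectors = {}
    for i, v in enumerate(values):
        bits = vectors.setdefault(v, ["0"] * n)
        bits[i] = "1"              # record i + 1 has value v in this field
    return {v: "".join(bits) for v, bits in vectors.items()}


# Columns of the Employee example, in record order (implied by the bit maps).
ages = [30, 30, 40, 50, 55, 55, 60]
salaries = [60, 60, 75, 75, 78, 80, 100]

age_bitmap = build_bitmap(ages)
salary_bitmap = build_bitmap(salaries)
for value, bits in age_bitmap.items():
    print(value, bits)
```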

Query evaluation

Select ename
From Employee
Where age = 55 and salary = 80

In order to evaluate this query, we intersect the vectors for age = 55 and salary = 80.

vector for age = 55: 0000110
vector for salary = 80: 0000010
intersection: 0000010

This indicates the 6th tuple is the answer.

Range query evaluation

Select ename
From Employee
Where 30 < age < 55 and 60 <= salary < 78

We first find the bit-vectors for the age values in (30, 55); there are only two: 0010000 and 0001000 for 40 and 50, respectively.

Take their bitwise OR: 0010000 ∨ 0001000 = 0011000.

Next find the bit-vectors for the salary values in [60, 78) and take their bitwise OR: 1100000 ∨ 0011000 = 1111000.

Finally, take the bitwise AND of the two results: 0011000 ∧ 1111000 = 0011000.

The 3rd and 4th tuples are the answer.
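Both query evaluations can be reproduced with Python integers used as bit-vectors. This is a sketch under assumptions: the helper names or_all and matches are hypothetical, and the lower salary bound is taken as inclusive (60 <= salary) to match the vectors that are ORed together in the example; a strict bound gives the same answer for this data.

```python
N = 7  # number of records
AGE = {30: "1100000", 40: "0010000", 50: "0001000", 55: "0000110", 60: "0000001"}
SALARY = {60: "1100000", 75: "0011000", 78: "0000100", 80: "0000010", 100: "0000001"}


def or_all(bitmap, predicate):
    """Bitwise OR of the vectors of all values that satisfy predicate."""
    result = 0
    for value, bits in bitmap.items():
        if predicate(value):
            result |= int(bits, 2)
    return result


def matches(vector):
    """Record numbers (starting at 1) whose bit is set in vector."""
    bits = format(vector, f"0{N}b")
    return [i + 1 for i, b in enumerate(bits) if b == "1"]


# age = 55 and salary = 80: intersect the two vectors.
point_query = int(AGE[55], 2) & int(SALARY[80], 2)

# 30 < age < 55 and 60 <= salary < 78: OR within a field, AND across fields.
range_query = (or_all(AGE, lambda a: 30 < a < 55)
               & or_all(SALARY, lambda s: 60 <= s < 78))

print(matches(point_query), matches(range_query))
```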

Compression of bitmaps

Run-length encoding:

Run in a bit vector: a sequence of i 0’s followed by a 1.

Example: 000000010001. This bit vector contains two runs.

Run compression: a run r is represented as another bit string r’ composed of two parts.

part 1: i expressed as a binary number, denoted as b1(i).

part 2: Assume that b1(i) is j bits long. Then, part 2 is a sequence of (j – 1) 1’s followed by a 0, denoted as b2(i).

r’ = b2(i)b1(i).

For the bit vector 000000010001:

r1 = 00000001: i = 7, b1(7) = 111, b2(7) = 110, so r1’ = 110111

r2 = 0001: i = 3, b1(3) = 11, b2(3) = 10, so r2’ = 1011

r1’ r2’ = 1101111011 is the compressed form of 000000010001.

Decoding a compressed sequence s’:

1. Scan s’ from the beginning to find the first 0.

2. Let the first 0 appear at position j. Check the next j bits. The corresponding value is a run.

3. Remove all these bits from s’. Go to (1).

Example: starting at the beginning of 1101111011, we find the first 0 at the 3rd bit, so j = 3. The next 3 bits are 111, so we determine that the first integer is 7. In the same way, we can decode 1011.
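The encoding and decoding steps can be sketched in Python. This is a minimal illustration with hypothetical function names; it treats bit vectors as strings and, as an assumption of the sketch, ignores any 0's that follow the last 1.

```python
def encode(bits):
    """Run-length encode a bit string: each run of i 0's plus a 1 -> b2(i)b1(i)."""
    out = []
    run = 0
    for bit in bits:
        if bit == "0":
            run += 1
        else:
            b1 = format(run, "b")              # i as a binary number
            b2 = "1" * (len(b1) - 1) + "0"     # (j - 1) 1's followed by a 0
            out.append(b2 + b1)
            run = 0
    return "".join(out)                        # trailing 0's are not encoded


def decode(code):
    """Invert encode: recover the runs and rebuild the bit string."""
    runs = []
    pos = 0
    while pos < len(code):
        j = code.index("0", pos) - pos + 1     # position of the first 0
        start = pos + j
        runs.append(int(code[start:start + j], 2))   # the next j bits
        pos = start + j                        # drop the bits just read
    return "".join("0" * i + "1" for i in runs)


compressed = encode("000000010001")
print(compressed)
print(decode(compressed))
```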

Inverted files

An inverted file is a list of pairs of the form <key word, pointer>.

[Figure: each key word leads to a bucket of pointers to its occurrences in the documents “… the cat is fat”, “… was raining cats and dogs …”, “… Fido the Dogs …”]

When we use “buckets” of pointers to occurrences of each word, we may extend the idea to include in the bucket array some information about each occurrence.

[Figure: bucket entries carrying the fields type and position (type values shown: header, anchor, text), with pointers into the documents “… the cat is fat”, “… was raining cats and dogs …”, “… Fido the Dogs …”]
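A small Python sketch of such an inverted file follows. It is an illustration under assumptions: the document numbers, the function name build_inverted_file and the choice to store only the word position with each pointer are hypothetical, and words are compared in lower case without any stemming, so "cat" and "cats" are different key words here.

```python
from collections import defaultdict


def build_inverted_file(documents):
    """Map each key word to a bucket of (document id, position) entries."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return index


# The three text fragments from the figure, numbered for illustration.
documents = {
    1: "the cat is fat",
    2: "was raining cats and dogs",
    3: "Fido the Dogs",
}

index = build_inverted_file(documents)
print(index["dogs"])
print(index["the"])
```

A fuller version could add a type field (header, anchor, text) to each bucket entry in the same way as the position.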

Hierarchical database schema

- hierarchical schema

- record type, PCR type

- virtual PCR: virtual child, virtual parent

Database languages

- HDDL

- HDML

Hierarchical databases

[Figure: ERD for Chapter 6 database example, with employee, department, project, dependent, Dept_locations, and Works on]

• Virtual Parent-child Relationships

- Hierarchical schema using VPCR - for a Company database

[Figure: hierarchical schema. Record types and fields as labeled: Department (Dname, Dnum); Project (Pname, …); Dlocation (Location); Demployee (EPTR); Dmanager (MPTR); Pworker (Hours, WPTR); Employee (Ename, Minit, …); Esupervisee (SPTR); Dependent (DEPname, Minit, ...); additional field label: StartDate]
