43
CSCI 3140 Module 1 – Background (Based on Chapters 1 – 13 of Database Systems by Connolly and Begg) Theodore Chiasson Dalhousie University

CSCI 3140 Module 1 – Background

  • Upload
    jasia

  • View
    56

  • Download
    0

Embed Size (px)

DESCRIPTION

CSCI 3140 Module 1 – Background. (Based on Chapters 1 – 13 of Database Systems by Connolly and Begg) Theodore Chiasson Dalhousie University. Definitions. Database A collection of related data Database Management System (DBMS) Software that manages and controls access to the database. - PowerPoint PPT Presentation

Citation preview

Page 1: CSCI 3140 Module 1 – Background

CSCI 3140Module 1 – Background

(Based on Chapters 1 – 13 of

Database Systems by Connolly and Begg)

Theodore Chiasson

Dalhousie University

Page 2: CSCI 3140 Module 1 – Background

Definitions

• Database- A collection of related data

• Database Management System (DBMS)– Software that manages and controls access to the database

Page 3: CSCI 3140 Module 1 – Background

File-based systems

• Early attempt to computerize manual filing systems• Limitations of file-based systems include:

– Separation and isolation of data• Application programs often access many files

– Duplication of data• Data needed in several departments will be entered into each

department’s system– Data dependence

• Application programs need to be updated if file formats are changed– Incompatibility of files

• Different application development environments produce different file formats

– Fixed queries/proliferation of application programs• Application programmer needs to intervene for ad hoc queries

Page 4: CSCI 3140 Module 1 – Background

Two underlying problems with file-based systems:

1) The definition of data is embedded within the application programs rather than being stored separately and independently

2) There is no control over access and manipulation of data beyond that imposed by the application programs

Page 5: CSCI 3140 Module 1 – Background

Definitions (revisited)

• Database- A shared collection of logically related data, and a description of

this data, designed to meet the information needs of an organization

• Database Management System (DBMS)– A software systems that allows user to define, create, maintain,

and control access to the database.

Page 6: CSCI 3140 Module 1 – Background

Important database terms

• System catalog, data dictionary, or meta-data– The ‘data about the data’– The part of the database that makes it self-describing

• Entity– A distinct object (person, place, thing, concept or event) that is

represented in the database

• Attribute– A property describing some aspect of an entity

• Relationship– An association between entities

Page 7: CSCI 3140 Module 1 – Background

Facilities provided by a DBMS:

• Data Definition Language (DDL)– Used to specify the data types and structures and the constraints on the

data to be stored in the database

• Data Manipulation Language (DML)– Used to insert, update, delete, and retrieve data from the database

• Structured Query Language (SQL)– De facto standard language for relational DBMSs.

• Controlled access to the database– Security system

– Integrity system

– Concurrency control system

– Recovery control system

– User-accessible catalog

Page 8: CSCI 3140 Module 1 – Background

• External level– The users’ view of the database– Describes the part of the database that is relevant to each user– External schemas, or subschemas

• Conceptual level– The community view of the database– Describes what data is stored in the database and the relationships

among the data– Conceptual schema

• Internal level– The physical representation of the database on the computer– Describes how the data is stored in the database– Internal schema

ANSI-SPARC three-level architecture

Page 9: CSCI 3140 Module 1 – Background

ANSI-SPARC three-level architecture

View 1

InternalSchema

ConceptualSchema

View 2 View nExternallevel

Conceptuallevel

Internallevel

Physical dataorganization

User 1 User 2 User n

Page 10: CSCI 3140 Module 1 – Background

The Relational Model

• All data is logically structured in relations (tables)• Each relation has a name and is made up of attributes (columns) of data• Each tuple (row) contains one value per attribute

Relation: a table with columns and rows (also called a file)

Attribute: a named column of a relation (also called a field)

Tuple: a row of a relation (also called a record)

Domain: the set of allowable values for one or more attributes

Degree: the number of attributes in a relation (unary, binary, ternary, n-ary)

Cardinality: the number of tuples in a relation

Relational database: a collection of normalized relations with distinct relation names

Page 11: CSCI 3140 Module 1 – Background

Mathematical review

• Let A and B be sets, where A = {2,4} and B = {1,3,5).• The CARTESIAN PRODUCT of these two sets, written A X B, is the set of

all ordered pairs such that the first element is an element of A and the second element is an element of B, which gives us the set

{(2,1),(2,3),(2,5),(4,1),(4,3),(4,5)}• Any subset of this set is termed a relation• For example, R = {(2,1),(4,1)} is a relation. In this case, R can be specified as all those ordered pairs with

the second element equal to 1, or R = {(x,y) | x E A, y E B, and y = 1} We could also define a relation S as S = {(x,y) | x E A, y E B, and x = 2y} which, given set A and B above, would yield S = {(2,1)}

Page 12: CSCI 3140 Module 1 – Background

Database Relations

• Relation Schema – a named relation defined by a set of attribute and domain name pairs

• Let A1, A2, … An be attributes with domains D1, D2, ... Dn. Then the set {A1:D1, A2: D2, … An: Dn} is a relation schema. Thus, a relation R is a set of n-tuples:

(A1:d1, A2,d2, … An:dn) such that d1E D1, d2E D2, … dnE Dn

Tuples are normally written without the column names, yielding (d1,d2, … dn) A relation instance is a set of tuples, such as {(6050, Theo, 317)} {(ID:6050, NAME: Theo, Office: 317)}

• Relational database schema – a set of relation schemas, each with a distinct name– Also called a relational schema

Page 13: CSCI 3140 Module 1 – Background

Relational keys

• Superkey– An attribute, or set of attributes, that uniquely identifies a tuple within a

relation

• Candidate key– A superkey such that no proper subset is a superkey within the relation

• Primary key– The candidate key that is selected to uniquely identify tuples with the

relation

• Foreign key– An attribute, or set of attributes, within one relation that matches the

candidate key of some (possibly the same) relation

Page 14: CSCI 3140 Module 1 – Background

Relational Integrity

• Domain constraints– Restrict the allowable set of values for the attributes of a relation

• Null– Represents a value for an attribute that is currently unknown or is not

applicable for this tuple

• Entity integrity– In a base relation, no attribute of a primary key may be null

• Referential integrity– If a foreign key exists in a relation, either the foreign key value must

match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null

• Enterprise constraints– Additional rules specified by the users or database administrators of a

database

Page 15: CSCI 3140 Module 1 – Background

Views

• Base Relation– A named relation corresponding to an entity in the conceptual schema,

whose tuples are physically stored in the database

• View– The dynamic results of one or more relational operations operating on

the base relations to produce another relation.

– A view is a virtual relation that does not necessarily exist in the database but can be produced upon request by a particular user, at the time of the request

– Not all views are updatable.

Page 16: CSCI 3140 Module 1 – Background

Relation Algebra

• A theoretical language with operations that work on one or more relations to define another relation without changing the original relation(s)

• Five fundamental operations:– Selection

– Projection

– Cartesian product

– Union

– Set difference

• Three additional operations (can be derived from the five)– Join

– Intersection

– Division

Page 17: CSCI 3140 Module 1 – Background

Unary operations

• Selection operation– Input is a relation and a predicate. The predicate is constructed from

boolean expressions involving attributes of the relation using logical operators ^ (AND), v (OR), and ~ (NOT). Output is a new relation that consists of only those tuples in the original relation for which the predicate evaluates to TRUE.

– Example: σsalary > 100000(Staff)– Result: A new relation with the same attributes as the Staff relation, and

with tuples corresponding to staff with a salary greater than 100,000.– The degree of the new relation is the same as the degree of the original

relation.

• Projection operation– Input is a relation and a set of attributes of the relation. Output is a new

relation with only those attributes listed in the input set. The degree of the new relation is the number of attributes in the input set, and the cardinality of the new relation is the same as the cardinality of the original relation.

– Π(staffNo, salary)(Staff)

Page 18: CSCI 3140 Module 1 – Background

• Union (R U S)– Input is two union-compatible relations R and S, output is a new relation

containing all of the tuples of R and S

– Union-compatibility is defined as having the same number of attributes with each pair of corresponding attributes having the same domain

• Set Difference (R – S)– Yields a new relation consisting of tuples that are in relation R that are

not in relation S. R and S must be union-compatible.

• Intersection (R S)– Yields a new relation consisting of tuples that are in both relation R and

relation S. Equivalent to R – (R – S)

• Cartesian Product (R x S)– Yields a new relation that is the concatenation of every tuple of R with

every tuple of S

Set Operations

Page 19: CSCI 3140 Module 1 – Background

Join Operations

• Theta join ( R FS)

– Yields a relation with all tuples of the Cartesian Product of R and S that satisfy the predicate F

• Equijoin ( R FS)

– A Theta join in which the only predicate is equal (=)

• Natural join ( R S)– Equijoin over all common attributes of R and S. One occurrence of each

common attribute is eliminated from the result.

• Outer join ( R S)– Tuples from R which do not have matching values in the common

attributes of S are also included in the join, with missing values in the second relation set to null.

• Semijoin ( R FS)

– Contains the tuples of R that participate in a join of R with S

Page 20: CSCI 3140 Module 1 – Background

Division operation (R – S)

• The division operation defines a relation over the attributes C that consists of the set of tuples from R that match the combination of every tuple is S, where C is the set of attributes that are in R but are not in S.

Page 21: CSCI 3140 Module 1 – Background

Relational Calculus

• Based on a branch of symbolic logic called predicate calculus– A predicate is a truth-valued function with arguments

– A proposition is a predicate with values filled in for the arguments

– If P is a predicate, the set of all x such that P is true for x is expressed as

{x | P(x)}

Page 22: CSCI 3140 Module 1 – Background

Tuple Relational Calculus

• Tuple relational calculus– Find tuples for which a predicate is true– Tuple variables `range over’ a named relation– Specify a range of a tuple variabe S as the Staff relation using the

notationStaff(S)

– To express the query `find all tuples of S such that F(S) is true’, we write{S | F(S)}

F is a formula, or well-formed formula (wff)– Quantifiers tell how many instances the predicate applies to

• Universal quantifier `for all’• Existential quantifier `there exists’

– Variables qualified with a quantifier are bound variables, those not qualified are free variables

– All free variables are to the left of the vertical bar in a wff

Page 23: CSCI 3140 Module 1 – Background

General form of an expression in Tuple Relational Calculus

{ S1.a1, S2.a2, …, Sn.an | F(S1, S2, …, Sm)} m ≥ n

Where S1, S2, …, Sm are tuple variables, each ai is an attribute of the relation over which Si ranges, and F is a formula

A wff is made up of atoms, where atoms can be:

- R(Si) where Si is a tuple variable and R is a relation

- Si.a1 θ Sj.a2 where Si and Sj are tuple variables, a1 is an attribute of the relation over which Si ranges, a2 is an attributes of the relation over which Sj ranges, and θ is a comparison operator (<,≤,>,≥,=,≠); the attributes a1 and a2 must have domains whose members can be compared by theta

- Si.a1 θ c where where Si is a tuple variable, a1 is an attribute of the relation over which Si ranges, c is a constant from the domain of attribute a1, and θ is one of the comparison operators.

Page 24: CSCI 3140 Module 1 – Background

Domain Relational Calculus

• Domain relational calculus– Variables take their values from the domains of attributes rather than the tuples

of relations– Expressions have the general form:

{d1, d2, …, dn | F(d1, d2, …, dm)} m >= n

Where d1, d2, …, dn, …, dm represent domain variables and F(d1, d2, …, dm) represents a formula composed of atoms, where each atom has one of the following forms:

- R(d1, d2, …, dn) where R is a relation of degree n and each di is a domain variable

- di θ dj ,where di and dj are domain variables and θ is a comparison operator

- di θ c,where di is a domain variable, c is a constant from the domain of d i, and θ is a comparison operator

We recursively build up formulae from atoms where - an atom is a formula, - the conjunction or disjunction of two formulae is a formula, the negation of a

formula is a formula- Existential and universal quantifiers can be applied

Page 25: CSCI 3140 Module 1 – Background

Entity-Relationship Modeling

• Entity type– A group of objects with the same properties, which are identified by the

enterprise as having an independent existence

• Entity occurance– A uniquely identifiable object of an entity type

• Each entity type in an E-R diagram is represented as a rectangle labeled with the name of the entity

Staff PropertyForRent

Page 26: CSCI 3140 Module 1 – Background

Entity-Relationship Modeling

• Relationship type– A set of meaningful associations among entity types

• Relationship occurance– A uniquely identifiable association, which includes one occurrence from

each participating entity type

• Each relationship type in an E-R diagram is represented as a line connecting the associated entity types, labeled with the name of the relationship

Staff BranchHas

Page 27: CSCI 3140 Module 1 – Background

Entity-Relationship Modeling

• Degree of a relationship type– The number of participating entity types in a relationship

• Binary: A branch has staff

• Ternary: Staff registers a client at a branch

• Quaternary: A solicitor arranges a bid for a buyer through a bank

Staff BranchHas

Staff Branch

Client

Registers

Buyer Bank

Bid

Arranges

Solicitor

Page 28: CSCI 3140 Module 1 – Background

Entity-Relationship Modeling

• Recursive relationship– A relationship type where the same entity type participates more than

once in different roles• Example: Staff (supervisor) supervises staff (Supervisee)

• Example of entities associated through two distinct relationships– Manager manages branch office

– Branch office has member of staff

Staff

Supervises

Supervisee

Supervisor

Staff Branch

Has

Manages

Member of staff

Manager

Branch

Branch

Page 29: CSCI 3140 Module 1 – Background

Entity-Relationship Modeling

• Attribute– A property of an entity or relationship type

• Attribute domain– The set of allowable values for one or more attributes

• Simple Attribute– An attribute composed of a single component with an independent existence

• Composite Attribute– An attribute composed of multiple components, each with an independent

existence• Examples

– Address (could be broken into street, city, postal code)– Name (could be broken into FirstName, MiddleInitial, LastName)

• Single-valued Attribute– An attribute that holds a single value for each occurrence of an entity type

• Multi-valued Attribute– An attribute that holds multiple values for each occurance of an entity type

• Example: Each branch may have several phone numbers

• Derived Attribute– An attribute that represents a value that is derivable from the value of a related

attribute or set of attributes, not necessarily in the same entity type

Page 30: CSCI 3140 Module 1 – Background

Entity-Relationship Modeling

• Candidate Key– The minimal set of attributes that uniquely identifies each occurance of

an entity type

• Primary Key– The candidate key that is selected to uniquely identify each occurrence

of an entity type

• Composite Key– A candidate key that consists of two or more attributes

Page 31: CSCI 3140 Module 1 – Background

Entity-Relationship Modeling

• Diagrammatic representation of attributes in ER diagram

Staff Branch

Has

Manages

staffNo {PK}namepositionsalary/totalStaff

branchNo {PK}address street city postalCodetelNo[1..3]

Derivedattribute

Multi-valuedattribute

Compositeattribute

Primary key

Area forattribute

list

Page 32: CSCI 3140 Module 1 – Background

Entity-Relationship Modeling

• Strong entity type– An entity type that is not existence-dependent on some other entity type

• Weak entity type– An entity type that is existence-dependent on some other entity type

• Attributes on relationships– Attributes can be attached to relationships as well as entities

– Diagrammatic representation:

Newspaper PropertyForRentAdvertises

newspaperName propertyNo

dateAdvertcost

Page 33: CSCI 3140 Module 1 – Background

Entity-Relationship Modeling

• Multiplicity– The number of possible occurrences of an entity type that may relate to

a single occurrence of an associated entity type through a particular relationship

• One-to-one (1:1)• One-to-many (1:*)• Many-to-many (*:*)

– Cardinality• Describes the maximum number of possible relationship occurrences for an

entity participating in a given relationship

– Participation• Determines whether all or only some entity occurrences participate in a

relationship

Page 34: CSCI 3140 Module 1 – Background

Entity-Relationship Modeling

• Specialization/Generalization– Superclass

• An entity type that includes one or more distinct subgroupings of its occurrences

– Subclass• A distinct subgrouping of occurrences of an entity type

– (name of subclass) IS-A (name of superclass)• Manager IS-A Staff• Secretary IS-A Staff

– Specialization• The process of maximizing the differences between members of an entity by

identifying their distinguishing characteristics

– Generalization• The process of minimizing the differences between entites by identifying

their common characteristics

Page 35: CSCI 3140 Module 1 – Background

Entity-Relationship Modeling

• Diagrammatic representation of specialization/generalization

StaffBranchHas

staffNo {PK}namepositionsalary

branchNo {PK}address street city postalCode

1..1 1..*

SalesPersonnel

salesAreacarAllowance

Manager

mgrStartDatebonus

Secretary

typingSpeed

Manages1..1

1..1{Optional, And}

Page 36: CSCI 3140 Module 1 – Background

Normalization

• Unnormalized form– A table that contains one or more repeating groups

Artist Album

Supertramp Crime of the Century

Even in the quietest moments

Breakfast in America

Sting Dream of the Blue Turtles

Nothing like the Sun

Page 37: CSCI 3140 Module 1 – Background

Normalization

• First normal form (1NF)– A relation in which the intersection of each row and column contains one

and only one value

Artist Album

Supertramp Crime of the Century

Supertramp Even in the quietest moments

Supertramp Breakfast in America

Sting Dream of the Blue Turtles

Sting Nothing like the Sun

Page 38: CSCI 3140 Module 1 – Background

Normalization

• Functional dependency– Describes the relationship between attributes in a relation– If A and B are attributes of a relation R, B is functionally dependent on A

(denoted A -> B) if each value of A is associated with exactly one value of B.

• (A and B may each consist of one or more attributes)

• Determinant– The attribute or set of attributes on the left-hand-side of the arrow of a

functional dependency

• Full functional dependency– Indicates that if A and B are attributes in a relation, B is fully functionally

dependent on A if B is functionally dependent on A, but not on any proper subset of A

• Second normal form (2NF)– A relation that is in first normal form and every non-primary-key attribute

is fully functionally dependent on every candidate key

Page 39: CSCI 3140 Module 1 – Background

Normalization

• Transitive dependency– A condition where A, B, and C are attributes of a relation such that if

A->B and B->C , then C is transitively dependent on A via B (provided that A is not fully functionally dependent on B or C)

• Third normal form (3NF)– A relation that is in first and second normal form, and in which no

non-primary-key attribute is transitively dependent on any candidate key

Page 40: CSCI 3140 Module 1 – Background

Normalization

• Boyce-Codd normal form (BCFN)– A relation that is in BCNF if and only if every determinant is a

candidate key

• Example: ClientInterview(clientNo, interviewDate, interviewTime, staffNo, roomNo)

clientNo, interviewDate -> interviewTime, staffNo, roomNo (Primary key) staffNo, interviewDate, interviewTime -> clientNo (Candidate key)

roomNo, interviewDate, interviewTime -> staffNo, clientNo (Candidate key) staffNo, interviewDate -> roomNo

Split into two relations that are in BCNF:

Interview(clientNo, interviewDate, interviewTime, staffNo) StaffRoom(staffNo, interviewDate, roomNo)

Page 41: CSCI 3140 Module 1 – Background

Normalization

• Multivalued dependency– Represents a dependency between attributes in a relation, such that for

each value of A there is a set of values for B and a set of values for C. However, the set of values for B and C are independent of each other.

• Trivial multivalued dependency– If A->> B is a multivalued dependency in relation R, it is considered

trivial if either (a) B is a subset of A or (b) A U B = R

• Fourth normal form (4NF)– A relation that is in BCNF and contains no nontrivial multivalued

dependenciesExample:BranchStaffOwner(branchNo, sName, oName) BCNF

branchNo ->> sName 4NFbranchNo ->> oName 4NF

Page 42: CSCI 3140 Module 1 – Background

Normalization

• Lossless-join dependency– A property of decomposition, which ensures that no spurious tuples are

generated when relations are reunited through a natural join operation

• Fifth normal form (5NF)– A relation has no join dependencies

Example:

Chrysler, Dodge, Jeep dealership carries cars from all three manufacturers. If the dealership decides to offer trucks for sale, trucks from all manufacturers that make trucks must be offerred.

Manufacturer(manufacturerNo, vehicleType)

Dealer(dealerNo, dealerName)

VehicleTypes(vehicleTypeNo, vehicleType, vehicleTypeDescription)

VehicleTypesOffered(manufacturerNo, dealerNo, vehicleTypeNo)

Page 43: CSCI 3140 Module 1 – Background

Normalization

• Unnormalized form– A table that contains one or more repeating groups

• First normal form (1NF)– A relation in which the intersection of each row and column contains one and

only one value

• Second normal form (2NF)– A relation that is in first normal form and every non-primary-key attribute is fully

functionally dependent on every candidate key

• Third normal form (3NF)– A relation that is in first and second normal form, and in which no non-primary-

key attribute is transitively dependent any candidate key

• Boyce-Codd normal form (BCNF)– A relation is in BCNF, if and only if, every determinant is a candidate key

• Fourth normal form (4NF)– A relation is in Boyce-Codd normal form and contains no nontrivial multi-valued

dependencies

• Fifth normal form (5NF)– A relation has no join dependencies