Logical and Physical Design


  • 7/28/2019 Logical and Physical Design

    1/103

    Logical and Physical Database Design

    KRISNA ADIYARTA

    PASCA SARJANA (MAGISTER KOMPUTER)

    UNIVERSITAS BUDI LUHUR

    JAKARTA

    1. Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden. Modern Database Management, 4th Edition. Upper Saddle River, New Jersey: Prentice Hall (Pearson Educational, Inc), 2005.

    2. A. Silberschatz, H.F. Korth, S. Sudarshan. Database System Concepts, 4th Edition. McGraw-Hill, 2002.


    Objectives

      Definition of terms
      List five properties of relations
      State two properties of candidate keys
      Define first, second, and third normal form
      Describe problems from merging relations
      Transform E-R and EER diagrams to relations
      Create tables with entity and relational integrity constraints
      Use normalization to convert anomalous tables to well-structured relations


    Objectives

      Definition of terms
      Describe the physical database design process
      Choose storage formats for attributes
      Select appropriate file organizations
      Describe three types of file organization
      Describe indexes and their appropriate use
      Translate a database model into efficient structures
      Know when and how to use denormalization


    Relation

    Definition: A relation is a named, two-dimensional table of data.
    A table consists of rows (records) and columns (attributes or fields).

    Requirements for a table to qualify as a relation:
      It must have a unique name
      Every attribute value must be atomic (not multivalued, not composite)
      Every row must be unique (can't have two rows with exactly the same values for all their fields)
      Attributes (columns) in tables must have unique names
      The order of the columns must be irrelevant
      The order of the rows must be irrelevant

    NOTE: all relations are in 1st Normal form
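    The requirements above that can be checked mechanically might be sketched as follows. This is only an illustration: the function name `is_relation` and the choice to treat Python lists, sets, tuples, and dicts as non-atomic values are assumptions, not part of the slides. Rows are given as tuples, and because they are compared as a set, row order plays no role.

```python
def is_relation(columns, rows):
    """Return True if the table passes the checkable relation requirements.

    columns: list of attribute names; rows: list of tuples (hypothetical layout).
    """
    # Attribute (column) names must be unique.
    if len(set(columns)) != len(columns):
        return False
    for row in rows:
        # Every row must supply exactly one value per column.
        if len(row) != len(columns):
            return False
        # Atomic values only: container values are treated as multivalued/composite.
        if any(isinstance(value, (list, set, tuple, dict)) for value in row):
            return False
    # Every row must be unique.
    return len(set(rows)) == len(rows)
```

    For example, a table with two identical rows, or with a list of phone numbers in one cell, would be rejected.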


    Correspondence with E-R Model

      Relations (tables) correspond with entity types and with many-to-many relationship types
      Rows correspond with entity instances and with many-to-many relationship instances
      Columns correspond with attributes

    NOTE: The word relation (in relational database) is NOT the same as the word relationship (in E-R model)


    Key Fields

    Keys are special fields that serve two main purposes:
      Primary keys are unique identifiers of the relation in question. Examples include employee numbers, social security numbers, etc. This is how we can guarantee that all rows are unique
      Foreign keys are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship)

    Keys can be simple (a single field) or composite (more than one field)

    Keys usually are used as indexes to speed up the response to user queries


    Schema for four relations (Pine Valley Furniture Company)

    Figure annotations:
      Primary key
      Foreign key (implements 1:N relationship between customer and order)
      Combined, these are a composite primary key (uniquely identifies the order line); individually they are foreign keys (implement M:N relationship between order and product)


    Integrity Constraints

    Domain Constraints
      Allowable values for an attribute.

    Entity Integrity
      No primary key attribute may be null. All primary key fields MUST have data


    Domain definitions enforce domain integrity constraints


    Integrity Constraints

    Referential Integrity: the rule states that any foreign key value (on the relation of the many side) MUST match a primary key value in the relation of the one side. (Or the foreign key can be null)

    For example, delete rules:
      Restrict: don't allow delete of parent side if related rows exist in dependent side
      Cascade: automatically delete dependent side rows that correspond with the parent side row to be deleted
      Set-to-Null: set the foreign key in the dependent side to null if deleting from the parent side (not allowed for weak entities)
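    The three delete rules can be tried out with a small sketch using Python's built-in sqlite3 module. The table and column names (`customer`, `orders`, `customer_id`) are hypothetical, and SQLite stands in for whatever DBMS is actually used; its `ON DELETE RESTRICT`, `CASCADE`, and `SET NULL` actions correspond to the rules listed above.

```python
import sqlite3

def rows_left_after_parent_delete(rule):
    """Delete a parent row under the given ON DELETE rule; return remaining child rows."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    con.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY)")
    con.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER REFERENCES customer(customer_id) ON DELETE " + rule + ")"
    )
    con.execute("INSERT INTO customer VALUES (1)")
    con.execute("INSERT INTO orders VALUES (100, 1)")
    try:
        con.execute("DELETE FROM customer WHERE customer_id = 1")
    except sqlite3.IntegrityError:
        pass  # Restrict: the delete is refused because a dependent row exists
    remaining = con.execute("SELECT order_id, customer_id FROM orders").fetchall()
    con.close()
    return remaining
```

    Calling the function with "RESTRICT" leaves the order untouched, "CASCADE" removes it, and "SET NULL" keeps the order but clears its foreign key.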


    Referential integrity constraints

    Referential integrity constraints are drawn via arrows from dependent to parent table


    Referential integrity constraints are implemented with foreign key to primary key references


    Transforming EER Diagrams into Relations

    Mapping Regular Entities to Relations

      1. Simple attributes: E-R attributes map directly onto the relation
      2. Composite attributes: Use only their simple, component attributes
      3. Multivalued attribute: Becomes a separate relation with a foreign key taken from the superior entity


    Mapping a regular entity

    (a) CUSTOMER entity type with simple attributes

    (b) CUSTOMER relation


    Mapping a composite attribute

    (a) CUSTOMER entity type with composite attribute

    (b) CUSTOMER relation with address detail


    Mapping an entity with a multivalued attribute

    (a) Multivalued attribute becomes a separate relation with foreign key

    (b) One-to-many relationship between original entity and new relation


    Transforming EER Diagrams into Relations (cont.)

    Mapping Weak Entities

      Becomes a separate relation with a foreign key taken from the superior entity
      Primary key composed of:
        Partial identifier of weak entity
        Primary key of identifying relation (strong entity)


    Example of mapping a weak entity

    a) Weak entity DEPENDENT


    Example of mapping a weak entity (cont.)

    b) Relations resulting from weak entity

    Figure annotations:
      Foreign key
      Composite primary key

    NOTE: the domain constraint for the foreign key should NOT allow null value if DEPENDENT is a weak entity


    Transforming EER Diagrams into Relations (cont.)

    Mapping Binary Relationships

      One-to-Many: Primary key on the one side becomes a foreign key on the many side
      Many-to-Many: Create a new relation with the primary keys of the two entities as its primary key
      One-to-One: Primary key on the mandatory side becomes a foreign key on the optional side


    Example of mapping a 1:M relationship

    a) Relationship between customers and orders
       Note the mandatory one

    b) Mapping the relationship
       Foreign key: again, no null value in the foreign key; this is because of the mandatory minimum cardinality


    Example of mapping an M:N relationship

    a) Completes relationship (M:N)

    The Completes relationship will need to become a separate relation


    Example of mapping an M:N relationship (cont.)

    b) Three resulting relations

    Figure annotations:
      New intersection relation
      Foreign key
      Foreign key
      Composite primary key
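    A minimal sketch of the M:N mapping, using Python's sqlite3 module. Only the relationship name Completes comes from the slides; the entity tables `employee` and `course`, their columns, and the helper names are hypothetical. The intersection relation carries one foreign key per entity, and the two together form its composite primary key.

```python
import sqlite3

def build_completes_schema():
    """Create two entity tables and an intersection table for an M:N relationship."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    con.executescript("""
        CREATE TABLE employee (employee_id INTEGER PRIMARY KEY);
        CREATE TABLE course (course_id INTEGER PRIMARY KEY);
        CREATE TABLE completes (
            employee_id    INTEGER NOT NULL REFERENCES employee(employee_id),
            course_id      INTEGER NOT NULL REFERENCES course(course_id),
            date_completed TEXT,
            PRIMARY KEY (employee_id, course_id)
        );
    """)
    return con

def violates(con, sql):
    """Return True if running sql breaks a key or foreign key constraint."""
    try:
        con.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

    With this schema, inserting the same (employee, course) pair twice is rejected by the composite primary key, and inserting a pair that names a nonexistent employee is rejected by the foreign key.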


    Example of mapping a binary 1:1 relationship

    a) In_charge relationship (1:1)

    Often in 1:1 relationships, one direction is optional.


    Example of mapping a binary 1:1 relationship (cont.)

    b) Resulting relations

    Foreign key goes in the relation on the optional side, matching the primary key on the mandatory side


    Transforming EER Diagrams into Relations (cont.)

    Mapping Associative Entities

      Identifier Not Assigned
        Default primary key for the association relation is composed of the primary keys of the two entities (as in M:N relationship)
      Identifier Assigned
        It is natural and familiar to end-users
        Default identifier may not be unique


    Example of mapping an associative entity

    a) An associative entity


    Example of mapping an associative entity (cont.)

    b) Three resulting relations

    Composite primary key formed from the two foreign keys


    Example of mapping an associative entity with an identifier

    a) SHIPMENT associative entity


    Example of mapping an associative entity with an identifier (cont.)

    b) Three resulting relations

    Primary key differs from foreign keys


    Transforming EER Diagrams into Relations (cont.)

    Mapping Unary Relationships

      One-to-Many: Recursive foreign key in the same relation
      Many-to-Many: Two relations:
        One for the entity type
        One for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity


    Mapping a unary 1:N relationship

    (a) EMPLOYEE entity with unary relationship

    (b) EMPLOYEE relation with recursive foreign key
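    A recursive foreign key might look like the following sketch in Python's sqlite3 module. The column names `employee_id`, `name`, and `manager_id` and both function names are assumptions made for illustration; the slides only state that EMPLOYEE holds a foreign key that refers back to EMPLOYEE itself.

```python
import sqlite3

def build_employee_table():
    """Create an EMPLOYEE table whose manager_id refers to another EMPLOYEE row."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    con.execute(
        "CREATE TABLE employee ("
        " employee_id INTEGER PRIMARY KEY,"
        " name TEXT,"
        " manager_id INTEGER REFERENCES employee(employee_id))"  # recursive foreign key
    )
    return con

def chain_of_command(con, employee_id):
    """Follow manager_id links upward and return the names met along the way.

    Assumes the employee exists and the management data contains no cycles.
    """
    names = []
    while employee_id is not None:
        name, employee_id = con.execute(
            "SELECT name, manager_id FROM employee WHERE employee_id = ?",
            (employee_id,),
        ).fetchone()
        names.append(name)
    return names
```

    A null `manager_id` marks an employee with no manager, which is why the recursive foreign key is left nullable here.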


    Mapping a unary M:N relationship

    (a) Bill-of-materials relationships (M:N)

    (b) ITEM and COMPONENT relations


    Transforming EER Diagrams into Relations (cont.)

    Mapping Ternary (and n-ary) Relationships

      One relation for each entity and one for the associative entity
      Associative entity has foreign keys to each entity in the relationship


    Mapping a ternary relationship

    a) PATIENT TREATMENT ternary relationship with associative entity


    Mapping a ternary relationship (cont.)

    b) Mapping the ternary relationship PATIENT TREATMENT

    Remember that the primary key MUST be unique. This is why treatment date and time are included in the composite primary key.

    But this makes a very cumbersome key. It would be better to create a surrogate key like Treatment#.


    Transforming EER Diagrams into Relations (cont.)

    Mapping Supertype/Subtype Relationships

      One relation for supertype and for each subtype
      Supertype attributes (including identifier and subtype discriminator) go into supertype relation
      Subtype attributes go into each subtype; primary key of supertype relation also becomes primary key of subtype relation
      1:1 relationship established between supertype and each subtype, with supertype as primary table


    Supertype/subtype relationships


    Mapping Supertype/subtype relationships to relations

    These are implemented as one-to-one relationships


    Data Normalization

      Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data
      The process of decomposing relations with anomalies to produce smaller, well-structured relations


    Well-Structured Relations

    A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies

    Goal is to avoid anomalies:
      Insertion Anomaly: adding new rows forces user to create duplicate data
      Deletion Anomaly: deleting rows may cause a loss of data that would be needed for other future rows
      Modification Anomaly: changing data in a row forces changes to other rows because of duplication

    General rule of thumb: A table should not pertain to more than one entity type


    Example

    Question: Is this a relation?
    Answer: Yes. Unique rows and no multivalued attributes.

    Question: What's the primary key?
    Answer: Composite: Emp_ID, Course_Title


    Anomalies in this Table

      Insertion: can't enter a new employee without having the employee take a class
      Deletion: if we remove employee 140, we lose information about the existence of a Tax Acc class
      Modification: giving a salary increase to employee 100 forces us to update multiple records

    Why do these anomalies exist?
    Because there are two themes (entity types) in this one relation. This results in data duplication and an unnecessary dependency between the entities.


    Functional Dependencies and Keys

    Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute

    Candidate Key:
      A unique identifier. One of the candidate keys will become the primary key
      E.g. perhaps there is both credit card number and SS# in a table; in this case both are candidate keys
      Each non-key field is functionally dependent on every candidate key
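    Whether sample data is consistent with a functional dependency can be tested with a short sketch like the one below. The function name `fd_holds` and the row layout (a list of dicts) are assumptions. Note that a sample can only refute a dependency; a dependency that holds in the sample still has to be confirmed from the business rules.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not contradicted by rows.

    rows: list of dicts; determinant and dependent: lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this key and returns the stored one.
        if seen.setdefault(key, value) != value:
            return False  # same determinant value, different dependent value
    return True
```

    For instance, in rows where one Emp_ID always appears with the same name but with several course titles, Emp_ID -> Name holds in the sample while Emp_ID -> Course_Title does not.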


    Normalization

    Relations can fall into one or more categories (or classes) called Normal Forms.

    Normal Form: A class of relations free from a certain set of modification anomalies.

    Normal forms are given names such as:
      First normal form (1NF)
      Second normal form (2NF)
      Third normal form (3NF)
      Boyce-Codd normal form (BCNF)
      Fourth normal form (4NF)
      Fifth normal form (5NF)

    These forms are cumulative. A relation in Third normal form is also in 2NF and 1NF.


    Steps in normalization


    First Normal Form (1NF)

    A relation is in first normal form if it meets the definition of a relation:
      1. Each column (attribute) value must be a single value only.
      2. All values for a given column (attribute) must be of the same type.
      3. Each column (attribute) name must be unique.
      4. The order of columns is insignificant.
      5. No two rows (tuples) in a relation can be identical.
      6. The order of the rows (tuples) is insignificant.

    If you have a key defined for the relation, then you can meet the unique row requirement.

    Example relation in 1NF:
    STOCKS (Company, Symbol, Date, Close_Price)

      Company    Symbol   Date       Close Price
      IBM        IBM      01/05/94   101.00
      IBM        IBM      01/06/94   100.50
      IBM        IBM      01/07/94   102.00
      Netscape   NETS     01/05/94    33.00
      Netscape   NETS     01/06/94   112.00


    Second Normal Form (2NF)

    A relation is in second normal form (2NF) if all of its non-key attributes are dependent on all of the key.

    Relations that have a single attribute for a key are automatically in 2NF. This is one reason why we often use artificial identifiers as keys.

    In the example below, Close Price is dependent on (Company, Date) and on (Symbol, Date).

    The following example relation is not in 2NF:

    STOCKS (Company, Symbol, Headquarters, Date, Close_Price)

      Company    Symbol   Headquarters    Date       Close Price
      IBM        IBM      Armonk, NY      01/05/94   101.00
      IBM        IBM      Armonk, NY      01/06/94   100.50
      IBM        IBM      Armonk, NY      01/07/94   102.00
      Netscape   NETS     Sunnyvale, CA   01/05/94    33.00
      Netscape   NETS     Sunnyvale, CA   01/06/94   112.00

    Functional dependencies:
      Company, Date -> Close Price
      Symbol, Date -> Close Price
      Company -> Symbol, Headquarters
      Symbol -> Company, Headquarters


    Second Normal Form (2NF)

    Consider that Company, Date -> Close Price.
    So we might use Company, Date as our key.
    However: Company -> Headquarters

    This violates the rule for 2NF. Also, consider the insertion and deletion anomalies.

    One Solution: Split this up into two relations:

    COMPANY (Company, Symbol, Headquarters)

      Company    Symbol   Headquarters
      IBM        IBM      Armonk, NY
      Netscape   NETS     Sunnyvale, CA

      Company -> Symbol, Headquarters
      Symbol -> Company, Headquarters

    STOCKS (Symbol, Date, Close_Price)

      Symbol   Date       Close Price
      IBM      01/05/94   101.00
      IBM      01/06/94   100.50
      IBM      01/07/94   102.00
      NETS     01/05/94    33.00
      NETS     01/06/94   112.00

      Symbol, Date -> Close Price
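    The 2NF split can be checked on the sample data with a short sketch: project the wide STOCKS relation onto the two new attribute sets, then join the pieces back on Symbol and compare with the original. The helper names `project`, `natural_join`, and `as_set` are assumptions; the data is the example table from the slides.

```python
ATTRS = ["Company", "Symbol", "Headquarters", "Date", "Close_Price"]
STOCKS = [dict(zip(ATTRS, values)) for values in [
    ("IBM", "IBM", "Armonk, NY", "01/05/94", 101.00),
    ("IBM", "IBM", "Armonk, NY", "01/06/94", 100.50),
    ("IBM", "IBM", "Armonk, NY", "01/07/94", 102.00),
    ("Netscape", "NETS", "Sunnyvale, CA", "01/05/94", 33.00),
    ("Netscape", "NETS", "Sunnyvale, CA", "01/06/94", 112.00),
]]

def project(rows, attrs):
    """Project rows (dicts) onto attrs, removing duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]

def natural_join(left, right):
    """Join two lists of dicts on every attribute name they share."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in shared)
    ]

def as_set(rows):
    """Order-independent view of a list of dict rows."""
    return {tuple(sorted(row.items())) for row in rows}

company = project(STOCKS, ["Company", "Symbol", "Headquarters"])
prices = project(STOCKS, ["Symbol", "Date", "Close_Price"])
lossless = as_set(natural_join(company, prices)) == as_set(STOCKS)
```

    Headquarters is now stored once per company rather than once per price quote, and joining the two relations recovers exactly the original rows.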


    Third Normal Form (3NF)

    A relation is in third normal form (3NF) if it is in second normal form and it contains no transitive dependencies.

    Consider relation R containing attributes A, B and C.
    If A -> B and B -> C then A -> C

    Transitive Dependency: Three attributes with the above dependencies.

    Example: At CUNY:
      Course_Code -> Course_Num, Section
      Course_Num, Section -> Classroom, Professor

    Example: At Rutgers:
      Course_Index_Num -> Course_Num, Section
      Course_Num, Section -> Classroom, Professor


    Third Normal Form (3NF)

    Example:

      Company   County   Tax Rate
      IBM       Putnam   28%
      AT&T      Bergen   26%

    Company -> County and County -> Tax Rate, thus Company -> Tax Rate

    What happens if we remove AT&T? We lose information about 2 different themes.

    Split this up into two relations:

      Company   County
      IBM       Putnam
      AT&T      Bergen

      Company -> County

      County   Tax Rate
      Putnam   28%
      Bergen   26%

      County -> Tax Rate
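    The derivation "Company -> County and County -> Tax Rate, thus Company -> Tax Rate" is an instance of computing an attribute closure. A minimal sketch follows; the function name `closure` and the representation of dependencies as (left side, right side) pairs are assumptions.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs.

    fds: list of (lhs, rhs) pairs, each side a tuple of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already determined, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

FDS = [(("Company",), ("County",)), (("County",), ("Tax Rate",))]
company_closure = closure({"Company"}, FDS)
county_closure = closure({"County"}, FDS)
```

    Here the closure of Company contains Tax Rate even though no dependency states that directly, which is the transitive dependency the split removes.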


    Boyce-Codd Normal Form (BCNF)

    A relation is in BCNF if every determinant is a candidate key.

    Recall that not all determinants are keys.
    Those determinants that are keys we initially call candidate keys.
    Eventually, we select a single candidate key to be the primary key for the relation.

    Consider the following example:
      Funds consist of one or more Investment Types.
      Funds are managed by one or more Managers.
      Investment Types can have one or more Managers.
      Managers only manage one type of investment.

      FundID   InvestmentType    Manager
      99       Common Stock      Smith
      99       Municipal Bonds   Jones
      33       Common Stock      Green
      22       Growth Stocks     Brown
      11       Common Stock      Smith

    Functional dependencies:
      FundID, Manager -> InvestmentType
      FundID, InvestmentType -> Manager
      Manager -> InvestmentType


    Boyce-Codd Normal Form (BCNF)

    The combination FundID and InvestmentType forms a candidate key because we can use FundID, InvestmentType to uniquely identify a tuple in the relation.

    Similarly, the combination FundID and Manager also forms a candidate key because we can use FundID, Manager to uniquely identify a tuple.

    Manager by itself is not a candidate key because we cannot use Manager alone to uniquely identify a tuple in the relation.

    Is this relation R(FundID, InvestmentType, Manager) in 1NF, 2NF or 3NF?

    Given we pick FundID, InvestmentType as the Primary Key:
      1NF for sure.
      2NF because the only non-key attribute (Manager) is dependent on all of the key.
      3NF because there are no transitive dependencies.

    Consider what happens if we delete the tuple with FundID 22. We lose the fact that Brown manages the InvestmentType "Growth Stocks."

      FundID   InvestmentType    Manager
      99       Common Stock      Smith
      99       Municipal Bonds   Jones
      33       Common Stock      Green
      22       Growth Stocks     Brown
      11       Common Stock      Smith

    Functional dependencies:
      FundID, Manager -> InvestmentType
      FundID, InvestmentType -> Manager
      Manager -> InvestmentType


    Boyce-Codd Normal Form (BCNF)

    The following are steps to normalize a relation into BCNF:
      1. List all of the determinants.
      2. See if each determinant can act as a key (candidate keys).
      3. For any determinant that is not a candidate key, create a new relation from the functional dependency. Retain the determinant in the original relation.

    For our example:
    Rorig(FundID, InvestmentType, Manager)

      1. The determinants are:
           FundID, InvestmentType
           FundID, Manager
           Manager

      2. Which determinants can act as keys?
           FundID, InvestmentType   YES
           FundID, Manager          YES
           Manager                  NO

      3. Create a new relation from the functional dependency:
           Rnew(Manager, InvestmentType)
           Rorig(FundID, Manager)

    In this last step, we have retained the determinant "Manager" in the original relation Rorig.
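    Steps 2 and 3 can be walked through on the sample fund data with a short sketch. The helper names `is_unique` and `project` are assumptions, and testing uniqueness on five sample rows only suggests which determinants can act as keys; the real answer comes from the business rules.

```python
FUNDS = [
    {"FundID": 11, "InvestmentType": "Common Stock", "Manager": "Smith"},
    {"FundID": 22, "InvestmentType": "Growth Stocks", "Manager": "Brown"},
    {"FundID": 33, "InvestmentType": "Common Stock", "Manager": "Green"},
    {"FundID": 99, "InvestmentType": "Municipal Bonds", "Manager": "Jones"},
    {"FundID": 99, "InvestmentType": "Common Stock", "Manager": "Smith"},
]
DETERMINANTS = [("FundID", "InvestmentType"), ("FundID", "Manager"), ("Manager",)]

def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs (in this sample)."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(keys)) == len(keys)

def project(rows, attrs):
    """Project rows onto attrs and drop duplicates, returning sorted tuples."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})

# Step 2: determinants that cannot act as a key for the whole relation.
violations = [d for d in DETERMINANTS if not is_unique(FUNDS, d)]

# Step 3: move Manager -> InvestmentType into its own relation,
# keeping the determinant Manager in the original relation.
r_new = project(FUNDS, ("Manager", "InvestmentType"))
r_orig = project(FUNDS, ("FundID", "Manager"))
```

    After the split, Manager is unique in Rnew, so the fact that Brown manages Growth Stocks is stored independently of any fund.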


    Fourth Normal Form (4NF)

    A relation is in fourth normal form if it is in BCNF and it contains no multivalued dependencies.

    Multivalued Dependency: A type of functional dependency where the determinant can determine more than one value.

    More formally, there are 3 criteria:
      1. There must be at least 3 attributes in the relation. Call them A, B, and C, for example.
      2. Given A, one can determine multiple values of B.
         Given A, one can determine multiple values of C.
      3. B and C are independent of one another.

    Example:
      Student has one or more majors.
      Student participates in one or more activities.


    Fourth Normal Form (4NF)

      StudentID   Major        Activities
      100         CIS          Baseball
      100         CIS          Volleyball
      100         Accounting   Baseball
      100         Accounting   Volleyball
      200         Marketing    Swimming

      StudentID ->> Major
      StudentID ->> Activities

      Portfolio ID   Stock Fund            Bond Fund
      999            Janus Fund            Municipal Bonds
      999            Janus Fund            Dreyfus Short-Intermediate Municipal Bond Fund
      999            Scudder Global Fund   Municipal Bonds
      999            Scudder Global Fund   Dreyfus Short-Intermediate Municipal Bond Fund
      888            Kaufmann Fund         T. Rowe Price Emerging Markets Bond Fund


    Fourth Normal Form (4NF)

    A few characteristics:
      1. No regular functional dependencies
      2. All three attributes taken together form the key.
      3. Latter two attributes are independent of one another.
      4. Insertion anomaly: Cannot add a stock fund without adding a bond fund (NULL Value). Must always maintain the combinations to preserve the meaning.

    Stock Fund and Bond Fund form a multivalued dependency on Portfolio ID.
      PortfolioID ->> Stock Fund
      PortfolioID ->> Bond Fund


    Fourth Normal Form (4NF)

    Resolution: Split into two tables with the common key:

      Portfolio ID   Stock Fund
      999            Janus Fund
      999            Scudder Global Fund
      888            Kaufmann Fund

      Portfolio ID   Bond Fund
      999            Municipal Bonds
      999            Dreyfus Short-Intermediate Municipal Bond Fund
      888            T. Rowe Price Emerging Markets Bond Fund
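    Whether a table exhibits a multivalued dependency can be tested directly from its definition, as in the sketch below: X ->> Y holds when, for any two rows that agree on X, the row combining the Y values of one with the remaining values of the other is also present. The function name `mvd_holds` and the attribute spellings are assumptions; the data is the portfolio table from the slides.

```python
from itertools import product

ATTRS = ["PortfolioID", "StockFund", "BondFund"]
PORTFOLIO = [dict(zip(ATTRS, values)) for values in [
    (888, "Kaufmann Fund", "T. Rowe Price Emerging Markets Bond Fund"),
    (999, "Scudder Global Fund", "Dreyfus Short-Intermediate Municipal Bond Fund"),
    (999, "Scudder Global Fund", "Municipal Bonds"),
    (999, "Janus Fund", "Dreyfus Short-Intermediate Municipal Bond Fund"),
    (999, "Janus Fund", "Municipal Bonds"),
]]

def mvd_holds(rows, x, y):
    """Return True if the multivalued dependency x ->> y holds in rows."""
    if not rows:
        return True
    attrs = list(rows[0])
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            # Take the y values from t1 and every other value from t2.
            mixed = {**t2, **{a: t1[a] for a in y}}
            if tuple(mixed[a] for a in attrs) not in present:
                return False
    return True

stock_mvd = mvd_holds(PORTFOLIO, ["PortfolioID"], ["StockFund"])
```

    The dependency holds only because every stock fund of portfolio 999 is paired with every one of its bond funds; dropping a single combination breaks it, which is the maintenance burden the split into two tables removes.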


    Fifth Normal Form (5NF)

    There are certain conditions under which, after decomposing a relation, it cannot be reassembled back into its original form.


    De-Normalization

    Consider the following relation:

    CUSTOMER (CustomerID, Name, Address, City, State, Zip)

    This relation is not in DK/NF because it contains a functional dependency not implied by the key.

      Zip -> City, State

    We can normalize this into DK/NF by splitting the CUSTOMER relation into two:

    CUSTOMER (CustomerID, Name, Address, Zip)
    CODES (Zip, City, State)

    We may pay a performance penalty: each customer address lookup requires we look in two relations (tables).

    In such cases, we may de-normalize the relations to achieve a performance improvement.
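    The trade-off can be made concrete with a small sqlite3 sketch. The sample values, the table name `customer_denorm`, and the helper names are all hypothetical. The normalized lookup has to join CUSTOMER with CODES, while the denormalized copy answers the same question from a single table at the cost of storing City and State redundantly.

```python
import sqlite3

def build_customer_db():
    """Create the normalized CUSTOMER/CODES tables plus a denormalized copy."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE codes (zip TEXT PRIMARY KEY, city TEXT, state TEXT);
        CREATE TABLE customer (
            customer_id TEXT PRIMARY KEY, name TEXT, address TEXT,
            zip TEXT REFERENCES codes(zip)
        );
        INSERT INTO codes VALUES ('00001', 'Sample City', 'ZZ');
        INSERT INTO customer VALUES ('C1', 'Sample Customer', '1 Sample St.', '00001');
        -- Denormalized copy: city and state are repeated for every customer.
        CREATE TABLE customer_denorm AS
            SELECT c.customer_id, c.name, c.address, z.city, z.state, c.zip
            FROM customer AS c JOIN codes AS z ON z.zip = c.zip;
    """)
    return con

NORMALIZED_LOOKUP = """
    SELECT c.name, c.address, z.city, z.state, c.zip
    FROM customer AS c JOIN codes AS z ON z.zip = c.zip
    WHERE c.customer_id = ?
"""
DENORMALIZED_LOOKUP = """
    SELECT name, address, city, state, zip
    FROM customer_denorm WHERE customer_id = ?
"""

def lookup(con, sql, customer_id):
    """Run one of the lookup queries and return the single matching row."""
    return con.execute(sql, (customer_id,)).fetchone()
```

    Both queries return the same address; whether the saved join is worth the redundancy depends on how often addresses are read versus how often zip code data changes.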


    All-in-One Example

    Many of you asked for a "complete" example that would run through all of the normal forms from beginning to end using the same tables. This is tough to do, but here is an attempt:

    Example relation:
    EMPLOYEE (Name, Project, Task, Office, Floor, Phone)

    Note: The key attributes (underlined on the slide) are Name, Project, and Task.

    Example Data:

      Name   Project   Task   Office   Floor   Phone
      Bill   100X      T1     400      4       1400
      Bill   100X      T2     400      4       1400
      Bill   200Y      T1     400      4       1400
      Bill   200Y      T2     400      4       1400
      Sue    100X      T33    442      4       1442
      Sue    200Y      T33    442      4       1442
      Sue    300Z      T33    442      4       1442
      Ed     100X      T2     588      5       1588


    All-in-One Example

    Name is the employee's name.
    Project is the project they are working on. Bill is working on two different projects, Sue is working on 3.
    Task is the current task being worked on. Bill is now working on Tasks T1 and T2. Note that Tasks are independent of the project. Examples of a task might be faxing a memo or holding a meeting.
    Office is the office number for the employee. Bill works in office number 400.
    Floor is the floor on which the office is located.
    Phone is the phone extension. Note this is associated with the phone in the given office.

    First Normal Form

    Question: Assume the key is Name, Project, Task. Is EMPLOYEE in 1NF?


    All-in-One Example

    Second Normal Form

    List all of the functional dependencies for EMPLOYEE.
    Are all of the non-key attributes dependent on all of the key?

    Split into two relations EMPLOYEE_PROJECT_TASK and EMPLOYEE_OFFICE_PHONE.

    EMPLOYEE_PROJECT_TASK (Name, Project, Task)

      Name   Project   Task
      Bill   100X      T1
      Bill   100X      T2
      Bill   200Y      T1
      Bill   200Y      T2
      Sue    100X      T33
      Sue    200Y      T33
      Sue    300Z      T33
      Ed     100X      T2

    EMPLOYEE_OFFICE_PHONE (Name, Office, Floor, Phone)

      Name   Office   Floor   Phone
      Bill   400      4       1400
      Sue    442      4       1442
      Ed     588      5       1588


    All-in-One Example

    Third Normal Form

    Assume each office has exactly one phone number.
    Are there any transitive dependencies?
    Where are the modification anomalies in EMPLOYEE_OFFICE_PHONE?

    Split EMPLOYEE_OFFICE_PHONE.

    EMPLOYEE_PROJECT_TASK (Name, Project, Task)

      Name   Project   Task
      Bill   100X      T1
      Bill   100X      T2
      Bill   200Y      T1
      Bill   200Y      T2
      Sue    100X      T33
      Sue    200Y      T33
      Sue    300Z      T33
      Ed     100X      T2

    EMPLOYEE_OFFICE (Name, Office, Floor)

      Name   Office   Floor
      Bill   400      4
      Sue    442      4
      Ed     588      5

    EMPLOYEE_PHONE (Office, Phone)

      Office   Phone
      400      1400
      442      1442
      588      1588


    All-in-One Example

    Boyce-Codd Normal Form

    List all of the functional dependencies for EMPLOYEE_PROJECT_TASK, EMPLOYEE_OFFICE and EMPLOYEE_PHONE. Look at the determinants.

    Are all determinants candidate keys?


    All-in-One Example

    Fourth Normal Form

    Are there any multivalued dependencies?
    What are the modification anomalies?

    Split EMPLOYEE_PROJECT_TASK.

    EMPLOYEE_PROJECT_TASK (Name, Project, Task), before the split:

      Name   Project   Task
      Bill   100X      T1
      Bill   100X      T2
      Bill   200Y      T1
      Bill   200Y      T2
      Sue    100X      T33
      Sue    200Y      T33
      Sue    300Z      T33
      Ed     100X      T2

    EMPLOYEE_PROJECT (Name, Project)

      Name   Project
      Bill   100X
      Bill   200Y
      Sue    100X
      Sue    200Y
      Sue    300Z
      Ed     100X

    EMPLOYEE_TASK (Name, Task)

      Name   Task
      Bill   T1
      Bill   T2
      Sue    T33
      Ed     T2


    All-in-One Example

    EMPLOYEE_OFFICE (Name, Office, Floor)

      Name   Office   Floor
      Bill   400      4
      Sue    442      4
      Ed     588      5

    R4 (Office, Phone)

      Office   Phone
      400      1400
      442      1442
      588      1588


    All-in-One Example

    At each step of the process, we did the following:
      1. Write out the relation.
      2. (Optionally) Write out some example data.
      3. Write out all of the functional dependencies.
      4. Starting with 1NF, go through each normal form and state why the relation is in the given normal form.


    Another short example

    Consider the following example of normalization for a CUSTOMER relation.

    Relation Name

    CUSTOMER (CustomerID, Name, Street, City, State, Zip, Phone)

    Example Data

      CustomerID   Name         Street          City            State   Zip     Phone
      C101         Bill Smith   123 First St.   New Brunswick   NJ      07101   732-555-1212
      C102         Mary Green   11 Birch St.    Old Bridge      NJ      07066   908-555-1212

    Functional Dependencies
      CustomerID -> Name, Street, City, State, Zip, Phone
      Zip -> City, State


    Normalization

    1NF: Meets the definition of a relation.

    2NF: All non-key attributes are dependent on all of the key.

    3NF: CustomerID -> Zip and Zip -> City, State form a transitive dependency, so City and State depend on the key only through Zip. Strictly, CUSTOMER therefore already fails 3NF.

    BCNF: Relation CUSTOMER is not in BCNF because one of the determinants, Zip, cannot act as a key for the entire relation.

    Solution: Split CUSTOMER into two relations:
      CUSTOMER (CustomerID, Name, Street, Zip, Phone)
      ZIPCODES (Zip, City, State)

    Check both CUSTOMER and ZIPCODES to ensure they are both in 1NF up to BCNF.

    4NF: There are no multivalued dependencies in either CUSTOMER or ZIPCODES.

    As a final step, consider de-normalization.


    Merging Relations

    View Integration: Combining entities from multiple ER models into common relations

    Issues to watch out for when merging entities from different ER models:
      Synonyms: two or more attributes with different names but same meaning
      Homonyms: attributes with same name but different meanings
      Transitive dependencies: even if relations are in 3NF prior to merging, they may not be after merging
      Supertype/subtype relationships: may be hidden prior to merging


    Enterprise Keys

      Primary keys that are unique in the whole database, not just within a single relation
      Corresponds with the concept of an object ID in object-oriented systems


    Enterprise keys

    a) Relations with enterprise key

    b) Sample data with enterprise key


    Physical Database Design

    Purpose: translate the logical description of data into the technical specifications for storing and retrieving data

    Goal: create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability

    Physical Design Process

    Inputs:

      Normalized relations
      Volume estimates
      Attribute definitions
      Response time expectations
      Data security needs
      Backup/recovery needs
      Integrity expectations
      DBMS technology used

    Leads to decisions:

      Attribute data types
      Physical record descriptions (don't always match logical design)
      File organizations
      Indexes and database architectures
      Query optimization

    Composite usage map

    (Figure: composite usage map, annotated with data volumes and access frequencies per hour)

    Usage analysis:

      140 purchased parts accessed per hour
      80 quotations accessed from these 140 purchased part accesses
      70 suppliers accessed from these 80 quotation accesses

    Usage analysis:

      75 suppliers accessed per hour
      40 quotations accessed from these 75 supplier accesses
      40 purchased parts accessed from these 40 quotation accesses

    Designing Fields

    Field: smallest unit of data in the database

    Field design:

      Choosing data type
      Coding, compression, encryption
      Controlling data integrity

    Choosing Data Types

    CHAR: fixed-length character

    VARCHAR2: variable-length character (memo)

    LONG: large variable-length character data

    NUMBER: positive/negative number

    INTEGER: positive/negative whole number

    DATE: actual date

    BLOB: binary large object (good for graphics, sound clips, etc.)

    Example code look-up table

    Code saves space, but costs an additional lookup to obtain the actual value
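
    The trade-off can be sketched as follows. The table contents and the names `FINISH_LOOKUP`, `products`, and `finish_of` are hypothetical; the slide's own example table is not reproduced here.

```python
# Hypothetical look-up table: a short code stands in for a longer value.
FINISH_LOOKUP = {"A": "Birch", "B": "Maple", "C": "Oak"}

# Each record stores only the one-character code (saves space) ...
products = [
    {"ProductNo": 1, "FinishCode": "B"},
    {"ProductNo": 2, "FinishCode": "C"},
    {"ProductNo": 3, "FinishCode": "B"},
]


def finish_of(product):
    """... but reading the actual value costs one extra look-up."""
    return FINISH_LOOKUP[product["FinishCode"]]


finishes = [finish_of(p) for p in products]
```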

    Field Data Integrity

    Default value: assumed value if no explicit value is given

    Range control: allowable value limitations (constraints or validation rules)

    Null value control: allowing or prohibiting empty fields

    Referential integrity: range control (and null value allowances) for foreign-key to primary-key match-ups

    The Sarbanes-Oxley Act (SOX) legislates the importance of financial data integrity
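
    These four controls map onto standard SQL column constraints. The sketch below uses Python's built-in `sqlite3` module rather than Oracle, and the table layout, the `credit_limit` column, and the helper `accepted` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE zipcodes (
    zip   TEXT PRIMARY KEY,
    city  TEXT NOT NULL,
    state TEXT NOT NULL
);
CREATE TABLE customer (
    customer_id  INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,                     -- null value control
    credit_limit INTEGER DEFAULT 0                  -- default value
                 CHECK (credit_limit BETWEEN 0 AND 10000),  -- range control
    zip          TEXT REFERENCES zipcodes(zip)      -- referential integrity
);
""")
conn.execute("INSERT INTO zipcodes VALUES ('10001', 'New York', 'NY')")


def accepted(sql, params=()):
    """Run one statement; return False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

    A row that omits `credit_limit` receives the default, while an out-of-range limit, a missing name, or an unknown zip code is rejected by the DBMS itself rather than by application code.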

    Handling Missing Data

    Substitute an estimate of the missing value (e.g., using a formula)

    Construct a report listing missing values

    In programs, ignore missing data unless the value is significant (sensitivity testing)

    Triggers can be used to perform these operations

    Physical Records

    Physical record: a group of fields stored in adjacent memory locations and retrieved together as a unit

    Page: the amount of data read or written in one I/O operation

    Blocking factor: the number of physical records per page
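
    As a worked example, the blocking factor can be computed from the page size and the record size. The sketch assumes fixed-length records that never span two pages; the function names and the sample sizes are hypothetical.

```python
def blocking_factor(page_bytes, record_bytes):
    """Number of whole physical records that fit in one page."""
    if record_bytes <= 0 or record_bytes > page_bytes:
        raise ValueError("record must be non-empty and fit within one page")
    return page_bytes // record_bytes


def pages_needed(n_records, page_bytes, record_bytes):
    """Pages required to hold n_records (ceiling division)."""
    per_page = blocking_factor(page_bytes, record_bytes)
    return -(-n_records // per_page)


factor = blocking_factor(4096, 300)      # 13 records per 4096-byte page
pages = pages_needed(1000, 4096, 300)    # 77 pages for 1000 records
```

    With 300-byte records in a 4096-byte page, 13 records fit and 196 bytes per page stay unused, which is one reason record size matters in physical design.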

    Denormalization

    Transforming normalized relations into unnormalized physical record specifications

    Benefits: can improve performance (speed) by reducing the number of table lookups (i.e., reducing the number of necessary join queries)

    Costs (due to data duplication):

      Wasted storage space
      Data integrity/consistency threats

    Common denormalization opportunities:

      One-to-one relationship
      Many-to-many relationship with attributes
      Reference data (1:N relationship where the 1-side has data not used in any other relationship)

    A possible denormalization situation: two entities with a one-to-one relationship

    A possible denormalization situation: a many-to-many relationship with nonkey attributes

    Extra table access required

    Null description possible

    A possible denormalization situation: reference data

    Extra table access required

    Data duplication

    Partitioning

    Horizontal partitioning: distributing the rows of a table into several separate files

      Useful for situations where different users need access to different rows
      Three types: key range partitioning, hash partitioning, or composite partitioning

    Vertical partitioning: distributing the columns of a table into several separate relations

      Useful for situations where different users need access to different columns
      The primary key must be repeated in each file

    Combinations of horizontal and vertical partitioning

    Partitions often correspond with user schemas (user views)
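
    The partitioning schemes above can be sketched as small routing functions. Everything here is an illustrative assumption: the function names, the boundary values, and the use of `zlib.crc32` as the hash are not taken from the slides or from any particular DBMS.

```python
import zlib
from bisect import bisect_right


def key_range_partition(key, upper_bounds):
    """Key range: partition i holds keys below upper_bounds[i];
    keys at or above the last bound go to one extra, final partition."""
    return bisect_right(upper_bounds, key)


def hash_partition(key, n_partitions):
    """Hash: spread rows evenly by hashing the key (CRC32 is deterministic)."""
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions


def vertical_partition(row, key, column_groups):
    """Vertical: split columns into groups, repeating the primary key in each."""
    return [
        {key: row[key], **{col: row[col] for col in cols}}
        for cols in column_groups
    ]


row = {"CustomerID": 7, "Name": "Ann", "Phone": "555-0101", "Balance": 120}
parts = vertical_partition(row, "CustomerID", [["Name", "Phone"], ["Balance"]])
```

    Composite partitioning would apply both routing steps, for example a key range first and a hash within each range.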

    Partitioning (cont.)

    Advantages of partitioning:

      Efficiency: records used together are grouped together
      Local optimization: each partition can be optimized for performance
      Security, recovery
      Load balancing: partitions stored on different disks, which reduces contention
      Takes advantage of parallel processing capability

    Disadvantages of partitioning:

      Inconsistent access speed: slow retrievals across partitions
      Complexity: non-transparent partitioning
      Extra space or update time: duplicate data; access from multiple partitions

    Data Replication

    Purposely storing the same data in multiple locations of the database

    Improves performance by allowing multiple users to access the same data at the same time with minimum contention

    Sacrifices data integrity due to data duplication

    Best for data that is not updated often

    Designing Physical Files

    Physical file: a named portion of secondary memory allocated for the purpose of storing physical records

    Tablespace: named set of disk storage elements in which physical files for database tables can be stored

    Extent: contiguous section of disk space

    Constructs to link two pieces of data:

      Sequential storage
      Pointers: field of data that can be used to locate related fields or records

    Physical file terminology in an Oracle environment

    File Organizations

    Technique for physically arranging records of a file on secondary storage

    Factors for selecting file organization:

      Fast data retrieval and throughput
      Efficient storage space utilization
      Protection from failure and data loss
      Minimizing need for reorganization
      Accommodating growth
      Security from unauthorized use

    Types of file organizations:

      Sequential
      Indexed
      Hashed

    Sequential file organization

    Records of the file (1, 2, ..., n) are stored in sequence by the primary key field values

    If not sorted: average time to find the desired record = n/2

    If sorted: every insert or delete requires a re-sort
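
    The n/2 figure can be checked with a short experiment that counts comparisons in a sequential scan. The function name and the record keys are hypothetical; the exact average for a successful search is (n + 1) / 2, which is approximately n/2 for large n.

```python
import random


def comparisons_to_find(keys, target):
    """Scan the file from the start; return how many records were examined."""
    for examined, key in enumerate(keys, start=1):
        if key == target:
            return examined
    raise KeyError(target)


n = 1000
keys = list(range(n))
random.Random(0).shuffle(keys)  # an unsorted file; the seed only fixes the order

# Average cost of finding each record once.
average = sum(comparisons_to_find(keys, k) for k in keys) / n
```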

    Indexed File Organizations

    Index: a separate table that contains the organization of records for quick retrieval

    Primary keys are automatically indexed

    Oracle has a CREATE INDEX operation, and MS Access allows indexes to be created for most field types

    Indexing approaches:

      B-tree index
      Bitmap index
      Hash index
      Join index

    B-tree index

    Uses a tree search

    Average time to find the desired record = depth of the tree

    Leaves of the tree are all at the same level, giving consistent access time
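
    Since access time tracks the depth of the tree, a rough depth estimate is useful. The sketch below is a simplification under stated assumptions: every node is taken to be full with the same fan-out, and the function name `btree_levels` is hypothetical. Real B-trees keep nodes only partly full, so actual depth can be somewhat larger.

```python
def btree_levels(n_keys, fanout):
    """Smallest number of levels such that fanout ** levels >= n_keys."""
    if n_keys < 1 or fanout < 2:
        raise ValueError("need at least one key and a fan-out of two or more")
    levels, reachable = 1, fanout
    while reachable < n_keys:
        reachable *= fanout
        levels += 1
    return levels


levels_for_million = btree_levels(1_000_000, 100)
```

    With a fan-out of 100, one million keys fit within three levels, which is why B-tree lookups stay fast as tables grow.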

    Hashed file or index organization

    Hash algorithm: usually uses division-remainder to determine record position. Records with the same position are grouped in lists
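
    A minimal sketch of division-remainder hashing with collision lists follows. The class name `HashedFile`, the bucket count of 7, and the sample records are assumptions chosen for illustration, and keys are assumed to be integers.

```python
class HashedFile:
    """Division-remainder hashing; colliding records share a list (bucket)."""

    def __init__(self, n_buckets):
        self.n_buckets = n_buckets
        self.buckets = [[] for _ in range(n_buckets)]

    def position(self, key):
        # Division-remainder: the remainder of key / n_buckets is the position.
        return key % self.n_buckets

    def insert(self, key, record):
        self.buckets[self.position(key)].append((key, record))

    def find(self, key):
        # Only the one bucket for this position has to be searched.
        for stored_key, record in self.buckets[self.position(key)]:
            if stored_key == key:
                return record
        return None


hashed = HashedFile(7)
hashed.insert(10, "record for key 10")
hashed.insert(17, "record for key 17")  # 17 % 7 == 10 % 7 == 3: same position
```

    A prime number of buckets is a common choice because it tends to spread keys more evenly.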

    Bitmap index organization

    Bitmap saves on space requirements

    Rows: possible values of the attribute

    Columns: table rows

    Bit indicates whether the attribute of a row has the value
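
    The layout described above can be sketched with one list of bits per attribute value. The function names and the sample table are hypothetical, and a real DBMS would store compressed bit strings rather than Python lists.

```python
def build_bitmap_index(rows, attribute):
    """One bitmap per distinct value; bit i is 1 if row i has that value."""
    index = {}
    for position, row in enumerate(rows):
        bits = index.setdefault(row[attribute], [0] * len(rows))
        bits[position] = 1
    return index


def bitmap_and(bits_a, bits_b):
    """Combine two conditions with a bitwise AND."""
    return [a & b for a, b in zip(bits_a, bits_b)]


# Hypothetical sample table.
rows = [
    {"id": 1, "color": "red", "size": "S"},
    {"id": 2, "color": "blue", "size": "L"},
    {"id": 3, "color": "red", "size": "L"},
]
by_color = build_bitmap_index(rows, "color")
by_size = build_bitmap_index(rows, "size")

# Rows that are both red and size L.
red_and_large = bitmap_and(by_color["red"], by_size["L"])
```

    Bitmaps suit attributes with few distinct values, since each value costs only one bit per table row.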

    Join indexes: speed up join operations
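
    One way to picture a join index is a precomputed list of matching row positions from the two tables, so the join does not have to be recomputed for each query. The function name, the table contents, and the use of list positions as row IDs are assumptions for illustration.

```python
def build_join_index(left_rows, right_rows, left_key, right_key):
    """Return (left_row_id, right_row_id) pairs whose join columns match."""
    positions = {}
    for right_id, row in enumerate(right_rows):
        positions.setdefault(row[right_key], []).append(right_id)
    return [
        (left_id, right_id)
        for left_id, row in enumerate(left_rows)
        for right_id in positions.get(row[left_key], [])
    ]


# Hypothetical sample tables.
customers = [
    {"CustomerID": 1, "Zip": "10001"},
    {"CustomerID": 2, "Zip": "60601"},
    {"CustomerID": 3, "Zip": "10001"},
]
zipcodes = [
    {"Zip": "10001", "City": "New York"},
    {"Zip": "60601", "City": "Chicago"},
]

join_index = build_join_index(customers, zipcodes, "Zip", "Zip")

# Answering the join later only follows the stored pairs.
joined = [
    (customers[c]["CustomerID"], zipcodes[z]["City"]) for c, z in join_index
]
```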
