Page 1: 4. Relational Databases

4. Relational Databases

Levels of Abstraction in data defined by various “schema” levels (ANSI schema model)

Schemas are defined using Data Definition Languages or DDLs; data are modified/queried using Data Manipulation Languages or DMLs.

• Many views (View 1, View 2, View 3)

– Views describe how users see data (possibly different data models for different views).

• Conceptual (logical) schema

– Conceptual schema defines the logical structure of the entire data enterprise.

• Physical schema

– Physical schema describes the underlying files and indexes used.
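The three schema levels can be sketched with Python's standard `sqlite3` module. This is only an illustration: the Students attributes follow the schema used later in these notes, while the index name `students_by_age` and the view name `StudentLogins` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Conceptual (logical) schema: the relation and its attributes, defined with DDL.
conn.execute(
    "CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)

# Physical schema: an index is a storage-level choice; it changes no query result.
conn.execute("CREATE INDEX students_by_age ON Students (age)")

# View level: one group of users sees only names and logins.
conn.execute("CREATE VIEW StudentLogins AS SELECT name, login FROM Students")

# DML modifies and queries the data.
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")
view_rows = conn.execute("SELECT * FROM StudentLogins").fetchall()
```

Dropping the index would leave `view_rows` unchanged, which is the point of separating the physical level from the other two.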

Structure of a DBMS

• A typical DBMS has a layered architecture (top to bottom):

Query Optimization and Execution

Relational Operators

Files and Access Methods

Buffer Management

Disk Space Management

DB

• These layers must consider concurrency control and recovery.

• This is one of several possible architectures.

• Another with a little more detail on next slide.

Structure of a DBMS (in more detail)

Indented lines name the interface between the two adjacent layers.

QUERIES from users (or Transactions or user-workload requests)

  SQL (or some other User Interface Language)

QUERY OPTIMIZATION LAYER

  Relational Operators (Select, Project, Join)

DATABASE OPERATOR LAYER

  File processing operators (open, close file, read/write record)

FILE MANAGER LAYER (provides the file concept)

  Buffer management operators (read/flush page)

BUFFER MANAGER LAYER

  Disk transfer operators (malloc, read/write block)

DISK SPACE MANAGER LAYER

DB on DISK

DISK SPACE MANAGER

deals with space on disk

offers an interface to higher layers (mainly the BUFFER MGR) consisting of: allocate/deallocate space; read/write block

can be implemented directly on a raw disk system; then it would likely access data as follows: read block b of track t of cylinder c on disk d

or can use the OS file system (OS file = sequence of bytes); then it would likely access data as follows: read bytes b of file f, and the Operating System file manager would translate that into: read block b of track t of cylinder c on disk d

most systems do not use the OS file system

- for portability reasons,

- to avoid OS file size peculiarities (limitations)
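As a rough sketch of the second option (a disk space manager layered on an OS file), the interface above can be written in Python as follows. The class name, the 4096-byte block size, and the free-list reuse policy are assumptions made for the illustration, not details from these notes.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size in bytes


class DiskSpaceManager:
    """Toy disk space manager that keeps fixed-size blocks in one OS file."""

    def __init__(self, path):
        self.file = open(path, "w+b")   # create the backing file for binary read/write
        self.num_blocks = 0
        self.free_blocks = []           # deallocated block numbers available for reuse

    def allocate_block(self):
        """Return a usable block number, reusing a freed block if there is one."""
        if self.free_blocks:
            return self.free_blocks.pop()
        block_no = self.num_blocks
        self.num_blocks += 1
        self.write_block(block_no, bytes(BLOCK_SIZE))  # zero-fill the new block
        return block_no

    def deallocate_block(self, block_no):
        self.free_blocks.append(block_no)

    def write_block(self, block_no, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block must be exactly BLOCK_SIZE bytes")
        self.file.seek(block_no * BLOCK_SIZE)
        self.file.write(data)

    def read_block(self, block_no):
        # "read block b" becomes "read BLOCK_SIZE bytes at offset b * BLOCK_SIZE";
        # the OS then maps that byte range to physical disk blocks.
        self.file.seek(block_no * BLOCK_SIZE)
        return self.file.read(BLOCK_SIZE)

    def close(self):
        self.file.close()


path = os.path.join(tempfile.mkdtemp(), "db.dat")
dsm = DiskSpaceManager(path)
b0 = dsm.allocate_block()
b1 = dsm.allocate_block()
dsm.write_block(b1, b"x" * BLOCK_SIZE)
first, second = dsm.read_block(b0), dsm.read_block(b1)
dsm.deallocate_block(b0)
reused = dsm.allocate_block()   # the freed block number is handed out again
dsm.close()
```

A raw-disk implementation would keep the same four operations and replace only the seek/read/write calls.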

BUFFER MANAGER

partitions the main memory allocated to the DBMS into buffer page frames,

brings pages to and from disk as requested by higher layers (mainly the FILE MGR).
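A minimal sketch of a buffer pool, in Python, may make the idea of page frames concrete. The least-recently-used replacement policy, the class and method names, and the use of a plain dict as a stand-in for the disk are all assumptions of this example; real buffer managers also pin pages, which is omitted here.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer pool: a fixed number of page frames with LRU replacement."""

    def __init__(self, disk, num_frames):
        self.disk = disk                # stand-in for the disk: page_no -> bytes
        self.num_frames = num_frames
        self.frames = OrderedDict()     # page_no -> [data, dirty], least recently used first
        self.disk_reads = 0

    def get_page(self, page_no):
        """Return a page, reading it from disk only if it is not already buffered."""
        if page_no in self.frames:
            self.frames.move_to_end(page_no)        # now the most recently used
        else:
            if len(self.frames) >= self.num_frames:
                self._evict()
            self.frames[page_no] = [self.disk[page_no], False]
            self.disk_reads += 1
        return self.frames[page_no][0]

    def write_page(self, page_no, data):
        """Change a page in memory; it reaches disk on flush or eviction."""
        self.get_page(page_no)
        self.frames[page_no] = [data, True]

    def flush_page(self, page_no):
        data, dirty = self.frames[page_no]
        if dirty:
            self.disk[page_no] = data
            self.frames[page_no][1] = False

    def _evict(self):
        victim = next(iter(self.frames))            # least recently used page
        self.flush_page(victim)
        del self.frames[victim]


disk = {0: b"page0", 1: b"page1", 2: b"page2"}
pool = BufferManager(disk, num_frames=2)
pool.get_page(0)
pool.get_page(0)                  # served from the buffer, no disk read
pool.write_page(1, b"PAGE1")      # dirty in memory only
on_disk_before_eviction = disk[1]
pool.get_page(2)                  # pool is full: page 0 is evicted
pool.get_page(0)                  # page 1 is evicted and flushed first
```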

FILE MANAGER

supports the file concept to higher layers (DBMS file = collection of records and pages of records)

supports access paths to the data in those files (e.g., Indexes).

Not all higher-level DBMS code recognizes/uses the page concept.

Almost all DBMSs use the record concept, though.

DATABASE OPERATOR LAYER

implements physical data model operators (e.g., relational operators: select, project, join...)

QUERY OPTIMIZER

produces efficient execution plans for answering user queries (e.g., execution plans as trees of relational operators: select, project, join, union, intersect, translated from, e.g., SQL queries).
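To illustrate what "an execution plan as a tree of relational operators" can look like, here is a small Python sketch in which each operator consumes the tuples produced by its children. The function names and the dict-per-tuple representation are choices made for this example; the sample rows are a subset of the Students and Enrolled data used later in these notes.

```python
def scan(relation):
    """Leaf operator: yield each tuple (a dict) of a stored relation."""
    yield from relation


def select(child, predicate):
    """Keep only the tuples for which predicate(t) is true."""
    return (t for t in child if predicate(t))


def project(child, attributes):
    """Keep only the named attributes of each tuple."""
    return ({a: t[a] for a in attributes} for t in child)


def join(left, right, condition):
    """Nested-loop join: pair every left tuple with every matching right tuple."""
    right_rows = list(right)            # materialize once so it can be rescanned
    for l in left:
        for r in right_rows:
            if condition(l, r):
                yield {**l, **r}


students = [
    {"sid": 53666, "name": "Jones"},
    {"sid": 53650, "name": "Smith"},
]
enrolled = [
    {"sid": 53831, "cid": "Carnatic101", "grade": "C"},
    {"sid": 53650, "cid": "Topology112", "grade": "A"},
    {"sid": 53666, "cid": "History105", "grade": "B"},
]

# Plan tree for:
#   SELECT S.name, E.cid FROM Students S, Enrolled E
#   WHERE S.sid = E.sid AND E.grade = 'A'
# The selection on grade is placed below the join, one reordering an
# optimizer might choose because it shrinks the join input.
plan = project(
    join(
        scan(students),
        select(scan(enrolled), lambda e: e["grade"] == "A"),
        lambda s, e: s["sid"] == e["sid"],
    ),
    ["name", "cid"],
)
result = list(plan)
```

Placing the `select` above the `join` instead would give the same `result`; only the amount of intermediate work changes.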

SQL is not adequate to answer all user-database questions; e.g., knowledge workers working on Data Warehouses ask "what if" questions (On-Line Analytic Processing or OLAP), not retrieval questions (SQL).

Overview of Database Design

Conceptual design:

– What are the entities and relationships in the enterprise?

– What information about these entities and relationships should be stored in the database?

– What integrity constraints or business rules should be enforced?

A database `schema’ model diagram answers these questions pictorially (Entity-Relationship or ER diagrams).

Then one maps the ER diagrams into a relational schema (using the Data Definition Language provided).

Entity: Real-world object type distinguishable from other object types.

An entity is described using a set of attributes. Each attribute has a domain (allowable value universe).

Each entity set has a key (the chosen identifier attribute(s), underlined in these notes).

[Figure: entity Employee with attributes ssn (key), name, lot]

ER Model (Cont.)

Relationship: Association among two or more entities. E.g., Employee Jones works in Pharmacy department.

[Figure: Employee (ssn, name, lot) connected through Works_In (since) to Department (did, dname, budget)]

Degree=2 relationship between entities, Employees and Departments.

Degree=2 relationship between an entity and itself? E.g., Employee Reports_To Employee. Must specify the “role” of each entity to distinguish them.

[Figure: Employee (ssn, name, lot) connected to itself through Reports_To, with roles supervisor and subordinate]

Relationships can have attributes too!

Relationship Cardinality Constraints

• (many-to-many) Works_In: An employee can work in many departments. A dept can have many employees working in it.

[Figure: Employee (ssn, name, lot) m, Works_In (since), n Department (did, dname, budget)]

• (1-many) e.g., Manages: It may be required that each dept has at most 1 manager.

[Figure: Employee (ssn, name, lot) 1, Manages (since), m Department (did, dname, budget)]

• (1-1) Manages: In addition it may be required that each manager manages at most 1 department.

[Figure: Employee (ssn, name, lot) 1, Manages (since), 1 Department (did, dname, budget)]

Cardinality types: 1-to-1, 1-to-many, many-to-1, many-to-many.
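One common way to carry these cardinality constraints into a relational schema is through the choice of key on the relationship's table. The sketch below uses Python's `sqlite3`; the table layouts, column types, and sample values are assumptions for illustration, not a mapping prescribed by these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Many-to-many Works_In: the key is the (ssn, did) pair, so an employee may
# appear with many departments and a department with many employees.
conn.execute(
    "CREATE TABLE Works_In (ssn TEXT, did INTEGER, since TEXT, PRIMARY KEY (ssn, did))"
)

# 1-to-many Manages: making did the key allows at most one manager per department.
conn.execute(
    "CREATE TABLE Manages (ssn TEXT, did INTEGER, since TEXT, PRIMARY KEY (did))"
)

conn.execute("INSERT INTO Works_In VALUES ('111', 51, '2001')")
conn.execute("INSERT INTO Works_In VALUES ('111', 56, '2003')")  # same employee, second dept
conn.execute("INSERT INTO Works_In VALUES ('222', 51, '2002')")  # same dept, second employee

conn.execute("INSERT INTO Manages VALUES ('111', 51, '2004')")
conn.execute("INSERT INTO Manages VALUES ('111', 56, '2005')")   # one employee, two depts: allowed
try:
    conn.execute("INSERT INTO Manages VALUES ('222', 51, '2006')")  # second manager for dept 51
    second_manager_rejected = False
except sqlite3.IntegrityError:
    second_manager_rejected = True

works_in_count = conn.execute("SELECT COUNT(*) FROM Works_In").fetchone()[0]
```

For the 1-1 variant, adding `UNIQUE (ssn)` to Manages would also stop one employee from managing two departments.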

Participation Constraints

Every department may have to have a manager? This is an example of a total participation constraint: the participation of Department in Manages is said to be total (vs. partial).

[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages (since) and Works_In (since); the participation of Departments in Manages is marked total]

ISA (`is a’) Hierarchies

[Figure: Employee (ssn, name, lot) with ISA subclasses Hourly_Emp (hourly_wages, hours_worked) and Contract_Emp (contractid); annotated Covering: yes, Overlap: allowed]

If we declare an ISA relationship among entity types, e.g., A ISA B (every instance of entity A is also an instance of entity B), then A entities “inherit” B entity attributes.

e.g., every Hourly_Emp ISA Employee and every Contract_Emp ISA Employee. Hourly_Emps and Contract_Emps can have their own separate attributes also.

We can use attribute inheritance to save repeating shared attributes.

Overlap constraints: Can Joe be an Hourly_Emp and a Contract_Emp? (Allowed/disallowed)

Covering constraints: Does every Employee entity also have to be an Hourly_Emp or a Contract_Emp entity? (Yes/no)
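Attribute inheritance in an ISA hierarchy behaves much like subclassing in a programming language. The Python sketch below mirrors the hierarchy on this slide; the attribute types and Joe's sample values are made up for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class Employee:
    ssn: str
    name: str
    lot: int


@dataclass
class HourlyEmp(Employee):       # Hourly_Emp ISA Employee
    hourly_wages: float
    hours_worked: float


@dataclass
class ContractEmp(Employee):     # Contract_Emp ISA Employee
    contractid: str


# ssn, name and lot are declared once, on Employee, and inherited by both subclasses.
joe = HourlyEmp(ssn="000-00-0001", name="Joe", lot=48, hourly_wages=10.0, hours_worked=40.0)
hourly_attrs = [f.name for f in fields(HourlyEmp)]
```

Python's single-class instances model the "overlap disallowed" case; allowing Joe to be both an Hourly_Emp and a Contract_Emp would need a different representation.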

Relational Database: Working Definitions

• Relational database: a set of relations

• Relation: made up of 2 parts:

– Instance or occurrence : a table, with rows and columns. #Rows = cardinality, #fields = degree

– Schema or type: specifies name of relation & name, type of each attribute

• Students(sid: string, name: string, login: string, age: integer, gpa: real).

• Strictly, a relation is a set of tuples but it is common to think of it as a table (sequence of rows made up of a sequence of attribute values)
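The distinction between schema and instance, and the terms degree and cardinality, can be checked on a small example with Python's `sqlite3`. The two sample rows are the ones used on the SQL slides of these notes; mapping `string` to `TEXT` and `real` to `REAL` is an assumption about SQLite types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the relation name plus the name and type of each attribute.
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)")

# Instance: the rows currently stored.
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@ee", 18, 3.2),
    ],
)

schema = [(col[1], col[2]) for col in conn.execute("PRAGMA table_info(Students)")]
cursor = conn.execute("SELECT * FROM Students")
degree = len(cursor.description)      # number of fields
instance = set(cursor.fetchall())     # strictly a set of tuples: order is irrelevant
cardinality = len(instance)           # number of rows
```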

Relational Query Languages

• A major strength of the relational model: supports simple, powerful querying of data.

• Queries can be written intuitively (specifying what, not how); the DBMS is responsible for evaluation.

• The DBMS does your programming!

– Allows a module called the optimizer to extensively re-order operations (even combine similar operations from different concurrent requests), and still ensure that the answer does not change.

The SQL Query Language

• Developed by IBM (System R) in the 1970s

• Need standards since it is used by many vendors

• Standards:

– SQL-86

– SQL-89 (minor revision)

– SQL-92 (major revision)

– SQL-99 (major extensions)

– Procedural constructs (if-then-else, loops, procs)

– OO constructs (inheritance, polymorphism, …)

SQL Query Language

• One of the simplest languages on earth, very English-like! Specify what, not how.

• E.g., SELECT attributes FROM relations WHERE condition

SELECT: what columns you want. WHERE: what rows you want.

Students instance:

sid    name   login     age  gpa
53666  Jones  jones@cs  18   3.4
53688  Smith  smith@ee  18   3.2

• Find all students with a gpa of 3.4 (a selection):

SELECT *
FROM Students S
WHERE S.gpa=3.4

sid    name   login     age  gpa
53666  Jones  jones@cs  18   3.4

• To find just names and logins (a projection), replace the 1st line:

SELECT S.name, S.login
FROM Students S
WHERE S.gpa=3.4

name   login
Jones  jones@cs
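The selection and projection on this slide can be run as-is against a small in-memory database. The sketch below uses Python's `sqlite3` with the two sample rows shown above and the condition `gpa = 3.4` for both queries; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@ee", 18, 3.2),
    ],
)

# Selection: WHERE picks the rows; * keeps every column.
selection = conn.execute("SELECT * FROM Students S WHERE S.gpa = 3.4").fetchall()

# Projection: the same rows, but only the listed columns.
projection = conn.execute(
    "SELECT S.name, S.login FROM Students S WHERE S.gpa = 3.4"
).fetchall()
```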

Querying Multiple Relations (Join, implemented using nested loop – alternative 1)

• What does the following query produce?

SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.grade='A'

WHERE is also used to combine (join) S & E.

Students:

sid    name   login     age  gpa
53666  Jones  jones@cs  18   3.4
53650  Smith  smith@ee  18   3.2

Enrolled:

sid    cid          grade
53831  Carnatic101  C
53831  Reggae203    B
53650  Topology112  A
53666  History105   B

Jones (53666) is joined with History105, but the select on grade fails. Smith (53650) is joined with Topology112 and the select succeeds, so we get:

S.name  E.cid
Smith   Topology112
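The nested-loop evaluation can be written out by hand and compared with what a DBMS returns for the same query. The Python sketch below does both, using `sqlite3` for the SQL side; the variable names and column types are choices made for this example.

```python
import sqlite3

students = [
    (53666, "Jones", "jones@cs", 18, 3.4),
    (53650, "Smith", "smith@ee", 18, 3.2),
]
enrolled = [
    (53831, "Carnatic101", "C"),
    (53831, "Reggae203", "B"),
    (53650, "Topology112", "A"),
    (53666, "History105", "B"),
]

# Nested loop: for every Students tuple, scan all of Enrolled.
joined, nested_loop_result = [], []
for sid, name, login, age, gpa in students:
    for e_sid, cid, grade in enrolled:
        if sid == e_sid:                       # join condition S.sid = E.sid
            joined.append((name, cid, grade))
            if grade == "A":                   # selection E.grade = 'A'
                nested_loop_result.append((name, cid))

# The same query, left to the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid INTEGER, name TEXT, login TEXT, age INTEGER, gpa REAL)")
conn.execute("CREATE TABLE Enrolled (sid INTEGER, cid TEXT, grade TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", students)
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", enrolled)
sql_result = conn.execute(
    "SELECT S.name, E.cid FROM Students S, Enrolled E "
    "WHERE S.sid = E.sid AND E.grade = 'A'"
).fetchall()
```

Here `joined` holds both pairs that pass the join condition, and only the Smith pair survives the selection on grade.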

Destroying and Altering Relations (also DDL)

• Destroys the relation Students. The schema information and the tuples are deleted.

DROP TABLE Students

• The schema of Students is altered by adding a new field; every tuple in the current instance is extended, e.g., with a null value in the new field.

ALTER TABLE Students ADD COLUMN Year INTEGER
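Both statements can be tried on a scratch database. This Python `sqlite3` sketch alters the table first and drops it afterwards so that the effect of each is visible; the sample row and column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)")
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")

# Existing tuples are extended with a null in the new field.
conn.execute("ALTER TABLE Students ADD COLUMN Year INTEGER")
extended = conn.execute("SELECT sid, Year FROM Students").fetchall()

# Dropping the table removes both the schema information and the tuples.
conn.execute("DROP TABLE Students")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```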

Adding and Deleting Tuples

• Can insert a single tuple using:

INSERT INTO Students (sid, name, login, age, gpa)
VALUES (53688, 'Smith', 'smith@ee', 18, 3.2)

• Can delete all tuples satisfying some condition (e.g., name = Smith):

DELETE FROM Students S
WHERE S.name = 'Smith'

• Many powerful variants of these commands are available!
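A short Python `sqlite3` sketch of the same insert and delete follows. It passes values as query parameters and omits the table alias in the DELETE, both choices made for this example rather than taken from the slide; the Jones row is extra sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)")

insert_sql = "INSERT INTO Students (sid, name, login, age, gpa) VALUES (?, ?, ?, ?, ?)"
conn.execute(insert_sql, ("53688", "Smith", "smith@ee", 18, 3.2))
conn.execute(insert_sql, ("53666", "Jones", "jones@cs", 18, 3.4))

# Delete every tuple that satisfies the condition; rowcount reports how many went.
deleted = conn.execute("DELETE FROM Students WHERE name = ?", ("Smith",)).rowcount
remaining = conn.execute("SELECT name FROM Students").fetchall()
```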

Views

• A view is a relation constructible from stored or base relations. Store a definition of it, rather than the instance (actual tuples).

CREATE VIEW YoungActiveStudents (name, grade)
AS SELECT S.name, E.grade
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND S.age<21

• Views can be dropped using the DROP VIEW command. How to handle DROP TABLE if there’s a view on the table? The DROP TABLE command has options to let the user specify this.

• Views can be used to present necessary information (or a summary), while hiding details in underlying relation(s).
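Because only the definition of a view is stored, its contents follow the base relations. The Python `sqlite3` sketch below creates the view before any data exists and queries it twice; it names the view columns with `AS` aliases instead of a column list, and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid INTEGER, name TEXT, login TEXT, age INTEGER, gpa REAL)")
conn.execute("CREATE TABLE Enrolled (sid INTEGER, cid TEXT, grade TEXT)")

# Only this definition is stored; no tuples are copied into the view.
conn.execute(
    """
    CREATE VIEW YoungActiveStudents AS
    SELECT S.name AS name, E.grade AS grade
    FROM Students S, Enrolled E
    WHERE S.sid = E.sid AND S.age < 21
    """
)

conn.execute("INSERT INTO Students VALUES (53666, 'Jones', 'jones@cs', 18, 3.4)")
conn.execute("INSERT INTO Enrolled VALUES (53666, 'History105', 'B')")
before = conn.execute("SELECT name, grade FROM YoungActiveStudents").fetchall()

# New base tuples show up in the view without touching its definition.
conn.execute("INSERT INTO Students VALUES (53650, 'Smith', 'smith@math', 19, 3.8)")
conn.execute("INSERT INTO Enrolled VALUES (53650, 'Topology112', 'A')")
after = conn.execute(
    "SELECT name, grade FROM YoungActiveStudents ORDER BY name"
).fetchall()

conn.execute("DROP VIEW YoungActiveStudents")
```

The view exposes name and grade only, so sid, login, age, and gpa stay hidden from its users.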

Integrity Constraints (ICs)

• IC: condition that must be true for any instance in the database; e.g., domain constraints.

• ICs are specified when (or after) relations are created.

– ICs are checked when relations are modified.

• A legal instance of a relation is one that satisfies all its ICs.

– DBMS should not allow illegal instances.

– Avoids data entry errors, too!

Primary Key Constraints

• A set of fields is a key (strictly speaking, a candidate key) for a relation if it satisfies:

1. (Uniqueness condition) No two distinct tuples can have the same values in the key (which may be a composite).

2. (Minimality condition) The Uniqueness condition is not true for any proper subset of a composite key.

– If Part 2 is false, it’s called a superkey (for superset of a key).

– There’s always at least one key for a relation; one of the keys is chosen (by the DBA) to be the primary key, the primary record identification or lookup column(s).

• E.g., sid is a key for Students. The set {sid, gpa} is a superkey.

Entity Integrity: No column of the primary key can contain a null value.
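The two conditions can be tested against a single instance with a few lines of Python. The function names are made up for this sketch, and the three sample rows are the Students rows used on the foreign key slides. A check like this can show that a set of fields is not a key, but passing it on one instance does not prove that the fields are a key for the relation.

```python
from itertools import combinations


def is_superkey(rows, fields):
    """Uniqueness: no two distinct tuples agree on all of the given fields."""
    seen = set()
    for row in rows:
        value = tuple(row[f] for f in fields)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, fields):
    """Uniqueness plus minimality: no proper subset is itself unique."""
    if not is_superkey(rows, fields):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(fields))
        for subset in combinations(fields, size)
    )


students = [
    {"sid": 53666, "name": "Jones", "login": "jones@cs", "age": 18, "gpa": 3.4},
    {"sid": 53688, "name": "Smith", "login": "smith@eecs", "age": 18, "gpa": 3.2},
    {"sid": 53650, "name": "Smith", "login": "smith@math", "age": 19, "gpa": 3.8},
]

sid_is_key = is_candidate_key(students, ["sid"])
sid_gpa_is_superkey = is_superkey(students, ["sid", "gpa"])
sid_gpa_is_key = is_candidate_key(students, ["sid", "gpa"])   # fails minimality
name_is_superkey = is_superkey(students, ["name"])            # two Smiths
```

Note that `is_candidate_key(students, ["gpa"])` is also true for these three rows, even though gpa is clearly not a key in general.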

Foreign Keys and Referential Integrity

• Foreign key: A field (or set of fields) in one relation used to `refer’ to a tuple in another relation (by listing the primary key value in the second relation). Like a `logical pointer’.

• E.g., sid in Enrolled is a foreign key referring to sid in Students (sid is the primary key of Students).

– If all foreign key constraints are enforced, a special integrity constraint, referential integrity, is achieved, i.e., no dangling references.

– E.g., if Referential Integrity is enforced (and it almost always is), an Enrolled record cannot have a sid that is not present in Students (students cannot enroll in courses until they register in the school).

Foreign Keys

• Only students listed in the Students relation should be allowed to enroll for courses.

Enrolled

sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B

Students

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8

Enforcing Referential Integrity

• Consider Students and Enrolled; sid in Enrolled is a foreign key that references Students.

• What should be done if an Enrolled tuple with a non-existent student id is inserted? (Reject it!)

• What should be done if a Students tuple is deleted?

– Also delete all Enrolled tuples that refer to it?

– Disallow that deletion if an Enrolled tuple refers to it?

– Set sid in Enrolled tuples that refer to it to a default sid?

– (Sometimes there is a “default default”, e.g., set sid in Enrolled tuples to a special value null, denoting `not applicable’, if no other default is specified.)

Referential Integrity in SQL

• SQL supports all 4 options on deletes and updates.

– Default action = NO ACTION (the violating delete/update request is rejected)

– CASCADE (also delete all tuples that refer to deleted tuple)

– SET NULL / SET DEFAULT (sets foreign key value of referencing tuple)

CREATE TABLE Enrolled (
  sid CHAR(20),
  cid CHAR(20),
  grade CHAR(2),
  PRIMARY KEY (sid, cid),
  FOREIGN KEY (sid) REFERENCES Students
    ON DELETE CASCADE
    ON UPDATE SET NULL)
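The reject-on-insert and cascade-on-delete behaviour can be observed with Python's `sqlite3`. Two things in this sketch are specific to SQLite or to the example rather than to these notes: foreign keys are enforced only after `PRAGMA foreign_keys = ON`, and the two-column Students table, the student id 99999, and the omission of the ON UPDATE clause are simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20))")
conn.execute(
    """
    CREATE TABLE Enrolled (
        sid CHAR(20), cid CHAR(20), grade CHAR(2),
        PRIMARY KEY (sid, cid),
        FOREIGN KEY (sid) REFERENCES Students ON DELETE CASCADE)
    """
)
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones')")
conn.execute("INSERT INTO Enrolled VALUES ('53666', 'History105', 'B')")

# Inserting an Enrolled tuple for a student who does not exist is rejected.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('99999', 'Reggae203', 'B')")
    dangling_insert_rejected = False
except sqlite3.IntegrityError:
    dangling_insert_rejected = True

# Deleting the student also deletes the Enrolled tuples that refer to it.
conn.execute("DELETE FROM Students WHERE sid = '53666'")
enrolled_after_delete = conn.execute("SELECT * FROM Enrolled").fetchall()
```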

Where do ICs Come From?

• ICs are based on the semantics of the real-world enterprise that is being described in the database. I.e., the users decide semantics, not the DB experts! Why?

• We can check a database instance to see if an IC is violated, but we can NEVER infer an IC by only looking at the data instances.

• An IC is a statement about all possible instances!

• An IC is not a statement that can be inferred from the set of currently existing instances.

• If ICs were inferred from current instances, then when a relation is newly created and has, say, just 2 tuples, many, many ICs would be inferred (e.g., in

sid    name   login     age  gpa
53666  Jones  jones@cs  18   3.4
53688  Smith  smith@ee  18   3.5

the system might infer that students MUST be 18, or that names have to be 5 characters or, worse yet, that gpa ranking must be the same as alphabetical name ordering!)

• Key and foreign key ICs are the most common.

• The next slides deal with the IC of choosing keys.

Who decides primary key? (and other design choices?)

• The Database design expert?

– NO! Not in isolation, anyway.

– Someone from the enterprise who understands the data and the procedures should be consulted.

– The following story illustrates this point.

CAST:

– Mr. Goodwrench = MG (parts manager);

– Pointy-headed Dbexpert = Ph D

Ph D: I've looked at your data, and decided Part Number (P#) will be designated the primary key for the relation, PARTS(P#, COLOR, WT, TIME-OF-ARRIVAL).

MG: You're the expert.

Ph D: Well, according to what I’ve learned in school, P# should be the primary key, because IT IS the lookup attribute!

. . . later

MG: Why is lookup so slow?

Ph D: You do store parts in the stock room ordered by P#, right?

MG: No. We store by weight! When a shipment comes in, I take each part into the back room and throw it as far as I can. The lighter ones go further than the heavy ones so they get ordered by weight!

Ph D: But, but… weight doesn't have the Uniqueness property! Parts with the same weight end up together in a pile!

MG: No they don't. I tire quickly, so the first one I throw goes furthest.

Ph D: Then we’ll use a composite primary key, (weight, time-of-arrival).

MG: We get our keys primarily from Curt’s Lock and Key.

The point is: This conversation should have taken place during the 1st meeting.

An ER Example: COMPANY is described to us as follows:

1. The company is organized into depts, each with a name, number, manager.

   - Each manager has a startdate.

   - Each department can have several locations.

2. Departments control projects, each with a name, number, location.

3. Each employee has a name, SSN, sex, address, salary, birthdate, dept, supervisor.

   - An employee may work on several projects (not necessarily all controlled by his dept), for which we keep hoursworked by project.

4. Each employee dependent has a name, sex, birthdate and relationship.

In ER diagrams, entities are represented in boxes: |EMPLOYEE| |DEPENDENT| |DEPT| |PROJECT|

An attribute (or property) of an entity describes that entity.

An ENTITY has a TYPE, including name and list of its attributes.

ENTITY TYPE SCHEMA describes the common structure shared by all entities of that type, e.g.:

Project (Name, Num, Location, Dept)

ENTITY INSTANCE = individual occurrence of an entity of a particular type at a particular time, e.g.:

(Dome, 46, 19 Ave N & Univ, Athletics)

(IACC, 52, Bolley & Centennial, C.S.)

(Bean Res, 31, 12 Ave N & Federal, P.S.) . . .

Entity Type does not change often - very static.

Entity instances get added, changed often - very dynamic

An ER Example continued: ATTRIBUTES are written next to the Entity they describe. For the COMPANY example:

DEPARTMENT: Name, Number, Locations, Manager, ManagerStartDate

PROJECT: Name, Number, Location, ControlDepartment

EMPLOYEE: Name, SSN, Sex, Address, Salary, BirthDate, Department, Supervisor, WorksOn

DEPENDENT: Employee, DependentName, Sex, BirthDate, Relationship

An ER Example: CATEGORIES OF ATTRIBUTES:

COMPOSITE ATTRIBUTE: an attribute that is subdivided into smaller parts with independent meaning.

e.g., Name attribute of Employee may be subdivided into FName, Minit, LName.

Indicated: Name (FName, Minit, LName)

Also, WorksOn may be a composite attribute of Employee, made up of Project and Hours: WorksOn (Project, Hours)

SINGLE-VALUED ATTRIBUTE: one value per entry.

MULTIVALUED ATTRIBUTE (repeating group): has multiple values per entry.

e.g., Locations (as an attribute of Department, since a Department can have multiple locations)

- To indicate a multivalued attribute, use braces: {Locations}

- WorksOn may be a multivalued attribute of Employee as well as composite: {WorksOn (Project, Hours)}

DERIVED ATTRIBUTE: an attribute whose value can be calculated from other attribute values.

e.g., Age calculated from BirthDate and CurrentDate.

KEY ATTRIBUTE: Each value can occur at most once (has the uniqueness property).

Used to identify entity instances. We will mark key attribute(s) with *.

ATTRIBUTE DOMAIN: Set of values that may be assigned (also called Value Set).
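The attribute categories above map naturally onto ordinary data structures. The following Python sketch models an Employee with a composite Name, a multivalued and composite WorksOn, and a derived Age; the class layout, the sample SSN, birth date, and hours are all invented for the illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Name:                       # composite attribute: Name (FName, Minit, LName)
    fname: str
    minit: str
    lname: str


@dataclass
class WorksOn:                    # composite attribute: WorksOn (Project, Hours)
    project: str
    hours: float


@dataclass
class Employee:
    ssn: str                      # key attribute
    name: Name
    birth_date: date              # single-valued attribute
    works_on: list = field(default_factory=list)   # multivalued: {WorksOn (Project, Hours)}

    def age(self, current_date):
        """Derived attribute: computed from BirthDate and CurrentDate, never stored."""
        years = current_date.year - self.birth_date.year
        birthday_pending = (current_date.month, current_date.day) < (
            self.birth_date.month,
            self.birth_date.day,
        )
        return years - 1 if birthday_pending else years


emp = Employee("123456789", Name("John", "Q", "Smith"), date(1980, 6, 15))
emp.works_on.append(WorksOn("Dome", 12.5))
emp.works_on.append(WorksOn("IACC", 20.0))
```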

Thus the Preliminary Design of Entity Types for the COMPANY db is (* marks key attributes):

DEPARTMENT: *Name, *Number, Locations, Manager, ManagerStartDate

PROJECT: *Name, *Number, Location, ControlDepartment

EMPLOYEE: Name, *SSN, Sex, Address, Salary, BirthDate, Department, Supervisor, WorksOn

DEPENDENT: Employee, *DependentName, Sex, BirthDate, Relationship

An ER Example continued: RELATIONSHIPS express associations among entities.

Relationships have RELATIONSHIP TYPEs (consisting of the names of the entities and the name of the relationship).

A relationship type between EMPLOYEE and DEPARTMENT called "WorksFor" is diagrammed (in a roundish box):

|EMPLOYEE|-( WorksFor )-|DEPARTMENT|

RELATIONSHIP INSTANCEs for the above relationship might be, e.g.:

( John Q. Smith, Athletics )

( Fred T. Brown, Comp. Sci.)

( Betty R. Hahn, Business ) . . .

RELATIONSHIP DEGREE: Number of participating entities (usually 2)

If an entity participates more than once in the same relationship, then ROLE NAMES are needed to distinguish multiple participations.

e.g., Supervisor, Supervisee in Supervision relationship

- Such relationships are called Reflexive Relationships.

- Role names are unnecessary if entity types are distinct.

One decision that has to be made is whether an attribute or a relationship is the appropriate way to model, e.g., "WorksOn". Above we modeled it as an attribute of EMPLOYEE:

{WorksOn(Project,Hours)}

The fact that it is multivalued and composite (involving another entity, Project) suggests that it would be better to model it as a relationship (i.e., it makes a very complex attribute!)

WORKS_FOR(EMPLOYEE, DEPARTMENT)

An ER Example continued:

CONSTRAINTS ON A RELATIONSHIP

CARDINALITY CONSTRAINT can be

1-to-1

many-to-1

1-to-many or

many-to-many

1 to 1: MANAGES(EMPLOYEE, DEPARTMENT)

- Each manager MANAGES 1 dept

- Each dept is MANAGED-BY 1 manager

Many to 1: WORKS_FOR(EMPLOYEE, DEPARTMENT)

- Each employee WORKS_FOR 1 dept

- Each dept is WORKED_FOR by many emps

Many to Many: WORKS_ON(EMPLOYEE, PROJECT)

- Each employee WORKS_ON many projects

- Each project is WORKED_ON by many employees

PARTICIPATION CONSTRAINT (for an entity in a relationship) can be Total, Partial or Min-Max

Total: Every EMPLOYEE WORKS_FOR some DEPARTMENT

Partial: Not every EMPLOYEE MANAGES some DEPT

RELATIONSHIP can have ATTRIBUTES (properties) as well: eg, Hours for WORKS_ON Relationship, Manager_Start_Date in MANAGES relationship.

An ER Example continued: 6 RELATIONSHIPS (role names, if any, and participation are given with each entity)

CARDINALITY  RELATIONSHIP   ENTITIES (participation)
-----------  -------------  ------------------------
1:1          MANAGES        EMPLOYEE (partial), DEPARTMENT (total)
1:many       WORKS_FOR      DEPARTMENT (total), EMPLOYEE (total)
many:many    WORKS_ON       EMPLOYEE (total), PROJECT (total)
1:many       CONTROLS       DEPARTMENT (partial), PROJECT (total)
1:many       SUPERVISION    EMPLOYEE as supervisor (partial), EMPLOYEE as supervisee (partial)
1:many       DEPENDENTS_OF  EMPLOYEE (partial), DEPENDENT (total)

SUPERVISION is a reflexive relationship with role names.

An ER Example continued: COMPANY Entity-Relationship Diagram (showing the Schema). Double connecting lines mean "total" participation, while a single line means partial participation.

[Figure: COMPANY ER diagram. Entities and attributes (* marks key attributes):

DEPARTMENT: *Name, *Number, {Locations}, number_employees

EMPLOYEE: Name(FN, Mi, LN), *SSN, Sex, Address, Salary, BirthDate

PROJECT: *Name, *Number, Location

DEPENDENT: *DependentName, Sex, BirthDate, Relationship

Relationships: MANAGES (EMPLOYEE 1 : 1 DEPARTMENT), WORKS_FOR (DEPARTMENT 1 : many EMPLOYEE), CONTROLS (DEPARTMENT 1 : many PROJECT), SUPERVISE (EMPLOYEE as supervisor 1 : many EMPLOYEE as supervisee), WORKS_ON (EMPLOYEE many : many PROJECT, with attribute Hours), Dependent_Of (EMPLOYEE 1 : many DEPENDENT)]

Thank you.