1
CS 564 Database Management Systems: Design and Implementation
Lecture 2: Relational Model; ER to Relational
Chapter 3 in Cow Book
Slide ACKs: AnHai Doan, Jeff Naughton, and Jignesh Patel
Arun Kumar
2
@Wait list students:
You must have gotten an invite to enroll
to Sec 2. If not, email me before EOD!
3
General Dos and Do NOTs
Do:
  Raise your hand before asking questions during Lectures
  Participate in class discussions and use our Piazza page
  Use “[CS564]” as subject prefix for all emails to me/TAs
Do NOT:
  Take this class if you cannot attend on Fridays also
  Use laptops, tablets, mobile phones, or any other electronic devices during Lectures
  Use email as primary communication mechanism for doubts/questions instead of Office Hours and Discussions
  Record/quote my anecdotes outside of class!
5
Surprise ER Review Exercise!
Cool. Now, please draw a full-fledged ER diagram for Netflix’s
“movie recommendation system” with two Entities: Users and
Movies, and one Relationship: Ratings.
Attach the following attributes appropriately (reuse allowed):
UserID, MovieID, RatingID, NumStars, Name, Timestamp, Age,
Director, ReleaseDate, JoinDate
6
Review Exercise Answer
[ER diagram: entity User (UserID, Name, Age, JoinDate); entity Movie (MovieID, Name, Director, ReleaseDate); relationship Rating (RatingID, NumStars, Timestamp) connecting User and Movie]
7
Database Design Process: ~6 steps
1. Requirements Analysis
2. Conceptual Database Design (Entity-Relationship Modeling)
3. Logical Database Design (Relational Model and Normalization)
4. Schema Refinement (Relational Model and Normalization)
5. Physical Database Design (Indexing, etc.)
6. Application and Security Design
8
Relational data model in a nutshell
Basically, Relation:Table :: Pilot:Driver (okay, a bit more)
The model formalizes “operations” to manipulate relations
RatingID NumStars Timestamp UserID MovieID
1 3.5 08/27/15 19 20
2 4.0 07/20/15 4232 293
3 2.5 08/02/15 54551 846
… … … … …
Ratings
10
ER Model vs. Relational Model
Key purpose: Ease of capturing user app requirements vs. ease of (semi-)automated management by computer
Concepts and structure: Many concepts in a rich, complex graph vs. single, simple, “flat” concept: “relation”
Data management functionality: No notion of “operations” vs. rich “algebra” of relational operations
11
Relational Model: Basic Terms
RatingID NumStars Timestamp UserID MovieID
1 3.5 08/27/15 79 20
2 4.0 07/20/15 4232 293
3 2.5 08/02/15 54551 846
… … … … …
What is a Relation?
A glorified table!
What are Attributes?
The named columns (RatingID, NumStars, Timestamp, UserID, MovieID)
What are Domains?
The mathematical “domains” for the attributes (e.g., Integers, Reals, …)
What is Arity?
Number of attributes
Ratings
12
Relational Model: Basic Terms
RatingID NumStars Timestamp UserID MovieID
1 3.5 08/27/15 79 20
2 4.0 07/20/15 4232 293
3 2.5 08/02/15 54551 846
… … … … …
What are Tuples?
The rows of the table
What is Cardinality?
Number of tuples
Ratings
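A minimal sketch of arity and cardinality, assuming we model the Ratings relation in Python as a set of plain tuples (the variable names `attributes` and `ratings` are made up for illustration, not part of the slides):

```python
# Hypothetical in-memory stand-in for the Ratings relation shown on the slide.
attributes = ("RatingID", "NumStars", "Timestamp", "UserID", "MovieID")
ratings = {
    (1, 3.5, "08/27/15", 79, 20),
    (2, 4.0, "07/20/15", 4232, 293),
    (3, 2.5, "08/02/15", 54551, 846),
}

arity = len(attributes)      # number of attributes (columns)
cardinality = len(ratings)   # number of tuples (rows)
print(arity, cardinality)    # 5 3
```

Using a Python set mirrors the idea that a relation instance is a set of tuples, so the same tuple cannot appear twice.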
13
Referring to “tuples”: Two notations
1. Without using attribute names (positional/sequence)
2. Using attribute names (named/set)
RatingID NumStars Timestamp UserID MovieID
1 3.5 08/27/15 79 20
2 4.0 07/20/15 4232 293
3 2.5 08/02/15 54551 846
… … … … …
Ratings (R)
A tuple t (here, the first row):
t[1] = 3.5 (positional, counting attribute positions from 0)
t.NumStars = 3.5 (named)
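The two notations can be tried out with a Python `namedtuple`, which supports both styles on the same object. This is only an illustration: the type name `Rating` is an assumption, and Python counts positions from 0, which happens to line up with t[1] = 3.5 on the slide.

```python
from collections import namedtuple

# Hypothetical tuple type with the attribute names of the Ratings relation.
Rating = namedtuple("Rating", ["RatingID", "NumStars", "Timestamp", "UserID", "MovieID"])
t = Rating(1, 3.5, "08/27/15", 79, 20)

print(t[1])        # positional access (0-based) -> 3.5
print(t.NumStars)  # named access -> 3.5
```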
14
Relational Model: Basic Terms
RatingID NumStars Timestamp UserID MovieID
1 3.5 08/27/15 79 20
2 4.0 07/20/15 4232 293
3 2.5 08/02/15 54551 846
… … … … …
What is Schema?
The relation name, and the name and logical
descriptions of the attributes (including domains)
Aka “metadata”
Ratings
15
Relational Model: Basic Terms
What is an Instance?
A given relation populated with a set of tuples
(loose analogy: schema:instance::type:value in PL)
Ratings, Instance 1:
RatingID NumStars Timestamp UserID MovieID
1 3.5 08/27/15 79 20
2 4.0 07/20/15 4232 293
3 2.5 08/02/15 54551 846
… … … … …
Ratings, Instance 2:
RatingID NumStars Timestamp UserID MovieID
3292 1.5 06/27/14 794 10
294122 4.0 07/10/14 232 329
74423 0.5 03/08/14 8451 846
… … … … …
16
Relational Model: Basic Terms
What is a Relational Database?
A collection of relations; similarly, schema vs. instance
Ratings
RatingID NumStars Timestamp UserID MovieID
1 3.5 08/27/15 79 20
… … … … …
Users
UserID Name Age JoinDate
79 Alice 23 01/10/13
… … … …
Movies
MovieID Name ReleaseDate Director
20 Inception 07/13/2010 Christopher Nolan
… … … …
18
Spot six differences!
[ER diagram: entity User (UserID, Name, Age, JoinDate); entity Movie (MovieID, Name, Director, ReleaseDate); relationship Rating (RatingID, NumStars, Timestamp) connecting User and Movie]
19
Spot six differences!
Ratings
RatingID NumStars Timestamp UserID MovieID
1 3.5 08/27/15 79 20
… … … … …
Users
UserID Name Age JoinDate
79 Alice 23 01/10/13
… … … …
Movies
MovieID Name ReleaseDate Director
20 Inception 07/13/2010 Christopher Nolan
… … … …
20
ER Model vs. Relational Model
Key purpose: Ease of capturing user app requirements vs. ease of (semi-)automated management by computer
Concepts and structure: Many concepts in a rich, complex graph vs. single, simple, “flat” concept: “relation”
Data management functionality: No notion of “operations” vs. rich “algebra” of relational operations
21
“Write Operations” on a Relation
Insert
Add tuples to a relation
Delete
Remove tuples from a relation (typically based on “predicate” matches, e.g., “NumStars <= 4.5”)
Modify
Logically, deletes + inserts, but typically
implemented as in-place updates to a relation instance
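The three write operations can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions: the trimmed-down Ratings table and the threshold 3.0 are chosen for the example (the slide's predicate “NumStars <= 4.5” would delete every sample row).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Ratings (RatingID INTEGER PRIMARY KEY, NumStars REAL, "
    "UserID INTEGER, MovieID INTEGER)"
)

# Insert: add tuples to the relation.
conn.executemany(
    "INSERT INTO Ratings VALUES (?, ?, ?, ?)",
    [(1, 3.5, 79, 20), (2, 4.0, 4232, 293), (3, 2.5, 54551, 846)],
)

# Delete: remove every tuple that matches a predicate.
conn.execute("DELETE FROM Ratings WHERE NumStars <= 3.0")

# Modify: logically a delete plus an insert, written as an in-place update.
conn.execute("UPDATE Ratings SET NumStars = 5.0 WHERE RatingID = 2")

remaining = conn.execute(
    "SELECT RatingID, NumStars FROM Ratings ORDER BY RatingID"
).fetchall()
print(remaining)  # [(1, 3.5), (2, 5.0)]
```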
22
“Read Operations” on a Relation
“Select”
Select all tuples from Ratings with “UserID == 19”
“Project”
Select only Director attribute from Movies
“Aggregate”
Select Average of all NumStars in Ratings
And a few more formal operations …
(Sneak peek: SQL to express both the write/read ops!)
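As a rough preview, the three read operations above map onto SQL queries like the following, run here through Python's `sqlite3` module. The cut-down table definitions and sample rows are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ratings (RatingID INTEGER, NumStars REAL, UserID INTEGER, MovieID INTEGER)")
conn.execute("CREATE TABLE Movies (MovieID INTEGER, Name TEXT, Director TEXT)")
conn.executemany(
    "INSERT INTO Ratings VALUES (?, ?, ?, ?)",
    [(1, 3.5, 19, 20), (2, 4.0, 4232, 293), (3, 2.5, 54551, 846)],
)
conn.execute("INSERT INTO Movies VALUES (20, 'Inception', 'Christopher Nolan')")

# "Select": keep only the tuples that satisfy a predicate.
selected = conn.execute("SELECT * FROM Ratings WHERE UserID = 19").fetchall()

# "Project": keep only some of the attributes.
projected = conn.execute("SELECT Director FROM Movies").fetchall()

# "Aggregate": collapse many tuples into one summary value.
average = conn.execute("SELECT AVG(NumStars) FROM Ratings").fetchone()[0]

print(selected)   # [(1, 3.5, 19, 20)]
print(projected)  # [('Christopher Nolan',)]
```

Note that the SQL keyword SELECT covers all three: the WHERE clause does the relational “select”, and the column list does the “project”.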
23
Bottom line:
ER model is for conceptual schema
modeling; no notion of operations
Relational model includes operations on
data; implementable as fast software
25
ER to Relational: Entity as a Table
[ER diagram: entity User (UserID, Name, Age, JoinDate); entity Movie (MovieID, Name, Director, ReleaseYear); relationship Rating (RatingID, NumStars, Timestamp)]
Users
UserID Name Age JoinDate
79 Alice 23 01/10/13
… … … …
Entity Name → Relation Name
Attribute Names → Attribute Names
Entity Set → Relation Instance (Tuples)
26
ER to Relational: Key Constraint
UserID uniquely identifies a User
Underline it in the relation too!
“Primary Key”
[ER diagram: entity User (UserID, Name, Age, JoinDate)]
Users
UserID Name Age JoinDate
79 Alice 23 01/10/13
… … … …
27
ER to Relational: More Examples
[ER diagram: entity Movie (MovieID, Name, Director, ReleaseDate)]
Movies
MovieID Name ReleaseDate Director
20 Inception 07/13/2010 Christopher Nolan
… … … …
29
Introducing SQL
Structured English QUEry Language (SEQUEL); TL;DR name is SQL
Invented at - you guessed it - IBM!
30
What is SQL?
SQL is a querying language for relational data
Simple English-based syntax, but precise, formal semantics (compiled down to relational algebra)
Key advantages:
  Physical Data Independence (“how” data is stored on machine independent of “what”, i.e., SQL queries)
  Logical Data Independence (notion of views in SQL enables simpler queries on same schema)
31
Major SQL Components
Data Definition Language (DDL)
Data Manipulation Language (DML)
Embedded and Dynamic SQL
Triggers and Cursors
Security
Transaction Management
Remote Database Access
32
Creating a table for an Entity in SQL
Movies
MovieID Name ReleaseDate Director
20 Inception 07/13/2010 Christopher Nolan
… … … …
CREATE TABLE Movies (
    MovieID INTEGER,
    Name CHAR(30),
    ReleaseDate DATE,
    Director CHAR(20),
    PRIMARY KEY (MovieID));
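To see the key constraint at work, the statement above can be run as-is in SQLite through Python's `sqlite3` module. This is a sketch, assuming SQLite as the engine (the slides do not name one); the second, duplicate row is invented for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movies (
        MovieID INTEGER,
        Name CHAR(30),
        ReleaseDate DATE,
        Director CHAR(20),
        PRIMARY KEY (MovieID))
""")
conn.execute(
    "INSERT INTO Movies VALUES (20, 'Inception', '07/13/2010', 'Christopher Nolan')"
)

# A second tuple with the same MovieID violates the primary key.
try:
    conn.execute("INSERT INTO Movies VALUES (20, 'Made-up duplicate', NULL, NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```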
33
Relationship as a Relation?
[ER diagram: entity User (UserID, Name, Age, JoinDate); entity Movie (MovieID, Name, Director, ReleaseYear); relationship Rating (RatingID, NumStars, Timestamp)]
Ratings (UserID and MovieID are “Foreign Keys”)
RatingID NumStars Timestamp UserID MovieID
1 3.5 08/27/15 79 20
… … … … …
34
Table for Relationships in SQL
CREATE TABLE Ratings(
    RatingID INTEGER,
    NumStars REAL,
    Timestamp DATE,
    UserID INTEGER,
    MovieID INTEGER,
    PRIMARY KEY (RatingID),
    FOREIGN KEY (UserID) REFERENCES Users(UserID),
    FOREIGN KEY (MovieID) REFERENCES Movies(MovieID))
Ratings
RatingID NumStars Timestamp UserID MovieID
1 3.5 08/27/15 79 20
… … … … …
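A small sketch of what the foreign keys buy us, assuming SQLite via Python's `sqlite3` module. Two caveats: SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and the Users and Movies tables here are cut down to their keys plus a name for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default
conn.execute("CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name CHAR(30))")
conn.execute("CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Name CHAR(30))")
conn.execute("""
    CREATE TABLE Ratings(
        RatingID INTEGER,
        NumStars REAL,
        Timestamp DATE,
        UserID INTEGER,
        MovieID INTEGER,
        PRIMARY KEY (RatingID),
        FOREIGN KEY (UserID) REFERENCES Users(UserID),
        FOREIGN KEY (MovieID) REFERENCES Movies(MovieID))
""")
conn.execute("INSERT INTO Users VALUES (79, 'Alice')")
conn.execute("INSERT INTO Movies VALUES (20, 'Inception')")

# This rating refers to an existing user and movie, so it is accepted.
conn.execute("INSERT INTO Ratings VALUES (1, 3.5, '08/27/15', 79, 20)")

# This one refers to a user and movie that are not in the database.
try:
    conn.execute("INSERT INTO Ratings VALUES (2, 4.0, '07/20/15', 4232, 293)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

rating_count = conn.execute("SELECT COUNT(*) FROM Ratings").fetchone()[0]
print(dangling_rejected, rating_count)  # True 1
```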
35
Q: Why not this?
AllStuff
UserID Name Age JoinDate RatingID NumStars Timestamp MovieID Name ReleaseYear Director
79 Alice 23 01/10/13 1 3.5 08/27/15 20 Inception 2010 Christopher Nolan
… … … … … … … … … … …
(Sneak peek: Redundancy in the data!
Waste of storage; causes write anomalies!
Mitigated by Schema normalization)
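One way to see a write anomaly concretely is the sketch below, which uses a cut-down AllStuff table in SQLite. The column subset, the second rating row (RatingID 4), and the “careless” update are all invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AllStuff (UserID INTEGER, UserName TEXT, Age INTEGER, "
    "RatingID INTEGER, NumStars REAL, MovieID INTEGER)"
)

# Alice's name and age are repeated once per rating she has made (redundancy).
conn.executemany(
    "INSERT INTO AllStuff VALUES (?, ?, ?, ?, ?, ?)",
    [(79, "Alice", 23, 1, 3.5, 20), (79, "Alice", 23, 4, 4.0, 293)],
)

# An update that touches only one of her rows leaves the copies disagreeing.
conn.execute("UPDATE AllStuff SET Age = 24 WHERE RatingID = 1")

ages = conn.execute(
    "SELECT DISTINCT Age FROM AllStuff WHERE UserID = 79 ORDER BY Age"
).fetchall()
print(ages)  # [(23,), (24,)]
```

With a separate Users table, Alice's age would be stored exactly once, so this inconsistency could not arise.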
36
How to represent self-relationships?
[ER diagram: entity Movie (MovieID, Name, Director, ReleaseYear) with a self-relationship Mention]
Mention (2 Foreign Keys)
MovieID MentionMovieID
20 313
… …
Q: How to express in SQL?
Q: What is the primary key?
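One possible answer to both questions, offered as a sketch and not as the official solution: both columns reference Movies(MovieID), and the pair of columns together serves as the primary key. The example assumes SQLite through Python's `sqlite3` module; the name of movie 313 is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for enforcement
conn.execute("CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Name CHAR(30))")
conn.execute("""
    CREATE TABLE Mention (
        MovieID INTEGER,
        MentionMovieID INTEGER,
        PRIMARY KEY (MovieID, MentionMovieID),
        FOREIGN KEY (MovieID) REFERENCES Movies(MovieID),
        FOREIGN KEY (MentionMovieID) REFERENCES Movies(MovieID))
""")
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?)",
    [(20, "Inception"), (313, "Placeholder title")],
)
conn.execute("INSERT INTO Mention VALUES (20, 313)")

# A mention of a movie that does not exist is rejected by the second foreign key.
try:
    conn.execute("INSERT INTO Mention VALUES (20, 999)")
    unknown_movie_rejected = False
except sqlite3.IntegrityError:
    unknown_movie_rejected = True

mentions = conn.execute("SELECT * FROM Mention").fetchall()
print(mentions)  # [(20, 313)]
```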
37
NULL Values in Relations
[ER diagram: entity User (UserID, Name, Age, JoinDate)]
(Sneak peek: A headache for SQL query processing!)
Users
UserID Name Age JoinDate
79 Alice 23 01/10/13
48 John NULL 04/08/15
… … … …
NULL “stands in” for attribute
values that are missing/unknown
SQL has “NOT NULL”, e.g., “Age REAL NOT NULL”
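A short sketch of why NULL is a headache, assuming SQLite via Python's `sqlite3` module: a row with an unknown Age matches neither a predicate nor its negation. The second table, `UsersStrict`, is a made-up name used to show “Age REAL NOT NULL”.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name CHAR(30), Age REAL, JoinDate DATE)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?, ?)",
    [(79, "Alice", 23, "01/10/13"), (48, "John", None, "04/08/15")],  # None is stored as NULL
)

# John appears in neither result: comparing NULL with anything yields "unknown".
older = conn.execute("SELECT Name FROM Users WHERE Age > 20").fetchall()
not_older = conn.execute("SELECT Name FROM Users WHERE NOT (Age > 20)").fetchall()
unknown = conn.execute("SELECT Name FROM Users WHERE Age IS NULL").fetchall()
print(older, not_older, unknown)  # [('Alice',)] [] [('John',)]

# With NOT NULL, the missing value is rejected at insert time instead.
conn.execute("CREATE TABLE UsersStrict (UserID INTEGER PRIMARY KEY, Name CHAR(30), Age REAL NOT NULL)")
try:
    conn.execute("INSERT INTO UsersStrict VALUES (48, 'John', NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
```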
38
What about many-to-one?
[ER diagram: entity Student (SID, Name, Age); entity Department (DID, Name, Address); relationship Major between Student and Department]
Students
SID Name Age
79 Alice 19
13 Bob 21
48 John NULL
… … …
Department
DID Name Address
CS Computer Sciences 1210 …
ST Statistics Blah
… … …
Major
SID DID
79 CS
48 ST
… …
39
What about many-to-one?
[ER diagram: entity Student (SID, Name, Age); entity Department (DID, Name, Address); relationship Major between Student and Department]
Students (MajorDID: Foreign Key?)
SID Name Age MajorDID
79 Alice 19 CS
13 Bob 21 NULL
48 John NULL ST
… … … …
Department
DID Name Address
CS Computer Sciences 1210 …
ST Statistics Blah
… … …
40
More Advanced Stuff
Integrity constraints
Key constraints
Referential integrity and participation constraints
Weak entity set
“Is A” hierarchy
41
Integrity Constraint (IC)
A logical condition (invariant) that must hold true
on any instance of a given database schema
A legal relation instance satisfies all ICs
Overuse/abuse of ICs is a danger!
ICs are part of the schema; they cannot be inferred exactly from the data!
Two main types:
Key Constraint
Referential Integrity Constraint
42
Key Constraint in SQL
Key vs. Superkey
Primary key vs. Candidate key vs. Alternate key
Movies
MovieID Name ReleaseDate Director IMDB_URL
CREATE TABLE Movies (
    MovieID INTEGER,
    Name CHAR(30),
    ReleaseDate DATE,
    Director CHAR(20),
    IMDB_URL CHAR(20),
    PRIMARY KEY (MovieID),
    UNIQUE (IMDB_URL))
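A quick check of the UNIQUE (candidate key) constraint, sketched with SQLite through Python's `sqlite3` module. The URL string and the second movie are placeholders invented for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movies (
        MovieID INTEGER,
        Name CHAR(30),
        ReleaseDate DATE,
        Director CHAR(20),
        IMDB_URL CHAR(20),
        PRIMARY KEY (MovieID),
        UNIQUE (IMDB_URL))
""")
conn.execute(
    "INSERT INTO Movies VALUES (20, 'Inception', '07/13/2010', 'Christopher Nolan', 'placeholder-url-a')"
)

# A different MovieID is fine for the primary key, but the repeated URL is not allowed.
try:
    conn.execute("INSERT INTO Movies VALUES (21, 'Made-up movie', NULL, NULL, 'placeholder-url-a')")
    url_duplicate_rejected = False
except sqlite3.IntegrityError:
    url_duplicate_rejected = True

movie_count = conn.execute("SELECT COUNT(*) FROM Movies").fetchone()[0]
print(url_duplicate_rejected, movie_count)  # True 1
```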
43
Referential Integrity Constraint (RIC)
A Foreign Key value must match the key of an existing tuple in the referenced relation (and, ideally, should not be NULL)!
[ER diagram: entity Student (SID, Name, Age); entity Department (DID, Name, Address); relationship Major between Student and Department]
Students (MajorDID: Foreign Key?)
SID Name Age MajorDID
79 Alice 19 CS
13 Bob 21 NULL
48 John NULL ST
… … … …
Department
DID Name Address
CS Computer Sciences 1210 …
ST Statistics Blah
… … …
45
Enforcing RIC
When a Department tuple that Students tuples reference is deleted, we have 3 options:
1. Refuse to allow the deletion!
2. Delete all tuples in Students that reference the deleted DID in Department
3. Set the corresponding DID in Students to some default value, or in the worst case, NULL
46
Enforcing RIC in SQL
Refuse to allow the deletion!
CREATE TABLE Student(
    SID INTEGER,
    Name CHAR(30),
    Age INTEGER,
    DID INTEGER,
    PRIMARY KEY (SID),
    FOREIGN KEY (DID) REFERENCES Department(DID)
        ON DELETE NO ACTION)
47
Enforcing RIC in SQL
Delete all tuples in Students that reference the
deleted DID in Department
CREATE TABLE Student(
    SID INTEGER,
    Name CHAR(30),
    Age INTEGER,
    DID INTEGER,
    PRIMARY KEY (SID),
    FOREIGN KEY (DID) REFERENCES Department(DID)
        ON DELETE CASCADE)
48
Enforcing RIC in SQL
Set the corresponding DID in Students to some
default value, or in the worst case, NULL
CREATE TABLE Student(
    SID INTEGER,
    Name CHAR(30),
    Age INTEGER,
    DID INTEGER,
    PRIMARY KEY (SID),
    FOREIGN KEY (DID) REFERENCES Department(DID)
        ON DELETE SET DEFAULT)
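The three options can be compared side by side with a small experiment, sketched here for SQLite via Python's `sqlite3` module. The helper `delete_department` is a hypothetical name; it follows the slides' DDL in using integer DIDs, and since no DEFAULT is declared for DID, SET DEFAULT falls back to NULL.

```python
import sqlite3

def delete_department(on_delete):
    """Build a tiny database, delete department 1, and report what happened to its student."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default
    conn.execute("CREATE TABLE Department (DID INTEGER PRIMARY KEY, Name CHAR(30))")
    conn.execute(f"""
        CREATE TABLE Student(
            SID INTEGER, Name CHAR(30), Age INTEGER, DID INTEGER,
            PRIMARY KEY (SID),
            FOREIGN KEY (DID) REFERENCES Department(DID)
                ON DELETE {on_delete})
    """)
    conn.execute("INSERT INTO Department VALUES (1, 'Computer Sciences')")
    conn.execute("INSERT INTO Student VALUES (79, 'Alice', 19, 1)")
    try:
        conn.execute("DELETE FROM Department WHERE DID = 1")
    except sqlite3.IntegrityError:
        return "deletion refused"
    return conn.execute("SELECT SID, DID FROM Student").fetchall()

results = {action: delete_department(action)
           for action in ("NO ACTION", "CASCADE", "SET DEFAULT")}
print(results)
# {'NO ACTION': 'deletion refused', 'CASCADE': [], 'SET DEFAULT': [(79, None)]}
```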
49
Participation Constraint in SQL
[ER diagram: entity Student (SID, Name, Age); entity Department (DID, Name, Address); relationship Major between Student and Department]
CREATE TABLE Student(
    SID INTEGER,
    Name CHAR(30),
    Age INTEGER,
    DID INTEGER NOT NULL,
    PRIMARY KEY (SID),
    FOREIGN KEY (DID) REFERENCES Department(DID)
        ON DELETE NO ACTION)
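A brief sketch of the effect of NOT NULL on the foreign key, assuming SQLite via Python's `sqlite3` module: a student without a department can no longer be stored. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for enforcement
conn.execute("CREATE TABLE Department (DID INTEGER PRIMARY KEY, Name CHAR(30))")
conn.execute("""
    CREATE TABLE Student(
        SID INTEGER, Name CHAR(30), Age INTEGER, DID INTEGER NOT NULL,
        PRIMARY KEY (SID),
        FOREIGN KEY (DID) REFERENCES Department(DID)
            ON DELETE NO ACTION)
""")
conn.execute("INSERT INTO Department VALUES (1, 'Computer Sciences')")
conn.execute("INSERT INTO Student VALUES (79, 'Alice', 19, 1)")  # participates: accepted

# Bob has no department, which the participation constraint forbids.
try:
    conn.execute("INSERT INTO Student VALUES (13, 'Bob', 21, NULL)")
    missing_department_rejected = False
except sqlite3.IntegrityError:
    missing_department_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(missing_department_rejected, student_count)  # True 1
```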
50
Translating a Weak Entity Set
[ER diagram: weak entity Floor (Number, NumRooms); entity Department (DID, Name, Address); identifying relationship PartOf]
Floor
Number NumRooms DID
1 14 CS
2 12 CS
… … …
Department
DID Name Address
CS Computer Sciences 1210 …
ST Statistics Blah
… … …
Q. What ICs needed on Floor?
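One plausible set of ICs for Floor, given as a sketch and not the definitive answer: a composite primary key (DID, Number), a NOT NULL foreign key to Department, and ON DELETE CASCADE so floors disappear with their department. The example assumes SQLite via Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for enforcement
conn.execute("CREATE TABLE Department (DID CHAR(2) PRIMARY KEY, Name CHAR(30))")
conn.execute("""
    CREATE TABLE Floor (
        Number INTEGER,
        NumRooms INTEGER,
        DID CHAR(2) NOT NULL,
        PRIMARY KEY (DID, Number),
        FOREIGN KEY (DID) REFERENCES Department(DID)
            ON DELETE CASCADE)
""")
conn.execute("INSERT INTO Department VALUES ('CS', 'Computer Sciences')")
conn.executemany("INSERT INTO Floor VALUES (?, ?, ?)", [(1, 14, "CS"), (2, 12, "CS")])

floors_before = conn.execute("SELECT COUNT(*) FROM Floor").fetchone()[0]

# A weak entity cannot outlive its owner: deleting the department removes its floors.
conn.execute("DELETE FROM Department WHERE DID = 'CS'")
floors_after = conn.execute("SELECT * FROM Floor").fetchall()
print(floors_before, floors_after)  # 2 []
```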
51
Translating an “Is A” Hierarchy
[ER diagram: entity Student (SID, Name, Age) with “Is A” subclasses Undergrad (IsHonors), Masters (ByThesis), Doctoral (QualScore)]
52
Translating an “Is A” Hierarchy
We have 3 options:
“OOPL Approach”: separate relations for each Entity Set; each entity is present in exactly one relation
“TrueER Approach”: relations for “subclasses” have a foreign key to the parent
“Truly Terrible Approach”: AllStuff with NULL!
53
Translating an “Is A” Hierarchy
“OOPL Approach”: separate relations for each Entity Set; each entity is present in exactly one relation
Students(SID, Name, Age)
Undergrad(SID, Name, Age, IsHonors)
Masters(SID, Name, Age, IsThesis)
Doctoral(SID, Name, Age, QualScore)
54
Translating an “Is A” Hierarchy
“TrueER Approach”: relations for “subclasses” have a foreign key to the parent
Students(SID, Name, Age)
Undergrad(SID, IsHonors)
Masters(SID, IsThesis)
Doctoral(SID, QualScore)
Q. How do the ICs here differ from OOPL?
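A sketch of the TrueER approach for one subclass, assuming SQLite via Python's `sqlite3` module: Undergrad stores only SID and IsHonors, and its SID is a foreign key to Students. The ON DELETE CASCADE choice and the sample rows are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for enforcement
conn.execute("CREATE TABLE Students (SID INTEGER PRIMARY KEY, Name CHAR(30), Age INTEGER)")
conn.execute("""
    CREATE TABLE Undergrad (
        SID INTEGER PRIMARY KEY,
        IsHonors INTEGER,
        FOREIGN KEY (SID) REFERENCES Students(SID)
            ON DELETE CASCADE)
""")
conn.execute("INSERT INTO Students VALUES (79, 'Alice', 19)")
conn.execute("INSERT INTO Undergrad VALUES (79, 1)")

# The shared attributes live only in Students, so a full record needs a join.
full_record = conn.execute(
    "SELECT s.SID, s.Name, s.Age, u.IsHonors "
    "FROM Students s JOIN Undergrad u ON s.SID = u.SID"
).fetchall()
print(full_record)  # [(79, 'Alice', 19, 1)]

# An Undergrad row without a matching Students row is rejected.
try:
    conn.execute("INSERT INTO Undergrad VALUES (13, 0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

In the OOPL approach there would be no such foreign key, since each subclass relation carries its own copy of Name and Age.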
55
Translating an “Is A” Hierarchy
“Truly Terrible Approach”: AllStuff with NULL!
AllStuff(SID, Name, Age, IsHonors, IsThesis, QualScore)
Q. How do the ICs here differ from the other 2?
NULL is awful!