CSCI-UA.0060-002: Database Design and Web Implementation
Professor Evan [email protected]@nytimes.com
Lecture #8: Normalization & Reverse Engineering
Thursday, February 21, 13
Administrivia
§Readings:
• For next class (2/26):
  • Chapter 4: Modeling and Designing Databases - a summary of database design concepts
  • Chapter 5: Basic SQL
§Homework:
• How's it going?
• Questions?
Normalization
De-normalized data
§De-normalized data is data that could be represented more efficiently, i.e. with less redundancy.
§Symptoms
• Repeated data
• Multiple values per column
• Mixture of multiple entities in a single record
Artist | Artist Popularity | Album | Track | Release Year | Length | Track Popularity
Green Day | 0.71529 | American Idiot | American Idiot | 2004 | 176.556 | 0.75
Green Day | 0.71529 | American Idiot | Boulevard Of Broken Dreams | 2004 | 493.877 | 0.72
Green Day | 0.71529 | Dookie | Basket Case | 1994 | 181.759 | 0.77
Wizards from Kansas | 0.00008 | Reunion | River Road | 2011 | 217.551 | 0.00000
Wizards from Kansas | 0.00008 | Reunion | Orion | 2011 | 268.952 | 0.00000
Fun | 0.68483 | Some Nights | We Are Young | 2012 | 277.345 | 0.91729
Functional Dependency
§If knowing the value of a set of columns X uniquely determines the value of another set of columns Y, then we say that Y is functionally dependent on X, written X → Y.
Area Code | Phone Number | First | Last
913 | 555-1234 | Ronald | Sandhaus
212 | 226-2000 | Evan | Sandhaus
718 | 226-2000 | Tara | Sandhaus
913 | 844-1212 | Francis | Sandhaus
Functional Dependency
§In this example:
• [Area Code, Phone Number] → [First, Last]
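A functional dependency can be tested against sample rows. The sketch below is illustrative only: the function name `holds` and the short column names (`area`, `phone`, `first`, `last`) are assumptions, not part of the lecture. Note that sample data can only disprove a dependency; it cannot prove one holds for all future rows.

```python
from typing import Iterable, Mapping, Sequence


def holds(rows: Iterable[Mapping[str, object]],
          x: Sequence[str], y: Sequence[str]) -> bool:
    """Return True if no two rows agree on columns x but differ on columns y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[c] for c in x)
        y_val = tuple(row[c] for c in y)
        # setdefault stores y_val the first time x_val is seen and
        # returns the stored value on every later occurrence.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Rows copied from the phone table on the slide.
PHONES = [
    {"area": "913", "phone": "555-1234", "first": "Ronald", "last": "Sandhaus"},
    {"area": "212", "phone": "226-2000", "first": "Evan", "last": "Sandhaus"},
    {"area": "718", "phone": "226-2000", "first": "Tara", "last": "Sandhaus"},
    {"area": "913", "phone": "844-1212", "first": "Francis", "last": "Sandhaus"},
]

print(holds(PHONES, ["area", "phone"], ["first", "last"]))  # True
print(holds(PHONES, ["phone"], ["first"]))  # False: 226-2000 maps to Evan and Tara
print(holds(PHONES, ["area"], ["first"]))   # False: 913 maps to Ronald and Francis
```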
Key - A NEW Definition Based On Functional Dependency
§A key is a set of fields that functionally determines all of the other fields in a table.
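Applied to the phone table, this definition can be sketched as a check that a candidate set of fields determines every remaining field. The helper name `determines_all` and the short column names are assumptions made for illustration.

```python
def determines_all(rows, key):
    """Return True if the `key` columns determine every other column in `rows`."""
    others = [c for c in rows[0] if c not in key]
    seen = {}
    for row in rows:
        key_val = tuple(row[c] for c in key)
        rest = tuple(row[c] for c in others)
        # Two rows with the same key values must agree on all other columns.
        if seen.setdefault(key_val, rest) != rest:
            return False
    return True


# Rows copied from the phone table on the slides.
PHONE_ROWS = [
    {"area": "913", "phone": "555-1234", "first": "Ronald", "last": "Sandhaus"},
    {"area": "212", "phone": "226-2000", "first": "Evan", "last": "Sandhaus"},
    {"area": "718", "phone": "226-2000", "first": "Tara", "last": "Sandhaus"},
    {"area": "913", "phone": "844-1212", "first": "Francis", "last": "Sandhaus"},
]

print(determines_all(PHONE_ROWS, ["area", "phone"]))  # True
print(determines_all(PHONE_ROWS, ["phone"]))          # False
```

In this four-row sample `["first"]` would also pass, which is an accident of the data: whether a set of fields is really a key depends on what the data means, not only on the rows currently stored.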
Primary Key - A NEW Definition Based On Functional Dependency
§A primary key is a key that contains no proper subset of fields that is also a key.
Area Code | Phone Number | Zip | First | Last
913 | 555-1234 | 66211 | Ronald | Sandhaus
212 | 226-2000 | 66211 | Evan | Sandhaus
718 | 226-2000 | 66211 | Tara | Sandhaus
913 | 844-1212 | 66211 | Francis | Sandhaus
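The minimality requirement can be sketched by testing every proper subset of a candidate key. This is a rough illustration, assuming the table has no duplicate rows (so "is a key" reduces to "values are unique"); the names `is_key` and `is_minimal_key` are hypothetical.

```python
from itertools import combinations


def is_key(rows, columns):
    """True if the projection onto `columns` has no repeated values."""
    projected = [tuple(row[c] for c in columns) for row in rows]
    return len(set(projected)) == len(projected)


def is_minimal_key(rows, columns):
    """True if `columns` is a key and no proper, non-empty subset is a key."""
    if not is_key(rows, columns):
        return False
    for size in range(1, len(columns)):
        for subset in combinations(columns, size):
            if is_key(rows, subset):
                return False
    return True


# Rows copied from the table with the Zip column.
ZIP_ROWS = [
    {"area": "913", "phone": "555-1234", "zip": "66211", "first": "Ronald", "last": "Sandhaus"},
    {"area": "212", "phone": "226-2000", "zip": "66211", "first": "Evan", "last": "Sandhaus"},
    {"area": "718", "phone": "226-2000", "zip": "66211", "first": "Tara", "last": "Sandhaus"},
    {"area": "913", "phone": "844-1212", "zip": "66211", "first": "Francis", "last": "Sandhaus"},
]

print(is_key(ZIP_ROWS, ["area", "phone", "zip"]))          # True: it is a key...
print(is_minimal_key(ZIP_ROWS, ["area", "phone", "zip"]))  # False: ...but Zip is redundant
print(is_minimal_key(ZIP_ROWS, ["area", "phone"]))         # True
```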
1st Normal Form
§A database is in 1st normal form if no column contains multiple values.
Album Id | Album | Album Countries
1 | American Idiot | US, UK, AU
2 | Dookie | US, UK
3 | Reunion | US
4 | Some Nights | US, NZ, FR
1st Normal Form
§A database is in 1st normal form if no column contains multiple values.
Album Id | Album
1 | American Idiot
2 | Dookie
3 | Reunion
4 | Some Nights

Album Id FK | Album Countries
1 | US
1 | UK
1 | AU
2 | US
2 | UK
3 | US
4 | US
4 | NZ
4 | FR
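The move to 1st normal form amounts to splitting each multi-valued cell into one row per value. A minimal sketch, assuming the countries are stored as a comma-separated string; the function name `to_first_normal_form` is made up for this example.

```python
# (album_id, album, comma-separated countries), as in the un-normalized table.
ALBUMS = [
    (1, "American Idiot", "US, UK, AU"),
    (2, "Dookie", "US, UK"),
    (3, "Reunion", "US"),
    (4, "Some Nights", "US, NZ, FR"),
]


def to_first_normal_form(rows):
    """Split the multi-valued country column into a separate table."""
    album_rows = []
    country_rows = []
    for album_id, title, countries in rows:
        album_rows.append((album_id, title))
        # One (album_id, country) row per listed country.
        for code in countries.split(","):
            country_rows.append((album_id, code.strip()))
    return album_rows, country_rows


album_table, country_table = to_first_normal_form(ALBUMS)
for row in country_table:
    print(row)
```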
Primary Key?
Artist | Artist Popularity | Album | Release Year | Track | Length | Track Popularity
Green Day | 0.71529 | American Idiot | 2004 | American Idiot | 176.556 | 0.75
Green Day | 0.71529 | American Idiot | 2004 | Boulevard Of Broken Dreams | 493.877 | 0.72
Green Day | 0.71529 | Dookie | 1994 | Basket Case | 181.759 | 0.77
Wizards from Kansas | 0.00008 | Reunion | 2011 | River Road | 217.551 | 0.00000
Wizards from Kansas | 0.00008 | Reunion | 2011 | Orion | 268.952 | 0.00000
Fun | 0.68483 | Some Nights | 2012 | We Are Young | 277.345 | 0.91729
2nd Normal Form
§A database is in 2nd normal form if it contains no tables in which a column depends on just part of the primary key.
§Functional Dependencies
• [Artist] → [Artist Popularity]
• [Album] → [Release Year]
• [Track] → [Length, Track Popularity]
Second Normal Form
Artist | Artist Popularity
Green Day | 0.71529
Wizards from Kansas | 0.00008
Fun | 0.68483

Album | Artist FK
American Idiot | Green Day
Dookie | Green Day
Reunion | Wizards from Kansas
Some Nights | Fun

Track | Album FK | Release Year | Length | Track Popularity
American Idiot | American Idiot | 2004 | 176.56 | 0.75
Boulevard Of Broken Dreams | American Idiot | 2004 | 493.88 | 0.72
Basket Case | Dookie | 1994 | 181.76 | 0.77
River Road | Reunion | 2011 | 217.55 | 0.00000
Orion | Reunion | 2011 | 268.95 | 0.00000
We Are Young | Some Nights | 2012 | 277.35 | 0.91729
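One way to arrive at these three tables is to project the flat table onto each group of columns and drop the duplicate rows. The sketch below assumes the flat rows from the earlier slide; the helper `project` and the snake_case column names are illustrative, not from the lecture.

```python
COLUMNS = ("artist", "artist_popularity", "album", "release_year",
           "track", "length", "track_popularity")

# Flat, de-normalized rows as shown on the earlier slide.
FLAT = [
    ("Green Day", 0.71529, "American Idiot", 2004, "American Idiot", 176.556, 0.75),
    ("Green Day", 0.71529, "American Idiot", 2004, "Boulevard Of Broken Dreams", 493.877, 0.72),
    ("Green Day", 0.71529, "Dookie", 1994, "Basket Case", 181.759, 0.77),
    ("Wizards from Kansas", 0.00008, "Reunion", 2011, "River Road", 217.551, 0.0),
    ("Wizards from Kansas", 0.00008, "Reunion", 2011, "Orion", 268.952, 0.0),
    ("Fun", 0.68483, "Some Nights", 2012, "We Are Young", 277.345, 0.91729),
]


def project(rows, columns, wanted):
    """Keep only the `wanted` columns and drop duplicate rows, preserving order."""
    idx = [columns.index(c) for c in wanted]
    # dict.fromkeys keeps the first occurrence of each projected row.
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in rows))


artists = project(FLAT, COLUMNS, ("artist", "artist_popularity"))
albums = project(FLAT, COLUMNS, ("album", "artist"))
tracks = project(FLAT, COLUMNS, ("track", "album", "release_year",
                                 "length", "track_popularity"))

print(len(artists), len(albums), len(tracks))  # 3 4 6
```

The artist popularity is now stored once per artist instead of once per track, which is the repeated data the decomposition removes.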
3rd Normal Form
§A database is in 3rd normal form if it contains no table in which a column depends on another non-key column
Track | Album FK | Release Year | Length | Track Popularity
American Idiot | American Idiot | 2004 | 176.56 | 0.75
Boulevard Of Broken Dreams | American Idiot | 2004 | 493.88 | 0.72
Basket Case | Dookie | 1994 | 181.76 | 0.77
River Road | Reunion | 2011 | 217.55 | 0.00000
Orion | Reunion | 2011 | 268.95 | 0.00000
We Are Young | Some Nights | 2012 | 277.35 | 0.91729
3rd Normal Form
Artist | Artist Popularity
Green Day | 0.71529
Wizards from Kansas | 0.00008
Fun | 0.68483

Album | Release Year | Artist FK
American Idiot | 2004 | Green Day
Dookie | 1994 | Green Day
Reunion | 2011 | Wizards from Kansas
Some Nights | 2012 | Fun

Track | Album FK | Length | Track Popularity
American Idiot | American Idiot | 176.56 | 0.75
Boulevard Of Broken Dreams | American Idiot | 493.88 | 0.72
Basket Case | Dookie | 181.76 | 0.77
River Road | Reunion | 217.55 | 0.00000
Orion | Reunion | 268.95 | 0.00000
We Are Young | Some Nights | 277.35 | 0.91729
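The 3rd normal form layout can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table and column names are invented here, and names are used as keys only because the slides do so (a real schema would more likely use numeric ids). The join at the end shows that the original flat rows can still be reconstructed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (
    name       TEXT PRIMARY KEY,
    popularity REAL
);
CREATE TABLE album (
    title        TEXT PRIMARY KEY,
    release_year INTEGER,               -- moved here: it depends on the album
    artist_name  TEXT REFERENCES artist(name)
);
CREATE TABLE track (
    title       TEXT,
    album_title TEXT REFERENCES album(title),
    length      REAL,
    popularity  REAL,
    PRIMARY KEY (title, album_title)
);
""")

conn.executemany("INSERT INTO artist VALUES (?, ?)", [
    ("Green Day", 0.71529),
    ("Wizards from Kansas", 0.00008),
    ("Fun", 0.68483),
])
conn.executemany("INSERT INTO album VALUES (?, ?, ?)", [
    ("American Idiot", 2004, "Green Day"),
    ("Dookie", 1994, "Green Day"),
    ("Reunion", 2011, "Wizards from Kansas"),
    ("Some Nights", 2012, "Fun"),
])
conn.executemany("INSERT INTO track VALUES (?, ?, ?, ?)", [
    ("American Idiot", "American Idiot", 176.56, 0.75),
    ("Boulevard Of Broken Dreams", "American Idiot", 493.88, 0.72),
    ("Basket Case", "Dookie", 181.76, 0.77),
    ("River Road", "Reunion", 217.55, 0.0),
    ("Orion", "Reunion", 268.95, 0.0),
    ("We Are Young", "Some Nights", 277.35, 0.91729),
])

# Join the three tables back into one row per track.
rows = conn.execute("""
    SELECT artist.name, album.title, album.release_year, track.title
    FROM track
    JOIN album  ON album.title = track.album_title
    JOIN artist ON artist.name = album.artist_name
    ORDER BY album.release_year, track.title
""").fetchall()

for row in rows:
    print(row)
```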
Election Normalization
Election Date | State Postal | State Name | Race Number | Office ID | Office Name | Candidate Number | Party | First Name | Last Name | Incumbent | Vote Count
Nov 6, 2012 | AZ | Arizona | 4021 | S | U.S. Senate | 8790 | Dem | Richard | Carmona | 0 | 912776
Nov 6, 2012 | AZ | Arizona | 4021 | S | U.S. Senate | 8789 | GOP | Jeff | Flake | 0 | 992323
Nov 6, 2012 | AZ | Arizona | 4021 | S | U.S. Senate | 9005 | Lib | Marc | Victor | 0 | 88397
Nov 6, 2012 | CA | California | 8619 | S | U.S. Senate | 19128 | Dem | Dianne | Feinstein | 1 | 6286938
Nov 6, 2012 | CA | California | 8619 | S | U.S. Senate | 19137 | GOP | Elizabeth | Emken | 0 | 3929593
Nov 6, 2012 | MA | Massachusetts | 24012 | S | U.S. Senate | 31822 | Dem | Elizabeth | Warren | 0 | 1678997
Nov 6, 2012 | MA | Massachusetts | 24012 | S | U.S. Senate | 31823 | GOP | Scott | Brown | 1 | 1450044
§A database is in 2nd normal form if it contains no tables in which a column depends on just part of the primary key.
Election Normalization
Election Date | State Postal | State Name | Race Number | Office ID | Office Name
Nov 6, 2012 | AZ | Arizona | 4021 | S | U.S. Senate
Nov 6, 2012 | CA | California | 8619 | S | U.S. Senate
Nov 6, 2012 | MA | Massachusetts | 24012 | S | U.S. Senate

§A database is in 3rd normal form if it contains no table in which a column depends on another non-key column

Candidate Number | Party | First Name | Last Name
8790 | Dem | Richard | Carmona
8789 | GOP | Jeff | Flake
9005 | Lib | Marc | Victor
19128 | Dem | Dianne | Feinstein
19137 | GOP | Elizabeth | Emken
31822 | Dem | Elizabeth | Warren
31823 | GOP | Scott | Brown

Race Number | Candidate Number | Incumbent | Vote Count
4021 | 8790 | 0 | 912776
4021 | 8789 | 0 | 992323
4021 | 9005 | 0 | 88397
8619 | 19128 | 1 | 6286938
8619 | 19137 | 0 | 3929593
24012 | 31822 | 0 | 1678997
24012 | 31823 | 1 | 1450044
Reverse Engineering
§Reverse Engineering (n): Figuring out how something works and building your own version of it.
§IMDB
§Twitter (if we have time)
Relating - Implementation