SQL, Data Storage Technologies, and Web-Data Integration Week 2


Today’s Agenda

• Review First Week

• Data Modeling, ER Diagrams

• Normalization techniques
  – 2nd Normal Form
  – 3rd Normal Form

• Physical Data Model

First Normal Form

• First Normal Form (1NF) occurs when all attributes are single valued.
  – No repeating attributes or attributes with multiple values

• Examples of 1NF violations:
  – A Movie entity with attributes actor1, actor2, actor3.
  – A Sundae entity with a multi-valued “toppings” attribute
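A minimal sketch of repairing the Movie example, assuming a hypothetical record layout (the field names, the sample title, and the generated movie_id are illustrative and not from the slides). The repeating actor columns become one row per actor:

```python
# Hypothetical Movie record that violates 1NF: one column per actor.
movie = {"title": "Example Movie", "actor1": "Actor A", "actor2": "Actor B", "actor3": None}

def to_first_normal_form(record, movie_id=1):
    """Split a repeating-attribute record into a movie row and one row per actor."""
    movie_row = {"movie_id": movie_id, "title": record["title"]}
    actor_rows = [
        {"movie_id": movie_id, "actor": record[key]}
        for key in ("actor1", "actor2", "actor3")
        if record.get(key) is not None  # empty actor slots simply produce no row
    ]
    return movie_row, actor_rows

movie_row, actor_rows = to_first_normal_form(movie)
```

Each actor row now holds a single value, and a movie with a fourth actor needs another row rather than another column.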

In Class Exercise

• Create a Data Model in 1st Normal Form for the following applications:

1. Recipes

2. Dating Service

3. Bookstore

4. Photo Sharing

5. Movie Collection

1st NF Data Model / ER Diagram

ER Diagram Terminology

• NULL: The database term for a value that does not exist.
  – What attributes in our data model could be NULL?

• Unique Identifiers (IDs)
  – Every entity needs a Unique Identifier
  – It must be unique across all instances of an entity
  – It must not be NULL
  – Its value must never change.

Unique Identifiers

• How do we pick IDs?
  – From attributes
    • What are the IDs in our data model?
  – Auto-generated IDs
    • Very common
    • Security Issues

Relationships

• An association between two entities

• Indicates the degree of the relationship
  – one and only one (also zero or one)
  – one or many (also zero or many)

• Examples:
  – A Donor gives one or many Donations
  – A Donation is given by one and only one Donor

• What are the relationships in our model?

Relationships

• The two degrees (one, many) combine into three types of relationships

• One to one
  – rare

• One to many
  – very common

• Many to many
  – will need special handling

ER Diagrams

• Entities are rectangles

• Attributes are ellipses

• Unique IDs are underlined

• Relationships are lines between entities
  – Straight line for one and only one
  – “Crow’s foot” for one or many
  – Can use bars and circles to represent one or zero

Data Model Relationships

Junction Entities

• Many to many relationships can be hard to represent in an RDBMS

• They are replaced with junction entities
  – Take the many-to-many relationship
  – Replace it with an entity
  – Create two new one-to-many relationships to the new entity
    • Which side should the “many” be?

Data Model with Junction Entity

New Example Data

Donor

ID  Name         Address        Phone     Email
1   Fred Smith   123 Bedrock    555-1212  [email protected]
2   Beth Kirsh   104 Ballard    555-1234  [email protected]
3   Erin Lovett  1580 Stone Ln  555-5098  [email protected]

Donation

ID  Amount  Date      Processor Name
1   100.00  01/02/04  Martha
2   250.00  12/11/04  Jim
3   10.00   09/07/04  Jim
4   100.00  02/02/04  Jim

Division

ID  Name
1   Marketing
2   Child-care
3   Trips

DonationToDivision

ID  Percentage
1   100%
2   50%
3   50%

ER Diagram Terminology

• Non-identifying attribute: An attribute that is not the Unique ID and is dependent on the Unique ID.

• Repeating entries are often a sign of a non-dependent attribute

• Examples:
  – Is Donor Name a non-identifying attribute?
  – Is Processor name?

2nd Normal Form (2NF)

• Model has to be in 1NF

• All attributes other than the Unique ID must be non-identifying attributes, that is, dependent on the Unique ID.

• To make 2NF, we have two options
  – Create a new entity for the attribute
  – Move the attribute to the entity where it really belongs
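A minimal sketch of the first option, applied to the processor name in the Donation example data. The dictionary layout and the generated processor IDs are assumptions, and the sketch treats a processor's name as identifying the processor, which a real system might not be able to rely on.

```python
# Donation rows from the example data; the processor name repeats across donations.
donations = [
    {"id": 1, "amount": "100.00", "processor_name": "Martha"},
    {"id": 2, "amount": "250.00", "processor_name": "Jim"},
    {"id": 3, "amount": "10.00", "processor_name": "Jim"},
    {"id": 4, "amount": "100.00", "processor_name": "Jim"},
]

def extract_processors(rows):
    """Create a Processor entity and replace the name with a reference to it."""
    processor_ids = {}  # processor name -> generated processor ID
    donation_table = []
    for row in rows:
        if row["processor_name"] not in processor_ids:
            processor_ids[row["processor_name"]] = len(processor_ids) + 1
        donation_table.append({
            "id": row["id"],
            "amount": row["amount"],
            "processor_id": processor_ids[row["processor_name"]],
        })
    processor_table = [{"id": pid, "name": name} for name, pid in processor_ids.items()]
    return donation_table, processor_table

donation_table, processor_table = extract_processors(donations)
```

After the split, renaming a processor means changing one row in the Processor entity instead of every donation that person handled.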

2nd Normal Form

2nd Normal Form

• Don’t simply look for repeating entries to determine 2NF

• Example: Is percentage already in 2NF? Many entries have 100% for their value
  – Yes: the value is dependent on the DonationToDivisionID
  – Percentage also doesn’t make sense as an Entity: it has no attributes other than itself, and 75% isn’t a “thing”.

3rd Normal Form

• Must already be in 2nd normal form

• Non-identifying attributes cannot be dependent on each other.

• Examples:
  – Employee(eid, name, position, salary), if salary is determined by position
  – Address(street, city, state, state abbr.), where state abbr. depends on state

• Move the dependent attributes into a new Entity
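A small sketch of that move for the Address example, assuming hypothetical sample rows (the cities and states are invented for illustration). The state abbreviation depends on the state rather than on the address, so the pair is moved into its own State entity:

```python
# Hypothetical address rows; "state_abbr" depends on "state", not on the address ID.
addresses = [
    {"id": 1, "street": "123 Bedrock", "city": "Seattle", "state": "Washington", "state_abbr": "WA"},
    {"id": 2, "street": "104 Ballard", "city": "Seattle", "state": "Washington", "state_abbr": "WA"},
    {"id": 3, "street": "1580 Stone Ln", "city": "Portland", "state": "Oregon", "state_abbr": "OR"},
]

def split_out_states(rows):
    """Move the state/abbreviation pair into a new entity referenced by ID."""
    state_ids = {}      # state name -> generated state ID
    state_table = []    # the new State entity
    address_table = []  # Address rows without the dependent attribute
    for row in rows:
        if row["state"] not in state_ids:
            state_ids[row["state"]] = len(state_ids) + 1
            state_table.append(
                {"id": state_ids[row["state"]], "name": row["state"], "abbr": row["state_abbr"]}
            )
        address_table.append(
            {"id": row["id"], "street": row["street"], "city": row["city"],
             "state_id": state_ids[row["state"]]}
        )
    return address_table, state_table

address_table, state_table = split_out_states(addresses)
```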

3rd Normal Form

In Class Exercise

• Update your Data Models to 3rd Normal Form for the following applications:

1. Recipes

2. Dating Service

3. Bookstore

4. Photo Sharing

5. Movie Collection

Physical Database Design

• ER Diagram completed – review design carefully

• Time to convert our conceptual ER diagram into a real database system.

Physical Database Design

• Step 1
  – Convert all entities into tables

• A database is typically made up of many tables
• A table is made up of columns and rows
• Each row in a table represents one instance of the entity

Physical Database Design

• Step 2
  – Attributes become columns in the tables

• Important to pick the appropriate data type for the columns

• More on data types later

Physical Database Design

• Step 3
  – Unique IDs become primary keys

• Remember, they cannot be NULL, and no duplicates are allowed

• Primary key is just the database name for an entity’s unique ID
  – Primary keys are automatically indexed by the database (more on this later)

Physical Database Design

• Step 4
  – Relationships become foreign keys in one table of the relationship.

• A foreign key is a unique ID of another table.
  – This creates a reference to a unique row in another table

• This simply means we have a column in one table that contains the unique ID of the other table.

Physical Database Design

• Step 4 Continued

• Which table does the foreign key belong in for a one-to-many relationship?
  – Store the unique ID from the "one" side of the relationship in the table representing the "many" side of the relationship
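A minimal sketch of Step 4 for the Donor and Donation entities, run through Python's sqlite3 module as a stand-in for MySQL (an assumption; the PRAGMA line is SQLite-specific and the column types are simplified). The DonorID from the "one" side is stored in the Donation table on the "many" side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.execute("CREATE TABLE Donor (DonorID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE Donation (
        DonationID INTEGER PRIMARY KEY,
        Amount     TEXT,
        DonorID    INTEGER NOT NULL REFERENCES Donor(DonorID)  -- foreign key on the "many" side
    )
""")
conn.execute("INSERT INTO Donor VALUES (1, 'Fred Smith')")
conn.execute("INSERT INTO Donation VALUES (1, '100.00', 1)")
conn.execute("INSERT INTO Donation VALUES (2, '250.00', 1)")

# A donation pointing at a donor that does not exist should be rejected.
try:
    conn.execute("INSERT INTO Donation VALUES (3, '10.00', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

donations_for_fred = conn.execute(
    "SELECT COUNT(*) FROM Donation WHERE DonorID = 1"
).fetchone()[0]
```

One donor row can be referenced by any number of donation rows, while each donation row holds exactly one DonorID.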

Physical Database Design

Donor: DonorID, Name, Email address, Phone number

Donation: DonationID, Date, Amount, DonorID, ProcessorID

Processor: ProcessorID, name

DonationToDivision: DonationToDivisionID, Percentage, DonationID, DivisionID

Division: DivisionID, name

Data Types

• Each database has its own data types
  – Most share a common core of data types, including integers, character strings, and dates.

• MySQL has 36 different data types

Data Types

• Numeric Types
  – store numeric data such as integers and floating point numbers
  – Modifiers:
    • UNSIGNED: non-negative values only, e.g. a TINYINT holds 0 to 255 instead of -128 to 127
    • AUTO_INCREMENT: integers only, one per table
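The UNSIGNED ranges follow from the storage width. A short sketch of the arithmetic, assuming two's-complement storage with TINYINT as 8 bits and INT as 32 bits (the helper name integer_range is hypothetical):

```python
def integer_range(bits, unsigned=False):
    """Return (min, max) for an integer column stored in the given number of bits."""
    if unsigned:
        return 0, 2 ** bits - 1
    # Signed: one bit pattern is spent on zero, so the negative side reaches one further.
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

tinyint_signed = integer_range(8)
tinyint_unsigned = integer_range(8, unsigned=True)
int_unsigned = integer_range(32, unsigned=True)
```

UNSIGNED does not add storage; it shifts the same number of values so that all of them are non-negative.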

Data Types

• Numeric Types Cont.

Numeric Data Types

INT (also INTEGER): a simple whole number, like 1 or 4,000 or –2

TINYINT: a whole number with a range of only –128 to 127. For example, use this when you want to store simple true/false Booleans

FLOAT: a floating-point number with single precision

DOUBLE (also REAL): a floating-point number with double precision

DECIMAL (also NUMERIC): a fixed-point number stored with exact precision. This type should be used for all monetary values
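A quick illustration of why monetary values call for an exact type. Python's float and decimal.Decimal are used here as stand-ins for the database's FLOAT/DOUBLE and DECIMAL columns (an analogy, not the database's own arithmetic):

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 or 0.20 exactly, so the sum drifts.
float_total = 0.10 + 0.20
float_is_exact = (float_total == 0.30)

# Decimal keeps base-10 digits exactly, like a DECIMAL column.
decimal_total = Decimal("0.10") + Decimal("0.20")
decimal_is_exact = (decimal_total == Decimal("0.30"))
```

Here float_is_exact comes out False and decimal_is_exact comes out True; small errors of this kind accumulate when many amounts are summed.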

Data Types

• String Types
  – store textual data.
  – Modifiers:
    • BINARY: allows case-sensitive searching

Data Types

• String Types Cont.

String Data Types

CHAR (also CHARACTER): a text field of a fixed length. When a column is defined as CHAR, the length of the text string is fixed and all values stored will use that much storage space. If a string shorter than the fixed length is stored, the right side of the string is padded with white space.

VARCHAR (also CHARACTER VARYING): a text field of varying length. Trailing spaces are removed, and the storage space is one byte larger than the size of the text. Maximum size for this data type is 255 characters. (These limits describe MySQL before version 5.0.3; later versions keep trailing spaces and allow up to 65,535 bytes.) This is a common type used for short character strings like names, phone numbers, street addresses, and so on.

TEXT: text up to 65,535 bytes (about 64 kilobytes) in length

MEDIUMTEXT: text up to 16 megabytes in length

LONGTEXT: text up to four gigabytes in length

Data Types

• Date types
  – store dates and times

Date Data Types

DATE: stores a date in the format YYYY-MM-DD

DATETIME: stores a date and time in the format YYYY-MM-DD HH:MM:SS

TIME: stores a time from "00:00:00" to "23:59:59"

TIMESTAMP: stores the current date and time. This type of column is updated automatically whenever there are modifications to a record. This type of field is great for recording when a row is modified

YEAR: stores a four digit year

Data Types

• Complex data types
  – Enumerations (ENUM)
    • list of predefined strings, value must be one of them.
  – Sets (SET)
    • list of predefined strings, value can be any combination of them.
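The difference between the two can be sketched as two validation rules. This is plain Python rather than database syntax, and the allowed values and function names are hypothetical:

```python
# Hypothetical list of predefined strings for a column.
ALLOWED_TOPPINGS = ("fudge", "nuts", "cherry")

def valid_enum(value, allowed=ALLOWED_TOPPINGS):
    """ENUM-style rule: the value must be exactly one of the predefined strings."""
    return value in allowed

def valid_set(values, allowed=ALLOWED_TOPPINGS):
    """SET-style rule: any combination (including none) of the predefined strings."""
    return set(values) <= set(allowed)

enum_ok = valid_enum("nuts")
enum_bad = valid_enum("sprinkles")
set_ok = valid_set(["fudge", "cherry"])
set_bad = valid_set(["fudge", "sprinkles"])
```

A SET column holding several values in one field is convenient, but it stores multiple values in a single attribute, which is worth weighing against the First Normal Form rule from earlier in the deck.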

Current Physical Database Design

Donor: DonorID, Name, Email address, Phone number

Donation: DonationID, Date, Amount, DonorID, ProcessorID

Processor: ProcessorID, name

DonationToDivision: DonationToDivisionID, Percentage, DonationID, DivisionID

Division: DivisionID, name

New Physical Database Design

• Donor table: includes types!

Column Name   Type
DonorID       int unsigned
PhoneNumber   Varchar(14)
Name          Varchar(255)
Address       Varchar(255)
Email         Varchar(255)

Physical Database Design

• Choosing Column Options
  – Column options help enforce data integrity
  – Can make the programmer’s job easier
  – Which makes the DBA’s job easier

• Column Options
  – NULLs allowed, default values, auto incrementing values, and keys

Column Options

• NOT NULL
  – By default, columns can contain a NULL instead of a value; this overrides that behavior
  – Requires that some value always exists in that column for any given row of data.
  – Will cause a database error if the programmer tries to add a NULL to that column.
  – What should be NOT NULL in our donation ER diagram?

Column Options

• DEFAULT value
  – If a user doesn’t supply a value for a column, you can specify a default value
  – Example: For a local organization, State and Country might default to WA, USA
  – Should there be any DEFAULT columns in our donation database?

Column Options

• AUTO_INCREMENT
  – Provides a default value to an INTEGER column
  – The value will automatically be incremented for each insert
  – Only one column per table can have this option.
  – Great option for an internal (meaning not shown to an external user) primary key

Column Options

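• PRIMARY KEY
  – Creates an index on the column
  – Forces each column entry to be unique from all other column entries
  – Automatically is NOT NULL

• UNIQUE
  – Like PRIMARY KEY, it rejects duplicate entries, but without the special name. A UNIQUE column may still hold NULL unless it is also declared NOT NULL.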

Physical Database Design

• Now includes column options!

Column Name   Type           Options
DonorID       int unsigned   Auto_increment primary key
PhoneNumber   Varchar(14)
Name          Varchar(255)   Not NULL
Address       Varchar(255)
Email         Varchar(255)
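A runnable sketch of the Donor table with column options, using Python's sqlite3 module as a stand-in for MySQL. Several details are assumptions: SQLite spells the option AUTOINCREMENT and requires INTEGER PRIMARY KEY (MySQL would use INT UNSIGNED AUTO_INCREMENT), SQLite does not enforce the VARCHAR lengths, and the State column is added only to illustrate DEFAULT; it is not part of the Donor table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Donor (
        DonorID     INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated for each insert
        PhoneNumber VARCHAR(14),
        Name        VARCHAR(255) NOT NULL,              -- a value is always required
        Address     VARCHAR(255),
        Email       VARCHAR(255),
        State       VARCHAR(2) DEFAULT 'WA'             -- hypothetical column showing DEFAULT
    )
""")

# Only Name is supplied; DonorID and State are filled in by the database.
first_id = conn.execute("INSERT INTO Donor (Name) VALUES (?)", ("Fred Smith",)).lastrowid
second_id = conn.execute("INSERT INTO Donor (Name) VALUES (?)", ("Beth Kirsh",)).lastrowid
first_row = conn.execute(
    "SELECT DonorID, Name, PhoneNumber, State FROM Donor WHERE DonorID = ?", (first_id,)
).fetchone()

# Omitting the NOT NULL column should raise a database error.
try:
    conn.execute("INSERT INTO Donor (PhoneNumber) VALUES (?)", ("555-5098",))
    null_name_rejected = False
except sqlite3.IntegrityError:
    null_name_rejected = True
```

The columns without options (PhoneNumber, Address, Email) accept NULL, which is why the first row comes back with no phone number.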

Relational Database Schema

• We now have our database schema
• Whereas our E/R Diagram was very abstract, we now have a very concrete, relational design

[Figure labels: Requirements / Ideas, E/R Diagram, Database Schema, RDBMS]

In Class Exercise

• Turn your Data Models into Database Schemas:

1. Recipes

2. Dating Service

3. Bookstore

4. Photo Sharing

5. Movie Collection