Part I: Introductioncbourke/ComputerScienceII/modules/module-SQL/sql/... · 15 5 11132 Fall 2011 23 1 42123 Fall 2011 24 2 42123 Fall 2011 10 1 11132 Fall 2011 29 5 42123 Spring 2011

Introduction to Databases & SQL

Dr. Christopher M. [email protected]

Outline

1. Introduction

2. Creating Tables

3. Manipulating Data

4. Querying Data

5. Extended Demonstration

6. Designing Databases

7. Extended Demonstration

Part I: Introduction

Motivating ExampleFlat Files

Consider the following data, stored in a flat file:

Course Course Name Student NUID Email

waits, tom 11223344 [email protected] 156 Intro to CSII Lou Reed 11112222 [email protected] 156 Intro to CSII Tom Waits 11223344 [email protected] 230 Computer Hardware Student, J. 12345678 [email protected] 156 Intro to CSII Student, John 12345678 [email protected] 230 Computer Hardware John Student 12345678 [email protected] 235 Discrete Math Student, John 12345678 [email protected] 235 Discrete Math Tom Waits 11223344 [email protected] Null Tom Waits 11223344 [email protected]

Table: Course Enrollment Data

Problems with Flat Files

Problems:

I Formatting Issues

I Repetition of data

I Incomplete data

I Integrity of data

I Organizational problems: any aggregation requires processing allrecords

I Updating information is difficult (must enumerate all possiblechanges, side effects so that information is not lost)

I Concurrency Issues

Relational Databases

Solution: Relational Database Management System (RDBMS)

I Represents data as tuples (pairs, triples, etc.)

I Data is grouped into relations

I E. F. Codd (IBM, 1970)

Key Aspects I

I Data is stored in tables

I Tables have columns (or fields) that have a unique name and type(integer, string)

I Each column stores a single piece of data

I Each row represents a single record

Key Aspects II

I Each row may have a unique primary key (PK) which uniquelyidentifies each record

I Records in different tables are related to each other through foreignkeys (FK)

I Order of rows/columns is meaningless

Key Aspects III

studentId firstName lastName nuid

1 Tom Waits 112233442 Lou Reed 111122225 John Student 12345678

emailId studentId address

8 2 [email protected] 1 [email protected] 1 [email protected] 5 [email protected]

courseId name description

42123 CSCE 156 Computer Science II12333 CSCE 230 Computer Hardware11132 CSCE 235 Discrete Mathematics

enrollmentId studentId courseId semester

15 5 11132 Fall 201123 1 42123 Fall 201124 2 42123 Fall 201110 1 11132 Fall 201129 5 42123 Spring 201132 5 12333 Fall 2011

Key Aspects IV

I Supports Transactions: an interaction or batch of interactions treatedas one unit

I ConstraintsI Allowing or disallowing NULLI Disallowing “bad” values (ranges)I Enforcing formatting (capitalization, precision)I Limiting combinations of data fields

ACID Principles I

I Atomicity – Data transactions must be an all-or-nothing process

I Atomic operation: not divisible or decomposable

I Consistency – Transactions will retain a state of consistency

I All constraints, triggers, cascades preserve a valid state after thetransaction has completed

I Isolation – No transaction interferes or is even aware of another;state transitions are equivalent to serial transactions

I Durability – Once committed, a transaction remains so

I Data is to be protected from catastrophic error (power loss/crash)

RDBMS’s

I Commercial Systems: MS Access, MySQL (Oracle), MariaDB,PostgreSQL, Informix, DB2 SQLServer, Oracle, SQLite

I NoSQL (non relational databases)

I Key-value databases: MongoDB, Cassandra, Spanner

I Graph-based: Neo4j, OrientDB

I Redis

I Database SaaS: Amazon Web Services (Dynamo)

Advantages I

I Data is structured instead of “just there”

I Better organization

I Duplication is minimized (with proper normalization)

I Updating information is easier

I Organization of data allows easy access

I Organization allows aggregation and more complex information

Advantages II

I Data integrity can be enforced (data types and user definedconstraints)

I Faster

I Scalable

I Security

I Portability

I Concurrency

Structured Query Language

We interact with RDBMs using Structured Query Language (SQL)

I Common language/interface to most databases

I Developed by Chamberlin & Boyce, IBM 1974

I Implementations may violate standards: portability issues

I Comments: #

I Keywords are case insensitive

I Create & manage tables: create, alter, drop

Structured Query LanguageCRUD

Basic SQL functionality: CRUD:

I Create – insert new records into existing tables ( insert )

I Retrieve – get a (subset) of data from specific rows/columns( select )

I Update – modify data in fields in specified rows ( update )

I Destroy – delete specific rows from table(s) ( delete )

Part II: Creating a Database

MySQLGetting Started

I Need access to a database server

I Need to use a client to connect to the server

I CLI: mysql

I MySQL Workbenchhttps://www.mysql.com/products/workbench/

1 use databaseName;

2 show tables;

3 describe TableName;

MySQLCSCE Students

I You only have access to a database with the same name as your login

I Host: cse.unl.edu

I Port: 3306

I User: your CSE login ID

I Password: go to https://cse.unl.edu/account

I Do not use the same password as your login

I More: http://cse.unl.edu/faq

Creating TablesSyntax

1 create table TableName (

2 columnName columnType [options],

3 ...

4 );

Options:

I primary key

I auto_increment (for primary keys)

I not null

I default (value)

Column Types

I varchar(n) – variable character field of up to n characters

I integer (or int )

I decimal(65,30) , float , double

I Others: aliases, vendor-specific keywords, date/time, other dataformats (JSON)

Creating TablesExample

1 create table Book (

2 bookId integer primary key auto_increment not null,

3 title varchar(255) not null,

4 author varchar(255),

5 isbn varchar(255) not null default ’’,

6 dewey float,

7 numCopies integer default 0

8 );

Primary Keys

I A primary key gives each record a unique identity

I At most one primary key per table

I No two rows can have the same primary key value

I Must be able to uniquely identify all records (not just those that exist)

I PKs can be one or more columns (composite key)

I Should not use/allow NULL values

I Can/should

I External identifiers should not be used

I Should use integers, not varchars or floats

Keys

I Tables can have multiple (non-primary) keys

I May be a combination of columns (composite key)

I May be declared non-unique in which case it serves as an index(allows database lookup optimization)

I Syntax:key(column1, column2,...)

Indices and Unique Constraints

I The keyword index can also be used

I We can make them unique by using unique

I Syntax: constraint constraintName unique index(columnName)

I Multiple columns: ... index(c1, c2)

Foreign Keys I

I Relations between records in different tables can be made withforeign keys

I A foreign key is a column that references a key (PK or regular key) inanother table

I Inserts cannot occur if the referenced record does not exist

I Foreign keys establish relationships between tables:I One-to-one (avoid)I One-to-many relationsI Many-to-oneI Many-to-Many relations: requires a Join Table

I Syntax:foreign key (columnName) references tableName(columnName)

Demonstration I

Write SQL to define two tables: one for authors and one for books.Simplify the book table, but include an ISBN as well as a constraint tokeep it unique. Model the fact that an author may write more than onebook.

Demonstration ISolution

1 create table Author (

2 authorId integer primary key not null auto_increment,

3 firstName varchar(255),

4 lastName varchar(255) not null

5 );

6

7 create table Book (

8 bookId integer primary key not null auto_increment,

9 authorId integer not null,

10 title varchar(255),

11 isbn varchar(255) not null,

12 numCopies integer default 0,

13 foreign key (authorId) references Author(authorId),

14 constraint uniqueISBN UNIQUE INDEX(isbn)

15 );

Demonstration IISolution

Figure: Entity-Relation Diagram for the Book/Author tables

Part III: Manipulating Data

Inserting Data

I Creating data involves inserting new records into a table

I Values are specified with literals

I Numerical literals

I String literals: use single quote characters

I Syntax:insert into tableName (c1, c2, ...) values (v1, v2, ...);

Inserting Data IExamples

1 insert into Author (firstName, lastName) values

2 (’Normal’, ’Mailer’);

3 --or

4 insert into Author (authorId, firstName, lastName) values

5 (1234, ’Normal’, ’Mailer’);

6

7 insert into Book (title, authorId, isbn) values

8 (’The Naked and the Dead’, 1234, ’978-0312265052’);


10 (’The Naked and the Dead’,

11 (select authorId from Author where lastName = ’Mailer’),

12 ’978-0312265052’);

Inserting Data IIExamples

1 --Errors:

2 /*


4 (’The Executioner’s Song’, 1234, ’978-0312265052’);

5 */

6


8 (’The Executioner\’s Song’, 1234, ’978-1306746281’);

Updating Data

I Existing data can be modified using an update statement

I Should be used in conjunction with clauses

I Syntax:1 update tableName set c1 = v1, c2 = v2, ...

2 where [condition];

Updating DataExamples

1 --dangerous:

2 update Author set firstName = ’Norman’;

3

4 --correct:

5 update Author set firstName = ’Norman’

6 where authorId = 1234;

Deleting Data

I Records can be deleted using a delete statement

I Should be used in conjunction with clauses

I Unless you really want to delete everything

I Syntax:delete from tableName where [condition]

Deleting DataExamples

1 --we cannot delete norman:

2 delete from Author where lastName = ’Mailer’;

3 delete from Author where authorId = 1234;

4

5 delete from Book where bookId = 1;

6 delete from Book where title = ’The Naked and the Dead’;

7

8 --deletes all of Norman’s book entries:

9 delete from Book where authorId = 1234;

10 --now we could delete norman

Part IV: Querying Data

Querying Data

I Data can be retrieved using the select statement

I Syntax:select c1, c2, ... from tableName where [condition];

I Can select all columns by using the * wildcard:select * from Book;

Querying DataExamples

1 select * from Book;

2 select * from Book where authorId = 1234;

3 select * from Book where isbn = ’978-0312265052’;

4

5 select lastName, firstName from Author;

6 select lastName, firstName from Author where authorId = 1234;

Querying DataAliasing

I Names of the columns are part of the database

I SQL allows us to “rename” them in result of our query using aliasing

I Syntax: columnName as columnAlias

I Sometimes necessary if column has no name (aggregates)

1 select title as bookTitle,

2 numCopies as numberOfCopies

3 from Book;

Complex where Clause

I Queries can be quantified using the where clause

I Only records matching the condition will be affected (updated,deleted, selected)

I Compound conditions can be composed using parentheses and theand and or keywords

1 select * from Book

2 where numCopies > 10 AND

3 (title != ’The Naked and the Dead’

4 OR authorId = 1234);

To check nullity: where title is null , where title is not null

distinct Clause

I Many records may have the same column value

I To query only unique values you may use the distinct keyword

1 select distinct lastName from Author;

2 select distinct authorId from Book;

like Clause

I String ( varchar ) values can be searched/partially matched using thelike clause

I Used in conjunction with the string wildcard, %

1 select * from Book where isbn like ’123%’;

2 select * from Author where lastName like ’%ailer%’;

in Clause

I The in clause allows you to do conditionals on a set of values

I May be used in conjunction with a nested query

1 select * from Book where isbn in (’978-0312265052’,

2 ’789-65486548’, ’681-0654895052’);

3

4 select * from Book where authorId in

5 (select authorId from Author where lastName like ’M%’);

order by Clause

I In general, the order of the results of a select clause isnondeterministic

I To impose an order, you can use order by

I Can order along multiple columns

I Can order descending ( desc ) or ascending ( asc ) order

1 select * from Book order by title;

2 select * from Book order by authorId, title desc;

Aggregate Functions

I Aggregate functions allow us to compute data in the databasewithout processing all the data in code

I count provides a mechanism to count the number of records

I Other aggregate functions: max , min , avg , sum

I null values are ignored/treated as zero

Aggregate FunctionsExamples

1 --number of Book records:

2 select count(*) as numberOfTitles from Book;

3 --most common book:

4 select max(numCopies) from Book;

5

6 --nested queries:

7 select * from Book

8 where numCopies = (select max(numCopies) from Book);

GROUP BY clause

I The group by clause allows you to project data with common valuesinto a smaller set of rows

I Used in conjunction with aggregate functions to do more complicatedaggregates

I Example: find total copies of all books grouped by their author

1 select author, sum(numCopies) as totalCopies

2 from Book group by author;

group by clauseExample Illustrated

Table content:

title author numCopies

Naked and the Dead Norman Mailer 10Dirk Gently’s Holistic Detective Agency Douglas Adams 4Barbary Shore Norman Mailer 3The Hitchhiker’s Guide to the Galaxy Douglas Adams 2The Long Dark Tea-Time of the Soul Douglas Adams 1Ender’s Game Orson Scott Card 7

GROUP BY clauseExample Illustrated

Grouping:

title author numCopies

Naked and the Dead Norman Mailer 10Barbary Shore Norman Mailer 3Dirk Gently’s Holistic Detective Agency Douglas Adams 4The Hitchhiker’s Guide to the Galaxy Douglas Adams 2The Long Dark Tea-Time of the Soul Douglas Adams 1Ender’s Game Orson Scott Card 7

group by clauseExample Illustrated

Projection & aggregation:

author totalCopies

Norman Mailer 13Douglas Adams 7Orson Scott Card 7

having clause

I Projected data can be further filtered using the having clause

I having is evaluated after group by which is evaluated after any

where clause

1 select author, sum(numCopies) as totalCopies

2 from Book where isbn like ’%9’

3 group by author

4 having totalCopies > 10;

Joins

A join is a clause that combines records from two or more tables.

I Result is a set of columns/rows (a “table”)

I Results from tables A,B are joined by shared values in specifiedcolumns

I Common to join via foreign keys

I Table names can be aliased for convenienceI Types of joins we’ll look at:

I (inner) joinI left (outer) join

I Other types of joins: Self-join, cross join (cartesian product), rightouter joins, full outer joins

Inner Join I

Inner Join:

I Most common type of join

I Combines rows of table A with rows of table B for all records thatsatisfy some predicate

I Predicate provided by the on clause

I Common to omit “inner” and simply use the join keyword

Inner Join II

1 select * from Book b

2 join Author a on b.authorId = a.authorId;

3

4 select * from Author a

5 join Book b on a.authorId = b.authorId;

6


8 join Author a on b.authorId = b.authorId

9 where a.title like ’The%’;

10

11 select a.firstName as firstName,

12 a.lastName as lastName,

13 b.title as bookTitle

14 from Book b

15 join Author a on b.authorId = b.authorId;

Inner JoinExample Illustrated

1 select s.studentId, s.lastName, e.address

2 from student s

3 join email e on s.studentId = e.studentId;

Inner JoinExample Illustrated

studentId lastName firstName

1234 Castro Starlin5678 Rizzo Anthony1122 Sveum Dale9988 Sandberg Ryne

(a) Student Table



(b) Email Table

studentId lastName address

1234 Castro [email protected] Rizzo [email protected] Sandberg [email protected] Sandberg [email protected]

Table: Joined tables

Left (Outer) Join I

I A Left Outer Join joins table A to table B, preserving all records intable A

I Records in A with no matching records in B will have null valuesfor columns in B

I Syntax: omit “outer” and use left join

Left (Outer) Join II

1 --no difference as a book cannot exist without an author:


3 left join Author a on b.authorId = a.authorId;

4

5 --includes authors even if they have no book records:

6 select * from Author a

7 left join Book b on a.authorId = b.authorId;

8

9 select s.studentId, s.lastName, e.address

10 from student s

11 left join email e on s.studentId = e.studentId;

Left (Outer) Join III

student id last name adddress

1234 Castro [email protected] Rizzo [email protected] Sandberg [email protected] Sandberg [email protected] Sveum null

Table: Left-Joined Tables

Other Joins

Nice tutorial: http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins

Figure: Types of Joins

Part V: Extended Demonstration

SQL Query Demonstrations

Figure: Video Game Database

Part VI: Designing a Database

Database DesignTable Design

I Identify all “entities” in different tables

I Ask “what defines an entity” to determine the columns and theirtypes

I Properly define whether or not a column should be allowed to be nulland/or defaults

I Each table should have a primary key

I Relations between tables should be identified and defined with foreignkeys

I Security concerns: don’t store sensitive information (passwords) inplaintext

Normalization I

Normalizing a database is the process of separating data into differenttables to reduce or eliminate data redundancy and reduce anomolies(preserve integrity) when data is changed. It also reduces the amount ofbook keeping necessary to make such changes.

I 1 Normal Form: the domain of each attribute in each table has onlyatomic values: each column value represents a single value

I Example: Allowing multiple email records for one PersonI Storing these as (say) a CSV in one column is a violation of 1NFI Hardcoding a fixed number of columns (Email1, Email2, Email3) for

multiple values is a violation of 1NFI Separating records out into another table and associating them via a

FK conforms to 1NF

Normalization II

I 2 Normal Form: 1NF and no non-prime attribute is dependent on anyproper subset of prime attributes

I Mostly relevant for tables with composite PKs (multiple columns)I If another, non-prime column is dependent only on a subset of these,

its not 2NFI Must split it out groups of data into multiple tables and relate them

through FKs so that non-prime column is dependent only on the subsetof prime columns

I Example: a purchase record may contain: customerId, storeId,storeLocation

I Using an auto-generated primary key gives you 2NF automatically

Normalization III

I 3 Normal Form: 2NF and no non-prime column is transitivelydependent on the key

I No non-prime column may depend on another non-prime columnI Example: Storing a price-per-unit, quantity, and total (total can be

derived from the other two)I Example: OrderId-CustomerId-CustomerName (OrderId is the PK,

CustomerId is FK, CustomerName should be derivable fromCustomerId)

Normalization IV

I Bottom line: 3NF is common sense; design your database to benormalized to begin with

I Every non-key attribute must provide a fact about the key (1NF), thewhole key (2NF), and nothing but the key (3NF), so help me Codd1

1Edgar Codd, inventor of the relational model, 1970–1971

Best Practice Tip 1Primary Keys

Primary Keys

I Should be integers (not varchars nor floats)

I Best to allow the database to do key management (auto increment)

I Should not be based on external identifiers that are not controlled bythe database (SSNs, ISBNs, etc.)

I Be consistent in naming conventions ( tableNameId )

Good Practice Tip 2Use consistent naming conventions

I Short, simple, descriptive names

I Limit abbreviations, acronymsI Use consistent styling

I Tables: UpperCamelCaseI Columns: lowerCamelCase

I Use singular forms, avoid pluralizations

I Primary key field: tableNameId

I Foreign key fields should match the fields they refer to

I End goal: unambiguous, consistent, self-documenting

Good Practice Tip 3Ensure Good Data Integrity

Data can break code, code should not break data.

I Data/databases are a service to code

I Different code, different modules can access the same data

I The database does not use the code!

I Should do everything you can to prevent bad code from harming data(constraints, foreign & primary keys, etc).

I Database is your last line of defense against bad code

Good Practice Tip 4Keep Business Logic Out!

I Databases offer “programming functionality”

I Triggers, cascades, stored procedures, etc.

I Use them sparingly!!!

I RDMSs are for the management and storage of data, not businesslogic

I Demarcation of responsibility

I DBAs should not have to be Application Programmers, and vice versa

Good Practice Tip 5Hard vs Soft Deletes

I delete removes records entirely (“hard” delete)

I May not want the data to be permanently deleted

I Tables may be given an active/inactive flag (boolean)

I Deleting a record: set the flag to inactive (“soft” delete)

I Application is responsible for only querying for active records

I Extra care must be taken

Part VII: Database Design Demonstration

Designing A DatabaseDemonstration

Design a database to support a course roster system for the University ofNebraska–Lincoln system. The database design should be able to modelstudents (with their unique NUIDs), courses, and their relation (ability ofstudents to enroll in courses). The system will also need to email studentsabout updates in enrollment, so be sure your model is able to incorporatethis functionality.

Designing A Database

Figure: A normalized database design.

End Result

I Pieces of data are now organized and have a specific type

I No duplication of data

I Entities are represented by IDs, ensuring identity (Tom Waits is nowthe same as t. Waits)

I Data integrity is enforced (only one NUID per Student)I Relations are well-defined

I A student has email(s)I A course has student(s) and a student has course(s)

Data from flat file

studentId firstName lastName nuid

1 Tom Waits 112233442 Lou Reed 111122225 John Student 12345678



courseId name description

42123 CSCE 156 Computer Science II12333 CSCE 230 Computer Hardware11132 CSCE 235 Discrete Mathematics

enrollmentId studentId courseId semester

15 5 11132 Fall 201123 1 42123 Fall 201124 2 42123 Fall 201110 1 11132 Fall 201129 5 42123 Spring 201132 5 12333 Fall 2011

Documents

Part I: Introductioncbourke/ComputerScienceII/modules/module-SQL/sql/... · 15 5 11132 Fall 2011 23 1 42123 Fall 2011 24 2 42123 Fall 2011 10 1 11132 Fall 2011 29 5 42123 Spring 2011