34
Intro. to SQL DSC340 Mike Pangburn

Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Embed Size (px)

Citation preview

Page 1: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Intro. to SQL

DSC340

Mike Pangburn

Page 2: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

2

Learning Objectives

Understand the data-representation terminology underlying relational databases

Understand core SQL concepts that stem from relational algebra cover all the fundamental operators other than divide

Gain familiarity with SQL syntax

Page 3: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Relational Databases and Tables

A Relational Database is a database management system (DBMS) that holds a set of tables, like Excel worksheets, each filled with data

A table = a set of rows or “records” Like a list… …but rows have no assumed order

Even columns have no assumed order

Page 4: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

A sample database table

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

ProductAttribute names

Table name

Tuples or rows

Page 5: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

DB Terminology overview

Term Definition

Relation (also, Entity) A table.

Tuple, (also, record or “entity instance”)

A row.

Attribute (also, field) A column.

Attribute value The value in a particular table cell (i.e., a row/column intersection).

Primary key Specified column(s) providing a unique value for every row.

Table scheme The set of attributes (not their values) defining the structure of a table.

Database schema The full set of named tables and their schemes, in the full database (which could contain many tables).

Page 6: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

6

SQL – what is it?

All SQL query take some rectangular input table(s) and, using that data as you nicely ask it to, generates your desirable output table You must ask nicely, i.e., properly

A simple language Not a true programming language Much easier to learn

Still very hard to master

Why was SQL developed? As a user-friendly means to define and manipulating data in

relational databases Incredibly valuable for addressing managerial ad hoc query

questions

Page 7: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

SQL Query

Basic form:

SELECT attributes FROM table (or multiple tables, often “joined”)

WHERE conditions (“row restrictions”)

Page 8: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

1-table query with row filtering

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

SELECT *FROM ProductWHERE category=‘Gadgets’

Product

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

“row restriction”

Page 9: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

A query projecting specific columns

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

SELECT PName, Price, ManufacturerFROM ProductWHERE Price > 100

Product

PName Price Manufacturer

SingleTouch $149.99 Canon

MultiTouch $203.99 Hitachi

“row restriction”with certain

columns“projected”

Page 10: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Selections

What goes in the WHERE clause:

x = y, x < y, x <= y, etc For numbers, they have the usual meanings For characters: lexicographic ordering For dates and times, earlier is less than later

Pattern matching on strings: s LIKE p (next slide)

You can combine selections using AND / OR type logic E.g., WHERE x < y AND (x < z OR y > z)

Page 11: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

The LIKE operator

s LIKE p: pattern matching on strings

p may contain two special “wildcard” symbols: % = any sequence of characters _ = any single character Note: these two special characters are system

dependent… “Google” the LIKE wildcard symbols for any system you use

Product(Name, Price, Category, Manufacturer)Find all products whose name mentions ‘gizmo’:

SELECT *FROM ProductsWHERE PName LIKE ‘%gizmo%’

Page 12: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Hoe to eliminate identical output rows?

Compare to:

SELECT DISTINCT categoryFROM Product

Household

Photography

Gadgets

Category

SELECT categoryFROM Product

Household

Photography

GadgetsGadgetsCategory

Page 13: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Ordering the Results

SELECT pname, price, manufacturerFROM ProductWHERE category=‘Gadgets’ AND price > 50ORDER BY price, pname

Ordering is ascending, unless you specify ORDER BY DESC(DESC means you want descending order)

Ties are broken by the second attribute on the ORDER BY list, etc.

Page 14: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Ordering the Results

SELECT CategoryFROM ProductORDER BY PName

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi?

Page 15: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

“Joining tables” in SQL

Connect two or more tables:

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product

Company

CName StockPrice Country

GizmoWorks 25 USA

Canon 65 Japan

Hitachi 15 Japan

What isthe connection? Do you think the DBMS sees that?

Page 16: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Joins using SQL

Product (pname, price, category, manufacturer)Company (cname, stockPrice, country)

Find all products under $200 manufactured in Japan;return their names and prices.

SELECT PName, PriceFROM Product, CompanyWHERE Manufacturer=CName AND Country=‘Japan’ AND Price <= 200

Joinbetween Product

and Company

Page 17: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product CompanyCname StockPrice Country

GizmoWorks 25 USA

Canon 65 Japan

Hitachi 15 Japan

PName Price

SingleTouch $149.99

SELECT PName, PriceFROM Product, CompanyWHERE Manufacturer=CName AND Country=‘Japan’ AND Price <= 200

Joins using SQL

Page 18: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Joins and unexpected “duplicated” results

Example: Find all countries that manufacture some product in the ‘Gadgets’ category

Product (pname, price, category, manufacturer)Company (cname, stockPrice, country)

.SELECT CountryFROM Product, CompanyWHERE Manufacturer=CName AND Category=‘Gadgets’

Page 19: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Joins in SQL

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product Company

Cname StockPrice Country

GizmoWorks 25 USA

Canon 65 Japan

Hitachi 15 Japan

Country

??

??

What will the two output lines be?Do we recall how to avoid

showing identical output rows?

SELECT CountryFROM Product, CompanyWHERE Manufacturer=CName AND Category=‘Gadgets’

Page 20: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

An example joining 3 tables

Product (pname, price, category, manufacturer)

Purchase (buyer, seller, store, product)

Person(persname, phoneNumber, city)

Find names of people living in Seattle that bought some product in the ‘Gadgets’ category, and the names of the stores they bought such product from.

SELECT DISTINCT persname, storeFROM Person, Purchase, ProductWHERE persname=buyer AND product = pname AND city=‘Seattle’ AND category=‘Gadgets’

Page 21: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Disambiguating Attributes

Sometimes two relations have same attributePerson(pname, address, worksfor)Company(cname, address)

SELECT DISTINCT pname, addressFROM Person, CompanyWHERE worksfor = cname

SELECT DISTINCT Person.pname, Company.addressFROM Person, CompanyWHERE Person.worksfor = Company.cname

Whichaddress ?

Page 22: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Renaming Columns

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

SELECT Pname AS prodName, Price AS askPriceFROM ProductWHERE Price > 100

Product

prodName askPrice

SingleTouch $149.99

MultiTouch $203.99Query withrenaming

Page 23: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

“Appendix:” More detail on Join operations

Consider two new tables: Car and Driver

Let’s now look in detail at what happens “behind the scenes” when you ask a Database Management System (a DBMS) to join the two tables

Reference: car and driver table examples from http://db.grussell.org

Page 24: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Two tables for our exampleSELECT * from driver;NAME DOB

Jim Smith 11 Jan 1980

Bob Smith 23 Mar 1981

Bob Jones 3 Dec 1986

REGNO MAKE COLOUR PRICE OWNER

F611 AAA FORD RED 12000 Jim Smith

J111 BBB SKODA BLUE 11000 Jim Smith

A155 BDE MERCEDES BLUE 22000 Bob Smith

K555 GHT FIAT GREEN 6000 Bob Jones

SC04 BFE SMART BLUE 13000

SELECT * from car;

Page 25: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

REGNO MAKE COLOUR PRICE OWNER NAME DOBF611 AAA FORD RED 12000 Jim Smith Jim Smith 11 Jan 1980

J111 BBB SKODA BLUE 11000 Jim Smith Jim Smith 11 Jan 1980

A155 BDE MERCEDES BLUE 22000 Bob Smith Jim Smith 11 Jan 1980

K555 GHT FIAT GREEN 6000 Bob Jones Jim Smith 11 Jan 1980

SC04 BFE SMART BLUE 13000 Jim Smith 11 Jan 1980

F611 AAA FORD RED 12000 Jim Smith Bob Smith 23 Mar 1981

J111 BBB SKODA BLUE 11000 Jim Smith Bob Smith 23 Mar 1981

A155 BDE MERCEDES BLUE 22000 Bob Smith Bob Smith 23 Mar 1981

K555 GHT FIAT GREEN 6000 Bob Jones Bob Smith 23 Mar 1981

SC04 BFE SMART BLUE 13000 Bob Smith 23 Mar 1981

F611 AAA FORD RED 12000 Jim Smith Bob Jones 3 Dec 1986

J111 BBB SKODA BLUE 11000 Jim Smith Bob Jones 3 Dec 1986

A155 BDE MERCEDES BLUE 22000 Bob Smith Bob Jones 3 Dec 1986

K555 GHT FIAT GREEN 6000 Bob Jones Bob Jones 3 Dec 1986

SC04 BFE SMART BLUE 13000 Bob Jones 3 Dec 1986

The following SQL is missing the join condition for the two tables

SELECT *FROM car,driver

What does it do? It takes all combinations of rows from the two tables.

What if you forget the “join condition?”

Page 26: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Adding the Join condition

REGNO MAKE COLOUR PRICE OWNER NAME DOB

F611 AAA FORD RED 12000 Jim Smith Jim Smith 11 Jan 1980

J111 BBB SKODA BLUE 11000 Jim Smith Jim Smith 11 Jan 1980

A155 BDE MERCEDES BLUE 22000 Bob Smith Jim Smith 11 Jan 1980

K555 GHT FIAT GREEN 6000 Bob Jones Jim Smith 11 Jan 1980

SC04 BFE SMART BLUE 13000 Jim Smith 11 Jan 1980

F611 AAA FORD RED 12000 Jim Smith Bob Smith 23 Mar 1981

J111 BBB SKODA BLUE 11000 Jim Smith Bob Smith 23 Mar 1981

A155 BDE MERCEDES BLUE 22000 Bob Smith Bob Smith 23 Mar 1981

K555 GHT FIAT GREEN 6000 Bob Jones Bob Smith 23 Mar 1981

SC04 BFE SMART BLUE 13000 Bob Smith 23 Mar 1981

F611 AAA FORD RED 12000 Jim Smith Bob Jones 3 Dec 1986

J111 BBB SKODA BLUE 11000 Jim Smith Bob Jones 3 Dec 1986

A155 BDE MERCEDES BLUE 22000 Bob Smith Bob Jones 3 Dec 1986

K555 GHT FIAT GREEN 6000 Bob Jones Bob Jones 3 Dec 1986

SC04 BFE SMART BLUE 13000 Bob Jones 3 Dec 1986

To make the prior query a proper “join” of the two tables, we add the following WHERE clause row restriction: WHERE Car.Owner = Driver.Name

Page 27: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Traditional join syntax

SELECT * FROM car, driver

WHERE owner = name

REGNO MAKE COLOUR PRICE OWNER NAME DOB

F611 AAA FORD RED 12000 Jim Smith Jim Smith 11 Jan 1980

J111 BBB SKODA BLUE 11000 Jim Smith Jim Smith 11 Jan 1980

A155 BDE MERCEDES BLUE 22000 Bob Smith Bob Smith 23 Mar 1981

K555 GHT FIAT GREEN 6000 Bob Jones Bob Jones 3 Dec 1986

Page 28: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

An alternative syntax for the join in SQL

SELECT *

FROM car JOIN driver ON ( owner = name )

REGNO MAKE COLOUR PRICE OWNER NAME DOB

F611 AAA FORD RED 12000 Jim Smith Jim Smith 11 Jan 1980

J111 BBB SKODA BLUE 11000 Jim Smith Jim Smith 11 Jan 1980

A155 BDE MERCEDES BLUE 22000 Bob Smith Bob Smith 23 Mar 1981

K555 GHT FIAT GREEN 6000 Bob Jones Bob Jones 3 Dec 1986

Page 29: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Outer joins

Now let’s learn about the “outer join” The 3 variations on the outer join are the…

Left outer join Right outer join Full outer join

Page 30: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

OUTER JOIN

Consider the last row shown in the large table 4 slides back

This is a car without an owner (due to the “blank”) Sometimes we want to see the “unmatched rows”

that fail the join condition due to NULL (blank) values. This idea is referred to as performing an “outer

join.” There are “left” and “right” outer joins, depending on

which rows you want to keep. The join operation we have discussed prior to now is

known as a standard “inner” join.

REGNO MAKE COLOUR PRICE OWNER NAME DOB

SC04 BFE SMART BLUE 13000 Bob Jones 3 Dec 1986

Page 31: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Consider this: SELECT *

FROM car JOIN driver on (driver = name)

To the LEFT of the JOIN To the RIGHT of the JOIN

If you want all the rows in CAR to always be in the answer (whether matched or unmatched), you need a “left outer join”

If you want all the rows in DRIVER to always be in the answer (whether matched or unmatched), you need a “right outer join”

What if you want to keep all the rows from both sides? You need a “full outer join,” known as simply a FULL JOIN.

Page 32: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

SELECT *

FROM car LEFT JOIN driver ON ( owner = name )REGNO MAKE COLOUR PRICE OWNER NAME DOB

F611 AAA FORD RED 12000 Jim Smith Jim Smith 11 Jan 1980

J111 BBB SKODA BLUE 11000 Jim Smith Jim Smith 11 Jan 1980

A155 BDE MERCEDES BLUE 22000 Bob Smith Bob Smith 23 Mar 1981

K555 GHT FIAT GREEN 6000 Bob Jones Bob Jones 3 Dec 1986

SC04 BFE SMART BLUE 13000

Page 33: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

SELECT *

FROM car RIGHT JOIN driver ON ( owner = name )

REGNO MAKE COLOUR PRICE OWNER NAME DOB

F611 AAA FORD RED 12000 Jim Smith Jim Smith 11 Jan 1980

J111 BBB SKODA BLUE 11000 Jim Smith Jim Smith 11 Jan 1980

A155 BDE MERCEDES BLUE 22000 Bob Smith Bob Smith 23 Mar 1981

K555 GHT FIAT GREEN 6000 Bob Jones Bob Jones 3 Dec 1986

David Davis 1 Oct 1975

NAME DOB

Jim Smith 11 Jan 1980

Bob Smith 23 Mar 1981

Bob Jones 3 Dec 1986

David Davis 1 Oct 1975

Row added for this

example

Page 34: Intro. to SQL DSC340 Mike Pangburn. Learning Objectives Understand the data-representation terminology underlying relational databases Understand core

Example: Full outer join

SELECT *

FROM car FULL JOIN driver ON ( owner = name )

REGNO MAKE COLOUR PRICE OWNER NAME DOB

F611 AAA FORD RED 12000 Jim Smith Jim Smith 11 Jan 1980

J111 BBB SKODA BLUE 11000 Jim Smith Jim Smith 11 Jan 1980

A155 BDE MERCEDES BLUE 22000 Bob Smith Bob Smith 23 Mar 1981

K555 GHT FIAT GREEN 6000 Bob Jones Bob Jones 3 Dec 1986

SC04 BFE SMART BLUE 13000

David Davis 1 Oct 1975