Lecture 4 Additional Slides - University of Washington · Lecture 4 Additional Slides CSE 344,...

Preview:

Citation preview

Lecture 4 Additional Slides

CSE 344, Winter 2014

Sudeepa Roy

1 CSE 344 - Winter 2014

• NOTE: These slides were used as a whiteboard in class, and mostly contain the same material as the lecture slides and sqlite notes.

2 CSE 344 - Winter 2014

Announcements

• Webquiz 1 is due on Tuesday (01/14): 11:59 pm • Webquiz 2 will be posted on Wednesday (01/15):

00:00 am • Homework 1 is due on Thursday (01/16)

• This Wednesday’s (01/15) office hour: 10-10:50 am,

CSE 344 (Sudeepa)

• Lecture 5-8 (SQL) by Daniel Halperin

CSE 344 - Winter 2014 3

Outline

• Nulls in SQL (6.1.6)

• Outer joins (6.3.8)

• Aggregations (6.4.3 – 6.4.6)

• Examples, examples, examples…

CSE 344 - Winter 2014 4

NULL In SqLite

pname price category manufacturer

Gizmo 19.99 gadget GizmoWorks

PowerGizmo 29.99 gadget GizmoWorks

SingleTouch 149.99 photography Canon

MultiTouch 199.99 photography Hitachi

SuperGizmo 49.99 gadget Hitachi

iPad 5 NULL gadget Apple

cname country

GizmoWorks USA

Canon Japan

Hitachi Japan

Google USA

Product Company

SELECT gadegts with price < 25 and >=25 iPad5 is nowhere!

• Product(pname, price, category, manufacturer) • Company(cname, country)

5 CSE 344 - Winter 2014

Conditions involving NULL

• We need to evaluate in SQL conditions like this:

• ((price < 25) and (category = 'gadget')) or (manufacturer = 'Apple')

• Suppose price = 19, category = NULL, and manufacturer = NULL

• Is the predicate true or false?

6 CSE 344 - Winter 2014

3-valued logic

• FALSE = 0 – E.g. price<25 is FALSE when price=99

• UNKNOWN = 0.5 – E.g. price<25 is UNKNOWN when price=NULL

• TRUE = 1 – E.g. price<25 is TRUE when price=19

• PREDICATES:

– C1 AND C2 means min(C1,C2) – C1 OR C2 means max(C1,C2) – not C means 1-C

• Answer appears only if the truth value is TRUE • Does not appear if the truth value is FALSE or UNKNOWN • Therefore, no iPad5!

7 CSE 344 - Winter 2014

Compute the truth value

• Your answer in class :

= (TRUE and UNKNOWN) OR UNKNOWN

= UNKNOWN OR UNKNOWN

= UNKNOWN

• ((price < 25) and (category = 'gadget‘)) or (manufacturer = 'Apple')

• Suppose price = 19, category = NULL, and manufacturer = NULL

8 CSE 344 - Winter 2014

OUTERJOIN

9 CSE 344 - Winter 2014

JOIN = INNER JOIN

• Select pname, cname from Product, Company where manufacturer = cname;

• Select pname, cname from Product (INNER) JOIN Company ON manufacturer = cname;

• Where is ipad5 or Google? pname price category manufacturer

Gizmo 19.99 gadget GizmoWorks

PowerGizmo 29.99 gadget GizmoWorks

SingleTouch 149.99 photography Canon

MultiTouch 199.99 photography Hitachi

SuperGizmo 49.99 gadget Hitachi

iPad 5 NULL gadget Apple

cname country

GizmoWorks USA

Canon Japan

Hitachi Japan

Google USA

Product Company

optional

10 CSE 344 - Winter 2014

LEFT OUTER JOIN

• Include the left tuple even if there’s no match • Select pname, cname from Product LEFT OUTER JOIN

Company ON manufacturer = cname; • You should get (iPad5, null) • Reverse the order of relations.

pname price category manufacturer

Gizmo 19.99 gadget GizmoWorks

PowerGizmo 29.99 gadget GizmoWorks

SingleTouch 149.99 photography Canon

MultiTouch 199.99 photography Hitachi

SuperGizmo 49.99 gadget Hitachi

iPad 5 NULL gadget Apple

cname country

GizmoWorks USA

Canon Japan

Hitachi Japan

Google USA

Product Company

11 CSE 344 - Winter 2014

RIGHT OUTER JOIN

• Include the right tuple even if there’s no match • Select pname, cname from Product RIGHT OUTER

JOIN Company ON manufacturer = cname; • You should get (null, Google)

pname price category manufacturer

Gizmo 19.99 gadget GizmoWorks

PowerGizmo 29.99 gadget GizmoWorks

SingleTouch 149.99 photography Canon

MultiTouch 199.99 photography Hitachi

SuperGizmo 49.99 gadget Hitachi

iPad 5 NULL gadget Apple

cname country

GizmoWorks USA

Canon Japan

Hitachi Japan

Google USA

Product Company

12 CSE 344 - Winter 2014

FULL OUTER JOIN

• Include both left and right tuples even if there’s no match

• Select pname, cname from Product (FULL) OUTER JOIN Company ON manufacturer = cname;

• You should get both (null, Google) and (iPad5, null)

pname price category manufacturer

Gizmo 19.99 gadget GizmoWorks

PowerGizmo 29.99 gadget GizmoWorks

SingleTouch 149.99 photography Canon

MultiTouch 199.99 photography Hitachi

SuperGizmo 49.99 gadget Hitachi

iPad 5 NULL gadget Apple

cname country

GizmoWorks USA

Canon Japan

Hitachi Japan

Google USA

Product Company

13 CSE 344 - Winter 2014

AGGREGATION IN SQL

14 CSE 344 - Winter 2014

Aggregation in SQL

CSE 344 - Winter 2014 15

Other DBMSs have other ways of importing data

Specify a filename where the database will be stored

>sqlite3 lecture04

sqlite> create table Purchase

(pid int primary key,

product text,

price float,

quantity int,

month varchar(15));

sqlite> -- download data.txt

sqlite> .import data.txt Purchase

Comment about SQLite

• One cannot load NULL values such that they are actually loaded as null values

• So we need to use two steps:

– Load null values using some type of special value

– Update the special values to actual null values

CSE 344 - Winter 2014 16

update Purchase

set price = null

where price = ‘null’

Simple Aggregations

Five basic aggregate operations in SQL

CSE 344 - Winter 2014 17

select count(*) from Purchase

select sum(quantity) from Purchase

select avg(price) from Purchase

select max(quantity) from Purchase

select min(quantity) from Purchase

Aggregates and NULL Values

18

insert into Purchase

values(12, 'gadget', NULL, NULL, 'april')

select count(*) from Purchase

select count(quantity) from Purchase

select sum(quantity) from Purchase

select sum(quantity)

from Purchase

where quantity is not null;

Null values are not used in aggregates

Let’s try the following

CSE 344 - Winter 2014

COUNT applies to duplicates, unless otherwise stated:

SELECT Count(product)

FROM Purchase

WHERE price > 4.99

same as Count(*)

We probably want:

SELECT Count(DISTINCT product)

FROM Purchase

WHERE price> 4.99

Counting Duplicates

CSE 344 - Winter 2014 19

More Examples

SELECT Sum(price * quantity)

FROM Purchase

SELECT Sum(price * quantity)

FROM Purchase

WHERE product = ‘bagel’

What do

they mean ?

CSE 344 - Winter 2014 20

Simple Aggregations

Purchase

SELECT Sum(price * quantity)

FROM Purchase

WHERE product = ‘Bagel’ 90 (= 60+30)

Product Price Quantity

Bagel 3 20

Bagel 1.50 20

Banana 0.5 50

Banana 2 10

Banana 4 10

CSE 344 - Winter 2014 21

GROUP BY: Grouping and Aggregation

Purchase(product, price, quantity)

SELECT product, Sum(quantity) AS TotalSales

FROM Purchase

WHERE price > 1

GROUP BY product

Let’s see what this means…

Find total quantities for all sales over $1, by product.

CSE 344 - Winter 2014 22

Grouping and Aggregation

1. Compute the FROM and WHERE clauses.

2. Group by the attributes in the GROUPBY

3. Compute the SELECT clause:

grouped attributes and aggregates.

CSE 344 - Winter 2014 23

1&2. FROM-WHERE-GROUPBY

Product Price Quantity

Bagel 3 20

Bagel 1.50 20

Banana 0.5 50

Banana 2 10

Banana 4 10

CSE 344 - Winter 2014 24

WHERE price > 1

SELECT product,

Sum(quantity) AS TotalSales

FROM Purchase

WHERE price > 1

GROUP BY product

3. SELECT

SELECT product, Sum(quantity) AS TotalSales

FROM Purchase

WHERE price > 1

GROUP BY product

Product TotalSales

Bagel 40

Banana 20

Product Price Quantity

Bagel 3 20

Bagel 1.50 20

Banana 0.5 50

Banana 2 10

Banana 4 10

25 CSE 344 - Winter 2014

Other Examples

SELECT product,

sum(quantity) AS SumQuantity,

max(price) AS MaxPrice

FROM Purchase

GROUP BY product

What does

it mean ?

CSE 344 - Winter 2014

SELECT product, count(*)

FROM Purchase

GROUP BY product

SELECT month, count(*)

FROM Purchase

GROUP BY month

Compare these

two queries:

26

Need to be Careful…

SELECT product, max(quantity)

FROM Purchase

GROUP BY product

SELECT product, quantity

FROM Purchase

GROUP BY product

sqlite is WRONG on this query. Advanced DBMS (e.g. SQL Server) gives an error

(the attribute in SELECT must appear in an aggregate function OR GROUP BY clause)

Product Price Quantity

Bagel 3 20

Bagel 1.50 20

Banana 0.5 50

Banana 2 10

Banana 4 10

27 CSE 344 - Winter 2014

What does this query do?

• Your answer in class :

Bagel september

Banana september

Gizmo march

Gadget march

Orange may

28 CSE 344 - Winter 2014

• select product, max(month) from purchase group by product;

Understanding groups -- 1

• select * from purchase;

• How many tuples?

• Answer: 12

29 CSE 344 - Winter 2014

Understanding groups -- 2

• Your answer in class : • Bagel 2 • Orange 1 • Banana 2 • Gadget 4 • Gizmo 3

30 CSE 344 - Winter 2014

• select product, count(*) from purchase group by product;

Understanding groups - 3

• Your answer in class:

• Bagel 1

• Banana 0 (xxxxx WON’T APPEAR)

• Orange 0 (xxxxx WON’T APPEAR)

• Gizmo 3

• Gadget 3

31 CSE 344 - Winter 2014

• select product, count(*) from purchase group by product;

Always remember that first WHERE

condition is applied, then GROUP BY is

performed, so these tuples with

0 value won’t appear in the

results

DISTINCT = GROUP BY

• select month, count(*) from purchase group by month;

• BUT AGGREGATES ARE NOT NECESSARY WITH GROUP BY

• select month from purchase group by month;

• IS EQUIVALENT TO

• select distinct month from purchase;

32 CSE 344 - Winter 2014

Ordering Results

CSE 344 - Winter 2014

SELECT product, sum(price*quantity) as rev

FROM purchase

GROUP BY product

ORDER BY rev desc;

33

GROUP BY with Multiple Attributes

• Try the query you wrote in class

• select pid, pname, count(*) from purchase group by pid, pname;

CSE 344 - Winter 2014 34