Database Principles SQL 2


Aggregate Queries

• SQL has 5 built-in “column functions” called aggregate functions.
  – min(): Returns the minimum value in a column
  – max(): Returns the maximum value in a column
  – sum(): Returns the sum of the values in a numeric column
  – count(): Returns the number of non-null values in a column
  – avg(): Returns the average of the values in a numeric column


Simple Syntax for Aggregate Functions:

select <non-aggregate column list>, <aggregate column list>
from <table list>
where <condition>
group by <non-aggregate column list>

(The <non-aggregate column list> of the select is repeated in the group by clause.)


Example

Select min(p_price) AS MIN,
       max(p_price) AS MAX,
       sum(p_price) AS SUM,
       count(p_price) AS COUNT,
       count(distinct p_price) AS COUNT_DISTINCT,
       CAST(avg(p_price) AS NUMERIC(5,2)) AS AVERAGE,
       CAST(avg(distinct p_price) AS NUMERIC(5,2)) AS AVERAGE_DISTINCT
from copy

MIN    MAX    SUM     COUNT  COUNT_DISTINCT  AVERAGE  AVERAGE_DISTINCT
-----  -----  ------  -----  --------------  -------  ----------------
11.00  37.00  354.00     16               8    22.12             24.12

1 record(s) selected.
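The query above can be tried without a DB2 installation. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for db2 and a simplified two-column copy table; the rows are the 16 (ISBN, p_price) pairs that the slides list later in the Step 1 work table. SQLite does not truncate in CAST the way the output above shows, so the sketch returns the raw averages.

```python
import sqlite3

# (ISBN, p_price) rows of the Copy table, as listed in the slides' work table.
COPY_ROWS = [
    ("6-99", 12.00), ("1-52", 28.00), ("6-99", 12.00), ("1-23", 19.00),
    ("6-99", 11.00), ("3-56", 21.00), ("1-52", 28.00), ("6-99", 11.00),
    ("1-52", 28.00), ("4-76", 30.00), ("6-99", 11.00), ("2-34", 30.00),
    ("7-45", 35.00), ("6-99", 11.00), ("2-34", 37.00), ("4-76", 30.00),
]


def whole_table_aggregates():
    """Apply all five aggregate functions to p_price over the entire table."""
    conn = sqlite3.connect(":memory:")
    # Simplified table: only the two columns the query needs (an assumption).
    conn.execute("create table copy (isbn text, p_price real)")
    conn.executemany("insert into copy values (?, ?)", COPY_ROWS)
    row = conn.execute(
        """
        select min(p_price), max(p_price), sum(p_price),
               count(p_price), count(distinct p_price),
               avg(p_price), avg(distinct p_price)
        from copy
        """
    ).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(whole_table_aggregates())
```

The unrounded averages are 22.125 and 24.125, which the slide's CAST to NUMERIC(5,2) displays as 22.12 and 24.12.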


The Gory Details:

• Using aggregate functions on table columns (or expressions) becomes more complicated when the functions operate on subgroups of rows, where the subgroups are defined by the values in some other column or columns.

• This is similar to something you might do with Excel.


Another example:

• Perform the previous query but aggregate over the individual books in the Copy table and not the entire table.

Select ISBN,
       min(p_price) AS MIN,
       max(p_price) AS MAX,
       sum(p_price) AS SUM,
       count(p_price) AS COUNT,
       count(distinct p_price) AS COUNT_DISTINCT,
       CAST(avg(p_price) AS NUMERIC(5,2)) AS AVERAGE,
       CAST(avg(distinct p_price) AS NUMERIC(5,2)) AS AVERAGE_DISTINCT
from copy
group by ISBN


The Answer:

ISBN  MIN    MAX    SUM    COUNT  COUNT_DISTINCT  AVERAGE  AVERAGE_DISTINCT
----  -----  -----  -----  -----  --------------  -------  ----------------
1-23  19.00  19.00  19.00      1               1    19.00             19.00
1-52  28.00  28.00  84.00      3               1    28.00             28.00
2-34  30.00  37.00  67.00      2               2    33.50             33.50
3-56  21.00  21.00  21.00      1               1    21.00             21.00
4-76  30.00  30.00  60.00      2               1    30.00             30.00
6-99  11.00  12.00  68.00      6               2    11.33             11.50
7-45  35.00  35.00  35.00      1               1    35.00             35.00


How Group-By Works:

• Step 1: Ignore the group_by clause, use the where_clause to build a work table invisible to the programmer. The work table will contain all the columns necessary to calculate the final result table.

• Step 2: Use the columns in the group_by clause to divide the work table into groups where the values of the group_by columns are the same.

• Step 3: Calculate the aggregate functions of the select_list one group at a time.

• Step 4: Produce one row of output per group.
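The four steps above can be mimicked in a few lines of ordinary code. This is only a sketch of the conceptual model, not how db2 actually executes a query; the function name group_by_isbn and the where parameter are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def group_by_isbn(copy_rows, where=lambda row: True):
    """Mimic the conceptual group-by steps on (isbn, p_price) rows."""
    # Step 1: build the work table from the rows that pass the where condition.
    work_table = [row for row in copy_rows if where(row)]

    # Step 2: divide the work table into groups with equal group-by values.
    groups = defaultdict(list)
    for isbn, price in work_table:
        groups[isbn].append(price)

    # Steps 3 and 4: compute the aggregates one group at a time and
    # produce exactly one output row per group.
    result = []
    for isbn in sorted(groups):
        prices = groups[isbn]
        result.append((isbn, min(prices), max(prices), sum(prices), len(prices)))
    return result


if __name__ == "__main__":
    rows = [("1-52", 28.0), ("2-34", 30.0), ("1-52", 28.0), ("2-34", 37.0)]
    for out_row in group_by_isbn(rows):
        print(out_row)
```

Each output tuple holds the ISBN followed by min, max, sum and count for that group, which matches the shape of the result table in the earlier example.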


Step 1:

• Create the work table with all data needed to produce the final output:

ISBN  P_PRICE
----  -------
6-99    12.00
1-52    28.00
6-99    12.00
1-23    19.00
6-99    11.00
3-56    21.00
1-52    28.00
6-99    11.00
1-52    28.00
4-76    30.00
6-99    11.00
2-34    30.00
7-45    35.00
6-99    11.00
2-34    37.00
4-76    30.00

Remember the final output contains the ISBN column and various aggregate functions applied to p_price.


Step 2:

• Break the work table into groups using the columns of the group_by clause (ISBN). Each group contains a single set of values for the group_by columns.

ISBN  P_PRICE
----  -------
1-23    19.00
-------------
1-52    28.00
1-52    28.00
1-52    28.00
-------------
2-34    30.00
2-34    37.00
-------------
3-56    21.00
-------------
4-76    30.00
4-76    30.00
-------------
6-99    11.00
6-99    11.00
6-99    11.00
6-99    11.00
6-99    12.00
6-99    12.00
-------------
7-45    35.00

The ISBN column has a single value in each group.

NOTE: This requires sorting the rows of the work table on the columns of the group_by clause.


Step 3:

• Calculate the aggregate functions of the select_list one group at a time.

Group  MIN    MAX    SUM    COUNT  COUNT_DISTINCT  AVERAGE  AVERAGE_DISTINCT
-----  -----  -----  -----  -----  --------------  -------  ----------------
1-23   19.00  19.00  19.00      1               1    19.00             19.00
1-52   28.00  28.00  84.00      3               1    28.00             28.00
2-34   30.00  37.00  67.00      2               2    33.50             33.50
3-56   21.00  21.00  21.00      1               1    21.00             21.00
4-76   30.00  30.00  60.00      2               1    30.00             30.00
6-99   11.00  12.00  68.00      6               2    11.33             11.50
7-45   35.00  35.00  35.00      1               1    35.00             35.00


Step 4:

• Produce one row of output per group.

ISBN  MIN    MAX    SUM    COUNT  COUNT_DISTINCT  AVERAGE  AVERAGE_DISTINCT
----  -----  -----  -----  -----  --------------  -------  ----------------
1-23  19.00  19.00  19.00      1               1    19.00             19.00
1-52  28.00  28.00  84.00      3               1    28.00             28.00
2-34  30.00  37.00  67.00      2               2    33.50             33.50
3-56  21.00  21.00  21.00      1               1    21.00             21.00
4-76  30.00  30.00  60.00      2               1    30.00             30.00
6-99  11.00  12.00  68.00      6               2    11.33             11.50
7-45  35.00  35.00  35.00      1               1    35.00             35.00


Observations:

• It is vitally important that there not be any variation in the non-aggregate values in each group since only one row of output per group is permitted and there can be no ambiguity about what goes in that row.

• For this reason db2 insists that the non-aggregate columns of the select_list match the columns in the group_by clause.

• Queries like the following are not permitted because of the potential that author and title might not be constant for a given ISBN (even though we know they are).

Select k.ISBN, k.author, k.title, sum(p_price) AS SUM
from copy c, book k
where c.isbn = k.isbn
group by k.ISBN


Explanation of Single-Row Rule:

• Suppose we started to build the work table for the previous query and that there were two copies with the same ISBN but different author or title.

• For the group ‘5-55’ what would the single output row look like?

Select k.ISBN, k.author, k.title, sum(p_price) AS SUM
from copy c, book k
where c.isbn = k.isbn
group by k.ISBN

ISBN  Author  Title  p_price
----  ------  -----  -------
 . . .
----------------------------
5-55  X1      T1       27.00
5-55  X1      T2       33.00
----------------------------
 . . .

Possible output rows:

5-55  X1  T1  60.00
OR
5-55  X1  T2  60.00

Since db2 can’t decide, it doesn’t let this happen.


Solution:

• Rewrite the query.

• The single group in the work table becomes two groups, so there is no ambiguity about the output.

Select k.ISBN, k.author, k.title, sum(p_price) AS SUM
from copy c, book k
where c.isbn = k.isbn
group by k.ISBN, k.author, k.title

ISBN  Author  Title  p_price
----  ------  -----  -------
 . . .
----------------------------
5-55  X1      T1       27.00
----------------------------
5-55  X1      T2       33.00
----------------------------
 . . .


Example:

• Find how many books have been borrowed by each cardholder.

select b_name, b_addr, count(*)
from cardholder ch, borrows b
where ch.borrowerid = b.borrowerid
group by b_name, b_addr

B_NAME  B_ADDR     3
------  ---------  -
diana   Tilson     1
jo-ann  New Paltz  2
john    Kingston   2
john    New Paltz  2
mike    Modena     3
susan   Wallkill   1

• Problem: The result only has six cardholders and the original Cardholder table has seven cardholders.

• Analysis: One cardholder (Albert from Rosendale) has not borrowed any book and does not appear in the query result. Why?


Example (cont):

• According to our description of how group by works, the first thing created is a work table. Let’s look at that table.

B_NAME  B_ADDR     L_DATE
------  ---------  ----------
john    New Paltz  12/10/1992
john    New Paltz  12/01/1992
jo-ann  New Paltz  12/14/1992
jo-ann  New Paltz  11/30/1992
mike    Modena     12/08/1992
mike    Modena     12/04/1992
mike    Modena     12/04/1992
john    Kingston   12/09/1992
diana   Tilson     12/12/1992
susan   Wallkill   12/01/1992
john    Kingston   11/28/1992

• The reason Albert of Rosendale is missing from the work table is that the join_term ch.borrowerid = b.borrowerid fails to be true for that cardholder.

• Since Albert never makes it to the work table he can never make it to the final answer table.


Solution:

• SQL has a special join called the outer join that helps resolve this problem.

• The left outer join acts like a normal join when the join_term is true.

• When the join_term is never true for a row in the table to the left of the left outer join syntax, the left outer join still produces that row once, with null in every column that comes from the right-hand table.

• This changes the work table.

select b_name, b_addr, count(l_date)
from cardholder ch left outer join borrows b
     on ch.borrowerid = b.borrowerid
group by b_name, b_addr


Solution (2):

• The reason for the null value in Albert’s row of the work table is that, since there is no join between the row containing Albert’s information and the Borrows table, there is no corresponding l_date value, so the work table has to put a null value in place of a date.

• On top of that, count(l_date) does not count null values, so a group whose only l_date is null gets a count of 0.

B_NAME  B_ADDR     L_DATE
------  ---------  ----------
john    New Paltz  12/10/1992
john    New Paltz  12/01/1992
albert  Rosendale  null
jo-ann  New Paltz  12/14/1992
jo-ann  New Paltz  11/30/1992
mike    Modena     12/08/1992
mike    Modena     12/04/1992
mike    Modena     12/04/1992
john    Kingston   12/09/1992
diana   Tilson     12/12/1992
susan   Wallkill   12/01/1992
john    Kingston   11/28/1992


Solution (3):

• The final table then becomes:

B_NAME  B_ADDR     3
------  ---------  -
albert  Rosendale  0
diana   Tilson     1
jo-ann  New Paltz  2
john    Kingston   2
john    New Paltz  2
mike    Modena     3
susan   Wallkill   1
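The difference between the ordinary join and the left outer join can be reproduced with a small script. This is a sketch under stated assumptions: it uses Python's sqlite3 module instead of db2, the borrows table is reduced to borrowerid and l_date, and the pairing of loan dates with borrower ids is reconstructed from the work tables shown in the slides. The function name loan_counts is hypothetical.

```python
import sqlite3

CARDHOLDERS = [
    (1234, "john", "New Paltz"), (1325, "jo-ann", "New Paltz"),
    (1345, "albert", "Rosendale"), (2653, "mike", "Modena"),
    (5342, "susan", "Wallkill"), (7635, "john", "Kingston"),
    (9823, "diana", "Tilson"),
]
# (borrowerid, l_date) pairs matching the rows of the slides' work table.
BORROWS = [
    (1234, "12/10/1992"), (1234, "12/01/1992"), (1325, "12/14/1992"),
    (1325, "11/30/1992"), (2653, "12/08/1992"), (2653, "12/04/1992"),
    (2653, "12/04/1992"), (7635, "12/09/1992"), (9823, "12/12/1992"),
    (5342, "12/01/1992"), (7635, "11/28/1992"),
]


def loan_counts(join_kind):
    """Count loans per cardholder; join_kind is 'join' or 'left outer join'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table cardholder (borrowerid integer, b_name text, b_addr text)"
    )
    conn.execute("create table borrows (borrowerid integer, l_date text)")
    conn.executemany("insert into cardholder values (?, ?, ?)", CARDHOLDERS)
    conn.executemany("insert into borrows values (?, ?)", BORROWS)
    rows = conn.execute(
        f"""
        select b_name, b_addr, count(l_date)
        from cardholder ch {join_kind} borrows b
             on ch.borrowerid = b.borrowerid
        group by b_name, b_addr
        order by b_name, b_addr
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(len(loan_counts("join")))             # albert is missing
    print(len(loan_counts("left outer join")))  # albert appears with count 0
```

With the plain join the result has six rows; with the left outer join it has seven, and albert's count is 0.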


Alternative (Incorrect) Solution:

• Be careful: the column used in the aggregate function must come from the right-hand table.

• The following query fails to produce the correct result.

select b_name, b_addr, count(ch.borrowerid)
from cardholder ch left outer join borrows b
     on ch.borrowerid = b.borrowerid
group by b_name, b_addr

• When count(ch.borrowerid) counts ch.borrowerid for Albert’s group, the value it sees is not null, so it comes up with 1 instead of 0.

Work table:

B_NAME  B_ADDR     BORROWERID
------  ---------  ----------
diana   Tilson     9823
jo-ann  New Paltz  1325
jo-ann  New Paltz  1325
Albert  Rosendale  1345
john    Kingston   7635
john    Kingston   7635
john    New Paltz  1234
john    New Paltz  1234
mike    Modena     2653
mike    Modena     2653
mike    Modena     2653
susan   Wallkill   5342


Alternative (Incorrect) Solution (cont):

• Trying to count from the Cardholder table and not the Borrows table yields the following incorrect solution:

select b_name, b_addr, count(ch.borrowerid)
from cardholder ch left outer join borrows b
     on ch.borrowerid = b.borrowerid
group by b_name, b_addr

B_NAME  B_ADDR     3
------  ---------  -
albert  Rosendale  1
diana   Tilson     1
jo-ann  New Paltz  2
john    Kingston   2
john    New Paltz  2
mike    Modena     3
susan   Wallkill   1

Albert’s count turns out to be 1 instead of the correct 0.
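The effect of the counted column can be seen side by side on a two-cardholder example. This is a minimal sketch assuming sqlite3 in place of db2 and cut-down tables; count_variants is a hypothetical helper name, and count(*) is included only for comparison.

```python
import sqlite3


def count_variants():
    """Compare count(l_date), count(ch.borrowerid) and count(*) per cardholder."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table cardholder (borrowerid integer, b_name text);
        create table borrows (borrowerid integer, l_date text);
        insert into cardholder values (1345, 'albert'), (9823, 'diana');
        insert into borrows values (9823, '12/12/1992');
        """
    )
    rows = conn.execute(
        """
        select b_name, count(l_date), count(ch.borrowerid), count(*)
        from cardholder ch left outer join borrows b
             on ch.borrowerid = b.borrowerid
        group by b_name
        order by b_name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in count_variants():
        print(row)
```

Only count(l_date), which counts a column from the right-hand table, reports 0 for albert; the other two count the single null-extended row and report 1.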


Left Outer Join vs Right Outer Join:

• The following are equivalent:

select b_name, b_addr, count(l_date)
from cardholder ch left outer join borrows b
     on ch.borrowerid = b.borrowerid
group by b_name, b_addr

select b_name, b_addr, count(l_date)
from borrows b right outer join cardholder ch
     on b.borrowerid = ch.borrowerid
group by b_name, b_addr


Warning:

• You are not allowed to use an aggregate function in a where_clause except inside a subquery.

Find the cardholders with two books borrowed (incorrect):

Select b_name, b_addr
from cardholder ch, borrows b
where ch.borrowerid = b.borrowerid
  AND count(*) = 2    -- this causes a syntax error

• The error in this query is that where_clause conditions are evaluated one row at a time and count(*) is always applied to a set of rows as a unit.

Find the cardholders with two books borrowed (correct, using a correlated subquery):

Select b_name, b_addr
from cardholder ch
where 2 = (select count(*)
           from borrows b
           where b.borrowerid = ch.borrowerid)
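Both forms of the query can be checked quickly. The sketch below assumes sqlite3 rather than db2, a three-cardholder subset of the slides' data, and a hypothetical helper named two_loan_cardholders. SQLite also rejects an aggregate in the where clause, although its error message differs from db2's.

```python
import sqlite3


def two_loan_cardholders():
    """Return (was the bad query rejected?, rows from the correlated subquery)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table cardholder (borrowerid integer, b_name text, b_addr text);
        create table borrows (borrowerid integer, l_date text);
        insert into cardholder values
            (1325, 'jo-ann', 'New Paltz'),
            (2653, 'mike', 'Modena'),
            (1345, 'albert', 'Rosendale');
        insert into borrows values
            (1325, '12/14/1992'), (1325, '11/30/1992'),
            (2653, '12/08/1992'), (2653, '12/04/1992'), (2653, '12/04/1992');
        """
    )
    # An aggregate function directly in the where clause is rejected.
    try:
        conn.execute(
            "select b_name, b_addr from cardholder ch, borrows b "
            "where ch.borrowerid = b.borrowerid and count(*) = 2"
        )
        rejected = False
    except sqlite3.Error:
        rejected = True

    # The correlated subquery counts loans separately for each cardholder row.
    rows = conn.execute(
        """
        select b_name, b_addr
        from cardholder ch
        where 2 = (select count(*) from borrows b
                   where b.borrowerid = ch.borrowerid)
        """
    ).fetchall()
    conn.close()
    return rejected, rows


if __name__ == "__main__":
    print(two_loan_cardholders())
```

In this subset only jo-ann has exactly two loans, so she is the only row the correlated subquery returns.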


Complete Group By Syntax:

• The having_clause is intended to do for groups what the where_clause does for rows. In other words, the having_clause is intended to include some groups and not others.

select <non-aggregate column list>, <aggregate column list>
from <table list>
where <condition>
group by <non-aggregate column list>
having <group condition>


How the Complete Group-By Works:

• Step 1: Ignore the group_by clause, use the where_clause to build a work table invisible to the programmer. The work table will contain all the columns necessary to calculate the final result table.

• Step 2: Use the columns in the group_by clause to divide the work table into groups where the values of the group_by columns are the same.

• Step 3: Apply the having_clause condition to each group in turn, throwing away groups where it is false.

• Step 4: Calculate the aggregate functions of the select_list one group at a time.

• Step 5: Produce one row of output per group.


Example:

• For each cardholder, find the total value of all books on loan to that cardholder, provided the total value is at least $40.00.

• NOTE: We don’t need to use left outer join here because we are only interested in cardholders with one or more book loans.

select b_name, b_addr, sum(p_price)
from cardholder ch, borrows b, copy c
where ch.borrowerid = b.borrowerid
  AND b.accession_no = c.accession_no
group by b_name, b_addr
having sum(p_price) >= 40.00

B_NAME  B_ADDR    3
------  --------  -----
john    Kingston  58.00
mike    Modena    95.00
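The slides do not say which copy each cardholder borrowed, so the query above cannot be reproduced exactly. As a stand-in, the following sketch applies the same having pattern to the Copy table alone: it keeps only the ISBN groups whose total price reaches a threshold. It assumes sqlite3 instead of db2, and isbn_totals_at_least is a hypothetical name.

```python
import sqlite3

# (ISBN, p_price) rows of the Copy table, as listed in the slides' work table.
COPY_ROWS = [
    ("6-99", 12.00), ("1-52", 28.00), ("6-99", 12.00), ("1-23", 19.00),
    ("6-99", 11.00), ("3-56", 21.00), ("1-52", 28.00), ("6-99", 11.00),
    ("1-52", 28.00), ("4-76", 30.00), ("6-99", 11.00), ("2-34", 30.00),
    ("7-45", 35.00), ("6-99", 11.00), ("2-34", 37.00), ("4-76", 30.00),
]


def isbn_totals_at_least(threshold):
    """Return (isbn, total price) for groups whose total is >= threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table copy (isbn text, p_price real)")
    conn.executemany("insert into copy values (?, ?)", COPY_ROWS)
    rows = conn.execute(
        """
        select isbn, sum(p_price)
        from copy
        group by isbn
        having sum(p_price) >= ?
        order by isbn
        """,
        (threshold,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(isbn_totals_at_least(40.0))
```

Three of the seven ISBN groups (1-23, 3-56 and 7-45) total less than 40.00 and are discarded by the having clause.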


Example 2:

• For each cardholder, find the total value of all books on loan to that cardholder, provided the total value is less than $40.00.

• NOTES:
  – coalesce(A,B): if A is null then the value is B, otherwise it is A.
  – Copy is also attached with a left outer join; a where_clause condition on b.accession_no would discard Albert’s row, because his accession_no is null.

select b_name, b_addr, coalesce(sum(p_price),0)
from cardholder ch
     left outer join borrows b on ch.borrowerid = b.borrowerid
     left outer join copy c on b.accession_no = c.accession_no
group by b_name, b_addr
having coalesce(sum(p_price),0.0) < 40.00;

B_NAME  B_ADDR     3
------  ---------  -----
albert  Rosendale   0.00
diana   Tilson     28.00
jo-ann  New Paltz  39.00
john    New Paltz  30.00
susan   Wallkill   37.00
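Why coalesce is needed can be shown in isolation. In this sketch a hypothetical table named loan_value stands in for the joined work table: albert's only row carries a null price, as it would after the left outer joins. sqlite3 is assumed in place of db2.

```python
import sqlite3


def sum_with_and_without_coalesce():
    """Show that sum() over only null values is null unless wrapped in coalesce()."""
    conn = sqlite3.connect(":memory:")
    # loan_value is a hypothetical stand-in for the joined work table.
    conn.execute("create table loan_value (b_name text, p_price real)")
    conn.executemany(
        "insert into loan_value values (?, ?)",
        [("albert", None), ("diana", 28.0)],
    )
    rows = conn.execute(
        """
        select b_name, sum(p_price), coalesce(sum(p_price), 0)
        from loan_value
        group by b_name
        order by b_name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in sum_with_and_without_coalesce():
        print(row)
```

Without coalesce, albert's total is null (None in Python), and a comparison such as null < 40.00 is not true, so his group would be dropped by the having clause.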


Revisit Left Outer Join:

• Yes, know how to do them but avoid them if you can.

• Consider the borrows relationship (diagram not reproduced here):

• To fully join Cardholder to borrows or Copy we need a left outer join.

• To join Book to Copy we do not need a left outer join.


Dummy Rows in Copy and Book

• Perform the following inserts into Book and Copy

• Think of these as “dummy” rows; the inserts need to be done only once.

• Minimum participation number of COPY is_copy_of BOOK stays as 1.

insert into Book (ISBN) values ('0-00');
insert into Copy (accession_no, ISBN) values ('0', '0-00');


Insert a New Cardholder

• Every time you add a new Cardholder, add a corresponding dummy row in Borrows.

• What we have done is make it appear as though Donna has borrowed the “dummy” copy of the “dummy” book.

• Now Cardholder <borrows> Copy minimum participation number is 1.

insert into Cardholder (borrowerid, b_name, b_addr, b_status)
values (9999, 'Donna', 'Accord', 'junior');
-- also add
insert into Borrows (borrowerid, accession_no)
values (9999, '0');


Automatic Input

• Databases provide a mechanism called a trigger to do automatic things like the insert into Borrows.

• Insert a row into Cardholder and the trigger “fires” and causes an insert to take place in Borrows as well.

• So even Cardholders who have borrowed nothing have borrowed the dummy book.

create trigger i_cardholder after insert on Cardholder
referencing new as n
for each row
begin atomic
  -- every new cardholder starts out with a loan of the dummy copy
  insert into borrows (borrowerid, accession_no)
  values (n.borrowerid, '0');
end@


Automatic Input (cont)

• We also need triggers on Borrows because we need a cardholder to either have borrowed the dummy book or a real book but not both.

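create trigger i_borrows after insert on borrows
referencing new as n
for each row
-- guard: do not delete the dummy row when it is the row just inserted
when (n.accession_no <> '0')
begin atomic
  delete from borrows
  where borrowerid = n.borrowerid and accession_no = '0';
end@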


Automatic Input (cont)

• And when we delete a book loan.

create trigger d_borrows after delete on borrows
referencing old as o
for each row
BEGIN ATOMIC
  declare v_accession_cnt int;
  -- count the loans this cardholder still has after the delete
  set v_accession_cnt = (select count(*) from borrows
                         where borrowerid = o.borrowerid);
  IF (v_accession_cnt = 0) THEN
    -- no loans left: restore the dummy loan
    insert into borrows (borrowerid, accession_no)
    values (o.borrowerid, '0');
  END IF;
END@


No More Left Outer Join:

• Find the number of books borrowed by each cardholder.

• NOTES:
  – qnec(a,b) is a user-defined function that returns 1 if a != b and 0 if a = b.
  – Summing these 0/1 values gives the same result as counting the rows where the value is 1.

select ch.borrowerid, b_name, b_addr, sum(qnec(b.accession_no,'0'))
from cardholder ch, borrows b
where ch.borrowerid = b.borrowerid
group by ch.borrowerid, b_name, b_addr;
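The whole dummy-row scheme can be exercised end to end in a small script. This is a sketch only: it uses sqlite3, whose trigger syntax differs from db2's (new and old are referenced directly, and there is no begin atomic), it registers qnec as a Python function, it reduces the tables to the columns the query needs, and the accession numbers 'A1' and 'A2' in the usage are made up for illustration. The insert trigger on borrows carries a guard so that inserting the dummy row does not immediately delete it.

```python
import sqlite3


def build_library():
    """Create cut-down tables plus the three triggers, translated to SQLite."""
    conn = sqlite3.connect(":memory:")
    # qnec(a, b): 1 if a != b, 0 if a = b, as described in the notes.
    conn.create_function("qnec", 2, lambda a, b: 1 if a != b else 0)
    conn.executescript(
        """
        create table cardholder (borrowerid integer, b_name text, b_addr text);
        create table borrows (borrowerid integer, accession_no text);

        -- Every new cardholder "borrows" the dummy copy '0'.
        create trigger i_cardholder after insert on cardholder
        begin
            insert into borrows (borrowerid, accession_no)
            values (new.borrowerid, '0');
        end;

        -- A real loan replaces the dummy loan (guarded against the dummy itself).
        create trigger i_borrows after insert on borrows
        when new.accession_no <> '0'
        begin
            delete from borrows
            where borrowerid = new.borrowerid and accession_no = '0';
        end;

        -- Deleting the last loan restores the dummy loan.
        create trigger d_borrows after delete on borrows
        begin
            insert into borrows (borrowerid, accession_no)
            select old.borrowerid, '0'
            where not exists (select 1 from borrows
                              where borrowerid = old.borrowerid);
        end;
        """
    )
    return conn


def loan_counts(conn):
    """Count real loans per cardholder without any outer join."""
    return conn.execute(
        """
        select ch.borrowerid, b_name, b_addr, sum(qnec(b.accession_no, '0'))
        from cardholder ch, borrows b
        where ch.borrowerid = b.borrowerid
        group by ch.borrowerid, b_name, b_addr
        order by ch.borrowerid
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_library()
    conn.execute("insert into cardholder values (9999, 'Donna', 'Accord')")
    conn.execute("insert into cardholder values (2653, 'mike', 'Modena')")
    conn.execute("insert into borrows values (2653, 'A1')")  # made-up accession no.
    print(loan_counts(conn))
```

Donna, who has borrowed nothing, still appears in the result with a count of 0, because her only borrows row is the dummy loan.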


Non-Aggregate Example

• Suppose we want a list of all books a cardholder has borrowed and the cardholder names. Place a – where the cardholder has borrowed no books

select b_name, title
from cardholder ch, borrows b, copy c, book k
where ch.borrowerid = b.borrowerid
  and b.accession_no = c.accession_no
  and c.isbn = k.isbn;

compared to

select b_name, title
from cardholder ch
     left outer join borrows b on ch.borrowerid = b.borrowerid
     left outer join copy c on b.accession_no = c.accession_no
     left outer join book k on c.isbn = k.isbn;