tabitha-lorin-wiggins
Database Principles
SQL 2
Aggregate Queries
• SQL has 5 built-in “column functions” called aggregate functions.
– min(): Returns the minimum value in a column
– max(): Returns the maximum value in a column
– sum(): Returns the sum of the values in a numeric column
– count(): Returns the number of non-null values in a column
– avg(): Returns the average of the values in a numeric column
Simple Syntax for Aggregate Functions:
select <non-aggregate column list>, <aggregate column list>
from <table list>
where <condition>
group by <non-aggregate column list>
NOTE: The <non-aggregate column list> of the select is repeated in the group by clause.
Example
Select min(p_price) AS MIN,
       max(p_price) AS MAX,
       sum(p_price) AS SUM,
       count(p_price) AS COUNT,
       count(distinct p_price) AS COUNT_DISTINCT,
       CAST(avg(p_price) AS NUMERIC(5,2)) AS AVERAGE,
       CAST(avg(distinct p_price) AS NUMERIC(5,2)) AS AVERAGE_DISTINCT
from copy

MIN    MAX    SUM     COUNT  COUNT_DISTINCT  AVERAGE  AVERAGE_DISTINCT
-----  -----  ------  -----  --------------  -------  ----------------
11.00  37.00  354.00     16               8    22.12             24.12

1 record(s) selected.
The Gory Details:
• Aggregate functions become more complicated when they operate not on a whole column (or expression) but on subgroups of rows, where each subgroup shares the same values in some other column or columns.
• This is similar to something you might do with Excel.
Another example:
• Perform the previous query but aggregate over the individual books in the Copy table and not the entire table.
Select ISBN,
       min(p_price) AS MIN,
       max(p_price) AS MAX,
       sum(p_price) AS SUM,
       count(p_price) AS COUNT,
       count(distinct p_price) AS COUNT_DISTINCT,
       CAST(avg(p_price) AS NUMERIC(5,2)) AS AVERAGE,
       CAST(avg(distinct p_price) AS NUMERIC(5,2)) AS AVERAGE_DISTINCT
from copy
group by ISBN
The Answer:
ISBN  MIN    MAX    SUM    COUNT  COUNT_DISTINCT  AVERAGE  AVERAGE_DISTINCT
----  -----  -----  -----  -----  --------------  -------  ----------------
1-23  19.00  19.00  19.00      1               1    19.00             19.00
1-52  28.00  28.00  84.00      3               1    28.00             28.00
2-34  30.00  37.00  67.00      2               2    33.50             33.50
3-56  21.00  21.00  21.00      1               1    21.00             21.00
4-76  30.00  30.00  60.00      2               1    30.00             30.00
6-99  11.00  12.00  68.00      6               2    11.33             11.50
7-45  35.00  35.00  35.00      1               1    35.00             35.00
How Group-By Works:
• Step 1: Ignore the group_by clause, use the where_clause to build a work table invisible to the programmer. The work table will contain all the columns necessary to calculate the final result table.
• Step 2: Use the columns in the group_by clause to divide the work table into groups where the values of the group_by columns are the same.
• Step 3: Calculate the aggregate functions of the select_list one group at a time.
• Step 4: Produce one row of output per group.
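The four steps can be imitated outside the database. The following is a minimal Python sketch, not how db2 is actually implemented: it loads the (ISBN, p_price) pairs from the Copy example as a work table, sorts them on the group_by column, splits them into groups, and builds one output row per group. The variable names and the dictionary layout are illustrative assumptions.

```python
from decimal import Decimal
from itertools import groupby

# Step 1: the work table (ISBN, p_price), copied from the Copy example.
raw_rows = [
    ("6-99", "12.00"), ("1-52", "28.00"), ("6-99", "12.00"), ("1-23", "19.00"),
    ("6-99", "11.00"), ("3-56", "21.00"), ("1-52", "28.00"), ("6-99", "11.00"),
    ("1-52", "28.00"), ("4-76", "30.00"), ("6-99", "11.00"), ("2-34", "30.00"),
    ("7-45", "35.00"), ("6-99", "11.00"), ("2-34", "37.00"), ("4-76", "30.00"),
]
work_table = [(isbn, Decimal(price)) for isbn, price in raw_rows]

# Step 2: sort on the group_by column so equal ISBNs are adjacent, then split.
work_table.sort(key=lambda row: row[0])
groups = groupby(work_table, key=lambda row: row[0])

# Steps 3 and 4: compute the aggregates one group at a time and
# produce exactly one output row per group.
result = []
for isbn, rows in groups:
    prices = [price for _, price in rows]
    result.append({
        "isbn": isbn,
        "min": min(prices),
        "max": max(prices),
        "sum": sum(prices),
        "count": len(prices),
        "count_distinct": len(set(prices)),
    })

for row in result:
    print(row)
```

The sort in Step 2 is what makes itertools.groupby work, since it only merges adjacent rows with equal keys.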
Step 1:
• Create the work table with all data needed to produce the final output:
ISBN  P_PRICE
----  -------
6-99    12.00
1-52    28.00
6-99    12.00
1-23    19.00
6-99    11.00
3-56    21.00
1-52    28.00
6-99    11.00
1-52    28.00
4-76    30.00
6-99    11.00
2-34    30.00
7-45    35.00
6-99    11.00
2-34    37.00
4-76    30.00

Remember the final output contains the ISBN column and various aggregate functions applied to p_price.
Step 2:
• Break the work table into groups using the columns of the group_by clause (ISBN). Each group contains a single set of values for the group_by columns.
ISBN  P_PRICE
----  -------
1-23    19.00
-------------
1-52    28.00
1-52    28.00
1-52    28.00
-------------
2-34    30.00
2-34    37.00
-------------
3-56    21.00
-------------
4-76    30.00
4-76    30.00
-------------
6-99    11.00
6-99    11.00
6-99    11.00
6-99    11.00
6-99    12.00
6-99    12.00
-------------
7-45    35.00

The ISBN column holds a single value in each group.
NOTE: This requires sorting the rows of the work table on the columns of the group_by clause.
Step 3:
• Calculate the aggregate functions of the select_list one group at a time.

P_PRICE values in group              MIN    MAX    SUM    COUNT  COUNT_DISTINCT  AVERAGE  AVERAGE_DISTINCT
-----------------------------------  -----  -----  -----  -----  --------------  -------  ----------------
19.00                                19.00  19.00  19.00      1               1    19.00             19.00
28.00 28.00 28.00                    28.00  28.00  84.00      3               1    28.00             28.00
30.00 37.00                          30.00  37.00  67.00      2               2    33.50             33.50
21.00                                21.00  21.00  21.00      1               1    21.00             21.00
30.00 30.00                          30.00  30.00  60.00      2               1    30.00             30.00
11.00 11.00 11.00 11.00 12.00 12.00  11.00  12.00  68.00      6               2    11.33             11.50
35.00                                35.00  35.00  35.00      1               1    35.00             35.00
Step 4:
• Produce one row of output per group.
ISBN  MIN    MAX    SUM    COUNT  COUNT_DISTINCT  AVERAGE  AVERAGE_DISTINCT
----  -----  -----  -----  -----  --------------  -------  ----------------
1-23  19.00  19.00  19.00      1               1    19.00             19.00
1-52  28.00  28.00  84.00      3               1    28.00             28.00
2-34  30.00  37.00  67.00      2               2    33.50             33.50
3-56  21.00  21.00  21.00      1               1    21.00             21.00
4-76  30.00  30.00  60.00      2               1    30.00             30.00
6-99  11.00  12.00  68.00      6               2    11.33             11.50
7-45  35.00  35.00  35.00      1               1    35.00             35.00
Observations:
• It is vitally important that there not be any variation in the non-aggregate values in each group since only one row of output per group is permitted and there can be no ambiguity about what goes in that row.
• For this reason db2 insists that the non-aggregate columns of the select_list match the columns in the group_by clause.
• Queries like the following are not permitted because of the potential that author and title might not be constant for a given ISBN (even though we know they are).
Select k.ISBN, k.author, k.title, sum(p_price) AS SUM
from copy c, book k
where c.isbn = k.isbn
group by k.ISBN
Explanation of Single-Row Rule:
• Suppose we started to build the work table for the previous query and that there were two copies with the same ISBN but different author or title.
• For the group ‘5-55’ what would the single output row look like?
ISBN  Author  Title  p_price
----  ------  -----  -------
. . .
5-55  X1      T1       27.00
5-55  X1      T2       33.00
. . .

The single output row would have to be either
5-55  X1      T1       60.00
OR
5-55  X1      T2       60.00
Since db2 can’t decide, it doesn’t let this happen.
Solution:
• Rewrite the query.
• The single group in the work table becomes two groups, so there is no ambiguity about the output.
Select k.ISBN, k.author, k.title, sum(p_price) AS SUM
from copy c, book k
where c.isbn = k.isbn
group by k.ISBN, k.author, k.title

ISBN  Author  Title  p_price
----  ------  -----  -------
. . .
----------------------------
5-55  X1      T1       27.00
----------------------------
5-55  X1      T2       33.00
----------------------------
. . .
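The ambiguity and its fix can be checked on the two '5-55' rows. This is a small sketch using Python's sqlite3 module, with a hypothetical table named work standing in for the invisible work table; SQLite is used only because it ships with Python, and unlike db2 it does not enforce the single-row rule, so the ambiguity is shown by counting distinct titles per ISBN instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the work table, holding the two '5-55' rows.
conn.execute("create table work (isbn text, author text, title text, p_price real)")
conn.executemany(
    "insert into work values (?, ?, ?, ?)",
    [("5-55", "X1", "T1", 27.00), ("5-55", "X1", "T2", 33.00)],
)

# Grouping by ISBN alone: the group contains two different titles, so a
# single output row could not say which title to show.
titles_per_isbn = conn.execute(
    "select isbn, count(distinct title) from work group by isbn"
).fetchall()

# Grouping by every non-aggregate column: each group has exactly one title.
rows = conn.execute(
    "select isbn, author, title, sum(p_price) from work "
    "group by isbn, author, title order by title"
).fetchall()

print(titles_per_isbn)
print(rows)
```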
Example:
• Find how many books have been borrowed by each cardholder.
• Problem: The result only has six cardholders and the original Cardholder table has seven cardholders.
• Analysis: One cardholder (Albert from Rosendale) has not borrowed any book and does not appear in the query result. Why?
select b_name, b_addr, count(*)
from cardholder ch, borrows b
where ch.borrowerid = b.borrowerid
group by b_name, b_addr

B_NAME  B_ADDR     3
------  ---------  -
diana   Tilson     1
jo-ann  New Paltz  2
john    Kingston   2
john    New Paltz  2
mike    Modena     3
susan   Wallkill   1
Example (cont):
• According to our description of how group by works, the first thing created is a work table. Let’s look at that table.
• The reason Albert of Rosendale is missing from the work table is that the join_term ch.borrowerid = b.borrowerid fails to be true for that cardholder.
• Since Albert never makes it to the work table he can never make it to the final answer table.

B_NAME  B_ADDR     L_DATE
------  ---------  ----------
john    New Paltz  12/10/1992
john    New Paltz  12/01/1992
jo-ann  New Paltz  12/14/1992
jo-ann  New Paltz  11/30/1992
mike    Modena     12/08/1992
mike    Modena     12/04/1992
mike    Modena     12/04/1992
john    Kingston   12/09/1992
diana   Tilson     12/12/1992
susan   Wallkill   12/01/1992
john    Kingston   11/28/1992
Solution:
• SQL has a special join called the outer join that helps resolve this problem.
• The left outer join acts like a normal join when the join_term is true.
• When the join_term is never true for a row in the table to the left of the left outer join syntax, the left outer join still produces that row once, with null in every column that comes from the right-hand table.
• This changes the work table.
select b_name, b_addr, count(l_date)
from cardholder ch left outer join borrows b
  on ch.borrowerid = b.borrowerid
group by b_name, b_addr
Solution (2):
• The reason for the null value is that, since there is no join between the row containing Albert’s information and the Borrows table, there is no corresponding l_date value, so the work table has to put a null value in place of a date.
• On top of that, count(l_date) skips null values, so a group whose only l_date is null gets a count of 0.

B_NAME  B_ADDR     L_DATE
------  ---------  ----------
john    New Paltz  12/10/1992
john    New Paltz  12/01/1992
albert  Rosendale  null
jo-ann  New Paltz  12/14/1992
jo-ann  New Paltz  11/30/1992
mike    Modena     12/08/1992
mike    Modena     12/04/1992
mike    Modena     12/04/1992
john    Kingston   12/09/1992
diana   Tilson     12/12/1992
susan   Wallkill   12/01/1992
john    Kingston   11/28/1992
Solution (3):
• The final table then becomes:
B_NAME  B_ADDR     3
------  ---------  -
albert  Rosendale  0
diana   Tilson     1
jo-ann  New Paltz  2
john    Kingston   2
john    New Paltz  2
mike    Modena     3
susan   Wallkill   1
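The difference between the two joins can be reproduced with Python's sqlite3 module. This is a sketch under assumptions: the table layouts are simplified (Borrows carries only borrowerid and l_date), the rows are taken from the work tables shown in this deck, and SQLite stands in for db2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cardholder (borrowerid integer primary key, b_name text, b_addr text);
    create table borrows (borrowerid integer, l_date text);
""")
conn.executemany("insert into cardholder values (?, ?, ?)", [
    (1234, "john", "New Paltz"), (1325, "jo-ann", "New Paltz"),
    (1345, "albert", "Rosendale"), (2653, "mike", "Modena"),
    (5342, "susan", "Wallkill"), (7635, "john", "Kingston"),
    (9823, "diana", "Tilson"),
])
conn.executemany("insert into borrows values (?, ?)", [
    (1234, "12/10/1992"), (1234, "12/01/1992"), (1325, "12/14/1992"),
    (1325, "11/30/1992"), (2653, "12/08/1992"), (2653, "12/04/1992"),
    (2653, "12/04/1992"), (7635, "12/09/1992"), (9823, "12/12/1992"),
    (5342, "12/01/1992"), (7635, "11/28/1992"),
])

# Ordinary join: albert has no matching Borrows row, so he never reaches
# the work table and cannot appear in the result.
inner = conn.execute("""
    select b_name, b_addr, count(*)
    from cardholder ch, borrows b
    where ch.borrowerid = b.borrowerid
    group by b_name, b_addr
    order by b_name, b_addr
""").fetchall()

# Left outer join: albert's row survives with a null l_date,
# and count(l_date) skips the null.
outer = conn.execute("""
    select b_name, b_addr, count(l_date)
    from cardholder ch left outer join borrows b
      on ch.borrowerid = b.borrowerid
    group by b_name, b_addr
    order by b_name, b_addr
""").fetchall()

print(len(inner), len(outer))
print(outer[0])
```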
Alternative (Incorrect) Solution:
• Be careful: the column used in the aggregate function must come from the right-hand table.
• The following query fails to produce the correct result.
• When count(ch.borrowerid) counts ch.borrowerid for Albert’s row, the value is not null, so it comes up with 1 rather than 0.
select b_name, b_addr, count(ch.borrowerid)
from cardholder ch left outer join borrows b
  on ch.borrowerid = b.borrowerid
group by b_name, b_addr

Work table:
B_NAME  B_ADDR     BORROWERID
------  ---------  ----------
diana   Tilson     9823
jo-ann  New Paltz  1325
jo-ann  New Paltz  1325
albert  Rosendale  1345
john    Kingston   7635
john    Kingston   7635
john    New Paltz  1234
john    New Paltz  1234
mike    Modena     2653
mike    Modena     2653
mike    Modena     2653
susan   Wallkill   5342
Alternative (Incorrect) Solution (cont):
• Trying to count from the Cardholder table and not the Borrows table yields the following incorrect solution:

B_NAME  B_ADDR     3
------  ---------  -
albert  Rosendale  1
diana   Tilson     1
jo-ann  New Paltz  2
john    Kingston   2
john    New Paltz  2
mike    Modena     3
susan   Wallkill   1

The count for albert turns out to be 1 instead of the correct 0.
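A quick way to see which count() variants are safe is to put them side by side on a cardholder with no loans. The sketch below again uses Python's sqlite3 as a stand-in for db2, with a two-cardholder subset of the data and simplified table layouts; the column aliases are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cardholder (borrowerid integer, b_name text);
    create table borrows (borrowerid integer, l_date text);
    insert into cardholder values (1345, 'albert'), (9823, 'diana');
    insert into borrows values (9823, '12/12/1992');
""")

rows = conn.execute("""
    select b_name,
           count(*)             as all_rows,    -- counts the null-extended row
           count(ch.borrowerid) as left_col,    -- left-hand column is never null
           count(b.borrowerid)  as right_col,   -- right-hand column is null for albert
           count(l_date)        as right_date   -- likewise null for albert
    from cardholder ch left outer join borrows b
      on ch.borrowerid = b.borrowerid
    group by b_name
    order by b_name
""").fetchall()

print(rows)
```

Only the counts over right-hand columns report 0 for albert; count(*) and the left-hand column both report 1.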
Left Outer Join vs Right Outer Join:
• The following are equivalent:
select b_name, b_addr, count(l_date)
from cardholder ch left outer join borrows b
  on ch.borrowerid = b.borrowerid
group by b_name, b_addr

select b_name, b_addr, count(l_date)
from borrows b right outer join cardholder ch
  on b.borrowerid = ch.borrowerid
group by b_name, b_addr
Warning:
• You are not allowed to use an aggregate function in a where_clause except inside a subquery.
• The error in this query is that where_clause conditions are evaluated one row at a time and count(*) is always applied to a set of rows as a unit.
Find the cardholders with two books borrowed (incorrect):
Select b_name, b_addr
from cardholder ch, borrows b
where ch.borrowerid = b.borrowerid
  AND count(*) = 2   -- this causes an error

Find the cardholders with two books borrowed (correct, using a correlated subquery):
Select b_name, b_addr
from cardholder ch
where 2 = (select count(*)
           from borrows b
           where b.borrowerid = ch.borrowerid)
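Both behaviours can be tried out with Python's sqlite3 module. This is a sketch on a four-cardholder subset of the data with simplified table layouts; SQLite is only a stand-in, and the exact error db2 reports will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cardholder (borrowerid integer, b_name text, b_addr text);
    create table borrows (borrowerid integer, l_date text);
    insert into cardholder values
        (1325, 'jo-ann', 'New Paltz'), (1345, 'albert', 'Rosendale'),
        (2653, 'mike', 'Modena'), (9823, 'diana', 'Tilson');
    insert into borrows values
        (1325, '12/14/1992'), (1325, '11/30/1992'),
        (2653, '12/08/1992'), (2653, '12/04/1992'), (2653, '12/04/1992'),
        (9823, '12/12/1992');
""")

# An aggregate function in the where_clause is rejected.
try:
    conn.execute("""
        select b_name, b_addr
        from cardholder ch, borrows b
        where ch.borrowerid = b.borrowerid and count(*) = 2
    """)
    rejected = False
except sqlite3.Error:
    rejected = True

# The correlated subquery is evaluated once per cardholder row, so the
# where_clause only ever compares 2 with a single number.
two_books = conn.execute("""
    select b_name, b_addr
    from cardholder ch
    where 2 = (select count(*)
               from borrows b
               where b.borrowerid = ch.borrowerid)
""").fetchall()

print(rejected, two_books)
```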
Complete Group By Syntax:
• The having_clause is intended to do for groups what the where_clause does for rows. In other words, the having_clause is intended to include some groups and not others.
select <non-aggregate column list>, <aggregate column list>
from <table list>
where <condition>
group by <non-aggregate column list>
having <group condition>
How the Complete Group-By Works:
• Step 1: Ignore the group_by clause, use the where_clause to build a work table invisible to the programmer. The work table will contain all the columns necessary to calculate the final result table.
• Step 2: Use the columns in the group_by clause to divide the work table into groups where the values of the group_by columns are the same.
• Step 3: Apply the having_clause condition to each group in turn, throwing away groups where it is false.
• Step 4: Calculate the aggregate functions of the select_list one group at a time.
• Step 5: Produce one row of output per group.
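To see the having_clause discard whole groups, the Copy prices from the earlier example can be grouped by ISBN and filtered on the group total. This is a sketch with Python's sqlite3 standing in for db2; the Copy table is reduced to the two columns that matter here, and the 40.00 threshold is an arbitrary illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table copy (isbn text, p_price real)")
conn.executemany("insert into copy values (?, ?)", [
    ("6-99", 12), ("1-52", 28), ("6-99", 12), ("1-23", 19),
    ("6-99", 11), ("3-56", 21), ("1-52", 28), ("6-99", 11),
    ("1-52", 28), ("4-76", 30), ("6-99", 11), ("2-34", 30),
    ("7-45", 35), ("6-99", 11), ("2-34", 37), ("4-76", 30),
])

# Groups whose total is below 40.00 are thrown away before any output
# row is produced for them.
rows = conn.execute("""
    select isbn, sum(p_price)
    from copy
    group by isbn
    having sum(p_price) >= 40.00
    order by isbn
""").fetchall()

print(rows)
```

Three of the seven ISBN groups (1-23, 3-56 and 7-45) fail the condition and never reach the output.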
Example:
• For each cardholder, find the total value of all books on loan to that cardholder provided the total value exceeds $40.00.
• NOTE: We don’t need to use left outer join here because we are only interested in cardholders with one or more book loans.
select b_name, b_addr, sum(p_price)
from cardholder ch, borrows b, copy c
where ch.borrowerid = b.borrowerid
  AND b.accession_no = c.accession_no
group by b_name, b_addr
having sum(p_price) > 40.00

B_NAME  B_ADDR    3
------  --------  -----
john    Kingston  58.00
mike    Modena    95.00
Example 2:
• For each cardholder, find the total value of all books on loan to that cardholder provided the total value is less than $40.00.
• NOTES:
– coalesce(A,B): if A is null then the value is B, otherwise A.
– The join to Copy must also be a left outer join; a where_clause condition b.accession_no = c.accession_no would discard Albert’s row, because his b.accession_no is null.
select b_name, b_addr, coalesce(sum(p_price),0.0)
from cardholder ch
  left outer join borrows b on ch.borrowerid = b.borrowerid
  left outer join copy c on b.accession_no = c.accession_no
group by b_name, b_addr
having coalesce(sum(p_price),0.0) < 40.00;

B_NAME  B_ADDR     3
------  ---------  -----
albert  Rosendale   0.00
diana   Tilson     28.00
jo-ann  New Paltz  39.00
john    New Paltz  30.00
susan   Wallkill   37.00
Revisit Left Outer Join:
• Yes, know how to do them but avoid them if you can.
• Consider the Cardholder <borrows> Copy relationship (diagram not reproduced):
– To fully join Cardholder to borrows or Copy we need a left outer join.
– To join Book to Copy we do not need a left outer join.
Dummy Rows in Copy and Book
• Perform the following inserts into Book and Copy
• Think of these as “dummy” rows; the inserts need to be done only once.
• Minimum participation number of COPY is_copy_of BOOK stays as 1
insert into Book (ISBN) values ('0-00');
insert into Copy (accession_no, ISBN) values ('0','0-00');
Insert a New Cardholder
• Every time you add a new Cardholder, add a corresponding dummy row in Borrows.
• What we have done is make it appear as though Donna has borrowed the “dummy” copy of the “dummy” book.
• Now Cardholder <borrows> Copy minimum participation number is 1.
insert into Cardholder (borrowerid,b_name,b_addr,b_status)
  values (9999,'Donna','Accord','junior');
-- also add
insert into Borrows (borrowerid, accession_no)
  values (9999,'0');
Automatic Input
• Databases provide a mechanism called a trigger to do automatic things like the insert into Borrows.
• Insert a row into Cardholder and the trigger “fires” and causes an insert to take place in Borrows as well.
• So even Cardholders who have borrowed nothing have borrowed the dummy book.
create trigger i_cardholder
after insert on Cardholder
referencing new as n
for each row
begin atomic
  insert into borrows (borrowerid,accession_no)
    values (n.borrowerid,'0');
end@
Automatic Input (cont)
• We also need triggers on Borrows because each cardholder must have borrowed either the dummy book or real books, but not both.
create trigger i_borrows
after insert on borrows
referencing new as n
for each row
when (n.accession_no <> '0')
begin atomic
  -- a real loan replaces the cardholder's dummy loan, if any
  delete from borrows
   where borrowerid = n.borrowerid
     and accession_no = '0';
end@
Automatic Input (cont)
• And when we delete a book loan: if it was the cardholder’s last loan, the dummy loan is restored.
create trigger d_borrows
after delete on borrows
referencing old as o
for each row
BEGIN ATOMIC
  declare v_accession_cnt int;
  set v_accession_cnt = (select count(*)
                           from borrows
                          where borrowerid = o.borrowerid);
  IF (v_accession_cnt = 0) THEN
    insert into borrows (borrowerid,accession_no)
      values (o.borrowerid,'0');
  END IF;
END@
No More Left Outer Join:
• Find the number of books borrowed by each cardholder.
• NOTES:
– qnec(a,b) is a user-defined function that returns 1 if a != b and 0 if a = b.
– Summing these 0/1 values counts the rows where the value is 1, so the dummy loan contributes 0.
select ch.borrowerid, b_name, b_addr, sum(qnec(b.accession_no,'0'))
from cardholder ch, borrows b
where ch.borrowerid = b.borrowerid
group by ch.borrowerid, b_name, b_addr;
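The dummy-row query can be tried with Python's sqlite3 module, which lets a Python function be registered as a SQL function. This is a sketch under assumptions: the Python qnec below is a stand-in for the db2 user-defined function, the accession numbers 'a1' to 'a3' are made up, and the dummy loan for albert is inserted by hand rather than by a trigger.

```python
import sqlite3

def qnec(a, b):
    """Return 1 if a != b, else 0 (stand-in for the user-defined function)."""
    return 1 if a != b else 0

conn = sqlite3.connect(":memory:")
conn.create_function("qnec", 2, qnec)  # expose qnec(a, b) to SQL
conn.executescript("""
    create table cardholder (borrowerid integer, b_name text, b_addr text);
    create table borrows (borrowerid integer, accession_no text);
    insert into cardholder values
        (1345, 'albert', 'Rosendale'), (9823, 'diana', 'Tilson'),
        (1325, 'jo-ann', 'New Paltz');
    -- accession numbers 'a1'..'a3' are made up; '0' is the dummy copy
    insert into borrows values
        (1345, '0'), (9823, 'a1'), (1325, 'a2'), (1325, 'a3');
""")

# Plain inner join: albert is kept because he "borrowed" the dummy copy,
# and qnec scores that row as 0.
rows = conn.execute("""
    select ch.borrowerid, b_name, b_addr, sum(qnec(b.accession_no, '0'))
    from cardholder ch, borrows b
    where ch.borrowerid = b.borrowerid
    group by ch.borrowerid, b_name, b_addr
    order by b_name
""").fetchall()

print(rows)
```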
Non-Aggregate Example
• Suppose we want a list of all books a cardholder has borrowed and the cardholder names. Place a – where the cardholder has borrowed no books.
With the dummy rows in place (the dummy book has a null title):
select b_name, coalesce(title,'-')
from cardholder ch, borrows b, copy c, book k
where ch.borrowerid = b.borrowerid
  and b.accession_no = c.accession_no
  and c.isbn = k.isbn;
compared to the left outer join version:
select b_name, coalesce(title,'-')
from cardholder ch
  left outer join borrows b on ch.borrowerid = b.borrowerid
  left outer join copy c on b.accession_no = c.accession_no
  left outer join book k on c.isbn = k.isbn;