Slide 1
NULLs & Outer Joins
Objectives of the Lecture :
• To consider the use of NULLs in SQL.
• To consider Outer Join operations, and their
implementation in SQL.
Slide 2
Missing Values : Possible Strategies
Use a special value to represent missing data.
E.g. 'N/A', 'T.B.A.'
The special value must have the same data type as the data
that is missing, so it can be stored with the data that is
known.
Requires no special facility from the DBMS.
Use NULL to represent missing data.
NULL is the absence of a value.
NULL ≠ 0 and NULL ≠ ' ' (a space character).
NULL is not part of any data type.
Requires special support from the DBMS.
SQL DBMSs provide this support.
So most DBs use NULLs to represent missing data.
This is revision based on part of the earlier lecture 'The Data in a Relation'.
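The point that NULL is neither zero nor a space can be checked directly. The sketch below is an illustration only: it uses SQLite through Python's standard sqlite3 module as a stand-in for Oracle (an assumption made so that it runs anywhere), and relies only on the standard IS NULL comparator.

```python
import sqlite3


def null_checks():
    """Return (0 IS NULL, ' ' IS NULL, NULL IS NULL); SQLite reports true as 1, false as 0."""
    conn = sqlite3.connect(":memory:")
    try:
        # Zero and a space are real values of real data types, so neither is NULL.
        return conn.execute("SELECT 0 IS NULL, ' ' IS NULL, NULL IS NULL").fetchone()
    finally:
        conn.close()


print(null_checks())  # (0, 0, 1): only NULL itself is NULL
```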
Slide 3
Display of SQL NULLs
EmpNo  EName     M-S  Sal
E3     Smith     S    18,000
E5     Robinson  M    24,000
E9     Graham    S
E1     Robson    D    32,500
E2     Atkins         24,000
E6     Blakelaw  M    54,000
E7     Mortimer  D    28,000
E4     Fenwick   S    40,000
Graham's Sal and Atkins's M-S are NULL. A NULL is shown as blank space in Oracle,
as the keyword NULL in other SQL DBMSs, and in other ways in other DBMSs.
This is how Oracle displays NULLs in a retrieved table. Other SQL DBMSs use
different conventions to display a NULL.
Slide 4
Dealing with NULLs in SQL Tables
Three situations arise :
• Comparisons of column values. This occurs in the SQL equivalents
of the Restrict and the various Join operations, plus Deletions and Updates.
• Calculations involving column values. This occurs in the SQL equivalents
of the GroupBy and Extend operations.
• Comparisons of row values. This occurs in the SQL equivalents
of the Project, GroupBy, Union, Intersect, and Difference operations.
Each of these three cases is now considered in turn :-
Case One : Comparison of Column Values
Slide 5
Comparison of Column Values (1)
SQL provides special comparators to check for NULL :-
X IS NULL
X IS NOT NULL
Let X be a numeric column. If X has a value, the comparison
X = 3
makes sense. It should yield true or false.
Suppose X is NULL. Logically, an error should arise, since there is no value to compare with 3.
In fact SQL treats the NULL as representing an existing but
unknown value. The comparison returns maybe.
Rationale : We don't know whether X = 3,
because X is NULL (= not available).
Note : X may represent some other case of missing data
(e.g. not applicable, does not exist).
The result is still maybe even though this is then illogical.
NULLs can be used to represent many different cases of missing data. Each different
case may require its own rationale for how to handle missing data, and they can vary
significantly. So SQL's choice of rationale will generally only be valid in certain cases.
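To see the three outcomes side by side, the sketch below (SQLite via Python's sqlite3, standing in for Oracle; the table T and its values are hypothetical) evaluates X = 3 and X IS NULL for each row. SQLite reports true as 1, false as 0 and maybe as NULL, which Python shows as None.

```python
import sqlite3


def compare_with_three():
    """For each row of a hypothetical table T, return (X, X = 3, X IS NULL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (X INTEGER)")
    conn.executemany("INSERT INTO T (X) VALUES (?)", [(3,), (4,), (None,)])
    # X = 3 is true, false, or maybe (NULL); X IS NULL is always true or false.
    rows = conn.execute("SELECT X, X = 3, X IS NULL FROM T ORDER BY rowid").fetchall()
    conn.close()
    return rows


for row in compare_with_three():
    print(row)
# (3, 1, 0)
# (4, 0, 0)
# (None, None, 1)
```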
Slide 6
Comparisons of Column Values (2)
Let X and Y be numeric columns. Consider the comparison
X = Y
Suppose X and Y are both NULL.
The result is maybe, not true.
NULL is not the same as maybe : NULL is the absence of a value,
whereas maybe is a truth value.
SQL uses NULL to mean maybe !
To avoid confusion in the remainder of the lecture, we will still assume the value maybe
exists in SQL.
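A minimal check of the X = Y case, under the same kind of assumptions (SQLite via Python's sqlite3 as a stand-in for Oracle; hypothetical table T): with both columns NULL the comparison yields NULL rather than true, negating it still gives NULL, and the result itself passes an IS NULL test, which is exactly the sense in which SQL uses NULL for maybe.

```python
import sqlite3


def compare_two_nulls():
    """Return (X = Y, NOT (X = Y), (X = Y) IS NULL) for a row where X and Y are both NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (X INTEGER, Y INTEGER)")
    conn.execute("INSERT INTO T (X, Y) VALUES (NULL, NULL)")
    # maybe stays maybe under NOT; the truth value itself is stored as NULL.
    row = conn.execute("SELECT X = Y, NOT (X = Y), (X = Y) IS NULL FROM T").fetchone()
    conn.close()
    return row


print(compare_two_nulls())  # (None, None, 1)
```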
Slide 7
Restricts, Joins, Updates and Deletions
Restrict  SELECT * FROM TableName WHERE condition ;
Join      SELECT * FROM Table1 NATURAL JOIN Table2 ;
Delete    DELETE FROM TableName WHERE condition ;
Update    UPDATE TableName SET column(s) = new value(s) WHERE condition ;
In each case a column comparison is used as the condition (for a Natural Join, the
implicit comparison of the common columns). Similarly for other kinds of Join.
The Restrict / Join / Delete / Update action is taken only where the condition evaluates to true,
not where it evaluates to maybe or false.
In principle, there is no problem with this.
In practice, problems arise because the presence of NULLs is forgotten, or because it is assumed
that SQL will take the same 'reasonable' approach to NULLs that the user does, when in
fact SQL doesn't take that approach.
We need to assume NULLs may occur and include appropriate conditions for them.
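The following sketch (SQLite via Python's sqlite3, standing in for Oracle; a cut-down, hypothetical EMP with only EmpNo and Sal) shows a Deletion whose condition looks as if it covers every row. Because both comparisons are maybe for a NULL Sal, the combined condition is maybe too, and that row survives.

```python
import sqlite3


def delete_by_salary():
    """Delete rows matching one of two complementary conditions; return the survivors."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMP (EmpNo TEXT PRIMARY KEY, Sal INTEGER)")
    conn.executemany("INSERT INTO EMP VALUES (?, ?)",
                     [("E3", 18000), ("E5", 24000), ("E9", None)])
    # For E9 both comparisons are maybe, and maybe OR maybe is still maybe,
    # so the DELETE does not act on that row.
    conn.execute("DELETE FROM EMP WHERE Sal < 20000 OR Sal >= 20000")
    survivors = conn.execute("SELECT EmpNo, Sal FROM EMP").fetchall()
    conn.close()
    return survivors


print(delete_by_salary())  # [('E9', None)]
```

Adding OR Sal IS NULL to the condition would make the deletion cover every row.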
Consider an example of what happens if this is forgotten :-
Slide 8
Unexpected Results (1)
They arise when forgetting that the condition can evaluate to maybe.
Example :-
SELECT *
FROM EMP
WHERE Sal >= 20000
UNION
SELECT *
FROM EMP
WHERE Sal < 20000 ;
The 2 Restrictions will not necessarily contain all the rows
of EMP between them.
If column 'Sal' contains any NULLs, the result will not re-create table EMP.
Of course it is not assumed that you will use 2 queries simply to re-create the original
table !
Each Restriction condition is the logical inverse of the other. Consequently there is a
temptation to think that those rows not retrieved from the table by one query must
inevitably be retrieved by the other query. This would be guaranteed if it were not for
NULLs. If 'Sal' contains NULLs, there is a third set of rows, those whose 'Sal' is NULL,
which neither query retrieves; only a Restriction condition that tests for NULL, such as
Sal IS NULL, retrieves them.
Therefore it is essential to decide when formulating the required Restriction condition
whether the NULL-including rows should be retrieved as well.
The following shows the Restriction conditions that correspond to the three sets of rows
that arise when NULLs are included.
Slide 9
Unexpected Results (2)
To ensure the table is re-created,
re-write the query as follows :-
SELECT * FROM EMP WHERE Sal >= 20000
UNION
SELECT * FROM EMP WHERE Sal < 20000
UNION
SELECT * FROM EMP WHERE Sal IS NULL ;
In general, adjust statements to reflect the possibility of NULLs.
In this particular case, should the rows with “NULL Sals” be retrieved with the salaries
that are £20,000 or more, or with those that are less than £20,000; or do we genuinely
want to ignore the “NULL Sals” ?
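The two versions of the query can be compared on the EMP data from Slide 3. This is a sketch using SQLite via Python's sqlite3 as a stand-in for Oracle; the column M-S is renamed M_S (an assumption), because a hyphen is not legal in an unquoted SQL identifier.

```python
import sqlite3

# EMP rows from Slide 3 (None marks a NULL); M-S is renamed M_S.
EMP_ROWS = [
    ("E3", "Smith", "S", 18000), ("E5", "Robinson", "M", 24000),
    ("E9", "Graham", "S", None), ("E1", "Robson", "D", 32500),
    ("E2", "Atkins", None, 24000), ("E6", "Blakelaw", "M", 54000),
    ("E7", "Mortimer", "D", 28000), ("E4", "Fenwick", "S", 40000),
]

TWO_WAY = """
SELECT * FROM EMP WHERE Sal >= 20000
UNION
SELECT * FROM EMP WHERE Sal < 20000
"""
THREE_WAY = TWO_WAY + """
UNION
SELECT * FROM EMP WHERE Sal IS NULL
"""


def rebuild_emp():
    """Return (row count of EMP, rows from the 2-way union, rows from the 3-way union)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMP (EmpNo TEXT PRIMARY KEY, EName TEXT, M_S TEXT, Sal INTEGER)")
    conn.executemany("INSERT INTO EMP VALUES (?, ?, ?, ?)", EMP_ROWS)
    total = conn.execute("SELECT COUNT(*) FROM EMP").fetchone()[0]
    two_way = conn.execute(TWO_WAY).fetchall()
    three_way = conn.execute(THREE_WAY).fetchall()
    conn.close()
    return total, two_way, three_way


total, two_way, three_way = rebuild_emp()
print(total, len(two_way), len(three_way))  # 8 7 8
missing = {row[0] for row in three_way} - {row[0] for row in two_way}
print(missing)  # {'E9'}: Graham, whose Sal is NULL
```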
Slide 10
Join involving NULLs : Example
R :
P#  S#  Qty
P1  S1  5
P2      10
P2  S2  7
S :
S#  Details
S1  ……..
S2  ……..
S3  ……..
SELECT *
FROM R Natural Join S ;
Result :
P#  S#  Qty  Details
P1  S1  5    ………..
P2  S2  7    ………..
The row of R whose S# is NULL (P2, 10) does not appear in the result.
If we want to include the missing row, we need to replace
R Natural Join S
with
R Join S On ( some condition involving NULLs )
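A sketch of the join above in SQLite via Python's sqlite3 (standing in for Oracle). The columns P# and S# are renamed PNo and SNo, an assumption made so the identifiers need no quoting; the Details values are placeholders.

```python
import sqlite3


def natural_join_rows():
    """Natural-join R and S; return (PNo, Qty) for the rows that survive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE R (PNo TEXT, SNo TEXT, Qty INTEGER)")
    conn.execute("CREATE TABLE S (SNo TEXT, Details TEXT)")
    conn.executemany("INSERT INTO R VALUES (?, ?, ?)",
                     [("P1", "S1", 5), ("P2", None, 10), ("P2", "S2", 7)])
    conn.executemany("INSERT INTO S VALUES (?, ?)",
                     [("S1", "..."), ("S2", "..."), ("S3", "...")])
    # The natural join compares R.SNo = S.SNo; for ('P2', NULL, 10) every
    # comparison is maybe, so that row finds no partner and is dropped.
    joined = conn.execute(
        "SELECT PNo, Qty FROM R NATURAL JOIN S ORDER BY PNo").fetchall()
    conn.close()
    return joined


print(natural_join_rows())  # [('P1', 5), ('P2', 7)]
```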
Slide 11
Oracle's “NVL” Function
NVL supplies a value to use whenever a NULL is encountered.
It can be used in SELECT and WHERE phrases.
Example : NVL( Sal, 0 )
This yields the value of the 'Sal' column, except that if 'Sal' is
NULL, then a value of zero is returned.
NVL can be used to ensure a comparison always yields
true or false, and never maybe.
Example : ……... WHERE NVL( M-S, 'S' ) <> 'M'
Put a column name in the first position.
Put a value in the second position.
Value 'S' is used in the comparison when 'M-S' is NULL.
Comparison can never return maybe.
Most SQL DBMSs have some function analogous to Oracle's NVL, although it may be
somewhat different; the SQL standard provides COALESCE, which Oracle also supports.
Such a function can be very useful in practice, and can make it easier to design conditions
that cope with possible NULLs.
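SQLite, used here via Python's sqlite3 as a stand-in for Oracle, has no NVL, so this sketch swaps in the SQL-standard COALESCE, which returns its first non-NULL argument and so acts like NVL in this two-argument case. It runs the M-S example (with M-S renamed M_S) with and without the default value, on the Slide 3 data.

```python
import sqlite3

# EMP rows from Slide 3 (None marks a NULL); M-S is renamed M_S.
EMP_ROWS = [
    ("E3", "Smith", "S", 18000), ("E5", "Robinson", "M", 24000),
    ("E9", "Graham", "S", None), ("E1", "Robson", "D", 32500),
    ("E2", "Atkins", None, 24000), ("E6", "Blakelaw", "M", 54000),
    ("E7", "Mortimer", "D", 28000), ("E4", "Fenwick", "S", 40000),
]


def ms_not_m(use_default):
    """Return EmpNos whose M_S is not 'M', optionally treating a NULL M_S as 'S'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMP (EmpNo TEXT PRIMARY KEY, EName TEXT, M_S TEXT, Sal INTEGER)")
    conn.executemany("INSERT INTO EMP VALUES (?, ?, ?, ?)", EMP_ROWS)
    # COALESCE(M_S, 'S') plays the role of NVL( M-S, 'S' ): never NULL, so never maybe.
    condition = "COALESCE(M_S, 'S') <> 'M'" if use_default else "M_S <> 'M'"
    rows = conn.execute(f"SELECT EmpNo FROM EMP WHERE {condition} ORDER BY EmpNo").fetchall()
    conn.close()
    return [emp_no for (emp_no,) in rows]


print(ms_not_m(use_default=False))  # ['E1', 'E3', 'E4', 'E7', 'E9']: Atkins (E2) is lost
print(ms_not_m(use_default=True))   # ['E1', 'E2', 'E3', 'E4', 'E7', 'E9']
```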
Case Two : Calculations Involving Column Values
Slide 12
Calculations Involving Column Values
These arise in two situations :
• Scalar calculations along rows (Extend).
• Aggregate calculations along columns (GroupBy).
EmpNo  EName     M-S  Sal
E3     Smith     S
E5     Robinson  M    24,000
E9     Graham    S
E1     Robson    D    32,500
E2     Atkins    M
E6     Blakelaw  M
E7     Mortimer  D    28,000
E4     Fenwick   S    40,000
We will now consider these two situations :-
Slide 13
Scalar Calculations
Any calculation involving a NULL
results in a NULL.
Examples : let n be NULL. Then :-
n + 1 → NULL
n concatenate “ABCD” → NULL (standard SQL; Oracle's || is an exception and returns “ABCD”)
n - n → NULL (not zero)
Example :-
SELECT Sal + 100 AS NewSal
FROM EMP ;
So “NewSal” will be NULL whenever “Sal” is NULL.
This can give some surprising results. Take care !
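A minimal sketch of these rules, using SQLite via Python's sqlite3 with a hypothetical two-row EMP. The || result is standard SQL behaviour as implemented by SQLite, and is not what Oracle returns.

```python
import sqlite3


def scalar_examples():
    """Return the three scalar examples for n = NULL, plus Sal + 100 for a small EMP."""
    conn = sqlite3.connect(":memory:")
    # Any arithmetic or concatenation involving NULL yields NULL (None in Python).
    row = conn.execute("SELECT n + 1, n || 'ABCD', n - n FROM (SELECT NULL AS n)").fetchone()
    conn.execute("CREATE TABLE EMP (EmpNo TEXT, Sal INTEGER)")
    conn.executemany("INSERT INTO EMP VALUES (?, ?)", [("E3", 18000), ("E9", None)])
    new_sals = conn.execute(
        "SELECT EmpNo, Sal + 100 AS NewSal FROM EMP ORDER BY EmpNo").fetchall()
    conn.close()
    return row, new_sals


row, new_sals = scalar_examples()
print(row)       # (None, None, None)
print(new_sals)  # [('E3', 18100), ('E9', None)]
```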
Slide 14
Aggregate Calculations
If the columns being aggregated contain one or more NULLs, then :
• Sum, Avg, Min, Max, Count( column ) and Count( Distinct ) ignore the NULLs;
• Count(*) includes the NULLs.
Not all of these are mathematically valid. Take note and take care !
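The aggregate rules can be checked on the salaries from the Slide 12 table, where four of the eight Sal values are NULL. The sketch uses SQLite via Python's sqlite3 as a stand-in for Oracle. Its last column divides the total by Count(*) to show why Avg, which silently drops the NULLs, is not always the figure one wants.

```python
import sqlite3

# Sal values from the Slide 12 table (None marks a NULL).
SALARIES = [("E3", None), ("E5", 24000), ("E9", None), ("E1", 32500),
            ("E2", None), ("E6", None), ("E7", 28000), ("E4", 40000)]


def aggregates():
    """Return the standard aggregates over Sal, plus Sum divided by Count(*)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMP (EmpNo TEXT PRIMARY KEY, Sal INTEGER)")
    conn.executemany("INSERT INTO EMP VALUES (?, ?)", SALARIES)
    row = conn.execute(
        "SELECT SUM(Sal), AVG(Sal), MIN(Sal), MAX(Sal), "
        "COUNT(Sal), COUNT(DISTINCT Sal), COUNT(*), "
        "SUM(Sal) * 1.0 / COUNT(*) FROM EMP").fetchone()
    conn.close()
    return row


print(aggregates())  # (124500, 31125.0, 24000, 40000, 4, 4, 8, 15562.5)
```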
Slide 15
Example : Aggregation in GroupBy
SELECT Sum(Sal) AS Total, M-S
FROM EMP
GROUP BY M-S ;
EmpNo  EName     M-S  Sal
E3     Smith     S
E5     Robinson  M    24,000
E9     Graham    S
E1     Robson    D    32,500
E2     Atkins    M
E6     Blakelaw  M
E7     Mortimer  D    28,000
E4     Fenwick   S    40,000
Result :
Total   M-S
40,000  S
24,000  M
60,500  D
We consider here the aggregate calculation aspects of GroupBy, not the grouping
aspects.
Depending on the circumstances, what SQL does about NULLs may or may not be
appropriate. See the earlier discussion on the rationale concerning missing data.
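The GroupBy above can be reproduced as a sketch in SQLite via Python's sqlite3 (standing in for Oracle), using the table from this slide with M-S renamed M_S. Sum skips each group's NULL salaries, so the S and M totals come from a single salary each.

```python
import sqlite3

# EMP rows from this slide (None marks a NULL); M-S is renamed M_S.
EMP_ROWS = [
    ("E3", "S", None), ("E5", "M", 24000), ("E9", "S", None), ("E1", "D", 32500),
    ("E2", "M", None), ("E6", "M", None), ("E7", "D", 28000), ("E4", "S", 40000),
]


def totals_by_ms():
    """Return {M_S: Sum(Sal)} for the GROUP BY on this slide."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMP (EmpNo TEXT PRIMARY KEY, M_S TEXT, Sal INTEGER)")
    conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", EMP_ROWS)
    rows = conn.execute("SELECT M_S, SUM(Sal) AS Total FROM EMP GROUP BY M_S").fetchall()
    conn.close()
    return dict(rows)


print(totals_by_ms())  # {'D': 60500, 'M': 24000, 'S': 40000}
```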
Case Three : Comparison of Row Values
Slide 16
Comparisons of Rows
In SQL, two rows are identical if :
• they have the same number of attributes;
• corresponding attributes are of the same data type;
• corresponding attributes have the same value.
In an SQL row comparison (as used to eliminate duplicate rows),
a NULL compared to a NULL → true.
In an SQL column comparison
(e.g. for a Join operation),
a NULL compared to a NULL → maybe.
Different !!
Slide 17
Example : Row Comparison
Comparison of M_S column values in these 2 rows :
E_No  E_Name  M_S  Sal
E2    Atkins       24,000
E2    Atkins       24,000
Row Comparison : 2 NULLs are defined to be identical.
A comparison yields true !! So these 2 rows are identical.
Column Comparison : 2 NULLs are not assumed identical.
A comparison yields maybe !! So these rows are rejected.
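The difference between the two kinds of comparison shows up directly in SQLite (via Python's sqlite3, as a stand-in for Oracle): DISTINCT treats the two Atkins rows as duplicates, while a self-join on M_S finds no matching pair. The table T is hypothetical.

```python
import sqlite3


def row_vs_column():
    """Return (rows kept by DISTINCT, number of pairs matched by a join on M_S)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (E_No TEXT, E_Name TEXT, M_S TEXT, Sal INTEGER)")
    conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?)", [("E2", "Atkins", None, 24000)] * 2)
    # Row comparison: the two NULL M_S values count as identical, so one row remains.
    distinct_rows = conn.execute("SELECT DISTINCT * FROM T").fetchall()
    # Column comparison: NULL = NULL is maybe, so no pair of rows is joined.
    joined = conn.execute(
        "SELECT COUNT(*) FROM T AS a JOIN T AS b ON a.M_S = b.M_S").fetchone()[0]
    conn.close()
    return distinct_rows, joined


distinct_rows, joined = row_vs_column()
print(distinct_rows)  # [('E2', 'Atkins', None, 24000)]
print(joined)         # 0
```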
Slide 18
Project, GroupBy, & Set Operators
Project SELECT DISTINCT ColumnName(s)
FROM TableName ;
GroupBy SELECT “Aggregation(s)”, GroupingCol(s)
FROM TableName
GROUP BY GroupingCol(s) ;
Set Ops SELECT *
FROM TableName1
UNION
SELECT *
FROM TableName2 ;
Project / GroupBy (grouping rows) / Union / Intersect / Except
action taken on the basis that all NULLs are identical.
Similarly for the other Set Ops, Intersect & Except/Minus.
Consider some examples :-
Slide 19
Example : Projection
SELECT DISTINCT M-S
FROM EMP ;
EmpNo  EName     M-S  Sal
E3     Smith     S    18,000
E5     Robinson  M    24,000
E9     Graham         18,000
E1     Robson    D    32,500
E2     Atkins         24,000
E6     Blakelaw  M    54,000
E7     Mortimer  D    28,000
E4     Fenwick   W    40,000
Result :
M-S
S
M
(NULL, displayed as a blank row; Graham's and Atkins's NULLs count as one value)
D
W
Slide 20
Example : Grouping in 'GroupBy'
SELECT “Aggregation”, M-S
FROM EMP
GROUP BY M-S ;
Result :
Aggregate   M-S
“Agg-Val1”  S
“Agg-Val2”  M
“Agg-Val3”        (NULL : Graham and Atkins form a single group)
“Agg-Val4”  D
EmpNo  EName     M-S  Sal
E3     Smith     S    18,000
E5     Robinson  M    24,000
E9     Graham         18,000
E1     Robson    D    32,500
E2     Atkins         24,000
E6     Blakelaw  M    54,000
E7     Mortimer  D    28,000
E4     Fenwick   S    40,000
This example concerns the grouping of rows, not the calculation of aggregate values.
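Projection and grouping both treat the NULLs as identical, so the two NULL M-S values (Graham and Atkins) form a single value and a single group. A sketch using this slide's data in SQLite via Python's sqlite3 (standing in for Oracle), with COUNT(*) as a placeholder aggregation and M-S renamed M_S:

```python
import sqlite3

# EMP rows from this slide (None marks a NULL); M-S is renamed M_S.
EMP_ROWS = [
    ("E3", "S", 18000), ("E5", "M", 24000), ("E9", None, 18000), ("E1", "D", 32500),
    ("E2", None, 24000), ("E6", "M", 54000), ("E7", "D", 28000), ("E4", "S", 40000),
]


def project_and_group():
    """Return (set of distinct M_S values, {M_S: number of rows in that group})."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMP (EmpNo TEXT PRIMARY KEY, M_S TEXT, Sal INTEGER)")
    conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", EMP_ROWS)
    distinct_ms = {ms for (ms,) in conn.execute("SELECT DISTINCT M_S FROM EMP").fetchall()}
    groups = dict(conn.execute("SELECT M_S, COUNT(*) FROM EMP GROUP BY M_S").fetchall())
    conn.close()
    return distinct_ms, groups


distinct_ms, groups = project_and_group()
print(len(distinct_ms))  # 4: S, M, D and a single NULL
print(groups[None])      # 2: Graham and Atkins share one group
```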
Slide 21
Example : Union Operation
EmpNo  EName     M-S  Sal
E3     Smith     S    18,000
E5     Robinson  M    24,000
E9     Graham    S
E1     Robson    D    32,500
E2     Atkins         24,000
Union
EmpNo  EName     M-S  Sal
E1     Robson    D    32,500
E2     Atkins         24,000
E6     Blakelaw  M    54,000
Result :
EmpNo  EName     M-S  Sal
E3     Smith     S    18,000
E5     Robinson  M    24,000
E9     Graham    S
E1     Robson    D    32,500
E2     Atkins         24,000
E6     Blakelaw  M    54,000
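A sketch of the Union above in SQLite via Python's sqlite3 (standing in for Oracle), with the two operand tables given the hypothetical names A and B and M-S renamed M_S. UNION treats the two Atkins rows as identical despite their NULL M-S, whereas UNION ALL, which does no duplicate elimination, keeps both.

```python
import sqlite3

A_ROWS = [("E3", "Smith", "S", 18000), ("E5", "Robinson", "M", 24000),
          ("E9", "Graham", "S", None), ("E1", "Robson", "D", 32500),
          ("E2", "Atkins", None, 24000)]
B_ROWS = [("E1", "Robson", "D", 32500), ("E2", "Atkins", None, 24000),
          ("E6", "Blakelaw", "M", 54000)]


def union_results():
    """Return (rows from A UNION B, rows from A UNION ALL B)."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("A", A_ROWS), ("B", B_ROWS)):
        conn.execute(f"CREATE TABLE {name} (EmpNo TEXT, EName TEXT, M_S TEXT, Sal INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?)", rows)
    union = conn.execute("SELECT * FROM A UNION SELECT * FROM B").fetchall()
    union_all = conn.execute("SELECT * FROM A UNION ALL SELECT * FROM B").fetchall()
    conn.close()
    return union, union_all


union, union_all = union_results()
print(len(union), len(union_all))               # 6 8
print(sum(1 for r in union if r[0] == "E2"))    # 1: the NULL-containing rows matched
```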
Outer Joins
Slide 22
Joins - Inner versus Outer
All joins considered so far are Inner Joins.
Only a subset of each operand's tuples appears in the result.
These are the tuples that match each other in the 2 operands.
(Matching means that the join comparison, of whatever kind, is true.)
The unmatched tuples don't appear in the result.
Sometimes it is useful to have unmatched tuples in the result as
well. This is what an Outer Join provides.
There are three kinds of Outer Join, which retain in the result all the
unmatched tuples from :
• the 'Left' operand,
• the 'Right' operand,
• both the 'Left' and 'Right' operands.
Consider now some illustrations of Inner and Outer Joins.
For convenience, a Natural Join is always assumed, but the same principles apply to
every type of join.
Slide 23
Inner Joins
(Natural)
Unmatched tuples are not in the result.
[Diagram: two operands, each containing some unmatched tuples; only the matched tuples are joined into the result.]
Slide 24
Outer Join : Left
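(Natural)
Some unmatched tuples are in the result.
[Diagram: the unmatched tuples of the left operand appear in the result, with 'padding' (shown as ?) for the attributes that would have come from the right operand; the unmatched tuples of the right operand do not appear.]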
Slide 25
Outer Join : Right
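(Natural)
Some unmatched tuples are in the result.
[Diagram: the unmatched tuples of the right operand appear in the result, with 'padding' (shown as ?) for the attributes that would have come from the left operand; the unmatched tuples of the left operand do not appear.]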
Slide 26
Outer Join : Full
(Natural)
All unmatched tuples are in the result.
[Diagram: the unmatched tuples of both operands appear in the result, each with 'padding' (shown as ?) for the attributes that would have come from the other operand.]
Slide 27
Outer Joins in SQL
What “padding” attribute values are used in the unmatched tuples ?
SQL uses NULLs.
What syntax is used for outer joins ?
An extension of the FROM phrase inner join syntax :
Natural Join,
Join Using( ColNames ),
Join On( condition ).
Each of these can be used for Left, Right and Full outer joins, giving
9 possibilities.
Although 9 possible kinds of outer join may seem a lot to cope with, they are easier to
remember as the same 3 kinds of join as for inner joins, each combined orthogonally with
the 3 kinds of 'Outer' facility; i.e. we decide independently on the kind of join and the
kind of 'Outer' facility required, and then just put the two together.
Slide 28
SQL2 Outer Natural Joins
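SELECT * FROM R Natural Join S ;
Insert Left, Right or Full before Join; Outer may optionally be inserted after it :
SELECT * FROM R Natural { Left | Right | Full } [ Outer ] Join S ;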
Examples :-
SELECT *
FROM SUPP Natural Left Outer Join SHIP ;
Result retains all the unmatched rows of LHS table, i.e. SUPP.
SELECT *
FROM SUPP Natural Right Join SHIP ;
Result retains all the unmatched rows of RHS table, i.e. SHIP.
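A sketch of these outer joins in SQLite via Python's sqlite3. The SUPP and SHIP rows, and the columns SNo, SName, PNo and Qty, are hypothetical. Because RIGHT and FULL joins only arrived in SQLite 3.39.0, the sketch expresses the right join by swapping the operands of a left join, which retains the same rows (only the column order differs).

```python
import sqlite3

# Hypothetical data: S3 has no shipments, and shipment S9 has no supplier.
SUPP_ROWS = [("S1", "Acme"), ("S2", "Bolt"), ("S3", "Crane")]
SHIP_ROWS = [("S1", "P1", 5), ("S2", "P2", 7), ("S9", "P3", 4)]


def outer_joins():
    """Return (SUPP natural left join SHIP, SHIP natural left join SUPP)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SUPP (SNo TEXT, SName TEXT)")
    conn.execute("CREATE TABLE SHIP (SNo TEXT, PNo TEXT, Qty INTEGER)")
    conn.executemany("INSERT INTO SUPP VALUES (?, ?)", SUPP_ROWS)
    conn.executemany("INSERT INTO SHIP VALUES (?, ?, ?)", SHIP_ROWS)
    # Left: every SUPP row is kept; S3 is padded with NULLs for PNo and Qty.
    left = conn.execute(
        "SELECT * FROM SUPP NATURAL LEFT OUTER JOIN SHIP ORDER BY 1").fetchall()
    # Right, written as a left join with the operands swapped: every SHIP row is kept.
    right = conn.execute(
        "SELECT * FROM SHIP NATURAL LEFT OUTER JOIN SUPP ORDER BY 1").fetchall()
    conn.close()
    return left, right


left, right = outer_joins()
print(left)   # [('S1', 'Acme', 'P1', 5), ('S2', 'Bolt', 'P2', 7), ('S3', 'Crane', None, None)]
print(right)  # [('S1', 'P1', 5, 'Acme'), ('S2', 'P2', 7, 'Bolt'), ('S9', 'P3', 4, None)]
```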
Slide 29
The Other Two SQL2 Outer Joins
SELECT *
FROM R Join S Using ( attribute(s) ) ;
SELECT *
FROM R Join S On ( condition ) ;
Example :-
SELECT *
FROM SUPP Left Outer Join SHIP Using( S# ) ;
Left and right refer to the tables written to left and right of the
join operator.
Logically only left or right is required, but it is convenient to
have both.
In both forms, Left, Right or Full is inserted before Join, with Outer optionally inserted after it.
Useful syntax rules to remember :
If the optional keyword 'outer' is used, it always comes after the keyword left/right/full.
The keyword(s) left/right/full (outer) always come before the keyword join.
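The two forms differ in the columns they return. In this sketch (SQLite via Python's sqlite3 standing in for Oracle, with hypothetical SUPP and SHIP data and S# renamed SNo), Using( SNo ) returns a single SNo column, while the equivalent On condition keeps both SUPP.SNo and SHIP.SNo, the latter padded with NULL for the unmatched supplier.

```python
import sqlite3

# Hypothetical data: S3 has no shipments, and shipment S9 has no supplier.
SUPP_ROWS = [("S1", "Acme"), ("S2", "Bolt"), ("S3", "Crane")]
SHIP_ROWS = [("S1", "P1", 5), ("S2", "P2", 7), ("S9", "P3", 4)]


def using_vs_on():
    """Return (width, rows) for a left outer join written with Using and with On."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SUPP (SNo TEXT, SName TEXT)")
    conn.execute("CREATE TABLE SHIP (SNo TEXT, PNo TEXT, Qty INTEGER)")
    conn.executemany("INSERT INTO SUPP VALUES (?, ?)", SUPP_ROWS)
    conn.executemany("INSERT INTO SHIP VALUES (?, ?, ?)", SHIP_ROWS)
    cur = conn.execute("SELECT * FROM SUPP LEFT OUTER JOIN SHIP USING (SNo) ORDER BY 1")
    using = (len(cur.description), cur.fetchall())
    cur = conn.execute(
        "SELECT * FROM SUPP LEFT OUTER JOIN SHIP ON SUPP.SNo = SHIP.SNo ORDER BY 1")
    on = (len(cur.description), cur.fetchall())
    conn.close()
    return using, on


(using_width, using_rows), (on_width, on_rows) = using_vs_on()
print(using_width, using_rows[-1])  # 4 ('S3', 'Crane', None, None)
print(on_width, on_rows[-1])        # 5 ('S3', 'Crane', None, None, None)
```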
Slide 30
Oracle : Outer Joins
Original Oracle syntax is completely non-standard.
The idea is to add a (+) suffix to
the name of the column that is in the table whose columns will
receive the NULLs as 'padding'.
Regarding 'left' and 'right', this is
the exact opposite of the SQL standard.
Example :-
SELECT AttributeNames
FROM SUPP, SHIP
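WHERE SUPP.S# = SHIP.S#(+) ;
(This is equivalent to SUPP Left Outer Join SHIP : every SUPP row is retained, and SHIP's columns receive the NULL padding.)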
Old fashioned SQL1 join syntax is required.
'Left' & 'right' refer to columns in the WHERE phrase, not tables in the FROM phrase.
Full join ≡ Union of left and right outer joins.
Do NOT use unless desperate !
This outer join syntax is peculiar to Oracle. It has several
disadvantages, and now that Oracle has provided its DBMSs with an SQL2 standard
outer join syntax, you are strongly advised to use that syntax and avoid the old Oracle
version.
The old version is mentioned only for completeness, in case you find that you need to use an
old version of an Oracle DBMS that lacks the modern SQL2 syntax; only an overview of it is
given here.
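The equivalence noted on this slide, Full join ≡ Union of left and right outer joins, also holds with the standard syntax and is a common workaround where FULL OUTER JOIN is unavailable (for example SQLite before 3.39.0). A sketch with hypothetical SUPP and SHIP data via Python's sqlite3, with S# renamed SNo. One caveat: UNION also removes genuinely duplicate result rows, which is harmless here because no two result rows are duplicates.

```python
import sqlite3

# Hypothetical data: S3 has no shipments, and shipment S9 has no supplier.
SUPP_ROWS = [("S1", "Acme"), ("S2", "Bolt"), ("S3", "Crane")]
SHIP_ROWS = [("S1", "P1", 5), ("S2", "P2", 7), ("S9", "P3", 4)]

# Left outer join UNION (right outer join written as a left join with swapped operands).
FULL_JOIN = """
SELECT SUPP.SNo, SName, PNo, Qty
FROM SUPP LEFT OUTER JOIN SHIP ON SUPP.SNo = SHIP.SNo
UNION
SELECT SHIP.SNo, SName, PNo, Qty
FROM SHIP LEFT OUTER JOIN SUPP ON SUPP.SNo = SHIP.SNo
ORDER BY 1
"""


def full_outer_join():
    """Return the emulated full outer join of SUPP and SHIP."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SUPP (SNo TEXT, SName TEXT)")
    conn.execute("CREATE TABLE SHIP (SNo TEXT, PNo TEXT, Qty INTEGER)")
    conn.executemany("INSERT INTO SUPP VALUES (?, ?)", SUPP_ROWS)
    conn.executemany("INSERT INTO SHIP VALUES (?, ?, ?)", SHIP_ROWS)
    rows = conn.execute(FULL_JOIN).fetchall()
    conn.close()
    return rows


for row in full_outer_join():
    print(row)
# ('S1', 'Acme', 'P1', 5)
# ('S2', 'Bolt', 'P2', 7)
# ('S3', 'Crane', None, None)
# ('S9', None, 'P3', 4)
```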