Chapter 5
Relational Algebra and SQL
Now that we have some idea of how to create
and set up a database from a project specification,
via an E/R chart, we will learn how
to get the “right stuff” out of such a database,
which is what we do most of the time.
A database query language is a special-purpose
programming language that is designed and
used to retrieve, and update, information stored
in databases.
The Structured Query Language (SQL) is the
one that we use most of the time. A very
important feature of an SQL statement is that
it only states what is wanted, but not how to get
it, which is left for the DBMS to figure out. SQL
is thus called a declarative language, as opposed
to procedural languages.
1
RA, SQL, and MariaDB
SQL is based on a mathematical body of knowl-
edge, Relational algebra (RA), which serves as
an intermediate language for the DBMS.
When a declarative SQL statement is parsed
by a DBMS, it will be translated into an RA ex-
pression. Such an expression is then analyzed
and optimized by a query optimizer to become
an equivalent but more efficient algorithm, or,
a query execution plan. Such a plan is then
converted to a piece of executable code.
The mathematical nature of the relational al-
gebra makes such analysis and optimization,
and proof of equivalence, possible.
MariaDB is one way to implement SQL, incre-
mentally, with the most recent version being
10.6.4, released on August 6, 2021.
2
What is relational algebra?
A relational algebraic expression consists of a
combination of some eight, or nine, basic op-
erators.
There are three groups of operators, two of
them, Restrict and Project, on the tables; four,
Union, Difference, Intersection, and Cartesian
product, on sets; together with two derived
ones: Join, and Division.
Sometimes, renaming, the ninth one, also plays
a role, when a name change becomes necessary.
Just like we use the combination of the three
basic control structures to come up with a pro-
gram as we know it, we use a combination of
these operators to come up with a data access
program.
3
...and in words
A∪B, the union of A and B, returns a relation
containing all tuples that appear in either, or
both, of the two specified relations, A and B.
A ∩ B, the intersection of A and B, returns
a relation containing all tuples that appear in
both of the two specified relations, A and B.
A ×B, the product of A and B, returns a rela-
tion containing all possible pairs (a, b), where
a is from A, and b from B.
A − B, the difference of A and B, returns a
relation containing all the tuples that appear
in A, but not in B.
We also mentioned, during the review, that the
first two are commutative, but the other
two are not.
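The four set operators above can be tried out directly with Python sets. This is only a small sketch: the sets A and B are made-up values, not data from the course database.

```python
from itertools import product

# Two small, made-up relations over the same attribute, modelled as sets.
A = {1, 2, 3}
B = {3, 4}

union = A | B                 # tuples in A, in B, or in both
intersection = A & B          # tuples in both A and B
difference = A - B            # tuples in A but not in B
pairs = set(product(A, B))    # all pairs (a, b), a from A and b from B

# Union and intersection are commutative ...
assert A | B == B | A and A & B == B & A
# ... but difference and product are not.
assert A - B != B - A
assert set(product(A, B)) != set(product(B, A))

print(union, intersection, difference, len(pairs))
```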
5
σCR, the restriction of a relation R on C, re-
turns a relation containing all the tuples from
a relation R that satisfy a condition C.
πAsR, the projection of a relation R on As, re-
turns a relation containing all the (sub) tuples
of R in terms of attributes As.
A ⋈C B, the join of A and B in terms of C,
returns a relation containing all the pairs (a, b),
a ∈ A and b ∈ B, such that (a, b) satisfies the
condition C.
Natural join of A and B collects those that
agree on the shared attribute(s).
A/C via B, the division of A by C via B, takes
two unary relations, A and C, and a binary one,
B, as its inputs. As the output, it sends back a
relation containing all the tuples from A that
are matched with every tuple in C, as shown
in B.
6
Restriction (Select)
We apply this operation to select a subset of
tuples satisfying certain Boolean conditions.
For example, if we want to get a list of com-
puter science professors, we use a select op-
eration to get it from the Professor table as
follows:
σDeptId=’CS’(Professor)
i.e., “Select all tuples from the Professor rela-
tion that satisfy the condition that DeptId=‘CS’.”
The general syntax is the following:
σselection condition(relation expression)
7
What could a condition be?
A condition can be a simple one, such as
attribute ⊕ constant,
e.g., “DeptId=’CS’”; or
attribute ⊕ attribute,
e.g., “Teaching.ProfId = Professor.Id”.
It could also be a general logic expression, i.e.,
an expression formed with logical operators,
such as And (∧), Or (∨), and Not (¬).
You must have learned this stuff in earlier courses.
8
How do we get the nastiness?
σselection condition(relation expression) is just a
string. But, when applied to a concrete database,
it has a value as its meaning.
Assume that it is applied to a relation instance
r of type R. We define the value of such an
expression σselection condition(r) to be the
collection of all the tuples in r that satisfy the
selection condition.
The important thing is that, when applied to
a relation, this expression will result in another
relation.
Question: So what?
This provides the basis for nested (nasty) queries,
as a query can be put anywhere a table fits.
9
An example
Given the following Person table,
Id Name Address Hobby
1123 John 123 Main St. Stamps
1123 John 123 Main St. Coin
5556 Mary 7 Lake Dr. Hike
9876 Bart 5 Pine St. Stamps
with the expression σHobby=’Stamps’(Person),
we will get the following table back.
Id Name Address Hobby
1123 John 123 Main St. Stamps
9876 Bart 5 Pine St. Stamps
This latter table (relation) can be used in other
queries... .
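Restriction can be sketched in a few lines of Python. Here a relation is modelled as a list of dictionaries and the helper name restrict is an assumption made for illustration; the data is the Person table above.

```python
# Person relation from the example: each tuple is a dict keyed by attribute.
person = [
    {"Id": 1123, "Name": "John", "Address": "123 Main St.", "Hobby": "Stamps"},
    {"Id": 1123, "Name": "John", "Address": "123 Main St.", "Hobby": "Coin"},
    {"Id": 5556, "Name": "Mary", "Address": "7 Lake Dr.", "Hobby": "Hike"},
    {"Id": 9876, "Name": "Bart", "Address": "5 Pine St.", "Hobby": "Stamps"},
]

def restrict(relation, condition):
    """sigma_condition(relation): keep the tuples for which condition holds."""
    return [t for t in relation if condition(t)]

stamps = restrict(person, lambda t: t["Hobby"] == "Stamps")

# The result is again a relation, so it can be fed into another restriction.
john_stamps = restrict(stamps, lambda t: t["Name"] == "John")

for t in stamps:
    print(t["Id"], t["Name"], t["Hobby"])
```

Because restrict returns the same kind of object it takes in, its result can be used wherever a table fits, which is the point made about nested queries.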
10
Another example
Given a more complicated condition
σStudId!=1111111 And (Semester=‘S2017’ Or Grade=’B’)(Transcript)
for each and every tuple in the table, it will
check if it satisfies the requirement, and throw
that into the result bucket if it does.
Question: What do we want?
The condition part could be further extended,
e.g.,
EmpSalary > (MngrSalary ∗ 2)
And (DeptId + CrsNumber) Like CrsCode
where ‘+’ is for string concatenation, and ‘Like’
is for pattern matching.
11
What do we usually do?
Query: What are all the courses taught by CS
professors?
Question: How should we do it?
Answer: We always start with the input and
walk towards the output.
It seems that two tables, Teaching and Professor
are mentioned, and the input seems to be “CS”.
Since those two tables share the professor
id information, we can find the output from
the pairs of tuples, one from each table, that
share the same professor id and in which the
professor is affiliated with Computer Science.
Technically, we can have the following RA expression.
σProfessor.DeptId=’CS’ And Teaching.ProfId=Professor.Id
(Teaching × Professor)
12
Check it out...
Given the following data of the Professor table,
+------+--------------+--------+
| Id   | Name         | DeptId |
+------+--------------+--------+
| 1111 | Jacob        | MG     |
| 2222 | John         | CS     |
| 3333 | David        | EE     |
| 4444 | Mary         | CS     |
+------+--------------+--------+
and that of the Teaching table:
+------+---------+----------+
|ProfId| CrsCode | Semester |
+------+---------+----------+
| 1111 | MGT123 | F1995 |
| 2222 | CS305 | S1996 |
| 2222 | CS315 | F1997 |
| 3333 | EE101 | F1995 |
| 4444 | CS305 | F1995 |
+------+---------+----------+
13
How will it get the result?
1. Get the information by doing a Cartesian
product
Id# NAME DeptId ProfId CrsCode Semester
1111 Jacob MG 1111 MGT123 F1995
1111 Jacob MG 2222 CS305 S1996
1111 Jacob MG 2222 CS315 F1997
1111 Jacob MG 3333 EE101 F1995
1111 Jacob MG 4444 CS305 F1995
2222 John CS 1111 MGT123 F1995
2222 John CS 2222 CS305 S1996
2222 John CS 2222 CS315 F1997
2222 John CS 3333 EE101 F1995
2222 John CS 4444 CS305 F1995
3333 David EE 1111 MGT123 F1995
3333 David EE 2222 CS305 S1996
3333 David EE 2222 CS315 F1997
3333 David EE 3333 EE101 F1995
3333 David EE 4444 CS305 F1995
4444 Mary CS 1111 MGT123 F1995
4444 Mary CS 2222 CS305 S1996
4444 Mary CS 2222 CS315 F1997
4444 Mary CS 3333 EE101 F1995
4444 Mary CS 4444 CS305 F1995
The first row is “related”, but not what we
want; the second is not even related.
We need to get rid of them by doing a restric-
tion through the two conditions.
14
Keep those useful...
... taught by CS professors...
Id# NAME DeptId ProfId CrsCode Semester
2222 John CS 1111 MGT123 F1995
2222 John CS 2222 CS305 S1996
2222 John CS 2222 CS315 F1997
2222 John CS 3333 EE101 F1995
2222 John CS 4444 CS305 F1995
4444 Mary CS 1111 MGT123 F1995
4444 Mary CS 2222 CS305 S1996
4444 Mary CS 2222 CS315 F1997
4444 Mary CS 3333 EE101 F1995
4444 Mary CS 4444 CS305 F1995
... and tuples have to be related, with matching Ids.
Id# NAME DeptId ProfId CrsCode Semester
2222 John CS 2222 CS305 S1996
2222 John CS 2222 CS315 F1997
4444 Mary CS 4444 CS305 F1995
Question: Is this what we want?
Answer: No. We want names of those courses,
taught by CS professors.
Question: How could we focus on, e.g, CS305,
and get what we really want?
15
The three yards....
1. A Cartesian product will be formed, which
contains a (h)uge table of twenty rows and six
columns.
2. Only those ten (?) rows, where DeptId keeps
the “CS” value, will be kept.
3. Finally, it keeps three (?) rows where CS
professors 2222 and 4444 taught
CS305 and CS315.
4. All these three rows contain six attributes,
and we will use projection to get out the course
code for these courses, as we will see later.
5. We want more..., e.g., focusing on the
course Ids and get their names through the
Course table.
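The row counts in these steps can be checked with a short Python sketch. It uses the Professor and Teaching data of the example, stored as plain tuples; the variable names are chosen here for illustration only.

```python
from itertools import product

# (Id, Name, DeptId)
professor = [(1111, "Jacob", "MG"), (2222, "John", "CS"),
             (3333, "David", "EE"), (4444, "Mary", "CS")]
# (ProfId, CrsCode, Semester)
teaching = [(1111, "MGT123", "F1995"), (2222, "CS305", "S1996"),
            (2222, "CS315", "F1997"), (3333, "EE101", "F1995"),
            (4444, "CS305", "F1995")]

# Step 1: Cartesian product; a row is
# (Id, Name, DeptId, ProfId, CrsCode, Semester).
cart = [p + t for p, t in product(professor, teaching)]

# Step 2: keep the rows whose DeptId is 'CS'.
cs_rows = [r for r in cart if r[2] == "CS"]

# Step 3: keep the related rows, i.e. Professor.Id = Teaching.ProfId.
related = [r for r in cs_rows if r[0] == r[3]]

# Step 4: project on CrsCode; a set removes duplicates.
courses = {r[4] for r in related}

print(len(cart), len(cs_rows), len(related), sorted(courses))
```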
16
Is there a better way?
The above solution works, but it is bulky.
We always want to have a smaller intermediate
table to reduce the space, as well as time, to
get something done.
Procedurally, we start with the Professor table
to get all the professors who work in the CS
department, then walk over to the Teaching
table to get those tuples whose ProfId
matches one of those that we just found.
σT.ProfId=Professor.Id((σProfessor.DeptId=’CS’Professor) × Teaching)
Question: Is this one better?
Notice that we have to get the stuff from two
tables. Thus, we are really Joining tuples from
two related tables that agree on the ProfId
attribute, shared among the two tables.
17
Check it out...
Given the following data of the Professor table,
+------+--------------+--------+
| Id   | Name         | DeptId |
+------+--------------+--------+
| 1111 | Jacob        | MG     |
| 2222 | John         | CS     |
| 3333 | David        | EE     |
| 4444 | Mary         | CS     |
+------+--------------+--------+
1. The restriction
σProfessor.DeptId=’CS’Professor
will get us the following:
+------+--------------+--------+
| Id   | Name         | DeptId |
+------+--------------+--------+
| 2222 | John         | CS     |
| 4444 | Mary         | CS     |
+------+--------------+--------+
18
2. With the restricted Cartesian product
(σProfessor.DeptId=’CS’Professor) × Teaching
we get the following smaller intermediate ta-
ble:
Id# NAME DeptId ProfId CrsCode Semester
2222 John CS 1111 MGT123 F1995
2222 John CS 2222 CS305 S1996
2222 John CS 2222 CS315 F1997
2222 John CS 3333 EE101 F1995
2222 John CS 4444 CS305 F1995
4444 Mary CS 1111 MGT123 F1995
4444 Mary CS 2222 CS305 S1996
4444 Mary CS 2222 CS315 F1997
4444 Mary CS 3333 EE101 F1995
4444 Mary CS 4444 CS305 F1995
3. Finally, the final layer of restriction
σT.ProfId=Professor.Id((σProfessor.DeptId=’CS’
Professor) × Teaching)
leads to the following result.
Id# NAME DeptId ProfId CrsCode Semester
2222 John CS 2222 CS305 S1996
2222 John CS 2222 CS315 F1997
4444 Mary CS 4444 CS305 F1995
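To see what restricting first buys us, the sketch below (Python, with the same example data stored as tuples) compares the size of the intermediate table under both plans and checks that the final answers agree. It only counts rows; it says nothing about how a real DBMS would execute either plan.

```python
from itertools import product

# (Id, Name, DeptId)
professor = [(1111, "Jacob", "MG"), (2222, "John", "CS"),
             (3333, "David", "EE"), (4444, "Mary", "CS")]
# (ProfId, CrsCode, Semester)
teaching = [(1111, "MGT123", "F1995"), (2222, "CS305", "S1996"),
            (2222, "CS315", "F1997"), (3333, "EE101", "F1995"),
            (4444, "CS305", "F1995")]

# Plan 1: product first, then both conditions.
big = [p + t for p, t in product(professor, teaching)]
plan1 = [r for r in big if r[2] == "CS" and r[0] == r[3]]

# Plan 2: restrict Professor first, then product, then match the ids.
cs_prof = [p for p in professor if p[2] == "CS"]
small = [p + t for p, t in product(cs_prof, teaching)]
plan2 = [r for r in small if r[0] == r[3]]

print("intermediate rows:", len(big), "vs", len(small))
print("same answer:", plan1 == plan2)
```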
19
Projection
A table might contain too much stuff, and we don’t
always want to get back all the information.
The projection is to help us choose attributes.
Let A denote an attribute of a relation, R, and
let t be a tuple in r, an instance of R. Then t.A
denotes the part of t consisting of the column under
A only, e.g., if t is a tuple of the Professor
table, then t.Id refers to the Id value of this
tuple.
In general, we have the following notation:
πattribute list(relation)
When applied to a relation r with type R, where
A1, · · · , An are all attributes of R, then πA1,···,An(r),
the projection of r on the list, returns the col-
lection of tuples t.[A1, · · · , An], where t is a tu-
ple of r.
20
An example
Given the following Person table,
Id Name Address Hobby
1123 John 123 Main St. Stamps
1123 John 123 Main St. Coin
5556 Mary 7 Lake Dr. Hike
9876 Bart 5 Pine St. Stamps
with the expression πName,Hobby(Person), we
will get the following table back
Name Hobby
John Stamps
John Coin
Mary Hike
Bart Stamps
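Projection can be sketched in Python as well. The helper name project is an assumption; it keeps only the listed attributes and drops duplicate sub-tuples, since a relation is a set.

```python
person = [
    {"Id": 1123, "Name": "John", "Address": "123 Main St.", "Hobby": "Stamps"},
    {"Id": 1123, "Name": "John", "Address": "123 Main St.", "Hobby": "Coin"},
    {"Id": 5556, "Name": "Mary", "Address": "7 Lake Dr.", "Hobby": "Hike"},
    {"Id": 9876, "Name": "Bart", "Address": "5 Pine St.", "Hobby": "Stamps"},
]

def project(relation, attributes):
    """pi_attributes(relation): keep the listed columns, without duplicates."""
    seen, result = set(), []
    for t in relation:
        sub = tuple(t[a] for a in attributes)
        if sub not in seen:          # a relation is a set: no repeated tuples
            seen.add(sub)
            result.append(dict(zip(attributes, sub)))
    return result

name_hobby = project(person, ["Name", "Hobby"])   # four rows, as in the table
names = project(person, ["Name"])                 # John appears only once

print(name_hobby)
print(names)
```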
21
Embedded (nested) expression
The gist of DB programming is that RA op-
erations can be combined. For example, given
the following table
Id Name Address Hobby
1123 John 123 Main St. Stamps
1123 John 123 Main St. Coin
5556 Mary 7 Lake Dr. Hike
9876 Bart 5 Pine St. Stamps
and πId,Name(σHobby=’Stamps’ Or Hobby=’Coins’(Person)),
we will get the following table back
Id Name
1123 John
9876 Bart
Notice the order of operation is “inside-out”,
just like in an arithmetic expression.
22
Related to SQL
Given πId,Name(σHobby=’Stamps’ Or Hobby=’Coins’(Person)),
we immediately have the following SQL query:
Select Distinct Id, Name
From Person
Where Hobby=’Stamps’ or Hobby=’Coins’;
Question: What will the above get?
MariaDB [zshen]> Select Distinct Id, Name
-> From Person
-> Where Hobby=’Stamps’ or Hobby=’Coins’;
+------+------+
| Id | Name |
+------+------+
| 1123 | John |
| 9876 | Bart |
+------+------+
2 rows in set (0.00 sec)
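The same query can be replayed without a MariaDB server by using Python's built-in sqlite3 module. SQLite is only a stand-in here and differs from MariaDB in many details; the table definition is an assumption based on the Person table shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (Id INTEGER, Name TEXT, Address TEXT, Hobby TEXT)")
conn.executemany("INSERT INTO Person VALUES (?, ?, ?, ?)", [
    (1123, "John", "123 Main St.", "Stamps"),
    (1123, "John", "123 Main St.", "Coin"),
    (5556, "Mary", "7 Lake Dr.", "Hike"),
    (9876, "Bart", "5 Pine St.", "Stamps"),
])

# The query from the slide.
rows = conn.execute(
    "SELECT DISTINCT Id, Name FROM Person "
    "WHERE Hobby = 'Stamps' OR Hobby = 'Coins' ORDER BY Id").fetchall()

# Without DISTINCT, SQL keeps duplicate rows, unlike the RA projection.
with_dups = conn.execute(
    "SELECT Id, Name FROM Person "
    "WHERE Hobby = 'Stamps' OR Hobby = 'Coin' ORDER BY Id").fetchall()

print(rows)
print(with_dups)
```

The second query shows why Distinct is needed to match the set semantics of projection: John would otherwise be listed twice.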
23
deja vu
Query: What are “all the courses taught by
CS professors”?
The following RA expression finds out the course
numbers of the courses taught by CS profes-
sors (Cf. Page 12).
πCrsCode(σProfessor.DeptId=’CS’ And ProfId=Professor.Id
(Teaching× Professor))
Question: What are those courses, namely,
names?
Since course names are in the Course table, we get
the stuff out of Teaching × Professor × Course,
where the tuples agree on both Professor Id
and Course Id.
πCrsName(σCourse.CrsCode=Teaching.CrsCode
(πCrsCode(σProfessor.DeptId=’CS’ And ProfId=Professor.Id
(Teaching× Professor)))× Course)
24
How would it work out?
a. Courses that a CS person teaches (Cf.
Page 15 or 19).
Id# NAME DeptId ProfId CrsCode Semester
2222 John CS 2222 CS305 S1996
2222 John CS 2222 CS315 F1997
4444 Mary CS 4444 CS305 F1995
b. Get the course code:
CrsCode
CS305
CS315
c. Join with Course with matching CrsCode:
CrsCode DeptId CrsName Descr
CS305 CS Database On the road to high-paying job
CS315 CS Trans. Proc. Recover from your worst crashes
d. Get the names by making a projection on
the course names.
CrsName
Database
Trans. Proc.
25
Related to SQL
Given the following RA query
πCrsName(σCourse.CrsCode=Teaching.CrsCode And Professor.DeptId=’CS’ And Teaching.ProfId=Professor.Id
(Teaching × Professor × Course))
We immediately (?) get the following
Select Distinct CrsName
From Course C, Teaching T, Professor P
Where P.DeptId="CS" And T.CrsCode=C.CrsCode
And T.ProfId=P.Id;
With the current instance, this query brings back the following:
MariaDB [register]> Select Distinct CrsName
-> From Course C, Teaching T, Professor P
-> Where P.DeptId="CS" And T.CrsCode=C.CrsCode
-> And T.ProfId=P.Id;
+---------------------+
| CrsName             |
+---------------------+
| Database Systems.   |
| Transaction Process |
+---------------------+
2 rows in set (0.00 sec)
26
Is there a better way?
A better way might be the following (Cf. Page 17):
πCrsName(σT.CrsCode=C.CrsCode(πCrsCode(σT.ProfId=Professor.Id
((σProfessor.DeptId=’CS’Professor) × Teaching)) × Course))
Procedurally, we start with the Professor ta-
ble to get all the Ids of those professors who
work in the CS department, walk over to the
Teaching table to get the Course Ids of those
courses taught by CS professors. Finally, we
walk over to the Course table with those course
Ids to get the names of those courses.
Question: Why is this one better?
Answer: All the intermediate tables will con-
tain minimum information that we need to con-
tinue.
Assignment: Use the current instance (Unit
4, Page 23-24) to verify the answer.
27
Set operations
Since relations (tables) are sets, the set op-
erations are pretty straightforward. You must
have played with them in either Finite Math.,
MA for CS, Math Reasoning, or Discrete Math..
Again, given two sets A and B, their union,
intersection, and difference are represented as
A ∪ B, A ∩ B, and A − B, respectively. Notice
that although the first two are symmetric, the
difference is not, i.e., A − B could be different
from B − A.
Given two relations r and s, we immediately
obtain r∪s, r∩s, r−s as the collection of tuples
that are in either r or s; in both r and s; and
in r but not in s. Thus, the results are all sets,
as well.
28
Union compatible
To be meaningful in database manipulation, when we apply set operators, both relations
must have the same attributes, i.e., be union compatible.
πCrsCode,Semester(σGrade=‘C’(Transcript))
− πCrsCode,Semester(σCrsCode=‘MAT123’(Transcript))
What are the courses, except MAT123, and when were they offered, such that at least
one student got a ‘C’?
πCrsCode,Semester(σGrade=‘C’(Transcript))
∪ πCrsCode,Semester(σCrsCode=‘MAT123’(Transcript))
When did we offer MAT123 in the past, and what are the other courses, offered at any
time, in which at least one student got a ‘C’?
πCrsCode,Semester(σGrade=‘C’(Transcript))
∩ πCrsCode,Semester(σCrsCode=‘MAT123’(Transcript))
When did we offer MAT123 such that at least one student got a ‘C’?
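The three expressions can be tried out with Python sets. The Transcript rows below are hypothetical, made up for illustration, and are not the course's actual instance. Both operands are projections on (CrsCode, Semester), so they are union compatible.

```python
# Hypothetical Transcript rows: (StudId, CrsCode, Semester, Grade).
transcript = [
    (1111, "CS305", "F1995", "C"),
    (2222, "CS315", "F1997", "C"),
    (3333, "MAT123", "F1997", "C"),
    (4444, "MAT123", "S1996", "B"),
]

# pi_{CrsCode,Semester}(sigma_{Grade='C'}(Transcript))
got_c = {(c, s) for (_, c, s, g) in transcript if g == "C"}
# pi_{CrsCode,Semester}(sigma_{CrsCode='MAT123'}(Transcript))
mat123 = {(c, s) for (_, c, s, g) in transcript if c == "MAT123"}

print("difference:  ", sorted(got_c - mat123))
print("union:       ", sorted(got_c | mat123))
print("intersection:", sorted(got_c & mat123))
```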
29
Related to SQL
Query: When did we offer MAT123 that at least
one student took, or something else for which she got a ‘C’?
We have the following RA expression:
πCrsCode,Semester(σGrade=‘C’(Transcript))
∪ πCrsCode,Semester(σCrsCode=‘MAT123’(Transcript))
The MariaDB query is immediate:
MariaDB [registration]> (Select CrsCode, Semester
-> From Transcript Where Grade=’C’)
-> Union
-> (Select CrsCode, Semester
-> From Teaching Where CrsCode=’MAT123’);
+---------+----------+
| CrsCode | Semester |
+---------+----------+
| CS305   | F1995    |
| CS315   | F1997    |
| MAT123  | F1997    |
| MAT123  | S1996    |
+---------+----------+
Notice that neither intersection (Intersect) nor
difference (Except) is supported by Version 5.5.56 of
MariaDB, but both are available as of Version 10.3.0.
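For experimenting with all three set operators in SQL, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for MariaDB. The Transcript rows are hypothetical, and the selects are written without parentheses because SQLite does not accept parenthesized selects inside a compound query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transcript "
             "(StudId INTEGER, CrsCode TEXT, Semester TEXT, Grade TEXT)")
conn.executemany("INSERT INTO Transcript VALUES (?, ?, ?, ?)", [
    (1111, "CS305", "F1995", "C"),     # hypothetical rows
    (2222, "CS315", "F1997", "C"),
    (3333, "MAT123", "F1997", "C"),
    (4444, "MAT123", "S1996", "B"),
])

def compound(op):
    """Combine the two union-compatible selects with UNION/INTERSECT/EXCEPT."""
    sql = ("SELECT CrsCode, Semester FROM Transcript WHERE Grade = 'C' "
           f"{op} "
           "SELECT CrsCode, Semester FROM Transcript WHERE CrsCode = 'MAT123' "
           "ORDER BY CrsCode, Semester")
    return conn.execute(sql).fetchall()

union_rows = compound("UNION")
intersect_rows = compound("INTERSECT")
except_rows = compound("EXCEPT")

print(union_rows)
print(intersect_rows)
print(except_rows)
```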
32
Cartesian product
Given two relations r and s, r × s, where r and
s share no common attribute names consists
of the set of all tuples (a, b), a ∈ r and b ∈ s.
For example, Let r and s be the following,
S#
S1
S2
P#
P1
P2
Then, the result of r × s is the following,
S# P#
S1 P1
S1 P2
S2 P1
S2 P2
33
What happens...
when r and s do share common attribute names?
For example, T1(A, B)×T2(B, C). If we do noth-
ing, by the very definition, we will end up with
a table T3(A, B, B, C), where the two B’s have
the same name, but potentially different val-
ues. This is not allowed by the relational data
model. (Still remember data atomicity?)
What we will do is thus to rename such at-
tributes. This ninth operator, not a basic one,
can take the following syntax:
expression[A1, · · · , An],
where A1, · · · , An are the new names of the
original relational expression, for the correspond-
ing positions.
Let’s check out an example:
34
Mix up the profs and students...
(πId,Name(Student) × πId,DeptId(Professor))
[Student.Id, Name, Professor.Id, DeptId]
35
Join
An RDB is often a collection of small tables.
Thus, a query often involves multiple
tables, which is when we use Join.
A bit more formally, given two relation schemas,
R and S, their join, is denoted as
R ⋈join condition S,
where the join condition is used to complete
this operation.
Let A1, · · · , An and B1, · · · , Bn be two subsets of
attributes of R and S, respectively, and ⊕1, · · · ,
⊕n be the standard comparators such as ‘=’,
‘<’, etc., then a general join is R× S, with the
following restriction:
(R.A1 ⊕1 S.B1) And · · ·And (R.An ⊕n S.Bn).
A join is thus not a basic operation, but one
derived with product, restriction, and projec-
tion.
36
Natural join
When all the comparisons used in a join are ‘=’,
and they equate the attributes shared by the two
tables, we call this special case a natural join.
For example, considering two tables, Dept
DEPT# DNAME BUDGET
D1 Marketing 10M
D2 Development 12M
D3 Research 5M
and Emp
EMP# ENAME DEPT# SALARY
E1 Lopez D1 40K
E2 John D1 42K
E3 Bob D2 30K
E4 Jay D2 35K
37
Their natural join over DEPT#, a commonly shared
attribute, is the following:
DEPT# DNAME BUDGET EMP# ENAME SALARY
D1 Marketing 10M E1 Lopez 40K
D1 Marketing 10M E2 John 42K
D2 Development 12M E3 Bob 30K
D2 Development 12M E4 Jay 35K
We notice the following two things about this
table:
1. Those two tables are related through
a commonly shared column, i.e., DEPT#. We will
discuss the extreme cases later, on Page 55,
when nothing, or everything, is shared.
2. When being joined, every row in the first
table will be concatenated with every row from
the second table, as long as they are related,
i.e., sharing the same DEPT# value.
For example, since no row in the second table
has a DEPT# value of ‘D3’, no D3 row is
contained in the joined table.
38
More specifically...
1. A Cartesian product of the two tables will
be formed:
D# DNAME BUDGET EMP# ENAME D# SALARY
D1 Market 10M E1 Lopez D1 40K
D1 Market 10M E2 John D1 42K
D1 Market 10M E3 Bob D2 30K
D1 Market 10M E4 Jay D2 35K
D2 Develop 12M E1 Lopez D1 40K
D2 Develop 12M E2 John D1 42K
D2 Develop 12M E3 Bob D2 30K
D2 Develop 12M E4 Jay D2 35K
D3 Research 5M E1 Lopez D1 40K
D3 Research 5M E2 John D1 42K
D3 Research 5M E3 Bob D2 30K
D3 Research 5M E4 Jay D2 35K
2. All rows that have different D# values will
be deleted, since they are not related. Thus,
D# DNAME BUDGET EMP# ENAME D# SALARY
D1 Market 10M E1 Lopez D1 40K
D1 Market 10M E2 John D1 42K
D2 Develop 12M E3 Bob D2 30K
D2 Develop 12M E4 Jay D2 35K
39
3. Finally, the duplicated D# column is deleted,
since it would be redundant.
Technically, we make a projection of the resulting
table on all but one redundant attribute, D#
in this case.
D# DNAME BUDGET EMP# ENAME SALARY
D1 Market 10M E1 Lopez 40K
D1 Market 10M E2 John 42K
D2 Develop 12M E3 Bob 30K
D2 Develop 12M E4 Jay 35K
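The three steps (product, restriction on the shared attribute, projection that keeps one copy of it) fit into a short Python sketch. The function name natural_join is an assumption, and for simplicity it assumes that both relations are non-empty lists of dictionaries.

```python
dept = [
    {"DEPT#": "D1", "DNAME": "Marketing", "BUDGET": "10M"},
    {"DEPT#": "D2", "DNAME": "Development", "BUDGET": "12M"},
    {"DEPT#": "D3", "DNAME": "Research", "BUDGET": "5M"},
]
emp = [
    {"EMP#": "E1", "ENAME": "Lopez", "DEPT#": "D1", "SALARY": "40K"},
    {"EMP#": "E2", "ENAME": "John", "DEPT#": "D1", "SALARY": "42K"},
    {"EMP#": "E3", "ENAME": "Bob", "DEPT#": "D2", "SALARY": "30K"},
    {"EMP#": "E4", "ENAME": "Jay", "DEPT#": "D2", "SALARY": "35K"},
]

def natural_join(r, s):
    """Natural join of two non-empty relations given as lists of dicts."""
    shared = [a for a in r[0] if a in s[0]]        # commonly named attributes
    result = []
    for t in r:                                    # step 1: every pair (t, u)
        for u in s:
            if all(t[a] == u[a] for a in shared):  # step 2: must agree
                result.append({**t, **u})          # step 3: one copy of shared
    return result

joined = natural_join(dept, emp)
for row in joined:
    print(row)
```

Merging the two dictionaries keeps a single DEPT# entry, which plays the role of the final projection.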
After the normalization process, to be discussed
in the next Chapter, an RDB almost always
consists of a bunch of simple and small tables.
On the other hand, a general query needs in-
formation from several tables, when the join
operation is applied to collect information from
related tables.
Let’s look at a few applications of this useful,
but challenging, operation.
40
An example
Query: Who taught a course in the fall semester
of 1995?
We want the names of the professors, not all
of them, but the ones who taught in F1995.
The general plan is always how to get the out-
put, based on the input.
One way to proceed is to start with the input,
’F1995’, and find out where that input sits,
Teaching (taught). This can be obtained via a
restriction.
σSemester=‘F1995’(Teaching)
With the current instance of the Teaching ta-
ble, we get the following:
ProfId CrsCode Semester
555666777 CS305 F1995
121232343 EE101 F1995
41
What do we really want?
The above information does show the profes-
sors who taught in F1995, but only their num-
bers, not their names.
Question: Where is the beef?
Look at the other tables, we find out that their
names can be found in the Professor table,
related to the Teaching table via the professor’s
Ids.
We can thus get a) the ids of those professors,
then b) their names via a join.
It is easy to get the Ids via a projection.
πProfId(σSemester=‘F1995’(Teaching))
ProfId
555666777
121232343
42
Go over with join
With the Id’s in hand, we can connect the two
tables via a join, as follows:
Professor ⋈P.Id=T.ProfId (σSemester=‘F1995’(Teaching)),
Since join only keeps those rows sharing the
same Id values, we get the following:
Id Name DeptId
555666777 Mary Doe CS
121232343 David Jones EE
43
The final kick
Since we only want the names, we have to do
another projection on the Name attribute.
πName(Professor ⋈Id=ProfId (σSemester=‘F1995’(Teaching))),
This gets us the following:
Name
Mary Doe
David Jones
Question: What have we done?
Answer: The restriction applied on Teaching
finds out all the information about who taught
what in F1995, including the professor Id.
To get their names, we have to match up the
selected tuples with the tuples in the Professor
table with a natural join on their Id.
Finally, since we only want the names, we make
a projection on the Name.
44
Related to SQL
Given the RA expression
πName(Professor ⋈Id=ProfId σSemester=‘F1995’(Teaching)),
we immediately have the following SQL query:
MariaDB [registration]> Select P.Name
-> From Teaching T, Professor P
-> Where T.Semester=’F1995’ And P.Id=T.ProfId;
Notice that we use T, P as shortcuts for Teaching
and Professor, and we also use a condition,
P.Id=T.ProfId, to explicitly enforce the join.
We will get the following result for this query:
+-------------+
| Name        |
+-------------+
| David Jones |
| Mary Doe    |
+-------------+
Assignment: You have to check all these queries
with you-know-what.
45
Another example
Query: Who taught what in the fall semester
of 1995.
πCrsName,Name((Professor ⋈P.Id=T.ProfId
(σSemester=‘F1995’(Teaching))) ⋈C.CrsCode=T.CrsCode Course)
Question: What is going on?
Answer: The restriction finds all the informa-
tion from the Teaching table about who taught
what in F1995, including the professor Id and
course Id.
To get the names of the professors and those
of the courses, we have to match up the se-
lected tuples with the tuples in the Professor
table with a natural join. We similarly find out
the names of those courses.
Finally, since we only want to get the names,
we make a projection on the respective names.
46
Related to SQL
Given
πCrsName,Name((Professor ⋈P.Id=T.ProfId
(σSemester=‘F1995’(Teaching))) ⋈C.CrsCode=T.CrsCode Course)
we immediately have the following SQL query:
MariaDB [registration]> Select P.Name, C.CrsName
-> From Teaching T, Professor P, Course C
-> Where T.Semester=’F1995’ And P.Id=T.ProfId
-> And T.CrsCode=C.CrsCode;
Notice again that we use additional conditions
to explicitly enforce the two join operations.
We will get the following result for this query:
+-------------+---------------------+
| Name        | CrsName             |
+-------------+---------------------+
| Mary Doe    | Database Systems.   |
| David Jones | Electronic Circuits |
| Ann White   | Algebra             |
+-------------+---------------------+
47
Join in MariaDB
MariaDB actually implements a Join in the
form of
A Join B On (join condition)
Thus, we can have the following alternative
SQL expression, which gets us the same answer.
MariaDB [registration]> Select distinct Name, CrsName
-> From Professor Join Teaching
-> on (Professor.Id=Teaching.ProfId)
-> Join Course
-> on (Teaching.CrsCode=Course.Crscode)
-> Where Teaching.Semester=’F1995’;
It sends back the following:
+-------------+---------------------+
| Name        | CrsName             |
+-------------+---------------------+
| Mary Doe    | Database Systems.   |
| David Jones | Electronic Circuits |
| Ann White   | Algebra             |
+-------------+---------------------+
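That the two ways of writing a join give the same answer can be checked with the sketch below. It uses Python's sqlite3 module as a stand-in for MariaDB and the small Professor and Teaching instance from the earlier slides; the course names for MGT123 and EE101 are hypothetical, added only so that the query has something to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Professor (Id INTEGER, Name TEXT, DeptId TEXT);
CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT);
CREATE TABLE Course (CrsCode TEXT, CrsName TEXT);
INSERT INTO Professor VALUES (1111,'Jacob','MG'), (2222,'John','CS'),
                             (3333,'David','EE'), (4444,'Mary','CS');
INSERT INTO Teaching VALUES (1111,'MGT123','F1995'), (2222,'CS305','S1996'),
                            (2222,'CS315','F1997'), (3333,'EE101','F1995'),
                            (4444,'CS305','F1995');
-- Names for MGT123 and EE101 are hypothetical.
INSERT INTO Course VALUES ('CS305','Database'), ('CS315','Trans. Proc.'),
                          ('MGT123','Market Analysis'),
                          ('EE101','Electronic Circuits');
""")

# Join conditions listed in the Where clause.
implicit = conn.execute("""
    SELECT DISTINCT P.Name, C.CrsName
    FROM Teaching T, Professor P, Course C
    WHERE T.Semester = 'F1995' AND P.Id = T.ProfId
      AND T.CrsCode = C.CrsCode
    ORDER BY P.Name""").fetchall()

# The same joins written with Join ... On.
explicit = conn.execute("""
    SELECT DISTINCT Name, CrsName
    FROM Professor JOIN Teaching ON (Professor.Id = Teaching.ProfId)
                   JOIN Course ON (Teaching.CrsCode = Course.CrsCode)
    WHERE Teaching.Semester = 'F1995'
    ORDER BY Name""").fetchall()

print(implicit)
print(implicit == explicit)
```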
48
Why natural join?
Such equality based join is indeed natural, since
it reflects a good design principle: the same
stuff should be related, and nothing else.
For example, different pieces of information
about the same course should be related, while
the information about different courses has
nothing to do with each other. Such an attribute is
often given the same name in different tables,
collecting different information.
The condition in a natural join actually equates
all the related attributes in the relations be-
ing joined. Moreover, as we already discussed,
since these attributes really mean the same
thing, it keeps only one of them, while an equi-
join keeps both.
A natural join is defined as follows:
πattributes(σequation of the shared attributes(R × S))
49
Who taught whom?
Those who took the same course as taught by
the professor in the same semester will be so
paired off. Below is the RA expression:
πStudId,ProfId(Transcript ⋈C Teaching)
We have the following with MariaDB, where
condition C is made explicit:
MariaDB [registration]> Select T.StudId, H.ProfId
-> From Transcript T, Teaching H
-> Where T.CrsCode=H.CrsCode
-> and T.Semester=H.Semester;
+-----------+-----------+
| StudId    | ProfId    |
+-----------+-----------+
| 666666666 |   9406321 |
| 987654321 |   9406321 |
|  23456789 | 101202303 |
| 123454321 | 101202303 |
|  23456789 | 121232343 |
| 666666666 | 121232343 |
| 123454321 | 555666777 |
| 987654321 | 555666777 |
| 111111111 | 783432188 |
| 111111111 | 900120450 |
| 666666666 | 900120450 |
+-----------+-----------+
Question: Who are those people?
50
Let’s find them out...
Find out where their names sit, then join those
tables.
MariaDB [registration]> Select P.Name As Professor, S.Name As Student
-> From Transcript T, Teaching H, Professor P, Student S
-> Where T.CrsCode=H.CrsCode and T.Semester=H.Semester
-> and P.Id=H.ProfId and T.StudId=S.Id;
Below is the answer:
+--------------+---------------+
| Professor    | Student       |
+--------------+---------------+
| John Smyth   | Homer Simpson |
| David Jones  | Homer Simpson |
| Ann White    | Jane Doe      |
| Adrian Jones | Jane Doe      |
| Mary Doe     | Joe Blow      |
| John Smyth   | Joe Blow      |
| David Jones  | Jesoph Public |
| Ann White    | Jesoph Public |
| Jacob Taylor | Jesoph Public |
| Mary Doe     | Bart Simpson  |
| Jacob Taylor | Bart Simpson  |
+--------------+---------------+
Notice As, the renaming operator.
51
Yet another example
Query: Who took at least two courses?
There are several ways of doing this. We will
start with the following one:
πStudId(σCrsCode≠CrsCode2(Transcript
⋈ Transcript[StudId, CrsCode2, Semester2, Grade2]))
Question: Why don’t we rename StudId?
Answer: We use StudId to connect all the
courses taken by the same student, since each
student has his/her unique Id.
Question: Are you sure this stuff works?
Check it out...
52
Let’s find it out...
Given the following Transcript table,
SId CrsC G Sem
1111 CS2370 C F2017
1111 CS3600 A F2018
2222 CS2370 B F2017
(Transcript ⋈ Transcript[SId, CrsC2, G2, Sem2])
will give us the following:
SId CrsC G Sem CrsC2 G2 Sem2
1111 CS2370 C F2017 CS2370 C F2017
1111 CS2370 C F2017 CS3600 A F2018
1111 CS3600 A F2018 CS2370 C F2017
1111 CS3600 A F2018 CS3600 A F2018
2222 CS2370 B F2017 CS2370 B F2017
(σCrsC≠CrsC2(Transcript ⋈ Transcript[SId, CrsC2, G2, Sem2]))
will give us
SId CrsC G Sem CrsC2 G2 Sem2
1111 CS2370 C F2017 CS3600 A F2018
1111 CS3600 A F2018 CS2370 C F2017
And the whole thing gives us
SId
1111
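The same trace can be reproduced with a short Python sketch over the three Transcript rows above. The second copy of the table is joined on SId only, which is what the renaming of the other attributes achieves.

```python
# (SId, CrsC, G, Sem)
transcript = [
    (1111, "CS2370", "C", "F2017"),
    (1111, "CS3600", "A", "F2018"),
    (2222, "CS2370", "B", "F2017"),
]

# Natural join of Transcript with its renamed copy: only SId is shared,
# so a joined row is (SId, CrsC, G, Sem, CrsC2, G2, Sem2).
joined = [t1 + t2[1:] for t1 in transcript for t2 in transcript
          if t1[0] == t2[0]]

# Restriction CrsC != CrsC2: two different courses of the same student.
two_courses = [r for r in joined if r[1] != r[4]]

# Projection on SId; a set removes the duplicate.
result = {r[0] for r in two_courses}

print(len(joined), len(two_courses), result)
```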
53
Related to SQL
Given
πStudId(σCrsCode≠CrsCode2(Transcript
⋈ Transcript[StudId, CrsCode2, Semester2, Grade2]))
we can have the following SQL query:
MariaDB [registration]> Select distinct T1.StudId
-> From Transcript T1, Transcript T2
-> Where T1.StudId=T2.StudId And T1.CrsCode <> T2.CrsCode;
Notice that we use different names to get two
separate copies, T1 and T2, of the same table.
We will get the following result for this query,
based on our instance:
+-----------+
| StudId    |
+-----------+
|  23456789 |
| 111111111 |
| 123454321 |
| 666666666 |
| 987654321 |
+-----------+
Assignment: Find out who they are....
54
Something special
Question: What is the natural join of R and
S when R and S share the same attributes?
Answer: By definition, once we construct the
product of R and S, and apply the equality
restriction on the shared attributes, only the
identical pairs, i.e., those belonging to both,
will stay. But, we will keep only one copy of
those identical pairs.
Hence, in this case, we have R ∩ S.
Question: What is the natural join of R and
S when they have no attribute in common?
Answer: In this case, when we apply the equal-
ity restriction on the shared attributes, nothing
will be kicked out, since nothing is shared. In
the projection step, we also project out no at-
tributes since no attribute is duplicated.
Then, in this case, the whole product stays.
55
An example
To construct the join of S1 and S2
A B
a1 b1
a1 b2
A B
a1 b1
a2 b1
1. Construct S1 × S2 :
A B A B
a1 b1 a1 b1
a1 b1 a2 b1
a1 b2 a1 b1
a1 b2 a2 b1
2. Apply the join condition: S1.A = S2.A and
S1.B = S2.B
A B A B
a1 b1 a1 b1
3. Remove redundancy.
A B
a1 b1
We end up with S1 ∩ S2.
56
Another example
To construct the join of S1 and S3
A B
a1 b1
a1 b2
C D
c1 d1
c2 d1
1. Construct S1 × S3 :
A B C D
a1 b1 c1 d1
a1 b1 c2 d1
a1 b2 c1 d1
a1 b2 c2 d1
2. Apply the join condition. Since nothing is
shared, none is removed.
3. Remove redundancy. Again, since no dupli-
cate attributes exist, no attribute is projected
away.
We end up with S1 × S3.
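Both extreme cases can be checked mechanically. The sketch below is a Python illustration in which a relation is a list of attribute names plus a set of tuples; the function name natural_join and this representation are assumptions. The data is S1, S2 and S3 from the two examples.

```python
def natural_join(attrs_r, r, attrs_s, s):
    """Natural join of r and s, each a set of tuples with named columns."""
    shared = [a for a in attrs_r if a in attrs_s]
    extra = [a for a in attrs_s if a not in attrs_r]
    out = set()
    for t in r:
        for u in s:
            tr, us = dict(zip(attrs_r, t)), dict(zip(attrs_s, u))
            if all(tr[a] == us[a] for a in shared):     # equality restriction
                out.add(t + tuple(us[a] for a in extra))  # drop duplicates
    return tuple(attrs_r) + tuple(extra), out

S1 = {("a1", "b1"), ("a1", "b2")}
S2 = {("a1", "b1"), ("a2", "b1")}
S3 = {("c1", "d1"), ("c2", "d1")}

# Everything shared: the join is the intersection.
_, all_shared = natural_join(["A", "B"], S1, ["A", "B"], S2)
# Nothing shared: the join is the whole Cartesian product.
attrs, none_shared = natural_join(["A", "B"], S1, ["C", "D"], S3)

print(all_shared)
print(attrs, len(none_shared))
```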
By the way, I just did 5.6 for you.
57
Division
This might be the most complex operation. It
is used in scenarios such as: who has taught
everything offered by the CS department? Who
has taken every course offered by a particular
professor? Who supplies every red part?
Division in RA is OK, but much more challenging
in SQL.
This operator takes two unary relations, A and
B, and one binary relation, C, as its inputs. As
the output, it returns those elements of A that
are matched with every element of B, as recorded
in C.
In the teaching everything case, A is the set of
ProfIds and B is the set of CrsCodes of all the
CS courses; and C is π_{ProfId,CrsCode}(Teaching).
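The definition of division can be written down almost word for word. Below is a small Python sketch, not tied to any DBMS; the function name divide and the tiny instance (professors p1 and p2) are hypothetical and only serve to illustrate "matched with every element of B".

```python
def divide(a, b, c):
    """Return the elements x of a such that (x, y) is in c for every y in b."""
    return {x for x in a if all((x, y) in c for y in b)}


# Hypothetical instance: two professors and the two CS courses.
profs = {"p1", "p2"}                      # plays the role of A
cs_courses = {"CS305", "CS315"}           # plays the role of B
taught = {("p1", "CS305"), ("p1", "CS315"), ("p2", "CS305")}  # plays C

# Only p1 has taught every CS course; p2 is missing CS315.
print(divide(profs, cs_courses, taught))
```

Notice that when B is empty, every element of A qualifies, since "every element of B" is then vacuously satisfied.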
58
Worth how many words?
Let’s check them out with the following data,
for which we will get {1,2,3,4} divided by s per r =
{2,3} :
59
A final example
Query: Who has taken all the courses taught
by Professor John Smyth?
We know that we can use the division operator,
once we find the three tables, A, B and C.
A = π_{StudId}(Transcript), i.e., “those who have
taken courses.”
It is easy to get C, “who has taken what”:
π_{StudId, CrsCode}(Transcript).
To find B, we have to find the codes of
those courses taught by Prof. Smyth, i.e.,
π_{CrsCode}(σ_{ProfId = π_{ProfId}(σ_{Name=‘John Smyth’}(Professor))}(Teaching)).
We will see later on how to use MariaDB to do
division, which is much more intimidating.
60
Now the SQL part
SQL is the most widely used DB programming
language. MariaDB is one, incomplete,
implementation of it, with the current version
being 10.6.4, as of October 6, 2021.
We can submit individual SQL query statements
directly to a DBMS through a terminal, as we
have been doing.
But, in practice, to provide the users with a
better UX, we almost always embed them in a
program in, e.g., PHP, that submits a collection
of SQL statements, with, e.g., an HTML-based
UI, to a DBMS at run time and processes the
returned results.
We discuss the former case in this chapter, and
talk about the embedding case with a front
end, in a later chapter, Unit 8.
61
To kick off
Query: Who are the professors working in the
EE department?
MariaDB [register]> Select P.Name
-> From Professor P
-> Where P.DeptId='EE';
+-------------+
| Name |
+-------------+
| David Jones |
+-------------+
1 row in set (0.00 sec)
As we saw earlier, in the above, we use a tuple
variable, P, which ranges over the tuples of the
Professor relation.
It is not necessary here, but quite useful when
we have to deal with several tables with iden-
tical attribute names as we saw earlier.
62
The evaluation process
The basic algorithm for evaluating such an
SQL statement is as follows:
1. The From part is evaluated to produce a
Cartesian product of all the tables mentioned.
2. The Where part is evaluated to apply a
restriction on the product, where we keep only
those rows that “make the cut”.
3. Finally, the Select part is evaluated to apply
a projection that selects the listed attributes
from the rows left over from the previous step.
Thus, the previous query is nothing but
π_{Name}(σ_{DeptId=‘EE’}(Professor)).
63
A multi-table example
Consider two tables, Dept
DeptId  DName        Budget
D1      Marketing    10
D2      Development  12
D3      Research     5
and Emp
EmpId  EName  DeptId  Salary
E1     Lopez  D1      40000
E2     John   D1      42000
E3     Bob    D2      30000
E4     Jay    D2      35000
and the query “Who makes less than 40 grand,
and where do they work?”
Select E.EName,D.DName,E.Salary
From Dept D, Emp E
Where D.DeptId=E.DeptId and E.SALARY<40000;
64
The evaluation process
1. The Cartesian product of the two tables,
Dept and Emp, as mentioned in the From part, is
constructed as follows:
DeptId  DName     Budget  EmpId  EName  DeptId  Salary
D1      Market    10M     E1     Lopez  D1      40K
D1      Market    10M     E2     John   D1      42K
D1      Market    10M     E3     Bob    D2      30K
D1      Market    10M     E4     Jay    D2      35K
D2      Develop   12M     E1     Lopez  D1      40K
D2      Develop   12M     E2     John   D1      42K
D2      Develop   12M     E3     Bob    D2      30K
D2      Develop   12M     E4     Jay    D2      35K
D3      Research  5M      E1     Lopez  D1      40K
D3      Research  5M      E2     John   D1      42K
D3      Research  5M      E3     Bob    D2      30K
D3      Research  5M      E4     Jay    D2      35K
2. Then, the Where part is evaluated, so that
only those tuples satisfying the condition
D.DeptId=E.DeptId and E.Salary<40000
are kept.
65
DeptId  DName     Budget  EmpId  EName  DeptId  Salary
D2      Develop   12M     E3     Bob    D2      30K
D2      Develop   12M     E4     Jay    D2      35K
3. Finally, the Select part is evaluated, which
keeps only those attributes as mentioned in the
target list.
EName  DName        Salary
Bob    Development  30K
Jay    Development  35K
This is indeed the result of this query when
applied to this table instance.
MariaDB [Strange]> Select E.EName,D.DName,E.Salary
-> From Dept D, Emp E
-> Where D.DeptId=E.DeptId and E.SALARY<40000;
+-------+-------------+--------+
| EName | DName       | Salary |
+-------+-------------+--------+
| Bob   | Development |  30000 |
| Jay   | Development |  35000 |
+-------+-------------+--------+
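The three steps can also be traced by hand in a few lines of Python. This is only a sketch of the conceptual evaluation order (From, Where, Select) on the Dept and Emp instances above; a real DBMS would pick a cheaper plan, and the variable names are illustrative.

```python
from itertools import product

# (DeptId, DName, Budget) and (EmpId, EName, DeptId, Salary), as in the example.
dept = [("D1", "Marketing", 10), ("D2", "Development", 12), ("D3", "Research", 5)]
emp = [("E1", "Lopez", "D1", 40000), ("E2", "John", "D1", 42000),
       ("E3", "Bob", "D2", 30000), ("E4", "Jay", "D2", 35000)]

# Step 1 (From): the Cartesian product of Dept and Emp.
pairs = list(product(dept, emp))

# Step 2 (Where): keep the rows with D.DeptId=E.DeptId and E.Salary<40000.
kept = [(d, e) for d, e in pairs if d[0] == e[2] and e[3] < 40000]

# Step 3 (Select): project onto E.EName, D.DName, E.Salary.
result = [(e[1], d[1], e[3]) for d, e in kept]

print(len(pairs), len(kept))
print(result)
```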
66
Query with a join
Query: Who taught in Fall 1995?
MariaDB [register]> Select P.Name
-> From Professor P, Teaching T
-> Where P.Id=T.ProfId
-> And T.Semester='F1995';
+-------------+
| Name |
+-------------+
| David Jones |
| Mary Doe |
| Ann White |
+-------------+
The evaluation of this query really follows the
three step process as we discussed earlier.
This one is nothing but, in terms of relational
algebra:
π_{Name}(Professor ⋈_{Id=ProfId} (σ_{Semester=‘F1995’}(Teaching))).
67
SQL and RA
Given an SQL query
Select TargetList
From Rel1 V1, ..., Reln Vn
Where Condition
its RA expression is essentially the following:
π_{TargetList}(σ_{Condition}(Rel1 × · · · × Reln)),
where we do need to convert the Condition into
its RA equivalent.
We talked earlier about the optimization gain
of such a conversion. On the other hand, RA
will often give us clues as to how to come up
with an SQL query, particularly for the tough
ones, as we will see with numerous examples
later on.
68
An example
Query: Who taught what in Fall 1995? (The
stuff on Page 46.)
Its RA expression is something like the following:
π_{Name,CrsName}(Professor ⋈ (σ_{Semester=‘F1995’}(Teaching)) ⋈ Course),
with appropriate join conditions.
This can be immediately turned into an SQL
query, as follows:
MariaDB [registration]> Select P.Name, C.CrsName
-> From Professor P, Teaching T, Course C
-> Where T.Semester='F1995' And
-> P.Id=T.Profid And T.CrsCode=C.CrsCode;
The answer should be the following:
+-------------+---------------------+
| Name        | CrsName             |
+-------------+---------------------+
| Mary Doe    | Database Systems.   |
| David Jones | Electronic Circuits |
| Ann White   | Algebra             |
+-------------+---------------------+
69
Self-join queries
We once mentioned that, to compose a Carte-
sian product, we sometimes have to rename
identically named attributes.
In particular, we had the following RA expres-
sion (Page 52) to get all students who took at
least two different courses:
π_{StudId}(σ_{CrsCode ≠ CrsCode2}(Transcript
⋈ Transcript[StudId, CrsCode2, Semester2, Grade2]))
Here we keep the table name, but rename the
attributes.
70
Its SQL cousin
To do it in SQL, we have the following, where
we rename the tables.
MariaDB [registration]> Select Distinct T1.StudId
-> From Transcript T1, Transcript T2
-> Where T1.CrsCode<>T2.CrsCode
-> And T1.StudId=T2.StudId;
It would give us the following:
+-----------+
| StudId    |
+-----------+
|  23456789 |
| 111111111 |
| 123454321 |
| 666666666 |
| 987654321 |
+-----------+
Question: How do we verify the above result?
Question: Why do we need “Distinct”?
71
We want distinct results
If we don’t use “distinct” we would get the
following:
MariaDB [registration]> Select T1.StudId
-> From Transcript T1, Transcript T2
-> Where T1.CrsCode<>T2.CrsCode
-> And T1.StudId=T2.StudId;
+-----------+
| StudId    |
+-----------+
|  23456789 |
|  23456789 |
| 111111111 |
| 111111111 |
| 111111111 |
| 111111111 |
| 111111111 |
| 111111111 |
| 123454321 |
| 123454321 |
| 123454321 |
| 123454321 |
| 123454321 |
| 123454321 |
| 666666666 |
| 666666666 |
| 666666666 |
| 666666666 |
| 666666666 |
| 666666666 |
| 987654321 |
| 987654321 |
+-----------+
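The row counts can be double-checked outside MariaDB. The sketch below assumes SQLite (Python's sqlite3 module) as a stand-in, and loads only the (StudId, CrsCode) pairs listed on Page 93; Semester and Grade are left out since the query does not use them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT)")
conn.executemany("INSERT INTO Transcript VALUES (?, ?)", [
    (23456789, "CS305"), (23456789, "EE101"),
    (111111111, "EE101"), (111111111, "MAT123"), (111111111, "MGT123"),
    (123454321, "CS305"), (123454321, "CS315"), (123454321, "MAT123"),
    (666666666, "EE101"), (666666666, "MAT123"), (666666666, "MGT123"),
    (987654321, "CS305"), (987654321, "MGT123"),
])

# The same self-join, once without and once with Distinct.
query = """SELECT {} T1.StudId
           FROM Transcript T1, Transcript T2
           WHERE T1.CrsCode <> T2.CrsCode AND T1.StudId = T2.StudId"""
multiset = conn.execute(query.format("")).fetchall()
as_set = conn.execute(query.format("DISTINCT")).fetchall()

print(len(multiset), len(as_set))
```

A student with n different courses shows up n(n-1) times without Distinct, once for every ordered pair of different courses.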
72
Who are these students?
It should be clear that we just need to add in
another layer of Join with Student to find out
their names.
MariaDB [registration]> Select Distinct S.Name
-> From Transcript T1, Transcript T2, Student S
-> Where T1.CrsCode<>T2.CrsCode
-> And T1.StudId=T2.StudId
-> And S.Id=T1.StudId;
+---------------+
| Name |
+---------------+
| Homer Simpson |
| Jane Doe |
| Joe Blow |
| Jesoph Public |
| Bart Simpson |
+---------------+
5 rows in set (0.00 sec)
73
RA ≠ SQL
The mathematical RA yields a set, while
the practical SQL yields a multiset.
To return a true relation with no duplicates,
the evaluator has to do another scan to take
out all the duplicates, which we can request
by adding another operator, Distinct:
MariaDB [registration]> Select Distinct
-> T.ProfId, T.CrsCode
-> From Teaching T;
+-----------+---------+
| ProfId    | CrsCode |
+-----------+---------+
|   9406321 | MGT123  |
| 101202303 | CS305   |
| 101202303 | CS315   |
| 121232343 | EE101   |
| 555666777 | CS305   |
| 783432188 | MGT123  |
| 900120450 | MAT123  |
+-----------+---------+
74
Making comments
We sometimes want to make comments to
make queries more readable. Given the fol-
lowing code
# An example of Select distinct
Select distinct
T.ProfId, T.CrsCode
From Teaching T;
MariaDB will give you the following back:
MariaDB [register]> # An example of Select distinct
MariaDB [register]> Select distinct
-> T.ProfId, T.CrsCode
-> From Teaching T;
+-----------+---------+
| ProfId    | CrsCode |
+-----------+---------+
|   9406321 | MGT123  |
| 101202303 | CS305   |
| 101202303 | CS315   |
| 121232343 | EE101   |
| 555666777 | CS305   |
| 783432188 | MGT123  |
| 900120450 | MAT123  |
+-----------+---------+
75
What is in Where?
We have so far only seen simple conditions in
the Where part.
SQL provides some common operators for such
a purpose, such as ‘=’, ‘<’, ‘<>’ (same as
‘!=’, or even ‘Not =’), etc.
In general, any Boolean expression will do. For
example, the following query
Select E.Id
From Employee E, Employee M
Where E.BossSSN=M.SSN And E.Salary>2*M.Salary
And E.LastName='Mc'||E.FirstName
should return all employees who make more
than twice what their boss does, and whose last
name is ‘Mc’ concatenated with their first name,
such as “Donald McDonald”.
(In MariaDB, ‘||’ concatenates only when the
PIPES_AS_CONCAT SQL mode is on; otherwise
write Concat('Mc', E.FirstName).)
76
What about Select?
The Select part can also come with a few spe-
cial features.
If you do want to get everything from the From
part, you put an ‘*’ in the Select.
MariaDB [registration]> Select * From Professor;
+-----------+--------------+--------+-----+--------+
| Id        | Name         | DeptId | Age | Salary |
+-----------+--------------+--------+-----+--------+
|   9406321 | Jacob Taylor | MG     |  45 |  30000 |
| 101202303 | John Smyth   | CS     |  32 |  40000 |
| 121232343 | David Jones  | EE     |  56 |  25000 |
| 555666777 | Mary Doe     | CS     |  67 |  40000 |
| 783432188 | Adrian Jones | MG     |  55 |  30000 |
| 864297351 | Qi Chen      | MA     |  34 |  35000 |
| 900120450 | Ann White    | MA     |  38 |  50000 |
+-----------+--------------+--------+-----+--------+
77
Expression in Select
SQL permits expressions in the target list, as
well as new headings through the renaming
mechanism via “As”.
The following finds out, with MariaDB, the salary
per year of age for each of the professors.
MariaDB [registration]> Select Name, Age, Salary,
-> Round(Salary/Age, 2) As SalaryByAge
-> From Professor;
+--------------+-----+--------+-------------+
| Name         | Age | Salary | SalaryByAge |
+--------------+-----+--------+-------------+
| Jacob Taylor |  45 |  30000 |      666.67 |
| John Smyth   |  32 |  40000 |     1250.00 |
| David Jones  |  56 |  25000 |      446.43 |
| Mary Doe     |  67 |  40000 |      597.01 |
| Adrian Jones |  55 |  30000 |      545.45 |
| Qi Chen      |  34 |  35000 |     1029.41 |
| Ann White    |  38 |  50000 |     1315.79 |
+--------------+-----+--------+-------------+
Notice ROUND(X,D) rounds the argument X to D
decimal places.
Check out the course page for more.
78
What does Not mean?
Any condition can be negated. For example,
NOT (T1.CrsCode=T2.CrsCode)
It could be even nested. For example
NOT (E.BossSSN=M.SSN And E.Salary>2*M.Salary
And NOT (E.LastName='Mc'||E.FirstName))
Question: What does the last piece mean?
Answer: It means that if somebody’s salary is
more than twice that of his boss, then
his last name is ‘Mc’ together with his first
name, since
¬(A ∧ ¬B) ≡ ¬A ∨ B ≡ A → B.
Labwork: Let’s take care of Labwork 3.1.
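The equivalence ¬(A ∧ ¬B) ≡ ¬A ∨ B ≡ A → B can be checked exhaustively, since there are only four truth assignments. A small Python sketch follows; the helper name implies is just an illustration.

```python
from itertools import product


def implies(a, b):
    """Material implication: false only when a holds and b does not."""
    return not (a and not b) if False else ((not a) or b)


rows = []
for a, b in product([False, True], repeat=2):
    negated_form = not (a and not b)      # the nested NOT ... NOT form
    disjunction = (not a) or b            # after applying De Morgan
    rows.append((a, b, negated_form, disjunction, implies(a, b)))

# The three right-most columns agree on every row.
for row in rows:
    print(row)
```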
79
Set operations
With SQL, we can use set operators as defined
in RA, i.e., union, intersection and difference.
Query: Who are those professors working in
either the CS or the EE department?
MariaDB [registration]> Select P.Name
-> From Professor P
-> Where P.DeptId='CS'
-> Union
-> Select P.Name From Professor P
-> Where P.DeptId='EE';
+-------------+
| Name        |
+-------------+
| John Smyth  |
| Mary Doe    |
| David Jones |
+-------------+
Check out Example 5.15 in Sec. 3.2 of the MariaDB
notes for many more details.
80
An equivalent form
Recall the following:
A ∪ B ≡ {x : x ∈ A ∨ x ∈ B},
the previous query is thus equivalent to the
following:
MariaDB [registration]> Select Distinct P.Name
-> From Professor P
-> Where (P.DeptId='CS' Or P.DeptId='EE');
+-------------+
| Name        |
+-------------+
| John Smyth  |
| David Jones |
| Mary Doe    |
+-------------+
Question: Which one should we use?
Answer: It is largely a personal preference for
“Union”, but we don’t have a choice for the
other two, “Intersect” and “Except”, when
we use MariaDB before 10.3.0, as neither is
available.
81
This or that...
Query: Who are those professors who are either
affiliated with the Computer Science department
or have ever taught a CS course?
MariaDB [registration]> Select P.Name
-> From Professor P, Teaching T
-> Where P.Id=T.ProfId And T.CrsCode like 'CS%'
-> Union
-> Select P.Name From Professor P
-> Where P.DeptId='CS';
+------------+
| Name       |
+------------+
| Mary Doe   |
| John Smyth |
+------------+
Alternatively, we can also use “or” to do the
same:
MariaDB [registration]> Select Distinct P.Name
-> From Professor P, Teaching T
-> Where (P.Id=T.ProfId) And (T.CrsCode like 'CS%')
-> Or (P.DeptId='CS');
Notice that the condition is (A ∧ B) ∨ C.
82
This and that...
Query: Who are those students who took
both CS315 and CS305?
We might want to do the following:
MariaDB [registration]> Select S.Name
-> From Student S, Transcript T
-> Where S.Id=T.StudId And T.CrsCode='CS305'
-> Intersect
-> Select S.Name
-> From Student S, Transcript T
-> Where S.Id=T.StudId And T.CrsCode='CS315';
ERROR 1064 (42000): You have an error in your SQL syntax;
check the manual that corresponds to your MariaDB server
version for the right syntax to use near 'Intersect
It is because our version, MariaDB Ver. 5.5.68,
does not support “Intersect”, while Ver. 10.3.0
and later do.
Question: Is there a way out?
83
What could we do?
We can use logical operators. Does the following
one work?
MariaDB [registration]> Select S.Name
-> From Student S, Transcript T
-> Where S.Id=T.StudId And T.CrsCode='CS315'
-> And T.CrsCode='CS305';
Empty set (0.00 sec)
Question: Is it really empty?
Answer: Apparently not so. With the current
instance, 123454321 (Joe Blow) took both.
Question: Why is it incorrect?
The CrsCode box of any tuple contains only
one value. So, no CrsCode box of any tuple
of Transcript may contain both ‘CS305’ and
‘CS315’.
Do you still remember data atomicity?
84
What should we do?
Look for evidence in two tuples in the Transcript
table for two different courses.
MariaDB [registration]> Select S.Name
-> From Student S, Transcript T1, Transcript T2
-> Where S.Id=T1.StudId And T1.CrsCode='CS315'
-> And S.Id=T2.StudId And T2.CrsCode='CS305';
+----------+
| Name     |
+----------+
| Joe Blow |
+----------+
Question: What is going on?
We came up with two copies of the Transcript
table, T1 and T2, where for the same student S,
we look for her record of taking ‘CS315’ in T1
and ‘CS305’ in T2.
Question: How do we make sure it is the same
student?
Answer: Through the join conditions via Student.Id,
which tie both T1 and T2 to the same S.
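The three attempts can be compared side by side. The sketch below assumes SQLite (Python's sqlite3 module) as a stand-in for MariaDB, mainly because SQLite does support Intersect and so can serve as a cross-check; the rows are an excerpt of the Student and Transcript instances used in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (Id INTEGER, Name TEXT);
    CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT);
    INSERT INTO Student VALUES
        (123454321, 'Joe Blow'), (987654321, 'Bart Simpson'),
        (23456789, 'Homer Simpson');
    INSERT INTO Transcript VALUES
        (123454321, 'CS305'), (123454321, 'CS315'),
        (987654321, 'CS305'), (23456789, 'CS305');
""")

# One copy of Transcript: no single tuple holds both course codes.
one_copy = conn.execute("""
    SELECT S.Name FROM Student S, Transcript T
    WHERE S.Id = T.StudId AND T.CrsCode = 'CS315' AND T.CrsCode = 'CS305'
""").fetchall()

# Two copies: one tuple per course, tied together through S.Id.
two_copies = conn.execute("""
    SELECT S.Name FROM Student S, Transcript T1, Transcript T2
    WHERE S.Id = T1.StudId AND T1.CrsCode = 'CS315'
      AND S.Id = T2.StudId AND T2.CrsCode = 'CS305'
""").fetchall()

# The Intersect version, for comparison.
with_intersect = conn.execute("""
    SELECT S.Name FROM Student S, Transcript T
    WHERE S.Id = T.StudId AND T.CrsCode = 'CS305'
    INTERSECT
    SELECT S.Name FROM Student S, Transcript T
    WHERE S.Id = T.StudId AND T.CrsCode = 'CS315'
""").fetchall()

print(one_copy, two_copies, with_intersect)
```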
85
Who can take CS 3600?
The prerequisites for CS 3600 are “CS 2370
and (MA 2250 or MA 2200)”.
If you use something that supports all the
operations:
(Select S.Name
From Student S, Transcript T
Where T.CrsCode="CS2370" And S.Id=T.StudId)
Intersect
((Select S.Name
From Student S, Transcript T
Where T.CrsCode="MA2200" And S.Id=T.StudId)
Union
(Select S.Name
From Student S, Transcript T
Where T.CrsCode="MA2250" And S.Id=T.StudId))
Otherwise,
Select S.Name
From Student S, Transcript T1, Transcript T2
Where T1.CrsCode="CS2370" And
(T2.CrsCode="MA2200" Or T2.CrsCode="MA2250")
And T1.StudId=T2.StudId
And S.Id=T1.StudId;
Notice again that the condition is A ∧ (B ∨ C).
86
This but not that...
Query: Who are those professors who are not
affiliated with the Computer Science department,
but taught a CS course?
(Select P.Name From Professor P, Teaching T
Where P.Id=T.ProfId And T.CrsCode like 'CS%')
Except
(Select P.Name From Professor P
Where P.DeptId='CS')
Again, this is not supported with MariaDB 5.5.68,
either, but it is supported since Version 10.3.0.
Question: Is there any alternative?
Recall that A \ B = A ∩ B̄ = A ∩ (¬B)
= {x | x ∈ A} ∩ {x | x ∉ B} = {x | x ∈ A ∧ x ∉ B}.
Hence, the following query should do the trick.
MariaDB [registration]> Select P.Name
-> From Professor P, Teaching T
-> Where P.Id=T.ProfId And T.CrsCode like 'CS%'
-> And P.DeptId!='CS';
Empty set (0.00 sec)
87
Is it really? Yes!
MariaDB [registration]> Select * From Teaching;
+-----------+---------+----------+
| ProfId    | CrsCode | Semester |
+-----------+---------+----------+
|   9406123 | MGT123  | F1995    |
|   9406321 | MGT123  | F1994    |
| 101202303 | CS305   | S1996    |
| 101202303 | CS315   | F1997    |
| 121232343 | EE101   | F1995    |
| 121232343 | EE101   | S1991    |
| 555666777 | CS305   | F1995    |
| 783432188 | MGT123  | F1997    |
| 900120450 | MAT123  | F1997    |
| 900120450 | MAT123  | S1996    |
+-----------+---------+----------+
MariaDB [registration]> Select * From Professor;
+-----------+--------------+--------+-----+--------+
| Id        | Name         | DeptId | Age | Salary |
+-----------+--------------+--------+-----+--------+
|   9406321 | Jacob Taylor | MG     |  45 |  30000 |
| 101202303 | John Smyth   | CS     |  32 |  40000 |
| 121232343 | David Jones  | EE     |  56 |  25000 |
| 555666777 | Mary Doe     | CS     |  67 |  40000 |
| 783432188 | Adrian Jones | MG     |  55 |  30000 |
| 864297351 | Qi Chen      | MA     |  34 |  35000 |
| 900120450 | Ann White    | MA     |  38 |  50000 |
+-----------+--------------+--------+-----+--------+
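To see the Except form and the And form agree, one can run both against the same data. The following sketch assumes SQLite (Python's sqlite3 module), which supports Except, as a stand-in for MariaDB. The initial rows are an excerpt of the two tables above; the row inserted afterwards, a Math professor teaching CS305 in F1998, is hypothetical and only added to get a non-empty answer. Distinct is added to the And form so that both return sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Professor (Id INTEGER, Name TEXT, DeptId TEXT);
    CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT);
    INSERT INTO Professor VALUES
        (101202303, 'John Smyth', 'CS'), (555666777, 'Mary Doe', 'CS'),
        (864297351, 'Qi Chen', 'MA'), (900120450, 'Ann White', 'MA');
    INSERT INTO Teaching VALUES
        (101202303, 'CS305', 'S1996'), (101202303, 'CS315', 'F1997'),
        (555666777, 'CS305', 'F1995'), (900120450, 'MAT123', 'F1997');
""")

WITH_EXCEPT = """
    SELECT P.Name FROM Professor P, Teaching T
    WHERE P.Id = T.ProfId AND T.CrsCode LIKE 'CS%'
    EXCEPT
    SELECT P.Name FROM Professor P WHERE P.DeptId = 'CS'
"""
WITH_AND = """
    SELECT DISTINCT P.Name FROM Professor P, Teaching T
    WHERE P.Id = T.ProfId AND T.CrsCode LIKE 'CS%' AND P.DeptId != 'CS'
"""

# With the excerpt as is, only CS professors teach CS courses.
before = (conn.execute(WITH_EXCEPT).fetchall(),
          conn.execute(WITH_AND).fetchall())

# Hypothetical extra row: a Math professor teaches a CS course.
conn.execute("INSERT INTO Teaching VALUES (864297351, 'CS305', 'F1998')")
after = (conn.execute(WITH_EXCEPT).fetchall(),
         conn.execute(WITH_AND).fetchall())

print(before, after)
```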
88
Is it in?
With SQL, we can also test whether something
is a member of a finite set, a basic relation in
set theory.
Query: Who are those professors who work
in either the CS or the EE department?
Select P.Name From Professor P
Where P.DeptId In {'CS', 'EE'}
In MariaDB, it looks like the following:
MariaDB [registration]> Select P.Name
-> From Professor P
-> Where P.DeptId In ('CS', 'EE');
+-------------+
| Name        |
+-------------+
| John Smyth  |
| David Jones |
| Mary Doe    |
+-------------+
Labwork: Let’s take care of Labwork 3.2 next.
89
Nested (nasty) queries
Way back, on Page 41, we addressed the following:
Query: Who taught in Fall 1995?
MariaDB [registration]> Select P.Name
-> From Professor P, Teaching T
-> Where P.Id=T.ProfId
-> And T.Semester='F1995';
+-------------+
| Name |
+-------------+
| David Jones |
| Mary Doe |
| Ann White |
+-------------+
This one involves a join, for which we
have to work out a Cartesian product first, an
expensive operation.
Question: Is there a better way?
90
Let’s get nesty ...
We can also do it in two steps: a) find out
the ids of those who make the cut from the
Teaching table, then, b) use those ids to find
out their names in the Professor instance.
MariaDB [register]> Select P.Name From Professor P
-> Where P.Id IN
-> (Select T.ProfId From Teaching T
-> Where T.Semester='F1995');
+-------------+
| Name        |
+-------------+
| David Jones |
| Mary Doe    |
| Ann White   |
+-------------+
3 rows in set (0.00 sec)
Question: Do you like this latter approach
better?
The fact that SQL statements can be nested
makes it much more powerful, but potentially
complex and tough to work with.
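The join form and the nested form can be compared on a small instance. The sketch below assumes SQLite (Python's sqlite3 module) as a stand-in for MariaDB and uses a reduced, partly hypothetical instance: the last Teaching row (Mary Doe teaching CS315 in F1995) is made up, so the output here is not the one shown on the slides. It is there to expose one difference: the join returns a professor once per matching Teaching tuple, while the In form returns each professor at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Professor (Id INTEGER, Name TEXT);
    CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT);
    INSERT INTO Professor VALUES
        (101202303, 'John Smyth'), (121232343, 'David Jones'),
        (555666777, 'Mary Doe');
    INSERT INTO Teaching VALUES
        (101202303, 'CS305', 'S1996'), (121232343, 'EE101', 'F1995'),
        (555666777, 'CS305', 'F1995');
""")
# Hypothetical row: a second F1995 course for the same professor.
conn.execute("INSERT INTO Teaching VALUES (555666777, 'CS315', 'F1995')")

join_rows = conn.execute("""
    SELECT P.Name FROM Professor P, Teaching T
    WHERE P.Id = T.ProfId AND T.Semester = 'F1995'
""").fetchall()

nested_rows = conn.execute("""
    SELECT P.Name FROM Professor P
    WHERE P.Id IN (SELECT T.ProfId FROM Teaching T
                   WHERE T.Semester = 'F1995')
""").fetchall()

print(sorted(join_rows))
print(sorted(nested_rows))
```

With Distinct added to the join form, the two would return the same set of names.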
91
Why going nesty?
In the previous example, it might be more nat-
ural (?) to come up with the nested version.
But, a much more important reason to use
nested queries is that they increase SQL’s
expressive power, in the sense that lots of things
can’t be done without this feature.
Query: Who did not take any course?
MariaDB [registration]> Select S.Name
-> From Student S
-> Where S.Id Not In
-> # All students who took some course
-> (Select T.StudId
-> From Transcript T);
+------------+
| Name |
+------------+
| Mary Smith |
+------------+
Question: Did Mary take anything?
92
Let’s find it out...
MariaDB [registration]> Select Distinct C.CrsName
-> From Transcript T, Student S, Course C
-> Where S.Name="Mary Smith" And S.Id=T.StudId
-> And T.CrsCode=C.CrsCode;
Empty set (0.01 sec)
MariaDB [registration]> Select id, Name From Student;
+-----------+---------------+
| id        | Name          |
+-----------+---------------+
| 111111111 | Jane Doe      |
| 666666666 | Jesoph Public |
| 111223344 | Mary Smith    |
| 987654321 | Bart Simpson  |
|  23456789 | Homer Simpson |
| 123454321 | Joe Blow      |
+-----------+---------------+
MariaDB [registration]> Select Distinct StudId, CrsCode
-> From Transcript;
+-----------+---------+
| StudId    | CrsCode |
+-----------+---------+
|  23456789 | CS305   |
|  23456789 | EE101   |
| 111111111 | EE101   |
| 111111111 | MAT123  |
| 111111111 | MGT123  |
| 123454321 | CS305   |
| 123454321 | CS315   |
| 123454321 | MAT123  |
| 666666666 | EE101   |
| 666666666 | MAT123  |
| 666666666 | MGT123  |
| 987654321 | CS305   |
| 987654321 | MGT123  |
+-----------+---------+
93
An alternative
The following certainly works as well: We just
collect all those students for whom no transcript
record exists.
MariaDB [register]> select S.Name
-> From Student S
-> Where not exists
-> (Select * from Transcript T
-> Where T.StudId=S.Id);
+------------+
| Name |
+------------+
| Mary Smith |
+------------+
1 row in set (0.01 sec)
Notice that Exists is different from In: Exists
asks whether someone is in R 207, while In asks
whether John is in R 207.
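Both forms can be run against the same instance to confirm they agree here. A sketch, assuming SQLite (Python's sqlite3 module) in place of MariaDB; the Student rows are those of Page 93, and the Transcript rows are an excerpt with one course per student who has taken anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (Id INTEGER, Name TEXT);
    CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT);
    INSERT INTO Student VALUES
        (111111111, 'Jane Doe'), (666666666, 'Jesoph Public'),
        (111223344, 'Mary Smith'), (987654321, 'Bart Simpson'),
        (23456789, 'Homer Simpson'), (123454321, 'Joe Blow');
    INSERT INTO Transcript VALUES
        (23456789, 'CS305'), (111111111, 'EE101'), (123454321, 'CS305'),
        (666666666, 'EE101'), (987654321, 'CS305');
""")

# Membership test: is S.Id among the ids appearing in Transcript?
not_in = conn.execute("""
    SELECT S.Name FROM Student S
    WHERE S.Id NOT IN (SELECT T.StudId FROM Transcript T)
""").fetchall()

# Emptiness test: does any Transcript tuple exist for this S?
not_exists = conn.execute("""
    SELECT S.Name FROM Student S
    WHERE NOT EXISTS (SELECT * FROM Transcript T WHERE T.StudId = S.Id)
""").fetchall()

print(not_in, not_exists)
```

The Not Exists subquery mentions S.Id, so it is re-evaluated for each student; the Not In subquery does not depend on S at all.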
94
The nasty and nesty division
Query: Who were taught by all the CS
professors?
Let’s start by finding out all the students who
were not taught by at least one CS professor.
MariaDB [registration]> Select Distinct S.Name
-> From Student S,
-> # All CS Professors
-> (Select P.Id From Professor P
-> Where P.DeptId='CS') As CSP
-> Where CSP.Id Not In
-> # Is this CS professor NOT among those
-> # who taught S? If this is the case,
-> # we have found the evidence, so we
-> # put S's Name into the output bucket.
->
-> # All those who taught S
-> (Select T.ProfId
-> From Teaching T, Transcript R
-> Where T.CrsCode=R.CrsCode And
-> T.Semester=R.Semester And
-> S.Id=R.StudId);
It is a three-layer query.
95
What do we get?
If you apply the above query to the database
instance we have created in Lab 5, we get the
following:
+---------------+
| Name |
+---------------+
| Jane Doe |
| Jesoph Public |
| Mary Smith |
| Bart Simpson |
| Homer Simpson |
+---------------+
Each and every one of them was not taught by
at least one CS faculty member.
Question: Should we trust Dr. Shen?
96
Absolutely not!
Question: Why is Jane in, but Joe out?
With our instance, Table CSP leads to the fol-
lowing two Ids for Computer Science profes-
sors:
MariaDB [registration]> Select P.Id
-> From Professor P
-> Where P.DeptId='CS';
+-----------+
| Id |
+-----------+
| 101202303 |
| 555666777 |
+-----------+
Question: Why should Jane be included in
the output bucket?
97
Knowing that Jane’s Id is ‘111111111’, we
find that the ProfIds of all the professors who
have taught her are the following:
MariaDB [registration]> Select T.ProfId
-> From Teaching T, Transcript R
-> Where T.CrsCode=R.CrsCode And
-> T.Semester=R.Semester And
-> R.StudId='111111111';
+-----------+
| ProfId    |
+-----------+
| 900120450 |
| 783432188 |
+-----------+
Now, the query on Page 95 tries to check if
any of the Ids contained in Table CSP, i.e.,
the ProfIds of the Computer Science professors,
is not in the above ProfId table.
If it is true, it would mean that at least one
Computer Science professor did not teach her,
thus Jane should belong to the bucket of this
query.
The very first, “101202303”, is not in. That’s
why Jane is included in the output bucket.
98
How about Joe?
Joe’s Id is ‘123454321’, and the ProfIds of all the
professors who have taught him are the following:
MariaDB [registration]> Select T.ProfId
-> From Teaching T, Transcript R
-> Where T.CrsCode=R.CrsCode And
-> T.Semester=R.Semester And
-> R.StudId='123454321';
+-----------+
| ProfId    |
+-----------+
| 555666777 |
| 101202303 |
+-----------+
Again, the query on Page 95 tries to check if
at least one CSP professor did not teach Joe.
The inside query fails for both CSP instances.
Both taught him. That’s why Joe Blow is
not included in the output bucket.
Thus, Dr. Shen might be correct... in this case.
99
Let’s dig a bit deeper....
Mathematically speaking, what we have got is
the following:
B = {s | ∃f (CSP(f) ∧ ¬Teaching(f, s))}.
Question: What is its complement (Cf. Page 87)?
S \ B = {s | ¬∃f (CSP(f) ∧ ¬Teaching(f, s))}
      = {s | ∀f ¬(CSP(f) ∧ ¬Teaching(f, s))}
      = {s | ∀f (¬CSP(f) ∨ Teaching(f, s))}    (De Morgan)
      = {s | ∀f (CSP(f) → Teaching(f, s))}.    (Page 79)
Therefore, the complement of B collects all
the students whom every CS faculty has taught.
Question: How do we get the complement in
SQL?
Answer: Not in. Remember “Who does not
supply any red part?” (Cf. Query 5 in lab 9)
100
Let’s play it out...
The following digs out all the students whom
every CS faculty has taught, in four layers.
MariaDB [registration]> Select Name From Student
-> Where Id Not In (
-> # Below is the previous query on Page 95
-> Select Distinct S.Id
-> From Student S,
-> # All CS Professors
-> (Select P.Id From Professor P
-> Where P.DeptId='CS') As CSP
-> Where CSP.Id Not In
-> # Professors who have taught S
-> (Select T.ProfId
-> From Teaching T, Transcript R
-> Where T.CrsCode=R.CrsCode And
-> T.Semester=R.Semester And
-> S.Id=R.StudId));
+----------+
| Name     |
+----------+
| Joe Blow |
+----------+
Question: Is this result correct?
Answer: Let’s find it out...
101
The whole nine yards...
Question: Who took a CS course?
MariaDB [registration]> Select * From Transcript
-> Where CrsCode like 'CS%';
+-----------+---------+----------+-------+
| StudId    | CrsCode | Semester | Grade |
+-----------+---------+----------+-------+
|  23456789 | CS305   | S1996    | A     |
| 123454321 | CS305   | F1995    | A     |
| 123454321 | CS315   | F1997    | A     |
| 987654321 | CS305   | F1995    | C     |
+-----------+---------+----------+-------+
Question: Who taught a CS course?
MariaDB [registration]> Select * From Teaching
-> Where CrsCode like 'CS%';
+-----------+---------+----------+
| ProfId    | CrsCode | Semester |
+-----------+---------+----------+
| 555666777 | CS305   | F1995    |
| 101202303 | CS305   | S1996    |
| 101202303 | CS315   | F1997    |
+-----------+---------+----------+
102
Question: Who are those CS professors?
MariaDB [registration]> Select Id From Professor
-> Where DeptId='CS';
+-----------+
| Id        |
+-----------+
| 101202303 |
| 555666777 |
+-----------+
Thus, only one student was taught by the only
two CS professors, with the id being 123454321.
Question: Who is this student?
MariaDB [registration]> Select Name From Student
-> Where Id=123454321;
+----------+
| Name     |
+----------+
| Joe Blow |
+----------+
Thus, the result of this four-layer query seems
to be correct.
103
Could we do it with RA?
What we have got is the collection of students
who have been taught by all the CS faculty.
If we do it with Division, we would find out the
following:
Result = A Divide B Via C,
where
A = π_{StudId}(Transcript),
B = π_{ProfId}(σ_{DeptId=‘CS’}(Professor)),
C = π_{StudId, ProfId}(Transcript ⋈_{CrsCode,Semester} Teaching)
The above MariaDB code implements this
division, although it, and SQL in general, does
not support division directly.
Question: Do we have to use four layers of
nesting? Is there a simpler way to do it?
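One possible answer, offered only as a sketch and not necessarily the one intended here, is to express the two negations with Not Exists: a student qualifies if there is no CS professor for whom there is no record of teaching that student. The code below assumes SQLite (Python's sqlite3 module) as a stand-in for MariaDB. The table Taught(StudId, ProfId) is a hypothetical materialization of C, filled with the pairs found for Jane and Joe on Pages 98 and 99, and CSP holds the two CS professor Ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CSP (Id INTEGER);
    CREATE TABLE Taught (StudId INTEGER, ProfId INTEGER);
    INSERT INTO CSP VALUES (101202303), (555666777);
    INSERT INTO Taught VALUES
        (111111111, 900120450), (111111111, 783432188),
        (123454321, 555666777), (123454321, 101202303);
""")

# Keep a student if no CS professor is missing from her list of teachers.
taught_by_all = conn.execute("""
    SELECT DISTINCT T0.StudId FROM Taught T0
    WHERE NOT EXISTS (
        SELECT * FROM CSP
        WHERE NOT EXISTS (
            SELECT * FROM Taught T1
            WHERE T1.StudId = T0.StudId AND T1.ProfId = CSP.Id))
""").fetchall()

print(taught_by_all)
```

This mirrors the formula {s | ∀f (CSP(f) → Teaching(f, s))} after rewriting ∀ as ¬∃¬.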
104
Quantified predicates
Beginning with SQL 1999, we also have
limited support for quantification, with the
following basic format
For All relation (condition)
For Some relation (condition)
The first returns true if condition is true for
all the tuples in relation; while the second
returns true if condition is true for at least
one tuple in relation.
For example, the following tries to make sure
that every professor is teaching at least one
course. Remember participation constraint?
For All Professor
(Id In (Select T.ProfId From Teaching T))
105
The exists operator
It is often necessary to check if a nested
subquery actually returns anything.
Query: Who never took any computer science
course?
One way to do it is to find all CS courses a
student has taken and return those students
for which this list is empty. Thus,
MariaDB [registration]> Select S.Name From Student S
-> Where Not Exists (
-> # All CS courses taken by S.Id
-> Select T.CrsCode
-> From Transcript T
-> Where T.CrsCode like 'CS%'
-> And T.StudId=S.Id);
+---------------+
| Name          |
+---------------+
| Jane Doe      |
| Mary Smith    |
| Jesoph Public |
+---------------+
Question: Can we use Not In in place of Not
Exists? Why not?
Answer: Not as a drop-in replacement. Not In
compares a value on its left against the values
returned by the subquery, while here we only ask
whether the subquery returns anything at all.
106
The All operator
Query: Who has the highest GPA among all
the students?
MariaDB [registration]> Select S.Name, S.Id
-> From Student S
-> Where S.GPA >= All (Select S.GPA
-> From Student S
-> );
+--------------+-----------+
| Name | Id |
+--------------+-----------+
| Bart Simpson | 987654321 |
+--------------+-----------+
1 row in set (0.00 sec)
Here the comparison >= All is true whenever
its left argument is at least as high as the GPA
of every student.
107
Is he the one?
Let’s check it out....
MariaDB [registration]> Select Name, GPA
-> From Student;
+---------------+-----+
| Name | GPA |
+---------------+-----+
| Homer Simpson | 3.3 |
| Jane Doe | 3.4 |
| Mary Smith | 0 |
| Joe Blow | 3.2 |
| Jesoph Public | 3.3 |
| Bart Simpson | 3.6 |
+---------------+-----+
6 rows in set (0.00 sec)
So, the universal quantifier is also supported in
the MariaDB version currently installed on turing,
5.5.68!
108
An alternative solution
MariaDB also supports an aggregation opera-
tor, MAX, which can be used as follows.
MariaDB [registration]> Select S.Name, S.Id
-> From Student S
-> Where S.GPA >= (Select Max(S1.GPA)
-> From Student S1);
+--------------+-----------+
| Name | Id |
+--------------+-----------+
| Bart Simpson | 987654321 |
+--------------+-----------+
1 row in set (0.03 sec)
We will see how to embed this in a script later
on.
Labwork: It is about time to do Labwork 3.3.
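Where the All operator is not available, the same "at least as high as everyone" test can be phrased with Max or with Not Exists. The sketch below assumes SQLite (Python's sqlite3 module), which does not provide the All quantifier, and uses the Name and GPA values listed on Page 108.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (Name TEXT, GPA REAL);
    INSERT INTO Student VALUES
        ('Homer Simpson', 3.3), ('Jane Doe', 3.4), ('Mary Smith', 0),
        ('Joe Blow', 3.2), ('Jesoph Public', 3.3), ('Bart Simpson', 3.6);
""")

# Using the aggregate: compare against the single maximum value.
by_max = conn.execute("""
    SELECT S.Name FROM Student S
    WHERE S.GPA >= (SELECT MAX(S1.GPA) FROM Student S1)
""").fetchall()

# Using Not Exists: nobody has a strictly higher GPA than S.
by_not_exists = conn.execute("""
    SELECT S.Name FROM Student S
    WHERE NOT EXISTS (SELECT * FROM Student S1 WHERE S1.GPA > S.GPA)
""").fetchall()

print(by_max, by_not_exists)
```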
109
Aggregation
As we have already seen, it is often necessary
to calculate the average, the maximum, etc.,
of the values of certain attributes. It also helps
to simplify some queries (cf. the one on the
previous page).
Thus, SQL provides a collection of such
aggregate operators.
Query: What is the average age of our seniors?
MariaDB [register]> Select ROUND(avg(S.Age), 1)
-> From Student S
-> Where S.Status='Senior';
+----------------------+
| ROUND(avg(S.Age), 1) |
+----------------------+
|                 21.5 |
+----------------------+
1 row in set (0.02 sec)
110
Is it correct?
MariaDB [register]> Select S.Age
-> From Student S
-> Where S.Status='Senior';
+-----+
| Age |
+-----+
|  21 |
|  22 |
+-----+
We have two seniors, with their average age
being 43/2=21.5. Yes!
Question: Can we make it look even “bet-
ter”?
Answer: Yeah, give it another name...
MariaDB [registration]> Select ROUND(avg(S.Age), 1)
-> As AverageAge
-> From Student S
-> Where S.Status='Senior';
+------------+
| AverageAge |
+------------+
|       21.5 |
+------------+
1 row in set (0.00 sec)
111
Another example
Query: Who is the youngest math professor?
# The youngest professor in Math department
MariaDB [registration]> Select P.Name,P.Age
-> From Professor P
-> Where P.DeptId='MA' And
-> P.Age=(Select Min(P1.Age)
-> From Professor P1
-> Where P1.DeptId='MA');
+---------+-----+
| Name    | Age |
+---------+-----+
| Qi Chen |  34 |
+---------+-----+
Check it out...
MariaDB [registration]> Select P.Name, P.Age
-> From Professor P
-> Where P.DeptId='MA';
+-----------+-----+
| Name      | Age |
+-----------+-----+
| Qi Chen   |  34 |
| Ann White |  38 |
+-----------+-----+
Question: Do we have to use P1?
Answer: No. Check it out... Why?
112
More aggregation examples
Query: How many professors are there in the
Mathematics department?
MariaDB [registration]> Select Count(P.Name)
-> From Professor P
-> Where P.DeptId='MA';
+---------------+
| Count(P.Name) |
+---------------+
| 2 |
+---------------+
Question: Could some of them share names?
MariaDB [registration]> Select Count(Distinct P.Name)
-> From Professor P
-> Where P.DeptId='MA';
+------------------------+
| Count(Distinct P.Name) |
+------------------------+
|                      2 |
+------------------------+
113
thou shall not do this
You cannot mix an aggregate and an ordinary
attribute in the same Select list, as in the following:
Select count(*), S.Id From Student S
Here count returns a single value for the whole set
of tuples, while S.Id tries to send back one value
for every tuple. They don't fit together with each
other, so you have to quit....
Query: Which junior(s) achieved the highest
GPA among all the students?
MariaDB [registration]> Select S.Name, S.Id
-> From Student S
-> Where S.GPA >= (Select Max(S1.GPA)
-> From Student S1)
-> And S.Status='junior';
Empty set (0.00 sec)
114
Aggregation and grouping
Although we know how to count professors for
one department, what happens if we need this
information for all the departments? This is
where the “Group by” clause is used.
This clause will group rows of a table that
agree on values of a specified subset of at-
tributes.
Question: What does it mean? An example
...
115
... might help.
Query: How many professors are there in each
department, and what are their average ages?
MariaDB [registration]> select * from Professor;
+-----------+--------------+--------+-----+--------+
| Id        | Name         | DeptId | Age | Salary |
+-----------+--------------+--------+-----+--------+
|   9406321 | Jacob Taylor | MG     |  45 |  30000 |
| 101202303 | John Smyth   | CS     |  32 |  40000 |
| 121232343 | David Jones  | EE     |  56 |  25000 |
| 555666777 | Mary Doe     | CS     |  67 |  40000 |
| 783432188 | Adrian Jones | MG     |  55 |  30000 |
| 864297351 | Qi Chen      | MA     |  34 |  35000 |
| 900120450 | Ann White    | MA     |  38 |  50000 |
+-----------+--------------+--------+-----+--------+
7 rows in set (0.00 sec)
We can see, e.g., there are two professors in
Computer Science, and their average age is
(32+67)/2, i.e., 49.5.
What we need to do is to group together all
such rows agreeing on DeptId, and then apply
various operations.
116
How to do it sqlly?
We simply group all the tuples by their DeptId,
then apply such aggregate operators as count
and avg.
MariaDB [registration]> Select P.DeptId,
-> count(P.Name) As DeptSize,
-> ROUND(Avg(P.Age), 1) As AvgAge
-> From Professor P
-> Group By P.DeptId;
+--------+----------+--------+
| DeptId | DeptSize | AvgAge |
+--------+----------+--------+
| CS     |        2 |   49.5 |
| EE     |        1 |   56.0 |
| MA     |        2 |   36.0 |
| MG     |        2 |   50.0 |
+--------+----------+--------+
4 rows in set (0.00 sec)
The key point here is that each column in the
resulting table is either named in the Group By
clause, or is the result of applying some
aggregate to the tuples of that group.
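The grouping query can be replayed outside MariaDB as a rough check. The sketch below assumes Python's built-in sqlite3 module as a substitute engine and loads the seven Professor rows listed in these notes; an Order By is added only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Professor "
    "(Id INTEGER, Name TEXT, DeptId TEXT, Age INTEGER, Salary INTEGER)"
)
# The seven rows of the Professor table from the notes.
conn.executemany(
    "INSERT INTO Professor VALUES (?, ?, ?, ?, ?)",
    [
        (9406321, "Jacob Taylor", "MG", 45, 30000),
        (101202303, "John Smyth", "CS", 32, 40000),
        (121232343, "David Jones", "EE", 56, 25000),
        (555666777, "Mary Doe", "CS", 67, 40000),
        (783432188, "Adrian Jones", "MG", 55, 30000),
        (864297351, "Qi Chen", "MA", 34, 35000),
        (900120450, "Ann White", "MA", 38, 50000),
    ],
)

# Every selected column is either the grouping key or an aggregate.
dept_stats = conn.execute(
    """
    SELECT P.DeptId,
           COUNT(P.Name) AS DeptSize,
           ROUND(AVG(P.Age), 1) AS AvgAge
    FROM Professor P
    GROUP BY P.DeptId
    ORDER BY P.DeptId
    """
).fetchall()

for row in dept_stats:
    print(row)
```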
117
Another example
Query: How many courses does each student
take, and what are their average grades?
Select T.StudId, Count(*) As NumCourses,
Avg(T.Grade) As CrsAvg
From Transcript T
Group By T.StudId
The result could be the following:
StudId  NumCourses  CrsAvg
6666    3           3.33
9876    2           2.50
1234    3           3.33
0234    2           3.50
For each student, it calculates the number of
courses, together with the average grade that
she has achieved in those courses.
118
The Having clause
Similar to “Where”, the “Having” clause is used
together with the “Group By” to indicate which
groups should be included in the final result.
Whenever a group is generated, this condition
will be applied first. If this group does not
meet the cut, it will not be included.
MariaDB [registration]> Select P.DeptId,
-> Count(P.Name) As DeptSize,
-> ROUND(Avg(P.Age), 1) As AvgAge
-> From Professor P
-> Group By P.DeptId
-> Having count(*) > 1;
+--------+----------+--------+
| DeptId | DeptSize | AvgAge |
+--------+----------+--------+
| CS     |        2 |   49.5 |
| MA     |        2 |   36.0 |
| MG     |        2 |   50.0 |
+--------+----------+--------+
3 rows in set (0.00 sec)
EE is out, because it has only one professor, so count(*) > 1 fails.
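To see which groups the Having condition removes, the following sketch runs the grouping query with and without it and compares the two results. It is only an illustration under the assumption that Python's sqlite3 module behaves like MariaDB for this query; the data are the Professor rows from the notes, trimmed to the columns used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Professor (Name TEXT, DeptId TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO Professor VALUES (?, ?, ?)",
    [
        ("Jacob Taylor", "MG", 45), ("John Smyth", "CS", 32),
        ("David Jones", "EE", 56), ("Mary Doe", "CS", 67),
        ("Adrian Jones", "MG", 55), ("Qi Chen", "MA", 34),
        ("Ann White", "MA", 38),
    ],
)

# The same query text, with an optional Having clause spliced in.
query = """
    SELECT P.DeptId,
           COUNT(P.Name) AS DeptSize,
           ROUND(AVG(P.Age), 1) AS AvgAge
    FROM Professor P
    GROUP BY P.DeptId
    {having}
    ORDER BY P.DeptId
"""
all_groups = conn.execute(query.format(having="")).fetchall()
kept_groups = conn.execute(
    query.format(having="HAVING COUNT(*) > 1")
).fetchall()

# Groups that were formed but did not make the cut.
dropped = sorted(set(all_groups) - set(kept_groups))
print("kept:", kept_groups)
print("dropped:", dropped)
```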
119
More examples
Query: Who achieves an average grade above 3.5
over this academic year?
Select T.StudId, Count(*) As NumCrs,
Avg(T.Grade) As CrsAvg
From Transcript T
Where T.Semester In ('F2021', 'S2022')
Group By T.StudId
Having Avg(T.Grade)>3.5
This can also be done without the Having clause
in two steps:
Select Stats.StudId, Stats.CrsAvg
From (Select T.StudId, Avg(T.Grade) As CrsAvg
From Transcript T
Where T.Semester In ('F2021', 'S2022')
Group By T.StudId) As Stats
Where Stats.CrsAvg>3.5
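The two formulations can be compared side by side. In the sketch below, the Transcript rows are hypothetical (numeric grades, three made-up students) and Python's sqlite3 module is assumed as the engine. Student 3 has a 4.0 from F2020, which the Where clause discards before any grouping takes place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transcript "
    "(StudId INTEGER, CrsCode TEXT, Semester TEXT, Grade REAL)"
)
# Hypothetical rows, invented for this illustration only.
conn.executemany(
    "INSERT INTO Transcript VALUES (?, ?, ?, ?)",
    [
        (1, "CS305", "F2021", 4.0), (1, "MAT123", "S2022", 3.5),
        (2, "CS305", "F2021", 3.0), (2, "EE101", "S2022", 3.5),
        (3, "EE101", "F2020", 4.0), (3, "CS305", "F2021", 3.0),
    ],
)

# Version 1: filter the groups with Having.
with_having = conn.execute(
    """
    SELECT T.StudId, COUNT(*) AS NumCrs, AVG(T.Grade) AS CrsAvg
    FROM Transcript T
    WHERE T.Semester IN ('F2021', 'S2022')
    GROUP BY T.StudId
    HAVING AVG(T.Grade) > 3.5
    """
).fetchall()

# Version 2: compute the averages in a nested query, then filter with Where.
with_subquery = conn.execute(
    """
    SELECT Stats.StudId, Stats.CrsAvg
    FROM (SELECT T.StudId, AVG(T.Grade) AS CrsAvg
          FROM Transcript T
          WHERE T.Semester IN ('F2021', 'S2022')
          GROUP BY T.StudId) AS Stats
    WHERE Stats.CrsAvg > 3.5
    """
).fetchall()

print(with_having)
print(with_subquery)
```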
120
Put them into order
We sometimes want to line up all the rows in
the result, using the Order by clause. Thus, if
we add the following
Order by CrsAvg
at the end of the previous query, that list will
be sorted by average grade, in ascending order
by default.
If we include instead the following
Order by CrsAvg, StudId
then this list will be sorted by average grade,
and those sharing the same average will be sorted
by their student ID.
121
deja vu
Query: Who took at least two courses?
We once did it using join as follows (Cf. Pages
52-54):
MariaDB [registration]> Select distinct T1.StudId
-> From Transcript T1, Transcript T2
-> Where T1.StudId=T2.StudId
-> And T1.CrsCode <> T2.CrsCode;
+-----------+
| StudId    |
+-----------+
|  23456789 |
| 111111111 |
| 123454321 |
| 666666666 |
| 987654321 |
+-----------+
Question: What to do if we want students
who have taken three, four, or five courses?
It turns out that the aggregation operators do
a better job: pick up all those students such
that the number of courses that they have
taken is at least 2.
Question: How to do it?
122
Begin with the very beginning...
The following one finds out the StudId and the
courses they have taken.
MariaDB [registration]> Select StudId, CrsCode
-> From Transcript;
+-----------+---------+
| StudId    | CrsCode |
+-----------+---------+
|  23456789 | CS305   |
|  23456789 | EE101   |
| 111111111 | EE101   |
| 111111111 | MAT123  |
| 111111111 | MGT123  |
| 123454321 | CS305   |
| 123454321 | CS315   |
| 123454321 | MAT123  |
| 666666666 | EE101   |
| 666666666 | MAT123  |
| 666666666 | MGT123  |
| 987654321 | CS305   |
| 987654321 | MGT123  |
+-----------+---------+
Question: Is this what we want? No.
123
Question: What do we really want?
Answer: We want to know the number of the
courses they have taken.
Question: How to get it SQLly?
MariaDB [registration]> Select StudId,
-> Count(CrsCode) as Num
-> From Transcript
-> Group by StudId;
This code gives us the following:
+-----------+-----+
| StudId | Num |
+-----------+-----+
| 23456789 | 2 |
| 111111111 | 3 |
| 123454321 | 3 |
| 666666666 | 3 |
| 987654321 | 2 |
+-----------+-----+
124
Then what?
We cut out those who took fewer than two courses
to get the final answer.
MariaDB [registration]> Select StudId,
-> Count(CrsCode) as Num
-> From Transcript
-> Group by StudId
-> Having Num>=2
-> Order by StudId;
+-----------+-----+
| StudId    | Num |
+-----------+-----+
|  23456789 |   2 |
| 111111111 |   3 |
| 123454321 |   3 |
| 666666666 |   3 |
| 987654321 |   2 |
+-----------+-----+
In the above, the Order clause lines up the stuff.
Question: Joe Pecci: “Is that it?”
Answer: No!
125
Who are those kids?
Find them out by joining the last table with
Student.
MariaDB [registration]> Select S.Name
-> From (Select StudId, count(CrsCode) as Num
-> From Transcript
-> Group by StudId
-> #Taking at least 2
-> Having Num>=2
-> Order by StudId) T, Student S
-> Where T.StudId=S.Id
-> Order by S.Name;
+---------------+
| Name          |
+---------------+
| Bart Simpson  |
| Homer Simpson |
| Jane Doe      |
| Jesoph Public |
| Joe Blow      |
+---------------+
126
How about at least three courses?
We simply change 2 to 3 in the above query:
MariaDB [registration]> Select S.Name
-> From (Select StudId, count(CrsCode) as Num
-> From Transcript
-> Group by StudId
-> #Taking at least 3
-> Having Num>=3
-> Order by StudId) T, Student S
-> Where T.StudId=S.Id
-> Order by S.Name;
+---------------+
| Name |
+---------------+
| Jane Doe |
| Jesoph Public |
| Joe Blow |
+---------------+
Question: How to find those kids who have
taken at least 25 courses?
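With aggregation, the threshold is just a number in the Having condition, so it can be passed in as a parameter. The sketch below is one possible way to package this; the helper name students_with_at_least is made up for illustration, Python's sqlite3 module stands in for MariaDB, and the rows are the thirteen (StudId, CrsCode) pairs listed in the Transcript output above.

```python
import sqlite3

# The (StudId, CrsCode) pairs shown in the Transcript listing.
TRANSCRIPT = [
    (23456789, "CS305"), (23456789, "EE101"),
    (111111111, "EE101"), (111111111, "MAT123"), (111111111, "MGT123"),
    (123454321, "CS305"), (123454321, "CS315"), (123454321, "MAT123"),
    (666666666, "EE101"), (666666666, "MAT123"), (666666666, "MGT123"),
    (987654321, "CS305"), (987654321, "MGT123"),
]


def students_with_at_least(conn, n):
    """Return the StudIds of students with at least n Transcript rows."""
    rows = conn.execute(
        """
        SELECT StudId
        FROM Transcript
        GROUP BY StudId
        HAVING COUNT(CrsCode) >= ?
        ORDER BY StudId
        """,
        (n,),
    ).fetchall()
    return [stud_id for (stud_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT)")
conn.executemany("INSERT INTO Transcript VALUES (?, ?)", TRANSCRIPT)

for n in (2, 3, 25):
    print(n, students_with_at_least(conn, n))
```

The self-join version, by contrast, needs one more copy of Transcript for every extra course, which is why it does not scale to a threshold such as 25.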
127
One step at a time?
Join is expensive. There are certainly other
ways... e.g.,
MariaDB [registration]> Select S.Name
-> From (Select StudId, count(CrsCode) as Num
-> From Transcript
-> Group by StudId
-> #Taking at least 2
-> Having Num>=2
-> Order by StudId) T, Student S
-> Where T.StudId=S.Id
-> Order by S.Name;
+---------------+
| Name          |
+---------------+
| Bart Simpson  |
| Homer Simpson |
| Jane Doe      |
| Jesoph Public |
| Joe Blow      |
+---------------+
Again, instead of getting an intermediate table
through a product, the above got it from two
tables, thus cutting down both space and time.
128
The whole ninety feet...
1. The From part will be evaluated first to
produce a Cartesian product of all the tables
mentioned there.
2. The Where clause will be evaluated to pro-
cess each row of the above product table indi-
vidually to see if it makes the cut, and throw
out those who don’t.
3. The Group By clause will then be evaluated
to split the previously cleaned table into groups
where each group consists of those tuples that
agree on the specified attributes as mentioned
in the group clause.
4. The Having clause will then be evaluated
to cut out those groups that don’t satisfy the
Having condition.
129
5. The Select part will be evaluated. It takes
the leftover groups, evaluates the aggregate
functions in the target list for each group,
retains those columns that are listed in
the Select clause, and generates
one result row for each group.
6. The rows are ordered with Order by.
130
A final example
Considering two tables, Dept
DEPTId  DNAME        BUDGET
D1      Marketing    10M
D2      Development  12M
D3      Research     5M
and Emp
EMPId  ENAME  DEPTId  SALARY
E1     Lopez  D1      40K
E2     John   D1      42K
E3     Bob    D2      30K
E4     Jay    D2      35K
and the query
Select D.DName, Avg(E.Salary) As SalaryAvg
From Dept D, Emp E
Where D.DName In ('Development','Research')
And D.DeptId=E.DeptId
Group By D.DeptId
Order by D.DName;
131
How is it evaluated?
1. The Cartesian product of the two tables,
Dept and Emp, as mentioned in the From part, is
constructed as follows:
DId  DNAME     BUDGET  EMPId  ENAME  DId  SALARY
D1   Market    10M     E1     Lopez  D1   40K
D1   Market    10M     E2     John   D1   42K
D1   Market    10M     E3     Bob    D2   30K
D1   Market    10M     E4     Jay    D2   35K
D2   Develop   12M     E1     Lopez  D1   40K
D2   Develop   12M     E2     John   D1   42K
D2   Develop   12M     E3     Bob    D2   30K
D2   Develop   12M     E4     Jay    D2   35K
D3   Research  5M      E1     Lopez  D1   40K
D3   Research  5M      E2     John   D1   42K
D3   Research  5M      E3     Bob    D2   30K
D3   Research  5M      E4     Jay    D2   35K
2. Then, the Where part is evaluated, to get
those tuples satisfying the condition
D.DName In ('Development','Research')
And D.DeptId=E.DeptId
DId  DNAME    BUDGET  EMPId  ENAME  DId  SALARY
D2   Develop  12M     E3     Bob    D2   30K
D2   Develop  12M     E4     Jay    D2   35K
132
3. The Group by clause is evaluated to group
the above tuples into groups with the same
DId’s:
DId  DNAME    BUDGET  EMPId  ENAME  DId  SALARY
D2   Develop  12M     E3     Bob    D2   30K
     Develop  12M     E4     Jay    D2   35K
4. There is no Having clause, so all groups stay.
5. The Select part is done, giving the following
table of Stats:
DNAME    SalaryAvg
Develop  32.5K
6. Since there is only one row, the result stays
the same under Order by.
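The six steps above can also be traced by hand in ordinary code. The following is a minimal sketch in plain Python (no database engine) that mimics the conceptual evaluation order on the Dept and Emp rows; it is not how a DBMS actually executes the query, since the optimizer is free to choose a cheaper plan. Salaries are stored in thousands, and the variable names are chosen here for illustration.

```python
from collections import defaultdict
from itertools import product

# Dept(DeptId, DName, Budget) and Emp(EmpId, EName, DeptId, Salary in K).
dept = [("D1", "Marketing", "10M"), ("D2", "Development", "12M"),
        ("D3", "Research", "5M")]
emp = [("E1", "Lopez", "D1", 40), ("E2", "John", "D1", 42),
       ("E3", "Bob", "D2", 30), ("E4", "Jay", "D2", 35)]

# 1. From: Cartesian product. Each row has 7 columns:
#    0 D.DeptId, 1 DName, 2 Budget, 3 EmpId, 4 EName, 5 E.DeptId, 6 Salary
product_rows = [d + e for d, e in product(dept, emp)]

# 2. Where: keep rows satisfying both conditions.
filtered = [r for r in product_rows
            if r[1] in ("Development", "Research") and r[0] == r[5]]

# 3. Group By D.DeptId.
groups = defaultdict(list)
for r in filtered:
    groups[r[0]].append(r)

# 4. Having: none in this query, so every group stays.

# 5. Select: one output row per group, with the aggregate evaluated.
result = [(rows[0][1], sum(r[6] for r in rows) / len(rows))
          for rows in groups.values()]

# 6. Order By D.DName.
result.sort(key=lambda row: row[0])

print(len(product_rows), len(filtered), result)
```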
133
This is what MariaDB says...
MariaDB [Strange]> Select D.DName,
-> ROUND(Avg(E.Salary), 2)
-> As SalaryAvg
-> From Dept D, Emp E
-> Where D.DName In ('Development',
-> 'Research')
-> And D.DeptId=E.DeptId
-> Group By D.DeptId
-> Order by D.DName;
+-------------+-----------+
| DName | SalaryAvg |
+-------------+-----------+
| Development | 32500.00 |
+-------------+-----------+
1 row in set (0.00 sec)
Labwork: Let’s get it over with Labwork 3.4.
134
Views in SQL
A view is simply a virtual table. We can create
views and then use them in SQL queries. For
example,
MariaDB [register]> Create View AvgDeptAge(Dept, AvgAge) As
-> Select P.DeptId, Avg(P.Age)
-> From Professor P
-> Group By P.DeptId;
Query OK, 0 rows affected (0.01 sec)
MariaDB [registration]> Select Dept, ROUND(AvgAge)
-> From AvgDeptAge;
+------+---------------+
| Dept | ROUND(AvgAge) |
+------+---------------+
| CS   |            50 |
| EE   |            56 |
| MA   |            36 |
| MG   |            50 |
+------+---------------+
4 rows in set (0.00 sec)
If you get bored with it, use the following to
remove it.
MariaDB [register]> drop view AvgDeptAge;
Query OK, 0 rows affected (0.00 sec)
135
Then what?
We can then use this view, much like a subroutine
(method), in further programming, e.g., to find out
the department with the minimum average age:
MariaDB [register]> Select A.Dept, ROUND(A.AvgAge)
-> From AvgDeptAge A
-> Where A.AvgAge=(
-> Select Min(A.AvgAge)
-> From AvgDeptAge A);
+------+-------------------+
| Dept | ROUND(A.AvgAge)   |
+------+-------------------+
| MA   |                36 |
+------+-------------------+
Thus, a view is similar to a pre-defined method
(function) as we have been using a lot in Java
(Python) or any other programming language,
which hides unnecessary details.
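As a rough illustration of a view being reused like a function, the sketch below defines AvgDeptAge once and then queries it twice in the same statement. It assumes Python's sqlite3 module rather than MariaDB, names the view columns with As aliases instead of a column list, and uses the alias B for the inner reference purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Professor (Name TEXT, DeptId TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO Professor VALUES (?, ?, ?)",
    [
        ("Jacob Taylor", "MG", 45), ("John Smyth", "CS", 32),
        ("David Jones", "EE", 56), ("Mary Doe", "CS", 67),
        ("Adrian Jones", "MG", 55), ("Qi Chen", "MA", 34),
        ("Ann White", "MA", 38),
    ],
)

# Define the "subroutine" once ...
conn.execute(
    """
    CREATE VIEW AvgDeptAge AS
    SELECT P.DeptId AS Dept, AVG(P.Age) AS AvgAge
    FROM Professor P
    GROUP BY P.DeptId
    """
)

# ... and call it wherever a table could appear.
youngest_dept = conn.execute(
    """
    SELECT A.Dept, ROUND(A.AvgAge)
    FROM AvgDeptAge A
    WHERE A.AvgAge = (SELECT MIN(B.AvgAge) FROM AvgDeptAge B)
    """
).fetchall()
print(youngest_dept)

# Remove the view when it is no longer needed.
conn.execute("DROP VIEW AvgDeptAge")
```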
Question: Did Adam talk about it earlier?
We will go through some view based program-
ming in Lab 4 this Friday to wrap up DB pro-
gramming.
136
What else?
There are a few reasons why views are desirable.
1. It provides automatic security for hidden
data, e.g., with the PresidentList view, a user
cannot access the students' Ids: the view does not have any.
Create View PresidentList As
Select T.Name, Count(*) As NumCrs,
Avg(T.Grade) As CrsAvg
From Transcript T
Where T.Semester In ('F2021', 'S2022')
Group By T.StudId
Having Avg(T.Grade)>= 3.7;
2. It allows a user to pay attention to what
s/he is interested in, and ignore the rest.
3. It makes MariaDB programming much more
expressive and powerful, as you will see through
Lab 4.
4. It provides logical data independence...?
Incidentally, views were not implemented in MySQL
until version 5.0.
137
Logical data independence
As discussed in an earlier chapter, logical data
independence refers to the immunity of users
and user programs to changes in the logical
structure of the database. Views provide the means
to achieve such immunity in two aspects: growth
and restructuring.
When a database grows to get in more infor-
mation, its structure must grow accordingly: it
may have to include new attributes, and even
new tables.
We already revised the Student table in Lab 11,
and will go through a Professor restructuring
example in Lab 12, as described in Section 4
in the MariaDB notes.
At least in principle, neither of these changes
should have any effect on existing users or user
programs at all. Otherwise, we are in deep
trouble.
138
What trouble?
From time to time, it is necessary to restruc-
ture the database such that the overall content
remains the same, but the structure of infor-
mation changes. For example, at some point,
we wish to replace the original Student table by
the following two tables:
Create table StudentBasic (
Id Integer,
Name Char(20) Not null,
Address Char(50),
Primary Key (Id));
Create table StudentStatus (
Id Integer,
Status Char(10) default 'freshman',
Primary Key (Id));
Now, all the application programs based on the
Student table can no longer be used...
139
What to do?
We just create a view as follows:
Create View Student As
Select B.Id, B.Name, B.Address, S.Status
From StudentBasic B, StudentStatus S
Where B.Id=S.Id
When this change takes place, application programs
that previously referred to the original
Student table will now refer to the Student view,
thus nothing changes externally from a user’s
perspective, although the table structure has
changed at the conceptual level (Still remember
this stuff?).
All the applications that we have developed
over the years can still be used, through this
view.
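The restructuring can be tried out on a small scale. In the sketch below, the two new tables and the Student view follow the definitions above, while the sample students, the "legacy" query, and the use of Python's sqlite3 module in place of MariaDB are assumptions made for illustration. The legacy query mentions only Student, and still runs unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE StudentBasic (
        Id      INTEGER PRIMARY KEY,
        Name    CHAR(20) NOT NULL,
        Address CHAR(50)
    );
    CREATE TABLE StudentStatus (
        Id     INTEGER PRIMARY KEY,
        Status CHAR(10) DEFAULT 'freshman'
    );

    -- Hypothetical sample students.
    INSERT INTO StudentBasic VALUES (1, 'Ann Example', '1 Main St');
    INSERT INTO StudentBasic VALUES (2, 'Bob Example', '2 Oak Ave');
    INSERT INTO StudentStatus VALUES (1, 'senior');
    INSERT INTO StudentStatus (Id) VALUES (2);  -- falls back to the default

    -- The view takes over the name of the table it replaces.
    CREATE VIEW Student AS
        SELECT B.Id, B.Name, B.Address, S.Status
        FROM StudentBasic B, StudentStatus S
        WHERE B.Id = S.Id;
    """
)

# A query written against the original Student table, left untouched.
legacy_query = "SELECT Name, Status FROM Student ORDER BY Id"
legacy_rows = conn.execute(legacy_query).fetchall()
print(legacy_rows)
```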
Question: When is the Final?
Answer: Four weeks down the road... on December
15, 2021.
140
Two principles
Views really serve two rather different purposes:
1) To a user who defines the view, it is really
just a shorthand for a “subroutine”. 2) To
other external users, it should look and behave
exactly like a table.
The Interchangeability Principle: There is no
distinction between tables and views from an
external perspective.
The Database Relativity Principle: As far as
the information equivalence is concerned, the
choice of which database is the real one is ar-
bitrary, as well.
We only care about the content (what is it?),
but not the format (how is it kept?).
141
Modify the database
We have discussed mostly how to get informa-
tion out of a database via queries.
In reality, a database changes its data all the
time. We can either add more rows, take some
out, or modify the existing rows, as we have
been doing throughout this semester.
When we have to insert a large quantity of
rows, there is a much easier way to do it.
In Section 4 of the MariaDB notes, you can
find out how to easily fill an easyClass table.
Sometimes, we just want to have the information,
but not physically keep it.
In Lab 12, the final lab for individuals, you will
dig out information of hard(er) classes, with a
hardClass view.
142
Update tables
It is pretty easy to update an existing row in a
table if you can identify it.
MariaDB [register]> Select Grade From Transcript
-> Where StudId='666666666'
-> And CrsCode='EE101';
+-------+
| Grade |
+-------+
| B     |
+-------+
MariaDB [register]> Update Transcript
-> Set Grade='A'
-> Where StudId='666666666'
-> And CrsCode='EE101';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MariaDB [register]> Select Grade From Transcript
-> Where StudId='666666666'
-> And CrsCode='EE101';
+-------+
| Grade |
+-------+
| A     |
+-------+
143
Another example
Suppose that, instead of firing those tough professors
who failed more than half of a class, we want to
move them over to administrative positions.
We can do the following, where hardClass is a
view that you will create in Lab 12.
Update Professor
Set DeptId='Adm'
Where Id In
(Select T.ProfId
From Teaching T, hardClass H
Where T.CrsCode=H.CrsCode
And T.Semester=H.Semester
And H.FailRate>0.5)
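To see the effect of such an update, here is a small sketch under several assumptions: the definition of hardClass is not given in these notes, so a plain table with the three columns the update refers to stands in for it; the professors and courses are invented; and Python's sqlite3 module replaces MariaDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Professor (Id INTEGER PRIMARY KEY, Name TEXT, DeptId TEXT);
    CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT);
    -- Stand-in for the hardClass view of Lab 12 (hypothetical shape).
    CREATE TABLE hardClass (CrsCode TEXT, Semester TEXT, FailRate REAL);

    -- Invented sample data.
    INSERT INTO Professor VALUES (1, 'Prof Tough', 'CS');
    INSERT INTO Professor VALUES (2, 'Prof Mild', 'CS');
    INSERT INTO Teaching VALUES (1, 'CS305', 'F2021');
    INSERT INTO Teaching VALUES (2, 'CS315', 'F2021');
    INSERT INTO hardClass VALUES ('CS305', 'F2021', 0.6);
    INSERT INTO hardClass VALUES ('CS315', 'F2021', 0.1);
    """
)

# The subquery picks the professors; the update then touches only them.
cursor = conn.execute(
    """
    UPDATE Professor
    SET DeptId = 'Adm'
    WHERE Id IN (SELECT T.ProfId
                 FROM Teaching T, hardClass H
                 WHERE T.CrsCode = H.CrsCode
                   AND T.Semester = H.Semester
                   AND H.FailRate > 0.5)
    """
)
changed = cursor.rowcount
depts = dict(conn.execute("SELECT Name, DeptId FROM Professor").fetchall())
print(changed, depts)
```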
144
All politics are local....
The following does not work with the current
version, 5.5.68, of MariaDB.
MariaDB [registration]> Update Professor P1
-> Set P1.Salary = P1.Salary*1.1
-> Where P1.Id in
-> (Select P.Id
-> From Professor P, Teaching T
-> Where P.age < 40 and P.Id = T.ProfId
-> and T.CrsCode ='MAT123'
-> and (T.Semester = 'S1997'
-> or T.Semester = 'F1997'));
ERROR 1093 (HY000): You can't specify target table 'P1' for update in FROM clause
I found the following disclaimer in §13.2.11,
UPDATE Syntax, of the MySQL 5.7 Reference Manual:
“Currently, you cannot update a table and
select from the same table in a subquery.”
But, it is mentioned that, since MariaDB 10.3.2,
UPDATE statements may have the same source
and target.
145
Update on views
It is natural to allow the programmers to up-
date them as well. But, it is tough to do....
1. Assume we have a simple projective view
on the Transcript, with only three attributes,
CrsCode, StudId and Semester. If we add a row
into this view, which is further put into the
Transcript table, then the Grade piece is miss-
ing in that row in the table. This can be filled
with a null value if it is permitted by the as-
sociated ICs. Otherwise, it would be rejected.
You would not know why.
2. Assume that we have another view CSProf
over the Professor table, generated with a re-
striction of DeptId=’CS’. Assume that we now
add in a row (1212, ’Paul Schemit’, ’EE’) into
this view, then the table.
When we later query this view, we will not be
able to get the row back even though we just
added it in. (Remember durability?)
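The second problem, the vanishing row, can be imitated as follows. One caveat: SQLite, used here through Python's sqlite3 module, does not accept inserts through a view, so the sketch puts the row directly into Professor, which is where a view insert would end up. The Professor table is cut down to three columns to match the row in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Professor (Id INTEGER, Name TEXT, DeptId TEXT)")
conn.execute("INSERT INTO Professor VALUES (101202303, 'John Smyth', 'CS')")
conn.execute(
    """
    CREATE VIEW CSProf AS
    SELECT Id, Name, DeptId
    FROM Professor
    WHERE DeptId = 'CS'
    """
)

# Simulate the effect of adding the EE row through the view:
# it lands in the underlying table.
conn.execute("INSERT INTO Professor VALUES (1212, 'Paul Schemit', 'EE')")

in_table = conn.execute(
    "SELECT COUNT(*) FROM Professor WHERE Id = 1212"
).fetchone()[0]
in_view = conn.execute(
    "SELECT COUNT(*) FROM CSProf WHERE Id = 1212"
).fetchone()[0]

# The row is stored, yet the view's restriction filters it out.
print(in_table, in_view)
```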
146
3. Moreover, the impact of a view update can
be ambiguous. This could lead to serious
consequences. For example, given the following
view
Create View ProfDept (PrName, DeName) AS
Select P.Name, D.Name
From Professor P, Dept D
Where P.DeptId = D.DeptId
If we delete a row ('Smyth','CS') from the
view, we could either delete the row for 'Smyth'
from Professor, or the row for 'CS' from Dept,
or set the value for DeptId for 'Smyth' in Professor
to null.
Question: What should the DBMS do?
147
A little summary
View update is not always doable. Much work
has been done in this regard, but no consensus
has emerged. SQL thus has taken a simple-minded
approach by accepting only a very
limited case of view update, called updatable
views, which must satisfy the following:
1. Exactly one table can be included in the
From part.
2. No aggregates, Group By clause, Having
clause, or set operators are allowed.
3. Nested sub-queries in the Where part can’t
refer to the table mentioned in the From part.
4. No expressions or Distinct keyword are
allowed in the Select part.
148
An example
Below shows an updatable view:
Create View CanTeach(Professor, Course) As
Select T.ProfId, T.CrsCode
From Teaching T
Assume we want to delete a pair (0940, MGT123)
from the view. Then all the rows in the Teaching
table with that ProfId and CrsCode must be deleted.
Labwork: Let’s wrap this up with Lab 12 on
Labwork 4, due at 9 p.m., Friday, November 5,
2021.
We have learned a lot, haven’t we? It is time to
do something... .
Based on these programming labs, teams should
get together and come up with queries for your
projects.
Project III is due by 9 p.m., Wednesday, November
10, 2021.
149