CSE 303 Database Ashikur Rahman
CSE 303: Database
Lecture 11
Chapter 2: The Relational Algebra
RDBMS Architecture
How does a SQL engine work?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
• SQL Query: declarative query (from user)
• Relational Algebra (RA) Plan: translate to relational algebra expression
• Optimized RA Plan: find logically equivalent, but more efficient, RA expression
• Execution: execute each operator of the optimized plan!
Relational Algebra allows us to translate declarative (SQL) queries into precise and optimizable expressions!
Relational Algebra (RA)
• Five basic operators:
1. Selection: σ
2. Projection: π
3. Cartesian Product: ×
4. Union: ∪
5. Difference: −
• Derived or auxiliary operators:
• Intersection
• Joins (natural, equi-join, theta join)
• Renaming: ρ
We’ll look at selection, projection, and Cartesian product first, and also at one example of a derived operator (natural join) and a special operator (renaming). Then we’ll look at the set operations.
Keep in mind: RA operates on sets!
• RDBMSs use bags (multisets); however, in the relational algebra formalism we will consider sets!
• Also: we will consider the named perspective, where every attribute must have a unique name
• attribute order does not matter…
Now on to the basic RA operators…
1. Selection (σ)
• Returns all tuples which satisfy a condition
• Notation: σ_{c}(R)
• Examples
• σ_{Salary > 40000}(Employee)
• σ_{name = ‘Smith’}(Employee)
• The condition c can use =, <, ≤, >, ≥, <>
Students(sid,sname,gpa)
SQL:
SELECT * FROM Students WHERE gpa > 3.5;
RA:
σ_{gpa > 3.5}(Students)
Another example:
Employee
SSN Name Salary
1234545 John 200000
5423341 Smith 600000
4352342 Fred 500000
σ_{Salary > 400000}(Employee)
SSN Name Salary
5423341 Smith 600000
4352342 Fred 500000
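The selection example above can be sketched in Python. This is a minimal illustration, not part of the original slides: the names `select`, `EMPLOYEE_SCHEMA`, and `EMPLOYEE` are hypothetical, a relation is modeled as a set of tuples, and the threshold 400000 is an assumption chosen so that the result matches the two-row output table.

```python
# A relation is modeled as a schema (attribute names) plus a set of tuples.
EMPLOYEE_SCHEMA = ("SSN", "Name", "Salary")
EMPLOYEE = {
    (1234545, "John", 200000),
    (5423341, "Smith", 600000),
    (4352342, "Fred", 500000),
}


def select(schema, rows, condition):
    """sigma_c(R): keep every tuple for which condition(row) is true."""
    return {t for t in rows if condition(dict(zip(schema, t)))}


# The threshold 400000 is an assumption made for this sketch.
high_paid = select(EMPLOYEE_SCHEMA, EMPLOYEE, lambda r: r["Salary"] > 400000)
smiths = select(EMPLOYEE_SCHEMA, EMPLOYEE, lambda r: r["Name"] == "Smith")
print(sorted(high_paid))
```

Selection never changes the schema; it only filters tuples, so the output has the same attributes as the input.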
2. Projection (π)
• Eliminates columns, then removes duplicates
• Notation: π_{A1,…,An}(R)
• Example: project social-security number and names:
• π_{SSN, Name}(Employee)
• Output schema: Answer(SSN, Name)
Students(sid,sname,gpa)
SQL:
SELECT DISTINCT sname, gpa FROM Students;
RA:
π_{sname, gpa}(Students)
Another example:
Employee
SSN Name Salary
1234545 John 200000
5423341 John 600000
4352342 John 200000
π_{Name, Salary}(Employee)
Name Salary
John 200000
John 600000
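The duplicate elimination in this projection example can be sketched in Python. The helper name `project` and the tuple-based representation are assumptions made for illustration; because the output is a Python set, the two (John, 200000) rows collapse into one automatically.

```python
EMPLOYEE_SCHEMA = ("SSN", "Name", "Salary")
EMPLOYEE = {
    (1234545, "John", 200000),
    (5423341, "John", 600000),
    (4352342, "John", 200000),
}


def project(schema, rows, attrs):
    """pi_attrs(R): keep only the listed columns; the set removes duplicates."""
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rows}


out_schema, out_rows = project(EMPLOYEE_SCHEMA, EMPLOYEE, ("Name", "Salary"))
print(out_schema, sorted(out_rows))
```

Three input tuples produce only two output tuples, which is the set semantics the slides assume.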
Note that RA Operators are Compositional!
Students(sid,sname,gpa)
SQL:
SELECT DISTINCT sname, gpa FROM Students WHERE gpa > 3.5;
How do we represent this query in RA?
• π_{sname,gpa}(σ_{gpa > 3.5}(Students))
• σ_{gpa > 3.5}(π_{sname,gpa}(Students))
Are these logically equivalent?
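One way to explore the equivalence question is to evaluate both operator orders, select-then-project and project-then-select, on a small instance. The sketch below uses hypothetical helpers `select` and `project` and made-up student rows; agreement on one instance is only a sanity check, not a proof. The two orders agree here because the selection attribute gpa survives the projection.

```python
STUDENTS_SCHEMA = ("sid", "sname", "gpa")
# Hypothetical sample data, not from the slides.
STUDENTS = {(1, "Ann", 3.9), (2, "Bob", 3.2), (3, "Ann", 3.9), (4, "Cy", 3.7)}


def select(schema, rows, condition):
    """sigma_c(R): keep tuples satisfying the condition."""
    return {t for t in rows if condition(dict(zip(schema, t)))}


def project(schema, rows, attrs):
    """pi_attrs(R): keep listed columns, removing duplicates."""
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rows}


# Plan A: select first, then project.
selected = select(STUDENTS_SCHEMA, STUDENTS, lambda r: r["gpa"] > 3.5)
_, plan_a = project(STUDENTS_SCHEMA, selected, ("sname", "gpa"))

# Plan B: project first, then select on the projected schema.
p_schema, projected = project(STUDENTS_SCHEMA, STUDENTS, ("sname", "gpa"))
plan_b = select(p_schema, projected, lambda r: r["gpa"] > 3.5)

print(plan_a == plan_b)
```

If the projection had dropped gpa, Plan B could not even be written, which is why the order of operators matters in general.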
3. Cartesian Product (×)
• Each tuple in R1 with each tuple in R2
• Notation: R1 × R2
• Example:
• Employee × Dependents
• Rare in practice; mainly used to express joins
Students(sid,sname,gpa)
People(ssn,pname,address)
SQL:
SELECT * FROM Students, People;
RA:
Students × People
Another example:
People
ssn pname address
1234545 John 216 Rosse
5423341 Bob 217 Rosse
Students
sid sname gpa
001 John 3.4
002 Bob 1.3
People × Students
ssn pname address sid sname gpa
1234545 John 216 Rosse 001 John 3.4
5423341 Bob 217 Rosse 001 John 3.4
1234545 John 216 Rosse 002 Bob 1.3
5423341 Bob 217 Rosse 002 Bob 1.3
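The cross product in this example can be sketched with `itertools.product`. The function name `cross` and the tuple representation are assumptions for illustration; each output tuple is simply a People tuple concatenated with a Students tuple.

```python
from itertools import product

PEOPLE = {(1234545, "John", "216 Rosse"), (5423341, "Bob", "217 Rosse")}
STUDENTS = {("001", "John", 3.4), ("002", "Bob", 1.3)}


def cross(r1, r2):
    """R1 x R2: pair every tuple of r1 with every tuple of r2."""
    return {t1 + t2 for t1, t2 in product(r1, r2)}


pairs = cross(PEOPLE, STUDENTS)
print(len(pairs))  # |People| * |Students|
```

The output size is the product of the input sizes, which is one reason a bare cross product is rare in practice.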
Renaming (ρ)
• Changes the schema, not the instance
• A ‘special’ operator: neither basic nor derived
• Notation: ρ_{S(B1,…,Bn)}(R)
• Note: this is shorthand for the proper form (since names, not order, matter!):
• ρ_{S(A1→B1,…,An→Bn)}(R)
Students(sid,sname,gpa)
SQL:
SELECT sid AS studId, sname AS name, gpa AS gradePtAvg FROM Students;
RA:
ρ_{Students(studId,name,gradePtAvg)}(Students)
Another example:
Students
sid sname gpa
001 John 3.4
002 Bob 1.3
ρ_{Students(studId,name,gradePtAvg)}(Students)
Students
studId name gradePtAvg
001 John 3.4
002 Bob 1.3
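A short Python sketch makes the "schema, not instance" point concrete. The helper `rename` and its `mapping` argument are hypothetical; the mapping mirrors the explicit A→B form of the notation, and the set of tuples is returned untouched.

```python
STUDENTS_SCHEMA = ("sid", "sname", "gpa")
STUDENTS = {("001", "John", 3.4), ("002", "Bob", 1.3)}


def rename(schema, rows, mapping):
    """rho: rename attributes via an old-name -> new-name mapping; tuples are unchanged."""
    return tuple(mapping.get(a, a) for a in schema), rows


new_schema, new_rows = rename(
    STUDENTS_SCHEMA,
    STUDENTS,
    {"sid": "studId", "sname": "name", "gpa": "gradePtAvg"},
)
print(new_schema)
```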
Natural Join (⋈)
Students(sid,name,gpa)
People(ssn,name,address)
SQL:
SELECT DISTINCT sid, S.name, gpa, ssn, address FROM Students S, People P WHERE S.name = P.name;
RA:
Students ⋈ People
Another example:
Students S
sid S.name gpa
001 John 3.4
002 Bob 1.3
People P
ssn P.name address
1234545 John 216 Rosse
5423341 Bob 217 Rosse
Students ⋈ People
sid S.name gpa ssn address
001 John 3.4 1234545 216 Rosse
002 Bob 1.3 5423341 217 Rosse
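The natural join in this example can be sketched as a nested loop that matches tuples on every shared attribute name and keeps each shared attribute only once. The function `natural_join` is a hypothetical helper for illustration, not how a real engine would execute a join.

```python
def natural_join(s1, r1, s2, r2):
    """R1 |><| R2: match on all shared attribute names, keep shared columns once."""
    shared = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    out = set()
    for t1 in r1:
        d1 = dict(zip(s1, t1))
        for t2 in r2:
            d2 = dict(zip(s2, t2))
            if all(d1[a] == d2[a] for a in shared):
                out.add(t1 + tuple(d2[a] for a in extra))
    return tuple(s1) + tuple(extra), out


STUDENTS = {("001", "John", 3.4), ("002", "Bob", 1.3)}
PEOPLE = {(1234545, "John", "216 Rosse"), (5423341, "Bob", "217 Rosse")}

schema, joined = natural_join(
    ("sid", "name", "gpa"), STUDENTS, ("ssn", "name", "address"), PEOPLE
)
print(schema, sorted(joined))
```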
Example: Converting SFW Query -> RA
Students(sid,sname,gpa)
People(ssn,sname,address)
SELECT DISTINCT gpa, address FROM Students S, People P WHERE gpa > 3.5 AND S.sname = P.sname;
How do we represent this query in RA?
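One possible translation of this query is π_{gpa,address}(σ_{gpa > 3.5}(Students ⋈ People)): the join covers the FROM clause and the equality on sname, the selection covers the remaining WHERE condition, and the projection covers SELECT DISTINCT. The sketch below evaluates that expression in Python; the helper names and the sample rows are assumptions made for illustration.

```python
def natural_join(s1, r1, s2, r2):
    """Join on all shared attribute names, keeping shared columns once."""
    shared = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    out = set()
    for t1 in r1:
        d1 = dict(zip(s1, t1))
        for t2 in r2:
            d2 = dict(zip(s2, t2))
            if all(d1[a] == d2[a] for a in shared):
                out.add(t1 + tuple(d2[a] for a in extra))
    return tuple(s1) + tuple(extra), out


def select(schema, rows, condition):
    return {t for t in rows if condition(dict(zip(schema, t)))}


def project(schema, rows, attrs):
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rows}


# Hypothetical sample data.
STUDENTS = {(1, "Ann", 3.9), (2, "Bob", 3.2)}
PEOPLE = {(111, "Ann", "1 Main St"), (222, "Bob", "2 Oak Ave")}

j_schema, joined = natural_join(
    ("sid", "sname", "gpa"), STUDENTS, ("ssn", "sname", "address"), PEOPLE
)
filtered = select(j_schema, joined, lambda r: r["gpa"] > 3.5)
_, answer = project(j_schema, filtered, ("gpa", "address"))
print(answer)
```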
Advanced Relational Algebra
1. Set Operations in RA
2. Fancier RA
3. Extensions & Limitations
Relational Algebra (RA)
• Five basic operators:
1. Selection: σ
2. Projection: π
3. Cartesian Product: ×
4. Union: ∪
5. Difference: −
• Derived or auxiliary operators:
• Intersection
• Joins (natural, equi-join, theta join, semi-join)
• Renaming: ρ
We’ll now look at union and difference, and also at some of these derived operators.
1. Union (∪) and 2. Difference (−)
• R1 ∪ R2
• Example:
• ActiveEmployees ∪ RetiredEmployees
• R1 − R2
• Example:
• AllEmployees − RetiredEmployees
What about Intersection (∩)?
• It is a derived operator
• R1 ∩ R2 = R1 − (R1 − R2)
• Example
• UnionizedEmployees ∩ RetiredEmployees
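Because relations are sets, union and difference map directly onto Python's set operators, and the derived form of intersection can be checked against the built-in one. The relation contents below are hypothetical single-attribute tuples used only for illustration.

```python
# Two union-compatible relations (same schema: one attribute, name).
r1 = {("alice",), ("bob",), ("carol",)}
r2 = {("bob",), ("carol",), ("dave",)}

union = r1 | r2        # R1 union R2
difference = r1 - r2   # R1 - R2

# Intersection expressed only with difference, as in the identity above.
intersection_derived = r1 - (r1 - r2)

print(sorted(intersection_derived))
```

The inner difference keeps what is only in R1; removing that from R1 leaves exactly the tuples the two relations share.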
Theta Join (⋈_{θ})
• R1 ⋈_{θ} R2 = σ_{θ}(R1 × R2)
Students(sid,sname,gpa)
People(ssn,pname,address)
SQL:
SELECT * FROM Students, People WHERE θ;
RA:
Students ⋈_{θ} People
Note that natural join is a theta join + a projection.
Equi-join (⋈_{A=B})
Students(sid,sname,gpa)
People(ssn,pname,address)
SQL:
SELECT * FROM Students S, People P WHERE sname = pname;
RA:
Students ⋈_{sname=pname} People
Most common join in practice!
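A theta join can be sketched as a selection applied to a cross product, with the equi-join as the special case where the condition is an equality. The helper `theta_join` is hypothetical and assumes the two schemas have no attribute names in common, which holds for Students(sid,sname,gpa) and People(ssn,pname,address).

```python
from itertools import product

S_SCHEMA = ("sid", "sname", "gpa")
P_SCHEMA = ("ssn", "pname", "address")
STUDENTS = {("001", "John", 3.4), ("002", "Bob", 1.3)}
PEOPLE = {(1234545, "John", "216 Rosse"), (5423341, "Bob", "217 Rosse")}


def theta_join(s1, r1, s2, r2, theta):
    """sigma_theta(R1 x R2); assumes s1 and s2 share no attribute names."""
    schema = s1 + s2
    rows = {
        t1 + t2
        for t1, t2 in product(r1, r2)
        if theta(dict(zip(schema, t1 + t2)))
    }
    return schema, rows


# Equi-join: theta is an equality between two attributes.
_, equi = theta_join(
    S_SCHEMA, STUDENTS, P_SCHEMA, PEOPLE, lambda r: r["sname"] == r["pname"]
)

# General theta join: any boolean condition over the combined tuple.
_, theta = theta_join(
    S_SCHEMA, STUDENTS, P_SCHEMA, PEOPLE,
    lambda r: r["sname"] == r["pname"] and r["gpa"] > 3.0,
)
print(len(equi), len(theta))
```

Unlike a natural join, both name columns remain in the output; a projection would be needed to drop one of them.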
RDBMS Architecture
How does a SQL engine work?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
We’ll get a flavor of how to optimize these plans now
Note: We can visualize the plan as a tree
[Figure: plan tree with leaves R(A,B) and S(B,C)]
Bottom-up tree traversal = order of operation execution!
A simple plan
[Figure: plan tree with leaves R(A,B) and S(B,C)]
What SQL query does this correspond to?
Are there any logically equivalent RA expressions?
“Pushing down” projection
[Figure: two plan trees over R(A,B) and S(B,C), before and after pushing the projection down]
Why might we prefer this plan?
Takeaways
• This process is called logical optimization
• The optimizer searches among many equivalent plans for “good plans”
• Relational algebra is an important abstraction.
RA commutators
• The basic commutators:
• Push projection through (1) selection, (2) join
• Push selection through (3) selection, (4) projection, (5) join
• Also: Joins can be re-ordered!
This simple set of tools allows us to greatly improve the execution time of queries by optimizing RA plans!
Translating to RA
R(A,B) S(B,C) T(C,D)
SQL:
SELECT R.A, T.D FROM R, S, T WHERE R.B = S.B AND S.C = T.C AND R.A < 10;
RA:
π_{A,D}(σ_{A<10}(T ⋈ (R ⋈ S)))
Logical Optimization
• Heuristically, we want selections and projections to occur as early as possible in the plan
• Terminology: “pushing down selections” and “pushing down projections.”
• Intuition: intermediate results will contain fewer tuples and fewer attributes, so later operators have less work to do.
Optimizing RA Plan
R(A,B) S(B,C) T(C,D)
SQL:
SELECT R.A, T.D FROM R, S, T WHERE R.B = S.B AND S.C = T.C AND R.A < 10;
• Starting plan: π_{A,D}(σ_{A<10}(T ⋈ (R ⋈ S)))
• Push down selection on A so it occurs earlier: π_{A,D}(T ⋈ (σ_{A<10}(R) ⋈ S))
• Push down projection so it occurs earlier: π_{A,D}(T ⋈ π_{A,C}(σ_{A<10}(R) ⋈ S))
We eliminate B earlier!
In general, when is an attribute not needed…?
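The effect of pushing the selection on A below the join can be sketched by evaluating both plans on synthetic data and comparing the size of the intermediate result of R ⋈ S. The helpers and the generated relations are assumptions made for illustration; the join order (R ⋈ S) ⋈ T is used here, which gives the same tuples as T ⋈ (R ⋈ S) once the result is projected by name.

```python
def natural_join(s1, r1, s2, r2):
    """Join on all shared attribute names, keeping shared columns once."""
    shared = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    out = set()
    for t1 in r1:
        d1 = dict(zip(s1, t1))
        for t2 in r2:
            d2 = dict(zip(s2, t2))
            if all(d1[a] == d2[a] for a in shared):
                out.add(t1 + tuple(d2[a] for a in extra))
    return tuple(s1) + tuple(extra), out


def select(schema, rows, condition):
    return {t for t in rows if condition(dict(zip(schema, t)))}


def project(schema, rows, attrs):
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rows}


# Synthetic data (hypothetical).
R = {(a, a % 5) for a in range(100)}        # R(A, B)
S = {(b, b + 100) for b in range(5)}        # S(B, C)
T = {(c, 2 * c) for c in range(100, 105)}   # T(C, D)

# Plan 1: join everything, then select A < 10, then project.
rs_schema, rs = natural_join(("A", "B"), R, ("B", "C"), S)
rst_schema, rst = natural_join(rs_schema, rs, ("C", "D"), T)
late = select(rst_schema, rst, lambda r: r["A"] < 10)
_, plan1 = project(rst_schema, late, ("A", "D"))

# Plan 2: select on R first, so the joins see fewer tuples.
r_small = select(("A", "B"), R, lambda r: r["A"] < 10)
rs2_schema, rs2 = natural_join(("A", "B"), r_small, ("B", "C"), S)
rst2_schema, rst2 = natural_join(rs2_schema, rs2, ("C", "D"), T)
_, plan2 = project(rst2_schema, rst2, ("A", "D"))

print(plan1 == plan2, len(rs), len(rs2))
```

Both plans return the same answer, but the first join produces 100 intermediate tuples in Plan 1 and only 10 in Plan 2.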
Example (PPLP)
• What PC models have a speed of at least 3.00?
Answer(model) := π_{model}(σ_{speed >= 3.00}(PC))
• Which manufacturers make laptops with a hard disk of at least 100GB?
Answer(maker) := π_{maker}(σ_{hd >= 100}(Product ⋈ Laptop))
Example (PPLP)
• Find the model numbers of all color laser printers.
π_{model}(σ_{color=‘T’ AND type=‘laser’}(Printer))
• Find those manufacturers that sell laptops but not PCs.
π_{maker}(σ_{type=‘laptop’}(Product)) − π_{maker}(σ_{type=‘pc’}(Product))
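The "laptops but not PCs" query is a difference of two projections, which can be sketched directly with Python sets. The Product rows below are hypothetical, and the schema is assumed to be Product(maker, model, type).

```python
# Hypothetical instance of Product(maker, model, type).
PRODUCT = {
    ("A", 1001, "pc"),
    ("A", 2001, "laptop"),
    ("B", 2002, "laptop"),
    ("C", 1002, "pc"),
}

# pi_maker(sigma_type='laptop'(Product))
laptop_makers = {(maker,) for maker, _model, ptype in PRODUCT if ptype == "laptop"}
# pi_maker(sigma_type='pc'(Product))
pc_makers = {(maker,) for maker, _model, ptype in PRODUCT if ptype == "pc"}

answer = laptop_makers - pc_makers
print(answer)
```

Maker A sells both kinds of product, so it appears in both projections and is removed by the difference.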
Example (PPLP)
• Find those hard disk sizes that occur in two or more PCs.
π_{pc1.hd}(σ_{pc1.model <> pc2.model AND pc1.hd = pc2.hd}(ρ_{pc1}(PC) × ρ_{pc2}(PC)))
• Find those pairs of PC models that have both the same speed and RAM. A pair should be listed only once.
π_{pc1.model,pc2.model}(σ_{pc1.model < pc2.model AND pc1.speed = pc2.speed AND pc1.ram = pc2.ram}(ρ_{pc1}(PC) × ρ_{pc2}(PC)))
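Both self-join queries can be sketched by pairing PC with itself, where the two loop variables play the roles of the renamed copies pc1 and pc2. The PC rows are hypothetical, and the schema is assumed to be PC(model, speed, ram, hd).

```python
from itertools import product

# Hypothetical instance of PC(model, speed, ram, hd).
PC = {
    (1001, 2.66, 1024, 250),
    (1002, 2.10, 512, 250),
    (1003, 2.66, 1024, 80),
    (1004, 3.00, 512, 160),
}

# rho_pc1(PC) x rho_pc2(PC): every ordered pair of PC tuples.
pairs = list(product(PC, PC))

# Hard disk sizes shared by two different models.
shared_hd = {
    (pc1[3],) for pc1, pc2 in pairs if pc1[0] != pc2[0] and pc1[3] == pc2[3]
}

# Pairs with the same speed and RAM; model < model lists each pair once.
same_speed_ram = {
    (pc1[0], pc2[0])
    for pc1, pc2 in pairs
    if pc1[0] < pc2[0] and pc1[1] == pc2[1] and pc1[2] == pc2[2]
}
print(shared_hd, same_speed_ram)
```

Using `<` instead of `<>` on the model numbers is what prevents both (1001, 1003) and (1003, 1001) from appearing.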
Example (PPLP)
• Find the manufacturer(s) of the PC or Laptop with the highest available speed.
R1 := π_{model,speed}(σ_{speed >= 2.0}(PC)) ∪ π_{model,speed}(σ_{speed >= 2.0}(Laptop))
R2 := Product ⋈ R1
π_{maker,model}(R2) − π_{pdpl1.maker,pdpl1.model}(σ_{pdpl1.speed < pdpl2.speed}(ρ_{pdpl1}(R2) × ρ_{pdpl2}(R2)))
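Finding a maximum without an aggregate operator follows a common pattern: collect every candidate, subtract the candidates that are slower than some other candidate, and what remains is the fastest. The sketch below is one reading of that pattern in Python; the relation contents and the simplified schemas PC(model, speed) and Laptop(model, speed) are assumptions made for illustration.

```python
# Hypothetical instances; PC and Laptop are simplified to (model, speed).
PRODUCT = {
    ("A", 1001, "pc"),
    ("B", 1002, "pc"),
    ("B", 2001, "laptop"),
    ("C", 2002, "laptop"),
    ("C", 3001, "printer"),
}
PC = {(1001, 2.66), (1002, 3.20)}
LAPTOP = {(2001, 2.00), (2002, 3.20)}

# R1: union of the (model, speed) projections, with the speed >= 2.0 filter.
r1 = {t for t in PC if t[1] >= 2.0} | {t for t in LAPTOP if t[1] >= 2.0}

# R2: Product joined with R1 on model, kept as (maker, model, speed).
r2 = {
    (maker, model, speed)
    for maker, model, _type in PRODUCT
    for m, speed in r1
    if m == model
}

# Candidates that are slower than at least one other candidate.
not_fastest = {
    (mk1, mo1) for mk1, mo1, s1 in r2 for _mk2, _mo2, s2 in r2 if s1 < s2
}

fastest = {(mk, mo) for mk, mo, _ in r2} - not_fastest
print(sorted(fastest))
```

Ties are handled naturally: two models with the same top speed are never slower than anything, so both survive the difference.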