20
Fall 2002 CSE330/CIS550 Handout 2 1 The Relational Model: Relational Calculus

Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Embed Size (px)

Citation preview

Page 1: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 1

The Relational Model: Relational Calculus

Page 2: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 2

Relational Calculus

• First-order logic (FOL) can also be thought of as a query language, and can be used in two ways:– Tuple relational calculus– Domain relational calculus

• The difference is the level at which variables are used: for attributes (domains) or for tuples.

• The calculus is non-procedural (declarative) as compared to the algebra.

Page 3: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 3

Domain relational calculus

• Queries have form: {<x1,x2, …, xn>|p} where x1,x2, …, xn are domain variables and p is a predicate which may mention the variables x1,x2, …, xn

• Example: simple projection {<RN,H>|RI,G,R. <RI,RN,G,R,H>Routes}• Example: selection and projection: {<RN,H>|RI,G,R. <RI,RN,G,R,H>Routes

G >15}

Page 4: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 4

DRC examples, cont

• Join: {<CI,R>|RI,RN,G,H,RI’,Da,Du.

<RI,RN,G,R,H>Routes <CI,RI’,Da,Du>Climbs RI=RI’}

We could also have written the above as: {<CI,R>|RI,RN,G,H,Da,Du.

<RI,RN,G,R,H>Routes <CI,RI,Da,Du>Climbs}

Page 5: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 5

Predicate Logic - a quick review

• The syntax of predicate logic starts with variables, constants and predicates that can be built using a collection of boolean-valued operators (boolean expressions)

• Examples: 1=2, x y, prime(x), contains(t,”Joe”). Precisely what operations are available depends on the domain and on the query language.

• For now we will assume the following boolean expressions:– <X,Y,…> Rel, X op Y, X op constant, or constant op

X, where op is , , , , , and X,Y,… are domain variables

Page 6: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 6

Predicate Logic, cont.• Starting with these basic predicates (also called

atomic) , we can build up new predicates by the following rules:– Logical connectives: If p and q are predicates, then so

are pq, pq, p, and pq• (x>2) (x<4)• (x>2) (x>0)

– Existential quantification: If p is a predicate, then so is x.p x. (x>2) (x<4)

– Universal quantification: If p is a predicate, then so is x.p x.x>2 x. y.y>x

Page 7: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 7

Logical Equivalences

• There are two logical equivalences that will be heavily used:– pq p q (Whenever p is true, q must

also be true.) x. p(x) x. p(x) (p is true for all x)

• The second will be especially important when we study SQL.

Page 8: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 8

Free and bound variables

• A variable v is bound in a predicate p when p is of the form v… or v…

• A variable occurs free in p if it occurs in a position where it is not bound by an enclosing or

• Examples: – x is free in x>2– x is bound in x.x>y– x is free in (x>17) (x.x>2)

• Note that there are two occurrences of x in the last example.

Page 9: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 9

Renaming variables

• When a variable is bound one can replace it with some other variable without altering the meaning of the expression, providing there are no name clashes

• Example: x.x>2 is equivalent to y.y>2

Page 10: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 10

Some queries…

• Try the following examples:– The names and ages of climbers– The names and ages of climbers who have

climbed route 214– The names of climbers who have climbed

“Last Tango”– The names of climbers who have climbed all

routes with rating greater than 15– The names of climbers who have climbed

the same route twice

Page 11: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 11

Safety• There is a problem with what we have done so far. How

should we treat a query like: {<CI,CN,S,A>| <CI,CN,S,A> Climbers>}• This presumably means the set of all four-tuples that are

not climbers, which is presumably an infinite set (and unsafe query).

• A query is safe if no matter how we instantiate the relations, it always produces a finite answer.

• In particular, the query should be domain independent, meaning that the answer is the same regardless of the domain in which it is evaluated.

• Unfortunately, both this definition of safety and domain independence are semantic conditions, and are undecidable.

Page 12: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 12

Syntactic Safety

• There are syntactic conditions that are used to define “safe” formulas. In particular:– Every “safe” formula is domain independent.– It is implementable.– The formulas that are expressible in real query

languages based on relational calculus are all “safe”

• The definition is complicated, and is not in the text book. It can be found in Ullman’s book on databases (Principles of Database and Knowledge-Base Systems).

Page 13: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 13

Translating from RA to DRC

• Recall that the relational algebra consists of , , , x, -. We need to work our way through the structure of an RA expression, translating each possible form.

• Let TR[e] be the translation of RA expression e into DRC.

• Relation names: For the RA expression R, the DRC expression is

{<x1,x2, …, xn>| <x1,x2, …, xn> R}

Page 14: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 14

Selection• Suppose the RA expression is c(e’), where e’ is

another RA expression with TR[e’]= {<x1,x2, …, xn>| p}

Then the translation of c(e’) is

{<x1,x2, …, xn>| pC’}, where C’ is the condition obtained from C by replacing each attribute with the corresponding variable.

• Example: TR[#1=#2 #4>2.5R] (where R has arity 4) is

{<x1,x2, x3, x4>|< x1,x2, x3, x4> R x1=x2 x4>2.5}

Page 15: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 15

Projection

• If TR[e]= {<x1,x2, …, xn>| p} then

TR[i1,i2,…,im(e)]=

{<x i1,x i2

, …, x im >| xj1,xj2

, …, xjk.p},

where xj1,xj2

, …, xjk are variables in x1,x2, …,

xn that are not in x i1,x i2

, …, x im

• Example: With R as before, #1,#3 (R)={<x1,x3>| x2,x4. <x1,x2, x3,x4> R}

Page 16: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 16

Union• We know that R and S in RS must be union

compatible, so they must have the same arity. Therefore we can assume that for e1e2, where e1, e2are algebra expressions, TR[e1]={<x1,…,xn>|p} and TR[e2]={<y1,…yn>|q}. Relabel the variables in the second so that TR[e2]={< x1,…,xn>|q’}. This may involve relabeling bound variables in q to avoid clashes. Then TR[e1e2]={<x1,…,xn>|pq’}.

• Example: TR[RS]={< x1,x2, x3,x4>| <x1,x2, x3,x4>R <x1,x2,

x3,x4>S

Page 17: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 17

Other binary operators

• Difference: The same conditions hold as for union. So TR[e1]={<x1,…,xn>|p} and TR[e2]={< x1,…,xn>|q}. Then TR[e1- e2]= {<x1,…,xn>|pq}

• Product: If TR[e1]={<x1,…,xn>|p} and TR[e2]={< y1,…,ym>|q}, then TR[e1 e2]= {<x1,…,xn, y1,…,ym >| pq}

• Example: TR[RS]= {<x1,…,xn, y1,…,ym >| <x1,…,xn> R <y1,…,ym > S }

Page 18: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 18

Summary• We’ve seen how to translate relational algebra

into (domain) relational calculus.• There are various syntactic restrictions for

guaranteeing the safety of a DRC query. From any of these we can translate back into relational algebra

• It was this correspondence between an (implementable and optimizable) algebra and first-order logic that was responsible for the initial development of relational databases – a prime example of some theory leading to highly successful practical developments!

Page 19: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 19

What we cannot compute with relational algebra

• Aggregate operations, e.g. “The number of climbers who have climbed ‘Last Tango’” or “The average age of climbers.” These are possible in SQL.

• Recursive queries. Given a relation Parent(Parent, Child) compute the ancestor relation. This appears to call for an arbitrary number of joins. It is known that it cannot be expressed in first-order logic, hence it cannot be expressed in relational algebra.

Page 20: Fall 2002CSE330/CIS550 Handout 21 The Relational Model: Relational Calculus

Fall 2002 CSE330/CIS550 Handout 2 20

What we cannot compute with relational algebra, cont

• Computing with complex structures that are not (1NF) relations, e.g. lists, arrays, multisets.

• Of course, we can always compute such things if we can “talk to” a database from a full-blown (Turing complete) programming language, and we’ll see how to do this later. However, communicating with a database in this way may well be inefficient, and adding computational power to a query language remains an important research topic.