Chapter 3:
The Relational Data Model and Relational Databases
2
The Relational Model of Data
[Figure: a relation between the sets A = {a, b, c, d, e} and B = {l, m, n, s, t}]
A relation between sets A and B
A subset of A × B
3
The basis of the model is the concept of relation, as found in mathematics: in set theory and mathematical logic, in particular in predicate logic
A model is given in terms of relations between elements of a domain
A relational schema contains the basic elements of a relational data model
The schema is application dependent
4
A relational schema S contains:
The data domain D, which is a possibly infinite set
A finite collection of relations (relation names) R1, ..., Rn over D of finite and fixed arity
That is, for each relation name R ∈ S, its potential extensions will be subsets of D^k = D × ··· × D (k times) for some natural number k that depends on R
R(·, ..., ·) with k arguments
A k-ary relation can be seen as a table with k columns
[Figure: a table for R with k columns]
5
A finite collection of attributes (attribute names) A1, ..., Am
They are associated to the different relations to denote their arguments, or “columns”
They can be identified with/by unary relations (unary predicates, properties) over D
That is, they can be identified with subsets (sub-domains) of the domain D
[Figure: a table for R whose columns are labeled A, ..., C]
A, ..., C are attributes of relation R
6
Example: Schema S with
Domain D = {john, peter, mary, ..., 1, 2, 3, 4, ...}
Binary relation People(·, ·)
Attributes for People, in this order: Name, Age
People
Name  Age
·     ·
No contents or extensions so far; the schema describes the structure of the model
The schemas are domain/application dependent
Attributes can be seen as (yet to be determined) subsets of the domain
7
Two attributes with different names can later have the same extensions (and still be different; that is why treating them as functions is more precise)
Example: Schema with domain D = {john, peter, mary, ...} and relation
Manager
Boss  Subordinate
·     ·
Schemas can be filled with data in many different ways
A database instance D compatible with a given schema S is a collection of finite extensions for the relation names in the schema
8
Example: For the schema S with
Domain D = {john, peter, mary, ken, carol, steve, ..., 1, 2, 3, 4, ...}
Binary relations People(·, ·), Manager(·, ·)
Attributes for People, in this order: Name, Age
Attributes for Manager, in this order: Boss, Subordinate
This is an instance compatible with the schema:
D1:
People
Name  Age
john  35
mary  25
ken   40

Manager
Boss  Subordinate
ken   john
john  mary
9
This is another compatible instance:
D2:
People
Name   Age
mary   35
mary   25
peter  40

Manager
Boss   Subordinate
ken    steve
carol  steve
john   mary

The sub-domains for the attributes Boss and Subordinate are the same, namely the subset {john, peter, mary, ken, carol, steve, ...} of the database domain D
10
Example: (different notations for the same) The table
Accounts relation
Account#  Name     Balance
12345     Raoul    400,00
34567     Rupert   354,60
12338     Rumilde  1234,30
34561     Sulema   34445,23
is an instance of a relation between the attributes Account#, Name, and Balance
Each attribute has an associated (sub)domain
Here, the account number 12345, the name Raoul and the numerical value 400,00 are mutually related through the relation
The schema of the relation is: Accounts(Account#, Name, Balance)
11
Bank Example: Some abbreviations
clientn = client name
cladd = client address
clneigh = client neighborhood
branch = branch name
acc# = account number
Schema:
Deposit(branch,acc#,clientn,balance),
Client(clientn,cladd,clneigh)
12
An instance
Deposit
branch     acc#  clientn  balance
Carleton   101   Jim      500
Downtown   215   Sandy    700
Barrhaven  304   Alvin    1300

Client
clientn    cladd           clneigh
Jim        101 Queensbury  Barrhaven
Sandy      40 Stone        Nepean
Hernandez  15 Laurier      Downtown
Alvin      17 Clyde        Altavista
John       89 Case         Centrepoint
13
What is a right schema?
What about this one? A single universal relation
Bank(branch,acc#,clientn,balance,cladd,clneigh)
It depends on the application and other practical, DB-oriented issues
If a client has several accounts, there is redundancy of information
This DB becomes unnecessarily large, and inconsistencies become more likely to occur
If a client has an account, but no address, we have to use more null values than desired
Null values are not easy to handle
14
We will come back to design issues later on ...
For the moment, this one seems to be a better schema:
Deposit(branch,acc#,clientn,balance)
Client(clientn,cladd,clneigh)
Consider the relation Deposit
We have 4 (sub)domains, D1, D2, D3, D4, one for each of its 4 attributes, where they take values (branch names, account numbers, client names, balances)
Any row in the table (extension of the relation) is a 4-tuple (v1, v2, v3, v4) with
v1 ∈ D1, v2 ∈ D2, v3 ∈ D3, v4 ∈ D4
15
That is
Deposit
branch     acc#  clientn  balance
Carleton   101   Jim      500
Downtown   215   Sandy    700
Barrhaven  304   Alvin    1300
is a subset of D1 × D2 × D3 × D4
Any instance of the relation Deposit will be a subset of D1 × D2 × D3 × D4
We use relation and table as synonyms, and likewise tuple and row
If t is a tuple, and R is a relation (extension), then:
16
We can say that t ∈ R if the tuple belongs to the relation R (relation extensions are sets)
Let A be the name of an attribute in the nth column of relation R
If t ∈ R, then t[n] and t[A] denote the value of the attribute A in the tuple t
For example, if t denotes the first tuple in the table Deposit, then t[2] = t[acc#] = 101
Useful notation: Since the same attribute may appear in different tables, we distinguish the occurrences of the attribute by using the relation name followed by “.” as a prefix, e.g.
Deposit.acc# Deposit.clientn Client.clientn
17
Queries
For the instance on page 12, give the addresses, together with the balances, of the clients who have a balance higher than 600
Answer:
40 Stone  700
17 Clyde  1300
The answer is a set of tuples, a new relation (extension)
We can say that a query is a mapping that sends DB instances to new DB instances (possibly with a different schema)
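The idea of a query as a mapping from instances to new relations can be sketched in Python. This is only an illustration: relations are represented as sets of tuples, the data is the instance on page 12, and the function name `addresses_with_balance_over` is a hypothetical helper, not part of the notes.

```python
# Instance from page 12; each relation is a set of tuples.
deposit = {
    ("Carleton", 101, "Jim", 500),
    ("Downtown", 215, "Sandy", 700),
    ("Barrhaven", 304, "Alvin", 1300),
}
client = {
    ("Jim", "101 Queensbury", "Barrhaven"),
    ("Sandy", "40 Stone", "Nepean"),
    ("Hernandez", "15 Laurier", "Downtown"),
    ("Alvin", "17 Clyde", "Altavista"),
    ("John", "89 Case", "Centrepoint"),
}

def addresses_with_balance_over(deposit, client, threshold):
    """Map a DB instance to a new relation with schema (cladd, balance)."""
    return {
        (cladd, balance)
        for (_branch, _acc, d_name, balance) in deposit
        for (c_name, cladd, _neigh) in client
        if d_name == c_name and balance > threshold
    }

answer = addresses_with_balance_over(deposit, client, 600)
```

The result is again a set of tuples, but over a schema (address, balance) that does not occur in the original database.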
18
Several issues:
How to specify a query?
How to write it?
In what language?
What is the precise meaning of a query?
How to compute the answer?
There are several query languages for RDBs
Some more used in practice than others
But those of a more theoretical nature are the basis for the ones most used in practice
19
The distinction between declarative and procedural query languages is always relevant
The former express what the user wants to obtain from the database, the latter express a particular way to compute the answer
20
Relational Algebra as a Query Language
Idea: Relations are sets (subsets of cartesian products) constructed on top of other sets (domain or subdomains)
Query answers are new relations
Thus, in order to obtain new relations (e.g. query answers), do set-theoretic algebra on existing relations
Operate on sets and relations in order to obtain new sets or relations
21
The Relational Algebra (RA)
Provides algebraic operations over relations that produce new relations
Operations based on set-theoretic operations
Some of those operations come directly from set theory
Others are specific, ad hoc, for the RA
The latter are applicable to relations (as opposed to sets in general)
Provides a procedural query language for RDBs (because it is based on explicit operations)
The RA is one of the strengths of the relational model
RA can be used to give a precise, set-theoretic semantics to other query languages
22
Queries in RA:
It is possible to answer the query by applying a sequence of algebraic (relational) operations starting from the original database instance
Even if the RDBMS offers a different query language, e.g. a declarative one, a query will be compiled into a sequence of algebraic operations on the DB
23
Summary of basic operations of RA:
Union and Intersection: R1 ∪ R2, R1 ∩ R2
Can be applied to similar relations, i.e. same arity (and data types), as normal sets
Difference: R1 ∖ R2
Again, for similar relations, as normal sets
Product: R1 × R2
This is essentially the cartesian product of two relations taken as normal sets
E.g. for R = {(a, b), (c, d)}, S = {(1, 2), (2, 3)}
R × S = {(a, b, 1, 2), (a, b, 2, 3), (c, d, 1, 2), (c, d, 2, 3)}
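A minimal Python sketch of these operations, assuming relations are represented as sets of tuples. R and S are the relations of the example; T is a hypothetical extra relation of the same arity as R, introduced only so that union, intersection and difference have something to work on.

```python
from itertools import product

R = {("a", "b"), ("c", "d")}
S = {(1, 2), (2, 3)}
T = {("a", "b"), ("e", "f")}  # hypothetical relation, same arity as R

union = R | T          # R ∪ T
intersection = R & T   # R ∩ T
difference = R - T     # R ∖ T

# Product: concatenate every tuple of R with every tuple of S.
r_times_s = {r + s for r, s in product(R, S)}
```

Note that the product tuples are flat 4-tuples, as in the example, rather than pairs of pairs.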
24
[Figure: diagrams of two relations R1 and R2 as regions of D × D, illustrating R1 ∪ R2 and R1 ∖ R2]
25
Projection: Π_A R(···, A, ···), i.e. the projection of relation R on attribute A
[Figure: a relation R drawn over the axes A and B, with its projection Π_A R on the A axis]
Here, A is one of the attributes of R
The projection could be on several attributes of R
This is a unary operation: it takes one relation as input (the previous ones are binary)
This is an operation special for relations
It deletes, ignores, projects out entire “columns” from a relation
Projects R over one (or several) “coordinates” (attributes)
26
It generates a new relation, with a subset of the attributes (columns)
Its logical counterpart is the existential quantification
For the relation in the figure:
Π_A R(A, B) = {a ∈ A | there exists b ∈ B such that (a, b) ∈ R}
27
Selection: σ_<condition>(R)
Unary operation, special for relations
Selects the tuples of the relation R that satisfy the condition
The condition can be expressed in a (limited) logical language
It generates a new relation, with the same attributes, but possibly fewer tuples (rows)
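The two unary operations, projection and selection, can be sketched as follows. This is an illustration under assumptions: a relation is a tuple of attribute names plus a set of rows, the helper names `project` and `select` are hypothetical, and the instance R(A, B) is made up.

```python
def project(attrs, rows, keep):
    """Π_keep: keep only the listed columns; the set removes duplicates."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}

def select(attrs, rows, condition):
    """σ_condition: keep the tuples t for which condition(t) holds."""
    return {row for row in rows if condition(dict(zip(attrs, row)))}

attrs = ("A", "B")
R = {(1, "x"), (1, "y"), (2, "x")}  # hypothetical instance of R(A, B)

proj_a = project(attrs, R, ["A"])                   # fewer columns
sel_x = select(attrs, R, lambda t: t["B"] == "x")   # same columns, fewer rows
```

The projection mirrors the existential reading: a value a is in `proj_a` exactly when some b exists with (a, b) in R.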
28
Join: R1 ⋈ R2
A binary operator, essential in RA
It allows us to compose two relations through the values in common taken by a distinguished attribute that is shared by the two relations (or by two different attributes with the same data type or domain)
Similar to the operation of composition of two relations as seen in set theory: R ◦ S
It is essential to combine tables in a natural way, without appealing to the possibly large and computationally expensive product of them
There are generalizations of this basic, natural join
29
Notice: There is no (set-theoretic) complement operation in RA (as found in set theory)
[Figure: a relation R inside D × D; what is its complement R^c?]
In principle, there could be, given that relations are sets, but
What is the “meaning” of the complement of a relation?
Actually, it could be infinite, because the DB domain is possibly infinite
The difference (∖) is only a relative complement, relative to a given relation
30
Deposit
branch     acc#  clientn  balance
Carleton   101   Jim      500
Downtown   215   Sandy    700
Barrhaven  304   Alvin    1300
Which could be the tuples that are not in Deposit?
Which of those make sense?
31
Examples:
Union: Two relations with the same schema
WINE1
W#   GRAPE     VINTAGE  PERCENTAGE
100  Volnay    1978     12.5
110  Chablis   1979     12.0
120  Sancerre  1980     12.5
130  Tokay     1980     12.5

∪

WINE2
W#   GRAPE   VINTAGE  PERCENTAGE
130  Tokay   1980     12.5
140  Chenas  1981     12.7
150  Volnay  1978     12.5
32
WINE3
W#   GRAPE     VINTAGE  PERCENTAGE
100  Volnay    1978     12.5
110  Chablis   1979     12.0
120  Sancerre  1980     12.5
130  Tokay     1980     12.5
140  Chenas    1981     12.7
150  Volnay    1978     12.5

Similarly, there is the intersection ∩ of the two relations:
WINE4
W#   GRAPE  VINTAGE  PERCENTAGE
130  Tokay  1980     12.5
33
Difference: Two relations with the same schema
WINE1
W#   GRAPE     VINTAGE  PERCENTAGE
100  Volnay    1978     12.5
110  Chablis   1979     12.0
120  Sancerre  1980     12.5
130  Tokay     1980     12.5

∖

WINE2
W#   GRAPE   VINTAGE  PERCENTAGE
130  Tokay   1980     12.5
140  Chenas  1981     12.7
150  Volnay  1978     12.5
34
WINE4
W#   GRAPE     VINTAGE  PERCENTAGE
100  Volnay    1978     12.5
110  Chablis   1979     12.0
120  Sancerre  1980     12.5

It should be clear from these examples that the complement of a table does not make much sense ...
35
Product: Two relations, not necessarily with the same schema
GRAPE
GRAPE        AREA        COUNTRY
Chenas       Beaujolais  France
Volnay       Bourgogne   France
Chanturgues  Auvergne    France

×

YEAR
VINTAGE  QUALITY
1979     Good
1980     Average
36
G/Y
GRAPE        AREA        COUNTRY  VINTAGE  QUALITY
Chenas       Beaujolais  France   1979     Good
Chenas       Beaujolais  France   1980     Average
Volnay       Bourgogne   France   1979     Good
Volnay       Bourgogne   France   1980     Average
Chanturgues  Auvergne    France   1979     Good
Chanturgues  Auvergne    France   1980     Average

A huge table; maybe many of the combinations do not make much sense
The product is an expensive operation we may want to avoid, or apply only after we have reached smaller tables using other operations ...
Usually it makes more sense from the application point of view to combine tables via a join
37
Join:
First the natural Join (there are more general ones)
Essential binary operator of RA
Relations are composed via the values in common taken by attributes in common (or of a similar data type)
38
WINE
W#   GRAPE    VINTAGE  QUALITY
100  Chenas   1977     Good
200  Chenas   1980     Excellent
300  Chablis  1977     Good
400  Chablis  1978     Bad
500  Volnay   1980     Average

⋈

LOCATION
GRAPE    AREA        AVG-QUALITY
Chenas   Beaujolais  Good
Chablis  Bourgogne   Average
Chablis  California  Bad
39
W/L
W#   GRAPE    VINTAGE  QUALITY    AREA        AVG-QUAL
100  Chenas   1977     Good       Beaujolais  Good
200  Chenas   1980     Excellent  Beaujolais  Good
300  Chablis  1977     Good       Bourgogne   Average
300  Chablis  1977     Good       California  Bad
400  Chablis  1978     Bad        Bourgogne   Average
400  Chablis  1978     Bad        California  Bad

This is a common but expensive operation in RDBs
In general one applies it once tables have been reduced using other operations
The intersection, difference, selection and projection all reduce relations
In (syntactic) query optimization, sequences of operations are rearranged to make the whole evaluation less expensive
40
Join operations can be more general than this basic one; actually, joins can be performed considering more complex “join conditions”
Here, in formal terms, the simple condition can be specified with the join:
WINE ⋈_{WINE.GRAPE=LOCATION.GRAPE} LOCATION
The join is performed under the condition that the values in the GRAPE attribute in the two tables coincide
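The natural join of WINE and LOCATION from pages 38 and 39 can be reproduced with a small nested-loop sketch. The representation (attribute-name tuple plus set of rows) and the helper name `natural_join` are assumptions made for illustration; a real DBMS would use far more efficient join algorithms.

```python
def natural_join(attrs1, rows1, attrs2, rows2):
    """Combine tuples that agree on all attributes the two relations share."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out_rows = set()
    for r1 in rows1:
        t1 = dict(zip(attrs1, r1))
        for r2 in rows2:
            t2 = dict(zip(attrs2, r2))
            if all(t1[a] == t2[a] for a in common):
                # Shared attributes appear once in the result.
                out_rows.add(r1 + tuple(t2[a] for a in extra))
    return tuple(attrs1) + tuple(extra), out_rows

wine_attrs = ("W#", "GRAPE", "VINTAGE", "QUALITY")
wine = {
    (100, "Chenas", 1977, "Good"),
    (200, "Chenas", 1980, "Excellent"),
    (300, "Chablis", 1977, "Good"),
    (400, "Chablis", 1978, "Bad"),
    (500, "Volnay", 1980, "Average"),
}
loc_attrs = ("GRAPE", "AREA", "AVG-QUALITY")
location = {
    ("Chenas", "Beaujolais", "Good"),
    ("Chablis", "Bourgogne", "Average"),
    ("Chablis", "California", "Bad"),
}

wl_attrs, wl = natural_join(wine_attrs, wine, loc_attrs, location)
```

Wine 500 (Volnay) has no matching LOCATION tuple, so it does not appear in the result.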
41
Projection:
WINE
W#   GRAPE    VINTAGE  PERCENTAGE  QUALITY
100  Volnay   1979     12.7        Good
110  Chablis  1980     11.8        Average
120  Tokay    1981     12.1        Excellent
130  Chenas   1979     12.0        Good
140  Volnay   1980     11.9        Average

Π_{VINTAGE,QUALITY}

YEAR
VINTAGE  QUALITY
1979     Good
1980     Average
1981     Excellent

A unary operator
Giving a new name to the result is not part of the operation (but helps)
42
A tuple t is in the result (the projection) iff there is a tuple t′ in the original relation that, restricted to the attributes indicated in Π, gives t:
t′[VINTAGE, QUALITY] = t
In other words:
(1979, Good) ∈ Π_{VINTAGE,QUALITY}(WINE)
because there exist values, say x, y, z, for attributes W#, GRAPE, PERCENTAGE (we do not care which ones) such that the tuple (x, y, 1979, z, Good) belongs to relation WINE
43
Selection:
WINE
W#   GRAPE    VINTAGE  PERCENTAGE  QUALITY
100  Volnay   1979     12.7        Good
110  Chablis  1980     11.8        Average
120  Tokay    1981     12.1        Excellent
130  Chenas   1979     12.0        Good
140  Volnay   1980     11.9        Average

σ_{QUALITY=Good}

GOOD-WINE
W#   GRAPE   VINTAGE  PERCENTAGE  QUALITY
100  Volnay  1979     12.7        Good
130  Chenas  1979     12.0        Good
Here the condition is very simple
44
It is possible to express more complex selection conditions using a more expressive language that may use
Attribute names
Logical, boolean (propositional) operations (AND, OR, NOT)
Built-in relations (=, <, ≤, >, ≥, ≠) applied to attribute names and domain elements
Built-in relations have a fixed semantics, and fixed and possibly infinite extensions
As opposed to relations in the schema, which have variable extensions depending on the application and the state of the DB
E.g. the < built-in relation on the data type integer has an infinite, fixed extension that the DBMS can simply use
45
<
Smaller  Bigger
0        1
0        2
···      ···
1000     1500
···      ···

≠
String  String
john    peter
peter   mary
···     ···
mary    john
···     ···

So a selection could be
σ_{VINTAGE>1980 OR QUALITY=Good}(WINE)
This boolean language for expressing conditions can be used to extend the join operations with conditions, ⋈_<condition>, just as we can express selections with complex conditions, σ_<condition>
WINE ⋈_{W.GRAPE=L.GRAPE AND QUALITY=AVG-QUALITY} LOCATION
46
Queries Expressed in RA
A query can be expressed as a sequence of operations of RA applied to the original tables and/or intermediate results
Example: Consider the schemas
DRINKER DRINKER# SURNAME FNAME TYPE
DRINKS DRINKER# WINE# DATE QUANTITY
WINE WINE# GRAPE VINTAGE PERCENTAGE
47
Query 1: Obtain the percentages of alcohol in the wines of grape Morgon, vintage 1979
Answer 1:
R1 := σ_{GRAPE=Morgon}(WINE)
R2 := σ_{VINTAGE=1979}(WINE)
R3 := R1 ∩ R2
ANS := Π_{PERCENTAGE}(R3)
Answer 2: (same values)
ANS = Π_{PERCENTAGE}(σ_{GRAPE=Morgon AND VINTAGE=1979}(WINE))
Notice the correspondence between the set-theoretic and logical operations ...
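The correspondence between ∩ and AND can be checked on a small example. The WINE instance below is hypothetical (the notes give no instance for this schema); the two computations follow Answer 1 and Answer 2 step by step.

```python
# Hypothetical instance of WINE(WINE#, GRAPE, VINTAGE, PERCENTAGE).
wine = {
    (1, "Morgon", 1979, 12.5),
    (2, "Morgon", 1980, 12.0),
    (3, "Chenas", 1979, 12.7),
}

# Answer 1: two selections, an intersection, then a projection.
r1 = {t for t in wine if t[1] == "Morgon"}
r2 = {t for t in wine if t[2] == 1979}
r3 = r1 & r2
ans1 = {(t[3],) for t in r3}

# Answer 2: one selection with a conjunctive condition, then a projection.
ans2 = {(t[3],) for t in wine if t[1] == "Morgon" and t[2] == 1979}
```

Intersecting the two selections on the same relation gives the same tuples as a single selection with the conjunction of the two conditions.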
48
Query 2: Obtain last and first names of drinkers of Morgon or Chenas
Now we need to combine the three original tables
R1 := σ_{GRAPE=Morgon}(WINE)
R2 := σ_{GRAPE=Chenas}(WINE)
R3 := R1 ∪ R2
R4 := R3 ⋈_{WINE#} DRINKS (R3 is smaller than WINE)
R5 := R4 ⋈_{DRINKER#} DRINKER
ANS := Π_{SURNAME,FNAME}(R5)
Notice that we selected before the join, which makes the join smaller
The other way around would have been semantically the same
49
Query 3: Obtain last and first names of drinkers who have tried in one day more than 10 samples of Chablis, vintage 1976, together with the percentage of alcohol of the wine
R1 := σ_{QUANTITY>10}(DRINKS)
R2 := σ_{GRAPE=Chablis}(WINE)
R3 := σ_{VINTAGE=1976}(WINE)
R4 := R2 ∩ R3
R5 := R1 ⋈_{WINE#} R4
R6 := Π_{DRINKER#,PERCENTAGE}(R5)
R7 := R6 ⋈_{DRINKER#} DRINKER
ANS = Π_{SURNAME,FNAME,PERCENTAGE}(R7)
50
Warning!:
RA is based on set-theoretic operations, i.e. operations that take and produce sets
In consequence, “duplicates” (multiple occurrences of the same tuple) do not appear anywhere
It is possible to extend these operations to “multi-sets” that may have duplicates (we do not do this for the moment though)
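The difference between set and multi-set semantics shows up as soon as a projection drops the columns that made tuples distinct. A small sketch, using the WINE instance of page 41 and Python's `collections.Counter` as one possible (assumed) representation of a multi-set:

```python
from collections import Counter

# WINE instance from page 41: (W#, GRAPE, VINTAGE, PERCENTAGE, QUALITY).
wine = [
    (100, "Volnay", 1979, 12.7, "Good"),
    (110, "Chablis", 1980, 11.8, "Average"),
    (120, "Tokay", 1981, 12.1, "Excellent"),
    (130, "Chenas", 1979, 12.0, "Good"),
    (140, "Volnay", 1980, 11.9, "Average"),
]

# Set semantics (RA as presented here): duplicates collapse.
as_set = {(t[2], t[4]) for t in wine}

# Multi-set semantics: each tuple keeps a multiplicity.
as_multiset = Counter((t[2], t[4]) for t in wine)
```

Under set semantics the projection on VINTAGE, QUALITY has three tuples; as a multi-set it has five occurrences, with (1979, Good) and (1980, Average) counted twice.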
Exercise: Illustrate the computations of queries 1-3 using concrete initial instances and producing all the intermediate relations that lead to the final answer
51
Exercise: Assume we have the following schema
Frequents(Drinker,Bar) Serves(Bar,Beer)
Likes(Drinker,Beer)
Express in RA the following queries:
1. Which bars serve the beer John likes?
2. Which drinkers frequent at least one bar that serves some beer they like?
3. Which drinkers frequent only bars that serve at least one beer they like?
4. Which drinkers do not frequent any bar that serves some beer they like?
52
Remarks, Extensions, Limitations
RA provides a procedural language for querying RDBs
We presented the most common relational operations, but there are others (cf. the textbook)
There may be many different RA expressions (formulas) that can be used to compute the same query answer
Which one to use depends on efficiency issues
Queries (computations thereof) can be optimized by rearranging them into semantically equivalent RA formulas (i.e. with the same meaning)
Space is always an issue; DBs can be very large and computations take place in main memory
53
RDBMSs have built-in query optimizers that take care of optimizing the query
The notion of “semantic equivalence” of queries, in particular of relational expressions, is well-defined and precise: two RA queries are equivalent if for every instance they produce the same result (i.e. the same query instance)
Just like when we say that the (numerical) algebraic expressions x + y + 0 and x(y + 1) − xy + y − 1 + 1 are equivalent: for any values for x, y the result is the same
A strength of RA: the semantics of the language is clear, precise,formal and well-studied
It is grounded on set theory and predicate logic
54
There is a purely “logical counterpart” to the RA
The relational calculus is a declarative query language that is based directly on predicate logic (we saw informal examples before)
Relational algebra and relational calculus are equivalent in terms of the queries they can express (more on this later)
It is possible to define new operations for the RA using algebraic expressions that use the already defined operations (and nothing more)
55
Example: We could define the “symmetric difference” of two similar relations
R1 ∆ R2 := (R1 ∖ R2) ∪ (R2 ∖ R1)
[Figure: diagram over D × D of two relations R1 and R2, illustrating R1 ∆ R2]
Notice that the new operation (∆) on the LHS is being defined by means of a fixed algebraic formula that uses already defined operations (∖, ∪)
A single formula or definition that can be applied to any instance, i.e. the definition is independent from the instance to which it is applied
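The definition can be written once, as a function built only from difference and union, and then applied to any instance. A minimal sketch, assuming relations as sets of tuples and using the WINE1 and WINE2 instances of page 56; the name `sym_diff` is a hypothetical helper.

```python
def sym_diff(r1, r2):
    """R1 ∆ R2 := (R1 ∖ R2) ∪ (R2 ∖ R1), using only difference and union."""
    return (r1 - r2) | (r2 - r1)

wine1 = {
    (100, "Volnay", 1978, 12.5),
    (110, "Chablis", 1979, 12.0),
    (120, "Sancerre", 1980, 12.5),
    (130, "Tokay", 1980, 12.5),
}
wine2 = {
    (130, "Tokay", 1980, 12.5),
    (140, "Chenas", 1981, 12.7),
}

wine5 = sym_diff(wine1, wine2)
```

The function body never mentions the data, which is the sense in which the definition is independent of the instance.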
56
Two relations with the same attributes
WINE1
W#   GRAPE     VINTAGE  PERCENTAGE
100  Volnay    1978     12.5
110  Chablis   1979     12.0
120  Sancerre  1980     12.5
130  Tokay     1980     12.5

∆

WINE2
W#   GRAPE   VINTAGE  PERCENTAGE
130  Tokay   1980     12.5
140  Chenas  1981     12.7

WINE5
W#   GRAPE     VINTAGE  PERCENTAGE
100  Volnay    1978     12.5
110  Chablis   1979     12.0
120  Sancerre  1980     12.5
140  Chenas    1981     12.7
57
Exercise: Invent and define new operations for the RA
Actually, we haven’t been very economical when we listed the basic RA operations
Some of them could have been defined in terms of the others, so they are theoretically redundant (but not necessarily practically redundant)
Exercise: Define the join in terms of the product, the selection and the projection
58
The Transitive Closure
Example:
Paternity
Father  Son
Eric    Luis
Eric    Juan
Juan    Carlos
Juan    Sergio
Luis    Tomas
Tomas   Pedro

We want to define and compute a new relation Ancestry that contains all (and only) the tuples that can be obtained by transitive paternity, i.e.
Ancestry
Ancestor  Descendant
Eric      Luis
Eric      Juan
Juan      Carlos
Juan      Sergio
Luis      Tomas
Tomas     Pedro
Eric      Tomas
Eric      Carlos
Eric      Sergio
Luis      Pedro
Eric      Pedro
59
Ancestry is the transitive closure of Paternity, i.e. the smallest transitive relation that includes Paternity
The transitive closure of a relation is something we use, compute, and need all the time
Computation? An iterative procedure computes it
Ancestry
        Ancestor  Descendant
step 0  Eric      Luis
        Eric      Juan
        Juan      Carlos
        Juan      Sergio
        Luis      Tomas
        Tomas     Pedro
step 1  Eric      Tomas
        Eric      Carlos
        Eric      Sergio
        Luis      Pedro
step 2  Eric      Pedro

The length of the iteration depends on the initial instance; it is not bounded a priori
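The iterative procedure can be sketched in Python as a loop that keeps composing the current result with the original relation until no new pairs appear. This is one possible formulation, under the assumption that a binary relation is a set of pairs; the function name `transitive_closure` is hypothetical.

```python
def transitive_closure(relation):
    """Return the transitive closure and the number of productive steps."""
    closure = set(relation)  # step 0: the relation itself
    steps = 0
    while True:
        # Extend every known pair (a, b) by one edge (b, d) of the relation.
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in relation
            if b == c
        } - closure
        if not new_pairs:
            return closure, steps
        closure |= new_pairs
        steps += 1

paternity = {
    ("Eric", "Luis"), ("Eric", "Juan"), ("Juan", "Carlos"),
    ("Juan", "Sergio"), ("Luis", "Tomas"), ("Tomas", "Pedro"),
}

ancestry, steps = transitive_closure(paternity)
```

The loop has no fixed bound: a longer chain of fathers and sons in the instance would require more iterations, which is exactly what the fixed RA formulas discussed next cannot provide.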
60
Can we define the transitive closure using a general and fixed formula of RA?
TC(R) := ···
With a formula on the RHS in terms of the operations of RA we saw (and nothing else)?
In particular, not depending on the instance at hand ...
61
It can be mathematically proved that it is not possible to define the TC of a relation by means of a fixed and general formula of RA
The TC is not part of the RA, and cannot be defined by means of a fixed formula that uses (the other) relational operators
The theorem is easier to state and prove in the “logical counterpart” of the RA, i.e. in the relational calculus
This is a result about the (limited) expressive power of a particular query language: there are things that cannot be expressed in it
62
In order to compute the TC from an RDB, an iterative procedure can be programmed in interaction with the DB
Ideally, a query language provided by an RDBMS should offer the possibility to define and express the TC
The newer SQL standard (SQL99) supports this
(We will come back in the context of (extended) logical query languages ...)
63
Integrity Constraints
So far we have no way to capture requirements or conditions that our DB model (and DB) should satisfy in order to:
Be an accurate model of the outside reality being modeled (cf. the instance on page 9, two ages for mary ...)
Impose or contain more meaning, more semantics wrt the modeled domain
Stay in correspondence with the modeled domain
Make sure that when the data changes, the meaning and correspondence are kept
64
[Figure: diagram relating the outside reality, the DB, and the ICs]
ICs are statements (sentences, propositions, ...) that have to be satisfied in every stable, valid, legal state of the database
It is not difficult to express semantic or integrity constraints (ICs) in languages of predicate logic
They can be expressed as formal, symbolic sentences in those languages
65
People
Name   Age  Degree
mary   35   law
mary   25   medicine
peter  40   CS
peter  40   math

∀x∀y∀z∀u∀v(People(x, y, z) ∧ People(x, u, v) → y = u)
The instance above is not admissible if the IC is to be satisfied, because it does not satisfy the IC
The database instance is inconsistent, in the sense that it does not satisfy (does not make true) the ICs
66
In principle, ICs expressed in such languages could be processed by a DBMS
And the DBMS could make sure that the actual DB instance does satisfy them
67
How?
Rejecting changes (updates) that violate them
Compensating, with additional, internal, automatic updates, for those updates issued by users or application programs
Notifying the user or applications about violations of ICs before committing changes
...
However, to privilege efficiency, those mechanisms are not always implemented or offered in/by DBMSs
In general, commercial DBMSs provide automated, built-in support only for some restricted, limited classes of ICs
68
Thus, in some cases the user has to find alternatives
Keeping the IC satisfied (DB maintenance) through external application programs that interact with the DB
The external program could issue a query to the database in order to detect if there is a violation of the IC
Depending on the answer returned by the DBMS, the application program has alternatives on how to proceed
Which could be the query to detect if there is a violation of the IC on page 65?
69
An external program that interacts with the DBMS can create a view that captures the violations of the IC
For example, the following “violation view”
V(x) : ∃y∃z∃v∃w(People(x, y, v) ∧ People(x, z, w) ∧ y ≠ z)
This view will contain all the names that have more than one age
It is expected that the contents of the view are always empty
If not, there is a violation, and the application program can do something about that
In the example, mary would be caught by the view
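The violation view can be mimicked in Python as a query over the People instance of page 65. This is only a sketch of what an external program might compute; the function name `violation_view` is hypothetical.

```python
# People(Name, Age, Degree) instance from page 65.
people = {
    ("mary", 35, "law"),
    ("mary", 25, "medicine"),
    ("peter", 40, "CS"),
    ("peter", 40, "math"),
}

def violation_view(people):
    """Names x for which two tuples with the same name have different ages."""
    return {
        x
        for (x, y, _v) in people
        for (x2, z, _w) in people
        if x == x2 and y != z
    }

violations = violation_view(people)
```

peter appears twice but with the same age, so only mary ends up in the view; an empty result would mean the IC is satisfied.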
70
Defining triggers to be stored in the DB
Triggers (aka active rules) react automatically when a violation occurs:
• notification messages
• rejection of updates
• additional compensating updates as programmed in the trigger
• ...
They are of the form: Event & Condition ⇒ Action
71
For the IC on page 65, the Event could be an update of table People, like an insertion (deletions of tuples are not relevant for this IC)
The Condition could be a check of the occurrence of some tuple in the violation view
The Action does something to restore consistency, e.g. delete the old conflicting tuple and accept the new one
Crossing fingers ...
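The Event & Condition ⇒ Action pattern for this IC can be imitated outside the DBMS with a small Python function. This is a sketch of the idea only, not trigger syntax of any actual DBMS; the function name `insert_person` and the sample data are hypothetical.

```python
def insert_person(people, new_tuple):
    """Event: insertion of new_tuple into People(Name, Age, Degree)."""
    name, age, _degree = new_tuple
    # Condition: some stored tuple has the same name but a different age.
    conflicting = {t for t in people if t[0] == name and t[1] != age}
    # Action: delete the old conflicting tuples and accept the new one.
    return (people - conflicting) | {new_tuple}

people = {("mary", 35, "law"), ("peter", 40, "CS")}
people = insert_person(people, ("mary", 25, "medicine"))
```

Other actions, such as rejecting the insertion or only notifying the user, would replace the last line of the function.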
72
Some Classes of ICs
1. Functional Dependencies: In some cases, a subset of the attributes functionally determines (or is expected to determine) another subset of the attributes
Example:
Students
Number   Name          Study       Sport
9901254  John Stanley  CS          Soccer
9910803  Sue Jones     Math        Skating
9910803  Sue Jones     Math        Soccer
9901254  John Stanley  Literature  Handball

“Every student number is associated to at most one student name” or “Every two students that coincide in student number coincide in student name”
Number functionally determines Name; denoted
Students : Number → Name (not logical implication)
73
2. Key Dependencies (constraints): A particular case of functional dependency, where a subset of the attributes functionally determines all the attributes in the relation
Students
Number   Name          Study  Sport
9901254  John Stanley  CS     Soccer
9910803  Sue Jones     Math   Skating

“Student number determines all the other attributes of the student”
Number is a key of the relation, i.e.
Students : Number → {Number, Name, Study, Sport}
Satisfied by this instance, but not by the previous one
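Whether an instance satisfies a functional dependency, and in particular a key dependency, can be checked mechanically. A minimal sketch, assuming a relation is given as attribute names plus a set of rows; `satisfies_fd` is a hypothetical helper, and the two instances are the Students tables of pages 72 and 73.

```python
def satisfies_fd(attrs, rows, lhs, rhs):
    """True iff tuples that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        t = dict(zip(attrs, row))
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        # setdefault stores val the first time key is seen.
        if seen.setdefault(key, val) != val:
            return False
    return True

attrs = ("Number", "Name", "Study", "Sport")
students_p72 = {
    (9901254, "John Stanley", "CS", "Soccer"),
    (9910803, "Sue Jones", "Math", "Skating"),
    (9910803, "Sue Jones", "Math", "Soccer"),
    (9901254, "John Stanley", "Literature", "Handball"),
}
students_p73 = {
    (9901254, "John Stanley", "CS", "Soccer"),
    (9910803, "Sue Jones", "Math", "Skating"),
}

fd_holds = satisfies_fd(attrs, students_p72, ["Number"], ["Name"])
key_holds_p72 = satisfies_fd(attrs, students_p72, ["Number"], list(attrs))
key_holds_p73 = satisfies_fd(attrs, students_p73, ["Number"], list(attrs))
```

The first instance satisfies Number → Name but not the key dependency; the second satisfies both.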
74
Example:
WINE
GRAPE    VINTAGE  VINEYARD   QUALITY
Chenas   1977     Laphite    Good
Chenas   1980     Mouton     Excellent
Chablis  1977     Rotschild  Good
Chablis  1978     Crepeau    Bad
Volnay   1980     Satie      Average

The set of attributes {GRAPE, VINTAGE, VINEYARD} forms a key
{GRAPE, VINTAGE, VINEYARD} → QUALITY
(If the relation name is clear from the context, we omit it)
75
3. Range Constraints: They restrict the values that can be taken by some attributes
Example: “A CEO cannot make less than 80,000 per year”, “An employee must be over 18”
Both satisfied by
Employee
Name  Position    Salary  Age
john  clerk       40 K    35
mary  CEO         100 K   45
ken   accountant  60 K    40

But neither of the two by
Employee
Name   Position    Salary  Age
john   clerk       40 K    35
mary   CEO         70 K    45
ken    accountant  60 K    40
carol  programmer  90 K    16

In predicate logic: ∀w∀x∀y∀z(Employee(w, x, y, z) → z > 18)
(exercise: express the other)
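Both range constraints can be checked tuple by tuple, since each one only looks at values within a single row. A sketch under assumptions: salaries are stored as plain numbers in thousands, and `range_violations` is a hypothetical helper.

```python
# Employee(Name, Position, Salary in K, Age): the two instances above.
employees_ok = {
    ("john", "clerk", 40, 35),
    ("mary", "CEO", 100, 45),
    ("ken", "accountant", 60, 40),
}
employees_bad = {
    ("john", "clerk", 40, 35),
    ("mary", "CEO", 70, 45),
    ("ken", "accountant", 60, 40),
    ("carol", "programmer", 90, 16),
}

def range_violations(employees):
    """Return (name, reason) pairs for tuples that break either constraint."""
    violations = set()
    for name, position, salary_k, age in employees:
        if position == "CEO" and salary_k < 80:
            violations.add((name, "CEO salary below 80 K"))
        if not age > 18:
            violations.add((name, "age not over 18"))
    return violations
```

On the first instance the function returns the empty set; on the second it reports mary (salary) and carol (age).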
76
4. NOT NULL Constraints: They restrict the values taken by some attributes to be non-NULL
A NULL value is used in databases to represent missing, unknown, non-applicable, ... information (there are different, mostly informal, semantics for NULL values)
Emp
Name   Posit     Sal   Age
john   clerk     40 K  35
mary   CEO       85 K  NULL
ken    account.  60 K  40
carol  NULL      90 K  19

This instance satisfies “Name cannot be NULL”, but not “Age cannot be NULL”
Normally, when a key constraint declares that a set of attributes is a key, it is also required that its attributes cannot be NULL
Key constraints and NOT NULL constraints go together
77
If Name above has been declared a key (i.e. this IC is imposed), it should not take the value NULL
This has to do with the way NULL values are treated by the DBMS: it does not know if a NULL represents a value that is equal to or different from the other certain (or null) values
78
5. Referential Constraints: They require that all the values of some attributes in a relation also appear in attributes of another relation
The first relation refers to the second relation (which can be thought of as a relation containing official data)
Example: “Every student in relation UnivTeams must be a registered student”
UnivTeams
Number   Team
9910803  Basketball
...      ...

Students (official students table)
Number   Name          Study
9901254  John Stanley  CS
9910803  Sue Jones     Math
9901254  John Stanley  Literature
79
A referential IC: UnivTeams.Number refers to Students.Number
Also called an inclusion dependency: UnivTeams.Number is included in Students.Number:
UnivTeams.Number ⊆ Students.Number
(or UnivTeams[Number] ⊆ Students[Number])
This is not a full inclusion dependency in the sense that not all the attributes of UnivTeams participate in the inclusion
The referring and referred attributes may have different names, e.g. we could have
UnivTeams[Number] ⊆ Students[ID]
(as long as the data types match)
80
In the language of predicate logic:
∀x∀y(UnivTeams(x, y) → ∃z∃w Students(x, z, w))
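The inclusion UnivTeams[Number] ⊆ Students[Number] amounts to a subset test between two projections. A minimal sketch using the tuples shown on page 78 (the elided rows are left out); the helper name `dangling_references` and the extra Chess tuple in the usage line are hypothetical.

```python
univ_teams = {(9910803, "Basketball")}
students = {
    (9901254, "John Stanley", "CS"),
    (9910803, "Sue Jones", "Math"),
    (9901254, "John Stanley", "Literature"),
}

def dangling_references(univ_teams, students):
    """Numbers in UnivTeams that do not appear in Students."""
    registered = {number for (number, _name, _study) in students}
    referring = {number for (number, _team) in univ_teams}
    return referring - registered

dangling = dangling_references(univ_teams, students)

# Hypothetical violating instance: a team member who is not registered.
dangling_bad = dangling_references(univ_teams | {(1234567, "Chess")}, students)
```

The referential IC holds exactly when the returned set is empty.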
81
6. Foreign Key Constraints: A combination of a referential constraint and a key constraint
In addition to the referential constraint, it is required that the referred attributes in the second relation form a key for that relation
That is, if the referential IC requires
R[A_i1, ..., A_im] ⊆ S[B_j1, ..., B_jm],
then we also require that {B_j1, ..., B_jm} is a key for S
82
Example: We want UnivTeams.Number to be a foreign key for relation Students, i.e. that it refers to the attribute Students.Number that is a key of Students
Inconsistent instance!
UnivTeams
Number   Team
9910803  Basketball
...      ...

Students
Number   Name          Study
9901254  John Stanley  CS
9910803  Sue Jones     Math
9901254  John Stanley  Literature

Consistent instance!
UnivTeams
Number   Team
9910803  Basketball
...      ...

Students (official students table)
Number    Name          Study
9901254   John Stanley  CS
9910803   Sue Jones     Math
99052454  Ken Scott     Literature
83
Example: A referential IC
Loan(branchn,loan#,clientn,amount)
↓
Branch(branchn,actives,branchNeigh)
84
Example: Foreign key constraints
DRINKER DRINKER# SURNAME FNAME TYPE
↑
DRINKS DRINKER# WINE# DATE QUANTITY
↓
WINE WINE# GRAPE VINTAGE PERCENTAGE
(Thinking in terms of the E/R model, the relation in the middle seems to come from a Relationship and the other two, from Entities)
85
Final Remarks
There are several other classes of ICs for the relational model
We presented those most common in practice
For most of them, commercial RDBMSs provide automatic support (database maintenance wrt them)
Integrity constraints become part of the relational database schema
Those declared in the schema are expected to be satisfied by all the instances that are compatible with the schema
In that case we say that the database (instance, extension) is consistent wrt the declared ICs
We will see other kinds of ICs later on, in other contexts
Chapter 5:
Relational Algebra and Relational Calculus
2
Relational Calculus
The relational algebra is an algebraic and procedural query lan-guage for relational databases
A few examples have shown that it is also possible to use lan-guages from predicate logic to:
Pose queries to the database
Specify integrity constraints
Define views of the database
In general, express metadata, i.e. data about the data, i.e. about the structure and organization of data
3
The “logical counterpart” of the RA is called the Relational Calculus (RC), and it comes in two flavors:
The Tuple Calculus (TC): Basically the “atomic values” of data are complete tuples in relations
The language has variables to refer to tuples
The Domain Calculus (DC): The atomic values of data are those taken by the attributes (i.e. columns) as opposed to whole rows
The reason for the name is that values taken by the attributes are drawn from the underlying database domain
Variables of the language refer to elements of (values in) the database domain
4
There are transformations between TC and DC, and basically the same can be expressed in both
Since DC is easier to explain and closer to classical predicate logic, we will concentrate on the DC
RC is a declarative language to express queries (a query language), ICs, view definitions, etc.
We review some elements of predicate logic, at least those that are relevant to RC:
5
We introduce a formal, symbolic object language to talk about a database
So, we need some symbolic ingredients
First, we need symbolic names for the data items, i.e. for elements of the database domain D
Actually, we will use in the formal language of RC the same names that we use in the metalanguage for the elements of D
Symbolic names for the relations (predicates) in the schema
It may be useful to introduce unary symbolic predicates for (domains of) the attributes A
Alternatively, we could have unary predicates to refer to the subdomains D(A) of D (cf. Chapter 3 of these notes)
6
Example: Consider the relational schema S with
Domain D = {john, peter, mary, ken, carol, steve, ..., 0, 1, 2, 3, 4, ...}
Binary relations People(·, ·), Manager(·, ·)
Attributes for People, in this order: Name, Age
Attributes for Manager, in this order: Boss, Subordinate
This schema S has an associated language L(S) of predicate logic, based on the following symbols:
Names for domain individuals, to denote them: john, peter, mary, ken, carol, steve, ...
Predicate symbols: People(·, ·), Manager(·, ·)
Logical symbols: ¬, ∧, ∨, →, ↔, ∀, ∃
An infinite but countable, official, set of variables: x1, x2, x3, ... (sometimes we will use other variables)
7
Possibly a set of logical predicates (aka evaluable or built-in predicates): =, ≠, <, ...
They have a fixed interpretation (extension) given by the logic and depending on the underlying domain (cf. Chapter 3 of these notes), as opposed to those in the second list (the predicate symbols), which can have different interpretations (extensions)
Symbols for subdomain (attribute) predicates: Name(·), Age(·), Boss(·), Subordinate(·)
These predicates for the domains of the attributes (or subdomains of the domain) also have fixed interpretations (extensions), in the sense that they depend on the domain, and not on the relations of the database
The extension for Name is {john, peter, mary, ken, carol, steve, ...}, the same for Boss and Subordinate, but the extension of Age is {0, 1, 2, 3, ...}
8
Using these symbols it is possible to build formulas of the language L(S), e.g.
1. People(john, 35), People(john, mary), People(x3, 20), People(x3, x5), Age(35), Age(x10), john = mary, x2 = ken, 35 < 12, ken ≠ john, ...
These are all atomic formulas: A predicate applied to names and/or variables
2. More complex (non-atomic) formulas:
a) People(john, 32) ∧ ¬People(mary, 23)
b) People(peter, x) → Age(x)
c) ∀x∀y∀z(Manager(x, y) ∧ Manager(z, y) → x = z)
d) ∃x∀y(y ≠ x → Manager(x, y))
e) Manager(peter, x) ∧ ∃y(People(x, y) ∧ y < 30)
f) ∀x∀y(People(x, y) → Name(x) ∧ Age(y))
9
Some of these formulas do not have variables outside the scope of a quantifier (∃, ∀)
They are called sentences, e.g. People(john, 35), 2.(a), and 2.(c)
Notice that atomic sentences correspond to data in the database, i.e. tuples in tables
Sentence 2.(c) could be an integrity constraint; if it is imposed as an IC with the schema, it is expected to be true in the legal instances of the schema
Sentence 2.(f) could be seen as a condition on the schema: the arguments for People are elements of Name and Age, resp.
Formula 2.(e) could be seen as a query: “Give me the values for x such that the condition on it becomes true” (in the instance of the database at hand)
10
The notion of “being true” seems to be crucial here!
Formulas and sentences are purely symbolic objects, but they become true (or false) when they are interpreted
In predicate logic, formulas are interpreted in structures, and database instances can be seen as structures
So, formulas can be interpreted in database instances, and they become true or false in them
The semantics of the RC languages is inherited from the semantics for languages of predicate logic:
11
A database instance D for a relational schema S = {R, S, ...} can be seen as a finite structure D = ⟨D, R^D, S^D, ...⟩, in the following sense:
It has a domain, namely D (possibly infinite)
Example: D = {john, peter, ..., 0, 1, ...}
The names for individuals in L(S) are interpreted by themselves
Example: peter of the object language is interpreted as the element peter of the domain
Every non-logical, domain dependent, predicate has a finite extension (i.e. a finite number of tuples in the table)
Example:
People^D = {(john, 34), (peter, 37), ..., (mary, 25)}
Manager^D = {(john, peter), (peter, mary), ..., (john, ken)}
12
Built-in predicates have fixed extensions, possibly infinite
Example:
• {(john, john), (peter, peter), ...} are the tuples in the interpretation of =
• {(john, peter), (peter, mary), ...} are the tuples in the interpretation of ≠
• < has the extension {(0, 1), (0, 2), ..., (1, 2), (1, 3), ...}
We can see that given the schema S and a fixed domain D, the different instances will differ only in the extensions of the non-logical predicates, i.e. People, Manager
13
Now we can apply the classical definition of truth of symbolic formulas in structures (Alfred Tarski, early 1930s)
It is a recursive (inductive) definition that can be made precise and general, but we illustrate it with examples
Given a schema S, a formula ϕ ∈ L(S), and a database instance D compatible with S, we want to define when ϕ is true in D
Notice that ϕ could have free variables; then we need to indicate the values in the domain that we assign to the free variables
14
Example: These are two instances compatible with the schema:
D1:
People    Name    Age
          john    35
          mary    25
          ken     40

Manager   Boss    Subordinate
          ken     john
          john    mary

D2:
People    Name    Age
          mary    35
          mary    25
          peter   40

Manager   Boss    Subordinate
          ken     steve
          carol   steve
          john    mary
15
1. People(john, 35) is true in D1, but false in D2:
D1 |= People(john, 35), because (john, 35) ∈ People^D1, i.e. the tuple (john, 35) belongs to the extension of the predicate in the DB
But D2 ⊭ People(john, 35) (the tuple does not belong to the extension)
Actually, it is considered to be false by applying the Closed World Assumption on Databases: The only true atomic knowledge is the one explicitly contained in the tables
2. D1 |= john = john, because (john, john) is in the extension for =
D1 |= john ≠ mary, because (john, mary) is in the extension for ≠
D1 |= 5 < 40, because (5, 40) is in the extension for <
16
D1 ⊭ john ≠ john, because (john, john) is not in the extension for ≠
3. People(mary, x) is true in D2 when x takes the value 25
D2 |= People(mary, x)[25]
But D2 ⊭ People(mary, x)[10]
4. Actually, for the values 35 or 25 for x, the formula (query?) People(mary, x) becomes true in D2
D2 |= People(mary, x)[25] and D2 |= People(mary, x)[35]
5. D2 |= ¬People(john, 35) by definition, because D2 ⊭ People(john, 35) (cf. 1. and use of CWA)
6. D1 |= (People(john, 35) ∧ Manager(ken, john)), by definition, because
D1 |= People(john, 35) and D1 |= Manager(ken, john)
17
7. D2 |= (People(peter, 10) ∨ Manager(ken, steve)) by definition, because
D2 |= People(peter, 10) or D2 |= Manager(ken, steve)
8. D2 |= ∃x Manager(x, steve) by definition, because there exists a value in the domain for x, for instance x = ken, such that D2 |= Manager(x, steve)[ken]
9. D2 |= ∀y(People(mary, y) → y = 35 ∨ y = 25) by definition, because for all the values a ∈ D, i.e. in the domain, it holds
D2 |= (People(mary, y) → y = 35 ∨ y = 25)[a]
the value for y ↑
E.g. D2 |= (People(mary, y) → y = 35 ∨ y = 25)[10]
E.g. D2 |= (People(mary, y) → y = 35 ∨ y = 25)[25]
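The recursive definition of truth illustrated in items 1 to 9 can be sketched as a small evaluator. This is a minimal Python sketch under stated assumptions: formulas are encoded as nested tuples, variables are strings starting with `?`, and the (possibly infinite) domain is replaced by a small finite sample `DOMAIN`. The encoding and the function name `holds` are hypothetical choices made for illustration only.

```python
# Instance D2 from the example; DOMAIN is a finite sample of the domain (assumption).
DOMAIN = {"john", "peter", "mary", "ken", "carol", "steve", 10, 25, 35, 40}
D2 = {
    "People": {("mary", 35), ("mary", 25), ("peter", 40)},
    "Manager": {("ken", "steve"), ("carol", "steve"), ("john", "mary")},
}


def holds(formula, db, env):
    """Does `formula` hold in `db` when free variables take the values in `env`?"""
    op = formula[0]
    if op == "atom":                    # ("atom", "People", "mary", "?y")
        _, pred, *args = formula
        # Closed World Assumption: an atom is true only if its tuple is in the table.
        return tuple(env.get(a, a) for a in args) in db[pred]
    if op == "eq":                      # built-in predicate with fixed meaning
        return env.get(formula[1], formula[1]) == env.get(formula[2], formula[2])
    if op == "not":
        return not holds(formula[1], db, env)
    if op == "and":
        return holds(formula[1], db, env) and holds(formula[2], db, env)
    if op == "or":
        return holds(formula[1], db, env) or holds(formula[2], db, env)
    if op == "implies":
        return (not holds(formula[1], db, env)) or holds(formula[2], db, env)
    if op == "exists":                  # ("exists", "?x", body)
        return any(holds(formula[2], db, {**env, formula[1]: a}) for a in DOMAIN)
    if op == "forall":
        return all(holds(formula[2], db, {**env, formula[1]: a}) for a in DOMAIN)
    raise ValueError(f"unknown connective: {op}")


# Item 3: People(mary, x) with x = 25 and with x = 10
item3 = ("atom", "People", "mary", "?x")
# Item 8: ∃x Manager(x, steve)
item8 = ("exists", "?x", ("atom", "Manager", "?x", "steve"))
# Item 9: ∀y(People(mary, y) → y = 35 ∨ y = 25)
item9 = ("forall", "?y", ("implies", ("atom", "People", "mary", "?y"),
                          ("or", ("eq", "?y", 35), ("eq", "?y", 25))))

print(holds(item3, D2, {"?x": 25}), holds(item3, D2, {"?x": 10}))  # True False
print(holds(item8, D2, {}), holds(item9, D2, {}))                  # True True
```

Each case of `holds` mirrors one clause of the inductive definition, which is what makes the evaluation compositional.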
18
With this recursive definition it is possible to evaluate the truth of any syntactically well-formed sentence or formula (provided we give values to the free variables) in a database instance
Since the formulas are symbolic, with a precise syntax, and the extensions of the relevant relations are finite, all the “reasonable” formulas (cf. later) can be evaluated by a computational system, like an RDBMS
Notice that the evaluation of the truth of a formula is compositional: the truth of a formula is based on the truth (or not) of its subformulas, which makes evaluation easier and clearer
A sentence (written in the logical language associated to the schema) can be algorithmically determined as true or false in the DB
19
Exercise: Say in English what is expressed by the symbolic sentence
∀x∃y∀z(Manager(x, y) ∧ People(y, z) ∧ z > 30 → ∃w(People(w, 25) ∧ ¬Manager(x, w)))
Determine if the following sentence is true in the instances D1, D2 by using the inductive (recursive) definition of truth in a structure (database instance)
D1 |=? ∀x∃y∀z(Manager(x, y) ∧ People(y, z) ∧ z > 30 → ∃w(People(w, 25) ∧ ¬Manager(x, w)))
D2 |=? ∀x∃y∀z(Manager(x, y) ∧ People(y, z) ∧ z > 30 → ∃w(People(w, 25) ∧ ¬Manager(x, w)))
The sentence has to be true or false in D1, D2
20
For an instance D for a schema S and a formula of L(S) with free variables, it is possible to algorithmically determine if there are values in the database domain for those variables, so that the formula becomes true in D
If those values exist, they can also be determined algorithmically as a part of the same evaluation process
So, we can use the RC as a query language
This language (or family of relational languages) has a precise, clear, and well-studied semantics
21
Example: Pose and answer the following queries to the database instances above
1. Return the managers who have subordinates that are younger than 27
∃y∃z(Manager(x, y) ∧ People(y, z) ∧ z < 27)
The answers are collected as the values for the only free variable, namely x
D1 |=? ∃y∃z(Manager(x, y) ∧ People(y, z) ∧ z < 27)
Answer: x = john, because
D1 |= ∃y∃z(Manager(x, y) ∧ People(y, z) ∧ z < 27)[john]
Notice how the join of the tables is captured through the variable in common y above and the conjunction
Projection is captured by existential quantifiers; selections by conjuncts expressed in terms of built-in predicates
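Query 1 can be mirrored almost symbol by symbol with a set comprehension. A minimal Python sketch over instance D1, assuming tables are stored as sets of tuples (the variable names are illustrative):

```python
# Instance D1 as sets of tuples.
people = {("john", 35), ("mary", 25), ("ken", 40)}
manager = {("ken", "john"), ("john", "mary")}

# ∃y∃z(Manager(x, y) ∧ People(y, z) ∧ z < 27)
answers = {
    x                              # projection: only the free variable x is kept
    for (x, y) in manager
    for (y2, z) in people
    if y == y2                     # join: the shared variable y
    and z < 27                     # selection: built-in predicate <
}
print(answers)  # {'john'}
```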
22
2. Return names and ages of the employees who are not a boss (of any people)
People(x, y) ∧ ∀z∀w(People(z, w) → ¬Manager(x, z))
Answers in D1: {(mary, 25)}
Answers in D2: {(mary, 25), (mary, 35), (peter, 40)}
Alternatively (though this is a different query, with a different meaning):
People(x, y) ∧ ¬∃z Manager(x, z)
Exercise: Show that the two queries above have different meanings by providing an instance of S where the answers are different
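Both formulations of query 2 can be evaluated directly. A minimal Python sketch, assuming instances are dictionaries of sets of tuples; the function names are hypothetical. On D1 and D2 the two queries return the same answers, so the instance asked for in the exercise has to be a different one.

```python
D1 = {
    "People": {("john", 35), ("mary", 25), ("ken", 40)},
    "Manager": {("ken", "john"), ("john", "mary")},
}
D2 = {
    "People": {("mary", 35), ("mary", 25), ("peter", 40)},
    "Manager": {("ken", "steve"), ("carol", "steve"), ("john", "mary")},
}


def not_boss_of_people(db):
    """People(x, y) ∧ ∀z∀w(People(z, w) → ¬Manager(x, z))."""
    names = {z for (z, _) in db["People"]}
    return {(x, y) for (x, y) in db["People"]
            if not any((x, z) in db["Manager"] for z in names)}


def not_boss_at_all(db):
    """People(x, y) ∧ ¬∃z Manager(x, z)."""
    bosses = {b for (b, _) in db["Manager"]}
    return {(x, y) for (x, y) in db["People"] if x not in bosses}


for name, db in (("D1", D1), ("D2", D2)):
    print(name, sorted(not_boss_of_people(db)), sorted(not_boss_at_all(db)))
```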
23
RA vs. RC
In RC we can express all the operations of RA, i.e. we can define by means of logical formulas the relations that result from applying the RA operations
We introduce in the logical language a new predicate Ans to collect the result of the operation, and next we define it by a sentence
Selection: σ_ϕ(R(A1, ..., An))
Here R is a relation and ϕ is a condition on the values of the attributes Ai
∀x1 ··· ∀xn(Ans(x1, ..., xn) ←→ R(x1, ..., xn) ∧ ϕ)
E.g. σ_{A=a}(R(A, B)) can be defined by
∀x∀y(Ans(x, y) ←→ R(x, y) ∧ x = a)
24
Intersection: R(A, B) ∩ S(A,B)
∀x∀y(Ans(x, y) ←→ R(x, y) ∧ S(x, y))
Instead of using the answer predicate, we can use the formula R(x, y) ∧ S(x, y), with free variables x, y, wherever needed as a (sub)formula to capture the intersection
Union: R(A, B) ∪ S(A, B)
∀x∀y(Ans(x, y) ←→ R(x, y) ∨ S(x, y))
Projection: Π_A(R(A, B))
∀x(Ans(x) ←→ ∃y R(x, y))
Join: R(A, B) ⋈_{B=C} S(C, D)
∀x∀y∀z(Ans(x, y, z) ←→ R(x, y) ∧ S(y, z))
Cartesian Product: R(A,B)× S(C,D)
∀x∀y∀z∀w(Ans(x, y, z, w) ←→ R(x, y) ∧ S(z, w))
25
Difference: R(A, B) ∖ S(A, B)
∀x∀y(Ans(x, y) ←→ R(x, y) ∧ ¬S(x, y))
We can see that all the RA can be expressed in the RC
Thus complex RA expressions (RA queries) can be translated into declarative RC formulas (queries)
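The defining formulas for Ans read naturally as set comprehensions. The following minimal Python sketch uses small made-up relations `R`, `S` and `T` (toy data, not from the notes); `T` plays the role of S(C, D) in the join and the product, and the constant a of the selection is taken to be 1.

```python
# Toy relations (assumption): R(A, B), S(A, B), T(C, D).
R = {(1, "a"), (2, "b"), (3, "b")}
S = {(2, "b"), (4, "c")}
T = {("b", "x"), ("c", "y")}

selection    = {(x, y) for (x, y) in R if x == 1}            # R(x, y) ∧ x = a
intersection = {(x, y) for (x, y) in R if (x, y) in S}       # R(x, y) ∧ S(x, y)
union        = {t for t in R} | {t for t in S}               # R(x, y) ∨ S(x, y)
projection   = {x for (x, _) in R}                           # ∃y R(x, y)
join         = {(x, y, z) for (x, y) in R
                for (y2, z) in T if y == y2}                 # R(x, y) ∧ T(y, z)
product      = {(x, y, z, w) for (x, y) in R
                for (z, w) in T}                             # R(x, y) ∧ T(z, w)
difference   = {(x, y) for (x, y) in R if (x, y) not in S}   # R(x, y) ∧ ¬S(x, y)

print(sorted(join))        # [(2, 'b', 'x'), (3, 'b', 'x')]
print(sorted(difference))  # [(1, 'a'), (3, 'b')]
```

Note that in every comprehension the candidate answers are drawn from the tables themselves, never from the whole domain.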
Actually, with the syntax and semantics of RC we could go beyond ...
Example: Give me those who are not bosses
¬∃y Manager(x, y) (*)
Who should be answers in, say, D1?
mary? susan? (with susan ∈ D)
26
As a formula, the query is O.K.; its semantics as a logical formula is also O.K.
But as a DB query?
Notice that the “corresponding RA query” would be
(Π_Boss(Manager))^c
We do not have complement in RA
We do have it in RC (or logic), but in DB we do not want to use it
We restrict ourselves to the so-called domain independent or safe queries (the “reasonable” queries mentioned before)
27
Those are the queries that can be evaluated without appealing to the whole (possibly infinite) underlying DB domain D
Domain independent queries can be evaluated by concentrating on the active domain of the DB: the subset of D that contains the data items that appear in some of the finite DB tables
activeDom(D1) = {john, 35, mary, 25, ken, 40}
The query 2. above (as formulated) is safe, because the non-bosses are found among the people, and the latter appear in a table
The RA difference is always safe: R(x, y) ∧ ¬S(x, y), because the answers are all among the rows in table R
The query (*) is not domain independent
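The difference between the two kinds of queries can be observed by evaluating them over two different domains. A minimal Python sketch over instance D1, assuming the domain is passed explicitly as a finite set; the function names are hypothetical.

```python
people = {("john", 35), ("mary", 25), ("ken", 40)}
manager = {("ken", "john"), ("john", "mary")}


def non_bosses_unsafe(domain):
    """Query (*): ¬∃y Manager(x, y), with x ranging over the given domain."""
    bosses = {b for (b, _) in manager}
    return {x for x in domain if x not in bosses}


def non_bosses_safe():
    """People(x, y) ∧ ¬∃z Manager(x, z): candidates come from table People."""
    bosses = {b for (b, _) in manager}
    return {(x, y) for (x, y) in people if x not in bosses}


# Active domain: every value that appears in some table.
active = {value for row in manager | people for value in row}

small = non_bosses_unsafe(active)
large = non_bosses_unsafe(active | {"susan"})

print(small == large)     # False: the answer to (*) depends on the domain
print(non_bosses_safe())  # {('mary', 25)}: independent of the domain
```

Even over the active domain, (*) returns values such as 35 and 40, which are clearly not intended answers.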
28
Finally, it can be proved that it is not possible to define the transitive closure of a relation using the RC language
This is expected given the correspondence between the RA and the RC
Actually, this impossibility result is usually proved in the context of the RC, not directly for RA
So the RC has limited expressive power for some natural applications
Are there any reasonable, well-behaved, and more expressive extensions of the RC?
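For contrast, transitive closure is easy to compute once iteration is available. The following minimal Python sketch uses the usual fixpoint iteration on the Manager table of D1; the function name is hypothetical, and the sketch only illustrates what is missing from RC, not any particular extension of it.

```python
def transitive_closure(relation):
    """Least fixpoint: keep adding (x, z) whenever (x, y) and (y, z) are present."""
    closure = set(relation)
    while True:
        new_pairs = {(x, z)
                     for (x, y) in closure
                     for (y2, z) in closure
                     if y == y2}
        if new_pairs <= closure:   # nothing new: the fixpoint has been reached
            return closure
        closure |= new_pairs


manager = {("ken", "john"), ("john", "mary")}
print(sorted(transitive_closure(manager)))
# [('john', 'mary'), ('ken', 'john'), ('ken', 'mary')]
```

The number of iterations depends on the data, whereas a fixed RC formula can only compose the relation with itself a fixed number of times.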
29
Other Uses of the RC Languages
The RC languages can be used for many purposes, not only for query formulation
With the same advantages of having a language that is suitable for computational processing, has clear syntax and semantics, and is highly expressive for DB purposes
1. Metadata:
We can express, in an RC language, conditions on the structure of the data
Example: ∀x∀y(People(x, y) → Name(x) ∧ Age(y))
This is saying that the values in attributes have to be taken from the right subdomain
30
This opens the possibility of expressing more complex conditions on the data types for the different attributes
∀x∀y(People(x, y) → CharString(x) ∧ Integer(y)),
where CharString(·), Integer(·) are recognized by the system (built-in types)
Or even more complex:
∀x∀y(R(x, y) → Type1(x) ∧ Type2(y)),
where Type1(·), Type2(·) are defined by means of additional logical formulas
Conditions like these can be checked by the system
31
Metadata is crucial in many applications of databases today, because data is integrated from different databases
The integration is usually virtual: the data stay at their sources
Think of data sources integrated through the WWW
The metadata of each source provides (some) information about what is found in a data source and how
Because it is data about data ...
32
2. Integrity Constraints:
ICs can be expressed as sentences of a RC language
D:
Employee   Name   Position     Salary   Age
           john   clerk        40K      35
           mary   CEO          100K     45
           ken    accountant   60K      40

∀x∀y∀z∀w(Employee(x, y, z, w) ∧ y = CEO → z > 90K)
This range constraint is satisfied by the DB instance D:
D |= ∀x∀y∀z∀w(Employee(x, y, z, w) ∧ y = CEO → z > 90K)
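Checking the range constraint amounts to evaluating a universally quantified implication over the rows of the table. A minimal Python sketch, assuming salaries written as "40K" stand for 40,000; the function name and the `minimum` parameter are hypothetical.

```python
# Employee rows are (Name, Position, Salary, Age); "40K" is read as 40_000 (assumption).
employee = [
    ("john", "clerk", 40_000, 35),
    ("mary", "CEO", 100_000, 45),
    ("ken", "accountant", 60_000, 40),
]


def satisfies_ceo_range(rows, minimum=90_000):
    """Every tuple whose position is CEO must have a salary above `minimum`."""
    return all(salary > minimum
               for (_, position, salary, _) in rows
               if position == "CEO")


print(satisfies_ceo_range(employee))                                   # True
print(satisfies_ceo_range(employee + [("sue", "CEO", 70_000, 50)]))    # False
```

Only the rows of Employee are inspected, which reflects that the constraint can be checked within the active domain.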
33
ICs are also metadata
They embody knowledge (about the data) that can be used
For example, it can be publicized as or provided as a semantic layer for a data source
In this way conveying “meaning” (semantics) about/of the data source
For example, if different data sources about salaries of employees are virtually integrated and we want to find those CEOs who make less than 80K, we do not have to search inside a data source that is exposed to the outside world as satisfying the range constraint above
34
We can also express functional dependencies, e.g. Employee: Name → Age
∀x∀y1∀y2∀z1∀z2∀w1∀w2(Employee(x, y1, z1, w1) ∧ Employee(x, y2, z2, w2) → w1 = w2)
It is satisfied by D
It is easy to express that Name is a key of the relation
Write down an axiom like this for each of the attributes other than those in the key
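The functional dependency sentence compares every pair of tuples that agree on the left-hand side. A minimal Python sketch of that check, assuming column positions are used instead of attribute names and salaries in "K" are read as thousands; the helper name `satisfies_fd` is hypothetical.

```python
# Employee rows are (Name, Position, Salary, Age).
employee = [
    ("john", "clerk", 40_000, 35),
    ("mary", "CEO", 100_000, 45),
    ("ken", "accountant", 60_000, 40),
]


def satisfies_fd(rows, lhs, rhs):
    """True if rows that agree on the `lhs` positions also agree on `rhs`."""
    seen = {}
    for row in rows:
        left = tuple(row[i] for i in lhs)
        right = tuple(row[i] for i in rhs)
        # setdefault returns the value recorded first for this left-hand side
        if seen.setdefault(left, right) != right:
            return False
    return True


print(satisfies_fd(employee, [0], [3]))        # Name → Age: True
print(satisfies_fd(employee, [0], [1, 2, 3]))  # Name is a key: True
```

Checking that Name is a key is the same test with all the remaining attributes on the right-hand side.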
35
UnivTeams   Number     Team
            9910803    Basketball
            ...        ...

Students    Number     Name           Study
            9901254    John Stanley   CS
            9910803    Sue Jones      Math

Referential IC: UnivTeams.Number ⊆ Students.Number
∀x∀y∃z∃w(UnivTeams(x, y) → Students(x, z, w))
The corresponding foreign key constraint can be expressed by the conjunction of this sentence plus the sentence that says that Number is a key of Students (do it!)
36
We can see that by using RC languages to express ICs:
It is clear what it means for a DB to satisfy an IC
We can express complex ICs; actually we can go much beyond what commercial RDBMSs support
IC checking becomes machine processable
Since ICs are syntactic objects, they can in principle be stored in the DB and used as extra knowledge about the domain
Knowledge that can be used for other purposes, e.g. query optimization, more precisely semantic query optimization
Example: Return students (numbers) that participate in a team and are registered students
∃y∃z∃v(UnivTeams(x, y) ∧ Students(x, z, v))
If the system knows that the RIC is satisfied, there is no need to go to table Students
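The optimization can be illustrated by comparing the query with and without the join. A minimal Python sketch on the visible rows of the instance above, assuming tables are sets of tuples; the variable names are illustrative.

```python
# Only the rows visible in the example are included (assumption).
univ_teams = {("9910803", "Basketball")}
students = {
    ("9901254", "John Stanley", "CS"),
    ("9910803", "Sue Jones", "Math"),
}

# The referential IC: every team member number appears in Students.
ric_holds = all(any(n == x for (n, _, _) in students) for (x, _) in univ_teams)

# Original query: ∃y∃z∃v(UnivTeams(x, y) ∧ Students(x, z, v))
with_join = {x for (x, _) in univ_teams
             if any(n == x for (n, _, _) in students)}

# Optimized query, valid only when the RIC holds: ∃y UnivTeams(x, y)
without_join = {x for (x, _) in univ_teams}

print(ric_holds, with_join == without_join)  # True True
```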
37
ICs should be checkable in the active domain of the database, so ICs are expected to be domain independent sentences
Exercise: Check that all the ICs we have encountered so far are domain independent
There is a syntactic characterization of the safe formulas, so in DB applications one restricts the RC to its safe portion
(To be precise, the class of safe formulas is a proper subset of the class of domain independent formulas, but safeness is good enough for applications)
Example: ∀x∃y Student(x, y) is an RC sentence, but it is not domain independent
38
[Figure: a view V, shown as a virtual table defined over the tables R and S]
3. View Definitions:
Views are (usually) virtual tables that “contain” data that come from other (usually material) base tables, those in the original schema
Views can be defined using the RC: first introducing a name for the view, i.e. a new predicate, and then an RC formula that defines it (as we did with the Ans predicate before)
39
D:
People    Name   Age
          john   35
          mary   25
          ken    40

Manager   Boss   Subordinate
          ken    john
          john   mary

The view that shows (“contains”) the bosses
∀x(Bosses(x) ←→ ∃y Manager(x, y))
The “extension” of the view on D is Bosses(D) = {ken, john}
The view containing “top bosses”
∀x(TopBoss(x) ↔ ∃y Manager(x, y) ∧ ¬∃z Manager(z, x))
TopBoss(D) = {ken}
40
Bosses with their ages:
∀x∀z(BossAge(x, z) ↔ ∃y Manager(x, y) ∧ People(x, z))
BossAge(D) = {(ken, 40), (john, 35)}
Notice that views are defined by a query, the one on the RHS
So, a view is any relation (usually virtual) that is defined in terms of already existing relations (usually material tables) by using a suitable query language
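The three views can be sketched as functions over the base tables. This is a minimal Python sketch, assuming a "virtual" view is modelled as a function that recomputes its extension from the base tables on every call; the function names are hypothetical.

```python
# Base tables of instance D.
people = {("john", 35), ("mary", 25), ("ken", 40)}
manager = {("ken", "john"), ("john", "mary")}


def bosses():
    """Bosses(x) ↔ ∃y Manager(x, y)."""
    return {x for (x, _) in manager}


def top_boss():
    """TopBoss(x) ↔ ∃y Manager(x, y) ∧ ¬∃z Manager(z, x)."""
    subordinates = {s for (_, s) in manager}
    return {x for x in bosses() if x not in subordinates}


def boss_age():
    """BossAge(x, z) ↔ ∃y Manager(x, y) ∧ People(x, z)."""
    return {(x, z) for (x, z) in people if x in bosses()}


print(sorted(bosses()))    # ['john', 'ken']
print(sorted(top_boss()))  # ['ken']
print(sorted(boss_age()))  # [('john', 35), ('ken', 40)]
```

Because nothing is stored, a change to `manager` or `people` is reflected the next time a view function is called; a materialized view would instead have to be updated when the base relations change.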
41
We can see that the semantics of a view is clear
We have a rich, expressive language for defining views
It is easy to compute their extensions if wanted
Views are useful to represent different perspectives and/or uses of the data in the DB
They allow combining data into new relations
They can also be used for security purposes: certain users may have access to certain views of the DB only
Problem: How to speed up updating of the view when base relations change?
42
Views are very important today
The emphasis is on integration of data sources
Think again of data sources used/accessed through the WWW
Actually virtual integration (data is not collected into a single and huge physical repository)
View definitions provide a way to define correspondences, mappings, semantic bridges between separate and autonomous data sources