View
229
Download
3
Category
Tags:
Preview:
Citation preview
CPSC 504: Data ManagementReview of Relational Model
1/2Laks V.S. Lakshmanan
Dept. of CS UBC
The Basic Model
• A DB = a (finite) set of (finite) relations. • A relation has a well-defined schema – its
set of attributes, each with its domain: e.g., dom(songName), dom(Artist), dom(AlbumNo), dom(fileFormat), …
• A relation – a subset of the Cartesian (aka cross) product of its attribute domains:
• e.g., myfavs dom(AlbumNo) x dom(songName) x dom(Artist) x dom(fileFormat).
How do you get there?
• In practice, you often use the ER (for entity-relationship) model as an informal spec of a design of a relational database schema.
• Schema = structure or table template.
• Relation = table instance.
Example ER schema
Actor MovieStars in
Name
DOB Addr
Role
Title Year Dir
A schema in the ER model is specified using an ER diagram.
Some constraints• Each relation has a key: candidate keys,
primary keys, superkeys, …? • Functional dependency (FD): formalization
and generalization of key. • r(R), X, Y R. r |= XY iff any pair of tuples
in r that agree on X, necessarily agree on Y. • e.g.: AlbumNoArtist, AlbumNo, songNameAlbumNo, …,
fileFormat. • {AlbumNo, songName} – a key, but not
AlbumNo. • Note: ER diagrams can express a rich class
of (but not all) constraints.
More on FDs • FDs help determine keys: there are efficient
algorithms as well as an assortment of (computationally) hard problems.
• FDs are pointers to redundancies in data: e.g., same Artist (value) will appear many times, once for each song in an album of that artist.
• Why is redundancy (in representation) bad? efficient algorithms for redundancy removal –
the so-called normal forms. • There is a full assortment of them – 1NF, 2NF,
3NF, BCNF, 4NF, … • FDs/keys help with indexing, with query
optimization, with data exchange, etc.
More on Normal Forms
• 1NF really says tuples must be “flat”. • Other NFs really seek to capture different
degrees of redundancy (and eliminate them).
• 3NF, BCNF – some of the most important ones in practice.
• Normalization algorithms: start from U, the set of all attributes, and figure out which attr sets should stay together, which ones should split up.
Some more on FDs
• Constraints such as FDs are called integrity constraints. Why?
• ICs are statements in predicate logic. • What does an FD really say? • Need to reason with ICs, so:
– Can determine keys, – Can check if certain ICs are enforced. – Can understand interaction of ICs.
Last bits on FDs
• FDs admit a sound and complete axiom system: – Reflexivity: if X Y, then YX. – Augmentation: if WZ and X Y, then
WYZX. – Transitivity: if XY and YZ, then XZ.
• Not all classes of ICs admit a S&C axiom system.
• Some consequences of the above axiom system for FDs.
ICs, Tableaux, and Queries • An FD can also be stated in a
“pictorial” form: A B C D
x1 x2 _ _
x1 x3 _ _
x2 = x3
Schema/columnheaders
Pattern rows.
Summary row/ Consequent/Conclusion.
ICs, Tableuax, and Queries
• Tableaux are general; can express what conditions must hold when certain patterns are present in data.
• Can express equality generating dependencies (egd’s) and tuple generating dependencies (tgd’s).
• Can also express queries (by using summary row for saying what you want in the output).
ICs, Tableaux, and Queries • Tableaux can also be expressed in
the form of rules, also known as “Horn clauses”.
• Examples: X2 = X3 r(X1,X2,_,_), r(X1,X3,_,_).
Which really is saying … X2 = X3 r(X1,X2,Y3,Y4),
r(X1,X3,Y5,Y6). Logically speaking, what are these
rules saying? p(X,Y) a(X,Z), b(Z,Y), a(Y,W), W>100.
Some more on NFs – 3NF
• Def. of 3NF: r(R), given set of applicable FDs F, is in 3NF iff for every XA that holds for r, X is a superkey for r according to F OR A belongs to some key.
• e.g.: r(AlbumNo, Artist, songName, fileFormat) is not in 3NF, while r1(AlbumNo, Artist), r2(AlbumNo, songName, fileFormat) are. Why?
Some more on NFs – BCNF
• Def. of BCNF: r(R), given applicable FDs F, is in BCNF iff for every XA that holds for r, X is a superkey for r according to F.
• Clearly stronger than 3NF. • e.g.: {S(treet), (s)T(ate), Z(ip)}, with
STZ, ZT. – it’s in 3NF. • Not in BCNF. What’s the redundancy here? • {ZT, ZS} is in BCNF. • Can you anticipate any problems with the
design {ZT, ZS}? • Majority of 3NF designs in practice tend to
be in BCNF.
More on ICs
• Inclusion depencency (IND): r1(Emp, Dept, Sal), r2(Mgr, Dept, Budget). r2[Mgr] r1[Emp]: every manager is an employee.
• INDs are a generalization of referential integrity constraints.
• FDs, INDs. – most fundamental. • What can we say about reasoning
with INDs, or with INDs + FDs?
More on INDs
• Good news -- a complete axiomatization. • Reflexivity: r[X]r[X]. • Projection+permutation: if r[A1,…,Am]
s[B1…Bm], then r[Ai1…Aik] s[Bi1…Bik]. • Transitivity: if r[X] s[Y] and s[Y]
t[Z], then r[X] t[Z]. • But, while testing whether F |= f can be
done in linear time, testing I |= i is PSPACE-complete!
Story of FDs+INDs
• Recall: FDs admit complete axiomatization, as do INDs.
• But, put together, there is no complete (k-ary) axiomatization for INDs+FDs!
• We have seen some (very) positive results and some negative results.
• What approaches can you think of for tackling the negative situations?
Last bits on ICs • There is a rich landscape of lot more classes of
ICs. • Fundamentally split into two syntactic classes:
– Equality generating dependencies (egds), e.g., FDs. – Tuple generating dependencies (tgds), e.g., INDs.
• The value is NOT that these fancier ICs arise as is in practice.
• Rather, they arise in solving various problems: e.g., integrating data from diverse sources.
• E.g. tgd: (\forall M,D,B) [r2(M, D, B) (\exists S) r1(M, D, S)].
• What is it saying, logically speaking?
More Reading
• J.D. Ullman: Principles of Database and Knowledge-Base Systems, vol. I and II. CS Press, 1988.
• R. Ramakrishnan & J. Gehrke. Database Management Sytems. Mc-Graw Hill, latest edition.
• S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. 1995. Available online from DBLP.
Some Resources for Research Literature
• DBLP • Citeseer• ACM digital library• If you aren’t already a member,
consider becoming ACM student member: really a great deal!
Some top conferences/journals for DMM Research
• ACM SIGMOD/PODS • VLDB / PVLDB • IEEE ICDE • ICDT / EDBT • KDD, PKDD, ICDM, SDM, … • CIKM • WWW, WSDM, ICWSM, • TODS, IS, VLDBJ, TKDE, TCS, JACM,
JCSS, I&C, …
Recommended