1
Lecture 10: Database Design
XML
Wednesday, October 20, 2004
2
Outline
• Design of a Relational schema (3.6)
• XML
3
Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Third Normal Form (3NF) = this lecture
Boyce Codd Normal Form (BCNF) = this lecture
Others...
4
Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
In English (though a bit vague):
Whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.
A relation R is in BCNF if:
If A1, ..., An → B is a non-trivial dependency in R, then {A1, ..., An} is a super-key for R
5
BCNF Decomposition Algorithm
Repeat
  choose A1, …, Am → B1, …, Bn that violates the BCNF condition
  split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
  continue with both R1 and R2
Until no more violations
[Figure: the attributes of R grouped into B's, A's, and Others; R1 covers the A's and B's, R2 covers the A's and Others]
Is there a 2-attribute relation that is not in BCNF?
In practice, we have a better algorithm (next):
6
BCNF Decomposition Algorithm
BCNF_Decompose(R)
  find X s.t.: X ≠ X+ ≠ [all attributes]
  if (not found) then “R is in BCNF”
  else
    let Y = X+ - X
    let Z = [all attributes] - X+
    decompose into R1(X ∪ Y) and R2(X ∪ Z)
    BCNF_Decompose(R1)
    BCNF_Decompose(R2)
7
Example BCNF Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN → name, age
age → hairColor
Find X s.t.: X ≠ X+ ≠ [all attributes]
Iteration 1: Person
  SSN+ = SSN, name, age, hairColor
  Decompose into: P(SSN, name, age, hairColor)
                  Phone(SSN, phoneNumber)
Iteration 2: P
  age+ = age, hairColor
  Decompose: People(SSN, name, age)
             Hair(age, hairColor)
             Phone(SSN, phoneNumber)
What is the key?
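The recursive procedure can be sketched in Python as follows. This is a minimal sketch, not the course's reference implementation: the names `closure`, `bcnf_decompose`, and `fd` are made up for illustration, subsets are enumerated by brute force, and the closure is intersected with the sub-relation's attributes as a stand-in for projecting the FDs onto it.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return attrs+ under fds, given as a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def bcnf_decompose(rel, fds):
    """Split rel (a frozenset of attributes) until no X has X != X+ != rel."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & rel      # closure restricted to rel
            if x != x_plus and x_plus != rel:
                r1 = x_plus                      # X ∪ Y, where Y = X+ - X
                r2 = x | (rel - x_plus)          # X ∪ Z, where Z = rel - X+
                return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [rel]                                 # no violating X was found

def fd(lhs, rhs):
    """Hypothetical helper: build one functional dependency lhs -> rhs."""
    return (frozenset(lhs), frozenset(rhs))

person = frozenset({"name", "SSN", "age", "hairColor", "phoneNumber"})
fds = [fd({"SSN"}, {"name", "age"}), fd({"age"}, {"hairColor"})]
result = bcnf_decompose(person, fds)
for rel in result:
    print(sorted(rel))
```

On the Person example the sketch ends with the same three relations as the slide: (age, hairColor), (SSN, name, age), and (SSN, phoneNumber).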
8
Other Example
• R(A,B,C,D)   A → B, B → C
• Iteration 1: X = A: A+ = ABC – split R into R1(A,B,C), R2(A,D)
• Iteration 2: X = B: B+ = BC – split R1 into R3(B,C), R4(A,B); the result is R3(B,C), R4(A,B), R2(A,D)
• What happens if at iteration 1 we pick X = AB?
What is the key?
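Closures and keys for this example can be checked mechanically. The sketch below is illustrative only (the function names are assumptions, and the search is brute force over all attribute subsets); it computes A+, AB+, and the candidate keys of R(A,B,C,D).

```python
from itertools import combinations

def closure(attrs, fds):
    """Return attrs+ under fds, given as a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(all_attrs, fds):
    """Return every minimal attribute set whose closure covers all_attrs."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = frozenset(combo)
            if any(k <= x for k in keys):
                continue                  # contains a smaller key: not minimal
            if closure(x, fds) >= all_attrs:
                keys.append(x)
    return keys

attrs = frozenset("ABCD")
fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
a_plus = closure("A", fds)
ab_plus = closure("AB", fds)
keys = candidate_keys(attrs, fds)
print(sorted(a_plus), sorted(ab_plus), [sorted(k) for k in keys])
```

Here A+ and AB+ come out equal (both ABC), and the only candidate key found is AD.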
9
3NF: A Problem with BCNF
R(Unit, Company, Product)
Unit → Company
Company, Product → Unit
Unit+ = Unit, Company
Unit → Company violates BCNF, so decompose into:
R1(Unit, Company) with Unit → Company
R2(Unit, Product) with no nontrivial FDs
We lose the FD: Company, Product → Unit !!
10
So What’s the Problem?
No problem so far. All local FDs are satisfied.
Unit       Company
Galaga99   UW
Bingo      UW
(FD: Unit → Company)

Unit       Product
Galaga99   Databases
Bingo      Databases

Let’s put all the data back into a single table again:
Unit       Company   Product
Galaga99   UW        Databases
Bingo      UW        Databases
Violates the FD: Company, Product → Unit
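The same check can be run on the slide's data. The sketch below (helper names `natural_join` and `satisfies_fd` are made up for illustration) joins the two small tables and tests each FD against the rows.

```python
def natural_join(left, right):
    """Join two lists of dict rows on every column name they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[c] == r_row[c] for c in shared):
                joined.append({**l_row, **r_row})
    return joined

def satisfies_fd(rows, lhs, rhs):
    """True if rows that agree on the lhs columns also agree on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

unit_company = [{"Unit": "Galaga99", "Company": "UW"},
                {"Unit": "Bingo", "Company": "UW"}]
unit_product = [{"Unit": "Galaga99", "Product": "Databases"},
                {"Unit": "Bingo", "Product": "Databases"}]

joined = natural_join(unit_company, unit_product)
local_ok = satisfies_fd(unit_company, ["Unit"], ["Company"])
global_ok = satisfies_fd(joined, ["Company", "Product"], ["Unit"])
print(local_ok, global_ok)
```

The local FD holds on its own table, while the joined table has two rows with the same (Company, Product) but different Units, so `global_ok` is False.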
11
The Problem
• We started with a table R and FD
• We decomposed R into BCNF tables R1, R2, … with their own FD1, FD2, …
• We can reconstruct R from R1, R2, …
• But we cannot reconstruct FD from FD1, FD2, …
12
Solution: 3rd Normal Form (3NF)
A simple condition for removing anomalies from relations:
A relation R is in 3rd normal form if:
Whenever there is a nontrivial dependency A1, A2, ..., An → B for R, then {A1, A2, ..., An} is a super-key for R, or B is part of a key.
Tradeoff:
BCNF = no anomalies, but may lose some FDs
3NF = keeps all FDs, but may have some anomalies
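The difference between the two conditions can be made concrete on R(Unit, Company, Product). The following is a brute-force sketch with hypothetical helper names; it assumes the FDs mention only attributes of R, and lists each pair (X, B) where X → B is nontrivial, X is not a super-key, and (for 3NF only) B is not part of any key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return attrs+ under fds, given as a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def subsets(attrs):
    """Yield all non-empty subsets of attrs in increasing size."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def prime_attributes(all_attrs, fds):
    """Union of all candidate keys (minimal sets whose closure is everything)."""
    keys = []
    for x in subsets(all_attrs):
        if not any(k <= x for k in keys) and closure(x, fds) >= all_attrs:
            keys.append(x)
    return frozenset().union(*keys)

def violations(all_attrs, fds, allow_prime):
    """allow_prime=False checks BCNF; allow_prime=True checks 3NF."""
    prime = prime_attributes(all_attrs, fds) if allow_prime else frozenset()
    found = []
    for x in subsets(all_attrs):
        x_plus = closure(x, fds)
        if x_plus >= all_attrs:
            continue                      # X is a super-key: never a violation
        for b in sorted(x_plus - x - prime):
            found.append((x, b))
    return found

r = frozenset({"Unit", "Company", "Product"})
fds = [(frozenset({"Unit"}), frozenset({"Company"})),
       (frozenset({"Company", "Product"}), frozenset({"Unit"}))]
bcnf = violations(r, fds, allow_prime=False)
third = violations(r, fds, allow_prime=True)
print(bcnf, third)
```

Under these FDs the keys are {Company, Product} and {Product, Unit}, so every attribute is part of some key: Unit → Company is reported as a BCNF violation but not as a 3NF violation.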
13
3NF Decomposition Algorithm
3NF_Decompose(R)
  let K = [all attributes that are part of some key]
  find X s.t.: X+ - X - K ≠ ∅ and X+ ≠ [all attributes]
  if (not found) then “R is in 3NF”
  else
    let Y = X+ - X - K
    let Z = [all attributes] - (X ∪ Y)
    decompose into R1(X ∪ Y) and R2(X ∪ Z)
    3NF_Decompose(R1)
    3NF_Decompose(R2)
14
Example of 3NF decomposition
R(A,B,C,D,E):
AB → C
C → D
D → B
D → E
Keys: (need to compute X+, for several Xs) AB, AC, AD
K = {A, B, C, D}
Pick X = C
C+ = BCDE
C → BDE is a BCNF violation
For 3NF: remove B, D (part of K):
C → E is a 3NF violation
Decompose: R1(C, E), R2(A,B,C,D)
R1 is in 3NF
R2 is in 3NF (because its keys: AB, AC, AD)
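The 3NF_Decompose pseudocode from the previous slide can be transcribed almost line by line. This is a sketch under assumptions: the function names are invented, subsets are tried smallest first in alphabetical order (so X = C is the first violation found here), and closures are intersected with the sub-relation's attributes in place of projecting the FDs.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return attrs+ under fds, given as a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def subsets(attrs):
    """Yield all non-empty subsets of attrs in increasing size."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def prime_attributes(rel, fds):
    """K in the pseudocode: all attributes that are part of some key of rel."""
    keys = []
    for x in subsets(rel):
        if not any(k <= x for k in keys) and closure(x, fds) & rel == rel:
            keys.append(x)
    return frozenset().union(*keys)

def decompose_3nf(rel, fds):
    """Follow 3NF_Decompose; returns the list of resulting relations."""
    k = prime_attributes(rel, fds)
    for x in subsets(rel):
        x_plus = closure(x, fds) & rel
        y = x_plus - x - k                # Y = X+ - X - K
        if y and x_plus != rel:
            z = rel - (x | y)             # Z = [all attributes] - (X ∪ Y)
            return decompose_3nf(x | y, fds) + decompose_3nf(x | z, fds)
    return [rel]                          # no violating X: rel is in 3NF

r = frozenset("ABCDE")
fds = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("D")),
       (frozenset("D"), frozenset("B")), (frozenset("D"), frozenset("E"))]
k = prime_attributes(r, fds)
result = decompose_3nf(r, fds)
print(sorted(k), [sorted(rel) for rel in result])
```

On this input the sketch reproduces K = {A, B, C, D} and the decomposition R1(C, E), R2(A, B, C, D).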
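15
3NF vs. BCNF Decomposition
[Figure: decomposition tree. A B C D E F G H K splits into A B C D E and E F G H K, then into A B C, C D E, E F G, and G H K, and finally into two-attribute relations; labels mark the 3NF level and the BCNF level]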
16
XML Outline
• XML (4.6, 4.7)
  – This lecture: syntax, semistructured data
  – Next lectures: DTDs, XPath, XQuery
17
Additional Readings on XML
• XQuery from the Experts, Katz, Ed. – The reference on XQuery
• http://www.w3.org/XML/1999/XML-in-10-points
• www.zvon.org/xxl/XMLTutorial/General/book_en.html
• http://db.bell-labs.com/galax/
• Main source: www.w3.org (but hard to read)
18
XML
• eXtensible Markup Language
• XML 1.0 – a recommendation from W3C, 1998
• Roots: SGML (a very nasty language).
• After the roots: a format for sharing data
19
XML Data
• Relational data does not have a syntax
  – I can’t “give” you my relational database
  – Need to import it from some other syntax, like CSV (comma-separated-values)
• XML = rich syntax for data
  – But XML is not relational: semistructured
• Usage:
  – Map any data to XML
  – Store it in files, exchange on the Web, etc.
  – Even query it directly, using XPath, XQuery
20
From HTML to XML
HTML describes the presentation
21
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
22
XML
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book>
…
</bibliography>
XML describes the content
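Because the tags name the content, a program can pull fields out by name. A minimal sketch using Python's standard `xml.etree.ElementTree`; the elided title is filled in from the HTML slide, and the dictionary layout is only an illustration.

```python
import xml.etree.ElementTree as ET

doc = """
<bibliography>
  <book> <title> Foundations of Databases </title>
         <author> Abiteboul </author>
         <author> Hull </author>
         <author> Vianu </author>
         <publisher> Addison Wesley </publisher>
         <year> 1995 </year>
  </book>
</bibliography>
"""

root = ET.fromstring(doc.strip())
books = []
for book in root.findall("book"):
    books.append({
        "title": book.findtext("title").strip(),
        "authors": [a.text.strip() for a in book.findall("author")],
        "year": int(book.findtext("year")),   # int() tolerates the padding
    })
print(books)
```

Doing the same with the HTML version would mean guessing which italic run is a title and which line holds the year.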
23
XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red> abbrv. <red/>
• an XML document: single root element
well formed XML document: if it has matching tags
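One way to test well-formedness is to hand the text to a parser and see whether it is rejected. A small sketch with the standard library; `is_well_formed` is a made-up helper name and the three inputs are illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the parser accepts text as a single well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = is_well_formed("<book><title>Foundations</title><red/></book>")
mismatched = is_well_formed("<book><title>Foundations</book></title>")
two_roots = is_well_formed("<book/><book/>")
print(good, mismatched, two_roots)
```

The second input fails because the tags do not nest properly, and the third because a document must have a single root element.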
24
More XML: Attributes
<book price = "55" currency = "USD">
<title> Foundations of Databases </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
</book>
attributes are alternative ways to represent data
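To illustrate that attributes and subelements carry the same information, the sketch below reads the attributes of the `<book>` element and rebuilds them as child elements. The rebuilt element `alt` is purely illustrative.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("""
<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  <year> 1995 </year>
</book>
""".strip())

price = book.get("price")             # attribute values are always strings
currency = book.get("currency")

# The same data expressed as subelements instead of attributes.
alt = ET.Element("book")
for name, value in book.attrib.items():
    ET.SubElement(alt, name).text = value
print(ET.tostring(alt, encoding="unicode"))
```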
25
More XML: Oids and References
<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name>
<children idref="o123 o555"/>
</person>
<person id="o123" mother="o456"><name>John</name>
</person>
oids and references in XML are just syntax
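Since the references are just strings, the application has to resolve them itself. A minimal sketch: the `<people>` wrapper is an assumption added so the fragment has a single root, and `name_of` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

doc = """
<people>
  <person id="o555"> <name> Jane </name> </person>
  <person id="o456"> <name> Mary </name>
    <children idref="o123 o555"/>
  </person>
  <person id="o123" mother="o456"> <name>John</name> </person>
</people>
"""

root = ET.fromstring(doc.strip())
by_id = {p.get("id"): p for p in root.iter("person")}   # oid -> element

def name_of(oid):
    """Follow a reference by looking the oid up in the index."""
    return by_id[oid].findtext("name").strip()

mary = by_id["o456"]
children = [name_of(ref) for ref in mary.find("children").get("idref").split()]
mother_of_john = name_of(by_id["o123"].get("mother"))
print(children, mother_of_john)
```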
26
More XML: CDATA Section
• Syntax: <![CDATA[ .....any text here...]]>
• Example:
<example> <![CDATA[ some text here </notAtag> <>]]></example>
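A quick way to see the effect of a CDATA section is to parse the example and inspect the result. In this sketch (standard library only) the markup-like characters arrive as plain text and no child element is created.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring(
    "<example><![CDATA[ some text here </notAtag> <>]]></example>"
)
text = elem.text          # the CDATA content, taken literally
children = list(elem)     # stays empty: </notAtag> was never parsed as a tag
print(repr(text), children)
```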
27
More XML: Entity References
• Syntax: &entityname;
• Example: <element> this is less than &lt; </element>
• Some entities:
  &lt;    <
  &gt;    >
  &amp;   &
  &apos;  '
  &quot;  "
  &#38;   Unicode char
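Parsers expand entity references on input, and writers must escape the special characters on output. A short sketch using the standard library's `xml.sax.saxutils`; the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Reading: the parser turns &lt; back into "<".
parsed = ET.fromstring("<element> this is less than &lt; </element>").text

# A numeric character reference: &#38; is the character "&".
numeric = ET.fromstring("<e>&#38;</e>").text

# Writing: escape() replaces &, < and > so the text is safe inside an element.
escaped = escape("a < b & c > d")
roundtrip = unescape(escaped)
print(repr(parsed), numeric, escaped, roundtrip)
```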
28
More XML: Processing Instructions
• Syntax: <?target argument?>
• Example:
<product> <name> Alarm Clock </name> <?ringBell 20?> <price> 19.99 </price></product>
• What do they mean?
29
More XML: Comments
• Syntax <!-- .... Comment text... -->
• Yes, they are part of the data model !!!
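A DOM parser makes this visible: comments, like processing instructions, show up as nodes next to the elements. A sketch using the standard `xml.dom.minidom`; the comment text "sale price" is a made-up addition to the Alarm Clock example.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<product><name>Alarm Clock</name><?ringBell 20?>"
    "<!-- sale price --><price>19.99</price></product>"
)

kinds = []
for node in doc.documentElement.childNodes:
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        kinds.append(("pi", node.target, node.data))
    elif node.nodeType == node.COMMENT_NODE:
        kinds.append(("comment", node.data))
    elif node.nodeType == node.ELEMENT_NODE:
        kinds.append(("element", node.tagName))
print(kinds)
```

What a target such as `ringBell` means is left entirely to the application; the parser only reports it.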
30
XML Namespaces
• http://www.w3.org/TR/REC-xml-names (1/99)
• name ::= [prefix:]localpart
<book xmlns:isbn="www.isbn-org.org/def">
<title> … </title>
<number> 15 </number>
<isbn:number> …. </isbn:number>
</book>
31
XML Namespaces
• syntactic: <number> , <isbn:number>
• semantic: provide URL for schema
<tag xmlns:mystyle = "http://…">
…
<mystyle:title> … </mystyle:title>
<mystyle:number> …
</tag>
The mystyle: elements belong to this namespace
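A parser keeps `<number>` and `<isbn:number>` apart by expanding the prefix to the namespace name. A sketch with `xml.etree.ElementTree`, reusing the isbn example; the element contents ("example title", "example-isbn") are placeholders, since the slide elides them.

```python
import xml.etree.ElementTree as ET

doc = """
<book xmlns:isbn="www.isbn-org.org/def">
  <title> example title </title>
  <number> 15 </number>
  <isbn:number> example-isbn </isbn:number>
</book>
"""

root = ET.fromstring(doc.strip())
tags = [child.tag for child in root]      # prefixes become {namespace}local

ns = {"isbn": "www.isbn-org.org/def"}     # prefix map used for lookups
plain = root.find("number").text.strip()
qualified = root.find("isbn:number", ns).text.strip()
print(tags, plain, qualified)
```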
32
From Relational Data to XML Data
Persons
Name   Phone
John   3634
Sue    6343
Dick   6363

XML:
<persons>
  <row> <name>John</name> <phone>3634</phone> </row>
  <row> <name>Sue</name> <phone>6343</phone> </row>
  <row> <name>Dick</name> <phone>6363</phone> </row>
</persons>
[Figure: tree with root persons and three row children; each row has a name leaf ("John", "Sue", "Dick") and a phone leaf (3634, 6343, 6363)]
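The mapping from a table to XML is mechanical: one element for the table, one `<row>` per tuple, one child per column. A minimal sketch; `table_to_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def table_to_xml(table_name, columns, rows):
    """Build <table_name> with one <row> per tuple and one child per column."""
    root = ET.Element(table_name)
    for row in rows:
        row_elem = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            ET.SubElement(row_elem, col).text = str(value)
    return root

persons = table_to_xml(
    "persons", ["name", "phone"],
    [("John", 3634), ("Sue", 6343), ("Dick", 6363)],
)
xml_text = ET.tostring(persons, encoding="unicode")
print(xml_text)
```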
33
XML Data
• XML is self-describing
• Schema elements become part of the data
  – Relational schema: persons(name,phone)
  – In XML <persons>, <name>, <phone> are part of the data, and are repeated many times
• Consequence: XML is much more flexible
• XML = semistructured data
34
Semi-structured Data Explained
• Missing attributes:
<person> <name> John</name> <phone>1234</phone> </person>
<person> <name>Joe</name></person>
no phone!
• Could represent in a table with nulls
name   phone
John   1234
Joe    -
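Turning these elements into rows makes the null explicit. In this sketch the `<persons>` wrapper is an assumption added to give the fragment a single root, and Python's `None` stands in for the relational null.

```python
import xml.etree.ElementTree as ET

doc = """<persons>
<person> <name> John</name> <phone>1234</phone> </person>
<person> <name>Joe</name></person>
</persons>"""

rows = []
for p in ET.fromstring(doc).findall("person"):
    name = p.findtext("name").strip()
    phone = p.findtext("phone")           # None when <phone> is absent
    rows.append((name, phone.strip() if phone is not None else None))
print(rows)
```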
35
Semi-structured Data Explained
• Repeated attributes
<person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone></person>
two phones!
• Impossible in tables:
name   phone
Mary   2345 3456 ???
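In XML the repeated element is simply a list of children. The sketch below reads both phones and, as one possible relational workaround that the slide does not prescribe, emits one (name, phone) row per phone.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring(
    "<person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone></person>"
)
name = person.findtext("name").strip()
phones = [ph.text.strip() for ph in person.findall("phone")]

# One row per phone: Mary now appears twice instead of in a single row.
phone_rows = [(name, ph) for ph in phones]
print(phones, phone_rows)
```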
36
Semistructured Data Explained
• Attributes with different types in different objects
<person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone></person>
structured name!
• Nested collections (no 1NF)
• Heterogeneous collections:
  – <db> contains both <book>s and <publisher>s