1
Formally Verifying the Third Normalization of Relational Databases
AIST, Yoichi Hirai
2013-11-22, Nagano (TPP 2013)
2
ACID properties of database systems
• Atomicity: Changes are applied in an “all or nothing” manner. Partial changes must be rolled back.
• Consistency: Changes on valid states result in valid states.
• Isolation: Even concurrent changes behave as if they were executed one after another.
• Durability: Once changes are applied, they remain forever unless overwritten.
3
Anomalies: failures of consistency
• Update anomaly
titleID | Title | Author | Library
3 | Istanbul | Orhan Pamuk | Central
3 | Istanbul | Orhan Pamuk | East
After an incomplete update:
titleID | Title | Author | Library
3 | Istanbul | Orhan Pamuk | Central
3 | My Name Is Red | Orhan Pamuk | East
We tried to change the title but failed to change all occurrences, so consistency is violated.
4
Anomalies: failures of consistency
• Deletion anomaly
FacultyID | Faculty name | Faculty hire date | Course name | Course day | Course time
33 | R. Wavey | 1951-09-01 | Physics 2A | Wed | 15:00-
34 | … | … | … | … | …
After deleting the course Physics 2A:
FacultyID | Faculty name | Faculty hire date | Course name | Course day | Course time
34 | … | … | … | … | …
We only meant to remove a course, but the faculty member was removed as a result.
5
Codd’s first normal form
titleID | Title | Author | Library | Library
3 | Istanbul | Orhan Pamuk | East | Central
5
The first normal form excludes repeating the same attribute (here, Library appears twice).
6
Functional dependencies
titleID | Title | Author | Library
3 | Istanbul | Orhan Pamuk | Central
3 | Istanbul | Orhan Pamuk | East
{titleID} → {Title, Author}
{titleID, Library} → {titleID, Title, Author, Library}
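A functional dependency can be checked against one concrete table, although a single instance can only refute a dependency, never prove that it holds in every future state. The Python sketch below assumes a hypothetical encoding of rows as dictionaries from attribute names to values; the function name `fd_holds` is illustrative and not taken from the slides.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y on one table: rows that agree on every attribute in lhs
    must also agree on every attribute in rhs."""
    lhs, rhs = sorted(lhs), sorted(rhs)  # fixed attribute order for the tuples
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


library = [
    {"titleID": 3, "Title": "Istanbul", "Author": "Orhan Pamuk", "Library": "Central"},
    {"titleID": 3, "Title": "Istanbul", "Author": "Orhan Pamuk", "Library": "East"},
]
print(fd_holds(library, {"titleID"}, {"Title", "Author"}))  # True
print(fd_holds(library, {"titleID"}, {"Library"}))          # False
```

Applied to the table left behind by the update anomaly, where the two rows for titleID 3 carry different titles, the same check reports that {titleID} → {Title, Author} is violated.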
7
Functional dependencies
FacultyID | Faculty name | Faculty hire date | Course name | Course day | Course time
33 | R. Wavey | 1951-09-01 | Physics 2A | Wed | 15:00-
34 | … | … | … | … | …
{FacultyID} → {Faculty name, Faculty hire date}
{FacultyID, Course name} → {Course day, Course time}
{FacultyID, Course name} → {FacultyID, Faculty name, Faculty hire date, Course name, Course day, Course time}
8
Armstrong’s laws
• Mizar has a formalization of these laws, including soundness and completeness with respect to the relational semantics.
1. Reflexivity: Y ⊆ X implies X → Y
2. Augmentation: Z ⊆ W and X → Y imply X ∪ W → Y ∪ Z
3. Transitivity: X → Y and Y → Z imply X → Z
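Derivability under Armstrong’s laws is usually decided through the attribute closure X⁺: X → Y is derivable exactly when Y ⊆ X⁺. The Python sketch below uses that standard textbook procedure (not the Mizar or Coq formalization); the helper names `fd`, `closure`, and `implies` are hypothetical.

```python
def fd(lhs, rhs):
    """A functional dependency as a pair (left-hand side, right-hand side)."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure: keep adding right-hand sides of dependencies whose
    left-hand side is already covered, until nothing changes."""
    result = set(attrs)  # reflexivity: X -> X
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def implies(fds, lhs, rhs):
    """X -> Y is derivable from fds iff Y is a subset of the closure of X."""
    return frozenset(rhs) <= closure(lhs, fds)


FDS = [fd(["Tournament", "Year"], ["Winner"]),
       fd(["Winner"], ["Winner Date of Birth"])]
print(sorted(closure({"Tournament", "Year"}, FDS)))
# ['Tournament', 'Winner', 'Winner Date of Birth', 'Year']
print(implies(FDS, {"Tournament", "Year"}, {"Winner Date of Birth"}))  # True
print(implies(FDS, {"Winner"}, {"Year"}))  # False
```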
9
Codd’s second normal form
• Excludes this:
FacultyID | Faculty name | Faculty hire date | Course name | Course day | Course time
33 | R. Wavey | 1951-09-01 | Physics 2A | Wed | 15:00-
34 | … | … | … | … | …
This table is excluded because of these conditions:
1. {FacultyID, Course name} is a minimal set X with the functional dependency X → {FacultyID, Faculty name, Faculty hire date, Course name, Course day, Course time} ({FacultyID, Course name} is a candidate key).
2. Faculty hire date is not contained in any candidate key (Faculty hire date is a non-prime attribute).
3. Faculty hire date depends on {FacultyID}, which is a proper subset of the candidate key {FacultyID, Course name}.
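The three conditions can be checked mechanically on small schemas. This Python sketch assumes the two dependencies listed for the faculty table, finds candidate keys by brute force over attribute subsets (exponential, so only suitable for tiny examples), and reports non-prime attributes that depend on a proper subset of a candidate key. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole schema,
    found by trying subsets in increasing size."""
    attrs = sorted(attrs)
    keys = []
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys


def partial_dependencies(attrs, fds):
    """Pairs (S, A): non-prime A depends on S, a proper subset of a candidate key."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    found = set()
    for key in keys:
        for size in range(len(key)):  # proper subsets only
            for sub in combinations(sorted(key), size):
                for a in closure(sub, fds) - set(sub) - prime:
                    found.add((frozenset(sub), a))
    return found


ATTRS = {"FacultyID", "Faculty name", "Faculty hire date",
         "Course name", "Course day", "Course time"}
FDS = [(frozenset({"FacultyID"}), frozenset({"Faculty name", "Faculty hire date"})),
       (frozenset({"FacultyID", "Course name"}), frozenset({"Course day", "Course time"}))]
print([sorted(k) for k in candidate_keys(ATTRS, FDS)])  # [['Course name', 'FacultyID']]
print(sorted(a for _, a in partial_dependencies(ATTRS, FDS)))
# ['Faculty hire date', 'Faculty name']
```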
10
The third normal form
• Excludes this (example from Wikipedia):
Tournament | Year | Winner | Winner Date of Birth
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975
Des Moines Masters | 1999 | Al Fredrickson | 21 July 1975
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977
This table is excluded because the non-prime attribute “Winner Date of Birth” is transitively dependent on a candidate key. Concretely:
1. “Winner Date of Birth” is a non-prime attribute.
2. {Tournament, Year} is a candidate key.
3. {Tournament, Year} → {Winner} holds.
4. {Winner} → {Tournament, Year} does not hold.
5. {Winner} → {Winner Date of Birth} holds.
6. “Winner Date of Birth” is not in {Tournament, Year}.
7. “Winner Date of Birth” is not in {Winner}.
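The sketch below tests 3NF with the common textbook formulation rather than the seven conditions above: for each given dependency X → A with A ∉ X, either X is a superkey or A is prime. For a single schema with a given dependency set, checking only the given dependencies is known to suffice. The Python code and its names are hypothetical, and the key search is brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(attrs)
    keys = []
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys


def third_nf_violations(attrs, fds):
    """Given FDs X -> A (A not in X) where X is not a superkey and A is non-prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attrs)
        for a in sorted(rhs - lhs):
            if not is_superkey and a not in prime:
                violations.append((lhs, a))
    return violations


ATTRS = {"Tournament", "Year", "Winner", "Winner Date of Birth"}
FDS = [(frozenset({"Tournament", "Year"}), frozenset({"Winner"})),
       (frozenset({"Winner"}), frozenset({"Winner Date of Birth"}))]
print(third_nf_violations(ATTRS, FDS))
# [(frozenset({'Winner'}), 'Winner Date of Birth')]
```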
11
Obtaining the third normal form: the input and output
• Input: a finite set of functional dependencies
• Output: a finite set of relations and their keys (in 3NF)
[Diagram: input dependencies {Tournament, Year} → {Winner} and {Winner} → {Winner Date of Birth}]
Output relations: {Tournament, Year, Winner} and {Winner, Winner Date of Birth}
12
Bernstein’s algorithm 1 [Bernstein, 1976]
Obtained after two earlier erroneous attempts!
13
Bernstein’s algorithm 1, step 1: Eliminating extraneous attributes
[Diagram: functional dependencies among Tournament, Year, Winner, Winner Date of Birth, and Place, before and after removing extraneous attributes]
The result is smaller but equivalent: both sets have the same closure under Armstrong’s laws.
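Step 1 can be sketched in Python as follows, assuming dependencies are pairs of frozensets. An attribute a is dropped from the left-hand side X of X → Y when (X − {a}) → Y is already derivable, which keeps the closure unchanged. The dependency set in the example, including the Place dependencies, is hypothetical and only meant to trigger one removal.

```python
def closure(attrs, fds):
    """Attribute closure under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def remove_extraneous(fds):
    """Shrink each left-hand side X of X -> Y by dropping any attribute a for
    which (X - {a}) -> Y is already derivable from the current set."""
    result = list(fds)
    for i in range(len(result)):
        lhs, rhs = result[i]
        for a in sorted(lhs):
            smaller = lhs - {a}
            if rhs <= closure(smaller, result):
                lhs = smaller
                result[i] = (lhs, rhs)  # same closure, smaller left-hand side
    return result


# Hypothetical input: Place is extraneous in the second dependency.
FDS = [(frozenset({"Tournament", "Year"}), frozenset({"Place"})),
       (frozenset({"Tournament", "Year", "Place"}), frozenset({"Winner"}))]
for lhs, rhs in remove_extraneous(FDS):
    print(sorted(lhs), "->", sorted(rhs))
# ['Tournament', 'Year'] -> ['Place']
# ['Tournament', 'Year'] -> ['Winner']
```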
14
Bernstein’s algorithm 1, step 2: Finding a nonredundant cover
• A set of functional dependencies is nonredundant when no element can be inferred from the others using Armstrong’s laws.
• Step 2 removes functional dependencies until the whole set becomes nonredundant.
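A minimal Python sketch of step 2, assuming the same pair-of-frozensets encoding: scan the list and drop any dependency whose right-hand side is already in the closure of its left-hand side under the remaining dependencies. The result can depend on the scan order; the names and example are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def nonredundant(fds):
    """Drop every dependency that the remaining ones already imply."""
    result = list(fds)
    i = 0
    while i < len(result):
        lhs, rhs = result[i]
        others = result[:i] + result[i + 1:]
        if rhs <= closure(lhs, others):
            result = others  # redundant; do not advance i
        else:
            i += 1
    return result


TY = frozenset({"Tournament", "Year"})
W = frozenset({"Winner"})
DOB = frozenset({"Winner Date of Birth"})
FDS = [(TY, W), (W, DOB), (TY, DOB)]  # the last one follows by transitivity
print(nonredundant(FDS) == [(TY, W), (W, DOB)])  # True
```

A single pass suffices: dropping a dependency only shrinks the set, so a dependency that was not derivable when it was kept cannot become derivable later.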
15
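Bernstein’s algorithm 1, step 3: Partition
[Diagram: functional dependencies among Tournament, Year, Winner, Winner Date of Birth, and Place]
The two functional dependencies {Tournament, Year} → {Winner} and {Tournament, Year} → {Place} share their left-hand side, so they are placed in the same group.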
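Grouping by left-hand side is a dictionary lookup once left-hand sides are hashable frozensets. This Python sketch groups only dependencies with identical left-hand sides, as described on the slide; the dependency {Tournament, Year} → {Place} in the example is inferred from the relations built in step 4, and all names are hypothetical.

```python
from collections import defaultdict


def partition(fds):
    """Group dependencies that share exactly the same left-hand side."""
    groups = defaultdict(list)
    for lhs, rhs in fds:
        groups[lhs].append((lhs, rhs))
    return dict(groups)


FDS = [(frozenset({"Tournament", "Year"}), frozenset({"Winner"})),
       (frozenset({"Winner"}), frozenset({"Winner Date of Birth"})),
       (frozenset({"Tournament", "Year"}), frozenset({"Place"}))]
for lhs, group in partition(FDS).items():
    print(sorted(lhs), len(group))
# ['Tournament', 'Year'] 2
# ['Winner'] 1
```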
16
Bernstein’s algorithm 1, step 4: Construct relations
[Diagram: the grouped functional dependencies]
Relation 1: {Winner, Winner Date of Birth}, with key {Winner}
Relation 2: {Tournament, Year, Place, Winner}, with key {Tournament, Year}
These relations are in the third normal form. Why?
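Step 4 can be sketched as building one relation per group: the attributes are the shared left-hand side plus all right-hand sides, and the left-hand side serves as the key. This simplified Python sketch assumes steps 1 to 3 have already been applied and does not attempt to reproduce every detail of Bernstein’s construction; names are hypothetical.

```python
def construct_relations(fds):
    """One relation per distinct left-hand side X: the attributes are X plus
    every right-hand side sharing X, and X serves as the relation's key."""
    relations = {}
    for lhs, rhs in fds:
        relations.setdefault(lhs, set(lhs)).update(rhs)
    return [(frozenset(attrs), key) for key, attrs in relations.items()]


TY = frozenset({"Tournament", "Year"})
W = frozenset({"Winner"})
FDS = [(TY, W), (TY, frozenset({"Place"})), (W, frozenset({"Winner Date of Birth"}))]
for attrs, key in construct_relations(FDS):
    print(sorted(attrs), "key:", sorted(key))
# ['Place', 'Tournament', 'Winner', 'Year'] key: ['Tournament', 'Year']
# ['Winner', 'Winner Date of Birth'] key: ['Winner']
```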
17
Formalization Strategies
• Never mention the relational semantics.
• Attributes are just elements of a type (with equality).
• A functional dependency is a pair of sequences of attributes.
• Derivations based on Armstrong’s laws are defined inductively.
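The formalization itself is in Coq and is not reproduced here. As a rough illustration of the same strategy, the Lean 4 sketch below (core Lean, no libraries) models attribute sets as predicates over an arbitrary attribute type rather than as sequences, and defines derivability inductively, with one constructor per Armstrong law plus one for the given dependencies. All names are hypothetical.

```lean
-- Hypothetical sketch: attribute sets as predicates over an attribute type `α`.
structure FD (α : Type) where
  lhs : α → Prop
  rhs : α → Prop

-- `Derivable F X Y`: X → Y follows from the dependencies in `F` by Armstrong's laws.
inductive Derivable {α : Type} (F : FD α → Prop) : (α → Prop) → (α → Prop) → Prop where
  -- every given dependency is derivable
  | given {X Y : α → Prop} : F ⟨X, Y⟩ → Derivable F X Y
  -- reflexivity: Y ⊆ X implies X → Y
  | refl {X Y : α → Prop} : (∀ a, Y a → X a) → Derivable F X Y
  -- augmentation: Z ⊆ W and X → Y imply X ∪ W → Y ∪ Z
  | aug {X Y Z W : α → Prop} :
      (∀ a, Z a → W a) → Derivable F X Y →
      Derivable F (fun a => X a ∨ W a) (fun a => Y a ∨ Z a)
  -- transitivity: X → Y and Y → Z imply X → Z
  | trans {X Y Z : α → Prop} : Derivable F X Y → Derivable F Y Z → Derivable F X Z
```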
18
Termination of algorithms (Coq computes only total, terminating functions)
• Termination of the closure (under Armstrong’s laws)
– The sizes converge because they are increasing and bounded.
– When the sizes converge, the closure converges.
• Termination of Bernstein’s algorithm 1
– This is easier because every step is a simplification of some kind.
– Each step repeats a simplification until nothing can be simplified further.
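The termination argument for the closure can be made explicit with a loop bound. In this Python sketch (hypothetical names), each round either adds at least one attribute or leaves the size unchanged; since the result grows inside a finite universe, at most len(universe) rounds can grow it, so the loop always returns before its bound is exhausted. This mirrors the size-convergence argument, not the actual Coq code.

```python
def closure_with_bound(attrs, fds):
    """Closure computed in rounds with an explicit bound on the number of rounds."""
    # Every attribute the closure can ever contain lies in this finite universe.
    universe = set(attrs).union(*(lhs | rhs for lhs, rhs in fds))
    result = frozenset(attrs)
    for _ in range(len(universe) + 1):
        grown = set(result)
        for lhs, rhs in fds:
            if lhs <= result:
                grown |= rhs
        # grown contains result, so an unchanged size means an unchanged set.
        if len(grown) == len(result):
            return result
        result = frozenset(grown)
    # Unreachable: the size can increase at most len(universe) times.
    raise AssertionError("closure did not converge within its bound")


FDS = [(frozenset({"Tournament", "Year"}), frozenset({"Winner"})),
       (frozenset({"Winner"}), frozenset({"Winner Date of Birth"}))]
print(sorted(closure_with_bound({"Tournament", "Year"}, FDS)))
# ['Tournament', 'Winner', 'Winner Date of Birth', 'Year']
```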
19
Proving Preservation Properties
• Each step preserves the closure of functional dependencies!
• This property holds without exception, so it is easy to formalize and prove (straightforward divide and conquer).
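Preservation of the closure can at least be spot-checked on examples: two dependency sets have the same closure exactly when every dependency of each set is derivable from the other. The Python sketch below (hypothetical names and example) tests the nonredundant-cover step on the tournament dependencies; it is a test, not a proof.

```python
def closure(attrs, fds):
    """Attribute closure under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def equivalent(f, g):
    """True iff f and g have the same closure: each set derives the other's members."""
    return (all(rhs <= closure(lhs, g) for lhs, rhs in f)
            and all(rhs <= closure(lhs, f) for lhs, rhs in g))


TY = frozenset({"Tournament", "Year"})
W = frozenset({"Winner"})
DOB = frozenset({"Winner Date of Birth"})
BEFORE = [(TY, W), (W, DOB), (TY, DOB)]
AFTER = [(TY, W), (W, DOB)]  # the transitive dependency has been dropped
print(equivalent(BEFORE, AFTER))      # True
print(equivalent(BEFORE, AFTER[:1]))  # False
```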
20
Proving 3NF
• Mostly followed the text (at first I omitted step 1, and the proof attempt failed).
• Changed it slightly to make formalization easier.
• Some proof steps are not yet entirely understood; refactoring should bring insight.
21
Some changes to Bernstein’s original proof
Removed this graphical reasoning:
• The root cause of such graphical objects: “If there exists a (graphical) derivation using a functional dependency g,”
• A reformulation: “If all (graphical) derivations use a functional dependency g,”
22
Amount of code
Parts | Lines of code | Comments
Properties of Armstrong’s laws & the closure operation | ~600 | Took ~100 lines to prove that a monotonic bounded sequence of natural numbers converges.
Definition of the steps; the steps keep closures; when the steps terminate, certain things are removed totally | ~700 | Somewhat boilerplate.
The whole algorithm produces 3NF | ~200 | A very involved, monolithic proof.
23
Still to be seen: Bernstein’s algorithm 2
• The number of relations produced by Bernstein’s algorithm 1 is not optimal
• Bernstein’s algorithm 2 gives optimal (= smallest) number of relations, answering Codd’s challenge.
• So far, only algorithm 1 has been formalized.
• Also still to be seen: multivalued dependencies and the fourth and fifth normal forms.