Page 1

Formally Verifying the Third Normalization of Relational Databases

AIST, Yoichi Hirai

2013-11-22, Nagano (TPP 2013)

Page 2

ACID properties of database systems

• Atomicity: Changes are applied in an “all or nothing” manner. Partial changes must be rolled back.

• Consistency: Changes on valid states result in valid states.

• Isolation: Even concurrent changes behave as if they were executed serially in time.

• Durability: Once changes are applied, they remain forever unless overwritten.

Page 3

Anomalies: failures of consistency

• Update anomaly

Before the update:

titleID | Title    | Author      | Library
3       | Istanbul | Orhan Pamuk | Central
3       | Istanbul | Orhan Pamuk | East

After the update:

titleID | Title          | Author      | Library
3       | Istanbul       | Orhan Pamuk | Central
3       | My name is red | Orhan Pamuk | East

The update tried to change the title but failed to change all occurrences. Consistency is violated.

Page 4

Anomalies: failures of consistency

• Deletion anomaly

Before the deletion:

FacultyID | Faculty name | Faculty hire date | Course name | Course day | Course time
33        | R. Wavey     | 1951-09-01        | Physics 2A  | Wed        | 15:00-
34        | …            | …                 | …           | …          | …

After the deletion:

FacultyID | Faculty name | Faculty hire date | Course name | Course day | Course time
34        | …            | …                 | …           | …          | …

Only a course was removed, but a faculty member was removed as a result.

Page 5

Codd’s first normal form

titleID | Title    | Author      | Library | Library
3       | Istanbul | Orhan Pamuk | East    | Central
5       |          |             |         |

The first normal form excludes repetition of the same attribute.

Page 6

Functional dependencies

titleID | Title    | Author      | Library
3       | Istanbul | Orhan Pamuk | Central
3       | Istanbul | Orhan Pamuk | East

{titleID} → {Title, Author}
{titleID, Library} → {titleID, Title, Author, Library}
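A functional dependency can be tested against a concrete table. The following is a minimal Python sketch, not part of the original talk; the helper name `holds` and the encoding of rows as dictionaries are assumptions made for illustration. A table can only refute a dependency: satisfying it on the current rows does not prove that the dependency is intended by the schema.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def holds(rows: List[Row], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """Return True if any two rows that agree on lhs also agree on rhs."""
    lhs, rhs = list(lhs), list(rhs)
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The table from this slide.
books = [
    {"titleID": "3", "Title": "Istanbul", "Author": "Orhan Pamuk", "Library": "Central"},
    {"titleID": "3", "Title": "Istanbul", "Author": "Orhan Pamuk", "Library": "East"},
]

# The table after the failed update shown on the update-anomaly slide.
broken = [dict(books[0]), dict(books[1], Title="My name is red")]

print(holds(books, ["titleID"], ["Title", "Author"]))
print(holds(broken, ["titleID"], ["Title", "Author"]))
```

The second call reports a violation because the two rows share a titleID but disagree on the title, which is exactly the update anomaly.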

Page 7

Functional dependencies

FacultyID | Faculty name | Faculty hire date | Course name | Course day | Course time
33        | R. Wavey     | 1951-09-01        | Physics 2A  | Wed        | 15:00-
34        | …            | …                 | …           | …          | …

{FacultyID} → {Faculty name, Faculty hire date}
{FacultyID, Course name} → {Course day, Course time}
{FacultyID, Course name} → {FacultyID, Faculty name, Faculty hire date, Course name, Course day, Course time}

Page 8

Armstrong’s laws

• Mizar has a formalization of these laws, together with their soundness and completeness with respect to the relational semantics.

1. Reflexivity: Y ⊆ X implies X → Y
2. Augmentation: Z ⊆ W and X → Y imply X ∪ W → Y ∪ Z
3. Transitivity: X → Y and Y → Z imply X → Z

The laws are sound and complete with respect to the relational semantics.
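Whether a dependency X → Y follows from a given set by Armstrong’s laws can be decided by computing the closure of the attribute set X. The sketch below uses this standard textbook procedure; it is not taken from the talk, and the names `fd`, `closure` and `implies` are assumptions for illustration. The example dependencies are the ones listed for the faculty table.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Build a functional dependency lhs -> rhs."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire a dependency once its whole left-hand side is available.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds: List[FD], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """True if lhs -> rhs is derivable from fds."""
    return set(rhs) <= closure(lhs, fds)


FDS = [
    fd(["FacultyID"], ["Faculty name", "Faculty hire date"]),
    fd(["FacultyID", "Course name"], ["Course day", "Course time"]),
]

print(sorted(closure(["FacultyID"], FDS)))
print(implies(FDS, ["FacultyID"], ["Course day"]))
```

Reflexivity corresponds to starting from the attributes themselves, while augmentation and transitivity correspond to repeatedly firing dependencies whose left-hand side is already covered.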

Page 9

Codd’s second normal form

• Excludes this:

FacultyID | Faculty name | Faculty hire date | Course name | Course day | Course time
33        | R. Wavey     | 1951-09-01        | Physics 2A  | Wed        | 15:00-
34        | …            | …                 | …           | …          | …

Because of these conditions:

1. {FacultyID, Course name} is a minimal set X with the functional dependency X → {FacultyID, Faculty name, Faculty hire date, Course name, Course day, Course time} (that is, {FacultyID, Course name} is a candidate key).

2. Faculty hire date is not contained in any candidate key (Faculty hire date is a non-prime attribute).

3. Faculty hire date is dependent on {FacultyID}, which is a proper subset of the candidate key {FacultyID, Course name}.
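The three conditions above can be checked mechanically. The following is a brute-force Python sketch under stated assumptions: the function names `candidate_keys` and `violates_2nf` are hypothetical, candidate keys are found by enumerating all attribute subsets (exponential, fine only for small examples), and the dependencies are the ones given for the faculty table.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets that determine every attribute."""
    attributes = sorted(attributes)
    keys = []
    for size in range(len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(frozenset(candidate))
    return keys


def violates_2nf(attributes, fds):
    """Pairs (attribute, key part): a non-prime attribute that depends on
    a proper subset of a candidate key."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    witnesses = []
    for key in keys:
        for size in range(len(key)):  # proper subsets only
            for part in combinations(sorted(key), size):
                for attribute in sorted(closure(part, fds) - prime):
                    witnesses.append((attribute, frozenset(part)))
    return witnesses


ATTRS = ["FacultyID", "Faculty name", "Faculty hire date",
         "Course name", "Course day", "Course time"]
FDS = [
    (frozenset({"FacultyID"}), frozenset({"Faculty name", "Faculty hire date"})),
    (frozenset({"FacultyID", "Course name"}), frozenset({"Course day", "Course time"})),
]

print(candidate_keys(ATTRS, FDS))
print(violates_2nf(ATTRS, FDS))
```

For the faculty table the only candidate key is {FacultyID, Course name}, and both Faculty name and Faculty hire date are reported as depending on the key part {FacultyID}.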

Page 10

The third normal form

• Excludes this (example from Wikipedia):

Tournament           | Year | Winner         | Winner Date of Birth
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975
Des Moines Masters   | 1999 | Al Fredrickson | 21 July 1975
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977

Because the non-prime attribute “Winner Date of Birth” is transitively dependent on a candidate key. Concretely:

1. “Winner Date of Birth” is a non-prime attribute
2. {Tournament, Year} is a candidate key
3. {Tournament, Year} → {Winner} holds
4. {Winner} → {Tournament, Year} does not hold
5. {Winner} → {Winner Date of Birth} holds
6. “Winner Date of Birth” is not in {Tournament, Year}
7. “Winner Date of Birth” is not in {Winner}
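A 3NF violation of this kind can be detected automatically. The sketch below does not encode the seven conditions literally; it assumes the commonly used equivalent formulation that, for every given dependency X → A with A not in X, either X is a superkey or A is a prime attribute. The function names are hypothetical and the candidate-key search is brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets that determine every attribute."""
    attributes = sorted(attributes)
    keys = []
    for size in range(len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(frozenset(candidate))
    return keys


def violates_3nf(attributes, fds):
    """Pairs (X, A) where X -> A is given, X is not a superkey,
    and A is a non-prime attribute outside X."""
    everything = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == everything
        for attribute in sorted(rhs - lhs):
            if not is_superkey and attribute not in prime:
                bad.append((lhs, attribute))
    return bad


ATTRS = ["Tournament", "Year", "Winner", "Winner Date of Birth"]
FDS = [
    (frozenset({"Tournament", "Year"}), frozenset({"Winner"})),
    (frozenset({"Winner"}), frozenset({"Winner Date of Birth"})),
]

print(violates_3nf(ATTRS, FDS))
```

On the tournament table the check reports {Winner} → Winner Date of Birth, matching conditions 4 and 5 above: {Winner} is not a superkey, yet it determines a non-prime attribute.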

Page 11

Obtaining the third normal form: the input and output

• Input: a finite set of functional dependencies

• Output: a finite set of relations and their keys (in 3NF)

[Figure: an example over the attributes Tournament, Year, Winner, and Winner Date of Birth, with the output relations (Tournament, Year, Winner) and (Winner, Winner Date of Birth).]

Page 12

Bernstein’s algorithm 1 [Bernstein, 1976]

Obtained after two earlier erroneous attempts!

Page 13

Bernstein’s algorithm 1, step 1: Eliminating extraneous attributes

[Figure: two functional-dependency diagrams over Tournament, Year, Winner, Winner Date of Birth, and Place, before and after step 1.]

The result is smaller but equivalent (after taking the closure under Armstrong’s laws).
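Step 1 can be sketched in Python as follows. This is an illustrative sketch, not the verified Coq development: an attribute is dropped from a left-hand side when the remaining attributes still determine the right-hand side. The function name `remove_extraneous` is hypothetical, and the input set is a hypothetical example over the attributes named on this slide.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def remove_extraneous(fds):
    """Shrink each left-hand side while the dependency stays derivable."""
    result = list(fds)
    for i, (lhs, rhs) in enumerate(result):
        for attribute in sorted(lhs):
            smaller = lhs - {attribute}
            # attribute is extraneous if the smaller left-hand side
            # already determines the right-hand side.
            if rhs <= closure(smaller, result):
                lhs = smaller
                result[i] = (lhs, rhs)
    return result


# Hypothetical input: Winner is extraneous in the first dependency,
# because {Tournament, Year} already determines Winner.
FDS = [
    (frozenset({"Tournament", "Year", "Winner"}), frozenset({"Place"})),
    (frozenset({"Tournament", "Year"}), frozenset({"Winner"})),
    (frozenset({"Winner"}), frozenset({"Winner Date of Birth"})),
]

for lhs, rhs in remove_extraneous(FDS):
    print(sorted(lhs), "->", sorted(rhs))
```

Each removal replaces a dependency by a stronger one that was already derivable, which is why the closure of the whole set is unchanged.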

Page 14

Bernstein’s algorithm 1, step 2: Finding a nonredundant covering

• A set of functional dependencies is nonredundant when no element can be inferred from the others using Armstrong’s laws.

• Step 2 removes functional dependencies until the whole set becomes nonredundant.
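A minimal Python sketch of step 2, assuming the attribute-closure test as the way to decide whether a dependency can be inferred from the others. The function name `nonredundant_cover` and the example set are hypothetical.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def nonredundant_cover(fds):
    """Drop every dependency that follows from the remaining ones."""
    result = list(fds)
    for dependency in list(fds):
        rest = [other for other in result if other != dependency]
        lhs, rhs = dependency
        if rhs <= closure(lhs, rest):
            result = rest  # dependency is redundant, remove it for good
    return result


# Hypothetical input: the third dependency follows from the first two
# by transitivity.
FDS = [
    (frozenset({"Tournament", "Year"}), frozenset({"Winner"})),
    (frozenset({"Winner"}), frozenset({"Winner Date of Birth"})),
    (frozenset({"Tournament", "Year"}), frozenset({"Winner Date of Birth"})),
]

for lhs, rhs in nonredundant_cover(FDS):
    print(sorted(lhs), "->", sorted(rhs))
```

The result depends on the order in which dependencies are examined; different orders can yield different, equally valid nonredundant coverings.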

Page 15

Bernstein’s algorithm 1, step 3: Partition

Step 3 partitions the functional dependencies into groups that have the same left-hand side.

[Figure: functional-dependency diagram over Tournament, Year, Winner, Winner Date of Birth, and Place.]

These two functional dependencies share the left-hand side.
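Grouping by left-hand side is a small dictionary exercise. In the sketch below the function name `partition_by_lhs` is hypothetical, and the three dependencies are assumed from the attributes on this slide and the relations produced in step 4.

```python
from collections import defaultdict


def partition_by_lhs(fds):
    """Group dependencies that have an identical left-hand side."""
    groups = defaultdict(list)
    for lhs, rhs in fds:
        groups[lhs].append((lhs, rhs))
    return dict(groups)


# Assumed dependencies: the first two share the left-hand side.
FDS = [
    (frozenset({"Tournament", "Year"}), frozenset({"Winner"})),
    (frozenset({"Tournament", "Year"}), frozenset({"Place"})),
    (frozenset({"Winner"}), frozenset({"Winner Date of Birth"})),
]

for lhs, members in partition_by_lhs(FDS).items():
    print(sorted(lhs), "has", len(members), "dependencies")
```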

Page 16

Bernstein’s algorithm 1, step 4: Construct relations

[Figure: functional-dependency diagram over Tournament, Year, Winner, Winner Date of Birth, and Place.]

Relation 1: {Winner, Winner Date of Birth}

Relation 2: {Tournament, Year, Place, Winner}

Underlined attributes are keys.

These relations are in the third normal form. Why?
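Step 4 turns each group into one relation: its attributes are all attributes mentioned in the group, and the shared left-hand side serves as the key. The Python sketch below assumes the groups are given as a dictionary from left-hand side to dependencies; the function name `construct_relations` and the input encoding are hypothetical.

```python
def construct_relations(groups):
    """Return one (attributes, key) pair per group of dependencies."""
    relations = []
    for key, members in groups.items():
        attributes = set(key)
        for _, rhs in members:
            attributes |= rhs
        relations.append((frozenset(attributes), key))
    return relations


WINNER = frozenset({"Winner"})
EVENT = frozenset({"Tournament", "Year"})

# Groups assumed to come out of step 3 for the example on this slide.
GROUPS = {
    WINNER: [(WINNER, frozenset({"Winner Date of Birth"}))],
    EVENT: [(EVENT, frozenset({"Winner"})), (EVENT, frozenset({"Place"}))],
}

for attributes, key in construct_relations(GROUPS):
    print(sorted(attributes), "with key", sorted(key))
```

For this input the output matches Relation 1 and Relation 2 above, with keys {Winner} and {Tournament, Year}.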

Page 17

Formalization strategies

• Never mention the relational semantics.
• Attributes are just elements of a type (with equality).
• A functional dependency is a pair of sequences of attributes.
• Derivations based on Armstrong’s laws are defined in an inductive manner.
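To give a feel for an inductive definition of derivations, here is a Python analogue. It is only an illustration and not the Coq definitions from the talk: each class plays the role of one constructor, all class and function names are hypothetical, and attribute collections are modelled as frozensets rather than sequences for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple, Union

Attrs = FrozenSet[str]
FD = Tuple[Attrs, Attrs]


@dataclass(frozen=True)
class Given:            # use one of the given dependencies
    fd: FD


@dataclass(frozen=True)
class Reflexivity:      # Y ⊆ X implies X → Y
    x: Attrs
    y: Attrs


@dataclass(frozen=True)
class Augmentation:     # Z ⊆ W and X → Y imply X ∪ W → Y ∪ Z
    premise: "Derivation"
    w: Attrs
    z: Attrs


@dataclass(frozen=True)
class Transitivity:     # X → Y and Y → Z imply X → Z
    first: "Derivation"
    second: "Derivation"


Derivation = Union[Given, Reflexivity, Augmentation, Transitivity]


def conclusion(d: Derivation, given: List[FD]) -> FD:
    """Return the dependency proved by d; raise ValueError on a misapplied rule."""
    if isinstance(d, Given):
        if d.fd not in given:
            raise ValueError("not a given dependency")
        return d.fd
    if isinstance(d, Reflexivity):
        if not d.y <= d.x:
            raise ValueError("reflexivity needs Y ⊆ X")
        return (d.x, d.y)
    if isinstance(d, Augmentation):
        if not d.z <= d.w:
            raise ValueError("augmentation needs Z ⊆ W")
        x, y = conclusion(d.premise, given)
        return (x | d.w, y | d.z)
    if isinstance(d, Transitivity):
        x, middle = conclusion(d.first, given)
        middle_again, z = conclusion(d.second, given)
        if middle != middle_again:
            raise ValueError("transitivity needs matching middle sets")
        return (x, z)
    raise TypeError("unknown derivation node")


F1 = (frozenset({"Tournament", "Year"}), frozenset({"Winner"}))
F2 = (frozenset({"Winner"}), frozenset({"Winner Date of Birth"}))
proof = Transitivity(Given(F1), Given(F2))
print(conclusion(proof, [F1, F2]))
```

Because derivations are plain trees, properties about them can be proved by structural induction, with no reference to tables or rows.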

Page 18

Termination of algorithms (Coq computes only total, terminating functions)

• Termination of closure (under Armstrong’s laws)
– Sizes converge because they are increasing and bounded.
– When the sizes converge, the closure converges.

• Termination of Bernstein’s algorithm 1
– This is easier because every step is a simplification in some sense.
– Repeat simplifying something until it cannot be simplified further.
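The size argument can be illustrated with a closure computation whose loop has an explicit bound. The Python sketch below uses attribute closure as a stand-in; the talk’s Coq closure operation may be defined differently. The names `step` and `closure_with_fuel` are hypothetical, and the code assumes that all attributes occurring in the dependencies belong to `universe`.

```python
def step(current, fds):
    """One round: add every right-hand side whose left-hand side is covered."""
    out = set(current)
    for lhs, rhs in fds:
        if lhs <= current:
            out |= rhs
    return out


def closure_with_fuel(attrs, fds, universe):
    """Closure by bounded iteration, also returning the observed sizes."""
    current = set(attrs)
    sizes = [len(current)]
    # Sizes never decrease and never exceed len(universe), so at most
    # len(universe) rounds can strictly grow the set.
    for _ in range(len(universe)):
        following = step(current, fds)
        if len(following) == len(current):
            break  # size converged; step only adds, so the set converged too
        current = following
        sizes.append(len(current))
    return current, sizes


# Hypothetical chain of dependencies A -> B -> C -> D.
CHAIN = [
    (frozenset({"A"}), frozenset({"B"})),
    (frozenset({"B"}), frozenset({"C"})),
    (frozenset({"C"}), frozenset({"D"})),
]

result, sizes = closure_with_fuel({"A"}, CHAIN, {"A", "B", "C", "D"})
print(sorted(result), sizes)
```

Bounding the loop by the size of the universe is the kind of argument a total language requires: the recursion is on a natural number that visibly decreases, and the monotone bounded sizes justify that the bound is large enough.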

Page 19

Proving preservation properties

• Each step preserves the closure of the functional dependencies!

• This property holds entirely without exception, so it is very easy to formalize and to prove (straightforward divide and conquer).
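For concrete inputs, preservation of the closure can be tested by checking that two dependency sets imply each other. The Python sketch below is a test-style illustration, not a proof; `covers` and `equivalent` are hypothetical names, and the two sets are a hypothetical before/after pair for the extraneous-attribute step.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, g):
    """True if every dependency in g is derivable from f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """True if f and g have the same closure under Armstrong's laws."""
    return covers(f, g) and covers(g, f)


BEFORE = [
    (frozenset({"Tournament", "Year", "Winner"}), frozenset({"Place"})),
    (frozenset({"Tournament", "Year"}), frozenset({"Winner"})),
    (frozenset({"Winner"}), frozenset({"Winner Date of Birth"})),
]
AFTER = [
    (frozenset({"Tournament", "Year"}), frozenset({"Place"})),
    (frozenset({"Tournament", "Year"}), frozenset({"Winner"})),
    (frozenset({"Winner"}), frozenset({"Winner Date of Birth"})),
]

print(equivalent(BEFORE, AFTER))
print(equivalent(BEFORE, AFTER[:2]))
```

The second call drops a dependency that cannot be recovered from the others, so the two sets no longer have the same closure.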

Page 20

Proving 3NF

• Mostly followed the text (at first I omitted step 1, and the proof attempt failed).

• Changed it a little to allow easier formalization.

• Some proof steps are not entirely understood.
– Refactoring should bring enlightenment.

Page 21

Some changes to Bernstein’s original proof

• Removed the graphical reasoning.

• The root cause of such graphical objects: “If there exists a (graphical) derivation using a functional dependency g,”

• A reformulation: “If all (graphical) derivations use a functional dependency g,”

Page 22

Amount of code

Parts | Lines of code | Comments
Properties of Armstrong’s laws & closure operation | ~600 | Took ~100 lines to prove that a monotonic bounded sequence of natural numbers converges.
Definition of steps; steps keep closures; when steps terminate, certain things are removed totally. | ~700 | Somewhat boilerplate.
The whole algorithm produces 3NF | ~200 | Very involved monolithic proof.

Page 23

Still to be seen: Bernstein’s algorithm 2

• The number of relations produced by Bernstein’s algorithm 1 is not optimal.

• Bernstein’s algorithm 2 gives the optimal (= smallest) number of relations, answering Codd’s challenge.

• We have just formalized algorithm 2.

• Also still to be seen: multivalued dependencies and the fourth and fifth normal forms.