Upload
others
View
11
Download
0
Embed Size (px)
Citation preview
CompSci 516DataIntensiveComputingSystems
Lecture6aDesignTheoryandNormalization– 2/2
Instructor:Sudeepa Roy
1DukeCS,Fall2017 CompSci516:DatabaseSystems
Announcements• HW1deadline:– Dueon09/21(Thurs),11:55pm,nolatedays
• Projectproposaldeadline:– Preliminaryideaandteammembersdueby09/18(Mon)byemailtotheinstructor
– Proposaldueonsakai by09/25(Mon),11:55pm
• Everyoneshouldbeinagroupnow– otherwiselettheinstructorknowasap
DukeCS,Fall2017 CompSci516:DatabaseSystems 2
Today
• FinishNormalizationfromLecture5• StartDatabaseInternals
• Recap– Whyredundancyisbad– Functionaldependencies– Closureofattributesandfunctionaldependencies
DukeCS,Fall2017 CompSci516:DatabaseSystems 3
Acknowledgement:• Thefollowingslideshavebeencreatedadaptingtheinstructormaterialofthe[RG]bookprovidedbytheauthorsDr.Ramakrishnan andDr.Gehrke.• SomeslideshavebeenadaptedfromslidesbyProf.JunYang
NormalForms
Risin4NF⇒ RisinBCNF⇒ Risin3NF⇒ Risin2NF(ahistoricalone)⇒ Risin1NF(everyfieldhasatomicvalues)
DukeCS,Fall2017 CompSci516:DatabaseSystems 4
BCNF
3NF
2NF
1NF
OnlyBCNFand4NFarecoveredintheclass
4NF
Boyce-CoddNormalForm(BCNF)
• RelationRwithFDsF isinBCNF if,forallX→AinF– AϵX(calledatrivial FD),or– XcontainsakeyforR
• i.e.Xisasuperkey
DukeCS,Fall2017 CompSci516:DatabaseSystems 5
Nextlecture:BCNFdecompositionalgorithm
Decomposition
• Eliminatesredundancy• Togetbacktotheoriginalrelation:
6
⋈
uid uname twitterid gid fromDate
142 Bart @BartJSimpson dps 1987-04-19
123 Milhouse @MilhouseVan_ gov 1989-12-17
857 Lisa @lisasimpson abc 1987-04-19
857 Lisa @lisasimpson gov 1988-09-01
456 Ralph @ralphwiggum abc 1991-04-25
456 Ralph @ralphwiggum gov 1992-09-01
… … … … …
uid uname twitterid
142 Bart @BartJSimpson
123 Milhouse @MilhouseVan_
857 Lisa @lisasimpson
456 Ralph @ralphwiggum
… … …
uid gid fromDate
142 dps 1987-04-19
123 gov 1989-12-17
857 abc 1987-04-19
857 gov 1988-09-01
456 abc 1991-04-25
456 gov 1992-09-01
… … …
(ontwitter)
• Userid• username• Twitterid• Groupid• JoiningDate
(toagroup)
DukeCS,Fall2017 CompSci516:DatabaseSystems
uid twitterid
142 @BartJSimpson
123 @MilhouseVan_
857 @lisasimpson
456 @ralphwiggum
… …
uid uname
142 Bart
123 Milhouse
857 Lisa
456 Ralph
… …
Unnecessarydecomposition
• Fine:joinreturnstheoriginalrelation• Unnecessary:noredundancyisremoved;schemaismore
complicated(anduid isstoredtwice!)7
uid uname twitterid
142 Bart @BartJSimpson
123 Milhouse @MilhouseVan_
857 Lisa @lisasimpson
456 Ralph @ralphwiggum
… … …
DukeCS,Fall2017 CompSci516:DatabaseSystems
uid fromDate
142 1987-04-19
123 1989-12-17
857 1987-04-19
857 1988-09-01
456 1991-04-25
456 1992-09-01
… …
Baddecomposition
• Associationbetweengid andfromDate islost• Joinreturnsmorerowsthantheoriginalrelation
8
uid gid fromDate
142 dps 1987-04-19
123 gov 1989-12-17
857 abc 1987-04-19
857 gov 1988-09-01
456 abc 1991-04-25
456 gov 1992-09-01
… … …uid gid
142 dps
123 gov
857 abc
857 gov
456 abc
456 gov
… …
DukeCS,Fall2017 CompSci516:DatabaseSystems
Losslessjoindecomposition
• Decomposerelation𝑅 intorelations𝑆 and𝑇– 𝑎𝑡𝑡𝑟𝑠 𝑅 = 𝑎𝑡𝑡𝑟𝑠 𝑆 ∪ 𝑎𝑡𝑡𝑟𝑠 𝑇– 𝑆 = 𝜋,--./ 0 𝑅– 𝑇 = 𝜋,--./ 1 𝑅
• Thedecompositionisalosslessjoindecompositionif,givenknownconstraintssuchasFD’s,wecanguaranteethat𝑅 =𝑆 ⋈ 𝑇
• 𝑅 ⊆ 𝑆 ⋈ 𝑇 or𝑅 ⊇ 𝑆 ⋈ 𝑇 ?
• Anydecompositiongives𝑅 ⊆ 𝑆 ⋈ 𝑇 (why?)– Alossy decompositionisonewith𝑅 ⊂ 𝑆 ⋈ 𝑇
9DukeCS,Fall2017 CompSci516:DatabaseSystems
uid gid fromDate
142 dps 1987-04-19
123 gov 1989-12-17
857 abc 1987-04-19
857 gov 1988-09-01
456 abc 1991-04-25
456 gov 1992-09-01
… … …
uid gid fromDate
142 dps 1987-04-19
123 gov 1989-12-17
857 abc 1988-09-01
857 gov 1987-04-19
456 abc 1991-04-25
456 gov 1992-09-01
… … …
Loss?ButIgotmorerows!
• “Loss”refersnottothelossoftuples,buttothelossofinformation– Or,theabilitytodistinguishdifferentoriginalrelations
10
Nowaytotellwhichistheoriginalrelation
uid fromDate
142 1987-04-19
123 1989-12-17
857 1987-04-19
857 1988-09-01
456 1991-04-25
456 1992-09-01
… …
uid gid
142 dps
123 gov
857 abc
857 gov
456 abc
456 gov
… …DukeCS,Fall2017 CompSci516:DatabaseSystems
BCNFdecompositionalgorithm
• FindaBCNFviolation– Thatis,anon-trivialFD𝑋 → 𝑌 in𝑅 where𝑋 isnotasuperkeyof𝑅
• Decompose𝑅 into𝑅8 and𝑅9,where– 𝑅8 hasattributes𝑋 ∪ 𝑌– 𝑅9 hasattributes𝑋 ∪ 𝑍,where𝑍 containsallattributesof𝑅 thatareinneither𝑋 nor𝑌
• RepeatuntilallrelationsareinBCNF
• Alsogivesalosslessdecomposition!
11DukeCS,Fall2017 CompSci516:DatabaseSystems
BCNFdecompositionexample- 1
• CSJDPQV,keyC,F={JP→ C,SD→ P,J→ S}– TodealwithSD→P,decomposeintoSDP,CSJDQV.– TodealwithJ→ S,decomposeCSJDQVintoJSandCJDQV
• IsJP→ CaviolationofBCNF?
• Note:– severaldependenciesmaycauseviolationofBCNF– Theorderinwhichwepickthemmayleadtoverydifferentsetsof
relations– theremaybemultiplecorrectdecompositions(canpickJ→ Sfirst)
DukeCS,Fall2017 CompSci516:DatabaseSystems 12
BCNFdecompositionexample- 2
13
UserJoinsGroup (uid,uname,twitterid,gid,fromDate)
uid→ uname,twitteridtwitterid→ uiduid,gid→ fromDate
BCNFviolation:uid→ uname,twitterid
User (uid,uname,twitterid) Member(uid,gid,fromDate)
BCNFBCNF
uid→ uname,twitteridtwitterid→ uid
uid,gid→ fromDate
DukeCS,Fall2017 CompSci516:DatabaseSystems
14
UserJoinsGroup (uid,uname,twitterid,gid,fromDate)
uid→ uname,twitteridtwitterid→ uiduid,gid→ fromDate
BCNFviolation:twitterid→ uid
UserId (twitterid,uid)
Member(twitterid,gid,fromDate)
BCNF
BCNF
twitterid→ unametwitterid,gid→ fromDate
UserJoinsGroup’ (twitterid,uname,gid,fromDate)
BCNFviolation:twitterid→ uname
UserName (twitterid,uname)BCNF
applyArmstrong’saxiomsandrules!
DukeCS,Fall2017 CompSci516:DatabaseSystems
BCNFdecompositionexample- 3
Recap
• Functionaldependencies:ageneralizationofthekeyconcept
• Non-keyfunctionaldependencies:asourceofredundancy
• BCNFdecomposition:amethodforremovingredundancies– BCNFdecompositionisalosslessjoindecomposition
• BCNF:schemainthisnormalformhasnoredundancyduetoFD’s
15DukeCS,Fall2017 CompSci516:DatabaseSystems
BCNF=noredundancy?
• User (uid,gid,place)– Ausercanbelongtomultiplegroups– Ausercanregisterplacesshe’svisited– Groupsandplaceshavenothingtodowithother– FD’s?
• None– BCNF?
• Yes– Redundancies?
• Tons!
16
uid gid place
142 dps Springfield
142 dps Australia
456 abc Springfield
456 abc Morocco
456 gov Springfield
456 gov Morocco
… … …
DukeCS,Fall2017 CompSci516:DatabaseSystems
Multivalueddependencies
• Amultivalueddependency(MVD)hastheform𝑋 ↠ 𝑌,where𝑋 and𝑌 aresetsofattributesinarelation𝑅
• 𝑋 ↠ 𝑌 meansthatwhenevertworowsin𝑅 agreeonalltheattributesof𝑋,thenwecanswaptheir𝑌 componentsandgettworowsthatarealsoin𝑅
17
𝑿 𝒀 𝒁𝑎 𝑏8 𝑐8𝑎 𝑏9 𝑐9… … …
𝑿 𝒀 𝒁𝑎 𝑏8 𝑐8𝑎 𝑏9 𝑐9𝑎 𝑏9 𝑐8𝑎 𝑏8 𝑐9… … …
DukeCS,Fall2017 CompSci516:DatabaseSystems
MVDexamples
User(uid,gid,place)• uid↠ gid• uid↠ place– Intuition:givenuid,attributesgid andplaceare“independent”
• uid,gid↠ place– Trivial:LHS∪ RHS=allattributesof𝑅
• uid,gid↠ uid– Trivial:LHS⊇ RHS
18DukeCS,Fall2017 CompSci516:DatabaseSystems
CompleteMVD+FDrules
• FDreflexivity,augmentation,andtransitivity• MVDcomplementation:If𝑋 ↠ 𝑌,then𝑋 ↠ 𝑎𝑡𝑡𝑟𝑠 𝑅 − 𝑋 − 𝑌
• MVDaugmentation:If𝑋 ↠ 𝑌 and𝑉 ⊆ 𝑊,then𝑋𝑊 ↠ 𝑌𝑉
• MVDtransitivity:If𝑋 ↠ 𝑌 and𝑌 ↠ 𝑍,then𝑋 ↠ 𝑍 − 𝑌
• Replication(FDisMVD):If𝑋 → 𝑌,then𝑋 ↠ 𝑌
• Coalescence:If𝑋 ↠ 𝑌 and𝑍 ⊆ 𝑌 andthereissome𝑊 disjointfrom𝑌 suchthat𝑊 → 𝑍,then𝑋 → 𝑍
19
Tryprovingthingsusingthese!?
Verifytheseyourself!
DukeCS,Fall2017 CompSci516:DatabaseSystems
Anelegantsolution:“chase”
• GivenasetofFD’sandMVD’s𝒟,doesanotherdependency𝑑 (FDorMVD)followfrom𝒟?
• Procedure– Startwiththepremiseof𝑑,andtreatthemas“seed”tuplesinarelation
– Applythegivendependenciesin𝒟 repeatedly• IfweapplyanFD,weinferequalityoftwosymbols• IfweapplyanMVD,weinfermoretuples
– Ifweinfertheconclusionof𝑑,wehaveaproof– Otherwise,ifnothingmorecanbeinferred,wehaveacounterexample
20
Readthisslideafterlookingattheexamples
DukeCS,Fall2017 CompSci516:DatabaseSystems
Proofbychase• In𝑅 𝐴, 𝐵, 𝐶, 𝐷 ,does𝐴 ↠ 𝐵 and𝐵 ↠ 𝐶implythat𝐴 ↠ 𝐶?
21
𝑨 𝑩 𝑪 𝑫𝑎 𝑏8 𝑐8 𝑑8𝑎 𝑏9 𝑐9 𝑑9
𝑨 𝑩 𝑪 𝑫𝑎 𝑏8 𝑐9 𝑑8𝑎 𝑏9 𝑐8 𝑑9
Have: Need:
𝑎 𝑏9 𝑐8 𝑑8𝑎 𝑏8 𝑐9 𝑑9
𝐴 ↠ 𝐵
𝑎 𝑏9 𝑐8 𝑑9𝑎 𝑏9 𝑐9 𝑑8
𝐵 ↠ 𝐶
𝑎 𝑏8 𝑐9 𝑑8𝑎 𝑏8 𝑐8 𝑑9
𝐵 ↠ 𝐶
AA
DukeCS,Fall2017 CompSci516:DatabaseSystems
Anotherproofbychase• In𝑅 𝐴, 𝐵, 𝐶, 𝐷 ,does𝐴 → 𝐵 and𝐵 → 𝐶 implythat𝐴 → 𝐶?
22
𝑨 𝑩 𝑪 𝑫𝑎 𝑏8 𝑐8 𝑑8𝑎 𝑏9 𝑐9 𝑑9
Have: Need:𝑐8 = 𝑐9
𝐴 → 𝐵 𝑏8 = 𝑏9𝐵 → 𝐶 𝑐8 = 𝑐9
A
Ingeneral,withbothMVD’sandFD’s,chasecangeneratebothnewtuplesandnewequalities
DukeCS,Fall2017 CompSci516:DatabaseSystems
Counterexamplebychase• In𝑅 𝐴, 𝐵, 𝐶, 𝐷 ,does𝐴 ↠ 𝐵𝐶 and𝐶𝐷 → 𝐵implythat𝐴 → 𝐵?
23
𝑨 𝑩 𝑪 𝑫𝑎 𝑏8 𝑐8 𝑑8𝑎 𝑏9 𝑐9 𝑑9
Have: Need:𝑏8 = 𝑏9
𝑎 𝑏9 𝑐9 𝑑8𝑎 𝑏8 𝑐8 𝑑9
𝐴 ↠ 𝐵𝐶
D
Counterexample!
DukeCS,Fall2017 CompSci516:DatabaseSystems
4NF
• Arelation𝑅 isinFourthNormalForm(4NF)if– Foreverynon-trivialMVD𝑋 ↠ 𝑌 in𝑅,𝑋 isasuperkey
– Thatis,allFD’sandMVD’sfollowfrom“key→otherattributes”(i.e.,noMVD’sandnoFD’sbesideskeyfunctionaldependencies)
• 4NFisstrongerthanBCNF– BecauseeveryFDisalsoaMVD
24DukeCS,Fall2017 CompSci516:DatabaseSystems
4NFdecompositionalgorithm
• Finda4NFviolation– Anon-trivialMVD𝑋 ↠ 𝑌 in𝑅 where𝑋 isnot asuperkey
• Decompose𝑅 into𝑅8 and𝑅9,where– 𝑅8 hasattributes𝑋 ∪ 𝑌– 𝑅9 hasattributes𝑋 ∪ 𝑍 (where𝑍 contains𝑅 attributesnotin𝑋 or𝑌)
• Repeatuntilallrelationsarein4NF
• AlmostidenticaltoBCNFdecompositionalgorithm• Anydecompositionona4NFviolationislossless
25DukeCS,Fall2017 CompSci516:DatabaseSystems
4NFdecompositionexample
26
uid gid place
142 dps Springfield
142 dps Australia
456 abc Springfield
456 abc Morocco
456 gov Springfield
456 gov Morocco
… … …
User (uid,gid,place)4NFviolation:uid↠gid
Member(uid,gid) Visited(uid,place)4NF 4NFuid gid
142 dps
456 abc
456 gov
… …
uid place
142 Springfield
142 Australia
456 Springfield
456 Morocco
… …
DukeCS,Fall2017 CompSci516:DatabaseSystems
Otherkindsofdependenciesandnormalforms
• Dependencypreservingdecompositions• Joindependencies• Inclusiondependencies• 5NF,3NF,2NF• Seebookifinterested(notcoveredinclass)
DukeCS,Fall2017 CompSci516:DatabaseSystems 27
Summary
• PhilosophybehindBCNF,4NF:Datashoulddependonthekey,thewholekey,andnothingbutthekey!– Youcouldhavemultiplekeysthough
• Redundancyisnotdesiredtypically– notalways,mainlyduetoperformancereasons
• Functional/multivalueddependencies– captureredundancy• Decompositions– eliminatedependencies• Normalforms
– Guaranteescertainnon-redundancy– BCNF,and4NF
• Losslessjoin• HowtodecomposeintoBCNF,4NF• Chase
28DukeCS,Fall2017 CompSci516:DatabaseSystems