Functional Dependencies

Jorge Pombar


Definitions

Functional dependencies are the building blocks that enable the analysis of data redundancy and the elimination of anomalies caused by data redundancy through the process of normalization.

Normalization is a technique that facilitates systematic validation of the participation of attributes in a relation schema from the perspective of data redundancy.

Functional Dependencies (FD)

An attribute A in a relation schema R functionally determines another attribute B in R if, for a given value a1 of A, there is a single, specified value b1 of B in the relation r of R.

A and B can be either atomic or composite.

The symbolic representation of this FD is: A->B

In other words, for A->B to hold, any two tuples in r(R) that have the same A values must also have the same B values.
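The two-tuple condition can be checked mechanically against a relation instance. The following is a minimal sketch, assuming rows are stored as Python dictionaries; the function name `fd_holds` and the sample rows are hypothetical and not part of the original slides. Note that a check like this can only show that an FD is violated by the current data; an FD is a constraint on the schema, so a passing check does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each determinant value to the dependent value first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same A values, different B values
    return True


# Hypothetical sample relation r(R) with attributes A and B.
rows = [
    {"A": 1, "B": "x"},
    {"A": 1, "B": "x"},
    {"A": 2, "B": "x"},
]

a_determines_b = fd_holds(rows, ["A"], ["B"])  # every A value has one B value
b_determines_a = fd_holds(rows, ["B"], ["A"])  # "x" appears with A=1 and A=2
```

Composite determinants work the same way, since `lhs` and `rhs` are lists of attribute names.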

Notation

A (the left side of the FD) is the determinant and B (the right side of the FD) is the dependent.

If the determinant or the dependent is composite, its component attributes are enclosed in braces:

{Store, Product} -> Quantity

Example

Stock table. Normalized or unnormalized?

Unnormalized!

• Let’s take a closer look.

• Is Quantity redundant?

• How about Price?

• What about Location and Discount?

Anomalies

Anomalies happen when a database operation produces the undesired result of affecting the integrity of the database.

Three types:

• Insertion anomaly
• Deletion anomaly
• Update anomaly

The three anomalies combined are known as modification anomalies.

Example

We want to add a blender and its Price to our stock. We can’t unless we know the store where it will be stocked.

Insertion anomaly!

If we close store 17 we have to delete multiple rows, and we lose the information on the price of the vacuum cleaner.

Deletion anomaly!

If we want to change the Location of store 11 we have to change all rows where store 11 appears.

Update anomaly!

How do we fix it?

We “split” the data into separate tables to eliminate redundancies.

Inventory

Store  Product       Quantity  Discount
15     Refrigerator  120       5%
15     Dishwasher    150       5%
13     Dishwasher    180       10%
14     Refrigerator  150       5%
14     Television    280       10%
14     Humidifier    30
17     Television    10
17     Vac Cleaner   150       5%
17     Dishwasher    150       5%
11     Computer      180       10%
11     Refrigerator  120       5%
11     Lawn Mower

New tables

Store

Store  Location  Sq_ft  Manager
15     Houston   2300   Metzger
13     Tulsa     1700   Metzger
14     Tulsa     1900   Schott
17     Memphis   2300   Creech
11     Houston   2300   Creech

Product

Product          Price
Refrigerator     1850
Dishwasher       600
Television       1400
Humidifier       55
Vacuum Cleaner   300
Computer
Lawn Mower       300
Washing Machine  750

New tables (cont.)

This new system is less efficient when retrieving data. That’s the price paid for eliminating the modification anomalies.

We draw the line between efficiency and redundancy.

Discount is stored redundantly. This is called controlled redundancy and is done for efficiency of data retrieval.
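To make the retrieval cost concrete, here is a rough sketch of what rebuilding one wide stock row now involves: a lookup in each of the three tables instead of a single read. The dictionaries hold a small subset of the Inventory, Store and Product data shown earlier; the function name `stock_rows` and the in-memory representation are assumptions made only for illustration.

```python
# Subset of the Inventory table (one dict per row).
inventory = [
    {"Store": 15, "Product": "Refrigerator", "Quantity": 120, "Discount": "5%"},
    {"Store": 13, "Product": "Dishwasher", "Quantity": 180, "Discount": "10%"},
]

# Subset of the Store table, keyed by Store.
store = {
    15: {"Location": "Houston", "Sq_ft": 2300, "Manager": "Metzger"},
    13: {"Location": "Tulsa", "Sq_ft": 1700, "Manager": "Metzger"},
}

# Subset of the Product table, keyed by Product.
price = {"Refrigerator": 1850, "Dishwasher": 600}


def stock_rows(inventory, store, price):
    """Join the three tables on Store and Product to rebuild wide rows."""
    return [
        {**row, **store[row["Store"]], "Price": price[row["Product"]]}
        for row in inventory
    ]


wide = stock_rows(inventory, store, price)
```

Each store's Location and each product's Price is now stored once, so changing either touches a single row; the join is the extra work paid at read time.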

Inference rules for FDs

The set of functional dependencies explicitly specified on a relation schema is referred to as F.

Given F it is possible to deduce all other FDs in R that are not explicitly defined.

Closure is the set of all possible functional dependencies that hold in R. It is also referred to as F+.
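Enumerating F+ in full is rarely practical because it grows very quickly. A common alternative, sketched below, is to compute the closure of a set of attributes under F: X->Y belongs to F+ exactly when Y is contained in the closure of X. The function name `attribute_closure` is hypothetical, and the three FDs are the Store, Product, Quantity and Discount dependencies used as examples in this deck.

```python
def attribute_closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


F = [
    ({"Store"}, {"Location"}),
    ({"Store", "Product"}, {"Quantity"}),
    ({"Quantity"}, {"Discount"}),
]

store_product_closure = attribute_closure({"Store", "Product"}, F)
store_closure = attribute_closure({"Store"}, F)
```

Here `store_product_closure` contains Discount, so {Store, Product} -> Discount is in F+ even though it is not listed in F, while `store_closure` stops at Store and Location.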

Armstrong’s Axioms

In 1974 William W. Armstrong proposed a systematic approach to derive all possible functional dependencies that can be inferred from F using what is now known as Armstrong’s Axioms.

Armstrong’s Axioms (cont.)

Rule           Definition

Reflexivity    If Y is a subset of X [e.g., if X is (A,B,C,D) and Y is (A,C)], then X->Y.
               Example: {Store, Product} -> Store

Augmentation   If X->Y, then {X,Z} -> {Y,Z}; also {X,Z} -> Y.
               Example: If Store -> Location, then {Store, Product} -> {Location, Product} and {Store, Product} -> Location

Transitivity   If X->Y and Y->Z, then X->Z.
               Example: If {Store, Product} -> Quantity and Quantity -> Discount, then {Store, Product} -> Discount

Armstrong’s Axioms (cont.)

Four more rules can be derived from the previous three.

Rule                 Definition
Decomposition        If X->{Y,Z}, then X->Y and X->Z
Union                If X->Y and X->Z, then X->{Y,Z}
Composition          If X->Y and Z->W, then {X,Z} -> {Y,W}
Pseudo-transitivity  If X->Y and {Y,W} -> Z, then {X,W} -> Z
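As an illustration of how the derived rules follow from the three axioms, here is one possible derivation of Union and Decomposition. It is a sketch rather than the only route; juxtaposition such as XY stands for the set {X,Y}.

```latex
\begin{align*}
&\textbf{Union: given } X \to Y \text{ and } X \to Z \\
&1.\quad X \to XY   && \text{augmentation of } X \to Y \text{ with } X \\
&2.\quad XY \to YZ  && \text{augmentation of } X \to Z \text{ with } Y \\
&3.\quad X \to YZ   && \text{transitivity on 1 and 2} \\[1ex]
&\textbf{Decomposition: given } X \to YZ \\
&1.\quad YZ \to Y   && \text{reflexivity, since } Y \subseteq YZ \\
&2.\quad X \to Y    && \text{transitivity on } X \to YZ \text{ and 1} \\
&3.\quad X \to Z    && \text{same argument with } YZ \to Z
\end{align*}
```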

Minimal Cover for a set of FDs

It is always useful to identify a simplified set of FDs, Gc, that is equivalent to F. This means that Gc has the same closure (F+) as F and cannot be reduced any further.

We try to get the set G where F ≡ G. This means that we could enforce either G or F and the valid database states would remain the same.

In practice the minimal cover is useful because it minimizes the effort required to check for violations in the database, thereby improving database performance.
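Equivalence of two FD sets can be tested without computing either closure in full: F ≡ G when every FD in F follows from G and every FD in G follows from F. The sketch below assumes FDs are given as (determinant, dependent) pairs of attribute sets; the helper names `closure`, `implies` and `equivalent` are hypothetical. The sets F and G are the Student/Advisor set and its minimal cover from the examples in this deck.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)


def equivalent(f, g):
    """True if f and g have the same closure (f ≡ g)."""
    return (all(implies(g, lhs, rhs) for lhs, rhs in f)
            and all(implies(f, lhs, rhs) for lhs, rhs in g))


F = [
    ({"Student", "Advisor"}, {"Grade", "Subject"}),
    ({"Advisor"}, {"Subject"}),
    ({"Student", "Subject"}, {"Grade", "Advisor"}),
]
G = [
    ({"Advisor"}, {"Subject"}),
    ({"Student", "Subject"}, {"Grade"}),
    ({"Student", "Subject"}, {"Advisor"}),
]

same_closure = equivalent(F, G)
```

Dropping any one FD from G makes `equivalent` return False, which is the non-redundancy condition a minimal cover has to meet.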

Minimal Cover for a set of FDs (cont.)

F can be its own minimal cover, also known as a canonical cover.

There can be several minimal covers of F.

Formally, Gc is a minimal cover of F if:

• Gc ≡ F

• The dependent (RHS) in every FD in Gc is a singleton attribute. This is called standard or canonical form.

• No FD in Gc is redundant. In other words, if any FD in Gc is discarded, then Gc would no longer be equivalent to F.

• The determinant (LHS) of every FD in Gc is irreducible. In other words, if any attribute is discarded from the determinant of any FD in Gc, then Gc would no longer be equivalent to F.

Algorithm to compute the minimal cover

1. Set G to F.
2. Convert all FDs into standard (canonical) form.
3. Remove all redundant attributes from the determinant (LHS) of the FDs in G.
4. Remove all redundant FDs from G.

Two Notes:

• This algorithm might produce different results depending on the order in which candidates are removed.

• Steps 3 and 4 aren’t interchangeable.
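The four steps can be sketched in Python as follows. This is one possible implementation under stated assumptions, not the slides' own code: FDs are (determinant, dependent) pairs of attribute sets, redundancy is tested with an attribute-closure helper, candidates are tried in list order and alphabetical attribute order (so a different order may yield a different cover), and the names `closure` and `minimal_cover` are hypothetical. The input F is the Student/Advisor set from the examples in this deck.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Steps 1 and 2: copy F into G with one attribute per right-hand side.
    g = []
    for lhs, rhs in fds:
        for attr in sorted(rhs):
            fd = (frozenset(lhs), frozenset([attr]))
            if fd not in g:
                g.append(fd)

    # Step 3: drop determinant attributes the rest of the LHS makes unnecessary.
    for i, (lhs, rhs) in enumerate(g):
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # reductions can leave identical FDs behind

    # Step 4: drop every FD that the remaining FDs already imply.
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] <= closure(fd[0], rest):
            g = rest
    return g


F = [
    ({"Student", "Advisor"}, {"Grade", "Subject"}),
    ({"Advisor"}, {"Subject"}),
    ({"Student", "Subject"}, {"Grade", "Advisor"}),
]
cover = minimal_cover(F)
```

For this input, `cover` holds Advisor -> Subject, {Student, Subject} -> Advisor and {Student, Subject} -> Grade.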

Examples

Consider a set of attributes {A, B, C} and a set of FDs F:

fd1: A->C
fd2: (AC)->B
fd3: B->A
fd4: C->(AB)

• Rewrite fd4 in standard form:

fd4a: C->A
fd4b: C->B

Examples (cont.)

• Based on fd4b, A in fd2 is redundant. We remove it, so fd2 becomes C->B. Now we remove fd4b because it is identical to fd2.

• Next, fd4a is redundant because it follows from fd2 and fd3 by transitivity (C->B and B->A give C->A). We remove it.

• We are left with a minimal cover of F (Gc):

fd1: A->C
fd2: C->B
fd3: B->A

The set {A->B, B->C, C->A} is also equivalent to F and is another minimal cover, which illustrates that F can have several.

Examples (cont.)

Consider the set of attributes {Student, Advisor, Subject, Grade} and a set of FDs F:

fd1: {Student, Advisor} -> {Grade, Subject}
fd2: Advisor -> Subject
fd3: {Student, Subject} -> {Grade, Advisor}

• Rewrite in standard form:

fd1a: {Student, Advisor} -> Grade
fd1b: {Student, Advisor} -> Subject
fd2: Advisor -> Subject
fd3a: {Student, Subject} -> Grade
fd3b: {Student, Subject} -> Advisor

Examples (cont.)

• Given fd2, Student is redundant in fd1b. We remove it. Now we remove fd1b since it is identical to fd2.

• Next, fd1a is redundant because it is implied by the set {fd2, fd3a}: Advisor gives Subject, and {Student, Subject} gives Grade. We remove it.

• We are left with the minimal cover of F (Gc):

fd2: Advisor -> Subject
fd3a: {Student, Subject} -> Grade
fd3b: {Student, Subject} -> Advisor

Conclusion

After we have the ER diagrams, each relation in the schema must be independently reviewed and normalized when needed.

This process gives us the final opportunity to correct errors and establish a robust design before implementing the database system.
