CodeCritics Applied to Database Schema: Challenges and First Results

Julien Delplanque 1,2, Anne Etien 2, Olivier Auverlot 2, Tom Mens 1, Nicolas Anquetil 2, Stéphane Ducasse 2

1 Université de Mons, Belgium
[email protected]
[email protected]

2 Université de Lille, CNRS, Inria, Centrale Lille, UMR 9189 - CRIStAL, F-59000 Lille, France
{nom.prenom}@univ-lille1.fr


Use Case Scenario I: Smell detection

DBAs need tools to highlight smells, anti-patterns, and violations of business rules.

Rule = a property that the database should have

• Generic rules, e.g., foreign keys reference primary keys

• Company- or database-specific rules, e.g., naming conventions are respected
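A rule in this sense can be read as a predicate over a schema model that returns the entities violating it. The following is a minimal sketch under stated assumptions, not the tool's actual implementation: the `Column`, `ForeignKey`, and `Table` classes, the two rule functions, the snake_case convention, and the example schema are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical, deliberately tiny schema model (not the tool's meta-model).
@dataclass
class Column:
    name: str
    is_primary_key: bool = False


@dataclass
class ForeignKey:
    source_column: str
    target_table: str
    target_column: str


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)
    foreign_keys: List[ForeignKey] = field(default_factory=list)


def fk_must_reference_pk(schema: Dict[str, Table]) -> List[str]:
    """Generic rule: every foreign key should point at a primary-key column."""
    violations = []
    for table in schema.values():
        for fk in table.foreign_keys:
            target = schema.get(fk.target_table)
            pk_names = (
                {c.name for c in target.columns if c.is_primary_key}
                if target else set()
            )
            if fk.target_column not in pk_names:
                violations.append(
                    f"{table.name}.{fk.source_column} -> "
                    f"{fk.target_table}.{fk.target_column}"
                )
    return violations


# Assumed company convention: table and column names are snake_case.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")


def names_must_be_snake_case(schema: Dict[str, Table]) -> List[str]:
    """Company-specific rule: report every name that breaks the convention."""
    violations = []
    for table in schema.values():
        if not SNAKE_CASE.match(table.name):
            violations.append(table.name)
        for column in table.columns:
            if not SNAKE_CASE.match(column.name):
                violations.append(f"{table.name}.{column.name}")
    return violations


# Made-up example schema with one violation of each kind.
schema = {
    "person": Table("person", [Column("id", True), Column("email")]),
    "Invoice": Table(
        "Invoice",
        [Column("id", True), Column("person_id"), Column("contactEmail")],
        [
            ForeignKey("person_id", "person", "id"),
            ForeignKey("contactEmail", "person", "email"),
        ],
    ),
}

fk_violations = fk_must_reference_pk(schema)
naming_violations = names_must_be_snake_case(schema)
```

Expressing each rule as a function that returns violating entities keeps generic and company-specific rules interchangeable: a checker only needs a list of such functions.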


Use Case Scenario II: DBMS version migration

A DBMS evolves to introduce new features or to fix bugs.

• Upgrade migration patches are rarely provided

• Sometimes a textual change log is provided

• DBAs need to identify the migration impact


Use Case Scenario III: Maintaining consistency

A DB schema may be used as a basis for multiple projects.

• Changes need to be integrated to benefit from updates to the original schema

• The consistency of the DB should be preserved after an update


Additionally...

• All kinds of entities (tables, columns, views, functions, ...) and the relationships between them are potentially subject to quality defects

• Checking for domain-specific or system-specific rules provides better defect prevention

• Automatic detection of quality problems is important, but resolving them is the ultimate goal

• Resolving an issue on an entity may require changes to other entities
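The last point, that fixing one entity can ripple to others, can be illustrated with a transitive walk over a dependency graph. This is only a sketch under assumptions: the `impacted_entities` function, the `dependents` mapping, and the entity names are hypothetical, and a real tool would derive the dependencies from the schema itself.

```python
from collections import deque
from typing import Dict, List, Set


def impacted_entities(dependents: Dict[str, List[str]], changed: str) -> Set[str]:
    """Return every entity that directly or transitively depends on `changed`."""
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            # Skip entities already visited, and the changed entity itself
            # in case the graph contains a cycle back to it.
            if dependent not in seen and dependent != changed:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Made-up dependency graph: entity -> entities that use it.
dependents = {
    "person.email": ["v_contacts", "trg_check_email"],
    "v_contacts": ["v_active_contacts", "fn_export_contacts"],
}

impact = impacted_entities(dependents, "person.email")
```

Here a change to the column `person.email` reaches two views, a trigger, and a function, which is the kind of impact set a DBA would want to review before applying a fix.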


Table of contents

1 Introduction

2 DBCritics

3 Case Studies


Overview

⇒ Apply traditional Software Quality Analysis methods to database schemas


Examples of rules

1 Detect use of * in a SELECT query

2 View using another view
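Both example rules can be approximated on the text of view definitions. The sketch below is a simplification under stated assumptions: it uses regular expressions rather than a parsed query, so it only catches `*` as the first item of a select list and only sees relations named directly after FROM or JOIN. The function names and the example views are hypothetical.

```python
import re
from typing import Dict, List

# Matches "SELECT *", "SELECT DISTINCT *" and "SELECT t.*", but not "count(*)".
SELECT_STAR = re.compile(r"\bselect\s+(?:distinct\s+)?(?:\w+\.)?\*", re.IGNORECASE)

# Captures the relation name that follows FROM or JOIN.
FROM_OR_JOIN = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def views_using_select_star(views: Dict[str, str]) -> List[str]:
    """Rule 1: report views whose definition selects '*'."""
    return [name for name, sql in views.items() if SELECT_STAR.search(sql)]


def views_using_other_views(views: Dict[str, str]) -> Dict[str, List[str]]:
    """Rule 2: map each view to the other views its definition reads from."""
    result = {}
    for name, sql in views.items():
        referenced = {match.lower() for match in FROM_OR_JOIN.findall(sql)}
        used = sorted(v for v in views if v.lower() in referenced and v != name)
        if used:
            result[name] = used
    return result


# Made-up view definitions: view name -> defining query.
views = {
    "v_person": "SELECT * FROM person",
    "v_adult": "SELECT id, name FROM v_person WHERE age >= 18",
    "v_count": "SELECT count(*) AS n FROM person",
}

star_views = views_using_select_star(views)
nested_views = views_using_other_views(views)
```

A checker working on a parsed representation of the queries would avoid the blind spots of the regular expressions; the text-based version is kept here only because it fits in a few lines.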


Evaluation: Discovering rule violations on two real databases

• WikiMedia: 25 versions analysed

• AppSI: 12 versions analysed

            WikiMedia      AppSI
Tables      30/51          71/91
Columns     196/353        583/974
Views       0/1            30/52
Functions   3/5            46/67
Triggers    2/3            12/16
LOC         1,435/2,453    4,910/7,006

Min/Max number of entities per type for each database.


Violation count per version

Rule violations can be found in open-source as well as in proprietary DB schemas.


Violating entities proportion

Dashed: violating entities. Solid: entity count.

The number of violating entities evolves with the total number of entities.


“Time-to-fix” of a rule violation

Corrected violations:

• WikiMedia (WM): 21/87

• AppSI: 3/85

⇒ On both DBs some violations are fixed but not all of them.

Time in days needed to correct violations:

        Min   1st quartile   Median   3rd quartile   Max
WM      95    1227           1833     2403           3644
AppSI   3     /              125      /              278
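The slides do not state how time-to-fix was measured. One plausible reading, sketched below purely as an assumption, is the number of days between the first analysed version in which a violation appears and the first later version in which it is gone. The `time_to_fix` function, the violation identifiers, and the version history are made up for illustration.

```python
from datetime import date
from statistics import median
from typing import Dict, List, Set, Tuple

# One analysed version: (release date, identifiers of violations found in it).
Version = Tuple[date, Set[str]]


def time_to_fix(history: List[Version]) -> Dict[str, int]:
    """Days from first appearance to first disappearance, per fixed violation.

    `history` must be sorted by release date. Violations that never disappear
    are left out, and a violation that reappears later is not counted again.
    """
    first_seen: Dict[str, date] = {}
    fixed_after: Dict[str, int] = {}
    for released, violations in history:
        for violation in violations:
            first_seen.setdefault(violation, released)
        for violation, seen in first_seen.items():
            if violation not in violations and violation not in fixed_after:
                fixed_after[violation] = (released - seen).days
    return fixed_after


# Made-up history: A is fixed in the third version, C in the fourth, B never.
history = [
    (date(2015, 1, 1), {"A", "B"}),
    (date(2015, 3, 1), {"A", "B", "C"}),
    (date(2016, 1, 1), {"B", "C"}),
    (date(2016, 1, 11), {"B"}),
]

fixed = time_to_fix(history)
median_days = median(fixed.values())
```

Under this definition the measured delay is bounded below by the gap between analysed versions, so a violation fixed between two releases is only observed as fixed at the later one.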


False positives

Three categories of violations can be distinguished:

1 Real design issues

2 Issues that the DBA accepts living with

3 Issues due to limitations of DBCritics

Classifying violations into these categories cannot be automated.

On AppSI v10, the DBA analysed the 81 rule violations:

Category   Count
1          51
2          8
3          22

⇒ This cannot be generalised; it only gives an indication.


Conclusion

• Relational databases are at the core of many information systems

• Like any artefact, they are subject to errors and quality defects

• An empirical study on two real DBs supports the relevance of the approach

• External validation, based on feedback from AppSI's DBA, supports the relevance of the tool's results


Questions

• Do open-source and proprietary DB schemas behave differently in terms of rule violations?

• How to practically integrate such an approach in the DB life-cycle?

• How to convince DBAs of the relevance of the approach since they have lived without such tools for years?
