CodeCritics Applied to Database Schema:
Challenges and First Results
Julien Delplanque (1,2), Anne Etien (2), Olivier Auverlot (2),
Tom Mens (1), Nicolas Anquetil (2), Stéphane Ducasse (2)
(1) Université de Mons, [email protected]
(2) Université de Lille, CNRS, Inria, Centrale Lille, UMR 9189 - CRIStAL, F-59000 Lille, France
{nom.prenom}@univ-lille1.fr
Use Case Scenario I: Smells detection
DBAs need tools to highlight smells, anti-patterns and violations of business rules.
Rule = a property that the database should have
• Generic rules, e.g., foreign keys reference primary keys
• Company- or database-specific rules, e.g., ensure that naming conventions are respected
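The two kinds of rules can be illustrated with a minimal Python sketch. It is not the DBCritics implementation: the `Table` model, the function names and the snake_case convention are all assumptions made for the example. Each rule is a function that takes a schema model and returns the entities that violate the property.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical, minimal model of a table (not the DBCritics meta-model)."""
    name: str
    primary_key: tuple = ()
    # Each foreign key is (local columns, target table name, target columns).
    foreign_keys: list = field(default_factory=list)


def fk_not_referencing_pk(tables):
    """Generic rule: report foreign keys that do not target a primary key."""
    by_name = {t.name: t for t in tables}
    violations = []
    for t in tables:
        for cols, target, target_cols in t.foreign_keys:
            target_table = by_name.get(target)
            if target_table is None or tuple(target_cols) != tuple(target_table.primary_key):
                violations.append((t.name, cols, target))
    return violations


# Assumed company-specific convention: table names are lower snake_case.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def badly_named_tables(tables):
    """Company-specific rule: report table names that break the convention."""
    return [t.name for t in tables if not SNAKE_CASE.fullmatch(t.name)]


# Illustrative schema: "Order" breaks both rules.
tables = [
    Table("person", primary_key=("id",)),
    Table("Order", primary_key=("id",),
          foreign_keys=[(("person_email",), "person", ("email",))]),
]

print(fk_not_referencing_pk(tables))
print(badly_named_tables(tables))
```

Modelling a rule as a function over a schema model keeps generic and company-specific rules interchangeable: both only need to return the list of violating entities.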
Use Case Scenario II: DBMS version migration
A DBMS evolves to introduce new features or to fix bugs.
• Upgrade migration patches are rarely provided
• Sometimes a textual change log is provided
• DBAs need to identify the migration impact
Use Case Scenario III: Maintaining consistency
A DB schema may be used as a basis for multiple projects.
• Need to integrate the changes to profit from the original schema updates
• The consistency of the DB should be kept after an update
Additionally...
• All kinds of entities (tables, columns, views, functions, ...) and the relationships between them are potentially subject to quality defects
• Checking for domain-specific or system-specific rules provides better defect prevention
• Automatic detection of quality problems is important but resolving them is the ultimate goal
• Resolving an issue on an entity may imply changes on other entities
Table of contents
1 Introduction
2 DBCritics
3 Case Studies
Overview
⇒ Apply traditional Software Quality Analysis methods to database schemas
Examples of rules
1 Use of * in a SELECT query
2 View using another view
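As a rough illustration of these two rules, the following Python sketch checks view definitions given as SQL text. It is a text-based heuristic written for this example, not the way DBCritics analyses a schema: it assumes definitions are available as strings, and comments or string literals inside the SQL could fool it. All names and the sample views are hypothetical.

```python
import re

# `*` or `alias.*` appearing as an item of a SELECT list.
# `count(*)` is not matched because its `*` follows a parenthesis.
STAR_IN_SELECT = re.compile(
    r"(?:\bselect\s+(?:(?:distinct|all)\s+)?|,\s*)(?:\w+\.)?\*",
    re.IGNORECASE,
)


def uses_select_star(sql):
    """Rule 1: True if the query text selects `*`."""
    return STAR_IN_SELECT.search(sql) is not None


def views_using_views(view_definitions):
    """Rule 2: map each view to the other views named in its definition."""
    result = {}
    for name, sql in view_definitions.items():
        used = sorted(
            other
            for other in view_definitions
            if other != name
            and re.search(rf"\b{re.escape(other)}\b", sql, re.IGNORECASE)
        )
        if used:
            result[name] = used
    return result


# Hypothetical view definitions used only for the example.
views = {
    "active_users": "SELECT id, name FROM users WHERE active",
    "active_admins": "SELECT * FROM active_users WHERE role = 'admin'",
}

print([name for name, sql in views.items() if uses_select_star(sql)])
print(views_using_views(views))
```

A production check would rather rely on a parsed representation of the query than on regular expressions, since the latter cannot distinguish SQL code from comments or literals.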
Evaluation
Discovering rule violations on two real databases
• WikiMedia: 25 versions analysed
• AppSI: 12 versions analysed

            WikiMedia      AppSI
Tables      30/51          71/91
Columns     196/353        583/974
Views       0/1            30/52
Functions   3/5            46/67
Triggers    2/3            12/16
LOC         1,435/2,453    4,910/7,006

Min/Max number of entities per type for each database.
Violation count per version
Rule violations can be found in open source as well as in proprietary DB schemas.
Violating entities proportion
Dashed: violating entities. Solid: entity count.
The number of violating entities evolves with the total number of entities.
“Time-to-fix” of a rule violation
Corrected violations:
• WikiMedia (WM): 21/87
• AppSI: 3/85
⇒ On both DBs some violations are fixed but not all of them.
Time in days needed to correct violations:

         Min   1st quartile   Median   3rd quartile   Max
WM       95    1227           1833     2403           3644
AppSI    3     /              125      /              278
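The summary statistics in such a table can be computed as in the sketch below. The dates are invented for illustration and are not taken from WikiMedia or AppSI. The sketch also assumes that time-to-fix is the number of days between the version that introduces a violation and the version that removes it, which the slides do not spell out.

```python
from datetime import date
from statistics import median, quantiles

# Hypothetical (introduced, fixed) version dates, one pair per corrected violation.
fixes = [
    (date(2010, 1, 1), date(2010, 1, 11)),
    (date(2010, 1, 1), date(2010, 1, 21)),
    (date(2010, 1, 1), date(2010, 1, 31)),
    (date(2010, 1, 1), date(2010, 2, 10)),
    (date(2010, 1, 1), date(2010, 2, 20)),
]

# Time-to-fix in days for each corrected violation.
days = sorted((fixed - introduced).days for introduced, fixed in fixes)

# quantiles(..., n=4) returns the three cut points between the four quarters.
q1, q2, q3 = quantiles(days, n=4)

print("min", days[0], "Q1", q1, "median", median(days), "Q3", q3, "max", days[-1])
```

Several conventions exist for computing quartiles on small samples, so the values produced by `statistics.quantiles` (default "exclusive" method) may differ slightly from those of other tools.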
False positives
Three categories of violations can be distinguished:
1 Real design issues
2 Issues that the DBA is willing to live with
3 Issues due to limitations of DBCritics
Classifying violations into these categories cannot be automated.
On AppSI v10, the DBA analysed the 81 rule violations:

Category   Count
1          51
2          8
3          22

⇒ Cannot be generalised; it only gives an idea.
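To read the counts as proportions, the short Python sketch below checks that the three categories add up to the 81 analysed violations and derives each category's share. The dictionary keys are shorthand labels chosen for the example.

```python
# Counts reported by the DBA for AppSI v10, keyed by shorthand category labels.
counts = {
    "real design issue": 51,
    "accepted by the DBA": 8,
    "tool limitation": 22,
}

total = sum(counts.values())

# Share of each category in percent, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)
print(shares)
```

On this single version, roughly 63% of the reported violations are real design issues and about 27% stem from limitations of the tool, the latter being the false positives in the strict sense.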
Conclusion
• Relational databases are at the core of many information systems
• Like any artefact, they are subject to errors and quality defects
• An empirical study on two real DBs supports the relevance of the approach
• An external validation based on feedback from AppSI’s DBA supports the relevance of the tool’s results
Questions
• Do open-source and proprietary DB schemas behave differently in terms of rule violations?
• How to practically integrate such an approach in the DB life-cycle?
• How to convince DBAs of the relevance of the approach since they have lived without such tools for years?