14
( An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb Laboratoire MASI, Universite P. et M. Curie, Centre de Versailles 45, avenue des Etats-Unis 78000 Versailles, France Email: [email protected] A bstact : Database design has become an art that requires a high level of competence. The database design process is characterized by a certain indetermination in the way of choosing data structures and constraints. Several different schemas may describe the same reality. The design process is also characterized by an intuitive and empirical methodology; consequently, the quality of the schema obtained is heavily dependent on the database administrator's experience and insight in the database design. The development of increasingly sophisticated and efficient relational Data Base Management Systems has emphasized the need for increasingly complex information systems. It is therefore no longer possible to envision database design carried out without a computer aid. The expert system approach, described in this paper, is more adapted to the difficult problem of database modeling. It integrates algorithms, heuristics and practical know-how, and proposes an interactive methodology which is based on abstraction levels, user friendly interfaces and a modular knowledge base. This approach was fIrst implemented by a prototype which was followed by a commercial product named SECSI<l!> (Systeme Expert en Conception de Systemes d'information). Keywords: Information Systems, Database Design, Semantic Data Models, Relational Model, Expert Systems, Computer Aided Software Engineering. ® SEeSI is a registrated mark of INFOSYS.

An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

Embed Size (px)

Citation preview

Page 1: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

(

An Expert System forSemantic and Relational Database Design

Mokrane BOllzeglzollb

Laboratoire MASI, Universite P. et M. Curie, Centre de Versailles

45, avenue des Etats-Unis 78000 Versailles, France

Email: [email protected]

Abstact :

Database design has become an art that requires a high level of competence. The databasedesign process is characterized by a certain indetermination in the way of choosing datastructures and constraints. Several different schemas may describe the same reality. The designprocess is also characterized by an intuitive and empirical methodology; consequently, thequality of the schema obtained is heavily dependent on the database administrator's experienceand insight in the database design. The development of increasingly sophisticated and efficientrelational Data Base Management Systems has emphasized the need for increasingly complexinformation systems. It is therefore no longer possible to envision database design carried outwithout a computer aid. The expert system approach, described in this paper, is more adaptedto the difficult problem of database modeling. It integrates algorithms, heuristics and practicalknow-how, and proposes an interactive methodology which is based on abstraction levels, userfriendly interfaces and a modular knowledge base. This approach was fIrst implemented by aprototype which was followed by a commercial product named SECSI<l!> (Systeme Expert enConception de Systemes d'information).

Keywords:

Information Systems, Database Design, Semantic Data Models, Relational Model, Expert •Systems, Computer Aided Software Engineering.

® SEeSI is a registrated mark of INFOSYS.

Page 2: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

1. IntroductionRelational data bases are nowadays the preferentialtools for memorization and restitution of largeamounts of data. This is especially true because ofthe great performance of the systems (DBMS).However, using the relational model as aconceptual design tool was somewhatcontroversial [KENT 79]. Indeed relationalconcepts are neither simple to use nor sufficient tocapture the semantics of the user's application. Atthe conceptual level, new models are necessary tocapture the semantics of the real world withpreciseness and naturalness. Semantic datamodels seem to be more suitable for this objective[SMIT 77], [HAMM 81], [MYLO 80], [HULL87], [PECK 88].

Even with these new models, database designremains a lengthy and tedious process withoutcomputer aids. Many computer tools have alreadybeen built, but they are far from solving all thedata base design problems.

We have proposed a new approach based ontechniques used in artificial intelligence [BOUZ83]. This approach aims to combine variouscategories of knowledge coming both fromrelational theory, semantic data models, artificialintelligence and software engineering. An expertsystem product for conceptual and logical designhas been built to apply the main ideas of thisapproach and to point to specific problems[BOUZ 85].

The SECSI system is developed in Prologlanguage, under PC-MS/DOS environment andUNIX workstation environment (particularly SUNworkstations). Several other environments areenvisioned in the nearest future. The product wascommercially available since june 1988, and morethan fifthy installations exist today. An englishversion will be available too.

The product SECSI can be considered as acentral part of a software engineeringenvironment. It was one of the first tools of theCASE technology. SECSI can be used as well asby expert designers for data modeling or schemavalidation, and by novice users and students inorder to learn database design and relationaltechnology.

After a brief outline, in section 2, of the database design problems in both traditional and newdatabases, section 3 presents an overview of theexisting tools. Section 4 characterizes the expertsystem approach for database design. Sections 5and 6 detail the expert system approach, itsobjectives, its structure and the knowledgeinvolved. Section 7 highlights how a deductive

approach can efficiently contribute to the databaseconceptual design. Finally, section 8 concludes bypointing to the future developments.

2. Database Design ProblemsData base design is a difficult task. Since thebegining of the database era, many designproblems have been pointed. In addition to theseproblems, the introduction of knowledge bases,deductive databases and object oriented databases,impose new complex problems. The followingsub-sections outline the different problems as wellas in traditional databases and in future databases.In the following, we are not concerned by thephysical organisation of data, nor by itsoptimization. We point only to those problemsimplied by the conceptual level and the logicallevel (i.e. semantic level).

2.1. Design problems intraditional databases.

In this section, we outline the main problemswhich must be solved in traditional databases.

(1) Size oj the application: the difficulty ofthe design increases with the size of theapplication. Designing small data bases is quiteeasy, but structuring several hundred objects,relationships and constraints without computeraids is always a hard task, especially for peoplewhose profession is not to be a "specialist" indatabase design.

(2) Relative perception oj the real worldand modeling choices: the real world may bereported in different ways with respect to thedesigners' perceptions: several database schemasmay describe the same reality. The problem is tocharacterize the best schema with respect to formalrules, then to fmd a methodology to build and tochoose this schema.

(3) Availability oj injormation and theproblem oj restructuring : the designercannot often get a detailed description of objectsbecause of the lack of knowledge about theapplication. However the design must start withincomplete and imperfect knowledge. As thedesign is bas"ed on a fuzzy universe of discourse,new information may arise and may entail changesin the definition of classes, relationships andconstraints.These changes must be integratedwithout reconsidering all thc modeling phaseswhich were already done.

Page 3: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

(

2.2. Design problems in futuredatabases

Besides the traditional database problemskn.owledge bases. deductive databases and objectonented databases have introduced the followingproblems [BROD 84). [GALL 84). [BROD 86).

(l). Representation of more complexobjects: future databases will be able torepresent more complex objects than flat datastructures. The new objects may be rules. abstractdata types•.texts. graphics or images. Hence thecorrespondmg representation models could bevery difficult to learn and to use.

(2) Rep'resentation of more complexcOllstramts : models have to express morecomplex constraints as general integrity constraints(state constraints) and dynamic constraints(transition constraints). For example. some logicbased models allow to express most constraints asany first ~rder formu~as. Besides this problem ofspecification, the deSigner has to decide wether aset of integrity constraints is consistent. easy tocheck and to maintain.

(3). Represent~tion of tile dynamics ofob!ects: tr~dltlonal database design makes aSlnct separation between data and the behaviouralof this data. The trends in the new models are (as~vell as in the object oriented languages) tomtegrate the dynamic aspect with the static aspectof the database [OODB 86). This leads to adouble reflection effon during the database designprocess.

. This list of problems shows how complex anddifficult the database design was and will be.The.n! nowdar, no useful methodology could beenvlSlOl1ed Without computer design aid. That iswhy CASE technology becomes more and morenecessary and urgent.

3. Overview of ExistingDatabase Design Tools

To solve some of the traditional databaseproblems. many tools have been proposed andcommercialized during the last decade [DBEN84]. [BOUZ 86b). [BROD 87). We distinguishthree categories of tools. They are generallycharacterized by the methodology and the modelswhich are supponed.

The first category of tools consists of. graphi~al tools which are generally based onsemantic data models lilce the Entity-RelationshipModel [CHEN 76), or the Semantic Hierarchy

2

Model [SMIT 77). The different graphical toolswhic~ are pr?vided act as a word process; theyp~rrrut to deSign. to store and to layout aestheticdla~rams. but they rarely help in the modelingchOIces nor m the consistency checking of thedeSIgned schemas. All the design choices are leftto the database administrator (DBA). In the bestcase. the graphical tool offers a few facilities forsyntactic control or. if integrated with a datadictionary. it insures the correspondance betweent~e ~omponents of the schema and the terms of thedIc~lOnary. These tools are very interesting fortherr user fn~ndly aspect and their help to a cleandocumentation. but they are far from beingeffective modeling tools as they do not make anydesign choice.

The second category of tools provides a setof al~orithms which are generally built upon theRelanonal M?d.el [CODD 7~). Such tools provideprograms denvmg a normalized relational schemafroma set of attributes and dependencies [BERN76) [BEER 79) [FAGIN 77). This approach is oneof the best formalized. it permits to characterizeand to automaticaly design the best relationalschema: with respect to redundancy and to updateanomalies. Unforrunately, this approach ignoresnatural. objects such as entities. relationships,genera~zatlon hierarchies and semantic integrityconstramts. The only constraints which arehand.led are generaly functional dependencies andmultivalued dependencies (all of them consideredas semantic links). But the acquisition of thesedependencies. in the context of the universelrelation assunption. remains a combinatoryproblem which requires a detailed study of thesemantics of all elementary relationships betweenattributes. However. if these tools are not suitablefor the conceptual level, they could be strongcomponents at the logical level.

The third category of tools consists of a setof interactive programs which can be consideredas a combination of manual tools and algorithmictools. They often call for CAD techniques [BROD87). Interactions between the users and thesystem are question-answering oriented orgraphics language oriented. This approach isbased on the idea that database design is panly anautomated process and partly a human an'lBecause of the last reason. this category of tools ismore realistic t!Jan the. previous category of tools,and thus more mterestmg from the practical pointof view. However if this approach does notalways succeed. it is because computer aids areprogrammed once and for all; hence it is difficultto modify or to add design rules. Tools appear asblack boxes in which you must believe and accepttheir results, or you don't believe, then youremake manually the design. But this remains aninteresting approach if we combine it with the new

Page 4: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

developments in knowledge engineering anddeductive systems. That is what we have donewith SECS!.

The computer design tool we have built,combines the advantages of these three catogoriesof tools and aims to do more by providing aunified methodology which combines theoreticalknowledge (expressed by algorithms and rules)and practical know-how (expressed by heuristicsand cooking recipes). The deductive approachfulfils the lacks of the first two categories of 100ls,and a modular knowledge base increases theevolution of the design techniques and allows theCASE system to evolve better than the thirdcategory of tools.

4. SEeSI: a New Generationof Tools for the DatabaseDesigners

This section summarizes the different objectives ofSECSI, highlights its different characteristics as anexpert system and reports the different levels ofexpertise of its knowledge base.

4.1. Main Objectives of SECSI

SECSI has been conceived for answering manyproblems related 10 the development of databaseapplications. Generally, design 1001s are devotedto a given specific task within the design process.SECSI aims to cover all the life cycle of thisprocess, from the conceptual level to the physicallevel [BOUZ 86a]. These objectives are declaredhereafter as a set of questions for which thedesired system must give answers.

(l) Conceptual lIlodeling: how to transformincomplete specifications of a problem, expressedin an ambiguous natural language, into a formal,complete and consistent conceptual schema? How objects can be classified as entities,

relationships, attributes or constraints? Can thehuman designer be liberated from these differentchoices?

(2) Physical design : how to generate anefficient data slOring and retrieving structure,starting from the conceptual schema of adatabase? How to take into account the differentparameters that characterize: (i) the givenapplication and its cost requirements, (ii) thesoftware and hardware environment which will beused for development and implementation?

(3) Schema restructuring: how to make surethat the database structure will evolve as the

3

information system changes? How to add newattributes, new relationships and new constraintswithout redesigning the whole database? How 10integrate several data base schemas that have beendesigned independently?

(4) Database administration: how to developand maintain a detalled and precise documentationon the complete life cycle of the database (from thedesign phase through the operational phase) ?What are the repercussions on the data and thealready eltisting programs when the characteristicsof the software or hardware environment aremodified ?

Some of these objectives have already beenreached (e.g. conceptual and logical design, someschema restructuring and some documentation).Others are in the prototyping phase (e.g. physicaldesign, view integration).

More than just a tool for the design of databaseschemas, SECSI can also be used in many otherways as for example, a validation tool of manualdesign or as an educational tool in data modeling.Combined with a DBMS, SECSI can beconsidered as a powerfull prototypingenvironment of relational databases. SECSI isalready used to design buiseness applications aswell as technical applications like geographicaldatabases.

4.1. Main Characteristics of anExpert System for Database Design

The general defmition of an expert system statesthat it is a specialized software 1001 which solves aproblem as well as a human expert can do [HAYE84] . Practically, in our area, this definition maybe interpreted by the following four concretecharacteristics:

The first characteristic is to constitute acomplete knowledge base includingtheoretical algorithms and rules, and experimentalknowledge in the database design process.

The second characteristic is to provide aninteractive methodological environmentwhich accepts incomplete specifications, providesthe same reasoning as a human expert, permitsbacktracking 10 any design step, uses a question­answering system and can infer over examples.

The third characteristic is to be an opensystem which can learn and integrate newtheoreiical and experimental rules. The systemmust also trartsfer its expertise through its use (viaexplanation of its design rules) and throughjustification of its results.

Page 5: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

The founh characteristic is to provide end·user friendly interfaces by offering aworkstation which reproduces the main featuresof the manual design (Le. a graphical interface todesign the first rough sketch of the databaseschema. a natural language to facilitate thecommunication of the specifications, and adeclarative language to specify high levelassertions that could have ambiguousinterpretation in the natural language or a complexrepresentation in the graphical interface). Tofacilitate the interaction between the system and theuser. other features like icons. menues and mousemust be used too.

4.2. The Different Levels ofExpertise of a Case Toolin Database Design

propenies are described by facts and rules whichconstitute the detailed description of a givenapplication.

All the knowledge referred to above constitutethe knowledge base of the expen system SECS!.

5. An Open SystemArchitecture

SECSI architecture is characterised by itsmodularity which permits, thanks to the atomicityof its knowledge base. to add new design rulesand new various interfaces. This section describesthe components of this architecture.

5.1. The knowledge baseThe expenise of the system is divided into threecategories:

The knowledge base is composed of two pans: arule base and a fact base.

Fig,1: The SEeS! architecture

The rule base is created and updated by theexpen designer. This base contains general rulessuch as normalization rules, mapping rules; butalso specific rules which can be system dependentor even application dependent. General designrules are grouped into the DESIGN module whichis used by the methodological regulator and theexplanator. This part of the knowledge constitutesthe system shell. hence automatically delivered inthe basic version of the system.

The fact base is designed by the databaseadministrator. It contains compiled specificationsdescribing a given application. This part iscreated and updated by the DBA. It correspondsto the problem which is submitted to the system.

I

I .N E,E N• GE IN NC EE

IUUlfATOI

RULIS

Dun ImUI"ClSl"I\"DAiD SQL

ISOWLlDCl ....n

DOCUMlNT4110lf

INTlRlACU

II'fATUlAL IIDleu""T·1

ICRAPHIC·I

(3) Specific application knowledge: withina specific application domain. there are somepropenies which characterize each application(e.g. reservation in a given travel agency). These

(2) Specific domain knowledge: for eachapplication domain (e.g. insurance. banking.medicine. travels). there is some commonterminology. managerial rules and skills whichrepresent the specific know-how of the domain.This know-how can be represented. as well as wecan fomalize it. either by general behavioural rulesor by general predefined structures. Thisknowledge is stored in the system data dictionaryand reused. when appropriate. during the designprocess.

(1) Theoretical knowledge: this knowledgecoincides with the design concepts (models, rules)and the design methodology (advices. reasoningprinciples). Theoretical knowledge is composedof algorithms. rules and heuristics:

A1~orithms : some parts of the design processare well isolated and formalized. and havealready been expressed by many efficientalgorithms. such as normalization algorithms.cost evaluation of transactions and access pathsoptimization.

- .R.uk£: correspond to some known or admittedexpenise as in normalization process(Armstrong's inference rules). view integ­ration rules. mappings rules between differentmodels. and consistency enforcement rules.

- Heuristics: may be particular interpretations ofthe real world. assumptions in some valuedistributions. or simplification of thecorrelations between attributes or betweenconstraints.

4

Page 6: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

SIJIl'lber Muque r)Jle PO'/I'tl' Color

V~fd rd

To represent these two types of knowledge, weutilize two different representation models: asemantic network to represent facts and productionrules to represent behavioural constraints and tospecify the design rules. The semantic network isdefined by a set of typed nodes (attributes, values,srtructured objects and possibly their instances)and a set of typed arcs representing relationshipsbetween nodes (aggregation, generalization,association and some constraint arcs) [BOUZ 84].To.enhance the semantics of this model, additionalconstraints are defined: domains, roles, keys,intersection of classes, cardinalities and functionaldependencies. The figure 2 gives an illustration ofthe different concepts of this semantic data model.

pl1.1)

'"'I'uro<7.aFig.2: An example of a semantic network

Each semantic relationship is represented by acouple of binary arcs. Arcs a and p are calledatomic aggregation arcs, arcs r and 0 are calledmolecular aggregaiion arcs, and arcs g and s arecalled generalization-specialization arcs.Cardinality constraints are expressed both overalp arcs and rio arcs. Other constraints likefunctional dependencies are portrayed by fd arcstoo.

6.2. The inference engine

The inference engine is a program composed ofthe basic mechanisms which permit to manage therules into the rule base and to apply them onto thefact base. This inference engine functions by analternate use of backward-chaining and forward­chaining. The combination of these two principlesis made necessary because each step of the usedmethodology consists of:(I) Proving an hypothesis: for example. is a

constraint derivable from a set of other given

constraints? This leads to the use of atheorem prover principle, based on abackward-chaining.

(2) Transforming specifications from one givenform to another: for example, the successivemapping of the natural languagespecification to the relational model schema.This leads to the use of an inference engine,based on a forward chaining.

Besides this basic program, the inferenceengine provides two important modules: themethodology regulator and the results justificator.The first module allows the user to backtrack toany design step to redesign his database or tomodify his specifications. The second modulemakes the system able to explain and justify itsresults.

6.3. The external interfaces

Describing the application in a comprehensiblemanner is an important problem in data basedesign. SECSI offers three types of interfaces: thespecification interfaces, the acquisition interfacesand the interaction interfaces.

(1) The specification interfaces consists ofthree languages:i. A natural language which accepts simple

sentences, possibly composed of a conjunctionof subjects,a verb and conjunction ofcomplements. Its role is particuralyimportant for novice designers and for the easycommunication of specifications. It also allowsa rapid handling of the system without leamingany formal language.

ii. A high level declarative language which helpsto specify what the natural language cannoteasily express. This language permits also todescribe any database schema specified intoany given data model

Fig,_3 An example of an applicatioD specification

5

Page 7: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

iii. A graphical interface which enhances the userfriendly interaction. This interface is usedeither as an input facility or as a layout feature.As there is no common standard representationof graphics, SEeSI provides graphicsgenerator which permits to each user to havehis own diagrams.

(3) The interaction interfaces distinguisheicons, menus, forms and documentation:

i . The menus visualize the authorized operae tionson a given active window. According to thestages of design, only the permitted operationsare visualized or accessible.

~.

CARDINAlITIES

DElJVf]S

VEJDC1LQIE.\T

fUNCTIONAL DEPE~llENCIES

"APPLICA1IONS SECSI llODElINO

APflJCATI~~ NAlIE

iElmONSmp ACQI11SITION Of CIJIOIN.lLITY CONS1llAl~TS

DDJ'IERS('IUllCli,llJOO~A~)

ii. Forms facilitate the capture of some cons­traints (e.g. functional depedencies, cardi­nalities), and permit to hihglight the defaultoptions generated by the system.

Fjg4: An example of a graphical interface

To be more flexible, the designer of a data basehas the choice to describe his application in one ormore of these languages, and then combining themin the same specification. An interactive parsergenerates a fact base from the description, thisbase being progressively enriched by theinteractive acquisition module and transformedinto a canonical semantic representation.

(2) The acquisition ill/erface : the interactiveacquisition assistance helps in completing thedescription of a problem through a question­answering system that automatically reminds theuser what he might have forgotten to specify orwhat he has ignored in his description. Thisinteractive acquisition process is based ondeduction rules, heuristics, and the analysis ofexamples fed in by the user.

NOiMAWATIOH PBASI or lUI RELATICfl

VIllIlli(NU\lB>J.II,UQUF,lYI<J'(7IIIl.lXLO«j

F~cr[ONAL DEPENDENCIFJ EXAMPLES or TUPLESNlfMIlER. -) lYPl!, WllQUB, rot.Oi.

T1JPlB I lUPUil T1JPlBmE·)POWEl.

NOS VALmE fUNrf. DEPEHDENC. "'tJWIER. 1234 1234 5618DEDUCED FiOM EXAMPLrJ 1m M A1 A1

Y.ARQlfE .J.) lYPl!, h1JMllEi.",W!Jl 1 1TYPE -It) h1.MlER

Nl'f8ER..j·) TYPahtrMllEl ""QUO R R R

>YOill' WlIll~eul'e ootwstll the ded'oco.t IX &Mil ~peDdeD<:a,P..... •lllI1d you 1tI1 .. ",!bot NUMBER cIelemint<mE" DOl 1

Fig.6: A menu and fonn example for cardinality acquisition

ii. The documentation is accessible at all levelswhenever the user of the system requests it,during a session or independently. It consistsof a concise user's manual, a synthesis of themethodology implemented by the system, andthe defmition of the main notions of data basesto which the system refers.

6.4. The Expected Results

When the design process is terminated, the systemproduces the following results:

(1) A set of basic relations in 4NF and the variouskeys of these relations (primary keys andcandidate keys).

(2) A set of virtual relatjons (or views) and theIdefmition of the corresponding relational querieswhich permit to derive them from the implementeddatabase relations.

(3) A set of constraints like domain, referential andother general semantic constraints which havebeen generated by the mapping rules and thenormalization process.

Fig.5: Interaction between cODlrajnts and examples

6

Page 8: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

t1lEA1E VIEW I'lliONASSELEcr_, plm:, lddressfRO~CliU

~~IOS

SELEcrnm, plm:, ,<\:bes$fROMSu~

Fig.? Example of results produced by SEeS!

All the results can be obtained, as needed,either in standard SQL, as far as this language cando it, or in a more readable ad hoc language (i.e. adeclarative language). Fot those results thatcannot be generated in SQL (especially semanticintegrity constraints), the declarative language isused, and it is of the responsibility of the databaseadministrator to program these results in thecorresponding language used for his application.

6.A Modular and ProgressiveDesign Approach

The design approach adopted by SECSI andportrayed in figure 8 is based on three abstractionlevels: the conceptual level, the logical level, andthe physical level. To each level there corresponda specific model and a specific design process.The methodology proceeds by stepwiserefinement, going from informal description of agiven problem down to physical representation ofthis problem in terms of records and meso Staningfrom informal know ledge, three main phasessuccessively produce: a formal specification, aconceptual schema, a logical relational schema anda physical schema.

The conceptual design phase generates from theexternal description of an application a soundconceptual schema stored as a semantic networkwith its associated constraints. This generation isperformed while conversing with the end-user.The different views of the application given by theend-user(s) are mixed into one description afterelimination of redundancy and resolution ofconflicts. A verification step performs thevalidation of the application description in order togenerate a consistent conceptual schema. Inaddition to the syntactic controls, this phasedetects generalization cycles, recursiveassociations and objects which play several rolesin the same relationship. The system tries to

7

eliminate the possible inconsistencies usingpredifmed inference rules or with the end-user'shelp.

The logical design phase transforms theconceptual schema into a fourth normal formrelational schema with its associated integrityconstraints and views. It performs the interactiveacquisition of constraints and the choice of firstnormal form relations. Constraints such asintersection and union of classes, cardinalities ofrelationships and functional dependencies betweenattributes are captured. The fust normal formrelations are constructed by suppressinggeneralization hierarchies and removingmultivalued attributes. Normalization is carriedout using dependencies between attributes.

~~~~. 'Y ~

R,qwlrtlllUI u.". atqlllrtlllnl ... Ip." RtillllrllJlnl ... IJ."

... ''''In'''~ .., ••~n"".. J,.,,'n,,'"

~t;:JpView In~itltlo.

CONCEPTUAL~...w.. SCHEMA I4lp~_1Iru-.....~.Ir-

~hoplllltl .lIdlIormllluUoa,

LOGICAL~I_oohn SCIJEMA ....oetJ0t6ctl..._ ..

CMaJ&iIu \bt gll:-..oa

OPu~J..CIU•pnYSICAL

SCHEMA

Fig,S; The deSlgn melbodotowy

The physical design phase gives an optimizedphysical schema of the data base. This schemaincludes both a set of initial implemented relationsand a set of indexes and formats. The choice ofimplemented relations and plausible attributes forindexing depends on the most important or mostfrequent queries which will be performed on thedatabase. This choice needs some estimationsdepending on the DBMS used.

Page 9: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

(

The design of a database is an iterative .process.It implies numerous comings-and-goings betweenthe universe of discourse and the expert system.SEeSI provides this iteration at two levels :

(i) by permitting the interuption of a workingsession without losing the achieved task.The recover of the session can be donewithout any redesign of the last schema.

(ii) by authorizing the interuption of a dialogue tomodify an assertion or to return to a precedingquestion.

Some resumptions are automatic at each of thislevels without having to manually return to thecurrent stage of modeling.

The design of a data base is not a fullyautomatic process: a permanent interaction withthe designer allows the comb.ining of algorithmictasks with human decisions.

7. How a DeductiveApproach Can Contributeto the Conceptual Design

This section highlights the contribution of expertsystem approach in the database design process.We particularly focus on that parts where thesystem has enough knowledge to infer from rules,heuristics and examples.

7.1. The Interpretation and theControl of Specifications

Besides the syntactical checks of sentences, one ofthe hard problem, when compiling userspecificiltions, is to decide wether a term in a givensentence must be considered as an attribute, anentity, a relationship or an integrity constraint.

Sentences are not only interpreted asindependent units, but also as a whole consistentspecification whose interpretation is stronger thanthat of each sentence. So when a sentence isfollowed by another one, its interpretation couldbe modified because we get more information byreading two sentences than by reading only one.For example, from the following sentence: "Aprodllct has a nllmber, a IInit-price, and itssllpplier", we understand that there is an objectnamed "product" and characterized by threeattributes: "numberll

, t1unit-price", "supplierll• But

if we add a new sentence like: "Each prodllctsupplier, described by his name and address,supplies one to ten parts", we modify theprevious interpretation by removing the attribute

8

"supplier" and generating another object("supplier") described by two attributes ("name"and "address") and a relationship ("supplies")between "product" and "supplier". In fact we gotmore information in the second sentence as weknow the minimum and the maximum number ofproducts supllied by a given supplier (i.e.cardinality constraints). But the second sentenceintroduces an additional complexity, related to theusage of synonyms ("product" and "parts"), thatcan be solved if a data dictionary is provided.

Another problem is related to objects that playseveral roles in the same relationship. Forexample, in the sentence: "A person could bemarried to another person", we do not know whois the wife and who is the husband. Theinterpretation must complete this sentence byacquiring the different roles from the user andmodifying the previous sentence as follows: "Aperson as a husband could be married to anotherperson as a wife". Every body who reads thissentence can infer more than a relationshipbetween a person and a person; he can deduce thata person cannot be married with himself (becauseof the term "another" in the sentence). One canalso deduce that there exist some persons who arenot married.

Redundancy is a frequent problem in thespecification phase. Some new sentences, altoughthey are true, do not augment the semantics of theapplication, as the new described facts can bededuced from the previous ones. For example, inthe following description, the third sentence isredundant to the first two: "A person has a nameand and age. An employee is a person. Anemployee has a name and an age." Again, in thefollowing example, there is a redundancy, but it isan underhand one: "Employees and secretaries arepersons. A secretary is an employee". Indeedthe second sentence makes a part of the first oneredundant; as a secretary is an employee, it is notnecessary to say that he or she is a person, thisfact can automatically be deduced.

As the previous paragraphs show, theinterpretation of a given specification is not only asyntactic process, but a very high level semanticprocess based on expert knowledge:

a lexicon of terms which correspond to the lISUalabsttations (like aggregation andgeneralization), and to the different terms usedin the vocabulary of the application,

a set of semantic rules which permit todistinguish between atomic objects (attributes)and molecular objects (entities andrelationships),

I

Page 10: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

- a set of inference rules which capture theintegrity constraints involved by the differentsentences.

a set of reinterpretation rules which are able tomodify the interpretation of an objet to anotherobject with respect to a given set of relatedsentences.

a set of redundancy checking rules whichavoid the assertions that can be deduced fromother given facts.

7.2. The Acquisition of Constraints

One the most problem related to the constraintsacquisition is the combinatory explosion (e.g. theacquisition of functional dependencies in therelational model). In this case. we cannot envisionto ask the user all the possible questions. So wemust use various means in order: (i) to limit thedifferent combinations to consider and (ii) toavoid all questions for which an answer can beproduced by using a set of inference rules.

As stated above, the problem of combinatoryexplosion arises for all constraints involvingattributes (e.g. functional dependencies.cardinalities of attributes, keys). To illustrate thisproblem. we will consider the case of theacquisition of functional dependencies. Supposewe have an object type O(Al .... ,An) where foreach instance, we allow the attributes to bemonovalued or multivalued. This can berepresented by the following rliagram:

An

As one can see. cardinalities specify in onehand the monovalued or multivalued attributes (pcardinalities). and in the other hand wether. for agiven attribute value. we can associate one ormany instances of the object (a cardinalities).There is a particular correlation betweencardinalities and functional dependencies. Indeed.from a set of attributes with their cardinalities. wecan deduce some functional dependencies. Forexample:If Card(a(A1 ,0»=[1.1 j and Card(p(0,A2))=[1.1jThen A1->A2.In the same way. from a combination ofcardinalities and functional dependencies. one canderive new cardinalities and son on. For example:

9

IfCard(p(0,A1 ))=[1.1 j and A1->A2Then Card(p(O,A2»=[1.1].Finally. as it is well-known. one can derive. usingArmstrong's axioms. new functional dependenciesfrom a given set of dependencies. For example:If Al->A2 and A2->A3 Then A1->A3.

This process can be done as far as necessary togenerate the maximum facts. All generated factsare so much gained questions for the user. Thisdeduction process is based on two assumptions:

(i) the number of combinations necessary to getcardinalities is lower that that of gettingfunctional dependencies,

(ii) the acquisition of cardinalities is more naturalthan the acquisition of functional dependencies.

However. all functional dependencies are notimplied by cardinalities. This derivation processcan then be regarded as a heuristic to reduce thecombinatory explosion in the search of functionaldependencies. For all other dependencies whichare not implied by cardinalities. one must useArmstrong's inference rules or. in the last.questions to the user.

Before requiring questions to get dependencies.one can go far. when possible, in the usage ofheuristics. For example. we can make theassumption that. in most applications. there is noneed to search for functional dependencies withmore than four or five attributes in their lefthands ide. This can contribute to considerablyreduce the combinatory explosion.

Fianally. one can also use examples to reducethe combinatory explosion of functionaldependencies. More precisely. examples could beused to generate some non valide functionaldependencies; hence a set of unnecessaryquestions to ask to the designer. For example,from the following small relational extension.

NAME AGE ADDRESS

Dupond 24 Pa::hDurand 27 LyonDurand 27 DijonMartin 24 Paris

one can generate the following non validedependencies:

NAME -1-> ADDRESSAGE ·I->ADDRESS(AGE, ADDRESS) ·I->NOM(NAME,AGE) -1-> ADDRESS

We can notice that we cannot say anything aboutthe NAME and AGE.

Page 11: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

p8'\I),~ Sa

+

To be more relational. we must removemolecular aggregation arcs (rio) and replace themby references and referential constraints. asportrayed in the following schema:

One of the semantic concepts which is notsupported by the relational model is thegeneralization hierarchy. Thanks to the inheritanceproperty. this concept can be represented by basic Irelations (implemented relations) and virtualrelations (calculated relations or views). Forexample. in the left handside of the followingschema. one can replace the CLIENT object by avirtual relation calculated from the union ofAGENT and PRIVATE]ERS. The suppressionof this generic class is immediately followed bythe inheritance of its properties (related objects) toits sub-classes (schema of the right handside).

In fact. this transformation depends on ·thecardinality values. To satisfy the first normal formdefInition of relations. the cardinality of one of thetwo a)'cs (either r or 0) must be equal to [1.1) or[0,1). References designate the foreign keys ofthe related objects.

If the two cardinalities are different of [1.1) or[0,1). we must use another transformation whichimplies the defInition of an inteimediate object insuch a way that one of its cardinalities equals[1.1). That permits to come back to the previoustransformation. This case is portrayed by thefollowing schema.

PII.N,-'FNlJDe

+ j PfRSO~(Nm.SlIJ1llJlll. SurmJIl2, !gel1,11 \

~

To summarize, the problems related toconstraint acquisition could be solved by usingdifferent techniques as : (i) interaction rulesbetween constraints, (ii) heuristics, (iii) examplesand finally. when necessary, (iv) questions for theuser.

7.3.Transformation of the SemanticModel to the Relational Model

One of the main tasks of the system SECSI is totransform a conceptual schema expressed in asemantic model into a logical schema expressed ina relational model. As it is known. the problem inthis case is to not loose the semantics between thetwo levels. For this objective. the systemprovides a set of transformation rules whichconserves the semantics of the conceptual level atthe logical level. by generating new objects,semantic integrity constraints and views (queries).

One simple transformation rule is to representany molecular objet with monovalued attributes(cardinalities of p arcs equal [1,1) or [0.1]) of thesemantic model by fIrst normal form relation in therelational model.

When some attributes are multivalued attributes(cardinalities of p arcs equal [IoN) or [O,N)) theymust be transformed into a molecular object relatedto the fust one. to satisfy the atomicity of valueswhich characterizes the fIrst normal form relations.Depending on the attributes wether they are relatedby functional dependencies or not, thistransformation may generate one or severalmolecular objects.

l

1 0

Page 12: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

l1.1FXf ~ AGf:\1 UPRIVAll:}Eltl

But this transformation is not the unique oneallowed. Indeed, instead of removing the genericclass, one can remove the sub-classes by replacingthem with a specific attribute (say "role") whichcaptures the role played by a given client in thespecialization hierarchy. The sub-classes AGENTand PRIVATE]ERS should be calculated byrestriction of the CLIENT class using the newattribute role. This case is illustrated by thefollowing schema transformation:

,t6P{ Y~ ,

Ide

Dom(1011)'{t1 t"I,IIIOI}PRIVATE PERS~IPERSOSllol",·t1IIOI·)

AGE~I'IPDlSON I Rot,,"IIOI')

In a given schema, the two precedingtransformations are not both possible. Theirapplication depends on wether the sub-classeshave specific properties or not. If they have, thefirst transfonnation is better, otherwise the secondone is beller. In practice, this strategy is not sosimple: we can also move up some specificattributes if we introduce a specific integrityconstraint the check the null values on thisattribute. To decide for each case whichtransformation to apply, the expert system mayuse heuristics based on the number of sub-classes,the number of specific properties of each class andthe complexity of the possible integrity constraintsto be generated.

In general, the mapping process from theconceptual schema to the logical relational schemais based on a specific strategy which is built insuch a way that:

no semantic information is lost during eachtransformation,no duplicate relation schemas are generated,

the number and the complexity of the generatedintegrity constraints would not be too high.

After the mapping process, he normalizationprocess is carried out, as in usual, by generatingfor each relation its minimal cover of functionaldependencies.

7.4. The Justification of the Results

Justifications and explanations are especiallyemphasized int the expert system area. Such aspectis devoted in one hand to increase the user'sknowledge in database design, and in other handto enhance the credibility of the results obtained bythe system. Explanations and justifications arelillie different. In the fust case, the system has toexplain the concepts, the methodology and thedesign rules which constitute the knowledge base.In the second case, the system has to justifydifferent representation choices for a givendatabase application. SECSI supports these twoaspects at different levels.

At the first level, the system providesexplanations for every ununderstood concept orquestion during the interactive design process. Ifa given concept is not understood by the user (e.g.a cardinality of a functional dependency), SECSIelaborates a text composed by a defmition of theconcept and an illustrative example). If a givenquestion asked by the system is not understoodbecause of its complexity (i.e. contains conceptswhich are not understood) or of its fuzzy form, thesystem decomposes the given question into easysub-questions for which the user has only toanswer "yes" or "no". This gives the system theability to be used by both expert users and naiveusers. The documentation about the system useand the syntactic form of the different languages isalso integrated at this level.

At the second level, the system justifies why anobject of the application is represented as anormalized relation or as a virtual relation, why agiven fact is represented by an integrity-constraintwhereas the user waits for a relationship, but alsowhy a given fact does not explicitly appear in theresults, or why some artificial objects are in theresults whereas the user description did notcontain them. To answer to a given question, thesystem elaborates a synthesis of the differentdesign rules applied to the concerned object, fromthe analysis of the external description to thenormalized relational schema. This synthesis is

1 1

Page 13: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

(

(

based on the different states of the knowledgebase, which are saved all along the design steps.

Explanation and justification is a very complexprocess which needs to remember a large amountof data an design rules. Its feasability isdemonstrated in the first prototype of SECSI[BOUZ 86c], but its current industrial applicationis very limited.

7.5, The Incremental Design

As often claimed, the database design task is aniterative and long process which cannot be donedefinitely in a short time. Many refinements arenecessary during a long period of time (severalweeks or several months depending on thecomplexity of the application). The contributionof an expert system to this problem is to providesome methodological and some recoveringfeatures which pennit the user to backtrack to anydesign step and to interrupt his design with thepossibility to recover his session a few days later,without redesigning his application.

During each recovering session, the user wouldmodify his first specification by adding new factsor modifying old ones. Hence, two differentproblems arise:- how to make sure that the specification isconsistent,- how to integrate these new facts withoutreconsidering with the user all the pevious design(especially the set of previous asked questions).

To reach this double objective, SECSI stores inan extended fact base all the deduced facts from itsrule base and all the captured fact from the user'sanswers. When a given session is recovered withsome possible specification updates, the systemgenerates a new fact base and procceeds throughits following design steps. Whenever the systemneeds to ask a question to the user, the extendedfact base is used first to derive a possible answer.Then only questions concerning new facts andmodified facts are effectively asked to the user.

8. Conclusion:In this paper, we have described an expert systemapproach for database design. As an answer to thevarious modeling problems, SECSI relies on themost elaborated concepts of databases and themost recent techniques of artificial intelligence.Compared to the existing tools, this approachseems more suitable for data base design in thesense that it takes advantage of both theoreticaldevelopment and practical experience. Even ifpractically limited, the ability to justify several

design choices and to explain reasoningalternatives is one of the important feature ofexpert systems; it makes them attractive forcomplex problems.

More than just a tool for the design of databaseschemas, SECSI can also be used in many otherways, as for example, a validation tool of manualdesign or as an educational tool in data modeling.Combined with a DBMS, SECSI can beconsidered as a powerfull prototypingenvironment of relational databases. SECSI isalready used to design buiseness applications andtechnical applications like geographical databases.

Three main versions are dedicated to the designof databases: Version 1 concerns the productionof a normalized relational schema from the givenspecifications in natural or declarative language.Version 2 is dedicated to the optimization ofstructures and to the generation of access paths.Version 3 is an extension of the conceptualmodeling to the integration of views or to anincremental design of the data base. Each versionevolves horizontally through a permanent researchin improving the interfaces, the reasoningexplanation and the documentation, offeringamong others the graphic edition and layoutpossibilities adapted to each type of model used.Currently, only the version I has reached theindustrial level and then commercialized. Theother version are in the prototyping level.

With future development of deductive databasesand knowledge bases, this approach is moreadequate to integrate new concepts and new designrules. We think that only powerful expert systemswill efficiently handle the complexity introducedby these new research developments.

References[BAT! 85] BATIN! C. and CERI S. "Database Design:

Methodologies. tools and environments" Ranelsesion, ACM SIGMOD 1985.

[BEER 79) BEERI C. and BERNSTEIN P.A."Computational problems related to the design ofnormal Conn relation sct':m1S" ACNf Transactions IOn Database Systems, marcH 1979.

[BERN 76J BERNSTEIN Ph. "Synthesizing Third NormalFonn Relations from Functional Dependencies"ACM TODS. Voll,N'4, 1976.

[BORG 85J BORGIDA A. and WILLIAMSON K."Accomoding Exeptions in DB and Refining theschema by learning from them." VLDB Conf,Stockholm august 85.

[BOUZ 83J BOUZEGHOUB M. and GARDARIN G. "Thedesign of an expert system for database design-"InU. Worlcshop on New Applications of Databases,Cambridge (UK). sept. 83. Published in New

1 2

Page 14: An Expert System for Semantic and Relational Database Design …ceur-ws.org/Vol-961/paper39.pdf · An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb

Applications of Databases. Academic Press.Gardarin & Gelenbe eds. 1984.

[BOUZ 84) BOUZEGHOUB M. "MORSE: A FunctionalQuery Language and its Semantic Data Model"IN RIA RR270 and Proceed of 84 Trends andApplication conf on Databases, IEEE-NBSGaithersburg (USA), 1984.

[BOUZ 85] BOUZEGHOUB M., GARDARIN G.,METAlS E. "Database Design Tools: an ExpertSystem Approach" VLDB Conf, Stockholmaugust 85.

[BOUZ 86a] BOUZEGHOUB M. "SECS!: Un Syst~me

Expert en Conception de Syst~mes d'lnfonnations"These de Doctorat de I'Universite Pierre et MarieCurie (Paris VI), mars 1986.

[BOUZ 86b] BOUZEGHOUB M., COMYN I. &RICHARD D. "Conception de Bases de Donntes:Etat de l'art sur la mod6lisation conceptuelle,l'int6gration de vues et la conception physique"Rapport MASI-Universitt Paris VI NO 180, 1986.

[BOUZ 86c] BOUZEGHOUB M. & METAlS E."L'cxplication du raisonnement et la justificationdes resultats dans Ie syst~me expert SECSl", 2iemColloque International en Intelligence Artificiellede Marseille, 1986).

[BROD 84J BRODIE M., MYLOPOULOS 1., SCHMIDTY. "On Conceptual Modelling: Perspectives fromArtificial Intelligence, Data Bases andProgramming languages. Springer-Verlag, NY1984.

[BROD 86] BRODIE M. & MYLOPOULOS 1. (editors)"On Knowledge Base Management Systems"Springer Verlag, 1986.

[BROD 87J BRODIE M. "Automating Database Designand Development" A Tutorial of the SIGMODConf., San Francisco 1987.

[BROW 83] BROWN and STOTT-PARKER "LAURA: Aformal Database Model and her Logical DesignMethodology." Proceed. VLDB Conf, Florence1983.

[BUBE 82) GUSTAFSSON M., KARLSSON T. &BUBENKO 1. "A Declarative Approach toConceptuallnfonnation Modeling" in InfonnationSystems Design Methodologies Olle, Verijn-Stuarteditors, North Holland /Pub!. Co, 1982-

[CERI 83] CERI S. (editor) "Methodology and Tools forDatabase Design" North Holland Pub!. Co, 1983.

[CHEN 76J CHEN P.P. "The Entity Relationship Model ­Toward a Unified View of Data" ACM trans. OnDatabase Systems VI, NI, March 1976.

[CODD 70] CODD E.F. "A Relational Model of Data forLarge Shared Data Banks", Comm ACM,VoIl3,N06,1970.

[CODD 791 CODD E.F. "Extending The DatabaseRelational Model to capture more Meaning." ACMtrans. On Database Systems, 4, Dec 79.

[DBEN 84) Database Engineering Revue, "Special Issue onDatabase Design Aids" Vol7, N'4, 1984.

[FAG! 77] FAGIN R. "Multivalued Dependencies and aNew Normal Form for Relational Databases"ACM TODS, Vo12, N'3, Sept 1977.

[GALL 84J GALLAlRE H., MINKER 1., NICOLAS I.M."Logic and databases: a deductive approach" ACMComputing Surveys vol 16, NO 2, Iuin 84.

[GARD 89] GARDARIN G; & VALDURIEZ P."Relational Databases and Knowledge Bases"Addison Wesley Pub!. Co. 1989.

[HAMM 81] HAMMER N. and McLEOD D. "Data BaseDescription with SDM: A Semantic Data Model"(ACM TODS V6, N3, Sep 81).

[HAYE 84 ] HAYES-ROTH F. "The knowledge basedExpert Systems: A Tutorial" Computing Revue,Vol 17, N'9, 1984.

[HULL 87J HULL R. & KING R. "Semantic DatabaseModeling: Survey, Applications and ResearchIssues" ACM Computing Surveys, Vol 19, N'3,1987.

[KENT 79] KENT W. "Limitations of Record-BasedWonnation Models." ACM Trans.on DatabaseSystems 4,1,1979.

[MYLO 80] MYLOPOULOS 1. BERNSTEIN P.A.WONG H.K.T." A language facility for designingdatabase intensive applications" ACM trans. OnDatabase Systems valIS, nb2, 1980.

[NILS 82] NILSSON N.J. "Principales of ArtificialIntelligence", Springer_Verlag Berlin HeidelbergNew York, 1982.

[OODB 86] First Workshop on Object Oriented DatabaseSystems, IEEE Computer Society Press, 1986.

[PECK 88] PECKHAM 1. & MARYANSKI F. "SemanticData Models" ACM Computing Surveys, Vol 20,N'3, 1988.

[SMIT 77J SMITH I.M. and SMITH D.C.P. "Data BasesAbstractions Aggregation and Generalisation ..ACM trans. On Database Systems, june 77.

[TUCH 85] TUCHERMAN 1.., FURTADO A. andCASANOVA M. "A Tool for Modular D"atabaseDesign." VLDB Conf, Stockholm 1985.

[ULLM 82) ULLMAN J.D. "Prineiples of DatabaseSystems" computer science press 1982.

[ZAN! 81] ZANIOLO C., MELKANOFF M.M: "On thedesign of relational data base schemata", ACMtrans. On Database Systems, V6, NI; march 1981.

13