Expressing causal relationships in conceptual database schemas

V. Ramesh a,*, Glenn J. Browne b

a Department of Accounting and Information Systems, Kelley School of Business, Indiana University, 1309 E. 10th St., Bloomington, IN 47405, USA

b Information Systems and Quantitative Sciences, College of Business Administration, Texas Tech University, Lubbock, Texas, USA

Received 1 April 1998; accepted 22 October 1998

Abstract

Conceptual schema design is a crucial phase in the database design process. The quality of the final database (regardless of logical implementation model) is dependent largely upon the quality of the conceptual schema. Since conceptual schemas serve as formal representations of the requirements specification for a database, it is critical that a schema capture the requirements as completely and unambiguously as possible. Many studies have shown that semantic models, such as the Extended Entity–Relationship model, are better for conceptual database design than traditional models such as relational, hierarchical, and network models. This is primarily because of their ability to capture explicitly many "natural" cognitive relationship types that are likely to occur in requirements specifications, e.g., association, generalization/specialization, and aggregation. However, the relationships that can be specified in a semantic model represent only a subset of the relationships that are likely to be used by people in describing an application environment. Thus, using current semantic models for conceptual database design may result in abstractions of application environments in which some important information from the requirements is either not represented or is represented inappropriately. This paper seeks to help bridge the gap between requirements specifications and data modeling by hypothesizing the need for supporting additional cognitive relationship types in conceptual models. In the paper, we demonstrate the need for one such relationship type, causation. Specifically, we investigate the effects of the lack of constructs in semantic models for capturing causation on analysts' ability to express causal relationships mentioned in a requirements document. We found that subjects not familiar with data modeling expressed causal relationships better in their representations than did subjects who had some prior exposure to data modeling. This seems to indicate that the lack of constructs for capturing causation in semantic models hinders the ability of people trained in data modeling techniques to recognize and express causal relationships in conceptual schemas. The results also suggest the need to develop semantic models that provide constructs for capturing causation and other cognitive relationships. © 1999 Elsevier Science Inc. All rights reserved.

Keywords: Semantic modeling; Conceptual schemas; Requirements determination; Causation; Cognitive relationships; Database design

1. Introduction

Conceptual schema design is a crucial phase in the database design process. The quality of the final database (regardless of logical implementation model) and applications are dependent largely upon the quality of the conceptual schema. A conceptual model is intended to serve as a formal representation of the requirements specifications. Hence, it is important that a conceptual schema capture the requirements specified as completely and unambiguously as possible (Jarvenpaa et al., 1989). The need to bridge the gap between requirements specifications and data modeling has been identified as a critical area of research (Navathe, 1992). This paper seeks to help bridge this gap by hypothesizing the need for additional cognitive relationship types in conceptual schemas, and demonstrating the need for one such construct, causation.

To understand the weakness of current data models, it is important to examine the kinds of relationships that are likely to be present in a requirements document. Managers and other end-users are typically not trained in database design, and therefore express requirements as they perceive them in the world using natural relationships among objects. These expressions of requirements reflect the natural relationship types humans use to organize knowledge. Research in fields as diverse as cognitive psychology, philosophy, and rhetoric has identified numerous such relationships between entities, such as causal, motivational, and hierarchical relationships (Brockriede et al., 1960; Browne et al., 1998; Curley et al., 1995). A list of these relationships appears in Table 1.

The Journal of Systems and Software 45 (1999) 225–232

* Corresponding author. Tel.: +1-812-855-2641; fax: +1-812-855-8679; e-mail: [email protected]

0164-1212/99/$ – see front matter © 1999 Elsevier Science Inc. All rights reserved.

PII: S0164-1212(98)10081-X

Given the breadth of relationship types that are likely to occur in requirements specifications, it is not surprising that semantically-rich data models that permit explicit specification of cognitive relationships such as association (sign in Table 1), generalization/specialization (generalization and individuation in Table 1), and aggregation (various hierarchical relationships in Table 1) have been found to be better for conceptual schema design than traditional models such as relational, hierarchical, and network models (Hull et al., 1987). For example, in a study of the literature comparing the usability of various conceptual data models (traditional and semantic), Batra et al. (1994) found that semantic models such as the Entity–Relationship Model (Chen, 1976) and its derivatives were best suited for supporting conceptual database design. Further, Jarvenpaa et al. (1989) found that end-users were able to express relationships better using semantic models. Navathe (1992) identified five characteristics that a good conceptual model must possess: Expressiveness, Simplicity, Minimality, Formality, and Unique Interpretation. The key characteristic distinguishing semantic models from traditional models is the expressiveness of the relationship constructs supported by them (Burt et al., 1990). This expressiveness allows designers to create abstractions of real-world information by mapping that information into basic human concepts (Tsichritzis et al., 1982). Thus, a semantic model can better capture the user's perception of data relevant to an application (as defined by the requirements) (Navathe, 1992).

As noted, most semantic models allow explicit specification of association, generalization/specialization, and aggregation relationships. However, a review of Table 1 shows that these represent only a subset of the relationships that are likely to be used by people in describing an application environment. Hence, while semantic models may have higher image fidelity than traditional data models, i.e., schemas created using semantic models may conform better to users' views of the world (Everest, 1986), they are still limited in the types of relationships available in them. Thus, using current semantic models for conceptual schema design may result in abstractions of application environments in which some important information is either not represented or is represented inappropriately.¹

1.1. Causation

One relationship type that is not supported in current semantic data models is causation. Causation is a fundamental aspect of cognition, and is the most common type of relationship revealed in studies of human reasoning (Curley et al., 1995; Schustack, 1988). For example, in an empirical study of managerial reasoning, two-thirds of the relationships expressed by subjects were causal in nature (Curley et al., 1995). Hence, causal relationships undoubtedly are part of users' problem representations, and it is likely that such relationships will be found in requirements specifications.

Data modelers are most likely to encounter causal relationships in the form of business rules (McFadden et al., 1999) or conditional requirements statements. Such a rule or statement, though not representing causality in its purest form, is an informal use of causation; it provides a condition whose presence makes a critical difference to the occurrence of an outcome (Schustack, 1988). The importance of causal statements in requirements documents is likely to increase in the future, because embedding business rules in the form of triggers is becoming increasingly prevalent in commercial databases.
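As a hedged illustration of how a conditional business rule can end up embedded in a database as a trigger, the following sketch uses Python's built-in sqlite3 module. The table names, the columns, and the rule itself (a physician going on leave causes that physician's shifts to be flagged for rescheduling) are assumptions introduced here for illustration only; they are loosely modeled on a hospital setting and are not taken from any schema in this paper.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding one hypothetical causal rule."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE physician (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            on_leave INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE shift (
            id INTEGER PRIMARY KEY,
            physician_id INTEGER NOT NULL REFERENCES physician(id),
            needs_rescheduling INTEGER NOT NULL DEFAULT 0
        );
        -- Hypothetical rule: a physician going on leave (the condition)
        -- flags that physician's shifts for rescheduling (the outcome).
        CREATE TRIGGER leave_causes_rescheduling
        AFTER UPDATE OF on_leave ON physician
        WHEN NEW.on_leave = 1 AND OLD.on_leave = 0
        BEGIN
            UPDATE shift SET needs_rescheduling = 1
            WHERE physician_id = NEW.id;
        END;
        """
    )
    conn.executemany(
        "INSERT INTO physician (id, name) VALUES (?, ?)",
        [(1, "Physician A"), (2, "Physician B")],
    )
    conn.executemany(
        "INSERT INTO shift (id, physician_id) VALUES (?, ?)",
        [(10, 1), (11, 1), (12, 2)],
    )
    return conn


def flagged_shifts(conn):
    """Return the ids of shifts currently flagged for rescheduling."""
    rows = conn.execute(
        "SELECT id FROM shift WHERE needs_rescheduling = 1 ORDER BY id"
    )
    return [row[0] for row in rows]


conn = build_demo_db()
conn.execute("UPDATE physician SET on_leave = 1 WHERE id = 1")
print(flagged_shifts(conn))  # [10, 11]
```

In this sketch the rule exists only in the trigger definition. Nothing in the two table definitions records that leave status and scheduling are causally linked, which is the kind of gap between requirements and schema that the paper is concerned with.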

Although the pervasiveness of causation in problem solvers' representations has been empirically demonstrated (e.g., Curley et al., 1995; Tversky et al., 1980; Wilkin, 1996), none of the models used for database design provide sufficient means for capturing causal relationships (Hull et al., 1987). The inability to express these relationships is likely to lead to conceptual schemas that do not completely represent the requirements. The focus of this paper is on investigating the effects of the lack of constructs in semantic models for capturing causation on analysts' ability to express causal relationships mentioned in a requirements document.

2. Hypotheses

Two groups of subjects were sought for the study, one familiar with semantic data modeling techniques and one unfamiliar. The rationale for the two groups was as follows. Research has demonstrated that people organize information using causation under appropriate circumstances (Schustack, 1988). Hence, subjects unfamiliar with database modeling (the database-naive group) should use causal relationships as naturally appropriate in modeling an application environment. However, because current data modeling training and practice do not support the representation of causation, we hypothesize that subjects familiar with data modeling (the database-knowledgeable group) will not use such relationships. Rather they will force causal relationships into types supported by current data models or not express them at all. Therefore, our first hypothesis is:

(1) Database-naive modelers will express causal relationships in an application scenario to a greater extent than database-knowledgeable modelers.

Further, since current semantic data models do not provide adequate means for expressing causal relationships, we hypothesize that database-knowledgeable modelers will use textual statements to represent any causal relationships they could not express in their data model. However, since we expect that the database-naive modelers will not restrict themselves to a particular graphical representation, we anticipate that they will represent causal relationships diagrammatically. Hence, our second hypothesis is:

(2) Database-knowledgeable modelers will use a greater number of textual statements to express causal relationships than database-naive modelers.

¹ It should be noted that we are not implying that all relationships mentioned in Table 1 need to be supported in semantic models. In fact, some of the relationships that people use in their cognitive representations may not have relevance for data modeling. The usefulness of such relationships is an empirical question.

Table 1

Example relationships (adapted from Browne and Curley, 1998; Curley, Browne, Smith, and Benson, 1995)

| Relationship category | Relationship type | Description | Example |
|---|---|---|---|
| Causal | Causal | States conditions whose presence makes a critical difference in the occurrence of an event. | The company issued a good earnings forecast; therefore, its stock price rose the next day. |
| | Motivational | Intentions of human agents are assumed as reasons for actions. | The employees will use this information system because they know it will help them with their jobs. |
| Covariational | Sign | An observed or assumed covariation between entities allows one entity to be used as a symptom or clue for concluding the other entity is present. | The product manager believed that the high initial sales of the new product indicated that it would be a best seller. |
| Hierarchical | Generalization | The inductive argument: reasoning that what is true for specific instances will also be true for other instances within a more general category. | I have tried two bottles of the new Dolphin soft drink and have liked the taste; therefore, I like the new soft drink. |
| | Individuation | Reasoning from the general to the specific; what is true for the general category is claimed to be true for individual instances within that category. | All products introduced by this company have been successful; therefore, this new product will be successful. |
| | Categorization | Used to support generalization and individuation arguments; applies when the presence of features is sufficient to conclude that an entity belongs to a superordinate category. | This product had sales of $100 million its first year; therefore, this is a successful product. |
| | Hierarchical Exclusion | When a category contains a set of mutually exclusive instances, the presence of one instance allows the arguer to conclude the absence of the remaining instances. | The person who committed the crime was a woman; therefore, it was not a man. |
| | Hierarchical Combination | When a category contains a set of collectively exhaustive instances, the presence of all instances allows the arguer to conclude the presence of the superset category. | The product is now sold in 50 states; therefore, it now has a national presence. |
| Similarity | Parallel Case | An intra-domain similarity exists between two entities. | The last time we introduced a product under these circumstances, it was successful; therefore, this product should also be successful. |
| | Analogy | An inter-domain similarity exists between two entities. | Sales of a new product are like a seedling; to grow, they must be carefully nurtured. |
| Testimony | Authority | The arguer utilizes a statement made by an external knowledge source. | The doctor said I am perfectly healthy. |

3. Methodology

An experimental hypothesis testing methodology was used to investigate users' ability to represent causal relationships in an application scenario. Subjects were 78 students recruited from information systems classes at an eastern US university who received course credit for their participation. Subjects were categorized as either database-naive ("Naive") or database-knowledgeable ("Knowledgeable") for purposes of analysis. A brief questionnaire distributed after the experimental task was used to facilitate this determination (the principal question concerned subjects' ability to define the terms "entity" and "cardinality" in an E/R modeling context). Thirty-five subjects were categorized as database-naive and 43 subjects were categorized as database-knowledgeable.²

The stimulus material was a short case describing the requirements for a hospital database application. The case appears in the appendix to this paper. Four informal causal relationships were deliberately and explicitly embedded in the case.³ These statements appear in Table 2. Subjects in both groups were instructed to sketch a graphical description of the case situation using paper and pencil (note that the knowledgeable subjects were not explicitly asked to create an ER diagram). Subjects were given 45 minutes to complete the task. All subjects finished their representations during this time period.

4. Results

As a check on whether the naive group was handicapped by the lack of formal training in modeling, we tested to see whether members of the two groups captured the essential entities in the model to the same extent. Six entities were identified by the researchers as critical to representing the content of the scenario. The number of entities expressed by each subject was tallied (to be counted, the entity had to be explicitly stated by the subject). The mean number of entities expressed by group was as follows: Naive = 5.17; Knowledgeable = 5.28. These means were not significantly different (t(63) = 0.53; p = 0.60), indicating that subjects in the naive group were able to express the important entities in the scenario as well as subjects in the knowledgeable group.⁴

Subjects' representations were coded independently by two coders. To prevent the possibility of subconscious biases, the coders were unaware of which group a particular subject fell within during coding. The coders used a four point scale to rate the extent to which subjects expressed the causal relationships embedded in the case scenario. The coding scale is described in Table 3.

The coding scheme is necessarily subjective, and interrater reliabilities were calculated to assess the extent to which the coding was performed consistently. For all four statements across 78 subjects, the two coders agreed on 83.3% of the statements (260 out of 312). Codes for relationships on which there was disagreement were resolved through discussion between the two coders. These agreed-upon codes were used for the analyses that follow.
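The agreement figure above is a simple percent agreement, and the arithmetic can be checked with a short sketch. The function name and the four-item example ratings are hypothetical; only the counts 78, 4, and 260 come from the text.

```python
def percent_agreement(codes_a, codes_b):
    """Share of items on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must rate the same items")
    if not codes_a:
        raise ValueError("no ratings supplied")
    matches = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return matches / len(codes_a)


# Counts reported in the text: 78 subjects x 4 statements, 260 agreements.
total_items = 78 * 4
reported_share = 260 / total_items
print(total_items, round(100 * reported_share, 1))  # 312 83.3

# Hypothetical ratings on the 0-3 scale, for illustration only.
print(percent_agreement([0, 1, 2, 3], [0, 1, 1, 3]))  # 0.75
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic would need the full cross-tabulation of the two coders' ratings, which is not reported here.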

Of primary interest was whether the naive group and knowledgeable group differed in the extent to which they expressed the causal relationships present in the scenario. As a preliminary procedure, we removed all relationships that had been coded as zero by the coders. As noted, a code of zero indicated that a relationship was not expressed at all by a subject. In other words, some subjects simply did not express the relationships of interest.⁵ Since the ultimate question of interest is whether subjects expressed the causation present in relationships they captured, we treated these subjects as not having performed that portion of the task. For those subjects who were able to capture some relationship between entities in the data, were there differences in the extent to which they expressed causation?

² Note that "database-knowledgeable" does not mean "expert." Subjects in the knowledgeable group simply had had some training in constructing entity-relationship diagrams. They were not necessarily expert data modelers. However, our argument is that people trained in data modeling will not express causality because semantic modeling does not support causation. Thus, there is no reason to believe that experienced data modelers (in organizations) will be any more sensitive to causality than the "knowledgeable" subjects in our study.

³ The causal relationships included in the case may be termed "informal" because they did not explicitly meet all the criteria for causation; e.g., they did not explicitly rule out all other possible causes. The relationships reflected causation in the common everyday sense, i.e., providing a condition that leads to an effect.

⁴ Although number of entities captured is relevant to a model's quality, we do not mean to imply anything regarding the quality of subjects' models in the two groups. No judgments of overall quality were made.

⁵ For the naive group, 31% of the codes were zeros. For the knowledgeable group, 27% of the codes were zeros.

In analyzing the data, significant violations of the homogeneity of variance assumption underlying the analysis of variance procedure were observed. Hence, the Wilcoxon Sum of Ranks test (corrected for tied values), a non-parametric procedure, was performed to test whether the groups differed in terms of the causal ratings assigned to their representations of the causal relationships. A sum of ranks was performed within each causal statement, to control for possible differences between the statements. The results are shown in Table 4. As can be seen, differences between the naive and knowledgeable groups were found for statements 3 and 4. The ratings assigned to these statements for subjects in the naive group were significantly higher, at an α = 0.05 level (indicating greater expression of causation). Naive subjects had higher mean ratings for statements 1 and 2 as well, although these differences did not reach statistical significance. These results support Hypothesis 1.⁶ The following conclusion may be drawn: Subjects in the naive group expressed the causation present in the application scenario to a greater extent than subjects in the knowledgeable group.
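For readers unfamiliar with the procedure, the following is a minimal sketch of a rank-sum test with a correction for tied values, using the normal approximation and no continuity correction. It is not the authors' analysis code: the function name is an assumption, and the two rating lists are hypothetical data invented for illustration (the paper does not report per-subject ratings), so the printed values do not reproduce Table 4.

```python
import math
from collections import Counter


def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum test via the normal approximation, corrected for ties.

    Returns (z, two_sided_p). No continuity correction is applied.
    """
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    pooled = sorted(list(group_a) + list(group_b))

    # Assign midranks: tied values share the average of the ranks they span.
    midrank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    w = sum(midrank[x] for x in group_a)  # rank sum of the first group
    mean_w = n_a * (n + 1) / 2
    # Tie correction: each group of t tied values reduces the variance.
    tie_term = sum(t**3 - t for t in Counter(pooled).values()) / (n * (n - 1))
    var_w = n_a * n_b / 12 * ((n + 1) - tie_term)
    if var_w == 0:
        raise ValueError("all observations are identical")
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Hypothetical ratings on the 1-3 scale (codes of zero already removed).
naive = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3]
knowledgeable = [1] * 12
z, p = rank_sum_test(naive, knowledgeable)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With only three possible rating values, ties are pervasive, which is why the tie correction matters for data of this kind: without it the variance of the rank sum is overstated and the test becomes conservative.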

As a second method of analyzing the extent to which subjects expressed the causal relationships in the scenario, we counted the number of times subjects' representations for each relationship were rated as a "2" (indicating implicit expression of causation) or a "3" (indicating explicit expression of causation). Table 5 lists these numbers, and Figs. 1 and 2 show examples of subjects' representations that were coded as "2" or "3" (only a portion of each subject's diagram is shown). As can be seen in Table 5, subjects in the naive group were much more likely to express causal relationships either implicitly or explicitly than were subjects in the knowledgeable group. Although only 14 of the 78 total subjects expressed causation in their representations, 13 of 35 subjects in the naive group expressed causation. These results are implicit in Table 4 above, but Table 5 provides further explicit support for Hypothesis 1.
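The totals quoted above can be recovered by summing the cells of Table 5, as the short sketch below shows. The dictionary layout and function name are assumptions made for illustration; the counts are those printed in Table 5.

```python
# Counts from Table 5: (number of "2" ratings, number of "3" ratings)
# for statements 1 through 4, in order.
table5 = {
    "Naive": [(1, 0), (1, 2), (3, 1), (4, 1)],
    "Knowledgeable": [(0, 0), (1, 0), (0, 0), (0, 0)],
}


def causal_total(rows):
    """Total number of representations rated 2 or 3 across all statements."""
    return sum(twos + threes for twos, threes in rows)


totals = {group: causal_total(rows) for group, rows in table5.items()}
print(totals)                # {'Naive': 13, 'Knowledgeable': 1}
print(sum(totals.values()))  # 14
```

Note that these sums count rated representations, one per subject per statement, so they match the 13 and 14 quoted in the text only as tallies of the table cells.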

The second question of interest in the current study was whether subjects in the knowledgeable group would utilize textual statements to a greater extent to represent causal relationships than subjects in the naive group. Supplementary textual statements made by subjects were coded independently by two coders for their causal content.⁷ A three-point scale was used (Table 6).

Table 2

List of causal statements in scenario

| Number | Causal statement |
|---|---|
| 1 | When a physician is on leave or vacation, this causes changes in hospital scheduling |
| 2 | The equipment located in each room causes certain patients to be given certain rooms |
| 3 | The availability of physicians causes a particular physician to be assigned to a particular patient |
| 4 | Each nurse on practical training is supervised by a physician, with the particular physician assigned determined by the physician's specialization |

Table 3

Scheme for coding graphical representations

| Level | Description |
|---|---|
| 0 | Subject did not express the relationship in his or her diagram |
| 1 | Subject expressed the relationship with no indication of causation |
| 2 | Subject expressed the relationship and expressed causation in some implicit way |
| 3 | Subject expressed the causal relationship explicitly, using either causal language or specifically-defined causal symbols |

Table 4

Mean ratings and significance for four causal statements

| Causal statement | Naive group mean ratings | Knowledgeable group mean ratings | p-value |
|---|---|---|---|
| 1 | 1.06 | 1.00 | .11 |
| 2 | 1.19 | 1.03 | .10 |
| 3 | 1.19 | 1.00 | .01 |
| 4 | 1.21 | 1.00 | .01 |

⁶ We should note that although we found a statistically significant difference between naive and knowledgeable subjects in the extent to which they expressed the causation present in relationships, the mean ratings for both groups of subjects were quite low. This may indicate that people have difficulty representing causation graphically. That is, these data seem to indicate that anyone asked to represent causal relationships using graphical representations may have difficulty expressing the causation. Further research is needed to test whether certain representational forms are more useful than others for expressing causal relationships. (There are, however, graphical tools explicitly designed to help people express causal relationships. Cause maps are one example of such a tool (for a review, see Huff, 1990). However, in the absence of explicit instructions to use such tools, causation may be difficult to express.)

⁷ Textual statements that were exact copies of statements in the requirements were not counted by the coders. We interpreted such statements as simple restatements of the problem for "problem-solving" purposes.


Table 7 shows examples of textual statements coded as "1" and "2".

The mean ratings for textual statements made by the two groups were as follows: Naive group = 0.46; Knowledgeable group = 0.48. These means were not statistically different (using Wilcoxon Sum of Ranks, z = 0.05; p = 0.98); thus, Hypothesis 2 was not supported. Subjects in the naive group made a total of 28 textual statements to qualify their diagrams, and subjects in the knowledgeable group used a total of 27 qualifying statements. Of the 28 statements made by naive group members, one was coded as a 1 (implicit expression of causation) by the coders, and six were coded as a 2 (explicit expression of causation). Coincidentally, of the 27 statements made by knowledgeable group members, one was also coded as a 1 and six were also coded as a 2. Hence, in both groups, only about 25% of the textual statements made by subjects to supplement their diagrams were related to causation.

Fig. 1. Example Representation (Naive group) coded as a 2.

Fig. 2. Example Representation (Naive group) coded as a 3.

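Table 5

Causal representations rated as 2 or 3

| Group | Statement 1, 2s | Statement 1, 3s | Statement 2, 2s | Statement 2, 3s | Statement 3, 2s | Statement 3, 3s | Statement 4, 2s | Statement 4, 3s |
|---|---|---|---|---|---|---|---|---|
| Naive | 1 | 0 | 1 | 2 | 3 | 1 | 4 | 1 |
| Knowledgeable | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |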

Table 7

Example of textual statements coded as "1" and "2"

| Code | Example |
|---|---|
| 1 | "Hospital scheduling needs to know each physician's leave or vacation to assign available physicians to day or night hours." |
| 2 | "Availability of physicians causes physicians to be assigned to patients." |

Table 6

Scheme for coding graphical representations

Level   Description
0       Textual statement makes no attempt to express causal relationship
1       Textual statement makes an implicit attempt to express causal relationship
2       Textual statement explicitly expresses causal relationship


Several possible conclusions can be drawn from these results. The relatively low number of textual statements used to express causal relationships (14/312 = 4.5%) suggests that both groups relied heavily on their diagrams to communicate the information in the scenario. As expected, subjects in the naive group may have felt that they could adequately represent the causal relationships in their diagrams, and in fact they did a better job of doing so overall than did the knowledgeable group. Subjects in the knowledgeable group may have ignored the causation present in the scenario because causation is not one of the types of relationships analysts are trained to look for when creating semantic models.

5. Conclusions and future research

Our objective in this paper has been to investigate the effects of the lack of causal constructs in semantic models on analysts' ability to express causal relationships in conceptual schemas. We reported the outcome of an experiment that examined the extent to which database-naive and database-knowledgeable people were able to model causal statements embedded in a short case. We found that the conceptual representations created by the naive subjects expressed causal relationships better than those created by more knowledgeable subjects.

Although causation is a natural construct used by people to represent relationships, the results of our study suggest that the lack of adequate support for capturing causation in semantic models hinders the ability of more knowledgeable subjects to recognize and express causal relationships during conceptual modeling (hence the lack of causation in the knowledgeable group's representations). The data also suggest that exposure to a modeling technique causes people to suppress their natural inclination to express certain types of relationships and to use only the relationship constructs supported by the semantic models. The result is a less faithful representation of the requirements in knowledgeable subjects' models.

Therefore, future research might investigate ways in which causation can be incorporated into semantic models. Such incorporation must be accomplished without adversely affecting a semantic model's ability to serve as a medium of communication among people (Tsichritzis et al., 1982). Another natural extension of the current research is an evaluation of the role that other types of cognitive relationships (identified in Table 1), such as similarity relationships, might play in reducing differences between requirements specifications and conceptual schemas intended to represent such requirements. Since these relationships are used by people in their descriptions of problem domains, they need to be recognized and represented by analysts. Prompts designed to elicit such relationships explicitly can be useful in helping users to articulate their beliefs about an application environment (Browne et al., 1997). However, representational forms must be available to connect the requirements to conceptual schema design. The key point is that various types of relationships will be stated when managers and end-users specify requirements, and it is important to develop techniques that explicitly map these relationships from the requirements to the conceptual model.
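One way to picture an explicit mapping of causal statements from requirements into a model is sketched below in Python. The CausalRelationship class and the effects_of helper are hypothetical names introduced purely for illustration; they are not a construct proposed in this paper, nor part of any existing semantic model. The three entries paraphrase causal statements from the hospital case in Appendix A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalRelationship:
    """Hypothetical construct: 'cause' leads to 'effect' in the domain."""
    cause: str
    effect: str


# Causal statements paraphrased from the hospital case in Appendix A.
CASE_CAUSES = [
    CausalRelationship("physician on leave or vacation",
                       "change in hospital scheduling procedures"),
    CausalRelationship("equipment located in a room",
                       "patient given that room"),
    CausalRelationship("availability of physicians",
                       "physician assigned to patient"),
]


def effects_of(cause, relationships):
    """List the effects recorded for a given cause."""
    return [r.effect for r in relationships if r.cause == cause]


print(effects_of("availability of physicians", CASE_CAUSES))
```

Recording the cause and the effect as a named pair, rather than as a generic association, is only one possible design; whether such a form would preserve a model's value as a medium of communication is exactly the kind of question left for future research.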

Finally, semantic modeling practice suggests that the mere existence of useful modeling constructs is not enough to guarantee their widespread use. For example, although most semantic models provide support for aggregation, the construct is seldom used during conceptual schema design (e.g., an aggregation is often represented as multiple association relationships). Hence, another important goal in this research is to investigate how training in conceptual schema design can be modified to encourage the use of new relationship types.

The authors contributed equally to the preparation of this article.

Appendix A. Instructions

Please organize the information in the following case. In particular, please use pencil and paper to sketch a graphical description of the company's business processes and information important to those business processes. Please also carefully describe any information or relationships that you see but cannot capture in your diagram.

Mountain View Community Hospital

Mountain View Community Hospital serves several cities in northwestern Baltimore County. Mountain View is planning to implement an information system to help manage its operations, particularly information on patient administration and hospital personnel.

The hospital needs to keep records concerning its physicians, patients, departments, equipment, and billing information. Physicians specialize in only one area, and the hospital is particularly interested in keeping track of pediatricians, heart specialists, and cancer-treatment specialists. When a physician is on leave or vacation, this causes changes in hospital scheduling procedures. Therefore, the hospital needs this information to assign available physicians to day or night duty hours.

Patients are treated as in-patients or out-patients. In-patients are assigned individual rooms. The equipment located in each room causes certain patients to be given certain rooms. Each patient may be treated by one or more physicians; the availability of the physicians causes a particular physician to be assigned to a particular patient. Each physician may treat one or more patients. Treatment history for each patient is maintained by the hospital. This includes such factors as when a patient was treated, who treated him or her, what medication was prescribed, and any side effects that the treatment may have had.

Mountain View keeps track of two categories of employees, full-time and practical-training. The hospital is particularly interested in monitoring the progress of nurses who are on practical training. Information tracked includes the date practical training began, the expected end date, university affiliation (if any), status of practical training, and progress reports (if any). Each nurse on practical training is supervised by a physician, with the particular physician assigned determined by the physician's specialization.

References

Batra, D., Antony, S.R., 1994. Effects of data model and task characteristics on designer performance: a laboratory study. International Journal of Human-Computer Studies 41, 481–508.

Brockriede, W., Ehninger, D., 1960. Toulmin on argument: an interpretation and application. Quarterly Journal of Speech, 44–53.

Browne, G.J., Curley, S.P., Benson, P.G., 1997. Evoking information in probability assessment: knowledge maps and reasoning-based directed questions. Management Science 43, 1–14.

Browne, G.J., Curley, S.P., 1998. Reasoning with category knowledge in probability forecasting: typicality and perceived variability effects. In: Wright, G., Goodwin, P. (Eds.), Forecasting with Judgment, Wiley, Chichester, pp. 169–200.

Burt, P.V., Kinnucan, M.T., 1990. Information models and modeling techniques for information systems. Annual Review of Information Science and Technology 25, 175–208.

Chen, P.P., 1976. The entity-relationship model: toward a unified view of data. ACM Transactions on Database Systems 1 (1), 9–36.

Curley, S.P., Browne, G.J., Smith, G.F., Benson, P.G., 1995. Arguments in the practical reasoning underlying constructed probability responses. Journal of Behavioral Decision Making 8, 1–20.

Everest, G.C., 1986. Database management: objectives, system functions, and administration. McGraw-Hill, New York.

Huff, A.S., 1990. Mapping Strategic Thought. In: Huff, A.S. (Ed.), Mapping Strategic Thought, Wiley, Chichester, pp. 11–49.

Hull, R., King, R., 1987. Semantic database modeling: survey, applications and research issues. ACM Computing Surveys 19, 201–260.

Jarvenpaa, S., Machesky, J., 1989. Data analysis and learning: an experimental study of data modeling tools. International Journal of Man-Machine Studies 31, 367–391.

McFadden, F.R., Hoffer, J.A., Prescott, M.B., 1999. Modern Database Management (5th ed.), Addison-Wesley, Reading, MA.

Navathe, S.B., 1992. Evolution of data modeling formalisms. Communications of the ACM 35, 112–123.

Schustack, M.W., 1988. Thinking about causality. In: Sternberg, R.J., Smith, E.E. (Eds.), The Psychology of Human Thought, Cambridge University Press, Cambridge, pp. 92–115.

Tsichritzis, D.C., Lochovsky, F.H., 1982. Data Models. Prentice-Hall, Englewood Cliffs, NJ.

Tversky, A., Kahneman, D., 1980. Causal schemas in judgments under uncertainty. In: Fishbein, M. (Ed.), Progress in Social Psychology, Erlbaum, Hillsdale, NJ, pp. 49–72.

Wilkin, N.E., 1996. An Empirical Investigation of Practical Reasoning in the Construction of Beliefs Regarding Medication by Arthritis Patients, Doctoral Dissertation, University of Maryland, Baltimore.

V. Ramesh is an Assistant Professor in the Department of Accounting and Information Systems, Kelley School of Business at Indiana University. His research interests are in heterogeneous databases, database modeling, and group support systems. His papers have been published in ACM Transactions on Information Systems, IEEE Expert, Information Systems and other journals. He received his Ph.D. in Business Administration (MIS) from the University of Arizona. He also holds an M.S. in Computer Science from the University of Iowa and a B.E. in Computer Science from the Birla Institute of Technology, Mesra (Ranchi), India.

Glenn J. Browne received his Ph.D. in MIS and Decision Sciences from the University of Minnesota. His research interests include systems development, semantic modeling, and basic decision-making processes. His papers have appeared in Management Science and other journals.
