
8 Object-role modelling

    Perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away.

    Antoine de Saint-Exupery, French author

    Overview

    In the real world of information technology, databases come with dozens of fields arranged into many tables. Databases also have to interact with each other, with many users and perhaps with a variety of operating systems. The only way this can happen with any degree of reliability and certainty is if they are carefully designed, well set up, and effectively maintained. We are now at the point of finding out the best way to develop the tables that make up databases. In doing this we will follow the advice of Saint-Exupery above: we will look for the essence of what is needed and reduce everything else to a minimum. We may not find perfection, but we will develop an effective information system. In this unit we will investigate:

    - problems with flat-file databases
    - elementary sentences and conceptual schema diagrams
    - relationships between data
    - constraints on relationships
    - normalisation
    - the role of virtual communities.

    Introduction

    So far we have looked at several databases, for example the Repairs database with tables for devices, technicians, and so on. These databases, however, are logical abstractions. A real world situation has been reduced to a set of relational tables that have been entered into an information system. Once the tables are in the system we can add data, query them, develop reports and so on to derive meaningful information, but the tables are not reality. They are just a way of representing the real world in a manageable form.

    Kevin Savage 2011 Single user licence issued to Mitchell Ingall


    To move from reality to this abstraction we will go through a process called object-role modelling (ORM). This process will quickly and efficiently produce a set of database tables in optimal (best) normal form. It is not difficult to use once it is understood, but understanding will only come after some effort. Much of the process of normalisation that we will go through will have to be taken on trust, especially in the early stages. We are going to look at an involved procedure that will not mean much until we can see it in its entirety. As we go you will be expected to carry out activities that may seem irrelevant at first. Complete the reading, exercises and activities as we go. If you persevere, you will in the end develop a very useful skill. Before we start, however, we will see why we do not simply put all of the information into one table. An information system with just one table is known as a flat-file database.

    Problems with flat-file databases

    A flat-file database is inefficient and can lead to problems with maintaining the integrity of data. To see this we will look at a single-table database that contains pet and owner information for a vet. This table has the following fields:

    Customer Data (surname, first name, address, suburb, phone, pet name, pet type, pet age, amount owing, last visit)

    The table is called Customer Data and it has ten fields (surname, first name, etc.). When the database is populated the table holds data arranged into a record for each client.

    Part of the Customer Data table in the Vet database

    This looks to be a perfectly reasonable arrangement of data. What could be simpler than having everything we want in one place? Unfortunately, placing all of the fields in one table leads to some problems. What if Mary Crothers brings a second pet, her cat Macavity, to see the vet? We now have to re-enter all of the other information about Mary, such as her address and phone, even though it is already in the database. Having to re-enter data that is already recorded is not a big problem with a simple data set such as this one. For an information system with thousands of records, however, it can be very inefficient. The redundant data takes time to enter and takes up storage space.


    There is another difficulty with this situation too. The key field in this table is surname: in the example above, the surname Crothers uniquely identifies the set of data contained in the first record. If we add a new record for the second pet, we no longer have a key for this table, because Crothers does not identify whether the information in a record is about Sprout the guinea pig or Macavity the cat. To solve this problem we would have to make a new key for the table. (How might this be done?)

    Recording the same data more than once can lead to further problems. We will explore these by following a sequence of events. Say Cath Darkin has three pets in the database, and she moves from Middle Ridge to South Park. We originally had to enter all of her address data three times, and now we also have to change it three times. This is inefficient. Next, perhaps the receptionist was in a hurry and, when he updated Cath's information, he missed one of the phone numbers:

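    surname | first name | address | suburb | phone | pet name | pet type | pet age | amount owing | last visit
    Darkin | Cath | 15a Kyle St | South Park | 4639987 | Socks | cat | 3 | $62.40 | 5-05-10
    Darkin | Cath | 15a Kyle St | South Park | 4639987 | Alf | galah | 14 | $23.10 | 18-03-10
    Darkin | Cath | 15a Kyle St | South Park | 4356984 | Bowser | dog | 2 | $0.00 | 12-12-09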

    We now have the confusing situation of not knowing which record to believe. Cath lives in Kyle St, South Park, but is her phone number 4639987 or 4356984? We have no way of telling. The situation where one record is changed but another is not, so that they disagree with one another, is called an update anomaly. Update anomalies cause major problems in large databases that have been poorly designed. If, on searching through ten thousand records, you are presented with three different prices for exactly the same product, which do you believe? The data is unreliable. No matter how reputable or expensive a database is, if the data in it is unreliable, the database is useless.

    Our problems with this flat-file database are not yet over. Alice Donovan has just phoned to tell Dr Harry that her spaniel Barkley has died. This is very sad, but since Alice owes no money the receptionist removes Barkley's record from the database. A week later Dr Harry learns of a litter of spaniel pups looking for a home. He thinks of Alice and decides to phone her to see if she would want one as a replacement for Barkley. Looking in the database, however, he cannot find Alice's phone number. When the receptionist deleted Barkley's record he deleted all of Alice's address information too! (Fortunately Dr Harry sees Alice down the street the next day and arranges for her to get the puppy.)

    We have now seen the four problems that can arise with a poorly designed database:

    - redundancy: the fact that Mary Crothers lives at 160 Drayton Rd, Southbrook, phone 4909547, is recorded for every pet she has
    - inefficiency: if we have to change the information recorded about Mary we have to do it in every record
    - update anomalies: if in changing the same information in different records we make a mistake, as with Cath's phone number, the data will be inconsistent and hence unreliable
    - data loss: if we delete a record such as Barkley's, we lose all of the information it holds, not just the parts we no longer want.
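    The update anomaly in Cath Darkin's records can be detected, though not prevented. One way is to group the flat-file rows by owner and look for owners whose contact details disagree. The following is a minimal Python sketch. As a hypothetical simplification, it assumes that surname plus first name identifies an owner. The function name find_update_anomalies is illustrative and not part of any particular system.

```python
from collections import defaultdict

# Three rows from the flat-file Customer Data table after Cath Darkin's move:
# (surname, first name, address, suburb, phone, pet name)
rows = [
    ("Darkin", "Cath", "15a Kyle St", "South Park", "4639987", "Socks"),
    ("Darkin", "Cath", "15a Kyle St", "South Park", "4639987", "Alf"),
    ("Darkin", "Cath", "15a Kyle St", "South Park", "4356984", "Bowser"),
]


def find_update_anomalies(rows):
    """Group rows by owner and return owners whose contact details disagree.

    Assumes surname + first name identifies an owner, which the flat file
    itself cannot guarantee.
    """
    contact = defaultdict(set)
    for surname, first, address, suburb, phone, _pet in rows:
        contact[(surname, first)].add((address, suburb, phone))
    # More than one distinct version of the "same" fact means an anomaly.
    return {owner: versions for owner, versions in contact.items() if len(versions) > 1}


anomalies = find_update_anomalies(rows)
for owner, versions in anomalies.items():
    print(owner, "has", len(versions), "conflicting versions of their contact details")
```

    A check like this only finds the problem after it has happened. A better design removes the possibility altogether by recording each fact once.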

    Perhaps a flat-file database is not the best way to organise information, especially if we have hundreds, or even thousands, of records in dozens of fields. The question now arises: how can we design a data set so that these problems do not occur? (The answer to this question will occupy much of this unit!) Let us look at a better way of arranging the data. To do this we will divide it into three tables in such a way that each fact is only recorded once:

    Owner (surname, first name, address, suburb, phone)
    Pet (pet name, pet type, pet age, amount owing)
    Visit (surname, pet name, last visit)

    Instead of being all in one table the information would now be arranged like this:

    Owner
    surname | first name | address | suburb | phone
    Crothers | Mary | 160 Drayton Rd | Southbrook | 4909547
    Curtis | Alexandra | 13 Faith Crt | Middle Ridge | 4387114
    Darkin | Cath | 123 Jull Av | Middle Ridge | 4356984
    Donovan | Alice | 15 Aruma Drv | University Hgts | 4352638
    etc.

    Pet
    pet name | pet type | age | owing
    Sprout | guinea pig | 2 | $23.10
    Sophie | snake | 5 | $45.80
    Socks | cat | 3 | $62.40
    Barkley | dog | 3 | $0.00
    etc.

    Visit
    surname | pet name | last visit
    Crothers | Sprout | 9-07-10
    Curtis | Sophie | 5-11-10
    Darkin | Socks | 5-05-10
    Donovan | Barkley | 2-06-09
    etc.

    Does this solve the problems? Let's look at each in turn:

    - no redundancy: Mary's contact information is recorded once only; the data does not have to be entered for each pet in the database
    - no inefficiency: if we have to change address information we only have to do it in one place
    - no update anomalies: since each fact is recorded only once, changes to the data can only be made in one place, and so it cannot be inconsistent
    - no data loss: if we delete the information about Barkley we do not lose the contact information for Alice.

    This is a much better design. To get to this arrangement of fields we will use the ORM process. With ORM we organise the fields into tables so that each fact is only recorded once, with no loss of data. The data is then in optimal normal form (ONF), or we say it has been normalised.


    We will explore ORM shortly but for now note that the three tables are related. Each table has at least one field from another table. We have solved each of the problems listed above by arranging the data into a relational database. To handle relational databases we will need more skill than with a flat-file database, but we have a more reliable, more effective, and more efficient way of handling the data.
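    The three-table design can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions. The table and column names are made up for the example. Surname and pet name are treated as keys only because they happen to be unique in the sample data. Macavity's age, amount owing and visit date are left empty because the text does not give them. The sketch adds Mary's second pet without re-entering her details, then deletes Barkley without losing Alice's phone number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when asked
conn.executescript("""
CREATE TABLE owner (
    surname    TEXT PRIMARY KEY,  -- assumed unique, as in the sample data
    first_name TEXT,
    address    TEXT,
    suburb     TEXT,
    phone      TEXT
);
CREATE TABLE pet (
    pet_name TEXT PRIMARY KEY,    -- assumed unique for this sketch
    pet_type TEXT,
    pet_age  INTEGER,
    owing    REAL
);
CREATE TABLE visit (
    surname    TEXT REFERENCES owner(surname),
    pet_name   TEXT REFERENCES pet(pet_name),
    last_visit TEXT,
    PRIMARY KEY (surname, pet_name)
);
""")
conn.executemany("INSERT INTO owner VALUES (?, ?, ?, ?, ?)", [
    ("Crothers", "Mary", "160 Drayton Rd", "Southbrook", "4909547"),
    ("Donovan", "Alice", "15 Aruma Drv", "University Hgts", "4352638"),
])
conn.executemany("INSERT INTO pet VALUES (?, ?, ?, ?)", [
    ("Sprout", "guinea pig", 2, 23.10),
    ("Barkley", "dog", 3, 0.00),
    ("Macavity", "cat", None, None),   # details not given in the text
])
conn.executemany("INSERT INTO visit VALUES (?, ?, ?)", [
    ("Crothers", "Sprout", "9-07-10"),
    ("Donovan", "Barkley", "2-06-09"),
    ("Crothers", "Macavity", None),    # second pet: no owner data re-entered
])

# Barkley has died: remove only his visit and pet rows.
conn.execute("DELETE FROM visit WHERE pet_name = ?", ("Barkley",))
conn.execute("DELETE FROM pet WHERE pet_name = ?", ("Barkley",))

alice_phone = conn.execute(
    "SELECT phone FROM owner WHERE surname = ?", ("Donovan",)).fetchone()[0]
mary_rows = conn.execute(
    "SELECT COUNT(*) FROM owner WHERE surname = ?", ("Crothers",)).fetchone()[0]
mary_pets = conn.execute(
    "SELECT COUNT(*) FROM visit WHERE surname = ?", ("Crothers",)).fetchone()[0]
print("Alice's phone is still recorded:", alice_phone)
print("Mary's details are stored", mary_rows, "time(s) for", mary_pets, "pets")
```

    A real system would need better keys than surname and pet name. Choosing those keys properly is exactly what the ORM process sets out to do.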

    Activity 8.1 Video store

    The video store flat-file database has been poorly designed and only has one table:

    Video (title, category, hire period, cost, rating, member number, due date)

    Use this table to answer the following questions.

    1. a What is the name of the table? b How many fields are there in the table? c Which field acts as a key to the table? d Give an example of one possible record of data from the table.

    2. Explain what each of the following means: a redundant b inefficient c update anomaly d related field.

    3. Using the table above create a set of data to populate it (3-4 rows) so that you can give examples to show how each of the following might occur: a The storage of redundant information. b Inefficiencies in maintaining the data set. c An update anomaly. d Data loss.

    Explain with reference to your data set how each of these could occur.

    4. Rearrange the fields in the video database into separate, related tables so that the problems listed in 3 above do not occur.

    ORM

    In designing an information system we will model a real world situation in an abstract way. This model must represent an accurate but simplified form of reality that can be manipulated to produce the most efficient way of handling data. Object-role modelling is one way to develop a simple, thorough description of how information is to be represented. The ORM method consists of the following six steps:

    1. investigate the given situation and develop a set of elementary sentences that describe it
    2. from the elementary sentences draw the conceptual schema diagram
    3. simplify the diagram by eliminating surplus entities and by indicating derived facts
    4. add constraints to the diagram
    5. establish table groups
    6. form named tables and indicate keys and other relevant information.

    We will use conceptual schema (CS) diagrams as part of ORM to show data and the relationships between data.

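    [Figure: Part of the School conceptual schema. Student (name) studies / is studied by Subject (code), with further roles "achieves" and "is for" linking Result (%).]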

    However, before we can start our ORM, we must impose limitations and qualifications on the real world, and make assumptions about the data.

    The UoD

    The limitations are the parts of the real world we select to model in our system. What we choose to include in our model, and what we choose to leave out, forms a Universe of Discourse (UoD). In one case we might restrict detail to students in a school and not the teachers. In another we might include a stocktake for a business but exclude invoicing. Or we might decide to maintain data about males but not females. In each case we set the UoD for the given system. The UoD is the part of the real world a system designer selects to be part of an information system.

    To determine the UoD we can investigate all of the input and output currently produced by the system. (If all aspects of a system can be established this way, the set of documents we investigate is described as being significant.) We may also interview a UoD expert. This is simply anyone who knows the current system well, for example a key person such as an accountant, a chef, or an administrator of the enterprise. These people are aware of the flow of data. They can point out what is required, what is important and, just as importantly, what is not needed.

    In addition to limitations, there are also qualifications that have to be made to transfer a real world situation into an information system. The qualifications set the level of detail to which we will develop our model. For example, it is possible to see a connection between age and salary (older people are usually paid more), but we will generally ignore tenuous relationships of this type. In these cases we will either consult the UoD expert or make assumptions as to what is relevant.

    At times we will also need to make assumptions to simplify our task or to make it manageable. For example, we might assume that only one phone number is recorded for a business, or that all employees work overtime. By placing limits and qualifications, and by making assumptions, we select only a part of the real world, and we necessarily lose detail. The skill in ORM is to capture the essence of reality in forming the UoD of the information system we are building. This must be done without losing important or key aspects that might alter the sense of what we are modelling, and so it must be done carefully and precisely.

    Elementary facts

    The first part of ORM is to establish elementary facts from the selected UoD. These are very simple statements that indicate the roles played by the objects that the system deals with. "Elementary" indicates that something cannot be split into smaller units of information, and "fact" indicates that we accept the given data as true. From the CS diagram shown above, some elementary facts might be:

    Arnold studies French
    Mary achieves 80%
    IPT is studied by Jim

    Elementary facts state, in the simplest possible way, the relationships between things in the UoD. Look at the following partial output from a school sports carnival:

    name | event | place | house | points
    Smith,J | 100m | 1st | Red | 5
    Jones,P | 200m | 2nd | Blue | 3
    Bloggs,F | 100m | 2nd | Red | 3
    Smith,J | 200m | 1st | Red | 5
    : | : | : | : | :

    A fact from the table might be:

    Bloggs,F from Red house was 2nd in the 100m and earned 3 points

    (Note: we are not concerned whether the data is true or not; we accept it as a fact.) This fact, however, is not elementary. It can be split into smaller units of information:

    Bloggs,F in running the 100m was 2nd
    Bloggs,F belongs to Red house
    Red house gained 3 points for the 100m

    These are now elementary facts. Splitting any of them further would result in a loss of information. If we did try to split them further we might say:

    Bloggs,F ran the 100m

    but the information about place is lost, or:

    Red house gained 3 points

    but the information about which event this was for is lost.

    ... etc.
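    Splitting a compound fact into elementary facts can be pictured as turning one table row into several short statements. Each statement names only the values it needs. Below is a minimal Python sketch. It assumes the row layout of the carnival table above, and the names CarnivalResult and elementary_facts are hypothetical.

```python
from typing import List, NamedTuple


class CarnivalResult(NamedTuple):
    """One row of the sports carnival output: a compound (non-elementary) fact."""
    name: str
    event: str
    place: str
    house: str
    points: int


def elementary_facts(r: CarnivalResult) -> List[str]:
    """Split one row into the three elementary facts used in the text."""
    return [
        f"{r.name} in running the {r.event} was {r.place}",
        f"{r.name} belongs to {r.house} house",
        f"{r.house} house gained {r.points} points for the {r.event}",
    ]


row = CarnivalResult("Bloggs,F", "100m", "2nd", "Red", 3)
facts = elementary_facts(row)
for fact in facts:
    print(fact)
```

    Deciding which columns belong together in each fact is the human part of the job. The code only records the decision.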


    To repeat, an elementary fact is the simplest possible statement of the relationships between data that does not result in the loss of information. In developing elementary facts we must ensure each is as plain and clear a statement as possible, without losing meaning. An elementary fact should be specific and must avoid words such as "and" or "not". ("And" suggests a statement can be split into two, while an information system usually does not record negative facts.)
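    The rule about "and" and "not" lends itself to a quick mechanical screen of candidate facts before they are checked by hand. This Python sketch is a heuristic of our own, not part of ORM itself. It simply flags whole-word "and" or "not", so it will miss compound facts phrased another way. It may also flag a name that contains the word, such as a hypothetical business called "Smith and Sons".

```python
import re

# 'and' or 'not' as whole words, so "Andrew" or "knot" are not flagged.
SUSPECT_WORDS = re.compile(r"\b(and|not)\b", re.IGNORECASE)


def flag_non_elementary(statement: str) -> bool:
    """Return True if the statement contains 'and' or 'not' as a word.

    A rough first filter only: a human must still judge each fact.
    """
    return SUSPECT_WORDS.search(statement) is not None


for s in [
    "Helen is 30 and earns $55 000 per year",
    "Bloggs,F belongs to Red house",
    "Dale is not a runner",
    "Andrew studies French",
]:
    print("check" if flag_non_elementary(s) else "ok   ", s)
```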

    From facts to sentences

    Once we have worked out the elementary facts in a UoD we convert them into elementary sentences. An elementary sentence is a formalised statement of an elementary fact. It deals with the entities in an information system.

    Entities

    An entity is anything that has characteristics and can be recognised as a unit. To be an entity an object must represent a class of things, as the following examples show:

    - student, address, salary are entities: each is a single recognisable type of object that can be described
    - colour, score, position are still entities, even though they are abstract and not physical
    - people, animals, furniture are not entities: these are vague and general
    - Mike Smith, 4567543, and 12-8-00 are also not entities: these are specific instances or examples of an entity.

    Identifying the entities in a UoD to form elementary facts about them is the first step in ORM.

    Reference mode

    Entities can act in relation to other entities. Take the elementary fact stated earlier:

    Arnold studies French

    In this fact we have two objects (Arnold and French) and one relationship (studies). But what about the following elementary fact:

    the French speak French

    Here we have two different types of entity (nationality and language), but we cannot tell which is which. To avoid confusion we will include the identity of each object in our statement:

    the nationality French speak the language French

    or, in our first fact:

    the student Arnold studies the subject French

    In this way there is no confusion as to Arnold being a subject or French being a person. (We must remember as we conduct this process that we are modelling reality in a way that can be represented inside a computer. A computer has no concept of person or subject other than what we put into it, and so we have to be explicit in identifying what each is.) To make our presentation even clearer we can also include the reference mode by which we refer to each entity:


    the student with first name Arnold studies the subject with subject name French

    This avoids problems of context. There is now no way we could confuse a surname of Arnold with a first name Arnold or a nickname Arnold. French is definitely a subject name, and not a code or a description of the subject. To see why this matters, consider NASA's Mars Climate Orbiter, an unmanned probe lost in 1999 because part of its specification was in metric and part in imperial measurements. One team's software reported thrust in imperial units (pound-force seconds), while the navigation software expected metric units (newton-seconds). Nobody had been specific enough about which was which. A great deal of money and effort was wasted by failing to clarify the reference mode to be applied to the different components.

    Elementary sentence

    At this level of formalisation we have altered the elementary fact so that it is now an elementary sentence derived from the UoD. An elementary sentence is a formalised fact that indicates:

    - entities: the objects that make up a system, e.g. student
    - labels: used to identify specific entities, e.g. Arnold
    - reference modes: how the label refers to the entity, e.g. first name
    - roles: the relationships between entities, e.g. studies.

    The elementary sentence above may have come from a table such as the following:

    name | subject
    Arnold | French
    Mary | MathsA
    Hayley | Chemistry
    Kim | IPT

    In this table:

    - the entities are student and subject; these are objects that are part of the UoD
    - some labels are Mary, Chemistry and IPT; these are values that identify specific instances of an entity
    - first name is the reference mode of the label Arnold to the entity student, subject name is the reference mode of the label Chemistry to the entity subject, etc.
    - the roles are studies and is studied by; these are the relationships between the entities.

    A clear understanding of the above terms is very important. We will use them in later explanations and so you will need to be comfortable and familiar with them.

    Representing elementary sentences

    The above table actually results in two elementary sentences. The first is shown above; the other is the reverse:

    the subject with subject name French is studied by the student with first name Arnold

    For brevity the two elementary sentences can be written together and abbreviated to:

    student (first name) Arnold
        studies / is studied by
    subject (subject name) French

    In this form of the elementary sentence we write the entity with its reference mode in brackets and a sample label on one line. The two elementary sentences are contracted into one by reading both down (student of first name Arnold studies subject of subject name French) and upward (subject of subject name French is studied by student of first name Arnold). In general an elementary sentence will look like:

    entity (reference mode) label
        role / reverse role
    entity (reference mode) label
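    The parts of an elementary sentence (entity, reference mode, label, role and reverse role) map naturally onto a small data structure. Reading the sentence down and up then becomes two ways of printing it. Below is a minimal Python sketch; the class names EntityLabel and ElementarySentence are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityLabel:
    entity: str          # e.g. "student"
    reference_mode: str  # e.g. "first name"
    label: str           # e.g. "Arnold"

    def describe(self) -> str:
        return f"the {self.entity} with {self.reference_mode} {self.label}"


@dataclass(frozen=True)
class ElementarySentence:
    first: EntityLabel
    role: str            # read downward
    reverse_role: str    # read upward
    second: EntityLabel

    def read_down(self) -> str:
        return f"{self.first.describe()} {self.role} {self.second.describe()}"

    def read_up(self) -> str:
        return f"{self.second.describe()} {self.reverse_role} {self.first.describe()}"

    def abbreviated(self) -> str:
        """The three-line contracted form: entity, roles, entity."""
        a, b = self.first, self.second
        return (f"{a.entity} ({a.reference_mode}) {a.label}\n"
                f"    {self.role} / {self.reverse_role}\n"
                f"{b.entity} ({b.reference_mode}) {b.label}")


s = ElementarySentence(
    EntityLabel("student", "first name", "Arnold"),
    "studies", "is studied by",
    EntityLabel("subject", "subject name", "French"),
)
print(s.read_down())
print(s.read_up())
print(s.abbreviated())
```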

    As a further example look at the following table:

    id# | pay
    2578 | $520
    4578 | $650
    6578 | $455
    4556 | $495

    This would result in the following elementary sentence (read both down and up):

    salesperson (id#) 2578
        is paid / is pay for
    wage ($) 520

    Finding relationships

    Developing the list of entities and then combining them in elementary sentences is usually the most difficult step of the ORM process. One technique that can help is to list the possible entities and then draw lines showing any relationship links. For the above example this would be a single link between salesperson and wage, but a more complex situation might look like:

    [Figure: informal entity list with relationship links drawn between code#, product, order#, date, salesperson and price.]


    These lists need not be formal as they are only used to help organise your ideas. If you wish they can be written on scrap paper and discarded after use. Once the entities and links have been identified it is then much easier to work towards developing the elementary sentences.
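    Once the links are identified, turning a table into elementary sentences is largely mechanical. Each link names two entities, their reference modes and the pair of roles, and each row supplies the labels. Below is a Python sketch using the salesperson table from this section, with each sentence written on one line. The tuple layout for a link is an assumption made for the example.

```python
# Rows from the salesperson table: (id#, pay in $)
rows = [("2578", "520"), ("4578", "650"), ("6578", "455"), ("4556", "495")]

# One link from the informal entity list, written out as:
# (entity, reference mode, role, reverse role, entity, reference mode)
link = ("salesperson", "id#", "is paid", "is pay for", "wage", "$")


def sentences(rows, link):
    """Yield one abbreviated elementary sentence per row for the given link."""
    e1, rm1, role, reverse, e2, rm2 = link
    for left, right in rows:
        yield f"{e1} ({rm1}) {left} {role} / {reverse} {e2} ({rm2}) {right}"


out = list(sentences(rows, link))
for line in out:
    print(line)
```

    A more complex table would simply have more links, each drawing its labels from a different pair of columns.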

    Dummy key

    One final point about developing elementary sentences. Look at the following set of entities:

    name, gender, born, height, weight

    For each of these entities there may be repeat instances, e.g. two people with the same name, so there may be no entity that could act as a key field. In situations such as this it is useful to create a new entity to act as a key. In this example perhaps we could include a unique identity number or student code. This field does not exist in the real world, but we create it to act as the key entity so that we can better organise the database.
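    Creating a dummy key amounts to giving every row a new value that is unique by construction. Below is a minimal Python sketch using the two people named Smith, J from the sample data for this example. The starting number 1001 and the function name add_dummy_key are arbitrary, hypothetical choices.

```python
from itertools import count

# Rows from the sample data: (surname, gender, born, height, weight)
people = [
    ("Smith, J", "male", "12-08-95", "1.73m", "66kg"),
    ("Smith, J", "female", "6-06-97", "1.52m", "48kg"),
]


def add_dummy_key(rows, start=1001):
    """Prefix each row with a made-up id# that exists only to act as a key."""
    ids = count(start)
    return [(next(ids),) + row for row in rows]


keyed = add_dummy_key(people)
names = [row[1] for row in keyed]
keys = [row[0] for row in keyed]
print("names unique?", len(set(names)) == len(names))
print("id# unique?", len(set(keys)) == len(keys))
```

    The name column fails as a key because two different people share it. The generated id# succeeds, even though it has no meaning outside the database.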

    Activity 8.2 Elementary my dear Watson

    1. State at least four elementary facts from the following table:

    name | age | height
    Jane | 15 | 168
    Jim | 16 | 175
    Bill | 15 | 169
    Kate | 17 | 170

    2. a Why is the following not an elementary fact? Helen is 30 and earns $55 000 per year.

    b Convert the statement into elementary facts.

    3. Is it possible to break the following statements into simpler units without losing information? If so convert them into elementary facts, if not say why not. a A large Coke costs $1.90 and a small Coke is $1.40. b In March Michael Clark scored 420 runs. c Stan and George and Pete are all single. d Deciduous trees lose leaves in autumn. e Brooke borrowed the Living End CD from Petra last Wednesday.

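    Sample data for the dummy key example (two different people share the name Smith, J):

    surname | gender | born | height | weight
    Smith, J | male | 12-08-95 | 1.73m | 66kg
    Smith, J | female | 6-06-97 | 1.52m | 48kg
    etc.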


    4. Extract as many elementary facts as you can from the following: On Saturday Mike and Dean went to the speedway at Caldwell. They saw Peter Brock win the Super A-Car event and Jim O'Sullivan come second. First prize was $50 000. The event was covered by Channel 7. 32 000 people saw the event live, and an estimated 3 million on TV.

    5. Explain what each of the following is. Give an example of each from an original situation of your own choice: a entity b label c reference mode d role.

    6. a Explain in your own words the difference between an entity and a label. b Identify the entities and labels in the following:

    name | section | position
    Lowe | A4 | junior
    Adams | B9 | senior
    Fredricks | A4 | trainee
    Anderson | R5 | junior

    c What are the reference modes of the labels to the entities? d What roles are played? e Write the elementary sentences for the above table.

    7. Convert your elementary facts from Q1 and Q2 into elementary sentences.

    8. Cecil D. Romm has undertaken the task of developing an information system for Mike Jones Home Find. This is a new real estate agency that has recently been set up by Mike Jones with three salesmen and a secretary/receptionist.

    Cecil would like to consult the owner of the system, the potential end users and any person who might be considered a UoD expert.
    a Suggest the person or employee who might fill each of these roles (owner, end user, UoD expert) at the real estate agency.
    b Give an indication as to the sort of information you think Cecil could expect to obtain from each of these in the given situation.

    9. Look at the data capture form below and answer the following questions:
    a Suggest a possible UoD for this situation.
    b Identify the entities represented on the form.
    c List the entities vertically and then draw lines linking the ones that are related.
    d Use this list to write the elementary sentences for the form.
    e In writing the elementary sentences did you have to make any assumptions? Consider the use of the phone, the link between the suburb and postcode, and other such things. List any assumptions you have made.


    Bunyip Town Library

    Full name: .................................................. Date of Birth: ................... Title: Dr Mr Mrs Ms

    Street address: ...........................................................................

    Suburb: ..................................... Postcode: ...............

    Phone: .....................................

    Office use only

    Card number: ..................................... Issue date: .....................................

    CS diagrams

    We can use diagrams to represent and manipulate the entities and relationships in an information system simply. In conceptual schema (CS) diagrams we show entities as ellipses (ovals) and roles as rectangles. Lines indicate the relationships between entities.

    [Figure: A simple CS diagram. Student (name) studies / is studied by Subject (code).]

    The entity student is placed in the ellipse with the reference mode in brackets underneath. The roles (relationships) are placed in the rectangles. If the reverse relationship is obvious it can be omitted. While a CS diagram represents an elementary sentence, it does not include any labels. After drawing a diagram it is a good idea to check that all instances have been represented, to make sure none have been left out. If an entity has more than one reference mode, the alternatives are represented in separate, dotted ellipses. Say in our information system we wished to record the entity student by an identity number, by surname, and by the name the student is most commonly known by. Each is a different reference mode, but each identifies the one entity that is student.


    On a diagram it could look like this:

    [Figure: Alternative reference modes for one entity. Student (id#) studies / is studied by Subject (code); dotted ellipses Surname and Known by are linked to Student through the roles "is named" and "is called".]

    The main reference mode is still under the entity in brackets, while the alternative reference modes are in dotted ellipses. (Why do you think id# might be chosen as the most important reference mode?)

    Developing CS diagrams

    Before drawing a CS diagram we must identify all of the entities and the relationships between them. This can be done by determining the elementary sentences. Take the following situation:

    title | accn# | performer | year
    Alchemy | 234-319 | Dire Straits | 1984
    Unplugged | 876-909 | Clapton | 1992
    Hotel California | 299-765 | Eagles | 1976
    But Seriously | 458-778 | Collins | 1989

    Since title and accession number both refer to the same entity, we get the entities album, artist and year. This gives us the elementary sentences:

    album (accn#) 876-909
        is performed by / performs on
    artist (name) Clapton

    album (accn#) 876-909
        was recorded in / is date of
    year (AD) 1992

    We also need to show the link between title and accession number:

    album (accn#) 876-909
        is called / is name for
    title (name) Unplugged
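    The three album elementary sentences can be applied to every row of the table, and the resulting facts read back into rows. This is a practical way of confirming that splitting the table into elementary facts loses nothing. Below is a Python sketch; the FACT_TYPES layout and the function names are hypothetical.

```python
# Rows from the CD albums table: (title, accn#, performer, year)
albums = [
    ("Alchemy", "234-319", "Dire Straits", 1984),
    ("Unplugged", "876-909", "Clapton", 1992),
    ("Hotel California", "299-765", "Eagles", 1976),
    ("But Seriously", "458-778", "Collins", 1989),
]

# The three fact types from the elementary sentences, all keyed on accn#:
# (roles, other entity with reference mode, column holding its label)
FACT_TYPES = [
    ("is called / is name for", "title (name)", 0),
    ("is performed by / performs on", "artist (name)", 2),
    ("was recorded in / is date of", "year (AD)", 3),
]


def album_facts(rows):
    """Turn each table row into one elementary fact per fact type."""
    facts = []
    for row in rows:
        accn = row[1]
        for roles, other, column in FACT_TYPES:
            facts.append((accn, roles, other, row[column]))
    return facts


def rebuild(facts):
    """Read the facts back into rows to show no information was lost."""
    table = {}
    for accn, _roles, other, value in facts:
        table.setdefault(accn, {})[other] = value
    return table


facts = album_facts(albums)
for accn, roles, other, value in facts[3:6]:   # the Unplugged row
    print(f"album (accn#) {accn} {roles} {other} {value}")
print(rebuild(facts)["876-909"])
```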


    Putting all of this on the one diagram will result in:

    [Figure: CS diagram for CD Albums. Album (accn#) is linked to Title through "is called", to Artist (name) through "performed by", and to Year (AD) through "recorded in".]

    Finally we check that all instances (values in the table) have been represented on the diagram. To do this, read the diagram placing real values in place of the entities. If the elementary sentences were developed correctly there should be no problem, but it does not hurt to check.

    Incorrect CS diagrams

    As we add more and more entities to a diagram it can get very complex. As long as we work from elementary sentences that fit the original UoD there should be no problem. There should be a relationship linking each pair of entities (or entity pair if nested). The following is possible:

    [Figure: CS diagram with each entity linked to at least one role]

    but the following three examples would make no sense:

    [Figure: Incorrect CS diagrams]
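    Two of the checks described here can be automated once the diagram is written down as data. The first is that every value in the table has a place on the diagram. The second is that every entity is linked to at least one role. Below is a minimal Python sketch for the CD albums diagram. It assumes a simplified representation: entities with their reference modes, fact types as the entities that play their roles, and a hand-made mapping from table columns to entities. It catches only one kind of mistake, an entity that plays no role.

```python
# The CD albums diagram as data: entities with reference modes, and fact
# types listing the entities that play their roles.
entities = {"album": "accn#", "title": "name", "artist": "name", "year": "AD"}
fact_types = [
    ("is called", ("album", "title")),
    ("performed by", ("album", "artist")),
    ("recorded in", ("album", "year")),
]
# Which entity each table column's values belong to (an assumption we record).
columns = {"title": "title", "accn#": "album", "performer": "artist", "year": "year"}


def entities_in_roles(fact_types):
    return {e for _role, players in fact_types for e in players}


def unlinked_entities(entities, fact_types):
    """Entities that play no role: such a diagram would make no sense."""
    return sorted(set(entities) - entities_in_roles(fact_types))


def uncovered_columns(columns, fact_types):
    """Table columns whose values no fact type can carry."""
    used = entities_in_roles(fact_types)
    return sorted(col for col, entity in columns.items() if entity not in used)


print(unlinked_entities(entities, fact_types))
print(uncovered_columns(columns, fact_types))

# Dropping "recorded in" leaves year stranded, so its values would be lost.
broken = fact_types[:2]
print(unlinked_entities(entities, broken), uncovered_columns(columns, broken))
```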


    Arity

    So far we have mostly looked at binary relationships, where each entity is linked by a relationship to one other entity. However, we can also have relationships with just one entity. These are called unary relationships. Here are two examples of unary relationships:

    [Figure: A role with one entity. Athlete (name) runs.]
    [Figure: One entity with more than one role. Person (name) is husband of / is wife of.]

    The second of these is also called a collapsed entity. The number of entities in a relationship is called its arity. Unary relationships have an arity of one and binary relationships an arity of two. It is possible to have unary, binary, ternary, quaternary, etc., relationships. In general these are described as n-ary relationships. In ORM it is best to work with at least binary relationships. The collapsed entity has two roles and is binary. A single-role relationship, however, must be converted artificially, as in the following example:

    name | runner
    Jim | Y
    Dale | N
    Kim | Y
    Alice | Y

    person (name) Dale
        with status / is status of
    runner (Y/N) N

    In the next section we will look at higher arity relationships.

    Activity 8.3 Drawing relationships 1. Identify the entities and roles in the following and then draw the CS diagram for each.

    a name height Jeremy 160 Mary 156 Alex 172 Arthur 162

    b name height born Jeremy 160 1984 Mary 156 1985 Alex 172 1984 Arthur 162 1986

    person (name) Dale with status / is status of runner (Y/N) no

    runs

    is husband of

    is wife of

    Athlete (name)

    Person (name)


    2. Convert the following facts to elementary sentences and then represent them as CS diagrams:

       a Mr Smith teaches 11S.
       b Peter drives a Camry made by Toyota.
       c The U/15s play at 2:30 on Willows Oval.
       d Fiona lives in Inglewood, which has a postcode of 4387.

    3. For each of the following identify the entities and draw the CS diagram to show the relationship between them.

    a Pet Motel

       Owner   Pet        Type  Kennel
       Redman  Sky        dog   13
       Redman  Hook       cat   2
       Ranger  Silver     dog   24
       Smith   Silvester  cat   3
       Morris  Deefer     dog   22

    b Australian Prime Ministers

       Prime Minister  Born  Died
       Barton E        1849  1920
       Deakin A        1856  1919
       Watson J        1867  1941
       Reid G          1845  1918
       Fisher A        1862  1928

    c  name        section  position
       Lowe        A4       junior
       Adams       B9       senior
       Fredericks  A4       trainee
       Anderson    R5       junior
       Jones       R5       senior

    d  ISBN         title           author
       06446387049  Applied IT      Savage
       0552141275   Bravo Two Zero  McNab
       0684816121   Popcorn         Elton

    4. The table below shows information in relation to projects undertaken by a large organisation. List the entities and draw lines showing the relationship links. Use your list to draw the CS diagram for the table.

       project#  manager  budget  salary  born  started
       P1        Smith    40 000  12 000  1955  1998
       P2        Jones    30 000  14 000  1964  1997
       P3        Adams    50 000  11 000  1962  2000
       etc.

    As you prepare the CS diagram you will need to make some assumptions, as you only have the limited information the table gives you. Under the diagram list the assumptions you make.

    5. The table below shows information in relation to sales.

       item#    description      price   seller code  name   discount  dept
       1387-67  1 litre paint    $14.50  B392         Jones  5%        home wares
       1355-62  2 kg fertiliser  $19.75  B392         Jones  8%        gardening
       1387-67  1 litre paint    $14.50  A922         Smith  5%        home wares
       etc.

       a What is a possible UoD for this system?
       b Identify the entities in the system.
       c List the entities and draw lines showing the relationship links.
       d Develop a CS diagram to represent the data presented.
       e List any assumptions you made as you prepared the CS diagram.

    Ternary relationships

    As we saw earlier, some facts contain more than one role but cannot be split without losing meaning. Take the following example:

       name     subject    grade
       Claudia  11 Eng     HA
       Claudia  11 IPT     VHA
       Greg     11 Eng     SA
       Gina     12 MathsA  SA

    If we represented the above table with the two elementary sentences:

       student (name) Claudia studying / studied by subject (code) 11 Eng
       student (name) Claudia scores / is score for grade (mark) HA

    this would suggest that Claudia studies only one subject and receives only one mark. If we keep things this simple we lose information (that students get different marks for different subjects). The grade needs to be linked to a combination of student and subject. This is an example of a ternary relationship, one that contains three entities. The HA is not connected to Claudia alone, nor to 11 Eng alone. The achievement is linked to the person-subject combination: Claudia, studying 11 Eng, achieved an HA. The above situation is represented in one elementary sentence as:

    student (name) Claudia studying / studied by subject (code) 11 Eng scores / is score for grade (mark) HA
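    The loss of information can be checked mechanically. The short Python sketch below is illustrative only (holding facts as sets of tuples is an assumption, not part of ORM): it takes the four rows of the grade table above, splits them into the two binary facts (student studies subject, student scores grade) and then recombines them by matching on the student. Any recombined row that was not in the original table is information the split has invented.

```python
# The four rows of the student/subject/grade table above.
results = {
    ("Claudia", "11 Eng", "HA"),
    ("Claudia", "11 IPT", "VHA"),
    ("Greg", "11 Eng", "SA"),
    ("Gina", "12 MathsA", "SA"),
}

# Split the ternary fact into two binary facts.
studies = {(student, subject) for student, subject, _ in results}
scores = {(student, grade) for student, _, grade in results}

# Recombine the binary facts by matching on student (a natural join).
rejoined = {
    (student, subject, grade)
    for student, subject in studies
    for other_student, grade in scores
    if student == other_student
}

# Rows produced by the recombination that were never in the table.
spurious = rejoined - results
for row in sorted(spurious):
    print(row)
```

    Running it prints ('Claudia', '11 Eng', 'VHA') and ('Claudia', '11 IPT', 'HA'): once the facts are split, nothing records which grade went with which subject, which is why the grade must hang off the student-subject combination.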


    There are two ways to represent a ternary relationship on a CS diagram:

    Ternary relations on a CS diagram

    [Figure: the ternary form and the nested form, each with Student (name), Subject (code) and Grade (mark); roles studies / studied by and scores]

    The second form is called a nested relationship. The grade is linked to the combination student-subject as if student-subject formed an entity in itself. The mark belongs to the combination of a given student doing a given subject and not to either student by itself or subject by itself. One way of checking for ternary or nested relationships is to see if any values are repeated in a column. You will notice in the above table that Claudia appears more than once, as does 11 Eng. It is only the combination Claudia-11 Eng, or Greg-11 Eng, that we can link the HA or the SA to. Of the two forms the nested relationship is more common, and is the one we will mostly use in developing CS diagrams. Here is a second example:

       horse      event          winnings ($)
       Redhot     Melbourne Cup  3 500 000
       Redhot     Caulfield Cup  1 750 000
       North Sea  Caulfield Cup    250 000
       Hobnob     Melbourne Cup    500 000
       :          :              :

    As we can see Redhot has won money in more than one race. The amount she has won depends on which race she was in. We can also see that Caulfield Cup comes up more than once, so that as this table continues we can imagine that there will be a variety of horses in a variety of races each with the respective winnings. The amount won will vary from horse to horse depending on which race it was in to win that amount. This is a ternary relationship.
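    The repeated-values check described above can be sketched in Python as well. Treat it as a rough heuristic under stated assumptions: it only looks at the four sample rows from the horse table (winnings in dollars as printed in the table), and a small sample can hide repeats that the full UoD would contain, so the decision to nest still rests on what the facts mean.

```python
from collections import Counter

# Rows of the horse/event/winnings table above (winnings in dollars).
rows = [
    ("Redhot", "Melbourne Cup", 3_500_000),
    ("Redhot", "Caulfield Cup", 1_750_000),
    ("North Sea", "Caulfield Cup", 250_000),
    ("Hobnob", "Melbourne Cup", 500_000),
]

def repeats(values):
    """Return the set of values that occur more than once."""
    return {value for value, count in Counter(values).items() if count > 1}

horses = [horse for horse, _, _ in rows]
events = [event for _, event, _ in rows]
pairs = [(horse, event) for horse, event, _ in rows]

print("repeated horses:", sorted(repeats(horses)))
print("repeated events:", sorted(repeats(events)))
print("repeated horse-event pairs:", sorted(repeats(pairs)))

# Each column repeats on its own but the pair never does: a hint that the
# winnings belong to the horse-event combination (a nested relationship).
looks_nested = bool(repeats(horses)) and bool(repeats(events)) and not repeats(pairs)
print("nest winnings on horse-event:", looks_nested)
```

    For the table above it reports that Redhot and both races repeat while no horse-event pair does, so the winnings are a candidate for nesting on the pair.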

    As an elementary sentence:

       horse (name) Redhot running in event (race) Caulfield Cup earning winnings ($) 750 000

    [Figure: Horse (name) running in Event (race); the Horse-Event pair earns winnings ($)]

    The winnings are linked to the combination Horse-Event. In effect Horse-Event has become an entity in its own right. It is possible to have quaternary (4), quinary (5) and higher arity relationships, but these are complex and we do not often have to use them. If it is necessary to represent a quaternary relationship, it can be drawn as a double nested relationship.

    [Figure: Student (name), Subject (title), Time (day) and Room (code); roles studying, during, is in]

    Activity 8.4 Nested relations

    1. a What does arity refer to?
       b What is meant by the term ternary relationship?
       c The fact "In March, Michael Clark scored a total of 420 runs" is ternary. Explain how dividing it into simpler facts would result in the loss of information.
       d Written as an elementary sentence the above fact would be:

          cricketer (name) Michael Clark during / by period (month) March scored / scored in runs (number) 420

          Draw the CS diagram for the sentence.

    2. For each of the following identify:
          a possible UoD
          the entities involved
          the reference mode used for each entity
          the elementary sentences each represents.

       a [Diagram: Student (name), Subject (title), Room (number); roles studies, Located in]
       b [Diagram: Book (ISBN), Author (surname), Pages (number); roles written by, contains]
       c [Diagram: Drink (brand), Size (ml), Price ($); roles of, sells for]
       d [Diagram: Dept (code), Section (branch), Workers (count); roles contains, part of, is assigned]
       e [Diagram: Book (title), Borrower (id), Date (dd/mm/yy), Author (name), Name; roles lent to, due on, written by, called]

    3. Represent each of the following as elementary sentences and then draw them as CS diagrams.

    a  class  day      room
       11IPT  Monday   R1
       12IPT  Monday   R2
       11IPT  Tuesday  R4

    b  city       rainfall  month
       Brisbane   45        January
       Sydney     85        January
       Melbourne  12        January
       Brisbane   62        February

    c  breed  owner    pet
       dog    Harris   Fido
       cat    Paulson  Cindy
       dog    Paulson  Deefer
       bird   Murray   George
       dog    Dunn     Lonnie

    d  shop   item      sold
       Coles  mop       300
              brush     400
              duster    200
       BigW   mop       400
              dust pan  250

    e  city    month     max  min
       London  January   15   2
               February  14   4
       Paris   January   15   -1
               February  18   5

    f Pizza Palace

       Pizza      Size    Price
       Mega Meal  Small   $7.20
                  Medium  $8.50
                  Large   $10.50
       Supreme    Small   $7.50
                  Medium  $8.90
                  Large   $10.90

    g Bunyip High School

       Id#    Student    Subject  Teacher   Room
       34578  Harrow W   English  Watson    A24
       34578  Harrow W   Science  Richards  C41
       34578  Harrow W   Maths    Graham    R9
       34665  Ayre R     English  Watson    A24
       34665  Ayre R     Science  Graham    C41
       34778  Kempsey P  Maths    Smith     R9

    Simplifying the structure

    Part of our job in designing an information system is to make it as simple as possible while still keeping it true to the real-world situation it is modelling. If we can reduce the number of entities, and hence the number of columns or tables, we will generate a more manageable structure. As part of this process we can remove surplus entities and identify derived entities. We will see how to do this in our CS diagrams.

    Surplus entities

    If something is described as surplus we mean it is not really needed and we can do without it. Look at the following information about netball teams:

       squad    manager  trainer   captain
       Arrows   Peters   Harris    Harris
       Jets     Logan    Michaels  Kemp
       Rockets  Morris   Morris    Lange
       Bullets  Roberts  Forde     Roberts

    We might be tempted to represent this on a CS diagram as:

    [Figure: Squad (name) linked to Manager (surname), Trainer (surname) and Captain (surname) by the roles run by, trained by and controlled by]

    However, if we look carefully at the values in the table we will see that there appears to be an overlap. Some managers are trainers, some trainers are captains, and some captains are managers. If we want to record the same data (personal details, contact information, etc.) about each, then they can be treated as the one entity. The above could more simply be represented as:

    [Figure: Squad (name) run by / trained by / controlled by Member (surname)]

    When we find one label instance turning up in different entities, it is an indication that we can simplify the diagram by combining the entities. In this situation instead of four entities we have two, because the entity Member plays three different roles. In the second diagram we have eliminated two surplus entities. This is simpler to handle and easier to maintain in a database. To find surplus entities such as these, look for the same instance or value turning up in more than one column. If the different instances are going to have the same information recorded about them, and be treated in the same way in each case (e.g. as a text field 20 characters long), then they can be represented by one entity. Take the following case:

       rectangle  length (cm)  width (cm)  area (cm2)
       J1         15           4           60
       J2         4            3           12
       J3         12           6           72
       J4         5            60          300

    There again appears to be an overlap. The instance labels 12, 4 and 60 appear in more than one column, so perhaps there are surplus entities here.
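    Spotting values that turn up in more than one column is easy to automate. The Python sketch below uses the rectangle table above; reading the unit out of each column heading and treating "same unit" as "same kind of thing" is an assumption standing in for the designer's judgement, which the text applies next.

```python
from itertools import combinations

# Columns of the rectangle table above, keyed by heading (including unit).
columns = {
    "length (cm)": [15, 4, 12, 5],
    "width (cm)": [4, 3, 6, 60],
    "area (cm2)": [60, 12, 72, 300],
}

def shared_values(cols):
    """For every pair of columns, collect the values that appear in both."""
    overlaps = {}
    for (name_a, vals_a), (name_b, vals_b) in combinations(cols.items(), 2):
        common = set(vals_a) & set(vals_b)
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps

def unit(name):
    """Pull the unit out of a heading such as 'width (cm)'."""
    return name[name.index("(") + 1 : name.index(")")]

for (a, b), common in shared_values(columns).items():
    verdict = "possible surplus entity" if unit(a) == unit(b) else "different units, keep apart"
    print(f"{a} / {b}: {sorted(common)} -> {verdict}")
```

    It flags 4 (length and width, both cm) as a possible surplus entity, and 12 and 60 as shared values whose columns use different units. A shared value is only a clue: two columns with no overlap in a small sample could still describe the same kind of thing.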


    For the 4 we can see that in both cases the value is in centimetres, so it appears length and width can be treated as one entity. But what about the 12 and the 60? In these cases we are dealing with two different units, and 12 cm cannot be meaningfully compared to 12 cm2. Length and width are both measurements of the same sort of thing, but area is a different type of quantity. Our CS diagram can therefore be drawn as:

    [Figure: Rectangle (number) has length / has width Distance (cm); has region Area (cm2)]

    with one surplus entity removed. When we come to enter this into the database we will have one less field to be concerned with, and less chance of entering properties for the same type of thing in two different ways. Another way of spotting surplus entities is to notice that we are entering the same type of information for what we thought were two different entities. Take the following, which might be part of a much larger CS diagram for a school sports day:

    [Figure: Student (id#) and Competitor (comp#), each linked to Date (AD), Gender (M/F) and Height (cm) by the roles was born, is, is]

    In this situation we can see that the same sort of information is being recorded for a student as for a competitor. If you look at the tables of information and discover that students are also competitors, then it makes sense that there is a surplus entity here. In this case we could do away with the entity Competitor and simply use the student number to identify competitors. This would not only remove one entity but also leave one less table in the database, and less chance of inconsistent data. Removing the surplus entity would simplify the database and improve its integrity.


    Derived entities

    A derived entity is a value that the computer can calculate while the database is running. Take the area of the rectangle in the example above: this field can easily be determined by multiplying the length by the width. By establishing entities that can be worked out at run time we have less data to store and a simpler set of database tables to create. To indicate derived fields on a CS diagram we place an asterisk (*) next to them and show the calculation at the bottom of the diagram as a footnote. Take the following case:

       item code  cost price  selling price  profit
       bl786      $10.45      $15.90         $5.45
       gh321      $11.80      $18.50         $6.70
       kk776      $5.65       $11.25         $5.60

    The final three columns all contain money, so they can be treated as one entity. In addition, the profit can be worked out by the computer (how?). The CS diagram would look like:

    [Figure: Item (code) costs / sells for / earns profit* Money ($)]

    * profit = selling price - cost price

    The asterisk shows that profit is derived and points to the calculation needed to derive it. In effect we will not create the last column of the above table as a field in our database. The amount of profit will only be available to users at run time. This makes for a simpler table structure and avoids the problem of having a possibly incorrect value stored in the database. The value is only calculated when it is needed.
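    In a relational database one common way to keep a derived field out of the stored tables is a view. The sketch below assumes SQLite, accessed through Python's built-in sqlite3 module, and uses the item table above; the table and view names (item, item_profit) and storing prices as whole cents are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Money is held as whole cents so the subtraction is exact.
conn.execute(
    "CREATE TABLE item (code TEXT PRIMARY KEY, cost_cents INTEGER, sell_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [("bl786", 1045, 1590), ("gh321", 1180, 1850), ("kk776", 565, 1125)],
)
# Profit is not a stored column: the view works it out each time it is queried.
conn.execute(
    "CREATE VIEW item_profit AS "
    "SELECT code, sell_cents - cost_cents AS profit_cents FROM item"
)

def profits():
    """Return (code, profit in cents) for every item, calculated at run time."""
    return conn.execute(
        "SELECT code, profit_cents FROM item_profit ORDER BY code"
    ).fetchall()

for code, cents in profits():
    print(f"{code}: ${cents / 100:.2f}")

# Changing a stored price changes the derived profit automatically.
conn.execute("UPDATE item SET cost_cents = 1100 WHERE code = 'bl786'")
print(profits()[0])
```

    The first loop prints bl786: $5.45, gh321: $6.70 and kk776: $5.60, matching the table; after the update the profit for bl786 becomes 490 cents without any profit value ever having been stored.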

    Activity 8.5 Less is more

    Draw CS diagrams for each of the following situations. In each, remove surplus entities and indicate derived entities.

    1.  code  salary  allowances  pay
        E1    30 000  5 000       35 000
        E2    25 000  3 000       28 000
        E3    28 000  3 500       31 500

    2.  expt  start (g)  end (g)  gain (g)
        1     230        280      50
        2     240        295      55
        3     220        270      50
        4     225        255      30

    3.  id#  number  weight  total
        254  5       230     1 150
        256  10      150     1 500
        257  6       310     1 860
        258  3       200     600

    4.  item      cost    tax  price
        cassette  $15.50  5%   $16.28
        CD        $24.90  10%  $27.39
        DVD       $29.50  8%   $31.86

    5. Prepare a conceptual schema diagram for each of the databases we used in unit 7.

       a Repairs:
          Devices (type, rate, priority)
          Technicians (id_numb, name, grade)
          Experience (id_numb, type, qualification)
          Repair (job_numb, id_numb, type, owner, date, time, ready, cost)

       b School:
          Student (stnumb, stname, gender, grade, born)
          Subject (subjnumb, subjname, tname)
          Results (stnumb, subjnumb, percent)
          Teacher (tname, grade, room)

       c Classic movie hire club:
          Movie (movienumb, movname, length, year, dirnumb)
          Director (dirnumb, dirname, country)
          Member (memberid, memname, address, owes)
          Onhire (movienumb, memberid, duedate)

    Uniqueness constraints

    Our aim in ORM is to get the best, or optimal, arrangement of fields and tables in a relational database, and to identify important properties of the fields. At this stage we have completed the most difficult part of the process: determining the entities and relationships in a given situation and representing them on a CS diagram. We are now ready to determine which tables make up the database and which fields go into which tables. To do this we first have to identify the uniqueness constraints on relationships, and then use these to group fields into tables, each with a specified table key. You will remember from Unit 1 that it is important to be able to identify fields that can act as keys. Columns are named, but rows are not. To be able to refer to an individual row, the table must have a field that is unique, i.e. not repeated. The data in this field will then act as the key or identifier for a given tuple. So what is a uniqueness constraint? Unique means not repeated; a constraint is a limitation or restriction. In terms of ORM a uniqueness constraint is an indication of the number of times that an entity can play a role in a relationship with another entity. This sounds more complicated than it is, so to illustrate what uniqueness constraints are, and how they work, we will use a table of information about car drivers.

    Many:1

    The first form of uniqueness constraint we will look at is many:1. Take the following example showing who holds which driver's licence:

    [Figure: licence (lic_numb) belongs to driver (surname)]

    The double-headed arrow, called a bar, over the role on the left indicates that the licence can only play that role once relative to a driver. If we look at a table containing instances of these entities we can see that there are no repeats of the licence number, while there may be repeats in the surname column:

       lic_numb    surname
       59 762 139  Harris
       59 762 140  Addams
       59 762 141  Smith
       59 762 142  Harris
       ..., etc.

    We can see from this example that the surname Harris has two licences (whether Harris is the same person or not). A licence will only ever have one surname linked to it, but a surname may have more than one licence. The bar indicates the uniqueness of the licence between this pair of entities. For just this pair of entities, the only one of the two that has non-repeating instances is licence. To show that licence is unique when looking at just these two, we place the bar over the role that licence plays in the relationship. This sort of relationship, where the entity on the left is not repeated while the one on the right may be, is described as many:1 (many-to-one) and called a single strong relationship. (Why it is called many:1 will be explained shortly.) If the entities appear in a CS diagram with the unique entity on the right then it can be described as 1:many.

    Many:many

    The second form of uniqueness constraint is many:many (many-to-many). Again using our car drivers example, this time for traffic offences:

    [Figure: offence (desc) issued to licence (lic_numb)]

    This time the uniqueness constraint is over both roles. If we look at a table:

       offence    lic_numb
       speeding   59 762 139
       seat belt  59 762 140
       seat belt  59 762 139
       speeding   59 762 141
       parking    59 762 141
       ..., etc.

    we can see that there are repeats of instances in both columns. Looking closely, however, while some offences have been issued to several licences, and some licences have more than one offence, we can see that the combination offence-licence is unique. The bar over both roles indicates that there are no repeats of the same offence to the same licence. This is a many:many or weak constraint.

    1:1

    The final uniqueness constraint is 1:1 (one-to-one).

    [Figure: licence (lic_numb) contact at mobile (ph#)]

       lic_numb    mobile
       59 762 139  0143 890 144
       59 762 140  0138 556 887
       59 762 141  0148 245 574
       59 762 142  0133 564 123
       ..., etc.

    In this case there is one and only one mobile phone contact number recorded for each driver through their licence number. There are no repeats of instances in either column. A relationship in which each entity only has unique instances is described as 1:1 or double strong. A 1:1 relationship is indicated by a bar over each role in the relationship.

    Activity 8.6 Constraints

    1. a What does the word unique mean?
       b Why is it necessary to have a key field in a relational database?
       c In the following table which field is likely to act as the key field?
          Account (name, birthdate, gender, account#, balance)
       d Explain with examples why each of the other fields would be unsuitable.

    2. What are the three types of uniqueness relationship that can exist between two entities?

    3. For each of the following identify the form of uniqueness relationship and add a constraint bar over the appropriate role or roles.

       a [Diagram: school (number), person (surname); roles principal of / run by]
       b [Diagram: account (acc#), address (street); roles registered to / found at]
       c [Diagram: gender (M/F), person (surname); role is of]
       d [Diagram: feature (desc), car (reg#); role fitted with]
       e [Diagram: student (student#), subject (desc); roles studies / studied by]

    4. Using the collapsed CS diagram shown here, what uniqueness constraints would apply in each of the situations that follow?

       [Diagram: person (first name); roles is husband of / is wife of]

    a polygyny, where one husband can have several wives

       husband  wife
       Jim      Mary
       Jim      Jane
       Peter    Lisa

    b monogamy, where a person may have only one spouse

       husband  wife
       Jim      Jane
       Peter    Alice
       Mike     Lisa

    c polyandry, where a wife can have several husbands

       husband  wife
       Jim      Jane
       Peter    Alice
       Mike     Alice

    d polygamy, where a person may have several spouses

       husband  wife
       Jim      Mary
       Jim      Jane
       Peter    Mary
       Peter    Lisa
       Mike     Jane

    5. Draw the CS diagram and add uniqueness constraints for the following:

    a  name    age
       Mick    16
       Jim     16
       Claire  15
       Linda   17

    b  name    height  weight
       Mick    168     75
       Jim     182     84
       Claire  159     58
       Linda   162     56

    c  captain  team     played  won
       Harris   cricket  8       6
       Murray   netball  14      12
       Harris   rugby    15      6
       Jones    tennis   6       3

    Uniqueness relations

    There are three possible uniqueness relations between a pair of entities: single strong (many:1 or 1:many), weak (many:many) or double strong (1:1). By identifying these constraints we can group fields into tables, and choose a field or fields that can act as the key for each table. Before we do that, let us look at why the constraints are so named. The 1:1 is the easiest to see. Using the drivers' mobile phone example:

    [Figure: licence numbers 59 762 139 to 59 762 142 linked by lines to the mobile numbers 0143 890 144, 0138 556 887, 0148 245 574 and 0133 564 123]

    Each instance on the left points to a single instance on the right. They are linked one to one. Many:1 (or 1:many), on the other hand, describes a situation in which more than one instance in one entity may be linked to only one in the other. Take, for example, the case where we list all of the possible driver's licence instances under one entity and all possible genders under the other, and then show the links with lines:

    [Figure: licence numbers linked by lines to Male and Female]

    This shows that more than one instance on the left (many) can point to a single instance on the right (1). Finally, the many:many relationship describes a situation in which more than one instance in one entity may be linked to more than one instance in the other. Going back to our traffic offence example:

    [Figure: licence numbers linked by lines to the offences speeding, seat belt and parking]

    More than one instance on each side (many) can point to more than one instance on the other (many).

    Key fields

    As we saw in Unit 1, the columns in a relational database are named but rows are not. In order to identify a given record in a database, one or more fields are chosen as keys. For example, in a motor vehicle database the licence number will uniquely identify a particular driver, even if two drivers have the same name.

       lic_numb    surname  first_name  age  offence
       59 762 139  Smith    Alice       35   speeding
       59 762 140  Harris   Mark        54   seat belt
       59 762 139  Smith    Alice       17   parking
       :           :        :           :    :

    We can use uniqueness constraints to identify which entities can potentially act as table keys when the entities become fields in the database. In a many:1 situation the entity on the many side will not be repeated and so becomes a possible key entity. If it is linked to other entities that together might form a table, it becomes a possible primary key for that table. In the example above the licence number is many:1 to surname, to first name, to age and to offence. In this case licence number is a possible primary key if a table is made from these entities.
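    The way the three constraints are named can be checked against a sample population. The Python sketch below follows the text's convention (many:1 means the left-hand entity is never repeated) and uses the driver tables from this section; the function name uniqueness is hypothetical. A sample can disprove a uniqueness constraint by showing a repeat, but it cannot prove one: the constraint itself comes from the rules of the UoD.

```python
from collections import Counter

def uniqueness(pairs):
    """Classify a sample of (left, right) instance pairs as many:1, 1:many,
    many:many or 1:1. Many:1 means the left-hand entity is never repeated."""
    if not pairs:
        raise ValueError("need at least one pair")
    if len(set(pairs)) != len(pairs):
        raise ValueError("the same fact appears twice")
    left_unique = all(n == 1 for n in Counter(left for left, _ in pairs).values())
    right_unique = all(n == 1 for n in Counter(right for _, right in pairs).values())
    if left_unique and right_unique:
        return "1:1"
    if left_unique:
        return "many:1"
    if right_unique:
        return "1:many"
    return "many:many"

# Sample populations taken from the driver examples in this section.
licence_surname = [
    ("59 762 139", "Harris"), ("59 762 140", "Addams"),
    ("59 762 141", "Smith"), ("59 762 142", "Harris"),
]
offence_licence = [
    ("speeding", "59 762 139"), ("seat belt", "59 762 140"),
    ("seat belt", "59 762 139"), ("speeding", "59 762 141"),
    ("parking", "59 762 141"),
]
licence_mobile = [
    ("59 762 139", "0143 890 144"), ("59 762 140", "0138 556 887"),
    ("59 762 141", "0148 245 574"), ("59 762 142", "0133 564 123"),
]

for name, pairs in [("licence-surname", licence_surname),
                    ("offence-licence", offence_licence),
                    ("licence-mobile", licence_mobile)]:
    print(name, uniqueness(pairs))
```

    The three samples come out as many:1, many:many and 1:1. In the many:1 case the left-hand column (licence) is the one with no repeats, so it is the possible key; in the 1:1 case either column could serve.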


    On the other hand, in 1:1 situations where each entity has unique instances, either entity can act as a table key for a group of other entities. In this case they are referred to as candidate keys. If one entity is chosen it becomes the primary key of the table; the one not chosen is referred to as a secondary key. Finally, in a many:many situation the only unique relation is the combination of the two entities. In cases such as this the two entities can be linked to form a combination (composite) key for any entities linked from them. One example of this is a nested relation, where one entity is linked to a combination of two other entities.

       driver      date      offence
       59 762 139  12/08/10  speeding
       59 762 140  12/08/10  seat belt
       59 762 139  13/08/10  seat belt
       59 762 141  13/08/10  speeding
       59 762 141  20/08/10  parking
       ..., etc.

    [Figure: driver (lic#), date (dd/mm/yy) and offence (desc); roles on, committed]

    Here the offence belongs to the combination driver-date. A driver, identified by his or her licence number, may commit more than one offence, and more than one offence may be committed on the same day. The only thing that is unique in this instance is the driver-date combination. (What assumption are we making here?) A weak constraint is put across the combination. In turn, offence is linked to the driver-date combination by a single strong constraint. This pattern of constraints almost always appears on a nested relation and is worth remembering. In the next section we will see how we can now use these constraints to identify the table keys in a database and to optimise the table structure. We will look at other forms of constraint later.

    Activity 8.7 Key constraints

    1. a Table keys can be described as primary, candidate or composite. With which sort of relationship is each type of key associated?
       b Uniqueness constraints can be weak, single strong or double strong. With which sort of relationship or relationships is each type of constraint associated?

    2. a Create a table of data with two columns and four rows. Add data so that there is at least one repeat in at least one of the two columns but no two tuples are the same.
       b Using the examples in the section above, draw a diagram showing which instances in one field are linked to instances in the other.

    c Identify the type of relationship you have drawn.

    3. Develop the CS diagram and add uniqueness constraints for the following facts:
       a Bankcard 3912 5643 23 belongs to Smith.
       b Jane's phone number is 345621.
       c Martin in the German Grand Prix started in 3rd place on the grid.
       d Jim in Yr 11 is 185 cm tall (assume there is only one Jim in Yr 11).
       e Student 23451 received a credit for Maths C.

    4. Draw the CS diagram and add uniqueness constraints for the following ternary relations:

    a  team     month  wins
       Broncos  May    5
       Broncos  June   4
       Sharks   May    3
       Storm    June   4

    b  year  month  rainfall
       2009  Nov    56
       2009  Dec    68
       2010  Jan    23
       2010  Feb    49

    c  brand  HDD   RAM    cost
       Whiz   6.8G  2Gb    $2 450
       Whiz   12G   3Gb    $2 800
       Supa   6.8G  2.5Gb  $2 200
       Supa   10G   3Gb    $2 560

    d  bank  rate   loan
       ANZ   7.8%   home
       ANZ   6.8%   variable
       NAB   7.75%  home
       CBA   6.5%   variable

    5. Draw the conceptual schema for the Bunyip Town Library data capture from Q8 in Activity 8.2.

    6. Draw the conceptual schema for the following situation. State any assumptions you make, and add uniqueness constraints.

       client   born      gender  balance  bank
       Tully    12-08-91  M       $12 567  ANZ
       Peters   14-06-84  F       $245.45  ANZ
       Peters   14-06-84  F       $56 457  NAB
       Rodgers  01-05-86  M       $48.50   CBA

    7. Ezy Eddy's Bonza Used Cars is about to be computerised. Ezy Eddy would like to record the model of the car (e.g. Ford Fiesta), the year of manufacture (e.g. 2001), the selling price (e.g. $18 700) and the car's colour. Each car is distinguished by the unique number on its compliance plate.

       Draw the CS diagram for the scenario. State any assumptions you make and add the uniqueness constraints.

    Normalisation

    Creating the optimal (best) design for a database involves a process called normalisation. In a series of steps, data is rearranged to make handling it more efficient. Each successive normal form is better suited to efficient data manipulation without update anomalies. The simplest arrangement is called first normal form, or 1NF, while optimal normal form is 3NF. This is the best way we know to arrange data so that it is simple, efficient and effective to work with. (Beyond 3NF, data can be further refined, up to 6NF.) Normalising data to 3NF used to be a complex and time-consuming process. Fortunately, using ORM and the CS diagrams we have developed, it is now a simple task. There are three steps:

    a. draw a loop around groups of relationships linked to key entities (many:1 or 1:1)
    b. draw a loop around any relationships that are many:many and any nested off them
    c. establish named tables by listing all entities coming from each loop and underlining the key entity.

    This process will need a bit of explanation. To do this, and for simplicity, we will use the following skeleton of a CS diagram.

    The above diagram simply shows the entities, identified by letter names, and the roles with their uniqueness constraints. A diagram such as this is used only for demonstration purposes. The first thing we need to do is find any entities that can act as primary keys. These will always be in relationships that are many:1 or 1:1, and the uniqueness constraint bar will be over the role closest to the key entity. In the diagram we can see that B is many:1 to A, to D and to C (the bar is closest to B for each pair). The entity B can thus act as a primary key for the other three entities. In turn, C can act as key for F and G. But what about D and A? This is a 1:1 relationship, so we have what are described as candidate keys: either could be the primary key. We have to choose between them, and since in this situation D is also a key for C we will choose D as the primary key.


    To show the choices we have made so far we draw loops around the relations already chosen.

    For clarity the loops have been shaded, but in practice this is not needed. The loop on the left shows D acting as key for A and C. The next loop shows B acting as key for A, D and C. The last loop shows C as key for F and G. We identify the key because it is the entity in the loop that the uniqueness bars are closest to. In drawing these loops we must only draw around the relationships, not around the entities themselves. (At times this may result in awkward-looking loops.) We also have to ensure there is no crossover: each relationship is in only one loop. The loops we have drawn so far indicate the relationships that are linked many:1 or 1:1 to our key entities. The next step is to draw loops around many:many relations and any nested off them. The entities in many:many relations will form composite keys for the remaining tables. Doing this results in:


    We put a loop around the D-H relation. The loop around the B-E combination is extended to cover I, which is nested off them. D-H and B-E will form composite keys. We now have five loops on our diagram. The entities attached to these loops will form the fields in our database tables. The key in each table will be the primary or composite key already identified. These are indicated by underlining them:

    One: ( D, A, C ), key D
    Two: ( B, A, C, D ), key B
    Three: ( C, F, G ), key C
    Four: ( B, E, I ), composite key B + E
    Five: ( H, D ), composite key H + D

    In reality we would name the tables customer, accounts or something similar, but here we have just used numbers for names. These five tables are now in optimal normal form (ONF), the best known way of arranging fields for a relational database. It is interesting to note that some fields are key fields in one table but just ordinary fields in another (see D in One and Two, or C in Two and Three). A non-key field that is a key for another table is called a foreign key. To review the normalisation process:

    - develop the CS diagram and add uniqueness constraints
    - draw loops around many:1 or 1:1 relationships grouped to a key entity
    - next draw loops around any many:many relations and the entities nested off them
    - check that each relation is in only one loop, and that the loops do not go around the entities themselves
    - finally, use the loops to establish named tables and underline the keys.
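    The loop-drawing review above can be mimicked in a few lines of Python. This is only a sketch: encoding the skeleton diagram as (key, dependent) pairs is our own assumption, and the choice of keys (for example D over A for the 1:1 pair) is still made by hand before the code runs, just as in the text. The names FUNCTIONAL, MANY_MANY and form_tables are hypothetical.

```python
from collections import defaultdict

# Hypothetical encoding of the skeleton CS diagram discussed above.
# Each pair is (key entity, dependent entity) for one many:1 or 1:1 relation,
# with the key already chosen by hand (D was preferred over A).
FUNCTIONAL = [
    ("D", "A"), ("D", "C"),               # D acts as key for A and C
    ("B", "A"), ("B", "C"), ("B", "D"),   # B acts as key for A, C and D
    ("C", "F"), ("C", "G"),               # C acts as key for F and G
]
# Each many:many relation: (entities forming the composite key, nested entities).
MANY_MANY = [
    (("B", "E"), ["I"]),  # B-E, with I nested off it
    (("H", "D"), []),     # D-H
]


def form_tables(functional, many_many):
    """Group relations into loops; return (key fields, other fields) per table."""
    if len(set(functional)) != len(functional):
        raise ValueError("a relationship has been placed in more than one loop")
    loops = defaultdict(list)  # key entity -> its dependents, in first-seen order
    for key, dependent in functional:
        loops[key].append(dependent)
    tables = [((key,), tuple(deps)) for key, deps in loops.items()]
    tables += [(tuple(keys), tuple(nested)) for keys, nested in many_many]
    return tables


for number, (keys, fields) in enumerate(form_tables(FUNCTIONAL, MANY_MANY), start=1):
    print(number, "key:", " + ".join(keys), "| other fields:", ", ".join(fields) or "none")
```

    Running it prints five tables with the same fields and keys as tables One to Five, with the composite keys shown as B + E and H + D.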

    Activity 8.8 ONF

    1. a What does the word optimal mean?
    b What is an update anomaly? Give an example of a situation in which one occurs.
    c What is a key field in a database table? Why is one needed?
    d Draw the CS diagram for two entities in a many:1 relation. Of the two, which could act as a key?
    e Identify the four different types of database key described above and explain how each is used.

    2. Use the normalisation process to establish unnamed tables from the following skeleton CS diagrams.

    a


    b

    c

    3. Whizz Computers intends to maintain sales records on computer. They would like to record the name and phone number of customers, as well as the computer each customer bought and the date it was bought. They also want to record the supplier of the computer for warranty purposes. The following tables are poorly designed but give an indication of the types of information that will be in the database.

    client    cust#  phone    comp#  ord#
    Keats,J   1349   4356214  PX121  7654
    Harms,C   2314   4324134  PX123  7546
    Keats,J   1349   4356214  MX795  7321
    Long,S    1298   7346871  PX945  7527

    comp#  ord#  date     desc  supplier
    PX121  7654  12-4-10  PC    Acme
    PX453  7693  12-4-10  PC    Acme
    MX795  7321  13-4-10  iMac  GoComp
    PX945  7527  13-4-10  PC    Acme

    a Suggest how an update anomaly might occur with the above tables.
    b Prepare a CS diagram to represent the entities and relations in the above tables. Check for surplus entities.
    c Add uniqueness constraints and state any assumptions you have made.
    d Use normalisation to prepare a set of named tables with keys indicated.
    e Why will an update anomaly no longer occur?


    More constraints

    As stated earlier, a CS diagram is only an abstract model. In this section we are going to look at ways to include additional information on CS diagrams, so that they communicate more about the real-world situation being represented.

    Mandatory constraints

    In the Whizz Computers question in the last activity, the phone number was recorded for each customer. The number is obviously necessary so that the customer can be contacted if needed. In fact this is so important that the database should force the sales assistant to enter the number: it should not allow a record to be completed if the required field has no entry. In this way customer information cannot be recorded without a phone number. In a CS diagram we show that an entity is required using a mandatory (or total role) constraint. This is shown by a necessity dot next to the key entity in a relation:

    The dot indicates that if a customer number is recorded then a phone number must also be recorded. In this relationship the phone number cannot be null. The dot is placed next to customer to show that every instance of customer must be involved in a relationship with a phone number. To show how this works, look at the following section of a CS diagram:

    The necessity dot is only on one of the two relations coming from the key entity. This shows that we must record a date of birth for a driver, but the driver may not yet have committed a driving offence.

    The offence field is permitted to contain nulls but the born field is not.

    driver      born      offence
    59 762 139  12-05-92  speeding
    59 762 140  03-06-86
    59 762 139  25-07-90  seat belt
    59 762 141  15-08-84
    ..., etc.

    [Figure: customer (id#) contacted at phone (number)]

    [Figure: driver (lic#) born DoB (dd/mm/yy); driver (lic#) committed offence (desc)]


    If we wish, we can indicate not null in a table definition by placing a necessity dot over the required field.

    Driver (lic#, born, address, phone, offence)

    Since mandatory constraints are only used to determine required fields, we need only consider roles coming from key entities; roles from non-key entities can be ignored. In the above example we needed to decide whether to place necessity dots on the roles leading from driver, but did not have to worry about DoB or offence.
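    In a relational database a necessity dot usually ends up as a NOT NULL column. The sketch below uses Python's built-in sqlite3 module and assumes that, in the Driver example, only born carries the dot; the table is cut down to three columns and the column names are our own.

```python
import sqlite3

# Sketch of the Driver example: NOT NULL plays the part of the mandatory
# (total role) constraint on "born"; "offence" stays nullable because a
# driver may not yet have committed an offence.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE driver (
        lic     TEXT PRIMARY KEY,
        born    TEXT NOT NULL,   -- mandatory role: every driver has a DoB
        offence TEXT             -- optional role: nulls allowed
    )
""")

# Accepted: born is present, offence is left empty.
conn.execute("INSERT INTO driver VALUES (?, ?, ?)", ("59 762 140", "03-06-86", None))

try:
    # Rejected: the required born field has no entry.
    conn.execute("INSERT INTO driver VALUES (?, ?, ?)", ("59 762 141", None, None))
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```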

    Entity constraints

    Some entities are limited in the values they can take. On a CS diagram we can indicate this in brackets alongside the entity:

    Marital status can only be recorded as S, M, W or D, gender can only be M or F, and wage can only take values between 300 and 850. These are all examples of entity constraints: limitations placed on the values an entity can hold.
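    Entity constraints map naturally onto SQL CHECK clauses. The following is a minimal sketch with Python's sqlite3 module; the person table, its columns and the helper try_insert are hypothetical, and the ranges are taken from the example above (BETWEEN includes both 300 and 850).

```python
import sqlite3

# Hypothetical "person" table carrying the three entity constraints.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        id             INTEGER PRIMARY KEY,
        marital_status TEXT CHECK (marital_status IN ('S', 'M', 'W', 'D')),
        gender         TEXT CHECK (gender IN ('M', 'F')),
        wage           INTEGER CHECK (wage BETWEEN 300 AND 850)
    )
""")


def try_insert(status, gender, wage):
    """Insert a row; return True if it satisfied every entity constraint."""
    try:
        conn.execute(
            "INSERT INTO person (marital_status, gender, wage) VALUES (?, ?, ?)",
            (status, gender, wage),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("M", "F", 500))  # True
print(try_insert("X", "F", 500))  # False: X is not a permitted marital status
print(try_insert("S", "M", 900))  # False: wage is above 850
```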

    Frequency constraints

    A frequency constraint indicates the number of times a relationship may occur and is written over the role. In this example the student may take up to six subjects.
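    A frequency constraint cannot be written as a simple column rule, because it counts rows. One way to enforce "up to six subjects" in SQLite is a trigger that aborts the seventh insert. This is a sketch under our own assumptions: the takes table and the trigger name at_most_six_subjects are hypothetical.

```python
import sqlite3

# Sketch: a trigger enforces "a student may take up to six subjects".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE takes (
        student TEXT NOT NULL,
        subject TEXT NOT NULL,
        PRIMARY KEY (student, subject)
    );
    CREATE TRIGGER at_most_six_subjects
    BEFORE INSERT ON takes
    WHEN (SELECT COUNT(*) FROM takes WHERE student = NEW.student) >= 6
    BEGIN
        SELECT RAISE(ABORT, 'a student may take at most six subjects');
    END;
""")

for n in range(1, 7):  # six enrolments for one student are accepted
    conn.execute("INSERT INTO takes VALUES (?, ?)", ("S001", f"subject {n}"))

try:
    conn.execute("INSERT INTO takes VALUES (?, ?)", ("S001", "subject 7"))
except sqlite3.DatabaseError as err:  # the trigger aborts the seventh insert
    print("rejected:", err)
```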


    External uniqueness

    If two entities together form a unique combination:

    The U linking the roles with a dotted line shows that in this situation a student's first name and surname, taken together, do not repeat. This means that no two students have the same full name.
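    In SQL an external uniqueness constraint usually becomes a UNIQUE constraint over several columns. The sketch below uses Python's sqlite3 module; the student table and the sample names are made up for illustration.

```python
import sqlite3

# Sketch: the U constraint becomes a composite UNIQUE constraint, so the
# combination (first_name, surname) may not repeat.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        surname    TEXT NOT NULL,
        UNIQUE (first_name, surname)   -- no two students share a full name
    )
""")

# Shared first names or shared surnames are fine on their own.
rows = [("Jan", "Keats"), ("Carl", "Keats"), ("Jan", "Harms")]
conn.executemany("INSERT INTO student (first_name, surname) VALUES (?, ?)", rows)

try:
    conn.execute("INSERT INTO student (first_name, surname) VALUES (?, ?)", ("Jan", "Keats"))
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```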

    Subset

    If a role can be played by an entity only if another role is also played by that entity, link the two roles with a dotted arrow. In this situation only those employees who perform extra duties will have a bonus recorded. The arrow points from the dependent role (the one that would not exist without the other) to the main role.
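    If we assume each employee has at most one extra duty, both roles fit in a single employee table and the subset constraint can be checked row by row. This sqlite3 sketch rests on that assumption; the table and column names are hypothetical.

```python
import sqlite3

# Sketch, assuming one extra duty per employee: a bonus may only be
# recorded when an extra duty is also recorded (the subset constraint).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_no     INTEGER PRIMARY KEY,
        extra_duty TEXT,
        bonus      REAL,
        CHECK (bonus IS NULL OR extra_duty IS NOT NULL)
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'first aid', 50.0)")     # duty and bonus
conn.execute("INSERT INTO employee VALUES (2, 'fire warden', NULL)")   # duty, no bonus yet
conn.execute("INSERT INTO employee VALUES (3, NULL, NULL)")            # neither role

try:
    conn.execute("INSERT INTO employee VALUES (4, NULL, 75.0)")        # bonus without a duty
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

    The check only runs in one direction: an extra duty without a bonus is allowed, which is exactly what distinguishes a subset from an equality constraint.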

    Equality

    An equality constraint applies when a role is played by an entity if, and only if, the entity also plays another role. Here a teacher who teaches a subject must be the one to write the reports for that subject. The equality constraint is shown by a dotted double-headed arrow.
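    One way to think about an equality constraint is in terms of role populations: the set of facts for one role must match the set for the other. The sketch below assumes the constraint spans the (teacher, subject) pair, so every teaching pair must also be a reporting pair and vice versa. The teacher codes, subjects and the function equality_violations are made up for illustration.

```python
# Sketch: each role's population is a set of (teacher, subject) pairs.
# Equality holds when the two sets are identical; the symmetric difference
# lists the pairs that break it.
teaches = {("T01", "Maths"), ("T02", "English"), ("T03", "IPT")}
reports_on = {("T01", "Maths"), ("T02", "English")}


def equality_violations(role_a, role_b):
    """Return the pairs that appear in one role's population but not the other."""
    return role_a ^ role_b  # symmetric difference of the two sets


print(equality_violations(teaches, reports_on))  # {('T03', 'IPT')}
reports_on.add(("T03", "IPT"))                   # T03 now writes the IPT reports
print(equality_violations(teaches, reports_on))  # set()
```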

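    [Figure: student (student#) with roles to first name and surname (known as, called), the two roles linked by U]

    [Figure: employee (emp#) carrying out extra duty (desc) and earned bonus ($), linked by a subset arrow]

    [Figure: teacher (code) teaches subject (desc) and reports on subject (desc), linked by an equality arrow]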


    Exclusion

    Playing one role prevents an entity from playing another role. Here the amount a customer has either paid or still owes is recorded. Only one of the two roles (paid or owes) is recorded, not both. The exclusion constraint is indicated by an X linking the roles with a dotted line.
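    When both roles land in the same table, an exclusion constraint can be written as a CHECK that forbids both fields being filled at once. This sqlite3 sketch assumes a single customer table with hypothetical paid and owes columns.

```python
import sqlite3

# Sketch: the X (exclusion) between "paid" and "owes". A customer row may
# hold a paid amount or an owed amount, or neither, but never both.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        cust_no INTEGER PRIMARY KEY,
        paid    REAL,
        owes    REAL,
        CHECK (paid IS NULL OR owes IS NULL)
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 120.0, NULL)")  # has paid
conn.execute("INSERT INTO customer VALUES (2, NULL, 45.5)")   # still owes

try:
    conn.execute("INSERT INTO customer VALUES (3, 60.0, 20.0)")  # both roles at once
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```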

    Subtype

    The final constraint we will look at is subtype. This is used where data is recorded for only part of an entity. Here the amount paid is divided into either cash payments or credit card payments. If a payment is made in cash, the amount of change given is recorded; if it is made by credit card, the card number is recorded. The entity amount is split into the subtypes cash and credit, with different information recorded about each. The subtypes, which together make up the entity amount, are linked to it by solid-line arrows.
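    The subtype idea has a close parallel in class inheritance: every payment has an amount, but the extra facts recorded depend on the subtype. This Python sketch uses dataclasses as an analogy only; the class names, fields and the sample card number are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Payment:
    cust_no: int
    amount: float        # recorded for every payment


@dataclass
class CashPayment(Payment):
    change: float        # recorded only for cash payments


@dataclass
class CreditPayment(Payment):
    card_no: str         # recorded only for credit card payments


payments = [
    CashPayment(101, 45.0, change=5.0),
    CreditPayment(102, 120.0, card_no="4000 1234"),
]
for p in payments:
    extra = f"change ${p.change:.2f}" if isinstance(p, CashPayment) else f"card {p.card_no}"
    print(p.cust_no, f"${p.amount:.2f}", extra)
```

    The loop prints `101 $45.00 change $5.00` and then `102 $120.00 card 4000 1234`. Neither subtype can hold the other's field, which mirrors the way the CS diagram records change only for cash and a card number only for credit.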

    [Figure: customer (cust#) paid / owes amount ($), the two roles linked by X]

    [Figure: customer (cust#) paid amount ($); amount divided into the subtypes cash, with change ($), and credit, made with card (card#)]

    Activity 8.9 By design

    Develop a CS diagram for each of the following situations. Add all constraints to your diagram and develop named tables in ONF with keys clearly underlined.

    1. Each football club in the Masters League Competition has a home ground. On a given date a club will play at either its own or its opponent's home ground (there are no other venues). Each venue has a certain seating capacity. Players play for a given club, although their starting position may vary from game to game. (Assume a player only plays for one team for the season.)

    2. CerealCo runs a grain silo. Each farmer who delivers to the silo has a unique producer number, and is located at a set address with a contact phone number. Grain is supplied daily to the silo during harvest (no details of the silo need be recorded). The quantity of grain supplied and the type (wheat, barley etc.) and moisture count of the grain supplied by each producer for each day are recorded. (Hint: Either assume a farmer makes only one delivery a day, or, each farmer only produces one type of grain.)

    3. Cheapa Rentals hires out a range of budget cars. Each car is identified by a unique vehicle number. The make of the vehicle and the date it was bought by the hire company are also recorded. When a vehicle is taken out, the odometer readings (in kilometres) when the vehicle leaves and when it returns are recorded, as well as the date. The hirer must supply his/her name and driver's licence number. Each make of vehicle has its own hire and insurance rates. Assume vehicles are hired for full days only and for only one day at a time.

    4. Bunyip SHS wishes to keep instructional information for one semester in a computer-based information system. For each student, the name, age, gender and year level must be recorded as soon as they are enrolled. Sometimes different students have the same name. Each student may choose up to six subjects to study. Each subject has only one teacher, but teachers may teach more than one subject. Each teacher is allocated a personal teaching room for their exclusive use and they use it all of the time. At the end of the semester, the student is allocated a Level of Achievement for each subject.

    5. Helen Hiram runs a moderately successful gym based mainly around aerobics, but with body building and swimming classes included. Recently she has set up Hiram's Health Hire Club, a company to rent out equipment to people wanting to get fit.

    The club supplies equipment in the categories of Body Building (benches and weights), Exercise (step-up, rowing or cycling machines) and Massage (e.g. ray lamps, immersion units). Helen has up to five of each item, each individually numbered and with its purchase date recorded for warranty purposes. All equipment is on a one-week loan, with prices varying from item to item (e.g. rowing machines $15 per week).

    Helen wants members of her Health Hire Club to be able to come in and, on showing their membership card (with barcode), take out any piece of equipment. Members may hire between one and three pieces of equipment and may hire items while they still have other items on hire. The system will need to record the due date for return of each item.

    Since all members are local, only their name and street address need be recorded. As Helen likes to remember members' birthdays with a special half-price offer, she records this if she has the details. Some members are offered credit, so Helen would also like to keep a record of who owes her money and how much. For simplicity you may assume that all money transactions etc. are handled by the sales assistant, and so you may exclude them from your information system (apart from the amount owing for creditors).

    6. Your school would like to keep a detailed database of its academic staff. Apart from name, address and other usual details, the database is to record details of qualifications, including institution, date and qualification type. Staff can have more than one qualification. The database is also to record each subject taught, with its year level, semester and calendar year.

    The database should be able to list all staff alphabetically, with the subjects taught each semester from the time they commenced at the school. Another useful list would show staff alphabetically, with their qualifications listed in order of date of attainment.

    7. Friendly Application Software Technology (FAST) is a computer programming agency. FAST arranges for programmers working from home to contract their services to various companies to develop computer applications. (A contractor is someone who does part of a job for a company for a set price; an application is a computer program.) Each programmer may be developing applications for several companies and so their FAST identification number must be unique. Information is recorded on the company or companies a programmer works for, the rate of pay, the language s/he programs in and their contact phone number.

    For an application both the language it is written in, and the type of program it is (game, word processor etc.) are recorded.

    The conceptual schema design process

    We have now learned enough about ORM and conceptual schemas to be able to develop our own table structure in optimal normal form from scratch. The activity that follows this section will give you exercises where you can have a go at doing it yourself. However, before we get to that we will have a full run-through of the whole process, from a realistic scenario through to the stage of developing a set of relational tables. To begin we will review the steps to be followed.

    1. Investigate the UoD and develop a set of elementary sentences to describe it.
    2. From the elementary sentences draw the conceptual schema diagram.
    3. Simplify the diagram by eliminating surplus entities and by indicating derived facts.
    4. Add uniqueness, mandatory and other constraints to the diagram.
    5. Establish table loops around relation groups that are many:1 or 1:1, and around any many:many relations.
    6. From the loops form named tables and indicate keys and other relevant information.

    As we go through this process you might like to try each step yourself before looking at the solutions offered.

    Yacht club scenario

    Every year the Bunyip Lake Yacht Club holds regular regattas. At the beginning of the racing season, most members of the club put in a season entry, which means they are automatically entered into all races for that season. It is also possible to enter for a particular race on the day of the race. Such an entry is known as a beach entry. On the day of the race, the secretary makes up a list of entries for the race from the season entries and the beach entries. Races are normally run over a triangular course, finishing close to, but not at, the starting point. Races are usually divided into divisions containing a single type (class) of boat, or a small number of different classes. The race committee supervises the race from the committee boat. At the start of the race, the committee boat is placed at one end of the starting line, and the boats that actually start are recorded on the list of entries. The actual start time for each division is also noted. For the finish of the race, the committee boat moves to the finishing line and the time that each boat crossed the line is recorded. On the return to the clubhouse, the secretary calculates corrected times by adjusting the elapsed time for the boat to finish the course by a helmsman's handicap and a yardstick where relevant. The handicap is a measure of the helmsman's skill or previous success. The yardstick is a measure of the size and speed of the boat, but is only used for divisions containing more than one class of boat. A list of placings within each division, based on corrected time, is then written out, and a copy pinned up on the club noticeboard. Season entries are allocated points towards the club championship based on their results in each race. The points are updated by the secretary.

    Step 1: elementary sentences

    The first step is to investigate the UoD presented in the scenario and develop a set of elementary sentences to describe it. To do this, carefully read through the scenario several times and identify possible entities. This can be done by underlining or circling them (in pencil if doing so in this book!). Since this is not a real-world task, and as you have no UoD expert to ask questions of, you may have to guess at what something means, or the implications it may have for the task. As you do this you will probably have to make assumptions about the scenario. Keep a record of these assumptions as part of your documentation of the task. A list of potential entities is given in the solution on the next page. Have a go at identifying them yourself before looking at the solution. The next step is to develop the elementary sentences that go with these entities. To help with this, draw lines (again in pencil) showing any relationship links between the entities listed. From these links write out the relationships in the form:

    member (id#) H3958 contacted on / is contact for phone (number) 3689 1247
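    An elementary sentence has a fixed shape: two entities, each with a reference mode and a value, joined by a role that can be read in both directions. The Python sketch below captures that shape; the class ElementarySentence and its field names are our own invention, not part of ORM notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementarySentence:
    """One binary fact: two entities, their reference modes and values, and the role."""
    entity1: str
    ref_mode1: str
    value1: str
    role: str            # read from entity1 to entity2
    inverse_role: str    # read from entity2 back to entity1
    entity2: str
    ref_mode2: str
    value2: str

    def forward(self) -> str:
        return (f"{self.entity1} ({self.ref_mode1}) {self.value1} {self.role} "
                f"{self.entity2} ({self.ref_mode2}) {self.value2}")

    def backward(self) -> str:
        return (f"{self.entity2} ({self.ref_mode2}) {self.value2} {self.inverse_role} "
                f"{self.entity1} ({self.ref_mode1}) {self.value1}")


fact = ElementarySentence("member", "id#", "H3958", "contacted on", "is contact for",
                          "phone", "number", "3689 1247")
print(fact.forward())   # member (id#) H3958 contacted on phone (number) 3689 1247
print(fact.backward())  # phone (number) 3689 1247 is contact for member (id#) H3958
```

    Writing both readings out is a quick check that a role name makes sense in each direction before it goes onto the CS diagram.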

    There are at least three ternary (nested) relations among the entities. In developing the elementary sentences the following assumptions may be made:

    - entries for events belong to the members, not the boats
    - all members who are not recorded as a season entry are classified as a beach entry
    - there may be more than one race on a day
    - a helmsman may race on more than one boat for different races on a given day, but helmsmen do not change during a race
    - handicaps can change during a race day and over a season.

    The first of these, about linking an entry primarily to the member and not to the boat, is crucial. If not done this way the elementary sentences, CS diagram, and eventually the resultant database will be different. Why do you think this assumption has been made this way? See if you agree with the other assumptions, or did you make any of your own?

    Step 2: CS diagram

    From the elementary sentences the first draft of the CS diagram can be produced. Have a go at this yourself before looking at the solution on page 290.

    Possible entities

    Member [id#, name], Home phone, Mobile phone, Email address, Entry type, Handicap, Entry, Boat [reg#, name]
    Class, Yardstick, Start status, Race [race#, title], Race date/time, Course, Division, Division start time
    Finish time, Elapsed time, Corrected time, Placing, Race points, Season points, Boat points

    Items in square brackets include alternative reference modes for the entity; entities in italics are optional.

    Step 3: eliminate surplus entities and indicate derived facts

    To identify surplus entities, see if the same instance turns up in different entities. To find these you may have to imagine instances in the different entities in the diagram or elementary sentences. We can simplify the diagram by combining any surplus entities to produce a database that is simpler to handle and easier to maintain. There is only one quantity that can be worked out at run time, and this is elapsed time. This derived field is indicated on the CS diagram with an asterisk next to it, and the calculation is shown at the bottom of the diagram as a footnote. Again, have a go at these yourself before looking at the solution.


    Step 4: add constraints

    Add uniqueness constraints, again recording any assumptions you make. Next add mandatory, entity, frequency and any other constraints indicated by the scenario. The assumptions you make might include the following:

    - members must have at least one contact number, but only one home (and one mobile) phone number is recorded for each member
    - members do not have to own or skipper boats
    - boats can be owned by more than one member
    - one member can own more than one boat
    - two boats may not tie for one placing in a race
    - members must have some season points recorded (even if just 0)
    - a class of yacht can only be in one division
    - the same yardstick can be applied to different classes
    - a given placement in a race (1st, 2nd, etc.) is always awarded the same points
    - all yachts are owned by members
    - a member who takes part in a race must be either a helmsman (skipper) or crew.

    Step 5: establish table loops

    We are now at the stage of determining which entities fall into which tables. To do this, draw table loops around relation groups that are many:1 or 1:1, and around any many:many relations.

    Step 6: form named tables and indicate keys

    Finally, from the loops write out the tables, naming each and underlining the keys.

    ONF Tables

    Member (member_id, name, phone, mobile, entry_type, current_points)
    Boat (reg_nr, name, class)
    Race (race_id, title, day_time)
    Start time (race_id, div_nr, start_time)
    Event (reg_nr, race_id, skipper, finish_time, elapsed_time*, start_status, placing)
    Class (class, div_nr, yardstick)
    Owner (reg_nr, member_id)
    Event crew (reg_nr, race_id, member_id)
    Points (placing, points_awarded)
    Handicap (member_id, race_id, handicap)

    *elapsed time = (finish time - division start time + handicap) * yardstick
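    Because elapsed_time is derived, it is calculated at run time rather than stored. The sketch below applies the footnote formula, assuming times are held as "hh:mm:ss" clock strings and the handicap is in seconds; the function names are hypothetical. Since the yardstick is only used for divisions with more than one class, it defaults to 1.0 here.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' clock time to seconds after midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def derived_time(finish: str, division_start: str, handicap_s: int,
                 yardstick: float = 1.0) -> float:
    """Footnote formula: (finish time - division start time + handicap) * yardstick."""
    return (to_seconds(finish) - to_seconds(division_start) + handicap_s) * yardstick


# Single-class division, so the yardstick stays at 1.0.
print(derived_time("15:02:30", "14:00:00", 60))  # 3810.0 seconds
```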

    [Figure: CS diagram, first draft]

    [Figure: CS diagram with constraints]

    [Figure: CS diagram, final draft]

    Activity 8.10 Developing an information system 1. The relati