33
Obj-RelDBv05f.ppt - RJL 050907 - <1> Database Design &Implementation First priority of database management: Never lose the content of an operational database. This ability to automatically preserve data off-line is called persistence . Hybrid Object-Relational DataBases (ORDB’s) are one way to solve the persistence problem when implementing object-oriented applications with persistent data content. Understanding ORDBs requires more details about database architecture (next slide).

Database Design &Implementation

  • Upload
    garnet

  • View
    44

  • Download
    0

Embed Size (px)

DESCRIPTION

Database Design &Implementation. First priority of database management: Never lose the content of an operational database. This ability to automatically preserve data off-line is called persistence . - PowerPoint PPT Presentation

Citation preview

Page 1: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <1>

Database Design &Implementation

First priority of database management: Never lose the content of an operational database.

This ability to automatically preserve data off-line is called persistence.

Hybrid Object-Relational DataBases (ORDB’s) are one way to solve the persistence problem when implementing object-oriented applications with persistent data content.

Understanding ORDBs requires more details about database architecture (next slide).

Page 2: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <2>

Hybrid Object-Relational DataBases (ORDB’s)

ORDB’s can preserve network-structured data content by what is called ‘pointer swizzling’: Every address pointer in a memory-resident structured data object is mapped to/from an object key or unique identifier in external RDB tables. This must identify the table/class type and the row/object number within the table.

The COOL framework supports ORDB management with separate components for data and method access:

– GEN generates C/C++ code for persistent I/O and navigation through data structures in the ORDB. This code depends only on the database model or schema, not on the application code.

– LCP captures event-mediated flow of control information in a state model database. Application programmers write the State Actions

Page 3: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <3>

LCP (Life Cycle Prototype) Interpreter

LCP controls application behavior by capturing event-mediated flow-of-control information in a state model database. It dispatches events to object instances by ‘delegating’ execution to a prototype ‘Active Class’. This is one way of implementing inheritance.

LCP supports method access inside prototype object instances by (compile-time) mapping of named state actions to function pointers for each Active Class. Developers write the application-specific code for the State Actions.

Page 4: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <4>

LCP Interpreter Performance

LCP performance depends on two kinds of overhead:

Control-flow interpretation from the State Model database: this overhead increases as state actions become more fine-grained.

Event-dispatching overhead: this is inherent in distributed and multi-processor (non-shared-memory) applications, but not in shared-memory multi-threaded ones.

Page 5: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <5>

Relational Databases

An RBD ‘instance’ (a snapshot of its value at time t) is a set of ‘tuples’; each tuple contains a sequence of scalar attribute values for its coordinates or fields.

One tuple is stored externally as a record in a ‘flat’ file, and internally as a sequence of data members for a single object.

Any homogeneous set of tuples can be viewed conceptually as a table with rows and columns, with one column per field or data member and one row per tuple or object instance.

‘Tuples generalize Vectors in the sense that coordinates of the tuple space can be different types, while vectors have homogeneous type for their coordinates.

A table of tuples has an abstract geometrical interpretation as a set of points that ranges over a multi-typed coordinate space.

Page 6: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <6>

Relational Databases

An RBD is a set of relations, each of which is a set of ‘tuples’.

Each tuple represents an object with scalar attributes. Each relation may be stored externally as a set of similarly

structured records in a ‘flat’ file. A single relation or file can be viewed conceptually as a

table of rows. The tuples of one relation define a geometrical set of points

in a multi-typed coordinate space. Tuples generalize Vectors in the sense that coordinates of the tuple space can be different types, while vectors have one homogeneous type for all of their coordinates.

Page 7: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <7>

Database Normalzation

Complex structured data types (and object instances) are decomposed or ‘normalized’ into simple parts of ‘Second Normal Form’ (2NF): (no composite=structured attributes or repeating groups of values are allowed).

For maintenance and reliability reasons, the design is further normalized (3NF): (There are no redundant or indirectly computable field values and all properties are stored in only one place.)

[Refs: Sanders Ch. 3 & Appndx A; Kroenke Chapter 2 ]

Page 8: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <8>

Composite pkeys in RDB’s

Every tuple must have a unique field (or set of fields) called its primary key (pkey) which uniquely identifies it.

A composite pkey for a child or component tuple is often built by concatenating multiple key fields from a chain of ancestors. This complicates pkey-to-fkey matching.

Example: Dept--->Course--->Section (ERD on next slide) – CS has Dept# = 91 and OOAD has course# = 91.523. – Many courses have the same section # 201, so 201 is only a unique

identifier within sections of a particular course, just as 523 is only a unique course# within a particular Department.

– In my syllabus the course id 05f523 assumes only one Dept (91) and one Section (291), but it combines a semester id (05f) for the term=Fall 2005 with the course number 523 so I can partition directories for this course over multiple terms.

Page 9: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <9>

Composite Pkeys (2 Examples)

A unique pkey for my section of SWEngI is a composite of Dept, Course and Section number:

CS Dept.pkey: 91

SWEngCourse pkey: 91+523

Section (one of many)pkey: 91+523+291

(These are ’instance diagrams’, not ERD’s. They show field values for a single table row;an ERD only shows entity and field types.

Termpkey: 05f

RJL Coursepkey: 523

Examples: SWEngI: (Schedule id vs. my directory id

My directory id for SWEngI 91.523 in fall 2005 associates 05f with 523:

Directory (one)Pkey: 01f523

Page 10: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <10>

Surrogate Keys in RDB’s

The composite key for my section of OOAD in the Student Information System (SIS) Database is a composite of Dept, Course and Section number: 91.523.291.

For IBM’s RDB, EFCodd advocated a hidden ‘surrogate’ pkey to replace the user-defined composite keys. This improves code quality and performance (by expediting the fundamental RDB operation ‘join’: match pkeys to fkeys).

[Ref: EFCodd: Extending the RDB Model…, ACM TODS Dec’79]

Example: Entity with old and new key name and value: Entity: alternate (old pkey): surrogate (name = value): DEpt deptNo = 91 DEid = DE0000xx COurse courseNo = 91+523 COid = CO000yyy SEction sectNo= 91+523+291 SEid = SE000zzz

Page 11: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <11>

RDB with Surrogate pfkeys: A GEN Example: CS Dept View of SIS Database ERD: Entity: alternate (old pkey): surrogate (name = value): Dept deptNo = 91 DEid = DE000001 Course courseNo = 91+523 COid = CO000220 Section sectNo= 91+523+291 SEid = SE002601*

Department DEDE00000191 Course CO

CO000220DE000001523

Section SE*SE002601CO000220291

A Persistence Requirement (WHY?): Each table has a mnemonic abbreviation (DE,CO,SE) encoded into the pkey value of its objects.

(Note that the fkey only references the immediate ancestor or container of an object or tuple.)

(*Sections of ALL Courses are in table SE)

Page 12: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <12>

Surrogate Keys in COOL/GEN GEN uses surrogate pkeys and matching

fkeys, but does not hide them. (OK for CAD/CASE tools with hi-tech users.)*

Pkeys can never be re-used for new objects, as long as fkeys exist that can reference their former object (in old but still-in-use database versions).

____________* CJDate: The RDB Model (A-W 2001) faults EFCodd for

recommending that users should not see surrogate pfkey values.

Page 13: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <13>

Persistent Object Identifiers C++ and Java runtime objects have an object-id (oid),

typically represented by its virtual memory address on the heap. This oid corresponds at least conceptually to the pkey of an RDB tuple. This type of oid is not visible and not persistent, because it disappears when the program terminates.

Large RDBMS’s can avoid loss of information and achieve persistence by taking over or duplicating OS virtual memory management functions: moving large segments of virtual memory to/from mass storage in a fail-safe manner, saving relative addresses on hard disk.

Another way to achieve persistence is to convert pkey/fkey relationships to/from object references during database import/export. This is done by COOL/GEN.

Page 14: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <14>

Persistent Databases Persistence means that pkeys and fkeys are preserved

during export to mass storage or remote sites and re-import by the same or another DataBase Management System (DBMS)

A relational database (RBD) supports inter-object relationships by foreign key (fkey) fields. These are both user-visible and persistent: they get saved in mass storage if the program terminates.

The process of mapping RDB pkey-fkey associations to and from C++ pointers is called ‘pointer swizzling’.

Database in Main Memory

Databasein MassStorage

import

export

Page 15: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <15>

Referential Integrity

The principle of ’Referential Integrity’: – To maintain valid database content, all fkey values

must match the unique primary key or object identifier of another tuple, or else have the reserved ‘null’ (unknown or undefined) value.

N-ary relations (N-way associations) can be implemented by a new associative entity, whose tuples contain exactly N fkeys (plus optional non-key attributes).

Most relations are binary (N = 2). Note that fkeys may refer to the same or different types.

See Example 2, Composite Pkeys (slide 9) and next slide 16

Page 16: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <16>

N-ary Relation (ERD Styles) N-ary relations are many-to-many associations among N

object instances (of the same or different types). N-way associations can be implemented by introducing a new

associative entity, whose tuples contain exactly N fkeys (plus optional non-key attributes). (Most relations are binary: N = 2).

The diamond indicates a ternary relation among types AA, BB and CC. [It is superfluous if N=2, if the relation is one to many, or if an associative entity replaces it.]

Optional attributes

AA

BB

CC

AA BB CC

AABBCCNew Entity AABBCC gives these attributes a decent home, and replaces the diamond.

(3 fkeys inside)

Examplefor N=3:

Page 17: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <17>

Extended ER Diagrams When an RDB implements an Extended ERD (EERD), a

tuple’s fkeys or inter-object cross-references can identify either a super-class object or an associated parent or container object (instance of a class).

– These two fkey types have different semantic interpretations.– The principle of ‘Referential Integrity’ constrains fkey values to

the pkey value of some existing parent (AA or BB) instance. To improve readability, EERD’s use different link layouts

for inheritance and for instance-level associations:

AA BB

CC0..*

In this example, CC both inherits from AA and is a component of thecomposite entity BB.It contains two fkeys,(say) AAid and BBid.

1..1

Page 18: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <18>

Multiple Inheritance on EERD’s Multiple inheritance requires an fkey to each superclass

object whose properties (attributes or methods) are inherited.

In a prototype or delegation implementation of multiple inheritance, superclass object[s] actually exist apart from their corresponding subclass object[s]. Each sub-object has one fkey to each of its direct ancestor object types.

For a C or C++, an object with multiple inheritance paths to the same superclass type must be constrained to inherit data from the same instance as well. (Path divergence is risky!)

In an ORDB, fkeys also support dynamic method delegation. The COOL/LCP interpreter implements such a dynamic map (from a concrete object to its generic or prototype Active Instance, from object class to generic Active Class).

Page 19: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <19>

ORDB via Prototype Delegation An Extended ERD (EERD) can be implemented as either a

relational RDB, object-oriented OODB, or object-relational ORDB. An OODB is supported by its own class-based data representations.*

An ORDB can be class-based or prototype-based with delegation. (GEN is prototype-based.)**

Prototype delegation does not rely on Class membership for method inheritance - it creates object-level relationships to support method delegation: ANY client object can ‘delegate’ any of its behavior to another server object via the oid equivalent of an fkey.

* CJDate insists that OODBs are fully supported by the relational model without any extensions. (I agree :-) [Ibid]

** Breugg:OOSE (10.4.4) calls this a vertical mapping.

Page 20: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <20>

GEN Database: Persistence

Our GEN tool imports an external RDB to a memory-resident object-relational database (ORDB):

Its external persistent RDB format is a union of records representing tuples of different types.

During import, fkeys are augmented or replaced by parent and first-child and next-sibling object reference pointers, which follow strict GEN naming conventions.

During export, pointers are discarded; fkeys are preserved or restored for persistent storage in external RDB tuples.

Page 21: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <21>

GEN Database: Schema Constraints The external RDB schema (or EER Diagram) is first

converted to Third Normal Form. Other attributes that would normally comprise a user-

defined (and typically composite) primary key can be removed during schema or EERD conversion to Third Normal Form.

This eliminates redundant attributes that functionally depend on some fkey instead of the pkey attribute.

The corresponding EERD network tends to have links which are all 1 to many relations with mandatory parents and optional children (1..1 to 0..* on UML class diagrams).

Inheritance links are 1..1 to 0..1 on class diagrams, because at most one subclass instance can inherit from any given ancestor class instance.

Page 22: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <22>

GEN Database: External Format

Our GEN tool imports an external RDB to a memory-resident object-relational database (ORDB):

Its external RDB format is a union of records representing tuples of different types:

Every tuple record has an integral and immutable ‘surrogate’ primary key attribute (and object id).

Different tuple types have pairwise disjoint pkey ranges. All foreign keys (fkeys) use this surrogate pkey value to

refer to their parent (container or superclass) record type. Chgen and gencpp are external-database-compatible.

Page 23: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <23>

GEN Database: Internal Format During import, fkeys are augmented or replaced by direct

parent object pointers plus first-child and next-sibling object reference pointers. These are constructed from fkey names following strict GEN naming conventions.

This results in an internal ORDB format which is a set of multiply-threaded linked lists of parent-to-children and super-to-subclass object (tuple instance) reference pointers.

Parent-pointers support direct access to parent table attributes, replacing pair-wise join queries in an RDB.

For each 1-to-many parent-child relationship, chgen provides a child_loop macro while gencpp provides a for-each iterator.

Gencpp has two optional modes: chgen or C++/STL.

Page 24: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <24>

GEN Database: Import/ExportGEN creates two schema-based import/export utilities: pr_load parses tuples and imports an external RDB into a

memory-resident object-relational database (ORDB); pr_dump exports the modified ORDB back to the persistent

external RDB. During import, fkeys are augmented or replaced by direct

parent pointers plus first-child and next-sibling object reference pointers. These are constructed from fkey names following strict GEN naming conventions, Super- and sub-class objects are also connected in the same way.

This results in an internal ORDB format which is a set of multiply-threaded linked lists from each parent through each of its child-sets, that supports parent-child JOINs.

[Ref: Jiri Soukup: Taming C++, A-W 1994; see also $PH/JiriSoukup]

Page 25: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <25>

Importing RDB’s to C++/Java If the RDB is imported to an object-relational database

implemented in C++ or Java, then during import the fkey fields of RDB tuple types should be converted to corresponding C++/Java object reference types.

Caveat/pre-condition: All fkeys implied by links on the RDB’s data model or EERD must conform to inheritance and type constraints of the language (C++ or Java).

Fkeys in an RDB can also support non-exhaustive or over-lapping subclasses (going beyond C++ constraints).

Fkeys and object references can also support dynamic migration (of an object among the subclasses of its class).– Example: An object may make transitions among OLC

states (states become subclasses of the object’s class).

Page 26: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <26>

Object-Relational Databases - Prototypes and Delegation

The last few slides were inspired by Shlaer-Mellor-User Group email related to Divergent Inheritance (parallel hierarchies). This motivates the use of prototypes and delegation to explain the static information architecture that is supported by COOL’s chGEN/GENcpp code generator, and illustrates concurrent sub-state machine models for dynamic behavior.

– To: [email protected]– Subject: Re: (SMU) Polymorphic events and other

paranormal activity– Message 10/734 From [email protected] – Sep 04, 01 08:45:33 AM– responding to Fontana: . . .

Page 27: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <27>

Divergent Hierarchies responding to Fontana: > I think Jay was driving at divergent hierarchies, not multiple

inheritance, eg: > relationship S1 - supertype Dog, subtypes BigDog and SmallDog > relationship S2 - supertype Dog, subtypes BlackDog and WhiteDog

Relationship S1:(BLACK xor WHITE)(Mutex and exhaustive):

DOG CLASS

Relationship S2: (BIG xor SMALL)

(Mutex and exhaustive):

Black Dog White Dog Big Dog Small Dog

Divergent Hierarchies Example: > relationship S1 - supertype Dog, subtypes BigDog xor SmallDog> relationship S2 - supertype Dog, subtypes BlackDog xor WhiteDog

Page 28: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <28>

OLC’s with Concurrent Sub-states

> Assume each of the 4 subclasses has its own ‘object lifecycle’ (OLC): > BigDog: Woofing <--> Sleeping > SmallDog: Yipping <--> Skittering > BlackDog: Panting <--> Drooling > WhiteDog: Shedding <--> Scratching > Now create one instance of Dog - let's say it is a big black dog, with a > dogId = 13. It must be in one of the BigDog states (Woofing or Sleep-

> ing), > AND in one of the BlackDog states (Panting or Drooling).

Dog #13 (Big and Black):Woofing

SleepingPanting Drooling

Big:

Black:

(Illustrates direct product composition of concurrent STDs)

Page 29: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <29>

Merging OLC Behaviors of Concurrent Subclasses:

Each of the 4 subclasses has its own ‘object lifecycle’ (OLC); E.g. every Big&Black Dog must be in one of the BigDog states (Woofing

or Sleeping), AND in one of the BlackDog states (Panting or Drooling). Dog #13 (Big and Black) has the behavior/activity of both BigDogs and

BlackDogs:

Woofing

Sleeping

Panting Drooling

Big Dog OLC:

Black Dog OLC:

Woof&Pant

Woof&Drool

Sleep&Pant

Sleep&Drool

Page 30: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <30>

Divergent Hierarchies - revisited (1)

C++ does not support divergent class hierarchies. One alternate is prototype objects with delegation. RDB’s can support prototypes and delegation: In our example, each dog object belongs to one subclass for color,

and simultaneously to another subclass for size. That is, a ‘real’ dog object simultaneously belongs to, and inherits

from, exactly one of the subclasses in each inheritance tree above. The next slide shows (by its messiness) that multiple inheritance is

best avoided.

Partition S1:(BLACK xor WHITE)(Mutex and exhaustive):

DOG CLASSPartition S2:

(BIG xor SMALL) (Mutex and exhaustive):

Black Dog White Dog Big Dog Small Dog

Page 31: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <31>

Divergent Hierarchies - revisited (2)

Level 3 models concrete ‘leaf’ objects or ‘real’ dogs, which simultaneously belong to a distinct pair of subclasses at level 2 of the inheritance tree (compositional inheritance of properties).

So there are really 4 leaf classes at level 3, below level 2 above. Each leaf class instance at level 3 has exactly two paths up to level

1; both paths must end up at the same root object (Dog instance).

Partition S1:(BLACK xor WHITE)(Mutex and exhaustive):

DOG CLASSPartition S2:

(BIG xor SMALL) (Mutex and exhaustive):

Black Dog White Dog Big Dog Small Dog

Big Black Dog Big White Dog Small Black Dog Small White Dog

Page 32: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <32>

Composition or Implementation Inheritance

With compositional inheritance, dogs will inherit from two ‘component’ classes: Color and Size.

This is ‘impure’ multiple inheritance in C++ ( impure because the two ancestor classes have nothing in common with animals, which may not behave well as clients of Color or Size ancestor methods).

Java does not have multiple inheritance - but any class may ‘implement’ the interfaces Color’ and ‘Size’ instead.

Dogs must then be eligible to inherit (C++) or implement (Java) all the methods of the Color and Size classes - an undesirable compromise. Over-riding only hides the mis-match between class Dog and Color or Size classes.

Page 33: Database Design &Implementation

Obj-RelDBv05f.ppt - RJL 050907 - <33>

References Frank & Ulrich: ”Delegation: An Important Concept for the

Appropriate Design of Object Models”, JOOP June 2000 (p.13-17,44) Eliens: Principles of OO Software Dev. 2ed., AWL 2000 (Sect. 5.4:

Prototypes - delegation vs. inheritance) Kilov/Ross: Information Models, PH 1994 (Not about delegation, but

covers multiple/concurrent/overlapping subclass membership.) Lee &Tepfenhart: UML and C++: A Practical Guide to OO Dev, 2ed,

PH 2001(pp206-210) (Multiple Inheritance examples Fig. 12-4,12-5) Sanders: Data Modeling, Boyd-Fraser/ITP 1995 (Ch. 3 and Appendix) Jiri Soukup: Taming C++, A-W 1994; see also $PH/JiriSoukup CJDate: The RDB Model, A-W 2001 CJDate: Database Systems, 7ed! A-W 2000 EFCodd: Extending the RDB Model…, ACM TODS Dec’79 Kroenke: Database Concepts, 2nd Ed, P-H 2005 Chapter 2 Breugge & Dutoit: OO Software Engineering, A-W 2004.