Principles of Database Design
NLM/MBL Medical Informatics

courses.mbl.edu

Session Outline

◆ Why learn this?

◆ Database Principles and Paradigms

◆ Principles of Relational Database Design

◆ System design and building methods

◆ Exercise: Transforming flat files to tables

Why Learn about Database Design?

◆ Vendors will sell you on user interfaces, but the power and flexibility are in the data model

◆ Evaluating and comparing products

◆ Communicating with vendors and IT support staff

◆ Building your own databases

What is a Database?

◆ An organized collection of information
– Computer-based representation
– Systematic, automated retrieval
– Systematic, automated symbol manipulation

Historical Evolution of Databases

◆ Dedicated files created & maintained by application software (sequential or random access)

◆ Database Management Systems (DBMSs)

Hierarchical Databases

[Diagram: records linked in a fixed hierarchy, with nodes Pt=Smith, Lab Results, 5/30/96, and Serum Na+]

Advantages: efficient storage and I/O, rapid access via predetermined data hierarchies

Disadvantages: difficult to view/retrieve data from other perspectives, hard to modify underlying structure

Information Network Databases

Advantages: Can model complex many-to-many relationships as well as hierarchies and simple lists

Disadvantages: difficult to predict & control effects of transitive relationships; recursion; I/O intensive, potential to become incomprehensible

“Database as Hypertext”
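A rough Python sketch of the network idea, assuming a hypothetical set of records (patients, a physician, a ward) that point directly at one another. Because links can run in both directions and form cycles, even a simple “what can I reach from here?” traversal has to track the records it has already visited, which hints at why transitive relationships and recursion are hard to control.

```python
from collections import deque

# Hypothetical network of records: each record holds links to any number of
# other records, so many-to-many links and cycles are both possible.
links = {
    "Patient Smith": ["Dr. Lee", "Ward 4"],
    "Dr. Lee": ["Patient Smith", "Patient Jones"],
    "Patient Jones": ["Dr. Lee", "Ward 4"],
    "Ward 4": ["Patient Smith", "Patient Jones"],
}

def reachable(start):
    """Navigate the links breadth-first, remembering visited records.

    Without the `seen` set, the cycle Smith -> Dr. Lee -> Smith would make a
    naive recursive traversal loop forever.
    """
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        record = queue.popleft()
        order.append(record)
        for nxt in links.get(record, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order
```

Starting from Patient Smith, the traversal also reaches Patient Jones through the shared physician and ward, a transitive connection nobody stored explicitly.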

Relational Databases

Advantages: Understandable, permits variety of logical aggregation or “views” of data elements, structure easily modifiable, new elements generally do not “break” existing programs

Disadvantages: I/O intensive; one logical record may span many physical records; relational integrity is a constant concern & must be under software control

“Rows & Columns with inter-table references”

Lab_test
Pt-UI  Testname  Date
12345  Serum_Na  5/30/96
42353  CBC       5/30/96
47756  ESR       5/30/96
12348  HBsAg     5/30/96
34523  Amylase   5/30/96

Patient
Pt-UI  Lname   Fname
12345  Smith   Elmer
12346  Jones   Barbara
12347  Clark   Arthur
12348  Jones   Casey
12349  Sample  Steve
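The Lab_test and Patient rows on this slide can be loaded into SQLite (through Python's standard sqlite3 module) to show both sides of the relational trade-off. A join combines the two tables through the shared Pt-UI column, and the same data exposes the integrity concern: three Lab_test rows (42353, 47756, 34523) refer to no Patient row. The lower-case table and column names are assumptions made for this sketch.

```python
import sqlite3

# Rows copied from the Lab_test and Patient tables on this slide.
PATIENTS = [
    (12345, "Smith", "Elmer"),
    (12346, "Jones", "Barbara"),
    (12347, "Clark", "Arthur"),
    (12348, "Jones", "Casey"),
    (12349, "Sample", "Steve"),
]
LAB_TESTS = [
    (12345, "Serum_Na", "5/30/96"),
    (42353, "CBC", "5/30/96"),
    (47756, "ESR", "5/30/96"),
    (12348, "HBsAg", "5/30/96"),
    (34523, "Amylase", "5/30/96"),
]

SCHEMA = """
    CREATE TABLE patient (pt_ui INTEGER PRIMARY KEY, lname TEXT, fname TEXT);
    CREATE TABLE lab_test (
        pt_ui INTEGER REFERENCES patient(pt_ui),  -- inter-table reference
        testname TEXT,
        test_date TEXT);
"""

def build_db(enforce_references=False):
    conn = sqlite3.connect(":memory:")
    if enforce_references:
        # By default SQLite does not enforce REFERENCES clauses;
        # this pragma turns enforcement on for this connection.
        conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patient VALUES (?, ?, ?)", PATIENTS)
    return conn

def load_lab_tests(conn, rows):
    """Insert rows one at a time; return the rows the database rejected."""
    rejected = []
    for row in rows:
        try:
            conn.execute("INSERT INTO lab_test VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError:
            rejected.append(row)
    return rejected

def named_results(conn):
    """A 'view' combining both tables through the shared Pt-UI column."""
    return conn.execute("""
        SELECT p.lname, p.fname, t.testname, t.test_date
        FROM lab_test AS t JOIN patient AS p ON p.pt_ui = t.pt_ui
        ORDER BY p.pt_ui
    """).fetchall()

def orphan_tests(conn):
    """Integrity check: lab tests whose Pt-UI matches no patient row."""
    return conn.execute("""
        SELECT t.pt_ui, t.testname
        FROM lab_test AS t LEFT JOIN patient AS p ON p.pt_ui = t.pt_ui
        WHERE p.pt_ui IS NULL
        ORDER BY t.pt_ui
    """).fetchall()
```

Without enforcement every row loads and only two lab tests join to a named patient; with `build_db(True)` the three unmatched rows are refused at insert time instead. Either way, integrity depends on software choosing to check it.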

Object-Oriented Databases

◆ Multiple data types including text, graphics, sound, signals, etc.

◆ Encapsulation of data & programs

◆ Interprocess messaging: e.g., “Print Yourself”

Advantages: application programs consist of high-level commands & functions that do not need to know the underlying data organization; modularity, reusability, and portability between systems

Disadvantages: early in commercialization; CPU intensive; few standards for query & object sharing
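As an illustration only (plain Python classes, not the API of any object-oriented database product), each hypothetical object below keeps its data private and answers the same “print yourself” message in its own way, so the caller never touches the underlying representation.

```python
from abc import ABC, abstractmethod

class MedicalObject(ABC):
    """Data and the code that handles it travel together (encapsulation)."""

    @abstractmethod
    def print_yourself(self) -> str:
        ...

class TextNote(MedicalObject):
    def __init__(self, text):
        self._text = text  # internal representation, not used by callers

    def print_yourself(self):
        return f"Note: {self._text}"

class Signal(MedicalObject):
    def __init__(self, samples):
        self._samples = list(samples)

    def print_yourself(self):
        return f"Signal: {len(self._samples)} samples, peak {max(self._samples)}"

def print_all(objects):
    # The same message goes to every object; none of the storage details
    # of a note or a signal leak into this function.
    return [obj.print_yourself() for obj in objects]
```

Adding a new data type (an image, a sound clip) would mean adding a class with its own `print_yourself`, while `print_all` stays unchanged.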

Fundamental Assertions about Systems Design

◆ The Data Model is the most critical aspect of system design and function

◆ Data Models should reflect real world objects and their relationships to ensure durability

◆ A correct Data Model subserves and outlasts applications, including many not anticipated at system start-up

Object-Oriented Systems Design: Basic Concepts

◆ The World contains Things, e.g., Collies, Terriers, Bloodhounds

◆ We develop abstractions of things called “objects”, e.g., dog

◆ We describe each object by a set of criteria (attributes), which together represent the abstract object as an empty table

Dog
Name  Breed  Favorite Food  Birthdate

Basic Concepts, cont’d

◆ Empty tables can be filled in to represent the real-world things from which the object was abstracted

Dog
Name   Breed        Favorite Food  Birthdate
Boris  St. Bernard  Canned         Jan 81
Fifi   Poodle       Dry            May 92
Fido   Pomeranian   Canned         Apr 87
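In Python, a dataclass is one convenient way to mimic the abstraction described here: the class declares the attributes (the empty table's columns) and each instance fills in one row. This is a sketch; the snake_case field names are assumptions, and the dates are kept as the slide's text rather than parsed.

```python
from dataclasses import dataclass, fields, astuple

@dataclass(frozen=True)
class Dog:
    """The abstract object: its attributes are the columns of the empty table."""
    name: str
    breed: str
    favorite_food: str
    birthdate: str  # kept as the slide's text, e.g. "Jan 81"

# Filling in the table: each instance stands for one real-world dog (one row).
DOGS = [
    Dog("Boris", "St. Bernard", "Canned", "Jan 81"),
    Dog("Fifi", "Poodle", "Dry", "May 92"),
    Dog("Fido", "Pomeranian", "Canned", "Apr 87"),
]

def column_names(cls):
    """The 'empty table': just the attribute names, in order."""
    return [f.name for f in fields(cls)]

def as_rows(instances):
    """The 'filled-in table': one tuple of attribute values per instance."""
    return [astuple(d) for d in instances]
```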

Basic Concepts, cont’d

◆ Relationships between objects are themselves attributes of those objects

Dog
Name  License  Owner Name  Lic. Date

Owner
Name  Address  Phone

Relationship: “OWNS” (Dog Owner OWNS Dogs)
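A minimal SQLite sketch of the “OWNS” relationship, in which the owner's name stored on each dog row is the attribute that carries the relationship. The owner names, addresses, phone numbers, license numbers, and license dates are hypothetical placeholders; the dog names are reused from the earlier Dog table. The same attribute answers the question in both directions.

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE owner (name TEXT PRIMARY KEY, address TEXT, phone TEXT);
        -- owner_name is the attribute that carries the OWNS relationship.
        CREATE TABLE dog (
            name TEXT,
            license TEXT PRIMARY KEY,
            owner_name TEXT REFERENCES owner(name),
            lic_date TEXT);
    """)
    # Hypothetical owners and licenses; dog names reused from the Dog table.
    conn.executemany("INSERT INTO owner VALUES (?, ?, ?)", [
        ("Garcia", "12 Elm St", "555-0101"),
        ("Wu", "9 Oak Ave", "555-0102"),
    ])
    conn.executemany("INSERT INTO dog VALUES (?, ?, ?, ?)", [
        ("Boris", "L-001", "Garcia", "Jan 96"),
        ("Fifi", "L-002", "Wu", "Feb 96"),
        ("Fido", "L-003", "Garcia", "Mar 96"),
    ])
    return conn

def dogs_owned_by(conn, owner_name):
    """Dog Owner OWNS Dogs."""
    return [r[0] for r in conn.execute(
        "SELECT name FROM dog WHERE owner_name = ? ORDER BY name", (owner_name,))]

def owner_of(conn, dog_name):
    """Inverse reading: Dog IS OWNED BY Owner (returns name and phone)."""
    return conn.execute(
        "SELECT o.name, o.phone FROM dog AS d JOIN owner AS o ON o.name = d.owner_name "
        "WHERE d.name = ?", (dog_name,)).fetchone()
```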

Objects

◆ All of the real-world things in the set (the “instances”) have the same characteristics

◆ All instances conform to the same rules

So that...

LICENSE
License   Exp. Date  Manufacturer  Model
123 ABC   Jan. 97    Ford          Taurus
691XKY    Mar. 98    Honda         Prelude
12-A-962  Apr. 98    ?             Poodle

...you don’t get holes in the table (the “?” manufacturer) and you don’t get strange values (a “Poodle” model).
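One way to enforce “same characteristics, same rules” is to check each row before it enters the table. The sketch below assumes a hypothetical reference list of valid car models (in practice this would be its own specification table) and flags the third LICENSE row for both a hole and a strange value.

```python
# Hypothetical reference list of valid manufacturer/model pairs.
VALID_MODELS = {("Ford", "Taurus"), ("Honda", "Prelude")}
KNOWN_MODELS = {model for _, model in VALID_MODELS}
COLUMNS = ("license", "exp_date", "manufacturer", "model")
MISSING = (None, "", "?")  # values treated as a "hole"

# The three LICENSE rows from this slide.
LICENSE_ROWS = [
    ("123 ABC", "Jan. 97", "Ford", "Taurus"),
    ("691XKY", "Mar. 98", "Honda", "Prelude"),
    ("12-A-962", "Apr. 98", "?", "Poodle"),
]

def problems(row):
    """List the ways a row breaks the rules every instance must follow."""
    issues = [f"hole in {col}" for col, value in zip(COLUMNS, row) if value in MISSING]
    model = row[3]
    if model not in MISSING and model not in KNOWN_MODELS:
        issues.append(f"strange value in model: {model!r}")
    return issues

def valid_rows(rows):
    """Keep only rows with no holes and no strange values."""
    return [row for row in rows if not problems(row)]
```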

Types of Objects (i.e., types of tables)

◆ Tangible things, e.g., book

◆ Roles, e.g., doctor, patient, supervisor

◆ Incidents (= events, occurrences), e.g., ordering of a lab test

◆ Interactions (bind two or more other objects via a transaction), e.g., Purchase relates Buyer to Seller

◆ Specifications (definition tables of tangible things)

Table Notation

Empty Table Form:

Patient_Admissions
Pt_ID  Date_Adm  Time_Adm  Unit  Room

Graphical Form:

Patient_Admissions
* Pt_ID
- Date_Adm
- Time_Adm
- Unit
- Room

Textual Form:

Patient_Admissions (Pt_ID, Date_Adm, Time_Adm, Unit, Room)

Formalisms for Tables

◆ Rule 1: One instance of an object has exactly one value for each attribute (i.e., only one data element at each row-column intersection; no repeating groups, no true “holes” in the table)

◆ Rule 2: Attributes must contain no internal structure

Not OK:

Name   Age-Sex
Smith  38-F
Jones  22-M
Clark  18-M

If Rules 1 and 2 are obeyed, the data model is in “First Normal Form”.
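Bringing the “Not OK” table into First Normal Form amounts to splitting the structured value into atomic attributes. This Python sketch does that for the Age-Sex rows on this slide; the second helper, for a repeating group, uses hypothetical input in which several values share one cell separated by semicolons.

```python
# The "Not OK" rows from this slide: Age-Sex packs two facts into one value.
NOT_OK = [("Smith", "38-F"), ("Jones", "22-M"), ("Clark", "18-M")]

def to_first_normal_form(rows):
    """Rule 2: split the structured Age-Sex value into two atomic attributes."""
    result = []
    for name, age_sex in rows:
        age, sex = age_sex.split("-")
        result.append((name, int(age), sex))
    return result

def split_repeating_group(rows):
    """Rule 1: one (name, item) row per value instead of a list in one cell.

    Input format is hypothetical: (name, "value; value; ...").
    """
    return [(name, item.strip()) for name, group in rows for item in group.split(";")]
```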

Formalisms for Tables, cont’d

◆ Rule 3: Every attribute should represent a characteristic of the entire object, not a characteristic of a limited part of the object

Not OK:

Hospital Committee Membership
* Person Name
* Committee Name
- Date committee term expires
- Date first joined hospital staff (an attribute of the hospital staff appointment, not of committee membership)

OK:

Hospital Committee Membership
* Person Name
* Committee Name
- Date committee term expires
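A sketch of the repair, using hypothetical committee rows: “Date first joined hospital staff” depends only on Person Name, so it moves to its own table keyed by person, leaving the membership table with attributes of the whole (person, committee) pair. This is roughly what is usually called Second Normal Form. The function also flags the inconsistency the “Not OK” layout invites, where copies of the same fact disagree.

```python
# Hypothetical rows in the "Not OK" layout:
# (person, committee, term_expires, joined_staff)
NOT_OK = [
    ("Adams", "Pharmacy", "1997-06", "1985-09"),
    ("Adams", "Ethics", "1998-01", "1985-09"),
    ("Baker", "Ethics", "1997-12", "1990-03"),
]

def decompose(rows):
    """Split into a membership table and a staff table keyed by person alone."""
    membership = [(person, committee, term) for person, committee, term, _ in rows]
    staff = {}
    for person, _, _, joined in rows:
        # The join date repeats once per committee; every copy must agree.
        if staff.setdefault(person, joined) != joined:
            raise ValueError(f"conflicting join dates for {person}")
    return membership, sorted(staff.items())
```

After the split, each person's join date is stored once, so it can no longer drift out of step between committees.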

Relationships

◆ A relationship is the abstraction of a set of associations that hold systematically between different kinds of real-world things:
– Patient OCCUPIES bed
– Library CONTAINS books
– Specimen IS ASSAYED by Lab Method

◆ Most relationships can also be stated in the inverse:
– Library LENDS book
– Book IS LENT BY Library

Relationship Types

One-to-One: State has Governor; Governor governs State

Many-to-Many: Author writes Book; Book is written by Author

One-to-Many: Dog Owner owns Dog; Dog is owned by Dog Owner
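In a relational schema, the cardinality of a relationship is usually expressed through where the reference is stored and what constraints it carries. The SQLite sketch below assumes hypothetical table and person names: a UNIQUE reference gives one-to-one (a state can be linked to at most one governor row), and a plain reference on the “many” side gives one-to-many. Many-to-many cannot be expressed by a single reference column and needs a separate linking table.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE state (name TEXT PRIMARY KEY);
    -- One-to-one: UNIQUE lets a state appear on at most one governor row.
    CREATE TABLE governor (
        name TEXT PRIMARY KEY,
        state TEXT NOT NULL UNIQUE REFERENCES state(name));
    CREATE TABLE dog_owner (name TEXT PRIMARY KEY);
    -- One-to-many: any number of dog rows may point at the same owner.
    CREATE TABLE dog (
        name TEXT PRIMARY KEY,
        owner TEXT NOT NULL REFERENCES dog_owner(name));
"""

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    return conn

def try_insert(conn, sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

def demo():
    conn = build_db()
    conn.execute("INSERT INTO state VALUES ('Massachusetts')")
    conn.execute("INSERT INTO dog_owner VALUES ('Garcia')")
    return [
        try_insert(conn, "INSERT INTO governor VALUES (?, ?)", ("Governor A", "Massachusetts")),
        try_insert(conn, "INSERT INTO governor VALUES (?, ?)", ("Governor B", "Massachusetts")),
        try_insert(conn, "INSERT INTO dog VALUES (?, ?)", ("Boris", "Garcia")),
        try_insert(conn, "INSERT INTO dog VALUES (?, ?)", ("Fido", "Garcia")),
    ]
```

In `demo()`, the second governor for the same state is refused, while two dogs for the same owner are both accepted.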

Modeling Many-to-Many Relationships

DRUG MANUFACTURER
* manufacturer name
- other attributes

DRUG
* generic name
- other attributes

LICENSE
* manufacturer name
* generic name
- date licensed
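The LICENSE table can be sketched in SQLite as follows, with the two starred attributes forming a composite primary key so each manufacturer/drug pair is licensed at most once. The manufacturer names and license dates are hypothetical; only the table structure comes from the slide.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE drug_manufacturer (manufacturer_name TEXT PRIMARY KEY);
    CREATE TABLE drug (generic_name TEXT PRIMARY KEY);
    -- The linking table: its key is the pair of starred attributes.
    CREATE TABLE license (
        manufacturer_name TEXT NOT NULL REFERENCES drug_manufacturer(manufacturer_name),
        generic_name TEXT NOT NULL REFERENCES drug(generic_name),
        date_licensed TEXT,
        PRIMARY KEY (manufacturer_name, generic_name));
"""

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Hypothetical sample data.
    conn.executemany("INSERT INTO drug_manufacturer VALUES (?)", [("Maker A",), ("Maker B",)])
    conn.executemany("INSERT INTO drug VALUES (?)", [("ibuprofen",), ("acetaminophen",)])
    conn.executemany("INSERT INTO license VALUES (?, ?, ?)", [
        ("Maker A", "ibuprofen", "1990-01-15"),
        ("Maker A", "acetaminophen", "1991-03-02"),
        ("Maker B", "ibuprofen", "1992-07-30"),
    ])
    return conn

def manufacturers_of(conn, generic_name):
    """DRUG side of the many-to-many: who is licensed to make this drug?"""
    return [r[0] for r in conn.execute(
        "SELECT manufacturer_name FROM license WHERE generic_name = ? ORDER BY 1",
        (generic_name,))]

def drugs_of(conn, manufacturer_name):
    """MANUFACTURER side: which drugs is this maker licensed for?"""
    return [r[0] for r in conn.execute(
        "SELECT generic_name FROM license WHERE manufacturer_name = ? ORDER BY 1",
        (manufacturer_name,))]
```

Neither DRUG nor DRUG MANUFACTURER carries a reference to the other; the relationship lives entirely in LICENSE, which also has a natural home for the attribute “date licensed”.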

Overall System Design Process

◆ Build the Entity-Relationship diagram for all defined objects (tables), [including an Object Specification Document]

◆ [Create a State Transition Model which describes changes to objects based on events or transactions]

◆ [Create a Data Flow diagram which models the information elements which cause State Transitions]

[Recommended for multi-programmer projects]

Exercise: Devise a Relational Model for MEDLINE citations

UI - 90134185
AU - Greenes RA ; Shortliffe EH
TI - Medical Informatics. An Emerging academic discipline and institutional priority
MH - Hospital Information Systems; Career Choice; Medical Informatics/EDUCATION/*TRENDS
PT - JOURNAL ARTICLE; REVIEW; TUTORIAL
EM - 9005
AB - Information management constitutes a major activity of the health care profession. Currently a number of forces are focusing attention on this function...
AD - Department of Radiology, Brigham and Women’s Hosp., Boston, MA 02115
SO - JAMA 1990 Feb 23; 263(8):1114-20

Sample MEDLINE citation
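One possible starting point for the exercise, offered as a sketch rather than a model answer: parse the tagged fields, keep single-valued fields (UI, TI, SO) in a citation row, and move each multi-valued field (AU, MH, PT) into its own table so that every cell holds one value. The table names are hypothetical, and the sample text is an abridged copy of the citation on this slide (EM, AB, and AD omitted).

```python
# Abridged copy of the sample citation (EM, AB, AD left out for brevity).
SAMPLE = """\
UI - 90134185
AU - Greenes RA ; Shortliffe EH
TI - Medical Informatics. An Emerging academic discipline and institutional priority
MH - Hospital Information Systems; Career Choice; Medical Informatics/EDUCATION/*TRENDS
PT - JOURNAL ARTICLE; REVIEW; TUTORIAL
SO - JAMA 1990 Feb 23; 263(8):1114-20
"""

def parse(text):
    """Read 'TAG - value' lines into a dict of tag -> raw value."""
    record = {}
    for line in text.splitlines():
        tag, sep, value = line.partition(" - ")
        if sep:
            record[tag.strip()] = value.strip()
    return record

def split_values(record, tag):
    """Break a multi-valued field (separated by ';') into single values."""
    return [v.strip() for v in record.get(tag, "").split(";") if v.strip()]

def to_tables(record):
    """Hypothetical tables, all keyed back to the citation's UI."""
    ui = int(record["UI"])
    return {
        "citation": [(ui, record.get("TI"), record.get("SO"))],
        # position keeps author order, which is itself information
        "author": [(ui, pos, name)
                   for pos, name in enumerate(split_values(record, "AU"), start=1)],
        "mesh_heading": [(ui, h) for h in split_values(record, "MH")],
        "pub_type": [(ui, p) for p in split_values(record, "PT")],
    }
```

A fuller model might go further, for example splitting the MeSH subheadings (such as /EDUCATION) into their own attribute, since a value like “Medical Informatics/EDUCATION/*TRENDS” still has internal structure.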

The “Bottom Line” in Database Design

◆ The Data Model is the most critical aspect of system design and function

◆ Data Models should reflect real world objects and their relationships to ensure durability

◆ A correct Data Model subserves and outlasts applications, including many not anticipated at system start-up

Questions?