Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
CSC4480: Principles of Database Systems
Lecture 2: Data Models/Entity Relationship Model
Slide 2
Steps in Designing a Relational Database
• Requirements and Specification– This involves scoping out the requirements and limitations of the database
• Conceptual Design– This involves using a data model (e.g Entity Relationship Model) to represent
the requirements of the specifications
• Logical Design – This involves translating the conceptual design into tables and relationships
expressed in a data model and implemented in a DBMS.
Slide 3
Importance of Database Design
• A foundation for good applications• Is a communication tool that facilitates the interaction among the
designer, the application’s programmer, and the end user• Both an art and a science
Slide 4
Defining Data Models
• Models are simplified abstractions of real world events or conditions
• A data model is the relatively simple representation using graphic, of facts (data) that are used to describe real world events, objects, or ideas– A data model illustrates what data are being represented (of interest for the
end use or system) and how the data elements relate to each other– A data model represents data characteristics, relationships, constraints, and
transformations.
Slide 5
The importance of Data Models
• Facilitates communication• Gives various views of the database• Organizes data for various users• Provides an abstraction for the creation of good a database
Slide 6
Database Models
• A database model is a collection of logical constructs (based on a data model) used to represent the data structure and the data relationships found within a specific database.
• They include a set of concepts to describe the structure of a database, the operations for manipulating these structures, and certain constraints that the database should obey.
Slide 7
Data Model Structure
• Entity: Represents an object of interest to a user. Each entity describes an object in the real world.
• Attributes: Characteristics that describes an entity• Relationships: Association between entities
– One to Many (1:M)•One team consists of many players
– Many to Many (M:N)•An employee might learn many job skills and each job skills might be learned by many employees
•EMPLOYEE(M) learns SKILLS(N)– One to One (1:1)
•A president leads one country
Slide 8
• Identify the Model Characteristics for the database below:
Slide 9
Data Model Structure (cont’d)
• Constraints: A restriction placed on the data– E.g Salary must be greater than 0 and less than 350– A GPA must be greater than or equal to 0.00 and less than or equal to 4.00
using 2 decimal places– A pilot cannot fly more than 10 hours during any 24 hour period.
Slide 10
Naming Conventions
• Entity name requirements– Be descriptive of the objects in the business environment– Use terminology that is familiar to the users
• Attribute name– Required to be descriptive of the data represented by the attribute
• Proper naming– Facilitates communication between parties– Promotes self-documentation
Slide 11
In Class Exercise
• What entities does uber have?• What attributes do you think Uber would have with the Entities you
identified?• Describe relationships that might exist between each entities
Slide 12
Business Rules
• A business rule is a brief, precise and unambiguous description of a policy procedure or principle within an organization
• Business rule are essential for:– Setting standards within an organization– Serving as a communication tool among users and designers– Describing the nature, role and scope of the data– Understanding of business processes– Developing the data model
Slide 13
Business Rule Example
• A Customer may generate many invoices• An invoice is generated by one customer
Slide 14
Business Rule Example
• A Customer may generate many invoices• An invoice is generated by one customer
• What are some business rules for Uber?
Slide 15
Translating Business Rules into Data Model Components
• Business rules set the stage for the proper identification of entities, attributes, relationships, and constraints– Nouns translate into entities– Verbs translate into relationships among entities
• Relationships are bidirectional– Questions to identify the relationship type
•How many instance of B are related to one instance of A?•How many instances of A are related to one instance of B?
Slide 16
Categories of Data Models
• Conceptual (high-level, semantic) data models:– Provide concepts that are close to the way many users perceive data.
•(Also called entity-based or object-based data models.)
• Physical (low-level, internal) data models:– Provide concepts that describe details of how data is stored in the computer. These
are usually specified in an ad-hoc manner through DBMS design and administration manuals
• Implementation (representational) data models:– Provide concepts that fall between the above two, used by many commercial DBMS
implementations (e.g. relational data models used in many commercial systems).
Slide 17
Database Schema
• Database Schema:– Represents the logical view of a database.
•Consists of tables– Defines how data is organized, the relationship between them, and constraints
that might exist.– A good schema can mean the difference between queries lasting a second to
hours.
• Schema Diagram:– An illustrative display of (most aspects of) a database schema.
• Schema Construct:– A component of the schema or an object within the schema, e.g., STUDENT,
COURSE.
Slide 18
Database State
• Database State:– The actual data stored in a database at a particular moment in time. This
includes the collection of all the data in the database.– Also called database instance (or occurrence or snapshot).
•The term instance is also applied to individual database components, e.g. record instance, table instance, entity instance
Slide 19
Database Schema vs Database State
• Distinction– The database schema changes very infrequently. – The database state changes every time the database is updated.
• Schema is also called intension.– Intension refers to a given relation which is independent of time. An intension
defines all permissible extensions.
• State is also called extension.– Extension is the state of all records appearing in a database at a given time
Slide 20
Example of a Database State
Slide 21
Example of a Database Schema
Slide 22
Three-Schema Architecture
• Proposed to support DBMS characteristics of:– Program-data independence.– Support of multiple views of the data.
• Not explicitly used in commercial DBMS products, but has been useful in explaining database system organization
Slide 23
Three-Schema Architecture
• Defines DBMS schemas at three levels:– Internal schema at the internal level to describe physical storage structures and
access paths (e.g indexes). •Typically uses a physical data model.
– Conceptual schema at the conceptual level to describe the structure and constraints for the whole database for a community of users. •Uses a conceptual or an implementation data model.
– External schemas at the external level to describe the various user views. •Usually uses the same data model as the conceptual schema.
Slide 24
Three-Schema Architecture
Slide 25
Three-Schema Architecture
• Mappings among schema levels are needed to transform requests and data. – Programs refer to an external schema, and are mapped by the DBMS to the
internal schema for execution.– Data extracted from the internal DBMS level is reformatted to match the user’s
external view (e.g. formatting the results of an SQL query for display in a Web page)
Slide 26
Data Independence
• Logical Data Independence: – The capacity to change the conceptual schema without having to change the
external schemas and their associated application programs.
• Physical Data Independence:– The capacity to change the internal schema without having to change the
conceptual schema.– For example, the internal schema may be changed when certain file structures
are reorganized or new indexes are created to improve database performance
Slide 27
Data Independence (continued)
• When a schema at a lower level is changed, only the mappingsbetween this schema and higher-level schemas need to be changed in a DBMS that fully supports data independence.
• The higher-level schemas themselves are unchanged.– Hence, the application programs need not be changed since they refer to the
external schemas.
Slide 28
DBMS Languages
• Data Definition Language (DDL)• Data Manipulation Language (DML)
– High-Level or Non-procedural Languages: These include the relational language SQL•May be used in a standalone way or may be embedded in a programming language
– Low Level or Procedural Languages:•These must be embedded in a programming language
Slide 29
DBMS Languages
• Data Definition Language (DDL): – Used by the DBA and database designers to specify the conceptual schema of
a database.– In many DBMSs, the DDL is also used to define internal and external schemas
(views).– In some DBMSs, separate storage definition language (SDL) and view
definition language (VDL) are used to define internal and external schemas.•SDL is typically realized via DBMS commands provided to the DBA and database designers
Slide 30
DBMS Languages
• Data Manipulation Language (DML):– Used to specify database retrievals and updates– DML commands (data sublanguage) can be embedded in a general-purpose
programming language (host language), such as COBOL, C, C++, or Java.•A library of functions can also be provided to access the DBMS from a programming language
– Alternatively, stand-alone DML commands can be applied directly (called a query language).
Slide 31
Types of DML
• High Level or Non-procedural Language:– For example, the SQL relational language– Are “set”-oriented and specify what data to retrieve rather than how to retrieve
it. – Also called declarative languages.
• Low Level or Procedural Language:– Retrieve data one record-at-a-time; – Constructs such as looping are needed to retrieve multiple records, along with
positioning pointers.
Slide 32
DBMS Interfaces
• Stand-alone query language interfaces– Example: Entering SQL queries at the DBMS interactive SQL interface (e.g.
SQL*Plus in ORACLE)• Programmer interfaces for embedding DML in programming languages• User-friendly interfaces
– Menu-based, forms-based, graphics-based, etc.• Mobile Interfaces: interfaces allowing users to perform transactions
using mobile apps
Slide 33
Types of Data Models
• Network Model• Hierarchical Model• Relational Model• Object-oriented Data Models• Object-Relational Models
Slide 34
Network Model
• The first network DBMS was implemented by Honeywell in 1964-65 (IDS System).
• Adopted heavily due to the support by CODASYL (Conference on Data Systems Languages) (CODASYL - DBTG report of 1971).
• Later implemented in a large variety of systems - IDMS (Cullinet - now Computer Associates), DMS 1100 (Unisys), IMAGE (H.P. (Hewlett-Packard)), VAX -DBMS (Digital Equipment Corp., next COMPAQ, now H.P.).
• Schema is represented as graphs in which objects types are nodesand relationships are arcs.
Slide 35
Network Model
• Advantages:– Network Model is able to model complex relationships and represents
semantics of add/delete on the relationships.– Can handle most situations for modeling using record types and relationship
types.– Language is navigational; uses constructs like FIND, FIND member, FIND
owner, FIND NEXT within set, GET, etc. •Programmers can do optimal navigation through the database.
• Disadvantages:– Navigational and procedural nature of processing– Database contains a complex array of pointers that thread through a set of
records.•Little scope for automated “query optimization”
Slide 36
Hierarchical Model
• Initially implemented in a joint effort by IBM and North American Rockwell around 1965. Resulted in the IMS family of systems.
• IBM’s IMS product had (and still has) a very large customer base worldwide
• Hierarchical model was formalized based on the IMS system.
• Other systems based on this model: System 2k (SAS inc.)
Slide 37
Hierarchical Model
• Data is stored in a tree-like structure – Data is stored as records and connected by links – Each child record has one parent while each parent record has one or more
child records. – Record retrieval is achieved by traversing the tree from the root node
Slide 38
Hierarchical Model
• Advantages: – Simple to construct and operate – Corresponds to a number of natural hierarchically organized domains, e.g.,
organization (“org”) chart – Language is simple: •Uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT, etc.
• Disadvantages: – Navigational and procedural nature of processing – Database is visualized as a linear arrangement of records – Little scope for "query optimization"
Slide 39
Relational Model
• Proposed in 1970 by E.F. Codd (IBM), first commercial system in 1981-82.• Now in several commercial products (e.g. DB2, ORACLE, MS SQL Server,
SYBASE, INFORMIX).• Several free open source implementations, e.g. MySQL, PostgreSQL• Currently most dominant for developing database applications.• SQL relational standards: SQL-89 (SQL1), SQL-92 (SQL2), SQL-99, SQL3, …• Chapters 5 through 11 describe this model in detail
Slide 40
Relational Model
• Advantages– Structural independence is promoted using independent tables– Tabular view improves conceptual simplicity– Ad hoc query capability is based on SQL– Isolates the end user from physical-level details– Improves implementation and management simplicity
• Disadvantages– Requires substantial hardware and system software overhead– Conceptual simplicity gives untrained people the tools to use a good system
poorly– May promote information problems
Slide 41
Object-oriented Data Model
• Several models have been proposed for implementing in a database system.
• One set comprises models of persistent O-O Programming Languages such as C++ (e.g., in OBJECTSTORE or VERSANT), and Smalltalk (e.g., in GEMSTONE).
• Additionally, systems like O2, ORION (at MCC - then ITASCA), IRIS (at H.P.- used in Open OODB).
• Object Database Standard: ODMG-93, ODMG-version 2.0, ODMG-version 3.0.
Slide 42
Object-oriented Data Model
• The trend to mix object models with relational was started with Informix Universal Server.
• Relational systems incorporated concepts from object databases leading to object-relational.
• Exemplified in the versions of Oracle, DB2, and SQL Server and other DBMSs.
• Current trend by Relational DBMS vendors is to extend relational DBMSs with capability to process XML, Text and other data types.
• The term “Object-relational” is receding in the marketplace.
Slide 43
Object-Oriented Model
• Advantages– Semantic content is added– Visual representation includes semantic content– Inheritance promotes data integrity
• Disadvantages– Slow development of standards caused vendors to supply their own
enhancements– Complex navigational system– Learning curve is steep– High system overhead slows transactions
Slide 45
The Entity Relationship Model (ERM)
• Graphical representation of entities and their relationships in a database structure– Entity relationship diagram (ERD): uses graphic representations to model
database components– Entity instance or entity occurrence: rows in the relational table– Attributes: describe particular characteristics– Connectivity: term used to label the relationship types
Slide 46
ERD Representation
Slide 47
ERD Representation – Crow’s Foot Notation
What are the business rules?Draw it using Google Drawings
Slide 48
ERD Translation
Slide 49
In Class Exercise
What are the business rules?
Slide 50
In class Exercise
• In groups, design an ERD for the UNIVERSITY database described in the previous class.