UNIT I
Database System Applications
Database System vs. File System
View of Data – Data Abstraction – Instances and Schemas
Data Models – The ER Model – Relational Model – Other Models
Database Languages – DDL – DML
Database Access for Application Programs
Database Users and Administrator
Transaction Management
Database System Structure – Storage Manager – The Query Processor
History of Data base Systems
Data base design and ER diagrams
Beyond ER Design
Entities, Attributes and Entity sets
Relationships and Relationship sets
Additional features of ER Model
Conceptual Design with the ER Model
Conceptual Design for Large enterprises.
Data: Known facts that can be recorded and have an implicit meaning
Database: A collection of related data.
Database Management System (DBMS): A software package/ system to facilitate the creation and maintenance of a computerized database
Introduction
Applications of DBMS
- Banking
- Airlines
- Universities
- Credit card transactions
- Telecommunication
- Finance
- Sales
- Manufacturing
- Human resources
- Electronic Commerce
- Social Networking
- OnLine Analytical Processing (OLAP)
File Organization : Terms and Concepts
•Database: Group of related files
•File: Group of records of same type
•Record: Group of related fields
•Field: Group of words or a complete number
•Byte: Group of bits that represents a single character
•Bit: Smallest unit of data; binary digit (0,1)
Data Hierarchy in a Computer System
Problems with the Traditional File Environment
•Data redundancy & inconsistency
•Program-Data dependence
•Lack of flexibility
•Integrity problems
•Poor security
•Atomicity problems
•No concurrency control
Traditional File Processing
DBMS and its Advantages
•A Database Management System is a collection of programs that enables users to create and maintain a database. It is a general purpose software system that facilitates processes of defining, constructing and manipulating databases for various applications.
•Advantages of Database approach:
• Controlling Redundancy
• Restricting Unauthorized access
• Providing persistent storage for program objects and data structures
• Permitting inference and actions using deduction rules
• Providing multiple user interface
• Representing complex relationships among data
• Enforcing integrity constraints and providing backup and recovery
View of Data: Data Abstraction
Fig: Levels of data abstraction (actual data at the bottom)
• 1st level of abstraction – Physical Schema: how the data are physically stored
• 2nd level of abstraction – Conceptual Schema: what data are stored, what relationships, constraints exist
• 3rd level of abstraction – External Schemas (External Schema 1, External Schema 2, ..., External Schema n): customizations of the conceptual schema to the needs of various classes of users
External view/External schema
• Different users often need different views of the data.
• Example:
– Accountant needs to have access to financial information on a student.
– Head of Department needs access to academic information.
– Student needs to access marks information.
• An external schema is a description of part of the DB as seen by an application programmer or a user
Logical View/Conceptual Schema
• Representation of the logical structure of the information content of the DB.
• Abstracts away from the actual physical storage.
• It is, in a sense, a composite of all the external schemas.
Physical/Internal Schema
• The internal schema describes the data as it is physically stored.
• For example,
– record structure
– types of fields in a record
– existence of primary and secondary indexes
Example: University Database
• Conceptual schema:
– Students(sid: string, name: string, login: string, age: integer, gpa: real)
– Courses(cid: string, cname: string, credits: integer)
– Enrolled(sid: string, cid: string, grade: string)
• Physical schema:
– Relations stored as unordered files.
– Index on first column of Students.
• External Schema (View):
– Course_info(cid: string, enrollment: integer)
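The three levels of the university example can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the column types are SQLite's nearest equivalents, the index name and the sample rows are invented, and Course_info is assumed to count enrolled students per course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema: the three relations from the example.
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
-- Physical schema: an index on the first column of Students (name is hypothetical).
CREATE INDEX idx_students_sid ON Students(sid);
-- External schema: a view exposing only per-course enrollment counts (assumed meaning).
CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("s1", "c1", "A"), ("s2", "c1", "B"), ("s1", "c2", "A")])

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
print(course_info)
```

A user of Course_info never sees grades or student identifiers, which is the point of an external schema.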
Schemas & Instances
Schema
• Description of how data is stored or represented
– Physical schema describes the structure of database (i.e., data files and indexes) as it is stored on disk (e.g., sequence of byte addresses and offsets)
– Conceptual schema describes data in terms of higher-level concepts (e.g., a Student is a record containing ID, Name, and Address fields, which are of type INTEGER, STRING and STRING respectively)
Financial Records (Company, Type, Name, Date, Amt, NumShares, Broker)

Company | Type        | Name           | Date     | Amt   | NumShares | Broker
Trimark | Mutual Fund | Trimark Fund   | 01/01/84 | 49.75 | 100       | C. Harris
AGF     | Mutual Fund | Foreign Equity | 01/01/94 | 62.25 | 1000      | C. Harris
Data Models
A data model provides a way to describe the design of a database at the physical, logical, and view level.
A collection of tools for describing
– Data
– Data relationships
– Data semantics
– Data constraints
Relational model
Entity-Relationship data model (mainly for database design)
Object-based data models (Object-oriented and Object-relational)
Semistructured data model (XML)
Other older models:
• Network model
• Hierarchical model
Relational Model
• Example of tabular data in the relational model (figure: the columns of the table are labeled as attributes)
A Sample Relational Database (figure)
The Entity-Relationship Model
• Models an enterprise as a collection of entities and relationships
– Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects
• Described by a set of attributes
– Relationship: an association among several entities
• Represented diagrammatically by an entity-relationship diagram:
Hierarchical Model
• First database systems introduced in the 1960’s were based on the “hierarchical model”
• This model assumes all data relationships can be structured as hierarchies.
• Uses a “parent” and “child” relationship where a child can only have one parent.
• Hierarchical Model still used today on some IBM mainframes such as IBM’s IMS system.
• Need to redesign entire database to change how data is interrelated.
• Can lead to confusion in relationships of data.
• Only allows one parent record type.
Fig: Example hierarchy
Employer
  Compensation: Ratings, Salary
  Job Assignments
  Benefits: Pension, Insurance, Health
Network Model
• The Network Model came about from the Conference on Data Systems Languages (CODASYL).
• Design is based on “data sets”
• Similar to the hierarchical model with sequential access properties.
• Uses logically related data items to define record types.
• As data sets are added, complexity and confusion grow.
• Procedural model of one record processing at a time.
• Difficult to change data once it is stored
• Need detailed understanding of the data and how it is stored.
Other Data Models
• Object-Based data models
Object-oriented data model
• Object-oriented programming language
Object-relational data model
• Structured and collection types
• Object orientation
• Semistructured data models:
– Specification of data where individual data items of the
same type may have different sets of attributes.
Database Languages
• Data Definition Language: To specify the database schema.
• Data Manipulation Language: To express database queries and
updates.
Data Manipulation Language (DML)
Language for accessing and manipulating the data organized by the appropriate data model
– DML also known as query language
Two classes of languages
– Procedural – user specifies what data is required and how to get those data
– Declarative (nonprocedural) – user specifies what data is required without specifying how to get those data
SQL is the most widely used query language
- A query is a statement requesting the retrieval of information.
- The portion of a DML that involves information retrieval is called a query language.
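The difference between the two classes of DML can be sketched in Python with sqlite3. The account rows and the threshold of 800 are invented for illustration; the declarative version states only what is wanted, while the procedural version spells out how to scan the records.

```python
import sqlite3

# Hypothetical sample data, for illustration only.
rows = [("A-101", "Downtown", 500), ("A-102", "Uptown", 1200), ("A-103", "Downtown", 900)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", rows)

# Declarative: say WHAT data is required; the DBMS decides how to fetch it.
declarative = [r[0] for r in conn.execute(
    "SELECT account_number FROM account WHERE balance > 800 ORDER BY account_number")]

# Procedural: say WHAT is required and HOW to get it, one record at a time.
procedural = []
for account_number, branch_name, balance in rows:
    if balance > 800:
        procedural.append(account_number)
procedural.sort()

print(declarative, procedural)
```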
Data Definition Language (DDL)
Specification notation for defining the database schema
Example:
create table account (
    account_number char(10),
    branch_name char(10),
    balance integer)
DDL compiler generates a set of tables stored in a data dictionary
Data dictionary contains metadata (i.e., data about data)
– Database schema
– Data storage and definition language
• Specifies the storage structure and access methods used
– Integrity constraints
• Domain constraints
• Referential integrity (e.g. branch_name must correspond to a valid
branch in the branch table)
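As a rough illustration of these points, the account table can be declared through sqlite3 with a domain constraint and a referential integrity constraint. The primary keys, the CHECK condition on balance, and the sample rows are assumptions added for the sketch; sqlite_master is SQLite's own catalog and stands in here for the data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE branch (branch_name CHAR(10) PRIMARY KEY);
CREATE TABLE account (
    account_number CHAR(10) PRIMARY KEY,
    branch_name    CHAR(10) REFERENCES branch(branch_name),  -- referential integrity
    balance        INTEGER CHECK (balance >= 0)               -- assumed domain constraint
);
INSERT INTO branch VALUES ('Downtown');
""")

def try_insert(row):
    """Insert one account row; report whether the constraints accepted it."""
    try:
        with conn:
            conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert(("A-101", "Downtown", 500)),  # valid branch, valid balance
    try_insert(("A-102", "Nowhere", 100)),   # no such branch
    try_insert(("A-103", "Downtown", -5)),   # violates the CHECK constraint
]

# The schema itself is stored as metadata in the catalog table.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(results, tables)
```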
Overall System Structure
Database Users
Users are differentiated by the way they expect to interact with the system
• Application programmers – interact with system through DML calls
• Sophisticated users – form requests in a database query language
• Specialized users – write specialized database applications that do not fit into the traditional data processing framework
• Naïve users – invoke one of the permanent application programs that have been written previously
– Examples: people accessing a database over the web, bank tellers, clerical staff
Database Administrator
• Coordinates all the activities of the database system; the database
administrator has a good understanding of the enterprise’s
information resources and needs.
• Database administrator's duties include:
– Schema definition
– Storage structure and access method definition
– Schema and physical organization modification
– Granting user authority to access the database
– Specifying integrity constraints
– Monitoring performance and responding to changes in requirements
Storage Manager
• Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
• Translates DML statements into low-level file-system commands.
• The storage manager is responsible for the following tasks:
– Interaction with the file manager
– Efficient storing, retrieving and updating of data
– Controls the placement of data on disk and its movement between disk and main memory.
• Data structures:
– Data files: which store the database itself.
– Data dictionary: stores meta data about the structures of the database.
– Indices: provide fast access to data items.
Storage Manager
Components:
• Authorization and integrity manager:
– Tests for satisfaction of integrity constraints and checks the authority of users to access data.
• Transaction manager:
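– Ensures that the database remains in a consistent state despite system failures
– Ensures that concurrent transaction executions proceed without conflicting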
• File manager:
– Manages the allocation of space on disk storage and the data structures used to represent information stored on disk.
• Buffer manager:
– Responsible for fetching the data from disk storage into main memory, and deciding what data to cache in main memory.
– It enables the database to handle data sizes that are much larger than the size of main memory.
Query Processor
Components:
• DDL interpreter: interprets DDL statements and records the
definitions in the data dictionary.
• DML compiler: translates DML statements in a query
language into an evaluation plan consisting of low-level
instructions that the query evaluation engine understands.
– Translate into any of a number of alternative evaluation
plans that all give the same result
• Query evaluation engine: executes low-level instructions
generated by the DML compiler.
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
• Alternative ways of evaluating a given query
– Equivalent expressions
– Different algorithms for each operation
• Cost difference between a good and a bad way of evaluating a query can be enormous
• Need to estimate the cost of operations
– Depends critically on statistical information about relations which the database must maintain
– Need to estimate statistics for intermediate results to compute cost of complex expressions
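One way to see alternative evaluation plans is to ask a real system for its plan before and after an index exists. The sketch below uses SQLite's EXPLAIN QUERY PLAN through sqlite3; the table reuses the Students relation from the university example, and the index name and the login value are invented. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)")

def plan(sql):
    """Return the optimizer's chosen plan as text (last column of each plan row)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT name FROM Students WHERE login = 'jones@cs'"  # hypothetical login value

plan_without_index = plan(query)   # only option: scan the whole table
conn.execute("CREATE INDEX idx_students_login ON Students(login)")
plan_with_index = plan(query)      # an index lookup is now available

print(plan_without_index)
print(plan_with_index)
```

Both plans return the same result; only the cost differs, which is what the optimizer estimates.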
Transaction Management
• A transaction is a collection of operations that performs a
single logical function in a database application
• Transaction-management component ensures that
the database remains in a consistent (correct) state
despite system failures (e.g., power failures and operating
system crashes) and transaction failures.
• Concurrency-control manager controls the interaction
among the concurrent transactions, to ensure the
consistency of the database.
ACID Properties
• Key concept is a transaction: a sequence of database actions (reads/writes).
• A transaction is a collection of operations that performs a single logical function in a database application
• DBMS ensures atomicity (all-or-nothing property) even if the system crashes in the middle of an execution
• Each transaction, executed completely, must take the DB between consistent states or must not run at all.
• DBMS ensures that concurrent transactions appear to run in isolation.
• DBMS ensures durability of committed execution even if system crashes.
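Atomicity can be illustrated with a funds transfer in sqlite3. This is a minimal sketch, not a full treatment of ACID: the table contents, the transfer function and the "insufficient funds" rule are assumptions, and an in-memory database cannot demonstrate durability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction (hypothetical helper)."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE account SET balance = balance - ? WHERE account_number = ?",
                     (amount, src))
        (remaining,) = conn.execute(
            "SELECT balance FROM account WHERE account_number = ?", (src,)).fetchone()
        if remaining < 0:
            raise ValueError("insufficient funds")  # triggers rollback of the debit
        conn.execute("UPDATE account SET balance = balance + ? WHERE account_number = ?",
                     (amount, dst))

def balances(conn):
    return dict(conn.execute("SELECT account_number, balance FROM account"))

transfer(conn, "A", "B", 30)
after_ok = balances(conn)

try:
    transfer(conn, "A", "B", 500)
    failed = False
except ValueError:
    failed = True
after_failed = balances(conn)  # the partial debit must not be visible

print(after_ok, failed, after_failed)
```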
History of Databases
• 1960s:
– Early 1960s: Charles Bachman developed the first DBMS, IDS, at General Electric (IDS was later marketed by Honeywell)
  • Network model where data relationships are represented as a graph.
– Late 1960s: First commercially successful DBMS developed at IBM (IMS)
  • Hierarchical model where data relationships are represented as a tree
  • Still in use today (SABRE reservations; Travelocity)
– Late 1960s: Conference On DAta SYstems Languages (CODASYL) model defined. This is the network model, but more standardized.
• 1970s:
– 1970: Ted Codd defined the relational data model at IBM San Jose Laboratory (now IBM Almaden)
– Two major projects start (both were operational in late 1970s)
  • INGRES at University of California, Berkeley
    – Became commercial INGRES, followed up by POSTGRES, which was incorporated into Informix
  • System R at IBM San Jose Laboratory
    – Became DB2
– 1976: Peter Chen defined the Entity-Relationship (ER) model
• 1980s
– Maturation of relational database technology
– SQL standardization (mid-to-late 1980s) through ISO
– The real growth period
• 1990s
– Continued expansion of relational technology and improvement
of performance
– Distribution becomes a reality
– New data models: object-oriented, deductive
– Late 1990s: incorporation of object-orientation in relational
DBMSs (Object-Relational DBMSs)
– New application areas: Data warehousing and OLAP, Web and
Internet, interest in text and multimedia
Entity-Relationship Model
• The entity-relationship (ER) data model allows us to describe the data
involved in a real-world enterprise in terms of objects and their
relationships and is widely used to develop an initial database design.
• The ER model is used in a phase called conceptual database design
Overview of Database Design
The database design process can be divided into six steps.
The ER model is most relevant to the first three steps:
1) Requirements Analysis:
To understand what data is to be stored in the database,
What applications must be built on top of it, and
What operations are most frequent and subject to performance
requirements.
In other words, we must find out what the users want from the database.
2) Conceptual Database Design:
To develop a high-level description of the data to be stored in the
database, along with the constraints.
To create a simple description of data that closely matches how users
and developers think of the data.
Conceptual design: (ER Model is used at this stage.)
– What are the entities and relationships in the enterprise?
– What information about these entities and relationships should we
store in the database?
– What are the integrity constraints or business rules that hold?
– A database “schema” in the ER Model can be represented pictorially
(ER diagrams).
– Can map an ER diagram into a relational schema.
3) Logical Database Design:
Convert the conceptual database design into a database schema in the data model of the chosen DBMS.
We will only consider relational DBMSs, and therefore, the task in the logical design step is to convert an ER schema into a relational database schema, sometimes called the logical schema, in the relational data model.
Beyond the ER Model
4) Schema Refinement:
To analyze the collection of relations in our relational database schema to identify potential problems, and to refine it.
In contrast to the requirements analysis and conceptual design steps, which are essentially subjective, schema refinement can be guided by some elegant and powerful theory.
5) Physical Database Design:
This step may simply involve building indexes on some tables and clustering
some tables, or it may involve a substantial redesign of parts of
the database schema obtained from the earlier design steps.
6) Security Design:
We identify different user groups and different roles played by
various users (e.g., the development team for a product, the
customer support representatives, the product manager).
For each role and user group, we must identify the parts of the
database that they must be able to access and the parts of the
database that they should not be allowed to access, and take
steps to ensure that they can access only the necessary parts.
Entities, Attributes, And Entity Sets
• An entity is an object in the real world that is distinguishable from other objects.
– Example: specific person, company, event, plant
• A collection of similar entities is called an entity set.
– Example: set of all persons, companies, trees, holidays
• Note that entity sets need not be disjoint
• An entity is described using a set of attributes.
– Example: people have names and addresses
Fig: The Employees entity set (Employees with attributes ssn, name, lot)
• For each attribute associated with an entity set, we must
identify a domain of possible values.
• For example, the domain associated with the attribute name of
Employees might be the set of 20-character strings.
• If the company rates employees on a scale of 1 to 10 and stores
ratings in a field called rating, the associated domain consists of
integers 1 through 10.
• For each entity set, we choose a key.
• A key is a minimal set of attributes whose values uniquely
identify an entity in the set. There could be more than one
candidate key; if so, we designate one of them as the primary
key
Candidate & primary keys
• Every entity type must have an attribute/set of attributes that uniquely
identifies each instance & clearly distinguishes that instance from other
instances of the same type
• Candidate key: an attribute/combination of attributes that uniquely
identifies each instance of an entity type
– e.g. STUDENT NO
– Entity type EMPLOYEE: EMPLOYEE NO and NAME + ADDRESS are candidate keys
– Entity type GAME with attributes HOME TEAM, VISITING TEAM, REFEREE, DATE, RESULT: HOME TEAM + VISITING TEAM is a candidate key
• Primary key: a candidate key selected as the identifier for an entity;
may not be null
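The "unique and minimal" test for candidate keys can be sketched in Python. One caveat: a key is a property of every legal instance of an entity type, so checking a single sample can only rule candidates out, never prove them. The function names and the three sample EMPLOYEE rows are invented for illustration.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Minimal attribute sets that are unique in this particular sample."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys

# Hypothetical EMPLOYEE instances.
employees = [
    {"emp_no": 1, "name": "Ann", "address": "12 Elm St"},
    {"emp_no": 2, "name": "Ann", "address": "9 Oak Ave"},
    {"emp_no": 3, "name": "Bob", "address": "12 Elm St"},
]

keys = candidate_keys(employees, ("emp_no", "name", "address"))
print(keys)
```

On this sample the result matches the slide: EMPLOYEE NO alone, and NAME + ADDRESS together.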
Attributes
• Attribute types:
– Simple and composite attributes.
– Single-valued and multi-valued attributes
• Example: multivalued attribute: phone_numbers
– Derived attributes
• Can be computed from other attributes
• Example: age, given date_of_birth
Multivalued attributes
• Multivalued attribute : An attribute that can have more than one value for each entity instance
– During conceptual design highlight them
– Subsequently normalise entity data: remove multivalued attributes &
place them in a separate entity type
Fig: EMPLOYEE with attributes EMP NO, NAME, ADDRESS, SKILL
Composite attributes can be divided into smaller parts which represent simple attributes with independent meaning
– Simple attribute: Aircraft-Type
– Composite attribute: Aircraft-Location, which is comprised of
Aircraft-Latitude, Aircraft-Longitude, Aircraft-Altitude
Derived Attributes
Some attribute values can be derived from related attribute values:
– Age = Date - B-day
– Y-Sal = 12 * M-Sal
Notation (fig): EMPLOYEE with attributes M-sal, B-day and derived attributes Y-sal, Age
Some attribute values can be derived from attribute values of related entities:
– total-value = sum (qty * price)
Fig: Order (derived attribute Total-Value) and Item (price, qty)
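The three derivations above can be written out as a short Python sketch. Derived values are computed on demand rather than stored. The class name, field names and sample figures are assumptions; age is counted in completed years.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Employee:
    b_day: date     # stored attribute
    m_sal: float    # stored attribute (monthly salary)

    def age(self, today: date) -> int:
        """Derived: completed years between b_day and today."""
        years = today.year - self.b_day.year
        if (today.month, today.day) < (self.b_day.month, self.b_day.day):
            years -= 1  # birthday has not occurred yet this year
        return years

    @property
    def y_sal(self) -> float:
        """Derived: Y-Sal = 12 * M-Sal."""
        return 12 * self.m_sal

def total_value(items):
    """Derived from related entities: sum of qty * price over an order's items."""
    return sum(qty * price for qty, price in items)

# Hypothetical sample values.
emp = Employee(b_day=date(1990, 6, 15), m_sal=3000.0)
emp_age = emp.age(date(2024, 6, 14))
yearly = emp.y_sal
order_total = total_value([(2, 10.0), (1, 5.5)])
print(emp_age, yearly, order_total)
```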
Relationships And Relationship Sets
• A relationship is an association among two or more entities
• As with entities, we may wish to collect a set of similar
relationships into a relationship set.
• A relationship set can be thought of as a set of n-tuples:
• {(e1, …, en) | e1 ∈ E1, …, en ∈ En}
• Each n-tuple denotes a relationship involving n entities e1
through en, where entity ei is in entity set Ei.
• Note that several relationship sets might involve the same
entity sets. For example, we could also have a Manages
relationship set involving Employees and Departments.
Fig: The Works_In relationship set: Employees (ssn, name, lot), Works_In (since), Departments (did, dname, budget)
• A relationship can also have descriptive attributes.
• Descriptive attributes are used to record information about
the relationship, rather than about any one of the participating
entities;
• A relationship must be uniquely identified by the participating
entities, without reference to the descriptive attributes.
• An instance of a relationship set is a set of relationships.
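A common way to realize such a relationship set in a relational DBMS is a table whose key consists of the participating entities' keys, with descriptive attributes as ordinary columns. The sketch below does this for Works_In using sqlite3; the column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees   (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, budget REAL);
CREATE TABLE Works_In (
    ssn   TEXT    REFERENCES Employees(ssn),
    did   INTEGER REFERENCES Departments(did),
    since TEXT,                 -- descriptive attribute of the relationship
    PRIMARY KEY (ssn, did)      -- identified by the participating entities only
);
-- Hypothetical sample rows.
INSERT INTO Employees   VALUES ('e1', 'Ann', 10);
INSERT INTO Departments VALUES (51, 'Toy', 1000.0);
INSERT INTO Works_In    VALUES ('e1', 51, '1991-01-01');
""")

# The same (employee, department) pair with a different 'since' is not a new relationship.
try:
    with conn:
        conn.execute("INSERT INTO Works_In VALUES ('e1', 51, '1995-06-01')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

works_in = conn.execute("SELECT ssn, did, since FROM Works_In").fetchall()
print(duplicate_allowed, works_in)
```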
• Consider Works_In: An employee can work in many
departments; a dept can have many employees.
• In contrast, each dept has at most one manager, according to
the key constraint on Manages.
Fig: Mapping cardinalities: 1-to-1, 1-to-Many, Many-to-1, Many-to-Many
Degree of a Relationship Set
• Refers to number of entity sets that participate in a relationship set.
• Relationship sets that involve two entity sets are binary (or degree two). Generally, most relationship sets in a database system are binary.
• Relationship sets may involve more than two entity sets.
• Relationships between more than two entity sets are rare. Most relationships are binary. (More on this later.)
Example: Suppose employees of a bank may have jobs (responsibilities) at multiple branches, with different jobs at different branches. Then there is a ternary relationship set between entity sets employee, job, and branch
• As another example of an ER diagram, suppose that each
department has offices in several locations and we want to record
the locations at which each employee works.
• This relationship is ternary because we must record an association
between an employee, a department, and a location. The ER
diagram for this variant of Works In, which we call Works In2, is
shown in Figure .
• The entity sets that participate in a relationship set need not
be distinct; sometimes a relationship might involve two entities
in the same entity set.
Additional Features of the ER model
1) Key Constraint
• An employee can work in several departments, and a department can have several employees
• Now consider another relationship set called Manages between the Employees and Departments entity sets such that each department has at most one manager, although a single employee is allowed to manage more than one department.
• The restriction that each department has at most one manager is an example of a key constraint, and it implies that each Departments entity appears in at most one Manages relationship
in any allowable instance of Manages.
• This restriction is indicated in the ER diagram of Fig by using
an arrow from Departments to Manages.
• Intuitively, the arrow states that given a Departments entity, we
can uniquely determine the Manages relationship in which it
appears.
• An instance of the Manages relationship set is shown in
Figure 2.7. While this is also a potential instance for the
Works In relationship set, the instance of Works In shown in
Figure 2.3 violates the key constraint on Manages.
• A relationship set like Manages is sometimes said to be one-to-many, to indicate that one employee can be associated with many departments (in the capacity of a manager), whereas each department can be associated with at most one employee as its manager.
• In contrast, the Works In relationship set, in which an employee is allowed to work in several departments and a department is allowed to have several employees, is said to be many-to-many.
• If we add the restriction that each employee can manage at most one department to the Manages relationship set, which would be indicated by adding an arrow from Employees to Manages in Figure 2.6, we have a one-to-one relationship set.
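In a relational schema, one possible way to express these cardinalities is through uniqueness constraints on the relationship table. The sqlite3 sketch below is an assumption-laden illustration: making did the key of Manages captures "at most one manager per department", and the hypothetical Manages_1to1 table adds UNIQUE on ssn for the one-to-one variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Manages (
    did   INTEGER PRIMARY KEY,   -- key constraint: a department appears at most once
    ssn   TEXT NOT NULL,
    since TEXT
);
CREATE TABLE Manages_1to1 (      -- hypothetical one-to-one variant
    did INTEGER PRIMARY KEY,
    ssn TEXT NOT NULL UNIQUE     -- an employee manages at most one department
);
""")

def accepted(sql):
    """Run one INSERT and report whether the constraints allowed it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    accepted("INSERT INTO Manages VALUES (1, 'e1', '2020')"),  # e1 manages dept 1
    accepted("INSERT INTO Manages VALUES (2, 'e1', '2021')"),  # one-to-many: e1 also manages dept 2
    accepted("INSERT INTO Manages VALUES (1, 'e2', '2022')"),  # second manager for dept 1
    accepted("INSERT INTO Manages_1to1 VALUES (1, 'e1')"),
    accepted("INSERT INTO Manages_1to1 VALUES (2, 'e1')"),     # second dept for e1
]
print(results)
```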
Key Constraints for Ternary Relationships
Relationship sets involving three or more entity sets:
• If an entity set E has a key constraint in a relationship set R,
each entity in an instance of E appears in at most one
relationship in (a corresponding instance of) R.
• To indicate a key constraint on entity set E in relationship set
R, we draw an arrow from E to R.
• Each employee works in at most one department, and at a
single location.
• An instance of the Works In3 relationship set is shown in
Figure 2.9. Notice that each department can be associated
with several employees and locations, and each location can
be associated with several departments and employees;
however, each employee is associated with a single
department and location
2) Participation Constraints
• The key constraint on Manages tells us that a department has at most one manager.
• A natural question to ask is whether every department has a manager.
• Let us say that every department is required to have a manager.
• This requirement is an example of a participation constraint; the participation of the entity set Departments in the relationship set Manages is said to be total.
• A participation that is not total is said to be partial.
• As an example, the participation of the entity set Employees in Manages is partial, since not every employee gets to manage a department.
• Revisiting the Works In relationship set, it is natural to expect that
each employee works in at least one department and that each
department has at least one employee.
• This means that the participation of both Employees and
Departments in Works In is total.
• If the participation of an entity set in a relationship set is total, the
two are connected by a thick line; independently, the presence of an
arrow indicates a key constraint.
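When a department has both a key constraint and total participation in Manages, one possible relational design folds the manager into the department's own row and forbids a missing value. The sqlite3 sketch below assumes a combined table named Dept_Mgr and invented sample rows. It covers only the Manages case; "at least one" participation in a many-to-many set such as Works_In is not expressible with a simple column constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
CREATE TABLE Dept_Mgr (
    did    INTEGER PRIMARY KEY,                       -- one row, hence one manager, per department
    dname  TEXT,
    budget REAL,
    ssn    TEXT NOT NULL REFERENCES Employees(ssn),   -- total participation: manager required
    since  TEXT
);
INSERT INTO Employees VALUES ('e1', 'Ann', 10);
""")

def accepted(sql):
    """Run one INSERT and report whether the constraints allowed it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    accepted("INSERT INTO Dept_Mgr VALUES (1, 'Toy', 1000.0, 'e1', '2020')"),   # has a manager
    accepted("INSERT INTO Dept_Mgr VALUES (2, 'Tools', 2000.0, NULL, NULL)"),   # no manager
    accepted("INSERT INTO Dept_Mgr VALUES (3, 'Games', 500.0, 'e9', '2021')"),  # unknown employee
]
print(results)
```

Employees can still exist without managing anything, which matches their partial participation.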
• If every entity participates in at least one relationship, a participation constraint holds:
– e.g., every professor works in at least one department
Fig: Professor, WorksIn, Department (participation constraint)
• If every entity participates in exactly one relationship, both a participation and a key constraint hold:
– e.g., every professor works in exactly one department
• Representation in E-R diagram:
Fig: Professor, WorksIn, Department (participation and key constraint)
3) Weak Entities
• A weak entity can be identified uniquely only by considering the
primary key of another (owner) entity.
– Owner entity set and weak entity set must participate in a one-
to-many relationship set (one owner, many weak entities).
– Weak entity set must have total participation in this identifying
relationship set.
Fig: Employees (ssn, name, lot), identifying relationship Policy (cost), weak entity set Dependents (pname, age)
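A weak entity set is often stored together with its identifying relationship, keyed by its own partial key plus the owner's key. The sqlite3 sketch below assumes a combined table named Dep_Policy and invented rows; ON DELETE CASCADE is one way to make a dependent disappear with its owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for the cascade to fire in SQLite
conn.executescript("""
CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
CREATE TABLE Dep_Policy (
    pname TEXT,
    age   INTEGER,
    cost  REAL,
    ssn   TEXT NOT NULL REFERENCES Employees(ssn) ON DELETE CASCADE,  -- total participation
    PRIMARY KEY (pname, ssn)   -- partial key plus the owner's key
);
-- Hypothetical sample rows: two dependents share a name but have different owners.
INSERT INTO Employees  VALUES ('e1', 'Ann', 10), ('e2', 'Bob', 11);
INSERT INTO Dep_Policy VALUES ('Joe', 7, 120.0, 'e1'), ('Joe', 9, 150.0, 'e2');
""")

before = conn.execute("SELECT COUNT(*) FROM Dep_Policy").fetchone()[0]

with conn:
    conn.execute("DELETE FROM Employees WHERE ssn = 'e1'")  # owner leaves

remaining = conn.execute("SELECT pname, ssn FROM Dep_Policy").fetchall()
print(before, remaining)
```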
4) Class Hierarchies
• As in C++, or other PLs, attributes are inherited.
• If we declare A ISA B, every A entity is also considered to be a B entity.
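• Reasons for using ISA:
– To add descriptive attributes specific to a subclass.
– To identify entities that participate in a relationship.
– To classify the entities in an entity set into subclasses.
• A class hierarchy can be viewed in one of two ways: as specialization (a superclass is divided into subclasses) or as generalization (common features of several entity sets are collected into a superclass).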
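The analogy with programming-language inheritance can be shown directly in Python. The subclass name HourlyEmp and its attributes are hypothetical, chosen only to illustrate "A ISA B"; the Employee attributes follow the Employees entity set.

```python
class Employee:
    def __init__(self, ssn, name, lot):
        self.ssn = ssn
        self.name = name
        self.lot = lot

class HourlyEmp(Employee):  # hypothetical subclass: HourlyEmp ISA Employee
    def __init__(self, ssn, name, lot, hourly_wages, hours_worked):
        super().__init__(ssn, name, lot)   # inherited attributes
        self.hourly_wages = hourly_wages   # attributes specific to the subclass
        self.hours_worked = hours_worked

h = HourlyEmp("e1", "Ann", 10, hourly_wages=20.0, hours_worked=35)

# Every HourlyEmp entity is also an Employee entity and carries its attributes.
is_employee = isinstance(h, Employee)
inherited = (h.ssn, h.name, h.lot)
print(is_employee, inherited)
```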
Summary of Symbols Used in E-R Notation
1. Discuss the differences between a file processing system and DBMS.
2. Explain the functions of DBA.
3. Draw the architecture of a DBMS and explain the functionality of each component in it.
4. Discuss the various DDL, DML commands with illustrations in SQL.
5. Discuss the advantages and disadvantages of a database system.
6. Which of the following plays an important role in representing information about the real world in a database? Explain briefly.
(a) The data definition language.
(b) The data manipulation language.
(c) The buffer manager.
7. (a) How do older file systems and the database approach differ in the relationship between the data and application programs?
(b) Describe the general properties of data models.
8. (a) Explain the difference between physical and logical data independence.
(b) Write in detail about the database languages.
9. Define a database and a database management system and give the purpose of them.
10. What is data abstraction? Explain three levels of data abstraction with an example.
Assignment - I
1. Construct an E-R diagram for university registrar’s office. The office
maintains data about each class, including the instructor, the enrollment and the time and place of the class meetings. For each student class pair, a grade is recorded. Determine the entities and relationships that exist between the entities. Also construct the tabular representation of the entities and relationships.
2. What is an entity type? What is an entity set? Explain the difference
between an entity, an entity type and an entity set.
3. Explain the difference between a weak entity set and a strong entity set. How are
strong and weak entity sets represented in ER diagrams?
4. The State Bank of India offers the five different types of accounts : loan,
checking, recurring deposits, locker accounts and fixed deposit. The Bank
has a number of branches and a client of the bank can open many
accounts. An account can be joint and more than one client may operate an
account. Identify the entities, attributes, the relationships that exist among
these entities, and the tables. Draw the corresponding E-R diagram.
5. Construct an E-R diagram for a car insurance company with a set of customers,
each of whom owns a number of cars. Each car has a number of
recorded accidents associated with it.
6. Construct an E-R diagram for a hospital with a set of patients and a set
of medical doctors. Associated with each patient is a log of various tests and
examinations conducted.
7. Define the terms entity, attribute and relationship. Discuss the
conventions used for construction of a ER diagram for a database.
8. Explain, with an E-R diagram, an airline reservation system:
identify the entities, attributes and relationships that exist among the entities. Also
construct a tabular representation of the entities and relationships.
9. Explain the following terms
i)Relationship instance ii) Composite attribute iii) Multivalued attribute iv) Derived attribute
10. Explain the difference between the following
i) Attribute and value set ii) Relationship instance and Relationship type