bruce-hunt
C6 Databases
Traditional file environment
• Data redundancy and inconsistency:
– Data redundancy: the presence of duplicate data in multiple data files, so that the same data are stored in more than one place
– Data inconsistency: the same attribute may have different values in different files
• Program-data dependence:
– The coupling of data stored in files and the specific programs required to update and maintain those files, such that changes in programs require changes to the data and vice versa
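A minimal sketch of how redundancy turns into inconsistency. The two departmental files (`sales_file`, `billing_file`), the customer data, and the helper `find_inconsistencies` are hypothetical and exist only for illustration.

```python
# Hypothetical data: two departments keep separate files on the same customers.
sales_file = {
    "C001": {"name": "Ann Lee", "city": "Toronto"},
    "C002": {"name": "Raj Patel", "city": "Ottawa"},
}
billing_file = {
    "C001": {"name": "Ann Lee", "city": "Montreal"},  # address updated here only
    "C002": {"name": "Raj Patel", "city": "Ottawa"},
}


def find_inconsistencies(file_a, file_b):
    """Return (customer_id, attribute, value_a, value_b) for every mismatch."""
    mismatches = []
    for cust_id in sorted(file_a.keys() & file_b.keys()):
        shared_attrs = file_a[cust_id].keys() & file_b[cust_id].keys()
        for attr in sorted(shared_attrs):
            value_a = file_a[cust_id][attr]
            value_b = file_b[cust_id][attr]
            if value_a != value_b:
                mismatches.append((cust_id, attr, value_a, value_b))
    return mismatches


inconsistencies = find_inconsistencies(sales_file, billing_file)
print(inconsistencies)
```

Because the city is stored twice, an update applied to only one file leaves the two copies disagreeing. A single shared table would hold the value once.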
More problems
• Lack of flexibility: a traditional file system can deliver routine scheduled reports after extensive programming efforts, but it cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion
• Poor security: management may have no knowledge of who is accessing or making changes to the organization’s data
• Lack of data sharing and availability: information cannot flow freely across different functional areas or different parts of the organization
Types of databases
• Relational
• Hierarchical and network
• Object-oriented
The focus of this lecture is on relational databases.
The Database Approach
• Relational DBMS
– Represents data as two-dimensional tables called relations
– Relates data across tables based on a common data element
• Examples: Access, DB2, Oracle, MS SQL Server
High Level
Data hierarchy
Database ideas
• A group of values for the set of fields makes a record (tuple, row)
• A group of records makes a table (file)
• A group of tables (files) makes a database
• A field name serves to label each column of each table
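The hierarchy of field, record, table, and database can be sketched with plain Python values. The student and course data, and every name in the block, are made up for illustration.

```python
# Field names label the columns of the table.
field_names = ("student_id", "name", "major")

# A record (tuple, row) is one group of values for that set of fields.
record_1 = (1, "Ann Lee", "Finance")
record_2 = (2, "Raj Patel", "Marketing")

# A table (file) is a group of records.
student_table = [record_1, record_2]

# A database is a group of tables.
course_table = [(101, "Databases"), (102, "Accounting")]
database = {"student": student_table, "course": course_table}

# Pairing the field names with one record's values gives labelled data.
labelled = dict(zip(field_names, database["student"][0]))
print(labelled)
```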
Record
Types of fields
Fields can contain:
• Strings (text characters)
• Numeric values
• Sometimes very specific formats (e.g. Date)
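As a rough illustration of the three kinds of field, the sketch below declares a hypothetical `Employee` record with a string, a numeric, and a date field. The class, its field names, and the helper `is_valid_date` are assumptions for this example, not part of the lecture.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    name: str        # string (text characters)
    salary: float    # numeric
    hire_date: date  # a specific format: a calendar date


def is_valid_date(text):
    """Return True if text is a real calendar date written as YYYY-MM-DD."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


emp = Employee(
    name="Ann Lee",
    salary=52000.0,
    hire_date=date.fromisoformat("2021-03-15"),
)
print(emp.hire_date.year)
print(is_valid_date("2021-02-30"))  # 30 February does not exist
```

A date field is stricter than a plain string: a value that merely looks like a date is rejected if it is not a real calendar day.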
Types of operations in a relational database
• Select
– Creates a subset of rows that meet specific criteria
• Join
– Combines relational tables to provide users with more information than is available in the individual tables
– Requires a field in common between the tables being joined
• Project
– Creates a subset consisting of certain columns of the table
– Results in a new, smaller table
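The three operations can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the `part` and `supplier` tables and their contents are hypothetical, and SQL is used here as one common way to express select, project, and join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
    CREATE TABLE part (part_id INTEGER PRIMARY KEY, part_name TEXT,
                       unit_price REAL, supplier_id INTEGER);
    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Bolt Co');
    INSERT INTO part VALUES (10, 'Door latch', 22.0, 1),
                            (11, 'Door seal', 6.0, 2),
                            (12, 'Hinge', 3.5, 1);
""")

# Select: a subset of rows that meet a criterion.
selected = conn.execute(
    "SELECT * FROM part WHERE unit_price > 5 ORDER BY part_id"
).fetchall()

# Project: a subset of columns.
projected = conn.execute(
    "SELECT part_name, unit_price FROM part ORDER BY part_id"
).fetchall()

# Join: combine the tables through the field they share (supplier_id).
joined = conn.execute("""
    SELECT part.part_name, supplier.supplier_name
    FROM part JOIN supplier ON part.supplier_id = supplier.supplier_id
    ORDER BY part.part_id
""").fetchall()

print(selected)
print(projected)
print(joined)
```

The join only works because `supplier_id` appears in both tables, which is the "field in common" requirement.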
The Database Approach to Data Management
Summary on db operations
• Selections are related to choosing table rows.
• Projections are related to choosing table columns.
• Joins are related to matching records from two tables that have a common value in a shared field.
Designing a database
• Conceptual design: abstract model of the database from a business perspective
• Physical design: how data are actually structured on physical storage media
• Entity-relationship diagram: methodology for documenting databases by illustrating relationships between database entities
• Normalization: process of creating small, stable data structures from complex groups of data
• Primary keys: each table requires a unique identifier (a field or a set of fields)
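A small sketch of normalization and primary keys, assuming a hypothetical flat list of orders in which customer details repeat on every row. The field names and data are invented for this example.

```python
# Hypothetical unnormalized data: customer details repeat on every order row.
flat_orders = [
    {"order_id": 1, "customer_id": "C001", "customer_city": "Toronto", "amount": 120.0},
    {"order_id": 2, "customer_id": "C001", "customer_city": "Toronto", "amount": 75.5},
    {"order_id": 3, "customer_id": "C002", "customer_city": "Ottawa", "amount": 40.0},
]

# Customer table: one row per customer, identified by its primary key.
customers = {}
for row in flat_orders:
    customers[row["customer_id"]] = {"customer_city": row["customer_city"]}

# Order table: identified by order_id; customer_id stays as the common field.
orders = {
    row["order_id"]: {"customer_id": row["customer_id"], "amount": row["amount"]}
    for row in flat_orders
}

# The city of the customer who placed order 2 is looked up through the key.
city_for_order_2 = customers[orders[2]["customer_id"]]["customer_city"]
print(len(customers), len(orders), city_for_order_2)
```

After the split, each customer's city is stored once, so it can be changed in one place without touching the order rows.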
DBMS
• Data definition language: specifies the content and structure of the database and defines each data element
• Data manipulation language: used to process data in a database; permits users to extract data
• Data dictionary: stores definitions of data elements and data characteristics; can indicate usage and ownership
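The three DBMS capabilities can be illustrated with Python's built-in `sqlite3` module. The `employee` table and its data are hypothetical. SQLite's `PRAGMA table_info` is used here only as a rough stand-in for a data dictionary: it lists column names and declared types but not usage or ownership.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition language: specify the structure of a table.
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
)

# Data manipulation language: add, change, and extract data.
conn.execute("INSERT INTO employee VALUES (1, 'Ann Lee', 52000.0)")
conn.execute("INSERT INTO employee VALUES (2, 'Raj Patel', 61000.0)")
conn.execute("UPDATE employee SET salary = salary + 1000 WHERE emp_id = 1")
high_earners = conn.execute(
    "SELECT name FROM employee WHERE salary > 55000 ORDER BY name"
).fetchall()

# Dictionary-like metadata: each column's name and declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]

print(high_earners)
print(columns)
```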
Distributed database
• A database that is stored in more than one physical location
• Reduces the vulnerability of a single, massive central site
• Increases service and responsiveness to local users
• Can often run on smaller, less expensive computers
• Depends on high-quality telecommunications lines
OLAP
• Online Analytical Processing
• Multidimensional data analysis
• Supports manipulation and analysis of large volumes of data from multiple dimensions/perspectives
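A toy sketch of multidimensional analysis, assuming a hypothetical list of sales facts with three dimensions (product, region, quarter). Real OLAP tools work on far larger volumes; the helper `totals_by` only shows the idea of viewing the same data from different perspectives.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter, units sold).
sales = [
    ("Nuts", "East", "Q1", 50),
    ("Nuts", "West", "Q1", 30),
    ("Bolts", "East", "Q1", 20),
    ("Nuts", "East", "Q2", 70),
    ("Bolts", "West", "Q2", 10),
]
DIMENSIONS = {"product": 0, "region": 1, "quarter": 2}


def totals_by(*dims):
    """Sum units sold for each combination of the requested dimensions."""
    totals = defaultdict(int)
    for fact in sales:
        key = tuple(fact[DIMENSIONS[d]] for d in dims)
        totals[key] += fact[3]
    return dict(totals)


by_product = totals_by("product")
by_product_region = totals_by("product", "region")
print(by_product)
print(by_product_region)
```

Calling `totals_by("region", "quarter")` would summarize the same facts along a different pair of dimensions without changing the underlying data.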
Data Warehouse
• A massive database that stores current and historical data
• Data are standardized into a common data model
• Consolidated across entire enterprise for management analysis and decision making
Data mining
• Tools for analyzing large pools of data
• Find hidden patterns and infer rules to predict trends
• Bank of Montreal uses data mining to better understand its customers, using various query tools, a statistical package, and in-house developed analytics
– One set of analytics sends prompts to an account manager, indicating that a specific bank customer has changed their banking patterns and that the manager should follow up with that customer
– This data mining leads to retaining business and customers and even obtaining new business from those customers
Managing Data Resources
• Establishing an information policy
– Specifies the organization’s rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying information
– Data administration is responsible for the specific policies and procedures through which data are managed
• Data governance
– Quality, management, policy, risk management
• DBA (database administrator)
– Installation, configuration, upgrade, administration, monitoring, and maintenance of databases
Managing Data Resources
• Ensuring data quality
• Data quality audit
– Structured survey of the accuracy and completeness of data in an information system
• Data cleansing
– Consists of activities for detecting and correcting errors in the data in an information system
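A minimal sketch of a data quality audit followed by cleansing, assuming a hypothetical list of customer records. The functions `audit` and `cleanse`, and the rules they apply (duplicate ids, blank values, untidy names), are illustrative choices rather than a standard procedure.

```python
# Hypothetical customer records with typical quality problems.
records = [
    {"id": 1, "name": " ann lee ", "email": "ann@example.com"},
    {"id": 2, "name": "Raj Patel", "email": ""},               # missing e-mail
    {"id": 1, "name": "Ann Lee", "email": "ann@example.com"},  # duplicate id
]


def audit(rows):
    """Data quality audit: count duplicated and incomplete records."""
    seen, duplicates, incomplete = set(), 0, 0
    for row in rows:
        if row["id"] in seen:
            duplicates += 1
        seen.add(row["id"])
        if not all(str(value).strip() for value in row.values()):
            incomplete += 1
    return {"total": len(rows), "duplicates": duplicates, "incomplete": incomplete}


def cleanse(rows):
    """Data cleansing: tidy name formatting and keep the first record per id."""
    cleaned = {}
    for row in rows:
        if row["id"] not in cleaned:
            cleaned[row["id"]] = {**row, "name": row["name"].strip().title()}
    return list(cleaned.values())


report = audit(records)
clean_records = cleanse(records)
print(report)
print(clean_records)
```

The audit only measures the problems. Cleansing removes the duplicate and fixes the formatting, while the missing e-mail still has to be obtained from the customer.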