Chapter 3: Database Support in Data Mining

Types of database systems

How they relate to data mining

Contents

Describes data warehousing and related database systems.

Discusses features of the data found in data warehouses.

Describes how data warehouses are typically implemented and operated.

Defines metadata in the context of data warehouses.

Shows how different data systems are typically used in data mining.

Provides real examples of database systems used in data mining.

Discusses the concept of data quality.

Reviews the database software market.

Data management

Retail organizations generate masses of data that require very advanced data storage systems.

Wal-Mart relied on modern data management to support its supply chain management (SCM).

The manipulation of data is a key element in the data mining process.

Data mining and other analyses can draw upon data collected in internal systems and from external sources.

Data access

Data warehouses are not required for data mining, but they store massive amounts of data that can be used for data mining.

Data mining analyses also use smaller sets of data that can be organized in online analytic processing (OLAP) systems or in data marts.

OLAP provides access to report generators and graphical support.

Contemporary Database

Gain competitive advantage: customer information systems, data mining

Develop and market new products: micromarketing

Systems

Database: personal, small business level

On-Line Analytic Processing (OLAP): ability to use many dimensions, reports & graphics

Data Mart: usually temporary analysis

Data Warehouse: usually permanent repository

Data Warehousing

Price Waterhouse definition: A data warehouse is an orderly and accessible repository of known facts and related data that is used as a basis for making better management decisions. The data warehouse provides a unified repository of consistent data for decision making that is subject oriented, integrated, time variant, and nonvolatile.

Data Warehousing

Data warehouses are used to store massive quantities of data that can be updated and allow quick retrieval of specific types of data.

Not just a technology; an architecture and process designed to support decision making

Special-purpose database systems to improve query performance significantly

Three general data warehouse processes:

1. Warehouse generation is the process of designing the warehouse and loading the data.

2. Data management is the process of storing the data.

3. Information analysis is the process of using the data to support organizational decision making.

Benefits from Data Warehousing

Provide business users views of data appropriate to mission

Consolidate & reconcile (consistent) data

Give macro views of critical aspects

Timely & detailed access to information

Provide specific information to particular groups

Ability to identify trends

Data warehousing

Within data warehouses, data is classified and organized around subjects meaningful to the company.

The data is gathered from operational systems:

Barcode readers at cash registers

Information from e-commerce

Daily reports…

Industry volumes

Economic data…

Data from different sources (shipping, marketing, billing) are integrated into a common format.

Data Transformation

Consolidate data from multiple sources

Filter to eliminate unnecessary details

Clean data: eliminate incorrect entries, eliminate duplications

Convert & translate data into proper format

Aggregate data as designed
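
The transformation steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular warehouse product: the two source lists and the field names (`customer`, `product`, `amount`, `note`) are hypothetical, and treating identical rows as duplicates is a simplification.

```python
from collections import defaultdict

# Hypothetical records from two source systems (all field names are assumptions).
billing = [
    {"customer": "Acme", "product": "D428", "amount": "120.50", "note": "rush"},
    {"customer": "Acme", "product": "D428", "amount": "120.50", "note": "rush"},  # duplicate
    {"customer": "Bolt", "product": "D428", "amount": "-5", "note": ""},          # incorrect entry
]
shipping = [
    {"customer": "Bolt", "product": "D430", "amount": "80", "note": "fragile"},
]


def transform(*sources):
    # Consolidate data from multiple sources into one list.
    records = [row for source in sources for row in source]
    # Filter out unnecessary detail (here, the free-text note).
    records = [{k: v for k, v in row.items() if k != "note"} for row in records]
    seen, cleaned = set(), []
    for row in records:
        # Convert the amount from text into a number.
        amount = float(row["amount"])
        key = (row["customer"], row["product"], amount)
        # Clean: skip incorrect (non-positive) entries and duplications.
        if amount <= 0 or key in seen:
            continue
        seen.add(key)
        cleaned.append({"customer": row["customer"], "product": row["product"], "amount": amount})
    # Aggregate as designed (here, total amount per product).
    totals = defaultdict(float)
    for row in cleaned:
        totals[row["product"]] += row["amount"]
    return dict(totals)


totals = transform(billing, shipping)
```

In a real warehouse the duplicate test would normally rely on a transaction identifier, since two genuine purchases can look identical.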

Data warehousing

A data warehouse is a central aggregation of data, intended as a permanent storage facility with normalized, formatted data.

"Normalized" implies the use of small, stable data structures within the database. Normalized data groups data elements by category, making it possible to apply relational principles in data updating.

Key Concepts

Scalability: ability to accurately cope with changing conditions (especially magnitude of computing)

Granularity: level of detail

Data warehouse: tends to be fine granularity

OLAP: tends to aggregate to coarse granularity
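
The difference between fine and coarse granularity can be illustrated with a small Python sketch. The rows and identifiers below are hypothetical; the point is only that fine-grained rows can be rolled up to a coarse summary, while the summary cannot be expanded back into the detail.

```python
from collections import defaultdict

# Fine granularity: one row per item, store, and day (hypothetical values).
sales = [
    ("item-1", "store-A", "2024-01-01", 3),
    ("item-1", "store-A", "2024-01-02", 5),
    ("item-1", "store-B", "2024-01-01", 2),
    ("item-2", "store-A", "2024-01-01", 7),
]


def roll_up(rows):
    """Aggregate per-store, per-day rows into one total per item (coarse granularity)."""
    totals = defaultdict(int)
    for item, _store, _day, units in rows:
        totals[item] += units
    return dict(totals)


by_item = roll_up(sales)
```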

Data Warehousing

| OLAP | On-Line Transactional Processing (OLTP) |
|---|---|
| Summary data | Detailed operational data |
| Few users | Many concurrent users |
| Data driven | Transaction driven |
| Effectiveness | Efficiency |
| Use spreadsheets to access | |

Data Marts

Intermediate-level database system

Originally, many data marts were marketed as preliminary data warehouses. Currently, many data marts are used in conjunction with data warehouses rather than as competitive products.

Data marts are usually used as repositories of data gathered to serve a particular set of users, providing data extracted from data warehouses and/or other sources.

Often used as temporary storage

Gather data for study from data warehouse, other sources (including external)

Clean & transform for data mining

OLAP

Multidimensional spreadsheet approach to shared data storage, designed to allow users to extract data and generate reports on the dimensions important to them.

Data is segregated into different dimensions and organized in a hierarchical manner.

Hypercube: term reflecting the ability to sort on many dimensions

Many forms:

MOLAP: multidimensional

ROLAP: relational (uses SQL)

DOLAP: desktop

WOLAP: web enabled

HOLAP: hybrid
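
As a rough illustration of reporting on chosen dimensions, the following Python sketch totals a measure over any subset of dimensions. It is a toy stand-in for a hypercube, not how any OLAP product is implemented; the dimension names (`region`, `product`, `quarter`) and the fact rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, revenue).
facts = [
    ("East", "D428", "Q1", 100),
    ("East", "D430", "Q1", 40),
    ("West", "D428", "Q1", 60),
    ("West", "D428", "Q2", 90),
]
DIMENSIONS = ("region", "product", "quarter")


def report(rows, *dims):
    """Total revenue grouped by the dimensions the user selects."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[p] for p in positions)
        totals[key] += row[-1]  # the last column is the revenue measure
    return dict(totals)


by_region = report(facts, "region")
by_product_quarter = report(facts, "product", "quarter")
```

The same fact rows answer both questions; only the choice of dimensions changes.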

OLAP

One function of OLAP is standard report generation, including financial performance analysis on selected dimensions (such as by department, geographical region, product, salesperson, time…).

OLAP supports planning and forecasting projects using spreadsheet analytic tools.

An OLAP product includes a data warehouse, an OLAP server, and a client server on a local area network (LAN).

OLAP functions: see page 37.

Relationships of database and DM

Data warehouses are not required for data mining, nor are OLAP systems.

However, the existence of either presents many opportunities for data mining.

Data Warehouse Implementation

Data warehouses create the opportunity to provide much better information than what was available in the past. DW can produce consistent views of events and reports.

A DW provides a reliable, comprehensive source of clean data: accurate, complete, in correct format

Processes:

System development

Data acquisition

Data extraction for use

Data Warehouse Implementation

Implementing these processes involves a degree of continuity, since data warehousing is a dynamic environment.

A suite of software tools is needed to extract data from sources, move it to the data warehouse itself, and provide user access to this information.

Data acquisition is supported by data warehouse generation.

Data Warehouse Generation

Extract data from sources

Transform

Clean

Load into data warehouse

60-80% of effort in operating data warehouse

Data Extraction Routines

Extraction programs are executed periodically to obtain records and copy the information to an intermediate file.

Data extraction routines:

Interpret data formats

Identify changed records

Copy information to intermediate file
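
A periodic extraction run of this kind might look like the Python sketch below. It assumes, purely for illustration, that each source record carries a `modified` date in ISO format, which is one common way to identify changed records; the field names and the in-memory stand-in for the intermediate file are hypothetical.

```python
import csv
import io


def extract_changed(source_rows, last_run, out_file):
    """Copy rows modified after last_run to an intermediate CSV file; return the count."""
    writer = csv.DictWriter(out_file, fieldnames=["id", "amount", "modified"])
    writer.writeheader()
    copied = 0
    for row in source_rows:
        # ISO dates (YYYY-MM-DD) sort correctly when compared as strings.
        if row["modified"] > last_run:
            writer.writerow(row)
            copied += 1
    return copied


source = [
    {"id": 1, "amount": 10, "modified": "2024-01-01"},
    {"id": 2, "amount": 20, "modified": "2024-01-05"},
    {"id": 3, "amount": 30, "modified": "2024-01-09"},
]
intermediate = io.StringIO()  # stands in for the intermediate file
copied = extract_changed(source, "2024-01-03", intermediate)
```

Sources without a modification date need another way to detect changes, such as comparing against the previous extract.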

Data Transformation

Transformation programs accomplish final data preparation, including:

The consolidation of data from multiple sources

Filtering data to eliminate unnecessary details

Cleaning data to eliminate incorrect entries or duplications

Converting and translating data into the format established for the data warehouse

The aggregation of data

Data Management

Data management involves:

Retrieving information from the data warehouse

Running extraction programs to generate repetitive reports and serve specific needs

Implementation problems:

Required data not available

Initial data warehouse scope too broad

Not enough time to do prototyping or needs analysis

Insufficient senior direction

Meta Data

Data warehouse management vs. data management:

Data management concerns the management of all of the enterprise’s data.

Data warehouse management refers to the design and operation of the data warehouse through all phases of its life cycle:

Manage meta data

Design data warehouse

Ensure data quality

Manage system during operations

Meta Data

Metadata is the set of reference data used to keep track of data, and it describes the organization of the warehouse.

A data catalog provides users with the ability to see specifically what the data warehouse contains.

The content of the data warehouse is defined by metadata, which provides business views of data (information access tools) and technical views (warehouse generation tools).

Business Metadata

What data are available

Source of each data element

Frequency of data updates

Location of specific data

Predefined reports & queries

Methods of data access

Technical Meta Data

Data source (internal or external)

Data preparation features (transformation & aggregation rules)

Logical structure of data

Physical structure & content

Data ownership

Security aspects (access rights, restrictions)

System information (date of last update, retention policy, data usage)
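
One way to picture a technical metadata entry is as a small record with a field for each item in the list above. The Python sketch below is hypothetical throughout: the class, its field names, and the sample values are assumptions chosen for illustration, not the catalog format of any real product.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetadata:
    """One catalog entry; every field name here is a hypothetical choice."""
    element: str
    source: str                  # data source: "internal" or "external"
    transformation_rule: str     # data preparation feature
    logical_type: str            # logical structure of the data
    physical_location: str       # physical structure & content
    owner: str                   # data ownership
    access_roles: list = field(default_factory=list)  # security aspects
    last_update: str = ""        # system information
    retention_days: int = 0      # system information


def can_read(entry, role):
    """Check a role against the access rights recorded for the entry."""
    return role in entry.access_roles


entry = TechnicalMetadata(
    element="weekly_revenue",
    source="internal",
    transformation_rule="sum of shipment revenue by week",
    logical_type="decimal",
    physical_location="sales.weekly_revenue",
    owner="finance",
    access_roles=["analyst", "buyer"],
    last_update="2024-01-07",
    retention_days=365,
)
```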

Wal-Mart’s Data Warehouse

Heavy user of IT

Core competency: supply chain distribution

2900 outlets

Data warehouse of 101 terabytes ($4 billion)

65 million transactions per week

Subject-oriented, integrated, time-variant, nonvolatile data

65 weeks of data by item, store, day

Wal-Mart

Use data warehouse to:

Support decision making

Buyers, merchandisers, logistics, forecasters

3,500 vendor partners can query

Can handle 35 thousand queries per week

Benefit $12,000 per query

Some users about 1 thousand queries per day

Summers Rubber Company

Distribution firm

7 operating locations

10,000 items

3,000 customers

Old system: OLAP

Databases transactional & summarized, distributed

Summers Data Storage System

Built in-house, PCs, Access database

Visual Basic & Excel

Distributed system

Data warehouse server controlled queries, managed resources

Security

Passwords gave some protection

To protect against departing employees, used data marts with small versions of central database

Summers – Negative features

Too much disk space on user local drives

Often difficult to understand & use

Updating multiple data sites slow, limited access

Summary data often wrong

Couldn’t use data mining tools

Problem was that the data was stored in aggregated form

Comparison

| Product | Use | Duration | Granularity |
|---|---|---|---|
| Warehouse | Repository | Permanent | Finest |
| Mart | Specific study | Temporary | Aggregate |
| OLAP | Report & analysis | Repetitive | Summary |

Examples of Data Uses

Customer information systems

Fingerhut

Customer Information Systems

Massive databases

Detailed information about individuals and households

Use automated analysis: identify focused market target

Micromarketing

Target small groups of highly responsive customers

Own niches like smaller competitors

Examples:

Great Atlantic & Pacific Tea Company (A&P): target customers, centralize buying

Fingerhut: sell on credit to households with <$25,000 income

System demonstrations

A dealer wholesaler.

A small portion of the data, the first 10 shipments, is shown in Table 3.1.

Data warehouses are normalized into relational form. The data is organized into a series of tables connected by keys.
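
To make "tables connected by keys" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, columns, and values are hypothetical and do not reproduce Table 3.1; they only show how normalized tables are joined on their keys to answer a question such as revenue per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (product_id TEXT PRIMARY KEY, price REAL);
    CREATE TABLE shipment (
        shipment_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        product_id  TEXT REFERENCES product(product_id),
        cases       INTEGER
    );
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Bolt');
    INSERT INTO product  VALUES ('D428', 10.0), ('D430', 25.0);
    INSERT INTO shipment VALUES (1, 1, 'D428', 4), (2, 2, 'D428', 1), (3, 1, 'D430', 2);
""")

# Join the shipment table to customer and product through their keys,
# then total revenue (cases * price) for each customer.
rows = conn.execute("""
    SELECT c.name, SUM(s.cases * p.price) AS revenue
    FROM shipment s
    JOIN customer c ON c.customer_id = s.customer_id
    JOIN product  p ON p.product_id  = s.product_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```

Each price and customer name is stored once, so an update touches a single row rather than every shipment that mentions it.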


Data mart

Examining the characteristics of customers who buy the products (advertising by mail, internet, …).

Data marts could extract the data and aggregate it in a form useful for data mining.

Table 3.2 shows entries that might be found in a data mart (for product D428 over a two-year interval).

OLAP

An OLAP application focuses more on analyzing trends or other aspects of organizational operations. It may obtain much of its information from the data warehouse, but extracts granular information.

This information could be accessed to make a report by product category (Table 3.3).


OLAP

Evaluating the value of each client to the firm.

Data can be aggregated within a data mart or on an OLAP system.

OLAP

Organizing volume according to the shipper.

Table 3.5 displays the number of cases for each shipper.

Data Quality

Data warehouse projects can fail. One of the most common reasons is the refusal of users to accept the validity of data obtained from a data warehouse, because of:

The corruption of data, or data missing from the original sources.

Failure of the software transferring data into or out of the data warehouse.

Failure of the data-cleansing process to resolve data inconsistencies.

The responsible staff must verify the integrity of data throughout the data loading and storing process.

Data integrity: Do not allow any meaningless, corrupt, or redundant data into the data warehouse.

Controls can be implemented prior to loading data, in the data migration, cleansing, transforming, and loading processes.

Data Quality

An example of multiple variations is illustrated in Table 3.6.

What are the variations?

1. Variations of the same customer

2. Misspellings

3. Correct spelling but with a more complete definition

Data Quality

Matching involves associating variables.

Software used to introduce new data into the data warehouse needs to check that the appropriate spelling and entry values are used. It also needs to match companies with addresses… and perform some maintenance.

Software tools to ensure data quality include:

The analysis of data for type

The construction of standardization schemes

The identification of redundant data

The adjustment of matching criteria to achieve selected levels of discrimination

The transformation of data into designed format
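
Two of the tool functions above, a standardization scheme and an adjustable matching criterion, can be sketched with Python's standard `difflib` module. This is an illustration only: the suffix list, the 0.85 threshold, and the company names are assumptions, and commercial data-quality tools use more elaborate rules.

```python
from difflib import SequenceMatcher

# Hypothetical list of company suffixes to ignore when comparing names.
SUFFIXES = {"inc", "co", "corp", "company"}


def standardize(name):
    """Simple standardization scheme: lowercase, drop punctuation and common suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    words = [w for w in cleaned.split() if w not in SUFFIXES]
    return " ".join(words)


def is_match(a, b, threshold=0.85):
    """Treat two names as the same customer when their similarity reaches the threshold."""
    ratio = SequenceMatcher(None, standardize(a), standardize(b)).ratio()
    return ratio >= threshold


same_company = is_match("Acme Rubber Co.", "ACME Rubber Company")
misspelled = is_match("Acme Ruber", "Acme Rubber")
different = is_match("Acme Rubber", "Zenith Steel")
```

Raising `threshold` makes the matching more discriminating (fewer false merges, more missed duplicates); lowering it does the opposite.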

Software products