Data Warehousing and Mining Original) (2)

  • 8/2/2019 Data Warehousing and Mining Original) (2)

    1/25

    Data Warehousing And Data Mining

    Presented By

    E. David Joshua

    & Pankaj Jain

    CSE Department

    Tirumala Engineering College

    Keesara, Bogaram.

    Topics To Be Discussed:

    Introduction

    History Of Data Warehousing

    Data Warehousing

    Data Warehouse Architecture

    Data Mining: KDD Process

    Classification of data mining systems

    Data mining Architecture

    Conclusion

    Introduction: Data Warehousing, OLAP (Online Analytical Processing) and Data Mining: what and why?

    Relation to OLTP (Online Transaction Processing)

    A producer wants to know.

    Data, Data everywhere

    yet ...

    I can't find the data I need:

    data is scattered over the network

    many versions, subtle differences

    I can't get the data I need:

    need an expert to get the data

    I can't understand the data I found:

    available data is poorly documented

    I can't use the data I found:

    results are unexpected

    data needs to be transformed from one form to another

    What are the users saying...

    Data should be integrated across the enterprise

    Summary data has a real value to the organization

    Historical data holds the key to understanding data over time

    What-if capabilities are required

    We need a special database!

    A single, complete and consistent store of data obtained from a variety of different sources and made available to end users in a way they can understand and use in a business context is called a DATA WAREHOUSE.

    History Of Data Warehousing:

    60s: Batch reports

    hard to find and analyze information

    inflexible and expensive, reprogram every new request

    70s: Terminal-based DSS (decision support systems) and EIS (executive information systems)

    still inflexible, not integrated with desktop tools

    80s: Desktop data access and analysis tools

    query tools, spreadsheets, GUIs

    easier to use, but only access operational databases

    90s: Data warehousing with integrated OLAP engines and tools

    What is Data Warehousing?

    Data -> Information

    A process of transforming data into information and making it available to users in a timely enough manner to make a difference.

    Simply put, it is a collection of various databases under a single roof.

    A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organizational decision making.

    Data Warehouse Architecture

    Three-Tier Architecture of Data Warehouse

    Client:-

    * Query specification

    * Data analysis

    * Data access

    Application/Data Mart Server:-

    * Summarizing

    * Filtering

    * Metadata

    DW Server:-

    * Data logic

    * Data services

    * Metadata

    * File services

    Constructing and Maintaining a Warehouse

    A good database schema must be designed to hold an integrated collection of data copied from various sources.

    Data is extracted from operational databases and external sources.

    The data is cleaned to minimize errors, and missing information is filled in when possible.

    The cleaned and transformed data is finally loaded into the warehouse.

    The transformation of data is typically accomplished by defining a relational view over the tables in the data sources.
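
    The extract, clean, transform and load steps can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (src_orders, wh_orders), the view name (clean_orders), the sample rows and the cleaning rules are all assumptions, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical operational source table with some dirty values.
    CREATE TABLE src_orders (order_id INTEGER, customer TEXT, amount REAL, city TEXT);
    INSERT INTO src_orders VALUES
        (1, ' alice ', 120.0, 'Hyderabad'),
        (2, 'BOB',      75.0, 'hyderabad'),
        (3, 'carol',    80.5, NULL);

    -- Transformation defined as a relational view over the source table:
    -- normalize names and city spelling, fill in a missing city with a marker.
    CREATE VIEW clean_orders AS
        SELECT order_id,
               LOWER(TRIM(customer))            AS customer,
               amount,
               COALESCE(UPPER(city), 'UNKNOWN') AS city
        FROM src_orders;

    -- Warehouse table that receives the cleaned, transformed rows.
    CREATE TABLE wh_orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, city TEXT);
    INSERT INTO wh_orders SELECT * FROM clean_orders;
""")

rows = conn.execute("SELECT * FROM wh_orders ORDER BY order_id").fetchall()
for row in rows:
    print(row)
```

    Keeping the transformation in a view means the cleaning rules live in one place and can be re-run on every load.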

    Data Warehouse vs. Data Marts

    [Figure: levels labeled Individually Structured, Departmentally Structured, and Organizationally Structured (Data Warehouse); axes labeled Data to Information and Less to More (History, Normalized, Detailed)]

    A data mart is the access layer of the data warehouse environment that is used to get data out to the users.

    Problems with Data Mart Centric Solution

    If you end up creating multiple warehouses, integrating them is a problem

    True Warehouse

    [Figure: Data Sources -> Data Warehouse -> Data Marts]

    Data Warehouse Pitfalls

    You are going to spend much time extracting, cleaning, and loading data.

    You are going to find problems with systems feeding the data warehouse.

    Your warehouse users will develop conflicting business rules.

    You will need to validate data not being validated by transaction processing systems.

    You are building a HIGH maintenance system.

    For a Successful Warehouse

    From day one, establish that warehousing is a joint user/builder project.

    Look closely at the data extracting, cleaning, and loading tools.

    Determine a plan to test the integrity of the data in the warehouse.

    From the start, get warehouse users in the habit of 'testing' complex queries.

    Data Mining

    What is data mining?

    Extracting or "mining" knowledge from large amounts of data.

    On what kinds of data can mining be done?

    Relational databases

    Data warehouses

    Transactional databases

    Advanced data and Information systems

    Data Mining: A KDD Process

    Data mining: the core of the knowledge discovery process.

    [Figure: Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation -> Knowledge]
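
    The KDD stages named on this slide can be sketched as a chain of small Python functions. The purchase records, the function names and the 0.5 support threshold are all hypothetical, and counting item pairs stands in for a real mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Two hypothetical sources of purchase records (invented for illustration).
store_a = [{"id": 1, "items": ["milk", "bread"]},
           {"id": 2, "items": ["milk", "bread", "eggs"]},
           {"id": 3, "items": None}]  # dirty record with no items
store_b = [{"id": 4, "items": ["bread", "eggs"]},
           {"id": 5, "items": ["milk", "bread"]}]


def clean(records):
    """Data cleaning: drop records whose item list is missing or empty."""
    return [r for r in records if r["items"]]


def integrate(*sources):
    """Data integration: combine the cleaned sources into one collection."""
    return [r for source in sources for r in source]


def select(records):
    """Selection: keep only the task-relevant attribute (the item sets)."""
    return [frozenset(r["items"]) for r in records]


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts


def evaluate(counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {pair: c / n_baskets for pair, c in counts.items()
            if c / n_baskets >= min_support}


baskets = select(integrate(clean(store_a), clean(store_b)))
knowledge = evaluate(mine(baskets), len(baskets))
print(knowledge)
```

    Each function corresponds to one box in the process, so any single stage can be replaced without touching the others.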

    Classification of data mining systems:

    [Figure: Data Mining at the confluence of Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines]

    What kinds of patterns can be mined?

    Concept/Class description

    Mining frequent patterns, associations and correlations

    Classification and prediction

    Cluster analysis

    Outlier analysis

    Evolution analysis

    Query Processing

    Indexing: Exploiting indexes to reduce scanning of data is of crucial importance.

    Bitmap Indexes

    Query: select * from customer where gender = 'F' and vote = 'Y'

    Customer

    gender   vote   gender = F   vote = Y   F AND Y
    M        Y      0            1          0
    F        Y      1            1          1
    F        Y      1            1          1
    F        N      1            0          0
    F        N      1            0          0
    M        N      0            0          0
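
    A minimal Python sketch of how the query above could be answered with bitmaps. It assumes the six gender values and six vote values on the slide line up row by row (an assumption), and the helper names are hypothetical. Each bitmap is stored as a plain integer whose bit i stands for row i.

```python
# Hypothetical six-row customer table; pairing the gender and vote values
# row by row is an assumption made for illustration.
genders = ["M", "F", "F", "F", "F", "M"]
votes = ["Y", "Y", "Y", "N", "N", "N"]


def build_bitmap_index(column):
    """Return {value: bitmap}; bit i of a bitmap is set if row i holds that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def rows_of(bitmap):
    """List the row ids whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


gender_index = build_bitmap_index(genders)
vote_index = build_bitmap_index(votes)

# gender = 'F' AND vote = 'Y' becomes one bitwise AND of two bitmaps,
# so no row of the table is scanned until the matches are known.
matches = rows_of(gender_index["F"] & vote_index["Y"])
print(matches)
```

    Bitmap indexes suit low-cardinality columns such as gender, where one bit vector per distinct value stays small.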

    Join Indexes

    A join index between a fact table and a dimension table correlates a

    dimension tuple with the fact tuples that have the same value on the

    common dimensional attribute.
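
    As an illustration only, a join index can be sketched in Python as a mapping from each dimension row to the positions of the fact rows that share its key. The stores and sales tables and the build_join_index helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical tables, invented for illustration.
stores = [  # dimension table: (store_id, city)
    (1, "Hyderabad"),
    (2, "Chennai"),
]
sales = [  # fact table: (sale_id, store_id, amount)
    (100, 1, 250),
    (101, 2, 400),
    (102, 1, 150),
]


def build_join_index(dimension, facts):
    """Map each dimension row position to the positions of matching fact rows."""
    by_key = defaultdict(list)
    for fact_pos, (_, store_id, _) in enumerate(facts):
        by_key[store_id].append(fact_pos)
    return {dim_pos: by_key.get(store_id, [])
            for dim_pos, (store_id, _) in enumerate(dimension)}


join_index = build_join_index(stores, sales)

# Total sales for the first store, read through the precomputed index
# instead of joining the two tables at query time.
total_first_store = sum(sales[pos][2] for pos in join_index[0])
print(join_index, total_first_store)
```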

    Data Mining and Data Warehousing

    The goal of a data warehouse is to support decision making with data.

    Data mining can be used in conjunction with a data warehouse to help

    with certain types of decisions.

    Data mining can be applied to operational databases with individual

    transactions.

    To make data mining more efficient, the data warehouse should have an aggregated or summarized collection of data.
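
    One way to picture such a summarized collection is a summary table built once from the detailed rows. The sketch below uses Python's built-in sqlite3 module; the sales and sales_summary tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical detailed fact table.
    CREATE TABLE sales (region TEXT, month TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('North', '2019-01', 100.0),
        ('North', '2019-01',  50.0),
        ('North', '2019-02',  70.0),
        ('South', '2019-01',  30.0);

    -- Summary table computed once, so later analysis reads the
    -- aggregates instead of rescanning every detailed row.
    CREATE TABLE sales_summary AS
        SELECT region, month, COUNT(*) AS n_sales, SUM(amount) AS total
        FROM sales
        GROUP BY region, month;
""")

summary = conn.execute(
    "SELECT region, month, n_sales, total FROM sales_summary ORDER BY region, month"
).fetchall()
for row in summary:
    print(row)
```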

    Goals of Data Mining and Knowledge Discovery

    Prediction:

    Identification:

    Classification:

    Optimization:

    CONCLUSION

    A data warehouse takes the organization's operational data, historical data and external data, consolidates it into a separately designed database, and manages it in a format that is optimized for end users to access and analyze.

    Data warehouse technology, together with online transaction processing and data mining, allows management to provide better customer service.

    Last but not least, the Internet has emerged as the largest data warehouse of unstructured and free-form data. New technologies are geared towards mining this great data warehouse.

    Thank You..!!

    Any Queries..??