Chapter 2:Chapter 2:Data Warehousing
Business Intelligence: Business Intelligence: A Managerial Perspective on A Managerial Perspective on
Analytics (3Analytics (3rdrd Edition) Edition)
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 22
Learning ObjectivesLearning Objectives Understand the basic definitions and
concepts of data warehouses Learn different types of data warehousing
architectures; their comparative advantages and disadvantages
Describe the processes used in developing and managing data warehouses
Explain data warehousing operations
(Continued…)(Continued…)
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 33
Learning ObjectivesLearning Objectives Explain the role of data warehouses in
decision support Explain data integration and the extraction,
transformation, and load (ETL) processes Describe real-time (a.k.a. right-time and/or
active) data warehousing Understand data warehouse administration
and security issues
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 44
Opening Vignette…Opening Vignette…
Isle of Capri Casinos Is Winning with Isle of Capri Casinos Is Winning with Enterprise Data WarehouseEnterprise Data Warehouse
Company backgroundProblem descriptionProposed solutionResultsAnswer & discuss the case questions.
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 55
Questions for the Opening VignetteQuestions for the Opening Vignette1. Why is it important for Isle to have an EDW?
2. What were the business challenges or opportunities that Isle was facing?
3. What was the process Isle followed to realize EDW? Comment on the potential challenges Isle might have had going through the process of EDW development.
4. What were the benefits of implementing an EDW at Isle? Can you think of other potential benefits that were not listed in the case?
5. Why do you think large enterprises like Isle in the gaming industry can succeed without having a capable data warehouse/business intelligence infrastructure?
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 66
Main Data Warehousing TopicsMain Data Warehousing Topics DW definition Characteristics of DW Data Marts ODS, EDW, Metadata DW Framework DW Architecture & ETL Process DW Development DW Issues
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 77
What is a Data Warehouse?What is a Data Warehouse? A physical repository where relational data
are specially organized to provide enterprise-wide, cleansed data in a standardized format
“The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 88
A Historical Perspective to A Historical Perspective to Data WarehousingData Warehousing
1970s 1980s 1990s 2000s 2010s
ü Mainframe computersü Simple data entry ü Routine reportingü Primitive database structuresü Teradata incorporated
ü Mini/personal computers (PCs)ü Business applications for PCsü Distributer DBMSü Relational DBMSü Teradata ships commercial DBsü Business Data Warehouse coined
ü Centralized data storageü Data warehousing was born ü Inmon, Building the Data Warehouse ü Kimball, The Data Warehouse Toolkit ü EDW architecture design
ü Exponentially growing data Web dataü Consolidation of DW/BI industry ü Data warehouse appliances emergedü Business intelligence popularizedü Data mining and predictive modelingü Open source softwareü SaaS, PaaS, Cloud Computing
ü Big Data analyticsü Social media analyticsü Text and Web Analyticsü Hadoop, MapReduce, NoSQLü In-memory, in-database
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 99
Characteristics of DWsCharacteristics of DWs Subject oriented Integrated Time-variant (time series) Nonvolatile Summarized Not normalized Metadata Web based, relational/multi-dimensional Client/server, real-time/right-time/active …
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 1010
Data MartData Mart
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 1111
Other DW ComponentsOther DW Components Operational data stores (ODS)
A type of database often used as an interim area for a data warehouse
Oper marts - an operational data mart. Enterprise data warehouse (EDW)
A data warehouse for the enterprise. Metadata
Data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its acquisition and use
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 1212
Application Case 2.1Application Case 2.1A Better Data Plan: Well-Established TELCOs A Better Data Plan: Well-Established TELCOs Leverage Data Warehousing and Analytics to Leverage Data Warehousing and Analytics to Stay on Top in a Competitive IndustryStay on Top in a Competitive Industry
Questions for DiscussionQuestions for Discussion§What are the main challenges for TELCOs?§How can data warehousing and data analytics help TELCOs in overcoming their challenges?§Why do you think TELCOs are well suited to take full advantage of data analytics?
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 1313
A Generic DW FrameworkA Generic DW Framework
DataSources
ERP
Legacy
POS
OtherOLTP/wEB
External data
Select
Transform
Extract
Integrate
Load
ETL Process
EnterpriseData warehouse
Metadata
Replication
A P
I
/ M
iddl
ewar
e Data/text mining
Custom builtapplications
OLAP,Dashboard,Web
RoutineBusinessReporting
Applications(Visualization)
Data mart(Engineering)
Data mart(Marketing)
Data mart(Finance)
Data mart(...)
Access
No data marts option
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 1414
Application Case 2.2Application Case 2.2
Data Warehousing Helps MultiCare Data Warehousing Helps MultiCare Save More LivesSave More Lives
Questions for DiscussionQuestions for Discussion
1.What do you think is the role of data warehousing in healthcare systems?
2.How did MultiCare use data warehousing to improve health outcomes?
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 1515
DW ArchitectureDW Architecture Three-tier architecture
1. Data acquisition software (back-end)
2. The data warehouse that contains the data & software
3. Client (front-end) software that allows users to access and analyze data from the warehouse
Two-tier architectureFirst two tiers in three-tier architecture is combined into
one
… sometimes there is only one tier?
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 1616
DW ArchitecturesDW Architectures
Tier 2:Application server
Tier 1:Client workstation
Tier 3:Database server
Tier 1:Client workstation
Tier 2:Application & database server
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 1717
Data Warehousing Architectures Data Warehousing Architectures Issues to consider when deciding
which architecture to use: Which database management system (DBMS)
should be used? Will parallel processing and/or partitioning be
used? Will data migration tools be used to load the data
warehouse? What tools will be used to support data retrieval
and analysis?
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 1818
A Web-Based DW ArchitectureA Web-Based DW Architecture
WebServer
Client(Web browser)
ApplicationServer
Datawarehouse
Web pages
Internet/Intranet/Extranet
Alternative DW ArchitecturesAlternative DW Architectures
SourceSystems
Staging Area
Independent data marts(atomic/summarized data)
End user access and applications
ETL
SourceSystems
Staging Area
End user access and applications
ETL
Dimensionalized data marts linked by conformed dimensions
(atomic/summarized data)
SourceSystems
Staging Area
End user access and applications
ETL
Normalized relational warehouse (atomic data)
Dependent data marts(summarized/some atomic data)
(a) Independent Data Marts Architecture
(b) Data Mart Bus Architecture with Linked Dimensional Datamarts
(c) Hub and Spoke Architecture (Corporate Information Factory)
Alternative DW ArchitecturesAlternative DW Architectures
Each architecture has advantages and disadvantages!
Which architecture is the best?
SourceSystems
Staging Area
Normalized relational warehouse (atomic/some
summarized data)
End user access and applications
End user access and applications
Logical/physical integration of common data elements
Existing data warehousesData marts and legacy systems
ETL
Data mapping / metadata
(d) Centralized Data Warehouse Architecture
(e) Federated Architecture
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 2121
Ten factors that potentially affect the Ten factors that potentially affect the architecture selection decisionarchitecture selection decision
1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10.Social/political factors
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 2222
Teradata Corp. DW ArchitectureTeradata Corp. DW Architecture
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 2323
Data Integration and the Extraction, Data Integration and the Extraction, Transformation, and Load (ETL) Transformation, and Load (ETL) ProcessProcess ETL = Extract Transform Load Data integration
Integration that comprises three major processes: data access, data federation, and change capture.
Enterprise application integration (EAI)
A technology that provides a vehicle for pushing data from source systems into a data warehouse
Enterprise information integration (EII)
An evolving tool space that promises real-time data integration from a variety of sources, such as relational or multidimensional databases, Web services, etc.
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 2424
Data Integration and the Extraction, Data Integration and the Extraction, Transformation, and Load (ETL) Transformation, and Load (ETL) ProcessProcess
Packaged application
Legacy system
Other internal applications
Transient data source
Extract Transform Cleanse Load
Datawarehouse
Data mart
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 2525
ETL (Extract, Transform, Load) ETL (Extract, Transform, Load) Issues affecting the purchase of an ETL tool
Data transformation tools are expensive Data transformation tools may have a long learning
curve
Important criteria in selecting an ETL tool Ability to read from and write to an unlimited number of
data sources/architectures Automatic capturing and delivery of metadata A history of conforming to open standards An easy-to-use interface for the developer and the
functional user
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 2626
Data Warehouse DevelopmentData Warehouse DevelopmentData warehouse development approaches
Inmon Model: EDW approach (top-down) Kimball Model: Data mart approach
(bottom-up) Which model is best?
Table 2.3 provides a comparative analysis between EDW and Data Mart approachOne alternative is the hosted warehouse
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 2727
Application Case 2.5Application Case 2.5Starwood Hotels & Resorts Manages Starwood Hotels & Resorts Manages Hotel Profitability with Data WarehousingHotel Profitability with Data WarehousingQuestions for DiscussionQuestions for Discussion1.How big and complex are the business operations of Starwood Hotels & Resorts?2.How did Starwood Hotels & Resorts use data warehousing for better profitability?3.What were the challenges, the proposed solution, and the obtained results?
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 2828
Additional Data Warehouse Additional Data Warehouse Considerations Considerations Hosted Data WarehousesHosted Data Warehouses Benefits:
Requires minimal investment in infrastructure Frees up capacity on in-house systems Frees up cash flow Makes powerful solutions affordable Enables solutions that provide for growth Offers better quality equipment and software Provides faster connections … more in the book
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 2929
Representation of Data in DWRepresentation of Data in DW Dimensional Modeling
A retrieval-based system that supports high-volume query access
Star schema The most commonly used and the simplest style of
dimensional modeling Contain a fact table surrounded by and connected to
several dimension tables
Snowflakes schema An extension of star schema where the diagram
resembles a snowflake in shape
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 3030
The ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)Multidimensional presentation
Dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry
Measures: money, sales volume, head count, inventory profit, actual versus forecast
Time: daily, weekly, monthly, quarterly, or yearly
MultidimensionalityMultidimensionality
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 3131
Star versus Snowflake SchemaStar versus Snowflake Schema
Fact TableSALES
UnitsSold
...
DimensionTIME
Quarter
...
DimensionPEOPLE
Division
...
DimensionPRODUCT
Brand
...
DimensionGEOGRAPHY
Country
...
Fact TableSALES
UnitsSold
...
DimensionDATE
Date
...
DimensionPEOPLE
Division
...
DimensionPRODUCT
LineItem
...
DimensionSTORE
LocID
...
DimensionBRAND
Brand
...
DimensionCATEGORY
Category
...
DimensionLOCATION
State
...
DimensionMONTH
M_Name
...
DimensionQUARTER
Q_Name
...
Star Schema Snowflake Schema
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 3232
Analysis of Data in DWAnalysis of Data in DW OLTP vs. OLAP…
OLTP (online transaction processing) Capturing and storing data from ERP, CRM, POS, … The main focus is on efficiency of routine tasks
OLAP (Online analytical processing) Converting data into information for decision support Data cubes, drill-down / rollup, slice & dice, … Requesting ad hoc reports Conducting statistical and other analyses Developing multimedia-based applications …more in the book
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 3333
OLAP vs. OLTPOLAP vs. OLTP
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 3434
OLAP OperationsOLAP Operations Slice - a subset of a multidimensional array Dice - a slice on more than two dimensions Drill Down/Up - navigating among levels of data
ranging from the most summarized (up) to the most detailed (down)
Roll Up - computing all of the data relationships for one or more dimensions
Pivot - used to change the dimensional orientation of a report or an ad hoc query-page display
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 3535
OLAPOLAP
Product
Time
Geo
grap
hy
Sales volumes of a specific Product on variable Time and Region
Sales volumes of a specific Region on variable Time and Products
Sales volumes of a specific Time on variable Region and Products
Cells are filled with numbers representing
sales volumes
A 3-dimensional OLAP cube with slicing operations
Slicing Slicing Operations on Operations on a Simple Tree-a Simple Tree-DimensionalDimensionalData CubeData Cube
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 3636
Variations of OLAP Variations of OLAP Multidimensional OLAP (MOLAP)
OLAP implemented via a specialized multidimensional database (or data store) that summarizes transactions into multidimensional views ahead of time
Relational OLAP (ROLAP)
The implementation of an OLAP database on top of an existing relational database
Database OLAP and Web OLAP (DOLAP and WOLAP); Desktop OLAP,…
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 3737
Technology Insights 2.2Technology Insights 2.2Hands-On DW with MicroStrategyHands-On DW with MicroStrategy A wealth of teaching and learning
resources can be found at TUN portal
www.teradatauniversitynetwork.com
The available resources include scripted demonstrations, assignments, white papers, etc…
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 3838
DW Implementation IssuesDW Implementation Issues Identification of data sources and governance Data quality planning, data model design ETL tool selection Establishment of service-level agreements Data transport, data conversion Reconciliation process End-user support Political issues … more in the book
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 3939
Successful DW ImplementationSuccessful DW ImplementationThings to AvoidThings to Avoid Starting with the wrong sponsorship chain Setting expectations that you cannot meet Engaging in politically naive behavior Loading the data warehouse with information just
because it is available Believing that data warehousing database design
is the same as transactional database design Choosing a data warehouse manager who is
technology oriented rather than user oriented … more in the book
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 4040
Failure Factors in DW ProjectsFailure Factors in DW Projects Lack of executive sponsorship Unclear business objectives Cultural issues being ignored
Change management Unrealistic expectations Inappropriate architecture Low data quality / missing information Loading data just because it is available
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 4141
Massive DW and ScalabilityMassive DW and Scalability Scalability
The main issues pertaining to scalability: The amount of data in the warehouse How quickly the warehouse is expected to
grow The number of concurrent users The complexity of user queries
Good scalability means that queries and other data-access functions will grow linearly with the size of the warehouse
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 4242
Real-Time/Active DW/BIReal-Time/Active DW/BI Enabling real-time data updates for
real-time analysis and real-time decision making is growing rapidly Push vs. Pull (of data)
Concerns about real-time BI Not all data should be updated continuously Mismatch of reports generated minutes apart May be cost prohibitive May also be infeasible
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 4343
Enterprise Decision Evolution Enterprise Decision Evolution
and Data Warehousingand Data Warehousing
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 4444
Real-Time/Active DW at TeradataReal-Time/Active DW at Teradata
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 4545
Traditional versus Active DWTraditional versus Active DW
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 4646
DW Administration and SecurityDW Administration and Security Data warehouse administrator (DWA)
DWA should… have the knowledge of high-performance software, hardware
and networking technologies possess solid business knowledge and insight be familiar with the decision-making processes so as to suitably
design/maintain the data warehouse structure possess excellent communications skills
Security and privacy is a pressing issue in DW Safeguarding the most valuable assets Government regulations (HIPAA, etc.) Must be explicitly planned and executed
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 4747
The Future of DWThe Future of DW Sourcing…
Web, social media, and Big Data Open source software SaaS (software as a service) Cloud computing
Infrastructure… Columnar Real-time DW Data warehouse appliances Data management practices/technologies In-database & In-memory processing New DBMS Advanced analytics …
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 4848
Free of Charge DW Portal Free of Charge DW Portal for Teaching & Learningfor Teaching & Learning www.TeradataUniversityNetwork.com Password to signup: <check with your instructor>
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 4949
End of the ChapterEnd of the Chapter
Questions, comments
Copyright © 2014 Pearson Education, Inc. Copyright © 2014 Pearson Education, Inc. Slide 2- Slide 2- 5050
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the
United States of America.