1
Sharif University
Data Warehouse
2
Objectives
• Need for Data Warehouse.
• What is Data Warehouse?
• Data Warehouse Properties.
• Data Warehouse Architectures.
• Data Marts.
• Corporate Information Factory.
• Extraction, Transportation, Loading and Transformation.
• Design in Data Warehouses.
• Data Warehousing Schemas.
3
Decision support questions that enterprises need to have answered
• How did sales representatives perform over different periods of time?
• What are the popular products?
• What types of customers buy what types of products?
• How much are the various internal organizations spending on what products?
4
Cont.
• What were the variances between the amounts budgeted and the amounts spent?
• What positions are being filled by people with what types of background?
• What is the average pay for people within different age brackets?
5
What is a Data Warehouse?
• A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing.
• A common way of introducing data warehousing is to refer to the characteristics of a data warehouse as set forth by William Inmon:
– Subject Oriented
– Integrated
– Nonvolatile
– Time Variant
6
Data Warehouse Properties
[Figure: the data warehouse surrounded by its four properties: Subject Oriented, Integrated, Non Volatile, Time Variant]
7
Subject Oriented
• Data is categorized and stored by business subject rather than by application.
• For example, to learn more about your company's sales data, you might ask: "Who was our best customer for this item, in this region last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
[Figure: Operational Systems next to a Data Warehouse Subject Area; labels: Region, Time, Customer, Product, Customer Financial Information]
8
Integrated
Data warehouses must put data from disparate sources into a consistent format.
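A minimal sketch of what putting data into a consistent format can look like, assuming two hypothetical source systems that encode gender and dates differently. The field names, the code mappings, and the `integrate` helper are illustrative assumptions and do not come from the slides.

```python
from datetime import datetime

# Hypothetical mappings: each source system encodes gender differently.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}

# Hypothetical date layouts used by the source systems.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def integrate(record):
    """Return a copy of `record` in the warehouse's single consistent format."""
    gender = GENDER_CODES[str(record["gender"]).strip().lower()]
    for fmt in DATE_FORMATS:
        try:
            joined = datetime.strptime(record["joined"], fmt).date().isoformat()
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unrecognized date: {record['joined']!r}")
    return {
        "customer": record["customer"].strip().title(),
        "gender": gender,
        "joined": joined,
    }


crm_row = {"customer": "ada lovelace ", "gender": "female", "joined": "2003-01-15"}
billing_row = {"customer": "ALAN TURING", "gender": 1, "joined": "15/01/2003"}
integrated = [integrate(crm_row), integrate(billing_row)]
```

Both rows come out with the same name casing, gender code, and ISO date, whichever system they came from.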
9
Time Variant (time series)
• Data is stored as a series of snapshots, each representing a period of time.
[Figure: the data warehouse holds one snapshot per period: Jan/03 (data for January), Feb/03 (data for February), Mar/03 (data for March)]
10
Non Volatile
• Typically data in the data warehouse is not updated or deleted.
• Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.
[Figure: operational databases support INSERT, UPDATE, DELETE, and Read; the warehouse database supports only Load and Read]
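The time-variant and non-volatile properties can be sketched together as an append-only store of period snapshots. This is only an illustration: the `SnapshotStore` class and its method names are hypothetical, and a real warehouse would enforce the same rule through its load process and database permissions.

```python
class SnapshotStore:
    """Append-only store: one snapshot per period, never updated or deleted."""

    def __init__(self):
        self._snapshots = {}  # period label -> tuple of rows

    def load(self, period, rows):
        # Loading a period twice would overwrite history, so it is refused.
        if period in self._snapshots:
            raise ValueError(f"snapshot for {period} already loaded")
        self._snapshots[period] = tuple(dict(r) for r in rows)

    def read(self, period):
        # Hand out copies so callers cannot change the stored history.
        return [dict(r) for r in self._snapshots[period]]

    def periods(self):
        return list(self._snapshots)


store = SnapshotStore()
store.load("Jan/03", [{"product": "A", "units": 10}])
store.load("Feb/03", [{"product": "A", "units": 12}])

rows = store.read("Jan/03")
rows[0]["units"] = 999  # changes only the caller's copy
january_units = store.read("Jan/03")[0]["units"]
```

The store exposes only `load` and `read`, mirroring the two operations the warehouse side of the slide allows.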
11
Other Characteristics of Data Warehouse
• Summarized
• Not Normalized
• Meta Data
• Sources (both operational and external data are present)
12
Summary Data
– Provide fast access to pre-computed data
– Reduce use of
• I/O
• CPU
• Memory
– Distill from
• Source systems - lightly summarized
• Pre-calculated summaries - highly summarized
– Determine requirements early
13
Summary Data
• Average
• Maximum
• Total
• Percentage
[Figure: a summary table with dimension data (Store; Product A, Product B, Product C) and fact data (Units Sold, Sales ($)), with a Total row for each product]
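As a rough sketch of how such summaries can be pre-computed from detail rows, the example below derives the total, average, maximum, and percentage figures per product. The detail rows and the `summarize` helper are made-up illustrations, not data from the slides.

```python
from collections import defaultdict

# Hypothetical detail rows: (product, store, units_sold, sales_dollars)
detail = [
    ("Product A", "Store 1", 10, 100.0),
    ("Product A", "Store 2", 30, 300.0),
    ("Product B", "Store 1", 20, 400.0),
]


def summarize(rows):
    """Pre-compute total, average, maximum, and share of sales per product."""
    by_product = defaultdict(list)
    for product, _store, units, dollars in rows:
        by_product[product].append((units, dollars))
    grand_total = sum(dollars for _p, _s, _u, dollars in rows)
    summary = {}
    for product, facts in by_product.items():
        dollars = [d for _u, d in facts]
        summary[product] = {
            "total_units": sum(u for u, _d in facts),
            "total_sales": sum(dollars),
            "average_sales": sum(dollars) / len(dollars),
            "maximum_sales": max(dollars),
            "percentage_of_sales": 100.0 * sum(dollars) / grand_total,
        }
    return summary


summary = summarize(detail)
```

Queries can then read `summary` directly instead of rescanning the detail rows, which is where the savings in I/O, CPU, and memory come from.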
14
Summary Data
[Figure: a summary fact table (derived) linked to the Time, Product, and Store dimensions]
15
Normalization
– Normalized data contains no
• Redundancy.
• Repeating data.
• Key independent columns.
– Denormalized data often
• Improves efficiency in OLAP systems.
• Exists in data warehouse databases.
• Comprises derived or summary data.
– Star and snowflake models are denormalized.
16
Meta Data (Data about Data)
Provides information about the content of the warehouse.
Meta Data includes:
• A guide to moving data to the warehouse
• Rules for summarization
• Business terms used to describe data
• Technical terminology
• Rules for data extractions
17
Data Warehouse Architectures
• Data Warehouse Architecture (Basic)
• Data Warehouse Architecture (with a Staging Area)
• Data Warehouse Architecture (with a Staging Area and Data Marts)
18
Data Warehouse Architecture (Basic)
• End users directly access data derived from several source systems through the data warehouse.
19
Data Warehouse Architecture (with a Staging Area)
• You need to clean and process your operational data before putting it into the warehouse. You can do this programmatically, although most data warehouses use a staging area instead.
20
Data Warehouse Architecture (with a Staging Area and Data Marts)
• You may want to customize your warehouse's architecture for different groups within your organization. You can do this by adding data marts, which are systems designed for a particular line of business.
21
Data Marts
A Data Mart is a small warehouse designed for a strategic business unit or a department.
Data Mart Advantages:
• The cost is low.
• Implementation time is shorter.
• They are controlled locally rather than centrally.
• They contain less information than the data warehouse and hence have more rapid response.
• They allow a business unit to build its own DSS without relying on a centralized IS department.
Data Mart Types:
• Replicated Data Marts.
• Stand-alone Data Marts.
22
Corporate Information Factory
[Figure: Corporate Information Factory diagram. Components shown: Operational Systems (External, ERP, Internet, Legacy, Other) reached through APIs; Data Acquisition; Operational Data Store and Data Warehouse; CIF Data Management; Data Delivery; Exploration Warehouse, Data Mining Warehouse, OLAP Data Mart, and Oper Mart; Decision Support Interfaces (DSI) and a Transactional Interface (TrI); Information Feedback; Information Workshop (Library & Toolbox, Workbench); Meta Data Management; Operation & Administration (Systems Management, Data Acquisition Management, Service Management, Change Management). The same diagram is repeated on the following slides.]
23
Major Business Functions
• Business Operations
• Business Intelligence
• Business Management
24
Operational Systems
Operational Systems are the internal and external core systems that run the day-to-day business operations. They are accessed through application program interfaces (APIs) and are the source of data for the data warehouse and operational data store.
25
External Data
External Data is any data outside the normal data collected through an enterprise's internal applications. Generally, external data, such as demographic, credit, competitor, and financial information, is purchased by the enterprise from a vendor of such information.
26
Data Acquisition
Data Acquisition is the set of processes that capture, integrate, transform, cleanse, and load source data into the data warehouse and operational data store.
27
Data Problems
28
Data Warehouse
The Data Warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data used to support the strategic decision-making process for the enterprise.
29
Operational Data Store
The Operational Data Store is a subject-oriented, integrated, current, volatile collection of data used to support the tactical decision-making process for the enterprise.
30
Comparing an Operational Data Store and a Data Warehouse
31
CIF Data Management
CIF Data Management is the set of processes that protect the integrity and continuity of the data within and across the data warehouse and operational data store. It may employ a staging area for cleansing and synchronizing data.
32
Transactional Interface
The Transactional Interface is an easy-to-use and intuitive interface for the end user to access and manipulate data in the operational data store.
33
Data Delivery
Data Delivery is the set of processes that enables end users and their supporting IT groups to filter, format, and deliver data to data marts and oper-marts.
34
Exploration Warehouse
The Exploration Warehouse is a data mart whose purpose is to provide a safe haven for exploratory and ad hoc processing. An exploration warehouse may utilize specialized technologies to provide fast response times with the ability to access the entire database.
35
Data Mining Warehouse
The Data Mining Warehouse includes tasks known as knowledge extraction, data archaeology, data exploration, data pattern processing and data harvesting.
36
OLAP Data Mart
The OLAP (online analytical processing) Data Mart is aggregated and/or summarized data that is derived from the data warehouse and tailored to support the multidimensional requirements of a given business unit or business function.
37
Oper-Mart
The Oper-Mart is a subset of data derived from the operational data store, used in tactical analysis and usually stored in a multidimensional manner (star schema or hypercube). Oper-marts may be created in a temporary manner and dismantled when no longer needed.
38
Decision Support Interface
The Decision Support Interface is an easy-to-use, intuitive tool to enable end user capabilities such as exploration, data mining, OLAP, query, and reporting to distill information from data.
39
Meta Data Management
Meta Data Management is the set of processes for managing the information needed to promote data legibility, use, and administration.
40
Information Feedback
Information Feedback is the set of processes that transmit the intelligence gained through usage of the Corporate Information Factory to appropriate data stores.
41
Information Workshop
Information Workshop is the set of facilities that optimize use of the Corporate Information Factory by organizing its capabilities and knowledge, and then assimilating them into the business process.
42
Library and Toolbox
The Library and Toolbox is the collection of meta data and capabilities that provides information to effectively use and administer the Corporate Information Factory. The library provides the medium from which knowledge is enriched. The toolbox is a vehicle for organizing, locating, and accessing capabilities.
43
Workbench
The Workbench is a strategic mechanism for automating the integration of capabilities and knowledge into the business process.
44
Operation and Administration
Operation and Administration is the set of activities required to ensure smooth daily operations, to ensure that resources are optimized, and to ensure that growth is managed.
45
Systems Management
Systems Management is the set of processes for maintaining, versioning, and upgrading the core technology on which the data, software, and tools operate.
46
Data Acquisition Management
Data Acquisition Management is the set of processes that manage and maintain the processes used to capture source data and prepare it for loading into the data warehouse or operational data store.
47
Service Management
Service Management is the set of processes for promoting user satisfaction and productivity within the Corporate Information Factory. It includes processes that manage and maintain service level agreements, requests for change, user communications, and the data delivery mechanisms.
48
Change Management
Change Management is the set of processes coordinating modifications to the Corporate Information Factory.
49
Extraction, Transportation, Loading and Transformation (ETL)
[Figure: OLTP Databases, then Staging File, then Warehouse Database]
Purchase specialist tools, or develop programs
• Extraction - select data using different methods
• Transportation - move data into the warehouse
• Loading and Transformation - validate, clean, integrate, and time stamp data
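The three steps above can be sketched as a small hand-written pipeline, as one possible reading of "develop programs". Everything here is an assumption made for illustration: the sample OLTP rows, the in-memory CSV standing in for the staging file, and the function names.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical OLTP rows; the second one is invalid (missing amount).
oltp_rows = [
    {"order_id": "1", "customer": " ada ", "amount": "10.50"},
    {"order_id": "2", "customer": "alan", "amount": ""},
    {"order_id": "3", "customer": "grace", "amount": "7.25"},
]


def extract(rows):
    """Extraction: select the rows to move (here, all of them)."""
    return list(rows)


def transport(rows):
    """Transportation: write rows to a staging file (an in-memory CSV here)."""
    staging = io.StringIO()
    writer = csv.DictWriter(staging, fieldnames=["order_id", "customer", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    staging.seek(0)
    return staging


def load_and_transform(staging, loaded_at):
    """Loading and transformation: validate, clean, and time stamp each row."""
    warehouse = []
    for row in csv.DictReader(staging):
        if not row["amount"]:  # validate: skip rows without an amount
            continue
        warehouse.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),  # clean
            "amount": float(row["amount"]),
            "loaded_at": loaded_at,  # time stamp
        })
    return warehouse


loaded_at = datetime(2003, 1, 31, tzinfo=timezone.utc).isoformat()
warehouse = load_and_transform(transport(extract(oltp_rows)), loaded_at)
```

Keeping each step as a separate function mirrors the slide's split, so any one step could later be replaced by a specialist tool without touching the others.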
50
Data Quality - Importance
Ensure data is
• Relevant
• Useful
• Quality
• Accurate
• Accessible
Ensuring data quality is a large, time-consuming task.
[Figure: data from operational systems is changed, cleaned up, and restructured on its way into the warehouse]
51
An Example
[Figure labels: Customers, Browser (http://), Hollywood]
Sale 1/2/98 12:00:01 Ham Pizza $10.00
Sale 1/2/98 12:00:02 Cheese Pizza $15.00
Sale 1/2/98 12:00:02 Anchovy Pizza $12.00
Return 1/2/98 12:00:03 Anchovy Pizza - $12.00
Sale 1/2/98 12:00:04 Sausage Pizza $11.00
Sale 1/2/98 12:00:02 Anchovy Pizza $12.00
Return 1/2/98 12:00:03 Anchovy Pizza - $12.00
Sale 1/2/98 12:00:01 Ham Pizza $10.00
Sale 1/2/98 12:00:02 Cheese Pizza $15.00
Sale 1/2/98 12:00:04 Sausage Pizza $11.00
52
Extraction in Data Warehouses
• Logical Extraction Methods
– Full Extraction
• The data is extracted completely from the source system.
– Incremental Extraction
• At a specific point in time, only the data that has changed since a well-defined event back in history will be extracted.
• Physical Extraction Methods
– Online Extraction
• The data is extracted directly from the source system itself.
– Offline Extraction
• Flat files
• Dump files
• Redo and archive logs
• Transportable tablespaces
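The difference between the two logical extraction methods can be illustrated with a short sketch. It assumes each source row carries a last-modified timestamp, which is only one of several ways to detect changes; the sample rows and function names are hypothetical.

```python
from datetime import datetime

# Hypothetical source table with a last-modified timestamp per row.
source = [
    {"id": 1, "name": "ham pizza", "modified": datetime(2003, 1, 5)},
    {"id": 2, "name": "cheese pizza", "modified": datetime(2003, 2, 10)},
    {"id": 3, "name": "sausage pizza", "modified": datetime(2003, 3, 1)},
]


def full_extraction(rows):
    """Full extraction: take every row; no change tracking is needed."""
    return list(rows)


def incremental_extraction(rows, since):
    """Incremental extraction: take only rows changed after `since`."""
    return [row for row in rows if row["modified"] > since]


last_extract = datetime(2003, 2, 1)  # the well-defined event in the past
full = full_extraction(source)
changed = incremental_extraction(source, last_extract)
```

Here `full` contains all three rows, while `changed` contains only the two rows modified after the previous extract.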
53
Changing Data
[Figure: operational databases feed the warehouse database through a first time load followed by repeated refreshes; warehouse data is later purged or archived]
54
Transportation in Data Warehouses
• Transportation Mechanisms in Data Warehouses
– Transportation Using Flat Files
– Transportation Through Distributed Operations
– Transportation Using Transportable Tablespaces
55
Transportation in Data Warehouses
• Transportation Using Flat Files
– The most common method for transporting data is by the transfer of flat files, using mechanisms such as FTP or other remote file system access protocols.
• Transportation Through Distributed Operations
– Distributed queries, either with or without gateways, can be an effective mechanism for extracting data. These mechanisms also transport the data directly to the target system.
• Transportation Using Transportable Tablespaces
– Some databases, such as Oracle and DB2, introduced an important mechanism for transporting data: transportable tablespaces. This feature is the fastest way of moving large volumes of data between two databases.
56
Loading and Transformation in Data Warehouses
• Loading Mechanisms
– SQL*Loader
– External Tables
– OCI and Direct-Path APIs
– Export/Import
• Transformation Mechanisms
– Transformation Using SQL
– Transformation Using PL/SQL
– Transformation Using Table Functions
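To give a feel for "Transformation Using SQL", the sketch below loads raw rows into a staging table and transforms them with a single INSERT ... SELECT. It uses Python's built-in sqlite3 module purely as a stand-in for the Oracle mechanisms listed above; the table names, columns, and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_sales (sold_on TEXT, product TEXT, amount TEXT);
    CREATE TABLE sales (sold_on TEXT, product TEXT, amount REAL);
""")

# Load: bulk-insert raw rows into the staging table.
raw_rows = [
    ("2003-01-02", " ham pizza ", "10.00"),
    ("2003-01-02", "CHEESE PIZZA", "15.00"),
    ("2003-01-02", "anchovy pizza", "not available"),
]
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", raw_rows)

# Transform using SQL: clean names, cast amounts, and drop invalid rows
# while moving data from staging into the warehouse table.
conn.execute("""
    INSERT INTO sales (sold_on, product, amount)
    SELECT sold_on, lower(trim(product)), CAST(amount AS REAL)
    FROM staging_sales
    WHERE amount GLOB '[0-9]*'
""")
conn.commit()

sales = conn.execute("SELECT product, amount FROM sales ORDER BY amount").fetchall()
```

Doing the transformation as one set-based statement keeps the work inside the database, which is the usual argument for SQL-based transformation over row-by-row processing.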
57
Incremental Development
– Focus on business functionality
– Deliver business benefit
– Are suited to warehouse evolution
– Once an increment is complete, the selection and scope of the next increment are defined
– Each increment follows the same phase sequence
[Figure: Strategy, followed by incremental development through the phases Definition, Analysis, Design, Build, Transition to Production, and Discovery, supported by Project and Program Management and the Enterprise Technical Architecture (ETA)]
58
Roles
– The project team: roles and responsibilities
– Common roles
• Analyst, Database Administrator, Programmer, Tester
– Warehouse specific roles
• DW Architect, Metadata Architect, Data Quality Administrator, DW Administrator
59
Design in Data Warehouses
• Logical Design in Data Warehouses
– Data Warehousing Schemas
• Star
• Snowflake
• Constellation
• Physical Design in Data Warehouses
– Physical Design Structures
• Tablespaces
• Tables and Partitioned Tables
• Views
• Integrity Constraints
• Dimensions
• Indexes and Partitioned Indexes
• Materialized Views
60
Data Warehousing Schemas
• Star
• Snowflake
• Constellation
61
Star Schema
• The center of the star consists of one or more fact tables and the points of the star are the dimension tables.
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...
Store Table: Store_id, District_id, ...
Item Table: Item_id, Item_desc, ...
Time Table: Day_id, Month_id, Period_id, Year_id
Product Table: Product_id, Product_desc, ...
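A simplified version of this star schema can be tried out with Python's built-in sqlite3 module. The sketch below keeps only the Store, Product, and Time dimensions, lower-cases the column names, calls the time table `time_dim`, and uses invented sample rows; all of these are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: the points of the star.
    CREATE TABLE store    (store_id INTEGER PRIMARY KEY, district_id INTEGER);
    CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_desc TEXT);
    CREATE TABLE time_dim (day_id INTEGER PRIMARY KEY, month_id INTEGER, year_id INTEGER);

    -- Fact table: the center of the star, one foreign key per dimension.
    CREATE TABLE sales_fact (
        product_id    INTEGER REFERENCES product(product_id),
        store_id      INTEGER REFERENCES store(store_id),
        day_id        INTEGER REFERENCES time_dim(day_id),
        sales_dollars REAL,
        sales_units   INTEGER
    );

    INSERT INTO store    VALUES (1, 10), (2, 20);
    INSERT INTO product  VALUES (1, 'Ham Pizza'), (2, 'Cheese Pizza');
    INSERT INTO time_dim VALUES (1, 1, 2003), (2, 2, 2003);
    INSERT INTO sales_fact VALUES
        (1, 1, 1, 10.0, 1), (2, 1, 1, 15.0, 1), (1, 2, 2, 20.0, 2);
""")

# A typical star query: join the fact table to its dimensions and aggregate.
sales_by_product_month = conn.execute("""
    SELECT p.product_desc, t.month_id, SUM(f.sales_dollars)
    FROM sales_fact AS f
    JOIN product  AS p ON p.product_id = f.product_id
    JOIN time_dim AS t ON t.day_id = f.day_id
    GROUP BY p.product_desc, t.month_id
    ORDER BY p.product_desc, t.month_id
""").fetchall()
```

Every dimension is exactly one join away from the fact table, which is what keeps star queries simple to write.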
62
Snowflake Schema
Sales Fact Table: Item_id, Store_id, Sales_dollars, Sales_units
Store Table: Store_id, Store_desc, District_id
District Table: District_id, District_desc
Item Table: Item_id, Item_desc, Dept_id
Dept Table: Dept_id, Dept_desc, Mgr_id
Mgr Table: Dept_id, Mgr_id, Mgr_name
Time Table: Week_id, Period_id, Year_id
Product Table: Product_id, Product_desc
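In the tables listed above, district details sit in their own District table rather than in the Store table. A small sqlite3 sketch of that one branch shows the practical effect on queries; the sample rows and lower-cased names are illustrative assumptions, and only the Store and District tables are modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The store dimension is normalized: district details live in their own table.
    CREATE TABLE district (district_id INTEGER PRIMARY KEY, district_desc TEXT);
    CREATE TABLE store (
        store_id    INTEGER PRIMARY KEY,
        store_desc  TEXT,
        district_id INTEGER REFERENCES district(district_id)
    );
    CREATE TABLE sales_fact (
        store_id      INTEGER REFERENCES store(store_id),
        sales_dollars REAL,
        sales_units   INTEGER
    );

    INSERT INTO district VALUES (10, 'North'), (20, 'South');
    INSERT INTO store VALUES (1, 'Store 1', 10), (2, 'Store 2', 10), (3, 'Store 3', 20);
    INSERT INTO sales_fact VALUES (1, 10.0, 1), (2, 15.0, 1), (3, 12.0, 1);
""")

# Rolling up to district level needs a second join, through store to district.
sales_by_district = conn.execute("""
    SELECT d.district_desc, SUM(f.sales_dollars)
    FROM sales_fact AS f
    JOIN store    AS s ON s.store_id = f.store_id
    JOIN district AS d ON d.district_id = s.district_id
    GROUP BY d.district_desc
    ORDER BY d.district_desc
""").fetchall()
```

The trade-off is less repeated data in the dimensions at the cost of longer join chains in queries.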
63
Constellation
Sales Fact Table: Item_id, Store_id, Sales_dollars, Sales_units
Inventory Fact Table: Product_id, Shelf_id, Cost_dollars, Qty_on_hand
Store Table: Store_id, District_id
Item Table: Item_id, Dept_id
Time Table: Week_id, Period_id, Year_id
Product Table: Product_id, Product_desc
Warehouse Table: Warehouse_id, Warehouse_loc
64
Summary
• Need for Data Warehouse.
• What is Data Warehouse?
• Data Warehouse Properties.
• Data Warehouse Architectures.
• Data Marts.
• Corporate Information Factory.
• Extraction, Transportation, Loading and Transformation.
• Design in Data Warehouses.
• Data Warehousing Schemas.
65
Q & A
[Figure: internal and external systems feed the data warehouse, which serves decision makers]