8/3/2019 DW Architecture & DataFlow
Architecture of Data Warehouse
By: Er. Manu Bansal
(Assistant Professor)
Dept. of IT
Data Warehouse- Concept
A data warehouse is a database that is maintained separately from an organization's operational databases.
The construction of data warehouses involves data cleaning, data integration, and data transformation.
Data warehousing also forms an essential step in the knowledge discovery process.
Data Warehouse vs. Database
Four keywords distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems:
Subject-oriented
Integrated
Time-variant
Nonvolatile
Three-Tier Architecture
[Figure: three-tier architecture. Labels: Data Sources (operational DBs, other sources); Extract, Transform, Load, Refresh; Monitor & Integrator; Metadata; Data Storage (data warehouse, data marts); Serve; OLAP Server (OLAP engine); Front-End Tools (analysis, query, reports, data mining).]
Typical Components of a Data Warehouse Architecture
Operational Data
Without source systems, there would be no data.
The data sources for the data warehouse are supplied as follows:
Operational data held in network databases
Departmental data held in file systems
Private data held on workstations and private servers
External systems such as the Internet, commercially available databases, or databases associated with an organization's suppliers or customers
Operational Data Store (ODS)
An ODS is a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact simply act as a staging area for data to be moved into the warehouse.
ODS objectives: to integrate information from day-to-day systems and allow operational lookup, relieving day-to-day systems of reporting and current-data analysis demands.
An ODS can be a helpful step towards building a data warehouse because it can supply data that has already been extracted from the source systems and cleaned.
Load Manager
Called the front-end component.
Data is extracted from the operational systems directly, or from the operational data store, and then loaded into the data warehouse.
Performs all the operations associated with the extraction and loading of data into the warehouse.
These operations include sourcing, acquisition, cleanup, and transformation, which prepare the data for entry into the warehouse. The functionality includes:
Removing unwanted data from operational databases
Converting to common data names and definitions
Calculating summaries
Establishing defaults for missing data
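The four load manager functions listed above can be illustrated with a minimal Python sketch. The field names, the name mapping, and the default values are all hypothetical; the slides do not prescribe any particular implementation.

```python
# Hypothetical mapping from source-specific column names to common names.
COMMON_NAMES = {"cust_no": "customer_id", "amt": "amount", "rgn": "region"}
# Hypothetical defaults for fields that may be missing in the source.
DEFAULTS = {"region": "UNKNOWN", "amount": 0.0}
# Hypothetical set of fields the warehouse wants to keep.
WANTED = {"customer_id", "amount", "region"}


def prepare(record):
    """Rename fields, drop unwanted ones, and fill defaults for one source row."""
    renamed = {COMMON_NAMES.get(key, key): value for key, value in record.items()}
    kept = {k: v for k, v in renamed.items() if k in WANTED and v is not None}
    return {**DEFAULTS, **kept}


def summarize(rows):
    """Calculate a simple summary: total amount per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


source_rows = [
    {"cust_no": 1, "amt": 120.0, "rgn": "north", "debug_flag": "x"},
    {"cust_no": 2, "amt": 80.0, "rgn": None},   # region missing
    {"cust_no": 3, "rgn": "north"},             # amount missing
]
prepared = [prepare(row) for row in source_rows]
summary = summarize(prepared)
```

Here `prepared` holds rows with common names and defaults applied, and `summary` holds one total per region.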
Warehouse Manager
Performs all the operations associated with the management of the data in the warehouse, as follows:
Analysis of data to ensure consistency
Transformation and merging of source data from temporary storage into the data warehouse tables
Creation of indexes and views
Backing up and archiving data
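As a rough illustration of these warehouse manager tasks, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table, index, and view names are assumptions made for the example, and SQLite merely stands in for whatever RDBMS the warehouse runs on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical temporary (staging) table and warehouse table.
    CREATE TABLE staging_sales (sale_id INTEGER, region TEXT, amount REAL);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO staging_sales VALUES
        (1, 'north', 120.0), (2, 'south', 80.0), (2, 'south', 80.0);
""")

# Consistency analysis: count keys that occur more than once in staging.
duplicate_keys = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT sale_id FROM staging_sales GROUP BY sale_id HAVING COUNT(*) > 1
    )
""").fetchone()[0]

# Merge staged rows into the warehouse table, skipping keys already present.
conn.execute("""
    INSERT OR IGNORE INTO sales
    SELECT DISTINCT sale_id, region, amount FROM staging_sales
""")
conn.execute("DELETE FROM staging_sales")

# Create an index and a view for later queries.
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
conn.execute("""
    CREATE VIEW sales_by_region AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region
""")

region_totals = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
```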
Data Warehouse Database
Central repository for information.
This database is almost always implemented on relational database management system (RDBMS) technology.
Certain data warehouse attributes, such as very large database size, ad hoc query processing, and the need for flexible user view creation including aggregates, multi-table joins, and drill-downs, have become drivers for different technology approaches to the data warehouse database. These approaches include:
Data Warehouse Database- Contd.
Parallel relational database designs that require a parallel computing platform, such as symmetric multiprocessors (SMPs) and massively parallel processors (MPPs)
Multidimensional databases (MDDBs)
Query Manager
Called the back-end component.
Performs all the operations associated with the management of user queries.
These include directing queries to the appropriate tables and scheduling the execution of queries.
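One way to picture the two query manager duties is the sketch below. It assumes a hypothetical catalogue of tables ordered from most to least summarized and a simple priority queue for scheduling; real query managers are considerably more elaborate.

```python
import heapq
from itertools import count

# Hypothetical tables, ordered from most summarized to most detailed,
# each with the columns it can answer questions about.
TABLES = [
    ("sales_by_year", {"year"}),
    ("sales_by_month", {"year", "month"}),
    ("sales_detail", {"year", "month", "day", "customer_id"}),
]


def route(requested_columns):
    """Direct a query to the most summarized table that has every column."""
    needed = set(requested_columns)
    for name, columns in TABLES:
        if needed <= columns:
            return name
    raise LookupError(f"no table provides {sorted(needed)}")


_submission_order = count()
pending = []  # heap of (priority, submission order, target table)


def submit(priority, requested_columns):
    """Schedule a query; a lower priority number runs earlier."""
    entry = (priority, next(_submission_order), route(requested_columns))
    heapq.heappush(pending, entry)


def run_next():
    """Pop the next scheduled query and return the table it targets."""
    _, _, table = heapq.heappop(pending)
    return table


submit(2, ["year", "month"])
submit(1, ["customer_id"])
submit(2, ["year"])
executed = [run_next() for _ in range(3)]
```

Queries with equal priority run in submission order, so the priority-1 query against the detail table runs first.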
Detailed Data
Stores all the detailed data in the database schema.
On a regular basis, detailed data is added to the warehouse to supplement the aggregated data.
Lightly and Highly Summarized Data
Stores all the predefined lightly and highly aggregated data generated by the warehouse manager.
The purpose of summary information is to speed up the performance of queries.
Building summaries adds work at load time; on the other hand, it removes the requirement to continually perform summary operations (such as sort or group by) in answering user queries.
The summarized data is updated continuously as new data is loaded into the warehouse.
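The idea of keeping summaries up to date as data is loaded can be sketched as follows. The choice of monthly totals as the "lightly" summarized level and yearly totals as the "highly" summarized level is an assumption for illustration only.

```python
from collections import defaultdict

detail = []                           # detailed data: (ISO date, amount)
monthly_totals = defaultdict(float)   # lightly summarized (assumed level)
yearly_totals = defaultdict(float)    # highly summarized (assumed level)


def load(rows):
    """Append detail rows and update both summary levels in the same pass."""
    for iso_date, amount in rows:
        year, month, _day = iso_date.split("-")
        detail.append((iso_date, amount))
        monthly_totals[(year, month)] += amount
        yearly_totals[year] += amount


load([("2019-01-05", 100.0), ("2019-01-20", 50.0), ("2019-02-01", 25.0)])
load([("2019-02-15", 75.0)])

# A yearly query now reads one pre-computed value instead of
# grouping and summing every detail row.
total_2019 = yearly_totals["2019"]
```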
Archive/Backup Data
Stores detailed and summarized data for the purposes of archiving and backup.
It may be necessary to back up online summary data if this data is kept beyond the retention period for detailed data.
The data is transferred to storage archives such as magnetic tape or optical disk.
Meta Data
This area of the warehouse stores all the metadata definitions used by all the processes in the warehouse.
Metadata is used for a variety of purposes:
Extraction and loading processes
Warehouse management process: used to automate the production of summary tables
Query management process: used to direct a query to the most appropriate data source
End-user access tools use metadata to understand how to build a query.
End-user Access Tools
Users interact with the warehouse using end-user access tools.
These tools can be categorized into five main groups:
Data reporting and query tools (Query by Example, MS Access DBMS)
Application development tools (applications used to access major database systems such as Oracle, Sybase)
Executive information system (EIS) tools (for sales, marketing, and finance)
Online analytical processing (OLAP) tools (allow users to analyze the data using complex and multidimensional views from multiple databases)
Data mining tools (allow the discovery of new patterns and trends by mining a large amount of data using statistical and mathematical tools)
Data Warehousing: Data flows
Inflow
The processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
Cleaning includes removing inconsistencies, adding missing fields, and cross-checking for data integrity.
Transformation includes adding date/time stamp fields, summarizing detailed data, and deriving new fields to store calculated data.
The relevant data is extracted from multiple, heterogeneous, and external sources (commercial tools are used).
It is then mapped and loaded into the warehouse.
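The cleaning and transformation steps of the inflow can be sketched in Python as below. The reference set of valid regions, the field names, and the derived `total` field are hypothetical choices made for the example.

```python
from datetime import datetime, timezone

VALID_REGIONS = {"north", "south"}  # hypothetical reference data


def clean(row):
    """Remove inconsistencies, add missing fields, cross-check integrity."""
    cleaned = dict(row)
    # Remove inconsistencies in spelling and whitespace.
    cleaned["region"] = (cleaned.get("region") or "").strip().lower()
    # Add a missing field with an assumed default.
    cleaned.setdefault("quantity", 1)
    # Cross-check against reference data.
    if cleaned["region"] not in VALID_REGIONS:
        raise ValueError(f"unknown region: {cleaned['region']!r}")
    return cleaned


def transform(row, loaded_at):
    """Add a date/time stamp field and derive a calculated field."""
    out = dict(row)
    out["loaded_at"] = loaded_at.isoformat()
    out["total"] = out["quantity"] * out["unit_price"]
    return out


stamp = datetime(2019, 8, 3, tzinfo=timezone.utc)
extracted = [
    {"region": " North ", "unit_price": 2.5, "quantity": 4},
    {"region": "south", "unit_price": 3.0},
]
loaded = [transform(clean(row), stamp) for row in extracted]
```

A row with a region outside the reference set raises `ValueError` rather than being loaded silently.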
Upflow
The process associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
Summarizing the data works by selecting, projecting, joining, and grouping relational data into views that are more convenient and useful to the end users.
Packaging the data involves converting the detailed or summarized information into more useful formats, such as spreadsheets, text documents, charts, other graphical presentations, private databases, and animation.
Distributing the data to appropriate groups increases its availability and accessibility.
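Summarizing and packaging can be illustrated with a small sketch: a view that selects, projects, joins, and groups relational data, followed by packaging the result as CSV text that a spreadsheet could open. The schema and the view name are assumptions, and SQLite is used only because it ships with Python.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical warehouse tables.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sales (customer_id INTEGER, amount REAL, status TEXT);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south');
    INSERT INTO sales VALUES
        (1, 100.0, 'paid'), (1, 50.0, 'paid'),
        (2, 70.0, 'paid'), (2, 30.0, 'cancelled');

    -- Summarizing: select (WHERE), project (column list),
    -- join (JOIN), and group (GROUP BY) into a view.
    CREATE VIEW paid_sales_by_region AS
    SELECT c.region AS region, SUM(s.amount) AS total
    FROM sales AS s
    JOIN customers AS c ON c.customer_id = s.customer_id
    WHERE s.status = 'paid'
    GROUP BY c.region;
""")

rows = conn.execute(
    "SELECT region, total FROM paid_sales_by_region ORDER BY region"
).fetchall()

# Packaging: convert the summarized rows into CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["region", "total"])
writer.writerows(rows)
packaged = buffer.getvalue()
```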
Downflow
The processes associated with archiving and backing up data in the warehouse.
Archiving: the effectiveness and performance of the warehouse are maintained by transferring older data of limited value to storage archives such as magnetic tape, optical disk, or other digital storage devices.
The downflow of data includes the processes that ensure the current state of the data warehouse can be rebuilt following data loss or software/hardware failures. Archived data should be stored in a way that allows the re-establishment of the data in the warehouse when required.
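A minimal sketch of the downflow, assuming rows carry an ISO date and that the archive is a JSON string standing in for tape or disk storage. It moves rows older than a cutoff out of the warehouse and shows that they can be re-established later.

```python
import json
from datetime import date


def archive_older_than(rows, cutoff):
    """Split rows into those kept online and a JSON archive of older rows."""
    keep, old = [], []
    for row in rows:
        target = old if date.fromisoformat(row["date"]) < cutoff else keep
        target.append(row)
    return keep, json.dumps(old)


def restore(rows, archive_text):
    """Re-establish archived rows in the warehouse, ordered by date."""
    return sorted(rows + json.loads(archive_text), key=lambda row: row["date"])


warehouse = [
    {"date": "2017-03-01", "amount": 10.0},
    {"date": "2019-06-15", "amount": 20.0},
    {"date": "2018-11-30", "amount": 30.0},
]
online, archive_text = archive_older_than(warehouse, date(2019, 1, 1))
rebuilt = restore(online, archive_text)
```

Storing the archive in a self-describing format is what makes the restore step possible without the original warehouse tables.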
Outflow
Involves the processes associated with making the data available to the end users.
This involves two activities: data accessing and delivering.
Data accessing is concerned with satisfying the end users' requests for the data they need. The main problem here is the creation of an environment in which the users can effectively use the query tools to access the most appropriate data source.
The delivering activity makes possible the delivery of information to the users' systems/workstations.
Metaflow
Metaflow is the set of processes that manage the metadata (data about the data).
Metadata is a description of the data contents of the data warehouse: what is in it, where it came from originally, and what has been done to it by way of cleansing, integrating, and summarizing.
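As an illustration of what such metadata might record, the sketch below keeps, for one warehouse table, its origin and the processing applied to it. The class name, its fields, and the example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Describes one warehouse table: what it is and where it came from."""
    name: str
    source: str
    steps: list = field(default_factory=list)

    def record(self, step):
        """Note one operation applied to the data (cleansing, integrating, ...)."""
        self.steps.append(step)

    def describe(self):
        history = ", ".join(self.steps) if self.steps else "no processing recorded"
        return f"{self.name}: from {self.source}; {history}"


meta = TableMetadata(name="sales_by_region", source="orders system")
meta.record("cleansed")
meta.record("integrated")
meta.record("summarized")
description = meta.describe()
```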
Thanks