
Data Warehousing Alex Ostrovsky CS157B Spring 2007



Introduction

► A data warehouse is the main repository of corporate data

► Multiple databases are employed, each for a specific purpose

► Contains raw events and unprocessed data, although separate tables might exist for processed information displaying meaningful data


What is it used for?

► Data analysis

► Data mining

► Complex queries with multiple table joins

► Forecasting

► Historical reporting

► OLAP (Online Analytical Processing)


High level view

[Figure: high level view]


Key Concepts and Features

► Data is not required to be heavily normalized

► Transaction processing is done mostly offline, so processing time is not very critical, although this might depend on the amount of data, normalization, query complexity, and application specifications.
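The point about normalization can be illustrated with a minimal sketch, assuming Python's built-in `sqlite3` module and made-up table names (`customer`, `orders`, `sales_flat`) that do not come from the slides. The warehouse copy repeats customer attributes on every row, so a report needs no join:

```python
import sqlite3

def build_denormalized_sales(conn):
    """Create a small normalized source and a flattened warehouse copy.

    All table and column names here are hypothetical.
    """
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customer VALUES (1, 'Acme', 'West'), (2, 'Globex', 'East');
        INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
        -- Warehouse copy: name and region are duplicated on each order row.
        CREATE TABLE sales_flat AS
            SELECT o.id AS order_id, c.name AS customer_name, c.region, o.amount
            FROM orders o JOIN customer c ON c.id = o.customer_id;
    """)

conn = sqlite3.connect(":memory:")
build_denormalized_sales(conn)

# The report reads one table only; the join was paid once, at load time.
region_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales_flat GROUP BY region")
)
```

The trade-off is redundancy: a customer's region is stored once per order, which is usually acceptable when rows are loaded in bulk and rarely updated.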


Key Concepts and Features (cont.)

► Unlike regular real-time OLTP databases, data is subject-oriented

► Non-volatile, i.e. data is essentially stored forever without being pruned or deleted.

► Heavily integrated: contains data from the majority of the organization’s applications

► Time-variant: most of the data has some time reference for the purpose of producing reports
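As a rough illustration of the non-volatile and time-variant properties, the sketch below keeps an append-only log in which every row carries a date and a subject. The `EventLog` class and its methods are hypothetical names chosen for this example, not part of any library:

```python
from collections import defaultdict
from datetime import date

class EventLog:
    """Append-only store: rows are added but never updated or deleted."""

    def __init__(self):
        self._rows = []

    def append(self, day, subject, amount):
        # Every row carries its own time reference.
        self._rows.append((day, subject, amount))

    def monthly_totals(self, subject):
        """Sum amounts for one subject, grouped by (year, month)."""
        totals = defaultdict(float)
        for day, row_subject, amount in self._rows:
            if row_subject == subject:
                totals[(day.year, day.month)] += amount
        return dict(totals)

log = EventLog()
log.append(date(2007, 1, 15), "sales", 100.0)
log.append(date(2007, 1, 20), "sales", 50.0)
log.append(date(2007, 2, 3), "sales", 75.0)
log.append(date(2007, 2, 9), "returns", 20.0)

sales_by_month = log.monthly_totals("sales")
```

Because nothing is pruned, the same historical report can be regenerated at any later date from the stored rows.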


Types of data warehousing DBs

► Offline operational database: similar to regular data replication. Used to minimize the impact of queries on a running primary operational system

► Offline data warehouse: heavily integrated, reporting-oriented warehouse databases which are updated with data from operational databases at regular time intervals


Types of data warehousing DBs (cont.)

► Real-time data warehouse: data is updated instantaneously, as soon as a transaction happens

► Integrated data warehouse: the database is integrated with the primary operational system for immediate decision making and reporting.
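The offline data warehouse type, refreshed at regular intervals, could be sketched as a batch load that copies only rows newer than what the warehouse already holds. This is one possible approach under stated assumptions: it uses Python's `sqlite3` module, two in-memory databases standing in for the operational system and the warehouse, and hypothetical table names (`txn`, `fact_txn`). A scheduler would call `load_new_rows` on each interval.

```python
import sqlite3

def load_new_rows(source, warehouse):
    """Copy transactions newer than the warehouse's highest loaded id.

    Returns the number of rows copied in this batch.
    """
    (last_id,) = warehouse.execute(
        "SELECT COALESCE(MAX(txn_id), 0) FROM fact_txn"
    ).fetchone()
    rows = source.execute(
        "SELECT id, amount FROM txn WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    warehouse.executemany(
        "INSERT INTO fact_txn (txn_id, amount) VALUES (?, ?)", rows
    )
    warehouse.commit()
    return len(rows)

# Stand-ins for the operational database and the warehouse.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE txn (id INTEGER PRIMARY KEY, amount REAL)")
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_txn (txn_id INTEGER PRIMARY KEY, amount REAL)")

source.executemany("INSERT INTO txn VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
first_batch = load_new_rows(source, warehouse)

source.execute("INSERT INTO txn VALUES (3, 30.0)")
second_batch = load_new_rows(source, warehouse)
```

A real-time warehouse would instead push each transaction as it happens; the batch form shown here keeps reporting load off the operational system between runs.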


Benefits of Data Warehousing

► No need to stress the operational database with complex queries

► Separation of processing and business logic

► Very flexible: multiple distinct relations can be defined from a set of data

► Can be customer or object specific

► Persistent: once a result is computed from the raw events, it doesn’t need to be recomputed, giving faster response time on subsequent queries.
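The persistence benefit can be sketched as a summary store that computes a result from raw events once and serves later requests from the stored value. `SummaryStore` and its fields are hypothetical names for illustration, and the sketch assumes the raw events do not change after the first computation:

```python
class SummaryStore:
    """Keeps computed per-customer totals so raw events are scanned once."""

    def __init__(self, raw_events):
        self.raw_events = raw_events   # list of (customer, amount) pairs
        self._summaries = {}           # customer -> stored total
        self.computations = 0          # how many full scans were needed

    def total_for(self, customer):
        if customer not in self._summaries:
            # First request: scan the raw events and persist the result.
            self.computations += 1
            self._summaries[customer] = sum(
                amount for name, amount in self.raw_events if name == customer
            )
        return self._summaries[customer]

events = [("acme", 100.0), ("globex", 75.0), ("acme", 50.0)]
store = SummaryStore(events)

first = store.total_for("acme")    # scans the raw events
second = store.total_for("acme")   # served from the stored summary
```

If new events can arrive later, the stored summaries would need to be refreshed or invalidated, which this sketch leaves out.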


Dangers of Data Warehousing

► Heavy processing requires physically separate database machines for warehousing and OLTP

► Must be optimized for novice users; complex queries might take a very long time

► Much more complex multidimensional design compared to regular relational databases

► Errors in computational logic can cause serious financial losses and force recalculations.

► Data representation

► Relatively difficult to perform data migration


Database Design

► Data warehousing databases mostly utilize complex multidimensional design

► Relationships must be meaningful and represent clear patterns and trends of unprocessed data. The more data and relationships you have, the more dimensions the database will have.

► Information is viewed along one common dimensional position. Can be thought of as the intersection of a few planes.
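The "intersection of a few planes" idea can be sketched with a tiny in-memory cube. The dimension names (`product`, `region`, `quarter`) and the measure `sales` are assumptions made up for this example. Fixing a value on one dimension selects a plane, and fixing several selects their intersection:

```python
from collections import defaultdict

# Hypothetical dimensions for the example cube.
DIMENSIONS = ("product", "region", "quarter")

def build_cube(facts):
    """Aggregate the 'sales' measure into cells keyed by dimension values."""
    cube = defaultdict(float)
    for fact in facts:
        key = tuple(fact[d] for d in DIMENSIONS)
        cube[key] += fact["sales"]
    return dict(cube)

def slice_total(cube, **fixed):
    """Sum every cell whose coordinates match all fixed dimension values."""
    total = 0.0
    for key, value in cube.items():
        coords = dict(zip(DIMENSIONS, key))
        if all(coords[d] == v for d, v in fixed.items()):
            total += value
    return total

facts = [
    {"product": "widget", "region": "West", "quarter": "Q1", "sales": 100.0},
    {"product": "widget", "region": "West", "quarter": "Q2", "sales": 120.0},
    {"product": "widget", "region": "East", "quarter": "Q1", "sales": 80.0},
    {"product": "gadget", "region": "West", "quarter": "Q1", "sales": 40.0},
]
cube = build_cube(facts)

# One plane (quarter = Q1) versus the intersection of two planes.
q1_total = slice_total(cube, quarter="Q1")
widget_west = slice_total(cube, product="widget", region="West")
```

Adding another attribute to `DIMENSIONS` adds a dimension to every cell, which is why more data and relationships lead to a higher-dimensional design.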


OLAP Market

[Figure: OLAP market]


References

► http://en.wikipedia.org/wiki/Data_warehouse

► http://en.wikipedia.org/wiki/OLAP

► http://dmoz.org/Computers/Software/Databases/Data_Warehousing/

► http://dmoz.org/Computers/Software/Databases/Data_Warehousing/Articles/

► http://en.wikipedia.org/wiki/Multidimensional_database

► http://www.olapreport.com/market.htm