Data Warehousing Alex Ostrovsky CS157B Spring 2007

  • Slide 1
  • Data Warehousing Alex Ostrovsky CS157B Spring 2007
  • Slide 2
  • Introduction
      • A data warehouse is the main repository of corporate data.
      • Multiple databases may be employed, each for a specific purpose.
      • It contains raw events and unprocessed data, although separate tables might exist for processed information that presents the data in a meaningful form.
  • Slide 3
  • What is it used for?
      • Data analysis
      • Data mining
      • Complex queries with multiple table joins
      • Forecasting
      • Historical reporting
      • OLAP (Online Analytical Processing)
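As an illustration of the "complex queries with multiple table joins" and historical reporting uses listed above, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a star-style layout (one fact table plus dimension tables); the table and column names (`fact_sales`, `dim_product`, `dim_date`) are hypothetical and not taken from the slides.

```python
import sqlite3

# Hypothetical star-style schema: one fact table plus two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales  (product_id INTEGER, date_id INTEGER, amount REAL);

INSERT INTO dim_product VALUES (1, 'books'), (2, 'music');
INSERT INTO dim_date    VALUES (10, 2006), (11, 2007);
INSERT INTO fact_sales  VALUES
    (1, 10, 20.0), (1, 11, 30.0), (2, 10, 5.0), (2, 11, 15.0), (1, 11, 10.0);
""")

# Historical report: total sales per category per year (multi-table join + GROUP BY).
query = """
SELECT p.category, d.year, SUM(f.amount) AS total
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_date    AS d ON d.date_id    = f.date_id
GROUP BY p.category, d.year
ORDER BY p.category, d.year
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
# ('books', 2006, 20.0)
# ('books', 2007, 40.0)
# ('music', 2006, 5.0)
# ('music', 2007, 15.0)
```

The join touches every dimension the report needs, which is why such queries are kept off the operational database.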
  • Slide 4
  • High level view
  • Slide 5
  • Key Concepts and Features
      • Data is not required to be heavily normalized.
      • Transaction processing is done mostly offline, so processing time is not very critical, although this may depend on the amount of data, the degree of normalization, query complexity, and application specifications.
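To make "not required to be heavily normalized" concrete, the sketch below copies customer attributes onto every order row so that reports need no join. It is a minimal example using `sqlite3`; the tables `customers`, `orders`, and `wh_orders` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized operational tables (hypothetical).
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann', 'Austin'), (2, 'Bob', 'Boston');
INSERT INTO orders    VALUES (100, 1, 12.5), (101, 2, 7.0), (102, 1, 3.5);

-- Denormalized warehouse table: customer attributes are repeated on every order row,
-- trading redundancy for join-free reporting queries.
CREATE TABLE wh_orders AS
SELECT o.order_id, o.amount, c.name AS customer_name, c.city AS customer_city
FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id;
""")

# The report reads a single table; no join is needed at query time.
rows = conn.execute(
    "SELECT customer_city, SUM(amount) FROM wh_orders "
    "GROUP BY customer_city ORDER BY customer_city"
).fetchall()
print(rows)  # [('Austin', 16.0), ('Boston', 7.0)]
```

The redundancy is acceptable here because warehouse rows are loaded in bulk rather than updated one at a time.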
  • Slide 6
  • Key Concepts and Features (cont.)
      • Subject-oriented: unlike regular real-time OLTP databases, which are organized around applications, the data is organized by subject.
      • Non-volatile: data is essentially stored forever, without being pruned or deleted.
      • Heavily integrated: contains data from the majority of the organization's applications.
      • Time-variant: most of the data has a time reference, which is used when producing reports.
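The non-volatile and time-variant properties can be sketched together as an append-only history in which every record carries a date. This is an illustrative sketch only; the `Fact` and `AppendOnlyHistory` names and the "as of" lookup are assumptions, not part of the slides.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    key: str          # subject the fact is about, e.g. a customer id
    value: str        # attribute value that was recorded
    valid_from: date  # time reference carried by every record


class AppendOnlyHistory:
    """Non-volatile store: facts are appended, never updated or deleted."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def append(self, fact: Fact) -> None:
        self._facts.append(fact)

    def as_of(self, key: str, when: date) -> Optional[str]:
        """Time-variant lookup: latest value for `key` recorded on or before `when`."""
        candidates = [f for f in self._facts if f.key == key and f.valid_from <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda f: f.valid_from).value


history = AppendOnlyHistory()
history.append(Fact("cust-1", "Austin", date(2006, 1, 1)))
history.append(Fact("cust-1", "Boston", date(2007, 3, 1)))
print(history.as_of("cust-1", date(2006, 12, 31)))  # Austin
print(history.as_of("cust-1", date(2007, 6, 1)))    # Boston
```

Because the older "Austin" record is kept rather than overwritten, a report for 2006 still sees the value that was true at that time.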
  • Slide 7
  • Types of data warehousing DBs
      • Offline operational database: similar to regular data replication; used to minimize the impact of queries on the running primary operational system.
      • Offline data warehouse: heavily integrated, reporting-oriented warehouse databases that are updated with data from operational databases at regular time intervals.
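A regularly scheduled update of an offline data warehouse can be sketched as an incremental batch load. The sketch below assumes the operational table has a monotonically increasing `id` that serves as a watermark; the table names and the `load_increment` helper are hypothetical, and the scheduler (e.g. a nightly job) is not shown.

```python
import sqlite3


def load_increment(source, warehouse, last_loaded_id):
    """Copy source rows newer than the watermark into the warehouse; return the new watermark."""
    rows = source.execute(
        "SELECT id, item, qty FROM orders WHERE id > ? ORDER BY id", (last_loaded_id,)
    ).fetchall()
    warehouse.executemany("INSERT INTO orders_history VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return rows[-1][0] if rows else last_loaded_id


source = sqlite3.connect(":memory:")     # stands in for the operational database
warehouse = sqlite3.connect(":memory:")  # stands in for the offline warehouse
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
warehouse.execute("CREATE TABLE orders_history (id INTEGER, item TEXT, qty INTEGER)")

source.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, "pen", 3), (2, "ink", 1)])
watermark = load_increment(source, warehouse, 0)          # first scheduled run

source.execute("INSERT INTO orders VALUES (3, 'pad', 2)")
watermark = load_increment(source, warehouse, watermark)  # next scheduled run

count = warehouse.execute("SELECT COUNT(*) FROM orders_history").fetchone()[0]
print(watermark, count)  # 3 3
```

Each run reads only rows added since the previous run, so the load does not rescan the operational table.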
  • Slide 8
  • Types of data warehousing DBs (cont.)
      • Real-time data warehouse: data is updated as soon as a transaction happens.
      • Integrated data warehouse: the database is integrated with the primary operational system for immediate decision making and reporting.
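In contrast with the scheduled batch load of an offline warehouse, a real-time warehouse applies each transaction as it is committed. The sketch below models this with a simple listener callback; `OperationalSystem` and `RealTimeWarehouse` are hypothetical stand-ins, not a specific product's API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class OperationalSystem:
    """Stand-in for the primary OLTP system; notifies listeners on every commit."""

    def __init__(self) -> None:
        self.listeners: List[Callable[[str, float], None]] = []

    def commit(self, item: str, amount: float) -> None:
        # A real system would persist the transaction here; this sketch only forwards it.
        for listener in self.listeners:
            listener(item, amount)


class RealTimeWarehouse:
    """Keeps per-item totals current by applying each transaction as it arrives."""

    def __init__(self) -> None:
        self.totals: DefaultDict[str, float] = defaultdict(float)

    def on_commit(self, item: str, amount: float) -> None:
        self.totals[item] += amount


oltp = OperationalSystem()
warehouse = RealTimeWarehouse()
oltp.listeners.append(warehouse.on_commit)
oltp.commit("pen", 2.5)
oltp.commit("pen", 1.5)
print(warehouse.totals["pen"])  # 4.0
```

There is no batch window: the total is current immediately after each commit, at the cost of doing warehouse work on every transaction.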
  • Slide 9
  • Benefits of Data Warehousing
      • No need to stress the operational database with complex queries.
      • Separation of processing and business logic.
      • Very flexible: multiple distinct relations can be defined from a single set of data.
      • Can be customer- or object-specific.
      • Persistent: once a result is computed from the raw events, it does not need to be recomputed, giving faster response times on subsequent queries.
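The "persistent" benefit can be sketched as a summary table that is computed once from the raw events and then read directly. This minimal `sqlite3` example assumes hypothetical tables `raw_events` and `customer_totals`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("ann", 10.0), ("bob", 4.0), ("ann", 6.0)],
)

# Compute the aggregate from the raw events once and store the result.
conn.execute("""
CREATE TABLE customer_totals AS
SELECT customer, SUM(amount) AS total FROM raw_events GROUP BY customer
""")


def customer_total(conn, customer):
    """Subsequent queries read the stored result instead of rescanning raw_events."""
    row = conn.execute(
        "SELECT total FROM customer_totals WHERE customer = ?", (customer,)
    ).fetchone()
    return row[0] if row else None


print(customer_total(conn, "ann"))  # 16.0
```

The stored table does not change when new raw events arrive, so it has to be rebuilt after each load; a bug in the aggregation logic likewise means recomputing it.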
  • Slide 10
  • Dangers of Data Warehousing
      • Heavy processing requires physically separate database machines for warehousing and OLTP.
      • Must be optimized for novice users, since complex queries might take a very long time.
      • The multidimensional design is much more complex than that of regular relational databases.
      • Errors in computational logic can cause serious financial losses and force recalculation.
      • Data representation: data migration is relatively difficult to perform.
  • Slide 11
  • Database Design
      • Data warehousing databases mostly use a complex multidimensional design.
      • Relationships must be meaningful and represent clear patterns and trends in the unprocessed data.
      • The more data and relationships you have, the more dimensions the database will have.
      • Information is viewed along one common dimensional position, which can be thought of as the intersection of a few planes.
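The "intersection of planes" picture can be sketched with a small in-memory cube: each value sits at one position on every dimension, fixing one dimension selects a plane (a slice), and summing over the others rolls the data up. The dimensions (product, region, quarter) and the helper names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical 3-D cube: (product, region, quarter) -> sales amount.
Cell = Tuple[str, str, str]
cube: Dict[Cell, float] = {
    ("books", "east", "Q1"): 120.0,
    ("books", "west", "Q1"): 80.0,
    ("music", "east", "Q1"): 45.0,
    ("books", "east", "Q2"): 150.0,
}
DIMENSIONS = ("product", "region", "quarter")


def cell(product: str, region: str, quarter: str) -> float:
    """One value sits where the three dimension planes intersect."""
    return cube.get((product, region, quarter), 0.0)


def slice_cube(dimension: str, value: str) -> Dict[Cell, float]:
    """Fix one dimension (one plane) and keep the cells that lie on it."""
    axis = DIMENSIONS.index(dimension)
    return {k: v for k, v in cube.items() if k[axis] == value}


def roll_up(dimension: str) -> Dict[str, float]:
    """Aggregate away every dimension except `dimension`."""
    axis = DIMENSIONS.index(dimension)
    totals: Dict[str, float] = {}
    for key, amount in cube.items():
        totals[key[axis]] = totals.get(key[axis], 0.0) + amount
    return totals


print(cell("books", "east", "Q2"))       # 150.0
print(len(slice_cube("quarter", "Q1")))  # 3
print(roll_up("product"))                # {'books': 350.0, 'music': 45.0}
```

Adding a dimension (for example, a sales channel) lengthens every key, which is the sense in which more relationships mean more dimensions.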
  • Slide 12
  • OLAP Market