
Module II: Designing Datamarts
- Datawarehouse & Datamart
- OLAPs vs. OLTPs
- Dimensional Modeling
- Creating Physical Design Using SQL Mgt. Studio


BI System Components
- Module 1: Delivering BI (Chapter 1, 2, 10, 18 Larson Book): Creating KPI, Creating Reports, Excel and Tableau
- Module 2: Design a Datamart (Chapter 3 & 6 Larson Book): Requirement Analysis, Creating a Schema, SS DB Engine
- Module 3: Business Analytics (Chapter 4, 9, 10 Larson Book): Build an OLAP/Cube, SSA Services
- Module 4: Populate a DataMart (Chapter 7 & 8 Larson Book): ETL Process, SSI Services

Outline
- Data Warehouse Concept
- OLAPs vs. OLTPs (fundamental differences that suggest the need for different design approaches)
- Dimensional Modeling
- Creating Physical Design Using SQL Mgt. Studio

Datawarehouse & Datamart: Concept and Characteristics

Data Warehouse
- A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect.
- A warehouse is a collection of data that is subject-oriented, integrated, time-variant and non-volatile.
- It provides a consolidated view of enterprise data, optimized for reporting and analysis.
- It is a physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format.
- Data marts are smaller versions of warehouses.

OLAP vs. OLTP
- Online Transaction Processing Systems (OLTP): systems that process transactions (e.g., order processing), inserting, updating, and deleting the appropriate records in a database at the end of each transaction.
- Online Analytical Processing Systems (OLAP): systems that summarize and analyze a collection of transaction data.

Questions to compare the two:
- Relationship between OLTP and OLAP?
- Structural/Design differences?
- Purpose/Function difference?
- Difference in the type of data or information stored?
- Size (users, data stored)?
- Performance metric?

OLTP vs OLAP

Relationship between OLTP and OLAP?
- OLTP is a data source for OLAP.

Structural/Design differences?
- ER Modeling vs. Dimensional Modeling
- ER design vs. star or snowflake design
- ER design follows well-structured steps that have been used and tested for decades vs. star and snowflake design, which has been widely used for only about a decade, is still unstructured, and lacks well-established rules
- Application oriented vs. subject oriented

Purpose/Function difference?
- OLTP processes transactions vs. OLAP conducts analysis (measuring performance, gaining insight)
- OLTP focuses on transaction-processing efficiency vs. OLAP eases data retrieval in a way that is less cognitively overloading (it allows chunks, or cubes, of data to be viewed)
- OLTP processes repetitive transactions (insert, delete) and conducts simple manipulations (select, update) vs. OLAP examines many data items and complex relationships (mostly read-only) and focuses on aggregates
- OLTP views detailed, flat transactions vs. OLAP views multidimensional, aggregated data
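The contrast between transaction-style writes and analysis-style reads can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module; the `orders` table, its columns, and the values are assumptions made up for this example, not part of the course material.

```python
import sqlite3

# Hypothetical order table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product TEXT, amount REAL)"
)

# OLTP-style work: one short write per business transaction.
for product, amount in [("milk", 3.96), ("bread", 2.50), ("milk", 3.96)]:
    conn.execute(
        "INSERT INTO orders (product, amount) VALUES (?, ?)", (product, amount)
    )
conn.commit()

# OLAP-style work: a read-only query that aggregates many detailed rows.
sales_by_product = dict(
    conn.execute(
        "SELECT product, ROUND(SUM(amount), 2) FROM orders GROUP BY product"
    )
)
```

In a real system the two workloads would normally run against separate databases, with the OLTP store feeding the warehouse; a single in-memory table is used here only to keep the sketch short.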

Difference in the type of data or information stored?
- OLTP data is current and isolated vs. OLAP data is historic and consolidated
- OLTP stores data specific to a transaction vs. OLAP stores data specific to performance

Size
- Users: OLTP has thousands of users vs. OLAP has hundreds or fewer
- Data stored: OLTP stores 100s of MB to GB vs. OLAP stores 100s of GB to TB

Performance Metric?
- OLTP transaction throughput vs. OLAP query throughput
- Data quality: dirty data is a major issue for OLAP

Dimensional Modeling
Modeling technique used to design data warehouses and data marts.

ER Modeling vs. Dimensional Modeling

ER Modeling
- Transaction capture
- Reduces data redundancy through highly normalized tables
- Hard for end users to understand and remember
- Not query friendly
- All the attributes for an entity, categorical as well as numeric, belong to the entity table
- Well-defined, theory-driven process

Dimensional Modeling
- Data retrieval
- Intuitive, with high query performance
- Categorical data goes in a 'dimension' entity, and the 'fact' entity has mostly numeric attributes
- The only categorical (non-fact) fields in the fact table are the keys to the dimension tables
- Process is ill-defined; more of an art

Dimensional Modeling Benefits
- Produces database structures that are easy for end users to understand and write queries against.
- Optimizes query performance (as opposed to update performance).
- Scalability: dimensional models are scalable and easily accommodate unexpected new data.

Designing a Data Mart
- Identify the information that the decision makers need: measures, dimensions, hierarchies, and attributes. (Group Deliverable I)
- Build the database structure for the data mart using either a star or snowflake schema. (Group Deliverable II)

Requirement Analysis: Decision Makers' Needs (GD#1)
Business intelligence design must start with the decision makers.
- What foundational and feedback information do they need?
- How do they need that information sliced and diced for proper analysis?

More specifically:
- What facts, figures, statistics, and so forth do you need for effective decision making? (measures)
- How should this information be sliced and diced for analysis? (dimensions)
- What additional information can aid in decision making? (attributes)

Data Mart Structure
A data mart's structure consists of the following two types of data objects:
- Performance Measures (also referred to as facts)
- Dimensions (with their Hierarchies and Attributes)

In a dimensional model, data is divided into two general categories: Dimensional data and Fact data. Fact data represents the measurable results of a business event or a business state, and Dimensional data provides the context for that event or state. For example: I go into a supermarket and buy a gallon of milk for $3.96. That is a business event. When I bought it, who I am, what store I was in, and what I purchased are all contexts for this event. The measurable result is the $3.96 I paid for the milk.

Events vary depending on the business. For a wholesaler, receiving an order is an event, shipping goods is another, and invoicing the customer is yet another. In the case of insurance or banking, among others, business state is also important. Is the policy in force? What are the coverages? What accounts does the customer hold? These are all examples of questions relating to business state. It is important to understand the basic business process and identify the business events and states when developing a dimensional data warehouse. We will cover this in greater detail in the next installment.

Dimensions provide the 'who, what, where, when and why' that is applied to the 'how much' fact. Getting the dimensions right is, by far, the most critical and difficult part of designing a data warehouse. Whereas facts can be clearly traced to specific business processes, identifying dimensions is more art than science. Sure, there are easy ones: customer, vendor, date, product. But as you get into the nuances of a transaction and the business, it is easy to get tripped up by roles, statuses, hierarchies and a multitude of attributes demanded by the users. We will spend a lot of time in future installments looking at dimensions and how best to address certain design problems.
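The milk purchase can be written out as data to make the fact/dimension split concrete. The following is only a sketch: the class names, field names, the date, and the store name are hypothetical choices made for this example; only the product and the $3.96 come from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PurchaseContext:
    """Dimensional data: the context of the business event."""
    when: date
    customer: str
    store: str
    product: str


@dataclass(frozen=True)
class PurchaseFact:
    """Fact data: the measurable result, tied to its context."""
    context: PurchaseContext
    amount_paid: float


# The date and store name are placeholders invented for this sketch.
milk_purchase = PurchaseFact(
    context=PurchaseContext(
        when=date(2024, 1, 15),
        customer="me",
        store="Store A",
        product="gallon of milk",
    ),
    amount_paid=3.96,
)
```

In a data mart, each context field would typically become a key into its own dimension table, and `amount_paid` would be a column in the fact table.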

Data Mart Structure: Performance Measures
A measure is a numeric quantity expressing some aspect of the organization's performance. The information represented by this quantity is used to support or evaluate the decision making and performance of the organization. A measure can also be called a fact. Example: Total Sales.

Information needed during the design process:
- Name of the measure
- What fields should be used to supply the data (source)
- Data type (money, integer, decimal)
- Formula used to calculate the measure (if there is one)

Measures define what the decision makers want to see.

Data Mart Structure: Dimensions (Slicers)
A dimension is a categorization used to spread out an aggregate measure to reveal its constituent parts. Example: total sales by sales person by year.
Key words that signal a dimension: "by," "for each," or "for every."

Information needed during the design process:
- Name of the dimension
- What fields should be used to supply the data (source)
- Data type of the dimension's key (the code that uniquely identifies each member of the dimension)
- Name of the parent dimension (if there is one)

The dimensions and hierarchies define how the decision maker wants to view the data.
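To show how a dimension spreads an aggregate measure into its constituent parts, here is a small sketch of "total sales by sales person by year". The rows, names, and the `total_by` helper are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (sales_person, year, sales_amount).
sales = [
    ("Ann", 2022, 100.0),
    ("Ann", 2023, 150.0),
    ("Bob", 2022, 80.0),
    ("Bob", 2023, 120.0),
    ("Ann", 2023, 50.0),
]


def total_by(rows, key):
    """Sum the sales measure for each distinct value of key(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)


# The aggregate measure with no dimension applied.
grand_total = sum(amount for _, _, amount in sales)

# The same measure sliced by one dimension, then by two.
by_person = total_by(sales, key=lambda r: r[0])
by_person_by_year = total_by(sales, key=lambda r: (r[0], r[1]))
```

Each added "by" splits the same total into finer parts; the parts always sum back to `grand_total`.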

Data Mart Structure: Hierarchy (Slicers; Drill Down)
A hierarchy is a structure made up of two or more levels of related dimensions. A dimension at an upper level of the hierarchy completely contains one or more dimensions from the next lower level of the hierarchy. Example: the Time dimension, with levels Month, Quarter, Year.
- Hierarchies are used to organize dimensions into various levels.
- Hierarchies support requests such as "roll up cities into sales regions" or "drill down from year into quarter."
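Rolling up along the Month, Quarter, Year hierarchy can be sketched as follows. The monthly figures and function names are made up for this example; the point is only that each upper level completely contains the members below it.

```python
from collections import defaultdict

# Hypothetical monthly sales keyed by (year, month).
monthly_sales = {
    (2023, 1): 10.0,
    (2023, 2): 20.0,
    (2023, 3): 30.0,
    (2023, 4): 40.0,
    (2024, 1): 5.0,
}


def quarter_of(month):
    """Map a month number (1-12) to its quarter (1-4)."""
    return (month - 1) // 3 + 1


def roll_up_to_quarter(monthly):
    """Aggregate month-level totals into (year, quarter) totals."""
    quarterly = defaultdict(float)
    for (year, month), amount in monthly.items():
        quarterly[(year, quarter_of(month))] += amount
    return dict(quarterly)


def roll_up_to_year(quarterly):
    """Aggregate quarter-level totals into year totals."""
    yearly = defaultdict(float)
    for (year, _quarter), amount in quarterly.items():
        yearly[year] += amount
    return dict(yearly)


quarterly_sales = roll_up_to_quarter(monthly_sales)
yearly_sales = roll_up_to_year(quarterly_sales)
```

Drilling down is the reverse direction: starting from a year total and looking at the quarter or month entries that make it up.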

Data Mart Structure: Attributes
An attribute is an additional piece of information pertaining to a dimension member that is not the unique identifier or the description of the member. Examples: a regional manager's information, a customer's gender and age.
- Provides more contextual information about a dimension
- Allows decision makers to filter data

Information needed during the design process:
- Name of the attribute
- What fields should be used to supply the data (source)
- Data type
- Name of the dimension to which it applies

Dimensional Design: The Schema
Key principle: a dimensional schema physically separates the measures that quantify a subject's performance (e.g., student, business, team, process) from the descriptive elements (a.k.a. dimensions) that summarize and categorize the performance.

Two types of schema:
- A Star Schema
- A Snowflake Schema

Data Mart Data Objects
Various measures and dimensions: how to configure them?
Hierarchies: the main idea underlying this design.

[Figure: The Star Schema. A central Measure Group surrounded by Dim 1 through Dim 6.]

[Figure: The Snowflake Schema.]

The Tables
All the measures are placed in a single table, called the fact table in the schema. The dimensions are placed in their own tables.

In the star schema, all the information for a hierarchy is stored in the same table. The information for the parent (or grandparent or great-grandparent, and so forth) dimension is added to the table containing the dimension at the lowest level of the hierarchy.

The snowflake schema works a bit differently. In the snowflake schema, each level in the dimensional hierarchy has its own table. The dimension tables are linked together with foreign key relationships to form the hierarchy.
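The two layouts can be compared side by side with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server so that it runs anywhere; every table name, column name, and data value is an assumption made for illustration, built around a city-to-sales-region hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: the whole city -> region hierarchy lives in one dimension table.
CREATE TABLE dim_city_star (
    city_key INTEGER PRIMARY KEY,
    city_name TEXT,
    region_name TEXT
);
CREATE TABLE fact_sales_star (
    city_key INTEGER REFERENCES dim_city_star(city_key),
    sales_amount REAL
);

-- Snowflake schema: each hierarchy level has its own table, linked by foreign keys.
CREATE TABLE dim_region (
    region_key INTEGER PRIMARY KEY,
    region_name TEXT
);
CREATE TABLE dim_city (
    city_key INTEGER PRIMARY KEY,
    city_name TEXT,
    region_key INTEGER REFERENCES dim_region(region_key)
);
CREATE TABLE fact_sales (
    city_key INTEGER REFERENCES dim_city(city_key),
    sales_amount REAL
);

INSERT INTO dim_city_star VALUES
    (1, 'Austin', 'South'), (2, 'Dallas', 'South'), (3, 'Boston', 'East');
INSERT INTO fact_sales_star VALUES (1, 10.0), (2, 20.0), (3, 5.0);

INSERT INTO dim_region VALUES (1, 'South'), (2, 'East');
INSERT INTO dim_city VALUES (1, 'Austin', 1), (2, 'Dallas', 1), (3, 'Boston', 2);
INSERT INTO fact_sales VALUES (1, 10.0), (2, 20.0), (3, 5.0);
""")

# Star: one join from the fact table reaches every level of the hierarchy.
star_totals = dict(conn.execute("""
    SELECT d.region_name, SUM(f.sales_amount)
    FROM fact_sales_star AS f
    JOIN dim_city_star AS d ON d.city_key = f.city_key
    GROUP BY d.region_name
"""))

# Snowflake: one extra join per hierarchy level above the lowest one.
snowflake_totals = dict(conn.execute("""
    SELECT r.region_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_city AS c ON c.city_key = f.city_key
    JOIN dim_region AS r ON r.region_key = c.region_key
    GROUP BY r.region_name
"""))
```

Both queries return the same regional totals. The star layout repeats the region name on every city row, while the snowflake layout stores each region once at the cost of an additional join.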

A Four Step Dimensional Modeling Process - http://www.kimballgroup.com/ (Not in the book)
- Step 1: Describe the business process that the datamart supports and identify the sources of measurement. Key concept: measurement events.
- Step 2: Declare the fact table grain. Key concept: fact table data views.
- Step 3: Choose the dimensions. Key concept: cardinalities and hierarchies.
- Step 4: Choose the facts. Key concept: their relationships with the measurement events and the grain.
Refer to the Class Handout and LBD#1 for this section.

Dimension Modeling Details - Steps and Examples
Refer to LBD#2 for this section.

Converting Logical Design to Physical Design Using SQL Mgt. Studio

Summary
- Overview of Data Warehouse concept: a data source for OLAPs
- OLTP vs OLAP: compare and contrast
- Dimensional Modeling: benefits, data objects, data structures, schemas, logical and physical