7
Different Types of Dimensions and Facts in Data Warehouse Posted on June 5, 2012 by bintelligencegroup Dimension - A dimension table typically has two types of columns, primary keys to fact tables and textual\descreptive data. Fact - A fact table typically has two types of columns, foreign keys to dimension tables and measures those that contain numeric facts. A fact table can contain fact’s data on detail or aggregated level. Types of Dimensions - Slowly Changing Dimensions: Attributes of a dimension that would undergo changes over time. It depends on the business requirement whether particular attribute history of changes should be preserved in the data warehouse. This is called a Slowly Changing Attribute and a dimension containing such an attribute is called a Slowly Changing Dimension. Rapidly Changing Dimensions: A dimension attribute that changes frequently is a Rapidly Changing Attribute. If you don’t need to track the changes, the Rapidly Changing Attribute is no problem, but if you do need to track the changes, using a standard Slowly Changing Dimension technique can result in a huge inflation of the size of the dimension. One solution is to move the attribute to its own dimension, with a separate foreign key in the fact table. This new dimension is called a Rapidly Changing Dimension. Junk Dimensions: A junk dimension is a single table with a combination of different and unrelated attributes to avoid having a large number of foreign keys in the fact table. Junk dimensions are often created to manage the foreign keys created by Rapidly Changing Dimensions. When developing a dimensional model, we often encounter miscellaneous flags and indicators. These flags do not logically belong to the core dimension tables. A junk dimension is grouping of low cardinality flags and indicators. This junk dimension helps in avoiding cluttered design of data warehouse. Provides an easy way to access the dimensions from a single point of entry and improves

Different Types of Dimensions and Facts in Data

Embed Size (px)

Citation preview

Page 1: Different Types of Dimensions and Facts in Data

Different Types of Dimensions and Facts in Data WarehousePosted on June 5, 2012by bintelligencegroupDimension -A dimension table typically has two types of columns, primary keys to fact tables and textual\descreptive data.Fact -A fact table typically has two types of columns, foreign keys to dimension tables and measures those that contain numeric facts. A fact table can contain fact’s data on detail or aggregated level. Types of Dimensions -Slowly Changing Dimensions: Attributes of a dimension that would undergo changes over time. It depends on the business requirement whether particular attribute history of changes should be preserved in the data warehouse. This is called a Slowly Changing Attribute and a dimension containing such an attribute is called a Slowly Changing Dimension.

Rapidly Changing Dimensions:A dimension attribute that changes frequently is a Rapidly Changing Attribute. If you don’t need to track the changes, the Rapidly Changing Attribute is no problem, but if you do need to track the changes, using a standard Slowly Changing Dimension technique can result in a huge inflation of the size of the dimension. One solution is to move the attribute to its own dimension, with a separate foreign key in the fact table. This new dimension is called a Rapidly Changing Dimension.

Junk Dimensions:A junk dimension is a single table with a combination of different and unrelated attributes to avoid having a large number of foreign keys in the fact table. Junk dimensions are often created to manage the foreign keys created by Rapidly Changing Dimensions.

When developing a dimensional model, we often encounter miscellaneous flags and indicators. These flags do not logically belong to the core dimension tables.

A junk dimension is grouping of low cardinality flags and indicators. This junk dimension helps in avoiding cluttered design of data warehouse. Provides an easy way to access the dimensions from a single point of entry and improves the performance of sql queries.

Example: For example, assume that there are two dimension tables (gender and marital status). The data of these two tables are shown below:

Code:Table: GenderId Gender_status----------------1 Male2 Female

Table: Marital StatusId Marital_Status

Page 2: Different Types of Dimensions and Facts in Data

----------------1 Single2 Married

Here both the dimensions have low cardinality flags. This will cause maintenance of two tables and decrease performance of sql queries.

We can combine these two dimensions into a single table by cross joining and can maintain a single dimension table. The result of cross join is shown below:

Code:id gender mrg_status--------------------1 Male Single2 Male Married3 Female Single4 Female Married

This new dimension table is called a junk dimension. This will improve the manageability and improves the sql queries performance.

Inferred Dimensions:While loading fact records, a dimension record may not yet be ready. One solution is to generate an surrogate key with Null for all the other attributes. This should technically be called an inferred member, but is often called an inferred dimension.

Conformed Dimensions:

A Dimension that is used in multiple locations is called a conformed dimension. A conformed dimension may be used with multiple fact tables in a single database, or across multiple data marts or data warehouses. In data warehousing, a conformed dimension is a dimension that has the same meaning to every fact table in the structure. Conformed dimensions allow facts and its measures to be accessed in the same way across multiple facts, ensuring consistent reporting across the enterprise.

A conformed dimension can exist as a single dimension table that relates to multiple fact tables within the same data warehouse, or as identical dimension tables in separate data marts.

Page 3: Different Types of Dimensions and Facts in Data

As you can see in the above figure,the time and cust dimensions are called confirmed dimensions as they are shared across multiple fact tables with the same meaning.

Degenerate Dimensions: A degenerate dimension is when the dimension attribute is stored as part of fact table, and not in a separate dimension table. These are essentially dimension keys for which there are no other attributes. In a data warehouse, these are often used as the result of a drill through query to analyze the source of an aggregated number in a report. You can use these values to trace back to transactions in the OLTP system. Degenerate dimensions are quite common when we do a Data Warehouse implementation. They are nothing but a set of dimensions that can be classified as Fact attributes and are hence maintained in the fact table itself instead of separate dimension tables. One of the key advantages of using Degenerate dimensions is the fact that there is no need for doing a separate Surrogate lookup while creating the ETL mappings. Generally degenerate dimensions do not alter the grain of a fact table and hence are maintained within the Fact table itself. In some cases, they act as a reference between the data warehouse and the transactional system.

A degenerate dimension (DD) acts as a dimension key in the fact table, however does not join to a corresponding dimension table because all its interesting attributes have already been placed in other analytic dimensions. Sometimes people want to refer to degenerate dimensions as textual facts,

Page 4: Different Types of Dimensions and Facts in Data

however they’re not facts since the fact table’s primary key often consists of the DD combined with one or more additional dimension foreign keys.

Degenerate dimensions commonly occur when the fact table’s grain is a single transaction (or transaction line). Transaction control header numbers assigned by the operational business process are typically degenerate dimensions, such as order, ticket, credit card transaction, or check numbers. These degenerate dimensions are natural keys of the “parents” of the line items.

Even though there is no corresponding dimension table of attributes, degenerate dimensions can be quite useful for grouping together related fact tables rows. For example, retail point-of-sale transaction numbers tie all the individual items purchased together into a single market basket. In health care, degenerate dimensions can group the claims items related to a single hospital stay orepisode of care.In a data warehouse, a degenerate dimension is a dimension which is derived from the fact table and doesn't have its own dimension table. Degenerate dimensions are often used when a fact table's grain represents transactional level data and one wishes to maintain system specific identifiers such as order numbers, invoice numbers and the like without forcing their inclusion in their own dimension. The decision to use degenerate dimensions is often based on the desire to provide a direct reference back to a transactional system without the overhead of maintaining a separate dimension table.

Role Playing Dimensions:A role-playing dimension is one where the same dimension key — along with its associated attributes — can be joined to more than one foreign key in the fact table. For example, a fact table may include foreign keys for both Ship Date and Delivery Date. But the same date dimension attributes apply to each foreign key, so you can join the same dimension table to both foreign keys. Here the date dimension is taking multiple roles to map ship date as well as delivery date, and hence the name of Role Playing dimension.

Shrunken Dimensions:A shrunken dimension is a subset of another dimension. For example, the Orders fact table may include a foreign key for Product, but the Target fact table may include a foreign key only for ProductCategory, which is in the Product table, but much less granular. Creating a smaller dimension table, with ProductCategory as its primary key, is one way of dealing with this situation of heterogeneous grain. If the Product dimension is snowflaked, there is probably already a separate table for ProductCategory, which can serve as the Shrunken Dimension.

Static Dimensions:Static dimensions are not extracted from the original data source, but are created within the context of the data warehouse. A static dimension can be loaded manually — for example with Status codes — or it can be generated by a procedure, such as a Date or Time dimension.

Types of Facts -

Page 5: Different Types of Dimensions and Facts in Data

Additive:Additive facts are facts that can be summed up through all of the dimensions in the fact table. A sales fact is a good example for additive fact.Semi-Additive:Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.Eg: Daily balances fact can be summed up through the customers dimension but not through the time dimension.Non-Additive:Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.Eg: Facts which have percentages, ratios calculated.

Factless Fact Table: In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called “Factless Fact tables”.Eg: A fact table which has only product key and date key is a factless fact. There are no measures in this table. But still you can get the number products sold over a period of time.Based on the above classifications, fact tables are categorized into two:Cumulative:This type of fact table describes what has happened over a period of time. For example, this fact table may describe the total sales by product by store by day. The facts for this type of fact tables are mostly additive facts. The first example presented here is a cumulative fact table.Snapshot:This type of fact table describes the state of things in a particular instance of time, and usually includes more semi-additive and non-additive facts. The second example presented here is a snapshot fact table.