Data Warehousing Basics

7/29/2019 Data Warehousing Basics


Data warehousing basics

Comparing conceptual, logical and physical data models

OLAP and OLTP

OLTP is Online Transaction Processing. It uses highly normalized structures (3NF) to keep track of daily transactions, and it requires fast inserts, updates and deletes at the DB level; hence we follow the ER modelling technique (highly normalized tables). Inserts, updates and deletes are faster because each operation has to be performed in a single place, whereas in de-normalized structures an update has to be applied to multiple records. Since there is less data redundancy, the storage size of the DB decreases. The ER model generates very complex, interwoven entity diagrams across multiple business processes.
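The difference between updating in a single place and updating multiple records can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a real OLTP design: the `customer`, `orders` and `orders_flat` tables and their rows are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized (3NF-style): the customer's city is stored exactly once.
cur.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, city TEXT)")
cur.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(customer_id), amount REAL)"
)
cur.execute("INSERT INTO customer VALUES (1, 'Pune')")
cur.executemany("INSERT INTO orders VALUES (?, 1, ?)", [(10, 5.0), (11, 7.5), (12, 2.0)])

# De-normalized: the city is repeated on every order row.
cur.execute(
    "CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER, city TEXT, amount REAL)"
)
cur.executemany(
    "INSERT INTO orders_flat VALUES (?, 1, 'Pune', ?)", [(10, 5.0), (11, 7.5), (12, 2.0)]
)

# The same business change (the customer moves) touches a different number of rows.
normalized_rows = cur.execute(
    "UPDATE customer SET city = 'Mumbai' WHERE customer_id = 1"
).rowcount
flat_rows = cur.execute(
    "UPDATE orders_flat SET city = 'Mumbai' WHERE customer_id = 1"
).rowcount
print(normalized_rows, flat_rows)  # 1 row updated vs 3 rows updated
```

In the normalized layout the change is one row regardless of how many orders exist; in the flat layout the number of rows touched grows with the number of orders.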

OLAP (Online Analytical Processing) is a data warehousing system in which we use dimensional modelling with facts and dimensions. Such a design is easily understandable and performs better for querying. Here the dimensions are not completely normalized; they are kept in a de-normalized state. In dimensional modelling, starting from the ER diagram, we determine a business process, build the corresponding fact and dimensions related to that business process, and then repeat the same for the other business processes.

FACT: a fact table contains 2 parts:

Foreign keys (references to dimensions)

Measures (additive or semi-additive fields, e.g. quantity sold and market value of a product)

A dimension table contains the textual (descriptive) details of the records in the FACT table.

Dimensional modelling can be performed in 2 formats:

Star Schema

Here we have our fact table in the centre, while all the dimension tables surround the fact. The fact holds a reference to each dimension.
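A star schema of this shape can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are assumptions made for this illustration only; they do not come from any particular warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per member.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, brand TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_date TEXT, year_month TEXT);

-- Fact table in the centre: foreign keys to each dimension plus numeric measures.
CREATE TABLE fact_sales (
    product_key   INTEGER REFERENCES dim_product(product_key),
    date_key      INTEGER REFERENCES dim_date(date_key),
    quantity_sold INTEGER,
    sales_amount  REAL
);

INSERT INTO dim_product VALUES (1, 'Widget', 'Acme'), (2, 'Gadget', 'Acme');
INSERT INTO dim_date VALUES (20190729, '2019-07-29', '2019-07'),
                            (20190730, '2019-07-30', '2019-07');
INSERT INTO fact_sales VALUES (1, 20190729, 3, 30.0),
                              (1, 20190730, 2, 20.0),
                              (2, 20190729, 1, 15.0);
""")

# A typical analytical query: one join per dimension, then aggregate the measure.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.quantity_sold) AS total_quantity
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)  # [('Gadget', 1), ('Widget', 5)]
```

Because `quantity_sold` is an additive measure, it can be summed across any combination of dimensions without further adjustment.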

Snowflake Schema

It has the same fundamental structure as the Star Schema; however, the dimensions are further normalized into separate tables. The principle behind snowflaking is normalisation of the dimension tables by removing low-cardinality attributes and forming separate tables.

Types of dimensions

Conformed Dimensions: These are the dimensions that have the same meaning across multiple subject areas/business processes. They are the integration points within a data mart across multiple subject areas/business processes. An example is the Time dimension; in MDM there is the MDM_BATCH_DIM table.

Degenerate Dimensions: These are dimension attributes that are stored in the fact table itself and have no dimension table of their own (for example, an order or invoice number).

Junk Dimensions: These are dimensions that contain low-cardinality columns from fact tables, such as indicators and flags.
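As a rough sketch of how a junk dimension might be built, the snippet below enumerates every combination of a few low-cardinality flags and assigns each combination a surrogate key, so the fact table can carry one key instead of several flag columns. The flag names (`is_gift`, `payment_type`) and the helper `junk_key` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical low-cardinality attributes that would otherwise sit on the fact table.
flags = {
    "is_gift": ["Y", "N"],
    "payment_type": ["CASH", "CARD"],
}

# One junk-dimension row per combination of flag values, keyed by a surrogate key.
junk_dim = {}
for surrogate_key, combo in enumerate(product(*flags.values()), start=1):
    junk_dim[combo] = surrogate_key


def junk_key(is_gift: str, payment_type: str) -> int:
    """Look up the surrogate key a fact row would store for this flag combination."""
    return junk_dim[(is_gift, payment_type)]


print(len(junk_dim))          # 4 combinations (2 x 2)
print(junk_key("N", "CARD"))  # 4
```

Pre-building all combinations works when the cross product is small; for larger flag sets, one common alternative is to insert only the combinations actually observed in the source data.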

Fact-less fact

A fact table that does not contain any measure is called a fact-less fact. This table will only contain keys from different dimension tables. This is often used to resolve a many-to-many cardinality issue.
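A fact-less fact can be illustrated with a student/course enrollment example, a classic many-to-many relationship. This is a minimal sketch using Python's built-in `sqlite3` module; the table names and rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, student_name TEXT);
CREATE TABLE dim_course  (course_key INTEGER PRIMARY KEY, course_name TEXT);

-- Fact-less fact: only dimension keys, no measure columns.
CREATE TABLE fact_enrollment (
    student_key INTEGER REFERENCES dim_student(student_key),
    course_key  INTEGER REFERENCES dim_course(course_key),
    PRIMARY KEY (student_key, course_key)
);

INSERT INTO dim_student VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO dim_course  VALUES (1, 'SQL'), (2, 'ETL');
INSERT INTO fact_enrollment VALUES (1, 1), (1, 2), (2, 1);
""")

# With no measure to sum, analysis is done by counting rows (events).
counts = conn.execute("""
    SELECT c.course_name, COUNT(*) AS enrolled
    FROM fact_enrollment f
    JOIN dim_course c ON c.course_key = f.course_key
    GROUP BY c.course_name
    ORDER BY c.course_name
""").fetchall()
print(counts)  # [('ETL', 1), ('SQL', 2)]
```

Each row records only that a student and a course are related, which is how the table resolves the many-to-many relationship between the two dimensions.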

As a rule of thumb, a de-normalized table will return query results faster than an equivalent set of normalized tables, because the query needs fewer joins.



When star and when snowflake

First of all, some definitions are in order. In a star schema, dimensions that reflect a hierarchy are flattened into a single table. For example, a star schema Geography dimension would have columns like country, state/province, city and postal code. In the source system, this hierarchy would probably be normalized into multiple tables with one-to-many relationships.

A snowflake schema does not flatten a hierarchy dimension into a single table. It would, instead, have two or more tables with a one-to-many relationship. This is a more normalized structure. For example, one table may have state/province and country columns and a second table would have city and postal code. The table with city and postal code would have a many-to-one relationship to the table with the state/province columns.
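The two layouts of the Geography dimension can be compared in a short sketch using Python's built-in `sqlite3` module. The table names (`dim_state`, `dim_city`, `dim_geography`) and the sample rows are hypothetical; the flattened star table is derived here by joining the two snowflake tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake: the geography hierarchy is split into two tables.
CREATE TABLE dim_state (
    state_key      INTEGER PRIMARY KEY,
    state_province TEXT,
    country        TEXT
);
CREATE TABLE dim_city (
    city_key    INTEGER PRIMARY KEY,
    city        TEXT,
    postal_code TEXT,
    state_key   INTEGER REFERENCES dim_state(state_key)  -- many cities to one state
);
INSERT INTO dim_state VALUES (1, 'Ontario', 'Canada');
INSERT INTO dim_city  VALUES (10, 'Toronto', 'M5H', 1), (11, 'Ottawa', 'K1A', 1);

-- Star: the same hierarchy flattened into a single dimension table.
CREATE TABLE dim_geography AS
SELECT c.city_key AS geography_key, s.country, s.state_province, c.city, c.postal_code
FROM dim_city c
JOIN dim_state s ON s.state_key = c.state_key;
""")

star_rows = conn.execute(
    "SELECT country, state_province, city FROM dim_geography ORDER BY city"
).fetchall()
print(star_rows)
# [('Canada', 'Ontario', 'Ottawa'), ('Canada', 'Ontario', 'Toronto')]
```

In the flattened table the country and state/province values repeat on every city row, which is the redundancy the snowflake layout avoids at the cost of an extra join at query time.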

There are some good reasons for snowflake dimension tables. One example is a company that has many types of products. Some products have a few attributes, others have a great many. The products are very different from each other. The thing to do here is to create a core Product dimension that has common attributes for all the products, such as product type, manufacturer, brand, product group, etc. Create a separate sub-dimension table for each distinct group of products where each group shares common attributes. The sub-product tables must contain a foreign key to the core Product dimension table.

One of the criticisms of using snowflake dimensions is that it is difficult for some of the multidimensional front-end presentation tools to generate a query on a snowflake dimension. However, you can create a view for each combination of the core product/sub-product dimension tables and give the view a suitably descriptive name (Frozen Food Product, Hardware Product, etc.), and then these tools will have no problem.
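The view-per-combination idea can be sketched with Python's built-in `sqlite3` module. The core table `dim_product`, the sub-dimension `dim_product_frozen_food`, its `storage_temp_c` attribute and the sample rows are assumptions made for this illustration; only the view naming follows the "Frozen Food Product" example in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Core Product dimension: attributes common to every product.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_type TEXT,
    brand        TEXT
);

-- Sub-dimension for one product group, carrying a foreign key to the core table.
CREATE TABLE dim_product_frozen_food (
    product_key    INTEGER PRIMARY KEY REFERENCES dim_product(product_key),
    storage_temp_c REAL
);

INSERT INTO dim_product VALUES (1, 'Frozen Food', 'Acme'), (2, 'Hardware', 'Bolt Co');
INSERT INTO dim_product_frozen_food VALUES (1, -18.0);

-- The view presents core + sub-dimension as one flat dimension to front-end tools.
CREATE VIEW frozen_food_product AS
SELECT p.product_key, p.product_type, p.brand, f.storage_temp_c
FROM dim_product p
JOIN dim_product_frozen_food f ON f.product_key = p.product_key;
""")

view_rows = conn.execute("SELECT * FROM frozen_food_product").fetchall()
print(view_rows)  # [(1, 'Frozen Food', 'Acme', -18.0)]
```

A tool that queries `frozen_food_product` sees a single flat table, while the underlying storage stays snowflaked.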
