Data Warehousing Basics

  • 7/29/2019 Data Warehousing Basics

    Data warehousing basics

    Comparing conceptual, logical and physical data models

    OLAP and OLTP

    OLTP is Online Transaction Processing. It uses highly normalized structures (3NF) to keep track of daily transactions and
    requires fast inserts, updates and deletes at the DB level; hence we follow the ER modelling technique (highly normalized
    tables). Inserts, updates and deletes are faster because each operation has to be performed in a single place. In
    de-normalized structures, updates have to be made in multiple records. Since there is less data redundancy, the storage
    size of the DB decreases. The ER model generates very complex, interwoven entity diagrams across multiple business processes.
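To make the "single place" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customer, orders, orders_flat, city) are made up for illustration and do not come from any particular system. It stores the same three orders once in a normalized layout and once in a de-normalized one, then counts how many rows a change of city has to touch in each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized (3NF-style): the customer's city is stored exactly once.
cur.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, city TEXT)")
cur.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(customer_id), amount REAL)"
)
cur.execute("INSERT INTO customer VALUES (1, 'Pune')")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 5.0), (11, 1, 7.5), (12, 1, 2.0)],
)

# De-normalized: the city is repeated on every order row.
cur.execute(
    "CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER, city TEXT, amount REAL)"
)
cur.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [(10, 1, "Pune", 5.0), (11, 1, "Pune", 7.5), (12, 1, "Pune", 2.0)],
)

# The customer moves: a single row changes in the normalized design...
cur.execute("UPDATE customer SET city = 'Mumbai' WHERE customer_id = 1")
normalized_rows_touched = cur.rowcount

# ...but every order row has to change in the de-normalized one.
cur.execute("UPDATE orders_flat SET city = 'Mumbai' WHERE customer_id = 1")
denormalized_rows_touched = cur.rowcount

print(normalized_rows_touched, denormalized_rows_touched)
```

The gap between the two counts grows with the number of orders per customer, which is the update cost the paragraph above refers to.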

    OLAP is a data warehousing system in which we use dimensional modelling (facts and dimensions). Such a design is easily
    understandable and performs better for querying. Here the dimensions are not completely normalized; they are kept in a
    de-normalized state. In dimensional modelling, we start from the ER diagram, determine a business process, build the
    corresponding fact and dimensions for that business process, and then repeat the same for the other business processes.

    FACT: a fact table contains 2 parts:

    Foreign keys (references to dimensions)

    Measures (additive or semi-additive fields, e.g. quantity sold and market value of a product)

    A dimension table contains the textual details of the records in the fact table.

    Dimensional modelling can be performed in 2 formats:

    Star Schema

    Here we have our fact table in the centre, while all the dimension tables surround it. The fact table holds a reference
    to each dimension.
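A small star schema can be sketched with sqlite3 as follows. The names fact_sales, dim_product and dim_date and the sample rows are assumptions made for this example only. The fact table carries nothing but foreign keys and measures, and a typical query joins it to a dimension to pick up the textual details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: textual details.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, brand TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT);

-- Fact table: foreign keys to the dimensions plus the measures.
CREATE TABLE fact_sales (
    product_key   INTEGER REFERENCES dim_product(product_key),
    date_key      INTEGER REFERENCES dim_date(date_key),
    quantity_sold INTEGER,
    market_value  REAL
);

INSERT INTO dim_product VALUES (1, 'Widget', 'Acme'), (2, 'Gadget', 'Acme');
INSERT INTO dim_date VALUES (20190729, '2019-07-29', '2019-07'),
                            (20190730, '2019-07-30', '2019-07');
INSERT INTO fact_sales VALUES (1, 20190729, 3, 30.0),
                              (1, 20190730, 2, 20.0),
                              (2, 20190729, 1, 15.0);
""")

# Quantity sold is additive, so it can be summed across any dimension.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.quantity_sold)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

print(rows)
```

Every dimension is one join away from the fact table, which is what keeps star-schema queries simple.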

    Snowflake Schema

    It has the same fundamental structure as the Star Schema; however, the dimensions are further normalized into separate
    tables. The principle behind snowflaking is normalisation of the dimension tables by removing low cardinality attributes
    and forming separate tables.
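As a rough illustration of snowflaking, the sketch below starts from a star-style product dimension and moves a low cardinality attribute (the brand) into its own table. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-style dimension: the brand name is repeated on every product row.
CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY,
                               product_name TEXT, brand_name TEXT);
INSERT INTO dim_product_star VALUES (1, 'Widget', 'Acme'),
                                    (2, 'Gadget', 'Acme'),
                                    (3, 'Gizmo',  'Zenith');

-- Snowflaked: the low cardinality attribute gets a separate table.
CREATE TABLE dim_brand (brand_key INTEGER PRIMARY KEY, brand_name TEXT UNIQUE);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT,
                          brand_key INTEGER REFERENCES dim_brand(brand_key));

INSERT INTO dim_brand (brand_name)
    SELECT DISTINCT brand_name FROM dim_product_star ORDER BY brand_name;
INSERT INTO dim_product
    SELECT s.product_key, s.product_name, b.brand_key
    FROM dim_product_star s
    JOIN dim_brand b ON b.brand_name = s.brand_name;
""")

brand_count = conn.execute("SELECT COUNT(*) FROM dim_brand").fetchone()[0]

# Reading the full product details back now needs one extra join.
rebuilt = conn.execute("""
    SELECT p.product_key, p.product_name, b.brand_name
    FROM dim_product p
    JOIN dim_brand b ON b.brand_key = p.brand_key
    ORDER BY p.product_key
""").fetchall()

print(brand_count, rebuilt)
```

Each brand is now stored once, at the price of an additional join whenever the brand name is needed.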

    Types of dimensions

    Conformed Dimensions: These are the dimensions that have the same meaning across multiple subject areas/business
    processes. They are the integration points within a data mart across multiple subject areas/business processes. Examples
    include the Time Dimension; in MDM there is the MDM_BATCH_DIM table.

    Degenerate Dimensions: These are dimensions that are derived from the fact table but have no dimension table of their own.

    Junk Dimensions: These are dimensions that contain low cardinality columns from fact tables, like indicators and flags.
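One common way to build a junk dimension is to generate one row per combination of the flag values and let the fact table store a single key instead of several flag columns. The sketch below assumes three made-up flags (is_gift, payment_type, is_returned); it is an illustration of the idea, not a prescribed design.

```python
from itertools import product

# Hypothetical low cardinality indicators that would otherwise sit in the fact table.
flag_domains = {
    "is_gift": ("N", "Y"),
    "payment_type": ("CARD", "CASH"),
    "is_returned": ("N", "Y"),
}

# One junk-dimension row per combination of flag values, keyed 1..n.
junk_dim = {
    combo: key
    for key, combo in enumerate(product(*flag_domains.values()), start=1)
}


def junk_key(is_gift, payment_type, is_returned):
    """Return the junk-dimension key a fact row would store for these flags."""
    return junk_dim[(is_gift, payment_type, is_returned)]


print(len(junk_dim))
print(junk_key("N", "CARD", "N"))
```

With 2 x 2 x 2 possible values, the junk dimension has 8 rows, and each fact row carries one small key in place of three columns.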

    Fact-less fact

    A fact table that does not contain any measure is called a fact-less fact. This table will only contain keys from
    different dimension tables. This is often used to resolve a many-to-many cardinality issue.
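A fact-less fact can be sketched as a table of keys only. The example below assumes a student/course enrolment scenario (the names are invented for illustration): students and courses are many-to-many, and the fact table simply records which pairs exist. Analysis is done by counting rows rather than summing a measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, student_name TEXT);
CREATE TABLE dim_course  (course_key  INTEGER PRIMARY KEY, course_name  TEXT);

-- Fact-less fact: only dimension keys, no measure columns.
CREATE TABLE fact_enrolment (
    student_key INTEGER REFERENCES dim_student(student_key),
    course_key  INTEGER REFERENCES dim_course(course_key),
    PRIMARY KEY (student_key, course_key)
);

INSERT INTO dim_student VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO dim_course  VALUES (10, 'SQL'), (20, 'ETL');
INSERT INTO fact_enrolment VALUES (1, 10), (1, 20), (2, 10);
""")

# With no measure to aggregate, questions are answered with COUNT(*).
enrolment_counts = conn.execute("""
    SELECT c.course_name, COUNT(*)
    FROM fact_enrolment f
    JOIN dim_course c ON c.course_key = f.course_key
    GROUP BY c.course_name
    ORDER BY c.course_name
""").fetchall()

print(enrolment_counts)
```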

    As a general rule, a de-normalized table will return query results faster than a set of normalized tables, because the
    query needs fewer joins.

    When star and when snowflake

    First of all, some definitions are in order. In a star schema, dimensions that reflect a hierarchy are flattened into a
    single table. For example, a star schema Geography Dimension would have columns like country, state/province,
    city and postal code. In the source system, this hierarchy would probably be normalized into multiple
    tables with one-to-many relationships.

    A snowflake schema does not flatten a hierarchy dimension into a single table. It would, instead, have two or

    more tables with a one-to-many relationship. This is a more normalized structure. For example, one table may

    have state/province and country columns and a second table would have city and postal code. The table with city

    and postal code would have a many-to-one relationship to the table with the state/province columns.
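The two Geography layouts described above can be put side by side in a short sqlite3 sketch. The table names (geo_state, geo_city, dim_geography) and the sample rows are assumptions for illustration. The snowflake version keeps two tables with a one-to-many relationship, and the star version is derived from it by flattening the hierarchy with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake layout: state/province and country in one table,
-- city and postal code in a second table that points to it.
CREATE TABLE geo_state (state_key INTEGER PRIMARY KEY,
                        state_province TEXT, country TEXT);
CREATE TABLE geo_city (city_key INTEGER PRIMARY KEY, city TEXT, postal_code TEXT,
                       state_key INTEGER REFERENCES geo_state(state_key));

INSERT INTO geo_state VALUES (1, 'Ontario', 'Canada'), (2, 'Texas', 'USA');
INSERT INTO geo_city VALUES (100, 'Toronto', 'M5H',   1),
                            (101, 'Ottawa',  'K1A',   1),
                            (102, 'Austin',  '78701', 2);

-- Star layout: the same hierarchy flattened into a single table.
CREATE TABLE dim_geography AS
    SELECT c.city_key AS geography_key,
           s.country, s.state_province, c.city, c.postal_code
    FROM geo_city c
    JOIN geo_state s ON s.state_key = c.state_key;
""")

star_rows = conn.execute("""
    SELECT country, state_province, city
    FROM dim_geography
    ORDER BY geography_key
""").fetchall()

print(star_rows)
```

In the flattened table the country and state/province values repeat for every city, which is the redundancy the star schema accepts in exchange for join-free lookups.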

    There are some good reasons for snowflake dimension tables. One example is a company that has many types of
    products. Some products have a few attributes, others have a great many. The products are very different from
    each other. The thing to do here is to create a core Product dimension that has the attributes common to all the
    products, such as product type, manufacturer, brand, product group, etc. Then create a separate sub-dimension table
    for each distinct group of products, where each group shares common attributes. The sub-product tables must
    contain a foreign key to the core Product dimension table.

    One of the criticisms of using snowflake dimensions is that it is difficult for some of the multidimensional
    front-end presentation tools to generate a query on a snowflake dimension. However, you can create a view for each
    combination of the core product/sub-product dimension tables and give the view a suitably descriptive name
    (Frozen Food Product, Hardware Product, etc.) and then these tools will have no problem.
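The core/sub-dimension arrangement and the views over it might look like the sketch below. The attribute columns (storage_temp_c, shelf_life_days, material) and the sample rows are invented for illustration; only the overall shape follows the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Core Product dimension: attributes common to all products.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                          product_type TEXT, manufacturer TEXT, brand TEXT);

-- Sub-dimension tables: group-specific attributes, keyed to the core table.
CREATE TABLE dim_frozen_food (
    product_key     INTEGER PRIMARY KEY REFERENCES dim_product(product_key),
    storage_temp_c  INTEGER,
    shelf_life_days INTEGER
);
CREATE TABLE dim_hardware (
    product_key INTEGER PRIMARY KEY REFERENCES dim_product(product_key),
    material    TEXT
);

INSERT INTO dim_product VALUES (1, 'Frozen Food', 'Polar', 'IceCo'),
                               (2, 'Hardware',    'Forge', 'BoltCo');
INSERT INTO dim_frozen_food VALUES (1, -18, 365);
INSERT INTO dim_hardware VALUES (2, 'Steel');

-- One view per core/sub-product combination, so a front-end tool
-- sees what looks like a single flat dimension.
CREATE VIEW frozen_food_product AS
    SELECT p.product_key, p.product_type, p.manufacturer, p.brand,
           f.storage_temp_c, f.shelf_life_days
    FROM dim_product p
    JOIN dim_frozen_food f ON f.product_key = p.product_key;

CREATE VIEW hardware_product AS
    SELECT p.product_key, p.product_type, p.manufacturer, p.brand, h.material
    FROM dim_product p
    JOIN dim_hardware h ON h.product_key = p.product_key;
""")

frozen = conn.execute("SELECT brand, storage_temp_c FROM frozen_food_product").fetchall()
hardware = conn.execute("SELECT brand, material FROM hardware_product").fetchall()

print(frozen, hardware)
```

A reporting tool can then query frozen_food_product or hardware_product as if each were an ordinary star-schema dimension.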