
Data Vault & Ensemble Modeling - BI-Podium · The Genesee Academy CDVDM – Data Vault Modeling Course. The CDVDM is the data vault certification course covering all main topics of


© 2013 Genesee Academy, LLC

25568 Genesee Trail Rd Golden, Colorado 80401

(303) 526-0340

Data Vault Modeling and Approach · DW2.0 and Unstructured Data · Master Data Management and Metadata

Data Vault & Ensemble Modeling

BI Podium Next Generation DWH Modeling 2013

Hans Hultgren


gohansgo


Data Vault & Ensemble Modeling

• Welcome

• Quick audience poll:
  – Data Warehousing & Business Intelligence
  – Data Vault Modeling
  – Certification Course

• Session will cover:
  – Data Vault
  – Ensemble
  – Unified Decomposition
  – Data Warehousing
  – Agility

• More information


Data Vault and Ensemble Modeling

• About Data Warehousing - Characteristics

Each layer of the architecture has its own requirements, constraints & variables


• Why do we need it?


About Data Vault, Ensemble & the EDW


• Enterprise Data Warehousing

• Integrated, Non-Volatile, Time-Variant, Subject/Concept Oriented, Central data store.

• Core Features: Enterprise-Wide, Historized, Auditable, Central Data, Integrated across all internal and external sources.


Why do we use Data Vault?

• Integration
• Traceability
• History
• Incremental Build
• Agility

• Gracefully Adapts to New Sources
• Full Auditability: Source to Mart
• Enterprise View of Central Data

• Data Vault is optimized for modeling the EDW


• Data Vault is the leading data modeling approach among new options for the flexible/agile data warehouse.

Data Modeling Approaches:
– Operational: 3rd Normal Form
– Data Warehouse: Data Vault
– Data Mart: Dimensional

• There are other techniques for data warehouse agility as well. This broader family of techniques consists of flavors of Ensemble Modeling.

• In effect, Ensemble modeling = EDW modeling.

• Ensemble is based on the premise that the flexibility required by the data warehouse needs a model that de-couples the business keys, the relationships and the changing context from one another (Unified Decomposition).


Agenda

• Background Topics:
  – Core Business Concepts
  – Agility
• Unified Decomposition
• Ensemble Modeling
• Data Vault Agility
• The Data Vault Ensemble
• Data Vault Core Constructs
• Applying Data Vault
• Core Concepts and the Backbone
• DV Pattern applied
• Bottom Line and Summary


INTEGRATION & THE CORE BUSINESS CONCEPT


The Core Business Concept


• The Core Business Concept is the basis for our Data Vault data warehouse. It is similar to an Entity in 3NF or a Dimension in a Star Schema, so common examples include Customer, Product and Employee.

• Important to note: Core Business Concepts are 1) business driven and 2) enterprise wide.


ABOUT AGILITY


Agile Data Warehousing BI


• Agility = a measure of the ability to adapt to change

• The EDW constantly needs to adapt to change:
  – New Sources
  – New Attributes
  – Changing Sources
  – New and Changing Requirements
  – New and Changing Business Rules
  – New and Changing Deliveries
  – Expanding Subject Areas

Data Warehousing = Adapting to Change


UNIFIED DECOMPOSITION™


Unified Decomposition™


Separate the things that change from the things that do not.

• Break things out into component parts for flexibility, and to capture things that are interpreted in different ways or that change independently of each other. Decomposition.

• However, these parts need to be integrated to define the core business concept (the Entity, the Dimension, etc.), so they must be kept together. Unified.
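As a rough illustration of decomposition followed by unification, the sketch below splits one flat source record into a business key part, a relationship part and a context part. The field names (`customer_number`, `customer_class`, `name`, `city`) and the `decompose` helper are hypothetical and do not come from the slides. Every part keeps the customer business key, so the parts can always be reassembled into the whole concept.

```python
# Hypothetical flat source row; field names are illustrative only.
SOURCE_ROW = {
    "customer_number": "C-1001",  # business key of the Customer concept
    "customer_class": "GOLD",     # business key of a related concept
    "name": "Acme Corp",          # descriptive context
    "city": "Golden",             # descriptive context (may change over time)
}


def decompose(row, load_dts, record_source):
    """Split a flat row into key, relationship and context parts.

    Every part carries the customer business key, so the parts can
    always be unified again into the whole core business concept.
    """
    audit = {"load_dts": load_dts, "record_source": record_source}
    key = {"customer_number": row["customer_number"], **audit}
    relationship = {
        "customer_number": row["customer_number"],
        "customer_class": row["customer_class"],
        **audit,
    }
    context = {
        "customer_number": row["customer_number"],
        "name": row["name"],
        "city": row["city"],
        **audit,
    }
    return key, relationship, context


key, rel, ctx = decompose(SOURCE_ROW, "2013-06-10 08:00:00", "CRM")
print(sorted(key))  # ['customer_number', 'load_dts', 'record_source']
```

Each part can now change on its own schedule. A new city affects only the context part, and the key part stays stable.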


Ensemble Modeling™


All the parts of a thing taken together, so that each part is considered only in relation to the whole.

• The constellation of component parts acts as a whole – an Ensemble.

• With Ensemble Modeling the Core Business Concepts that we define and model are represented as a whole – an ensemble – including all of the component parts.

• An Ensemble is based on everything that defines a Core Business Concept and that can be said uniquely and specifically about one instance of that Concept.


Data Vault Agility


• The Data Vault Ensemble conforms to a single key embodied in the Hub construct.

• The component parts of the Data Vault Ensemble include:
  – Hub: the Natural Business Key
  – Link: the Natural Business Relationships
  – Satellite: all Context, Descriptive Data and History


The Data Vault Ensemble


• Data Vault constructs have been broken out by type of data…


Hubs

– A Hub Construct in Data Vault:
  • contains the Business Key
  • contains only the Business Key
  • contains No Context
  • is always 1:1 with the EWBK (enterprise-wide business key)

– A Hub Table contains only:
  • Business Key
  • Surrogate Key (Data Warehouse)
  • Load Date / Time Stamp
  • Record Source

Diagram: H_Customer table with columns H_Customer_SID, Business Key, Date/Time Stamp and Record source.
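The Hub table described above can be sketched in SQLite as follows. The business key column name `Customer_Number`, the `load_hub_key` helper and the choice of SQLite are illustrative assumptions. The UNIQUE constraint keeps the Hub 1:1 with the business key, and `INSERT OR IGNORE` makes reloading a key a no-op.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE H_Customer (
        H_Customer_SID  INTEGER PRIMARY KEY,   -- data warehouse surrogate key
        Customer_Number TEXT NOT NULL UNIQUE,  -- the business key, and only the key
        Load_DTS        TEXT NOT NULL,         -- load date/time stamp
        Record_Source   TEXT NOT NULL          -- source where the key was first seen
    )
""")


def load_hub_key(conn, business_key, load_dts, record_source):
    """Insert a business key once and return its surrogate key (hypothetical helper)."""
    # A key that already exists is ignored, so the first load's audit data is kept.
    conn.execute(
        "INSERT OR IGNORE INTO H_Customer (Customer_Number, Load_DTS, Record_Source) "
        "VALUES (?, ?, ?)",
        (business_key, load_dts, record_source),
    )
    row = conn.execute(
        "SELECT H_Customer_SID FROM H_Customer WHERE Customer_Number = ?",
        (business_key,),
    ).fetchone()
    return row[0]


sid_a = load_hub_key(conn, "C-1001", "2013-06-10 08:00:00", "CRM")
sid_b = load_hub_key(conn, "C-1001", "2013-06-11 08:00:00", "ERP")  # same key, new source
print(sid_a == sid_b)  # True
```

The Hub holds no descriptive columns. Everything else about the customer belongs in Satellites.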


Links

– A Link Construct in Data Vault:
  • contains a Relationship
  • contains only a Relationship
  • contains No Context
  • is always 1:1 with the Relationship

– A Link Table contains only:
  • 2-n FKs for the Relationship
  • Surrogate Key (Data Warehouse)
  • Load Date / Time Stamp
  • Record Source

Diagram: L_Cust_Class table with columns L_Cust_Class_SID, H_Sequence1_SID, H_Sequence2_SID, Date/Time Stamp and Record source.
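As a sketch of the Link table above, the SQLite schema below joins a Customer Hub to a hypothetical Class Hub. The slide's generic H_Sequence1_SID and H_Sequence2_SID are named `H_Customer_SID` and `H_Class_SID` here, and the `link` helper is an illustrative assumption. A UNIQUE constraint over the FK pair keeps one row per distinct relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE H_Customer (
        H_Customer_SID INTEGER PRIMARY KEY, Customer_Number TEXT NOT NULL UNIQUE,
        Load_DTS TEXT NOT NULL, Record_Source TEXT NOT NULL);
    CREATE TABLE H_Class (
        H_Class_SID INTEGER PRIMARY KEY, Class_Code TEXT NOT NULL UNIQUE,
        Load_DTS TEXT NOT NULL, Record_Source TEXT NOT NULL);
    -- The link holds only the relationship: FKs to the hubs plus audit columns.
    CREATE TABLE L_Cust_Class (
        L_Cust_Class_SID INTEGER PRIMARY KEY,
        H_Customer_SID INTEGER NOT NULL REFERENCES H_Customer(H_Customer_SID),
        H_Class_SID    INTEGER NOT NULL REFERENCES H_Class(H_Class_SID),
        Load_DTS TEXT NOT NULL, Record_Source TEXT NOT NULL,
        UNIQUE (H_Customer_SID, H_Class_SID)  -- one row per distinct relationship
    );
""")


def link(conn, cust_sid, class_sid, load_dts, source):
    """Record a relationship once; repeated sightings are ignored (hypothetical helper)."""
    conn.execute(
        "INSERT OR IGNORE INTO L_Cust_Class "
        "(H_Customer_SID, H_Class_SID, Load_DTS, Record_Source) VALUES (?, ?, ?, ?)",
        (cust_sid, class_sid, load_dts, source),
    )


conn.execute("INSERT INTO H_Customer VALUES (1, 'C-1001', '2013-06-10', 'CRM')")
conn.execute("INSERT INTO H_Class VALUES (1, 'GOLD', '2013-06-10', 'CRM')")
link(conn, 1, 1, "2013-06-10", "CRM")
link(conn, 1, 1, "2013-06-11", "ERP")  # same relationship seen again: no new row
print(conn.execute("SELECT COUNT(*) FROM L_Cust_Class").fetchone()[0])  # 1
```

A relationship among three or more concepts (the "2-n FKs" case) would add one FK column per participating Hub and widen the UNIQUE constraint to match.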


Satellites

– A Satellite Construct in Data Vault:
  • contains Context only
  • has no FKs (no relationships)
  • is designed by:
    * Rate of Change
    * Type of Data
    * System…

– A Satellite Table contains only:
  • Business Key FK +
  • Load Date / Time Stamp
  • Context Data…
  • Record Source

Diagram: S_Customer table attached to H_Customer, with columns Date/Time Stamp, Context A, Context B, Context C, Context D and Record source.
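A minimal SQLite sketch of the Satellite table follows. The only key it carries is its parent Hub's key. Combined with the load date/time stamp, that key identifies one version of the context, and the Satellite has no relationships to other concepts. The context columns `Name`, `City` and `Credit_Limit` are hypothetical stand-ins for the slide's Context A to D.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE H_Customer (
        H_Customer_SID INTEGER PRIMARY KEY, Customer_Number TEXT NOT NULL UNIQUE,
        Load_DTS TEXT NOT NULL, Record_Source TEXT NOT NULL);
    -- Satellite: parent hub key + load timestamp identify one version of the context.
    CREATE TABLE S_Customer (
        H_Customer_SID INTEGER NOT NULL REFERENCES H_Customer(H_Customer_SID),
        Load_DTS       TEXT NOT NULL,
        Name           TEXT,     -- context A (hypothetical)
        City           TEXT,     -- context B (hypothetical)
        Credit_Limit   INTEGER,  -- context C (hypothetical)
        Record_Source  TEXT NOT NULL,
        PRIMARY KEY (H_Customer_SID, Load_DTS)
    );
""")
conn.execute("INSERT INTO H_Customer VALUES (1, 'C-1001', '2013-06-10', 'CRM')")
conn.execute("INSERT INTO S_Customer VALUES (1, '2013-06-10', 'Acme Corp', 'Golden', 5000, 'CRM')")
conn.execute("INSERT INTO S_Customer VALUES (1, '2013-07-01', 'Acme Corp', 'Denver', 5000, 'CRM')")

# Both versions of the context are kept; the newest Load_DTS is the current one.
for row in conn.execute("SELECT Load_DTS, City FROM S_Customer ORDER BY Load_DTS"):
    print(row)
```

If these context columns changed at very different rates or came from different systems, they could instead be split across several Satellites on the same Hub. That is the "designed by rate of change / type of data / system" choice listed above.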


Applying the data vault modeling pattern


Data Vault Model – How it Looks


Data Vault Model for Customer Sales with Employee and Product.


Core Concepts


Six (6) Concept Keys


Data Vault Backbone

Six (6) Concept Keys

The model as viewed without the things that describe the keys and without the things that change over time.

The core foundation: the skeletal structure of the data vault model.
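The backbone can be illustrated with a query that walks from Hub to Link to Hub without touching any Satellite. The sketch below assumes SQLite and uses two hypothetical hubs (`H_Customer`, `H_Class`) in place of the six concept keys on the slide. Audit columns (Load_DTS, Record_Source) are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE H_Customer (H_Customer_SID INTEGER PRIMARY KEY, Customer_Number TEXT UNIQUE);
    CREATE TABLE H_Class    (H_Class_SID INTEGER PRIMARY KEY, Class_Code TEXT UNIQUE);
    CREATE TABLE L_Cust_Class (
        L_Cust_Class_SID INTEGER PRIMARY KEY,
        H_Customer_SID INTEGER REFERENCES H_Customer(H_Customer_SID),
        H_Class_SID    INTEGER REFERENCES H_Class(H_Class_SID));
    INSERT INTO H_Customer VALUES (1, 'C-1001'), (2, 'C-1002');
    INSERT INTO H_Class    VALUES (1, 'GOLD'), (2, 'SILVER');
    INSERT INTO L_Cust_Class VALUES (1, 1, 1), (2, 2, 2);
""")

# Keys and relationships only: no descriptive or time-varying data is involved.
backbone = conn.execute("""
    SELECT c.Customer_Number, k.Class_Code
    FROM L_Cust_Class l
    JOIN H_Customer c ON c.H_Customer_SID = l.H_Customer_SID
    JOIN H_Class    k ON k.H_Class_SID    = l.H_Class_SID
    ORDER BY c.Customer_Number
""").fetchall()
print(backbone)  # [('C-1001', 'GOLD'), ('C-1002', 'SILVER')]
```

Because Satellites hang off this skeleton rather than forming part of it, context can be added or changed without reshaping the backbone itself.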


The Complete Data Vault Model


The complete model with all context and history, easily adapting to change.


Applying the data vault modeling pattern


Tracking History: Time Slice Data
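The time-slice diagrams themselves are not recoverable here. As a hedged illustration of how a Satellite can track history, the sketch below adds a new slice only when the incoming context differs from the latest stored one, then answers a point-in-time question. The helpers `load_slice` and `city_as_of` are hypothetical. The sketch assumes ISO-formatted Load_DTS strings that sort chronologically and loads that arrive in time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE S_Customer (
        H_Customer_SID INTEGER NOT NULL,
        Load_DTS       TEXT NOT NULL,
        City           TEXT,
        Record_Source  TEXT NOT NULL,
        PRIMARY KEY (H_Customer_SID, Load_DTS))
""")


def load_slice(conn, sid, load_dts, city, source):
    """Add a new time slice only if the context differs from the latest one."""
    latest = conn.execute(
        "SELECT City FROM S_Customer WHERE H_Customer_SID = ? "
        "ORDER BY Load_DTS DESC LIMIT 1", (sid,)).fetchone()
    if latest is not None and latest[0] == city:
        return False  # unchanged context: nothing new to store
    conn.execute("INSERT INTO S_Customer VALUES (?, ?, ?, ?)",
                 (sid, load_dts, city, source))
    return True


def city_as_of(conn, sid, as_of):
    """Return the context that was current at a given point in time, or None."""
    row = conn.execute(
        "SELECT City FROM S_Customer WHERE H_Customer_SID = ? AND Load_DTS <= ? "
        "ORDER BY Load_DTS DESC LIMIT 1", (sid, as_of)).fetchone()
    return row[0] if row else None


load_slice(conn, 1, "2013-01-01", "Golden", "CRM")
load_slice(conn, 1, "2013-02-01", "Golden", "CRM")  # no change, skipped
load_slice(conn, 1, "2013-03-01", "Denver", "CRM")
print(city_as_of(conn, 1, "2013-02-15"))  # Golden
print(city_as_of(conn, 1, "2013-06-01"))  # Denver
```

With several context columns, the comparison would typically cover all of them rather than a single column.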


Impact of Change: New Attribute


New Attribute
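The slide's diagram for this change is not recoverable. One common way a Data Vault absorbs a new attribute is to attach a new Satellite to the existing Hub instead of altering existing tables. The sketch below assumes that approach and uses SQLite. The `S_Customer_Loyalty` table, its `Loyalty_Tier` column and the `columns` helper are hypothetical. The check confirms that the existing structures are unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE H_Customer (
        H_Customer_SID INTEGER PRIMARY KEY, Customer_Number TEXT NOT NULL UNIQUE,
        Load_DTS TEXT NOT NULL, Record_Source TEXT NOT NULL);
    CREATE TABLE S_Customer (
        H_Customer_SID INTEGER NOT NULL REFERENCES H_Customer(H_Customer_SID),
        Load_DTS TEXT NOT NULL, Name TEXT, City TEXT, Record_Source TEXT NOT NULL,
        PRIMARY KEY (H_Customer_SID, Load_DTS));
""")


def columns(conn, table):
    """List a table's column names (PRAGMA table_info row field 1 is the name)."""
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]


before = {t: columns(conn, t) for t in ("H_Customer", "S_Customer")}

# A new attribute arrives. Instead of ALTERing existing tables,
# attach a new satellite to the same hub.
conn.execute("""
    CREATE TABLE S_Customer_Loyalty (
        H_Customer_SID INTEGER NOT NULL REFERENCES H_Customer(H_Customer_SID),
        Load_DTS TEXT NOT NULL, Loyalty_Tier TEXT, Record_Source TEXT NOT NULL,
        PRIMARY KEY (H_Customer_SID, Load_DTS))
""")

after = {t: columns(conn, t) for t in ("H_Customer", "S_Customer")}
print(before == after)  # True
```

Because the existing Hub and Satellite are untouched, their load processes and any queries built on them keep working as before.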


The Bottom Line

• The Data Warehouse needs to adapt to change easily, be based on central business concepts, integrate data from several sources, track the history of changing context, contain trusted and auditable information, and perform well.

• Answering this call requires a data warehouse program designed to meet these requirements, with the people, processes and modeling techniques that support them.

• Data Warehouse modeling => Ensemble modeling: techniques based on Unified Decomposition. Several forms of Ensemble modeling are in use today.

• Data Vault modeling is the leading form of Ensemble modeling today.

• The Best Practice is Modeling Awareness.


Data Vault Around the World


An estimated 750 Data Vault based data warehouses around the world.


Data Vault Certification Course

The Genesee Academy CDVDM – Data Vault Modeling Course. The CDVDM is the data vault certification course covering all main topics of data vault modeling. The course is delivered using a blended learning method: online video lessons (2 weeks), classroom lectures, exercises, labs and small-group modeling cases. Public courses are offered on a regular schedule (www.GeneseeAcademy.com), and in-company options are available as well.

Data Vault Class: June 10-11, Amsterdam NL

Register Today!

About Hans Hultgren

• Hans Hultgren is an author, speaker, educator and advisor in the data warehousing and business intelligence space. He is an expert on data vault modeling and the author of Modeling the Agile Data Warehouse with Data Vault, where he introduced Ensemble Modeling and Unified Decomposition.

• Hans is the President of Genesee Academy, LLC (including www.DataVaultAcademy.com), which provides the CDVDM data vault certification around the globe.

• For 20 years Hans was a professor at DU, where he was the founder and director of the Master of Science degree in Business Intelligence and Data Warehousing (MSBI).