Data Vault Fundamentals & Best Practices
Erik Fransen, managing consultant
+31 6 159 444 76
@erikfransen
Agenda
• Introduction
• Data Vault Basics
• Benefits & Challenges
• Best practices: Automation & Data Virtualization
• Recommended reading
• Founded in 1998, The Hague, NL
• 40+ consultants
• Business Intelligence, Data Vault, Datawarehousing, Datawarehouse Automation, Big Data, Data Virtualization
• Business & technical consultancy, end-to-end implementation projects of Data Vault EDW, audits, training, certification
• Wide range of customers (profit, non-profit) across various industries
• Since 2009 Genesee Academy partner for Data Vault Day and Data Vault Certification in NL, B & D
• Implementation partner of Cisco, MapR, Qlik & Tableau
The Data Vault modeling approach
Data Vault is a data modeling approach
…so it fits into the family of modeling approaches:
3rd Normal Form, Ensemble Modeling, Dimensional
• While 3rd Normal Form is optimal for Operational Systems
…and Dimensional is optimal for Data Marts
…and Ensemble Modeling is optimal for the Datawarehouse
• And Data Vault is the leading form of Ensemble Modeling
Forms of Ensemble Modeling
Why do we use Data Vault for DWH?
• When we need a DWH that supports:
– Integration
– Traceability
– History
– Incremental Build
– Agility
• Gracefully Adapts to New Sources
• Full Auditability - Source to Mart
• Enterprise View of Central Data
• Ready for Automation
Data Vault is specifically designed for modelling the EDW
The Data Vault Ensemble
• The Data Vault Ensemble conforms to a single key – embodied in the Hub construct
• The parts for the Data Vault Ensemble only include:
– Hubs: The Natural Business Keys
– Links: The Natural Business Relationships
– Satellites: All Context, Descriptive Data and History of Links and Hubs
“Separating things that change from things that don’t change”
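The three constructs above can be sketched as plain data structures. This is a minimal illustration, not a reference implementation: the `load_ts` and `record_source` fields, the `current_context` helper and the sample customer are assumptions added for the example and do not come from the slides.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Hub:
    """One row per natural business key; the key itself does not change."""
    business_key: str
    load_ts: datetime        # assumed audit column
    record_source: str       # assumed audit column


@dataclass(frozen=True)
class Link:
    """One row per unique relationship between two or more hub keys."""
    hub_keys: tuple[str, ...]
    load_ts: datetime
    record_source: str


@dataclass(frozen=True)
class Satellite:
    """Descriptive context for one hub or link key, as loaded at load_ts."""
    parent_key: str
    load_ts: datetime
    record_source: str
    attributes: tuple[tuple[str, str], ...]


def current_context(sats: list[Satellite], parent_key: str) -> dict[str, str]:
    """Return the most recently loaded attributes for one parent key."""
    rows = [s for s in sats if s.parent_key == parent_key]
    if not rows:
        return {}
    latest = max(rows, key=lambda s: s.load_ts)
    return dict(latest.attributes)


# The hub row stays the same; every change in context adds a satellite row.
hub = Hub("CUST-001", datetime(2016, 1, 1), "CRM")
history = [
    Satellite("CUST-001", datetime(2016, 1, 1), "CRM", (("city", "The Hague"),)),
    Satellite("CUST-001", datetime(2016, 2, 1), "CRM", (("city", "Amsterdam"),)),
]
print(current_context(history, hub.business_key))
```

The hub holds the part that does not change, while the satellite rows accumulate the part that does, so history is kept without ever updating the key.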
The Data Vault modeling approach
• As the scope of the EDW is expanded and new data sources added, the Data Vault can adapt to these changes without impacting the existing model
• This is what allows the EDW to be built incrementally and to adapt to change without the need for re-engineering.
[Figure: New Area absorbed. Hubs shown: H_Cust, H_Sale, H_Empl, H_Store, H_Car]
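A small sketch of what "absorbing a new area" can look like at table level, using an in-memory SQLite database. The hub names H_Cust, H_Sale and H_Car come from the diagram; the link tables `L_Cust_Sale` and `L_Sale_Car`, all column names, and the choice of SQLite are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_definitions(connection):
    """Map each table name to the CREATE statement SQLite stored for it."""
    rows = connection.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return dict(rows)


# Existing model: two hubs and the link between them.
conn.executescript("""
CREATE TABLE H_Cust (cust_key TEXT PRIMARY KEY, load_ts TEXT, record_source TEXT);
CREATE TABLE H_Sale (sale_key TEXT PRIMARY KEY, load_ts TEXT, record_source TEXT);
CREATE TABLE L_Cust_Sale (
    cust_key TEXT REFERENCES H_Cust (cust_key),
    sale_key TEXT REFERENCES H_Sale (sale_key),
    load_ts TEXT,
    record_source TEXT,
    PRIMARY KEY (cust_key, sale_key)
);
""")
before = table_definitions(conn)

# New area: only CREATE statements, no ALTER on anything that already exists.
conn.executescript("""
CREATE TABLE H_Car (car_key TEXT PRIMARY KEY, load_ts TEXT, record_source TEXT);
CREATE TABLE L_Sale_Car (
    sale_key TEXT REFERENCES H_Sale (sale_key),
    car_key TEXT REFERENCES H_Car (car_key),
    load_ts TEXT,
    record_source TEXT,
    PRIMARY KEY (sale_key, car_key)
);
""")
after = table_definitions(conn)

# Every pre-existing table definition is byte-for-byte the same afterwards.
unchanged = all(after[name] == sql for name, sql in before.items())
new_tables = sorted(set(after) - set(before))
print(unchanged, new_tables)
```

The new hub attaches to the existing model through a new link table, which is why the existing hubs and links need no re-engineering.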
Tools for DWH Automation update the Data Vault EDW (model + data) in a fast, agile & consistent way
Data Vault Benefits
• Business benefits
– Ability to adapt quickly to new business needs
– Data is traceable, allowing for a fully auditable, integrated data store
– Allows the EDW to absorb all data all of the time
– Easily adapts to new data sources and changing business rules, without expensive re-engineering
– Results in a Data Warehouse with lower total cost of ownership (TCO)
– Automation: short time to market, consistent quality
• Project/development benefits
– Ideal for agile development techniques, resulting in lower project risk and more frequent deliverables
– Can be built incrementally without compromising the core architecture
– Automation: fast and incremental sprints, predictable costs
• Architectural benefits
– Parallel loading
– Data architecture that supports future expanded scope
– Can scale to virtually any size
– Ready for Automation: forces standardization
Data Vault Modeling Process
The Modeling Process for creating a Data Vault model includes three primary steps:
1) Identify and Model the Core Business Concepts
• Business interviews are at the heart of this step: What do you do? What are the main things you work with?
• Also find the best/target Natural Business Key
2) Identify and Model the Natural Business Relationships
• Specific Unique Relationships
3) Analyze and Design the Context Satellites
• Consider Rate of Change, Type of Data and also the Sources of your data during the design process
Ideally the data vault is modelled based on business processes and business concepts
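Step 3 of the modeling process can be illustrated with a simple grouping heuristic: attributes of one hub that share a source and a rate of change go into the same satellite. This is one possible sketch, not a prescribed rule; the attribute list, the rate labels and the `S_<hub>_<source>_<rate>` naming pattern are all hypothetical.

```python
from collections import defaultdict

# (attribute, source system, rate of change): hypothetical customer attributes.
attributes = [
    ("name", "CRM", "slow"),
    ("date_of_birth", "CRM", "slow"),
    ("credit_limit", "ERP", "fast"),
    ("last_login", "Weblogs", "fast"),
    ("account_balance", "ERP", "fast"),
]


def design_satellites(hub_name, attrs):
    """Group attributes into one satellite per (source, rate of change)."""
    groups = defaultdict(list)
    for name, source, rate in attrs:
        groups[f"S_{hub_name}_{source}_{rate}"].append(name)
    return dict(groups)


satellites = design_satellites("Cust", attributes)
for satellite_name, columns in satellites.items():
    print(satellite_name, columns)
```

Splitting this way keeps fast-changing attributes from generating new history rows for slow-changing ones, and lets each source be loaded independently.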
Getting data out of the Data Vault
• Problem:
– The Data Vault EDW is about data decomposition, data registration and data integration
– Data Vault is neither intended, designed nor optimized for data distribution and data consumption downstream of the EDW
– Typically leads to many complex physical data marts (high maintenance, high cost)
• Solution:
– Start thinking differently: focus on creating functional data products for the business
– Stop loading and replicating data physically, start using data virtualization
• Eliminate the need for physical data marts
• No data replication needed
• Real-time data refreshment
• No redundant data storage
• Simple updates of data models
• Simple queries
• Short Time to Market
• Automatic updates
• Lower storage costs
• High performance
• Ready for Big Data
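The idea of a virtual data mart can be sketched with a plain SQL view standing in for a data virtualization tool. This is a simplified assumption: a real virtualization product does far more than a database view, and the table `S_Cust`, the view `V_Dim_Customer` and all column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE H_Cust (cust_key TEXT PRIMARY KEY, load_ts TEXT, record_source TEXT);
CREATE TABLE S_Cust (
    cust_key TEXT REFERENCES H_Cust (cust_key),
    load_ts TEXT,
    city TEXT,
    PRIMARY KEY (cust_key, load_ts)
);

-- Virtual data mart: a view, so no data is copied out of the Data Vault.
CREATE VIEW V_Dim_Customer AS
SELECT h.cust_key, s.city
FROM H_Cust AS h
JOIN S_Cust AS s ON s.cust_key = h.cust_key
WHERE s.load_ts = (
    SELECT MAX(s2.load_ts) FROM S_Cust AS s2 WHERE s2.cust_key = h.cust_key
);

INSERT INTO H_Cust VALUES ('CUST-001', '2016-01-01', 'CRM');
INSERT INTO S_Cust VALUES ('CUST-001', '2016-01-01', 'The Hague');
""")


def customer_dimension(connection):
    """Query the virtual dimension; it is computed from the vault on demand."""
    return connection.execute(
        "SELECT cust_key, city FROM V_Dim_Customer ORDER BY cust_key"
    ).fetchall()


first = customer_dimension(conn)

# A new satellite row arrives; the view reflects it without any mart reload.
conn.execute("INSERT INTO S_Cust VALUES ('CUST-001', '2016-02-01', 'Amsterdam')")
second = customer_dimension(conn)
print(first, second)
```

Because the dimension is derived at query time, a change in the vault shows up immediately and there is no second copy of the data to store or maintain.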
[Figure labels: CRM, ERP, Weblogs, …, Production; Data Vault EDW; Data; Data Copy; Data Virtualization Tool + Data Abstraction Layers; SQL; Steering information; No Data Copy at all]
Data Layers for Data Virtualization
[Figure labels: Data Vault datawarehouse; Any other source data; Virtual “Physical” Layer; Virtual Business Layer; Virtual Application Layer; Data Virtualization “Physical” Model; Operational Data Model; Uniform Data Model; SuperNova Data Model; Webservices; Views; Automated step!]
Wrap up
• Data Vault Basics:
– Hubs, Links, Satellites
– Integration, history, incremental modelling, agility
• Benefits:
– Business, project, architecture
– Make use of automation tools for fast, agile and consistent delivery
• Challenges:
– Data downstream the data vault EDW
– Solution: use virtual data marts and automate SuperNova data models for reporting & analytics
Recommended reading on SuperNova
Free download
http://www.cisco.com/web/services/enterprise-it-services/data-virtualization/documents/whitepaper-cisco-datavaul.pdf
Recommended reading on Data Vault
Free downloads
http://hanshultgren.wordpress.com/
Recommended reading on Ensemble & Data Vault
Modeling the Agile Data Warehouse with Data Vault
• Data Vault Modeling
• Agile Data Warehousing BI
• Enterprise Data Warehousing
• Data Integration and DWBI Architecture
• Unified Decomposition™
• Ensemble Modeling™
• A complete book on Data Vault
• An Introduction, a Guide and a Reference
• Modeling, Architecture & the Data Warehousing Program
• Data & Semantic Integration for Enterprise Central Meaning
• Applying Concepts to a successful Agile DWBI Program
Recommended reading on Data Virtualization
Data Virtualization in Business Intelligence Architectures
• First independent book on data virtualization that explains in a product-independent way how data virtualization technology works.
• Illustrates concepts using examples developed with commercially available products.
• Shows you how to solve common data integration challenges such as data quality, system interference, and overall performance by following practical guidelines on using data virtualization.
• Apply data virtualization right away with three chapters full of practical implementation guidance.
• Understand the big picture of data virtualization and its relationship with data governance and information management.
Data Vault Training & Certification
• CDVDM: March 31, April 1 2016 Amsterdam
• DVD: March 2, 2016 Diegem
• www.centennium-opleidingen.nl
• For all questions: [email protected]
A short history on Data Vault
• 2002: First papers published by Dan Linstedt
• 2006: Start CDVDM certification program by Genesee Academy
• 2007: Start of Data Vault EDW implementations
– Primarily in Europe (NL, S), some in USA
• 2008-2015: Several books published on Data Vault by Dan Linstedt, Hans Hultgren and others
• 2013: Data Vault on the radar in B, DACH, UK, USA, AUS, NZ, Asia
• 2013: Data Vault EDW implementations going worldwide
• 2015: Over 900 CDVDM professionals and 750+ Data Vault EDW worldwide