Agile Data Warehousing From Start to Finish Presenter: Davide Mauri, Architect & Mentor, SolidQ Moderator: Alex Whittles


Page 1: Agile Data Warehousing

Agile Data Warehousing: From Start to Finish

Presenter: Davide Mauri, Architect & Mentor, SolidQ
Moderator: Alex Whittles

Page 2: Agile Data Warehousing

Technical Assistance


If you require assistance during the session, type your inquiry into the question pane on the right side.

Maximize your screen with the zoom button on the top of the presentation window

Type your questions in the question pane on the right side

Page 3: Agile Data Warehousing

Thank You Sponsors

Welcome to the Azure family! Try DocumentDB today!

http://documentdb.com

Solutions from Dell help you monitor, manage, protect and improve your SQL Server environment.

http://software.dell.com/sql-pass-vc-dell-sql-server-solutions

Page 4: Agile Data Warehousing

Planning on attending PASS Summit 2014? Start saving today!

• The world’s largest gathering of SQL Server & BI professionals

• Take your SQL Server skills to the next level by learning from the world’s SQL Server experts, in 190+ technical sessions

• Over 5000 attendees, representing 2000 companies, from 52 countries, ready to network & learn

Use discount code 24HOP14 to save $200!

$1,895 until September 26, 2014

www.PASSSummit.com

Page 5: Agile Data Warehousing

Davide Mauri

• SolidQ Mentor
• Board of Directors, SolidQ Italy
• Microsoft SQL Server MVP
• Works with managers to build effective, tailor-made BI solutions for customers

@mauridb

Page 6: Agile Data Warehousing

Agile Data Warehousing: From Start to Finish

Davide Mauri, Architect & Mentor, SolidQ

Page 7: Agile Data Warehousing

Agenda

• What is a DWH, really?
• Agile: the only way to succeed
• Engineering the DWH
• ETL Design Patterns
• ETL Automation
• Testing

Page 8: Agile Data Warehousing

What is a DWH, really?

Page 9: Agile Data Warehousing

The Data-Driven Age

Page 10: Agile Data Warehousing

Isn’t the DWH an “old” thing?

Can’t Big Data, in-memory and all the new stuff simply replace the Data Warehouse?

The answer would be “yes” if a DWH were just a simple “container” of data.

But it’s much more than this.

Page 11: Agile Data Warehousing

What is a DWH, really?

In this new era, data is like water.

Who will ever drink from untested, untrusted, uncertified data?

Page 12: Agile Data Warehousing

What is a DWH, really?

Would a manager or a decision maker make a decision based on data whose source, integrity and correctness are unknown?

Page 13: Agile Data Warehousing

What is a DWH, really?

The Data Warehouse is the place where managers and decision makers will look for
• Correct
• Trusted
• Updated
data in order to make a conscious decision

Page 14: Agile Data Warehousing

What is a DWH, really?

The answer is now easy:

Page 15: Agile Data Warehousing

What is a DWH, really?

• A place to store consolidated data coming from the whole company
• A place to cleanse, verify and certify data
• A place where historic data is stored
• A place that holds the single version of truth (if there is one!)
• Forms the core of a BI solution
• User-friendly data models, designed to make data analysis easier

Page 16: Agile Data Warehousing

Modern Data Environment

[Diagram: Master Data, EDW, Data Mart, Big Data, Structured Data and Unstructured Data feeding a BI Environment and an Analytics Environment, used by the Decision Maker and the Data Scientist]

Page 17: Agile Data Warehousing

Agility: the only way to succeed

Page 18: Agile Data Warehousing

EDW: Reality Check

EDW is the trusted container of all company data

It cannot be created in “one day”

It has to grow and evolve with business needs.

It will never be 100% complete

Page 19: Agile Data Warehousing

The story so far

Page 20: Agile Data Warehousing

Adapt to Survive

“50% of requirements change in the first year of a BI project”

Andreas Bitterer, Research VP, Gartner

Page 21: Agile Data Warehousing

Agile Principles

Small design upfront. Prototype.

Deliver quickly, deliver frequently.

Users are part of the development team!
• Feedback is a key part of success
• They’ll grow with the solution and the solution will grow with them

Embrace Changes!

http://agilemanifesto.org/principles.html

Page 22: Agile Data Warehousing

Agile Challenges

• Deliver quickly
  • Challenge: keep quality high, no matter who’s doing the work
• Embrace changes
  • Challenge: don’t introduce bugs. Change the smallest part possible. Use automated testing to preserve and assure data quality.

Page 23: Agile Data Warehousing

Engineering the DWH

Page 24: Agile Data Warehousing

Engineering the solution

To be Agile, some engineering practices need to be included in our work model

Agility != Anarchy

Engineering:
• Apply well-known models
• Define, apply & enforce rules
• Automate and/or check rule application
• Measure
• Test


Page 25: Agile Data Warehousing

Engineering the solution

• Favor the Kimball approach (for user-facing models)
  • Dimensional Modeling
  • Fact & Measures
  • Dimensions
• Use views to introduce abstraction layers
  • Reduce the “friction” between layers (source / stage / dwh / dm)
  • Apply the “Information Hiding Principle”
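As a rough illustration of views acting as an abstraction layer, the sketch below uses Python's built-in `sqlite3` module as a stand-in for SQL Server. All table, column and view names (`stg_customer_raw`, `stg_customer_v2`, `v_customer`) are hypothetical and not taken from the session. The downstream reader only ever queries the view, so the physical staging table can change without the reader noticing.

```python
import sqlite3

# In-memory database standing in for a staging area (assumption: names are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer_raw (cust_id INTEGER, cust_nm TEXT, ctry_cd TEXT);
    INSERT INTO stg_customer_raw VALUES (1, 'Contoso', 'IT'), (2, 'Fabrikam', 'US');

    -- The view is the contract between the staging layer and the next layer.
    CREATE VIEW v_customer AS
        SELECT cust_id AS customer_id,
               cust_nm AS customer_name,
               ctry_cd AS country_code
        FROM stg_customer_raw;
""")


def read_customers(connection):
    """Downstream code reads the view only, never the physical table."""
    return connection.execute(
        "SELECT customer_id, customer_name, country_code "
        "FROM v_customer ORDER BY customer_id"
    ).fetchall()


before = read_customers(conn)

# The physical layer changes (new table, new column names);
# only the view definition is adjusted, the reader stays untouched.
conn.executescript("""
    DROP VIEW v_customer;
    CREATE TABLE stg_customer_v2 (id INTEGER, full_name TEXT, iso_country TEXT);
    INSERT INTO stg_customer_v2
        SELECT cust_id, cust_nm, ctry_cd FROM stg_customer_raw;
    CREATE VIEW v_customer AS
        SELECT id AS customer_id,
               full_name AS customer_name,
               iso_country AS country_code
        FROM stg_customer_v2;
""")

after = read_customers(conn)
```

Here `before` and `after` hold the same rows, which is the point of hiding the physical layout behind the view.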

Page 26: Agile Data Warehousing

Engineering the solution

• Define & enforce the application of well-known ETL patterns
  • SCD1 / SCD2
  • Incremental / Partition Load
• Divide et impera
  • At least two SSIS solutions
  • Many small SSIS packages
  • 5 databases (STG, CFG, LOG, MD, DWH)
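One common way to implement an incremental load is a "high-water mark" kept in a configuration table. The sketch below is a minimal version of that idea, again using `sqlite3` in place of SQL Server. The tables `src_orders`, `stg_orders` and `cfg_watermark`, and the `modified_at` column, are assumptions made for the example; the session does not prescribe this layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER PRIMARY KEY, amount REAL, modified_at TEXT);
    CREATE TABLE stg_orders (order_id INTEGER PRIMARY KEY, amount REAL, modified_at TEXT);
    -- Hypothetical configuration table holding the last loaded timestamp.
    CREATE TABLE cfg_watermark (table_name TEXT PRIMARY KEY, last_loaded TEXT);
    INSERT INTO cfg_watermark VALUES ('orders', '1900-01-01T00:00:00');
""")


def incremental_load(connection):
    """Copy only rows modified after the stored watermark, then advance it."""
    (watermark,) = connection.execute(
        "SELECT last_loaded FROM cfg_watermark WHERE table_name = 'orders'"
    ).fetchone()
    rows = connection.execute(
        "SELECT order_id, amount, modified_at FROM src_orders WHERE modified_at > ?",
        (watermark,),
    ).fetchall()
    with connection:  # commit the load and the watermark update together
        connection.executemany(
            "INSERT OR REPLACE INTO stg_orders VALUES (?, ?, ?)", rows
        )
        if rows:
            # ISO-8601 timestamps sort correctly as plain strings.
            new_watermark = max(row[2] for row in rows)
            connection.execute(
                "UPDATE cfg_watermark SET last_loaded = ? WHERE table_name = 'orders'",
                (new_watermark,),
            )
    return len(rows)


conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [(1, 10.0, "2014-01-01T10:00:00"), (2, 20.0, "2014-01-02T10:00:00")],
)
first_run = incremental_load(conn)    # both rows are new
second_run = incremental_load(conn)   # nothing changed since the last run

conn.execute(
    "UPDATE src_orders SET amount = 25.0, modified_at = '2014-01-03T09:00:00' "
    "WHERE order_id = 2"
)
third_run = incremental_load(conn)    # only the modified row is reloaded
```

This relies on the source exposing a trustworthy modification timestamp; where it does not, a different change-detection mechanism would be needed.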

Page 27: Agile Data Warehousing

Design Pattern

“A general reusable solution to a commonly occurring problem within a given context”

Page 28: Agile Data Warehousing

Design Pattern

• Generic ETL patterns
  • Partition Load
  • Incremental/Differential Load
• Generic DWH/BI design patterns
  • Slowly Changing Dimension: SCD1, SCD2, etc.
  • Fact Table: Transactional, Snapshot, Temporal Snapshot
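To make the SCD2 pattern concrete, here is a small in-memory sketch in Python. It keeps history by closing the current version of a dimension row and appending a new one whenever a tracked attribute changes (SCD1 would simply overwrite the attribute in place). The `DimRow` structure, the `city` attribute and the `apply_scd2` helper are hypothetical names chosen for illustration; a real implementation would work against dimension tables.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimRow:
    surrogate_key: int
    business_key: str
    city: str                         # the tracked attribute (assumption)
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_scd2(dimension: List[DimRow], business_key: str, city: str, as_of: date) -> DimRow:
    """Apply one incoming source record to the dimension, SCD2 style."""
    current = next(
        (row for row in dimension
         if row.business_key == business_key and row.valid_to is None),
        None,
    )
    if current is not None and current.city == city:
        return current                # nothing changed, keep the current version
    if current is not None:
        current.valid_to = as_of      # close the old version
    new_row = DimRow(
        surrogate_key=max((row.surrogate_key for row in dimension), default=0) + 1,
        business_key=business_key,
        city=city,
        valid_from=as_of,
    )
    dimension.append(new_row)         # open the new current version
    return new_row


dim: List[DimRow] = []
apply_scd2(dim, "C001", "Milan", date(2014, 1, 1))   # first version
apply_scd2(dim, "C001", "Milan", date(2014, 3, 1))   # unchanged, no new row
apply_scd2(dim, "C001", "Rome", date(2014, 6, 1))    # change, new version
```

After these three calls the dimension holds two rows for `C001`: the Milan version closed on 2014-06-01 and the open Rome version.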

Page 29: Agile Data Warehousing

Design Pattern

Specific SQL Server patterns
• Change Data Capture
• Change Tracking
• Partition Load
• SSIS Parallelism

Page 30: Agile Data Warehousing

ETL Automation

Page 31: Agile Data Warehousing

No Monkey Work!

Let the people think and let the machines do the «monkey» work.

Page 32: Agile Data Warehousing

Invest in Automation?

• Faster development
• Reduced costs
• Embrace changes
• Fewer bugs
• Increase solution quality and make it consistent throughout the whole product

Page 33: Agile Data Warehousing

Hi-Level Vision

[Diagram: OLTP, STG and DWH connected by ETL steps; the steps are labeled Technical Process, Business Process, Technical Process]

Page 34: Agile Data Warehousing

ETL Phases

«E» and «L» must be
• Simple, easy and straightforward
• Completely automated
• Completely reusable

«E» and «L» have ZERO value in a DWH Solution

They should be done in the most economical way

Page 35: Agile Data Warehousing

Automation Tools

• PowerShell / .NET
  • Supported by SMO & SSIS API
  • Microsoft creates platforms, not only products!
• BIML – BI Markup Language
  • From Varigence
  • Free with BIDS Helper
  • Full support with MIST

Page 36: Agile Data Warehousing

Metadata

Metadata is needed in order to make automation a repeatable process

• Source to Staging info
• Staging to DWH info
  • Dimension keys
  • Dimension & Fact Table relationships

Extended Properties + SQL Server DMVs help keep metadata coherent
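The idea of metadata-driven automation can be sketched in a few lines of Python: a mapping describes each source-to-staging load, and a generator turns every mapping into a statement. The session's tooling generates SSIS packages (for example through BIML); this sketch generates plain SQL text instead, purely to show the principle. The `SOURCE_TO_STAGING` structure and every object name in it are assumptions.

```python
# Hypothetical source-to-staging metadata; in practice this could live in a
# metadata database rather than in code.
SOURCE_TO_STAGING = [
    {
        "source": "sales.Orders",
        "target": "stg.Orders",
        "columns": ["OrderID", "CustomerID", "Amount"],
    },
    {
        "source": "sales.Customers",
        "target": "stg.Customers",
        "columns": ["CustomerID", "Name"],
    },
]


def build_load_statement(mapping):
    """Render one INSERT ... SELECT statement from a metadata entry."""
    column_list = ", ".join(f"[{column}]" for column in mapping["columns"])
    return (
        f"INSERT INTO {mapping['target']} ({column_list}) "
        f"SELECT {column_list} FROM {mapping['source']};"
    )


# Every table gets the same, consistently generated load logic.
statements = [build_load_statement(mapping) for mapping in SOURCE_TO_STAGING]

for statement in statements:
    print(statement)
```

Adding a new table to the load then means adding one metadata entry, not writing a new package by hand.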

Page 37: Agile Data Warehousing

Unit Testing

Page 38: Agile Data Warehousing

Unit Testing

Data MUST be tested.

It’s like water, remember?

If trust is lost, the DWH is an #epicfail

Page 39: Agile Data Warehousing

Unit Testing

Before releasing anything, the data in the DWH must be tested.

The user has to validate a sample of data (e.g., the total invoice amount for January 2012).

That validated value becomes the reference value.

Before each release, the same query is executed again. If the result matches the reference value, the test is green; otherwise the test fails.
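The reference-value test described above can be sketched with Python's standard `unittest` module, with `sqlite3` standing in for the data warehouse. The `fact_invoice` table, its sample rows and the reference total are invented for the example; in a real project the reference value would come from the user's validation and the query would run against the actual DWH.

```python
import sqlite3
import unittest

# Hypothetical reference value, as validated once by a business user.
REFERENCE_TOTAL_JAN_2012 = 1500.0


def build_sample_dwh():
    """Create a tiny in-memory stand-in for the DWH fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE fact_invoice (invoice_date TEXT, amount REAL);
        INSERT INTO fact_invoice VALUES
            ('2012-01-05', 1000.0),
            ('2012-01-20', 500.0),
            ('2012-02-01', 750.0);
    """)
    return conn


def total_invoice_amount(conn, year_month):
    """Return the total invoice amount for a month given as 'YYYY-MM'."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM fact_invoice "
        "WHERE substr(invoice_date, 1, 7) = ?",
        (year_month,),
    ).fetchone()
    return total


class InvoiceTotalTest(unittest.TestCase):
    def test_january_2012_total_matches_reference(self):
        conn = build_sample_dwh()
        self.assertAlmostEqual(
            total_invoice_amount(conn, "2012-01"), REFERENCE_TOTAL_JAN_2012
        )


# Run the suite programmatically so the result can be inspected before a release.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(InvoiceTotalTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

`result.wasSuccessful()` then gives the green/red outcome that a release script could check.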

Page 40: Agile Data Warehousing

Unit Testing

Of course, tests MUST be automated when possible
• Visual Studio
• NUnit extensions
  • NBI
  • BI.Quality

What to test?
• Aggregated results
• Specific values of some «special» rule
• Fixed bugs/tickets


Page 41: Agile Data Warehousing

The perfect BI process & architecture

AGILE BI

Iterative!

Page 42: Agile Data Warehousing

Questions?

Page 43: Agile Data Warehousing

Like What You Heard?

Davide will be presenting at PASS Summit 2014!

PreConference: Agile Data Warehousing: Start to Finish

General Session: Agile BI: Unit Testing and Continuous Integration

Use discount code 24HOP14 to save $200!

@mauridb

Page 44: Agile Data Warehousing

DAX Formulas in Action

Alberto Ferrari

Coming up next …

Page 45: Agile Data Warehousing

Thank You for Attending