Agile Data Warehousing

Preview:

Citation preview

Agile Data WarehousingFrom Start to Finish

Presenter: Davide Mauri, Architect & Mentor, SolidQModerator: Alex Whittles

Technical Assistance

2

If you require assistance during the session, type your inquiry into the question pane on the right side.

Maximize your screen with the zoom button on the top of the presentation window

Type your questions in the question pane on the right side

Thank You Sponsors

Welcome to the Azure family!Try DocumentDB today!

http://documentdb.com

Solutions from Dell help you monitor, manage, protect and improve your SQL Server environment.

http://software.dell.com/sql-pass-vc-dell-sql-server-solutions

Planning on attending PASS Summit 2014? Start saving today!

• The world’s largest gathering of SQL Server & BI professionals

• Take your SQL Server skills to the next level by learning from the world’s SQL Server experts, in 190+ technical sessions

• Over 5000 attendees, representing 2000 companies, from 52 countries, ready to network & learn

Use discount code 24HOP14to save $200!

$1,895UNTIL SEPTEMBER

26, 2014

www.PASSSummit.com

Davide Mauri

SolidQ Mentor Board of Directors, SolidQ Italy Microsoft SQL Server MVP Works with managers to build

effective, tailor-made BI solutions for customers

@mauridb

Agile Data WarehousingFrom Start to Finish

Davide Mauri, Architect & Mentor, SolidQ

Agenda

What is a DWH, really? Agile: the only way to succeedEngineering the DWHETL Design PatternsETL AutomationTesting

What is a DWH, really?

The Data-Driven Age

Isn’t the DWH and “old” thing?

Big Data, In Memory and all the new stuff, can’t just replace the Data Warehouse?

The answer would be “yes”, if a DWH would be a simple “container” of data.

But it’s much more than this.

What is a DWH, really?

In this new era, data is like water.

Who will ever drink from untested, untrusted, uncertified data?

What is a DWH, really?

Would a manager or a decision maker, take a decision based on data of which he doesn’t know the source, the integrity and the correctness?

What is a DWH, really?

The Data Warehouse is the place where managers and decision makers will look for• Correct• Trusted• UpdatedData in order to make a conscious decision

What is a DWH, really?

The answer is now easy:

What is DWH, really?

A place to store consolidated data coming from the whole companyA place where cleanse, verify and certify dataA place where historic data is storedA place that holds the single version of truth (if there is one!)Forms the core of a BI solutionUser friendly Data models, designed to make data analysis easier

Modern Data Environment

MasterData

EDWData Mart

Big Data

UnstructuredData

BI Environment

Analytics Environment

StructuredData Data Scientist

Decision Maker

Agility: the only way to succeed

EDW: Reality Check

EDW is the trusted container of all company data

It cannot be created in “one day”

It has to grow and evolve with business needs.

It will never be 100% complete

The story so far

Adapt to Survive

“50% of requirements change in the first year of a BI project”

Andreas Bitterer, Research VP, Gartner

Agile Principles

Small design upfront. Prototype.

Delivery quickly, Deliver frequently.

Users are part of the development team!Feedback is a key part of the successThey’ll grow with the solution and the solution will grow with them

Embrace Changes!

http://agilemanifesto.org/principles.html

Agile Challenges

Delivery Quickly and Fast Challenge: keep high quality, no matter who’s doing the

work

Embrace Changes Challenge: don’t introduce bugs. Change the smallest part

possible. Use automatic Testing to preserve and assure data quality.

Engineering the DWH

Engineering the solution

To be Agile, some engineering practices needs to be included in our work model

Agility != Anarchy

Engineering: Apply well-known models Define, Apply & Enforce rules Automate and/or Check rules application Measure Test

24

Engineering the solution

Favor Kimball Approach (for user-facing models) Dimensional Modeling Fact & Measures Dimensions

Use views to introduce abstraction layers Reduce the “friction” between layers (source / stage / dwh

/ dm) Apply the “Information Hiding Principle”

Engineering the solution

Define & Force the application of well-known ETL patterns

SCD1 / SCD2 Incremental / Partition Load

Divide Et Impera At least two SSIS solutions many small SSIS Packages 5 Databases (STG, CFG, LOG, MD, DWH)

Design Pattern

“A general reusable solution to a commonly occurring problem within a given context”

Design Pattern

Generic ETL Pattern Partition Load Incremental/Differential Load

Generic DWH/BI Design Pattern Slowly Changing Dimension

SCD1, SCD2, ecc. Fact Table

Transactional, Snapshot, Temporal Snapshot

Design Pattern

Specific SQL Server Patterns Change Data Capture Change Tracking Partition Load SSIS Parallelism

ETL Automation

No Monkey Work!

Let the people think and let the machines do the «monkey» work.

Invest on Automation?

Faster development Reduce Costs Embrace Changes

Less bugs

Increase solution quality and make it consistent throughout the whole product

Hi-Level Vision

STGETLETL

OLTP DWH

ETL

Technical Process

Business Process

Technical Process

ETL Phases

«E» and «L» must be Simple, Easy and Straightforward Completely Automated Completely Reusable

«E» and «L» have ZERO value in a DWH Solution

Should be done in the most economic way

Automation Tools

PowerShell / .NET Supported by SMO & SSIS API Microsoft creates platforms not only products!

BIML – BI Markup Language From Varigence Free with BIDS Helper Full support with MIST

Metadata

Metadata is needed in order to make automation a repeatable process

Source to Staging Info Staging to DWH info

Dimension Keys Dimension & Fact Table relationship

Extended Properties + SQL Server DMVs help to maintain metadata coherent

Unit Testing

Unit Testing

Data MUST be tested.

It’s like water, remember?

If trust is lost, DWH is an#epicfail

Unit Testing

Before releasing anything data in the DW must be tested.

User has to validate a sample of data (e.g.:total invoice amount of January 2012)

That validated value will become the reference value

Before release, the same query will be executed again. If the data is the expected reference data then test is green otherwise the test fails

Unit Testing

Of course test MUST be automated when possible Visual Studio NUnit extensions

NBI BI.Quality

What to test? Aggregated results Specific values of some «special» rule Fixed bugs/tickets

40

The perfect BI process & architecture

AGILE BI

Iterative!

Questions?

Like What You Heard?

Davide will be presenting at PASS Summit 2014!

PreConference: Agile Data Warehousing: Start to Finish

General Session: Agile BI: Unit Testing and Continuos

Integration

Use discount code 24HOP14to save $200!

@mauridb

DAX Formulas in Action

Alberto Ferrari

Coming up next …

Thank You for Attending

Recommended