
Copyright © 1991 - 2016 R20/Consultancy B.V., The Hague, The Netherlands. All rights reserved. No part of this material may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photographic, or otherwise, without the explicit written permission of the copyright owners.

Data Quality and Governance in a Data‐Obsessed World

by Rick F. van der Lans
R20/Consultancy BV
Twitter @rick_vanderlans
www.r20.nl


Rick F. van der Lans

Rick F. van der Lans is an independent consultant, lecturer, and author. He specializes in data warehousing, business intelligence, database technology, and data virtualization. He is managing director of R20/Consultancy B.V. Rick has been involved in various projects in which data warehousing and integration technology were applied.

Rick van der Lans is an internationally acclaimed lecturer. He has lectured professionally for the last twenty-five years in many European and Middle Eastern countries, the USA, South America, and Australia. He has been invited by several major software vendors to present keynote speeches.

He is the author of several books on computing, including his new Data Virtualization for Business Intelligence Systems. Some of these books are available in different languages. The popular Introduction to SQL is available in English, Dutch, Italian, Chinese, and German and is sold worldwide. He also authored The SQL Guide to Ingres and SQL for MySQL Developers.

As an author for TechTarget.com and BeyeNetwork.com, a writer of whitepapers, chairman of the annual European Enterprise Data and Business Intelligence Conference, and a columnist for a few IT magazines, he has close contacts with many vendors.

R20/Consultancy B.V. is located in The Hague, The Netherlands, www.r20.nl. You can get in touch with Rick via:
Email: [email protected]
Twitter: @Rick_vanderlans
LinkedIn: http://www.linkedin.com/pub/rick-van-der-lans/9/207/223


Economic Resources

Economic resources = Factors of production

Primary resources: land, labor, and capital
• Primary factors facilitate production but do not become part of the product

Secondary resources: materials and energy


The New Economic Resource: Data


Usage of Production Data is Changing

Before
• Data is used for reporting
• Data is used for forecasting and predictions

Now
• Data is used for improving business processes
• Data is used for improving customer care
• Data is used for product personalization
• Data is used by customers and suppliers
• Data is used …


The Importance of Data Quality

• The quality of raw products determines the quality of end products
• The quality of labor determines the quality of end products

Likewise …

The quality of data determines the quality of an organization’s products and efficiency


Data Quality is Key

Source: Experian Data Quality, 2015; see https://www.edq.com/uk/resources/papers/global-data-quality-research/


The Classic Data Warehouse Architecture

[Diagram: source systems → ETL → staging area → ETL → data warehouse → ETL → data marts → analytics & reporting]


The Classic Data Warehouse Architecture

[Diagram: the same chain of source systems, staging area, data warehouse, data marts, and analytics & reporting, linked by ETL steps and annotated with "Data Cleansing", "ETL", and "Manual corrections"]
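As a rough illustration of what a data cleansing step inside an ETL job can look like, the sketch below tidies rows on their way from a staging area to a data warehouse. The field names, the `cleanse` function, and the sample rows are assumptions made for this example and do not come from the slides.

```python
from typing import Dict, List

Row = Dict[str, str]


def cleanse(rows: List[Row]) -> List[Row]:
    """Normalize names and country codes, then drop exact duplicates."""
    seen = set()
    cleaned: List[Row] = []
    for row in rows:
        # Collapse repeated whitespace and use consistent capitalization.
        name = " ".join(row["name"].split()).title()
        country = row["country"].strip().upper()
        key = (name, country)
        if key in seen:
            continue  # the same customer was already loaded
        seen.add(key)
        cleaned.append({"name": name, "country": country})
    return cleaned


# Hypothetical staging rows with typical quality problems.
staging = [
    {"name": "  ann  de vries ", "country": "nl"},
    {"name": "Ann De Vries", "country": "NL "},
    {"name": "jane doe", "country": "us"},
]

warehouse = cleanse(staging)
```

A step like this only repairs the copy that travels down the chain; the source system still holds the original, uncorrected values.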


“Old” Requirements

• No need for real-time data in reports
  • There was time to spend on data cleansing
• No need for high-quality data in production systems
• Only internally produced data used for reporting
• Mostly internal users
• All reports developed by IT specialists


New Requirements

• Reporting and analytics require real-time data
• External users, such as customers and suppliers
• Mixing of internal with external data
• Machine-generated data
• Self-service development of reports
• …


Operational Business Intelligence

• Web analytics: Which ad or product to present now
• Security: Real-time face recognition
• Factories: Changing machine settings based on real-time events
• Call centers: Predict the chance of churning and predict which service or upgrade to offer

Incorrect data can lead to the wrong reaction


The Chain is Too Long for Real‐time Reporting

[Diagram: source systems → ETL → staging area → ETL → data warehouse → ETL → data marts → operational analytics & reporting]

Too many steps and too much copying


The Chain is Too Long for Real‐time Reporting

[Diagram: source systems, staging area, data warehouse, and data marts, linked by ETL steps and feeding classic analytics & reporting, with operational BI reports shown as a separate box]


Customer‐Driven BI


Real‐Time Reporting for Customers


Real‐Time Analytics for Customers


High Data Quality is Crucial for Customer‐Driven BI


Streaming Data

[Diagram: producers of data, four listeners, a stream processor, storage of streaming data, and consumers of data]
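One way to picture the components named on this slide is the minimal sketch below. It assumes that each listener receives values from one producer and hands them to the stream processor, which stores every event and passes it on to the consumers; the slide itself does not spell out this wiring. All class, function, and field names are hypothetical.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class StreamProcessor:
    """Stores each incoming event and forwards it to every consumer."""

    def __init__(self) -> None:
        self.storage: List[Event] = []  # storage of streaming data
        self.consumers: List[Callable[[Event], None]] = []

    def add_consumer(self, consumer: Callable[[Event], None]) -> None:
        self.consumers.append(consumer)

    def process(self, event: Event) -> None:
        self.storage.append(event)
        for consumer in self.consumers:
            consumer(event)


def make_listener(source: str, processor: StreamProcessor) -> Callable[[float], None]:
    """Return a listener that tags values from one producer and forwards them."""

    def listen(value: float) -> None:
        processor.process({"source": source, "value": value})

    return listen


processor = StreamProcessor()
alerts: List[Event] = []


def alert_consumer(event: Event) -> None:
    # Hypothetical consumer: react only to readings above a threshold.
    if event["value"] > 100.0:
        alerts.append(event)


processor.add_consumer(alert_consumer)

listen_machine_1 = make_listener("machine-1", processor)
listen_machine_2 = make_listener("machine-2", processor)

# Two producers emit readings; each goes through its own listener.
listen_machine_1(72.5)
listen_machine_2(130.0)
listen_machine_1(99.9)
```

Because consumers react to each event as it arrives, an incorrect reading would trigger a wrong alert immediately, with no later cleansing step to catch it.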


Data Streaming for Operational BI

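[Diagram: the classic chain of source systems, staging area, data warehouse, data marts, and analytics & reporting, linked by ETL steps, shown together with producers of data, a stream processor, and consumers of data; a "?" marks an open question]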


Self‐Service BI Continues

Self-Service Data Visualization

Self-Service Analytics

Self-Service ETL

Self-Service Data Preparation

Self-Service …


Self‐Service Data Preparation

• Non-technical interface for studying data files
• Easy way of defining rules
• Data is fixed by defining filters, not by changing data in source systems
• Relationship with data blending

Users are defining their own data quality rules
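The idea of fixing data through filters rather than by changing the source can be sketched as follows. The rule names, the `prepare` function, and the sample rows are assumptions for illustration; the slide describes a non-technical interface, so a real tool would hide this logic behind its user interface.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Rule = Callable[[Record], bool]

# Hypothetical source data; it is never modified below.
source_rows: List[Record] = [
    {"customer": "A", "age": 34, "country": "NL"},
    {"customer": "B", "age": -1, "country": "NL"},
    {"customer": "C", "age": 51, "country": ""},
]

# User-defined data quality rules: each returns True for an acceptable row.
rules: Dict[str, Rule] = {
    "age_is_plausible": lambda r: 0 <= r["age"] <= 120,
    "country_is_filled": lambda r: bool(r["country"]),
}


def prepare(rows: List[Record], active_rules: Dict[str, Rule]) -> List[Record]:
    """Keep only the rows that satisfy every active rule."""
    return [row for row in rows if all(rule(row) for rule in active_rules.values())]


prepared = prepare(source_rows, rules)
```

Each user who defines such filters ends up with a private notion of what counts as correct data, while the faulty rows remain in the source system for everyone else.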


Open Data is Available in Abundance


External Data Integration by IT?

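[Diagram: the classic chain of source systems, staging area, data warehouse, data marts, and analytics & reporting, linked by ETL steps, with social media data, open data, and spreadsheets as additional inputs loaded through further ETL steps; a "?" marks an open question]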


External Data Integration by Users

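[Diagram: source systems, staging area, data warehouse, and data marts, linked by ETL steps and feeding self‐service analytics, with social media data, open data, and spreadsheets as additional inputs; a "?" marks an open question]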


Raising the Data Quality Bar


Option 1: Do Nothing


Option 2: Old Technology for New Applications


Option 3: Adopt New Technology, but Stick to Old Ideas


Recommendations (1)

• Data quality is not only relevant for reporting and analytics
• Data has become a primary economic resource
• Data quality improves reporting results, but has operational business impact as well
• Poor data quality can be as damaging to an organization as other poor-quality resources


Recommendations (2)

• Presenting poor data quality to customers and suppliers will reflect poorly on an organization
• Poor data quality may lower trust in the organization


Recommendations (3)

• Move data quality checks upstream
• Develop new production systems with data quality checks built-in
• Use new architectures
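A data quality check that is built into a production system might look like the sketch below, which rejects a bad record at the moment it is created instead of repairing it later in an ETL step. The `Order` type, its fields, and the three rules are hypothetical examples, not taken from the slides.

```python
from dataclasses import dataclass
from datetime import date


class DataQualityError(ValueError):
    """Raised when a record violates a built-in data quality rule."""


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_email: str
    quantity: int
    order_date: date

    def __post_init__(self) -> None:
        # Collect every violation so the caller sees all problems at once.
        problems = []
        if "@" not in self.customer_email:
            problems.append("customer_email has no '@'")
        if self.quantity <= 0:
            problems.append("quantity must be positive")
        if self.order_date > date.today():
            problems.append("order_date lies in the future")
        if problems:
            raise DataQualityError("; ".join(problems))


# A valid order is accepted.
ok = Order(1, "jan@example.com", 2, date(2016, 1, 15))

# An invalid order never enters the system.
rejected = ""
try:
    Order(2, "not-an-email", 0, date(2016, 1, 15))
except DataQualityError as err:
    rejected = str(err)
```

With the check placed at the point of entry, every downstream consumer, including real-time reports and external users, receives data that has already passed the same rules.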


Shortening the Chain

[Diagram: source systems, staging area, data warehouse, data marts, and analytics & reporting, linked by ETL steps]


Recommendations (4)

A data strategy is essential for implementing an adequate data quality program, not an option


What is Data Strategy?

• A single, unified, organization-wide plan …
• … for the use of corporate data …
• … as a vital asset for strategic and operational decision-making.

Investing in a formal data strategy lends much-needed intentionality around critical data-related issues, such as data quality, metadata, performance, data distribution, organization, ownership, security, privacy, etc.

Source: Capstone Consulting, January 2009


Data Quality