Copyright © 1991 ‐ 2016 R20/Consultancy B.V., The Hague, The Netherlands. All rights reserved. No
part of this material may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photographic,
or otherwise, without the explicit written permission of the copyright owners.
Data Quality and Governance in a Data‐Obsessed World
by Rick F. van der Lans
R20/Consultancy BV
Twitter @rick_vanderlans
www.r20.nl
Rick F. van der Lans
Rick F. van der Lans is an independent consultant, lecturer, and author. He specializes in data warehousing, business intelligence, database technology, and data virtualization. He is managing director of R20/Consultancy B.V. Rick has been involved in various projects in which data warehousing and integration technology were applied.
Rick van der Lans is an internationally acclaimed lecturer. He has lectured professionally for the last twenty-five years in many European and Middle Eastern countries, the USA, South America, and Australia. He has been invited by several major software vendors to present keynote speeches.
He is the author of several books on computing, including his new Data Virtualization for Business Intelligence Systems. Some of these books are available in different languages. The popular Introduction to SQL is available in English, Dutch, Italian, Chinese, and German and is sold worldwide. He also authored The SQL Guide to Ingres and SQL for MySQL Developers.
As author for TechTarget.com and BeyeNetwork.com, writer of whitepapers, chairman for the annual European Enterprise Data and Business Intelligence Conference, and as columnist for a few IT magazines, he has close contacts with many vendors.
R20/Consultancy B.V. is located in The Hague, The Netherlands, www.r20.nl. You can get in touch with Rick via:
Email: [email protected]
Twitter: @Rick_vanderlans
LinkedIn: http://www.linkedin.com/pub/rick-van-der-lans/9/207/223
Economic Resources
Economic resources = Factors of production
Primary resources: land, labor, and capital
• primary factors facilitate production but do not become part of the product
Secondary resources: materials and energy
The New Economic Resource: Data
Usage of Production Data is Changing
Data is used for reporting
Data is used for forecasting and predictions
Data is used for improving business processes
Data is used for improving customer care
Data is used for product personalization
Data is used by customers and suppliers
Data is used …
Before
Now
The Importance of Data Quality
The quality of raw products determines the quality of end products
The quality of labor determines the quality of end products
Likewise …
The quality of data determines the quality of an organization’s products and efficiency
Data Quality is Key
Source: Experian Data Quality, 2015; see https://www.edq.com/uk/resources/papers/global-data-quality-research/
The Classic Data Warehouse Architecture
[Diagram: Source systems → ETL → Staging area → ETL → Data warehouse → ETL → Data marts → Analytics & reporting]
The Classic Data Warehouse Architecture
[Diagram: the same architecture of source systems, staging area, data warehouse, data marts, and analytics & reporting, annotated with "Data Cleansing", performed by ETL and by manual corrections]
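Data cleansing inside an ETL step can be sketched roughly as follows. This is a minimal illustration, not something taken from the presentation: the record layout (`customer_id`, `name`, `country`), the `COUNTRY_ALIASES` table, and the function names are all hypothetical. The step standardizes values and drops duplicate keys on a copy of the data, so the source rows stay as they were.

```python
# Hypothetical ETL transform step with cleansing built in (illustrative only).

COUNTRY_ALIASES = {
    "nl": "Netherlands",
    "holland": "Netherlands",
    "the netherlands": "Netherlands",
}


def cleanse(record):
    """Return a cleaned copy of one record; the source record is left untouched."""
    cleaned = dict(record)
    # Collapse repeated whitespace and normalize capitalization of the name.
    cleaned["name"] = " ".join(record["name"].split()).title()
    # Map known spellings of a country to one standard value.
    country = record["country"].strip()
    cleaned["country"] = COUNTRY_ALIASES.get(country.lower(), country)
    return cleaned


def transform(extracted):
    """Cleanse every record and keep the first occurrence of each customer_id."""
    seen = set()
    loaded = []
    for record in extracted:
        cleaned = cleanse(record)
        if cleaned["customer_id"] in seen:
            continue  # duplicate key: a real job might queue it for manual correction
        seen.add(cleaned["customer_id"])
        loaded.append(cleaned)
    return loaded


source_rows = [
    {"customer_id": 1, "name": "  john   smith ", "country": "NL"},
    {"customer_id": 1, "name": "John Smith", "country": "Holland"},
    {"customer_id": 2, "name": "jane doe", "country": "Belgium "},
]
warehouse_rows = transform(source_rows)
```

Because the cleansing happens downstream, the production system still holds the uncleaned values, which is exactly the situation the later slides question.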
“Old” Requirements
No need for real-time data in reports
• There was time to spend on data cleansing
No need for high-quality data in production systems
Only internally-produced data used for reporting
Mostly internal users
All reports developed by IT specialists
New Requirements
Reporting and analytics require real-time data
External users, such as customers and suppliers
Mixing of internal with external data
Machine-generated data
Self-service development of reports
…
Operational Business Intelligence
Web analytics: Which ad or product to present now
Security: Face recognition in real time
Factories: Changing machine settings based on real-time events
Call centers: Predict the chance of churning and predict which service or upgrade to offer
Incorrect data can lead to the wrong reaction
The Chain is Too Long for Real‐time Reporting
[Diagram: Source systems → ETL → Staging area → ETL → Data warehouse → ETL → Data marts → Operational analytics & reporting]
Too many steps and too much copying
The Chain is Too Long for Real‐time Reporting
[Diagram: the same chain of source systems, staging area, data warehouse, and data marts, with two consumers labeled "Classic analytics & reporting" and "Operational BI reports"]
Customer‐Driven BI
Real‐Time Reporting for Customers
Real‐Time Analytics for Customers
High Data Quality is Crucial for Customer‐Driven BI
Streaming Data
[Diagram: producers of data, four listeners, a stream processor, storage of streaming data, and consumers of data]
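The roles in a streaming setup (producers, listeners, a stream processor, storage, and consumers) can be sketched in a few lines. Everything here is a hypothetical illustration: the class names, the event layout, and in particular the choice to place a data quality check inside the stream processor are assumptions, not something the presentation prescribes. The point is that with streaming data there is no later cleansing step, so an event is checked before any consumer reacts to it.

```python
# Illustrative stream-processing sketch; all names and rules are hypothetical.
from collections import deque


class StreamProcessor:
    """Stores every event, checks its quality, and forwards valid events."""

    def __init__(self):
        self.storage = []     # stands in for "storage of streaming data"
        self.consumers = []   # callables that receive valid events
        self.rejected = []    # events that failed the quality check

    def subscribe(self, consumer):
        self.consumers.append(consumer)

    def is_valid(self, event):
        # Assumed rule: an event needs a sensor id and a numeric value.
        has_sensor = bool(event.get("sensor"))
        has_number = isinstance(event.get("value"), (int, float))
        return has_sensor and has_number

    def process(self, event):
        self.storage.append(event)
        if not self.is_valid(event):
            self.rejected.append(event)
            return
        for consumer in self.consumers:
            consumer(event)


class Listener:
    """Buffers events from one producer and hands them to the processor."""

    def __init__(self, processor):
        self.processor = processor
        self.buffer = deque()

    def receive(self, event):
        self.buffer.append(event)

    def flush(self):
        while self.buffer:
            self.processor.process(self.buffer.popleft())


processor = StreamProcessor()
alerts = []
processor.subscribe(alerts.append)

listener = Listener(processor)
listener.receive({"sensor": "machine-7", "value": 81.5})
listener.receive({"sensor": "", "value": 79.0})             # missing sensor id
listener.receive({"sensor": "machine-7", "value": "n/a"})   # non-numeric value
listener.flush()
```

In this sketch all three events are stored, but only the first reaches the consumer; the other two are held back so that incorrect data does not trigger a wrong reaction.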
Data Streaming for Operational BI
[Diagram: the classic chain (source systems, staging area, data warehouse, data marts, analytics & reporting, connected by ETL) shown together with a streaming path (producers of data, stream processor, consumers of data) and a "?"]
Self‐Service BI Continues
Self-Service Data Visualization
Self-Service Analytics
Self-Service ETL
Self-Service Data Preparation
Self-Service …
Self‐Service Data Preparation
Non-technical interface for studying data files
Easy way of defining rules
Data is fixed by defining filters, not by changing data in source systems
Relationship with data blending
Users are defining their own data quality rules
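Fixing data by defining filters rather than by changing the source can be sketched as follows. This is a minimal illustration under stated assumptions: the rule helpers `not_empty` and `in_range`, the `prepare` function, and the sample rows are hypothetical and do not represent any particular self-service tool.

```python
# Illustrative only: user-defined data quality rules applied as read-time filters.

def not_empty(field):
    """Rule: the field must be present and contain non-blank text."""
    def rule(row):
        value = row.get(field)
        return value is not None and bool(str(value).strip())
    return rule


def in_range(field, low, high):
    """Rule: the field must be a number between low and high (inclusive)."""
    def rule(row):
        value = row.get(field)
        return isinstance(value, (int, float)) and low <= value <= high
    return rule


def prepare(rows, rules):
    """Yield only the rows that pass every rule; the input is never modified."""
    for row in rows:
        if all(rule(row) for rule in rules):
            yield row


source_file = [
    {"name": "Acme", "age": 42},
    {"name": "", "age": 35},
    {"name": "Globex", "age": 250},
]
my_rules = [not_empty("name"), in_range("age", 0, 120)]
prepared = list(prepare(source_file, my_rules))
```

The source rows are unchanged after preparation, which also means that every user who defines a different rule set sees a different version of the same data.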
Open Data is Available in Abundance
External Data Integration by IT?
[Diagram: the classic chain (source systems, staging area, data warehouse, data marts, analytics & reporting, connected by ETL) with social media data, open data, and spreadsheets as additional sources, extra ETL steps, and a "?"]
External Data Integration by Users
[Diagram: the chain of source systems, staging area, data warehouse, and data marts (connected by ETL) feeding self‐service analytics, with social media data, open data, and spreadsheets as additional sources and a "?"]
Raising the Data Quality Bar
Option 1: Do Nothing
Option 2: Old Technology for New Applications
Option 3: Adopt New Technology, but Stick to Old Ideas
Recommendations (1)
Data quality is not only relevant for reporting and analytics
Data has become a primary economic resource
Data quality improves reporting results, but has operational business impact as well
Poor data quality can be as damaging to an organization as other poor-quality resources
Recommendations (2)
Presenting poor data quality to customers and suppliers will reflect poorly on an organization
Poor data quality may lower trust in the organization
Recommendations (3)
Move data quality checks upstream
Develop new production systems with data quality checks built-in
Use new architectures
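One way to picture a data quality check that is built into a production system is a record type that refuses to be created with bad values. The sketch below is illustrative only: the `Customer` type, the `DataQualityError` exception, the `register` function, and the deliberately simple email pattern are assumptions, not part of the presentation.

```python
# Illustrative only: a data quality check built into the point of entry.
import re
from dataclasses import dataclass

# Deliberately simple, assumed rule: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class DataQualityError(ValueError):
    """Raised when a record is rejected at the point of entry."""


@dataclass(frozen=True)
class Customer:
    name: str
    email: str

    def __post_init__(self):
        # Runs on every construction, so no invalid Customer can exist.
        problems = []
        if not self.name.strip():
            problems.append("name is empty")
        if not EMAIL_PATTERN.match(self.email):
            problems.append("email is malformed")
        if problems:
            raise DataQualityError("; ".join(problems))


def register(name, email, store):
    """Add the customer to the store only if it passes the built-in checks."""
    store.append(Customer(name=name, email=email))


customers = []
register("Jane Doe", "jane@example.com", customers)
try:
    register("", "not-an-email", customers)
except DataQualityError as err:
    rejection = str(err)
```

Rejecting the record at entry means every downstream report, stream, or customer-facing view reads data that has already passed the check, so no separate cleansing pass is needed for these rules.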
Shortening the Chain
[Diagram: source systems, staging area, data warehouse, data marts, and analytics & reporting, connected by ETL steps]
Recommendations (4)
A data strategy is essential for implementing an adequate data quality program; it is not optional
What is Data Strategy?
A single, unified, organization-wide plan …
… for the use of corporate data …
… as a vital asset for strategic and operational decision-making.
Investing in a formal data strategy lends much needed intentionality around critical data related issues, such as data quality, metadata, performance, data distribution, organization, ownership, security, privacy, etc.
Source: Capstone Consulting, January 2009