A HITCHHIKER'S GUIDE TO DATA QUALITY
Tatiana Stebakova
The Data & Information Assembly Australia, April 2015
Evolution of the DQ Governance approach over the past 10 years
How to make a quantum leap from DQ theory to execution: a personal view
You’ve done it all by the book, but there is little traction in data quality
DQ and systems thinking
Don’t panic!
Content
Evolution of the DQ Governance approach over the past 10 years
Data Duplicates – still magic words
Data Quality Frameworks - from emergence to maturity
Senior Management Support - a breakthrough
Senior Architects Support – little change
Data Quality Governance - from novelty to mainstream
Data Quality Tools and Technology – from luxury to BAU
Metadata - from “what is it?” to “the new black”
How to make a quantum leap from DQ theory to execution: a personal view
Step 1. Data Quality Justification
DQ horror stories
“About 6.5 million Americans are 112 or older.” The US Social Security Administration has 6.5 million people on record as having reached the age of 112, even though only 42 people worldwide are known to be that old.
"Studies in cost analysis show that between 15% to > 20% of a company’s operating revenue is spent doing things to get around or fix data quality issues"
Larry English
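The Social Security example is, at root, a missing plausibility rule. A minimal Python sketch of such a rule follows; the record layout (`id`, `birth_date`) and the 112-year cutoff are illustrative assumptions, not the actual logic of any Social Security system.

```python
from datetime import date

# Illustrative cutoff (assumption): computed ages above this are flagged as implausible.
MAX_PLAUSIBLE_AGE = 112


def age_on(birth_date: date, as_of: date) -> int:
    """Return the age in whole years on the as_of date."""
    years = as_of.year - birth_date.year
    # One year less if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def flag_implausible_ages(records, as_of: date, max_age: int = MAX_PLAUSIBLE_AGE):
    """Return the IDs of records whose computed age exceeds max_age.

    `records` is a hypothetical list of dicts with 'id' and 'birth_date' keys.
    """
    return [r["id"] for r in records if age_on(r["birth_date"], as_of) > max_age]


records = [
    {"id": 1, "birth_date": date(1980, 5, 17)},
    {"id": 2, "birth_date": date(1869, 1, 1)},  # 146 years old in 2015
    {"id": 3, "birth_date": date(1903, 1, 1)},  # exactly 112: not flagged
]
print(flag_implausible_ages(records, as_of=date(2015, 4, 1)))  # [2]
```

In practice the cutoff would likely be a business rule owned by a data steward, and flagged records would be routed for review rather than silently deleted.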
Option 1 – What can we gain?
Option 2 – Scare technique
Option 3 (my favourite) – Risks
"Poor data is like a dirty windscreen. You can continue driving as your vision degrades, but at some point you must stop and clear the windscreen or risk everything"
Ken Orr
Step 2. Build DQ requirements into the solution architecture and the system development contract
Examples of DQ requirements
The ETL solution SHALL have the capability to perform column integrity screening/profiling.
The ETL solution SHALL have the capability to perform data structure screening/profiling.
The ETL solution SHALL have the capability to perform business rule compliance screening/profiling.
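As an illustration only, the three screening capabilities above could be sketched in Python as row-level checks. The expected schema, the four-digit postcode pattern and the non-negative credit limit rule are hypothetical examples; a profiling report would typically aggregate findings like these rather than list them row by row.

```python
import re

# Hypothetical expected structure of an extract: column name -> Python type.
EXPECTED_SCHEMA = {"customer_id": str, "postcode": str, "credit_limit": float}

# Hypothetical column integrity rule: postcodes are exactly four ASCII digits.
POSTCODE_PATTERN = re.compile(r"[0-9]{4}")


def screen_structure(row):
    """Data structure screening: missing or unexpected columns, wrong types."""
    errors = [f"missing column {c}" for c in sorted(EXPECTED_SCHEMA.keys() - row.keys())]
    errors += [f"unexpected column {c}" for c in sorted(row.keys() - EXPECTED_SCHEMA.keys())]
    for col, typ in EXPECTED_SCHEMA.items():
        if col in row and row[col] is not None and not isinstance(row[col], typ):
            errors.append(f"{col} has type {type(row[col]).__name__}")
    return errors


def screen_columns(row):
    """Column integrity screening: empty keys and badly formatted values."""
    errors = []
    if not row.get("customer_id"):
        errors.append("customer_id is empty")
    pc = row.get("postcode")
    if pc is not None and not POSTCODE_PATTERN.fullmatch(pc):
        errors.append(f"postcode {pc!r} is not four digits")
    return errors


def screen_business_rules(row):
    """Business rule screening: hypothetical rule that credit limits are non-negative."""
    limit = row.get("credit_limit")
    if isinstance(limit, float) and limit < 0:
        return ["credit_limit is negative"]
    return []


def profile(rows):
    """Run all three screens and return (row index, finding) pairs."""
    findings = []
    for i, row in enumerate(rows):
        for check in (screen_structure, screen_columns, screen_business_rules):
            findings += [(i, e) for e in check(row)]
    return findings


rows = [
    {"customer_id": "C1", "postcode": "2000", "credit_limit": 5000.0},
    {"customer_id": "", "postcode": "20A0", "credit_limit": -1.0},
    {"customer_id": "C3", "postcode": "3000"},
]
for finding in profile(rows):
    print(finding)
```

Keeping each screen as a separate function mirrors the separate requirements, so each capability can be tested and reported on independently.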
The ETL controls solution SHALL capture and store the date and time at which the data batch extraction process completed successfully.
Editorial note: This may or may not be the same date as the Batch Business Schedule Date. It is recommended to use the ISO 8601 standard to represent the date/time.
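A minimal sketch of such a controls record, assuming a hypothetical layout (`batch_id`, `business_schedule_date`, `extraction_completed_at`). Python's `isoformat()` produces ISO 8601 text; normalising the completion timestamp to UTC is a design choice of this sketch, not part of the requirement.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional

# Example offset for Australian Eastern Standard Time (UTC+10), used only in the usage example.
AEST = timezone(timedelta(hours=10), "AEST")


def batch_control_record(batch_id: str, business_schedule_date: date,
                         completed_at: Optional[datetime] = None) -> dict:
    """Build a control record for a successfully completed extraction batch.

    The completion timestamp is stored separately from the business schedule
    date because the two may differ (e.g. a batch for one business day that
    finishes after midnight).
    """
    if completed_at is None:
        completed_at = datetime.now(timezone.utc)
    if completed_at.tzinfo is None:
        # A naive timestamp cannot be placed unambiguously on the timeline.
        raise ValueError("completed_at must be timezone-aware")
    return {
        "batch_id": batch_id,
        # ISO 8601 calendar date, e.g. 2015-04-13
        "business_schedule_date": business_schedule_date.isoformat(),
        # ISO 8601 date-time normalised to UTC, e.g. 2015-04-14T00:35:12+00:00
        "extraction_completed_at": completed_at.astimezone(timezone.utc).isoformat(timespec="seconds"),
    }


rec = batch_control_record("B-0001", date(2015, 4, 13),
                           datetime(2015, 4, 14, 10, 35, 12, tzinfo=AEST))
print(rec["extraction_completed_at"])  # 2015-04-14T00:35:12+00:00
```

Rejecting naive timestamps is deliberate: without an offset, an ISO 8601 date-time cannot be reliably compared across systems in different time zones.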
Quality should be built into the product, and testing alone cannot be relied on to ensure product quality (FDA, Current Good Manufacturing Practice)
The … ETL controls solution SHALL perform a periodic full snapshot of the same data for reconciliation purposes, if delta files are used.
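The reconciliation idea can be sketched as: replay the delta files on top of the last known state, then compare the result with the periodic full snapshot. The `I`/`U`/`D` operation codes and the dictionary-based data are assumptions for illustration, not a prescribed file format.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]


def apply_deltas(base: Dict[str, Row],
                 deltas: Iterable[Tuple[str, str, Optional[Row]]]) -> Dict[str, Row]:
    """Replay (operation, key, row) deltas on a copy of the base state.

    Assumed operation codes: 'I' insert, 'U' update, 'D' delete (row ignored).
    """
    state = dict(base)
    for op, key, row in deltas:
        if op in ("I", "U"):
            state[key] = row
        elif op == "D":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown delta operation {op!r}")
    return state


def reconcile(state: Dict[str, Row], snapshot: Dict[str, Row]) -> Dict[str, list]:
    """Compare the delta-derived state with a full snapshot of the same data."""
    return {
        # Present in the snapshot but not in the delta-derived state.
        "missing": sorted(snapshot.keys() - state.keys()),
        # Present in the delta-derived state but not in the snapshot.
        "unexpected": sorted(state.keys() - snapshot.keys()),
        # Present in both, but with different values.
        "mismatched": sorted(k for k in state.keys() & snapshot.keys()
                             if state[k] != snapshot[k]),
    }


base = {"C1": {"name": "Ann"}, "C2": {"name": "Bob"}}
deltas = [("U", "C1", {"name": "Anne"}), ("D", "C2", None), ("I", "C3", {"name": "Cy"})]
snapshot = {"C1": {"name": "Anne"}, "C3": {"name": "Cyd"}, "C4": {"name": "Di"}}

print(reconcile(apply_deltas(base, deltas), snapshot))
# {'missing': ['C4'], 'unexpected': [], 'mismatched': ['C3']}
```

A non-empty result would point to lost, duplicated or corrupted deltas that row counts alone might not reveal.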
The … ETL solution SHALL have the capability to perform data structure screening/profiling.
The … data extract process SHALL support logical data consistency (temporal relationships within the data).
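Temporal consistency can take many forms. One common case, sketched here with hypothetical names, is a versioned history in which each key has (`valid_from`, `valid_to`) intervals that should neither overlap nor leave gaps. Treating `valid_to` as exclusive and `None` as "current version" are assumptions of this sketch.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def temporal_issues(versions):
    """Check (key, valid_from, valid_to) version rows for temporal consistency.

    Assumptions: valid_to is exclusive and None marks the current version.
    Flags inverted intervals, overlaps and gaps between consecutive versions.
    """
    issues = []
    rows = sorted(versions, key=itemgetter(0, 1))  # by key, then valid_from
    for key, group in groupby(rows, key=itemgetter(0)):
        prev_to = None
        first = True
        for _, valid_from, valid_to in group:
            if valid_to is not None and valid_to <= valid_from:
                issues.append((key, valid_from, "valid_to not after valid_from"))
            if not first:
                if prev_to is None:
                    issues.append((key, valid_from, "follows an open-ended version"))
                elif valid_from < prev_to:
                    issues.append((key, valid_from, "overlaps previous version"))
                elif valid_from > prev_to:
                    issues.append((key, valid_from, "gap after previous version"))
            prev_to, first = valid_to, False
    return issues


history = [
    ("A", date(2014, 1, 1), date(2014, 7, 1)),
    ("A", date(2014, 7, 1), None),              # contiguous, current version
    ("B", date(2014, 1, 1), date(2014, 6, 1)),
    ("B", date(2014, 5, 1), date(2014, 8, 1)),  # overlaps the previous version
    ("B", date(2014, 9, 1), None),              # gap after 2014-08-01
    ("C", date(2014, 3, 1), date(2014, 2, 1)),  # inverted interval
]
for issue in temporal_issues(history):
    print(issue)
```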
Step 3. Build data quality requirements and DQ KPIs into the system operation contract
“I’ve never been a good spectator. Either I’m playing the game or I’m not interested.”
Christiaan Barnard, the surgeon who performed the first human heart transplant
The … solution SHALL have the capability to measure and report on the data quality Key Performance Indicators (KPIs) as defined by the Governance authority.
KPI examples:
• customer record uniqueness
• directory currency and accessibility
• information provenance
• uptake rate (coverage)
• quality of records per DQ dimensions and characteristics
• response time for typical transactions
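Because the KPIs are defined by the governance authority, any formula here is an assumption. As a hypothetical example, customer record uniqueness could be measured as the share of records whose normalised match key (here name plus date of birth) occurs exactly once, and completeness as the share of required field values that are populated:

```python
from collections import Counter


def match_key(rec):
    """Hypothetical match key: name (case and whitespace insensitive) plus date of birth."""
    return (" ".join(rec["name"].lower().split()), rec["dob"])


def is_populated(value) -> bool:
    """Treat None and blank strings as missing; any other value counts as populated."""
    return value is not None and str(value).strip() != ""


def uniqueness_kpi(records) -> float:
    """Share of records whose match key occurs exactly once (1.0 = no suspected duplicates)."""
    if not records:
        return 1.0
    counts = Counter(match_key(r) for r in records)
    return sum(1 for r in records if counts[match_key(r)] == 1) / len(records)


def completeness_kpi(records, fields) -> float:
    """Share of required field values that are populated across all records."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in fields if is_populated(r.get(f)))
    return filled / total


customers = [
    {"name": "Jane Citizen", "dob": "1970-01-01", "email": "jane@example.com"},
    {"name": "jane  citizen", "dob": "1970-01-01", "email": None},  # suspected duplicate
    {"name": "John Smith", "dob": "1985-06-30", "email": " "},      # blank email
    {"name": "Mary Jones", "dob": "1990-12-12", "email": "mary@example.com"},
]
print(f"uniqueness:   {uniqueness_kpi(customers):.2f}")                              # 0.50
print(f"completeness: {completeness_kpi(customers, ['name', 'dob', 'email']):.2f}")  # 0.83
```

Real duplicate detection usually needs fuzzier matching than an exact normalised key; the point of the sketch is only that a KPI needs an agreed, repeatable formula before it can be written into an operation contract.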
You’ve done it all by the book, but there is little traction in data quality.
Don’t be afraid
From Hitchhiker to Hijacker
Become a driver. Apply for architect, project lead or data management jobs
Drop your “data quality bugs/requirements” anywhere you can
Look for opportunities. Change your strategy all the time
Disguise your requirements: do not call them DQ requirements
Lean on standards
Do not reference DQ gurus. Reference Technology gurus instead
Befriend architects
Be patient, keep cool
“Success is not final, failure is not fatal: it is the courage to continue that counts.”
Winston Churchill
Complex adaptive systems (CAS) are dynamic systems able to adapt to a changing environment, in which all participants are closely linked with each other, making up an “IT ecosystem” (MIT)
Within such an ecosystem, change is not so much adaptation as co-evolution with all other related systems
Rules of flocking:
• Follow the leader
• Align with neighbours
• Avoid overcrowding
Data Quality and systems thinking
Systems thinking – delayed response
Launch date: 2 March 2004
Mission duration: 10 years, 11 months and 23 days
6.5 billion kilometres
“After 10 years, and a journey of more than six billion kilometres, the Rosetta spacecraft sent its fridge-sized Philae lander down to Comet 67P/Churyumov-Gerasimenko”.
Questions