Data quality overview

Data Quality Overview

Alex Meadows

1/28/2013

Data Quality Facts

Cost of poor data quality in US - $600 Billion

Poor Data/Lack of visibility cited as #1 reason for project cost overruns

Poor data quality costs the US Economy $3.1 Trillion a year

Implementing data quality best practices boosts revenue by 66%

Median Fortune 1000 company could increase revenue by $2.01 Billion if they improved usability of data by 10%

Source: http://www.webmastat.com/blog/2012/09/07/7-facts-about-data-quality/

What is Data Quality?

Measuring data to determine if it is fit for purpose

Fit For Purpose?

Bad data is a myth!

Two Questions

What is the data used for?

What can be measured to make sure it meets the need?

Application use vs. Reporting/Analysis

Data Quality Dimensions

Consistency

Correctness

Timeliness

Precision

Unambiguous

Completeness

Reliability

Accuracy

Objectivity

Conciseness

Usefulness

Usability

Relevance

Amount of data

Source: Data Quality Fundamentals, The Data Warehousing Institute
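Two of the dimensions above, completeness and timeliness, can be turned into simple ratios. The following is a minimal sketch, not a standard definition: the function names, the sample records, and the 30-day freshness window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def completeness(records, field):
    """Share of records where `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def timeliness(records, field, as_of, max_age):
    """Share of records whose timestamp in `field` is no older than `max_age`."""
    if not records:
        return 0.0
    fresh = sum(
        1
        for r in records
        if r.get(field) is not None and as_of - r[field] <= max_age
    )
    return fresh / len(records)


# Hypothetical sample data, invented for this example.
customers = [
    {"name": "Ann", "email": "ann@example.com", "updated": datetime(2013, 1, 27)},
    {"name": "Bob", "email": "", "updated": datetime(2012, 6, 1)},
    {"name": "Cy", "email": None, "updated": datetime(2013, 1, 20)},
    {"name": "Di", "email": "di@example.com", "updated": datetime(2013, 1, 28)},
]

as_of = datetime(2013, 1, 28)
email_completeness = completeness(customers, "email")
freshness = timeliness(customers, "updated", as_of, timedelta(days=30))

print(f"email completeness: {email_completeness:.0%}")  # 50%
print(f"updated within 30 days: {freshness:.0%}")       # 75%
```

Whether 50% completeness is acceptable depends on what the data is used for, which is the "fit for purpose" question.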

Measuring Data Quality

Profiling: understanding metadata

Point in time: shows what data looks like now

Automating: shows trends

Alert to new/potential issues as they happen

Potentially fix issues in near real time
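The difference between a point-in-time profile and an automated trend check can be sketched as follows. This is an illustrative example only: `profile_column`, `check_trend`, the sample values, and the 5-point alert threshold are assumptions, not part of any particular profiling tool.

```python
def profile_column(values):
    """Point-in-time profile of one column: what the data looks like now."""
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def null_rate(profile):
    """Fraction of rows that are null in a profile snapshot."""
    return profile["nulls"] / profile["rows"] if profile["rows"] else 0.0


def check_trend(history, current, max_increase=0.05):
    """Compare the current snapshot with the most recent one in `history`.

    Returns an alert message if the null rate rose by more than
    `max_increase`, otherwise None.
    """
    if not history:
        return None
    delta = null_rate(current) - null_rate(history[-1])
    if delta > max_increase:
        return f"null rate rose by {delta:.0%}"
    return None


# Hypothetical snapshots of the same column on two consecutive days.
yesterday = profile_column([10, 12, None, 11, 13, 12, 10, 11, 12, 13])
today = profile_column([10, None, None, 11, None, 12, 10, None, 12, 13])

alert = check_trend([yesterday], today)
print(today)
print(alert)
```

Running the profile on a schedule and storing each snapshot is what turns a one-off picture into a trend that can raise alerts.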

Six Sigma Principles

Statistical Process Control

Automated inspection

Visibly shows process deviation
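A simplified sketch of statistical process control applied to a data load: control limits are set at the baseline mean plus or minus three sample standard deviations, and new observations outside the limits are flagged. The row counts are invented, and production control charts often estimate sigma differently (for example from moving ranges), so treat this as an illustration of the idea only.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, center, upper) control limits from baseline measurements."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_control(observations, lower, upper):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, x) for i, x in enumerate(observations) if x < lower or x > upper]


# Hypothetical daily row counts from a nightly load.
baseline = [1000, 1010, 990, 1005, 995, 1002, 998]
lower, center, upper = control_limits(baseline)

new_loads = [1003, 997, 850, 1001]
flagged = out_of_control(new_loads, lower, upper)

print(f"limits: {lower:.1f} .. {upper:.1f} (center {center:.1f})")
print(flagged)  # the load of 850 rows is flagged
```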

Data Profiling Analysis

Duplication

Pattern matching

Boolean/String/Number

Date Gap

Date/time

Day of Week

Character Set

Reference Data Matching

Value Distribution

Inter-Data Set Comparisons
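Several of the analyses listed above (duplication, pattern matching, date gap, value distribution) can be sketched in a few lines each. The function names, the five-digit ZIP pattern, and the sample values below are assumptions chosen for illustration; dedicated profiling tools provide much richer versions of these checks.

```python
import re
from collections import Counter
from datetime import date, timedelta


def find_duplicates(values):
    """Duplication: values that occur more than once, with their counts."""
    return {v: n for v, n in Counter(values).items() if n > 1}


def pattern_mismatches(values, pattern):
    """Pattern matching: values that do not fully match the regular expression."""
    regex = re.compile(pattern)
    return [v for v in values if not regex.fullmatch(v)]


def date_gaps(dates):
    """Date gap: calendar days missing between the earliest and latest date."""
    present = set(dates)
    if not present:
        return []
    day, last = min(present), max(present)
    missing = []
    while day <= last:
        if day not in present:
            missing.append(day)
        day += timedelta(days=1)
    return missing


def value_distribution(values):
    """Value distribution: share of rows for each distinct value."""
    total = len(values)
    return {v: n / total for v, n in Counter(values).most_common()}


# Hypothetical sample columns.
ids = ["A1", "B2", "A1", "C3"]
zips = ["12345", "1234", "ABCDE", "54321"]
load_dates = [date(2013, 1, 1), date(2013, 1, 2), date(2013, 1, 5)]
statuses = ["open", "open", "closed", "open"]

print(find_duplicates(ids))                 # {'A1': 2}
print(pattern_mismatches(zips, r"\d{5}"))   # ['1234', 'ABCDE']
print(date_gaps(load_dates))                # January 3 and 4 are missing
print(value_distribution(statuses))         # {'open': 0.75, 'closed': 0.25}
```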

Master Data Management

Create a gold standard for data

Distribute data so that all sources are uniform

Names

Addresses

Phone Numbers

Products

Can hook into third party sources
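Making sources uniform usually starts with standardizing each value to one agreed format. The sketch below does this for phone numbers, assuming 10-digit US numbers and a hypothetical gold-standard format of NNN-NNN-NNNN; the source system names and numbers are invented for the example.

```python
import re


def standardize_phone(raw):
    """Return a US phone number as NNN-NNN-NNNN, or None if it cannot be standardized."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


# The same customer's phone number as stored in three hypothetical systems.
sources = {
    "crm": "(919) 555-0100",
    "billing": "1-919-555-0100",
    "web": "919.555.0100",
}

standardized = {system: standardize_phone(raw) for system, raw in sources.items()}
golden_values = set(standardized.values())

print(standardized)
print(golden_values)  # a single value once all sources are standardized
```

Values that come back as `None` are candidates for manual review or for lookup against a third-party reference source.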

Data Governance Program

Central authority for data quality control

Applies information collected from data profiling, MDM, etc. uniformly across the business

Communication channels between business and IT groups

Questions?