SQL SERVER DATA QUALITY SERVICES

Marc Jellinek

Principal Consultant – Neudesic

marc.jellinek@neudesic.com

ABOUT ME

Experience

• Principal Consultant - Neudesic

• Assistant Director (SQL Team) – Application Engineering at Ernst & Young

• IT Manager at MLB Network

• Sr. Technology Specialist at Microsoft

Technologies

• Microsoft SQL Server 6.0, 6.5, 7.0, 2000, 2005, 2008, 2008 R2 and 2012

• Relational Engine, Analysis Services, Integration Services and Reporting Services

SESSION OBJECTIVES

• Introduction to SQL Server Data Quality Services (DQS)

• Understanding the problem

• Demo

• Where do we go from here?

SETTING THE STAGE

• Building from the SQL Server Series: Master Data Services in SQL Server 2012, presented by Patrick Gallucci

• http://www.neudesic.com/media/webcasts/20120501/20120501.wmv (start at 6:26)

• PASS MDM/DQS Virtual Chapter http://masterdata.sqlpass.org

• Based on demos from “SQL Server 2012 Developers Update”

THE DATA QUALITY PROBLEM SPACE

• Standard: Are data elements consistently defined and understood?

– Sample data problem: Gender code = M, F, U in one system and Gender code = 0, 1, 2 in another system

• Complete: Is all necessary data present?

– Sample data problem: 20% of customers’ last name is blank, 50% of zip-codes are 99999

• Accurate: Does the data accurately represent reality or a verifiable source?

– Sample data problem: A Supplier is listed as ‘Active’ but went out of business six years ago

• Valid: Do data values fall within acceptable ranges?

– Sample data problem: Salary values should be between 60,000-120,000

• Unique: Data appears several times

– Sample data problem: Both John Ryan and Jack Ryan appear in the system – are they the same person?
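As a rough illustration of the problem space, the sketch below checks a few made-up customer rows against four of the five issue types (Standard, Complete, Valid, Unique). The rows, field names, code sets and the `profile` function are assumptions invented for this example and are not part of DQS. "Accurate" is left out because it needs an external, verifiable source to compare against.

```python
from collections import defaultdict

# Hypothetical sample rows; field names and values are illustrative only.
customers = [
    {"last_name": "Ryan", "first_name": "John", "gender": "M", "zip": "10001", "salary": 85000},
    {"last_name": "", "first_name": "Ann", "gender": "1", "zip": "99999", "salary": 72000},
    {"last_name": "Ryan", "first_name": "Jack", "gender": "U", "zip": "10001", "salary": 250000},
]

VALID_GENDER_CODES = {"M", "F", "U"}   # Standard: one agreed code set
PLACEHOLDER_ZIPS = {"99999"}           # Complete: a placeholder counts as missing
SALARY_RANGE = (60_000, 120_000)       # Valid: acceptable range from the slide


def profile(rows):
    """Return a list of (row_index, issue_type, message) findings."""
    findings = []
    for i, row in enumerate(rows):
        if row["gender"] not in VALID_GENDER_CODES:
            findings.append((i, "Standard", f"unexpected gender code {row['gender']!r}"))
        if not row["last_name"].strip():
            findings.append((i, "Complete", "last name is blank"))
        if row["zip"] in PLACEHOLDER_ZIPS:
            findings.append((i, "Complete", f"placeholder zip code {row['zip']}"))
        low, high = SALARY_RANGE
        if not low <= row["salary"] <= high:
            findings.append((i, "Valid", f"salary {row['salary']} outside {low}-{high}"))

    # Unique: rows sharing a last name and zip code are only *candidate*
    # duplicates; a person (or a matching policy) still has to decide.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if row["last_name"].strip():
            groups[(row["last_name"].lower(), row["zip"])].append(i)
    for members in groups.values():
        if len(members) > 1:
            findings.append((members[0], "Unique", f"possible duplicates: rows {members}"))
    return findings


for finding in profile(customers):
    print(finding)
```

Hard-coded checks like these are exactly what becomes hard to maintain as rules multiply, which is the motivation for keeping such knowledge in a reusable knowledge base.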

WHAT’S THE PROBLEM

• My name is Marc Jellinek

• Marc <> “Mark”, “Marck” or “March”

• Jellinek <> “Jelinek”, “Jellineck”, “Jelineck”, “Jelliner”, “Jeliner” or “Jellyneck”

WHAT’S THE PROBLEM

• Last name variants: Jellinek, Jelinek, Jellineck, Jelineck, Jelliner, Jeliner, Jellyneck

• First name variants: Marc, Mark, Marck, March

THE NIGHTMARE SCENARIO

The Customer Dimension

– Jellinek, Marc
– Jellinek, Mark
– Jellinek, Marck
– Jellinek, March
– Jelinek, Marc
– Jelinek, Mark
– Jelinek, Marck
– Jelinek, March
– Jellineck, Marc
– Jellineck, Mark
– Jellineck, Marck
– Jellineck, March
– Jelineck, Marc
– Jelineck, Mark
– Jelineck, Marck
– Jelineck, March
– Jelliner, Marc
– Jelliner, Mark
– Jelliner, Marck
– Jelliner, March
– Jeliner, Marc
– Jeliner, Mark
– Jeliner, Marck
– Jeliner, March
– Jellyneck, Marc
– Jellyneck, Mark
– Jellyneck, Marck
– Jellyneck, March
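The size of this list follows directly from the spelling variants on the earlier slides: seven last-name spellings times four first-name spellings. The sketch below rebuilds the rows and then scores each one against the correct spelling using Python's standard `difflib.SequenceMatcher`. This is a generic string-similarity measure chosen for illustration, not the matching algorithm DQS uses, and the 0.75 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher
from itertools import product

last_names = ["Jellinek", "Jelinek", "Jellineck", "Jelineck", "Jelliner", "Jeliner", "Jellyneck"]
first_names = ["Marc", "Mark", "Marck", "March"]

# Every spelling combination becomes its own row in the customer dimension.
dimension_rows = [f"{last}, {first}" for last, first in product(last_names, first_names)]
print(len(dimension_rows))  # 7 last names x 4 first names = 28


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


THRESHOLD = 0.75  # assumed cut-off; a real project would tune this
canonical = "Jellinek, Marc"

# Rows that are close enough to the canonical spelling to be reviewed as
# possible duplicates of the same customer.
candidates = [row for row in dimension_rows if similarity(row, canonical) >= THRESHOLD]
for row in candidates:
    print(f"{row:20} {similarity(row, canonical):.2f}")
```

A pure similarity score cannot tell a misspelling from a genuinely different person, which is why matching results are usually reviewed before records are merged.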

ANALYTIC IMPACT

• Average Revenue per customer

• Average Profit per customer

• Number of customers

• Customers per Geography

• Customers by Income

• Customers by Gender

• Customers by Educational Level

• Customers by Product Bought
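To make the analytic impact concrete, the following sketch computes the customer count and the average revenue per customer twice: once over raw names and once after mapping spelling variants to a single customer. The order amounts, the extra customer and the `canonical_customer` mapping are invented for the example; in practice such a mapping would come out of a cleansing or matching step.

```python
# Hypothetical order lines: (customer name as entered, revenue).
orders = [
    ("Jellinek, Marc", 120.0),
    ("Jelinek, Mark", 80.0),
    ("Jellineck, Marc", 100.0),
    ("Smith, Anne", 300.0),
]

# Hypothetical output of a matching step: variant -> surviving record.
canonical_customer = {
    "Jelinek, Mark": "Jellinek, Marc",
    "Jellineck, Marc": "Jellinek, Marc",
}


def average_revenue_per_customer(rows, mapping=None):
    """Return (number_of_customers, average_revenue_per_customer)."""
    mapping = mapping or {}
    totals = {}
    for customer, revenue in rows:
        key = mapping.get(customer, customer)
        totals[key] = totals.get(key, 0.0) + revenue
    return len(totals), sum(totals.values()) / len(totals)


print(average_revenue_per_customer(orders))                      # (4, 150.0)
print(average_revenue_per_customer(orders, canonical_customer))  # (2, 300.0)
```

The total revenue is the same in both runs; only the customer count changes, and every per-customer measure changes with it.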

OBLIGATORY TRUISMS

• The accuracy of your reporting is determined by the accuracy of your data (Garbage In, Garbage Out)

• Decisions made based on data will only be as good as the data on which you are basing your decisions.

• You can’t manage what you can’t measure. Inaccurate measurements lead to interesting management challenges.

THE DATA QUALITY SOLUTION SPACE

• Cleansing: Amend, remove or enrich data that is incorrect or incomplete. This includes correction, enrichment and standardization.

• Matching: Identifying, linking or merging related entries within or across sets of data.

• Profiling: Analysis of the data source to provide insight into the quality of the data and help to identify data quality issues.

• Monitoring: Tracking and monitoring the state of Quality activities and Quality of Data.

SQL SERVER 2012 DATA QUALITY SERVICES

High quality data is critical to effective business intelligence and to business activities

DQS is an on-premises Data Quality product in SQL Server 2012, extendible with knowledge from multiple parties through Azure DataMarket

Richer DQ knowledge and capabilities in the cloud will make it even easier to provide high quality data

Data Quality Services (DQS) is a Knowledge-Driven data quality solution enabling IT Pros and data stewards to easily improve the quality of their data

Included with SQL Server 2012 Enterprise and BI Editions

KEY DATA QUALITY SERVICES CONCEPTS

• Knowledge-Driven: Based on a Data Quality Knowledge Base (DQKB)

• Semantics: Data Domains capture the semantics of your data

• Knowledge Discovery: Acquires additional knowledge the more you use it

• Open and Extendible: Add user-generated knowledge & 3rd party reference data providers

• Easy to use: User experience designed for increased productivity

DQS ARCHITECTURE

[Architecture diagram; labels recovered from the slide]

• DQ Clients: DQS UI (Knowledge Discovery and Management, Interactive DQ Projects, Data Exploration), SSIS DQ Component, MDS Excel Add-in

• DQ Server: DQ Engine (Knowledge Discovery, Data Profiling & Exploration, Cleansing, Matching, Reference Data)

• DQ Server stores: DQ Projects Store (DQ Active Projects), Common Knowledge Store (MS Data Domains, Local Data Domains), Knowledge Base Store (Published KBs)

• APIs: Reference Data API (Browse, Get, Update…), RD Services API (Browse, Set, Validate…)

• External sources: Azure Market Place (Categorized Reference Data, Categorized Reference Data Services), 3rd Party / Internal (Reference Data Services, Reference Data Sets), MS DQ Domains Store

DATA QUALITY SERVICES PROCESSES

[Process diagram; labels recovered from the slide]

• Build: Knowledge Management (Discover / Explore Data, Manage Knowledge)

• Use: DQ Projects (Correct & standardize, Match & De-dupe)

• Knowledge Base

• Sources: Enterprise Data, Reference Data, Cloud Services

• Integrated Profiling

• Notifications, Progress, Status

BASIC DEFINITIONS

• Knowledge Base

– Stores all the knowledge related to a specific type of data source

– Container for domains

• Domain

– Semantic representation of a type of data in a data field or column

– Trusted values, invalid values and erroneous data

– Synonym associations, term relationships, validation and business rules, matching policies

• Matching Rule

– Set of rules and conditions that determine a match or duplicate
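One way to picture how these three definitions relate is the small data-structure sketch below. The class names, fields, status labels and the weighted exact-comparison rule are all assumptions made for illustration; they are not the DQS object model or its API, and DQS itself is driven through the Data Quality Client rather than code like this.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Illustrative stand-in for a domain: knowledge about one column."""
    name: str
    trusted_values: set = field(default_factory=set)
    invalid_values: set = field(default_factory=set)
    synonyms: dict = field(default_factory=dict)  # erroneous or variant value -> trusted value

    def cleanse(self, value):
        """Return (output_value, status); status labels are made up for this sketch."""
        candidate = value.strip()
        if candidate in self.trusted_values:
            return candidate, "correct"
        if candidate in self.synonyms:
            return self.synonyms[candidate], "corrected"
        if candidate in self.invalid_values:
            return candidate, "invalid"
        return candidate, "new"  # not covered by the knowledge gathered so far


@dataclass
class KnowledgeBase:
    """Container for domains, keyed by the column each domain describes."""
    name: str
    domains: dict = field(default_factory=dict)

    def add_domain(self, domain):
        self.domains[domain.name] = domain

    def cleanse_record(self, record):
        # Columns without a domain pass through unchanged.
        return {
            column: self.domains[column].cleanse(value)
            if column in self.domains else (value, "unchecked")
            for column, value in record.items()
        }


@dataclass
class MatchingRule:
    """Weighted exact comparison; weights are integer percentages."""
    weights: dict       # domain name -> weight
    min_score: int = 80

    def is_match(self, a, b):
        score = sum(w for column, w in self.weights.items() if a.get(column) == b.get(column))
        return score >= self.min_score


kb = KnowledgeBase("Customers")
kb.add_domain(Domain("FirstName", {"Marc"}, {"March"}, {"Mark": "Marc", "Marck": "Marc"}))
kb.add_domain(Domain("LastName", {"Jellinek"}, set(), {"Jelinek": "Jellinek", "Jellineck": "Jellinek"}))

print(kb.cleanse_record({"FirstName": "Mark", "LastName": "Jelinek", "City": "Springfield"}))

rule = MatchingRule({"LastName": 50, "FirstName": 30, "City": 20})
```

Integer weights are used so that the threshold comparison is not affected by floating-point rounding. Cleansing before matching matters here: with exact comparison, "Mark" and "Marc" only match once both have been corrected to the same trusted value.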

DATA QUALITY SERVICES COMPONENTS

• Data Quality Server

• Data Quality Client

• DQS Cleansing Component for SQL Server Integration Services

• Data Quality Processes in Master Data Services

DATA QUALITY SERVICES COMPONENTS

• Data Quality Server

– SQL Server Databases

• DQS_MAIN

– DQS Stored Procedures, the DQS Engine and published Knowledge Bases

• DQS_PROJECTS

– Data required for knowledge base management and DQS project activities

• DQS_STAGING_AREA

– Intermediate staging area where source data is copied and processed

DATA QUALITY SERVICES COMPONENTS

• Data Quality Client

– Standalone application

• Designed for both data stewards and DQS Administrators

• Perform knowledge management, data quality projects and administration in one user interface

• Allows for domain management, matching policy creation, data cleansing, matching, profiling, monitoring and server administration.

• Can be installed on a remote computer

DATA QUALITY SERVICES COMPONENTS

• DQS Cleansing Component in SQL Server Integration Services

– Performs data cleansing as a part of an SSIS package

– Alternative to running a cleansing project within the Data Quality Services Client application

• Data Quality Processes within Master Data Services

– Perform de-duplication on source data and master data within the Master Data Services Add-in for Microsoft Excel.

DEMO

RESOURCES

• DBI207: Using Knowledge to Cleanse Data with Data Quality Services http://channel9.msdn.com/Events/TechEd/NorthAmerica/2011/DBI207

• Data Quality Services Blog http://blogs.msdn.com/b/dqs

• Books Online for SQL Server - Data Quality Services http://technet.microsoft.com/en-us/library/ff877925(SQL.110).aspx

• Install Data Quality Services http://technet.microsoft.com/en-us/library/gg492277.aspx

• Used Master Data Services Configuration Manager, set up IIS http://msdn.microsoft.com/library/ee633744%28SQL.110%29.aspx

• Troubleshoot Installation and Configuration Issues (Master Data Services in SQL Server 2012) http://go.microsoft.com/fwlink/?LinkId=226284

• SQL Server 2012 Developer Training Kit Web Installer http://www.microsoft.com/en-us/download/details.aspx?id=27721

• SQL Server 2012 Update for Developers Training Workshop http://social.technet.microsoft.com/wiki/contents/articles/6981.sql-server-2012-update-for-developers-training-workshop.aspx

• SQL Server 2012 Update for Developers Training Kit http://social.technet.microsoft.com/wiki/contents/articles/6982.sql-server-2012-developer-training-kit-bom-en-us.aspx

• SQL Server 2012 Update for Developers Training Kit Content http://social.technet.microsoft.com/wiki/contents/articles/6976.sql-server-2012-developer-training-kit-content-en-us.aspx

• MSDN – Data Quality Services http://msdn.microsoft.com/en-us/library/ff877925(v=sql.110).aspx

• MSDN Discussions – Data Quality Services http://social.msdn.microsoft.com/Forums/en-US/sqldataqualityservices/threads

• Technet – Data Quality Services http://technet.microsoft.com/en-us/library/ff877925.aspx

• PASS MDM/DQS Virtual Chapter http://masterdata.sqlpass.org

Driving Innovation with SAP On-Premise and Beyond
Monish Nagisetty, Daniel Sepp
7 June 2012, 10:00-11:00 PDT
http://www.clicktoattend.com/?id=159718

Delivering a Semantic Model for Ad-Hoc Reporting
Steve Muise
21 June 2012, 10:00-11:00 PDT
http://www.clicktoattend.com/?id=159917

Exploring the New Hadoop Implementation in Azure
Jacob Saunders
10 July 2012, 10:00-11:00 PDT
http://www.clicktoattend.com/?id=160452

Amerishore – an Innovative, Socially Conscious Approach to Offshoring
Tracy Derr
17 July 2012, 10:00-11:00 PDT
http://www.clicktoattend.com/?id=160650

Data Integration Improvements within SQL Server 2012
Kola Bolarin
24 July 2012, 10:00-11:00 PDT
http://www.clicktoattend.com/?id=160322

EDI: The Reports of My Death Have Been Greatly Exaggerated
Abhilash Shanmug
9 August 2012, 10:00-11:00 PDT
http://www.clicktoattend.com/?id=159720

http://www.neudesic.com/About/Events/Pages/RecentWebcasts.aspx

THANK YOU