White Paper
Data Profiling Best Practices
Overview
This white paper provides an overview of best practices with data
– Examines the best scenarios for
Why Use Data Profiling Technologies?
Deployment of Data Profiling Technologies
Data Quality Management
Data Integration
Data Profiling Process
Prepare for the Project
Analysis Preparation
Review Project Initiation Document
Current Documentation
Team Training
Internal Setup/Decisions
Profiling Overview
[Figure: Profiling overview. Stages: Project Preparation, Analysis Preparation, Analysis, Sampling, Extract & Format. Labels: Project Initiation Document, Project Preparation, Extract & Format, Analyze Samples, Profiling.]
Activity Workflow
Extract and Format the Data
Create the Extract Program(s)
Load Preparation
Sampling
Load a Sample of the Data
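Drawing the sample itself can be done in a single pass over an extract whose size is not known in advance. The sketch below uses reservoir sampling, a standard technique chosen here for illustration rather than one this paper prescribes; the function name `sample_records` and the fixed seed are assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_records(records: Iterable[T], k: int, seed: int = 42) -> List[T]:
    """Return a uniform random sample of up to k records in one pass.

    Hypothetical helper illustrating reservoir sampling (Algorithm R).
    """
    rng = random.Random(seed)  # fixed seed so the same sample can be reproduced
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)  # fill the reservoir with the first k records
        else:
            j = rng.randint(0, i)  # inclusive bounds: 0..i
            if j < k:
                reservoir[j] = record  # keep this record with probability k/(i+1)
    return reservoir


if __name__ == "__main__":
    rows = (f"row-{n}" for n in range(100_000))  # stand-in for a large extract
    print(sample_records(rows, k=5))
```

Because only `k` records are held in memory at a time, the extract can be streamed from a file or database cursor rather than loaded whole.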
Analysis of the Sample
CSV: Each field is separated by a comma, and text fields are enclosed within quotes. Generally this type of file allows the first row to contain the column names.
CSV file definition: Some products require or allow you to create definition rules for CSV files. These rules are helpful for adding or changing column names, or for adding descriptions to the attributes.
Flat file definition: Varies based on the data profiling product chosen, ranging from a flattened copybook (or the equivalent for the language used) to predefined formats specific to the tool itself.
ODBC connection: Open Database Connectivity, a standard database access method developed by Microsoft Corporation. The goal of ODBC is to make it possible to access any data from any application, regardless of which database management system (DBMS) is handling the data.
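As a rough illustration of the CSV entries above, the sketch reads a comma-separated extract whose first row holds the column names and whose text fields are quoted, then applies a simple "definition" that renames columns and attaches descriptions. The dictionary-based definition format, the function name `load_csv`, and the sample data are hypothetical; real profiling products define their own rule formats.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical definition rules: source column -> (new column name, description)
Definition = Dict[str, Tuple[str, str]]


def load_csv(text: str, definition: Definition) -> Tuple[List[dict], Dict[str, str]]:
    """Parse CSV text (column names in the first row) and apply rename/description rules."""
    reader = csv.DictReader(io.StringIO(text))  # csv handles quoted fields containing commas
    rows = []
    for raw in reader:
        # Rename columns that have a rule; keep all other columns unchanged
        rows.append({definition.get(col, (col, ""))[0]: value for col, value in raw.items()})
    descriptions = {new_name: desc for new_name, desc in definition.values()}
    return rows, descriptions


extract = 'cust_id,name,ssn\n1,"Smith, Jane",123-45-6789\n2,"Doe, John",000-00-0000\n'
rules = {"ssn": ("tax_id", "Customer Social Security number")}
rows, descriptions = load_csv(extract, rules)
print(rows[0])       # {'cust_id': '1', 'name': 'Smith, Jane', 'tax_id': '123-45-6789'}
print(descriptions)  # {'tax_id': 'Customer Social Security number'}
```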
Adjust the Extracts and Formats of the Data
Produce Deliverables
Delete the Samples
Analysis
Analysis Assistant
Code
Blanks/Nulls/Low Values/High Values
Minimums/Maximums
Patterns
Duplicates / Inconsistencies
Invalid Codes
Identify Keys
Key Testing
Join Testing
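Duplicate detection, invalid-code checks, key testing and join testing can be sketched over rows held as Python dictionaries. The helper names, the sample rows and the valid-code set below are hypothetical; a profiling product would run equivalent checks over the full loaded sample.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set


def duplicates(values: Iterable[str]) -> Dict[str, int]:
    """Values that occur more than once, with their counts."""
    counts = Counter(values)
    return {v: n for v, n in counts.items() if n > 1}


def invalid_codes(values: Iterable[str], valid: Set[str]) -> List[str]:
    """Values not found in the list of valid codes (original order preserved)."""
    return [v for v in values if v not in valid]


def is_candidate_key(rows: Sequence[dict], columns: Sequence[str]) -> bool:
    """Key test: the column combination is never null/blank and never repeats."""
    seen = set()
    for row in rows:
        key = tuple(row.get(c) for c in columns)
        if any(part in (None, "") for part in key) or key in seen:
            return False
        seen.add(key)
    return True


def orphans(child_rows: Sequence[dict], fk: str,
            parent_rows: Sequence[dict], pk: str) -> List[dict]:
    """Join test: child rows whose foreign-key value matches no parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


if __name__ == "__main__":
    customers = [{"cust_id": "1", "state": "MD"},
                 {"cust_id": "2", "state": "XX"},
                 {"cust_id": "2", "state": "VA"}]
    orders = [{"order_id": "10", "cust_id": "1"},
              {"order_id": "11", "cust_id": "3"}]
    print(duplicates(r["cust_id"] for r in customers))                 # {'2': 2}
    print(invalid_codes([r["state"] for r in customers], {"MD", "VA"}))  # ['XX']
    print(is_candidate_key(customers, ["cust_id"]))                    # False
    print(is_candidate_key(orders, ["order_id"]))                      # True
    print([o["order_id"] for o in orphans(orders, "cust_id", customers, "cust_id")])  # ['11']
```

`is_candidate_key` accepts a list of columns so that composite keys can be tested the same way as single-column keys.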
Example: blanks/nulls/low values/high values (System 1)
Low Value    000-00-0000
NULL
High Value   999-99-9999

Example: minimums/maximums
Minimum      000-00-00001
Maximum

Example: patterns
System     Values        Pattern
System 1   123-45-6789   9(3)-9(2)-9(4)
System 1   12-3456789    9(2)-9(7)
System 2   123456789     9(9)

Example: duplicates
System     Values
System 1   123-45-6789
System 1   123-45-6789
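The single-column checks named above (blanks and nulls, low and high values, minimums and maximums, patterns) can be computed roughly as follows. This is a minimal sketch under several assumptions: the function names `pattern` and `profile` are hypothetical; digits are written as 9(n), as in the pattern examples, and letters as A(n), which the examples do not show; low and high values are treated as placeholders made entirely of 0s or 9s, like 000-00-0000 and 999-99-9999; and minimum/maximum are compared as strings. In mainframe files, "low values" and "high values" usually also refer to fields filled with the bytes X'00' and X'FF', which this sketch does not check.

```python
import re
from collections import Counter
from itertools import groupby
from typing import Dict, Optional, Sequence


def pattern(value: str) -> str:
    """Picture-style pattern: 9(n) = run of n digits, A(n) = run of n letters (assumption)."""
    classes = ["9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value]
    parts = []
    for symbol, run in groupby(classes):
        n = len(list(run))
        # Digit/letter runs are counted; punctuation such as '-' is kept literally
        parts.append(f"{symbol}({n})" if symbol in ("9", "A") else symbol * n)
    return "".join(parts)


def profile(values: Sequence[Optional[str]]) -> Dict[str, object]:
    """Profile one column: nulls, blanks, 0/9 placeholders, min/max and pattern counts."""
    nulls = sum(v is None for v in values)
    present = [v for v in values if v is not None]
    blanks = sum(v.strip() == "" for v in present)
    filled = [v for v in present if v.strip() != ""]
    digits = [re.sub(r"\D", "", v) for v in filled]  # keep digits only
    low = sum(d != "" and set(d) == {"0"} for d in digits)   # e.g. 000-00-0000
    high = sum(d != "" and set(d) == {"9"} for d in digits)  # e.g. 999-99-9999
    return {
        "nulls": nulls,
        "blanks": blanks,
        "low_values": low,
        "high_values": high,
        "min": min(filled) if filled else None,  # string comparison, not numeric
        "max": max(filled) if filled else None,
        "patterns": dict(Counter(pattern(v) for v in filled)),
    }


if __name__ == "__main__":
    ssns = ["123-45-6789", "12-3456789", "123456789",
            "000-00-0000", "999-99-9999", None, "  "]
    for name, result in profile(ssns).items():
        print(name, result)
```

Counting pattern frequencies, rather than just listing distinct patterns, makes it easy to see whether an odd format such as 9(2)-9(7) is a one-off or a systematic difference between source systems.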
Outputs
For more information about our products and services, please log onto our website at www.g1.com or call us today at 888-413-6763.
4200 Parliament Place, Suite 600, Lanham, MD 20706-1844 • 1-888-413-6763 • www.g1.com
Group 1, Group 1 Software and the Group 1 logo are registered trademarks of Group 1
Software, Inc. Pitney Bowes and the Pitney Bowes logo are registered trademarks and the
Pitney Bowes Process Bar Design is a trademark of Pitney Bowes Inc. Group 1 Software
is a Pitney Bowes company. All other marks referenced in this material are the property of
their respective owners.
© 2007 Group 1 Software, Inc. All rights reserved.
An Equal Opportunity Employer. Printed in U.S.A.