Keeping Your Data Clean - NiUG · 2016-11-29 · Keeping Your Data Clean . Data Integrity Paradigm...

Preview:

Citation preview

Monday November 28, 2016

2:05PM – 3:05PM

Jake Bound, K2 and You

Doug Morris, Computer System Innovations

Keeping Your Data Clean

Data Integrity Paradigm

Data Integrity Paradigm

Today

A day in our life…

Why?

• The 1-10-100 Rule • It costs $1 to verify data when it is first entered into the

database

• It costs $10 to clean and de-dupe the data later

• It costs $100 in additional costs and lost opportunity if it is never corrected

• Trust

• Trust is priceless

Why?

• Data is the single most important asset in your organization

or to put it in even more clear terms

• Your whole business might blow up!

How

1. Understand the Paradigm (you can’t win)

2. Decide what “clean” means to you

3. Create your iMIS superhero

4. Set standards

5. Identify bad data

6. Remediate

7. Refine

8. Rinse and Repeat

Understanding the Paradigm

• Fixing everything in one pass is impractical

• Data cleanup is a process, not a one-time task

• Not all errors can be prevented

• Not all cleanup can be automated

• Consider cost/benefit

• Always remember, WE ARE HUMAN

Deciding on what “clean” means to you

• Every organisation has different:

• Requirements

• Configurations

• Budgets

• Audiences

• Staff abilities

• Legacy data

• Vendors

• Politics

The “why” behind cleaning

• Build Trust with all

• Improve reporting quality

• Increase self-service

• Simplify automation and other projects

• Change of process, fresh start

• Save license fees

• Increase value of rosters

• Discard legacy data

• Save postage

• Save space

• Save money (e.g. emails)

Intangible Tangible

Your iMIS Superhero

• In charge of

• New Fields added to iMIS

• Bridging the gap between what is important and what is controllable

• Working with your vendors • Initial through on-going

• Creating reports/queries/other clean-up options

Set Standards

• Define criteria for “clean” and “dirty”

• Communicate the standards to all (staff)

• Configure iMIS to capture data in a useful format

• Birth date vs. age, staff size range vs. number of employees

• Desired outputs guide your inputs

• The ultimate use of data should guide how you collect that data (e.g. spouse name, public directory)

Identify bad data

• Country = Australia, but State is not an Australian state

• Country = Canada, but State is not a Province

• Potential duplicate related data (two orders for the same products the same day, same ID)

• Middle name is 1 character with no period (i20 fixes now)

• Orphaned address

• Preferred address is blank

• “Other” selection with no other text provided

• One Address found when two are expected

Identify bad data

• Business rule violations

• Lapsed members (still a member, but expired)

• Email required

• Students with past grad date

• Apprentice members must have a mentor relationship

• Board member not registered for House Of Delegates

Identifying bad data

Free Tools (IQA)

Pros

Easy to access

On demand

End user accessible from iMIS

Does not require SQL knowledge

Can be made interactive

Cons

• Reactive

• No scheduling

• No automated corrections

• No automatic distribution of results (without PAP)

Identifying bad data

Free Tools (SSRS)

Pros

On demand or scheduled*

Can be under user control

Can give them a home in iMIS

Automatic Distribution of results (send email to all)*

Cons

• Reactive

• No automated corrections

• Some technical aptitude required

Identifying bad data

Free Tools (SQL)

Pros

Incredibly powerful (SOUNDEX)

Can create views/business objects

More powerful with each version of SQL

Automated corrections

Cons

• Requires SQL Server

Knowledge

Access

• Dangerous in the wrong hands

• Can stop working without notice (Agent/Jobs)

Identifying bad data

Included with iMIS 20 (Process Automation)

Pros

Included with iMIS 20

Multiple alerts included

Works great with Staff site and Member site

Cons

• Limited Data Quality Alerts

Missing Primary email alert

Missing Mobile phone alert

Identifying bad data

Optional Modules (Process Automation Plus)

Pros

Create your own alerts

Can perform scheduled tasks (and call SQL)

Can send emails to Staff

Can send SSRS reports

Automatic Distribution of results

Cons

• Reactive

• Optional

• Additional Licensing Required

Identifying bad data

Optional modules (Customer Service Alerts)

Pros

Proactive

User configurable

Works with desktop and staff site

Immediate notification

Can run processes

Cons

• No automated corrections

• No interactive corrections*

• Reactive

• Additional Licensing Required

Identifying bad data

Optional modules (TaskCentre)

Pros

User configurable

Immediate or delayed notification

Can correct errors as well

Cons

• Reactive

• No interactive corrections

• IT Support required to configure and maintain

• Requires dedicated SQL Server (limiting hosting options)

Remediate

• Better labeling/instructions

• Restrict access, increase access (nobody knows their data better than themselves)

• Change the process, add lookup tables/validations OR just stop collecting data

• Upgrade to newer technology (i20)

• Rollback changes

• Notify the person who entered it

• Sooner is better!

• Notify a supervisor or someone who is able to fix (or punish)

• Modify the data automatically using tools

Remediate

Refine

• Is our process working?

• Do we need to automate what we log?

• Have we discovered new problems?

• What is the most serious remaining problem? (choose 1 per month)

• Remember to check cost/benefit (JB story)

Rinse and Repeat

• The process is continuous

How about some tools to take with you?

• IQA Query #1

• If country = Canada, province is a valid province

Or

• If valid province, country should = Canada

If Country = Canada then Valid Postal Code

If Country = Canada then Valid Postal Code

If Valid Postal Code then Country = Canada

If Valid Postal Code then Country = Canada

How about some tools to take with you?

• IQA Query #2

• Ensure email address = username

• Because we can’t compare fields in IQA (can’t say where csContact.EMAIL <> users.UserId) we need a custom business object

• Create a SQL view

• Create a business object

• Create an IQA

Username = Email address

create view k2V_UserNameEmailCheck as Select n.ID, n.EMAIL, u.ContactMaster, u.UserId, VALID_CHECK = case when n.EMAIL = '' then 'Email missing' when u.UserId = '' then 'Username missing' when n.EMAIL <> u.UserId then 'Invalid Username' else '' end From Name n, UserMain u Where n.ID = u.ContactMaster And n.STATUS = 'A' And n.MEMBER_RECORD = 1 And n.COMPANY_RECORD = 0 And u.UserId <> ''

Username = Email address

Username = Email address

How about some tools to take with you?

• IQA Query #3

• Ensure Canadian postal code matches correct format (LNL NLN)

• Unfortunately another example of not being able to use more advanced logic in IQA comparisons

• SQL and Business Object time again …

Check for Valid/Correct Postal Code

create view k2V_PostalCodeCheck as Select ID, ADDRESS_NUM, ZIP, VALID_CHECK = PATINDEX('%[A-Z]%', substring(ZIP,1,1)) + PATINDEX('%[0-9]%', substring(ZIP,2,1)) + PATINDEX('%[A-Z]%', substring(ZIP,3,1)) + PATINDEX('% %', substring(ZIP,4,1)) + PATINDEX('%[0-9]%', substring(ZIP,5,1)) + PATINDEX('%[A-Z]%', substring(ZIP,6,1)) + PATINDEX('%[0-9]%', substring(ZIP,7,1)), COUNTRY From Name_Address Where ZIP <> '' And COUNTRY = 'Canada' And PATINDEX('%[A-Z]%', substring(ZIP,1,1)) + PATINDEX('%[0-9]%', substring(ZIP,2,1)) + PATINDEX('%[A-Z]%', substring(ZIP,3,1)) + PATINDEX('% %', substring(ZIP,4,1)) + PATINDEX('%[0-9]%', substring(ZIP,5,1)) + PATINDEX('%[A-Z]%', substring(ZIP,6,1)) + PATINDEX('%[0-9]%', substring(ZIP,7,1)) <> 7

Check for Valid/Correct Postal Code

Check for Valid/Correct Postal Code

How about some tools to take with you?

• IQA Query #4

• Find Duplicates based on First Name, Last Name, and part of address

Possible Duplicates (Name and City)

create view k2_PossibleDuplicates as Select ID From Name Where replace(lower(FIRST_NAME)+lower(LAST_NAME)+lower(CITY),' ','') in ( Select replace(lower(FIRST_NAME)+lower(LAST_NAME)+lower(CITY),' ','') as MATCH_KEY from Name where FIRST_NAME <> '' and LAST_NAME <> '' group by replace(lower(FIRST_NAME)+lower(LAST_NAME)+lower(CITY),' ','') having count(replace(lower(FIRST_NAME)+lower(LAST_NAME)+lower(CITY), ' ','')) > 1 )

Possible Duplicates (Name and City)

Possible Duplicates (Name and City)

How about some tools to take with you?

• IQA Query #5

• Members without paid dues subscriptions

• Considering paid thru as “no paid dues”

Unpaid Members

Unpaid Members

Custom Shortcuts to Help with Editing

• Create a simple custom page with just the address and security details

• Pass the ID to the custom page from each IQA

Custom Shortcuts to Help with Editing

• Create a simple custom page with just the address and security details

• Pass the ID to the custom page from each IQA

Custom Shortcuts to Help with Editing

More ideas from Michelle Lelempsis!

More ideas from Michelle

More ideas from Michelle

Michelle’s favorite Stored Procedures

• Job coding in Financials for division

• Phone formatting

• Country specific (Australia/Canada/USA) removal from address

• Unlocking member web accounts

• Activity tasks to notify a staff about an issue

Michelle’s plan

• Data sources and data integrity

• Develop a Data Management Plan with key teams and management

• Identify your Minimum Data Set

• Develop iQA Reports for daily, weekly, monthly and adhoc checks

• Identify SQL Stored Procedures for updating fields

• Develop Staff Sites dashboards

• Implement Staff Site Alerts

Staff Site Data Integrity Dashboards Community & Events i20 2015Q4

Staff Site Data Integrity iMIS 20 Q4

• Staff site Alerts

Let’s hear from you!

Thank You

Jake Bound jake@k2andyou.com

Douglas Morris dmorris@csiinc.com

Click to add text CLICK TO ADD TEXT

Click to add text

• Click to add text • Click to add text

THANK YOU TO OUR SPONSORS

FOUNDING PARTNER

PLATINUM PARTNERS

GOLD PARTNERS

SILVER PARTNERS