1 Improving Data Quality

1 Improving Data Quality. COURSE DESCRIPTION Introduction to Data Quality- Course Outline


Improving Data Quality


COURSE DESCRIPTION

TITLE: Improving Data Quality in Census/Surveys

DURATION: 3 weeks (two weeks in Washington, DC, and one week in Jeffersonville, IN, at the US Census Bureau National Processing Center (NPC))

PREREQUISITES: None

TRAINEE MATERIALS: Participant Manual

PERFORMANCE OBJECTIVES: Upon completion of this course, you will be able to:
• Plan and budget quality assurance programs
• Apply the principles of quality assurance to tasks, processes, and products
• Identify areas of potential error in mapping, questionnaire development, staff training, data collection, coding, data processing, and data analysis operations
• Develop a quality assurance program

INSTRUCTOR: Rebecca Sauer ([email protected]) International Programs Center, US Census Bureau

COURSE DATES: June 23 – July 11, 2003


Introduction to Data Quality: Course Outline

Washington, DC

1. Managing the process, an introduction to quality assurance
   A. Purpose and activities of a quality assurance program
   B. Principles of Total Quality Management (TQM)

2. Involving data users in the process
   A. Producing relevant data
   B. Creating more data users
   C. Building a reputation

3. Planning a quality assurance program
   A. Work Breakdown Structure (WBS)
   B. Resource requirements
   C. Scheduling
   D. Determining costs for quality assurance programs

4. Tools for budgeting and planning assurance programs

5. Quality control considerations for census/survey operations
   A. Reducing Non-Response
   B. Development of manuals
   C. Mapping operations
   D. Staff training
   E. Field operations
   F. Acceptance Sampling
   G. Questionnaire development
   H. Coding operations
   I. Data processing operations
   J. Data analysis
   K. Data delivery


What is Quality Data?


Does Data Quality Matter?

Policy and program decisions

Trends

Modification of survey tools


Data Quality:

The degree of excellence or accuracy that the factual information collected in a survey or census must have in order to meet the users’ needs for decision making.


Quality Assurance Program

A good quality assurance program is an effective tool used to fine-tune the products and processes of a census or survey to prevent data errors before they happen, saving time and money.


Goals of Data Quality:

Relevance

Accuracy

Timeliness

Accessibility

Interpretability

Coherence


RELEVANCE

Relevance is the degree to which the data meets the users’ needs. To meet these needs, subject-matter specialists of the statistical organization must meet with the users to define:

Items to be measured

Concepts and definitions

Analytical plans

Tabulation plans


ACCURACY

The objective of a survey or census is to obtain estimates of the true (unknown) value of a population or economic parameter. For these estimates to have any worth, they must be close to the true value. Therefore, it is of utmost importance to establish accuracy as a primary goal for data production.


TIMELINESS

Timeliness refers to the length of time between data availability and the event it describes. Timely information is valuable because it can still be acted upon. Timeliness usually involves a trade-off with accuracy.


ACCESSIBILITY

The accessibility of statistical information refers to the ease with which it can be obtained from the national statistical office. This includes the ease with which the existence of information can be ascertained, as well as the suitability of the form or medium through which the information can be accessed.

The cost of the information may also be an aspect of accessibility for some users.


INTERPRETABILITY

The interpretability of statistical information reflects the availability of the supplementary information and metadata necessary to interpret and utilize it appropriately.

This information normally covers the underlying concepts, variables, and classifications used, the methodology of collection, and indications of the accuracy of the statistical information.


COHERENCE

The coherence of statistical information reflects the degree to which it can be successfully brought together with other statistical information within a broad analytical framework over time.

The use of standard concepts, classifications, and target populations promotes coherence, as does the use of common methodology across surveys.


Responsibility of the Statistical Organization:

Produce timely, coherent data that satisfies users’ needs and is accessible and easily understood, while insisting on the greatest possible accuracy.

Relevance

Timeliness

Accuracy

Coherence

Accessibility

Interpretability


Benefits of High Quality Data:

Increased use of data

Increased visibility and prestige for the statistical office

Generation of a culture of data use and demand


Quality Assurance Program:

Major components:

A Training Program

A Quality Control Program

An Evaluation Program


Purposes of Quality Control:

To control the product:

A census product is the result of any work that is produced by one group of persons and will be used by another group of persons later in the census.

In order to control census products, we need a definition of what is acceptable for each product, decision rules to determine which products are accepted or rejected, and appropriate actions to take based on the results of the decision.
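The three ingredients named above (a definition of acceptable, a decision rule, and a resulting action) can be sketched in a few lines of Python. This is only an illustration: the 5% error-rate threshold, the function name `decide_batch`, and the wording of the actions are assumptions made for the example, not values taken from the course.

```python
def decide_batch(items_checked: int, errors_found: int,
                 max_error_rate: float = 0.05) -> tuple[str, str]:
    """Return (decision, action) for one batch of census work.

    max_error_rate is a hypothetical definition of "acceptable".
    """
    if items_checked <= 0:
        raise ValueError("items_checked must be positive")
    error_rate = errors_found / items_checked
    # Decision rule: accept when the observed error rate is within the limit.
    if error_rate <= max_error_rate:
        return "accept", "pass the batch to the next operation"
    return "reject", "return the batch for rework"


print(decide_batch(200, 6))    # 6 errors in 200 items checked
print(decide_batch(200, 15))   # 15 errors in 200 items checked
```

In practice each product would carry its own threshold and its own corrective action; the point of the sketch is only that all three pieces are written down before the work starts.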


Purposes of Quality Control:

To control the process

Control the methods used to monitor the operation

Control the steps that determine when an employee needs to be retrained or released


[Figure: survey process diagram. Stages: PLAN the products, COLLECT the data, DELIVER the products. Activities shown: user meetings, design and development, data collection, post-collection processing, analysis, dissemination. Surrounding elements: Documentation, Customer Service, Manage the Process, Quality Control.]


Anatomy of a Survey/Census: 5 Phases

1. Contract Negotiation
2. Design and Development
3. Data Collection
4. Post-Collection Processing
5. Analysis and Dissemination

Each phase has its own:
• Objective
• Key tasks
• Deliverables
• Documentation


Contract Negotiation

Objective: Identify the sponsor’s needs and outline the survey(s) to meet those needs.

Key Tasks:
• Understand the requirements
• Generate the contract
• Negotiate to final decision
• Gain necessary government approval/clearance

Deliverables:
• Approved contract
• Rough schedules and timelines
• Rough questionnaire outline

Documentation:
• Project description
• List of data products expected
• Contract


Design and Development

Objective: Develop survey tools to meet the objectives, given time and cost parameters.

Key Tasks:
• Finalize schedule
• Sampling
• Create input files (listing)
• Develop/revise data capture systems
• Develop and test the questionnaire
• Develop training and interviewing materials
• Conduct field pre-test
• Test systems

Deliverables:
• Sample
• Approved data collection/capture modes
• Input files (master list)
• Training/interview materials
• Analysis plan

Documentation:
• Baseline schedule
• Final specifications
• Sampling plan
• Training materials
• Instrument documentation


Data Collection

Objective: Gather raw data in a timely and cost-effective manner.

Key Tasks:
• Conduct training
• Field the survey
• Collect the data from the field
• Monitoring and problem solving

Deliverables:
• Status of each case
• Raw data for each case

Documentation:
• Tracking report of field problems
• Progress/status reports


Post-Collection Processing

Objective: Generate accurate and organized final microdata.

Key Tasks:
• Data capture
• Data receipt (reformatting)
• Preliminary review
• Clean the data
• Imputation
• Weighting
• Generation of preliminary tables
• Monitoring and problem solving

Deliverables:
• Approved internal data file (microdata)
• Crosstabulations and/or work tables

Documentation:
• Data dictionary
• All processing specifications (coding, editing, imputation, weighting, etc.)
• Problem tracking and progress reports


Analysis and Dissemination

Objective: Translate data into useful information that meets objectives, and distribute it to the appropriate audience.

Key Tasks:
• Send data directly to sponsor (if applicable)
• Create public use file
• Table/publication generation
• Compile/produce final documentation
• Evaluation and debrief

Deliverables:
• Tables for publication
• Reports
• Public use file
• Press Releases

Documentation:
• Lessons learned
• Procedural History
• Reports/Publications/Press Releases
• Public use file
• Disclosure request


Activities of a Quality Assurance Program

Measurement of Quality Characteristics

Comparison to Pre-determined Standards

Corrective Actions


Quality Control Inspections

Types:
• Qualitative or Attribute Inspections: Examination of a characteristic of interest to determine whether a certain property is present or absent.
• Quantitative or Variable Inspections: Measurement of the characteristic of interest on a continuous scale.

Methods:
• Sample Inspections
• 100% Verification
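To make the distinction between the two inspection types and the two methods concrete, here is a small Python sketch. Everything in it is hypothetical: the batch of interview durations, the 40-minute cut-off, the sample size of 10, and the helper names are assumptions chosen only for illustration.

```python
import random


def attribute_inspection(values, has_property):
    """Qualitative check: count the items in which the property is present."""
    return sum(1 for v in values if has_property(v))


def variable_inspection(values):
    """Quantitative check: summarize a measurement on a continuous scale."""
    return sum(values) / len(values)


def draw_sample(batch, sample_size, seed=0):
    """Sample inspection: look at a random subset instead of every item."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return rng.sample(batch, sample_size)


# Hypothetical batch: minutes taken to complete each of 50 interviews.
batch = [20 + (i % 7) * 5 for i in range(50)]
sample = draw_sample(batch, 10)

too_long = attribute_inspection(sample, lambda minutes: minutes > 40)
mean_minutes = variable_inspection(sample)
print(f"sample: {too_long} of {len(sample)} over 40 minutes, mean {mean_minutes:.1f}")

# 100% verification applies the same check to every item in the batch.
all_too_long = attribute_inspection(batch, lambda minutes: minutes > 40)
print(f"full batch: {all_too_long} of {len(batch)} over 40 minutes")
```

The attribute check returns a count (present or absent per item), while the variable check returns a measurement; either one can be run on a sample or on the full batch.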


Verification Methods

Dependent Verification:

• Production clerk

• Verifier

Verifier sees production clerk’s work

PROBLEM: In dependent verification, the verifier may agree more often than they should since they see the production clerk’s work.


2-Way Independent Verification:

Two-way match
• Production clerk
• Verifier
• Matcher

Agreements between the production clerk and the verifier are accepted as correct

Disagreements are reviewed by a matcher

PROBLEM: Independent verification is more costly since three clerks are involved in the process: the production clerk, the verifier, and the matcher. However, independent verification is more accurate since the verifier is not influenced by the production clerk’s work.
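The two-way match rule can be sketched as follows. This is a minimal illustration rather than the Census Bureau's procedure: the record ids and code values are made up, and `matcher_review` is a hypothetical stand-in for the human matcher.

```python
def two_way_match(production, verifier, matcher_review):
    """Compare two independently produced versions of the same records.

    production, verifier: dicts mapping record id -> coded value.
    matcher_review: callable(record_id, production_value, verifier_value)
        returning the value the matcher judges correct (hypothetical).
    Returns (final_values, disagreements).
    """
    final, disagreements = {}, []
    for record_id, prod_value in production.items():
        ver_value = verifier[record_id]
        if prod_value == ver_value:
            # Agreement: accepted as correct without further review.
            final[record_id] = prod_value
        else:
            # Disagreement: referred to the matcher.
            disagreements.append(record_id)
            final[record_id] = matcher_review(record_id, prod_value, ver_value)
    return final, disagreements


production = {"r1": "4110", "r2": "2350", "r3": "9999"}
verifier = {"r1": "4110", "r2": "2530", "r3": "9999"}
# For the example only, the matcher sides with the verifier on every dispute.
final, disputed = two_way_match(production, verifier, lambda rid, a, b: b)
print(final, disputed)
```

Only the disputed records reach the matcher, which is why the matcher's workload depends on how often the two clerks disagree.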


3-Way Independent Verification:

Three-way match
• Production clerk
• Two independent verifiers
• Matcher

Agreements of all three (clerk and verifiers) are accepted as correct

If two out of three agree, an error is charged

If all three disagree, no error is charged and the matcher decides the correct answer

PROBLEM: More costly, but more accurate
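The three-way match rules can be expressed as a small decision function. The slide does not say who is charged with the error or which value is kept when two of three agree; the sketch below assumes the majority value is kept and the error is charged to the party in the minority. The function and parameter names are likewise hypothetical.

```python
from collections import Counter


def three_way_match(clerk, verifier_1, verifier_2, matcher_decides):
    """Apply the three-way match rules to one item.

    Returns (accepted_value, error_charged_to); the second element is
    None when no error is charged.
    """
    answers = {"clerk": clerk, "verifier_1": verifier_1, "verifier_2": verifier_2}
    value, votes = Counter(answers.values()).most_common(1)[0]
    if votes == 3:
        # All three agree: accepted as correct.
        return value, None
    if votes == 2:
        # Two of three agree: an error is charged.
        # Assumption: it is charged to whoever is in the minority.
        dissenter = next(who for who, a in answers.items() if a != value)
        return value, dissenter
    # All three disagree: no error is charged and the matcher decides.
    return matcher_decides(clerk, verifier_1, verifier_2), None


def pick_first_verifier(clerk, verifier_1, verifier_2):
    """Hypothetical matcher used only for this example."""
    return verifier_1


print(three_way_match("A", "A", "A", pick_first_verifier))
print(three_way_match("A", "B", "B", pick_first_verifier))
print(three_way_match("A", "B", "C", pick_first_verifier))
```

Compared with the two-way match, the matcher is consulted only when all three answers differ, at the cost of a second verifier on every item.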