Data verification slides bangalore to t (4)

Page 1: Data verification slides bangalore to t (4)

1

QUALITY OF DATA

Page 2: Data verification slides bangalore to t (4)

2

LEARNING OBJECTIVES

Realise the importance of correct data for program management

Realise the distinction between random data errors and falsified data

Understand the causes of poor data quality

Be able to check data quality through supervision and review of reports

Learn when and how to correct erroneous data

Page 3: Data verification slides bangalore to t (4)

3

Occurrence & importance of errors

In business context:

- Error rates of 1-5 % are not exceptional

- Estimated cost ≈ 10 % of revenue

- Problems with data quality arise when data originate from multiple sources

- After initial enthusiasm to improve data quality, the focus on data quality generally fades slowly

In disease control context:

- Error occurrence?

- Impact on program performance?

- Checking of errors: limited effort

Page 4: Data verification slides bangalore to t (4)

4

Errors in RNTCP?

Based on pre-test carried out in all countries:

Real possibility of errors in subdistrict reports

Minor possibility in district reports

Little attention to checking for errors

Page 5: Data verification slides bangalore to t (4)

5

Errors identified in 1039 TB patients

Cohort review method in NY city: Munsiff et al., IJTLD 2006, 10: 1133-9

• 41% of cases presented errors

• multiple errors per patient: 596 / 424 = 1.4

• What kind of errors?

- program info errors 55 %

- patient related errors 45 %

NB. Error rates in HMIS > 50 % (Gillies A. Methods Inf Med 2000, 39: 208-12)

Page 6: Data verification slides bangalore to t (4)

6

Data quality: definition

The state of validity, reliability, consistency, timeliness and completeness, making data appropriate for a specific use

Problems with data quality do not only arise from incorrect data

Inconsistent data is a problem as well

Page 7: Data verification slides bangalore to t (4)

7

Data Quality ~ Management

Quality Assurance

Activities to ensure quality before data collection

Quality Control

Monitoring and maintaining quality of data during RNTCP implementation

Data management

Handling and analysis of data throughout the RNTCP surveillance

Page 8: Data verification slides bangalore to t (4)

8

Quality assurance & control

Quality assurance

- anticipates problems before they occur

- uses all available information to generate improvements

- is not tied to a specific quality standard

- is applicable mostly at the planning stage

- is all-encompassing in its activities

Quality control

- responds to observed problems

- uses ongoing measurements to make decisions on the processes or products

- requires a pre-specified quality standard for comparability

- is applicable mostly at the processing stage

- is a set procedure that is a subset of quality assurance

Page 9: Data verification slides bangalore to t (4)

9

Quality control

Quality control is a regulatory procedure through which we:

- measure quality

- compare quality with pre-set standards

- act on the differences

The objective of quality control is to achieve a given quality level with minimum cost (e.g. EQA sampling)

Page 10: Data verification slides bangalore to t (4)

10

Dimensions of data quality

1. Intrinsic data quality: accuracy (validity and reliability)

2. Contextual data quality: relevant, timely, complete

3. Representational data quality: interpretability, easy to understand

4. Accessibility data quality: accessibility, security

Page 11: Data verification slides bangalore to t (4)

11

Intrinsic data quality: ACCURACY

Exact conformity to the true value

WHY IMPORTANT?

Accurate data = precondition for accurate decisions!!

Two concepts: validity and reliability

QUESTION: is this guaranteed?

Page 12: Data verification slides bangalore to t (4)

12

Validity = the degree to which a measurement reflects the truth

There should be no systematic error or bias

What is a valid sputum result for an open TB case?

A result is valid if it corresponds to the true value!

Open TB case = sputum positive!!

Page 13: Data verification slides bangalore to t (4)

13

Reliability

The degree to which a measurement gives the same result each time it is used under the same condition with the same subject

A necessary but not sufficient condition for validity, because one can make the same errors twice

Reliability = repeatability of measurements

Reliability is inversely related to random error

Page 14: Data verification slides bangalore to t (4)

14

Dimensions of data quality

1. Intrinsic data quality: accuracy

2. Contextual data quality: relevant, timely, complete

Page 15: Data verification slides bangalore to t (4)

15

RELEVANCE (usefulness)

Reflects the degree to which information meets the real needs of clients.

Is concerned with whether the available information sheds light on the issues that are important to users.

Page 16: Data verification slides bangalore to t (4)

16

RELEVANCE

A good information source should include all relevant content and exclude all irrelevant content.

Relevant for what?

Decision making for RNTCP management

Assessing relevance is subjective and depends upon the varying needs of users!

Page 17: Data verification slides bangalore to t (4)

17

TIMELINESS

Refers to the moment data are compiled, reported and analysed

Given RNTCP’s normalization of the data reporting system, timeliness is not a major issue in India.

But it could be an issue in remote areas and in PPM

Page 18: Data verification slides bangalore to t (4)

18

COMPLETENESS

No missing data (records, items)

All data fields that have to be filled in should indeed contain data.

QUESTION: does this presently happen??

Page 19: Data verification slides bangalore to t (4)

19

Missing records

• Annual report 2001 NTP Bangladesh

Reports        DOTS areas     non DOTS areas
-------------------------------------------------------
Received       2230           180
Missing        59             4
% missing      3%             2%
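
The percentages in the table can be reproduced with a short calculation. The following is a minimal sketch, not part of the original slides: it assumes that the denominator is the number of expected reports (received + missing), which the slide does not state, and the function name is purely illustrative.

```python
def percent_missing(received: int, missing: int) -> float:
    """Share of expected reports that never arrived, in percent.

    Assumption (not stated on the slide): expected = received + missing.
    """
    expected = received + missing
    if expected == 0:
        raise ValueError("no reports expected")
    return 100.0 * missing / expected


# Figures from the table above (NTP Bangladesh, annual report 2001).
dots = percent_missing(received=2230, missing=59)
non_dots = percent_missing(received=180, missing=4)

print(f"DOTS areas:     {dots:.1f} % missing")
print(f"non DOTS areas: {non_dots:.1f} % missing")
```

Rounded to whole percentages, both results agree with the figures reported on the slide.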

Page 20: Data verification slides bangalore to t (4)

20

Dimensions of data quality

1. Intrinsic data quality: accuracy

2. Contextual data quality: relevant, timely, complete

3. Representational data quality: interpretability, easy to understand

Page 21: Data verification slides bangalore to t (4)

21

Representational data quality

Interpretability

Data must be in appropriate language and units, and the data definitions must be clear to all (language, jargon, concepts)

Ease of understanding

Data must be clear, without ambiguity, and easily comprehended.

Page 22: Data verification slides bangalore to t (4)

22

Dimensions of data quality

1. Intrinsic data quality: accuracy

2. Contextual data quality: relevant, timely, complete

3. Representational data quality: interpretability, easy to understand

4. Accessibility data quality: accessibility, security

Page 23: Data verification slides bangalore to t (4)

23

ACCESSIBILITY

Essential element of any data quality assessment.

If data is not accessible, then it has little or no value.

Accessibility = precondition for use, but no guarantee for use!

Data items should be easily obtainable and legal to collect.

In the computer era, guidelines have to be established for who may access which data.

Page 24: Data verification slides bangalore to t (4)

24

SECURITY

The protection of data from:

☞ unauthorized modification (accidental or intentional)

☞ equipment malfunction (computer crash)

☞ natural disasters (fire, tsunami..) and crime

Be aware! Security threats are more serious when HMIS is computerized:

- unauthorized access to data

- damage to files (viruses…)

Page 25: Data verification slides bangalore to t (4)

25

Data management covers the whole process, from data recording through transcription, compilation, analysis & interpretation, reporting, feedback and use.

TB CENTRE (OPD or lab)

RECORDING

TRANSCRIPTION

COMPILATION

ANALYSIS & INTERPRETATION

REPORTING

FEEDBACK & USE

Page 26: Data verification slides bangalore to t (4)

26

Where can errors occur?

At each step, especially during:

Data recording

Manual data transcription

Data compilation

Data entry in computer

Analysis

Interpretation

Page 27: Data verification slides bangalore to t (4)

27

Step in data flow                            Source of error
---------------------------------------------------------------------------------------------
Data recording                               Information not registered
                                             Wrong information (wrong address, etc.)
                                             Right information wrongly entered (in the wrong place)
                                             Missing records
Data compilation                             Wrong counts
                                             Missed reports
                                             Duplicate counting
Computerised data entry                      Wrong entry
                                             Partial entry
                                             Partial entry of records
Template-based computerised data analysis    nil

Page 28: Data verification slides bangalore to t (4)

28

Prevention of data errors

clarity of the instructions

training and motivation of the staff

honesty of the staff

user-friendliness of the data supports, such as data forms and templates

supervision

Page 29: Data verification slides bangalore to t (4)

29

Prevention of data errors

Computerized data handling:

improves the accuracy of the data

prevents processing and analysis errors

makes fudging less easy, once the data have been entered in the computer

use of independent double entry techniques (and checking of inconsistencies between the 2 entries)

data entry formatted to acceptable ranges and modalities only
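
The independent double entry technique mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any RNTCP software: the record layout (a dictionary keyed by a patient identifier), the field names and the sample values are all hypothetical.

```python
def double_entry_differences(first: dict, second: dict) -> list:
    """Compare two independent entries of the same forms.

    Returns one (record_id, field, value_1, value_2) tuple per disagreement.
    A record or field present in only one entry shows up with None on the
    other side.
    """
    differences = []
    for record_id in sorted(set(first) | set(second)):
        rec_1 = first.get(record_id, {})
        rec_2 = second.get(record_id, {})
        for field in sorted(set(rec_1) | set(rec_2)):
            if rec_1.get(field) != rec_2.get(field):
                differences.append(
                    (record_id, field, rec_1.get(field), rec_2.get(field))
                )
    return differences


# Hypothetical data: two clerks key in the same two treatment cards.
entry_a = {"TB001": {"age": 34, "sex": "M"}, "TB002": {"age": 51, "sex": "F"}}
entry_b = {"TB001": {"age": 43, "sex": "M"}, "TB002": {"age": 51, "sex": "F"}}

for difference in double_entry_differences(entry_a, entry_b):
    print(difference)
```

Each reported difference still has to be resolved against the paper form; the comparison only shows where the two entries disagree, not which one is right.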

Page 30: Data verification slides bangalore to t (4)

30

How to proceed with the data verification?

1. Be alert

2. Routine checking of data

3. Quarterly report checking

Page 31: Data verification slides bangalore to t (4)

31

BE ALERT!

Registers that look meticulously clean

All data entered with the same pen

Lack of variation / identical results every quarter

A performance that looks too good:

- absence of initial defaulters

- too low death rates

- too high cure rates

- absence of defaulting in IP

- …

Be alert to the likelihood of intentional falsification of data!!!

Do not accept data without checking their veracity!!!

Page 32: Data verification slides bangalore to t (4)

32

How to proceed with the data verification?

Routine checking of the data through supervision

Completeness checking

Consistency checking

Quarterly report checking

Range checking

Modality checking

Page 33: Data verification slides bangalore to t (4)

33

Completeness checking

Completeness of report = all data have been reported!

A minimal completeness check verifies if all variables contain data.

Example:

200 NSP cases and age information only for 187 cases

Information is incomplete!

How to solve?

Verify via the original reports.
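
A minimal completeness check of this kind could look as follows. This is only a sketch: the record structure and the field names are assumptions made for illustration, and the data simply recreate the example of 200 NSP cases with age recorded for 187 of them.

```python
def completeness(records: list, field: str) -> tuple:
    """Return (filled, total) for one variable.

    A value of None or an empty string is counted as missing.
    """
    filled = sum(1 for record in records if record.get(field) not in (None, ""))
    return filled, len(records)


# Recreate the example: 200 NSP cases, age recorded for only 187 of them.
# The ages themselves are placeholders.
cases = [{"type": "NSP", "age": 30} for _ in range(187)]
cases += [{"type": "NSP", "age": None} for _ in range(13)]

filled, total = completeness(cases, "age")
if filled < total:
    print(f"age: {total - filled} of {total} values missing, "
          "verify via the original reports")
```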

Page 34: Data verification slides bangalore to t (4)

34

Consistency checking

Checks whether the values of data items are concordant

Example: CAT III and Sputum+

How to check for inconsistencies?

By cross tabulation

CAT          Sputum result
             SP+       SP-
-------------------------------
CAT I        1162      114
CAT II       300       148
CAT III      16        1030

Contradiction: 16 CAT III cases with SP+
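
The cross tabulation can be reproduced with the Python standard library. The sketch below assumes that the CAT III row of the table reads 16 SP+ and 1030 SP-; the patient-level records are rebuilt from the counts purely for illustration, and the field names are hypothetical.

```python
from collections import Counter


def cross_tabulate(records, row_field, col_field):
    """Count the records for every (row value, column value) combination."""
    return Counter((r[row_field], r[col_field]) for r in records)


# Counts taken from the slide (CAT III row read as 16 SP+ and 1030 SP-).
counts = {
    ("CAT I", "SP+"): 1162, ("CAT I", "SP-"): 114,
    ("CAT II", "SP+"): 300, ("CAT II", "SP-"): 148,
    ("CAT III", "SP+"): 16, ("CAT III", "SP-"): 1030,
}
# Expand the counts into one record per patient.
records = [
    {"cat": cat, "sputum": sputum}
    for (cat, sputum), n in counts.items()
    for _ in range(n)
]

table = cross_tabulate(records, "cat", "sputum")
for cat in ("CAT I", "CAT II", "CAT III"):
    print(f"{cat:8} {table[(cat, 'SP+')]:6} {table[(cat, 'SP-')]:6}")

# Rule from the slide: CAT III together with a positive sputum is a contradiction.
contradictions = [
    r for r in records if r["cat"] == "CAT III" and r["sputum"] == "SP+"
]
print(len(contradictions), "records to verify")
```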

Page 35: Data verification slides bangalore to t (4)

35

Range checking

Any method of detecting whether a quantitative variable is within an acceptable range

Example 1: Height of an adult patient

Acceptable range = 1.00 m to 2.00 m

3.00 m is impossible

0.98 m is possible, but needs verification

Example 2: Age of an adult patient

Acceptable range = 15 to 100 years

150 years is impossible

Any “impossible” or “out of range” value should be verified via the original record or the patient.
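
A range check is easy to automate. The sketch below uses the acceptable ranges from the two examples; the function name and the sample values are illustrative assumptions. It only flags values for verification and does not try to decide whether a value is impossible or merely unusual.

```python
def out_of_range(values, low, high):
    """Return the values outside [low, high]; these need verification."""
    return [value for value in values if not (low <= value <= high)]


# Hypothetical register extracts.
heights_m = [1.62, 3.00, 0.98, 1.75]
ages_years = [23, 150, 67, 15]

print("heights to verify:", out_of_range(heights_m, 1.00, 2.00))
print("ages to verify:   ", out_of_range(ages_years, 15, 100))
```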

Page 36: Data verification slides bangalore to t (4)

36

Modality checking

The data of a qualitative variable are classified in groups or modalities.

Each data item should belong to one modality only

Example : Sex

Two modalities: Male or Female

Other values are impossible!

“Not known” is sometimes entered but is not a valid modality and should be verified and corrected!
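
A modality check compares every entry with the list of allowed modalities. The following sketch uses the sex example from the slide; the function name and the sample column are hypothetical.

```python
def invalid_modalities(values, allowed):
    """Return (position, value) for every entry that is not an allowed modality."""
    return [(i, value) for i, value in enumerate(values) if value not in allowed]


SEX_MODALITIES = {"Male", "Female"}

# Hypothetical column from a register, with two entries that need correction.
sex_column = ["Male", "Female", "Not known", "Female", "M"]

print(invalid_modalities(sex_column, SEX_MODALITIES))
```

The abbreviation "M" is flagged as well: the check is strict by design, so that every entry ends up in exactly one agreed modality.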

Page 37: Data verification slides bangalore to t (4)

37

Correction of errors

ERRORS??

Go back to the original data source.

But what if the original data source is erroneous?

The best method is to go back to a previous step in the data flow, and verify patient records, lab records, etc.

If correct data are found, then modify the erroneous data.

If correct data are not found, then report as “missing”.
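
The correction rule can be written down as a small function. This is a sketch under assumptions: the record layout, the field name and the use of None to mean "nothing found in the earlier step" are all illustrative choices, not part of the original slides.

```python
MISSING = "missing"


def apply_correction(record: dict, field: str, source_value):
    """Return a corrected copy of the record.

    source_value is what the earlier step in the data flow shows (patient
    record, lab record, ...), or None when no correct value can be found.
    """
    corrected = dict(record)  # leave the reported record untouched
    corrected[field] = MISSING if source_value is None else source_value
    return corrected


reported = {"id": "TB001", "age": 150}  # hypothetical erroneous entry

print(apply_correction(reported, "age", 50))    # correct value found
print(apply_correction(reported, "age", None))  # nothing found
```

Working on a copy keeps the originally reported value available, so the correction itself can be reviewed later.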

Page 38: Data verification slides bangalore to t (4)

38

Errors in data

Risk for wrong decisions

Information has to be of good quality

• correct data

• correct data processing

Valid

Reliable

Complete

Consistent

Timely

Page 39: Data verification slides bangalore to t (4)

39

Erroneous data

Bad information

Wrong decisions

Appropriate actions??

Page 40: Data verification slides bangalore to t (4)

40

Don’t forget: there is more room for error than shown in this picture