Types and Sources of Errors in Statistical Data
*
a. non-sampling errors and
b. sampling errors.
*
Non-sampling errors
These are errors that arise during the course of all data
collection activities.
In summary, they have the following characteristics:
exist in both sample survey and census data.
difficult to measure.
failure to identify the target population.
non-response.
Defects in the sampling frame
These result in coverage errors.
Coverage errors occur when units are omitted from, duplicated in or
wrongly included in the sampling frame.
Omissions are referred to as ‘under-coverage’, while duplications
and wrongful inclusions are called ‘over-coverage’.
These errors are caused by sampling frames that are inaccurate,
incomplete, inadequate or out of date, or that contain duplicate
entries.
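The distinction between under-coverage and over-coverage can be sketched by comparing a frame with the target population it is meant to list. The unit identifiers and variable names below are hypothetical, chosen only for illustration; in practice the full target population is rarely known, which is part of why these errors are hard to measure.

```python
# Hypothetical unit identifiers, invented for illustration only.
target_population = {"H01", "H02", "H03", "H04", "H05"}  # units the survey should cover
frame_listing = ["H01", "H02", "H02", "H04", "H09"]      # units as listed in the frame

frame_units = set(frame_listing)

# Under-coverage: target units that are missing from the frame (omissions).
omitted = sorted(target_population - frame_units)

# Over-coverage, part 1: units listed more than once (duplications).
duplicated = sorted({u for u in frame_listing if frame_listing.count(u) > 1})

# Over-coverage, part 2: units in the frame that are not in the target
# population (wrongful inclusions).
wrongly_included = sorted(frame_units - target_population)

print("under-coverage (omitted):", omitted)
print("over-coverage (duplicated):", duplicated)
print("over-coverage (wrongly included):", wrongly_included)
```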
*
Failure to Identify Target Population
*
Response
*
a. Poor questionnaire design
The content and wording of the questionnaire may be misleading and
the layout of the questionnaire may make it difficult to accurately
record responses.
As a rule, questions in a questionnaire should not be loaded,
double-barrelled, misleading or ambiguous, and should be directly
relevant to the objectives of the survey.
*
b. Interviewer bias
An interviewer may influence the way a respondent answers survey
questions.
*
These errors arise when the respondent provides inaccurate or wrong
information.
They occur because of memory lapses or because respondents give
inaccurate or false information when they believe that they are
protecting their personal interests or integrity.
They can also arise from the way the respondent interprets the
questionnaire and the wording of the answer that the respondent
gives.
Careful questionnaire design and effective questionnaire testing
can overcome these problems to some extent.
*
d. Problems with the survey process
*
Non-response
Non-response results when data is not collected from
respondents.
The proportion of these non-respondents in the sample is called the
non-response rate.
Non-response can be either total or partial.
Total non-response, or unit non-response, can arise if a respondent
cannot be contacted (because the sampling frame is incomplete or
out of date), is not at home, is unable to respond because of
language difficulties or illness, or refuses outright to answer any
questions, or if the dwelling unit is vacant.
*
Non-response - cont’d
When conducting surveys it is important to document information on
why a respondent has not responded.
Partial non-response or item non-response can occur when a
respondent replies to some but not all questions of the
survey.
This can arise due to memory problems, inadequate information or an
inability to answer a particular question/section of the
questionnaire.
A respondent may refuse to answer if:
a. they find the questions particularly sensitive, or
b. they have been asked too many questions.
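The difference between total (unit) and partial (item) non-response, and the non-response rate defined earlier, can be illustrated with a small sketch. The respondents, question names and answers below are hypothetical, and `None` is assumed to mark a missing answer.

```python
# Hypothetical sample of five units; None marks a missing answer.
responses = {
    "R1": {"age": 34, "income": 1200, "household_size": 4},          # complete
    "R2": {"age": 51, "income": None, "household_size": 2},          # partial (item)
    "R3": {"age": None, "income": None, "household_size": None},     # total (unit)
    "R4": {"age": 28, "income": 900, "household_size": None},        # partial (item)
    "R5": {"age": None, "income": None, "household_size": None},     # total (unit)
}

def is_total_nonresponse(answers):
    """True when no question at all was answered."""
    return all(value is None for value in answers.values())

def is_partial_nonresponse(answers):
    """True when some, but not all, questions were left unanswered."""
    missing = [value is None for value in answers.values()]
    return any(missing) and not all(missing)

n_sampled = len(responses)
n_total = sum(is_total_nonresponse(a) for a in responses.values())
n_partial = sum(is_partial_nonresponse(a) for a in responses.values())

# Proportion of (total) non-respondents in the sample.
unit_nonresponse_rate = n_total / n_sampled

print("total non-respondents:", n_total)
print("partial non-respondents:", n_partial)
print("unit non-response rate:", unit_nonresponse_rate)
```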
*
To reduce non-response, the following approaches can be used:
designing the questionnaire carefully, using simple questions.
pilot testing of the questionnaire.
explaining survey purposes and uses.
assuring confidentiality of responses.
*
Processing
Processing errors occur at various stages of data processing, such
as data cleaning, data capture and editing.
Data cleaning involves making preliminary checks before the data is
entered into the processing system.
*
Processing – cont’d
Inadequate checking and quality management at this stage can
introduce data loss (where data is not entered into the system) and
data duplication (where the same data is entered into the system
more than once), thus introducing errors into the data.
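A basic check for data loss and data duplication might look like the sketch below. It assumes that every questionnaire carries a unique serial number; the serial numbers and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical questionnaire serial numbers, invented for illustration.
collected_ids = [101, 102, 103, 104, 105]  # questionnaires returned from the field
entered_ids = [101, 102, 102, 104, 105]    # records found in the processing system

entry_counts = Counter(entered_ids)

# Data loss: questionnaires that were collected but never entered.
lost = sorted(set(collected_ids) - set(entered_ids))

# Data duplication: questionnaires that were entered more than once.
duplicates = sorted(i for i, count in entry_counts.items() if count > 1)

print("not entered:", lost)
print("entered more than once:", duplicates)
```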
*
Time Period Bias
*
Analysis and Estimation
Analysis errors include any errors that occur when the wrong
analytical tools are used or when preliminary results are used
instead of the final ones.
Errors that occur during the publication of the results are also
considered analysis errors.
Estimation errors occur when inappropriate or inaccurate weights
are used in the estimation procedure, thus introducing errors into
the data.
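The effect of inappropriate weights can be sketched with a hypothetical two-stratum sample in which the strata are sampled at different rates. The sketch assumes the usual design weight (stratum population size divided by stratum sample size); all figures and names are invented for illustration.

```python
# Hypothetical two-stratum sample. "N" is the stratum population size.
strata = {
    "urban": {"N": 8000, "values": [10, 12, 11, 13]},
    "rural": {"N": 2000, "values": [4, 6, 5, 5, 4, 6, 5, 5]},
}

def weighted_mean(weights):
    """Weighted mean of all sampled values, one weight per stratum."""
    total = sum(weights[s] * v for s, d in strata.items() for v in d["values"])
    weight_sum = sum(weights[s] * len(d["values"]) for s, d in strata.items())
    return total / weight_sum

# Design weights (assumed here): each sampled unit represents N / n units.
design_weights = {s: d["N"] / len(d["values"]) for s, d in strata.items()}

# Inappropriate weights: treating every sampled unit as equally representative,
# even though rural units were sampled at a much higher rate.
equal_weights = {s: 1.0 for s in strata}

appropriate_estimate = weighted_mean(design_weights)
inappropriate_estimate = weighted_mean(equal_weights)

print("estimate with design weights:", appropriate_estimate)
print("estimate with equal weights:", inappropriate_estimate)
```

With these made-up figures the equal-weight estimate is pulled towards the over-sampled rural stratum, which is the kind of estimation error described above.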
*
Reducing non-sampling errors
Non-sampling errors can be minimised by adopting any of the
following approaches:
using an up-to-date and accurate sampling frame.
careful selection of the time the survey is conducted.
planning for follow-up of non-respondents.
careful questionnaire design.
*
Sampling error
Sampling error refers to the difference between the estimate
derived from a sample survey and the 'true' value that would result
if a census of the whole population were taken under the same
conditions.
Such errors arise because data has been collected from a part,
rather than the whole, of the population.
*
Sampling errors – cont’d
There are no sampling errors in a census because the calculations
are based on the entire population.
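The definition can be made concrete with a small simulation. The population below is hypothetical (1,000 randomly generated values) and the sample is assumed to be a simple random sample; the seed is fixed only so that the sketch is repeatable.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 values.
population = [random.gauss(50, 10) for _ in range(1000)]
true_value = mean(population)  # the value a census would give

# Estimate from a part of the population (simple random sample of 50 units).
sample = random.sample(population, 50)
estimate = mean(sample)
sampling_error = estimate - true_value

# Estimate from the whole population: no sampling error remains.
census_error = mean(population) - true_value

print("sampling error of the sample estimate:", sampling_error)
print("sampling error of the census:", census_error)
```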
*
Factors Affecting Sampling Error
a. sample size.
In general, larger sample sizes decrease the sampling error, but
the decrease is not directly proportional.
As a rough rule of thumb, the sample size must be increased
fourfold to halve the sampling error, but bear in mind that
non-sampling errors are likely to increase with larger samples.
b. the sampling fraction.
*
Factors Affecting Sampling Error – cont’d
c. the variability within the population.
More variable populations give rise to larger errors because the
estimates calculated from different samples are likely to vary
more.
The effect of variability within the population can be reduced by
stratification, which accounts for some of the variability in the
population.
d. sample design.
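The effect of stratification described under (c) can be illustrated with a rough simulation. It assumes a hypothetical population made up of two equal-sized strata with very different means, and compares repeated simple random samples with repeated stratified samples of the same total size; all names and figures are invented for illustration.

```python
import random
from statistics import mean, pvariance

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: two equal-sized strata with very different means.
stratum_a = [random.gauss(20, 2) for _ in range(500)]
stratum_b = [random.gauss(80, 2) for _ in range(500)]
population = stratum_a + stratum_b

def srs_estimate(n):
    """Mean of a simple random sample of n units from the whole population."""
    return mean(random.sample(population, n))

def stratified_estimate(n):
    """Mean of a stratified sample: n/2 units drawn from each stratum.
    Because the strata are of equal size, the plain mean of the combined
    sample is the proportionally allocated stratified estimate."""
    half = n // 2
    return mean(random.sample(stratum_a, half) + random.sample(stratum_b, half))

# Repeat each design many times and compare how much the estimates vary.
srs_estimates = [srs_estimate(20) for _ in range(500)]
stratified_estimates = [stratified_estimate(20) for _ in range(500)]

srs_variance = pvariance(srs_estimates)
stratified_variance = pvariance(stratified_estimates)

print("variance of simple random sample estimates:", srs_variance)
print("variance of stratified sample estimates:", stratified_variance)
```

In this made-up population most of the variability lies between the strata, so the stratified estimates vary far less from sample to sample than the simple random sample estimates.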
*
Characteristics of the sampling error
generally decreases in magnitude as the sample size increases (but
not proportionally).
depends on the variability of the characteristic of interest in the
population.
can be accounted for and reduced by an appropriate sample
plan.
can be measured and controlled in probability sample surveys.
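The non-proportional decrease of sampling error with sample size can be sketched using the standard error of a sample mean. The sketch assumes simple random sampling and ignores the finite population correction; the standard deviation of 12 is a hypothetical figure.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean under simple random sampling,
    ignoring the finite population correction (an assumption of this sketch)."""
    return sd / math.sqrt(n)

sd = 12.0  # hypothetical population standard deviation

# Each step quadruples the sample size; the standard error only halves.
errors = {n: standard_error(sd, n) for n in (100, 400, 1600)}

for n, se in errors.items():
    print(f"n = {n:5d}  standard error = {se}")
```

Quadrupling the sample size from 100 to 400 halves the standard error from 1.2 to 0.6, in line with the fourfold rule of thumb.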
*
Reducing sampling error
*