Measurement in Program Evaluation:
• test – measurement theory:
observed score on measure = true score + error
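The relationship between observed score, true score, and error can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the function name `simulate_observed_scores`, the client scores, and the choice of normally distributed error are all assumptions made for illustration.

```python
import random

def simulate_observed_scores(true_scores, error_sd, seed=0):
    """Return observed = true + random error for each true score.

    Assumes the error is drawn from a normal distribution with mean 0;
    the slides do not specify an error distribution.
    """
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, error_sd) for t in true_scores]

# Hypothetical true scores for five program clients (illustrative values only).
true_scores = [10.0, 12.0, 15.0, 18.0, 20.0]

# Observed scores with some measurement error.
observed = simulate_observed_scores(true_scores, error_sd=1.0)

# The error component is whatever separates observed from true.
errors = [o - t for o, t in zip(observed, true_scores)]

# With no measurement error, the observed score equals the true score.
no_error = simulate_observed_scores(true_scores, error_sd=0.0)

for t, o, e in zip(true_scores, observed, errors):
    print(f"true={t:5.1f}  observed={o:6.2f}  error={e:+.2f}")
```

In practice the true score is never seen directly; only the observed score is available, which is why validity and reliability have to be assessed.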
Deductive/Inductive Model
Theory
Concept, Proposition, Concept
Variable, Hypothesis, Variable
Operationalization
Indicator(s), Indicator(s)
Empirical
Conceptual Framework
[Diagram: Intended Inputs, Program Components, Intended Outputs, Intended Outcomes, with Measures attached]
An Example
• Let’s begin with an example
– The photo radar program in BC is intended to reduce the number of speed-related motor vehicle collisions on BC roadways
• We can model it
Photo Radar Program → Fewer speed-related motor vehicle collisions
An Example
• If we want to measure the performance of the program, we need to translate the intended outcome into observables
• Our conceptual framework for measurement outlines the process
Measurement Process: Construct
Our Example: Speed-related motor vehicle collisions
Criteria/Issues for Measurement: Is the construct clearly stated?

Measurement Process: Measurement procedures (the actual steps we use to gather the data)
Our Example: Attending police officer’s assessment of whether speed was a contributing factor; recorded in an accident report; entered into a database
Criteria/Issues for Measurement: Are the measurement procedures valid and reliable?
Measuring Mental Constructs
STIMULI
-We ask survey questions
-We try to control how the questions are asked
-Intended survey questions or survey items are stimuli
While we are asking the questions, uncontrolled things happen:
-Interviewer characteristics
-Setting characteristics
-Interviewee characteristics
-Instrument characteristics
The Person’s: KNOWLEDGE, ATTITUDES, EXPERIENCE
RESPONSES
-Valid and reliable responses to survey items (useful data)
-Responses to uncontrolled stimuli (noise)
-These produce invalid or unreliable data
The challenge is to separate useful data from noise
Validity and Reliability of Measures
• Validity: does the variable actually measure the corresponding construct?
• In our example of the photo radar program, do we believe that police officers can actually tell whether speed was a contributing factor in a motor vehicle accident?
• Reliability: if we repeat the measurement process for a construct in a given situation, do we get the same result?
• In a given accident situation, would independent observers reach the same conclusions about speed being a contributing factor?
Types of Validity
• There are different ways of assessing validity – several are relevant here
• Face validity: do we judge the measurement process/variable to validly represent the construct?
• Content validity: would experts in the field say that the measure captures the meaning of the construct?
• Concurrent validity: does the measure correlate with another measure that is valid?
– Measuring crime levels (police reports and victim surveys)
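Concurrent validity is usually checked by correlating the two measures. The sketch below computes a Pearson correlation between police-reported and survey-reported crime counts. The helper `pearson_r` and the neighbourhood counts are hypothetical, made up for illustration; they are not data from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a measure does not vary")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical crime counts for six neighbourhoods (illustrative only).
police_reports = [12, 30, 18, 45, 25, 8]
victim_survey = [20, 41, 30, 60, 33, 15]

r = pearson_r(police_reports, victim_survey)
print(f"correlation between the two measures: {r:.2f}")
```

A high correlation only supports concurrent validity if the comparison measure is itself accepted as valid.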
Types of Reliability
• We can also assess reliability in different ways
• Having two or more independent observers take measurements in a given situation
– Two police officers completing accident report forms
• Having the same observer repeat the measurement process in a given situation
– Police officer repeats the assessment of possible contributing causes of the accident
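Agreement between two independent observers can be quantified. The sketch below computes simple percent agreement and Cohen's kappa, a standard statistic that corrects agreement for chance. Kappa is not named in the slides, and the two officers' judgments are invented for illustration.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of cases on which the two raters gave the same judgment."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty lists of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed.
    expected = sum(
        counts_a[c] * counts_b[c] for c in set(rater_a) | set(rater_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical judgments: was speed a contributing factor in each of 8 accidents?
officer_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
officer_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]

print(percent_agreement(officer_a, officer_b))  # 0.75
print(cohens_kappa(officer_a, officer_b))       # 0.5
```

Here the officers agree on 6 of 8 accidents, but half of that agreement would be expected by chance, so kappa is lower than the raw agreement rate.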
Tests for Checking Reliability
• Test-retest method - take the same measurement more than once.
• Split-half method - make more than one measurement of a social concept (e.g., prejudice).
• Use established measures.
• Check reliability of research workers.
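One common way to carry out a split-half check is to split a multi-item scale into two halves, correlate the half scores, and apply the Spearman-Brown correction. The slides do not prescribe this procedure; the odd/even split, the function names, and the response data below are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def split_half_scores(responses):
    """Sum odd-position and even-position items for each respondent."""
    first = [sum(row[0::2]) for row in responses]
    second = [sum(row[1::2]) for row in responses]
    return first, second

def spearman_brown(half_r):
    """Estimate full-scale reliability from the correlation of two halves."""
    return 2 * half_r / (1 + half_r)

# Hypothetical answers of four respondents to a four-item scale (1-5 each).
responses = [
    [1, 2, 1, 2],
    [3, 3, 3, 3],
    [4, 5, 4, 5],
    [2, 2, 2, 2],
]

half_a, half_b = split_half_scores(responses)
half_r = pearson_r(half_a, half_b)
print(f"half-scale correlation: {half_r:.3f}")
print(f"Spearman-Brown estimate: {spearman_brown(half_r):.3f}")
```

The correction is needed because each half is shorter than the full scale, and shorter scales tend to be less reliable.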
Characteristics of Variables
• Variables can categorize (nominal variables)
  – Categories must be mutually exclusive and jointly exhaustive
    • In a job training program, clients could be categorized as being on social assistance or not
• Variables can rank (ordinal variables)
  – Categories are ranked from less to more
    • In a job training program, clients could be asked to rate the program: not beneficial, somewhat beneficial, very beneficial
• Variables can count (interval and ratio variables)
  – There is a unit of measurement
    • Number of weeks of job training
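The requirement that categories be mutually exclusive and jointly exhaustive can be checked mechanically. In the sketch below, each category is a rule, and every value must match exactly one rule. The function `check_categories` and the week bands are hypothetical, chosen only to show one scheme that passes and one that fails.

```python
def check_categories(values, categories):
    """Find values that break a categorization scheme.

    categories maps a category name to a function returning True when a
    value belongs to it. Returns (uncovered, overlapping): values matching
    no category (scheme not exhaustive) and values matching more than one
    (scheme not mutually exclusive).
    """
    uncovered, overlapping = [], []
    for value in values:
        matches = [name for name, rule in categories.items() if rule(value)]
        if not matches:
            uncovered.append(value)
        elif len(matches) > 1:
            overlapping.append((value, matches))
    return uncovered, overlapping

# Hypothetical weeks of job training for five clients.
weeks = [0, 3, 5, 8, 12]

# A sound scheme: every value falls in exactly one band.
good_scheme = {
    "0-4 weeks": lambda w: 0 <= w <= 4,
    "5-8 weeks": lambda w: 5 <= w <= 8,
    "9+ weeks": lambda w: w >= 9,
}

# A flawed scheme: 5 fits two bands, and 12 fits none.
bad_scheme = {
    "0-5 weeks": lambda w: 0 <= w <= 5,
    "5-8 weeks": lambda w: 5 <= w <= 8,
}

print(check_categories(weeks, good_scheme))  # ([], [])
print(check_categories(weeks, bad_scheme))
```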
Likert Item and Response Categories
Improved pre-harvest planning, quicker reforestation, and better planting maintenance would reduce the need for chemical or mechanical treatments.
1 = Strongly Agree, 2 = Agree, 3 = Neither, 4 = Disagree, 5 = Strongly Disagree
Please circle the appropriate response
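Responses to a Likert item are usually converted to numeric codes before analysis. This is a minimal sketch, assuming the codes run from 1 for Strongly Agree to 5 for Strongly Disagree, as the column order on the slide suggests. The respondent answers are invented for illustration.

```python
import statistics
from collections import Counter

# Assumed coding, read off the order of the response categories on the slide.
LIKERT_CODES = {
    "Strongly Agree": 1,
    "Agree": 2,
    "Neither": 3,
    "Disagree": 4,
    "Strongly Disagree": 5,
}

def code_responses(responses):
    """Translate response labels into numeric codes; unknown labels raise KeyError."""
    return [LIKERT_CODES[label] for label in responses]

# Hypothetical answers from five respondents.
responses = ["Agree", "Strongly Agree", "Neither", "Agree", "Disagree"]

codes = code_responses(responses)
counts = Counter(codes)                  # how many respondents chose each code
median_code = statistics.median(codes)   # middle response on the 1-5 scale

print(codes)        # [2, 1, 3, 2, 4]
print(median_code)  # 2
```

The median is used here because a single Likert item ranks responses; treating the codes as equally spaced numbers is a further assumption.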
Example Questions
Question 8: Do you think that your police services would improve if your police department and all other police departments (emphasis in the original) in the West Shore area combined into one department?
_____ Yes _____ No _____ Undecided
Question 9: Have you discussed this question of police consolidation with friends or neighbors?
_____ Yes _____ No _____ Undecided
Question 10: Are you for or against combining your police department with police departments in surrounding municipalities?
_____ Yes _____ No _____ Undecided
Examples of Validity and Reliability Issues Applicable to Surveys
Source of the Problem: interviewer
Validity (Bias): race, gender, appearance, interjections, interviewer reactions to responses
Reliability (Random Error): inconsistency in the way questions are worded/spoken

Source of the Problem: respondent
Validity (Bias): old age, handicaps, suspicion
Reliability (Random Error): wandering attention

Source of the Problem: instrument
Validity (Bias): biased questions, response set, question order
Reliability (Random Error): single measures to measure client perceptions of the program

Source of the Problem: interviewing situation/interviewing method
Validity (Bias): privacy, confidentiality, anonymity
Reliability (Random Error): noise, interruptions

Source of the Problem: data processing
Validity (Bias): biased coding, biased categories (particularly for qualitative data)
Reliability (Random Error): coding errors, intercoder reliability problems
Four Levels of Measurement
1. Nominal - offer names or labels for characteristics (gender, birthplace).
2. Ordinal - variables with attributes we can logically rank and order.
3. Interval - the distances separating attributes are meaningful, but there is no true zero point (temperature scale).
4. Ratio - attributes composing a variable are based on a true zero point (age).
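The level of measurement limits which summary statistics make sense. The sketch below follows a common textbook convention: mode for nominal, median for ordinal, mean for interval and ratio. The slides do not state this rule. The function `central_tendency` and the sample data are hypothetical.

```python
import statistics

def central_tendency(values, level, order=None):
    """Return a summary suited to the level of measurement.

    nominal        -> mode (most frequent label)
    ordinal        -> median category; needs `order`, a list from low to high
    interval/ratio -> arithmetic mean
    """
    if level == "nominal":
        return statistics.mode(values)
    if level == "ordinal":
        if order is None:
            raise ValueError("ordinal data needs the category order")
        # Convert labels to rank positions, take the middle rank, map back.
        ranks = [order.index(v) for v in values]
        return order[statistics.median_low(ranks)]
    if level in ("interval", "ratio"):
        return statistics.mean(values)
    raise ValueError(f"unknown level of measurement: {level}")

# Hypothetical data at three levels of measurement.
birthplaces = ["BC", "Alberta", "BC", "Ontario"]
ratings = ["not beneficial", "somewhat beneficial", "very beneficial",
           "somewhat beneficial", "very beneficial"]
rating_order = ["not beneficial", "somewhat beneficial", "very beneficial"]
ages = [20, 30, 40]

print(central_tendency(birthplaces, "nominal"))            # BC
print(central_tendency(ratings, "ordinal", rating_order))  # somewhat beneficial
print(central_tendency(ages, "ratio"))                     # 30
```

Only ratio variables, which have a true zero point, support statements such as "twice as old".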