ann-meredith-garcia
Reliability vs. validity
A degree of test reliability is requisite to validity.
VALID ≠ RELIABLE
TEST RELIABILITY
Definition
Consistency with which a test measures what it is measuring
Consistent, constant, and repeatable results?
Over time? Across different versions of a test? Among scale items?
Goal: As close as possible to measuring the TRUE SCORE
OBTAINED SCORE = TRUE SCORE + ERROR
Sources of measurement error: 1. OBJECTIVITY OF SCORING
Different scorers produce the same score if they apply the same scoring key
More objective scoring yields a more accurate score
[Figure: Score1? Score2? Score3?]
Sources of measurement error: 2. SAMPLING OF CONTENT
A teacher cannot really construct 2 forms of a test that are independent of each other.
Another teacher’s test usually would differ even more.
[Figure: SAMPLE OF QUESTIONS drawn from the SET OF ALL POSSIBLE QUESTIONS]
If the test plan is fairly detailed and followed carefully, content sampling for an objective test with a large number of items should be reasonably adequate
Sources of measurement error: 3. TEMPORAL INFLUENCES
TEMPORAL STABILITY – scores should fluctuate very little over a reasonably brief time interval
[Figure: TIME axis with TEST A given twice, Score? each time]
Methods of estimating reliability: 1. TEST-RETEST METHOD
Estimates TEMPORAL RELIABILITY – correlation between scores on the 2 trials
COEFFICIENT OF STABILITY – measure of the correspondence of scores obtained at 2 different times
Assesses the external consistency of a test
Gives NO information about the possible effects of inadequate sampling of content and processes
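As a rough illustration of the test-retest method, the coefficient of stability can be computed as a Pearson correlation between the scores from the 2 trials. This is a minimal sketch: the helper `pearson_r` and the score lists are hypothetical and not taken from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores of six students on TEST A at two different times
first_trial = [12, 15, 9, 20, 17, 11]
second_trial = [13, 14, 10, 19, 18, 10]

# Coefficient of stability: correlation between the 2 trials
stability = pearson_r(first_trial, second_trial)
print(round(stability, 2))
```

A value close to 1 would suggest that students keep roughly the same relative standing on both occasions.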
Methods of estimating reliability: 2. ALTERNATE-FORMS METHOD
COEFFICIENT OF STABILITY AND EQUIVALENCE – the correlation of scores on the 2 forms reflects not only temporal influences (with delayed testing) but also content differences (with immediate or delayed testing)
[Figure: TIME axis with TEST AX (Score?) and TEST AY (Score?)]
Methods of estimating reliability: 3. INTER-RATER RELIABILITY
Different, equally competent raters evaluate the results of a single test; the 2 sets of scores are then correlated
Assesses the consistency of how a measuring system is implemented
[Figure: ESSAY TEST scored twice: Score1? Score2? AVERAGE]
Methods of estimating reliability: 4. SPLIT-HALF METHOD
Also called ODD-EVEN RELIABILITY
r = estimate of content reliability for half of the test
R = estimate of content reliability for the whole test
[Figure: TEST A odd items (Score?) and TEST A even items (Score?), correlated as r]
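The odd-even split can be sketched as follows. The half-test correlation r is stepped up to a whole-test estimate R with the Spearman-Brown formula for a doubled test, R = 2r / (1 + r). The function names and the response matrix are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(item_scores):
    """item_scores: one list per examinee, one item score per question."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # questions 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # questions 2, 4, 6, ...
    r = pearson_r(odd_totals, even_totals)  # reliability of half of the test
    R = 2 * r / (1 + r)                     # Spearman-Brown estimate for the whole test
    return r, R

# Hypothetical responses: 5 examinees x 6 questions (1 = correct, 0 = incorrect)
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

r, R = split_half_reliability(responses)
print(round(r, 2), round(R, 2))
```

Whenever r is positive and below 1, R is larger than r, because the whole test is twice as long as each half.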
Methods of estimating reliability: 5. KUDER-RICHARDSON APPROACH
Extension of the split-half method performed on all combinations of questions
Average of the split-half estimates that would be expected from making all possible divisions of a test into halves
Measure of internal consistency reliability for measures with dichotomous choices
k = number of questions
pj = proportion of people in the sample who answered question j correctly
qj = proportion of people in the sample who did not answer question j correctly (qj = 1 − pj)
σ² = variance of the total scores of all the people taking the test
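These symbols match the Kuder-Richardson formula 20, KR-20 = (k / (k − 1)) × (1 − Σ pj·qj / σ²), which is assumed here to be the formula the slide showed. A minimal sketch with a hypothetical function name `kr20` and hypothetical response data:

```python
def kr20(item_scores):
    """KR-20 for dichotomous items.

    item_scores: one list per examinee, each entry 1 (correct) or 0 (incorrect).
    """
    n = len(item_scores)      # number of people in the sample
    k = len(item_scores[0])   # number of questions
    # p_j = proportion answering question j correctly; q_j = 1 - p_j
    p = [sum(row[j] for row in item_scores) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    # Population variance of the total scores
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    variance = sum((t - mean_total) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - sum_pq / variance)

# Hypothetical responses: 5 examinees x 6 questions
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

print(round(kr20(responses), 2))
```

The sketch uses the population variance (dividing by n) to match "all the people taking the test"; using the sample variance instead would change the result slightly.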
Which method should be used?
Test-retest method
• Stability of test scores over time
Alternate-forms method
• Consistency of scores over different test forms
Split-half & Kuder-Richardson methods
• Go-togetherness of test items
Factors affecting reliability: 1. LENGTH OF TEST
A larger sampling of responses with equally good items, or a greater length of test, gives higher reliability
Reliability does NOT increase in a straight line (SPEARMAN-BROWN FORMULA)
Reliability of .50 increases to .67 when the length of a test is doubled
Assumption: Subjects do not become exhausted and lose motivation
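The .50 to .67 figure follows from the general Spearman-Brown formula, R = n·r / (1 + (n − 1)·r), where n is the factor by which the test is lengthened. A short sketch, with a hypothetical function name, that also shows the diminishing returns:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`
    using equally good items (Spearman-Brown formula)."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Starting from a reliability of .50, lengthen the test 1x, 2x, 3x, 4x
for n in (1, 2, 3, 4):
    print(n, round(spearman_brown(0.50, n), 2))
# prints:
# 1 0.5
# 2 0.67
# 3 0.75
# 4 0.8
```

Each additional doubling adds less than the previous one, which is why reliability does not rise in a straight line with test length.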
Factors affecting reliability: 2. RANGE OF TALENT
Validity and reliability coefficients can be expected to increase as the range of talent of the subjects increases
Homogeneous group: lower reliability coefficient
Wider spread of scores: higher reliability
The sample of subjects should be representative of those about whom one wishes to draw conclusions regarding individual differences
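The effect of range of talent can be illustrated with a small simulation. Everything in it is an assumption made for illustration: simulated true scores, two parallel forms with independent errors, and a homogeneous subgroup defined by true scores close to the mean.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Obtained score = true score + error, on two parallel forms
true_scores = [rng.gauss(0, 1) for _ in range(2000)]
form_a = [t + rng.gauss(0, 0.5) for t in true_scores]
form_b = [t + rng.gauss(0, 0.5) for t in true_scores]

# Wide range of talent: all simulated examinees
r_wide = pearson_r(form_a, form_b)

# Homogeneous group: only examinees whose true score is near the mean
keep = [i for i, t in enumerate(true_scores) if abs(t) < 0.5]
r_narrow = pearson_r([form_a[i] for i in keep], [form_b[i] for i in keep])

print(round(r_wide, 2), round(r_narrow, 2))
```

The measurement error is identical in both groups; only the spread of true scores differs, and that alone is enough to lower the coefficient in the homogeneous group.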
Factors affecting reliability: 3. TIME LIMITS
Applies to the SPLIT-HALF and KUDER-RICHARDSON approaches
If some students do not have time to try some items, the proportion of correct responses for those items will decrease and the score spread will increase
This has a positive, although spurious, influence on the size of the reliability coefficient
Factors affecting reliability: 4. DIFFICULTY OF TEST ITEMS
Narrow score distributions give low reliability
[Figure: SCORE distributions for a VERY DIFFICULT TEST and a VERY EASY TEST]
Other factors affecting reliability
QUALITY OF TEST ITEMS
CLARITY OF INSTRUCTIONS
FREEDOM FROM DISTRACTIONS
OBJECTIVITY IN SCORING
Best reliability
Tests are long enough, with a fair time limit
Items are near 50% difficulty and free of ambiguity
Directions are the same for everyone, clear, and concise
Examinees have heterogeneous abilities
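One common way to see why items near 50% difficulty help, offered here as background rather than something stated on the slides: the variance of a right/wrong item is p × q, which is largest when p = .50, so such items spread the scores the most. A short sketch with hypothetical difficulty values:

```python
# Proportion of examinees answering an item correctly (hypothetical values)
difficulties = [0.1, 0.3, 0.5, 0.7, 0.9]

# Variance of a dichotomous item is p * q, with q = 1 - p
item_variance = {p: round(p * (1 - p), 2) for p in difficulties}

for p, var in item_variance.items():
    print(p, var)
# prints:
# 0.1 0.09
# 0.3 0.21
# 0.5 0.25
# 0.7 0.21
# 0.9 0.09
```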
PRACTICALITY
Definition
Usefulness or applicability of the testing procedure in serving the needs of its users
Economy of: Time, Effort, Money
1. Ease of CONSTRUCTION
Demands adequate time and informed talent
2. Ease of ADMINISTRATION
Clarity and simplicity
Ease of reading instructions
3. Ease of SCORING
Subjective vs. objective?
4. Ease of INTERPRETATION and APPLICATION
Meaningfulness of scores obtained from the test
Misinterpreted or misapplied test results are of little value and may be harmful to certain individuals or groups
GENERALIZABILITY
Definition
RELIABILITY and VALIDITY are often discussed separately, but sometimes both are referred to as aspects of generalizability
Extent to which one can generalize the results of a measure or a test used with a particular group to other tests or other groups