Characteristics of a Good Test

Ann Meredith U. Garcia, MD

Reliability vs. validity

A degree of test reliability is requisite to validity.

VALID ≠ RELIABLE

TEST RELIABILITY

Definition

Consistency with which a test measures what it is measuring

Consistent, constant, and repeatable results?

Over time? Across different versions of a test? Among scale items?

Goal: come as close as possible to measuring the TRUE SCORE

OBTAINED SCORE = TRUE SCORE + ERROR

Sources of error

Examinee – is a HUMAN BEING

Examiner – is a HUMAN BEING

Examination – is designed by & for HUMAN BEINGS

Sources of measurement error: 1. OBJECTIVITY OF SCORING

Different scorers produce the same score if they apply the same scoring key

The more objective the scoring, the more accurate the score

[Figure: Score 1? Score 2? Score 3?]

Sources of measurement error: 2. SAMPLING OF CONTENT

A teacher cannot really construct 2 forms of a test that are independent of each other.

Another teacher’s test usually would differ even more.

[Figure: a SAMPLE OF QUESTIONS drawn from the SET OF ALL POSSIBLE QUESTIONS]

If the test plan is fairly detailed and followed carefully, content sampling for an objective test with a large number of items should be reasonably adequate.

Sources of measurement error: 3. TEMPORAL INFLUENCES

TEMPORAL STABILITY – scores should fluctuate very little over a reasonably brief time interval

[Figure: TEST A given twice along a TIME axis: Score? at the first trial, Score? at the second trial]

Methods of estimating reliability: 1. TEST-RETEST METHOD

Estimates TEMPORAL RELIABILITY – correlation between scores on the 2 trials

COEFFICIENT OF STABILITY – measure of the correspondence of scores obtained at 2 different times

Assesses the external consistency of a test

NO information about possible effects of inadequate sampling of contents and processes
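The coefficient of stability described above is simply a correlation between two sets of scores from the same examinees. The sketch below is a minimal illustration, assuming the Pearson correlation is the intended statistic; the scores and the helper name `pearson_r` are hypothetical and not taken from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical scores of the same six examinees on two trials of Test A
trial_1 = [12, 15, 9, 18, 14, 11]
trial_2 = [13, 14, 10, 17, 15, 10]

# Test-retest estimate: correlate the scores from the two trials
coefficient_of_stability = pearson_r(trial_1, trial_2)
print(f"Coefficient of stability: {coefficient_of_stability:.2f}")
```

A value close to 1 would suggest that examinees kept roughly the same rank order across the two trials. It says nothing about how well the items sample the content.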

Methods of estimating reliability: 2. ALTERNATE-FORMS METHOD

COEFFICIENT OF STABILITY AND EQUIVALENCE – the correlation of scores on the 2 forms reveals not only temporal influences (delayed testing) but also content differences (immediate & delayed testing)

[Figure: TEST A form X (Score?) and TEST A form Y (Score?) along a TIME axis]

Methods of estimating reliability: 3. INTER-RATER RELIABILITY

Different and equally competent raters evaluate the results of a single test; the 2 sets of scores are then correlated

Assesses the consistency of how a measuring system is implemented

[Figure: an ESSAY TEST scored twice (Score 1? Score 2?), with the AVERAGE taken]

Methods of estimating reliability: 4. SPLIT-HALF METHOD

Also called ODD-EVEN RELIABILITY

r = estimate of content reliability for half of the test

R = estimate of content reliability for the whole test
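The relationship between r and R can be sketched as follows. This is an illustration under stated assumptions: the half-test scores are correlated with a Pearson correlation, and r is stepped up to R with the standard Spearman-Brown correction R = 2r / (1 + r), which the slides name later but do not write out here. The response data and the helper `pearson_r` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data: one row per examinee, one column per item
# (1 = correct, 0 = incorrect)
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
]

# Odd-even split: score items 1, 3, 5, 7 and items 2, 4, 6, 8 separately
odd_scores = [sum(row[0::2]) for row in responses]
even_scores = [sum(row[1::2]) for row in responses]

r = pearson_r(odd_scores, even_scores)  # reliability of half of the test
R = 2 * r / (1 + r)                     # stepped up to the whole test

print(f"r (half test)  = {r:.2f}")
print(f"R (whole test) = {R:.2f}")
```

Because each half is only half as long as the full test, r on its own tends to understate the reliability of the whole test, which is why the step-up to R is applied.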

[Figure: TEST A odd items (Score?) and TEST A even items (Score?), correlated to give r]

Methods of estimating reliability: 5. KUDER-RICHARDSON APPROACH

Extension of the split-half method to all combinations of questions: the average of the split-half estimates that would be expected from making all possible divisions of a test into halves

Measure of internal consistency reliability for measures with dichotomous choices

k = number of questions

pj = proportion of people in the sample who answered question j correctly

qj = proportion of people in the sample who did not answer question j correctly (qj = 1 − pj)

σ² = variance of the total scores of all the people taking the test
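These symbols match the Kuder-Richardson Formula 20 (KR-20), which is usually written as KR-20 = (k / (k − 1)) × (1 − Σ pj·qj / σ²). The sketch below assumes that this is the formula the slide showed, treats pj and qj as proportions, and uses the population variance for σ² (some texts use the sample variance instead). The function name `kr20` and the data are hypothetical.

```python
def kr20(responses):
    """KR-20 for dichotomous items.

    responses: list of rows, one per examinee; each row holds 1 (correct)
    or 0 (incorrect) for every item.
    """
    n_people = len(responses)
    k = len(responses[0])  # number of questions

    # p_j: proportion answering item j correctly; q_j = 1 - p_j
    p = [sum(row[j] for row in responses) / n_people for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)

    # Variance of total scores (population variance: an assumption)
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    variance = sum((t - mean_total) ** 2 for t in totals) / n_people

    return (k / (k - 1)) * (1 - sum_pq / variance)

# Hypothetical data: 6 examinees, 8 items
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
]

print(f"KR-20 = {kr20(responses):.2f}")
```

The function would fail with a division by zero if every examinee obtained the same total score, since σ² would then be 0.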

Advantages & disadvantages

Which method should be used?

Test-retest method
• Stability of test scores over time

Alternate-forms method
• Consistency of scores over different test forms

Split-half & Kuder-Richardson methods
• Go-togetherness of test items

Factors affecting reliability: 1. LENGTH OF TEST

A larger sampling of responses with equally good items, or a greater length of test, gives higher reliability

Reliability does NOT increase in a straight line (SPEARMAN-BROWN FORMULA)

Reliability of .50 increases to .67 when the length of a test is doubled

Assumption: subjects do not become exhausted and lose motivation
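The .50 to .67 figure follows from the general form of the Spearman-Brown formula, which predicts the reliability of a test lengthened by a factor n with comparable items: R_n = n·r / (1 + (n − 1)·r). The sketch below is a small illustration; the function and parameter names are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after lengthening a test by `length_factor`,
    assuming the added items are as good as the existing ones."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Starting from a reliability of .50, lengthen the test 1x, 2x, 3x, 4x
for factor in (1, 2, 3, 4):
    predicted = spearman_brown(0.50, factor)
    print(f"length x{factor}: predicted reliability = {predicted:.2f}")
```

Doubling the test takes .50 to .67, tripling it gives .75, and quadrupling it gives .80. Each added block of items buys less than the one before, which is what "does NOT increase in a straight line" means.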

Factors affecting reliability: 2. RANGE OF TALENT

Validity and reliability coefficients can be expected to increase as the range of talent of the subjects increases

Homogeneous group: lower reliability coefficient

Wider spread of scores: higher reliability

The sample of subjects should be representative of those about whom one wishes to draw conclusions regarding individual differences

Factors affecting reliability: 3. TIME LIMITS

Applies to the SPLIT-HALF and KUDER-RICHARDSON approaches

If some students do not have time to try some items, the proportion of correct responses for those items will decrease and the score spread will increase

Result: a positive, although spurious, influence on the size of the reliability coefficient

Factors affecting reliability: 4. DIFFICULTY OF TEST ITEMS

Narrow score distributions give low reliability

[Figure: SCORE distributions for a VERY DIFFICULT TEST and a VERY EASY TEST]
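One way to see why very difficult and very easy tests produce narrow score distributions: for an item scored right or wrong, the variance of the item score is p × (1 − p), where p is the proportion of examinees answering correctly. The sketch below tabulates that quantity for a few hypothetical difficulty levels; it is an illustration of the general principle, not something taken from the slides.

```python
def item_variance(p):
    """Variance of a right/wrong item answered correctly by proportion p."""
    return p * (1 - p)

# Hypothetical difficulty levels, from a very hard item to a very easy one
for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"proportion correct = {p:.2f}  item variance = {item_variance(p):.4f}")
```

The variance peaks at p = .50 and shrinks toward zero as items become very hard or very easy, so items of middling difficulty contribute the most to the spread of total scores.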

Other factors affecting reliability

QUALITY OF TEST ITEMS

CLARITY OF INSTRUCTIONS

FREEDOM FROM DISTRACTIONS

OBJECTIVITY IN SCORING

Best reliability

Tests are long enough, with a fair time limit

Items are near 50% difficulty and free of ambiguity

Same, clear, and concise directions

Heterogeneous abilities

PRACTICALITY

Definition

Usefulness or applicability of the testing procedure in serving the needs of its users

Economy of: time, effort, money

1. Ease of CONSTRUCTION – demands adequate time and informed talent

2. Ease of ADMINISTRATION – clarity and simplicity; ease of reading instructions

3. Ease of SCORING – subjective vs. objective?

4. Ease of INTERPRETATION and APPLICATION – meaningfulness of scores obtained from the test. Misinterpreted or misapplied test results are of little value and may be harmful to certain individuals or groups.

GENERALIZABILITY

Definition

RELIABILITY and VALIDITY are often discussed separately, but sometimes both are referred to as aspects of generalizability

The extent to which one can generalize the results of a measure or a test used with a particular group to other tests or other groups

Thank you!
