Test Validity S-005

Validity of measurement

• Reliability refers to consistency

– Are we getting something stable over time?

– Internally consistent?

• Validity refers to accuracy

– Is the measure accurate?

– Are we really measuring what we want?

Important distinction! The term “validity” is used in two different ways

1. Validity of an assessment or method of collecting data

• The validity of a test or questionnaire or interview

2. Validity of a research study

• Was the entire study of high quality?

• Did it have high internal and external validity?

Important distinction! The term “validity” is used in two different ways

1. Referring to entire studies or research reports:

– OK: “We examined the internal validity of the study.”

– OK: “We looked for the threats to validity.”

– OK: “That study involved randomly assigning students to groups, so it had strong internal validity, but it was carried out in a special school, so it is weak on external validity.”

2. Referring to a test or questionnaire or some assessment:

– OK: “The test is a widely used and well-validated measure of student achievement.”

– OK: “The checklist they used seemed reasonable, but they did not present any information on its reliability or validity.”

– NOT: “The test lacked internal validity.” (This sounds very strange to me.)

Types of validity

• Validity – the extent to which the instrument (test, questionnaire, etc.) is measuring what it intends to measure

– Examples:

Math test: Is it covering the right content and concepts? Is it also influenced by reading level or background knowledge?

Attitude assessment: Are the questions appropriate? Does it assess different dimensions of attitudes (intensity, direction, etc.)?

• Validity is also assessed in a particular context

– A test may be valid in some contexts and not in others

– A questionnaire may be useful with some populations and not so useful with other groups

– Not: “The test has high validity.”

– OK: “The test has been useful in assessing early reading skills among native speakers of English.”

Types of validity

• Content validity

– The extent to which the items reflect a specific domain of content

Is the sample of items really representative?

– Often a matter of judgment

– Experts may be asked to rate the relevance and appropriateness of the items or questions

e.g., rate each item: very important / nice to know / not important

– “Face validity” refers to whether the items appear to be valid (to the test taker or test user)
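One way to summarize expert ratings like these is Lawshe's content validity ratio (CVR), which compares the number of experts who rate an item as essential against half the panel. The slides do not name a specific index, so the choice of CVR, the mapping of "very important" to "essential", and the ratings below are all assumptions made for illustration.

```python
# Sketch: Lawshe's content validity ratio for expert item ratings.
# CVR = (n_essential - N/2) / (N/2), ranging from -1 to +1.
# The ratings are hypothetical; "very important" is treated as "essential".

def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Return the CVR for one item rated by a panel of n_experts."""
    if n_experts <= 0 or not 0 <= n_essential <= n_experts:
        raise ValueError("need 0 <= n_essential <= n_experts and n_experts > 0")
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 experts rating three items.
ratings = {
    "item_1": ["very important"] * 9 + ["nice to know"] * 1,
    "item_2": ["very important"] * 5 + ["nice to know"] * 5,
    "item_3": ["very important"] * 2 + ["not important"] * 8,
}

cvr = {
    item: content_validity_ratio(votes.count("very important"), len(votes))
    for item, votes in ratings.items()
}

for item, value in cvr.items():
    print(f"{item}: CVR = {value:+.2f}")
```

A CVR near +1 means most of the panel judged the item essential, a value near 0 means the panel was split, and a negative value means fewer than half did. Where to set the cutoff for keeping an item remains a matter of judgment.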

Types of validity

Criterion-related validity

• Concurrent validity

– agreement with a separate measure

– common in educational assessments

e.g., Bayley Scales and Stanford-Binet (S-B) IQ test

e.g., complete version and screening test version

– Issue: Is there really a strong existing measure, a “gold standard” we can use for validating a new measure?

• Predictive validity

– agreement with some future measure

– SAT scores and college GPA

– GRE scores and graduate school performance
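Agreement between a test and a criterion measure is often summarized with a correlation coefficient (sometimes called a validity coefficient). The sketch below computes a Pearson correlation between admission test scores and later GPA. The data are invented for illustration and do not come from any real SAT or GRE study; the slides also do not specify which statistic to use, so Pearson's r is an assumption.

```python
# Sketch: a validity coefficient as the Pearson correlation between a test
# and a criterion measure. All scores below are hypothetical.
from math import sqrt

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: test score at admission, GPA measured later.
admission_scores = [480, 520, 560, 600, 640, 700]
later_gpa = [2.4, 2.9, 2.7, 3.2, 3.4, 3.6]

validity_coefficient = pearson_r(admission_scores, later_gpa)
print(f"predictive validity coefficient r = {validity_coefficient:.2f}")
```

The same calculation applies to concurrent validity; the only difference is that the criterion is measured at about the same time as the test rather than later.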

Types of validity (cont.)

Construct validity

• Does the measure appear to produce results that are consistent with our theories about the construct?

– Example: We have a “stage-model” of development, so does our measure produce scores/results that look like “stages”?

• Convergent validity

– Does our measure converge or agree with other measures that should be similar?

And . . .

• Discriminant validity

– Does our measure disagree (or diverge) where it should be different?

Stanford Achievement Test Example – Grade 1

Stanford Achievement Test Example – Grade 12

McCarthy Screening test example

• A test for pre-school children (ages 2.5 to 8.5 years)

• Six subtests:

– Verbal, perceptual-performance, quantitative, general cognitive (composite), memory, motor

• Reliability evidence for using a short version as a screening test

– Split-half correlations for several scales (r = .60 to .80)

– Test-retest reliability for other scales (on a subset of children) showed a range of correlations, from .32 to .70.
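A split-half correlation describes a test half as long as the real one, so it is commonly stepped up with the Spearman-Brown formula, r_full = 2r / (1 + r). The slide does not say whether the reported values of .60 to .80 were already corrected in this way. The sketch below assumes they were not, purely to show how the correction behaves.

```python
# Sketch: Spearman-Brown correction for split-half reliability.
# Assumption: the input is an uncorrected correlation between two half-tests.

def spearman_brown(half_test_r: float) -> float:
    """Estimate full-length reliability from a half-test correlation."""
    if not -1.0 < half_test_r <= 1.0:
        raise ValueError("half_test_r must be in the interval (-1, 1]")
    return 2 * half_test_r / (1 + half_test_r)

# End points of the range reported on the slide.
for r in (0.60, 0.80):
    print(f"half-test r = {r:.2f} -> full-length estimate = {spearman_brown(r):.2f}")
# half-test r = 0.60 -> full-length estimate = 0.75
# half-test r = 0.80 -> full-length estimate = 0.89
```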

McCarthy Scales of Children’s Abilities

• Reliability

• The internal consistency coefficients for the General Cognitive Index (GCI) averaged .93 across 10 age groups between 2.5 and 8.5 years.

• Test-retest reliability of the GCI over a one-month interval was .80. Stability coefficients of the cognitive scales ranged from .62 to .76, with the Motor Scale emerging as the only scale that lacked stability (r = .33).

A short version developed as a screening test

Validity information for a short version

• A sample of 60 children with learning disabilities

• On the full version of the test

– 53 out of 60 (88%) failed at least 2 of the 6 subtests

• On the short version (the proposed screening version)

– 40 out of 60 (67%) failed (and would be identified)

• Is this enough information?
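The two percentages above are hit rates among children already known to have learning disabilities, which is what screening research usually calls sensitivity. The sketch below reproduces the arithmetic from the slide's counts. The function and variable names are illustrative, and labeling the figures "sensitivity" is an interpretation, not the slide's wording.

```python
# Sketch: proportion of known cases flagged by each version of the test.
# Counts come from the slide; names are illustrative.

def sensitivity(flagged_cases: int, total_cases: int) -> float:
    """Proportion of children with the condition that the test identifies."""
    if total_cases <= 0 or not 0 <= flagged_cases <= total_cases:
        raise ValueError("need 0 <= flagged_cases <= total_cases and total_cases > 0")
    return flagged_cases / total_cases

total_children = 60
full_version = sensitivity(53, total_children)
short_version = sensitivity(40, total_children)
missed_by_short = total_children - 40

print(f"full version:  {full_version:.0%}")   # 88%
print(f"short version: {short_version:.0%}")  # 67%
print(f"children missed by the short version: {missed_by_short}")  # 20
```

Because every child in this sample has a learning disability, the data cannot show how often the short version would wrongly flag children without one (its specificity). That is one reason to doubt that this information alone is enough.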