Presentation given at ICE 2010 (Atlanta) regarding basic concepts and vocabulary of test reliability and validity


Page 1: Reliability And Validity

The Many Faces of Reliability and Validity

November 17, 2010

James A. Penny, PhD
Stephen B. Johnson, PhD
Diane M. Talley, MA

Castle Worldwide

Page 2: Reliability And Validity

The plan for our roundtable session

- Why reliability & validity matter
- The Standards
- Reliability
- Validity
- Round tables
  - Reliability & validity for small programs
  - Focus on reliability
  - Focus on validity
- Wrap-up and questions


Page 3: Reliability And Validity

First principles – why they matter

We want to make a consistent decision about someone’s competence (reliability).

We can’t physically “weigh” the concept

So we make logical inferences linking what can be observed to the concept (validity)


Page 4: Reliability And Validity

The Standards

The Standards define validity as the degree to which all the accumulated evidence supports the intended interpretation of test scores for the proposed purpose.

The Uniform Guidelines require a demonstration of validity if a test is used in employment selection

Especially if adverse impact exists

The Supreme Court – validity matters


Page 5: Reliability And Validity

Information from testing is like a radio signal

Measures (tests and items) have signal and noise like an AM radio.

When the signal is strong and the noise is weak, you’re happy.
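A quick way to put numbers on the analogy: in classical test theory an observed score is true score ("signal") plus random error ("noise"), and the correlation between two parallel forms approaches the share of observed variance that is signal. A minimal simulation, with invented values purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2010)
n = 10_000
true_sd, noise_sd = 10.0, 5.0                       # signal and noise strength

true_scores = rng.normal(50, true_sd, n)            # the signal
form_a = true_scores + rng.normal(0, noise_sd, n)   # observed = true + error
form_b = true_scores + rng.normal(0, noise_sd, n)   # a parallel form

# Reliability = true-score variance / observed-score variance
expected = true_sd**2 / (true_sd**2 + noise_sd**2)  # 100 / 125 = 0.80
observed = np.corrcoef(form_a, form_b)[0, 1]
print(f"expected reliability {expected:.2f}, parallel-forms r {observed:.2f}")
```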


Page 6: Reliability And Validity

But reliability of what and assessed how?


Consistency, replicability, and/or the precision of our measure. We have to infer reliability.

Reliability comes in many flavors (a computational sketch of a few follows this list):
- Test-retest
- Parallel forms
- Split-half
- KR-20
- KR-21
- Cronbach’s alpha
- Generalizability
- Decision consistency
- Standard error of measurement
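Three of these flavors are straightforward to compute from an examinees-by-items score matrix. A minimal sketch with NumPy, assuming a 2-D array `scores` (the toy data below is hypothetical); texts and software differ on sample vs. population variance conventions, so small discrepancies from commercial output are normal:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # per-item sample variance
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def kr20(scores):
    """KR-20, the textbook formula for dichotomous (0/1) items.
    Uses p*q as the item variance, so on small samples it can differ
    slightly from cronbach_alpha's sample-variance convention."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    p = scores.mean(axis=0)                      # proportion correct per item
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - (p * (1 - p)).sum() / total_var)

def sem(scores, reliability):
    """Standard error of measurement: SD of total scores * sqrt(1 - reliability)."""
    totals = np.asarray(scores, dtype=float).sum(axis=1)
    return totals.std(ddof=1) * np.sqrt(1.0 - reliability)

# Hypothetical 0/1 responses: 5 examinees x 4 items
scores = np.array([[1, 1, 1, 0],
                   [1, 0, 1, 0],
                   [1, 1, 1, 1],
                   [0, 0, 1, 0],
                   [1, 1, 0, 0]])
a = cronbach_alpha(scores)
print(f"alpha={a:.2f}, KR-20={kr20(scores):.2f}, SEM={sem(scores, a):.2f}")
```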

Page 7: Reliability And Validity

Factors influencing reliability


- Test length: increasing the length increases consistency (see the Spearman-Brown sketch after this list).
- Location of the cut score in the score distribution: a cut score away from the mean improves decision consistency.
- Test score variability: increasing variability increases consistency.
- Spread of candidate performance (heterogeneity).
- Consistency of testing experiences.
- Quality of the item-writing process.
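The test-length effect is usually quantified with the Spearman-Brown prophecy formula (not named on the slide, but the standard tool here): a test lengthened by a factor n with comparable items has predicted reliability nρ / (1 + (n − 1)ρ). A quick sketch with hypothetical values:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor,
    assuming the added items are comparable to the existing ones."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Doubling a test with reliability 0.70: 1.4 / 1.7 ≈ 0.82
print(round(spearman_brown(0.70, 2), 2))   # 0.82
```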

Page 8: Reliability And Validity

Validity is about making a case


The data being collected should support the theory behind the test.

- Procedural evidence: how were the content requirements, items, and tests created?
- Measurement evidence (a small sketch follows):
  - Linear factor analysis, structural equation modeling (SEM)
  - Regression analysis (prediction)
  - Relationship to other measures

The case should be able to withstand court scrutiny.
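One familiar form of measurement evidence is a criterion-related (predictive) study: correlate test scores with a later criterion such as job-performance ratings. A minimal sketch; the paired data here are invented purely for illustration:

```python
import numpy as np

# Hypothetical pairs: certification exam score and later supervisor rating
exam = np.array([62, 71, 55, 80, 68, 74, 59, 85, 66, 77], dtype=float)
rating = np.array([3.1, 3.8, 2.6, 4.2, 3.3, 3.9, 2.9, 4.5, 3.2, 4.0])

# Validity coefficient: correlation between test and criterion
r = np.corrcoef(exam, rating)[0, 1]

# Simple least-squares prediction line: rating ≈ slope * score + intercept
slope, intercept = np.polyfit(exam, rating, 1)
print(f"validity coefficient r = {r:.2f}")
print(f"predicted rating = {slope:.3f} * score + {intercept:.2f}")
```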

Page 9: Reliability And Validity

Like reliability, validity has many flavors


But what about face validity?

Page 10: Reliability And Validity

Factors influencing validity

- Process matters
- Purpose matters
- Theory matters
- Logical implications
- The measurement tool must match the measurement goals
- Nature of the group: age, gender, background, heterogeneity
- Reliability


Page 11: Reliability And Validity

Relationship between validity and reliability

A valid test should also be reliable; a reliable test may not be valid. Validity is more important than reliability (according to the courts). BUT, the more important the decision, the more reliable you must be.

To be useful, an instrument (test, scale) must be both reasonably reliable and reasonably valid.

Aim for validity first, and then try to make the test more reliable little by little.
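Classical test theory makes this dependence concrete: the observed validity coefficient can never exceed the square root of the product of the test's and the criterion's reliabilities, so an unreliable test caps how valid it can look. A one-function sketch of that bound:

```python
def max_observed_validity(rel_test, rel_criterion=1.0):
    """Classical test theory bound: the observed validity coefficient
    r_xy cannot exceed sqrt(r_xx * r_yy)."""
    return (rel_test * rel_criterion) ** 0.5

# A test with reliability 0.64 can show a validity coefficient of at most 0.80
print(max_observed_validity(0.64))   # 0.8
```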


Page 12: Reliability And Validity

Implications for creating assessments


- Clearly define the construct(s) to be assessed.
- Identify the logical outcomes of that definition.
- Define how the construct(s) should be measured (e.g., format, time required). Don’t let the format do the driving!
- Define the process to create any required tools.
- Identify who should be involved in creating those tools.
- Identify how you will assess reliability and validity.

Page 13: Reliability And Validity

Questions?

James A. Penny  [email protected]
Stephen B. Johnson  [email protected]
Diane M. Talley  [email protected]

919.572.6880
www.castleworldwide.com