Session 5: Analysing Tests and Test Items using Classical Test Theory (CTT)

Professor Jim Tognolini

Laos Session 5: Analysing Test Items using Classical Test Theory (CTT)


Analysing Tests and Test Items using Classical Test Theory (CTT)

During this session we will:

• define some basic test-level statistics using Classical Test Theory analyses: test mean, test discrimination and test reliability (Cronbach’s alpha).

• define some basic item-level statistics from Classical Test Theory: item difficulty and item discrimination (Findlay Index and point-biserial correlation).

Capacity Development Workshop: Test and Item Development and Design, Laos, September 2016


Test characteristics to evaluate

• Difficulty

• Discrimination

• Reliability

• Validity


Test difficulty


Test discrimination

The ability of a test to discriminate between high- and low-achieving individuals is a function of the items that comprise the test.


Methods of estimating reliability

Method (type of reliability): procedure

• Test-retest (stability reliability): Give the same test to the same group on different occasions, with some time between administrations.

• Equivalent forms (equivalence reliability): Give two equivalent (parallel) forms of the test to the same group in close succession.

• Split-half (internal consistency): Give the test once; split it into halves (e.g. odd and even items); correlate the scores on the two halves; correct this correlation using the Spearman-Brown formula.

• Coefficient alpha (internal consistency): Give the test once to a group and apply the formula.

• Interrater (consistency of ratings): Have two or more raters score the responses and calculate the correlation coefficient between their scores.


Split-halves method

Reliability can also be estimated from a single administration of a test, either by correlating scores on two halves of the test or by using the Kuder-Richardson method.

The split-halves method requires the test to be split into two halves that are as nearly equivalent as possible.

To estimate the reliability of the full-length test, the Spearman-Brown adjustment is usually applied, because the correlation between the two halves reflects the reliability of a test only half as long.


Kuder-Richardson (KR-20 and KR-21) Method
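The formulas on this slide did not survive extraction. The sketch below uses the standard forms KR-20 = k/(k − 1) · (1 − Σpq / σ²) and KR-21 = k/(k − 1) · (1 − M(k − M) / (kσ²)), where k is the number of items, p the proportion correct on an item, q = 1 − p, and M and σ² the mean and variance of total scores. Both assume 0/1 scoring, so the sketch applies them to the five one-mark items (items 1 to 5) of the worked example later in this session. Variances divide by N, and the function names are illustrative.

```python
from statistics import mean, pvariance

# Items 1-5 of the worked example are scored 0/1 (one row per student).
SA_ITEMS = [
    [1, 1, 0, 0, 1],  # Astha
    [1, 1, 1, 0, 1],  # Bosco
    [1, 1, 1, 1, 1],  # Chetan
    [1, 1, 1, 0, 1],  # Devika
    [1, 1, 1, 1, 1],  # Emily
    [1, 1, 1, 1, 1],  # Farhan
    [1, 1, 1, 0, 1],  # Gogi
    [1, 1, 1, 1, 1],  # Harshita
    [0, 1, 0, 0, 1],  # Indu
    [1, 1, 1, 1, 1],  # Jagat
]


def kr20(responses):
    """KR-20: k/(k-1) * (1 - sum of p*q over items / variance of total scores)."""
    n, k = len(responses), len(responses[0])
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion correct on item j
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / pvariance(totals))


def kr21(responses):
    """KR-21: treats all items as equally difficult, so only k, M and the variance are needed."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    m = mean(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * pvariance(totals)))


print(f"KR-20 = {kr20(SA_ITEMS):.3f}, KR-21 = {kr21(SA_ITEMS):.3f}")
```

Because KR-21 ignores differences in item difficulty, it is usually no larger than KR-20 for the same data.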


Cronbach’s alpha method
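The formula on this slide did not survive extraction. As a minimal sketch, the standard form of coefficient alpha, α = k/(k − 1) · (1 − Σσᵢ² / σ_X²), where σᵢ² is the variance of item i and σ_X² the variance of total scores, also handles items worth more than one mark. It is applied here to all twelve items of the worked example later in this session. Population (divide-by-N) variances are assumed; sample variances would rescale numerator and denominator by the same factor and give the same alpha. For 0/1 items the result equals KR-20.

```python
from statistics import pvariance

# Item scores (items 1-12) for the ten students in the worked example.
SCORES = [
    [1, 1, 0, 0, 1, 3, 0, 1, 3, 1, 3, 4],  # Astha
    [1, 1, 1, 0, 1, 3, 0, 1, 3, 1, 3, 3],  # Bosco
    [1, 1, 1, 1, 1, 3, 0, 2, 1, 2, 3, 5],  # Chetan
    [1, 1, 1, 0, 1, 3, 0, 2, 1, 1, 2, 3],  # Devika
    [1, 1, 1, 1, 1, 3, 0, 1, 3, 4, 2, 3],  # Emily
    [1, 1, 1, 1, 1, 3, 1, 2, 3, 3, 3, 4],  # Farhan
    [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1],  # Gogi
    [1, 1, 1, 1, 1, 3, 2, 1, 3, 4, 3, 3],  # Harshita
    [0, 1, 0, 0, 1, 0, 0, 2, 0, 0, 2, 0],  # Indu
    [1, 1, 1, 1, 1, 2, 1, 1, 3, 2, 3, 5],  # Jagat
]


def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


print(f"Cronbach's alpha = {cronbach_alpha(SCORES):.3f}")
```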


Ways to improve reliability

1. Test length

In general, the longer the test, the higher the reliability (more adequate sampling), provided that the added material has the same statistical and substantive properties as the existing items.

2. Homogeneity of group

The more heterogeneous the group, the higher the reliability. Reliability can also vary across score levels, gender, location, etc.

3. Difficulty of items

Tests that are too easy or too difficult yield scores of low reliability. In general, aim for item difficulties of about 0.5. For tests that are required to discriminate, spread the questions over the range in which discrimination is required.
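The effect of test length can be quantified with the general Spearman-Brown formula: if a test with reliability r is made n times as long using comparable items, the predicted reliability is nr / (1 + (n − 1)r). A minimal sketch, with hypothetical reliabilities chosen only for illustration; the function names are illustrative.

```python
def prophecy(r, n):
    """Predicted reliability when test length is multiplied by n (Spearman-Brown).

    Assumes the added (or removed) items are parallel to the existing ones.
    """
    return n * r / (1 + (n - 1) * r)


def length_factor_needed(r, r_target):
    """Factor by which test length must be multiplied to move from r to r_target."""
    return r_target * (1 - r) / (r * (1 - r_target))


# Hypothetical test with reliability 0.6: doubling it, and the length needed for 0.75.
print(f"{prophecy(0.6, 2):.2f}")
print(f"{length_factor_needed(0.6, 0.75):.2f}")
```

Setting n below 1 predicts the loss in reliability from shortening a test.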


Ways to improve reliability (cont.)

4. Objectivity

The more objective the test (and marking scheme), the more reliable the resulting test scores.

5. Retain Discriminating Items

In general, replace items with low discrimination by items that discriminate highly. Beyond a certain point, however, this practice raises reliability to such an extent that it lowers validity (the attenuation paradox).

6. Increase Speededness of the Test

Highly speeded tests usually show higher reliability. Do not use internal consistency estimates for speeded tests, as they overstate reliability.


Types of validity

There are many different types of validity. Traditionally there are three main types (I to III); face validity (IV) is listed as well:

I. Content Validity (sometimes referred to as curricular or instructional validity)

II. Criterion-Related Validity (types include predictive and concurrent validity)

III. Construct Validity

IV. Face Validity

Loevinger (1957) argued that “since predictive, concurrent and content validities are all essentially ad hoc, construct validity is the whole of validity from a scientific point of view”.


Define some basic item level statistics from Classical Test Theory


Item difficulty
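The formula for this slide did not survive extraction. In Classical Test Theory, the difficulty of a one-mark item is conventionally the proportion of persons who answer it correctly, so a higher value means an easier item. For multi-mark items, the sketch below assumes the mean score divided by the maximum mark (often called facility). The two columns are items 1 and 7 from the worked example later in this session; the function name is illustrative.

```python
def facility(item_scores, max_mark=1):
    """Item difficulty: mean score on the item as a proportion of its maximum mark.

    For 0/1 items this is simply the proportion of persons answering correctly.
    """
    return sum(item_scores) / (len(item_scores) * max_mark)


item1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]  # item 1 (1 mark), Astha to Jagat
item7 = [0, 0, 0, 0, 0, 1, 0, 2, 0, 1]  # item 7 (2 marks), Astha to Jagat
print(f"item 1: {facility(item1):.2f}; item 7: {facility(item7, max_mark=2):.2f}")
```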


Item discrimination

Methods for checking item discrimination include:

• The Findlay Index (FI)

• The Point Biserial Correlation

• The Biserial Correlation


The Findlay Index (FI)


The Findlay Index (FI) – An example

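Item NRU NRL NU FI Comment

(NRU = number right in the upper group; NRL = number right in the lower group; NU = number of persons in the upper group)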

1 9 2 10 0.7 Good item, better students do well

2 6 6 10 0.0 Weak item, does not discriminate

3 6 8 10 -0.2 Invalid item, weak students do better
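The FI values in the table above are consistent with FI = (NRU − NRL) / NU when the upper and lower groups are the same size; for item 1, (9 − 2) / 10 = 0.7. A minimal sketch that reproduces the three rows (the function name is illustrative):

```python
def findlay_index(nru, nrl, nu):
    """FI = (NRU - NRL) / NU, for upper and lower groups of equal size NU."""
    return (nru - nrl) / nu


# (item, NRU, NRL, NU) from the example table
for item, nru, nrl, nu in [(1, 9, 2, 10), (2, 6, 6, 10), (3, 6, 8, 10)]:
    print(f"item {item}: FI = {findlay_index(nru, nrl, nu):.1f}")
```

FI ranges from −1 to 1; a negative value flags an item that weaker students answer correctly more often than stronger ones.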


The Findlay Index (FI)

If the number of students in the top group is not equal to the number in the bottom group, proportions must be used:

FI = PRU - PRL

where

PRU = proportion of persons right in upper group
PRL = proportion of persons right in lower group
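A minimal sketch of the proportion form, which handles upper and lower groups of different sizes. The counts in the usage line are hypothetical, and the function name is illustrative.

```python
def findlay_index_prop(right_upper, n_upper, right_lower, n_lower):
    """FI = PRU - PRL, using proportions so that unequal group sizes are handled."""
    pru = right_upper / n_upper  # proportion of persons right in the upper group
    prl = right_lower / n_lower  # proportion of persons right in the lower group
    return pru - prl


# Hypothetical item: 8 of 10 right in the upper group, 3 of 12 right in the lower group.
print(f"FI = {findlay_index_prop(8, 10, 3, 12):.2f}")
```

With equal groups this reduces to (NRU − NRL) / NU.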


Graphical display of the Findlay Index (FI)

For each score group, calculate the proportion of the group getting the item correct, then plot this proportion against the group's mean score.


[Figure: proportion correct (0 to 1) plotted against score group (L, M, U) for Item 2, Item 6.2, Item 7 and Item 10.4]
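A minimal sketch of the points behind such a graph, assuming three score groups (L, M, U) formed by sorting persons on total score and cutting them into near-equal groups. It uses item 4 and the total scores from the worked example later in this session; drawing the plot is left to any charting tool. Persons with tied totals can fall into different groups, which matters with small samples. The function name is illustrative.

```python
from statistics import mean

# Total scores and item 4 scores (0/1) for the ten students in the worked example.
totals = [18, 18, 21, 16, 21, 24, 6, 24, 6, 22]
item4 = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]


def group_points(totals, item, max_mark=1, n_groups=3):
    """Return (mean total score, proportion correct) per score group, lowest group first.

    Persons are sorted by total score and cut into n_groups near-equal groups.
    """
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    n = len(order)
    points = []
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        mean_total = mean(totals[i] for i in members)
        prop = sum(item[i] for i in members) / (len(members) * max_mark)
        points.append((mean_total, prop))
    return points


for label, (x, y) in zip("LMU", group_points(totals, item4)):
    print(f"{label}: mean score {x:.2f}, proportion correct {y:.2f}")
```

A steeply rising line from L to U indicates a discriminating item; a flat or falling line indicates a weak or faulty one.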


The Findlay Index (FI) – An example

Item Type SA SA SA SA SA E E E E E E E Total

Item Number 1 2 3 4 5 6 7 8 9 10 11 12

Max Marks 1 1 1 1 1 3 2 2 3 4 3 6 28

Astha 1 1 0 0 1 3 0 1 3 1 3 4 18

Bosco 1 1 1 0 1 3 0 1 3 1 3 3 18

Chetan 1 1 1 1 1 3 0 2 1 2 3 5 21

Devika 1 1 1 0 1 3 0 2 1 1 2 3 16

Emily 1 1 1 1 1 3 0 1 3 4 2 3 21

Farhan 1 1 1 1 1 3 1 2 3 3 3 4 24

Gogi 1 1 1 0 1 0 0 1 0 0 0 1 6

Harshita 1 1 1 1 1 3 2 1 3 4 3 3 24

Indu 0 1 0 0 1 0 0 2 0 0 2 0 6

Jagat 1 1 1 1 1 2 1 1 3 2 3 5 22

TOTAL 9 10 8 5 10 23 4 14 20 18 24 31 176
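As a minimal sketch, item statistics for this table can be computed by taking the five highest and five lowest scorers on the total as the upper and lower groups. This grouping is an assumption, since the grouping used on the original slides did not survive extraction. For multi-mark items, the sketch also assumes each group's proportion is its mean item score divided by the maximum mark, so FI stays between −1 and 1. The function name is illustrative.

```python
# Item scores for the ten students (rows) on items 1-12 (columns), from the table above.
SCORES = [
    [1, 1, 0, 0, 1, 3, 0, 1, 3, 1, 3, 4],  # Astha
    [1, 1, 1, 0, 1, 3, 0, 1, 3, 1, 3, 3],  # Bosco
    [1, 1, 1, 1, 1, 3, 0, 2, 1, 2, 3, 5],  # Chetan
    [1, 1, 1, 0, 1, 3, 0, 2, 1, 1, 2, 3],  # Devika
    [1, 1, 1, 1, 1, 3, 0, 1, 3, 4, 2, 3],  # Emily
    [1, 1, 1, 1, 1, 3, 1, 2, 3, 3, 3, 4],  # Farhan
    [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1],  # Gogi
    [1, 1, 1, 1, 1, 3, 2, 1, 3, 4, 3, 3],  # Harshita
    [0, 1, 0, 0, 1, 0, 0, 2, 0, 0, 2, 0],  # Indu
    [1, 1, 1, 1, 1, 2, 1, 1, 3, 2, 3, 5],  # Jagat
]
MAX_MARKS = [1, 1, 1, 1, 1, 3, 2, 2, 3, 4, 3, 6]


def item_statistics(scores, max_marks):
    """Return (facility, FI) per item, using the top and bottom halves by total score.

    With an odd number of persons, the middle person is left out of both groups.
    """
    n = len(scores)
    totals = [sum(row) for row in scores]
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    upper, lower = order[: n // 2], order[n - n // 2:]
    stats = []
    for j, m in enumerate(max_marks):
        fac = sum(row[j] for row in scores) / (n * m)
        p_upper = sum(scores[i][j] for i in upper) / (len(upper) * m)
        p_lower = sum(scores[i][j] for i in lower) / (len(lower) * m)
        stats.append((fac, p_upper - p_lower))
    return stats


for j, (fac, fi) in enumerate(item_statistics(SCORES, MAX_MARKS), start=1):
    print(f"item {j:2d}: facility {fac:.2f}, FI {fi:+.2f}")
```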


Guttman scale


Point-biserial correlation


The Guttman structure

If person A scores better than person B on the test, then person A should have all the items correct that person B has, and in addition, some other items that are more difficult.

Louis Guttman


The Guttman structure (cont.)

Item:   1 2 3 4 5 6   Total score
        0 0 0 0 0 0   0
        1 0 0 0 0 0   1
        1 1 0 0 0 0   2
        1 1 1 0 0 0   3
        1 1 1 1 0 0   4
        1 1 1 1 1 0   5
        1 1 1 1 1 1   6
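A minimal sketch for checking a 0/1 response matrix against a strict Guttman pattern: rank items from easiest to hardest, and for each person compare the responses with the ideal pattern for that total score. Counting one error per mismatched cell is one simple convention among several, and the function name is illustrative.

```python
def guttman_errors(responses, order=None):
    """Count departures from a perfect Guttman pattern in a 0/1 response matrix.

    Items are ranked from easiest to hardest by number correct unless an order is given.
    A person with total score r should have exactly the r easiest items right;
    every cell that differs from that ideal pattern counts as one error.
    """
    k = len(responses[0])
    if order is None:
        # Most-answered-correctly first; ties keep their original column order.
        order = sorted(range(k), key=lambda j: -sum(row[j] for row in responses))
    errors = 0
    for row in responses:
        r = sum(row)
        ideal = [1 if rank < r else 0 for rank in range(k)]
        actual = [row[j] for j in order]
        errors += sum(a != b for a, b in zip(actual, ideal))
    return errors


# The perfect pattern from the table above: item 1 easiest, item 6 hardest.
PERFECT = [[1] * s + [0] * (6 - s) for s in range(7)]
print(guttman_errors(PERFECT))                         # no departures
print(guttman_errors(PERFECT + [[0, 1, 0, 0, 0, 0]]))  # one person has item 2 right but not item 1
```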


Reasons for not obtaining a strict Guttman pattern

• The items do not go together as expected and the scores on the items should not be added.

• The items are very close in difficulty and the persons are all close in ability.


Individual reporting

3 11 2 15 14 9 8 1 7 4 13 12 5 10 6
