Form Effects on the Estimation of Students’ Progress in Oral Reading Fluency using CBM


David J. Francis, University of Houston
Kristi L. Santi, UT - Houston
Chris Barr, University of Houston

CRESST September 8, 2005

Overview

Curriculum Based Measurement (CBM) to Monitor Student Progress and Inform Instruction
Methods
Results
Conclusions

Background

Report of the National Reading Panel (NRP, 2000) highlighted the importance of instruction and assessment in five domains of reading and related skills

  Phonemic awareness
  Phonics
  Fluency
  Vocabulary
  Comprehension

Background

No Child Left Behind (NCLB) and Reading First (RF) are based on the NRP model of reading acquisition and mastery

RF emphasizes
  The five domains
  Three-tier model of instruction, prevention and intervention
  Four purposes of assessment in guiding instruction

Purposes of Assessment

Reading First describes four purposes for assessment in the five domains:
  Screening
  Diagnosis
  Progress Monitoring
  Outcome

All in the service of guiding instruction

Progress Monitoring

Monitor student progress toward year-end goals

Provide teachers regular feedback on students’ rate of skill acquisition

Identify students needing modification to current instruction based on low rate of skill acquisition

Progress Monitoring

Essential characteristics
  Administer on a regular basis
  Brief and easy to administer in the classroom
  Provide scores on a constant metric
  Predictive of end-of-year outcomes
  Free from measurement artifacts such as practice effects and form effects

CBM has been proposed as having these properties

What is CBM?

Students read connected text for a fixed duration of time, typically one minute

Oral reading fluency (WCPM) is computed and charted as a measure of growth in reading rate

Reading materials range from basal readers to pre-packaged texts
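The WCPM score described above can be sketched as a small calculation. This is a minimal illustration that assumes the score is simply words read minus errors, rescaled to one minute; the function name and arguments are hypothetical, and the rules for what counts as an error come from the administration manual, which is not reproduced here.

```python
def wcpm(words_attempted: int, errors: int, seconds: float = 60.0) -> float:
    """Words correct per minute for one timed oral reading probe.

    Assumes words correct = words attempted - errors (hypothetical helper).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if not 0 <= errors <= words_attempted:
        raise ValueError("errors must be between 0 and words_attempted")
    words_correct = words_attempted - errors
    # Rescale to a one-minute rate in case the probe was not exactly 60 s.
    return words_correct * 60.0 / seconds


# A student attempts 92 words with 7 errors in a standard one-minute probe.
score = wcpm(words_attempted=92, errors=7)
```

Charting these scores over successive probes gives the growth record that the rest of the talk examines.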

DIBELS
  Developed by Good and Kaminski
  CBM measure of early reading skills using one-minute probes
  Included in this study because of:
    The large number of stories in place for fluency assessment
    Developers’ efforts to equate stories for “readability”
    Its ubiquity in RF for PM assessment

Many Strengths

  Quick, easy assessment
  One-minute probe given once a week
  Teacher-friendly format
  Easy-to-follow directions
  Instructionally relevant information
  Within-grade evaluation of student growth

Why might we expect form effects?

Story construction
  Readability formulas are not perfect
  Difficult to precisely control text features that affect fluency
Lack of attention to scaling
  Stories have been pre-equated for text features
  No attempt to empirically equate forms
  Assumption that WCPM provides a constant scale

Purpose of Current Study

Examine form effects on DIBELS Oral Reading Fluency (DORF) at single time point in grade 2

Examine form effects on inferences about growth in DORF over 6 weeks in grade 2

Methods

Setting and Participants

Two schools in HISD
134 students
  85 from school A, 49 from school B
  69 females, 65 males

Ethnically diverse student populations

Measures

DORF Passages (n=29)
  Six passages were randomly selected
  Spache readability index average = 2.65, range 2.6 to 2.7
  Degrees of Reading Power readability index = 45.67, range 44 to 46, scale 0 (easy) to 100 (difficult)

Procedures

3 research assistants administered the probes to all students once every two weeks

Inter-rater reliability of .85 established prior to start of study

Passages administered according to guidelines provided in DIBELS manual

Story order randomly assigned (1 of 6)
  Three stories read at baseline
  One story read at each of waves 2-4

Random Assignment of Students to Passages
  Each student read three passages at baseline
  Design allows estimation of story, order, and story-by-order effects

GROUP BABY BOOK COLOR HOME POOL TWIN

A 1 2 3

B 1 2 3

C 1 2 3

D 1 2 3

E 3 1 2

F 2 3 1

Despite randomization of students to six groups, group differences in fluency were apparent at baseline
  Using a measure of fluency from the Texas Primary Reading Inventory (TPRI), the six groups differed in mean fluency
  F(5,118) = 3.98, p < .002
  Means ranged from 47 to 80 WCPM across the 6 groups

Subsequent analyses used TPRI fluency as a covariate

When TPRI fluency is covaried, groups do not differ on any particular form/story.

Note: this does not mean that the DIBELS stories are equal, only that for any given story, groups did not differ in performance after controlling for TPRI fluency.

Data Analysis

Analyzed oral reading fluency using mixed model approach to repeated measures analysis of variance using SAS PROC MIXED

Fixed effects:
  TPRI_Fluency(TPRI_story)
  DIBELS_Story (1-6)
  DIBELS Order (1,2,3)
  DIBELS_Story by Order
Random effects:
  Story Correlations (By Order)

Results

Descriptive Data

Cell entries are M (SD).

ORDER        BABY            BOOK            COLOR           HOME            POOL            TWIN            Grand Mean
1            76.00 (34.04)   66.30 (35.58)   82.58 (28.39)   105.37 (35.22)  93.26 (34.03)   69.83 (22.54)   81.53 (33.65)
2            78.21 (29.66)   71.10 (32.43)   78.68 (31.28)   95.44 (38.23)   81.85 (36.05)   105.55 (30.35)  84.40 (34.20)
3            69.00 (29.03)   66.46 (27.04)   111.00 (35.83)  81.50 (35.18)   82.38 (35.79)   95.21 (35.47)   83.13 (35.74)
Grand Mean   74.33 (30.65)   67.91 (31.18)   89.41 (34.24)   93.86 (36.91)   85.65 (35.12)   88.83 (32.84)   83.02 (34.47)

Tests of Fixed Effects

EFFECT                   NUM DF   DEN DF   F VALUE   Pr > F
TPRI_FLUENCY(T_STORY)    5        226      117.8     <.0001
ORDER                    2        226      2.4       0.0931
DIBELS STORY             5        226      19.0      <.0001
DIBELS STORY X ORDER     10       226      0.75      0.6798

Pairwise Differences in LS Means

Story   LS Mean   Baby Diff   Book Diff   Color Diff   Home Diff   Pool Diff
BABY    62.20
BOOK    50.30     11.90
COLOR   69.46     -7.26       -19.16
HOME    68.05     -5.85       -17.75      1.41
POOL    65.28     -3.08       -14.98      4.18         2.77
TWIN    67.03     -4.83       -16.73      2.43         1.02        -1.75
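The pairwise differences above follow directly from the LS means: each entry is the column story's LS mean minus the row story's LS mean. A short sketch of that arithmetic, using the LS means reported on this slide (the function name is hypothetical):

```python
# LS means (WCPM) for the six DORF stories, as reported above.
ls_means = {
    "BABY": 62.20,
    "BOOK": 50.30,
    "COLOR": 69.46,
    "HOME": 68.05,
    "POOL": 65.28,
    "TWIN": 67.03,
}


def pairwise_differences(means):
    """Return {(column_story, row_story): column LS mean - row LS mean}."""
    stories = list(means)
    diffs = {}
    for i, row in enumerate(stories):
        # Only the lower triangle: columns that come before this row.
        for col in stories[:i]:
            diffs[(col, row)] = round(means[col] - means[row], 2)
    return diffs


diffs = pairwise_differences(ls_means)
# The pair of stories with the largest absolute gap in adjusted fluency.
largest_pair, largest_gap = max(diffs.items(), key=lambda kv: abs(kv[1]))
```

With these values the largest gap is between BOOK and COLOR, about 19 WCPM after adjusting for TPRI fluency.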

What about rate of growth?

Real interest in DORF passages is to estimate rate of skill acquisition

Typical Practice
  Test students every 2 weeks
  Compute a best-fitting straight line through the data
  Students with low rates are targeted for intervention or adjustments to instruction
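The best-fitting straight line in this practice is an ordinary least-squares fit of WCPM on time, and its slope is the estimated growth rate. The sketch below assumes that approach and uses a hypothetical student whose true fluency never changes (80 WCPM) but who reads four forms of unequal difficulty; the form offsets are invented for illustration only.

```python
def ols_slope(times, scores):
    """Least-squares slope of scores on times (WCPM per unit of time)."""
    n = len(times)
    if n != len(scores) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_y = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("times must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, scores))
    return sxy / sxx


weeks = [0, 2, 4, 6]                 # probes every two weeks
true_fluency = 80.0                  # hypothetical: no real growth
form_offsets = [-10.0, 0.0, 5.0, 15.0]  # hypothetical form difficulty effects
observed = [true_fluency + d for d in form_offsets]

apparent_growth = ols_slope(weeks, observed)         # 4.0 WCPM per week
actual_growth = ols_slope(weeks, [true_fluency] * 4)  # 0.0 WCPM per week
```

In this invented case the fitted line suggests a gain of 4 WCPM per week even though nothing about the student changed, which is the kind of distortion a form effect can introduce into slope estimates.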

Descriptive Data over 4 Waves

Cell entries are M (SD).

WAVE         BABY            BOOK            COLOR           HOME            POOL            TWIN            Grand Mean
1            74.33 (30.65)   67.91 (31.18)   89.41 (34.24)   93.86 (36.91)   85.65 (35.12)   88.83 (32.84)   83.02 (34.47)
2            107.95 (35.44)  61.27 (33.13)   97.33 (34.07)   87.75 (33.82)   77.96 (31.74)   81.10 (33.67)   83.81 (36.19)
3            93.89 (38.49)   97.00 (38.34)   83.00 (35.91)   79.35 (29.61)   71.14 (20.65)   80.95 (28.94)   83.93 (32.91)
4            87.11 (38.28)   86.69 (41.26)   88.19 (37.73)   75.63 (27.92)   105.58 (33.71)  81.57 (26.57)   86.81 (34.54)
Grand Mean   84.26 (35.63)   73.68 (36.03)   89.37 (34.85)   86.65 (34.02)   84.65 (33.46)   85.02 (31.20)   83.91 (34.48)

Tests of Fixed Effects

EFFECT                   NUM DF   DEN DF   F VALUE   Pr > F
TPRI_FLUENCY(STORY)      5        224      137.8     <.0001
WAVE                     1        115      0.24      0.6239
GROUP                    5        110      3.27      0.0086
WAVE*GROUP               5        115      3.47      0.0047

Estimated Growth Rates

LSMean Fluency by Wave and Group

Linear and Quadratic Trends by Group

Group   Wave 1               Wave 2   Wave 3   Wave 4   Linear   Quadratic
A       Twin, Color, Baby    Book     Pool     Home      2.23     4.44
B       Color, Baby, Book    Pool     Home     Twin      0.01     1.51
C       Baby, Book, Pool     Home     Twin     Color     1.08    -0.59
D       Book, Pool, Home     Twin     Color    Baby      0.09     0.49
E       Pool, Home, Twin     Color    Baby     Book     -3.66    -3.54
F       Home, Twin, Color    Baby     Book     Pool     -0.96     3.21
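One common way to summarize linear and quadratic trend across four equally spaced waves is with orthogonal polynomial contrasts. The sketch below assumes that approach; the slides do not state how the Linear and Quadratic columns were computed or scaled, so the wave means here are hypothetical and the results are not meant to reproduce the table.

```python
# Standard orthogonal polynomial coefficients for four equally spaced points.
LINEAR = (-3, -1, 1, 3)
QUADRATIC = (1, -1, -1, 1)


def contrast(coefficients, wave_means):
    """Weighted sum of wave means under a set of contrast coefficients."""
    if len(coefficients) != len(wave_means):
        raise ValueError("need one coefficient per wave")
    return sum(c * m for c, m in zip(coefficients, wave_means))


# Hypothetical group means (WCPM) at waves 1-4.
steady = [80.0, 82.0, 84.0, 86.0]   # constant gain, no curvature
dip = [80.0, 70.0, 70.0, 80.0]      # no net gain, U-shaped

steady_trends = (contrast(LINEAR, steady), contrast(QUADRATIC, steady))
dip_trends = (contrast(LINEAR, dip), contrast(QUADRATIC, dip))
```

A group that reads harder forms in the middle waves can show a pattern like `dip`, with a large quadratic component and no linear trend, without any change in underlying skill.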

Conclusions


Form effects in PM assessments must be addressed if teachers are to:
  Form valid inferences about student progress
  Target the right students for intervention and supplemental instruction

The problem is not one of reliability in terms of low correlation between alternate forms

The problem is one of inconsistency in scaling across forms

Conclusions (cont.)

These form effects adversely affect the reliability and validity of slope estimates.

The problem is not unique to DIBELS, nor to CBM, but it has been ignored in this literature.

CBM was chosen for this study because of its popularity for PM assessment.

The CBM literature implies that fluency (WCPM) inherently provides a constant scale.

For WCPM to provide a constant scale, forms must be parallel

A more viable solution is to remove “form effects” through scaling of the raw ORF scores

We have to develop a scale score that takes “form difficulty” into account

One potential solution is equipercentile equating

Progress Monitoring

Solution is to empirically equate forms and develop a scale score metric that factors out form differences

Because of the large number of forms in use, we propose a “FEDEX” model that equates all forms to a single standard form based on percentiles
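The idea of equating every form to a single standard form based on percentiles can be sketched as follows. This is a minimal, unsmoothed illustration of equipercentile equating under the assumption that comparable groups of students read each form; the function names and sample scores are hypothetical, and the details of the proposed "FEDEX" model are not given in these slides.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, sample):
    """Proportion of the sample below `score`, counting ties as half."""
    ordered = sorted(sample)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return (below + 0.5 * ties) / len(ordered)


def score_at_rank(p, sample):
    """Inverse of percentile_rank, interpolating between sorted scores."""
    ordered = sorted(sample)
    n = len(ordered)
    # The i-th sorted score (0-based) sits at percentile rank (i + 0.5) / n.
    pos = min(max(p * n - 0.5, 0.0), n - 1.0)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def equate_to_standard(score, form_sample, standard_sample):
    """Map a raw score on one form to the standard form's scale."""
    return score_at_rank(percentile_rank(score, form_sample), standard_sample)


# Hypothetical WCPM samples: this form runs about 10 WCPM harder.
hard_form = [40, 50, 60, 70, 80]
standard_form = [50, 60, 70, 80, 90]

scaled = equate_to_standard(65, hard_form, standard_form)  # about 75
```

Operational equating would need much larger samples and usually some smoothing of the score distributions; the point here is only that a raw score is replaced by the standard-form score holding the same percentile rank.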
