Page 1

Identifying the gaps in state assessment systems

CCSSO Large-Scale Assessment Conference, Nashville

June 19, 2007

Sue Bechard, Office of Inclusive Educational Assessment

Ken Godin

Page 2

Research Questions

Of all the students who are not proficient, how can states identify those who are in the assessment gap?

Who are the students in the gaps, what are their attributes, and how do they perform?

Page 3

Gap identification process

• Conduct exploratory interviews with teachers to identify the assessment gaps
• Review student assessment data
• Review teacher judgment data
• Operationalize gap criteria
• Conduct focused teacher interviews to confirm gap criteria

Parker and Saxon: Teacher views of students and assessments

Bechard and Godin: Finding the real assessment gaps

Page 4

Data sources

• State assessment data – grade 8 mathematics results from two systems
  – General large-scale test results
  – Demographics (special programs, ethnicity, gender)
  – Teachers’ judgments of students’ classroom work
  – Student questionnaires completed at time of test
  – Accommodations used at time of test

• State databases for additional student demographic data
  – Disability classification
  – Free/reduced lunch
  – Attendance

• Student-focused teacher interviews

Page 5

Why use teacher judgment of students’ classroom performance?

Gap 1: the test may not reflect classroom performance

Teachers see students performing proficiently in class, but test results are below proficient.

Gap 2: the test may not be relevant for instructional planning

Teachers rate students’ class work as low as possible and test results are at “chance” level. No information is generated on what students can do.

Page 6

Teacher judgment instructions

The instructions were clear that this was to be a judgment of the student’s demonstrated achievement on GLE-aligned academic material in the classroom, not a prediction of test performance.

NECAP: The teacher judgment field consisted of 12 possibilities – each of the 4 achievement levels had low, medium, and high divisions.

MEA: The teacher judgment field consisted of 4 possibilities – one possibility per achievement level.

(For comparisons across the two systems, we used a version of the NECAP judgments collapsed down to the 4 achievement levels.)
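For reference, a minimal sketch of this collapsing step, assuming the NECAP judgment field is coded 1–12 from lowest to highest with low/medium/high subdivisions inside each achievement level (the actual field coding is not shown in the slides):

```python
def collapse_necap_judgment(judgment_12: int) -> int:
    """Collapse a 12-category NECAP teacher judgment (hypothetical coding:
    1 = lowest, 12 = highest, three subdivisions per achievement level)
    down to the 4 achievement levels used for cross-system comparisons."""
    if not 1 <= judgment_12 <= 12:
        raise ValueError("NECAP teacher judgment must be coded 1-12")
    return (judgment_12 - 1) // 3 + 1   # 1-3 -> level 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4

# Example: a "high" rating within achievement level 2 (coded 6) collapses to level 2.
assert collapse_necap_judgment(6) == 2
```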

Page 7

Research on validity of teacher judgment

While there are some conflicting results, the most accurate judgments were found when:

• teachers were given specific evaluation criteria
• levels of competency were clearly delineated
• criterion-referenced tests in mathematics or reading were the matching measure
• criterion-referenced tests reflected the same content as did classroom assessments
• judgments were of older students who had no exceptional characteristics, and
• teachers were asked to assign ratings to students, not to rank-order them

Page 8

Validation of teacher judgment data from NECAP and MEA

Data were collected to serve as “Round 1” cutpoints (of 3 rounds) during standard setting.

Validation studies were conducted which asked:
• Were there differences between the sample of students with non-missing teacher judgment data and the rest of the population?
• Were there suspicious trends in the judgment data suggesting that teachers did not take the task seriously?
• How did teacher judgments compare with students’ actual test scores?

Results of these investigations were considered supportive of using the teacher judgment data for standard setting.
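The slides do not show how these checks were implemented; the sketch below only illustrates the kind of comparisons involved, assuming a hypothetical per-student table with columns scaled_score, achieved_level, and teacher_judgment (missing where no judgment was submitted).

```python
import pandas as pd

def judgment_validation_summary(df: pd.DataFrame) -> dict:
    """Illustrative validation checks on teacher judgment data (not the study's code).

    Expects one row per student with columns:
      scaled_score     -- test scaled score
      achieved_level   -- achievement level earned on the test (1-4)
      teacher_judgment -- teacher-judged achievement level (1-4), NaN if missing
    """
    has_tj = df["teacher_judgment"].notna()
    return {
        # Do students with judgments differ from those without?
        "mean_score_with_judgment": df.loc[has_tj, "scaled_score"].mean(),
        "mean_score_without_judgment": df.loc[~has_tj, "scaled_score"].mean(),
        # A degenerate distribution (e.g., nearly all one level) would be suspicious.
        "judgment_distribution": df.loc[has_tj, "teacher_judgment"]
                                   .value_counts(normalize=True).to_dict(),
        # How closely do judgments track the achievement levels earned on the test?
        "exact_agreement_rate": (df.loc[has_tj, "teacher_judgment"]
                                 == df.loc[has_tj, "achieved_level"]).mean(),
    }
```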

Page 9

Teacher judgment vs. test performance (NECAP)

Mathematics Achievement Levels – Student Performance and Teacher Judgments: NECAP

Achievement Level | Overall Mathematics Performance (N=36,708) | Teacher Judgments* (n=24,168)
4 Proficient with Distinction | 12.9% | 17.9%
3 Proficient | 40.6% | 39.7%
Levels 3 & 4 combined | 53.5% | 57.6%
2 Partially Proficient | 21.6% | 31.0%
1 Substantially Below Proficient | 24.9% | 11.4%
Levels 1 & 2 combined | 46.5% | 42.4%
Test Floor† | (4.6%) | —

* Collapsed from 12 to 4 categories
† Students scoring within error of the bottom of the scale (i.e., at a chance score); a subset of Achievement Level 1.
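The combined rows in this reconstruction are read as sums of adjacent achievement levels (at/above proficient, and below proficient); the NECAP columns are consistent with that reading:

\[
12.9\% + 40.6\% = 53.5\%, \qquad 17.9\% + 39.7\% = 57.6\%
\]
\[
21.6\% + 24.9\% = 46.5\%, \qquad 31.0\% + 11.4\% = 42.4\%
\]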

Page 10

Teacher judgment vs. test performance (MEA)

Mathematics Achievement Levels – Student Performance and Teacher Judgments: MEA

Achievement Level | Overall Mathematics Performance (N=16,213) | Teacher Judgments (n=10,319)
4 Exceeds the Standards | 10.6% | 9.6%
3 Meets the Standards | 34.1% | 41.0%
Levels 3 & 4 combined | 44.7% | 50.6%
2 Partially Meets the Standards | 29.3% | 35.9%
1 Does Not Meet the Standards | 26.0% | 13.5%
Levels 1 & 2 combined | 55.3% | 49.4%
Test Floor† | (8.1%) | —

† Students scoring within error of the bottom of the scale (i.e., at a chance score); a subset of Achievement Level 1.

Page 11

Operationalizing the gap definitions using teacher judgment

Operationalizations of the Two Gaps (Grade 8 Mathematics Test)

Gap 1: student performance ≤ 1 S.E.M. below the sub-proficient/proficient cutscore, but teacher judgment ≥ Proficient.

Non-gap 1: student performance ≤ 1 S.E.M. below the sub-proficient/proficient cutscore and, if the score was within 1 S.E.M. of the achievement level 2 boundaries, a level 2 teacher judgment, or, if the score was within 1 S.E.M. of the achievement level 1 boundaries, a level 1 teacher judgment.

Gap 2: student performance within 1 S.E.M. of the floor of the test and teacher judgment matched as closely as possible within the assessment system (NECAP: lowest available rating within level 1; MEA: level 1).

Non-gap 2: student performance within 1 S.E.M. of the floor of the test and teacher judgment too high (NECAP: next higher available rating within level 1; MEA: level 2).

Comparison: student performance ≥ 1 S.E.M. above the sub-proficient/proficient cutscore and teacher judgment ≥ Proficient.
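A minimal code sketch of these operationalizations, not the study’s actual procedure: it assumes per-student fields for the scaled score, the collapsed teacher judgment (1–4), and the test-based achievement level, plus the proficient cutscore, the test floor (chance-level score), and the S.E.M. It reads “≤ 1 S.E.M. below the cutscore” as a score at least one S.E.M. below the cut, simplifies the NECAP-specific “lowest available within level 1” check to a level-1 judgment, and assigns one label per student even though the study’s groups could overlap (some non-gap 1 students also met the gap 2 criterion).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Student:
    score: float          # mathematics scaled score
    judgment: int         # collapsed teacher judgment, 1 (lowest) to 4
    test_level: int       # achievement level implied by the test score, 1 to 4

def classify(s: Student, cutscore: float, floor: float, sem: float) -> Optional[str]:
    """Assign a gap-group label following the operationalizations above (sketch only)."""
    # Gap 2 / non-gap 2: performance within 1 S.E.M. of the floor of the test.
    if s.score <= floor + sem:
        return "gap 2" if s.judgment == 1 else "non-gap 2"
    # Gap 1 / non-gap 1: performance at least 1 S.E.M. below the proficient cutscore.
    if s.score <= cutscore - sem:
        if s.judgment >= 3:                 # judged Proficient or above
            return "gap 1"
        if s.judgment == s.test_level:      # judgment matches the test-based level
            return "non-gap 1"
        return None
    # Comparison: at least 1 S.E.M. above the cutscore and judged Proficient or above.
    if s.score >= cutscore + sem and s.judgment >= 3:
        return "comparison"
    return None
```

With one record per student, the gap-group percentages reported on the later slides would then simply be the relative frequencies of these labels (subject to the overlap noted above).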

Page 12

Page 13

Student questionnaires (answered after taking the test)

1. How difficult was the mathematics test?
A. harder than my regular mathematics schoolwork
B. about the same as my regular mathematics schoolwork
C. easier than my regular mathematics schoolwork

2. How hard did you try on the mathematics test?
A. I tried harder on this test than I do on my regular mathematics schoolwork.
B. I tried about the same as I do on my regular mathematics schoolwork.
C. I did not try as hard on this test as I do on my regular mathematics schoolwork.

Page 14

Accommodations (used during the mathematics test)

NECAP: 16 accommodations listed by category:
• Setting
• Scheduling/timing
• Presentation formats
• Response formats

MEA: 21 accommodations listed by category:
• Setting
• Scheduling
• Modality
• Equipment
• Recording

Page 15

Student-focused teacher interviews

Student profile data
• math test scores (both overall and on subtests)
• specific responses to released math test items
• student’s responses to the questionnaire
• special program status
• accommodations used during testing

Teacher interview questions
• questions regarding perceptions of the students in each gap on various aspects of the gap criteria
• 17 Likert-scale questions on the student’s class work and participation in classroom activities

Page 16

Student-focused teacher interview samples

NECAP sample: 20 8th grade math and special ed teachers in 7 schools across three states (NH, RI, and VT). 51 students: gap 1 = 19, gap 2 = 18, and comparison group = 14.

MEA sample: 7 8th grade math and special ed teachers in 3 schools. 14 students: gap 1 = 4, non-gap 1 = 3, gap 2 = 2, non-gap 2 = 5, and comparison group = 0.

Page 17

Results: Percentages of students in the gaps (NECAP)

Breakdown of Gap Group Designations: NECAP

Group | NECAP (N=24,168)
Gap 1 | 8.6%
Non-gap 1 | 8.8%†
Gap 2 | 0.8% [2.3%]*
Non-gap 2 | 1.5% [1.2%]*
Comparison | 39.0%

† 188 (i.e., 8.7%) of the non-gap 1 students scored so low that they also fit the criterion for gap 2.
* Shown in brackets: the percentages if teacher judgments are collapsed to four achievement levels, as on the MEA.

Gap 2 and non-gap 2 percentages differ depending on whether fine-grained or coarse-grained ratings are used.

Page 18

Results: Percentages of students in the gaps (MEA)

Breakdown of Gap Group Designations: MEA

Group | MEA (N=10,319)
Gap 1 | 7.1%
Non-gap 1 | 7.1%†
Gap 2 | 4.3%
Non-gap 2 | 3.1%
Comparison | 31.8%

† 444 (i.e., 60.3%) of the non-gap 1 students scored so low that they also fit the criterion for gap 2.

Page 19

Accommodations use (NECAP)

Mathematics Accommodation Frequencies within Gap and Comparison Groups: NECAP

Within Group | 0 | 1 | 2-3 | 4-6 | 7+
Gap 1 (n=2,070) | 89.8%+ | 3.1%- | 5.6%- | 1.6%- | none-
Non-gap 1 (n=2,129) | 54.3%- | 10.4%+ | 23.7%+ | 10.1%+ | 1.6%+
Gap 2 (n=188) | 26.5% | 15.1% | 30.8% | 22.2% | 5.4%
Non-gap 2 (n=369) | 33.9% | 16.3% | 32.5% | 15.5% | 1.9%
Comparison (n=9,429) | 97.9%+ | 1.3%- | 0.6%- | 0.2%- | none-
Overall Population | 89.8% | 3.1% | 5.6% | 1.6% | none

+ Statistically higher than expected; - Statistically lower than expected

• Students in gap 1 were significantly less likely to use accommodations than students in non-gap 1.
• Only a small percentage of students in gap 1 used any accommodations at all.
• The majority of students in both gap 2 and non-gap 2 used one or more accommodations.
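The slides do not name the test behind the “+”/“-” flags; one conventional way to produce such flags is a chi-square test of independence with standardized residuals, sketched here under that assumption.

```python
import numpy as np
from scipy.stats import chi2_contingency

def flag_cells(observed: np.ndarray, z_crit: float = 1.96):
    """Flag contingency-table cells with standardized residuals beyond +/- z_crit.

    observed -- counts with rows = gap/comparison groups and
                columns = number-of-accommodations bins (0, 1, 2-3, 4-6, 7+).
    Returns (flags, p): flags is +1 / -1 / 0 per cell for higher-than-expected,
    lower-than-expected, or neither; p is the overall chi-square p-value.
    """
    chi2, p, dof, expected = chi2_contingency(observed)
    residuals = (observed - expected) / np.sqrt(expected)
    flags = np.where(residuals > z_crit, 1, np.where(residuals < -z_crit, -1, 0))
    return flags, p
```

Whatever procedure the authors actually used, the flags mark cells where a group uses noticeably more or fewer accommodations than its size alone would predict.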

Page 20

Accommodations use (MEA)

Mathematics Accommodation Frequencies within Gap and Comparison Groups: MEA

Within Group | 0 | 1 | 2-3 | 4-6 | 7+
Gap 1 (n=734) | 86.5%+ | 1.6%- | 6.7%- | 4.5%- | 0.7%-
Non-gap 1 (n=736) | 45.0%- | 4.1%+ | 12.9%+ | 26.4%+ | 11.7%+
Gap 2 (n=444) | 36.3% | 4.3% | 14.4% | 32.0% | 13.1%+
Non-gap 2 (n=318) | 45.0% | 5.7% | 21.1% | 22.3% | 6.0%-
Comparison (n=3,278) | 97.8%+ | 0.6%- | 0.7%- | 0.7%- | 0.1%-
Overall Population | 84.8% | 1.4% | 5.5% | 6.3% | 1.9%

+ Statistically higher than expected; - Statistically lower than expected

Similar patterns of accommodations use are seen for gap 1 on the MEA as in NECAP.

Page 21

Performance of students in gap 1 compared to non-gap 1 on the NECAP

Subpopulation Mean Mathematics Scaled Scores* within Gap Group Designations

Within Group | IEP only | ELL only | IEP & ELL | General Ed
Gap 1 (n=2,070) | 830.9+ | 829.7+ | 827.7+ | 833.2+
Non-gap 1 (n=2,129) | 819.7- | 819.1- | 815.8- | 829.3-
Comparison (n=9,429) | 847.4 | 848.8 | none | 850.2
Overall Population | 828.2 | 827.3 | 817.5 | 842.3

* Achievement level (AL) scale score ranges: AL 1: 800-833; AL 2: 834-839; AL 3: 840-851; AL 4: 852-880 (AL 1 and 2 are below proficient; AL 3 and 4 are proficient and above)
+ Statistically higher than expected; - Statistically lower than expected

Page 22

Performance of students in gap 1 compared to non-gap 1 on the MEA

Subpopulation Mean Mathematics Scaled Scores* within Gap Group Designations

Within Group | IEP only | ELL only | IEP & ELL | General Ed
Gap 1 (n=734) | 823.6+ | 826.0+ | none | 827.7+
Non-gap 1 (n=736) | 808.8- | 812.2- | 808.6 | 812.5-
Comparison (n=3,278) | 855.4 | 856.0 | none | 858.6
Overall Population | 824.1 | 828.0 | 815.1 | 842.6

* Achievement level (AL) scale score ranges: AL 1: 800-828; AL 2: 829-840; AL 3: 841-860; AL 4: 861-880 (AL 1 and 2 are below proficient; AL 3 and 4 are proficient and above)
+ Statistically higher than expected; - Statistically lower than expected

Page 23

Special program status of students in gap 1 (NECAP)

Breakdown of Subpopulations within Gap 1 and Comparison Groups

Within Group | IEP only | ELL only | IEP & ELL | General Ed
Gap 1 (n=2,070) | 14.2%- | 2.3% | 0.1% | 83.4%+
Non-gap 1 (n=2,129) | 50.8%+ | 5.0% | 0.9% | 43.3%-
Comparison (n=9,429) | 2.2% | 0.5% | none | 97.3%
Overall Population | 15.1% | 1.9% | 0.2% | 82.8%

+ Statistically higher than expected; - Statistically lower than expected

• The majority of students in gap 1 were in general education.
• Students with IEPs were under-represented in gap 1 and over-represented in non-gap 1.

Page 24

Special program status of students in gap 1 (MEA)

Breakdown of Subpopulations within Gap 1 and Comparison Groups

Within Group | IEP only | ELL only | IEP & ELL | General Ed
Gap 1 (n=734) | 12.3%- | 1.1% | none | 86.7%+
Non-gap 1 (n=736) | 50.3%+ | 4.6% | 1.0% | 44.2%-
Comparison (n=3,278) | 2.5% | 0.7% | none | 96.7%
Overall Population | 14.7% | 1.3% | 0.1% | 83.9%

+ Statistically higher than expected; - Statistically lower than expected

The composition of gap 1 was similar on the MEA.

Page 25

Disability designations in gap 1

Learning disabilities (NECAP):
• Gap 1: 57.7% of the IEP gap 1 group (n=208)
• Non-gap 1: 49.7% of the IEP non-gap 1 group (n=860)
• Comparison: 49.2% of the IEP comparison group (n=83)
• Total population: 52% of students with IEPs (N=4,465)

Disability designations only seen in non-gap 1:
• NECAP: students with learning impairments, deafness, multiple disabilities, and traumatic brain injury
• MEA: students with learning impairments and traumatic brain injury

Page 26

Additional characteristics of students in gap 1 compared to non-gap 1

Gap 1 students:
• Were more likely to be female and white
• Had the fewest absences
• Had higher SES
• Found the state test about the same level of difficulty as class work
• Exhibited academic and mathematics-appropriate behaviors in class

Page 27

Performance of students in gap 2 on the test (NECAP and MEA)

By definition, students in both gap 2 and non-gap 2 scored no better than chance on the assessment.

Page 28

Special program status of students in gap 2 (NECAP)

Breakdown of Sub-Populations within Gap 2 and Comparison Groups

Within Group | IEP only | ELL only | IEP & ELL | General Ed
Gap 2 (n=185) | 80.0% | 6.5% | 2.7% | 10.8%-
Non-gap 2 (n=369) | 69.4% | 9.8% | 1.6% | 19.2%
Comparison (n=9,429) | 2.2% | 0.5% | none | 97.3%
Overall Population | 15.1% | 1.9% | 0.2% | 82.8%

The majority of students in gap 2 and non-gap 2 were students with IEPs.

Page 29

Special program status of students in gap 2 (MEA)

Breakdown of Sub-Populations within Gap 2 and Comparison Groups

Within Group | IEP only | ELL only | IEP & ELL | General Ed
Gap 2 (n=444) | 57.4% | 4.5% | 1.1% | 36.9%
Non-gap 2 (n=318) | 47.8% | 2.5% | 0.9% | 48.7%
Comparison (n=3,278) | 2.5% | 0.7% | none | 96.7%
Overall Population | 14.7% | 1.3% | 0.1% | 83.9%

MEA results show the majority of the students in gap 2 had IEPs.

The percentages of students in general education in the gap 2 and non-gap 2 groups are higher than in NECAP.

Page 30

Disability designations in gap 2

Learning disabilities: Fewer than half of the students in the gap 2 groups had learning disabilities in both systems.

Other disability designations differed between the two systems.

NECAP: Students who were deaf/blind and those with multiple disabilities were only found in gap 2. Students with hearing impairments, deafness, and traumatic brain injury were only found in non-gap 2.

MEA: Students with hearing impairments were only in gap 2. Students with visual impairments or blindness were only in non-gap 2.

Page 31

Additional characteristics of students in gap 2 compared to non-gap 2

Students in gap 2 were very similar to students in non-gap 2 on most variables.

Students from both groups felt that the test was as hard as or harder than their schoolwork.

They tried as hard on the test as they do in class, or harder.

They used mathematics tools in the classroom (e.g., calculators).

Page 32

Summary: How many students are in the gaps?

10.9% - 11.4% of the total student population in two systems are in gaps 1 & 2.

NECAP: Gap 1 = 8.6%, Gap 2 = 2.3%

MEA: Gap 1 = 7.1%, Gap 2 = 4.3%
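These totals follow directly from the gap percentages above:

\[
\text{NECAP: } 8.6\% + 2.3\% = 10.9\%, \qquad \text{MEA: } 7.1\% + 4.3\% = 11.4\%
\]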

Page 33

Summary

We found substantial differences between the compositions of the gap 1 and non-gap 1 groups, and these differences held in both systems.

Gap 1 students may have characteristics and behaviors that mask their difficulties.

Non-gap 1 students are those generally thought to be in the “achievement gap”.

Page 34

Summary (cont.)

Low-performing students in gap 2 and non-gap 2 share many characteristics.

Their extremely low performance on both classroom activities and the test raises questions about the relevance of the general assessment for them.

Page 35

Conclusions

• For students in gap 1, increase the focus on classroom supports and on training in how to transfer their knowledge and skills from classroom to assessment environments.

• For students in non-gap 1, examine expectations and opportunities to learn. Providing a different test based on modified academic achievement standards is premature.

• Students with IEPs in gap 2 and non-gap 2 may benefit from the 2% option for AYP and an alternate assessment based on modified academic achievement standards (AA-MAAS).

• There will be challenges in designing a test based on MAAS that is strictly aligned with grade-level content.

Page 36

www.measuredprogress.org

[email protected]
[email protected]