Liru Zhang, Delaware DOE Shudong Wang, NWEA Presented at the 2015 NCSA Annual Conference, San Diego, CA 1


Seeking Validity Evidence of Passage-Based Computer Adaptive Reading Test with Item-Level Selection

Liru Zhang, Delaware DOE; Shudong Wang, NWEA

Presented at the 2015 NCSA Annual Conference, San Diego, CA


Background (1)

At the heart of the Common Core State Standards (CCSS) in English Language Arts/Literacy (ELA/LIT) is a shift toward instruction centered on text. The standards focus on the growing complexity of texts (or passages) and on using evidence drawn from texts to present analyses and well-defended claims.

To align K-12 assessments with the CCSS, students are expected to read and comprehend grade-appropriate passages across content categories on a variety of topics, and to respond to a range of passage-dependent questions that require inferences drawn from the passage, rather than questions that can be answered from prior knowledge and experience (Coleman and Pimentel, 2012).


Background (2)

Driven by innovative technology, the advantages of online testing, and the encouragement of educational policy, computerized adaptive testing (CAT) has been widely implemented in K-12 assessment.

In CAT, the adaptive process is typically based on item-level selection. The ultimate goals are to satisfy the test specifications, match the student's provisional ability, and control item exposure rates.

In reading comprehension, students are expected to read and comprehend grade-appropriate passages across a range of genres on a variety of topics. Some texts, such as those in philosophy, literature, or scientific research, may be more difficult to comprehend than others because of the knowledge they require, multiple resources, structure, and/or other features.
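As a rough illustration of item-level selection, the sketch below picks the next item for a Rasch-based CAT by maximum Fisher information at the provisional ability, skips items whose content standard has already reached its maximum count, and applies a simple "randomesque" exposure control (a random choice among the k most informative eligible items). The `Item` fields, the quota dictionaries, the standard labels, and the default k = 5 are hypothetical choices for illustration, not the operational algorithm evaluated in the study.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    b: float        # Rasch difficulty in logits
    standard: str   # content category label, e.g. "Std2" or "Std4" (hypothetical)


def information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at ability theta, i.e. P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)


def select_next_item(theta, pool, administered, counts, max_counts, k=5, rng=random):
    """Hypothetical item-level selection rule.

    1. Keep items not yet administered whose content standard is still below
       its maximum count (content constraint).
    2. Rank them by Fisher information at the provisional ability theta.
    3. Pick at random among the top k ("randomesque" exposure control).
    """
    eligible = [
        item for item in pool
        if item.item_id not in administered
        and counts.get(item.standard, 0) < max_counts[item.standard]
    ]
    if not eligible:
        return None
    eligible.sort(key=lambda item: information(theta, item.b), reverse=True)
    return rng.choice(eligible[:k])


if __name__ == "__main__":
    pool = [Item("a", -1.0, "Std2"), Item("b", 0.1, "Std2"), Item("c", 0.0, "Std4")]
    limits = {"Std2": 40, "Std4": 20}
    print(select_next_item(0.0, pool, set(), {}, limits, k=1).item_id)  # c
```

Note that nothing in this rule looks at passages: an item is chosen on its own merits, which is exactly why passage-level balance is not guaranteed.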


Purposes and Methods

The current study investigates whether item-level selection can achieve content balance at both the item level and the passage level, in alignment with the CCSS, for passage-based reading comprehension.

Following the Standards for Educational and Psychological Testing (1999), the study collected supporting evidence (e.g., on the item selection procedure and item exposure rates) and validity evidence of parallel construct across individual tests to ensure that the content standards are adequately represented.

The CAT reading comprehension test is Rasch-based, with a fixed length of 50 on-grade and off-grade items. Test results are reported on a vertical scale and in four performance levels. Student responses were collected in grades 5 and 9 from the fall and spring operational administrations.
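Because the test is Rasch-based, the provisional ability that drives item selection can be re-estimated after each response. A minimal sketch of maximum-likelihood estimation by Newton-Raphson follows, assuming known item difficulties in logits; the starting value, the clamping bound of ±4 logits, and the iteration limits are illustrative assumptions, and operational engines often use other estimators (e.g., Bayesian ones) early in a test, when every response may be correct or incorrect.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta_mle(responses, difficulties, theta0=0.0, max_iter=50, tol=1e-6, bound=4.0):
    """Newton-Raphson maximum-likelihood ability estimate under the Rasch model.

    responses: 0/1 scores; difficulties: matching item difficulties (logits).
    The estimate is clamped to [-bound, bound] because the MLE diverges for
    all-correct or all-incorrect response patterns.
    Returns (theta, standard_error).
    """
    theta = theta0
    for _ in range(max_iter):
        ps = [rasch_p(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, ps))  # d logL / d theta
        info = sum(p * (1.0 - p) for p in ps)                 # test information
        step = gradient / info
        theta = max(-bound, min(bound, theta + step))
        if abs(step) < tol:
            break
    info = sum(rasch_p(theta, b) * (1.0 - rasch_p(theta, b)) for b in difficulties)
    return theta, 1.0 / math.sqrt(info)


if __name__ == "__main__":
    # Two correct answers and one incorrect on three items of difficulty 0:
    # the likelihood equation is 3 * P = 2, so theta = ln 2.
    print(estimate_theta_mle([1, 1, 0], [0.0, 0.0, 0.0]))
```

The likelihood equation sets the observed score equal to the sum of model probabilities, which is why the worked case reduces to 3P = 2.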


Process for Analyses

Item Pool Review and Evaluation: Each grade had two overlapping item pools, the initial pool used in the fall and the enhanced pool used in the spring. The pool evaluation focused on whether each pool was sufficient to support the constraints in the test specifications.

Empirical Analyses: These focused on the adequacy of validity evidence for content balance and parallel construct across individual tests by student achievement level, for example in the number of passages and associated items per passage; the content category, topic, gender, Lexile, and difficulty level of passages; the use of on-/off-grade items and passages; and conditional exposure and overlap rates.

Expert Review: This focused on content balance at the passage and item levels, based on a randomly selected sample of 100 individual tests per grade per operation by student achievement level.
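The conditional exposure and overlap rates used in the empirical analyses can be computed directly from the administered tests. A minimal sketch follows, assuming each administered test is stored as an (achievement level, set of item or passage IDs) pair. Here the exposure rate of an item within a level is the proportion of that level's tests containing it, and the overlap rate is taken as the average, over all pairs of tests, of the shared items divided by the mean length of the two tests; this is one common definition among several, not necessarily the one used in the study.

```python
from collections import Counter, defaultdict
from itertools import combinations


def conditional_exposure_rates(tests):
    """tests: list of (achievement_level, set_of_ids).

    Returns {level: {id: proportion of that level's tests containing the id}}.
    """
    counts = defaultdict(Counter)
    n_tests = Counter()
    for level, ids in tests:
        n_tests[level] += 1
        counts[level].update(ids)  # each id in a set is counted once per test
    return {
        level: {item: c / n_tests[level] for item, c in counter.items()}
        for level, counter in counts.items()
    }


def mean_overlap_rate(id_sets):
    """Average pairwise overlap: shared ids divided by the mean length of the two tests."""
    pairs = list(combinations(id_sets, 2))
    if not pairs:
        return 0.0
    total = sum(2 * len(a & b) / (len(a) + len(b)) for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    tests = [(1, {"a", "b"}), (1, {"a", "c"}), (4, {"b", "c"})]
    print(conditional_exposure_rates(tests))
    print(mean_overlap_rate([ids for _, ids in tests]))
```

Conditioning on achievement level matters here because students at the same level tend to receive items of similar difficulty, so an overall exposure rate can hide heavy reuse within a level.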


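Test Specifications

| Constraint | Unit | Grade 5 Min.-Max. | Grade 5 % | Grade 9 Min.-Max. | Grade 9 % |
|---|---|---|---|---|---|
| Item content category: Standard 2 | items | 30-40 | 60-80 | 30-40 | 60-80 |
| Item content category: Standard 4 | items | 10-20 | 20-40 | 10-20 | 20-40 |
| Passage type: Informational | passages | 4-5 | 50 | 6-7 | 70 |
| Passage type: Literary | passages | 4-5 | 50 | 2-3 | 30 |
| Passage type: Total | passages | 8-10 | | 8-10 | |
| Item cognitive level: DOK 1 | items | 0-20 | 0-30 | 0-20 | 0-30 |
| Item cognitive level: DOK 2 | items | 0-40 | 0-60 | 0-40 | 0-60 |
| Item cognitive level: DOK 3 | items | 0-6 | 0-10 | 0-6 | 0-10 |
| On-grade | items | 40-50 | | 40-50 | |
| Off-grade | items | 0-10 | | 0-10 | |
| Item type: MC | items | 49-50 | | 49-50 | |
| Item type: TE | items | 0-1 | | 0-1 | |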


Matching Test Spec. in Grade 5

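| Constraint | Test Spec. N | Test Spec. % | Pool (%) | Operation Mean (N) | Operation Min.-Max. (N) |
|---|---|---|---|---|---|
| Passage level (total) | 8-10 | | | 7.1-8.9 | 6-13 |
| Informational | 4-5 | 50 | 52 | 3.7-5.0 | 2-5 |
| Literary | 4-5 | 50 | 48 | 3.5-3.9 | 2-8 |
| Item level (total) | 50 | | | 50 | |
| Std. 2 | 30-40 | 60-80 | 78-80 | 36.4-37.2 | 31-40 |
| Std. 4 | 10-20 | 20-40 | 20-22 | 12.8-13.6 | 10-19 |
| On-Gr | 40-50 | 80-100 | 44-48 | 43.3-43.8 | 39-50 |
| Off-Gr | 0-10 | 0-20 | 52-56 | 6.2-6.7 | 0-11 |
| MC | 49-50 | 98-100 | | 49.7-49.9 | 48-50 |
| TE | 0-1 | 0-2 | | 0.1-0.3 | 0-2 |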


Matching Test Spec. in Grade 9

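| Constraint | Test Spec. N | Test Spec. % | Pool (%) | Operation Mean (N) | Operation Min.-Max. (N) |
|---|---|---|---|---|---|
| Passage level (total) | 8-10 | | | 7.2-8.0 | 6-10 |
| Informational | 6-7 | 70 | 70-76 | 4.1-4.8 | 3-7 |
| Literary | 2-3 | 30 | 24-30 | 3.1-3.2 | 1-5 |
| Item level (total) | 50 | | | 50 | |
| Std. 2 | 30-40 | 60-80 | 78 | 37.3-37.8 | 31-40 |
| Std. 4 | 10-20 | 20-40 | 22 | 12.2-12.7 | 10-19 |
| On-Gr | 40-50 | 80-100 | 43-52 | 44.8 | 39-50 |
| Off-Gr | 0-10 | 0-20 | 48-57 | 5.2 | 0-11 |
| MC | 49-50 | 98-100 | | 49.6-49.9 | 44-50 |
| TE | 0-1 | 0-2 | | 0.1-0.4 | 0-6 |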


Sample Individual Test 1 – Grade 5 (AL 1)

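| Category | Content | N. Items | Type | Gender | On-Grade | Lexile | Length |
|---|---|---|---|---|---|---|---|
| Topic | Animal | 7 | I | N | Y | 1020L | 815 |
| Topic | Career | 6 | I | N | Y | 1250L | 825 |
| Genre | Biography | 1 | I | F | Y | 880L | 632 |
| Genre | Biography | 6 | I | N | Y | 1110L | 791 |
| Genre | Legend-Folktale | 1 | L | M | Y | 870L | 582 |
| Genre | Realistic Fiction | 1 | L | M | N | 500L | 341 |
| Genre | Realistic Fiction | 5 | L | M | Y | 1130L | 668 |
| Genre | Realistic Fiction | 1 | L | F | Y | 740L | 981 |
| Genre | Realistic Fiction | 6 | L | F | N | 570L | 984 |
| Structure | Pair | 8 | L | M | Y | 890L | 112 |
| Structure | Pair | 7 | I | M | Y | 1290L | 681 |
| Total | 12 passages | 50 | 5/7 | 4/5/3 | 10/2 | | |

CCSS: 830L-1010L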
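The passage-level shortfall in this sample test can be checked mechanically against the Grade 5 specification. A minimal sketch follows that encodes the min./max. counts from the test specifications as ranges and checks the totals reported for Sample Individual Test 1 (12 passages, 5 informational and 7 literary, 50 items). The dictionary layout and the `check_against_spec` helper are hypothetical names chosen for illustration.

```python
# Grade 5 min./max. counts taken from the test specifications.
GRADE5_SPEC = {
    "passages": (8, 10),
    "informational_passages": (4, 5),
    "literary_passages": (4, 5),
    "items": (50, 50),
}


def check_against_spec(observed, spec):
    """Return {constraint: (observed, min, max, within_range)} for each constraint."""
    report = {}
    for name, (lo, hi) in spec.items():
        value = observed[name]
        report[name] = (value, lo, hi, lo <= value <= hi)
    return report


# Totals reported for Sample Individual Test 1 (Grade 5, AL 1).
sample_test_1 = {
    "passages": 12,
    "informational_passages": 5,
    "literary_passages": 7,
    "items": 50,
}

for name, (value, lo, hi, ok) in check_against_spec(sample_test_1, GRADE5_SPEC).items():
    print(f"{name:24s} {value:3d}  spec {lo}-{hi}  {'OK' if ok else 'OUT OF RANGE'}")
```

Run as-is, it reports the item total and the informational count as within range but the total passage count and the literary count as out of range: the fixed 50-item length is met while the passage-level constraints are not.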



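| Category | Content | N. Items | Type | Gender | On-Grade | Lexile | Length |
|---|---|---|---|---|---|---|---|
| Topic | Career | 6 | I | N | Y | 1250L | 825 |
| Topic | Sports | 1 | I | M | Y | 910L | 717 |
| Format | How-to-Do | 8 | I | N | Y | 970L | 679 |
| Genre | Poem | 6 | L | N | Y | 1100L | 100 |
| Genre | Realistic Fiction | 8 | L | N | N | 960L | 575 |
| Structure | Pair | 6 | I | M | Y | 1000L | 550 |
| Structure | Pair | 8 | L | M | Y | 890L | 1012 |
| Total | 8 passages | 50 | 4/4 | 1/3/4 | 7/1 | | |

CCSS: 830L-1010L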



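| Category | Content | N. Items | Type | Gender | On-Grade | Lexile | Length |
|---|---|---|---|---|---|---|---|
| Topic | Food | 8 | I | N | N | 1290L | 609 |
| Topic | History | 7 | I | M | Y | 970L | 1135 |
| Topic | Science | 7 | I | N | Y | 1320L | 1171 |
| Format | How-to-Do | 7 | I | N | Y | 1160L | 1588 |
| Genre | Realistic Fiction | 8 | L | F | Y | 1060L | 821 |
| Structure | Pair | 2 | L | N | N | 1610L | 991 |
| Structure | Pair | 9 | L | F | Y | 820L | 1031 |
| Total | 8 passages | 50 | 4/4 | 2/2/4 | 6/2 | | |

CCSS: 1050L-1260L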



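| Category | Content | N. Items | Type | Gender | On-Grade | Lexile | Length |
|---|---|---|---|---|---|---|---|
| Topic | Entertainment | 5 | I | N | Y | 1070L | 1188 |
| Topic | Entertainment | 4 | I | N | Y | 1090L | 1130 |
| Topic | Environment | 7 | I | N | Y | 1300L | 1048 |
| Topic | Health | 8 | I | N | Y | 1290L | 761 |
| Genre | Biography | 6 | I | M | N | 1210L | 376 |
| Genre | Realistic Fiction | 6 | L | F | Y | 930L | |
| Total | 8 passages | 50 | 5/3 | 1/3/4 | 7/1 | | |

CCSS: 1050L-1260L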


Findings and Implications (1)

Compared with the test specifications, content balance at the item level is satisfied for Standards 2 and 4, both on average and within the min./max. limits, in both grades. The target proportions of on-grade and off-grade items are generally met in the fixed-length test.

At the passage level, the total number of passages varies greatly from student to student; in the sample tests it ranges from 8 for a student at achievement level 4 to 12 for a student at achievement level 1. The proportions of the two passage types, informational and literary, fail to meet the targets in operation.

In reading, a passage and its associated items are related to each other, but each has its own coding categories and evaluation system. Addressing all constraints and balancing them at both levels is much more complicated in practice than presumed, yet without doing so, meeting the constraints at one level may come at the expense of the other.


Findings and Implications (2)

When students repeatedly receive passages from certain genres with similar topics, or in the same format or structure, this not only limits the breadth of their reading exposure but also introduces bias into the test.

According to the content expert review, one pairing per test is desirable. Paired passages increase the reading demand by adding a passage. More importantly, they increase the cognitive load, because students are asked to make inferences and draw conclusions across passages, not just within each passage.

When students read passages with only one or two associated items so that a fixed-length test can satisfy the test specifications, the reading demand swells unexpectedly, especially for young readers.
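One rough way to quantify this reading demand is the number of words a student reads per item answered. A minimal sketch follows using the item counts and passage lengths listed for Sample Individual Test 1; treating the Length column as a word count and using "words per item" as a load index are assumptions for illustration, not measures used in the study.

```python
# (content, number of items, passage length) for the rows listed in
# Sample Individual Test 1 (Grade 5, AL 1). Length is assumed to be a word count.
SAMPLE_TEST_1 = [
    ("Animal", 7, 815),
    ("Career", 6, 825),
    ("Biography", 1, 632),
    ("Biography", 6, 791),
    ("Legend-Folktale", 1, 582),
    ("Realistic Fiction", 1, 341),
    ("Realistic Fiction", 5, 668),
    ("Realistic Fiction", 1, 981),
    ("Realistic Fiction", 6, 984),
    ("Pair", 8, 112),
    ("Pair", 7, 681),
]


def words_per_item(rows):
    """Return (content, n_items, words read per item answered), heaviest load first."""
    loads = [(content, n, length / n) for content, n, length in rows]
    return sorted(loads, key=lambda row: row[2], reverse=True)


for content, n, load in words_per_item(SAMPLE_TEST_1):
    print(f"{content:18s} {n:2d} item(s)  {load:6.1f} words per item")
```

The four single-item passages top the list at roughly 340 to 980 words per item, while every passage with five or more items comes in under 165 words per item.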


Findings and Implications (3)

To achieve content balance in passage-based adaptive reading tests, an indispensable condition is that all constraints at the item level and the passage level be considered simultaneously.
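To make "considered simultaneously" concrete, the sketch below checks a candidate item against both levels before it may be selected: the item's content standard must be below its item-level maximum, and if the item would open a new passage, the total passage count and that passage type's count must both be below their passage-level maxima. This is a simplified, hypothetical eligibility rule (the class names, fields, and quota dictionaries are illustrative, with limits taken from the Grade 5 specifications), not the study's algorithm; more complete approaches such as shadow tests or passage-based (testlet) selection exist.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    item_id: str
    passage_id: str
    passage_type: str   # "Inf" or "Lit" (hypothetical labels)
    standard: str       # "Std2" or "Std4" (hypothetical labels)


@dataclass
class TestState:
    item_counts: dict = field(default_factory=dict)    # standard -> items given
    passages_seen: dict = field(default_factory=dict)  # passage_id -> passage_type

    def passage_type_count(self, ptype):
        return sum(1 for t in self.passages_seen.values() if t == ptype)

    def add(self, item):
        self.item_counts[item.standard] = self.item_counts.get(item.standard, 0) + 1
        self.passages_seen.setdefault(item.passage_id, item.passage_type)


def is_eligible(item, state, item_max, passage_type_max, max_passages):
    """Check item-level and passage-level constraints together."""
    if state.item_counts.get(item.standard, 0) >= item_max[item.standard]:
        return False                     # item-level content limit reached
    if item.passage_id in state.passages_seen:
        return True                      # continuing an open passage adds no passage
    if len(state.passages_seen) >= max_passages:
        return False                     # total passage cap reached
    return state.passage_type_count(item.passage_type) < passage_type_max[item.passage_type]


# Grade 5 limits from the test specifications.
ITEM_MAX = {"Std2": 40, "Std4": 20}
PASSAGE_TYPE_MAX = {"Inf": 5, "Lit": 5}
MAX_PASSAGES = 10
```

A selection loop would apply `is_eligible` as a filter before ranking items by information, so that a passage-level limit can block an item that is otherwise the best match for the student.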

In CAT, sufficient item pools and well-established content constraints in the test specifications are necessary to ensure adequate content balance and comparable constructs across individual tests.


Thank you!

