
Scantron Performance Series Technical Report

Chapter 4: Reliability and Validity

All item-bank statistics, analyses, and procedures used to illustrate the concepts of reliability and validity as they relate to Performance Series were reviewed for completeness and accuracy by a statistical team.

In this chapter:
• Reliability and Standard Error of Measurement
• Validity


Reliability and Standard Error of Measurement

According to the Standards for Educational and Psychological Testing, reliability refers to “the degree that true scores are free from errors of measurement.” That is, measurements are consistent when repeated on a population of examinees. In classical test theory, reliability is defined as the ratio of true score variance to observed score variance. Reliability is usually expressed as a single number (e.g., Cronbach’s alpha), although, depending on the audience, the standard error of measurement is sometimes reported instead.
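In classical notation, the relationships referred to above can be written as follows (this is the standard classical test theory formulation, not something specific to Performance Series):

\[
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2},
\qquad
\mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}},
\]

where \(\sigma_T^2\) is the true-score variance, \(\sigma_X^2\) is the observed-score variance, and \(\rho_{XX'}\) is the reliability coefficient.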

A more meaningful index for both classical and Item Response Theory (IRT) based assessment tools is the standard error of measurement. This measure of precision specifies a confidence interval within which an examinee’s measure would fall with repeated assessments. In Computer Adaptive Testing (CAT), where examinees are exposed to different subsets of items, the only meaningful way to express an instrument’s reliability/precision is through the error associated with each examinee’s ability estimate, that is, the standard error of measurement.
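For example, under the usual assumption that the error in the ability estimate is approximately normal, an approximate 95% confidence interval for an examinee's ability estimate \(\hat\theta\) is

\[
\hat\theta \pm 1.96 \times \mathrm{SEM},
\]

so an illustrative (hypothetical) score of 2.10 logits with a SEM of 0.30 logits corresponds to an interval of roughly 1.51 to 2.69 logits.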

Scantron’s goal, and one of the test’s stopping criteria, is a standard error of measurement of less than 0.30 logits for each examinee. This is roughly equivalent to a conventional reliability coefficient of 0.91. Because this is only one of the stopping criteria, the standard error of measurement still varies from examinee to examinee, but the majority of tests finish with a standard error of measurement below 0.30. Table 4-1 displays the mean standard error of measurement (SEM) and number of items administered across grade level groups in Mathematics during the spring 2004 administration.
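The report does not state the spread of examinee measures underlying the 0.91 equivalence; the figure is consistent with the classical relation above when the standard deviation of observed person measures is assumed to be about 1.0 logit:

\[
\rho \approx 1 - \frac{\mathrm{SEM}^2}{\sigma_X^2} = 1 - \frac{(0.30)^2}{(1.0)^2} = 0.91.
\]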


Table 4-1: Mean standard error of measurement (SEM) and number of items administered across grade level groups in Mathematics

Grade Level   SEM (Mean)   SEM (SD)   # Items (Mean)   # Items (SD)   N
2             0.29         0.02       52.5             6.0            16,522
3             0.28         0.01       53.8             6.4            16,349
4             0.27         0.02       60.3             7.6            21,811
5             0.27         0.01       61.0             7.1            16,702
6             0.27         0.01       60.6             6.7            21,236
7             0.27         0.02       59.8             6.2            20,802
8             0.27         0.02       59.6             6.2            13,343
9             0.27         0.03       60.0             6.7            5,091
10            0.27         0.02       59.5             6.4            2,213
11            0.28         0.04       59.5             6.3            1,685
12            0.28         0.04       59.9             6.6            1,160
Totals        0.27         0.02       58.5             7.3            136,914

The number of reading passages seen by examinees influences the number of items administered within the Performance Series Reading content area. Administering enough items to reach a standard error of measurement comparable to that seen in Mathematics requires most examinees to read additional passages: an examinee who needs more items to meet the standard error threshold set by Scantron will, in some cases, need to read an entire new passage and respond to its associated group of items. This contributes to the greater variability in the number of items administered and the larger mean standard error of measurement compared to the Mathematics content area. Table 4-2 displays the mean standard error of measurement and number of items administered across grade level groups in Reading for tests administered in spring 2006.

Similar summary statistics are presented in Table 4-3 for Language Arts and Table 4-4 for Science for tests completed in the spring of 2004. The nature of the test in these two subject areas is the same as in Mathematics (i.e., no passages); hence the mean standard errors of measurement are comparable to those seen for Mathematics.
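As an aside, summaries of this kind can be reproduced directly from per-test records. The sketch below is illustrative only: the column names and the data-frame layout (one row per completed test) are assumptions, not the format of Scantron's internal data.

```python
import pandas as pd

# Assumed layout: one row per completed test, with grade level, the final
# standard error of measurement (in logits), and the number of items given.
results = pd.DataFrame({
    "grade":   [2, 2, 3, 3, 3, 4],
    "sem":     [0.29, 0.31, 0.27, 0.28, 0.30, 0.27],
    "n_items": [52, 55, 54, 51, 58, 60],
})

# Summary in the style of Tables 4-1 through 4-4: mean and SD of the SEM and
# of the number of items administered, plus the number of tests, per grade.
summary = results.groupby("grade").agg(
    sem_mean=("sem", "mean"),
    sem_sd=("sem", "std"),
    items_mean=("n_items", "mean"),
    items_sd=("n_items", "std"),
    n=("sem", "size"),
)
print(summary.round(2))
```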

Table 4-2: Mean standard error of measurement and number of items administered across grade level groups in Reading

Grade Level   SEM (Mean)   SEM (SD)   # Items (Mean)   # Items (SD)   N
2             0.37         0.17       41.2             11.9           16,940
3             0.34         0.12       45.0             8.9            17,828
4             0.32         0.08       46.3             6.8            21,632
5             0.32         0.07       46.6             6.1            16,617
6             0.31         0.05       47.9             5.5            20,423
7             0.32         0.05       48.0             5.2            20,558
8             0.31         0.05       48.4             5.0            11,485
9             0.32         0.08       47.9             6.4            6,369
10            0.32         0.06       48.3             5.1            2,477
11            0.32         0.05       48.7             4.9            1,619
12            0.33         0.08       48.2             5.2            1,216
Totals        0.33         0.09       46.4             7.6            137,164


Table 4-3: Mean standard error of measurement and number of items administered across grade level groups in Language Arts

Grade Level   SEM (Mean)   SEM (SD)   # Items (Mean)   # Items (SD)   N
2             0.30         0.03       49.0             4.1            10,348
3             0.30         0.02       49.8             3.6            4,669
4             0.29         0.02       50.0             3.3            9,728
5             0.29         0.01       50.1             3.1            4,873
6             0.30         0.01       49.6             3.3            10,296
7             0.30         0.02       49.3             3.5            9,903
8             0.30         0.01       49.2             3.1            3,325
9             0.30         0.03       49.8             3.4            646
10            0.30         0.02       49.5             3.1            546
11            0.30         0.02       49.1             3.1            418
12            0.30         0.02       48.8             3.4            193
Totals        0.30         0.02       49.5             3.5            54,945

Table 4-4: Mean standard error of measurement and number of items administered across grade level groups in Science

Grade Level   SEM (Mean)   SEM (SD)   # Items (Mean)   # Items (SD)   N
2             0.29         0.02       52.2             6.9            7,634
3             0.28         0.02       53.4             4.9            1,528
4             0.28         0.01       54.2             3.5            9,117
5             0.28         0.01       54.1             3.3            1,829
6             0.28         0.02       54.0             3.8            883
7             0.28         0.01       54.1             3.5            8,880
8             0.28         0.01       54.1             3.4            513
9             0.29         0.03       54.6             3.8            227
10            0.28         0.02       54.4             3.8            208
11            0.29         0.02       53.2             4.0            114
12            0.29         0.03       53.4             4.4            61
Totals        0.28         0.02       53.6             4.7            30,994

Validity

The Standards for Educational and Psychological Testing define validity as “the degree to which accumulated evidence and theory support specific interpretations of test scores entailed by proposed uses of a test.” To put it another way, a test should not be considered valid in an absolute sense. Rather, the validity of a test should be considered within the context of the groups to be tested and the desired interpretation of test results.

Much of Scantron’s validity research has been an effort to “accumulate evidence” as the Standards for Educational and Psychological Testing indicate. The results of these efforts are categorized below.

Content Validity

Content validity refers to the degree to which a test measures an indicated content area. Presently, the content areas within Performance Series are Mathematics, Reading, Language Arts, and Science. To illustrate the content validity of Performance Series with regard to these content areas, Scantron examined the concepts of item validity and sampling validity, both of which are necessary components of content validity. Item validity focuses on the degree to which test items are relevant in measuring the desired content area. Sampling validity focuses on how well the items selected for the test sample, or span, the content area. Due to the newness of the High School Algebra, High School Geometry, Spanish Math, and Reading Foundations components of Performance Series, no concurrent validity research on them has been completed at this time. The tasks described below regarding item validity and sampling validity are, however, in place for these subject areas.


Item Validity

Scantron began the item development process by creating a list of skills through research of individual state standards, state assessment programs, and the National Assessment of Educational Progress (NAEP). Standards proposed by national organizations such as the National Council of Teachers of Mathematics (NCTM) and the National Council of Teachers of English (NCTE) were also reviewed.

Much of this research was performed during the creation and regular update of Scantron’s Curriculum Designer product, which has taken place over the last ten years. The Curriculum Designer database of skills and objectives is aligned to standards and assessment documents from around the country that have been created within the last fifteen years. As a result, trends in education and assessment (from a skills and objectives perspective) were analyzed during the development of the Performance Series skill list.

Using Curriculum Designer, similar elements (standards, skills, objectives, competencies) spanning any combination of documents contained within its database are readily identified. Therefore, a core of these most common elements, taken within and across grade levels, can be determined. The Performance Series skill list represents this core group of skills.

Scantron content team members developed all items that appear within Performance Series. Each item that exists within the item bank was written to measure a skill from the Performance Series skill list at the appropriate grade level. To ensure uniformity in the construction of items within the Performance Series item bank, Scantron developed a process for training content team members on item development. This training consisted of a hands-on program designed to enable content team members to transfer their content area knowledge and classroom teaching experience into successful item development. In addition to their training, all content team members received the Scantron Item Development Training Manual as a reference tool.

As prospective items are developed, they are subjected to an external evaluation by a panel of content area experts. New items are reviewed for:
• Item alignment with the indicated skill at the appropriate grade level
• Item content and quality (accuracy of content, overall clarity, one unambiguous answer)
• Item bias (to ensure that the item did not demonstrate gender, racial/ethnic, and/or socioeconomic bias)
• Gender count for passive/active voice. Reading passages are reviewed to ensure that male/female main characters are written in an equal number of instances with regard to passive/active voice.

The items are then returned to the Scantron content team to make changes based on the recommendations of the external evaluation panel. This process is repeated to ensure that corrections were made as the evaluation panel intended, and that no new errors or problems with the items were introduced during the rewrite/editing process. Items failing this external review are eliminated from further consideration for entry into the Performance Series item bank. Items passing this external review process are deemed to be relevant to the task of measuring their respective content areas.


Sampling Validity

To possess a high degree of sampling validity, an assessment must include items that span the given content area. To address this need, Performance Series content areas are divided into sub-areas, or units, that function as independent testlets during test administration. Scantron’s item selection algorithm ensures that examinees in any content area are exposed to items from the component testlets that make up that content area. As a result, no examinee’s Performance Series experience is restricted to a minute subset of a given content area.
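Scantron does not publish the item selection algorithm itself, so the sketch below is a hypothetical illustration of the testlet-balancing idea described above, not the actual implementation; the testlet names and the bank layout are assumptions.

```python
import random

# Hypothetical item bank: each item belongs to a testlet (sub-area) and has a
# Rasch difficulty in logits. The names mirror the Mathematics testlets.
TESTLETS = ["Geometry", "Statistics", "Algebra", "Measurement", "Number Operations"]
item_bank = [
    {"id": i, "testlet": TESTLETS[i % len(TESTLETS)], "difficulty": random.gauss(0, 1)}
    for i in range(100)
]

def next_item(theta, administered, bank):
    """Pick the unused item whose difficulty is closest to the provisional
    ability estimate, preferring testlets the examinee has seen least often,
    so that every sub-area of the content area is sampled."""
    counts = {}
    for item in administered:
        counts[item["testlet"]] = counts.get(item["testlet"], 0) + 1
    used = {item["id"] for item in administered}
    candidates = [it for it in bank if it["id"] not in used]
    return min(
        candidates,
        key=lambda it: (counts.get(it["testlet"], 0), abs(it["difficulty"] - theta)),
    )

# Example: the first five selections for an examinee estimated at 0.5 logits
# cycle through all five testlets before any testlet is revisited.
administered = []
for _ in range(5):
    administered.append(next_item(0.5, administered, item_bank))
print([it["testlet"] for it in administered])
```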

Inter-Testlet Correlation

To illustrate the concepts of item and sampling validity of Performance Series in a more quantitative manner, Scantron has examined the correlation of examinee scores between the component testlets within each content area. Most of the table entries indicate a fairly good (> 0.65) correlation coefficient. This indicates that test items within each of the component testlets in each content area are measuring their segment of the overall content area at about the same level. Also, examinees are not exposed to wide ranges of items (with regard to difficulty) from one testlet to the next within a given content area unless their ability within each testlet warrants such a variation.

In addition to making a statement about content validity, Tables 4-5 through 4-8 below serve to illustrate, in an indirect manner, the degree of precision in item difficulty calibration, as well as the proper functioning of Scantron's item selection algorithm. These tables summarize the inter-testlet correlations within the content areas tested in Performance Series. All tables represent examinee results during the spring 2006 administration period. Inter-testlet correlations partitioned by grade level are available upon request.
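The correlations in Tables 4-5 through 4-8 are ordinary pairwise Pearson coefficients between per-testlet scores. As a minimal sketch (with simulated scores and an assumed one-row-per-examinee layout, not Scantron's data), such a matrix can be produced as follows:

```python
import numpy as np
import pandas as pd

# Simulated per-testlet scaled scores: a shared ability component plus
# testlet-specific noise, one row per examinee. Column names follow Table 4-5.
rng = np.random.default_rng(0)
ability = rng.normal(0.0, 1.0, size=1000)
scores = pd.DataFrame({
    name: ability + rng.normal(0.0, 0.6, size=1000)
    for name in ["Geometry", "Statistics", "Algebra", "Measurement", "Number Operations"]
})
scores["Overall"] = scores.mean(axis=1)

# Pairwise Pearson correlation matrix, analogous to Tables 4-5 through 4-8.
print(scores.corr(method="pearson").round(3))
```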


Mathematics

Table 4-5: Inter-testlet correlation coefficients for tests completed in spring 2006, all grades, Mathematics. All correlations are significant at the 0.01 level (2-tailed).

                     Overall   Geometry   Statistics   Algebra   Measurement   Number Operations
Overall
  Pearson r          1.000     0.857      0.861        0.873     0.868         0.899
  N                  42,565    42,483     42,375       42,450    42,454        42,583
Geometry
  Pearson r          0.857     1.000      0.668        0.692     0.668         0.717
  N                  42,483    42,483     42,347       42,415    42,426        42,481
Statistics
  Pearson r          0.861     0.668      1.000        0.689     0.693         0.705
  N                  42,375    42,347     42,375       42,317    42,355        42,374
Algebra
  Pearson r          0.873     0.692      0.689        1.000     0.689         0.748
  N                  42,450    42,416     42,317       42,450    42,386        42,449
Measurement
  Pearson r          0.868     0.668      0.693        0.689     1.000         0.723
  N                  42,454    42,426     42,355       42,386    42,454        42,452
Number Operations
  Pearson r          0.899     0.717      0.705        0.748     0.723         1.000
  N                  42,563    42,481     42,374       42,449    42,452        42,563

Reading

Table 4-6: Inter-testlet correlation coefficients for tests completed in spring 2006, all grades, Reading. All correlations are significant at the 0.01 level (2-tailed).

                 Overall   Vocabulary   Fiction   Nonfiction   Long Passage
Overall
  Pearson r      1.000     0.956        0.899     0.913        0.939
  N              34,654    34,654       34,293    34,293       34,293
Vocabulary
  Pearson r      0.956     1.000        0.794     0.817        0.838
  N              34,654    34,654       34,293    34,293       34,293
Fiction
  Pearson r      0.899     0.794        1.000     0.804        0.820
  N              34,293    34,293       34,293    34,293       34,293
Nonfiction
  Pearson r      0.913     0.817        0.804     1.000        0.842
  N              34,293    34,293       34,293    34,293       34,293
Long Passage
  Pearson r      0.939     0.838        0.820     0.825        1.000
  N              34,293    34,293       34,293    34,293       34,293


Language Arts

Table 4-7: Inter-testlet correlation coefficients for tests completed in spring 2006, all grades, Language Arts. All correlations are significant at the 0.01 level (2-tailed).

                     Overall   Capitalization   Parts of Speech   Punctuation   Sentence Structure
Overall
  Pearson r          1.000     0.386            0.866             0.864         0.891
  N                  13,777    13,777           13,777            13,777        13,777
Capitalization
  Pearson r          0.386     1.000            0.634             0.657         0.652
  N                  13,777    13,777           13,777            13,777        13,777
Parts of Speech
  Pearson r          0.866     0.634            1.000             0.647         0.729
  N                  13,777    13,777           13,777            13,777        13,777
Punctuation
  Pearson r          0.864     0.657            0.647             1.000         0.676
  N                  13,777    13,777           13,777            13,777        13,777
Sentence Structure
  Pearson r          0.891     0.652            0.729             0.676         1.000
  N                  13,777    13,777           13,777            13,777        13,777

Life Science and Inquiry

Table 4-8: Inter-testlet correlation coefficients for tests completed in spring 2006, all grades, Life Science and Inquiry. All correlations are significant at the 0.01 level (2-tailed).

                    Overall   Living Things   Ecology   Science Process
Overall
  Pearson r         1.000     0.904           0.937     0.924
  N                 6,682     6,682           6,651     6,682
Living Things
  Pearson r         0.904     1.000           0.776     0.753
  N                 6,682     6,682           6,651     6,682
Ecology
  Pearson r         0.937     0.776           1.000     0.791
  N                 6,651     6,651           6,651     6,651
Science Process
  Pearson r         0.924     0.753           0.791     1.000
  N                 6,682     6,682           6,651     6,682


Criterion-Related Validity

One type of criterion-related validity is concurrent validity. Concurrent validity indicates the degree to which performance on two separate assessments is correlated. Scantron has been engaged in concurrent validity research since the initial release of Performance Series. Some of the research results for Performance Series are summarized in the tables below for the following standardized assessments (a sketch of the underlying correlation computation follows the list):
• ASBA—Alaska Standards-Based Assessments
• BST—Minnesota Basic Skills Test
• CAHSEE—California High School Exit Exam
• CAT6—California Achievement Test version 6
• CRCT—Georgia Criterion-Referenced Competency Tests
• CSAP—Colorado Student Assessment Program
• CST—California Standards Test
• CTBS/5—Kentucky Comprehensive Test of Basic Skills
• EXPLORE—ACT's College and Career Readiness Program for 8th and 9th graders
• FCAT—Florida Comprehensive Assessment Test
• ISAT—Illinois Standards Achievement Test
• ISTEP+—Indiana Statewide Testing for Educational Progress-Plus
• ITBS—Iowa Test of Basic Skills
• KCCT—Kentucky Core Content Test
• LEAP/iLEAP—Louisiana Educational Assessment Program/Integrated Louisiana Educational Assessment Program
• MCA—Minnesota Comprehensive Assessment
• MEAP—Michigan Educational Assessment Program
• MSA—Maryland School Assessments
• NECAP—New England Common Assessment Program
• NYST—New York State Tests
• OCCT—Oklahoma Core Curriculum Tests
• PLAN—ACT's College and Career Readiness Program for 10th graders
• PSSA—Pennsylvania System of School Assessment
• SAT9—Stanford Achievement Test
• TAKS—Texas Assessment of Knowledge and Skills
• TCAP—Tennessee Comprehensive Assessment Program
• WASL—Washington Assessment of Student Learning

Table 4-9: Performance Series Mathematics Correlations

Test Grade Date N Pearson State

ASBA 8 Spring 2008 205 0.830 AK

BST 8 Spring 2004 482 0.780 MN

CAHSEE 10 Spring 2006 913 0.720 CA

CAT6 2 Spring 2003 551 0.655 CA

3 Spring 2003 578 0.738 CA

4 Spring 2003 627 0.753 CA

5 Spring 2003 571 0.780 CA

6 Spring 2003 581 0.760 CA

7 Spring 2003 563 0.712 CA

8 Spring 2003 534 0.772 CA

9 Spring 2003 439 0.741 CA

10 Spring 2003 380 0.705 CA

11 Spring 2003 303 0.710 CA


CRCT 2 Spring 2009 1069 0.752 GA

3 Spring 2009 987 0.769 GA

4 Spring 2009 873 0.794 GA

5 Spring 2009 772 0.782 GA

6 Spring 2009 757 0.820 GA

7 Spring 2009 710 0.825 GA

8 Spring 2009 705 0.826 GA

CSAP 3 Spring 2010 1355 0.785 CO

4 Spring 2010 1177 0.827 CO

5 Spring 2010 1175 0.864 CO

6 Spring 2010 1435 0.873 CO

7 Spring 2010 1499 0.862 CO

8 Spring 2010 1197 0.878 CO

9 Spring 2010 1369 0.851 CO

10 Spring 2010 696 0.830 CO


CST - General Math

2 Spring 2008 61 0.65 CA

3 Spring 2008 110 0.52 CA

4 Spring 2008 103 0.57 CA

5 Spring 2008 107 0.71 CA

6 Spring 2008 127 0.72 CA

7 Spring 2008 142 0.66 CA

8 Spring 2008 131 0.60 CA

9 Spring 2008 126 0.59 CA

10 Spring 2008 107 0.59 CA

11 Spring 2008 70 0.55 CA

CST Algebra I HS Spring 2003 325 0.722 CA

CST Algebra II HS Spring 2003 100 0.658 CA

CST Geometry HS Spring 2003 199 0.752 CA

CTBS/5 3 Spring 2004 851 0.659 KY

6 Spring 2004 826 0.770 KY

EXPLORE 8 Fall 2010 1303 0.768 KY

9 Fall 2010 14032 0.742 IL

FCAT (Norm-referenced portion of the Florida Comprehensive Assessment Test)

3 Spring 2003 162 0.724 FL

4 Spring 2003 188 0.811 FL


FCAT (Criterion-referenced portion measuring the Sunshine State Standards)

3 Spring 2003 162 0.721 FL

4 Spring 2003 188 0.851 FL

ISAT

3 Spring 2011 19896 0.822 IL

4 Spring 2011 18691 0.803 IL

5 Spring 2011 18727 0.807 IL

6 Spring 2011 18493 0.817 IL

7 Spring 2011 17451 0.822 IL

8 Spring 2011 17929 0.810 IL

ISTEP+ 3 Fall 2007 593 0.762 IN

4 Fall 2007 1172 0.774 IN

5 Fall 2007 1234 0.830 IN

6 Fall 2007 1492 0.841 IN

7 Fall 2007 1221 0.877 IN

8 Fall 2007 1098 0.874 IN

9 Fall 2007 550 0.820 IN


ITBS

2 Spring 2009 2265 0.778 LA

2 Spring 2002 136 0.336 OK

3 Spring 2002 1,013 0.649 OK

4 Spring 2003 1,388 0.837 GA

4 Spring 2002 984 0.760 OK

5 Spring 2002 954 0.765 OK

6 Spring 2003 3,372 0.858 GA

6 Spring 2002 726 0.764 OK

7 Spring 2002 791 0.796 OK

8 Spring 2002 774 0.848 OK

9 Spring 2002 621 0.724 OK

10 Spring 2002 416 0.674 OK

11 Spring 2002 268 0.728 OK

12 Spring 2002 199 0.727 OK

KCCT 5 Spring 2004 988 0.712 KY

8 Spring 2004 1023 0.801 KY


LEAP/iLEAP 3 Spring 2008 2384 0.790 LA

4 Spring 2008 2428 0.810 LA

5 Spring 2008 1978 0.770 LA

6 Spring 2008 2040 0.800 LA

7 Spring 2008 2143 0.810 LA

8 Spring 2008 2049 0.750 LA

9 Spring 2008 190 0.670 LA

MCA 7 Spring 2004 2700 0.835 MN

MEAP 3 Spring 2006 1232 0.762 MI

4 Spring 2006 1346 0.678 MI

5 Spring 2006 1261 0.714 MI

6 Spring 2006 1144 0.721 MI

7 Spring 2006 1198 0.762 MI

8 Spring 2005 1253 0.728 MI

MSA 3 Spring 2009 777 0.780 MD

4 Spring 2009 456 0.840 MD

5 Spring 2009 583 0.800 MD

6 Spring 2009 250 0.820 MD

7 Spring 2009 291 0.880 MD

NECAP 11 Fall 2008 663 0.820 NH


NYST 3 Spring 2010 4309 0.697 NY

4 Spring 2010 4446 0.806 NY

5 Spring 2010 3991 0.789 NY

6 Spring 2010 4521 0.774 NY

7 Spring 2010 4313 0.773 NY

8 Spring 2010 4230 0.751 NY

OCCT 5 Spring 2004 212 0.823 OK

8 Spring 2004 256 0.835 OK

PLAN 9 Fall 2008 878 0.818 KY

PSSA 3 Spring 2008 552 0.711 PA

4 Spring 2008 564 0.721 PA

5 Spring 2008 573 0.765 PA

6 Spring 2008 1117 0.788 PA

7 Spring 2008 1139 0.776 PA

8 Spring 2008 1133 0.783 PA

11 Spring 2008 643 0.820 PA

SAT9

2 Spring 2002 476 0.733 SD

3 Spring 2002 981 0.709 OK

4 Spring 2002 910 0.801 SD

8 Spring 2002 747 0.814 SD

11 Spring 2002 546 0.761 SD


TAKS 3 Spring 2004 111 0.611 TX

4 Spring 2004 134 0.745 TX

5 Spring 2004 95 0.739 TX

6 Spring 2004 117 0.740 TX

7 Spring 2004 83 0.789 TX

8 Spring 2004 99 0.847 TX

9 Spring 2004 92 0.769 TX

10 Spring 2004 95 0.881 TX

TCAP

3 Spring 2004 749 0.736 TN

4 Spring 2004 763 0.701 TN

5 Spring 2004 778 0.804 TN

6 Spring 2004 746 0.817 TN

7 Spring 2004 725 0.851 TN

8 Spring 2004 175 0.823 TN


WASL 7 Spring 2004 145 0.860 WA


Table 4-10: Performance Series Reading Correlations

Test Grade Date N Pearson State

ASBA 8 Spring 2008 237 0.690 AK


BST 8 Spring 2004 660 0.789 MN

CAHSEE

10 Spring 2006 903 0.790 CA

11 Spring 2003 125 0.738 CA

CAT6 - English Language Arts

2 Spring 2003 547 0.676 CA

3 Spring 2003 576 0.717 CA

4 Spring 2003 625 0.700 CA

5 Spring 2003 568 0.757 CA

6 Spring 2003 580 0.676 CA

7 Spring 2003 555 0.576 CA

8 Spring 2003 521 0.635 CA

9 Spring 2003 353 0.654 CA

10 Spring 2003 330 0.578 CA

11 Spring 2003 338 0.569 CA


CAT6 - Reading

2 Spring 2003 547 0.698 CA

3 Spring 2003 576 0.734 CA

4 Spring 2003 625 0.719 CA

5 Spring 2003 568 0.771 CA

6 Spring 2003 580 0.695 CA

7 Spring 2003 555 0.699 CA

8 Spring 2003 521 0.757 CA

9 Spring 2003 353 0.698 CA

10 Spring 2003 330 0.598 CA

11 Spring 2003 338 0.584 CA

CRCT 2 Spring 2009 1053 0.711 GA

3 Spring 2009 963 0.791 GA

4 Spring 2009 862 0.806 GA

5 Spring 2009 760 0.790 GA

6 Spring 2009 736 0.746 GA

7 Spring 2009 669 0.755 GA

8 Spring 2009 701 0.767 GA


CSAP 3 Spring 2010 1298 0.796 CO

4 Spring 2010 1249 0.823 CO

5 Spring 2010 1128 0.841 CO

6 Spring 2010 1245 0.805 CO

7 Spring 2010 1676 0.794 CO

8 Spring 2010 1064 0.804 CO

9 Spring 2010 1317 0.761 CO

10 Spring 2010 898 0.774 CO

CST - English Language Arts

2 Spring 2008 57 0.700 CA

3 Spring 2008 102 0.600 CA

4 Spring 2008 99 0.770 CA

5 Spring 2008 99 0.730 CA

6 Spring 2008 117 0.790 CA

7 Spring 2008 125 0.770 CA

8 Spring 2008 117 0.770 CA

9 Spring 2008 118 0.760 CA

10 Spring 2008 129 0.680 CA

11 Spring 2008 150 0.730 CA

CTBS/5 3 Spring 2004 607 0.673 KY

6 Spring 2004 586 0.628 KY


EXPLORE 8 Fall 2010 1312 0.675 KY

9 Fall 2010 14280 0.638 IL

FCAT (Norm-referenced portion of the Florida Comprehensive Assessment Test)

3 Spring 2003 162 0.800 FL

4 Spring 2003 191 0.836 FL

FCAT (Criterion-referenced portion measuring the Sunshine State Standards)

3 Spring 2003 162 0.793 FL

4 Spring 2003 191 0.859 FL

ISAT

3 Spring 2011 19750 0.845 IL

4 Spring 2011 18713 0.836 IL

5 Spring 2011 18713 0.828 IL

6 Spring 2011 18646 0.822 IL

7 Spring 2011 17667 0.807 IL

8 Spring 2011 18036 0.787 IL

ISTEP+ 3 Fall 2007 601 0.786 IN

4 Fall 2007 1169 0.809 IN

5 Fall 2007 1239 0.832 IN

6 Fall 2007 1500 0.778 IN

7 Fall 2007 1231 0.782 IN

8 Fall 2007 1100 0.782 IN

9 Fall 2007 515 0.749 IN


ITBS

2 Spring 2009 2052 0.795 LA

2 Spring 2002 236 0.465 OK

3 Spring 2002 1,068 0.693 OK

4 Spring 2003 1,399 0.845 GA

4 Spring 2002 1,071 0.811 OK

5 Spring 2002 1,098 0.790 OK

6 Spring 2003 3,491 0.859 GA

6 Spring 2002 867 0.826 OK

7 Spring 2002 1,108 0.814 OK

8 Spring 2002 968 0.798 OK

9 Spring 2002 766 0.828 OK

10 Spring 2002 513 0.776 OK

11 Spring 2002 314 0.695 OK

12 Spring 2002 255 0.708 OK

KCCT 4 Spring 2004 992 0.611 KY

7 Spring 2004 905 0.560 KY

LEAP/iLEAP 3 Spring 2008 2350 0.810 LA

4 Spring 2008 2409 0.770 LA

5 Spring 2008 1971 0.730 LA

6 Spring 2008 1987 0.750 LA

7 Spring 2008 2095 0.760 LA


8 Spring 2008 2039 0.690 LA

9 Spring 2008 146 0.620 LA

MCA 7 Spring 2004 2576 0.830 MN

MEAP 3 Spring 2006 1244 0.742 MI

4 Spring 2006 1346 0.700 MI

5 Spring 2006 1258 0.714 MI

6 Spring 2006 1150 0.735 MI

7 Spring 2006 1234 0.721 MI

8 Spring 2006 1257 0.700 MI

MSA 3 Spring 2009 613 0.760 MD

4 Spring 2009 768 0.770 MD

5 Spring 2009 725 0.750 MD

6 Spring 2009 272 0.740 MD

7 Spring 2009 303 0.800 MD

NYST 3 Spring 2010 3142 0.703 NY

4 Spring 2010 3333 0.780 NY

5 Spring 2010 2923 0.718 NY

6 Spring 2010 2374 0.769 NY

7 Spring 2010 2249 0.787 NY

8 Spring 2010 2590 0.721 NY


OCCT 5 Spring 2004 220 0.842 OK

8 Spring 2004 224 0.829 OK

PLAN 9 Fall 2008 823 0.668 KY

PSSA 3 Spring 2008 555 0.742 PA

4 Spring 2008 560 0.762 PA

5 Spring 2008 569 0.758 PA

6 Spring 2008 1101 0.737 PA

7 Spring 2008 1144 0.757 PA

8 Spring 2008 1081 0.776 PA

11 Spring 2008 652 0.724 PA

SAT9 2 Spring 2002 520 0.756 SD

3 Spring 2002 1,033 0.818 OK

4 Spring 2002 931 0.747 SD

8 Spring 2002 841 0.708 SD

11 Spring 2002 658 0.613 SD


TAKS 3 Spring 2004 108 0.682 TX

4 Spring 2004 120 0.727 TX

5 Spring 2004 96 0.756 TX

6 Spring 2004 116 0.713 TX

7 Spring 2004 84 0.773 TX

8 Spring 2004 98 0.662 TX

9 Spring 2004 88 0.743 TX

TCAP 3 Spring 2004 760 0.736 TN

4 Spring 2004 762 0.701 TN

5 Spring 2004 766 0.708 TN

6 Spring 2004 743 0.747 TN

7 Spring 2004 706 0.760 TN

8 Spring 2004 183 0.734 TN

WASL 7 Spring 2004 153 0.710 WA


Table 4-11: Performance Series Language Arts Correlations

Test Grade Date N Pearson State

CRCT 2 Spring 2009 1063 0.765 GA

3 Spring 2009 997 0.801 GA

4 Spring 2009 870 0.780 GA

5 Spring 2009 761 0.780 GA

6 Spring 2009 773 0.773 GA

7 Spring 2009 709 0.777 GA

8 Spring 2009 732 0.810 GA

CTBS/5 3 Spring 2004 573 0.694 KY

6 Spring 2004 797 0.673 KY

9 Spring 2004 30 0.633 KY

OCCT 5 Spring 2004 212 0.808 OK

8 Spring 2004 250 0.771 OK

Table 4-12: Performance Series Life Science and Inquiry Correlations

Test Grade Date N Pearson State

KCCT 4 Spring 2004 875 0.622 KY

7 Spring 2004 940 0.710 KY

TAKS 5 Spring 2004 84 0.753 TX