Incorporating Contextual Information in Standard Setting


Gary W. Phillips, VP, American Institutes for Research
Vince Verges, Assistant Deputy Commissioner, Florida
Irene Hunting, Deputy Associate Superintendent, Arizona
James Wright, Director, Office of Curriculum and Assessment, Ohio

June 20, 2016
2016 CCSSO/NCSA

Copyright © 2016 American Institutes for Research. All rights reserved.


AMERICAN INSTITUTES FOR RESEARCH

What is Contextual Information in Standard Setting?
• Data used to inform panelist decisions above and beyond the content standards, the performance level descriptions, and the test items


Impact Data
• State item p-values
• Item maps using response probabilities (RPs)
• Overall state inverse cumulative percentages
• State inverse cumulative percentages by demographic subgroup
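The inverse cumulative percentages above are straightforward to compute: for each candidate cut score, report the percent of students who would score at or above it. A minimal sketch, using hypothetical scale scores rather than any state's data:

```python
# Sketch: the "inverse cumulative percentage" impact data shown to panelists --
# the percent of students at or above each candidate cut score.
# Scores below are hypothetical, not Florida/Arizona/Ohio data.

def inverse_cumulative_percentages(scores, cuts):
    """For each candidate cut score, return the % of students at or above it."""
    n = len(scores)
    return {cut: 100.0 * sum(s >= cut for s in scores) / n for cut in cuts}

scores = [310, 325, 340, 340, 355, 360, 372, 380, 395, 410]  # hypothetical scale scores
impact = inverse_cumulative_percentages(scores, cuts=[340, 360, 380])
print(impact)  # {340: 80.0, 360: 50.0, 380: 30.0}
```

Subgroup impact data is the same computation restricted to each demographic group's score distribution.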


Articulation
• Use student frequency distributions to smooth cut scores across grades
• Use the vertical scale to smooth cut scores across grades
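The smoothing idea can be sketched simply. One common-sense approach (an illustration, not the presenters' specific procedure) is to average each grade's cut with its neighbors on the vertical scale and then enforce that cuts never decrease from grade to grade:

```python
# Sketch: smoothing grade-by-grade cut scores on a vertical scale.
# A simple approach (one of many): 3-point moving average, then enforce
# monotonicity across grades. Cut scores below are hypothetical.

def smooth_cuts(cuts):
    # 3-point moving average, keeping the endpoints as-is
    avg = [cuts[0]] + [
        (cuts[i - 1] + cuts[i] + cuts[i + 1]) / 3 for i in range(1, len(cuts) - 1)
    ] + [cuts[-1]]
    # enforce non-decreasing cuts across grades
    out = []
    for c in avg:
        out.append(max(c, out[-1]) if out else c)
    return out

grade_cuts = [300, 318, 312, 330, 341, 339]  # hypothetical Level 3 cuts, grades 3-8
print([round(c, 1) for c in smooth_cuts(grade_cuts)])
```

Note the raw cuts dip at two grades; the smoothed sequence removes the reversals so a higher grade never has a lower cut.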


National Benchmarks
• ACT/SAT
• NAEP
• National norm-referenced tests


International Benchmarks
• PISA
  – Three-year cycle (2012, 2015, 2018)
  – Reading, mathematics, and science
  – Age 15
• TIMSS
  – Four-year cycle (2011, 2015, 2019)
  – Mathematics and science
  – Grades 4 and 8
• PIRLS
  – Five-year cycle (2011, 2016, 2021)
  – Reading literacy
  – Grade 4


Benchmarking Methodology
• Linking
  – Item calibration
    » Common-item linking (e.g., PISA in Delaware, Hawaii, and Oregon, 2015)
  – Equipercentile
    » Randomly equivalent groups (e.g., comparing state standards to SBAC and PARCC standards, 2016)
  – Statistical moderation
    » Randomly equivalent groups (e.g., comparing state standards to TIMSS standards, 2010)
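The equipercentile approach with randomly equivalent groups can be sketched as follows: find the benchmark-test score whose percentile rank matches the percentile rank of the state cut score. A simplified illustration with hypothetical score distributions (operational linking would typically use smoothed distributions and interpolation):

```python
# Sketch of equipercentile linking: map a state cut score to the benchmark
# score with the same percentile rank. Distributions below are hypothetical.

def percentile_rank(scores, x):
    """Percent of scores strictly below x (one simple PR definition; others exist)."""
    return 100.0 * sum(s < x for s in scores) / len(scores)

def equipercentile_link(state_scores, bench_scores, state_cut):
    target = percentile_rank(state_scores, state_cut)
    # smallest benchmark score whose percentile rank reaches the target
    for y in sorted(set(bench_scores)):
        if percentile_rank(bench_scores, y) >= target:
            return y
    return max(bench_scores)

state = [300, 310, 320, 330, 340, 350, 360, 370, 380, 390]
bench = [400, 420, 440, 460, 480, 500, 520, 540, 560, 580]
print(equipercentile_link(state, bench, state_cut=350))  # 500
```

Because the two groups are randomly equivalent, equal percentile ranks are taken to indicate comparable performance standards.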


When is Contextual Information Important?
• When you are trying to reach a policy goal
• The earlier you introduce contextual information into the standard setting workshop, the more likely it is to influence the outcome


Literature
• Ferrara (2005). Vertically Articulated Performance Standards (special issue of Applied Measurement in Education)
• McClarty, Way, Porter, Beimer, and Miles (2012). Evidence-Based Standard Setting (Educational Researcher)
• Phillips (2011). The Benchmark Method of Standard Setting. In Gregory J. Cizek (Ed.), Setting Performance Standards (2nd ed.). New York: Routledge
• Phillips, G. and Jiang, T. (2015). Using PISA as an International Benchmark in Standard Setting. Journal of Applied Measurement, 16(2), 161–170


Florida


Florida Standards Assessment (FSA)

• Implemented in spring 2015 (baseline administration)
• Assessments administered:
  – Grades 3–10 English Language Arts (includes text-based writing)
  – Grades 3–8 Mathematics
  – Algebra 1 EOC
  – Geometry EOC
  – Algebra 2 EOC


Standard Setting: A Multi-Stage Process


Four Rounds of Educator Panel Judgment

• Initial judgment based on test content and Achievement Level Descriptors (round 1)

• Articulation – how cut scores appear across grades in Grades 3-10 ELA and Grades 3-8 mathematics (round 2)

• Impact data – how many students would be in each achievement level, and how subgroups would perform based on recommended cut scores (round 3)

• Benchmarking – how students would compare on FSA vs. international assessments (round 4)


Standard Setting Process: Achievement Level Policy Definitions
• Achievement Level Policy Definitions describe student achievement of the Florida Standards at each achievement level
  – Level 1: Students at this level demonstrate an inadequate level of success with the challenging content of the Florida Standards.
  – Level 2: Students at this level demonstrate a below satisfactory level of success with the challenging content of the Florida Standards.
  – Level 3: Students at this level demonstrate a satisfactory level of success with the challenging content of the Florida Standards.
  – Level 4: Students at this level demonstrate an above satisfactory level of success with the challenging content of the Florida Standards.
  – Level 5: Students at this level demonstrate mastery of the most challenging content of the Florida Standards.


Individual and Group-Level Stakes
• Individual
  – To be promoted, Grade 3 students must score at or above Level 2 on ELA ("good cause" exemptions exist, however)
  – Students must score at or above Level 3 on Grade 10 ELA and the Algebra 1 EOC to graduate (retakes and alternative assessments provided for)
  – EOCs count as 30% of the course grade
• Group
  – School grades, which include learning gains, acceleration, and improvement of performance of the lowest 25% of students
  – School recognition dollars
  – District grades (based on school grades)
  – Teacher evaluation (value-added model; scores count at least 33% toward evaluation)


ELA – Round 1 (Test Items & ALDs)


ELA – Round 2 (Articulation)


ELA – Round 3 (Impact Data - % in each Level; Subgroup Performance)


ELA – Round 4 (Benchmark to National/Int’l Tests)


ELA – Reactor Panel (Educator Panel + Policy Considerations)


Mathematics – Round 1 (Test Items & ALDs)


Mathematics – Round 2 (Articulation)


Mathematics – Round 3 (Impact Data - % in each Level; Subgroup Performance)


Mathematics – Round 4 (Benchmark to National/Int’l Tests)


Mathematics – Reactor Panel (Educator Panel + Policy Considerations)


Math EOCs – Round 1 (Test Items & ALDs)


Math EOCs – Round 2 (Pseudo-Articulation)


Math EOCs – Round 3 (Impact Data - % in each Level; Subgroup Performance)


Math EOCs – Round 4 (Benchmark to National/Int’l Tests)


Math EOCs – Reactor Panel (Educator Panel + Policy Considerations)


What Florida Educator Panelists Said


How important were the following factors in your placement of the bookmarks?
(Very Important / Somewhat Important / Not Important)
• Achievement Level Descriptions (ALDs): 84% / 14% / 2%
• External benchmark data: 37% / 53% / 10%
• Feedback data: 77% / 23%
• Impact data: 70% / 29% / 1%


What Florida Educator Panelists Said


(Strongly Agree / Agree / Disagree / Strongly Disagree)
• The feedback on cut scores was helpful in my decisions regarding placement of my bookmarks: 74% / 24% / 1%
• I found the panelist feedback data and discussion helpful in my decisions about where to place my bookmarks: 83% / 17% / <1%
• I found the impact data and discussion helpful in my decisions about where to place my bookmarks: 69% / 28% / 3%
• I made my recommendations independently and did not feel pressured to set my bookmarks at a certain level: 82% / 17% / 1%
• I believe that the recommended cut scores represent the expectations of performance for the students of Florida: 71% / 29% / <1%


What Florida Reactor Panelists Said


Usefulness of the following during the Reactor Panel meeting
(Not at all useful / Somewhat useful / Very useful)
• Reviewing external data: 25% / 75%
• Reviewing the standard-setting process used by the Educator Panel: 6% / 94%
• Reactor Panel discussions: 100%


What Florida Reactor Panelists Said


How important was each of the following factors in rendering your judgments?
(Not important / Somewhat important / Very important)
• The description of Achievement Level Descriptions: 25% / 75%
• Reactor Panel discussions: 6% / 94%
• External data: 100%
• Impact data: 19% / 81%
• Alignment of cut points across grades/subjects: 25% / 75%

Arizona


Prior AZ Performance Standards
• AIMS performance standards were not aligned to College and Career Ready Expectations and were not comparable to other tests
• Achieve.org identified Arizona as one of the states with the largest gap (more than 40 percentage points) between state proficiency levels and the state's NAEP proficiency levels


New AZ Performance Standards
• Expected to measure college and career readiness, per law
  – Arizona Revised Statutes 15-741.01(D): Any additional assessments for high school pupils that are adopted by the state board of education after November 24, 2009 shall be designed to measure college and career readiness of pupils


New AZ Performance Standards
• Expected to measure college and career readiness and provide comparability, per policy
  – State Board of Education's Key Values for the new statewide assessment (2014)
    » Measure student mastery of the Arizona standards and progress toward college and career readiness
    » Allow meaningful national or multistate comparisons of school and student achievement


Challenge
• Cut scores recommended by standard setting panelists go to the State Board of Education for adoption
• Previously, the State Board of Education was criticized when it altered cut scores proposed by standard setting panelists
• Goal: establish a standard setting process that would produce recommended cut scores aligned with the Board's stated expectations, so they could be adopted without revision


Strategy
• Include contextual information in the standard setting process that provides panelists with benchmark information related to college and career readiness and comparability
• The primary consideration for panelists should still be the match between the Performance Level Descriptors and the content of the test


Contextual Information
• Approximate performance standards for the following were included at the appropriate grades:
  – AIMS (AZ's previous academic assessment)
  – Smarter Balanced
  – NAEP
  – PISA
  – ACT college ready


Standard Setting Model
• Standard setting panelists were instructed to place their bookmarks based on test content and their "just barely" Performance Level Descriptors
• Contextual information was provided in the Ordered Item Booklet, indicating the general neighborhood where performance standards likely reside
• Contextual information was always available (from Round 1 onward)


Use of Contextual Information
• Was the contextual information useful to the panelists?
• Did they rely primarily on the contextual information to make their bookmark decisions?
• Did they feel coerced into placing their bookmarks based on the contextual information?


What Panelists Said


What Observers Said
• Arizona invited three independent observers to attend the standard setting; they had full access to all panelists and to ADE and vendor staff
• The following are excerpts from their report to the State Board of Education


What Observers Said
• Teachers were trained to make decisions based on the Performance Level Descriptors and the content students are supposed to know. They were guided by the Board's goals of having tests that can be compared to other assessments that reflect college and career readiness.


What Observers Said
• (Teachers) were not told that the cut scores have to be at cut points for other tests, but they were there for context. In the training they were told, "Your decision should be based on your professional opinion. The related tests are to give you a context for your choice."


What Observers Said
• The cut points were set based on teacher judgment, and the final decision was theirs. The directions and training made that clear to the teachers.


Results
• AzMERIT performance standards are quite consistent with the relevant benchmarks: the ACT college-ready benchmark and the NAEP and Smarter Balanced proficient benchmarks


Results
• Arizona is a "Top Truth Teller in 2015 for having a proficiency score within five percentage points of NAEP in eighth-grade math." (HonestGap.org)


Ohio


Performance Level Setting
• Purpose of the Performance Level Workshop
  – Recommend four performance standards to differentiate the five performance levels for State Board consideration


Performance Level Setting
• Ohio Revised Code requires the State Board to set scores for five levels:
  – An advanced level of skill
  – An accelerated level of skill
  – A proficient level of skill
  – A basic level of skill
  – A limited level of skill


Performance Level Setting
• Training for Panelists
  – Take the assessment
  – Review the student population
  – Review the Performance Level Descriptors
  – Discuss the concepts of the bookmark method
  – Discuss the concept of a student "just barely" in each performance level


Performance Level Setting
• Ordered Item Booklet
  – Collection of test items ordered from easiest to most difficult
  – Each page corresponds to a level of achievement
  – Panelists used it to recommend the minimum level of achievement for each level
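Under a 2PL IRT model, the response-probability locations that order the booklet, and the mapping from a bookmark page to a cut score, can be sketched as follows (item parameters are hypothetical, and an operational program's model and RP criterion may differ):

```python
import math

# Sketch: mapping a bookmark placement in an Ordered Item Booklet to a cut
# score. Under a 2PL model, an item's RP67 location (the ability at which a
# student answers correctly with probability 0.67) is
#     theta = b + ln(RP / (1 - RP)) / a
# Items are ordered by this location, easiest to hardest; the location of the
# bookmarked page is the recommended cut. Item parameters are hypothetical.

RP = 0.67

def rp_location(a, b, rp=RP):
    return b + math.log(rp / (1 - rp)) / a

items = [(1.2, -0.8), (0.9, -0.2), (1.5, 0.1), (1.1, 0.6), (0.8, 1.3)]  # (a, b)
booklet = sorted(rp_location(a, b) for a, b in items)  # easiest -> hardest

bookmark_page = 3  # panelist's bookmark (1-indexed page in the booklet)
cut_theta = booklet[bookmark_page - 1]
print(round(cut_theta, 2))
```

The theta cut would then be converted to the reporting scale via the program's scale transformation.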


Performance Level Setting
• Benchmark Data Provided
  – Assessment consortium performance levels (SBAC and PARCC)
  – NAEP performance standards for grades 4 and 8, interpolated for grade 6
  – ACT for high school end-of-course tests
  – Ohio's previous OAA and OGT assessments
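The grade 6 interpolation mentioned above is, in the simplest case, linear between the grade 4 and grade 8 standards. A sketch with hypothetical benchmark values:

```python
# Sketch: linearly interpolating a grade 6 benchmark from grade 4 and grade 8
# standards, as on the slide above. The input values are hypothetical.

def interpolate_benchmark(g4, g8, grade=6):
    return g4 + (g8 - g4) * (grade - 4) / (8 - 4)

print(interpolate_benchmark(238.0, 282.0))  # grade 6 is the midpoint: 260.0
```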


Performance Level Setting
• Workshop Panelists
  – Reviewed and took the test
  – Received training in the performance level setting process
  – Completed two rounds of performance level setting
  – Table leaders reviewed vertical articulation
  – Completed a workshop evaluation


Performance Level Setting
• State Considerations
  – Graduation requirements
  – Third Grade Reading Guarantee
  – State report cards
  – Growth measures
