Incorporating Contextual Information in Standard Setting
Gary W. Phillips, VP, American Institutes for Research
Vince Verges, Assistant Deputy Commissioner, Florida
Irene Hunting, Deputy Associate Superintendent, Arizona
James Wright, Director, Office of Curriculum and Assessment, Ohio
June 20, 2016
2016 CCSSO/NCSA
Copyright © 2016 American Institutes for Research. All rights reserved.
AMERICAN INSTITUTES FOR RESEARCH
What is Contextual Information in Standard Setting?
• Data used to inform panelist decisions above and beyond the content standards, performance level descriptions, and the test items
Impact Data
• State item P-values
• Item maps using Response Probabilities (RPs)
• Overall state inverse cumulative percentages
• State inverse cumulative percentages by demographic subgroup
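Inverse cumulative percentages of this kind are straightforward to compute from a state score file. A minimal sketch in Python, using made-up scale scores and a hypothetical cut score (not any state's actual data):

```python
import numpy as np

def inverse_cumulative_pct(scores, cut):
    """Percentage of students scoring at or above a candidate cut score."""
    return 100.0 * np.mean(np.asarray(scores) >= cut)

# Hypothetical scale scores standing in for a state population (not real data)
rng = np.random.default_rng(0)
scores = rng.normal(500, 50, size=10_000).round()

# Impact of a candidate cut at 520: % at or above, overall and
# for an illustrative subgroup slice
print(inverse_cumulative_pct(scores, 520))
print(inverse_cumulative_pct(scores[:3_000], 520))
```

Panelists would see one such percentage per candidate cut, overall and broken out by demographic subgroup.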
Articulation
• Use student frequency distributions to smooth cut scores across grades
• Use the vertical scale to smooth cut scores across grades
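One simple way to smooth on a vertical scale is to fit a trend to the panel-recommended cuts and keep the sequence non-decreasing across grades. A hedged sketch with invented cut scores (a linear fit is only one of several defensible smoothers):

```python
import numpy as np

def smooth_cuts(grades, cuts):
    """Replace panel-recommended cuts with a linear trend across grades,
    then force the sequence to be non-decreasing."""
    grades = np.asarray(grades, float)
    cuts = np.asarray(cuts, float)
    slope, intercept = np.polyfit(grades, cuts, 1)
    trend = intercept + slope * grades
    return np.maximum.accumulate(trend)  # cuts should not drop across grades

grades = [3, 4, 5, 6, 7, 8]
raw_cuts = [412, 431, 425, 447, 452, 466]  # hypothetical Level 3 cuts
print(smooth_cuts(grades, raw_cuts).round(1))
```

The same idea works with percentile-based smoothing of the grade-level frequency distributions in place of the linear trend.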
National Benchmarks
• ACT/SAT
• NAEP
• National norm-referenced tests
International Benchmarks
• PISA
  – Three-year cycle (2012, 2015, 2018)
  – Reading, Mathematics & Science
  – Age 15
• TIMSS
  – Four-year cycle (2011, 2015, 2019)
  – Mathematics and Science
  – Grades 4 & 8
• PIRLS
  – Five-year cycle (2011, 2016, 2021)
  – Reading Literacy
  – Grade 4
Benchmarking Methodology
• Linking
  – Item calibration
    » Common-item linking (e.g., PISA in Delaware, Hawaii, and Oregon, 2015)
  – Equipercentile
    » Randomly equivalent groups (e.g., comparing state standards to SBAC and PARCC standards, 2016)
  – Statistical moderation
    » Randomly equivalent groups (e.g., comparing state standards to TIMSS standards, 2010)
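Under randomly equivalent groups, equipercentile linking maps a cut on one test to the score at the same percentile rank on the other. A minimal sketch with simulated scores (operational linkings use smoothed distributions and report standard errors):

```python
import numpy as np

def equipercentile_link(cut_x, x_scores, y_scores):
    """Map a cut on test X to test Y at the same percentile rank,
    assuming randomly equivalent groups took the two tests."""
    pr = np.mean(np.asarray(x_scores) <= cut_x)          # percentile rank on X
    return float(np.quantile(np.asarray(y_scores), pr))  # matching Y quantile

rng = np.random.default_rng(1)
state = rng.normal(200, 25, 5_000)   # simulated state-test scores
bench = rng.normal(500, 100, 5_000)  # simulated benchmark-test scores
print(equipercentile_link(240, state, bench))
```

The linked benchmark score is then presented to panelists as contextual information alongside the candidate state cut.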
When Is Contextual Information Important?
• When you are trying to reach a policy goal
• The earlier you introduce contextual information into the standard-setting workshop, the more likely it is to influence the outcome
Literature
• Ferrara (2005). Vertically Articulated Performance Standards (special issue of Applied Measurement in Education)
• McClarty, Way, Porter, Beimer, and Miles (2012). Evidence-Based Standard Setting (Educational Researcher)
• Phillips (2011). The Benchmark Method of Standard Setting. In Gregory J. Cizek (Ed.), Setting Performance Standards (2nd ed.). New York: Routledge
• Phillips, G., & Jiang, T. (2015). Using PISA as an International Benchmark in Standard Setting. Journal of Applied Measurement, 16(2), 161–170
Florida
Florida Standards Assessment (FSA)
• Implemented in spring 2015 (baseline administration)
• Assessments administered:
  – Grades 3–10 English Language Arts (includes text-based writing)
  – Grades 3–8 Mathematics
  – Algebra 1 EOC
  – Geometry EOC
  – Algebra 2 EOC
Standard Setting: A Multi-Stage Process
Four Rounds of Educator Panel Judgment
• Initial judgment based on test content and Achievement Level Descriptors (round 1)
• Articulation – how cut scores appear across grades in Grades 3-10 ELA and Grades 3-8 mathematics (round 2)
• Impact data – how many students would be in each achievement level, and how subgroups would perform based on recommended cut scores (round 3)
• Benchmarking – how students would compare on FSA vs. international assessments (round 4)
Standard Setting Process: Achievement Level Policy Definitions
• Achievement Level Policy Definitions describe student achievement of the Florida Standards at each achievement level

Level 1: Students at this level demonstrate an inadequate level of success with the challenging content of the Florida Standards.
Level 2: Students at this level demonstrate a below satisfactory level of success with the challenging content of the Florida Standards.
Level 3: Students at this level demonstrate a satisfactory level of success with the challenging content of the Florida Standards.
Level 4: Students at this level demonstrate an above satisfactory level of success with the challenging content of the Florida Standards.
Level 5: Students at this level demonstrate mastery of the most challenging content of the Florida Standards.
Individual and Group-Level Stakes
• Individual
  – To be promoted, Grade 3 students must score at or above Level 2 on ELA ("good cause" exemptions exist, however)
  – Students must score at or above Level 3 on Grade 10 ELA and the Algebra 1 EOC to graduate (retakes and alternative assessments provided for)
  – EOCs count as 30% of the course grade
• Group
  – School grades, which include learning gains, acceleration, and improvement of performance of the lowest 25% of students
  – School recognition dollars
  – District grades (based on school grades)
  – Teacher evaluation (value-added model; scores count at least 33% toward evaluation)
ELA – Round 1 (Test Items & ALDs)
ELA – Round 2 (Articulation)
ELA – Round 3 (Impact Data - % in each Level; Subgroup Performance)
ELA – Round 4 (Benchmark to National/Int’l Tests)
ELA – Reactor Panel (Educator Panel + Policy Considerations)
Mathematics – Round 1 (Test Items & ALDs)
Mathematics – Round 2 (Articulation)
Mathematics – Round 3 (Impact Data - % in each Level; Subgroup Performance)
Mathematics – Round 4 (Benchmark to National/Int’l Tests)
Mathematics – Reactor Panel (Educator Panel + Policy Considerations)
Math EOCs – Round 1 (Test Items & ALDs)
Math EOCs – Round 2 (Pseudo-Articulation)
Math EOCs – Round 3 (Impact Data - % in each Level; Subgroup Performance)
Math EOCs – Round 4 (Benchmark to National/Int’l Tests)
Math EOCs – Reactor Panel (Educator Panel + Policy Considerations)
What Florida Educator Panelists Said
How important were the following factors in your placement of the bookmarks?

Factor                                   Very Important   Somewhat Important   Not Important
Achievement Level Descriptions (ALDs)         84%                14%                 2%
External benchmark data                       37%                53%                10%
Feedback data                                 77%                23%
Impact data                                   70%                29%                 1%
What Florida Educator Panelists Said
Statement                                                          Strongly Agree   Agree   Disagree   Strongly Disagree
The feedback on cut scores was helpful in my decisions
regarding placement of my bookmarks.                                    74%          24%       1%
I found the panelist feedback data and discussion helpful in
my decisions about where to place my bookmarks.                         83%          17%      <1%
I found the impact data and discussion helpful in my
decisions about where to place my bookmarks.                            69%          28%       3%
I made my recommendations independently and did not feel
pressured to set my bookmarks at a certain level.                       82%          17%       1%
I believe that the recommended cut scores represent the
expectations of performance for the students of Florida.                71%          29%      <1%
What Florida Reactor Panelists Said
Usefulness of the following used during the Reactor Panel meeting

Activity                                              Not at all useful   Somewhat useful   Very useful
Reviewing external data                                                        25%              75%
Reviewing the standard-setting process used by the
Educator Panel                                                                  6%              94%
Reactor Panel discussions                                                                      100%
What Florida Reactor Panelists Said
How important was each of the following factors in rendering your judgments?

Factor                                              Not important   Somewhat important   Very important
The description of Achievement Level Descriptions                          25%                75%
Reactor Panel discussions                                                   6%                94%
External data                                                                                100%
Impact data                                                                19%                81%
Alignment of cut points across grades/subjects                             25%                75%
Arizona
Prior AZ Performance Standards
• AIMS performance standards were not aligned to college- and career-ready expectations and were not comparable to other tests
• Achieve.org identified Arizona as one of the states with the largest gap (more than 40 percentage points) between the state's proficiency levels and its NAEP proficiency levels
New AZ Performance Standards
• Expected to measure college and career readiness, per law
  – Arizona Revised Statutes 15-741.01(D): Any additional assessments for high school pupils that are adopted by the state board of education after November 24, 2009 shall be designed to measure college and career readiness of pupils
New AZ Performance Standards
• Expected to measure college and career readiness and provide comparability, per policy
  – State Board of Education's Key Values for the new statewide assessment (2014)
    » Measure student mastery of the Arizona standards and progress toward college and career readiness
    » Allow meaningful national or multistate comparisons of school and student achievement
Challenge
• Cut scores recommended by standard-setting panelists go to the State Board of Education for adoption
• The State Board of Education had previously been criticized when it altered cut scores proposed by standard-setting panelists
• Goal: establish a standard-setting process that would produce recommended cut scores aligned with the Board's stated expectations, so they could be adopted without revision
Strategy
• Include contextual information in the standard-setting process that provides panelists with benchmark information related to college and career readiness and comparability
• The primary consideration for panelists should still be the match between the Performance Level Descriptors and the content of the test
Contextual Information
• Approximate performance standards for the following were included at the appropriate grades:
  – AIMS (AZ's previous academic assessment)
  – Smarter Balanced
  – NAEP
  – PISA
  – ACT college ready
Standard Setting Model
• Standard-setting panelists were instructed to place their bookmarks based on test content and their "just barely" Performance Level Descriptors
• Contextual information was provided in the Ordered Item Booklet, indicating the general neighborhood where performance standards likely reside
• The contextual information was always available (from Round 1 onward)
Use of Contextual Information
• Was the contextual information useful to the panelists?
• Did they rely primarily on the contextual information to make their bookmark decisions?
• Did they feel coerced into placing their bookmarks based on the contextual information?
What Panelists Said
What Panelists Said
What Observers Said
• Arizona invited three independent observers to attend the standard setting; they had full access to all panelists and to ADE and vendor staff
• The following are excerpts from their report to the State Board of Education
What Observers Said
• Teachers were trained to make decisions based on the Performance Level Descriptors and the content students are supposed to know. They were guided by the Board's goals of having tests that can be compared to other assessments that reflect college and career readiness.
What Observers Said
• (Teachers) were not told that the cut scores had to be at the cut points for other tests; those were there for context. In the training they were told, "Your decision should be based on your professional opinion. The related tests are to give you a context for your choice."
What Observers Said
• The cut points were set based on teacher judgment, and the final decision was theirs. The directions and training made that clear to the teachers.
Results
• AzMERIT performance standards are quite consistent with the relevant benchmarks: ACT college ready, NAEP proficient, and Smarter Balanced proficient
Results
• Arizona is a "Top Truth Teller in 2015 for having a proficiency score within five percentage points of NAEP in eighth-grade math." – HonestGap.org
Ohio
Performance Level Setting
• Purpose of the Performance Level Workshop
  – Recommend four performance standards to differentiate the five performance levels, for State Board consideration
Performance Level Setting
• Ohio Revised Code requires the State Board to set scores for five levels:
  – An advanced level of skill
  – An accelerated level of skill
  – A proficient level of skill
  – A basic level of skill
  – A limited level of skill
Performance Level Setting
• Training for Panelists
  – Take the assessment
  – Review the student population
  – Review the Performance Level Descriptors
  – Discuss concepts of the bookmark method
  – Discuss the concept of a student "just barely" in each performance level
Performance Level Setting
• Ordered Item Booklet
  – A collection of test items ordered from easiest to most difficult
  – Each page corresponds to a level of achievement
  – Panelists use it to recommend the minimum level of achievement for each performance level
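With a bookmark-style method, each OIB page sits at the scale location where a "just barely" student has the response-probability criterion (often RP67) chance of success. A sketch under a 2PL model with invented item difficulties (not Ohio's items or parameters, and bookmark-to-cut conventions vary across implementations):

```python
import math

RP = 0.67  # response-probability criterion often used with the bookmark method

def rp_location(b, a=1.0, rp=RP):
    """Theta where a 2PL item (discrimination a, difficulty b) is answered
    correctly with probability rp."""
    return b + math.log(rp / (1 - rp)) / (1.7 * a)

# Invented item difficulties for a 7-page ordered item booklet
item_b = [-1.8, -1.1, -0.6, -0.2, 0.3, 0.9, 1.4]
oib = sorted(rp_location(b) for b in item_b)  # pages run easiest -> hardest

# A panelist bookmarks page 5: under one common convention, the "just barely"
# student masters pages 1-4, so the cut is the RP67 location of page 4.
bookmark_page = 5
cut_theta = oib[bookmark_page - 2]
print(round(cut_theta, 2))
```

The theta cut is then transformed to the reporting scale; benchmark locations can be annotated on the corresponding OIB pages as contextual information.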
Performance Level Setting
• Benchmark Data Provided
  – Assessment consortium performance levels (SBAC and PARCC)
  – NAEP performance standards for grades 4 and 8, interpolated for grade 6
  – ACT for high school end-of-course tests
  – Ohio's previous OAA and OGT assessments
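Because NAEP tests grades 4 and 8 but not grade 6, a grade-6 reference point can be linearly interpolated between the two once both benchmarks are expressed on a common scale. A sketch with hypothetical cut values:

```python
def interpolate_grade(g, g_lo, cut_lo, g_hi, cut_hi):
    """Linearly interpolate a benchmark cut for a grade between two
    grades that have established benchmarks."""
    t = (g - g_lo) / (g_hi - g_lo)
    return cut_lo + t * (cut_hi - cut_lo)

# Hypothetical grade 4 and grade 8 benchmark cuts on a common scale
grade6_cut = interpolate_grade(6, 4, 238.0, 8, 281.0)
print(grade6_cut)
```

Grade 6 falls halfway between grades 4 and 8, so the interpolated cut is simply the midpoint of the two benchmark values.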
Performance Level Setting
• Workshop Panelists
  – Reviewed and took the test
  – Received training in the performance-level-setting process
  – Completed two rounds of performance level setting
  – Table leaders reviewed vertical articulation
  – Completed a workshop evaluation
Performance Level Setting
• State Considerations
  – Graduation requirements
  – Third Grade Reading Guarantee
  – State report cards
  – Growth measures