Using Standardized Formative Assessments
to Improve Classroom Instruction &
Practice
John Bielinski
Director of Research & Development
Classroom Assessment Products
ATI 2016 Conference
Summary
Expert teachers combine professional judgment from the myriad personal
interactions with students and objective performance data to tailor
instruction to the needs of the students.
Most schools use standardized assessments (SA) to make decisions
about programs and students, but most classroom teachers have not
been adequately trained to make effective use of those results,
and many do not trust SA.
This presentation provides a glimpse into the process of standardization
and how to use and interpret scores from SA. Participants will learn about
the factors that influence quality SA development and apply an
interpretive framework to make an action plan using actual student data.
About Me
• PhD in Educational Psychology (1999)
• NCEO: Assessment Policy Research (1997 – 2002)
o Inclusion of SWD in state testing programs
o Empirical research on test accommodations
• Pearson: Research Director (2002-2009)
o KeyMath-3 Diagnostic Assessment & companion intervention
program
• Pearson: Director of R&D (2011 – Present)
o aimswebPlus assessment system
What is meant by standardization and why
do we do it?
How do standardized assessments (SA)
differ from classroom assessment?
What is your perspective on the usefulness
of SA for educators?
What do I need to know about SA to put it
to work for me?
Scientific Law
a statement of fact, deduced from observation, to
the effect that a particular natural or scientific
phenomenon always occurs if certain conditions are
present.
Natural Laws (e.g., Physics) are pervasive
Laws of human behavior are virtually non-existent
Learning requires attention
Individuals differ on all human behaviors
Bethany & Megan
Identical Twins, Elite Runners

Race                         Bethany       Megan
State CC Championships       1st, 17:32    2nd, 17:39
1600m State Championships    1st, 4:49     2nd, 4:50
3200m State Championships    3rd, 10:37    2nd, 10:36
Bob & Steve
Brothers, Elite Runners

Race           Bob             Steve
3200 m (HS)    9:16.9          9:16.6
Marathon       2:08:24 (AR)    2:42:16
The content and task interactions in standardized assessments
• are constrained by cost, time, and infrastructure
• must take into account individual differences in student experiences, interests, and capabilities
Laws of Assessment
r_xy ≤ √(r_xx × r_yy)
r_xy = validity coefficient
r_xx and r_yy = reliability coefficients
Reliability ranges from 0 to 1
Need strong reliability to attain good validity
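As a worked illustration of why weak reliability caps validity, the sketch below computes the ceiling on a validity coefficient from two reliability values, using the classical test theory form r_xy ≤ √(r_xx × r_yy). The function name and the reliability values are made up for illustration and do not come from any assessment in this presentation.

```python
import math


def max_validity(r_xx: float, r_yy: float) -> float:
    """Upper bound on the validity coefficient r_xy given two reliabilities."""
    for r in (r_xx, r_yy):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(r_xx * r_yy)


# Illustrative (made-up) reliabilities for a test and a criterion measure.
print(round(max_validity(0.90, 0.80), 3))  # 0.849
print(round(max_validity(0.50, 0.80), 3))  # 0.632
```

Dropping the test's reliability from 0.90 to 0.50 lowers the best attainable validity from about 0.85 to about 0.63, whatever the test actually measures.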
Reliability increases with the number of positively correlated observations
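One standard way to quantify this relationship is the Spearman-Brown prophecy formula, which the slide does not name; it is used here as an assumption about what the statement refers to. The sketch shows how the reliability of a score grows as more parallel, positively correlated observations are combined. The starting reliability of 0.30 is a made-up value.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Predicted reliability when a measure is lengthened by a factor of k.

    r_single is the reliability of the original measure; k is the ratio of
    the new number of parallel observations to the original number.
    """
    return k * r_single / (1.0 + (k - 1.0) * r_single)


# Made-up example: one observation with reliability 0.30, then 2, 4, and 8.
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.30, k), 2))
# 1 0.3
# 2 0.46
# 4 0.63
# 8 0.77
```

The gains shrink with each doubling, which is one reason brief assessments trade some reliability for shorter administration time.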
There are myriad factors that will determine whether a standardized assessment serves its purpose.
To promote appropriate use of SA results, publishers should
• define the appropriate applications (what decisions does it support) and limitations
• describe the content (construct), development, validity, scaling, and scoring procedures
• provide examples of how to interpret and use results
AND
Consumers of SA results need training and professional development on what the scores mean and how to use the results
Measured Progress evaluated assessment literacy (AL) standards and performance measures for educators. They concluded that
• the coverage of AL in pre-service programs was incomplete and superficial
• performance measures cover AL superficially, rendering them incapable of gauging candidate mastery
Assessment Literacy Standards and Performance Measures for Teacher Candidates and Practicing Teachers, Measured Progress, 2013
B = broadly covered
S = more specificity on the topic
Teachers & administrators must be able to select, effectively interpret, and use results from external interim and summative assessments designed for a variety of purposes:
• Diagnostic
• Benchmark
• General achievement
• Adaptive
• State accountability
Assessment: the process of gathering & integrating information about a student’s behavior
Assessments: types of assessments in specified domains that are scored using a standardized process (e.g., formative, summative, etc.)
Formative: ongoing assessment of student learning to provide feedback to improve teaching and shape subsequent student learning
Summative: evaluation of student learning at the end of instruction
Interim (benchmark): periodic evaluation of student progress toward learning targets
Diagnostic: evaluation of student knowledge and skills to determine areas of strength and weakness
[Figure: assessment types placed on two axes, "Adjusting Instruction" (low info to high info) and "What Students Know and Can Do" (low info to high info). Types shown: questions during instruction, end-of-unit quiz, interim (district testing), end-of-course test, and state tests.]
[Figure: assessment types (classroom formative, interim, diagnostic, state summative) shown in relation to the 50 million students in the US.]
Formative Assessment

Teacher-Developed
Purpose
‒ Assess understanding in real time
‒ Assess engagement
‒ Adapt instruction
‒ Involve students
Qualities
‒ Classroom or student-centered
‒ Informal or formal
‒ Embedded in instruction
Scores
‒ Rubrics, percent correct, letter grade

Standardized
Purpose
‒ Assess understanding
‒ Compare to benchmark(s)
‒ Determine risk
‒ Adapt instruction
Qualities
‒ Grade-centered
‒ Formal
‒ Distinct event
Scores
‒ Number correct, scale score
‒ Percentiles
‒ Performance levels
Assessment Utility Rating
• Rate the three types of assessment on their utility to
guide instructional planning and to evaluate programs
• Scale: 1 (very low) to 10 (very high)

Approach        Assessment Type    Instructional Utility    Program Evaluation
Teacher         Formative          __                       __
Teacher         Interim            __                       __
Teacher         Summative          __                       __
Standardized    Formative          __                       __
Standardized    Interim            __                       __
Standardized    Summative          __                       __
The STANDARDS
Foundations
Validity (25 stds)
Reliability (20 stds)
Fairness (20 stds)
Operations (e.g., scoring, administration, interpretation)
Testing Applications
High quality standardized assessment development
Begins with a clear description of how the results will be used AND requires
• high quality item development (text, art, interactions)
• trying it out on the target population
• strong validation evidence to support claims/uses
Development Process
Defining the Concept → Pilot Testing → Tryout → Standardization → Finalization

Defining the Concept
Build an assessment system that:
• Accurately determines levels of academic risk
• Assesses current learning standards
• Supports differentiated instruction
• Can be used for RtI/MTSS
• Provides data for program planning & resource allocation

Pilot Testing
A series of small-scale studies designed to address specific research questions
• Is the task interaction appropriate for this age?
• How much practice is needed?
• Do students understand instructions?
• Is the content too easy/too difficult?

Tryout
A large-scale study designed to assess item characteristics & build final forms
• Item Difficulty & Reliability
• Item Bias & Differential Item Functioning
• Refine item pool as necessary
• Build test forms

Standardization
A large-scale study with a diverse and nationally representative sample designed to
• Finalize scoring rules & create the score scale
• Generate norms (e.g., national percentiles)
• Evaluate validity & reliability

Finalization
Prepare the product for launch
• Finalize system UI, logic, and reporting features
• Complete supporting guides (e.g., directions for administration, technical manual, etc.)
• Beta test
aimswebPlus Approach
Develop an assessment system that combines relatively brief standards-based assessment with curriculum-based measurement for the dual purpose of 3x per year interim assessment and multi-tiered systems of support
Product Goals
• Improve predictive validity
• Enhance diagnostic utility
• Maintain or improve sensitivity to growth
• Go fully digital
• Keep administration time brief
How to design math CBM that:
• Predict overall math achievement
• Provide information for instructional planning (individual students & classrooms)
• Are delivered online
• Are brief
• Are sensitive to growth
Review relevant research
Evaluate what is working well and not so well in the current product
Consult other assessment experts
Formulate a plan
Hybrid Model Approach
[Diagram: CBM and SBA combine into a Composite; labeled uses are Screening, Differentiate Risk/Tier, and Monitor Progress.]
Constructs
• Mental computation efficiency
• Facility with making judgements about the magnitude and distance between numbers within and across number systems
• Number sense is an essential skill that is predictive of long-term success in math
• Basic math concepts and problem solving skills as defined by current learning standards
What characteristics of CBM enable strong
prediction of overall achievement and sensitivity
to growth while remaining brief?
• Assess basic foundational skills
• Use rate-based scoring
• Use simple response modes
• Measure automaticity (efficiency) on basic skills that underlie development of complex skills
What basic skills are essential for success
in math (Algebra)?
NMAP (2008)
Fluency with whole numbers
Fluency with fractions
Analyze properties of 2-D shapes and
solve perimeter and area problems
Our own research revealed that
Math Computation Fluency is sensitive to
growth and predictive of overall math
achievement
Math Concepts & Applications is
predictive of math achievement but not
that sensitive to growth
Number sense is an essential skill that is
predictive of long-term success in math
CBM Approach

Mental Computation Fluency
• Compute answers to 1- and 2-step expressions
• Uses friendly numbers
• Operations & numbers introduced at least one grade prior
• 4-min; multiple choice
• Correct for guessing

Number Comparison Fluency
• Compare magnitude and distance among 3 numbers
• Uses friendly numbers
• Number systems introduced at least one grade prior
• 3-min; multiple choice
• Correct for guessing
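The slides do not say which correction for guessing is applied. A common textbook form subtracts a fraction of the wrong answers from the number correct, R - W/(k - 1), where k is the number of answer options. The sketch below uses that form as an assumption; the function name, the option count, and the response counts are hypothetical.

```python
def corrected_score(num_correct: int, num_wrong: int, num_options: int) -> float:
    """Textbook correction for guessing: R - W / (k - 1).

    Omitted items are not counted as wrong. This is an assumed formula,
    not necessarily the one used by any particular product.
    """
    if num_options < 2:
        raise ValueError("a multiple-choice item needs at least 2 options")
    return num_correct - num_wrong / (num_options - 1)


# Hypothetical counts on a 3-option multiple-choice fluency measure.
print(corrected_score(20, 20, 3))  # 10.0
print(corrected_score(21, 1, 3))   # 20.5
```

A student who answers every item but gets half wrong loses far more than a student who attempts fewer items carefully, which is the behavior a timed fluency measure is meant to reward.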
Number Comparison Fluency - Triads: Item Count by Grade

Item Type                           2    3    4    5    6    7    8
2-digit numbers                    17    5   --   --   --   --   --
3-digit numbers                    23   23   10   --   --   --   --
4-digit numbers                    --   12   20   15   --   --   --
5-digit numbers                    --   --   --    5   --   --   --
Common fractions                   --   --   10   15   10    8    6
Fractions & decimals               --   --   --   --    8    8    4
Fractions (unlike denominators)    --   --   --   --   14    8    8
Decimals                           --   --   --    5    8    8    4
Negatives                          --   --   --   --   --    8    7
Scientific notation                --   --   --   --   --   --    7
Squares                            --   --   --   --   --   --    4
Mental Computation Fluency: Item Count by Grade

Item Type                                                2    3    4    5    6    7    8
Add, subtract 2- & 3-digit numbers                      42   --   --   --   --   --   --
Add, subtract 3- & 4-digit numbers                      --   26   --   --   --   --   --
Multiply 1-digit with 2- or 3-digit                     --   16    9   --   --   --   --
Divide 3-digit by 1-digit                               --   --    6   --   --   --   --
Add, subtract 4- & 5-digit numbers                      --   --   27   16   --   --   --
Multiply & divide 2- and 3-digit numbers                --   --   --    8    9   --   --
Add, subtract fractions with like denominators          --   --   --    6    3   --   --
Order of operations                                     --   --   --    6   12   12   --
Add and subtract fractions with unlike denominators     --   --   --    6   10   10   10
Multiply decimals by whole numbers                      --   --   --   --    8    5    4
Whole number divided by a fraction                      --   --   --   --   --    4    6
1- and 2-step solve for y                               --   --   --   --   --   11   16
Add, subtract with negative numbers                     --   --   --   --   --   --    6
Concepts and Applications
Assess conceptual knowledge and problem solving ability that reflects grade level learning standards
Students get as much time as they need
Multiple-choice items
Audio
CCSS Domain 2 3 4 5 6 7 8
Operations & Algebraic Thinking 17 30 22 15 4 -- --
Equations & Expressions -- -- -- -- 21 24 43
Functions -- -- -- -- -- -- 7
Number: Base 10 30 14 12 23 6 -- --
Number: Fractions -- 12 25 25 4 -- --
Number Systems -- -- -- -- 27 17 7
Ratios & Proportions -- -- -- -- 9 20 1
Measurement & Data 33 25 18 12 -- -- --
Statistics & Probability -- -- -- -- 6 12 14
Geometry 10 8 12 16 12 18 17
Items per Form 30 30 30 30 30 30 30
Item Count by Grade
Concepts and Applications
Framework for Instructional Planning

Level of Interpretation: Composite
Student
• Risk (low, moderate, high)
• Performance levels
Group
• Percent of students at risk or by performance level

Level of Interpretation: Measure
Student
• Variability (pattern) by measure
• Fluency vs depth of knowledge
• Percent correct
Group
• Group average by measure
• NSF vs CA

Level of Interpretation: Skill Area
Student
• Profile by strand or skill area
Group
• Performance level distribution by skill area (BA, A, AA)

Level of Interpretation: Item
Student
• Correct/incorrect by item
Group
• Percent correct by item
Composite
Composite = NSF (NCF + MCF) + CA
• Broadest indicator of overall performance
• Most reliable score
• Best predictor of end-of-year performance
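The composite structure can be checked against the three student performance summaries reported later in this deck. The sketch below assumes that the parenthesized CA value in those summaries is the CA scale score and that the scores combine by simple addition; the slides do not state either point, so treat this as a consistency check rather than the scoring rule.

```python
# Scores copied from the Student 1-3 performance summaries in this deck.
# "ca_scale" is the parenthesized CA value (assumed to be a scale score).
students = {
    "Student 1": {"ncf_t": 4, "mcf": 20, "nsf": 24, "ca_scale": 203, "composite": 227},
    "Student 2": {"ncf_t": 10, "mcf": 4, "nsf": 14, "ca_scale": 211, "composite": 225},
    "Student 3": {"ncf_t": 24, "mcf": 14, "nsf": 38, "ca_scale": 155, "composite": 193},
}


def is_additive(s: dict) -> bool:
    """True if NSF = NCF-T + MCF and Composite = NSF + CA scale score."""
    return (s["ncf_t"] + s["mcf"] == s["nsf"]
            and s["nsf"] + s["ca_scale"] == s["composite"])


for name, scores in students.items():
    print(name, is_additive(scores))
# Student 1 True
# Student 2 True
# Student 3 True
```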
Profile Analysis: Level & Pattern
[Figure: bar chart of Test 1, Test 2, and Test 3 scores (axis 0 to 120) for Student 1 through Student 4.]
Content

NCF-T
• Efficiency comparing numbers within and across number systems
• Timed, 3 min
• 40 items
• One grade below grade level

MCF
• Efficiency mentally computing
• Timed, 4 min
• 42 items
• One grade below grade level

CA
• Conceptual knowledge and problem solving skills
• Untimed, 30 items
• Mostly on-grade level; basic to advanced skills
Interpretation Questions by Level of Interpretation

Composite
• Is the student at risk? If yes, how serious?

NCF-T
• Is performance fluent (>= 30 correct, >90% accuracy)?
• Is accuracy low (<65%), moderate (65-90%), or high (>90%)?
• Are there skill area deficits?
• Does the student demonstrate basic competency (total score > 10, with >65% accuracy)?

MCF
• Is performance fluent (>= 30 correct, >90% accuracy)?
• Is accuracy low (<65%), moderate (65-90%), or high (>90%)?
• Are there skill area deficits?
• Does the student demonstrate basic competency (total score > 10, with >65% accuracy)?

CA
• Did the student show mastery (>85% correct)?
• Is the student at or above the average range?
• Are there skill area deficits?
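The numeric cut points in these questions can be written as a small decision helper. This is a minimal sketch, not product logic: the function and field names are hypothetical, and the handling of the exact boundary values (for example, 65% counted as moderate) is an assumption based on how the ranges are written.

```python
from dataclasses import dataclass


@dataclass
class FluencyResult:
    fluent: bool
    accuracy_band: str
    basic_competency: bool


def interpret_fluency(num_correct: int, total_score: float,
                      accuracy_pct: float) -> FluencyResult:
    """Apply the NCF-T / MCF cut points to one student's results."""
    if accuracy_pct < 65:
        band = "low"
    elif accuracy_pct <= 90:
        band = "moderate"
    else:
        band = "high"
    return FluencyResult(
        fluent=num_correct >= 30 and accuracy_pct > 90,
        accuracy_band=band,
        basic_competency=total_score > 10 and accuracy_pct > 65,
    )


def shows_mastery(percent_correct: float) -> bool:
    """CA mastery cut point: more than 85% correct."""
    return percent_correct > 85


# Made-up results for two hypothetical students.
print(interpret_fluency(num_correct=32, total_score=31, accuracy_pct=94))
print(interpret_fluency(num_correct=12, total_score=9, accuracy_pct=60))
print(shows_mastery(50))
```

Skill-area deficits and risk status are left out because they depend on norms and item-level data that the questions do not reduce to a single threshold.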
Student 1, Grade 6: Math Profile
[Figure: bar chart of national percentile (0 to 100) for Composite, NSF, CA, NCF-T, and MCF.]
Student 1: Performance Summary

Measure      Risk    NP    Perf Level    Score       Accuracy
Composite    Low     64    A             227         --
NSF          --      64    A             24          --
NCF-T        --      29    A             4           55
MCF          --      88    AA            20          95
CA           --      60    A             15 (203)    50
Student 1: S-W by Skill Area

CA (60th %ile)
Domain    Perf Level
EE        A
NS        W
RP        A
SP        --
Geo       S

NCF-T (29th %ile)
Skill (item count)           # Corr    # Att.
Common Fractions (10)        3         3
Fraction & Decimals (8)      1         3
Unlike Denominator (14)      1         3
Decimals (8)                 1         2

MCF (88th %ile)
Skill (item count)                   # Corr    # Att.
Add & subtract fractions (13)        7         7
Order of Oper. (12)                  4         5
Mult. & Div. 2- & 3-digits (9)       5         5
Mult. Decimals by whole #s (8)       5         5
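The accuracy values in Student 1's performance summary appear to follow from the skill-area counts as total correct divided by total attempted. The deck does not state that definition, so the sketch below treats it as an assumption and checks it against the Student 1 counts for NCF-T and MCF.

```python
# (number correct, number attempted) per skill area, from Student 1's table.
ncf_t = {
    "Common Fractions": (3, 3),
    "Fraction & Decimals": (1, 3),
    "Unlike Denominator": (1, 3),
    "Decimals": (1, 2),
}
mcf = {
    "Add & subtract fractions": (7, 7),
    "Order of Oper.": (4, 5),
    "Mult. & Div. 2- & 3-digits": (5, 5),
    "Mult. Decimals by whole #s": (5, 5),
}


def accuracy_pct(skills: dict) -> float:
    """Percent of attempted items answered correctly (assumed definition)."""
    correct = sum(c for c, _ in skills.values())
    attempted = sum(a for _, a in skills.values())
    return 100.0 * correct / attempted if attempted else 0.0


print(round(accuracy_pct(ncf_t)))  # 55
print(round(accuracy_pct(mcf)))    # 95
```

Both results match the Accuracy column of the performance summary (55 for NCF-T, 95 for MCF), which supports reading accuracy as a measure of care among attempted items rather than of coverage of the whole form.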
Student 1: Composite
• Is the student at risk? If yes, how serious?

Student 1: NCF-T & MCF
• Is performance fluent (>= 30 correct, >90% accuracy)?
• Is accuracy low (<65%), moderate (65-90%), or high (>90%)?
• Does the student demonstrate basic competency (total score > 10, with >65% accuracy)?
• Are there skill area deficits?

Student 1: Concepts & Applications
• Did the student show mastery (>85% correct)?
• Is the student at or above the average range?
• Are there skill area deficits?
Student 1: Conclusions

Composite
• Low overall risk; should remain on track for success without additional intervention

NCF-T
• Does not demonstrate basic competency comparing fractions, and fractions with decimals
• Relatively low accuracy; not due to high amount of guessing
• Consider remediation on basic fraction concepts

MCF
• Demonstrates basic competency, but not quite fluent
• Minimal guessing
• No specific deficits, but should check performance on add/subtract fractions with unlike denominators in light of NCF-T scores

CA
• Has not mastered on-grade level concepts and associated problem solving, but performed within the average range relative to his peers
• Weaknesses in Number Systems; may be due to difficulty with fractions
[Figure: Student 2 bar chart of national percentile (0 to 100) for Composite, NSF, CA, NCF-T, and MCF.]
Student 2: Performance Summary

Measure      Risk    NP    Perf Level    Score       Accuracy
Composite    Low     46    A             225         --
NSF          --      24    BA            14          --
NCF-T        --      40    A             10          50
MCF          --      15    BA            4           83
CA           --      73    A             17 (211)    57
Student 2: S-W by Skill Area

CA (73rd %ile)
Domain    Perf Level
EE        A
NS        S
RP        A
SP        S
Geo       A

NCF-T (40th %ile)
Skill (item count)           # Corr    # Att.
Common Fractions (10)        6         10
Fraction & Decimals (8)      5         14
Unlike Denominator (14)      6         8
Decimals (8)                 3         8

MCF (15th %ile)
Skill (item count)                   # Corr    # Att.
Add & subtract fractions (13)        2         3
Order of Oper. (12)                  2         2
Mult. & Div. 2- & 3-digits (9)       1         1
Mult. Decimals by whole #s (8)       0         0
Student 3: Math Profile
[Figure: bar chart of national percentile (0 to 100) for Composite, NSF, CA, NCF-T, and MCF.]
Student 3: Performance Summary

Measure      Risk    NP    Perf Level    Score      Accuracy
Composite    High    13    BA            193        --
NSF          --      62    A             38         --
NCF-T        --      81    AA            24         78
MCF          --      47    A             14         76
CA           --      1     WBA           3 (155)    10
Student 3: S-W by Skill Area

CA (1st %ile)
Domain    Perf Level
EE        W
NS        W
RP        W
SP        A
Geo       W

NCF-T (81st %ile)
Skill (item count)           # Corr    # Att.
Common Fractions (10)        9         9
Fraction & Decimals (8)      5         12
Unlike Denominator (14)      6         7
Decimals (8)                 8         8

MCF (47th %ile)
Skill (item count)                   # Corr    # Att.
Add & subtract fractions (13)        3         7
Order of Oper. (12)                  5         5
Mult. & Div. 2- & 3-digits (9)       5         5
Mult. Decimals by whole #s (8)       3         4
Student 2: Conclusions

Composite
• Overall risk is at the low to moderate cut point. Consider intervention (e.g., more practice, small group work, etc.)

NCF-T
• Demonstrates basic competency comparing numbers within and across number systems
• Low accuracy (50%); possibly rushed to answer every question
• May struggle with decimals

MCF
• Does not demonstrate basic competency
• Minimal guessing, but attempted few items
• Low performance across the board; should check performance on easier content to establish mental computation ability

CA
• Has not mastered on-grade level concepts and associated problem solving
• In average range for this grade; thus, appears to be on track with conceptual understanding and problem solving skill
• Weaknesses solving word problems involving rate & ratio
Student 3: Conclusions

Composite
• Spring performance places student at high risk. Without intensive intervention, student not likely to be successful in 7th grade.
• Leverage good NSF, possibly mix in practice on fluency skills as a motivator

NCF-T
• Demonstrates basic competency comparing numbers within and across number systems, with moderate accuracy
• Perfect scores on problems involving fractions with like denominators, and comparing decimals.
• Struggles comparing decimals and fractions, and fractions with unlike denominators

MCF
• Demonstrates basic competency, with moderate accuracy
• No apparent strengths or weaknesses

CA
• Knowledge of concepts and problem solving skills are very poor
• General weaknesses across the board.
• Poor reading skills may partially explain poor performance, especially on word problems