
Page 1: DDM Part II Analyzing the Results Dr. Deborah Brady

DDM Part II: Analyzing the Results

Dr. Deborah Brady

Page 2: DDM Part II Analyzing the Results Dr. Deborah Brady

Agenda

- Overview of how to measure growth in 4 “common sense” ways
- A quick look at “standardization”
- Not all analyses are statistical or new; we’ll use familiar ways of looking at student work
- Excel might help when you have a whole grade’s scores, but it is not essential
- Time for your questions; exit slips
- My email: [email protected]

Page 3: DDM Part II Analyzing the Results Dr. Deborah Brady

2 Considerations for Local DDMs

1. Comparable across schools
- Example: teachers with the same job (e.g., all 5th grade teachers)
- Where possible, measures are identical; identical measures are easier to compare
- But do identical measures provide meaningful information about all students?
- Exceptions: when might assessments not be identical?
  * Different content (different sections of Algebra I)
  * Differences in untested skills (reading and writing on a math test for ELL students)
  * Other accommodations (fewer questions for students who need more time)

NOTE: Roster verification and group size will be considerations by DESE.

Page 4: DDM Part II Analyzing the Results Dr. Deborah Brady

2. Comparable across the district
- Aligned to your curriculum (comparable content), K-12 in all disciplines
- Appropriate for your students
- Aligned to your district’s content
- Informative and useful to teachers and administrators

“Substantial” assessments (comparable rigor):
- “Substantial” units with at least 2 standards and/or concepts assessed (DESE recently began describing finals and midterms as preferable)
- Quarterly assessments, benchmarks, midterms, and common end-of-year exams
- See the Core Curriculum Objectives (CCOs) on the DESE website if you are concerned: http://www.doe.mass.edu/edeval/ddm/example/

NOTE: All of this data stays in your district. Only the H/M/L ratings go to DESE, with a MEPID for each educator.

Page 5: DDM Part II Analyzing the Results Dr. Deborah Brady

Examples of 4 + 1 Methods for Calculating Growth (each is in the handout)

- Pre/post test
- Repeated measures
- Holistic rubric (analytic rubric)
- Post-test only
- Plus: a look at “standardization” with percentiles

Page 6: DDM Part II Analyzing the Results Dr. Deborah Brady

Typical Gradebook and Distribution (page 1 of handout)

- Alphabetical order (random)
- Sorted low to high
- Determine “cut scores” (validate them in the student work)
- Use the “stoplight method” to help see cut scores
- Graph the distribution of all scores
- Graph the distribution of High, Moderate, and Low scores

Page 7: DDM Part II Analyzing the Results Dr. Deborah Brady

Random   Sorted
90       52
76       60
92       61
72       63
80       65
98       72
91       75
75       76
60       76
52       77
76       78
77       79
96       80
61       80
63       84
78       85
79       86
95       90
80       91
85       92
86       95
84       96
65       98

[Bar chart: distribution of whole class, all scores, low to high]

[Bar chart: High, Moderate, Low distribution. Counts: High 6, Moderate 12, Low 5]

“Cut” scores and “common sense”: validate them with performances.
- What work is not moving at an average rate?
- What work shows accelerated growth?
- Some benchmarks have determined rates of growth over time.
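
For a whole class list, a short script can do the sort-and-count step. A minimal sketch in Python, assuming the hypothetical cut scores marked below (Low at 65 and below, High at 90 and above), which happen to reproduce the counts in the chart above; validate any real cuts against student work:

```python
# Sort a class's scores and count the Low / Moderate / High bands.
scores = [90, 76, 92, 72, 80, 98, 91, 75, 60, 52, 76, 77,
          96, 61, 63, 78, 79, 95, 80, 85, 86, 84, 65]   # class scores from the slide

def band(score):
    """Classify one score as Low / Moderate / High using hypothetical cuts."""
    if score <= 65:      # hypothetical low cut
        return "Low"
    if score >= 90:      # hypothetical high cut
        return "High"
    return "Moderate"

counts = {"Low": 0, "Moderate": 0, "High": 0}
for s in sorted(scores):          # "sorted low to high"
    counts[band(s)] += 1

print(counts)   # {'Low': 5, 'Moderate': 12, 'High': 6}
```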

Page 8: DDM Part II Analyzing the Results Dr. Deborah Brady

Pre/Post Test

Description: the same or similar assessments administered at the beginning and at the end of the course or year.

Example: a Grade 10 ELA writing assessment aligned to College and Career Readiness Standards, given at the beginning and end of the year.

Measuring growth: the difference between pre- and post-test scores. Check that all students have an equal chance of demonstrating growth.

Page 9: DDM Part II Analyzing the Results Dr. Deborah Brady

Pre/Post Tests

Pre-test         Post-test   Difference
(low to high)                (growth)
20               35          15
25               30           5
30               50          20  ?
35               60          25
35               60          25
40               70          30
40               65          25
50               75          25
50               80          30
50               85          35

Analysis: the range of growth runs from 5 to 35 points.

How many Low/Moderate/High? Low 2, Moderate 5, High 3.
Cut scores? Look at the work. Look at the distribution.
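
The same approach works in code: compute each student’s gain, sort the gains, and count the bands. A minimal sketch mirroring the table above; the cuts (15 and below is Low, 30 and above is High) are hypothetical, chosen to reproduce the 2/5/3 split:

```python
# Pre/post arithmetic: growth = post - pre, then rank the gains.
pre  = [20, 25, 30, 35, 35, 40, 40, 50, 50, 50]
post = [35, 30, 50, 60, 60, 70, 65, 75, 80, 85]

gains = [b - a for a, b in zip(pre, post)]
print(sorted(gains))                     # [5, 15, 20, 25, 25, 25, 25, 30, 30, 35]

low  = sum(1 for g in gains if g <= 15)  # 2 students with low growth
high = sum(1 for g in gains if g >= 30)  # 3 students with high growth
mod  = len(gains) - low - high           # 5 students in the middle
print(low, mod, high)
```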

Page 10: DDM Part II Analyzing the Results Dr. Deborah Brady

Holistic

Description: assess growth across student work collected throughout the year.

Example: Tennessee Arts Growth Measure System.

Measuring growth: growth rubric (see example).

Considerations: an option for multifaceted performance assessments; rating can be challenging and time-consuming.

Page 11: DDM Part II Analyzing the Results Dr. Deborah Brady

Holistic Example (an unusual rubric: it scores growth in detail across drafts)

1. No improvement in the level of detail. One is true:
   * No new details across versions.
   * New details are added, but not included in future versions.
   * A few new details are added that are not relevant, accurate, or meaningful.

2. Modest improvement in the level of detail. One is true:
   * A few details are included across all versions.
   * Many added details are included, but not consistently, or none are improved or elaborated upon.
   * There are many added details, but several are not relevant, accurate, or meaningful.

3. Considerable improvement in the level of detail. All are true:
   * There are many examples of added details across all versions.
   * At least one detail is improved or elaborated in future versions.
   * Details are consistently included in future versions.
   * The added details reflect relevant and meaningful additions.

4. Outstanding improvement in the level of detail. All are true:
   * On average, multiple details are added across every version.
   * There are multiple examples of details that build on and elaborate previous versions.
   * The added details reflect the most relevant and meaningful additions.

Example taken from Austin, a first grader from Anser Charter School in Boise, Idaho. Used with permission from Expeditionary Learning. Learn more about this and other examples at http://elschools.org/student-work/butterfly-drafts

Page 12: DDM Part II Analyzing the Results Dr. Deborah Brady

Holistic Rubrics Are Easier for Large-Scale Assessments (like MCAS)

A holistic rubric puts all the criteria for a topic (or for conventions) in one cell per performance level; it is useful when categories overlap.

Criteria (Writing): 1) claims/evidence, 2) counterclaims, 3) organization, 4) language/style.

Advanced: 1) Insightful, accurate, carefully developed claims and evidence. 2) Counterclaims are thoughtfully, accurately, and completely discussed and argued. 3) The whole essay and each paragraph are carefully organized and show interrelationships among ideas. 4) Sentence structure, vocabulary, and mechanics show control over language use.

Proficient: adequate, effective; “gets it.”

Needs Improvement: misconceptions; some errors.

At Risk: serious errors.

Page 13: DDM Part II Analyzing the Results Dr. Deborah Brady

MCAS Has 2 Holistic Rubrics

Topic/Development (scored 6 to 1):
6: Rich topic/idea development; careful, subtle organization; effective, rich use of language.
5: Full topic/idea development; logical organization; strong details; appropriate use of language.
4: Moderate topic/idea development and organization; adequate, relevant details; some variety in language.
3: Rudimentary topic/idea development and/or organization; basic supporting details; simplistic language.
2: Limited or weak topic/idea development, organization, and/or details; limited awareness of audience and/or task.
1: Little topic/idea development, organization, and/or details; little or no awareness of audience and/or task.

Standard English Conventions (scored 4 to 1):
4: Control of sentence structure, grammar, usage, and mechanics (length and complexity of the essay provide the opportunity for the student to show control of standard English conventions).
3: Errors do not interfere with communication, and/or there are few errors relative to the length of the essay or the complexity of sentence structure, grammar and usage, and mechanics.
2: Errors interfere somewhat with communication, and/or there are too many errors relative to the length of the essay or the complexity of sentence structure, grammar and usage, and mechanics.
1: Errors seriously interfere with communication, AND there is little control of sentence structure, grammar and usage, and mechanics.

Page 14: DDM Part II Analyzing the Results Dr. Deborah Brady

Pre and Post Rubric (2 Criteria) Growth

Add the scores for each criterion (Topic / Conventions):

Pretest   Post-test   Gain    Total gain    Ranked
(T/C)     (T/C)       (T/C)   (raw score)   (in order)
1/1       1/1         0/0     0             0
1/2       2/2         1/0     1             1
1/2       2/3         1/1     2             1
2/3       3/3         1/0     1             2

Rubric scores do not represent percentages. A student who received a 1 would probably receive a 50: an F?

1 = 50 (F): seriously at risk
2 = 60-72, 75? (D to C-): at risk
3 = 76-88, 89? (C+ to B+): average
4 = 90-100 (A to A+): above most
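
A minimal sketch of the “add the criteria gains” arithmetic, using the four students in the table above:

```python
# Two-criterion rubric (Topic, Conventions): sum the per-criterion gains.
pre  = [(1, 1), (1, 2), (1, 2), (2, 3)]   # (Topic, Conventions) pretest scores
post = [(1, 1), (2, 2), (2, 3), (3, 3)]   # (Topic, Conventions) post-test scores

totals = [(t1 - t0) + (c1 - c0)
          for (t0, c0), (t1, c1) in zip(pre, post)]

print(totals)          # [0, 1, 2, 1] -- raw total gain per student
print(sorted(totals))  # [0, 1, 1, 2] -- "in order", ready for cut scores
```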

Page 15: DDM Part II Analyzing the Results Dr. Deborah Brady

Holistic Rubric or Holistic Descriptor (keeping the 1-4 scale)

Pre   Post   Difference   Ranked   Cut
0     1      +1           -1       -1
0     1      +1            0        0
0     1      +1            0        0
1     0      -1            1        1
1     1       0            1        1
1     1       0            1        1
1     3      +2            1        1
1     1       0            1        1
2     3      +1            2        2

(Rank-order the differences, as in the “Ranked” column, to look for natural cut points.)

[Bar chart: distribution of low, moderate, and high growth]

Page 16: DDM Part II Analyzing the Results Dr. Deborah Brady

Converting Rubrics to Percentages

Not recommended for classroom use because it distorts the meaning of the descriptors, but it may facilitate large-scale use. A district decision.

Pre   Converted   Post   Converted   Difference   Ranked
0     50          1      65           15          -15
0     50          1      65           15            0
0     50          1      65           15            0
1     65          0      50          -15            0
1     65          1      65            0            0
1     65          1      65            0           15
1     65          3      82           17           15
1     65          1      65            0           15
2     82          3      82            0           17

Common-sense analysis: Was the assessment too difficult?
- Three zeros on the pretest.
- Zero growth for four students, plus one with negative growth.
- Only one student improved by more than one rubric level.
- Change the assessment scale?
- Look at all of the grade-level assessments.
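
A minimal sketch of the conversion and the common-sense checks, using the slide’s own mapping (note that its table converts both a 2 and a 3 to 82):

```python
# Convert rubric scores to the slide's percentage equivalents, then re-run
# the "common sense" checks on the differences.
pre  = [0, 0, 0, 1, 1, 1, 1, 1, 2]   # rubric scores from the table above
post = [1, 1, 1, 0, 1, 1, 3, 1, 3]

CONVERT = {0: 50, 1: 65, 2: 82, 3: 82}   # the slide's percentage equivalents

diffs = [CONVERT[b] - CONVERT[a] for a, b in zip(pre, post)]
print(sorted(diffs))                    # [-15, 0, 0, 0, 0, 15, 15, 15, 17]

print(sum(1 for p in pre if p == 0))    # 3 -- zeros on the pretest
print(sum(1 for d in diffs if d == 0))  # 4 -- students with zero growth
print(sum(1 for d in diffs if d < 0))   # 1 -- student with negative growth
```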

Page 17: DDM Part II Analyzing the Results Dr. Deborah Brady

Repeated Measures

Description: multiple assessments given throughout the year.

Examples: running records, attendance, the mile run.

Measuring growth: graphically, ranging from the sophisticated to the simple.

Considerations: less pressure on each administration; authentic tasks (reading aloud, running).


Page 19: DDM Part II Analyzing the Results Dr. Deborah Brady

Repeated Measures Example: Running Record Errors in Reading
Averages of the high, moderate, and low error groups at each administration:

Group       Sept   Nov   Jan   Mar   Apr   June
High        65     48    30    15    15    13
Moderate    30     35    20    22    18    10
Low         22     10    12     5     2     1

[Line chart: Error Chart of Averages from each assessment, administrations 1-6]
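
A minimal sketch of summarizing repeated measures, using the averages reconstructed above; “improved by” here is simply the first administration minus the last, since fewer errors means growth:

```python
# Average reading errors per administration for each group.
administrations = ["Sept", "Nov", "Jan", "Mar", "Apr", "June"]
errors = {
    "high":     [65, 48, 30, 15, 15, 13],
    "moderate": [30, 35, 20, 22, 18, 10],
    "low":      [22, 10, 12,  5,  2,  1],
}

for group, counts in errors.items():
    drop = counts[0] - counts[-1]      # errors eliminated from Sept to June
    print(f"{group:9s} {counts} improved by {drop}")
```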

Page 20: DDM Part II Analyzing the Results Dr. Deborah Brady

Post-Test Only

AP exam: use as a baseline to show growth for each level, or... for the classroom.

This assessment does not have a “normal curve.”

An alternative for a post-test-only classroom measure that still shows student growth: give a mock AP exam as the pre-test and the real exam as the post-test.

[Bar chart: Post-Test Only AP Exam Example, number of students at each score, 5 through 1]

Page 21: DDM Part II Analyzing the Results Dr. Deborah Brady

Looking for Variability

[Bar chart, “Good”: number of students spread across the Low, Moderate, and High growth categories]

[Bar chart, “Problematic”: number of students heavily concentrated in the High growth category]

The second graph is problematic because it doesn’t give us information about the difference between average and high growth: too many students fall into the “high” growth category.

NOTE: Look at the work and make “common sense” decisions. Consider the whole grade level; one class’s variation may be caused by the teacher’s effectiveness.

Critical question: Do all students have an equal possibility for success?

Page 22: DDM Part II Analyzing the Results Dr. Deborah Brady

“Standardizing” Local Norms: Percentages versus Percentiles

A percentage is within one class or course; percentiles compare across all courses in the district by placing every score on the same (“standardized”) normal curve.

Many assessments with different standards; Student A’s results:

Subject          Raw score   Percentage   “Standardized” percentile
English          15/20       75%          62nd
Math             22/25       88%          72nd
Art              116/150     77%          59th
Social Studies   6/10        60%          71st
Science          70/150      46%          70th
Music            35/35       100%         61st
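
A minimal sketch of the distinction: the percentage depends only on the test, while the percentile rank depends on how the rest of the group scored. The class scores below are hypothetical, not from the slide:

```python
# Percentage within one assessment vs. percentile rank within the group.
def percentage(raw, total):
    return 100 * raw / total

def percentile_rank(score, all_scores):
    """Percent of the group scoring at or below this score."""
    return 100 * sum(1 for s in all_scores if s <= score) / len(all_scores)

class_scores = [8, 10, 11, 12, 12, 13, 14, 15, 17, 19]   # hypothetical peers
print(percentage(15, 20))                 # 75.0 -- Student A's English percent
print(percentile_rank(15, class_scores))  # 80.0 -- 8 of 10 scored 15 or lower
```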

Page 23: DDM Part II Analyzing the Results Dr. Deborah Brady

Standardization in Everyday Terms

Standardization is a process of putting different measures on the same scale.

For example:
- Most cars cost $25,000, give or take $5,000.
- Most apples cost $1.50, give or take $0.50.
- Getting a $5,000 discount on a car is about equal to what discount on an apple? A $0.50 discount: each is one “give or take” below the typical price.

Technical terms:
- “Most are” = mean
- “Give or take” = standard deviation
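
In code, that comparison is a z-score: the value minus the mean, divided by the standard deviation. A minimal sketch of the car/apple example:

```python
# How many "give or takes" (standard deviations) a value sits from the mean.
def z_score(value, mean, sd):
    return (value - mean) / sd

# A $5,000 discount takes a $25,000 car to $20,000: one SD below the mean.
print(z_score(20_000, 25_000, 5_000))   # -1.0
# The comparable apple discount is $0.50, taking a $1.50 apple to $1.00.
print(z_score(1.00, 1.50, 0.50))        # -1.0, the same place on the scale
```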

Page 24: DDM Part II Analyzing the Results Dr. Deborah Brady

Percentile/Standard Deviation

[Diagram: normal curve showing standard deviations and the corresponding percentiles]

Page 25: DDM Part II Analyzing the Results Dr. Deborah Brady

Excel Functions

- Sort high to low or low to high
- Graphing function
- Statistical functions, including percentiles and standard deviation

Student grades can be sorted from highest to lowest score with one command, and a table of student scores can be graphed with one command. Excel will also easily calculate percentages, but this is probably not necessary. (A sketch of the same operations without Excel follows.)
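
None of this requires Excel; Python’s standard library covers the same operations. A minimal sketch with a hypothetical sample of scores; the comments name the rough Excel equivalents:

```python
# Sort, average, standard deviation, and percentiles in plain Python.
import statistics

scores = [52, 60, 61, 63, 65, 72, 75, 76, 80, 85, 90, 98]   # hypothetical sample

print(sorted(scores, reverse=True))         # Excel: sort high to low
print(statistics.mean(scores))              # Excel: AVERAGE
print(statistics.stdev(scores))             # Excel: STDEV.S (sample SD)
pct = statistics.quantiles(scores, n=100)   # 99 percentile cut points
print(pct[74])                              # ~75th percentile (cf. PERCENTILE.INC)
```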

Page 26: DDM Part II Analyzing the Results Dr. Deborah Brady

“Common Sense”

The purpose of DDMs is to assess teacher impact. The student scores and the Low, Moderate, and High growth rankings are entirely internal. DESE (in two years) will see only MEPIDs, with an L, M, or H next to each MEPID.

The important part of this process needs to be the focus:
- Your discussions about student learning with colleagues
- Your discussions about student learning with your evaluator
- An ongoing process