Acton-Boxborough Day 2
District Determined Measures
December 15, 2014
Dr. Deborah Brady
Do Now
Getting Online
You may want to download from http://ddmsbrady.wikispaces.com:
• The Excel file (for calculating local DDMs)
• The Grade 4 file (for looking at student work)
• The Agenda, Mock Assessment Protocol, and Checklist
Agenda
I. Collecting DDMs, Assuring High Quality
Coversheet and Checklist Examples
The Good, the Bad, the Ugly
"Mock" evaluation of sample DDMs
II. Scoring DDMs: Calibration and Calculations
Group Work: Calibration Protocols (Calibrating with Rubrics)
Group Work: Excel Calculation of pre-post, rubrics, and MCAS SGP
Lunch
III. Time to work on your district's plan for:
Communication
Designation and Documentation of DDMs
Assessment of DDMs
Analysis of pre- and post-tests
Calculating Individual Teacher Growth Scores
1:00 Optional: Union Negotiations (30 minutes, table talk)
1:30 Optional: Indirect Measures (Administrator, Guidance Counselor, Nurse, School Psychologists, for example)
2:00 Optional: Your choice
Why Flunking Exams is Actually a Good Thing
The excitement around pre-finals is rooted in the fact that the tests appear to improve subsequent performance in topics that are not already familiar, whether geography, sociology or psychology. At least they do so in experiments in controlled laboratory conditions.
A just-completed study — the first of its kind, carried out by the U.C.L.A. psychologist Elizabeth Ligon Bjork — found that in a live classroom of Bjork’s own students, pretesting raised performance on final-exam questions by an average of 10 percent compared with a control group.
“That is: The (bombed) pretest drives home the information in a way that studying as usual does not. We fail, but we fail forward.”
Full article is on wiki "Why flunking exams is actually a good thing" from NY Times. http://www.nytimes.com/2014/09/07/magazine/why-flunking-exams-is-actually-a-good-thing.html?module=Search&mabReward=relbias:r,{1:RI:7}&_r=1
Consistency in Directions for Administering Assessments
Directions to teachers need to define the rules for giving support, dictionary use, etc.
What can be done? What cannot? ("Are you sure you are finished?") How much time? Accommodations and modifications?
Examples: The Good, the Bad, the Ugly
Scores:
Thumbs up: all is good
Thumbs horizontal: some questions
Thumbs down: needs significant work
Quick Reminder: Assessment Quality Requirements and Definitions from DESE (See Checklist.)
• Alignment to Frameworks and district curriculum content and/or district standards
• Rigor
• Comparability across all classes and in all disciplines
• "Substantial" assessment of the course; core content and skills
• Modifications are allowed, as with MCAS
Table Vote
Thumbs UP? Halfway? DOWN?
Learning Skills Criteria (Special Education)
Individual goals; measured weekly; kept in a permanent folder
Notes
Planner
Work/Action Plan
Flexible when Necessary
Prepared for Class (materials, work)
Revises work
Follows instructions
Uses time well
Gets to work
Asks for help when needed
Advocates for self
Moving toward independence
Works collaboratively
Table Vote
Thumbs UP? Halfway? DOWN?
Essay Prompt from Text
Read a primary source about Mohammed based on Mohammed's wife's memories of her husband.
Essay: Identify and describe Mohammed's most admirable quality based on this excerpt. Then, select someone from your life who has this quality. Identify who they are and describe how they demonstrate this trait.
What's wrong with this prompt as a district-required, text-based question using a primary source?
Table Vote
Thumbs UP? Halfway? DOWN?
Scoring Guides from Text
Lou Vee Air Car built to specs (50 points)
Propeller spins freely (60 points)
Distance car travels: 1 m = 70, 2 m = 80, 3 m = 90, 4 m = 100
Best distance (10, 8, 5)
Best car (10, 8, 5)
Best all-time distance, all classes (+5)
235 points total
A scoring guide from a textbook for building a Lou Vee Air Car. Is it good enough to ensure inter-rater reliability?
Table Vote
Thumbs UP? Halfway? DOWN?
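One way to test the inter-rater reliability question empirically is to have two teachers score the same set of cars and compute an agreement statistic. Below is a minimal sketch of Cohen's kappa; the two raters' point totals are hypothetical, not from the slide.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items the raters scored identically.
    po = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    pe = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (po - pe) / (1 - pe)

# Hypothetical point totals two teachers gave the same ten cars:
a = [100, 90, 90, 80, 70, 100, 90, 80, 70, 90]
b = [100, 90, 80, 80, 70, 100, 90, 90, 70, 90]
print(round(cohens_kappa(a, b), 2))  # → 0.72
```

Kappa near 1 means the scoring guide produces consistent scores; kappa near 0 means agreement is no better than chance, a sign the guide's criteria are too loose.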
PE Rubric in Progress: Grade 2, for overhand throw and catching.
Table Vote
Thumbs UP? Halfway? DOWN?
Music: Teacher and Student Instructions
Table Vote
Thumbs UP? Halfway? DOWN?
Scoring
Validity: Does it test what it says it tests?
Are the assessors' ratings calibrated?
Floor and ceiling effects
Rubric concerns
Validity assessment after the test is given
What happens to these scores and assessments?
Stored as an L, M, H for the district
Used as a discussion topic with the evaluator
Beware Rubrics! Holistic rubrics show progress across a scale: a continuum of descriptors.
Details (scored 1-4):

Level 1: No improvement in the level of detail. One is true:
* No new details across versions
* New details are added, but not included in future versions
* A few new details are added that are not relevant, accurate, or meaningful

Level 2: Modest improvement in the level of detail. One is true:
* There are a few details included across all versions
* Many added details are included, but they are not included consistently, or none are improved or elaborated upon
* There are many added details, but several are not relevant, accurate, or meaningful

Level 3: Considerable improvement in the level of detail. All are true:
* There are many examples of added details across all versions
* At least one example of a detail is improved or elaborated in future versions
* Details are consistently included in future versions
* The added details reflect relevant and meaningful additions

Level 4: Outstanding improvement in the level of detail. All are true:
* On average, there are multiple details added across every version
* There are multiple examples of details that build and elaborate on previous versions
* The added details reflect the most relevant and meaningful additions
Example taken from Austin, a first grader from Answer Charter School in Boise, Idaho. Used with permission from Expeditionary Learning. Learn more about this and other examples at http://elschools.org/student-work/butterfly-drafts
Criterion-Referenced Rubric and Raw Scores or % of 100
Rubric "Cut Scores": each level is worth a set number of points (4 = 25, 3 = 22, 2 = 18, 1 = 15), so across four criteria:
4(25) = 100
4(22) = 88
4(18) = 72
4(15) = 60
A mixed profile sums the points for each criterion: 25 + 18 + 22 + 15 = 80%.
Create a "growth" rubric and describe a typical year's growth. Translate into 100%: www.roobrix.com
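The level-to-points translation can be sketched in a few lines. The mapping below (4 → 25, 3 → 22, 2 → 18, 1 → 15) is read off the slide's cut scores; the function name is mine.

```python
# Points per rubric level, mirroring the slide's cut scores
# (four 4s = 100, four 3s = 88, four 2s = 72, four 1s = 60).
LEVEL_POINTS = {4: 25, 3: 22, 2: 18, 1: 15}

def rubric_to_percent(levels):
    """Translate one student's four criterion levels into a percent of 100."""
    return sum(LEVEL_POINTS[lvl] for lvl in levels)

# A student scoring 4, 2, 3, 1 across the four criteria:
print(rubric_to_percent([4, 2, 3, 1]))  # → 80, i.e. 25 + 18 + 22 + 15
```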
Calibration Protocol
Considerations for Scoring Student Work
Districts will need to determine fair, efficient and accurate methods for scoring students’ work. (Use consistent directions for teachers.)
DDMs can be scored by the educators themselves, groups of teachers within the district, external raters, or commercial vendors.
For districts concerned about the quality of scoring when educators score their own students' work, processes such as randomly re-scoring a selection of student work to ensure proper calibration, or using teams of educators to score together, can improve the quality of the results.
When an educator plays a large role in scoring his/her own students' work, a supervisor may also choose to factor the scoring process into the determination of a Student Impact Rating.
Mock Calibration
1. All of the readers come together and are provided student compositions for calibration. These compositions can be pre-selected by the facilitator from all of the writing submitted or can be provided by the teachers.
To assure fairness in assessment, teachers will not evaluate their own students’ compositions, but will be provided with their students’ work once compositions are graded and the scores are entered for the entire class.
The purpose of the calibration meeting is to make sure that all evaluators are assessing student work on the same scale. In addition, the papers used for calibration become the exemplars that all teachers will use during scoring. Rubrics alone are not sufficient for precise assessment.
Encourage each scorer to make notations on these compositions. Differences are expected and will be discussed until it is clear that all compositions are assessed with the same standards.
2. Time is given for each scorer to read and to use the rubric provided (local, textbook, MCAS, PARCC, or 6-Trait rubrics are all appropriate) to score the first composition and to enter their score on a chart like the one illustrated below. After each scorer has entered his or her score, the facilitator discusses the reasons for the scores. Scorers may need to find examples for their scores within the compositions.
3. After there is consensus for the first paper, the scorers go on to a second and third until there is a composition that represents each level of the rubric. Generally, after the first one or two compositions are calibrated, the process goes quickly.
Mock Calibration
Ask teachers to select (or pre-select yourself) 2 low compositions, 2 average, and 2 high compositions (6 altogether). Try to select a clear range.
Photocopy all of the compositions and ask teachers to evaluate the top one, using the 6-level rubric and the 4-level rubric, and to enter their scores under the first composition. Suggest that they can comment on their copies. Continue with the calibration until you have an exemplar for all levels.
Then begin the discussion of the specific reasoning behind each score.
Assume each person has a good reason; the purpose is to work toward consensus.
Composition: 1st (Exemplar) | 2nd | 3rd | 4th | 5th | 6th
Content: 2, 2, 3, 2, 4, 2
Conventions: 1, 1, 1, 1, 2, 1
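A simple rule a facilitator might apply to a chart like this is to flag for discussion any composition whose scores spread too far apart. A sketch, assuming the comma-separated values are six scorers' ratings of one composition; the tolerance of one rubric level is a hypothetical choice:

```python
# Six scorers' ratings of one composition, as on the chart above.
content = [2, 2, 3, 2, 4, 2]
conventions = [1, 1, 1, 1, 2, 1]

def needs_discussion(scores, tolerance=1):
    """Flag a composition whose ratings spread more than `tolerance` apart."""
    return max(scores) - min(scores) > tolerance

print(needs_discussion(content))      # → True: ratings range from 2 to 4
print(needs_discussion(conventions))  # → False: ratings range from 1 to 2
```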
MCAS Rubrics (2 Holistic), Score Points 1-6

Content
1: Little topic/idea development, organization, and/or details; little or no awareness of audience and/or task
2: Limited or weak topic/idea development, organization, and/or details; limited awareness of audience and/or task
3: Rudimentary topic/idea development and/or organization; basic supporting details; simplistic language
4: Moderate topic/idea development and organization; adequate, relevant details; some variety in language
5: Full topic/idea development; logical organization; strong details; appropriate use of language
6: Rich topic/idea development; careful and/or subtle organization; effective/rich use of language

Conventions
1: Errors seriously interfere with communication AND little control of sentence structure, grammar and usage, and mechanics
2: Errors interfere somewhat with communication and/or too many errors relative to the length of the essay or complexity of sentence structure, grammar and usage, and mechanics
3: Errors do not interfere with communication and/or few errors relative to length of essay or complexity of sentence structure, grammar and usage, and mechanics
4: Control of sentence structure, grammar and usage, and mechanics (length and complexity of essay provide opportunity for student to show control of standard English conventions)
4th Grade Prompt
http://www.doe.mass.edu/mcas/student/2014/question.aspx?GradeID=4&SubjectCode=ela&QuestionID=33207#
You are finally old enough to baby-sit, and your first job is this afternoon!
You will be spending the entire afternoon with a one-year-old.
When you open the door you realize that instead of watching a one-year-old child, you will be watching a one-year-old elephant! Write a story about spending your afternoon with a baby elephant. Give enough details to show readers what your afternoon is like baby-sitting the elephant.
2014 MCAS, Grade 4 English Language Arts Composition
Topic/Idea Development: Score Point 3
This composition is rudimentary in topic development and organization. The straightforward introduction moves immediately to the surprise of discovering that the "baby" is a baby elephant. From here, though, only basic supporting details are demonstrated as this composition of five paragraphs unfolds. There is an interesting "snoring scenario" which briefly captures the babysitter's personality in his or her impatience with the elephant: "It was as if there were 100 bells surrounding the house and all ringing at the same time. I covered my ears with pillows, it didn't work. I put ear muffs on, it still didn't work. Finally I just woke him up. He was pretty upset." The job ends as the mom comes home and there is a brief exchange of mildly humorous dialogue. The conclusion is simplistic, reiterating that the experience was not enjoyable.
Calculating Growth Scores: MCAS and Local
What you need to understand as you are creating assessments
Growth Score FAQs from DESE
Do the same numbers of students have to be identified as having high, moderate, and low growth? There is no set percentage of students who need to be included in each category. Districts should set parameters for high, moderate, and low growth using a variety of approaches.
How do I know what low growth looks like? Districts should be guided by the professional judgment of educators. The guiding definition of low growth is that it is less than a year’s worth of growth relative to academic peers, while high growth is more than a year’s worth of growth. If the course meets for less than a year, districts should make inferences about a year’s worth of growth based on the growth expected during the time of the course.
Can I change scoring decisions when we use a DDM in the second year? It is expected that districts are building their knowledge and experience with DDMs. DDMs will undergo both small and large modifications from year to year. Changing or modifying scoring procedures is part of the continuous improvement of DDMs over time.
Will parameters of growth be comparable from one district to another? Different assessments serve different purposes. While statewide SGPs will provide a consistent metric across the Commonwealth and allow for district-to-district comparisons, DDMs are selected
MCAS SGP: Local Manipulation of Scores
(Grades 4-8; ELA or Math; not grade 10)
Excel File Tour
Sample Cut Score Determination (for local assessments)
The teacher's score is based on the MEDIAN score of her class for each DDM. Student differences are sorted low to high to set the cut scores.

Pre-test | Post-test | Difference || Sorted differences
20 | 35 | 15 || 5  (cut score: LOW growth, lowest __%)
25 | 30 | 5  || 15
30 | 50 | 20 || 20
35 | 60 | 25 || 25
35 | 60 | 25 || 25 (median teacher score)
40 | 70 | 30 || 25 (median teacher score)
40 | 65 | 25 || 25
50 | 75 | 25 || 30
50 | 80 | 30 || 30 (top 20%)
50 | 85 | 35 || 35 (cut score: HIGH growth, highest __%)
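The cut-score arithmetic above can be sketched as follows. The 20% percentages and the helper name are illustrative; the slides leave the exact percentage blank for the district to choose.

```python
# Gains (post minus pre) for the sample class above.
gains = [15, 5, 20, 25, 25, 30, 25, 25, 30, 35]

def cut_scores(gains, low_pct=0.20, high_pct=0.20):
    """Return (low_cut, high_cut): the gain values bounding the lowest
    and highest fractions of the sorted class."""
    s = sorted(gains)
    n = len(s)
    low_cut = s[max(int(n * low_pct) - 1, 0)]   # top of the lowest band
    high_cut = s[n - int(n * high_pct)]          # bottom of the highest band
    return low_cut, high_cut

low, high = cut_scores(gains)
print(low, high)  # → 15 30
```

Students at or below the low cut show Low growth, those at or above the high cut show High growth, and everyone in between shows Moderate growth.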
Measuring Growth Example: Fountas and Pinnell P to Q; N to P; D to K
Teacher A | Teacher B | Teacher C | Teacher D | Teacher E | Teacher F
(Each column lists that teacher's individual student gain scores, sorted low to high.)
Median: 6.5 | 9 | 12 | 10 | 16 | 12
Rating: LOW | Moderate | Moderate | Moderate | High | Moderate
(Below 6.5 is Low growth; between 6.5 and 16 is Moderate; 16 and above is High.)
All Classes
(Gain scores for all 103 third graders on this DDM, sorted low to high, ranging from 1 to 26.)
6.5 = cut score, lowest 15%
12.0 = median for whole Grade 3 DDM
16 = cut score, highest 15%
103 third graders
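Once the grade-level cut scores are set (6.5 and 16 here), each teacher's median maps directly to a growth rating. A minimal sketch; the handling of medians exactly at a cut is an assumption consistent with the slides, which rate a median of 6.5 as Low and a median of 16 as High.

```python
def impact_rating(teacher_median, low_cut=6.5, high_cut=16):
    """Map a teacher's median gain to a growth rating using grade-level cuts."""
    if teacher_median <= low_cut:
        return "Low"
    if teacher_median >= high_cut:
        return "High"
    return "Moderate"

# Medians for the six teachers from the Fountas and Pinnell sample:
medians = {"A": 6.5, "B": 9, "C": 12, "D": 10, "E": 16, "F": 12}
print({t: impact_rating(m) for t, m in medians.items()})
# → {'A': 'Low', 'B': 'Moderate', 'C': 'Moderate',
#    'D': 'Moderate', 'E': 'High', 'F': 'Moderate'}
```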
Fountas and Pinnell: Growth for Each Student Is Based on 10 Months of Growth (10 months = a year)
Second grade student:
End-of-year level Q (Above Benchmark); levels P to Q; 7 months of growth: LOW GROWTH
End-of-year level P (At Benchmark); levels N, O, P; 10 months of growth: MODERATE GROWTH
End-of-year level K (Below Benchmark); levels D, E, F, G, H, I, J, K; 17 months of growth: HIGH GROWTH
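The months-of-growth rule can be sketched as a small classifier. The strict less-than / greater-than thresholds around the 10-month year are an assumption consistent with the three examples above.

```python
YEAR = 10  # 10 months of growth = one year's worth, per the slide

def fp_growth_rating(months_of_growth, year=YEAR):
    """Classify Fountas and Pinnell growth: less than a year's worth is Low,
    about a year's worth is Moderate, more is High."""
    if months_of_growth < year:
        return "LOW"
    if months_of_growth > year:
        return "HIGH"
    return "MODERATE"

print(fp_growth_rating(7))   # → LOW, like the P-to-Q student
print(fp_growth_rating(10))  # → MODERATE
print(fp_growth_rating(17))  # → HIGH
```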
Summary: DDM Process for Determining L, M, H for Every Teacher
Whole Grade Level or Course
Score the entire grade level or course, or take the MCAS Growth Scores for all students.
Rank the scores from highest to lowest (post minus pre, or MCAS SGP).
Identify the median score for the entire group.
Determine the "cut" scores for local assessments; for MCAS, 35 and 65 for classrooms.
Individual Teacher
Select the students for each teacher.
Rank the scores from highest to lowest.
Identify the median score.
Is the median below or above the "cut" score? Is it in the middle?
Don't forget: Roster Verification might change the specific scores and, therefore, change the median.
Distribute scores to teachers for each DDM.
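The per-teacher steps of the summary (select each teacher's students, then take the median gain) can be sketched as follows; the roster data are hypothetical.

```python
import statistics

# Hypothetical roster after Roster Verification: (teacher, gain) pairs.
roster = [
    ("Smith", 15), ("Smith", 20), ("Smith", 5),
    ("Jones", 30), ("Jones", 25), ("Jones", 35),
]

def teacher_medians(roster):
    """Median gain per teacher: the score compared against the cut scores."""
    by_teacher = {}
    for teacher, gain in roster:
        by_teacher.setdefault(teacher, []).append(gain)
    return {t: statistics.median(g) for t, g in by_teacher.items()}

print(teacher_medians(roster))  # → {'Smith': 15, 'Jones': 30}
```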
Mock Scoring, Storing, Determining Cut Scores: Protocol
1. Assess all students for the course or grade level.
2. Enter student names, teacher names, pre-test scores, post-test scores, and the gain from pre-test to post-test.
3. Using the SORT function, rank all assessments from highest to lowest. Determine the local cut scores. DESE recommends 1.5 Standard Deviations from 50; local districts have selected 10%, 15%, and 20% as their “cut scores.”
4. Look at samples of student work just above and just below the cut scores. Professional judgment about the appropriateness of this number is then used to determine the local cut scores to determine Low, Moderate, and High Growth. This number may vary from assessment to assessment and from year to year based upon the district’s determination and upon the professional judgment of the district.
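Steps 2-4 of the protocol can be sketched outside Excel as well. The student names, scores, candidate cut, and the "within 5 points of the cut" review window are all hypothetical.

```python
# Step 2: students with pre- and post-test scores (hypothetical data).
students = [
    ("Ana", 20, 35), ("Ben", 25, 30), ("Cruz", 30, 50),
    ("Dee", 35, 60), ("Eli", 40, 65), ("Fay", 50, 85),
]

# Compute the gain from pre-test to post-test.
gains = [(name, post - pre) for name, pre, post in students]

# Step 3: rank from highest to lowest (the Excel SORT step).
gains.sort(key=lambda r: r[1], reverse=True)
print(gains[0], gains[-1])  # → ('Fay', 35) ('Ben', 5)

# Step 4: pull work just above and below a candidate cut for review,
# so professional judgment can confirm or move the cut.
cut = 15
border = [name for name, g in gains if abs(g - cut) <= 5]
print(border)  # → ['Cruz', 'Ana']
```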
Using Excel (really simple method)
1. Enter data
2. Simple pre-post formula
3. Cut and paste values
4. Sort highest to lowest
Tabs include (quick tour):
Fountas and Pinnell Sample for 6 teachers, 103 students with median for full assessment and for each teacher and with High, Moderate, and Low determinations
Pre-Post-Test Calculations (for three teachers)
Rubric Pre-Post (for three teachers)
MCAS SGP (Student Growth Percentile) Calculations for three teachers
A “Test” file that calculates the “gain,” but does not determine cut scores
Use the “test” tab to enter your local data
Or use the data provided
Time to Work with Your Team
Online materials
Excel
Templates