Acton-Boxborough Day 2
District Determined Measures
December 15, 2014
Dr. Deborah Brady
Do Now
Getting Online
You may want to download from http://ddmsbrady.wikispaces.com:
• The Excel file (for calculating local DDMs)
• The Grade 4 file (for looking at student work)
• The Agenda, Mock Assessment Protocol, and Checklist
Agenda
I. Collecting DDMs, Assuring High Quality
Coversheet and Checklist Examples
The Good, the Bad, the Ugly
"Mock" evaluation of sample DDMs
II. Scoring DDMs: Calibration and Calculations
Group Work: Calibration Protocols (Calibrating with Rubrics)
Group Work: Excel Calculation of pre-post, rubrics, and MCAS SGP
Lunch
III. Time to work on your district's plan for:
Communication
Designation and Documentation of DDMs
Assessment of DDMs
Analysis of pre- and post-tests
Calculating Individual Teacher Growth Scores
1:00 Optional: Union Negotiations (30 minutes, table talk)
1:30 Optional: Indirect Measures (Administrator, Guidance Counselor, Nurse, School Psychologists, for example)
2:00 Optional: Your choice
Why Flunking Exams is Actually a Good Thing
The excitement around pre-finals is rooted in the fact that the tests appear to improve subsequent performance in topics that are not already familiar, whether geography, sociology or psychology. At least they do so in experiments in controlled laboratory conditions.
A just-completed study — the first of its kind, carried out by the U.C.L.A. psychologist Elizabeth Ligon Bjork — found that in a live classroom of Bjork’s own students, pretesting raised performance on final-exam questions by an average of 10 percent compared with a control group.
“That is: The (bombed) pretest drives home the information in a way that studying as usual does not. We fail, but we fail forward.”
Full article is on wiki "Why flunking exams is actually a good thing" from NY Times. http://www.nytimes.com/2014/09/07/magazine/why-flunking-exams-is-actually-a-good-thing.html?module=Search&mabReward=relbias:r,{1:RI:7}&_r=1
Consistency in Directions for Administering Assessments
Directions to teachers need to define the rules for giving support, dictionary use, etc.
What can be done? What cannot? ("Are you sure you are finished?") How much time? Accommodations and modifications?
Examples: The Good, the Bad, the Ugly
Scores:
Thumbs up: all is good
Thumbs horizontal: some questions
Thumbs down: needs significant work
Quick Reminder: Assessment Quality Requirements and Definitions from DESE (See Checklist.)
• Alignment to Frameworks and district curriculum content and/or district standards
• Rigor
• Comparability across all classes and in all disciplines
• "Substantial" assessment of the course; core content and skills
• Modifications are allowed, as with MCAS
Table Vote
Thumbs UP? Halfway? DOWN?
Learning Skills Criteria (Special Education)
Individual goals; measured weekly; kept in a permanent folder
Notes
Planner
Work/Action Plan
Flexible when Necessary
Prepared for Class (materials, work)
Revises work
Follows instructions
Uses time well
Gets to work
Asks for help when needed
Advocates for self
Moving toward independence
Works collaboratively
Table Vote
Thumbs UP? Halfway? DOWN?
Essay Prompt from Text
Read a primary source about Mohammed based on Mohammed's wife's memories of her husband.
Essay: Identify and describe Mohammed's most admirable quality based on this excerpt. Then, select someone from your life who has this quality. Identify who they are and describe how they demonstrate this trait.
What's wrong with this prompt as a district-required, text-based question using a primary source?
Table Vote
Thumbs UP? Halfway? DOWN?
Scoring Guides from Text
Lou Vee Air Car built to specs (50 points)
Propeller spins freely (60 points)
Distance car travels: 1 m = 70, 2 m = 80, 3 m = 90, 4 m = 100
Best distance (10, 8, 5)
Best car (10, 8, 5)
Best all-time distance, all classes (+5)
235 points total
A scoring guide from a textbook for building a Lou Vee Air Car. Is it good enough to ensure inter-rater reliability?
Table Vote
Thumbs UP? Halfway? DOWN?
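One way to test the inter-rater reliability question empirically is to have two teachers score the same set of cars and compute an agreement statistic. Below is a minimal sketch of Cohen's kappa; the two raters' point totals are hypothetical, not from the slide.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items the raters scored identically.
    po = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    pe = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (po - pe) / (1 - pe)

# Hypothetical point totals two teachers gave the same ten cars:
a = [100, 90, 90, 80, 70, 100, 90, 80, 70, 90]
b = [100, 90, 80, 80, 70, 100, 90, 90, 70, 90]
print(round(cohens_kappa(a, b), 2))  # → 0.72
```

Kappa near 1 means the scoring guide produces consistent scores; kappa near 0 means agreement is no better than chance, a sign the guide's criteria are too loose.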
PE Rubric in Progress: Grade 2, for overhand throw and catching.
Table Vote
Thumbs UP? Halfway? DOWN?
Music: Teacher and Student Instructions
Table Vote
Thumbs UP? Halfway? DOWN?
Scoring
Validity: Does it test what it says it tests?
Are the assessors' ratings calibrated?
Floor and ceiling effects
Rubric concerns
Validity assessment after the test is given
What happens to these scores and assessments?
Stored as an L, M, H for the district
Used as a discussion topic with the evaluator
Beware Rubrics! Holistic rubrics show progress across a scale: a continuum of descriptors.
Details (scored 1-4):

Level 1: No improvement in the level of detail. One is true:
* No new details across versions
* New details are added, but not included in future versions
* A few new details are added that are not relevant, accurate, or meaningful

Level 2: Modest improvement in the level of detail. One is true:
* There are a few details included across all versions
* Many added details are included, but they are not included consistently, or none are improved or elaborated upon
* There are many added details, but several are not relevant, accurate, or meaningful

Level 3: Considerable improvement in the level of detail. All are true:
* There are many examples of added details across all versions
* At least one example of a detail is improved or elaborated in future versions
* Details are consistently included in future versions
* The added details reflect relevant and meaningful additions

Level 4: Outstanding improvement in the level of detail. All are true:
* On average, there are multiple details added across every version
* There are multiple examples of details that build and elaborate on previous versions
* The added details reflect the most relevant and meaningful additions
Example taken from Austin, a first grader from Answer Charter School in Boise, Idaho. Used with permission from Expeditionary Learning. Learn more about this and other examples at http://elschools.org/student-work/butterfly-drafts
Criterion-Referenced Rubric and Raw Scores or % of 100
Rubric "Cut Scores": each level is worth a set number of points (4 = 25, 3 = 22, 2 = 18, 1 = 15), so across four criteria:
4(25) = 100
4(22) = 88
4(18) = 72
4(15) = 60
A mixed profile sums the points for each criterion: 25 + 18 + 22 + 15 = 80%.
Create a "growth" rubric and describe a typical year's growth. Translate into 100%: www.roobrix.com
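The level-to-points translation can be sketched in a few lines. The mapping below (4 → 25, 3 → 22, 2 → 18, 1 → 15) is read off the slide's cut scores; the function name is mine.

```python
# Points per rubric level, mirroring the slide's cut scores
# (four 4s = 100, four 3s = 88, four 2s = 72, four 1s = 60).
LEVEL_POINTS = {4: 25, 3: 22, 2: 18, 1: 15}

def rubric_to_percent(levels):
    """Translate one student's four criterion levels into a percent of 100."""
    return sum(LEVEL_POINTS[lvl] for lvl in levels)

# A student scoring 4, 2, 3, 1 across the four criteria:
print(rubric_to_percent([4, 2, 3, 1]))  # → 80, i.e. 25 + 18 + 22 + 15
```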
Calibration Protocol
Considerations for Scoring Student Work
Districts will need to determine fair, efficient and accurate methods for scoring students’ work. (Use consistent directions for teachers.)
DDMs can be scored by the educators themselves, groups of teachers within the district, external raters, or commercial vendors.
For districts concerned about the quality of scoring when educators score their own students' work, processes such as randomly re-scoring a selection of student work to ensure proper calibration, or using teams of educators to score together, can improve the quality of the results.
When an educator plays a large role in scoring his/her own students' work, a supervisor may also choose to factor the scoring process into the determination of a Student Impact Rating.
Mock Calibration
1. All of the readers come together and are provided student compositions for calibration. These compositions can be pre-selected by the facilitator from all of the writing submitted or can be provided by the teachers.
To assure fairness in assessment, teachers will not evaluate their own students’ compositions, but will be provided with their students’ work once compositions are graded and the scores are entered for the entire class.
The purpose of the calibration meeting is to make sure that all evaluators are assessing student work on the same scale. In addition, the papers used for calibration become the exemplars that all teachers will use during scoring. Rubrics alone are not sufficient for precise assessment.
Encourage each scorer to make notations on these compositions. Differences are expected and will be discussed until it is clear that all compositions are assessed with the same standards.
2. Time is given for each scorer to read and to use the rubric provided (local, textbook, MCAS, PARCC, or 6-Trait rubrics are all appropriate) to score the first composition and to enter their score on a chart like the one illustrated below. After each scorer has entered his or her score, the facilitator discusses the reasons for the scores. Scorers may need to find examples for their scores within the compositions.
3. After there is consensus for the first paper, the scorers go on to a second and third until there is a composition that represents each level of the rubric. Generally, after the first one or two compositions are calibrated, the process goes quickly.
Mock Calibration
Ask teachers to select (or pre-select yourself) 2 low compositions, 2 average, and 2 high compositions (6 altogether). Try to select a clear range.
Photocopy all of the compositions and ask teachers to evaluate the top one, using the 6-level rubric and the 4-level rubric, and to enter their scores under the first composition. Suggest that they can comment on their copies. Continue with the calibration until you have an exemplar for all levels.
Then begin the discussion of the specific reasoning behind each score.
Assume each person has a good reason; the purpose is to work toward consensus.
Composition: 1st (Exemplar) | 2nd | 3rd | 4th | 5th | 6th
Content: 2, 2, 3, 2, 4, 2
Conventions: 1, 1, 1, 1, 2, 1
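A simple rule a facilitator might apply to a chart like this is to flag for discussion any composition whose scores spread too far apart. A sketch, assuming the comma-separated values are six scorers' ratings of one composition; the tolerance of one rubric level is a hypothetical choice:

```python
# Six scorers' ratings of one composition, as on the chart above.
content = [2, 2, 3, 2, 4, 2]
conventions = [1, 1, 1, 1, 2, 1]

def needs_discussion(scores, tolerance=1):
    """Flag a composition whose ratings spread more than `tolerance` apart."""
    return max(scores) - min(scores) > tolerance

print(needs_discussion(content))      # → True: ratings range from 2 to 4
print(needs_discussion(conventions))  # → False: ratings range from 1 to 2
```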
MCAS Rubrics (2 Holistic), Score Points 1-6

Content
1: Little topic/idea development, organization, and/or details; little or no awareness of audience and/or task
2: Limited or weak topic/idea development, organization, and/or details; limited awareness of audience and/or task
3: Rudimentary topic/idea development and/or organization; basic supporting details; simplistic language
4: Moderate topic/idea development and organization; adequate, relevant details; some variety in language
5: Full topic/idea development; logical organization; strong details; appropriate use of language
6: Rich topic/idea development; careful and/or subtle organization; effective/rich use of language

Conventions
1: Errors seriously interfere with communication AND little control of sentence structure, grammar and usage, and mechanics
2: Errors interfere somewhat with communication and/or too many errors relative to the length of the essay or complexity of sentence structure, grammar and usage, and mechanics
3: Errors do not interfere with communication and/or few errors relative to length of essay or complexity of sentence structure, grammar and usage, and mechanics
4: Control of sentence structure, grammar and usage, and mechanics (length and complexity of essay provide opportunity for student to show control of standard English conventions)
4th Grade Prompt
http://www.doe.mass.edu/mcas/student/2014/question.aspx?GradeID=4&SubjectCode=ela&QuestionID=33207#
You are finally old enough to baby-sit, and your first job is this afternoon!
You will be spending the entire afternoon with a one-year-old.
When you open the door you realize that instead of watching a one-year-old child, you will be watching a one-year-old elephant! Write a story about spending your afternoon with a baby elephant. Give enough details to show readers what your afternoon is like baby-sitting the elephant.
2014 MCAS, Grade 4 English Language Arts Composition
Topic/Idea Development: Score Point 3
This composition is rudimentary in topic development and organization. The straightforward introduction moves immediately to the surprise of discovering that the "baby" is a baby elephant. From here, though, only basic supporting details are demonstrated as this composition of five paragraphs unfolds. There is an interesting "snoring scenario" which briefly captures the babysitter's personality in his or her impatience with the elephant: "It was as if there were 100 bells surrounding the house and all ringing at the same time. I covered my ears with pillows, it didn't work. I put ear muffs on, it still didn't work. Finally I just woke him up. He was pretty upset." The job ends as the mom comes home and there is a brief exchange of mildly humorous dialogue. The conclusion is simplistic, reiterating that the experience was not enjoyable.
Calculating Growth Scores: MCAS and Local
What you need to understand as you are creating assessments
Growth Score FAQs from DESE
Do the same numbers of students have to be identified as having high, moderate, and low growth? There is no set percentage of students who need to be included in each category. Districts should set parameters for high, moderate, and low growth using a variety of approaches.
How do I know what low growth looks like? Districts should be guided by the professional judgment of educators. The guiding definition of low growth is that it is less than a year’s worth of growth relative to academic peers, while high growth is more than a year’s worth of growth. If the course meets for less than a year, districts should make inferences about a year’s worth of growth based on the growth expected during the time of the course.
Can I change scoring decisions when we use a DDM in the second year? It is expected that districts are building their knowledge and experience with DDMs. DDMs will undergo both small and large modifications from year to year. Changing or modifying scoring procedures is part of the continuous improvement of DDMs over time.
Will parameters of growth be comparable from one district to another? Different assessments serve different purposes. While statewide SGPs will provide a consistent metric across the Commonwealth and allow for district-to-district comparisons, DDMs are selected
MCAS SGP: Local Manipulation of Scores
(Grades 4-8; ELA or Math; not grade 10)
Excel File Tour
Sample Cut Score Determination (for local assessments)
The teacher's score is based on the MEDIAN score of her class for each DDM. Student differences are sorted low to high to set the cut scores.

Pre-test | Post-test | Difference || Sorted differences
20 | 35 | 15 || 5  (cut score: LOW growth, lowest __%)
25 | 30 | 5  || 15
30 | 50 | 20 || 20
35 | 60 | 25 || 25
35 | 60 | 25 || 25 (median teacher score)
40 | 70 | 30 || 25 (median teacher score)
40 | 65 | 25 || 25
50 | 75 | 25 || 30
50 | 80 | 30 || 30 (top 20%)
50 | 85 | 35 || 35 (cut score: HIGH growth, highest __%)
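The cut-score arithmetic above can be sketched as follows. The 20% percentages and the helper name are illustrative; the slides leave the exact percentage blank for the district to choose.

```python
# Gains (post minus pre) for the sample class above.
gains = [15, 5, 20, 25, 25, 30, 25, 25, 30, 35]

def cut_scores(gains, low_pct=0.20, high_pct=0.20):
    """Return (low_cut, high_cut): the gain values bounding the lowest
    and highest fractions of the sorted class."""
    s = sorted(gains)
    n = len(s)
    low_cut = s[max(int(n * low_pct) - 1, 0)]   # top of the lowest band
    high_cut = s[n - int(n * high_pct)]          # bottom of the highest band
    return low_cut, high_cut

low, high = cut_scores(gains)
print(low, high)  # → 15 30
```

Students at or below the low cut show Low growth, those at or above the high cut show High growth, and everyone in between shows Moderate growth.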
Measuring Growth Example: Fountas and Pinnell P to Q; N to P; D to K
Teacher A | Teacher B | Teacher C | Teacher D | Teacher E | Teacher F
(Each column lists that teacher's individual student gain scores, sorted low to high.)
Median: 6.5 | 9 | 12 | 10 | 16 | 12
Rating: LOW | Moderate | Moderate | Moderate | High | Moderate
(Below 6.5 is Low growth; between 6.5 and 16 is Moderate; 16 and above is High.)
All Classes
(Gain scores for all 103 third graders on this DDM, sorted low to high, ranging from 1 to 26.)
6.5 = cut score, lowest 15%
12.0 = median for whole Grade 3 DDM
16 = cut score, highest 15%
103 third graders
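Once the grade-level cut scores are set (6.5 and 16 here), each teacher's median maps directly to a growth rating. A minimal sketch; the handling of medians exactly at a cut is an assumption consistent with the slides, which rate a median of 6.5 as Low and a median of 16 as High.

```python
def impact_rating(teacher_median, low_cut=6.5, high_cut=16):
    """Map a teacher's median gain to a growth rating using grade-level cuts."""
    if teacher_median <= low_cut:
        return "Low"
    if teacher_median >= high_cut:
        return "High"
    return "Moderate"

# Medians for the six teachers from the Fountas and Pinnell sample:
medians = {"A": 6.5, "B": 9, "C": 12, "D": 10, "E": 16, "F": 12}
print({t: impact_rating(m) for t, m in medians.items()})
# → {'A': 'Low', 'B': 'Moderate', 'C': 'Moderate',
#    'D': 'Moderate', 'E': 'High', 'F': 'Moderate'}
```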
Fountas and Pinnell: Growth for Each Student Is Based on 10 Months of Growth (10 months = a year)
Second grade student:
End-of-year level Q (Above Benchmark); levels P to Q; 7 months of growth: LOW GROWTH
End-of-year level P (At Benchmark); levels N, O, P; 10 months of growth: MODERATE GROWTH
End-of-year level K (Below Benchmark); levels D, E, F, G, H, I, J, K; 17 months of growth: HIGH GROWTH
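The months-of-growth rule can be sketched as a small classifier. The strict less-than / greater-than thresholds around the 10-month year are an assumption consistent with the three examples above.

```python
YEAR = 10  # 10 months of growth = one year's worth, per the slide

def fp_growth_rating(months_of_growth, year=YEAR):
    """Classify Fountas and Pinnell growth: less than a year's worth is Low,
    about a year's worth is Moderate, more is High."""
    if months_of_growth < year:
        return "LOW"
    if months_of_growth > year:
        return "HIGH"
    return "MODERATE"

print(fp_growth_rating(7))   # → LOW, like the P-to-Q student
print(fp_growth_rating(10))  # → MODERATE
print(fp_growth_rating(17))  # → HIGH
```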
Summary: DDM Process for Determining L, M, H for Every Teacher
Whole Grade Level or Course
Score the entire grade level or course, or take the MCAS Growth Scores for all students.
Rank the scores from highest to lowest (post minus pre, or MCAS SGP).
Identify the median score for the entire group.
Determine the "cut" scores for local assessments; for MCAS, 35 and 65 for classrooms.
Individual Teacher
Select the students for each teacher.
Rank the scores from highest to lowest.
Identify the median score.
Is the median below or above the "cut" score? Is it in the middle?
Don't forget: Roster Verification might change the specific scores and, therefore, change the median.
Distribute scores to teachers for each DDM.
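The per-teacher steps of the summary (select each teacher's students, then take the median gain) can be sketched as follows; the roster data are hypothetical.

```python
import statistics

# Hypothetical roster after Roster Verification: (teacher, gain) pairs.
roster = [
    ("Smith", 15), ("Smith", 20), ("Smith", 5),
    ("Jones", 30), ("Jones", 25), ("Jones", 35),
]

def teacher_medians(roster):
    """Median gain per teacher: the score compared against the cut scores."""
    by_teacher = {}
    for teacher, gain in roster:
        by_teacher.setdefault(teacher, []).append(gain)
    return {t: statistics.median(g) for t, g in by_teacher.items()}

print(teacher_medians(roster))  # → {'Smith': 15, 'Jones': 30}
```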
Mock Scoring, Storing, Determining Cut Scores: Protocol
1. Assess all students for the course or grade level.
2. Enter student names, teacher names, pre-test scores, post-test scores, and the gain from pre-test to post-test.
3. Using the SORT function, rank all assessments from highest to lowest. Determine the local cut scores. DESE recommends 1.5 Standard Deviations from 50; local districts have selected 10%, 15%, and 20% as their “cut scores.”
4. Look at samples of student work just above and just below the cut scores. Professional judgment about the appropriateness of this number is then used to determine the local cut scores to determine Low, Moderate, and High Growth. This number may vary from assessment to assessment and from year to year based upon the district’s determination and upon the professional judgment of the district.
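Steps 2-4 of the protocol can be sketched outside Excel as well. The student names, scores, candidate cut, and the "within 5 points of the cut" review window are all hypothetical.

```python
# Step 2: students with pre- and post-test scores (hypothetical data).
students = [
    ("Ana", 20, 35), ("Ben", 25, 30), ("Cruz", 30, 50),
    ("Dee", 35, 60), ("Eli", 40, 65), ("Fay", 50, 85),
]

# Compute the gain from pre-test to post-test.
gains = [(name, post - pre) for name, pre, post in students]

# Step 3: rank from highest to lowest (the Excel SORT step).
gains.sort(key=lambda r: r[1], reverse=True)
print(gains[0], gains[-1])  # → ('Fay', 35) ('Ben', 5)

# Step 4: pull work just above and below a candidate cut for review,
# so professional judgment can confirm or move the cut.
cut = 15
border = [name for name, g in gains if abs(g - cut) <= 5]
print(border)  # → ['Cruz', 'Ana']
```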
Using Excel (really simple method)
1. Enter data
2. Simple pre-post formula
3. Cut and paste values
4. Sort highest to lowest
Tabs include (quick tour):
Fountas and Pinnell Sample for 6 teachers, 103 students with median for full assessment and for each teacher and with High, Moderate, and Low determinations
Pre-Post-Test Calculations (for three teachers)
Rubric Pre-Post (for three teachers)
MCAS SGP (Student Growth Percentile) Calculations for three teachers
A “Test” file that calculates the “gain,” but does not determine cut scores
Use the “test” tab to enter your local data
Or use the data provided
Time to Work with Your Team
Online materials
Excel
Templates