MSP Evaluation Rubric and Working Definitions
Xiaodong Zhang, PhD, Westat
Annual State Coordinators Meeting
Washington, DC, June 10-12, 2008
Relevant GPRA Measure
• GPRA measure—the percentage of MSP projects that use an experimental or quasi-experimental design for their evaluations that are conducted successfully and that yield scientifically valid results
• Westat’s Data Quality Initiative (DQI) developed a rubric to determine whether a grantee’s evaluation meets the GPRA measure
• The rubric is applied to each grantee’s final evaluation report
Evaluation Rubric
• The criteria on the rubric are the minimum criteria that must be met for an evaluation to be considered successfully conducted and to yield valid data
• An evaluation has to meet each criterion in order to meet the GPRA measure
Evaluation Components Covered in Rubric
For Experimental Designs:
• Sample size
• Quality of measurement instruments
• Quality of data collection methods
• Data reduction rates
• Relevant statistics reported
For Quasi-Experimental Designs:
All of the above, plus…
• Baseline equivalence of groups
Working Definitions
• DQI developed working definitions to help implement the rubric criteria
• Report eligibility: final evaluation report that contains post-test results on key outcomes
• Multicomponent evaluations: each component will be coded separately
  – Teacher content knowledge
  – Teacher instructional practice
  – Student achievement
Working Definitions (Continued)
• Baseline equivalence
  – Pretest on key outcomes is most relevant
  – Other related variables are acceptable (e.g., student SES for student outcomes; education and experience for teacher outcomes)
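One common convention for gauging baseline equivalence (used, for example, in What Works Clearinghouse standards, though the deck does not name a threshold) is the standardized mean difference on the pretest. The sketch below is illustrative only, with made-up pretest scores; it is not the DQI's coding rule.

```python
# Illustrative sketch: standardized mean difference on a pretest,
# i.e., (treatment mean - comparison mean) / pooled standard deviation.
# The data below are hypothetical.
from statistics import mean, stdev

def standardized_difference(treatment, comparison):
    """Pretest difference between groups, in pooled-SD units."""
    n_t, n_c = len(treatment), len(comparison)
    var_t, var_c = stdev(treatment) ** 2, stdev(comparison) ** 2
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
    return (mean(treatment) - mean(comparison)) / pooled_sd

pre_treatment = [72.0, 68.0, 75.0, 71.0, 69.0]
pre_comparison = [70.0, 67.0, 74.0, 73.0, 66.0]
print(round(standardized_difference(pre_treatment, pre_comparison), 2))  # 0.32
```

Under the convention mentioned above, differences larger than about 0.25 standard deviations would raise equivalence concerns; a 0.32 SD gap like this one would warrant statistical adjustment or caution.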
Working Definitions (Continued)
• Minimum sample sizes: based on the final sample
  Balanced designs:
  – Teacher outcomes: School/district level: N=12; Teacher level: N=60
  – Student outcomes: School/district level: N=12; Classroom level: N=18; Student level: N=130
  Unbalanced designs: the smaller group must meet the minimum size divided by 2
• Sample size recommendations are based on power analysis
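The deck says the minimums come from a power analysis but does not state the assumed effect size, significance level, or power. As an illustration only, a standard normal-approximation sample-size formula for comparing two group means can be sketched as follows (all parameter values here are assumptions, not the DQI's):

```python
# Illustrative only: normal-approximation sample size for comparing two
# group means,  n per group = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / d^2,
# where d is the minimum detectable effect in standard-deviation units.
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group n to detect `effect_size` SDs with a two-sided test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # ~0.84 for power = 0.80
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

print(n_per_group(0.5))  # 63 per group for a moderate effect (d = 0.5)
```

Smaller detectable effects drive the required n up sharply (e.g., d = 0.2 requires several hundred per group), which is why student-level minimums are much larger than teacher-level ones.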
Working Definitions (Continued)
• Quality of instruments
  – Existing state accountability assessment or widely used assessment (e.g., Iowa Test)
  – Selected items from a validated assessment: must include a minimum of 10 items, 70% of which must be from a validated and reliable instrument(s)
  – Grantee-developed assessment: must demonstrate reliability and validity
  – All instruments must have face validity
• Data reduction
  – Allow flexibility if the study population is highly mobile or if potential differences were addressed in the analysis
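The item-selection rule above can be made concrete with a small check. This sketch assumes one reading of the rule (at least 10 items total, with at least 70% drawn from a validated and reliable instrument); it is not an official DQI tool.

```python
# Illustrative check of the item-selection criterion as read above:
# total items >= 10, and validated items make up at least 70% of the total.
# Integer arithmetic (validated * 10 >= total * 7) avoids floating-point
# rounding at the 70% boundary.
def meets_item_rule(total_items, validated_items):
    return total_items >= 10 and validated_items * 10 >= total_items * 7

print(meets_item_rule(10, 7))   # True: 7 of 10 items are validated (70%)
print(meets_item_rule(12, 8))   # False: 8 of 12 is only ~67%; 9 are needed
print(meets_item_rule(9, 9))    # False: fewer than 10 items total
```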
Conclusions
• Written guidance is forthcoming
• Q&A at next breakout session: Analysis of Final Reports