MSP Evaluation Rubric and Working Definitions
Xiaodong Zhang, PhD, Westat
Annual State Coordinators Meeting
Washington, DC, June 10-12, 2008
Relevant GPRA Measure
• GPRA measure—the percentage of MSP projects that use an experimental or quasi-experimental design for their evaluations that are conducted successfully and that yield scientifically valid results
• Westat’s Data Quality Initiative (DQI) developed a rubric to determine whether a grantee’s evaluation meets the GPRA measure
• The rubric is applied to each grantee’s final evaluation report
Evaluation Rubric
• The criteria on the rubric are the minimum criteria that must be met for an evaluation to be considered successfully conducted and to yield valid data
• An evaluation has to meet each criterion in order to meet the GPRA measure
Evaluation Components Covered in Rubric
For Experimental Designs:
• Sample size
• Quality of measurement instruments
• Quality of data collection methods
• Data reduction rates
• Relevant statistics reported
For Quasi-Experimental Designs:
All of the above, plus…
• Baseline equivalence of groups
Working Definitions
• DQI developed working definitions to help implement the rubric criteria
• Report eligibility: final evaluation report that contains post-test results on key outcomes
• Multicomponent evaluations: each component will be coded separately
  – Teacher content knowledge
  – Teacher instructional practice
  – Student achievement
Working Definitions (Continued)
• Baseline equivalence
  – Pretest on key outcomes is most relevant
  – Other related variables are acceptable (e.g., student SES for student outcomes; education and experience for teacher outcomes)
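One common convention for gauging baseline equivalence (used, for example, in What Works Clearinghouse standards, though the deck does not name a threshold) is the standardized mean difference on the pretest. The sketch below is illustrative only, with made-up pretest scores; it is not the DQI's coding rule.

```python
# Illustrative sketch: standardized mean difference on a pretest,
# i.e., (treatment mean - comparison mean) / pooled standard deviation.
# The data below are hypothetical.
from statistics import mean, stdev

def standardized_difference(treatment, comparison):
    """Pretest difference between groups, in pooled-SD units."""
    n_t, n_c = len(treatment), len(comparison)
    var_t, var_c = stdev(treatment) ** 2, stdev(comparison) ** 2
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
    return (mean(treatment) - mean(comparison)) / pooled_sd

pre_treatment = [72.0, 68.0, 75.0, 71.0, 69.0]
pre_comparison = [70.0, 67.0, 74.0, 73.0, 66.0]
print(round(standardized_difference(pre_treatment, pre_comparison), 2))  # 0.32
```

Under the convention mentioned above, differences larger than about 0.25 standard deviations would raise equivalence concerns; a 0.32 SD gap like this one would warrant statistical adjustment or caution.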
Working Definitions (Continued)
• Minimum sample sizes: based on the final sample
  Balanced designs:
  – Teacher outcomes: School/district level: N=12; Teacher level: N=60
  – Student outcomes: School/district level: N=12; Classroom level: N=18; Student level: N=130
  Unbalanced designs: the smaller group must meet the minimum size divided by 2
• Sample size recommendations are based on power analysis
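The deck says the minimums come from a power analysis but does not state the assumed effect size, significance level, or power. As an illustration only, a standard normal-approximation sample-size formula for comparing two group means can be sketched as follows (all parameter values here are assumptions, not the DQI's):

```python
# Illustrative only: normal-approximation sample size for comparing two
# group means,  n per group = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / d^2,
# where d is the minimum detectable effect in standard-deviation units.
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group n to detect `effect_size` SDs with a two-sided test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # ~0.84 for power = 0.80
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

print(n_per_group(0.5))  # 63 per group for a moderate effect (d = 0.5)
```

Smaller detectable effects drive the required n up sharply (e.g., d = 0.2 requires several hundred per group), which is why student-level minimums are much larger than teacher-level ones.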
Working Definitions (Continued)
• Quality of instruments
  – Existing state accountability assessment or widely used assessment (e.g., Iowa Test)
  – Selected items from a validated assessment: must include a minimum of 10 items, 70% of which must be from a validated and reliable instrument(s)
  – Grantee-developed assessment: must demonstrate reliability and validity
  – All instruments must have face validity
• Data reduction
  – Allow flexibility if the study population is highly mobile or if potential differences were addressed in the analysis
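The item-selection rule above can be made concrete with a small check. This sketch assumes one reading of the rule (at least 10 items total, with at least 70% drawn from a validated and reliable instrument); it is not an official DQI tool.

```python
# Illustrative check of the item-selection criterion as read above:
# total items >= 10, and validated items make up at least 70% of the total.
# Integer arithmetic (validated * 10 >= total * 7) avoids floating-point
# rounding at the 70% boundary.
def meets_item_rule(total_items, validated_items):
    return total_items >= 10 and validated_items * 10 >= total_items * 7

print(meets_item_rule(10, 7))   # True: 7 of 10 items are validated (70%)
print(meets_item_rule(12, 8))   # False: 8 of 12 is only ~67%; 9 are needed
print(meets_item_rule(9, 9))    # False: fewer than 10 items total
```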
Conclusions
• Written guidance is forthcoming
• Q&A at next breakout session: Analysis of Final Reports