Developing an evaluation of professional development, Webinar #2: Going deeper into planning the design


Page 1

Developing an evaluation of professional development

Webinar #2: Going deeper into planning the design

Page 2

Information and materials mentioned or shown during this presentation are provided as resources and examples for the viewer's convenience. Their inclusion is not intended as an endorsement by the Regional Educational Laboratory Southeast or its funding source, the Institute of Education Sciences (Contract ED-IES-12-C-0011).

In addition, the instructional practices and assessments discussed or shown in these presentations are not intended to mandate, direct, or control a State’s, local educational agency’s, or school’s specific instructional content, academic achievement system and assessments, curriculum, or program of instruction. State and local programs may use any instructional content, achievement system and assessments, curriculum, or program of instruction they wish.

Page 3

Purpose & Audience

In scope
• Evaluation designs that allow for causal inferences (RCT & QED)
• Creating an evaluation plan to examine the effectiveness of professional development

Out of scope
• Other program evaluation designs
• Identifying best practices for conducting professional development
• Identifying best practices in systems change

Target audience: LEAs, SEAs, and researchers who are interested in creating an evaluation of a specific professional development program and have an intermediate level of understanding of effectiveness studies.

Page 4

PLANNING THE DESIGN
Dr. Sharon Koon

Page 5

Distinction between WWC evidence standards and additional qualities of strong studies

• WWC design considerations for assessing effectiveness research:
– Two distinct groups: a treatment group (T) and a comparison group (C).
– For randomized controlled trials (RCTs), low attrition for both the T and C groups.
– For quasi-experimental designs (QEDs), baseline equivalence between the T and C groups.
– Contrast between the T and C groups measures the impact of the treatment.
– Valid and reliable outcome data used to measure the impact of the treatment.
– No known confounding factors.
– Outcome(s) not overaligned with the treatment.
– Same data collection process (same instruments, same time/year) for the T and C groups.

Source: http://www.dir-online.com/wp-content/uploads/2015/11/Designing-and-Conducting-Strong-Quasi-Experiments-in-Education-Version-2.pdf

Page 6

Distinction between WWC evidence standards and additional qualities of strong studies (cont.)

• Additional qualities of strong studies:
– Pre-specified and clear primary and secondary research questions.
– Generalizability of the study results.
– Clear criteria for research sample eligibility and matching methods.
– Sample size large enough to detect meaningful and statistically significant differences between the T and C groups, overall and for specific subgroups of interest.
– Analysis methods that reflect the research questions, design, and sample selection procedures.
– A clear plan to document the implementation experiences of the T and C conditions.

Source: http://www.dir-online.com/wp-content/uploads/2015/11/Designing-and-Conducting-Strong-Quasi-Experiments-in-Education-Version-2.pdf

Page 7

Determinants of a What Works Clearinghouse (WWC) study rating

Source: http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=19

Page 8

Study features that will be discussed

• Randomized controlled trials (RCTs)
– Random assignment process
– Cluster-level RCT considerations
– Attrition, both overall and T-C differential
• Quasi-experimental designs (QEDs) and high-attrition RCTs
– Baseline equivalence
• For both RCTs and QEDs
– Confounding factors
– Outcome eligibility
• Power analysis (not considered by WWC evidence standards)

Source: WWC references - http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18

Page 9

Random assignment process

• Units can be assigned at any level and at multiple levels (e.g., schools, teachers, students)
– Cluster design: when groups rather than individuals are the unit of assignment
• Make sure the units are
– Assigned entirely by chance
– Have a non-zero probability of being assigned to each group (but can have different probabilities across conditions)
– Have a consistent assignment probability within each group, or use an appropriate analytic approach
• Can be useful to conduct within strata
• Must maintain assignment status in the analysis, even if noncompliance occurs (i.e., intent-to-treat analysis)

Source: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18
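The within-strata assignment the slide recommends can be sketched in a few lines of plain Python. The function name and the even 50/50 split are illustrative assumptions, not a prescribed procedure; a real study would also document the seed and the assignment record.

```python
import random

def assign_within_strata(unit_to_stratum, seed=None):
    """Randomly assign units to treatment ('T') or control ('C') within strata.

    unit_to_stratum maps a unit id (e.g., a school) to its stratum label
    (e.g., a district). Returns a dict of unit id -> 'T' or 'C'.
    Illustrative only: assumes an even 50/50 split within each stratum.
    """
    rng = random.Random(seed)  # a fixed seed makes the assignment auditable
    by_stratum = {}
    for unit, stratum in unit_to_stratum.items():
        by_stratum.setdefault(stratum, []).append(unit)

    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)        # assigned entirely by chance
        cutoff = len(members) // 2  # equal probability within the stratum
        for i, unit in enumerate(members):
            assignment[unit] = "T" if i < cutoff else "C"
    return assignment
```

Keeping the assignment dict unchanged through the analysis, regardless of later noncompliance, is what makes an intent-to-treat analysis possible.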

Page 10

Cluster-level RCT considerations

• When cluster-level outcomes are analyzed, results provide evidence about cluster-level effects
• To meet WWC standards without reservations for analyses of subcluster effects, the sample should include subcluster units identified before the results of the random assignment were revealed
• For example, in a school-level RCT examining teacher retention, the sample
– Should include teachers in the schools before the random assignment results were provided to the schools
– Cannot meet standards without reservations if it includes any teachers who joined the schools after the random assignment results were provided

Source: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18

Page 11

Attrition

• Occurs when sample members initially assigned to the T or C groups are not in the analysis because they are missing key data used to calculate impacts
• The WWC is concerned about overall attrition and differences in the attrition rates between the T and C groups
• The WWC examines cluster and, if applicable, subcluster attrition
• Key data include outcomes and, for high-attrition RCTs, the characteristics used to assess baseline equivalence

Source: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18
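Overall and differential attrition follow directly from the assigned and analyzed counts. A minimal sketch (the function name is hypothetical; the WWC's acceptable-attrition boundary is intentionally not encoded here, since it depends on the attrition model chosen):

```python
def attrition_rates(assigned_t, analyzed_t, assigned_c, analyzed_c):
    """Return (overall, differential) attrition as proportions.

    'assigned' counts are units randomized to each group; 'analyzed'
    counts are units remaining in the impact analysis.
    """
    attr_t = 1 - analyzed_t / assigned_t
    attr_c = 1 - analyzed_c / assigned_c
    overall = 1 - (analyzed_t + analyzed_c) / (assigned_t + assigned_c)
    differential = abs(attr_t - attr_c)  # T-C gap, in proportion terms
    return overall, differential
```

For example, 100 assigned and 85 analyzed in T with 100 and 90 in C gives 12.5% overall attrition and a 5-percentage-point differential.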

Page 12

Ways of minimizing attrition in RCTs

• Make sure study participation activities are clear to everyone involved
– e.g., can prevent an uninformed superintendent from pulling the plug
• Conduct random assignment after participants have consented to study participation
– Non-consent counts as attrition
• Conduct random assignment as close to the start of the implementation period as possible
– Could help minimize attrition due to turnover

Source: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18

Page 13

QEDs

• In a QED, there are at least two groups (one intervention and one comparison).

• The groups are created non-randomly
– Use a convenience sample: nonparticipants who are nearby and available but are not participating in the intervention.
– Use a statistical technique to match participants (e.g., propensity score matching).
– Form the groups retrospectively, using administrative data.

Source: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=23
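The propensity score matching mentioned above can be sketched, once scores have been estimated (say, by a logistic regression, not shown), as greedy 1:1 nearest-neighbor matching without replacement. The function name and the greedy strategy are illustrative assumptions, not a prescribed method:

```python
def nearest_neighbor_match(treated, comparison):
    """Greedy 1:1 matching on precomputed propensity scores.

    treated and comparison map unit ids to propensity scores.
    Returns (treated_id, comparison_id) pairs; each comparison
    unit is used at most once (matching without replacement).
    """
    pool = dict(comparison)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not pool:
            break  # ran out of comparison units
        best = min(pool, key=lambda c_id: abs(pool[c_id] - t_score))
        pairs.append((t_id, best))
        del pool[best]  # remove the matched unit from the pool
    return pairs
```

However the matched comparison group is formed, baseline equivalence between the resulting groups still has to be demonstrated.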

Page 14

Baseline equivalence

• Must be demonstrated for QEDs and high-attrition RCTs

• Based on units/individuals in the analytic sample using baseline characteristics

• Example baseline characteristics:
– Prior measure of the outcome
– Demographic characteristics related to the outcome of interest

Sources: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18, http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=19

Page 15

Baseline equivalence (cont.)

• Calculate the T-C standardized mean difference at baseline
– Differences between 0.05 and 0.25 standard deviations require statistical adjustment when calculating impacts
– If there is a difference greater than 0.25 standard deviations for any required characteristic, then no outcomes in that domain may meet standards

Sources: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18, http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=19
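These thresholds apply to an effect size in pooled-standard-deviation units. The sketch below uses a simple pooled SD; the WWC's actual computation has refinements (e.g., a small-sample correction) not reproduced here, so the function names and status labels are illustrative:

```python
import math

def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """T-C baseline difference in pooled standard deviation units."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    return (mean_t - mean_c) / pooled_sd

def equivalence_status(d):
    """Map |d| onto the thresholds described above (illustrative labels)."""
    d = abs(d)
    if d <= 0.05:
        return "equivalent"
    if d <= 0.25:
        return "statistical adjustment required"
    return "does not meet standards"
```

For instance, baseline means of 52 vs. 50 with a common SD of 10 give d = 0.20, which falls in the adjustment-required band.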

Page 16

Confounding factors

• Common confounds
– A single unit (school, classroom, teacher) in one or both conditions
– Characteristics of the units in each group differ systematically in ways that are associated with the outcomes
– The intervention is bundled with other services not being studied
– T and C occur at different points in time

Sources: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=23, http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=19

Page 17

Outcome eligibility

• Face validity and reliability. Minimum reliability standards include:
– internal consistency (such as Cronbach’s alpha) of 0.50 or higher;
– temporal stability/test-retest reliability of 0.40 or higher; or
– inter-rater reliability (such as percentage agreement, correlation, or kappa) of 0.50 or higher.
• Not overaligned
– e.g., an outcome measure based on an assessment that relied on materials used in the T condition but not in the C condition (e.g., specific reading passages)

Source: http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=19
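The internal-consistency floor above can be checked with Cronbach's alpha. A plain-Python sketch (hypothetical function name; in practice a psychometrics package would be used):

```python
def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score columns.

    items[j][i] is respondent i's score on item j. Uses the
    sample variance (n - 1 denominator) throughout.
    """
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    k = len(items)   # number of items
    n = len(items[0])  # number of respondents
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))
```

A computed alpha of at least 0.50 would satisfy the internal-consistency minimum stated above.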

Page 18

Outcome eligibility (cont.)

• Collected in the same manner for both the T and C groups. Issues include:
– different modes, timing, or personnel used for the groups
– measures constructed differently for the groups

Source: http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=19

Page 19

Power analysis

• Power: the probability of finding a difference when there is a true difference in the populations (i.e., correctly rejecting a false null hypothesis).
• Key variables influence the power of a statistical test:
– The alpha that a researcher chooses
– The magnitude of the true population effect (effect size)
– The sample size
– Any clustering of the data
– The extent to which baseline covariates predict the outcome variable
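For two equal groups in an individual-level design, power can be approximated with a two-sided z-test using only the standard library. The function name and the normal approximation are assumptions; clustered designs and covariate adjustment (the last two variables above) require different formulas:

```python
import math
from statistics import NormalDist

def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d).
    Ignores clustering and baseline covariates, both of which
    change the effective sample size.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)               # two-sided critical value
    ncp = effect_size * math.sqrt(n_per_group / 2)  # noncentrality under H1
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)
```

As a sanity check, two_sample_power(0.5, 64) lands near the conventional 0.80 target, matching the textbook rule of thumb of about 64 per group for a medium effect.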

Page 20

Power analysis (cont.)

• A priori power analysis is conducted prior to doing the study. It enables you to design a study with adequate statistical power.

• Several online tools are available to researchers. For example, Optimal Design can be used for individual and group RCTs: http://sitemaker.umich.edu/group-based/optimal_design_software

Page 21

Questions & Answers

Homework:
• Find psychometric properties of outcome measures you are considering
• Bring questions to sessions 3-5

Page 22

Developing an evaluation of professional development

• Webinar 3: Going Deeper into Identifying & Measuring Target Outcomes 1/15/2016, 2:00pm

• Webinar 4: Going Deeper into Analyzing Results 1/19/2016, 2:00pm

• Webinar 5: Going Deeper into Interpreting Results & Presenting Findings 1/21/2016, 2:00pm