
Page 1

Working with What Works Clearinghouse Standards to Evaluate Designs for Broadening Participation

Kate Winter (Kate Winter Evaluation, LLC)
& Eva Fernández, Sabrina Avila, Patrick Johnson, & Jennifer Valad (CUNY Queens College)

2018 Transforming STEM Higher Education
Association of American Colleges & Universities ~ Network for Academic Renewal
November 10, 2018

Page 2

Overview

● Background

● What Works Clearinghouse (WWC)

● At your tables: Case studies

● Group share-out

● At your tables: Implications for planned proposals or projects in progress

● Resources

Page 3

Background

Page 4

Context: STEM Bridges Project

The RFP for the Title III HSI-STEM program (US Dept. of Education) did not specify WWC Evidence Standards or quantitative methods, but did specify that the evaluation design would be judged on the extent to which:

(1) The data elements and the data collection procedures are clearly described and appropriate to measure the attainment of activity objectives and to measure the success of the project in achieving the goals of the comprehensive development plan; (up to 5 points)

(2) The data analysis procedures are clearly described and are likely to produce formative and summative results on attaining activity objectives and measuring the success of the project on achieving the goals of the comprehensive development plan; (up to 5 points) and

(3) The evaluation will provide guidance about effective strategies suitable for replication or testing in other settings. (up to 5 points) (US Department of Education, 2016)

Page 5

STEM Bridges: The Proposal

Two Goals: (a) graduate more Hispanic & low-income students

(b) develop articulation agreements for QCC-to-QC transfer

Three Activities:
❶ Improve access: redesign STEM “landing” courses

❷ Improve learning: learning collectives

❸ Bridge: cross-campus model for transfer student success

http://hsistem.qc.cuny.edu

Page 6

❶ Improve Access: Redesign 20 STEM-landing courses (up to 7,000 students annually in treatment-group courses)

❷ Improve Learning: Develop “learning collectives” where peers mentor students

❸ Bridge:Articulation agreements and assessment processes for all STEM majors, QCC → QC

Page 7

STEM Bridges: Evaluation Design

● Cluster Randomized Controlled Trial (RCT), intended to meet WWC evidence standards without reservations

● Eligible sections randomly assigned to treatment or control, some treatment sections randomly assigned to learning collectives (txt+)

● Exploring “landing” course GPA, “gateway” course GPA, term-to-term retention, time to graduation

● Hierarchical modeling nesting students within sections, using baselines as covariates (see the modeling sketch after this list)

● Two sites, large sample, multiple disciplines
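To make the modeling bullet concrete, here is a minimal sketch of a two-level (students-within-sections) model using Python's statsmodels. The data file and the column names (section_id, treatment, baseline_gpa, pell_eligible, course_gpa) are hypothetical placeholders, not the project's actual schema or code.

```python
# Sketch of a hierarchical (mixed-effects) model for a cluster RCT:
# students nested within sections, treatment assigned at the section level,
# baseline measures entered as covariates. All names are illustrative.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("study_sections.csv")  # hypothetical analytic file

model = smf.mixedlm(
    "course_gpa ~ treatment + baseline_gpa + pell_eligible",  # fixed effects
    data=df,
    groups=df["section_id"],  # random intercept per section captures the clustering
)
result = model.fit()
print(result.summary())
```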

Page 8

Page 9

WWC Evidence Standards

Page 10

How to Meet WWC Evidence Standards

● Without reservations
○ RCTs or RDDs* with low attrition

● With reservations
○ RCTs with high attrition
○ QEDs with baseline equivalence, RDDs*

● All RCTs must follow an ITT protocol and use approved baseline measures

● All studies must be of appropriate interventions with included populations and approved outcomes

*Have additional requirements and are not discussed today

TLAs:
WWC: What Works Clearinghouse
RCT: Randomized Controlled Trial
RDD: Regression Discontinuity Design
QED: Quasi-Experimental Design
ITT: Intention to Treat

Page 11

Attrition

RCTs with low attrition meet WWC Evidence Standards with no reservations

RCTs with high attrition must demonstrate baseline equivalence

Page 12

What is “Low Attrition” versus “High Attrition”?

[Figure: WWC attrition boundaries showing the Conservative Thresholds and Liberal Thresholds*]

*Liberal thresholds are used in postsecondary education reviews
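As a worked illustration of how the overall and differential attrition rates that get checked against these thresholds are computed (this is not WWC code, and the example counts are hypothetical):

```python
def attrition(assigned_t, analyzed_t, assigned_c, analyzed_c):
    """Overall and differential attrition (in %) from assigned vs. analyzed counts."""
    overall = 100 * (1 - (analyzed_t + analyzed_c) / (assigned_t + assigned_c))
    rate_t = 100 * (1 - analyzed_t / assigned_t)
    rate_c = 100 * (1 - analyzed_c / assigned_c)
    return overall, abs(rate_t - rate_c)

# Hypothetical cluster counts: 25 sections assigned per arm; 20 treatment and
# 18 control sections remain in the analysis.
overall, differential = attrition(25, 20, 25, 18)
print(f"overall = {overall:.0f}%, differential = {differential:.0f}%")  # 24% and 8%
# The (overall, differential) pair is then located on the WWC attrition boundary
# (conservative or liberal) to classify attrition as low or high.
```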

Page 13

Challenges

● Institutions are inexperienced with experimental designs
○ FERPA, IRB consent process
○ Faculty reluctance
○ Project personnel still figuring out monitoring processes

● Faculty unfamiliar with RCT structure
○ Compliance, implementation fidelity, and sample size issues

● Implementing experimental designs in small departments

● Access to institutional data, analytic sample sizes, & statistical power

● Avoiding attrition

Page 14

Case Studies

The following case studies are real.

At the request of the survivors, the names have been changed.

Out of respect for the dead, the rest has been told exactly as it occurred.

(Adapted from Fargo, 1996)

Annotated Version + Glossary of Terms: https://goo.gl/99NXkX

Page 15

Case 1: Fall Term

Page 16

Case 2: Spring Term

Page 17

Implications

Page 18

Applying Our Lessons Learned

● Build fault tolerance in at the proposal stage:
○ Plan to exclude MANY sections each term
○ Use conservative N for estimates (see the planning sketch after this list)

● Develop resources to prepare faculty to participate in an RCT

● Spend the first term (or the first year) “beta-testing” all processes: faculty recruitment, sampling, implementation monitoring, data acquisition, data analysis

● Let your evaluator play “bad cop” and toe the line of the WWC rules

● Closely monitor treatment implementation so your evaluator can formatively conduct ToT analysis
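A back-of-the-envelope sketch of what "use conservative N" can look like at the proposal stage; every number below is a hypothetical planning assumption, not a project figure:

```python
# Plan statistical power and reporting around the conservative estimate,
# not the headcount of sections offered.
sections_offered = 20
students_per_section = 25
expected_section_loss = 0.30   # assumed share lost to non-compliance or closure
expected_missing_data = 0.25   # assumed share of students lacking a required baseline

usable_sections = sections_offered * (1 - expected_section_loss)
conservative_n = usable_sections * students_per_section * (1 - expected_missing_data)
print(int(conservative_n))  # 262 analyzable students instead of the nominal 500
```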

Page 19

Resources

● RCT Basics: https://emj.bmj.com/content/20/2/164
● WWC Homepage: https://ies.ed.gov/ncee/wwc/FWW
● WWC Handbooks: https://ies.ed.gov/ncee/wwc/Handbooks
● WWC Postsecondary Review Protocols: https://ies.ed.gov/ncee/wwc/Handbooks#protocol
● WWC Webinars: https://ies.ed.gov/ncee/wwc/Handbooks#webinars
● Common Guidelines: https://ies.ed.gov/pdf/CommonGuidelines.pdf

Slides: https://goo.gl/WXXQpJ
Handouts: https://goo.gl/99NXkX

Page 20

Kate Winter ([email protected]) http://www.katewinterevaluation.com

Eva Fernández ([email protected]) & Sabrina Avila, Patrick Johnson, Jennifer Valad

http://hsistem.qc.cuny.edu

Slides: https://goo.gl/WXXQpJ
Handouts: https://goo.gl/99NXkX

Thanks!

Page 21

Department Normal offers 20 sections of NORM-101 to the project, with 10 students each. The treatment—a redesigned full-term set of lab experiences with standardized materials, activities, and assessments—was implemented the same way by every lab instructor who used it. In all, 10 treatment and 10 control sections were randomly assigned. Following assignment, 6 treatment-assigned instructors indicated that they had not been involved in the redesign efforts and would not risk their students’ learning on untested approaches. However, 5 of the control-assigned instructors had seen significant value in the new materials during the workshops where they were being built and eagerly implemented them with their sections. Halfway through the term, 1 of the treatment-assigned faculty who had opted not to implement decided to implement after seeing the improvements in students engaged in the treatment. The treatment appears so successful that the chair wants to realign the final exam in the lecture course with the amazing new materials being covered in the treatment labs, to more accurately reflect student learning. The score on this exam serves as the primary outcome measure for the lab treatment. Inexplicably, the external evaluator is pushing back against this idea.

Department Enthusiastic had 4 of 10 instructors who were so excited about the new grant that they decided to implement their redesigned sections without waiting for instruction to do so, impacting 9 of 20 sections. Each instructor did something a little differently, thinking that they could compare their results at the end of the term to determine which approaches were most effective. Everyone is confused by the external evaluator’s sigh and sad smile when they inform her.

Department Small only offers 2 lectures and 4 labs of SMAL-102 each term, divided equally among 2 instructors. Fortunately, both instructors are willing to offer treatment sections, but it is unclear if they are each willing to offer BOTH treatment and control sections in the same term. The external evaluator keeps telling them they cannot offer the treatment in the lecture and have it included in the study analysis being reported to the funding agency, so the department is offering the treatment only in the 4 lab sections. The evaluator continues to push the project coordinators to closely monitor this department to ensure that the instructors each offer 1 treatment and 1 control section of the lab.

At the end of the term, Lead Institution is ready to provide institutional data about students in study-assigned sections to the external evaluator for analysis. Per WWC guidelines, since there are no baseline measures for course GPA, retention, or graduation, one measure of prior academic achievement (a composite of HS or transfer GPA and SAT/ACT) and one of SES (Pell eligibility) are used as covariates in the analysis model. However, prior academic achievement scores are frequently missing for transfer students. Similarly, many students do not complete the FAFSA, so SES data are not available for them. Despite there being more than 3,000 students in the courses where the redesigns are taking place, the restriction to only one section per instructor for sampling purposes has dropped the headcount significantly. Of the 1,000 students in the study sections, 25% are missing one baseline and 25% are missing the other, with some of those students missing both. This leaves 625 students in the analytic sample if baseline scores are not imputed. Meanwhile, the on-campus IRB is pushing to require informed consent from all students in the study, and the evaluator has your data lead on speed-dial.
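A quick sketch of the arithmetic behind the Lead Institution numbers: under listwise deletion, missingness on either baseline removes a student, so the two 25% rates compound. The 125-student overlap below is the value implied by the stated 625-student analytic sample, not a figure given in the case.

```python
# Inclusion-exclusion on missing baselines under listwise deletion.
n_total = 1000
missing_prior = 250   # 25% missing the prior-achievement composite
missing_ses = 250     # 25% missing Pell eligibility (no FAFSA)
missing_both = 125    # implied overlap consistent with the 625-student sample

missing_any = missing_prior + missing_ses - missing_both
analytic_sample = n_total - missing_any
print(analytic_sample)  # 625 students without imputation
```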

Lead Institution
When was the last time you worked with Institutional Research (or equivalent) to access data? How complete was that data? Do you remember what proportion of your students complete the FAFSA? What strategies might help maximize the completeness of the data?

Department Small
Given this course’s structure of 2 faculty and 2 lectures with 4 labs, what options exist for avoiding the N of 1 confound, where only one instructor offers the treatment or control condition?

Implementation fidelity – having easily implemented standardized materials and strategies facilitates standard implementation.

Compliance – although one treatment section joins late and only threatens implementation fidelity.

Department Enthusiastic
A. Without random assignment, can these sections’ data be used in the RCT analysis? Why or why not?
B. Assuming all the varying treatments were variations on a theme aimed at the same outcome, could the treatment sections be analyzed together against the remaining “business as usual” sections, assuming baseline equivalency (i.e., a QED)? Why or why not?
C. If each instructor offered a unique treatment, so that he or she was the only source of that treatment, would it be possible to analyze the impacts of those treatments through an experimental design? Why or why not?
D. How might you approach determining if any of the treatment conditions had an impact?

Implementation fidelity.

Overalignment

Department Normal
A. Across the entire term, how many treatment and control sections were offered in- and out-of-compliance?
B. How many treatment sections were offered with full implementation fidelity?
Assuming all sections are full and all students have the needed data (i.e., there is no attrition):
C. How many students were in randomly assigned treatment sections? Control sections?
D. How many students were exposed to treatment (at any level)? How many to control conditions?
E. We need at least 85 students in treatment and 73 in control in order to detect an effect size of 0.4 (alpha = .05 and power of .8). Will we have enough students following an ITT protocol (as assigned)? How about following a ToT protocol (as received)? (A generic power-calculation sketch follows below.)
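For question E, a generic two-group power calculation with statsmodels shows the machinery. Note that this unadjusted sketch ignores clustering and covariates, so it will not reproduce the 85/73 figures, which come from the project's own design-specific calculation.

```python
from statsmodels.stats.power import TTestIndPower

# Per-group n needed to detect a standardized effect of 0.4
# at alpha = .05 with power = .80 in a simple two-group comparison.
n_per_group = TTestIndPower().solve_power(effect_size=0.4, alpha=0.05, power=0.8)
print(n_per_group)  # ~99 per group for a two-sided test
```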

Case 1: Fall Term

Implementation fidelity, though if this is considered as testing multiple treatments through a QED, then it hits the N of 1 confound: only 1 individual offered each different intervention.

No random assignment.

Avoids N of 1 confound by having the same instructor offer both a treatment and a control condition. As long as EITHER instructor offers BOTH, the data will count in the RCT.

N of 1 confound if each instructor offers 1 of the 2 lectures.

Issues accessing data

Page 22

Attrition
A. If all of the treatment sections had administered the post-test (i.e., 0% cluster-level attrition within the treatment sections), and the control cluster-level attrition remained the same, would the cluster-level attrition be high or low per WWC guidelines?
B. Given what we know about cluster-level attrition not counting against a study twice, would the student-level attrition be high or low?
C. Based on the WWC attrition tables, if you are going to “lose” clusters and/or individuals from a study:
D. Why is it problematic to have small samples?
E. Why is it problematic to have very low treatment attrition with reasonable to high control attrition?
F. If you “lose” 60% of both the treatment and control groups (assuming you still have enough data points to have the statistical significance to detect treatment effects), is that a problem for the study based on attrition, per WWC guidelines?

Individual-level attrition

Cluster-level attrition

Increasing compliance, drastically shrinking sample sizes

Department Unsupervised
A. Why would not having full implementation fidelity and compliance data hinder ToT (as received) analysis?
B. Why would not having full implementation fidelity and compliance data NOT hinder ITT (as assigned) analysis?
C. How might a program craft monitoring approaches to maximize monitoring and tracking with small project staffs?

Compliance monitoring / implementation fidelity monitoring

Department Unsupervised offers 20 sections of UNSU-110 to the project. In all, 10 treatment and 10 control sections were randomly assigned. Due to project staffing constraints, only the treatment-assigned sections were monitored, and not as thoroughly as desired. Following assignment, 1 treatment-assigned instructor implemented the treatment as designed for the full term, 2 offered it for all but the first three weeks, 2 more offered half the treatment activities for the full term, 2 decided they were not ready and did not implement anything new, and no data are available for the remaining 3 treatment-assigned sections or any of the control-assigned sections. While this does not impact the ITT protocol, your evaluator is claiming that this hinders her efforts to provide you with formative assessment of how the treatment seems to be working.

There are 2 campuses participating in this grant-funded project. At campus 1 in the Fall term, 20 sections were included in the RCT, allowing for 10 sections each of treatment and control (Department Normal). However, many faculty misunderstood how the RCT works and were found to be in non-compliance, which undermined the data. At campus 2, therefore, only 4 classes were included in the RCT because the department did not want to risk non-compliance by having a single instructor assigned to both a treatment and a control condition, so only 1 section per instructor was included in the pool of eligible sections to be randomly assigned. These 4 instructors/sections, with their 15 students each, were then randomly assigned to the treatment (2 sections, 30 students) and control (2 sections, 30 students) conditions. All 4 of the faculty at campus 2 offered their sections in compliance. In the end, while campus 1 seemed to be better off with a larger sample, the evaluator claims that campus 2 had more meaningful data and found a treatment effect size of .7 SD, while campus 1 detected no treatment effect.

While the primary project outcomes of interest are course GPA, retention, and graduation, which do not lend themselves easily to attrition from the study (even a student who leaves has a data point and therefore remains in the study), another component of the analysis entails psycho-social surveys of students’ sense of belonging in STEM and self-confidence/self-efficacy engaging in STEM coursework and pursuing STEM careers. These constructs are measured using validated survey instruments administered to treatment and control students before Fall term starts and again at the end of the academic year through the study sections. Of the 50 sections (25 treatment and 25 control) that administered the pre-test and were supposed to administer the post-test survey, 12 did not (5 txt, 7 cntrl; 20% and 28% attrition, respectively, with an overall attrition of 24%). Each section had 20 students and all completed the in-class pre-test, but only 52% of the students in the remaining treatment sections and 48% of the students in the remaining control sections took the post-test (50% overall attrition, 4% differential attrition). This means that from the original pre-test pool, we only have post-test data for 42% of treatment section students and 35% of control section students (62% overall attrition from the original pool, 7% differential attrition). No one understands why the evaluator is not freaking out about such low response rates and such high attrition.
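A sketch recomputing the attrition figures quoted above (not project code), which is useful for checking where in the WWC attrition table the study would land:

```python
# Cluster-level attrition: sections that did not administer the post-test (figures from the case above).
sections_t = sections_c = 25
lost_t, lost_c = 5, 7
students_per_section = 20

cluster_t = lost_t / sections_t                                   # 0.20
cluster_c = lost_c / sections_c                                   # 0.28
cluster_overall = (lost_t + lost_c) / (sections_t + sections_c)   # 0.24

# Individual-level response within the sections that did administer the post-test.
post_rate_t, post_rate_c = 0.52, 0.48
remaining_t = (sections_t - lost_t) * students_per_section        # 400 students
remaining_c = (sections_c - lost_c) * students_per_section        # 360 students
individual_overall = 1 - (post_rate_t * remaining_t + post_rate_c * remaining_c) / (remaining_t + remaining_c)

print(round(cluster_overall, 2), round(individual_overall, 2))    # 0.24, ~0.5
```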

Implementation fidelity

Compliance.

Sample Size v. Compliance
A. If one of the four sections (treatment or control) had closed due to lack of enrollment, would it have impacted the validity of the data (i.e., introduced a confounding factor)? Why or why not?
B. What strategies might work on your campus to increase the likelihood that treatment-assigned faculty will offer the treatment and control-assigned faculty will not?

Compliance monitoring / implementation fidelity monitoring

Case 2: Spring Term

Yay! Compliance!

WWC-approved outcomes

You do not need a lot of statistical power to detect such a large effect, but you do need Clean Data. Smaller samples are very useful if they are in compliance.

WWC-approved measure

Distraction – this does NOT represent actual individual-level attrition, because cluster-level attrition would then count against us twice.

Winter, K., Fernández, E., Avila, S., Johnson, P., & Valad, J. (2018). Working with What Works Clearinghouse Standards to evaluate designs for broadening participation. 2018 Transforming STEM Higher Education, AAC&U Network for Academic Renewal. https://goo.gl/WXXQpJ

Page 23

WWC Evidence Standards Glossary of Terms [1]

Analytic Sample Size
Definition: The number of cases with data points in the analysis.
Why it's problematic: Depending on attrition and the number of variables needed in the analysis (blocking variables, required covariates, etc.), small analytic samples may not have the statistical power necessary to detect the small effect sizes common in education research.

Attrition [2]
Definition: The percentage of a sample randomly assigned to a study that is not part of the final analysis sample, calculated overall and for each study condition. Attrition may occur at the cluster level (in cluster RCTs) and the individual level (in any RCT).
Why it's problematic: High attrition precludes RCTs and RDDs from meeting the highest WWC evidence standards and requires demonstration of baseline equivalence.

Cross-Over
Definition: See Non-Compliance.

Effect Size [3]
Definition: A standardized measure of the magnitude of an effect. The effect size represents the change (measured in standard deviations) in an average student's outcome that can be expected if that student is given the intervention. Because effect sizes are standardized, they can be compared across outcomes and studies.
Why it's problematic: Many interventions in education have small effect sizes (less than .3 SD), which necessitate having enough statistical power AND high enough compliance to detect them.
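In standardized mean difference terms, the effect size for a continuous outcome is computed as shown below (the WWC additionally applies a small-sample correction, yielding Hedges' g); this formula is standard background, not quoted from the slides:

$$
g = \frac{\bar{Y}_T - \bar{Y}_C}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_T - 1)\,s_T^2 + (n_C - 1)\,s_C^2}{n_T + n_C - 2}}
$$

where $\bar{Y}_T$ and $\bar{Y}_C$ are the treatment and control means, $s_T$ and $s_C$ their standard deviations, and $n_T$ and $n_C$ their sample sizes.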

Experimental Design [4]
Definition: A method of research in the social sciences in which a controlled experimental factor is subjected to special treatment for purposes of comparison with a factor kept constant. For WWC purposes, includes RCTs, RDDs, and QEDs.
Why it's problematic: Requires specific processes and structures; can increase costs and resources needed to conduct the study.

FERPA [5]
Definition: The Family Educational Rights and Privacy Act (FERPA) (20 U.S.C. § 1232g; 34 CFR Part 99) is a Federal law that protects the privacy of student education records. The law applies to all schools that receive funds under an applicable program of the U.S. Department of Education.
Why it's problematic: Depending on how an institution interprets the rules, this can severely limit access to necessary student data, shrinking analytic samples and reducing statistical power. Secondary analysis of existing data, when used anonymously to understand student outcomes in the aggregate and disaggregated by sub-groups of interest, is typically not protected by FERPA.

Implementation Fidelity [6]
Definition: When an intervention is implemented as proposed, with regard to content, quality, timing, and duration.
Why it's problematic: If an intervention requires certain elements to be effective and these are not implemented as intended, it can reduce or eliminate the effectiveness of the intervention. Variation in implementation can therefore impact the treatment effect.

Informed Consent [7]
Definition: Informed consent is an ethical and legal requirement for research involving human participants. It is the process whereby a participant is informed about all aspects of the trial that are important for making a decision and, after studying all aspects of the trial, voluntarily confirms his or her willingness to participate.
Why it's problematic: For studies of routine educational outcomes, such as GPA and graduation, it is rarely necessary to inform individuals that data already collected (i.e., secondary analysis) are being used anonymously and in aggregated form to understand institutional or intervention effectiveness. Requiring informed consent to access routinely collected and anonymous data can significantly reduce sample sizes and statistical power, while potentially creating additional burden for individuals. For studies collecting new data directly from students (interviews, surveys), informed consent is typically required, but new survey technologies permit anonymous data collection and these methods can be exempted from IRB review.

1. Many harvested verbatim from the WWC Glossary of Terms (https://ies.ed.gov/ncee/wwc/Glossary/Group%20Design%20Standards)
2. https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_brief_attrition_080715.pdf
3. https://effectsizefaq.com/2010/05/30/what-are-some-conventions-for-interpreting-different-effect-sizes/
4. https://www.merriam-webster.com/dictionary/experimental%20design
5. https://www2.ed.gov/policy/gen/guid/fpco/ferpa/index.html
6. https://implementationscience.biomedcentral.com/articles/10.1186/1748-5908-2-40
7. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3777303/

Page 24

ITT (Intention to Treat) Protocol [8]
Definition: All cases within all study conditions are analyzed as if they received the treatment condition to which they were randomly assigned, regardless of whether they did. This is the required approach to meet WWC evidence standards.
Why it's problematic: If some treatment-assigned cases do not receive exposure to treatment, and they are analyzed as if they had, it can dilute the treatment average and hide a treatment effect. Conversely, if control-assigned cases receive exposure to treatment, and they are analyzed as if they had not, it can enhance the control average and hide a treatment effect.
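A toy contrast of the two analytic groupings (hypothetical data; not the WWC's or the project's code): ITT averages outcomes by the condition cases were assigned to, while ToT/per-protocol averages by what they actually received.

```python
import pandas as pd

# Four sections: two comply, two cross over (non-compliance).
df = pd.DataFrame({
    "assigned": ["treatment", "treatment", "control", "control"],
    "received": ["treatment", "control",   "treatment", "control"],
    "outcome":  [3.2, 2.9, 3.1, 2.7],
})

itt = df.groupby("assigned")["outcome"].mean()   # WWC-required grouping
tot = df.groupby("received")["outcome"].mean()   # formative only under WWC standards
print(itt, tot, sep="\n\n")
```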

N of 1 Confound [9]
Definition: A component of a study that is completely aligned with one of the study conditions, aka a confounding factor. For example, a study may have one intervention school and a different comparison school. In this case, it is impossible to separate how much of the observed effect was due to the intervention and how much was due to the particular school in which the intervention was used.
Why it's problematic: In small departments, it can be difficult to create randomly assigned scenarios where there is more than one instructor offering the treatment and more than one offering the control. In these instances, it is necessary to have the same instructor offer both conditions (so the instructor is no longer completely aligned with either condition), which many instructors do not wish to do.

Non-Compliance
Definition: When a case does not receive the study condition to which it was randomly assigned.
Why it's problematic: See ITT Protocol.

Overalignment [10]
Definition: A study closely tailors an outcome measure to a condition, or the measure repeats some aspect of a condition (usually the treatment).
Why it's problematic: If a measure is overaligned, the study findings may not accurately indicate the effect of the intervention.

Per Protocol
Definition: See Treatment on the Treated (ToT).

QED (Quasi-Experimental Design) [11]
Definition: These designs have a treatment and control group but do not randomly assign cases to them. Rather, baseline equivalence must be established on key variables between the groups prior to analyzing outcome differences between the groups.
Why it's problematic: Requires specific processes and structures; can increase costs and resources needed to conduct the study.

RCT (Randomized Controlled Trial) [12, 13]
Definition: A study design that randomly assigns participants into an experimental group or a control group. As the study is conducted, the only expected difference between the control and experimental groups in a randomized controlled trial (RCT) is the outcome variable being studied.
Why it's problematic: Requires specific processes and structures; can increase costs and resources needed to conduct the study. It can be difficult to get good data if the design is not implemented well (i.e., random assignment or compliance issues, implementation fidelity issues, attrition issues, etc.).

RDD (Regression Discontinuity Design) [14]
Definition: These designs use a pretest-posttest program-comparison group strategy by assigning participants to conditions solely on the basis of a cutoff score on a pre-program measure.
Why it's problematic: Requires specific processes and structures (mathematical magic); can increase costs and resources needed to conduct the study.

Statistical Power [15]
Definition: Statistical power is the likelihood that a study will detect an effect when there is an effect there to be detected. If statistical power is high, the probability of making a Type II error, or concluding there is no effect when, in fact, there is one, goes down.
Why it's problematic: Statistical power is affected chiefly by the size of the effect and the size of the sample used to detect it. Bigger effects are easier to detect than smaller effects, while large samples offer greater test sensitivity than small samples.

Treatment on the Treated (ToT)
Definition: Also called "per protocol"; analyzes the treatment effect on all who received treatment versus all who did not, regardless of random assignment.
Why it's problematic: This approach does not meet WWC evidence standards and requires close monitoring and tracking of treatment implementation.

8. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3159210/
9. https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_brief_confounds_101117.pdf
10. https://ies.ed.gov/ncee/wwc/Docs/OnlineTraining/wwc_training_m5.pdf
11. http://www.socialresearchmethods.net/kb/quasiexp.php
12. https://himmelfarb.gwu.edu/tutorials/studydesign101/rcts.html
13. Ibid.
14. http://www.socialresearchmethods.net/kb/quasird.php
15. https://effectsizefaq.com/2010/05/31/what-is-statistical-power/

Winter, K., Fernández, E., Avila, S., Johnson, P., & Valad, J. (2018). Working with What Works Clearinghouse Standards to evaluate designs for broadening participation. 2018 Transforming STEM Higher Education, AAC&U Network for Academic Renewal. Handouts: https://goo.gl/99NXkX; Slides: https://goo.gl/WXXQpJ