Session 4: Analysis and reporting
Steve Higgins (Chair), Paul Connolly, Stephen Gorard


Page 1

Session 4: Analysis and reporting

Steve Higgins (Chair), Paul Connolly, Stephen Gorard

Page 2

Analysis of Randomised Controlled Trials (RCTs)

Paul Connolly

Centre for Effective Education

Queen’s University Belfast

Conference of EEF Evaluators: Building Evidence in Education

Training Day, 11 July 2013

Page 3

Main Analysis of Simple RCT

• These slides provide an introductory overview of one approach to analysing RCTs

• Assume we are dealing with a continuous outcome variable that is broadly normally distributed

• Three variables:
  • Pre-test score “score1” (centred so that mean = 0)
  • Post-test score “score2”
  • Group membership “intervention” (coded 0 = control group; 1 = intervention group)

• Basic analysis via linear regression:

predicted score2 = b0*constant + b1*intervention + b2*score1
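As a hedged illustration (not part of the original slides), this model could be fitted in Python with statsmodels; the DataFrame df and the data file name are hypothetical, but the variable names follow the slide:

```python
# Minimal sketch: fitting the basic RCT model described above.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("trial_data.csv")                  # hypothetical data file
df["score1"] = df["score1"] - df["score1"].mean()   # centre the pre-test

# predicted score2 = b0 + b1*intervention + b2*score1
model = smf.ols("score2 ~ intervention + score1", data=df).fit()
print(model.params)                    # b0 (Intercept), b1 (intervention), b2 (score1)
print(model.pvalues["intervention"])   # significance of b1
```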

Page 4

Main Analysis of Simple RCT

Predicted score2 = b0*constant + b1*intervention + b2*score1

• b0 = adjusted mean post-test score for those in control group

• b0 + b1 = adjusted mean post-test score for those in intervention group

• Estimate standard deviations for post-test mean scores using s.d. for predicted score2 for control and intervention group separately*

• Significance of b1 = significance of difference between post-test mean scores for intervention and control groups

• Effect size, Cohen’s d = b1 / [s.d. for pred. score2]

• 95% confidence interval for effect size:

  = [b1 ± 1.96*(standard error of b1)] / (standard deviation for pred. score2)

*Most statistical software packages provide the option of creating a new variable comprising the predicted scores of the model. This new variable is the one to use to estimate standard deviations for adjusted post-test scores.
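Continuing the sketch above (again an illustration, not the slides' own code): the slide's denominator, the s.d. of predicted score2, is computed here for the control group; whether to use the control group, the intervention group or a pooled value is left to the analyst:

```python
# Sketch: effect size and 95% CI from the fitted model above.
df["pred"] = model.predict(df)         # new variable of predicted scores

sd_pred = df.loc[df["intervention"] == 0, "pred"].std()  # s.d. of pred. score2 (control)
b1 = model.params["intervention"]
se_b1 = model.bse["intervention"]

d = b1 / sd_pred                       # effect size as defined on the slide
ci = ((b1 - 1.96 * se_b1) / sd_pred,
      (b1 + 1.96 * se_b1) / sd_pred)   # 95% CI for the effect size
print(f"d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```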

Page 5

Exploratory Analysis of Mediating Effects for RCT

• Take the example of gender differences (variable “boy”, coded as: 0 = girls; 1 = boys)

• Analysis via extension of basic linear regression model:

predicted score2 = b0*constant + b1*intervention + b2*score1 + b3*boy + b4*boy*intervention

• Significance of b4 indicates whether there is evidence of an interaction effect (i.e. in this case that the intervention has differential effects for boys and girls)

• The same approach applies when the contextual variable is continuous rather than binary (as here); see the sketch below for fitting the interaction model
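A minimal sketch of the extended model, assuming the hypothetical DataFrame from the earlier sketch also carries the binary boy column:

```python
# Sketch: testing for a differential (interaction) effect by gender.
model2 = smf.ols("score2 ~ intervention + score1 + boy + boy:intervention",
                 data=df).fit()
print(model2.pvalues["boy:intervention"])   # significance of b4
```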

Page 6

Exploratory Analysis of Mediating Effects for RCT

predicted score2 = b0*constant + b1*intervention + b2*score1 + b3*boy + b4*boy*intervention

• Use the model to estimate adjusted mean post-test scores*:

• b0 = girls in control group

• b0 + b3 = boys in control group

• b0 + b1 = girls in intervention group

• b0 + b1 + b3 + b4 = boys in intervention group

• Estimate standard deviations by calculating s.d. for predicted score2 for each subgroup separately

*When dealing with a continuous contextual variable, it is often still useful to calculate adjusted mean post-test scores to illustrate any interaction effects found. This can be done by using the model to predict the adjusted post-test mean scores for those participants in the control and intervention groups who have a score for the contextual variable concerned that is one standard deviation below the mean and then doing the same for those who have a score one standard deviation above the mean.
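To make the footnote concrete, a hedged sketch using a hypothetical centred continuous variable named context; the model predicts adjusted means for pupils one standard deviation below and above its mean:

```python
# Sketch: adjusted post-test means at +/- 1 s.d. of a continuous moderator.
import pandas as pd
import statsmodels.formula.api as smf

m = smf.ols("score2 ~ intervention + score1 + context + context:intervention",
            data=df).fit()

sd_c = df["context"].std()
grid = pd.DataFrame({
    "intervention": [0, 0, 1, 1],
    "context":      [-sd_c, +sd_c, -sd_c, +sd_c],
    "score1":       [0, 0, 0, 0],   # pre-test is centred, so 0 = average pupil
})
grid["adjusted_mean"] = m.predict(grid)
print(grid)
```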

Page 7

Extending the Analysis

• For trials with binary or ordinal outcome measures, the same approach can be used but with generalised linear regression models (see the sketch after this list):
  – Binary logistic regression (binary outcomes)
  – Ordered logistic regression (ordinal outcomes)

• For cluster randomised trials (with >30 clusters), the same models can be used but extended to create two-level models

• For quasi-experimental designs, either:
  – the same models as above but adding a number of additional covariates (all centred) to control for pre-test differences, or
  – propensity score matching

• For repeated measures designs, the above can also be extended using multilevel models with observations (level 1) clustered within individuals (level 2)
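A sketch of two of these extensions in statsmodels; the binary outcome passed and the cluster identifier school are hypothetical column names (ordered logistic regression is also available in statsmodels via OrderedModel):

```python
# Sketch: binary-outcome and cluster-randomised extensions.
import statsmodels.formula.api as smf

# Binary outcome: logistic regression instead of OLS
logit = smf.logit("passed ~ intervention + score1", data=df).fit()
print(logit.params["intervention"])   # log-odds effect of the intervention

# Cluster randomised trial: two-level model with a random
# intercept for each cluster (e.g. school)
mlm = smf.mixedlm("score2 ~ intervention + score1",
                  data=df, groups=df["school"]).fit()
print(mlm.summary())
```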

Page 8

Discussion (2 mins)

Write on post-it notes:

• What are the key issues or questions for evaluators?

• Have you found any solutions?

Page 10

What is N?

• How many cases were assessed for eligibility?

• How many of those assessed did not participate, and for what reasons (not meeting criteria, refused, etc.)?

• How many then agreed to participate?

• How many were allocated to each group (if relevant)?

• How many were lost or dropped out after agreeing to participate (and after allocation to a group, if relevant)?

• How many were analysed, and why were any further cases excluded from the analysis?

Page 11

An example of reporting problems with a sample

Pupils allocated to groups but with no gain score, and reason for omission:

Allocation        Pre-test score   Post-test score   Reason
Treatment group   78               -                 Left school, not traced
Treatment group   73               -                 Long-term sick during post-test
Control           74               -                 Left school, new school would not test
Control           75               -                 Withdrawn, personal reasons
Control           -                70                Pre-test not recorded, technical reasons
Control           73               -                 Permanently excluded by school

In total, 314 individual Year 7 pupils took part in the study. 157 pupils were assigned to treatment and 157 to control. The sample included students from a disadvantaged background (eligible for free school meals), those with a range of learning disabilities (SEN) and those for whom English was a second language. By the final analysis six students had dropped out or could not be included in the gain score analysis. One took the pre-test (repeatedly) but his school were unable to record the score. His post-test score was 78, and he would have been in the control. Five others took the pre-test but did not sit the post-test. One left the school and could not be traced, initially scored 78 and would have been in treatment. One left the school and their new school was not able to arrange the post-test, initially scored 64 and would have been control. One changed schools, one could not get their score saved at pre-test, one refused to cooperate and one was persistently absent at post-test (perhaps excluded). Although this loss of data, and the reduction of the sample to 308 pupils, is unfortunate, there is no specific reason to believe that this dropout was biased or favoured one group over the other.

Source: Gorard, S., Siddiqui, N. and See, B.H. (2013) Process and summative evaluation of the Switch-On literacy transition programme. Report to the Education Endowment Foundation.

Page 12

Page 14

Discussion (2 mins)

Write on post-it notes:

• What are the key issues or questions for evaluators?

• Have you found any solutions?

Page 15

Calculating effect sizes and the toolkit meta-analysis – implications for evaluators

Steve Higgins

[email protected]

School of Education, Durham University

EEF Evaluators Conference, June 2013

Page 16

Sutton Trust/EEF Teaching and Learning Toolkit

• Comparative evidence
• Aims to identify ‘best buys’ for schools
• Based on meta-analysis

http://educationendowmentfoundation.org.uk/toolkit

Page 17

What is meta-analysis?

• A way of combining the results of quantitative research
• To accumulate evidence from smaller studies
• To compare results of similar studies – consistency
• To investigate patterns of association in the findings of different studies – explaining variation
• ‘Surveys’ research studies

Page 18

Why meta-analysis?

• Cumulative – synthesis of evidence
• Based on size of effect and confidence intervals rather than significance testing – patterns in the data
• Identifying and understanding variation helps develop explanatory models

Page 19

What is an “effect size”?

• A standardised way of looking at difference
• Different methods for calculation:
  – Binary (risk difference, odds ratio, risk ratio)
  – Continuous:
    • Correlational (Pearson’s r)
    • Standardised mean difference (d, g, Δ)
• Difference between control and intervention group as a proportion of the dispersion of scores:

  effect size = (intervention group score – control group score) / standard deviation of scores

Page 20

Examples of Effect Sizes:

ES = 0.2
“Equivalent to the difference in heights between 15 and 16 year old girls”

• 58% of control group below mean of experimental group
• Probability you could guess which group a person was in = 0.54
• Change in the proportion above a given threshold: from 50% to 58%, or from 75% to 81%

Page 21

ES = 0.8
“Equivalent to the difference in heights between 13 and 18 year old girls”

• 79% of control group below mean of experimental group
• Probability you could guess which group a person was in = 0.66
• Change in the proportion above a given threshold: from 50% to 79%, or from 75% to 93%
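The percentages on these two slides are consistent with standard normal-distribution conversions of an effect size (cf. Rob Coe's effect size resources, linked later). A sketch of those conversions, on the assumption that this is how the figures were derived:

```python
# Sketch: interpreting an effect size d under normality assumptions.
from scipy.stats import norm

def interpret(d, baseline=0.50):
    below = norm.cdf(d)        # proportion of control group below experimental mean
    guess = norm.cdf(d / 2)    # probability of correctly guessing group membership
    shifted = norm.cdf(norm.ppf(baseline) + d)  # new proportion above a threshold
    return below, guess, shifted                # that `baseline` of controls exceed

print(interpret(0.2))          # ~ (0.58, 0.54, 0.58)  -> the ES = 0.2 slide
print(interpret(0.8, 0.75))    # shifted ~ 0.93        -> the ES = 0.8 slide
```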

Page 22

The rationale for using effect sizes

• Traditional quantitative reviews focus on statistical significance testing:
  – Highly dependent on sample size
  – Null finding does not carry the same “weight” as a significant finding
• Meta-analysis focuses on the direction and magnitude of the effects across studies:
  – From “Is there a difference?” to “How big is the difference?” and “How consistent is the difference?”
  – Direction and magnitude represented by “effect size”

Page 23

Issues and challenges in meta-analysis

Conceptual:
• Reductionist - the answer is .42
• Comparability - apples and oranges
• Atheoretical - ‘flat-earth’

Technical:
• Heterogeneity
• Publication bias
• Methodological quality

Page 24

Comparative meta-analysis

• Theory testing
• Emphasises practical value
• Incorporate EEF findings in new Toolkit meta-analyses

Ability grouping:
  Slavin 1990b (secondary low attainers)    -0.06
  Lou et al. 1996 (on low attainers)        -0.12
  Kulik & Kulik 1982 (secondary - all)       0.10
  Kulik & Kulik 1984 (elementary - all)      0.07

Meta-cognition and self-regulation strategies:
  Abrami et al. 2008     0.34
  Haller et al. 1988     0.71
  Klauer & Phye 2008     0.69
  Higgins et al. 2004    0.62
  Chiu 1998              0.67
  Dignath et al. 2008    0.62

Page 25

Calculating effect sizes

The difference between the two means, expressed as a proportion of the standard deviation:

ES = (Me – Mc) / SD

• Cohen’s d
• Glass’ Δ
• Hedges’ g
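A sketch of the three variants using their standard formulas (illustrative code, not from the slides; the variants differ in which standard deviation goes in the denominator):

```python
# Sketch: Cohen's d, Glass's delta and Hedges' g from raw scores.
import numpy as np

def effect_sizes(treat, control):
    n1, n2 = len(treat), len(control)
    m1, m2 = np.mean(treat), np.mean(control)
    s1, s2 = np.std(treat, ddof=1), np.std(control, ddof=1)

    pooled = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled                  # Cohen's d: pooled s.d.
    delta = (m1 - m2) / s2                  # Glass's delta: control-group s.d.
    g = d * (1 - 3 / (4 * (n1 + n2) - 9))   # Hedges' g: small-sample correction
    return d, delta, g
```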

Page 26

Reporting effect sizes: RCTs

• Post-test standardised mean difference with confidence intervals
• Fixed effect OK for individual randomisation; not for clusters…
• Clustered analysis: MLM or an equivalent measure
• Other comparisons: matched designs, regression discontinuity

http://www.cem.org/evidence-based-education/effect-size-calculator

Page 27

Discussion task

• What analyses are you intending to undertake?
• How do you plan to calculate effect size(s)?
• What statistical techniques:
  1. Are you confident to undertake?
  2. Would you be happy to advise other evaluation teams on?
  3. Would you appreciate advice and/or support with?

Page 28

Key requirement: be explicit…

• Describe analysis decisions (e.g. ITT and missing data)
• Report clusters separately
• Submit the complete data-set in case a different analysis is required for comparability

Page 29

References, further readings and information

Books and articles

Borenstein, M., Hedges, L.V., Higgins, J.P.T. & Rothstein, H.R. (2009) Introduction to Meta-Analysis (Statistics in Practice). Oxford: Wiley-Blackwell.
Chambers, E.A. (2004) An introduction to meta-analysis with articles from the Journal of Educational Research (1992-2002). Journal of Educational Research, 98, pp. 35-44.
Cooper, H.M. (1982) Scientific guidelines for conducting integrative research reviews. Review of Educational Research, 52, 291.
Cooper, H.M. (2009) Research Synthesis and Meta-Analysis: A Step-by-Step Approach (4th edition). London: SAGE Publications.
Cronbach, L.J., Ambron, S.R., Dornbusch, S.M., Hess, R.O., Hornik, R.C., Phillips, D.C., Walker, D.F. & Weiner, S.S. (1980) Toward Reform of Program Evaluation: Aims, Methods, and Institutional Arrangements. San Francisco, CA: Jossey-Bass.
Eldridge, S. & Kerry, S. (2012) A Practical Guide to Cluster Randomised Trials in Health Services Research. London: Wiley-Blackwell.
Glass, G.V. (2000) Meta-analysis at 25. Available at: http://glass.ed.asu.edu/gene/papers/meta25.html (accessed 9/9/08).
Lipsey, M.W. & Wilson, D.B. (2001) Practical Meta-Analysis. Applied Social Research Methods Series (Vol. 49). Thousand Oaks, CA: SAGE Publications.
Torgerson, C. (2003) Systematic Reviews and Meta-Analysis (Continuum Research Methods). London: Continuum Press.

Websites

What is an effect size?, by Rob Coe: http://www.cemcentre.org/evidence-based-education/effect-size-resources
The meta-analysis of research studies: http://echo.edres.org:8080/meta/
The Meta-Analysis Unit, University of Murcia: http://www.um.es/metaanalysis/
The PsychWiki: Meta-analysis: http://www.psychwiki.com/wiki/Meta-analysis
Meta-Analysis in Educational Research: http://www.dur.ac.uk/education/meta-ed/

Page 30

Discussion (2 mins)

Write on post-it notes:

• What are the key issues or questions for evaluators?

• Have you found any solutions?

Page 31

Interpreting and Reporting Findings and Managing Expectations

Paul Connolly

Centre for Effective Education

Queen’s University Belfast

Conference of EEF Evaluators: Building Evidence in Education

Training Day, 11 July 2013

Page 32

Interpreting Findings

• Findings:
  – only relate to the outcomes measured
  – represent effects of the programme compared to what those in the control group currently receive
  – usually only relate to the sample recruited (and thus are context- and time-specific)

• Dangers of:
  – ‘fishing exercises’ characterised by post-hoc decisions to consider other outcomes and/or differences in effects for differing sub-groups
  – hypothesising regarding the causes of the effects (or reasons for the non-effects)

Page 33

Reporting Findings

• Being clear:
  – Option of using adjusted post-test scores
  – Conversion of findings into effect sizes more readily understandable (e.g. ‘improvement index’)

• Being transparent:
  – Identify outcomes at the beginning and stick to these; register the trial
  – Report methods fully (CONSORT statement)

• Being tentative:
  – Acknowledge limitations
  – Move from evidence of “what works” to evidence of “what works for specific pupils, in a particular context and at a particular time”

Page 34

Example: Adjusted post-test scores

Source: Connolly, P., Miller, S. & Eakin, A. (2010) A Cluster Randomised Controlled Trial Evaluation of the Media Initiative for Children: Respecting Difference Programme. Belfast: Centre for Effective Education (p. 31).See: http://www.qub.ac.uk/research-centres/CentreforEffectiveEducation/Publications/

Page 35

Example: Improvement index

• Take the effect size and convert it to Cohen’s U3 index (either by using statistical tables or effect size calculators online)

• The improvement index represents the increase/decrease in the percentile rank for an average student in the intervention group (assuming at pre-test they are at the 50th percentile)

• An effect size of 0.30 corresponds to a U3 of 62%, i.e. the intervention is likely to result in an average student in the intervention group being ranked 12 percentile points higher compared to the average student in the control group (who would remain at the 50th percentile)

Effect size   Improvement index
0.10          4 percentile points
0.20          8 percentile points
0.40          16 percentile points
0.50          19 percentile points
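A sketch of the conversion, assuming normally distributed outcomes; rounding the output reproduces the table above:

```python
# Sketch: effect size -> Cohen's U3 and improvement index.
from scipy.stats import norm

def improvement_index(d):
    u3 = norm.cdf(d)                 # percentile rank of the average treated pupil
    return round((u3 - 0.5) * 100)   # percentile points gained over the 50th

for d in (0.10, 0.20, 0.30, 0.40, 0.50):
    print(d, improvement_index(d))   # 4, 8, 12, 16, 19
```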

Page 36

Managing Expectations

• Regular and ongoing communication is the key

• Importance of logic models and agreement of outcomes with programme developers/providers at the outset:
  – Careful consideration of the intervention and associated activities, and a clear link between these and expected outcomes
  – Ensure outcomes are domain-specific

• Include sufficient time to discuss findings with programme developers/providers:
  – Talk through possible interpretations
  – Discuss further potential analyses (but be clear that these are exploratory)

Page 37

Discussion (2 mins)

Write on post-it notes:

• What are the key issues or questions for evaluators?

• Have you found any solutions?

Page 38

Group discussion and feedback

Tables will be arranged by theme.

Evaluators should move to the table with a theme which either they are able to contribute expertise on or which they are struggling with.

Tables should discuss:
• What are the key issues or questions for evaluators?
• What are the solutions?
• How can the EEF help?

Feedback from tables.