Building Evidence in Education: Conference for EEF Evaluators
11th July: Theory
12th July: Practice
www.educationendowmentfoundation.org.uk
The EEF by numbers
• 56 projects funded to date
• 1,800 schools participating in projects
• 33 topics in the Toolkit
• 16 independent evaluation teams
• 300,000 pupils involved in EEF projects
• 11 members of EEF team
• £200m estimated spend over lifetime of the EEF
• 3,000 heads presented to since launch
Research Design
Stephen Gorard
http://www.evaluationdesign.co.uk/
Research Design: Robust Approaches for the Social Sciences
Outline of a full cycle of research
• Phase 1: Evidence synthesis
• Phase 2: Development of idea or artifact
• Phase 3: Feasibility study
• Phase 4: Prototyping and trialling
• Phase 5: Field studies and design stage
• Phase 6: Definitive testing
• Phase 7: Dissemination, impact and monitoring
A model of causation in social science
Association - For X (a possible cause) and Y (a possible effect) to be in a causal relationship they must be repeatedly associated. This association must be strong and clearly observable. It must be replicable, and it must be specific to X and Y.
Sequence – X and Y must proceed in sequence. X must always precede Y (where both appear), and the appearance of Y must be safely predictable from the appearance of X.
Intervention - It must have been demonstrated repeatedly that an intervention to change the strength or appearance of X strongly and clearly changes the strength or appearance of Y.
Explanatory mechanism - There must be a coherent mechanism to explain the causal link. This mechanism must be the simplest available without which the evidence cannot be explained. Put another way, if the proposed mechanism were not true then there must be no simpler or equally simple way of explaining the evidence for it.
Red herrings and real problems. Some reflections on the evaluation of Aimhigher
http://www.heacademy.ac.uk/assets/documents/aim_higher/Aspire-Reflections_on_evaluation_of_Aimhigher.doc
In an influential review of Widening Participation (WP) research written for the HEFCE and published in July 2006, Gorard et al (2006) have harshly criticised the evaluation of WP initiatives. In their view, to date no convincing evidence of impact has been produced on pre-entry interventions for school pupils and partnership-based interventions, such as Aimhigher.
Gorard et al’s criticisms were addressed by the HEFCE in another review of WP research published later in the same year, in November 2006, and based on a survey of the evidence collected by the HEIs. It reasserted the value [of Aimhigher and other WP initiatives] as a monitoring and evaluating device and emphasised that, to date, attitudes of learners and teachers have been consistently and overwhelmingly positive. HEFCE feels satisfied that convincing and precise evidence has been produced on attainment by the national evaluation carried out by the National Foundation for Educational Research (NFER), and, to a lesser extent, on HE participation by the NFER and the HEIs. For example, it has been found that participating in Aimhigher activities was associated with ‘[a]n average improvement of 2.5 points in GCSE total point scores’ and a ‘3.9 percentage point increase in Year 11 pupils intending to progress to HE’ (HEFCE 2006: 23). Moreover, ‘[i]f the ‘evidence bar’ is set too high’, the HEFCE (2006: 6-7) pointed out, ‘we run the risk of discouraging any attempt to estimate the effectiveness of the interventions’. There seems no scope for setting up a social science experiment in which the experiences of a wp group is compared with a control group.
Session 1: Part 2: Trial design (45 mins.)
Professor David Torgerson
Director, York Trials Unit,
University of [email protected]
Professor Carole Torgerson
School of Education, Durham
2008 Palgrave Macmillan
Key design issues
• Independent concealed randomisation
• Type of randomisation
• Types of trials
• Sample size
• Regression discontinuity design
Independent concealed randomisation
• One of the most important issues is the need to undertake independent allocation.
• Many methodological studies have shown that unless someone who is disinterested in the trial results undertakes the randomisation there is a serious risk of bias.
• In health trials, this is the source of bias with the most supporting evidence.
Subversion of a health RCT
Clinician   p-value      Experimental   Control
All         p < 0.01     59             63
1           p = 0.84     62             61
2           p = 0.60     43             52
3           p < 0.01     57             72
4           p < 0.001    33             69
5           p = 0.03     47             72
Others      p = 0.99     64             59
[Figure: density of logit (p-value), plotted for three groups labelled Adequate, Inadequate and Unclear]
Hewitt et al. BMJ;2005:.
Type of randomisation
• Simple or restricted?
• Simple, similar to tossing a coin
» Advantages: difficult to go wrong; with large samples (n > 100) and combined with ANCOVA is efficient
» Disadvantages: for small samples can produce imbalance and inefficiency in analysis
• Restricted, ensures better balance
» Advantages: gets better balance and more efficient for small samples
» Disadvantages: more complicated; can go wrong
Restricted allocation
• Minimisation
» Not strictly randomisation; uses algorithm to ensure balance in covariates
• Stratified
» Using blocks of repeating allocations produces balance on 1 or 2 variables
• Matched pairs
» Matches units (e.g., schools) and allocates one to each group; can reduce power in some cases and has other disadvantages
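The stratified option can be sketched in a few lines. This is a minimal illustration only, not the procedure used in any EEF trial: the function name, the block size of 4, the seed and the "small"/"large" strata are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_block_randomise(units, block_size=4, seed=2013):
    """Allocate units 1:1 within strata using permuted blocks.

    `units` is a list of (unit_id, stratum) pairs. All names and
    defaults are illustrative assumptions, not taken from the slides.
    """
    if block_size % 2:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)

    # Group unit ids by stratum so each stratum is balanced separately.
    by_stratum = defaultdict(list)
    for unit_id, stratum in units:
        by_stratum[stratum].append(unit_id)

    allocation = {}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        for start in range(0, len(ids), block_size):
            # Each block holds equal numbers of both arms, in random order.
            block = ["intervention", "control"] * (block_size // 2)
            rng.shuffle(block)
            for unit_id, arm in zip(ids[start:start + block_size], block):
                allocation[unit_id] = arm
    return allocation


# Hypothetical example: 8 schools, 4 in each of two strata.
schools = [(f"school_{i}", "small" if i < 4 else "large") for i in range(8)]
allocation = stratified_block_randomise(schools)
```

With complete blocks, each stratum ends up with exactly half its units in each arm. In practice the allocation list would be generated and held by someone independent of recruitment, in line with the concealment point made earlier.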
Discussion (5 mins.)
• Discuss how randomisation was undertaken in your EEF trial(s) and note whether this was independent and concealed, and whether it was restricted. If so, what method was used?
Types of trial
• Individual randomisation
» Most powerful design for given sample size
• Cluster design
» Randomises groups of individuals (classes; schools; periods of time; geographical areas)
• Stepped wedge
» Type of cluster design; randomises order of implementation so all schools eventually receive intervention
Individual allocation
• Appropriate when it is possible to separate intervention and control conditions
• DISCOVER summer school evaluation using individual randomisation as control children cannot gain access to intervention
• Many educational interventions are delivered at class or school level – so can’t use individual allocation
Variations on a theme
• Factorial designs
» Two trials for the price of one
• Unequal allocation
» When the sample size is fixed equal allocation best; when costs are fixed unequal best – DISCOVER using unequal allocation for intervention to ensure efficient use of summer school resources.
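The trade-off behind unequal allocation can be made concrete with two standard textbook results, sketched here under the assumption of equal outcome variance in both arms. The function names and the example costs are hypothetical and do not describe the DISCOVER trial.

```python
import math


def relative_efficiency(p_intervention):
    """Efficiency relative to 1:1 allocation for a fixed total sample size.

    The variance of a difference in means is proportional to
    1 / (p * (1 - p)), so efficiency relative to p = 0.5 is 4 * p * (1 - p).
    """
    return 4 * p_intervention * (1 - p_intervention)


def optimal_control_ratio(cost_intervention, cost_control):
    """Controls per intervention participant that minimise variance
    for a fixed budget: the square root of the cost ratio."""
    return math.sqrt(cost_intervention / cost_control)


equal = relative_efficiency(0.5)          # 1:1 allocation
one_to_two = relative_efficiency(1 / 3)   # 1:2 allocation, about 0.89
ratio = optimal_control_ratio(4.0, 1.0)   # intervention costs 4x a control
```

With a fixed total sample, any move away from 1:1 loses some efficiency, but the loss at 1:2 is modest. With a fixed budget and an intervention place costing four times a control place, this rule suggests two controls per intervention participant.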
Individual RCT: key points
• Trial registration
• Pre-test BEFORE randomisation
• Independent allocation
• Spill over/contamination must not exceed 30% or cluster allocation more efficient
• Post-testing done blindly or in exam conditions, marking done blindly
• Primary outcome specified before analysis
• Statistical analysis plan written and approved before data are examined
Cluster allocation
• More complex to design than individual RCT
• Many educational interventions need to use cluster allocation
• Cluster allocation usually avoids contamination and can make intervention delivery logistically easier
Cluster allocation: additional key points
• Small number of clusters – so usually need to use restricted randomisation
• Need to recruit participants and pre-test BEFORE cluster allocation
• Teachers must be linked to class BEFORE randomisation
• Analysis and sample size need to take clustering into account
• Better to have a large number of clusters with small numbers per cluster than a few clusters with large numbers per cluster
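The last two points can be illustrated with the usual design effect for equal-sized clusters, 1 + (m - 1) × ICC, where m is the cluster size and ICC the intra-cluster correlation. The slides do not give this formula or any ICC value, so the ICC of 0.1 and the function names below are assumptions for illustration.

```python
def design_effect(cluster_size, icc):
    """Standard design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent pupils the clustered sample is 'worth'."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)


# Same 1,000 pupils, arranged two ways (ICC of 0.1 is an assumed value).
many_small = effective_sample_size(n_clusters=40, cluster_size=25, icc=0.1)
few_large = effective_sample_size(n_clusters=10, cluster_size=100, icc=0.1)
```

Under these assumptions the 40-cluster layout is worth roughly 294 independent pupils and the 10-cluster layout roughly 92, which is why adding clusters usually helps more than adding pupils per cluster.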
Variations on a theme
• What level of randomisation?
» Pupil > class > year > school
• Balanced design
» An efficient design is a balanced approach – Year 7 gets intervention in half schools and Year 8 gets intervention in other schools with each school’s adjacent year acting as control
» Or Year 7 in intervention schools get literacy intervention and Year 7s in control get maths
• Split plot
» Cluster level allocation followed by individual randomisation. A form of factorial. Exeter evaluation using partial split plot
Stepped wedge
• A form of cluster design, which may be more efficient than standard cluster design
• With 12 schools: all are pre-tested; 4 are randomised to receive the intervention for the first 6 months, then all are tested; another 4 are then given the intervention and all are tested; the final 4 are given the intervention and all are tested
• Requires testing at every point
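The 12-school example can be written out as a schedule. This is a sketch under assumptions: the function name, seed and school labels are hypothetical, and the only thing randomised is the order in which schools start the intervention.

```python
import random


def stepped_wedge_schedule(schools, n_steps=3, seed=12):
    """Randomise the order in which schools start the intervention.

    Returns {school: [0/1 at each test point]}, where test point 0 is the
    pre-test and 1 means the school has received the intervention by then.
    Names and defaults are illustrative assumptions.
    """
    order = list(schools)
    if len(order) % n_steps:
        raise ValueError("schools must divide evenly across steps")
    random.Random(seed).shuffle(order)
    per_step = len(order) // n_steps

    schedule = {}
    for position, school in enumerate(order):
        start_step = position // per_step + 1  # 1, 2 or 3
        schedule[school] = [int(t >= start_step) for t in range(n_steps + 1)]
    return schedule


schools = [f"school_{i:02d}" for i in range(1, 13)]
schedule = stepped_wedge_schedule(schools)

# Number of schools that have had the intervention at each test point.
exposed_per_test = [sum(row[t] for row in schedule.values()) for t in range(4)]
```

Every school is tested at all four points, and the number exposed rises by four at each step, so all schools eventually receive the intervention.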
Discussion (5 mins.)
• Discuss the trial designs that have been used and the challenges associated with them.
Sample size calculation
• Most interventions will not work very well.
» Effect sizes of 0.20 to 0.30 – likely
» Effect sizes of 0.30 to 0.50 – unusual
» Effect sizes > 0.50 – very unlikely
• Need large sample sizes to detect modest differences. Example: 512 for 0.25; 800 for 0.20 (not clustered design)
• Powerful covariate can reduce this
» 0.70 correlation reduces sample size by 50%
How to do it?
• Free programs online
» PSPower; Optimal Design Software
• In your head (back of envelope) using approximation formula (i.e., 32 / effect size squared)
• Fixed sample size
» Still good practice to estimate likelihood of difference.
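The back-of-envelope formula is easy to check against the figures quoted on the sample size slide. The sketch below treats 32 / (effect size)² as the total sample across both arms, which is conventionally read as about 80% power at a two-sided 5% significance level; the slides do not state the power, so that reading is an assumption. The (1 - r²) covariate adjustment is the usual ANCOVA approximation and is likewise assumed to be the basis of the "0.70 correlation" figure. The function name is hypothetical.

```python
def total_sample_size(effect_size, pre_post_correlation=0.0):
    """Approximate total N (both arms) for an individually randomised trial.

    Uses 32 / d**2, then scales by (1 - r**2) when a pre-test correlated
    r with the outcome is used as a covariate. Both steps are
    approximations, not exact power calculations.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    unadjusted = 32 / effect_size ** 2
    return unadjusted * (1 - pre_post_correlation ** 2)


n_025 = round(total_sample_size(0.25))               # slide quotes 512
n_020 = round(total_sample_size(0.20))               # slide quotes 800
n_025_adjusted = round(total_sample_size(0.25, 0.70))  # roughly half of 512
```

These figures are for individual randomisation only; a clustered design would need to be inflated further to allow for clustering.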
Pilot trials – sample size
• Modelling study suggests that a pilot with 10% of the main study’s sample will produce a one-sided 80% confidence interval that will include the ‘true’ estimate if it exists
Cocks K, Torgerson DJ. Sample size calculations for pilot randomised trials: a confidence interval approach. Journal of Clinical Epidemiology 2013;66:197-201
Discussion
• Discuss how sample size calculations were undertaken and whether sample sizes are large enough to detect modest differences between groups.
Regression discontinuity
• Theoretically, the most robust non-randomised approach is the regression discontinuity (RD) design
• Rediscovered several times since Thistlethwaite and Campbell first described it in the 1960s
What is it?
• Regression discontinuity, sometimes known as risk-based cut-off design, selects people into a group on the basis of a measurable continuous variable
• For example, age, test scores, waiting list, income
How does it work?
• Selecting on a pre-test variable we then correlate post test outcomes with the pre-test variable and test to see if there is an interruption, break or discontinuity in the regression line
[Figure panels: Effective treatment; Ineffective treatment]
Do summer schools work?
• Some states in the USA mandate summer schools for children who fall below a certain score in a high stakes test
• But will sending children off to have extra tuition during their summer break be effective?
• Because the children are chosen on the basis of a cut point on a quantitative scale, this is ideal RD territory
Jacob and Lefgren, Review of Economics and Statistics 2004;86:226-44.
Proportion treated by test scores
Treatment against outcomes
Evaluation of SHINE on secondaries
• Randomised controlled trial design not possible
• Regression discontinuity design with ‘tie-breaker randomisation’
• Advantages of this design
• Challenges of this design