Publication bias in impact evaluation: evidence from a systematic review of farmer field schools

Hugh Waddington, 3ie

International Initiative for Impact Evaluation

www.3ieimpact.org


Acknowledgements

• Jorge Hombrados, J-PAL Latin America (co-author)

• Birte Snilstveit, co-PI on farmer field school review

• FFS co-authors: Martina Vojtkova, Daniel Phillips

• Presentation based on training on publication bias provided by Emily Tanner-Smith, Campbell Collaboration


“The haphazard way we individually and collectively study the fragility of inferences leaves most of us unconvinced that any inference is believable... It is important we study fragility in a much more systematic way”

Edward Leamer: Let’s take the con out of econometrics, AER 1983


What is publication bias?

• Publication bias refers to bias that occurs when research found in the published literature is systematically unrepresentative of the population of studies (Rothstein et al., 2005)

• On average, published studies have a larger mean effect size than unpublished studies, providing evidence of publication bias (Lipsey and Wilson 1993)

• Also referred to as the ‘file drawer’ problem: “…journals are filled with the 5% of studies that show Type I errors, while the file drawers back at the lab are filled with the 95% of the studies that show non-significant (e.g. p > 0.05) results” (Rosenthal, 1979)

• Well-documented in other fields of research (biomedicine, public health, education, crime & justice, social welfare) – entertaining overviews in Ben Goldacre’s Bad Science and Bad Pharma


Types of reporting biases

Type of bias | Definition
Publication bias | The publication or non-publication of research findings, depending on the nature and direction of results
Time lag bias | The rapid or delayed publication of research findings, depending on the nature and direction of results
Multiple publication bias | The multiple or singular publication of research findings, depending on the nature and direction of results
Location bias | The publication of research findings in journals with different ease of access or levels of indexing in standard databases, depending on the nature and direction of results
Citation bias | The citation or non-citation of research findings, depending on the nature and direction of results
Language bias | The publication of research findings in a particular language, depending on the nature and direction of results
Outcome reporting bias | The selective reporting of some outcomes but not others, depending on the nature and direction of results

Source: Sterne et al. (Eds.) (2008: 298)


How much of a problem is it likely to be in international development research?

• ‘Exploratory’ research tradition in social sciences suggests potentially severe problems of file drawer effects

• Publication bias may be partly mitigated by tradition of publishing ‘working papers’ and modern electronic dissemination

• File drawer effects arguably more problematic for observational data (and small sample intervention studies)

• Testing for publication bias usually relies on testing for ‘small study effects’; but biases due to small study effects may also result from other factors

=> But we need more evidence since very little development research has addressed this topic


Farmer field schools

• FFS originally associated with FAO and Integrated Pest Management (IPM)

• Originated in response to the overuse of pesticides in irrigated rice systems in Asia

• Belief that farmers need confidence, gained through ‘discovery learning’, to reduce dependence on pesticides

• Aim to promote use of good practices and improve agriculture and other outcomes

• Now applied globally in 90+ countries, with millions of beneficiaries and a range of crops and curricula


A ‘best practice’ FFS

• Group of 25 farmers, meeting once a week in a designated field during the growing season

• Exploratory: facilitator encourages farmers to ask questions and to seek answers, rather than lecturing or giving recommendations

• Experimentation: group manages two plots

• Participatory: emphasis on social learning, with exercises to build group dynamics

• Field days and follow-up activities may be provided for diffusion of messages to neighbours

[Photo: (c) JM Micaud for FAO]


3ie review motivated by polarised debate

• "Studies reported substantial and consistent reductions in pesticide use attributable to the effect of training. In a number of cases, there was also a convincing increase in yield due to training.... Results demonstrated remarkable, widespread and lasting developmental impacts” (Van den Berg 2004, FAO)

• “The analysis, employing a modified ‘difference-in-differences’ model, indicates that the program did not have significant impacts on the performance of graduates and their neighbors” (Feder et al. 2004)

• But how good are they really? What does a systematic review of the evidence say?


3ie’s review objectives and background

• Produce high quality review of relevance to decision makers

• Mixed methods review of effects on outcomes along causal chain and barriers and facilitators of change

• Peer review managed by Campbell Collaboration

• Discussions with FAO led to a wide range of impact evaluation research being included in the effectiveness review


Large body of evidence found

• 3ie systematic review found 93 separate ‘impact evaluations’ in LMICs

• Experimental and quasi-experimental studies with a controlled comparison (no treatment, pipeline, other intervention) were included

• Wide variation in attribution methods used: no RCTs, quasi-experiments of varying quality

• Small samples: 400 farmers on average (sample size ranges from 24 to 3,000), often in only a handful of villages, and short follow-up periods (usually less than 2 years)

• Studies measured outcomes across the causal chain:
– Knowledge
– Adoption
– Agriculture outcomes (yields, net revenues)
– Health, environment, empowerment outcomes

• Analysis today focuses on impacts on yields for FFS participants: usually self-reported weight of production per unit of area


Study characteristics

Study | Region (country) | Crop | Yield outcome measure
Ali and Sharif, 2011 | SA (Pakistan) | Cotton | Yield (kg per ha)
Birthal et al., 2000 | SA (India) | Cotton | Value of yield (value per ha)
Carlberg et al., 2012 | SSA (Ghana) | Other staple/veg. | Yield (50 kg bags per acre, 2010)
Cavatassi et al., 2011 | LAC (Ecuador) | Other staple/veg. | Yield (kg per ha)
Davis et al., 2012 | SSA (Kenya, Tanzania) | Other staple/veg. | Value of yield (growth rate in value, local currency per acre)
Dinpanah et al., 2010 | MENA (Iran) | Rice | Yield (ton per ha)
Feder et al., 2004 | EAP (Indonesia) | Rice | Yield (growth rate in yield, kg per ha)
Gockowski et al., 2010 | SSA (Ghana) | Tree crop | Sales (quantity of produce sold in 2004/05 season)
Hiller et al., 2009 | SSA (Kenya) | Tree crop | Yield (growth rate in yield, kg per acre)
Huan et al., 1999 | EAP (Vietnam) | Rice | Yield (ton per ha)
Khan et al., n.d. | SA (Pakistan) | Cotton | Yield (growth rate in yield, kg per ha)
Labarta, 2005 | LAC (Nicaragua) | Other staple/veg. | Yield (per ha)
Mutandwa & Mpangwa, 2004 | SSA (Zimbabwe) | Cotton | Yield (number of bales)
Naik et al., 2008 | SA (India) | Other staple/veg. | Yield (quintals of produce)
Orozco Cirilo et al., 2008 | LAC (Mexico) | Other staple/veg. | Yield (growth rate in ton per ha)
Palis, 1998 | EAP (Philippines) | Rice | Yield (growth rate in ton per ha)
Pananurak, 2010 | EAP (China); SA (India, Pakistan) | Cotton | Yield (growth rate in kg per ha)
Pande et al., 2009 | SA (Nepal) | Rice | Yields (ton/ha)
Rejesus et al., 2010 | EAP (Vietnam) | Rice | Yields (growth rate in tonnes per ha)
Todo & Takahashi, 2011 | SSA (Ethiopia) | Other staple/veg. | Value of production (growth rate, in Eth birr)
Van den Berg et al., 2002 | SA (Sri Lanka) | Rice | Yield (kg per ha)
Van Rijn, 2010 | LAC (Peru) | Tree crop | Yield (kg per ha, 2007)
Wandji et al., 2007 | SSA (Cameroon) | Tree crop | Sales (kg of cocoa sold in the 2004-05 season)
Wu Lifeng, 2010 | EAP (China) | Cotton | Yield (growth rate in kg per ha)
Yang et al., 2005 | EAP (China) | Cotton | Yield (kg per ha)
Yamazaki and Resosudarmo, 2007 | EAP (Indonesia) | Rice | Yield (growth rate in kg per ha)
Zuger, 2004 | LAC (Peru) | Other staple/veg. | Yield (ton per ha)


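Unit of analysis is the study-level ‘effect size’

• ‘Response ratio’ effect size calculated for each study:

$$RR = \frac{\bar{Y}_t}{\bar{Y}_c}$$

• RR standard error calculations (on the log scale):

$$SE_{\ln RR} = S_p \sqrt{\frac{1}{n_t \bar{Y}_t^{2}} + \frac{1}{n_c \bar{Y}_c^{2}}} \qquad \text{or} \qquad SE_{\ln RR} = \frac{\ln(RR)}{t}$$

where Y_t and Y_c are the mean outcomes of the FFS (treatment) and comparison groups, n_t and n_c the group sample sizes, S_p the pooled standard deviation, and t the t-statistic reported for the treatment effect.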
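A minimal sketch of the effect size calculation, assuming a study reports group means, a pooled standard deviation and group sizes (or, failing that, only a t-statistic). The function names are illustrative and not taken from the review; the variance is the usual delta-method approximation for a log ratio of means.

```python
import math


def log_response_ratio(mean_t, mean_c, sd_pooled, n_t, n_c):
    """Return (ln RR, SE of ln RR) for treatment vs comparison group means."""
    if mean_t <= 0 or mean_c <= 0:
        raise ValueError("response ratios need strictly positive group means")
    ln_rr = math.log(mean_t / mean_c)
    # SE = S_p * sqrt(1/(n_t * Y_t^2) + 1/(n_c * Y_c^2))
    se = sd_pooled * math.sqrt(1 / (n_t * mean_t ** 2) + 1 / (n_c * mean_c ** 2))
    return ln_rr, se


def se_from_t(ln_rr, t_stat):
    """Back out the SE of ln RR when a study reports only a t-statistic."""
    return abs(ln_rr / t_stat)


# Hypothetical study: 1,200 vs 1,000 kg/ha, pooled SD 300, 100 farmers per arm
ln_rr, se = log_response_ratio(1200, 1000, 300, 100, 100)
print(f"RR = {math.exp(ln_rr):.2f}, SE(ln RR) = {se:.4f}")
```

Working on the log scale makes the ratio symmetric around zero, which is what the later funnel plots and regression tests assume.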


Before we turn to examining publication bias, here are some summary results from the meta-analysis of outcomes along the causal chain


Farmer field school stylised causal chain

• Input 1: Training of trainers

• Input 2: Field school

• Knowledge (FFS participants) → Adoption (FFS participants)

• Knowledge (FFS neighbours) → Adoption (FFS neighbours)

• Measured impacts: yield, income/net revenue, empowerment, environment & health outcomes


Knowledge of ‘improved’ farming practices

[Figure: Forest plot of effect sizes (95% CI) for knowledge of ‘improved’ farming practices; weights are from random effects analysis. FFS participants (11 studies): pooled 0.67 (0.33, 1.02), I-squared = 93.9%, p = 0.000. FFS neighbours: pooled 0.13 (-0.12, 0.37).]
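The forest plots pool study estimates using random-effects weights. A minimal sketch of the common DerSimonian-Laird estimator, assuming study-level ln RR values and standard errors as inputs; the review's exact estimator and software are not stated here, so treat this as an illustration of the technique rather than the analysis behind the figures.

```python
import math


def random_effects_dl(effects, ses):
    """DerSimonian-Laird random-effects pooling on the log scale.

    effects: study ln RR values; ses: their standard errors.
    Returns (pooled ln RR, SE of pooled, tau^2, I^2).
    """
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two studies with matching SEs")
    w = [1 / s ** 2 for s in ses]                                  # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))   # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                                  # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0                  # share of variation beyond chance
    w_re = [1 / (s ** 2 + tau2) for s in ses]                      # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2, i2


# Hypothetical inputs; exponentiate to report a response ratio with a 95% CI
est, se, tau2, i2 = random_effects_dl([0.05, 0.20, 0.35], [0.04, 0.08, 0.15])
lo, hi = math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)
print(f"RR = {math.exp(est):.2f} ({lo:.2f}, {hi:.2f}), I-squared = {100 * i2:.1f}%")
```

A large I-squared, like the 90%+ values in these plots, means most of the spread across studies reflects real differences rather than sampling error, which is why random-effects weights are used.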



Pesticide demand

[Figure: Forest plot of effect sizes (95% CI) for pesticide demand. FFS participants (26 estimates): pooled 0.66 (0.56, 0.78), I-squared = 93.2%, p = 0.000. FFS neighbours (10 estimates): pooled 0.88 (0.68, 1.14), I-squared = 84.6%, p = 0.000.]



Yields

[Figure: Forest plot of effect sizes (95% CI) for yields. FFS participants (33 estimates): pooled 1.24 (1.16, 1.32), I-squared = 92.7%, p = 0.000. FFS neighbours (8 estimates): pooled 1.01 (0.98, 1.03), I-squared = 49.5%, p = 0.054.]



Net revenues (income less costs)

[Figure: Forest plot of effect sizes (95% CI) for net revenues. FFS participants (10 estimates): pooled 1.28 (1.17, 1.41), I-squared = 57.1%, p = 0.013. FFS training + input/marketing support (4 estimates): pooled 2.57 (1.18, 5.58), I-squared = 96.2%, p = 0.000. FFS neighbours (5 estimates): pooled 1.08 (1.03, 1.15), I-squared = 0.0%, p = 0.706.]


Detecting publication bias

• The only direct evidence for publication bias comes from comparing published and unpublished study results

• But there are also ways of assessing the likelihood of publication bias directly and indirectly:
– Assess reporting biases in each study
– Statistical analysis based on sample size


“An ounce of prevention is worth a pound of cure”

Sources of grey literature:

(1) Multidisciplinary: Google, Google Scholar

(2) International development specific: JOLIS, BLDS and ELDIS (Institute of Development Studies)

(3) Good sources for impact evaluations: J-PAL/IPA databases, 3ie’s database of impact evaluations

(4) Subject-specific, e.g. IDEAS/RePEc for economics, ERIC for education, LILACS for Latin American health publications, ALNAP for humanitarian

(5) Conference proceedings, technical reports (research, governmental agencies), organization websites, dissertations, theses, contact with primary researchers



Meta-analysis of studies by publication status: journal v other

[Figure: Forest plot of FFS-participant yield effect sizes (95% CI), grouped into studies published in a journal and studies not published in a journal. Overall: 1.24 (1.16, 1.32), I-squared = 92.7%, p = 0.000.]
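Comparing journal and non-journal studies amounts to pooling each subgroup separately and testing whether the two pooled estimates differ. A minimal sketch, assuming DerSimonian-Laird pooling within each subgroup and a simple two-sided z-test on the difference in pooled ln RR; the subgroup test actually used in the review is not stated here, and the function names are illustrative.

```python
import math


def pool_dl(effects, ses):
    """DerSimonian-Laird random-effects mean of ln RR and its SE."""
    w = [1 / s ** 2 for s in ses]
    fe = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fe) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(w) - 1)) / c) if c > 0 else 0.0  # 0 for a single study
    w_re = [1 / (s ** 2 + tau2) for s in ses]
    mean = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return mean, math.sqrt(1 / sum(w_re))


def compare_by_publication(studies):
    """studies: (ln_rr, se, published_in_journal) tuples; each group needs >= 1 study.

    Returns pooled RRs per group, their ratio, and a two-sided z-test of the difference.
    """
    pub = [(y, s) for y, s, p in studies if p]
    other = [(y, s) for y, s, p in studies if not p]
    m1, s1 = pool_dl([y for y, _ in pub], [s for _, s in pub])
    m0, s0 = pool_dl([y for y, _ in other], [s for _, s in other])
    z = (m1 - m0) / math.sqrt(s1 ** 2 + s0 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value, normal approximation
    return {"journal_rr": math.exp(m1), "other_rr": math.exp(m0),
            "ratio": math.exp(m1 - m0), "z": z, "p": p}
```

A ratio above 1 would mean journal-published studies report larger yield effects, the pattern expected under publication bias; with few studies per group the test has little power.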


Assess likelihood of file-drawer effects in each study

• Is there evidence that results have been reported selectively:
– outcomes not reported despite data being collected (or indicated in the methods section, or reported in the study protocol if available)?
– existence of studies reporting other outcomes?

• Have outcomes been constructed in an unusual way, which might suggest biased exploratory research?


Risk of bias (including file drawer effects) assessment for studies included in meta-analysis


Additional evidence for file-drawer effects

• 34% (14/41) of studies that report data on yields could not be included in the meta-analysis because they do not provide standard errors or the information needed to calculate them

• 30% (27/91) of all studies do not provide information on yields or other agricultural outcomes (net revenues), despite collecting data on knowledge/adoption


Detecting publication bias statistically

Methods for detecting publication bias assume:

• Large n studies are likely to get published regardless of results, due to the time and money invested

• Medium n studies will have some modest significant effects that are reported; others may never be published

• Small n studies with the largest effects are most likely to be reported, but many will never be published or will be difficult to locate


Funnel Plots

• Exploratory tool used to visually assess the possibility of publication bias in a meta-analysis

• Scatter plot of effect size (x-axis) against some measure of study size (y-axis)

• Precision of estimates increases as the sample size of a study increases:
– Estimates from small n studies (i.e., less precise, larger standard errors) will show more variability in the effect size estimates, thus a wider scatter on the plot
– Estimates from larger n studies will show less variability in effect size estimates, thus a narrower scatter on the plot

• If publication bias is present, we would expect null or ‘negative’ findings from small n studies to be suppressed (i.e., missing from the plot)
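The pseudo 95% confidence limits drawn on a funnel plot are simply the pooled estimate plus or minus 1.96 standard errors at each point on the y-axis. A minimal sketch, assuming effects and SEs are on the ln RR scale and that a pooled estimate is already available; the helper names are illustrative.

```python
def funnel_limits(pooled, se_grid, z=1.96):
    """Pseudo 95% limits: (SE, lower, upper) for each SE value on the y-axis."""
    return [(se, pooled - z * se, pooled + z * se) for se in se_grid]


def outside_funnel(pooled, effects, ses, z=1.96):
    """Indices of studies falling outside the pseudo confidence limits."""
    return [i for i, (y, s) in enumerate(zip(effects, ses)) if abs(y - pooled) > z * s]


# Limits at SE = 0, 0.1, ..., 0.5 around a pooled ln RR of 0.2
for se, lower, upper in funnel_limits(0.2, [k / 10 for k in range(6)]):
    print(f"SE {se:.1f}: {lower:+.3f} to {upper:+.3f}")
```

Without bias, about 95% of studies should fall inside these limits and scatter symmetrically around the pooled line; a gap in the lower-left (small studies with small or negative effects) is the pattern publication bias would produce.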



Farmer field schools: FFS participant yields

[Figure: Funnel plot with pseudo 95% confidence limits. Ln SE (y-axis) against Ln RR (x-axis); studies marked as published in journal or not published in journal, with the pooled estimate and confidence limits.]


Tests for Funnel Plot Asymmetry

• Several regression tests are available to test for funnel plot asymmetry – attempt to overcome subjectivity of visual funnel plot inspection

• Framed as tests for “small study effects”, or the tendency for smaller n studies to show greater effects than larger n studies; i.e., effects aren’t necessarily a result of bias

• Egger test, Peters test (modified Egger test for use with log odds ratio effect sizes), Begg’s test, selection modeling (Hedges & Vevea, 2005), failsafe n (not recommended) (Becker, 2005)


Egger test

• Regression of the standardised effect size (ES/se) on precision (1/se); equivalently, a weighted regression of the effect size on its standard error (weights = inverse variance)
– β0 = 0 indicates a symmetric funnel plot
– β0 > 0 shows less precise (i.e., smaller n) studies yield bigger effects
– Can be extended to include p predictors hypothesised to potentially explain funnel plot asymmetry (Sterne et al., 2001) (see analysis below)

• Limitations:
– Low power unless there is severe bias and large n
– Inflated Type I error with large treatment effects, rare event data, or equal sample sizes across studies
– Inflated Type I error with log odds ratio effect sizes

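$$\frac{ES_i}{se_i} = \beta_0 + \beta_1 \frac{1}{se_i} + \varepsilon_i \quad\Longleftrightarrow\quad ES_i = \beta_1 + \beta_0\, se_i + se_i\,\varepsilon_i \;\; \left(w_i = \frac{1}{se_i^{2}}\right), \qquad H_0\colon \beta_0 = 0$$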


[Figure: Egger's publication bias plot: standardised effect (y-axis) against precision (x-axis).]

Egger test for FFS-participant yields

 | Coef. | t | P>t
const | -0.047 | -1.70 | 0.100
slope | 3.085 | 4.14 | 0.000
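A minimal sketch of the Egger regression in its standardised form, fitted by ordinary least squares on ES/se against 1/se. Dividing the inverse-variance weighted regression of ES on se through by se gives exactly this model, so this intercept equals the coefficient on se in the weighted form, with the same t-statistic. If the table above reports the weighted form, its ‘slope’ would correspond to the intercept here; that reading is an assumption. The p-value needs a t distribution with n - 2 degrees of freedom, which the standard library does not provide, so only the t-statistic is returned.

```python
import math


def egger_test(effects, ses):
    """Egger regression: standardised effect (ES/se) on precision (1/se), by OLS.

    Returns the intercept (asymmetry term, beta0), its SE and t-statistic,
    and the slope (beta1, the effect implied for an infinitely precise study).
    """
    n = len(effects)
    if n < 3:
        raise ValueError("need at least three studies")
    x = [1 / s for s in ses]                        # precision
    z = [y / s for y, s in zip(effects, ses)]       # standardised effect
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
    intercept = zbar - slope * xbar
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    sigma2 = rss / (n - 2)                          # residual variance
    se_int = math.sqrt(sigma2 * (1 / n + xbar ** 2 / sxx))
    t_int = intercept / se_int if se_int > 0 else float("nan")
    return {"intercept": intercept, "se": se_int, "t": t_int, "slope": slope}
```

With ln RR inputs, the slope is on the log scale, so exp(slope) gives the implied response ratio for a very large study.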


‘Trim and fill’ analysis (Duval & Tweedie, 2000)

• Iteratively trims (removes) the smaller studies causing asymmetry

• Uses the trimmed plot to re-estimate the mean effect size

• Fills (replaces) the omitted studies together with their mirror images

• Provides an estimate of the number of missing (filled) studies and a new estimate of the mean effect size

• Major limitations include: misinterpretation of results, assumption of a symmetric funnel plot, poor performance in the presence of heterogeneity
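A simplified sketch of the steps above using the L0 estimator, assuming fixed-effect pooling at every step and that missing studies lie on the left of the funnel (small or negative effects). Dedicated software differs in estimator and pooling choices, so this will not necessarily reproduce the ‘meta-trim’ results.

```python
def _fixed_effect(effects, ses):
    """Inverse-variance (fixed-effect) pooled mean."""
    w = [1 / s ** 2 for s in ses]
    return sum(wi * y for wi, y in zip(w, effects)) / sum(w)


def trim_and_fill(effects, ses, max_iter=50):
    """Return (number of filled studies k0, filled pooled estimate)."""
    n = len(effects)
    order = sorted(range(n), key=lambda i: effects[i])   # indices, smallest effect first
    k0 = 0
    for _ in range(max_iter):
        keep = order[: n - k0]                             # trim the k0 largest effects
        theta = _fixed_effect([effects[i] for i in keep], [ses[i] for i in keep])
        dev = [y - theta for y in effects]                 # deviations of ALL studies
        by_abs = sorted(range(n), key=lambda i: abs(dev[i]))
        rank = {i: r + 1 for r, i in enumerate(by_abs)}
        t_n = sum(rank[i] for i in range(n) if dev[i] > 0)  # rank sum of positive deviations
        l0 = (4 * t_n - n * (n + 1)) / (2 * n - 1)          # L0 estimator of missing studies
        new_k0 = min(n - 1, max(0, round(l0)))
        if new_k0 == k0:
            break
        k0 = new_k0
    keep = order[: n - k0]
    theta = _fixed_effect([effects[i] for i in keep], [ses[i] for i in keep])
    trimmed = order[n - k0:]
    # Fill: mirror each trimmed study around the trimmed estimate, same SE
    filled_y = list(effects) + [2 * theta - effects[i] for i in trimmed]
    filled_s = list(ses) + [ses[i] for i in trimmed]
    return k0, _fixed_effect(filled_y, filled_s)
```

The filled estimate is only a sensitivity check: it shows how far the pooled effect would move if the asymmetry were entirely due to missing studies, not what the true effect is.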


‘Trim & fill’ for FFS-participant yields

Results of meta-trim

 | 95% lower | Effect size | 95% upper | Num. studies
Meta-analysis | 1.16 | 1.23 | 1.32 | 31
Filled meta-analysis | 1.03 | 1.10 | 1.17 | 40


Cumulative meta-analysis

• Typically used to update the pooled effect size estimate with each new study, cumulatively over time

• Can alternatively be used to update the pooled effect size estimate with each study added in order from largest to smallest sample size

• If the pooled effect size does not shift with the addition of small n studies, this provides some evidence against publication bias
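A minimal sketch of the sample-size variant, assuming each study is given as (label, sample size, ln RR, SE) and that each cumulative step uses DerSimonian-Laird random-effects pooling; the pooling model behind the plot below is an assumption here, and the names are illustrative.

```python
import math


def dl_pool(effects, ses):
    """DerSimonian-Laird random-effects mean of ln RR and its SE."""
    w = [1 / s ** 2 for s in ses]
    fe = sum(a * b for a, b in zip(w, effects)) / sum(w)
    q = sum(a * (b - fe) ** 2 for a, b in zip(w, effects))
    c = sum(w) - sum(a * a for a in w) / sum(w)
    tau2 = max(0.0, (q - (len(w) - 1)) / c) if c > 0 else 0.0
    wr = [1 / (s ** 2 + tau2) for s in ses]
    est = sum(a * b for a, b in zip(wr, effects)) / sum(wr)
    return est, math.sqrt(1 / sum(wr))


def cumulative_by_size(studies):
    """studies: (label, n, ln_rr, se). Largest study first, then smaller ones added.

    Returns (label of study just added, RR, lower 95%, upper 95%) for each step.
    """
    ordered = sorted(studies, key=lambda s: s[1], reverse=True)
    rows = []
    for k in range(1, len(ordered) + 1):
        est, se = dl_pool([s[2] for s in ordered[:k]], [s[3] for s in ordered[:k]])
        rows.append((ordered[k - 1][0], math.exp(est),
                     math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)))
    return rows


for row in cumulative_by_size([("A", 100, 0.1, 0.1), ("B", 500, 0.2, 0.05), ("C", 50, 0.3, 0.2)]):
    print("%s: %.2f (%.2f, %.2f)" % row)
```

A steady drift upwards as the smaller studies enter is the signature of small-study effects.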


Cumulative meta-analysis for FFS-participant yields: studies ordered by sample size from largest to smallest

[Figure: Cumulative forest plot (ES, 95% CI). The cumulative estimate starts at 2.11 (1.25, 3.56) with the largest study (Pande et al., 2009, Nepal), lies between 1.06 and 1.14 as the fourth- to tenth-largest studies are added, lies between 1.19 and 1.31 as the smaller studies enter, and ends at 1.24 (1.16, 1.32) with all studies included.]


• The evidence for ‘small study effects’ seems strong, but is this due to publication bias?

• Asymmetry could be due to factors other than publication bias, e.g.:
– methodological quality (smaller studies with lower quality may have exaggerated treatment effects)
– artefactual variation (e.g. outcome measurement)
– chance
– true heterogeneity due to intervention characteristics (FFS type, region, crop, follow-up length)

• Assessing funnel plot symmetry relies entirely on subjective visual judgment


Funnel plot with pseudo 95% confidence limits: analysis by study quality

[Figure: Ln SE (y-axis) against Ln RR (x-axis); studies marked as high or medium risk of bias, with the pooled estimate and confidence limits.]


Contour Enhanced Funnel Plots

• Based on premise that statistical significance is most important factor determining publication

• Funnel plot with additional contour lines associated with ‘milestones’ of statistical significance: p = .01, .05, .1
– If studies are missing in areas of statistical non-significance, publication bias may be present
– If studies are missing in areas of statistical significance, asymmetry may be due to factors other than publication bias
– If there are no studies in areas of statistical significance, publication bias may be present

• Can help distinguish funnel plot asymmetry due to publication bias versus other factors (Peters et al., 2008)
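Unlike the pseudo confidence limits, the contours are centred on zero (no effect), not on the pooled estimate: a study lies beyond the p = .05 contour when |ES| > 1.96 × SE. A minimal sketch that computes the contour half-widths and assigns each study to the significance bands used in the plot, assuming two-sided tests on the ln RR scale; the function names are illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def contour_half_width(se, p):
    """|effect| beyond which a study with this SE is significant at two-sided level p."""
    return _STD_NORMAL.inv_cdf(1 - p / 2) * se


def significance_band(effect, se):
    """Two-sided p-value against a null of zero, binned as in a contour-enhanced funnel plot."""
    p = 2 * (1 - _STD_NORMAL.cdf(abs(effect / se)))
    if p < 0.01:
        return "p < 1%"
    if p < 0.05:
        return "1% < p < 5%"
    if p < 0.10:
        return "5% < p < 10%"
    return "p > 10%"


for y, s in [(0.30, 0.10), (0.22, 0.10), (0.18, 0.10), (0.10, 0.10)]:
    print(y, s, significance_band(y, s))
```

Counting how many studies fall in each band, and where the visible gaps are, is what lets the plot separate significance-driven suppression from other causes of asymmetry.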


[Figure: Contour-enhanced funnel plot: standard error (y-axis) against effect estimate (x-axis), with contours marking p < 1%, 1% < p < 5%, 5% < p < 10% and p > 10%.]


Meta-regression analysis (t-stats reported)

1 2 3 4 5 6 7

STANDARD ERROR (LN_SE) 4.37*** 4.33*** 3.90*** 3.81*** 3.21*** 3.53*** 4.60***

HIGH QUALITY 0.52 0.61 0.15 1.37 0.07 1.64

INTERACTION (HIGH QUALITY*LN_SE) -1.83*

FFS+ 0.51 -1.01 0.73 0.35

YIELD MEASURE DUMMIES Yes

REGION DUMMIES Yes

CROP-TYPE DUMMIES Yes

ADJ. R-SQU 0.36 0.34 0.33 0.45 0.42 0.29 0.39

N.OBS 33 33 33 33 33 33 33

Specification 7 suggests that the heterogeneity attributed to small-study effects is related to study quality
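The meta-regression extends the Egger-type regression of ln RR on its standard error with study-level covariates. A minimal weighted least squares sketch, assuming inverse-variance weights and a hypothetical design of a constant, SE, a high-quality dummy and their interaction, loosely following specification 7; the review's exact model, weights and software are not given here. It returns each coefficient with its t-statistic, as in the table.

```python
import math


def _invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * c for v, c in zip(m[r], m[col])]
    return [row[n:] for row in m]


def wls(X, y, w):
    """Weighted least squares; returns a list of (coefficient, t-statistic) pairs."""
    n, k = len(y), len(X[0])
    xtwx = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xtwy = [sum(w[i] * X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = _invert(xtwx)
    beta = [sum(inv[a][b] * xtwy[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(w[i] * resid[i] ** 2 for i in range(n)) / (n - k)   # residual variance
    out = []
    for a in range(k):
        se = math.sqrt(sigma2 * inv[a][a])
        out.append((beta[a], beta[a] / se if se > 0 else float("nan")))
    return out


def quality_design(ses, high_quality):
    """Hypothetical design rows [1, SE, HIGH_QUALITY, HIGH_QUALITY*SE]."""
    return [[1.0, s, float(h), float(h) * s] for s, h in zip(ses, high_quality)]
```

In this parameterisation the coefficient on SE is the small-study slope for lower-quality studies, and a negative interaction means that slope is flatter among high-quality studies.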


Meta-analysis also suggests bias due to study quality

[Figure: Forest plot of FFS-participant yield effect sizes (95% CI), grouped into studies at medium risk of bias and studies at high risk of bias. Overall: 1.24 (1.16, 1.32), I-squared = 92.7%, p = 0.000.]


Final thoughts

• Evidence of upwards bias in low quality vs higher quality quasi-experiments
=> Where ‘relevance’ of the review is important for users, careful risk of bias assessment and sensitivity analysis are required

• Study quality appears more important than publication bias in explaining small study effects, but we also find evidence of file drawer effects in the literature

• The available statistical tests are sensitive to the number of effect sizes available and are of limited validity where sample sizes are homogeneous


Recommended Reading

• Duval, S. J., & Tweedie, R. L. (2000). A non-parametric ‘trim and fill’ method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95, 89-98.

• Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629-634.

• Hammerstrøm, K., Wade, A., Jørgensen, A. K. (2010). Searching for studies: A guide to information retrieval for Campbell systematic reviews. Campbell Systematic Review, Supplement 1.

• Harbord, R. M., Egger, M., & Sterne, J. A. C. (2006). A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Statistics in Medicine, 25, 3443-3457.

• Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., & Rushton, L. (2008). Contour-enhanced meta-analysis funnel plots help distinguish publication bias from other causes of asymmetry. Journal of Clinical Epidemiology, 61, 991-996.


• Rosenthal, R. (1979). The ‘file-drawer problem’ and tolerance for null results. Psychological Bulletin, 86, 638-641.

• Rothstein, H. R., Sutton, A. J., & Borenstein, M. L. (Eds). (2005). Publication bias in meta-analysis: Prevention, assessment and adjustments. Hoboken, NJ: Wiley.

• Rücker, G., Schwarzer, G., & Carpenter, J. (2008). Arcsine test for publication bias in meta-analyses with binary outcomes. Statistics in Medicine, 27, 746-763.

• Sterne, J. A., & Egger, M. (2001). Funnel plots for detecting bias in meta-analysis: Guidelines on choice of axis. Journal of Clinical Epidemiology, 54, 1046-1055.

• Sterne, J. A. C., Egger, M., & Moher, D. (Eds.) (2008). Chapter 10: Addressing reporting biases. In J. P. T. Higgins & S. Green (Eds.), Cochrane handbook for systematic reviews of interventions, pp. 297-333. Chichester, UK: Wiley.

• Sterne, J. A. C., et al. (2011). Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ, 343, d4002.

• Waddington, H., White, H., Snilstveit, B., Hombrados, J., & Vojtkova, M. (2012). How to do a good systematic review of effects in international development: a tool-kit. Journal of Development Effectiveness, 4(3).
