
Impact Evaluation for International Development: Why we need it and why it is hard
Irene Guijt
March 25, 2013, CDI Conference on Impact Evaluation




Copyright statement

Intellectual Property Rights apply to the ideas asserted in this PowerPoint.

Appropriate citation: Guijt, I. (2013). Impact Evaluation for International Development: Why we need it and why it is hard. Keynote speech for CDI conference on Impact Evaluation, March 25 and 26, 2013.

Pendulum of IE

Key messages

Crucial and growing area of evaluation expertise
Much contestation – definitions, methods, standards, utility
Utility of IEs remains to be proven

Impact Evaluation

1. What is it?

2. Why do it? When is it appropriate?

3. Who’s involved?

4. How?
• Options for key tasks
• Quality
• Use

Controversial? All of the above!

small ‘e’, big ‘E’

‘Results and targets’

vs

‘(Why) do or don’t things work?’

Seeds of interest
• Aid ‘failure’
• Financial crisis
• Shifting relationships (economic & players)
• Others?

When to invest

Testing innovation for scaling up
Risky context, investment utility
Accountability for high investments
Downward accountability (?)

Policy (and practice) direction for the future:
• Will the past be a good predictor of the future?
• When what works, doesn’t work?
• And when what doesn’t work, works?

Learning vs accountability

Questions (that) matter

Which questions matter?
• Whose questions are heard
• Which theories or strategies do we need to question
• Who casts a vote

3ie: enduring questions, e.g. are school vouchers the solution, what works in reducing FGM, market access for the poor, etc.

MethodsLab: rolling out an innovation within a flagship sector (25% of budget)

Defining Impact

The positive and negative, intended and unintended, direct and indirect, primary and secondary effects produced by an intervention (OECD)

I = Y1 – Y2: impact evaluation as a rigorous estimation of the difference between the indicator of interest with the intervention (Y1) and without it (Y2) (Howard White, 3ie)
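To make the counterfactual logic of this definition concrete, here is a minimal sketch, assuming entirely hypothetical outcome data and group names (not from the talk): because Y2, the outcome without the intervention, is never observed for participants, an experimental IE estimates it from a comparison group and takes the difference in means.

```python
# Minimal sketch of the I = Y1 - Y2 logic, using hypothetical data.
# Y2 (the outcome without the intervention) is unobservable for participants,
# so a randomized design estimates it from a comparison group.
from statistics import mean

# Hypothetical values of an outcome indicator for two groups
treated = [105, 98, 112, 101, 95, 108]   # stands in for Y1: with the intervention
comparison = [92, 97, 88, 94, 90, 96]    # stands in for Y2: without it

y1_hat = mean(treated)
y2_hat = mean(comparison)
impact_estimate = y1_hat - y2_hat        # I = Y1 - Y2

print(f"Estimated impact I = {impact_estimate:.1f}")
```

The sketch deliberately glosses over everything the rest of the talk problematises: whether a credible comparison group exists, whether randomisation is feasible, and whether a single difference captures the shape of change.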

People and Money

“An IE industry has developed which believes it is doing good work.” (Morton et al 2012)

Demand for IE
• Development banks
• Bilateral aid
• Multi-donor programs
• NGOs

Supply of IE expertise
• J-PAL network (2003 – Abdul Latif Jameel Poverty Action Lab)

“Conducting Rigorous Impact Evaluations: J-PAL researchers conduct randomized evaluations to test and improve the effectiveness of programs and policies aimed at reducing poverty.”
• 3ie (2009)
• university departments, esp. economists
• CLEAR Network (World Bank funded)
• consultants

The case of 3ie

2004: Origin with Closing the Evaluation Gap Initiative (Centre for Global Development), Hewlett Foundation, BMGF

2006: paper ‘When Will We Ever Learn’, the manifesto of the Impact Evaluation movement

2008/09: 3ie established, $55.14 million (2010-2013). Supply and demand:
• Quality standards for rigorous evaluations
• Review process for designs/studies
• Identifying priority topics
• Provide grants for IE designs

3ie Review Conclusions (Morton et al 2012)

Membership disappointingly low
• 10 bilateral and one multilateral donor agencies; 6 developing country governments/agencies; 2 philanthropic foundations; 3 INGOs

Policymakers, implementers and donors: IE low priority; sizeable minority of practitioners not in favour of experimental IE

What is considered ‘good’ is funded; what is funded shapes the face of IE
• 139 specialists, wide review net, esp. US universities focused on experimental IE

Journal of Development Effectiveness since 2009: respected, used (4000 times in 2010)

To date 9 SRs and 9 IEs finalised, but the website includes many more studies

Others emerging (partial listing!)

INGOs
• ICCO, World Vision, Oxfam, Plan International, CARE, Hivos, Save the Children, Veco, Freedom from Hunger

Multilaterals
• RCT studies undertaken for many UN agencies
• IFAD developing own approach (PIALA)

Bilaterals
• GiZ, AfD, AusAid, DfID, USAID, DGIS…

Philanthropic Foundations
• BMGF, Hewlett, Packard, Rockefeller…

It has to be ‘rigorous’

Rigour = value judgement
• USAID: ‘attributable to a defined intervention; impact evaluations are based on models of cause and effect and require a credible and rigorously defined counterfactual’
• PIALA: data, sensemaking, utility, low resource efforts

Five standards in evaluation – which one wins?
• Utility or accuracy
• Relevance or objectivity

Hard in practice
• E.g. 3ie’s own standards: the 2012 review was very critical of IEs funded through 3ie

How does one do IE?

Gold standard of what
• Statistics or rigour of thought

Experimentalism vs ‘other’
• A method or naïve experimentalist belief

Attribution
• Attribution as one form of contribution

Counterfactual
• Rigour or philosophically dead?

Beyond a counterfactual: the shape of change

Emerging alternatives

Realist evaluation framing
Contribution analysis
People’s narratives and surveys for attitudes / behaviour shifts
Qualitative Comparative Analysis
Participatory impact assessment and learning approach
PADEV …

Participatory impact assessment and learning approach (PIALA)

IFAD committed to 30 IEs
Needs an approach that works for low-resource programs, must show progress towards 80 million, weak baselines

PIALA = statistically sampled in nested hierarchy, PRA and survey data collection plus project data, collective sensemaking
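As a rough illustration only of what ‘statistically sampled in a nested hierarchy’ can look like, here is a minimal multi-stage sampling sketch with hypothetical district, village and household units; it is not PIALA’s actual sampling protocol.

```python
# Illustrative multi-stage (nested) sampling sketch with hypothetical units;
# not PIALA's actual sampling protocol.
import random

random.seed(42)  # reproducible illustration

# Hypothetical nested hierarchy: district -> villages -> number of households
hierarchy = {
    "District A": {"Village A1": 120, "Village A2": 80},
    "District B": {"Village B1": 200, "Village B2": 150, "Village B3": 90},
    "District C": {"Village C1": 60},
}

# Stage 1: sample districts; Stage 2: sample villages within each sampled
# district; Stage 3: sample households within each sampled village.
sampled_districts = random.sample(list(hierarchy), k=2)
sample = []
for district in sampled_districts:
    villages = hierarchy[district]
    chosen_villages = random.sample(list(villages), k=min(2, len(villages)))
    for village in chosen_villages:
        n_households = villages[village]
        households = random.sample(range(1, n_households + 1), k=10)
        sample.extend((district, village, hh) for hh in households)

print(f"Drew {len(sample)} households across {len(sampled_districts)} districts")
```

In PIALA the resulting sample then feeds PRA exercises and a survey, combined with project data and interpreted through collective sensemaking, none of which the sketch covers.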

GirlHub (DfID and Nike Foundation)

Impact of ‘starting a national conversation to revalue girls’

Focus – attitude and behaviour change as a result of social media work

Control group hard – radio, magazine
Purposive, stratified sample: detailed survey and self-signified stories of change, before and after, supplementary context analysis

Beyond only data collection: IE is about more than just describing ‘impact’ well

betterevaluation.org

Use

Uptake?
• Evidence of evidence use is weak

Assumptions about uptake
• Naïve politics
• Many, statistically rigorous, published IEs will lead to use
• IDRC: 90+ variables that influence uptake

Have to deal with the boundary between research and policy uptake

Uptake requires fostering interest = we need to understand the psychology of use, as well as the politics of use

Assumptions in 3ie’s Theory of Change

Overriding need is for more rigorous evidence
Quality will automatically be a result of more IEs of a certain type
Research community will generate enough policy relevant proposals and it will only be necessary to select the most policy relevant
Policy influence requires major effort from onset of research

Future of IE

Audience
• Convince policymakers that rigorous IE is important for the policy process

Feasibility and rigour
• Fit to context = (quasi-)experimental studies difficult (conditions, questions, capacities)

Relevance for policy
• Inverse relationship? Most policy-relevant programmes least amenable to experimental IE techniques

Pendulum of IE