The Case For and Against Experimental Methods in Impact Evaluation
By Dexter N. Pante

Introduction

For thousands of years, human beings have been making causal inferences about their environment. The discovery of fire by early humans indicated that even with their primitive minds, they understood that repeatedly striking stone flints causes them to spark (Cook & Campbell, 1979). It was through failures, coupled with reasoning and observation, that Homo sapiens were able to produce knowledge which enabled them to make sense of, and then survive in, their environment (Scriven, 2013b). From ancient times to the present, the dominant paradigm in the discussion of causation has been the scientific tradition, which argues that the experimental method is the best tool for discovering causal relations in the world (Beebee, Hitchcock, & Menzies, 2009; Losee, 2011). The successes of experiments in the natural sciences convinced social scientists that the same logic could be applied to arrive at causal inferences in the social sciences and the evaluation discipline (Clarke, 2006; Donaldson, Christie, & Mark, 2009; King, Keohane, & Verba, 1994; Tacq, 2011).

In this paper, I argue that there are many languages for understanding causation, and that experimentation is only one of them. Just like any language, the language of experimentation is very useful when addressing the communication needs that exist within its context. I shall also argue that the strength of this language is also its limitation. Finally, I will discuss emerging communication needs in the area of impact evaluation which are better addressed by other languages, and how these languages could expand the repertoire of evaluation methods for understanding causation.

The language1 of experimentation: Its logic, vocabulary and notation

In the Philosophical Investigations, Wittgenstein (1958, p. 20) wrote, “For a large class of cases, the meaning of a word is its use in the language”. For Wittgenstein, one can only understand the meaning of an act through its expression in the language (Martland, 1975). In the case of causation, there is an entire language that has been developed to understand this concept. According to Cook and Campbell (1979), this is the language of experimentation which has its own logic, vocabulary and notation.

The underlying logic of experimentation proceeds as follows: 1) take two identical groups and subject both to a ‘pre-test’; 2) administer the intervention or treatment to one of the groups; 3) subject both groups to a ‘post-test’; and 4) compute the difference between the changes observed over time in the two groups (Astbury, 2012; Bryman, 2012; Gertler, Martinez, Premand, Rawlings, & Vermeersch, 2011; Pawson & Tilley, 1997; Shadish, Cook, & Campbell, 2002). The resulting ‘difference of differences’ serves as an estimate of the effect of the treatment (Gertler et al., 2011). According to Campbell and Stanley (1969), there are two types of experiments: true experiments and quasi-experiments. In brief, both follow the logic described above; the only difference is that in the former the treatment is allocated randomly, while in the latter it is not (Cook & Campbell, 1979; Gertler et al., 2011). From these two types of experiments, different research design variations have been developed (Campbell & Stanley, 1969; Cook & Campbell, 1979).
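To make the arithmetic explicit, the difference-of-differences estimate can be written as follows (an illustrative sketch using generic symbols, not notation drawn from the sources cited above):

Estimated effect = (Y_T,post − Y_T,pre) − (Y_C,post − Y_C,pre)

where Y_T and Y_C are the average outcomes of the treatment and comparison groups, and ‘pre’ and ‘post’ refer to the pre-test and post-test measurements. The second bracket stands in for the change the treatment group would have experienced anyway, so subtracting it is intended to isolate the effect attributable to the treatment.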

Experimentation also introduced terms like ‘internal’ and ‘external validity’, ‘bias’, ‘randomisation’, ‘counterfactual’ and ‘control’, among others. Experimental language also has its own notation to communicate the type of research design being implemented, including, for example, the use of ‘O’ for observation, ‘X’ for treatment, ‘R’ for randomisation, subscripts to refer to sequential order, and a dashed line to indicate the lack of randomisation. Due to space limitations, only relevant terms will be explained.2
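To make the notation concrete, the classic randomised pre-test–post-test control group design could be written out as follows (an illustrative sketch following the conventions just described, not a design reproduced from the sources cited):

R   O1   X   O2
R   O3        O4

Each row represents a group: ‘R’ marks random assignment, O1 and O3 are the pre-tests, ‘X’ is the treatment received only by the first group, and O2 and O4 are the post-tests. Replacing the ‘R’s and separating the two rows with a dashed line would indicate a non-randomised, quasi-experimental comparison.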

The uses and constraints of the language of experimentation

Different languages have different purposes and uses. A language might be useful in one context but not in another. The use of the mother tongue for teaching is beneficial where the objective is to facilitate learning, but it is not useful in other contexts, e.g. international business and research, where English is the accepted medium (Smits, Huisman, & Kruijff, 2008; UNESCO, 2012). This principle applies to experimentation as well. In this section, I will enumerate three uses of experimentation and discuss their constraints.

Experimentation is the most appropriate tool when the requirements in impact evaluation are methodological rigour, accountability and knowledge accumulation. First, rigour is usually understood as the quality of a method that provides warranted causal inference and an estimate of the program effect while rendering alternative explanations implausible (Gertler et al., 2011; Stern et al., 2012; USAID, 2010; White & Phillips, 2012). Many experts agree that experimentation is the most rigorous evaluation design for answering the question, ‘Did the program have an effect on the beneficiaries?’ (Bamberger, Rugh, & Mabry, 2012; St. Pierre, 2004; Stern et al., 2012). In fact, Gertler et al. (2011) argued that the purpose of any rigorous impact evaluation is basically to answer whether the changes in the outcomes were directly caused by the program. In sum, rigour is synonymous with internal validity, which refers to the validity of inferences that the observed covariation between variables is causal (Campbell & Stanley, 1969; Cook & Campbell, 1979; Shadish et al., 2002).

The notion that causation is proved through evidence of association and the elimination of competing explanations has its roots in David Hume’s regularity theory in the eighteenth century. According to Hume (1777, p. 912), “A cause is an object followed by another, and where all the objects similar to the first are followed by objects similar to the second. Or in other words, where if the first object had not been, the second never had existed”. It is important to note that Hume’s theory involves two concepts: constant conjunction and the counterfactual. These two concepts were the basis of John Stuart Mill’s methods of agreement and difference, which became the foundation of the logic of experimentation (Copi & Cohen, 1990; Losee, 2011; Pawson & Tilley, 1997; Stern et al., 2012). Under the first concept, causation is understood as a mere constant conjunction of events or objects (Cook & Campbell, 1979; Losee, 2011). Causal necessity is not observable in the regularity theory (Cook, Scriven, Coryn, & Evergreen, 2010; Garrett, 2009). The necessary connection in any perceived causal relationship is a mental construct formed through habit, custom or experience of such conjunction (Garrett, 2009; Losee, 2011; Scruton, 1994). On this view, if a causal relationship is ever asserted, it is understood to be no more than a high correlation (Cook & Campbell, 1979).

The problem with this is that correlation is not causation (Shadish et al., 2002). To move beyond correlation, there must be a necessary connection between two events such that, without the first, the second would not occur (Losee, 2011). This is where counterfactual reasoning originated. In simple terms, the counterfactual is the ‘what if’ situation in which no intervention was implemented (Cummings, 2006; Ferraro, 2009; Mohr, 1999; Stern et al., 2012). Together, the application of constant conjunction and counterfactual dependence permits stronger causal inferences by controlling for other possibilities (Copi & Cohen, 1990; Rihoux & Ragin, 2009).

1 The word ‘language’ in this paper is used in a very liberal sense. A language is any heuristic device used to understand the world.

2 For an exhaustive discussion of the different experimental and quasi-experimental designs, terms and notations, please refer to Shadish et al. (2002), Cook and Campbell (1979) and Campbell and Stanley (1969).


The limitation of this causal language lies in its ontological view of causation. There are philosophers, and even some psychologists, who disagree with Hume’s causal regularity and with the notion that causation is inferred rather than observed (Anscombe, 1981; Danks, 2009; Scriven, 2009). For example, Harré and Madden argued that reality is composed not only of events or objects but also of ‘powerful particulars’ (Losee, 2011; Mumford, 2009). This view, known as generative theory or the theory of causal powers, sees causation as acting internally as well as externally. Internal causation describes the transformative potential of phenomena (Pawson & Tilley, 1997). Humans are examples of powerful particulars: we are not just an amalgamation of atomic particles; we are also endowed with intrinsic powers, such as intention, motivation and self-reflection, which may explain the occurrence of effects.

From this view arose the language of realist evaluation, which provides another way of understanding causation. Unlike Hume, realists require an understanding of the internal mechanisms producing the effect. Hence, the realist formula for explaining the outcome of an act consists of context and mechanism (Pawson & Tilley, 1997; Stern et al., 2012). However, it must be pointed out that this dual aspect of causation is also present in the language of experimentation: the concepts of molar causation and micromediation correspond to the external and internal aspects in the realist language (Cook & Campbell, 1979). While experiments aim to answer questions of causal attribution and effect, the realist approach seeks to unpack the causal mechanism in order to explain the complexity and dynamics occurring within the cause and its context.
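The realist formula referred to above is commonly summarised as:

mechanism + context = outcome

that is, the same program mechanism triggered in a different context may generate a different outcome. This is a shorthand for the formulation described in the preceding paragraph, following Pawson and Tilley’s general scheme rather than a worked example from this paper.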

Second, experiments are useful when discussing accountability because they allow identification of the cause and the magnitude of its effects. Being able to pinpoint what contributed to the effect is a very strong incentive for managers, who are expected to demonstrate performance. According to Owen (2006), accountability means being answerable for expenditures, decisions and one’s actions. This definition of accountability is aligned with economic rationalism, the view that the objective of government or programmatic intervention should be efficiency, that is, getting more with less. This assertion is supported by the fact that those pushing for experiments in impact evaluation tend to be working for or with policymakers and development agencies, who make decisions about resource allocation and require evidence of the returns on investment. This is the reason White (2010) argued that one of the advantages of experiments is that they allow cost-effectiveness or cost-benefit analysis when the evaluation can report an estimate of the effect.

However, accountability is not the sole purpose of impact evaluation. When there is too much accountability, organisational learning suffers. It is also important to unpack the causal processes within the program so that implementers learn the various factors affecting the program’s success or failure (Cronbach et al., 1980; Davies, 2011; Shadish, Cook, & Leviton, 1991; Stern et al., 2012). Again, this need is something beyond what the language of experimentation can provide. In their report on the impact of the conditional cash transfer in the Philippines, Chaudhury, Friedman, and Onishi (2013) concluded that while the evaluand had an impact on the identified outcomes of interest, an in-depth qualitative study was needed to explain the causal factors behind the varied responses of the treatment group. This reflects the growing consensus among evaluation experts that causal explanation is better handled by alternative languages of causation, for example, theory-based and realist evaluations (Astbury & Leeuw, 2010; Davies, 2011; Stern et al., 2012).

Another flaw in the argument for accountability is the premise, articulated by economists, that government strives for economic efficiency and exactitude in decision-making (Cronbach et al., 1980; Gertler et al., 2011; White, 2010). According to Simon (1990), optimisation is not always the goal when individuals or organisations make decisions. In actual practice, many decisions are made under a variety of constraints, including political, data and time constraints; furthermore, decisions are sometimes made without regard to achieving the optimal scenario (Cronbach et al., 1980; Shadish et al., 1991). In the context of decision-making in developing countries, it sometimes suffices that a decision is well argued, popularly supported and ‘good enough’. This form of decision-making is described as bounded rationality (Simon, 1985, 1990).

Finally, another benefit of experiments is that they enable policymakers to separate good programs from poor ones on the basis of proven scientific knowledge. Often referred to as evolutionary epistemology, this thinking of confirming and falsifying knowledge is entrenched in the language of experimentation and in science (Campbell, 1971; Cook & Campbell, 1979). Evolutionary epistemology provides an idealised view that human progress can be achieved by storing in data banks all knowledge that has been proven by experiments (Campbell, 1971). This is the rationale behind the establishment of databases of evidence from RCTs and quasi-experiments, such as the International Initiative for Impact Evaluation (3ie) and the Campbell and Cochrane Collaborations.

The criticism against evolutionary epistemology is that it privileges certain knowledge claims at the expense of others. This was the issue in 2003, when the Institute of Education Sciences released a policy that only RCTs and quasi-experimental designs would be funded (Cook et al., 2010; Donaldson, 2009). According to Davidson (2006), this policy would actually produce the opposite effect: if no formative evaluation is conducted, and if evaluands that cannot be evaluated using experiments are not evaluated at all, then there would be a very serious problem with knowledge creation in society (Davidson, 2006). Another objection to this privileging concerns its normative implications. According to Shadish et al. (1991), this epistemological model might conflict with democratic values, which hold that it is more important to build consensus than to ‘eliminate’ views that are unproven. Also, in multicultural settings, the idea of sidelining non-experimental evidence, such as oral testimonies, group consensus and the opinions of elders, might disempower beneficiaries who belong to different contexts and cultures (Wehipeihana, 2013). Even if the program impact is measured precisely, there would still be no advancement of knowledge if the evaluation is not relevant and is not used (Patton, 2004, 2008).

In summary, it is important to point out that all the arguments put forward thus far support the thesis that the benefits of the language of experimentation fit only within the boundaries and communication needs of this language. However, these benefits have their own constraints; furthermore, the alternative views provide a more comprehensive and richer understanding of causation. In the next section, I will provide some initial ideas on future trends in impact evaluation which have been shaped by the debates on experimentation.

Future Directions of Impact Evaluation

The debates on causation have had the effect of clarifying the important issues in impact evaluation. On a positive note, more questions are being raised now regarding causation and its effects than before. In the past, impact evaluation had been narrowly defined as determining the causal relationship between a program and an outcome of interest (Gertler et al., 2011; White, 2010). However, it is no longer sufficient that a precise answer to this question is achieved without addressing the scope and relevance of the causal relationship (Cronbach, 1982; Cronbach et al., 1980). Impact evaluation must be able to answer the questions ‘What works?’ and, when it does work, ‘How does it work? Why does it work? And does it work equitably?’ (Bamberger & Segone, 2011; Pawson & Tilley, 1997; Stern et al., 2012). This goes to show that there is a need for other languages to understand causation holistically. Regarding this matter, I will discuss three alternative perspectives on causation and describe how they are gaining acceptance among evaluation theorists and practitioners.

The first alternative language, which has already been discussed at length, is the realist view of causation. The main benefit of this language is that it explains the causal processes and connections that occur inside the ‘black box’ (Astbury & Leeuw, 2010; Pawson & Tilley, 1997). This causal explanation facilitates learning for program implementers and even beneficiaries because it demonstrates the interaction between the pre-existing context in which the program works and the mechanisms, which might be deeply entrenched in individual, organisational and social relationships. In a sense, realist evaluation provides a basis for identifying assumptions in logic models, for fidelity assessment, and for the use of case studies in impact evaluation.

The second is a common-sense view of causation, which accepts as a basic proposition the notion that causal relationships are perceivable. There are two arguments in support of this, one psychological and the other philosophical. The psychological basis has its roots in the work of Albert Michotte on causal perception (Danks, 2009). Based on this work, it is argued that evaluative judgement of causality is deeply intertwined with observation (Danks, 2009; Wagemans, van Lier, & Scholl, 2006). Studies show that babies perceive causal relations through the ‘launching effect’ (Danks, 2009; Leslie, 1987). These findings support the hypothesis that humans are causal cognisers even without habit, custom and experience.

The philosophical argument rejects the notion that cause is tied to necessity or universality (Anscombe, 1981). The rejection of necessity in causal relationships implies that counterfactual dependence, from which regularity theory derives its explanatory power, is also unnecessary. Hence, causation can be established in a single case. Scriven adopts a somewhat pragmatic stance in his arguments. For him, causation is fundamental to our thinking and language because of its survival value (Scriven, 2009, 2013a). The evaluative judgement of early humans that striking stone flints together was what ignited the fire was ‘beyond reasonable doubt’ for them because of that knowledge’s functional value. The perceptibility of causation lends credence to qualitative and mixed-method studies, which rely on observation to establish causation.

Finally, the last view is causal pluralism, which holds that causation is not a single, monolithic concept that explains the connection between things in reality (Losee, 2011; Godfrey-Smith, 2009). In impact evaluation, the word ‘cause’ is used in the following two senses: the effects of causes and the causes of effects. These concepts correspond to the dichotomy of causal description and causal explanation, respectively. The meaning of cause in ordinary language further illustrates the diversity of this concept. According to Anscombe (1981), when the word ‘cause’ is used, it sometimes serves as a ‘container’ for other sorts of meaning, such as igniting, eating, running and so on. This means that it would be difficult to understand cause if there were no other causally related concepts in human language (Anscombe, 1981; Godfrey-Smith, 2009). What causal pluralism tells us is that diversity is a characteristic of causation and that, consequently, the best way to understand it is to employ diverse heuristic devices.

As a concluding remark, the direction of this analysis leans towards a critical pluralist stance in impact evaluation. By critical, I mean constantly challenging our understanding of causal concepts, including effects, bandwidth, timeframes, trajectories and mechanisms. By pluralist, I mean keeping an open mind about the diversity of tools available to answer various causal questions and about the need for dialogue between different methods and languages. The hope is that by being both critical and pluralist, a more holistic understanding of the world can be approximated.


References

Anscombe, G. E. (1981). Metaphysics and the Philosophy of Mind. Oxford: Basil Blackwell.
Astbury, B. (2012). Using Theory in Criminal Justice Evaluation. In E. Bowen & S. Brown (Eds.), Advances in Program Evaluation (Vol. 13, pp. 3-27). Emerald Group Publishing Limited.
Astbury, B., & Leeuw, F. (2010). Unpacking Black Boxes: Mechanisms and Theory Building in Evaluation. American Journal of Evaluation, 31(3), 363-381.
Bamberger, M., Rugh, J., & Mabry, L. (2012). Real World Evaluation: Working Under Budget, Time, Data and Political Constraints (2nd ed.). Los Angeles: Sage Publications.
Bamberger, M., & Segone, M. (2011). How to Design and Manage Equity-focused Evaluations. New York: UNICEF. Retrieved from http://mymande.org/sites/default/files/EWP5_Equity_focused_evaluations.pdf
Beebee, H., Hitchcock, C., & Menzies, P. (2009). Introduction. In H. Beebee, C. Hitchcock & P. Menzies (Eds.), The Oxford Handbook of Causation. Oxford: Oxford University Press.
Bryman, A. (2012). Social Research Methods (4th ed.). Oxford: Oxford University Press.
Campbell, D. (1971). Reforms as Experiments. Urban Affairs Review, 7(133).
Campbell, D., & Stanley, J. (1969). Experimental and Quasi-Experimental Designs for Research. Chicago: Rand McNally and Co.
Chaudhury, N., Friedman, J., & Onishi, J. (2013). Philippines Conditional Cash Transfer Program: Impact Evaluation 2012. Washington: World Bank.
Clarke, A. (2006). Evidence-Based Evaluation in Different Professional Domains: Similarities, Differences and Challenges. In I. Shaw, J. Greene & M. Mark (Eds.), The Sage Handbook of Evaluation. London: Sage Publications.
Cook, T., & Campbell, D. (1979). Quasi-Experimentation: Design and Analysis Issues for Field Settings. Boston: Houghton Mifflin Co.
Cook, T., Scriven, M., Coryn, C. L. S., & Evergreen, S. (2010). Contemporary Thinking About Causation in Evaluation: A Dialogue with Tom Cook and Michael Scriven. American Journal of Evaluation, 31(1), 105-117.
Copi, I., & Cohen, C. (1990). Introduction to Logic (8th ed.). New York: Macmillan Publishing Co.
Cronbach, L. J. (1982). Designing Evaluations of Educational and Social Programs. San Francisco: Jossey-Bass Publishers.
Cronbach, L. J., Ambron, S. R., Dornbusch, S. M., Hess, R. D., Hornik, R. C., Phillips, D. C., . . . Weiner, S. S. (1980). Toward Reform of Program Evaluation. San Francisco: Jossey-Bass Publishers.
Cummings, R. (2006). What if: The Counterfactual in Program Evaluation. Evaluation Journal of Australasia, 6(2), 6-15.
Danks, D. (2009). The Psychology of Causal Perception and Reasoning. In H. Beebee, C. Hitchcock & P. Menzies (Eds.), The Oxford Handbook of Causation. Oxford: Oxford University Press.
Davidson, E. J. (2006). Editorial. Journal of MultiDisciplinary Evaluation, 6.
Davies, R. (2011). 3ie and the Funding of Impact Evaluation: A Discussion Paper for 3ie's Members. Canberra: AusAID.
Donaldson, S. I. (2009). In Search of the Blueprint for an Evidence-Based Global Society. In S. I. Donaldson, C. Christie & M. Mark (Eds.), What Counts as Credible Evidence in Applied Research and Evaluation Practice? California: Sage Publications Inc.
Donaldson, S. I., Christie, C., & Mark, M. (Eds.). (2009). What Counts as Credible Evidence in Applied Research and Evaluation Practice? California: Sage Publications Inc.
Ferraro, P. (2009). Counterfactual Thinking and Impact Evaluation in Environmental Policy. In M. Birnbaum & P. Mickwitz (Eds.), Environmental Program and Policy Evaluation: Addressing Methodological Challenges. New Directions for Evaluation (Vol. 122, pp. 75-84).
Garrett, D. (2009). Hume. In H. Beebee, C. Hitchcock & P. Menzies (Eds.), The Oxford Handbook of Causation. New York: Oxford University Press.
Gertler, P., Martinez, S., Premand, P., Rawlings, L., & Vermeersch, C. (2011). Impact Evaluation in Practice. Washington: World Bank.
Godfrey-Smith, P. (2009). Causal Pluralism. In H. Beebee, C. Hitchcock & P. Menzies (Eds.), The Oxford Handbook of Causation. Oxford: Oxford University Press.
Hume, D. (1777). An Enquiry Concerning Human Understanding (2nd ed.). Retrieved 2013, from www.gutenberg.net
King, G., Keohane, R., & Verba, S. (1994). Designing Social Inquiry. New Jersey: Princeton University Press.
Leslie, A. M. (1987). Do Six-Month-Old Infants Perceive Causality? Cognition, 25, 265-288.
Losee, J. (2011). Theories of Causality: From Antiquity to the Present. New Brunswick, NJ: Transaction Publishers.
Martland, T. R. (1975). On "The limits of my language mean the limits of my world". The Review of Metaphysics, 29(1).
Mohr, L. B. (1999). The Qualitative Method of Impact Analysis. American Journal of Evaluation, 20(1), 69.
Mumford, S. (2009). Causal Powers and Capacities. In H. Beebee, C. Hitchcock & P. Menzies (Eds.), The Oxford Handbook of Causation. Oxford: Oxford University Press.
Owen, J. M. (2006). Program Evaluation: Forms and Approaches. St Leonards, NSW: Allen & Unwin.
Patton, M. Q. (2004). The Roots of Utilization-Focused Evaluation. In M. C. Alkin (Ed.), Evaluation Roots (pp. 276-292). London: Sage Publications, Inc.
Patton, M. Q. (2008). Utilization-Focused Evaluation. Thousand Oaks: Sage Publications.
Pawson, R., & Tilley, N. (1997). Realistic Evaluation. Los Angeles: Sage Publications.
Rihoux, B., & Ragin, C. (2009). Configurational Comparative Methods. Los Angeles: Sage Publications.
Scriven, M. (2009). Demythologizing Causation and Evidence. In S. I. Donaldson, C. Christie & M. Mark (Eds.), What Counts as Credible Evidence in Applied Research and Evaluation Practice? California: Sage Publications Inc.
Scriven, M. (2013a). The Foundation and Future of Evaluation. In S. I. Donaldson (Ed.), The Future of Evaluation in Society: A Tribute to Michael Scriven. Charlotte: Information Age Publishing.
Scriven, M. (2013b). The Past, Present and Future of Evaluation. Paper presented at the Australasian Evaluation Society 2013 International Conference, Brisbane, Australia.
Scruton, R. (1994). Modern Philosophy. New York: Penguin Press.
Shadish, W., Cook, T., & Campbell, D. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin Co.
Shadish, W., Cook, T. D., & Leviton, L. C. (1991). Foundations of Program Evaluation: Theories of Practice. Newbury Park, CA: Sage Publications.
Simon, H. (1985). Human Nature in Politics. The American Political Science Review, 79(2), 293-304.
Simon, H. (1990). Invariants of Human Behavior. Annual Review of Psychology, 41, 1-19.
Smits, J., Huisman, J., & Kruijff, K. (2008). Home Language and Education in the Developing World. Bangkok: UNESCO. Retrieved from http://unesdoc.unesco.org/images/0017/001787/178702e.pdf
St. Pierre, R. (2004). Using Randomized Experiments. In J. Wholey, H. Hatry & K. Newcomer (Eds.), Handbook on Practical Program Evaluation. San Francisco: Jossey-Bass.
Stern, E., Stame, N., Mayne, J., Forss, K., Davies, R., & Befani, B. (2012). Broadening the Range of Designs and Methods for Impact Evaluation. Retrieved from https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/67427/design-method-impact-eval.pdf
Tacq, J. (2011). Causality in Qualitative and Quantitative Research. Quality & Quantity, 45(2), 263-291.
UNESCO. (2012). Language Matters for the Millennium Development Goals. Bangkok, Thailand: UNESCO.
USAID. (2010). Performance Monitoring and Evaluation Tips: Rigorous Impact Evaluation. Retrieved from http://pdf.usaid.gov/pdf_docs/pnadw119.pdf
Wagemans, J., van Lier, R., & Scholl, B. J. (2006). Introduction to Michotte's Heritage in Perception and Cognition Research. Acta Psychologica, 123(1-2), 1-19.
Wehipeihana, N. (2013). Indigenous Evaluation: A Metaphor for Social Justice, Inclusion and Equity. Paper presented at the Australasian Evaluation Society 2013 International Conference, Brisbane, Australia.
White, H. (2010). A Contribution to Current Debates in Impact Evaluation. Evaluation, 16(2), 153-164.
White, H., & Phillips, D. (2012). Addressing Attribution of Cause and Effect in Small n Impact Evaluations: Towards an Integrated Framework. 3ie Working Paper Series. Retrieved March 18, 2013, from http://www.3ieimpact.org/en/evaluation/working-papers/working-paper-15/
Wittgenstein, L. (1958). Philosophical Investigations. Oxford: Basil Blackwell Ltd.