Transcript
Page 1

Michael Quinn Patton, N.I.H., Sept 14, 2004

The Debate About Randomized Controls in Evaluation: The Gold Standard Question

Page 2

Political Context

• US Dept of Education "Scientifically Based Evaluation Methods" policy prioritizing RCTs (Institute of Education Sciences): www.ed.gov/programs/edresearch/applicant.html

• World Bank critics – Poverty Action Lab, MIT: RCTs as "the gold standard in evaluation"… only 2% of World Bank programs "properly evaluated…" (NY Times, 28/4/04, p. A4)

Page 3

American Evaluation Association position and debate

•AEA dissent from RCTs as the gold standard

• Counter-statement from RCT advocates in AEA

Page 4

AEA position (Nov 4, 2003): (1) Studies capable of determining causality. Randomized control group trials (RCTs) are not the only studies capable of generating understandings of causality. In medicine, causality has been conclusively shown in some instances without RCTs, for example, in linking smoking to lung cancer and infested rats to bubonic plague. The secretary's proposal would elevate experimental over quasi-experimental, observational, single-subject, and other designs which are sometimes more feasible and equally valid.

Page 5

RCTs are not always best for determining causality and can be misleading. RCTs examine a limited number of isolated factors that are neither limited nor isolated in natural settings. The complex nature of causality and the multitude of actual influences on outcomes render RCTs less capable of discovering causality than designs sensitive to local culture and conditions and open to unanticipated causal factors.

Page 6

RCTs should sometimes be ruled out for reasons of ethics. For example, assigning experimental subjects to educationally inferior or medically unproven treatments, or denying control group subjects access to important instructional opportunities or critical medical intervention, is not ethically acceptable even when RCT results might be enlightening. Such studies would not be approved by Institutional Review Boards overseeing the protection of human subjects in accordance with federal statute.

Page 7

In some cases, data sources are insufficient for RCTs. Pilot, experimental, and exploratory education, health, and social programs are often small enough in scale to preclude use of RCTs as an evaluation methodology, however important it may be to examine causality prior to wider implementation.

Page 8

(2) Methods capable of demonstrating scientific rigor. For at least a decade, evaluators publicly debated whether newer inquiry methods were sufficiently rigorous. This issue was settled long ago. Actual practice and many published examples demonstrate that alternative and mixed methods are rigorous and scientific. To discourage a repertoire of methods would force evaluators backward. We strongly disagree that the methodological "benefits of the proposed priority justify the costs."

Page 9

(3) Studies capable of supporting appropriate policy and program decisions. We also strongly disagree that "this regulatory action does not unduly interfere with State, local, and tribal governments in the exercise of their governmental functions." As provision and support of programs are governmental functions so, too, is determining program effectiveness. Sound policy decisions benefit from data illustrating not only causality but also conditionality. Fettering evaluators with unnecessary and unreasonable constraints would deny information needed by policy-makers.

Page 10

While we agree with the intent of ensuring that federally sponsored programs be "evaluated using scientifically based research . . . to determine the effectiveness of a project intervention," we do not agree that "evaluation methods using an experimental design are best for determining project effectiveness.” We believe that the constraints in the proposed priority would deny use of other needed, proven, and scientifically credible evaluation methods, resulting in fruitless expenditures on some large contracts while leaving other public programs unevaluated entirely.

End of AEA position (Nov 4, 2003). Source: www.eval.org

Page 11

Foundational Premises

Research and evaluation are different – and therefore evaluated by different standards.

Page 12

Evaluation Standards

• Utility
• Feasibility
• Propriety
• Accuracy

For the full list of Standards: www.wmich.edu/evalctr/checklists/standardschecklist.htm

Page 13

GOLD STANDARD: METHODOLOGICAL APPROPRIATENESS

not

methodological orthodoxy or rigidity

Page 14

Conditions for which randomized control trials (RCTs) are especially appropriate:

• A discrete, concrete intervention (treatment) – singular, well-specified
• Implementation can be standardized
• Implementation can be controlled
• Valid and reliable measures exist for the outcome to be tested
• Random assignment is possible
• Random assignment is ethical
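When those conditions hold, the mechanical core of an RCT is simple: chance, not judgment, decides who receives the treatment. A minimal sketch in plain Python (the participant roster and seed are hypothetical; real trials also stratify, block, and conceal allocation):

```python
import random

def randomize(participants, seed=1):
    """Randomly split a roster into treatment and control arms of
    equal size. A toy illustration of chance assignment only."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# With an even-sized roster, every participant lands in exactly one arm.
treatment, control = randomize(range(100))
```

Because assignment depends on nothing about the participants themselves, known and unknown confounders are, in expectation, balanced across the two arms – which is precisely the property the "gold standard" claim rests on.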

Page 15

Examples of Appropriate Use of RCTs

• Drug studies

• Fertilizer and crop yields

• A single health practice:

-- brushing teeth

-- exercise regimen

Page 16

Examples where RCTs are not appropriate in my view

Complex, multi-dimensional and highly context-specific community health interventions

Example: FB

Page 17

What is methodologically appropriate and possible:

• In-depth case studies of particular community interventions and outcomes

• Comparative pattern analysis of natural variation

• Systems analysis

Page 18

Example where RCT is not possible

A Study of the Deaths of Persons who are Homeless in Ottawa – A Social and Health Investigation. Report to the City of Ottawa, August 2003

Vivien Runnels
Coordinatrice/Coordinator, Research and Operations
Centre de recherche sur les services communautaires / Centre for Research on Community Services
Faculté des sciences sociales / Faculty of Social Sciences
Université d'Ottawa / University of Ottawa

www.socialsciences.uottawa.ca/crcs/eng/publ.asp

Page 19

What is possible and appropriate:

• Multiple sources of data about each case
• Triangulation of sources
• Modus operandi analysis
• Epidemiological field methods

Page 20

Example where RCT is not needed in my view

• Face validity is high
• The observed changes are dramatic
• The link between treatment and outcome is direct

Page 21

Individual, everyday examples of causal attribution:

• Sore muscles after exercise
• Hangover after drinking
• Sex with a single partner and resulting pregnancy or STD
• Keyboard failure – unplugged

Page 22

Examples of professional expertise and causal analysis

• Auto mechanic

• Airline and auto crashes

• Causes of fires

• Forensic science, autopsies

• Physicians

Page 23

Example of Formative Program Evaluation

Drop-out problem: baseline stats, target group analysis (disaggregate data), interview drop-outs and staff, review case records, make program changes, monitor results.
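The "disaggregate data" step above can be sketched in a few lines – the point is to break the overall drop-out rate down by subgroup so staff can see where the problem concentrates. A hedged illustration in plain Python (the record format and group labels are hypothetical):

```python
from collections import defaultdict

def dropout_rates(records):
    """Disaggregate drop-out rates by subgroup to see where the
    problem concentrates (record fields are hypothetical)."""
    counts = defaultdict(lambda: [0, 0])      # group -> [drop-outs, enrolled]
    for r in records:
        tally = counts[r["group"]]
        tally[1] += 1
        if r["dropped_out"]:
            tally[0] += 1
    return {group: drops / total for group, (drops, total) in counts.items()}

records = [
    {"group": "under 18", "dropped_out": True},
    {"group": "under 18", "dropped_out": False},
    {"group": "18 and over", "dropped_out": False},
]
rates = dropout_rates(records)   # {'under 18': 0.5, '18 and over': 0.0}
```

No control group is needed for this formative purpose: the comparison is across subgroups within the program, and the payoff is program change plus monitoring, not a causal proof.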

Page 24

Examples of summative program evaluation

• Treatment of chronic alcoholics
• Worm medicine and school attendance in LDCs

Page 25

Intervention comparisons

The search for viable alternatives to an existing intervention or program that may produce comparably valuable benefits at comparable or lesser cost, to find the best use of the resources.

Page 26

Comparing the costs and benefits of alternative interventions is typically more important in evaluation than RCT designs.

Pragmatically, comparison of a treatment to a control is less interesting and meaningful because not doing anything is seldom the realistic policy alternative. This distinguishes the normal approach in program evaluation from that in experimental design for research purposes. (See Scriven, Evaluation Thesaurus, 1991: 112)

Page 27

Matching the evaluation design to the stage of program or intervention development and the state of knowledge.

Page 28

William Trochim, Cornell University (EvalTalk, Dec 4, 2003): Medical research arguably has the strongest tradition of randomized experimental design for causal assessment. But that can only be understood in the context of their comprehensive approach to scientific assessment. In medical research, clinical trials are typically grouped into four phases. All four phases are considered essential to scientific assessment of the effects of treatments. Randomized experiments (and, when necessary, their quasi-experimental alternatives) are considered Phase III trials.

• Phase I trials are essentially small sample exploratory studies that assess the nature of the intervention, how it is administered, potential side effects, patient reactions, etc.
• Phase II trials tend to be correlational studies of safety and effectiveness.
• Phase III trials are typically the randomized experiments designed for high internal validity.
• Phase IV trials explore dose responses, potential unintended effects, generalizability issues and new uses for the treatment.

Page 29

All four phases are necessary to do "scientifically based" assessment in medicine. Phase III trials – randomized experiments – emphasize causal assessment and internal validity considerations, and play a major role, but are not in themselves sufficient for scientific assessment. And they would not typically be allowed even to commence without rigorous Phase I and II assessment, typically taking years to accomplish. The other phases (than Phase III) address issues of critical importance in scientific assessment, including construct, conclusion, and external validity. Most new treatments don't survive to the Phase III clinical trials. Typically well under half of all interventions are found safe and effective enough to proceed to Phase III research.

There is a very useful set of materials on clinical trials and their phases in cancer research at: www.cancer.gov/clinicaltrials/resources/clinical-trials-education-series.

William Trochim, Cornell University (EvalTalk, Dec 4, 2003)

Page 30

Some Alternative Ways of

Establishing Causality

Page 31

The method of eliminating alternative possible causes (EAC): This relies on expert knowledge, e.g., by using the diagnostic schemata of the coroner or the auto mechanic. The approach rests on three premises: (i) the well-supported principle of macro-determinism; (ii) well-supported checklists of possible causes of much-studied effects; (iii) hard-earned knowledge about the distinctive pattern of operation (the modus operandi) of each possible cause. The EAC approach can be used to establish causation in a particular case or in a family of cases.

(Michael Scriven, EvalTalk, Jan. 4, 2004)

Page 32

Quasi-experimental designs, e.g., matched comparisons

Finding: “Heart disease risks seen as universal”

Dr. Salim Yusuf, McMaster University, presented at the European Society of Cardiology, Aug 29, 2004

• Followed 29,000 people in 52 countries
• Conducted over 10 years
• 262 scientists involved
• Matched 15,000 who had a first heart attack by age, gender, and location with those not having had a heart attack
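The matching logic in a case-control design of this kind can be sketched in a few lines. A hedged toy version in plain Python – the record fields, age banding, and greedy 1:1 strategy are illustrative assumptions, not the study's actual protocol (real studies use more careful matching and handle unmatched cases explicitly):

```python
def match_controls(cases, pool):
    """Greedy 1:1 exact matching on age band (decade), gender, and
    location -- a toy version of case-control matching; the record
    fields are hypothetical."""
    def key(person):
        return (person["age"] // 10, person["gender"], person["location"])

    unused = {}
    for candidate in pool:
        unused.setdefault(key(candidate), []).append(candidate)

    pairs = []
    for case in cases:
        bucket = unused.get(key(case), [])
        if bucket:                     # skip cases with no eligible control left
            pairs.append((case, bucket.pop()))
    return pairs

cases = [{"age": 61, "gender": "F", "location": "urban"}]
pool = [{"age": 65, "gender": "F", "location": "urban"},
        {"age": 30, "gender": "M", "location": "rural"}]
pairs = match_controls(cases, pool)    # one matched pair
```

Matching substitutes design for randomization: by forcing cases and controls to agree on age, gender, and location, those variables are removed as alternative explanations for observed risk-factor differences.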

Page 33

Assessment of study's rigor

Editor of The Lancet, the medical journal where the findings have just been published: "Probably the most robust study on heart disease risk factors ever conducted." (Associated Press report, reporter Emma Ross, Aug 30, 2004)

Page 34

Other designs to enhance drawing causal inferences

• Interrupted time series

• Systematic treatment (a.k.a. ‘dosage’) variation (STV)

• Meta-analysis methodology (MAM)

• Direct observations (astronomy, field biology)
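An interrupted time series, the first design on the list, draws its causal inference from a change in an outcome's trajectory at the moment an intervention begins. A deliberately naive sketch in plain Python (the data and cut-point are hypothetical; serious analyses also model trend, seasonality, and autocorrelation):

```python
def level_change(series, t0):
    """Naive interrupted time-series summary: difference between the
    mean outcome from the intervention point (index t0) onward and
    the mean before it."""
    pre, post = series[:t0], series[t0:]
    return sum(post) / len(post) - sum(pre) / len(pre)

# Hypothetical monthly outcome with an intervention after month 4:
monthly = [10, 10, 10, 10, 14, 14, 14, 14]
effect = level_change(monthly, t0=4)    # 4.0
```

The design's persuasiveness comes from timing: the sharper and more immediate the break at t0, the harder it is to attribute the change to anything other than the intervention, even without a randomized control group.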

Page 35

Alternative Kinds of Conclusions

• Principles versus practices (In Search of Excellence, Good to Great)
• External validity focus (Cronbach)
• Farming systems research
• Family systems

continued…/

Page 36

Cultural explanations

Causality is partly socially constructed, e.g., the Thomas Theorem: What is perceived as real is real in its consequences.

So, cultural field studies record and use the cultural explanations constructed by people and the implications of those constructions for their lives and interactions with others.

Health examples: polio eradication; Yukon Native Canadian practices; vaccine side effects – MMR (measles, mumps, rubella; see Journal of Epidemiology and Community Health, 2000, 54: 402-403, June)

Page 37

Considerations in using RCTs

1. Exact type of problem and answer required (e.g., a conclusion with high external validity vs. internal validity)
2. Ethical constraints
3. Logistical constraints (including resources, participants' willingness, and investigators' competencies)
4. Economic constraints
5. Time constraints

(Scriven, EvalTalk, Jan. 4, 2004)

Page 38

Alternative Standards for Judging Attribution

• Scientific proof (RCTs)

• Reasonable probability, or beyond reasonable doubt

• Occam’s razor

• Actionable and useful

Page 39

Different Evaluation Purposes

• For making judgments – commonly called summative evaluations
• For improving programs – commonly called formative evaluations
• For ongoing development and knowledge building – sometimes called developmental evaluations

Page 40

The Challenge:

Matching the evaluation design to the evaluation’s purpose, resources, and timeline to optimize use.

Page 41

GOLD STANDARD: METHODOLOGICAL APPROPRIATENESS

not

methodological orthodoxy or rigidity

Page 42

References

1. Archives of EvalTalk, listserv of the American Evaluation Association, January-February 2004 debate and discussion on AEA position statement on proposed Dept of Education research and evaluation standards.

2. Cook, T.D. 2003. "A Critical Appraisal of the Case Against Using Experiments to Assess School (or Community) Effects," 128 kB pdf at <http://www.educationnext.org/unabridged/20013/cook.pdf>. Print version: "Why have educational evaluators chosen not to do randomized experiments?" Annals of the American Academy of Political and Social Science, 589: 114-149.

3. Cook, T.D. 2002. "Randomized Experiments in Educational Policy Research: A Critical Examination of the Reasons the Educational Evaluation Community has Offered for not Doing Them," Educational Evaluation and Policy Analysis, Fall, Vol. 24(3): 175-199; abstract online at <http://www.aera.net/pubs/eepa/abs/eepa24.htm>.

4. Patton, Michael Quinn, 1997. “The Paradigms Debate” Utilization-Focused Evaluation, 3rd ed., Sage Publications, chapter 12.

continued…/

Page 43

5. Web reference:

"Determining Causality in Program Evaluation and Applied Research: Should Experimental Evidence Be the Gold Standard?"

Streaming video of a debate between Dr. Mark Lipsey (Vanderbilt University) and Dr. Michael Scriven (Western Michigan University) available on the Claremont Graduate University website: http://www.cgu.edu/sbos/pdw.html

This debate, held on July 15, 2004, was co-sponsored by the Southern California Evaluation Association following the Claremont Graduate University Professional Development Workshops. Selected excerpts from the debate are also available on the website in text format.


Recommended