Alternatives for Evaluating Theories of Change · 2008. 8. 1. · 10 Conceptualizing the Intervention Alternatives for Evaluating Theories of Change All the World’s a Stage for

10 Conceptualizing the Intervention: Alternatives for Evaluating Theories of Change

All the World’s a Stage for Theory

In Tony Kushner’s Pulitzer Prize–winning play, Angels in America, Part Two opens in the Hall of Deputies, the Kremlin, where Aleksii Antedilluvianovich Prelapsarianov, the World’s oldest living Bolshevik, speaks with sudden, violent passion, grieving a world without theory:

How are we to proceed without Theory? What System of Thought have these Reformers to present to this mad swirling planetary disorganization, to the Inevident Welter of fact, event, phenomenon, calamity? Do they have, as we did, a beautiful Theory, as bold, as Grand, as comprehensive a construct . . . ? You can’t imagine, when we first read the Classic Texts, when in the dark vexed night of our ignorance and terror the seed-words sprouted and shoved incomprehension aside, when the incredible bloody vegetable struggled up and through into Red Blooming gave us Praxis, True Praxis, True Theory married to Actual Life. . . . You who live in this Sour Little Age cannot imagine the grandeur of the prospect we gazed upon: like standing atop the highest peak in the mighty Caucasus, and viewing in one all-knowing glance the mountainous, granite order of creation. You cannot imagine it. I weep for you.

And what have you to offer now, children of this Theory? What have you to offer in its place? Market Incentives? American Cheeseburgers? Watered-down Bukharinite stopgap makeshift Capitalism? NEPmen! Pygmy children of a gigantic race!

Change? Yes, we must change, only show me the Theory, and I will be at the barricades, show me the book of the next Beautiful Theory, and I promise you these blind eyes will see again, just to read it, to devour that text. Show me the words that will reorder the world, or else keep silent.1

—Kushner 1994:13–14


Evaluation and Program Theory

Evaluability Assessment

The idea that evaluation should include conceptualizing and testing a program’s theory of change emerged in the 1970s as part of a more general concern about assessing a program’s readiness for evaluation. The notion was basically this: Before undertaking an evaluation, the program should be clearly conceptualized as some identifiable set of activities that are expected to lead to some identifiable outcomes. The linkage between those activities and outcomes should be both logical and testable. “Evaluability assessment is a systematic process for describing the structure of a program and for analyzing the plausibility and feasibility of achieving objectives; their suitability for in-depth evaluation; and their acceptance to program managers, policymakers, and program operators” (Smith 2005a:136; see also Smith 1989).

One primary outcome of an evaluability assessment is definition of a program’s theory. This means specifying the underlying logic (cause and effect relationships) of the program, including what resources and activities are expected to produce what results. An evaluability assessment is also expected to gather various stakeholders’ perspectives on the program theory and assess their interest in evaluation. Also assessed are the program’s capacity to undertake an evaluation and its readiness for rigorous evaluation (e.g., whether the program’s theory is sufficiently well conceptualized and measures of outcomes adequately validated to permit a meaningful summative evaluation).

Evaluability assessment was the evaluator’s version of foreplay: getting the program ready for the act itself, the act being evaluation—leading to the climax of producing findings. Or if you find sexual innuendo distracting or inappropriate, consider an agricultural analogy. Evaluability assessment involved tilling the soil before planting the seeds (evaluation questions) that, if properly nourished, would produce an abundant yield (useful findings).

In effect, evaluability assessment puts evaluators in the business of facilitating design of the program in order for it to be evaluated. For already existing programs, this means redesigning the program because the original program model was insufficiently specified to be evaluated. Intended outcomes are often vague or unmeasurable (as discussed in Chapter 7), and how desired outcomes will actually result from the program’s activities is often far from clear. As evaluators became involved in working with program people to more clearly specify the program’s model (or theory), it became increasingly clear that evaluation was an up-front activity, not just a back-end activity. That is, traditional planning models laid out some series of steps in which planning comes first, then implementation of the program, and then evaluation, making evaluation a back-end, last-thing-done activity. But to get a program plan or design that could actually be evaluated meant involving evaluators—and evaluative thinking—from the beginning.

Evaluative thinking, then, becomes part of the program design process, including, especially, conceptualizing the program’s theory of change: How will what the program does lead to the desired results? Engaging in this work is an example of process use (Chapter 5) in which the evaluation has an impact on the program quite apart from producing findings about program effectiveness. The very process of conceptualizing the program’s theory of change can have an impact on how the program is implemented, understood, talked about, and improved. The evaluative thinking process has these impacts.



This has huge implications for evaluators. It means that evaluators have to be (1) astute at conceptualizing program and policy theories of change and (2) skilled at working with program people, policymakers, and funders to facilitate their articulation of their implicit theories of change. Given the importance of these tasks, it matters a great deal what theory of change frameworks the evaluator can offer. Options for doing theory of change work as part of a utilization-focused evaluation are the subject of this chapter.

Mountaintop Inferences

That evil is half-cured whose cause we know.

—Shakespeare

Causal inferences flash as lightning bolts in stormy controversies. While philosophers of science serve as meteorologists for such storms—describing, categorizing, predicting, and warning, policymakers seek to navigate away from the storms to safe harbors of reasonableness. When studying causality as a graduate student, I marveled at the multitude of mathematical and logical proofs necessary to demonstrate that the world is a complex place (e.g., Nagel 1961; Bunge 1959). In lieu of rhetoric on the topic, I offer a simple Sufi story to introduce this chapter’s discussion of the relationship between means and ends, informed and undergirded by theory.

The incomparable Mulla Nasrudin was visited by a would-be disciple. The man, after many vicissitudes, arrived at the hut on the mountain where the Mulla (teacher) was sitting. Knowing that every single action of the illuminated Sufi was meaningful, the newcomer asked Nasrudin why he was blowing on his hands. “To warm myself in the cold, of course,” Nasrudin replied.

Shortly afterward, Nasrudin poured out two bowls of soup, and blew on his own. “Why are you doing that, Master?” asked the disciple. “To cool it, of course,” said the teacher.

At that point, the disciple left Nasrudin, unable to trust any longer a man who used the same process to cause different effects—heat and cold.

—Adapted from Shah 1964:79–80

Conceptualizing Interventions

Process Use and Theory of Change

Assisting primary intended users to conceptualize the program’s theory of change can have an impact on the program before any evaluative data are gathered about whether the program’s theory works. This is an example of the process use of evaluation (as opposed to findings use). The very process of conceptualizing the program’s theory of change can affect how the program is implemented, understood, talked about, and improved.

At the simplest level, we can model what the disciple observed as follows:

Hot soup → Blow on hot soup → Cooler soup

Cold hands → Blow on cold hands → Warmer hands

So, what’s going on in these two sequences? What’s the intervention? The intervention is Nasrudin’s breath. The baselines are (1) hot soup and (2) cold hands. The results are (1) cooler soup and (2) warmer hands. We assume Nasrudin’s breath to be about the same temperature in each case. Puzzling this out, we can posit the following intervention theory: If the object being blown on is warmer than one’s breath, then the object will be cooled by the blowing; if the object being blown on is cooler than one’s breath, then the object will be warmed. That’s a simple intervention theory. An intervention theory is basically an if/then assertion or hypothesis: If we do x, then y will result.
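The if/then form of Nasrudin’s intervention theory can be written down as a small function. This is only an illustrative sketch: the function name `blow_on`, the breath temperature of 35 degrees, and the soup and hand temperatures are all assumptions made up for the example, as is the "unchanged" case, which the story does not address.

```python
def blow_on(object_temp: float, breath_temp: float = 35.0) -> str:
    """Predict the result of blowing on an object (hypothetical helper).

    Encodes the intervention theory as an if/then rule: the same
    intervention (breath) yields different results depending on the
    baseline (the object's starting temperature).
    """
    if object_temp > breath_temp:
        return "cooled"   # object warmer than breath, e.g. hot soup
    if object_temp < breath_temp:
        return "warmed"   # object cooler than breath, e.g. cold hands
    return "unchanged"    # assumption: equal temperatures, not in the story


# Illustrative baselines (temperatures are invented for the example)
hot_soup_result = blow_on(80.0)
cold_hands_result = blow_on(10.0)
print(hot_soup_result, cold_hands_result)
```

Writing the theory this way makes the point of the story visible: the intervention is held constant, and the result depends on the baseline.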

Now, using this simple sequential logic, let’s turn to a program intervention.

Person lacks training needed to get a good job → Provide appropriate training → Trained person gets a good job

This is a simple (and common) program theory. If we train people, then they will get good jobs. It focuses on a single problem: lack of training. It provides a focused intervention: job training. It has a straightforward, measurable outcome: a good job. That’s a starting place—and it’s the starting place for many policymakers and program designers who want to help poor people get better jobs. Then we start asking deeper questions and surfacing assumptions. Does “training” mean just skill training (how to do keyboarding and data entry), or does it also include “soft skills” (how to get along in the workplace)? What is “appropriate” training? What is a “good” job? At this stage, these aren’t measurement questions. We’re not asking how we would measure whether or not a person got a good job. We’re asking conceptual and values-based questions: Will the kind of training provided lead to the kind of job desired? Is it enough to give the person keyboarding skills? What if the person is a recent immigrant and speaks English poorly? Does the program intervention need to include language training? What if the trainee uses drugs? Does the program need to include drug treatment? What if the poor person is a single mother with young children? Does the program intervention need to include child care? How will the poor person get to training? Will the program intervention have to include transportation support to be effective? Is it enough to provide training, or will there need to be job placement services? And what about the workplace? If the poor person being trained is African American and the job opportunities are in companies with mostly white employees, will some kind of support be needed in the workplace to create an environment in which this newly trained person can succeed? As the questioning proceeds, the simple intervention above may morph into the more complicated program intervention as depicted in Exhibit 10.1, which presents the program theory of a real program.

The Jargon Challenge: What Are We Talking About?

A proliferation of terms has come into use describing how program activities lead to program outcomes. Some of the language emphasizes elucidating the logic of what the program does, so we have logic models, logical frameworks, and intervention logic. Some focus on theory: program theory, theory-based evaluation, theory-driven evaluation, theory of change, theory of action, and intervention theory. Some approaches emphasize linkages: chain of objectives, outcomes mapping, and impact pathway analysis. Three important distinctions are embedded in these different terms.

(1) Logic modeling versus theory of change. Does the model simply describe a logical sequence or does it also provide an explanation of why that sequence operates as it does? Specifying the causal mechanisms transforms a logic model into a theory of change.

EXHIBIT 10.1 Theory of Change for an Employment Training Program

New Skills Lead to Living Wage Jobs
• Meet employer expectations for hard and soft skills
• Manage stability issues and crises that can affect job performance
• Internalize empowerment skills and use them daily
• Prioritize self-interest and set boundaries with friends and family
• Engage in a healthier lifestyle (exercise, nutrition, health insurance)
• Retain living wage employment
• Achieve self-sufficiency
• Participate in the broader community

Participant Experience

Program Structure
• Highly structured program that supports participant progress
• Consistent expectations, follow-through and consequences based on market expectations

Training
• Training in empowerment skills
• Training in soft skills (time management, goal setting, others)
• Training in hard skills (basic skills, keyboarding, computers)
• Training in job skills (job search skills, other job skills)

Empowerment Helps Participants
• Reconnect with their core value
• Prioritize self-interest
• Regulate emotions
• Take responsibility for self
• Look within for solutions
• Manage core hurts

Participants Learn To
• Develop new soft and hard skills
• Develop new habits
• Set boundaries with family/friends
• Separate from unproductive relationships
• Maintain or develop healthy relationships
• Manage stability issues and crises

Coaching
• One-on-one weekly coaching, encouragement, and support
• Assistance with goal setting and planning
• Manage and reframe participant expectations
• Serve as a role model for the skills being taught
• Reinforce/support the application of empowerment skills

Program Culture
• Safe, respectful learning environment where participants feel comfortable making mistakes, learning new skills, and developing new relationships
• Reinforcement of empowerment skills and principles throughout the entire program
• Program culture that models expectations in the workplace

A logic model only has to be logical and sequential. The logic of a logic model is partially temporal: It is impossible for an effect or outcome to precede its cause. A logic model expresses a sequence in the sense that one thing leads to another. You crawl before you walk before you run is a descriptive logic model. Crawling precedes walking, which precedes running. It becomes a theory of change when you explicitly add the change mechanism. You crawl, and crawling develops the gross-motor skills and body control capabilities that make it possible to walk; you walk, and walking develops the balance, further gross-motor skills, and body control needed to run. Adding the causal mechanism moves the model from program logic to program theory.
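The difference between a descriptive logic model and a theory of change in the crawl, walk, run example can be sketched as a data structure. The sketch below is an assumption-laden illustration, not an established notation: the `Link` class and the `is_theory_of_change` check are hypothetical names, and the rule that every link must name a mechanism is one simple reading of the distinction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    """One step in a sequence: `earlier` leads to `later` (hypothetical type)."""
    earlier: str
    later: str
    mechanism: Optional[str] = None  # why the earlier step produces the later one


def is_theory_of_change(links: List[Link]) -> bool:
    """True only if every link states a causal mechanism (assumed criterion)."""
    return bool(links) and all(link.mechanism for link in links)


# Descriptive logic model: sequence only
logic_model = [Link("crawl", "walk"), Link("walk", "run")]

# Theory of change: the same sequence with the change mechanism made explicit
theory_of_change = [
    Link("crawl", "walk",
         "crawling develops the gross-motor skills and body control needed to walk"),
    Link("walk", "run",
         "walking develops the balance and further body control needed to run"),
]

print(is_theory_of_change(logic_model), is_theory_of_change(theory_of_change))
```

The two lists describe the same sequence; only the second says why each step makes the next one possible.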

(2) A second critical distinction involves the source of the model. The terms program logic or program theory imply that what is being depicted is what people who run the program believe is going on. It is the theory articulated by the program staff, administrators, and funders. In contrast, theory-driven evaluation or theory-based evaluation typically refers to the program as a test of some larger social science theory. Staff in a faith-based initiative may explain a program by saying it puts participants in touch with their inherent spiritual nature; this would be the program’s theory. A social science researcher might look at the same program through the lens of a sociological theory that explains how cohesive groups function to create shared beliefs and norms that determine behavior; that approach would be theory driven. Theory of change can be a hybrid of both program theory and social science theory, and often is, as the idea of theory-based evaluation has evolved over the years (Mason and Barnes 2007; Rogers 2007; Weiss 2007) and come to include both “small theories” that are program specific (Leviton 2007; Lipsey 2007c; Layzer 1996) and the larger theories of which a specific program theory is but one manifestation.

(3) A third distinction concerns the unit of analysis—or we might say, the unit of logic, or the boundaries of the theory of change. In elucidating a program model, the term program is sometimes a discrete local effort, like a local employment training program. That local program has its own logic model and/or program theory that constitutes a specific intervention. But in large organizations like international development agencies or philanthropic foundations, a program can refer to a collection of interventions made up of several projects and grants. For example, The Atlantic Philanthropies as a philanthropic foundation has a Reconciliation and Human Rights strategic focus that consists of three program areas each with several distinct projects, grants, and intervention strategies, some of which fit together into a cluster. The cluster has its own theory of change distinct from but based on the logic models of individual grants. In such settings, one has to be careful to specify the unit of analysis for the theory of change. The language of intervention logic or intervention theory avoids confusion about what the word “program” means by focusing on a specific intervention which might be one strategy within an umbrella program (that has several interventions) or a strategy that cuts across a number of programs (where the overall intervention is a comprehensive, multifaceted, integrated, and omnibus development strategy). A policy or advocacy process can also be the unit of analysis for which one is developing a logic model (Coffman 2007a; Gardner and Geierstanger 2007; Hendricks-Smith 2007; Kay 2007).

The theory of action language comes from action research and organizational development traditions where the focus is typically on some specific solution to a specific problem (the “action”); that action may not be a full-scale intervention, program, policy, or theory—but an action taken within some concrete time period for some specific purpose. Doing something to reduce the dropout problem in a program would involve some theory of action. The theory of action tradition places particular emphasis on distinguishing “espoused theory” (how practitioners explain what they are attempting to do) from “theory-in-use” (what their behavior reveals about what actually guides what they do). The major evaluative thrust of the theory of action framework is helping practitioners examine, reflect on, and deal with the discrepancies between their espoused theory and their theory-in-use (Argyris 1993, 1974; Schön 1987, 1983; Argyris and Schön 1978, 1974). “People do not always behave congruently with their beliefs, values, and attitudes (all part of espoused theories). . . . Although people do not behave congruently with their espoused theories, they do behave congruently with their theories-in-use” (Argyris 1982:85).

In this conundrum of dissonance between stated belief and actual practice lies a golden opportunity for reality testing: the heart of evaluation. Sociologist W. I. Thomas posited in what has become known as Thomas’ Theorem that what is perceived as real is real in its consequences. Espoused theories are what practitioners perceive to be real. Those espoused theories, often implicit and only espoused when asked for and coached into the open, have real consequences for what practitioners do. Elucidating the theory of change held by primary users can help them be more deliberative about what they do and more willing to put their beliefs and assumptions to an empirical test through evaluation. In short, the user-focused approach challenges decision makers, program staff, funders, and other users to engage in reality testing, that is, to test whether what they believe to be true (their espoused theory of action) is what actually occurs (theory-in-use).

Which Term Is Best?

Given all this diversity of and confusion in language, which term is best? Logic model? Theory of change? Intervention model? From a utilization-focused evaluation perspective, that label is best that makes the most sense to primary intended users—the term they resonate to and that has meaning within their context. Here are some examples of how contexts vary.

Program Theory and Logic Model Babel

Confusion reigns in the language describing how program activities lead to program outcomes:

* logic model * logical framework * program logic * program model * intervention logic * intervention model * chain of objectives * outcomes map * impact pathway analysis * program theory * theory-based evaluation * theory-driven evaluation * theory of change * theory of action * intervention theory *

Which term is best? The best designation is the one that makes the most sense to primary intended users—the term they resonate to and that has meaning within their context given the intended uses of the evaluation.

In international settings, logical frameworks or “Logframes” have a long history of use by government aid agencies (Norwegian Agency for Development Cooperation 1999). United Way of America has promoted logic models as a way for nonprofit agencies to present funding proposals (United Way 1996). Large-scale community development initiatives have adopted the “theory of change” language due in large part to an influential article by Carol Weiss (1995) widely disseminated by The Aspen Institute (Connell et al. 1995). Program theory has been a target for “advancement” among evaluators for more than two decades (Bickman 1994, 1990). Program logic and program theory are familiar terms in Australia and New Zealand due to the influential works of Bryan Lenne (1987), Sue Funnell (2005, 2000, 1997), and Patricia Rogers (2008, 2005b, 2005c, 2003, 2000a, 2000b). Theory-driven evaluation has been widely promoted by Huey-Tsyh Chen (2005a, 2005b, 2004, 1990) and, like its cousin, theory-based evaluation (Weiss 2007, 2000, 1997; Birckmayer and Weiss 2000), is a label that plays well in academic settings, theory being much revered in universities. Stewart Donaldson (2007), Director of the Institute of Organizational and Program Evaluation Research at Claremont Graduate University, argues that systematic attention to program theory in evaluation elevates evaluation to the status and prestige of science, what he calls Program Theory-Driven Evaluation Science. From a utilization-focused perspective, the choice of label depends on the purpose of the conceptual work and the preferences of primary intended users. As Rogers (2005c) has observed,

Program logic is sometimes used interchangeably with program theory. . . . In many cases, the choice of term is based on local responses to the words theory and logic (each of which can be seen as unpalatable) and on the terms used in the specific texts used by the evaluators. (P. 339)

The challenge, then, is to use terms that have meaning within a particular context and tradition. We’ll now look at some of these approaches more closely.

The Logic Model Option in Evaluation: Constructing a Means-Ends Hierarchy

Causation. The relation between mosquitos and mosquito bites.

—Michael Scriven (1991b:77)

A theory links means and ends. The construction of a means-ends hierarchy for a program constitutes a comprehensive description of the program’s model. For example, in his classic work on evaluation, Suchman (1967) recommended building a chain of objectives by trichotomizing objectives into immediate, intermediate, and ultimate goals. The linkages between these levels make up a continuous series of actions wherein immediate objectives (focused on implementation) logically precede intermediate goals (short-term outcomes) and therefore must be accomplished before higher-level goals (long-term impacts). Any given objective in the chain is the outcome of the successful attainment of the preceding objective and, in turn, is a precondition to attainment of the next higher objective.

Immediate goals refer to the results of the specific act with which one is momentarily concerned, such as the formation of an obesity club; the intermediate goals push ahead toward the accomplishment of the specific act, such as the actual reduction in weight of club members; the ultimate goal then examines the effect of achieving the intermediate goal upon the health status of the members, such as reduction in the incidence of heart disease. (Suchman 1967:51–52)

The means-ends hierarchy for a program often has many more than three links. In Chapter 7, I presented the mission statement, goals, and objectives of the Minnesota Comprehensive Epilepsy Program. One of the goals was to conduct high-quality research on epilepsy. Exhibit 10.2 presents the chain of objectives for that goal.

The full chain of objectives that links inputs to activities, activities to immediate outputs, immediate outputs to intermediate outcomes, and intermediate outcomes to ultimate goals constitutes a program’s logical model. Any particular paired linkage in the theory displays an action and reaction: a hypothesized cause and effect. As one constructs a hierarchical/sequential model, it becomes clear that there is only a relative distinction between ends and means: “Any end or goal can be seen as a means to another goal, [and] one is free to enter the ‘hierarchy of means and ends’ at any point” (Perrow 1968:307). In utilization-focused evaluation, the decision about where to enter the means-ends hierarchy for a particular evaluation is made on the basis of what information would be most useful to the primary intended evaluation users. In other words, a formative evaluation might focus on the connection between inputs and activities (an implementation evaluation) and not devote resources to measuring outcomes higher up in the hierarchy until implementation was ensured. Elucidating the entire hierarchy does not incur an obligation to evaluate every linkage in the hierarchy. The means-ends hierarchy displays a series of choices for more focused evaluations while also establishing a context for such narrow efforts. Suchman (1967:55) used the example of a health education campaign to show how a means-ends hierarchy can be stated in terms of a series of measures or evaluation findings. See Exhibit 10.3.
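The idea that any paired linkage is a hypothesized cause and effect, and that an evaluation can enter the hierarchy at any point, can be illustrated with a short sketch. It uses the epilepsy research chain of objectives from Exhibit 10.2, listed here from the first step to the ultimate goal. The function names `linkages` and `focused_linkages` are hypothetical, introduced only for this illustration.

```python
# Chain of objectives from Exhibit 10.2, ordered from first means to ultimate end
CHAIN = [
    "Identify major gaps in knowledge concerning causes and treatment of epilepsy",
    "Identify and generate research designs to close knowledge gaps",
    "Assemble necessary resources to establish a research program",
    "Establish a program of high-quality research on epilepsy",
    "Produce high-quality research findings on epilepsy",
    "Publish findings in scholarly journals",
    "Disseminate findings to medical practitioners",
    "Increase physicians' knowledge of better medical treatment",
    "Provide better medical treatment for people with epilepsy",
    "People with epilepsy lead healthy, productive lives",
]


def linkages(chain):
    """Pair each objective (means) with the next higher objective (end)."""
    return list(zip(chain, chain[1:]))


def focused_linkages(chain, start, stop):
    """Linkages for a narrower evaluation covering objectives start..stop
    (0-based, inclusive); the rest of the hierarchy remains as context."""
    return linkages(chain[start:stop + 1])


# Every objective except the first and last is both an end and a means
all_links = linkages(CHAIN)

# A more focused evaluation might examine only the early, implementation links
early_links = focused_linkages(CHAIN, 0, 2)
for means, end in early_links:
    print(f"{means}  ->  {end}")
```

Laying out all nine linkages does not commit the evaluation to testing each one; selecting a sub-range is the choice the primary intended users make.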

BASIC LOGIC MODEL

Inputs/Resources → Activities/Processes → Outputs/Products → Short-Term Outcome → Longer-Term Impacts

Logic Modeling Resources

Centers for Disease Control Program Evaluation Resources
http://www.cdc.gov/healthyyouth/evaluation/resources.htm#4

European AID. 2005. Evaluation Tools. Brussels: European Commission.
http://ec.europa.eu/europeaid/evaluation/methodology/tools/too_en.htm

Frechtling. 2007. Logic Modeling Methods in Program Evaluation.

Kellogg Foundation. 2001. Logic Model Development Guide: Logic Models to Bring Together Planning, Evaluation & Action.
http://www.wkkf.org/Pubs/Tools/Evaluation/Pub3669.pdf

Rogers. 2005b. Logic Model.

United Way. 1996. Measuring Program Outcomes: A Practical Approach.
http://national.unitedway.org/outcomes/resources/mpo/model.cfm

University of Wisconsin Cooperative Extension. 2007. Program Development and Evaluation.
http://www.uwex.edu/ces/pdande/evaluation/evallogicmodel.html

EXHIBIT 10.2 Epilepsy Program Logic Model

Mission and Goals of the Epilepsy Program

Program Mission: To improve the lives of people with epilepsy through research
Program Goal: To publish high-quality, scholarly research on epilepsy
Program Objective: To conduct research on neurological, pharmacological, epidemiological, and social psychological aspects of epilepsy

Epilepsy Research Goal Chain of Objectives

1. People with epilepsy lead healthy, productive lives

2. Provide better medical treatment for people with epilepsy

3. Increase physicians’ knowledge of better medical treatment for epileptics

4. Disseminate findings to medical practitioners

5. Publish findings in scholarly journals

6. Produce high-quality research findings on epilepsy

7. Establish a program of high-quality research on epilepsy

8. Assemble necessary resources (personnel, finances, facilities) to establish a research program

9. Identify and generate research designs to close knowledge gaps

10. Identify major gaps in knowledge concerning causes and treatment of epilepsy

EXHIBIT 10.3 Logic Model Hierarchy of Evaluation Measures for a Health Education Campaign

Reduction in morbidity and mortality

Proportion of people in the target population who meet prescribed standards of behavior

Number whose behaviors change

Number whose opinions change

Number who learn the facts

Number who read it

Number of people who receive the literature

Amount of literature distributed

Number of pieces of literature available for distribution

Pretest literature by readability criteria

SOURCE: Adapted from Suchman 1967:55.

Testing a Logic Model: A Policy Implementation Example

Let me offer a simple example of testing a logic model. A State Department of Energy allocated conservation funds through 10 regional districts. An evaluation was commissioned by the department to assess the impact of local involvement in priority setting. State and regional officials articulated the following fair and equitable logic model of decision making:

1. State officials establish funding targets for energy proposals from districts and rules for submitting proposals.

2. District advisory groups assess each district’s energy needs with broad citizen input and involvement.

3. District advisory groups develop funding proposals, based on the needs assessments, that meet their district’s state target and follow the state’s rules.

4. The state approves the budgets based on the merit of the proposals within the guidelines, rules, and targets provided.

5. Expected results: (a) Approved funds equal original targets and (b) everyone perceives the funding as fair and equitable.

In short, the espoused logic model was that decisions would be made fairly based on explicit and transparent procedures, guidelines, and rules. The data showed this to be the case in only 6 of the 10 districts. In the other 4 districts, proposals from the districts exceeded the assigned target amounts by 30 percent to 55 percent; for example, one district, assigned a target of $100 million by the state, submitted a proposal for $140 million (despite a “rule” that said proposals could not exceed targets). Moreover, the final, approved budgets exceeded the original targets by 20 percent to 40 percent. The district with a target of $100 million and a proposal for $140 million received $120 million. Four of the districts, then, were not engaged in a by-the-book equitable process; rather, their process was negotiated, personal, and political—and subsequently perceived as unfair. Needless to say, when these data were presented, the six districts that followed the guidelines and played the funding game by what they thought were uniform rules—the districts whose proposals equaled their assigned targets—were outraged. Testing the espoused theory of fairness and uniform rules revealed that the reality (theory-in-use) in four districts did not match the espoused ideal in ways that had significant consequences for all concerned.

This is a simple, commonsense example of testing a logic model at the policy level. Nothing elegant. No academic trappings. The espoused model is a straightforward articulation of what all agreed was supposed to happen in the process to achieve desired outcomes. The linkages between processes and outcomes are made explicit. Evaluative data then revealed what actually happened—and where what actually happened departed from what was supposed to happen and the consequences of that discrepancy. At the district level, each district would have its own model of how funds were to be used and what those funds were supposed to accomplish.

Three Approaches to Program Theory

A logic model only has to be logical and sequential. Adding specification of the causal mechanism moves the model from program logic to program theory. In this section, we will look at three major approaches to developing program theory for evaluation use:

1. The deductive approach: drawing on scholarly theories from the academic literature

2. The inductive approach: doing fieldwork on a program to generate grounded theory, for example, as part of an evaluability assessment process

3. The user-focused approach: working with intended users to extract and make explicit their implicit theory of action

The deductive approach draws on dominant theoretical traditions in specific scholarly disciplines to construct models of the relationship between program treatments and outcomes. For example, an evaluation of whether a graduate school teaches students to think critically could be based on the theoretical perspective of a phenomenography of adult critical reflection, as articulated by the Distinguished Professor of Education Stephen Brookfield (1994), an approach that emphasizes the visceral and emotional dimensions of critical thought as opposed to purely intellectual, cognitive, and skills emphases. Illustrations of the deductive approach to evaluation are chronicled in Chen (2004), Rossi and Freeman (1993), Lipsey and Pollard (1989), and Chen and Rossi (1989, 1987). Testing social science theories may be a by-product of an evaluation in which the primary purpose is knowledge generation (see Chapter 4). However, the temptation in the deductive approach is to make the study more research than evaluation, that is, to let the scholarly contribution and theory testing take over the evaluation, making it useful academically but not necessarily useful to practitioners and policymakers.

The inductive approach involves the evaluator in doing fieldwork to generate program theory. Staying with the example of evaluating whether graduate students learn to think critically, the inductive approach would involve assessing student work, observing students in class, and interviewing students and professors to determine what model of education undergirds efforts to impart critical thinking skills. Such an effort could be done as an evaluation study unto itself, for example, as part of an early evaluability assessment process, or it could be done in conjunction with a deductive


The Logical Framework Matrix

A logical framework (Logframe) is a matrix that specifies each step in the chain of objectives and requires for each step that a target result be specified, that the data source be specified for measuring the result, and that any critical assumptions be stated. The Logical Framework Approach (IFAD 2002; NORAD 1999; Sartorius 1996, 1991) offers a format for connecting levels of impact with evidence. Used widely by international development agencies as a comprehensive map in designing projects, the framework begins by requiring specification of the overall goal and purposes of the project. Short-term outputs are linked logically to those purposes, and activities are identified that are expected to produce the outputs. Careful attention must be paid to the language of this model because what the logical framework calls a goal is what other models more commonly call mission; “purposes” in this approach are similar to objectives or outcomes; and outputs are short-term, end-of-project deliverables. For every goal, purpose, output, and activity, the framework requires specification of objectively verifiable indicators, means of verification (types of data), and important assumptions about the linkage between activities and outputs, outputs to purposes, and purposes to goals. In their very practical book on RealWorld Evaluation, Bamberger, Rugh, and Mabry (2006) discuss how to integrate logical framework and program theory approaches with excellent examples from their international development experiences.
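The matrix described above can be sketched as a small data structure. This is an illustration only: the class name `LogframeRow`, the helper `missing_cells`, and the two example rows are hypothetical and are not drawn from IFAD, NORAD, or any agency template. The sketch simply encodes the requirement that every level carry indicators, means of verification, and assumptions.

```python
from dataclasses import dataclass, field

# The four levels named in the text, from broadest to most specific.
LEVELS = ("goal", "purpose", "output", "activity")


@dataclass
class LogframeRow:
    level: str
    narrative: str
    indicators: list = field(default_factory=list)
    means_of_verification: list = field(default_factory=list)
    assumptions: list = field(default_factory=list)

    def __post_init__(self):
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")


def missing_cells(rows):
    """Return (level, column) pairs for every required cell left empty."""
    gaps = []
    for row in rows:
        for column in ("indicators", "means_of_verification", "assumptions"):
            if not getattr(row, column):
                gaps.append((row.level, column))
    return gaps


# Hypothetical rows for an imagined health education project.
rows = [
    LogframeRow("goal", "Improved community health",
                ["Morbidity rate"], ["Health ministry statistics"],
                ["Other health services remain available"]),
    LogframeRow("output", "Health literature distributed",
                ["Number of pieces distributed"], ["Distribution records"],
                []),  # assumption deliberately left blank
]

gaps = missing_cells(rows)
print(gaps)
```

A check like this only confirms that each cell is filled in; whether the stated assumptions are plausible remains a matter for the evaluator and intended users.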


effort based on a literature review. The product of the inductive approach, and therefore a major product of the evaluation, would be an empirically derived theoretical model of the relationship between program activities and outcomes framed in terms of important contextual factors.

User-Focused Theory of Change Approach

In neither the deductive nor the inductive approach to program theory does the evaluator have to engage with key stakeholders. That changes in the user-focused approach, in which the evaluator’s task is to help intended users, especially program personnel, articulate their operating theory. Continuing with the critical thinking example, this would mean bringing together students and professors to make explicit their educational assumptions and generate a program theory model that could then be tested as part of the evaluation.

Facilitating the program theory articulation process involves working with those who are knowledgeable about the program to construct a flowchart of what happens from the time a person enters a program to the time they leave. How are people recruited or selected into the program? What’s their baseline situation when they enter? How are they oriented to the program once they enter? What are the early activities they engage in? How much are they expected to participate? What are they supposed to be doing, learning, acquiring, or changing during those early activities? During later activities? What’s the sequence or stages of participants’ experiences? What happens as the program approaches the end? What changes should have occurred in participants by the time they reach the end of the program? What mechanisms explain why these changes take place? To what extent are these changes expected to be sustained? What additional outcomes are expected to occur after leaving the program (e.g., keeping a job or staying off drugs)?

The primary focus of utilization-focused evaluation is testing practitioner theories about why they do what they do and what they think results from what they do. Utilization-focused evaluation involves primary intended users in specifying the program’s theory and in deciding how much attention to give to testing the theory generated, including how much to draw on social science theory as a framework for the evaluation (Patton 1989).

A Menu of Theory-Based Approaches

Each of the three approaches to program theory (deductive, inductive, and user-focused) has advantages and disadvantages. These are reviewed in Menu 10.1. The strategic calculations a utilization-focused evaluator must make include determining how useful it will be to spend time and effort elucidating a theory of change (or more than one where different perspectives exist, which is common); how to keep theory generation from becoming esoteric and overly academic; how formal to be in the process; and what combinations of the three approaches, or relative emphasis, should be attempted. Factors to consider in making these calculations will be clearer after some more examples, which follow.

Theory of Change and Evaluation: Getting at Causal Mechanisms and Assumptions

The purpose of thoroughly delineating a program’s theory of change is to assist practitioners in making explicit their assumptions about the linkages between inputs, activities, immediate outputs,




MENU 10.1
Approaches to Generating Program Theory

User-focused approach: working with intended users to extract and specify their implicit theory of action to make it explicit

Potential Advantages
• Intended users understand the theory of action
• Intended users own the theory of action

Potential Disadvantages
• As users struggle to articulate their theory, they may be defensive
• Formal explicit model may not reflect program realities

Pitfalls to Avoid
• Don’t oversimplify to the point of esoteric meaninglessness in an effort to manage conflict or varying perceptions
• Don’t force articulation of a single theory. Different users may well have different theories of action

Inductive approach: doing fieldwork on a program to generate grounded theory

Potential Advantages
• Theory grounded in real-world practice
• High relevance because theory is generated from actual program activities and observed outcomes
• Can focus an evaluability assessment effort

Potential Disadvantages
• Fieldwork takes time and resources for evaluation and program
• Likely that different program people operate with different theories in large, multilevel, or complex programs

Pitfalls to Avoid
• Don’t force a single theory or model on the program where multiple theories of action are operating
• Don’t let generating theory take on a life of its own and become a higher priority than generating useful results

Deductive approach: drawing on scholarly theories from the academic literature

Potential Advantages
• Draws on existing knowledge and literature
• High academic credibility
• Connects to larger issues

Potential Disadvantages
• May not be relevant to specific program
• May feel esoteric to practitioners
• Literature search takes time and resources

Pitfalls to Avoid
• Don’t force program into a theory pigeonhole
• Don’t let theory testing become higher priority than generating useful results

Combining approaches

Potential Advantages
• Uses strengths of each approach
• Provides diverse perspectives

Potential Disadvantages
• Costly, time-consuming
• May lead to conflicting results

Pitfalls to Avoid
• Don’t let one approach trump another; treat each on its merits, fairly and in balance


intermediate outcomes, and ultimate goals. Suchman (1967) called beliefs about cause-effect relationships the program’s validity assumptions. For example, many education programs are built on the validity assumptions that (1) new information leads to attitude change and (2) attitude change affects behavior. These assumptions are testable. Does new knowledge change attitudes? Do changed attitudes lead to changed behaviors? Carol Weiss (2000) has commented on the widespread nature of these assumptions:

Many programs seem to assume that providing information to program participants will lead to a change in their knowledge, and increased knowledge will lead to positive change in behavior. This theory is the basis for a wide range of programs, including those that aim to reduce the use of drugs, prevent unwanted pregnancies, improve patients’ adherence to medical regimens, and so forth. Program people assume that if you tell participants about the evil effects of illegal drugs, the difficult long-term consequences of unwed pregnancies, and the benefits of complying with physician orders, they will become more conscious of consequences, think more carefully before embarking on dangerous courses of action, and eventually behave in more socially acceptable ways.

The theory seems commonsensical. Social scientists, and many program people, know that it is too simplistic. Much research and evaluation has cast doubt on its universal applicability. . . . So much effort is expended in providing information in an attempt to change behavior that careful investigation of this theory is warranted. (Pp. 40–41)

Knowing this, when an evaluator encounters a program theory that posits that information will produce knowledge change, and knowledge change will produce behavior change, it is appropriate to bring to the attention of those involved the


Expert Advice from Carol Weiss on Theory-Based Evaluation

• When social science provides theory and concepts that ground and support local formulations, it can be of great evaluative value. The evaluator should bring her knowledge of the social science literature to bear on the evaluation at hand.

• When a number of different assumptions are jostling for priority, a theory-based evaluation is wise to include multiple theories. . . . But the more theories that are tracked, the more complex and expensive evaluation. Choices have to be made.

• Select theories to test on the following criteria:

  ◦ The first criterion is the beliefs of the people associated with the program, primarily the designers and developers who planned the program, the administrators who manage it, and the practitioners who carry it out on a daily basis. Also important may be the beliefs of the sponsors whose money funds the program and the clients who receive the services of the program. What do these groups assume are the pathways to good outcomes?

  ◦ A second criterion is plausibility. . . . Can the program actually do the things the theory assumes, and will the clients be likely to respond in the expected fashion?

  ◦ A third criterion is lack of knowledge in the program field. This allows the evaluation to contribute knowledge to the field.

  ◦ A final criterion for choosing which theories to examine in a theory-based evaluation is the centrality of the theory to the program. Some theories are so essential to the operation of the program that no matter what else happens, the program’s success hinges on the viability of this particular theory.

SOURCE: Weiss (2000:38–41).


substantial evidence that this model doesn’t work. In being active-reactive-interactive-adaptive when working with primary intended users, the evaluator can and should bring social science and evaluation knowledge to the attention of those with whom they’re working.

Consider this example. The World Bank provided major funding for a program in Bangladesh aimed at improving maternal and child health and nutrition. The theory of change was the classic one we are reviewing here: Information leads to knowledge change, knowledge change leads to practice change. It didn’t work. Women in extreme poverty did not have the resources to follow the desired behaviors, even if they were inclined to do so (in other words, they could not afford the recommended foods). Moreover, they live in a social system where what they eat is heavily influenced, even determined, by their mothers-in-law and husbands. The World Bank commissioned an impact evaluation of the project, which documented this substantial “knowledge-practice gap” and found that the program was ineffective in closing the gap. “All forms of knowledge transmitted by the project suffer from a knowledge-practice gap, so attention needs to be paid to both the resource constraints that create this gap and transmitting knowledge to other key actors: mothers-in-law and husbands” (World Bank 2005:43).

The larger question is, “Why are those designing interventions aimed at women in extreme poverty still operating on a theory of change that has been discredited time and time again?” We will return to this question later in this chapter, when we discuss bringing systems thinking to bear on program theory. For the moment, I want to use this example to introduce the critical role evaluators play in helping surface and then test a program’s causal assumptions.

Identifying Critical Assumptions

Validity assumptions are the presumed causal mechanisms that connect steps in a logic model, turning it into a theory of change. The proposition that gaining knowledge will lead to behavior change is undergirded by a validity assumption, namely, that the reason people aren’t behaving in the desired manner is that they lack knowledge about what to do. In the Bangladesh program, the assumption ran as follows: Poor women don’t eat the right foods when they are pregnant because they don’t know enough about proper nutrition. Teach them about proper nutrition and they will eat the right foods. It turned out that they gained the knowledge but didn’t change their behavior. The validity assumption proved false, or at least insufficient. Knowledge of nutrition may be a necessary but not sufficient condition for proper eating.

As validity assumptions are articulated in a means-ends hierarchy, the evaluator can work with intended users to focus the evaluation on those critical linkages where information is most needed at that particular point in the life of the program. It is seldom possible or useful to test all the validity assumptions or evaluate all the means-ends linkages in a program’s theory of action. The focus should be on testing the validity of critical assumptions. In a utilization-focused evaluation, the evaluator works with the primary intended users to identify the critical validity assumptions where reduction of uncertainty about causal linkages could make the most difference.

While evaluators can and should bring their own knowledge of social science to bear in interactions with primary intended users, the evaluator’s beliefs about critical assumptions are ultimately less important than what staff and decision makers believe. An evaluator can often have greater impact by helping program staff and decision makers empirically test their own causal hypotheses than by telling them such causal hypotheses are nonsense. This means working with them where they are. So despite my conviction that knowledge change alone seldom produces behavior change, I still find myself helping young program staff rediscover that lesson for themselves. Not only does the wheel have to be re-created from time to time, its efficacy has to be restudied and reevaluated. The evaluator’s certain belief that square wheels are less efficacious than round ones may have little impact on those who believe that square wheels are effective. The evaluator’s task is to delineate the belief in the square wheel, share other research on square wheels when available, and, if they remain committed to a square wheel design, assist the true believers in designing an evaluation that will permit them to test for themselves how well it works.

I hasten to add that this does not mean that the evaluator is passive. In the active-reactive-interactive-adaptive process of negotiating the evaluation’s focus and design, the evaluation facilitator can suggest alternative assumptions and theories to test, but first priority goes to evaluation of validity assumptions held by primary intended users.

Filling in the Conceptual Gaps and Testing the Reasonableness of Program Theories

Helping stakeholders examine conceptual gaps in their theory of change is another task in building program theory and making it evaluatable. In critiquing a famous prison reform experiment, Rutman (1977) has argued that the idea of using prison guards as counselors to inmates ought never have been evaluated (Ward, Kassebaum, and Wilner 1971) because, on the face of it, the idea is nonsense. Why would anyone ever believe that prison guards could also be inmate counselors? But clearly, whether they should have or


Theory of Change

A Theory of Change defines all building blocks required to bring about a given long-term goal. This set of connected building blocks (interchangeably referred to as outcomes, results, accomplishments, or preconditions) is depicted on a map known as a pathway of change/change framework, which is a graphic representation of the change process.

Built around the pathway of change, a Theory of Change describes the types of interventions (a single program or a comprehensive community initiative) that bring about the outcomes depicted in the pathway of a change map. Each outcome in the pathway of change is tied to an intervention, revealing the often complex web of activity that is required to bring about change.

A Theory of Change would not be complete without an articulation of the assumptions that stakeholders use to explain the change process represented by the change framework. Assumptions explain both the connections between early, intermediate, and long-term outcomes and the expectations about how and why proposed interventions will bring them about. Often, assumptions are supported by research, strengthening the case to be made about the plausibility of theory and the likelihood that stated goals will be accomplished.

Stakeholders value theories of change as part of program planning and evaluation because they create a commonly understood vision of the long-term goals, how they will be reached, and what will be used to measure progress along the way.

SOURCE: ActKnowledge and the Aspen Institute Roundtable on Community Change (2007) www.theoryofchange.org.

See also Anderson (2005).


not, some people did believe that the program would work. Without reaching an evaluation conclusion prior to gathering data, the evaluator can begin by filling in the conceptual gaps in this program theory so that critical validity assumptions can be identified and examined. For example, is some kind of screening and selection part of the design so that the refined theory is that only certain kinds of guards with certain characteristics and in certain roles can serve as counselors to particular kinds of inmates? And what kind of training program for guards is planned? In what ways are guards supposed to be changed during such training? How will changed guard behavior be monitored and rewarded? The first critical assumptions to be evaluated may be whether prison guards can be recruited and trained to exhibit desired counselor attitudes and behaviors. Whether prison guards can learn and practice human relations skills can be evaluated without ever implementing a full-blown program.

Filling in the gaps in the program’s theory of change goes to the heart of the implementation question: What series of activities must take place before there is reason even to hope that the desired outcomes will result? I once reviewed a logic model for an after-school program that would provide crafts, arts, and sports activities for middle-school students once a week for 2 hours for one semester, a total of 30 contact hours. The expected outcome was “increased self-esteem.” On the face of it, is this a reasonable outcome? What are the causal mechanisms that link 30 hours of after-school group activities to increased self-esteem? Or consider a “dress-for-success” program that provided appropriate dress clothes for poor people to wear to job interviews. The program’s stated outcome was the rather modest goal that the appearance of job applicants would make a positive impression. To justify funding the program, the funder wanted the program to change the outcome to getting a job. Now the clothes might help in that regard, but would, at best, be a minor factor. The “dress-for-success” program appropriately resisted being evaluated on the criterion of whether those to whom they provided clothes got a job.

The logic of a chain of objectives is that if activities and objectives lower in the means-ends hierarchy will not be achieved or cannot be implemented, then evaluation of ultimate outcomes is problematic.

There are only two ways one can move up the scale of objectives in an evaluation: (a) by proving the intervening assumptions through research, that is, changing an assumption to a fact, or (b) by assuming their validity without full research proof. When the former is possible, we can then interpret our success in meeting a lower-level objective as automatic progress toward a higher one. (Suchman 1967:57)

This has important implications for what kind of follow-up is needed in an evaluation to determine impact. Research shows that children immunized against polio do not get polio. The causal connection between the immunization and immunity against polio is established. Therefore, the evaluation can stop at determining that children have been immunized and confidently calculate how many cases of polio have been prevented based on epidemiological research. The evaluation design does not have to include follow-up to determine whether immunized children get polio. That question has been settled by research.
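Suchman’s rule for moving up the scale of objectives can be sketched as a short routine. This is an illustrative reading, not a method given in the source: the `Link` class, the function name, and the objective labels are assumptions, and each link is marked proven only where the text says research has settled the question.

```python
from dataclasses import dataclass


@dataclass
class Link:
    lower: str    # lower-level objective
    higher: str   # objective it is presumed to lead to
    proven: bool  # True if research has turned the assumption into a fact


def highest_inferable_objective(measured: str, links: list) -> str:
    """Walk up the chain from the measured objective through proven links only."""
    by_lower = {link.lower: link for link in links}
    level = measured
    while level in by_lower and by_lower[level].proven:
        level = by_lower[level].higher
    return level


# Polio example from the text: the causal connection is established by research.
immunization_chain = [
    Link("children immunized", "children immune to polio", proven=True),
    Link("children immune to polio", "polio cases prevented", proven=True),
]

# Knowledge-to-behavior link, which the text describes as repeatedly discredited.
education_chain = [
    Link("knowledge gained", "behavior changed", proven=False),
]

polio_result = highest_inferable_objective("children immunized", immunization_chain)
education_result = highest_inferable_objective("knowledge gained", education_chain)
print(polio_result)
print(education_result)
```

Where the walk stops short of the ultimate objective, the evaluation design needs follow-up data at the next level rather than inference from the level already measured.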

One important reason for testing critical validity assumptions is that some findings are counterintuitive. The Bangladesh maternal and child nutrition program provides an excellent example of the interface between research and evaluation. The program theory posited that proper nutrition for women during pregnancy would reduce the incidence of low birth weight babies. It



seems commonsensical that the proper focus for maternal health would be on nutrition during the pregnancy. The evaluation findings questioned this assumption, and the interpretation drew on research showing that women’s pre-pregnancy weight was more predictive of babies’ birth weight than weight gain during pregnancy. The evaluation concluded,

Supplementary feeding for pregnant women appears to be a flawed approach on two grounds: (1) the pregnancy weight gain achieved is mostly too small to have a noticeable impact on birth weight; and (2) it is pre-pregnancy weight that evidence suggests to be the most important determinant of birth weight. . . . The fact that it is pre-pregnancy weight that matters suggests that a different approach altogether ought perhaps be considered, such as school feeding programs or targeting adolescent females in poorer areas. (World Bank 2005:43)

However, this evaluation finding appears to come with its own assumption, namely, that there are not sufficient resources to provide needed food and nutritional supplements to impoverished women both before and during pregnancy. By framing the evaluation conclusion as a choice between providing nutrition before pregnancy or during pregnancy, the evaluator has limited the policy and programming options. This illustrates how evaluators’ own theories of change and assumptions come into play and need to be made explicit and questioned.

Using the Theory of Change to Focus the Evaluation: The New School Case

Once an espoused theory of change is delineated, the issue of evaluation focus remains. This involves more than mechanically evaluating lower-order validity assumptions and then moving up the hierarchy. Not all linkages in the hierarchy are amenable to testing; different causal linkages require different resources for evaluation; data-gathering strategies vary for different objectives. In a summative evaluation, the focus will be on outcomes attainment and causal attribution. For formative evaluation, the most important factor is determining what information would be most useful at a particular point in time. This means identifying those targets of opportunity where additional information could make a difference to the direction of incremental, problem-oriented, program decision making. Having information about and answers to those select questions can make a difference in what is done in the program. Here’s an example.

The New School of Behavioral Studies in Education, University of North Dakota, was established to support educational innovations that emphasized individualized instruction, better teacher-pupil relationships, and an interdisciplinary curriculum. The New School established a master’s degree, teaching-intern program in which interns replaced teachers without degrees so that the latter could return to the university to complete their baccalaureates. The cooperating school districts released those teachers without degrees who volunteered to return to college and accepted the master’s degree interns in their place. Over 4 years, the New School placed 293 interns in 48 school districts and 75 elementary schools, both public and parochial. The school districts that cooperated with the New School in the intern program contained nearly one third of the state’s elementary school children.

The Dean of the New School formed a task force of teachers, professors, students, parents, and administrators to evaluate the program. We constructed the theory of change shown in Exhibit 10.4. The objectives stated in the first column are a far cry from being clear, specific, and measurable, but they were quite adequate for discussions aimed at focusing the evaluation question. The second



EXHIBIT 10.4
The New School Theory of Action: A Hierarchy of Objectives, Validity Assumption Linkages, and Evaluation Criteria

Hierarchy of Objectives

I. Ultimate Objectives

1. Prepare children to live full, rich, satisfying lives as adults.

2. Meet the affective and cognitive needs of individual children in North Dakota and the United States.

3. Facilitate and legitimize the establishment and maintenance of a larger number of more open classrooms in North Dakota and the United States.

II. Intermediate Objectives

4. Provide parents and administrators in North Dakota with a firsthand demonstration of the advantages of open education.

5. Provide teachers and teachers-in-training with a one-year classroom experience in conducting an open classroom.

III. Immediate Objectives

6. Provide teachers and teachers-in-training with a summer program in how to conduct open classrooms.

7. Provide teachers and teachers-in-training with a personalized and individualized learning experience in an open learning environment.

Causal Assumption Linkages

• Children whose affective and cognitive needs are met will lead fuller, richer, more satisfying lives as adults.

• More open classrooms will better meet the affective and cognitive needs of individual children.

• Parents and administrators will favor and expand open education once they have experienced it firsthand.

• Teachers who have experienced the New School summer program can and will conduct open classrooms during the following intern year that are visible to local parents and administrators.

• Teachers who have experienced the summer program can and will conduct open classrooms.

• To learn about open education, it is best to experience it. Teachers teach the way they are taught.

Evaluative Criteria

1. Longitudinal measures of child and adult satisfaction, happiness, and success.

2. Measures of student affective and cognitive growth in open and traditional schools.

3. Measures of increases in the number of open classrooms in North Dakota and the United States over time and measures of the influence of the New School on the number of open classrooms.

4. Measures of parent and administrator attitudes toward New School classrooms and open education, and measures and analysis of the factors affecting their attitudes.

5. Measures of the degree of openness of New School teaching intern classrooms and the factors affecting the degree of openness of these classrooms.

6. Measures of teacher attitudes, teacher understanding, and teacher competency before and after the New School Program.

7. Measures of the degree to which the New School training program is individualized and personalized, and measures of the cognitive and affective growth of teachers in the New School Program.

NOTE: The validity assumptions (middle column) link objectives (left column). Arrows indicate to which objectives the assumptions apply.



column lists validity assumptions underlying each linkage in the theory of action. The third column shows the measures that could be used to evaluate objectives at any level in the hierarchy. When the Evaluation Task Force discussed the program theory, members decided they already had sufficient contact with the summer program to assess the degree to which immediate objectives were being met. With regard to the ultimate objectives, the task force members thought it was premature to evaluate the ultimate outcomes of open education (Objectives 1 and 2), nor could they do much with information about the growth of the open education movement (Objective 3). However, a number of critical uncertainties surfaced at the level of intermediate objectives. Once students left the summer program for the 1-year internships, program staff members were unable to carefully and regularly monitor intern classrooms. They didn’t know what variations existed in the openness of the classrooms, nor did they have reliable information about how local parents and administrators were reacting to intern classrooms. Those objectives were prime targets for formative evaluation focusing on three questions: (1) To what extent are summer trainees conducting open classrooms during the regular year? (2) What factors are related to variations in classroom “openness”? (3) What is the relationship between variations in classroom openness and parent/administrator reactions to intern classrooms?

At the outset, nothing precluded evaluation at any of the seven levels in the hierarchy of objectives. There was serious discussion of all levels and alternative foci. In terms of the educational literature, the issue of the outcomes of open education could be considered most important; in terms of university operations, the summer program would have been the appropriate focus; but in terms of the information needs of the primary decision makers and primary intended users on the task force, evaluation of the intermediate objectives had the highest potential for generating useful, formative information.

Theory Informing Practice, Practice Informing Theory

Comparing Program Theories

Much evaluation involves comparing different programs to determine which is more effective or efficient. Evaluations can be designed to compare the effectiveness of two or more programs with the same goal, but if those goals do not bear the same importance in the two programs’ theories, the comparisons may be misleading. As part of undertaking a comparative evaluation, it is useful to compare program theories in order to understand the extent to which apparently identical or similarly labeled programs are in fact comparable.

Programs with different intended outcomes cannot be fairly compared to each other on the same outcomes basis. Teacher centers established to support staff development and resource support for school teachers provide an example. The U.S. Office of Education proposed that teacher centers be evaluated according to a single set of universal outcomes. But the evaluation found that teacher centers throughout the country varied substantially in both program activities and goals. Exhibit 10.5 describes three types of teacher centers (behavioral, humanistic, and developmental) and summarizes the variations among these types of centers.

Different teacher centers were trying to accomplish different outcomes, so determining which one was most effective became problematic. Evaluation could help determine the extent to which outcomes have been attained for each specific program, but empirical data could not determine which outcome was most desirable. That is a values question. An evaluation facilitator can help users clarify their value premises, but because the three teacher-center models were different, evaluation criteria for effectiveness varied for each type. In effect, three quite different theories of teacher development were operating in quite different educational environments. Attention to divergent theories of action helped avoid inappropriate comparisons and reframed the evaluation question from Which model is best? to What are the strengths and weaknesses of each approach, and which approach is most effective for what kinds of educational environments? Very different evaluation questions!

EXHIBIT 10.5 Variations in Types of Teacher Centers

1. Behavioral centers
Primary processes for working with teachers: Curriculum specialists directly and formally instruct administrators and teachers.
Primary outcomes of the process: Adoption of standardized curriculum systems, methods, and packages by teachers.

2. Humanistic centers
Primary processes for working with teachers: Informal, nondirected teacher exploration; “teachers select their own treatment.”
Primary outcomes of the process: Teachers feel supported and important; pick up concrete and practical ideas and materials for immediate use in their classroom.

3. Developmental centers
Primary processes for working with teachers: Advisers establish warm, interpersonal, and directive relationship with teachers working with them over time.
Primary outcomes of the process: Teachers’ thinking about what they do and why they do it is changed over time; individualized teacher personal development.

SOURCE: Adapted from Feiman (1977).

Matching a Theory of Change with Levels of Evidence

Claude Bennett (1979, 1982) conceptualized a relationship between the “chain of events” in a program and the “levels of evidence” needed for evaluation that became widely used and, in his honor, is known as “Bennett’s Hierarchy.” Although his work was originally aimed at evaluation of cooperative extension programs (agriculture, home economics, and 4-H/youth programs), his ideas are generally applicable to any education-oriented intervention. Exhibit 10.6 depicts Bennett’s model.

The model presents a typical chain of program events.

1. Inputs (resources) must be assembled to get the program started.

2. Activities are undertaken with available resources.

3. Program participants (clients, students, beneficiaries) engage in program activities.

4. Participants react to what they experience.

5. As a result of what they experience, changes in knowledge, attitudes, and skills occur (if the program is effective).

6. Behavior and practice changes follow knowledge and attitude change.

7. Overall community impacts result as individual changes accumulate and aggregate, both intended and unintended.

Bennett’s Hierarchy is a values hierarchy because the model explicitly places higher value on higher-level results. The hierarchy places the highest value on attaining ultimate social and economic goals (e.g., increased agricultural production, increased health, or a higher quality of community life). Actual adoption of recommended practices and specific changes in client behaviors are necessary to achieve ultimate goals and are valued over knowledge, attitude, and skill changes. People may learn about some new agricultural technique (knowledge change), believe it’s a good idea (attitude change), and know how to apply it (skill change), but the higher-level criterion is whether they actually begin using the new technique (i.e., change their agricultural practices). Participant reactions (satisfaction, likes, and dislikes) are lower on the hierarchy. All these are outcomes, but they are not equally valued outcomes. The bottom part of the hierarchy identifies the means necessary for accomplishing higher-level ends; namely, in descending order, (3) getting people to participate, (2) providing program activities, and (1) organizing basic resources and inputs to get started.
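As an illustration only, the seven levels and their matching evidence can be written down as a small ordered structure, which makes it easy to ask how far up the hierarchy an evaluation's evidence actually reaches. The sketch below is not part of Bennett's work: the evidence descriptions are shortened from Exhibit 10.6, and the function names (`highest_level_with_evidence`, `missing_links`) are hypothetical helpers invented for this example.

```python
from typing import List, Optional, Set

# Levels and matching evidence, shortened from Exhibit 10.6 (lowest to highest).
BENNETT_LEVELS = [
    (1, "Inputs", "resources expended; number and types of staff involved"),
    (2, "Activities", "implementation data on what the program offers or does"),
    (3, "Participation", "characteristics and numbers of participants"),
    (4, "Reactions", "what participants say about the program"),
    (5, "Knowledge, attitude, and skill changes", "measures of individual and group changes"),
    (6, "Practice and behavior change", "measures of adoption of new practices over time"),
    (7, "End results", "measures of impact on the overall problem and ultimate goals"),
]


def highest_level_with_evidence(levels_with_data: Set[int]) -> Optional[str]:
    """Return the name of the highest level for which evidence was collected."""
    for number, name, _evidence in reversed(BENNETT_LEVELS):
        if number in levels_with_data:
            return name
    return None


def missing_links(levels_with_data: Set[int]) -> List[str]:
    """Name the lower levels lacking evidence beneath the highest documented level."""
    if not levels_with_data:
        return []
    top = max(levels_with_data)
    return [
        name
        for number, name, _evidence in BENNETT_LEVELS
        if number < top and number not in levels_with_data
    ]


# Hypothetical evaluation with data on inputs, activities, reactions, and practice change.
collected = {1, 2, 4, 6}
print(highest_level_with_evidence(collected))
print(missing_links(collected))
```

A check of this kind only shows where the chain of evidence has gaps; which levels deserve attention remains a judgment for the evaluation's intended users.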

Evaluation Use Theory of Change

Interestingly, this same hierarchy can be applied to evaluating evaluations. Exhibit 10.7 shows a hierarchy of evaluation accountability. In utilization-focused evaluation, the ultimate purpose of evaluation is to improve programs and increase the quality of decisions made.

To accomplish this ultimate end, a chain of events must unfold.

1. Resources must be devoted to the evaluation, including stakeholder time and financial inputs.

2. Working with intended users, important evaluation issues are identified and questions focused; based on those issues and questions, the evaluation is designed and data are collected.

3. Key stakeholders and primary users are involved throughout the process.

4. Intended users react to their involvement (ideally in positive ways).

5. The evaluation process and findings provide knowledge and new understandings.

6. Intended users interpret results, generate and adopt recommendations, and use evaluation results.

7. The program improves, and wise decisions are made.

Each step in this chain can be evaluated. Exhibit 10.7 shows the evaluation question that corresponds to each level in the evaluation use logic model.

Theory-Practice Connection

Nothing as practical as a good theory.

Carol Weiss (1995:1)

It is sometimes said that there are two kinds of people in the world: thinkers and doers. And, of course, the third type: those who neither think nor do, but we won’t worry about them just now. Thinkers are the world’s theoreticians. They love ideas, many of which have yet to be tested and may prove quite impractical. Doers, on the other hand, are too busy doing to worry about theory. But ultimately, theory and practice ought to connect. Practice is the test of theory. Theory is the explanation of practice.

The evaluator’s job is to challenge both practitioners and theoreticians. With the latter we ask, “So, it works in theory, but does it work in practice?” And with practitioners we ask, smiling diabolically, “Yes, it works in practice, but does it work in theory?”

EXHIBIT 10.6 Theory-Evidence Hierarchy

Each level of the program chain of events (theory of change; the program design hierarchy) is paired with its matching level of evidence (the hierarchy of evaluation criteria), listed from highest to lowest.

7. End results
Matching evidence: Measures of impact on overall problem, ultimate goals, side effects, social and economic consequences

6. Practice and behavior change
Matching evidence: Measures of adoption of new practices and behavior over time

5. Knowledge, attitude, and skill changes
Matching evidence: Measures of individual and group changes in knowledge, attitudes, and skills

4. Reactions
Matching evidence: What participants and clients say about the program; satisfaction; interest, strengths, weaknesses

3. Participation
Matching evidence: The characteristics of program participants and clients; numbers, nature of involvement, background

2. Activities
Matching evidence: Implementation data on what the program actually offers or does

1. Inputs
Matching evidence: Resources expended; number and types of staff involved; time extended

Cautions against Theory of Change Work

Eminent evaluation theorist Michael Scriven warns evaluators against thinking that logic modeling and theory testing are central to the work of evaluation. He wants evaluators to stay focused on the job of judging a program’s merit or worth. From his perspective, rather than being essential, elucidating program theory is “a luxury for the evaluator.” He considers it “a gross though frequent blunder to suppose that one needs a theory of learning to evaluate teaching” (1991b:360). One does not need to know anything at all about electronics, he observes, to evaluate computers. He also cautions that considerable time and expense can be involved in doing a good job of developing and testing a program theory. Because program theory development is really program development work rather than evaluation work, he would prefer to separate the cost of such work from the evaluation budget and scope of work. In commenting on the American Evaluation Association (AEA) listserv, Evaltalk, Scriven, with his renowned wit, offered this advice to evaluators:

First, there’s no law or ethical reason why evaluators with the appropriate training can’t do other things for their clients besides evaluation, e.g., market surveys or statistical analyses or logic models. The law and ethics only require that they not argue that these are an essential part of evaluation. . . . There are many cases where it is beyond the boundaries of scientific knowledge to provide the logic model, and in many others it’s a huge task that risks jeopardizing the primary task of evaluation.

Second, this isn’t ALWAYS so, and when it can be done without undue cost, helping the program managers improve their logic models, which THEY certainly need, is a Good Thing to do. Prudence requires you to keep in mind that doing this will, on some occasions (if it is possible at all), completely antagonize the clients, since they are wedded to a type of logic model that evidence does not support; prayer is one such case, but there are many, many more where the model is essentially “their thing,” into which their ego is woven. So this unnecessary extra task CAN cost you the contract, or the rehire; think twice before doing it IN PUBLIC. (Scriven 2006a)

Theory-driven evaluations can also seduce evaluators away from answering straightforward formative questions or making summative judgments into the ethereal world of academic theorizing. While theory construction is a mechanism by which evaluators can link program evaluation findings to larger social scientific issues for the purpose of contributing to scientific knowledge, when conducting a utilization-focused evaluation the initial theoretical formulations originate with primary stakeholders and intended users; scholarly interests are adapted to the evaluation needs of relevant decision makers, not vice versa. Attention to theoretical issues can provide useful information to stakeholders when their theories are formulated and reality-tested through the evaluation process. As always, the decision about whether and how much to focus the evaluation on testing the program’s theory is driven by the question of utility: Will helping primary intended users elucidate and test their theory of change lead to program improvements and better decisions? And as Scriven adjures, ask the cost-benefit question: Will the program theory work yield sufficient benefits to justify the likely added costs involved in such work?

EXHIBIT 10.7 Evaluating Evaluation: Logic Model of Use

Each level of the evaluation action hierarchy is paired with its utilization questions (the hierarchy of utilization questions), listed from highest to lowest.

7. Program and decision impacts
Utilization questions: To what extent and in what ways was the program improved? To what extent were informed, high-quality decisions made?

6. Practice and program change
Utilization questions: To what extent did intended use occur? Were recommendations implemented?

5. Stakeholders’ knowledge and attitude changes
Utilization questions: What did intended users learn? How were users’ attitudes and ideas affected?

4. Reactions of primary intended users
Utilization questions: What do intended users think about the evaluation? What’s the evaluation’s credibility? Believability? Relevance? Accuracy? Potential utility?

3. Stakeholder participation
Utilization questions: Who was involved? To what extent were key stakeholders and primary decision makers involved throughout?

2. Evaluation activities
Utilization questions: What data were gathered? What was the focus, the design, the analysis? What happened in the evaluation?

1. Inputs
Utilization questions: To what extent were resources for the evaluation sufficient and well managed? Was time sufficient?

As advocates of theory-driven evaluation assert, a better understood program theory can be the key that unlocks the door to effective action. But how much to engage stakeholders and intended users in articulating their theories of change is a matter for negotiation. Helping practitioners test their espoused theories and discover real theories-in-use can be a powerful learning experience, both individually and organizationally. The delineation of assumed causal relationships in a chain of hierarchical objectives can be a useful exercise in the process of focusing an evaluation. It is not appropriate to construct a detailed program theory for every evaluation situation, but it is important to consider the option. Therefore, the skills of a utilization-focused evaluation facilitator include being able to help intended users construct a means-ends hierarchy, specify validity assumptions, link means to ends, and lay out the temporal sequence of a hierarchy of objectives.

But that’s not all. In the last decade, the options for conceptualizing and mapping program theories have expanded and now include bringing systems perspectives into evaluation. The remainder of this chapter presents, illustrates, and discusses what it means to bring systems thinking to bear in evaluating theories of change.

Systems Theory and Evaluation

All models are wrong, but some are useful.

George Box (quoted by Berk 2007:204)

Let’s look at how modeling a program using systems thinking changes a program theory. We’ll use as an example a program for pregnant teenagers. The purpose of the program is to teach pregnant teenagers how to take care of themselves so that they have healthy babies. Exhibit 10.8 shows a classic linear logic model for such a program. The teenager learns proper prenatal nutrition and self-care (increased knowledge), which increases the teenager’s commitment to taking care of herself and her baby (attitude change), which leads to changed behavior (no smoking, drinking, or drug use; eating properly and attending the prenatal clinic regularly). This is a linear model because a leads to b leads to c, et cetera: Program participation leads to knowledge change, which leads to attitude change, which leads to behavior change, which produces the desired outcome (a healthy baby). This is a linear cause-effect sequence as depicted in Exhibit 10.8. This is the traditional, widespread approach to logic modeling.

EXHIBIT 10.8 Linear Program Logic Model for Teenage Pregnancy Program

Each step leads to the next:

1. Program reaches out to pregnant teens
2. Pregnant teens enter and attend the program (participation)
3. Teens learn prenatal nutrition and self-care (increased knowledge)
4. Teens develop commitment to take care of themselves and their babies (attitude change)
5. Teens adopt healthy behaviors: no smoking, no drinking, attend prenatal clinic, eat properly (behavior change)
6. Teens have healthy babies (desired outcome)

Now, let’s ask some systems questions. What various influences actually affect a pregnant teenager’s attitudes and behaviors? The narrowly focused, linear model in Exhibit 10.8 focuses entirely on the program’s effects and ignores the rest of the teenager’s world. When we ask about that world, we are inquiring into the multitude of relationships and connections that may influence what the pregnant teenager does. Exhibit 10.9 is a rough sketch of possible system connections and influences. We know, for example, that teenagers are heavily influenced by their peer group. The linear, narrowly focused logic model targets the individual teenager. A systems perspective that considered the influence of a pregnant teenager’s peer group might ask how to influence the knowledge, attitudes, and behaviors of the entire peer group. This would involve changing the subsystem (the peer group) of which the individual pregnant teenager is a part. Likewise, the systems web of potential influences in Exhibit 10.9 invites us to ask about the relative influence of the teenager’s parents and other family members, the pregnant teenager’s boyfriend (the child’s father), or teachers and other adults, as well as the relationship to the staff of the prenatal program. In effect, this systems perspective reminds us that the behavior of the pregnant teenager and the health of her baby will be affected by a number of relationships and not just participation in the prenatal program. In working with such a model with program staff, the conceptual elaboration of the theory of change includes specifying which direction arrows run (one way or both ways, showing mutual influence), which influences are strong (heavy solid lines) versus weak (dotted lines), and which influences are more dominant (larger circles versus smaller circles).
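One way to make this kind of elaboration concrete is to record each influence as a small data record with a direction and a strength. The sketch below is only an illustration: the actors are those named in Exhibit 10.9, but every strength and direction value is an invented placeholder that program staff would have to supply, and the names `Influence`, `strong_influences`, and `mutual_influences` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Influence:
    source: str
    target: str
    strength: str  # "strong" (heavy solid line) or "weak" (dotted line)
    mutual: bool   # True when the arrow runs both ways


TEEN = "pregnant teenager"

# Actors come from Exhibit 10.9; strengths and directions are placeholders, not findings.
INFLUENCES = [
    Influence("peer group", TEEN, "strong", True),
    Influence("parents and other family members", TEEN, "strong", True),
    Influence("child's father and his peers", TEEN, "weak", True),
    Influence("teachers and other adults", TEEN, "weak", False),
    Influence("prenatal program staff", TEEN, "strong", False),
]


def strong_influences(influences: List[Influence], target: str) -> List[str]:
    """Sources drawn with heavy solid lines toward the target."""
    return sorted(i.source for i in influences
                  if i.target == target and i.strength == "strong")


def mutual_influences(influences: List[Influence], target: str) -> List[str]:
    """Sources whose arrows run both ways, showing mutual influence."""
    return sorted(i.source for i in influences
                  if i.target == target and i.mutual)


print(strong_influences(INFLUENCES, TEEN))
print(mutual_influences(INFLUENCES, TEEN))
```

Writing the map down this way forces the same choices the diagram does: each line needs a direction and a relative strength, and those assignments are hypotheses for the evaluation to test rather than facts.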

EXHIBIT 10.9 Systems Web Showing Possible Influence Linkages to a Pregnant Teenager

Center of the web: Young pregnant woman’s attitudes and behaviors

Linked influences:
• Her peer group
• Her parents and other family members
• Child’s father and his peers
• Her teachers and other adults
• Prenatal program staff

Exhibit 10.10 changes the conceptualization of the change process from a simple linear model of cause-effect to a systems dynamics model of reinforcing actions that depicts the change process as cumulative. When the pregnant teenager adopts healthy behaviors, she feels better; if and when she gets positive reinforcement from clinic nurses, family, and friends, her healthy behaviors are affirmed and reinforced, and she is more likely to continue them. The causal mechanism for sustained change in Exhibit 10.10 is ongoing positive reinforcement from people who are important to the pregnant teenager, which is a different program theory from the linear model of knowledge alone leading to behavior change. Exhibit 10.10 guides the evaluator to ask not just how to produce the desired behavior but also how to sustain it. How change is sustained is a systems question.

EXHIBIT 10.10 Sustainable Change: Systems Dynamic Reinforcing Feedback Loops

Center of the diagram: A Sustainable Reinforcing System

Elements of the reinforcing loops:
• Pregnant teenager learns the importance of not smoking and drinking, and stops doing both
• Teenager eats properly and takes care of herself
• Teenager feels better
• Teenager tells clinic nurse about her changed behaviors
• Teenager gets positive feedback from clinic nurses
• Teenager tells her friends and family about her changed behaviors
• Teenager gets support from friends and family

Exhibit 10.11 presents yet another systems perspective depicting possible institutional influences affecting pregnant teenagers’ attitudes and behaviors. The narrowly focused, linear logic model in Exhibit 10.8 treats the program’s impact in isolation from other institutional and societal factors. In contrast, the systems web in Exhibit 10.11 shows the prenatal program as one potentially strong influence on pregnant teenagers but also takes into account the important influences of the youth culture, the school system, other community-based youth programs, the local clinic and hospital, and possibly the local church. Moreover, during her pregnancy the teenager may be affected by other systems: the welfare system (eligibility for financial support and food stamps), the legal system (laws governing the degree to which the teenager can make independent decisions or live on her own), nutrition programs that might collaborate with the prenatal program, the transportation system (which affects how the teenager gets to clinic visits and the program), and the pervasive influences of the media (television, movies, music) that affect teenager attitudes and behaviors. The systems diagram in Exhibit 10.11 also includes larger contextual factors such as the political environment; economic incentives that can affect a teenager’s ability to live independently, get child care, continue to attend school, or get a job; and social norms and larger cultural influences that affect how society responds to a teenager’s pregnancy.


EXHIBIT 10.11 Program Systems Web Showing Possible Institutional Influences Affecting Pregnant Teenagers' Attitudes and Behavior

Center of the web: Young pregnant women’s attitudes and behaviors

Linked institutions:
• Prenatal program
• Prenatal clinic and hospital outreach
• School system
• Church
• Youth culture
• Other community-based youth programs

Other systems:
• Welfare
• Legal
• Nutrition programs
• Transportation
• Child protection
• Media messages

Context factors:
• Politics
• Economic incentives
• Social norms
• Culture
• Music


Constructing such a systems map with a prenatal program may lead the program to consider a more collaborative effort in which various institutional partners come together to work toward the desired outcome of a healthy pregnant teenager who delivers a healthy baby. The system diagrams suggest that the prenatal program by itself, focusing only on the teenager and only on its own delivery of knowledge to the teenager, is less likely to achieve the desired outcome than a model that takes into account the influences of other people in the teenager’s system (the teenager’s world) and collaborates with other institutions that can have an effect on the attainment of desired outcomes.

Systems Framework Premises

Looking at a program from a systems perspective is one way to deepen our understanding of the program and its outcomes. A systems framework is built on some fundamental relationship premises. We’ll examine those premises using the program for pregnant teenagers to illustrate each one.

1. The whole is greater than the sum of the parts. If you look at Exhibit 10.9, you see interconnected parts: the pregnant teenager, her peer group, her family, her boyfriend, teachers and other adults who interact with her, and program staff. This whole web of relationships will be a unique constellation of interactions for each pregnant teenager. The whole may include consistent or contradictory messages about the teenager and her pregnancy. Moreover, that web of relationships isn’t just built around her pregnancy. Messages about school, family, life, work, and love are all part of the mix. The systems picture reminds us that the teenager’s life consists of a number of relationships and issues that extend well beyond her experience in the prenatal program. The program is but one influence in her whole life. Linear program theories tend to be conceptualized as if the program is the only thing going on in the participant’s life. Looking at a program as but one part of a participant’s whole life is what Saville Kushner (2000) has described as “personalizing evaluation.”

I will be arguing for evaluators approaching programs through the experience of individuals rather than through the rhetoric of program sponsors and managers. I want to emphasize what we can learn about programs from Lucy and Ann. . . . So my arguments will robustly assert the need to address “the person” in the program. (Pp. 9–10)

Exhibit 10.9 is an abstract conceptualization of a set of relationships. If we put Lucy in the circle depicting the pregnant teenager and capture her story as a case study, we get a more holistic understanding of Lucy’s life and where the program fits in Lucy’s life. We then do the same thing for Ann. Each of those stories is its own whole, and the combination of stories of teenagers in the program gives us a sense of the program whole. But that program whole cannot be reduced to the individual stories (the parts) any more than Lucy’s life can be reduced to the set of relationships in Exhibit 10.9. The whole is greater than the sum of parts. Moreover, Exhibit 10.11 reminds us that the program cannot be understood as a free-standing, isolated entity. The program as a whole includes relationships with other entities (schools, community organizations, churches) and larger societal influences. A systems framework invites us to understand the program in relation to other programs and as part of a larger web of institutions.

2. Parts are interdependent such that a change in one part has implications for all parts and their interrelationships. Imagine that when Lucy becomes pregnant and enters the prenatal program, she has a close relationship with her boyfriend (the child’s father) and her family, is doing well in school, and is active in church. The stress of the pregnancy leads Lucy and her boyfriend to break up. Things become tense in her family as everyone wants to give her advice. Her school attendance becomes irregular and she stops going to church. Without her boyfriend and with increased family tension, a small number of female peers become increasingly central to Lucy’s life. What we begin to understand is that Lucy’s system of relationships existed before she became pregnant. Her pregnancy affects all her relationships and those changes in relationships affect each other, ebbing and flowing. The pregnancy can’t really be said to “cause” these changes. What happens between Lucy and her boyfriend when she becomes pregnant is a function of their whole relationship and their relationships with others. The program is part of that mix, but only a part. And how Lucy experiences the program will be affected by the other relationships in her life.

3. The focus is on interconnected relationships. The change in perspective that comes with systems thinking focuses our attention on how the web of relationships functions together rather than as a linear chain of causes and effects. Asking how things are connected is different from asking whether a causes b. It’s not that one inquiry is right and the other is wrong. The point is that different questions and different frameworks provide different insights. Consider the example of your reading this book. We can ask, “To what extent does reading this book increase your knowledge of evaluation?” That’s a fairly straightforward linear evaluation question. Now we ask, “How does reading this book relate to the other things going on in your life?” That’s a simple systems question. Each question has value, but the answers tell us very different things.

4. Systems are made up of subsystems and function within larger systems. Exhibit 10.9 shows a pregnant teenager’s relationships with other people. Exhibit 10.11 shows the program’s relationships with other institutions and how these, in combination, influence the teenager’s attitudes and behavior. The “subsystems” are the various circles in these two exhibits. These subsystems (family, school, church, community, peer group) function within larger systems such as society, the legal system, the welfare system, culture, and the economy. How subsystems function within larger systems and how larger systems connect to and are influenced by subsystems can be part of a systems inquiry into understanding a program and its effects. Both the content and processes of a prenatal program for pregnant teenagers will be affected by larger societal norms. That’s why programs in rural Mississippi, inner city Chicago, East Los Angeles, southern France, northern Brazil, and Burkina Faso would be different, even if they supposedly were based on the same model. The societal and cultural contexts would inevitably affect how the programs functioned.

5. Systems boundaries are necessary and inevitably arbitrary. Systems are social constructions (as are linear models). Systems maps are devices we construct to make sense of things. It is common in hiking to remind people in the wilderness that “the map is not the territory.” The map is an abstract guide. Look around at the territory. What to include in a systems diagram and where to draw the boundaries are matters of utility. Including too much makes the system overwhelming. Including too little risks missing important elements that affect program processes and outcomes. Given the purpose of evaluation, to inform judgment and action, the solution is to be practical. If we are mapping the relationships that affect a teenager’s health during her pregnancy (Exhibit 10.9), we ask, “What are the primary relationships that will affect what the teenager does?” List those and map them in relationship to each other. Don’t try to include every single relationship (a distant cousin in another city with whom she seldom interacts), but include all that are important. That could include the distant cousin if she is a teenager who has recently been through a pregnancy, which might be represented by a circle designating “other teenagers who have been or are pregnant whom this teenager knows.” The systems map of a program is a guide, a way to ask questions and understand the dynamics of what occurs. The systems map is not, however, the program.

Systems Framework Premises

A systems framework is built on some fundamental relationship premises.

• The whole is greater than the sum of the parts.
• Parts are interdependent such that a change in one part has implications for all parts and their interrelationships.
• Systems are made up of subsystems and function within larger systems.
• The focus is on interconnected relationships among parts, and between parts and the whole.
• Systems boundaries are necessary and inevitably arbitrary.

A first step in moving beyond simple linear logic models is to add feedback loops to the model. Once a program is in operation, the relationships between links in the causal hierarchy are likely to be recursive rather than unidirectional. Instead of a causes b, the model becomes a causes b, and achieving b stimulates more of a. Take dieting and exercising to lose weight. Eating less and exercising lead to weight loss; losing weight leads to looking and feeling better. That’s the simple linear chain. But looking and feeling better reinforces eating better and continuing to exercise. See Exhibit 10.12.
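The difference between "a causes b" and "a causes b, and achieving b stimulates more of a" can be seen in a toy calculation. The sketch below is purely illustrative and rests on invented assumptions: a single "healthy behavior" level, a constant weekly push from the program (`effort`), and a `feedback` share of the current level that returns as reinforcement. None of these quantities or parameter values come from the text.

```python
from typing import List


def simulate(weeks: int, effort: float, feedback: float) -> List[float]:
    """Track a hypothetical behavior level week by week.

    effort:   constant weekly push from the program (a causes b).
    feedback: share of the current level that feeds back as
              reinforcement (achieving b stimulates more of a).
    """
    level = 0.0
    history = []
    for _ in range(weeks):
        level = level + effort + feedback * level
        history.append(level)
    return history


# Linear chain: no feedback, so the level grows by the same amount each week.
linear = simulate(weeks=10, effort=1.0, feedback=0.0)

# Reinforcing loop: each week's gains add to the next week's push.
reinforcing = simulate(weeks=10, effort=1.0, feedback=0.1)

print(round(linear[-1], 2), round(reinforcing[-1], 2))
```

With identical program effort, the run that includes the loop ends higher than the linear run, because earlier gains keep contributing. Real programs will not follow so tidy a rule; the point is only that leaving the loop out of the model changes what the model predicts.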

Recognizing such recursive feedback effects changes the model. For example, high-achieving schools affect the opinions and actions of parents, but parent reactions also affect the degree to which schools are committed to high achievement. The influence doesn’t flow just one way. Classroom climate and school curriculum affect student achievement, but variations in student achievement also affect school climate and curriculum. From a systems perspective, a simple linear means-ends hierarchy without feedback loops or interdependencies is likely to be oversimplified, but there is no avoiding some simplification, even with systems maps. The basic dilemma is how much to simplify reality. The challenge is to construct simplifications that pass the dual tests of usefulness and accuracy.

The Increasing Importance of Systems Thinking in Evaluation

The preceding has provided only a brief introduction to the possibilities for incorporating systems perspectives in evaluation. At the 2002 national conference of the AEA, president Molly Engle made systems the focus of the meeting. The keynote address by noted systems thinker John Sterman was titled “No Learning Without Feedback: The Vital Partnership of Evaluation and Systems Thinking.” Shortly thereafter a Topical Interest group focused on Systems was formed within AEA, and every national conference since has had a full strand of sessions devoted to systems approaches to evaluation. In 2006, AEA published its first-ever monograph, an expert anthology on Systems Concepts in Evaluation edited by Bob Williams and Iraj Iman. That monograph provides a wide range of systems approaches and demonstrates the diversity of approaches that congregate under the systems umbrella.

EXHIBIT 10.12
Converting a Linear Model to a Simple Feedback Model

[Figure: diagram with the elements Diet, Exercise, Feel better, Look better, and Positive feedback from others. Note in figure: Adding the feedback arrows converts a linear model to a simple systems dynamics model.]

In commenting on this diversity the editors wrote,

For those of you looking for coherence about what we consider to be relevant systems concepts for evaluation, our advice when reading this publication is to look for patterns rather than definitions. For us, three patterns stand out:

1. Perspectives. Using systems concepts assumes that people will benefit from looking at their world differently. For systems practitioners, this motivation is explicit, deliberate, and is fundamental to their approach. However, just looking at the “bigger picture,” or exploring interconnections does not make an inquiry “systemic.” What makes it systemic is how you look at the picture, big or small, and explore interconnections. A “system” is as much an “idea” about the real world as a physical description of it.

2. Boundaries. Boundaries drive how we “see” systems. Boundaries define who or what lies inside and what lies outside of a particular inquiry. Boundaries delineate or identify important differences (i.e., what is “in” and what is “out”). Boundaries determine who or what will benefit from a particular inquiry and who or what might suffer. Boundaries are fundamentally about values–they are judgements about worth. Defining boundaries is an essential part of systems work/inquiry/thinking.

3. Entangled systems. One can observe and perceive systems within systems, systems overlapping other systems, and systems tangled up in other systems. Thus it is unwise to focus on one view or definition of a system without examining its relationship with another system. Where does one system begin and the other end? Is there overlap? Who is best situated to experience or be affected by that overlap? What systems exist within systems and where do they lead? A systems thinker always looks inside, outside, beside, and between the readily identified systems boundary. He or she then critiques and if necessary changes that initial choice of boundary. (Williams and Iman 2006:6)

Part of the challenge of incorporating systems thinking in evaluation is that there are so many different systems meanings, models, approaches, and methods, including system dynamics, soft systems methodology, cultural-historical activity theory, and critical systemic thinking, each of which has specific implications for evaluation (Williams 2005b). More generally, critical systemic thinking can be considered as one element of evaluative thinking (see Chapter 5). Werner Ulrich (2000, 1998) has advocated that although not everyone should become a systems scholar, systems thinking involves new reflective skills that are essential in modern society for both professional competence and effective citizenship.

Evaluating Systems Reform

Systems thinking is pushing evaluators to conceptualize what we do in new ways and offers new frameworks for use in working with primary intended users to think about what they do and how they do it. This is especially the case where the targeted unit of change is, itself, a system. Thus, while much program evaluation has traditionally focused on the outcomes of programs aimed at individuals (students, farmers, chemically dependent people, parents, children, professionals), some initiatives target systems for reform and change. Policy initiatives can be aimed at reforming systems: the health care system, the educational system, the judicial system, the farming system, et cetera. Evaluating advocacy initiatives and policy change campaigns changes the unit of analysis (the evaluand) from the program level to the policy or systems level (Patton 2008, Coffman 2007a, b). While systems thinking is an option in looking at program outcomes for individuals, it is essential for evaluating system reform initiatives. And that provides a segue to one particularly challenging systems evaluation problem: how to evaluate emergent processes in complex nonlinear systems.

Evaluation in Complex Adaptive Systems

A Complex Adaptive System is a dynamic network of many interacting parts, continuously acting and reacting. The results of these interactions are dynamic, emergent, uncertain, and unpredictable. Examples are weather systems, stock markets, ecosystems, and anthills. One of the characteristics of complex adaptive systems is that small effects can have large consequences as expressed by the butterfly effect metaphor, which suggests that a butterfly flapping its wings today in China may lead to a typhoon forming in the Pacific Ocean months later. This is represented in our everyday experience by the story of a chance, brief encounter that changes your life, or a phrase offered at an opportune moment that turns you in a new direction and alters forever your path. Since the 2000 presidential election, the butterfly effect has an additional meaning. Election officials in Palm Beach, Florida, experimented with a new format and procedure for voting called the “butterfly ballot.” This resulted in voter confusion in a tight election, propelled the election results into the judicial system where the Supreme Court refused to allow a recount, giving Florida to George Bush, which gave the presidency to George Bush, which led to the Iraq invasion, which led to global events and processes still unfolding.

Small actions (a changed ballot in one county) can have huge repercussions as that action reverberates through a complex adaptive system.

Sometimes complex effects take years. In 1933, the Belgians who controlled Rwanda as a colony issued identity cards classifying every Rwandan as Tutsi, Hutu, or Twa (a very minor category). In 1994, as part of the tragic genocide in Rwanda, those cards were used by Hutu to identify hundreds of thousands of Tutsi and kill them (Kinzer 2007:24). How does one portray such a connection? Certainly, not simple cause and effect. And somehow more than merely unintended consequences. A complex nonlinear system unfolded in an unpredictable fashion.

Complexity science is being used to understand phenomena in the biological world, policy analysis, ecosystems, economic systems, and in organizations (Fritjof et al. 2007; Dennard, Richardson, and Morcol 2005; Richardson et al. 2005; Gribben 2004; Westley and Miller 2003; Gunderson and Holling 2002; Johnson 2001; Lewin 2001; Zimmerman, Lindbery, and Plsek 2001; Eoyang 1996; Waldrop 1992). But what does this have to do with evaluation? The answer lies in situational responsiveness and problem definition, which affect how we conceptualize and design evaluations.

Three Kinds of Problems: Simple, Complicated, Complex

To pursue greatness is to pursue Maybe.

—John Bare (2007), Vice President, The Arthur M. Blank Family Foundation, Atlanta, Georgia

In studying social innovations, we were impressed by the uncertainty and unpredictability of the innovative process, even looking back from a mountaintop of success, which is why we called the book Getting to Maybe (Westley, Zimmerman, and Patton 2006). Evaluating social innovations is a complex problem, as opposed to evaluating simple and complicated problems. A simple problem is how to bake a cake following a recipe. A recipe has clear cause-and-effect relationships and can be mastered through repetition and developing basic skills. There is a chance to standardize the process and to write the recipe with sufficient detail that even someone who has never baked has a high probability of success. Best practices for programs are like recipes in that they provide clear and high fidelity directions since the processes that have worked to produce desired outcomes in the past are highly likely to work again in the future. Assembly lines in factories have a “recipe” quality as do standardized school curricula. Part of the attraction of the 12-Step program of Alcoholics Anonymous is its simple formulation.

A complicated problem is more like sending a rocket to the moon. Expertise is needed. Specialists are required and coordination of the experts is another area of expertise itself. Formulae and the latest scientific evidence are used to predict the trajectory and path of the rocket. Calculations are required to ensure sufficient fuel based on current conditions. If all the “homework” is completed, and if the coordination and communication systems are sophisticated enough to access the expertise, there is a high degree of certainty of the outcome. It is complicated, with many separate parts that need coordination, but it can be controlled by knowledgeable leaders and there is a high degree of predictability about the outcomes. Cause-and-effect relationships are still very clear, although not as straightforward as with simple problems. Coordinating large-scale programs with many local sites throughout a country or region is a complicated problem.

Parenting is complex. Unlike the recipe and rocket examples, there are no clear books or rules to follow to guarantee success. Clearly, there are many experts in parenting and many expert books available to parents. But none can be treated like a cookbook for a cake, or a set of formulae to send a rocket to the moon. In the case of the cake and the rocket, for the most part, we were intervening with inanimate objects. The flour does not suddenly decide to change its mind, and gravity can be counted on to be consistent too. On the other hand, children, as we all know, have minds of their own. Hence our interventions are always in relationship with them. There are very few stand-alone parenting tasks. Almost always, the parents and child interact to create outcomes. Any highly individualized program has elements of complexity. The outcomes will vary for different participants based on their differing needs, experiences, situations, and desires.

Exhibit 10.13 highlights the distinctions between the three kinds of problems. In all three cases, we tend to be optimistic that positive outcomes can be achieved. However, the way we intervene in each of these contexts is qualitatively different, as is how we design an evaluation (Rogers 2008; Westley et al. 2006:8–10).

EXHIBIT 10.13
Simple, Complicated, and Complex Lenses

Simple: Following a recipe
• The recipe is essential.
• Recipes are tested to assure easy replication.
• No particular expertise is required but cooking expertise increases success rate.
• A good recipe produces nearly the same cake every time.
• The best recipes give good results every time.
• A good recipe specifies the quantity and nature of the “parts” needed and the order in which to combine them, but there is room for experimentation.

Complicated: Sending a rocket to the moon
• Right protocols or formulae are critical and necessary.
• Sending one rocket to the moon increases assurance that the next will also be a success.
• High levels of expertise and training in a variety of fields are necessary for success.
• Key elements of each rocket MUST be the same to succeed.
• There is a high degree of certainty of outcome.
• Success depends on a blueprint that directs both the development of separate parts and specifies the exact relationship in which to assemble them.

Complex: Raising a child
• Rigid protocols have a limited application or are counterproductive.
• Raising one child provides experience but is no guarantee of success with the next.
• Expertise helps but only when balanced with responsiveness to the particular child.
• Every child is unique and must be understood as an individual.
• Uncertainty of outcome remains.
• Can’t separate the parts from the whole; essence exists in the relationship between different people, different experiences, different moments in time.

SOURCE: Westley et al. (2006:9).

Simple formulations invite linear logic models that link inputs to activities to outputs to outcomes like a formula or recipe. Complicated situations invite system diagrams and maps that depict the relationships among the parts. Complex problems and situations are especially appropriate for developmental evaluation in which the evaluation design is flexible, emergent, and dynamic, mirroring the emergent, dynamic, and uncertain nature of the intervention or innovation being evaluated. Tracking, monitoring, and evaluating complex policies requires continuous “streams of knowledge” rather than discrete and bounded studies (Stame 2006a, b). Evaluating in the face of the uncertainties of complexity requires anticipation and agility to improve evaluation quality, responsiveness, and real-time relevance (Morell forthcoming, 2005).

Complexity scientist Ralph Stacey (2007, 1996, 1992) has offered a matrix of two dimensions that helps distinguish simple, complicated, and complex situations. One dimension scales the degree of certainty in the cause-effect relationship. Programs and interventions are close to certainty when cause and effect linkages in the logic model are highly predictable, as in the relationship between immunization and preventing disease. At the other end of the certainty continuum are innovative programs where the outcomes are highly unpredictable; a community development initiative would typically involve considerable uncertainty. Extrapolating from past experience is problematic because, like rearing a child, each community is unique. The vertical axis of the matrix captures the degree of agreement among various stakeholders about a program’s needed inputs, goals, processes, outcomes measures, and likely long-term impacts. High levels of agreement make situations fairly simple; high degrees of values conflict foment complexity. Exhibit 10.14 shows where on this matrix the zones defining simple, complicated, and complex problems can be expected.

• Simple interventions are defined by high agreement and high causal certainty; immunization to prevent disease fits this zone on the matrix.

• Socially complicated situations are defined by fairly high predictability of outcomes, but great values conflict among stakeholders; abortion is an example.

• Technically complicated situations are defined by high agreement among stakeholders but low causal certainty; everyone wants children to learn to read but there are ferocious disagreements about which reading approach produces the best result (Schemo 2007).

• Complex situations are characterized by high values conflict and high uncertainty; what to do about global warming would fall in the complexity zone of the matrix.
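The four zones just listed can be read as a rule that takes two judgments, agreement and certainty, and returns a zone. The sketch below is a deliberately crude two-by-two reading of the matrix: the function name, the 0-to-1 scoring scale, the 0.5 cut point, and the example scores are all assumptions made for illustration. The matrix itself depicts overlapping zones rather than sharp quadrants, and this sketch leaves out the far corner of chaos.

```python
# A simplified, hypothetical reading of the agreement/certainty matrix.
# The 0-1 scale, the threshold, and the scores below are illustrative assumptions.

def stacey_zone(agreement, certainty, threshold=0.5):
    """Classify a situation from two stakeholder judgments.

    agreement: 0.0 (far from agreement) to 1.0 (close to agreement)
    certainty: 0.0 (far from certainty) to 1.0 (close to certainty)
    """
    if not (0.0 <= agreement <= 1.0 and 0.0 <= certainty <= 1.0):
        raise ValueError("scores must lie between 0 and 1")
    high_agreement = agreement >= threshold
    high_certainty = certainty >= threshold
    if high_agreement and high_certainty:
        return "simple"
    if high_certainty:
        return "socially complicated"   # predictable, but values conflict
    if high_agreement:
        return "technically complicated"  # shared goal, uncertain means
    return "complex"


# Hypothetical scores for the examples used in the text.
examples = {
    "immunization": (0.9, 0.9),
    "abortion": (0.1, 0.8),
    "reading instruction": (0.9, 0.3),
    "global warming": (0.2, 0.2),
}
for name, (agreement, certainty) in examples.items():
    print(name, "->", stacey_zone(agreement, certainty))
```

In practice the two scores would come from working with primary intended users rather than from fixed numbers, and a situation near the cut point deserves discussion rather than mechanical classification.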

Let me now explain and illustrate the evaluation implications of these different ways of understanding a program or intervention.

An Evaluation Example Illustrating Simple, Complicated, and Complex Designs

Consider a nationwide leadership development program that aims to infuse energy and vitality into a moribund nonprofit sector (a judgment based on funder assessment). The intensive 18-month program includes

(1) skill development (e.g., communications training, conflict resolution, needs assessment, strategic planning, appreciative inquiry methods) and knowledge acquisition (e.g., introduction to various theories of change, systems thinking, complexity science),

(2) an organizational change project in participants’ own organizations, and

(3) networking with other participants around nonprofit sector issues of common interest and concern.

Skill development and knowledge acquisition can be modeled and evaluated with a linear framework. The desired outcomes are specifiable, concrete, and measurable, and the outcomes are connected directly to the curriculum and training in a short, observable time frame. Participants demonstrate their skills and knowledge by writing papers, carrying out assignments, and doing team projects. A linear logic model can appropriately capture and depict the hypothesized connections between inputs, activities, outputs, and outcomes as a framework for evaluation.

The second program component, carrying out organizational change projects in their own organizations, is congruent with relationship-focused systems modeling and systems change evaluation. The operations, culture, and constellation of units within each participant’s organization constitute a baseline organizational system at the time each participant enters the leadership development program. Each organization functions within some context and environment. As part of the leadership development experience, participants undertake some self-selected change effort, e.g., board development, strategic planning, staff development, reorganization, or evaluation, among many possibilities. These are efforts aimed at increasing the effectiveness of the participating organizations and can be modeled and evaluated as systems change initiatives. Evaluative case studies would capture the changed relationships within the organizations, both changed relationships among internal elements (e.g., between board and staff, or between technology support units and line management units) as well as changed relationships with organizations in the environment (e.g., collaborations, new or changed partnerships, new suppliers, changed community or funder relationships). The focus on changed relationships, linkages, and connections makes systems change evaluation an especially appropriate framework for this aspect of the program.

EXHIBIT 10.14
Matrix Depicting Simple to Complex

[Figure: matrix with Agreement on the vertical axis and Certainty on the horizontal axis, each running from “Close to” to “Far from.” Zones labeled: Simple (Plan, control); Technically Complicated (Experiment, coordinate expertise); Socially Complicated (Build relationships, create common ground); Zone of Complexity; Chaos (Massive Avoidance).]

SOURCE: Adapted from Stacey (2007); Zimmerman, Lindbery and Plsek (2001:136–141).

The third overall thrust of the program involves supporting self-organizing networks among participants to infuse new energies and synergies into the nonprofit sector. This constitutes a vision rather than a measurable goal. It’s not at all clear what may emerge from such networking (no clear causal model), and the value of such networking is hard to measure. Indeed, there’s no particular plan to support such networking other than bringing these leaders together and having them interact for intense periods of time. Beyond the networking, it is impossible to predetermine what might occur as a result of the infusion of new leadership into the nonprofit sector, and it would be politically inappropriate for the philanthropic funder to make such a determination because it would be controversial. Indeed, part of the intervention is support for the nonprofit and voluntary sector leaders to engage in dialogue around what actions and initiatives would revitalize the sector. The outcomes in this case will be entirely emergent. The evaluation would involve real-time monitoring of emergent initiatives, watching for what the self-organizing networking yields. Indeed, in a real case where this form of emergent evaluation was actually undertaken, the results turned up conferences organized, regional institutes established, lobbying efforts coordinated, collaborations created, new partnerships, and shared development of materials. None of these efforts were predictable in advance. They emerged from the process and were captured through developmental evaluation, specifically, periodically e-mailing participants to inquire about how they were working with others and, when something turned up, interviewing them about the details. Exhibit 10.15 summarizes and compares how these three evaluation approaches, representing different theories of change, can be combined in a single comprehensive evaluation of the leadership development program.

Matching the Evaluation Framework to the Nature of the Intervention

The principle illustrated by the preceding leadership development program is that the modeling framework and evaluation approach should be congruent with the nature of a program intervention. Understanding an intervention as simple, complicated, or complex can significantly affect how an evaluation is conducted (Rogers 2008; Martin and Sturmberg 2005).

When the intervention is readily understood as fitting traditional linear logic modeling, then the evaluation would document the program’s inputs, processes, outputs, outcomes, and impacts, including viable and documentable linkages connecting the elements of the model. This is the traditional and dominant approach to program theory modeling and evaluation. In most cases, the outcomes for a linear logic model will be changes at the individual level among intended beneficiaries, e.g., changes in attitudes, knowledge, behavior, and status (well-being, health, employment, etc.). In other words, the unit of analysis for the evaluation is typically individuals and individual-level change.

EXHIBIT 10.15
Evaluation Design for a Leadership Development Program: Different Components Manifest Different Theories of Change

Component 1
• Program component: Leadership development: Increased knowledge and skills, and use of those skills in their work.
• Problem framing: Simple/Complex
• Type of theory of change: Linear logic model: Program training increases knowledge and skills. Complex: Additional unique, unanticipated, and emergent outcomes likely with such a high-powered group.
• Degree of certainty about how to achieve desired outcomes (horizontal axis on the Stacey Matrix): High certainty: many leadership programs have produced changes in skills & knowledge. There is substantial knowledge about how to support professional development. Highly experienced instructors.
• Degree of agreement about the desired outcomes (vertical axis on the Stacey Matrix): High agreement that nonprofit leaders should have leadership skills and professional development opportunities.
• Evaluation questions: Are the desired outcomes achieved? Can these outcomes be attributed to the program? Do the trained leaders use their new skills in their work?
• Evaluation design: Pre-post assessment of changed knowledge and skills. Follow-up to assess application of new skills.

Component 2
• Program component: Organizational change: Each participant carries out a project of his or her own choosing to develop the organization.
• Problem framing: Complicated/Complex
• Type of theory of change: Systems change: Participants’ organizational systems are changed through projects. Projects vary greatly and are chosen by participants and their organizations. Complex: Each is unique.
• Degree of certainty about how to achieve desired outcomes (horizontal axis on the Stacey Matrix): Moderate to low certainty: Degree to which organizations change dependent on a large number of factors, many of which are outside the leaders’ control.
• Degree of agreement about the desired outcomes (vertical axis on the Stacey Matrix): Varied agreement about the need for organizational change among participants’ organizations; some are open, some resistant; most uncertain about what is involved.
• Evaluation questions: What projects do participants do in their organizations? How are their organizations changed? How are relationships altered? How are the organizations’ relationships with external institutions affected?
• Evaluation design: Case studies of participants’ projects focusing on how their organizations are changed.

Component 3
• Program component: Networking and leadership within the national nonprofit sector: Vision of new energy and vitality.
• Problem framing: Complex
• Type of theory of change: Complex adaptive self-organizing network: Informal groups emerge and decide to collaborate around shared interests.
• Degree of certainty about how to achieve desired outcomes (horizontal axis on the Stacey Matrix): Very low certainty: Outcomes are unclear and unspecified; not even possible to specify all the variables that come into play; high likelihood that chance encounters will play a part.
• Degree of agreement about the desired outcomes (vertical axis on the Stacey Matrix): Low agreement about how these leaders should engage together in the larger nonprofit sector; vague vision of engagement, but the specifics will be emergent and opportunistic.
• Evaluation questions: What informal groups of participants self-organize? What do these emergent subgroups do together? What impacts flow from their emergent activities? What developments occur over time?
• Evaluation design: Developmental evaluation follow-ups to track what emerges and develops over time.

Systems mapping offers an approach for evaluating systems change efforts. In many cases, the object of an intervention is a change in a system, for example, developing an organization (an organizational system change effort) or creating collaborative relationships among organizations, or connecting organizations and communities in some new ways. The unit of analysis is the system, and the focus is on changed relationships and interconnections, which are the defining elements of how the system functions. For example, philanthropic funders or government grants often seek to create genuinely collaborative relationships among organizations operating independently of each other (often referred to at the baseline as operating in “silos” or “elevator shafts”). An evaluation looking at the effectiveness and results of such a systems change initiative would map changes in relationships. Network analysis and mapping (McCarty et al. 2007; Durland and Fredericks 2005; Bryson et al. 2004) are powerful tools for capturing and depicting such dynamic and evolving system relationships (or absence of same when the change process fails to work).
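Mapping changes in relationships can be sketched very simply as a comparison of who is connected to whom at baseline and at follow-up. The example below is a minimal illustration and not a substitute for the network analysis methods cited above; the organization names, the ties, and the helper functions are all hypothetical.

```python
# Hypothetical before/after comparison of collaborative ties among organizations.
# Organization names and ties are invented for illustration.

def tie_set(pairs):
    """Normalize undirected ties so (a, b) and (b, a) count as one tie."""
    return {frozenset(pair) for pair in pairs}


def density(ties, n_orgs):
    """Share of all possible undirected ties that actually exist."""
    possible = n_orgs * (n_orgs - 1) // 2
    return len(ties) / possible if possible else 0.0


orgs = ["A", "B", "C", "D"]

# Baseline: organizations mostly operating in "silos".
baseline = tie_set([("A", "B")])
# Follow-up: ties reported after the systems change initiative.
follow_up = tie_set([("B", "A"), ("B", "C"), ("C", "D"), ("D", "A")])

new_ties = follow_up - baseline   # relationships that appeared
lost_ties = baseline - follow_up  # relationships that disappeared

print("new ties:", sorted(sorted(tie) for tie in new_ties))
print("lost ties:", sorted(sorted(tie) for tie in lost_ties))
print("density before:", round(density(baseline, len(orgs)), 2))
print("density after:", round(density(follow_up, len(orgs)), 2))
```

Counting ties says nothing about their quality; case studies or interviews would still be needed to judge whether new connections are genuinely collaborative.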

Outcome mapping (International Development Research Centre 2007) is an approach that recognizes the complex nature of international development initiatives, especially those that involve multiple partners collaborating together, where each is contributing in some way to changes in the behaviors, relationships, activities, or actions of the people, groups, and organizations with whom a program works directly. Outcome mapping incorporates aspects of linear logic modeling by having a program map the behavior changes it can affect within its direct sphere of influence; it incorporates systems thinking by mapping relationships with partners and recognizing the impossibility of drawing simple causal attribution conclusions where many factors affect change and cooperating partners each make contributions to change; and it brings an appreciation of complex adaptive processes to bear by treating an outcome map as emergent and dynamic, a map that can be and should be revisited and revised as conditions change (as they surely will) and as new understandings emerge (which is surely to be hoped for and encouraged).

This last point deserves emphasis. Logic models and program theories can change and evolve over the life of a program. Nick Tilley (2004) has described how the Crime Reduction Programme in the United Kingdom metamorphosed over its 3-year life, including changing theories of the program and changing forms of evaluation. The more volatile the environment of a program, the more likely it is that the program theory will be affected by that volatility, and by learning from what actually unfolds.

Developmental Evaluation

Developmental evaluation is especially appropriate for situations with a high degree of uncertainty and unpredictability where an innovator or funder wants to “put things in motion and see what happens.” Using the example of the leadership development program, infusing a hundred highly trained and supported national leaders into the nonprofit sector fits these parameters. All kinds of things can happen; many unexpected results may emerge (and did in the actual initiative on which this example is based). Some form of real-time, open-ended tracking is needed to capture what emerges. This is especially appropriate for venture capital and highly entrepreneurial, seed money efforts where the strategy is to infuse people and resources to shake up a system, increase the rate and intensity of interactions among system elements and actors, and see what happens. As things start to happen, additional resources can be infused to support further development. The very act of capturing what emerges and feeding information back into the evolving system makes this form of developmental evaluation part of the intervention (a form of process use). In such developmental evaluation, there is not and cannot be a classic separation between the measurement process and what is being observed (Gamble 2007). The observations affect the emergent self-organizing by injecting new information into the dynamic and emergent system. For example, following up with the trained leaders to find out who they are networking with and what is emerging can stimulate them to network. Even the unit of analysis can change in emergent, developmental evaluation as the evaluator tracks whatever emerges; that is, the very definition of units of analysis is emergent as the evaluator follows the complex nonlinear dynamics of the connections and linkages among those exchanging information and engaged in some forms of self-organized collective action.

Developmental Evaluation for Complex Adaptive Systems

Developmental Evaluation is especially appropriate for situations with a high degree of uncertainty and unpredictability where the purpose of the evaluation is to support development, adaptation, and innovation in a complex adaptive environment characterized by rapid change and dynamic system relationships among important influences and actors. (See Chapter 8, pages 277–290, for an in-depth discussion of Developmental Evaluation.)

Attention to complex adaptive systems provides a framework for understanding such common evaluation issues as unintended consequences, irreproducible effects, lack of program fidelity in implementation, multiple paths to the same outcomes, “slippery” program requirements, and difficulty in specifying treatments (Morell 2005:72). Glenda Eoyang (2006a) has described how a complexity-based approach to evaluation was used in a large social services department with 3,000 employees at the county level to help integrate services and functions. The evaluation elucidated emergent “networks of meaning,” supported new approaches to alignment in looking at how different units connected to each other, and tracked the evolution of language about what was happening, which shaped emergent meanings. Eoyang and Berkas (1998) have examined and generalized the particular contributions that the lens of complex adaptive systems can bring to evaluation. They concluded,

The system-wide behaviors of a Complex Adaptive System (CAS) emerge over time. For this reason, the evaluation system should focus on developmental and emergent patterns of behavior that:

Match the developmental stage of the system. . . . Consider the dynamical pattern that the system exhibits over time to design an evaluation program to capture the “differences that make a difference.”

Track patterns and pattern changes over time, rather than focusing exclusively on behaviors of specific individuals or groups. While it may be unreasonable to expect a particular path of development or a predetermined outcome from a CAS, emergent patterns of behavior can be expected outcomes. An effective evaluation system must be able to capture and report on these evolving patterns. (Eoyang and Berkas 1998: www.chaos-limited.com/CAS.htm; see also Eoyang 2007)



The Challenges of Establishing Causality

Theories of change are important in evaluation because causality is a central issue in making judgments about a program’s merit, worth, or significance. The classic causal question, in all its simple brilliance, is: Did the program produce the desired and intended results? Or, to what extent can the observed outcomes be attributed to the program’s intervention? Establishing causality becomes more complicated over longer periods of time, when there are multiple interventions occurring at the same time, influences flow in both directions because of feedback loops, and interdependent relationships in a dynamic system mean that all kinds of interconnected elements can change at the same time. The notion of cause and effect can lose all meaning in highly dynamic and complex adaptive systems characterized by nonlinear patterns of interaction, where small actions can connect with other smaller and larger forces and factors that wander hither and yon, but then suddenly emerge, like a hurricane, as major change at some unpredicted and unpredictable tipping point. Evaluation methods, designs, and tools for measuring predetermined outcomes and attributing causes for simple interventions that can be depicted as linear logic models are of little use in tracking the effects of interventions in complex adaptive systems. Indeed, the imposition of such linear designs and simple measures can do harm by trying to control and thereby stifling the very innovative process being tracked. Different kinds of theories of change, then, demand different evaluation approaches to conceptualizing and assessing causality (Rogers forthcoming), including the importance of developmental evaluation for use in complex adaptive systems. The challenge is to match the evaluation approach to the nature of the situation, the theory of change at work, and the information needs of primary intended users.

In all this, it is important to interpret results about causal linkages with prudence and care. In that regard, consider the wisdom of this Buddhist story.

One day an old man approached Zen Master Hyakujo. The old man said, “I am not a human being. In ancient times I lived on this mountain. A student of the Way asked me if the enlightened were still affected by causality. I replied saying that they were not affected. Because of that, I was degraded to lead the life of a wild fox for five hundred years. I now request you to answer one thing for me. Are the enlightened still affected by causality?”

Master Hyakujo replied, “They are not deluded by causality.” At that the old man was enlightened.

—Adapted from Hoffman 1975:138

Causal evaluation questions can be enlightening; they can also lead to delusions. Unfortunately, there is no clear way of telling the difference. So among the many perils evaluators face, we can add that of being turned into a wild fox for 500 years!

Follow-Up Exercises

1a. Identify a program that has identifiable and measurable outcomes for individual participants in the program and where the length of the program is less than one year. Examples would be programs that treat alcohol and drug abuse, parent education classes, an agricultural extension program for farmers, a job training program, and so on. Based on the written materials available on that program, construct a linear logic model and explain the steps in the model.

1b. Develop a logic model with the staff in an actual program. Describe how staff reacted to the process of developing the logic model.

2. Take a logic model and convert it to a theory of change by explicitly identifying the causal mechanisms and/or assumptions that explain why each step in the model should occur. For example, a simple logic model would specify that increased knowledge about the health risks of smoking would lead to reduced smoking (Exhibit 10.3). What predictive theory undergirds and explains the causal connection between knowledge and behavior change in this example? Pick your own logic model example and add the causal linkage explanations and/or validity assumptions that convert it to a theory of change.

3. Pick a highly visible political issue and depict it as a logic model. For example, what is the “logic model” that explains why torture is used to extract information from presumed terrorists (e.g., the Abu Ghraib torture controversy in the Iraq War)? Construct the logic model for some matter of visible public policy and then analyze and comment on the assumptions of advocates. Examples of topics: solutions to global warming; relief of Third World debt; more gender equity by educating girls in developing countries; banning gay marriage; stopping family violence; ending child prostitution; building a high wall between the United States and Mexico to stop illegal immigration; teaching young people to abstain from sex until marriage; cutting down old growth forests for timber; drilling for oil in the Alaska wilderness; or any current public policy controversy. Portray the proposed intervention and alleged solution or desired result as a logic model. Discuss and explain the “logic” of the model you construct.

4. Exhibits 10.8, 10.9, 10.10, and 10.11 portray different ways of looking at a prenatal program for pregnant teenagers. Reconstruct those exhibits as a postnatal program. The intended and desired outcome of a prenatal program is a healthy baby and healthy mother when the baby is born. The intended outcome of a postnatal program is a healthy mother and child 2 years after the baby is born. Reconfigure those four models for a postnatal program. Discuss and explain the changes.

5. Pick an intervention of some kind, for example, a program aimed at stopping people from talking on cell phones (mobile phones) while driving. Pick your own example. Using the distinctions between simple, complicated, and complex (Exhibits 10.13, 10.14, and 10.15), present and analyze that intervention from the perspectives of (a) a simple problem that can be solved through a linear model; (b) a complicated problem that requires understanding and changing system relationships; and (c) a complex problem that is best portrayed as emergent and uncertain and requires developmental evaluation. In essence, look at the sample issue or intervention through these three different lenses. Compare and contrast what each illuminates and makes possible, and the evaluation implications of each perspective.

Note

1. Reprinted from Angels in America, Part Two: Perestroika by Tony Kushner. Copyright 1992 and 1994 by the author. Published by Theatre Communications Group. Used by permission.


