
Developing an Evaluation Strategy for a Social Action Agency


Author(s): Walter Williams
Source: The Journal of Human Resources, Vol. 4, No. 4 (Autumn, 1969), pp. 451-465
Published by: University of Wisconsin Press
Stable URL: http://www.jstor.org/stable/145168



DEVELOPING AN EVALUATION STRATEGY FOR A SOCIAL ACTION AGENCY*

WALTER WILLIAMS

ABSTRACT

This article is concerned with the methodological and institutional problems faced by a social action agency in trying to make evaluation an important input to its policy process in which major decisions are formulated and implemented. The discussion focuses not only on the present technical capability of evaluation methods to produce useful policy data, but also on the agency's bureaucratic and administrative structure as it impinges upon evaluation. Consideration is given to such matters as the possible blockages by agency decision-makers of the development and use of evaluation data, the degree of cooperation to be afforded by program managers in field evaluations, and the administrative capability of social action agencies to implement the changes in the field implied by evaluative results. Thus, the article presents some of the problems that must be faced in trying to formulate an effective evaluation strategy in the real world of a social action agency with its evaluation problems that often defy present technological tools, its bureaucratic struggles, and its complex administrative machinery; and it offers a few tentative suggestions for moving towards such a strategy.

The author is currently on a research grant from the National Manpower Policy Task Force as its 1969 Scholar-in-Residence, and is on leave from the Office of Economic Opportunity.

* Several individuals have provided helpful comments on earlier drafts of this paper: Robert A. Levine and Tom Glennan, both of The RAND Corporation; John Evans, OEO; and Glen Cain and Robinson Hollister, both of the Institute for Research on Poverty, University of Wisconsin, Madison. The views expressed are those of the author, and not those of the organizations with which he is affiliated.


I. INTRODUCTION

Much attention both in the government and in universities has been focused recently on the type of evaluations which involve the use of modern analytical techniques to ask the basic question, how well does a program or project work? For example, what is the outcome in terms of benefits and costs?1 One of the most critical issues is whether or not social action agencies will be able to develop and use these "outcome evaluations" as one of the critical inputs to the policy process in which major agency decisions are formulated and implemented.

Despite the fact that a number of outcome evaluations have been undertaken over the past few years, we start this line of inquiry very near point zero. Only a few social action agencies have begun to think seriously about problems involved in making evaluative analysis a significant policy guide. Further, two factors indicate that the path towards evaluation becoming a major element in the policy process may be both long and involved.

The present methodological tools are inadequate, particularly in some areas, and it will require time-consuming work to overcome these weaknesses. At the same time it will be a major hypothesis of this article that evaluation techniques now available can improve greatly the decision data available to the policy planner.

These improvements will not flow automatically, but will require a major agency effort that often must overcome formidable bureaucratic barriers. As outcome evaluations are quite new to social action programs, it seems likely that acceptance of the implications of such results will be fought by those with established decision-making positions. In the past, social action agencies have measured operating "performance" in terms of honesty (no embezzlement), prudence (no profligacy), cost control (not using too many paper clips), and occasionally relatively crude output standards (the number of job placements in a training program). However, under cost-benefit standards, for example, the program manager can be honest, prudent, and thrifty (all, no doubt, great virtues) and still look like a clod with a shockingly low benefit-cost ratio. Beyond embarrassment, evaluation data have a potential for either restricting program funds or forcing major changes in program direction. One can hardly assume passive acceptance by program managers and operators of outcome data as a key factor in agency decisions.

1 For example, the North American Conference on Cost-Benefit Analysis of Manpower Policies, May 14-15, 1969, was sponsored jointly by two agencies, one from the U.S. and one from the Canadian government, and a university from each country, and brought together a large number of persons from both governments and several universities.


Finally, in thinking about the development of evaluations, it must be remembered that after a decision is reached, the further hurdle remains of translating the decision into effective operating policy so as to improve the performance of the agency's programs. Those who plan evaluations need to be sensitive to an agency's administrative structure through which policy decisions are implemented, for, in the final analysis, the test of the effectiveness of outcome data is its impact on implemented policy.

To develop effective outcome evaluations for a social action agency, consideration needs to be given to methodological, bureaucratic, and administrative problems. In short, an effective evaluation policy must be formulated in the real world of a social action agency with its bureaucratic struggles and its complex administrative machinery. Hence, it seems appropriate to speak of an evaluation strategy.

In evolving an agency evaluation strategy, three major sets of questions must be asked:

1. What is the present methodological capacity of evaluation techniques for providing meaningful data that will support major program planning and program design decisions? How may that capability be increased materially in the near future?

2. To what degree will an agency evaluation staff be able to (a) initiate adequate evaluations of existing programs, including some selected project modification to eliminate participant selection biases, produce a wider range of treatment variation, etc., and (b) establish well-designed small scale field projects to test the effectiveness of new ideas with implications for restructuring ongoing programs or developing new ones? That is, how much flexibility will an evaluation staff have in developing a broad set of outcome data relevant for program policy?

3. How can the likelihood be increased that sound evaluation data will have a significant effect on agency policy both at the national level and the local level? That is, what are the major points of leverage at which evaluation data will have the most significant impact on implemented policy?

These three questions are closely related. It is, of course, true that the first question on methodology could be asked in an abstract setting. However, once the evaluation is used as a policy tool, an assessment of its quality is not purely an "academic" matter.

This point becomes strikingly clear in the recently published issue of the Urban Review,2 containing two vehement attacks and a rebuttal article on an evaluation of the More Effective Schools program, a pilot program to improve the quality of education in New York City ghetto schools. Professor David Fox's evaluation for the New York City Board of Education, indicating little learning gain, was a factor in the Board's decision to reduce the MES program. It is surely relevant for the intensity of the argument that the results of the MES evaluation had important implications for the New York City Board of Education, the United Federation of Teachers, and, lurking in the background, the whole controversy over decentralization of the New York school system.

2 "The Controversy Over the More Effective Schools: A Special Supplement," The Urban Review, 2 (May 1968), pp. 15-34.

The direction of cause and effect between the political and the methodological, however, is not one-way. The more clear-cut the evaluation results, the more likely it is that the program operator must take into account the dictates of the findings. It is also true that the present thrust towards the evaluation of government programs at the highest levels--the 1967 Amendments to the Economic Opportunity Act, for example, levy evaluation requirements in every section--places pressure on administrators to allow meaningful evaluation. That is, the more evaluation is considered a good thing, the more likely it is that the needs of the evaluator will over-ride the concerns of program operators.

Thus, we would argue that one's assessments--in good part subjective--of methodological capability, program prerogatives, the sheer ability in an administrative sense to change programs, etc., will be important factors, in the short run at least, in the development of a preferred evaluation strategy. Further, discussion of the merits of one strategy or another may become quite clouded if we leave these subjective assessments implicit rather than setting them out in some detail.

II. OUTCOME EVALUATION METHODOLOGY

In the past, much of the evaluation effort has focused on local project measurement. Yet these studies seem unlikely to be useful to program (multiple project) planning activity. This point does not suggest that a local project's performance is more difficult to measure than a total program outcome. What is difficult is assessing the "why" of that individual performance, and this deficiency blocks generalization to other projects. A claim that a local project model should be replicated widely must be based on evaluation data showing explicitly that success did not derive from atypical quality factors (for example, a charismatic teacher) or from exogenous variables (for example, the level of local economic activity). The development of such local project data will require a quantum leap in evaluation methodology, and it frequently will involve high-cost, small-area socioeconomic analysis.

Quality measures of project administrative and operating components (how good is counseling?) and measures of within-city relationships, such as the degree of cooperation between a manpower program and local industry, present methodological problems that strain our present capabilities. Further, if we accept that a local program should be viewed as an administrative entity operating within both its local area and also the larger community (certainly, the Community Action Agency fits this pattern), the necessary analytical concepts concerning administrative organization and community structure are yet to be derived.

Similar statements are true concerning the operational specification of treatment variables and the measurement of these variables' individual contribution to project success (for example, how much does counseling add to effectiveness?). Beyond this, the requirements for the measurement of the characteristics of participants seem much greater for individual projects (especially the sticky problem of control groups) than for large-scale programs. For example, did creaming explain success?

Finally, the development of economic data such as the city and/or target area unemployment rates and the demand for blue-collar jobs would certainly be expensive and may present difficult methodological problems. The analysis is yet to be done which shows what combinations of SMSA, city, or subarea unemployment rates and the occupational distribution are relevant to the evaluation of training programs. Yet the types of analysis listed in the preceding paragraphs are the ones needed to demonstrate that a particular project really works and may warrant duplication in other areas.

The type of data most likely to support generalization (that is, provide a valid basis for making decisions about large numbers of projects) is what will be termed a gross program statistic--a statistic derived from a large-scale sample survey of a program that shows the over-all level of effectiveness (for example, a mean benefit-cost ratio) of a total program (MDTA), major components (OJT), and/or major elements (remedial reading or coaching) within a program or component.
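To make the notion of a gross program statistic concrete, the following minimal sketch (not from the original article; the project identifiers and dollar figures are invented) computes an over-all mean benefit-cost ratio from a national sample of projects, along with the spread of project-level ratios that the single summary number conceals.

```python
# Hypothetical sketch: a gross program statistic from a national sample of projects.
# The data and field layout are invented for illustration only.
import statistics

# Each record is one sampled project: (project_id, benefits in $, costs in $)
sampled_projects = [
    ("A-01", 310_000, 250_000),
    ("A-02", 180_000, 240_000),
    ("B-01", 420_000, 260_000),
    ("B-02", 150_000, 230_000),
    ("C-01", 275_000, 250_000),
]

ratios = [benefits / costs for _, benefits, costs in sampled_projects]

# The gross program statistic: one summary number for the whole program.
mean_ratio = statistics.mean(ratios)

# The mean is drawn from a distribution of projects that may range widely
# from "bad" to "good"; the spread is worth reporting alongside the mean.
print(f"mean benefit-cost ratio: {mean_ratio:.2f}")
print(f"project range: {min(ratios):.2f} to {max(ratios):.2f}")
```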

The difference in the burden put on measuring instruments between an evaluation of a single project and a total program derives from the law of large numbers. As Cain and Hollister observe in discussing the more global program evaluations: "It is the strength of summary, over-all measures of performance that they will include the 'accidental' foul-ups with 'accidental' successes, the few bad administrators and teachers as well as the few charismatic leaders."3 In a well-designed national sample of a total program, one can expect a wide range for the values of quality variables and exogenous factors. Under such circumstances, an over-all program statistic (for example, a benefit-cost ratio for the total Head Start program) may have statistical validity even though these quality and exogenous factors are not included explicitly in the analysis.

3 Glen G. Cain and Robinson G. Hollister, "Evaluating Manpower Programs for the Disadvantaged," a paper presented at the North American Conference on Cost-Benefit Analysis of Manpower Policies, University of Wisconsin, Madison, May 14-15, 1969, pp. 14-15.

While a gross program statistic showing the over-all level of effectiveness of a program provides very useful program planning information, it does not in theory allow an optimum decision. What we have is an indication that all projects taken together do poorly or well. The mean statistic derives from a distribution that can be expected to range, perhaps widely, from "bad" to "good" projects. If the decision-maker expands the program, he does not eliminate the present bad projects or prevent starting new poor-performance projects. Put differently, the decision-maker in essence draws from a distribution in which there is some unspecified probability of drawing bad (for example, poorly administered) projects. At the same time, it is certainly of great importance to have information indicating the expected values of various programs and major components (for example, OJT on the average is more effective than institutional training).

The critical question that must be asked is not in theory whether a better decision set could be used (the answer is yes), but whether or not present means of analysis available to the evaluator are likely to produce a better result. That is, does there exist within an actual, or likely, outcome distribution a determinate subset of projects subject either to replicability or elimination that can be expected on the average to have a mean performance rate significantly better or worse than the total distribution mean?

How likely is it at the present that the evaluator can do better with additional analysis beyond the development of the gross program statistic? The weaknesses of the present individual project measures considered above make this type of analysis questionable as a present policy tool. Taking the longer view, attempts to develop individual project data so as to make comparisons of projects at the extremes of the distribution seem useful. However, the most promising near-term step is probably a limited further breakout of within-program variations. For example, if one can determine that an OJT program with prevocational training and coaching follow-up has a higher benefit-cost ratio than other OJT variants, both planners and program managers may be in a better position to effect major changes. The data in essence would have generated a rudimentary model meeting the requirements of the replicability criteria.4

4 Again, the law of large numbers is relied on to get away from the problem of quality and exogenous factors. However, the matter is not so straightforward as before. If analysis is ex post and the "best" program variant contains several (rather subtle) elements, there may be a "quality" effect. Choice of this type of program itself may be related to the quality of the program management in that either "good" managers may choose "good" programs or only "good" managers can handle complex programs.
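The within-program breakout just described can be illustrated with another minimal sketch (variant labels and benefit-cost figures are invented, not the article's data): rather than reporting a single program-wide figure, the sampled projects are grouped by program variant and mean benefit-cost ratios are compared across variants.

```python
# Hypothetical breakout of a gross program statistic by program variant.
# Variant labels and figures are invented for illustration only.
from collections import defaultdict
from statistics import mean

# (variant, benefit-cost ratio) for each sampled project
sampled = [
    ("OJT + prevocational + coaching", 1.60),
    ("OJT + prevocational + coaching", 1.45),
    ("OJT only", 1.10),
    ("OJT only", 0.95),
    ("institutional training", 0.90),
    ("institutional training", 1.05),
]

by_variant = defaultdict(list)
for variant, ratio in sampled:
    by_variant[variant].append(ratio)

# Compare expected performance across variants; a consistently higher-performing
# variant is a candidate "rudimentary model" for wider replication.
for variant, ratios in sorted(by_variant.items(), key=lambda kv: -mean(kv[1])):
    print(f"{variant}: mean B/C = {mean(ratios):.2f} (n = {len(ratios)})")
```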


Thus, in my opinion, program (gross) evaluations employing present techniques offer a real potential in the near term for increasing materially the quantity of useful information available to the planner. However, there must be a far greater concern for the requirements of statistical design than has generally been exhibited in the past. These requirements will generally include a well-designed sample, early field interviewing to maximize the retrieval of information, repeated follow-up to reduce sample attrition, and a reduction in the importance of heroic assumptions in the model.5 In short, good evaluations are going to need well-qualified evaluators who are funded at "high" levels so that excessive short-cuts are not required, and who are given realistic planning time to develop a sound evaluation model.
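As a rough illustration of why repeated follow-up matters for the sample design, the arithmetic below (a hypothetical sketch; the initial sample size and per-wave attrition rate are assumed, not taken from the article) shows how quickly a survey sample shrinks when a fixed fraction of respondents is lost at each follow-up wave.

```python
# Hypothetical illustration of sample attrition across follow-up waves.
# The initial sample size and per-wave attrition rate are assumed values.
initial_sample = 5_000
attrition_per_wave = 0.15  # 15% of remaining respondents lost at each wave

remaining = initial_sample
for wave in range(1, 4):
    remaining = round(remaining * (1 - attrition_per_wave))
    print(f"after follow-up wave {wave}: {remaining} respondents remain")

# Even modest per-wave losses compound; early interviewing and aggressive
# follow-up (which lower the per-wave loss rate) are what keep the effective
# sample large enough to support the program statistics described above.
```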

Even under such circumstances, it will be necessary to make arbitrary decisions and to recognize that many crucial questions are beyond our present capabilities. Consider, for example, a national sample of manpower training programs. Whatever our conceptual problems, we may simply accept that earnings for the six months or one year just prior to and just after training are going to be the proxy for lifetime earnings at the before and after time points. It is almost like a bridge convention; we agree that this is the rule. The problem of participant self-selectivity considered subsequently may not be solved in the field by developing an adequate control group. Thus, we may not be able to derive a true benefit-cost ratio (program participant earnings measured against earnings from a comparable nontreatment control group) but only to compare across different manpower programs. Even this assumes that selectivity factors are not significantly different among competing programs.
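The convention described above can be stated as a small sketch (hypothetical earnings figures, window length, and program names; not the article's own computation): average post-training earnings minus average pre-training earnings over the agreed window serves as the crude gain measure, and in the absence of a control group the resulting figures support only comparisons across programs, not true benefit-cost ratios.

```python
# Hypothetical sketch of the before/after earnings convention.
# Earnings figures and program labels are invented for illustration only.

def average_gain(participants):
    """Mean (post - pre) earnings over the agreed-upon window for one program."""
    gains = [post - pre for pre, post in participants]
    return sum(gains) / len(gains)

# (earnings in the 12 months before training, earnings in the 12 months after)
program_a = [(4_200, 5_600), (3_800, 4_900), (5_000, 5_300)]
program_b = [(4_000, 4_600), (4_500, 4_800), (3_900, 4_500)]

# Without a nontreatment control group these are not true benefit estimates;
# they only permit a comparison across programs, and even that assumes
# participant selectivity does not differ much between the two programs.
print(f"Program A average earnings gain: ${average_gain(program_a):,.0f}")
print(f"Program B average earnings gain: ${average_gain(program_b):,.0f}")
```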

Further, some caution is needed in interpreting evaluation data, which generally will mean fitting the evaluation evidence into a mosaic with other reasonable evidence to "validate" a decision. In general, the present generation of outcome evaluations should be viewed as a piece of evidence, not the definitive piece of information that bowls over all other reasonable indications of a different policy decision.

While such cautionary caveats are needed, they should not blind us to the basic fact that good program outcome evaluations using present techniques are likely to produce a signal improvement in planning data. This is especially true in light of the limited amount of good evidence from other sources now available to social program planners. At the same time, this optimism in no way counters the grave deficiencies of evaluation methodology, particularly for measuring local performance. And we have not even left the laboratory to encounter the problems of using these tools in the field or of getting evaluation data a hearing in the bureaucratic structure.

5 An endemic problem of past evaluations has been that they contain basic assumptions which could legitimately go either way in that qualified persons would not agree as to whether particular costs or benefits should be included or excluded. Depending on the decision (assumption) to include or exclude the benefit or cost, the benefit-cost ratio could vary widely. If such variation is large, and particularly if a bias towards positive results can be established, the usefulness of the analysis as a planning tool may be greatly diminished. For example, see Glen G. Cain, "Benefit/Cost Estimates for Job Corps," Discussion Papers (Madison: Institute for Research on Poverty, University of Wisconsin), pp. 3-4, 49-50.

III. CONTROL OVER PROGRAM DATA AND DESIGN

Professor Suchman has observed:

Perhaps one of the easiest of research assignments is to lay out an "ideal" evaluation study design. It is not so much the principles of research that make evaluation studies difficult, but rather the practical problems of adhering to these principles in the face of administrative considerations.6

The nature of the possible conflict between operator and evaluator can be seen by brief analysis of the administrative implications of two of the statistical requirements of a good evaluation design. Cain and Hollister observe:

[I]f we do not have random assignments we must still admit the possibility that self-selectivity or the selectivity procedures of the program administrators has introduced a systematic difference between the participants and the non-participants.7

The standard solution of randomization may create horrible administrative problems. Assume a special reading program designed to help teenagers who have some specified demographic characteristics and read below some level. For the most pure of experiments, we need to draw au naturel a sample of teenagers having the appropriate characteristics, get some of them into the program (the test group), and absolutely bar some others (the control group). If we are not so pure and study persons who choose the program, it still follows that we must allow some of them to take the course and exclude others by purely random methods. Given a greater demand for the course than slots, random selection in theory is feasible. It is not clear, however, that program personnel are anxious to allow a table of random numbers to supersede their functions.
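What letting "a table of random numbers supersede their functions" amounts to in practice can be sketched as follows (a hypothetical illustration, not a procedure from the article; applicant names, the seed, and the slot count are invented): when eligible applicants outnumber slots, assignment to the course and to the excluded control group is made by chance rather than by staff judgment.

```python
# Hypothetical sketch of random assignment when demand exceeds program slots.
# Applicant identifiers and the slot count are invented for illustration only.
import random

eligible_applicants = [f"applicant_{i:03d}" for i in range(1, 61)]  # 60 eligible teenagers
slots = 30  # openings in the reading program

rng = random.Random(1969)  # fixed seed so the assignment is reproducible
shuffled = eligible_applicants[:]
rng.shuffle(shuffled)

# The first `slots` applicants become the test group; the rest form the control
# group, barred from the program for the duration of the evaluation.
test_group = shuffled[:slots]
control_group = shuffled[slots:]

print(f"test group: {len(test_group)} participants")
print(f"control group: {len(control_group)} excluded applicants")
```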

6 Edward A. Suchman, Evaluative Research (New York: Russell Sage Foundation, 1967), p. 21. Italics added.

7 Cain and Hollister, "Evaluating Manpower Programs... ," p. 21.


Cain and Hollister state that "designing the programs to be studied with as wide a range in levels and types of 'treatments' as possible will serve to maximize the information we can extract from an ... analysis."8 The "wideness" of the range generally requires the implementation of new, and frequently complex, methods; as well illustrated in the More Effective Schools controversy, this is difficult:

Despite the administrative and organization changes, little has happened in the way of innovation or restructuring in the basic teaching process. Observers noted that a majority of lessons they witnessed could have been taught to larger classes with no loss in effectiveness. When asked about changes in the "method of instruction," administrators and teachers alike pointed to the small class and the use of specialists and cluster teachers, which we would consider administrative changes rather than changes in methods of instruction.9

We may generalize this conflict between evaluator and operator through use of Weberian "ideal" types. The characterization sought is for the ideal types of "good" planner-evaluator and program operator. Both seek the good, the pure, the beautiful.

The goal of the planner-evaluator is to find effective techniques (models) that can be replicated, so his concern is not per se particular participants but the target population of which they are a sample. His focus then leads him to say: Within the bounds of human decency, consistently make choices in terms of participants and program situations which will increase the likelihood of valid statistical generalization.

The ideal operator's goal is to maximize the effect of his particular project. As a professional in operating social programs, he starts from the premise that he knows how to help people. His goal would indicate program tinkering based on professional knowledge: Do not hold the design if program effectiveness seems likely, in his opinion, to be raised by either minor or fundamental changes. In short, he will make choices biased towards the participant in the program and against the research design when these are in conflict.

From these "good" ideal types, it is easy to run through various Lord of the Flies-type scenarios in which the chaste and the good come to be at each other's throats. In the interest of realism, if we add a few dashes of the nasty, brutish, and short (for example, that program managers do not like the idea of being measured for success),10 the formula produces a real world of terribly complex problems for evaluation.

8 Ibid., p. 22.
9 "The Controversy Over the More Effective Schools ...," p. 17.

10 See Suchman, Evaluative Research, for a good discussion of this point.


For ongoing programs, the problems would seem to indicate the following rather bearish predictions: (1) Program operators will not be very cooperative in evaluations, especially in permitting evaluators to modify their programs in terms of participant selection, treatment variation, etc. (2) Even if such modifications are allowed, it remains questionable that participant selection procedures and design modifications will be implemented properly or carried through for a sufficiently long time to permit a meaningful evaluation.

IV. THE AGENCY R&D PROGRAM11

Small scale field projects carried out in advance of larger-scale applications offer the possibility of avoiding many of the administrative problems of ongoing programs. The emphasis on "project as program" typical of ongoing projects can be shifted for R&D to "project as experiment," properly concerned with statistical niceties.

But the necessary, if not sufficient, conditions are severe if an R&D project is to have a reasonably high probability of producing usable outcome data. The requirements include (1) a clearly defined set of treatment variables specified in operational terms and implemented in the field to meet those specifications; (2) a design sufficiently general so that the results will have broad application or can be replicated; and (3) a data retrieval system likely to produce the statistically valid outcome data required to measure the effect of important project variables. In addition, the project should either fit into a broader experimental design setting forth alternative treatments (about which outcome information may or may not be available) or itself contain sufficient diversity in treatments so as to allow meaningful comparisons with feasible alternatives.

To my admittedly limited knowledge, R&D programs in the social action fields do not appear to have had a high level of success either in avoiding administrative problems or in meeting the necessary conditions for a project to produce statistically valid models.12 These factors seem to me to have been important in this failure:

11 Social action agencies fund small scale projects operated under field conditions to try out new program ideas under a variety of labels-"E&D," "Demonstration," or "R&D"; we will use the last one.

12 It should be noted that no recent study has investigated these programs in terms of the scope of their activity, their mode of operation, their staffing pattern, their relationship to the agency evaluation staff, and so on. Subsequent discussion will elaborate on the dimensions of these questions.


1. Except in unusual circumstances, R&D projects have been run by professional program administrators, not researchers. The project offers the operator the chance to try out all his pet ideas outside of the confining structure of a regular program's rules and procedures. Such "freedom" seems to lure the atypical outstanding operator with "his" ideas that he plans to try out. So we have a "super operator syndrome" that immediately raises grave doubts about the replicability of the projects.

2. R&D projects are often generated by field demands. That is, the R&D staff responds to proposals developed in the field (probably by operators) generally without any design structure for assessing the relevance of the project to over-all R&D needs and priorities.

3. Single projects raise the severe methodological problems of controlling for quality and exogenous factors and specifying and measuring treatment variables in meaningful operational terms, and the severe logistical problems of insuring that the treatment design is, in fact, implemented and that, once implemented, it is adhered to over time.

4. The necessary conditions for an R&D project to produce valid outcome data indicate staff members with knowledge of substantive areas, with a background in statistics and experimental design, and with the ability to organize, implement, and supervise field work. Government R&D staffs in social action programs seem far short of these standards.

5. R&D staff members have at times had a very different orientation towards the program. Let me sketch the characteristics of an ideal-type "action-operator" orientation towards projects: (1) a strong commitment to action--things must be done before it is too late; (2) a fairly limited knowledge of analytical technology but at least a feeling for the severe limitations of the present methods; (3) a much stronger belief than the academic scholar both that he knows what needs to be done and that he can observe ("evaluate") success with limited empirical evidence; and (4) a strong belief that institutional (structural) blockages are a critical and often over-riding factor in inhibiting the progress of minority groups that are so often the target of social action programs. The projects developed under this rationale have not as a rule had sound analytical underpinnings.

So we come to find as a result of poor staffs, an action orientation, or both, that R&D projects have often failed not only to develop replicable program models useful to the decision-maker, but also to produce results that would further the development of evaluation methodology. And, given the complexity of evaluating individual projects, the failure to add cumulatively to our knowledge may be the great sin. Despite the millions spent on social action R&D projects, seemingly we have gained scant additional knowledge about techniques that would have moved us much closer to making methodological breakthroughs.


V. THE AGENCY DECISION-MAKING AND IMPLEMENTATION PROCESS

An agency's established decision-making and implementation process is a critical factor for developing an evaluation strategy. That a particular agency will have decision centers which derive their power from a variety of sources such as personal charisma, long-established congressional committee ties, a powerful constituency (organized labor, for example), etc., and that such centers of power may impinge on the evaluator is hardly a revelation. After recognizing the relevance of the "real politics" of the agency for developing an evaluation strategy, one gains little from a theoretical discussion, as any particular power situation is generally quite idiosyncratic. The implementation process (how programs get put in place), however, both has general relevance across agencies and illustrates the kinds of political administrative questions facing the evaluation strategist.

The question of an agency's capacity to implement programs starts from a set of simple, observable, and irrefutable facts. National level planners,13 perhaps relying on evaluative analysis, may make major agency policy decisions, but they do not implement them. Nor do senior Washington-level program heads. Rather, the task of implementation falls to "somebody out there" in the field. Nevertheless, the end result of the policy process--implemented policy--ought to be the ultimate concern of the evaluation strategist. My claim, of course, is not that the senior agency evaluator or planner should be responsible for implementation or that he should be an expert in how to run a specialized program. Rather he must be an expert in terms of the capabilities and limitations of the field operation. It doesn't make much sense to come up with sophisticated new program alternatives--perhaps derived from evaluation activity--when you know that programs of this type are simply not going to be implemented at the local level for technical or political reasons.

In terms of these statements, the question of implementation has three aspects: how well articulated by the planner is the policy that is to be implemented, how capable administratively is the field or local staff to implement it, and how much ability does the agency have to force the change upon the local operator.

It seems a fair generalization that planning staffs have been naive about the complexity of translating fairly abstract policy concepts into meaningful field operation terms. Thus, the first breakdown in the network from the national office to the field may be in the planner's failure to translate outcome evaluation results from ongoing programs and R&D activities into a form usable in the field.

13 This paper will not consider the normative question of where evaluation should fit in the agency policy structure. Let us simply assume that the agency has a senior staff planner who is a key agency policymaker and has evaluation as one of his major responsibilities.

Next, some assessment is needed of the administrative capacity of the agency's field and local organizations to implement detailed and complex specific program changes. In part, successful policy implementation depends on the agency's regional or travelling field staff--how competent and how numerous (one good man in the Western Region is not likely to serve adequately the needs of 20 cities). However, the organization and technical skills of the local operator are probably the single most critical factor. For example, to put it euphemistically, rural CAA staffs are said to be rather thin. Can one envision the implementation of anything but the most simplified program changes? Also, there are problems involving the flexibility to change or replace local personnel. The teacher with 20 years' experience and seniority in a civil service system or a union may be there to stay, even though the new program requires quite different skills or a markedly different orientation.

Finally, the question of the agency's political ability to force field compliance is germane. The Employment Service represents a classic case in which a local entity fully funded by the federal government has strong city and state ties that insulate it against federal change. Such considerations again reduce to idiosyncratic cases, which makes them no less important but only more difficult to generalize about. But it does seem certain that political blockages at the local level often will defy solution by an agency.

If the author has learned one thing after four years in the federal government, it is that translating a major decision for a complex change (for example, initiating a sophisticated new program) into operating policy that accurately reflects that decision is a terribly difficult task. It is my strong opinion that one should assume that a social action agency is incapable of initiating such changes in a manner that will conform in detail to the basic decision. This statement, of course, does not suggest that complex changes will not be made in the future--only that gaps between what is planned and what actually takes place are likely to be quite large unless direct action is taken to improve the agency implementation process.

The claim is not purely one of incompetence of field staffs, but reflects the difficulty of the task itself. The problem of operationalizing a concept so that it has meaning for a program operator, as suggested before, is one of the most difficult of tasks. Trying to do this in many places, as is required for a national program, raises the level of the problem exponentially.


VI. CONCLUSIONS

The previous sections have presented a discouraging set of problems facing the developer of an evaluation strategy for a social action agency: inferior methodological tools, severe field problems in implementing evaluations, ever more severe problems in implementing new program ideas derived from evaluations and from other sources, problems of integrating outcome evaluation results into the agency decision-making process, a basic weakness of R&D programs in producing good outcome data, etc. Nor has this grim picture been painted to set up a dazzling solution. However, an approach to the problem can be offered that rests on three sets of hypotheses:

1. Significant progress towards better over-all program outcome evaluations can be made using present methods. However, such progress will require a major effort to develop more competent evaluation staffs for the hard and often time-consuming task of developing and mounting sound projects.14

2. The big breakthroughs in evaluation methodology for individual projects will come, if at all, only from a program manned by a high quality internal staff and outside contractors and carefully planned in terms of a time horizon that is unlikely to be less than five and probably will be closer to ten years.

3. For evaluative analysis to have a major impact on policy, basic changes are required in a social action agency which (a) clearly establish the place of the evaluator in the decision process, (b) move managers and field operators towards greater cooperation, and (c) develop the agency implementation process to a point at which complex changes can in fact be implemented.

The breadth of these required changes indicates that the problem is much larger than evaluation per se. After all, the paucity of hard data from any source and the weaknesses of the implementation process predate the new stress on evaluation. An outcome evaluation strategy must be viewed as part of what should be a larger strategy for a social action agency to develop a greater concern for hard objective evidence,15 and for the terribly complex problems of implementing policy in the field. No one who has faced these problems will view them as easy or be confident of more than limited success. No one with any appreciation of the problems will expect to eliminate the "irreducible area for decision-making for which reason runs out."16 Still, one can hope for progress towards harder evidence and less arbitrary judgments.

14 If this paper does nothing else, it is hoped that it lays to rest any notion that there is an easy path to producing good outcome evaluation data. It seems to me that there is a tendency in Washington to look for the shortcut because outcome data are needed here and now. Among other things, this emphasis has pushed evaluation contractors towards the "quick and dirty" project.

15 The emphasis in this article on hard evidence does not in any way suggest that only such evidence should determine social decisions. In particular, distributional decisions among heterogeneous groups (poor vs. nonpoor) will be influenced heavily by subjective assessments of a normative sort. Basically, once we decide who needs help, hard evidence should point towards more effective ways of providing this aid.

The notion of a long time-horizon in much of the agency's evaluation activity will be a hard pill for the head of a social action agency to swallow. The pressures on him to act quickly are tremendous. It would be a sterile exercise to push aside these political considerations. At the same time, the case for a new attitude towards outcome evaluations must be made.

If basic changes are not made and a realistic time-horizon is not adopted, someone else a few years hence will be recreating an article of this type. If we continue to thrash around as at the present, we seem destined to circle about, returning to the same point without greatly increasing our knowledge. Be it ever so unpalatable, this is a high probability outcome unless we start making the very hard changes dictated by the reality of the situation.

16 Charles E. Lindblom, The Intelligence of Democracy (New York: The Free Press, 1965), p. 188.
