Some Critical Issues in the Improvement of Instruction Through Programed Learning

TEACHING MACHINES AND AUTO-INSTRUCTIONAL PROGRAMS

SOME CRITICAL ISSUES IN THE IMPROVEMENT OF INSTRUCTION THROUGH PROGRAMED LEARNING

IF WE ARE SERIOUS about improving the quality of instruction through programing techniques, we must address ourselves to at least five critical needs or issues which have not as yet received the attention they deserve. These issues were discussed in more detail in a paper which I presented recently at the 26th Educational Conference, in New York City, sponsored by the Educational Records Bureau and the American Council on Education. Although the full proceedings of this conference will be published in the relatively near future, a brief comment on the major issues identified in my paper seems to be appropriate here.

A first requirement that needs to be met is for objective, unbiased public information. There is need to point up not only the potential promise of programed instruction but, also, the practical limitations and problems that will be encountered in applying the techniques in the immediate future. The widespread dissemination of the interim guidelines prepared by the AERA-APA-DAVI Joint Committee--originally published in the July-August, 1961 issue of AVCR, p. 206-07--was undertaken as one step towards meeting this need for public information. This statement has by now been reprinted in the NEA Journal and in a number of other educational periodicals.

A second critical issue (with which the Joint Committee has also been concerned) arises from the need for the wise allocation of government and foundation support both to certain kinds of practical development efforts and to basic, supporting research. The latter is needed to improve techniques, and to provide a firmer foundation on which subsequent practical development can be based. We need to recognize that basic or exploratory research aimed at discovering new techniques and testing theories cannot, by its very nature, be guaranteed to pay off in each and every instance. It should, nevertheless, be given increased support because of its cumulative long-range payoff value. For more immediately oriented efforts, experimentation involving successive empirical tryout and improvement of programs in new curricular areas is to be recommended. Such useful spade work should be regarded as an indispensable prerequisite to the conduct of large-scale field tests, or attempts at spectacular demonstration. Field suitability tests of those programs which have achieved a high level of potential effectiveness through repeated tryout and revision should be accompanied by thorough studies of administrative, economic, and logistic factors affecting the use of programed materials in schools.

AV COMMUNICATION REVIEW

A third critical issue is the relationship of programed-instruction techniques to other media and instrumentalities of education, including educational television and radio. Programed materials for use by individual students can provide a needed complement to other media of instruction. But, in addition, techniques for the optimal sequencing of material developed in the context of self-instructional materials can be applied in improving the content and sequencing of instruction presented to large groups by the mass media. Some recent Title VII experiments at the American Institute for Research by Gropper and Lumsdaine serve to illustrate some of these possibilities. (See Research Abstracts and Analytical Review in the November-December 1961 issue of AVCR.)

In terms of a broader horizon of educational requirements, I would like to stress the importance of applying and modifying the techniques of instructional programing to help meet the needs in areas of the world where educational resources are at present less bountiful than our own. For those areas, we particularly need to explore the use of self-instructional materials in the training of teachers, and also the combined use of radio, recordings, and inexpensively-presented visual materials. Some, though not all, of the concepts of programed instruction could currently be utilized in many regions where realistic consideration of present economic and geographic constraints would make the use of individually presented self-instructional programs unfeasible for the immediate future.

It seems clear that the concepts and techniques underlying teaching machines and programed instruction represent a potential (though as yet far from fully realized) breakthrough of the first magnitude toward more effective attainment of any particular educational objective. But this very fact focuses attention on a fourth issue--namely, the need to re-examine often outmoded methods of defining educational objectives and, thus, to identify those objectives toward which the evolving new techniques of programed instruction can most usefully be applied. The rapid advance being made through focusing massive effort on the development of new techniques of instruction may fail to be paralleled by comparable progress, equally needed, in the sharpening and redefinition of educational objectives. The two kinds of progress--with respect to ends as well as means--must go hand in hand if the sustained progress we need is to result.

A fifth critical issue involves the question of quality control for programed instructional materials, and points to a need for the development of standardizable techniques for assessing programs. As has been previously indicated in this department (AVCR, July-August, 1961, p. 206), the major concern of the AERA-APA-DAVI Joint Committee is with the preparation of technical recommendations for this purpose. The Educational Testing Service of Princeton, New Jersey, is another organization which is also working actively on the problem of program assessment, and with which the Joint Committee is maintaining close liaison. The issue of quality control was the topic of a paper presented recently by Ernst Z. Rothkopf, of the Bell Telephone Laboratories, at the previously-mentioned 26th Educational Conference. Dr. Rothkopf pointed out that little guidance has heretofore been available as to the criteria by which educators may judge the quality of the programed self-instructional materials now being offered to them. In his paper, "Criteria for the Acceptance of Self-Instructional Programs," he distinguished two classes of criteria which might be considered. The first category involves internal characteristics of the program; here one would ask, in effect, whether the program meets requirements of a theory of programed instruction. The second category is concerned not with the observable characteristics of the program material, but rather with what might be called its output or achievement characteristics; it asks, in effect, what the program can do. Members of the Joint Committee share Dr. Rothkopf's opinion that, for the present at least, criteria for the assessment of programs should be developed in terms of the latter category rather than the former. (Cf. Lumsdaine and Glaser, Teaching Machines and Programmed Learning; 1960, p. 566.)

Dr. Rothkopf has advocated a "label" for self-instructional programs which would specify their merits in terms of achievement on a suitable performance test as attained by the students who had used the program under specified conditions. Here is the outline of his proposal:

A SUGGESTED "LABEL" FOR SELF-INSTRUCTIONAL PROGRAMS

The following items would be useful in evaluating programs and in deciding on their acceptability:

1. A complete copy of a performance test, including alternate forms, which represents the complete specification of the behavioral goals of the instructional program;

2. Specification of the student population used during evaluation: (a) number; (b) educational level and character (means and variability); (c) achievement levels (means and variability); (d) IQ (means and variability); (e) before-instruction performance test scores (means and variability); (f) other predictor measures; e.g., reading proficiency;

3. Conditions of administration during evaluation: (a) group size and supervision; (b) incentive systems; (c) distribution of practice; (d) supplementary instruction; (e) other supplementary procedures such as laboratory exercises, demonstrations, films, etc.; (f) teaching machines which were used; (g) physical arrangements of room(s);

4. Results of evaluation: (a) performance test scores, means, variability, and relations to population characteristics such as IQ; (b) administration time as a function of population characteristics;

5. Recommendations for use;

6. Price;

7. Supplementary report describing the techniques used in the development of the program.
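One way to make the outline concrete is to treat the label as a structured record. The following is only a minimal sketch in Python and is not part of Dr. Rothkopf's proposal: the class names, field names, the mean_gain helper, and all scores are hypothetical and invented for illustration. Only a few of the sub-items (population size, before-instruction scores, conditions, results) are represented.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional


@dataclass
class ScoreSummary:
    """Mean and variability of one measure (hypothetical helper)."""
    mean: float
    sd: float

    @classmethod
    def from_scores(cls, scores: List[float]) -> "ScoreSummary":
        # stdev needs at least two scores; report 0.0 variability otherwise.
        sd = stdev(scores) if len(scores) > 1 else 0.0
        return cls(mean=mean(scores), sd=sd)


@dataclass
class ProgramLabel:
    """Illustrative record loosely mirroring items 1-7 of the suggested label."""
    performance_test_forms: List[str]      # item 1: test and alternate forms
    population_size: int                   # item 2(a)
    pretest: ScoreSummary                  # item 2(e): before-instruction scores
    administration_conditions: List[str]   # item 3
    posttest: ScoreSummary                 # item 4(a): performance test scores
    recommendations: str = ""              # item 5
    price: Optional[float] = None          # item 6
    development_report: str = ""           # item 7

    def mean_gain(self) -> float:
        # Assumed summary statistic: difference between the two reported means.
        return self.posttest.mean - self.pretest.mean


# Invented scores for four students, for illustration only.
label = ProgramLabel(
    performance_test_forms=["Form A", "Form B"],
    population_size=4,
    pretest=ScoreSummary.from_scores([10.0, 12.0, 14.0, 16.0]),
    administration_conditions=["individual study", "no supplementary instruction"],
    posttest=ScoreSummary.from_scores([30.0, 34.0, 36.0, 40.0]),
)
print(label.mean_gain())
```

A record of this kind keeps the evaluation population and conditions attached to the results, which is the purpose of the label: scores are reported only together with the circumstances under which they were obtained.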

The Joint Committee is currently at work on a considerably more detailed, though still preliminary, formulation of technical recommendations on procedures for assessing the performance characteristics of programed materials. It is expected that these preliminary recommendations will be made available early in 1962. In the meantime, members of the Joint Committee believe that Dr. Rothkopf's proposed "label" is a useful interim guide which can well be commended to the attention of program producers and potential program purchasers.

Several recent papers have been devoted to various aspects of the problem of assessing programed material. One of these recent papers, by George Geis of Hamilton College, appears below. Additional papers representing various approaches to this critical problem of program evaluation will appear in early subsequent issues of AVCR. --A.A.L.

SOME CONSIDERATIONS IN THE EVALUATION OF PROGRAMS

GEORGE L. GEIS

The author of this paper is assistant professor of psychology at Hamilton College, and is the senior author of the Hamilton College Introductory Psychology Program, developed under a grant from the Ford Foundation's Fund for the Advancement of Education.

WITH HUNDREDS OF PROGRAMS either available or in the final stages of preparation, the field of programing must increasingly turn its attention to the evaluation of its products. This paper proposes that we approach measures of evaluation cautiously, and that we recognize some of the complex problems in programing which may affect the data of evaluation.

Evaluation Data

The two kinds of data most often cited are (a) error-count--that is, the number of erroneous fill-ins an item produces in a constructed-response program--and (b) time saved, either by students or teachers, in a program as compared with conventional teaching methods.

Error Count

Too often we read or hear that a program is good because of its low error rate, yet are not given supporting evidence to show that the program accomplishes its original purpose of producing certain changes in the student's behavior. A low error rate is a necessary but far from sufficient condition for a good constructed-answer program.

A reduction in errors can be achieved, for example, in a program that requires only the copying of responses instead of fill-in constructed answers. A more subtle and serious way of achieving low errors is by continuous "false cuing." Francis Mechner is writing extensively on this subject (3). It is sufficient to point out here that one of the important techniques used in programing can turn into an Achilles' heel if it is not used properly. Known as fading or vanishing, it is the withdrawing of irrelevant stimulus support after the response has been sufficiently strengthened by means of such props. Suppose the behavior we want the student to emit is eventually to be controlled by a set of conditions called x. We would begin by producing the behavior under condition y. Next, we would go through the process of pairing x and y, and then of fading the y condition until the student's behavior is under the control of x alone. However, in many programs the final behavior sometimes remains under the control of y; or, if the programer is a little more sophisticated and aware of the danger, the control shifts to another set of irrelevant props. Thus, a programer who is fading support by leaving out more letters on successive frames in a spelling program may overlook the fact that the response is coming increasingly under the control of the grammar of the items or, even, of the previous answers in the program.

We must not accept a low error rate as indicating anything more than just that: the learner is making few errors on the program. The crucial issue is whether or not he has learned anything. Part of the definition of learning is that the response is being emitted under the appropriate conditions, not that the response is being emitted at high strength under inappropriate conditions.

Time Score

A second but somewhat specious measure of a program's efficacy is the time-score. We are all familiar with statements about the time that programed instruction has saved in comparison with conventional teaching methods. These claims constitute a selling point to the educator who is faced with overcrowded, understaffed schools. Since the data on time-saving are so important to him, this measure deserves attention here.

Despite the appeal of such data, they are of questionable use in evaluating a program as good or bad. Consider, first, the claim that student time is saved. We assume that one of the major advantages of a program is that it lets each student progress through the material at his own rate. We may also assume that one student will be enrolled in a certain program much longer than another. If this slower student stays with the program until he reaches mastery of the material, he is likely to take longer to finish the course than to finish one using conventional teaching methods: after all, under the present educational system, he is not required to reach anything but bare competence before he completes the course. On the other hand, learning-time may be cut for the student who comes to the material with well-developed skills in the area, those skills which Glaser calls "existing behavioral repertoires" (2).

Small-step analysis and individual presentation should lead to rapid mastery of the subject matter in most cases, but the emphasis on time-saving may befog the real issue: what is the final level of performance achieved?

That programed instruction saves the teacher's time is said to be another asset. In the sense that the material covered in the program does not have to be covered in the same way in class, obviously the teacher's time will be saved. But the teacher can expect no more from the program than the terminal behavior it is designed to produce. A program in, say, psychology is not necessarily a course in psychology. While it may be possible to make course and program equivalent, the programs I have seen all require additional teaching--a statement with which most of these program authors would agree. If the teacher knows the terminal behavior that the program is to produce, he may very well feel there is still a great deal left for him to do. The program may not produce all of the behaviors he wishes his students to demonstrate. In addition, the success of the individual-tutoring technique the program affords may encourage him to spend more time teaching, not less. Finally, with effective programs, the students become so well prepared that the teacher will more and more feel called upon to develop complex, sophisticated behavior in the student. Such behavior takes time to produce; also, it is behavior that the teacher has not often been prepared to develop.

In summary, the program is too often looked upon either as a way of cutting teaching staff or, at least, of maintaining the same size staff with an increased number of students. And the program may be used, or misused, to this end. However, many of us see the programing movement as producing pressure for larger and better trained teaching staffs. The goal of education, it seems to me, is not to save money but to increase the number and the level of sophistication of educated people in our society.

One last word on both error-rate and time-saving. Let us not forget one of the original aims of the programing movement: to focus attention upon the individual student. It is vitally important, therefore, that we do not get drawn into educational statistics showing us how a group does on the average. Let me re-emphasize: in programing we have committed ourselves to the individual. If we fail him, we are faced--according to our own tenets--not with a "low I.Q.," not with "poor motivation," not with "the end of the distribution." Instead, as far as that individual student is concerned, we are faced with a bad program.
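The point about group averages can be illustrated with a small sketch in Python. The scores, the mastery criterion of 80, and the function name are all invented for illustration and do not come from any actual program evaluation; the sketch only shows how a satisfactory group mean can conceal an individual student for whom the program has failed.

```python
from statistics import mean
from typing import Dict, List


def students_below_criterion(scores: Dict[str, float], criterion: float) -> List[str]:
    """Return, in sorted order, the students whose terminal score is below the criterion."""
    return sorted(name for name, score in scores.items() if score < criterion)


# Invented terminal-test scores (percent correct) for five students.
scores = {"A": 95.0, "B": 92.0, "C": 90.0, "D": 88.0, "E": 55.0}
criterion = 80.0  # assumed mastery level, chosen only for this example

group_mean = mean(list(scores.values()))
failed = students_below_criterion(scores, criterion)

# The group mean exceeds the criterion even though one student falls well short.
print(f"group mean: {group_mean:.1f}")
print(f"students below criterion: {failed}")
```

Reporting the list of individual students below criterion alongside the mean, rather than the mean alone, keeps the evaluation focused on the individual in the sense argued above.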

In the above paragraphs I have stressed the need for continual emphasis on the terminal behavior of the program. The final evaluation of a program must rest upon proof that the program produces the behavior it sets out to produce. The procedures to obtain the appropriate data are yet to be worked out. Although it already seems that they will be complicated and costly, we must not settle for cheaper, more easily obtained data if they are essentially irrelevant to intelligent evaluation.

Terminal Behavior

Suppose that we reject time- and error-scores as the most important data for evaluation. Then we are left with the specification of terminal behavior, and with data on how well the program produces that behavior. The first commandment of programing to which we all pay obeisance is: define the terminal behavior which the program is designed to produce. Theoretically, such an exhaustive definition is drawn up by the programer before he starts to write. This, however, simply is not done except, perhaps, in the case of short, rather restricted programs. I do not mean the programer does not approximate such a definition; I mean he has not worked out a detailed list of every response, or class of responses, which he hopes to condition. Such a list would, of course, be invaluable in estimating the power of the program. Such material should accompany each program if for no other reason than that it would forestall a certain kind of criticism based on faulty expectations.

In lieu of a specific definition of terminal behavior, the teacher or student will be guided by the traditional cues: title, summary of contents, etc. Thus, a program in psychology suggests that the student will come away from the program with large, well-integrated units of verbal behavior appropriately conditioned. While one has similar expectations from a textbook or a course entitled Psychology, this is not to say that the student does acquire the expected behaviors from any of these sources. The user of a program may call it a failure because the behavior he expected is not produced. It seems to me that the programer must not allow the user's expectations of what the program will accomplish to be different from his own. A program cannot be held responsible for not teaching something it was not designed to teach. But the programer may be held responsible for indicating that it will do so. This indication may be an error of omission: the programer does not provide a sufficient definition of terminal behavior, and so allows the user to be guided by title, etc.

If the programer does not supply a definition of final behaviors, at least he can present data for criterion tests. Such tests, admittedly, are samples. They are supposed to indicate the repertoires that the student has acquired. But the history of psychometrics illustrates how difficult it is to define these repertoires. We must be cautious, indeed, in generalizing from sample tests to knowledge or ability. One special danger in programing is that the sample test may too closely resemble the original program. It is possible that the programer may intend to build in short verbal responses under restricted stimulus control. Whether or not he has succeeded may be shown in the results of a short answer or multiple choice test. But most often this kind of behavior is not our goal. We are really aiming at rather large units of verbal behavior. We must then either show test results which demonstrate directly that such behavior is conditioned by the program; or we must show that our sample tests, regardless of form, tap these behaviors; that is, that they are indicators of such large repertoires.

It is fairly evident that a student learns something from well-programed material. But learning is like freedom: it means different things to different people. It is sufficient in a strict sense to show that a program teaches what it is supposed to teach. Yet the programer as an educator has some responsibility to meet the educational needs of his society. It is true those needs are at present ill-defined. However, we suspect that students should come away from programed material with something more than short responses to well-defined stimuli. Even the most hard-headed programer hopes to produce additional behaviors of the kind traditionally referred to as "the ability to integrate or relate materials," "the ability to discuss the problem," etc. Such behaviors, expected by educators and expected by society from the educators, are at least to be considered as terminal behaviors for the programer to produce. In addition, the investigations involved in defining and producing such behaviors provide a fascinating job for the programer. Quite aside from being a tool of education, the technique of programing provides a way of investigating verbal behavior and suggests problems in verbal behavior. The teaching machine may serve the same role in the investigation of verbal behavior that the Skinner box has served in the investigation of more general problems of behavior.

In summary, I am suggesting that we should not overlook the exciting problems which grow out of an attempt to produce more complicated chains of behavior via programed instruction. These are the behaviors the educators expect us to produce; they are also the behaviors which bring up some major problems in verbal behavior.

Reinforcement

Another problem in the over-all evaluation of a program is to determine the reinforcements in the material. Many of us assume that they are of the type proposed by Skinner (4) and others: the student's going on to the next item, his knowledge of results, manipulation of the teaching machine, getting the answer correct, etc. As long as our assumption remains tentative, we will not run into too many difficulties. However, we must remember that, by definition, a reinforcement is a change in the environment that strengthens the preceding response. The program may fail to produce the appropriate behavior in an individual not because it is technically a poor program, but because the supposed reinforcements are not, in fact, reinforcements at all.

We know from our work with animals as well as from simple observations of humans that conditioned reinforcers come in a great variety of sizes and shapes--that, indeed, one man's meat is another man's poison. It seems not unwarranted, then, to point out that since the program involves rather specific, conditioned reinforcers, we must empirically determine whether these reinforcers are universal ones. Even within a rather limited population, such as the one in a college or school, we find a great many students who do not seem to respond as we expect when they receive a higher mark, or when they get an opportunity to go to the next course or the next assignment. One might suspect that for such students, their going on to the next item, as one example, may not be a sufficient reinforcement.

The modern school does not offer enough reinforcements to the student, nor does it selectively reinforce the student with the available reinforcements. This point has been well made in many articles on programed instruction. Yet we must not assume that if the present reinforcements which the schools are able to provide were made available universally, and were made contingent upon the emission of appropriate behavior, they would prove to be reinforcing to every student. We must not only consider ways in which the current educational reinforcements can be made more widely available and appropriately contingent, but we must also examine whether or not we need to add reinforcements--different kinds of reinforcements. Indeed, one might look at the programing movement as an opportunity to investigate the variety of reinforcements which are effective.

Evaluation procedures may show a program as a failure when the only thing wrong with it is that the supposed reinforcements are not reinforcing for the group on which the program was tested.

In addition to discovering reinforcements that may work in short programs, the programer must consider whether all the aspects that he believes are reinforcing will continue to be so throughout a long program. Early in the game, the question was asked of programers: Will the student remain interested in the teaching machine when all of his classes are programed? It is premature to guess at the answer; in fact, the question is a bit naive. Nevertheless, to determine the reinforcements in a specific program--or better, which ones should be built into a new program--is an exciting problem for further exploration. I have made one attempt to point out some possibilities in this area (1). In summary, a program that may fail with a class or with a single student might prove successful with the same group or individual after a mere change in its reinforcements.

This last point brings up some interesting possibilities. There have been discussions of diagnostic tests to be used in determining at what level, or part, of a program a particular student should begin. These tests are analogous, I suppose, to the customary readiness tests in education. We could also design diagnostic tests to disclose a particular set of reinforcements for use with a particular student as he moves through the program. Similarly, we could conceive of education consciously building in some conditioned reinforcers for later use in the education of the child. This process is to some extent in effect now; e.g., we use marks as reinforcers. But new reinforcements which prove especially adaptable to the programing situation might be developed.

To close on this note suggests an over-all comment: The important data of evaluation should provoke us to more than a simple decision to accept or reject a particular program. These data suggest additional and important problems--some at a rather basic level--for the programer and the psychologist to investigate. Our failures in programing should not lead to dismay, nor merely to superficial rearrangements of the material. They present opportunities for exploration in the complex field of human knowledge.

REFERENCES

1. Geis, George L. "Some Possible Reinforcements in a Program." Unpublished paper delivered at the First Conference of Language Programmers, April, 1961, University of Michigan.

2. Glaser, Robert. "Principles of Programming," Programmed Learning: Evolving Principles and Industrial Applications. Edited by J. P. Lysaught. Ann Arbor: The Foundation for Research on Human Behavior; 1961. p. 17.

3. Mechner, Francis. Programming for Automated Instruction, Supplements I, II, III (mimeographed materials). New York: Basic Systems, Inc.; 1961.

4. Skinner, B. F. "The Science of Learning and the Art of Teaching," Harvard Educational Review 24: 86-97; 1954. (Reprinted in Teaching Machines and Programmed Learning. Edited by A. A. Lumsdaine and Robert Glaser. Department of Audiovisual Instruction, National Education Association, 1960. p. 99-113.)

Programer, Teach Thyself

These three words are the title of a mimeographed booklet explaining why programer, programed, and programing are now spelled with only one m. Available free from The Center for Programed Instruction, 365 West End Avenue, New York 24, N. Y., this 23-page booklet--of handy bookmark width--programs the slippery rules for doubling the final consonant or leaving it single when adding the suffixes ing, er, or ed.
