
Journal of Computer Assisted Learning (1993) 9, 194-221

Impact of cognitive theory on the practice of courseware authoring

S. Ohlsson
Learning Research and Development Center, University of Pittsburgh

Abstract The cognitive revolution has yielded unprecedented progress in our understanding of higher cognitive processes such as remembering and learning. It is natural to expect this scientific breakthrough to inform and guide the design of instruction in general and computer-based instruction in particular. In this paper I survey the different ways in which recent advances in cognitive theory might influence the design of computer-based instruction and spell out their implications for the design of authoring tools and tutoring system shells. The discussion will be divided into four main sections. The first two sections deal with the design and the delivery of instruction. The third section analyzes the consequences for authoring systems. In the last section I propose a different way of thinking about this topic.

Keywords: Cognitive theory; Computer-assisted learning; Courseware authoring; Courseware design; Learning; Student modelling.

Improving the design of courseware

Design in general and courseware design in particular can be conceptualized as a search through the space of possible designs (Pirolli & Greeno, 1988). The structure of search processes has been analyzed in detail, because of their importance in the psychology of problem solving (Newell & Simon, 1972) as well as in artificial intelligence (Pearl, 1984). Briefly, a search process consists of two main subprocesses: the generation of plausible solutions and the evaluation of those solutions.

The generative step is difficult mainly because the space of possible designs is too large to explore in its entirety. The designer must find some way of focussing on the most promising solutions. The evaluative step is difficult mainly because the evaluation of a design is often costly, time consuming, labour intensive, morally dubious, or all of the above. It would benefit a designer to have a quick and easy way to evaluate how well his or her design works in practice.

Invited contribution to the NATO Advanced Research Workshop: Authoring Environments for Computer-based Courseware, Maratea, Italy, May 1991.

Correspondence: S. Ohlsson, University of Pittsburgh, Learning Research and Development Center, Pittsburgh, PA 15260, USA. Email: stellan@ms.cis.pitt.edu

It follows that theory can impact design in two fundamentally different ways: by guiding the generation process towards the best design candidates and by facilitating evaluation. Cognitive theory has the potential to impact the design of courseware in both these ways.
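To make the search framing concrete, the following sketch (in Python, with invented names and a toy scoring function; it is not drawn from any system discussed in this paper) shows the two subprocesses as a generate-and-evaluate loop over candidate designs.

```python
# Minimal sketch of design as search: generate candidates, evaluate them,
# keep the best. All names and the scoring function are illustrative only.

def generate_candidates(design_space, heuristic, limit=10):
    """Generation: focus on the most promising region of a large space."""
    return sorted(design_space, key=heuristic, reverse=True)[:limit]

def evaluate(design):
    """Evaluation: in practice this is the costly step (e.g. a field trial);
    here it is stood in for by a cheap scoring function."""
    return design["predicted_learning_gain"] - design["development_cost"]

def search_design_space(design_space, heuristic):
    candidates = generate_candidates(design_space, heuristic)
    return max(candidates, key=evaluate)
```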

Guiding design generation

Bruner has proposed that the application of cognitive research to instruction requires a layer of theory sandwiched between the psychology of learning and the practice of instructional design. He calls that layer a theory of instruction (Bruner, 1966a). Unlike a theory of learning, a theory of instruction is prescriptive. Its principles do not describe how the mind works; instead, they specify what to do when designing instruction. The two types of theory are not independent of each other. The principles of an instructional theory constitute the prescriptive consequences of a descriptive theory of learning. Bruner's idea of a theory of instruction has been widely adopted by other educational researchers (e.g. Glaser, 1976; Resnick, 1983).

A theory of instruction can be seen as a device for guiding instructional design. Its principles would presumably claim that one type of design should be preferred over another. This is precisely what is needed in the generation of a design. The question arises whether cognitive research has, in fact, led to any interesting instructional principles in the 20 years since Bruner's proposal and, if so, what implications those principles have for the design of courseware and, indirectly, for authoring systems.

This is not the place to attempt an exhaustive review of the state of instructional theory. Instead, I will briefly summarize four hypotheses about learning, each of which has given rise to an interesting and non-trivial instructional principle. These principles, derived from different traditions of cognitive research, will be used in this paper to illustrate several points with respect to courseware design and the construction of authoring systems.

Advanced organizers

The theory of meaningful learning proposed by Ausubel (1963; 1968) and developed further by Reigeluth and Stein (1983) is based on the hypothesis that learning does not proceed in the way in which a brick wall is built, one brick on top of the other in bottom-up fashion. Instead, knowledge grows the way an organism grows. Initially, there is only a small, undifferentiated blob of material which, however, contains within it a blueprint for the structure to be developed. Development is a process of unfolding. The initial blob is differentiated, expanded, and elaborated until the structure that was implicit within it has been fully articulated.

This view of knowledge growth implies the instructional principle that each segment of instruction should begin with a capsule formulation, a so-called epitome, which foreshadows what is coming. An epitome is not an outline, but rather a succinct statement of the fundamental idea or organizing principle of a particular subject matter unit. If the student internalizes the epitome, then learning can proceed by expanding, differentiating, and elaborating this cognitive kernel. This instructional principle has considerable empirical support (Reigeluth & Stein, 1983) but is often ignored in practice.

Cognitive conflict

A central tenet of the cognitive theory proposed by Piaget is that mental growth is driven by cognitive conflict, which in his terminology is called disequilibrium (Piaget, 1985). A person is driven to learn when his or her cognitive structures are insufficient to handle a particular situation or problem. The learner will experience inconsistency and contradiction, which in turn trigger change processes which aim to restore equilibrium. A similar hypothesis was proposed by the cognitive consistency school of social psychology that flourished in the 1950s and 1960s (Abelson et al., 1968). The implication of the cognitive consistency hypothesis is that learners must experience cognitive conflict in order to learn, or at least that cognitive conflict stimulates learning.

The prescriptive principle that follows from this hypothesis is that an instructional designer should deliberately induce cognitive conflicts in the learner. This can be done in a variety of ways. For example, the learner can be asked to predict what will happen in a particular situation and then be shown that his or her prediction was incorrect. However, empirical research has shown that this technique does not always cause cognitive change (Kuhn, Amsel, & O'Loughlin, 1988). Better results have been obtained by asking subjects to temporarily adopt a position different from their own (Murray, Ames, & Botvin, 1977). In yet a third application of the cognitive conflict idea, Clement (1991) used what he calls bridging analogies to teach Newtonian physics. A bridging analogy is a situation that is similar both to a situation which the learner understands and also to one which he does not understand correctly. Reflection on such a bridging situation should make evident to the learner the contradiction in his or her own thinking.

Succession of representations

Bruner (1966b) has suggested that knowledge develops in a progression through three types of encoding: from enactive encoding, which enables the learner to act; to iconic encoding, which enables visualization; to symbolic encoding, which enables abstract thought. This theory has given rise to the pedagogical principle that abstract topics such as mathematics should be taught with the help of manipulative, physical objects which embody mathematical concepts and principles and on which the learner can perform various operations in the concrete before trying to understand them in the abstract (Dienes, 1960). This pedagogical principle has become a common part of educational practice, at least for the teaching of arithmetic in primary school, although neither theoretical analysis (Hall & Ohlsson, 1991) nor empirical evidence (Sowell, 1989) supports its effectiveness.

Goal-hierarchies

Information processing theories of cognition claim that cognitive processes can be seen as mental analogues of computer programs. To analyze a cognitive skill is to write down the mental program or strategy the learner has to construct in order to execute that skill successfully. A number of hypotheses have been proposed about the mechanisms by which strategies are learned (e.g. Klahr, Langley, & Neches, 1987) and these mechanisms have many instructional implications, most of which remain unexplored.

The most thorough attempt to develop a set of instructional principles on the basis of an information processing analysis of cognitive skills is to be found in the work of Anderson (Anderson et al., 1990). An important concept in his theory, as in computational analyses in general, is that of a goal hierarchy (Anderson et al., 1987). Most skills, cognitive as well as sensory-motor, can be analyzed hierarchically into a set of goals and subgoals. For example, the skill of moving a piece of text from one file to another on a word processor might be analyzed into three subgoals: select the text to be moved, specify the place where it is to be inserted, and then insert it. These three subgoals might in turn be broken down into smaller subgoals, different for different word processors, until we reach the individual keystrokes or mouse clicks that actually accomplish the task. In his focus on the hierarchical breakdown of skills, Anderson is continuing a long tradition in education (Gagné, 1970; Jonassen, Hannum, & Tessmer, 1989; Resnick, 1973).

The instructional principle implied by the hypothesis that cognitive skills are hierarchically organized is that one should tell the learner about the goal hierarchy of the skill to be learned. To continue the word processing example, the learner should be told that moving text consists of three parts: select the text, specify a place, and insert. Each further breakdown of the skill should be taught explicitly as well. This idea has been employed in several instructional computer systems which have been demonstrated to deliver effective instruction (Anderson et al., 1990).
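To make the notion of a machine-readable goal hierarchy concrete, here is a small illustrative sketch (in Python) of the word-processing example above; the keystroke-level decomposition is invented for illustration and will differ between word processors.

```python
# A goal hierarchy as a nested structure: each goal lists the subgoals
# (in order) that accomplish it. Leaf goals bottom out in concrete actions.
# The keystroke-level details below are hypothetical.

move_text = {
    "goal": "move a piece of text",
    "subgoals": [
        {"goal": "select the text to be moved",
         "subgoals": [{"action": "click at start"}, {"action": "shift-click at end"}]},
        {"goal": "specify the insertion point",
         "subgoals": [{"action": "scroll to target"}, {"action": "click at target"}]},
        {"goal": "insert the text",
         "subgoals": [{"action": "paste"}]},
    ],
}

def teach_explicitly(node, depth=0):
    """Following the instructional principle: state each goal-subgoal
    breakdown to the learner, level by level."""
    label = node.get("goal", node.get("action"))
    print("  " * depth + label)
    for child in node.get("subgoals", []):
        teach_explicitly(child, depth + 1)
```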

Summary

Cognitive theory might impact the design of instruction, and indirectly the construction of authoring systems, through instructional principles which guide the generation process towards the most effective designs. Research suggests several such principles, including:

a. that each unit of instruction should begin with an advanced organizer;
b. that students should be motivated to learn through the induction of cognitive conflicts;
c. that abstract topics should be taught through operations and manipulations with concrete objects; and
d. that the goal-subgoal breakdown of a skill should be taught explicitly.


Adherence to such principles might improve the generation of instructional designs. Their implications for authoring systems will be taken up after the following discussion of the evaluation of instructional designs.

Facilitating design evaluation

Artefacts do not follow deductively from the descriptive theory which underpins their design. Consider the case of airplane design. Obviously, an airplane cannot violate the principles of physics; its design has to be consistent with the principles that govern matter and energy. It is equally obvious that the design of an airplane is not a deductive consequence of the fundamental laws of physics. If it were so, then the perfect airplane would have been designed long ago and airplane engineering would be a closed subject. However, there is no such thing as a perfect airplane, only different airplanes which are more or less useful for different purposes. The laws of physics define a space of workable designs and a particular design is located by searching that space for a point with a desirable combination of features.

Because a design is not a deductive consequence of a theory, it is a hypothesis and as such it needs to be evaluated. Although aerodynamics is a reasonably well understood science, a new aircraft nevertheless has to undergo test flights. The less theory there is to build on, the more important the evaluation becomes. When a design space is searched with very little theory, one may need to test partial or incomplete designs in order to guide the development. In education, this is known as formative evaluation. The idea that the development of instruction should be informed by successive evaluations along the way to a complete design was originally proposed by Scriven (1967) and it has been widely adopted as a principle of good instructional practice.

However, the principle of formative evaluation is easier to agree with than to live by, for several reasons. First, formative evaluation is time-consuming. To teach a course obviously requires as much time as the course is designed to cover; formative evaluation of a six-week course is going to take at least six weeks. When the completion of the design is under a strict deadline, there might not be time to perform successive evaluations along the way to the final product. Second, formative evaluation requires the manpower to carry out the teaching and the collection and analysis of data. Other costs might occur as well, e.g. the cost of teaching materials. If the development is on a tight budget, the resources might not be available. Third, formative evaluation is morally dubious. It amounts to using human beings as guinea pigs by subjecting them to instruction which is known to be suboptimal. In contexts in which the instructional outcome might have a major career impact, people might quite rightly object to being so used.

There is yet another reason why formative evaluation is less common in practice than one might expect. Suppose that a partial design goes out for test and that the results are disappointing. How should the results be interpreted? The idea of formative evaluation is that the results of the evaluation should guide and inform the future development of the design.


However, knowing that the design as a whole does not work is not very useful. For the evaluation to fulfil its formative function, there must be some way to assign the blame of the failure to some component or aspect of the design, which can then be deleted or revised or replaced. However, the inference from a failure to its underlying cause is always difficult and often impossible. Consequently, formative evaluations risk being uninformative.

The problem of interpreting a formative evaluation is not unique to instructional design. Consider the design of the very first airplane. Inventors tried many different designs for airplanes in the century between George Cayley's definition of the problem in 1799 and the Wright Brothers' successful flight in 1903. Each inventor would build what he thought was a successful design and then try to fly it. In the analysis of the invention of the airplane, these modestly successful test flights were extremely hard to interpret:

'... there is no simple way to attribute characteristics of flight performance to features of the design. Did the craft fly 80 feet because it was a biplane configuration or because it had a rear-mounted elevator? The global performance metrics give few clues to the individual contributions of the design features. (The difficulty of interpreting the outcome of the formative evaluations of these designs) can be seen in the many setbacks suffered by most inventors: it was not uncommon for a moderately-successful craft to be replaced by one that could not be flown.'

(Bradshaw & Lienert, 1991, p. 609).

In this analysis of why the Wright Brothers succeeded where the others failed, Bradshaw and Lienert point out several crucial features of the brothers' method, including the invention of a technological device to facilitate and speed up formative evaluation. A key problem in the invention was to find a wing shape that would give sufficient lift. Instead of building entire aircraft with differently shaped wings like the other inventors at the time, the Wright Brothers tested models of wings in a wind-tunnel. In a very short time, they evaluated 50 different wing designs and found one that would work. This would have taken many years if each wing were to have been tested in a full-scale aircraft. The wind-tunnel provided a tight design-evaluate-redesign loop that enabled the Wright Brothers to solve this design problem ahead of their competitors. The key to progress was to replace realistic but slow and uninformative evaluations with rapid, technology-based tests under artificial but informative conditions.

Recent advances in the theory of learning have opened up the possibility of a technological development which might alleviate the difficulties associated with formative evaluation of instructional designs. Beginning with the seminal papers of Anzai and Simon (1979) and of Anderson, Kline, and Beasley (1979), a theory of the acquisition of cognitive skill has been formulated. This theory states, in brief, that cognitive skills are acquired by specialization of so-called weak problem solving methods on the basis of experience.


One remarkable consequence of this theory is that we now know how to construct computer models that simulate learning. We can write down hypotheses about learning in the form of specifications for an artificial intelligence (AI) program which learns. When such a program is run over a set of practice problems, it acquires the intended skill. Researchers have had considerable success in constructing executable models for a variety of learning situations and in correctly accounting for a variety of data from human learning (e.g. Klahr, Langley, & Neches, 1987). This development is the greatest breakthrough that has ever occurred in our attempts to understand learning and it carries within it the potential for a powerful methodology for formative evaluation of instructional designs.

Simulation models of learning do not just learn; they learn from instruction. Every learning model requires inputs of some sort. The inputs vary from model to model. The first simulation models learned from a sequence of practice problems or from repeated trials on a single problem (e.g. Anzai & Simon, 1979). VanLehn (1990) extended this paradigm by noticing that textbooks often show so-called solved examples, i.e. problems for which the salient parts of the solution have been laid out for the student to inspect. VanLehn's Sierra model learns from such solved examples. Extending the range of instructions that can be handled by computer models yet further, Anderson (1989) has proposed a model that takes declarative statements, e.g. geometry theorems, as inputs, and then constructs procedures for solving geometry proof problems from them, a process called knowledge compilation. The declarative representation used by this model when learning geometry is clearly analogous to the input a student in a geometry class receives when he listens to a teacher or reads the geometric theorems in a textbook. Anderson's theory has been used to analyze a number of different instructional topics besides geometry. A different model of learning from declarative information has been proposed by Ohlsson and Rees (1991).

Direct instruction, solved examples, and sequences of practice problems constitute a large proportion of the instruction that is delivered in a typical classroom or training centre. We can now simulate the acquisition of knowledge from any of these three pedagogical scenarios. We can give a model a set of declarative structures that simulate verbal instruction, a solved example, or a set of practice problems, and watch it learn. The amount of cognitive work the model has to devote to acquiring the target skill can be measured by counting up the number of elementary operations that the model has to go through before mastery has been attained. Thus, the model not only shows us how learning is possible and which cognitive processes are involved, but also provides a measure of the cognitive complexity of learning a particular unit of subject matter when taught in a particular way.

The final step which brings us into contact with the notion of formative evaluation is to vary the instruction given to a simulation model and to compare the amount of cognitive work required to learn from the different forms of instruction. For example, a simulation model can be given two different practice problems, and the amount of effort required to learn from each problem can be measured. Alternatively, a simulation model might be given two different sequences of solved examples or two different sets of declarative instructions. The cognitive complexity of the learning process that results in each case is measured. On the assumption that the model is well grounded in data on human learning, a difference between two sequences of instruction with respect to the resulting cognitive complexity constitutes a prediction that it is easier to learn from one sequence of instruction than from the other.

A simulation model of this sort, a 'canned student', can be used for formative evaluation of an instructional design. A courseware designer might face a choice between two different ways of teaching a particular instructional unit. He/she can translate the two different ways of teaching into two different sequences of inputs for the simulation model. By running the model over both sequences, he or she can measure the complexity of the cognitive processes required to learn under the two different types of instruction. If the simulation model has to do more work to attain mastery with one instructional sequence rather than the other, then there is reason to believe that the former is a less effective way of teaching, as long as there are grounds for believing that the model learns in the same way as human students.
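A minimal sketch of this workflow, under the assumption that a 'canned student' exposes a learn method that returns the number of elementary cycles needed to reach mastery (the interface is hypothetical; it is not the actual interface of HS or Sierra):

```python
# Hypothetical 'canned student' interface: the model is run over each
# candidate instructional sequence and reports how much cognitive work
# (elementary cycles) it needed to reach mastery of the target skill.

def formative_evaluation(model_factory, instructional_sequences):
    results = {}
    for name, sequence in instructional_sequences.items():
        model = model_factory()          # fresh simulated student per condition
        cycles = model.learn(sequence)   # cycles to mastery (assumed API)
        results[name] = cycles
    return results

# results = formative_evaluation(CannedStudent, {
#     "sequence_A": inputs_for_design_A,
#     "sequence_B": inputs_for_design_B,
# })
# The design whose sequence required fewer cycles is predicted to be easier
# to learn from, provided the model is well grounded in human data.
```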

Formative evaluation against a 'canned student' has several advantages. First, it need not be time consuming. There is no need to round up students and actually teach them. The intended instructional design has to be translated into a sequence of inputs for the model, but once this task is done, the actual learning on the part of the machine might be very rapid. The speed will depend mainly on the computational power of the hardware, a commodity that is increasing in availability as rapidly as it is dropping in price. Second, there is virtually no cost associated with this operation, other than the negligible cost of running the computer. Third, there is no moral dilemma. A computer will not complain about inferior instruction.

However, the main advantage with formative evaluation using a 'canned student' is that we can inspect exactly what the 'student' learned and how and why. The computer model can be programmed to provide as informative a log of its own inner workings as we want. After finishing learning, the model can type out a detailed description of exactly how it reacted to each part of the input and what it could and could not learn from each instruction, solved example, or practice problem. Such a description might provide valuable clues to how each part of the instruction works. Although a simulation model is obviously an abstraction from the complex reality of human learning, formative evaluation using a 'canned student' might nevertheless be more informative than an evaluation against human students, for the same reason that the evaluation of wing shapes in a wind-tunnel is more informative than realistic tests: it is too difficult to extract any information from realistic tests about why a design worked or did not work.

As an example of this scenario, I have proposed a simulation model of learning that is based on the idea that the essence of procedural learning is the elimination of mistakes, and that it becomes easier to learn a skill the more facts one knows about the relevant domain, because it is then easier to recognize when one makes an error. This theory has been embodied in a running simulation model called HS, which simulates human learning in elementary arithmetic domains and in simple scientific skills (Ohlsson & Rees, 1991). It also provides novel explanations for various phenomena associated with procedural learning, such as transfer of training and the shape of the learning curve.

In an example of how to use HS in the evaluation of instructional designs, we taught HS subtraction with regrouping under four different conditions. We taught the system two different subtraction methods, regrouping and augmenting (also known as equal addition). There is a long-standing debate in mathematics education about the relative difficulty of these two methods for young learners (Brownell & Moser, 1949). In addition, we taught the simulation model two different representations for each subtraction method, one conceptually rich and one impoverished. In the conceptually rich representation, the place values of the digits were represented, and the operations executed while solving a subtraction problem were in accord with the place value interpretation of the digits. In the impoverished representation, each subtraction problem was represented as a two-dimensional array of digits, with no information about place value, and the operations were represented as physical actions on marks on paper. The model was taught to master each subtraction method with each type of representation, so it learned subtraction four times. The cognitive complexity of each of the four learning processes was measured by counting the number of cycles the system had to go through before it reached mastery.

The results did not accord with the received wisdom in mathematics education. The regrouping method turned out to be more difficult to learn than the equal addition method. This is contrary to current educational practice, at least in American schools, where only the regrouping method is taught. Furthermore, the conceptually rich representation required more work to attain mastery than the conceptually impoverished representation. This observation was true for both subtraction methods. Finally, the advantage of the conceptually impoverished representation was greater for regrouping than for equal addition. In short, the results of this evaluation against a 'canned student' yielded surprises that do not accord with the common beliefs of mathematics educators. The results and their implications are discussed in more depth in Ohlsson (1992a). The important point for present purposes is that comparative evaluation of different ways of teaching a cognitive skill against a 'canned student' is already a possibility.

The idea of using a simulation model of learning as a device for evaluating instruction has occurred independently to other researchers*. VanLehn (1990) has described Sierra, a simulation model of learning which, like HS, learns in the domain of subtraction. However, it does not learn from declarative instructions but from a sequence of solved examples. In one simulation study (VanLehn, 1991), Sierra was given two different sequences of solved examples, extracted from two different textbooks both of which are standard in American schools. The model acquired a number of different subtraction procedures, most of them incorrect, from these instructional sequences. VanLehn concluded that children should not be given practice problems for which they have not received the appropriate instruction, because they will construct incorrect procedures which are difficult to correct later. A second conclusion from inspecting the printouts of the simulation runs was that teaching borrowing in the context of problems which successively increase in complexity causes many bugs, because the learner is likely to construct an overly specific procedure which has to be generalized later, with many opportunities for the introduction of new bugs. If at all possible, subprocedures should be introduced in the context of practice problems which vary across the full range of complexity. These conclusions held for both sequences of practice problems given to Sierra. VanLehn (1991) does not report any conclusions about the differential effectiveness of the two practice sequences.

* To the best of my knowledge, the first time this idea was put into print was in Ohlsson & Hagert (1980).

Kieras and colleagues (1987) have applied the idea of evaluation against a 'canned student' to the design of instructional text. Instead of a learning model, Kieras used a model of natural language understanding. The model contains a parser which tries to construct the appropriate linguistic representation of the text that it is given as input. The parser is simple and may fail if the text is complicated; alternatively, it may be unable to distinguish which of several possible interpretations of the text is the correct one. These are, of course, precisely the types of difficulties that weak readers may encounter when trying to understand a badly written instructional text.

In summary, the recent advances in our ability to simulate cognitive processes in general and learning in particular open up the possibility of constructing a technological device for formative evaluation, a box which can be fed an instructional design and which will give back an evaluation of that design. Tests in wind-tunnels did not replace test flights for novel aircraft designs and an evaluation against a canned student cannot eliminate the need for evaluations with human learners. The purpose of using a 'canned student' is rather to provide a rapid design-evaluate-redesign cycle in the beginning of the design process. Once the design is good enough to pass the test against the 'canned student', it is time for evaluation with human learners. The possibility of doing formative evaluation without leaving your desk would obviously have a tremendous impact on the work habits of courseware designers.

Improving the delivery of instruction

Pedagogical decision making does not come to an end with the completion of a course design. Delivery of instruction requires a great many micro-decisions about how to react to the student, particularly in one-on-one tutoring situations. This is the reason why teaching is such a complex skill (Leinhardt, 1987). What impact might cognitive theory have on the delivery, as opposed to the design, of instruction?

In the current flurry of interest in the application of cognitive science to instruction, it is sometimes forgotten that the original reason to introduce the computer into education was its potential to provide individualized instruction. Each student comes to a course with different prior knowledge and he or she understands or misunderstands the instruction in his or her unique way. Consequently, each student has different cognitive needs, or so the argument goes, and instruction in which a single teacher says the same thing to thirty students must necessarily be suboptimal. Instruction needs to be adapted to the individual learner. This cannot be done in a classroom, but because the computer is a one-on-one interactive device, it has the potential to deliver adaptive instruction to a large number of students on a regular basis. This argument always was, and still is, the main reason to believe that the computer medium provides some pedagogical power over and above that provided by the paper-and-pencil medium.

In order to adapt its responses to the individual student, an instructional computer system must know something about that student. This fact raises three questions: What information about the student should the system store internally? How is the information to be acquired by the system? The conjunction of these two questions is known in cognitive science as the student modeling problem. The third question is how the information about the student, once acquired and stored, is to be used to guide the delivery of instruction. This is known as the problem of tutoring rules.

Research on the student modeling problem has been summarized repeatedly in recent years (VanLehn, 1988; Wenger, 1987), so no exhaustive review will be attempted here. In order to bring out the implications of such research for authoring systems, I divide solutions to the student modeling problem into four types: global descriptions, models of knowledge, models of errors, and models of learning.

Global descriptions

Which aspects of a learner does a tutor, human or artificial, need to know in order to shape the delivery of instruction to fit that particular learner? Some obvious answers come to mind. For example, it is obviously useful to know how old the learner is; one does not teach an eight year old in the same way as a middle-aged person. Similarly, it is helpful to know the educational level of the learner. Teaching a high school drop out is not the same task as teaching somebody with a doctorate. Also, it would be a rare courseware designer who would not collect and make use of data about the success level of a student. I shall refer to aspects such as age, educational level, and success level as global descriptors, for lack of a better term, and I will refer to a collection of such descriptors as a (global) description of the student.

As the above examples show, common sense identifies some relevant global descriptors, but cognitive research suggests others. For example, the psychometric tradition contributes the notion of an IQ score as a measure of general smartness. The relevance of a measure of general smartness for instruction can hardly be doubted; obviously, one does not teach a cognitively handicapped person in the same way as a genius*. Another global descriptor to emerge from research is the concept of cognitive style. Research agrees with the pervasive intuition that different people process information differently (Goldstein & Blackman, 1978). The pedagogical potential of this concept seems not to have been systematically explored. A related concept is that of learning style or preferred learning strategy (O'Neil, 1978). Some learners might prefer to figure things out by studying examples, while others might prefer to be told the theory and to figure out how to apply it to particular instances. Such individual differences are obviously important in adapting instruction to the individual learner.

Global descriptors do not require complicated programming constructs. To use global descriptors, an instructional computer system only needs to contain a file or data structure with the values of the descriptors for each student; most courseware would contain such a file for other reasons. The values of the descriptors for each individual student can be acquired in different ways. Some of them could be typed in by the teacher or supervisor (e.g. the educational level), while others could be acquired through straightforward questions on the part of the system (e.g. what is your age?). Subtler descriptors such as IQ, cognitive style, and preferred learning strategy represent more of a problem, but they could be ascertained through off-line tests and the results typed in.
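A global description therefore amounts to nothing more than a simple record per student; the following sketch (field names are illustrative, not from any existing system) indicates how little machinery is involved.

```python
from dataclasses import dataclass
from typing import Optional

# A global description is just a record of descriptor values per student.
# Which descriptors to include, and how their values are obtained, is a
# design decision; the fields below are examples only.
@dataclass
class GlobalDescription:
    age: Optional[int] = None                 # asked directly by the system
    educational_level: Optional[str] = None   # typed in by teacher or supervisor
    success_level: float = 0.0                # updated from performance data
    cognitive_style: Optional[str] = None     # from an off-line test, typed in
    preferred_strategy: Optional[str] = None  # e.g. 'examples' or 'theory-first'
```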

However, the main weakness with a student model that consists of nothing but global descriptors is its limited usefulness: How are these descriptors to be used to individualize instruction? Although global descriptors might affect the overall character of a tutoring effort, there are few micro-decisions which depend upon the values of such descriptors. For example, one micro-decision to be made during the delivery of instruction is how many instances, and which instances, a particular learner needs to study in order to grasp a particular concept. It is difficult to relate that decision to a global descriptor like cognitive style. Another on-line pedagogical decision is how to deal with multiple errors on the part of the learner: In which order should the different errors be brought to the student's attention? It is difficult to see how global descriptors could inform decisions like these. The answers to such questions seem to depend on what the learner knows rather than on who he/she is. Global descriptions lack pedagogical power because they ignore the content of the knowledge being taught.

* Whether standard IQ tests really provide a measure of general smartness is a different question. Intelligence tests are plagued by both practical questions about how to avoid biasing such a test and by deep conceptual questions about what it means to measure "general smartness" and whether such a thing even exists. These issues cannot be discussed here. I am not advocating the use of IQ scores, only using IQ as an example of a global descriptor contributed by psychological research.


Models of knowledge

Two important questions for adaptive instruction are which parts of the subject matter the learner has already mastered and which subject matter item - concept, proposition, or skill - he or she should be taught next. Hence, an instructional computer system needs a description of what the learner knows about the subject matter (at each moment in time). This argument forms the conceptual underpinning of so-called overlay models (Carr & Goldstein, 1977). The basic idea behind overlay models is to describe the learner in terms of those subject matter items that he or she has mastered (at some point in time).

Overlay models do not require complicated data structures or inference algorithms. There are three software components involved in the use of such models. First, the subject matter must be analyzed into a set of separate knowledge items. The items can be anything: concepts, principles, skills, heuristics, subgoals, definitions, terms, and so on. The only criterion for distinguishing two items X and Y is that it is possible to learn X without necessarily acquiring Y, or vice versa. The items are then connected by prerequisite relations. A subject matter item X is a prerequisite of another item Y, if X must be mastered before the learner is ready to learn Y. Thus, the subject matter representation required for an overlay model is a graph with the individual knowledge items as nodes and the prerequisite relations as links. In procedural domains such graphs are sometimes called learning hierarchies (Gagné, 1970, Ch. 9) and sometimes goal-hierarchies (Anderson et al., 1987).

In some disciplines, a subject matter analysis might already have been codified by practitioners (e.g. algebra); in other domains, codifying the subject matter might require a major creative effort on the part of the instructional designer (e.g. medical diagnosis; Clancey, 1987). However, the subject matter representation can be done once and for all. It represents an investment in time which might pay off for the implementation of more than just one piece of courseware. In the long run, one should expect the development of standardized computer representations of frequently taught topics.

The second software component required to use overlay models is a procedure for inferring which subject matter units a particular learner has mastered at a particular point in time. The necessary inference procedure is obvious and simple to implement. It requires that each practice problem, question, or other task is classified beforehand with respect to which subject matter items one has to have mastered in order to succeed on that task. Once this has been done, the required inference is trivial:

The Overlay Construction Rule. If mastery of subject matter unit X is necessary for success on task Y, and if the learner succeeds on Y, then mark unit X as learned (or upgrade the probability that it has been learned).

The implementation of this inference rule does not require sophisticated knowledge representation techniques or complicated reasoning mechanisms, nor does its execution require much processing capacity.
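As an illustration, the following sketch (in Python, with an arbitrary upgrade step) implements the construction rule over an overlay represented as a mapping from knowledge items to mastery probabilities; the classification of tasks by required items is assumed to have been done beforehand by the designer.

```python
# Overlay model as a mapping from knowledge item to probability of mastery.
# required_items[task] lists the items needed to succeed on that task
# (classified beforehand by the instructional designer).

def update_overlay(overlay, required_items, task, succeeded, step=0.2):
    """Overlay Construction Rule: success on a task upgrades the mastery
    estimate of every item that task requires; failure downgrades it."""
    for item in required_items[task]:
        p = overlay.get(item, 0.0)
        overlay[item] = min(1.0, p + step) if succeeded else max(0.0, p - step)
    return overlay
```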


The overlay model and the above construction rule can be made more sophisticated in several ways. Most obviously, learning is not all or none and people forget, so it is more accurate to maintain a measure of the probability that the learner has acquired subject matter item X, rather than simply classifying X as either learned or not learned. Furthermore, there might be several different ways of attaining success on task Y, so each task should be related to several alternative sets of items, each of which is upgraded by a small amount upon success. Finally, the accuracy of an overlay model might be improved by decreasing the probability that a knowledge item that is necessary to solve task X has been learned every time the learner fails to solve task X correctly. These extensions are useful but do not change the fundamental nature of the overlay model as a description of the learner in terms of which subject matter units he or she knows.

Unlike global descriptions, overlay models have immediate implications for the delivery of instruction. The basic ideas are to avoid teaching what the student already knows and to teach only items for which he or she is ready. The procedure for how to use an overlay model is simple to implement and cheap to execute:

The Overlay Utilization Rule. To select the subject matter item to teach next, search for an item such that (a) it has not been acquired already, but (b) its prerequisites have all been mastered.

The computation required is a search through the subject matter graph for an item with the two specified properties, a computation that can be accomplished with standard graph searching routines.
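A corresponding sketch of the utilization rule, searching the prerequisite graph for an item that has not been acquired but whose prerequisites have all been mastered (the mastery threshold and graph representation are illustrative assumptions):

```python
# prerequisites[item] lists the items that must be mastered before 'item'
# can be taught. An item counts as mastered if its overlay probability
# exceeds a (here arbitrary) threshold.

def select_next_item(overlay, prerequisites, threshold=0.8):
    """Overlay Utilization Rule: teach an item that (a) has not been
    acquired already and (b) whose prerequisites have all been mastered."""
    mastered = {item for item, p in overlay.items() if p >= threshold}
    candidates = [item for item in prerequisites
                  if item not in mastered
                  and all(pre in mastered for pre in prerequisites[item])]
    # If several items qualify, a further selection rule would rank them.
    return candidates[0] if candidates else None
```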

Overlay models can be made more sophisticated than this sketch illustrates. For example, if there is more than one subject matter item which fulfils the two criteria mentioned in the utilization rule, then the computer system must apply some further selection rule. In the simplest case, the subject matter items are assigned a predetermined measure of importance by the instructional designer, perhaps based on the centrality of each item in the subject matter graph. A more sophisticated procedure would determine on-line which item is to be taught next by considering the current state of the learner's knowledge.

An overlay model is a device for adapting the sequencing of a course to the individual learner. Instead of teaching the concepts, principles, and skills of the course in a predefined order, an instructional computing system which operates on the basis of an overlay model will present those items in an order which takes the student's current knowledge into account. Such a system will not teach items that the student already knows, nor will it try to teach items for which the learner is not ready. An overlay model can be faulted for not providing all the information one would need in order to individualize instruction (see below), but it nevertheless represents a considerable improvement over global descriptions. The usefulness of overlay models has largely been obscured in the literature on instructional computing by researchers' fascination with students' errors.


Models of errors

Teaching the student new parts of the subject matter is one half of the teacher's task. The other half is to correct what the student has learned but misunderstood. Empirical studies of students' errors - 'bugs' and misconceptions - have verified what teachers and instructional designers have always known, namely that different students misunderstand the subject matter in different ways. This complicates remedial training. Instruction that is helpful with respect to error A is not necessarily also helpful with respect to error B, but a teacher cannot diagnose the individual errors of thirty students during on-going classroom interaction and provide feedback appropriate to each. One of the great promises of computer-based instruction is precisely that a computer tutor might be able to take over the task of diagnosing and remedying students' errors. This argument has led a significant proportion of the research community to develop theories of cognitive errors, as well as methods for diagnosing and correcting them.

The obvious approach to remedial instruction is to program the instructional computer system to recognize common incorrect answers and to provide feedback tailored to each. This approach was tried three decades ago under the label 'programmed instruction' and was found to be of limited value, at least in knowledge-rich instructional domains. The failure of programmed instruction shows that a catalogue of incorrect answers is too simplistic a description of cognitive errors and that remedial instruction has to respond to something more than just students' actions.

Students' surface errors, i.e. their mistaken problem solving steps and their incorrect answers to problems and questions, arise because they have misunderstood or misconstrued the subject matter. Unfortunately, the English language does not have a convenient term for the distinction between surface errors and their mental causes; the word 'error' is ambiguous in this respect. As a consequence, cognitive scientists have acquired the habit of speaking about 'bugs'. This rather ugly piece of programmers' slang draws upon the analogy with mistakes in program code to refer to deep structure errors in procedural knowledge; when the error is in declarative knowledge, the term 'misconception' is more frequently used. I shall use the terms 'deep structure error' or 'incorrect knowledge' when I want to avoid making the distinction between declarative and procedural errors.

The key hypothesis for remedial instruction is that students' surface errors, i.e. their mistaken problem solving steps and their incorrect solutions and answers, are observable symptoms of deep structure errors. They are effects, caused by incorrect knowledge. If this view is accepted, then it follows that we need to correct the knowledge rather than the behaviour. It is the bug in the mind that needs remedy, not the blot in the copybook.

The obvious way to proceed is to infer the deep structure error from the surface error(s), and provide remedial instruction tailored to the deep structure error. However, inferring incorrect knowledge from observations of mistaken actions is complicated for the same reason that medical diagnosis is complicated (Pople, 1982). The relation between the observable signs and symptoms and the underlying cause is not one-to-one. Just as a particular physical sign, e.g. high fever, can be caused by any number of diseases, so an incorrect solution to a practice problem can be the consequence of several different mental bugs. The converse is also true. A particular disease usually generates several different signs and symptoms. Similarly, a bug in a cognitive skill can generate several different types of mistakes, depending on its interaction with particular problem features.

Different approaches to overcoming the difficulty of cognitive diagnosis have been tried. The basic approach is to list enough bugs to be able to explain a significant proportion of all observed errors, match the observed behaviour of a particular student against the behaviour predicted by each one of those bugs, and accept the diagnosis that does the best job of accounting for that behaviour. This approach has led to successful diagnostic systems in several instructional domains; see VanLehn (1988) and Wenger (1987) for reviews. To avoid having to construct a library of bugs, Langley and I tried to infer bugs bottom-up at runtime, using machine learning techniques (Langley, Wogulis, & Ohlsson, 1990; Ohlsson & Langley, 1988). Neither the bug library technique nor the machine learning approach is currently used extensively in instructional computing systems.

The most practical technique for the diagnosis of bugs proposed to date is called model tracing (Anderson et al., 1990; Reiser, Anderson & Farrell, 1985). Model tracing overcomes the computational intractability of cognitive diagnosis by following the student step by problem solving step and diagnosing the individual steps (as opposed to entire solutions). The theoretical rationale for this technique is the hypothesis that cognitive skills are encoded in the mind as collections of rules, where each rule controls a single action or step. According to this theory, whenever the student takes a step in a problem solution, he has evoked some rule. Furthermore, the rules are assumed to be learned in a modular fashion, with the acquisition of one rule being independent of the acquisition of neighbouring rules. Hence, the student only needs feedback that is local to each problem solving step. Given this view, a cognitive skill and all the various ways in which it can be misconstrued can be encoded as a single large collection of independent rules. Each step on the part of the student is matched against this rule library and the rule that best explains the step is identified. If the rule is correct, then no remedial instruction is called for. If the rule is incorrect, then a remedial training message attached to that rule is printed. This technique has been incorporated into several effective instructional computer systems (Anderson et al., 1990).
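The following schematic sketch conveys the idea of matching each student step against a library of correct and buggy rules and delivering the remedial message attached to a buggy rule; the rule format and matching are deliberate simplifications, not Anderson's actual production-system implementation.

```python
# Each rule pairs a predicate over (problem state, student step) with a flag
# saying whether it is a correct or a buggy rule, plus a feedback message
# for buggy rules. A deliberately simplified stand-in for a production system.

def trace_step(rule_library, state, step):
    """Find the rule that explains the student's step and react locally."""
    for rule in rule_library:
        if rule["matches"](state, step):
            if rule["correct"]:
                return None              # correct step: no remediation needed
            return rule["feedback"]      # buggy rule: deliver its message
    return "Unrecognized step."          # no rule in the library explains the step
```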

All methods for diagnosing cognitive errors are costly, in one way or the other. Extensive empirical research might be required to construct a bug catalogue or a rule library. To make matters worse, a recent empirical study by Payne and Squibb (1990) suggests that bugs might not be stable across student populations, implying that the painstaking work of constructing a bug library might have to be done all over again for each new student population. The machine learning approach was invented to circumvent this cost, but pedagogical applications of this technique are still lacking. Whether diagnosis is made by the bug library technique or through the machine learning technique, it is computationally costly. The computational demands of the model tracing technique are very much lower. However, model tracing carries costs of its own, most notably the necessity to correct each incorrect step as it occurs, a pedagogical style which might be effective in some instructional situations but perhaps not in all.

However, the greatest cost associated with the idea of basing instruction on the diagnosis of deep-structure errors is the need to encode the subject matter in some AI knowledge representation language. This is necessary because the diagnosis of deep structure errors requires some way of inferring or predicting what the observable behaviour would be, if the student were suffering from such-and-such a deep structure error. To perform such inferences, the subject matter - both in its correct and various incorrect forms - must be explicitly represented in machine-readable form. This requirement is not avoided by any of the above-mentioned techniques. The format for the subject matter representation differs from technique to technique; it might be a procedure network, a set of Horn clauses, a collection of plans, or sets of so-called production rules. However, the different representational formats have in common that the instructional designer who wants to use them must engage in a major knowledge representation enterprise with the aim to produce an executable implementation of the subject matter. Building courseware around an AI representation of the subject matter is not current practice in courseware authoring. One possible escape from this requirement has been outlined in Ohlsson (1992b).

Models of learning

A description of a student in terms of his or her (correct as well as incorrect) knowledge constitutes a performance model, i.e. a model of how the student performs the relevant class of problems at some moment in time. A performance model is like a snapshot in that it does not model change over time. The next step up in psychological veracity from a model of knowledge is a model that learns in the same way as the student.

All simulation models of human learning described in the literature make the working assumption that different individuals do not differ in the mental mechanisms that produce cognitive change. They assume that there is a small set of learning mechanisms, such as generalization, discrimination, chunking, etc., which are responsible for all human learning and which are part of a universally shared cognitive architecture (Anderson, 1989; Newell, 1990). If this hypothesis is accepted, then the construction of a student model that learns encounters no more difficulties than the construction of an error model. Once the particular combination of, say, correct and incorrect rules that describe a particular student has been identified, all that is needed is to combine those rules with the supposed general learning mechanisms. The result is a model that not only mimics the student's performance, but which models the way he or she changes in response to instruction as well.


Do learning models provide extra pedagogical leverage? In a previous section, I described how learning models can be used off-line as devices for the formative evaluation of instructional designs. The perspective here is different: the question is whether a learning model which models a particular student and which is available on-line, i.e. during the delivery of instruction, provides any extra pedagogical leverage for an instructional computer system. In principle, such a system could make its pedagogical decisions by internally simulating the student. The system could try out a particular practice problem on the internal simulation model and verify that the model can learn from that problem before trying it on the student. Such a system would in effect be doing formative evaluation decision by decision. We are very far from being able to do this, both in terms of the confidence we have in our hypotheses about the learning mechanisms in the mind and in terms of the computational capacity required to base every decision during on-going instruction on simulation runs. This intriguing possibility is as yet science fiction.

Summary

The strongest argument for the use of computers in education is their potential to provide individualized instruction. However, in order to adapt its instruction to the individual learner, the system must construct some kind of description of the learner and then use that description as a basis for the successive decisions that arise on-line, during the delivery of instruction. Cognitive scientists have so far identified four different types of student models: (a) global descriptions in terms of descriptors like cognitive style, (b) overlay models, i.e. descriptions of which parts of the subject matter the learner has mastered already, (c) models of student errors, and (d) models that mimic student learning. Global descriptions are easy to construct and to use but do not provide much pedagogical leverage. Models of what the student knows are both practical and useful. Cognitive science research has focussed on models of errors, but such models cannot be accommodated within an instructional computing system unless that system is built around an executable representation of the subject matter. On-line pedagogical use of models that learn is at the present time an idea whose time has not yet come.

Consequences for Authoring Systems

The previous sections have surveyed the main points of impact of cognitive theory on the design and delivery of instruction: instructional principles that constrain the generation of instructional designs, executable models of learning that can be used to automatize evaluation, and student models that can be used on-line to shape the delivery of instruction. The purpose of this section is to analyze the consequences of these types of impact for authoring systems.


Consequences of instructional principles

The obvious way to apply instructional principles in the construction of an authoring system is to embed them in the system itself and thereby force the instructional designer to follow those principles. For example, an authoring system could enforce the use of advanced organizers by providing a special-purpose frame in which the courseware author is to write and edit the epitome of a unit and by requiring that the instructional designer write such an epitome before he or she is allowed to write any other part of that unit. Also, the authoring system could automatically present the advanced organizer as the first screenful of information in each unit. Good practice would be enforced because the instructional principle would be embedded in the tool. In this scenario, instructional principles have a strong impact on the construction of authoring systems.
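As a minimal illustration of what embedding such a principle in the tool could mean in software terms, the sketch below refuses to accept ordinary frames until the unit's epitome has been written, and always delivers the epitome first. The Unit class and its methods are hypothetical; nothing of the kind is prescribed in the text.

```python
# Hypothetical sketch of embedding an instructional principle in the tool
# itself: a unit cannot receive ordinary frames until its advanced
# organizer (epitome) has been written, and the organizer is always
# delivered as the first screen.

class UnitAuthoringError(Exception):
    pass

class Unit:
    def __init__(self, title):
        self.title = title
        self.epitome = None      # the advanced organizer
        self.frames = []         # ordinary instructional frames

    def set_epitome(self, text):
        self.epitome = text

    def add_frame(self, text):
        if self.epitome is None:
            raise UnitAuthoringError(
                "Write the advanced organizer before adding other frames.")
        self.frames.append(text)

    def delivery_order(self):
        # The organizer is always presented as the first screenful.
        return [self.epitome] + self.frames

if __name__ == "__main__":
    unit = Unit("Fractions")
    unit.set_epitome("Overview: fractions name parts of a whole ...")
    unit.add_frame("Frame 1: comparing unit fractions ...")
    print(unit.delivery_order()[0][:9])   # 'Overview:'
```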

There are several reasons why this type of impact might not materialize. First, many instructional principles are highly abstract and it is not obvious how to follow them in practice. Although the principle 'begin each unit with an advanced organizer' implies some rather concrete actions, the principle 'motivate the learner by inducing cognitive conflicts' does not specify the required actions on the part of the courseware designer. It is difficult to see how an authoring system could be shaped so as to enforce this principle, because the actions required to follow it are different in different situations. Many other instructional principles are equally difficult to embed in an authoring system. Second, many instructional principles are domain-specific. The principle that 'number concepts and arithmetic operations are best taught with concrete manipulatives' is only relevant for the teaching of arithmetic. Many other instructional principles are equally domain-specific. Trying to implement those principles in an authoring system would violate the basic idea of such a system, i.e. to implement once and for all those aspects of courseware authoring which are general across content.

A third reason why instructional principles have less impact on the design of an authoring system than one might expect is that a tool need not necessarily reflect the properties of the object it is applied to. For example, a whetstone is round and blunt, not sharp or pointed like the cutting tools it can be used to sharpen. Notice also that one and the same whetstone can be used to sharpen such otherwise different tools as knives, axes, scissors, and arrowheads. The tool - the whetstone - need not be different to apply to these different types of objects. Returning again to the example of aircraft design, we see that a wind-tunnel need not be constructed differently to inform the design of different kinds of aircraft. The ultimate example of independence between design and use is, of course, the computer. The design of computer hardware is virtually unaffected by the intended use of the machine; one and the same computer can be used for such diverse activities as word processing, number crunching, and symbolic programming.

To see how this relative independence of tool design from tool use applies in the instructional realm, consider the instructional principle 'teach the goal hierarchy'. To follow this principle, standard skill training courseware must be extended with a new set of subject matter items, namely the goals and their subgoal-supergoal relations. However, it is quite possible that no authoring facilities are needed to do this over and above standard editing tools. To present a goal to the learner might not require any capabilities other than those required to present other types of subject matter items. Hence, the instructional principle that one should teach the goal hierarchy might have no implications for the design of an authoring system, even if the principle is accepted as part of good practice. As we go from instruction to instructional design to the tool that supports that design, the causal consequences are successively attenuated, so that the construction of the tool might turn out to be quite independent of the nature of the instruction designed with its help.

Finally, there is considerable doubt that we want to use authoring tools to enforce good practice. To embed an instructional principle in an authoring system in such a way that the instructional designer cannot but behave in accordance with it requires an extraordinary confidence in the usefulness and correctness of that principle. Few instructional principles are so well established that they apply without exception. It therefore seems better to leave the application of a particular principle to the judgment of the instructional designer, rather than losing flexibility by embedding it in the authoring system.

In short, we reach the somewhat surprising conclusion that instructional principles might not have any implications for the construction of authoring systems. Many such principles are too abstract to specify particular actions, while others are too domain-specific to be implemented in a general tool. Some can be followed without the need for any particular facilities in the authoring system. Finally, even in those cases in which instructional principles can be embedded in authoring systems, there might be no strong reasons to do so. Embedding an instructional principle in an authoring system makes no difference in those cases in which the designer would have followed that principle anyway, and in the remaining cases we probably should trust the judgment of the instructional designer more than we trust the principle.

Consequences of automatized evaluation

The possibility of using an executable model of learning for formative evaluation has truly revolutionary long-range implications for authoring systems. An authoring system that includes a learning model as one of its components could offer the user an 'evaluate' option. When the user activates this option, the course he/she is currently designing is passed to the built-in student model, which then 'takes' that course. After it has finished learning, the model prints out a report of what it learned and what problems it experienced while trying to make sense of the instruction. The designer can rework his design accordingly, activate the 'evaluate' option again at a later time, get an evaluation of the revised design, and so on. The possibility of evaluating instruction against a 'canned student' holds the promise of speeding up the design-evaluate-redesign loop by many orders of magnitude.
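The workflow itself can be sketched even though, as argued below, no realistic 'canned student' exists. In this invented illustration the built-in model is reduced to a prerequisite checker: it 'takes' the draft course unit by unit and reports the places where it could not follow the instruction. The course format and function names are assumptions made for the illustration.

```python
# Hypothetical sketch of an authoring system's 'evaluate' option: the
# draft course is handed to a built-in simulated learner, which "takes"
# the course and returns a report of what it could and could not learn.

def canned_student_report(course):
    """course: list of units, each a dict naming the items it teaches and
    the items it presupposes. Returns a formative-evaluation report."""
    learned, problems = set(), []
    for unit in course:
        missing = unit["presupposes"] - learned
        if missing:
            problems.append(
                f"Unit '{unit['name']}' presupposes unlearned items: "
                + ", ".join(sorted(missing)))
        else:
            learned |= unit["teaches"]
    return {"learned": learned, "problems": problems}

if __name__ == "__main__":
    draft_course = [
        {"name": "Counting", "presupposes": set(), "teaches": {"count"}},
        {"name": "Carrying", "presupposes": {"add"}, "teaches": {"carry"}},
        {"name": "Addition", "presupposes": {"count"}, "teaches": {"add"}},
    ]
    report = canned_student_report(draft_course)
    print(report["problems"])   # flags 'Carrying': it appears before 'Addition'
    print(sorted(report["learned"]))
```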

There are two reasons why this possibility cannot be realized in the near future. First, our learning models are too brittle to react intelligently to just any kind of input. Current simulation models learn in restricted classes of situations, on circumscribed types of problems, and from particular types of input. Also, they assume that the instruction is reasonably complete and correct. Most current machine learning programs would fall apart if presented with the kind of partial, irregular, and possibly inconsistent information that constitutes a course in the early stages of its design. A simulation model of learning that can respond intelligently and reasonably to open-ended input is many years down the road.

A second reason why automated formative evaluation is not a practical option at the current time is that the authoring system must pass the instruction to the learning model in some format that the latter can read. In the example of the HS model described previously, we gave the model carefully crafted symbol structures that conformed to the particular syntax of the language that HS uses internally. To take an entire course and translate it - pictures and diagrams and graphs and all - into such a formalism is probably more time consuming than formative evaluation with human students. Hence, practical use of formative evaluation against student models must wait until the bandwidth of communication has been extended to cover at least natural language text and pictures, so that the instruction can be fed to the model without a major translation effort. The development of a simulation model with such a high bandwidth interface is far in the future.

In summary, an authoring system with a formative evaluation facility that allows a courseware designer to do formative evaluation without leaving his or her desk would radically alter the nature of courseware design. However, the brittleness and narrow communication bandwidth of current simulation models prevent this possibility from being realized in the near future.

Consequences of on-line student models

There is little doubt that individualized instruction is better than instruction that is the same for all learners. The next generation of authoring systems should therefore contain a facility for student modeling. I believe such a development is possible at this time, if we only scale down our ambitions as to what kind of student model we aim to include in our courseware.

Anybody who consults the research literature on student modeling is likely to come away with the impression that the construction of student models is a major research endeavour which requires expertise in AI and which is likely to yield nothing but a prototype system which requires so much computational power that it can only run on high-end AI workstations. This impression is accurate with respect to ambitious research efforts that aim to create executable models that mimic student performance, including mistakes, in detail and which learn the way students learn.

However, student models need not be so ambitious in order to be useful in practice. I want to suggest that the so-called overlay model - much maligned in the research literature because it ignores incorrect knowledge - is quite an improvement, as compared to no student model. An overlay model does not describe everything we would like to know about the learner, but it describes something useful nevertheless. Instruction based on an overlay model is likely to be more effective than instruction which is not based on any type of student model. Overlay models are also computationally tractable.

An authoring system that enables the instructional designer to use overlay models on a routine basis has to provide the following software facilities:

a. An editing tool for creating subject matter items and for drawing prerequisite links between them. Such graph-creating tools are already available in many programming environments. (Many authoring systems might already include a graphing facility for other reasons.)

b. A facility for specifying, for each practice problem, question, or other task, which subject matter units must have been learned in order to succeed on that problem, question, or task. This requires that the problems, questions, etc. themselves correspond to data-structures which can have links to other data-structures and that the system knows which problem is currently presented to the student.

c. A subroutine which responds to a correct response to a problem by upgrading the probability that the subject matter items encoded as necessary for correct performance on that problem are learned. This routine can be programmed once and for all by the implementer of the authoring system and need not be visible to the instructional designer. It presupposes that student responses can be classified as correct or incorrect. Most authoring tools would provide a facility for such classification for other reasons.

d. A routine that selects which subject matter item to teach next on the basis of the current state of the overlay model. The simplest such routine would look for any node in the graph that is not yet learned but for which all the prerequisites are learned. This routine too could be programmed once and for all and appear to the courseware author as part of the machinery the authoring system provides. The computations involved in such a graph search are not extensive.

These four software facilities - none of which requires AI - could easily be implemented within current authoring systems and they would enable courseware designers to use student models on a routine basis. The author would have to lay out the individual items (concepts, principles, unit skills) which the instructional system is supposed to teach and define the prerequisite relations between them. Presumably, this is not an additional chore; good instructional design has to begin with a good subject matter analysis. The courseware designer must further specify, for each practice problem, which units are required for correct performance on that task. This, too, is an analysis which is useful to do in any case. Once these two analyses are completed, no further work is required on the part of the courseware author. The routines described in points (c) and (d) above take care of the construction and utilization of the overlay model during delivery of instruction.
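A rough sketch of facilities (c) and (d) follows, assuming the prerequisite graph from facility (a) is available as a simple dictionary. The mastery threshold and the probability update are arbitrary illustrative choices, not prescriptions from the text; a real system would presumably use a statistical update rather than this ad hoc one.

```python
# Rough sketch of facilities (c) and (d) for an overlay student model:
# (c) raise the estimated mastery of items exercised by a correctly
#     answered problem, and
# (d) pick as the next item to teach any item not yet mastered whose
#     prerequisites are all mastered.

MASTERY_THRESHOLD = 0.9   # illustrative assumption

class OverlayModel:
    def __init__(self, prerequisites):
        # prerequisites: dict mapping item -> set of prerequisite items
        self.prerequisites = prerequisites
        self.p_learned = {item: 0.0 for item in prerequisites}

    def record_correct_response(self, required_items):
        # Facility (c): a correct response raises the probability that
        # each required item has been learned (ad hoc update for the sketch).
        for item in required_items:
            self.p_learned[item] = 0.5 + 0.5 * self.p_learned[item]

    def mastered(self, item):
        return self.p_learned[item] >= MASTERY_THRESHOLD

    def next_item_to_teach(self):
        # Facility (d): any unmastered item whose prerequisites are mastered.
        for item, prereqs in self.prerequisites.items():
            if not self.mastered(item) and all(self.mastered(p) for p in prereqs):
                return item
        return None

if __name__ == "__main__":
    model = OverlayModel({
        "count": set(),
        "add": {"count"},
        "carry": {"add"},
    })
    # Several correct answers to counting problems raise p(count) above 0.9.
    for _ in range(5):
        model.record_correct_response({"count"})
    print(model.next_item_to_teach())   # 'add'
```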

General Discussion

The expectation that recent advances in cognitive theory should have implications for the design of computer based instruction, and hence for the design of authoring systems, is only weakly supported by the analysis in this chapter. Although recent advances in cognitive theory do have instructional implications, those implications only impact the design of authoring systems at a small number of points. After summarizing the possible impacts, I will raise the question whether further insights into the construction of authoring systems can be attained by looking at the problem in a different way.

Possible points of impact

Individualized instruction requires that the instructional computer system maintains a model of the student. Several different types of student models have been discussed in the literature. The only type of model that can be incorporated into current authoring systems is the so-called overlay model, i.e. a model of which parts of the subject matter the student has mastered (at some moment in time). The routine use of such models could begin in the near future, because the necessary software facilities are simple and encounter no difficulties, either pedagogical or computational, in implementation. The advantage of using overlay models is that the sequencing of the subject matter will be done automatically, as a function of the current state of the learner's knowledge, instead of in accordance with a pre-defined order. The routine use of overlay models constitutes only a small step towards individualized instruction, but a step worth taking.

It is common wisdom that individualized instruction also requires a model of what the student knows that isn't so. The individualization of remedial instruction is particularly dependent upon a description of the errors in the student's knowledge. However, the on-line diagnosis of incorrect knowledge requires the capability of computing what the observable behaviour would be, if the student were suffering from such-and-such a deep-structure error. Hence, routine use of a model of the student's incorrect knowledge requires that most courseware is built around an executable representation of the subject matter. This is not currently the case.

However, this situation is changing. Anderson and co-workers have released an authoring system that is based on the model-tracing technique for the diagnosis of errors in cognitive skills (Anderson & Pelletier, 1991). In those contexts in which this technique, and the pedagogy that comes with it, are appropriate, the routine use of AI-based models of errors is already available. Other authoring systems built around alternative techniques for cognitive diagnosis are likely to follow, making AI-based modeling of student errors, and hence individualized remedial instruction, a routine feature of courseware design in the intermediate future.

In the distant future, the development of student models might proceed to the point where we are able to construct robust, executable models that learn in the same way as the student and which have a high enough communication bandwidth to be able to learn from the same instruction as the student. Such a model could impact instructional design in a different way than by providing a means for individualizing instruction. Incorporating such a learning model into an authoring system would provide the designer with the possibility of doing formative evaluation without leaving his or her desk. The authoring system could pass the instruction to the model, which then tries to learn from it and returns a description of what it could learn, what it could not learn, and which parts of the course it had trouble with. Such a capability would obviously have a tremendous impact on courseware authoring, but it is impossible to forecast how many decades will go by before it can be realized in practice.

An expectation that failed to be verified in the present analysis is that instructional theory will have major implications for the construction of authoring systems. The reason is not that research on learning is not making progress. On the contrary, recent theories of learning go far beyond those proposed by previous generations of researchers with respect to conceptual depth and precision. Nor is the reason that our theories of learning do not have instructional implications. On the contrary, with each advance in our understanding of learning, further principles about instruction are proposed.

The first reason why instructional principles will have minimal impact on the construction of authoring systems is that there is no particular advantage to be won by embedding such principles in an authoring system. On the contrary, such embedding leads to a loss of flexibility and generality. For example, an authoring system that makes it possible for the courseware designer to use an advanced organizer, without requiring him to do so, is clearly superior to a system that enforces this practice. We rarely have enough confidence in our instructional principles to mandate their use in all cases. However, if the authoring system is to leave the application of instructional principles to the courseware designer, then nothing might follow from those principles with respect to the implementation of the system. Therefore, an instructional theory in the sense of a collection of instructional principles is likely to have minimal impact on the construction of authoring systems.

A second argument for the same conclusion is that courseware design is an inherently ill-defined problem. It is unlikely that there will ever be a set of principles which dictate in detail the best and most effective instructional design for every educational purpose and context. It seems rather more likely that courseware design will always remain an ill-defined problem. Solutions to ill-defined problems are attained by generating solutions and then evaluating them. The generation phase is necessarily a creative act. To try to bring theory to bear in the generation phase is an unlikely approach. It is more likely that instructional design, like other types of design, will benefit from theory more in the evaluation phase than in the generation phase. This observation raises the question whether we have not been looking at the current topic in the wrong way.

An alternative approach?

One might approach the question of how to ground authoring systems in cognitive research in a different way than the one pursued in this paper. Courseware design is a complex activity and one might approach the problem of constructing a software tool to support this activity not by asking 'what do we know about learning and instruction?' but by asking 'what do we know about design?'. Instead of studying learners, one might study designers, or, more precisely, what designers do when they design effective instruction and then construct an authoring system to be maximally supportive with respect to the activities they actually engage in.

Cognitive research provides both a perspective and some methods for how to carry out such studies. Presumably, some instructional designers are better than others and one would want to provide support for design as the experts do it. Thus, expert-novice studies are likely to be informative. In such a study, the crucial features of expert performance are identified by close empirical observation of both experts and novices as they perform one and the same task. The most important type of data is not how fast problem solvers arrive at a solution, or even what solutions they arrive at, but what they actually do while solving a problem. There are many types of traces of problem solving behaviour: verbal protocols, eye movements, computer records, videotapes, and so on. In the case of the design of computer-based instruction, computer records of how experts and novices go about this task ought to be particularly easy to collect.

Properties of the design process might have strong implications for the construction of authoring systems. For example, a pervasive feature of design in many domains is the early generation of a number of different alternatives, followed by subsequent selection and fleshing out of one or perhaps a small number of them. This observation implies that a support system for design should enable the user to sketch alternatives, to keep several sketches active simultaneously, and to maintain design sketches at different levels of fleshed-out-ness.

A second pervasive feature of design is that it involves the juggling of multiple, and possibly contradictory, constraints. For example, an airplane engineer might have to trade off loading capacity and range against the cost of the engine. Reitman (1965) describes how the composition of a fugue proceeded through the posting of constraints and the resolution of conflicts between constraints. A courseware designer certainly has to resolve conflicts between depth, coverage, and time. A tool for courseware authoring might therefore provide the designer with facilities for posting and rejecting constraints, discovering conflicts, and so on.
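A small invented sketch of what posting constraints and discovering conflicts might look like in such a tool: constraints are named predicates over the current design, and the tool reports which posted constraints the design violates. All names and numbers are illustrative assumptions, not part of the original analysis.

```python
# Invented sketch of constraint posting and conflict discovery for a
# courseware design tool: the designer posts constraints as named
# predicates over the current design description, and the tool reports
# which posted constraints the design currently violates.

def post_constraints():
    return {
        "fits in course time": lambda d: d["hours"] <= d["time_budget"],
        "covers all topics":   lambda d: d["topics_covered"] >= d["topics_required"],
        "adequate depth":      lambda d: d["hours_per_topic"] >= 1.5,
    }

def violated(design, constraints):
    return [name for name, check in constraints.items() if not check(design)]

if __name__ == "__main__":
    design = {
        "time_budget": 20, "topics_required": 15,
        "topics_covered": 15, "hours_per_topic": 1.5,
        "hours": 15 * 1.5,
    }
    constraints = post_constraints()
    # Depth, coverage, and time conflict: covering 15 topics at 1.5 hours
    # each needs 22.5 hours, which exceeds the 20-hour budget.
    print(violated(design, constraints))   # ['fits in course time']
```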


In short, I want to suggest that the construction of authoring systems might build upon a theory of design. The generation and fleshing out of alternative designs and the juggling of conflicting constraints are two properties of design processes which have strong implications for authoring systems. Other properties of the courseware design process no doubt remain to be discovered. In the end, research on design might have more to say about the construction of authoring systems than research on learning and instruction.

Acknowledgement

Preparation of this manuscript was supported by grant no. N00014-89-J-1681 from the Cognitive Science Program of the Office of Naval Research. The opinions expressed are not necessarily those of the sponsoring agency and no endorsement should be inferred. Approved for public release; distribution unlimited.

References

Abelson, R.P., Aronson, E., McGuire, W.J., Newcomb, T.M., Rosenberg, M.J. & Tannenbaum, P.H. (eds.) (1968) Theories of cognitive consistency: a sourcebook. Rand McNally, Chicago, IL.

Anderson, J.R. (1989) A theory of the origins of human knowledge. Artificial Intelligence, 40,313-351.

Anderson, J.R., Kline, P.J. & Beasley, C.M. Jr. (1979) A general learning theory and its application to schema abstraction. In The psychology of learning and motivation. Advances in research and theory, 13 (ed. G.H. Bower). pp. 277-318. Academic Press, New York.

Anderson, J.R., Boyle, C.F., Farrell, R. & Reiser, B.J. (1987) Cognitive principles in the design of computer tutors. In Modeling Cognition (ed. P. Morris). Wiley, New York.

Anderson, J.R., Boyle, C.F., Corbett, A.T. & Lewis, M.W. (1990) Cognitive modeling and intelligent tutoring. Artificial Intelligence, 42,7-49.

Anderson, J.R. & Pelletier, R. (1991) A developmental system for model-tracing tutors. In The International Conference on the Learning Sciences: Proceedings of the 1991 Conference (ed. L. Birnbaum). pp. 1-8. Association for the Advancement of Computing in Education, Charlottesville, VA.

Anzai, Y. & Simon, H.A. (1979) The theory of learning by doing. Psychological Review, 86, 124-140.

Ausubel, D.P. (1963) The psychology of meaningful verbal learning. Grune & Stratton, New York.

Ausubel, D.P. (1968) Educational psychology: A cognitive view. Holt, Rinehart & Winston, New York.

Bradshaw, G. & Lienert, M. (1991) The invention of the airplane. Program of the Thirteenth Annual Conference of the Cognitive Science Society, pp. 605-610. Lawrence Erlbaum, Hillsdale, NJ.

Brownell, W.A. & Moser, H.E. (1949) Meaningful vs. mechanical learning: A study in Grade III subtraction. Duke University Press, Durham, NC.

Bruner, J.S. (1966a) Notes on a theory of instruction. In Toward a theory of instruction (ed. J.S. Bruner). pp. 39-72. Harvard University Press, Cambridge, MA.


Bruner, J.S. (1966b) Patterns of growth. In Toward a theory of instruction (ed. J.S. Bruner). pp. 1-21. Harvard University Press, Cambridge, MA.

Carr, B. & Goldstein, I. (1977) Overlays: a theory of modeling for computer-aided instruction. (Technical Report AI Memo 406). Massachusetts Institute of Technology, Cambridge, MA.

Clancey, W.J. (1987) Knowledge-based tutoring: the GUIDON program. The MIT Press, Cambridge, MA.

Clement, J. (1991) Nonformal reasoning in experts and in science students: the use of analogies, extreme cases, and physical intuition. In Informal reasoning and education (eds. J.F. Voss, D.N. Perkins & J.W. Segal). pp. 345-362. Lawrence Erlbaum, Hillsdale, NJ.

Dienes, Z.P. (1960) Building up mathematics. Hutchinson, London, UK.

Gagne, R.M. (1970) The conditions of learning (2nd ed.). Holt, Rinehart & Winston, London, UK.

Glaser, R. (1976) Components of a theory of instruction: toward a science of design. Review of Educational Research, 46, 1-24.

Goldstein, K.M. & Blackman, S. (1978) Cognitive style. Five approaches and relevant research. Wiley, New York, NY.

Hall, N. & Ohlsson, S. (1991) A procedural-analogy theory of concrete illustrations in arithmetic learning. In The International Conference on the Learning Sciences: Proceedings of the 1991 Conference (ed. L. Birnbaum). pp. 217-221. Association for the Advancement of Computing in Education, Charlottesville, VA.

Jonassen, D.H., Hannum, W.H. & Tessmer, M. (1989) Handbook of task analysis procedures. Praeger, New York, NY.

Kieras, D., Klahr, D., Langley, P. & Neches, R. (eds.) (1987) Production system models of learning and development. MIT Press, Cambridge, MA.

Kuhn, D., Amsel, E. & O'Loughlin, M. (1988) The development of scientific thinking skills. Academic Press, San Diego, CA.

Langley, P., Wogulis, J. & Ohlsson, S. (1990) Rules and principles in cognitive diagnosis. In Diagnostic monitoring of skill and knowledge acquisition (eds. N. Frederiksen, R. Glaser, A. Lesgold & M.G. Shafto). pp. 217-250. Erlbaum, Hillsdale, NJ.

Leinhardt, G. (1987) Development of an expert explanation: an analysis of a sequence of subtraction lessons. Cognition & Instruction, 4, 225-282.

Murray, F.B., Ames, G.J. & Botvin, G.J. (1977) Acquisition of conservation through cognitive dissonance. Journal of Educational Psychology, 69,519-527.

Newell, A. (1990) Unified theories of cognition. Harvard University Press, Cambridge, MA.

Newell, A. & Simon, H.A. (1972) Human problem solving. Prentice-Hall, Englewood Cliffs, NJ.

Ohlsson, S. (1992a) Artificial instruction: A method for relating learning theory to instructional design. In Foundations and frontiers in instructional computing systems (eds. P. Winne & M. Jones). pp. 55-83. Springer-Verlag, New York, NY.

Ohlsson, S. (1992b) Constraint-based student modelling. Journal of Artificial Intelligence in Education, 3(4), 429-447.

Ohlsson, S. & Hagert, G. (1980) Applications of cognitive psychology: A discursive overview. University of Stockholm, Stockholm, Sweden. [In Swedish]

Ohlsson, S. & Langley, P. (1988) Psychological evaluation of path hypotheses in cognitive diagnosis. In Learning issues for intelligent tutoring systems (eds. H. Mandl & A. Lesgold). pp. 42-62. Springer-Verlag, New York, NY.

Ohlsson, S. & Rees, E. (1991) The function of conceptual understanding in the learning of arithmetic procedures. Cognition & Instruction, 8, 103-179.

O'Neil, H.F. Jr. (ed.) (1978) Learning strategies. Wiley, New York, NY.


Payne, S.J. & Squibb, H.R. (1990) Algebra mal-rules and cognitive accounts of errors. Cognitive Science, 14,445-481.

Pearl, J. (1984) Heuristics: Intelligent search strategies for computer problem solving. Addison-Wesley, Reading, MA.

Piaget, J. (1985) The equilibrium of cognitive structures. The central problem in cognitive development. University of Chicago Press, Chicago, IL.

Pirolli, P.L. & Greeno, J.G. (1988) The problem space of instructional design. In Intelligent tutoring systems. Lessons learned (eds. J. Psotka, L.D. Massey & S.A. Mutter). Lawrence Erlbaum, Hillsdale, NJ.

Pople, H.E. Jr. (1982) Heuristic methods for imposing structure on ill-structured problems: The structuring of medical diagnosis. In Artificial intelligence in medicine (ed. P. Szolovits). pp. 119-190. American Association for the Advancement of Science, Boulder, CO.

Reigeluth, C.M. & Stein, F.S. (1983) The elaboration theory of instruction. In Instructional-design theories and models: an overview of their current status (ed. C.M. Reigeluth). pp. 335-381. Lawrence Erlbaum, Hillsdale, NJ.

Reiser, B., Anderson, J.R. & Farrell, R.G. (1985, August) Dynamic student modelling in an intelligent tutor for Lisp programming. Proceedings of the Ninth International Joint Conference on Artificial Intelligence. pp. 8-14.

Reitman, W.R. (1965) Cognition and thought. An information-processing approach. Wiley, New York, NY.

Resnick, L.B. (1973) Hierarchies in children's learning: a symposium. Instructional Science, 2,311-362.

Resnick, L.B. (1983) Toward a cognitive theory of instruction. In Learning and motivation in the classroom (eds. S.G. Paris, G.M. Olson & H.W. Stevenson). pp. 5-38. Lawrence Erlbaum, Hillsdale, NJ.

Scriven, M. (1967) The methodology of evaluation. In Perspectives of curriculum evaluation (eds. R.W. Tyler, R.M. Gagne & M. Scriven). pp. 39-102. Rand McNally, Chicago, IL.

Sowell, E.J. (1989) Effects of manipulative materials in mathematics education. Journal for Research in Mathematics Education, 20,498-505.

VanLehn, K. (1988) Student modeling. In Foundations of intelligent tutoring systems (eds. M.C. Polson & J.J. Richardson). pp. 55-78. Lawrence Erlbaum, Hillsdale, NJ.

VanLehn, K. (1990) Mind bugs: The origins of procedural misconceptions. MIT Press, Cambridge, MA.

VanLehn, K. (1991) Two pseudo-students: applications of machine learning to formative evaluation. In Advanced Research on Computers in Education (eds. R. Lewis & S. Otsuki). pp. 17-26. North-Holland/Elsevier Science Publishers, Amsterdam.

Wenger, E. (1987) Artificial intelligence and tutoring systems. Computational and cognitive approaches to the communication of knowledge. Morgan Kaufmann, Los Altos, CA.