8
307 Brit. J. Psychol. (1962), 53, 3, pp. 307-320 Printed in Great Britain A THEORY OF THE SERIAL POSITION EFFECT By EDWARD A. FEIGENBAUM University of California, Berkeley and HERBERT A. SIMON Carnegie Institute of Technology The paper proposes a theory of the well-known serial position effect that makes quantitative predictions, acceptable by non-parametric tests, of the observed amount of bowing of the serial position curve. The theory, which stems from viewing the central nervous system as an informa- tion-processing system, is compared with the Lepley-Hull hypothesis and Atkinson's theory of the serial phenomena, and is shown to be more satisfactory than the older explanations. Introduction Intraserial phenomena have been a major focus of interest in the study of serial learning. McGeoch (1942), for example, devoted 50 pages of his Psychology of Human Learning to such phenomena. And among intraserial phenomena, one of the most prominent is the serial position curve, depicting the relative number of errors made with the various syllables in a list while learning the list to some criterion. McCrary & Hunter (1953) observed that if percentage of total errors is taken as the unit of measurement, then all the empirical serial position curves for lists of a given length are substantially identical. Earlier investigators, measuring number of errors, had concluded that relatively more errors occurred for the middle syllables when the lists were hard than when they were easy, more with slow learners than with fast learners, more with rapid presentation of syllables than with slow presentation, and so on. One can find in the literature numerous theoretical explanations of these differences. The findings of McCrary & Hunter leave us in the embarrassing position of having explained phenomena that do not exist, i.e., the supposed differences in amount of the position effect, and of havingfailed to explaina striking uniformity that doesexist—the substantial identity of curves derived under a variety of experimental conditions. McCrary & Hunter themselves reach the peculiar conclusion that a single principle can hardly be expected to account for uniformity of effect under diversity of condi- tions, and hence that some multiple-factor is needed to explain the outcome. The thesis of this paper is the opposite one—that if a uniformity underlies experi- ments performed under a wide variety of conditions, this uniformity should be traceable to a single simple mechanism that is invariant under change of conditions. We shall propose such a model of the information processing activity of a subject as he organizes his learning effort in a serial learning task. The serial position effect will be shown to be a consequence of the information processing strategy postu- lated in the model; the model predicts both qualitatively and quantitatively the shape of the curve and the percentages reported by McCrary & Hunter and by others.

THE SERIALPOSITION EFFECT - Stanford Universityrf373wh1051/rf373wh1051.pdf · Brit.J.Psychol. (1962),53,3,pp. 307-320 307 PrintedinGreatBritain ATHEORY OF THE SERIALPOSITIONEFFECT

  • Upload
    lekhanh

  • View
    242

  • Download
    0

Embed Size (px)

Citation preview

307Brit. J. Psychol. (1962), 53, 3, pp. 307-320

Printed in Great Britain

A THEORY OF THE SERIAL POSITION EFFECT

By EDWARD A. FEIGENBAUMUniversity of California, Berkeley

and HERBERT A. SIMONCarnegie Institute of Technology

The paper proposes a theory of the well-known serial position effect that makes quantitativepredictions, acceptable by non-parametric tests, of the observed amountof bowing of the serialposition curve.The theory, which stems from viewing the central nervous system as an informa-tion-processing system, is compared with the Lepley-Hull hypothesis and Atkinson's theory ofthe serial phenomena, and is shown to be more satisfactory than the older explanations.

Introduction

Intraserial phenomena have been a major focus of interest in the study of seriallearning. McGeoch (1942), for example, devoted 50 pages ofhis Psychology ofHumanLearning to such phenomena. And among intraserial phenomena, one of the mostprominent is the serialposition curve, depicting the relative number of errors madewith the various syllables in a list while learning the list to some criterion.

McCrary & Hunter (1953) observed that ifpercentage of total errors is taken as theunit of measurement, then all the empirical serial position curves for lists of a givenlength are substantially identical. Earlier investigators, measuring number oferrors,had concluded that relatively more errors occurred for the middle syllableswhen thelists were hard than when they were easy, more with slow learners than with fastlearners, more with rapid presentation of syllables than with slow presentation, andso on. One can find in the literature numerous theoretical explanations of thesedifferences.

The findings of McCrary & Hunter leave us in the embarrassing position of havingexplained phenomena that do not exist, i.e., the supposed differences in amount ofthepositioneffect, andofhavingfailed to explainastriking uniformitythat doesexist—thesubstantial identity of curves derived under a variety of experimental conditions.McCrary & Hunter themselves reach the peculiar conclusion that a single principlecan hardly be expected to account for uniformity of effect under diversity of condi-tions, and hence that some multiple-factor is needed to explain the outcome.

The thesis ofthis paper is the opposite one—that if a uniformity underlies experi-ments performed under a wide variety of conditions, this uniformity should betraceable to a single simple mechanism that is invariant under change of conditions.We shall propose such a model of the information processing activity ofa subject ashe organizes his learning effort in a serial learning task. The serial position effectwill be shown to be a consequence of the information processing strategy postu-lated in the model; the model predicts both qualitatively and quantitatively theshape of the curve and the percentages reported by McCrary & Hunter and byothers.

Theory of the serialposition effect308 Edward A. Feigenbaum and Herbert A. Simon 309

Some relevant data

Before we state the theory of the serial position effect we shall review someimportant empirical findings on serial learning of nonsense syllables:

(1) Under the usual experimental conditions and with experienced subjects thereis generallya characteristic curvilinearrelation between number of errors to criterionfor a given syllableand the serialposition ofthat syllablein the list. The syllablewiththe largest number of errors is generally beyond the middle of the list, though thiseffect becomes less noticeable as the length of the list increases (Ribback & Under-wood, 1950) ; thefirst syllablealmostalways exhibits the fewest errors (Hovland, 1938).

(2) McCrary & Hunter (1953) have shown that, for lists of a given number ofsyllables, all serial position curves obtained with the usual experimental proceduresare virtually identical when errors are plotted on a percentage, rather than anabsolute, basis. About the same degree ofbowing is exhibited with nonsense syllablesas with names, with massed as with distributed practice, with slow as with fastlearners, with rapid as with slow presentation. Typical data for lists of 12 and14 syllables are given in Tables 1 and 2.

(3) In spite ofthis uniformity under normal conditions, it is easy to produce largedeviations from the characteristic curve. Such deviations can be produced in atleast the following four ways : (a) by varying the difficulty of particular items in thelist; (6) by introducing an item sharply distinguishable from those that precede it orfollow it (McGeoch, 1942, p. 107); (c) by introducing distinguishable sublists withinthe main list (Wishner, Shipley & Jurvich, 1957); (d) by explicit instructions to thesubjects (Krueger, 1932; Welch & Burnett, 1924). Not surprisingly, difficult itemsare learned with more errors than easy items in the same serial position ; distinguish-able items are learned with fewer errors; items that the subject is instructed to learnfirst are learned with fewer errors.

(4) For lists ofa given length, average learning time per syllable is almost indepen-dent of: (a) the rate of presentation (though this result is not given explicitly byHovland (1938), we have used his reported data to compute the average learningtime per syllable; this constancy has been independently reported by Wilcoxon,Wilson & Wise (1961)); and (6) the order in which subjects are instructed to learn theitems (though this result is not given explicitly byKrueger (1932), we have computedit from his data). Hence, number of trials to criterion is inversely proportional toseconds per syllable.

(5) Distribution of practice reduces the number of trials to criterion, but notsufficiently to compensate for the additional total time. The advantages of distribu-tion, measured intrials to criterion, almost disappearswhenthe presentation rate isasslow as four seconds per syllable (Hovland, 1938).

In the next section we shall propose a theory of the serial learning process thataccounts quantitatively for the data mentioned in items 1, 2, and 4, above, andqualitativelyfor the observation of item 3. In the present paper we shallnot discussthe effects of distribution of practice, since these effects almost certainly derive frommechanisms that go beyond the simple theory proposed here. We wish merely toobserve that when time to criterion is taken as the measure of learning rate theseeffects are ofrather small magnitude compared with those we shall consider.

An information processing theory of serial learningWe hypothesize that serial learning is an active, complex process involving the

manipulation and storage of symbols by means of an interacting set of elementaryinformation processes; and that these processes are qualitatively similar to thoseused in problem solving, concept formation, and other higher mental processes(Newell, Shaw & Simon, 1958a). Thus, we shall argue that the stimulus-responsesequences postulated by S-R theory are simple only in surface appearance—thatbeneath them lies an iceberg of complex information processing activity.

We shallnot defend this viewpoint in detail here althoughit hasproved exceedinglyfruitful in research in which we and our associates have been engaged (see, forexample, Feigenbaum (1959); Newell & Simon (1961); Newell et al. (19586)). Weshouldliketo offer three brief observationsto persuade thereader that our conjecturedoes not entirely fly in the face of common sense or previous psychological observa-tion. First, expectancy and mediation theories, like those of Tolman (Hilgard, 1956,pp. 185-221), or of Osgood (Hilgard, 1956, pp. 464-5), attribute as much complexity tothe stimulus-response connexion as does our conjecture; what they fail to indicate isthe nature ofthe mechanisms that might provide the complexity. Secondly, equallyelaborate and more explicit mechanisms are postulated in concept-formation theorieslike those of E. J. Gibson (1940), and the recent one of Bruner, Goodnow & Austin(1956). Indeed, we shall see that one of our postulates involves a conception closelyrelated to Bruner's notion of 'cognitive strain. Thirdly, the time an experiencedsubject needs, per syllable, to memorize a list of a dozen nonsense syllables is of theorder of thirty seconds. In comparison with the timesrequired by familiar electronicsystems for simple processes, this is an enormous time interval. It is large—by afactor of 500 or more—even in comparison with the 50 msec, or thereabouts requiredfor the central processes in the simplest responses to stimuli. If a theory is to fillup this 30 sec. time interval in at all a plausible manner, it will have to attributeconsiderable complexity to the processes that take place. Feigenbaum (1959)reports on such a theory of verbal learning, dealing in a complete manner with dis-crimination learning, association learning, responding, etc. The theory predicts avariety of the phenomena ofrote learning of nonsense syllables in serial andpaired-associate learning tasks.

Underlying assumptions of the modelFor the purposes of this paper, we shall not need to examine the elementary infor-

mation processes in detail, for the shape of the serial position curve will prove to beindependent oftheir micro-structure. This point is examinedin detailbyFeigenbaum(1959), where a distinction is drawn between macroprocesses of verbal learning andmicroprocesses. We require, instead, the assumption that in order for a connexionbetween a stimulus and a response to be formed, a certain (unspecified) sequence ofMomentary processes needs to be carried out, and that the execution ofthis sequenceSquires a definite interval of time, the length of the interval depending on thedifficulty ' of the task and other parameters of the experimental situation.We suppose the information processing mechanismto be operating predominantly

m a serial rather than a parallel manner—it is capable of doing only one, or a few

310 Edward A. Feigenbaum and Herbert A. Simonthings at a time. The narrowness of the span of attention is a familiar aspect ofconscious activity; we assume that it is also an attribute of the subconscious.

Information processing postulatesThe structure of the theory is embodied in four postulates about the processing

mechanism.Postulate 1. Serial mechanism. The central processing mechanism operates serially

and is capable ofdoing onlyone thing at a time. Thus, ifmany things demandprocess-ing activity from the central processing mechanism, they must share the total pro-cessing time available. This means that the total timerequired to memorize a collec-tion of items, when there is no interaction amongthem, will bethe sum ofthe times ofthe individual items. (In serial learning of syllables there is, in fact, interactionamong individual items ; and total learningtime increases more than proportionatelywith number of items. We will not be concerned with this point in the present paperbecause we are dealing not with total learning time or total errors, but with therelative number of errors made on different syllables in a list.)

Postulate 2. Average unit processing time per syllable. The fixation of an item on aserial listrequires the execution of a sequence of information processes that requires,for a given set of experimental conditions, a definite amount of processing time persyllable. The time per syllable varies with the difficulty of the syllables, the length oflist, the abilityofthe subject and other factors. In awell-known series ofexperimentsby Hovland (1938), for example, it averaged approximately 30 sec.

Postulate 3. Immediate memory. There exists in the central processing mechanisman immediate memory of limited size capable of storing information temporarily ;and all access to an item by the learning processes must be through the immediatememory. There is a great dealof experimental evidenceto support the concept of animmediate memory. The evidence points to a span of immediate memory of aboutfive or six symbols (Miller, 1956). We postulate that each symbol stored separatelyinthe immediate memory must be a familiar, well-learned symbol. For unfamiliarnonsense syllable materials, the familiar symbols are the letters. Thus, for the three-letter nonsense syllables ordinarily used, we postulate that the immediate memoryhas the capacity to hold two syllables (six letters). This means that it will ordinarilyhold at any moment one S-R pair being learned.

Postulate 4. Anchorpoints. Inthe absence ofcountervailing conditions—thenatureof which will be specified presently—the information processing will be carried outin a relatively systematic and orderly way which will limit the demands that areplaced on the small immediate memory. This postulate is related to the generaliza-tion, which Bruner and his associates (1956) have tested in certain concept-formingexperiments, that subjects develop strategies for limiting the 'cognitive strain'involved in concept formation, and that these strategies involve handling newlyacquired information in a systematic and orderly way.

We assume that subjects learning the syllables of a serial list will reduce thedemands on memory by treating the ends of the list as 'anchor points', and bylearning the syllables in an orderly sequence, starting from these anchor points andworking toward the middle. This procedure reduces demands on memory because,at each stage ofthe learning task, the next syllable to be learned is readily identified

Theory of the serialposition effect 311as being adjacent to a syllable that has already been learned. Thus, no specialinformation about position in list needs to be remembered.

The idea of learning from anchor points is not new, though it does not seem tonave been previously formalized. Woodworth (1938), for example, makes use°f it in describing the process by which a list of digits is learned. Wishner et al.(1957) mention it in their discussion of the serial position curves obtained in theirlist-sublist experiment.

Thefirst three postulates differ from the fourth in that the former describe built-incharacteristics ofthe processing mechanism that are probably not learned or readilymodified ; whilethe latter describesa method ofproceeding that isapparently habitualwith most subjects, at least in our culture, but which is modifiable by experimentalinstructions, and by certain attention-directing stimuli.

It has been observed frequently that in serial memorization subjects not onlydevelop associations between syllables, but also use various position cues and othercues. They learn, for example, that a particular syllable occurs in the early or in thelate part of the list. (From this reliance on 'irrelevant' cues, one can develop anexplanationfor such phenomena as anticipatory errors and 'remote associations ' thatis much simpler than the usual one ; but these topics would take us beyond the scopeof our present task.) The use of position cues gives a unique status to the beginningand end of a serial list, for these items have the special property that they have noneighbours 'before' and 'after' respectively, i.e. the first item is always preceded,and the last item succeeded, by the intertrial activity. Once the items at the anchorpoints are memorized, the items contiguous to them become the first unlearned items'after' or 'before' syllables already memorized; and so on, as the learning proceeds.More than this, the first two items are unique in that they represent the first S-RPair presented to the subject in the experiment. Thus, we can make out at least aplausible case that a learner can reduce the demands on immediate memory bymemorizing in a more or less systematic fashion from the ends ofthe list toward themiddle.

This postulate is sufficient to explain the bowed form ofthe serial position curve—although it says only a little more than the observed fact of the bowing. Its advan-tage over explanations like the Lepley-Hull hypothesis (which will be discussedlater) is that it is not inconsistent with the ease (item 3, above) with which changescan be induced by the experimenter in the serial position curve.

In order to make quantitative predictions as to the amount ofbowing that will beobserved, we use the notion of anchor points to strengthen postulate 4, as follows:

Postulate 4a. Processing sequence. We postulate the following information process-ing strategy for organizing the serial learning task using anchor points : (a) the firsttwo items presented in the experiment are learned first; (_») attention is next focusedfor learning on an item immediately adjacent to an anchor point. (In the ordinaryseriallist, thiswill be the third item orthe last.) The probability that any specific itemadjacent to an anchor point will be selected for learning next is \jp, where p is thenumber of anchor points (in the ordinary serial list, p = 2. Thus, for example, theProbability that the lastitem will be learned after the second item is 0-5) ; (c) attentionisfocused, andlearningproceeds, itemby itemin this orderlyfashion until the criteriontrial is completed.

312 Edward A. Feigenbaum and Herbert A. SimonOne can picture the subject building up over time an internal representation of the

serial list he is learning. It will be seen, then, that the postulate specifies only aminimal amount of organizational activity: namely, the ability to add an itemimmediately after or immediately before an item already learned (or a 'special'stimulus like the intertrial interval). Our explorations with other processing strate-gies have shown that this strategy reduces greatly the information processingdemands on the learner.

Predictions from the postulatesThe postulates describe a learning mechanism that memorizes serial lists in a pre-

scribed way. This mechanism generates a serial error curve as it learns (i.e. someparticular serial error curve is deduced as a consequence of the postulates). We wishto compute this serial error curve and compare it with the McCrary & Hunter curve.

Computer simulation is the most general and powerful method for doing this. Weand others have used this method extensively in building theories of problem solving(Newell et al. 1958a), binary choice behaviour (Feldman, 1959), concept attainment(Hovland & Hunt, 1960), and other cognitive phenomena. It is described in detailelsewhere (Newell & Simon, 1961). Briefly the idea is this. The digitalcomputer is auniversal information processing device, capable of carrying out any precisely speci-fied information process. Thus a computer can carry out exactly the informationprocessing required by the postulates of the model. We programme the model on acomputer, use it qua subject in verbal learning experiments (simulated inside thecomputer), observe the learning behaviour of the model, and thereby generate theconsequences ofthe postulates inparticular verbal learning situations. We have usedthis method in constructing and exploring an information processing theory ofverbal learning (Feigenbaum, 1959). In particular, for the purposes of this paper, we

have generated the serial error curve for a few simple serial learning experiments.We have done this in two different ways : first, following postulate 2, we introduced aunit processing timeper syllablewithout specifying the microprocesses ofthe learningthat take place duringthis time interval; secondly, weremoved the latter artificialityand substituted the full complement of microprocesses postulated by the morecomplete theory.

For the particular case of the serial position curve, the postulates are simpleenough so that there is no real need to employ computer simulation to generate thepredictions. The postulates can be formalized in a simple mathematical model, fromwhich the quantitative predictions can be generated. As this method is likely to bemore familiar to the reader, we give the mathematical model in the Appendix, andpresent the serial error curves which it predicts (for lists of twelve and fourteensyllables) in Tables 1 and 2. The results obtained by the computer simulation tech-nique are substantially identical (though slightly more discontinuous).

What can we specifically say about the fit? First, the ordinates of the first andlast syllable of the predicted curves are in almost exact agreement with those of theempirical curves. Secondly, the syllable position of the peak ofthe predicted curvesis substantially the same as that of the observed curves. Thirdly, the ordinates ofthepredicted curves and the empirical curves at each syllable position are very close,especially in the critical first and last third of each list, where very good agreement is

Theory of the serialposition effect 313important to any claims about goodness-of-fit. Furthermore, this fit was obtainedwithout any arbitrary parameters, other than the specification of the sequence inwhich incoming syllables are processed.

The goodness-of-fit of the observed frequency distribution to the predicted distri-bution was tested by the Kolmogorov-Smirnov test ofassociation. The test acceptedthe null hypothesis at the 99 % level of significance.

Table 1. Table showing percentage of total errors made during acquisition at eachsyllable position of a 14-item serial list of nonsense syllablespredicted and observed

Syllable position

12 3 4 5 6 7 8 9 10 11 12 13 14

* These values are approximate.The data were taken from fig. 4 of McCrary & Hunter (1953, p. 133).

Table 2. Table showingpercentage of total errors made during acquisition at eachsyllableposition of a 12-item serial list of nonsense syllables predicted and observed

Syllable position

12345 6 7 8 9 10 11 12

* These values are approximate values for themedianpercentages at each position for the four curvespresented in fig. 2 of McCrary __ Hunter (1953, p. 132).

Elaboration and discussion

In this section, we wish to compare the predictions of our information processingtheory with those derived from the Lepley-Hull hypothesis, and extend our predic-tions to two important experiments, one of which was published after we hadspecified our model.

(1) The Lepley-Hull hypothesisThere have been few attempts to account for the serialposition effect in quantita-

tive terms. Hull et al. (1940) attempted to do so on the assumption of some in-hibitory processes, or intralist 'interference. Atkinson (1957), drawing on statisticallearning theory, has exhibited a stochastic process which generates a curve of thegeneral shape of the serial position curve. We shall discuss Hull's results in somedetail, and comment briefly on Atkinson's.

AlthoughHull's equations provide a good fit to the empirical data, this fit is not aconvincing test of the Lepley-Hull theory for the following reasons :

Hull's theory leads to a set ofequations having three free parameters : thereactionthreshold, the ratio of inhibitory potential to excitatory potential per trial, and theremoteness reduction factor. These are used to fit the serial error curve, or, moreprecisely, the curve of numberofrepetitions toreaction threshold (see Hull et al. (1940,Pp. 103-7)). Hull fits the theoretical curve by passing it through three points of the

'redicted)bserved*

0-951-0

1-93-5

4-74-3

5-660

8-1 8-98-0 8-9

10-59-2

10-8100

10-89-5

10-510-6

8-9 8-18-8 8-9

5-67-2

4-74-0

'rei Lid

314 Edward A. Feigenbaum and Herbert A. Simonempirical curve. Since the empirical data form a relatively smooth, bow-shapedcurve, it is not surprising that a three-parameter curve can be made to fit themclosely; an equally good approximation can be obtained by fitting a parabolaempirically to the data.

This means that Hull's hypothesis will fit almost any data (provided the serialposition curve has the characteristic bowed pattern), and hence is almost impossibleto disprove from the data. It is therefore an exceedingly weak hypothesis. By thesame token—because of the three free parameters—Hull's theory does not predictthe constancy on a percentage scale observed by McCrary & Hunter.

Conversely, given the constancy observed by McCrary & Hunter, we can drawcertain conclusions from Hull's theory regarding the growth of excitatory andinhibitorypotential and thereaction threshold. For example, it can be shown by an

examination of Hull's equations that the ratio of the increment of inhibitory poten-tialpertrial to the increment ofexcitatorypotential must be a constant (independent,for example, of intra-list similarity). See Hull et al. (1940, pp. 104-5).

The McCrary & Hunter result implies that the Rs are related: they are propor-tional to each other. The variables qt are homogeneous of degree two inR, and theDt

are homogeneous of degree one in R. The Js are homogeneous of degree zero in R.By equation (3) of p. 104, Ae/Ai is homogeneous of degree zero in R, hence a

constant.This is surprising and contrary to the whole spirit of the Lepley-Hull hypothesis.

For we would expect that with high intra-list similarity the inhibitory potentialwouldrise morerapidly than with low intra-list similarity. On the contrary, ifHull'smodel is correct the only parameter that changes as lists become more difficult to

learn is the ratio of the threshold to the increment of excitatory potential per trial(by equation 2, p. 104, ofHull etal. (1940)). Finally, the Lepley-Hull hypothesis doesnot explain how a subject can voluntarily or through a shift in his attention greatlyalter the shape of the curve.

There are four reasons, therefore, why Hull's mathematical model for the serialposition effect is unsatisfactory: since it contains three adjustable parameters itspredictions are very weak ; it does not predict the constancy in the percentage-errorcurve; this constancy hardly seems compatible with the mechanism assumed as abasis for the model; and finally, the model is difficult to reconcile with well-knownattention-shift and set-change phenomena.

The preceding discussion of the curve-fitting aspects of Hull's equations appliesalso to Atkinson's equations. Atkinson (1957) has available four free parameters.He estimates these parameters from datafor an 18-syllable list, and uses these esti-mated values to make predictions for lists of 8 and 13 syllables. However, a carefulexamination of Atkinson's equations shows that even after the parameters have beenestimated, there are enough degrees offreedom left in the system almost to insure a

reasonably good fit to the other curves. Thus, his theory suffers the same infirmitythat we have pointed out in Hull's.

Furthermore, to make workable the difficult mathematicsof the stochastic process,Atkinson has had to introduce a number of very constraining assumptions. Hisequations hold only for serial lists of highly dissimilar words which are familiar andeasily pronounced; the presentation must be at a moderate rate, with a long interval

Theory of the serialposition effect 315provided at the conclusion of each trial. Yet the same bowed curve is obtainedempirically when these conditions are not met as when they are. Finally, as inHull's theory, there is difficulty in predicting the constancy observed by McCrary &Hunter. Given a set of parameter values, Atkinson's theory does not predict theconstancy over the experimental conditions reported by McCrary & Hunter. On theother hand, if one admits that these values may change from situation to situation,then one must re-estimate them for each experimental condition, and therein thetheory loses much of its power.

(2) The two-part listA simple extension of ordinary nonsense syllable experiments is to differentiate

the first half of the list from the second half by printing the former in one colour(say black) and the latter in another colour (say red), dividing the total list percep-tually into two smaller sublists.

What will be the shape of the serial position curve? One can predict this fromHull's theory by the unsatisfyingprocedure of assuming that the total list is learnedas two separate sublists, and by fitting each sublist with a three-parameter curvesuch as we have discussed previously. The total fit will have six free parameters.

We should liketo be able to predict the shape of the curve from the theory we havealready presented. There are two important issues involved in making such aprediction :

(i) Consider those subjects who perceive the total list as being constructed of twosublists. One possible reasonable strategy for dealing with the learning task is to usethe end points of each sublist as anchor points, in the type of learning process de-scribed earlier. Another plausible strategy is to use as anchor points the ends of thetotal list and one point in the centre to identify the point of bifurcation, say the firstred syllable.

In making a prediction of the serial position curve for the two-part list, we haveassumed simply that of those subjects who perceived the list as being two sublists,one half used the first strategy and the other half used the second strategy. Theassumption is, of course, arelatively crude one, but the prediction is not very sensi-tive to the actual percentages assumed. Alternatively, we could have estimated thepercentage from the data.

(n) In the experiment which we shall discuss shortly, we have no way ofknowingprecisely how many subjects perceived the task as one of memorizing two sublistsand how many perceived it as learning one long serial list. In the absence of thisknowledge, we can estimate these percentages from the observed ordinate of thefirst red syllable and weight our prediction at each syllable position appropriatelyusmg this estimate. This procedure essentially 'fits' our predicted curve to theempirical curve at one point, the 'break' between black and red syllables. But itguarantees nothing about the quality of the prediction at the other points.

As part of a larger experiment, Wishner et al. (1957) performed an experi-ment with a two-part list. They had an experimental group memorize lists of 14syllables, halfof which were printed in black capitals, the other half in red lower caseletters. The experimental group was told that the object of the experiment was todiscover how people learned two lists simultaneously.

316 Edward A. Feigenbaum and Herbert A. SimonIn Table 3 the predicted values for the percentage of errors at each syllable

position are compared with the observed values.Thepredictions were generated by the methods previously described.Three anchor

points were used. In the mathematical treatment the three-anchor-point predictionswere corrected by a factor which insured an exact fit of the ordinate of syllable 9

(the middle anchor point). In the computer simulationmethod, whole-list and subliststrategies were both run and the predicted ordinates averaged in a weighted fashion

suchthat the ordinate ofsyllable9fitted exactly. What this procedure comes downto is

the assumption that approximately two-thirds ofthe subjects learned the list as two

sublists and approximately one-third learned it as a single long list (note that thesefractions were not assumedapriori but were obtained by working backwards from theobserved ordinate of the ninth syllable).

Table 3. Table showing percentage of total errors, predicted and observed, made at eachsyllable position during acquisition, for a two-colour, U-item serial list of nonsensesyllables

Syllable position

I 2 3 4 5 6 7 8 9 10 11 12 13 14

* These values are approximate, and were derived from data taken from Wishner et 01. (1957, p. 260).

The agreement at all syllable positions is very close. As contrasted with our other

predictions, in this one we had available the one free parameter already mentioned.The goodness-of-fit of the observed frequency distribution to the predicted distribu-tion was tested by the Kolmogorov-Smirnov test of association. The test acceptedthe null hypothesis at the 99 % level of significance.

(3) The experiments of KruegerWe turn now to some important experiments, theresults ofwhich were alludedto

previously, and which have important implications for the information processingtheory. . -,_"

In a well-known series of experiments, Krueger (1932) presented various kinds ot

lists of 'easy' and 'hard' paired nouns to subjects who either received instructionsto learn the list in some specified order or no instructions at all. These studiesdemonstrated that the order in which the various items were learned was influencedmarkedlyby the instructions givento the subjects. As McGeoch puts it (1942, p. 102),

'The relation between rate of learning and position in the series is, then, a functionof the direction of the subject's effort or attention. As we have indicated, this is

entirely consistent with the information processing theory, which regards particularlearning sequences as 'strategies' for dealingwith the learning problem—as adaptiveresponse to task.

What about our more specific hypothesis that in the usual serial learning experi-

ment the 'end points' of the list will be taken as anchor points in the learningprocess ? Krueger's experiments showedthat subjects given no instructionsproduced

Theory of the serialposition effect 317essentially the same serial position curves as those subjects who were instructed tolearn the ends of the list first.

Because the fixation of an item requires a fixed amount of processing time, andbecause the sequenceof learning is considered a ' strategy' and not a built-in charac-teristic of the learning process, our theory predicts that the total number of syllableslearned will be proportional to the total learning time and independent ofthe order oflearning. On this point Krueger reports, 'When the attention given is constant, thetotal amounts learned are the same, irrespective of whether this effort is directed tothe beginning, center, or the final sections of the unit which is to be memorized '(1932, p. 527).

Although we assume that there is a constant fixation time associated with eachparticular item on a list, items of differentkinds (e.g. ' easy ' items against ' difficult'items) will have different processing times. The theory we have proposed predictsthat the total learning time will be the sum of these processing times per syllable,and as such will be independent ofthe order in which the syllables are learned. Con-firming this prediction, Krueger reports, 'When materials of unequal difficultyappear within the same unit to be mastered, the total number of trials required tomemorize the unit is approximately the same whether attention is given at first tothe more difficult or the easier sections of the unit' (1932, p. 527).

This is consistent with the McCrary & Hunterresults. If one plots the McCrary &Hunter curve by ordering the abscissa values not by serial position in the list but bythe apparent order in which the syllables were learned, the ordinates lie on a straightline. This, of course, is in exact agreement with our model. Recent additional infor-mation on this phenomenon was obtained by Jensen (1962) for the learning ofnonsense figures.

Thus, Krueger's experiments, though they were performed with serial lists ofpaired nouns rather than nonsense syllables, demonstrate (a) that the serial positioncurve can be 'shaped' by the experimenter with suitable instructions to subjects, sothat the order of learning syllables is itself a learned response ; (6) that the totalamount ofmaterial learned within a given time is independent of the order in whichvarious items, sometimes heterogeneous with respect to difficulty, are learned.

Some recent results

Subsequent to the specificationofthe model proposed in this paper, some importantnew experiments have been published on the effect of replacing syllables on a listduring learning. Rock (1957) used the following procedure on his experimentalgroups: on each test trial, those syllables incorrectly responded to by the subjectwere removed and new syllableswere substituted in their place for the next learningtrial. Rock foundno impairment of the rate of learningfor the experimental groups(as compared with the control groups). This important result casts further doubt onHull's incremental build-up hypothesis. Criticism of Rock's technique led Estes,Hopkins & Crothers (1960) to replicate and extend Rock's experiment, but theirresults substantiate Rock's.

Estes et al. say ofthese experiments, 'No hitherto published theory with which weare familiar gives areasonable amount of our principal findings' (1960, p. 338). The

'redicted)bserved*

1-9 312-2 4-3

4-76-5

6-77-3

9-18-2

10-18-5

8-981

5-9 6-060 60

8-390

8-7 9-89-5 9-9

9-98-5

5-56-0

318 Edward A. Feigenbaum and Herbert A. Simoninformation processing theory we have proposed here predicts the Rock result. Acomputer simulation of the Rock experiment using the information processing modelgenerated behaviour substantially identical with that reported byRock. In terms ofour theory (postulate 4a) the explanation, of course, is that items on a list arelearned one at a time in the processing sequence. Items presented when attention isfocused on some other particular item are simply ignored by the learner, and arepicked up on a later trial, as determined by the processing sequence. Hence, notime is lost by the learner if the experimenterreplaces an item that has not yet beenprocessed.

Concluding remarks

In this paper we have surveyed the principalknown facts about the shape of theserial position curve in serial learning of nonsense syllables by the anticipationmethod. We have examined the Lepley-Hull hypothesis as an explanation for theshape of the curve, and have concluded that the hypothesis is unsatisfactory. Wehave proposed an alternative hypothesis formulated interms ofinformationprocesses.We have shown that the hypothesis not only predicts the constancy of the serialposition curve when the ordinates are plotted in percentage terms, but alsopredictsthe quantitative values of the ordinates. Since the hypothesis allows no free para-meters, its success in fitting the observed data provides rather persuasive evidencefor its validity.

The information processing hypothesis is built on the following assumptions :(1) that the brain is a serial processing mechanism with alimited span ofprocessing

attention ;(2) that the fixation of an item uses up a definite amount of processing time;(3) that there is a small immediate memory which holds information to be

processed;(4) that the subject employs a relatively orderly and systematic method for

organizing the learning task, using items with features of uniqueness as anchorpoints.

In this paper, we have offered no explanation of the fixation process itself, i.e. wehave talked not at all about what occurs during the processing time assumed initem (2) above.

We are indebted to our colleague Allen Newell for numerous helpful discussionsabout this project, and to the Ford Foundation for financial assistance that made itpossible.

Appendix

Given the unit processing time per syllable, one can, from thepostulates, compute the averagetime after the beginning of a learning experiment that will pass before any specified syllable islearned. This time, in turn, determines uniquely the number oferrorsthat will be madewiththatsyllable. While the actual number of errors will be a function of the unit processing time, thepercentage that this numberrepresents of the expected total errors is independentofthe unitprocessing time.

The numerical estimates given in the text tables were obtained as follows : By the postulates,the syllables will be learned in an orderly sequence, each syllable requiring a certain processingtime, say k. Each syllable can be identified by its serial order, in the list as presented by theexperimenter, and also by the order, r, in which it is learned by the subject. Since learning takes

Theory of the serialposition effect 319place from both ends of the list, these two orders willnot, ingeneral, be identical. Thus, s( , the .thsyllable in order of presentation, may be the same syllable as s'r, the rth syllable in order oflearning. (Technically, the list ofsyllables in order of learning is a permutation of the originallist.) Let T'r be the time that elapses before the first successful response to syllable s'r—that is,until the rth syllable is learned. Then T'r = kr. The number of errors, W'r , the subject will makeon the rth syllable is equal to the number oflearning trials prior to the trial on which that syll-able is learned and will be proportional to r :

where m is a proportionality constant, equal tok divided by the time per trial.The numerical value ofm is a function, of course, ofthe difficulty of the items, and the rate

of presentation. However, we are only concerned wdth the fraction of total errors made on agiven syllable, and this fraction is clearly independent of m. For let W be the total number oferrors, summed over all syllables, and w'r = W'r/W, the fraction of total errors made on the rthsyllable.

(2)

where n is the number of syllables in the list.12

Suppose,forexample.thatwearedealingwithalistoftwelveßyllables, then2 r = 78. Hence, ther =l

fraction of total errors that will be made on the 4th syllable learned will be 4/78 = 0-051.Now, to obtainthe serialposition curve,we needmerely to relabel the syllables fromthe order in

which they are learned totheir order ofpresentation. Thatis, ifrt is the rank, in order of learning,ofthe .th syllable in order ofpresentation, then the fraction of total errors for the .th syllablewill be simply :

To apply this result, we must calculate therank, ri( ofthe .thsyllable in order ofpresentation,as determined by postulate 4a. Weassume (postulate 3) that the immediate memorycapacity istwo syllables and for simplicity in the calculation shall assume that the items are picked uppairwise in the processing sequence, andstored in the immediate memory for learning. The firsttwo syllables in the list will be learned first (will have rank, r = 1, and r = 2, respectively),followed either by the last two syllables on the list, or by syllables three and four, each withprobability one-half. The result is that in a list of 12 syllables the third syllable, for example,will have a probability of one-halfofbeing the third syllable in order of learning, a probabilityof one-quarter of being the fifth syllable, a probability of one-eighth ofbeing the seventh, and ofone-sixteenth of being ninth or eleventh. Averaging these ranks, weighted by their respectiveprobabilities, we find that the average rank of the third is r3 = 4-875. The fraotion of totalerrors on the third syllable will then be wa = 4-875/78 = 0-063 (see Table 2). All the otherpredicted values in the tables were computed in the same way.

The fact that the third syllable has zero probability of learned fourth, sixth, eighth, etc., isartificially introduced by the calculation simplification introduced above of handling thesyllables in pairs. It does notmaterially affectthe serial position curve prediction, as is shown bythe fact that the computer simulation (which does not use the pairwise learning simplification)generated the same serial position curve prediction.

ReferencesAtkinson, R. C. (1957). A stochastic model for rote serial learning. Psychometrika. 22, 87-95.Bbuneb, J. S., Goodnow, J. J. & Austin, G. A. (1956). A Study of Thinking. New York:

Wiley.Estes, W. X., Hopkins, B. L. & Ckothebs, E. J. (1960). AU-or-none and conservationeffeots in

the learning and retention ofpaired associates. J. Exp. Psychol. 6, 329-39.Feigenbaum, E. (1959). An Information Processing Theory of Verbal Learning. The RAND

Corporation, Paper P-1817.

W'r = mr, (1)

In in _rw'r =mr __ mr =rI J_ r = ——— ,/ rfi / rfi n(n+l)

w, = w'v = r tl__rt . (3)r

320 Edward A. Feigenbaum and Herbert A. SimonFeldman, J. (1959). Analysis of Behaviour in Two Choice Situations. Unpublished doctoral

dissertation, Carnegie Institute of Technology.Gibson, E. J. (1940). A systematic application ofthe concepts of generalizationand differentia-

tion to verbal learning. Psychol. Bey. 47, 196-229.Hiloabd, E. R. (1956). Theories of Learning (2nd cd.). New York: Appleton-Century-Crofts.Hovland, C. I. (1938). Experimental studies in rote learning theory. 111. Distribution of

practice with varying speeds of syllable presentation. J. Exp. Psychol. 23, 172-90.Hovland, C. I. (1951). Human learning and retention. In S. S. Stevens (cd.), Handbook of

IExperimental Psychology. New York: Wiley.

Hovland, C. I. & Hunt, E. B. (1960). The computer simulation ofconcept attainment. Behavi- i

oral Science, 5, 265-7.Hull, C. L. et al. (1940). Mathematics-deductive Theory of Bote Learning, New Haven: Yale

University Press.Jensen, A. R. (1962). Is the serial position curve invariant? Brit. J. Psychol. 53, 159-66.Kruegeb, W. C. F. (1932). Learning during directed attention. J. Exp. Psychol. 15, 517-27.McCbaby, J. W. & Hunteb, W. S. (1953). Serial position curves in verbal learning. Science,

117, 131-4.McGeoch, J. A. (1942). The Psychology of Human Learning. New York: Longmans, Green.Miller, G. A. (1956). The magical number seven, plus orminus two ; some limits on our capacity

for processing information. Psychol. Bey. 63, 81-97.Newell,A., Shaw, J. C. & Simon, H. A. (1958a). The elements of a theory of human problem

solving. Psychol. Bey. 65, 151-6.Newell,A., Shaw, J. C. & Simon, H. A. (19586). The Processes of Creative Thinking. TheRAND

Corporation, Paper P-1320.Newell, A. & Simon, H. A. (1961). Computer simulation of human thinking. Science, 134,

2011-17.Ribback, A. & Underwood, B. V. (1950). An empirical explanation of the skewness of the

bowed serial position curve. J. Exp. Psychol. 40, 329-35.Rock, I. (1957). The role ofrepetition in associative learning. Amer. J. Psychol. 70, 18&-93.Welch, G. B. & Bubnett, C. T. (1924). Isprimacy a factor in association-formation? Amer. J.

Psychol. 35, 396-401.Wilcoxon, H. C, Wilson, W. R. & Wise, D. A. (1961). Paired-associate learning as a function

of percentage of occurrence of response members and other factors. J. Exp. Psychol. 61,ft

283-9.Wishner, J., Shipley, T. E., Jr. & Jubvich, M. S. (1957). The serial position curveas a function

of organization Amer. J. Psychol. 70, 258-62.Woodwobth,R. S. (1938). Experimental Psychology. New York: Holt.

(Manuscript received 27 June 1959; revised 11 July 1961)