
JOURNAL OF MEMORY AND LANGUAGE 28, 610-632 (1989)

Memory Predictions Are Based on Ease of Processing

IAN BEGG, SUSANNA DUFT, PAUL LALONDE, RICHARD MELNICK, AND JOSEPHINE SANVITO

McMaster University

The article reports four experiments that examine people’s ability to predict the outcome of a future test of memory. Our thesis is that memory predictions are implicit judgments of how easily the item is processed while answering the predictive question. If items are processed easily because of factors that also cause memory to succeed, predictions are accurate; if the factors that cause easy processing are irrelevant for memory, predictions are less accurate. The experiments examine factors that influence the prediction task and the memory test separately: these include item attributes, manner of processing, repetition, and similarity of processing between the prediction task and the memory test. Predictions are most accurate if the prediction task entails the same processes as the test, even if the predictive question is nominally irrelevant to the test; predictions are less accurate if the task and test have different entailments, even if the nominal question is specifically aimed at the test. © 1989 Academic Press, Inc.

It’s a poor sort of memory that only works backwards.

Lewis Carroll

Can memory work forward? What current cognitive information do people use when they predict that they will remember some things but forget others? People cannot predict memorability with certainty; only some of the cognitive information that is available at the present time will be available and useful in the future. Even memory theorists lack precise knowledge of which current information is the “right stuff” for memorial success. We propose that subjects predict success for items that are easiest to process in the manner demanded by the task. How accurate are the predictions? Predictions will be more accurate if items are easily processed for reasons that are relevant for the test than if their ease is for reasons that are irrelevant for the test. Our thesis is that memory predictions are based on the implicit heuristic “what is easiest to do now will be remembered best later.” We will discuss the thesis generally, reserving detailed analysis and reference to other research on memory predictions until the specific introductions to four experiments.

The research was funded by Grant A8122 from the Natural Sciences and Engineering Research Council of Canada to Ian Begg, and the procedures were approved by the Ethics Committee of McMaster University. Experiment 1 was part of an Honors Thesis by SD, Experiment 2 was part of an Honors Thesis by JS, Experiment 3 was part of an Honors Thesis by PL, and Experiment 4 was part of an Honors Thesis by RM, all supervised by IB. We thank Ann Anas, Marcia Barnes, Lee Brooks, Larry Jacoby, Douglas Needham, and Andrea Snider of McMaster University, Paula Hertel of Trinity University, and Fergus Craik and Ray Shaw of Erindale College for helpful discussions. Reprints can be obtained from Ian Begg, Department of Psychology, McMaster University, Hamilton, Ontario, Canada, L8S 4K1.

In 1690, John Locke introduced the concept of mental “reflection”; by “the perception of the operations of our minds . . . there come to be ideas of those operations in the understanding” (Mandler & Mandler, 1964, pp. 27-28). Hence, by observing ongoing mental states and processes, people gain implicit theories of cognition. An item’s future success or failure on a memory test cannot be known with certainty, but many other things about the item are known. Implicit theories dispose the predictor to use some of those known things as the basis for attributing memorability to items. The discovery of which current knowledge is the basis for predictions will give insight into the implicit theory that guided the predictions.

0749-596X/89 $3.00 Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved.

What current information do people use to make judgments about something they do not know with certainty? We have borrowed from several accounts the idea that people’s current processing of items is the basis for attributing other qualities to the items. For example, Tversky and Kahneman (1973) proposed that the readiness with which an instance of a concept comes to mind is a basis for estimating how often the concept has occurred. Similarly, Jacoby and Dallas (1981) suggested that the relative fluency with which an item is perceived is a basis for judging that the item occurred earlier. Begg and Green (1988) suggested that the ease with which a repeated item retrieves its earlier encoding is an implicit test of whether the item will succeed on a later test. Begg, Armour, and Kerr (1985) suggested that the current accord between the factual contents of a statement and retrieved knowledge is the basis for ratings of probable truth. Glenberg, Sanocki, Epstein, and Morris (1987) suggested that the familiarity of a general domain of knowledge is the basis for people’s judgments of whether they gained specific knowledge by reading about part of the domain. Heuristic judgments of these sorts are often correct; hence they are reasonable bases for decisions made under uncertainty.

Heuristic judgments, however, are sometimes wrong. The drawback of heuristic judgments is that their correctness arises from associations that are correlational rather than causal. In particular, differences in ease of processing do not cause items to be differentially memorable. However, many variables that cause increases in memorability also cause increases in ease of processing; this covariation makes it reasonable to use ease of processing as the basis for memory predictions. Careful experimentation can manipulate the factors that cause ease of processing separately from those that cause memorability. Evidence for heuristic judgments will occur if people base their judgments on ease of processing even if it has been invalidated as a predictor of memory.

Four experiments investigated the thesis that memory predictions are implicit judgments of ease of processing. If so, predictions will be sensitive to factors that influence ease of processing, even if those factors are irrelevant for or inimical to memory. Furthermore, predictive accuracy will suffer as the processing demands of the prediction task depart from the processing demands of the memory test; accuracy reflects the extent to which the factors that cause success and failure in memory also cause items to be more and less easily processed. The variables will include item attributes, the manner of processing, repetition, and the type of prediction task. Experiment 1 asked how accurately judgments at study predict recognition memory for words varying in imagery value and frequency of occurrence and compared memory predictions to explicit ratings of ease of processing. Experiment 2 examined memory predictions for pairs of words studied by interactive or separate imagery and compared predictions made at study with predictions made after study. Experiment 3 varied the imagery value of both members of pairs and asked if memory predictions made after study are sensitive to the imagery value of absent responses. Experiment 4 included several judgment tasks and compared their accuracy of predicting which items would succeed in recall and recognition.

In summary, we propose that memory predictions reflect how easily items are processed in the prediction task. Ease of processing is a reasonable heuristic because many factors that cause success and failure in memory also cause differences in ease of processing. However, the accuracy of the predictions depends on whether the most important determinants of which items succeed or fail are also the most important determinants of ease of processing.


EXPERIMENT 1

Experiment 1 tested our basic assumption: do people expect more success remembering words that are easy to process than words that are harder to process? In an early investigation, Underwood (1966, pp. 468-470) used lists of items that ranged from BUG to VXK. Some people judged learnability and others learned the items; judgments were highly predictive of actual learning (r’s = .92, .91). However, learning was also accurately predicted by other subjects’ ratings of meaningfulness and pronounceability (r’s = .88, .90). By our account, judgments are accurate predictors because the factors that make BUG more memorable than VXK also make BUG easier to process than VXK. More recently, Rabinowitz, Ackerman, Craik, and Hinchley (1982) used lists of highly, moderately, or slightly related pairs of words. Subjects correctly predicted that their memory would vary with relatedness. Relatedness was invalidated as a predictor for other subjects; they imagined each pair interactively, and, therefore, remembered the less related pairs as well as they remembered the more related pairs. However, these subjects made the same predictions as the controls; their predictions reflected relatedness and were insensitive to differences in encoding that invalidated relatedness as a predictor. By our account, related pairs are easier to process than less related pairs whether the pairs are imagined or merely studied.

We varied two attributes, imagery and frequency, that are associated with ease of processing; concrete words and common words are easier to interpret, express, and define than abstract words and rarer words (Begg, Upfold, & Wilton, 1978). If memory predictions are implicit ratings of ease of processing, then people should expect more memorial success for items that other people find easy to process than for items that others find hard to process. Some subjects rated each word with explicit reference to ease of processing rather than memory; they rated ease of imagining, ease of understanding, or ease of pronouncing. Other subjects rated the words with explicit reference to memory; they rated memorability, or they rated how easy it was to study each word for a test of memory. We expect that all ratings will be higher for concrete and common words than for abstract and rarer words.

Experiment 1 included a test of recognition memory. Will predictions be most accurate if the factors that make items easy or hard to process also make them more and less memorable? We chose imagery and frequency as attributes because they have different effects on recognition memory; recognition favors concrete words over abstract words, but it favors rarer words over common words (e.g., Begg & Rowe, 1972). If memory predictions reflect ease of processing, then the ratings should correctly predict that concrete words will surpass abstract words on the test, but they should erroneously predict that the test will favor common words over rare ones. Furthermore, imagery ratings should predict which individual words will succeed and fail more accurately than explicit memory predictions do; frequency and concreteness both influence the ease of processing that underlies memory predictions, but imagery ratings explicitly focus on item imagery, a positive covariate of recognition, rather than frequency, a negative covariate of recognition.

Finally, subjects also rated their familiarity with the words. We used familiarity ratings to obtain a measure that is sensitive to item frequency, but that is not an explicit measure of memory. We expect that ratings of memorability will be more predictive of familiarity than of recognition memory, even though the ratings are made with memory in mind. That is, ratings will predict any later measure that is influenced by the same variables that made the items easy or difficult to process. Predictive accuracy reflects the similarity between the factors that influence initial ratings and the factors that influence later performance, not what the subjects thought they were doing in the rating task.

In summary, we expect that initial ratings will vary directly with item imagery and item frequency, that recognition will vary directly with imagery but inversely with frequency, and that familiarity will vary directly with frequency. Ratings will predict recognition to the extent that they reflect imagery, and they will predict familiarity to the extent that they reflect frequency. Thus, imagery ratings will predict memory implicitly better than memory predictions do explicitly; the memory predictions themselves will be better implicit predictors of familiarity than explicit predictors of memory.

Method

Design Overview

We first describe the experiment in overview, then supply details about how it was conducted. Subjects in six conditions studied a list of words. One group just studied the words, and the other five rated each word on a 7-point scale. Two groups judged the words with explicit reference to memory; one rated memorability and one rated ease of studying. Three groups rated ease of processing; one rated ease of imagining, one rated ease of understanding, and one rated ease of pronouncing. The words were of 9 types, derived by crossing 3 levels of imagery (high, medium, low) with 3 levels of frequency (high, medium, low). After study, the subjects performed two tests. One test was recognition memory, and the other was a test of familiarity; both orders of presenting the two tests were used. Hence the design was a 2 (Test Order) × 6 (Study Task) × 3 (Imagery Level) × 3 (Frequency Level) factorial, with the latter two factors varied within subjects. Dependent variables include the ratings made at study, recognition scores, familiarity scores, and correlations between the initial ratings and the other two measures.

Subjects

The subjects were 155 students of Introductory Psychology at McMaster University who participated as a requirement of the course. Testing was in groups of 11 to 16 that were assigned at random to 12 conditions.

Materials

Materials included a 90-word study list, rating sheets for subjects to record their judgments, and two 180-word tests. The 90 studied words and 90 new words for the tests were nouns from Paivio, Yuille, and Madigan’s (1968) norms, chosen to vary 3 levels of rated imagery (I) crossed with 3 levels of Thorndike-Lorge frequency (F). Of the 9 sets of items, 3 were high-I words (each of the three mean I’s was 6.38), 3 were medium (each mean I was 4.67), and 3 were low (each mean I was 2.94). Frequency was also precisely controlled; within each level of imagery, one set was low-F words (1 < F < 9; mean = 4), one was medium (10 < F < 49; mean = 26), and one was high (10 A and 10 AA words). From each of the 9 sets, 10 words were in the study list and 10 were new items on the tests, chosen at random with the restriction that mean I and F be equal in each half.

The study list was randomized in 10 blocks of 9 items, one of each type, and recorded on videocassette from a computer-generated display. Each word appeared for 6 s, with 1 s between words. The words were numbered from 1 to 90 to correspond to rating sheets, on which each numeral was accompanied by a 7-point rating scale. Tests for recognition and familiarity had the 90 old words and 90 new ones, each with a 7-point rating scale; 1 was “least certainly in the list” or “least familiar,” and 7 was “most certainly in the list” or “most familiar.” Each block of 18 had a new and an old item of each of the 9 types, randomly ordered. For the familiarity test, the items from the recognition test were reordered within blocks, which were also reordered.
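The blocked randomization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_study_list`, the placeholder word labels, and the seed are hypothetical, and the article does not say how the original list was actually generated.

```python
import random

IMAGERY_LEVELS = ("high", "medium", "low")
FREQUENCY_LEVELS = ("high", "medium", "low")


def build_study_list(words_by_type, n_blocks=10, seed=None):
    """Return a study list of n_blocks blocks, each holding one word of every type.

    words_by_type maps an (imagery, frequency) pair to a list of n_blocks words.
    Hypothetical helper: it only mirrors the blocking rule stated in the text.
    """
    rng = random.Random(seed)
    # Shuffle within each type so that assignment of words to blocks is random.
    pools = {t: rng.sample(ws, len(ws)) for t, ws in words_by_type.items()}
    study_list = []
    for b in range(n_blocks):
        block = [(pools[t][b], t) for t in pools]  # one word of each of the 9 types
        rng.shuffle(block)                         # random order within the block
        study_list.extend(block)
    return study_list


# Placeholder labels stand in for the 90 nouns, which are not listed in the article.
words_by_type = {
    (i, f): [f"{i}-I_{f}-F_{k}" for k in range(10)]
    for i in IMAGERY_LEVELS
    for f in FREQUENCY_LEVELS
}
study_list = build_study_list(words_by_type, seed=1)
```

Blocking in this way guarantees that the 9 item types are spread evenly over serial positions, so that type is not confounded with position in the list.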


Procedure

The design was 6 (Task) × 2 (Test Order) between subjects; the recognition test was either before or after the familiarity test. One of the 6 initial tasks was an intentional memory control condition; subjects were told to study the items for a memory test. Three tasks were incidental memory conditions; subjects rated ease of imagining (1 = hardest to imagine), understanding (1 = hardest to understand), or pronouncing (1 = hardest to pronounce) each word. Two tasks were judgment conditions; subjects rated the memorability of each word (1 = least memorable) or the ease of studying it for memory (1 = hardest to study).

Subjects were encouraged to try hard, do their best, and so on. Groups (except the controls) were given rating sheets and were told to rate each word while it was on the screen. After the list was shown, sheets were collected and replaced by the first test. For the recognition test, subjects circled 1 to 7 based on their certainty that the item had been in the list; with familiarity, they circled 1 to 7 to indicate their preexperimental familiarity with the word in print. When the first test was completed, the booklets were collected and replaced by the other test.

Results and Discussion

The major results pertain to ratings, test scores, and the correlations between ratings and scores. These will be reported first in each section, along with statistical support. Other results are necessary for a complete presentation; these are presented summarily and may be of little interest to some readers. The α level was .05 for inferences, including post hoc t tests used to compute critical differences (d) to evaluate simple effects.

Initial Ratings

Are ratings of memorability and ease of studying sensitive to the same factors as ratings of ease of imagining, understanding, and pronouncing the items? Table 1 shows how these five ratings respond to differences in item imagery and frequency. It is clear that all the ratings declined as imagery declined and all declined as frequency declined. Thus, as predicted, judgments about memory respond to item attributes in the same way that explicit ratings of ease of processing do.

Analysis of variance of the ratings had the five tasks as a between-subjects factor and three levels of imagery and frequency as within-subjects factors. Imagery and frequency had reliable main effects [F(2,242) = 268, 203; MSe = 0.672, 0.683]; separate analyses for each task revealed main effects of imagery (F’s from 13.6 to 165) and frequency (F’s from 16.6 to 74.3).

Other results. Imagery and frequency interacted in the overall analysis [F(4,484) = 50.0, MSe = 0.186] and in each task (F’s from 4.09 to 36.0). The interaction between imagery and frequency occurred because the effect of frequency was smallest for concrete words and the effect of imagery was smallest for common words; this unimportant interaction occurred in all later analyses as well, but we shall not report it each time.

TABLE 1
INITIAL RATINGS AS A FUNCTION OF ITEM IMAGERY AND FREQUENCY IN EXPERIMENT 1

                        Imagery                  Frequency
Rating task       High   Medium   Low      High   Medium   Low
Memorability      4.58    4.12    3.85     4.69    4.18    3.68
Studying          5.76    4.93    4.57     5.87    5.23    4.15
Imagery           6.53    4.37    3.42     5.21    4.81    4.30
Understanding     6.66    5.74    5.71     6.63    6.14    5.34
Pronunciation     6.21    5.76    5.55     6.30    5.94    5.29

Recognition

The expected outcome in recognition is for concrete words to exceed abstract words, but for rarer words to exceed common words. Table 2 shows that the expected outcome occurred in all five rating tasks and in the control condition. The means in Table 2 are discriminability scores, defined as the difference in recognition scores for old items (hits) and new ones (false alarms) of the same type.
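As a minimal sketch of the discriminability score just defined, the following Python computes the mean 7-point rating given to old items minus the mean rating given to new items of the same type. The function name and the ratings are made up for illustration; they are not data from the experiment.

```python
from statistics import mean


def discriminability(old_ratings, new_ratings):
    """Mean certainty rating for old items (hits) minus that for new items (false alarms)."""
    return mean(old_ratings) - mean(new_ratings)


# Hypothetical ratings from one subject for one item type (1 = least certainly in the list).
old = [7, 6, 5, 6]   # items that were studied
new = [2, 3, 3, 2]   # items that were not studied
score = discriminability(old, new)  # 6.0 - 2.5 = 3.5 for these made-up ratings
```

A score of zero would mean that old and new items of that type were rated alike; larger positive scores indicate better discrimination.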

Did memory judgments made at study predict the outcome of the recognition test? All initial ratings anticipated the effect of imagery on memory; all ratings favored concrete words over abstract words and so did the test. However, no ratings anticipated the effect of frequency on memory; all ratings favored common words over rarer words, but the test favored rarer words over common words. Hence, memory predictions failed to discriminate items in the same way that the memory test did. The predictions reflect ease of processing, which, for item frequency, is a negative covariate of memorability.

Analysis of variance of the discriminability scores had 6 tasks and 2 orders of testing as between-subjects factors; imagery and frequency were within-subjects factors. The same factors were also used in separate analyses of hits and false alarms. Item imagery had a large effect on discriminability [F(2,286) = 152, MSe = 0.902]; as imagery declined, hits declined (6.00 > 5.85 > 5.54; F = 42.4, MSe = 0.532) and false alarms increased (2.60 < 3.04 < 3.24; F = 92.5, MSe = 0.554). Item frequency also had a large effect on discriminability (F = 133, MSe = 0.709), hits (F = 32.1, MSe = 0.488), and false alarms (F = 79.5, MSe = 0.419), but the effect was negative; as frequency declined, hits increased (5.60 < 5.81 < 5.97) and false alarms decreased (3.21 > 2.98 > 2.68).

Other results. The six tasks differed in discriminability, hits, and false alarms [F’s(5,143) > 5.16]; the order was rated imagery, understanding, memorability, and ease of studying, then control and pronunciation. Discrimination was better if the test was first rather than second [F(1,143) = 100, MSe = 6.44]; the first test had higher hits (F = 9.67, MSe = 2.81) and lower false alarms (F = 85.6, MSe = 4.77). The effect of imagery on discrimination was greater on the first test (4.23 > 3.46 > 2.92) than the second [2.68 > 2.24 > 1.77; F(2,286) = 6.00] because false alarms increased more with abstractness on the first test (1.92 < 2.45 < 2.81) than the second (3.18 < 3.55 = 3.61; F = 11.5). The effect of frequency was greater on the first test (2.95 < 3.59 < 4.06) than the second (1.90 < 2.16 < 2.63; F = 6.77) because false alarms declined more with rarer items on the first test (2.76 > 2.41 > 2.02) than the second (3.61 > 3.48 > 3.25; F = 9.60).

TABLE 2
RECOGNITION DISCRIMINABILITY SCORES FOR EXPERIMENT 1

                        Imagery                  Frequency
Rating task       High   Medium   Low      High   Medium   Low       r
Memorability      3.51    2.91    2.09     2.34    2.82    3.36     .11
Studying          3.13    2.64    2.17     2.28    2.63    3.03     .06
Imagery           4.38    3.63    3.04     3.47    3.63    3.95     .16
Understanding     3.60    3.22    2.70     2.62    3.14    3.76    -.03
Pronunciation     2.82    2.17    1.85     1.70    2.30    2.84    -.04
Control           2.84    2.18    1.94     1.85    2.36    2.74

Correlations with Recognition

How accurately did initial ratings predict which items would succeed and fail in recognition? A product-moment correlation was computed for each subject between initial ratings of the 90 items and recognition scores for those items. With r as the dependent measure in an analysis of variance, the only reliable effect was task [F(4,121) = 5.66, MSe = 0.031, d = .05], shown in the right column of Table 2. The highest mean r was between rated imagery and recognition (.16), followed by rated memorability (.11) and ease of studying (.06); these three exceeded zero, but rated understanding (-.03) and pronunciation (-.04) did not. Explicit memory ratings predicted recognition less accurately than did imagery ratings made without explicit reference to memory.
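The per-subject correlational analysis can be illustrated with a short Python sketch: a product-moment r is computed for each subject across items, and the r’s are then averaged over subjects. The data structure, the function names, and the two five-item subjects are assumptions made for illustration; the experiment used 90 items per subject and an analysis of variance on the r’s, which is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_subject_r(subjects):
    """Return the mean of the per-subject r's along with the individual r's."""
    rs = [pearson_r(s["ratings"], s["recognition"]) for s in subjects]
    return sum(rs) / len(rs), rs


# Two hypothetical subjects with five items each (made-up numbers).
subjects = [
    {"ratings": [1, 2, 3, 4, 5], "recognition": [2, 4, 6, 8, 10]},  # ratings track recognition
    {"ratings": [1, 2, 3, 4, 5], "recognition": [5, 4, 3, 2, 1]},   # ratings oppose recognition
]
mean_r, subject_rs = mean_subject_r(subjects)
```

Computing r within each subject, rather than on pooled data, keeps differences between subjects in overall rating level from inflating or masking the item-level relation.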

Did the correlation between ratings and recognition occur because ratings and recognition both vary with item imagery? If so, r should be lower when high, medium, and low imagery items are analyzed separately than when high, medium, and low frequency items are analyzed separately; the mean of the former three r’s was reliably lower than the mean of the latter three [.03 < .11; F(1,121) = 66.2, MSe = 0.013].
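The logic of this comparison can be shown with a toy example in Python. The numbers below are constructed, not taken from the experiment: ratings and recognition scores both rise with imagery level but are unrelated within a level, so the overall r is substantial while the mean within-level r is zero. All names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up items: an imagery level shifts both measures by the same offset,
# while the deviations within a level are constructed to be uncorrelated.
offsets = {"high": 4, "medium": 2, "low": 0}
rating_dev = [-1, 1, -1, 1]
recog_dev = [-1, -1, 1, 1]
items = [
    (level, off + rd, off + cd)
    for level, off in offsets.items()
    for rd, cd in zip(rating_dev, recog_dev)
]

# Correlation over all items, ignoring imagery level.
overall_r = pearson_r([r for _, r, _ in items], [c for _, _, c in items])

# Correlation computed separately within each imagery level, then averaged.
within_rs = []
for level in offsets:
    ratings = [r for lv, r, _ in items if lv == level]
    recog = [c for lv, _, c in items if lv == level]
    within_rs.append(pearson_r(ratings, recog))
mean_within_r = sum(within_rs) / len(within_rs)
```

When a third variable drives both measures, holding it constant removes the correlation; that is the pattern the separate analyses by imagery level were designed to detect.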

Familiarity

We included ratings of familiarity to obtain a measure that covaries with item frequency, but that is not an explicit measure of memory. Table 3 shows that rated familiarity was strongly affected by frequency in each of the six tasks.

The means in Table 3 are averaged over the two orders of testing, which had no reliable effects. They are also averaged over the new and the old items; old items were rated as more familiar than new ones [5.47 > 5.41; F(1,143) = 4.40, MSe = 0.079] whether the test was first (5.48 > 5.42) or second (5.47 > 5.40). The main effect of item frequency was reliable in old and new items [F(2,286) > 520, MSe < 0.53]; in each case frequency interacted with task [F(10,286) > 3.83]. Familiarity was less strongly affected by imagery, although the small difference was reliable in old and new items (F > 25.5, MSe < 0.62). Finally, rated familiarity differed over the six tasks [F(5,143) = 3.01, MSe = 1.707, d = .36]; the order was pronunciation (5.81), imagery (5.70), understanding (5.64), then control (5.33), memorability (5.08), and ease of studying (5.05).

Correlations with Familiarity

Finally, how accurately did initial ratings predict which items would be rated as more and less familiar? The right column of Table 3 shows mean r’s between initial ratings and rated familiarity. These r’s are much higher than the ones reported for recognition. Hence, memory ratings are better implicit predictors of familiarity than explicit predictors of memory. Mean r’s differed across the five tasks [F(4,121) = 3.89, MSe = 0.059, d = .07]; familiarity ratings were predicted best by rated ease of understanding (.67), then ease of studying (.53) and pronouncing (.52), with the lowest r’s for imagery (.45) and memorability (.42).

TABLE 3
FAMILIARITY, AVERAGED OVER ORDER AND OLD-NEW, IN EXPERIMENT 1

                        Imagery                  Frequency
Rating task       High   Medium   Low      High   Medium   Low       r
Memorability      5.13    5.08    5.02     5.85    5.66    4.22     .42
Studying          5.16    5.00    4.99     6.18    5.04    3.92     .53
Imagery           6.05    5.60    5.45     6.37    5.83    4.22     .45
Understanding     5.79    5.55    5.56     6.32    5.74    4.84     .67
Pronunciation     5.95    5.76    5.73     6.53    5.88    5.03     .52
Control           5.42    5.29    5.28     6.14    5.38    4.47

Summary and Conclusions

Memory predictions were affected by the same variables as explicit ratings of ease of processing. People judged that concrete and common words would be more memorable than abstract and rarer words, and other people found it easier to process concrete and common words than abstract and rarer ones. The results were predicted by the hypothesis that memory predictions are based on ease of processing.

Recognition memory favored concrete words over abstract words, but it favored rarer words over common words. Memory judgments predicted the advantage for concrete words, but they did not predict the advantage for rarer words. At the level of individual items, ratings of memorability and ease of studying reliably diagnosed which items would succeed and fail. However, the mean r’s (.11, .06) were less than the .16 between imagery ratings and recognition scores. In other words, the mean r’s reflect imagery; concrete items are easier to process and more memorable than abstract items, ratings of imagery achieve the best prediction and the best memory, and r’s approach zero when imagery is controlled.

Memory judgments predicted rated familiarity more than they did recognition. Rated familiarity reflects frequency of occurrence; memory predictions are correlated with familiarity because common words are easier to process than rarer ones and they are also more familiar than rarer ones.

In summary, the results are consistent with the thesis that memory predictions are implicit ratings of ease of processing. Predictions are affected by the same variables as explicit ratings of ease of processing; predictive accuracy is best if the target task and ease of processing depend on the same variables and worst if they are affected differently by those variables. Memory predictions succeed if a salient variable causes easy processing and good memory; the two are independent effects of that variable, but their magnitudes are correlated.

Will people continue to base their predictions on relative ease of processing if their uncertainty about memory is reduced? We mention briefly a follow-up experiment. As in Experiment 1, memory ratings were higher for concrete than abstract words and for common than rarer words. However, other subjects were told how both variables affect memory; their ratings were higher for concrete words than abstract words, but the ratings were higher for rarer words than common words. Hence, if uncertainty is reduced by the provision of information, subjects can use that information to discount heuristically based differences in ease of processing.

EXPERIMENT 2

Experiment 1 found that memory predictions covary with preexperimental differences in item attributes. Do memory predictions also reflect differences that occur entirely within the experiment? One focus in Experiment 2 was differences in how items are processed. Previous research shows that memory predictions are insensitive to differences in processing. Rabinowitz et al. (1982) found that predicted success was nearly equal in imagery and control conditions, although the groups did differ in recall; several other reports also find no differences in predicted success between better and worse study procedures (Perlmutter, 1978; Shaughnessy, 1981; Zechmeister & Shaughnessy, 1980) or between better and worse test procedures (Hertel, Anooshian, & Ashbrook, 1986).

Students’ yawns in our lectures tell us they know that some ways of studying are better than others. Why do predictions fail to show this common knowledge? Our answer is that predictions are comparative judgments; predictions favor items that are processed most easily, but the same items may be most easily processed in each instructional condition. There is a simple test of the hypothesis. If people study different subsets of items in better and worse ways, the subsets can be contrasted, and more success will be predicted for the better procedure; however, predictions will diagnose the fate of individual items equally well within better and worse procedures. To test these predictions, Experiment 2 contrasted interactive imagery and separate imagery; it is well known that interactively imagined pairs exceed separately imagined pairs in cued recall (Begg, 1973, 1978, 1982; Bower, 1970). We showed half the pairs horizontally, AB; people imagined these pairs by separate imagery. The other half of the pairs were arrayed vertically, with A above B; people imagined these pairs by interactive imagery. Other people used interactive imagery or separate imagery for all pairs. We expect that memory predictions will favor interactive imagery over separate imagery, but only if both procedures are used in the same list.

Experiment 2 had a second focus, directed at memory-based predictions. Previous research has shown that memory predictions become more accurate if people had earlier opportunities to study the items (King, Zechmeister, & Shaughnessy, 1980), or if they had earlier tests (Lovelace, 1984). On an item’s first appearance in an experiment, the factors that make some items easier to process than others are mainly preexperimental in origin, including attributes like imagery and frequency. However, repeated items differ in how easily they retrieve memories of the earlier encodings (Begg & Green, 1988). Thus the ease of processing repeated items reflects retrievability of earlier encodings, which is a good basis for memory predictions. We expect that interactive imagers will predict higher levels of recall than separate imagers if the predictions are made about previously studied items.

We devised a cued-judgment task. After study, people reviewed one member of each pair and predicted whether they would recall its partner on a later test; other people actually attempted recall. The cued-judgment task is very much like actual recall; subjects see one member of each pair, and assess retrievability of its missing partner. Hence the judgments have a task-relevant basis for discriminating among items and for estimating how many will succeed. After people had judged recall or attempted it, they did a final test of A → B recall. The earlier tests presented both A and B cues; judgments about A → B recall require processing that is more similar to the processing required by the final test than judgments about B → A recall do.

Will judged success and failure predict actual success and failure more accurately as the prediction task becomes more similar in requirements to those of the future test? If so, memory predictions made at study will be less accurate than predictions made after study, which should be more accurate if the reviewed cue is A rather than B. Finally, the accuracy of the judgments will be compared with how well actual A → B and B → A recall on the immediate test predict final A → B recall.

Method

Subjects

There were 120 students, as in Experiment 1, who were tested in small groups that were assigned at random to conditions until there were 10 in each of 12 conditions.

Materials

The study list had 100 pairs of nouns (I > 5.99, F > 10; Paivio et al., 1968); 10 were buffers (4 primacy, 6 recency). The list was videotaped from a computer-generated display, with 5 s for each pair and 2.5 s between pairs. Half the pairs were side by side on the screen and half were centered vertically. The list was randomized in 15 blocks of 6 items, one from each combination of spatial arrangement crossed with three testing histories.
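The blocked randomization described above can be sketched in a few lines. This is only an illustration under stated assumptions: the labels for the three testing histories (same cue, different cue, once tested) are borrowed from the row labels of Table 4, and the function name, seed, and tuple layout are hypothetical rather than taken from the original procedure.

```python
import random
from itertools import product

ARRANGEMENTS = ("horizontal", "vertical")
# Assumed labels for the three testing histories (taken from Table 4's rows).
HISTORIES = ("same_cue", "different_cue", "once_tested")

def build_blocked_list(n_blocks=15, seed=0):
    """Return (block, arrangement, history) slots for the 90 non-buffer pairs.

    Each block holds one slot from every arrangement x history cell,
    shuffled independently within the block.
    """
    rng = random.Random(seed)
    cells = list(product(ARRANGEMENTS, HISTORIES))  # 2 x 3 = 6 cells
    study_list = []
    for block in range(n_blocks):
        order = cells[:]
        rng.shuffle(order)  # randomize order within this block only
        study_list.extend((block, arr, hist) for arr, hist in order)
    return study_list
```

Blocking in this way guarantees that every cell of the design is spread evenly across serial positions, so that arrangement and testing history are not confounded with position in the list.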

The response sheet for judgments at study was numbered from 1 to 100, with Y and N beside each numeral. There were two immediate tests; for cued judgments, each cue had Y and N beside it, and for recall, each cue had a blank space; the cues were 15 top items, 15 left items, 15 bottom items, and 15 right items, in 15 blocks of 4. The final test had 90 cues with a blank space by each; the cues were the 45 left members of horizontal pairs and the top members of vertical pairs, in 15 blocks of 6.

Procedure

Subjects were encouraged to do their best to follow all instructions and to try hard on all tests. They were told how the pairs would be arrayed. The design was 3 (Study Instructions) × 2 (Study Ratings) × 2 (Immediate Test) between subjects.

Separate imagery instructions were to form a single image for each word of each pair, and project the image to an imaginary frame to the left, right, top, or bottom of the monitor, with the left items to the left, and so on. Interactive imagery instructions were to imagine the two members of each pair in a meaningful interaction, and project the image to the right of the screen for horizontal pairs, and below it for vertical pairs. Subjects in the mixed condition used separate images for horizontal pairs and interactive images for vertical pairs, projecting them as in the unmixed conditions.

At study, half the subjects circled Y or N for each pair depending on whether it would be remembered or forgotten later. The other subjects circled Y if they were able to study the pair in the manner requested, and N if they could not; we included the compliance ratings so that some subjects who later judged memory did so for the first time in the experiment.

After study, the sheets were collected and replaced by immediate tests. Half the groups were given recall tests and half, judgment tests; judgers circled Y or N depending on whether they thought they could recall the missing partner of each cue, and recallers tried to do so, writing their responses on the sheet. After 6 min, the sheets were collected and replaced by the final test; 10 min were allowed for subjects to write as many partners as they could.

Results and Discussion

We first discuss ratings made at study, immediate judgments and recall, and final recall. After discussing these means, a second section discusses predictive accuracy for individual items; how well do prior measures predict which pairs will succeed or fail later?

Means

Means are proportions, MSe’s are squared proportions, and α was .01. Table 4 has seven sets of means that show the vertical and horizontal pairs for the three differently instructed groups; interactive groups are on the left, separate groups are on the right, and mixed groups are between them, with their means staggered to ease comparisons.

The table has many means, but the patterns are simple. Memory predictions made at study are first. Note two things. First, memory predictions did not differ between groups who used better or worse study procedures; interactive and separate imagers had equal means (.53 vs. .53; the outer columns). (These groups judged horizontal pairs to be more memorable than vertical pairs (.60 > .46); a glance down the left and right columns shows an advantage for the horizontal pairs on every measure.) Second, memory predictions did differ if the better and worse study procedures were used in the same list; subjects in the mixed condition predicted more success for interactively studied pairs than separately studied pairs (even though interactive pairs were vertical and separate pairs were horizontal). Judgments of compliance with instructions had the same pattern as memory predictions except that people were especially able to imagine horizontal pairs separately in the mixed condition (.80). The 12 means in the top section of Table 4 show the three-way interaction of instructions, spatial arrangement, and type of judgment [F(2,108) = 7.16, MSe = 0.006; d = .06].

TABLE 4
MEAN RECALL AND JUDGMENTS FOR EXPERIMENT 2

                                              Mixed condition
                              Interactive   Interactive   Separate   Separate
Initial judgments
  Memorability
    Vertical                      .47           .54           -         .45
    Horizontal                    .59            -           .47        .61
  Compliance
    Vertical                      .57           .65           -         .55
    Horizontal                    .63            -           .80        .67
Immediate test
  Recall
    Vertical                      .42           .36           -         .11
    Horizontal                    .51            -           .16        .16
  Judgments
    Vertical                      .48           .52           -         .37
    Horizontal                    .51            -           .38        .42
Final test
  Twice-tested: Same cue
    Vertical                      .39           .34           -         .13
    Horizontal                    .48            -           .20        .19
  Twice-tested: Different cue
    Vertical                      .46           .43           -         .12
    Horizontal                    .51            -           .21        .18
  Once-tested
    Vertical                      .27           .25           -         .07
    Horizontal                    .33            -           .12        .10

The second section of Table 4 shows the immediate tests (MSe < 0.005). There are three results to note. First, the interactive group exceeded the separate group in recall and in judgments. Second, interactive imagery exceeded separate imagery in the mixed group; recall and judgment for the mixed group were about equal to their unmixed controls. Third, judged recall exceeded actual recall, especially for separately imagined pairs, but the judgments show the same patterns as recall. Analysis revealed effects of recall versus judgment [F(1,108) = 68.9], instructions [F(2,108) = 23.6], and their interaction (F = 7.29); instructions also interacted with array (F = 42.6).

The bottom section of Table 4 shows the means for the final test of recall (MSe < 0.003). The pattern in final recall was the same as in immediate recall. Interactive imagery exceeded separate imagery between groups and within the mixed group, whose means were nearly equal to their unmixed control values; as above, instructions differed [F(2,108) = 45.4] and interacted with array (F = 49.6). (As well, twice-tested items exceeded once-tested items, and pairs tested twice with a different cue exceeded those tested twice with the same cue, but only for interactively imagined pairs; the main effect of history, F(2,216) = 98.3, interacted with instructions, F(4,216) = 7.09. Final recall was about equal after initial judgments of memorability or compliance (.25 vs. .27) and after immediate tests of recall or judgment (.27 vs. .25).)

In summary, memory judgments at study predicted that interactively and separately studied items would differ in recall, but only if both were done in the same list; predicted success did not differ between pure interactive and separate groups. In contrast, judgments made after study accurately predicted the pattern of results in recall. These judgments overestimated recall of separately imagined items, and, therefore, underestimated the effect of instructions. However, making the judgments like recall by requiring retrieval of prior encodings allowed the judgments to discriminate between differently instructed groups. Thus memory predictions can discriminate better from worse ways of studying, but predictions at study do so only if the different ways are both used in the same list. Predictions made after study discriminate ways of studying that differ in the likelihood that remembered information will become available.

Predictive Accuracy

The present analyses (α = .05) ask how accurately prior measures predict which items will succeed and fail on the final test. Each task classes items as successes (S) or failures (F); any two tasks class items as SS, FF, FS, and SF. Analyses of contingencies compare the correct cells (SS, FF) to the incorrect cells (FS, SF). We used two statistics, φ, based on the difference between correct and incorrect, and gamma (G), based on their ratio (Bishop, Fienberg, & Holland, 1975, Chap. 11, discuss the need for both measures). Each measure was calculated for each subject, and used as a dependent measure in analyses of variance (Nelson, 1984). We present only the results for φ; the results were the same for G, but with some ceiling effects. It was not possible to analyze all within-subjects factors simultaneously, because some individual cells had undefined correlations; for example, after separate imagery, many cells were 0. Accordingly, data were retotalled for each analysis.
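As an illustration of the two statistics, the sketch below computes φ and G for a single subject's 2 × 2 contingency table using the standard textbook formulas. It is not the authors' analysis code; the function name and the argument order (first letter = earlier task, second letter = final test) are assumptions made for the example. Returning None when a denominator is zero mirrors the undefined correlations mentioned above.

```python
import math

def phi_and_gamma(ss, ff, fs, sf):
    """Return (phi, gamma) for one 2x2 table of item counts.

    ss, ff: items classed the same way by both tasks (correct cells).
    fs, sf: items classed differently (incorrect cells); the first letter
            is assumed to be the earlier task, the second the final test.
    A statistic is None when its denominator is zero (undefined).
    """
    cross = ss * ff - fs * sf  # difference of cross-products

    # Marginal totals for the earlier task and for the final test.
    early_s, early_f = ss + sf, fs + ff
    final_s, final_f = ss + fs, sf + ff
    phi_denom = math.sqrt(early_s * early_f * final_s * final_f)
    phi = cross / phi_denom if phi_denom > 0 else None

    # Gamma normalizes by the sum of the cross-products instead.
    gamma_denom = ss * ff + fs * sf
    gamma = cross / gamma_denom if gamma_denom > 0 else None
    return phi, gamma
```

For example, a subject with no successes on either task (all items in FF) yields undefined values for both statistics, which is why sparse cells force the data to be pooled before analysis. Gamma also reaches 1.0 whenever either incorrect cell is empty, which is one way ceiling effects can arise.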

First, did predictive accuracy differ for interactively and separately imagined pairs? Although memory judgments at study predicted more success for interactively studied pairs than for separately studied pairs from the same list, those judgments were equally accurate predictors of which pairs would succeed and fail (φ = .27, .27, MSe = 0.011).

Memory judgments made at study predicted which items would succeed and fail on the final test more accurately than did judgments of compliance with instructions [.25 > .17; F(1,108) = 13.2, MSe = 0.016]. Predictive accuracy improved if judgments were made after study. Cued judgments about A and B predicted final recall with correlations of .52 and .41; actual success in immediate recall cued by A and B predicted final recall with correlations of .86 and .60. Final recall was predicted more accurately by actual recall than by judged recall [F(1,107) = 57.7, MSe = 0.032]; both measures predicted final recall more accurately if the immediate test had A rather than B as the cue [F(1,100) = 34.7, MSe = 0.0066].

To summarize, φ increases as the predictor task becomes more similar to the final test. Judgments of compliance are lower (.17) than memory predictions made at study (.25). Requiring retrieval to make the judgments improves accuracy, even if the judged cue is not the one that will ultimately be tested (.41); having the same cue increases accuracy (.52). Actual recall is still better, even with different cues on the two tests (.60), but prediction is best for two tests of the same sort (.86).

Summary and Conclusions

The results from Experiment 2 are clear. At study, people who used interactive imagery for all pairs expected to remember the same number as people who used separate imagery for all pairs. However, people who used both procedures expected more success for interactively than separately imagined pairs. Why? People know that interactive study is better than separate study, but comparative knowledge is useful only if both procedures are available for contrast. Ease of processing does not discriminate the procedures; items that are easiest to process one way are also easiest to process the other way. The predictions were equally able to anticipate which items would succeed with either procedure.

If memory predictions were made after study, interactive imagers expected more success than separate imagers. In this judgment task, people reviewed words and predicted recall of the missing partners of the words; like recall, the task entails processing the cue and assessing the retrievability of the response. These judgments favored interactive imagery over separate imagery and were reasonably good predictors of which items will succeed and fail, especially if the reviewed item was the future cue. Why? The processing required to predict recall of an absent B cued by a present A is similar to the processing required to recall B with A as the cue. The assessed retrievability appears to be lenient, because subjects expect to succeed more often than they actually do when tested.

We conclude that memory predictions are relative judgments of how easily items are processed. At study, predictions reflect preexperimental differences among items, but are insensitive to factors that apply to all the items, even if people have knowledge about those factors. If predictions are made after study, the reviewed items that are most easily processed are the ones that readily retrieve their prior encodings. This memorial basis reduces uncertainty by providing task-relevant information about how many items will succeed and which ones will succeed, and predictions become more accurate as the predictive task becomes more similar to the test. Keep in mind that even actual recall is imperfectly predictive of continued success or failure (cf. Tulving, 1964).

EXPERIMENT 3

Experiment 3 used cued judgments. We argued that these judgments reflect ease of processing the cue and retrieving its partner. Experiment 3 used pairs whose cues and partners varied in memorability; if the characteristics of the missing partners affect the judgments, then those items must be partially cognitively available when their success or failure is predicted. Concreteness is a good predictor of memory (Paivio, 1971; Paivio & Begg, 1981); the usual order in cued recall is concrete-concrete, concrete-abstract, abstract-concrete, and abstract-abstract pairs. After study, one group recalled concrete or abstract partners of concrete or abstract cues, and another group made cued judgments; both had a final recall test. Will cued judgments vary as a function of response concreteness?

Method

Subjects

The subjects were 42 students as in Experiment 1, with 21 in each of two conditions.

Materials and Procedure

Forty-eight concrete (I > 6.49) and 48 abstract (I < 3.85) nouns (Paivio et al., 1968), all with F > 29, were each sorted into four sets of 12 words in which mean I and F were nearly equal. The sets were used to make 12 concrete-concrete pairs, 12 concrete-abstract pairs, 12 abstract-concrete pairs, and 12 abstract-abstract pairs. The pairs were randomly ordered and recorded on videotape, with the items side by side for 5 s. There were also 4 primacy and 4 recency pairs that were untested.

The 48 left-hand items were the cues on each test. For cued judgments each cue had YES and NO typed beside it; other subjects received the same cues, but with a blank for recalling the missing partners. After these immediate tests, all subjects received the final test; the cues were rerandomized. All subjects were told their memory would be tested, all saw the study list, and all received the final test. Between study and the final test, half did the cued-judgment test and half did written cued recall, with 5 min for each; immediate tests were collected before final tests were distributed.

Results and Discussion

Means (MSe < 0.006) are in Table 5. The top two rows show immediate and final recall; recall favored concrete cues over abstract cues, and concrete responses over abstract responses. The next two rows show immediate judgments and final recall. The judgments were too high, but they, like recall, were sensitive to cue concreteness and response concreteness. As in Experiment 2, cued judgments have the same pattern as recall. The results go beyond those of Experiment 2 in showing that attributes of cues and attributes of missing responses both influence predictions of success.

Concrete cues exceeded abstract cues [F(1,40) = 206] and concrete responses exceeded abstract responses (F = 28.8). These two variables interacted (F = 22.3) and entered a three-way interaction with the time of test (F = 9.05); time of test also interacted with test sequence (F = 29.0). By post hoc t tests, the pattern in each row was CC > CA > AC = AA; between rows, immediate judgments exceeded everything else.

TABLE 5
RECALL PROPORTIONS FOR DIFFERENT TYPES OF PAIRS TESTED IN DIFFERENT WAYS IN EXPERIMENT 3

                                    Pair type
Test sequence                AA      AC      CA      CC
Immediate cued recall       .07     .10     .22     .44
Final cued recall           .06     .10     .23     .44

Immediate judgment          .27     .23     .46     .61
Final cued recall           .11     .17     .31     .46

Note. A and C refer to abstract and concrete nouns; thus, with AC, people were tested with an abstract cue for a concrete response, and so on.

Conditionalized measures are presented briefly because recall was very poor in some conditions. Of items that succeeded on the final test, 90% had succeeded earlier, and 92% had been predicted to succeed; items predicted to fail rarely succeeded. Items that succeeded on the first recall test were likely to succeed again on the second test (90%), but only 63% of items predicted to succeed actually did. Thus φ was higher between two recall tests than between judgment and recall [.88 > .66; χ²(3) = 130]; partitioning χ² reveals that 95% of its value is from unwarranted optimism; actual failures were too often predicted to succeed (.14 > .02, 77% of χ²), and too rarely predicted to fail (.60 < .77, 18% of χ²).

Summary and Conclusions

Cued judgments were very similar to actual recall, but the judgments were too optimistic. Judgments are based on retrieval of task-relevant memories, but the evaluation of their adequacy to support recall is too lenient. The judgments accurately predicted which pairs would succeed and fail, differing from recall only in the unwarranted confidence placed on pairs that actually failed later on. Predictions are good because the ease of processing the item in the manner requested by the prediction task is influenced by the same variables that cause recall to succeed or fail.

EXPERIMENT 4

Our plan in Experiment 4 is to contrast the nominal focus of memory predictions with the processing required by the predictive task. People studied a list of AB pairs and were tested for recognition of A and A → B recall. Between study and test, people predicted whether they would succeed or fail in recognition of A, recognition of B, A → B recall, or B → A recall. Will these different judgments be differentially accurate predictors of memory performance? If the nominal focus of the predictive task matters, then judgments about recognition of A should be the most accurate predictors of recognition of A, and judgments about A → B recall should be the most accurate predictors of A → B recall. We have proposed that memory predictions reflect the relative ease of processing items in the way the task requires. Thus we are led to ask how similar each predictive task is to the demands of the final test.

First consider A → B recall; A is the cue and the subject must supply B from memory. Experiment 2 found most accurate prediction of A → B recall if people reviewed A and judged whether they would recall B with A as the cue. We proposed that both tasks require identification of A as an old item and use of the relation between A and B to retrieve B. If our explanation is correct, recall should also be predicted accurately if people are given A and judge whether they will recognize B if B appears alone on a test; the question is wrongly focused, but it requires the same processes to answer the predictive question as it will to do the test. In contrast, suppose people are given the complete pair, AB, and asked to predict A → B recall. The question is correctly focused, but the presence of both items reduces the need to assess retrievability of B. We expect that A → B recall will be predicted more accurately by judgments about recognition of the absent B partner of A than by judgments about A → B recall made for AB pairs.

Now consider recognition of A; the test requires identification of A as an old item, with no accompanying context. By our account, any predictive question asked about A in the absence of accompanying context entails identification of A as an old item. Hence, if the reviewed stimulus is A, it should not matter whether people judge recognition of A, recognition of the absent B partner, or recall of the absent B with A as the cue. These judgments should predict recognition of A more accurately than judgments specifically focused at recognition of A, but reviewed in different ways; these include judgments about recognition of A when the reviewed stimulus is AB, BA, or B.

In overview, people first studied AB pairs, then made memory predictions about reviewed items, and then were tested for recognition of A and A → B recall. Reviewed items were single A’s or B’s or they were complete pairs in the studied order, AB, or reversed order, BA. There were three judgment tasks. Subjects who reviewed single words judged recall with the reviewed stimulus as the cue, they judged recognition of the reviewed stimulus, or they judged recognition of its absent partner. Subjects who reviewed pairs judged recall of those pairs with the left item as the cue, recognition of the left item, or recognition of the right item. To repeat, we expect that A → B recall will be predicted best by judgments directed at the absent B partners of reviewed A’s, regardless of whether the question asks about recall of B or recognition of B; we expect that recognition of A will be predicted best by any judgments when A is reviewed, regardless of whether the judgment concerns recognition of A, recognition of the absent B, or recall of B.

We chose the particular prediction tasks because they make contact with existing literature. There has been a great deal of research into “feeling-of-knowing.” This research was initiated by Hart (1965, 1967), who asked people questions like “What is the capital of Australia?” If subjects failed to recall the correct answer, Hart asked them to judge whether they would be able to recognize that answer on the later test. Feeling-of-knowing judgments are only modestly predictive of recognition (Blake, 1973), perceptual identification and relearning (Nelson, Gerler, & Narens, 1984). Indeed, the judgments are less accurate than actuarial predictions based on the proportion of people who know the answer (Nelson, Leonesio, Landwehr, & Narens, 1986). Thus uniquely personal differences across items are less predictive of memory than are culturally shared attributes of the items.

Why are feeling-of-knowing judgments only modest predictors of recognition? One reason is that the task has people predict recognition only after recall has failed, rather than for all items. One of our tasks asks a “feeling-of-knowing” question of all items; subjects will predict recognition of the missing A partners of reviewed B items, then be tested for recognition of those A items. We will compare the accuracy of these predictions to the accuracy from other conditions. We expect that their accuracy will be worse than when the reviewed stimulus is A alone, but not than when predictions are made about the A members of reviewed pairs.

Another reason that feeling-of-knowing judgments are only modestly accurate is that recall and recognition are imperfectly correlated. Feeling-of-knowing judgments concern recognition of unrecalled items; another large body of research concerns recognition failure of recalled items. The procedure in these experiments is to test recognition of B then A → B recall. The usual finding is that recalled B’s are only slightly better recognized than the average of all B’s; if average recognition is .75, recognition of recalled B’s is about .84 (Flexser & Tulving, 1978). Even that modest predictive relation is reduced if people study the pairs in meaningful ways; Begg (1979, Experiment 1) found average recognition at .76, with recognition of recalled B’s at .79. The present condition in which people judge recognition of reviewed B then attempt A → B recall is formally equivalent to the experiments on recognition failure. We will compare the predictive accuracy of these judgments to the accuracy from other conditions. In particular, we expect that A → B recall will be predicted more accurately if people predict recognition of the absent B partner of a reviewed A than if they make explicit attempts to predict recognition of B items that are physically presented for review.

In summary, subjects studied AB pairs and were tested for recognition of A and A → B recall. Between these events, they reviewed words or pairs and judged whether each reviewed stimulus would succeed or fail on a memory test of a particular type. Our interest is with how accurately judged recall predicts actual recall, how accurately judged recognition predicts actual recognition, and how accurately judgments about one measure predict actual success on the other. The thesis is that people predict success if the item is easy to process in the way the task requires but failure if the processing is more difficult. Such predictions will be accurate if the factors that make the process easy or difficult are the same as the factors that make memory succeed or fail, and will be increasingly less accurate as the two sets of factors become increasingly different.

Method

Design Overview

All subjects studied 90 AB pairs, and did a final test of recognition of A and A → B recall. Between study and test, subjects reviewed one of two sets of stimuli. The single word stimuli were the A members of 30 pairs and the B members of 30 pairs. The pair stimuli were the same except that the other member of the pair was included; 30 pairs were reviewed as studied, AB, and 30 were reversed, BA. Each set of stimuli was given to four groups; one group was a control group that reviewed the items but did not rate them. The other three groups predicted memory. One group predicted recall; subjects predicted A → B recall if the stimulus was A or AB, and predicted B → A recall for B or BA. The other two groups predicted recognition. In the present/left condition, subjects predicted recognition of A if the stimulus was A or AB, and recognition of B if the stimulus was B or BA. In the absent/right condition, subjects predicted recognition of B if the stimulus was A or AB, and recognition of A if the stimulus was B or BA.

Our purpose is to determine whether predictive accuracy is influenced more by similarity in processing between judgments and the test than by the explicit focus of the judgments. The design includes two review lists (word vs. pair), two subsets of items (A/AB vs. B/BA), and three explicit judgments (recall vs. recognize present/left vs. recognize absent/right). Hence, there are 12 predictive judgments to evaluate against actual recognition of A and actual A → B recall. Consider recognition of A. Four judgments explicitly ask about recognition of A, with the review stimulus being A, AB, B, or BA; will these four achieve better prediction than the other eight, of which four ask about recognition of B and four ask about recall? We expect not. Instead, we expect best prediction if the reviewed stimulus, like the test, is A alone, whether the judgments explicitly concern recognition of A, recognition of B, or recall of B with A as the cue. Now consider A → B recall. Two tasks explicitly asked about A → B recall, with the stimulus being A or AB. Of these two, reviewing A alone is more like the test than reviewing AB; furthermore, reviewing A alone and predicting recognition of B is more like the test than is reviewing AB and explicitly predicting A → B recall. Our purpose is simple, but it requires a complex design to provide a fair comparison between the alternatives.
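Because the mapping from review stimulus and judgment type to the nominal target of each prediction is easy to lose track of, the following sketch enumerates the 12 predictive judgments as described in the Design Overview. The identifiers and string labels are hypothetical conveniences for this illustration; only the mapping itself is taken from the text.

```python
from itertools import product

REVIEW_LISTS = ("word", "pair")
SUBSETS = ("A", "B")  # A/AB vs. B/BA
JUDGMENTS = ("recall", "recognize_present_left", "recognize_absent_right")

def stimulus(review_list, subset):
    """Reviewed stimulus: a single word, or the pair with that word on the left."""
    if review_list == "word":
        return subset
    return "AB" if subset == "A" else "BA"

def nominal_target(subset, judgment):
    """What the predictive question explicitly asks about."""
    other = "B" if subset == "A" else "A"
    if judgment == "recall":
        return f"recall {subset}->{other}"  # present/left item is the cue
    if judgment == "recognize_present_left":
        return f"recognize {subset}"
    return f"recognize {other}"             # absent/right item

def enumerate_judgments():
    """List (stimulus, judgment, nominal target) for all 12 conditions."""
    return [
        (stimulus(r, s), j, nominal_target(s, j))
        for r, s, j in product(REVIEW_LISTS, SUBSETS, JUDGMENTS)
    ]
```

Filtering the result for the target "recognize A" returns four entries whose stimuli are A, AB, B, and BA, and filtering for "recall A->B" returns the two entries with stimuli A and AB, matching the counts given in the paragraph above.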

Subjects

The subjects were 187 students, tested in 16 groups of 10 to 14; two groups were assigned at random to each of eight conditions, with from 22 to 25 in each.

Materials

The study list had 90 pairs of words. These and 90 new items for the test were nouns with I between 4 and 6, and F of 10 or higher; 232 were from Paivio et al. (1968), and 38 were from an enlarged set of 2448 words from Paivio’s lab. The nouns were sorted into 9 sets that were assigned at random to conditions; each set had 6 AA words, 6 or 7 A words, and 18 or 17 words with 9 < F < 50, and mean I ranged from 5.47 to 5.50. These sets provided 90 new items for the test and 90 AB pairs, in 3 subsets of 30 that would have different reviewing histories. The pairs were recorded on videocassette at the rate of 5 s for each pair; each block of 6 items had 2 pairs from each subset.

One review test was a word test; 30 pairs were not reviewed, 30 had A as the reviewed stimulus, and 30 had B. Each block of 6 words had 3 A’s and 3 B’s. The other test was the same, except that the review stimuli were 60 pairs; the same 30 pairs were not reviewed, the 30 A’s from above were reviewed as AB, and the 30 B’s were reviewed as BA. Each stimulus appeared beside a 7-point scale; at the top of each page was the key (1 = certainly no, 2 = probably no, 3 = possibly no, 4 = don’t know, 5 = possibly yes, 6 = probably yes, and 7 = certainly yes). Control groups received these sheets without the scales. The final test had 180 words, each with Y and N to its left and a blank space to its right.

Procedure

There were 8 between-subjects conditions in a 2 (Review Words vs. Pairs) × 4 (Task) factorial design. In the control task, subjects received a review booklet and were told to review the items for a memory test. The other tasks were prediction conditions; subjects were asked questions that they answered on 7-point scales. For the recall question, subjects given single words predicted recall of the missing partner with the reviewed word as a cue, and subjects given pairs predicted recall of the right member with the left one as a cue. The other two conditions were asked recognition questions. With A and B words as stimuli, one group predicted recognition of the present stimulus (A or B), and one predicted recognition of the absent partner (B or A); with AB and BA pairs as stimuli, one group predicted recognition of left members (A or B) and one predicted recognition of right members (B or A).

Subjects were encouraged to try hard and do their best. All subjects were instructed to use interactive imagery to study for a test. After study, booklets were distributed with corresponding instructions. The reviews were self-paced, except in the control conditions, in which subjects received the average time needed for the other tasks. When the booklets were completed, they were collected and replaced by the test. Subjects circled N for new words and Y for old ones and recalled as many partners of the “old” words as they could. The test was self-paced.

Results and Discussion

How accurately did judgments made during review predict success or failure on the final memory test? We begin with recognition, then discuss recall. In each case, we first discuss predictive accuracy, then summarize actual memory, which might be of interest to some readers.

Predictions of Recognition

Predictive accuracy was assessed by gamma (G) coefficients. Each subject rated two subsets of 30 items; two G’s were computed for each subject between the ratings and whether the A items were recognized or missed. Mean G’s are in Table 6; the table also has all the data for actual recognition, but we will consider only the columns headed G for the moment. Inspection of these 12 means shows that the highest G’s are the three in the upper left, for which the reviewed stimulus was A. The only reliable effect in analysis of variance was the interaction that supports the inspection [F(1,127) = 7.38, MSe = 0.088]; recognition of A was more accurately predicted if the reviewed stimulus was A (.55) than if it was B (.35), AB (.38), or BA (.36). Predictive accuracy did not vary reliably over the different questions that were asked at review. If A was reviewed alone, prediction of A’s fate in recognition was accurate if subjects explicitly predicted recognition of A (.55), recognition of the absent B (.51), or A → B recall (.60). Predictions were less accurate if the review stimulus was B, AB, or BA, regardless of the explicit question; mean accuracy was .39 for recall questions, .38 for explicit questions about recognition of A, and .32 for questions about recognition of B. The feeling-of-knowing prediction about whether the missing A partner of a reviewed B would be recognized, at .37, was no better or worse than the others.
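For readers who wish to see how a single subject's G might be computed, here is a minimal sketch. It assumes that G is the Goodman-Kruskal gamma, the measure commonly used in this literature (cf. Nelson, 1984); the text itself says only "gamma (G) coefficients". The function name and the example ratings are hypothetical and are not data from the experiment.

```python
def goodman_kruskal_gamma(ratings, outcomes):
    """Gamma between ordinal ratings and binary outcomes (1 = hit, 0 = miss).

    Every pair of items is compared. A pair is concordant if the item with
    the higher rating also has the better outcome, discordant if it has the
    worse outcome, and ignored if it is tied on either variable.
    Returns None when no pair is untied, because gamma is then undefined.
    """
    if len(ratings) != len(outcomes):
        raise ValueError("ratings and outcomes must have the same length")

    concordant = 0
    discordant = 0
    n = len(ratings)
    for i in range(n):
        for j in range(i + 1, n):
            sign = (ratings[i] - ratings[j]) * (outcomes[i] - outcomes[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1

    untied = concordant + discordant
    if untied == 0:
        return None
    return (concordant - discordant) / untied


# Hypothetical subject: six items rated on the 7-point scale, then tested.
example_ratings = [7, 6, 5, 3, 2, 4]
example_outcomes = [1, 1, 0, 0, 0, 1]
# Here 8 pairs are concordant and 1 is discordant, so G = 7/9.
print(goodman_kruskal_gamma(example_ratings, example_outcomes))
```

Under this reading, each subject contributes two such values, one per subset of 30 rated items, and the tabled means average those values over subjects.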

Thus, the only factor that influenced the accuracy with which subjects predicted which A’s would be recognized or missed was the stimulus for review. Recognition of A was predicted most accurately if the stimulus was A, regardless of what question was asked. Recognition was predicted less accurately if the stimulus was B, AB, or BA, again regardless of what question was asked. When A is reviewed alone, an important part of the process required to answer any question is the identification of A as an old item by retrieving its prior encoding in the absence of any context; items that are easily processed in this fashion are ones that are most likely to be recognized later in the absence of context. If both items are present for review, the factors that influence ease of processing reflect both items, so that predictions based on ease of processing are less specific to A, and are therefore less accurate.

TABLE 6
RECOGNITION OF THE A [Rn(A)] MEMBERS OF PAIRS IN EXPERIMENT 4

Word review tasks

        A reviewed                      B reviewed               Unreviewed
Question          Rn(A)    G      Question          Rn(A)    G      Rn(A)
Control            .87            Control            .73             .70
Recall A → B?      .80    .60     Recall B → A?      .65    .39      .63
Recognize A?       .80    .55     Recognize B?       .70    .28      .67
Recognize B?       .82    .51     Recognize A?       .68    .37      .66

Pair review tasks

        AB reviewed                     BA reviewed              Unreviewed
Question          Rn(A)    G      Question          Rn(A)    G      Rn(A)
Control            .79            Control            .82             .55
Recall A → B?      .78    .43     Recall B → A?      .83    .35      .58
Recognize A?       .71    .33     Recognize B?       .69    .29      .55
Recognize B?       .69    .39     Recognize A?       .77    .43      .53

Cue recognition. We now present, for completeness, the analysis of recognition; mean hits for the A items are in Table 6, in columns headed Rn(A). Subjects who reviewed single words will be discussed separately from those who reviewed pairs; more new items were called old after single words rather than pairs were reviewed [.23 > .15; F(1,179) = 18.5, MSe = 0.014].

Subjects in word review conditions saw A items from 30 pairs, B items from 30 pairs, and no items from the other 30; respective recognition of the A items from these sets was .82, .69, and .66 [F(2,186) = 120, MSe = 0.0059]. No other effects were reliable. Reviewed A items became more memorable, but the A partners of reviewed B items did not, and there was no reliable effect of what question was asked during review.

Subjects in pair review conditions saw 30 AB pairs and 30 BA pairs, and did not review 30 others; recognition of the A items from these sets (MSe < 0.009) was .74, .78, and .55 [F(2,172) = 163]. Recognition after pair review varied with the review questions; the 4 review conditions interacted with the 3 sets of items [F(6,172) = 2.96, d = .05; recognition of A members of unreviewed pairs was about equal in the four conditions, but for the other pairs, recognition of A items was better for the controls and the subjects who predicted recall than for the subjects who predicted recognition, who showed better recognition of A if they had been asked about A rather than B].

Predictions of Recall

The accuracy of predictions of which pairs would succeed or fail in A → B recall was again assessed by G’s. The 12 G’s are in Table 7, which also includes all the rest of the recall data. One point to note is that predictions more accurately anticipated success and failure in recall than they did in recognition [.67 > .41; F(1,118) = 98.1, MSe = 0.082].

TABLE 7
RECALL OF THE B [Rc(B)] MEMBERS OF PAIRS IN EXPERIMENT 4

Word review tasks

        A reviewed                      B reviewed               Unreviewed
Question          Rc(B)    G      Question          Rc(B)    G      Rc(B)
Control            .16            Control            .30             .16
Recall A → B?      .15    .84     Recall B → A?      .24    .63      .14
Recognize A?       .14    .64     Recognize B?       .25    .65      .16
Recognize B?       .13    .89     Recognize A?       .22    .59      .11

Pair review tasks

        AB reviewed                     BA reviewed              Unreviewed
Question          Rc(B)    G      Question          Rc(B)    G      Rc(B)
Control            .44            Control            .50             .16
Recall A → B?      .39    .63     Recall B → A?      .50    .52      .17
Recognize A?       .20    .59     Recognize B?       .23    .65      .01
Recognize B?       .25    .68     Recognize A?       .27    .67      .10

We first consider the accuracy of predictions that were explicitly aimed at recall. These predictions were more accurate if the review question asked about A → B rather than B → A recall [F(1,45) = 6.84, MSe = 0.098], and if the reviewed stimulus was a word rather than a pair [F(1,42) = 7.65, MSe = 0.102]. Explicit predictions of recall were most accurate if subjects saw A and predicted A → B recall (.84); predictions were less accurate if subjects saw B and predicted B → A recall (.63) or saw AB and predicted A → B recall (.63); predictions were least accurate if subjects saw BA and predicted B → A recall (.52). Explicit predictions of recall are most accurate if the predictive question is the same as the test, and if the subject must identify the cue on its own and retrieve the target to answer the question.

How accurately did judgments about recognition predict which pairs would succeed and fail in recall? The subjects who predicted recognition of the single words presented for review or the left-hand members of reviewed pairs revealed no reliable differences (MSe = 0.085); predictive accuracy was about equal for predictions about recognition of A (.64) and B (.65) and the left member of AB (.59) and BA (.65). However, subjects who predicted recognition of the absent partners of single words or the right-hand members of pairs revealed an interaction [F(1,39) = 6.14, MSe = 0.091]; predictions about recognition of an absent B partner were very accurate in predicting recall (.89), but predictions about an absent A (.59), the right member of AB (.68) or BA (.67) were about equal to the other recognition predictions. Predictions about recognition thus predict recall best if the question asks about recognition of the missing target with the future cue as the stimulus.

In summary, recall is predicted most accurately if A is reviewed alone and the question concerns B; the two cases of this type asked whether the absent B would be recalled or recognized, and both gave excellent prediction of recall (.84, .89). Predictions are worst if the complete pair is reviewed in the wrong order, because both items are available and the focus of the question is wrong; the one case of this type gave the worst prediction of recall (.52). The other nine ranged from .59 to .68; predictions of A → B recall are moderately accurate if the question concerns B → A recall, if the question concerns recognition, or if the contribution of retrievability of the target is reduced by presenting it explicitly. Within these conditions, the recognition failure condition in which people predicted recognition of reviewed B, at .65, was not more or less accurate in predicting recall of B than were the others.

Cued recall. Mean A → B recall is in Table 7 in columns headed Rc(B); in an overall analysis (MSe = 0.0065), all main effects and interactions were reliable, and any two means differing by more than .045 are reliably different from each other. We will discuss the word review tasks, then the pair tasks, then differences between the tasks.

In a separate analysis of the word review tasks, the only reliable effect was that A → B recall was better if the reviewed item was B (.25), rather than A (.14), or neither (.14) [F(2,186) = 83.5]. Thus recall benefits from solitary review of the targets. In the pair review tasks, recall was better for pairs reviewed as BA (.38) or AB (.32) than for unreviewed pairs (.13) [F(2,186) = 186]. Recall favored the controls (.37) and the recall question (.35) over the recognition question about right (.21) and left (.17) items [F(3,86) = 7.46].

Comparing the word and pair review tasks reveals that recall was lower if words were reviewed than if pairs were reviewed [.18 < .28; F(1,179) = 20.4]. This effect interacted with the four tasks [F(3,179) = 4.69] and the three sets of items [F(2,358) = 70.9], and the three variables interacted [F(6,258) = 3.86; the advantage for pair review over word review was especially large for the controls (.47 > .23) and the subjects asked about recall (.40 > .20); for subjects asked about recognition, review of AB exceeded review of the cue, A (.23 > .14), but review of BA did not exceed review of the target, B (.25, .24)].

Summary and Conclusions

Success or failure in recognition is predicted best if the stimulus for review is the item that will be tested in recognition, but it matters little what people are specifically asked to predict about it. Success or failure in recall is predicted best if the stimulus for review is the cue and the question concerns the absent response, but it matters little what is asked about the absent response. Thus predictive accuracy depends on whether the predictive task requires the same processes as the test, not on the nominal question; predictions reflect how easily the predictive task is done, and their accuracy reflects the extent to which the factors that made the predictive task easier for some items than others also made memorial success more likely for those items than the others.

GENERAL DISCUSSION

We begin with a summary. Memory predictions at study are sensitive to preexperimental differences among items; more success is predicted for concrete than abstract words and for common than rarer words. However, ratings of ease of processing that make no mention of memory are equally sensitive to the same attributes. Predictions made while studying modestly anticipate which items will succeed or fail in recognition, but only because concrete items are easy to process and are recognized well. Predictions are inversely related to recognition for item frequency, because common items are easy to process but are recognized poorly. The predictions anticipate which items will later seem more and less familiar more accurately than which items will succeed and fail in recognition, because familiarity is an attribute based on frequency of occurrence. Memory predictions at study do not discriminate better from worse ways of studying unless both are done in the same list; within ways of studying, predictions of which items will succeed and fail are equally accurate.

Items that are reviewed after study differ from each other in how readily they retrieve earlier encodings. Predictions after study discriminate better from worse study procedures, better from worse cues, and better from worse targets. These predictions accurately anticipate which items will succeed and fail, although they are too optimistic about actual performance in poorer conditions. Their predictive accuracy increases as the predictive question is more similar to the test. They predict A → B recall most accurately if A is present for review and a question is asked about the absent B, regardless of what the actual question is; they predict A recognition most accurately if A is the only item reviewed, regardless of what the question is. Predictions decline in accuracy if the reviewed stimulus differs from the cue on the test, if factors irrelevant for the test influence doing the predictive task, and if the question requires processing that is irrelevant for the test.

Our explanation of the results is that predictions reflect how easily the predictive task was done; this depends on preexperimental differences between items and task-specific differences caused by earlier events in the experiment. At study, preexperimental differences dominate; predictions will be accurate only if the attributes that make items easy to process also make them memorable. At review, retrievability of prior encodings dominates; predictions will be accurate only if the conditions of retrieval entailed by the item-and-query are similar to the conditions on the test.

On the basis of the results, we will be skeptical of students’ claims that they felt they had studied enough for our exams, because their claims may merely reflect how easily they understood the concepts. Even if they review previous exams, they will be too optimistic about their ability to answer the questions unless they actually attempt those answers; last year’s exam always looks easier than this year’s exam. We also wonder whether we do students a favor by making materials too easy to understand in lectures; our lucidity, not how well the concepts are learned, causes their ready understanding.

We sympathize with students. Predicting future outcomes is difficult even for explicit theories of memory. Because only some of the information that is available at study will be available or useful later, predictors need a basis for identifying the “right stuff.” Furthermore, the exact nature of that stuff varies from test to test. Many memory theories are like naive heuristics, in that they nominate a particular aspect of encoding as the cause of later success, then founder when the nominated factor is discovered to be irrelevant or even harmful for some types of tests. Heuristics and theories based on current encoding are sensible, but too simple. They fail to consider what the test actually needs for success. For example, if the test requires producing an item, predictions should be based on production rather than comprehension; expressive meaning is independent of evocative meaning (Begg, 1976). Conversely, effective communication is based on predicting the receiver’s understanding of what is said, not the speaker’s ease of saying it (Begg & White, 1985; Harris, Begg, & Upfold, 1980).

Through reflection, people gain implicit theories of memory. The theories are based on personal experience, but experience is rarely analytic enough to disentangle causes from correlations. The real world is replete with examples in which important decisions are based on correlational heuristics. For example, young males are associated with a proportionately higher accident rate than are other classes of drivers, and insurance companies are prone to charge them more for coverage. However, maleness is not a cause of accidents; charging an individual male more money assumes guilt by association, rather than guilt by one’s own misdeeds. Similarly, classes of people who differ in gender, race, age, and so on, also differ in other ways; however, using correlational differences between classes to decide an individual’s promotion, punishment for a crime, and so on, is bad science.

There is a practical side to our ideas. If memory predictions are based on differences in the ease of processing items, we can find ways to make the experience of processing resemble the experience of doing a test. Our cued-judgment task is a good predictor of recall because doing that task entails processing the cues and deciding whether they provide access to task-relevant memories that are adequate to support a response. Memory works forward on these tasks because hindsight is the best basis for foresight; the items that will succeed stand out because they already have succeeded. Although little research has examined the accuracy of hindsight (cf. Gardiner & Klee, 1976; Robinson & Kulp, 1970), we will encourage our students to abandon foresight heuristics like “understanding is remembering” and base their study on hindsight heuristics like “retrieving is remembering,” or even “remembering is remembering.”

REFERENCES

BEGG, I. (1973). Imagery and integration in the recall of words. Canadian Journal of Psychology, 27, 159-167.

BEGG, I. (1976). Acquisition and transfer of meaningful function by meaningless sounds. Canadian Journal of Psychology, 30, 178-186.

BEGG, I. (1978). Imagery and organization in memory: Instructions effects. Memory and Cognition, 6, 174-183.

BEGG, I. (1979). Trace loss and the recognition failure of unrecalled words. Memory and Cognition, 7, 113-123.

BEGG, I. (1982). Imagery, organization, and discriminative processes. Canadian Journal of Psychology, 36, 273-290.

BEGG, I., ARMOUR, V., & KERR, T. (1985). On believing what we remember. Canadian Journal of Behavioral Science, 17, 199-214.

BEGG, I., & GREEN, C. (1988). Repetition and trace interaction: Superadditivity. Memory and Cognition, in press.

BEGG, I., & ROWE, E. J. (1972). Continuous judgments of word frequency and familiarity. Journal of Experimental Psychology, 95, 48-54.

BEGG, I., UPFOLD, D., & WILTON, T. D. (1978). Imagery in verbal communication. Journal of Mental Imagery, 2, 165-186.

BEGG, I., & WHITE, P. (1985). Encoding specificity in interpersonal communication. Canadian Journal of Psychology, 39, 70-87.

BISHOP, Y. M. M., FIENBERG, S. E., & HOLLAND, P. W. (1975). Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press.

BLAKE, M. (1973). Prediction of recognition when recall fails: Exploring the feeling-of-knowing phenomenon. Journal of Verbal Learning and Verbal Behavior, 12, 311-319.

BOWER, G. H. (1970). Imagery as a relational organizer in associative learning. Journal of Verbal Learning and Verbal Behavior, 9, 529-533.

FLEXSER, A. J., & TULVING, E. (1978). Retrieval independence in recognition and recall. Psychological Review, 85, 153-171.

GARDINER, J. M., & KLEE, H. (1976). Memory for remembered events: An assessment of output monitoring in free recall. Journal of Verbal Learning and Verbal Behavior, 15, 227-233.

GLENBERG, A. M., SANOCKI, T., EPSTEIN, W., & MORRIS, C. (1987). Enhancing calibration of comprehension. Journal of Experimental Psychology: General, 116, 119-136.

HARRIS, G., BEGG, I., & UPFOLD, D. (1980). On the role of the speaker’s expectations in interpersonal communication. Journal of Verbal Learning and Verbal Behavior, 19, 597-607.

HART, J. T. (1965). Memory and the feeling-of-knowing experience. Journal of Educational Psychology, 56, 208-216.

HART, J. T. (1967). Memory and the memory-monitoring process. Journal of Verbal Learning and Verbal Behavior, 6, 685-691.

HERTEL, P. T., ANOOSHIAN, L. J., & ASHBROOK, P. (1986). The accuracy of beliefs about retrieval cues. Memory and Cognition, 14, 265-269.

JACOBY, L. L., & DALLAS, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

KING, J. F., ZECHMEISTER, E. B., & SHAUGHNESSY, J. J. (1980). Judgments of knowing: The influence of retrieval practice. American Journal of Psychology, 93, 329-343.

LOVELACE, E. A. (1984). Metamemory: Monitoring future recallability during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 756-766.

MANDLER, J. M., & MANDLER, G. (1964). Thinking: From association to Gestalt. New York: Wiley.

NELSON, T. O. (1984). A comparison of current measures of the accuracy of feeling-of-knowing predictions. Psychological Bulletin, 95, 109-133.

NELSON, T. O., GERLER, D., & NARENS, L. (1984). Accuracy of feeling-of-knowing judgments for predicting perceptual identification and relearning. Journal of Experimental Psychology: General, 113, 282-300.

NELSON, T. O., LEONESIO, R. J., LANDWEHR, R. S., & NARENS, L. (1986). A comparison of three predictors of an individual’s memory performance: The individual’s feeling of knowing versus the normative feeling of knowing versus base-rate item difficulty. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 279-287.

PAIVIO, A. (1971). Imagery and verbal processes. New York: Holt, Rinehart, & Winston.

PAIVIO, A., & BEGG, I. (1981). The psychology of language. Englewood Cliffs, NJ: Prentice-Hall.

PAIVIO, A., YUILLE, J. C., & MADIGAN, S. A. (1968). Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology Monograph, 76, No. 1, Pt. 2.

PERLMUTTER, M. (1978). What is memory aging the aging of? Developmental Psychology, 14, 330-345.

RABINOWITZ, J. C., ACKERMAN, B. P., CRAIK, F. I. M., & HINCHLEY, J. L. (1982). Aging and metamemory: The roles of relatedness and imagery. Journal of Gerontology, 37, 688-695.

ROBINSON, J. A., & KULP, R. A. (1970). Knowledge of prior recall. Journal of Verbal Learning and Verbal Behavior, 9, 84-86.

SHAUGHNESSY, J. J. (1981). Memory monitoring accuracy and modification of rehearsal strategies. Journal of Verbal Learning and Verbal Behavior, 20, 216-230.

TULVING, E. (1964). Intratrial and intertrial retention: Notes towards a theory of free recall verbal learning. Psychological Review, 71, 219-237.

TVERSKY, A., & KAHNEMAN, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207-232.

UNDERWOOD, B. J. (1966). Experimental Psychology (2nd ed.). New York: Appleton-Century-Crofts.

ZECHMEISTER, E. B., & SHAUGHNESSY, J. J. (1980). When you know that you know and when you think that you know but don’t. Bulletin of the Psychonomic Society, 15, 41-44.

(Received December 5, 1988) (Revision received January 16, 1989)