(766992060) Benefits of Testing Memory; Best Practices and Boundary Conditions


  • 8/16/2019 (766992060) Benefits of Testing Memory; Best Practices and Boundary Conditions.


1 Benefits of testing memory

Best practices and boundary conditions

Henry L. Roediger, III, Pooja K. Agarwal, Sean H. K. Kang and Elizabeth J. Marsh

The idea of a memory test or of a test of academic achievement is often circumscribed. Tests within the classroom are recognized as important for the assignment of grades, and tests given for academic assessment or achievement have increasingly come to determine the course of children’s lives: score well on such tests and you advance, are placed in more challenging classes, and attend better schools. Against this widely acknowledged backdrop of the importance of testing in educational life (not just in the US, but all over the world), it would be difficult to justify the claim that testing is not used enough in educational practice. In fact, such a claim may seem to be ludicrous on the face of it. However, this is just the claim we will make in this chapter: Education in schools would greatly benefit from additional testing, and the need for increased testing probably increases with advancement in the educational system. In addition, students should use self-testing as a study strategy in preparing for their classes.

Now, having begun with an inflammatory claim (we need more testing in education), let us explain what we mean and back up our claims. First, we are not recommending increased use of standardized tests in education, which is usually what people think of when they hear the words “testing in education.” Rather, we have in mind the types of assessments (tests, essays, exercises) given in the classroom or assigned for homework. The reason we advocate testing is that it requires students to retrieve information effortfully from memory, and such effortful retrieval turns out to be a wonderfully powerful mnemonic device in many circumstances.

Tests have both indirect and direct effects on learning (Roediger & Karpicke, 2006b). The indirect effect is that, if tests are given more frequently, students study more. Consider a college class in which there is only a midterm and a final exam compared to a similar class in which weekly quizzes are given every Friday, in addition to the midterm and the final. A large research program is not required to determine that students study more in the class with weekly quizzes than in the class without them. Yet tests also have a direct effect on learning; many studies have shown that students’ retrieval of information on tests greatly improves their later retention of the tested


material, either compared to a no-intervention control or even compared to a control condition in which students study the material for an equivalent amount of time to that given to students taking the test. That is, taking a test on material often yields greater gains than restudying material, as we document below. These findings have important educational implications, ones that teachers and professors have not exploited.

In this chapter, we first report selectively on findings from our lab on the critical importance of testing (or retrieval) for future remembering. Retrieval is a powerful mnemonic enhancer. However, testing does not lead to improvements under all possible conditions, so the remainder of our chapter will discuss qualifications and boundary conditions of test-enhanced learning, as we call our program (McDaniel, Roediger, & McDermott, 2007b).


story. The subjects were told that they should remember the pictures, because they would be tested on the names of the pictures (which were given in the story). The test was free recall, meaning that students were given a blank


affected performance a week later; three prior tests raised recall over 80% relative to the no-test condition (i.e., (32 − 17)/17 × 100). Looked at another way, immediately after study about 32 items could be recalled. If subjects took three tests just after recall, they could still recall 32 items a week later. The act of taking three tests essentially stopped the forgetting process in its tracks, so testing may be a mechanism to permit memories to consolidate or reconsolidate (Dudai, 2006).

Critics, however, could pounce on a potential flaw in the


Figure 1.2 Results from Roediger and Karpicke (2006a, Experiment 1). On the 5-minute delayed test, students who had repeatedly studied the material recalled it better than those who had studied it once and taken a test. Cramming (repeatedly reading) does work, at least at very short retention intervals. However, on the two delayed tests, the pattern reversed; studying and taking an initial test led to better performance on the delayed test than did studying the material twice.

modulate the magnitude of the testing effect, beginning with the format of tests.

    THE FORMAT OF TESTS

The power of testing to increase learning and retention has been demonstrated in numerous studies using a diverse range of materials; but both study and test materials come in a multitude of formats. Although the use of true/false and multiple-choice exams is now commonplace in high school and college classrooms, there was a time (in the 1920s and 1930s) when these kinds of exams were a novelty and referred to as “new-type,” in contrast to the more traditional essay exams (Ruch, 1929). Given the variety of test formats, one question that arises is whether all formats are equally efficacious in improving retention. If we want to provide evidence-based recommendations for educators to utilize testing as a learning tool, it is important to ascertain if particular types of tests are more effective than others.

In a study designed to examine precisely this issue, Kang, McDermott, and Roediger (2007) manipulated the formats of both the initial and final tests (multiple-choice [MC] or short answer [SA]) using a fully-crossed,


within-subjects design. Students read four short journal articles, and immediately afterwards they were given an MC quiz, an SA quiz, a list of statements to read, or a filler task. Feedback was given on quiz answers, and the quizzes and the list of statements all targeted the same critical facts. For instance, after reading an article on literacy acquisition, students in the SA condition generated an answer to


were significantly worse in the filler-task condition than the other three conditions, indicating that both testing (with feedback) and focused re-exposure aid retention of the target information. Importantly, only the initial SA condition produced significantly better final performance than the read-statements condition; the initial MC and read-statements conditions did not differ significantly. Retrieval is a potent memory modifier (Bjork, 1975). These results implicate the processes involved in actively producing information from memory as the causal mechanism underlying the testing effect. Regardless of the format of the final test, the initial test format that required more effortful retrieval (i.e., short answer) yielded the best final performance, and this condition was significantly better than having read the test answers in isolation.

Similar results from other studies provide converging evidence that effortful retrieval is crucial for the testing effect (Carpenter & DeLosh, 2006; Glover, 1989). Butler and Roediger (2007), for example, used art history video lectures to simulate classroom learning. After the lectures, students completed short answer or multiple-choice tests, or they read statements as in Kang et al. (2007). On a final SA test given 30 days later, Butler and Roediger found the same pattern of results: (1) retention of target facts was best when students were given an initial SA quiz, and (2) taking an initial MC test produced final performance equivalent to reading the test answers (without taking a test). As discussed in a later section of this chapter, these findings have been replicated in an actual college course (McDaniel, Anderson, Derbish, & Morrisette, 2007b).

Although most evidence suggests that tests that require effortful retrieval yield the most memorial benefits, it should be noted that this depends upon successful retrieval on the initial test (or the delivery of feedback). Kang et al. (2007) had another experiment identical to the one described earlier except that no feedback was provided on the initial tests.


subsequent tests. In a similar vein, it has been shown that items that elicit errors on a cued recall test have almost no chance of being recalled correctly at a later time unless feedback is given (Pashler, Cepeda,


Table 1.1 Mean proportion recalled in Agarwal et al.’s (2008) Experiment 2 on test formats. Proportion correct was greater for test conditions than study conditions; however, subjects predicted learning would be greater for study conditions relative to test conditions. Learning conditions that contained both testing and feedback, namely the closed-book test with feedback and the open-book test conditions, contributed to best performance on the delayed test. Cells marked [?] contain a digit that is not legible in this copy.

                                  Proportion correct
Condition                         Initial test    One-week delayed test
Study 1×                                          .40
Study 2×                                          .50
Study 3×                                          .54
Closed-book test                  .67             .55
Closed-book test with feedback    .65             .66
Open-book test                    .8[?]           .66
Simultaneous answering            .83             .5[?]
Non-studied control                               .16

open-book test. Perhaps a difference between these two conditions will emerge with a longer delay for the final test, an outcome that has occurred in other experiments (e.g., Roediger & Karpicke, 2006a). Such a finding would probably further depend on how students approach the open-book tests (e.g., whether they attempt retrieval of an answer before consulting the study material for feedback, or whether they immediately search the study material in order to identify the target information). Future research in our lab will tackle this topic.

In summary, one reason why testing benefits memory is that it promotes active retrieval of information. Not all formats of tests are equal. Test formats that require more effortful retrieval (e.g., short answer) tend to produce a greater boost to learning and retention, compared to test formats that engage less effortful retrieval (e.g., multiple-choice). However, tests that are more effortful or challenging also increase the likelihood of retrieval failure, which has been shown to reduce the beneficial effect of testing. Therefore, to ameliorate low performance on the initial test, corrective feedback should be provided. The practical implications of these findings for improving learning in the classroom are straightforward: instead of giving students summary notes to read, teachers should implement more frequent testing (of important facts and concepts), using test formats that entail effortful retrieval, and provide feedback to correct errors.


    TESTING AND FEEDBACK

Broadly defined, feedback is information provided following a response or recollection, which informs the learner about the status of current performance, often leading to improvement in future performance (Roediger, Zaromb, & Butler, 2008). A great deal of laboratory and applied research has examined the conditions under which feedback is, and is not, effective in improving learning and performance.


& Marsh, 200[?]). Similarly, Butler, Karpicke, and Roediger (2008) suggested that feedback reinforces the association between a cue and its target response, increasing the likelihood that an initially low-confidence correct response will be produced on a final criterial test.

Regarding the best time to deliver feedback, there exists a great deal of debate and confusion, as immediate and delayed feedback are operationalized differently across studies. For instance, the term “immediate feedback” has been used to imply feedback given just after each test item or feedback provided immediately after a test. On the other hand, delayed feedback can take place anywhere from 8 seconds after an item to 2 days after the test (Kulik & Kulik, 1988), although in many educational settings the feedback may actually occur a week or more later.

Butler et al. (2007) investigated the effects of type and timing of feedback on long-term retention. Subjects read prose passages and completed an initial multiple-choice test. For some responses, they received standard feedback (i.e., the correct answer); for others they received feedback by answering until correct (labeled AUC; i.e., each response was labeled as correct or incorrect, and if incorrect they chose additional options until they answered the question correctly). Half of the subjects received the feedback immediately after each question, while the rest received the feedback after a 1-day delay.

One week later, subjects completed a final cued recall test, and these data are shown in Figure 1.4. On the final test, delayed feedback led to substantially better performance than immediate feedback, while the standard feedback condition and the answer-until-correct feedback condition resulted in similar performance. Butler et al. discussed why some studies find immediate feedback to be more beneficial than delayed feedback (e.g., see Kulik & Kulik, 1988, for a review). Namely, this inconsistency might occur if learners do not fully process delayed feedback, which would be particularly likely in applied studies where less experimental control is present. That is, even if students receive delayed feedback they may not look at it or look only at feedback on questions that they missed.


Figure 1.4 Results from Butler, Karpicke, and Roediger (2007). On the final test, delayed feedback (FB) led to substantially better performance than immediate feedback, while the standard feedback condition and the answer-until-correct (AUC) feedback condition resulted in similar performance.


    concentrate on others that have not yet been learned. The assumption is that

    learning to a criterion of one correct recitation means that the item is learned

    and that further practice on it will be for naught.

Karpicke and Roediger (2008) published a study that questions this common wisdom. In their study, students learned foreign language vocabulary in the form of Swahili-English word pairs. (Swahili was used because students were unfamiliar with the language, and yet the word forms were easily pronounceable for English speakers, such as mashua-boat.) Students learned 40 pairs under one of four conditions. In one condition, students studied and were tested on the 40 pairs in the usual multitrial learning situation favored by psychologists (study-test, study-test, study-test, study-test, labeled the ST condition). In a second condition, students received a similar first study-test cycle, but if they correctly recalled pairs on the test, these pairs were dropped from the next study trial. Thus, across the four trials, the study list got smaller and smaller as students recalled more items. However, in this condition, labeled SNT, students were tested on all 40 pairs during each test period. Thus, relative to the ST condition, the SNT condition involved fewer study opportunities but the same number of tests. In a third condition, labeled STN, students studied and were tested the same way as in the other conditions on the first trial, but after the first trial they repeatedly studied the pairs three more times, but once they had recalled a pair, it was dropped from the test. In this condition, the study sequence stayed the same on four occasions, but the number of items tested became smaller and smaller. Finally, in a fourth condition denoted SNTN, after the first study-test trial, items that were recalled were dropped both from the study and test phase of the experiment for the additional trials. In this case, then, the study list and the test sequence became shorter over trials. This last condition is most like standard advice for using flashcards: students studied and were tested on the pairs until they were recalled, and then they were dropped so that attention could be devoted to unlearned pairs.

Initial learning on the 40 pairs in the four conditions is shown in Figure 1.5, where it can be seen that all four conditions produced equivalent learning. The data in Figure 1.5 show cumulative performance, such that students were given credit the first time they recalled a pair and not again (for those conditions in which multiple recalls of a pair were required, ST and SNT). At the end of the learning phase, students were told that they would come back a week later to be tested again and were asked to predict how many pairs they would recall. Students in all four groups estimated that they would recall about 20 pairs, or 50% of the pairs, a week later. After all, the learning curves were equal, so why would we expect students’ judgments to differ?
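The four study/test dropout conditions described above differ only in which phase drops a pair once it has been recalled. A minimal sketch of that rule follows, written in Python. The function name `trial_lists`, the plain-text condition labels, and the placeholder cue names are assumptions made for illustration; they are not taken from the original experiment's materials or software.

```python
# Illustrative sketch of the four dropout conditions described in the text.
# Labels: ST, SNT, STN, SNTN (an N after S or T marks the phase that drops
# a pair once that pair has been recalled correctly).
CONDITIONS = ("ST", "SNT", "STN", "SNTN")


def trial_lists(condition, pairs, recalled):
    """Return (study_list, test_list) for any trial after the first.

    pairs    -- all cue words in the list, in presentation order
    recalled -- set of cues already recalled correctly at least once
    """
    if condition not in CONDITIONS:
        raise ValueError("unknown condition: %r" % (condition,))
    remaining = [p for p in pairs if p not in recalled]
    # Recalled pairs leave the study list only in SNT and SNTN.
    study = remaining if condition in ("SNT", "SNTN") else list(pairs)
    # Recalled pairs leave the test list only in STN and SNTN.
    test = remaining if condition in ("STN", "SNTN") else list(pairs)
    return study, test


# Placeholder cues (hypothetical); two of four have been recalled so far.
example_pairs = ["cue1", "cue2", "cue3", "cue4"]
example_recalled = {"cue1", "cue2"}
for name in CONDITIONS:
    study, test = trial_lists(name, example_pairs, example_recalled)
    print(name, "study:", len(study), "test:", len(test))
```

Read this way, ST and SNT keep every pair on every test, which is the repeated retrieval the chapter goes on to emphasize, while STN and SNTN stop testing a pair after its first correct recall.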

Figure 1.6 shows the proportion of items recalled 1 week later, in each of the four conditions. Students in two conditions did very well (ST and SNT, around 80%) and in the other two conditions students did much more poorly (STN and SNTN, around 35%).


Figure 1.5 Initial cumulative learning performance from Karpicke and Roediger (2008). All four conditions produced equivalent learning.

In both the ST and SNT conditions, students were tested on all 40 pairs for all four trials. Note that in the ST condition students studied all 40 items four times, whereas in the SNT condition, the items were dropped from study. However, this reduced amount of study did not matter a bit for retention a week later. Students in the STN and SNTN conditions had only enough testing for each item to be recalled once and, without repeated retrieval, final recall was relatively poor. Once again, the condition with many more study opportunities (STN) did not lead to any appreciably better recall a week later than the condition that had minimal study opportunities (SNTN).

The bottom line from the Karpicke and Roediger (2008) experiment is that after students have retrieved a pair correctly once, repeated retrieval is the key to improved long-term retention. Repeated studying after this point does not much matter.

Recall that the students in the four conditions predicted that they would do equally well, and recall about 50% after a week. As can be seen in Figure 1.6, the students who were repeatedly tested actually outperformed their predictions, so they underestimated the power of testing. On the other hand, the students who did not have repeated testing overestimated how well they would do. In a later section, we return to the issue of what students know


Figure 1.6 Final learning results from Karpicke and Roediger (2008). Students in the ST and SNT conditions performed very well, and students in the STN and SNTN conditions did much more poorly. In both the ST and SNT conditions, students were tested on all 40 pairs for all four trials. Students in the STN and SNTN conditions had only enough testing for each item to be recalled once and, without repeated retrieval, recall was relatively poor. The condition with many more study opportunities (STN) did not lead to any appreciably better recall a week later than the condition that had minimal study opportunities (SNTN).

about the effects of testing and whether they use testing as a study strategy when left to their own devices.

Repeated testing seems to be great for consolidating information into long-term memory, but is there an optimal schedule for repeated testing? Landauer and Bjork (1978) argued that a condition they called expanding retrieval was optimal, or at least was better than two other schedules called massed practice and equal interval practice. To explain, let us stick with our foreign-language vocabulary learning example above, mashua-boat, and consider patterns in which three retrievals of the target might be carried out. In the immediate test condition, after the item has been presented, mashua would be presented three times in a row for boat to be recalled each time. This condition might provide good practice because, of course, performance on each test would be nearly perfect. (In fact, in most experiments, this massed testing condition leads to 98% or higher correct recall with paired-associates.)


The massed retrieval condition will be denoted as 0-0-0 to indicate that the three retrievals occurred back to back, with no other study or test items between retrievals of the target.

A second condition is the equal interval schedule, in which tests are given after a delay from study and at equal intervals after that. So, in a 5-5-5 schedule, a pair like mashua-boat would be presented, then five other pairs or tests would occur, and then mashua-?? would be given as a cue. This same process would occur two more times. Although distributed retrieval could be beneficial relative to massed retrieval, just as distributed study practice is beneficial relative to massed practice, one looming problem occurs in the case of retrieval: if the first test is delayed, recall on the first test may be low and, as discussed above, low performance on a test can reduce or eliminate the power of the testing effect. To overcome this problem, Landauer and Bjork (1978) introduced the idea of expanding retrieval practice, to ensure a nearly errorless early retrieval with a quick first test while at the same time gaining advantages of distributed testing or practice. So, to continue with our example, in an expanding schedule of 1-4-10, students would be tested with mashua-?? after only 1 intervening item, then again after 4 intervening items, and then after 10 intervening items. The idea behind the expanding schedule is familiar to psychologists because it resembles the idea of shaping behavior by successive approximations (Skinner, 1953, Chapter 6); just as schedules of reinforcement (Ferster & Skinner, 1957) exist to shape behavioral responses, so schedules of retrieval may shape the ability to remember. If a student wants to be able to retrieve a vocabulary word long after study, the expanding retrieval schedule may help to shape its retrieval.

Landauer and Bjork (1978) conducted two experiments pitting massed, equal interval, and expanding interval schedules of retrieval against one another. For the latter conditions, they used 5-5-5 spacing and 1-4-10 spacing. Note that this comparison equates the average interval of spacing at 5. The materials in one experiment were fictitious first and last names, such that students were required to produce a last name when given the first name. Landauer and Bjork measured performance on the three initial tests and then on a final test given at the end of the experimental session. The results for the expanding and equal interval retrieval sequences are shown in Table 1.2 for

Table 1.2 Mean proportion recalled in Landauer and Bjork’s (1978) experiment on schedules of testing; data are estimated from Figures [?].1 and [?].2. Expanding retrieval schedules were better than equal interval schedules on both the initial three tests and the final criterial test. A [?] marks a digit that is not legible in this copy.

              Initial tests              Final test
              1        2        3
Expanding     .6[?]    .55      .50      .47
Equal         .42      .42      .43      .40


the three initial tests and then the final, criterial, test. Expanding retrieval schedules were better than equal interval schedules, as Landauer and Bjork predicted, on both the initial three tests and then the final criterial test. The 7% advantage of expanding interval retrieval to equally spaced retrieval on the final test was small but statistically significant, and this is the comparison that the authors emphasized in the paper. They replicated the effect in a separate experiment with face-name pairs. However, note a curious fact about the data in Table 1.2: Over the four tests shown, performance drops steadily in the expanding interval condition (from .6[?] to .47) whereas in the equal interval condition performance is essentially flat (.42 to .40). This pattern suggests that on a more delayed final test, the curves might cross over and equal interval retrieval might prove superior to expanding retrieval.

Strangely, for some years researchers did not investigate Landauer and Bjork’s (1978) intriguing findings, perhaps because they made such good sense. Most of the studies on retrieval schedules compared expanding and massed retrieval, but did not include the critical equal interval condition needed to compare expanding retrieval to another distributed schedule (e.g., Rea & Modigliani, 1985). All studies making the massed versus expanding retrieval comparison showed expanding retrieval to be more effective, and Balota, Duchek, and Logan (2007) have provided an excellent review of this literature. They show conclusively that massed testing is a poor strategy relative to distributed testing, despite the fact that massed testing produces very high performance on the initial tests (much higher than equal interval testing). Although this might seem commonplace to cognitive psychologists steeped in the literature of massed versus spaced presentation (the spacing effect), from a different perspective the outcome is surprising. Skinner (1958) promoted the notion of errorless retrieval as being the key to learning, and he implemented this approach into his teaching machines and programmed learning. However, current research shows that distributed retrieval is much more effective in promoting later performance than is massed retrieval, even though massed retrieval produces errorless performance.

On the other hand, when comparisons are made between expanding and equal interval schedules, the data are much less conclusive. The other main point established in the Balota et al. (2007) review is that no consistent evidence exists for the advantage of expanding retrieval schedules over equal interval testing sequences. A few studies after Landauer and Bjork’s (1978) seminal study obtained the effect, but the great majority did not. For example, Cull (2000) reported four experiments in which students learned difficult word pairs. Across experiments, he manipulated variables such as intertest interval, feedback or no feedback after the tests, and testing versus restudying the material. The general conclusion drawn from the four experiments was that distributed retrieval produced much better retention on a final test than did massed retrieval, but that it did not matter whether the schedule had uniform or expanded spacing of tests.

Recent research by Karpicke and Roediger (2007) and Logan and Balota


(2008) actually shows a more interesting pattern. On tests that occur a day or more after original learning, equal interval schedules of initial testing actually produce greater long-term retention than do expanding schedules (just the opposite of Landauer and Bjork’s findings). Recall the data in Table 1.2 and how the expanding retrieval testing condition showed a steady decline with repeated tests whereas the equal interval schedule showed essentially no decline. Because the final test in these studies occurred during the same session as initial learning, the retention interval for the final test was fairly short, leaving open the possibility that on a long-delayed test the functions would actually cross. This is just what both Karpicke and Roediger and Logan and Balota found.

Karpicke and Roediger (2007) had students learn word pairs taken from practice tests for the Graduate Record Exam (e.g., sobriquet-nickname, benison-blessing) and tests consisted of giving the first member of the pair and asking for the second. Their initial testing conditions were massed (0-0-0), expanding (1-5-9) and equal interval (5-5-5). In addition, they included two conditions in which students received only a single test after either 1 intervening pair or 5. The design and initial test results are shown on the left side of Table 1.3. Initial test performance was best in the massed condition, next in the expanding condition, and worst in the equally spaced condition, the usual pattern, and students in the single test condition recalled less after a delay of 5 intervening items than after 1. There are no surprises in the initial recall data. Half the students took a final test 10 minutes after the initial learning phase, whereas the rest received the final test 2 days later. These results are shown in the right side of Table 1.3. First consider data at the 10-minute delay. The top three rows show a very nice replication of the pattern reported by Landauer and Bjork (1978): Expanding retrieval in the initial phase produced better recall on the final test than the equal interval schedule, and both of these schedules were better than the massed retrieval

    %able ".$ 9ean  proportion recalled in 2arpice and / oediger ’s !344;% e+periment

    on schedules of testing. (+panding retrieval in the initial phase produced better r ecallthan the e0ual interval schedule on the 4-minute delayed test, and both of theseschedules were better than the massed retrieval schedule. 'owever, after a 3-daydelay, recall was best in the e0ual interval condition relative to the e+pandingcondition, although both conditions still produced better  performance than in themassed condition.

     Initial tests inal tests

    " # $ "5 -in 7 h

    9assed !4-4-4% .C .C .C .N; .34(+panding !-H-% .;C .;5 .;; .; .GG(0ual !H-H-H% .;G .;G .;G .53 .NH$ingle-immediate !% .C .5H .33

  • 8/16/2019 (766992060) Benefits of Testing Memory; Best Practices and Boundary Conditions.

    20/41

     2ene3ts o4 testing -e-oryG  Roediger et al.

    $ingle-delayed !H% .;G .H; .G4

  • 8/16/2019 (766992060) Benefits of Testing Memory; Best Practices and Boundary Conditions.

    21/41

     2ene3ts o4 testing -e-oryG3  Roediger et al.

    schedule. Also, the single-immediate test produced better delayed recall than

    the single-delayed test. 'owever, the startling result in this e+periment

    appeared for those sub"ects who too the test after a 3-day delay. /ecall

    was now best in the e0ual interval condition ! M M .NH% relative to thee+panding condition ! M M .GG%, although  both conditions still produced

     better  perf or m- ance than in the massed condition ! M M .34%. &nterestingly,

     performance also reversed across the delay for the two single test conditions:

    recall was better in the single-immediate condition after 4 minutes, but was

    reliably better in the single-delayed condition after 3 days.

    Karpicke and Roediger (2007) argued that, congenial as the idea is, expanding retrieval is not conducive to good long-term retention. Instead, what seems to be important for long-term retention is the difficulty of the first retrieval attempt.


    half the statements to be right and half to be wrong, and normally false items are plausible in order to require rather fine discriminations. Similarly, for multiple-choice tests, students receive a question stem and then four possible completions, one of which is correct and three others that are erroneous (but again, statements that might be close to correct). Because erroneous information is presented on the tests, students might learn that incorrect information, especially if no feedback is given (as is often the case in college courses). If the test is especially difficult (meaning a large number of wrong answers are selected), the students may actually leave a test more confused about the material than when they walked in. However, even if conditions are such that students rarely commit errors, it might be that simply reading and carefully considering false statements on true/false tests and distractors on multiple-choice tests can lead later to erroneous knowledge.

    Several studies have shown that having people simply read statements (whether true or false) increases later judgments that the statements are true (Bacon, 1979; Begg, Armour, & Kerr, 1985; Hasher, Goldstein, & Toppino, 1977). This effect underlies the tactics of propagandists using the "big lie" technique by repeating a statement over and over until the populace believes it, and is also a favored tactic in most US presidential elections. If an untruth about an opponent is repeated often enough, the statement comes to be believed.

    Remmers and Remmers (1926) first discussed the idea that incorrect information on tests might mislead students, when the new techniques of true/false and multiple-choice testing were introduced into education (Ruch, 1929). They called this outcome the negative suggestibility effect, although not much research was done on it for many years. Much later Toppino and his colleagues showed that statements presented as distractors on true/false and multiple-choice tests did indeed accrue truth value from their mere presentation, because these statements were judged as more true when mixed with novel statements in appropriately designed experiments (Toppino & Brochin, 1989; Toppino & Luipersbeck, 1993). In a similar vein, Brown (1988) and Jacoby and Hollingshead (1990) showed that exposing students to misspelled words increased misspelling of those words on a later oral test.

    Roediger and Marsh (2005) asked whether giving a multiple-choice test (without feedback) would lead to a kind of misinformation effect (Loftus et al., 1978). That is, if students take a multiple-choice test on a subset of facts, and then take a short answer test on all facts, will prior testing increase intrusions of multiple-choice lures on the final test? Roediger and Marsh conducted a series of experiments to address this question, manipulating the difficulty of the material (and hence level of performance on the multiple-choice test) and the number of distractors given on the multiple-choice test. Three experiments were submitted using these tactics, but the editor asked us to drop our first two experiments (which established the phenomenon) and report only a third, control, experiment that showed that the negative suggestibility effect occurred under tightly controlled but not necessarily realistic conditions.


    most interesting experiments (in our opinion) were not reported. Our first experiment was exploratory, just to make sure we could get the effects we sought. Of interest was whether selecting distractors on the multiple-choice test would lead to their intrusion on a later test.


    multiple-choice performance, and this allowed us to see if any negative effects of testing were limited to conditions in which subjects made more errors on the multiple-choice test (i.e., difficult items and relatively many distractors). The interesting data are contained in Tables 1.5 and 1.6 (again, the top panels devoted to Experiment 1). Table 1.5 shows the proportion of short answer questions answered correctly (bowling in response to


    Table 1.6 Proportion target incorrect answers on the cued recall test as a function of question difficulty and number of alternatives (including the correct answer) on the prior multiple-choice test. Non-guess responses are in parentheses (proportion correct not including those that received a "not sure" rating). The prior multiple-choice test led to more errors on the final test and this effect grew larger when more distractors had been presented on the multiple-choice test. Removing the "not sure" responses reduced the size of the negative suggestibility effect, but left the basic pattern intact.

                           Number of previous alternatives
                           Zero (not-tested)   Two           Three         Four
    Experiment 1
      Easy questions       .08                 [illegible]   [illegible]   [illegible]
                           (.05)               (.06)         [illegible]   (.08)
      Hard questions       [illegible]         .26           .34           .36
                           [illegible]         (.20)         [illegible]   (.24)
    Experiment 2
      Read passages        .05                 [illegible]   [illegible]   [illegible]
                           [illegible]         (.06)         (.05)         (.07)
      Non-read passages    [illegible]         .24           .25           .37
                           (.03)               [illegible]   [illegible]   [illegible]

    The data show clearly that taking a multiple-choice test can simultaneously enhance performance on a later cued recall test (a positive testing effect) and harm performance (a negative suggestibility effect). The former effect comes from questions answered correctly on the multiple-choice test, whereas the latter effect arises from errors committed on the multiple-choice test. In fact, 78% of the multiple-choice lure answers on the final test had been selected erroneously on the prior multiple-choice test. This result is noteworthy because it suggests that any negative effects of multiple-choice testing require selection of an incorrect answer, and that simply reading the lures (and then selecting the correct answer) is not problematic.

    In Experiment 2 we examined whether students would show the same effects when learning from prose materials.


    The multiple-choice data are displayed in the bottom part of Table 1.4. Again, the results are straightforward: Not surprisingly, performance on unread passages was lower than for read passages, and performance generally declined with the number of distractors (albeit more for unread than read passages). Once again, the manipulations succeeded in varying multiple-choice performance across a fairly wide range.

    The consequences of multiple-choice testing can be seen in the bottom of Table 1.5, which shows the proportion of final short answer questions answered correctly. A positive testing effect occurred for both read and unread passages, although for unread passages the effect declined with the number of distractors on the prior multiple-choice test. Still, as in the first experiment, a positive testing effect was observed in all conditions, even when "not sure" responses were removed (the data in parentheses).

    As can be seen in Table 1.6, the negative suggestibility effect also appeared in full force in Experiment 2, although it was greater for the nonread passages, with their corresponding higher rate of errors on the multiple-choice test than for the read passages. For the read passages, the error rate nearly doubled after the multiple-choice test, from 5% to 9%.


    answers sounds more interesting when one thinks about the importance of endorsing multiple-choice lures for the negative suggestibility effect. One group of undergraduates was warned they would receive a penalty for wrong answers and that they should choose a "don't know" option if they were not reasonably sure of their answer. Another group was required to answer all of the multiple-choice questions. Both groups showed large positive testing effects, and smaller negative testing effects. Critically, the penalty instruction significantly reduced the negative testing effect, although it was still significant.

    Research on negative suggestibility is just beginning, and only a few variables have been systematically investigated. Three classes of variables are likely to be interesting: ones that affect how likely subjects are to select multiple-choice lures (e.g., reading related material, a penalty for wrong answers on the MC test), ones that affect the likelihood that selected multiple-choice lures are integrated with related world knowledge (e.g., corrective feedback), and ones that affect monitoring at test (e.g., the warning against guessing on the final test used in Roediger & Marsh, 2005). The negative testing effect could change in size for any of these reasons. For example, consider one recent investigation involving the effects of adding a "none of the above" option to the MC test (Odegard & Koen, 2007).


    constraints (e.g., time pressure), and online monitoring during the learning experience (i.e., subjective assessments of how well the material has been learned; Benjamin, 2007). In other words, a student's beliefs about learning and memory and his or her subjective evaluations during the learning experience are vital to effective learning (Dunlosky, Hertzog, Kennedy, & Thiede, 2005). In this section we shall discuss the metacognitive factors concomitant with testing, how testing can improve monitoring accuracy, as well as the use of self-testing as a study strategy by students.

    Research on metacognition provides a framework for examining how students strategically monitor and regulate their learning. Monitoring refers to a person's subjective assessment of their cognitive processes, and control refers to the processes that regulate behavior as a consequence of monitoring (Nelson & Narens, 1990). One indirect way in which testing can enhance future learning is by allowing students to better monitor their learning (i.e., discriminate information that has been learned well from that which has not been learned). Enhanced monitoring in turn influences subsequent study behavior, such as having students channel their efforts towards less well-learned materials. A survey of college students' study habits revealed that students are generally aware of this function of testing (Kornell & Bjork, 2007). In response to the question "If you quiz yourself while you study, why do you do so?" 68% of respondents chose "To figure out how well I have learned the information I'm studying," while only 18% selected "I learn more that way than through rereading," suggesting that relatively few students view testing as a learning event (see too Karpicke, Butler, & Roediger, 2009).

    To gain insight into subjects' monitoring abilities, researchers ask them to make judgments of learning (JOLs). JOLs are normally made during study: students predict their ability to remember the to-be-learned information at a later point in time (usually on a scale of 0–100%), and these predictions are then compared to their actual performance. Usually people are moderately accurate when making these predictions in laboratory paradigms (e.g., Arbuckle & Cuddy, 1969), but JOLs are inferential in nature and can be based on a variety of beliefs and cues (Koriat, 1997). The accuracy of one's metacognitive monitoring depends on the extent to which the beliefs and cues that one uses are diagnostic of future memory performance, and some of students' beliefs about learning are wrong. For example, subjects believe that items that are easily processed will be easy to retrieve later (e.g., Begg, Duft, Lalonde, Melnick, & Sanvito, 1989), whereas we have already discussed that more effortful retrieval is more likely to promote retention. Similarly, students tend to give higher JOLs after repeated study than after receiving initial tests on the to-be-remembered material, but actual final memory performance exhibits the opposite pattern (i.e., the testing effect; Agarwal et al., 2008; Kang, 2009a; Roediger & Karpicke, 2006a). Repeated studying of the material probably engenders greater processing fluency, which leads to an overestimation of one's future memory performance.
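    The comparison between JOLs and later recall described above can be made concrete with a small sketch. The Python below, run on made-up data, computes two summary measures: a calibration bias (mean JOL minus mean recall, where positive values indicate overconfidence) and a Goodman–Kruskal gamma correlation, one common way to index whether items given higher JOLs are in fact more likely to be recalled. The data, the function names, and the choice of gamma are assumptions made for illustration; the studies cited here may have used other measures.

```python
from itertools import combinations


def calibration_bias(jols, recalled):
    """Mean predicted recall minus mean actual recall.

    jols: predictions on a 0-100 scale; recalled: 1 if recalled, else 0.
    A positive value means predictions exceeded performance.
    """
    mean_jol = sum(jols) / len(jols) / 100.0
    mean_recall = sum(recalled) / len(recalled)
    return mean_jol - mean_recall


def goodman_kruskal_gamma(jols, recalled):
    """Gamma = (concordant - discordant) / (concordant + discordant).

    Pairs tied on either variable are ignored. Returns None when no
    untied pairs exist, because gamma is then undefined.
    """
    concordant = discordant = 0
    for (j1, r1), (j2, r2) in combinations(zip(jols, recalled), 2):
        product = (j1 - j2) * (r1 - r2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical data for six items: JOLs (0-100) and final recall (1/0).
example_jols = [90, 80, 70, 60, 40, 20]
example_recalled = [1, 0, 1, 1, 0, 0]

if __name__ == "__main__":
    print(calibration_bias(example_jols, example_recalled))
    print(goodman_kruskal_gamma(example_jols, example_recalled))
```

    The two measures answer different questions: the bias describes overall over- or underconfidence, whereas gamma describes how well a learner discriminates better-learned from less well-learned items.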

    Students' incorrect beliefs about memory mean that they often engage in suboptimal learning strategies. For example, JOLs are often negatively correlated with study times during learning, meaning that students spend more time studying items that they feel are difficult and that they still need to master (Son & Metcalfe, 2000; although see Metcalfe & Kornell [2003] for conditions that produce an exception to this generalization). Not only is testing a better strategy, but sometimes substantial increases in study time are not accompanied by equivalent increases in performance, an outcome termed the "labor-in-vain" effect (Nelson & Leonesio, 1988).

    Consider a study by Karpicke (in press) that examined subjects' strategies for learning Swahili–English word pairs. Critically, the experiment had repeated study–test cycles (multi-trial learning) and once subjects were able to correctly recall the English word (when cued with the Swahili word) they were given the choice of whether to restudy, test, or drop an item for the upcoming trial, with the goal of maximizing performance on a final test 1 week later. Subjects chose to drop the majority of items (60%), while about 25% and 15% of the items were selected for repeated testing and restudy, respectively. Subjects also made JOLs before making each choice, and items selected for restudy were subjectively the most difficult (i.e., lowest JOLs), dropped items were perceived to be the easiest, and items selected for testing were in between. As expected, final performance increased as a function of the proportion of items chosen to be tested, whereas there was no relationship between the proportion of items chosen for restudy and final recall. Finally, there was a negative correlation between the proportion of items dropped and final recall, indicating that subjects dropped items before they had firmly registered the pair.

    These results suggest that learners often make suboptimal choices during learning, opting for strategies that do not maximize subsequent retention. Also, the tendency to drop items once they were recalled the first time reflects overconfidence and under-appreciation of the value of practicing retrieval. Follow-up research in our lab (Kang, 2009b) is investigating whether experiencing the testing effect (i.e., performing well on a final test for items previously tested, relative to items that were previously dropped or restudied) can induce learners to select preferentially self-testing study strategies that enhance future recall.

    [...] JOLs suggest an important role for testing in improving monitoring accuracy. Delayed JOLs refer to ones solicited at some delay after the items have been studied, whereas immediate JOLs are solicited immediately after each item has been studied. Delayed JOLs are typically more accurate than immediate JOLs (e.g., Nelson & Dunlosky, 1991). This delayed JOL effect is obtained only under certain conditions, specifically when the JOLs are "cue-only" JOLs. This term refers to the situation in which studied items are A–B pairs and subjects are provided only with A when asked to make their prediction for later recall of the target B; the effect does not occur when JOLs are sought with intact cue–target pairs presented (Dunlosky & Nelson, 1992). One explanation for this finding is that subjects attempt retrieval of the target for cue-only delayed JOLs, and success or failure at retrieval then guides subjects' predictions (i.e., a high JOL is given if the target is successfully retrieved; if not then a low JOL is given). This enhanced ability to distinguish well-learned from less well-learned items, coupled with the testing effect on items retrieved successfully during the delayed JOL, has been proposed to account for the increased accuracy of delayed JOLs (Spellman & Bjork, 1992; Kelemen & [...]

    [...] JOLs were not always accurate: Koriat and Bjork (2005) had subjects learn paired associates, including forward-associated pairs (e.g., cheddar–cheese), backwards-associated pairs (e.g., cheese–cheddar), and unrelated pairs. During learning, subjects were asked to judge how likely it was that they would remember the 2nd word in the pair. Subjects over-predicted their ability to remember the target in the backwards-associated pairs, and the authors dubbed this an "illusion of competence." [...] On the first study–test cycle, subjects showed the same overconfidence for the backward-associated pairs, but JOLs and recall performance became better calibrated with further study–test opportunities. This finding suggests that prior test experience can enhance learners' sensitivity to retrieval conditions on a subsequent test, and can be a way to improve metacognitive monitoring.

    The studies discussed in this section converge on the general conclusion that the majority of college students are unaware of the mnemonic benefit of testing: [...] engage in suboptimal study behavior. Even though college students might be expected to be expert learners (given their many years of schooling and experience preparing for exams), they often labor in vain (e.g., rereading the text) instead of employing strategies that contribute to robust learning and retention. Self-testing may be unappealing to many students because of the greater effort required compared to rereading, but this difficulty during learning turns out to be beneficial for long-term performance (Bjork, 1994). Therefore, the challenge for future research is to uncover conditions that encourage learners to set aside their naïve intuitions when studying and opt for retrieval-based strategies that yield lasting results.

    APPLICATIONS OF TESTING IN CLASSROOMS

    Recently, several journal articles have highlighted the importance of using tests and quizzes to improve learning in real educational situations. The notion of using testing to enhance student learning is not novel, however, as Gates employed this practice with elementary school students in 1917 (see too Jones [1923] and Spitzer [1939]). One cannot, however, assume that laboratory findings necessarily generalize to classroom situations, given that some laboratory parameters (e.g., relatively short retention intervals, tight experimental control) do not correspond well to naturalistic contexts. This distinction has garnered interest recently and we will outline a few studies that have evaluated the efficacy of test-enhanced learning within a classroom context.

    Leeming (2002) adopted an "exam-a-day" procedure in two sections of Introductory Psychology and two sections of his summer Learning and Memory course, for a total of 22 to 24 exams over the duration of the courses. In comparable classes taught in prior semesters, students had received only four exams. Final retention was measured after 6 weeks. Leeming found significant increases in performance between the exam-a-day procedure and the four exam procedure in both the Introductory Psychology sections (80% vs. 74%) and Learning and Memory sections (89% vs. 81%). In addition, the percentage of students who failed the course decreased following the exam-a-day procedure. Leeming's students also participated in a survey, and students in the exam-a-day sections reported increased interest and studying for class.

    McDaniel et al. (2007a) described a study in an online Brain and Behavior course that used two kinds of initial test questions, short answer and multiple-choice, as well as a read-only condition.


    examinations in multiple-choice format were given after 3 weeks of quizzes, and a final cumulative multiple-choice assessment at the end of the semester covered material from both units. Although facts targeted on the initial quizzes were repeated on the unit/final exams, the question stems were phrased differently so that the learning of concepts was assessed rather than memory for a prior test response. On the two unit exams, retention for quizzed material was significantly greater than that for non-quizzed material, regardless of the initial quiz format. On the final exam, however, only short answer (but not multiple-choice) initial quizzes produced a significant benefit above non-quizzed and read-only material. The results from this study provide further evidence of the strength of the testing effect in classroom settings, as well as replicating prior findings showing that short answer tests produce a greater testing effect than do multiple-choice tests (e.g., Butler & Roediger, 2007; Kang et al., 2007).

    McDaniel and Sun (2009) replicated these findings in a more traditional college classroom setting, in which students took two short-answer quizzes per week. The quizzes were emailed to the students, who had to complete them by noon the next day. After emailing their quiz back to the professor, students received an email with the quiz questions and correct answers. Retention was measured on unit exams, composed of quizzed and non-quizzed material, and administered at the end of every week. Performance for quizzed material was significantly greater than performance on non-quizzed material.

    Finally, Roediger, McDaniel, McDermott, and Agarwal (2009) conducted various test-enhanced learning experiments at a middle school in Illinois. The experiments were fully integrated into the classroom schedules and used material drawn directly from the school and classroom curriculum. In the first study, 6th grade social studies, 7th grade English, and 8th grade science students completed initial multiple-choice quizzes over half of the classroom material. The other half of the material served as the control material. The teacher in the class left the classroom during administration of quizzes, so she did not know the content of the quizzes and could not bias her instruction toward (or against) the tested material. The initial quizzes included a pre-test before the teacher reviewed the material in class, a post-test immediately following the teacher's lecture, and a review test a few days after the teacher's lecture. Upon completion of a 3- to 6-week unit, retention was measured on chapter exams composed of both quizzed and non-quizzed material. At all three grade levels, and in all three content areas, significant testing effects were revealed such that retention for quizzed material was greater than for non-quizzed material, even up to 9 months later (at the end of the school year). The results from 8th grade science, for example, can be seen in Figure 1.7.

    This experiment was replicated with 6th grade social studies students who, instead of completing in-class multiple-choice quizzes, participated in games online using an interactive website at their leisure. This design was implemented in order to minimize the amount of class time required for a test-enhanced learning program. Despite being left to their own devices, students still performed better on quizzed material available online than non-quizzed material on their final chapter exams. Furthermore, in a subsequent experiment with 6th grade social studies students, a read-only condition was included, and performance for quizzed material was still significantly greater than read-only and non-quizzed material, even when the number of exposures was equated between the quizzed and read-only condition.

    Figure 1.7 Science results from Roediger, McDaniel, McDermott, and Agarwal (2009). Significant testing effects in a middle school setting were revealed such that retention for quizzed material was greater than for non-quizzed material, even up to 9 months later (at the end of the school year).

    In sum, recent research is beginning to demonstrate the robust effects of testing in applied settings, including middle school and college classrooms. Future research extending to more content areas (e.g., math), age groups (e.g., elementary school students), methods of quizzing (e.g., computer-based and online), and types of material (e.g., application and transfer questions), we expect, will only provide further support for test-enhanced learning programs.

    CONCLUSION

    In this chapter, we have reviewed evidence supporting test-enhanced learning in the classroom and as a study strategy (i.e., self-testing) for improving student performance. Frequent classroom testing has both indirect and direct benefits. The indirect benefits are that students study for more time and with greater regularity when tests are frequent, because the specter of a looming test encourages studying. The direct benefit is that testing on material serves as a potent enhancer of retention for this material on future tests, either relative to no activity or even relative to restudying material. Providing correct answer feedback on tests and ensuring that students carefully process this feedback greatly enhance this testing effect. Feedback is especially important when initial test performance is low. Multiple tests produce a larger testing effect than does a single test. In addition, tests requiring production of answers (short answer or essay tests) produce a greater testing effect than do recognition tests (multiple-choice or true/false). The latter tests also have the disadvantage of exposing students to erroneous information, but giving feedback eliminates this problem. Test-enhanced learning is not limited to laboratory materials; it improves performance with educational materials (foreign language vocabulary, science passages) and in actual classroom settings (ranging from middle school classes in social studies, English, and science, to university classes in introductory psychology and biological bases of behavior).

    REFERENCES

    [...] On the analysis of the factors of recall in the learning process. Psychological Monographs, 11, 159–177.

    Agarwal, D. 2., 2arpice, O. ., 2ang, $. '. 2., /oediger, '. @., 1 9cermott, 2.B. !344C%. (+amining the testing eff ect with open- and closed-boo tests. A &&lied (ogniti9e Psy'hology, ##, C5* C;5.

    Arbucle, T. 7., 1 6uddy, @. @. !5%. iscrimination of item strength at time of 

     presentation. Jo!rnal o4 E?&eri-ental Psy'hology, 7", 35* G.

    Bacon, . T. !;%. 6redibility of repeated statements: 9emory for trivia.

     J o!rnal o4 E?&eri-ental Psy'hology@ H!-an Learning and Me-ory, /, 3N* 3H3.

    Balota, . A., uche, O. 9., 1 @ogan, O. 9. !344;%. &s e+panded retrieval  pr actice

    a superior form of spaced retrieval= A critical review of the e+tant literature.

    &n O. $. )airne !(d.%, %he 4o!ndations o4 re-e-bering@ Essays in honor o4 Henry

     L. Roediger, III !pp. CG*4H%. )ew 7or: Dsychology Dr ess.

    Bangert-Drowns, R. L., Kulik, C. C., Kulik, J. A., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61, 213–238.

    Begg, I., Armour, V., & Kerr, T. (1985). On believing what we remember. Canadian Journal of Behavioral Science, 17, 199–214.

    Begg, I., Duft, S., Lalonde, P., Melnick, R., & Sanvito, J. (1989). Memory predictions are based on ease of processing. Journal of Memory and Language, 28, 610–632.

    Benjamin, A. S. (2007). Memory is more than just remembering: Strategic control of encoding, accessing memory, and making decisions. In A. S. Benjamin & B. H. Ross (Eds.), The psychology of learning and motivation: Skill and strategy in memory use (Vol. 48, pp. 175–223). London: Academic Press.


    Bjork, R. A. (1975). Retrieval as a memory modifier: An interpretation of negative recency and related phenomena. In R. L. Solso (Ed.), Information processing and cognition: The Loyola Symposium (pp. 123–144). New York:

    L) and the delayed-JOL effect. Memory and Cognition, 20, 374–380.

    Fazio, L. K., & Marsh, E. J. (2009). Surprising feedback improves later memory. Psychonomic Bulletin and Review, 16, 88–92.

    Feller, M. (1994). Open-book testing and education for the future. Studies in Educational Evaluation, 20, 235–238.

    Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York: Appleton-Century-Crofts.

    Gates, A. I. (1917). Recitation as a factor in memorizing. Archives of Psychology, 6(40).

    Glover, J. A. (1989). The testing phenomenon: Not gone but nearly forgotten. Journal of Educational Psychology, 81, 392–399.

    Hasher, L., Goldstein, D., & Toppino, T. (1977). Frequency and the conference of referential validity. Journal of Verbal Learning and Verbal Behavior, 16, 107–112.

    Jacoby, L. L., & Hollingshead, A. (1990). Reading student essays may be hazardous to your spelling: Effects of reading incorrectly and correctly spelled words. Canadian Journal of Psychology, 44, 345–358.

    Jones, H. E. (1923). The effects of examination on the performance of learning. Archives of Psychology, 10, 1–70.

    Kang, S. H. K. (2009a). Enhancing visuo-spatial learning: The benefit of retrieval practice. Manuscript under revision.

    Kang, S. H. K. (2009b). The influence of text expectancy, test format and test experience on study strategy selection and long-term retention. Unpublished doctoral dissertation, , USA.

    Kang, S. H. K., McDermott, K. B., & Roediger, H. L. (2007). Test format and corrective feedback modify the effect of testing on long-term retention. European Journal of Cognitive Psychology, 19, 528–558.

    Karpicke, J. D. (in press). Metacognitive control and strategy selection: Deciding to practice retrieval during learning. Journal of Experimental Psychology: General.

    Karpicke, J. D., Butler, A. C., & Roediger, H. L. (2009). Metacognitive strategies in student learning: Do students practise retrieval when they study on their own? Memory, 17, 471–479.

    Karpicke, J. D., & Roediger, H. L. (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704–719.

    Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science, 319, 966–968.

    Kelemen,


    Leeming, F. C. (2002). The exam-a-day procedure improves performance in Psychology classes. Teaching of Psychology, 29, 210–212.

    Loftus, E. F., Miller, D. G., & Burns, H. J. (1978). Semantic integration of verbal information into a visual memory. Journal of Experimental Psychology: Human Learning and Memory, 4, 19–31.

    Logan, J. M., & Balota, D. A. (2008). Expanded vs. equal interval spaced retrieval practice: Exploring different schedules of spacing and retention interval in younger and older adults. Aging, Neuropsychology, and Cognition, 15, 257–280.

    McDaniel, M. A., Anderson, J. L., Derbish, M. H., & Morrisette, N. (2007a). Testing the testing effect in the classroom. European Journal of Cognitive Psychology, 19, 494–513.

    McDaniel, M. A., Roediger, H. L., III, & McDermott, K. B. (2007b). Generalizing test-enhanced learning from the laboratory to the classroom. Psychonomic Bulletin and Review, 14, 200–206.

    McDaniel, M. A., & Sun, J. (2009). The testing effect: Experimental evidence in a college course. Manuscript under revision.

    Marsh, E. J., Agarwal, P. K., & Roediger, H. L., III (2009). Memorial consequences of answering SAT II questions. Journal of Experimental Psychology: Applied, 15, 1–11.

    Marsh, E. J., Roediger, H. L., III, Bjork, R. A., & Bjork, E. L. (2007). The memorial consequences of multiple-choice testing. Psychonomic Bulletin and Review, 14, 194–199.

    Metcalfe, J., & Kornell, N. (2003). The dynamics of learning and allocation of study time to a region of proximal learning. Journal of Experimental Psychology: General, 132, 530–542.

    Moreno, R. (2004). Decreasing cognitive load for novice students: Effects of explanatory versus corrective feedback in discovery-based multimedia. Instructional Science, 32, 99–113.

    Nelson, T. O., & Dunlosky, J. (1991).

    & Narens, L. (1980). Norms of 300 general-information questions: Accuracy of recall, latency of recall, and feeling-of-knowing ratings. Journal of Verbal Learning and Verbal Behavior, 19, 338–368.

    Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework and new findings. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 26, pp. 125–173). New York: Academic Press.

    Odegard, T. N., & Koen, J. D. (2007). None of the above as a correct and incorrect alternative on a multiple-choice test: Implications for the testing effect. Memory, 15, 873–885.

    Pashler, H., Cepeda, N. J.,


    Remmers, H. H., & Remmers, E. M. (1926). The negative suggestion effect on true-false examination questions. Journal of Educational Psychology, 17, 52–56.

    Richland, L. E., Kornell, N., & Kao, L. S. (2009). The pretesting effect: Do unsuccessful retrieval attempts enhance learning? Journal of Experimental Psychology: Applied, 15, 243–257.

    Roediger, H. L., & Karpicke, J. D. (2006a). Test enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255.

    Roediger, H. L., & Karpicke, J. D. (2006b). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210.

    Roediger, H. L., McDaniel, M. A., McDermott, K. B., & Agarwal, P. K. (2009). Test-enhanced learning in the classroom: The Columbia Middle School project. Manuscript in preparation.

    Roediger, H. L., & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155–1159.

    Roediger, H. L., Zaromb, F. M., & Butler, A. C. (2008). The role of repeated retrieval in shaping collective memory. In P. Boyer and J. V.