Impact of teaching students to use evaluation strategies

Aaron R. Warren
Department of Mathematics/Statistics/Physics, Purdue University North Central, 1401 S. US-421, Westville, Indiana 46391, USA

(Received 30 January 2010; published 23 July 2010)

Students often make mistakes in physics courses and are expected to identify, correct, and learn from their mistakes, usually with some assistance from an instructor, textbook, or fellow students. This aid may come in many forms, such as problem solutions that are given to a class, tutoring to an individual student, or a peer discussion among several students. However, in each case a student relies upon an external agent in order to determine whether, and how, her work is mistaken. Consequently, the student's learning process is largely contingent upon the availability and quality of external evaluating agents. One may suspect that if a student developed the ability to evaluate her own work, her dependence on external agents could be moderated and result in an enhancement of her learning. This paper presents the results of a study investigating the impact of novel activities that aim to teach students when, why, and how to use the strategies of unit analysis and special-case analysis. The data indicate that it is possible to help students dramatically improve their understanding of each strategy, and that this has a significant impact on problem-solving performance.

DOI: 10.1103/PhysRevSTPER.6.020103    PACS numbers: 01.40.Fk

I. INTRODUCTION

Many students are almost completely reliant upon external evaluators in order to check and correct their work. Physics courses are generally structured so that students receive feedback from instructors, peers, or intelligent tutoring systems which enables the identification and correction of mistakes in the students' problem solutions. Since little or no explicit attention is paid to helping students develop the means with which to evaluate their own work, it is natural for the students to develop a belief that evaluation by external authorities is the only way to identify and learn from their mistakes. We begin by outlining a pair of strategies students can use to internally evaluate their own work, and which may thereby enable some degree of self-regulated learning.

II. EVALUATION STRATEGIES

The ability to effectively evaluate information has long been recognized as an important cognitive process and educational objective [1,2]. According to Anderson & Krathwohl, "Evaluation is defined as making judgments based on criteria and standards" (p. 83). In general, a given particular (i.e., piece of information) is evaluated by determining whether it satisfies some set of criteria to such a degree as to pass a pre-established standard. In physics, there are several types of criteria and standards, and general guidelines as to how the criteria and standards are to be applied. These associations of criteria, standards, and methods of application are called evaluation strategies.

There are many evaluation strategies for different types of particulars in physics. For example, a proposed problem solution may be evaluated using the strategy of unit analysis to check whether each equation in the solution is physically sensible. A proposed theoretical model can be evaluated by checking whether it is consistent with other models in certain limiting cases. An experimental result can be evaluated by developing an independent experimental method and checking whether the two experiments give consistent results.

The research reported here is guided by a belief that evaluation strategies play a critical role in developing a coherent, hierarchically organized, robust working knowledge of physics. Before discussing why it may be important for students to learn and use evaluation strategies, two such strategies are outlined below. Each of these strategies could be cast in the form of hypothetico-deductive reasoning [3,4], whereby a hypothesis regarding the given particular is first constructed and then tested. The type of information each strategy is meant to evaluate and the criteria by which the information is judged are summarized below.

A. Unit analysis

Unit analysis is used to evaluate equations to determine whether they are physically sensible, having the same units for each term. If the equation is found to have inconsistent units on some terms, there are three possible reasons: (a) the equation is incorrect; (b) the student incorrectly remembers the units for some quantities; (c) the student made an algebraic mistake in her analysis.

Ideally, if the equation is incorrect, the student should determine exactly how the equation fails the unit analysis and then figure out which quantities and operations need to be added or removed in order to satisfy the unit analysis. In this way the student can, in principle, correct the equation to make it physically coherent.
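To make the mechanics of the strategy concrete, here is a minimal sketch of a unit analysis automated with the third-party Python library pint; the kinematics equation and all quantity values are illustrative assumptions, not material from the study.

```python
import pint

ureg = pint.UnitRegistry()

# Check x = x0 + v0*t + (1/2)*a*t**2: every term must carry units of length.
x0 = 5.0 * ureg.meter
v0 = 3.0 * ureg.meter / ureg.second
a = 9.8 * ureg.meter / ureg.second ** 2
t = 2.0 * ureg.second

x = x0 + v0 * t + 0.5 * a * t ** 2  # consistent: all three terms are lengths
print(x)  # 30.6 meter

# A term with the wrong units, e.g. adding v0 without the factor of t,
# is caught immediately:
try:
    bad = x0 + v0
except pint.DimensionalityError as err:
    print("unit analysis failed:", err)
```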

B. Special-case analysis

Special-case analysis is used to evaluate an equation, model, or conceptual claim to determine whether it is consistent with prior knowledge and experience. Any equation, model, or claim is meant to be true for some range of physical situations. This strategy, as defined for the purposes of this study, requires the students to choose some specific situation of which they have prior knowledge, and which also lies within or at the limits of the range of applicability. They then determine what the equation, model, or claim predicts for this situation and compare the prediction with their prior knowledge of what actually happens. Essentially, the strategy is to conduct a thought experiment, determining whether the equation, model, or claim makes sense based on prior knowledge.

If the equation, model, or claim is found to be inconsistent with prior knowledge, there are three possible reasons: (a) the equation, model, or claim is incorrect; (b) the student's prior knowledge of the situation is incorrect; (c) the student made a mistake in the execution of the analysis.

If the equation, model, or claim is incorrect, the student should determine exactly how the equation, model, or claim fails the special-case analysis. The goal is to figure out how the equation, model, or claim needs to be modified in order to make it consistent with the prior knowledge. While this strategy is quite general, the actual process is highly dependent on the details of the physical model being employed and the particular context of the problem.

III. REMARKS

It is important to note that each evaluation strategy is subject to errors, as it is wholly possible for the student to make a mistake while using these strategies. Therefore, the evaluation strategies listed above provide a useful, though imperfect, means for judging the validity and soundness of proposed solutions to quantitative questions. Also, there are certainly many other evaluation strategies used in physics, such as error analysis to evaluate an experimental result [5]. Research has found that it is possible and beneficial to help students adopt certain paradigms and strategies for evaluating experimental data [6,7]. One of the major instructional challenges identified by Lippmann was helping students adopt a frame which values the theory of measurement, as many students were largely unaware that such a theory exists and is essential to good science. Based on this, one may fully expect that one of the difficulties in helping students learn the evaluation strategies of unit and special-case analysis will be helping the students to adopt a frame which values these strategies.

IV. EVALUATION AND LEARNING

Evaluation strategies may serve a role in regulating the structure of schemas employed by students in answering questions [8-11]. That is, an evaluation strategy may be linked to an array of schemata responsible for context-specific activities, such as solving inclined-plane problems. For the purposes of this paper, a schema is defined to be an interconnected network of knowledge elements that can be triggered by a variety of inputs, which attempts to organize and assign meaning to elements of the input and then utilize the input to produce some set of cognitive outputs. When a context-specific schema is activated, an evaluation strategy has some probability of being activated as well. The use of an evaluation strategy can lead the student to recognize that the problem-solving schema is incoherently structured or gives results that are inconsistent with the results of another schema. Upon such recognition, the student may correct her own mistake, consequently restructuring the associated schema. In other words, it is possible that evaluation strategies may be one of the agents responsible for establishing local and global coherence [12], and also for modifying the conditions under which a particular schema is activated.

As a simple illustrative example, consider a student working on the homework problem shown in Fig. 1. The student's attempted solution is also shown. In this case, the student's work is fine except for the incorrect use of trigonometry when determining the force components. It may be that the student used cos and sin as she did because she has a strong schematic connection that associates cos with the x axis and sin with the y axis. By doing a special-case analysis of her solution, say by examining the case when θ = 0°, the student may realize that she has made a mistake. If θ = 0°, then we expect F_N = Mg, since the incline will be level, requiring the normal force to completely balance the gravitational force. However, plugging θ = 0° into the student's solution gives F_N = 0 N. This analysis therefore indicates that the student made a mistake in the way she dealt with the angle in her solution.

If the student tries the simplest alternative by swapping the sin and cos functions in her solution, she will get a new solution that does pass this special-case analysis. This process of evaluation and self-correction could thereby alter the student's schema, as she may in the future be less likely to blindly associate cos with the x axis and sin with the y axis. In particular, a successful special-case analysis would elaborate on why cos should be associated with the y axis and sin with the x axis in this case. The links between knowledge elements in the student's mind may consequently be revised to account for this surprising result. Although there are many steps where a student may become lost or make a mistake, it is possible that the use of evaluation strategies can serve a regulatory function in the organization of student knowledge.
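Purely as an illustration (the study's tasks were done by hand, on paper), a special-case analysis of this kind can be expressed as a small numerical check; the function names and the choices of M and g below are hypothetical.

```python
import math

def proposed_normal_force(M, g, theta):
    # Student's proposed solution from Fig. 1: F_N = M g sin(theta)
    return M * g * math.sin(theta)

def passes_level_surface_case(solution, M=2.0, g=9.8):
    # Special case theta = 0: the incline is level, so we expect F_N = M g.
    return math.isclose(solution(M, g, 0.0), M * g)

print(passes_level_surface_case(proposed_normal_force))                 # False
print(passes_level_surface_case(lambda M, g, t: M * g * math.cos(t)))   # True
```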

Without this evaluation strategy, a student's ability to revise and regulate her understanding is much more limited. If the sole means for evaluation and feedback lie outside our students, they become dependent upon external agents in order to engage in learning. This dependence strongly constrains the range of times and places in which a student has the opportunity to learn, as the majority of student learning can only occur with the feedback of an instructor, textbook,

FIG. 1. An illustrative example of a traditional homework problem and a possible student solution.
Question: What is the normal force exerted on a block of mass M by a surface which is inclined at an angle θ above the horizontal?
Student solution (after substituting F_E = Mg):
  X: −Mg cos(θ) = Ma_x
  Y: F_N − Mg sin(θ) = Ma_y = 0 N
  Therefore F_N = Mg sin(θ)


or peer. In many college-level courses, especially large-enrollment courses, each student has only a few hours each week in the presence of an instructor, and only during a fraction of that time does an instructor directly interact with each student. Textbooks fail to provide any sort of dynamic feedback for students, and peer feedback may be limited by a lack of availability and often may be incorrect or misleading. Consequently, it should come as no surprise that students often learn far too little in physics courses, although they can learn more when courses are structured to facilitate better evaluative feedback to students via interactive-engagement methods [13,14], such as implementations of the University of Washington Tutorials [15] at the University of Colorado [16], Peer Instruction [17,18], and intelligent tutoring systems [19,20]. In each case, a student is given more extensive access to finely-tuned external evaluation and hence has a greater opportunity to learn.

If students had the ability to evaluate their own work, their learning would no longer be completely contingent upon external evaluators. Instead, students would be able to identify and correct their own mistakes, allowing them to better learn on their own. In fact, Zimmerman and Martinez-Pons identified self-evaluation as a necessary component in their model of self-regulated learning [21], and self-evaluation has been shown to promote self-regulated learning in young students [22]. Likewise, Hammer [23] has identified the importance of student "independence," which typifies the extent to which a student takes responsibility for constructing their own knowledge instead of simply accepting what is given by authorities without any evaluation.

An important potential benefit of student self-evaluation is that it may help students develop authentic science reasoning abilities, a major goal of science education [24-26]. In particular, the use of evaluation strategies highlights the fact that our judgments of theoretical models can produce false positives and negatives, and this limits our confidence in such judgments. Recognition of the sources of reasoning errors is fundamental to many aspects of critical thinking [27] and reflective judgment [28]. It seems reasonable to believe, then, that these abilities may be enhanced if instructors can help students to value and use evaluation strategies.

V. TEACHING AND ASSESSING EVALUATION

To help students learn how and why to use evaluation strategies, a set of formative activities and rubrics was developed [29,30]. In general, students must understand the goal state they are trying to achieve, their current level of performance, and how to utilize descriptive feedback to improve their performance. A scoring rubric is one tool that can be used to help achieve these conditions. The rubrics break up abilities into finer-grained component abilities and contain descriptions of different levels of performance, including the target level. A student or a group of students can use the rubrics to self-assess her or their own work, or an instructor can use the rubrics to assess students' responses and provide feedback [31-33]. Additionally, written and verbal comments on student work may be given to personalize the feedback and address particular issues in a student's work.

VI. EVALUATION TASKS AND RUBRICS

During the 2002-2003 and 2003-2004 academic years, a library of tasks was designed, tested, and refined to help students learn the evaluation strategies of unit analysis and special-case analysis. Below is a list of the types of tasks featured in this study. An example packet of these and other types of evaluation tasks can be downloaded from http://paer.rutgers.edu/ScientificAbilities/Kits/default.aspx.

External unit analysis: A problem and proposed solution are given, and the student must do a unit analysis to evaluate and possibly revise the given solution.

External special-case analysis: A problem and proposed solution are given, and the student must do a special-case analysis to evaluate and possibly revise the solution.

Conceptual counterexample: A conceptual claim is made, and the student is asked whether they agree or disagree and to justify their opinion. In many cases the most appropriate strategy is to do a special-case analysis, although there is no specific prompting to use any particular strategy.

Integrated tasks: The student must solve a problem, then do unit and/or special-case analysis to evaluate their solution and possibly revise it.

Critical thinking tasks: The student is given a problem and proposed solution, and asked to give three independent arguments which each analyze whether the given solution is reasonable. There are no explicit prompts to use any particular strategy, so the student must spontaneously recognize that special-case and unit analysis can each be used to generate arguments, and then use these strategies to make valid and sound arguments.

These tasks may be used in recitation or homework assignments. Using the scoring rubrics and giving descriptive comments on student work for evaluation tasks provides formative assessment of the students' work. The scoring rubrics rate student performance on a scale of 0-3, with the following general meanings: 0 = no meaningful work done; 1 = student attempts but does not understand the general method for completing the task; 2 = student understands the general method, but her execution is flawed; 3 = student's method and execution of the task are satisfactory. The process by which we developed the rubrics is discussed in Etkina et al. [34].

A rubric score of 2 indicates the student tried applying all the steps of the special-case analysis (SCA) but made some superficial mathematical or conceptual error along the way. In other words, the student demonstrates that she understands the context-independent strategic reasoning process of SCA but commits some minor mistake in using the context-specific information during the application of the SCA, e.g., forgetting a minus sign or confusing x and y components of a vector. So the difference between a 2 and a 3 is only indicative of how well the student used the particular context-specific information in the task; to get either score, the student must demonstrate sound use of the general SCA reasoning process. Our final set of evaluation rubrics yielded an average inter-rater agreement level of 96% (ranging between 91% and 99%) and a Cohen's kappa [35] of 0.947 (p < 0.001). The rubrics are available for download at http://paer.rutgers.edu/ScientificAbilities/Rubrics/default.aspx as Rubric I.
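For readers who wish to compute a comparable agreement statistic for their own rubric data, a minimal sketch using scikit-learn's cohen_kappa_score follows; the two raters' scores are invented placeholders, not the study's data.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical rubric scores (0-3) assigned by two raters to ten responses.
rater_a = [3, 2, 2, 0, 1, 3, 3, 2, 1, 0]
rater_b = [3, 2, 2, 0, 1, 3, 2, 2, 1, 0]

# Agreement corrected for chance; ~0.865 for this made-up data.
print(cohen_kappa_score(rater_a, rater_b))
```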


VII. STUDY DESIGN AND RESULTS

Preliminary research conducted during the 2003-2004 academic year demonstrated significant correlations between students' abilities to use evaluative strategies and their problem-solving performance [36]. During the 2004-2005 academic year, a study was conducted to further investigate the effects of using evaluation tasks in an algebra-based physics course. After discussing the design of the study, results are presented addressing three research goals: (1) measure students' abilities to use evaluation strategies; (2) investigate the extent to which students valued and incorporated evaluation strategies into their personal learning behavior; (3) test whether the use of evaluation tasks resulted in significant improvements in student problem-solving performance.

To accomplish these goals, data were collected from two courses at Rutgers University as part of a quasiexperimental control-group study. The experiment group consists of the 193-194 course, a year-long introductory algebra-based physics course for life science majors. The gender distribution was 46% male, 54% female. There were 200 students for each term, with 95% of students participating in both the 193 and 194 courses. The comparison group is the year-long 203-204 course, another introductory algebra-based course for life science majors. There were 459 students for the fall term, and 418 of those students continued enrollment in the spring term. These two courses were generally run in parallel, although the 203-204 class covers slightly more material, and the students in this class typically have stronger math and science backgrounds than in the 193-194 course (it is the more competitive course, held on a different university campus). This likely bias was substantiated by the data and will serve to accentuate the results of our study, as discussed below.

The lectures for 203-204 were all designed and given by Alan Van Heuvelen. The lectures for the 193 course were given by Sahana Murthy, a post-doctoral researcher working with the Rutgers Physics & Astronomy Education Group at the time, and the lectures for the 194 course were given by the author. All lectures for 193 and 194 were based on Van Heuvelen's lecture notes. Lectures for all courses involved chalkboard and transparency-based presentations featuring experimental demonstrations and student engagement via peer discussions and student infrared response systems. The chalkboard and transparency-based presentations often began with an experimental demonstration to establish a concrete example of some new class of phenomena to be studied. Questions about the properties of the phenomena would be elicited and/or posed by the lecturer. These questions would motivate the construction of a physical model. Application of the model to solve conceptual and traditional problems would then be illustrated. This entailed making multiple representations of information, assessing consistency between the different representations, and utilizing the representations together with the physical model in order to solve the problem. Students would then work in groups on one or two problems, respond via infrared response systems, and receive some feedback from the lecturer. It should be noted that lectures in the 203-204 courses did include explicit modeling of unit analysis.

One premise of the study was the ability to provide very similar lecture environments for the two courses. However, some stylistic and personal differences in lecturing were unavoidable, and the threat to the study's internal validity will be assessed below.

Recitations for both courses involved 25 students working in groups of 3-5 on a recitation assignment, with a teaching assistant there to provide help as needed. Homework assignments featured problems which were distinct from, but usually similar to, problems from the recitation assignments. Solutions for recitation and homework assignments were posted online for each course. The recitation and homework assignments given by Van Heuvelen for 203-204 were generally identical to those given in 193-194, except that, in 193-194, some of the problems from the 203-204 assignments were replaced by evaluation activities. In general, there were between one and two such replacements each week, with one replacement in the homework assignment and usually one replacement in the recitation assignment.

This replacement served as our experimental factor, being the only designed difference between the two courses. We attempted to minimize any potential bias due to time-on-topic differences by only replacing problems which covered the same material and which were thought to take roughly the same amount of time to complete as the evaluation tasks which were used in their place. Some differences in the recitation and homework assignments were inevitable, as the 203-204 course is required to cover more material and had 55-min recitations, while the 193-194 course had 80-min recitations. Again, we will address the threats posed by such factors later.

The evaluation tasks used in recitations and homework for 193-194 covered only half of the topics in the course. For example, we did not include any evaluation tasks relating to momentum, fluid mechanics, or wave optics, although tasks were included on work-energy, the first law of thermodynamics, and dc circuits (see Warren [37] for a list of specific evaluation tasks used). This is an important feature of the study, since it provides a baseline response which will be useful when testing whether the use of evaluation tasks affects problem-solving performance.

Laboratories for both courses were practically identical, with 25 students per section working in groups of 3-5 with the aid of a teaching assistant. The laboratories featured three types of experiments: investigative, testing, and application. Each laboratory usually included two experiments, most of which were either testing or application experiments. It should be noted that although the 193-194 students were required to take the laboratory, the 203-204 students were not, as the laboratories constitute a distinct course labeled 205-206. However, nearly all (98%) of the 203-204 students were enrolled in 205-206 concurrently.

Another important set of factors in the study are the teachers themselves. The teaching assistants in 193-194 were atypical, as 4 of the 8 assistants for the course were enrolled at the Graduate School of Education, and another 2 assistants were members of the Rutgers Physics and Astronomy Education Group. Only 1 of the 8 teaching assistants was a traditional physics graduate student. In contrast, the teaching assistants for 203-204 included a mixture of several professors and several physics graduate students, none of whom were in physics education. These are, at least superficially, very distinct populations. This uncontrolled factor, and the threats it poses to the study's internal validity, shall be discussed below.

VIII. GOAL 1: TEACHING EVALUATION STRATEGIES

The 193-194 and 203-204 courses both had six exams during the year. Each exam featured a combination of multiple-choice questions and open-response tasks, with roughly two-thirds of the test score based on multiple-choice performance and one-third based on the open-response tasks. A typical midterm exam (exams 1, 2, 4, 5) had between 10 and 15 multiple-choice questions and 3 open-response tasks. Final exams (exams 3 and 6) had roughly double the number of items of each type. An evaluation task was included as an open-response task on each exam for both courses. These tasks served as summative assessments to measure students' evaluative abilities. Student responses to the evaluation tasks on each exam were photocopied and scored using our rubrics. Although we photocopied and scored each student's work from 193-194, the large number of students in 203-204 made it impossible to do the same for them. We therefore randomly selected 150 students from the 203-204 course whose responses to the evaluation tasks on exams would be photocopied and scored. A check was performed to determine whether this sample was representative of the entire class: a Kruskal-Wallis test indicated that the grades for the 203-204 sample were not significantly different from those of the remainder of the class.
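A representativeness check of this kind can be reproduced along the following lines with scipy; the grade arrays here are randomly generated stand-ins, since the actual course grades are not published.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical course grades: the 150-student sample vs. the other 309 students.
sample_grades = rng.normal(75, 10, size=150)
rest_grades = rng.normal(75, 10, size=309)

H, p = stats.kruskal(sample_grades, rest_grades)
print(f"H = {H:.2f}, p = {p:.3f}")  # a large p gives no evidence the sample differs
```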

The evaluation tasks on exams 1 and 3 were external special-case analysis tasks. These allowed us to measure how well the students could use this strategy when explicitly asked to. The evaluation tasks on exams 2, 4, and 6 were critical thinking tasks. These allowed us to see what fraction of the students recognized that special-case and/or unit analysis could be used to construct such arguments, as would be demonstrated by their spontaneous use of these strategies. For those students that did attempt to use these strategies, the rubrics were used to measure the quality of their use. Each of these tasks involved topics on which the 193-194 students had had both special-case and unit analysis tasks (e.g., dc circuits). The evaluation task on exam 5 was a conceptual counterargument task, which allowed us to see what fraction of students tried to use special-case analysis and how well they used it. For a list of the exam evaluation tasks, see Warren [37]. Table I lists the results from these tasks. The reported numbers for the quality of each strategy's use on exams 2, 4, 5, and 6 are the average rubric scores of those students who decided to try using that particular strategy on the critical thinking tasks included in those exams. The values in columns 3 and 4 state the fractions of each class using each strategy, and were determined by finding the fraction of the class that had a score of at least 1 according to our rubrics. In column 4, a value of 100% indicates that students were explicitly instructed to use a particular strategy on an evaluation task. Values of N/A in columns 3 and 5 indicate that no evaluation tasks relating to UA were included. All differences in fractional use between the two classes are statistically significant at the 0.01 level according to a χ² test. Note that column 4 shows 100% for exams 1 and 3 because the tasks given on those exams explicitly told students to do a special-case analysis.
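The χ² test of independence referenced here can be carried out as sketched below; the counts are hypothetical, back-derived from the exam 4 fractions in Table I and approximate class sizes, purely to show the computation.

```python
from scipy import stats

# Rows: used SCA / did not; columns: 193-194 (~38% of 200) vs 203-204 (~9% of 150).
table = [[76, 14],
         [124, 136]]

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")  # p falls well below 0.01
```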

There were typically 8-12 students in our sample from 203-204 who tried using special-case analysis on the evaluation tasks from those exams, and their average quality of use according to our rubric was roughly 2.3 for each exam. No more than 1 student in our sample from 203-204 ever used unit analysis on the critical thinking tasks.

There are a few points to be made here. First, the results indicate that we were successful at helping the 193-194 students understand when, why, and how to use both strategies. They significantly outperformed the 203-204 students in the frequency of use of each evaluation strategy, and eventually demonstrated high quality use of each strategy as measured by the rubrics. The increases in special-case analysis quality from exam 1 to exam 3, and in the fraction using it from exam 2 to exam 4, are in part attributable to the fact that the

TABLE I. Measurements of students' evaluative abilities for unit analysis (UA) and special-case analysis (SCA) on each of the six exams.

Exam | Class   | Fraction UA | Fraction SCA | Quality UA  | Quality SCA
1    | 193-194 | N/A         | 100%         | N/A         | 1.17 ± 0.07
     | 203-204 | N/A         | 100%         | N/A         | 0.81 ± 0.05
2    | 193-194 | 0.89        | 0.18         | 2.48 ± 0.04 | 1.25 ± 0.08
     | 203-204 | 0.01        | 0.05         | 3.00 ± 0.00 | 2.33 ± 0.19
3    | 193-194 | N/A         | 100%         | N/A         | 2.26 ± 0.04
     | 203-204 | N/A         | 100%         | N/A         | 0.90 ± 0.05
4    | 193-194 | 0.53        | 0.38         | 2.73 ± 0.06 | 2.23 ± 0.07
     | 203-204 | 0.01        | 0.09         | 3.00 ± 0.00 | 2.30 ± 0.13
5    | 193-194 | N/A         | 0.42         | N/A         | 2.34 ± 0.08
     | 203-204 | N/A         | 0.05         | N/A         | 2.33 ± 0.19
6    | 193-194 | 0.38        | 0.38         | 2.35 ± 0.09 | 2.25 ± 0.07
     | 203-204 | 0.01        | 0.07         | 3.00 ± 0.00 | 2.25 ± 0.13


ratio of types of tasks given in recitation and homework assignments was altered after exam 2. Up until then, roughly half of the evaluation tasks included in these assignments focused on unit analysis, while the other half dealt with special-case analysis. After seeing the results from exam 2, though, it was realized that students were having much more success with unit analysis, perhaps because it is a much simpler strategy than special-case analysis. Thereafter, the majority of evaluation tasks in recitations and homework focused on special-case analysis, and a substantial fraction (~40%) of the students are observed to spontaneously employ SCA on the exam evaluation activities.

It should be clarified that on exams 4 and 6, the set of 193-194 students who used unit analysis and the set of those that used special-case analysis were not identical, even though the fractional usage was similar for both strategies. The overlap between these two sets, defined as the ratio of their intersection to their union, was 0.45 for exam 4 and 0.53 for exam 6.

It is interesting that special-case analysis was used at all among the 203-204 students on the critical thinking tasks on exams 2, 4, 5, and 6, as a group of roughly 8-12 students spontaneously employed SCA with a high level of quality (average rubric scores of roughly 2.3). Apparently, some students may enter our physics courses with a well-developed understanding of special-case analysis, perhaps due to prior physics courses. It is also worth noting that 203-204 students who tried using special-case analysis for the open-response exam problem typically scored very well on the multiple-choice questions, with 38% of them earning perfect scores.

IX. GOAL 2: USING AND VALUING EVALUATION STRATEGIES

As a means to assess student valuation and use of evaluation strategies, anonymous surveys were administered at the last recitation of the spring term in the Physics 194 course. Included were several questions asking students to rate how frequently they had used each strategy to evaluate their own work outside of class, and also to rate how strongly certain factors inhibited their use of each strategy. These survey questions are shown in Fig. 2, with the results in Table II. There were 158 respondents to the survey out of the 200 students in the course, giving a response rate of 79%. There may be a selection bias among the respondents, as the students knew that one recitation grade would be dropped from the final grade and that the last recitation would be a review session instead of a normal recitation. For that reason, upper and lower bounds are listed on the average scores in Table II, where the bounds are determined by assuming all missing respondents would have given either the highest or lowest possible response for the questions.

To conduct a Cronbach alpha test for internal consistency, the scores for questions 3a-3d and 6a-6d were made negative, accounting for the fact that these inhibitors are likely to reduce the usage and valuation of evaluation strategies by students (i.e., all scores were coded in the same conceptual direction). The calculated alpha value is α = 0.800, a strong result, giving confidence that the items are consistent and can therefore be used as a basis for interpretation about student usage and valuation of evaluation strategies.
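For reference, Cronbach's alpha can be computed directly from a response matrix as in the following sketch; the 158 × 12 score matrix here is randomly generated, so it stands in for, and will not reproduce, the study's α = 0.800.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, n_items) array of sign-coded scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical matrix: 158 respondents x 12 items, with 3a-3d and 6a-6d
# already negated as described above.
rng = np.random.default_rng(1)
scores = rng.integers(1, 6, size=(158, 12)).astype(float)
print(cronbach_alpha(scores))  # near 0 for random data, by construction
```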

One may safely assume that there were very few students who came into the 193-194 course already knowing, valuing, and using either evaluation strategy. This assumption is supported by the small number of students from 203-204 who used either strategy on the tasks in exams 2, 4, 5, and 6 (see above). Given that assumption, the results suggest a moderate degree of success in teaching students to incorporate these strategies into their personal learning behavior. Indeed, responses to questions 1 and 4 were probably artificially lowered due to the fact that we only included evaluation tasks relating to half of the topics during the year. If evaluation tasks had been given on all topics, it seems likely that students would have used the evaluation strategies more frequently. It is worth pointing out that the evidence here has limited inferential strength because of its indirectness. Further studies with more direct means of assessing whether students learn when and why to employ evaluation strategies would be useful.

FIG. 2. End-of-year survey questions regarding the students' self-reported use of each evaluation strategy.

1. How much have you used special-case analysis on your own to learn/do physics?
   (1=Not at all, 2=Not Much, 3=Somewhat, 4=Very, 5=Extremely)

2. If you were a physics major and really interested in learning physics, how useful do you think special-case analysis would be for your learning?
   (1=Not at all, 2=Not Much, 3=Somewhat, 4=Very, 5=Extremely)

3. Rate on a scale from 0-10 how much each of the listed factors affected your desire to do special-case analysis. (10 means the factor really made you not want to do special-case analysis; 0 means the factor did not matter.)
   a) ___ Time constraints (due to other classwork, jobs, etc.)
   b) ___ Motivation to learn physics (or lack thereof)
   c) ___ Confusion about how to do special-case analysis (if you weren't confused, put 0)
   d) ___ Confusion about the purpose of a special-case analysis (if you weren't confused, put 0)

4. How much have you used dimensional analysis on your own to learn/do physics?
   (1=Not at all, 2=Not Much, 3=Somewhat, 4=Very, 5=Extremely)

5. If you were a physics major and really interested in learning physics, how useful do you think dimensional analysis would be for your learning?
   (1=Not at all, 2=Not Much, 3=Somewhat, 4=Very, 5=Extremely)

6. Rate on a scale from 0-10 how much each of the listed factors affected your desire to do dimensional analysis. (10 means the factor really made you not want to do dimensional analysis; 0 means the factor did not matter.)
   a) ___ Time constraints (due to other classwork, jobs, etc.)
   b) ___ Motivation to learn physics (or lack thereof)
   c) ___ Confusion about how to do dimensional analysis (if you weren't confused, put 0)
   d) ___ Confusion about the purpose of a dimensional analysis (if you weren't confused, put 0)

TABLE II. Results from end-of-year survey questions.

Question | Average response | Lower bound | Upper bound
1        | 2.2 / 5          | 1.9         | 2.8
2        | 3.8 / 5          | 3.2         | 4.1
3a       | 7.0 / 10         | 5.5         | 7.6
3b       | 5.1 / 10         | 4.0         | 6.1
3c       | 2.4 / 10         | 1.9         | 4.0
3d       | 2.2 / 10         | 1.7         | 3.8
4        | 2.9 / 5          | 2.5         | 3.3
5        | 3.9 / 5          | 3.3         | 4.1
6a       | 5.9 / 10         | 4.7         | 6.7
6b       | 4.7 / 10         | 3.7         | 5.8
6c       | 1.7 / 10         | 1.3         | 3.4
6d       | 1.6 / 10         | 1.3         | 3.4


The relatively low response scores to questions 3c, 3d, 6c, and 6d are consistent with the results for goal one of our study, indicating we were successful at helping students understand when, why, and how to use each strategy. The fact that the scores for 3c and 3d are higher than 6c and 6d appears reasonable, because special-case analysis is certainly a much more complicated and multifaceted strategy than unit analysis. This disparity in complexity probably also explains why the scores for 3a and 3b were higher than 6a and 6b.

X. GOAL 3: ENHANCING PROBLEM-SOLVING PERFORMANCE BY TEACHING EVALUATION

Several conventional multiple-choice questions were common to each of the 193-194 and 203-204 lecture exams. There were six exams during the year, three per semester. The midterm exams (exams 1, 2, 4, and 5) were 80 min, while the final exams (exams 3 and 6) were 3 h. The exam questions shared by the two classes were all designed by the instructor of the 203-204 course (Van Heuvelen). Some of these shared multiple-choice questions were on topics for which the 193-194 students had had evaluation tasks, such as work-energy and dc circuits. This set of multiple-choice questions will be called E-questions. The remainder of the shared multiple-choice exam questions covered topics on which no one had had evaluation tasks, such as momentum and fluid mechanics. These will be called NE-questions. The E- and NE-questions are listed in Appendix B of Warren [37] and are assumed to provide accurate measures of student problem-solving performance [38]. This assumption mitigates the strength of the results reported below, and more robust measures of problem-solving performance would be useful in future work. Roughly 80% of these questions were numerical, and the remaining 20% were either conceptual or symbolic. Some of the conceptual questions were directly taken or modified from standardized assessment tools such as the FCI [39] and CSEM [40]. All of the questions had been used by Van Heuvelen in previous years.

The first midterm exam had 2 NE-questions and no E-questions, as there had not been any evaluation tasks used to this point in the 193-194 course. There were 2 E-questions and 2 NE-questions on midterm exams 2, 4, and 5. Exam 3 had 4 NE-questions and 3 E-questions, and exam 6 had 5 NE-questions and 4 E-questions.

Given this design, we may make two predictions based on the view of evaluation strategies as tools of self-regulated learning. First, because of the known population bias between the 193-194 and 203-204 students, the 203-204 students are expected to do better on the NE-questions. This prediction assumes that whatever benefits the evaluation tasks may have for the 193-194 students' E-question performance will not be transferable to the topics covered by NE-questions.

A second prediction is that the performance of the 193-194 students should be relatively better on E-questions than on NE-questions, if the evaluation tasks succeed in benefiting student understanding of the topics covered by the tasks. Moreover, this boost in E-question performance should vary as students become more adept at the use of evaluation strategies. Therefore, it is predicted that student performance on the open-response evaluation tasks included on each exam will directly correlate with the relative performance of 193-194 students on E-questions.

The relative performances of the 193-194 and 203-204 students on E- and NE-questions are compared in Fig. 3 and Table III. In Fig. 3, the normalized difference Δ between the two classes for each E- and NE-question is computed as

    Δ = (C_exp − C_control) / (C_exp + C_control),

where C_exp and C_control denote the percentage of students in each condition who correctly answered the problem. A positive value therefore indicates that the experiment group (193-194) outperformed the control group (203-204) on a particular problem, and vice versa for a negative value. The magnitude gives a measure of how much of a difference there was between groups, and it exaggerates differences on problems where the sum C_exp + C_control is small. That is, it rewards improvements made on "difficult" problems more than improvements made on "easy" problems.
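A direct transcription of this formula, applied to one row of Table III as a worked example:

```python
def normalized_difference(c_exp: float, c_control: float) -> float:
    # Delta = (C_exp - C_control) / (C_exp + C_control), with rates as fractions.
    return (c_exp - c_control) / (c_exp + c_control)

# The Ex2 E 1 row of Table III: 25.8% vs 40.9% correct.
print(round(normalized_difference(0.258, 0.409), 3))  # -0.226
```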

The NE-question results show that the 203-204 students often did significantly better on problems relating to topics for which both classes had very similar learning environments, with the differences on 7 NE-questions being significant at the 0.01 level and another 2 at the 0.05 level, all with the 203-204 sample doing better than the 193-194 sample. This observation was anticipated due to the known selection bias in this study. Given this bias, it would have been an achievement simply to bring the 193-194 students to a

FIG. 3. Plots of the normalized difference Δ between each condition on multiple-choice exam questions, as a function of exam number (two panels: NE-questions and E-questions).


comparable level of performance on the E-questions. In fact, the results show that by exam 3 the performance of the 193-194 students on E-questions was not only comparable to, but was better than, that of the 203-204 students, although only three of the positive differences on E-questions were significant at the 0.01 level.

The favored hypothesis to explain these data is that the improved relative performance of the 193-194 students on E-questions was due to the use of our evaluation tasks. However, the strength of this hypothesis may be mitigated by the presence of uncontrolled factors in the study. Although students were not randomly assigned to the conditions, it is difficult to see how this may account for the observed differences on NE- and E-questions. Differences between the teaching populations for the two classes also fail to provide a reasonable mechanism for producing the differences on E-questions and NE-questions simultaneously. If there were any sort of "good teacher effect," it would be likely to affect the results for both E- and NE-questions.

Another threat to internal validity stems from the fact that the lecturers were not blind to the study and may have unintentionally skewed the results. This potential experimenter bias can be argued against because the lectures for 193-194 and 203-204 were designed to be as similar as possible, and it is not at all clear how stylistic differences could cause such preferential performance differences between E- and NE-questions in any consistent fashion. All topics, whether tested by E- or NE-questions, were covered in very similar fashions during lectures, recitations, and laboratories. Also, the fact that these performance differences developed and persisted through two different lecturers for 193-194 suggests that differences in lecture style were probably not a significant causal factor.

Another alternative hypothesis is that the results are due to time-on-topic differences in the recitation and homework assignments. Although recitation and homework assignments were designed to minimize such differences, no actual measurements were made. It is possible that the evaluation tasks

TABLE III. Cross tabulation of performance on multiple-choice (MC) exam problems and the exact significance (2-tailed) for chi-squared tests of independence between each class. * = significant at the 0.05 level; ** = significant at the 0.01 level.

Exam, Problem | 193-194 correct | 203-204 correct | p-value
Ex1 NE 1      | 71.1%           | 79.7%           | 0.026*
Ex1 NE 2      | 48.7%           | 65.5%           | 0.001**
Ex2 E 1       | 25.8%           | 40.9%           | 0.001**
Ex2 E 2       | 81.3%           | 88.0%           | 0.049*
Ex2 NE 1      | 53.8%           | 49.4%           | 0.361
Ex2 NE 2      | 52.2%           | 75.4%           | 0.001**
Ex3 E 1       | 83.1%           | 83.2%           | 1.000
Ex3 E 2       | 80.3%           | 76.4%           | 0.330
Ex3 E 3       | 78.1%           | 71.2%           | 0.124
Ex3 NE 1      | 80.9%           | 77.4%           | 0.380
Ex3 NE 2      | 97.2%           | 95.4%           | 0.368
Ex3 NE 3      | 69.1%           | 82.0%           | 0.001**
Ex3 NE 4      | 45.5%           | 55.1%           | 0.038*
Ex4 E 1       | 96.4%           | 92.2%           | 0.062
Ex4 E 2       | 91.1%           | 73.1%           | 0.001**
Ex4 NE 1      | 88.5%           | 96.6%           | 0.001**
Ex4 NE 2      | 41.1%           | 55.9%           | 0.001**
Ex5 E 1       | 68.6%           | 40.3%           | 0.001**
Ex5 E 2       | 95.7%           | 91.5%           | 0.081
Ex5 NE 1      | 89.4%           | 87.0%           | 0.496
Ex5 NE 2      | 93.1%           | 91.8%           | 0.622
Ex6 E 1       | 91.3%           | 78.7%           | 0.001**
Ex6 E 2       | 69.9%           | 64.5%           | 0.244
Ex6 E 3       | 78.7%           | 74.0%           | 0.268
Ex6 E 4       | 54.6%           | 46.3%           | 0.072
Ex6 NE 1      | 60.1%           | 69.0%           | 0.050
Ex6 NE 2      | 50.3%           | 71.5%           | 0.001**
Ex6 NE 3      | 62.3%           | 64.0%           | 0.768
Ex6 NE 4      | 63.9%           | 78.9%           | 0.001**
Ex6 NE 5      | 70.5%           | 76.5%           | 0.146


simply took longer for students to complete, and that their performance on E-questions was due not to the format of the activity but simply to the fact that they spent more time thinking about the concepts involved in the activity. However, anecdotal evidence from teaching assistants who helped students with their recitation and homework assignments suggests that students did not take an inordinate amount of time to complete the evaluation tasks. Also, there was no indication from student comments made to the teaching assistants that they felt the evaluation tasks took much longer than other recitation and homework problems.

While the results above indicate that the use of evaluation tasks benefited student problem-solving performance, we can further test this hypothesis by looking for a concrete association between the strength of students' evaluation abilities and their problem-solving performance. We construct two ad hoc measures to reflect students' apparent understanding of when, why, and how to use each evaluation strategy for a certain topic. The measures are

    SCA_relative = FS_exp · QS_exp − FS_control · QS_control,

    UA_relative = FU_exp · QU_exp − FU_control · QU_control,

where SCA stands for special-case analysis and UA stands for unit analysis; FS is the fraction of the class (experimental or control) which used special-case analysis, and FU is the fraction which used unit analysis, on the evaluation tasks included on exams 2, 4, 5, and 6. UA_relative is not applicable for exam 5 due to the task format (conceptual counterargument). Also, QS is the average quality of the class' special-case analyses and QU is the average quality of the class' unit analyses for these tasks. Each of these quantities corresponds to the appropriate values in Table I.

The fractions FS and FU serve as indicators of how well students understand when and why to use each strategy. If students do not understand the purpose of an evaluation strategy, they are not likely to spontaneously employ it on the critical thinking or conceptual counterargument tasks without specific prompting. The quantities QS and QU are given by the evaluation ability rubrics and measure the students' understanding of how to use each strategy. By taking the products FS·QS and FU·QU, we get a pair of numbers between 0 and 3 which are taken to be indicative of each class' overall understanding of each evaluation strategy.
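As a worked example of these measures, the sketch below recomputes SCA_relative for exam 4 from the rounded Table I values; the small mismatch with the 0.645 listed in Table IV presumably reflects rounding in the published tables.

```python
def strategy_measure(fraction_use: float, avg_quality: float) -> float:
    # F*Q in [0, 3]: fraction of the class spontaneously using the strategy
    # (when/why) times the mean rubric score of those attempts (how).
    return fraction_use * avg_quality

# Exam 4, special-case analysis, from Table I (193-194 minus 203-204):
sca_relative_ex4 = strategy_measure(0.38, 2.23) - strategy_measure(0.09, 2.30)
print(round(sca_relative_ex4, 3))  # ~0.640; Table IV lists 0.645
```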

To measure relative student performance on E- and NE-questions, the normalized differences in class performance for each problem category are averaged on each exam:

    E_relative = (1/N_E-problems) Σ_(E-problems) Δ_E-problem,

    NE_relative = (1/N_NE-problems) Σ_(NE-problems) Δ_NE-problem,

where Δ represents the normalized difference between the two classes' performance, as plotted in Fig. 3, and N denotes the number of problems of a certain type (either E- or NE-questions) on an exam.

Table IV lists the values for these four measures of relative class performance on each exam. To determine whether special-case analysis and unit analysis performance related to performance on the exam questions, a Pearson correlation analysis of these data was performed. The results indicate that E_relative is significantly positively correlated with SCA_relative (r = 0.996, p = 0.004) and negatively, though not significantly, correlated with UA_relative (r = −0.921, p = 0.255), while NE_relative is not significantly correlated with either SCA_relative (r = 0.447, p = 0.553) or UA_relative (r = 0.528, p = 0.646).
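The correlation analysis can be replayed from the published Table IV values with scipy, as sketched below; because the table entries are rounded, the recomputed r and p may differ slightly from the values reported above.

```python
from scipy import stats

# Table IV values for the exams where SCA_relative is defined (2, 4, 5, 6).
sca_relative = [0.179, 0.645, 0.851, 0.683]
e_relative = [-0.045, 0.066, 0.141, 0.057]

r, p = stats.pearsonr(sca_relative, e_relative)
print(f"r = {r:.3f}, p = {p:.3f}")
```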

The most parsimonious account for these results entails three hypotheses. One is that giving a greater proportion of evaluation tasks on special-case analysis after exam 2, as discussed above, helped students to improve their use of that strategy while at the same time reducing their use of unit analysis. The fact that we manipulated these two independent factors in this fashion could thereby explain why the fractional usage of unit analysis steadily declined from exams 2 to 6 (see Table I).

A second hypothesis is that students' use of special-case analysis in recitation, homework, and in their personal learning behavior significantly benefited their problem-solving ability. The relative performance of the 193-194 students on E-questions correlated very strongly with their use of special-case analysis on the exam evaluation tasks. Note that because we are looking at relative differences in performance between the experiment and control groups, we can safely rule out the possibility that this correlation is due to the "easiness" of the subject matter or any other such factor.

The third hypothesis is that the use of unit analysis in recitations, homework, and personal learning behavior did not benefit students' problem-solving ability as much as special-case analysis. It would appear that the 193-194 students did not appreciate the greater utility of special-case analysis, though, since on the end-of-year survey they reported unit analysis as being just as valuable for learning physics as special-case analysis (see Table II). It therefore seems that the students had some limitations in their ability to self-assess the utility of special-case analysis and unit analysis for their learning. Also, the fact that NE_relative is uncorrelated with both UA_relative and SCA_relative indicates that there is no transfer of the benefits of either strategy to topics not covered by the evaluation tasks from recitation and homework assignments.

TABLE IV. Measures of relative overall class performance for exams 1 through 6. The definitions of each measure are described in the text.

Exam | SCA_relative | UA_relative | E_relative | NE_relative
1    | N/A          | N/A         | N/A        | −0.102
2    | 0.179        | 2.047       | −0.045     | −0.070
3    | N/A          | N/A         | 0.024      | −0.037
4    | 0.645        | 1.272       | 0.066      | −0.098
5    | 0.851        | N/A         | 0.141      | 0.010
6    | 0.683        | 0.893       | 0.057      | −0.080


XI. DISCUSSION

While the strategy of special-case analysis is rather complex and took more time and effort for students to learn, it appears to provide significant benefits to performance on multiple-choice exam questions. In contrast, the simpler strategy of unit analysis is learned very quickly but apparently provided little or no benefit on exam questions. Here we shall discuss some possible reasons for these results.

In his doctoral dissertation, Sherin [41] presented and discussed a categorization of interpretive strategies, which he calls "interpretive devices," used by students to give meaning to physics equations while working on back-of-chapter homework problems. His classification of these strategies is reproduced in Table V. There are three classes of devices: narrative, static, and special case.

Each of these devices plays a role in what we call special-case analysis; this article uses the term "special case" in a much broader sense than Sherin. One may conduct many different special-case analyses of an equation, using any one of these interpretive devices as the specific means. For example, one may choose to analyze a case where some parameter is changed in the problem, or where a quantity is taken to some limiting value. By engaging students in special-case analyses through the use of our tasks, students are given the opportunity and feedback to practice the use of these interpretive devices.

Sherin argues that interpretive devices function as sense-making tools which build meaning around an equation by relating it to other pieces of knowledge. The field of semiotics studies exactly how humans make meaning using resources such as systems of words, images, actions, and symbols, and has identified two aspects of meaning, called typological meaning (meaning by kind) and topological meaning (meaning by degree) [42]. Typological meaning is established by distinguishing categories of objects, relations, and processes. For example, an equation gains typological meanings by identifying categories of physical situations in which it is applicable (e.g., we can use a = v^2/r for circular motion), the types of physical idealizations it assumes (e.g., we will assume v and r are constant), and by categorizing the quantities in the equation (e.g., does v correspond to voltage or velocity? Does r correspond to the radius of the circle or the size of the object?).

Topological meaning is created by examining changes by some degree. For example, an equation gains topological meanings by being relatable to other physical situations within the category of applicable situations (e.g., if r had been greater, how would a change?), or with situations which lie on the borderline of applicable situations (e.g., if we let r → ∞, what happens to a?). Also, an equation gains topological meaning by examining gradual deviations from the idealizations used by the equation (e.g., what would happen if v were increasing?). Typological and topological meaning for physics equations may be developed by a variety of activities [43].

Typological and topological meanings are not distinct but necessarily relate to one another. For our example of centripetal acceleration, by taking r to infinity we enter a new category of physical situations, as the motion now has constant velocity. Therefore, this aspect of topological meaning construction develops typological meaning by highlighting the relation between the two categories of constant-velocity motion and uniform circular motion.
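As a compact restatement of this limiting case (a worked equation, not a new result):

```latex
a = \frac{v^2}{r},
\qquad
\lim_{r \to \infty} a \;=\; \lim_{r \to \infty} \frac{v^2}{r} \;=\; 0,
```

so the centripetal acceleration vanishes in the limit, and uniform circular motion passes over into constant-velocity (straight-line) motion.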

So it is possible that the use of special-case analysis tasks, inasmuch as they compel students to use interpretive devices, aids student understanding of equations and consequently benefits their performance on exam questions. Unit analysis, on the other hand, makes no apparent use of interpretive devices and therefore seems incapable of constructing topological meaning. Unit analysis may help to construct some typological meaning, as it compels students to figure out which physical property each specific quantity corresponds to, but in general would seem to be weaker than special-case analysis at developing meaning for equations. This does not mean that unit analysis is a worthless strategy, as it certainly does serve the important function of testing for self-consistency in an equation. It may be that unit analysis could have a stronger impact if taught in a different fashion which places more emphasis on the conceptual aspects associated with it.

XII. CONCLUSION

Evaluation is a well-recognized part of learning, yet its importance is often not reflected in our introductory physics courses. This study has developed tasks and rubrics designed to help students become evaluators by engaging them in formative assessment activities. Results indicated that the use of evaluation activities, particularly those focusing on special-case analysis, helps generate a significant increase in performance on multiple-choice exam questions. One possible explanation for this is that special-case analysis tasks engage students in the use of interpretive devices to construct typological and topological meaning for equations. It should be noted that increases in performance were only observed on E-questions, indicating a lack of transfer to topics not covered by the evaluation activities. Future studies designed to replicate and expand upon these results are necessary in order to strengthen the external validity of our results.

TABLE V. Interpretive devices, listed by class. Reproduced from Sherin [41], Chapter 4, Fig. 2.

Narrative class: Changing Parameters; Physical Change; Changing Situation.
Static class: Specific Moment; Generic Moment; Steady State; Static Forces; Conservation; Accounting.
Special Case class: Restricted Value; Specific Value; Limiting Case; Relative Values.

AARON R WARREN PHYS REV ST PHYS EDUC RES 6 020103 2010

020103-10

Also, the potential epistemological benefits of using evaluation tasks should be investigated, as well as the interactions with student motivation and self-efficacy.

ACKNOWLEDGMENTS

This study was conducted while the author was a member of the Rutgers Physics & Astronomy Education Group. I would like to thank Alan Van Heuvelen, Eugenia Etkina, Sahana Murthy, Michael Gentile, David Brookes, and David Rosengrant for valuable discussions and help with the execution of this project. Also, the feedback of three anonymous referees was helpful in improving the quality of this paper. This work was partially supported by the National Science Foundation, Grant No. DUE-0241078.

APPENDIX: EVALUATION ACTIVITIES

See the separate auxiliary material for the evaluation tasks used in the 193–194 recitation and homework assignments, the evaluation rubrics developed and used, the exam evaluation tasks, and the E- and NE-problems.

[1] B. S. Bloom, A Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain (David McKay Co., Inc., New York, 1956).

[2] A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives, edited by L. W. Anderson and D. R. Kraftwohl (Longman, New York, 2001).

[3] A. E. Lawson, How do humans acquire knowledge? And what does that imply about the nature of knowledge?, Sci. Educ. 9, 577 (2000).

[4] A. E. Lawson, The generality of hypothetico-deductive reasoning: Making scientific reasoning explicit, Am. Biol. Teach. 62, 482 (2000).

[5] A. R. Warren, Evaluation strategies: Teaching students to assess coherence & consistency, Phys. Teach. 47, 466 (2009).

[6] A. Buffler, S. Allie, F. Lubben, and B. Campbell, The development of first year physics students' ideas about measurement in terms of point and set paradigms, Int. J. Sci. Educ. 23, 1137 (2001).

[7] R. F. Lippmann, Ph.D. dissertation, University of Maryland, 2003.

[8] S. P. Marshall, Schemas in Problem-Solving (Cambridge University Press, New York, 1995).

[9] W. J. Leonard, W. J. Gerace, and R. J. Dufresne, Analysis-based problem solving: Making analysis and reasoning the focus of physics instruction, Sci. Teach. 20, 387 (2002).

[10] F. Reif and J. I. Heller, Knowledge structures and problem solving in physics, Educ. Psychol. 17, 102 (1982).

[11] P. Heller, R. Keith, and S. Anderson, Teaching problem solving through cooperative grouping. Part 1: Group versus individual problem solving, Am. J. Phys. 60, 627 (1992).

[12] M. Sabella and E. F. Redish, Knowledge organization and activation in physics problem-solving, University of Maryland preprint, 2004.

[13] R. R. Hake, Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses, Am. J. Phys. 66, 64 (1998).

[14] K. Cummings, J. Marx, R. Thornton, and D. Kuhl, Evaluating innovation in studio physics, Am. J. Phys. 67, S38 (1999).

[15] L. C. McDermott and P. S. Shaffer, Tutorials in Introductory Physics (Prentice-Hall, Upper Saddle River, NJ, 2002).

[16] N. D. Finkelstein and S. J. Pollock, Replicating and understanding successful innovations: Implementing tutorials in introductory physics, Phys. Rev. ST Phys. Educ. Res. 1, 010101 (2005).

[17] A. P. Fagen, C. H. Crouch, and E. Mazur, Peer Instruction: Results from a range of classrooms, Phys. Teach. 40, 206 (2002).

[18] E. Mazur, Peer Instruction: A User's Manual (Prentice-Hall, Upper Saddle River, NJ, 1996).

[19] R. Warnakulasooriya, D. J. Palazzo, and D. E. Pritchard, Evidence of problem-solving transfer in a web-based Socratic tutor, in Physics Education Research Conference Proceedings, edited by P. Heron, L. McCullough, and J. Marx (AIP Press, Melville, 2006).

[20] K. VanLehn, C. Lynch, K. Schulze, J. Shapiro, R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and M. Wintersgill, The Andes physics tutoring system: Lessons learned, Int. J. Artif. Intell. Educ. 15, 147 (2005).

[21] B. J. Zimmerman and M. Martinez-Pons, Development of a structured interview for assessing student use of self-regulated learning strategies, Am. Educ. Res. J. 23, 614 (1986).

[22] N. Perry, Young children's self-regulated learning and contexts that support it, J. Educ. Psychol. 90, 715 (1998).

[23] D. Hammer, More than misconceptions: Multiple perspectives on student knowledge and reasoning, and an appropriate role for education research, Am. J. Phys. 64, 1316 (1996).

[24] American Association for the Advancement of Science, Project 2061: Benchmarks for Science Literacy (Oxford University Press, New York, 1993).

[25] NSF Directorate for Education and Human Resources, Review of Undergraduate Education: Shaping the Future: New Expectations for Undergraduate Education in Science, Mathematics, Engineering, and Technology (1996). Recommendations may be seen at http://www.ehr.nsf.gov/EHR/DUE/documents/review/96139/four.htm

[26] National Research Council, National Science Education Standards (National Academy Press, Washington, DC, 1996).

[27] R. H. Ennis, in Teaching Thinking Skills: Theory and Practice, edited by J. B. Baron and R. J. Sternberg (Freeman, New York, 1987), pp. 9–26.

[28] P. M. King and K. S. Kitchener, Developing Reflective Judgment: Understanding and Promoting Intellectual Growth and Critical Thinking in Adolescents and Adults (Jossey-Bass, San Francisco, 1994).

[29] P. Black and D. Wiliam, Inside the Black Box: Raising Standards through Classroom Assessment (King's College, London, 1998).

[30] R. Sadler, Instr. Sci. 18, 119 (1989).

[31] S. Murthy, Peer-assessment of homework using rubrics, in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 156–159.

[32] E. Yerushalmi, C. Singh, and B. S. Eylon, Physics learning in the context of scaffolded diagnostic tasks (I), in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 27–30.

[33] C. Singh, E. Yerushalmi, and B. S. Eylon, Physics learning in the context of scaffolded diagnostic tasks (II): Preliminary results, in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 31–34.

[34] E. Etkina, A. Van Heuvelen, S. Brahmia, D. Brookes, M. Gentile, S. Murthy, D. Rosengrant, and A. Warren, Scientific abilities and their assessment, Phys. Rev. ST Phys. Educ. Res. 2, 020103 (2006).

[35] J. Cohen, A coefficient of agreement for nominal scales, Educ. Psychol. Meas. 20, 37 (1960).

[36] A. R. Warren, The role of evaluative abilities in physics learning, in 2004 Physics Education Research Conference Proceedings, edited by J. Marx, S. Franklin, and P. Heron (AIP, Melville, NY, 2005).

[37] A. R. Warren, Ph.D. dissertation, Rutgers University, 2006.

[38] M. Scott, T. Stelzer, and G. Gladding, Evaluating multiple-choice exams in large introductory physics courses, Phys. Rev. ST Phys. Educ. Res. 2, 020102 (2006).

[39] D. Hestenes, M. Wells, and G. Swackhamer, Force Concept Inventory, Phys. Teach. 30, 141 (1992).

[40] D. P. Maloney, T. L. O'Kuma, C. J. Hieggelke, and A. Van Heuvelen, Surveying students' conceptual knowledge of electricity and magnetism, Am. J. Phys. 69, S12 (2001).

[41] B. Sherin, Ph.D. dissertation, University of California, Berkeley, 1996.

[42] J. L. Lemke, Multiplying meaning: Visual and verbal semiotics in scientific text, in Reading Science, edited by J. R. Martin and R. Veel (Routledge, London, 1998).

[43] E. Etkina, A. R. Warren, and M. J. Gentile, The role of models in physics instruction, Phys. Teach. 44, 34 (2006).


then determine what the equation, model, or claim predicts for this situation and compare the prediction with our knowledge of what actually happens. Essentially, the strategy is to conduct a thought experiment, determining whether the equation, model, or claim makes sense based on prior knowledge.

If the equation, model, or claim is found to be inconsistent with prior knowledge, there are three possible reasons: (a) the equation, model, or claim is incorrect; (b) the student's prior knowledge of the situation is incorrect; (c) the student made a mistake in the execution of the analysis.

If the equation, model, or claim is incorrect, the student should determine exactly how it fails the special-case analysis. The goal is to figure out how the equation, model, or claim needs to be modified in order to make it consistent with the prior knowledge. While this strategy is quite general, the actual process is highly dependent on the details of the physical model being employed and the particular context of the problem.

III. REMARKS

It is important to note that each evaluation strategy is subject to errors, as it is wholly possible for the student to make a mistake while using these strategies. Therefore, the evaluation strategies listed above provide a useful, though imperfect, means for judging the validity and soundness of proposed solutions to quantitative questions. Also, there are certainly many other evaluation strategies used in physics, such as error analysis to evaluate an experimental result [5]. Research has found that it is possible, and beneficial, to help students adopt certain paradigms and strategies for evaluating experimental data [6,7]. One of the major instructional challenges identified by Lippmann was helping students adopt a frame which values the theory of measurement, as many students were largely unaware that such a theory exists and is essential to good science. Based on this, one may fully expect that one of the difficulties in helping students learn the evaluation strategies of unit and special-case analysis will be helping the students to adopt a frame which values these strategies.

IV. EVALUATION AND LEARNING

Evaluation strategies may serve a role in regulating the structure of schemas employed by students in answering questions [8–11]. That is, an evaluation strategy may be linked to an array of schemata responsible for context-specific activities, such as solving inclined-plane problems. For the purposes of this paper, a schema is defined to be an interconnected network of knowledge elements that can be triggered by a variety of inputs, which attempts to organize and assign meaning to elements of the input and then utilize the input to produce some set of cognitive outputs. When a context-specific schema is activated, an evaluation strategy has some probability of being activated as well. The use of an evaluation strategy can lead the student to recognize that the problem-solving schema is incoherently structured, or gives results that are inconsistent with the results of another schema. Upon such recognition, the student may correct her own mistake, consequently restructuring the associated schema. In other words, it is possible that evaluation strategies may be one of the agents responsible for establishing local and global coherence [12], and also for modifying the conditions under which a particular schema is activated.

As a simple illustrative example, consider a student working on the homework problem shown in Fig. 1. The student's attempted solution is also shown. In this case, the student's work is fine except for the incorrect use of trigonometry when determining the force components. It may be that the student used cos and sin as she did because she has a strong schematic connection that associates cos with the x axis and sin with the y axis. By doing a special-case analysis of her solution, say by examining the case when θ = 0°, the student may realize that she has made a mistake. If θ = 0°, then we expect F_N = Mg, since the incline will be level, requiring the normal force to completely balance the gravitational force. However, plugging θ = 0° into the student's solution gives F_N = 0 N. This analysis therefore indicates that the student made a mistake in the way she dealt with the angle in her solution.

If the student tries the simplest alternative, by swapping the sin and cos functions in her solution, she will get a new solution that does pass this special-case analysis. This process of evaluation and self-correction could thereby alter the student's schema, as she may in the future be less likely to blindly associate cos with the x axis and sin with the y axis. In particular, a successful special-case analysis would elaborate on why cos should be associated with the y axis and sin with the x axis in this case. The links between knowledge elements in the student's mind may consequently be revised to account for this surprising result. Although there are many steps where a student may become lost or make a mistake, it is possible that the use of evaluation strategies can serve a regulatory function in the organization of student knowledge.
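To make the check above concrete, here is a minimal sketch of the same special-case analysis carried out symbolically with sympy (using a computer algebra system is our illustration, not part of the student's task; substitution by hand works identically):

```python
import sympy as sp

M, g, theta = sp.symbols('M g theta', positive=True)

proposed = M * g * sp.sin(theta)  # the student's solution for F_N
revised = M * g * sp.cos(theta)   # the same solution with sin and cos swapped

# Special case theta = 0: the incline is level, so we expect F_N = M*g.
print(proposed.subs(theta, 0))         # prints 0    -> fails the check
print(revised.subs(theta, 0))          # prints M*g  -> passes

# A second special case, theta = pi/2: a vertical wall, so we expect F_N = 0.
print(revised.subs(theta, sp.pi / 2))  # prints 0    -> also passes
```

Checking more than one special case, as in the last line, guards against a revision that fixes one limiting case while breaking another.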

Without this evaluation strategy, a student's ability to revise and regulate her understanding is much more limited. If the sole means for evaluation and feedback lie outside our students, they become dependent upon external agents in order to engage in learning. This dependence strongly constrains the range of times and places in which a student has the opportunity to learn, as the majority of student learning can only occur with the feedback of an instructor, textbook,

FIG. 1. An illustrative example of a traditional homework problem and a possible student solution.

Question: What is the normal force exerted on a block of mass M by a surface which is inclined at an angle θ above the horizontal?

Student solution (the free-body diagram labels the gravitational force F_E and the normal force F_N, with the x axis along the incline and the y axis perpendicular to it):
  x: −Mg cos(θ) = Ma_x  (after substituting F_E = Mg)
  y: F_N − Mg sin(θ) = Ma_y = 0 N
  Therefore, F_N = Mg sin(θ).


or peer. In many college-level courses, especially large-enrollment courses, each student has only a few hours each week in the presence of an instructor, and only during a fraction of that time does an instructor directly interact with each student. Textbooks fail to provide any sort of dynamic feedback for students, and peer feedback may be limited by a lack of availability and often may be incorrect or misleading. Consequently, it should come as no surprise that students often learn far too little in physics courses, although they can learn more when courses are structured to facilitate better evaluative feedback to students via interactive-engagement methods [13,14], such as implementations of the University of Washington Tutorials [15] at the University of Colorado [16], Peer Instruction [17,18], and intelligent tutoring systems [19,20]. In each case, a student is given more extensive access to finely tuned external evaluation and hence has a greater opportunity to learn.

If students had the ability to evaluate their own work, their learning would no longer be completely contingent upon external evaluators. Instead, students would be able to identify and correct their own mistakes, allowing them to better learn on their own. In fact, Zimmerman and Martinez-Pons identified self-evaluation as a necessary component in their model of self-regulated learning [21], and self-evaluation has been shown to promote self-regulated learning in young students [22]. Likewise, Hammer [23] has identified the importance of student "independence," which typifies the extent to which a student takes responsibility for constructing their own knowledge instead of simply accepting what is given by authorities without any evaluation.

An important potential benefit of student self-evaluation is that it may help students develop authentic science reasoning abilities, a major goal of science education [24–26]. In particular, the use of evaluation strategies highlights the fact that our judgments of theoretical models can produce false positives and negatives, and this limits our confidence in such judgments. Recognition of the sources of reasoning errors is fundamental to many aspects of critical thinking [27] and reflective judgment [28]. It seems reasonable to believe, then, that these abilities may be enhanced if instructors can help students to value and use evaluation strategies.

V. TEACHING AND ASSESSING EVALUATION

To help students learn how and why to use evaluation strategies, a set of formative activities and rubrics were developed [29,30]. In general, students must understand the goal state they are trying to achieve, their current level of performance, and how to utilize descriptive feedback to improve their performance. A scoring rubric is one tool that can be used to help achieve these conditions. The rubrics break up abilities into finer-grained component abilities and contain descriptions of different levels of performance, including the target level. A student, or a group of students, can use the rubrics to self-assess her or their own work, or an instructor can use the rubrics to assess students' responses and provide feedback [31–33]. Additionally, written and verbal comments on student work may be given to personalize the feedback and address particular issues in a student's work.

VI. EVALUATION TASKS AND RUBRICS

During the 2002–2003 and 2003–2004 academic years, a library of tasks was designed, tested, and refined to help students learn the evaluation strategies of unit analysis and special-case analysis. Below is a list of the types of tasks featured in this study. An example packet of these and other types of evaluation tasks can be downloaded from http://paer.rutgers.edu/ScientificAbilities/Kits/default.aspx

External unit analysis: A problem and proposed solution are given, and the student must do a unit analysis to evaluate, and possibly revise, the given solution. (A sketch of such a dimensional check appears after this list.)

External special-case analysis: A problem and proposed solution are given, and the student must do a special-case analysis to evaluate, and possibly revise, the solution.

Conceptual counterexample: A conceptual claim is made, and the student is asked whether they agree or disagree and to justify their opinion. In many cases the most appropriate strategy is to do a special-case analysis, although there is no specific prompting to use any particular strategy.

Integrated tasks: The student must solve a problem, then do unit and/or special-case analysis to evaluate their solution, and possibly revise it.

Critical thinking tasks: The student is given a problem and proposed solution and asked to give three independent arguments which each analyze whether the given solution is reasonable. There are no explicit prompts to use any particular strategy, so the student must spontaneously recognize that special-case and unit analysis can each be used to generate arguments, and then use these strategies to make valid and sound arguments.
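As a concrete illustration of what an external unit-analysis task asks students to do, the following minimal sketch checks an equation for dimensional consistency; the helper functions are hypothetical and are not part of the published task materials. A dimension is represented as a dict from SI base units to exponents, and an equation is physically sensible only if every term carries the same dimension:

```python
def combine(*dims):
    """Multiply dimensions together by adding exponents, dropping zeros."""
    out = {}
    for d in dims:
        for unit, exp in d.items():
            out[unit] = out.get(unit, 0) + exp
    return {u: e for u, e in out.items() if e != 0}

def power(d, n):
    """Raise a dimension to the integer power n."""
    return {u: e * n for u, e in d.items()}

velocity = {'m': 1, 's': -1}
acceleration = {'m': 1, 's': -2}
length = {'m': 1}

# Check v^2 = v0^2 + 2*a*d term by term (the pure number 2 carries no dimension):
lhs = power(velocity, 2)
rhs_terms = [power(velocity, 2), combine(acceleration, length)]
print(all(term == lhs for term in rhs_terms))  # True -> dimensionally consistent
```

Note that the same check would pass both F_N = Mg sin(θ) and F_N = Mg cos(θ) from the Fig. 1 example, since sin and cos are dimensionless; this is exactly the kind of error unit analysis cannot catch but special-case analysis can.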

These tasks may be used in recitation or homework assignments. Using the scoring rubrics and giving descriptive comments on student work for evaluation tasks provides formative assessment of the students' work. The scoring rubrics rate student performance on a scale of 0–3 with the following general meanings: 0 = no meaningful work done; 1 = student attempts but does not understand the general method for completing the task; 2 = student understands the general method, but her execution is flawed; 3 = student's method and execution of the task are satisfactory. The process by which we developed the rubrics is discussed in Etkina et al. [34].

A rubric score of 2 indicates the student tried applying all the steps of the special-case analysis (SCA) but made some superficial mathematical or conceptual error along the way. In other words, the student demonstrates that she understands the context-independent strategic reasoning process of SCA but commits some minor mistake in using the context-specific information during the application of the SCA (e.g., forgetting a minus sign, or confusing x and y components of a vector). So, the difference between a 2 and a 3 is only indicative of how well the student used the particular context-specific information in the task; to get either score, the student must demonstrate sound use of the general SCA reasoning process. Our final set of evaluation rubrics yielded an average inter-rater agreement level of 96% (ranging between 91% and 99%) and a Cohen kappa [35] of 0.947 (p < 0.001). The rubrics are available for download at http://paer.rutgers.edu/ScientificAbilities/Rubrics/default.aspx as Rubric I.
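For readers who want to reproduce this style of agreement analysis, here is a minimal sketch, assuming scikit-learn and hypothetical rubric scores (the study's actual rating data are not reproduced here):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical rubric scores (0-3) assigned to the same eight responses.
rater_a = np.array([3, 2, 2, 0, 1, 3, 2, 3])
rater_b = np.array([3, 2, 1, 0, 1, 3, 2, 3])

agreement = np.mean(rater_a == rater_b)      # raw fraction of identical scores
kappa = cohen_kappa_score(rater_a, rater_b)  # agreement corrected for chance
print(f"agreement = {agreement:.0%}, kappa = {kappa:.3f}")
```

Cohen's kappa is preferred over raw agreement because two raters scoring on a coarse 0–3 scale will agree fairly often by chance alone.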


VII. STUDY DESIGN AND RESULTS

Preliminary research conducted during the 2003–2004 academic year demonstrated significant correlations between students' abilities to use evaluative strategies and their problem-solving performance [36]. During the 2004–2005 academic year, a study was conducted to further investigate the effects of using evaluation tasks in an algebra-based physics course. After discussing the design of the study, results are presented addressing three research goals: (1) measure students' abilities to use evaluation strategies; (2) investigate the extent to which students valued and incorporated evaluation strategies into their personal learning behavior; (3) test whether the use of evaluation tasks resulted in significant improvements in student problem-solving performance.

To accomplish these goals, data were collected from two courses at Rutgers University as part of a quasiexperimental control-group study. The experiment group consists of the 193–194 course, a year-long introductory algebra-based physics course for life science majors. The gender distribution was 46% male, 54% female. There were 200 students for each term, with 95% of students participating in both the 193 and 194 courses. The comparison group is the year-long 203–204 course, another introductory algebra-based course for life science majors. There were 459 students for the fall term, and 418 of those students continued enrollment in the spring term. These two courses were generally run in parallel, although the 203–204 class covers slightly more material, and the students in this class typically have stronger math and science backgrounds than in the 193–194 course (it is the more competitive course, held on a different university campus). This likely bias was substantiated by the data and will serve to accentuate the results of our study, as discussed below.

The lectures for 203–204 were all designed and given by Alan Van Heuvelen. The lectures for the 193 course were given by Sahana Murthy, a post-doctoral researcher working with the Rutgers Physics & Astronomy Education Group at the time, and the lectures for the 194 course were given by the author. All lectures for 193 and 194 were based on Van Heuvelen's lecture notes. Lectures for all courses involved chalkboard and transparency-based presentations featuring experimental demonstrations and student engagement via peer discussions and student infrared response systems. The chalkboard and transparency-based presentations often began with an experimental demonstration to establish a concrete example of some new class of phenomena to be studied. Questions about the properties of the phenomena would be elicited and/or posed by the lecturer. These questions would motivate the construction of a physical model. Application of the model to solve conceptual and traditional problems would then be illustrated. This entailed making multiple representations of information, assessing consistency between the different representations, and utilizing the representations together with the physical model in order to solve the problem. Students would then work in groups on one or two problems, respond via infrared response systems, and receive some feedback from the lecturer. It should be noted that lectures in the 203–204 courses did include explicit modeling of unit analysis. One premise of the study was the ability to provide very similar lecture environments for the two courses. However, some stylistic and personal differences in lecturing were unavoidable, and the threat to the study's internal validity will be assessed below.

Recitations for both courses involved 25 students working in groups of 3–5 on a recitation assignment, with a teaching assistant there to provide help as needed. Homework assignments featured problems which were distinct from, but usually similar to, problems from the recitation assignments. Solutions for recitation and homework assignments were posted online for each course. The recitation and homework assignments given by Van Heuvelen for 203–204 were generally identical to those given in 193–194, except that some problems from the 203–204 assignments were replaced by evaluation activities. In general, there were between one and two such replacements each week, with one replacement in the homework assignment and usually one replacement in the recitation assignment.

This replacement served as our experimental factor, being the only designed difference between the two courses. We attempted to minimize any potential bias due to time-on-topic differences by only replacing problems which covered the same material and which were thought to take roughly the same amount of time to complete as the evaluation tasks which were used in their place. Some differences in the recitation and homework assignments were inevitable, as the 203–204 course is required to cover more material and had 55-min recitations, while the 193–194 course had 80-min recitations. Again, we will address the threats posed by such factors later.

The evaluation tasks used in recitations and homework for 193–194 covered only half of the topics in the course. For example, we did not include any evaluation tasks relating to momentum, fluid mechanics, or wave optics, although tasks were included on work-energy, the first law of thermodynamics, and dc circuits (see Warren [37] for a list of specific evaluation tasks used). This is an important feature of the study, since it provides a baseline response which will be useful when testing whether the use of evaluation tasks affects problem-solving performance.

Laboratories for both courses were practically identical, with 25 students per section working in groups of 3–5 with the aid of a teaching assistant. The laboratories featured three types of experiments: investigative, testing, and application. Each laboratory usually included two experiments, most of which were either testing or application experiments. It should be noted that although the 193–194 students were required to take the laboratory, the 203–204 students were not, as the laboratories constitute a distinct course, labeled 205–206. However, nearly all (98%) of the 203–204 students were enrolled in 205–206 concurrently.

Another important set of factors in the study are the teachers themselves. The teaching assistants in 193–194 were atypical, as 4 of the 8 assistants for the course were enrolled at the Graduate School of Education, and another 2 assistants were members of the Rutgers Physics and Astronomy Education Group. Only 1 of the 8 teaching assistants was a traditional physics graduate student. In contrast, the teaching assistants for 203–204 included a mixture of several professors and several physics graduate students, none of whom were in physics education. These are, at least superficially, very distinct populations. This uncontrolled factor, and the threats it poses to the study's internal validity, shall be discussed below.

VIII. GOAL 1: TEACHING EVALUATION STRATEGIES

The 193–194 and 203–204 courses both had six exams during the year. Each exam featured a combination of multiple-choice questions and open-response tasks, with roughly two-thirds of the test score based on multiple-choice performance and one-third based on the open-response tasks. A typical midterm exam (exams 1, 2, 4, 5) had between 10 and 15 multiple-choice questions and 3 open-response tasks. Final exams (exams 3 and 6) had roughly double the number of items of each type. An evaluation task was included as an open-response task on each exam for both courses. These tasks served as summative assessments to measure students' evaluative abilities. Student responses to the evaluation tasks on each exam were photocopied and scored using our rubrics. Although we photocopied and scored each student's work from 193–194, the large number of students in 203–204 made it impossible to do the same for them. We therefore randomly selected 150 students from the 203–204 course whose responses to the evaluation tasks on exams would be photocopied and scored. A check was performed to determine whether this sample was representative of the entire class. A Kruskal-Wallis test indicated that the grades for the 203–204 sample were not significantly different from those of the remainder of the class.
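A minimal sketch of that representativeness check, assuming scipy and hypothetical grade lists (the raw course grades are not published):

```python
from scipy.stats import kruskal

sample_grades = [78, 85, 91, 60, 73, 88, 69]     # hypothetical 203-204 sample
remainder_grades = [80, 82, 88, 65, 70, 84, 72]  # hypothetical rest of class

H, p = kruskal(sample_grades, remainder_grades)
# A large p-value fails to reject the null hypothesis of identical
# distributions, consistent with a representative sample.
print(f"H = {H:.2f}, p = {p:.3f}")
```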

The evaluation tasks on exams 1 and 3 were external special-case analysis tasks. These allowed us to measure how well the students could use this strategy when explicitly asked to. The evaluation tasks on exams 2, 4, and 6 were critical thinking tasks. These allowed us to see what fraction of the students recognized that special-case and/or unit analysis could be used to construct such arguments, as would be demonstrated by their spontaneous use of these strategies. For those students that did attempt to use these strategies, the rubrics were used to measure the quality of their use. Each of these tasks involved topics on which the 193–194 students had had both special-case and unit analysis tasks (e.g., dc circuits). The evaluation task on exam 5 was a conceptual counterargument task, which allowed us to see what fraction of students tried to use special-case analysis and how well they used it. For a list of the exam evaluation tasks, see Warren [37]. Table I lists the results from these tasks. The reported numbers for the quality of each strategy's use on exams 2, 4, 5, and 6 are the average rubric scores of those students who decided to try using that particular strategy on the critical thinking tasks included in those exams. The values in columns 3 and 4 state the fractions of each class using each strategy, and were determined by finding the fraction of the class that had a score of at least 1 according to our rubrics. In column 4, a value of 100% indicates that students were explicitly instructed to use a particular strategy on an evaluation task; this is the case for exams 1 and 3, whose tasks explicitly told students to do a special-case analysis. Values of NA in columns 3 and 5 indicate that no evaluation tasks relating to UA were included. All differences in fractional use between the two classes are statistically significant at the 0.01 level according to a χ²-test.
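The χ²-test here is the standard test of independence on a 2×2 contingency table; a minimal sketch, assuming scipy, with counts back-figured for illustration only:

```python
from scipy.stats import chi2_contingency

# Rows: class (193-194, 203-204); columns: (used SCA, did not use SCA).
# Illustrative counts only: 18% of ~200 students vs. 5% of the 150 sampled.
table = [[36, 164],
         [8, 142]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```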

There were typically 8–12 students in our sample from 203–204 who tried using special-case analysis on the evaluation tasks from those exams, and their average quality of use, according to our rubric, was ~2.3 for each exam. No more than 1 student in our sample from 203–204 ever used unit analysis on the critical thinking tasks.

There are a few points to be made here. First, the results indicate that we were successful at helping the 193–194 students understand when, why, and how to use both strategies. They significantly outperformed the 203–204 students in the frequency of use for each evaluation strategy, and eventually demonstrated high quality use of each strategy as measured by the rubrics. The increases in special-case analysis quality from exam 1 to exam 3, and in the fraction using it from exam 2 to exam 4, are in part attributable to the fact that the

TABLE I. Measurements of students' evaluative abilities for unit analysis (UA) and special-case analysis (SCA) on each of the six exams.

Exam  Class     Fraction UA  Fraction SCA  Quality UA   Quality SCA
1     193–194   NA           100%          NA           1.17 ± 0.07
      203–204   NA           100%          NA           0.81 ± 0.05
2     193–194   0.89         0.18          2.48 ± 0.04  1.25 ± 0.08
      203–204   0.01         0.05          3.00 ± 0.00  2.33 ± 0.19
3     193–194   NA           100%          NA           2.26 ± 0.04
      203–204   NA           100%          NA           0.90 ± 0.05
4     193–194   0.53         0.38          2.73 ± 0.06  2.23 ± 0.07
      203–204   0.01         0.09          3.00 ± 0.00  2.30 ± 0.13
5     193–194   NA           0.42          NA           2.34 ± 0.08
      203–204   NA           0.05          NA           2.33 ± 0.19
6     193–194   0.38         0.38          2.35 ± 0.09  2.25 ± 0.07
      203–204   0.01         0.07          3.00 ± 0.00  2.25 ± 0.13


ratio of types of tasks given in recitation and homework assignments was altered after exam 2. Up until then, roughly half of the evaluation tasks included in these assignments focused on unit analysis, while the other half dealt with special-case analysis. After seeing the results from exam 2, though, it was realized that students were having much more success with unit analysis, perhaps because it is a much simpler strategy than special-case analysis. Thereafter, the majority of evaluation tasks in recitations and homework focused on special-case analysis, and a substantial fraction (~40%) of the students is observed to spontaneously employ SCA on the exam evaluation activities.

It should be clarified that on exams 4 and 6 the set of 193–194 students who used unit analysis and the set of those that used special-case analysis were not identical, even though the fractional usage was similar for both strategies. The overlap between these two sets, defined as the ratio of their intersection to their union, was 0.45 for exam 4 and 0.53 for exam 6.

It is interesting that special-case analysis was used at all among the 203–204 students on the critical thinking tasks on exams 2, 4, 5, and 6, as a group of roughly 8–12 students spontaneously employed SCA with a high level of quality (average rubric scores of roughly 2.3). Apparently, some students may enter our physics courses with a well-developed understanding of special-case analysis, perhaps due to prior physics courses. It is also worth noting that 203–204 students who tried using special-case analysis for the open-response exam problem typically scored very well on the multiple-choice questions, with 38% of them earning perfect scores.

IX. GOAL 2: USING AND VALUING EVALUATION STRATEGIES

As a means to assess student valuation and use of evaluation strategies, anonymous surveys were administered at the last recitation of the spring term in the Physics 194 course. Included were several questions asking students to rate how frequently they had used each strategy to evaluate their own work outside of class, and also to rate how strongly certain factors inhibited their use of each strategy. These survey questions are shown in Fig. 2, with the results in Table II. There were 158 respondents to the survey out of the 200 students in the course, giving a response rate of 79%. There may be a selection bias among the respondents, as the students knew that one recitation grade would be dropped from the final grade and that the last recitation would be a review session instead of a normal recitation. For that reason, upper and lower bounds are listed on the average scores in Table II, where the bounds are determined by assuming all missing respondents would have given either the highest or lowest possible response for the questions.

To conduct a Cronbach alpha test for internal consistency, the scores to questions 3a–3d and 6a–6d were made negative, accounting for the fact that these inhibitors are likely to reduce the usage and valuation of evaluation strategies by students (i.e., all scores were coded in the same conceptual direction). The calculated alpha value is α = 0.800, a strong result, giving confidence that the items are consistent and can therefore be used as a basis for interpretation about student usage and valuation of evaluation strategies.
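Cronbach's alpha can be computed directly from a respondents-by-items score matrix; a minimal sketch (the matrix below is randomly generated for illustration, so its alpha will be near zero rather than the study's 0.800):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: array of shape (n_respondents, n_items)."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

rng = np.random.default_rng(0)
demo = rng.integers(1, 6, size=(158, 12)).astype(float)  # 158 respondents, 12 items
print(f"alpha = {cronbach_alpha(demo):.3f}")
```

In the study's coding, the scores for items 3a–3d and 6a–6d would first be negated before forming this matrix, as described above.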

One may safely assume that there were very few students who came into the 193–194 course already knowing, valuing, and using either evaluation strategy. This assumption is supported by the small number of students from 203–204 who used either strategy on the tasks in exams 2, 4, 5, and 6 (see above). Given that assumption, the results suggest a moderate degree of success in teaching students to incorporate these strategies into their personal learning behavior. Indeed, responses to questions 1 and 4 were probably artificially lowered due to the fact that we only included evaluation tasks relating to half of the topics during the year. If evaluation tasks had been given on all topics, it seems likely that students would have used the evaluation strategies more frequently. It is worth pointing out that the evidence here has limited inferential strength because of its indirectness. Further studies, with more direct means of assessing whether students learn when and why to employ evaluation strategies, would be useful.

1. How much have you used special-case analysis on your own to learn/do physics?
(1=Not at all, 2=Not Much, 3=Somewhat, 4=Very, 5=Extremely)

2. If you were a physics major and really interested in learning physics, how useful do you think special-case analysis would be for your learning?
(1=Not at all, 2=Not Much, 3=Somewhat, 4=Very, 5=Extremely)

3. Rate on a scale from 0–10 how much each of the listed factors affected your desire to do special-case analysis (10 means the factor really made you not want to do special-case analysis; 0 means the factor did not matter).
a) ___ Time constraints (due to other classwork, jobs, etc.)
b) ___ Motivation to learn physics (or lack thereof)
c) ___ Confusion about how to do special-case analysis (if you weren't confused, put 0)
d) ___ Confusion about the purpose of a special-case analysis (if you weren't confused, put 0)

4. How much have you used dimensional analysis on your own to learn/do physics?
(1=Not at all, 2=Not Much, 3=Somewhat, 4=Very, 5=Extremely)

5. If you were a physics major and really interested in learning physics, how useful do you think dimensional analysis would be for your learning?
(1=Not at all, 2=Not Much, 3=Somewhat, 4=Very, 5=Extremely)

6. Rate on a scale from 0–10 how much each of the listed factors affected your desire to do dimensional analysis (10 means the factor really made you not want to do dimensional analysis; 0 means the factor did not matter).
a) ___ Time constraints (due to other classwork, jobs, etc.)
b) ___ Motivation to learn physics (or lack thereof)
c) ___ Confusion about how to do dimensional analysis (if you weren't confused, put 0)
d) ___ Confusion about the purpose of a dimensional analysis (if you weren't confused, put 0)

FIG. 2. End-of-year survey questions regarding the students' self-reported use of each evaluation strategy.

TABLE II. Results from end-of-year survey questions.

Question  Average response  Lower bound  Upper bound
1         2.2/5             1.9          2.8
2         3.8/5             3.2          4.1
3a        7.0/10            5.5          7.6
3b        5.1/10            4.0          6.1
3c        2.4/10            1.9          4.0
3d        2.2/10            1.7          3.8
4         2.9/5             2.5          3.3
5         3.9/5             3.3          4.1
6a        5.9/10            4.7          6.7
6b        4.7/10            3.7          5.8
6c        1.7/10            1.3          3.4
6d        1.6/10            1.3          3.4


The relatively low response scores to questions 3c, 3d, 6c, and 6d are consistent with the results for goal one of our study, indicating we were successful at helping students understand when, why, and how to use each strategy. The fact that the scores for 3c and 3d are higher than 6c and 6d appears reasonable, because special-case analysis is certainly a much more complicated and multifaceted strategy than unit analysis. This disparity in complexity probably also explains why the scores to 3a and 3b were higher than 6a and 6b.

X. GOAL 3: ENHANCING PROBLEM-SOLVING PERFORMANCE BY TEACHING EVALUATION

Several conventional multiple-choice questions were common to each of the 193–194 and 203–204 lecture exams. There were six exams during the year, three per semester. The midterm exams (exams 1, 2, 4, and 5) were 80 min, while the final exams (exams 3 and 6) were 3 h. The exam questions shared by the two classes were all designed by the instructor of the 203–204 course (Van Heuvelen). Some of these shared multiple-choice questions were on topics on which the 193–194 students had had evaluation tasks, such as work-energy and dc circuits. This set of multiple-choice questions will be called E-questions. The remainder of the shared multiple-choice exam questions covered topics that no one had had evaluation tasks on, such as momentum and fluid mechanics. These will be called NE-questions. The E- and NE-questions are listed in Appendix B of Warren [37], and are assumed to provide accurate measures of student problem-solving performance [38]. This assumption mitigates the strength of the results reported below, and more robust measures of problem-solving performance would be useful in future work. Roughly 80% of these questions were numerical, and the remaining 20% of questions were either conceptual or symbolic. Some of the conceptual questions were directly taken or modified from standardized assessment tools such as the FCI [39] and CSEM [40]. All of the questions had been used by Van Heuvelen in previous years.

The first midterm exam had 2 NE-questions and no E-questions, as there had not been any evaluation tasks used to this point in the 193–194 course. There were 2 E-questions and 2 NE-questions on midterm exams 2, 4, and 5. Exam 3 had 4 NE-questions and 3 E-questions, and exam 6 had 5 NE-questions and 4 E-questions.

Given this design, we may make two predictions based on the view of evaluation strategies as tools of self-regulated learning. First, because of the known population bias between the 193–194 and 203–204 students, the 203–204 students are expected to do better on the NE-questions. This prediction assumes that whatever benefits the evaluation tasks may have for the 193–194 students' E-question performance will not be transferable to the topics covered by NE-questions.

A second prediction is that the performance of the 193–194 students should be relatively better on E-questions than on NE-questions, if the evaluation tasks succeed in benefiting student understanding of the topics covered by the tasks. Moreover, this boost of E-question performance should vary as students become more adept at the use of evaluation strategies. Therefore, it is predicted that student performance on the open-response evaluation tasks included on each exam will directly correlate with the relative performance of 193–194 students on E-questions.

The relative performances of the 193–194 and 203–204 students on E- and NE-questions are compared in Fig. 3 and Table III. In Fig. 3, the normalized difference Δ between the two classes for each E- and NE-question is computed as

Δ = (C_exp − C_control) / (C_exp + C_control),

where C_exp and C_control denote the percentage of students in each condition who correctly answered the problem. A positive value of Δ therefore indicates that the experiment group (193–194) outperformed the control group (203–204) on a particular problem, and vice versa for a negative value. The magnitude of Δ gives a measure of how much of a difference there was between groups, and it exaggerates differences on problems where the sum C_exp + C_control is small. That is, it rewards improvements made on "difficult" problems more than improvements made on "easy" problems.
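A minimal sketch of this measure (the percentages below are invented to show the weighting, not data from the study):

```python
def normalized_difference(c_exp: float, c_control: float) -> float:
    """Delta = (C_exp - C_control) / (C_exp + C_control).

    Positive when the experiment group outperformed the control group;
    the denominator inflates a fixed gap on 'difficult' problems,
    where both percentages are small.
    """
    return (c_exp - c_control) / (c_exp + c_control)

print(normalized_difference(70.0, 60.0))  # 10-point gap, easy problem:      ~0.077
print(normalized_difference(30.0, 20.0))  # 10-point gap, difficult problem: ~0.200
```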

The NE-question results show that the 203–204 students often did significantly better on problems relating to topics for which both classes had very similar learning environments, with the differences on 7 NE-questions being significant at the 0.01 level and another 2 at the 0.05 level, all with the 203–204 sample doing better than the 193–194 sample. This observation was anticipated due to the known selection bias in this study. Given this bias, it would have been an achievement simply to bring the 193–194 students to a comparable level of performance on the E-questions.

[FIG. 3. Plots of the normalized difference Δ between each condition on multiple-choice exam questions. Two panels plot Δ versus exam number (1–6): "NE-Questions" (Δ ranging from roughly −0.2 to 0.1) and "E-Questions" (Δ ranging from roughly −0.3 to 0.3).]


In fact, the results show that by exam 3 the performance of the 193–194 students on E-questions was not only comparable to, but was better than, that of the 203–204 students, although only three of the positive differences on E-questions were significant at the 0.01 level.

The favored hypothesis to explain these data is that the improved relative performance of the 193–194 students on E-questions was due to the use of our evaluation tasks. However, the strength of this hypothesis may be mitigated by the presence of uncontrolled factors in the study. Although students were not randomly assigned into the conditions, it is difficult to see how this may account for the observed differences on NE- and E-questions. Differences between the teaching populations for the two classes also fail to provide a reasonable mechanism for producing the differences on E-questions and NE-questions simultaneously. If there were any sort of "good teacher effect," it would be likely to affect the results for both E- and NE-questions.

Another threat to internal validity stems from the fact that the lecturers were not blind to the study and may have unintentionally skewed the results. This potential experimenter bias can be argued against because of the fact that lectures for 193–194 and 203–204 were designed to be as similar as possible, and it is not at all clear how stylistic differences could cause such preferential performance differences between E- and NE-questions in any consistent fashion. All topics, whether those tested by E- or NE-questions, were covered in very similar fashions during lectures, recitations, and laboratories. Also, the fact that these performance differences developed and persisted through two different lecturers for 193–194 suggests that differences in lecture style were probably not a significant causal factor.

Another alternative hypothesis is that the results are due to time-on-topic differences in the recitation and homework assignments. Although recitation and homework assignments were designed to minimize such differences, no actual measurements were made. It is possible that the evaluation tasks

TABLE III. Cross tabulation of performance on multiple-choice (MC) exam problems and the exact significance (2-tailed) for chi-squared tests of independence between each class. * = significant at 0.05 level; ** = significant at 0.01 level.

Exam/Problem  193–194 correct  203–204 correct  p-value
Ex1 NE 1      71.1%            79.7%            0.026*
Ex1 NE 2      48.7%            65.5%            0.001**
Ex2 E 1       25.8%            40.9%            0.001**
Ex2 E 2       81.3%            88.0%            0.049*
Ex2 NE 1      53.8%            49.4%            0.361
Ex2 NE 2      52.2%            75.4%            0.001**
Ex3 E 1       83.1%            83.2%            1.000
Ex3 E 2       80.3%            76.4%            0.330
Ex3 E 3       78.1%            71.2%            0.124
Ex3 NE 1      80.9%            77.4%            0.380
Ex3 NE 2      97.2%            95.4%            0.368
Ex3 NE 3      69.1%            82.0%            0.001**
Ex3 NE 4      45.5%            55.1%            0.038*
Ex4 E 1       96.4%            92.2%            0.062
Ex4 E 2       91.1%            73.1%            0.001**
Ex4 NE 1      88.5%            96.6%            0.001**
Ex4 NE 2      41.1%            55.9%            0.001**
Ex5 E 1       68.6%            40.3%            0.001**
Ex5 E 2       95.7%            91.5%            0.081
Ex5 NE 1      89.4%            87.0%            0.496
Ex5 NE 2      93.1%            91.8%            0.622
Ex6 E 1       91.3%            78.7%            0.001**
Ex6 E 2       69.9%            64.5%            0.244
Ex6 E 3       78.7%            74.0%            0.268
Ex6 E 4       54.6%            46.3%            0.072
Ex6 NE 1      60.1%            69.0%            0.050
Ex6 NE 2      50.3%            71.5%            0.001**
Ex6 NE 3      62.3%            64.0%            0.768
Ex6 NE 4      63.9%            78.9%            0.001**
Ex6 NE 5      70.5%            76.5%            0.146


simply took longer for students to complete, and that their performance on E-questions was due not to the format of the activity but simply to the fact that they spent more time thinking about the concepts involved in the activity. However, anecdotal evidence from teaching assistants who helped students with their recitation and homework assignments suggests that students did not take an inordinate amount of time to complete the evaluation tasks. Also, there was no indication from student comments made to the teaching assistants that they felt the evaluation tasks took much longer than other recitation and homework problems.

While the results above indicate that the use of evaluation tasks benefited student problem-solving performance, we can further test this hypothesis by looking for a concrete association between the strength of students' evaluation abilities and their problem-solving performance. We construct two ad hoc measures to reflect students' apparent understanding of when, why, and how to use each evaluation strategy for a certain topic. The measures are

SCA_relative = (FS_exp)(QS_exp) − (FS_control)(QS_control),

UA_relative = (FU_exp)(QU_exp) − (FU_control)(QU_control),

where SCA stands for "special-case analysis" and UA stands for "unit analysis"; FS is the fraction of the class (experimental or control) which used special-case analysis, and FU is the fraction which used unit analysis, on the evaluation tasks included on exams 2, 4, 5, and 6. (UA_relative is not applicable for exam 5 due to the task format, a conceptual counterargument.) Also, QS is the average quality of the class's special-case analyses, and QU is the average quality of the class's unit analyses, for these tasks. Each of these quantities corresponds to the appropriate values in Table I.

The fractions FS and FU serve as indicators of how well students understand when and why to use each strategy. If students do not understand the purpose of an evaluation strategy, they are not likely to spontaneously employ it on the critical thinking or conceptual counterargument tasks without specific prompting. The quantities QS and QU are given by the evaluation ability rubrics and measure the students' understanding of how to use each strategy. By taking the products (FS)(QS) and (FU)(QU), we get a pair of numbers between 0 and 3 which are taken to be indicative of each class's overall understanding of each evaluation strategy.

To measure relative student performance on E- and NE-questions, the normalized differences in class performance for each problem category are averaged on each exam:

E_relative = (1 / N_E-problems) Σ_{E-problems} Δ_E-problem,

NE_relative = (1 / N_NE-problems) Σ_{NE-problems} Δ_NE-problem,

where Δ represents the normalized difference between the two classes' performance, as plotted in Fig. 3, and N denotes the number of problems of a certain type (either E- or NE-questions) on an exam.

Table IV lists the values for these four measures of relative class performance on each exam. To determine whether special-case analysis and unit analysis performance related to performance on the exam questions, a Pearson correlation analysis of these data was performed. The results indicate that E_relative is significantly positively correlated with SCA_relative (r = .996, p = .004) and negatively, but not significantly, correlated with UA_relative (r = −.921, p = .255), while NE_relative is not significantly correlated with either SCA_relative (r = .447, p = .553) or UA_relative (r = .528, p = .646).
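This analysis can be reproduced from the Table IV entries; a minimal sketch, assuming scipy (because the table values are rounded, the recomputed r for E_relative vs. SCA_relative lands near, not exactly at, the reported .996):

```python
from scipy.stats import pearsonr

sca = [0.179, 0.645, 0.851, 0.683]     # SCA_relative, exams 2, 4, 5, 6
e_sca = [-0.045, 0.066, 0.141, 0.057]  # E_relative on the same exams

ua = [2.047, 1.272, 0.893]             # UA_relative, exams 2, 4, 6 (NA on exam 5)
e_ua = [-0.045, 0.066, 0.057]          # E_relative on exams 2, 4, 6

r, p = pearsonr(sca, e_sca)
print(f"E vs SCA: r = {r:.3f}, p = {p:.3f}")
r, p = pearsonr(ua, e_ua)
print(f"E vs UA:  r = {r:.3f}, p = {p:.3f}")  # r is about -0.921, as reported
```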

The most parsimonious account for these results entails three hypotheses. One is that giving a greater proportion of evaluation tasks on special-case analysis after exam 2 (as discussed above) helped students to improve their use of that strategy, while at the same time reducing their use of unit analysis. The fact that we manipulated these two independent factors in this fashion could thereby explain why the fractional usage of unit analysis steadily declined from exams 2 to 6 (see Table I).

A second hypothesis is that students' use of special-case analysis in recitation, homework, and in their personal learning behavior significantly benefited their problem-solving ability. The relative performance of the 193–194 students on E-questions correlated very strongly with their use of special-case analysis on the exam evaluation tasks. Note that because we are looking at relative differences in performance between the experiment and control groups, we can safely rule out the possibility that this correlation is due to the "easiness" of the subject matter or any other such factor.

The third hypothesis is that the use of unit analysis in recitations, homework, and personal learning behavior did not benefit students' problem-solving ability as much as special-case analysis. It would appear that the 193–194 students did not appreciate the greater utility of special-case analysis, though, since on the end-of-year survey they reported unit analysis as being just as valuable for learning physics as special-case analysis (see Table II). It therefore seems that the students had some limitations in their ability to self-assess the utility of special-case analysis and unit analysis for their learning. Also, the fact that NE_relative is uncorrelated with both UA_relative and SCA_relative indicates that there is no transfer in the benefits of either strategy to topics not covered by the evaluation tasks from recitation and homework assignments.

TABLE IV. Measures of relative overall class performance for exams 1 through 6. The definitions of each measure are described in the text.

Exam   SCA_relative   UA_relative   E_relative   NE_relative
1      NA             NA            NA           -0.102
2      0.179          2.047         -0.045       -0.070
3      NA             NA            0.024        -0.037
4      0.645          1.272         0.066        -0.098
5      0.851          NA            0.141        0.010
6      0.683          0.893         0.057        -0.080


XI. DISCUSSION

While the strategy of special-case analysis is rather complex and took more time and effort for students to learn, it appears to provide significant benefits to performance on multiple-choice exam questions. In contrast, the simpler strategy of unit analysis is learned very quickly but apparently provided little or no benefit on exam questions. Here we shall discuss some possible reasons for these results.

In his doctoral dissertation, Sherin [41] presented and discussed a categorization of interpretive strategies, which he calls "interpretive devices," used by students to give meaning to physics equations while working on back-of-chapter homework problems. His classification of these strategies is reproduced in Table V. There are three classes of devices: narrative, static, and special case.

Each of these devices plays a role in what we call special-case analysis (this article uses the term "special-case" in a much broader sense than Sherin). One may conduct many different special-case analyses of an equation, using any one of these interpretive devices as the specific means. For example, one may choose to analyze a case where some parameter is changed in the problem, or where a quantity is taken to some limiting value. By engaging students in special-case analyses through the use of our tasks, students are given the opportunity and feedback to practice the use of these interpretive devices.

Sherin argues that interpretive devices function as sense-making tools which build meaning around an equation by relating it to other pieces of knowledge. The field of semiotics studies exactly how humans make meaning using resources such as systems of words, images, actions, and symbols, and has identified two aspects of meaning, called typological meaning (meaning by kind) and topological meaning (meaning by degree) [42]. Typological meaning is established by distinguishing categories of objects, relations, and processes. For example, an equation gains typological meanings by identifying categories of physical situations in which it is applicable (e.g., we can use a = v²/r for circular motion), the types of physical idealizations it assumes (e.g., we will assume v and r are constant), and by categorizing the quantities in the equation (e.g., does v correspond to voltage or velocity? Does r correspond to the radius of the circle or the size of the object?).

Topological meaning is created by examining changes by some degree. For example, an equation gains topological meanings by being relatable to other physical situations within the category of applicable situations (e.g., if r had been greater, how would a change?), or to situations which lie on the borderline of applicable situations (e.g., if we let r → ∞, what happens to a?). Also, an equation gains topological meaning by examining gradual deviations from the idealizations used by the equation (e.g., what would happen if v was increasing?). Typological and topological meaning for physics equations may be developed by a variety of activities [43].

Typological and topological meanings are not distinct but necessarily relate to one another. For our example of centripetal acceleration, by taking r to infinity we enter a new category of physical situations, as the motion now has constant velocity. Therefore, this aspect of topological meaning construction develops typological meaning by highlighting the relation between the two categories of constant velocity motion and uniform circular motion.
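As a worked instance of the limiting-case device applied to this example (our illustration, not a task from the study):

```latex
% Special-case (limiting) analysis of a = v^2/r at fixed speed v:
a \;=\; \frac{v^2}{r}
\qquad\Longrightarrow\qquad
\lim_{r \to \infty} \frac{v^2}{r} \;=\; 0.
% The predicted acceleration vanishes, which is exactly what constant-velocity
% (straight-line) motion requires; the equation passes this special case, and
% the borderline between the two categories of motion is made explicit.
```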

So it is possible that the use of special-case analysis tasks, inasmuch as they compel students to use interpretive devices, aids student understanding of equations and consequently benefits their performance on exam questions. Unit analysis, on the other hand, makes no apparent use of interpretive devices and therefore seems incapable of constructing topological meaning. Unit analysis may help to construct some typological meaning, as it compels students to figure out which physical property each specific quantity corresponds to, but in general would seem to be weaker than special-case analysis at developing meaning for equations. This does not mean that unit analysis is a worthless strategy, as it certainly does serve the important function of testing for self-consistency in an equation. It may be that unit analysis could have a stronger impact if taught in a different fashion which places more emphasis on the conceptual aspects associated with it.
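For contrast, a unit analysis of the same equation, illustrating the self-consistency check described above (again our illustration):

```latex
% Unit analysis of a = v^2/r:
\left[\frac{v^2}{r}\right]
\;=\; \frac{(\mathrm{m/s})^2}{\mathrm{m}}
\;=\; \frac{\mathrm{m}}{\mathrm{s}^2}
\;=\; [\,a\,].
% The units match, so the equation is dimensionally self-consistent. Note that
% a = 2v^2/r would pass equally well: the check constrains form, not factors,
% which is one sense in which unit analysis builds less meaning for equations.
```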

XII. CONCLUSION

Evaluation is a well-recognized part of learning, yet its importance is often not reflected in our introductory physics courses. This study has developed tasks and rubrics designed to help students become evaluators by engaging them in formative assessment activities. Results indicated that the use of evaluation activities, particularly those focusing on special-case analysis, helps generate a significant increase in performance on multiple-choice exam questions. One possible explanation for this is that special-case analysis tasks engage students in the use of interpretive devices to construct typological and topological meaning for equations. It should be noted that increases in performance were only observed on E-questions, indicating a lack of transfer to topics not covered by the evaluation activities. Future studies designed to replicate and expand upon these results are necessary in order to strengthen the external validity of our results. Also, the potential epistemological benefits of using evaluation tasks should be investigated, as well as the interactions with student motivation and self-efficacy.

TABLE V. Interpretive devices listed by class. Reproduced from Sherin [41], Chap. 4, Fig. 2.

Narrative Class        Static Class       Special Case Class
Changing Parameters    Specific Moment    Restricted Value
Physical Change        Generic Moment     Specific Value
Changing Situation     Steady State       Limiting Case
                       Static Forces      Relative Values
                       Conservation
                       Accounting



ACKNOWLEDGMENTS

This study was conducted while the author was a member of the Rutgers Physics & Astronomy Education Group. I would like to thank Alan Van Heuvelen, Eugenia Etkina, Sahana Murthy, Michael Gentile, David Brookes, and David Rosengrant for valuable discussions and help with the execution of this project. Also, the feedback of three anonymous referees was helpful in improving the quality of this paper. This work was partially supported by the National Science Foundation, Grant No. DUE-0241078.

APPENDIX: EVALUATION ACTIVITIES

See separate auxiliary material for the evaluation tasks used in the 193–194 recitation and homework assignments, the evaluation rubrics developed and used, the exam evaluation tasks, and the E- and NE-problems.

[1] B. S. Bloom, A Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain (David McKay Co., Inc., New York, 1956).

[2] A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives, edited by L. W. Anderson and D. R. Krathwohl (Longman, New York, 2001).

[3] A. E. Lawson, How do humans acquire knowledge? And what does that imply about the nature of knowledge?, Sci. Educ. 9, 577 (2000).

[4] A. E. Lawson, The generality of hypothetico-deductive reasoning: Making scientific reasoning explicit, Am. Biol. Teach. 62, 482 (2000).

[5] A. R. Warren, Evaluation Strategies: Teaching Students to Assess Coherence & Consistency, Phys. Teach. 47, 466 (2009).

[6] A. Buffler, S. Allie, F. Lubben, and B. Campbell, The development of first year physics students' ideas about measurement in terms of point and set paradigms, Int. J. Sci. Educ. 23, 1137 (2001).

[7] R. F. Lippmann, Ph.D. dissertation, University of Maryland, 2003.

[8] S. P. Marshall, Schemas in Problem-Solving (Cambridge University Press, New York, 1995).

[9] W. J. Leonard, W. J. Gerace, and R. J. Dufresne, Analysis-based problem solving: Making analysis and reasoning the focus of physics instruction, Sci. Teach. 20, 387 (2002).

[10] F. Reif and J. I. Heller, Knowledge structures and problem solving in physics, Educ. Psychol. 17, 102 (1982).

[11] P. Heller, R. Keith, and S. Anderson, Teaching problem solving through cooperative grouping. Part 1: Group versus individual problem solving, Am. J. Phys. 60, 627 (1992).

[12] M. Sabella and E. F. Redish, Knowledge organization and activation in physics problem-solving, University of Maryland preprint, 2004.

[13] R. R. Hake, Interactive-Engagement Versus Traditional Methods: A Six-Thousand-Student Survey of Mechanics Test Data for Introductory Physics Courses, Am. J. Phys. 66, 64 (1998).

[14] K. Cummings, J. Marx, R. Thornton, and D. Kuhl, Evaluating innovation in studio physics, Am. J. Phys. 67, S38 (1999).

[15] L. C. McDermott and P. S. Shaffer, Tutorials in Introductory Physics (Prentice-Hall, Upper Saddle River, NJ, 2002).

[16] N. D. Finkelstein and S. J. Pollock, Replicating and understanding successful innovations: Implementing tutorials in introductory physics, Phys. Rev. ST Phys. Educ. Res. 1, 010101 (2005).

[17] A. P. Fagen, C. H. Crouch, and E. Mazur, Peer Instruction: Results from a range of classrooms, Phys. Teach. 40, 206 (2002).

[18] E. Mazur, Peer Instruction: A User's Manual (Prentice-Hall, Upper Saddle River, NJ, 1996).

[19] R. Warnakulasooriya, D. J. Palazzo, and D. E. Pritchard, Evidence of Problem-Solving Transfer in Web-Based Socratic Tutor, in Physics Education Research Conference Proceedings, edited by P. Heron, L. McCullough, and J. Marx (AIP Press, Melville, 2006).

[20] K. VanLehn, C. Lynch, K. Schulze, J. Shapiro, R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and M. Wintersgill, The Andes Physics Tutoring System: Lessons Learned, Int. J. Artif. Intell. Educ. 15, 147 (2005).

[21] B. J. Zimmerman and M. Martinez-Pons, Development of a structured interview for assessing student use of self-regulated learning strategies, Am. Educ. Res. J. 23, 614 (1986).

[22] N. Perry, Young children's self-regulated learning and contexts that support it, J. Educ. Psychol. 90, 715 (1998).

[23] D. Hammer, More than misconceptions: Multiple perspectives on student knowledge and reasoning, and an appropriate role for education research, Am. J. Phys. 64, 1316 (1996).

[24] American Association for the Advancement of Science, Project 2061: Benchmarks for Science Literacy (Oxford University Press, New York, 1993).

[25] NSF Directorate for Education and Human Resources, Review of Undergraduate Education, Shaping the Future: New Expectations for Undergraduate Education in Science, Mathematics, Engineering, and Technology (1996); recommendations may be seen at http://www.ehr.nsf.gov/egr/due/documents/review/96139/four.htm.

[26] National Research Council, National Science Education Standards (National Academy Press, Washington, DC, 1996).

[27] R. H. Ennis, in Teaching Thinking Skills: Theory and Practice, edited by J. B. Baron and R. J. Sternberg (Freeman, New York, 1987), pp. 9–26.

[28] P. M. King and K. S. Kitchener, Developing Reflective Judgment: Understanding and Promoting Intellectual Growth and Critical Thinking in Adolescents and Adults (Jossey-Bass, San Francisco, 1994).

[29] P. Black and D. Wiliam, Inside the Black Box: Raising Standards through Classroom Assessment (King's College London, 1998).

[30] R. Sadler, Instr. Sci. 18, 119 (1989).

[31] S. Murthy, Peer-Assessment of Homework Using Rubrics, in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 156–159.

[32] E. Yerushalmi, C. Singh, and B. S. Eylon, Physics Learning in the Context of Scaffolded Diagnostic Tasks (I), in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 27–30.

[33] C. Singh, E. Yerushalmi, and B. S. Eylon, Physics Learning in the Context of Scaffolded Diagnostic Tasks (II): Preliminary Results, in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 31–34.

[34] E. Etkina, A. Van Heuvelen, S. Brahmia, D. Brookes, M. Gentile, S. Murthy, D. Rosengrant, and A. Warren, Scientific abilities and their assessment, Phys. Rev. ST Phys. Educ. Res. 2, 020103 (2006).

[35] J. Cohen, A coefficient of agreement for nominal scales, Educ. Psychol. Meas. 20, 37 (1960).

[36] A. R. Warren, The Role of Evaluative Abilities in Physics Learning, in 2004 Physics Education Research Conference Proceedings, edited by J. Marx, S. Franklin, and P. Heron (AIP, Melville, NY, 2005).

[37] A. R. Warren, Ph.D. dissertation, Rutgers University, 2006.

[38] M. Scott, T. Stelzer, and G. Gladding, Evaluating multiple-choice exams in large introductory physics courses, Phys. Rev. ST Phys. Educ. Res. 2, 020102 (2006).

[39] D. Hestenes, M. Wells, and G. Swackhamer, Force Concept Inventory, Phys. Teach. 30, 141 (1992).

[40] D. P. Maloney, T. L. O'Kuma, C. J. Hieggelke, and A. Van Heuvelen, Surveying students' conceptual knowledge of electricity and magnetism, Am. J. Phys. 69, S12 (2001).

[41] B. Sherin, Ph.D. dissertation, University of California, Berkeley, 1996.

[42] J. L. Lemke, Multiplying Meaning: Visual and Verbal Semiotics in Scientific Text, in Reading Science, edited by J. R. Martin and R. Veel (Routledge, London, 1998).

[43] E. Etkina, A. R. Warren, and M. J. Gentile, The Role of Models in Physics Instruction, Phys. Teach. 44, 34 (2006).




12 M Sabella and E F Redish Knowledge organization and ac-tivation in physics problem-solving University of Marylandpre-print 2004

13 R R Hake Interactive-Engagement Versus Traditional Meth-ods A Six-Thousand-Student Survey of Mechanics Test Datafor Introductory Physics Courses Am J Phys 66 Issue 1 641998

14 K Cummings J Marx R Thornton and D Kuhl Evaluatinginnovation in studio physics Am J Phys 67 S38 1999

15 L C McDermott and P S Schaffer Tutorials in IntroductoryPhysics Prentice-Hall Upper Saddle River NJ 2002

16 N D Finkelstein and S J Pollock Replicating and under-standing successful innovations Implementing tutorials in in-

troductory physics Phys Rev ST Phys Educ Res 1 0101012005

17 A P Fagen C H Crouch and E Mazur Peer InstructionResults from a range of classrooms Phys Teach 40 2062002

18 E Mazur Peer Instruction A Userrsquos Manual Prentice-HallUpper Saddle River NJ 1996

19 R Warnakulasooriya D J Palazzo and D E Pritchard inEvidence of Problem-Solving Transfer in Web-Based SocraticTutor Physics Education Research Conference Proceedingsedited by P Heron L McCullough and J Marx AIP PressMelville 2006

20 K VanLehn C Lynch K Schulze J Shapiro R Shelby LTaylor D Treacy A Weinstein and M Wintersgill TheAndes Physics Tutoring System Lessons Learned Int J ArtifIntell Educ 15 No 3 147 2005

21 B J Zimmerman and M Martinez-Pons Development of astructured interview for assessing student use of self-regulatedlearning strategies Am Educ Res J 23 Issue 4 614 1986

22 N Perry Young childrenrsquos self-regulated learning and contextsthat support it J Educ Psychol 90 715 1998

23 D Hammer More than misconceptions Multiple perspectiveson student knowledge and reasoning and an appropriate rolefor education research Am J Phys 64 1316 1996

24 American Association for the Advancement of ScienceProject 206 Benchmarks for Science Literacy Oxford Univer-sity Press New York 1993

25 NSF Directorate for Education and Human Resources Reviewof Undergraduate Education Shaping the Future New Expec-tations for Undergraduate Education in Science MathematicsEngineering and Technology Recommendations may be seenat httpwwwehrnsfgovegrduedocumentsreview96139fourhtm 1996

26 National Research Council National Science Education Stan-dards National Academy Press Washington DC 1996

27 R H Ennis in Teaching Thinking Skills Theory and Practiceedited by J B Baron and R J Sternberg Freeman New York1987 pp 9ndash26

28 P M King and K S Kitchener Developing Reflective Judg-ment Understanding and Promoting Intellectual Growth andCritical Thinking in Adolescents and Adults Jossey-Bass SanFrancisco 1994

29 P Black and D Wiliam Inside the Black Box Raising Stan-dards through Classroom Assessment Kingrsquos College Lon-

IMPACT OF TEACHING STUDENTS TO USE EVALUATIONhellip PHYS REV ST PHYS EDUC RES 6 020103 2010

020103-11

don 199830 R Sadler Instr Sci 18 119 198931 S Murthy Peer-Assessment of Homework Using Rubrics

2007 Physics Education Research Conference edited by LHsu C Henderson and L McCullough AIP Melville NY2007 p 156ndash159

32 E Yerushalmi C Singh and B S Eylon Physics Learning inthe Context of Scaffolded Diagnostic Tasks (I) 2007 PhysicsEducation Research Conference edited by L Hsu C Hender-son and L McCullough AIP Melville NY 2007 p 27ndash30

33 C Singh E Yerushalmi and B S Eylon Physics Learning inthe Context of Scaffolded Diagnostic Tasks (II) PreliminaryResults 2007 Physics Education Research Conference editedby L Hsu C Henderson L McCullough AIP Melville NY2007 p 31ndash34

34 E Etkina A Van Heuvelen S Brahmia D Brookes M Gen-tile S Murthy D Rosengrant and A Warren Scientific abili-ties and their assessment Phys Rev ST Phys Educ Res 2020103 2006

35 J Cohen A coefficient of agreement for nominal scales EducPsychol Meas 20 37 1960

36 A R Warren The Role of Evaluative Abilities in PhysicsLearning 2004 Physics Education Research Conference Pro-ceedings edited by J Marx S Franklin and P Heron AIPMelville NY 2005

37 A R Warren PhD Dissertation Rutgers University 200638 M Scott T Stelzer and G Gladding Evaluating multiple-

choice exams in large introductory physics courses Phys RevST Phys Educ Res 2 020102 2006

39 D Hestenes M Wells and G Swackhamer Force conceptinventory Phys Teach 30 141 1992

40 D P Maloney T L OrsquoKuma C J Hieggelke and A VHeuvelen Surveying studentsrsquo conceptual knowledge of elec-tricity and magnetism Am J Phys 69 S12 2001

41 B Sherin PhD Dissertation University of California Berke-ley 1996

42 J L Lemke in ldquoMultiplying Meaning Visual and VerbalSemiotics in Scientific Textrdquo in Reading Science edited by JR Martin and R Veel Routledge London 1998

43 E Etkina A R Warren and M J Gentile The Role of Mod-els in Physics Instruction Phys Teach 44 34 2006

AARON R WARREN PHYS REV ST PHYS EDUC RES 6 020103 2010

020103-12

Page 4: Impact of teaching students to use evaluation strategies

VII. STUDY DESIGN AND RESULTS

Preliminary research conducted during the 2003–2004 academic year demonstrated significant correlations between students' abilities to use evaluative strategies and their problem-solving performance [36]. During the 2004–2005 academic year, a study was conducted to further investigate the effects of using evaluation tasks in an algebra-based physics course. After discussing the design of the study, results are presented addressing three research goals: (1) measure students' abilities to use evaluation strategies; (2) investigate the extent to which students valued and incorporated evaluation strategies into their personal learning behavior; (3) test whether the use of evaluation tasks resulted in significant improvements in student problem-solving performance.

To accomplish these goals, data were collected from two courses at Rutgers University as part of a quasiexperimental control-group study. The experiment group consists of the 193–194 course, a year-long introductory algebra-based physics course for life science majors. The gender distribution was 46% male, 54% female. There were 200 students for each term, with 95% of students participating in both the 193 and 194 courses. The comparison group is the year-long 203–204 course, another introductory algebra-based course for life science majors. There were 459 students for the fall term, and 418 of those students continued enrollment in the spring term. These two courses were generally run in parallel, although the 203–204 class covers slightly more material, and the students in this class typically have stronger math and science backgrounds than in the 193–194 course (it is the more competitive course, held on a different university campus). This likely bias was substantiated by the data, and will serve to accentuate the results of our study, as discussed below.

The lectures for 203–204 were all designed and given by Alan Van Heuvelen. The lectures for the 193 course were given by Sahana Murthy, a post-doctoral researcher working with the Rutgers Physics & Astronomy Education Group at the time, and the lectures for the 194 course were given by the author. All lectures for 193 and 194 were based on Van Heuvelen's lecture notes. Lectures for all courses involved chalkboard and transparency-based presentations featuring experimental demonstrations and student engagement via peer discussions and student infrared response systems. The chalkboard and transparency-based presentations often began with an experimental demonstration to establish a concrete example of some new class of phenomena to be studied. Questions about the properties of the phenomena would be elicited and/or posed by the lecturer. These questions would motivate the construction of a physical model. Application of the model to solve conceptual and traditional problems would then be illustrated. This entailed making multiple representations of information, assessing consistency between the different representations, and utilizing the representations together with the physical model in order to solve the problem. Students would then work in groups on one or two problems, respond via infrared response systems, and receive some feedback from the lecturer. It should be noted that lectures in the 203–204 courses did include explicit modeling of unit analysis.

One premise of the study was the ability to provide very similar lecture environments for the two courses. However, some stylistic and personal differences in lecturing were unavoidable, and the threat to the study's internal validity will be assessed below.

Recitations for both courses involved 25 students working in groups of 3–5 on a recitation assignment, with a teaching assistant there to provide help as needed. Homework assignments featured problems which were distinct from, but usually similar to, problems from the recitation assignments. Solutions for recitation and homework assignments were posted online for each course. The recitation and homework assignments given by Van Heuvelen for 203–204 were generally identical to those given in 193–194, except that, in 193–194, some problems from the 203–204 assignments were replaced by evaluation activities. In general, there were between one and two such replacements each week, with one replacement in the homework assignment and usually one replacement in the recitation assignment.

This replacement served as our experimental factor, being the only designed difference between the two courses. We attempted to minimize any potential bias due to time-on-topic differences by only replacing problems which covered the same material and which were thought to take roughly the same amount of time to complete as the evaluation tasks which were used in their place. Some differences in the recitation and homework assignments were inevitable, as the 203–204 course is required to cover more material and had 55-min recitations, while the 193–194 course had 80-min recitations. Again, we will address the threats posed by such factors later.

The evaluation tasks used in recitations and homework for 193–194 covered only half of the topics in the course. For example, we did not include any evaluation tasks relating to momentum, fluid mechanics, or wave optics, although tasks were included on work-energy, the first law of thermodynamics, and dc circuits (see Warren [37] for a list of the specific evaluation tasks used). This is an important feature of the study, since it provides a baseline response which will be useful when testing whether the use of evaluation tasks affects problem-solving performance.

Laboratories for both courses were practically identical, with 25 students per section working in groups of 3–5 with the aid of a teaching assistant. The laboratories featured three types of experiments: investigative, testing, and application. Each laboratory usually included two experiments, most of which were either testing or application experiments. It should be noted that although the 193–194 students were required to take the laboratory, the 203–204 students were not, as the laboratories constitute a distinct course labeled 205–206. However, nearly all (98%) of the 203–204 students were enrolled in 205–206 concurrently.

Another important set of factors in the study is the teachers themselves. The teaching assistants in 193–194 were atypical, as 4 of the 8 assistants for the course were enrolled at the Graduate School of Education and another 2 assistants were members of the Rutgers Physics and Astronomy Education Group. Only 1 of the 8 teaching assistants was a traditional physics graduate student. In contrast, the teaching assistants for 203–204 included a mixture of several professors and several physics graduate students, none of whom were in physics education. These are, at least superficially, very distinct populations. This uncontrolled factor and the threats it poses to the study's internal validity shall be discussed below.

VIII. GOAL 1: TEACHING EVALUATION STRATEGIES

The 193–194 and 203–204 courses both had six exams during the year. Each exam featured a combination of multiple-choice questions and open-response tasks, with roughly two-thirds of the test score based on multiple-choice performance and one-third based on the open-response tasks. A typical midterm exam (exams 1, 2, 4, 5) had between 10 and 15 multiple-choice questions and 3 open-response tasks. Final exams (exams 3 and 6) had roughly double the number of items of each type. An evaluation task was included as an open-response task on each exam for both courses. These tasks served as summative assessments to measure students' evaluative abilities. Student responses to the evaluation tasks on each exam were photocopied and scored using our rubrics. Although we photocopied and scored each student's work from 193–194, the large number of students in 203–204 made it impossible to do the same for them. We therefore randomly selected 150 students from the 203–204 course whose responses to the evaluation tasks on exams would be photocopied and scored. A check was performed to determine whether this sample was representative of the entire class. A Kruskal-Wallis test indicated that the grades for the 203–204 sample were not significantly different from those of the remainder of the class.
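For readers who want to reproduce this kind of representativeness check, a minimal sketch is given below; the grade arrays are randomly generated stand-ins for the actual course records, and the variable names are hypothetical.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample_grades = rng.normal(78, 10, 150)     # the 150 randomly selected 203-204 students
remainder_grades = rng.normal(78, 10, 309)  # the remaining 459 - 150 = 309 students

# Kruskal-Wallis H-test: a nonparametric test of whether the two groups
# come from populations with the same grade distribution.
H, p = stats.kruskal(sample_grades, remainder_grades)
print(f"H = {H:.2f}, p = {p:.3f}")  # a large p suggests the sample is representative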

The evaluation tasks on exams 1 and 3 were external special-case analysis tasks. These allowed us to measure how well the students could use this strategy when explicitly asked to. The evaluation tasks on exams 2, 4, 6 were critical thinking tasks. These allowed us to see what fraction of the students recognized that special-case and/or unit analysis could be used to construct such arguments, as would be demonstrated by their spontaneous use of these strategies. For those students that did attempt to use these strategies, the rubrics were used to measure the quality of their use. Each of these tasks involved topics which the 193–194 students had had both special-case and unit analysis tasks on (e.g., dc circuits). The evaluation task on exam 5 was a conceptual counterargument task, which allowed us to see what fraction of students tried to use special-case analysis and how well they used it. For a list of the exam evaluation tasks, see Warren [37]. Table I lists the results from these tasks. The reported numbers for the quality of each strategy's use on exams 2, 4, 5, 6 are the average rubric scores of those students who decided to try using that particular strategy on the critical thinking tasks included in those exams. The values in columns 3 and 4 state the fractions of each class using each strategy, and were determined by finding the fraction of the class that had a score of at least 1 according to our rubrics. In column 4, a value of 1.00 indicates that students were explicitly instructed to use a particular strategy on an evaluation task. Values of NA in columns 3 and 5 indicate that no evaluation tasks relating to UA were included. All differences in fractional use between the two classes are statistically significant at the 0.01 level according to a χ²-test. Note that there is an NA in columns 3 and 5 for exams 1 and 3 because the tasks given on those exams explicitly told students to do a special-case analysis.
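As an illustration of the χ²-test of independence used for these comparisons, the sketch below tests whether fractional use of special-case analysis differs between the two classes; the counts are hypothetical, chosen only to mirror the scale of the study.

from scipy.stats import chi2_contingency

# Rows: class (193-194, then the 203-204 sample); columns: used SCA, did not.
# E.g., 18% of 200 students vs. roughly 5% of 150 sampled students on exam 2.
table = [[36, 164],
         [8, 142]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")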

There were typically 8–12 students in our sample from 203–204 who tried using special-case analysis on the evaluation tasks from those exams, and their average quality of use according to our rubric was ≈2.3 for each exam. No more than 1 student in our sample from 203–204 ever used unit analysis on the critical thinking tasks.

There are a few points to be made here. First, the results indicate that we were successful at helping the 193–194 students understand when, why, and how to use both strategies. They significantly outperformed the 203–204 students in the frequency of use for each evaluation strategy, and eventually demonstrated high quality use of each strategy as measured by the rubrics. The increases in special-case analysis quality from exam 1 to exam 3, and in the fraction using it from exam 2 to exam 4, are in part attributable to the fact that the ratio of the types of tasks given in recitation and homework assignments was altered after exam 2.

TABLE I. Measurements of students' evaluative abilities for unit analysis (UA) and special-case analysis (SCA) on each of the six exams.

Exam  Class     Fraction UA  Fraction SCA  Quality UA   Quality SCA
1     193–194   NA           1.00          NA           1.17 ± 0.07
      203–204   NA           1.00          NA           0.81 ± 0.05
2     193–194   0.89         0.18          2.48 ± 0.04  1.25 ± 0.08
      203–204   0.01         0.05          3.00 ± 0.00  2.33 ± 0.19
3     193–194   NA           1.00          NA           2.26 ± 0.04
      203–204   NA           1.00          NA           0.90 ± 0.05
4     193–194   0.53         0.38          2.73 ± 0.06  2.23 ± 0.07
      203–204   0.01         0.09          3.00 ± 0.00  2.30 ± 0.13
5     193–194   NA           0.42          NA           2.34 ± 0.08
      203–204   NA           0.05          NA           2.33 ± 0.19
6     193–194   0.38         0.38          2.35 ± 0.09  2.25 ± 0.07
      203–204   0.01         0.07          3.00 ± 0.00  2.25 ± 0.13


Up until exam 2, roughly half of the evaluation tasks included in these assignments focused on unit analysis, while the other half dealt with special-case analysis. After seeing the results from exam 2, though, it was realized that students were having much more success with unit analysis, perhaps because it is a much simpler strategy than special-case analysis. Thereafter, the majority of evaluation tasks in recitations and homework focused on special-case analysis, and a substantial fraction (≈40%) of the students are observed to spontaneously employ SCA on the exam evaluation activities.

It should be clarified that on exams 4 and 6, the set of 193–194 students who used unit analysis and the set of those that used special-case analysis were not identical, even though the fractional usage was similar for both strategies. The overlap between these two sets, defined as the ratio of their intersection to their union, was 0.45 for exam 4 and 0.53 for exam 6.
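This overlap measure is the ratio of the intersection to the union of the two sets (the Jaccard index); a minimal sketch with hypothetical student IDs:

used_ua = {"s01", "s02", "s03", "s04"}   # students who used unit analysis
used_sca = {"s03", "s04", "s05", "s06"}  # students who used special-case analysis

# |intersection| / |union|
overlap = len(used_ua & used_sca) / len(used_ua | used_sca)
print(f"overlap = {overlap:.2f}")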

It is interesting that special-case analysis was used at all among the 203–204 students on the critical thinking tasks on exams 2, 4, 5, 6, as a group of roughly 8–12 students spontaneously employed SCA with a high level of quality (average rubric scores of roughly 2.3). Apparently, some students may enter our physics courses with a well-developed understanding of special-case analysis, perhaps due to prior physics courses. It is also worth noting that 203–204 students who tried using special-case analysis for the open-response exam problem typically scored very well on the multiple-choice questions, with 38% of them earning perfect scores.

IX. GOAL 2: USING AND VALUING EVALUATION STRATEGIES

As a means to assess student valuation and use of evaluation strategies, anonymous surveys were administered at the last recitation of the spring term in the Physics 194 course. Included were several questions asking students to rate how frequently they had used each strategy to evaluate their own work outside of class, and also to rate how strongly certain factors inhibited their use of each strategy. These survey questions are shown in Fig. 2, with the results in Table II. There were 158 respondents to the survey out of the 200 students in the course, giving a response rate of 79%. There may be a selection bias among the respondents, as the students knew that one recitation grade would be dropped from the final grade and that the last recitation would be a review session instead of a normal recitation. For that reason, upper and lower bounds are listed on the average scores in Table II, where the bounds are determined by assuming all missing respondents would have given either the highest or lowest possible response for the questions.

To conduct a Cronbach alpha test for internal consistency, the scores for questions 3a–3d and 6a–6d were made negative, accounting for the fact that these inhibitors are likely to reduce the usage and valuation of evaluation strategies by students (i.e., all scores were coded in the same conceptual direction). The calculated alpha value is α = 0.800, a strong result, giving confidence that the items are consistent and can therefore be used as a basis for interpretation about student usage and valuation of evaluation strategies.
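A minimal sketch of this reverse-coding and alpha computation, using a randomly generated response matrix in place of the actual survey data:

import numpy as np

rng = np.random.default_rng(1)
# rows = respondents; columns = items 1, 2, 3a-3d, 4, 5, 6a-6d (12 items)
items = rng.integers(0, 11, size=(158, 12)).astype(float)
inhibitor_cols = [2, 3, 4, 5, 8, 9, 10, 11]  # items 3a-3d and 6a-6d
items[:, inhibitor_cols] *= -1               # code all items in the same direction

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)
k = items.shape[1]
item_var_sum = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_var_sum / total_var)
print(f"Cronbach alpha = {alpha:.3f}")  # real survey data would be needed for a meaningful value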

One may safely assume that there were very few students who came into the 193–194 course already knowing, valuing, and using either evaluation strategy. This assumption is supported by the small number of students from 203–204 who used either strategy on the tasks in exams 2, 4, 5, and 6 (see above). Given that assumption, the results suggest a moderate degree of success in teaching students to incorporate these strategies into their personal learning behavior. Indeed, responses to questions 1 and 4 were probably artificially lowered due to the fact that we only included evaluation tasks relating to half of the topics during the year. If evaluation tasks had been given on all topics, it seems likely that students would have used the evaluation strategies more frequently. It is worth pointing out that the evidence here has limited inferential strength because of its indirectness. Further studies with more direct means of assessing whether students learn when and why to employ evaluation strategies would be useful.

1. How much have you used special-case analysis on your own to learn/do physics?
   1 = Not at all, 2 = Not Much, 3 = Somewhat, 4 = Very, 5 = Extremely

2. If you were a physics major and really interested in learning physics, how useful do you think special-case analysis would be for your learning?
   1 = Not at all, 2 = Not Much, 3 = Somewhat, 4 = Very, 5 = Extremely

3. Rate on a scale from 0–10 how much each of the listed factors affected your desire to do special-case analysis (10 means the factor really made you not want to do special-case analysis, 0 means the factor did not matter).
   a) ___ Time constraints (due to other classwork, jobs, etc.)
   b) ___ Motivation to learn physics (or lack thereof)
   c) ___ Confusion about how to do special-case analysis (if you weren't confused, put 0)
   d) ___ Confusion about the purpose of a special-case analysis (if you weren't confused, put 0)

4. How much have you used dimensional analysis on your own to learn/do physics?
   1 = Not at all, 2 = Not Much, 3 = Somewhat, 4 = Very, 5 = Extremely

5. If you were a physics major and really interested in learning physics, how useful do you think dimensional analysis would be for your learning?
   1 = Not at all, 2 = Not Much, 3 = Somewhat, 4 = Very, 5 = Extremely

6. Rate on a scale from 0–10 how much each of the listed factors affected your desire to do dimensional analysis (10 means the factor really made you not want to do dimensional analysis, 0 means the factor did not matter).
   a) ___ Time constraints (due to other classwork, jobs, etc.)
   b) ___ Motivation to learn physics (or lack thereof)
   c) ___ Confusion about how to do dimensional analysis (if you weren't confused, put 0)
   d) ___ Confusion about the purpose of a dimensional analysis (if you weren't confused, put 0)

FIG. 2. End-of-year survey questions regarding the students' self-reported use of each evaluation strategy.

TABLE II. Results from end-of-year survey questions.

Question  Average response  Lower bound  Upper bound
1         2.2/5             1.9          2.8
2         3.8/5             3.2          4.1
3a        7.0/10            5.5          7.6
3b        5.1/10            4.0          6.1
3c        2.4/10            1.9          4.0
3d        2.2/10            1.7          3.8
4         2.9/5             2.5          3.3
5         3.9/5             3.3          4.1
6a        5.9/10            4.7          6.7
6b        4.7/10            3.7          5.8
6c        1.7/10            1.3          3.4
6d        1.6/10            1.3          3.4


The relatively low response scores for questions 3c, 3d, 6c, and 6d are consistent with the results for goal one of our study, indicating we were successful at helping students understand when, why, and how to use each strategy. The fact that the scores for 3c and 3d are higher than 6c and 6d appears reasonable, because special-case analysis is certainly a much more complicated and multifaceted strategy than unit analysis. This disparity in complexity probably also explains why the scores for 3a and 3b were higher than 6a and 6b.

X. GOAL 3: ENHANCING PROBLEM-SOLVING PERFORMANCE BY TEACHING EVALUATION

Several conventional multiple-choice questions were common to the 193–194 and 203–204 lecture exams. There were six exams during the year, three per semester. The midterm exams (exams 1, 2, 4, and 5) were 80 min, while the final exams (exams 3 and 6) were 3 h. The exam questions shared by the two classes were all designed by the instructor of the 203–204 course (Van Heuvelen). Some of these shared multiple-choice questions were on topics which the 193–194 students had had evaluation tasks on, such as work-energy and dc circuits. This set of multiple-choice questions will be called E-questions. The remainder of the shared multiple-choice exam questions covered topics that no one had had evaluation tasks on, such as momentum and fluid mechanics. These will be called NE-questions. The E- and NE-questions are listed in Appendix B of Warren [37], and are assumed to provide accurate measures of student problem-solving performance [38]. This assumption mitigates the strength of the results reported below, and more robust measures of problem-solving performance would be useful in future work. Roughly 80% of these questions were numerical, and the remaining 20% were either conceptual or symbolic. Some of the conceptual questions were directly taken or modified from standardized assessment tools such as the FCI [39] and CSEM [40]. All of the questions had been used by Van Heuvelen in previous years.

The first midterm exam had 2 NE-questions and no E-questions, as there had not been any evaluation tasks used to this point in the 193–194 course. There were 2 E-questions and 2 NE-questions on midterm exams 2, 4, and 5. Exam 3 had 4 NE-questions and 3 E-questions, and exam 6 had 5 NE-questions and 4 E-questions.

Given this design, we may make two predictions based on the view of evaluation strategies as tools of self-regulated learning. First, because of the known population bias between the 193–194 and 203–204 students, the 203–204 students are expected to do better on the NE-questions. This prediction assumes that whatever benefits the evaluation tasks may have for the 193–194 students' E-question performance will not be transferable to the topics covered by NE-questions.

A second prediction is that the performance of the 193–194 students should be relatively better on E-questions than on NE-questions, if the evaluation tasks succeed in benefiting student understanding of the topics covered by the tasks. Moreover, this boost in E-question performance should vary as students become more adept at the use of evaluation strategies. Therefore, it is predicted that student performance on the open-response evaluation tasks included on each exam will directly correlate with the relative performance of 193–194 students on E-questions.

The relative performances of the 193–194 and 203–204 students on E- and NE-questions are compared in Fig. 3 and Table III. In Fig. 3, the normalized difference δ between the two classes for each E- and NE-question is computed as

δ = (C_exp − C_control) / (C_exp + C_control),

where C_exp and C_control denote the percentage of students in each condition who correctly answered the problem. A positive value therefore indicates that the experiment group (193–194) outperformed the control group (203–204) on a particular problem, and vice versa for a negative value. The magnitude gives a measure of how much of a difference there was between groups, and it exaggerates differences on problems where the sum C_exp + C_control is small. That is, it rewards improvements made on "difficult" problems more than improvements made on "easy" problems.
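The computation of δ is simple enough to state directly; the sketch below plugs in the exam 1, NE-question 1 percentages from Table III purely as an illustration.

def normalized_difference(c_exp: float, c_control: float) -> float:
    """(C_exp - C_control) / (C_exp + C_control)."""
    return (c_exp - c_control) / (c_exp + c_control)

print(normalized_difference(71.1, 79.7))  # approx -0.057: the control group did better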

The NE-question results show that the 203–204 students often did significantly better on problems relating to topics for which both classes had very similar learning environments, with the differences on 7 NE-questions being significant at the 0.01 level and another 2 at the 0.05 level, all with the 203–204 sample doing better than the 193–194 sample. This observation was anticipated due to the known selection bias in this study. Given this bias, it would have been an achievement simply to bring the 193–194 students to a comparable level of performance on the E-questions.

[Figure 3 appears here as two panels plotting the normalized difference δ against exam number (1–6): one panel for NE-questions (vertical axis from −0.2 to 0.1) and one for E-questions (vertical axis from −0.3 to 0.3).]

FIG. 3. Plots of the normalized difference between each condition on multiple-choice exam questions.


In fact, the results show that by exam 3 the performance of the 193–194 students on E-questions was not only comparable to, but better than, that of the 203–204 students, although only three of the positive differences on E-questions were significant at the 0.01 level.

The favored hypothesis to explain these data is that the improved relative performance of the 193–194 students on E-questions was due to the use of our evaluation tasks. However, the strength of this hypothesis may be mitigated by the presence of uncontrolled factors in the study. Although students were not randomly assigned to the conditions, it is difficult to see how this may account for the observed differences on NE- and E-questions. Differences between the teaching populations for the two classes also fail to provide a reasonable mechanism for producing the differences on E-questions and NE-questions simultaneously. If there were any sort of "good teacher effect," it would be likely to affect the results for both E- and NE-questions.

Another threat to internal validity stems from the fact that the lecturers were not blind to the study, and may have unintentionally skewed the results. This potential experimenter bias can be argued against because the lectures for 193–194 and 203–204 were designed to be as similar as possible, and it is not at all clear how stylistic differences could cause such preferential performance differences between E- and NE-questions in any consistent fashion. All topics, whether those tested by E- or NE-questions, were covered in very similar fashions during lectures, recitations, and laboratories. Also, the fact that these performance differences developed and persisted through two different lecturers for 193–194 suggests that differences in lecture style were probably not a significant causal factor.

Another alternative hypothesis is that the results are due to time-on-topic differences in the recitation and homework assignments. Although recitation and homework assignments were designed to minimize such differences, no actual measurements were made.

TABLE III. Cross tabulation of performance on multiple-choice (MC) exam problems and the exact significance (2-tailed) for chi-squared tests of independence between each class. * = significant at 0.05 level; ** = significant at 0.01 level.

Exam  Problem  193–194 % Correct  203–204 % Correct  p-value
1     NE 1     71.1               79.7               0.026*
1     NE 2     48.7               65.5               0.001**
2     E 1      25.8               40.9               0.001**
2     E 2      81.3               88.0               0.049*
2     NE 1     53.8               49.4               0.361
2     NE 2     52.2               75.4               0.001**
3     E 1      83.1               83.2               1.000
3     E 2      80.3               76.4               0.330
3     E 3      78.1               71.2               0.124
3     NE 1     80.9               77.4               0.380
3     NE 2     97.2               95.4               0.368
3     NE 3     69.1               82.0               0.001**
3     NE 4     45.5               55.1               0.038*
4     E 1      96.4               92.2               0.062
4     E 2      91.1               73.1               0.001**
4     NE 1     88.5               96.6               0.001**
4     NE 2     41.1               55.9               0.001**
5     E 1      68.6               40.3               0.001**
5     E 2      95.7               91.5               0.081
5     NE 1     89.4               87.0               0.496
5     NE 2     93.1               91.8               0.622
6     E 1      91.3               78.7               0.001**
6     E 2      69.9               64.5               0.244
6     E 3      78.7               74.0               0.268
6     E 4      54.6               46.3               0.072
6     NE 1     60.1               69.0               0.050
6     NE 2     50.3               71.5               0.001**
6     NE 3     62.3               64.0               0.768
6     NE 4     63.9               78.9               0.001**
6     NE 5     70.5               76.5               0.146


It is possible that the evaluation tasks simply took longer for students to complete, and that their performance on E-questions was due not to the format of the activity but simply to the fact that they spent more time thinking about the concepts involved. However, anecdotal evidence from teaching assistants who helped students with their recitation and homework assignments suggests that students did not take an inordinate amount of time to complete the evaluation tasks. Also, there was no indication from student comments made to the teaching assistants that they felt the evaluation tasks took much longer than other recitation and homework problems.

While the results above indicate that the use of evaluation tasks benefited student problem-solving performance, we can further test this hypothesis by looking for a concrete association between the strength of students' evaluation abilities and their problem-solving performance. We construct two ad hoc measures to reflect students' apparent understanding of when, why, and how to use each evaluation strategy for a certain topic. The measures are

SCA_relative = F_S^exp · Q_S^exp − F_S^control · Q_S^control,

UA_relative = F_U^exp · Q_U^exp − F_U^control · Q_U^control,

where SCA stands for "special-case analysis," UA stands for "unit analysis," F_S is the fraction of the class (experimental or control) which used special-case analysis, and F_U is the fraction which used unit analysis on the evaluation tasks included on exams 2, 4, 5, and 6. UA_relative is not applicable for exam 5 due to the task format (conceptual counterargument). Also, Q_S is the average quality of the class' special-case analyses, and Q_U is the average quality of the class' unit analyses for these tasks. Each of these quantities corresponds to the appropriate values in Table I.

The fractions F_S and F_U serve as indicators of how well students understand when and why to use each strategy. If students do not understand the purpose of an evaluation strategy, they are not likely to spontaneously employ it on the critical thinking or conceptual counterargument tasks without specific prompting. The quantities Q_S and Q_U are given by the evaluation ability rubrics, and measure the students' understanding of how to use each strategy. By taking the products F_S · Q_S and F_U · Q_U, we get a pair of numbers between 0 and 3 which are taken to be indicative of each class's overall understanding of each evaluation strategy.
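A minimal sketch of these measures, plugging in the exam 4 entries from Table I; the published Table IV values were presumably computed from unrounded data, so the results need not match Table IV exactly.

def relative_measure(f_exp, q_exp, f_control, q_control):
    # each F*Q product is a number between 0 and 3
    return f_exp * q_exp - f_control * q_control

sca_rel_exam4 = relative_measure(0.38, 2.23, 0.09, 2.30)  # Table IV lists 0.645
ua_rel_exam4 = relative_measure(0.53, 2.73, 0.01, 3.00)   # Table IV lists 1.272
print(f"SCA_relative = {sca_rel_exam4:.3f}, UA_relative = {ua_rel_exam4:.3f}")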

To measure relative student performance on E- and NE-questions, the normalized differences in class performance for each problem category are averaged on each exam:

E_relative = (1/N_E) Σ_{E-problems} δ_{E-problem},

NE_relative = (1/N_NE) Σ_{NE-problems} δ_{NE-problem},

where δ represents the normalized difference between the two classes' performance, as plotted in Fig. 3, and N denotes the number of problems of a certain type (either E- or NE-questions) on an exam.

Table IV lists the values for these four measures of relative class performance on each exam. To determine whether special-case analysis and unit analysis performance related to performance on the exam questions, a Pearson correlation analysis of these data was performed. The results indicate that E_relative is significantly positively correlated with SCA_relative (r = 0.996, p = 0.004) and negatively correlated with UA_relative (r = −0.921, p = 0.255), while NE_relative is not significantly correlated with either SCA_relative (r = 0.447, p = 0.553) or UA_relative (r = 0.528, p = 0.646).
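The correlation analysis can be sketched directly from the Table IV entries as printed; because those entries are rounded, r and p come out close to, but not necessarily identical with, the published figures.

from scipy.stats import pearsonr

sca_rel = [0.179, 0.645, 0.851, 0.683]     # exams 2, 4, 5, 6
e_rel_sca = [-0.045, 0.066, 0.141, 0.057]  # E_relative on the same exams

ua_rel = [2.047, 1.272, 0.893]             # exams 2, 4, 6 (no UA measure on exam 5)
e_rel_ua = [-0.045, 0.066, 0.057]

r_sca, p_sca = pearsonr(sca_rel, e_rel_sca)
r_ua, p_ua = pearsonr(ua_rel, e_rel_ua)
print(f"E vs SCA: r = {r_sca:.3f}, p = {p_sca:.3f}")  # published: r = 0.996, p = 0.004
print(f"E vs UA:  r = {r_ua:.3f}, p = {p_ua:.3f}")    # published: r = -0.921, p = 0.255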

The most parsimonious account for these results entails three hypotheses. One is that giving a greater proportion of evaluation tasks on special-case analysis after exam 2, as discussed above, helped students to improve their use of that strategy, while at the same time reducing their use of unit analysis. The fact that we manipulated these two independent factors in this fashion could thereby explain why the fractional usage of unit analysis steadily declined from exams 2 to 6 (see Table I).

A second hypothesis is that students' use of special-case analysis in recitation, homework, and in their personal learning behavior significantly benefited their problem-solving ability. The relative performance of the 193–194 students on E-questions correlated very strongly with their use of special-case analysis on the exam evaluation tasks. Note that, because we are looking at relative differences in performance between the experiment and control groups, we can safely rule out the possibility that this correlation is due to the "easiness" of the subject matter or any other such factor.

The third hypothesis is that the use of unit analysis in recitations, homework, and personal learning behavior did not benefit students' problem-solving ability as much as special-case analysis. It would appear that the 193–194 students did not appreciate the greater utility of special-case analysis, though, since on the end-of-year survey they reported unit analysis as being just as valuable for learning physics as special-case analysis (see Table II). It therefore seems that the students had some limitations in their ability to self-assess the utility of special-case analysis and unit analysis for their learning. Also, the fact that NE_relative is uncorrelated with both UA_relative and SCA_relative indicates that there is no transfer of the benefits of either strategy to topics not covered by the evaluation tasks from recitation and homework assignments.

TABLE IV. Measures of relative overall class performance for exams 1 through 6. The definitions of each measure are described in the text.

Exam  SCA_relative  UA_relative  E_relative  NE_relative
1     NA            NA           NA          −0.102
2     0.179         2.047        −0.045      −0.070
3     NA            NA           0.024       −0.037
4     0.645         1.272        0.066       −0.098
5     0.851         NA           0.141       0.010
6     0.683         0.893        0.057       −0.080


XI. DISCUSSION

While the strategy of special-case analysis is rather complex, and took more time and effort for students to learn, it appears to provide significant benefits to performance on multiple-choice exam questions. In contrast, the simpler strategy of unit analysis is learned very quickly, but apparently provided little or no benefit on exam questions. Here we shall discuss some possible reasons for these results.

In his doctoral dissertation, Sherin [41] presented and discussed a categorization of interpretive strategies, which he calls "interpretive devices," used by students to give meaning to physics equations while working on back-of-chapter homework problems. His classification of these strategies is reproduced in Table V. There are three classes of devices: narrative, static, and special case.

Each of these devices plays a role in what we call special-case analysis (this article uses the term "special-case" in a much broader sense than Sherin). One may conduct many different special-case analyses of an equation, using any one of these interpretive devices as the specific means. For example, one may choose to analyze a case where some parameter is changed in the problem, or where a quantity is taken to some limiting value. By engaging students in special-case analyses through the use of our tasks, students are given the opportunity and feedback to practice the use of these interpretive devices.

Sherin argues that interpretive devices function as sense-making tools which build meaning around an equation by relating it to other pieces of knowledge. The field of semiotics studies exactly how humans make meaning using resources such as systems of words, images, actions, and symbols, and has identified two aspects of meaning, called typological meaning (meaning by kind) and topological meaning (meaning by degree) [42]. Typological meaning is established by distinguishing categories of objects, relations, and processes. For example, an equation gains typological meanings by identifying categories of physical situations in which it is applicable (e.g., we can use a = v²/r for circular motion), the types of physical idealizations it assumes (e.g., we will assume v and r are constant), and by categorizing the quantities in the equation (e.g., does v correspond to voltage or velocity? Does r correspond to the radius of the circle or the size of the object?).

Topological meaning is created by examining changes by some degree. For example, an equation gains topological meanings by being relatable to other physical situations within the category of applicable situations (e.g., if r had been greater, how would a change?), or with situations which lie on the borderline of applicable situations (e.g., if we let r → ∞, what happens to a?). Also, an equation gains topological meaning by examining gradual deviations from the idealizations used by the equation (e.g., what would happen if v was increasing?). Typological and topological meaning for physics equations may be developed by a variety of activities [43].

Typological and topological meanings are not distinct, but necessarily relate to one another. For our example of centripetal acceleration, by taking r to infinity we enter a new category of physical situations, as the motion now has constant velocity. Therefore, this aspect of topological meaning construction develops typological meaning by highlighting the relation between the two categories of constant velocity motion and uniform circular motion.
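The r → ∞ limit that links these two categories can be checked symbolically; a small sketch using sympy, with illustrative symbol names:

from sympy import symbols, limit, oo

v, r = symbols("v r", positive=True)
a = v**2 / r  # centripetal acceleration for uniform circular motion

# As r -> infinity at fixed v, the acceleration vanishes and the motion
# becomes constant-velocity: the circular-motion category merges into
# the constant-velocity category in this limit.
print(limit(a, r, oo))  # 0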

So it is possible that the use of special-case analysis tasks, inasmuch as they compel students to use interpretive devices, aids student understanding of equations and consequently benefits their performance on exam questions. Unit analysis, on the other hand, makes no apparent use of interpretive devices, and therefore seems incapable of constructing topological meaning. Unit analysis may help to construct some typological meaning, as it compels students to figure out which physical property each specific quantity corresponds to, but in general would seem to be weaker than special-case analysis at developing meaning for equations. This does not mean that unit analysis is a worthless strategy, as it certainly does serve the important function of testing for self-consistency in an equation. It may be that unit analysis could have a stronger impact if taught in a different fashion which places more emphasis on the conceptual aspects associated with it.

XII. CONCLUSION

Evaluation is a well-recognized part of learning, yet its importance is often not reflected in our introductory physics courses. This study has developed tasks and rubrics designed to help students become evaluators by engaging them in formative assessment activities. Results indicated that the use of evaluation activities, particularly those focusing on special-case analysis, helps generate a significant increase in performance on multiple-choice exam questions. One possible explanation for this is that special-case analysis tasks engage students in the use of interpretive devices to construct typological and topological meaning for equations. It should be noted that increases in performance were only observed on E-questions, indicating a lack of transfer to topics not covered by the evaluation activities. Future studies designed to replicate and expand upon these results are necessary in order to strengthen the external validity of our results.

TABLE V. Interpretive devices, listed by class. Reproduced from Sherin [41], Chapter 4, Fig. 2.

Narrative Class: Changing Parameters, Physical Change, Changing Situation
Static Class: Specific Moment, Generic Moment, Steady State, Static Forces, Conservation, Accounting
Special Case Class: Restricted Value, Specific Value, Limiting Case, Relative Values


Also, the potential epistemological benefits of using evaluation tasks should be investigated, as well as the interactions with student motivation and self-efficacy.

ACKNOWLEDGMENTS

This study was conducted while the author was a member of the Rutgers Physics & Astronomy Education Group. I would like to thank Alan Van Heuvelen, Eugenia Etkina, Sahana Murthy, Michael Gentile, David Brookes, and David Rosengrant for valuable discussions and help with the execution of this project. Also, the feedback of three anonymous referees was helpful in improving the quality of this paper. This work was partially supported by the National Science Foundation, Grant No. DUE-0241078.

APPENDIX: EVALUATION ACTIVITIES

See separate auxiliary material for the evaluation tasks used in the 193–194 recitation and homework assignments, the evaluation rubrics developed and used, the exam evaluation tasks, and the E- and NE-problems.

[1] B. S. Bloom, A Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain (David McKay Co., Inc., New York, 1956).
[2] A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives, edited by L. W. Anderson and D. R. Krathwohl (Longman, New York, 2001).
[3] A. E. Lawson, How do humans acquire knowledge? And what does that imply about the nature of knowledge?, Sci. Educ. 9, 577 (2000).
[4] A. E. Lawson, The generality of hypothetico-deductive reasoning: Making scientific reasoning explicit, Am. Biol. Teach. 62, 482 (2000).
[5] A. R. Warren, Evaluation strategies: Teaching students to assess coherence & consistency, Phys. Teach. 47, 466 (2009).
[6] A. Buffler, S. Allie, F. Lubben, and B. Campbell, The development of first year physics students' ideas about measurement in terms of point and set paradigms, Int. J. Sci. Educ. 23, 1137 (2001).
[7] R. F. Lippmann, Ph.D. dissertation, University of Maryland, 2003.
[8] S. P. Marshall, Schemas in Problem-Solving (Cambridge University Press, New York, 1995).
[9] W. J. Leonard, W. J. Gerace, and R. J. Dufresne, Analysis-based problem solving: Making analysis and reasoning the focus of physics instruction, Sci. Teach. 20, 387 (2002).
[10] F. Reif and J. I. Heller, Knowledge structures and problem solving in physics, Educ. Psychol. 17, 102 (1982).
[11] P. Heller, R. Keith, and S. Anderson, Teaching problem solving through cooperative grouping. Part 1: Group versus individual problem solving, Am. J. Phys. 60, 627 (1992).
[12] M. Sabella and E. F. Redish, Knowledge organization and activation in physics problem-solving, University of Maryland preprint, 2004.
[13] R. R. Hake, Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses, Am. J. Phys. 66, 64 (1998).
[14] K. Cummings, J. Marx, R. Thornton, and D. Kuhl, Evaluating innovation in studio physics, Am. J. Phys. 67, S38 (1999).
[15] L. C. McDermott and P. S. Shaffer, Tutorials in Introductory Physics (Prentice-Hall, Upper Saddle River, NJ, 2002).
[16] N. D. Finkelstein and S. J. Pollock, Replicating and understanding successful innovations: Implementing tutorials in introductory physics, Phys. Rev. ST Phys. Educ. Res. 1, 010101 (2005).
[17] A. P. Fagen, C. H. Crouch, and E. Mazur, Peer Instruction: Results from a range of classrooms, Phys. Teach. 40, 206 (2002).
[18] E. Mazur, Peer Instruction: A User's Manual (Prentice-Hall, Upper Saddle River, NJ, 1996).
[19] R. Warnakulasooriya, D. J. Palazzo, and D. E. Pritchard, Evidence of problem-solving transfer in web-based Socratic tutor, in Physics Education Research Conference Proceedings, edited by P. Heron, L. McCullough, and J. Marx (AIP Press, Melville, 2006).
[20] K. VanLehn, C. Lynch, K. Schulze, J. Shapiro, R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and M. Wintersgill, The Andes physics tutoring system: Lessons learned, Int. J. Artif. Intell. Educ. 15, 147 (2005).
[21] B. J. Zimmerman and M. Martinez-Pons, Development of a structured interview for assessing student use of self-regulated learning strategies, Am. Educ. Res. J. 23, 614 (1986).
[22] N. Perry, Young children's self-regulated learning and contexts that support it, J. Educ. Psychol. 90, 715 (1998).
[23] D. Hammer, More than misconceptions: Multiple perspectives on student knowledge and reasoning, and an appropriate role for education research, Am. J. Phys. 64, 1316 (1996).
[24] American Association for the Advancement of Science, Project 2061: Benchmarks for Science Literacy (Oxford University Press, New York, 1993).

[25] NSF Directorate for Education and Human Resources, Review of Undergraduate Education: Shaping the Future: New Expectations for Undergraduate Education in Science, Mathematics, Engineering, and Technology (1996). Recommendations may be seen at http://www.ehr.nsf.gov/egr/due/documents/review/96139/four.htm

[26] National Research Council, National Science Education Standards (National Academy Press, Washington, DC, 1996).
[27] R. H. Ennis, in Teaching Thinking Skills: Theory and Practice, edited by J. B. Baron and R. J. Sternberg (Freeman, New York, 1987), pp. 9–26.
[28] P. M. King and K. S. Kitchener, Developing Reflective Judgment: Understanding and Promoting Intellectual Growth and Critical Thinking in Adolescents and Adults (Jossey-Bass, San Francisco, 1994).
[29] P. Black and D. Wiliam, Inside the Black Box: Raising Standards through Classroom Assessment (King's College, London, 1998).
[30] R. Sadler, Instr. Sci. 18, 119 (1989).
[31] S. Murthy, Peer-assessment of homework using rubrics, in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 156–159.
[32] E. Yerushalmi, C. Singh, and B. S. Eylon, Physics learning in the context of scaffolded diagnostic tasks (I), in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 27–30.
[33] C. Singh, E. Yerushalmi, and B. S. Eylon, Physics learning in the context of scaffolded diagnostic tasks (II): Preliminary results, in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 31–34.
[34] E. Etkina, A. Van Heuvelen, S. Brahmia, D. Brookes, M. Gentile, S. Murthy, D. Rosengrant, and A. Warren, Scientific abilities and their assessment, Phys. Rev. ST Phys. Educ. Res. 2, 020103 (2006).
[35] J. Cohen, A coefficient of agreement for nominal scales, Educ. Psychol. Meas. 20, 37 (1960).
[36] A. R. Warren, The role of evaluative abilities in physics learning, in 2004 Physics Education Research Conference Proceedings, edited by J. Marx, S. Franklin, and P. Heron (AIP, Melville, NY, 2005).
[37] A. R. Warren, Ph.D. dissertation, Rutgers University, 2006.
[38] M. Scott, T. Stelzer, and G. Gladding, Evaluating multiple-choice exams in large introductory physics courses, Phys. Rev. ST Phys. Educ. Res. 2, 020102 (2006).
[39] D. Hestenes, M. Wells, and G. Swackhamer, Force Concept Inventory, Phys. Teach. 30, 141 (1992).
[40] D. P. Maloney, T. L. O'Kuma, C. J. Hieggelke, and A. Van Heuvelen, Surveying students' conceptual knowledge of electricity and magnetism, Am. J. Phys. 69, S12 (2001).
[41] B. Sherin, Ph.D. dissertation, University of California, Berkeley, 1996.
[42] J. L. Lemke, "Multiplying meaning: Visual and verbal semiotics in scientific text," in Reading Science, edited by J. R. Martin and R. Veel (Routledge, London, 1998).
[43] E. Etkina, A. R. Warren, and M. J. Gentile, The role of models in physics instruction, Phys. Teach. 44, 34 (2006).

AARON R WARREN PHYS REV ST PHYS EDUC RES 6 020103 2010

020103-12

Page 5: Impact of teaching students to use evaluation strategies

sors and several physics graduate students none of whomwere in physics education These are at least superficiallyvery distinct populations This uncontrolled factor and thethreats it poses to the studyrsquos internal validity shall be dis-cussed below

VIII GOAL 1 TEACHING EVALUATION STRATEGIES

The 193ndash194 and 203ndash204 courses both had six examsduring the year Each exam featured a combination ofmultiple-choice questions and open-response tasks withroughly two-thirds of the test score based on multiple-choiceperformance and one-third based on the open-response tasksA typical midterm exam exams 1 2 4 5 had between 10and 15 multiple choice questions and 3 open-response tasksFinal exams exams 3 and 6 had roughly double the numberof items for each type An evaluation task was included as anopen-response task on each exam for both courses Thesetasks served as summative assessments to measure studentsrsquoevaluative abilities Student responses to the evaluation taskson each exam were photocopied and scored using our ru-brics Although we photocopied and scored each studentrsquoswork from 193ndash194 the large number of students in 203ndash204 made it impossible to do the same for them We there-fore randomly selected 150 students from the 203ndash204course whose responses to the evaluation tasks on examswould be photocopied and scored A check was performed todetermine whether this sample was representative of the en-tire class A Kruskal-Wallis test indicated that the grades forthe 203ndash204 sample were not significantly different fromthose of the remainder of the class

The evaluation tasks on exams 1 and 3 were externalspecial-case analysis tasks These allowed us to measure howwell the students could use this strategy when explicitlyasked to The evaluation tasks on exams 2 4 6 were criticalthinking tasks These allowed us to see what fraction of thestudents recognized that special-case andor unit analysiscould be used to construct such arguments as would be dem-onstrated by their spontaneous use of these strategies For

those students that did attempt to use these strategies therubrics were used to measure the quality of their use Each ofthese tasks involved topics which the 193ndash194 students hadhad both special-case and unit analysis tasks on eg dccircuits The evaluation task on exam 5 was a conceptualcounterargument task which allowed us to see what fractionof students tried to use special-case analysis and how wellthey used it For a list of the exam evaluation tasks seeWarren 37 Table I lists the results from these tasks Thereported numbers for the quality of each strategyrsquos use onexams 2 4 5 6 are the average rubric scores of those stu-dents who decided to try using that particular strategy on thecritical thinking tasks included in those exams The values incolumns 3 and 4 state the fractions of each class using eachstrategy and were determined by finding the fraction of theclass that had a score of at least 1 according to our rubrics Incolumn 4 a value of 100 indicates that students were ex-plicitly instructed to use a particular strategy on an evalua-tion task Values of NA in columns 3 and 5 indicate that noevaluation tasks relating to UA were included All differ-ences in fractional use between the two classes are statisti-cally significant at the 001-level according to a 2-test Notethat there is an NA in column 4 for exams 1 and 3 becausethe tasks given on those exams explicitly told students to doa special-case analysis

There were typically 8–12 students in our sample from 203–204 who tried using special-case analysis on the evaluation tasks from those exams, and their average quality of use according to our rubric was roughly 2.3 for each exam. No more than 1 student in our sample from 203–204 ever used unit analysis on the critical thinking tasks.

There are a few points to be made here. First, the results indicate that we were successful at helping the 193–194 students understand when, why, and how to use both strategies. They significantly outperformed the 203–204 students in the frequency of use of each evaluation strategy, and eventually demonstrated high-quality use of each strategy as measured by the rubrics. The increases in special-case analysis quality from exam 1 to exam 3, and in the fraction using it from exam 2 to exam 4, are in part attributable to the fact that the ratio of types of tasks given in recitation and homework assignments was altered after exam 2.

TABLE I. Measurements of students' evaluative abilities for unit analysis (UA) and special-case analysis (SCA) on each of the six exams.

Exam  Class    Fraction UA  Fraction SCA  Quality UA   Quality SCA
1     193–194  N/A          1.00          N/A          1.17 ± 0.07
      203–204  N/A          1.00          N/A          0.81 ± 0.05
2     193–194  0.89         0.18          2.48 ± 0.04  1.25 ± 0.08
      203–204  0.01         0.05          3.00 ± 0.00  2.33 ± 0.19
3     193–194  N/A          1.00          N/A          2.26 ± 0.04
      203–204  N/A          1.00          N/A          0.90 ± 0.05
4     193–194  0.53         0.38          2.73 ± 0.06  2.23 ± 0.07
      203–204  0.01         0.09          3.00 ± 0.00  2.30 ± 0.13
5     193–194  N/A          0.42          N/A          2.34 ± 0.08
      203–204  N/A          0.05          N/A          2.33 ± 0.19
6     193–194  0.38         0.38          2.35 ± 0.09  2.25 ± 0.07
      203–204  0.01         0.07          3.00 ± 0.00  2.25 ± 0.13


Up until then, roughly half of the evaluation tasks included in these assignments focused on unit analysis, while the other half dealt with special-case analysis. After seeing the results from exam 2, though, it was realized that students were having much more success with unit analysis, perhaps because it is a much simpler strategy than special-case analysis. Thereafter, the majority of evaluation tasks in recitations and homework focused on special-case analysis, and a substantial fraction (roughly 40%) of the students were observed to spontaneously employ SCA on the exam evaluation activities.

It should be clarified that on exams 4 and 6 the set of 193–194 students who used unit analysis and the set of those who used special-case analysis were not identical, even though the fractional usage was similar for both strategies. The overlap between these two sets, defined as the ratio of their intersection to their union, was 0.45 for exam 4 and 0.53 for exam 6.
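
This overlap measure is the Jaccard index of the two sets of students; a minimal sketch (with illustrative student IDs) is:

```python
# A minimal sketch of the overlap measure: the size of the intersection of the
# two sets of students divided by the size of their union. IDs are illustrative.
used_ua = {"s01", "s02", "s03", "s07"}    # 193-194 students who used unit analysis
used_sca = {"s02", "s03", "s05", "s08"}   # students who used special-case analysis

overlap = len(used_ua & used_sca) / len(used_ua | used_sca)
print(f"overlap = {overlap:.2f}")         # 2 shared out of 6 distinct students = 0.33
```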

It is interesting that special-case analysis was used at all among the 203–204 students on the critical thinking tasks on exams 2, 4, 5, and 6, as a group of roughly 8–12 students spontaneously employed SCA with a high level of quality (average rubric scores of roughly 2.3). Apparently, some students may enter our physics courses with a well-developed understanding of special-case analysis, perhaps due to prior physics courses. It is also worth noting that 203–204 students who tried using special-case analysis for the open-response exam problem typically scored very well on the multiple-choice questions, with 38% of them earning perfect scores.

IX. GOAL 2: USING AND VALUING EVALUATION STRATEGIES

As a means to assess student valuation and use of evaluation strategies, anonymous surveys were administered at the last recitation of the spring term in the Physics 194 course. Included were several questions asking students to rate how frequently they had used each strategy to evaluate their own work outside of class, and also to rate how strongly certain factors inhibited their use of each strategy. These survey questions are shown in Fig. 2, with the results in Table II. There were 158 respondents to the survey out of the 200 students in the course, giving a response rate of 79%. There may be a selection bias among the respondents, as the students knew that one recitation grade would be dropped from the final grade and that the last recitation would be a review session instead of a normal recitation. For that reason, upper and lower bounds are listed on the average scores in Table II, where the bounds are determined by assuming all missing respondents would have given either the highest or lowest possible response to each question.
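
The bounding procedure is simple to reproduce; a minimal sketch for one survey item on a 1–5 scale, using the observed mean from Table II, row 1:

```python
# A minimal sketch of the bounding procedure for one survey item on a 1-5 scale,
# with 158 of 200 students responding. The observed mean is from Table II, row 1.
n_total, n_respondents = 200, 158
n_missing = n_total - n_respondents
observed_mean = 2.2                     # average over the actual respondents

# Assume every missing respondent gave the lowest (1) or the highest (5) response.
lower = (observed_mean * n_respondents + 1 * n_missing) / n_total
upper = (observed_mean * n_respondents + 5 * n_missing) / n_total
print(f"bounds: [{lower:.1f}, {upper:.1f}]")   # [1.9, 2.8], matching Table II
```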

To conduct a Cronbach alpha test for internal consistency, the scores to questions 3a–3d and 6a–6d were made negative, accounting for the fact that these inhibitors are likely to reduce the usage and valuation of evaluation strategies by students (i.e., all scores were coded in the same conceptual direction). The calculated alpha value is α = 0.800, a strong result, giving confidence that the items are consistent and can therefore be used as a basis for interpretation about student usage and valuation of evaluation strategies.

One may safely assume that there were very few students who came into the 193–194 course already knowing, valuing, and using either evaluation strategy. This assumption is supported by the small number of students from 203–204 who used either strategy on the tasks in exams 2, 4, 5, and 6 (see above). Given that assumption, the results suggest a moderate degree of success in teaching students to incorporate these strategies into their personal learning behavior. Indeed, responses to questions 1 and 4 were probably artificially lowered due to the fact that we only included evaluation tasks relating to half of the topics during the year. If evaluation tasks had been given on all topics, it seems likely that students would have used the evaluation strategies more frequently. It is worth pointing out that the evidence here has limited inferential strength because of its indirectness. Further studies with more direct means of assessing whether students learn when and why to employ evaluation strategies would be useful.

1. How much have you used special-case analysis on your own to learn/do physics?
   (1 = Not at all, 2 = Not Much, 3 = Somewhat, 4 = Very, 5 = Extremely)

2. If you were a physics major and really interested in learning physics, how useful do you think special-case analysis would be for your learning?
   (1 = Not at all, 2 = Not Much, 3 = Somewhat, 4 = Very, 5 = Extremely)

3. Rate on a scale from 0–10 how much each of the listed factors affected your desire to do special-case analysis (10 means the factor really made you not want to do special-case analysis; 0 means the factor did not matter).
   a) ___ Time constraints (due to other classwork, jobs, etc.)
   b) ___ Motivation to learn physics (or lack thereof)
   c) ___ Confusion about how to do special-case analysis (if you weren't confused, put 0)
   d) ___ Confusion about the purpose of a special-case analysis (if you weren't confused, put 0)

4. How much have you used dimensional analysis on your own to learn/do physics?
   (1 = Not at all, 2 = Not Much, 3 = Somewhat, 4 = Very, 5 = Extremely)

5. If you were a physics major and really interested in learning physics, how useful do you think dimensional analysis would be for your learning?
   (1 = Not at all, 2 = Not Much, 3 = Somewhat, 4 = Very, 5 = Extremely)

6. Rate on a scale from 0–10 how much each of the listed factors affected your desire to do dimensional analysis (10 means the factor really made you not want to do dimensional analysis; 0 means the factor did not matter).
   a) ___ Time constraints (due to other classwork, jobs, etc.)
   b) ___ Motivation to learn physics (or lack thereof)
   c) ___ Confusion about how to do dimensional analysis (if you weren't confused, put 0)
   d) ___ Confusion about the purpose of a dimensional analysis (if you weren't confused, put 0)

FIG. 2. End-of-year survey questions regarding the students' self-reported use of each evaluation strategy.

TABLE II. Results from end-of-year survey questions.

Question  Average response  Lower bound  Upper bound
1         2.2/5             1.9          2.8
2         3.8/5             3.2          4.1
3a        7.0/10            5.5          7.6
3b        5.1/10            4.0          6.1
3c        2.4/10            1.9          4.0
3d        2.2/10            1.7          3.8
4         2.9/5             2.5          3.3
5         3.9/5             3.3          4.1
6a        5.9/10            4.7          6.7
6b        4.7/10            3.7          5.8
6c        1.7/10            1.3          3.4
6d        1.6/10            1.3          3.4


The relatively low response scores to questions 3c, 3d, 6c, and 6d are consistent with the results for goal one of our study, indicating we were successful at helping students understand when, why, and how to use each strategy. The fact that the scores for 3c and 3d are higher than those for 6c and 6d appears reasonable, because special-case analysis is certainly a much more complicated and multifaceted strategy than unit analysis. This disparity in complexity probably also explains why the scores for 3a and 3b were higher than those for 6a and 6b.

X. GOAL 3: ENHANCING PROBLEM-SOLVING PERFORMANCE BY TEACHING EVALUATION

Several conventional multiple-choice questions were common to the 193–194 and 203–204 lecture exams. There were six exams during the year, three per semester. The midterm exams (exams 1, 2, 4, and 5) were 80 min, while the final exams (exams 3 and 6) were 3 h. The exam questions shared by the two classes were all designed by the instructor of the 203–204 course (Van Heuvelen). Some of these shared multiple-choice questions were on topics for which the 193–194 students had had evaluation tasks, such as work-energy and dc circuits. This set of multiple-choice questions will be called E-questions. The remainder of the shared multiple-choice exam questions covered topics for which no one had had evaluation tasks, such as momentum and fluid mechanics. These will be called NE-questions. The E- and NE-questions are listed in Appendix B of Warren [37], and are assumed to provide accurate measures of student problem-solving performance [38]. This assumption limits the strength of the results reported below, and more robust measures of problem-solving performance would be useful in future work. Roughly 80% of these questions were numerical, and the remaining 20% were either conceptual or symbolic. Some of the conceptual questions were directly taken or modified from standardized assessment tools such as the FCI [39] and CSEM [40]. All of the questions had been used by Van Heuvelen in previous years.

The first midterm exam had 2 NE-questions and no E-questions, as there had not been any evaluation tasks used to this point in the 193–194 course. There were 2 E-questions and 2 NE-questions on midterm exams 2, 4, and 5. Exam 3 had 4 NE-questions and 3 E-questions, and exam 6 had 5 NE-questions and 4 E-questions.

Given this design, we may make two predictions based on the view of evaluation strategies as tools of self-regulated learning. First, because of the known population bias between the 193–194 and 203–204 students, the 203–204 students are expected to do better on the NE-questions. This prediction assumes that whatever benefits the evaluation tasks may have for the 193–194 students' E-question performance will not transfer to the topics covered by NE-questions.

A second prediction is that the performance of the 193–194 students should be relatively better on E-questions than on NE-questions, if the evaluation tasks succeed in benefiting student understanding of the topics covered by the tasks. Moreover, this boost in E-question performance should vary as students become more adept at the use of evaluation strategies. Therefore, it is predicted that student performance on the open-response evaluation tasks included on each exam will directly correlate with the relative performance of 193–194 students on E-questions.

The relative performances of the 193–194 and 203–204 students on E- and NE-questions are compared in Fig. 3 and Table III. In Fig. 3, the normalized difference Δ between the two classes for each E- and NE-question is computed as

$$\Delta = \frac{C_{\mathrm{exp}} - C_{\mathrm{control}}}{C_{\mathrm{exp}} + C_{\mathrm{control}}},$$

where C_exp and C_control denote the percentage of students in each condition who correctly answered the problem. A positive value therefore indicates that the experiment group (193–194) outperformed the control group (203–204) on a particular problem, and vice versa for a negative value. The magnitude gives a measure of how much of a difference there was between groups, and it exaggerates differences on problems where the sum C_exp + C_control is small. That is, it rewards improvements made on "difficult" problems more than improvements made on "easy" problems.
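
A minimal sketch of this computation, using two entries from Table III as worked examples:

```python
# A minimal sketch of the normalized difference for a single exam question,
# using two entries from Table III as worked examples.
def normalized_difference(c_exp: float, c_control: float) -> float:
    """Positive when the experiment group (193-194) outperforms the control (203-204)."""
    return (c_exp - c_control) / (c_exp + c_control)

print(normalized_difference(83.1, 83.2))   # Exam 3, E 1: essentially zero
print(normalized_difference(25.8, 40.9))   # Exam 2, E 1: clearly negative
```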

The NE-question results show that the 203–204 students often did significantly better on problems relating to topics for which both classes had very similar learning environments, with the differences on 7 NE-questions being significant at the 0.01 level and another 2 at the 0.05 level, all with the 203–204 sample doing better than the 193–194 sample. This observation was anticipated due to the known selection bias in this study. Given this bias, it would have been an achievement simply to bring the 193–194 students to a comparable level of performance on the E-questions.

[Figure: two panels, "NE-Questions" and "E-Questions", plotting the normalized difference Δ against exam number (1–6).]

FIG. 3. Plots of the normalized difference between each condition on multiple-choice exam questions.


In fact, the results show that by exam 3 the performance of the 193–194 students on E-questions was not only comparable to, but better than, that of the 203–204 students, although only three of the positive differences on E-questions were significant at the 0.01 level.

The favored hypothesis to explain these data is that the improved relative performance of the 193–194 students on E-questions was due to the use of our evaluation tasks. However, the strength of this hypothesis may be weakened by the presence of uncontrolled factors in the study. Although students were not randomly assigned to the conditions, it is difficult to see how this may account for the observed differences on NE- and E-questions. Differences between the teaching populations for the two classes also fail to provide a reasonable mechanism for producing the differences on E-questions and NE-questions simultaneously: if there were any sort of "good teacher effect", it would be likely to affect the results for both E- and NE-questions.

Another threat to internal validity stems from the fact that the lecturers were not blind to the study and may have unintentionally skewed the results. This potential experimenter bias can be argued against because the lectures for 193–194 and 203–204 were designed to be as similar as possible, and it is not at all clear how stylistic differences could cause such preferential performance differences between E- and NE-questions in any consistent fashion. All topics, whether those tested by E- or NE-questions, were covered in very similar fashion during lectures, recitations, and laboratories. Also, the fact that these performance differences developed and persisted through two different lecturers for 193–194 suggests that differences in lecture style were probably not a significant causal factor.

Another alternative hypothesis is that the results are due to time-on-topic differences in the recitation and homework assignments. Although recitation and homework assignments were designed to minimize such differences, no actual measurements were made.

TABLE III. Cross tabulation of performance on multiple-choice (MC) exam problems, and the exact significance (2-tailed) for chi-squared tests of independence between the classes. * = significant at 0.05 level; ** = significant at 0.01 level.

Exam  Problem  193–194 % correct  203–204 % correct  p-value
Ex1   NE 1     71.1               79.7               0.026*
Ex1   NE 2     48.7               65.5               0.001**
Ex2   E 1      25.8               40.9               0.001**
Ex2   E 2      81.3               88.0               0.049*
Ex2   NE 1     53.8               49.4               0.361
Ex2   NE 2     52.2               75.4               0.001**
Ex3   E 1      83.1               83.2               1.000
Ex3   E 2      80.3               76.4               0.330
Ex3   E 3      78.1               71.2               0.124
Ex3   NE 1     80.9               77.4               0.380
Ex3   NE 2     97.2               95.4               0.368
Ex3   NE 3     69.1               82.0               0.001**
Ex3   NE 4     45.5               55.1               0.038*
Ex4   E 1      96.4               92.2               0.062
Ex4   E 2      91.1               73.1               0.001**
Ex4   NE 1     88.5               96.6               0.001**
Ex4   NE 2     41.1               55.9               0.001**
Ex5   E 1      68.6               40.3               0.001**
Ex5   E 2      95.7               91.5               0.081
Ex5   NE 1     89.4               87.0               0.496
Ex5   NE 2     93.1               91.8               0.622
Ex6   E 1      91.3               78.7               0.001**
Ex6   E 2      69.9               64.5               0.244
Ex6   E 3      78.7               74.0               0.268
Ex6   E 4      54.6               46.3               0.072
Ex6   NE 1     60.1               69.0               0.050
Ex6   NE 2     50.3               71.5               0.001**
Ex6   NE 3     62.3               64.0               0.768
Ex6   NE 4     63.9               78.9               0.001**
Ex6   NE 5     70.5               76.5               0.146
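
A minimal sketch of one such test is shown below, rebuilding a 2×2 contingency table from the Table III percentages for Exam 2, E 1. The class sizes are illustrative guesses, and scipy's chi2_contingency is the asymptotic test rather than the exact test reported in the table.

```python
# A minimal sketch of a chi-squared test of independence, rebuilding a 2x2
# contingency table from the Table III percentages for Exam 2, E 1.
# Class sizes are illustrative guesses; the paper used exact 2-tailed tests.
from scipy.stats import chi2_contingency

n_193, n_203 = 200, 150
pct_193, pct_203 = 0.258, 0.409     # fraction correct in each class

table = [
    [round(n_193 * pct_193), round(n_193 * (1 - pct_193))],   # 193-194: correct, incorrect
    [round(n_203 * pct_203), round(n_203 * (1 - pct_203))],   # 203-204: correct, incorrect
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```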


It is possible that the evaluation tasks simply took longer for students to complete, and that their performance on E-questions was due not to the format of the activity but simply to the fact that they spent more time thinking about the concepts involved in the activity. However, anecdotal evidence from teaching assistants who helped students with their recitation and homework assignments suggests that students did not take an inordinate amount of time to complete the evaluation tasks. Also, there was no indication from student comments made to the teaching assistants that they felt the evaluation tasks took much longer than other recitation and homework problems.

While the results above indicate that the use of evaluation tasks benefited student problem-solving performance, we can further test this hypothesis by looking for a concrete association between the strength of students' evaluation abilities and their problem-solving performance. We construct two ad hoc measures to reflect students' apparent understanding of when, why, and how to use each evaluation strategy for a certain topic. The measures are

$$\mathrm{SCA}_{\mathrm{relative}} = FS_{\mathrm{exp}}\,QS_{\mathrm{exp}} - FS_{\mathrm{control}}\,QS_{\mathrm{control}},$$

$$\mathrm{UA}_{\mathrm{relative}} = FU_{\mathrm{exp}}\,QU_{\mathrm{exp}} - FU_{\mathrm{control}}\,QU_{\mathrm{control}},$$

where SCA stands for "special-case analysis", UA stands for "unit analysis", FS is the fraction of the class (experimental or control) which used special-case analysis, and FU is the fraction which used unit analysis on the evaluation tasks included on exams 2, 4, 5, and 6. UA_relative is not applicable for exam 5 due to the task format (conceptual counterargument). Also, QS is the average quality of the class's special-case analyses, and QU is the average quality of the class's unit analyses for these tasks. Each of these quantities corresponds to the appropriate values in Table I.

The fractions FS and FU serve as indicators of how well students understand when and why to use each strategy. If students do not understand the purpose of an evaluation strategy, they are not likely to spontaneously employ it on the critical thinking or conceptual counterargument tasks without specific prompting. The quantities QS and QU are given by the evaluation ability rubrics, and measure the students' understanding of how to use each strategy. By taking the products FS·QS and FU·QU, we get a pair of numbers between 0 and 3 which are taken to be indicative of each class's overall understanding of each evaluation strategy.
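
As a worked example, a minimal sketch of the SCA_relative measure for exam 6, using the rounded values from Table I (the published figure presumably used unrounded data):

```python
# A minimal sketch of the SCA_relative measure for exam 6, using the rounded
# values from Table I (the published value presumably used unrounded data).
fs_exp, qs_exp = 0.38, 2.25          # 193-194: fraction using SCA, average quality
fs_control, qs_control = 0.07, 2.25  # 203-204 sample
sca_relative = fs_exp * qs_exp - fs_control * qs_control
print(f"SCA_relative = {sca_relative:.3f}")   # ~0.70; Table IV reports 0.683
```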

To measure relative student performance on E- and NE-questions, the normalized differences in class performance for each problem category are averaged on each exam:

$$E_{\mathrm{relative}} = \frac{1}{N_{E\text{-problems}}} \sum_{E\text{-problems}} \Delta_{E\text{-problem}},$$

$$NE_{\mathrm{relative}} = \frac{1}{N_{NE\text{-problems}}} \sum_{NE\text{-problems}} \Delta_{NE\text{-problem}},$$

where Δ represents the normalized difference between the two classes' performance, as plotted in Fig. 3, and N denotes the number of problems of a certain type (either E- or NE-questions) on an exam.
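
A minimal sketch of these per-exam averages, applied to the exam 6 percentages from Table III; the results land on the exam 6 row of Table IV.

```python
# A minimal sketch of the per-exam averages, applied to the exam 6 percentages
# from Table III (pairs of 193-194 and 203-204 percent-correct values).
exam6_e = [(91.3, 78.7), (69.9, 64.5), (78.7, 74.0), (54.6, 46.3)]
exam6_ne = [(60.1, 69.0), (50.3, 71.5), (62.3, 64.0), (63.9, 78.9), (70.5, 76.5)]

def relative_performance(pairs):
    """Average the normalized differences over one category of questions."""
    deltas = [(e - c) / (e + c) for e, c in pairs]
    return sum(deltas) / len(deltas)

print(f"E_relative  = {relative_performance(exam6_e):.3f}")    # 0.057
print(f"NE_relative = {relative_performance(exam6_ne):.3f}")   # -0.080
```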

Table IV lists the values of these four measures of relative class performance on each exam. To determine whether special-case analysis and unit analysis performance were related to performance on the exam questions, a Pearson correlation analysis of these data was performed. The results indicate that E_relative is significantly positively correlated with SCA_relative (r = 0.996, p = 0.004) and negatively, but not significantly, correlated with UA_relative (r = −0.921, p = 0.255), while NE_relative is not significantly correlated with either SCA_relative (r = 0.447, p = 0.553) or UA_relative (r = 0.528, p = 0.646).
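
A minimal sketch of the correlation analysis, using the rounded Table IV values for the exams where SCA_relative and E_relative are both defined (exams 2, 4, 5, 6):

```python
# A minimal sketch of the correlation analysis using the rounded Table IV values.
from scipy.stats import pearsonr

sca_relative = [0.179, 0.645, 0.851, 0.683]   # exams 2, 4, 5, 6
e_relative = [-0.045, 0.066, 0.141, 0.057]

r, p = pearsonr(sca_relative, e_relative)
# Rounding in Table IV shifts the numbers slightly; the paper reports
# r = 0.996, p = 0.004 from the unrounded data.
print(f"r = {r:.3f}, p = {p:.3f}")
```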

The most parsimonious account of these results entails three hypotheses. One is that giving a greater proportion of evaluation tasks on special-case analysis after exam 2, as discussed above, helped students to improve their use of that strategy while at the same time reducing their use of unit analysis. The fact that we manipulated these two independent factors in this fashion could thereby explain why the fractional usage of unit analysis steadily declined from exams 2 to 6 (see Table I).

A second hypothesis is that students' use of special-case analysis in recitation, homework, and their personal learning behavior significantly benefited their problem-solving ability. The relative performance of the 193–194 students on E-questions correlated very strongly with their use of special-case analysis on the exam evaluation tasks. Note that because we are looking at relative differences in performance between the experiment and control groups, we can safely rule out the possibility that this correlation is due to the "easiness" of the subject matter or any other such factor.

The third hypothesis is that the use of unit analysis in recitations, homework, and personal learning behavior did not benefit students' problem-solving ability as much as special-case analysis. It would appear that the 193–194 students did not appreciate the greater utility of special-case analysis, though, since on the end-of-year survey they reported unit analysis as being just as valuable for learning physics as special-case analysis (see Table II). It therefore seems that the students had some limitations in their ability to self-assess the utility of special-case analysis and unit analysis for their learning. Also, the fact that NE_relative is uncorrelated with both UA_relative and SCA_relative indicates that there is no transfer of the benefits of either strategy to topics not covered by the evaluation tasks from recitation and homework assignments.

TABLE IV. Measures of relative overall class performance for exams 1 through 6. The definitions of each measure are described in the text.

Exam  SCA_relative  UA_relative  E_relative  NE_relative
1     N/A           N/A          N/A         −0.102
2     0.179         2.047        −0.045      −0.070
3     N/A           N/A          0.024       −0.037
4     0.645         1.272        0.066       −0.098
5     0.851         N/A          0.141       0.010
6     0.683         0.893        0.057       −0.080


XI. DISCUSSION

While the strategy of special-case analysis is rather complex and took more time and effort for students to learn, it appears to provide significant benefits to performance on multiple-choice exam questions. In contrast, the simpler strategy of unit analysis is learned very quickly but apparently provided little or no benefit on exam questions. Here we shall discuss some possible reasons for these results.

In his doctoral dissertation, Sherin [41] presented and discussed a categorization of interpretive strategies, which he calls "interpretive devices", used by students to give meaning to physics equations while working on back-of-chapter homework problems. His classification of these strategies is reproduced in Table V. There are three classes of devices: narrative, static, and special case.

Each of these devices plays a role in what we call special-case analysis; this article uses the term "special-case" in a much broader sense than Sherin. One may conduct many different special-case analyses of an equation using any one of these interpretive devices as the specific means. For example, one may choose to analyze a case where some parameter is changed in the problem, or where a quantity is taken to some limiting value. By engaging students in special-case analyses through the use of our tasks, students are given the opportunity and feedback to practice the use of these interpretive devices.

Sherin argues that interpretive devices function as sense-making tools which build meaning around an equation by relating it to other pieces of knowledge. The field of semiotics studies exactly how humans make meaning using resources such as systems of words, images, actions, and symbols, and has identified two aspects of meaning, called typological meaning (meaning by kind) and topological meaning (meaning by degree) [42]. Typological meaning is established by distinguishing categories of objects, relations, and processes. For example, an equation gains typological meanings by identifying categories of physical situations in which it is applicable (e.g., we can use a = v²/r for circular motion), the types of physical idealizations it assumes (e.g., we will assume v and r are constant), and by categorizing the quantities in the equation (e.g., does v correspond to voltage or velocity? Does r correspond to the radius of the circle or the size of the object?).

Topological meaning is created by examining changes by some degree. For example, an equation gains topological meanings by being relatable to other physical situations within the category of applicable situations (e.g., if r had been greater, how would a change?), or to situations which lie on the borderline of applicable situations (e.g., if we let r = ∞, what happens to a?). Also, an equation gains topological meaning by examining gradual deviations from the idealizations used by the equation (e.g., what would happen if v was increasing?). Typological and topological meaning for physics equations may be developed by a variety of activities [43].

Typological and topological meanings are not distinct, but necessarily relate to one another. For our example of centripetal acceleration, by taking r to infinity we enter a new category of physical situations, as the motion now has constant velocity. Therefore, this aspect of topological meaning construction develops typological meaning by highlighting the relation between the two categories of constant-velocity motion and uniform circular motion.
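
A minimal sketch of this limiting-case check, using SymPy to take r to infinity in the document's own a = v²/r example:

```python
# A minimal sketch of the limiting-case check described above, using SymPy
# to take r to infinity in a = v**2 / r.
import sympy as sp

v, r = sp.symbols("v r", positive=True)
a = v**2 / r
print(sp.limit(a, r, sp.oo))   # 0: the acceleration vanishes and the motion
                               # becomes constant-velocity, a new category of situation
```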

So it is possible that the use of special-case analysis tasks, inasmuch as they compel students to use interpretive devices, aids student understanding of equations and consequently benefits their performance on exam questions. Unit analysis, on the other hand, makes no apparent use of interpretive devices, and therefore seems incapable of constructing topological meaning. Unit analysis may help to construct some typological meaning, as it compels students to figure out which physical property each specific quantity corresponds to, but in general it would seem to be weaker than special-case analysis at developing meaning for equations. This does not mean that unit analysis is a worthless strategy, as it certainly does serve the important function of testing for self-consistency in an equation. It may be that unit analysis could have a stronger impact if taught in a different fashion which places more emphasis on the conceptual aspects associated with it.

XII. CONCLUSION

Evaluation is a well-recognized part of learning, yet its importance is often not reflected in our introductory physics courses. This study has developed tasks and rubrics designed to help students become evaluators by engaging them in formative assessment activities. Results indicated that the use of evaluation activities, particularly those focusing on special-case analysis, helps generate a significant increase in performance on multiple-choice exam questions. One possible explanation for this is that special-case analysis tasks engage students in the use of interpretive devices to construct typological and topological meaning for equations. It should be noted that increases in performance were only observed on E-questions, indicating a lack of transfer to topics not covered by the evaluation activities. Future studies designed to replicate and expand upon these results are necessary in order to strengthen the external validity of our results.

TABLE V. Interpretive devices, listed by class. Reproduced from Sherin [41], Chapter 4, Fig. 2.

Narrative Class       Static Class     Special Case Class
Changing Parameters   Specific Moment  Restricted Value
Physical Change       Generic Moment   Specific Value
Changing Situation    Steady State     Limiting Case
                      Static Forces    Relative Values
                      Conservation
                      Accounting


Also, the potential epistemological benefits of using evaluation tasks should be investigated, as well as their interactions with student motivation and self-efficacy.

ACKNOWLEDGMENTS

This study was conducted while the author was a member of the Rutgers Physics & Astronomy Education Group. I would like to thank Alan Van Heuvelen, Eugenia Etkina, Sahana Murthy, Michael Gentile, David Brookes, and David Rosengrant for valuable discussions and help with the execution of this project. Also, the feedback of three anonymous referees was helpful in improving the quality of this paper. This work was partially supported by the National Science Foundation (Grant No. DUE-0241078).

APPENDIX: EVALUATION ACTIVITIES

See the separate auxiliary material for the evaluation tasks used in the 193–194 recitation and homework assignments, the evaluation rubrics developed and used, the exam evaluation tasks, and the E- and NE-problems.

[1] B. S. Bloom, A Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain (David McKay Co., Inc., New York, 1956).
[2] A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives, edited by L. W. Anderson and D. R. Kraftwohl (Longman, New York, 2001).
[3] A. E. Lawson, How do humans acquire knowledge? And what does that imply about the nature of knowledge?, Sci. Educ. 9, 577 (2000).
[4] A. E. Lawson, The generality of hypothetico-deductive reasoning: Making scientific reasoning explicit, Am. Biol. Teach. 62, 482 (2000).
[5] A. R. Warren, Evaluation strategies: Teaching students to assess coherence & consistency, Phys. Teach. 47, 466 (2009).
[6] A. Buffler, S. Allie, F. Lubben, and B. Campbell, The development of first year physics students' ideas about measurement in terms of point and set paradigms, Int. J. Sci. Educ. 23, 1137 (2001).
[7] R. F. Lippmann, Ph.D. dissertation, University of Maryland, 2003.
[8] S. P. Marshall, Schemas in Problem-Solving (Cambridge University Press, New York, 1995).
[9] W. J. Leonard, W. J. Gerace, and R. J. Dufresne, Analysis-based problem solving: Making analysis and reasoning the focus of physics instruction, Sci. Teach. 20, 387 (2002).
[10] F. Reif and J. I. Heller, Knowledge structures and problem solving in physics, Educ. Psychol. 17, 102 (1982).
[11] P. Heller, R. Keith, and S. Anderson, Teaching problem solving through cooperative grouping. Part 1: Group versus individual problem solving, Am. J. Phys. 60, 627 (1992).
[12] M. Sabella and E. F. Redish, Knowledge organization and activation in physics problem-solving, University of Maryland preprint, 2004.
[13] R. R. Hake, Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses, Am. J. Phys. 66, 64 (1998).
[14] K. Cummings, J. Marx, R. Thornton, and D. Kuhl, Evaluating innovation in studio physics, Am. J. Phys. 67, S38 (1999).
[15] L. C. McDermott and P. S. Schaffer, Tutorials in Introductory Physics (Prentice-Hall, Upper Saddle River, NJ, 2002).
[16] N. D. Finkelstein and S. J. Pollock, Replicating and understanding successful innovations: Implementing tutorials in introductory physics, Phys. Rev. ST Phys. Educ. Res. 1, 010101 (2005).
[17] A. P. Fagen, C. H. Crouch, and E. Mazur, Peer Instruction: Results from a range of classrooms, Phys. Teach. 40, 206 (2002).
[18] E. Mazur, Peer Instruction: A User's Manual (Prentice-Hall, Upper Saddle River, NJ, 1996).
[19] R. Warnakulasooriya, D. J. Palazzo, and D. E. Pritchard, Evidence of problem-solving transfer in web-based Socratic tutor, in Physics Education Research Conference Proceedings, edited by P. Heron, L. McCullough, and J. Marx (AIP Press, Melville, 2006).
[20] K. VanLehn, C. Lynch, K. Schulze, J. Shapiro, R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and M. Wintersgill, The Andes physics tutoring system: Lessons learned, Int. J. Artif. Intell. Educ. 15, 147 (2005).
[21] B. J. Zimmerman and M. Martinez-Pons, Development of a structured interview for assessing student use of self-regulated learning strategies, Am. Educ. Res. J. 23, 614 (1986).
[22] N. Perry, Young children's self-regulated learning and contexts that support it, J. Educ. Psychol. 90, 715 (1998).
[23] D. Hammer, More than misconceptions: Multiple perspectives on student knowledge and reasoning, and an appropriate role for education research, Am. J. Phys. 64, 1316 (1996).
[24] American Association for the Advancement of Science, Project 2061: Benchmarks for Science Literacy (Oxford University Press, New York, 1993).
[25] NSF Directorate for Education and Human Resources, Review of Undergraduate Education: Shaping the Future: New Expectations for Undergraduate Education in Science, Mathematics, Engineering, and Technology (1996). Recommendations may be seen at http://www.ehr.nsf.gov/egr/due/documents/review/96139/four.htm
[26] National Research Council, National Science Education Standards (National Academy Press, Washington, DC, 1996).
[27] R. H. Ennis, in Teaching Thinking Skills: Theory and Practice, edited by J. B. Baron and R. J. Sternberg (Freeman, New York, 1987), pp. 9–26.
[28] P. M. King and K. S. Kitchener, Developing Reflective Judgment: Understanding and Promoting Intellectual Growth and Critical Thinking in Adolescents and Adults (Jossey-Bass, San Francisco, 1994).
[29] P. Black and D. Wiliam, Inside the Black Box: Raising Standards through Classroom Assessment (King's College, London, 1998).
[30] R. Sadler, Instr. Sci. 18, 119 (1989).
[31] S. Murthy, Peer-assessment of homework using rubrics, in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 156–159.
[32] E. Yerushalmi, C. Singh, and B. S. Eylon, Physics learning in the context of scaffolded diagnostic tasks (I), in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 27–30.
[33] C. Singh, E. Yerushalmi, and B. S. Eylon, Physics learning in the context of scaffolded diagnostic tasks (II): Preliminary results, in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 31–34.
[34] E. Etkina, A. Van Heuvelen, S. Brahmia, D. Brookes, M. Gentile, S. Murthy, D. Rosengrant, and A. Warren, Scientific abilities and their assessment, Phys. Rev. ST Phys. Educ. Res. 2, 020103 (2006).
[35] J. Cohen, A coefficient of agreement for nominal scales, Educ. Psychol. Meas. 20, 37 (1960).
[36] A. R. Warren, The role of evaluative abilities in physics learning, in 2004 Physics Education Research Conference Proceedings, edited by J. Marx, S. Franklin, and P. Heron (AIP, Melville, NY, 2005).
[37] A. R. Warren, Ph.D. dissertation, Rutgers University, 2006.
[38] M. Scott, T. Stelzer, and G. Gladding, Evaluating multiple-choice exams in large introductory physics courses, Phys. Rev. ST Phys. Educ. Res. 2, 020102 (2006).
[39] D. Hestenes, M. Wells, and G. Swackhamer, Force Concept Inventory, Phys. Teach. 30, 141 (1992).
[40] D. P. Maloney, T. L. O'Kuma, C. J. Hieggelke, and A. Van Heuvelen, Surveying students' conceptual knowledge of electricity and magnetism, Am. J. Phys. 69, S12 (2001).
[41] B. Sherin, Ph.D. dissertation, University of California, Berkeley, 1996.
[42] J. L. Lemke, "Multiplying meaning: Visual and verbal semiotics in scientific text," in Reading Science, edited by J. R. Martin and R. Veel (Routledge, London, 1998).
[43] E. Etkina, A. R. Warren, and M. J. Gentile, The role of models in physics instruction, Phys. Teach. 44, 34 (2006).


Page 6: Impact of teaching students to use evaluation strategies

ratio of types of tasks given in recitation and homeworkassignments was altered after exam 2 Up until then roughlyhalf of the evaluation tasks included in these assignmentsfocused on unit analysis while the other half dealt withspecial-case analysis After seeing the results from exam 2though it was realized that students were having much moresuccess with unit analysis perhaps because it is a much sim-pler strategy than special-case analysis Thereafter the ma-jority of evaluation tasks in recitations and homework fo-cused on special-case analysis and a substantial fraction40 of the students are observed to spontaneously em-ploy SCA on the exam evaluation activities

It should be clarified that on exams 4 and 6 the set of193ndash194 students who used unit analysis and the set of thosethat used special-case analysis were not identical eventhough the fractional usage was similar for both strategiesThe overlap between these two sets defined as the ratio oftheir intersection to their union was 045 for exam 4 and053 for exam 6

It is interesting that special-case analysis was used at allamong the 203ndash204 students on the critical thinking tasks onexams 2 4 5 6 as a group of roughly 8ndash12 students spon-taneously employed SCA with a high level of quality aver-age rubric scores of roughly 23 Apparently some studentsmay enter our physics courses with a well-developed under-standing of special-case analysis perhaps due to prior phys-ics courses It is also worth noting that 203ndash204 studentswho tried using special-case analysis for the open-responseexam problem typically scored very well on the multiple-choice questions with 38 of them earning perfect scores

IX GOAL 2 USING AND VALUING EVALUATIONSTRATEGIES

As a means to assess student valuation and use of evalu-ation strategies anonymous surveys were administered at thelast recitation of the spring term in the Physics 194 courseIncluded were several questions asking students to rate howfrequently they had used each strategy to evaluate their ownwork outside of class and also to rate how strongly certainfactors inhibited their use of each strategy These surveyquestions are shown in Fig 2 with the results in Table IIThere were 158 respondents to the survey out of the 200students in the course giving a response rate of 79 Theremay be a selection bias among the respondents as the stu-dents knew that one recitation grade would be dropped fromthe final grade and that the last recitation would be a reviewsession instead of a normal recitation For that reason upperand lower bounds are listed on the average scores in Table IIwhere the bounds are determined by assuming all missingrespondents would have given either the highest or lowestpossible response for the questions

To conduct a Cronbach alpha test for internal consistencythe scores to questions 3andash3d and 6andash6d were madenegative accounting for the fact that these inhibitors arelikely to reduce the usage and valuation of evaluation strat-egies by students ie all scores were coded in the sameconceptual direction The calculated alpha value is =0800 a strong result giving confidence that the items are

consistent and can therefore be used as a basis for interpre-tation about student usage and valuation of evaluation strat-egies

One may safely assume that there were very few studentswho came into the 193ndash194 course already knowing valu-ing and using either evaluation strategy This assumption issupported by the small number of students from 203ndash204who used either strategy on the tasks in exams 2 4 5 and 6see above Given that assumption the results suggest amoderate degree of success in teaching students to incorpo-rate these strategies into their personal learning behavior In-deed responses to questions 1 and 4 were probably artifi-cially lowered due to the fact that we only includedevaluation tasks relating to half of the topics during the yearIf evaluation tasks had been given on all topics it seemslikely that students would have used the evaluation strategiesmore frequently It is worth pointing out that the evidencehere has limited inferential strength because of its indirect-ness Further studies with more direct means of assessingwhether students learn when and why to employ evaluationstrategies would be useful

1 How much have you used special-case analysis on your own to learndo physics1=Not at all 2=Not Much 3=Somewhat 4=Very 5=Extremely

2 If you were a physics major and really interested in learning physics how useful do youthink special-case analysis would be for your learning1=Not at all 2=Not Much 3=Somewhat 4=Very 5=Extremely

3 Rate on a scale from 0-10 how much each of the listed factors affected your desire to dospecial-case analysis (10 means the factor really made you not want to do special-caseanalysis 0 means the factor did not matter)a) ___Time constraints (due to other classwork jobs etc)b) ____Motivation to learn physics (or lack thereof)c) ____Confusion about how to do special-case analysis (if you wereniacutet confused put 0)d) ____Confusion about the purpose of a special-case analysis (if you wereniacutet confused put 0)

4 How much have you used dimensional analysis on your own to learndo physics1=Not at all 2=Not Much 3=Somewhat 4=Very 5=Extremely

5 If you were a physics major and really interested in learning physics how useful do youthink dimensional analysis would be for your learning1=Not at all 2=Not Much 3=Somewhat 4=Very 5=Extremely

6 Rate on a scale from 0-10 how much each of the listed factors affected your desire to dodimensional analysis (10 means the factor really made you not want to do dimensionalanalysis 0 means the factor did not matter)a) ___Time constraints (due to other classwork jobs etc)b) ____Motivation to learn physics (or lack thereof)c) ____Confusion about how to do dimensional analysis (if you wereniacutet confused put 0)d) ____Confusion about the purpose of a dimensional analysis (if you wereniacutet confused put 0)

FIG 2 End-of-year survey questions regarding the studentsrsquoself-reported use of each evaluation strategy

TABLE II Results from end-of-year survey questions

Question number Average response Lower bound Upper bound

1 225 19 28

2 385 32 41

3a 7010 55 76

3b 5110 40 61

3c 2410 19 40

3d 2210 17 38

4 295 25 33

5 395 33 41

6a 5910 47 67

6b 4710 37 58

6c 1710 13 34

6d 1610 13 34

AARON R WARREN PHYS REV ST PHYS EDUC RES 6 020103 2010

020103-6

The relatively low response scores to questions 3c 3d 6cand 6d are consistent with the results for goal one of ourstudy indicating we were successful at helping students un-derstand when why and how to use each strategy The factthat the scores for 3c and 3d are higher than 6c and 6dappears reasonable because special-case analysis is certainlya much more complicated and multifaceted strategy that unitanalysis This disparity in complexity probably also explainswhy the scores to 3a and 3b were higher than 6a and 6b

X GOAL 3 ENHANCING PROBLEM-SOLVINGPERFORMANCE BY TEACHING EVALUATION

Several conventional multiple-choice questions werecommon to each of the 193ndash194 and 203ndash204 lecture examsThere were six exams during the year three per semesterThe midterm exams exams 1 2 4 and 5 were 80 minwhile the final exams exams 3 and 6 were 3 h The examquestions shared by the two classes were all designed by theinstructor of the 203ndash204 course Van Heuvelen Some ofthese shared multiple-choice questions were on topics whichthe 193ndash194 students had had evaluation tasks on such aswork-energy and dc circuits This set of multiple-choicequestions will be called E-questions The remainder of theshared multiple-choice exam questions covered topics thatno one had had evaluation tasks on such as momentum andfluid mechanics These will be called NE-questions The E-and NE-questions are listed in Appendix B of Warren 37and are assumed to provide accurate measures of studentproblem-solving performance 38 This assumption miti-gates the strength of the results reported below and morerobust measures of problem-solving performance would beuseful in future work Roughly 80 of these questions werenumerical and the remaining 20 of questions were eitherconceptual or symbolic Some of the conceptual questionswere directly taken or modified from standardized assess-ment tools such as FCI 39 and CSEM 40 All of thequestions had been used by Van Heuvelen in previous years

The first midterm exam had 2 NE-questions and noE-questions as there had not been any evaluation tasks usedto this point in the 193ndash194 course There were 2E-questions and 2 NE-questions on midterm exams 2 4 and5 Exam 3 had 4 NE questions and 3 E-questions and Exam6 had 5 NE-questions and 4 E-questions

Given this design we may make two predictions based onthe view of evaluation strategies as tools of self-regulatedlearning First because of the known population bias be-tween the 193ndash194 and 203ndash204 students the 203ndash204 stu-dents are expected to do better on the NE-questions Thisprediction assumes that whatever benefits the evaluationtasks may have for the 193ndash194 studentsrsquo E-question perfor-mance will not be transferable to the topics covered by NE-questions

A second prediction is that the performance of the 193ndash194 students should be relatively better on E-questions thanon NE-questions if the evaluation tasks succeed in benefitingstudent understanding of the topics covered by the tasksMoreover this boost of E-question performance should varyas students become more adept at the use of evaluation strat-

egies Therefore it is predicted that student performance onthe open-response evaluation tasks included on each examwill directly correlate with the relative performance of 193ndash194 students on E-questions

The relative performances of the 193ndash194 and 203ndash204students on E- and NE-questions are compared in Fig 3 andTable III In Fig 3 the normalized difference betweenthe two classes for each E- and NE-question is computed as

=Cexp minus Ccontrol

Cexp + Ccontrol

where Cexp and Ccontrol denote the percentage of students ineach condition who correctly answered the problem A posi-tive value therefore indicates that the experiment group193ndash194 outperformed the control group 203ndash204 on aparticular problem and vice-versa for a negative value Themagnitude gives a measure of how much of a differencethere was between groups and it exaggerates differences onproblems where the sum Cexp+Ccontrol is small That is itrewards improvements made on lsquodifficultrsquo problems morethan improvements made on ldquoeasyrdquo problems

The NE-question results show that the 203ndash204 studentsoften did significantly better on problems relating to topicsfor which both classes had very similar learning environ-ments with the differences on 7 NE-questions being signifi-cant at the 001-level and another 2 at the 005-level all withthe 203ndash204 sample doing better than the 193ndash194 sampleThis observation was anticipated due to the known selectionbias in this study Given this bias it would have been anachievement simply to bring the 193ndash194 students to a com-

NE-Questions

-02

-015

-01

-005

0

005

01

0 1 2 3 4 5 6 7

Exam

E-Questions

-03

-02

-01

0

01

02

03

0 1 2 3 4 5 6 7

Exam

FIG 3 Plots of the normalized difference between each condi-tion on multiple-choice exam questions

IMPACT OF TEACHING STUDENTS TO USE EVALUATIONhellip PHYS REV ST PHYS EDUC RES 6 020103 2010

020103-7

parable level of performance on the E-questions In fact theresults show that by exam 3 the performance of the 193ndash194students on E-questions was not only comparable to but wasbetter than that of the 203ndash204 students although only threeof the positive differences on E-questions were significant atthe 001-level

The favored hypothesis to explain these data is that theimproved relative performance of the 193ndash194 students onE-questions was due to the use of our evaluation tasks How-ever the strength of this hypothesis may be mitigated by thepresence of uncontrolled factors in the study Although stu-dents were not randomly assigned into the conditions it isdifficult to see how this may account for the observed differ-ences on NE- and E-questions Differences between theteaching populations for the two classes also fail to provide areasonable mechanism for producing the differences onE-questions and NE-questions simultaneously If there wereany sort of ldquogood teacher effectrdquo it would be likely to affectthe results for both E- and NE-questions

Another threat to internal validity stems from the fact thatthe lecturers were not blind to the study and may have un-intentionally skewed the results This potential experimenterbias can be argued against because of the fact that lecturesfor 193ndash194 and 203ndash204 were designed to be as similar aspossible and it is not at all clear how stylistic differencescould cause such preferential performance differences be-tween E- and NE-questions in any consistent fashion Alltopics whether those tested by E- or NE-questions werecovered in very similar fashions during lectures recitationsand laboratories Also the fact that these performance differ-ences developed and persisted through two different lecturersfor 193ndash194 suggests that differences in lecture style wereprobably not a significant causal factor

Another alternative hypothesis is that the results are dueto time-on-topic differences in the recitation and homeworkassignments Although recitation and homework assignmentswere designed to minimize such differences no actual mea-surements were made It is possible that the evaluation tasks

TABLE III Cross tabulation of performance on multiple choice MC exam problems and the exactsignificance 2-tailed for chi-squared tests of independence between each class =significant at 005 level =significant at 001 level

Exam Problem Class Correct p-value Exam Problem Class Correct p-value

Ex1 NE 1 193ndash194 711 0026 Ex4 NE 1 193ndash194 885 0001203ndash204 797 203ndash204 966

Ex1 NE 2 193ndash194 487 0001 Ex4 NE 2 193ndash194 411 0001203ndash204 655 203ndash204 559

Ex2 E 1 193ndash194 258 0001 Ex5 E 1 193ndash194 686 0001203ndash204 409 203ndash204 403

Ex2 E 2 193ndash194 813 0049 Ex5 E 2 193ndash194 957 0081

203ndash204 880 203ndash204 915

Ex2 NE 1 193ndash194 538 0361 Ex5 NE 1 193ndash194 894 0496

203ndash204 494 203ndash204 870

Ex2 NE 2 193ndash194 522 0001 Ex5 NE 2 193ndash194 931 0622

203ndash204 754 203ndash204 918

Ex3 E 1 193ndash194 831 1000 Ex6 E 1 193ndash194 913 0001203ndash204 832 203ndash204 787

Ex3 E 2 193ndash194 803 0330 Ex6 E 2 193ndash194 699 0244

203ndash204 764 203ndash204 645

Ex3 E 3 193ndash194 781 0124 Ex6 E 3 193ndash194 787 0268

203ndash204 712 203ndash204 740

Ex3 NE 1 193ndash194 809 0380 Ex6 E 4 193ndash194 546 0072

203ndash204 774 203ndash204 463

Ex3 NE 2 193ndash194 972 0368 Ex6 NE 1 193ndash194 601 0050203ndash204 954 203ndash204 690

Ex3 NE 3 193ndash194 691 0001 Ex6 NE 2 193ndash194 503 0001203ndash204 820 203ndash204 715

Ex3 NE 4 193ndash194 455 0038 Ex6 NE 3 193ndash194 623 0768

203ndash204 551 203ndash204 640

Ex4 E 1 193ndash194 964 0062 Ex6 NE 4 193ndash194 639 0001203ndash204 922 203ndash204 789

Ex4 E 2 193ndash194 911 0001 Ex6 NE 5 193ndash194 705 0146

203ndash204 731 203ndash204 765

AARON R WARREN PHYS REV ST PHYS EDUC RES 6 020103 2010

020103-8

simply took longer for students to complete and that theirperformance on E-questions was due not to the format of theactivity but simply that they spent more time thinking aboutthe concepts involved in the activity However anecdotalevidence from teaching assistants who helped students withtheir recitation and homework assignments suggests that stu-dents did not take an inordinate amount of time to completethe evaluation tasks Also there was no indication from stu-dent comments made to the teaching assistants that they feltthe evaluation tasks took much longer than other recitationand homework problems

While the results above indicate that the use of evaluationtasks benefited student problem-solving performance we canfurther test this hypothesis by looking for a concrete associa-tion between the strength of studentrsquos evaluation abilities andtheir problem-solving performance We construct two ad hocmeasures to reflect studentsrsquo apparent understanding ofwhen why and how to use each evaluation strategy for acertain topic The measures are

SCArelative = FSexp QSexp minus FScontrol QScontrol

UArelative = FUexp QUexp minus FUcontrol QUcontrol

where SCA stands for ldquospecial-case analysisrdquo UA stands forldquounit analysisrdquo FS is the fraction of the class experimentalor control which used special-case analysis and FU is thefraction which used unit analysis on the evaluation tasks in-cluded on exams 2 4 5 and 6 UArelative is not applicable forexam 5 due to the task format conceptual counterargumentAlso QS is the average quality of the classrsquo special-caseanalyses and QU is the average quality of the classrsquo unitanalyses for these tasks Each of these quantities correspondsto the appropriate values in Table I

The fractions FS and FU serve as indicators of how wellstudents understand when and why to use each strategy Ifstudents do not understand the purpose of an evaluation strat-egy they are not likely to spontaneously employ it on thecritical thinking or conceptual counterargument tasks withoutspecific prompting The quantities QS and QU are given bythe evaluation ability rubrics and measure the studentsrsquo un-derstanding of how to use each strategy By taking the prod-ucts FSQS and FUQU we get a pair of numbers be-tween 0 and 3 which are taken to be indicative of each classrsquooverall understanding of each evaluation strategy

To measure relative student performance on E- and NE-questions normalized differences in class performance foreach problem category are averaged on each exam

Erelative =1

NEminusproblems

Eminusproblems

Eminusproblem

NErelative =1

NNEminusproblems

NEminusproblems

NEminusproblem

where represents the normalized difference between thetwo classesrsquo performance as plotted in Fig 3 and N de-notes the number of problems of a certain type either E- orNE-questions on an exam

Table IV lists the values for these four measures of rela-tive class performance on each exam To determine whetherspecial-case analysis and unit analysis performance related toperformance on the exam questions a Pearson correlationanalysis of these data was performed The results indicatethat Erelative is significantly positively correlated withSCArelative r=996 p=004 and negatively correlated withUArelative r=minus921 p=255 while NErelative is not signifi-cantly correlated with both SCArelative r=447 p=553 andUArelative r=528 p=646

The most parsimonious account for these results entailsthree hypotheses One is that giving a greater proportion ofevaluation tasks on special-case analysis after exam 2 asdiscussed above helped students to improve their use of thatstrategy while at the same time reducing their use of unitanalysis The fact that we manipulated these two independentfactors in this fashion could thereby explain why the frac-tional usage of unit analysis steadily declined from exams 2to 6 see Table II

A second hypothesis is that studentsrsquo use of special-caseanalysis in recitation homework and in their personallearning behavior significantly benefited their problem-solving ability The relative performance of the 193ndash194 stu-dents on E-questions correlated very strongly with their useof special-case analysis on the exam evaluation tasks Notethat because we are looking at relative differences in perfor-mance between the experiment and control groups we cansafely rule out the possibility that this correlation is due tothe ldquoeasinessrdquo of the subject matter or any other such factor

The third hypothesis is that the use of unit analysis inrecitations homework and personal learning behavior didnot benefit studentsrsquo problem-solving ability as much asspecial-case analysis It would appear that the 193ndash194 stu-dents did not appreciate the greater utility of special-caseanalysis though since on the end-of-year survey they re-ported unit analysis as being just as valuable for learningphysics as special-case analysis see Table II It thereforeseems that the students had some limitations in their abilityto self-assess the utility of special-case analysis and unitanalysis for their learning Also the fact that NErelative isuncorrelated with both UArelative and SCArelative indicates thatthere is no transfer in the benefits of either strategy to topicsnot covered by the evaluation tasks from recitation andhomework assignments

TABLE IV Measures of relative overall class performance forexams 1 through 6 The definitions of each measure are described inthe text

Exam SCArelative UArelative Erelative NErelative

1 NA NA NA minus102

2 0179 2047 minus045 minus070

3 NA NA 0024 minus037

4 0645 1272 0066 minus098

5 0851 NA 0141 0010

6 0683 0893 0057 minus080


XI. DISCUSSION

While the strategy of special-case analysis is rather complex and took more time and effort for students to learn, it appears to provide significant benefits to performance on multiple-choice exam questions. In contrast, the simpler strategy of unit analysis is learned very quickly but apparently provided little or no benefit on exam questions. Here we discuss some possible reasons for these results.

In his doctoral dissertation, Sherin [41] presented and discussed a categorization of interpretive strategies, which he calls "interpretive devices," used by students to give meaning to physics equations while working on back-of-chapter homework problems. His classification of these strategies is reproduced in Table V. There are three classes of devices: narrative, static, and special case.

Each of these devices plays a role in what we call special-case analysis; this article uses the term "special case" in a much broader sense than Sherin does. One may conduct many different special-case analyses of an equation, using any one of these interpretive devices as the specific means. For example, one may choose to analyze a case where some parameter in the problem is changed, or where a quantity is taken to some limiting value. By engaging students in special-case analyses, our tasks give students the opportunity, and the feedback, to practice using these interpretive devices.

Sherin argues that interpretive devices function as sense-making tools which build meaning around an equation by relating it to other pieces of knowledge. The field of semiotics studies exactly how humans make meaning using resources such as systems of words, images, actions, and symbols, and has identified two aspects of meaning, called typological meaning (meaning by kind) and topological meaning (meaning by degree) [42]. Typological meaning is established by distinguishing categories of objects, relations, and processes. For example, an equation gains typological meanings by identifying the categories of physical situations in which it is applicable (e.g., we can use a = v²/r for circular motion), the types of physical idealizations it assumes (e.g., we will assume v and r are constant), and by categorizing the quantities in the equation (e.g., does v correspond to voltage or velocity? Does r correspond to the radius of the circle or the size of the object?).

Topological meaning is created by examining changes by some degree. For example, an equation gains topological meanings by being relatable to other physical situations within the category of applicable situations (e.g., if r had been greater, how would a change?), or to situations which lie on the borderline of applicable situations (e.g., if we let r = ∞, what happens to a?). Also, an equation gains topological meaning through the examination of gradual deviations from the idealizations used by the equation (e.g., what would happen if v was increasing?). Typological and topological meaning for physics equations may be developed by a variety of activities [43].

Typological and topological meanings are not distinct but necessarily relate to one another. For our example of centripetal acceleration, by taking r to infinity we enter a new category of physical situations, as the motion now has constant velocity. Therefore, this aspect of topological meaning construction develops typological meaning by highlighting the relation between the two categories of constant-velocity motion and uniform circular motion.
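The limiting-case move in this example is mechanical enough to automate; the short sketch below (assuming SymPy is available; the symbols are ours, not part of the paper's materials) takes the r → ∞ limit and also checks a changed-parameter case:

```python
# Special-case analysis of the centripetal-acceleration relation a = v**2 / r.
import sympy as sp

v, r = sp.symbols('v r', positive=True)
a = v**2 / r

# Limiting case: as r -> oo the acceleration vanishes, recovering the
# constant-velocity (straight-line motion) category discussed above.
print(sp.limit(a, r, sp.oo))  # -> 0

# Changed parameter: doubling the radius halves the acceleration.
print(sp.simplify(a.subs(r, 2 * r) / a))  # -> 1/2
```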

So it is possible that special-case analysis tasks, inasmuch as they compel students to use interpretive devices, aid student understanding of equations and consequently benefit their performance on exam questions. Unit analysis, on the other hand, makes no apparent use of interpretive devices and therefore seems incapable of constructing topological meaning. Unit analysis may help to construct some typological meaning, as it compels students to figure out which physical property each specific quantity corresponds to, but in general it would seem to be weaker than special-case analysis at developing meaning for equations. This does not mean that unit analysis is a worthless strategy, as it certainly serves the important function of testing an equation for self-consistency. It may be that unit analysis could have a stronger impact if taught in a different fashion, one which places more emphasis on its associated conceptual aspects.
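By contrast, the self-consistency check that unit analysis performs reduces to bookkeeping over base-dimension exponents. A minimal self-contained sketch (our own illustration, not the study's instrument):

```python
# A dimension-check in the spirit of unit analysis. Dimensions are maps from
# base-dimension symbols ('L' for length, 'T' for time) to integer exponents;
# an equation passes if both sides carry the same map.

def combine(d1, d2, sign=1):
    """Multiply (sign=+1) or divide (sign=-1) two dimension maps."""
    out = dict(d1)
    for base, exp in d2.items():
        out[base] = out.get(base, 0) + sign * exp
    return {base: exp for base, exp in out.items() if exp != 0}

velocity     = {'L': 1, 'T': -1}   # m/s
radius       = {'L': 1}            # m
acceleration = {'L': 1, 'T': -2}   # m/s^2

rhs = combine(combine(velocity, velocity), radius, sign=-1)  # v * v / r
print(rhs == acceleration)  # -> True: a = v**2/r is dimensionally consistent
print(combine(velocity, radius) == acceleration)  # -> False: a = v*r fails
```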

XII. CONCLUSION

Evaluation is a well-recognized part of learning, yet its importance is often not reflected in our introductory physics courses. This study has developed tasks and rubrics designed to help students become evaluators by engaging them in formative assessment activities. Results indicated that the use of evaluation activities, particularly those focusing on special-case analysis, helped generate a significant increase in performance on multiple-choice exam questions. One possible explanation for this is that special-case analysis tasks engage students in the use of interpretive devices to construct typological and topological meaning for equations. It should be noted that increases in performance were observed only on E-questions, indicating a lack of transfer to topics not covered by the evaluation activities. Future studies designed to replicate and expand upon these results are necessary in order to strengthen the external validity of our results. Also, the potential epistemological benefits of using evaluation tasks should be investigated, as well as their interactions with student motivation and self-efficacy.

TABLE V. Interpretive devices, listed by class. Reproduced from Sherin [41], Chapter 4, Fig. 2.

Narrative Class: Changing Parameters, Physical Change, Changing Situation
Static Class: Specific Moment, Generic Moment, Steady State, Static Forces, Conservation, Accounting
Special Case Class: Restricted Value, Specific Value, Limiting Case, Relative Values



ACKNOWLEDGMENTS

This study was conducted while the author was a member of the Rutgers Physics & Astronomy Education Group. I would like to thank Alan Van Heuvelen, Eugenia Etkina, Sahana Murthy, Michael Gentile, David Brookes, and David Rosengrant for valuable discussions and help with the execution of this project. Also, the feedback of three anonymous referees was helpful in improving the quality of this paper. This work was partially supported by the National Science Foundation, Grant No. DUE-0241078.

APPENDIX: EVALUATION ACTIVITIES

See the separate auxiliary material for the evaluation tasks used in the 193-194 recitation and homework assignments, the evaluation rubrics developed and used, the exam evaluation tasks, and the E- and NE-problems.

[1] B. S. Bloom, A Taxonomy of Educational Objectives. Handbook I: The Cognitive Domain (David McKay Co., Inc., New York, 1956).

[2] A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives, edited by L. W. Anderson and D. R. Krathwohl (Longman, New York, 2001).

[3] A. E. Lawson, How do humans acquire knowledge? And what does that imply about the nature of knowledge?, Sci. Educ. 9, 577 (2000).

[4] A. E. Lawson, The generality of hypothetico-deductive reasoning: Making scientific reasoning explicit, Am. Biol. Teach. 62, 482 (2000).

[5] A. R. Warren, Evaluation Strategies: Teaching Students to Assess Coherence & Consistency, Phys. Teach. 47, 466 (2009).

[6] A. Buffler, S. Allie, F. Lubben, and B. Campbell, The development of first year physics students' ideas about measurement in terms of point and set paradigms, Int. J. Sci. Educ. 23, 1137 (2001).

[7] R. F. Lippmann, Ph.D. dissertation, University of Maryland, 2003.

[8] S. P. Marshall, Schemas in Problem-Solving (Cambridge University Press, New York, 1995).

[9] W. J. Leonard, W. J. Gerace, and R. J. Dufresne, Analysis-based problem solving: Making analysis and reasoning the focus of physics instruction, Sci. Teach. 20, 387 (2002).

[10] F. Reif and J. I. Heller, Knowledge structures and problem solving in physics, Educ. Psychol. 17, 102 (1982).

[11] P. Heller, R. Keith, and S. Anderson, Teaching problem solving through cooperative grouping. Part 1: Group versus individual problem solving, Am. J. Phys. 60, 627 (1992).

[12] M. Sabella and E. F. Redish, Knowledge organization and activation in physics problem-solving, University of Maryland preprint, 2004.

[13] R. R. Hake, Interactive-Engagement Versus Traditional Methods: A Six-Thousand-Student Survey of Mechanics Test Data for Introductory Physics Courses, Am. J. Phys. 66, 64 (1998).

[14] K. Cummings, J. Marx, R. Thornton, and D. Kuhl, Evaluating innovation in studio physics, Am. J. Phys. 67, S38 (1999).

[15] L. C. McDermott and P. S. Shaffer, Tutorials in Introductory Physics (Prentice-Hall, Upper Saddle River, NJ, 2002).

[16] N. D. Finkelstein and S. J. Pollock, Replicating and understanding successful innovations: Implementing tutorials in introductory physics, Phys. Rev. ST Phys. Educ. Res. 1, 010101 (2005).

[17] A. P. Fagen, C. H. Crouch, and E. Mazur, Peer Instruction: Results from a range of classrooms, Phys. Teach. 40, 206 (2002).

[18] E. Mazur, Peer Instruction: A User's Manual (Prentice-Hall, Upper Saddle River, NJ, 1996).

[19] R. Warnakulasooriya, D. J. Palazzo, and D. E. Pritchard, Evidence of Problem-Solving Transfer in Web-Based Socratic Tutor, in Physics Education Research Conference Proceedings, edited by P. Heron, L. McCullough, and J. Marx (AIP Press, Melville, 2006).

[20] K. VanLehn, C. Lynch, K. Schulze, J. Shapiro, R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and M. Wintersgill, The Andes Physics Tutoring System: Lessons Learned, Int. J. Artif. Intell. Educ. 15, 147 (2005).

[21] B. J. Zimmerman and M. Martinez-Pons, Development of a structured interview for assessing student use of self-regulated learning strategies, Am. Educ. Res. J. 23, 614 (1986).

[22] N. Perry, Young children's self-regulated learning and contexts that support it, J. Educ. Psychol. 90, 715 (1998).

[23] D. Hammer, More than misconceptions: Multiple perspectives on student knowledge and reasoning, and an appropriate role for education research, Am. J. Phys. 64, 1316 (1996).

[24] American Association for the Advancement of Science, Project 2061: Benchmarks for Science Literacy (Oxford University Press, New York, 1993).

[25] NSF Directorate for Education and Human Resources, Review of Undergraduate Education: Shaping the Future: New Expectations for Undergraduate Education in Science, Mathematics, Engineering, and Technology (1996). Recommendations may be seen at http://www.ehr.nsf.gov/egr/due/documents/review/96139/four.htm

[26] National Research Council, National Science Education Standards (National Academy Press, Washington, DC, 1996).

[27] R. H. Ennis, in Teaching Thinking Skills: Theory and Practice, edited by J. B. Baron and R. J. Sternberg (Freeman, New York, 1987), pp. 9-26.

[28] P. M. King and K. S. Kitchener, Developing Reflective Judgment: Understanding and Promoting Intellectual Growth and Critical Thinking in Adolescents and Adults (Jossey-Bass, San Francisco, 1994).

[29] P. Black and D. Wiliam, Inside the Black Box: Raising Standards through Classroom Assessment (King's College, London, 1998).

[30] R. Sadler, Instr. Sci. 18, 119 (1989).

[31] S. Murthy, Peer-Assessment of Homework Using Rubrics, in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 156-159.

[32] E. Yerushalmi, C. Singh, and B. S. Eylon, Physics Learning in the Context of Scaffolded Diagnostic Tasks (I), in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 27-30.

[33] C. Singh, E. Yerushalmi, and B. S. Eylon, Physics Learning in the Context of Scaffolded Diagnostic Tasks (II): Preliminary Results, in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 31-34.

[34] E. Etkina, A. Van Heuvelen, S. Brahmia, D. Brookes, M. Gentile, S. Murthy, D. Rosengrant, and A. Warren, Scientific abilities and their assessment, Phys. Rev. ST Phys. Educ. Res. 2, 020103 (2006).

[35] J. Cohen, A coefficient of agreement for nominal scales, Educ. Psychol. Meas. 20, 37 (1960).

[36] A. R. Warren, The Role of Evaluative Abilities in Physics Learning, in 2004 Physics Education Research Conference Proceedings, edited by J. Marx, S. Franklin, and P. Heron (AIP, Melville, NY, 2005).

[37] A. R. Warren, Ph.D. dissertation, Rutgers University, 2006.

[38] M. Scott, T. Stelzer, and G. Gladding, Evaluating multiple-choice exams in large introductory physics courses, Phys. Rev. ST Phys. Educ. Res. 2, 020102 (2006).

[39] D. Hestenes, M. Wells, and G. Swackhamer, Force Concept Inventory, Phys. Teach. 30, 141 (1992).

[40] D. P. Maloney, T. L. O'Kuma, C. J. Hieggelke, and A. Van Heuvelen, Surveying students' conceptual knowledge of electricity and magnetism, Am. J. Phys. 69, S12 (2001).

[41] B. Sherin, Ph.D. dissertation, University of California, Berkeley, 1996.

[42] J. L. Lemke, "Multiplying Meaning: Visual and Verbal Semiotics in Scientific Text," in Reading Science, edited by J. R. Martin and R. Veel (Routledge, London, 1998).

[43] E. Etkina, A. R. Warren, and M. J. Gentile, The Role of Models in Physics Instruction, Phys. Teach. 44, 34 (2006).


Page 8: Impact of teaching students to use evaluation strategies

parable level of performance on the E-questions In fact theresults show that by exam 3 the performance of the 193ndash194students on E-questions was not only comparable to but wasbetter than that of the 203ndash204 students although only threeof the positive differences on E-questions were significant atthe 001-level

The favored hypothesis to explain these data is that theimproved relative performance of the 193ndash194 students onE-questions was due to the use of our evaluation tasks How-ever the strength of this hypothesis may be mitigated by thepresence of uncontrolled factors in the study Although stu-dents were not randomly assigned into the conditions it isdifficult to see how this may account for the observed differ-ences on NE- and E-questions Differences between theteaching populations for the two classes also fail to provide areasonable mechanism for producing the differences onE-questions and NE-questions simultaneously If there wereany sort of ldquogood teacher effectrdquo it would be likely to affectthe results for both E- and NE-questions

Another threat to internal validity stems from the fact thatthe lecturers were not blind to the study and may have un-intentionally skewed the results This potential experimenterbias can be argued against because of the fact that lecturesfor 193ndash194 and 203ndash204 were designed to be as similar aspossible and it is not at all clear how stylistic differencescould cause such preferential performance differences be-tween E- and NE-questions in any consistent fashion Alltopics whether those tested by E- or NE-questions werecovered in very similar fashions during lectures recitationsand laboratories Also the fact that these performance differ-ences developed and persisted through two different lecturersfor 193ndash194 suggests that differences in lecture style wereprobably not a significant causal factor

Another alternative hypothesis is that the results are dueto time-on-topic differences in the recitation and homeworkassignments Although recitation and homework assignmentswere designed to minimize such differences no actual mea-surements were made It is possible that the evaluation tasks

TABLE III Cross tabulation of performance on multiple choice MC exam problems and the exactsignificance 2-tailed for chi-squared tests of independence between each class =significant at 005 level =significant at 001 level

Exam Problem Class Correct p-value Exam Problem Class Correct p-value

Ex1 NE 1 193ndash194 711 0026 Ex4 NE 1 193ndash194 885 0001203ndash204 797 203ndash204 966

Ex1 NE 2 193ndash194 487 0001 Ex4 NE 2 193ndash194 411 0001203ndash204 655 203ndash204 559

Ex2 E 1 193ndash194 258 0001 Ex5 E 1 193ndash194 686 0001203ndash204 409 203ndash204 403

Ex2 E 2 193ndash194 813 0049 Ex5 E 2 193ndash194 957 0081

203ndash204 880 203ndash204 915

Ex2 NE 1 193ndash194 538 0361 Ex5 NE 1 193ndash194 894 0496

203ndash204 494 203ndash204 870

Ex2 NE 2 193ndash194 522 0001 Ex5 NE 2 193ndash194 931 0622

203ndash204 754 203ndash204 918

Ex3 E 1 193ndash194 831 1000 Ex6 E 1 193ndash194 913 0001203ndash204 832 203ndash204 787

Ex3 E 2 193ndash194 803 0330 Ex6 E 2 193ndash194 699 0244

203ndash204 764 203ndash204 645

Ex3 E 3 193ndash194 781 0124 Ex6 E 3 193ndash194 787 0268

203ndash204 712 203ndash204 740

Ex3 NE 1 193ndash194 809 0380 Ex6 E 4 193ndash194 546 0072

203ndash204 774 203ndash204 463

Ex3 NE 2 193ndash194 972 0368 Ex6 NE 1 193ndash194 601 0050203ndash204 954 203ndash204 690

Ex3 NE 3 193ndash194 691 0001 Ex6 NE 2 193ndash194 503 0001203ndash204 820 203ndash204 715

Ex3 NE 4 193ndash194 455 0038 Ex6 NE 3 193ndash194 623 0768

203ndash204 551 203ndash204 640

Ex4 E 1 193ndash194 964 0062 Ex6 NE 4 193ndash194 639 0001203ndash204 922 203ndash204 789

Ex4 E 2 193ndash194 911 0001 Ex6 NE 5 193ndash194 705 0146

203ndash204 731 203ndash204 765

AARON R WARREN PHYS REV ST PHYS EDUC RES 6 020103 2010

020103-8

simply took longer for students to complete and that theirperformance on E-questions was due not to the format of theactivity but simply that they spent more time thinking aboutthe concepts involved in the activity However anecdotalevidence from teaching assistants who helped students withtheir recitation and homework assignments suggests that stu-dents did not take an inordinate amount of time to completethe evaluation tasks Also there was no indication from stu-dent comments made to the teaching assistants that they feltthe evaluation tasks took much longer than other recitationand homework problems

While the results above indicate that the use of evaluationtasks benefited student problem-solving performance we canfurther test this hypothesis by looking for a concrete associa-tion between the strength of studentrsquos evaluation abilities andtheir problem-solving performance We construct two ad hocmeasures to reflect studentsrsquo apparent understanding ofwhen why and how to use each evaluation strategy for acertain topic The measures are

SCArelative = FSexp QSexp minus FScontrol QScontrol

UArelative = FUexp QUexp minus FUcontrol QUcontrol

where SCA stands for ldquospecial-case analysisrdquo UA stands forldquounit analysisrdquo FS is the fraction of the class experimentalor control which used special-case analysis and FU is thefraction which used unit analysis on the evaluation tasks in-cluded on exams 2 4 5 and 6 UArelative is not applicable forexam 5 due to the task format conceptual counterargumentAlso QS is the average quality of the classrsquo special-caseanalyses and QU is the average quality of the classrsquo unitanalyses for these tasks Each of these quantities correspondsto the appropriate values in Table I

The fractions FS and FU serve as indicators of how wellstudents understand when and why to use each strategy Ifstudents do not understand the purpose of an evaluation strat-egy they are not likely to spontaneously employ it on thecritical thinking or conceptual counterargument tasks withoutspecific prompting The quantities QS and QU are given bythe evaluation ability rubrics and measure the studentsrsquo un-derstanding of how to use each strategy By taking the prod-ucts FSQS and FUQU we get a pair of numbers be-tween 0 and 3 which are taken to be indicative of each classrsquooverall understanding of each evaluation strategy

To measure relative student performance on E- and NE-questions normalized differences in class performance foreach problem category are averaged on each exam

Erelative =1

NEminusproblems

Eminusproblems

Eminusproblem

NErelative =1

NNEminusproblems

NEminusproblems

NEminusproblem

where represents the normalized difference between thetwo classesrsquo performance as plotted in Fig 3 and N de-notes the number of problems of a certain type either E- orNE-questions on an exam

Table IV lists the values for these four measures of rela-tive class performance on each exam To determine whetherspecial-case analysis and unit analysis performance related toperformance on the exam questions a Pearson correlationanalysis of these data was performed The results indicatethat Erelative is significantly positively correlated withSCArelative r=996 p=004 and negatively correlated withUArelative r=minus921 p=255 while NErelative is not signifi-cantly correlated with both SCArelative r=447 p=553 andUArelative r=528 p=646

The most parsimonious account for these results entailsthree hypotheses One is that giving a greater proportion ofevaluation tasks on special-case analysis after exam 2 asdiscussed above helped students to improve their use of thatstrategy while at the same time reducing their use of unitanalysis The fact that we manipulated these two independentfactors in this fashion could thereby explain why the frac-tional usage of unit analysis steadily declined from exams 2to 6 see Table II

A second hypothesis is that studentsrsquo use of special-caseanalysis in recitation homework and in their personallearning behavior significantly benefited their problem-solving ability The relative performance of the 193ndash194 stu-dents on E-questions correlated very strongly with their useof special-case analysis on the exam evaluation tasks Notethat because we are looking at relative differences in perfor-mance between the experiment and control groups we cansafely rule out the possibility that this correlation is due tothe ldquoeasinessrdquo of the subject matter or any other such factor

The third hypothesis is that the use of unit analysis inrecitations homework and personal learning behavior didnot benefit studentsrsquo problem-solving ability as much asspecial-case analysis It would appear that the 193ndash194 stu-dents did not appreciate the greater utility of special-caseanalysis though since on the end-of-year survey they re-ported unit analysis as being just as valuable for learningphysics as special-case analysis see Table II It thereforeseems that the students had some limitations in their abilityto self-assess the utility of special-case analysis and unitanalysis for their learning Also the fact that NErelative isuncorrelated with both UArelative and SCArelative indicates thatthere is no transfer in the benefits of either strategy to topicsnot covered by the evaluation tasks from recitation andhomework assignments

TABLE IV Measures of relative overall class performance forexams 1 through 6 The definitions of each measure are described inthe text

Exam SCArelative UArelative Erelative NErelative

1 NA NA NA minus102

2 0179 2047 minus045 minus070

3 NA NA 0024 minus037

4 0645 1272 0066 minus098

5 0851 NA 0141 0010

6 0683 0893 0057 minus080

IMPACT OF TEACHING STUDENTS TO USE EVALUATIONhellip PHYS REV ST PHYS EDUC RES 6 020103 2010

020103-9

XI DISCUSSION

While the strategy of special-case analysis is rather com-plex and took more time and effort for students to learn itappears to provide significant benefits to performance onmultiple-choice exam questions In contrast the simplerstrategy of unit analysis is learned very quickly but appar-ently provided little or no benefits on exam questions Herewe shall discuss some possible reasons for these results

In his doctoral dissertation Sherin 41 presented and dis-cussed a categorization of interpretive strategies which hecalls ldquointerpretive devicesrdquo used by students to give mean-ing to physics equations while working on back-of-chapterhomework problems His classification of these strategies isreproduced in Table V There are three classes of devicesnarrative static and special case

Each of these devices plays a role in what we call special-case analysis this article uses the term ldquospecial-caserdquo in amuch broader sense than Sherin One may conduct manydifferent special-case analyses of an equation using any oneof these interpretive devices as the specific means For ex-ample one may choose to analyze a case where some param-eter is changed in the problem or where a quantity is takento some limiting value By engaging students in special-caseanalyses through the use of our tasks students are given theopportunity and feedback to practice the use of these inter-pretive devices

Sherin argues that interpretive devices function as sense-making tools which build meaning around an equation byrelating it to other pieces of knowledge The field of semiot-ics studies exactly how humans make meaning using re-sources such as systems of words images actions and sym-bols and has identified two aspects of meaning calledtypological meaning meaning by kind and topologicalmeaning meaning by degree 42 Typological meaning isestablished by distinguishing categories of objects relationsand processes For example an equation gains typologicalmeanings by identifying categories of physical situations inwhich it is applicable eg we can use a=v2 r for circularmotion the types of physical idealizations it assumes egwe will assume v and r are constant and by categorizing the

quantities in the equation eg does v correspond to voltageor velocity Does r correspond to the radius of the circle orthe size of the object

Topological meaning is created by examining changes bysome degree For example an equation gains topologicalmeanings by being relatable to other physical situationswithin the category of applicable situations eg if r hadbeen greater how would a change or with situationswhich lie on the borderline of applicable situations eg ifwe let r= what happens to a Also an equation gainstopological meaning by examining gradual deviations fromthe idealizations used by the equation eg what would hap-pen if v was increasing Typological and topological mean-ing for physics equations may be developed by a variety ofactivities 43

Typological and topological meanings are not distinct butnecessarily relate to one another For our example of centrip-etal acceleration by taking r to infinity we enter a new cat-egory of physical situations as the motion now has constantvelocity Therefore this aspect of topological meaning con-struction develops typological meaning by highlighting therelation between the two categories of constant velocity mo-tion and uniform circular motion

So it is possible that the use of special-case analysis tasksinasmuch as they compel students to use interpretive devicesaid student understanding of equations and consequentlybenefit their performance on exam questions Unit analysison the other hand makes no apparent use of interpretivedevices and therefore seems incapable of constructing topo-logical meaning Unit analysis may help to construct sometypological meaning as it compels students to figure outwhich physical property each specific quantity correspondsto but in general would seem to be weaker than special-caseanalysis at developing meaning for equations This does notmean that unit analysis is a worthless strategy as it certainlydoes serve the important function of testing for self-consistency in an equation It may be that unit analysis couldhave a stronger impact if taught in a different fashion whichplaces more emphasis on the conceptual aspects associatedwith it

XII CONCLUSION

Evaluation is a well-recognized part of learning yet itsimportance is often not reflected in our introductory physicscourses This study has developed tasks and rubrics designedto help students become evaluators by engaging them in for-mative assessment activities Results indicated that the use ofevaluation activities particularly those focusing on special-case analysis help generate a significant increase in perfor-mance on multiple-choice exam questions One possible ex-planation for this is that special-case analysis tasks engagestudents in the use of interpretive devices to construct typo-logical and topological meaning for equations It should benoted that increases in performance were only observed onE-questions indicating a lack of transfer to topics not cov-ered by the evaluation activities Future studies designed toreplicate and expand upon these results are necessary in or-der to strengthen the external validity of our results Also the

TABLE V Interpretive devices listed by class Reproduced fromSherin 41 Chapter 4 Fig 2

Narrative Class Static Class

Changing Parameters Specific Moment

Physical Change Generic Moment

Changing Situation Steady State

Static Forces

Conservation

Accounting

Special Case

Restricted Value

Specific Value

Limiting Case

Relative Values

AARON R WARREN PHYS REV ST PHYS EDUC RES 6 020103 2010

020103-10

potential epistemological benefits of using evaluation tasksshould be investigated as well as the interactions with stu-dent motivation and self-efficacy

ACKNOWLEDGMENTS

This study was conducted while the author was a memberof the Rutgers Physics amp Astronomy Education Group Iwould like to thank Alan Van Heuvelen Eugenia EtkinaSahana Murthy Michael Gentile David Brookes and DavidRosengrant for valuable discussions and help with the execu-

tion of this project Also the feedback of three anonymousreferees was helpful in improving the quality of this paperThis work was partially supported by the National ScienceFoundation Grant No DUE-0241078

APPENDIX EVALUATION ACTIVITIES

See separate auxiliary material for the evaluation tasksused in the 193ndash194 recitation and homework assignmentsthe evaluation rubrics developed and used the exam evalua-tion tasks and the E- and NE-problems

[1] B. S. Bloom, A Taxonomy of Educational Objectives. Handbook I: The Cognitive Domain (David McKay Co., Inc., New York, 1956).

[2] A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives, edited by L. W. Anderson and D. R. Krathwohl (Longman, New York, 2001).

[3] A. E. Lawson, How do humans acquire knowledge? And what does that imply about the nature of knowledge?, Sci. Educ. 9, 577 (2000).

[4] A. E. Lawson, The generality of hypothetico-deductive reasoning: Making scientific reasoning explicit, Am. Biol. Teach. 62, 482 (2000).

[5] A. R. Warren, Evaluation strategies: Teaching students to assess coherence & consistency, Phys. Teach. 47, 466 (2009).

[6] A. Buffler, S. Allie, F. Lubben, and B. Campbell, The development of first year physics students' ideas about measurement in terms of point and set paradigms, Int. J. Sci. Educ. 23, 1137 (2001).

[7] R. F. Lippmann, Ph.D. dissertation, University of Maryland, 2003.

[8] S. P. Marshall, Schemas in Problem-Solving (Cambridge University Press, New York, 1995).

[9] W. J. Leonard, W. J. Gerace, and R. J. Dufresne, Analysis-based problem solving: Making analysis and reasoning the focus of physics instruction, Sci. Teach. 20, 387 (2002).

[10] F. Reif and J. I. Heller, Knowledge structures and problem solving in physics, Educ. Psychol. 17, 102 (1982).

[11] P. Heller, R. Keith, and S. Anderson, Teaching problem solving through cooperative grouping. Part 1: Group versus individual problem solving, Am. J. Phys. 60, 627 (1992).

[12] M. Sabella and E. F. Redish, Knowledge organization and activation in physics problem-solving, University of Maryland preprint, 2004.

[13] R. R. Hake, Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses, Am. J. Phys. 66, 64 (1998).

[14] K. Cummings, J. Marx, R. Thornton, and D. Kuhl, Evaluating innovation in studio physics, Am. J. Phys. 67, S38 (1999).

[15] L. C. McDermott and P. S. Shaffer, Tutorials in Introductory Physics (Prentice-Hall, Upper Saddle River, NJ, 2002).

[16] N. D. Finkelstein and S. J. Pollock, Replicating and understanding successful innovations: Implementing tutorials in introductory physics, Phys. Rev. ST Phys. Educ. Res. 1, 010101 (2005).

[17] A. P. Fagen, C. H. Crouch, and E. Mazur, Peer Instruction: Results from a range of classrooms, Phys. Teach. 40, 206 (2002).

[18] E. Mazur, Peer Instruction: A User's Manual (Prentice-Hall, Upper Saddle River, NJ, 1996).

[19] R. Warnakulasooriya, D. J. Palazzo, and D. E. Pritchard, Evidence of problem-solving transfer in a web-based Socratic tutor, in Physics Education Research Conference Proceedings, edited by P. Heron, L. McCullough, and J. Marx (AIP Press, Melville, 2006).

[20] K. VanLehn, C. Lynch, K. Schulze, J. Shapiro, R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and M. Wintersgill, The Andes Physics Tutoring System: Lessons learned, Int. J. Artif. Intell. Educ. 15, 147 (2005).

[21] B. J. Zimmerman and M. Martinez-Pons, Development of a structured interview for assessing student use of self-regulated learning strategies, Am. Educ. Res. J. 23, 614 (1986).

[22] N. Perry, Young children's self-regulated learning and contexts that support it, J. Educ. Psychol. 90, 715 (1998).

[23] D. Hammer, More than misconceptions: Multiple perspectives on student knowledge and reasoning, and an appropriate role for education research, Am. J. Phys. 64, 1316 (1996).

[24] American Association for the Advancement of Science, Project 2061: Benchmarks for Science Literacy (Oxford University Press, New York, 1993).

[25] NSF Directorate for Education and Human Resources, Review of Undergraduate Education: Shaping the Future: New Expectations for Undergraduate Education in Science, Mathematics, Engineering, and Technology (1996). Recommendations may be seen at http://www.ehr.nsf.gov/egr/due/documents/review/96139/four.htm

[26] National Research Council, National Science Education Standards (National Academy Press, Washington, DC, 1996).

[27] R. H. Ennis, in Teaching Thinking Skills: Theory and Practice, edited by J. B. Baron and R. J. Sternberg (Freeman, New York, 1987), pp. 9–26.

[28] P. M. King and K. S. Kitchener, Developing Reflective Judgment: Understanding and Promoting Intellectual Growth and Critical Thinking in Adolescents and Adults (Jossey-Bass, San Francisco, 1994).

[29] P. Black and D. Wiliam, Inside the Black Box: Raising Standards through Classroom Assessment (King's College, London, 1998).

[30] R. Sadler, Instr. Sci. 18, 119 (1989).

[31] S. Murthy, Peer-assessment of homework using rubrics, in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 156–159.

[32] E. Yerushalmi, C. Singh, and B. S. Eylon, Physics learning in the context of scaffolded diagnostic tasks (I), in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 27–30.

[33] C. Singh, E. Yerushalmi, and B. S. Eylon, Physics learning in the context of scaffolded diagnostic tasks (II): Preliminary results, in 2007 Physics Education Research Conference, edited by L. Hsu, C. Henderson, and L. McCullough (AIP, Melville, NY, 2007), pp. 31–34.

[34] E. Etkina, A. Van Heuvelen, S. Brahmia, D. Brookes, M. Gentile, S. Murthy, D. Rosengrant, and A. Warren, Scientific abilities and their assessment, Phys. Rev. ST Phys. Educ. Res. 2, 020103 (2006).

[35] J. Cohen, A coefficient of agreement for nominal scales, Educ. Psychol. Meas. 20, 37 (1960).

[36] A. R. Warren, The role of evaluative abilities in physics learning, in 2004 Physics Education Research Conference Proceedings, edited by J. Marx, S. Franklin, and P. Heron (AIP, Melville, NY, 2005).

[37] A. R. Warren, Ph.D. dissertation, Rutgers University, 2006.

[38] M. Scott, T. Stelzer, and G. Gladding, Evaluating multiple-choice exams in large introductory physics courses, Phys. Rev. ST Phys. Educ. Res. 2, 020102 (2006).

[39] D. Hestenes, M. Wells, and G. Swackhamer, Force Concept Inventory, Phys. Teach. 30, 141 (1992).

[40] D. P. Maloney, T. L. O'Kuma, C. J. Hieggelke, and A. Van Heuvelen, Surveying students' conceptual knowledge of electricity and magnetism, Am. J. Phys. 69, S12 (2001).

[41] B. Sherin, Ph.D. dissertation, University of California, Berkeley, 1996.

[42] J. L. Lemke, Multiplying meaning: Visual and verbal semiotics in scientific text, in Reading Science, edited by J. R. Martin and R. Veel (Routledge, London, 1998).

[43] E. Etkina, A. R. Warren, and M. J. Gentile, The role of models in physics instruction, Phys. Teach. 44, 34 (2006).

