
Journal of Mathematical Behavior 21 (2002) 203–224

Exploring test performance in mathematics: the questions children’s answers raise

Elham Kazemi

University of Washington, 122 Miller, P.O. Box 353600, Seattle, WA 98195-3600, USA

Abstract

This article investigates children’s mathematical performance on test items, specifically multiple-choice questions. Using interviews with 90 fourth-graders, it reveals why particular kinds of items are more or less difficult for students. By using multiple-choice questions and juxtaposing them with similar open-ended problems, the findings underscore the costs of not attending to children’s thinking in designing and interpreting problems. The data from this study suggest that when answering multiple-choice questions, students’ attention is drawn to the choices themselves. They do not necessarily think through the problem first and thus make their choices based on (often incorrect) generalizations they have made about problem-solving. Whether students answered a multiple-choice question or a similar open-ended problem first impacted both their performance and their reasoning. Moreover, children draw on their life experiences when the context of the problem is salient, thus ignoring important parameters of the stated problem. Implications for investigating children’s thinking, instruction, and test design are discussed.
© 2002 Elsevier Science Inc. All rights reserved.

Keywords: Children’s thinking; Mathematical performance; Interpreting problems; Testing

1. Introduction

Much research in mathematics education focuses on understanding children’s thinking. The central concerns of this body of work have been to understand what mathematical knowledge children need to know, how children come to build sophisticated understandings, and how their reasoning is shaped by the mathematical experiences they have in and out of school (e.g., Ball & Bass, 2000; Carpenter, Fennema, Franke, Levi, & Empson, 1999; Cobb, Boufi, McClain, & Whitenack, 1997; Lampert, 1990; Lave, 1988; Saxe, 1990). One area that needs further attention is how children make sense of the assessment tasks they typically encounter at the end of the school year. This article examines how children’s understanding interacts with the way test items are structured. Specifically, the study examines the reasons

E-mail address: [email protected] (E. Kazemi).

0732-3123/02/$ – see front matter © 2002 Elsevier Science Inc. All rights reserved.
PII: S0732-3123(02)00118-9


children articulate for their responses to multiple-choice questions that appear on end-of-the-year assessments. As schools increasingly rely on test scores to make policy, promotion, and instructional decisions (Linn, 2000; National Research Council, 1999), we need to understand more about how students interpret items and why they choose the answers they do. By using multiple-choice questions and juxtaposing them with similar open-ended problems, this study provides further evidence for the costs of not attending to children’s thinking in designing and interpreting even seemingly straightforward tasks.

This study builds on two different literatures. The first body of work has focused on understanding student reasoning in assessment situations, and the second includes research on understanding the culturally-specific knowledge that children might use in making sense of particular problem-solving contexts.

One body of work has arisen in response to highly cited examples of students’ test performance that have been used to show areas in which students lack understanding. Results from the National Assessment of Educational Progress, for example, have repeatedly shown that students have difficulty with non-routine problems that require them to analyze problems, not just solve them. For example, students’ solutions to the “bus problem” from the third NAEP raised alarms about students’ understanding of division. The problem read, “An army bus holds 36 soldiers. If 1128 soldiers are being bused to their training site, how many buses are needed?” Only 24% of the national sample of students taking the test solved this problem correctly (NAEP, 1983). Others did not interpret the remainder to indicate that another partially filled bus would be needed, while some suggested that a minivan or smaller bus could be used for the remaining soldiers who would not fill a bus.
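The division-with-remainder at the heart of the bus problem can be sketched in a few lines; the leftover soldiers force the answer up to the next whole bus. A minimal illustration (not part of the original study):

```python
# Bus problem: 1128 soldiers, buses hold 36. A nonzero remainder means
# one more, partially filled, bus is required.
soldiers, capacity = 1128, 36
full_buses, remainder = divmod(soldiers, capacity)   # 31 full buses, 12 soldiers left
buses_needed = full_buses + (1 if remainder else 0)  # round up: 32 buses
print(full_buses, remainder, buses_needed)  # 31 12 32
```

A bare answer of 31 (or "31 remainder 12") reflects correct computation but omits the interpretive step that the NAEP results showed students struggling with.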

Because of the nature of such wide-scale testing data, especially in multiple-choice formats, researchers interested in student learning do not have access to students’ own explanations of their answers in these situations. This fact has led to a set of studies linked to the QUASAR1 project, which examined middle schoolers’ mathematical achievement (Lane & Silver, 1995). In creating the QUASAR Cognitive Assessment Instrument (QCAI) used to measure middle school students’ capacity for higher-level reasoning processes, researchers have carefully studied items to see whether they elicit students’ best reasoning (Lane, 1993; Magone, Cai, Silver, & Wang, 1994). This work included the creation of open-ended versions of multiple-choice problems in order to more fully understand how students made sense of the problem situations (Cai & Silver, 1995; Lane, 1993; Santel-Parke & Cai, 1997; Silver, Shapiro, & Deutsch, 1993). For example, a study using division-with-remainder problems (similar to the bus problem described above) in open-ended format found that a higher percentage of students than indicated by NAEP results (45% of a sample of about 200 middle school students) could provide an appropriate interpretation to their computational answer if given a chance to explain their reasoning (Silver et al., 1993). The QUASAR studies have also compared the influence of different kinds of prompts on students’ responses. The findings show that prompts which do not explicitly direct students to pay attention to mathematical aspects of the task can underestimate student understanding. Similarly, students’ interpretations of familiar contexts may inadvertently interfere with their ability to use the reasoning that particular items intend (see Santel-Parke & Cai, 1997, for examples).

1 QUASAR (Quantitative Understanding: Amplifying Student Achievement and Reasoning) was a reform project (1989–1995) whose goal was to develop and implement middle school curriculum in economically disadvantaged communities. The curriculum centered on developing students’ reasoning, problem-solving, and communication skills in order to deepen their mathematical understandings. The project was directed by Edward A. Silver and headquartered at the Learning Research and Development Center at the University of Pittsburgh (see Silver, Smith, & Nelson, 1995, for an overview of the project).


Like QUASAR, researchers who contributed to the development of a middle school curriculum, Mathematics in Context (Romberg, 1998), have built assessments that take into account detailed research on students’ reasoning. In a series of articles about test development, Van den Heuvel-Panhuizen and her colleagues underscore the importance of students’ active participation in assessment design. For example, they write about the value of understanding what makes certain problems more and less difficult for students, whether students can articulate why certain problems are more or less difficult, the importance of allowing students to generate problems to use in assessments, and the role that contexts play in children’s problem-solving efforts (Van den Heuvel-Panhuizen, 1994; Van den Heuvel-Panhuizen & Gravemeijer, 1993; Van den Heuvel-Panhuizen, Middleton, & Streefland, 1995). Taken together, studies stemming from QUASAR and the development of Mathematics in Context provide evidence for the importance of examining how children interpret assessment items.

To understand diversity in children’s sense-making strategies on tests, the second body of work that informs this study has shown how particular questions require culturally-specific knowledge. These studies of children’s test performance have been concerned with documenting cultural bias in test language (McNeil, 2000; Smith & Fey, 2000; Solano-Flores & Nelson-Barber, 2001; Stiff & Harvey, 1988). For example, Tate (1994) demonstrated how African American students’ varied solutions to a problem reflected the life experiences they brought to bear in solving the problem. The problem read, “It costs $1.50 each way to ride the bus between home and work. The weekly pass is $16.00. Which is a better deal, paying the daily fare or buying the weekly pass?” Students who picked the weekly pass were marked wrong, but these students reasoned that the bus rider could use the pass to travel to multiple jobs and on weekends or could share it with family members (see also Ladson-Billings, 1995). I use this example and the body of work it represents as evidence that students actively make sense of the problems they encounter and construct a range of valid mathematical interpretations based on their everyday experiences (see also Cooper & Dunne, 2000, and Solano-Flores & Nelson-Barber, 2001, for additional examples from mathematics and science assessments). The goal of this article, however, is not to make ethnic-specific or class-related claims about students’ problem-solving strategies. Instead, I seek to demonstrate the range of knowledge and interpretations that a diverse group of students evoked in their problem-solving efforts.

This study was motivated in part by my work with teachers in which I have studied how teachers understand and make use of children’s mathematical thinking in making pedagogical and curricular decisions (Franke & Kazemi, 2001; Kazemi & Franke, 2000). My work has taken place in schools where students have not historically performed well on state or national assessments. Many teachers with whom I work feel compelled, near the end of the school year, to practice an array of discrete mathematical procedures they anticipate will be covered on the end-of-the-year tests. What is particularly striking is that some teachers feel they must put aside their efforts to elicit children’s thinking and instead perform triage on the skills they have not yet “covered.” Their anxieties about coverage are heightened because of the sheer volume of distinct skills students are expected to master at each grade. Observing teachers turn to hours of computational practice, I wondered instead what we might learn if we asked students to tell us how they approached seemingly straightforward problems that they encounter on tests.

Using interview data of 90 fourth-graders, this study explores why particular kinds of items are more or less difficult for students. In selecting the problems for this study (see Table 1), I drew from my experience observing children solve mathematical problems and from my knowledge of research on children’s thinking about number (e.g., Carpenter et al., 1999). Theoretically, this study is informed by a situated view of learning. From this perspective, testing is one kind of practice with its own norms and rules (Miller-Jones, 1989; Rogoff, 1997; Wertsch, 1991). Children’s participation in a testing situation


Table 1
Problems used in interview protocol


is shaped by how they interpret the structure and content of the problems and what they think they should and should not do when they work on a problem. In this study, I rely on students’ own written and verbal explanations to acquire a deeper understanding of their sense-making processes. Whether students answered a multiple-choice question or a similar open-ended problem first impacted both their performance and their reasoning. I demonstrate below that when students approach a multiple-choice problem, their attention is drawn to the choices themselves. Focusing on the choices can lead students to use different knowledge than if the problems were open-ended, leading to higher rates of incorrect responses. When a problem is presented in the multiple-choice format, the findings discussed below show that students do not necessarily think through the problem first. Instead, they make their choices based on generalizations, often incorrect ones, they have made about problem solving. Moreover, children draw on their life experiences when the context of the problem is salient to them, which may also lead to incorrect choices. This result is not surprising, however. If we assume that students’ life experiences play a role in their problem solving, then we should expect students to generate a diverse array of solutions based on knowledge they use to make sense of the problem.

2. Method

The study draws on audio-taped clinical interview data involving 90 fourth-graders (48 girls, 42 boys). Data were collected in April 2000, several weeks before the administration of the state assessment. Children were selected from five schools across 12 classrooms. In each classroom, up to five boys and five girls were randomly selected from students who turned in both student and parent consent. The schools were selected based on their interest in participating in the study. The schools had diverse student bodies, varying levels of curriculum innovation, and average or below average performance on the state assessments. The sample was ethnically diverse (14% African American, 26% Asian, 12% Latino, 4% Native American, 43% White). Approximately 60% of the students at four of the five schools were on free or reduced lunch. Between 8 and 21% of fourth-graders at those four schools met or exceeded the standard for passing the mathematics portion of the state assessment the year prior to the study. At the fifth school, about a third of the students were on free or reduced lunch and 50% of the fourth-graders had met or exceeded the standard for passing the mathematics portion of the state assessment the year prior to the study.

I examined the multiple-choice mathematics portion of a fourth-grade state assessment, drawing from sample items that were widely distributed to teachers for use in helping prepare their students. The particular state assessment was chosen because of its similarity to other state and national tests. Like other tests, the mathematics portion has three types of items: multiple choice, short response, and extended response. The items measure achievement in both content strands (number sense, measurement, geometric sense, probability, statistics, and algebraic sense) and process strands (problem solving, logical reasoning, communication, and connections).

I selected four items, listed in Table 1, that I expected would elicit a diverse range of student interpretation. Each problem was selected for a different reason. In the first two problems, students were asked to read a word problem and select some aspect of the problem-solving process: an appropriate number sentence or the best first step. The wording of those two problems does not explicitly direct students to solve the problems first. Thus, the first two problems were selected in order to test the hypothesis that students would choose a solution based on their attempts to match clues in the problem with the set of choices


rather than based on their attempts to actually solve the problem. The third problem was selected in order to further explore the idea that students’ everyday experiences can significantly influence their use of mathematical reasoning to solve problems. It is a question that is meant to assess students’ understanding of probability, yet the salience for children of finding hidden toys in cakes could draw their attention away from strict probabilistic interpretations. Finally, the fourth problem was chosen to test the influence of a long example problem on students’ responses. Each problem required problem solving and reasoning skills. For three of the four multiple-choice items, a similar open-ended problem was posed to explore student responses in the absence of a fixed set of choices. The open-ended versions were posed in order to explore students’ reasoning. They were not posed in order to argue that open-ended questions are better forms of assessment than multiple-choice questions. The items were counterbalanced on two different forms of the interview. Half of the students were given Form A and thus solved the multiple-choice items before the open-ended items. The other half were given Form B and thus solved the open-ended items before the multiple choice.

The problems were posed individually to students in one session, typically lasting 20–30 min. The students recorded their solutions on paper. Each problem appeared on its own piece of paper. The students did not look through the problems at the beginning of the session. Instead, they read and solved each problem one at a time. The author and two graduate students trained in eliciting children’s thinking conducted the interviews. At the beginning of the interview, the interviewer explained that she was interested in how students solved the mathematical problems. The interviewer encouraged each student to either speak aloud while solving the problem or to explain the solution verbally once it was recorded on paper. When students were given the multiple-choice items, they were asked to explain why they chose a particular answer. Students were also asked to explain why they did not pick other choices. The interviewer used probes to clarify students’ explanations with questions such as, “Can you explain how you figured that out?”; “Why did you decide to ___?”; “How did you know ___?”; “You said ___. Can you tell me a little more about that?”; “Tell me what you mean by ___?” The children’s own words were often used in probes in order to encourage them to be more explicit about their thinking. On open-ended items, children were asked to explain their solution and reasoning using clarifying questions. Students were not told whether their answers were correct or not.

Audiotapes of the interviews were transcribed and entered into a database for easy retrieval, coding, and sorting. For each problem, the range of students’ strategies and explanations was categorized. The author and a graduate student coded each explanation in several cycles, refining codes in order to capture the range of solutions observed in the sample. In each cycle, categories were further defined or elaborated until complete agreement was reached.

3. Results

3.1. Finding the appropriate number sentence

Multiple-choice item: Juan and Bill worked together to unload bags of food from a van. On each trip, Juan carried 6 bags and Bill carried 4 bags. They each made a total of 3 trips. Which number sentence would you use to find how many bags they unloaded in all?

(a) 3 × 6 × 4 = □ (b) (3 × 6) + 4 = □ (c) (3 × 6) + (3 × 4) = □


Table 2
Number of students choosing each response on the number sentence item
Juan and Bill worked together to unload bags of food from a van. On each trip, Juan carried 6 bags and Bill carried 4 bags. They each made a total of 3 trips. Which number sentence would you use to find how many bags they unloaded in all?

                               Form A                    Form B                 Total
                               (multiple-choice first)   (open-ended first)
(a) 3 × 6 × 4 = □                16                         9                    25
(b) (3 × 6) + 4 = □              11                         4                    15
(c) (3 × 6) + (3 × 4) = □        17                        31                    48
No answer                         1                         1                     2
Total                            45                        45                    90

Open-ended item: Thomas and Joelle were moving books to a new shelf. Thomas carried 3 books at a time. Joelle carried 5 books at a time. They each made 4 trips to the bookshelf. Write a number sentence to find how many books they moved in all.

I selected this problem because I expected students to have difficulty selecting the appropriate number sentence before they solved the problem. The presentation of the problem does not necessarily suggest to students to solve the problem first. By not solving the problem, I hypothesized that students would use other clues to pick an appropriate number sentence. By seeing a list of three number sentences, my conjecture was that students would try to find a match between key words in the problem and particular symbols in the number sentences. In the absence of a fixed set of choices, I expected students to think through the problem first and then write a number sentence that closely modeled how they had solved the problem. The results shown in Table 2 confirmed those conjectures.

Of the students who received Form A of the interview, meaning they had solved the multiple-choice question first, 17 out of 45 or 38% of them chose the correct response. On Form B, when students had solved an open-ended question first, 31 out of 45 or 69% of the students chose the correct response. A chi-square test for independence revealed a significant relation between choosing a correct response and the form taken [χ²(1) = 9.75, P < .01]. Posing a similar open-ended problem first almost doubled the number of correct responses.
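As a sketch of how such a test is computed from the counts in Table 2 (correct vs. other response, by form), the code below applies the standard chi-square formula. Note that, depending on how the two non-responses are tabulated, the statistic can differ somewhat from the published value:

```python
# 2x2 chi-square test of independence from Table 2 counts:
# rows are interview forms, columns are correct vs. other responses.
observed = [[17, 28],   # Form A (multiple-choice first): 17 of 45 correct
            [31, 14]]   # Form B (open-ended first): 31 of 45 correct
row_totals = [sum(r) for r in observed]        # [45, 45]
col_totals = [sum(c) for c in zip(*observed)]  # [48, 42]
n = sum(row_totals)                            # 90
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)
print(round(chi2, 2))
```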

Further analyses indicated that students who chose the incorrect number sentences based those decisions on generalizations they made about number sentences. Across both Forms A and B, students’ reasons for selecting choices A or B are summarized in Table 3. There was no clear pattern in the choices Form A or Form B students made. The range of students’ reasons included their beliefs that: (a) the answer to the number sentence (3 × 6) + (3 × 4) would be too big, (b) (3 × 6) + (3 × 4) has too many numbers in it, (c) the number “3” should not appear twice in the number sentence because it only appeared once in the problem, (d) the word problem had three numbers in it, and all three numbers appeared in choices A and B, (e) a number sentence cannot have more than one operation in it, and (f) the problem seemed to be about adding, and multiplication is “a faster way to add,” so choice A must be correct.

The students’ reasoning revealed how they compared the number sentences as they deliberated. The first choice, 3 × 6 × 4, was chosen by 25 out of 90 or 28% of the students. In this case, students appeared to link the idea of making three trips to multiplication, and eliminated the other two choices based on their understanding of what number sentences should look like. Likewise, the students who selected (3 × 6) + 4


made the connection between making three trips with six bags and saw the “4” as just needing to be part of the number sentence. The fact that it was added to the quantity (3 × 6) seemed reasonable to them because they said that both 3 × 6 × 4 and (3 × 6) + (3 × 4) would produce overly high answers. Finally, the expression (3 × 6) + (3 × 4) violated several rules for students — the number “3” appeared twice, and more than one operation was used. It was clear from the interviews that many students did not know what the parentheses designated, yet students who correctly answered the question reasoned that the parentheses divided up or separated the number sentence into two parts.

I did not expect the Form A students, who saw the multiple-choice problem first, to necessarily think through the problem. The Form A students who did choose the correct response, C, might have been puzzled by its appearance, but their verbal explanations show how they thought through the context of the problem. Below are some typical responses:

I think it is C. (So can you tell me how you decided which answer to choose?) Well, they each made 3 trips and Juan carried 6 bags in each trip and that would be 3 × 6 — that part there. And Bill carried 4 bags each trip, so that would be 3 × 4 and then you need to add them to find the total. (OK, so can you tell me why you didn’t pick A?) You don’t want to do 6 × 4 because the number of bags they carried each doesn’t have anything to do with each other. Bill could have done it by himself, but he would have had to do 6 trips. (And do you know what the parentheses mean?) It means you have to do that problem first. So you would say 18 + 12.

[Student works out answer to each choice.] (Did you read the problem first?) Yes. (OK. You picked C? OK, so tell me a little bit about what you were thinking and doing.) I read the problem and answered all of these [number sentences] and when I did all these I did 6 + 6 + 6 and 4 + 4 + 4. The sixes equaled 18 and the fours equaled 12 and I added them up and they came out to 30. (OK, so you did that first?) No, I answered these problems [all number sentences] first and then I did this, added it. (Then how did you know which one to pick?) Because 3 × 6 × 4 is 72. 18 + 4 is 22. And 6 × 3 plus 3 × 4 equals 30, and that is the right answer.

The first student looked for the number sentence that had the two components she thought needed to be included (3 × 6 and 3 × 4). In contrast, the second student solved the problem first his way, by adding three sixes and three fours. He discovered that the answer was 30. In order to pick the correct option, he solved each expression to see which one produced the same answer of 30.
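The second student's check-each-option strategy can be rendered directly: evaluate every choice and compare it with a total computed independently by repeated addition. A sketch for illustration only:

```python
# Evaluate the three answer choices, then find which one matches the
# total obtained by repeated addition (6 + 6 + 6 and 4 + 4 + 4).
choices = {
    "a": 3 * 6 * 4,          # 72
    "b": (3 * 6) + 4,        # 22
    "c": (3 * 6) + (3 * 4),  # 30
}
total = (6 + 6 + 6) + (4 + 4 + 4)  # 30 bags unloaded in all
matching = [label for label, value in choices.items() if value == total]
print(total, matching)  # 30 ['c']
```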

Form B students who selected the correct answer did recognize that the problem mirrored the one they had just solved. In the following examples, notice that the first two students were puzzled about the way the number sentences were written. The parentheses produced some hesitation, and at first the students selected A because they saw both 3 × 6 and 3 × 4 in the expression 3 × 6 × 4. The third student chose a different approach. He solved each expression and then matched the answer that he expected to the correct one. The following three explanations are representative of the way Form B students explained their answers.

I did mainly the same thing I did the first time, but I did 3 × 6. So 3 × 6 equals the number and then 3 × 4 . . . wait a minute. I don’t want to do that. (You don’t want to do A? How come?) Because I just figured it out. 3 × 6 × 4 — I just figured that out . . . at the start I was confused, so now I know it is C. (Tell me why you think it is C?) Well, 3 × 6 is the first one plus 3 × 4 equals both of their numbers. (Have you seen these marks — the parentheses — before?) Yeah. (What do they mean?) They mean that problem put together. It is like a problem that’s not there. They put it there to make


the sentence a little longer and also so that it makes it a little easier for you to tell. And also they do it for multiplication, they times it and then you know what it adds up to and then you plus the other one. That’s how I usually think of it. (So now you are saying it is not A, it is C?) Yeah. (Because you figured it out. What do you mean by that?) Because first I thought it was 3 × 6, 3 × 4 and at first I didn’t know that. Then I knew it had to be 3 × 6 PLUS 3 × 4.

Like on the last one, we have to multiply. So Juan carried 6 bags and Bill carried 4, so I just multiplied that and so it’s probably the same as this one here. (You first thought it was the same as A?) Because I think you have to multiply first then multiply these two together, but it doesn’t have the little quotes to show you to do that first, like we did this one. (The little quotes, you mean the parentheses?) Oh — it’s this one [points to C]. (You changed your mind. Tell me a little bit about how you decided C was the one you wanted.) Because if we didn’t do it that way, like 3 × 6 would be 18 times 4 again wouldn’t equal whatever it equaled. It would be a lot more. But right here it is easier because 6 × 3 equals the answer right there, but it is telling you to do that first and then add the two answers together first. Exactly like what I did on the other problem. (Oh, so you would multiply 6 × 3 first and then 3 × 4) and then add the answers. (OK, so why wouldn’t it be say, B?) Because you don’t just add 4 books. He didn’t carry them one time, he carried them around three times, so you would have to multiply.

[Student wrote answers in each square.] J carried 6 bags, times 3, which fits this sentence. Equals 18. Bill carried 4 bags, 4 × 3 = 12, which also fits this one, so C must be the answer because A and B don’t have the required needs. So 3 × 6 = 18, 3 × 4 = 12 and 18 + 12 = 30. (You said the other two don’t fit the required needs? Can you say a little more about that?) Well, 3 × 6 = 18 and 18 × 4 = 72 which is way off. So this equals 72, which is way off from 3 × 6 or 3 × 4. And then you do 3 × 6 = 18, plus 4 which would equal 22. (Do you know what these parentheses mean?) They are there because you have to add the answer to the first one plus the answer to the second problem.

Analyses on the open-ended problem revealed that 60% of the students were able to solve the problem. Children came up with five different correct number sentences that corresponded to the way they worked through the problem:

(a) 12 + 20 = 32
(b) 8 × 4 = 32
(c) 5 + 5 + 5 + 5 = 20, 3 + 3 + 3 + 3 = 12, 20 + 12 = 32
(d) 3 + 5 = 8, 8 × 4 = 32
(e) (3 × 4) + (5 × 4) = 32

Only six students came up with a number sentence that mirrored the correct one on the multiple-choice item [(3 × 4) + (5 × 4) = 32]. All six of those students received Form A and thus had seen the model first. Students’ own number sentences explain why some might have been confused by the expression (3 × 6) + (3 × 4). They wrote rather straightforward number sentences — they used one operation at a time and separated out the steps they used to solve the problem.
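Each of the five correct sentences is a different decomposition of the same quantity; all evaluate to 32 books. A quick check, for illustration:

```python
# The five correct number sentences (a)-(e) all reduce to 32 books.
sentences = {
    "a": 12 + 20,                            # totals per child, then add
    "b": 8 * 4,                              # books per joint trip, times 4 trips
    "c": (5 + 5 + 5 + 5) + (3 + 3 + 3 + 3),  # repeated addition: 20 + 12
    "d": (3 + 5) * 4,                        # 3 + 5 = 8, then 8 x 4
    "e": (3 * 4) + (5 * 4),                  # mirrors the multiple-choice form
}
print(sentences)  # every value is 32
```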

It is interesting to note that nine students solved the open-ended problem by first combining the books carried in one trip and then multiplying that number by four trips, thus ending up with the statement


8 × 4 = 32. Seven of those nine students took Form B of the interview and thus solved the open-ended problem first. When five of the Form B students saw the multiple-choice question, they solved the problem and then matched the answer to the number sentence. The other two students chose B, (3 × 6) + 4, because they did not like the way C was written — the number three appeared twice or it seemed like the answer would be too big.

The students who took Form A produced a range of incorrect number sentences (shown in Table 4). They wrote number sentences that reflected part, but not all, of the problem. For example, students who wrote 5 + 3 = 8 said they wrote down how many books were carried altogether, but the number sentence only refers to one trip. Students also recognized that the problem involved multiplication, which explains most of the other number sentences they created (i.e., 3 × 5 = 15, 5 × 4 = 20, 4 × 3 × 6 = 60, 3 × 5 = 15 + 4 = 19, 3 × 4 + 5 = 17).

The majority of Form B students who wrote incorrect number sentences selected the numbers out of the problem and decided on one operation. Thus, they noted the numbers 3, 4, 5, and the words “in all,” which

Table 4
Students’ own number sentences
Thomas and Joelle were moving books to a new shelf. Thomas carried 3 books at a time. Joelle carried 5 books at a time. They each made 4 trips to the bookshelf

Number sentence | Form A (multiple-choice first) | Form B (open-ended first)
Correct number sentences | 22 | 32
3 × 4 = 12; 5 × 4 = 20; 12 + 20 = 32 | 13 | 21
(3 × 4) + (5 × 4) = 32 | 6 | 0
3 + 5 = 8; 8 × 4 = 32 | 2 | 3
8 × 4 = 32 | 0 | 4
5 + 5 + 5 + 5 = 20; 3 + 3 + 3 + 3 = 12; 20 + 12 = 32 | 1 | 2
12 + 20 = 32 | 0 | 2
Nearly correct number sentences | 2 | 2
3 × 4 = 12; 5 × 4 = 25; 12 + 25 = 37 | 0 | 1
3 × 4 = 12 + 20 = 32 | 1 | 1
5 + 3 = 8; 8 × 4 = 40 | 1 | 0
Incorrect number sentences | 19 | 11
5 + 4 + 3 = 12 | 4 | 7
5 + 3 = 8 | 4 | 0
3 × 4 × 5 | 3 | 0
5 × 4 = 20 × 3 = 60 | 1 | 0
(3 × 8) + (3 × 5) | 1 | 0
3 × 4 + 5 = 17 | 1 | 0
5 × 4 = 20 | 1 | 0
3 × 5 = 15 | 1 | 0
5 × 4 = 20; 5 × 3 = 15; 20 × 15 = 100 | 1 | 0
35 × 4 = 140 | 1 | 0
4 × 3 × 6 = 60 | 0 | 1
5 × 3 = 15; 3 × 3 = 9 | 0 | 1
3 × 5 = 15 + 4 = 19 | 0 | 1
4 + 5 = 9 | 0 | 1


signaled an addition problem. A typical explanation was, “Because it says ‘how many they moved in all’ and ‘in all’ means to plus so I added all the books they carried.” The results from this problem both support and contradict earlier reports about students’ problem-solving abilities documented in NAEP. Previous NAEP findings (Carpenter, Corbitt, Kepner, Lindquist, & Reys, 1980; Kouba et al., 1988) led researchers to conclude that students’ poor performance on multi-step problems reflects an over-reliance on choosing the appropriate arithmetic operation after reading the problem. While the results from Form A would support that conclusion, results from Form B contradict it. Students seem to be able to make sense of multi-step problems if they are given a chance to think through the problems first. It does remain true, however, that if students believe that problem solving involves choosing the correct operation, they are likely to incorrectly solve multi-step problems. Kouba et al. (1988) state that problems that ask children to match number sentences to a problem situation are good contexts for assessing children’s understanding of symbolic representations. This study raises questions about how children interpret such formats. If they believe they should just select a number sentence based on clues they find in the word problem rather than first thinking through the relationships in the problem, they may choose the incorrect one.

3.2. Finding a good first step

For both of the following problems, students were shown a graphic of two different bags of chocolate chips. One bag is labeled “Brand X,” which holds 12 ounces and costs $1.20. The other bag is labeled “Brand Y,” which holds 16 ounces and costs $1.75.

Multiple-choice item: Pat finds the two brands below in a store. He wants to buy 30 ounces of chocolate chips for the least amount of money. Which is a good first step he could use to solve this problem? (a) Find the total price of the bags he decides to buy, (b) Find how many of each brand he would need, (c) Find the price-per-ounce for each bag.

Open-ended item: Pat finds the two brands below in a store. He wants to buy 30 ounces of chocolate chips for the least amount of money. What should he buy?

I expected students to respond to this problem similarly to the way they responded to the first problem. This problem asks students to select the first thing they would do to solve the problem. It does not direct students to actually solve the problem. Like many mathematical problems, a variety of valid approaches may exist. The way this problem is written, however, assumes that one way would be best.

Since the problem does not require students to find a solution, they did not actually work out the problem. Twelve of the 45 Form A students (27%) chose the correct response. Twenty-one of the 45 Form B students (47%), who had an opportunity to solve the problem itself first, chose the correct response (see Table 5). A chi-square test for independence revealed a significant relation between getting the answer correct and the form taken [χ²(1) = 4.03, P < .05]. More students were likely to select the best first step correctly when they were given a chance to solve the problem first. What is surprising is that a high percentage of Form B students, 40%, still chose price-per-ounce. To understand their choices, we need to examine how they interpreted the meaning of price-per-ounce and how they actually solved the problem.
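For readers who want to reproduce the independence test, the statistic can be sketched directly from the Table 5 counts (12 of 45 correct on Form A vs. 21 of 45 on Form B). This is an illustrative hand computation of the standard Pearson statistic; without a continuity correction it comes out near 3.9 rather than the 4.03 reported, so the original analysis may have used a slightly different procedure.

```python
# Pearson chi-square for the 2x2 table: interview form x correctness.
# Counts come from the first-step problem: Form A, 12 of 45 correct;
# Form B, 21 of 45 correct.
observed = [[12, 33],   # Form A: correct, incorrect
            [21, 24]]   # Form B: correct, incorrect

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)
print(round(chi_sq, 2))  # ~3.88 without a continuity correction
```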

When students were asked to solve the problem, the most popular solution was to figure out how many bags of each brand were needed to get at least 30 ounces. It takes three bags of Brand X and two bags of Brand Y. Students added 12 three times and 16 twice. Then, they added up the dollar amount. Three


Table 5
Number of students choosing each response on the first-step problem
Pat finds the two brands below in a store. He wants to buy 30 ounces of chocolate chips for the least amount of money. Which is a good first step he could use to solve this problem? (see graphic in Table 1)

Response | Form A (multiple-choice first) | Form B (open-ended first) | Total
Find the total price of the bags he decides to buy | 12 | 6 | 18
Find how many of each brand he would need | 12 | 21 | 33
Find the price-per-ounce for each bag | 21 | 18 | 39
Total | 45 | 45 | 90

bags of Brand X cost $3.60 while two bags of Brand Y cost $3.50. Variations of this strategy included multiplying instead of adding at each step.
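The students' dominant strategy, buy whole bags until you reach 30 ounces and then compare the totals, can be sketched as follows (the brand data are exactly those stated in the item):

```python
import math

# Each brand: (ounces per bag, price per bag), as given in the problem graphic.
brands = {"Brand X": (12, 1.20), "Brand Y": (16, 1.75)}
target_oz = 30

totals = {}
for name, (oz, price) in brands.items():
    bags = math.ceil(target_oz / oz)   # whole bags needed to reach 30 ounces
    totals[name] = (bags, bags * oz, round(bags * price, 2))

print(totals)   # {'Brand X': (3, 36, 3.6), 'Brand Y': (2, 32, 3.5)}
best = min(totals, key=lambda name: totals[name][2])
print(best)     # Brand Y: 32 ounces for $3.50, a dime less than 3 bags of X
```

Note that this sketch encodes the test's intended reading of the problem; as the interviews below show, children often applied a different, equally sensible criterion such as price per ounce or "close enough" quantities.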

Despite the total number of students who selected response “C, find the price-per-ounce,” only one student actually calculated the price-per-ounce when she solved the problem on her own (yet she did not select price-per-ounce as the correct answer in the multiple-choice problem; she chose option A). Students’ choice of price-per-ounce, however, differed based on whether they took Form A or Form B. Twenty-one Form A students selected price-per-ounce because (a) that’s what you do when you have to buy something: “Because sometimes when I go shopping, you look at the price to see which one you should buy. When you find the price per ounce, I think it means what is cheaper.” (9 out of 21) or (b) both ounces and prices have to be considered to solve the problem: “Well, you kinda need to know how much they weigh AND the price before you go get a whole bunch.” (9 out of 21). Three students could not explain why they picked price-per-ounce.

Sixteen of the 18 Form B students chose price-per-ounce not for its literal definition but because they recognized it as the only choice that had both price and ounce in it. They connected price-per-ounce to their own efforts of considering both the price of the bags and their contents at the same time. One girl explained, “. . . so he wants the least amount of money and 30 ounces. So he’d have to look at each brand and see the money and add it up so it’s the least amount. And he wants 30 ounces and see if it adds up to 30 or 32.”

When children solved the open-ended problem, close to 60% chose a solution that made sense given the problem (see Table 6). 36 out of 90 students chose the response the problem intended — two bags of Brand Y. An additional 18 students chose a range of other strategies that made sense given the context of the problem. Ten students chose either 2 bags of Brand X or 3 bags of Brand X. The children who chose Brand X did so because they found it to be the best buy, not necessarily the one that would give Pat 30 ounces for the least amount of money. Notably, six children who picked Brand X said they would only buy two, not three, boxes. When reminded that the problem indicated Pat wanted to buy 30 ounces, one girl said, “I’d ask him, is it okay to get 24 ounces?” She went on to explain that $2.40 is much less than $3.50, which was how much he would have to spend for two bags of Brand Y. She thought that he could get by with 24 ounces. Another boy said that buying 24 ounces would save him a lot of money and would only recommend that he buy two bags of Brand Y “if it was a real emergency, and he really, really needed it.”

Most of the students in this study could reason through this problem, yet their responses to the multiple-choice question alone would have created doubt and alarm about their problem-solving skills.


Table 6
Solutions to the open-ended chocolate chip problem

Solution | Form A (multiple-choice first) | Form B (open-ended first) | Overall
Appropriate reasoning | 26 | 28 | 54
2 bags of Brand Y because $3.50 is less than $3.60 (cost for 3 bags of Brand X) | 18 | 18 | 36
2 bags of Brand X because 24 ounces is close enough to 30 ounces and it would be cheaper than 2 bags of Brand Y | 2 | 4 | 6
3 bags of Brand X because it is a better deal (get 4 more ounces for only 10 more cents) | 3 | 1 | 4
2 bags of Brand Y because 32 ounces is closer than 36 ounces (did not compare prices) | 3 | 5 | 8
Inappropriate reasoning | 19 | 17 | 36
1 bag of Brand X because it costs less than Brand Y | 11 | 8 | 19
1 bag of Brand Y because 16 ounces is closer to 30 than 12 ounces | 2 | 2 | 4
3 bags of Brand X because one bag of Brand X is cheaper than one bag of Brand Y | 2 | 3 | 5
It doesn’t matter; 2 bags of Brand X cost as much as 3 bags of Brand Y (calculation error) | 1 | 0 | 1
1 bag of each brand | 1 | 0 | 1
Could not solve | 2 | 4 | 6

3.3. The best chance

Multiple-choice item: Special cakes are baked for May Day in France. A small toy is dropped into the batter for each cake before baking. Whoever gets the piece of cake with the toy in it is “king” or “queen” for the day. Which cake below would give you the best chance of finding the toy in your piece?

Children are shown the following cakes: (A) rectangle cut into sixths, (B) circle cut into fourths, (C) circle cut into fifths, (D) circle cut into sixths.

The rationale for posing this problem was to further explore the way in which students would use their life experiences to make their choices. An open-ended problem was not posed in this case because I was interested in seeing how the context would influence children’s responses. 58% (52) of the students chose the correct response. 3% (3) and 9% (8) chose responses C and D, respectively. 30% (27) of the students chose A. Since nearly a third of the students selected the first rectangular cake cut into sixths, I explored further their reasons for choosing it.

The children who selected A relied on their experiences eating cake and thought about this problem realistically rather than probabilistically. Their responses fell into three categories. Most students (16 out


of 27) who selected the rectangular cake cut into sixths discussed how and whether the toy would fit into a slice. Children commented about the shape of a toy: “D is in triangle shapes and there aren’t too many toys that could actually fit in that kind of shapes. Usually you see toy boxes that are in rectangles and squares.” Others commented on whether the entire toy would fit into a slice: “I would not want to be that toy if I got my head whomped off by a butcher’s knife.” Ten students thought that it would be easier to find a toy in a bigger piece. They did not explicitly mention whether the toy would fit as other children did. For example, one boy said, “Because you would only get one piece, and so you want the biggest amount of pieces so you can get more chances of getting to the toy.” Finally, one boy considered whether or not you would get a slice. He said, “B doesn’t give you much of a chance. You could be the fifth person in line and the person in front of you would get it. You don’t get much of a chance.” In short, students who chose A responded to the question by thinking realistically about the problem. They were concerned about how toys actually fit into a piece of cake and whether or not they would get a piece of cake.

Children who selected the correct response, the cake cut into fourths, offered a version of one of the following two explanations:

Because (B’s) pieces are bigger than C. And although they are smaller than A and D, only four people can eat it, so there’s less people so there’s more of a chance of you finding it. (So, the pieces in B are bigger than C but smaller than D and A but...) only four people can eat out of it and six out of A and six out of D. (OK. So why is that better?) Because if there is less people, you have a better chance of getting it.

Another student explained it this way:

It would be B. 1 in 4 chances is better than 1 out of 6 chances, with less pieces in it, it would be easier. If it were 1 out of 2, it would be really easy, but since four is the lowest number, it’s going to be the lowest number is the easiest. (Now why do you say that less pieces give you a better chance?) If it was five pieces, someone else might get it, it might be five people getting it. D or A there are six pieces so six people, 1 out of 6 people get it. But if there is [sic] only four pieces, it would kinda be easier since there are four pieces of cake and one of them has the toy.
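The probabilistic reading that these correct explanations gesture at, fewer pieces means a better chance, amounts to comparing 1/n across the four cakes:

```python
from fractions import Fraction

# One toy per cake; with n equal pieces, the chance a given piece holds
# the toy is 1/n. Labels follow the item: A and D are cut into sixths,
# B into fourths, C into fifths.
pieces = {"A": 6, "B": 4, "C": 5, "D": 6}
chances = {label: Fraction(1, n) for label, n in pieces.items()}

best = max(chances, key=chances.get)
print(best, chances[best])  # B 1/4: the fewest pieces gives the best chance
```

The shape and area of the pieces, which dominated the realistic reasoning quoted earlier, simply do not enter this computation.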

Because this question was drawn from a widely circulated sample test, some of the children commented during the interview that they had just done this problem before in class and relied on what they had discussed in class. So the number of correct responses, particularly on this problem, may be inflated.

3.4. What number does c stand for?

Sample problems are typically presented at the beginning of each new section on a test. I selected a problem that included a sample embedded in it in order to explore the influence of the complicated layout on students’ ability to navigate through the problem. The problem linked the equation c + 2 = 5 to a picture of chips and a cup (see Table 1). The sample problem is meant to help students understand that “c” stands for a number. 91% of the students solved the problem correctly, but 40% of them were confused by the picture and had to be told by the interviewer where the real problem was on the page. In contrast, when students were given the problem a + 6 = 14 without an accompanying picture and asked, “What number does the letter ‘a’ stand for?” 98% of them responded correctly. Moreover, it did


not matter which form of the interview they received. They were just as likely to get it correct when they had seen a similar multiple-choice item as when they had first seen the open-ended problem. This is a case in which supporting children’s problem solving through elaborated graphics actually created more confusion.
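Mathematically, the two items reduce to the same one-step computation; the difficulty students had was navigating the layout, not the arithmetic itself:

```python
# Both test items are one-step equations of the form x + addend = total,
# so the unknown is just the difference of the two given numbers.
def solve_for_unknown(addend, total):
    """Return x such that x + addend == total."""
    return total - addend

print(solve_for_unknown(2, 5))    # c + 2 = 5  -> c = 3
print(solve_for_unknown(6, 14))   # a + 6 = 14 -> a = 8
```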

4. Discussion

This study provides further evidence that students’ problem-solving performance is influenced by how they draw on their experiences with mathematics and the real world to interpret and make sense of situations. The results show differences in student reasoning when students were first presented with a multiple-choice problem rather than a similar open-ended problem. The findings suggest that, at least for particular kinds of multiple-choice items, children’s attention may be drawn directly to the choices themselves. This process detrimentally affected students’ responses. Instead of thinking through and solving the problem, then choosing a reasonable answer, high percentages of children who chose the incorrect responses did so because of arbitrary rules they evoked about number sentences (in the case of one problem) or about typical solutions (in the case of another problem). Students’ responses also provide further evidence that their interpretation of the problem situation provides significant insight into their solutions, whether correct or not. The findings, then, raise questions about children’s mathematical understanding, testing, and instruction that merit further investigation.

4.1. Mathematical implications

Previous studies have documented the number of problem-solving strategies children use when they encounter a word problem, strategies that they have developed from their experience with school mathematics (Sowder, 1989). These strategies include picking the numbers and choosing an operation, looking for key words, etc. — ones that were also prevalent in this study. Some of these strategies are the result of “shortcuts” students are taught that will work most of the time with routine school problems. This study provides further evidence that teaching such shortcuts undermines students’ mathematical understanding. In both the number sentence problem and the chocolate chip problem, some students ignored the problem situation and picked out the numbers and looked for key words that signaled to them the use of particular operations. Some students picked the answer that seemed the most mathematically complex. What is striking is that students relied more on those shortcuts when they were asked to choose from a set of answers. When they were asked to solve a problem that did not have a set of multiple choices, they did not rely as much on their knowledge of shortcuts to generate a solution.

This study also cautions against exposing students to a narrow range of symbolic representations. For example, before students thought through the number sentence problem, they relied on implicit generalizations about what number sentences typically look like. These implicit and overly constricted conceptions can be generated even as a result of “good teaching.” For example, Schoenfeld (1988) demonstrated that as a result of highly structured mathematics classrooms, students had developed the belief that problems should take less than 2 min to solve and that if they could not solve a problem in 10 min, then they were not capable of solving it at all. In this study, students claimed that number sentences should contain only one operation, and that there should be a one-to-one correspondence between the numbers in the word problem and the number sentence.


Much of the work in mathematics reform in the last decade and a half has promoted the use of authentic problem-solving contexts. While this call has sometimes been interpreted too narrowly to simply mean that children should solve more word problems, this study does raise issues about how we can assess children’s mathematical understandings when the context of the problem is a salient aspect of choosing a solution (see also Santel-Parke & Cai, 1997). The two examples just shown, the chocolate chip problem and the cake problem, point to the rich interpretations students make. These efforts will go unrecognized when problems that have the potential for multiple interpretations are presented as if they only have one correct solution. If we are to promote more authentic problem-solving in school, then we must take into account real ways in which students will interpret the contexts of mathematical problems. There is a clear tension between doing school mathematics, understanding assumptions implicit in test questions, and thinking flexibly about a problem situation (Lave, 1993).

4.2. Assessment development

In a recent article on science assessments, Solano-Flores and Nelson-Barber (2001) argue that assessment developers need to “identify subtle, important ways in which sociocultural influences and interactions determine student perceptions of what science items are about, what they believe they are expected to do, and what problem solution strategies they use to solve them.” This study demonstrates the importance of heeding that call. On one level, it raises questions about validity. I chose questions that I thought students would answer incorrectly at higher rates because of the way they were worded or structured. I suspected that more students would choose the correct number sentence or find a good first step once they had thought through the problem. I expected that students would not automatically solve the problem since the question does not explicitly suggest doing so. A vital question to raise is to what extent testmakers understand and develop test items based on understanding how children will interpret the problem situation and its presentation. It is clear in some descriptions of assessment development (e.g., Lane, 1993; Van den Heuvel-Panhuizen, 1994; Wiliam, 1998) that students’ reasoning is taken into account when developing items. Such discussions are often embedded in attempts to create open-ended items in which the context may be highly salient in how the students approach the problem. The same level of concern, I argue, needs to be directed towards multiple-choice items, which have the potential of being viewed as more straightforward and less susceptible to varying interpretation.

Adults, by virtue of their experiences with tests, have tacit knowledge about how tests work. When we talk to young children, we realize the number of assumptions one has to make in order to solve particular problems. For example, the chocolate chip problem clearly states that Pat wants to buy 30 ounces of chocolate chips. Yet nearly half of the children ignored that fact because purchasing 36 ounces was a better buy than getting 32 ounces (one could buy 4 more ounces for only 10 more cents). Their reasoning was sound since in many shopping situations, we are concerned with getting the most for our money. Other students chose to buy less than the required 30 ounces because using fewer chocolate chips is often an option when baking. Yet if we believe that students should also learn how to follow the parameters stated by the problem (at least in testing situations), they need to understand that when the problem stated Pat wanted to buy 30 ounces for the least amount of money, buying more for one’s money or buying less than 30 ounces are not options. Similarly, in the problem with the embedded sample, 40% of these 9- and 10-year-olds were confused when they saw the answer to a problem in the test. Example problems are a common part of tests, but not all children expect them to be embedded in the middle of particular items.


4.3. Test preparation and policy

Performing well on tests requires a certain level of understanding about the way tests work. It is reasonable to expect that if we are to use tests to measure achievement, teachers will need to help students understand what is being asked of them. How can teachers take a stance of learning more about student thinking as a method of supporting students’ performance?

We cannot deny that students’ performance is important in the current discourse about school improvement. Shepard (2000), in a recent article about assessment and learning, stated, “it is important to recognize the pervasive negative effects of accountability tests and the extent to which externally imposed testing programs prevent and drive out thoughtful classroom practices” (p. 9). If teachers turn their anxiety into hours of drill and practice and sacrifice substantive mathematical investigations, how will the testing system have served children’s learning? Teachers are also frustrated that after spending days testing children, the only feedback they receive is summary scores. While they may be able to see that their students did better on number sense than probability, they have little sense of what kinds of problems were more difficult for students, how students solved those problems, or why they chose the answers they did. I offer two responses to this problem, drawing from the results of this study. First, when teachers practice problems with their students in the classroom, they can develop substantial insight into how children are interpreting test items if they elicit their children’s thinking, much like this study did, rather than just identify problems that students are missing. It is important to develop deeper understandings of why students are choosing both correct and incorrect answers:

Sometimes an incorrect answer shows that a student has insight, and sometimes question marks must be put to a correct answer. Furthermore, one must be aware of the fact that a new, unexpected interpretation of the problem can be elicited while analyzing the student’s work. (Van den Heuvel-Panhuizen, 1994, pp. 359–360)

Second, while it is costly to return children’s test booklets to teachers for them to examine their work, keeping entire tests locked up does not allow teachers and researchers to investigate and understand students’ performance. Some portion of the tests should be released. While it is more efficient and cost-effective to base decisions about the quality of education at a school on the rise and fall of test scores, this study also underscores the importance of not using any single measure of achievement as the sole determinant of understanding or quality of instruction.

5. Conclusion

Assessment can be a source of insight into learning (Shepard, 2000), but that requires that we pay close attention to children’s thinking. While researchers have argued that we need more complex forms of assessments to better gauge the complexity of student thinking (Glaser & Silver, 1994; Resnick & Resnick, 1992), this study, using straightforward problems, raises important implications for teachers, policymakers, and test developers. For teachers, this study speaks to the importance of listening to how children understand and interpret mathematical problems. The findings underscore the importance of analyzing and learning from students’ work. Teachers can help prepare students for tests by surfacing the assumptions and understandings that students bring to the test-taking situation. Teachers should discuss with children how language used in test items may differ from their everyday problem solving. For


policymakers and test developers, this study raises significant questions about what students’ achievement scores reveal about their mathematical understanding and achievement. The results show that we must investigate more deeply how children use the knowledge they have developed about mathematics, about problem solving, and about test-taking as they work on problems that are meant to measure their mathematical abilities.

Silver and Kilpatrick (1989) stated that testing has two functions: to inform instructional decision making and to signify educational values. While our current testing practices do exude values, we have not yet devised testing structures that would inform instructional decision making, leaving us in the position to concur with observations Silver and Kilpatrick (1989) made over a decade ago: “we can think of no large-scale testing effort that is designed to provide the kind of information that would be useful for instructional decision making in general, let alone about problem solving. How is it that such an important function of testing is going unmet?” (p. 179).

Acknowledgments

The author wishes to thank Jennifer Crane and Maureen Doyle for their assistance collecting and managing the data. The insightful comments of several anonymous reviewers as well as Megan Franke, Leslie Herrenkohl, and Deborah Stipek were instrumental in the writing of this article.

References

Ball, D. L., & Bass, H. (2000). Making believe: the collective construction of public mathematical knowledge in the elementary classroom. In: D. Phillips (Ed.), Yearbook of the national society for the study of education (pp. 193–224). Chicago: University of Chicago Press.

Cai, J., & Silver, E. A. (1995). Solution processes and interpretations of solutions in solving a division-with-remainder story problem: do Chinese and US students have similar difficulties? Journal for Research in Mathematics Education, 26, 491–497.

Carpenter, T. P., Corbitt, M. K., Kepner, H. S., Lindquist, M. M., & Reys, R. E. (1980). Solving verbal problems: results and implications from National Assessment. Arithmetic Teacher, 28(1), 8–12.

Carpenter, T. P., Fennema, E., Franke, M. L., Levi, L., & Empson, S. B. (1999). Children’s mathematics: cognitively guided instruction. Portsmouth, NH: Heinemann.

Cobb, P., Boufi, A., McClain, K., & Whitenack, J. (1997). Reflective discourse and collective reflection. Journal for Research in Mathematics Education, 28, 258–277.

Cooper, B., & Dunne, M. (2000). Assessing children’s mathematical knowledge: social class, sex, and problem-solving. Buckingham, UK: Open University Press.

Franke, M. L., & Kazemi, E. (2001). Teaching as learning within a community of practice: characterizing generative growth. In: T. Wood, B. S. Nelson, & J. Warfield (Eds.), Beyond classical pedagogy: teaching elementary school mathematics (pp. 47–74). Mahwah, NJ: Lawrence Erlbaum.

Glaser, R., & Silver, E. (1994). Assessment, testing, and instruction: retrospect and prospect. In: L. Darling-Hammond (Ed.), Review of research in education (Vol. 20, pp. 393–419). Washington, DC: American Educational Research Association.

Kazemi, E., & Franke, M. L. (2000, April). Teacher learning in mathematics classrooms: a community of practice perspective. Paper presented at the annual meeting of the American Educational Research Association, New Orleans.

Kouba, V. L., Brown, C. A., Carpenter, T. P., Lindquist, M. M., Silver, E. A., & Swafford, J. O. (1988). Results of the fourth NAEP assessment of mathematics: number, operations, and word problems. Arithmetic Teacher, 35(8), 14–19.

Ladson-Billings, G. (1995). Making mathematics meaningful in multicultural contexts. In: W. Secada, E. Fennema, & L. B. Adajian (Eds.), New directions for equity in mathematics education (pp. 126–145). Cambridge: Cambridge University Press.


Lampert, M. (1990). When the problem is not the question and the solution is not the answer: mathematical knowing andteaching.American Educational Research Journal, 27, 29–63.

Lane, S. (1993). The conceptual framework for the development of a mathematics performance assessment instrument.Educa-tional Measurement: Issues and Practices, 12, 16–23.

Lane, S., & Silver, E. A. (1995). Equity and validity considerations in the design and implementation of a mathematics perfor-mance assessment: the experience of the QUASAR project. In: M. Nettlees, & A. L. Nettles (Eds.),Equity and excellence ineducational testing and assessment(pp. 185–219). Boston, MA: Kluwer Academic Publishers.

Lave, J. (1988). Cognition in practice: mind, mathematics, and culture in everyday life. New York: Cambridge University Press.

Lave, J. (1993). Word problems: a microcosm of theories of learning. In: P. Light, & G. Butterworth (Eds.), Context and cognition: ways of learning and knowing (pp. 74–92). Hillsdale, NJ: Lawrence Erlbaum.

Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29, 4–16.

Magone, M. E., Cai, J., Silver, E. A., & Wang, N. (1994). Validating the cognitive complexity and content quality of a mathematics performance assessment. International Journal of Educational Research, 21, 317–340.

McNeil, L. M. (2000). Contradictions of school reform: educational costs of standardized testing. New York: Routledge.

Miller-Jones, D. (1989). Culture and testing. American Psychologist, 44, 360–366.

National Assessment of Educational Progress. (1983). The third national mathematics assessment: results, trends, and issues. Denver, CO: Author.

National Research Council. (1999). High stakes: testing for tracking, promotion, and graduation. Washington, DC: National Academy Press.

Resnick, L. B., & Resnick, D. P. (1992). Assessing the thinking curriculum: new tools for educational reform. In: B. R. Gifford, & M. C. O’Connor (Eds.), Changing assessments: alternative views of aptitude, achievement, and instruction (pp. 37–75). Boston: Kluwer Academic Publishers.

Rogoff, B. (1997). Evaluating development in the process of participation: theory, methods, and practice building on each other. In: E. Amsel, & K. A. Renninger (Eds.), Change and development: issues of theory, method, and application (pp. 265–285). Mahwah, NJ: Lawrence Erlbaum.

Romberg, T. A. (Ed.). (1998). Mathematics in context: a connected curriculum for grades 5–8. Chicago: Encyclopedia Britannica Educational Corporation.

Santel-Parke, C., & Cai, J. (1997). Does the task truly measure what was intended? Mathematics Teaching in the Middle School, 3, 74–82.

Saxe, G. (1990). Culture and cognitive development: studies in mathematical understanding. Hillsdale, NJ: Lawrence Erlbaum.

Schoenfeld, A. H. (1988). When good teaching leads to bad results: the disasters of “well taught” mathematics courses. Educational Psychologist, 23, 145–166.

Shepard, L. (2000). The role of assessment in a learning culture. Educational Researcher, 29(7), 4–14.

Silver, E. A., & Kilpatrick, J. (1989). Testing mathematical problem solving. In: R. I. Charles, & E. A. Silver (Eds.), The teaching and assessing of mathematical problem solving (pp. 178–186). Reston, VA: National Council of Teachers of Mathematics.

Silver, E. A., Shapiro, L. J., & Deutsch, A. (1993). Sense-making and the solution of division problems involving remainders: an examination of students’ solution processes and their interpretations of solutions. Journal for Research in Mathematics Education, 24, 117–135.

Silver, E. A., Smith, M. S., & Nelson, B. S. (1995). The QUASAR project: equity concerns meet mathematics education reform in the middle school. In: W. G. Secada, E. Fennema, & L. B. Adajian (Eds.), New directions for equity in mathematics education (pp. 9–56). Cambridge: Cambridge University Press.

Smith, M. L., & Fey, P. (2000). Validity and accountability in high-stakes testing. Journal of Teacher Education, 51, 334–344.

Solano-Flores, G., & Nelson-Barber, S. (2001). On the cultural validity of science assessments. Journal of Research in Science Teaching, 38, 553–573.

Sowder, L. (1989). Choosing operations in solving routine story problems. In: R. I. Charles, & E. A. Silver (Eds.), The teaching and assessing of mathematical problem solving (pp. 148–158). Reston, VA: National Council of Teachers of Mathematics.

Stiff, L. V., & Harvey, W. B. (1988). On the education of black children in mathematics. Journal of Black Studies, 19, 190–203.

Tate, W. (1994). Race, retrenchment, and the reform of school mathematics. Phi Delta Kappan, 75, 477–485.

Van den Heuvel-Panhuizen, M. (1994). Improvement of (didactical) assessment by improvement of problems: an attempt with respect to percentage. Educational Studies in Mathematics, 27, 341–372.

Van den Heuvel-Panhuizen, M., & Gravemeijer, K. (1993). Tests aren’t all bad: an attempt to change the face of written tests in primary school mathematics instruction. In: N. Webb (Ed.), NCTM yearbook: assessment in the mathematics classroom (pp. 54–64). Reston, VA: National Council of Teachers of Mathematics.

Van den Heuvel-Panhuizen, M., Middleton, J. A., & Streefland, L. (1995). Student-generated problems: easy and difficult problems on percentage. For the Learning of Mathematics, 15, 21–27.

Wertsch, J. V. (1991). Voices of the mind: a sociocultural approach to mediated action. Cambridge, MA: Harvard University Press.

Wiliam, D. (1998). What makes an investigation difficult? Journal of Mathematical Behavior, 17, 329–353.