Middle Grades Research Journal, Volume 5(3), 2010, pp. 107–117 ISSN 1937-0814

Copyright © 2010 Information Age Publishing, Inc. All rights of reproduction in any form reserved.

TEACHERS’ CLASSROOM ASSESSMENT PRACTICES

Bruce B. Frey and Vicki L. Schmitt

This study examined classroom assessment practices of 3rd- through 12th-grade teachers in a Midwestern

state. In addition to determining the frequency with which specific assessment item formats were utilized, the

level of use of selected “best practice” approaches to assessment was considered (performance-based assess-

ment, teacher-made tests, and formative assessment). Essays and written assignments were the most common

assessment formats reported. There is substantial use of performance-based assessments across grade levels

and subjects, but traditional paper-and-pencil testing remains the predominant classroom assessment format.

Female teachers choose performance-based assessment more often than male teachers. Performance-based

assessment is used much more frequently by language arts teachers than by those who teach other subjects and

is more common at higher levels than at the elementary level. Though teachers design their own classroom

assessments, they routinely rely on tests or items written by others. Formative assessment is not common, as

only about 12% of assessments do not affect student grades and 3 out of every 4 assessments are administered

after instruction is completed. Implications for promoting quality classroom assessment are discussed.

Decades of research have produced general

recommendations for quality classroom

assessment. Among these recommendations

are that teachers should consider performance-

based assessment, not only traditional paper-

and-pencil testing, design assessments them-

selves which match their objectives and

instructional approaches, and use assessment

which is formative, not just summative. This

study examined classroom assessment prac-

tices of a representative sample of teachers

across a Midwestern state. It reports on the

extent to which these key recommended class-

room assessment approaches are being used

and identifies the characteristics of teachers

who choose them. The frequencies of a wide

variety of specific classroom assessment for-

mats, both traditional and performance-based,

are also reported.

Classroom assessment researchers note that

the “assessments best suited to guide improve-

ments in student learning are the … assess-

ments that teachers administer in their

classrooms” (Guskey, 2003, p. 6) and class-

room teachers routinely construct assessments

in order to measure student progress (Brualdi,

Bruce Frey, 1122 West Campus Road, Room 643, Department of Psychology and Research in Education, Lawrence, KS

66045. Telephone: (785) 864-9706. E-mail: [email protected]

1998). While much educational measurement

research is focused on state administered, stan-

dardized assessments, teachers place the high-

est information value on the tests they have

constructed themselves (Boothroyd, McMor-

ris, & Pruzek, 1992) and classroom assessment

is perhaps the single most common teacher

professional activity, with teachers devoting

approximately 33% of their professional time

assessing students in their classrooms (Stig-

gins, 1991a).

While classroom assessment research has

focused primarily on issues of validity and

reliability of traditional paper-and-pencil test-

ing, during the last two decades, a dramatic

shift has occurred in classroom measurement

with educators becoming increasingly aware

of the need to focus on alternative means of

assessing students that would “directly exam-

ine performance on worthy intellectual tasks”

(Wiggins, 1990, p. 1), validly measure impor-

tant classroom objectives, and use assessment

to promote learning.

Three best practice recommendations, sup-

ported in the research literature, were chosen

as measurable behaviors for study. Perhaps the

most consistently supported “modern”

approach to classroom testing is the use of per-

formance-based measures (Brookhart, 1999;

McMillan, 2001; Mertler, 2003; Popham,

1997, 2005; Stiggins, 1995). Teachers have begun to rely more often on performance assessment techniques, which require observation

and professional judgment to make decisions

regarding student achievement. Educational

researchers also recommend the use of

teacher-made tests. Teachers know the curric-

ulum; they know what was covered and how it

was covered. As a general practice, using

items from test banks which accompany many

textbooks, or entire commercially produced

worksheet assessments, or using tests pro-

duced by others who have taught similar sub-

ject matter in the past, raises validity concerns

(Frey, Petersen, Edwards, Pedrotti, & Peyton,

2005; McMillan, 2001). An additional research-recommended purpose of classroom

assessments is to promote the learning of stu-

dents in the classroom. Classroom assessments

designed to affect learning by providing feed-

back to learners and instructors typically differ

from summative assessments as they are

administered during instruction and do not

affect student grades. This diagnostic tack that

provides feedback to teachers and students is

the philosophical approach of formative

assessment (Boston, 2002) and, when imple-

mented correctly, has been shown to increase

student achievement (Costa & Kallick, 2001;

Earl, 2001). In a review of more than 250 jour-

nal articles and book chapters, Black and Wil-

iam (1998) report that formative assessment

has shown effects which range from .40 to .70

standard deviations on standardized tests.

Classroom measures can provide continuous

feedback for both teacher and students, and

need not only be used when learning has come

to an end (Guskey, 2003). In fact, teachers can

utilize the information they collect from their

assessment of student learning to make adjust-

ments in instruction, and students can use feed-

back from frequent assessments to adjust their

own learning strategies.

Middle schools, in particular, provide a promising environment for innovation, and best practice recommendations specifically aimed at that level include the use of non-traditional or modern methods of classroom assessment and the use of frequent assessment.

Among the emphases recommended by the

Carnegie Corporation and Edna McConnell

Clark Foundation’s National Forum to Accel-

erate Middle Grades (Bottiger, 2009) is the

approach described in the Turning Points

reform model (Jackson & Davis, 2000) which

includes a focus on improving classroom

assessment and linking it to learning and

teaching (Jackson, 2000). Another product of the National Forum was a set of criteria for quality schools, the Schools to Watch criteria.

One key criterion of academic excellence is a

program of assessment which is individualized

and connected to the real world (Bottiger,

2009). In other words, classroom assessment

should be formative and authentic.

METHODS

Participants

Selecting systematically from a comprehen-

sive list provided by the state board of educa-

tion, a cluster sampling method, using school

buildings as a cluster of teachers, was used to

obtain a sample of 140 third- to 12th-grade

teachers representing 22 school districts.

Teachers were solicited through letters included in a packet sent to school principals. Two weeks after mailing, a follow-up reminder postcard was sent to all of the

school principals originally solicited. Because

it is unknown how many of the survey letters

were distributed at the school level and how

many teachers at each school were qualified to

participate in the study, an exact response rate

is difficult to determine. We estimate a

response rate of 17% of teachers, 25% of

schools and 33% of districts. The characteris-

tics of the study sample matched well with

characteristics of the population of the study

state. About half of the teachers taught in a

rural area (49.3%), 29% taught in a suburban

area, while 22% taught in an urban school dis-

trict. Most teachers in the sample were female

(64%), had more than 6 years of teaching

experience (70%) and taught in a public school

(90%). A third of respondents had a graduate

degree (32.9%) and all had a baccalaureate.

All targeted grade levels were represented with

about 39% teaching at the elementary level,

32% at the middle level and 29% at the high

school level.
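The sampling design described at the start of this section (systematic selection from a list, with school buildings serving as clusters of teachers) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the school names, the sampling interval, and the function name `systematic_cluster_sample` are hypothetical, and every teacher in a selected building is treated as eligible.

```python
import random


def systematic_cluster_sample(clusters, interval, seed=None):
    """Pick every `interval`-th cluster from an ordered list, from a random start."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random offset within the first interval
    return clusters[start::interval]


# Hypothetical sampling frame: 40 school buildings, each a cluster of teachers.
schools = [f"School {i:02d}" for i in range(1, 41)]
selected = systematic_cluster_sample(schools, interval=8, seed=1)
print(selected)
```

Because whole buildings are selected, the number of teachers reached depends on how many work in each chosen building, which is one reason an exact teacher-level response rate is hard to state.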

Instrument

The survey was one side of one page and

consisted of four sections:

1. an informed consent paragraph describing

the study and its voluntary nature;

2. a section of definitions asking respon-

dents to use provided definitions for two

key terms in the survey—traditional tests

and performance tests. Different teachers,

textbooks and researchers have differing

understandings of these terms. For the

purposes of this study, teachers were

asked to use the following definitions:

• Traditional tests are paper-and-pencil

tests made up of multiple-choice,

matching, true/false, short answer/fill-

in-the-blank or essay questions. They

are usually designed to measure under-

standing or knowledge. Scoring is

often objective.

• Performance tests, sometimes called

alternative or authentic assessments,

require the performance of a skill or

the production of a product. They are

usually designed to measure skill or

ability. Scoring often requires subjec-

tive judgment.

These definitions consider essay questions

to be traditional tests, but the item type might

reasonably fit in either, both, or neither cate-

gory (Frey & Schmitt, 2007). In actual use,

essay questions might be used to measure

knowledge and be objectively scored, in which

case they might best be considered traditional

paper-and-pencil items. Conversely, essay

questions can be used to measure skill or abil-

ity and be subjectively scored, so they might

be considered performance-based in that case.

For the purposes of this study, a choice was

made and the definition provided to respon-

dents.

3. a set of questions covering six aspects of

classroom testing asking respondents to

estimate the “percentage of the time” they

use various types of assessments (e.g. tra-

ditional, assessment during instruction,

tests which affect grades). Respondents

were encouraged to produce estimates

which summed to 100% within each area.

Instructions indicated that teachers should

only think about their “own testing, not

mandated standardized tests.”

4. a set of demographic questions.


Data Analysis

Descriptive statistics were produced for all

questions to examine the rate at which teachers

choose various classroom assessment prac-

tices. The primary research questions con-

cerned the frequency with which teachers use

tests they have written themselves, use forma-

tive assessment, and use performance-based

assessments. Correlations and analyses of vari-

ance were conducted to examine relationships

between these key variables of interest:

• years of teaching experience;

• gender, grade level, subject taught;

• percentage of time that teachers use perfor-

mance assessment (as opposed to tradi-

tional paper-and-pencil testing);

• percentage of time that teachers use tests

they have written themselves;

• percentage of time that teachers use tests

which affect grades (summative assess-

ment);

• percentage of time that teachers give tests

during instruction (formative assessment);

and

• percentage of time that teachers choose

specific assessment and item formats.

RESULTS

Descriptive results from the survey are shown

in Table 1. Teachers reported that they use

assessments they have entirely made them-

selves only about half the time (M = 55.01%,

SD = 34.08). Only a small proportion of class-

room tests are formative assessments, as the

majority of assessments affect grades (87.68%,

SD = 19.11) and are given after instruction is

over (M = 75.02%, SD = 23.17). Though the

majority of classroom assessment uses a traditional paper-and-pencil format, performance-based tests represent a substantial proportion

of classroom assessment, as about 40% of

assessments are entirely (M = 27.58%, SD =

22.85) or partly (M = 12.27%, SD = 23.82) per-

formance-based.

By combining responses to specific items,

one can infer the overall relative frequency of

a variety of assessment formats. For example, if

traditional assessment is used about 60% of the

time, and about 10% of those assessments are

true-false, then the true-false format is used

about 6% of the time. Table 2 presents these

frequency estimates. Writing assignments are

the most common type of classroom assess-

ment reported, followed by short answer/fill-

in-the-blank and multiple-choice items.
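The arithmetic behind these estimates can be sketched in a few lines of Python. The sketch is an illustration rather than the authors' documented procedure: it takes Table 1 means as inputs and assumes that assessments combining both approaches are counted with the performance-based share (27.58% + 12.27%), an assumption that appears to reproduce several Table 2 values to within rounding. The names `TRADITIONAL_SHARE`, `PERFORMANCE_SHARE`, and `overall_share` are hypothetical.

```python
# Table 1 means: percent of the time each broad approach is used.
TRADITIONAL_SHARE = 60.01
# Assumption: "combined" assessments are grouped with performance tests.
PERFORMANCE_SHARE = 27.58 + 12.27

# Table 1 means for selected formats within each approach (percent).
traditional_formats = {
    "True-false": 9.81,
    "Multiple-choice": 23.97,
    "Short answer/fill-in-the-blank": 27.13,
}
performance_formats = {
    "Group projects": 23.36,
    "Presentations": 19.81,
    "Concept mapping": 5.84,
}


def overall_share(approach_share, format_share):
    """Percent of all assessments: approach share times format share within it."""
    return approach_share * format_share / 100.0


estimates = {}
for name, pct in traditional_formats.items():
    estimates[name] = overall_share(TRADITIONAL_SHARE, pct)
for name, pct in performance_formats.items():
    estimates[name] = overall_share(PERFORMANCE_SHARE, pct)

# Print formats from most to least frequent.
for name, value in sorted(estimates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:32s}{value:6.2f}")
```

A format that appears under both approaches, such as essays, would be the sum of its two products.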

Teaching Experience

and Assessment Practices

No correlation was found between years of

teaching experience and the relative frequency

with which teachers use tests they make them-

selves versus those made by others. There

were also no relationships found between

experience and use of formative assessment or

use of performance-based assessments.

Some traditional assessment format prefer-

ences, however, vary based on years of teach-

ing experience. When teachers choose to

assess with traditional paper-and-pencil meth-

ods, those with more years of experience tend

to avoid the “short answer” format (r = −.30, p

< .001) and are somewhat more likely to use

performance-based assessment, formats other

than conventional multiple-choice, matching,

true-false, and essay (r = .28, p = .002). There

was no relationship found between which

types of performance-based assessment for-

mats teachers preferred and how long they had

been teaching.

Performance-Based Assessment

A two-way analysis of variance was conducted with gender and grade level taught (3rd-5th, 6th-8th, 9th-12th) as the independent variables and the percentage of the time that teachers use performance-based tests as the dependent variable. The gender main effect was significant with a small effect size, F(1, 113) = 4.97, p = .03, partial η² = .04. Neither the grade level taught main effect nor the interaction effect was significant. Figure 1 shows the results. Females use performance-based assessment about 50% more often than males at all grade levels.

TABLE 1. Teachers’ Classroom Assessment Practices

Item                                          Mean (%)      SD

About what percentage of the time do you use:
  • tests you made yourself                      55.01   34.08
  • tests made by others                         35.18   30.94
  • one assessment which combines both            9.24   16.30

Of all your assessments, what percentage are given:
  • during instruction                           24.98   23.17
  • after instruction                            75.02   23.17

Of all your assessments, what percentage:
  • do not affect students’ grades               12.44   19.10
  • affect students’ grades                      87.68   19.11

About what percentage of the time do you use:
  • traditional tests                            60.01   30.17
  • performance tests                            27.58   22.85
  • one assessment which combines both           12.27   23.82

When you use traditional assessments, about what percentage of the time do you use:
  • Multiple-choice                              23.97   21.45
  • Matching                                     14.18   12.20
  • True-false                                    9.81   10.58
  • Short answer/fill-in-the-blank               27.13   24.11
  • Essay                                        14.72   17.94
  • Other formats                                10.59   25.25

When you use performance assessments, about what percentage of the time do you use:
  • Portfolios                                    8.57   16.23
  • Group projects                               23.36   22.44
  • Concept mapping                               5.84    8.27
  • Presentations (e.g. debates, speeches)       19.81   19.12
  • Essays or writing assignments                24.46   22.11
  • Other formats                                17.80   28.79

Note: N = 139.

TABLE 2. Frequency of Classroom Assessment Formats

Assessment Format                         Percentage of All Assessments
Essays or writing assignments                18.57
Short answer/fill-in-the-blank items         16.28
Multiple-choice items                        14.38
Group projects                                9.31
Matching items                                8.50
Presentations (e.g. debates, speeches)        7.89
True-false items                              5.89
Concept mapping                               2.33
Other performance-based formats               7.09
Other traditional formats                     6.36

A second two-way analysis of variance was

conducted with gender and subject taught (ele-

mentary, math, science, social studies and lan-

guage arts) as the independent variables and

the percentage of the time that teachers use

performance-based tests as the dependent vari-

able. Significant differences were found for

subject taught with a moderate to large effect

size, F(4,109) = 3.13, p = .02, partial η² = .10.

Follow-up comparisons found that perfor-

mance-based assessment was used by lan-

guage arts teachers significantly more than

teachers of other subjects. The gender effect

and interaction effects were not significant.

Though language arts teachers are much more

likely to be female than male (five times more

likely in our sample), that may not be the

explanation for the higher frequency of perfor-

mance-based assessment in that topic area. As

can be seen in Figure 2, which presents the

results of this analysis, male teachers in lan-

guage arts reported greater use of performance

assessment than female teachers. It is also not

the presumably more common use of essay

testing in language arts courses that explains

the differences, as in this study, and this analy-

sis, essay tests were categorized as traditional

assessment, not performance assessment.

Teacher-Made Tests

A two-way analysis of variance was con-

ducted with gender and grade level taught as

the independent variables and the percentage

of the time that teachers use tests they have

made themselves as the dependent variable.

Both the gender and grade level main effects

were significant with a moderate effect size for

level, F(2,113) = 4.21, p = .02, partial η² = .07, and a small effect size for gender, F(1,113) = 4.76, p = .03, partial η² = .04. The interaction

term was not significant. Male teachers used

tests they made themselves more often than

female teachers (M = 67.53%, SD = 32.29; M =

50.93%, SD = 32.18). Follow-up comparisons

FIGURE 1. Gender, Grade Level, and Use of Performance-Based Assessment (x-axis: Grade Level Taught, 3-5, 6-8, 9-12; y-axis: Percentage of Time Teachers Use Performance-Based Assessment; series: Females, Males)


found that the grade level main effect was

driven by significant differences between 3rd-

through 5th-grade teachers (M = 37.67%, SD =

28.61) and the other grade levels (6th-8th, M =

59.57%, SD = 31.21, p = .003; 9th-12th, M =

68.31%, SD = 32.10, p < .001).

A one-way analysis of variance was con-

ducted with subject taught as the independent

variable and the percentage of the time that

teachers use tests they have made themselves

as the dependent variable. Significant differ-

ences were found with a large effect size,

F(4,114) = 6.730, p < .001, partial η² = .19.

Descriptive results for the five groups were as

follows: elementary, M = 37.64%, SD = 31.48;

math, M = 56.35%, SD = 35.94; social studies,

M = 62.44%, SD = 29.39; science, M = 70.53%, SD = 30.27; language arts, M = 73.22%, SD =

18.39. Because of significantly different vari-

ances among the groups (Levene’s F(4,114) =

3.68, p = .01), Dunnett’s C, a method which

does not require equality of variance, was used

for follow-up comparisons. This analysis

found that the elementary group used tests they

made themselves significantly less often than

social studies, science or language arts teachers.

Formative Assessment

The survey included two items asking

teachers about assessment practices consistent

with the characteristics of formative assess-

ment. Teachers were asked what percent of the

time they give assessments which do not affect

grades and what percent of the time they assess

during instruction, instead of at the end. Two

two-way analyses of variance were conducted

with gender and grade level taught as the inde-

pendent variables and responses to these two

FIGURE 2. Subject Taught, Gender, and Use of Performance-Based Assessment (x-axis: Subject Taught, language arts, social studies, science, math, elementary; y-axis: Percentage of Time Teachers Use Performance-Based Assessment; series: Females, Males)


items as the dependent variables. Two one-

way analyses of variance on the items were

also conducted with subject taught as the inde-

pendent variable. None of these analyses

resulted in differences significant at less than

the .05 level. However, two analyses identified

differences significant at around the .06 level

with small to moderate effect sizes. Those

results are presented here. The frequency with

which teachers give tests that do not affect stu-

dent grades differed by subject taught,

F(4,113) = 2.32, p = .06, partial η² = .08, and grade level taught, F(2,113) = 2.94, p = .06, partial η² = .05. Follow-up analyses found frequency differences between elementary teachers (M = 18.98%, SD = 22.94) and both math

teachers (M = 5.85%, SD = 9.01, p = .01) and

science teachers (M = 5.94%, SD = 13.69, p =

.02). Differences were also found between

teachers of 3rd-5th grade (M = 19.47%, SD =

22.42) and teachers of 9th-12th grade (M =

6.91%, SD = 13.41, p ≤ .05).

Table 3 presents a summary of comparisons

of assessment practices across gender, subject

and level. Female teachers choose perfor-

mance-based assessment more often than male

teachers and use tests they made themselves

less often. Language arts teachers use perfor-

mance assessment more often than do teachers

of other subjects. Elementary teachers use tests

they construct themselves less often than other

levels and other subjects, and they might be

more likely (p = .06) to design assessments for

use in ways consistent with the goals of forma-

tive assessment.
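The partial η² values reported in this section can be checked against the F statistics. For a single effect, partial η² = SS_effect / (SS_effect + SS_error), which is algebraically equal to (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies that identity to the F values and degrees of freedom given in the text; the helper name `partial_eta_squared` and the row labels are hypothetical, and the check is offered as a reader's consistency test rather than as part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# (label, F, df_effect, df_error, reported partial eta squared), copied from the text.
reported = [
    ("performance-based: gender", 4.97, 1, 113, 0.04),
    ("performance-based: subject", 3.13, 4, 109, 0.10),
    ("teacher-made: grade level", 4.21, 2, 113, 0.07),
    ("teacher-made: gender", 4.76, 1, 113, 0.04),
    ("teacher-made: subject", 6.730, 4, 114, 0.19),
    ("ungraded tests: subject", 2.32, 4, 113, 0.08),
    ("ungraded tests: grade level", 2.94, 2, 113, 0.05),
]

for label, f_value, df1, df2, published in reported:
    computed = partial_eta_squared(f_value, df1, df2)
    print(f"{label:30s} computed = {computed:.3f}   reported = {published:.2f}")
```

For the one-way analysis (teacher-made tests by subject), partial η² and η² coincide because there is only one effect in the model.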

DISCUSSION

A quarter century ago, Gullickson and col-

leagues (Gullickson, 1985; Gullickson & Ell-

wein, 1985) and others (e.g., Gulliksen, 1985)

urged educational researchers to focus on

improving classroom assessment. The argu-

ment then was that the quality of teacher-made

tests had declined during the early years of the

1980s, probably because the emphasis on

national standardized test formats (i.e., objec-

tively scorable multiple-choice items) led

teachers to shy away from performance-based

assessments and open-ended constructed

response formats. The field called for better

teacher preparation focused on the choices

real-life teachers make, encouraging teachers

to more frequently assess for purposes of pro-

viding feedback on student learning and the

success of teacher instruction and a greater

reliance on assessments tailored by teachers

specifically for their students. The explicit

solution then is what it is today; the measure-

ment community must do a better job of train-

ing teachers.

Following the call for better teacher-made

assessment systems and during that time of

greater researcher scrutiny of teacher-made

assessments, teacher preparation programs did

not improve. Most college programs and state

TABLE 3. Summary of Findings

Dependent Variable                                                      Gender                Grade Level           Subject Taught
Percentage of the time that teachers give performance-based tests       p = .03, η² = .04     n.s.                  p = .02, η² = .10
Percentage of the time teachers give tests they have made themselves    p = .03, η² = .04     p = .02, η² = .07     p < .001, η² = .19
Percentage of the time teachers give tests which do not affect grades   n.s.                  p = .06, η² = .05     p = .06, η² = .08
Percentage of time teachers give tests during instruction               n.s.                  n.s.                  n.s.

Note: n.s. = not significant.


certification guidelines continued to have no

explicit requirement that teachers were even

trained in assessment, there continued to be lit-

tle training after certification, and training

remained focused on large-scale testing and

score interpretation (Boothroyd et al., 1992;

Stiggins, 1991b, 2001, 2002; Trice, 2000;

Wise, Lukin, & Roos, 1991). Most impor-

tantly, today’s emphasis is on federally man-

dated standardized tests to assess broad

achievement areas more than ever before,

much more even than was the case when the

alarm was first sounded in the 1980s.

A study with similar sampling methods

which looked at teachers’ beliefs about the

importance of a range of specific classroom

assessment practices was conducted 25 years

ago (Gullickson, 1985). Gullickson found that

using teacher-made objectively scored tradi-

tional paper-and-pencil tests was the most

common method used across all levels and

subjects taught. The study also found that lan-

guage arts teachers were more likely than sci-

ence or social science teachers to use some

methods which could be considered performance-based assessment: papers/notebooks

and oral reports. This study’s findings are con-

sistent with that snapshot from a generation

ago. Though there is now substantial use of

performance-based assessments across grade

levels and subjects, traditional paper-and-pen-

cil testing remains the predominant classroom

assessment paradigm. Formative assessment is

not common, as only about 12% of assess-

ments do not affect student grades and three

out of every four tests are given after instruc-

tion is completed. Even though teachers fre-

quently design classroom assessments to

measure the effect of their own teaching, they

still rely on tests or items written by others

(presumably from textbooks, worksheets, or

other teachers) about half the time.

Implications

Advocates of the middle school philosophy

often cite the benefits of a student-centered

approach in instructional strategies and assess-

ment approaches for young adolescents in pro-

moting a quality educational experience

(Jackson & Davis, 2000; National Middle

School Association, 2003). Those concerned

with advancing student achievement in middle

grades schools must begin to focus on assess-

ment methods that not only increase student

test scores on standardized measures, but that

also increase the learning taking place in the

classroom. There is some evidence to suggest

that classroom assessment environments in

middle school can be fertile ground for

increasing academic performance. Brookhart,

Walsh, and Zientarski (2006), for example,

found that “classroom assessment environ-

ments were characterized by student percep-

tions of the importance and value of

assessment tasks, perceived self-efficacy and

mastery goal orientations” (p. 151) in their

sampling of middle school science and

social studies classrooms.

Research suggests that middle grades stu-

dents learn through meaningful, hands-on

experiences in the classroom that are collaborative in nature and involve students in the decision-making process (Eggen & Kauchak,

2001; Messick & Reynolds, 1992). Some even

go so far as to say that “young adolescents who

are enrolled in middle schools that have faith-

fully followed the middle school model score

highest on high stakes standardized tests”

(McEwin, Dickinson, & Jenkins, 2003, p. 67).

Quality classroom assessment in middle grades schools has the potential to improve

learning if teachers (1) focus on the quality of

their assessments, (2) provide feedback to stu-

dents, and (3) involve students in the assess-

ment process (Stiggins, 2002). This “student

involved” approach to classroom assessment

requires that middle grades teachers make a

concerted effort to engage in a formative

assessment approach.

Investigations into the time teachers spend

developing their own assessments, the types of

assessments they create and the purpose for

which they use the information collected con-

tinue to be important. Better understanding of

the choices teachers make when testing stu-


dents can be of great benefit to those con-

cerned with improving the quality of

classroom assessment. A generation after the

call for improved classroom assessment prac-

tices, in a continuing age of accountability,

when the research focus is overwhelmingly on

large-scale test development with little empha-

sis on assisting teachers in developing high

quality classroom measures, the typical assess-

ment system used by teachers in actual prac-

tice continues to be out of balance.

REFERENCES

Black, P. & Wiliam, D. (1998, October). Inside the

black box. Phi Delta Kappan, 139-144.

Boothroyd, R. A., McMorris, R. F., & Pruzek, R. M.

(1992, April). What do teachers know about

measurement and how did they find out? Paper

presented at the annual meeting of the National

Council on Measurement in Education, San

Francisco, CA. (ERIC Document Reproduction

Service No. ED351309)

Boston, C. (2002). The concept of formative assess-

ment. Clearinghouse on Assessment and Evalua-

tion Services. (ERIC Document Reproduction

Services No. ED470206)

Bottiger, L. (2009). The middle school experience:

A Latina perspective. Unpublished doctoral dis-

sertation, University of Kansas, Lawrence.

Brookhart, S. M. (1999). The art and science of

classroom assessment: The missing part of peda-

gogy. ASHE-ERIC Higher Education Report,

27(1). Washington, DC: George Washington

University.

Brookhart, S. M., Walsh, J., & Zientarski, W. A.

(2006). The dynamics of motivation and effort

for classroom assessments in middle school sci-

ence and social studies. Applied Measurement in

Education, 19(2), 151-184.

Brualdi, A. (1998). Implementing performance

assessment in the classroom. Practical Assess-

ment, Research and Evaluation, 6(2). Retrieved

from http://PAREonline.net/getvn.asp?v=6&n=2

Costa, A. L., & Kallick, B. (2001). Assessment

strategies for self-directed learning. Thousand

Oaks, CA: Sage.

Earl, L. M. (2001). Assessment as learning. Thou-

sand Oaks, CA: Sage.

Eggen, P., & Kauchak, D. (2001). Educational psy-

chology: Windows on classrooms. Upper Saddle

River, NJ: Merrill Prentice Hall.

Frey, B. B., Petersen, S. E., Edwards, L. M.,

Pedrotti, J. T., & Peyton, V. (2005). Item-writing

rules: Collective wisdom. Teaching and Teacher

Education, 21, 357-364.

Frey, B. B., & Schmitt, V. L. (2007). Coming to

terms with classroom assessment. Journal of

Advanced Academics, 18(3), 402-423.

Gullickson, A. R. (1985). Student evaluation tech-

niques and their relationship to grade and curric-

ulum. Journal of Educational Research, 79(2),

96-100.

Gullickson, A. R., & Ellwein, M. C. (1985). Post

hoc analysis of teacher-made tests: The good-

ness-of-fit between prescription and practice.

Educational Measurement: Issues and Practice,

4(1), 15-18.

Gulliksen, H. (1985). Creating better classroom

tests. (ERIC Document Reproduction Services

No. ED268149)

Guskey, T. R. (2003). How classroom assessments

improve learning. Educational Leadership,

60(5), 6-11.

Jackson, A. W. (2000). Turning points 2000: A look

at adolescence. Weekly Report, 2, 81.

Jackson, A. W., & Davis, G. A. (2000). Turning

points 2000: Educating adolescents in the 21st

century. New York, NY: Teachers College

Press.

McEwin, C., Dickinson, T., & Jenkins, D. (2003).

America’s middle schools in the new century:

Status and progress. Westerville, OH: National

Middle School Association.

McMillan, J. H. (2001). Essential assessment con-

cepts for teachers and administrators. Thousand

Oaks, CA: Sage.

Mertler, C. A. (2003). Classroom assessment: A

practical guide for educators. Los Angeles, CA:

Pyrczak.

Messick, R. G., & Reynolds, K. E. (1992). Middle

school curriculum in action. White Plains, NY:

Longman.

National Middle School Association. (2003). This

we believe: Successful schools for young adoles-

cents. Westerville, OH: Author.

Popham, W. J. (1997). What’s wrong—and what’s

right—with rubrics. Educational Leadership,

55(2), 72-75.

Popham, W. J. (2005). Classroom assessment:

What teachers need to know (4th ed.). Boston,

MA: Allyn & Bacon.


Stiggins, R. J. (1991a, March). Assessment literacy.

Phi Delta Kappan, 72(7), 534-539.

Stiggins, R. J. (1991b). Relevant classroom assessment training for teachers. Educational Measurement: Issues and Practice, 10, 7-12.

Stiggins, R. J. (1995). Sound performance assess-

ment in the guidance context. Clearinghouse on

Counseling and Student Services. (ERIC Docu-

ment Reproduction Service No. ED388889)

Stiggins, R. J. (2001). The unfulfilled promise of

classroom assessment. Educational Measure-

ment: Issues and Practice, 20(3), 5-15.

Stiggins, R. J. (2002). Assessment crisis: The

absence of assessment for learning. Phi Delta

Kappan, 83(102), 78-83.

Trice, A. D. (2000). A handbook of classroom

assessment. New York, NY: Addison Wesley

Longman.

Wiggins, G. (1990). The case for authentic assess-

ment. ERIC Digest. (ERIC Document Repro-

duction Service No. ED328611)

Wise, S. L., Lukin, L. E., & Roos, L. L. (1991).

Teacher beliefs about training in testing and

measurement. Journal of Teacher Education, 42,

37-42.
