
Elementary Teachers' Classroom Assessment and Grading Practices




The Journal of Educational Research. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/vjer20

Elementary Teachers' Classroom Assessment and Grading Practices. James H. McMillan, Steve Myran & Daryl Workman, Virginia Commonwealth University. Published online: 02 Apr 2010.

To cite this article: James H. McMillan, Steve Myran & Daryl Workman (2002) Elementary Teachers' Classroom Assessment and Grading Practices, The Journal of Educational Research, 95:4, 203-213, DOI: 10.1080/00220670209596593

To link to this article: http://dx.doi.org/10.1080/00220670209596593



Elementary Teachers' Classroom Assessment and Grading Practices

JAMES H. McMILLAN
STEVE MYRAN
DARYL WORKMAN
Virginia Commonwealth University

ABSTRACT. The authors investigated the assessment and grading practices of over 900 Grades 3-5 teachers representing urban, suburban, and rural schools. Teachers indicated the extent to which they used various factors to grade students, the types of assessments used, the cognitive level of assessments, and the grades awarded. Teachers appeared to conceptualize 6 major factors when they graded students; they placed the greatest weight on academic performance and academic-enabling behaviors, such as effort and improvement, and much less emphasis on homework, comparisons with other students, grade distributions of other teachers, and borderline cases. The teachers used 3 types of assessments: constructed-response, objective, and teacher-made major examinations; they differentiated between recall and higher level cognitive skills. However, there were few relationships between assessment and grade level, subject matter assessed, and grades awarded. Results are discussed in light of other research, indicating that teachers use a "hodgepodge" of factors when assessing and grading students.

Key words: classroom assessment, elementary teachers, grading practices

A significant amount of recent literature has focused on classroom assessment and grading as essential aspects of effective teaching. An increased scrutiny of assessment is evidenced by the popularity of performance assessment and portfolios; newly established national assessment competencies for teachers (American Federation of Teachers, National Council on Measurement in Education, and National Education Association, 1990); and the interplay between learning, motivation, and assessment (Brookhart, 1993, 1994; Tittle, 1994).

Researchers have documented teachers' tendency to award a "hodgepodge grade of attitude, effort, and achievement" (Brookhart, 1991, p. 36; Cross & Frary, 1996), although this conclusion was reached primarily on results of surveys of secondary-level teachers. It is also clear that teachers use a variety of assessment techniques, even if established measurement principles are often violated (Cross & Frary; Frary, Cross, & Weber, 1993; Gullickson, 1993; Plake & Impara, 1993; Stiggins & Conklin, 1992).

In addition, over the last decade, significant emphasis has been placed on using alternative assessments, such as performance assessments and portfolios, rather than on traditional paper-and-pencil assessments.

Given the variety of assessment and grading practices in the field, the increasing importance of alternative assessments, the critical role that each classroom teacher plays in determining assessments and grades, and the trend toward greater accountability of teachers with state assessment approaches that are inconsistent with much of the current literature, one needs to fully understand current assessment and grading practices. Such information is important to understand how classroom assessments are constructed and used in this new climate. The purpose of this investigation was to describe actual classroom assessment and grading practices of upper level elementary teachers, to determine the primary factors used in grading, and to determine whether meaningful relationships exist between the independent variables (grade level and subject taught) and the dependent variables (assessment and grading practices).

The literature tends to separate assessment practices from grading practices. In this review, we first examined assessment practices. Airasian (1984) reviewed literature that suggests that teachers focus their classroom assessments in two areas: academic achievement and social behavior. The importance of these items varies with grade level; elementary teachers place greater importance on social behavior. Fleming and Chambers (1983), in a study that analyzed nearly 400 teacher-developed classroom tests, made the following conclusions about the nature of classroom assessment: (a) Short-answer questions are used most frequently; (b) essay questions, which represent slightly more than 1% of test items, are avoided; (c) matching items are used more than multiple-choice or true-false items; (d) most test questions, approximately 69%, sample knowledge of terms, facts, and rules and principles; and (e) few test items measure student ability to apply what they have learned.

Address correspondence to James H. McMillan, P.O. Box 842020, Department of Educational Studies, Virginia Commonwealth University, Richmond, VA 23284 (E-mail: jmcmillan@edunet.soe.vcu.edu)





Stiggins and Conklin (1992) asked 24 teachers to keep a journal to reflect their assessment practices. The analysis focused on how teachers described their assessments and which specific issues related to their assessments were raised. The researchers found that teachers were most interested in assessing student mastery or achievement and that performance assessment was used frequently. The nature of the assessments used in each class was coupled closely with the roles that each teacher set for her students, teacher expectations, and the type of teacher-student interactions desired.

Marso and Pigge (1993) summarized research that suggests that elementary teachers place more emphasis on students' constructed-response work samples than on traditional paper-and-pencil tests. They also reported that direct observation is used in language arts assessments more than in other subjects and that essay assessments, although infrequent, tend to occur most in language arts, history, and social studies. The vast majority of teachers use several types of assessments, generally placing greatest emphasis on completion and short-answer questions. Stiggins and Bridgeford (1985) also found that elementary teachers tend to stress constructed-response tests rather than objective types of tests, although both are used extensively. Stiggins and Bridgeford reported that the use of teacher-made objective tests is positively related to grade level, that published tests tend to be used more in early grades, and that teacher-made tests are relied on more for mathematics than for English assessments. The authors concluded that "grade level appears to be an important variable in understanding classroom assessment" (p. 281).

In a survey of 143 elementary and secondary school teachers, Cizek, Fitzgerald, and Rachor (1995) found that assessment practices "were highly variable and unpredictable from characteristics such as practice setting, gender, years of experience, grade level or familiarity with assessment policies in their school district" (p. 159). This finding suggests that grade level may not be as important as variations found between individual teachers. Overall, the authors concluded that "many teachers seemed to have individual assessment policies that reflected their own individualistic values and beliefs about teaching" (p. 160). The highly variable nature of assessment practices, as is pointed out in the following paragraphs, is consistent with how teachers grade students. The Cizek et al. study was limited to elementary teachers attending a university measurement course and the use of a limited number of questions that were restricted to 10 factors used in grading and five sources of assessment-related information. Furthermore, respondents simply checked whether each factor or source was used, without any indication of the extent of use.

Plake and Impara (1997) summarized results from a large-scale survey of teachers that was structured to obtain teacher competency concerning assessment practices, by asking teachers to indicate which of several possible answers to assessment questions was best. A national random sample of 555 elementary, middle, and high school teachers was used. Overall mean performance on the survey was 66% correct. Teachers did better on items related to choosing and administering assessments and significantly worse on communicating results. According to the authors, the results "give empirical evidence of the anticipated woefully low levels of assessment competency for teachers" (p. 67). The results also showed that teachers who had taken a measurement course performed better than did teachers who lacked this background.

In summary, the existing literature on elementary classroom assessment practices indicates that teachers probably need further training to improve the quality of the assessments that are used. Whatever the type of question used on assessments, few are written to tap students' higher level thinking skills. Appropriately, teachers appear to use a variety of assessment methods. Particularly absent in the literature, however, are large-scale examinations of relationships between classroom assessment practices and grade level and subject matter. There is some evidence of trends across grade level and subject matter, as in Stiggins and Bridgeford (1985) and Marso and Pigge (1993), but small samples were used in those studies, and the researchers did not focus on differences between different elementary grades.

Teachers' grading practices have received far more attention in the literature than have assessment practices. This fact may be due to the salient and summative nature of grades to students and parents. Grades have important consequences and communicate student progress to parents.

Stiggins, Frisbie, and Griswold (1989) set the stage for recent research on grading by providing an analysis of current grading practices as related to recommendations of measurement specialists and newly established Standards for Teacher Competence in Educational Assessment of Students (American Federation of Teachers, National Council on Measurement in Education, National Education Association, 1990). In this study, the authors interviewed and/or observed 15 teachers on 19 recommendations from the measurement literature. They found that teachers use a wide variety of approaches to grading and that they want their grades to fairly reflect both student effort and achievement, as well as to motivate students. Contrary to recommended practice, Stiggins and colleagues found that teachers value student motivation and effort and set different levels of expectation on the basis of student ability. This finding is consistent with an earlier study by Gullickson (1985), in which elementary teachers indicated that they used nontest information, such as class discussion and student behavior, more than test results to grade students. Gullickson also reported little difference in grading practices between science, social science, and language arts. Given the increased emphasis on school and student accountability due to high-stakes testing, the relative emphasis that teachers give to tests may be increasing.

Brookhart (1994) conducted a comprehensive review of literature on teachers' grading practices. Her review identified 19 studies completed since 1984.




In seven studies, researchers investigated secondary school grading; in 11 studies, both elementary and secondary school grading; and in one study, elementary school teachers. She identified three general methods of study: (a) surveys in which teachers responded to questions concerning components included in grading, grade distributions, and attitudes toward grading issues; (b) surveys in which teachers were asked to respond to grading scenarios, asking what they would do in various circumstances; and (c) qualitative methods, including interviews, observation, and document analysis. Despite methodological and grade-level differences, the findings from these studies are remarkably similar. Taken together, Brookhart came to the following conclusions:

- Teachers inform students of the components used in grading.
- Teachers try hard to be fair in grading.
- Measures of achievement, especially tests, are major contributors to grades.
- Student effort and ability are used widely as components of grades.
- Elementary teachers rely on more informal evidence and observation, whereas secondary teachers use paper-and-pencil achievement tests and other written evidence as major contributors.
- Teachers' grading practices vary considerably from one teacher to another, especially in perceived meaning and purpose of grades, and how nonachievement factors will be considered.
- Teachers' grading practices are not consistent with recommendations of measurement specialists, especially confounding effort with achievement.

In one study, Brookhart (1993) investigated the meaning that teachers give to grades and the extent to which value judgments are used in assigning grades. The results indicated that low-ability students who tried hard would be given a passing grade even if the numerical grade were failure, although working below ability level did not affect the numerical grade. That is, an average or above-average student would get the grade earned, whereas a below-average student would get a break if there were sufficient effort to justify it. Teachers were divided about how to factor in missing work. About half of the teachers indicated that a zero should be given, even if that meant a failure for the semester. The remaining teachers would lower the grade, but not to a failure. The teachers' written comments showed that they strived to be fair to students. Teachers also seemed to indicate that a grade was a form of payment to students for work completed. That is, grades were something that students earned, as compensation for work completed. This finding suggests that teachers, either formally or informally, include conceptions of student effort in assigning grades. Because teachers are concerned with student motivation, self-esteem, and the social consequences of giving grades, using student achievement as the sole criterion for determining grades is rare. This finding is consistent with earlier work by Brookhart (1991), in which she pointed out that grading often consists of a hodgepodge of attitude, effort, and achievement. A limitation of this study is the small sample of elementary teachers (30) and the use of only three nonachievement factors in scenarios that participants responded to (effort/ability, missing work, and improvement). In addition, the teachers in the study were taking a university measurement course, which could result in socially desirable responses or answers that reflect the perspectives of the instructor.

Brookhart’s conclusion concerning the variety of factors that go into grading is consistent with Cizek et al. (1995). Cizek and colleagues also found that teachers generally use a variety of objective and subjective factors to maximize the likelihood that students obtain good grades.

In summary, the literature specific to elementary teachers' assessment and grading practices is limited. There is an indication that teachers believe it is important to combine nonachievement factors, such as effort, ability, and conduct, with student achievement to determine grades; however, most of the studies that provide the basis for this conclusion have been conducted with secondary-level teachers. Although the studies are clear in this conclusion, less is known about how elementary teachers decide to weigh these nonachievement factors in determining grades and whether particular factors tend to be considered together or whether elementary teachers separate nonachievement factors, such as effort and improvement, from achievement. Also, in most of the surveys and other approaches in previous studies, researchers have asked teachers about their beliefs or projected behavior on the basis of scenarios. Actual assessment and grading practice may be different. Few researchers used a scale in which teachers indicated the actual use of different assessment and grading practices that allowed independent recording of the emphasis of each factor, rather than asking teachers to indicate the relative emphasis of each factor by percentage. Finally, there is no research on whether a relationship exists between the types of assessments used and grades received by students.

In the present study, we used a large sample of elementary school teachers to describe assessment and grading practices in a way that builds on and extends previous studies. We addressed four specific research questions, as follows:

1. What is the current state of assessment practice and grading by elementary teachers?

2. What are major assessment and grading components that are used by elementary teachers?

3. What is the relationship between assessment and grading practices and grades given to students?

4. What are the relationships between the independent variables grade level and subject taught (mathematics and language arts) and the dependent variables assessment and grading practices?




Method

Sample

The population included all Grades 3-5 regular elementary teachers in seven urban/metropolitan school districts near Richmond, Virginia (1,561 teachers in 124 schools). Completed surveys were returned by 921 elementary teachers from 105 schools. Twenty of the teachers were not full time in a regular classroom, resulting in a sample of 901. The response rate by school was 88%; by teachers, 58%.

Instrument

The purpose of the questionnaire was to document, using closed-form items, the extent to which teachers emphasized different assessment and grading practices. A 6-point scale, ranging from not at all to completely, was constructed to allow teachers to indicate usage without the constraints of an ipsative scale that is commonly used in this area (e.g., percentage each factor contributes to grades). Also, the questions were worded to emphasize actual teacher behaviors in relation to a specific class of students, rather than more global teacher beliefs. Teachers responded to all items once for language arts and once for mathematics. The stem for the items was "To what extent were final first semester grades of students in your single class described above based on."

The initial set of items was drawn from previous questionnaires that had been reported in the literature, as well as from research on teachers' assessment and grading practices (Brookhart, 1994; Frary, Cross, & Weber, 1993; Stiggins & Conklin, 1992). The items included factors that teachers consider in giving grades, such as student effort, improvement, academic performance, types of assessments used, and the cognitive level of the assessments (e.g., knowledge, application, reasoning). We strengthened content-related evidence for validity for the initial draft of 47 items by asking 15 elementary teachers to review the items for clarity and completeness in covering most, if not all, assessment and grading practices used. Appropriate revisions were made to the items, and a second pilot test with a school division outside of the sample was used to gather additional feedback on clarity, relationships among items, item-response distributions, and reliability. Twenty-three teachers participated in the second pilot test. Item statistics were used to reduce the number of items to 27. Items that showed a very high correlation (r > .90) or minimum variation were eliminated, as well as items that were weak in reliability. We assessed reliability by asking the teachers in the second pilot test to retake the questionnaire following a 4-week interval. The stability estimate was done by examining the percentage of matches for the items. Those items that showed an exact match of less than 60% were deleted or combined with other items. The revised questionnaire included 34 items in three categories (19 items assessing different factors used to determine grades, 11 items assessing different types of assessments used, and 4 items assessing the cognitive level of the assessments). The average exact match for the items was 46% of the teachers; 89% of the matches were within 1 point on the 6-point scale. In additional items, teachers were asked to indicate the approximate grade distribution of the class.
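To make the stability check concrete, here is a minimal Python sketch under stated assumptions: the pilot data are simulated (the actual responses are not available), and the 23 x 34 layout simply mirrors the numbers reported above. It computes per-item exact-match and within-1-point agreement rates and flags items below the 60% threshold.

```python
import numpy as np

# Hypothetical test-retest data for the second pilot: 23 teachers x 34 items,
# responses on the 6-point scale (1 = not at all ... 6 = completely).
# The real pilot responses are not published; this only illustrates the checks.
rng = np.random.default_rng(0)
time1 = rng.integers(1, 7, size=(23, 34))
time2 = np.clip(time1 + rng.integers(-1, 2, size=time1.shape), 1, 6)

exact_match = (time1 == time2).mean(axis=0)         # per-item exact agreement
within_one = (np.abs(time1 - time2) <= 1).mean(axis=0)

# Items with exact-match agreement below 60% would be deleted or combined.
flagged = np.where(exact_match < 0.60)[0]
print(f"average exact match: {exact_match.mean():.0%}")
print(f"average within-1-point agreement: {within_one.mean():.0%}")
print(f"items flagged for revision: {flagged.tolist()}")
```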

Procedure

School division central administrators communicated to teachers that they should complete the questionnaire and that they were responsible for distribution and collection. The questionnaire took about 15 min to complete. Teachers were assured that their responses would be confidential. No information was on the form that could be used to identify the teacher. The surveys were completed in early February, soon after the end of the first semester.

Data Analyses

The data analyses were primarily descriptive; we used frequencies, percentages, means, medians, standard deviations, and graphic presentations to summarize overall findings and trends. We used an exploratory factor analysis to reduce the number of components investigated within each of the three categories of items. Relationships between assessment and grading practices, grades given, grade level, and participants were examined through multiple regression and paired t tests.
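As a rough illustration of the component-extraction step, the sketch below runs an exploratory factor analysis with varimax rotation on simulated data. This is not the authors' code; scikit-learn's FactorAnalysis is one possible tool, and the data, component count, and inspection step are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical response matrix: 901 teachers x 13 retained grading-factor
# items on the 6-point scale. Random data stand in for the survey responses.
rng = np.random.default_rng(1)
X = rng.integers(1, 7, size=(901, 13)).astype(float)

# Exploratory factor analysis with varimax rotation; six components were
# retained for the grading-factor items in the paper.
fa = FactorAnalysis(n_components=6, rotation="varimax", random_state=0)
scores = fa.fit_transform(X)        # component scores per teacher
loadings = fa.components_.T         # item-by-component loading matrix

# Show the items loading most heavily on each component.
for j in range(loadings.shape[1]):
    top_items = np.argsort(-np.abs(loadings[:, j]))[:3] + 1
    print(f"component {j + 1}: items {top_items.tolist()}")
```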

Findings

The descriptive results are presented first, followed by relationships. The assessment and grading practices reported were organized by the three categories of items: factors used in grading, types of assessments used, and cognitive level of assessments.

Descriptive Results

The means and standard deviations for the assessment and grading practices items, divided by categories for both language arts and mathematics, are reported in Table 1. In Table 2, we present the frequency distributions of a few questions to illustrate the spread of responses across the different points in the scale.

The means and standard deviations in Table 1 show that, for this group of teachers as a whole, a few factors contributed very little, if anything, to grades (i.e., disruptive student behavior, grade distributions of other teachers, performance compared with other students, school division policy about the percentage of students who may obtain different grades, and extra credit for nonacademic performance). Also, a few factors clearly contributed most, ranging from quite a bit to extensively: academic performance as opposed to other factors, performance compared with a set scale of percentage correct, and specific learning objectives mastered.




Table 1.-Means and Standard Deviations of All Items Measuring Assessment and Grading Practices for Elementary Teachers (N = 901)

                                                                        Mathematics      Language arts
Variable                                                                 M      SD        M      SD

Factors used in determining grades
  Disruptive student behavior                                           1.37    .77      1.38    .77
  Improvement of performance since the beginning of the year            3.00   1.20      3.07   1.21
  Student effort-how much students tried to learn                       3.21   1.03      3.26   1.02
  Ability levels of students                                            3.39   1.31      3.40   1.29
  Work habits and neatness                                              2.68   1.05      2.80   1.05
  Grade distributions of other teachers                                 1.35    .85      1.33    .81
  Completion of homework (not graded)                                   2.80    .98      2.77    .99
  Quality of completed homework                                         2.69   1.13      2.73   1.14
  Academic performance, as opposed to other factors                     4.40   1.06      4.37   1.07
  Performance compared with other students in the class                 2.00   1.03      2.04   1.03
  Performance compared with a set scale of percentage correct
    (e.g., 86-94%)                                                      4.68   1.03      4.50   1.08
  Performance compared with students from previous years                1.29    .71      1.31    .73
  Specific learning objectives mastered                                 4.53    .92      4.50   1.08
  Formal or informal school or district policy of the percentage of
    students who may obtain A's, B's, C's, D's, F's                     1.50   1.15      1.50   1.14
  The degree to which students pay attention, participate in class,
    or both                                                             3.01   1.07      3.10   1.07
  Inclusion of zeros for incomplete assignments in the determination
    of final percentage correct                                         3.04   1.27      3.07   1.24
  Extra credit for nonacademic performance (e.g., bringing in items
    for food drive)                                                     1.34    .75      1.35    .77
  Extra credit for academic performance                                 2.57   1.10      2.56   1.10
  Effort, improvement, behavior, and other nontest indicators for
    borderline cases                                                    2.99   1.01      3.00   1.00

Types of assessments used in determining grades
  Major examinations                                                    3.21   1.39      3.05   1.38
  Oral presentations                                                    2.37   1.11      3.03    .88
  Objective assessments (e.g., multiple choice, matching,
    short answer)                                                       3.82   1.07      3.75   1.01
  Performance assessments (e.g., structured teacher observations or
    ratings of performance, such as a speech or paper)                  2.84   1.14      3.43    .93
  Assessments provided by publishers or supplied to the teacher
    (e.g., in instructional guides or manuals)                          3.54   1.05      3.22   1.06
  Assessments designed primarily by yourself                            3.63    .95      3.90    .98
  Essay-type questions                                                  2.42   1.15      3.39   1.03
  Projects completed by teams of students                               2.51   1.03      2.91    .99
  Projects completed by individual students                             3.06   1.24      3.59    .96
  Performance on quizzes                                                3.93    .91      3.80    .98
  Authentic assessments (e.g., real world performance tasks)            2.95   1.08      2.89   1.06

Cognitive level of assessments used in determining grades
  Assessments that measure student recall knowledge                     3.65    .90      3.52    .86
  Assessments that measure student understanding                        4.46    .78      4.46    .77
  Assessments that measure how well students apply what they learn      4.31    .84      4.28    .82
  Assessments that measure student reasoning (higher order thinking)    3.99    .87      4.03    .86

Table 2.-Percentages of Elementary Teachers' Responses to Selected Items for Mathematics Assessment Practices and Grading

Question                                                      Not at all  Very little  Some  Quite a bit  Extensively  Completely

Factors contributing to grades
  Improvement of performance since the beginning of the year      13          17         38       21           7            2
  Student effort-how much students tried to learn                  6          14         44       27           7            2
  Ability levels of students                                      10          13         31       24          19            4
  Academic performance compared with other factors                 2           3         12       29          44           10

Types of assessments used
  Objective assessments                                            2           8         28       36          21            5
  Performance assessments                                         14          23         38       17           7            1

Cognitive level of assessments
  Assessments that measure student reasoning                       0           2         25       44          25            4




The remaining factors contributed some, more accurately ranging from very little to quite a bit.

A fairly large standard deviation was reported for those items, showing considerable variation in the extent to which the factors were used for grading. A large percentage of teachers used effort in vastly different ways for grading. This same kind of dispersion of scores was evident in many of the factors. In Table 2, for example, 13% of elementary teachers reported using improvement not at all, whereas 30% of the teachers responded quite a bit, extensively, or completely. The extent to which ability level was used also was highly varied; 23% of the teachers responded not at all or very little, and 47% responded quite a bit, extensively, or completely. Given that the grading scales in the districts used in this study were based on how performance compares with a set scale of percentage correct (e.g., 94-100 A, 86-93 B), we were surprised to find that only 65% of the teachers responded that they used this extensively or completely.

The items in which teachers indicated the types of assessments used show that they did not rely on a single kind of assessment. Rather, many different types of assessments appear to have been used. Although objective assessments were employed most frequently for both mathematics and language arts (means of 3.75 and 3.82, respectively), performance assessments and projects were used almost as much in language arts as objective items (means of 3.43 and 3.59, respectively). Assessments in mathematics included fewer performance assessments and projects (means of 2.84 and 2.51, respectively). There was great reliance on assessments prepared by the teachers themselves, but also considerable use of assessments provided by publishers (language arts means of 3.90 and 3.22, and mathematics means of 3.63 and 3.54). The standard deviations with respect to types of assessments (about 1 point on the scale) pointed to considerable variation.

Cognitive levels of assessments were very similar for mathematics and language arts. The lowest rated assessments, in terms of use, were those that measure student recall knowledge. The highest rated assessment was student understanding, with application and reasoning in between. For the three highest rated items, the means were around 4 on the scale (used quite a bit).

Grades Awarded

The results for percentages of different grades awarded by elementary teachers are presented in Table 3. The table was broken out by grade level and subject matter, as well as by letter grade awarded. Percentages were estimated by teachers and therefore may not sum to 100%.

Grades A, B, and C were most typically awarded by the teachers, comprising more than 70% of the total grades given. Grades D and F comprised less than 10% of total grades given. A grade of B was most typically awarded by teachers, accounting for approximately 32 to 35% of total grades given. Grades A and C were nearly equally distributed, accounting for approximately 40% of a combined total; grades of A comprised approximately 18 to 24% of total grades given, and grades of C comprised approximately 21 to 25% of total grades. Grades of D awarded were between 6 and 8% of the total, whereas grades of F were less than 3% of the total grades given. The relatively large variability of teacher responses was illustrated by the standard deviations.

Table 4 shows a summary of results of another procedure to examine variability, by comparing variability within schools to variability between schools. To calculate the average standard deviation within schools, we used the responses of teachers from the same school to derive a standard deviation score for that school for each item. We then averaged 105 standard deviations, one for each of 105 schools, to result in within-schools variability. We calculated between-schools variability by using the mean for each school, considering that as a single score, and then calculating the standard deviation of the means. The results of these analyses for three items, and the percentage of A's awarded, are summarized in Table 4. In each case, the average variation within schools was greater than the variation between schools. Even though this result was influenced by the relatively low number of teachers in each school, which would increase the variation, it still suggests that teachers in the same school differed more, on average, than did responses compared at the school level.
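The within- versus between-schools comparison described here is easy to express in code. The sketch below uses simulated teacher-level data (school assignments and responses are hypothetical): it averages the per-school standard deviations to get within-schools variability and takes the standard deviation of the per-school means for between-schools variability.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per teacher, with a school id and a
# single item response. Real school assignments and responses are not public.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "school": rng.integers(0, 105, size=901),
    "effort_item": rng.integers(1, 7, size=901).astype(float),
})

grouped = df.groupby("school")["effort_item"]
within = grouped.std(ddof=1).mean()    # average SD of teachers within a school
between = grouped.mean().std(ddof=1)   # SD of the 105 school means

print(f"within-school variation:  {within:.2f}")
print(f"between-school variation: {between:.2f}")
```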

Figure 1 illustrates the distribution across schools of the mean percentage of mathematics A's awarded.

Table 3.-Percentages of Different Semester Grades Awarded by Elementary Teachers (N = 859)

                        A              B              C              D             F
Grade level        Math    LA     Math    LA     Math    LA     Math   LA    Math   LA

3 (n = 294)       23.97  22.56   35.02  34.20   21.33  23.77   6.29  6.24   2.24  1.78
4 (n = 258)       21.39  22.27   35.50  34.73   24.52  24.35   6.67  6.22   2.90  2.33
5 (n = 205)       17.56  20.83   33.32  32.24   23.60  23.90   7.54  7.31   2.62  2.33
Mixed (n = 102)   19.56  18.52   32.18  33.08   21.91  22.48   7.11  6.07   2.54  1.75

Total             20.62  21.05   34.00  33.56   22.84  23.63   6.90  6.46   2.58  2.05

Note. Math = mathematics; LA = language arts. Percentages were estimated by the teachers.




Figure 1 shows that the mean percentage of mathematics A's awarded was 12% or less for 35 elementary schools, whereas for 20 schools, the percentage of mathematics A's awarded was 32%. It also shows a large between-schools variation in the number of A's awarded.

Data Reduction

Before examining the relationships between subject (mathematics compared with language arts) and grade level (Grades 3, 4, and 5), we performed data reduction for each of the major categories of items (factors, types, and cognitive levels) for both mathematics and language arts. The first step in the data reduction was to eliminate items that showed a floor effect with little variability. We used the remaining items in the second step of the data reduction, an exploratory factor analysis, to identify relationships among the items by reducing them to a few relatively independent, but conceptually meaningful, composite variables called components.

Table 4.-Variation Within and Between Elementary Schools for Selected Items (N = 105)

Question                                           Mean variation within   Mean variation between

% A's awarded in mathematics                               16.2                    10.4
Student effort-how much students tried to learn             .92                     .57
Assessments that measure student reasoning                  .81                     .42
Objective assessments                                       .97                     .51

[Figure 1.-Between-School Variability in Mean Percentage of Math A Grades. Histogram of the 105 schools by mean percentage of mathematics A grades awarded; x-axis: Mean % Math A Grades Awarded.]

A varimax rotation was used for the factor analyses.

The factor analysis for items used in grading (factors) resulted in six components. There were no differences between mathematics and language arts. The loadings of different items are summarized in Table 5.

The first component was comprised of three items that emphasized effort, ability, improvement, work habits, attention, and participation. These items could be considered enablers to academic performance, important indicators to teachers to judge the degree to which the student has tried to learn and, by implication, actually learned. A second component was defined by the two items that included questions about homework. The third component was loaded on one item concerning grade distributions of other teachers. The fourth component included three items that focused on academic performance of the student. The fifth component was loaded highly on two items that included comparisons with other students. The sixth component included the suggestion that borderline work and using extra credit are related and distinct from other factors. Thus, there appear to be six conceptually meaningful variables that elementary teachers use when grading students for both language arts and mathematics. These variables include actual performance; effort, ability, and improvement; homework; other teachers' grading; comparisons with other students; and borderline cases. Given the relatively low emphasis on homework, comparisons with other students, other teachers' grading, and the infrequent occurrence of borderline cases, these results suggest that teachers conceptualize two major ingredients: actual performance, and effort, ability, and improvement. Of these two, academic performance clearly is most important, but effort, ability, and improvement remain fairly important, especially for some teachers.

The factor analysis for types of assessments used resulted in three components. The item loadings were, for the most part, the same for both subjects. The first component was comprised of six items for mathematics types and four items for language arts types, each of which described some kind of constructed-response assessment, such as essays (mathematics only), projects, and performance assessments. The second component, made up of either two or three items, included objective assessments, quizzes (language arts only), and assessments provided by publishers. Evidently, items provided by publishers are used in both quizzes and objective assessments. The third component was comprised of two items for mathematics (major examinations and teacher-made tests) and two items for language arts (teacher-made tests and essays). This result suggests that the common element in the third component was teacher made. For mathematics, and for language arts essays, the major examinations tend to be teacher made.

The factor analysis for cognitive levels showed high intercorrelation among the three items that suggested higher order knowledge and skills (understanding, reasoning, and application).




Table 5.-Factor Loadings for Elementary Teachers' Assessment and Grading Practices

Factors used in grading (components 1-6)
  Component  Item                                                                          Loading
  1          Improvement of performance since the beginning of the year                     .777
  1          Student effort-how much students tried to learn                                .809
  1          Ability levels of students                                                     .644
  2          Completion of homework (not graded)                                            .819
  2          Quality of completed homework                                                  .750
  3          Grade distributions of other teachers                                          .660
  4          Academic performance, as opposed to other factors                              .710
  4          Performance compared with a set scale of percentage correct (e.g., 86-94% B)   .668
  4          Specific learning objectives mastered                                          .683
  5          Performance compared with other students in the class                          .781
  5          Performance compared with students from previous years                         .741
  6          Extra credit for academic performance                                          .730
  6          Effort, improvement, behavior, and other nontest indicators for
               borderline cases                                                             .659

Types of assessments (components 1-3)
  1          Oral presentations                                                             .704
  1          Performance assessments (e.g., structured teacher observations or ratings
               of performance, such as a speech or paper)                                   .746
  1          Essay-type questions (mathematics only)                                        .740
  1          Projects completed by teams of students                                        .819
  1          Projects completed by individual students                                      .712
  1          Authentic assessments (e.g., real world performance tasks)                     .636
  2          Objective assessments (e.g., multiple choice, matching, short answer)          .736
  2          Assessments provided by publishers or supplied to the teacher (e.g., in
               instructional guides or manuals)                                             .776
  3          Major examinations (mathematics only)                                          .691
  3          Assessments designed primarily by yourself                                     .721
  3          Essay-type questions (language arts only)                                      .672

Cognitive level (one component)
  1          Assessments that measure student understanding                                 .836
  1          Assessments that measure student reasoning (higher order thinking)             .795
  1          Assessments that measure how well students apply what they learn               .818

Note. Unless otherwise noted, factor loadings for mathematics and language arts were averaged.

Teachers tended to think about these items as one kind of skill, apart from recall knowledge, which did not load on this analysis.

Relationship Results

In the relationship analyses for subject matter and grade level, we used paired t tests and analysis of variance, respectively, with standardized component scores for the items loading on each of the 10 components derived from the factor analyses, plus the percentage of A's given, as dependent variables. We also performed a regression analysis to determine if assessment and grading practices predict grades. Thus, there were two independent variables (subject matter, with two levels, and grade level, with three levels) in the first two analyses; we used all 10 components as independent variables to predict the percentage of A's awarded.

The t test analyses showed that there were few differences between language arts and mathematics assessment and grading practices, despite the large sample size that would have made it easy to detect statistically significant differences. Clearly, there was more in common than there was different on the basis of these two content areas. As might be expected, differences occurred for the extent to which performance assessments were used (mean of 2.33 for mathematics and 3.41 for language arts), projects completed by individual students (mean of 3.01 for mathematics and 3.56 for language arts), and the use of assessments provided by publishers (mean of 3.56 for mathematics and 3.23 for language arts). Thus, only three items in the category types of assessments showed a difference between mathematics and language arts. When considering other factors such as effort, participation, homework, and so forth, as well as cognitive levels, we found no difference between the mathematics and language arts responses.
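A paired comparison of this kind can be reproduced on simulated data. In the sketch below, only the two means (2.33 for mathematics, 3.41 for language arts) come from the text; the distributions and pairing structure are assumptions, so the resulting t value is illustrative, not the paper's.

```python
import numpy as np
from scipy import stats

# Hypothetical paired scores: each teacher rates the same practice once for
# mathematics and once for language arts. Means (2.33 vs. 3.41) follow the
# reported performance-assessment comparison; everything else is simulated.
rng = np.random.default_rng(3)
math_use = rng.normal(loc=2.33, scale=1.0, size=901)
la_use = math_use + rng.normal(loc=1.08, scale=0.8, size=901)

t_stat, p_value = stats.ttest_rel(math_use, la_use)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4g}")
```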




Table 6.-Relationship of Grade Levels (3, 4, and 5) to Assessment Practices of Elementary Teachers (N = 873)

Mathematics
  No relationship:       Effort, ability, and improvement; Academic performance; Teacher-made major examinations; Higher order thinking and application
  Positive relationship: Homework; Extra credit; Constructed-response assessments; Objective assessments
  Negative relationship: Percentage A's

Language arts
  No relationship:       Effort, ability, and improvement; Academic performance; Objective assessments; Higher order thinking and application
  Positive relationship: Homework; Extra credit; Constructed-response assessments; Teacher-made major examinations
  Negative relationship: Percentage A's

Table 7.-Factors, Types of Assessments, and Cognitive Levels as Predictor Variables of Percentage A's Awarded for Elementary Teachers

Subject                    R     Significant positive relationship        Significant negative relationship

Mathematics (n = 714)     .21    Higher order thinking and application    Objective assessments; Publisher-provided items; Homework
Language arts (n = 731)   .20    Constructed-response items               Extra credit

We used one-way analyses of variance with Scheffé post hoc tests to examine the relationship between grade level and assessment and grading practices. The results of these analyses are shown in Table 6, which contains a summary of the components that indicate no relationship, those that show a positive relationship, and the single variable that shows a negative relationship.

As with other analyses, the major finding was no difference between grade levels on components that were most important to assessment and grading. For both language arts and mathematics, the results showed that as grade level increases, so does the importance of homework, extra credit, and constructed-response assessments. For mathematics, the importance of objective assessments showed a positive relationship with grade level. In language arts, teacher-made major examinations contributed more in higher grades. The only negative relationship was found in the percentage of A's awarded, which means that fewer A's were awarded in higher grades.
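For concreteness, the grade-level analysis might be set up as follows. The group means and spread are loosely drawn from Tables 3 and 4, but the individual values are simulated, and because SciPy does not implement Scheffé tests, Tukey's HSD stands in for the post hoc step.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

# Hypothetical per-teacher percentages of A's in mathematics by grade level.
# Group means follow Table 3 and the spread follows Table 4; the individual
# values are simulated.
rng = np.random.default_rng(4)
grade3 = rng.normal(23.97, 16.2, size=294)
grade4 = rng.normal(21.39, 16.2, size=258)
grade5 = rng.normal(17.56, 16.2, size=205)

# One-way ANOVA across the three grade levels.
f_stat, p_value = f_oneway(grade3, grade4, grade5)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# SciPy has no Scheffe procedure, so Tukey's HSD serves as a stand-in
# pairwise post hoc comparison here.
print(tukey_hsd(grade3, grade4, grade5))
```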

The predictive relationship between assessment and grading practices and grades was examined with stepwise multiple regression: one regression for language arts and one for mathematics, with percentage A's awarded as the dependent variable and the eight weighted component scores as independent variables. The results of these regressions are summarized in Table 7.

The multiple correlation coefficients were relatively small in both regressions, suggesting that the major predictors of grades were not the weight given to different factors, types of assessments, or cognitive level of assessments. Given that finding, the percentage of A's awarded tended to increase with increased weight given to higher order thinking assessments for mathematics, and constructed-response assessments for language arts. Negative relationships were found for mathematics with objective assessments, publisher-provided items, and homework, and for language arts with extra credit.
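True stepwise regression is not in scikit-learn, but greedy forward selection gives a close approximation. In the sketch below, the predictor structure, coefficients, and selection size are all invented for illustration; only the sample size and the use of component scores to predict percentage of A's follow the text.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Hypothetical data: eight weighted component scores per teacher as
# predictors, percentage of A's awarded as the outcome (mathematics n = 714).
# The coefficients below are invented purely for illustration.
rng = np.random.default_rng(5)
X = rng.normal(size=(714, 8))
y = 20 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 15, size=714)

# Greedy forward selection approximates stepwise multiple regression.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=4, direction="forward"
)
selector.fit(X, y)
kept = selector.get_support()

model = LinearRegression().fit(X[:, kept], y)
multiple_r = np.sqrt(model.score(X[:, kept], y))   # R = sqrt(R^2)
print(f"selected components: {np.where(kept)[0] + 1}")
print(f"multiple R = {multiple_r:.2f}")
```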

Discussion

The results of the analyses, consistent with earlier research by Brookhart (1994) and Cizek et al. (1995), show that most elementary teachers use a multitude of factors in grading students. The hodgepodge of factors considered when grading appears to be organized into six distinct components. Academic performance is clearly the most important factor in grading students, as also reported by Stiggins and Conklin (1992), but the results of the present study show that nontest performance and behavior, such as effort, participation, and extra credit work, also are very important for many teachers, consistent with the Gullickson (1985) study. Disruptive student behavior, grade distributions of other teachers, and norm-referenced interpretations contribute little to grading.




A substantial percentage of elementary teachers include zeros in the calculation of grades. Because there are different ways this can be done, this finding suggests a need to explore in greater depth more specifically how this is accomplished.

Three major types of assessments are used: constructed-response, such as projects, essays, and presentations; objective assessments; and teacher-made major examinations. Although objective assessments are used most frequently, there is also a great reliance on constructed-response types of assessments. Teachers tend to differentiate the cognitive level of their assessments into two categories: recall knowledge, and higher order thinking and application. Higher order thinking and application are emphasized heavily. There is a significant reliance on assessments that are designed by publishers, even if most assessment is teacher made. This finding suggests that teachers need training in how to evaluate the quality of their own assessments, as well as those provided by others.

Along with the variety of factors that go into grading, great variation exists within schools concerning the extent to which teachers emphasize different factors in grading students. The finding that within-school variance is greater than between-school variance suggests that individual teacher preferences are more important than are differences between schools in determining grading practices. These data suggest that teachers vary considerably in how they weigh different factors, even within the same building, and that school and student characteristics as a whole are less important than are individual beliefs. This finding is consistent with the highly variable results found by Cizek et al. (1995) and McMillan and Nash (2000), confirming that an important characteristic of classroom assessment and grading practices is that they are highly individualized and may be unique from one teacher to another, even in the same school. It may be useful for teachers to discuss such differences and consider whether more consistency would provide students with a clearer message about what is important.

An implication of this variability of grading is that there seems to be only moderate consistency among teachers about what is most important and how different factors are weighted. Why is this the case? One possible answer is that teachers view grading as an extension of teaching to promote student success in general in many areas important to schooling, including both academic achievement and academic-enabling behaviors, such as responsibility, effort, improvement, participation, and cooperation. If teachers awarded grades solely on the basis of academic performance, there might be much more consistency. This finding also may explain why teachers' grading practices do not follow established measurement principles. McMillan and Nash (2000) found that teachers base their grading practices on their educational philosophies and on what is best for each student. Considering differences in styles of teaching, types of students, and curriculum, perhaps it is not surprising that there is high variability of grading practices. From a policy perspective, an appropriate question is whether there is a good reason to permit or even encourage differences in grading practices or whether there is a need for educational measurement experts to modify recommendations to be more consistent with the realities of teaching and the multitude of factors that influence teacher behavior. An implication for the preparation of teachers is that it is important to train them to integrate assessment meaningfully into instruction. Another implication for teacher training is to explore with students the importance of conceptualizing assessment as a process involving a multitude of factors, combined in unique ways that may differ for each individual.

The factor analysis indicated that several student behaviors considered together indicate what has been labeled here as an Academic Enabling factor. These behaviors include student effort, participation, improvement, ability, and discussion. It appears that teachers use all of these behaviors, apart from actual academic performance, as important indicators for determining grades. This finding has theoretical significance when one is conceptualizing the nature of student effort in relation to assessment. Brookhart (1997), for example, has identified mental effort and realized student effort as a major component of a theoretical framework for investigating the effects of classroom assessment. The present findings suggest that the manner in which effort is defined in this model may need to be broadened to include some indication of improvement and ability. It may well be that teachers think about effort as mediated by improvement and ability. That is, what is regarded as high or low effort is dependent on what knowledge and skills students bring to the classroom and on a more general impression of ability.

It is clear that many teachers use academically enabling behaviors, such as participation, effort, and improvement, extensively to determine grades, whereas other teachers do not use these variables very much. This finding may reveal educational philosophies or approaches that give different messages to students. For example, teachers who reward effort may be allowing students who are not competent, as well as their parents, to believe that they demonstrate needed knowledge and skills. This action would be especially troubling if it occurred more with low-socioeconomic status (SES) students, who could be rewarded for their effort to maintain involvement, because they might not be provided with feedback that more accurately indicates their level of knowledge and skills. Also, are teachers who weight effort more in reality "coddling" students, making it easier for them to obtain passing grades? Some support for this implication is found in a study by Cauley and McMillan (2000), who found that middle school teachers at low-SES schools (determined by the percentage of students eligible for free and reduced-price lunches) tend to use nontest factors more in grading than do teachers at high-SES schools.

Although few relationships were reported between assessment practices and grade level, greater emphasis in later grades is placed on homework, extra credit, constructed-response assessments, objective assessments, and major examinations.




Other practices, such as effort, ability, improvement, and academic performance, are emphasized the same in all three grade levels. Teachers who award more A's use fewer objective assessments, fewer publisher-provided tests, less homework, and more assessments that measure reasoning and application. There was no relationship between the extent to which effort, improvement, ability, academic performance, homework, and extra credit were emphasized and the percentage of A's awarded. Overall, the analyses designed to predict assessment and grading practices fell short of much practical significance. We know that these practices do not vary much according to grade level (3-5) or subject matter (mathematics and language arts). However, there was considerable variation between schools in the percentage of A's awarded. Further research concerning these differences is warranted to explore possible reasons.

The results of this study are limited by several variables, including demographics and location (Virginia is in the midst of a statewide assessment program consisting of all multiple-choice tests, with the exception of writing), the use of Grades 3, 4, and 5 with language arts and mathematics but not other subjects, and the fact that the data, including grades awarded, were based on teacher self-reports. In addition, the analyses were unable to tease out possible effects due to the SES of each school. However, the large and comprehensive nature of the sample still suggests strong external validity. The self-reports were based on actual practice with a specific class, not more generic beliefs, and represented inner-city, suburban, and rural schools. Researchers who investigate assessment practices may find that the components identified are useful categories for asking questions and relating assessment and grading practices to student motivation and achievement.

NOTES

This research was supported by the Metropolitan Educational Research Consortium, Virginia Commonwealth University, Richmond, Virginia. The findings do not represent the views of members of the consortium.

The authors appreciate the reviewers' suggestions for needed improve- ments in earlier drafts.

REFERENCES

Airasian, P. W. (1984). Classroom assessment and educational improvement. Paper presented at the conference, Classroom Assessment: A Key to Educational Excellence, Northwest Regional Educational Laboratory, Portland, OR.

American Federation of Teachers, National Council on Measurement in Education, and National Education Association. (1990). Standards for teacher competence in educational assessment of students. Washington, DC: Author.

Brookhart, S. M. (1991). Grading practices and validity. Educational Measurement: Issues and Practice, 10(1), 35-36.

Brookhart, S. M. (1993). Teachers' grading practices: Meaning and values. Journal of Educational Measurement, 30, 123-142.

Brookhart, S. M. (1994). Teachers' grading: Practice and theory. Applied Measurement in Education, 7, 279-301.

Brookhart, S. M. (1997). A theoretical framework for the role of classroom assessment in motivating student effort and achievement. Applied Measurement in Education, 10, 161-180.

Cauley, K. M., & McMillan, J. H. (2000, April). Do teachers grade differently in low SES middle schools? Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.

Cizek, G. J., Fitzgerald, S. M., & Rachor, R. E. (1995). Teachers' assessment practices: Preparation, isolation, and the kitchen sink. Educational Assessment, 3(2), 159-179.

Cross, L. H., & Frary, R. B. (1996, April). Hodgepodge grading: Endorsed by students and teachers alike. Paper presented at the annual meeting of the National Council on Measurement in Education, New York.

Fleming, M., & Chambers, B. (1983). Teacher-made tests: Windows on the classroom. In W. E. Hathaway (Ed.), Testing in the schools: New directions for testing and measurement (pp. 29-38). San Francisco: Jossey-Bass.

Frary, R. B., Cross, L. H., & Weber, L. J. (1993). Testing and grading practices and opinions of secondary teachers of academic subjects: Implications for instruction in measurement. Educational Measurement: Issues and Practice, 12(3), 23-30.

Gullickson, A. R. (1985). Student evaluation techniques and their relationship to grade and curriculum. The Journal of Educational Research, 79(2), 96-100.

Gullickson, A. R. (1993). Matching measurement instruction to classroom-based evaluation: Perceived discrepancies, needs, and challenges. In S. T. Wise (Ed.), Teacher training in measurement and assessment skills (pp. 1-25). Lincoln, NE: Buros Institute of Mental Measurement.

Marso, R. N., & Pigge, F. L. (1993). Teachers' testing knowledge, skills, and practices. In S. T. Wise (Ed.), Teacher training in measurement and assessment skills. Lincoln, NE: Buros Institute of Mental Measurement.

McMillan, J. H., & Nash, S. (2000, April). Teacher classroom assessment and grading practices decision making. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.

Plake, B. S., & Impara, J. C. (1993). Assessment competencies of teachers: A national survey. Educational Measurement: Issues and Practice, 12, 10-25.

Plake, B. S., & Impara, J. C. (1997). Teacher assessment literacy: What do teachers know about assessment? In G. D. Phye (Ed.), Handbook of classroom assessment (pp. 55-68). New York: Academic Press.

Stiggins, R. J., & Bridgeford, N. J. (1985). The ecology of classroom assessment. Journal of Educational Measurement, 22(4), 271-286.

Stiggins, R. J., & Conklin, N. F. (1992). In teachers' hands: Investigating the practices of classroom assessment. Albany: State University of New York Press.

Stiggins, R. J., Frisbie, D. A., & Griswold, P. A. (1989). Inside high school: Building a research agenda. Educational Measurement: Issues and Practice, 8, 5-14.

Tittle, C. K. (1994). Toward an educational psychology of assessment for teaching and learning: Theories, contexts, and validation arguments. Educational Psychologist, 29, 149-162.
