
ELSEVIER Studies in Educational Evaluation 30 (2004) 237-253

Studies in Educational Evaluation

www.elsevier.com/stueduc

AN EXAMINATION OF STUDENT MATHEMATICS LEARNING IN ELEMENTARY AND MIDDLE SCHOOLS: A LONGITUDINAL LOOK FROM THE US

Cody Ding and Virginia Navarro

University of Missouri-St. Louis, USA

Abstract

School districts in the U.S. increasingly rely on standardized testing to document student achievement in mathematics, reading, science, and English. Educators and policy-makers have engaged in heated debates about the effects of such standardized testing practices on actual student achievement. In studying these issues, it is important to examine growth and improvement patterns for individual learners over time. Most studies to date, however, lack empirical data at the individual level and rely instead on cross-sectional designs. The current study documents longitudinal math achievement growth for 716 students in a single U.S. school district. The data for this study are student math scores on SAT-9 achievement tests administered yearly between 1997 and 2000. Multidimensional Scaling profile modeling was used to track individual and group patterns over four years. Analysis revealed unequal growth rates. The results are discussed in light of recent school reform efforts and raise questions about the expectation of incremental growth curves mandated in the No Child Left Behind Act of 2001 provision that requires Adequate Yearly Progress (AYP). Even in a relatively stable school district, children's actual math learning growth patterns do not mirror the AYP expectation.

0191-491X/04/$ - see front matter © 2004 Published by Elsevier Science Ltd. doi:10.1016/j.stueduc.2004.09.004


Documenting student math achievement is a challenging task, particularly if one wants to trace such achievement growth at the individual level. Due to the passage of the No Child Left Behind Act of 2001, states in the U.S. are now forced by federal law to meet adequate yearly progress (AYP) targets for all groups of students using high-stakes testing scores as the only measure of success. Concerned that students do not possess the mathematical knowledge needed to function in our increasingly complex society (Allexsaht-Snider & Hart, 2001; Middleton & Spanias, 1999), and galvanized by the claims in A Nation at Risk that our schools were drowning in a "rising tide of mediocrity" (National Commission on Excellence in Education, 1983), legislators focused on mandates to increase both performance and accountability in schools (Cimbricz, 2002). The National Council of Teachers of Mathematics (NCTM) led reform efforts by developing standards-based curricula (NCTM, 1989; 1991; 1995; 2000) that "catalyzed a national standards movement" (Schoenfeld, 2002, p. 15). As a result, many schools mandate tests in reading and math for all students in Grades 3 through 8; more states are also requiring that students pass such standardized tests for promotion and graduation (Kane & Staiger, 2001). Under the No Child Left Behind Act of 2001, students must meet or exceed an annual measure of AYP based on high-stakes standardized tests. Lashway (2002) succinctly summarizes today's climate: "Today, standards-based accountability is the 800-pound gorilla of school reform - highly visible, hard to control, and impossible to ignore" (p. 15).

Despite the fact that many people in the U.S. believe that standardized testing can contribute to educational improvement at the local and national levels, evidence to support this relationship is controversial. There are two issues. First, as Stake (1995) and Zancanella (1992) claim, the effects of standardized testing on student learning remain unvalidated. The assumption that testing, in and of itself, can improve education has not been supported by the past decade's research on school testing. Baker, O'Neil, and Linn (1993) indicate that less than five percent of the literature cites empirical data. Thus Baker et al. (1993) and Mehrens (1998) conclude that better research is needed to evaluate the degree to which newly developed assessments meet reform expectations. Most of the evidence on the effects of testing itself on achievement has not been gathered in a manner that allows a causative inference to be drawn (Reckase, 1997).

Second, findings from empirical data on student achievement as assessed by standardized tests are not promising. In a study of student achievement in metropolitan Boston schools, Bolon (2001) examined 1998, 1999, and 2000 Massachusetts Comprehensive Assessment System math scaled scores for 47 academic schools. He found that, by school, average Grade 10 math scores changed little, with a 1.3-point increase from 1998 to 1999 and a 5.9-point increase from 1999 to 2000. Similarly, Haney (2002) analyzed annual test score averages in Massachusetts for nearly 1000 schools over four years (1998-2001); he found that test score gains in one testing period tend to be followed by losses in the next testing period. He also showed that school averages were volatile in relatively small schools because of the small sample size.

Using effect size as a measure of AYP, Orlich (2003) studied the effect on student achievement as a consequence of the longitudinal administration of the Washington Assessment of Student Learning during 1998 to 2001. The findings, similar to those reported by Bolon (2001) and Haney (2002), showed no positive impact on yearly student achievement as a result of the assessment. At the national level, parallel findings were also
reported (Amrein & Berliner, 2002; Suter, 2000). Data from NAEP studies of trends in math achievement between 1973 and 1996 detected small score increases for similar age groups (ages 9, 13, and 17) (Suter, 2000), but these changes were considered too small to be significant (National Science Board, 1998, 2000). In analyzing cohort trend data, Barton and Coley (1998) showed that there was no increase in growth rates between the 1970s and 1990s.

In addition, Darling-Hammond (2003) argues that doubt must be cast on the significance of gain scores on state tests because students who made gains on state tests such as TAAS in Texas did not make comparable gains on national standardized tests (see also Linn, Baker & Betebenner, 2002). Kane and Staiger (2001) suggest that the test results are not reliable due to the particular samples of cohort grades and the differences in test content and administration. They warn that when evaluating the impact of policies on changes in test scores over time, one must take into account the fluctuations in test scores that occur naturally.

The study reported here builds on the published literature reviewed above. For example, the longitudinal analytical approach used in several of these studies is also employed here. Additionally, while the work discussed above examined student achievement using aggregated school-level data or cohort data, this inquiry addresses student math achievement using individual-level data. One important reason why longitudinal individual-level data might be a more useful indicator of student math achievement than aggregated school-level or district-level data is that only individual-level data allow researchers to identify intra-individual change directly. On this count, individual achievement data clearly have more potential to reflect growth within the same individuals. Using the percentage of students who meet the annual measurable objectives set by a state might not adequately capture a detailed analysis of individual learning. For example, if the percentage of students in a school who meet the AYP benchmark increases from 4% to 8% from one year to the next, the increase may reflect a different group of students from one data point to the next. Thus, it is not possible to measure the growth of the same students over time.

Although several studies regarding these issues are available (e.g., Fan, 2001), empirical evidence based on longitudinal data is still much needed. The purpose of this study is to document the student achievement profile as a consequence of the longitudinal administration of the SAT-9 mathematics tests. The central research question addressed can be stated as follows: What are the math growth or achievement patterns of elementary and middle school students? Another way of stating this might be: when the same students are measured by standardized math tests over time, what types of growth seem plausible? Related questions are: (1) How is student math achievement related to student demographic characteristics (i.e., limited English proficiency, special education status, giftedness, etc.)? and (2) Are differences in growth attributable to individual school sites? Given the impetus of improving student math achievement over time, the significance of this inquiry lies in the fact that it provides individual-level longitudinal evidence on questions that are fundamental to school reform efforts and teaching practices.


Methods

Sample

The data for this study are individual test scores from the Stanford Achievement Test, 9th edition, in mathematics administered between 1997 and 2000 to all children in Grades 3 to 8 in a school district from a southwestern state in the U.S. The SAT-9 is a widely used achievement test published by Harcourt Brace Educational Measurement in the U.S. It was designed to measure achievement in the curriculum content commonly taught in Grades 1 through 9 throughout the United States. The majority of students in the sample stayed within the school district from 1997 to 2000; the district averaged an 8% transfer rate from other schools during this period. Participants in the current sample were from seven K-8 schools and consisted of two cohorts tracked over four years, 1997-2000. One cohort comprised 337 third graders, and the second comprised 379 fifth graders. That is, the Grade 3 cohort was tested at 3rd, 4th, 5th, and 6th grade, representing the population in the elementary schools. The Grade 5 cohort was tested at 5th, 6th, 7th, and 8th grade, representing the population in the middle schools. Since the focus of this study is on longitudinal data, the subjects in the sample were those who had complete test data points over the four years of measurement.

The students in the sample were predominantly Caucasian from low middle to middle class families; 8% had limited English proficiency; 4% were in special education programs; 11% were considered gifted. The average age of the third graders was 10 years at the first measurement, and the average age of the fifth graders was 12 years at the first measurement.

Instrument and Analysis Design

From 1997 to 2000, students in the sample were annually tested in mathematics using the standardized, nationally norm-referenced SAT-9 Math as part of a larger effort to monitor the progress of elementary and middle school students toward mastering mathematics knowledge from one year to the next. The test results of SAT-9 Math were also used as an indicator that students were making AYP. For the purpose of comparing a student's progress from one year to the next, the test scores were vertically scaled across multiple measurements so that the scores were comparable over time.

This study employed exploratory growth profile modeling via Multidimensional Scaling (MDS) (Ding, Davison, & Petersen, in press) to model student mathematics achievement from elementary school through middle school. The MDS profile analysis method provides us with (1) growth curves that are derived from the data rather than from a priori theory; that is, the shape of the growth curves is determined by the data, not by the researchers; (2) a growth rate that is modeled for each time interval rather than one average growth rate; and (3) individual growth patterns that can be estimated with respect to the latent growth curve.

The model for the growth profile analysis based on Multidimensional Scaling (MDS) is given by the equation:

m_p(t) = Σ_k w_pk x_k(t) + c_p + e_p(t)    (1)


m_p(t) is the observed test score for person p at time t. w_pk is a profile match index that characterizes person p with regard to the kth growth profile. Each dimension can be considered a growth profile. The profile match index, w_pk, quantifies the degree to which the observed data resemble the growth profiles derived from the MDS analysis. x_k(t) is a growth scale value and reflects the growth rate along the kth growth profile at a given time. In the model, each person's individual growth profile is modeled as a linear combination of the latent profiles represented by the dimensions.

The intercept or level parameter c_p can be defined so that it corresponds to the growth scale value at the first time period (i.e., x_k(1) = 0 for all k) in longitudinal analyses. Therefore, once an initial MDS solution is obtained, the growth scale value of the first time point is re-set to zero on each profile so that the level parameter estimate reflects initial scores.
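To make Equation 1 concrete, here is a minimal numerical sketch. This is our own illustration, not the authors' code: the profile match index and intercept below are invented, and only the symbols (w, x, c) follow the paper.

```python
import numpy as np

# Sketch of Equation 1: m_p(t) = sum_k w_pk * x_k(t) + c_p (+ error).
# All numbers are illustrative; only the symbols follow the paper.

def predict_score(w_p, c_p, x, t):
    """Model-implied score for person p at time index t.

    w_p : (K,) profile match indexes, one per latent profile k
    c_p : intercept (initial level, since x[:, 0] is rescaled to 0)
    x   : (K, T) growth scale values
    """
    return float(np.dot(w_p, x[:, t]) + c_p)

# One latent profile (K = 1), four yearly time points.
x = np.array([[0.00, 1.42, 2.09, 2.69]])   # scale values, time 1 reset to 0
w_p = np.array([25.0])                     # hypothetical growth rate
c_p = 590.0                                # hypothetical initial level

print(predict_score(w_p, c_p, x, 0))       # 590.0: intercept = initial score
print(predict_score(w_p, c_p, x, 3))       # 657.25 = 590 + 25 * 2.69
```

Because x_k(1) is reset to zero, the prediction at the first time point reduces to c_p, which is exactly why the authors interpret the intercept as an initial growth level.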

Estimating MDS Growth Profile Parameters

MDS growth profile modeling in Equation 1 includes two kinds of parameter estimates. One is the set of group-related growth rates, that is, the growth scale values x_k(t). The scale values indicate the distance between each time interval and are estimated based on a geometric space model (readers may consult Davison (1983) for details). Many statistical packages, such as SAS, have an MDS analysis procedure for estimating x_k(t). Growth across an interval is calculated by finding the slope of the curve for each interval in the following way:

g_t = [x_k(t) − x_k(t−1)] / [t − (t−1)]    (2)

where x_k(t) is the scale value at time t, and t is a particular time point. For example, if we have four time points of data, we have three time intervals. Thus, four growth scale values are estimated, one for each time point. The growth across the first interval (i.e., between times 1 and 2) is estimated by finding the ratio of the difference between the scale values to the difference in time. As mentioned above, after the initial estimates of the scale values are obtained, the scale values at time 1 are set to zero on each profile by x_k(t) = x_k(t) − x_k(1). The re-scaled growth scale value estimates range from zero to some positive or negative value, depending on the shape of the curve, and these growth scale values represent the growth rate at each time point. This is a particularly nice feature since (1) we explicitly obtain growth rates for each time interval, and (2) we do not project growth with one fixed rate, which would imply a constant growth rate over the years.
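A small sketch of Equation 2, using the Grade 3 cohort scale values reported later in Table 2 (our own code; with yearly measurements the denominator is always 1, so each interval's growth rate is simply the difference of adjacent scale values):

```python
import numpy as np

# Growth rate over each interval: g_t = (x(t) - x(t-1)) / (t - (t-1)).
x = np.array([0.00, 1.42, 2.09, 2.69])  # scale values, time 1 reset to 0
t = np.array([1, 2, 3, 4])              # four yearly time points

g = np.diff(x) / np.diff(t)             # one rate per interval: 1.42, 0.67, 0.60
print(np.round(g, 2))
```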

A second feature of this model is the parameter estimates of individual-related growth rates. Having set the origin of the profiles, the intercept and profile match index parameters in the model, c_p and w_pk, can then be estimated for each individual by regressing the observed data m_p(t) onto the scale value estimates, x_k(t); this yields least squares regression estimates of the intercept c_p and slope w_pk for each person. These estimates reflect individual differences with regard to initial growth status and growth rate, which is also called the slope in the growth literature.
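The per-person regression step can be sketched as follows. The student's scores here are invented for illustration; the least-squares fit recovers c_p and w_pk from the observed scores and the estimated scale values:

```python
import numpy as np

# Regress one person's observed scores m_p(t) on the scale values x(t).
# The scores below are hypothetical, not from the study's data.
x = np.array([0.00, 1.42, 2.09, 2.69])        # estimated scale values
m_p = np.array([593.0, 630.0, 646.0, 664.0])  # one student's yearly scores

# Design matrix [1, x(t)] -> least-squares coefficients [c_p, w_p].
X = np.column_stack([np.ones_like(x), x])
c_p, w_p = np.linalg.lstsq(X, m_p, rcond=None)[0]
print(round(c_p, 2), round(w_p, 2))           # intercept ~592.8, slope ~26.1
```

The intercept lands near the student's first score (because x starts at zero), and the slope plays the role of the profile match index for this student.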

Thus, the MDS growth profile analysis consists of estimating growth parameters for both the group and individuals, therefore allowing one to address group growth/decline patterns as well as individual differences in growth/decline curves. In the first step, the MDS scale values x_k(t) are estimated and the zero point of the scale values is rescaled so that the intercept estimate c_p can be interpreted as an initial growth level. In the second step, growth across each interval is found by calculating the
ratio of change in scale values to the change in time. In the third step, the intercept c_p and slope w_pk are estimated by regressing the observed data m_p(t) onto the scale values x_k(t).

Results

Means and standard deviations for the student mathematics achievement data over the four-year span for each cohort are shown in Table 1. Over the four times of measurement the average scaled scores suggest an increase in student mathematics achievement for each cohort.

Table 1: Means and Standard Deviations of Student Math Scaled Scores over Four Years

Year     Grade 3 Cohort     Grade 5 Cohort

1997     593.17 (38.96)     644.80 (34.56)
1998     629.36 (34.41)     667.80 (36.26)
1999     646.85 (36.00)     677.55 (36.42)
2000     664.15 (33.99)     686.04 (35.03)

Note: Standard deviation in parentheses. N = 337 for Grade 3 cohort and N = 379 for Grade 5 cohort.

Based on Equation 1, MDS growth profile analysis was used to examine group and individual growth profiles in the data. As mentioned above, time is specified as the latent profile along which individuals vary with regard to the growth/decline curve. In applying this model, the math score is considered a repeatedly measured variable on a time dimension along which individual growth patterns are of interest. Since the analysis is exploratory, no a priori specification of a particular growth/decline profile is necessary. The growth profiles are derived from the observed data.

MDS scale values x_k(t) were first estimated separately for each cohort using the nonmetric MDS procedure in SAS, one of the commonly used statistical programs. Since there were four test scores, a one-profile solution was derived from the data for each cohort. The fit between the one-profile MDS model and the data was measured by STRESS formula 1 (Kruskal, 1964). The value of STRESS formula 1 ranges from zero to one, with values close to one indicating poor fit between the model and the data. In the current analysis, the STRESS formula 1 value was zero (S1 = 0.00) for both cohorts, indicating a good fit between the data and the obtained one-profile solution. For issues related to the MDS growth model, readers may consult the relevant articles (Ding, 2003; Ding et al., in press).
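Kruskal's STRESS formula 1 is simple enough to state in a few lines. This is a sketch under our own naming (`stress1` is not a function from any particular package):

```python
import numpy as np

# STRESS formula 1: sqrt( sum (d - dhat)^2 / sum d^2 ), where d are the
# configuration distances and dhat the fitted monotone disparities.
def stress1(d, dhat):
    d, dhat = np.asarray(d, float), np.asarray(dhat, float)
    return float(np.sqrt(np.sum((d - dhat) ** 2) / np.sum(d ** 2)))

print(stress1([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))   # 0.0, a perfect fit
print(stress1([1.0, 2.0, 3.0], [1.2, 1.8, 3.1]))   # small positive value
```

A value of exactly zero, as the authors report for both cohorts, means the distances reproduce the disparities perfectly.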

The estimates of the MDS growth curve values (i.e., scale values of the four time points) are presented in Table 2. These growth scale values are the final estimates obtained by re-scaling so that the zero point corresponds to the growth scale value at time 1. In the context of growth modeling, the scale values reflect how far apart two test scores are from one test measurement to the next for the same individual. The ratio of the difference between two scale values to the change in time reflects the growth/decline rate between test intervals. For example, in the current case the total growth in math
scores between 1997 and 2000 was 2.69 in scale value for the Grade 3 cohort. The growth rate g1 between 1997 and 1998 was 1.42 in scale value (i.e., the difference between the scale values at 1998 and 1997 divided by 1), which reflects a 1.42 scale-value-point growth in test scores. The growth rate g2 between 1998 and 1999 was 0.67 scale value points, and g3 between 1999 and 2000 was 0.60 scale value points. Because the unit of the scale value is arbitrary, growth is reported as a proportion of the total growth between Time 1 and Time N rather than as the growth rate g_t itself. The growth percentage for each cohort is shown in Table 2, along with the scale values.

Table 2: Estimates of Scale Values for the Growth Profile of Math Achievement over Four Measurement Points by Cohort

           Grade 3 Cohort                Grade 5 Cohort
           Scale value  Growth rate (%)  Scale value  Growth rate (%)
Math 97    0.00         0%               0.00         0%
Math 98    1.42         53%              1.54         58%
Math 99    2.09         77%              2.19         82%
Math 00    2.69         100%             2.64         100%
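The percentage column in Table 2 can be recomputed directly from the scale values (our own recomputation; note that the 1999 shares come out near 78% and 83% when rounded, so the reported 77% and 82% appear to be truncated rather than rounded):

```python
import numpy as np

# Each rescaled scale value as a share of total growth (Time 1 to Time N).
x_g3 = np.array([0.00, 1.42, 2.09, 2.69])   # Grade 3 cohort (Table 2)
x_g5 = np.array([0.00, 1.54, 2.19, 2.64])   # Grade 5 cohort (Table 2)

pct_g3 = 100 * x_g3 / x_g3[-1]
pct_g5 = 100 * x_g5 / x_g5[-1]
print(np.round(pct_g3, 1))   # 1998 share: 52.8 -> the reported 53%
print(np.round(pct_g5, 1))   # 1998 share: 58.3 -> the reported 58%
```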

Figure 1 shows the group growth profile for each cohort based on the estimates of the growth scale values.

Figure 1: Group Growth Patterns for Student Mathematics Achievement Data
[Line graph of growth scale values against time of testing (97, 98, 99, 2000), with one line for the Grade 3 cohort and one for the Grade 5 cohort.]

As can be seen from Figure 1, both cohorts had very similar growth curve patterns over the years. Specifically, for the first cohort, the profile shows the greatest growth from 3rd grade to 4th grade: 53% of the growth occurs over this first interval. The
growth slowed from 4th grade to 6th grade: 24% of the growth occurs from 4th to 5th grade and 23% from 5th to 6th grade. Similarly, for the Grade 5 cohort, the pattern shows that the largest growth is from 5th grade to 6th grade, with 58% of the growth occurring over this first interval. The growth slows from 6th grade to 7th grade, when 24% of the growth occurs. From 7th grade to 8th grade, 18% of the growth occurs, the smallest percentage of change.

Individual Differences in Growth Rate

To examine individual differences in initial growth status and growth rate, the intercept c_p and profile match index w_pk were estimated. As mentioned above, a profile match index indicates each individual's growth rate for a given time period if that individual fits the model. The higher the value of the profile match index, the faster the growth for a given individual. In the current analysis, the average profile match index was 26.26 (SD = 11.32) for the Grade 3 cohort and 15.41 (SD = 8.45) for the Grade 5 cohort. These individual growth rates are shown in Figure 2. As can be seen from the figure, most of the students made gains in math scores over the years, with only a few students failing to show progress.

Figure 2: Histogram of Profile Match Indexes for Individuals
[Two histograms of individual growth rates (profile match indexes), ranging from about -25.00 to 50.00, one for the Grade 3 cohort and one for the Grade 5 cohort.]


In addition, the correlation between the intercept cp (i.e., initial growth status) and the profile match index Wpk was -.53 (p < .01) for the Grade 3 cohort, and -.28 (p < .01) for the Grade 5 cohort. This indicated that students who had high initial math scores tended to make less of a gain in achievement over the subsequent periods.
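The sign of this level-gain association can be illustrated with synthetic data. Everything below is simulated; only the direction of the correlation, not its magnitude, mirrors the reported r = -.53 and -.28:

```python
import numpy as np

# Simulate students whose growth rate decreases with initial level, then
# check that the intercept-slope correlation comes out negative.
rng = np.random.default_rng(1)
initial = rng.normal(600.0, 35.0, 300)                       # simulated c_p
growth = 40.0 - 0.15 * (initial - 600.0) + rng.normal(0.0, 5.0, 300)

r = np.corrcoef(initial, growth)[0, 1]
print(r < 0)   # True: higher starters gain less in this simulation
```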

Who Improved and Who Declined?

The results from the above analyses indicate that individual students made different degrees of gain in math achievement. As some public officials increasingly call for the use of tests to make high-stakes decisions (i.e., whether a student will move on to the next grade level or receive a diploma), it is important to ensure that certain subgroups of students, such as students with a disability or limited English proficiency, are not systematically excluded or disadvantaged by the test. On the other hand, it is also critical to remember that, in many instances, without tests, low-performing students and schools could remain invisible and therefore not get the extra resources or remedial help that they need. For these reasons, we examine how certain subgroups of students perform as a consequence of repeated administration of the SAT-9, as well as school differences in achievement.

The availability of individual profile match index and intercept estimates made it possible to examine subgroup equity in achievement. The specific question is: What type of students gained the most when measured by the standardized test? This can be addressed by relating the initial growth level estimate and the individual growth rate estimate (i.e., profile match index) to student characteristic variables. In the current analysis, three student characteristic variables were used: (1) LEP status (1 = student with limited English proficiency, 2 = regular student); (2) special education status (1 = in special education, 2 = not in special education); and (3) gifted status (1 = gifted, 2 = not gifted). Looking at these variables could help us answer the question: Do students with certain characteristics have the same growth rates as the rest of the students?

Two analyses were conducted: one with the initial growth level as the dependent variable and one with the profile match index as the dependent variable. Each analysis used a 2 (LEP) x 2 (special education) x 2 (gifted status) factorial design. The initial factorial design included interaction terms between the factors. Since no significant interactions were found, these interaction terms were excluded from the subsequent analyses.
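As a hedged sketch of what such a test looks like in code, one factor of the 2 x 2 x 2 design can be checked on synthetic data. With interactions dropped, a single factor's effect amounts to comparing two group means; a full main-effects model would additionally adjust for the other two factors. The data below are simulated, not the study's:

```python
import numpy as np
from scipy.stats import f_oneway  # one-way ANOVA F test

# Synthetic data: gifted students start ~47 points higher, echoing the
# direction (not the exact values) of the initial-level effect.
rng = np.random.default_rng(0)
gifted = rng.integers(1, 3, size=200)            # 1 = gifted, 2 = not gifted
initial = np.where(gifted == 1, 635.0, 588.0) + rng.normal(0.0, 35.0, 200)

F, p = f_oneway(initial[gifted == 1], initial[gifted == 2])
print(p < 0.01)   # True: the simulated gap is easily detected
```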

Table 3: Analysis of Variance for Initial Growth Level and Individual Growth Index Estimates

Grade 3 Cohort
Source          df     F value (Initial)   F value (Profile match index)
LEP             1      14.42**             3.27
Special Ed      1      9.52**              2.01
Gifted Status   1      54.23**             0.16
Error           333

Grade 5 Cohort
Source          df     F value (Initial)   F value (Profile match index)
LEP             1      10.04**             0.62
Special Ed      1      6.06**              1.00
Gifted Status   1      131.56**            2.12
Error           375

Note: ** p < .01.


Table 3 shows the ANOVA results of those analyses. Table 4 shows the means and standard deviations of the initial growth level and profile match index by LEP status, special education status, and gifted status.

Table 4: Means and Standard Deviations of Initial Growth Level and Individual Growth Index

                 Grade 3 Cohort                         Grade 5 Cohort
                 Initial level      Profile match index  Initial level      Profile match index
LEP
  Yes            566.89 (24.02)**   30.19 (7.68)         627.97 (26.87)**   16.33 (8.15)
  No             594.89 (37.85)**   25.95 (11.51)        646.11 (34.77)**   15.61 (8.48)
Special Ed
  Yes            558.20 (30.48)**   30.76 (15.08)        618.86 (19.85)**   17.37 (7.88)
  No             593.99 (37.37)**   26.10 (11.15)        645.39 (34.57)**   15.34 (8.47)
Gifted
  Yes            634.90 (33.35)**   26.76 (11.52)        690.37 (29.21)**   16.93 (8.10)
  No             587.98 (35.16)**   26.21 (11.31)        637.67 (29.71)**   15.18 (8.49)

Note: Standard deviation in parentheses. ** Indicates that the means were statistically significantly different from each other at the .01 level.

As can be seen from Tables 3 and 4, there are significant differences on all three student characteristic variables with regard to initial SAT-9 math level, but not with regard to the profile match index. Specifically, for both the Grade 3 and Grade 5 cohorts, students with limited English proficiency, students in the special education program, and students who were not gifted had, not surprisingly, statistically significantly lower initial SAT-9 math scores. These same students, on the other hand, appear to have the same growth rate as all other groups, including gifted students, despite their lower starting scores.

School Differences in Growth Rate

The data used in this study represent seven schools in a single district in the southwestern part of the U.S. Did students across schools differ in initial math levels and growth rates? To examine the relationship between math growth and schools, the school variable was used as a factor in ANOVA analyses with initial math level and profile match index as dependent variables. The analyses were performed for each cohort separately. The results are shown in Tables 5 and 6.


Table 5: School Differences in Initial Level and Growth Rate for Grade 3 Cohort

Source       df     F value (Initial)   F value (Profile match index)
School (S)   6      11.26**             13.10**
Error        330

Means and Standard Deviations (N = 337)
School       Initial             Profile match index
1 (n=90)     614.24 (34.76)*     20.84 (11.38)
2 (n=37)     605.72 (41.66)*     21.11 (10.54)
3 (n=47)     588.56 (28.61)      33.14 (7.69)*
4 (n=27)     583.11 (31.14)      23.13 (11.88)
5 (n=55)     580.83 (36.51)      26.98 (10.55)
6 (n=34)     580.41 (30.12)      29.60 (8.59)*
7 (n=47)     573.72 (36.56)      32.37 (9.78)*

Note: ** p < .01. Standard deviation in parentheses. * Indicates that the means were statistically significantly different from the others, but not from each other, at the .05 level.

The ANOVA results indicate that there is a significant school effect on both the initial growth level and the profile match index for both the Grade 3 and Grade 5 cohorts. This suggests that some schools had higher initial achievement scores and some had higher growth rates than the others. To further examine which schools differed significantly from other schools in initial growth level and profile match index, multiple group comparisons using Tukey's Honest Significant Difference (HSD) procedure were performed.

For the Grade 3 cohort, Schools 1 and 2 had higher initial achievement scores than the other schools, as shown in Table 5. On the other hand, Schools 3, 6, and 7 had higher growth rates than the other schools; that is, students in these three schools seem to have made larger gains in achievement. It is interesting that the two schools in the Grade 3 cohort showing the lowest initial scores (Schools 6 and 7) ranked 2nd and 3rd in the profile match index, indicating strong math growth. Additionally, School 1, with the highest initial achievement score, had the lowest profile match index at 20.84.


Table 6: School Differences in Initial Growth Level and Growth Rate for Grade 5 Cohort

Source       df     F value (Initial)   F value (Profile match index)
School (S)   6      6.59**              8.77**
Error        372

Means and Standard Deviations (N = 379)
School       Intercept           Profile match index
1 (n=80)     664.42 (35.56)*     16.25 (6.20)
2 (n=47)     635.36 (30.87)      14.81 (8.78)
3 (n=60)     643.25 (36.05)      16.80 (8.55)
4 (n=25)     638.42 (30.76)      13.17 (8.33)
5 (n=52)     632.29 (30.48)      18.25 (5.95)
6 (n=47)     642.26 (28.00)      18.80 (8.72)
7 (n=68)     641.56 (36.05)      9.90 (9.27)*

Note: ** p < .01. Standard deviation in parentheses. * Indicates that the mean was statistically significantly different from the others at the .05 level.

For the Grade 5 cohort, School 1 again had the highest initial achievement score, as shown in Table 6. School 7, however, has the lowest growth rate although it ranks 4th in initial achievement score at 641.56. This may indicate that growth rates are not predictable from initial performance data, but that the context of individual schools does influence math learning outcomes. This finding is consistent with the literature on school effectiveness (e.g., Alston, 2004; Griffith, 2003). Another issue that a comparison of Tables 5 and 6 raises is the disparity between growth rates in the 3rd grade and 5th grade cohorts. The profile match index of the lowest-growth school in the 3rd grade cohort is 20.84 (School 1), while the highest profile match index in the 5th grade group is 18.80 (School 6). Further study of developmental differences and of math teaching by level is needed to understand such discrepancies.

Discussion

Calls to improve educational outcomes by measuring student and school performance are based on good intentions. Therefore, in conjunction with supporting the use of tests to evaluate performance, public policymakers should also support research on the consequences of such testing. If tests are going to be used to determine which students


will advance, it is imperative that we understand the effects of testing on student achievement. The main purpose of this study was to document students' actual mathematics achievement patterns between 1997 and 2000 in a low-mobility school district, as measured by yearly standardized test scores. The central research issue addressed was what math growth or achievement profiles might realistically be expected from elementary and middle school students. This issue is important because states will be required to document AYP for all students or be labeled deficient under Bush's No Child Left Behind Act of 2001. Studying individual growth profiles allows us to examine the viable progress actually made by students.

The findings of uneven growth in math test scores may suggest that the assumption that children should show linear progress on norm-referenced tests in content areas is problematic and fails to account for the uneven way in which mastery of new concepts is achieved across developmental ages. By setting up punitive consequences for schools that fail to achieve such unrealistic growth curves, these expectations result in unintended actions by schools and states to artificially scale results to appear successful. In many states, the growth rate expectations are set too high, and few students actually show the expected growth patterns mandated by the AYP stipulations. Fearing that students who are unable to meet cut-off scores would be held back a grade, and that many schools would face penalties under the federal No Child Left Behind legislation, some states reduce test standards to avoid sanctions. For example, Michigan's standards, initially among the nation's highest, faced problems in 2002 when 1,513 schools were labeled, by law, as needing improvement, a higher percentage than in any other state. Michigan then lowered the passing rate to certify schools as making adequate progress (Dillon, 2003a). In another recent example, a significant percentage of New York seniors failed to pass the math portion of their Regents Exams and were not eligible for a diploma, a state of affairs that drew community attention and resulted in invalidating the test scores (Dillon, 2003b).

In the analyses presented in this paper, it was shown that math achievement at first increased quickly for both 3rd and 5th grade cohorts, followed by a slower growth rate. Students from both cohorts demonstrated achievement growth over all years, but the data showed that the improved student mathematics achievement did not continue at the same growth rate over the years. The initial growth rate was double the growth observed at later times. The implication of the findings revolves around the incongruity between the actual patterns of growth over four years that we document from students in a stable system and the high-stakes growth expectations of AYP for all students based on standardized testing mandated by law.

According to Linn (2003), policymakers believe that AYP graphs should plot on a straight line, with a fixed annual measurable growth rate between years 2001-2002 and full implementation in 2013-2014. If we were to expect students to progress in such a linear fashion, test scores would increase linearly at a fixed rate. Thus, the AYP requirements present an overwhelming and probably unrealistic challenge to schools and students. Patterns of learning development in humans may be more complex than business models that chart progress in more linear ways. Orlich (2003) suggested that there is a need at the federal level to examine the plausibility of setting AYP targets. Others argue for a more balanced use of composite indices in school accountability systems (Stevens, Parkes, & Brown, 2002).
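The contrast between a straight-line AYP trajectory and the decelerating growth observed in this study can be illustrated with simple arithmetic. The baseline of 40% proficient and the 80%-decay gain schedule below are invented for illustration; only the fixed-increment logic reflects the AYP assumption Linn (2003) describes.

```python
# Illustrative sketch: linear AYP trajectory vs. a decelerating one.
# Assumed numbers (baseline, gains) are hypothetical, not the study's data.

baseline, target, years = 40.0, 100.0, 12           # % proficient, 2002 -> 2014
step = (target - baseline) / years                  # fixed annual increment
ayp_line = [baseline + step * t for t in range(years + 1)]

# A decelerating trajectory: each year's gain is 80% of the previous gain,
# mirroring the "fast early, slower later" pattern the study observed.
decel, gain = [baseline], 9.0
for _ in range(years):
    decel.append(decel[-1] + gain)
    gain *= 0.8

print(f"AYP assumes +{step:.1f} points/year; linear path ends at {ayp_line[-1]:.0f}%")
print(f"Decelerating path ends at {decel[-1]:.1f}%, short of the 100% target")
```

Even a district gaining strongly in early years falls short of the linear target once growth slows, which is the incongruity the AYP critique turns on.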


Data from longitudinal studies are essential to examine the nature of the true gain in student achievement before any definitive conclusions can be drawn. Current legislation may be setting schools and students up for unrealistic expectations of growth, resulting in labeling and inappropriate use of scarce resources to try to produce incremental growth curve results.

A related issue to math growth is whether every student can reach the same achievement level given that they all may have different starting points. The data showed that student subgroups (i.e., students with limited English proficiency or who are in special education programs) had lower initial math scores, but their growth rates were similar to those of their counterparts. Thus, it seems that the achievement gap will not be closed through short-term efforts. Fan (2001) reported that initially good performers had faster growth than initially poor performers; because of these differential growth rates, the initially good performers performed even better over time, widening the gap. Our data, however, show that initially poor performers have the same growth rates as the initially good performers and that, additionally, initially good performers gain less than the initially poor performers, as indicated by the negative correlation between initial scores and growth rate. Thus it appears, in our data, that initially poor performers are "catching up" to some extent.
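The negative correlation between initial scores and growth rate can be made concrete with a small Pearson correlation sketch. The (initial score, yearly gain) pairs below are fabricated for illustration only; the point is the sign of r, not its magnitude.

```python
# Pearson correlation sketch: a negative r between initial score and growth
# rate means low starters tend to gain more. Data pairs are hypothetical.

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

initial = [600, 615, 630, 645, 660, 675]   # hypothetical starting scale scores
growth  = [ 22,  20,  17,  15,  12,  10]   # hypothetical points gained per year

r = pearson_r(initial, growth)
print(f"r = {r:.3f}")   # negative: lower initial scores pair with larger gains
```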

Thus, the implications for educational practice involve teachers knowing that they need to overcome not only the poor performers' initial performance deficit but also the need to accelerate their growth rate. This two-sided barrier (initial deficit and need for an accelerated growth rate) may help explain why it is often so difficult to improve a group of poor performers' relative academic standing in educational practice (Fan, 2001).

Limitations

As with any study, there are several caveats that should be considered prior to generalization of the findings. First, the data used in the present study are from a single school district in the U.S. Thus, great care is warranted when inferring the growth profiles of student learning across very different contexts and countries. For example, compared to complex, mobile urban districts, this district had moved towards standards and accountability measures early in the reform movement. Second, the inclusion of just four time points limits the forms of growth that can be studied. Additional time points would allow investigation of true growth as well as increased power to detect growth in mathematics achievement over time. We acknowledge that multiple factors affect standardized test score outcomes and that assessment should be in line with teaching and educational standards or objectives. The data comparing the initial achievement levels and the growth rate indices across the seven schools suggest that school factors do mitigate the potential for individual growth over time and that further research is needed to identify and map these differences. Finally, given the complexity of school education reform and the situated contexts of diverse learners and communities, it would be of great interest to explore potential mediators or determinants that are likely to shape student achievement, especially from a longitudinal perspective.


References

Allexsaht-Snider, M., & Hart, L.E. (2001). "Mathematics for all": How do we get there? Theory Into Practice, 40 (2), 93-101.

Amrein, A.L., & Berliner, D.C. (2002, March 28). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10 (18). [On-line]. Available: http://epaa.asu.edu/epaa/v10n18/.

Alston, J.A. (2004). The many faces of American schooling: Effective schools research and border-crossing in the 21st century. American Secondary Education, 32 (2), 79-93.

Baker, E.L., O'Neil, H.F., & Linn, R.L. (1993). Policy and validity prospects for performance-based assessment. American Psychologist, 48, 1210-1218.

P, arton, P.E., & Coley, R.J. (1998). Growth in school: Achievement gains.fiom the jburth to tae

eighth grade. Princeton, N J: Educational Testing Service.

Bolon, C. (2001, October 16). Significance of test-based ratings for Metropolitan Boston schools. Education Policy Analysis Archives, 9 (42). [On-line]. Available: http://epaa.asu.edu/epaa/v9n42/.

Cimbricz, S. (2002, January 9). State-mandated testing and teachers' beliefs and practice. Education Policy Analysis Archives, 10 (2). [On-line]. Available: http://epaa.asu.edu/epaa/v10n2.html.

Darling-Hammond, L. (2003). Standards and assessments: Where we are and what we need. Teachers College Record. [On-line]. Available: http://www.tcrecord.org/Content.asp?ContentID=11109.

Davison, M.L. (1983). Multidimensional scaling. New York: Wiley.

Dillon, S. (2003a, May 25). States cut test standards to avoid sanctions. New York Times. [On-line]. Available: http://www.nytimes.com/2003/05/22/education/22EDU.html.

Dillon, S. (2003b, June 25). Citing flaw, state voids math scores. New York Times, p. A1.

Ding, C.S. (2003). Exploratory longitudinal profile analysis via multidimensional scaling. Practical Assessment, Research, and Evaluation, 8 (12). [On-line]. Available: http://ericae.net/pare/Articles.htm.

Ding, C.S., Davison, M.L., & Petersen, A.C. (in press). Multidimensional scaling analysis of growth and change. Journal of Educational Measurement.

Fan, X. (2001). Parental involvement and students' academic achievement: A growth modeling analysis. The Journal of Experimental Education, 70, 27-61.

Griffith, J. (2003). Schools as organizational models: Implications for examining school effectiveness. The Elementary School Journal, 104 (1), 29-47.

Haney, W. (2002, May 6). Lake Woebegone guaranteed: Misuse of test scores in Massachusetts, Part I. Education Policy Analysis Archives, 10 (24). [On-line]. Available: http://epaa.asu.edu/v10n24/.


Kane, T., & Staiger, D. (2001, April). Volatility in school test scores: Implications for test-based accountability systems. Working paper of the National Bureau of Economic Research. [On-line]. Available: http://www.nber.org/papers/w8156.

Kruskal, J.B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1-27.

Lashway, L. (2002). The accountability challenge. Principal, 81 (3), 14-16.

Linn, R.L. (2003, Winter). Requirements for measuring adequate yearly progress. The CRESST Policy Brief, 6. National Center for Research on Evaluation, Standards, and Student Testing [CRESST], University of California, Los Angeles.

Linn, R.L., Baker, E.L., & Betebenner, D.W. (2002). Accountability systems: Implications of requirements of the No Child Left Behind Act of 2001. Educational Researcher, 31 (6), 3-16.

Mehrens, W.A. (1998). Consequences of assessment: What is the evidence? Education Policy Analysis Archives, 6 (13). [On-line]. Available: http://epaa.asu.edu/epaa/v6n13.html.

Middleton, J.A., & Spanias, P. (1999). Motivation for achievement in mathematics: Findings, generalizations, and criticisms of the research. Journal for Research in Mathematics Education, 30, 65-88.

National Commission on Excellence in Education. (1983). A nation at risk: The imperative of educational reform. Washington, DC: U.S. Department of Education.

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

National Council of Teachers of Mathematics. (1991). Professional standards for teaching mathematics. Reston, VA: Author.

National Council of Teachers of Mathematics. (1995). Assessment standards for school mathematics. Reston, VA: Author.

National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.

National Science Board. (1998). Science and engineering indicators: 1998. Arlington, VA: National Science Foundation.

National Science Board. (2000). Science and engineering indicators: 2000. Arlington, VA: National Science Foundation.

No Child Left Behind Act of 2001, Pub. L. No. 107-110, 115 Stat. 1425 (2002).

Orlich, D.C. (2003). An examination of the longitudinal effect of the Washington Assessment of Student Learning (WASL) on student achievement. Education Policy Analysis Archives, 11 (18). [On-line]. Available: http://epaa.asu.edu/vol11.html.

Reckase, M.D. (1997, March). Consequential validity from the test developer's perspective. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.


Stake, R.E. (1995). The invalidity of standardized testing for measuring mathematics achievement. In T.A. Romberg (Ed.), Reform in school mathematics and authentic assessment. SUNY series, reform in mathematics education (Vol. VIII, pp. 173-235). Albany, NY: State University of New York Press.

Schoenfeld, A.H. (2002). Making mathematics work for all children: Issues of standards, testing, and equity. Educational Researcher, 31 (1), 13-25.

Stevens, J., Parkes, J., & Brown, S. (2002, April). The use of composite indices in school accountability systems. Paper presented at the American Educational Research Association Conference, New Orleans, LA.

Suter, L.E. (1995). Is student achievement immutable? Evidence from international studies on school and student achievement. Review of Educational Research, 70 (4), 529-545.

Zancanella, D. (1992). The influence of state-mandated testing on teachers of literature. Educational Evaluation and Policy Analysis, 14 (3), 283-295.

The Authors

CODY DING is an assistant professor in the Division of Educational Psychology, Research, and Evaluation at the University of Missouri-St. Louis. His research interests include assessment of perceptions and how perception influences behaviors, learning and student achievement, and assessment of health behaviors and psychosocial adaptation of students and young adults.

VIRGINIA NAVARRO is an assistant professor in Teaching and Learning at the University of Missouri-St. Louis. She writes about the social construction of identity and urban education.

Correspondence: <[email protected]>