Assessment of preservice teachers using alternative assessment methods

Journal of Personnel Evaluation in Education 10:117-135, 1996 © 1996 Kluwer Academic Publishers, Boston - Manufactured in The Netherlands

Assessment of Preservice Teachers Using Alternative Assessment Methods

DAVID M. SHANNON MARGARET BOLL Auburn University, 4010 Haley Center, Auburn, AL 36849-5221

The current calls for accountability in education have raised significant concerns about methods of teacher assessment. The more traditional methods often provide more standardized scores, making it easy to compare teachers across institutions and nationally. However, there are continued calls for methods that are more authentic in assessing actual teaching ability. This dilemma has two direct implications for the assessment of preservice teachers. First, colleges of education must make determina- tions on how they will assess students throughout their teacher preparation program. Second, State Departments of Education have traditionally been responsible for determining methods of certification assessment. Addressing this issue requires a preliminary examination of the methods that have been most commonly used to assess teachers.

Traditional methods of state certification assessment

Large-scale certification assessment of teachers has traditionally been conducted using multiple-choice paper-and-pencil tests. These types of assessment offer certain advantages: They are accessible from various test publishers; they are relatively inexpensive; they are easy to administer and score; and they provide a basis for more direct comparisons across different institutions or states. Written tests are usually supported on the basis of their content validity. However, multiple-choice tests offer little in terms of face validity. It can also be difficult to convince preservice teachers that these tests alone can be used to measure all the knowledge and skills necessary for effective teaching. Finally, there is little evidence to support the ability of these tests to predict the classroom behavior of teachers (Ayers, 1989; Chiu, 1989; Quirk, Witten, & Weinberg, 1973).

One method commonly used to assess actual teaching behavior is direct observation. This method provides a more credible method for evaluating teacher's performance in the classroom, and this method is an improvement over basing evaluation decisions solely on multiple-choice tests. Direct observation allows teaching to be assessed in the context in which it occurs.

118 D.M. SHANNON & M. BOLL

Observational systems are not without their problems. Observational methods require significant amounts of time, thus limiting their use for certification assessment. Specifically, it would be difficult to observe all teachers before they obtain teaching positions. For example, it would be possible for an incompetent teacher to have already spent one or two years teaching before being denied certification. A second concern is sampling. An accurate judgment of a teacher cannot be made on just one observation. Therefore, many observations are needed to develop an accurate picture of a teacher' s typical classroom performance. These limitations raise the issue of cost. It is very expensive to maintain observational programs because of training, travel, and other expenses.

These traditional methods of assessment have also been criticized because they provide a limited view of teaching. Both methods fail to capture the interrelationships of content-area and pedagogical knowledge and situational factors such as student diversity and school environment (Bird, 1990; Darling-Hammond, 1988; Scriven, 1988, Shulman & Sykes, 1986). Moreover, these methods also fall short of satisfying the criteria proposed as necessary to establish job-related validity (D'Costa, 1993).

The limitations of these traditional methods of teacher assessment have lead to a call for the use of multiple or alternative methods (Furthwengler, 1986; Haertel, 1991; Shulman, 1988). At the completion of a four-year study exploring and developing new methods of teacher evaluation, the Teacher Assessment Project (TAP) recommended the use of alternative approaches such as portfolios and simulated performance assessments (Shulman, 1989). A multipronged assessment approach that incorporates the use of portfolios and simulations has also been recommended by others (Cruickshank & Metcalf, 1993; D'Costa, 1993). Both of these methods may provide advantages over traditional assessment methods, but each also has certain limitations. Let us examine each of these alternatives briefly.

Portfolios

The Ohio Consortium for Portfolio Development (OCPD) (Berry, Kisch, Ryan, & Uphoff, 1991), was established at the request of the Teacher Assessment Project to specifically address issues faced in portfolio development. There is a great deal of discussion of the use of portfolios to assess both preservice and inservice teachers (Barton & Collins, 1993; Bird, 1990; Collins, 1990; Cole 1992; Ryan & Kuhs, 1993; Smolen & Newman, 1992; Wolf, 1991a, 1991b). Portfolios have also been heavily used for national certification purposes (Bradley, 1992; NBPTS, 1992).

Portfolios offer several distinct advantages. This method possesses more face validity than the traditional methods of teacher assessment because portfolios allow the teacher to reflect upon the specific context in which teaching occurs and they involve teachers throughout the planning and implementation phases. Additionally, portfolios provide opportunities for professional development: teachers reflect upon their teaching to determine what evidence best supports their ability to teach. This flexibility provides the portfolio approach with greater applicability to teachers from a

PRESERVICE TEACHER ASSESSMENT 1 19

wide variety of grade levels and subject areas. A further strength is the opportunity provided for teachers to interact with colleagues during the development of their portfolios. This development process also requires gathering support from multiple sources such as lesson plans, teaching materials, journals, audiotapes, and videotapes. This information, gathered over time, provides a cumulative record of a teacher's development and accomplishments.

The primary disadvantages associated with portfolios are time, cost, and consistency. Portfolios are very time consuming compared to pencil-and-paper exams. The development and the evaluation stages take extensive time and are very labor-intensive. Time and labor always translate into money, thus making portfolios a high-cost option for assessment. Because they are designed to capture specific teaching contexts, there is a great deal of inconsistency from one portfolio to another. This inconsistency introduces potential problems related to the reliability and the application of evaluation criteria.

Simulat ion Exercises

Simulation exercises are designed to assess teaching more realistically than pencil- and-paper tests and less expensively than direct observation. Like portfolios, they offer greater face validity than traditional written tests, for they attempt to replicate actual classroom situations. Teachers are often required to perform specific functions of the teaching process, such as planning, instruction, or evaluation. Conditions are created that resemble the actual teaching context as closely as possible, usually at a common assessment center where all teachers engage in the same simulation exercise. This factor results in greater consistency than that associated with portfolios.

Simulations have been used regularly as part of instruction but not typically for teacher assessment purposes (Cruickshank, 1988). Some of the forms that simulations have taken include in-basket tests (Frederickson, Saunders, & Wand, 1957), computerized planning simulations (McNergney, Medley, Allesworth, & Innes, 1983), interactive teaching decision-making simulations (Shannon, Medley & Hays, 1993b), and role playing, micro-teaching, and other group activities (Cruickshank, 1988; Cruickshank & Metcalf, 1993; Haertel, 1991; Jacobson & Stilley, 1992). This richness of choices raises issues that may either enhance or weaken the case for adopting this type of assessment.

One of the most critical concerns is cost. While most simulation exercises may cost more than group administered written tests, they are less expensive than directly observing each individual teacher in his or her own classroom. The extent to which technology is necessary will further influence the cost of developing and implementing simulations. Some simulation exercises require a great deal of technological expertise and resources to create and maintain; others simply require the duplication of written materials to guide the completion of the simulated task.

Another issue is time. Simulations usually require a great deal of time to develop and validate. Time is also an issue when administering the simulation. Some simula-


tions can be administered to groups of teachers while others are limited to individuals, which can be very time-consuming. Training is an additional factor. As with observation, trained personnel are required to facilitate the implementation of some simulations while others can be self-administered. A final issue is evaluation. The results from some computer simulations are scored using built-in scoring features while other require trained professionals to score them.

Purpose

The purpose of this study was twofold: To examine what forms of assessment are being used at various stages of preservice teacher education and to determine if the methods used reflect the call for alternative assessment methods. This article offers a multiple perspective encompassing findings from Colleges of Education throughout the nation to the specific experiences of one college.

We begin by reporting the results of a survey of a national sample of Colleges of Education regarding methods currently used to assess preservice teachers enrolled in teacher education programs. Next, we examine how preservice teachers are being assessed for certification purposes, beginning with a review of national surveys of SDE's and then providing a more detailed examination of one state, Alabama. Finally, the efforts made by one College of Education to implement a preservice teacher assessment system using alternative methods recommended by the profession are discussed.

Assessment of Preservice Teachers in Colleges of Education

Sample The sample of Colleges of Education was randomly drawn from the 1992-1993 American Association for Colleges of Teacher Education (AACTE) membership directory. Only those AACTE institutions offering an undergraduate degree in education were eligible. A random sample of six institutions was drawn from each state, with the exception of those states having fewer than six member institutions. For those states having fewer than six member institutions, an exhaustive sample was drawn. This sampling procedure resulted in a total of 248 institutions, of which 45% were categorized as small (having fewer than 100 graduates annually), 48% as medium (between 100 and 500 graduates annually), and 7% as large (more than 500 graduates annually), according to AACTE size guidelines. A total of 166 (67%) surveys were returned and used in all data analyses. The responding sample of institutions was representative of the initial sample, as 43% were categorized as small, 50% as medium, and 8% as large.

Procedure Each college was asked to complete a two-page survey regarding the assessment of preservice teachers for admission to teacher education, during student teaching, and

PRESERVICE TEACHER ASSESSMENT 121

Table 1. Assessment Methods Used For Admission to Teacher Education.

Assessment Method

Total Written Computer Performance Simulation Portfolio Test Test Assessment

N (%)a N (%)b

Interview

General Knowledge 147 (89%) 124 (84%) 15 (5%) 7 (5%) 2 (1%) 4 (3%) 21 (14%)

Content-area Knowledge 71 (43%) 29 (41%) 2 (3%) 16 (23%) 0 (0%) 7 (10%) 15 (21%)

Pedagogical Knowledge 37 (22%) 10 (84%) 0 (0%) 13 (35%) 2 (5%) 4 (11%) 11 (30%)

Communication Skills 122 (73%) 68 (84%) 5 (4%) 29 (24%) 2 (2%) 5 (4%) 42 (34%)

Interactive Teaching 20 (12%) 0 (0%) 0 (0%) 15 (75%) 7 (35%) 0 (0%) 2 (10%)

Planning Ability 16 (10%) 3 (19%) 0 (0%) 11 (69%) 5 (31%) 0 (0%) 3 (19%)

a Percentages based on the total number of responding institutions (N = 166) b Percentages based on the number of institutions reporting the assessment of this area of teaching indicated under the

"TOTAL" column

for graduat ion. Informat ion was ga thered on the types of assessment used and the componen t s of teaching ta rge ted by these assessments . Speci f ica l ly , co l leges were asked to indicate their use of six assessment methods (i.e., penc i l - and-paper tests, compute r i zed tests, pe r fo rmance assessment , s imulat ions , por t fol ios , and in terviews) to measure six componen t s of teaching (i.e., genera l knowledge , conten t -a rea knowledge , pedagogica l knowledge, communica t ion skills, instruct ional planning, and interact ive teaching) . An opt ion to spec i fy addi t ional informat ion regard ing assessment methods used with preserv ice teachers or what is assessed was also inc luded on

the survey instrument.

A s s e s s m e n t Pr ior to A d m i s s i o n

A summary of the assessment methods used for admiss ion to teacher educat ion is found in Table 1. Asses smen t for the purposes o f admiss ion to teacher educat ion p rograms focussed p r imar i ly on genera l knowledge and communica t ion skil ls . Very few inst i tut ions repor ted the assessment of planning, interact ive teaching, pedagogica l knowledge, or content knowledge at the point of admission.

A total of 147 inst i tu t ions (89%) indica ted that they assessed genera l knowledge at the point of admiss ion . Inst i tu t ions genera l ly es tabl ish min ima l g rade-po in t -average requi rements for entrance to teacher educa t ion programs. The most dominan t form of assessment repor ted for assess ing genera l knowledge was the penc i l - and-paper test,


Table 2. Assessment Methods Used During Student Teaching.

Assessment Method

Total Written Computer Performance Simulation Portfolio Test Test Assessment

N (%)a N (%)b

Interview








"TOTAL" column

which was reported by 124 institutions. Of these 124 institutions, 35 (28%) reported using the PPST and 33 (27%) reported using the NTE for this purpose. Other admission tests, such as the SAT and ACT, were identified by 12 institutions.

Communication skills were assessed by a total of 122 institutions (73%). The assessment of communication skills was also conducted primarily through written tests and interviews. The most often used written test was the NTE, reported by 26 (38%) of the 68 institutions using written tests. The NTE was followed by the PPST, which was reported by 11 institutions (16%). Interviews were reported by 42 institutions (34%) that assessed communication skills prior to admission to teacher education.

Assessment During Student Teaching The greatest variety of assessment methods were reportedly used during student teaching. The emphasis of assessment also shifts from general knowledge, targeted at the admission stage, to the more specialized knowledge and skill required to become a successful teacher. The use of written tests is greatly reduced as performance assessment becomes the dominant form of assessment. Other methods of assessment, such as portfolios and simulations, also begin to surface during student teaching. This information is summarized on Table 2.

A total of 132 (80%) institutions reported the assessment of planning ability. Over seventy percent of the institutions also indicated the assessment of pedagogical knowledge, content-area knowledge, interactive teaching ability, and communication


Table 3. Assessment Methods Used Prior to Graduation.

Assessment Method

Total Written Computer Performance Simulation Portfolio Test Tes t Assessment

N (%)a N (%)b

Interview








"TOTAL" column

skills. Pe r fo rmance assessment was consis tent ly ident i f ied as the p r imary form of assessment . Por t fol ios and s imula t ions were used to the greatest extent for the assessment of planning, in teract ive teaching, and pedagog ica l knowledge . Penc i l - and-paper tests were l imi ted to the assessment of knowledge (i.e., general , content, pedagogica l ) and communica t ion skills.

A s s e s s m e n t P r i o r to G r a d u a t i o n

Table 3 provides a summary of the assessment methods used at the point of graduation. The data on this table reveal that the assessment of knowledge is the pr imary concern. The use of wri t ten tests are repor ted to assess p r imar i ly pedagog ica l knowledge and content knowledge pr ior to graduat ion from teacher education. Ninety-nine insti tutions (60%) repor ted the assessment of pedagog ica l knowledge , while 95 inst i tut ions (57%)

reported the assessment of content-area knowledge. Of the 60 inst i tut ions that ident i f ied a writ ten test to assess pedagogica l knowledge ,

38 (63%) repor ted the NTE. Of the remain ing 22, four inst i tut ions ident i f ied a state- deve loped test, two a un ivers i ty -deve loped test, and two a Nat ional Evaluat ion Sys tem (NES) test. The other 14 inst i tut ions fa i led to ident i fy the specific test that was used to assess pedagog ica l knowledge . Thi r ty- three (33) of the 60 inst i tut ions assess ing conten t -a rea knowledge also ident i f ied the NTE. Five inst i tut ions repor ted state- deve loped tests, three univers i ty-developed tests, three NES-deve loped tests, while the remaining 16 fai led to identify a specific test.


Certification Assessment by State Departments of Education

Overview

Assessment for certification purposes is the point where preservice teachers are most directly affected by assessment methods; it is at this point that they have the most to gain or lose. The results of certification tests determine whether one will continue working in the teaching profession. Recent studies have indicated that 45 states mandate some form of certification assessment (Delandshire, 1994; Haertel, 1991; Shannon & Boll, 1995). A survey of State Departments of Education (SDEs) conducted in 1993 (Shannon and Boll, 1995) indicated that most states continue to rely on pencil-and-paper tests for certification purposes. Of the 45 states that require assessment, 41 states (91%) reported using some form of standardized written examination.

State of Alabama

From this national perspective, let us proceed to examine the experience of Alabama, which has gone through several policy changes in recent years. Alabama is like most states in that certification assessment is mandated. However, the manner in which this assessment has been developed and administered has changed in part due to a lawsuit that challenged the validity of the statewide testing program for preservice teacher certification (Allen v. Alabama State Board of Education, 1985). Currently, the development and implementation of the specific assessment(s) used for certification purposes is planned by specific teacher education programs and approved by the state. Each assessment plan must include the measurement of professional and content knowledge, l

Procedure There are a total of 27 institutions in the state of Alabama that offer undergraduate programs in teacher preparation. Information regarding the methods of assessment used was gathered from two sources. First, each of these institutions was required to submit its exit examination plan to the SDE for approval. These materials were gathered and summarized by the SDE (Office of Professional Services) in the spring of 1993. Information regarding the specific types of assessments used by the 27 institutions was gathered from the SDE summary report. As a follow-up on the SDE summary, we surveyed these institutions approximately 11/2 years later. A total of 23 institutions (85%) responded to the follow-up survey.

Results The pattern established nationally was again born out in Alabama, as indicated by the Alabama SDE summary report. Twenty-five of 27 (92.5%) institutions included a pencil-and-paper test in their preservice assessment plan. Of these 25, 16 institutions (64%) relied solely on a written examination. Five institutions supplemented written


Table 4. Assessment of Preservice Teachers in Alabama.

Forms of Assesments Used N (%)

Written Test 22 (96%) Portfolio 11 (48%) Performance Assessment 9 (39%) Oral Examination 7 (30%) Other 2 (9%)

Table 5. Factors in Selection of Assessment Form(s).

Not Somewhat Very N Important Important Important Important

Validity 23 1 (4%) 0 (0%) 11 (48%) 11 (48%) Reliability 22 0 (0%) 1 (4.5%) 12 (54.5%) 9 (41%) Useability 22 0 (0%) 0 (0%) 13 (59%) 9 (41%) Cost 22 0 (0%) 6 (26%) 13 (57%) 4 (17%) Time 22 1 (4.5%) 7 (32%) 10 (45.5%) 3 (14%)

examinations with oral examinations, five supplemented with the use of a performance assessment, and three supplemented with the use of portfolios.

In our follow-up survey, conducted in the spring of 1995, some changes in these assessment plans were revealed, although written tests were still the method of choice by a wide margin. Twenty-two of the 23 responding institutions reported using some form of a written test. the most noticeable change from the initial SDE summary is the extent to which multiple methods of assessment are being reported. Only 7 of the 22 institutions using written tests failed to report supplemental methods of assessments. Nine institutions reported the use of three or more methods of assessment. These results are summarized in Table 4.

Each institution was also asked to respond to questions regarding the development of its preservice assessment program. On a four-point scale, several factors were rated as to their importance in the selection of an assessment method. The factors of useability, reliability, and validity were determined as most important, since 95% or more of the institutions rated them as either important or very important. Cost and time were evaluated as somewhat less important. These results are summarized in Table 5.

In response to an open-ended question asking respondents to list the strengths and weaknesses of their assessment methods, most comments were about the written exams. The most frequently mentioned strength of written tests (n = 8) was their ability to reflect the content of the program. Other strengths mentioned were simplicity and ease of administration or scoring (n = 5) and validity and reliability (n = 3). Two respondents mentioned national norms for their assessments as strengths, while two respondents mentioned a lack of national norms for their test as a weakness. Other


weaknesses of written exams mentioned included the stress they place upon students, their inability to measure individual strengths and weaknesses, and, for college- prepared exams, the time consumed both in preparation and evaluation.

Few comments were made about other forms of assessment, but the few that were made do provide some insight into why these forms were chosen. Two schools that combined the use of several assessment methods stated that they viewed this variety as a strength; another school that combined portfolios with oral assessments stated that these two provide a culminating assessment that is performance-based and reliable. They only weakness mentioned for alternative forms of assessment was that they were too time consuming.

Auburn University

One institution, Auburn University, incorporates the use of portfolios and simulations in its assessment plan. Auburn is also one of two institutions in the state that does not require a written test. In the following section, we will provide a brief overview of the assessment plan implemented at Auburn University and some of the issues and difficulties we have faced along the way.

The response of the Auburn University College of Education (AUCOE) to the state mandate mentioned previously can be instructive in explicating the processes and challenges involved in developing an assessment program. After a review of existing literature and feedback from the college faculty, an exit evaluation committee, comprised of faculty and administrators from throughout the college made the final decision to implement an assessment plan that included both portfolios and simulations. These forms of evaluation were chosen as representing the most meaningful ways for assessing the widest array of teaching knowledge and skills. The portfolio phase of this evaluation plan was first implemented during the spring quarter of 1993. The simulation component has been pilot tested and is currently being revised but has not been formally administered as part of the overall assessment plan. A detailed account of steps taken to develop and implement the college's assessment plan is provided elsewhere (Shannon, Ash, Barry, & Dunn, 1995). The following discussion will summarize the major issues we have faced throughout the implementation of our assessment system.

The first issue we had to deal with regarded the definition and purpose of these assessments. Several program areas, although they supported the use of portfolios and simulations, were concerned that a common definition and purpose was not shared among students, faculty, and the State Department of Education (SDE). The SDE imposed this requirement as summative, an exit evaluation of teacher education students prior to certification. We (the college) viewed this as more of an opportunity to foster the development of our preservice teachers, which could satisfy the requirements imposed by the state for beginning teachers (Alabama State Department of Education, 1992). The students wanted the portfolio, in particular, to be something they could use for job interviews.


We planned a portfolio that would allow our preservice teachers to represent their development over time and would encourage a great deal of reflection and self- evaluation. The completion of this portfolio would help prepare students for job interviews, but it was not designed specifically for that purpose. We did survey area employers, as well as faculty and students, to determine what types of components they saw as most relevant for inclusion in a portfolio. This feedback was useful in the further shaping of our exit portfolio.

The biggest constraint we have faced has been time. A great deal of time has been necessary throughout the development, implementation, and evaluation phases of our assessment project. We have had to be very patient as we waited for this portfolio system to take the shape we envisioned. The initial portfolio requirements were kept to a minimum so that students could complete them during their internship quarter. These initial components included a professional resume, a classroom management plan, two weeks of lesson plans, a student evaluation instrument, and example(s) of other teacher-made instructional materials. The majority of these components are normally completed as part of the internship responsibilities.

As a result of feedback from interns and faculty, the portfolio requirements were revised for the fall of 1993. The revised components, which remained in effect for the 1993-1994 academic year, were a professional resume, a self-evaluation of one's teaching, the lesson plan that proved most successful, the lesson plan that proved least successful, and a student evaluation instrument or process. These revised components incorporated reflection and self-evaluation. Interns were required to write reflective statements to accompany each lesson plan and the student evaluation component. In the reflective statements for each of these components, interns addressed the context in which the component was used, specific features of the component that proved successful and unsuccessful, and specific insight(s) gained as a result of using the component and looking back at what happened when it was used. The purpose of the self-evaluation was "for [interns] to reflect upon the knowledge, skills, and qualities that [they] have acquired during [their] teacher training program at Auburn University" (Shannon, Ash, Barry, & Dunn, 1995, p. 10). The current exit portfolio is introduced to students during their teacher education orientation class.

Planning has been completed for expanding the current exit portfolio into a much more comprehensive portfolio that students complete throughout their teacher preparation program. This portfolio will be comprised of three sections. The first section consists of five college-wide requirements. The second section of portfolio components are determined by the student's program area (e.g., early childhood, elementary, secondary). Each program area has specified which components best reflect the required coursework, field experiences, and other experiences in which students engage as part of their teacher preparation. The final section of portfolio components is determined by the student. Implementation of this plan has been delayed due to fiscal restraints that forced the college to allocate resources elsewhere.

Developing and implementing the simulation has been even more time consuming than the portfolio. We have had to postpone its inclusion in our overall assessment program. Our plans were to revise the Simulation of Interactive Decision-making


(SID), developed for teachers (Shannon, Medley, & Hays, 1993a, 1993b), for use with our preservice teachers. These revisions included the computerization of the simulation, a built-in scoring system, and the use of remote student response keypads so that it could be used with up to 24 students at a time, minimizing the cost of additional computers.

We experienced our first delay of over six months waiting for the arrival of the necessary computer hardware. This delay complicated the arrangements made for having the necessary assistance in computerizing the simulation. When all the required hardware did finally arrive, the designated personnel time was no longer available. The content of the simulation has been piloted in a preliminary form, but the computerized package has yet to be introduced. These delays have raised financial concerns. Time delays also mean costs- - in personnel and in resources. To complete the simulation development in a timely fashion, more personnel or consultant time will be necessary.

Another issue we have continually faced is that of evaluation. Once the simulation is ready, the evaluation process will not be a great concern because of the assistance of a built-in scoring system. On the other hand, the evaluation of portfolios has raised some concerns. Who should evaluate portfolios? What criteria should be used for evaluation? How can consistency be achieved with many different supervisors conducting the evaluation? While we hesitate to say we solved these problems, we have arrived at a working solution. Let 's answer each question in turn. First, who evaluates? Because the components of portfolios will differ depending on a student's program area, we believe that the faculty in specific program areas are those who are most qualified to evaluate portfolios, but the reality is that some program areas have more students than others, increasing the supervision load tremendously. To distribute the workload more evenly, the internship (university) supervisor, who is assigned a maximum number of students per quarter, currently has the primary responsibility for evaluation.

Second, what criteria is used to determine what is satisfactory? The definition of the term satisfactory is determined by the program area. The college does provide some general guidelines for the evaluation of reflective statements and self-evaluations, but each area specifies the criteria for each portfolio component. Portfolios that are determined to be unsatisfactory are reviewed by other faculty in the program area and discussed with the intern. The intern is then asked to address the areas of weakness before resubmitting the portfolio for another evaluation.

Third, how is consistency maintained? Consistency is an inherent weakness in any system that relies on personal judgment. Our attempts to address this problem have focused on providing guidelines and orientation meetings for university supervisors. The same evaluation guidelines are shared with all students.

One last issue worth discussion is that of management. Since this college-wide portfolio system has been in place, the requirement of portfolios for specific coursework has increased tremendously. This increase has made the management and coordi- nation of the overall portfolio system more difficult. The current portfolio plan was developed as an attempt to reduce the overlap between the college-wide portfolio and individual course-required portfolios by allowing each program a voice in determining specific components. While this attempt to integrate portfolio requirements within the


curriculum has made some progress, many faculty continue to require specialized portfolios as partial fulfillment of course requirements. This trend threatens to raise the specter of "portfolio overload" among students and may contribute to a decreasing sense of the significance of the exit portfolio.

Conclusion

How has the profession responded to the call for the use of multiple and alternative forms of teacher assessment? The findings from this study suggest a mixed response. Generally, the predominant use of the traditional methods of teacher assessment, written tests and observation, continues. The assessment of preservice teachers at the times of admission to and graduation from teacher education programs also continues to rely heavily on written tests. The use of observation is the overwhelming choice for evaluating student teachers. For certification purposes, written tests have continued to be the most dominant form of state-mandated teacher assessment.

The experience in Alabama where Colleges of Education have had the opportunity to select their own assessment method(s) offers a different outlook. The initial reaction to a state mandate resulted in the predominant use of written tests and little use of other assessment methods. However, after a period of l 1/2 years, the use of multiple methods of assessment has begun to emerge. Written tests continue to be used by over 95% of the state's teacher education programs, but the usage of alternative formats, such as portfolios, has doubled.

Why has change in assessment methods moved so slowly despite the call for alternative methods and dissatisfaction with traditional multiple-choice tests? Perhaps some of the issues faced at Auburn and discussed recently in the profession may provide some insight regarding the slow response to the call for use of alternative assessment methods. The implementation of an assessment plan that includes portfolios and simulations requires a great deal of time and patience.

We found that developing a successful assessment plan is evolutionary in nature, requiring the resolution of conflicting points of view regarding the definition, purpose, and form of assessment. Alternative assessments can assume a great variety of forms and a lack of clarity and specificity found in many alternative assessments continues to trouble thoughtful practitioners in adopting these forms of assessment (Freur & Fulton, 1993; Worthen, 1993). Meyer (1992) distinguishes authentic assessment from other types of alternative, performance assessment as assessment that occurs in a real-life context. Using this definition, portfolios and simulations would most often be defined as authentic.

These alternative assessments serve a critical diagnostic role and provide teachers with valuable feedback about their teaching ability. We have experienced a conflict between the formative and summative roles our portfolio assessment is intended to serve. A serious question has been raised as to whether the use of these assessments for formative and diagnostic purposes is compatible with the summative and evaluative purposes they are also intended to serve (Madaus & Kelleghan, 1993). To be


successful, these assessments must be introduced with a clear purpose and implemented in a manner consistent with their purpose (LeMahieu, Gitomer; & Eresh, 1995; Worthen, 1993).

A gradual approach should also be taken in introducing such an alternative assessment system and in increasing the expectations of student performance. An assessment system is not likely to succeed if it is simply a directive from the administration. The long-term success of an alternative assessment system depends on the key stakeholders (e.g., faculty and students) believing that the assessment is important and the results will be useful (Abruscato, 1993; Worthen, 1993). We have made regular visits with faculty as an effort to address their concerns for such an assessment system and what would be required of them and their students. We believe that this approach has helped to clarify the purpose for both faculty and students, resulting in an increased level of comfort and confidence with the methods chosen. A gradual implementation of such an assessment system is necessary to overcome resistance to change and to achieve greater acceptance of alternative assessment method(s), which require more involve- ment on the part of all concerned.

The issue of evaluation further complicates the use of alternative assessment methods. Alternative assessments are not easily scored. Efficient implementation of alternative assessment methods depends on the establishment of clearly stated criteria, by which successful performance can be judged. There are no standardized scoring keys. Professional educators must agree on which criteria best represent successful teaching and evaluate student performance against these criteria. Reaching such agreement within the teaching profession, or even within a specific teacher preparation program, however, presents great difficulty and further complicates the effective use of alternative assessment methods (Cruickshank & Metcalf, 1993; D'Costa, 1993).

It is extremely important that student performance on alternative assessment methods be reviewed by well-qualified judges, who are trained in the use of an agreed- upon scoring rubric. Basing an overall judgment of whether a preservice teacher has performed satisfactorily on multiple judges also allows for the estimation of reliability, which has often been the target of criticism with alternative assessment methods. It is especially true for using portfolios. The outcomes of portfolio assessment are not consistent across academic disciplines; therefore, it is necessary to reach consistent judgments as they apply to individual portfolios. The establishment of a portfolio review committee is very useful in dealing with the inconsistencies that are likely to arise (Ryan & Kuhs, 1993; Shannon, Ash, Barry, & Dunn, 1995). There is evidence that consistency in judgment can be reached and the use of multiple judges should continue to be an important component of a portfolio-based assessment system (LeMahieu, Gitomer, & Eresh, 1995).

The Alabama experience highlights another possible constraint. Currently public opinion is favoring setting standards at the national or state level. Legislators, acting in accordance with both their own biases and with the support of public opinion, are opting for nationally normed tests that provide a baseline measurement for minimal competency. The public is more easily convinced that mass-administered tests that have been prepared by national testing services provide objective and unbiased results.


These tests are therefore perceived as more valid than are assessments that introduce questions of consistency and subjectivity, such as portfolios. Neither legislators nor the public are persuaded by experts in the field who argue that pencil-and-paper tests may not test what is really important. In the argument between professionals in the field of education and the public, there is clearly a split between what is considered most important--with the professionals coming down on the side of meaningful assessment while the public is coming down on the side of replicability and ease of comparison.

Several issues currently confront the public and interfere with the acceptance of alternative assessment methods. First, because assessments, such as portfolios and simulations, are designed to capture more of the context in which teaching occurs, they do not produce standardized data that can be easily compared across teachers. Results from alternative assessment methods is very meaningful to the teacher but more difficult to summarize for larger audiences such as the public. This lack of comparable data has presented difficulties elsewhere (Madaus & Kellaghan, 1993) and presents an obstacle toward public understanding and acceptance of the use of alternative assessment methods (Cruickshank & Metcalf, 1993; Worthen, 1993). Using alternative assessment methods for certification purposes will also further complicate the problem of reciprocity among different states (D'Costa, 1993).

A second issue that concerns the public and impedes the acceptance of alternative assessment methods is cost, both in terms of time and money. Assessment methods such as portfolios and simulations typically cost more to develop and implement than do other assessment methods. The question is raised as to whether the benefits of such assessment justify the increased costs (Worthen, 1993). Increased costs not only postpone the implementation of such programs, as in the Auburn experience, but have seriously threatened the continuation of ongoing assessment programs in England and Scotland (Madaus & Kelleghan, 1993).

Given their costly nature and lack of standardized results, will alternative assessments ever gain acceptance in the high-stakes assessment arena? Because of the cumbersome nature of portfolios to administer, develop, and evaluate, we believe that the successful use of portfolios for large-scale assessment is very unlikely. The time and money required to evaluate individual portfolios for all candidates for teacher certification, particularly in populous states, as well as the time needed to develop portfolios, will probably keep them beyond the means of most states. Portfolios are more amenable for use in smaller, more manageable situations, such as individual courses or programs, and thus will probably be utilized with more frequency in colleges of education.

Simulation exercises face obstacles that are similar to those faced by portfolios, depending upon the format the simulation takes. Simulations, however, have a greater potential for large-scale adoption. Many simulation exercises can be computer- administered and can be completed with few personnel requirements. Simulations can also be designed with built-in scoring programs, saving great amounts of time. In other words, simulations Can be designed to include many of the features of group- administered written tests that are favorable for large-scale use.

So, what must be done for alternative assessments to overcome the many challenges


they face and become widely used and accepted? First of all, it will probably be necessary to concentrate our efforts on the development of assessments that are intended to assess more generic teaching skills (Cruikshank & Metcalf, 1993). This doesn ' t mean that all alternative assessment methods must become more generic in nature. Only those intended for large-scale assessment, such as state certification, must produce results that are more manageable and comparable across different teachers and institutions.

A second critical element is technology. Computer-based systems already exist that can be used to assist in the management and scoring of portfolios (Kurland, 1991, cited in Worthen, 1993). As technology has advanced, they have become much more accessible and affordable. The use of technology, such as computers, needs to be further investigated so that the labor-intensity currently associated with alternative assessment methods can be reduced. The establishment of more efficient delivery and scoring systems will greatly reduce the time and costs involved in the use of alternative assessment methods. Technology can also play a critical role in the establishment of a professional network that offers the instructional and technical support necessary for the successful implementation of an alternative assessment program.

Another practical approach to dealing with the costs of alternative assessment is genuine matrix sampling (Popham, 1993). Using such an approach requires that only a portion of students and assessment tasks be sampled so that time and costs are reduced. Popham (1993) recommends that state assessment officials help identify a range of assessment tasks to be used in a given year. A sample of these assessment tasks are then administered to a sample of students. To increase the comparabili ty of the results, students should be sampled so that a sufficient number of schools are represented. This approach offers promise and should be explored further as an approach to deal with the high costs of alternative assessment methods.

Further research is also needed to determine why a gap continues to exist between the assessment preferences of academicians and their adoption. First, it is necessary to investigate why many institutions have chosen not to adopt these alternative methods. Second, we need to explore other institutions that have experimented with alternative assessment methods to determine the extent to which their experiences are parallel to those we have confronted. Perhaps more important, there is a need for evidence supporting the predictive validity of portfolios and simulations. As such evidence accumulates, the impetus for changing from pencil-and-paper tests should accelerate within the profession.

Notes

1. This situation is now somewhat obscured due to legislation passed in June of 1995 that requires (as of August 1, 1995) colleges of education to administer a national teacher's exam to preservice teachers before they graduate (Rawls, 1995). Exactly how this legislation will be interpreted and affect the current certification mandate is still unclear. This law may also face legal challenges in light of the language contained in the lawsuit, which settled the previous challenge to testing in the state; it was settled by an agreement that required that no testing be used "where the difference in failure rate between the races was more than 5


percent" (Rawls, 1995, p. 3B). If the test results in differential failing rates, legal challenges may arise despite protestations of the bill's sponsors that having the testing done by colleges for graduation, not by the State Board of Education, gets around this problem.

References

Abruscato, J. (1993). Early results and tentative implications from the Vermont Portfolio Project. Phi Delta Kappan, 74(6), 474~,77.

Alabama State Department of Education. (1992). Criteria for the Alabama Professional Education Personnel Evaluation System. Montgomery, AL: Author.

Allen v. Alabama State Department of Education, 612 E Supp. 1046 (M.D. Ala. 1985) Association of Teacher Educators. (1988). Teacher assessment. Reston, VA: Author. Ayers, J. (1989). The NTE, PPST, and classroom performance. Jefferson City, MO: Association of Teacher

Educators. (ERIC Document Reproduction Service No. ED 305 373). Barton, J., & Collins, A. (1993). Portfolios in teacher education. Journal of Teacher Education, 44(3),

200-210. Berry, D., Kisch, J., Ryan, C., & Uphoff, J. (1991). The process and product of portfolio construction. Paper

presented at the annual meeting of the American Educational Research Association, Chicago, IL., April, 1991.

Bird, T. (1990). The schoolteacher's portfolio: An essay of possibilities. In J. Millman and L. Darling- Hammond (Eds.), The New Handbook of Teacher Evaluation: Assessing Elementary and Secondary School Teachers, 2nd ed., Newbury Pack, CA: SAGE Publications.

Bradley, A. (1992). Pilot test offers glimpse of board's teacher assessment. Education Week, 12(2). Chiu, Y.C. (1989). An investigation of the relationships between the National Teacher Examinations and a

performance-based assessment of professional knowledge. Unpublished doctoral dissertation, University of Virginia, Charlottesville.

Cole, D.J. (1992). The developing professional: Process and product portfolios. Paper presented at the annual meeting of the American Association of Colleges of Teacher Education (AACTE), San Antonio, TX.

Collins, A. (1990). Transforming the assessment of teacher: notes on a theory of assessment for the 21st century. Paper presented at the annual meeting of the National Catholic Education Association, Toronto, Canada.

Cruickshank, D.R., & Metcalf, K.K. (1993). Improving preservice teacher assessment through on-campus laboratory experiences. Theory into Practice, 32(2), 86-92.

Cruickshank, D. (1988). The uses of simulation in teacher preparation. Simulations and games, 19, 133-156. D'Costa, A.G. (1993). The impact of courts on teacher competence testing. Theory into Practice, 32(2),

104-112. Darling-Hammond. Educational Leadership, 46, 11-15, 17. Delandshere, G. (1994). The assessment of teachers in the United States. Assessment in Education, 1(1),

95-113. Eissenburg, T.E., & Rudner, L.M. (1988). State testing of teachers: A summary of current practices.

Washington, DC: American Institute for Research. Feuer, M.J., & Fulton, K. (1993). The many faces of performance assessment. Phi Delta

Kappan, 74(6), 478. Frederickson, N., Saunders, D.R., & Wand, B. (1957). The In-Basket Test. Psychological

Monographs: General and Applied, 71 (9), 1-28. Furthwengler , C. (1986). Mult iple data sources in teacher evaluation. Paper presented at the

National Council on Measurement in Education, San Francisco, CA. Haertel, E. (1991). New Forms of Teacher Assessment. In G. Grant (Ed.), Review o f Research in

Education, Washington, DC: American Educational Research Association.


Jacobson, L., & Stilley, L.R. (1992). "Developing and scoring assessment center exercises." Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA:. ERIC Reproduction Document No ED 351361).

Kurland, D.M. (1991). "Text Browser: A computer-based tool for managing, analyzing, and assessing student writing portfolios." Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

LeMahieu, P.G., Gitomer, D.H., & Eresh, J.T. (1995). Portfolios in large-scale assessment: difficult but not impossible. Educational Measurement: issues and Practice, 14(3), 11-16, 25-28.

Madaus, G.E & Kellaghan, T. (1993). The British experience with "authentic" testing, Phi Delta Kappan, 74(6), 458-469.

McNergney, R.F., Medley, D.M., Aylesworth, M.S., & Innes, A.H. (1983). Assessing teachers' planning abilities. Journal of Educational Research, 77(2), 108-111.

Meyer, C.A. (1992). What's the difference between authentic and performance assessment? Educational Leadership, 49(8), 39-40.

National Board for Professional Teaching Standards. (1992). Toward high and rigorous standards for the teaching profession. (3rd edition). Detroit, MI: Author.

Popham, W.J. (1993). Circumventing the high costs of authentic assessment. Phi Delta Kappan, 74(6), 470-473.

Quirk, T., Witten, B., & Weinberg, B. (1973). Review of studies of the concurrent and predictive validity of the National Teacher Examinations. Review of Educational Research, 43(1), 89-114.

Rawls, P. (1995, June 22). Teacher testing to become law. Montgomery Advertiser, p. 3B. Ryan, J.M., & Kuh, T.M. (1993). Assessment of preservice teachers and the use of portfolios.

Theory into Practice, 32(2), 75-81. Sandefur, J.T. (1985). Competency assessment of teachers. Action in Teacher Education, 7(1),

1-6. Scriven, M. (1988). Duty-based evaluation. Journal of Personnel Evaluation in Education, 1(4),

319-334. Shannon, D.M., Ash, Barry, N.H., B.H., & Dunn, C. (April, 1995). Implementing a Portfolio-

Based Evaluation System for Preservice Teachers. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.

Shannon, D.M. & Boll, M. (March, 1995). Teacher assessment methods: theory vs. reality. Paper presented at the annual meeting of the Eastern Educational Research Association, Hilton Head, SC.

Shannon, D.M., Medley, D.M., & Hays, L. (1993a). The construct validity of a simulation exercise in classroom decision-making. Journal of Educatinal Research, 86(3), 180-183.

Shannon, D.M., Medley, D.M., & Hays, L. (1993b). Assessing Teacher's Functional Professional Knowledge. Journal of Personnel Evaluation in Education, 7(1), 7-20.

Shulman, L.S. (1989). The Paradox of Teacher Assessment. New Directions for Teacher Assessment (pp. 13-27). Princeton, N J: Educational Testing Service.

Shulman, L.S. (1988). A union of insufficiencies: Strategies for teacher assessment in a period of educational reform. Educational Leadership, 46, 36-4 1.

Shulman, L.S., & Sykes, G. (1986). A national board for teaching: In search of a bold standard. Paper prepared for the Task Force on Teaching as a Profession, Carnegie Forum on Education and the Economy.

Smolen, L., & Newman, C. (1992). Portfolios: An Estimate of Their Validity and Practicality. Paper presented at the annual meeting of the Eastern Educational Research Association, Hilton Head, SC., February, 1992.

Wilson, A.J. (1985). Knowledge for teachers: the origins of the National Teacher Examinations Program. Paper presented at the annual meeting of the American Educational Research Association (AERA). Chicago, IL.ERIC Reproduction Document No. ED 262049)


Wolf, K. (1991a). The schoolteacher's portfolio: Issues in Design, Implementation, and Evaluation. Phi Delta Kappan, 73, 129-136.

Wolf, K. (199 l b). Teaching Portfolios: Synthesis of Research and Annotated Bibliography. San Francisco, CA: Far West Laboratory.

Worthen, B.R. (1993). Critical issues that will determine the future of alternative assessment. Phi Delta Kappan, 74(6), 444--454.

Documents

Assessment of preservice teachers using alternative assessment methods