High-Stakes Testing Washback: A Survey on the Effect of Iranian MA Entrance Examination on Teaching

Mojtaba Mohammadi1

High-stakes tests are effective instruments for bringing about change. They affect the participants as well as the process and product of an educational system. The MA Entrance Examination in Iran is a case in point. It is primarily designed to screen candidates for postgraduate studies. Nevertheless, its effects in the classroom, generally known as “washback” in applied linguistics, often go beyond what its designers expect. This paper reports a survey of the washback effect of the MA Entrance Examination on teachers’ methodology and attitudes. The 45 subjects, all of whom were university professors, were selected through convenience sampling. A validated researcher-made questionnaire was then administered. To obtain more reliable data, some of the subjects were randomly selected for interview so as to cross-check the data collected through the questionnaire. The data analysis revealed that the majority of the subjects were positively affected by the examination. Moreover, they were fully aware that their methodology and attitudes had gradually been adjusted to the demands of the examination.

Keywords: Washback, High-stakes tests, MA Entrance Examination, Teachers

1 Islamic Azad University - Roudehen Branch

Introduction

It has been widely acknowledged that tests, especially high-stakes ones such as school-leaving examinations, employment exams, or university entrance exams, can directly or indirectly influence educational systems. The reason probably lies in the fact that they usually serve a set of decisive functions in testees’ lives, ranging from employment and promotion to placement and achievement. Brown (1996) distinguishes two categories of tests: those that help administrators and teachers make program-level decisions and those that help them make classroom-level decisions. The results of such tests, which Wall (2000) called ‘differentiating rituals’, can sometimes be so crucial to the testees' future lives that the testees take any possible measure to get through them. The story is much the same for the other human elements (e.g., teachers and administrators) and non-human elements (e.g., materials and curriculum) in an educational system. Each of these elements is also expected to adopt and adapt certain skills, techniques, and tasks in order to meet the test demands and satisfy the students' needs. With these issues in mind, I conducted this survey to examine the effect of the MA Entrance Examination, held annually in Iran to screen postgraduate applicants, on the methodology of university professors in their undergraduate courses and on their attitudes toward the probable effect of the examination on them.

Washback and Related Concepts

A relatively new topic in general educational circles is the phenomenon called “backwash”, a term used to refer to the influence of tests on teaching. Within language assessment, however, the term “washback” has recently been preferred (Alderson and Wall 1993; Messick 1996; Cheng 1997; Alderson 2004). Shohamy (1992) also focuses on washback in terms of language learners as test-takers when she describes "the utilization of external language tests to affect and drive foreign language learning in the school context" (p. 513; as cited in Bailey, 1996, p. 3). Some scholars, like Cheng (1997), see the term ‘washback’ as a change in curriculum and use it to indicate “an active direction and function of intended curriculum change by means of the change of public examinations” (p. 38). As part of consequential validity, Messick (1996: 261) says that:

Washback refers to the extent to which the introduction and the use of a test influences language teachers and learners to do things that they would not otherwise do that promote or inhibit language learning. (Fulcher and Davidson, 2007: 221)

The focus of any washback study, as Fulcher and Davidson (2007) claim, is “on those things that we do in the classroom because of the test, but ‘would not otherwise do’”.

A number of other key terms have also emerged in the literature which, as Cheng (2005) claims, seem to convey a similar meaning and are equated with washback (Green, 2007). Shohamy (1993a, p. 4) summarized some of these key concepts:

1. Measurement-driven instruction refers to the notion that tests should drive learning.

2. Curriculum alignment focuses on the connection between testing and the teaching syllabus.

3. Systemic validity implies the integration of tests into the educational system and the need to demonstrate that the introduction of a new test can improve learning. (Bailey, 1999)

Morrow (1986) coined the term ‘washback validity’ and defined a valid test as follows: “… test is valid when it has good washback” and conversely “… test is invalid when it has negative washback” (Alderson and Wall, 1993).

More recently, Bachman and Palmer (1996, pp. 29-35) have discussed the impact of a test as distinguished from its washback. The impact of test use, in their view, operates at two levels: the micro level (i.e., the effect of the test on individual students and teachers) and the macro level (i.e., the impact on society and its educational systems). Many scholars place these two concepts within the realm of a theoretical notion first introduced by Messick. According to this notion, called ‘consequential validity’, the social consequences of testing are part of a broader, unified concept of test validity (Messick, 1989, 1996). Arguing against Morrow’s washback validity, he suggests that tests which satisfy validity criteria are more likely to have a positive influence on teaching and learning, and so counsels that washback is not a sign of test validity, but that a valid test is likely to generate positive washback (Green, 2007).

Aspects of Washback

Wall (2000) considered tests as having both positive (beneficial) and negative (harmful) effects. The positive effects she listed for teachers included inducing them to cover their subjects thoroughly, forcing them to complete their syllabuses within a prescribed time limit, compelling them to pay as much attention to weak pupils as to strong ones, and making them familiar with the standards which other teachers and schools were able to achieve. Quoting Wiseman (1961), Wall mentioned the possible negative effects of tests as encouraging teachers to `watch the examiner's foibles and to note his idiosyncrasies' in order to prepare pupils for questions that were likely to appear, limiting the teachers' freedom to teach subjects in their own way, encouraging them to do the work that the pupils should be doing, tempting them to overvalue the type of skills that led to successful examination performance, and convincing them to pay attention to the `purely examinable side' of their professional work and to neglect the side which would not be tested.

To cut through the complexity of these concepts, Alderson and Wall (1993) explicitly stated 15 washback hypotheses, drawing on the literature and their own experience. The factors that a test may influence are teaching, learning, content, rate, sequence, degree, depth, and attitudes, as well as the number of teachers or learners affected. Hughes (1993) suggested a trichotomy model for washback, with participants, process, and product as its components. In Hughes' framework, participants include language learners and teachers, administrators, materials developers, and publishers, "all of whose perceptions and attitudes toward their work may be affected by a test". The term process covers ‘any actions taken by the participants which may contribute to the process of learning’. According to Hughes, such processes include materials development, syllabus design, changes in teaching methods or content, learning and/or test-taking strategies, etc. Finally, in Hughes' framework, product refers to "what is learned (facts, skills, etc.) and the quality of learning (fluency, etc.)" (as cited in Bailey, 1999).

Watanabe (2004) conceptualized washback in terms of its dimensions (specificity, intensity, length, intentionality, and value), the aspects of learning and teaching that may be influenced by the examination, and the factors mediating the process by which washback is generated (test factors, prestige factors, personal factors, and macro-context factors).

Andrews et al. (2002) found in their study that the impact of a test can be immediate or delayed. According to these researchers, washback seems to be associated primarily with ‘high-stakes’ tests, that is, tests used for making important decisions that affect different sectors. That is why this paper deals with the influence of the MA Entrance Examination, as a high-stakes test, on the aspects mentioned in Alderson and Wall’s washback hypotheses. Before turning to the study itself, I would like to touch upon some of the empirical studies on what this paper is concerned with, that is, teachers’ methodology, attitudes, and content.

Washback and Language Teachers

Among groups such as counselors, administrators, course designers, and materials developers, the most visible participants in washback studies, according to Hughes' framework, are language teachers. A number of studies on teachers have enriched the washback literature.

Cheng (1997), reporting on the revised Hong Kong Certificate of Education Examination (HKCEE), found that "84% of the teachers commented that they would change their teaching methodology as a result of the introduction of the revised HKCEE" (p. 45). Focusing on methodology washback, Lam (1994) concluded that experienced teachers were much more examination-oriented than their younger counterparts (p. 91) and underlined changing the teaching culture as the challenge (as cited in Bailey, 1999, p. 20).

A landmark study in the investigation of washback is undoubtedly Wall and Alderson’s (1993) Sri Lankan impact study, which ended with the following summary statements (p. 67):

1. A considerable number of teachers do not understand the philosophy/approach of the textbook. Many have not received adequate training and do not find that the Teacher's Guides on their own give enough guidance.

2. Many teachers are unable, or feel unable, to implement the recommended methodology. They either lack the skills or feel factors in their teaching situation prevent them from teaching the way they understood they should.

3. Many teachers are not aware of the nature of the exam - what is really being tested. They may never have received the official exam support documents or attended training sessions that would explain the skills students need to succeed at various exam tasks.

4. All teachers seem willing to go along with the demands of the exam (if only they knew what they were).

5. Many teachers are unable, or feel unable, to prepare their students for everything that might appear on the exam.

In her report revisiting the Sri Lankan impact study, Wall (1996) stated that the examination had had considerable impact on the content of English lessons and on the way teachers designed their classroom tests, but little to no impact on the methodology they used in the classroom or on the way they marked their pupils' test performance.

In another study, Shohamy et al. (1996) observed that a new test of Arabic (ASL) made class activities more test-like and left teachers and students highly motivated to master the materials (p. 301). Experienced teachers, they added, “turned to the test as their main source of guidance for teaching oral language”, while novice teachers used "a variety of additional activities” to do so (as cited in Bailey, 1999). In the context of Japan, Watanabe (1996) found that entrance exams did not influence all teachers in the same way and proposed three factors that can promote washback in teachers: 1) teachers' educational background and/or experiences, 2) differences in teachers' beliefs about effective teaching methods, and 3) the timing of the researcher's observations. Watanabe concluded that "teacher factors may outweigh the influence of an examination" (ibid., p. 331) in terms of how exam preparation courses are actually taught.

Chen (2002) also investigated the effects of public exams on teachers. Chen concluded by enumerating the factors that can influence the degree of washback on teachers: teaching experience; teachers’ education; teachers’ fear of or embarrassment at their students’ poor performance; teachers’ awareness of test content; level of stakes; and gender.

Methodology

The participants in this study were 45 Iranian university professors who were teaching English to undergraduate students. All held either an MA or a PhD in Teaching English as a Foreign Language (TEFL) or English Language and Literature (ELL). The population comprised all Islamic Azad University (IAU) branches in zones 8 and 12. The subjects, 26 male and 19 female teachers, were selected using convenience sampling.

To determine the extent to which the subjects (their methodology, teaching contents, and attitudes toward teaching and testing) were influenced by the national MA Entrance Examination, a researcher-made questionnaire was administered. Of the 67 people who received a copy of the questionnaire, 45 returned it completed. The questionnaire consisted of two sections. The first section comprised 20 statements with a Likert-scale response format ranging from “very much” (given a weight of 5) to “not at all” (given a weight of 1). The second section was a brief set of mostly selection-type items on the subjects’ demographics, including gender, major, teaching experience, etc. To check the reliability of the Washback Effect Questionnaire, a pilot study was conducted with 20 English teachers. The test-retest method was applied with a three-week interval between the two administrations. The Pearson product-moment correlation coefficient was then calculated, yielding an index of 0.85.
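The test-retest computation described above can be sketched as follows. This is only an illustration under stated assumptions: the pilot scores are not reported in the paper, so the two score lists below are invented, and the function name `pearson_r` is hypothetical. The sketch correlates each teacher's total questionnaire score on the first administration with the same teacher's total on the second.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson product-moment correlation between two paired score lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two paired lists with at least two scores")
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    var_x = sum((x - mean_x) ** 2 for x in first)
    var_y = sum((y - mean_y) ** 2 for y in second)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Invented total scores (20 items weighted 1-5, so totals fall between 20 and 100).
# These are NOT the pilot data from the study.
first_admin = [62, 71, 55, 80, 67, 74, 59, 88]
second_admin = [60, 73, 58, 78, 70, 71, 61, 85]

r = pearson_r(first_admin, second_admin)
print(f"test-retest r = {r:.2f}")
```

A coefficient close to 1 indicates that respondents answered consistently across the two administrations; the paper reports 0.85 for its own pilot sample.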

As the researcher wished to arrive at more reliable and valid results, he decided to triangulate the data by using a structured interview to cross-check the data collected through the questionnaire. Thus, after an interval of three weeks, long enough for the respondents not to remember their responses to the questionnaire items, 15 of the teachers who had returned their questionnaires were interviewed. The questions were mostly reworded forms of the questionnaire statements. The responses were tape-recorded for later detailed investigation.

Data Analysis

After careful analysis of the answers given to the questionnaire items, I arrived at some interesting results. First, to introduce the sample, some preliminary statistics are presented. The subjects were 45 English language professors majoring in Teaching English as a Foreign Language (TEFL) (69%) or English Language and Literature (ELL) (31%). They were 57% male and 43% female. Their age ranges are summarized in the following table:

Age Range      N.    Per.
26 – 35        14    31.1
36 – 45        19    42.2
46 - above     12    26.6

Table 1: Number & percentage of the subjects’ age ranges

Regarding their experience as language teachers, they are categorized as in Table 2:

Teaching Experience (years)    N.    Per.
1 - 3                          1     2.2
4 - 6                          6     13.3
7 – 10                         15    33.3
Over 10                        23    51

Table 2: Number & percentage of the subjects’ teaching experience
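As a quick check on the percentage columns of Tables 1 and 2, the following sketch recomputes each category's share from the reported counts. The counts are taken from the tables; the function name `percentages` is hypothetical. Rounding to one decimal place gives 26.7 and 51.1 where the tables show 26.6 and 51, which suggests the reported figures were truncated rather than rounded (an assumption, since the paper does not say).

```python
def percentages(counts):
    """Return each category's share of the total as a percentage, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts as reported in Tables 1 and 2 (45 respondents in each table)
age_counts = {"26-35": 14, "36-45": 19, "46 and above": 12}
experience_counts = {"1-3": 1, "4-6": 6, "7-10": 15, "Over 10": 23}

tables = (
    ("Age range", age_counts),
    ("Teaching experience (years)", experience_counts),
)
for title, counts in tables:
    print(title)
    for label, share in percentages(counts).items():
        print(f"  {label:<14}{counts[label]:>3}{share:>7.1f}")
```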

Moreover, 76% of the subjects claimed that they usually check the MA exam items every year, while the rest (24%) answered that they never or hardly ever do so.

In the second part, the key areas this paper set out to investigate are addressed: teaching and testing methods.

1. Teaching methods

Items 7, 9, 10, 14, and 16 deal with teaching method in one way or another.

The following graphs show the responses to the items mentioned.
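To make explicit how the percentages plotted in such graphs are obtained, here is a minimal sketch that tabulates the responses to a single Likert item and also reports the share of respondents choosing 4 or 5. The weights 1 and 5 and the option labels follow the questionnaire description and the graph axes; the response data are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Option labels as they appear on the graphs, keyed by Likert weight
LABELS = {1: "not at all", 2: "not really", 3: "sometimes", 4: "quite a lot", 5: "very much"}


def response_percentages(responses):
    """Percentage of respondents choosing each option (1-5) for one item."""
    if not responses:
        raise ValueError("no responses")
    if any(r not in LABELS for r in responses):
        raise ValueError("responses must be integers from 1 to 5")
    counts = Counter(responses)
    return {LABELS[w]: 100 * counts[w] / len(responses) for w in LABELS}


def share_choosing_4_or_5(responses):
    """Percentage of respondents who chose 'quite a lot' (4) or 'very much' (5)."""
    if not responses:
        raise ValueError("no responses")
    return 100 * sum(1 for r in responses if r >= 4) / len(responses)


# Invented responses from ten teachers to one item (NOT data from the study)
item_responses = [1, 2, 3, 3, 3, 4, 4, 5, 2, 3]

for label, pct in response_percentages(item_responses).items():
    print(f"{label:<12}{pct:5.1f}%")
print(f"chose 4 or 5: {share_choosing_4_or_5(item_responses):.1f}%")
```

The second function corresponds to the kind of summary used later in the analysis, where the subjects who selected choices 4 and 5 are compared across groups.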

Item No. 7: I use MA Examination items, as examples, while teaching in my classes.

Figure 1: Response Percentage for the options in item 7 (bar chart; horizontal axis: not at all, not really, sometimes, quite a lot, very much; vertical axis: percentage, 0 to 35)

The figure shows that just 28% of the subjects use MA Exam items.

Item No. 9: If I were supposed to teach in an MA preparation course, I would use the

same methods and techniques I am using now.

Figure 2: Response Percentage for the options in item 9 (bar chart; horizontal axis: not at all, not really, sometimes, quite a lot, very much; vertical axis: percentage, 0 to 35)

Item No. 10: I teach the contents according to their sequence of importance in MA

examination.

Figure 3: Response Percentage for the options in item 10 (bar chart; horizontal axis: not at all, not really, sometimes, quite a lot, very much; vertical axis: percentage, 0 to 40)

Item No. 14: I think my teaching method is helping students to get ready for both final exam and MA exam.

Figure 4: Response Percentage for the options in item 14 (bar chart; horizontal axis: not at all, not really, sometimes, quite a lot, very much; vertical axis: percentage, 0 to 40)

Item No. 16: I teach the students the tips and tricks to answer the MA exam items.

Figure 5: Response Percentage for the options in item 16 (bar chart; horizontal axis: not at all, not really, sometimes, quite a lot, very much; vertical axis: percentage, 0 to 35)

2. Testing Methods

Items 6, 12, and 13 are all related to the second area, which is testing method.

Item No. 6: In my class, I explain about the content or type of MA exam's items.

Figure 6: Response Percentage for the options in item 6 (bar chart; horizontal axis: not at all, not really, sometimes, quite a lot, very much; vertical axis: percentage, 0 to 45)

Item No. 12: I use MA exam items in my mid-term or final exams.

Figure 7: Response Percentage for the options in item 12 (bar chart; horizontal axis: not at all, not really, sometimes, quite a lot, very much; vertical axis: percentage, 0 to 35)

Item No. 13: My final exam’s items are essay-type.

Figure 8: Response Percentage for the options in item 13 (bar chart; horizontal axis: not at all, not really, sometimes, quite a lot, very much; vertical axis: percentage, 0 to 35)

Moreover, comparing these responses with other factors such as experience, gender, and major yielded some interesting results. The first concerns the percentage of subjects, in the different categories of teaching experience, who answered 4 (Quite a lot) or 5 (Very much) to all the above items:

Teaching Experience (years)    Percentage
4 - 6                          10
6 - 9                          45.3
Over 10                        50.4

Table 3: Percentage of the subjects who selected 4 or 5 in the teaching method category according to their teaching experience

Taking gender into consideration, the percentage of those who selected choices 4 and 5 is as follows:

Gender    Percentage
Male      40
Female    38.3

When the subjects' major was considered, there was no significant difference between those majoring in TEFL and ELL:

Major    Percentage
TEFL     36.2
ELL      38.3

Conclusions and Implications

In the teaching method category, while Figure 1 shows the subjects’ unwillingness to use MA Exam items as examples, Figure 2 indicates that more than half of them tend to teach in the way appropriate for the Exam. Moreover, the majority (60%) thought their teaching method could help prepare learners for the Exam. Nevertheless, as seen in Figure 5, the teachers only sometimes, or not really, teach the tips and tricks of the Exam.

In the second category, which concerns the effect of the MA Entrance Exam on teachers’ testing methods, the figures show that no convincing majority of the subjects explain the content or type of the Exam items (Fig. 6). Figure 7 illustrates that more than half of them avoid using exact MA items in their mid-term or final exams. As depicted in Figure 8, the selection of item type for their own exams seems to be affected by the form of the MA exam, which is not essay-type. This, however, may have other causes, such as ease of scoring.

As the above charts and tables indicate, the teachers are positively affected by the Iranian MA Entrance Examination as a high-stakes test. The impact of this exam on teaching methods is positive in that the teachers teach in a way that helps students get ready for the exam, use methods and techniques appropriate for the exam, and teach the contents according to the sequence of their importance in the exam; at the same time, they do not reduce the class to a mere introduction to the types of the target items, nor do they spend class time teaching tips and tricks, which in itself would turn the class into an exam-oriented one. In addition, regarding the teachers’ testing methods, it is quite clear that, apart from their disinclination to use essay-type items, which may have other reasons, practices such as explaining MA exam items or using those items in class tests are to a great extent uncommon.

This survey also supports the studies conducted by Lam (1994) and Shohamy et al. (1996) in finding that experienced teachers were much more examination-oriented than their younger counterparts. Nevertheless, the washback effect did not differ significantly by teachers’ gender or field of study.

The pedagogical implication of the present survey is that teachers’ awareness of the Iranian MA Entrance Examination can to a great extent influence how well they manage the class period, apply the right techniques to teach the content with an eye to the MA exam, and design their classroom tests.

References

Alderson, J.C. & Wall D. (1993). Does washback exist? Applied Linguistics, 14 (2), 115-129.

Andrews, S., Fullilove, J. & Wong, Y. (2002). Targeting washback – a case study. System,

30, 207-223.

Alderson, J. C. & Banerjee, J. (2000). Impact and washback research in language testing. In C. Elder, A. Brown, E. Grove, K. Hill, N. Iwashita, T. Lumley, T. McNamara, & K. O’Loughlin (Eds.), Studies in Language Testing 11: Experimenting with Uncertainty. Cambridge University Press.

Bailey, K. M. (1999). Washback in language testing. TOEFL Monograph Series. Princeton,

NJ: ETS.

Bachman, L (1990). Fundamental considerations in language testing. Oxford University

Press.

Bachman, L and Palmer, A (1996). Language testing in practice: Designing and Developing

Useful Language Tests. Oxford University Press.

Bailey, K (1996). ‘Working for washback: a review of the washback concept in language

testing’. Language Testing, 13/3, 257-279.

Banerjee, J (1996). The Design of the classroom observation instruments, UCLES Internal

Report, University of Cambridge Local Examinations Syndicate.

Brown, J. D. (1996). Testing in language programs. Upper Saddle River, NJ: Prentice Hall

Regents.

Cheng, L. (2005). Changing language teaching through language testing: A washback study.

Studies in Language Testing, 21, Cambridge University Press, Cambridge.

Cheng, L. (1997). How does washback influence teaching? Implications for Hong Kong. Language and Education, 11(1), 38-54.

Cheng, L. (forthcoming). Teacher perspectives and actions toward a public examination change. Unpublished manuscript. Hong Kong: University of Hong Kong, Department of Curriculum Studies.

Cheng, L., Watanabe, Y. & Curtis, A. (Eds.). (2004). Washback in language testing:

Research contexts and methods. Mahwah, N.J.: London: Lawrence Erlbaum.

Fulcher, G. and Davidson, F. (2007). Language testing and assessment: An advanced

resource book. New York: Routledge.

Hamp-Lyons, L (1997). ‘Washback, impact and validity: Ethical concerns,’ Language Testing

14/3, 295-303.

Henning, G. (1987). A guide to language testing: Development, evaluation, research. New York: Newbury House.

Hughes, A. (1988). Introducing a needs-based test of English language proficiency into an English medium university in Turkey. In A. Hughes (Ed.), Testing English for university study (pp. 134-146). (ELT Documents #127). London: Modern English Publications in association with the British Council.

Lam, H. P. (1994). Methodology washback- an insider's view. In D. Nunan, R. Berry, & V.

Berry (Eds.), Bringing about change in language education: Proceedings of the International

Language in Education Conference 1994 (83-102). Hong Kong: University of Hong Kong.

Messick, S. (1996). Validity and washback in language testing. Language Testing 13(3), 241-

256.

Qi, L. (2005). Stakeholders’ conflicting aims undermine the washback function of a high-stakes test. Language Testing, 22(2), 142-173.

Saif, S. (2006). Aiming for positive washback: a case study of international teaching

assistants. Language Testing, 23 (1), 1-34.

Shohamy, E., Donitsa-Schmidt, S., & Ferman, I. (1996). Test impact revisited: Washback

effect over time. Language Testing 13(3), 298-317.

Shohamy, E. & Hornberger, N. H. (eds). ( 2008). Encyclopedia of Language and Education,

2nd Edition, Vol.7: Language Testing and Assessment, i–xi. Available on line at

http://spiina1001z/womat/production/PRODENV/ 0000000005/0000001

817/0000000016/0000590828.3D.

Taylor, L. (2005). Washback and impact. ELT Journal, 59: 154-155. OUP.

Turner, C. E. (2000). The need for impact studies of L2 performance testing and rating: Identifying areas of potential consequences at all levels of the testing cycle. In C. Elder, A. Brown, E. Grove, K. Hill, N. Iwashita, T. Lumley, T. McNamara, & K. O’Loughlin (Eds.), Studies in Language Testing 11: Experimenting with Uncertainty. Cambridge University Press.

Wall, D. (2000). The impact of high-stakes testing on teaching and learning: can this be

predicted or controlled? System, 28: 499-509.

Wall, D. & Alderson, J. C. (1993). Examining washback: The Sri Lankan impact study.

Language Testing 10(1), 41-69.

Watanabe, Y. (1996). Does grammar translation come from the entrance examination? Preliminary findings from classroom-based research. Language Testing, 13(3), 318-333.

APPENDIX

Interview Questions:

1. To what extent do you think the MA Entrance Examination influences your instruction?

2. Did you have to change your teaching techniques to meet the needs of the testing syllabus?

3. Do you use MC items in your final exams?

4. If the MA examination items were changed to essay-type, would you generally change your final exam’s items?