final paper assessment.docx



EFL Testing and Assessment

Heany, Helen SS 2014

26.7.2014

Final written Assignment

Ingo Maierbrugger

(Mrn.: a0708160)

Elisabeth Gottesheim

(Mrn.: a0)



CONTENTS

0. Introduction

----------------------- PART 1 ------------------

1.1. Test specifications

1.2. Test development process

1.2.1 Selecting texts: Deciding as a group

1.2.2 Writing Items: Taming the test technique

1.2.3 Revising Items and Testlet: The importance of feedback

1.2.4 Final Version and Reflections: The logistics of the test writing process

----------------------- PART 2 ------------------

2.1 Pilot test results

2.2. Revisions for a main trial

2.3. Test usefulness

2.4. Measures to improve test usefulness (could be integrated in II.3)

----------------------- PART 3 ------------------

3.1 Reflection Ingo Maierbrugger: More than a red ink pen

3.2 Reflection Elisabeth Gottesheim

Bibliography

Appendix



Introduction:

"The proper relationship of testing and teaching is surely that of a partnership"

Hughes (2003: 7)

Hughes (2003) observes in the first chapter of his book that many test users, teachers as well as students, mistrust tests, and he even admits that they are in most cases right to do so, as "a great deal of language testing is of very poor quality". This unsatisfactory situation, as Hughes explains, was what prompted him to write his book, which aims to help language teachers improve their testing.

Following Hughes' mission, and using his book, our course was aimed at future language teachers, who should learn from the very beginning how to design tests that prove useful rather than harmful to their teaching. In other words, the course taught its students how to use and design tests successfully, both from a theoretical point of view and in practice-oriented group work.

The group work in particular, which is outlined in this paper, covered all the important stages of creating a successful test, from establishing the test specifications to evaluating the final result.

In short, the course as a whole, and the group work in particular, taught future teachers how to use tests purposefully. This paper therefore aims to show how we learned to bring teaching and testing, just as Hughes wants it, into a relationship of partnership.



1. Test specification:

1.1 General Statement of Purpose:

This test is designed to assess the English reading proficiency of students in the last forms (7th or 8th grade) of Academic Secondary School (AHS) who are preparing for the Matura. As the test seeks to aid students in their preparation for the Matura exam, the level of performance it is designed to measure is, as in the Matura, the B2 level of the CEFR. Moreover, in terms of test content, that is, the chosen texts, the testing techniques and the time setting, the test tries to simulate the conditions of a Matura exam as authentically as possible.

Thus, with respect to the test takers, the purpose of the test is:

a.) to inform them about their reading proficiency in light of the upcoming Matura

b.) to give them a practice test in which they will find conditions similar to those of the Matura

In this way, the test can be regarded as a proficiency test which provides test takers with a rough estimate of how they might do in an actual Matura exam. As the test results are not included in the students' course grade, the test can also be regarded as an approximate diagnostic test, which gives test takers insights into their performance on a Matura-like test. It might show them, for example, which testing method they still have problems with, or whether they have to increase their reading speed in order to complete the whole test in time. However, to obtain this kind of diagnostic feedback effectively, it would be advisable to spend some time, when the corrected tests are handed back, discussing the test results as well as the students' overall experiences with the test format.

In sum, this reading test is a criterion-referenced proficiency test which closely follows the test specifications of the Matura at B2 level. The test, which is objectively scored, thus not only provides useful feedback to test takers who are preparing for the Matura, but also gives them a chance to practice the format. In other words, besides the actual test results, the test contributes to a beneficial backwash for the new Matura, as it helps, in the words of Hughes (2003: 55), to "ensure that the test [namely the Matura] is known and understood by students".



1.2 Test focus (Test construct):

Like the Matura, this test uses the test construct embedded in the description of B2 reading proficiency as defined in the CEFR, and thus tries to cover all the main components that together make up overall reading comprehension. The test therefore assesses the following reading operations:

1. Reading for gist

2. Reading for information / important details

3. Reading for main ideas and supporting details

4. Making propositional inferences

5. Reading to deduce the meaning of words and phrases

Or, in the terminology of Khalifa and Weir (2008), this test tries to elicit both careful and expeditious reading at the global as well as the local level. In our group's testlet, however, expeditious reading is only indirectly involved in completing the tasks: it is used to locate relevant information in the text, but the test taker then needs to engage in careful reading in order to extract the answers to the individual tasks.

1.3 Test takers: Age 16 and upwards; the majority are L1 German speakers

1.4 Test content:

The texts and task types candidates are expected to deal with are also essentially in line with the demands of the Matura exam. In regard to overall length and coherence within the individual texts, however, some changes had to be made due to logistical constraints, so that the final test can be seen as a slightly shortened version of a Matura reading exam.

1.4.1. Authenticity of text: Authentic, not simplified, but in some cases shortened to fit logistical constraints. As the texts are mainly taken from the internet, the layout is changed for the paper version; however, paragraphing and overall text structure are kept as authentic as possible.



1.4.2. Text types: General interest, articles, book reviews, etc.; due to task formats and layout restrictions, mainly non-literary texts

1.4.3. Discourse type: narrative, argumentative, descriptive, expository, persuasive

1.4.4. Topic area: In order not to influence the performance of test takers, selected topics should be neither too provocative, i.e. likely to cause offence or emotional distress, nor too boring for the average reader in the test takers' age group.

1.4.5. Number of words: 500 words precisely, due to logistical reasons

1.4.6. Number of texts: 3; each text is accompanied by a different testing method

1.4.7 Test methods:

a. Multiple Choice

b. True / False with Justification

c. Gap filling

d. Short answers

1.5. Number of items per text: 8, which means 24 items altogether

1.6. Weighting per item: 1 point per item (for TFJ, both parts need to be correct for 1 point)

1.7. Time for test: 45 minutes for 3 texts with 8 items each. Compared with the Matura exam, which consists of 4 texts with 8 items each and lasts 60 minutes, this test allows the same time per section, namely 15 minutes, but features one section fewer in order to fit into a school lesson of 50 minutes. As Hughes (2003: 141) points out, in assessing reading proficiency, reading speed is a prime feature of the test, which combines with the number and difficulty of items to determine the amount of time needed for the test. It is therefore especially important that the time setting of the test follows the demands of the Matura exam, so that the practice is as authentic as possible for the students.
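The arithmetic behind sections 1.5 to 1.7 can be checked with a short sketch. The Python below is purely illustrative and was not part of the original test materials; the names ReadingTest and score_tfj are hypothetical, and the figures are the ones stated in the specification above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadingTest:
    """Hypothetical container for the figures given in the test specification."""
    texts: int
    items_per_text: int
    total_minutes: int

    @property
    def total_items(self) -> int:
        # Every text carries the same number of items.
        return self.texts * self.items_per_text

    @property
    def minutes_per_text(self) -> float:
        # Time available per section if time is shared equally between texts.
        return self.total_minutes / self.texts


def score_tfj(choice_correct: bool, justification_correct: bool) -> int:
    """True/False with Justification: 1 point only if both parts are correct."""
    return 1 if (choice_correct and justification_correct) else 0


# Figures taken from sections 1.5 to 1.7 of the specification.
practice = ReadingTest(texts=3, items_per_text=8, total_minutes=45)
matura = ReadingTest(texts=4, items_per_text=8, total_minutes=60)

print(practice.total_items)       # number of items in the practice test
print(practice.minutes_per_text)  # time per section in the practice test
print(matura.minutes_per_text)    # time per section in the Matura
```

Under these figures both formats allow the same time per text, and a TFJ item earns its point only when the True/False choice and the justification are both correct.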

2. Test development process

2.1 Selecting Texts: Deciding as a group



The first step in compiling our group's testlet, namely selecting an appropriate text, turned out to be both the easiest and the hardest part of the whole creation process. It was the easiest part in the sense that our requirements for the text (around 500 words, authentic English, and no disturbing topic) were so broad that hundreds of adequate texts could be found on the internet within only a few minutes. The hard part was then to decide which of these countless texts was the most suitable for our purpose. To solve this problem, group discussions and the text mapping procedure proved to be highly useful tools for making a successful choice.

After every group member had individually decided on one text, this first selection was presented in a group discussion. As everyone wanted to "sell" his or her text to the other group members, the discussion proved to be a good place to weigh the advantages and disadvantages of the texts, focusing mainly on their possible appeal to test takers and the beneficial backwash of the chosen text types.

However, it was the text mapping procedure which helped to reveal the "inner quality" of each text and thus to show which text was written coherently and clearly enough to gain the most consensus points. This procedure, proposed by Urquhart & Weir (1998: 306-7), limited our final choice to two texts: a description of how to apply for a US visitor visa, which scored the most consensus points, and a book review of "A Million Ways to Die in the West".

In a final group discussion we came to the conclusion that the visitor visa text had the most consensus points because it was specifically written and designed to be easily understood by the reader. In order not to make the final testlet too easy, we decided to use a more traditional text type, namely the book review, whose main advantage was a paragraph structure which would give candidates, in the words of Hughes (2003: 142), "a good number of fresh starts". Additionally, we agreed that a book review would be a useful text type with regard to what pupils are expected to read at school and later at university.

So in the end we based our final decision, as Hughes (2003: 142) puts it, ultimately on "experience, judgment and a certain amount of common sense". And as we had reached our final choice together as a team, the decision-making process drew on the experience and judgment of not just one individual but of four different people, which finally gave us the assurance that we had made a successful choice.

2.2. Writing Items: Taming the test technique

The starting point of the item writing process was the set of consensus information points which had been identified in the text mapping procedure, as "an ability to answer the question(s) correctly implies that the text has been understood" (Heany 2011). For this reason we assigned three consensus points to each group member, each of which was to be used to create one test item.

After the individual work, we decided in a later group discussion which 8 final items we would choose from our pool of 12 possible items. In doing so, we again exchanged thoughts and opinions about individual items and, more importantly, about the coherence between them. With reference to our initial test specifications, we noticed that nearly all the items we had individually created were directed towards the understanding of main ideas and important details, and that the reading operations "reading for gist" and "making inferences" were therefore hardly addressed at all. In retrospect, two factors can be identified which were responsible for this first small setback in our test writing phase.

Firstly, as mentioned before, because the test items were formulated on the basis of consensus points, it was quite logical that they would target pieces of information that were stated explicitly in the text and not just implied by it, as the consensus points themselves consisted of concrete information remembered by a reader.

Secondly, the test technique itself, namely True/False, also seemed at first glance to call for concrete pieces of information rather than for gist or deductions, because it is simply easier to ask yes/no questions if you have a concrete fact in mind (e.g. Q10: "Albert is a coward."). Hughes also observed this problem with the multiple choice format, of which the True/False format can be regarded as a subform (see Hughes 2003: 79), and concluded that this technique "severely restricts what can be tested" (Hughes 2003: 77).



However, after discussing the issue of our somewhat restricted question focus, we managed to come up with two items (Q13-14) which addressed the gist of the book review rather than only concrete pieces of information, and which prompted test takers to use inferring strategies in order to answer them. The idea underlying these items, namely to ask about the attitudes and opinions of the author of the text, was taken from an example of a Matura exam. In this way we learned, by imitating other people's work, how to overcome the initial problems we had with our assigned test technique, and moreover how to use this technique for a wider purpose than just asking for explicitly stated, concrete information.

2.3. Revising Items and Testlet: The importance of feedback

The revision of our test items already started in the first group discussion, in which we each presented our three individual test items. At this point we identified the items which seemed most useful to all of us and, as described before, came up with new or, in one case, transformed items which would target reading operations other than reading for main ideas and supporting details.

After this first revision phase, we felt content with, and even quite proud of, our pool of 8 revised items, so that we could understand what Hughes meant when he referred to home-made items as being perceived as "minor works of art, or even, it sometimes seems, [as] our babies" (Hughes 2003: 58).

Hughes was also right when he spoke about the difficulty of handing your "baby" over to others to evaluate it and give feedback on the quality of the self-made items. However, despite the fear that one's own work would be criticized too harshly, the moderation process, in which members of another group evaluated our group's items, proved to be highly useful. Not only were minor problems within individual items detected, such as spelling mistakes and ambiguous wording, but important new points were also raised which we, as the producing group, had simply not noticed before. For example, it was brought to our attention that the very heading of our text contained the answer to our first question (namely that this was MacFarlane's first novel) and that the last 5 answers to our True/False items were all "False", which might have been distracting to the test takers.



Both problems were quite obvious. To us, however, as we had worked continuously on the items and the text, they were invisible: we were so focused on other things that it never occurred to us to re-read the text's heading, and we knew the answers to our items by heart, so that we never bothered to actually tick off the correct solutions in the answer boxes, which would instantly have revealed that nearly all items had to be answered with "False".

In conclusion, the fact that our proofreaders saw our testlet for the first time gave them a different perspective on our work. They saw the big picture, and so they could perceive problems which were invisible to us. As Alderson, Clapham and Wall (1995: 39) stress: "It is absolutely crucial in all test development [..] that some person or persons other than the individual item writer(s) look closely at each item."

Revising the items was easy: we dropped the first question altogether for being too easy, and transformed some of the last five questions so that they now had "True" as their correct answer. Thus, the required corrections were made without difficulty; seeing where they were necessary was the crucial point, and here we clearly benefited from other people's help.
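The kind of check that the moderators performed by eye, noticing a long run of identical answers in the key, can also be sketched in code. The following Python is a minimal illustration and was not part of the original project; the function names and the sample answer key are hypothetical and do not reproduce the actual testlet.

```python
from collections import Counter
from itertools import groupby


def answer_distribution(key):
    """Count how often each answer occurs in the answer key."""
    return Counter(key)


def longest_run(key):
    """Return (answer, length) for the longest run of identical consecutive answers."""
    best = (None, 0)
    for answer, group in groupby(key):
        length = sum(1 for _ in group)
        if length > best[1]:
            best = (answer, length)
    return best


# Hypothetical draft key (T = True, F = False), invented for illustration only.
draft_key = ["T", "F", "T", "F", "F", "F", "F", "F"]

print(answer_distribution(draft_key))
print(longest_run(draft_key))
```

Ticking off the key in this way makes an unbalanced distribution or a long run of one answer visible at a glance, which is exactly the step that was skipped before moderation.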

2.4. Final Version and Reflections: The logistics of the test writing process

In the end we managed to finish our testlet in time and were quite confident that our work was fit for its task. In retrospect, it became clear that throughout the whole test writing process, alongside individual ideas and creativity, and perhaps even more important than these, the exchange of thoughts with others, inside and outside the working team, was the force that drove the creation process forward to the final compilation of the testlet.

However, and this might have been responsible for some minor difficulties in the overall creation process, the time in which the group members were actually physically together in the classroom and could discuss issues that had arisen proved a little too short to discuss everything sufficiently. Alternative online modes of contact were quite laborious, as we could only interact with our colleagues indirectly, instead of engaging in a real face-to-face conversation, which would have solved problems instantly.



In the case of our group, however, this slight lack of group time was not too harmful to the overall testlet development, as we were fortunate to receive organizational as well as logistical support from our course teacher. But if we were to design another test or testlet with higher stakes, and moreover found ourselves in charge of moderating the test creation process, it would definitely be a good idea to allow plenty of time for real-life group meetings, and to ensure that these meetings take place on a regular basis. As Hughes (2003: 58) observes, test development is "best thought of as a task to be carried out by a team". And certainly, as we also experienced throughout our group work, a team works best if its members are actually physically together and can thus engage in an open exchange of thoughts and ideas, and so work together effectively in creating a successful test.

3.1 Reflection (Ingo Maierbrugger): More than a red ink pen

When I read the course name "EFL Testing and Assessment" for the first time, I expected that the whole course would be about how to grade Schularbeiten (written class tests) and Hausarbeiten (home assignments) and nothing more. I simply had the fixed idea in my mind that everything called "assessment" involved a red ink pen and a teacher reading through students' texts in order to find mistakes and mark them.

This course, however, has shown me that testing consists of more than just this aspect of grading, which, moreover, only applies to the assessment of writing. Testing is a way of gathering and presenting information that plays an important role in our educational system as well as in our culture as a whole. The course thus not only widened my understanding of testing in general, that is, of the broader role it plays in our society (placement tests, backwash effect, etc.); I have also learned that in the overall process of testing, grading is just one of a series of activities which are all necessary in order to create and later administer a successful test.

Especially with regard to the test creation process, which I had never thought about before this course, I have seen throughout the course and our group work how many aspects and details have to be considered in order to create a well-functioning test. I finally realized how much work actually goes into the creation of something that I had grown so used to during my school and student days. And so my perspective on the matter has changed over the course: from that of a test taker to that of a test maker.

And as a test maker I have learned that a task as laborious as the compilation of a testlet proved to be is best done in group work and with the help of others. As described in the test development part of this paper, group discussions, the exchange of thoughts and ideas, and feedback and proofreading from others are vitally important in creating a successful test. And I am quite confident that these ways of working together are also applicable in many professional fields other than test making.

In conclusion, this course and our group project have taught me, first of all, that testing is more than just marking with a red ink pen. Secondly, and this is for me the point that I will especially take home from this course, they have shown me that working together really helps to ensure that the final result is of high quality, and, as I have said before, this observation surely holds true for more than just test making.

Bibliography:

Alderson, Charles; Clapham, Caroline; Wall, Dianne. 1995. Language test construction and evaluation. Cambridge: Cambridge University Press.

Heany, Helen. 2011. Explanation of text mapping technique (rationale, method), moodle course content.

Hughes, Arthur. 2003. Testing for Language Teachers. Cambridge: Cambridge University Press.

Khalifa, Hanan; Weir, Cyril. 2008. Cambridge ESOL: Research Notes, 2-16.

Urquhart, A. H.; Weir, Cyril. 1998. Reading in a Second Language: Process, Product and Practice. London: Longman.



Appendix 1: First draft items and testlet



Appendix 2: Testlet plus items Final version
