8/10/2019 final paper assessment.docx
EFL Testing and Assessment
Heany, Helen SS 2014
26.7.2014
Final written Assignment
Ingo Maierbrugger
(Mrn.: a0708160)
Elisabeth Gottesheim
(Mrn.: a0)
CONTENTS
0. Introduction
----------------------- PART 1 ------------------
1.1. Test specifications
1.2. Test development process
1.2.1 Selecting texts: Deciding as a group
1.2.2 Writing Items: Taming the test technique
1.2.3 Revising Items and Testlet: The importance of feedback
1.2.4 Final Version and Reflections: The logistics of the test writing process
----------------------- PART 2 ------------------
2.1 Pilot test results
2.2. Revisions for a main trial
2.3. Test usefulness
2.4. Measures to improve test usefulness (could be integrated in II.3)
----------------------- PART 3 ------------------
3.1 Reflection Ingo Maierbrugger: More than a red ink pen
3.1 Reflection Elisabeth Gottesheim
Bibliography
Appendix
Introduction:
"The proper relationship of testing and teaching is surely that of a partnership"
Hughes (2003: 7)
Hughes (2003) observes in the first chapter of his book that many test users, teachers as
well as students, mistrust tests, and he even admits that they are in most cases right in
doing so, as "a great deal of language testing is of very poor quality". This unsatisfactory
condition, however, was, as Hughes admits, what initially prompted him to write his book
in order to help language teachers improve their testing.
Following Hughes' mission, and also using his book, our course was directed at future
language teachers, who should learn right from the beginning how to design tests
that prove useful to their teaching instead of harmful. In other words,
this course taught its students how to use and design tests successfully, both from a
theoretical point of view and in practice-oriented group work.
The completion of this group work in particular, which is outlined in the following
paper, once again embraced all important stages of creating a successful test, from
establishing the test specifications to evaluating the final result.
In short, the course as a whole, and the group work in particular, have taught future
teachers how to use tests purposefully, and thus this paper wants to show how we have
learned to bring teaching and testing, just as Hughes wants it, into the relation of a
partnership.
1. Test specification:
1.1 General Statement of Purpose:
This test is designed to assess the English reading proficiency of students in the last
forms (7th or 8th grade) of Academic Secondary School (AHS) who are preparing for the
Matura. As the test seeks to aid students in their preparation for the Matura exam, the
level of performance it is designed to measure is, as in the Matura, the B2 level of
the CEFR. Moreover, in terms of test content, specifically the chosen texts, the testing
techniques and the time setting, this test tries to simulate the conditions of a Matura
exam as authentically as possible.
Thus, with respect to the test takers, the purpose of the test is:
a.) to inform them about their reading proficiency in light of the upcoming Matura
b.) to give them a practice test in which they will find conditions similar to the
Matura
In this way, the test can be regarded as a proficiency test which provides test takers
with a rough estimate of how they might do in an actual Matura exam. As test results are
not included in the students' course grade, this test can also be regarded as an
approximate diagnostic test, which provides test takers with insights into their
performance on a Matura-like test and might show them, for example, which
testing method they still have problems with, or whether they have to increase their
reading speed in order to complete the whole test in time. However, to gain this kind of
diagnostic feedback effectively, it would be advisable, when the corrected tests are
handed back, to spend some time discussing the test results as well as the
students' overall experiences with the test format.
In sum, this reading test is a criterion-referenced proficiency test which closely follows
the test specifications of the Matura B2 level. Thus, the test, which will be objectively
scored, not only provides useful feedback to the test takers who are preparing for
the Matura, but also gives them a chance to practice the format. In other words, besides
the actual test results, this test contributes to creating a beneficial backwash for the new
Matura, as it, in the words of Hughes (2003: 55), helps to ensure that "the test [namely
the Matura] is known and understood by students".
1.2 Test focus (Test construct):
Like the Matura, this test uses the test construct embedded in the description of the
B2 reading proficiency standards as defined in the CEFR, and thus tries to cover all
main components that together form overall reading comprehension. The test therefore
assesses the following reading operations:
1. Reading for gist
2. Reading for information / important details
3. Reading for main ideas and supporting details
4. Making propositional inferences
5. Reading to deduce the meaning of words and phrases
Or, put in the terminology of Khalifa and Weir (2008), this test tries to elicit both careful
and expeditious reading on a global as well as a local level. However, in regard to
our group's testlet, expeditious reading is only indirectly involved in the completion of
the tasks: it is used to locate relevant information in the text, but the test taker then
needs to engage in careful reading in order to extract the answer to the individual tasks.
1.3 Test Takers: Age 16 and upwards; the majority are L1 German speakers
1.4 Test content:
The texts and task types candidates are expected to be able to deal with are also
essentially in line with the demands of the Matura exam. However, in regard to overall
length and coherence within the individual texts, some adjustments had to be made due to
logistical constraints, so that the final test can be seen as a slightly shortened version
of a Matura reading exam.
1.4.1. Authenticity of text: Authentic, not simplified, but in some cases shortened to fit
logistical constraints. As texts are mainly taken from the internet, the layout is changed
for the paper version; however, paragraphing and overall text structure are kept as
authentic as possible.
1.4.2. Text types: General interest, articles, book reviews, etc.; generally, due to task
formats and layout restrictions, mainly non-literary texts
1.4.3. Discourse type: narrative, argumentative, descriptive, expository, persuasive
1.4.4. Topic area: In order not to influence the performance of test takers, selected
topics should be neither too provocative, i.e. topics which might cause offence or
emotional distress, nor too boring for the average reader in the test takers' age group.
1.4.5. Number of words: 500 words precisely due to logistical reasons
1.4.6 Number of texts: 3; each text is accompanied by a different testing method
1.4.7 Test methods:
a. Multiple Choice
b. True / False with Justification
c. Gap filling
d. Short answers
1.5. Number of items per text: 8, which means 24 items altogether
1.6. Weighting per item: 1 point per item (TFJ: both parts need to be correct for 1 point)
1.7. Time for test: 45 minutes for 3 texts with 8 items each. Compared with the Matura
exam, which consists of 4 texts with 8 items each and lasts 60 minutes, this test allows
the same time per individual section, namely 15 minutes; it merely features one section
less in order to fit the time demands of school lessons, which last 50 minutes. As
Hughes (2003: 141) points out, in assessing reading proficiency reading speed is a
prime important feature of the test, which combines with the number and difficulty of
items to determine the amount of time needed for the test. It is therefore especially
important that the time setting of the test follows the demands of the Matura exam, so
that the practice for the students is as authentic as possible.
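The scoring and timing rules in 1.5 to 1.7 can be checked with a few lines of arithmetic. The following is only a minimal sketch: the class `TFJResponse` and the function names are hypothetical helpers introduced for illustration, not part of the Matura specifications or of our testlet.

```python
from dataclasses import dataclass


@dataclass
class TFJResponse:
    """One True/False-with-Justification answer (hypothetical structure)."""
    choice_correct: bool         # the True/False tick matches the key
    justification_correct: bool  # the justification is accepted by the marker


def score_tfj(response: TFJResponse) -> int:
    # Section 1.6: both parts must be correct to earn the single point.
    return 1 if response.choice_correct and response.justification_correct else 0


def minutes_per_text(total_minutes: int, texts: int) -> float:
    # Time available per text when the test time is split evenly.
    return total_minutes / texts


ITEMS_PER_TEXT = 8
TEXTS = 3

max_score = ITEMS_PER_TEXT * TEXTS            # 1 point per item
practice_pace = minutes_per_text(45, TEXTS)   # this test
matura_pace = minutes_per_text(60, 4)         # Matura reading exam

print(max_score, practice_pace, matura_pace)
```

Under these figures both formats allow the same time per text, which is the property the time setting is meant to preserve.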
2. Test development process
2.1 Selecting Texts: Deciding as a group
The first step in compiling our group's testlet, namely selecting an appropriate text,
turned out to be both the easiest and the hardest part of the whole creation
process. It was the easiest part in the sense that the requirements we had for the text
(around 500 words, authentic English, and not disturbing topic-wise) were so broad that
within only a few minutes hundreds of adequate texts could be found on the internet.
The hard part, however, was to decide which of these countless texts might be the
most suitable one for our purpose. To solve this issue, group discussions as well as the
text mapping procedure proved to be highly useful tools for finally making a successful
choice.
Thus, after every group member had decided individually on one text, this first
selection was presented in a group discussion. As everyone wanted to "sell" his/her text
to the other group members, the group discussion proved to be a good place to discuss
the advantages and disadvantages of the texts; mainly issues concerning the
possible appeal to test takers and the beneficial backwash of the chosen text types were
discussed.
However, it was the text mapping procedure which helped to reveal the "inner quality"
of the texts and thus to show which text was written coherently and clearly enough to gain
the most consensus points. This procedure, proposed by Urquhart & Weir (1998: 306-7),
limited our final choice to two texts: a description of how to apply for a US visitor visa,
which scored the most consensus points, and a book review of A Million Ways to Die in the
West.
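The counting step behind the consensus points can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold `min_readers`, the function names and the sample notes for "Text A" and "Text B" are all hypothetical and do not reproduce our actual text mapping data.

```python
from collections import Counter


def consensus_points(reader_notes: list[set[str]], min_readers: int) -> list[str]:
    """Return the points that at least `min_readers` readers wrote down."""
    counts = Counter(point for notes in reader_notes for point in notes)
    return sorted(p for p, n in counts.items() if n >= min_readers)


def rank_texts(mappings: dict[str, list[set[str]]], min_readers: int) -> list[tuple[str, int]]:
    """Order candidate texts by their number of consensus points, highest first."""
    totals = {title: len(consensus_points(notes, min_readers))
              for title, notes in mappings.items()}
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical notes of four readers per text; the letters stand for main ideas.
mappings = {
    "Text A": [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "c"}],
    "Text B": [{"x", "y"}, {"x", "z"}, {"x", "y"}, {"w"}],
}

# Assumption: a point counts as consensus if three of the four readers noted it.
ranking = rank_texts(mappings, min_readers=3)
print(ranking)
```

A text whose readers largely agree on its main ideas ends up at the top of such a ranking, which is what the procedure uses as an indicator of a clearly written text.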
In a final group discussion we came to the conclusion that the visitor visa text had the
most consensus points because it was specifically written and designed to be easily
understood by the reader. So, in order to make the final testlet not too easy, we decided
to use a more traditional text type, namely the book review, whose main advantage was
a paragraph structure which would give candidates, in the words of Hughes (2003:
142), "a good number of fresh starts". Additionally, we agreed that a book
review would be a useful text type in regard to what pupils are expected to read in
school and later at university.
So in the end we based our final decision, as Hughes (2003: 142) put it, ultimately on
"experience, judgment and a certain amount of common sense". And as we had come to
our final choice together as a team, we could include in the decision-making
process the experience and general judgment of not just one individual mind but of four
different people, which finally assured us that we had made a successful
choice.
2.2. Writing Items: Taming the test technique
The starting point of the item writing process was the consensus information points
which had been raised in the text mapping procedure, as an ability to answer the
question(s) correctly implies that the text has been understood (Heany 2011). For this
reason we assigned three consensus points to each group member, which had to be used
to create three test items respectively.
After the individual work, we decided in a later group discussion which 8 final
items we would choose out of our pool of 12 possible items. In doing so, we again
exchanged thoughts and opinions about individual items and, more importantly, about the
coherence between those items. With reference to our initial test specifications, we
noticed that nearly all the items we had individually created were directed towards the
understanding of main ideas and important details, and that the reading
operations "reading for gist" and "making inferences" were therefore almost not addressed
at all by our test items. In retrospect, two factors can be identified which were
responsible for this first small setback in our test writing phase.
Firstly, as mentioned before, because test items were formulated on the basis of
consensus items, it was quite logical that these items would target pieces of information
that were straightforwardly given by the text and not just implied by it, as the consensus
items themselves consisted of concrete information remembered by a reader.
Secondly, the test technique itself, namely True/False, also seemed at first glance to
call for concrete pieces of information rather than for gist or deductions, because it was
simply easier to ask yes-and-no questions if you had a concrete fact in mind (e.g.
Q10: "Albert is a coward."). Hughes also observed this problem with the multiple choice
question format, of which the TF format can be regarded as just a sub-form (see Hughes
2011: 79); consequently, Hughes concluded that this technique "severely restricts what
can be tested" (Hughes 2011: 77).
However, after we had discussed the issue of our somewhat restricted question focus, we
nonetheless managed to come up with two items (Q13-14) which addressed the gist of
the book review rather than only concrete pieces of information, and which prompted
test takers to engage in inferring strategies in order to answer them. The idea
underlying these items, namely to ask about the attitudes and opinions of the author of
the text, was taken from an example of a Matura exam. In this way we learned, by imitating
other people's work, how to overcome the initial problems we had with our assigned
test technique, and moreover how to use this technique for a wider purpose than
just asking for straightforwardly given concrete information.
2.3. Revising Items and Testlet: The importance of feedback
The revising process of our test items already started in the first group discussion, in
which we presented our three individual test items. At this point we identified the items
which seemed most useful to all of us and, as described before, came up with new or, in
one case, transformed items which should target reading operations other than just
reading for main ideas and supporting details.
After this first revising phase of our items, we felt content with and even quite proud of
our pool of now 8 revised items, so that we could comprehend what Hughes meant
when he referred to home-made task items as being perceived as minor works of art, or
even, it sometimes seems, [as] our babies (Hughes 2003: 58).
Moreover, Hughes was also right when he spoke about the difficulties that come with
handing your "baby" over to others, who are to evaluate it and give you feedback on the
quality of your self-made items. However, despite the fear that one's own work would be
criticized too harshly, the moderation process, in which members of another group
evaluated our group's items, proved to be highly useful: not only were minor
problems within individual items detected, such as spelling and ambiguous wording,
but new important points were also raised which we, as the producing group,
had simply not noticed before. For example, it was brought to our attention that the
very heading of our text contained the answer to our first question (namely that this
was MacFarlane's first written novel) and that the answers to our last 5 True/False
items were all False, which might have been distracting to the test takers.
Both problems were quite obvious; however, to us, as we had continuously worked on the
items and the text, these flaws were invisible. We were so focused on other
things that it never occurred to us to re-read the text's heading, and we knew the answers
to our items by heart, so that we never bothered to actually tick off the correct solutions
in the answer boxes, which would have instantly revealed that nearly all items had to be
answered with False.
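A check of this kind is easy to automate once the answer key is written down. The sketch below is only an illustration: the function `longest_run` and the sample key are hypothetical; only the fact that the last five answers were False is taken from our first draft, while the first three entries are invented.

```python
def longest_run(key: list[str]) -> tuple[str, int]:
    """Return the answer value and length of the longest run of identical answers."""
    best_value, best_len = "", 0
    run_value, run_len = "", 0
    for answer in key:
        if answer == run_value:
            run_len += 1
        else:
            run_value, run_len = answer, 1
        if run_len > best_len:
            best_value, best_len = run_value, run_len
    return best_value, best_len


# Hypothetical draft key for 8 True/False items; the last five are all False.
draft_key = ["T", "F", "T", "F", "F", "F", "F", "F"]

value, length = longest_run(draft_key)
share_false = draft_key.count("F") / len(draft_key)
print(value, length, share_false)
```

A long run of one answer, or a key dominated by one answer, is a signal that some items should be rewritten so that the correct answers are spread more evenly.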
In conclusion, the fact that our proofreaders saw our testlet for the first time gave them
a different perspective on our work. They saw the big picture, and so they could perceive
problems which were invisible to us. As Alderson, Clapham and Wall (1995: 39)
stressed: "It is absolutely crucial in all test development [..] that some person or persons
other than the individual item writer(s) look closely at each item."
Revising the items was easily done: we dropped the first question altogether for being
too easy, and transformed some of the last five questions so that they now had True as
their correct answer. Thus, the required corrections were made without difficulty;
only seeing where they were necessary was the crucial point, and here we
clearly benefited from other people's help.
2.4. Final Version and Reflections: The logistics of the test writing process
In the end we managed to finish our testlet in time and were quite confident that our
work was fit for its task. In retrospect, it had become clear that throughout the whole
test writing process, next to individual ideas and creativity, or maybe even more
importantly than these factors, the exchange of thoughts with others, within or outside
the working team, was the force that drove the creation process forward to the final
compilation of the testlet.
However, and this might have been the point responsible for some minor difficulties in the
overall creation process, the moments in which the group members were actually
physically together in the classroom and had time to discuss the issues that had arisen
proved to be a little too short to discuss all of them sufficiently, and alternative online
modes of contacting each other were quite laborious, as you could only interact indirectly
with your colleagues instead of engaging in a real face-to-face conversation, which would
have solved problems instantly.
In the case of our group, however, this slight lack of group time was not too harmful to
the overall testlet development, as we were luckily given organizational as well as
logistical support by our course teacher. But if we were to design another test or
testlet with higher stakes, and moreover found ourselves in charge of moderating
the test creation process, it would definitely be a good idea to assign plenty of time to
real-life group meetings and to ensure that these meetings happen on a regular basis,
because, as Hughes (2003: 58) observed, test development is best thought of as a task
to be carried out by a team. And certainly, as we also experienced throughout our group
work, a team works best if its members are actually physically together and can thus
really engage in an open exchange of thoughts and ideas, and so work together
effectively in creating a successful test.
3.1 Reflection (Ingo Maierbrugger): More than a red ink pen
After reading the course name "EFL Testing and Assessment" for the first time, I
expected that the whole course would be about how to grade Schularbeiten (written school
exams) and Hausarbeiten (written home assignments) and nothing more. I just had the
fixed idea in my mind that everything that was called assessment involved a red ink pen
and a teacher reading through some students' texts in order to find mistakes and mark them.
This course, however, has shown me that testing consists of more than just this aspect of
grading, which is moreover only done in relation to the assessment of writing; it is a
form of finding and presenting information that plays an important role in our educational
system as well as in our culture as a whole. In this way, the course not only widened my
understanding of testing in general, that is, of the broader role it plays
in our society (placement tests, backwash effect, etc.), but I have also learned that, in the
overall process of testing, grading is just one part of a series of activities
which are all necessary in order to create and later administer a successful test.
Especially in regard to the test creation process, which I had never thought about
before this course, I have seen throughout our course as well as our group work how
many aspects and details have to be considered in order to create a well-functioning test.
Thus, I finally realized how much work actually goes into the creation of something that
I have grown so used to during my school and student days. And so my perspective on
the matter has changed throughout the course: from that of a test taker to that of a test
maker.
And as a test maker I have learned that a task as laborious as the compilation of a
testlet proved to be is a job that is best done in group work and with the help of others.
As described in the test development part of this paper, group discussions, the exchange
of thoughts and ideas, and feedback and proofreading from others are vitally important in
creating a successful test. And I am quite confident that these modes of working
together are also applicable in many professional fields other
than test making.
In conclusion, this course as well as our group project have first of all taught me that
testing is more than just marking with a red ink pen. Secondly, and this is for me the
point that I will especially take home from this course, it was proved to me that working
together really ensures that the final result is of high quality, and, as I have
said before, this observation surely holds true for more than just test making.
Bibliography:
Alderson, Charles; Clapham, Caroline; Wall, Dianne. 1995. Language test construction
and evaluation. Cambridge: Cambridge University Press.
Heany, Helen. 2011. Explanation of text mapping technique (rationale, method), Moodle
course content.
Hughes, Arthur. 2003. Testing for Language Teachers. Cambridge: Cambridge University
Press.
Khalifa, Hanan; Weir, Cyril. 2008. Cambridge ESOL: Research Notes, 2-16.
Urquhart, A. H.; Weir, Cyril. 1998. Reading in a Second Language: Process, Product and
Practice. London: Longman.
Appendix 1: First draft items and testlet
Appendix 2: Testlet plus items Final version