Topic 10 Assessment in ESL

8/10/2019 Topic 10 Assessment in ESL



Topic 10 focuses on basic concepts and constructs in language assessment, particularly on language testing rather than on the broader issue of assessment. A brief overview of assessment is provided, which includes topics on:

(a) The nature of testing and evaluating ESL

(b) Validity, reliability and practicality

(c) Types of language tests

(d) Test format

(e) Alternative assessment

(f) Effects of testing

10.1 TESTING AND EVALUATING ESL

What means of assessment in Second Language Learning are you familiar with?

Why do teachers need to have a critical understanding of the principles and practice of language assessment? Some reasons are:

(a) Language tests play a significant role in many people's lives. Language tests are instruments for the institutional control of individuals (McNamara, 2000: 4).

(b) As teachers, we need to be aware of what is involved in testing: teaching to a test, administering tests, relying on information from tests to make decisions on students' performance, and even developing tests to assess students' progress.

(c) In action research, information about students' proficiency is required. You may need to use measures of the students' proficiency, either by using an existing test or by developing your own.

Before we discuss tests and the nature of tests, let us first clarify the terms evaluation, assessment and test. Table 10.1 provides the definition of each term.




Table 10.1: Terms and Definitions

Term         Definition
Evaluation   Systematic gathering of information for the purpose of making decisions.
Assessment   Procedure used to gain information about student learning and to form value judgments concerning learning progress.
Test         A procedure for obtaining information on students' performance.

Testing and teaching are interrelated and interdependent. However, their focus is not necessarily the same. Tests focus on assessing the products of learning, whereas teaching emphasises enabling students to succeed in the learning process (Chitravelu et al., 1995). A test is a yardstick a teacher uses to measure a learner's performance (Baker, 1989). Most tests are administered under examination conditions (e.g., formal, standardised tests and school examinations), whereas others are conducted as an integral part of the teaching and learning processes. Formal tests are systematic, planned sampling techniques designed to assist teachers and students in appraising students' achievement. Informal tests are unplanned assessments made as a course is conducted.

Testing can help students in many ways. Madsen (1983) points out that well-constructed tests:

(a) Help to create positive attitudes toward the class; and

(b) Assist students in mastering the language.

Testing helps teachers too, as they are expected to be accountable for the results of their instruction. Tests help teachers answer the question, "Have I been effective in my teaching?" Thus, tests can act as a check for teachers in diagnosing their own efforts as well as those of the students. As a teacher examines the students' tests, s/he might ask a series of questions: "Are my lessons pitched at the right level? Am I aiming my instruction too high or low? Am I teaching some skills effectively and others less effectively? What areas need more practice? Which points need reviewing?" (Madsen, 1983: 5). Tests can therefore benefit students, teachers, and administrators through confirmation of progress. Good tests can sustain or boost class morale and encourage learning.

For detailed information on testing and evaluation in ESL, visit the following site: http://www.middleweb.com/Assmntlinks.html




(ii) Predictive validity concerns the degree to which a test can predict a candidate's future performance.

(d) Construct Validity

A test is construct-valid if it can be shown that it measures just the ability which it is supposed to measure. "Construct" refers to any underlying ability which is hypothesised in the theory of language ability. If we try to measure the ability to infer the meaning of unknown words from context in a test, then that part of the test is construct-valid only if we are able to demonstrate that we are indeed measuring just that ability.

Visit this site for information on construct validity by James Dean Brown:

http://www.jalt.org/test/bro_8.htm

A summary of the types of validity is given in Table 10.2.

Table 10.2: Types of Validity

Type of Validity   Test
Face               Looks like a good one to the learner/layperson
Content            Accurately reflects the syllabus it is based on
Predictive         Accurately predicts future performance
Concurrent         Gives similar results to already validated tests or other immediate external criteria (e.g. teacher's subjective assessment)
Construct          Reflects closely a valid theory of foreign language learning that it takes as its model

ACTIVITY 10.1

1. Distinguish the key difference between content and construct validity.

2. If content validity is absent, why does construct validity assume greater importance?

3. Explain the fact that there is no final, absolute, and objective measure of validity. Why does validity ultimately go back to the subjective opinion of testers and theorists?




(h) Make comparisons between test-takers as directly as possible.

(i) Provide a detailed scoring key.

(j) Train scorers.

(k) Agree on what constitutes acceptable responses and appropriate scores at the outset of scoring.

(l) Identify candidates by number, not name, to avoid bias.

(m) Employ multiple, independent scoring.
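One simple way to see whether multiple, independent scoring (item m) is working is to compare the two sets of scores. The sketch below is illustrative only: the candidate numbers, the 0-5 band scale, the one-band tolerance, and the function names are all assumptions, not values or procedures taken from the text.

```python
def rater_agreement(scores_a, scores_b, tolerance=0):
    """Share of candidates whose two scores differ by at most `tolerance`."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both raters must score the same, non-empty set of candidates")
    agree = sum(1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= tolerance)
    return agree / len(scores_a)


def flag_for_third_rating(candidate_ids, scores_a, scores_b, tolerance=1):
    """Candidate numbers whose two scores differ by more than `tolerance`."""
    return [cid for cid, a, b in zip(candidate_ids, scores_a, scores_b)
            if abs(a - b) > tolerance]


# Hypothetical oral-test bands (0-5) from two independent scorers.
# Candidates are identified by number, not name (item l).
ids = [101, 102, 103, 104, 105]
rater_1 = [3, 4, 2, 5, 1]
rater_2 = [3, 3, 2, 2, 1]

exact = rater_agreement(rater_1, rater_2)        # identical bands only
adjacent = rater_agreement(rater_1, rater_2, 1)  # within one band
disputed = flag_for_third_rating(ids, rater_1, rater_2)
```

With these made-up scores, three of five candidates receive identical bands, four of five are within one band, and candidate 104 would be passed to a third scorer.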

Another way of ensuring that the test is valid and reliable is for the teacher to conduct an item analysis based on the scores from a preliminary run of the test. That is, if time permits, the teacher should pilot the test before administering it for real. The scores on the pilot can be analysed to determine whether the items are consistent, well-constructed, and valid. Bailey (1998), Cohen (2001), and Heaton (1990) describe how items on tests can be analysed.
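As a rough illustration of what an item analysis of pilot scores can look like, the sketch below computes two commonly used indices: item facility (the proportion of candidates answering an item correctly) and a simple upper-lower discrimination index. The choice of these two indices, the half-and-half grouping, and the pilot data are assumptions made for the example; the authors cited above may describe the procedure differently.

```python
def item_analysis(responses, group_fraction=0.5):
    """responses: one list of 0/1 item scores per candidate.

    Returns a list of (facility, discrimination) pairs, one per item.
    """
    n_candidates = len(responses)
    n_items = len(responses[0])
    # Rank candidates by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, int(n_candidates * group_fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]

    results = []
    for i in range(n_items):
        # Facility: proportion of all candidates who got the item right.
        facility = sum(r[i] for r in responses) / n_candidates
        # Discrimination: how much better the upper group did than the lower.
        upper_correct = sum(r[i] for r in upper)
        lower_correct = sum(r[i] for r in lower)
        discrimination = (upper_correct - lower_correct) / group_size
        results.append((facility, discrimination))
    return results


# Hypothetical pilot run: 6 candidates, 4 items (1 = correct, 0 = wrong).
pilot = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]
results = item_analysis(pilot)
```

In this made-up data the first item is answered correctly by everyone, so it does not separate stronger from weaker candidates; the last item has a negative discrimination value, meaning weaker candidates did better on it, which would normally prompt the teacher to re-examine that item.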

10.1.3 Practicality

Some tests are ideal in theory but difficult in practice. Mass oral testing is one example. Oral proficiency testing is important in the learning process, yet it is seldom part of national or school-level examinations: mass oral proficiency testing is expensive to conduct, it is time-consuming, and its reliability is often low because of inter-rater variability. Thus, the efficiency or practicality of a test involves issues of economy, ease of administration, scoring, and interpretation of results. The longer a test takes to construct, administer, and score, and the more skilled personnel and equipment it requires, the higher the costs are likely to be.

SELF-CHECK 10.1

In language testing, which test seems easy at a glance but is challenging to answer?




10.2 TYPES OF LANGUAGE TESTS

Nunan (1991) lists the following types of tests:

(a) Direct Tests Versus Indirect Tests

In direct testing, the test-taker performs precisely the skill being measured. For instance, if we want to test speaking skills, the students tested should be asked to speak. Indirect testing, on the other hand, attempts to measure the abilities underlying the skills in which we are interested.

(b) Discrete Point Versus Integrative Tests

Discrete point testing involves testing one element at a time, item by item. An example of this would be to have a number of items testing a particular grammatical structure. By contrast, integrative testing requires the test-taker to combine many language elements in order to complete the task.

(c) Norm-Referenced Tests Versus Criterion-Referenced Tests

Testing which relates one candidate's performance to that of other candidates is norm-referenced testing. For example, a student obtains a score that places him/her in the top ten percent of candidates who sat for the test, but we are not told directly what the candidate can do. By contrast, criterion-referenced testing is one in which we classify people according to whether or not they are able to perform some task or set of tasks satisfactorily.

(d) Objective Tests Versus Subjective Tests

The difference between the two lies in the form of scoring. If no judgment is needed on the part of the scorer, the scoring is objective. If judgment is required, the scoring is subjective.

(e) Communicative Language Testing

Communicative language testing involves using a test which measures the ability of candidates to take part in acts of communication, including reading and writing (Weir, 1993). Such tests are normally intended to be a measure of how test-takers are able to use language in real-life situations. Communicative tests are often context-specific.
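The contrast between norm-referenced and criterion-referenced reporting in (c) can be made concrete with a small sketch. The cohort scores, the task names, and both function names are hypothetical and serve only to show the two kinds of statement a test result can support.

```python
def percentile_rank(score, all_scores):
    """Norm-referenced: percentage of candidates scoring below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)


def meets_criterion(task_results, required_tasks):
    """Criterion-referenced: True only if every required task was done satisfactorily."""
    return all(task_results.get(task, False) for task in required_tasks)


# Hypothetical cohort of ten candidates.
cohort = [42, 55, 61, 61, 70, 74, 78, 83, 88, 95]
rank = percentile_rank(88, cohort)  # position relative to the other candidates

# Hypothetical task checklist for one candidate.
required = ["give directions", "take a phone message"]
candidate = {"give directions": True, "take a phone message": False}
passed = meets_criterion(candidate, required)
```

The first result says only where the candidate stands among the others; the second says what the candidate can or cannot yet do, regardless of how anyone else performed.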





10.3 TEST FORMATS

There are various test formats, sometimes called test techniques, used in assessing language ability. Through test formats we obtain information about the candidate's language skill or ability. Some formats are suitable for testing certain language skills or abilities. Formats do not, in general, determine what can be tested. For instance, we can use a multiple-choice format to test grammar, reading, and vocabulary, and for diagnostic testing.

In choosing a particular format, we need to consider the following questions:

(a) Does the test format allow us to obtain the information we need about the students' ability in the skill we are testing? Will the results we obtain be valid and reliable?

(b) Is using the format the most economical way to obtain the information we want? Will it have good backwash effects? Backwash effect is the effect testing has on teaching and learning.

(c) Are the students sufficiently familiar with the format?

The following is a discussion of two of the more common language test formats, namely the multiple-choice item and the cloze. For other types of tests, refer to Weir (1993), Heaton (1990) and McNamara (2000).

(a) Multiple-choice Questions (MCQ)

Hughes (1989) provides the following as the basic structure of a multiple-choice question (MCQ):

There is a stem:

Enid has been here ______ half an hour.

A number of options are provided, one of which is correct, and the others are distractors:

(i) during

(ii) for

(iii) while

(iv) since





The candidate has to identify the correct or most appropriate option.

The most obvious advantage of the MCQ is that scoring can be perfectly reliable, rapid and economical. Another advantage is that it is possible to include more items, because it is quite easy for students to respond to the questions by putting a mark on the paper. This makes for greater test reliability.

Limitations of the MCQ

There are a number of disadvantages of the MCQ format. First, if there is no fit between a candidate's productive and receptive skills, the MCQ may give an inaccurate picture of the candidate's ability.

Second, guessing may have a considerable but unknowable effect on test scores. On average, we expect a person to score about 33 on a 100-item test with three options per item (or 25 with four options) by mere chance. The restricted number of responses allows guessing to happen on the more difficult questions.
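The chance score quoted above follows from dividing the number of items by the number of options per item. The sketch below works this out and also shows the conventional "correction for guessing" formula, R - W/(k - 1); that formula is a standard psychometric adjustment added here for illustration and is not something the text itself proposes. The function names are hypothetical.

```python
def expected_chance_score(n_items, n_options):
    """Average score from blind guessing on every item."""
    return n_items / n_options


def corrected_score(right, wrong, n_options):
    """Conventional correction for guessing: R - W / (k - 1).

    Omitted items count as neither right nor wrong.
    """
    return right - wrong / (n_options - 1)


three_option = expected_chance_score(100, 3)  # roughly 33
four_option = expected_chance_score(100, 4)   # 25.0

# A candidate who guessed blindly on 100 four-option items and got 25 right
# ends up with a corrected score of zero.
pure_guesser = corrected_score(right=25, wrong=75, n_options=4)
```

The correction only removes the average gain from guessing; an individual guesser may still be luckier or unluckier than average, which is why the effect on any one score remains unknowable.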

Another limitation is that the format severely restricts what can be tested. The MCQ requires distractors, and distractors are not always available and require skill to construct. In addition, it is difficult and time-consuming to write successful items. However, the time saved in administration and scoring far outweighs the time spent on constructing a successful test.

A fourth limitation is that backwash effects may be harmful. Practising for the test may have a harmful effect on learning and teaching, since practising MCQ items may not be the best way to improve language proficiency.

Lastly, the MCQ may facilitate cheating. The responses on the MCQ (a, b, c, d) are simple enough to communicate to other candidates non-verbally. One way to avoid this is to have two versions of the test, one of which has the options in reverse order.

(b) Clozentropy or Cloze Procedure

Cloze tests are prepared by deleting a certain number of words from a text and replacing these deleted words with blanks. Candidates then fill in the blanks with the appropriate word or answer. Pure cloze is where every nth word is deleted (fifth, sixth, tenth, etc.), depending on the level of difficulty. When words are not deleted at a fixed, consistent interval, this is called a random cloze. When the test constructor chooses the type of words to delete, this is called a rational cloze.
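A pure (fixed-ratio) cloze is mechanical enough to sketch in a few lines. The function below deletes every nth word and returns the gapped text together with an answer key. The function name, the blank marker, the optional `lead_in_words` parameter (a run of opening words left intact), and the sample sentence are all assumptions made for illustration.

```python
def make_cloze(text, n=5, lead_in_words=0, blank="_____"):
    """Delete every nth word after an optional intact lead-in.

    Returns (cloze_text, answer_key).
    """
    words = text.split()
    out, key = [], []
    for index, word in enumerate(words):
        position = index - lead_in_words + 1  # 1-based count after the lead-in
        if position > 0 and position % n == 0:
            # Keep trailing punctuation outside the blank.
            core = word.rstrip(".,;:!?")
            if core:
                key.append(core)
                out.append(blank + word[len(core):])
                continue
        out.append(word)
    return " ".join(out), key


passage = "The cat sat on the mat because it was warm and dry."
cloze_text, answers = make_cloze(passage, n=3)
```

A smaller n produces more blanks per line of text and, in general, a harder test. A rational cloze could be sketched the same way by replacing the every-nth test with a check on the kind of word to delete.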




The cloze test is used to test grammar, vocabulary, reading and writing skills. The kinds of blank chosen and the kinds of options given determine what the cloze test tests.

Constructing the Cloze Test

In choosing texts for the cloze, the teacher must ensure that the language is appropriate to the students' level. In general, a text passage that is too factual does not lend itself to good cloze testing. The best types of text for a cloze are stories, descriptions of processes, and explanations of something.

In addition, students should be given clear directions on how to take the cloze test and how to respond to it. Teachers can instruct students to read through the whole text before attempting to fill in the blanks.

Another guideline in constructing the cloze is that a sentence or two must be left intact at the beginning of the text. These are called the lead-in. In writing the options (if alternatives are provided), ensure that the options are placed in one of the following ways:

(i) Immediately after each blank;

(ii) In the margin;

(iii) Below the text, in numbered groups; or

(iv) In a separate answer sheet.

The cloze can be scored in several ways. Cloze tests with alternatives are marked using a key. Cloze tests which do not have options can be marked in one of two ways. One is to accept only the words in the original text (before deletion); this is called exact word marking. The other is to accept any word which fits the blank, termed acceptable word marking.
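The difference between exact word marking and acceptable word marking can be sketched as follows. The answer key, the lists of acceptable alternatives, and the candidate's responses are invented for the example; in practice the marker decides which alternatives fit each blank.

```python
def score_cloze(responses, exact_key, acceptable=None):
    """Count correctly filled blanks.

    exact_key:  the original words, in order (exact word marking).
    acceptable: optional list of sets, one per blank, holding other words
                a marker would accept (acceptable word marking).
    """
    score = 0
    for i, (response, original) in enumerate(zip(responses, exact_key)):
        answer = response.strip().lower()
        allowed = {original.lower()}
        if acceptable is not None:
            allowed |= {word.lower() for word in acceptable[i]}
        if answer in allowed:
            score += 1
    return score


# Hypothetical four-blank cloze.
key = ["sat", "mat", "was", "dry"]
alternatives = [{"lay", "slept"}, {"rug"}, set(), {"cosy"}]
candidate = ["lay", "mat", "was", "cosy"]

exact_score = score_cloze(candidate, key)                     # exact word marking
acceptable_score = score_cloze(candidate, key, alternatives)  # acceptable word marking
```

The same script earns 2 out of 4 under exact word marking and 4 out of 4 under acceptable word marking, which shows how much the choice of marking method can move a score.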

Strengths and Weaknesses of the Cloze:

(i) It is easily constructed, administered, and scored.

(ii) It has a high degree of reliability because it is an objective test.

(iii) Cloze is not appropriate for diagnostic purposes.

(iv) The cloze can be a difficult task.

There are other forms of test formats. Refer to Chitravelu et al. (1995) and other testing books on language for a detailed account of this topic.




What types of test format can be used in a reading comprehension test?

10.4 ALTERNATIVE ASSESSMENT

Recent developments in testing provide teachers with new perspectives on testing language abilities. The following are some issues related to this recent change in testing (Brown, 2001: 403-410):

(a) New Perspectives on Intelligence

Intelligence was seen in the past as the ability to perform linguistic and logical-mathematical problem solving. Today, our world is dominated by standardised, norm-referenced tests that are timed, multiple-choice, tricky, long, and artificial. A more recent theory of intelligence proposed by Gardner (1983) states that there are various forms of intelligence, as discussed in an earlier chapter, as follows:

(i) Linguistic intelligence

(ii) Logical-mathematical intelligence

(iii) Spatial intelligence (the ability to find your way around an environment, to form mental images of reality)

(iv) Musical intelligence (the ability to perceive and create pitch and rhythmic patterns)

(v) Bodily-kinesthetic intelligence (fine motor movement, athletic prowess)

(vi) Interpersonal intelligence (the ability to understand others, how they feel, and to interact effectively with them)

(vii) Intra-personal intelligence (the ability to understand oneself and to develop a sense of self-identity)


ACTIVITY 10.2

Draw a mind-map on testing and evaluating ESL. Think of other advantages and limitations of using each of the test procedures described in the Topic.




These new perspectives on intelligence pose a challenge to teachers in testing students' abilities, as students are endowed with multiple intelligences, each of which varies from student to student. We now need to be able to test interpersonal, creative, communicative, and interactive skills, and in doing so we have to trust our own subjectivity and intuition (Brown, 2001: 404).

(b) Performance-based Testing

In educational settings around the world, testing has taken on a new agenda. Instead of just the traditional paper-and-pencil, single-answer tests, performance-based tests are being introduced in schools. They involve:

(i) Open-ended problems

(ii) Labs

(iii) Hands-on projects

(iv) Student portfolios

(v) Experiments

(vi) Essay writing

(vii) Group projects

Although such testing is time-consuming and thus expensive, the losses in practicality are compensated for by higher validity. Students are tested on actual performance. Learners are gauged on the process of performing the criterion, and this establishes high content validity. In language teaching, the teacher needs to strike a balance between formal and informal testing. Through more formative evaluation of students' performance of various tasks, the teacher can move towards meeting the goals of performance-based testing.

(c) Interactive Language Tests

Interactive tests come under such performance-based testing. Interactive tests are constructed in line with Gardner's and Sternberg's theories of intelligence, in which students are assessed in the process of interacting with others (Brown, 2001). Tests thus have to involve people actually performing the behaviour that we want to measure.

(d) Traditional and Alternative Assessments

These tests imply a move towards alternative ways of testing in that more authentic elicitation of meaningful communication is emphasised. Table 10.3 compares traditional and alternative assessments (Brown, 2001: 408):




Table 10.3: A Comparison between Traditional and Alternative Assessment

Traditional Assessment             Alternative Assessment
one short-term standardised exam   continuous long-term assessment
timed multiple-choice format       untimed, free response format
decontextualised items             contextualised communicative tasks
scores suffice for feedback        formative, interactive feedback
norm-referenced scores             criterion-referenced scores
focus on the right answer          open-ended, creative answers
summative                          formative
oriented to product                oriented to process
non-interactive performance        interactive performance
fosters extrinsic motivation       fosters intrinsic motivation

Brown (2001) proposes that traditional testing offers significantly higher levels of practicality. More time and budget are needed to conduct and evaluate assessments that require the subjective evaluation, individualisation, and interaction in feedback found in alternative assessments. However, the payoff comes in the form of useful feedback to students, better chances for intrinsic motivation, and eventually greater validity.

Brown (2001: 408-409) suggests the following four principles for converting ordinary, traditional tests into authentic, intrinsically motivating learning opportunities for learners:

(a) Test Taking Strategies

Teachers can help learners with appropriate and useful strategies for test taking. To ensure students are prepared to do their best in tests, teachers should consider the following as a guide:

(i) Before the Test

Provide the students with the necessary information about the test. What will the test cover? Which topics will be the most important? What kind of items will be included? How long will the test be?

Encourage students to do a systematic review of material, e.g., skim the book, outline the major points, write down examples, etc.




Give students practice tests or exercises, if available.

Facilitate the formation of study groups, if possible.

Remind students to get a good night's rest before the test.

Remind students to get to the classroom early.

(ii) During the Test

As soon as the test is distributed, instruct students to quickly skim through the whole test to get a good grasp of the different parts.

Advise students to concentrate as carefully as possible.

Alert students a few minutes before the test ends so that they can proofread their answers, catch careless errors, and still finish on time.

(iii) After the Test

When the test is given back to students, give feedback on the specific things each student did well on, what s/he did not do well on, and the possible reasons for such a judgment.

Advise students to concentrate on the feedback you give in class.

Encourage questions from students.

Remind students to focus in future on the points that they are weak on.

(b) Face Validity

Sometimes students are not fully aware of what is being tested. They may feel that the test is not testing what it is supposed to test. Face validity means that the students must perceive the test to be valid. To help foster this perception, the teacher can:

(i) Prepare a carefully constructed, well-thought-out format.

(ii) Develop a test that can be completed within the time given.

(iii) Write items that are clear and uncomplicated.

(iv) Include directions that are crystal clear.

(v) Give tasks that are familiar and related to their coursework.

(vi) Ensure the difficulty level of the test is appropriate for the students.




(c) Authenticity

The teacher needs to ensure that the language in the test is as authentic as possible. Provide the language with context so that items are not a string of unrelated language samples, e.g., through thematic organisation of items. The tasks must also be in a format with which students are familiar. A classroom test is not the time to introduce new tasks, because we will not know whether student difficulty is a factor of the task or of the language we are testing.

(d) Washback

The benefit that tests offer is known as washback. Formal tests must be learning devices through which students can receive feedback and diagnosis of their strengths and weaknesses. It is important for teachers to give prompt feedback to foster intrinsic motivation. The teacher needs to give a generous number of specific comments as feedback, instead of limiting it to a letter grade or number score. Give praise for strengths and constructive criticism for weaknesses. Washback also means that the teacher is available for discussion with students to go over their strengths and limitations. The following section deals with the issue of washback in more detail.

10.5 EFFECTS OF TESTING

Tests have a strong influence on the curriculum because they determine the future opportunities of individuals and influence the reputation of teachers and schools (McNamara, 2000: 73). We cannot agree more with this because, in Malaysia, standardised examinations play a pivotal role in the opportunities available to students. We need to know some of the effects of testing on our teaching and on students' learning in order to create a healthy balance between practicality and the ideals in education.

Washback, or the effect of tests on teaching and learning, is a constantly debated issue in education. Ethical language testing practice should work to ensure positive washback from tests. For example, it is often argued that performance assessments have better washback than multiple-choice (MCQ) test formats or cloze. Performance assessments require the integration of knowledge and skills; they therefore require preparation which encourages both students and teachers to invest time in practising such tasks in the classroom. In contrast, MCQs often test knowledge of grammar or vocabulary, which may inhibit communicative approaches to learning and teaching.





However, research on assessments and washback has found that washback is often unpredictable (McNamara, 2000: 74). McNamara further adds that whether or not the desired effect is achieved in efforts at curriculum change depends much on local conditions in the classroom, the established traditions of teaching, the immediate motivation of learners, and the unpredictable ways in which interactions develop.

According to Bachman and Palmer (1997: 30-34), washback is discussed in the testing literature largely as the direct impact of testing on individuals. Hughes (1989) defines washback as the direct effects of testing on teaching and learning and points out that testing can have either a beneficial or a harmful effect on teaching or learning. Cohen (1994, in Bachman and Palmer, 1997: 30) discusses washback as the effect of assessment instruments on educational practice and beliefs. Research has also shown that washback is a complex and difficult issue, rather than simply the effect of testing on teaching (Wall and Alderson, 1993).

Some of the impacts discussed by Bachman and Palmer (1997: 30-35) are as follows:

(a) Impact on Individuals

Stakeholders who are directly affected by tests are the test-takers and the test users or decision-makers. Others, such as test-takers' future classmates or co-workers and future employers, are indirectly affected. Every member of a particular system (society) is indirectly affected by the use of tests.

(b) Impact on Test Takers

The following aspects of a testing procedure can affect test takers:

(i) The experience of taking and preparing for the test;

(ii) The feedback received on their performance on the test; and

(iii) The decisions made about them based on the test scores.

Figure 10.1: Tests can affect test takers




The experience of taking a test can have an impact on the test-taker (Figure 10.1). If the topical or cultural information in the test is new, this affects the test-taker's performance. The test-taker's language knowledge can also be affected, especially when something presented in the test as grammatically correct is, in fact, not. In addition, the types of feedback given to test-takers are likely to affect them directly. Feedback thus has to be highly relevant, complete, and meaningful to the test-takers. Verbal descriptions and written comments can have a positive effect on test-takers' perceptions of the test.

Finally, decisions made based on test scores directly affect the test-takers. For example, acceptance into an instructional program, advancement, and employment are decisions that can have serious consequences for test-takers.

Are decision procedures and criteria applied uniformly to all groups of test-takers? Fair test use, according to Bachman and Palmer (1997), is related to the relevance and appropriateness of the test score to the decision. Is it fair to make a life-affecting decision solely on the basis of a test score?

(c) Impact on Teachers

Test-users such as teachers are also directly affected by tests. A majority of teachers are familiar with the amount of influence testing can have on their instruction. Teachers may prefer to teach in a particular way, but if they have to teach in a specified way, they may find teaching to the test unavoidable. Bachman and Palmer (1997: 33) describe the term "teaching to the test" as doing something in teaching that may not be compatible with the teacher's own values and goals, or with the values and goals of the instructional program. If teachers feel that what they teach is not relevant to the test (or vice versa), this is seen as a lack of authenticity, whereby the test may have harmful washback or negative impacts on instruction.

(d) Impact on Society and Education Systems

Test-developers and users need to consider the societal and educational value systems that inform test use. However, values and goals that inform test use may vary cross-culturally. For instance, a particular culture may place a high value on individual effort and achievement, while in another, cooperation and respect for authority may be highly valued. In addition, values and goals change over time.




Confidentiality and access to information are today regarded as basic rights of test-takers, although they were once not considered at all. In high-stakes tests involving decision-making on a large group of individuals (e.g., standardised tests), tests have a direct impact on teaching practices and programs.

The following is a list of things that we can do to organise our assessment of the potential consequences of tests (Bachman and Palmer, 1997: 35):

(i) List the intended uses of the test;

(ii) List the potential consequences (positive and negative) of using the test in these ways;

(iii) Rank the possible outcomes in terms of the desirability or undesirability of their occurring;

(iv) Gather information to determine how likely each of the various outcomes is.

Analyses of the possible consequences of using a particular test need to be complemented by considering the consequences of using alternatives to testing to achieve the same purpose.

ACTIVITY 10.3

1. Think of tests with which you are familiar (e.g., a classroom test). What do you think is the backwash effect of each of them? Are they harmful or beneficial? What are the reasons for your conclusions?

2. Study a classroom test and describe it in terms of its purpose, its validity and reliability, and its potential backwash effects. Do you think the test provides accurate information about the students?




This topic provided a brief overview of some basic concepts of language testing, including validity, reliability, and practicality.

The different types of test, such as direct versus indirect tests, discrete point versus integrative tests, norm-referenced versus criterion-referenced tests, and so forth were also described.

The chapter included a discussion of two common language testing formats: multiple choice and cloze.

A brief discussion was also included on the topic of alternative assessment.

Finally, the effects of testing on students and society at large were described.

Cloze tests

Construct validity

Content validity

Criterion-related validity

Direct test

Face validity

Indirect test

Performance-based testing

Practicality

Reliability
