
Page 1: e-Assessment: From Implementation to Practice

e-Assessment: From Implementation to Practice

Myles Danson
Farzana Khandia

Loughborough University

Page 2: e-Assessment: From Implementation to Practice

Learning Outcomes

• Identify the benefits of online assessment

• Understand the pros and cons of using different question types

• Construct questions based on the advice given

• Translate statistical output of questions to improve the quality

Page 3: e-Assessment: From Implementation to Practice

Overview of CAA

“CAA is a common term for the use of computers in the assessment of student learning. The term encompasses the use of computers to deliver, mark and analyse assignments or examinations.” (Bull et al., 2001, p. 8)

Page 4: e-Assessment: From Implementation to Practice

Benefits – For Staff

1. Reduces lecturer administration (marking)

“ ... online assessment is capable of greater flexibility, cost effectiveness and timesaving. It is these attributes that have made it appealing in an age of competitive higher education funding and resource constraints.” McLoughlin, 2002, p. 511.

2. Supports distance learning assessment

3. Potential to introduce a richer range of material (audio, visual)

4. Ability to monitor the quality of the questions

5. Question reusability

Page 5: e-Assessment: From Implementation to Practice

Benefits – For Students

1. They can revise and rehearse at their own pace

2. Flexible access

3. Instant Feedback
“Action without feedback is completely unproductive for a learner.” (Laurillard, 1993, p. 61)

4. Alternative form of assessment

Page 6: e-Assessment: From Implementation to Practice

Types of CAA - Web

• Authoring via PC
• Delivery on PC / Web

Page 7: e-Assessment: From Implementation to Practice

Types of CAA - OMR

• Authoring / delivery on paper

• Marked using technology

• Example

Page 8: e-Assessment: From Implementation to Practice

Scenarios of use - Diagnostic

A one-off test at the start of the academic year, or at intervals throughout the year, to gauge:

1. Variety in student knowledge

2. Gaps in student knowledge

3. Common misconceptions

or to help:

4. Plan lecture content

Page 9: e-Assessment: From Implementation to Practice

Scenarios of use - Formative

• Promote learning by providing feedback

• Objective tests can be used at regular intervals within a course to:
  – determine which topics have been understood, or
  – motivate students to keep pace with the teaching of the module

Page 10: e-Assessment: From Implementation to Practice

Scenarios of use - Summative

Can be used to:

• test the range of the student's understanding of course material

• Norm-referenced or criterion-referenced

• High Risk

Page 11: e-Assessment: From Implementation to Practice

Demo

• Lboro Tests

• Bruce Wright Excel test

• Maths practice

Page 12: e-Assessment: From Implementation to Practice

Drivers

• Widening participation (Student diversity)

• Increasing student retention

• Enhanced quality of feedback

• Flexibility for distance learning

• Coping with large student numbers

• Objectivity in marks / defensibility

• QCA / DFES / HEFCE / JISC

Page 13: e-Assessment: From Implementation to Practice

QCA

• e-Assessment will be rolled out in post-16 education by 2009
• e-Assessment will make a significant contribution to reducing the assessment burden and improving the quality of assessment

• e-Assessment field trials should be taking place in at least two subjects per awarding body during 2005

• 75% of key and basic skills tests will be delivered on screen by 2005

• All new qualifications will include an option for on-screen assessment

• All Awarding Bodies should be set up to accept and assess e-portfolios

Page 14: e-Assessment: From Implementation to Practice

Barriers

• Availability of resources
• Lack of confidence in e-assessment
• Fear of technical failure in high-stakes assessments
• Work pressures on academic staff (insufficient time to give to developing the potential of e-assessment)

• Fit for purpose / assessing appropriate levels of learning?

• Authentication issues e.g. learner identity, plagiarism

Page 15: e-Assessment: From Implementation to Practice

Support and Training

• Student support
  – Practice tests and fall-back arrangements
  – Demonstration of the software
  – Special needs
• Staff support
  – Introductory session
  – Follow-up session
  – Pre-, during- and post-examination procedures

Page 16: e-Assessment: From Implementation to Practice

Question Design

It’s very easy to write objective questions but difficult to write good objective questions.

Page 17: e-Assessment: From Implementation to Practice

Parts of a MCQ

A multiple choice question consists of four discrete elements: the STEM (the question or incomplete statement), the OPTIONS (the full set of alternatives offered), the KEY (the correct option) and the DISTRACTERS (the incorrect options).

e.g.
As children’s language skills increase in complexity, from the pre-linguistic phase to telegraphic speech, the progression is most noticeable in the area of
a. semantics
b. intonation
c. syntax
d. inference
e. clause combinations
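To make these parts concrete, here is a minimal sketch of an item modelled in code (Python; the MCQItem class and its field names are illustrative assumptions, not part of the presentation or of any particular CAA package):

    from dataclasses import dataclass

    @dataclass
    class MCQItem:
        stem: str            # the question or incomplete statement
        options: list[str]   # every alternative shown to the candidate
        key: int             # index of the correct option within options

        @property
        def distracters(self) -> list[str]:
            # the incorrect options, i.e. everything except the key
            return [opt for i, opt in enumerate(self.options) if i != self.key]

    # Example item (the archipelago question from a later slide):
    item = MCQItem(
        stem="A chain of islands is called an",
        options=["archipelago", "peninsula", "continent", "isthmus"],
        key=0,
    )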

Page 18: e-Assessment: From Implementation to Practice

Writing Stems

• When possible, state the stem as a direct question rather than as an incomplete statement.

e.g.

Alloys are ordinarily produced by ...

How are alloys ordinarily produced?


Page 19: e-Assessment: From Implementation to Practice

• Avoid irrelevant clues such as grammatical structure, well-known verbal associations or connections between stem and answer.
e.g.
A chain of islands is called an:
*a. archipelago
b. peninsula
c. continent
d. isthmus

Grammatical clue! (the article “an” fits only “archipelago”)

Page 20: e-Assessment: From Implementation to Practice

The height to which a water dam is built depends on

a. the length of the reservoir behind the dam.

b. the volume of water behind the dam.

*c. the height of water behind the dam.

d. the strength of the reinforcing wall.

Connection between stem and answer clue (both the stem and the key refer to height)

Page 21: e-Assessment: From Implementation to Practice

• Use negatively stated stems sparingly. When used, underline and/or capitalise the negative word.
e.g.
Which of the following is not cited as an accomplishment of the Kennedy administration?

Which of the following is NOT cited as an accomplishment of the Kennedy administration?

Page 22: e-Assessment: From Implementation to Practice

• Eliminate excessive verbiage or irrelevant information from the stem.

e.g.

While ironing her formal, Jane burned her hand accidentally on the hot iron. This was due to a transfer of heat be ...

Which of the following ways of heat transfer explains why Jane's hand was burned after she touched a hot iron?

Page 23: e-Assessment: From Implementation to Practice

• Include in the stem any word(s) that might otherwise be repeated in each alternative.
e.g.
In national elections in the United States the President is officially
a. chosen by the people.
b. chosen by members of Congress.
c. chosen by the House of Representatives.
*d. chosen by the Electoral College.

In national elections in the United States the President is officially chosen by
a. the people.
b. members of Congress.
c. the House of Representatives.
*d. the Electoral College.

Page 24: e-Assessment: From Implementation to Practice

• Present a definite, explicit and singular question or problem in the stem.

e.g.

Psychology ...

The science of mind and behaviour is called ...

Page 25: e-Assessment: From Implementation to Practice

Writing Distracters

• Use the alternatives “none of the above” and “all of the above” sparingly. When used, such alternatives should occasionally be the correct response.

• Ensure there is only one unquestionably correct answer.
e.g.
The two most desired characteristics in a classroom test are validity and
a. precision.
*b. reliability.
c. objectivity.
*d. consistency.

The two most desired characteristics in a classroom test are validity and
a. precision.
*b. reliability.
c. objectivity.
d. standardisation.

Page 26: e-Assessment: From Implementation to Practice

• Make the alternatives grammatically parallel with each other, and consistent with the stem.

What would do most to advance the application of atomic discoveries to medicine?
*a. Standardised techniques for treatment of patients.
b. Train the average doctor to apply radioactive treatments.
c. Remove the restriction on the use of radioactive substances.
d. Establishing hospitals staffed by highly trained radioactive therapy specialists.

What would do most to advance the application of atomic discoveries to medicine?
*a. Development of standardised techniques for treatment of patients.
b. Training of the average doctor in application of radioactive treatments.
c. Removal of restriction on the use of radioactive substances.
d. Addition of trained radioactive therapy specialists to hospital staffs.

Page 27: e-Assessment: From Implementation to Practice

• Make alternatives approximately equal in length. e.g.

The most general cause of low individual incomes in the United States is
*a. lack of valuable productive services to sell.
b. unwillingness to work.
c. automation.
d. inflation.

What is the most general cause of low individual incomes in the United States?
*a. A lack of valuable productive services to sell.
b. The population’s overall unwillingness to work.
c. The nation’s increased reliance on automation.
d. An increasing national level of inflation.

Page 28: e-Assessment: From Implementation to Practice

• Make all alternatives plausible and attractive to the less knowledgeable or skilful student. e.g.

What process is most nearly the opposite of photosynthesis?

a. Digestion
b. Relaxation
*c. Respiration
d. Exertion

a. Digestion
b. Assimilation
*c. Respiration
d. Catabolism

• Vary the position of the correct answer in a random way (a short sketch of one way to do this follows)
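One simple way to randomise answer positions while keeping track of where the key ends up is sketched below (Python; the function name and interface are illustrative assumptions, not from the presentation):

    import random

    def shuffle_options(options, key_index, rng=random):
        """Return the options in a random order and the key's new index."""
        order = list(range(len(options)))
        rng.shuffle(order)                       # random permutation of positions
        shuffled = [options[i] for i in order]
        return shuffled, order.index(key_index)  # where the key now sits

    # e.g. shuffle_options(["archipelago", "peninsula", "continent", "isthmus"], 0)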

Page 29: e-Assessment: From Implementation to Practice

Exercise

Item Intimations

Page 30: e-Assessment: From Implementation to Practice

Question / Test Performance

• Caution is needed when using statistics based on small samples.

• External awarding bodies and test producers aim to trial items with at least 200 candidates to get reliable item statistics

• If there are fewer than 50 candidates the statistics will only give an approximate indication of how the items are working

Page 31: e-Assessment: From Implementation to Practice

Statistics for whole test

• Mean

• Standard deviation

• Reliability.

Page 32: e-Assessment: From Implementation to Practice

Mean

= Average mark

Expected value:

• 65-75% for most tests (50-75% acceptable)

• Up to 90% in formative tests

• Mean should be similar to previous years.
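As a concrete illustration (a Python sketch; the function name is ours, not from the presentation), the mean as a percentage is just the average raw score divided by the maximum mark:

    def mean_percent(scores, max_mark):
        """Average raw score expressed as a percentage of the maximum mark."""
        return 100 * sum(scores) / (len(scores) * max_mark)

    # e.g. mean_percent([96, 102, 88], max_mark=120) -> roughly 79.4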

Page 33: e-Assessment: From Implementation to Practice

Possible causes if Mean is not as expected:

1. Candidates are more / less able than in past (or than expected)

2. Candidates are better / less well taught

3. Test is too difficult / too easy

4. Time allowance was too short.

Page 34: e-Assessment: From Implementation to Practice

Standard deviation (s.d. or σ)

= a measure of the spread of the marks

Page 35: e-Assessment: From Implementation to Practice

Standard deviation (s.d. or σ)

Expected value:

• 10-15%

• Will be much lower if mean is high

• S.D. should be similar to previous years.
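A matching sketch for the spread (Python; using the population standard deviation is an assumption, since the slides do not say which variant the reports use):

    import statistics

    def sd_percent(scores, max_mark):
        """Standard deviation of raw scores as a percentage of the maximum mark."""
        return 100 * statistics.pstdev(scores) / max_mark

    # e.g. sd_percent([96, 102, 88], max_mark=120) -> roughly 4.8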

Page 36: e-Assessment: From Implementation to Practice

Possible causes if Standard Deviation is low:

1. The candidates are genuinely of similar ability

2. The test is not distinguishing satisfactorily between the better and weaker candidates

3. The test covers two or more quite different topics (or abilities).

Page 37: e-Assessment: From Implementation to Practice

Reliability

= a measure of internal consistency (0 – 1)

Given as a decimal; the theoretical maximum is 1.0

Expected value:

• above 0.75
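The slides do not name the coefficient here, but the worked example later quotes K-R 20 (Kuder-Richardson 20), a standard internal-consistency measure for right/wrong items. A minimal sketch (Python; function and variable names are ours):

    import statistics

    def kr20(item_scores):
        """Kuder-Richardson 20 reliability.
        item_scores: one row per candidate, each row a list of 0/1 item marks."""
        k = len(item_scores[0])                      # number of items
        n = len(item_scores)                         # number of candidates
        totals = [sum(row) for row in item_scores]   # total score per candidate
        var_total = statistics.pvariance(totals)     # variance of total scores
        facilities = [sum(row[i] for row in item_scores) / n for i in range(k)]
        sum_pq = sum(p * (1 - p) for p in facilities)
        return (k / (k - 1)) * (1 - sum_pq / var_total)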

Page 38: e-Assessment: From Implementation to Practice

Possible causes if Reliability is low:

1. Too many items with low Discrimination

2. Test is too short.

Page 39: e-Assessment: From Implementation to Practice

Example: Beauty Therapy exam

Number of candidates: 151
Max possible score: 120
Range of scores: 63 to 112 (53% to 93%)
Mean: 96.2 (80.2%)
Standard deviation: 9.96 (8.3%)
K-R 20 reliability: not calculated
Pass rate: 96.7%

Page 40: e-Assessment: From Implementation to Practice

Statistics for each item:

• Facility

• Discrimination

• Number and % choosing each option

• Mean for outcome

Page 41: e-Assessment: From Implementation to Practice

Facility

Also called Difficulty or P-value

= Proportion or percentage of candidates answering the question correctly

= Proportion choosing the key (multiple-choice)

Expected value:

• 40-90% for multiple-choice

• May be lower in other item types.
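In code terms (a sketch; the function name is illustrative), the facility of a multiple-choice item is simply the share of candidates who chose the key:

    def facility(responses, key):
        """Percentage of candidates whose response matches the key."""
        return 100 * sum(1 for r in responses if r == key) / len(responses)

    # e.g. facility(["B", "B", "A", "B", "C", "B", "B", "B", "B", "B"], "B") -> 80.0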

Page 42: e-Assessment: From Implementation to Practice

Possible causes if Facility is below 40%:

1. Key is not the best answer

2. Some candidates didn’t have time to answer

3. Topic is too difficult

4. The item is unclear or contains error

5. One or more distractors are unfair

6. Item is too complex.

Page 43: e-Assessment: From Implementation to Practice

Possible causes if Facility is above 90%:

1. Topic is too easy

2. Wording contains clues

3. One or more distractors are weak.

Page 44: e-Assessment: From Implementation to Practice

Discrimination

Theoretically possible values range from –1.0 to +1.0

Shows whether the candidates who answered this item correctly were the candidates who generally did better on the whole test.

Expected value: +0.2 or above
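The slides do not say which discrimination statistic is reported; one common choice that behaves as described (ranging from –1 to +1, and positive when candidates who get the item right also do well overall) is the item-total point-biserial correlation. A hedged sketch in Python:

    import statistics

    def discrimination(item_marks, total_scores):
        """Correlation between 0/1 marks on one item and candidates' total scores."""
        n = len(item_marks)
        mi = sum(item_marks) / n
        mt = sum(total_scores) / n
        cov = sum((i - mi) * (t - mt) for i, t in zip(item_marks, total_scores)) / n
        sd_i = statistics.pstdev(item_marks)
        sd_t = statistics.pstdev(total_scores)
        return cov / (sd_i * sd_t)   # undefined if either spread is zero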

Page 45: e-Assessment: From Implementation to Practice

Possible causes if Discrimination is too low:

1. Topic tested is unlike rest of the test

2. Item is very easy (Facility > 90%) or very difficult (Facility < 40%)

3. Item is misleading

4. One or more distractors are unfair.

A negative Discrimination is never acceptable.

Page 46: e-Assessment: From Implementation to Practice

Display for each option

Option             A      B*     C      D
Number             10     137    1      3
%                  6.6    90.7   0.7    1.9
Mean for outcome   87.7   97.4   74     76

Facility 90.7   Discrimination 0.39
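A per-option display like the one above (number, %, and mean-for-outcome for each of A–D) can be rebuilt from raw data with a short script; this is a sketch assuming we have, for each candidate, the option they chose and their total test score:

    from collections import defaultdict

    def option_report(chosen, totals, key):
        """Number and % choosing each option, plus the mean total score
        ('mean for outcome') of the candidates who chose it."""
        groups = defaultdict(list)
        for option, total in zip(chosen, totals):
            groups[option].append(total)
        n = len(chosen)
        return {
            option: {
                "number": len(scores),
                "percent": round(100 * len(scores) / n, 1),
                "mean_for_outcome": round(sum(scores) / len(scores), 1),
                "is_key": option == key,
            }
            for option, scores in sorted(groups.items())
        }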

Page 47: e-Assessment: From Implementation to Practice

Number and % choosing each option

Also called Frequency and Percent or ‘Times answered’

Expected values:

• at least 5% for each distractor

• % for distractor should not be more than % for the key

Page 48: e-Assessment: From Implementation to Practice

Possible causes if distractor chosen is under 5%:

1. There is a clue in the wording or the distractor is silly

2. Topic is very easy.

Page 49: e-Assessment: From Implementation to Practice

Possible causes if a distractor attracts more candidates than the key:

1. Item is misleading in some way.

Page 50: e-Assessment: From Implementation to Practice

Mean for outcome

= Mean (average) mark on the whole test of the candidates who chose this option.

Expected value:

• Mean for key (or correct outcome) should be higher than Mean for each of the distractors (or for incorrect outcome).

Page 51: e-Assessment: From Implementation to Practice

Possible causes if Mean for outcome not as expected:

1. Item wording is misleading or the key is arguable

2. Item is either very easy or very difficult.

Page 52: e-Assessment: From Implementation to Practice

Recent Innovations

• Confidence Assessment
• Adaptive Testing
• Free Text
  – www.sagrader.com
  – www.intelligentassessment.com
• JISC
  – e-Assessment Glossary
  – e-Assessment Case Studies
  – e-Assessment Road Map

• www.caaconference.com/

Page 53: e-Assessment: From Implementation to Practice

Questions?

Page 54: e-Assessment: From Implementation to Practice

References

• Bull, J. and McKenna, C. (2001). Blueprint for Computer-assisted Assessment. CAA Centre.

• Laurillard, D. (1993). Rethinking University Teaching: A Framework for the Effective Use of Educational Technology. London; New York: Routledge.

• McLoughlin, C. (2002). “Editorial.” British Journal of Educational Technology 33(5): 511-513.

Page 55: e-Assessment: From Implementation to Practice

Further Reading

• Introduction to e-assessment
  – Warburton (2005) Whither E-assessment, Proceedings of 9th CAA Conference (Loughborough University), 471-482.
• CAA Benefits for Staff and Students
  – Brown, Race & Bull (1999) Computer Assisted Assessment in Higher Education (Kogan Page), 7-8.
  – Bull and McKenna (2000) Blueprint for CAA (CAA Centre), 8-10.
  – Ashton, H.S. et al. (2003) Pilot summative web assessment in secondary education, Proceedings of 7th CAA Conference (Loughborough University), 33-44.
• Student Acceptance
  – Sambell et al. (1999) Students’ perception of the learning benefits of computer-assisted assessment: a case study in electronic engineering, in: S. Brown, J. Bull & P. Race (Eds) Computer-assisted Assessment in Higher Education (Birmingham, SEDA), 179-191.
• Item Banks
  – Sclater et al. (2004) Item Banks Infrastructure Study (JISC).
• Future Developments
  – Boyle, A. (2005) Sophisticated tasks in e-assessment: what are they and what are their benefits? Proceedings of 9th CAA Conference (Loughborough University), 51-66.