eAssessment
Colin Milligan, Heriot-Watt University

“Like many innovations in their early stages, today’s computerised tests automate an existing process without reconceptualising it to realise the dramatic improvements that the innovation could allow.” – Randy Bennett, ETS
Bennett’s 3 Stages
1. Resemble paper tests,
2. New formats – multimedia and constructed responses,
3. Simulations and virtual reality – embedding assessment within learning.
Summative
– Efficient for large student numbers,
– Consistent marking,
– Integration with institutional systems,
– Immediate feedback on performance.

Formative
– Immediate feedback, linked to remediation,
– Individualisation,
– Opportunity for multiple attempts,
– Opportunity for innovation.
It’s all Multiple Choice
Advantages:
– Easy to author (!),
– Easy to mark,
– Supported by VLEs.
Drawbacks:
– Assesses lower-order skills,
– Guessability (quantified below),
– Not suited to some disciplines.
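To put a number on guessability: with n options a blind guess succeeds with probability 1/n, and the classical correction-for-guessing formula (standard in the testing literature, not from these slides) neutralises that advantage:

\text{corrected score} = R - \frac{W}{n - 1}

where R is the number of right answers and W the number of wrong ones. With four options a blind guesser expects 25% raw marks but 0 after correction, since each wrong answer deducts a third of a mark.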
Beyond Multiple Choice
– Numerical assessment,
– Authentic assessment,
– Short answer questions,
– Discussion fora,
– Essays,
– Peer assessment.
Numerical Assessment
– Randomisation (see the sketch below),
– Expression evaluation,
– Partial credit,
– Avoids pitfalls of closed ‘objective-type’ questions,
– Matches current assessment practice.
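A minimal sketch of how randomisation and tolerance-based partial credit might be wired together. The question, seeds and mark weights are invented for illustration; no real CAA package is assumed, and safe evaluation of student-typed expressions would need a proper parser rather than Python’s eval().

import random

def make_question(seed):
    """Randomise parameters so each student sees a different instance."""
    rng = random.Random(seed)           # per-student seed, reproducible
    m = rng.randint(2, 9)               # mass in kg
    a = rng.randint(2, 9)               # acceleration in m/s^2
    return f"A {m} kg mass accelerates at {a} m/s^2. What force acts, in N?", m * a

def mark(answer, correct, tol=0.01):
    """Full credit within tolerance; partial credit for a sign slip."""
    if abs(answer - correct) <= tol * abs(correct):
        return 1.0
    if abs(answer + correct) <= tol * abs(correct):
        return 0.5                      # right magnitude, wrong sign
    return 0.0

text, correct = make_question("student42")
print(text)
print(mark(float(correct), correct))    # 1.0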
Authentic Assessment
Instead of testing knowledge recall, tests interpretation and application:
– Utilise data,
– Manipulate a simulation,
– Perform a task.
TRIADS (University of Derby)
Pushing the Boundaries of Computer Marking
Source: Tom Mitchell, Intelligent Assessment Technologies
Short Answer Questions
Intelligent Assessment:
– Create model answers based on analysis of sample responses (a toy sketch follows),
– Acceptable accuracy from around 50 responses,
– Optimum accuracy from ~200 samples.
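IAT’s real system uses computational linguistics trained on marked samples; purely as a toy stand-in, a pattern-based marker distilled (by hand, here) from model answers might look like this. The patterns and mark values are invented, and this is not IAT’s method.

import re

# Hand-written stand-ins for patterns a real system would induce
# from roughly 50-200 marked sample responses.
MODEL_PATTERNS = [
    (re.compile(r"\bosmosis\b", re.I), 1.0),
    (re.compile(r"\bwater\b.*\bmembrane\b", re.I), 0.5),
]

def mark_short_answer(response):
    """Award the best mark among matching model-answer patterns."""
    return max((mark for pattern, mark in MODEL_PATTERNS
                if pattern.search(response)), default=0.0)

print(mark_short_answer("Water moves across the membrane by osmosis"))  # 1.0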
Mean Marking Accuracy
– Increased consistency,
– Marking schemes can be varied to suit content,
– Built-in moderation.
Essays: e-Rater
– Claims to assess quality of argument, style, use of examples, etc.,
– Requires thousands of sample essays to develop model answers,
– Hence typically used with ‘off the shelf’ essay questions.
Assessing Discussion Forum Activity
– Appropriate for learning environments,
– Assessing increases forum activity,
– Contributions can be assessed, but be careful with the marking scheme (a sketch follows), e.g.:
  – Baseline marks for activity,
  – Extra marks for worthwhile contributions, e.g. ones that others follow up.
– Explain the marking scheme clearly,
– Human marking required – intensive.
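A minimal sketch of the baseline-plus-bonus scheme above; the caps and weights are invented for illustration.

def forum_mark(post_count, followed_up_count):
    """Baseline marks for activity, bonus for posts others followed up."""
    baseline = min(post_count, 5)        # 1 mark per post, capped at 5
    bonus = min(followed_up_count, 5)    # 1 bonus per followed-up post, capped
    return baseline + bonus

print(forum_mark(post_count=3, followed_up_count=2))  # 5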
Low Stakes Peer Assessment
Students submit task – e.g. critique of paper. Anonymised and sent out to others in group,
Computer used for organisation, collation and distribution of content and feedback,
Encourages reflection,Combines well with Discussion
Forum activities.
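For the organisation and distribution step, one simple approach is a shuffled rotation, which guarantees nobody reviews their own submission. The student names are placeholders.

import random

def assign_reviews(students, reviews_each=2):
    """Shuffle, then rotate: each student reviews the next few peers on the
    circle, so nobody gets their own work (needs reviews_each < len(students))."""
    order = students[:]
    random.shuffle(order)
    n = len(order)
    return {author: [order[(i + j) % n] for j in range(1, reviews_each + 1)]
            for i, author in enumerate(order)}

print(assign_reviews(["ann", "bob", "cas", "dee"]))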
Item Banking and Analysis
MCQs will predominate,Question Banks,Highlights good and bad
questions,Analysis of Question Quality: IRT
– Difficulty,– Discrimination,– Guessability.
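Fitting IRT parameters properly needs specialised software, but a classical first pass already flags candidate good and bad items: facility (proportion correct) and discrimination (correlation of each item’s score with the score on the rest of the test). A sketch over made-up 0/1 data; statistics.correlation requires Python 3.10+.

import statistics

def item_stats(responses):
    """responses[s][i] is 1 if student s answered item i correctly."""
    totals = [sum(row) for row in responses]
    results = []
    for i in range(len(responses[0])):
        scores = [row[i] for row in responses]
        rest = [t - s for t, s in zip(totals, scores)]  # score on other items
        results.append((statistics.mean(scores),                 # facility
                        statistics.correlation(scores, rest)))   # discrimination
    return results

data = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0], [1, 1, 1]]
for i, (fac, disc) in enumerate(item_stats(data)):
    print(f"item {i}: facility {fac:.2f}, discrimination {disc:.2f}")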
Item Characteristic Curve (ICC)
– Difficulty: the displacement. A more difficult question is less likely to be answered correctly by students of a given ability, so its curve sits further to the right.
– Discrimination: the slope. High discrimination is good only if matched correctly to student ability.
– Guessability: the baseline. Poor distractors can raise the effective guessability. (The formula below ties the three together.)
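These three features are exactly the parameters of the standard three-parameter logistic (3PL) IRT model, where b is the difficulty (horizontal displacement), a the discrimination (slope) and c the guessing baseline:

P(\text{correct} \mid \theta) = c + \frac{1 - c}{1 + e^{-a(\theta - b)}}

with θ the student’s ability; as θ falls the curve flattens onto the guessing baseline c, and it rises steepest (slope proportional to a) around θ = b.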
Question Design
All assessment can suffer from poor question design,
Authoring systems make it easy to create (poor) questions,
Multi-media presentation provides more opportunity for poor design.
Standards
Original source: Charles Duncan, Intrallect
Question and Test Interoperability (QTI) – a schematic item is sketched below:
– Reuse questions,
– Use others’ questions,
– Change delivery systems.
Other standards:
– Integrate with other systems,
– Embed questions within content,
– Identify questions by metadata,
– Match questions to content.
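As a rough illustration of what an interoperable item looks like, this sketch emits a simplified multiple-choice item using element names loosely modelled on QTI 1.2 (item/presentation/response_lid and a minimal resprocessing block); it is schematic only and not guaranteed to validate against the real QTI schema.

import xml.etree.ElementTree as ET

def qti_like_item(ident, stem, choices, correct_ident):
    """Build a simplified, QTI 1.2-flavoured multiple-choice item."""
    item = ET.Element("item", ident=ident, title=stem)
    pres = ET.SubElement(item, "presentation")
    ET.SubElement(ET.SubElement(pres, "material"), "mattext").text = stem
    rl = ET.SubElement(pres, "response_lid", ident="RESP", rcardinality="Single")
    rc = ET.SubElement(rl, "render_choice")
    for i, choice in enumerate(choices):
        lab = ET.SubElement(rc, "response_label", ident=f"C{i}")
        ET.SubElement(ET.SubElement(lab, "material"), "mattext").text = choice
    # Minimal scoring rule: award 1 when the chosen label is the correct one.
    cond = ET.SubElement(ET.SubElement(item, "resprocessing"), "respcondition")
    ET.SubElement(ET.SubElement(cond, "conditionvar"), "varequal",
                  respident="RESP").text = correct_ident
    ET.SubElement(cond, "setvar", action="Set").text = "1"
    return ET.tostring(item, encoding="unicode")

print(qti_like_item("Q1", "2 + 2 = ?", ["3", "4", "5"], correct_ident="C1"))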
Student Attitudes
– Resistant to change – so manage it,
– Sell the benefits:
  – Immediate access to results,
  – Remediation/feedback,
– Carefully integrate – don’t bolt on – explain the rationale,
– Provide opportunity for practice.
Staff Attitudes
– Resistant to change – so manage it,
– Sell the benefits:
  – Automation,
  – Access to others’ questions,
  – Tried and tested assessment,
  – Authenticity of assessment,
  – Long-term efficiency.
Security, Plagiarism and Reliability
– Computer-based assessments are subjected to far more rigorous scrutiny,
– See assessed work in the context of the whole course,
– Adapt assessment to integrate understanding and bring in personal experience,
– Get students to rate themselves,
– Always have a backup plan.
Implementation
– Cost is front-loaded:
  – Technology investment,
  – Assessment design (pedagogical staff development),
  – Planning and piloting (systems, procedures and policies; convincing the stakeholders),
– Use computers where appropriate, for what they are good at,
– Important issues go beyond the technology.
Where are we going?
– Assessment integrated with learning, blurring the distinction,
– Adaptive and diagnostic assessment,
– Rich feedback, tailored study,
– Portfolios, assessment of teamwork,
– Can the curriculum change? Will the curriculum change?
Further Information
– CAA Conference, Loughborough: http://www.caaconference.com/
– LTSN Generic Centre: http://www.ltsn.ac.uk/genericcentre/
– E-Learning@Ed CAA: http://www.elearn.malts.ed.ac.uk/services/CAA/index.phtml
– Slides and notes for this talk: http://www.scrolla.hw.ac.uk/pres/eAssess/
Colin Milligan