Managing the development of robust and reliable assessments for qualifications and learning 21 March 2012 John Winkley


Introduction

• What do we mean by robust and reliable? And what do our stakeholders mean? How do we know if qualifications are robust and reliable?

• Is e-assessment a help or a hindrance to achieving this?
• Does this differ for different settings:
  • Vocational vs academic vs professional qualifications
  • Diagnostic and formative vs summative assessments

• Experience drawn from UK – school (NC tests and GQ), vocational, professional and HE examinations

• Also from international projects

Reliable assessments

What do we mean by reliability?
• Fairness, in terms of what we control – test specification, test questions, test coverage, test marking and awarding. Repeatability.
• Not true/false (although the public perception is commonly that it is, and this has an effect on government and AO tactics).

How do we know?
• By being careful about our assessment process in design and implementation
• By undertaking analysis of processes and outcomes afterwards. Much of it is statistical, but other methods exist
• Reliability is quite hard to prove conclusively, and almost never 100% repeatable
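
As an illustration of the statistical analysis mentioned above, one widely used internal-consistency statistic is Cronbach's alpha. The slides do not name any particular statistic, so this is only a minimal Python sketch under that assumption; the function name and the sample marks are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Estimate Cronbach's alpha.

    scores: one row per candidate, each row a list of item scores
    (every row must have the same number of items).
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item's scores across all candidates.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the candidates' total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("all candidates have the same total score")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical marks: 5 candidates x 4 items, each item scored 0 or 1.
sample = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(sample), 3))
```

A value closer to 1 suggests the items rank candidates consistently; it says nothing on its own about marking reliability or validity.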

Robust assessments

What do we mean by robust assessments? Rigorous?
• I’m taking it to mean “valid”, i.e. fit for purpose (there are other aspects of assessment quality too).
• Key elements of validity:
  • Does the assessment measure the curriculum properly?
  • Is the scoring accurate and reliable? (reliability)
  • Does the scoring match the performance standards?
  • Is it a good predictor? (predictive validity)
  • Do people believe in it? (face validity)

• Validity is a slippery term, not least because assessments in the UK often have multiple purposes

Robust assessments (2)
Fitness for Purpose

Assessment purpose
1. social evaluation
2. selection
3. formative
4. licensing
5. student monitoring
6. certification
7. diagnosis
8. school choice
9. provision eligibility
10. institution monitoring
11. screening
12. resource allocation
13. segregation
14. organizational intervention
15. guidance
16. program evaluation
17. transfer
18. system monitoring
19. placement
20. comparability
21. qualification
22. national accounting

Consider Psychology A Level

Paul Newton (then QCDA, now Cambridge Assessment)
http://www.publications.parliament.uk/pa/cm200708/cmselect/cmchilsch/169/16906.htm#n35

Who else cares?

Most stakeholders
• Teachers, parents, students, the press (they just use different words for reliability and validity)

Ofqual and Government
• Make it clear that both reliability and validity are essentially non-negotiable
• Are very interested in requiring AOs to report reliability measurements for qualifications, and have published detailed research on reliability: www.ofqual.gov.uk/standards/reliability (Mike Cresswell and John Winkley)

• We’ve set our stall out on validity (in contrast to other approaches)

Where are we with e-assessment?
Landscape divided into four domains

Benefits of e-assessment

The main benefits of e-assessment differ between summative and formative applications.

For e-assessment in on-screen computer-marked testing:
• Speed of feedback
• Increased flexibility and efficiency of assessment
• More discriminating assessments
• Richness and authenticity of the assessment experience
• Environmental benefits compared to the costs of a paper-based exams system
(The benefits vary a little for different qualification types and purposes.)

For “Wider e-assessment”:
• Richness and authenticity of the assessment experience
• Technology-enabled assessment facilitates and improves the effectiveness and/or efficiency of communication between learners and tutors.
• Candidates generally like e-assessment

Becta e-Assessment Landscape Study
http://www.alphaplusconsultancy.co.uk/pdf/Becta%20-%20E-assessment%20Landscape%20review%20-%20Report%20Final.pdf

Barriers to e-assessment

1. Capital cost
2. Custom and practice, apparent lack of interest from stakeholders, other priorities (change) and risk aversion coupled with a lack of commercial pressure
3. ICT estate (particularly in schools) coupled with examination format
4. Concerns about validity and reliability, often unarticulated:
  • What if it tests IT skills rather than what it’s supposed to be testing?
  • What if it all breaks down mid-test or won’t start?
  • What about inter-form comparability?
  • What about face validity (eg the uncanny valley)?
  • What about transparency and openness?
  • What about paper to screen comparability?
  • What about accessibility?
  • What about screen size, tired eyes, broken mice, use of colour, etc?

E-Assessment, reliability and validity

• Most of the legitimate concerns about validity and reliability have been dealt with, thoroughly, both in practice and research
• Cost challenges remain in many settings
• The UK (and USA) have held a strong lead in e-assessment deployment
• However, the rest of the world is catching up
• The UK boasts an outstanding technology supply sector
  • Viable, ‘reliable’ e-assessment for small and large AOs
  • Powerful assessment types and test creation technologies
  • Powerful marking technologies
  • Excellent research and evaluation
  • (Although the technology is still developing at the edges)

Today, more than presenting problems for validity and reliability, e-assessment provides some strong approaches to meeting those challenges

Improving reliability and validity

Example 1 – Simple multiple choice tests (R)
Many vocational and professional qualifications in the UK and internationally
• E-assessment is well suited to this scenario and it is the easiest to implement
• Relatively few ‘public’ concerns expressed about paper-based approaches despite known issues from research.
• Multiple response items easier to handle on-screen than on paper

Example 2 – Media rich, more complex question types (V)
TDA QTS Tests, Functional Skills, DSA Hazard Perception, Medical assessments
• Potential for improved content and face validity (authenticity)
• Powerful computer marking available


Example 3 – On-screen marking (R)
GCSE in England
• Heavily deployed – hundreds of millions of marks given each year on-screen. Most GCSE marking moved to on-screen.
• Management information improvement
• Efficiency savings by avoiding movement of paper
• New possibilities for monitoring and training of markers
• Deal with errors and anomalies more effectively

Example 4 – Item banking (R)
Many vocational & professional exams (eg DSA)
• Workflow around item creation – management information for QA
• Test creation using test balancing rules to support on-demand testing
• Monitor some aspects of reliability and fairness automatically
• Monitoring exposure and drift, and other security issues
• Using results to farm item banks
  • Target resources on weakest areas
  • Deal more quickly and effectively with anomalies
• Adapt to changing requirements
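
The idea of creating tests from balancing rules while monitoring item exposure can be sketched as follows. This is a minimal illustration in Python, not a description of any AO's actual system: the Item fields, the blueprint format (number of items required per topic) and the least-exposed-first selection rule are all assumptions made for the example.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    topic: str
    exposure: int = 0  # how many assembled tests have already used this item


def assemble_test(bank, blueprint, rng=None):
    """Pick items so each topic gets the count the blueprint asks for.

    Within a topic, the least-exposed items are preferred (hypothetical rule);
    ties are broken at random so on-demand tests differ between sittings.
    """
    rng = rng or random.Random()
    selected = []
    for topic, needed in blueprint.items():
        pool = [item for item in bank if item.topic == topic]
        if len(pool) < needed:
            raise ValueError(f"not enough items for topic {topic!r}")
        rng.shuffle(pool)                        # random order first...
        pool.sort(key=lambda item: item.exposure)  # ...stable sort keeps it within ties
        chosen = pool[:needed]
        for item in chosen:
            item.exposure += 1                   # record exposure for later monitoring
        selected.extend(chosen)
    return selected


# Hypothetical bank: 6 items on topic "A", 4 on topic "B".
demo_bank = [Item(f"A{i}", "A") for i in range(6)] + [Item(f"B{i}", "B") for i in range(4)]
demo_test = assemble_test(demo_bank, {"A": 3, "B": 2}, random.Random(0))
print(Counter(item.topic for item in demo_test))
```

Keeping the exposure count on each item means the same data can later feed reports on over-used items, which is one simple way the exposure monitoring in the list above might be supported.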


Example 5 – Innovation in medium-stakes testing (R+V)
SQA Unit assessments project
• Supports sharing models for content
• Innovative approach to dealing with challenges in unit testing
• Validity is “built-in”, and wider quality is improved too
• And it is not necessarily expensive

Example 6 – Allowing students to use their own tools (V)
• Crossover technology with ePortfolios
• Maximises opportunity for students to show what they can do

Formative assessment
Helps leverage value in content

• Many AOs are considering practical ways of leveraging their assessment resources (particularly the retired content) for other purposes, e.g. formative and diagnostic assessment.

• Adaptive assessments are very popular with students and teachers in a variety of settings.

• Banks of questions are more valuable to educators if they leverage quality performance metadata

Aspects of formative assessment

Where am I now? Where am I trying to get to? What do I need to do next?

Schools are ready for sophisticated assessment

http://www.tki.org.nz/r/asttle

Summary

Although some challenges remain, most of the significant reliability and validity concerns in e-assessment have been addressed.

E-Assessment now offers several ways to improve assessment and qualification quality.

Consumer demand is latent but levels of acceptance and satisfaction are high.

Continued lack of innovation, particularly in school qualifications, risks the system being seen as out of touch.
