A. Ravenscroft et al. (Eds.): EC-TEL 2012, LNCS 7563, pp. 292305, 2012. Springer-Verlag Berlin Heidelberg 2012
eAssessment for 21st Century Learning and Skills
Christine Redecker*, Yves Punie, and Anusca Ferrari
Institute for Prospective Technological Studies (IPTS), European Commission, Joint Research Centre,
Edificio Expo, C/Inca Garcilaso 3, 41092 Seville, Spain
Abstract. In the past, eAssessment focused on increasing the efficiency and effectiveness of test administration; improving the validity and reliability of test scores; and making a greater range of test formats susceptible to automatic scoring. Despite the variety of computer-enhanced test formats, eAssessment strategies have been firmly grounded in a traditional paradigm, based on the explicit testing of knowledge. There is a growing awareness that this approach is less suited to capturing Key Competences and 21st century skills. Based on a review of the literature, this paper argues that, though there are still technological challenges, the more pressing task is to transcend the testing paradigm and conceptually develop (e)Assessment strategies that foster the development of 21st century skills.
Keywords: eAssessment, Computer-Based Assessment (CBA), competence-based assessment, key competences, 21st century skills.
1 Rethinking 21st Century Assessment
Assessment is an essential component of learning and teaching, as it allows the quality of both teaching and learning to be judged and improved . It often determines the priorities of education , it always influences practices and has backwash effects on learning . Changes in curricula and learning objectives are ineffective if assessment practices remain the same , as learning and teaching tend to be modelled against the test . Assessment is usually understood to have two purposes: formative and summative. Formative assessment aims to gather evidence about pupils' proficiency to influence teaching methods and priorities, whereas summative assessment is used to judge pupils' achievements at the end of a programme of work .
Assessment procedures in formal education and training have traditionally focused on examining knowledge and facts through formal testing , and do not easily lend themselves to grasping 'soft skills'. Lately, however, there has been a growing awareness that curricula and with them assessment strategies need to be revised to
* The views expressed in this article are purely those of the authors and may not in any circumstances be regarded as stating an official position of the European Commission.
eAssessment for 21st Century Learning and Skills 293
more adequately reflect the skills needed for life in the 21st century. The evolution of information and communication technologies (ICT) has contributed to a sudden change in terms of skills that students need to acquire. Skills such as problem-solving, reflection, creativity, critical thinking, learning to learn, risk-taking, collaboration, and entrepreneurship are becoming increasingly important . The relevance of these "21st Century skills"  is moreover recognised in the European Key Competence Recommendation  by emphasizing their transversal and over-arching role. To foster and develop these skills, assessment strategies should go beyond testing factual knowledge and aim to capture the less tangible themes underlying all Key Competences.
2 Looking beyond the e-Assessment Era
At the end of the eighties, Bunderson, Inouye and Olsen  forecasted four generations of computerized educational measurement, namely: Generation 1: Computerized testing (administering conventional tests by computer); Generation 2: Computerized adaptive testing (tailoring the difficulty or contents or an aspect of the timing on the basis of examinees responses); Generation 3: Continuous measurement (using calibrated measures to continuously and unobtrusively estimate dynamic changes in the students achievement trajectory); Generation 4: Intelligent measurement (producing intelligent scoring, interpretation of individual profiles, and advice to learners and teachers by means of knowledge bases and inferencing procedures).
administration and/or scoring of
Adaptive TestingGeneration 2
Automated Feedback Personalised feedback
1990 1995 2000 2005 2010 2015 2020 2025
Efficient Testing Personalised Learning
Generation 3Continuous integrated
Data Mining & Analysis
Generation Re-inventionTransformative testing: use
technology to change test formats
Computer-Based Assessment (CBA) Embedded Assessment
Collaborative multimedia learning environments
Online collaborationVirtual laboratories
Virtual WorldsReplication of complex real life
situations within a test
Fig. 1. Current and future e-Assessment strategies. Source: IPTS on the basis of [8-10]
These predictions from 1989 are not far off the mark. The first two generations of eAssessment or Computer-Based Assessment (CBA), which should more precisely be referred to as Computer-Based Testing, have now become mainstream. The main challenge currently lies in making the transition to the latter two, the era of Embedded Assessment, which is based on the notion of Learning Analytics, i.e. the
294 C. Redecker, Y. Punie, and A. Ferrari
interpretation of a wide range of data about students' proficiency in order to assess academic progress, predict future performance, and tailor education to individual students . Although Learning Analytics are currently still in an experimental and development phase, embedded assessment could become a technological reality within the next five years .
However, the transition from computer-based testing to embedded assessment requires technological advances to be complemented with a conceptual shift in assessment paradigms. While the first two generations of CBA centre on the notion of testing and on the use of computers to improve the efficiency of testing procedures, generation 3 and 4 seamlessly integrate holistic and personalised assessment into learning. Embedded assessment allows learners to be continuously monitored and guided by the electronic environment which they use for their learning activities, thus merging formative and summative assessment within the learning process. Ultimately, with generation 4, learning systems will be able to provide instant and valid feedback and advice to learners and teachers concerning future learning strategies, based on the learners' individual learning needs and preferences. Explicit testing could thus become obsolete.
This conceptual shift in the area of eAssessment is paralleled by the overall pedagogical shift from knowledge-based to competence-based assessment and the recent focus on transversal and generic skills, which are less susceptible to generation 1 and 2 e-Assessment strategies. Generation 3 and 4 assessment formats may offer a viable avenue for capturing the more complex and transversal skills and competences crucial for work and life in the 21st century. However, to seize these opportunities, assessment paradigms need to shift to being enablers of more personalised and targeted learning processes. Hence, the open question is how we can make this shift happen. This paper, based on an extensive review of the literature, provides an overview of current ICT-enabled assessment practices, with a particular focus on the more recent developments of ICT-enhanced assessment tools that allow the recognition of 21st century skills.
3 The Testing Paradigm
3.1 Mainstream eAssessment Strategies
First and second generation tests have led to a more effective and efficient delivery of traditional assessments . More recently, assessment tools have been enriched to include more authentic tasks and to allow for the assessment of constructs that have either been difficult to assess or which have emerged as part of the information age . As the measurement accuracy of all of these test approaches depends on the quality of the items it includes, item selection procedures such as Item Response Theory or mathematical programming play a central role in the assessment process . First generation computer-based tests are already being administered widely for a variety of educational purposes, especially in the US , but increasingly also in Europe . Second generation adaptive tests, which select test items based on the candidates' previous response, allow for a more efficient administration mode (less items and less testing time), while at the same time keeping measurement precision . Different algorithms have been developed for the selection of test items. The most
eAssessment for 21st Century Learning and Skills 295
well-known type of algorithmic testing is CAT, which is a test where the algorithm is designed to provide an accurate point estimation of individual achievement . CAT tests are very widespread, in particular in the US, where they are used for assessment at primary and secondary school level [10, 14, 17]; and for admission to higher education . Adaptive tests are also used in European countries, for instance the Netherlands  and Denmark .
3.2 Reliability of eAssessment
Advantages of computer-based testing over traditional assessment formats include, among other: paperless test distribution and data collection; efficiency gains; rapid feedback; machine-scorable responses; and standardized tools for examinees as calculators and dictionaries . Furthermore computer-based tests tend to have a positive effect on students' motivation, concentration and performance [20, 21]. More recently, they also provide learners and teachers with detailed reports that describe strengths and weaknesses, thus supporting formative assessment . However, the reliability and validity of scores has been a major concern, particularly in the early phases of eAssessment, given the prevalence of multiple-choice formats in computer-based tests. Recent research indicates that scores are indeed generally higher in multiple choice tests than they are in short answer formats . Some studies found no significant differences between student performance on paper and on screen [20, 23], whereas others indicate that paper-based and computer-based tests do not necessarily measure the same skills [10, 24].
Current research efforts concentrate on improving the reliability and validity of test scores by improving selection procedures for large item banks , by increasing measurement efficiency , by taking into account multiple basic abilities simultaneously , and by developing algorithms for automated language analysis, that allow the electronic scoring of long free text answers.
3.3 Automated Scoring
One of the main drivers of progress in eAssessment has been the improvement of automatic scoring techniques for free text answers  and dedicated written text assignments . Automated scoring could dramatically reduce the time and costs associated with the assessment of complex skills such as writing, but its use must be validated against a variety of criteria for it to be accepted by test users and stakeholders .
Already, assignments in programming languages or other formal notations can be automatically assessed . For short-answer free-text responses of around a sentence in length, automatic scoring has also been shown to be at least as good as that of human markers . Similarly, automated scoring for highly predictable speech, such as a one sentence answer to a simple question, correlates very highly with human ratings of speech quality, although this is not the case with longer and more open-ended responses . Automated scoring1 is also used for scoring essay-length
1 Examples include: Intelligent Essay Assessor (Pearson Knowledge Technologies), Intellimetric (Vantage), Project Essay Grade (PEG) (Measurement, Inc.), and e-rater (ETS).
296 C. Redecker, Y. Punie, and A. Ferrari
responses  where it is found to closely mimic the results of human scoring: the agreement of an electronic score with a human score is typically as high as the agreement between two humans, and sometimes even higher [10, 17, 29]. However, these programmes tend to omit features that cannot be easily computed, such as content, organization and development . Thus, while there is in general a high correlation between human and machine marking, discrepancies are higher for essays which exhibit more abstract qualities . A further research line aims to develop programmes which mark short-answer free-text and give tailored feedback on incorrect and incomplete responses, inviting examinees to repeat the task immediately .
4 The Tutoring Paradigm
Integrated assessment is already implemented in some technology-enhanced learning (TEL) environments, where data-mining techniques can be used for formative assessment and individual tutoring. Similarly, virtual worlds, games, simulations and virtual laboratories allow the tracking of individual learners activity and can make learning behaviour assessable. Many TEL environments, tools and systems recreate learning situations which require complex thinking, problem-solving and collaboration strategies and thus allow for the development of 21st century skills. Some of these environments include the assessment of performance. Assessment packages for Learning Management Systems are currently under development to integrate self-assessment, peer-assessment and summative assessment, based on the automatic analysis of learner data . Furthermore, data on student engagement in these environments can be used for embedded assessment, which refers to students engaging in learning activities while an assessment system draws conclusions based on their tasks . Data-mining techniques are already used to evaluate university students' activity patterns in Virtual Learning Environments for diagnostic purposes. Analytical data mining can, for example, identify students who are at risk of dropping out or underperforming,2 generate diagnostic and performance reports,3 assess interaction patterns between students on collaborative tasks,4 and visualise collaborative knowledge work.5 It is expected that, in five years time, advances in data mining will enable the interpretation of data concerning students engagement, performance, and progress, in order to assess academic progress, predict future performance, and revise curricula and teaching strategies .