Redesigning the Computer-assisted Language Exams for Federal Government Employees

Presentation delivered by Bert Wylin at eAssessment Scotland 2010


<ul><li> 1. Redesigning the computer-assisted language exams for federal government employees: a didactic, methodological and technological challenge </li> <li> 2. </li> <li> 3. </li> <li> 4. <ul><li>0. Context </li></ul><ul><li>Didactic challenge </li></ul><ul><li>Methodological challenge </li></ul><ul><li>Technological challenge </li></ul><ul><li>Results &amp; future developments </li></ul></li> <li> 5. 0. Context <ul><li>SELOR: Belgian governmental selection bureau </li></ul><ul><li>In a trilingual country, multilingualism is crucial for state servants </li></ul><ul><li>-&gt; language testing is a crucial part of assessment &amp; selection procedures </li></ul><ul><li>Since the 90s: </li></ul><ul><li><ul><li>ATLAS = electronic language testing system for Dutch and French </li></ul></li></ul><ul><li><ul><li>Thousands of candidates yearly </li></ul></li></ul><ul><li><ul><li>Wide variety of governmental jobs (policemen as well as diplomats) </li></ul></li></ul></li> <li> 6. <ul><li>ATLAS = state of the art at its creation, but needed a complete overhaul in three domains </li></ul><ul><li>1. Didactic component: </li></ul><ul><li>strongly focused on language knowledge </li></ul><ul><li>weak integration of a skills-based view on language competence (the Common European Framework of Reference didn't exist at that time) </li></ul><ul><li>2. Methodological component </li></ul><ul><li>Level structure without psychometric underpinning: 4 levels </li></ul><ul><li>No evaluation of the reliability and validity of the ATLAS tests </li></ul></li> <li> 7. <ul><li>3. Technological component </li></ul><ul><li>ATLAS operated on the SELOR network, not online </li></ul><ul><li>Closed, non-adaptable and non-updatable system </li></ul><ul><li>Off-line accompanying training module on CD-ROM </li></ul><ul><li>No itembanking </li></ul><ul><li>No integration into the Selor admin </li></ul></li> <li> 8. <ul><li>Constraints: </li></ul><ul><li><ul><li>Legal constraints: e.g.
vocabulary &amp; grammar should be tested separately e.g. 4 levels (1 to 4) should be distinguished </li></ul></li></ul><ul><li><ul><li>Practical constraints: e.g. each exam takes a maximum of 120 minutes e.g. SELOR wanted us to reuse the existing items as much as possible e.g. the whole operation had to be realised within one year </li></ul></li></ul></li> <li> 9. <ul><li>Research team: partners </li></ul><ul><li>Didactic component </li></ul><ul><li>French: Piet Desmet (K.U.Leuven Campus Kortrijk) </li></ul><ul><li>Dutch: Guy Deville (FUNDP Namur) </li></ul><ul><li>Methodological component </li></ul><ul><li>Sara Gysen (K.U.Leuven) </li></ul><ul><li>Technological component </li></ul><ul><li>Bert Wylin (Televic Education) </li></ul><ul><li>Coordination </li></ul><ul><li>Piet Desmet &amp; Sara Gysen </li></ul></li> <li> 10. <ul><li>0. Context </li></ul><ul><li>Didactic challenge </li></ul><ul><li>Methodological challenge </li></ul><ul><li>Technological challenge </li></ul><ul><li>Results &amp; future developments </li></ul></li> <li> 11. 1. Didactic challenge <ul><li>1.1. Construct definition </li></ul><ul><li>1.2. Item revision and new item writing </li></ul><ul><li>1.3. Metadata </li></ul></li> <li> 12. 1.1. Construct definition <ul><li>From 9 modules to 4 components: </li></ul><ul><li><ul><li>2 knowledge-oriented: vocabulary &amp; grammar; 2 skills-oriented: listening &amp; reading </li></ul></li></ul></li> <li> 13. 1.2. Item revision &amp; new item writing <ul><li>Revision of existing items: </li></ul><ul><li><ul><li>Uniformity (e.g. same type of MCQ; only one gap in all cloze exercises) </li></ul></li></ul><ul><li><ul><li>Transparency for test candidates (e.g.
dichotomous rating for all items) </li></ul></li></ul><ul><li>New item writing </li></ul><ul><li><ul><li>As authentic as possible: real audio fragments, scanned articles, letters, etc. </li></ul></li></ul><ul><li><ul><li>A spectrum of different item types, not only multiple choice, in order to test as directly as possible the different tasks specified in the construct </li></ul></li></ul></li> <li> 14. <ul><li>New item writing </li></ul><ul><li><ul><li>New items were developed for the new categories within the listening and reading components </li></ul></li></ul><ul><li><ul><li>As authentic as possible: real audio fragments, scanned articles, letters, etc.: same look and feel, same distribution of images as in real-life tasks </li></ul></li></ul><ul><li><ul><li>A spectrum of different item types, not only multiple choice, in order to test as directly as possible the different tasks specified in the construct </li></ul></li></ul><ul><li><ul><li>Standard choice of technical item type for each part of the construct </li></ul></li></ul></li> <li> 15. 1.3. Metadata <ul><li>Item tags and features of 3 types: </li></ul><ul><li><ul><li>Content metadata (automatic and manual) </li></ul></li></ul><ul><li><ul><li>Psychometric metadata (cf. 2) </li></ul></li></ul><ul><li><ul><li>Dynamic metadata (evolving through use of the system) </li></ul></li></ul><ul><li>Important for itembanking: </li></ul><ul><li><ul><li>Control of item selection in exam versions </li></ul></li></ul><ul><li><ul><li>Monitoring of item quality (cf. psychometric data and dynamic metadata) </li></ul></li></ul></li> <li> 16.
<ul><li>Metadata for each item of the database: </li></ul><ul><li><ul><li>Content metadata </li></ul></li></ul><ul><li><ul><li><ul><li>Identification number </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Question format </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Excluded when other item present </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Linked to other item </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Date of creation </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Date of adaptation </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Adapted for candidates with special needs </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Rating </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>In training environment </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Inactive </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Assets (multimedia) </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Length audio/video </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Length text </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Example item </li></ul></li></ul></li></ul><ul><li><ul><li>Dynamic metadata </li></ul></li></ul><ul><li><ul><li><ul><li>Popularity of item </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Average answer time </li></ul></li></ul></li></ul><ul><li><ul><li>Psychometric metadata </li></ul></li></ul><ul><li><ul><li><ul><li>Logit value </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>P-value </li></ul></li></ul></li></ul></li> <li> 17. <ul><li>0. Context </li></ul><ul><li>Didactic challenge </li></ul><ul><li>Methodological challenge </li></ul><ul><li>Technological challenge </li></ul><ul><li>Results &amp; future developments </li></ul></li> <li> 18. 2. Methodological challenge <ul><li>2.1. Screening and calibration of the existing database </li></ul><ul><li>2.2. Development of an IRT-based item database </li></ul><ul><li>2.3. Standard setting &amp; selection rules </li></ul></li> <li> 19. 
2.1. Screening of the existing database <ul><li>Screening based on test data from 1995-2000 </li></ul><ul><li><ul><li>Elimination of items based on </li></ul></li></ul><ul><li><ul><li><ul><li>their p-value (percentage of correct answers provided by the test candidates): lower than 0.10 (extremely difficult) or higher than 0.95 (extremely easy) </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>and their occurrence in test versions: used at least 100 times in test versions </li></ul></li></ul></li></ul><ul><li><ul><li><ul><li>Results: 218 French items and 849 Dutch items were eliminated </li></ul></li></ul></li></ul></li> <li> 20. 2.2. Development of an IRT-based database <ul><li>Submitting the items to a psychometric analysis based on Item Response Theory (IRT), which allows items to be placed on a scale </li></ul><ul><li>that orders items according to their intrinsic difficulty level (logit value) </li></ul><ul><li>&amp; </li></ul><ul><li>that orders examinees in terms of their ability </li></ul></li> <li> 21. <ul><li>Example of a measurement scale in an IRT model </li></ul><ul><li>Probabilistic model: </li></ul><ul><li>Person B has a high probability of answering items c &amp; b correctly but a far lower one of answering items d or f correctly, which will normally be solved correctly by person A. </li></ul><ul><li>The chance that person B will be able to answer item e is almost non-existent </li></ul></li> <li> 22. <ul><li>Eight different scales: one per target language and per component </li></ul><ul><li>e.g. logit distribution of candidates and items (French) for component 4 - Reading </li></ul></li> <li> 23. Metadata: set your own metadata or use the system metadata </li> <li> 24. <ul><li>1. 
Test management by the candidate: </li></ul><ul><li>The candidate decides when to start the next component (but there is a fixed time limit of 120 min for the whole exam) </li></ul><ul><li>Possibility to review within a component </li></ul><ul><li>Overview screen and brainteaser tag </li></ul><ul><li>No restriction on playing audio and video input, but the limited time allocation is mentioned in the instruction </li></ul><ul><li>Possibility of not answering an item </li></ul><ul><li>But restrictions: </li></ul><ul><li>Time limit: fixed per component (time interval on screen) </li></ul><ul><li>Fixed order of components: C1, C2, C3, C4 </li></ul><ul><li>2. Resuming possible in case of problems </li></ul><ul><li>3. Equal share of components in overall score </li></ul>2.3. Standard setting and selection rules </li> <li> 25. Exam version <ul><li>1. General introduction to the test: no time allocation </li></ul><ul><li>2. Vocabulary component: instruction + 1 example item, 20 test items, 15 min </li></ul><ul><li>3. Grammar component: instruction + 5 example items, 20 test items, 15 min </li></ul><ul><li>4. Listening component: instruction + 2 example items, 20 test items, 45 min </li></ul><ul><li>5. Reading component: instruction + 3 example items, 20 test items, 45 min </li></ul></li> <li> 26. <ul><li>0. Context </li></ul><ul><li>Didactic challenge </li></ul><ul><li>Methodological challenge </li></ul><ul><li>Technological challenge </li></ul><ul><li>Results &amp; future developments </li></ul></li> <li> 27. 3. Technological challenge <ul><li>3.1. E-testing: Edumatic technology </li></ul><ul><li>3.2. Exam &amp; preparatory learning environment </li></ul><ul><li>3.3. Itembanking &amp; Selor Test Administration System </li></ul></li> <li> 28. 3.1. 
E-testing: Edumatic-based environment <ul><li>Edumatic is an authoring system for exercises and tests for both online and offline assessments (online server-based, with an export button to an offline SCORM-compliant package) </li></ul><ul><li>XML-based data in a Flash user interface (IMS-QTI and SCORM compliant) </li></ul><ul><li>20+ question types; supports multimedia in all question types </li></ul><ul><li>Visit our booth! </li></ul></li> <li> 29. 3.2. Exam &amp; preparatory learning environment <ul><li>Customization of the Edumatic environment </li></ul><ul><li><ul><li>Selor skin </li></ul></li></ul><ul><li><ul><li>Single Sign-On with Selor Admin (SSO: log in once, get access to multiple applications) </li></ul></li></ul></li> <li> 30. </li> <li> 31. </li> <li> 32. </li> <li> 33. <ul><li>Exam configuration </li></ul><ul><li><ul><li>Secure browser; full classroom control system (AvidaNet Exam) </li></ul></li></ul><ul><li><ul><li>Resume </li></ul></li></ul><ul><li><ul><li>Strict time allocation </li></ul></li></ul><ul><li><ul><li>Sequencing </li></ul></li></ul></li> <li> 34. <ul><li>Online learning environment: available at </li></ul><ul><li><ul><li>Preparatory learning environment </li></ul></li></ul><ul><li><ul><li>Login for free via My Selor </li></ul></li></ul><ul><li><ul><li>Components 1 &amp; 2 (vocabulary &amp; grammar): access to the entire database of 35,000+ items </li></ul></li></ul><ul><li><ul><li>Components 3 &amp; 4 (listening &amp; reading): only model items </li></ul></li></ul></li> <li> 35. </li> <li> 36. </li> <li> 37. </li> <li> 38. </li> <li> 39. </li> <li> 40. <ul><li>Stats: user stats </li></ul></li> <li> 41. <ul><li>Stats: user stats </li></ul><ul><li>Stats: package stats </li></ul></li> <li> 42. <ul><li>0. Context </li></ul><ul><li>Didactic challenge </li></ul><ul><li>Methodological challenge </li></ul><ul><li>Technological challenge </li></ul><ul><li>Results &amp; future developments </li></ul></li> <li> 43. 4. 
Results &amp; further developments <ul><li>Fully operational </li></ul><ul><li><ul><li>Exam version: in use since October 2007; many thousands of candidates every year </li></ul></li></ul><ul><li><ul><li>Online learning environment: online since September 2007 </li></ul></li></ul><ul><li>Edumatic is now the official electronic test environment </li></ul><ul><li><ul><li>For all Selor domains (including social skills, law, informatics, mathematics, accountancy) </li></ul></li></ul><ul><li>Development of decentralized exam facilities </li></ul><ul><li>Added question types </li></ul><ul><li><ul><li>Including voice recording (with human scoring) </li></ul></li></ul><ul><li><ul><li>Including open questions (with human scoring) </li></ul></li></ul><ul><li><ul><li>Including open questions with automatic feedback (not used in real exams) </li></ul></li></ul></li> <li> 44. </li> <li> 45. Please visit us or contact: [email_address] [email_address] </li> <li> 46. <ul><li>Edumatic Exam </li></ul><ul><li>Edumatic Mobile </li></ul></li> </ul>
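The probabilistic (Rasch) model behind the logit scale described on slides 20 and 21 can be sketched in a few lines of Python. This is a minimal illustration: the ability values for persons A and B and the difficulty values for items b, c, d, e and f are invented for the example and are not taken from the actual SELOR item bank.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """One-parameter (Rasch) IRT model: probability that a person
    with ability `theta` (in logits) answers an item of
    difficulty `b` (in logits) correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Illustrative values: person A is more able than person B,
# and items c, b, d, f, e increase in difficulty on the scale.
theta_A, theta_B = 2.0, 0.0
items = {"c": -1.5, "b": -1.0, "d": 1.0, "f": 1.5, "e": 4.0}

for name, difficulty in items.items():
    print(f"item {name}: "
          f"P(A correct)={rasch_probability(theta_A, difficulty):.2f}  "
          f"P(B correct)={rasch_probability(theta_B, difficulty):.2f}")
```

With these illustrative numbers, person B is likely to answer items c and b correctly, far less likely to answer d or f (which person A will normally solve), and has a probability below 2% of answering item e, matching the reading of the scale on slide 21.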
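The item-screening rule of slide 19 (eliminate items whose p-value falls below 0.10 or above 0.95, or that were seen too rarely to judge) amounts to a simple predicate. The sketch below is our own reading of that rule; the function name and the interpretation that items used fewer than 100 times were dropped for lack of data are assumptions, not taken from the presentation.

```python
def keep_item(p_value: float, times_used: int,
              p_min: float = 0.10, p_max: float = 0.95,
              min_uses: int = 100) -> bool:
    """Screening rule applied to the 1995-2000 test data: retain an
    item only if its p-value (proportion of correct answers) marks it
    as neither extremely difficult (< 0.10) nor extremely easy
    (> 0.95), and it occurred often enough in test versions."""
    return times_used >= min_uses and p_min <= p_value <= p_max

# Illustrative checks of the rule:
print(keep_item(0.50, 150))  # typical item -> kept
print(keep_item(0.05, 200))  # extremely difficult -> eliminated
print(keep_item(0.97, 200))  # extremely easy -> eliminated
print(keep_item(0.50, 40))   # too little usage data -> eliminated
```

Applied over the full database, a rule of this shape produced the counts reported on slide 19: 218 French and 849 Dutch items eliminated.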