How to create a new English test – practical process
National ELT Conference Bogotá 2011 Chris Hurling
A new test that has had far-reaching benefits
• The need for change
• Terms of reference
• Item writing
• Piloting
• Statistical analysis
• Documenting the change
• Change management/backwash
The need for change
• English Graduation Exam
• About 600 tests taken every year
• A ‘high stakes test’
• Issues with old-style test:
– Content validity
– Construct validity
– Criterion validity
– Reliability
Terms of Reference
Test specification
• Test skills: reading, writing, speaking
• Reading & writing 2 hr, speaking 20 min
• Pass level = mid CEF C1
• Rubrics to assess writing and speaking
• Wider range of reading item types
• Paper based
Getting around constraints
Constraints
• Knowledge
• Experience
• Resources
Solutions
• Background reading
• Benchmarking
• Voluntary participation
Simple project structure
Sponsor
(e.g. Director)
Project Manager
Project Team
Consultative Group
Valid sample of writing
• Test only writing ability
• More than one sample
• Authentic tasks (genre)
• Restrict candidates/no choice of tasks
• Long enough samples
Valid sample of speaking
• Interactive
– Transactional or Interpersonal
• Plan and structure the test carefully
• Non-sensitive & non-academic topics
The more scores, the more reliable the test for a candidate*
Holistic rubric
• Example: TOEFL
• Impressionistic
• Quick to do
• Reliable (4 different raters)
• Sub-skills not rated
• No analysis for beneficial backwash
Analytical rubric
• Example: IELTS
• Sum of the parts
• Takes more time
• 4 or 5 scores per text
• A score for each sub-skill
• Beneficial backwash for teaching
* Hughes A, 2003
Item writing
Text 1
• 500 – 700 words
• Limit challenging words
• Title
• 10 Cloze questions
Text 2
• 700 – 900 words
• Limit challenging words
• Title
• Max 12 paragraphs
• 15 items
• Specified range of items
Define rules for selection and editing of the reading texts
Genre of text influences content validity
Don’t
• underestimate the time
• use texts from a special genre, e.g. internet
• select topics that will date quickly
• select sensitive topics
Do
• use electronic versions
• newspapers, specialist magazines
• select texts longer than the final text length
Deciding item types (reading): IELTS/TOEFL
Item type                           | Skill tested                    | Ease of question | Students prepared | Ease to write | Total score
IELTS Sentence completion           | Locate & understand information | 3                | 4                 | 3             | 10
IELTS Short answer                  | Locate & understand information | 3                | 3                 | 4             | 10
TOEFL Referencing (multiple choice) | Cohesion of ideas               | 3                | 3                 | 4             | 10
TOEFL Negative stem m/choice        | Scanning and reading for detail | 4                | 0                 | 1             | 5
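The total score in the comparison above appears to be the plain sum of the three ratings (ease of question, students prepared, ease to write). A minimal Python sketch of that scoring, assuming equal weighting of the three criteria; the names `ratings` and `total_score` are illustrative and not from the slides:

```python
# Each row: (item type, ease of question, students prepared, ease to write),
# copied from the comparison above.
ratings = [
    ("IELTS Sentence completion", 3, 4, 3),
    ("IELTS Short answer", 3, 3, 4),
    ("TOEFL Referencing (multiple choice)", 3, 3, 4),
    ("TOEFL Negative stem m/choice", 4, 0, 1),
]


def total_score(row):
    """Sum the three criterion ratings for one item type (equal weights assumed)."""
    _, ease_of_question, students_prepared, ease_to_write = row
    return ease_of_question + students_prepared + ease_to_write


totals = {row[0]: total_score(row) for row in ratings}

# Rank item types from highest to lowest total score.
ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{score:>2}  {name}")
```

Under this scoring the three item types with a total of 10 tie, and the negative stem multiple-choice item ranks last.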
Moderating test items
First item writer
• Write items
• 2 x required number
• Reject items
Second item writer
• Attempt item
• Amend/reject
• Feedback to first item writer
Consultative group
• Attempt items
• Amend/reject
• Feedback to item writers
Pre-testing on real test-takers
Pilot test(s)
[Diagram: 85 students split across six pilot groups (IELTS 1, IELTS 2, Grad Exam, TOEFL 1, TOEFL 2, Control); group sizes shown on the slide: 16, 16, 13, 8, 8, 25]
Which items work best?
Analyse results
Item Discrimination (ID) – differentiation
$$\mathrm{ID} = \frac{(\text{high group \# correct}) - (\text{low group \# correct})}{\tfrac{1}{2}\,(\text{total \# Ss in the high and low groups})}$$
Item Facility (IF) – difficulty
$$\mathrm{IF} = \frac{\text{number of Ss answering item correctly}}{\text{total number of Ss responding to that item}}$$
Selecting the best items
Item Facility – Cloze questions
Item | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14  | 15
IF   | .54 | .96 | .98 | .61 | .88 | .71 | .93 | .89 | .76 | .48 | .71 | .88 | .60 | .74 | .84
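One way to use the facility values above when selecting items is to flag those that nearly everyone, or almost no one, answers correctly. The sketch below is only an illustration: the cut-offs `TOO_EASY` and `TOO_HARD` are hypothetical values chosen for the example, not thresholds stated in the slides.

```python
# Item Facility values for the 15 Cloze questions, copied from the row above.
item_facility = {
    1: 0.54, 2: 0.96, 3: 0.98, 4: 0.61, 5: 0.88,
    6: 0.71, 7: 0.93, 8: 0.89, 9: 0.76, 10: 0.48,
    11: 0.71, 12: 0.88, 13: 0.60, 14: 0.74, 15: 0.84,
}

TOO_EASY = 0.90  # hypothetical upper cut-off (assumption, not from the slides)
TOO_HARD = 0.30  # hypothetical lower cut-off (assumption, not from the slides)


def flag_items(facility, low=TOO_HARD, high=TOO_EASY):
    """Return (too_easy, too_hard) lists of item numbers outside the band."""
    too_easy = [item for item, value in facility.items() if value > high]
    too_hard = [item for item, value in facility.items() if value < low]
    return too_easy, too_hard


too_easy, too_hard = flag_items(item_facility)
print("Possibly too easy:", too_easy)
print("Possibly too hard:", too_hard)
```

With these example cut-offs, items 2, 3 and 7 are flagged as possibly too easy and none as too hard; a different band would change the selection.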
Item Facility and Item Discrimination compared – Summary completion
Item                      | 23  | 24  | 25  | 26  | 27
High group correct answer | 1   | 12  | 12  | 13  | 8
Low group correct answer  | 0   | 9   | 8   | 5   | 8
Item Discrimination (ID)  | .08 | .23 | .31 | .62 | .00
Item Facility (IF)        | .08 | .88 | .83 | .75 | .70
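The ID and IF formulas given earlier can be written as two small functions and checked against the Summary completion figures above. This is a sketch under one stated assumption: the slides do not give the group sizes, so `GROUP_SIZE = 13` is inferred here because it reproduces the reported ID values. The function names are illustrative.

```python
def item_discrimination(high_correct, low_correct, total_in_both_groups):
    """ID = (high group # correct - low group # correct) / (half of the Ss in both groups)."""
    return (high_correct - low_correct) / (0.5 * total_in_both_groups)


def item_facility(num_correct, num_responding):
    """IF = Ss answering the item correctly / Ss responding to the item."""
    return num_correct / num_responding


GROUP_SIZE = 13  # assumption: size of each of the high and low groups (not stated in the slides)

# item number -> (high group correct, low group correct), copied from the figures above
summary_completion = {23: (1, 0), 24: (12, 9), 25: (12, 8), 26: (13, 5), 27: (8, 8)}

ids = {
    item: round(item_discrimination(high, low, 2 * GROUP_SIZE), 2)
    for item, (high, low) in summary_completion.items()
}
print(ids)

# Illustrative IF call with made-up counts: 3 of 4 respondents correct.
print(item_facility(3, 4))
```

Under that assumption the computed ID values match the reported row (.08, .23, .31, .62, .00). Item 27 shows why both statistics matter: its facility is reasonable, but it does not separate the high group from the low group at all.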
Acceptable concurrent validation
Analyse results
IELTS/TOEFL mock tests
[Chart: pilot test outcomes compared with IELTS/TOEFL mock test outcomes. Categories: no correlative data; met standard in both tests; failed to meet standard in both tests; passed TOEFL/IELTS, failed pilot; failed TOEFL/IELTS, passed pilot]
Document the new test
• Create test-writers’ manual
– Guidelines for writing items
– Specify item instructions for candidates
• Create instructions for the examiners
– Include grade reporting
• Create test-takers’ information
– Sample exam with answers
– How to prepare for the new exam
Document new test
Manage the change
• Training for teachers and test-writers
– 1 day course
– Awareness of new test
– How to write some of the items
– How to score the new test
• Devise examiner calibration session
• Socialise the change to test-takers
Change Management
Beneficial backwash – curriculum
• Highest level course from CEF C1 to B2+
• ‘Best in class’ material selection
• Revision of syllabus for each level
• Exam training built into curriculum
• Exam preparation course refreshed
Beneficial backwash – assessment
• New style items incorporated into progress and summative tests
• Exam specifications written for all levels
• Exam-writers trained
• Teachers trained how to score tests
• Teachers trained to do more valid formative testing
Beneficial backwash – teaching methodology
• Product and process approaches to teaching writing skills
• Teachers trained on how to teach other skills
• Upgrade in teacher training sessions
A practical process & a new test with beneficial backwash
ToR → Specification → Item writing → Pilot testing → Analyse results → Document → Manage change → Backwash
References
Brown D, 2004. Language Assessment. New York: Pearson Education Ltd.
Fulcher G & Davidson F, 2007. Language Testing and Assessment. Abingdon: Routledge.
Gear J & Gear R, 2006. Cambridge Preparation for the TOEFL Test, 4th edn. Cambridge: Cambridge University Press.
Hughes A, 2003. Testing for Language Teachers. Cambridge: Cambridge University Press.
IELTS official website, www.IELTS.org
ETS official website, www.ets.org