CELEA/CAAL 2007 Beijing
The Assessment of English Oral Proficiency: Alternative Measures
Lynne Hansen, C. Ray Graham, Jeremi Brewer, Rebecca Brewer, Wariyaporn Tieocharoen
Contact e-mail: [email protected]
Toward the automatic administration and scoring of speaking tests
1. Computerized elicited imitation (EI)
2. Automated measurement of temporality
FAST (Fully Automated Speaking Test)
Purpose: to develop a valid and reliable automated instrument to measure oral proficiency in L2 English
Comparing Theoretical Constructs for Speaking Tests
Psycholinguistic/Empirical:
  Language processing
  Temporality
  Sentence orientation
  EI, FAST, Versant (automated tests)
Communicative/Functional:
  Authenticity
  Negotiation of meaning
  Turn-taking
  OPIs, SOPIs
This paper is dedicated to the memory of Craig Chaudron.
He created L2 elicitation instruments for Vietnamese and Indonesian.
He pointed out the need for work on an English EI test.
C. Chaudron et al. (2005, July). Elicited imitation as an oral proficiency measure. Paper presented at the AILA Conference.
Craig Chaudron
What is Elicited Imitation (EI)?
[Figure: panels A, B, C]
History of Elicited Imitation
1963: Child L1 development (e.g. Fraser, Bellugi & Brown, 1963)
1964: Diagnose language abnormalities (e.g. Menyuk, 1964)
1970s: L2 acquisition (e.g. Naiman, 1974)
Two Major Thrusts in L2 Elicited Imitation Research
Psycholinguistic research into language competence and SLA processes (Erlam, 2006)
Indirect measurement of oral language proficiency (Bley-Vroman & Chaudron, 1994; Chaudron et al., 2005)
Bley-Vroman & Chaudron (1994)
“We regard it as premature to view elicited imitation as a proven method for inferring learner competence…”
But…
“the more you know of a foreign language, the better you can imitate the sentences of the language. Thus EI is a reasonable measure of global proficiency.” (p. 247)
Pilot Study Instruments
Three forms of 60 sentences (13 repeated on all three forms, 47 unique to each form)
Sentence length 3 to 24 syllables
Wide variety of morphological and syntactic forms
Variety of lexical items (81.3% = K1, 6.7% = K2, .23% = AWL, 11.6% = Off)
Sentences selected according to criteria (Chaudron et al., 2005)
Recorded in studio; male and female voices
Subjects of Pilot Study
223 learners of English in an IEP in the U.S.
13 L1 backgrounds (Chinese, Spanish, Korean, Japanese, Mongolian, etc.)
English proficiency levels from Novice to Advanced
Ages 18 to 53, mean = 24.5, SD = 6.9
Form A Reliability
58 items, 78 persons
Person RAW SCORE-TO-MEASURE CORRELATION = .98
CRONBACH ALPHA (KR-20) Person RAW SCORE RELIABILITY = .97
58 Measured Items: ITEM RELIABILITY = .98
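The person raw-score reliability reported above is a Cronbach's alpha (KR-20) coefficient. As a rough illustration of what that statistic measures, here is a minimal sketch in Python. The function name `cronbach_alpha` and the toy score matrix are hypothetical and are not taken from the study; the slides do not show how the authors' software computed the value. With 0/1 items and population variances, as here, alpha coincides with KR-20.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items score matrix.

    scores: list of rows, one row per person, one column per item.
    """
    n_items = len(scores[0])
    # Variance of each item (column) across persons.
    item_variances = [
        pvariance([row[i] for row in scores]) for i in range(n_items)
    ]
    # Variance of the persons' total scores.
    total_variance = pvariance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: 4 persons x 3 dichotomous items (not from the study).
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 3))  # 0.75
```

Values near 1, such as the .97 reported for Form A, indicate that persons are ranked very consistently across items.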
Figure 1. Form A Person/Item Map: persons (N=78) plotted against items (N=60) on a common scale of roughly 10 to 110, from low ability / low difficulty to high ability / high item difficulty.
Form A Items with Unacceptable Point Measure Correlations
1016 Have you slept?
3015 Maybe she likes cats.
3025 She quickly jumped down.
4018 They play games.
4008 The situation in Iraq calls for diplomacy and sensitivity.
Instrument
The 60 best discriminating items from the pilot study
Sentence length 5 to 22 syllables
Recorded in studio by male and female voices
Subjects
156 learners in a university ESL program in the U.S.
12 L1 backgrounds (Chinese, Mongolian, Portuguese, Spanish, Korean, Japanese, etc.)
English proficiency levels from Novice to Advanced
Ages 18 to 55, mean = 24.3, SD = 6.8
Test Administration
1. Orientation. Logged on to computer.
2. Responses recorded as they spoke.
3. Logged off. Wavefiles saved to server.
Scoring Method 1
Similar to Chaudron et al. (2005)
Divide sentences into syllables
Mark each syllable with 1 or 0
Transcribe each mistake below the correct syllable
Score each sentence 0-4, subtracting 1 for each error
Scoring the Imitations
1. If she lis tens, she will un der stand.
   Marks: 1 1 1 1 1 1 1 1 1   Score: 4
2. Why had they liked peas so much?
   Marks: 1 0 1 1 1 1 1   Score: 3
3. Big ships will al ways make noise.
   Marks: 1 1 0 1 1 1 1   Mistakes: (are)   Score: 3
4. We should have ea ten break fast by now.
   Marks: 0 1 0 1 0 1 1 0 1   Mistakes: (They) (eat) (right)   Score: 0
5. If her heart were to stop beat ing we might not be a ble to help her!
   Marks: 1 1 1 0 0 1 1 1 1 0 1 0 0 0 1 1 1   Mistakes: (will) (be) (will) (being)   Score: 0
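The 0-4 sentence scores on this slide can be reproduced with a very small function. The sketch below assumes that an "error" is any syllable marked 0 and that the score cannot go below 0; the slides state only "0-4, -1 for each error", so both points are assumptions, as is the function name `score_method_1`. The syllable marks are the ones shown for the five example sentences.

```python
def score_method_1(marks, max_score=4):
    """Sentence score: max_score minus 1 per syllable marked 0, floored at 0.

    marks: list of 1/0 values, one per syllable of the target sentence.
    Assumption: every syllable marked 0 counts as one error.
    """
    errors = marks.count(0)
    return max(0, max_score - errors)


# Syllable marks for the five example sentences on the slide.
slide_examples = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1],
]
method_1_scores = [score_method_1(m) for m in slide_examples]
print(method_1_scores)  # [4, 3, 3, 0, 0]
```

Under these assumptions the function returns the same scores as the slide (4, 3, 3, 0, 0). Any sentence with four or more wrong syllables receives 0, regardless of its length.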
Scoring Method 2
Correct syllable count
1 point for each syllable repeated accurately
0 points for incorrect syllables
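Method 2 amounts to summing the syllable marks. A minimal sketch follows, with the hypothetical function name `score_method_2` and marks taken from two of the example sentences on the scoring slide.

```python
def score_method_2(marks):
    """Sentence score: number of syllables repeated accurately (marked 1)."""
    return sum(1 for m in marks if m == 1)


# "Why had they liked peas so much?" (7 syllables, 1 wrong)
short_sentence = [1, 0, 1, 1, 1, 1, 1]
# "If her heart were to stop beating we might not be able to help her!"
# (17 syllables, 6 wrong)
long_sentence = [1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1]

print(score_method_2(short_sentence))  # 6
print(score_method_2(long_sentence))   # 11
```

Unlike a capped 0-4 score, the syllable count rewards partial repetition of long sentences: the 17-syllable example still earns 11 points here.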
Additional speaking tests administered to the subjects
15 min. face-to-face placement interview
30 min. simulated oral proficiency test (SOPI) scored by human raters
30 min. computer elicited oral achievement test (LAT) scored by human raters
OPI administered by certified ACTFL testers (stratified random sample)
Reliability
57 items, 154 persons
57 Measured Items: ITEM RELIABILITY = .98
Person RAW SCORE-TO-MEASURE CORRELATION = .96
CRONBACH ALPHA (KR-20) Person RAW SCORE RELIABILITY = .96
                                       EI Traditional   EI Syllable   OPI
EI Traditional   Pearson Correlation   1                .925(**)      .658(**)
                 Sig. (2-tailed)                        .000          .000
                 N                     162              162           36
EI Syllable      Pearson Correlation   .925(**)         1             .648(**)
                 Sig. (2-tailed)       .000                           .000
                 N                     162              162           36
OPI              Pearson Correlation   .658(**)         .648(**)      1
                 Sig. (2-tailed)       .000             .000
                 N                     36               36            40
                                       EI Traditional   EI Syllable   Oral Placem
EI Traditional   Pearson Correlation   1                .925(**)      .639(**)
                 Sig. (2-tailed)                        .000          .000
                 N                     162              162           107
EI Syllable      Pearson Correlation   .925(**)         1             .691(**)
                 Sig. (2-tailed)       .000                           .000
                 N                     162              162           107
Oral Placem      Pearson Correlation   .639(**)         .691(**)      1
                 Sig. (2-tailed)       .000             .000
                 N                     107              107           136
                                       EI Traditional   EI Syllable   LAT Speaking
EI Traditional   Pearson Correlation   1                .925(**)      .551(**)
                 Sig. (2-tailed)                        .000          .000
                 N                     162              162           55
EI Syllable      Pearson Correlation   .925(**)         1             .414(**)
                 Sig. (2-tailed)       .000                           .002
                 N                     162              162           55
LAT Speaking     Pearson Correlation   .551(**)         .414(**)      1
                 Sig. (2-tailed)       .000             .002
                 N                     55               55            56
                                       EI Traditional   EI Syllable   ECT L2 Speak
EI Traditional   Pearson Correlation   1                .925(**)      .516(**)
                 Sig. (2-tailed)                        .000          .000
                 N                     162              162           148
EI Syllable      Pearson Correlation   .925(**)         1             .465(**)
                 Sig. (2-tailed)       .000                           .000
                 N                     162              162           148
ECT L2 Speak     Pearson Correlation   .516(**)         .465(**)      1
                 Sig. (2-tailed)       .000             .000
                 N                     148              148           161
                                       EI Traditional   EI Syllable   ECT L2 Speak   OPI        Oral Placem   LAT Speaking
EI Traditional   Pearson Correlation   1                .925(**)      .516(**)       .658(**)   .639(**)      .551(**)
                 N                     162              162           148            36         107           55
EI Syllable      Pearson Correlation   .925(**)         1             .465(**)       .648(**)   .691(**)      .414(**)
                 N                     162              162           148            36         107           55
ECT L2 Speak     Pearson Correlation   .516(**)         .465(**)      1              .432(**)   .577(**)      .442(**)
                 N                     148              148           161            35         113           48
OPI              Pearson Correlation   .658(**)         .648(**)      .432(**)       1          .660(**)      .652(*)
                 N                     36               36            35             40         27            13
Oral Placem      Pearson Correlation   .639(**)         .691(**)      .577(**)       .660(**)   1             .(a)
                 N                     107              107           113            27         136           0
LAT Speaking     Pearson Correlation   .551(**)         .414(**)      .442(**)       .652(*)    .(a)          1
                 N                     55               55            48             13         0             56
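The N values in these tables differ from cell to cell because not every subject took every test, so each coefficient is computed only on the subjects who have both scores. A minimal sketch of such a pairwise Pearson correlation is given below. The function name `pearson_pairwise`, the use of `None` for a missing score, and the toy data are all assumptions for illustration and are not taken from the study.

```python
from math import sqrt


def pearson_pairwise(x, y):
    """Pearson r over the cases where both scores are present.

    x, y: equal-length lists of scores; None marks a missing score.
    Returns (r, n), where n is the number of complete pairs used.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy), n


# Hypothetical toy scores for five subjects (not from the study).
ei_scores = [1, 2, 3, 4, None]
opi_scores = [2, 4, 6, None, 10]
r, n = pearson_pairwise(ei_scores, opi_scores)
print(round(r, 3), n)  # 1.0 3
```

A pair of tests with no subjects in common (N = 0, as for Oral Placem and LAT Speaking) has no coefficient at all; this sketch would raise a division-by-zero error in that case.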
Summary and Conclusions
We have presented large numbers of EI items to almost 400 ESL students
Student responses to EI are very consistent
Overall comparisons between EI scores and scores on other measures of oral language proficiency are promising
The EI task involves mechanisms similar to those used in spontaneous speech
Where do we go from here?
We need to continue experimenting with the interrelationships between student responses and EI variables such as sentence length, sentence complexity, and vocabulary.
We need to examine responder variables such as working memory, L1, age, etc.
We need to use new analysis tools to examine factors which contribute to learner responses.
Where do we go from here? (contd.)
We need to experiment with new ways of scoring and weighting items.
We need to develop speech technology tools to do the automatic scoring.
We need to develop an automated adaptive speaking test which includes EI, similar to those used currently in reading and listening.
FAST (Fully Automated Speaking Test)
FAST was originally conceived as a test of oral fluency
Fluency: The temporal aspect of oral proficiency (Cucchiarini, Strik & Boves, 2000)
Hesitation Phenomena in Speech
L2: Lennon, 1990; Riggenbach, 1991; Kuwahara, 1995; Chaimanee, 1999
Language attrition: Russell, 1996; Kenny, 1996; Nakuma, 1997; Yukawa, 1997, 1998; Hansen et al., 1998, 2002; Tomiyama, 1999; Nagasawa, 1999
L1: Goldman-Eisler, 1968
Variables measured automatically for calculation in the FAST algorithm
Total length of silence
Average length of silence
Number of runs of speech
Total length of speech
Average length of speech run
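The five variables listed above can all be derived from a recording once it has been split into alternating stretches of speech and silence. The sketch below assumes that segmentation has already been done by some other tool; the slides do not describe how FAST detects silence or how it combines these variables, so the input format, the function name `temporal_measures`, and the toy durations are all hypothetical.

```python
def temporal_measures(segments):
    """Compute the five temporal variables from labelled segments.

    segments: list of (label, duration_in_seconds) tuples, where label is
    "speech" or "silence" and adjacent segments alternate (assumption).
    """
    speech = [d for label, d in segments if label == "speech"]
    silence = [d for label, d in segments if label == "silence"]
    return {
        "total_silence": sum(silence),
        "mean_silence": sum(silence) / len(silence) if silence else 0.0,
        "speech_runs": len(speech),
        "total_speech": sum(speech),
        "mean_speech_run": sum(speech) / len(speech) if speech else 0.0,
    }


# Hypothetical segmentation of a short response (durations in seconds).
toy_segments = [
    ("speech", 2.0),
    ("silence", 0.5),
    ("speech", 3.0),
    ("silence", 1.5),
    ("speech", 1.0),
]
measures = temporal_measures(toy_segments)
print(measures)
```

For the toy input this gives 2.0 s of silence in total (1.0 s on average) and three speech runs totalling 6.0 s (2.0 s on average).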
Fluency studies of missionary language
FAST
Manual measurement of temporality
Talk and Silence in the English Narratives of Fluent and Nonfluent Speakers
[Figure: talk and silence for Level 1, Level 2 (n = 59), Level 3 (n = 113), and native speakers; series labels ETTTT and ETTSP]
Relationships of ESL level to temporality in L1 and English narratives: ANOVA
              English          Mother tongue
              F      sig.      F      sig.
SP time       6.91   .001      .082   .921
SP length     4.84   .009      .157   .855
Talk time     6.72   .002      .107   .898
Run length    3.60   .030      2.38   .071
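Each F value in this table compares a temporal measure across ESL levels. As a reminder of how a one-way ANOVA F statistic is formed, here is a minimal sketch. The function name `one_way_anova_f` and the three toy groups are hypothetical and not from the study, and the sketch stops at F and its degrees of freedom; the sig. column would additionally require the F distribution.

```python
def one_way_anova_f(groups):
    """One-way ANOVA: return (F, df_between, df_within).

    groups: list of lists, one list of observations per group.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical pause times (seconds) for three proficiency levels.
toy_levels = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_anova_f(toy_levels)
print(round(f_value, 2), df_b, df_w)  # 13.0 2 6
```

A large F, as in the English columns, means the level means differ by more than the within-level spread would suggest; the small F values for the mother-tongue narratives indicate no such difference.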