CELEA/CAAL 2007 Beijing The Assessment of English Oral Proficiency: Alternative Measures

Lynne Hansen, C. Ray Graham, Jeremi Brewer, Rebecca Brewer, Wariyaporn Tieocharoen

Contact e-mail: [email protected]

Toward the automatic administration and scoring of speaking tests

1. Computerized elicited imitation (EI)
2. Automated measurement of temporality

FAST (Fully Automated Speaking Test)

Purpose: to develop a valid and reliable automated instrument to measure oral proficiency in L2 English

Comparing Theoretical Constructs for Speaking Tests

Psycholinguistic/Empirical
Language processing
Temporality
Sentence orientation
Tests: EI, FAST, Versant (automated tests)

Communicative/Functional
Authenticity
Negotiation of meaning
Turn-taking
Tests: OPIs, SOPIs

This paper is dedicated to the memory of Craig Chaudron.

He created L2 elicitation instruments for Vietnamese and Indonesian.
He pointed out the need for work on an English EI test.

C. Chaudron et al. (2005, July). Elicited imitation as an oral proficiency measure. Paper presented at the AILA Conference

Craig Chaudron

What is Elicited Imitation (EI)?


History of Elicited Imitation

1963: Child L1 development
- (e.g. Fraser, Bellugi & Brown, 1963)
1964: Diagnose language abnormalities
- (e.g. Menyuk, 1964)
1970s: L2 acquisition
- (e.g. Naiman, 1974)

Two Major Thrusts in L2 Elicited Imitation Research

Psycholinguistic research into language competence and SLA processes (Erlam, 2006)
Indirect measurement of oral language proficiency (Bley-Vroman & Chaudron, 1994; Chaudron et al., 2005)

Bley-Vroman & Chaudron (1994)

"We regard it as premature to view elicited imitation as a proven method for inferring learner competence…"

But…

"the more you know of a foreign language, the better you can imitate the sentences of the language. Thus EI is a reasonable measure of global proficiency." (p. 247)

Pilot Study Instruments

Three forms of 60 sentences (13 repeated on all three forms, 47 unique to each form)
Sentence length 3 to 24 syllables
Wide variety of morphological and syntactic forms
Variety of lexical items (81.3% = K1, 6.7% = K2, .23% = AWL, 11.6% = Off)
Sentences selected according to criteria (Chaudron et al., 2005)
Recorded in studio; male and female voices

Subjects of Pilot Study

223 learners of English in an IEP in the U.S.
13 L1 backgrounds (Chinese, Spanish, Korean, Japanese, Mongolian, etc.)
English proficiency levels from Novice to Advanced
Ages 18 to 53, mean = 24.5, SD = 6.9

Form A Reliability
58 items, 78 persons

Person raw score-to-measure correlation = .98
Cronbach alpha (KR-20) person raw score reliability = .97
58 measured items: item reliability = .98
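The raw score reliability reported above is labeled Cronbach alpha (KR-20). As a rough illustration of what such a coefficient summarizes, the sketch below computes Cronbach's alpha from a persons-by-items score matrix. The function name and the toy data are hypothetical and are not taken from the study; the slides do not describe how the analysis software computed the reported value.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items matrix.

    scores: list of persons, each a list of item scores of equal length.
    For 0/1 items this is equivalent to KR-20 as long as the same
    variance estimator is used for items and totals.
    """
    k = len(scores[0])  # number of items
    # Variance of each item across persons
    item_vars = [variance([person[i] for person in scores]) for i in range(k)]
    # Variance of each person's total score
    total_var = variance([sum(person) for person in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: three persons, three items, perfectly consistent
toy = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
print(round(cronbach_alpha(toy), 3))
```

Values near 1, such as the .97 on this slide, indicate that persons are ranked very consistently across items.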

Figure 1. Form A Person/Item Map: persons (N = 78) on the left and items (N = 60) on the right of a shared vertical scale running from 10 (low ability, low item difficulty) to 110 (high ability, high item difficulty).

Form A Items with Unacceptable Point Measure Correlations

1016 Have you slept?
3015 Maybe she likes cats.
3025 She quickly jumped down.
4018 They play games.
4008 The situation in Iraq calls for diplomacy and sensitivity.

Instrument

The 60 best discriminating items from the pilot study
Sentence length 5 to 22 syllables
Recorded in studio by male and female voices

Subjects

156 learners in a university ESL program in the U.S.
12 L1 backgrounds (Chinese, Mongolian, Portuguese, Spanish, Korean, Japanese, etc.)
English proficiency levels from Novice to Advanced
Ages 18 to 55, mean = 24.3, SD = 6.8

Test Administration

1. Orientation. Logged on to computer.
2. Responses recorded as they spoke.
3. Logged off. Wavefiles saved to server.

Scoring Method 1

Similar to Chaudron et al. (2005)
Divide sentences into syllables
Mark each syllable with 1 or 0
Transcribe each mistake below the correct syllable
Scoring: 0-4 per sentence, with 1 point deducted for each error

Scoring the Imitations

1. If she lis tens, she will un der stand.
   Marks: 1 1 1 1 1 1 1 1 1
   Score: 4

2. Why had they liked peas so much?
   Marks: 1 0 1 1 1 1 1
   Score: 3

3. Big ships will al ways make noise.
   Marks: 1 1 0 1 1 1 1
   Errors transcribed: (are)
   Score: 3

4. We should have ea ten break fast by now.
   Marks: 0 1 0 1 0 1 1 0 1
   Errors transcribed: (They) (eat) (right)
   Score: 0

5. If her heart were to stop beat ing we might not be a ble to help her!
   Marks: 1 1 1 0 0 1 1 1 1 0 1 0 0 0 1 1 1
   Errors transcribed: (will) (be) (will) (being)
   Score: 0
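The 0-4 rule illustrated in these examples can be sketched in a few lines. This is a minimal sketch, assuming that each syllable marked 0 counts as one error and that the score cannot fall below 0; both assumptions are inferred from the worked examples rather than stated on the slides, and the function name is hypothetical.

```python
def score_method_1(marks, max_score=4):
    """Score one imitated sentence on the 0-4 scale.

    marks: list of 1/0 values, one per syllable (1 = repeated correctly).
    Assumption: every 0 is one error, and the score is floored at 0.
    """
    errors = marks.count(0)
    return max(0, max_score - errors)


# Syllable marks for "Why had they liked peas so much?" (one error)
print(score_method_1([1, 0, 1, 1, 1, 1, 1]))
# Syllable marks for "We should have eaten breakfast by now." (four errors)
print(score_method_1([0, 1, 0, 1, 0, 1, 1, 0, 1]))
```

Under this rule, sentence length does not raise the ceiling: a 17-syllable sentence and a 7-syllable sentence both top out at 4.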

Scoring Method 2

Correct syllable count
1 point for each syllable repeated accurately
0 points for incorrect syllables
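A correct-syllable count of this kind could be tallied as follows. This is a sketch only: the slides do not say how sentence-level counts are aggregated into a test score, so summing across sentences is an assumption, and the function name is hypothetical.

```python
def score_method_2(sentences):
    """Count correctly repeated syllables across a set of sentences.

    sentences: list of mark lists, one list of 1/0 values per sentence.
    Returns (correct_syllables, total_syllables).
    Assumption: the test score is the simple sum over all sentences.
    """
    correct = sum(sum(1 for m in marks if m == 1) for marks in sentences)
    total = sum(len(marks) for marks in sentences)
    return correct, total


# Two sentences: 9 of 9 syllables correct, then 6 of 7
print(score_method_2([[1] * 9, [1, 0, 1, 1, 1, 1, 1]]))
```

Unlike the 0-4 scale, this count gives longer sentences more weight and keeps distinguishing between responses that contain more than four errors.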

Additional speaking tests administered to the subjects

15 min. face-to-face placement interview
30 min. simulated oral proficiency test (SOPI) scored by human raters
30 min. computer-elicited oral achievement test (LAT) scored by human raters
OPI administered by certified ACTFL testers (stratified random sample)

Reliability
57 items, 154 persons

57 measured items: item reliability = .98
Person raw score-to-measure correlation = .96
Cronbach alpha (KR-20) person raw score reliability = .96

Correlations: EI scores and OPI

                                     EI Traditional   EI Syllable   OPI
EI Traditional  Pearson Correlation  1                .925(**)      .658(**)
                Sig. (2-tailed)                       .000          .000
                N                    162              162           36
EI Syllable     Pearson Correlation  .925(**)         1             .648(**)
                Sig. (2-tailed)      .000                           .000
                N                    162              162           36
OPI             Pearson Correlation  .658(**)         .648(**)      1
                Sig. (2-tailed)      .000             .000
                N                    36               36            40

Correlations: EI scores and Oral Placement

                                     EI Traditional   EI Syllable   Oral Placement
EI Traditional  Pearson Correlation  1                .925(**)      .639(**)
                Sig. (2-tailed)                       .000          .000
                N                    162              162           107
EI Syllable     Pearson Correlation  .925(**)         1             .691(**)
                Sig. (2-tailed)      .000                           .000
                N                    162              162           107
Oral Placement  Pearson Correlation  .639(**)         .691(**)      1
                Sig. (2-tailed)      .000             .000
                N                    107              107           136

Correlations: EI scores and LAT Speaking

                                     EI Traditional   EI Syllable   LAT Speaking
EI Traditional  Pearson Correlation  1                .925(**)      .551(**)
                Sig. (2-tailed)                       .000          .000
                N                    162              162           55
EI Syllable     Pearson Correlation  .925(**)         1             .414(**)
                Sig. (2-tailed)      .000                           .002
                N                    162              162           55
LAT Speaking    Pearson Correlation  .551(**)         .414(**)      1
                Sig. (2-tailed)      .000             .002
                N                    55               55            56

Correlations: EI scores and ECT L2 Speak

                                     EI Traditional   EI Syllable   ECT L2 Speak
EI Traditional  Pearson Correlation  1                .925(**)      .516(**)
                Sig. (2-tailed)                       .000          .000
                N                    162              162           148
EI Syllable     Pearson Correlation  .925(**)         1             .465(**)
                Sig. (2-tailed)      .000                           .000
                N                    162              162           148
ECT L2 Speak    Pearson Correlation  .516(**)         .465(**)      1
                Sig. (2-tailed)      .000             .000
                N                    148              148           161

Correlations: all speaking measures

                                     EI Traditional   EI Syllable   ECT L2 Speak   OPI        Oral Placem   LAT Speaking
EI Traditional  Pearson Correlation  1                .925(**)      .516(**)       .658(**)   .639(**)      .551(**)
                N                    162              162           148            36         107           55
EI Syllable     Pearson Correlation  .925(**)         1             .465(**)       .648(**)   .691(**)      .414(**)
                N                    162              162           148            36         107           55
ECT L2 Speak    Pearson Correlation  .516(**)         .465(**)      1              .432(**)   .577(**)      .442(**)
                N                    148              148           161            35         113           48
OPI             Pearson Correlation  .658(**)         .648(**)      .432(**)       1          .660(**)      .652(*)
                N                    36               36            35             40         27            13
Oral Placem     Pearson Correlation  .639(**)         .691(**)      .577(**)       .660(**)   1             .(a)
                N                    107              107           113            27         136           0
LAT Speaking    Pearson Correlation  .551(**)         .414(**)      .442(**)       .652(*)    .(a)          1
                N                    55               55            48             13         0             56
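The N under each coefficient differs from cell to cell because not every subject took every test, so each correlation rests only on the subjects who have both scores. A minimal sketch of such a pairwise computation is given below. The function name and the sample scores are hypothetical, and the slides do not describe the statistical software's exact procedure; the sketch simply returns no coefficient when fewer than two complete pairs exist, as in the cell with N = 0.

```python
from math import sqrt


def pearson_pairwise(x, y):
    """Pearson r over the cases where both scores are present.

    x, y: equal-length lists of numbers, with None for a missing score.
    Returns (r, n); r is None when it cannot be computed.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None, n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None, n  # a constant variable has no defined correlation
    return sxy / sqrt(sxx * syy), n


# Hypothetical scores: two subjects are missing one of the two tests
ei = [1, 2, 3, None, 5]
other = [2, 4, 6, 8, None]
print(pearson_pairwise(ei, other))
```

With small overlaps, such as the 13 subjects who took both the OPI and the LAT, a coefficient is far less stable than one based on 148 or 162 subjects.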

Summary and Conclusions

We have presented large numbers of EI items to almost 400 ESL students
Student responses to EI are very consistent
Overall comparisons between EI scores and scores on other measures of oral language proficiency are promising
The EI task involves mechanisms similar to those used in spontaneous speech

Where do we go from here?

We need to continue experimenting with the interrelationships between student responses and EI variables such as sentence length, sentence complexity, and vocabulary.
We need to examine responder variables such as working memory, L1, age, etc.
We need to use new analysis tools to examine factors which contribute to learner responses.

Where do we go from here? (contd.)

We need to experiment with new ways of scoring and weighting items.
We need to develop speech technology tools to do the automatic scoring.
We need to develop an automated adaptive speaking test which includes EI, similar to those used currently in reading and listening.

FAST (Fully Automated Speaking Test)

FAST was originally conceived as a test of oral fluency
Fluency: The temporal aspect of oral proficiency (Cucchiarini, Strik & Boves, 2000)

Hesitation Phenomena in Speech

L2: Lennon, 1990; Riggenbach, 1991; Kuwahara, 1995; Chaimanee, 1999
Language attrition: Russell, 1996; Kenny, 1996; Nakuma, 1997; Yukawa, 1997, 1998; Hansen et al., 1998, 2002; Tomiyama, 1999; Nagasawa, 1999
L1: Goldman-Eisler, 1968

Variables measured automatically for calculation in the FAST algorithm

Total length of silence
Average length of silence
Number of runs of speech
Total length of speech
Average length of speech run
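The five variables listed above can all be derived from a speech/silence segmentation of a recording. The sketch below is illustrative only and is not the FAST algorithm: it assumes the audio has already been reduced to a sequence of fixed-length frames labeled as speech or silence, and the function name, the frame length, and the absence of a minimum pause duration are all assumptions.

```python
from itertools import groupby


def temporal_measures(frames, frame_sec=0.01):
    """Compute temporal variables from per-frame speech/silence labels.

    frames: iterable of truthy (speech) / falsy (silence) values.
    frame_sec: assumed duration of one frame in seconds.
    """
    # Collapse consecutive identical labels into (is_speech, duration) runs
    runs = [(is_speech, sum(1 for _ in grp) * frame_sec)
            for is_speech, grp in groupby(frames, key=bool)]
    speech = [d for is_speech, d in runs if is_speech]
    silence = [d for is_speech, d in runs if not is_speech]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        "total_silence": sum(silence),
        "average_silence": mean(silence),
        "number_of_speech_runs": len(speech),
        "total_speech": sum(speech),
        "average_speech_run": mean(speech),
    }


# Hypothetical labels, one per second: 2 s talk, 1 s pause, 3 s talk, 2 s pause
labels = [True, True, False, True, True, True, False, False]
print(temporal_measures(labels, frame_sec=1.0))
```

In practice a threshold on pause length would probably be needed so that brief stop closures inside words are not counted as silences; the slides do not specify one.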

Fluency studies of missionary language
Manual measurement of temporality
FAST

Talk and Silence in the English Narratives of Fluent and Nonfluent Speakers

[Figure: talk and silence (series labeled ETTTT and ETTSP) shown in four panels: Level 1, Level 2 (n = 59), Level 3 (n = 113), and Native Speakers]

Relationships of ESL level to temporality in L1 and English narratives: ANOVA

              English          Mother tongue
              F       sig.     F       sig.
SP time       6.91    .001     .082    .921
SP length     4.84    .009     .157    .855
Talk time     6.72    .002     .107    .898
Run length    3.60    .030     2.38    .071
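Each F value in the table compares how much a temporal variable differs between ESL levels with how much it varies within a level. A minimal sketch of a one-way ANOVA F statistic follows, assuming one list of values per level. The function name and the data are hypothetical, and the significance values in the table would additionally require the F distribution, which is not computed here.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA.

    groups: list of lists, one list of observations per group
            (for example, one per ESL level).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical pause times (seconds) for three proficiency levels
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

Read this way, the table shows that the English narratives separate the levels on all four variables, while the mother-tongue narratives do not reach significance on any of them.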
