Processing Corpus-derived Multi-unit Sequences by L2 English Learners

Fei Fei

Second Language Studies Program

Michigan State University

May 19, Beijing

Purpose of the present study

Formulaic language use has long been a research focus in the study of second language acquisition. For L2 learners of intermediate and advanced proficiency, formulaic language is the biggest stumbling block to sounding nativelike (Wray, 2002).

However, most studies on formulaic sequences focus on textual and descriptive aspects. Few studies investigate multi-unit sequence processing among L2 English learners. Even fewer explore individual factors in multi-unit sequence processing.

What is a formulaic sequence?

• Formulaic sequences: stored and retrieved holistically from memory at the time of use.

• Issues:

1. Compositionality (e.g., Howarth, 1998; Wray, 2002)

2. Representation and production (e.g., Sinclair, 1991; N. Ellis, 1996)

3. Development in L2 (e.g., Wong-Fillmore, 1976)

Why corpus-derived multi-unit sequences?

Word strings or lexical bundles generated on the basis of frequency may not be stored holistically in the mind; conversely, formulaic sequences that are stored as wholes may not be identified by a given corpus analysis.

However, Wray (2002, p. 25) identified “frequency as a salient, perhaps even a determining, factor in the identification of formulaic sequences.”

Numerous studies of formulaic sequences are based on corpus frequency (e.g., Sinclair & Renouf, 1988; DeCock, Granger, Leech & McEnery, 1998; Moon, 1998; Hunston & Francis, 2000).

The more often a word string is needed, the more likely it is to be stored in prefabricated form to save processing effort. Once stored, it is more likely to be the preferred choice at the time of use.

What is a corpus-derived MUS?

In short, corpus-derived multi-word sequences

• are based on corpus frequency;

• may not be psycholinguistically valid;

• are either fully fixed in form or semi-preconstructed phrases;

• are a subset of formulaic sequences.

Target multi-unit sequences

• Schmitt et al. (2004):
  Longman Grammar of Spoken and Written English
  Lexical Phrases and Language Teaching
  Hyland’s list
  BNC (British National Corpus)
  CANCODE (Cambridge and Nottingham Corpus of Discourse in English)
  MICASE (Michigan Corpus of Academic Spoken English)

• Biber (2004): the T2K-SWAL Corpus (TOEFL 2000 Spoken and Written Academic Language Corpus)

• ANC: American National Corpus http://americannationalcorpus.org/frequency.html

Schmitt et al.’s (2004) study on processing MUS: textual attributes

• Frequency

• Length

• Transparency in terms of meaning and function

Individual variables in processing multi-unit sequences: proficiency

• Research (Hinger & Spottl, 2002; Spottl & McCarthy, 2003, 2004; Schmitt et al., 2004) indicated that vocabulary size and language proficiency were two factors in investigating cross-linguistic lexical operations.

• Spottl & McCarthy (2004), in their cross-linguistic study of formulaic sequences, argued that without a certain level of general language proficiency, noticing did not even take place, and word strings were completely ignored, or simply avoided by learners. L2 language proficiency was defined as scores on a proficiency test (upper intermediate and advanced level).

• In Schmitt et al.’s (2004) study, the highest-level non-native speakers mostly demonstrated native-like performance.

Individual variables in processing MUS: working memory

• Working memory can be divided into two main components: one is phonological short-term memory (STM), and the other is storage and processing capacity, referred to as the Central Executive (CE).

• Previous studies showed that WM can affect:

1. L2 syntactic processing and development (e.g., Ellis & Sinclair, 1996; Ellis, 2001; Juffs, 2004);

2. L2 lexical processing and development (e.g., French, 2003; Papagno & Vallar, 1995);

3. L2 proficiency and aptitude (e.g., Kroll, Michael, Tokowicz, & Dufour, 2002; Payne & Whitney, 2002; Service & Kohonen, 1995).

Individual variables in processing MUS: working memory

• Myles et al. (1999) found that STM capacity can predict the ability to chunk. “Chunking”, in their study, was defined as the ability to remember set phrases in L2 and later use them appropriately.

• Roberts and Gibson (2001) found high correlations between sentence memory and complex span, and between sentence memory and N-back span. It was argued that memory for sentences was not simply a result of linguistic experience; rather, it was likely that an independent working memory component contributed to participants’ performance on sentence memory.

In sum,

The present study seeks to test the role of proficiency and WM in L2 English learners’ processing of high-frequency multi-unit sequences. Influences of textual attributes of MUS are also addressed. The study may contribute to explaining the variance in L2 English learners’ formulaic language use.

Research questions

• What is the relationship between proficiency, WM and participants’ processing of MUS?

• Do textual attributes of MUS affect how L2 learners process them?

• What are the linguistic features of learners’ reproduction of MUS?

Participants

• Thirty-two adult L2 English learners participated in the present study.

• They were graduate students recruited from a wide range of disciplines at a large Midwestern university in the United States. Their reported TOEFL scores ranged from 570 to 650.

• Participants' ages ranged from 21 to 38, with 10-14 years of formal English learning experience.

• All participants were native speakers of Chinese and had been living in the United States for less than 2 years.

Measuring the variables

• The Elicited Imitation (EI) test is used as a measure of learners’ knowledge of precise grammatical factors (e.g., Hamayan et al., 1977; Gallimore and Tharp, 1981; Munnich et al., 1994), L2 competence (Baddeley et al., 1998; N. Ellis, 2001), and implicit knowledge (Erlam, 2006).

• The utterance elicited is argued to reflect the degree to which a test taker is able to assimilate the stimulus into an internal grammar (Munnich et al., 1994).

• “The basic idea is that if the stretches of language are long enough, it overloads working memory, and the person is forced to reconstruct the content of the dictation via their language resources, rather than repeating the dictation back from rote memory. One of those language resources is the inventory of formulaic sequences stored in memory.” (Schmitt et al., 2004)

Measuring the variables

• The Elicited Imitation (EI) test is available at http://distancelearning.llc.msu.edu/research/chunks/ with an assigned ID and password

• Two factors in designing an EI test: sentence length (Bley-Vroman and Chaudron, 1994) and time pressure (R. Ellis, 2005)

• There were two tasks in the EI test:
  Task 1 was a passage revised from Schmitt et al.’s (2004) study, which contained 25 target multi-word sequences.
  Task 2 included 18 target multi-word sequences derived from the American National Corpus and the T2K-SWAL Corpus. They were embedded in 18 single sentences.

• Scoring:
  complete reproduction = 2 points
  attempted reproduction with missing lexis = 1 point
  missing reproduction = 0 points
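
The three-level scoring rubric above can be sketched in Python. This is only an illustration, not the scoring procedure actually used in the study: the function names, the tokenizer, and in particular the `min_overlap` threshold that decides what counts as an "attempted reproduction" are assumptions introduced here.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only word-like tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_reproduction(target: str, response: str, min_overlap: float = 0.5) -> int:
    """Return 2, 1, or 0 for one target sequence in a transcribed response."""
    target_words = tokenize(target)
    response_words = tokenize(response)
    if not target_words:
        raise ValueError("target sequence must contain at least one word")

    # 2 points: the whole target appears as a contiguous run of words.
    n = len(target_words)
    for start in range(len(response_words) - n + 1):
        if response_words[start:start + n] == target_words:
            return 2

    # 1 point (assumed criterion): at least `min_overlap` of the target
    # words appear somewhere in the response.
    present = set(response_words)
    overlap = sum(word in present for word in target_words) / n
    return 1 if overlap >= min_overlap else 0
```

In practice a human rater would judge partial reproductions; a fixed word-overlap threshold is only a rough stand-in for that judgment.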

Measuring the variables

• The Working Memory test included

1. a reverse digit span task (15 items)

2. a word span task (15 items)

• Both span tasks were classical WM tasks. They were adapted and written by two researchers. The length of the WM test items varied from 5 to 8 for both the reverse digit span and the word span tasks.
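
As a rough illustration of how a reverse digit span task of this shape could be built and scored, the sketch below generates 15 items of 5 to 8 digits and counts correct backward recalls. How item lengths were distributed across the 15 items and how responses were scored are not stated in the slides, so the cycling of lengths and the one-point-per-correct-item scoring are assumptions, as are all function names.

```python
import random


def make_items(n_items: int = 15, min_len: int = 5, max_len: int = 8,
               seed: int = 0) -> list[list[int]]:
    """Generate digit sequences whose lengths cycle from min_len to max_len."""
    rng = random.Random(seed)
    n_lengths = max_len - min_len + 1
    lengths = [min_len + i % n_lengths for i in range(n_items)]
    return [[rng.randint(0, 9) for _ in range(length)] for length in lengths]


def is_correct_reverse(item: list[int], response: list[int]) -> bool:
    """A response is correct when it repeats the item in reverse order."""
    return list(response) == list(reversed(item))


def reverse_span_score(items: list[list[int]], responses: list[list[int]]) -> int:
    """Assumed scoring: one point per correctly reversed item."""
    return sum(is_correct_reverse(item, resp) for item, resp in zip(items, responses))
```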

Summary of the variables

• Dependent variable: processing of MUS, as indicated by participants’ mean scores on the Elicited Imitation (EI) test

• Independent variables

Individual factors
1. Language proficiency (TOEFL scores within 2 years)
2. Working memory

Textual attributes
1. Frequency
2. Length
3. Transparency in terms of meaning and function

Quantitative results

Intercorrelations Among Proficiency, WM and Dictation Scores

                        Scores on the EI test   Proficiency   Reverse digit span   Word span   WM in total
Scores on the EI test   1.000
Proficiency             .586**                  1.000
Reverse digit span      .333                    .323          1.000
Word span               .616**                  .551**        .658**               1.000
WM in total             .484**                  .449*         .948**               .864**      1.000

Note. ** Correlation significant at the 0.01 level (2-tailed).
* Correlation significant at the 0.05 level (2-tailed).
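
For readers who want to see how a coefficient of this kind is obtained, here is a minimal sketch, assuming the values in the table are Pearson product-moment correlations (the slides do not name the coefficient). The helper names are hypothetical, and `t_from_r` needs the number of cases that entered the correlation, which is also an assumption to be supplied by the reader.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_from_r(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing whether r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
```

The resulting t value is compared against a t distribution with n - 2 degrees of freedom to obtain the two-tailed significance levels marked by the asterisks.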

Results of Multiple Regression Analysis

Scores on EI test = -136.647 + 0.942 × word span + 0.192 × proficiency

Predictors in the model   R      R²     R²△    F          B          S.E.     Beta   t        Sig.
(Constant)                                                -136.647   47.697          -2.865   .008
Word span                 .616   .379   .357   17.086**   .942       .377     .420   2.495    .019
Proficiency               .683   .467   .427   11.805**   .192       .091     .355   2.105    .045
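
The fitted model can be applied to a new learner as in the sketch below, which uses the unstandardized coefficients from the B column of the table. The function name and the example inputs are hypothetical; in particular, the scale of the word span score is not described in the slides, so the input values are placeholders rather than realistic data.

```python
# Unstandardized coefficients taken from the B column of the regression table.
INTERCEPT = -136.647
B_WORD_SPAN = 0.942
B_PROFICIENCY = 0.192


def predict_ei_score(word_span: float, proficiency: float) -> float:
    """Predicted EI test score for a given word span score and TOEFL score."""
    return INTERCEPT + B_WORD_SPAN * word_span + B_PROFICIENCY * proficiency


# Hypothetical learner: word span score of 50, TOEFL score of 600.
example = predict_ei_score(word_span=50, proficiency=600)
```

Holding proficiency constant, each additional word span point raises the predicted EI score by 0.942 points; holding word span constant, each additional TOEFL point raises it by 0.192 points.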

Means, SD and t-tests of Textual Factors: Transparency, Length, Frequency

Independent variable   Group   Mean      SD         t       Sig. (2-tailed)
Transparency           High    32.6842   16.63014   3.004   .006**
                       Low     19.6667   10.07220
Length                 High    27.3750   15.29937   0.979   .334
                       Low     22.9474   13.97805
Frequency              High    27.1200   16.46795   0.891   .378
                       Low     23.0556   11.94828

Note. * p < 0.05; ** p < 0.01.
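
A t statistic can be recomputed from summary statistics like those in the table, as sketched below. Two things here are assumptions rather than facts from the slides: the group sizes (19 high-transparency and 24 low-transparency sequences, chosen so that they sum to the 43 target sequences) and the use of the unequal-variance (Welch) form of the test. With these assumptions the transparency comparison comes out close to the tabled value.

```python
import math


def welch_t(mean1: float, sd1: float, n1: int,
            mean2: float, sd2: float, n2: int) -> float:
    """Welch's t statistic for two independent groups, from summary statistics."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Transparency: High vs. Low. Group sizes 19 and 24 are assumed, not reported.
t_transparency = welch_t(32.6842, 16.63014, 19, 19.6667, 10.07220, 24)
```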

Qualitative results

• Close examination of the transcribed data showed the following:

• (a) Complementizers in the clauses were generally not produced (e.g., “that” in multi-word sequences such as “make sure that” and “I understand that”);

• (b) Participants reconstructed multi-word sequences in a creative way (e.g., “in a variety of” was produced as “in varieties of,” “have varieties of,” and “have various (colors)”);

• (c) There were many cases where semantically similar sequences were produced (e.g., “from the point of view” was replaced by phrases such as “as to,” “for,” and “in terms of”);

• (d) There was L1 interference in reproduction (e.g., three participants used “day and night” rather than “night and day”).

• It is assumed that the participants may have retrieved more frequent or salient MUS within the same lexical framework (morpho-syntax interface).

Discussion

The primary purpose of the present study was to examine the impact of textual and individual factors on L2 English learners’ processing of corpus-derived MUS.

Discussion: WM and proficiency

• The finding that general proficiency played a role in processing MUS was consistent with previous studies (Spottl et al., 2002; Schmitt et al., 2004).

• However, when WM was taken into consideration, the results were mixed. Evidence indicated that different memory tasks functioned differently in the processing of MUS. Specifically, there was no significant relationship between the reverse digit span and the performance scores.

• A significant correlation was found between word span and the performance scores. This finding was consistent with Roberts and Gibson’s (2003) view that STM as measured by simple word span may be a better indicator of individual differences in online processing.

Discussion: WM and proficiency

• The findings were also supported by Myles et al. (1999), who concluded that high-word-span learners can accumulate more chunks than low-span learners. The more chunks a learner has, the more comparisons he/she can carry out to establish cross-chunk analyses. The more frequently chunk-internal analyses are made, the easier it is to process chunks online.

• However, the results need to be treated with caution: this study investigated only a small number of MUS (43 in total).

Discussion: Textual attributes

• Significant differences were only found when MUS were categorized based on the degree of transparency in terms of meaning and function. However, there were no significant differences in terms of processing when MUS were categorized based on frequency or length.

• A plausible interpretation is that the results were related to contextual information: the sentences in which the target sequences were embedded might have mitigated differences in frequency or length to a certain extent.

Conclusion

• Implications: the relationship between MUS and language proficiency

• Robinson (2002) stressed that “WM is only one of a complex set of cognitive factors that come together to account for learners’ performance.” In this study, two individual factors (proficiency and WM as measured by word span) accounted for 46.7% of the variance in the scores on the EI test. Future studies might include other variables in order to achieve a better understanding of MUS processing.

• So, which variables to choose? Do we need a model?

Next steps

• Pausing as a significant indicator (R. Ellis)

• Using a Chinese EFL learner corpus

• A sample of 50 participants and a NS control group

• Data from stimulated recall for qualitative analysis

• Issue of scoring EI test (Prof. Hansen)

• The issue of using an EI test for FS will be addressed in a follow-up study.

THANK YOU
