Processing Corpus-derived Multi-unit Sequences by L2 English Learners. Fei Fei, Second Language Studies Program, Michigan State University. May 19, Beijing.

Page 1: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

1

Processing Corpus-derived Multi-unit

Sequences by L2 English Learners

Fei Fei Second Language Studies Program

Michigan State University

May 19, Beijing

Page 2: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

2

Purpose of the present study

Formulaic language use has long been one of the research foci in the study of second language acquisition. For L2 learners of intermediate and advanced proficiency, formulaic language is the biggest stumbling block to sounding nativelike (Wray, 2002).

However, most studies on formulaic sequences focus on textual and descriptive aspects. Few studies investigate multi-unit sequence processing among L2 English learners. Even fewer explore individual factors in multi-unit sequence processing.

Page 3: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

3

What is a formulaic sequence?

• Formulaic sequences: stored and retrieved holistically from memory at the time of use.

• Key issues:

1. Compositionality (e.g., Howarth, 1998; Wray, 2002)

2. Representation and production (e.g., Sinclair, 1991; N. Ellis, 1996)

3. Development in L2 (e.g., Wong-Fillmore, 1976)

Page 4: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

4

Why corpus-derived multi-unit sequences?

Word strings or lexical bundles generated on the basis of frequency may not be stored holistically in the mind, and formulaic sequences that are stored as wholes may not be identified by a given corpus analysis.

However, Wray (2002, p. 25) described “frequency as a salient, perhaps even a determining, factor in the identification of formulaic sequences.”

Numerous studies of formulaic sequences are based on corpus frequency (e.g., Sinclair & Renouf, 1988; DeCock, Granger, Leech & McEnery, 1998; Moon, 1998; Hunston & Francis, 2000).

The more often a word string is needed, the more likely it is to be stored in prefabricated form to save processing effort. Once stored, it is more likely to be the preferred choice at the time of use.

Page 5: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

5

What is a corpus-derived MUS?

In short, corpus-derived multi-unit sequences (MUS)

• are based on corpus frequency;
• may not be psycholinguistically valid;
• are either fully fixed in form or semi-preconstructed phrases;
• are a subset of formulaic sequences.

Page 6: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

6

Target multi-unit sequences

• Schmitt et al.’s (2004) sources:
    Longman Grammar of Spoken and Written English
    Lexical Phrases and Language Teaching
    Hyland’s list
    BNC (British National Corpus)
    CANCODE (Cambridge and Nottingham Corpus of Discourse in English)
    MICASE (Michigan Corpus of Academic Spoken English)

• Biber (2004): the T2K-SWAL Corpus (TOEFL 2000 Spoken and Written Academic Language Corpus)

• ANC: American National Corpus http://americannationalcorpus.org/frequency.html

Page 7: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

7

Schmitt et al.’s (2004) study on processing MUS: textual attributes

• Frequency

• Length

• Transparency in terms of meaning and function

Page 8: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

8

Individual variables in processing multi-unit sequences: proficiency

• Research (Hinger & Spottl, 2002; Spottl & McCarthy, 2003, 2004; Schmitt et al., 2004) indicated that vocabulary size and language proficiency are two factors to consider when investigating cross-linguistic lexical operations.

• Spottl & McCarthy (2004), in their cross-linguistic study of formulaic sequences, argued that without a certain level of general language proficiency, noticing did not even take place, and word strings were completely ignored or simply avoided by learners. L2 language proficiency was defined as scores on a proficiency test (upper intermediate and advanced level).

• In Schmitt et al.’s (2004) study, the highest-proficiency non-native speakers demonstrated mostly native-like performance.

Page 9: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

9

Individual variables in processing MUS: working memory

• Working memory can be divided into two main components: one is phonological short-term memory (STM), and the other is storage and processing capacity, referred to as the Central Executive (CE).

• Previous studies showed that WM can affect:

1. L2 syntactic processing and development (e.g., Ellis & Sinclair, 1996; Ellis, 2001; Juffs, 2004);

2. L2 lexical processing and development (e.g., French, 2003; Papagno & Vallar, 1995);

3. L2 proficiency and aptitude (e.g., Kroll, Michael, Tokowicz, & Dufour, 2002; Payne & Whitney, 2002; Service & Kohonen, 1995).

Page 10: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

10

Individual variables in processing MUS: working memory

• Myles et al. (1999) found that STM capacity can predict the ability to chunk. “Chunking”, in their study, was defined as the ability to remember set phrases in L2 and later use them appropriately.

• Roberts and Gibson (2001) found high correlations between sentence memory and complex span, and between sentence memory and N-back span. It was argued that memory for sentences was not simply a result of linguistic experience; rather, it was likely that an independent working memory component contributes to participants’ performance on sentence memory.

Page 11: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

11

In sum,

The present study seeks to test the role of proficiency and WM in L2 English learners’ processing of high-frequency multi-unit sequences. Influences of textual attributes of MUS are also addressed. The study may contribute to explaining the variance in L2 English learners’ formulaic language use.

Page 12: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

12

Research questions

• What is the relationship between proficiency, WM and participants’ processing of MUS?

• Do textual attributes of MUS affect how L2 learners process them?

• What are the linguistic features of learners’ reproduction of MUS?

Page 13: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

13

Participants

• Thirty-two adult L2 English learners participated in the present study.

• They were graduate students recruited from a wide range of disciplines at a large Midwestern university in the United States. The reported TOEFL scores ranged from 570 to 650.

• Participants' ages ranged from 21 to 38, with 10-14 years of formal English learning experience.

• All participants were native speakers of Chinese and had been living in the United States for less than 2 years.

Page 14: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

14

Measuring the variables

• The Elicited Imitation (EI) test is used as a measure of learners’ knowledge of precise grammatical factors (e.g., Hamayan et al., 1977; Gallimore and Tharp, 1981; Munnich et al., 1994), L2 competence (Baddeley et al., 1998; N. Ellis, 2001), and implicit knowledge (Erlam, 2006).

• The utterance elicited is argued to reflect the degree to which a test taker is able to assimilate the stimulus into an internal grammar (Munnich et al., 1994).

• “The basic idea is that if the stretches of language are long enough, it overloads working memory, and the person is forced to reconstruct the content of the dictation via their language resources, rather than repeating the dictation back from rote memory. One of those language resources is the inventory of formulaic sequences stored in memory.” (Schmitt et al., 2004)

Page 15: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

15

Measuring the variables

• The Elicited Imitation (EI) test is available at http://distancelearning.llc.msu.edu/research/chunks/ with an assigned ID and password.

• Two factors in designing an EI test: sentence length (Bley-Vroman and Chaudron, 1994) and time pressure (R. Ellis, 2005)

• There were two tasks in the EI test:
    Task 1 was a passage revised from Schmitt et al.’s (2004) study, which contained 25 target multi-word sequences.
    Task 2 included 18 target multi-word sequences derived from the American National Corpus and the T2K-SWAL Corpus. They were embedded in 18 single sentences.

• Scoring:
    complete reproduction = 2 points
    attempted reproduction with missing lexis = 1 point
    missing reproduction = 0 points
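The 2/1/0 scoring scheme above can be sketched in code. This is only an illustration, not the scoring procedure used in the study: the slides do not say how "attempted reproduction with missing lexis" was operationalized, so the word-overlap threshold `min_overlap`, the tokenizer, and the function names are all hypothetical assumptions.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_reproduction(target: str, response: str, min_overlap: float = 0.5) -> int:
    """Score one target sequence against a transcribed response.

    2 = the target appears word for word (complete reproduction)
    1 = part of the target's lexis appears (attempted reproduction)
    0 = the target is missing
    ASSUMPTION: "attempted" means at least `min_overlap` of the target's
    words occur somewhere in the response; the study may have used
    different criteria.
    """
    t, r = tokenize(target), tokenize(response)
    n = len(t)
    if n == 0:
        return 0
    # Complete reproduction: the target occurs as a contiguous run of words.
    if any(r[i:i + n] == t for i in range(len(r) - n + 1)):
        return 2
    # Attempted reproduction: enough of the target's words are present.
    present = sum(1 for word in t if word in set(r))
    return 1 if present / n >= min_overlap else 0


if __name__ == "__main__":
    target = "in a variety of"
    for response in (
        "The flowers come in a variety of colors",
        "The flowers come in varieties of colors",
        "The flowers have colors",
    ):
        print(score_reproduction(target, response), response)
```

A human rater would still be needed for borderline cases such as semantically similar substitutions, which simple word overlap cannot judge.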

Page 16: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

16

Measuring the variables

• The Working Memory test included

1. a reverse digit span task (15 items)

2. a word span task (15 items)

• Both span tasks were classical WM tasks, adapted and written by two researchers. The length of the WM test items varied from 5 to 8 for both the reverse digit span and the word span tasks.
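A reverse digit span task of the kind described above could be sketched as follows. The item counts and lengths (15 items, 5 to 8 digits) come from the slide; everything else, including the all-or-nothing scoring rule, the way lengths are cycled, and the function names, is a hypothetical assumption rather than the study's actual instrument.

```python
import random


def make_reverse_digit_items(n_items=15, min_len=5, max_len=8, seed=0):
    """Generate digit lists whose lengths cycle through min_len..max_len.

    ASSUMPTION: the real test items were written by the researchers;
    random generation is used here only for illustration.
    """
    rng = random.Random(seed)
    items = []
    for i in range(n_items):
        length = min_len + i % (max_len - min_len + 1)
        items.append([rng.randint(0, 9) for _ in range(length)])
    return items


def score_reverse_item(presented, response):
    """Return 1 if the response is the presented digits in reverse order, else 0.

    ASSUMPTION: all-or-nothing scoring per item.
    """
    return int(list(response) == list(reversed(presented)))


if __name__ == "__main__":
    items = make_reverse_digit_items()
    print(len(items), "items; lengths:", [len(item) for item in items])
    print(score_reverse_item([3, 1, 4, 1, 5], [5, 1, 4, 1, 3]))
```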

Page 17: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

17

Summary of the variables

• Dependent variable: processing of MUS, as indicated by participants’ mean scores on the Elicited Imitation (EI) test

• Independent variables
    Individual factors
    1. Language proficiency (TOEFL scores within 2 years)
    2. Working memory

    Textual attributes
    1. Frequency
    2. Length
    3. Transparency in terms of meaning and function

Page 18: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

18

Quantitative results

Intercorrelations Among Proficiency, WM and Dictation Scores

                         EI test   Proficiency   Reverse digit span   Word span   WM in total
Scores on the EI test    1.000
Proficiency              .586**    1.000
Reverse digit span       .333      .323          1.000
Word span                .616**    .551**        .658**               1.000
WM in total              .484**    .449*         .948**               .864**      1.000

Note. ** Correlation significant at the 0.01 level (2-tailed).

* Correlation significant at the 0.05 level (2-tailed).

Page 19: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

19

Quantitative results

Results of Multiple Regression Analysis

Scores on EI test = -136.647 + 0.942 * word span + 0.192 * proficiency

Predictors in the model   R      R2     R2△    F          B          S.E.     Beta   t        Sig.
(Constant)                                                -136.647   47.697          -2.865   .008
Word span                 .616   .379   .357   17.086**   .942       .377     .420   2.495    .019
Proficiency               .683   .467   .427   11.805**   .192       .091     .355   2.105    .045
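To make the regression equation concrete, the sketch below plugs values into it using the unstandardized coefficients (B) from the table. The input scores in the example are hypothetical: the slides do not report the scale of the word span measure, so the value 50 is an arbitrary placeholder, and the function name is an assumption.

```python
# Unstandardized coefficients (B) as reported in the regression table.
INTERCEPT = -136.647
B_WORD_SPAN = 0.942
B_PROFICIENCY = 0.192


def predict_ei_score(word_span: float, proficiency: float) -> float:
    """Predicted EI score from word span and proficiency (TOEFL) scores."""
    return INTERCEPT + B_WORD_SPAN * word_span + B_PROFICIENCY * proficiency


if __name__ == "__main__":
    # HYPOTHETICAL inputs, chosen only to illustrate the arithmetic.
    base = predict_ei_score(word_span=50, proficiency=600)
    higher_toefl = predict_ei_score(word_span=50, proficiency=610)
    print(f"Predicted EI score: {base:.3f}")
    print(f"Change for +10 TOEFL points: {higher_toefl - base:.3f}")
```

Holding word span constant, each additional TOEFL point raises the predicted EI score by 0.192, and holding proficiency constant, each additional word span point raises it by 0.942.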

Page 20: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

20

Quantitative results

Means, SD and t-tests of Textual Factors: Transparency, Length, Frequency

Independent variable   Group   Mean      SD         t       Sig. (2-tailed)
Transparency           High    32.6842   16.63014   3.004   .006**
                       Low     19.6667   10.07220
Length                 High    27.3750   15.29937   0.979   .334
                       Low     22.9474   13.97805
Frequency              High    27.1200   16.46795   0.891   .378
                       Low     23.0556   11.94828

Note. * p < 0.05 level ** p < 0.01 level
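The t statistics in this table can be approximately recomputed from the reported means and SDs, as a rough check. The slides do not report group sizes, so the sizes used below (19/24, 24/19, 25/18) are assumptions, chosen because each pair sums to the 43 target MUS and matches the decimal fractions of the reported means. Which t-test variant was used is also not stated, so the sketch computes both Welch's (unequal variances) and the pooled (equal variances) statistic.

```python
from math import sqrt


def welch_t(m1, s1, n1, m2, s2, n2):
    """Independent-samples t statistic without assuming equal variances."""
    return (m1 - m2) / sqrt(s1**2 / n1 + s2**2 / n2)


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Independent-samples t statistic assuming equal variances."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))


# (mean, SD, n) for the High and Low groups.
# Means and SDs are from the table; the n values are ASSUMED, not reported.
GROUPS = {
    "Transparency": ((32.6842, 16.63014, 19), (19.6667, 10.07220, 24)),
    "Length": ((27.3750, 15.29937, 24), (22.9474, 13.97805, 19)),
    "Frequency": ((27.1200, 16.46795, 25), (23.0556, 11.94828, 18)),
}

if __name__ == "__main__":
    for name, (high, low) in GROUPS.items():
        print(
            f"{name}: Welch t = {welch_t(*high, *low):.3f}, "
            f"pooled t = {pooled_t(*high, *low):.3f}"
        )
```

Under these assumed group sizes, Welch's statistic for transparency and the pooled statistics for length and frequency come out close to the reported values of 3.004, 0.979, and 0.891. This is consistent with, but does not prove, an item-level analysis over the 43 MUS.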

Page 21: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

21

Qualitative results

• Close examination of the transcribed data showed the following:

• (a) Complementizers in the clauses were generally not produced (e.g., “that” in multi-word sequences such as “make sure that” and “I understand that”).

• (b) Participants reconstructed multi-word sequences in a creative way (e.g., “in a variety of” was produced as “in varieties of,” “have varieties of,” and “have various (colors)”).

• (c) There were many cases where semantically similar sequences were produced (e.g., “from the point of view” was replaced by phrases such as “as to,” “for,” and “in terms of”).

• (d) There was L1 interference in reproduction (e.g., three participants used “day and night” rather than “night and day”).

• It is assumed that the participants may have retrieved more frequent or salient MUS within the same lexical framework (morpho-syntax interface).

Page 22: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

22

Discussion

The primary purpose of the present study is to examine the impact of textual and individual factors on L2 English learners’ processing of corpus-derived MUS.

Page 23: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

23

Discussion: WM and proficiency

• The finding that general proficiency played a role in processing MUS was consistent with previous studies (Spottl et al., 2002; Schmitt et al., 2004).

• However, when WM was taken into consideration, the results were mixed. Evidence indicated that different memory tasks functioned differently in the processing of MUS. Specifically, there was no significant relationship between the reverse digit span and the performance scores.

• A significant correlation was found between word span and the performance scores. This finding was consistent with Roberts and Gibson’s (2003) view that STM, as measured by simple word span, may be a better indicator of individual differences in online processing.

Page 24: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

24

Discussion: WM and proficiency

• The findings were also supported by Myles et al. (1999), who concluded that high-word-span learners can accumulate more chunks than low-span learners. The more chunks a learner has, the more comparisons he/she can carry out to establish cross-chunk analyses. The more frequently chunk-internal analyses have been made, the easier it is to process chunks online.

• However, the results need to be treated with caution. This study investigated only a small number of MUS (43 in total).

Page 25: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

25

Discussion: Textual attributes

• Significant differences were found only when MUS were categorized by degree of transparency in terms of meaning and function. There were no significant differences in processing when MUS were categorized by frequency or length.

• A plausible interpretation is that the results had to do with contextual information: the sentences in which the target sequences were embedded might have mitigated the differences in frequency or length to a certain extent.

Page 26: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

26

Conclusion

• Implications: the relationship between MUS and language proficiency

• Robinson (2002) stressed that “WM is only one of a complex set of cognitive factors that come together to account for learners’ performance.” In this study, two individual factors (proficiency and WM as measured by word span) account for 46.7% of the variance of the scores on the EI test. Future studies might include other variables in order to achieve a better understanding of MUS processing.

• So, which variables to choose? Do we need a model?

Page 27: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

27

Next steps

• Pausing as a significant indicator (R. Ellis)

• Using a Chinese EFL learner corpus

• A sample of 50 participants and a NS control group

• Data from stimulated recall for qualitative analysis

• Issue of scoring EI test (Prof. Hansen)

• The issue of using an EI test for FS will be addressed in a follow-up study.

Page 28: Processing Corpus-derived Multi-unit Sequences by L2 English Learners

28

THANK YOU

谢谢