Collen Tinea Senc i On


  • 8/10/2019 Collen Tinea Senc i On

    1/38

    Language Learning ISSN 0023-8333

    A Corpus-Based Analysis of the Discourse Functions of Ser/Estar + Adjective in Three
    Levels of Spanish as FL Learners

    Joe Collentine

    Northern Arizona University

    Yuly Asencion-Delaney

    Northern Arizona University

    Research on the acquisition of Spanish's two copulas, ser and estar, provides an
    understanding of the interaction among syntax, semantics, pragmatics, morphology, and
    vocabulary during development (e.g., Geeslin, 2003a, 2003b; Gunterman, 1992; Ryan
    & Lafford, 1992). Recent research suggests that linguistic features in the surrounding
    discourse influence learners' copula choice. We present a corpus-based analysis
    of the lexico-grammatical features co-occurring with copula + adjective usage among
    foreign-language learners of Spanish at three levels of instruction. Findings revealed
    the following: (a) both ser + adjective and estar + adjective occur at all levels where
    little linguistic complexity typically occurs; (b) ser + adjective appears in descriptive
    and evaluative discourse; and (c) estar + adjective is present in narrations, descriptions,
    and hypothetical discourse.

    Keywords: second language acquisition; Spanish interlanguage; learner corpus; corpus
    linguistics; grammatical development; ser and estar; copula choice

    Introduction

    Studying the acquisition of Spanish copulas, ser and estar, interests second
    language acquisition (SLA) researchers because it requires studying syntax,
    semantics, pragmatics, morphology, and vocabulary during development

    We wish to thank Dr. Roy St. Laurent of the Northern Arizona University Statistical Consulting
    Lab for his valuable assistance in the design of the statistical analyses of this project. Any errors
    reside solely with us. Our thanks also go to Dr. Vincent and Dr. Ojeda for their financial support to
    transcribe the texts written by the learners.

    Correspondence concerning this article should be addressed to Joe Collentine, Northern Arizona
    University, Modern Languages, Box 6004, Flagstaff, AZ 86011. Internet: Joseph Collentine@


    Collentine and Asencion-Delaney    Corpus-Based Analysis of Ser/Estar + Adjective

    (Leonetti, 1994). Although this might seem particular to Spanish as a second
    language (L2), the acquisition of these verbs shows how learners acquire one
    of the two basic Indo-European sentence types (Halliday, 1970): predicative
    (e.g., Juan corre rápidamente 'John runs quickly') and attributive sentences
    (e.g., Juan es rápido 'John is quick'), with the ser/estar (S/E) distinction
    forming the central verbal element of the latter. Pragmatically speaking, the
    S/E distinction requires knowing when the relationship between the subject
    and adjective involves characterization (María es capaz 'Mary is capable') or
    identification (María es la encargada 'Mary is the one in charge'; Fernandez
    Leborans, 1999). Semantically, S/E can differ aspectually, with estar often
    connoting the perfective aspect (e.g., that an event's time frame is short and limited
    in duration) and ser connoting the imperfective (e.g., the event is habitual)
    (Lujan, 1981). Morphologically, Spanish adjectives inflect for person and number,
    which is especially difficult for learners whose first language (L1) has few
    inflections, like English. Finally, the number of adjectives that learners must
    associate with either ser or estar presents lexical challenges. Geeslin (2003a) and
    Silva-Corvalan (1986, 1994) reminded us that even native speakers of Spanish
    show much variation in S/E usage with adjectives as a function of pragmatic
    considerations.

    Traditionally (and in current learner textbooks), ser + adjective segments
    describe a subject's permanent, seemingly unchanging characteristics. However,
    estar + adjective segments describe temporary, dynamic characteristics
    of a subject. It is for this reason that an adjective like aburrido 'boring/bored'
    in soy aburrido, which uses ser, produces the meaning 'I am boring', whereas
    in estoy aburrido, which uses estar, it yields roughly 'I am bored'; with ser, the
    boredom is constant, whereas with estar, the state (and its effect on others)
    should pass. Nonetheless, this traditional view has come under much empirical
    scrutiny, with the works of Geeslin (2003a) and Silva-Corvalan (1986, 1994)
    showing that this explanation only scratches the surface of the pragmatic nuances
    that native speakers consider when choosing their copula.

    Studying the acquisition of S/E provides a means to address various SLA
    questions (e.g., orders of acquisition, the role of study abroad), and researchers
    have used various methodologies (e.g., error analysis of open-ended conversations,
    raters judging the semantic intent of learner utterances). Recent S/E
    research suggests that learner copula selection is sensitive to lexical and grammatical
    features (often referred to together as lexico-grammatical features) in
    the surrounding discourse. Corpus-linguistics methods are particularly suited
    to study the interaction between a construct and its lexical and grammatical con-


    analysis, we present a large-scale corpus-based analysis of learners' use of
    S/E + adjective at different instructional levels.

    Ser/Estar SLA Research to Date

    Initial S/E research identified developmental stages in instructed contexts,
    focusing on accuracy and omission rates. Estar emerges in later stages, especially
    in estar + adjective segments. VanPatten (1985, 1987) studied oral interviews,
    grammaticality judgments, and informal class observations to propose five
    stages: (a) copula absence, (b) ser as the default copula, (c) estar with progressive,
    (d) estar with locatives, and (e) estar with adjectives of condition. Simplification,
    communicative value, frequency in input, and L1 transfer influence
    these stages (VanPatten, 1987). Researchers have studied whether VanPatten's
    stages generalize to study-abroad and Peace Corps experiences (Gunterman,
    1992; Ryan & Lafford, 1992). Oral proficiency interviews in both Gunterman's
    study and Ryan and Lafford's study confirmed most stages, with estar +
    adjectives of condition appearing before estar with locatives.

    Although accuracy studies reveal that these two copulas develop in a predictable
    fashion, they do not explain the variability in S/E usage. Additionally,
    these studies appeared when SLA research was highly concerned with the role
    of input in acquisition, and explanations focused on issues such as the copulas'
    individual frequency and communicative value/saliency (Ryan & Lafford,
    1992; VanPatten, 1985, 1987) in the input. Ryan and Lafford (1992) attributed
    the late emergence of estar + adjective to access to naturalistic input. Nonetheless,
    we know almost nothing about the input (e.g., the types of discourse)
    that learners process in naturalistic settings or over the course of a semester
    in at-home or study-abroad settings (Collentine, 2008). SLA theory posits that
    output (be it from instructional interventions or naturalistic experiences) plays
    as strong a role as input at later stages of acquisition (Shehadeh, 2002; Swain,
    1985), which is when estar + adjective emerges. What type of communication,
    then, do learners generate that coincides with estar + adjective emergence?

    Some evidence suggests that copula + adjective production improves as
    learners grow in the complexity of the discourse types they generate. Copula +
    adjective segments help beginning learners to relate simple messages, containing
    a subject and a verb without elaboration (e.g., accompanied by adverbs).
    Gunterman (1992) noted that when communication became difficult, learners
    resorted to ser + adjective segments. Because the questions typically elicited
    descriptions, explanations, and definitions, the [Peace Corps volunteers] were


    p. 1297). Descriptive discourse is structurally and semantically basic, depicting
    a situation's important nouns and their states (e.g., via adjectives); descriptions
    lack dynamic details about events and changes of states. Estar + adjective
    appears in Gunterman's data, where learners went beyond descriptions to communicate
    narrative discourse, which entails both a situation's states and its
    events (often chronologically). Lafford (2004) attributed copula + adjective
    gains after a single semester of study abroad to the pragmatic constraints inherent
    in real-world discourse . . . and perhaps to improved overall narrative and
    discursive abilities, proficiency, and fluency (p. 216; emphasis added). Subsequent
    S/E research intimated that copula + adjective growth occurs as lexical
    and grammatical choices become sensitive to what appears in the surrounding
    discourse.

    In the copula + adjective segment, natives demonstrate variation in copula
    selection because each copula affects different pragmatic and discursive
    interpretations (Geeslin, 2002; Silva-Corvalan, 1986), and so the copula +
    adjective context is ideal for studying how learners encode pragmatic and
    discursive information. Geeslin (2002, 2003a, 2003b) focused on different
    instructional levels while considering findings from sociolinguistic studies of
    copula + adjective language change in bilingual and monolingual communities
    (e.g., Silva-Corvalan, 1986) in which semantic, pragmatic, and sociolinguistic
    variables such as frame of reference (i.e., comparison with group norm, Juan
    es alto 'John is tall', or with the referent, Juan está alto 'John's gotten
    tall'), susceptibility of change (i.e., inherent, Juan es inteligente 'John is
    intelligent', vs. changing, Juan está viejo 'John's gotten old'), lexical class
    of the adjective (e.g., age, nationality), and semantic transparency (El mango es
    verde/El mango está verde 'The mango is green/The mango is unripe' vs. Juan
    es casado/Juan está casado 'John is married/John is just married') explained
    the overuse of estar. Geeslin (2002) collected data from high school students
    with a guided interview, a picture-description task, and a contextualized
    questionnaire, concluding that learners acquire the restriction of susceptibility of
    change earlier than the frame of reference restriction. Geeslin (2003a, 2003b)
    later examined copula choice with advanced learners using contextualized
    questionnaires, finding that semantic and pragmatic features interact to predict
    estar usage. She found that whereas advanced learners seem to overgeneralize
    pragmatic constraints such as frame of reference and experience with the
    referent, native speakers favor lexical and semantic constraints (e.g., predicate
    type) to decide when to use ser or estar.

    Recently Geeslin (2003b) and Geeslin and Guijarro-Fuentes (2006) sug-


    In the case of copula choice, advanced learners apply pragmatic
    constraints, even in contexts in which native speakers do not. In contrast,
    native speakers choose not to apply pragmatic constraints in favor of
    lexical and semantic constraints. (Geeslin, 2003a, p. 751)

    In copula selection, L2 Spanish learners may even be more sensitive to contextual
    factors than native speakers, who appear to depend on local factors
    within the attributive copula + adjective segment (i.e., lexical and semantic
    constraints related to the interaction of the copula and the adjective alone);
    learners are sensitive to a wider context, apparently attending to speaker intent
    and implicatures (as pragmatic considerations would imply) as well as
    to lexico-grammatical features in the surrounding discourse. Geeslin (2003a,
    p. 748) noted that words/phrases that imply change near a copula + adjective
    segment apparently cause advanced learners to select estar (the copula associated
    with changing states) even when the relationship between the adjective
    and the copula necessitates the use of ser (the copula associated with permanent
    states). Thus, to better understand the factors surrounding learners' S/E
    usage, we might well ask the following:

    What are the contextual features that co-occur with each copula + adjective
    segment at different levels of instruction?

    What types of discourse (e.g., narratives, descriptions) are usually associated
    with each segment?

    Corpus Techniques and the Study of Context

    Geeslin's (2002, 2003a, 2003b) research shows by way of rater judgments
    that the pragmatic intent of copula + adjective segments influences whether
    learners use ser or estar. It is also reasonable to suspect that discourse type
    influences copula selection in important ways. Recall that Gunterman (1992)
    argued that ser + adjective and estar + adjective segments are distributed within
    different discourse types. Additionally, Lafford (2004) related learners' copula
    selection gains to the expansion of the types of discourse they can produce.
    Myles and Mitchell (2004) argued that SLA researchers should take note that
    corpus research examining large collections of digitized documents has had a
    considerable role in furthering the field of discourse analysis. Accordingly, the
    present study employs a variety of corpus-based techniques to understand the
    contextual features that co-occur with ser + adjective and estar + adjective use
    in addition to the discursive functions that learners at different levels assign to


    the following section we not only briefly delimit what corpus-based research
    can reveal about SLA, but we also describe important corpus assumptions
    and techniques. Because our analysis compares learner data to corpus-based
    native-speaker models, we also describe relevant perspectives that recent corpus
    research has uncovered about the nature of Spanish discourse.

    Not only does a corpus-based approach lend itself to questions of L2 discourse
    development, but the techniques also permit empirical comparisons
    between learner behaviors and native-speaker models. For instance, using an
    English learner corpus and two British native-speaker corpora, Siyanova and
    Schmitt (2007) found that, in informal speech, learners are less likely to use
    two-word verb constructs (e.g., run into, put off) than are native English speakers.
    One advantage of comparing learner performance to native-speaker models
    is that the SLA researcher can make empirically defensible and testable assumptions
    about the end state of the acquisition process, an approach we adopt in
    the present study.

    Myles (2005) and Myles and Mitchell (2004) lamented that SLA research
    has not been quick to embrace new technologies for collecting and analyzing
    data, especially as it relates to corpus linguistics. They argued that corpus
    linguistics complements the current research by examining large amounts of data
    with relative ease, thus increasing the generalizability of findings (Rutherford
    & Thomas, 2001). Still, some notable corpus-based SLA research has contributed
    to our understanding of the effect of context on language development (Belz,
    2004; Collentine, 2004; Granger, Hung, & Petch-Tyson, 2002; Klein & Purdue,
    1997). Some corpus research exists on ser and estar.

    Corpus-Based S/E Findings

    Corpus-based S/E research provides some evidence that learners' copula choice
    is sensitive to contextual factors and that there is reason to suspect that Spanish
    copula + adjective segments are distributed to different discourse types. Cheng,
    Lu, and Giannakouros (2008) examined a corpus of Mandarin Chinese L1
    learners of Spanish. They show how advanced learners' copula choice varies
    according to the pragmatic intent of the surrounding discourse they themselves
    produce. They reported that exploratory writing evoked greater estar + adjective
    usage and that estar + adjective is compatible with the semantic and
    pragmatic goals of narratives or descriptions. Collentine (2008), in an invited
    commentary article on Cheng et al. (2008), conducted a study on whether copula +
    adjective segments might serve discernable discourse functions in native
    Spanish discourse. His analysis uncovered a significant interaction between


    of discourse, whereas estar + adjective was most frequent in dramas, which
    entail much evaluative language and monologues containing descriptions, and
    narratives. These two studies suggest that copula + adjective use by learners
    and native speakers is not influenced by local features alone (which range from
    within the copula + adjective phrase structure to the lexico-grammatical
    characteristics of the discourse) but also by communicative goals such as the type
    of discourse being produced.

    Techniques, Tools, and Utility of Corpus-Based Research

    Corpus linguistics ranges in complexity. Minimally, it utilizes searchable
    digitized texts sampled in a representative fashion, depending on the study's focus.
    Textual information is critical for statistical procedures (just as it is for
    individual learners), and so files are tagged with header information, such as topic,
    source type, biographical information about the author, and purpose (argumentative
    essay, narrative). Concordance applications and scripting languages
    allow researchers to search for specific segments and tabulate their frequencies
    by text. When investigators need to search for morphosyntactic information
    (e.g., all adjectives, all verbs whose infinitive is either ser or estar), they often
    use a part-of-speech tagger: a series of software modules that annotates every
    word with information about its major word classes (e.g., adjective, noun,
    verb, determiner, preposition), basic morphological information (e.g., plural,
    preterit), as well as its lemma (i.e., its unmarked, dictionary root, such as a
    verb's infinitive or a noun's masculine, singular form).

    Part-of-speech tagging requires a dictionary with lexical and grammatical
    information about the possible words in a language (some words have more
    than one entry because languages have many synonyms). For the present project
    we compiled our own dictionary and we utilized a training set (which assists
    tagging ambiguous forms) from samples from the Corpus del español (Biber,
    Davies, Jones, & Tracey-Ventura, 2006) as well as software routines from the
    Natural Language Tool Kit (NLTK; http://www.nltk.org/). After the corpus is
    tagged in this way, the investigator must verify the accuracy of the tagging
    and fix errors (individual and/or systematic) through further programming.
    An increasingly popular technology to create search patterns (regardless of
    the tagging software) utilizes regular expressions, a sophisticated wild-card-
    and variable-based text-search system (e.g., \w{3,} symbolizes words of three
    letters or more; \w+ing symbolizes words of any length ending in ing).
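The two patterns mentioned above can be tried directly in Python's standard re module. The following is only a minimal sketch: the sample sentence, the token/TAG/lemma annotation format, and all variable names are hypothetical and are not taken from the study's corpus or tagger output. Word boundaries (\b) are added here so that each pattern matches whole words.

```python
import re

# Hypothetical English sample; not drawn from the learner corpus.
sample = "We are studying how learners go on writing in class"

# \w{3,} : three or more word characters (anchored at word boundaries here).
three_plus = re.findall(r"\b\w{3,}\b", sample)

# \w+ing : one or more word characters followed by the letters "ing".
ing_words = re.findall(r"\b\w+ing\b", sample)

# Hypothetical tagged format, token/TAG/lemma, invented for this sketch.
tagged = ("Es/VERB/ser obvio/ADJ/obvio que/CONJ/que viene/VERB/venir ./PUNCT/. "
          "Está/VERB/estar cansado/ADJ/cansado ./PUNCT/.")

# A form of ser or estar, then obvio or evidente (matched by lemma), then que.
copula_adj_que = re.compile(
    r"\S+/VERB/(?:ser|estar) \S+/ADJ/(?:obvio|evidente) que/CONJ/que"
)
hits = copula_adj_que.findall(tagged)

print(three_plus)
print(ing_words)
print(hits)
```

Matching on the lemma field rather than the surface form means that inflected forms such as es, son, or está would all be caught by the same pattern, which is presumably why a tagged corpus is helpful for this kind of search.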

    Having a tagged corpus along with the flexibility of regular expressions

    provided us with a powerful means of studying a number of lexical and/or


    (?:ser|estar)` (?:obvio|evidente) \j \w+que \ \w+ is one way to search for
    every verb whose lemma is either ser or estar followed by the adjectives obvio
    or evidente followed by the conjunction que.

    It is important to make mention of two common corpus-statistical techniques.
    The process of norming is a numerical transformation of counts to
    account for the fact that individual texts vary in length and that longer texts
    can have a greater influence on the numerical distribution of any phenomenon.
    Investigators often norm frequency counts to an arbitrary number, such as per
    1,000 or per 10,000 words: The count of some phenomenon in a text is divided
    by the text's total word count, the quotient of which is multiplied by
    1,000 (a higher norming multiplier like 100,000 affords greater precision). The
    technique known as normalizing involves converting the count of some phenomenon
    to its z-score value vis-à-vis its count in each document in the corpus
    (i.e., the difference of the phenomenon's frequency and its mean occurrence
    in the corpus divided by its standard deviation). Normalizing is convenient
    for measuring the relative presence of two or more linguistic features within
    any given text, as one can easily sum two or more z-scores to calculate how
    concentrated those features are in any texts or group of documents while taking
    into account the fact that some linguistic phenomena are naturally scarce in a
    document (e.g., the subjunctive), whereas others are naturally common (e.g.,
    articles) (cf. Biber & Conrad, 2001). For instance, the frequency of adverbs of
    time and copula + adjective segments are likely to vary in different ways across
    the texts of a corpus (e.g., adverbs of time may be generally more frequent).
    By summing the two segments' z-scores per document, we can find which texts
    have the highest concentration of the two.
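Norming and normalizing as described above amount to a few lines of arithmetic. The sketch below is illustrative only: the three-text corpus and its counts are invented, the function names are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, since the text does not say which was used.

```python
from statistics import mean, stdev

def norm_count(count, total_words, per=1000):
    """Normed frequency: occurrences per `per` words of text."""
    return count / total_words * per

def z_scores(values):
    """z-score of each text's frequency relative to the whole corpus."""
    mu = mean(values)
    sd = stdev(values)  # assumption: sample standard deviation
    return [(v - mu) / sd for v in values]

# Invented corpus: (copula + adjective count, time-adverb count, word count).
texts = [(4, 10, 400), (2, 30, 1000), (9, 12, 600)]

copula = [norm_count(c, n) for c, _, n in texts]
adverbs = [norm_count(a, n) for _, a, n in texts]

# Sum the two z-scores per text to see where both features concentrate.
combined = [zc + za for zc, za in zip(z_scores(copula), z_scores(adverbs))]
densest = max(range(len(texts)), key=lambda i: combined[i])

print(copula, adverbs)
print(combined, densest)
```

Because each feature is expressed relative to its own mean and spread, a naturally frequent feature (here, time adverbs) does not swamp a naturally scarce one when the two scores are added.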

    Corpus-Based Native-Speaker Models of Discourse

    According to Myles and Mitchell (2004), we now have the ability to define
    structurally and statistically different discourse types. Thus, the present study
    not only compares learners' copula selection behaviors between different levels
    of instruction, but it also attempts to identify the types of discourse learners
    produce when using S/E + adjective, based on a native-speaker model. Corpus
    linguistics has shown through factor analyses how lexico-grammatical structures
    bundle together to produce different types of discourse (Biber & Conrad,
    2001). Biber et al. (2006) provided the first comprehensive analysis of Spanish,
    analyzing a 20 million-word Spanish corpus with written and oral data from a
    variety of registers. There are four types of discourse that Biber et al. (2006)
    identified that learners might well produce in written texts the features with


    Table 1 Discourse dimensions and features targeted in the learner-native speaker
    comparison (cf. Biber et al., 2006)

    Discourse type          Lexico-grammatical features

    Informationally rich    Singular and plural nouns
                            Postnominal descriptive adjectives
                            Prenominal descriptive adjectives
                            Definite articles
                            Prepositions
                            Derived nouns
                            Type-token ratio
                            Long words^a
                            Se passives (i.e., ergative se use)

    Hypothetical            Subjunctive use
                            Conditional use
                            Future use
                            Verbs of obligation and causation (e.g., dejar, permitir,
                              hacer + infinitive)
                            Infinitives not preceded by a verb or article
                            Verbs followed by an infinitive
                            Progressive aspect (imperfect use or present participle)
                            Dependent que clauses

    Narrative               Clitic usage
                            Imperfect tense/aspect
                            Preterit tense/aspect
                            Possessives
                            Third-person pronouns
                            Reflexive se and changes of states
                            Infinitives not preceded by a verb or article
                            Verbs followed by an infinitive

    Descriptive             Postnominal descriptive adjectives
                            Derived nouns
                            Absence of all narrative variables

    ^a Defined as those that have an average number of characters in the dataset, plus that
    calculation's standard deviation, plus one character; thus, six or more characters.
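The long-word criterion in the table note (mean word length, plus its standard deviation, plus one character) can be illustrated as follows. This is a sketch under stated assumptions: the five-word list is invented, the function name is hypothetical, and the population standard deviation is assumed because the note does not specify which variant was used. The six-character cutoff reported in the note comes from the study's own dataset and is not reproduced by this toy list.

```python
from statistics import mean, pstdev

def long_word_threshold(words):
    """Mean length + standard deviation of length + one character."""
    lengths = [len(w) for w in words]
    return mean(lengths) + pstdev(lengths) + 1  # assumption: population SD

# Invented word list, for illustration only.
words = ["el", "mango", "es", "verde", "inteligente"]

threshold = long_word_threshold(words)
long_words = [w for w in words if len(w) >= threshold]

print(threshold, long_words)
```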

    Informationally rich discourse is one that conveys large amounts of information
    densely. Derived nouns, adjectives, multisyllabic words, and passives
    convey information in a decidedly encyclopedic fashion. Another important


    & Conrad, 2001), perhaps because Spanish has a neatly defined mood system
    (with readily discernable inflections), is hypothetical discourse, which
    communicates possibilities and counterfactual information. It is characterized by
    features such as verbs in the subjunctive and the conditional. The other two
    discourse types identified by Biber et al. (2006) are well known to most (viz.,
    narratives and descriptions).

    Research Questions

    The present study adds to our understanding of how contextual variables
    interact with learners' use of attributive sentences during acquisition. Although
    the field has a good idea of the communicative factors that motivate copula choice, we
    do not know how each copula + adjective segment works with other lexical and
    grammatical structures to communicate coherent discourse. To address this gap
    in the literature and to understand the discursive function that ser + adjective
    and estar + adjective segments serve over time, we provide a corpus-based
    analysis of the lexico-grammatical features that predict the use of these two
    segments with foreign-language (i.e., at-home) learners in the first, second, and
    third years of the university level. More specifically, we address the following
    research questions:

    1. What are the lexico-grammatical features that co-occur with ser + adjective
       usage? What are the discursive functions that these co-occurring features
       serve?

    2. What are the lexico-grammatical features that co-occur with estar + adjective
       usage? What are the discursive functions that these co-occurring
       features serve?

    To address these questions, we present the results of a series of regression
    analyses predicting the occurrence of each copula + adjective segment from
    a variety of lexico-grammatical features (see the Corpus Description section).

    We predict that ser + adjective and estar + adjective segments will have
    distinct lexico-grammatical associations that change over time. Specifically,
    we posit that ser + adjective segments appear in simple discourses (e.g., highly
    descriptive and listlike discourse) and estar + adjective segments become
    increasingly associated with discursive complexity. However, we posit that the
    association of estar + adjective with a particular discourse type will be more
    difficult to identify because previous research suggests that even advanced
    learners are more sensitive to contextual (i.e., pragmatic) constraints than are


    Method

    Corpus Description

    This study used a 432,511-word learner corpus of written Spanish, comprising
    edited and nonedited compositions collected from English-speaking Spanish
    learners at three levels of instruction: first year (230,270 words), second year
    (109,224 words), and third year (93,017 words). The compositions were not
    specific tasks designed to collect the data for this study but rather writing
    samples used for assessment purposes. Students wrote letters, narratives,
    descriptions, summaries, and argumentative essays both in and out of class as well
    as on exams. Topics related to the textbook themes (e.g., family, childhood)
    and the cultural readings assigned in class. Each text was tagged for numerous
    lexical and grammatical features (see above).

    To determine what lexico-grammatical features co-occur with ser + adjective
    and estar + adjective usage, we considered a total of 75 potential predictor
    variables, each operationalized in the form of a regular expression. In corpus
    studies, variables refer to the linguistic features in the texts being analyzed. This
    study's predictor variables included various lexical features, such as adjectives
    other than the ones in the copula + adjective frame (e.g., derived adjectives,
    adjective in postnominal position), nouns (e.g., derived nouns, feminine nouns,
    masculine nouns), adverbs (e.g., adverbs of place, adverbs of time), and verb
    classes (e.g., verb in imperfect aspect, verb in past participle), as well as
    morphosyntactic features such as dependent clauses, noun phrase configurations
    (e.g., article plus noun), pronoun usage (e.g., clitic, third person), as well as
    a variety of verb phrases (e.g., verbs of communication, verbs of knowledge).
    The set of variables considered involved all parts of speech, common
    morphosyntactic constructs studied by learners, as well as additional constructs
    studied in Biber et al. (2006).

    Data Analyses

    Learner Models Analysis

    To identify the types of lexico-grammatical features that learners use with
    ser + adjective and estar + adjective segments and to identify which variables
    distinguish among the three levels of learners, we constructed regression
    models of lexical-grammatical regressors predicting copula + adjective usage:
    a ser + adjective learner model and an estar + adjective learner model. We
    constructed regression models for each copula + adjective segment (rather
    than, for instance, a single regression model for which the choice between the
    two is the dependent variable) because the previous research suggests that the


    motivating estar + adjective usage (cf. Guntermann, 1992). The process involves
    screening a set of potential predictor variables for standard assumptions
    of linear regression, submitting the reduced set to a best-subsets analysis
    (rather than a stepwise procedure) to identify the so-called best subset, and,
    finally, comparing the predictor variables' ability to distinguish among the three
    levels of learners in terms of copula + adjective usage.
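The best-subsets step named above can be sketched as an exhaustive search over combinations of predictors, each fitted by ordinary least squares. The following is a minimal illustration and not the procedure actually run for the study: the data are invented, all function and variable names are hypothetical, and adjusted R-squared is assumed as the selection criterion because the text does not name one.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def adjusted_r2(columns, y):
    """Fit OLS with an intercept on the given columns; return adjusted R^2."""
    n, k = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = k + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y[i] for i, r in enumerate(rows)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - (ss_res / (n - k - 1)) / (ss_tot / (n - 1))

def best_subset(predictors, y, max_size):
    """Score every subset of up to max_size predictors; keep the best one."""
    best = (float("-inf"), ())
    for size in range(1, max_size + 1):
        for names in combinations(sorted(predictors), size):
            score = adjusted_r2([predictors[name] for name in names], y)
            best = max(best, (score, names))
    return best

# Invented per-text frequencies; the criterion depends on two of the three.
predictors = {
    "imperfect": [1, 2, 3, 4, 5, 6, 7, 8],
    "adverbs_time": [2, 1, 4, 3, 6, 5, 8, 7],
    "derived_nouns": [5, 3, 6, 2, 7, 1, 4, 8],
}
criterion = [4.1, 4.9, 10.05, 10.95, 16.1, 16.9, 22.05, 22.95]

score, names = best_subset(predictors, criterion, max_size=3)
print(score, names)
```

Unlike a stepwise procedure, which adds or drops one variable at a time, this search compares whole combinations of variables against each other. Its cost grows quickly with the size of the candidate pool, which is one reason a prior screening step is useful.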

    We employed a standard data-screening process to identify which of the potential predictor
    variables had honest correlations with the criterion variables. We discarded the following:
    (a) variables that had no correlation with a criterion variable (as judged from correlation
    coefficients and scatter plots between a potential predictor variable and the criterion);
    (b) variables that represented inflated correlations (i.e., where two features correlated
    highly with each other and overlapped too much in semantic or structural properties), so as
    to avoid collinearity problems in the final model selection phase;¹ and (c) variables that
    constituted deflated correlations, that is, predictor variables that had a highly reduced
    range of responses to the criterion variable (e.g., variables whose frequency was very
    small, such as n = 2, regardless of the level of the participant or the genre). This
    screening yielded a list of 58 potential linguistic variables (37 for ser + adjective and 21
    for estar + adjective) that could be meaningful for the regression analyses. Table 2 shows
    the preliminary list of variables.
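    The three screening checks can be illustrated with a minimal sketch. It is written in
    Python rather than R and is not the procedure the authors ran: the study judged
    correlations from coefficients and scatter plots, whereas the cutoffs below (min_r,
    max_overlap, min_count), the function names, and the toy frequencies are all assumptions
    made purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen(predictors, criterion, min_r=0.05, max_overlap=0.90, min_count=3):
    """Return the names of predictors that survive the three (assumed) screening checks."""
    kept = []
    # Visit predictors from strongest to weakest correlation with the criterion, so that
    # the stronger member of a highly overlapping pair is the one that is retained.
    ranked = sorted(predictors, key=lambda name: -abs(pearson(predictors[name], criterion)))
    for name in ranked:
        values = predictors[name]
        if sum(values) < min_count:                  # (c) feature is too rare overall
            continue
        if abs(pearson(values, criterion)) < min_r:  # (a) no correlation with the criterion
            continue
        if any(abs(pearson(values, predictors[k])) > max_overlap for k in kept):
            continue                                 # (b) overlaps with a kept predictor
        kept.append(name)
    return kept


# Hypothetical per-document frequencies (not data from the study).
criterion = [2, 4, 6, 8, 10, 12]
predictors = {
    "adjective_plural": [1, 2, 3, 4, 5, 7],
    "adjective_plural_copy": [2, 4, 6, 8, 10, 14],  # near-duplicate of the variable above
    "rare_feature": [0, 0, 1, 0, 0, 1],             # occurs only twice in the corpus
    "unrelated": [1, 2, 3, 3, 2, 1],                # uncorrelated with the criterion
}
kept = screen(predictors, criterion)
print(kept)
```

    In this toy run only one member of the duplicated pair survives, while the rare and the
    uncorrelated variables are dropped. In practice the cutoffs would be a matter of
    judgment, as they were in the study.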

    We used best-subsets analyses to derive the two regression models for ser + adjective and
    estar + adjective. Social scientists frequently employ stepwise procedures for building
    regression models. Although these procedures for variable selection work adequately for
    reducing a small set of potential predictor variables to a smaller, more meaningful set
    (e.g., a subset that does not have a high degree of overlap), statisticians do not favor
    stepwise analyses when the initial pool of predictor variables is extremely large (Miller,
    2002), as in the present case. Following Rencher (2002), we instead employed a best-subsets
    analysis to build the two models predicting ser + adjective and estar + adjective. The
    principal advantage that a best-subsets approach has over statistical/stepwise regression
    (with a large number of predictor variables) is that best-subsets approaches attempt to
    reduce the number of predictor variables by comparing various combinations of variables,
    whereas the stepwise procedure attempts the reduction process by considering each and every
    potential

    predictor variable individually. The best-subsets approach has been shown to produce fewer
    spurious results than stepwise procedures when reducing a large set of potential predictor
    variables. With large pools of potential predictor variables, stepwise procedures


    Table 2 Linguistic variables used in the study after initial data screening

    Variable class Ser+ adjective Estar+ adjective

    Noun noun - derived noun - masculine

    noun - feminine noun - singular

    Adjective adjective - derived adjective - singular

    adjective - feminine adjective - type 1

    adjective - masculine adjective - type 2

    adjective - plural

    adjective - postnominal: una casa grande 'a large house'

    adjective - prenominal: una bella mansión 'a beautiful mansion'

    adjective - singular

    adjective - type 1: descriptive adjective with four inflections (masculine, feminine, singular, and plural), e.g., blanco/a(s) 'white'

    adjective - type 2: descriptive adjective with two inflections (singular and plural), e.g., interesante(s), liberal(es) 'interesting, liberal'

    Pronoun clitic - third person clitic - preverbal

    pronoun - subject

    que subordinator

    pronoun - third person

    que subordinator

    Other noun phrase elements

    article noun segment: el libro 'the book'

    article noun segment

    possessive adjective

    definite article

    possessive adjective

    Verbs SE plus third-singular verb

    verb - Gustar-like

    SE plus 3rd-singular verb

    verb - third person verb - Gustar-like

    verb - communication: decir 'say/tell', anunciar 'announce', explicar 'explain', etc.

    verb - imperfect verb - infinitive

    verb - third person

    verb - knowledge

    verb - past participle

    verb - present participle


    verb - infinitive 2 (not preceded by verb or article)

    verb - knowledge: saber 'know', recordar 'recall', entender 'understand', etc.

    verb - observation: ver 'see', escuchar 'listen', etc.

    verb - past participle

    verb - past subjunctive

    verb - periphrastic future

    verb - present participle

    verb - preterit

    verb - suasive: querer 'want', mandar 'order'

    verb aspect - progressive

    verb - suasive: querer 'want', mandar 'order', etc.

    verb - probability: creer 'believe', negar 'deny', dudar 'doubt', etc.

    Adverbs or adverbial clauses

    adverb - place

    adverb - time

    adverbial clauses - contingency adverbial clauses - time

    adverb - time

    adverbial clauses - contingency adverbial clauses - time

    Total 37 21

    Note. The adjectives in this list did not follow one of the two copulas.

    may never consider combinations of predictor variables that are equally good at predicting
    the occurrence of the response variable (i.e., the dependent variable) in question.² Because
    this analysis is computationally intensive and not available in many commercial software
    packages for the social sciences, we used the statistical package R and its best-subsets
    regression package to perform the analysis (see Dalgaard, 2008).³
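    The contrast between comparing combinations of variables and considering variables one at
    a time can be made concrete with a small sketch of an exhaustive best-subsets search. It
    is written in Python rather than R and should not be read as the authors' analysis: the
    selection criterion (adjusted R-squared), the function names, and the toy data are
    assumptions, since the text does not state which criterion the R package applied.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_r2(columns, y):
    """Fit ordinary least squares with an intercept; return the adjusted R-squared."""
    n, p = len(y), len(columns)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


def best_subset(predictors, y, max_size=None):
    """Score every combination of predictors; return (names, score) of the best one."""
    names = list(predictors)
    max_size = max_size or len(names)
    best = (None, float("-inf"))
    for size in range(1, max_size + 1):
        for subset in combinations(names, size):
            score = adjusted_r2([predictors[nm] for nm in subset], y)
            if score > best[1]:
                best = (subset, score)
    return best


# Hypothetical data: y depends on x1 and x2 jointly; x3 is an unrelated variable.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 6, 2, 7, 1, 4, 8],
}
y = [8.1, 6.8, 18.3, 16.9, 28.2, 26.7, 38.1, 36.9]
names, score = best_subset(predictors, y)
print(names, round(score, 4))
```

    Because every combination is scored, the pair x1 and x2 is evaluated jointly even though
    neither variable alone fits the data as well. The number of subsets doubles with each
    added predictor, so a brute-force loop like this one is practical only for small pools,
    which is consistent with the remark above that the analysis is computationally intensive.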

    We employed what is termed a subgroup regression analysis to determine which of the
    variables in the two models predicting ser + adjective and estar + adjective usage
    distinguished among the three levels (Hardy, 1993). The process employs indicator variables
    (sometimes called dummy variables) to add categorical predictor variables (into the model
    described earlier) called differ-


    predictor variable (i.e., the unique contribution of each level in our study to each
    coefficient calculated for the predictor variables), producing $k - 1$ difference
    (predictor) variable models, where $k$ represents the number of groups.⁴ Because this
    group-level coefficient effect process is derived from two regression models, we adjusted
    the alpha for significant coefficient differences via a Bonferroni adjustment to 0.025
    (i.e., $1 - (1 - .05)^{1/2}$).
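    Two pieces of this procedure lend themselves to a short illustration: the construction of
    k - 1 level-specific columns from a predictor, and the adjusted alpha. The sketch below is
    in Python and rests on assumptions: it takes the difference variables to be
    indicator-by-predictor products with the first-year level as the reference (one common
    reading of dummy-variable regression, not confirmed by the text), and the names and toy
    values are hypothetical. The alpha function implements exactly the formula printed above.

```python
def adjusted_alpha(alpha, n_models):
    """Per-model alpha from the formula in the text: 1 - (1 - alpha)^(1/n_models)."""
    return 1 - (1 - alpha) ** (1 / n_models)


def difference_columns(levels, values, reference):
    """Build k - 1 columns, one per non-reference level (assumed coding scheme).

    Each column holds the predictor value for documents at that level and 0.0 for all
    other documents; the reference level receives no column of its own.
    """
    other_levels = sorted(set(levels) - {reference})
    return {
        level: [v if lv == level else 0.0 for lv, v in zip(levels, values)]
        for level in other_levels
    }


# Hypothetical documents: instructional level and normed subject-pronoun frequency.
levels = [1, 1, 2, 2, 3, 3]
subject_pronouns = [4.0, 6.0, 3.0, 5.0, 1.0, 2.0]

columns = difference_columns(levels, subject_pronouns, reference=1)
print(sorted(columns))                    # [2, 3], i.e., k - 1 = 2 columns for k = 3 levels
print(round(adjusted_alpha(0.05, 2), 4))  # 0.0253
```

    The printed formula is the form usually attributed to Šidák; the plain Bonferroni
    division, .05 / 2, gives exactly 0.025. Both agree with the reported threshold at three
    decimal places.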

    Native-Speaker Model Comparison

    To objectively identify the types of discourse that the lexico-grammatical structures
    (dis)associated with each copula + adjective segment represent (as derived from the
    best-subsets analysis), we compared the two copula + adjective learner models with the
    native-speaker discourse model described in Table 1. Our analysis measured the extent to
    which the learners' discourse possessed indicators of informational richness, hypothetical
    discourse, narrative discourse, and descriptive discourse.

    As described earlier, we calculated the normed frequency of the occurrence of each of these
    variables in the learner corpus to a scale of 10,000 per text. Subsequently, we calculated
    the extent to which documents representing high concentrations of each copula + adjective
    model correlated with high concentrations of each of the four native-speaker discourse
    types in three steps: (1) for each document we calculated z-score totals for both the ser +
    adjective and the estar + adjective models; (2) for each document we calculated a z-score
    total for each of the four discourse types in Table 1; (3) we regressed the four discourse
    type z-score totals against each of the copula model z-score totals, along with
    subregression analyses to assess differences among the three levels. A z-score value for
    any document on a given variable (be it a criterion variable as in step 1 or a regressor as
    in step 2) represents the extent to which that variable is represented in that document
    vis-à-vis all other documents. Summing a set of z-scores produces a value representing the
    extent to which any document had a concentration of that set of variables (see Biber et
    al., 2006, as well as Biber and Conrad, 2001, for in-depth discussions of this technique).
    Thus, summing the z-scores for each document for variables representing, say, narrative
    discourse indicated how narrative each document is. Likewise, z-score totals for the set of
    regressors representing the ser + adjective model and for the set representing the estar +
    adjective model yield, for each document, values indicating how much that document
    represented each model. (Of course, all z-scores here must be weighted according to their
    +/- sign in the model.) The regression and subregression analyses answer the following
    question: When


    are they more or less encyclopedic, hypothetical, narrative, or descriptive in nature?
    Again, because we employed two regression analyses, we adjusted the alpha for significant
    coefficient differences via a Bonferroni adjustment to 0.025 (i.e., $1 - (1 - .05)^{1/2}$).
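    The signed z-score totals described in this section can be sketched as follows. This is a
    minimal Python illustration, not the authors' code: the feature names, the toy
    frequencies, and the use of the sample standard deviation are assumptions.

```python
import statistics


def z_scores(values):
    """Standardize one variable across all documents (sample standard deviation assumed)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def signed_z_totals(features, signs):
    """Sum each document's z-scores over the model's variables, weighted by model sign."""
    n_docs = len(next(iter(features.values())))
    totals = [0.0] * n_docs
    for name, sign in signs.items():
        for i, z in enumerate(z_scores(features[name])):
            totals[i] += sign * z
    return totals


# Hypothetical normed frequencies for three documents.
features = {
    "adjective_singular": [10.0, 20.0, 30.0],
    "verb_knowledge": [3.0, 2.0, 1.0],
}
# +1 for regressors with a positive coefficient in the model, -1 for negative ones.
model_signs = {"adjective_singular": +1, "verb_knowledge": -1}

totals = signed_z_totals(features, model_signs)
print(totals)  # [-2.0, 0.0, 2.0]
```

    The third document scores highest because it has many singular adjectives and few verbs of
    knowledge, which is the profile the signs describe. Documents with high totals are the
    ones that most represent a model in the sense used here.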

    Finally, to identify documents for the qualitative analysis of the discursive nature of
    copula + adjective usage, we chose to concentrate on those documents for each learner level
    that most represented each regression model derived from the best-subsets analysis. This
    simply entailed identifying those documents that had high z-score totals for the ser +
    adjective model and those with high z-score totals for the estar + adjective model, as
    described earlier in step 1.

    Results

    Learner Usage: Ser + Adjective

    The best-subsets analysis identified 21 regressors predicting ser + adjective usage across
    the three levels, with 16 constituting significant regressors (p ≤ .05). This model
    included roughly twice as many predictor variables as the estar + adjective model did.
    Additionally, the ser + adjective model accounted for 41% of the variation in its criterion
    variable, whereas the estar + adjective model accounted for only 5% of the variation in its
    criterion variable (see below). The ser + adjective model accounted significantly for ser +
    adjective usage, F(21, 1576) = 54.9, p = .000.

    Furthermore, the subgroup regression analysis revealed that 5 of these 21 regressors
    significantly distinguished among the three levels of learners: pronoun - subject, adverbs
    of place, verb - Gustar-like, verb - observation, and verb - past subjunctive (see Table
    3). In the following we discuss these 21 regressors by grouping them into six
    lexico-grammatical regressor categories: adjectives, nouns, pronouns, adverbial
    constructions, grammatical verb variables, and lexical verb variables. Within the relevant
    lexico-grammatical regressor categories, we discuss the five variables distinguishing among
    the levels.

    Seven of the regressors represented various features of descriptive adjectives, although
    none distinguished among the three levels of learners. Table 3 indicates that each variable
    contributed significantly to the model. For the most part, adjectives predicted ser +
    adjective usage, with five associating positively (i.e., their coefficient sign was
    positive) and two being disassociated with the construction (i.e., their coefficient sign
    was negative). The positive adjectival regressors reveal that, perhaps not surprisingly, a
    variety of adjectives represent-


    Table 3 Best-subsets regression model for ser + adjective

    Coefficient                   Estimate sign   Estimate   Std. error   t test    p
    (Constant)                                    81.371     9.011        9.030     .000
    adjective - feminine          +               .050       .020         2.470     .010
    adjective - masculine         +               .040       .020         2.180     .030
    adjective - plural            +               .100       .020         5.420     .000
    adjective - postnominal       -               .170       .020         10.010    .000
    adjective - prenominal        -               .200       .020         9.370     .000
    adjective - singular          +               .150       .020         9.680     .000
    adjective - type 2            +               .070       .030         2.580     .010
    noun - derived                +               .020       .010         2.030     .040
    noun - feminine               +               .020       .010         2.640     .010
    pronoun - subject^a           +               .050       .010         5.800     .000
    adverbs of place^a            -               .060       .040         1.560     .120
    adverbial clauses - cause     +               .040       .030         1.500     .130
    verb - third person           +               .060       .010         12.380    .000
    verb - infinitive             +               .040       .010         3.730     .000
    verb - periphrastic future    -               .070       .050         1.570     .120
    verb - past participle^a      -               .040       .020         1.530     .130
    verb - past subjunctive       -               .110       .060         1.860     .060
    verb - Gustar-like^a          -               .040       .020         2.460     .010
    verb - communication          -               .070       .030         2.300     .020
    verb - knowledge              -               .090       .040         2.340     .020
    verb - observation^a          -               .080       .040         1.980     .050

    ^a Variable distinguishing between the levels of instruction.

    that at all levels in contexts/discourses where ser + adjective segments appear, learners
    use adjectives in general in a variety of inflections. Interestingly, however, the positive
    correlation with type-2 adjectives (i.e., adjectives with only two inflections: singular
    and plural) tempers this conclusion because they are also significantly associated with the
    criterion. Finally, although various morphological properties of adjectives associate with
    ser + adjective, this construction is not associated with more complex uses of adjectives
    because ser + adjective is disassociated with adjectives that appear in either prenominal
    (e.g., bella casa 'beautiful house') or postnominal position (e.g., casa grande 'large
    house').

    An analysis of the two nominal regressors indicates that a certain degree

    of morphological nominal complexity occurs where ser + adjective segments


    variable. The association with feminine nouns shows an association of gender-inflectional
    processes with the criterion variable, whereas the association with derived nouns points to
    semantically dense forms (derived nouns package semantic information densely, because they
    have a base/root morpheme and an additional derivational morpheme; e.g., constitu-ción,
    sereni-dad, procesa-miento). It is important to note, however, that this is the only
    indication of an association of ser + adjective with semantically dense forms. As with the
    adjectival regressors, neither of these two nominal regressors distinguished among the
    three levels, suggesting that the association of ser + adjective with a certain degree of
    morphological complexity holds from the beginning to more advanced levels of instruction.

    Subject pronouns for the most part also appeared where there was a preponderance of ser +
    adjective segments, although the subregression analysis revealed that this regressor
    significantly distinguished among the three levels of learners. For the first-year
    learners, subject pronouns were positively associated with ser + adjective (beta = 0.06;
    std error = 0.001); for the second-year learners there was no association at all (beta =
    0.001; std error = 0.017); and for the third-year learners there was a disassociation with
    the criterion variable (beta = -0.06; std error = 0.043). The analysis also revealed that
    the significant difference came from the first-year learners rather than the other two
    levels (t = 3.00; p = .003), meaning that the association of ser + adjective with subject
    pronoun use was primarily due to the first-year-learner data.

    The best-subsets analysis identified two adverbial constructions as important contributing
    predictors of overall ser + adjective usage: adverbs of place and adverbial clauses of
    cause. Although neither of the two contributed significantly on an individual basis,
    adverbs of place significantly distinguished among the three levels of learners in terms of
    predicting when ser + adjective would occur. The subregression analysis indicated that for
    the first-year learners, adverbs of place were disassociated with ser + adjective (beta =
    -0.12; std error = 0.05), whereas these adverbs were (positively) associated with the
    criterion at the second (beta = 0.07; std error = 0.06) and third years (beta = 0.06; std
    error = 0.09), with the significant difference being attributed to the difference between
    the first-year and second-year individual contributions to the model (t = 2.45; p = .015).

    There were six grammatical features of verbs that predicted ser + adjective usage at the
    three levels. For the most part, verbal variables were disassociated with ser + adjective.
    Similar to the adverbial regressors, three were important


    indicates that only 5% of the variance in the Spanish learners' use of estar + adjective
    could be explained by this regression model. This indicates that the association of estar +
    adjective with other lexico-grammatical features is weak within the interlanguage for all
    levels of learners. The model did account for a significant amount of the overall variation
    in estar + adjective usage, F(10, 1590) = 8.42, p < .0001.

    As observed in Table 4, most of these 10 variables contributed significantly to the model,
    with the subgroup regression analysis revealing that four regressors significantly
    distinguished among the three levels of learners: type-2 adjectives (i.e., adjectives with
    singular and plural inflection), article noun segments, preverbal clitics, and possessive
    adjectives. It is interesting to note that this group of variables is entirely different
    from the group of significant regressors for ser + adjective. At any rate, these
    differences are considered below in the interpretation of the variables, where we discuss
    all 10 variables by grouping them into three lexico-grammatical regressor categories:
    nominal (noun and adjectival), verbal, and syntactic variables.

    In contrast to ser + adjective segments, estar + adjective is associated with decidedly
    basic grammatical properties. For example, noun phrases in discourse where estar +
    adjective occurs usually comprise nouns preceded by articles or possessive determiners
    (e.g., mi mamá 'my mother', la universidad 'the university') and adjectives that have only
    two inflections (e.g., inteligente 'intelligent') or adjectives in their singular form
    (alta 'tall [feminine]'). Three of the four level-distinguishing regressors identified in
    the subregression analysis

    Table 4 Best-subsets regression model for estar + adjective

    Coefficient                   Estimate sign   Estimate   Std. error   t test   p
    (Constant)                                    4.460      3.518        1.267    .205
    adjective - singular          +               .010       .003         2.459    .014
    adjective - type 2^a          +               .020       .009         2.616    .009
    noun - singular               -               .000       .002         1.716    .086
    article noun segment^a        +               .010       .002         3.544    .000
    possessive adjective^a        +               .010       .003         3.669    .000
    verb - Gustar-like            -               .020       .008         2.419    .016
    verb - present participle     +               .030       .011         2.364    .018
    verb - probability            +               .020       .011         1.871    .062
    clitics - preverbal^a         +               .020       .005         3.617    .000
    adverbial clauses - cause     +               .020       .009         2.326    .020


    were nominal in nature. Type-2 adjectives were found to distinguish significantly between
    first- and third-year learners (t = 2.73; p = .006), indicating that the trend to associate
    inflectionally simple adjectives with estar + adjective appears to become stronger as
    learners progress in their acquisition of Spanish. This predictor variable was
    disassociated with the criterion variable (beta = -0.01; std error = 0.009) for the
    first-year students and was positively associated with estar + adjective for the second
    (beta = 0.01; std error = 0.021) and third years (beta = 0.13; std error = 0.046). The
    article noun segment significantly distinguished only between first- and second-year
    learners (t = 3.30; p = .001). This regressor was weakly associated with the criterion
    variable for first-year (beta = 0.002; std error = 0.003) and third-year students (beta =
    0.003; std error = 0.011) and only slightly more associated with estar + adjective for
    second-year students (beta = 0.018; std error = 0.005). Finally, possessive adjectives
    significantly distinguished between second-year and third-year learners (t = 2.78; p =
    .005). This regressor was found to be weakly associated with the criterion variable for the
    first year (beta = 0.008; std error = 0.003) and the third year (beta = 0.003; std error =
    0.011) and only slightly more associated with estar + adjective in the second-year writing
    (beta = 0.041; std error = 0.010).

    Among verbal regressors, the significant predictor variables also showed no evidence that
    complexity is associated with the criterion. Although Gustar-like verbs are usually
    associated with complex syntax, in the learners' writing this variable is negatively
    associated with the occurrence of estar + adjective. The other grammatical verb form
    (present participle) is expected to co-occur with estar + adjective because it is mostly
    associated with estar to form the progressive aspect. Indeed, its beta coefficient was the
    highest of those regressors included in the best-subsets analysis (0.030).

    Two syntactic features were positively associated with estar + adjective. Preverbal clitics
    were positively associated with estar + adjective at all levels, perhaps the only
    indication of complexity associated with this phrase structure. The other syntactic
    regressor, causal adverbial clauses (which usually started with the conjunction porque),
    also predicted criterion usage. Preverbal clitics was the only syntactic regressor that
    distinguished significantly between learners' use of estar + adjective at different levels.
    This variable was weakly associated with the criterion for first-year learners (beta =
    0.006; std error = 0.007), and the association increases modestly yet significantly (t =
    2.31; p = .021) into the second and third years, with the association being greater for
    second- (beta = 0.033; std error = 0.011) and third-year (beta = 0.056; std error =


    Native-Speaker Model Comparison

    As explained earlier (see the Corpus Description section), our analysis also included a
    measurement (via regression analysis) of the extent to which the learners' texts with high
    concentrations of each copula + adjective model related to high concentrations of each of
    four native-speaker discourse types: informational richness, hypothetical discourse,
    narrative discourse, and descriptive discourse. The native-speaker model comparison
    indicated that three native-speaker discourse types combined significantly and individually
    to predict where the ser + adjective learner model occurred: hypothetical, narrative, and
    descriptive (see Table 5). As observed in Table 6, three also combined to predict where the
    estar + adjective learner model held: information rich, hypothetical, and narrative.

    The information-rich discourse regressor indicates the extent to which documents reflecting
    a copula + adjective model are accompanied by semantically dense discourse. Considering the
    sign of the coefficients (specifically, the encyclopedic regressor in the ser + adjective
    model was significantly negative), ser + adjective usage is not semantically dense.
    Interestingly,

    Table 5 Native-speaker discourse-type predictions of documents matching ser + adjective model

    Coefficient         Estimate sign   Estimate   Std. error   t test   p
    (Constant)          +               0.001      0.131        9.007    .995
    information rich    -               0.079      0.045        1.776    .076
    hypothetical        -               0.303      0.039        7.766    .000
    narrative           +               0.376      0.124        3.030    .002
    descriptive         +               0.350      0.114        3.060    .002

    Note. F(4, 1596) = 24.38; p = .000; multiple R² = 0.06; adjusted R² = 0.06.

    Table 6 Native-speaker discourse-type predictions of documents matching estar + adjective model

    Coefficient         Estimate sign   Estimate   Std. error   t test   p
    (Constant)          +               81.371     9.011        9.030    .000
    information rich    +               0.129      0.026        4.954    .000
    hypothetical        +               0.059      0.023        2.574    .010
    narrative           +               0.550      0.072        7.592    .000
    descriptive         +               0.135      0.067        2.025    .043


    documents reflecting the estar + adjective model appear to be semantically dense.
    Furthermore, because the subregression analysis showed no interlevel coefficient
    difference, we must surmise that this association is constant for all three levels of
    instruction.

    The hypothetical regressor indicates how much copula + adjective usage occurs when learners
    conjecture and present possible scenarios. Given their signs and significance levels, ser +
    adjective discourse appears to represent the antithesis of hypothetical discourse, whereas
    estar + adjective usage contains hypothetical elements. The disassociation with ser +
    adjective discourse may be partially explained by the observation made earlier that
    epistemic verbs (representing stance) are entirely disassociated with ser + adjective
    usage, as well as by the model's exclusion of verbal entities like the subjunctive and
    periphrastic future. The subregression analysis indicates that ser + adjective is wholly
    unhypothetical at the first year and that at the second and third years this disassociation
    rises to the level of no association. The hypothetical regressor was disassociated with the
    first-year learner data (beta = -0.638; std error = 0.071), which was significantly below
    those of the second-year (beta = 0.167; std error = 0.098; t = 7.652; p = .000) and
    third-year learner ser + adjective usage (beta = 0.024; std error = 0.052; t = 3.280; p =
    .001).

    The estar + adjective association with hypothetical discourse is supported in the above
    analysis because this model was associated with verbs of probability. Additionally, the
    learner estar + adjective regression analysis included causal adverbial clauses in the
    estar + adjective model, and cause-effect relationships are an important tool for
    hypothesizing. This hypothetical regressor was not associated with the first-year
    coefficients (beta = 0.638; std error = 0.071), which were significantly below the
    second-year (beta = 0.055; std error = 0.040; t = 4.042; p = .000) and third-year (beta =
    0.029; std error = 0.001; t = 3.280; p = .001) coefficients.

    The narrative regressors generally indicate where learners used a copula + adjective model
    accompanied by story-telling elements, although not necessarily whole narrations. Both
    copula + adjective segments appear to be significantly associated with the presence of
    narrative features. The subregression analysis indicates that both the second- and
    third-year learners generate more narrative features where ser + adjective occurs than
    first-year learners: Although the coefficients for the second-year (beta = 1.031; std error
    = 0.192) and third-year (beta = 1.015; std error = 0.379) data were not significantly
    different (t = 0.030; p = .976), the difference between the second- and first-year
    coefficients (beta = 0.133; std error = 0.177) was significant (t = 3.450;


    estar + adjective segments with narrative features remains constant through the three
    levels, as there were no significant interlevel coefficient differences. This is consistent
    with the learner regression analysis, which showed that present participles, which denote
    durative aspect (an important element of stories), were associated with estar + adjective.

    Both copula + adjective learner models were associated with descriptive features, although
    only the ser + adjective association was significant. This might seem surprising given the
    operationalization of Spanish descriptive discourse offered by Biber et al. (2006), which
    is almost entirely devoid of narrative features. The implication here is that both copula +
    adjective segments operate in both narrative and descriptive contexts beyond the first year
    of instruction. We see a significant transition toward greater association of ser +
    adjective segments with descriptive features from first (beta = 0.154; std error = 0.168),
    to second (beta = 0.989; std error = 0.165), to third year (beta = 1.417; std error =
    0.350), with the second-year coefficients being greater than the first (t = 4.880; p =
    .000) as well as the third-year coefficients being greater than the first (t = 3.271; p =
    .001). Finally, it is important to note that the strength of association of estar +
    adjective segments with narrative features (beta = 0.550; std error = 0.072) is about four
    times that with descriptive features (beta = 0.135; std error = 0.067).

    Qualitative Analysis

    We contextualize the following qualitative analysis in consideration of the learner models
    presented above and of their association with the preceding native-speaker discourse
    models. Ser + adjective serves first-year learners in highly descriptive discourse. The
    first-year documents reveal that ser + adjective segments are employed to relate
    descriptions containing multiple chained adjectives, where ser tends to be the most
    frequently inflected verb. The following are segments from midterm-exam letters that
    students in a first-year course wrote to a Mexican friend to describe their
    girlfriend/boyfriend and his/her family.

    (1) yo estoy bien porque yo tengo novia. se llama jessica. ella [es] bonita, inteligente y
    elegante. ella tiene veinte años. ella es de oregon. y [es] moreno, bajo y muy bonita. ella
    lleva camiseta verde y jeans azules. sus ropas es mucho dolares. ella gusta bailar y cantar
    para mi. ella gusta tens . . . la madre de jessica [es] bonita, inteligente y bajo. se
    llama velerie. nosotros jugamos tenis mucho. ella [es] bueno. nosotros aprendamos la
    universidad. ella lleva camisa verde y los jeans azul en la universidad (I am well because
    I have a girlfriend


    old. She is from Oregon and she is a brunette, short and very beautiful. She wears a green
    t-shirt and blue jeans. Her clothes cost a lot of dollars. She likes to dance and sing for
    me. She likes tennis. Jessica's mother is beautiful, intelligent and short. Her name is
    Valerie. We play tennis a lot. She is good. We learn it at the University. She wears a
    green shirt and blue jeans at the university . . .)

    (2) yo soy bien porque yo soy amo con novia, selena. ella [es] bonita y

    simpatica. ella [es] soltera y practicar. ella es alta y la ropa es mocha colores.

    mi muchacha lleva rojo gora, blanco jacqueta, azul jeans, y negro sandalias.

    ella es mi amora. selena (stays) con madre en casa grande. la familia [es] baja.

    la madre [es] rica y lista y soltera . . . (I am well because I am in love with

    my girlfriend, Selena. She is beautiful and nice. She is single and practical.

    She is tall and she wears clothes in a lot of colors. My girl wears a red cap,

    white jacket, blue jeans, and black sandals. She is my love. Selena stays with

    her mother in Casa Grande. Her family is small. Her mother is rich, smart and

    single . . .)

    In both of these samples we see simple discourse, grammar, and lexicon,

    with few verbs except for the copula and an overuse of subject pronouns.

    Additionally, although there are numerous adjectives in both segments, it is

    apparent that noun + adjective segments are scarce. These first-year samples

    are nonnarrative and possess almost no conjecturing.

    Among second-year learners, ser + adjective segments appear in list fashion in discourse
    with few conjunctions expressing interpropositional relationships (e.g., ser + adjective +
    que 'copula + adjective + that'). Such loosely connected discourse not only describes
    people, places, and concepts but also conveys evaluations of and reactions to events and
    states. As the learner model suggests, there is a marked absence of epistemic verbs to
    signal stance (verbs of knowledge, pienso que 'I think that'; verbs of perception, vemos
    que 'we see that'; verbs of communication, se dice que 'it is said that'). Instead,
    copula + adjective segments present (seemingly) indisputable assertions. Structurally
    speaking, we see subject pronouns omitted to mark continuity; still, there are various
    referents and allusions to the things they do frequently. This probably accounts for why
    ser + adjective segments are associated with a mix of descriptive and narrative features.
    Finally, the derivational sophistication (and thus the semantic density) of the nouns
    employed is slightly greater at this level, although these nouns are mostly cognates. The
    following is an argumentative essay that a second-year student wrote using short stories
    as the topic.

    (3) este cuento es un ejemplo que muchos padres estan usando la tele-

    si es realidad o no. los ninos no reciben la atencion que necesitan para crecer.

    tambien pienso que los jóvenes necesitan atención y amor en los primeros

    anos más que de cuando [son] maduros porque cuando son jovenes ellos no saben que [es] malo o que [es] bueno. tambien, la television [es] mala para

    los padres. para los adultos la puede ser un escape tan ellos no tienen hacer

    trabajo, o cosas diferentes que necesitan hacer durante el día. pero, tambien

    pienso que hay diferentes programas que [son] buenas. hay programas que

    ensena como cocinar, leer (para los niños), y que dice que esta haciendo en el

    mundo hoy. no todos de los programas de televisión [son] mala. pero yo pienso

    que [es] malo usar la mas de necesario. (. . . this story is an example that many

    parents are using the television as babysitters. I think this is a problem because

    young people don't know whether it is real life or not. Children do not receive

    attention enough to grow up. I also think that young people need attention and

    love in their first years of life more than when they are mature because when

    they are young they don't know what is good or what is bad. Also, television

    is bad for parents. For adults it can be an escape because they don't have to do

    their work or the different things they need to do during the day. But, I also

    think that there are different programs that are good. There are programs that

    teach you how to cook, to read (for children) and that tell you what is being

    done in the world today. Not all the TV programs are bad, but I think it is bad to use it more than necessary.)

    With the third-year learners, ser + adjective is less frequent, as reflected by a lower
    overall average z-score for ser + adjective. It is now mixed among other verbs in the
    third person and adjectives modifying nouns. The discourse is descriptive and evaluative
    in nature, with references to relevant events, producing a mix of descriptive and
    narrative elements. The following texts are expository essays that students wrote in a
    third-year course about different occupations.

    (4) al principio de su vida, el bebe atleta es una hija diferente de sus hermanas. el grito del bebe [es] mas fuerte, el apetito más famelico y el

    cuerpo pequeno mas musculoso que los otros bebes . . . de repente, en la escuela

    primaria, es la estrella de su partido de futbol y la parte necesaria entre su

    equipo de básquetbol. al fin, no se puede negar todos los hechos, ella es

    atleta. [es] seguro que hay cualidades particulares para las atletas; factores

    que definen las mujeres que aman los deportes . . . mientras que la atleta está

    entrenandose, se come un dietético rico con una variedad de las frutas y las

    verduras. sin las vitaminas y minerales de estas comidas, el cuerpo no funciona

    mejor . . . se come mucho pescado y tofu, [es] justo porque los dos son comidas

    saludables sin mucha grasa en el concepto de la diversion el cuerpo de la

    cigarrillas. todas las actividades giran de la salud y se mantienen la buena

    salud. [es] necesario que las atletas pasen sus noches jugando los juegos

    activas como escondite y jugar al corre que te pillo. (At the beginning of her life, the baby athlete is a different daughter from her sisters. Her crying is

    stronger, her appetite is more ravenous. And her small frame more muscular

    than the one of the other babies . . . Suddenly, in grade school, she is the star in

    her football game and the main player in her basketball team. At the end, you

    cannot deny all the facts, she is an athlete. It is sure that there are particular

    qualities to athletes, factors that define women that love sports . . . While the

    athlete is training, she has a rich diet with a variety of fruit or vegetables. Without

    the vitamins and minerals in this food, her body couldn't work better . . . a lot

    of fish and tofu is eaten. It is so because both are healthy foods without much

    fat. On the entertainment side, the body of the athlete is her temple. That's why

    she doesn't spend her Fridays drinking beer and smoking cigarettes. All her

    activities go around her health in order to keep her healthy. It is necessary that

    athletes spend their nights playing active games such as hide and seek or run

    and catch.)

    (5) los musicos es distingue por no estar religioso. muchos de ellos no

    creen que haya un dios. actualmente, [es] irónico, porque los musicos viven

    como no creen en Dios, pero tan pronto como ganen un premio, lo agradecen . . . la dietética no [es] similar entre músicos. unos musicos se distinguen

    por su dietética de alcohol y drogas. ellos también fumar cigarrillos, o otras

    sustancias, y asistir a fiestas todas las noches, entonces casi nunca duermen.

    unos musicos están muy saludable, y están vegetarianos estrictos . . . musicos

    a veces tienen su propia familia. tienen esposos y a veces hijos. tener una

    familia es muy difícil cuando los musicos siempre están viajando. (Musicians

    are known for not being religious. A lot of them don't believe there is a God.

    Actually, it is ironic because musicians live as they don't believe in God, but as soon as they are awarded a prize, they thank God . . . The diet is not similar

    among musicians. Some musicians distinguish themselves for having a diet

    with alcohol and drugs. They also smoke cigarettes or other substances, and

    they attend parties every night. So they almost never sleep. Some musicians are

    very healthy and they are strict vegetarians. Musicians sometimes have their

    own families. They have spouses and sometimes kids. Having a family is very

    difficult when the musicians are always traveling.)

    For the most part, however, important information is packaged into nominal

    lexemes (adjectives and nouns) with a derivational morpheme (e.g., salud-able

    'healthy', muscul-oso 'muscular', cuali-dad 'quality'). Still, cognates prevail

    subject pronouns are scarce, perhaps due to topic continuity. As with the first-year
    learners, we see little expression of stance via epistemic verbs, and statements are given
    as unqualified facts.

    Regarding the estar + adjective segments, their principal discourse functions appear to be
    narration and description within narrations. In the first year, estar + adjective mostly
    appears in a fixed expression such as estoy feliz 'I am happy' or is used in descriptive
    contexts where ser is required, with adjectives such as bonita 'pretty' and grande
    'large'. The following examples come from in-class letters that learners wrote to a
    friend. The examples relate life events as well as describe familiar people and places.

    (6) querida maria, hola! [estoy] muy feliz porque yo tengo un novio nueva.

    su nombre es Pete. Pete tiene veinte anos. mi novio es de indiana. Pete es

    moreno y alto. mi novio es muy inteligente y optimista. (Dear Mary, Hello! I am

    happy because I have a new boyfriend. His name is Pete. Pete is twenty years

    old. My boyfriend is from Indiana. Pete has dark hair and is tall. My boyfriend

    is very intelligent and optimistic.)

    (7) hola aubrey! fue a costa rica para un semana. fue a un hotel en la

    playa dominical de costa rica. la playa dominical [estuvo] mas bonita! viajó

    con mis padres y mi hermano. fue en un avión y lo [estuvo] más grande. dormí

    en un hotel en la playa. el mar [estuvó] muy largo y yo pesqué mucho. me gustaron las comidas mucho! (Hello Aubrey! I went to Costa Rica for a week.

    I went to a hotel in the Dominical beach in Costa Rica. The Dominical beach

    was very beautiful. I traveled with my parents and my brother. I went by plane

    and it was very big. I slept in a hotel by the beach. The ocean was very big and

    I fished a lot. I liked the meals very much!)

    Learners couple their assessments of people's states with causes embedded in porque
    'because' adverbial clauses. The semantically dense nature of this discourse is
    attributable to the use of various cognates that are long words, which describe places,
    disciplines, actions, or events.

    In second-year writing, estar + adjective is used in narrative and descriptive discourse
    that is detached from the writer. Writing is elicited through tasks in which students must
    summarize events and describe characters in readings or audiovisual material. The
    description of events favors the use of the present participle. The summarizing task also
    allows students to speculate about characters' motives or actions by using verbs of
    probability such as creer 'to believe' and causal adverbial clauses that begin with the
    causal conjunction porque 'because'. These are the types of behaviors that account for the
    hypothetical nature identified for estar + adjective

    (8) la madre regresa de la cocina, ella piensa que el muchacho [está]

    dormido. sin embargo, el muchacho [está] despierto todavía y está mirando la

    television. la pantalla [esta] oscura, así la madre le pregunta a su hijo que el hace. el hijo responde que el está esperando la muchacha en el televisor. esto

    es muy triste, porque obviamente la madre no es una madre muy bien . . . el hijo

    cree que la muchacha en el televisor es su amiga. el piensa esto porque cree

    que la muchacha está hablando a el . . . (The mother returns to the kitchen, she

    thinks that the boy is asleep. However, the boy is still awake and he is watching

    the television. The screen is dark so she asks him what he is doing. The son

    responds that he is waiting for the girl in the television. This is very sad because

    it is obvious that the mother is not a good mother. . . The son believes that the

    girl in the television is his friend. He thinks so because he thinks the girl is

    speaking to him . . .)

    (9) la personalidad del protagonista, juan, era tímido y tranquilo. le gustaba

    [estar] solo con sus pensamientos. el sonaba antes de acostarse de, todas

    las peripecias de un viaje a francia, pero no pudo costearlo. no creo que

    juan [estuviera] satisfecho con su vida y su trabajo porque sonaba con ir a

    francia. el quería experimentar nuevas cosas. su vida era muy rutinario y quería

    cambiarla. el no [estaba] satisfecho con su trabajo porque no pudo costear el

    viaje . . . el narrador nos sugirió que juan escribiera las cartas porque la letra tuvo los mismos rasgos esenciales. creo que el narrador nos dijo eso porque el

    es anonimo y nos quiere hacer creer que juan [estuviera] loco y se muriera. (The

    personality of the main character, Juan, is shy and quiet. He liked to be alone

    with his thoughts. He used to dream before going to bed about his adventures

    in a trip to France, but he couldn't afford it. I don't think that Juan was satisfied

    about his life and his work because he dreamed about going to France. He

    wanted to experience new things. His life was a routine and wanted to change it.

    He was not satisfied about his work because he could not afford his trip . . . The narrator suggested to us that Juan wrote the letters because his handwriting had

    the same main features. I think the narrator told us so because he is anonymous

    and he wants us to believe that Juan was crazy and that he died.)

    Third-year students combine different discourse patterns, using estar + adjective in
    argumentative texts where they describe two opposing sides of an issue, as in (10). The
    writer is comparing American and Hispanic culture and ways of life, which gives the text a
    hypothetical reading. They also use estar + adjective to produce personal narratives, such
    as an account of what happened on a birthday, as in (11). It is noteworthy that preverbal
    clitic forms are used not only with some Gustar-like constructions but also with verbs in
    middle voice (e.g., infiltrarse

    'to be infiltrated') and passive constructions (e.g., enseñarse 'to be taught'),
    making the discourse more encyclopedic-sounding.

    (10) . . . el sueno americano se infiltra desde juventud, es evidente por televisión, escuela, la cultura y ejemplos del gobierno y políticos. esta influencia

    es subconsciente pero fuerte y se enseña el americano que la unica cosa que se

    necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente

    se recibirá la vida perfecta. / . . . ademas el americano siempre [está]

    preocupado, consumiendo y trabajando pero viva muy poco. (The American

    dream is instilled from youth, it is evident in television, school, the culture

    and the examples from the government and the politicians. This influence is

    subconscious but strong and it is taught to the American that the only thing that

    he needs to do is to work loyally and to buy the right things and eventually he

    will receive a perfect life. / Moreover, the American is always busy, consuming

    and working but he lives very little.)

    (11) mi familia recordaron mi cumpleaños! pero, en este momento mis

    padres [estaban] enojados conmigo. yo no quería pelear, entonces, termino el

    papel y nos sentamos a comer. mis hermanas mi desean un buen cumpleanos

    y me dieron un regalo bellísimo. mi padre me hablaba de un film que le había

    gustado. yo pide a mi madre si a ella le a gustado y ella me respondio, no, en

    un tono agresivo. mientras la entera cena ella solo me dijo, no, y, si, y fue muy molestosa. no me habló durante mi fiesta de cumpleaños! [estaba] muy triste

    este noche y la próxima día. (My family remembered my birthday! But, at that

    moment my parents were angry with me. I didn't want to argue so I finished

    my paper and we sat to eat. My sisters wished me a good birthday and gave me

    a very beautiful present. My father was talking to me about a film he had liked.

    I asked my mother if she had liked it and she answered no, in an aggressive

    tone. During the whole dinner she told me no and yes and I was very angry.

    She didn't talk to me during my birthday party! I was very sad that night and the next day.)

    Discussion and Conclusions

    Studying the acquisition of the Spanish copula provides insights into the interaction
    among syntax, semantics, pragmatics, morphology, and vocabulary during development in one
    of the most basic of syntactic structures, namely, attributive sentences (Leonetti, 1994).
    Spanish requires learners to choose between two copulas in attributive sentences in
    accordance with a variety of contextual considerations and in consideration of a variety of
    levels of rep-

    and (b) the function of attributive sentences in terms of orders of acquisition in
    different learning contexts (Gunterman, 1992; Ryan & Lafford, 1992; VanPatten, 1985, 1987)
    and (c) the contextual and semantic factors that predict learner usage of this construct as
    compared to native speakers (Geeslin, 2003a, 2005), the present study is the first to
    provide a corpus-based analysis of the lexico-grammatical features that co-occur with
    Spanish copula (i.e., ser and estar) + adjective usage and, in turn, of the different
    discursive functions that ser + adjective and estar + adjective segments play at three
    learner levels and in comparison to native-speaker models. The study delves into important
    learner issues (for example, the discourse types learners associate with copula usage
    [Gunterman, 1992] and the strong influence of contextual cues on copula choice [Geeslin,
    2003a, 2003b]) that have been identified in the S/E research but not fully developed to
    date. The results overall revealed the following: (a) both ser + adjective and estar +
    adjective were associated with simple discourse at all levels; (b) ser + adjective appears
    in descriptive and evaluative discourse where little linguistic complexity typically
    occurs; and (c) estar + adjective is present in narrations, descriptions, and hypothetical
    discourse where, nonetheless, little linguistic complexity typically occurs.
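
To make the unit of analysis concrete, the following is a minimal sketch (not the procedure used in this study) of how copula + adjective segments might be tallied in a part-of-speech-tagged text. The (form, lemma, tag) token format, the tag labels, and the function name count_copula_adjective are assumptions introduced here purely for illustration.

```python
from collections import Counter

def count_copula_adjective(tokens):
    """Count ser/estar + adjective segments in a tagged token list.

    tokens: list of (form, lemma, tag) tuples (hypothetical format and tagset).
    Returns a Counter keyed by copula lemma ("ser" or "estar").
    """
    counts = Counter()
    for i, (_, lemma, _) in enumerate(tokens):
        if lemma not in ("ser", "estar"):
            continue
        j = i + 1
        # Skip intervening adverbs such as 'muy' between copula and adjective.
        while j < len(tokens) and tokens[j][2] == "ADV":
            j += 1
        if j < len(tokens) and tokens[j][2] == "ADJ":
            counts[lemma] += 1
    return counts

# Invented sample modeled on the first-year letters; not taken from the corpus.
sample = [
    ("ella", "ella", "PRON"), ("es", "ser", "VERB"), ("bonita", "bonito", "ADJ"),
    ("y", "y", "CONJ"), ("está", "estar", "VERB"), ("muy", "muy", "ADV"),
    ("feliz", "feliz", "ADJ"), ("es", "ser", "VERB"), ("de", "de", "ADP"),
    ("oregon", "oregon", "PROPN"),
]
print(count_copula_adjective(sample))
```

In this sample, es de oregon is not counted because the copula is followed by a preposition rather than an adjective, which mirrors the restriction of the analysis to copula + adjective segments.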

    Specifically, findings showed that the model predicting ser + adjective usage identified
    more variables (n = 21) and accounted for more variation (41%) than the estar + adjective
    model, which identified only 10 predictors and accounted for 5% of the variation. It seems
    that at beginning levels of instruction, learners find ser + adjective more
    communicatively productive and thus more easily associated with a large array of features
    within their interlanguage, although these features are basic grammatical and lexical
    items. Ser + adjective is one of the first copula segments taught and is recycled over
    various semesters, whereas estar + adjective is primarily used at beginning levels in
    routines and formulaic expressions like estar + bien, mal, ocupado, enfermo. In this sense,
    the input provided by the teacher, materials, and other students in the class through task
    completion emphasizes the use of ser + adjective over estar + adjective constructions and,
    therefore, encourages more ser + adjective usage. These findings are in line with early
    SLA studies on ser/estar acquisition, which found that ser + adjective was acquired well
    before estar + adjective (Gunterman, 1992; Ryan & Lafford, 1992; VanPatten, 1987),
    presumably because of the higher frequency and saliency of ser + adjective in
    instructional and naturalistic input.
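
The "variation accounted for" figures reported for the two models are presumably proportions of variance explained (R²). As an illustration only, the sketch below computes R² for a least-squares line with a single predictor. The counts are invented, the function name r_squared is hypothetical, and the study's own models involved many predictors rather than one.

```python
from statistics import mean

def r_squared(xs, ys):
    """Proportion of variance in ys explained by a least-squares line on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual and total sums of squares around the fitted line and the mean.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Invented per-text counts: a co-occurring feature vs. ser + adjective tokens.
feature_counts = [1, 2, 3, 4, 5]
ser_adj_counts = [2, 4, 5, 4, 5]
print(round(r_squared(feature_counts, ser_adj_counts), 2))  # prints 0.6
```

Read this way, a value of .41 would mean that the predictors jointly capture about two fifths of the variability in ser + adjective usage, whereas a value of .05 leaves nearly all of the variability in estar + adjective usage unexplained.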

    It was also found that many of the lexico-grammatical predictor variables in

    both models were characteristics of simple discourse and they did not differen-

    Learners at all levels seem to use copula + adjective as a discourse tool, for instance to
    communicate evaluatives like es importante, lástima 'it's important, it's a shame', and so
    forth. However, when the discourse becomes more syntactically and grammatically complex,
    ser + adjective segments are absent and estar + adjective segments become more prevalent.
    On the one hand, these observations contrast with native speakers, who use ser + adjective
    for evaluative purposes in a wide variety of discourses, simple or complex; on the other
    hand, they are consistent with natives' propensity to use estar + adjective in more
    complex discourse (Collentine, 2008).

    The ser + adjective model was mostly associated with adjective and grammatical/lexical
    verb variables. Various morphological properties of adjectives (e.g., feminine, plural)
    were associated with ser + adjective, whereas more complex adjectival syntactic processes
    (e.g., prenominal or postnominal adjectives) emerged as disassociated. Most of the verbal
    variables reflecting complex syntax (e.g., periphrastic future, past subjunctive,
    Gustar-like verbs) were disassociated from the copula construction and started to emerge
    as associated with ser + adjective at advanced levels of instruction. Other features such
    as null subjects also indicated some grammatical sophistication at advanced levels, where
    ser + adjective became less frequently used. As for the discursive functions served by the
    co-occurrence of the variables in the predictive model for ser + adjective, the
    disassociation of verbs of observation and communication from the construction indicated a
    discourse that was nonepistemic/nonhypothetical in nature. Comparisons with native
    speakers' discourse showed that learners used ser + adjective in discourse that is highly
    descriptive in nature and accompanied by story-telling elements, especially at advanced
    levels of instruction. These findings corroborate those of Gunterman (1992), who examined
    learners in study-abroad contexts where ser + adjective was indicative of descriptive
    discourse. Spanish learners, regardless of their level, associate an evaluative stance
    with ser + adjective.

    The estar + adjective regression analysis revealed a weak association with other
    lexico-grammatical features. This indicates that throughout the early to middle stages of
    acquisition, this phrase structure is weakly integrated into the interlanguage in terms of
    being a productive, necessary tool for the types of communication in which learners
    engage. In other words, the use of estar + adjective segments is not obviated (or evoked,
    cognitively speaking) when learners use their standard repertoire of lexico-grammatical
    tools. All told, the story is complicated for estar + adjective, which ultimately might
    account for its late acquisition. On the one hand, it appears where there is little as-

    language, both in verbal and nominal constructs.). The few variables associated

    with estar+ adjective suggest