

Language Testing 29(2) 243–263
© The Author(s) 2011
Reprints and permission: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0265532211419331
ltj.sagepub.com

Predicting the proficiency level of language learners using lexical indices

Scott A. Crossley, Georgia State University, USA

Tom Salsbury, Washington State University, USA

Danielle S. McNamara, University of Memphis, USA

Abstract

This study explores how second language (L2) texts written by learners at various proficiency levels can be classified using computational indices that characterize lexical competence. For this study, 100 writing samples taken from 100 L2 learners were analyzed using lexical indices reported by the computational tool Coh-Metrix. The L2 writing samples were categorized into beginning, intermediate, and advanced groupings based on the TOEFL and ACT ESL Compass scores of the writer. A discriminant function analysis was used to predict the level categorization of the texts using lexical indices related to breadth of lexical knowledge (word frequency, lexical diversity), depth of lexical knowledge (hypernymy, polysemy, semantic co-referentiality, and word meaningfulness), and access to core lexical items (word concreteness, familiarity, and imagability). The strongest predictors of an individual’s proficiency level were word imagability, word frequency, lexical diversity, and word familiarity. In total, the indices correctly classified 70% of the texts based on proficiency level in both a training and a test set. The authors argue for the applicability of a statistical model as a method to investigate lexical competence across language levels, as a method to assess L2 lexical development, and as a method to classify L2 proficiency.

Keywords

frequency, language proficiency, lexical competence, lexical diversity, second language acquisition, word familiarity, word imagability

Corresponding author: Scott A. Crossley, Department of Applied Linguistics/ESL, Georgia State University, Atlanta, GA 30302-4099, USA. Email: [email protected]


A general description of lexical competence is difficult to pinpoint. Many researchers in second language (L2) studies make a general distinction between breadth of lexical knowledge (i.e. the number of words a learner knows) and depth of lexical knowledge (i.e. the degree of organization of known words, Crossley, Salsbury, McNamara, & Jarvis, 2011, in press; Meara, 1996, 2005a; Qian, 1999; Read, 1988; Wesche & Paribakht, 1996). However, there are a variety of different components to lexical knowledge (cf. Nation, 1990; Richards, 1976; Read, 2004), not all of which fit conveniently into categories. The complexity of a lexicon and the processes involved in acquiring words may, in fact, make it impossible to develop a unified construct of lexical competence (Henriksen, 1999).

Regardless of a unifying definition, researchers have devoted considerable time to defining lexical competence because it is an important element of L2 acquisition. Misinterpretations of lexical items produced by L2 learners are key elements in communication errors (Ellis, Tanaka, & Yamazaki, 1994; Ellis, 1995; de la Fuente, 2002). L2 lexical competence is also a prerequisite for academic achievement (Daller, van Hout, & Treffers-Daller, 2003). Lastly, understanding L2 lexical acquisition in relation to its deeper, cognitive functions can lead to increased awareness of how L2 learners process and produce language (Crossley, Salsbury, & McNamara, 2009; 2010a).

Recent developments in computational linguistics permit not only the measurement of breadth of lexical knowledge features, but also depth of knowledge features and features related to accessing lexical items. However, the examination of lexical competence using these contemporary, computational indices is still in its infancy and the orchestration of these indices into a practical algorithm from which to investigate lexical competence has only just begun (cf. Crossley et al., 2011, in press). The purpose of the current study is to advance such investigations by using a range of computational indices related to lexical competence to predict the language proficiency level of L2 learners based on lexical production (i.e. writing samples taken from beginning, intermediate, and advanced L2 learners). Such an approach allows us to examine lexical competence in relation to the language proficiency of the learner (i.e. as a learner property) and not in relation to human judgments of lexical competence (i.e. as a property of human factors, cf. Crossley et al., 2011, in press).

Lexical competence

Recent investigations of lexical competence as a property of human factors have investigated human judgments of lexical proficiency. These studies have demonstrated that a significant amount of variance in human judgments of lexical proficiency is predictable based on computational indices related to lexical diversity, word hypernymy, word frequency, word imagability, and word familiarity (Crossley et al., 2011, in press). In these studies, both written and spoken language samples were collected from native speakers and L2 learners at the beginning, intermediate, and advanced levels. Human raters then evaluated each sample and assigned the sample a holistic, lexical proficiency score. The variance in these scores was then predicted using a variety of lexical indices taken from the computational tool Coh-Metrix (see http://cohmetrix.memphis.edu; McNamara & Graesser, 2010; McNamara, Louwerse, McCarthy, & Graesser, 2010).


In this current study, though, we are interested in examining lexical competence from a learner perspective. Thus, we do not consider human evaluations of texts, but rather focus on machine evaluations of learner language proficiency as measured by standardized tests (i.e. TOEFL and ACT Compass). An important notion supporting lexical competence as a learner property is that lexical competence in individual learners increases concomitantly with language proficiency. Thus, an L2 learner’s language proficiency, as measured by a standardized test, correlates with their lexical competence (Bachman, Davidson, Ryan, & Choi, 1995; Bachman & Palmer, 1982; Oller, 1979; Read, 2004). Although the notion that lexical competence develops as part of overall language development is commonly assumed, few studies have empirically supported it (cf. Zareva, Schwanenflugel, & Nikolova, 2005). The verification and modeling of such a notion has several potential benefits. First, automated indices for measuring lexical competence are freely available and could therefore provide accurate feedback to interested parties about a learner’s lexical competence and, by proxy, their language proficiency. Second, examining links between lexical measures and language proficiency can lead to the creation of models that test the multidimensional aspects of word knowledge (Zareva et al., 2005). The former would prove beneficial to language teachers and language institutions, while the latter would prove important to language researchers.

Breadth of knowledge measures

Standardized assessments of L2 lexical competence have generally depended on breadth of knowledge evaluations. The most well known of these assessments is Lexical Frequency Profiles (LFP; Nation & Heatley, 1996), with lesser known assessments such as the Lexical Quality Measure (Arnaud, 1992), Productive Levels Test (Laufer & Nation, 1999), Computer Adaptive Test of Size and Strength (CATSS; Laufer & Goldstein, 2004), and P_Lex (Meara & Bell, 2001) also available. The majority of these measures depend on surface level linguistic features (i.e. word count measures such as lexical diversity and frequency) to assess lexical competence. Since both lexical diversity and word frequency indirectly assess how many words a learner knows, they are generally categorized as breadth of knowledge measures.1

Lexical diversity is an important measure of lexical competence because it can be used to measure vocabulary knowledge and is also indicative of writing quality (Malvern, Richards, Chipere, & Duran, 2004; Ransdell & Wengelin, 2003; McCarthy & Jarvis, 2010) and human judgments of lexical proficiency (Crossley et al., 2011, in press). Lexical frequency is also an important feature of lexical competence because lexical competence is at least partially based on the distribution of words by frequency. As a result, beginning L2 learners are more likely to comprehend, process, and produce higher frequency words (Ellis, 2002).

Measuring lexical frequency is probably the most common approach for assessing L2 lexical competence, especially in production tasks. The two most widely cited production indices of this type are Lexical Frequency Profiles (LFP; Nation & Heatley, 1996) and P_Lex (Meara & Bell, 2001). In both LFP and P_Lex, a greater incidence of low frequency words in a text indicates a more proficient L2 vocabulary. Laufer and Nation (1995) argued that LFP could discriminate between L2 proficiency levels and that it correlates with independent measures of vocabulary knowledge. However, Meara (2005b) criticized LFP for being less robust than Laufer and Nation claimed (especially for shorter texts) and simulation studies indicated that LFP might not distinguish between learner proficiency levels.2 To our knowledge, P_Lex has not been used to discriminate between L2 proficiency levels.

Depth of knowledge measures

Depth of knowledge measures assess how well a learner knows a word (Nation, 1990; Meara, 2005a). Unlike breadth of knowledge measures, depth of knowledge features are not based on the number or variety of words a learner produces, but on constraints at the phonemic, morphemic, and syntactic level (Qian, 2004) and deeper level features related to word associations (e.g. semantic co-referentiality, hypernymy, polysemy, and word associations). Word association features, which are of interest for this study, are often integrated under the term ‘lexical network,’ which serves as a convenient metaphor to describe the manner in which lexical features combine to form complex association models that act categorically to form entire lexicons (Crossley et al., 2009; Haastrup & Henriksen, 2000; Huckin & Coady, 1999). From an acquisitional perspective, as L2 learners develop lexical proficiency, they build lexical networks that are strengthened by differentiating relations between words and within words (Haastrup & Henriksen, 2000). The word properties associated with lexical networks are discussed in detail in the following sections.

Conceptual levels. Conceptual levels refer to connections between general and specific lexical items (Chaffin & Glass, 1990; Haastrup & Henriksen, 2000) that permit the economical representation of lexical properties (Chaffin & Glass, 1990; Murphy, 2004) and lexical generalization (Murphy, 2004). These connections are the hierarchical associations between hypernyms (superordinate words) and hyponyms (subordinate words). In lexical network models, hypernymy is considered an important organizational system for lexical relations (Miller & Teibel, 1991) because it allows for hierarchical categorizations that define how hyponyms inherit properties from their related hypernyms, and thus enables set inclusion among category members.

From an L1 developmental perspective, hypernymic relations tend to be acquired as learners advance cognitively (Anglin, 1993; Snow, 1988), as they increase their levels of education (LeVine, 1980; Snow, 1990), and as they acquire more specific lexical knowledge (Wolter, 2001). In L2 studies, research indicates that learners produce more frequent words of general meaning than of specific meaning (Levenston & Blum, 1977) and that this production of general words by L2 learners produces inappropriate overgeneralizations (Ijaz, 1986). More recent studies have demonstrated that L2 learners produce more words that are less specific as time is spent studying English (Crossley, Salsbury, & McNamara, 2009). In addition, hypernymy is an important element of early L2 noun production in that produced nouns are more specific (Crossley & Salsbury, 2010). Hypernymy is also an important element of human judgments of lexical competence with language samples judged to be of lower lexical quality containing more specific words (Crossley et al., 2011, in press).


Word associations. Word associations relate to how many connections a word has with other words in a speaker’s lexicon (Toglia & Battig, 1978). Such associations mediate the organization and storage of words in the mental lexicon (Ellis & Beaton, 1993). Words with high associations invoke multiple links to other words (e.g. food, music, people), while those with lower associations invoke fewer links (e.g. chance, fault, soul). Words with more associations are thought to be acquired early by L2 learners (Ellis & Beaton, 1993).

The link between word associations and lexical acquisition is supported in several recent studies. Zareva (2007) found that higher proficiency learners provide significantly more word associations than intermediate and beginning level learners. She argued that larger vocabularies allow for a greater number of word associations. Additional studies (Crossley & Salsbury, 2010; Salsbury, Crossley, & McNamara, 2011) have explored word association scores for the individual words produced by learners. These studies have found that word association scores decrease as a function of increasing language proficiency with higher proficiency learners producing more difficult words that have fewer associations. Such a finding implies that advanced learners develop stronger networks that permit the acquisition of words with fewer associations (i.e. less meaningful words). In general, studies support the notion that advanced learners’ networks are more densely connected (Meara, 2005a).

Polysemy. Polysemous words are words that have more than one related sense and are, thus, potentially more ambiguous. Because polysemy connects different word senses, it is related to conceptual organization. Thus, when words have multiple related senses, their meanings overlap within the same conceptual structure (Murphy, 2004). From a network perspective, polysemous relations are built on a single lexical entry that contains all the multiple senses for the word. Such an approach suggests that having separate entries for related word senses is uneconomical because more storage space would be used and it would fail to capture the sense connections in the word’s uses (Nunberg, 1979; Pustejovsky, 1995; Verspoor & Lowie, 2003). Additionally, positioning a word’s senses within a single lexical item would allow meaning relationships between a word’s senses to be more efficiently recognized (Verspoor & Lowie, 2003).

Studies concerning the polysemy knowledge of L2 learners have found that word sense knowledge increases as L2 learners gain proficiency (Schmitt, 1998). In addition, as language proficiency increases, L2 learners produce words that are more polysemous (i.e. words that contain more senses) as well as producing a greater number of meanings for individual words (Crossley et al., 2010a). These studies provide evidence for the development of word sense relations in L2 learners.

Semantic similarity. Semantic similarity measures how words are related at levels beyond root morphology. For example, synonyms are rarely related morphologically (compare cat and feline), but are semantically similar and serve comparable functions. Likewise, the words tail, fur, claw, and whisker are all unrelated morphologically, but they are all connected semantically to the concept of cat through meronomy. However, words need not be related conceptually to demonstrate semantic similarity. The words cat and mouse are more closely linked than the words dog and mouse by virtue of their real world relationship. Associations such as these, which are based on semantic similarity, are reflected in lexical network models (Haastrup & Henriksen, 2000). Recent studies have demonstrated that L2 learners’ utterances develop stronger semantic links over time (i.e. L2 learners use a greater number of semantically related words) supporting the notion that as learners acquire a lexicon, the semantic properties of their utterances become more similar (Crossley et al., 2010b).

Access to lexical items

The rate at which learners can retrieve or process known words relates to the accessibility of lexical items in a learner’s lexicon. Core lexical items are generally those words that can be retrieved and processed the quickest (Meara, 2005a). While we adapt this notion from Meara, we examine the construct in a different manner. Meara tested word accessibility through hidden strings of letters. In contrast, we examine access to lexical items based on word properties such as concreteness, familiarity, and imagability. Since these word properties permit words to be recognized, recalled, and retrieved more quickly, we associate these properties with the accessibility of lexical items under the hypothesis that core lexical items will be more concrete, imagable, and familiar (Crossley & Salsbury, 2010; Crossley, Salsbury, McNamara, & Jarvis, 2011). These properties are discussed in more detail below.

Word concreteness. Word concreteness refers to here-and-now concepts, ideas, and things (Gilhooly & Logie, 1980; Toglia & Battig, 1978; Paivio, Yuille, & Madigan, 1968). The concreteness of a word has implications for that word’s learnability because concrete words, as compared to abstract words, have advantages in tasks involving recall, word recognition, lexical decision tasks, pronunciation, and comprehension (Gee, Nelson, & Krawczyk, 1999; Paivio, 1991). In L2 lexical acquisition, studies have demonstrated that concrete words are learned earlier (Crossley, Salsbury, & McNamara, 2009; Salsbury et al., 2011) and more easily than abstract words (Ellis & Beaton, 1993).

Word familiarity. Word familiarity has often been interpreted as a measure of word exposure because words that are rated as more familiar are recognized more quickly; however, the exact processes involved in human ratings of word familiarity are unclear (Stadthagen-Gonzalez & Davis, 2006). Word familiarity ratings do differ from indices of word frequency in that they are better predictors of word performance (Gernsbacher, 1984; Connine, Mullennix, Shernoff, & Yelen, 1990), indicating that familiar words are more recognizable than frequent words. A variety of studies have demonstrated that word familiarity ratings have stronger links to word exposure than written frequency measures (i.e. CELEX and the BNC) and thus likely better reflect spoken word frequency (demonstrating a bias towards natural exposure; Stadthagen-Gonzalez & Davis, 2006). Researchers have also argued that word familiarity indices may include a semantic component (Balota, Pilotti, & Cortese, 2001), reflect the familiarity of sublexical spelling-sound correspondence (Toglia & Battig, 1978), or correspond to the age of acquisition of lexical items (Brown & Watson, 1987). An index of word familiarity has also been reported as a significant predictor of human judgments of spoken lexical proficiency (Crossley et al., in press).


Word imagability. Highly imagable words trigger mental images quickly, which allows the words to be swiftly recalled. Thus, highly imagable words likely constitute core lexical items in a learner’s lexicon because of links between cue properties and concept imagability. More imagable words facilitate lexical acquisition in L2 learners because they afford greater context availability (Schwanenflugel, 1991) and because they are experienced and analyzed visually (Ellis & Beaton, 1993). Ellis and Beaton (1993) also found that highly imagable words were strong candidates in keyword learning techniques for second language learners, while Salsbury et al. (2011) found that highly imagable words were acquired early by L2 learners and Crossley et al. (2011) reported that word imagability was a significant predictor of human judgments of spoken lexical proficiency.

In summary, while there is no clear definition for lexical competence, most researchers agree that it comprises breadth of lexical knowledge, depth of lexical knowledge, and access to core lexical items. Lexical features related to each of these components have demonstrated important predictive potential in assessing the lexical competence of L2 learners.

Method

Our goal is to determine which lexical features related to lexical competence best explain L2 proficiency level categorization. Thus, we analyze a corpus of written texts produced at three different L2 proficiency levels (beginning, intermediate, and advanced) using depth and breadth of knowledge lexical indices along with indices related to the accessibility of lexical items taken from the computational tool Coh-Metrix. We use analysis of variance (ANOVA) to examine differences between the three language proficiency groups and a discriminant function analysis to assess the indices that best discriminate between the groups.

Corpus collection

We collected written texts from 100 L2 learners using a cross-sectional approach. All the texts used in this study were unstructured freewrites. We selected freewrites so that the topic did not control the lexical output of the students and so the texts would better reflect the lexical knowledge of the participants. During data collection, all students were given 15 minutes to freewrite on a topic of their choosing. The freewrites collected were not designed to help develop an essay, but were individual texts in their own right and meant to be representative of the writers’ lexical knowledge. All original texts were handwritten by the participants and later entered electronically by the researchers. All participants were L2 learners studying English in the United States at intensive language programs at two large US universities. The L2 learners ranged in age from 17 to 34 years old and came from 19 different first language (L1) backgrounds (Arabic, Bambara, Bangla, Chinese, Dutch, French, German, Hindi, Ibo, Japanese, Korean, Farsi, Portuguese, Russian, Spanish, Thai, Turkish, Vietnamese, and Yoruba).

Text samples were separated at the paragraph level. The samples were controlled for text length by randomly selecting text segments of about 150 words from each sample depending on paragraph constraints. All texts were corrected for spelling. We divided the corpus into two sets: a training set (n = 67) and a testing set (n = 33). The purpose of the training set was to identify which of the Coh-Metrix variables demonstrated significant differences among the learner level groupings. These variables were later used to classify the texts in the training set using a discriminant function analysis. Later, the texts in the test set were classified using the model from the training set to examine the predictability of the variables in an independent corpus (Witten & Frank, 2005).
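As a rough illustration of this split only (the study does not describe its tooling), the sketch below divides a hypothetical table of Coh-Metrix scores for the 100 texts into a 67-text training set and a 33-text test set; the file name and column layout are assumptions.

```python
# Illustrative sketch only; the study reports a random 67/33 split of 100 texts.
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame: one row per learner text, Coh-Metrix indices as columns,
# plus a 'level' column holding beginning/intermediate/advanced.
corpus = pd.read_csv("freewrites_cohmetrix.csv")  # assumed file name

train_set, test_set = train_test_split(corpus, test_size=33, random_state=0)
print(len(train_set), len(test_set))  # 67 33
```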

Level classification

Participants in the L2 writing sample collection were administered one of two proficiency tests for placement and evaluation purposes in their respective intensive English programs. One group of L2 learners were administered either the Institutional TOEFL (paper-based test: PBT) or the TOEFL internet-based test (iBT). The second group of students were administered the ACT ESL Compass reading and grammar tests.

The TOEFL PBT, TOEFL iBT, and ACT ESL Compass reading scores were used to classify the L2 writers into beginning, intermediate, and advanced categories. In the case of both versions of the TOEFL, we used the total score on the exam in order to facilitate the comparison of the two exams. Comparisons are available through the Educational Testing Service (ETS) website for total scores but not for sub-scores. In establishing our three broad proficiency categories for the TOEFL scores we referred to Wendt and Woo (2009) and Boldt, Larsen-Freeman, Reed, and Courtney (1992). However, no comparisons are available between the TOEFL tests and the ACT ESL Compass test. Thus, we relied on the test maker’s suggested proficiency levels and descriptors in classifying the ACT group of students. The classification of the L2 writers is as follows: L2 writers who scored 400 or below on the TOEFL PBT, 32 or below on the TOEFL iBT, or 126 or below on the combined Compass ESL reading/grammar tests were classified as beginning level. L2 writers who scored between 401 and 499 on the TOEFL PBT, 33 and 60 on the TOEFL iBT, or 127 and 162 on the combined Compass ESL reading/grammar tests were classified as intermediate level. L2 writers who scored 500 or above on the TOEFL PBT, 61 or above on the TOEFL iBT, or 163 or above on the combined Compass ESL reading/grammar tests were classified as advanced level. Such classifications have been used in similar studies concerning lexical proficiency (Crossley et al., 2010, in press).
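These cut points can be restated as a small helper; the function below is a hypothetical illustration of the mapping (the function and test names are ours, not part of the study's tooling), with the thresholds taken directly from the text above.

```python
def classify_level(test: str, score: int) -> str:
    """Map a score to the study's proficiency bands (thresholds from the text)."""
    # Upper bounds: (beginning, intermediate); anything above is advanced.
    cutoffs = {
        "toefl_pbt": (400, 499),
        "toefl_ibt": (32, 60),
        "compass": (126, 162),   # combined ESL reading/grammar score
    }
    beginning_max, intermediate_max = cutoffs[test]
    if score <= beginning_max:
        return "beginning"
    if score <= intermediate_max:
        return "intermediate"
    return "advanced"

print(classify_level("toefl_ibt", 45))   # intermediate
print(classify_level("toefl_pbt", 510))  # advanced
```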

Because of the cross-sectional approach used in this study, each learner level (i.e. beginning, intermediate, and advanced) was represented by an unequal number of texts. We collected 37 texts from beginning level learners, 30 texts from intermediate learners, and 33 texts from advanced learners.

Variable selection

The lexical indices used in this study were taken from Coh-Metrix. The indices were separated into breadth of knowledge measures, depth of knowledge measures, and measures that examine the accessibility of lexical items. Those measures related to breadth of knowledge include lexical diversity and word frequency. The measures related to depth of knowledge include hypernymy, polysemy, semantic co-referentiality, and word associations. The measures that examine the accessibility of lexical items include word concreteness, word imagability, and word familiarity. For this study, we selected all indices reported by Coh-Metrix that are associated with these measures and placed them into their respective categories.

For each measure, we used an ANOVA to select the index with the highest significant partial eta-squared value for use in the discriminant function analysis (DFA). We selected a ratio of 15 to 1 between the cases (the L2 texts) and the variables (the lexical indices). A 15 to 1 ratio allows for the interpretation of the discriminant function coefficients as predictors for each variable’s individual contribution to the discriminant function (Field, 2005). Such a ratio allowed us to use 4 lexical variables in our training set analysis. To check for multicollinearity, we conducted tolerance tests on the selected variables. If the variables did not exhibit collinearity, they were then used in the discriminant function analysis. The lexical measures reported by Coh-Metrix and their respective indices are discussed briefly below. A fuller description of these indices can be found in Crossley and McNamara (2009), Crossley and Salsbury (2010), and McCarthy and Jarvis (2010).
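A minimal sketch of this selection step is given below, assuming the training texts and index scores sit in the hypothetical train_set frame from the earlier sketch. For a one-way design, eta-squared and partial eta-squared coincide, and a simple Pearson-correlation screen (> .70, as in the collinearity check reported later) stands in for the tolerance tests; none of this is the authors' code.

```python
# Illustrative variable selection: one-way ANOVA per index, ranked by
# eta-squared, then screened for intercorrelation (not the authors' code).
import pandas as pd
from scipy import stats

def anova_eta_squared(frame: pd.DataFrame, index: str, group: str = "level"):
    """Return (F, p, eta-squared) for a one-way ANOVA across the levels."""
    groups = [g[index].to_numpy() for _, g in frame.groupby(group)]
    f_val, p_val = stats.f_oneway(*groups)
    grand_mean = frame[index].mean()
    ss_total = ((frame[index] - grand_mean) ** 2).sum()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    return f_val, p_val, ss_between / ss_total

candidates = [c for c in train_set.columns if c != "level"]
results = {c: anova_eta_squared(train_set, c) for c in candidates}
ranked = sorted((c for c in results if results[c][1] < .05),
                key=lambda c: results[c][2], reverse=True)

selected = []
for c in ranked:                      # keep at most 4 (15-to-1 case ratio)
    if all(abs(train_set[c].corr(train_set[k])) <= .70 for k in selected):
        selected.append(c)
    if len(selected) == 4:
        break
print(selected)
```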

Measures

Word frequency. Coh-Metrix reports average frequency counts for the majority of the individual words in the text using CELEX (Baayen, Piepenbrock, & Gulikers, 1995). The indices computed report values for content words and for all words combined, with values for written words and spoken words.

Lexical diversity. The lexical diversity values reported by Coh-Metrix include M (Michea, 1969), MTLD (McCarthy & Jarvis, 2010), and D (Malvern et al., 2004). M, unlike MTLD and D, reports lexical diversity on a reverse scale.
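As one concrete example of how a lexical diversity index is computed, the sketch below implements a simplified MTLD (McCarthy & Jarvis, 2010): the text is traversed, a factor is counted each time the running type-token ratio falls to 0.72, and the token count is divided by the factor count, averaged over forward and backward passes. Tokenization here is naive and the Coh-Metrix implementation may differ; note that the index retained in the later analyses, M, is a different, reverse-scaled measure.

```python
# Simplified MTLD sketch; Coh-Metrix's implementation and tokenization may
# differ, and this is not the reverse-scaled M index used in the analyses.
def _mtld_pass(tokens, threshold=0.72):
    factors, types, count = 0.0, set(), 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count <= threshold:
            factors += 1          # a full factor is complete; reset
            types, count = set(), 0
    if count:                     # partial factor for the leftover segment
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    return len(tokens) / factors if factors else float("inf")

def mtld(text: str) -> float:
    tokens = text.lower().split()
    return (_mtld_pass(tokens) + _mtld_pass(tokens[::-1])) / 2

sample = "my family is big family i have one elder sister and two younger sister"
print(mtld(sample))
```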

Polysemy. Coh-Metrix reports the mean polysemy values for all content words in a text as well as an index based on the standard deviation of the polysemy values. The polysemy values used in Coh-Metrix are taken from WordNet (Fellbaum, 1998; Miller, Beckwith, Fellbaum, Gross, & Miller, 1993).

Hypernymy. Coh-Metrix also measures hypernymy for both nouns and verbs using WordNet. The hypernymy scale used in Coh-Metrix is reversed such that a lower value reflects an overall use of less specific words, while a higher value reflects an overall use of more specific words.
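The sketch below approximates these two WordNet-based measures with NLTK's WordNet interface: polysemy as the number of senses listed for a word, and hypernymy as the depth of a word's hypernym path (deeper = more specific). This is only an approximation under stated assumptions; the Coh-Metrix indices handle part of speech, sense selection, and averaging in their own way.

```python
# Rough WordNet-based approximations of polysemy and hypernymy values;
# the Coh-Metrix computations may differ in detail.
from statistics import mean
from nltk.corpus import wordnet as wn   # requires the NLTK WordNet data

def polysemy(word: str, pos: str = wn.NOUN) -> int:
    """Number of WordNet senses for the word (0 if not in WordNet)."""
    return len(wn.synsets(word, pos=pos))

def hypernymy_depth(word: str, pos: str = wn.NOUN) -> int:
    """Hypernym-path depth of the first sense (higher = more specific)."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0].max_depth() if synsets else 0

nouns = ["sister", "family", "mirror", "taillight"]   # from the writing samples
print(mean(polysemy(n) for n in nouns))
print(mean(hypernymy_depth(n) for n in nouns))
```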

Semantic co-referentiality. Coh-Metrix measures semantic co-referentiality using Latent Semantic Analysis (LSA; Landauer, McNamara, & Kintsch, 2007). Coh-Metrix reports LSA values between adjacent sentences and between all sentences.

Word concreteness, familiarity, imagability, and meaningfulness. Coh-Metrix calculates values for word concreteness, familiarity, imagability, and meaningfulness for all words as well as for only content words. These values are all based on human word judgments taken from the MRC Psycholinguistic Database (Wilson, 1988).3


Results

ANOVA analysis

To examine group differences between variables in the selected measures, we conducted an Analysis of Variance (ANOVA). The ANOVA examined 24 variables: three hypernymy indices, two polysemy indices, two LSA indices, three lexical diversity indices, two concreteness indices, two imagability indices, two familiarity indices, two meaningfulness indices, and six frequency indices.

We selected the four variables that demonstrated the highest partial eta-squared values when comparing differences between the L2 writers’ proficiency levels (beginning, intermediate, and advanced) and that were not highly correlated with one another. These variables were then used in a discriminant function analysis. Three measures (LSA, polysemy, and hypernymy) did not yield any indices that demonstrated significant differences between the groups and were not included in the discriminant function. The partial eta-squared values for imagability every word (.459), concreteness every word (.414), and meaningfulness every word (.366) demonstrated large effect sizes. The partial eta-squared values for content written word frequency (.290), Maas lexical diversity (.250), and word familiarity content words (.207) demonstrated medium effect sizes (Cohen, 1992). The means, standard deviations, F values, p values, and partial eta-squared values for these variables are presented in Table 1.

Table 1. Multivariate results for the three proficiency levels: Means (standard deviations), F values, and partial eta-squared (ηp²) values

                                         Beginner           Intermediate       Advanced           F(2, 66)   ηp²
Word imagability, every word             371.736 (20.024)   345.924 (14.226)   339.807 (11.449)   27.178     0.459
Word concreteness, every word            341.974 (17.510)   319.181 (13.746)   315.935 (11.276)   22.590     0.414
Word meaningfulness, every word          394.992 (16.361)   378.970 (15.623)   368.879 (12.856)   18.454     0.366
Content word frequency, written texts    1.817 (0.256)      1.636 (0.296)      1.441 (0.210)      13.100     0.290
Lexical diversity M*                     270.000 (60.000)   220.000 (30.000)   210.000 (20.000)   10.658     0.250
Word familiarity, content words          592.132 (6.464)    586.659 (5.636)    585.130 (6.413)    8.369      0.207

*The LD values reported by M are reverse scaled so that higher values correspond to lower LD.

Collinearity

Pearson correlations demonstrated that the word concreteness and word meaningfulness indices highly correlated (> .70) with the word imagability value. Because the word concreteness and meaningfulness values had lower effect sizes with the proficiency levels as compared to the imagability values, the word concreteness and meaningfulness indices were dropped from the discriminant function analysis. Thus, our four predictor variables for the analysis that did not demonstrate multicollinearity were word imagability, word frequency, lexical diversity, and word familiarity.

Discriminant function analysis: Training set

To examine which individual lexical indices best discriminated between language proficiency levels, all four selected variables were entered into a discriminant function analysis. Unlike the ANOVA analysis, the discriminant function analysis provided an estimate of the relative importance of each of the indices in separating the language proficiency levels when examined simultaneously (Field, 2005; Meyers, Gamst, & Guarino, 2006). In this study, we are interested in four features of the discriminant analysis: the eigenvalues, the Wilks’s Lambda, the classification results, and the discriminant function coefficients. The eigenvalues give the percentage of the separation among the groups explained by each function (because there are three groups there will be two functions). The Wilks’s Lambda tests whether the functions are significant. The classification result assigns each writing sample into one of the three groups (beginning, intermediate, and advanced language proficiency). The discriminant function coefficients demonstrate the contribution that each lexical index makes in predicting the dependent variable (the proficiency levels). The larger the coefficient, the greater the contribution of that variable to the discrimination between groups.
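A hedged sketch of this analysis follows, using scikit-learn's LinearDiscriminantAnalysis as a stand-in for the statistical package the authors used (which is not named); the predictor column names refer back to the hypothetical train_set frame above. Wilks's Lambda and the chi-square tests reported below are not produced by scikit-learn and would have to be computed separately.

```python
# Illustrative DFA stand-in; not the authors' software or exact procedure.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

predictors = ["word_imagability", "word_frequency",
              "lexical_diversity", "word_familiarity"]   # hypothetical names

lda = LinearDiscriminantAnalysis()
lda.fit(train_set[predictors], train_set["level"])

# Proportion of between-group separation carried by each of the two functions.
print(lda.explained_variance_ratio_)

# Weights of each predictor on each discriminant function.
print(lda.scalings_)

# Classification of the training texts and overall classification accuracy.
train_predictions = lda.predict(train_set[predictors])
print((train_predictions == train_set["level"]).mean())
```

Classifying the held-back test set would then simply reuse the fitted model, for example lda.predict(test_set[predictors]).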

The first function explained 96.1% of the separation among the groups. The Wilks’s Lambda for the first function was significant, Λ = .412, χ2 (8) = 55.494, p < .001. The second function explained only 3.9% of the separation between the groups. The Wilks’s Lambda for the second function was not significant, Λ = .949, χ2 (3) = 3.267, p > .05. Because the second function explained only 3.9% of the variance and was not significant, our subsequent analysis and discussion will focus on the first function only.

The classification results for the discriminant function analysis correctly classified 70.1% of the writing samples as beginner, intermediate or advanced, χ2 (4) = 47.465, p < .001 (see Table 2). The reported Kappa value comparing the actual classification to the classification predictions made by the DFA was 0.6483, indicating a substantial agreement. All levels were classified at a similar rate with texts written by advanced level L2 learners reporting the best classification results. All results were well above chance (i.e. above 33%).

The discriminant function coefficients (DFC) for the discriminant analysis correspond to the partial contributions of each variable in the discriminant function. All the variables in this analysis made contributions to the discriminant function. The analysis demonstrated that the word imagability index contributed the most to separating the groups (DFC = .733). Word imagability was followed by the lexical diversity index (DFC = .311), the word frequency index (DFC = .286), and the word familiarity index (DFC = .224).

We also report results in terms of recall and precision. Recall scores compute the percentage of items (freewrites) in each category (proficiency level) that are successfully retrieved. Recall is computed by counting the number of hits (correct predictions) over the number of hits + misses (incorrect predictions). Precision scores compute the percentage of items in each category that are relevant. Precision is the number of correct predictions divided by the sum of the number of correct predictions and false positives. These scores are important because an algorithm could predict everything to be a member of a single group and score 100% in terms of recall. However, it could only do so by claiming members of the other group. If this were the case, the algorithm would score low in terms of precision. By reporting both values, we can better understand the accuracy of the model. The accuracy of the model for predicting text level is provided in Table 3. The combined accuracy for precision and recall scores (F1)4 for the training set was .70.

Table 3. Precision, recall, and F1 values for training and test set

Paragraph set    Precision   Recall   F1
Training set
  Beginner         0.895      0.680   0.773
  Intermediate     0.500      0.684   0.578
  Advanced         0.773      0.739   0.756
Test set
  Beginner         0.846      0.917   0.880
  Intermediate     0.625      0.455   0.526
  Advanced         0.583      0.700   0.636
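As a check on these definitions, the short sketch below recomputes the training-set precision, recall, and F1 values in Table 3 directly from the Table 2 counts; it is plain Python and implies nothing about the authors' tooling.

```python
# Recomputing the training-set rows of Table 3 from the Table 2 counts.
confusion = {                       # rows: actual level, columns: predicted level
    "beginner":     {"beginner": 17, "intermediate": 7,  "advanced": 1},
    "intermediate": {"beginner": 2,  "intermediate": 13, "advanced": 4},
    "advanced":     {"beginner": 0,  "intermediate": 6,  "advanced": 17},
}
levels = list(confusion)

for level in levels:
    hits = confusion[level][level]
    recall = hits / sum(confusion[level].values())                        # hits / (hits + misses)
    precision = hits / sum(confusion[actual][level] for actual in levels) # hits / (hits + false positives)
    f1 = 2 * precision * recall / (precision + recall)
    print(level, round(precision, 3), round(recall, 3), round(f1, 3))

# beginner 0.895 0.68 0.773
# intermediate 0.5 0.684 0.578
# advanced 0.773 0.739 0.756
```

The Kappa values reported above could likewise be recovered from the per-text labels (for example, with sklearn.metrics.cohen_kappa_score).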

Discriminant function analysis: Test set

To investigate the strength of the model on an independent data sample, we used the model from the DFA to classify the written texts in the held back test set. The classification results for the test set correctly classified 69.7% of the writing samples as beginner, intermediate or advanced, χ2 (4) = 24.175, p < .001 (see Table 2). The reported Kappa value comparing the actual classification to the classification predictions made by the DFA was 0.637, indicating a substantial agreement. Unlike the results from the training set, all levels were not classified at a similar rate. Beginner texts were correctly classified by the model with a 92% accuracy. Intermediate texts were correctly classified with only a 46% accuracy. Advanced texts were correctly classified with a 70% accuracy. All results were above chance though (i.e. above 33%). We also report on the precision, recall, and accuracy of the model for the test set (see Table 3). The combined accuracy for precision and recall scores (F1) for the test set was .68.

Table 2. Classification results

                               Predicted membership
                          Beginner   Intermediate   Advanced   Total
Training set
  Count       Beginner        17           7            1        25
              Intermediate     2          13            4        19
              Advanced         0           6           17        23
  Percentage  Beginner      68.00       28.00         4.00      100
              Intermediate  10.50       68.40        21.10      100
              Advanced       0          26.10        73.90      100
Test set
  Count       Beginner        11           1            0        12
              Intermediate     1           5            5        11
              Advanced         1           2            7        10
  Percentage  Beginner      91.70        8.30         0         100
              Intermediate   9.10       45.50        45.50      100
              Advanced      10.00       20.00        70.00      100

Discussion

The results of this study indicate that automated, lexical indices can be used to predict the language proficiency levels of L2 learners based on their writing samples. These results provide strong evidence to support the link between lexical competence and language proficiency (Read, 2000; Zareva et al., 2005) and demonstrate the strength of various breadth of knowledge indices and indices that examine lexical accessibility for predicting the language proficiency levels of individual L2 learners, especially at the beginning and the advanced levels. By proxy, these findings also provide evidence for the relative strength of these indices in explaining lexical competence. Specifically, the results show that as proficiency levels in English increase, the L2 writers in this study produce freewrites with less imagable words, less familiar words, and more infrequent words. In contrast, as proficiency levels increase, so does lexical diversity. Taken together, these lexical differences help to explain a significant amount of the variation among the proficiency levels. In discussing our findings, we address each of the examined lexical indices below in reference to their importance in explaining proficiency levels. We then provide text examples to illuminate these differences.

The word property index imagability was the strongest predictor of language proficiency in our L2 learner population. The results indicate that beginning learners produce the most imagable words and that advanced learners produce the least imagable. The effect size between the imagability values and the proficiency levels was large, implying a strong relationship between the two. We argue that the results support the notion that beginning language learners produce more words that are easily accessible because they trigger quicker mental images. Thus, the words produced by beginning level learners are more easily retrievable than those words produced by advanced learners. The accessibility of these highly imagable words is likely a product of contextual learning which strengthens the links between words and concepts.

Our word frequency index was the second most predictive index of proficiency level classification for our subject population and demonstrated a medium effect size. The descriptive statistics for the proficiency levels showed that advanced L2 writers produced more infrequent words as found in written texts than beginning L2 writers. This finding supports theories of lexical proficiency that posit that frequent word forms are produced more often by beginning L2 learners (Bell, 2003; Crossley & Salsbury, 2010; Ellis, 2002; Nation, 1988). This is likely the result of repeated lexical exposure strengthening form-meaning connections and affording lexical production.

Our index of lexical diversity, M, was the third most predictive index of language proficiency classification and demonstrated a medium effect size. The mean values for M demonstrated that advanced L2 writers used greater lexical diversity than beginning level L2 writers with an increasing linear trend as the level of proficiency increased. This finding helps to support the notion that indices of lexical diversity are important indicators of lexical competence and writing quality (Carrell & Monroe, 1993; Malvern et al., 2004; Ransdell & Wengelin, 2003) and that advanced learners produce a greater diversity of words than beginner learners.

Our index of word familiarity was our last predictor of L2 language proficiency and reported a medium effect size. Overall, lower level learners used more familiar words, while advanced level learners produced less familiar words. While both frequency and familiarity are indicators of word exposure, our word familiarity index did not demonstrate multicollinearity with our written word frequency index. This result likely demonstrates that our familiarity index is capturing elements of spoken word frequency (i.e. natural exposure) as compared to written word frequency. Potentially, our familiarity index also possesses lexical elements at the semantic or sublexical level.

Samples taken from the corpus help to illuminate the lexical differences among learners from the different proficiency levels. We present three samples below taken from our data with each level of learner represented (beginning, intermediate, and advanced). We also present the relevant lexical values for these samples as reported by Coh-Metrix in Table 4.

Beginner writing sample (Korean)

My family is big family. I have one elder sister and two younger sister. My elder sister is nurse. She very friendly. She have a one’s child. He name is Jong Youn. Jong Youn is very cute, so we are very happy. Jong Youn is very smiling. He looks good. My one’s younger sister draw very well. She’s name is Su Jeong. Su Jeong is small body. She dream is artist so she is everyday draw. My one’s younger sister name is Suhyun. She is 16 years old. She is student. So everyday study. My father is very busy. My mother too. My mother everyday cooking. Sometimes, my father help them. I have a my husband.

Intermediate writing sample (Japanese)

My home country is Japan, and I was born Tokyo in Japan. Tokyo is a capital city in Japan, so there are many people live in and enjoy their life. I’m Japanese, but I don’t know much about Japan, because I hate history, so I talk about the capital city of Tokyo. My hometown was a small country city, and there were nothing except for a small market and a convenience store. Also, the convenience store was far from my house, it took 30 minutes walk by myself. My friends and me were always played our school ground or a park that near from our house. I think, Tokyo is very famous for have a many entertainments. However, that is only center city of Tokyo. When I tell people who is from other countries, about my countries, they said Tokyo!! I know Tokyo. Although, that is totally different that their image or thought of Tokyo.

Advanced writing sample (Arabic)

I don’t usually drive to the campus, but the other day I woke up really late and I was going to miss my class. I took my morning shower, put on my clothes in five minutes and jumped into my car. After three minutes I arrived to the parking lot next to the Butler building. As expected, it was totally jammed. After circulating the area for more than ten times I managed to squeeze my car between a Mustang and a truck. After class I wanted to go out for a break. I started backing of the parking space looking to left. I just did not want to scratch that beautiful Mustang on my left side. And while I’m staring at it. All of a sudden I heard a crack sound. I looked to right to see that my right side mirror was totally in the truck taillight. I panicked for a moment. That was my first accident.

Comparing the extremes (the beginning and advanced sample) permits a clearer delineation of the lexical differences between the levels. The beginner sample contains a variety of highly imagable words (i.e. words that quickly elicit mental images) such as sister, child, smiling, cooking, body, draw, mother, and father. The advanced sample contains less imagable words such as day, late, morning, minutes, lot, area, break, space, sound, panic, and moment. The produced words in the beginning sample text are also more frequent words when compared to the produced words in the advanced sample, which contains many infrequent words such as squeeze, jammed, circulating, and taillight. Many of the words contained in the advanced text sample, as compared to the beginning sample, are also frequent and unfamiliar (e.g. area, than, just, side, while, and moment). Unlike the advanced text sample, the beginning text sample also contains frequent word repetition (e.g. family, sister, she, he, is, mother, and father) and thus less lexical diversity. This same degree of repetition is not found in the advanced sample. As evidenced in the samples, advanced level writers produce a greater variety of words that are more difficult to access (i.e. words that are more difficult to recognize, retrieve, and recall).

Table 4. Coh-Metrix results for selected text samples

                                     Beginning        Intermediate       Advanced
                                     Korean learner   Japanese learner   Arabic learner
Word imagability, every word            408               348                345
Content word frequency, written         2.159             1.935              1.237
Lexical diversity M*                    350               230                200
Word familiarity, content words         599               593                582

*The LD values reported by M are reverse scaled so that higher values correspond to lower LD.

These examples help to exemplify the level differences reported by the lexical indices. In total, these indices classified 70% of the text samples in both our training and test corpus. The accuracy of the model in classifying the texts was higher for the beginning and advanced levels, although overall accuracy for the intermediate texts in the total corpus was 60%. The lower classification accuracy may reflect the transitory nature of the intermediate level learners or perhaps the random sampling used to create the training and test sets. Breadth of knowledge features and core lexical item indices were both important in explaining differences in L2 proficiency. Depth of knowledge features were not a significant predictor of proficiency level, although word meaningfulness scores did demonstrate significant differences among the levels. The strength of these findings gives us increased confidence that lexical indices can distinguish between levels of language proficiency in individual learners. Such a finding also has important implications for L2 language assessment in that lexical indices appear to be highly predictive of general language proficiency as classified by standardized test scores.

In distinguishing among levels of language proficiency, we are also able to explore the construct of lexical competence. Such an argument relies on the notion that language proficiency subsumes lexical competence. Thus, analyzing levels of language proficiency affords us the opportunity to examine the development of lexical competence (Zareva et al., 2005). This study, then, supports the importance of breadth of knowledge features and core lexical items in understanding how lexical competence differs across individuals’ language proficiency levels. The model of lexical competence yielded from our data is multidimensional and distinguishes L2 learners’ proficiency levels based on lexical diversity and the production of imagable, frequent, and familiar words.

Also, unlike past studies that focused on explaining lexical competence as a construct of human evaluations of lexical proficiency (e.g. Crossley et al., 2011, in press), this study focused on the potential for lexical indices to categorize L2 learners based on their general language proficiency. The learner-based approach found in this study provides a different angle from which to investigate lexical competence. The features that contribute to our understanding of lexical competence in this current study overlap with studies focusing on human evaluations of lexical proficiency (e.g. Crossley et al., 2011, in press), providing additional evidence as to the importance of these features in understanding the construct of lexical competence. However, the assigned weights to these features differ. For example, human judgments of lexical proficiency are most strongly predicted through indices of lexical diversity. In contrast, this study demonstrates that lexical competence as a learner-based property is best predicted through an index of word imagability.

Perhaps as important as the variables included in the DFA are those variables which were not included. These included polysemy indices, hypernymy indices and indices of semantic co-referentiality. While this finding does not completely rule out the importance of these theoretical constructs, it does indicate that the indices we used to assess these constructs were not strong predictors of proficiency levels in our L2 population. Thus, as levels increase, there are no significant differences in the reported number of senses assigned to the produced words, the specificity of those words, or the semantic co-referentiality shared among those words.

We also note that many measures selected for this study may require additional enhancement. For instance, the WordNet and MRC Psycholinguistic Database indices used in this study are not completely representative of the entire English lexicon. In addition to concerns with lexical coverage, many of the indices used in this study could be refined to better assess lexical competence. Our polysemy indices, for instance, assess word sense use peripherally (i.e. how many senses a word contains and not which sense a produced word represents) while our indices of semantic co-referentiality assess lexical links at the sentence and paragraph level, but not at the word level. Additionally, the indices derived from the MRC Psycholinguistic database are a reflection of the language experiences of L1 speakers and not L2 speakers. While there is no evidence that the language experiences of the two groups differ in such a manner that the lexical properties of words would differ significantly or that such features are not inherent properties of the words, it is a theoretical consideration. It is also important to note that we did not assess phrasal or syntactic knowledge in the sampled texts. To our knowledge, automatic indices measuring such features have not been developed. Refining the current indices along with developing new indices to assess phrasal and syntactic knowledge would likely increase the accuracy of classification and better explain links between general language proficiency and lexical competence. Any such research replicating or extending the findings of this study would also benefit from a larger corpus that better represented the breadth and diversity of L2 learners.

In conclusion, this study continues an important discussion about how computational indices can be used to measure the language proficiency level of L2 learners as well as their lexical proficiency. The learner-based approach found in this study provides us with an opportunity to consider different aspects of lexical competence as well as providing supporting evidence for the strength of lexical indices to help explain lexical competence. Lastly, this study contributes to developments in automatic language assessments.

Acknowledgments

This research was supported in part by the Institute of Education Sciences (IES R305A080589 and IES R305G20018-02). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the IES. The authors would also like to thank Christie Collins and Richard Raymond at Mississippi State University who provided valuable resources without which this study could not have happened. The authors would also like to thank the three anonymous reviewers who provided critical and welcome feedback on this study.

Notes

1. Word frequency measures also overlap with depth of knowledge measures in some models of lexical acquisition (e.g. Ellis, 2002). In such models, repeated exposure to high-frequency words strengthens the connections between a word and its meaning, providing a connectionist explanation.

2. For a rebuttal of Meara's (2005b) study, please see Laufer (2005).

3. Word concreteness, familiarity, imagability, and meaningfulness indices are calculated using human ratings taken from native speakers of English. These ratings reflect the language experiences of native speakers of English and could potentially differ from judgments made by L2 learners of English. However, there is no evidence to demonstrate that the language experiences of L2 learners produce significantly different word judgments than L1 speakers or that these types of word judgments do not reflect inherent properties of the lexicon.

4. The F1 score is the harmonic mean of the precision and recall scores. It is calculated by multiplying the precision and recall scores, dividing that product by the sum of the precision and recall scores, and multiplying the result by two.
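Stated as a formula (restating the calculation in note 4; the numbers in the example are purely illustrative and are not drawn from the study's results): F1 = 2 × (precision × recall) / (precision + recall). For instance, a precision of .70 and a recall of .75 yield F1 = 2 × (.525 / 1.45) ≈ .72.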

References

Anglin, J. M. (1993). Vocabulary development: A morphological analysis. Monographs of the Society for Research in Child Development, 58(10), 1–166.

Arnaud, P. J. (1992). Objective lexical and grammatical characteristics of L2 written compositions and the validity of separate-component tests. In P. J. Arnaud & H. Bejoint (Eds.), Vocabulary and applied linguistics (pp. 133–145). London: Macmillan.

Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). CELEX. Philadelphia: Linguistic Data Consortium.


Bachman, L. F., Davidson, F. G., Ryan, K., & Choi, I. C. (1995). An Investigation into the Comparability of Two Tests of English as a Foreign Language: The Cambridge TOEFL Comparability Study. Cambridge: UCLES.

Bachman, L. F., & Palmer, A. S. (1982). The construct validation of some components of communicative proficiency. TESOL Quarterly, 16, 449–465.

Balota, D. A., Pilotti, M., & Cortese, M. J. (2001). Subjective frequency estimates for 2,938 mono-syllabic words. Memory & Cognition, 29, 639–647.

Bell, H. (2003). Using frequency lists to assess L2 texts. Unpublished thesis, University of Wales Swansea.

Boldt, R. F., Larsen-Freeman, D., Reed, M. S., & Courtney, R. G. (1992). Distributions of ACTFL ratings by TOEFL score ranges. TOEFL Research Reports, 41.

Brown, G. D. A., & Watson, F. L. (1987). First in, first out: Word learning age and spoken word frequency as predictors of word familiarity and word naming latency. Memory & Cognition, 15, 208–216.

Carrell, P. L., & Monroe, L. B. (1993). Learning styles and composition. The Modern Language Journal, 77, 148–162.

Chaffin, R., & Glass, A. (1990). A comparison of hyponym and synonym decisions. Journal of Psycholinguistic Research, 19(4), 265–280.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.

Connine, C. M., Mullenix, J., Shernoff, E., & Yelen, J. (1990). Word familiarity and frequency in visual and auditory word recognition. Journal of Experimental Psychology: Learning, Memory and Cognition, 16, 1084–1096.

Crossley, S. A., & Salsbury, T. (2010). Using lexical indices to predict produced and not produced words in second language learners. The Mental Lexicon, 5(1), 115–147.

Crossley, S. A., Salsbury, T., & McNamara, D. S. (2009). Measuring L2 lexical growth using hypernymic relationships. Language Learning, 59(2), 307–334.

Crossley, S. A., Salsbury, T., & McNamara, D. S. (2010a). The development of polysemy and frequency use in English second language speakers. Language Learning, 60(3), 573–605. DOI: 10.1111/j.1467-9922.2010.00568.x.

Crossley, S. A., Salsbury, T., & McNamara, D. S. (2010b). The development of word co-referentiality in second language speakers: A case for Latent Semantic Analysis. Vigo International Journal of Applied Linguistics, 7, 55–74.

Crossley, S. A., Salsbury, T., McNamara, D. S., & Jarvis, S. (2011). What is lexical proficiency? Some answers from computational models of speech data. TESOL Quarterly, 45(1), 182–193.

Crossley, S. A., Salsbury, T., McNamara, D. S., & Jarvis, S. (in press). Predicting lexical proficiency in language learners using computational indices. Language Testing. DOI: 10.1177/026553221037803.

Daller, H., van Hout, R., & Treffers-Daller, J. (2003). Lexical richness in the spontaneous speech of bilinguals. Applied Linguistics, 24(2), 197–222.

Ellis, N. C. (2002). Frequency effects in language processing. Studies in Second Language Acquisition, 24(2), 143–188.

Ellis, N. C., & Beaton, A. (1993). Psycholinguistic determinants of foreign language vocabulary acquisition. Language Learning, 43(4), 559–617.

Ellis, R. (1995). Modified oral input and the acquisition of word meanings. Applied Linguistics, 16, 409–435.

Ellis, R., Tanaka, Y., & Yamazaki, A. (1994). Classroom interaction, comprehension, and L2 vocabulary acquisition. Language Learning, 44, 449–491.

Fellbaum, C. (1998). WordNet: An electronic lexical database. Cambridge, MA: MIT Press.

Field, A. (2005). Discovering statistics using SPSS. London: Sage Publications.


de la Fuente, M. J. (2002). Negotiation and oral acquisition of L2 vocabulary: The roles of input and output in the receptive and productive acquisition of words. Studies in Second Language Acquisition, 24, 81–112.

Gee, N. R., Nelson, D. L., & Krawczyk, D. (1999). Is the concreteness effect a result of underlying network interconnectivity? Journal of Memory and Language, 40, 479–497.

Gernsbacher, M. A. (1984). Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy. Journal of Experimental Psychology: General, 113, 256–281.

Gilhooly, K. J., & Logie, R. H. (1980). Age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1,944 words. Behavior Research Methods & Instrumentation, 12(4), 395–427.

Haastrup, K., & Henriksen, B. (2000). Vocabulary acquisition: Acquiring depth of knowledge through network building. International Journal of Applied Linguistics, 10(2), 221–240.

Henriksen, B. (1999). Three dimensions of vocabulary development. Studies in Second Language Acquisition, 21, 303–317.

Huckin, T., & Coady, J. (1999). Incidental vocabulary acquisition in a second language. Studies in Second Language Acquisition, 21(2), 181–193.

Ijaz, I. H. (1986). Linguistic and cognitive determinants of lexical acquisition in a second language. Language Learning, 36(4), 401–451.

Landauer, T. K., McNamara, D. S., Dennis, S., & Kintsch, W. (Eds.). (2007). LSA: A road to meaning. Mahwah, NJ: Lawrence Erlbaum.

Laufer, B. (2005). Lexical frequency profiles: From Monte Carlo to the real world: A response to Meara (2005). Applied Linguistics, 26(4), 582–588.

Laufer, B., & Goldstein, Z. (2004). Testing vocabulary knowledge: Size, strength, and computer adaptiveness. Language Learning, 54, 469–523.

Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16(3), 307–322.

Laufer, B., & Nation, P. (1999). A vocabulary size test of controlled productive ability. Language Testing, 16, 33–51.

Levenston, E., & Blum, S. (1977). Aspects of lexical simplification in the speech and writing of advanced adult learners. In P. S. Corder & E. Roulet (Eds.), The notions of simplification, interlanguages and pidgins and their relation to second language pedagogy (pp. 51–72). Librairie Droz.

LeVine, R. A. (1980). Influences of women’s schooling on maternal behavior in the third world. Comparative Education Review, 24, 78–105.

Malvern, D. D., Richards, B. J., Chipere, N., & Duran, P. (2004). Lexical diversity and language development: Quantification and assessment. Basingstoke, UK: Palgrave Macmillan.

McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophis-ticated approaches to lexical diversity assessment. Behavior Research Methods, 42, 381–392.

McNamara, D. S., Louwerse, M. M., McCarthy, P. M., & Graesser, A. C. (2010). Coh-Metrix: Capturing linguistic features of cohesion. Discourse Processes, 47, 292–330.

McNamara, D. S., & Graesser, A. C. (in press). Coh-Metrix: An automated tool for theoretical and applied natural language processing. In P. M. McCarthy & C. Boonthum (Eds.), Applied natural language processing and content analysis: Identification, investigation, and resolu-tion. Hershey, PA: IGI Global.

Meara, P. (1996). The dimensions of lexical competence. In G. Brown, K. Malmkjaer & J. Williams (Eds.), Performance and competence in second language acquisition (pp. 35–53). Cambridge: Cambridge University Press.

Meara, P. (2005a). Designing vocabulary tests for English, Spanish and other languages. In C. Butler, S. Christopher, M. Á. Gómez González & S. M. Doval-Suárez (Eds.), The dynamics of language use (pp. 271–285). Amsterdam: John Benjamins.


Meara, P. M. (2005b). Lexical frequency profiles: A Monte Carlo analysis. Applied Linguistics, 26(1), 32–47.

Meara, P. M., & Bell, H. (2001). P_Lex: A simple and effective way of describing the lexical characteristics of short L2 texts. Prospect, 16(3), 5–19.

Meyers, L. S., Gamst, G., & Guarino, A. J. (2006). Applied multivariate research: Design and interpretation. Thousand Oaks, CA: Sage Publications.

Michea, R. (1969). Répétition et variété dans l'emploi des mots. Bulletin de la Société de Linguistique de Paris, 1–24.

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1993). Five papers on WordNet. Cognitive Science Laboratory, 43. Princeton, NJ: Princeton University.

Miller, G. A., & Teibel, D. A. (1991). A proposal for lexical disambiguation. In M. P. Marcus (Ed.), Human Language Technology Conference: Proceedings of the workshop on speech and natural language (pp. 395–399). Pacific Grove, California: Association for Computational Linguistics.

Murphy, G. L. (2004). The big book of concepts. Cambridge, MA: MIT Press.

Nation, I. S. P. (1988). Word Lists. Victoria: University of Wellington Press.

Nation, I. S. P. (1990). Teaching and learning vocabulary. New York: Newbury House.

Nation, P., & Heatley, A. (1996). VocabProfile, WORD and RANGE: Programs for processing text. LALS, Victoria University of Wellington.

Nunberg, G. (1979). The non-uniqueness of semantic solutions: Polysemy. Linguistics and Philosophy, 3, 143–184.

Oller, J. W. (1979). Language tests at school: A pragmatic approach. London: Longman.

Paivio, A. (1991). Dual coding theory: Retrospect and current status. Canadian Journal of Psychology, 45, 255–287.

Paivio, A., Yuille, J. C., & Madigan, S. (1968). Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology Monograph Supplement, 76, 1–25.

Pustejovsky, J. (1995). The generative lexicon. Cambridge, MA: MIT Press.

Qian, D. D. (1999). Assessing the roles of depth and breadth of vocabulary knowledge in reading comprehension. Canadian Modern Language Review, 56, 282–308.

Qian, D. D., & Schedl, M. (2004). Evaluation of an in-depth vocabulary knowledge measure for assessing reading performance. Language Testing, 21(1), 28–52.

Ransdell, S., & Wengelin, Å. (2003). Socioeconomic and sociolinguistic predictors of children's L2 and L1 writing quality. Arob@se, 1–2, 22–29.

Read, J. (1998). Validating a test to measure depth of vocabulary knowledge. In A. Kunnan (Ed.), Validation in language assessment (pp. 41–60). Mahwah, NJ: Lawrence Erlbaum.

Read, J. (2000). Assessing vocabulary. New York: Cambridge University Press.

Salsbury, T., Crossley, S. A., & McNamara, D. S. (2011). Psycholinguistic word information in second language oral discourse. Second Language Research, 27(3), 343–360.

Schmitt, N. (1998). Tracking the incremental acquisition of a second language vocabulary: A longitudinal study. Language Learning, 48(2), 281–317.

Schwanenflugel, P. (1991). Contextual constraint and lexical processing. In G. B. Simpson (Ed.), Understanding word and sentence. Amsterdam: Elsevier.

Snow, C. E. (1988). The problem with bilingual education research critiques: A response to Rossell. Equity and Excellence, 23(4), 30–31.

Snow, C. E. (1990). The development of definitional skill. Journal of Child Language, 17, 697–710.

Stadthagen-Gonzalez, H., & Davis, C. J. (2006). The Bristol norms for age of acquisition, imageability, and familiarity. Behavior Research Methods, 38(4), 598–605.

Toglia, M. P., & Battig, W. F. (1978). Handbook of semantic word norms. Hillsdale, NJ: Lawrence Erlbaum.


Verspoor, M., & Lowie, W. (2003). Making sense of polysemous words. Language Learning, 53(3), 547–586.

Wendt, A., & Woo, A. (2009). A minimum English proficiency standard for the Test of English as a Foreign Language Internet-Based Test (TOEFL-iBT). NCLEX Psychometric Research Brief. National Council of State Boards of Nursing.

Wesche, M., & Paribakht, T. S. (1996). Assessing second language vocabulary knowledge: Depth versus breadth. Canadian Modern Language Review, 53, 13–40.

Whitten, I. A., & Frank, E. (2005). Data mining. San Francisco: Elsevier.

Wilson, M. D. (1988). The MRC Psycholinguistic Database: Machine Readable Dictionary, Version 2. Behavioural Research Methods, Instruments and Computers, 20(1), 6–11.

Wolter, B. (2001). Comparing the L1 and L2 mental lexicons: A depth of individual word knowledge model. Studies in Second Language Acquisition, 23, 41–70.

Zareva, A. (2007). Structure of the second language mental lexicon: How does it compare to native speakers' lexical organization? Second Language Research, 23(2), 123–153.

Zareva, A., Schwanenflugel, P., & Nikolova, Y. (2005). Relationship between lexical competence and language proficiency: Variable sensitivity. Studies in Second Language Acquisition, 27, 567–595.
