
Designing and Sustaining a Foreign Language Writing Proficiency Assessment Program at the Postsecondary Level

Elizabeth Bernhardt, Stanford University

Joan Molitoris, Stanford University

Ken Romeo, Stanford University

Nina Lin, Stanford University

Patricia Valderrama, Stanford University

Abstract: Writing in postsecondary foreign language contexts in North America has received far less attention in the curriculum than the development of oral proficiency. This article describes one institution's process of confronting the challenges not only of recognizing the contribution of writing to students' overall linguistic development, but also of implementing a program-wide process of assessing writing proficiency. The article reports writing proficiency ratings that were collected over a 5-year period for more than 4,000 learners in 10 languages, poses questions regarding the proficiency levels that postsecondary learners achieved across 2 years of foreign language instruction, and relates writing proficiency scores to Simulated Oral Proficiency Interview ratings for a subset of students. The article also articulates the crucial relationship between professional development and writing as well as the role of technology in collecting and assessing writing samples.

Elizabeth Bernhardt (PhD, University of Minnesota) is Professor of German Studies and John Roberts Hale Director of the Language Center, Stanford University, Stanford, CA.
Joan Molitoris (PhD, Columbia University) is Lecturer in Spanish and Associate Director of the Language Center, Stanford University, Stanford, CA.
Ken Romeo (PhD, Stanford University) is Lecturer in English for Foreign Students and Academic Technology Specialist for the Language Center, Stanford University, Stanford, CA.
Nina Lin (MA, Stanford University) is Lecturer in Chinese in the Language Center, Stanford University, Stanford, CA.
Patricia Valderrama (PhD candidate, Stanford University) is Graduate Teaching Associate in Comparative Literature, Stanford University, Stanford, CA.

Foreign Language Annals, Vol. 48, Iss. 3, pp. 329–349. © 2015 by American Council on the Teaching of Foreign Languages. DOI: 10.1111/flan.12153

Key words: assessment, oral proficiency, technology, writing proficiency

Introduction

Writing in postsecondary foreign language contexts in North America has received far less attention in the curriculum than the development of oral proficiency. While at least one extensive volume on the importance of advanced writing in foreign language contexts exists (Byrnes, Maxim, & Norris, 2010), there remain a number of reasons that account for the lack of both research-based and instruction-based attention to the development of foreign language writing throughout the early years of language acquisition. First, the focus of foreign language instruction at the postsecondary level in the United States is most often on oral proficiency goals. Most foreign language programs intend to prepare learners to speak and listen and to read so that they are able to negotiate overseas foreign settings with confidence. Hence, oral proficiency is emphasized, and writing in these contexts is often relegated to exercises that reveal learners' acquisition of grammatical forms or developing breadth of vocabulary, to the thank-you note to foreign hosts or the personal resume, or to a means of measuring syntactic complexity. In upper-level courses, programs may even allow compositions to be written in the native language of students, English in the American context, in order to facilitate deeper literary and cultural interpretations. Even dissertations produced in foreign language departments in many American universities are written in English. This phenomenon stands in stark contrast to the field of English as a second language, which attempts to prepare English language learners with the skills that they will need to pursue bachelor's or post-bachelor's degrees in English-speaking countries, a project that necessarily entails copious amounts of academic writing. A number of research studies such as Leki (1995) and Leki and Carson (1997) exist on this topic and have been synthesized succinctly by Hedgcock (2005).

Another possible explanation for the lack of focus on foreign language writing is that writing is a planned language performance, in contrast to the spontaneous language performance of oral proficiency, and, hence, is viewed as less demanding. The language development research that has dominated studies in second language acquisition has been principally rooted in oral assessments, with literacy (writing or reading) rarely acknowledged as an important dimension of input (Bernhardt, 2011). With the exception of Byrnes and colleagues (2010), who contended that writing provides learners with the ability to perform in genres and hence is a "particularly valued indicator of overall FL development toward upper levels of ability" (p. 4), most research has ignored learners' abilities to integrate spoken and written texts; even fewer studies have referred to foreign language writing in terms of relatively lengthy connected discourse. Reichelt (1999) thoroughly reviewed these and other dilemmas confronted by the context of foreign language writing.

The development of the foreign language profession itself from the 1980s onward has lacked perspective on writing development. In the early years of the proficiency movement, the focus was exclusively on oral proficiency and centered on a consistent measure of student performance, most especially in their oral performance. Admittedly, the ACTFL Proficiency Guidelines entailed reading, writing, listening, and speaking from their initial publication in 1982, with the primary emphasis on oral proficiency. This was evidenced by a full-scale certification process, attached only to oral proficiency interviewing and rating with concomitant recertification possibilities, that was developed throughout the 1980s. The potential for assessment in reading, listening, and writing, comparable to oral proficiency assessment, remained untapped for more than a decade. Writing proficiency has, of course, been included from the inception of the formal discussion around proficiency rooted in the Foreign Service Institute (FSI) guidelines. An early version of the ACTFL guidelines that included all four skills appeared in 1986, and a further version also accompanied the revised guidelines for oral proficiency in 1999 (Breiner-Sanders, Lowe, Miles, & Swender, 2000). In 2001, the Preliminary Proficiency Guidelines–Writing Revised (Breiner-Sanders, Swender, & Terry, 2002) were published. In parallel to the process involving oral proficiency, the writing guidelines were widely distributed, and ultimately a training and certification procedure was put into place by 2008. Further, a writing protocol, the Writing Proficiency Test (WPT), was developed and made available commercially via online delivery as well as in hard copy. In spite of the considerable research that exists on oral proficiency attainment, i.e., the number of hours of instruction related to proficiency levels (Omaggio, 1986) and the ultimate attainment of language majors in their oral proficiency (Glisan, Swender, & Surface, 2013; Swender, 2003), limited published data exist with regard to the role of writing in foreign language programs or the use of the ACTFL writing guidelines.

In light of the complexities of considering the role of writing in postsecondary foreign language programs, the limited research base, and a continued focus on oral proficiency, this article describes a program-wide approach to assessing students' writing proficiency. It offers writing proficiency data for 4,476 postsecondary learners in 10 languages across 2 years of foreign language instruction. Further, it compares those writing data with oral proficiency ratings for a subset of learners and provides insight into the role of technology and its influences on writing performance. Specifically, it poses the following questions:

1. What writing proficiency ratings are learners able to achieve across the levels of a 2-year foreign language sequence? Are there differences based on English-cognate vs. English-noncognate languages?

2. What is the relationship between foreign language learners' oral proficiency ratings and their writing proficiency ratings? Are there differences in the relationship based on English-cognate and English-noncognate languages?

3. Does technology use influence foreign language writing assessment?

Literature Review

Several studies have examined foreign language writing as a support mechanism for language learning and have used or mentioned the ACTFL guidelines as benchmarks. Armstrong (2010) compared graded compositions; for-credit online discussion boards; and ungraded, not-for-credit essays in order to determine the effect of grades on foreign language writing in a fourth-semester Spanish class. Specifically, she tried to understand differences in the accuracy, fluency, and complexity between graded and ungraded assignments, building on previous work done with the ACTFL writing guidelines in the curriculum; Armstrong suggested that more frequent and ungraded writing assignments should be incorporated into the foreign language classroom, since assessment had little effect on student writing. Brown, Brown, and Eggett (2009) also looked to writing as a mechanism for enhancing language development. They described a curriculum for third-year foreign language courses that was grounded in content-based instruction to enhance written argumentation and to differentiate it from oral production. The aim of the curricular shift was to help students cross the Intermediate-Advanced border, as defined by the writing guidelines. They found that a focus on Advanced- and Superior-level writing tasks proved statistically significant, as measured by pre- and post-WPT ratings over the course of a single semester. Similarly, Dodds (1997) looked to content-based instruction using film to develop German language proficiency. Her curricular experiment, using German films and television series as the context for writing assignments and classroom discussion, indicated that using the writing proficiency guidelines as an organizing principle improved student performance by providing the basis for clear goals for the course. Godfrey, Treacy, and Tarone (2014) turned to a study abroad context for examining the foreign language writing process. In their project, they compared the development of writing skills of a group of study abroad students with those of a domestic group and investigated how the students' ratings on the ACTFL WPT related to the "more fine-grained measures" (p. 51) of fluency, accuracy, and complexity of linguistic form, as well as the form-function mapping of making claims and providing supporting evidence. The authors ended by advocating for a more multidimensional approach in the evaluation of second language (L2) writing development.

Findings from two studies challenged the use of the proficiency guidelines for mapping development. Henry (1996) examined characteristics of short essays in Russian at four levels of collegiate study in order to contribute to the empirical testing of the ACTFL proficiency guidelines for writing. In response to Valdés, Haro, and Echevarriarza (1992), she questioned whether the guidelines could be used to build a general theory of L2 writing, specifically for Novice- and Intermediate-level students. Her research partially supported the existence of an early writing stage similar to ACTFL's Novice-level descriptors, and she concluded by expressing continued doubt about the validity of the ACTFL guidelines for writing. Brown, Solovieva, and Eggett (2011) also examined writing and described a curriculum for Advanced- and Superior-level L2 writing in Russian and discussed the use of both quantitative and qualitative evaluation measures. The authors conducted a quantitative analysis of complexity measures in writing samples to understand uptake in writing proficiency. In comparing the holistic, qualitative WPT ratings with these quantitative measures, their findings demonstrated the importance of using both types of measures when analyzing L2 writing and led the researchers to question the limits of outcome-based courses incorporating proficiency scales in the curriculum.

Other studies examined the ACTFL guidelines for writing in and of themselves. Hubert (2013), for example, compared collegiate students' scores on ACTFL Oral Proficiency Interviews (OPIs) and WPTs in order to better understand how the development of speaking and writing proficiencies was related. The study showed a "fairly strong" positive correlation between speaking and writing proficiencies among the students across beginning, intermediate, and advanced Spanish. However, that correlation weakened when measured in each course individually. Hubert concluded that speaking and writing proficiencies improved at similar rates when viewed on a global level and with a long-term perspective and called for a pedagogical approach that enhanced proficiency across both modalities.

Rifkin (2005) conducted a longitudinal analysis of the listening, reading, speaking, and writing proficiency of students in the Middlebury College summer immersion program, the majority of whom were collegiate language learners. He measured proficiency in each area through computerized tests based on the ACTFL proficiency guidelines for listening, reading, and writing, and either ACTFL-certified OPIs or oral exams modeled on OPIs. He found a significant correlation between the four modalities and hours of classroom instruction (in immersion and nonimmersion settings) as well as grammatical competence. In his data, speaking and writing proficiencies showed the closest relationship. Overall, the first part of his research provided an overview of postsecondary, nonimmersion Russian language instruction in the United States and suggested that more than 600 hours of instruction were required to bring students to Advanced-level reading, writing, speaking, and listening proficiency in a noncognate language. The second part of his research calculated "the benefit of immersion" (Rifkin, 2005, p. 10), which he deemed more efficient in bringing students to higher proficiency levels. Rifkin suggested a possible ceiling effect in the higher levels of the ACTFL proficiency pyramid in the traditional university course sequence that offers 400 hours or fewer of instruction over 4 years. He concluded by advocating for changes in curricular policy that allowed for more immersion experiences as well as more hours of classroom instruction, and that integrated the teaching of grammar and syntax.

I. Thompson (1996) rated the speaking, reading, listening, and writing proficiencies of students of Russian after 1, 2, 3, 4, and 5 years of study using tests based on the ACTFL guidelines for each area. The goal was to assess whether the proficiency descriptors were realistic and attainable for collegiate foreign language programs, and whether there existed a significant positive correlation between proficiency levels in the four skills as well as between the four skills and levels of study. Her data revealed overlapping ranges of performances with no exact correspondence between levels of study and levels of proficiency in the four modalities. Although the median proficiency level increased with each additional year of study, students with no change in proficiency scores were found at almost all levels of study. The four modalities themselves were found to have a significant positive correlation, although none of the correlations "were particularly impressive" (I. Thompson, 1996, pp. 54–55). Thompson concluded that each skill followed a slightly different and nonparallel development. Like Rifkin (2005), she noted a ceiling effect caused by the exponential nature of the ACTFL scale and suggested that this posed significant problems for developing tests based on the guidelines for reading, writing, and listening (Clifford & Cox, 2013; Cox & Clifford, 2014; Glisan et al., 2013).

As a whole, these studies provide a suggestive yet shadowy knowledge base regarding foreign language writing, specifically in postsecondary settings. The field needs to continue to investigate a range of valid, reliable, and efficient tools for examining and gauging foreign language writing proficiency, and thus develop a deeper understanding of what writing as writing (a process in and of itself rather than a tool for grammatical practice) offers the field as insight into written language development, a view passionately and cogently expressed by Byrnes et al. (2010).

The Institutional Context

Establishing a Proficiency-Oriented Curriculum

The trajectory of the Stanford Language Center, established in 1995, is parallel to that of the modern foreign language profession across the same time period. In its beginnings, it too emphasized oral proficiency without apology. First, because oral proficiency is the most difficult skill to acquire in formal settings, it was important to measure student progress within this challenging context. Second, oral proficiency was the dimension of language study perceived as lacking by the wider university community at the founding of the Language Center (Stanford University Board of Trustees, 1994). Third, a nationally recognized scale and a concomitant training program, namely, the ACTFL/FSI scale and related rater and interview workshops attached to the OPI, were available. Further, a cost-effective, validated mechanism existed that enabled large-scale assessment in the form of the Simulated Oral Proficiency Interview, or SOPI (Kenyon & Tschirner, 2000; Malone, 2000; Shohamy, Gordon, Kenyon, & Stansfield, 1989; Stansfield & Kenyon, 1992). The SOPI has been used successfully in large-scale program assessment (R. Thompson et al., 2014).

From 1995 on, the Language Center conducted program-level assessments of oral proficiency development (Bernhardt, 1997, 2009), using the SOPI for all learners exiting a course sequence and the OPI in subsets of those same learners. The intention of this systematic assessment program was twofold: (1) to be able to document the progress of students through programs in Spanish, French, Portuguese, Italian, German, Russian, Arabic, Chinese, Japanese, Korean, and Hebrew, ensuring that students met established benchmarks across their language learning experiences, and (2) to examine the extent to which individual programs met and perhaps exceeded their stated objectives. Findings from this process have been documented throughout the literature, most recently by Bernhardt and Brillantes (2014), and all data reported are available at http://www.language.stanford.edu

Alongside assessment, the Language Center also implemented a curricular model based on the most current research in L2 literacies. Target objectives were developed in each language program, following a prototype crafted in 1997 by Spanish and Portuguese language instructors (representing the largest enrollments) and respectful of the unique features of each language (Bernhardt, Valdés, & Miano, 2009). These documents, now available in their revised versions at http://www.language.stanford.edu, all have as their foundation the National Standards for Foreign Language Learning (National Standards, 2006), with particular emphasis on the interpersonal, interpretive, and presentational modes of communication. In contrast to more traditional curricula, which are often textbook- or four-skill-driven, these objectives laid out concrete developmental goals by detailing what students should be able to do with the language within each of the three modes and within and across courses that form a yearlong sequence. Writing, specifically in parallel to speaking, was integrated within the presentational or interpersonal mode according to the type of task, purpose, and audience and was also articulated developmentally throughout the course sequence as it became progressively more complex and demonstrated features of increasing proficiency.

When the ACTFL scale for assessing writing proficiency, after years of discussion, debate, and refinement, was finalized, it followed the general outline of the oral proficiency scale and focused on functional writing ability in a foreign language by measuring the performance of specific writing tasks against the criteria stated in the Preliminary Proficiency Guidelines–Writing Revised (Breiner-Sanders et al., 2002). In parallel to the OPI scale, the writing scale also had an assessment as well as a certification procedure for raters attached to it: the WPT. What it did not have was a validated, simulated protocol for writing that was parallel to the SOPI, which could accommodate wide-scale programmatic assessment at a reasonably low cost.

In 2007, the staff at the Stanford Language Center took up the challenge of developing and piloting a protocol that would capture the intention of the WPT, provide the potential for appropriate test statistics in relation to the WPT, and do so in a cost-effective and efficient manner. It was important to add writing assessment to the already established systematic program in oral proficiency assessment for three significant reasons. First, adding writing provided a more complete picture of what students were actually able to do with the language; in other words, it was a view into their literacy, which is of the utmost relevance to their academic future. Second, writing provided a concrete view into learners' linguistic development, unaided by the external supports that oral discourse, particularly interactive speech, provides. A third reason was that the ACTFL training and certification component attached to writing assessment had recently been established. Having this procedure available enabled continuing substantive professional development for the teaching staff, a key factor in the success of programs that aim to bring language learners to higher levels of proficiency.

Maintaining and Enhancing a Systemic Professional Development Program

Over the years, professional programming at the Language Center had been grounded in a crucial process: OPI tester training and certification. All staff, numbering approximately 65 full-time instructors across 14 languages, continue to participate in the initial stages of oral proficiency interview training by attending the corresponding workshop in a 2-day or 4-day format. Almost all complete the full 4-day training. More than 70% of instructors to date have received full certification in oral interview rating and testing. Maintaining the momentum in this process to include writing was critical. Instructors began in 2008 to pursue WPT rater training, adding it to their already established OPI certification. Within 5 years, more than half of the entire language teaching staff was ACTFL-certified in both oral rating and testing and in writing.

A successful professional development program pushes a staff forward intellectually. For the Language Center, this meant that instructors showed a growing interest in writing and in including it as a corollary to the already established oral assessment program. This professional stance meant that there needed to be parameters and prompts for a writing proficiency assessment that would be consistent with instructors' formal knowledge about writing proficiency assessment garnered through their WPT certification process. Hence, a core group of ACTFL writing-certified instructors across English-cognate and English-noncognate languages collaborated on drafting a writing proficiency assessment (WPA). The collaboration focused on format, duration, and level and design of prompts; generated potential writing contexts; and contributed sample prompts to a wiki, with an eye toward creating a template that all languages could use.

Ultimately, the collaborative group developed prompt types based on the established proficiency-based program objectives of the language program. Prompts were constructed to elicit Novice- and Intermediate-level functions, with a "mini-probe" to test for the Advanced level, and to elicit Advanced-level functions with a probe-like task of increasing difficulty targeted at the Superior level. Using these prompts, two forms of writing assessment were created: a short form and a long form. The short form was intended for students completing the first year of a noncognate language (150 hours of instruction). The long form was administered at the end of first-year cognate languages (150 hours) and at the end of all second-year languages (300 hours), cognate and noncognate alike. This framework was consistent with the general structure of the SOPI, in that the duration and type of tasks corresponded to the anticipated proficiency range of the test-takers. Similarly, the WPA was structured to align with the proficiency objectives of the established curriculum and to respond to institutional constraints such as delivery within a 50-minute class session. In contrast to oral proficiency assessments, however, writing proficiency assessments are obviously not interactive. It was crucial therefore that each prompt elicit sufficient writing that reflected the writer's proficiency, and at the same time that a broad range of contexts and functions be represented in a given test. Sample prompts developed through professional collaboration are provided in the Appendix. This form of a WPA allowed ratable samples to be collected for noncognate as well as cognate languages, in first- as well as second-year language courses.

Using Technology in Writing Assessment

A critical arena within any modern institutional context is technology. In the initial years of developing and implementing the writing assessment, tasks and topics were administered on paper, which was complicated and time-consuming. For each course section in a classroom, test prompts had to be delivered from the administrative offices and returned along with student responses. Responses then had to be delivered to raters, who then had to return them along with the ratings, all without losing a single sheet of paper, since the loss of a student response was a potentially serious breach of student privacy as dictated by FERPA (Family Educational Rights and Privacy Act) as well as a potential threat to test integrity. The teaching staff also lamented the inefficiency of this process.

The obvious solution to this problem was to administer the writing assessment via the university's learning management system (LMS). Enrollment information was readily available in a system built to be secure, and prompts could be distributed and responses collected and rated without the risk of losing physical artifacts. Using the LMS also enabled students to use a system with which they were familiar and comfortable, since most of the work in their other courses was done on a computer with a keyboard. Yet questions and concerns remained, specifically about student knowledge of foreign characters and how to find them on a keyboard, about how much writing students could reasonably produce in a standard examination format of 50 minutes, and about how to ensure that learners were not relying on Web-based help such as grammar and spell-checking as well as locating and copying passages from the Web.

Methods

Participants

Students at Stanford University are required to complete the first year of language instruction or its equivalent. Most students enter the university having completed the requirement by testing out with an Intermediate Mid level of oral proficiency as well as grammatical knowledge and/or scores of 5 on an Advanced Placement examination. Each year, the approximately 800 students who do not test out of the language requirement complete either a first-year sequence (150 hours of instruction) or a second-year sequence (an additional 150 hours of instruction) and are assessed in their oral and writing proficiency. Data from the academic years 2009 through 2014 are provided in Tables 1–4, for a total of 3,310 students who completed the first-year sequence and 1,166 students who completed the second-year sequence across 10 languages.

To better understand the relationship between students' writing performance, as measured by the WPA, and their oral performance, as measured by the SOPI, data generated by all students who were enrolled in first- and second-year course sequences during the 2013–2014 academic year were targeted for analysis. WPA and SOPI data from 444 first-year students and 209 second-year students in the 2013–2014 cohort group are reported below in the Findings for Question 2.

TABLE 1
Writing Proficiency Ratings of 2,066 Learners in English-Cognate Languages After 150 Hours of Instruction, 2009–2014

Ratings After 150 Hours (in percentages)
NH IL IM IH AL AM
French (N = 440): 2 21 67 10
German (N = 205): 9 45 36
Italian (N = 378): 2 26 57 13 1
Portuguese (N = 86): 1 48 29 13 9
Spanish (N = 957): 1 16 57 24 2

Notes: NH = Novice High, IL = Intermediate Low, IM = Intermediate Mid, IH = Intermediate High, AL = Advanced Low, AM = Advanced Mid.

Procedures

WPAs were completed at the end of each course from 2009 to 2014, along with SOPIs. The correlation between the SOPI and the OPI has been reported at between 0.85 and 0.91 across a number of studies (Clark & Li, 1986; Kenyon & Tschirner, 2000; Shohamy et al., 1989; Stansfield & Kenyon, 1992). In a Language Center internal comparison of SOPI ratings with Language Testing International (LTI) ratings of telephonic OPIs (N = 156), the correlation was 0.85 (LTI is the testing branch of ACTFL). The Stanford SOPIs were assessed by certified OPI raters and testers, many of whom regularly test for LTI. Interrater reliability was calculated each year and ranged from 0.87 to 0.99 across all languages. SOPIs were always administered first. Several class sessions later, learners sat for the internal WPA anchored in the ACTFL WPT. Each WPA was assessed by two WPT-certified raters, many of whom also test regularly for LTI. Interrater reliability was calculated each year and ranged from 0.85 to 0.92.

TABLE 2
Writing Proficiency Ratings of 1,244 Learners in Non-English-Cognate Languages After 150 Hours of Instruction, 2009–2014

Ratings After 150 Hours (in percentages)
NH IL IM IH AL AM
Arabic (N = 218): 9 77 13
Chinese (N = 483): 34 58 3
Japanese (N = 390): 16 49 17 2
Korean (N = 63): 24 62 12
Russian (N = 90): 30 45 5

Notes: NH = Novice High, IL = Intermediate Low, IM = Intermediate Mid, IH = Intermediate High, AL = Advanced Low, AM = Advanced Mid.

TABLE 3
Writing Proficiency Ratings of 680 Learners in English-Cognate Languages After 300 Hours of Instruction, 2009–2014

Ratings After 300 Hours (in percentages)
NH IL IM IH AL AM
French (N = 149): 1 16 54 23 6
German (N = 14): 8 67 28
Italian (N = 71): 8 33 48 18
Portuguese (N = 51): 2 21 32 24 21
Spanish (N = 395): 1 6 31 40 21

Notes: NH = Novice High, IL = Intermediate Low, IM = Intermediate Mid, IH = Intermediate High, AL = Advanced Low, AM = Advanced Mid.

These oral and writing performances were uploaded electronically within the digital language laboratory and delivered to raters via the university's course management system. The certified OPI and WPA raters were paid for their assessments via 1 or 2 months of summer salary. Ratings were delivered to the Language Center for reliability analyses and for final data entry. When there were any discrepancies between raters, the lower rating was accepted. In the rare instances when there were discrepancies at the main proficiency level border, a third rater rerated the sample.
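The adjudication rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the sublevel abbreviations and their ordering follow the ACTFL scale used elsewhere in this article, but the function names and the "THIRD_RATER" flag are hypothetical, not the Language Center's actual tooling.

```python
# Ordered ACTFL sublevels from Novice Low through Advanced Mid.
LEVELS = ["NL", "NM", "NH", "IL", "IM", "IH", "AL", "AM"]

def major(rating: str) -> str:
    """Major proficiency level: N(ovice), I(ntermediate), or A(dvanced)."""
    return rating[0]

def adjudicate(r1: str, r2: str) -> str:
    """Resolve two raters' scores for one writing sample.

    A disagreement that crosses a major proficiency border is flagged
    for a third rating; otherwise the lower of the two ratings stands.
    """
    if major(r1) != major(r2):
        return "THIRD_RATER"
    return min(r1, r2, key=LEVELS.index)

print(adjudicate("IM", "IH"))  # -> IM (lower rating accepted)
print(adjudicate("IH", "AL"))  # -> THIRD_RATER (border discrepancy)
```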

Analyses

Writing data across five administrations of the WPA (2009–2014) were sorted, and percentages were calculated for each proficiency level. To analyze the relationship between oral and writing ratings, data were taken from a subsample of more than 600 participants from the 2013–2014 academic year cohort for whom ratings on each assessment (writing and oral) could be precisely matched. Data for Chinese and Korean were eliminated from these analyses due to a lack of precisely matched data. All nominal rater data were converted to numerical equivalents based on Dandonoli and Henning (1990) for statistical analysis. In their scheme, values range from 0.1 for a Novice Low to 2.3 for an Advanced Mid. Matched-pair t tests were then conducted for each language. To assess differences in handwriting and keyboarding, analyses of variance were conducted based on word counts across samples of French and Spanish learners who responded to identical prompts.
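A minimal sketch of this conversion-and-comparison step: the article states only the endpoints of the Dandonoli and Henning (1990) scheme (0.1 for Novice Low, 2.3 for Advanced Mid), so the intermediate scale values below are illustrative placeholders, and the ratings fed in are hypothetical.

```python
import math
from statistics import mean, stdev

# Nominal-to-numerical conversion in the spirit of Dandonoli and Henning
# (1990). Only the endpoints (0.1 = Novice Low, 2.3 = Advanced Mid) are
# stated in the article; the values between them are placeholders.
NUMERIC = {"NL": 0.1, "NM": 0.4, "NH": 0.8, "IL": 1.1,
           "IM": 1.4, "IH": 1.8, "AL": 2.1, "AM": 2.3}

def matched_pair_t(sopi_ratings, wpa_ratings):
    """Matched-pair t statistic on precisely matched oral and writing
    ratings; a negative t means writing was rated higher than speaking."""
    diffs = [NUMERIC[s] - NUMERIC[w] for s, w in zip(sopi_ratings, wpa_ratings)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical matched ratings for five learners of one language:
t = matched_pair_t(["IL", "IM", "IL", "IM", "IL"],
                   ["IM", "IH", "IM", "IM", "IM"])
```

Because writing is rated higher than speaking for most of these hypothetical learners, the resulting t is negative, mirroring the sign convention in Tables 5 and 6.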

Findings

Question 1: What writing proficiency ratings are learners able to achieve across the levels of a two-year foreign language sequence? Are there differences based on English-cognate vs. English-noncognate languages?

Table 1 displays data for writing performances across five academic years of students completing a first-year sequence in the English-cognate languages of French, German, Italian, Portuguese, and Spanish (N = 2,066), and Table 2 displays data generated in the English-noncognate languages of Arabic, Chinese, Japanese, Korean, and Russian (N = 1,244). Tables 3 and 4

TABLE 4

Writing Proficiency Ratings of 486 Learners in Non-English-Cognate

Languages After 300 Hours of Instruction, 2009–2014

Ratings After 300 Hours (in percentages)

Language              NH   IL   IM   IH   AL   AM
Arabic (N = 113)            2   25   56   16
Chinese (N = 155)      1   37   47    1
Japanese (N = 138)         23   50   24    2
Korean (N = 21)            23   68    9
Russian (N = 59)       6   26   59    7    1

Notes: NH = Novice High, IL = Intermediate Low, IM = Intermediate Mid, IH = Intermediate High, AL = Advanced Low, AM = Advanced Mid.


illustrate data from students completing second-year language sequences, grouped by English-cognate (Table 3; N = 680) and English-noncognate languages (Table 4; N = 486). Generally speaking, first-year students learning languages that are cognate with English achieved ratings that were principally (55% on average) in the Intermediate Mid range. Learners of languages that are not cognate with English tended to achieve an Intermediate Low rating (an average of 58%), with 77% of Arabic learners receiving this rating. Second-year learners (Tables 3 and 4) tended to cross at least one sublevel when compared with first-year learners. In the case of noncognate languages such as Chinese and Japanese, learners moved from Intermediate Low to Mid, while in Arabic, many learners moved two sublevels, namely from Low to High. Learners in cognate languages such as French, Spanish, and Italian frequently (around 25%) moved into the Advanced range at the end of the second-year sequence. In summary, and not surprisingly, learners who did not need to conquer an orthographic distance were able to apply their orthographic background knowledge and achieve a proficiency rating that was higher than those achieved in languages in which learners were required to learn not only the language but also the written script. The data also revealed some of the advantages that alphabetic languages, such as Arabic, have over languages written with characters, such as Chinese.

Question 2: What is the relationship between foreign language learners' oral proficiency ratings and their writing proficiency ratings? Are there differences in the relationship based on English-cognate and English-noncognate languages?

In order to examine the relationships between speaking and writing ratings, all proficiency ratings, oral and written, from the 2013–2014 academic year (eight languages) were converted into numerical ratings (Dandonoli & Henning, 1990). Table 5

TABLE 5

Relationship Between Speaking and Writing Ratings of 513 Learners

Completing First- or Second-Year Sequence in One of Five

English-Cognate Languages, 2013–2014

French        N    SOPI    SD    WPA    SD      r    t Test            Probability
150 hours    70   1.198  0.02  1.26   0.02  0.374    t(69) = –3.11     p < 0.001
300 hours    46   1.51   0.09  1.624  0.09  0.714    t(45) = –3.14     p < 0.001

German        N    SOPI    SD    WPA    SD      r    t Test            Probability
150 hours    36   1.244  0.01  1.436  0.06  0.44     t(35) = –5.08     p < 0.001
300 hours    14   1.657  0.05  1.95   0.06  0.19     t(13) = –3.5      p < 0.001

Italian       N    SOPI    SD    WPA    SD      r    t Test            Probability
150 hours    56   1.267  0.02  1.28   0.02  0.51     t(55) = –0.66     p < 0.25
300 hours     8   1.88   0.08  1.95   0.03  0.81     t(7) = –1         p < 0.17

Portuguese    N    SOPI    SD    WPA    SD      r    t Test            Probability
150 hours    14   1.407  0.05  1.371  0.03  0.51     t(13) = 1         p < 0.16
300 hours    22   1.83   0.05  1.786  0.10  0.81     t(21) = 0.82      p < 0.21

Spanish       N    SOPI    SD    WPA    SD      r    t Test            Probability
150 hours   167   1.19   0.02  1.57   0.09  0.45     t(166) = –18.2    p < 0.001
300 hours    80   1.68   0.11  2.19   0.03  0.27     t(79) = –13.4     p < 0.001


describes the relationships between learners' oral proficiency ratings, as documented by SOPI ratings, and their WPA ratings.

In Spanish, French, and German, in both the first- and second-year programs, writing performances were always statistically significantly higher than the respective oral ratings (French, t(69) = –3.11, p < 0.001 and t(45) = –3.14, p < 0.001; German, t(35) = –5.08, p < 0.001 and t(13) = –3.5, p < 0.001; Spanish, t(166) = –18.2, p < 0.001 and t(79) = –13.4, p < 0.001). This finding was also consistent with first-year Arabic and Japanese (Table 6) (Arabic, t(24) = –4.71, p < 0.001 and Japanese, t(58) = –4.44, p < 0.001), but not with second-year Arabic (t(10) = –1.86, p < 0.255) and Japanese (t(16) = 1, p < 0.166). The finding was generally unsurprising in that learners had more time to compose and to correct in any writing assessment than they did in their spontaneous oral performances. This underlines the Byrnes et al. (2010) contention that writing is an excellent measure of consolidated skills and can reveal language acquisition in a fashion that an oral performance cannot.

Interestingly, Portuguese and Italian did not follow the pattern of the other English-cognate languages. The data indicated no statistically significant differences in learner performances in speaking as compared with writing (Portuguese, t(13) = 1, p < 0.16 and t(21) = 0.82, p < 0.21; Italian, t(55) = –0.66, p < 0.25 and t(7) = –1, p < 0.17). A possible explanation for this phenomenon is that each language program, Portuguese and Italian, attracts a majority of students who are already familiar with a closely related language (Spanish or French, respectively). Perhaps the consolidation of grammatical skills in Portuguese and Italian was easier for this population than for learners of the other English-cognate languages under investigation.

Table 6 indicates that first- and second-year Russian and second-year Japanese and Arabic were the outliers in the present data set: There was no difference between writing and speaking proficiency ratings in first-year Russian (t(16) = –1.6, p < 0.06) or in second-year Arabic (t(10) = –1.86, p < 0.255) and Japanese (t(16) = 1, p < 0.166). Admittedly, the total number of subjects in these languages was much smaller than in the English-cognate languages examined, and this lack of statistical power may have skewed the data. It is also possible that students adopted a "write what I can say" strategy in these languages that use non-Roman orthographies, whereas English-speaking learners of English-cognate languages were

TABLE 6

Relationship Between Speaking and Writing Ratings of 140 Learners

Completing First- or Second-Year Sequence in One of Three

Non-English-Cognate Languages, 2013–2014

Arabic        N    SOPI    SD    WPA    SD      r    t Test            Probability
150 hours    25   0.844  0.10  1.12   0.02  0.42     t(24) = –4.71     p < 0.001
300 hours    11   1.39   0.18  1.71   0.14  0.76     t(10) = –1.86     p < 0.255

Japanese      N    SOPI    SD    WPA    SD      r    t Test            Probability
150 hours    59   0.98   0.05  1.11   0.03  0.49     t(58) = –4.44     p < 0.001
300 hours    17   1.217  0.01  1.182  0.01  0.02     t(16) = 1         p < 0.166

Russian       N    SOPI    SD    WPA    SD      r    t Test            Probability
150 hours    17   1.017  0.06  1.094  0.01  0.75     t(16) = –1.6      p < 0.06
300 hours    11   1.418  0.06  1.263  0.01  0.42     t(10) = 2.23      p < 0.02


more willing to utilize their first-language literacy knowledge to support and enhance their performances. That literacy knowledge is simply not as useful in the learning of non-English-cognate languages. A further oddity within the data set is that second-year Russian and Japanese writing ratings were lower than the speaking ratings.

Question 3: Does technology use influence foreign language writing assessment?

Given that the WPA had already been conducted via handwriting, data existed on how much written language learners were able to produce. The question became one of comparing total production across identical prompts within a 15-minute time frame (Prompt 1) or a 30-minute assessment (Prompt 2) in handwriting and via a computer in two languages (Spanish and French).

Baseline data provided in Table 7 indicate that with Intermediate-level prompts, learners completing the first year of instruction in Spanish (N = 204) and French (N = 81) produced approximately 190 handwritten words (185 and 196, respectively, in Spanish and French) in 15 minutes and approximately 197 while keyboarding (220 and 174 words, respectively). With the more advanced prompt, for which 30 minutes were allocated, first-year learners of Spanish and French produced an average of 236 handwritten words (243 and 229 words, respectively) and around 287 words (305 and 270 words, respectively) while keyboarding. All differences were statistically significant (Spanish, F(1, 202) = 24.12, p < 0.001 and F(1, 202) = 32.72, p < 0.001; French, F(1, 79) = 5.97, p < 0.02; the performance of first-year French learners responding to the 15-minute prompt was inconsistent, as they wrote less on the computer (174 words) than by hand (196 words), F(1, 79) = 5.487, p < 0.002). Table 8 displays data generated by second-year learners of Spanish (N = 82) and French (N = 32). Not surprisingly, second-year learners produced more language than first-year learners of Spanish and French: around 219 handwritten words (231 and 206 words, respectively) and 252 words (288 and 230 words, respectively) while keyboarding when responding to the Intermediate-level prompts, and around 298 handwritten words (314 and 282 words, respectively) and 360 typed words (378 and 352 words, respectively) when responding to the Advanced-level prompts. While it may be possible to question the importance of keyboarding for first-year learners with any level of prompt due to their limited language proficiency, the advantage for more advanced learners is indisputable; using a computer almost always yielded statistically significant findings (Spanish, F(1, 80) = 18.16, p < 0.001 and

TABLE 7

Mean Number of Words Produced by Students Writing by Hand or Using

a Computer Across Spanish and French Responding to Two Prompts

After 150 Hours of Instruction, 2013–2014

Spanish      Hand   Computer   F                   p
Prompt 1     185    220        F(1, 202) = 24.12   < 0.001
Prompt 2     243    305        F(1, 202) = 32.72   < 0.001

French       Hand   Computer   F                   p
Prompt 1     196    174        F(1, 79) = 5.487    < 0.002
Prompt 2     229    270        F(1, 79) = 5.97     < 0.02


F(1, 80) = 7.96, p < 0.01; French, F(1, 30) = 4.66, p < 0.04). An outlier was the French performance with Prompt 1: Even though the subjects wrote more via computer than by hand, the difference did not reach significance (F(1, 30) = 2.18, p < 0.15). Allowing more advanced learners to keyboard afforded them the opportunity to produce significantly more language and hence resulted in richer samples.
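The hand-vs.-computer comparison above is a two-group one-way ANOVA on word counts; a minimal sketch, with hypothetical word counts standing in for the actual samples:

```python
from statistics import mean

def one_way_anova_f(group_a, group_b):
    """F ratio for a two-group one-way ANOVA: between-group mean square
    over within-group mean square."""
    grand = mean(group_a + group_b)
    n_a, n_b = len(group_a), len(group_b)
    ss_between = (n_a * (mean(group_a) - grand) ** 2
                  + n_b * (mean(group_b) - grand) ** 2)
    ss_within = (sum((x - mean(group_a)) ** 2 for x in group_a)
                 + sum((x - mean(group_b)) ** 2 for x in group_b))
    df_between, df_within = 1, n_a + n_b - 2
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical word counts for one prompt, written by hand vs. keyboarded
# (the article reports only group means and F ratios, not raw counts):
hand = [180, 175, 190, 185, 200, 170]
computer = [215, 225, 210, 230, 220, 205]
f_ratio = one_way_anova_f(hand, computer)
```

With two groups, this F ratio is equivalent to the square of the independent-samples t statistic, so either reporting style describes the same comparison.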

Discussion

This article provides baseline data from more than 4,000 writing samples across 10 languages (Tables 1–4), data from a subsample of more than 600 learners' written and oral proficiencies in eight languages (Tables 5–6), and analyses of assessments of learners' handwritten and keyboarded production in Spanish and French (Tables 7–8). This extensive data set demonstrates how postsecondary learners at this institution developed their writing proficiency as measured against ACTFL writing proficiency criteria and provides a platform for gauging the achievement of learners across different institutions and program configurations in writing over 2 years of instruction. These data also open an array of research questions that may lead to productive exploration. Of equal importance are the curricular and pedagogical implications for including writing instruction and proficiency-based assessments in the foreign language curriculum.

Research Implications

A first question to pose is whether the data offered here are consistent or inconsistent with previous explorations of writing in foreign language classrooms. The writing proficiency tests conducted by Brown et al. (2009) differed in both structure and prompts, yet the present findings are consistent in that their subjects were able to cross at least one sublevel over the course of the semester. Brown et al. (2011) utilized a similar method in a course for third- and fourth-year Russian students and found students moving at least one sublevel up in written proficiency over the course of the semester, as measured by pre- and post-WPTs, also consistent with the present study. The second-year German students in Dodds's study (1997) entered with Intermediate Low or Intermediate Mid writing proficiency and exited with the ability "to achieve Advanced level proficiency at least some of the time" (p. 143), also consistent with the present study. Henry's (1996) findings were also generally consistent with the findings from this investigation. Rifkin (2005) conducted his study over 3 years at the Middlebury Russian school. Overall, the present data reflect the results of Rifkin's study, in that students in Russian achieved the same writing proficiency levels after 150 and 300 hours of study and that first-year Russian students in the present study did not have different speaking and writing scores either. On the other hand, this comparison also outlines the outlier

TABLE 8

Mean Number of Words Produced by Students Writing by Hand or Using

a Computer Across Spanish and French Responding to Two Prompts

After 300 Hours of Instruction, 2013–2014

Spanish      Hand   Computer   F                  p
Prompt 1     231    288        F(1, 80) = 18.16   < 0.001
Prompt 2     314    378        F(1, 80) = 7.96    < 0.01

French       Hand   Computer   F                  p
Prompt 1     206    230        F(1, 30) = 2.18    < 0.15
Prompt 2     282    352        F(1, 30) = 4.66    < 0.04


character of the second-year Russian students from this sample, who had writing ratings that were lower than their speaking ratings. The data from the present investigation are somewhat consistent with those of I. Thompson (1996), also focused on Russian, who found that first-year students gained a median writing proficiency score of Novice High, one sublevel lower than the students in this data set, with second-year students reaching a median writing score of Intermediate Mid, which is consistent with the present data set. Comparing the median of the spoken and written proficiency scores in her study shows an inconsistency with the results offered within the context of this article: Thompson's students scored higher in written proficiency in both their first and second years of study of a non-English-cognate language.

Despite some inconsistencies, all of the data across multiple studies point toward a steady growth in writing proficiency related to the total amount of time spent in instruction. The level of growth clearly varies, though not remarkably so, across an array of institutions. Further research is obviously critical to determine the amount of growth across different institutional and instructional configurations: Forthcoming studies need to examine class size in relation to writing proficiency outcomes as well as level of instructor professional development. In a multivariate world, research must approach issues in foreign language learning and instruction in a multivariate manner in order to uncover optimal combinations of factors that lead to student success and instructor satisfaction.

A more specific area for research is investigating writing proficiency attainment between and among languages. Approximately 75% of the learners completing their first year of an English-cognate language (French, German, Italian, Portuguese, Spanish) were in the Intermediate Mid to Intermediate High range. This indicates that most were emergent paragraph-level writers, thus evidencing ability well beyond lists or isolated sentences. Learners completing a first year of an English-noncognate language (Arabic, Chinese, Japanese, Korean, Russian) were, on the whole, in the Intermediate Low to Intermediate Mid range. Their performances indicate that they were writing at the sentence level with little emergent discourse structure. More advanced students, having completed a second year or a total of 300 hours of instruction, were Intermediate High to Advanced among English-cognate language learners, meaning they were capable of composing structured paragraphs using a relatively complete grammatical arsenal. Those at the same level in the English-noncognate languages were generally Intermediate Mid, indicating that their ability to write in paragraphs was just emerging. Generally speaking, the second-year students were approaching the writing criterion that is expected of a number of professions, such as secondary school teaching and some bilingual secretarial work. More research is required to refine the target-level descriptors and to understand more thoroughly whether topic, for example, influences writing performance (see Cox, Brown, & Burdis, 2015).

The data also contribute to our understanding of the role of English language literacy in the learning of writing in English-cognate languages as well as alphabetic languages, as compared with languages that utilize character systems. The baseline data suggest that, indeed, English-speaking learners have an advantage in developing proficiency in writing in English-cognate languages over English-speaking students who are learning noncognate languages. Although this may be intuitively obvious, understanding the nature of the advantage is the critical insight. The advantage appears to be at least a sublevel (Low to Mid or Mid to High) on the ACTFL rating scale. The extent to which this research finding holds across learners in other academic contexts and whether their level of English language literacy has an impact on writing ratings are important areas for further investigation. Factor analytic research designs will lend themselves to productive explorations in this arena.


An additional area for research is continuing to understand the relationship between oral and writing performance. As noted in the literature review, when Hubert (2013) compared oral and writing proficiency in Spanish for students enrolled in second-year and third-year courses, he found that "speaking and writing proficiencies appeared to rise at fairly similar rates as learners passed from beginning through intermediate to advanced levels of Spanish study" (p. 92). This finding is inconsistent with the present data collection, and the discrepancy should be investigated. Hubert's ratings were between Novice Mid and Intermediate High, different ratings from those collected in the present study, and those ratings may account for the difference. Indeed, the current study did indicate a positive relationship between oral and writing performance, yet that relationship within this database was not strong. Writing proficiency in the English-cognate languages was almost always higher than oral proficiency within the present database, but not exclusively. Again, numbers of subjects, the nature of the relationship between and among languages, and the dedication to the writing process within postsecondary curricula probably are influential. These associations should be explored in greater depth to fundamentally understand how spoken and written language proficiencies are linked. Interestingly, in the direct comparisons between speaking and writing, writing ratings were statistically significantly higher in most cases, but not all, at both the 150-hour and 300-hour levels of instruction. It appears that in many cases, as learners became more knowledgeable and comfortable in the language, the relationship between the two modes widened. Toward the upper proficiency ranges, vocabulary and syntax changed, making written language more nuanced and complex and far less like oral language. The data collected here provide the profession with a view of how learners cope with that additional complexity.

The data offered here should be interpreted in light of the Byrnes et al. (2010) work that examined Advanced-level writers. Are the data generated with first- and second-year learners consistent with the expectations expressed by researchers examining more advanced learners? In other words, is there a potential gap between what is conventionally defined in the profession as early language learning vs. the knowledge and skills that learners are expected to acquire in more advanced courses? Exploring such questions will provide the profession with a critical research base that will inform more nuanced curriculum development.

Finally, technology and its impact on writing performance need to be probed in depth. While the data collected within this investigation permitted no ancillary assistance in the writing process, that lack of assistance is fundamentally artificial. Contemporary writers almost inevitably use electronic assistance. The foreign language profession needs to understand thoroughly the implications of various kinds of assistance for foreign language writers. Does enabling writers to employ outside assistance enhance their performance? What is the difference between generating a spontaneous writing performance without assistance and a planned writing performance with permitted assistance?

Pedagogical Implications

The descriptive and inferential data have implications for developing a broad understanding of student foreign language writing performance in basic language sequences. Historically, writing has been viewed instructionally as a vehicle for assessing grammatical performance. In other words, students have traditionally been asked to "write a composition" and then have received feedback about both content and grammar as separate entities. Generally speaking, the grammar score is the more highly weighted component of the assessment. However, a grammar score does not communicate the full array of what a learner can do with writing and the extent to which a learner can communicate ideas in written language. A proficiency orientation


calls for a holistic perspective that examines features of content, function, accuracy, and text type as critical components. Using the gauge of the ACTFL guidelines for writing proficiency offers a more complex lens through which instructors can view their students' performance. Furthermore, basing assessment tasks and rubrics on the guidelines assists in realistic goal setting. The assessment development process, the assessments themselves, and the data provide a window into how much this sample of learners was able to accomplish across multiple languages.

These data also encourage instructors to think about how they consider their students' collective language performances: Learners' ability to communicate messages orally, for example, may not necessarily be at the same level as their ability to complete similar tasks in writing. In fact, students' writing ability in the English-cognate languages was almost always higher than their speaking ability within this investigation. The explanation for this finding must lie in the fact that writing is a planned performance, even when it is impromptu, and this context enables learners to have more thinking time in contrast to an impromptu speaking performance. Note that for English-noncognate languages, instructors need to recognize that their learners may be more adept at speaking than at writing because writing places extra cognitive burdens on them: not only content, but the written form itself. These findings may help instructors develop curricula that reflect the realities of individual languages, enabling a more sensitive view of the distribution of instructional time. Perhaps in the English-cognate languages, homework assignments are sufficient for helping students learn to produce written language. However, based on these data, it appears that English-noncognate language curricula may need to allocate more instructional time in class to written production.

The data also give some indication of how much written language learners can produce in limited amounts of time. Such information may help gauge the development of expectations and specifications for classroom tasks and assessments that target discrete elements of written language such as discourse type, function, grammar, and cohesion. Similarly, the development of higher-stakes tests, such as midterm and final examinations, where learners often lament that they do not have enough time, can benefit from an understanding of these baseline data. Implicit in these data is the development of a better understanding on the part of instructors of how to construct authentic and research-based assessments that reflect the holistic nature of writing.

In addition, the data imply that instructors need to understand that, at the end of a course or instructional sequence, students will demonstrate differing levels of knowledge and skills, and that instructors must expect that students will receive a range of proficiency ratings. While the data offered here cluster around the Intermediate to Advanced ranges, there were nevertheless some performances in the Novice range. When examining curricula, it is important for programs to discuss the extent to which word-level writing performances are acceptable either at the end of a first-year sequence for the English-cognate languages or at the end of a second-year sequence in the English-noncognate languages. The data also offer an opportunity for comparing and thus better understanding individual students' performance.

The data offered in this study also provide support for learners' use of keyboarding from the beginning of instruction: The data indicate that learners almost always provided a larger sample of language when they were permitted to type. Longer samples of written language provide instructors with more data from which to make sound instructional and assessment judgments. In addition, instructors need to understand that time spent teaching keyboarding (e.g., how to locate an accent mark) is not time wasted on clerical skills. Rather, foreign language instructors need to take the responsibility of understanding different language keyboards and of


instructing students in the use of these keyboards, both to facilitate the production of written work at their home university and to better understand keyboarding as a cultural component in instruction when studying abroad. Providing a window into the type and level of technology used by members of the culture that students are studying is critical.

Finally, this investigation implies a need for a particular kind and level of professional development. The writing prompts and ratings used throughout this investigation were generated by certified WPT raters, who were also perforce OPI certified. These raters were able to bring an expert level of professional knowledge to the project and to program development, curriculum design, instruction, and assessment. Instructors with this level of knowledge display confidence in their ratings and in their learners. Such instructors also display confidence in each other. The data across cognate and noncognate languages were astonishingly similar. Indeed, time in instruction was different (1 year as opposed to 2 years), but a similar progression occurred across the wide array of languages investigated. These types of data enable instructors to see that ostensible language difficulty for English-speaking learners (e.g., Arabic learning vs. German learning) is not as critical as allocated, and engaged, time. Coming to this kind of understanding of written language development enables instructors to emerge from linguistic silos and brings them into a more collaborative form of collegial interaction across all languages.

Caveats and Concerns

A critical step in designing any program in writing assessment is to validate a protocol that is convenient and cost-effective. In the present case, this means examining the WPA in light of its relationship to the WPT. This validation must take the form of having learners sit for each exam in close time proximity to probe whether they are awarded the same, or virtually identical, ratings. This classic validation process is, of course, costly in terms of both time and money, yet it is critical toward establishing the credibility of the WPA.

In addition, a number of contextual factors may have influenced the distribution of proficiency ratings that are reported here. Because of the institutional context and admission requirements, the learners themselves were likely to be, on average, more verbally astute than many postsecondary learners who present a more moderate range of native language abilities. Because class size was limited to 15 students, instructors may also have been able to pay additional, special attention to learners' literacy development. Finally, since all of the instructors were WPT certified or were in the process of becoming WPT certified, instructors' level of professional knowledge must be taken into consideration when considering the findings from this study.

Renewing writing prompts is also a concern that is worthy of consideration. First, when designing a template that can accommodate a variety of languages, participation, input, and a commitment of time from instructors themselves are crucial. A designated (and, when possible, rotating) team of instructors who are trained in like fashion becomes important in generating new items and potential contexts. Ensuring that all languages are represented, noncognate and cognate alike, at the time of writing prompt development is important both to sustain buy-in to the assessment process and to avoid disadvantaging or, conversely, privileging certain languages. In addition, as with any assessment that is given on a regular basis, and most particularly high-stakes, end-of-course assessments, maintaining test security while avoiding recycling the same core writing tasks and topics is critical, particularly when the tasks and contexts are predictable, especially at the Novice and Intermediate levels. Since the use of familiar or semi-familiar writing contexts supports lower-proficiency writers in producing as much language as possible, the contexts of the writing prompts must be general enough to be accessible to a broad range of students, yet close enough to students' experiences to stimulate them to


write, thus making topic and task selection more difficult and deserving of attention.

Conclusion

This article outlines the staff and assessment development procedures that resulted in the creation of an assessment of students' writing that was aligned with the ACTFL guidelines and consistent with WPT training and certification processes. The assessments used protocols that replicated those used by certified WPT trainers. In addition, the study reports data that were gathered over a 5-year period using those assessments from more than 4,000 samples of undergraduate writing after 1 year (150 hours) and more than 1,000 samples after 2 years (300 hours) of language instruction in French, Italian, Spanish, German, Portuguese, Arabic, Chinese, Japanese, Korean, and Russian. The data provide insight into foreign language learners' productive capacities in writing across languages. Thanks to SOPI ratings for more than 600 learners in a cohort group, the article also illustrates how those writing ratings related to students' oral ratings. In addition, the data support the advantages offered by allowing students to prepare and submit their assessments by computer as well as the ease and security that are provided when assessments are managed using secure, electronic submission and rating processes. Moreover, the data provide insight into curriculum development: Having substantive information helps set expectations for students across languages and levels and allows instructors to set appropriate student learning outcomes and design cohesive learning experiences. Further, the overall assessment project demonstrates the impact of an extensive and ongoing professional training program for instructors: Instructors' deep and consistent level of knowledge of both the oral and writing proficiency guidelines as well as their formal OPI and WPT training underpinned the development of the writing protocol and its use across languages and instructional levels.

References

Armstrong, K. M. (2010). Fluency, accuracy, and complexity in graded and ungraded writing. Foreign Language Annals, 43, 690–702.

Bernhardt, E. (1997). Victim narratives or victimizing narratives? Discussions of the reinvention of language departments and language programs. ADFL Bulletin, 29, 13–19.

Bernhardt, E. (2009). Systemic and systematic assessment as a keystone for language and literature programs. ADFL Bulletin, 40, 14–19.

Bernhardt, E. (2011). Understanding advanced second-language reading. New York: Routledge.

Bernhardt, E., & Brillantes, M. (2014). The development, management and costs of a large-scale foreign language assessment program. In N. Mills & J. Norris (Eds.), Innovation and accountability in language program evaluation (pp. 41–61). Boston: Heinle & Heinle.

Bernhardt, E., Valdés, G., & Miano, A. (2009). A chronicle of standards-based curricular reform in a research university. In V. Scott (Ed.), Principles and practices of the standards in college foreign language education (pp. 54–85). Boston: Heinle & Heinle.

Breiner-Sanders, K., Lowe, P., Miles, J., & Swender, E. (2000). ACTFL proficiency guidelines: Speaking, revised 1999. Foreign Language Annals, 33, 13–18.

Breiner-Sanders, K., Swender, E., & Terry, R. (2002). Preliminary proficiency guidelines–Writing revised 2001. Foreign Language Annals, 35, 9–15.

Brown, N. A., Brown, J., & Eggett, D. L. (2009). Making rapid gains in second language writing: A case study of a third-year Russian language course. Foreign Language Annals, 42, 424–452.

Brown, N. A., Solovieva, R. V., & Eggett, D. L. (2011). Qualitative and quantitative measures of second language writing: Potential outcomes of informal target language learning abroad. Foreign Language Annals, 44, 105–121.

Byrnes, H., Maxim, H. H., & Norris, J. M. (2010). Realizing advanced foreign language writing development in collegiate education: Curricular design, pedagogy, assessment. Modern Language Journal, 94 [Supplement], 1–235.

Clark, J. L., & Li, Y. C. (1986). Development, validation, and dissemination of a proficiency-based test of speaking ability in Chinese and an associated assessment model for other less commonly taught languages. Washington, DC: Center for Applied Linguistics.


Clifford, R., & Cox, T. L. (2013). Empirical validation of reading proficiency guidelines. Foreign Language Annals, 46, 45–61.

Cox, T., Bown, J., & Burdis, J. (2015). Exploring proficiency-based vs. performance-based items with elicited imitation assessment. Foreign Language Annals. doi:10.1111/flan.12152

Cox, T. L., & Clifford, R. (2014). Empirical validation of listening proficiency guidelines. Foreign Language Annals, 47, 379–403.

Dandonoli, P., & Henning, G. (1990). An investigation of the construct validity of the ACTFL Proficiency Guidelines and oral interview procedure. Foreign Language Annals, 23, 11–21.

Dodds, D. (1997). Using film to build writing proficiency in a second-year language class. Foreign Language Annals, 30, 140–147.

Glisan, E. W., Swender, E., & Surface, E. A. (2013). Oral proficiency standards and foreign language teacher candidates: Current findings and future research directions. Foreign Language Annals, 46, 264–289.

Godfrey, L., Treacy, C., & Tarone, E. (2014). Change in French second language writing in study abroad and domestic contexts. Foreign Language Annals, 47, 48–65.

Hedgcock, J. (2005). Taking stock of research and pedagogy in L2 writing. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 597–613). Mahwah, NJ: Erlbaum.

Henry, K. (1996). Early L2 writing development: A study of autobiographical essays by university-level students of Russian. Modern Language Journal, 80, 309–326.

Hubert, M. D. (2013). The development of speaking and writing proficiencies in the Spanish language classroom: A case study. Foreign Language Annals, 46, 88–95.

Kenyon, D., & Tschirner, E. (2000). The rating of direct and semi-direct oral proficiency interviews: Comparing performance at lower proficiency levels. Modern Language Journal, 84, 85–101.

Leki, I. (1995). Coping strategies of ESL students in writing tasks across the curriculum. TESOL Quarterly, 29, 235–260.

Leki, I., & Carson, J. (1997). “Completely different worlds”: EAP and the writing experiences of ESL students in university courses. TESOL Quarterly, 31, 39–69.

Malone, M. (2000). Simulated oral proficiency interviews: Recent developments [Online resource digest]. Retrieved July 10, 2013, from http://www.cal.org/resources/digest/0014sumulated.html

National Standards in Foreign Language Education Project. (2006). Standards for foreign language learning: Preparing for the 21st century. Yonkers, NY: ACTFL.

Omaggio, A. (1986). Teaching language in context: Proficiency-oriented instruction. Boston: Heinle & Heinle.

Reichelt, M. (1999). Toward a more comprehensive view of L2 writing: Foreign language writing in the U.S. Journal of Second Language Writing, 8, 181–204.

Rifkin, B. (2005). A ceiling effect in traditional classroom foreign language instruction: Data from Russian. Modern Language Journal, 89, 3–18.

Shohamy, E., Gordon, C., Kenyon, D. M., & Stansfield, C. W. (1989). The development and validation of a semi-direct test for assessing oral proficiency in Hebrew. Bulletin of Hebrew Higher Education, 4, 4–9.

Stanford University Board of Trustees. (1994). Report of the commission on undergraduate education. Stanford, CA: Stanford University.

Stansfield, C. W., & Kenyon, D. M. (1992). The development and validation of a simulated oral proficiency interview. Modern Language Journal, 76, 129–141.

Swender, E. (2003). Oral proficiency testing in the real world: Answers to frequently asked questions. Foreign Language Annals, 36, 520–526.

Thompson, I. (1996). Assessing foreign language skills: Data from Russian. Modern Language Journal, 80, 47–65.

Thompson, R. J. Jr., Walter, I., Tufts, C., Lee, K. C., Paredes, L., Fellin, L., et al. (2014). Development and assessment of the effectiveness of an undergraduate general education foreign language requirement. Foreign Language Annals, 47, 653–668.

Valdés, G., Haro, P., & Echevarriarza, M. P. (1992). The development of writing abilities in a foreign language: Contributions toward a general theory of L2 writing. Modern Language Journal, 76, 333–352.

Submitted May 12, 2015

Accepted June 17, 2015


APPENDIX

Sample Prompts From 2014 WPA

Short Form

You and your family will be hosting an exchange student from [place] this summer. The student wants to know about your hometown and the surrounding area, and some of the things to see or do while s/he is there. Write an e-mail in [language] to this student in which you:

1. Briefly describe your town (or neighborhood), its location, geography, attractions, etc.
2. Describe, in one or two paragraphs, a local event or tradition typical of your community, for example, a celebration, festival, social or religious practice, etc. Compare it with an event or tradition that may be similar to one where the exchange student is from.
3. Ask four or five questions to find out more about the student in order to plan for her/his arrival.

Note: Be sure to include an appropriate greeting, introduction, and closing in your message.
Suggested length: 2–3 paragraphs
Suggested time: 20–25 minutes

Long Form (includes Short Form)

Imagine that you have been asked to contribute a short article to a [language] blog. The blog has recently published a series of articles on the presence of individual and team sports within American universities. You have been asked to write a short essay in [language] that focuses on the role that organized sports play in campus life. In your essay, you should:

1. First, give a snapshot description of the issue from your perspective as a Stanford student. For example, how prevalent are sports on campus? Does participation in a team sport change the college experience for those students? Second, briefly compare this with another campus organization you feel is of equal importance, e.g., sorority or fraternity, student government, creative arts, professional club, etc.

2. Now recount a specific past experience or event that you observed or heard about (or in which you yourself participated), relating to a sport or other campus group. Describe in detail what happened and how this event illustrated the relationship of the particular organization to campus life.

3. Finally, present your opinion on what you think the role of sports should be within a university setting. For example, is it essential to developing school spirit and community, or could this be accomplished through an alternate structure? To what degree should universities support organizations more closely related to academics? If you were a campus administrator, what changes would you make to the current balance between sports and academics on campus?

Suggested length: 3–4 paragraphs
Suggested time: 30 minutes


