9
Ensinador: corpus-based Portuguese grammar exercises Ensinador: Ejercicios de gram´atica portuguesa basados en corpus AlbertoSim˜oes University of Minho [email protected] Diana Santos University of Oslo and FCCN [email protected] Resumen: En este art´ ıculo describimos una herramienta que crea ejercicios gra- maticales para la ense˜ nanza universitaria del portugu´ es como lengua extranjera, utilizando grandes corpora anotados. Despu´ es de presentar los motivos que nos llevaron a crear dicha herramienta, y el contexto en el que fue desarrollada, pre- sentamos sus especificaciones y sus aplicaciones, analizando algunos requisitos con ejemplos concretos. Presentamos su uso en un contexto de ense˜ nanza real y com- paramos esta herramienta con otros sistemas existentes. Terminamos con una breve descripci´on de trabajos en curso y futuros relacionados con el tema. Palabras clave: corpora, ense˜ nanza de la lengua asistida por ordenador, corpora anotados, herramientas did´acticas, portugu´ es Abstract: In this paper we describe Ensinador, a tool that creates grammar ex- ercises for teaching Portuguese as a foreign language, at university level, based on large annotated corpora. After discussing the motivation for the tool, and the con- text in which it was deployed, we present its specification and its implementation, discussing the particular requirements with concrete examples. We report actual use of the tool in teaching, and compare it to other systems in the literature, ending the paper with a short description of future and ongoing related work. Keywords: corpora, computer assisted language learning, annotated corpora, teaching aids, Portuguese 1 Claims on the use of corpora for language teaching It is a well rehearsed “truth” to say that text corpora are valuable aids for language and linguistics teaching, see e.g. Bacelar do Nascimento (1997). On the other hand, there is no denying that there are many more claims on this than reports of actual use in the classroom, as for example the Teaching and Language Corpora (TaLC) conferences proceedings illustrate. This stems, in our opinion, from three main reasons: the unfortunate separation between cor- pus compilers and corpus users, who are rarely the same people, as discussed and problematized by Santos (1999); individual teaching is and should be — centered on the individual students and classes, and therefore teachers nei- ther have the urge nor the confidence to report their experiences as generalizable or directly interesting for others; despite very honourable exceptions, uni- versity and high school language teach- ers are often the least interested and computer-savvy users of technology and computers. In fact, our experience has been sobering in this respect: Af- ter having lectured for more than a decade around Portugal and Brazil on the AC/DC corpora, the two main an- swers, apart from total neglect, have been: (i) this is not the type of text that interests us/that we base our teaching in; (ii) it is too difficult to use (so they do not even try). So this paper is a first report on how we de- cided to change this situation, not as a cor- pus development community from the out- side, but by developing corpus-based teach- ing aids, from the inside, and using them in the classroom, at the Portuguese grammar courses at the University of Oslo. This demonstrates on the field the poten- Procesamiento del Lenguaje Natural, Revista nº 47 septiembre de 2011, pp 301-309 recibido 02-05-2011 aceptado 26-05-2011 ISSN 1135-5948 © 2011 Sociedad Española Para el Procesamiento del Lenguaje Natural

Ensinador: corpus-based Portuguese grammar exercises · Ensinador: corpus-based Portuguese grammar exercises Ensinador: Ejercicios de gram´atica portuguesa basados en corpus Alberto

Embed Size (px)

Citation preview

Page 1: Ensinador: corpus-based Portuguese grammar exercises · Ensinador: corpus-based Portuguese grammar exercises Ensinador: Ejercicios de gram´atica portuguesa basados en corpus Alberto

Ensinador: corpus-based Portuguese grammar exercises

Ensinador: Ejercicios de gramatica portuguesa basados en corpus

Alberto SimoesUniversity of Minho

[email protected]

Diana SantosUniversity of Oslo and [email protected]

Resumen: En este artıculo describimos una herramienta que crea ejercicios gra-maticales para la ensenanza universitaria del portugues como lengua extranjera,utilizando grandes corpora anotados. Despues de presentar los motivos que nosllevaron a crear dicha herramienta, y el contexto en el que fue desarrollada, pre-sentamos sus especificaciones y sus aplicaciones, analizando algunos requisitos conejemplos concretos. Presentamos su uso en un contexto de ensenanza real y com-paramos esta herramienta con otros sistemas existentes. Terminamos con una brevedescripcion de trabajos en curso y futuros relacionados con el tema.Palabras clave: corpora, ensenanza de la lengua asistida por ordenador, corporaanotados, herramientas didacticas, portugues

Abstract: In this paper we describe Ensinador, a tool that creates grammar ex-ercises for teaching Portuguese as a foreign language, at university level, based onlarge annotated corpora. After discussing the motivation for the tool, and the con-text in which it was deployed, we present its specification and its implementation,discussing the particular requirements with concrete examples. We report actualuse of the tool in teaching, and compare it to other systems in the literature, endingthe paper with a short description of future and ongoing related work.Keywords: corpora, computer assisted language learning, annotated corpora,teaching aids, Portuguese

1 Claims on the use of corporafor language teaching

It is a well rehearsed “truth” to say thattext corpora are valuable aids for languageand linguistics teaching, see e.g. Bacelardo Nascimento (1997). On the other hand,there is no denying that there are many moreclaims on this than reports of actual use inthe classroom, as for example the Teachingand Language Corpora (TaLC) conferencesproceedings illustrate.

This stems, in our opinion, from threemain reasons:

• the unfortunate separation between cor-pus compilers and corpus users, who arerarely the same people, as discussed andproblematized by Santos (1999);

• individual teaching is — and shouldbe — centered on the individual studentsand classes, and therefore teachers nei-ther have the urge nor the confidence toreport their experiences as generalizable

or directly interesting for others;

• despite very honourable exceptions, uni-versity and high school language teach-ers are often the least interested andcomputer-savvy users of technology andcomputers. In fact, our experiencehas been sobering in this respect: Af-ter having lectured for more than adecade around Portugal and Brazil onthe AC/DC corpora, the two main an-swers, apart from total neglect, havebeen: (i) this is not the type of text thatinterests us/that we base our teachingin; (ii) it is too di!cult to use (so theydo not even try).

So this paper is a first report on how we de-cided to change this situation, not as a cor-pus development community from the out-side, but by developing corpus-based teach-ing aids, from the inside, and using them inthe classroom, at the Portuguese grammarcourses at the University of Oslo.

This demonstrates on the field the poten-

Procesamiento del Lenguaje Natural, Revista nº 47 septiembre de 2011, pp 301-309 recibido 02-05-2011 aceptado 26-05-2011

ISSN 1135-5948 © 2011 Sociedad Española Para el Procesamiento del Lenguaje Natural

Page 2: Ensinador: corpus-based Portuguese grammar exercises · Ensinador: corpus-based Portuguese grammar exercises Ensinador: Ejercicios de gram´atica portuguesa basados en corpus Alberto

tialities of annotated corpora, and makes thetools user-driven, to an extent that outsidelecturers could never achieve.

In what follows, we describe the rationaleand the use of Ensinador1 as the first versionof a tool suite designed for using the powerof the annotated AC/DC corpora to createexercises and teaching materials about Por-tuguese grammar.

2 Pedagogical assumptions andcontext

We concur with Sinclair (1997, page 31) thata student should be presented authentic ma-terials from the start.

[...] it is almost impossible to invent anadequate example; attempts by languageteachers, lexicographers an others to rep-resent usage are often embarrassing andnever reliable.

The same point is also forcefully made bySardinha (2007) when discussing teaching ofEnglish in Brazil.

Moreover, in a time when access to writtenand spoken Portuguese, thanks to the Inter-net, is not a problem for any adult studentof the language, it is easy to expose or getexposed to language material.

The role of a university teacher, then, isto guide the students in mastering or at leastunderstanding and being aware of the gram-matical and cultural idiosyncrasies of the lan-guage s/he is teaching, illustrated and moti-vated by the actual speech of native languagespeakers.

We note that, in the context of the teach-ing to adult learners of an international lan-guage such as Portuguese, natively spoken inthe five continents, there is no point in requir-ing, or even expecting, a teacher to know in-timately all varieties. This speaks even moreforcefully for the use of authentic materialsin the classroom and outside the classroom.

Besides, no matter the excellency of PLE2

manuals, most students who attend a uni-

1The name, derived from the verb “ensinar” (toteach), is a pun on “escrevedor” (used in VargasLlosa’s book title in Portuguese Tia Julia e o es-crevedor, and is meant to express something feeblerand less worthy than a teacher (“professor”), just like“escrevedor” is less than a writer (“escritor”). Thesesubtleties are lost in English, where the correspondingnouns are regularly derived from the verbs.

2Portugues como lıngua estrangeira, that is, Por-tuguese for foreigners.

versity course have contacted directly withPortuguese in a Portuguese speaking country,and have therefore constructed their model ofthe language, which is richer and more diver-sified than any manual, which must be writ-ten for a general audience, can purport toaddress.

Finally, we believe that the teacher hassome, if not considerable, influence on whatthe students learn and grasp. Therefore thecreation of materials for teaching grammarand culture cannot be done automatically orbeforehand a particular curriculum and ped-agogical environment is defined and imple-mented.

Aligning with the current Natural Lan-guage Processing (NLP) current who believesin computer-assisted applications (ratherthan fully automatic ones) — this is a pointof view we have strived to defend in Lin-guateca — we created this program not toreplace the teacher in the curriculum designor in the exercise choice, but to aid the de-ployment of both curriculum and exercises,leaving the teacher in control.

And this is how we o"er this service tothe whole community of PLE teachers (andpossibly also teachers of Portuguese grammarto native speakers): teachers are thus able tocreate their own materials, by selecting, andeven correcting, the material automaticallydiscovered by Ensinador.

This is the main di"erence between ourapproach and that of VISL (Bick, 2005) andWERTi (Working With English Real Texts)by Meurers et al. (2010), which are meant tobe directly used by the learners – with con-sequent attention to their interface and us-ability. Of course there are also other di"er-ences compared to these older and excellentsystems: While WERTi annotates full textson demand with specific developed gram-mars, and creates several exercises on top ofthose texts, it does so for a fixed and pre-programmed number of well-known di!cul-ties for learners of English. Our approach,in contrast, uses pre-annotated (parsed) cor-pora, but allows any possible kind of contextand thus exercise type to be devised by theteacher/user. As to VISL, its setup is cur-rently limited to simpler exercises that can begeneralized to the multitude of the languagesencompassed by the project, while Ensinadorwas devised to exploit maximally the idyosin-crasies of the Portuguese language (or of the

Alberto Simões y Diana Santos

302

Page 3: Ensinador: corpus-based Portuguese grammar exercises · Ensinador: corpus-based Portuguese grammar exercises Ensinador: Ejercicios de gram´atica portuguesa basados en corpus Alberto

annotated corpora behind it). Finally, an im-portant property of both WERti and VISLis their immediate feedback/correction afterthe user inputs. This is, however, not theway we intend the exercises to be used andunderstood, as explained later in this paper.

Before we present Ensinador in detail, wegive some information on the context it wasdeveloped, briefly introducing AC/DC andLinguateca.

3 The AC/DC service

One of the main goals of Linguateca, aproject devoted to fostering R&D on thecomputational processing of the Portugueselanguage (Santos, 2000; Santos, 2009), wasto create widely available resources for Por-tuguese. Therefore, since 1999 (Santos andBick, 2000) we have been serving access tocorpora on the Web as one of the first ser-vices on this kind, and this service has pro-gressed and improved ever since, giving aswell origin to other more specialized projectsand services3.

Currently (Santos, 2011) the AC/DC ser-vice gives access to more than 250 millionwords from various genres, from more than20 di"erent corpora, all of them syntacti-cally analysed with PALAVRAS (Bick, 2000)which is a full fledged parser for Portuguesedeveloped in the context of the VISL projectby Eckhard Bick. In addition, several otherkinds of information are associated with thesecorpora, and the most relevant for the pur-poses of the present paper is the semanticinformation on some domains such as colour,clothing and sentiments4.

We believe that AC/DC is one of the mostpowerful and knowledge-rich corpus systemson the Web no matter the language, and sev-eral other services have been built on top ofit: a frequency service, and a semantic re-lation discovery service (Freitas et al., 2011)are examples of such tools.

Ensinador is also built on top of AC/DC,and we turn to its functionalities now.

3For example, the first treebank for Portuguese,Floresta Sinta(c)tica (Afonso et al., 2002), and paral-lel corpora such as COMPARA (Frankenberg-Garciaand Santos, 2002) and CorTrad (Tagnin, Teixeira,and Santos, 2009).

4Currently only fear is available, see (Maia andSantos, 2011).

4 Ensinador: Functionalities

Basically, we wanted to create exercises basedon particular features of grammar — and getat the same time the exercise and its solution.

The kind of exercise that Ensinador pro-duces is a cloze test, that is, the studentis given sentences where one or more wordshave been removed, and s/he should fill theblanks, therefore restoring the original text.As a very simple example, consider the fol-lowing sentence to be completed with the cor-rect tense of the verb ter (to have):

A minha irma mais velha morreu quando88 anos.

Having access to millions of words, thefirst requirement was to be able to selectamong the thousands of possible candidates.

In addition, to create a single exercisewe might employ di"erent query expressions,and apply them to di"erent corpora5.

This means that after a sizeable numberof selected examples the teacher would like toshu#e them in random order — keeping theexercises and solutions aligned, of course.

In some cases, when the goal of the ex-ercise concerns for example tense (a mor-pheme in most cases in Portuguese), in orderto recreate the original utterance one wouldneed to know the actual verb that had beenremoved, so the next requirement was to haveit in the lemma form, after the sentence, seefigure 1.

But soon other, more complex, casesturned up: for example, to drill the useof quantifiers, it was deemed important tochoose the right number of a particular noun6

and therefore, also the noun lemma was re-quired in a search where more than one wordwas specified, see figure 2. (In other words,depending on the search expression, and onthe actual complexity of the results, di"erentinformation could be needed to be expressedas an aid to the exercise).

In still other cases, exercises whose goal isto distinguish between false friends or relatedmeanings (such as reflexive vs. non-reflexive

5For example, corpora of newspaper text, or oralinterviews, or blogs, or literary fiction, or even tech-nical language...

6Given that there are the four possibilities todaa historia, todas as historias, a historia toda, ashistorias todas with di!erent meanings and shades,roughly paraphrasable in English by the similar set“each story”, “all stories”, “every story”, “the wholestory”, “the whole stories”.

Ensinador: corpus-based Portuguese grammar exercises

303

Page 4: Ensinador: corpus-based Portuguese grammar exercises · Ensinador: corpus-based Portuguese grammar exercises Ensinador: Ejercicios de gram´atica portuguesa basados en corpus Alberto

Figure 1: Exercise on the position of the nunca and sempre adverbs.

Figure 2: Exercise on the todo quantifier (position and form)

conjugation) require that the student is giventhe verbal tense and person so that s/he caninsert the right lexeme in the correct form;see figure 3.

The result is then obtained as simpleHTML pages (one for the exercises and other

for the solution), which can easily be editedby the teacher afterwards.

5 Ensinador: how it works

Ensinador is a web application. Its basic be-havior is similar to a simple concordance sys-

Alberto Simões y Diana Santos

304

Page 5: Ensinador: corpus-based Portuguese grammar exercises · Ensinador: corpus-based Portuguese grammar exercises Ensinador: Ejercicios de gram´atica portuguesa basados en corpus Alberto

Figure 3: Exercise on the choice between reflexive or non-reflexive conjugation

tem: it requires the user to introduce a searchstring, using the Corpus Query Processor(CQP) syntax (Christ et al., 1999; Evert andthe OCWB Development Team, 2010), andto select the corpus to be searched. Option-ally, the user might choose to define a title forthe exercise section being created. Figure 4shows this interface.

Figure 4: Start point when using Ensinador.

Although any well formed query can beused, it is important to note that it is the hitsthat are replaced by blanks. It is thereforepossible, and useful, to specify cases whereboth simple and complex tenses (for exam-ple) may be used. (The same space, how-ever, should be provided in the exercise, notto uncover the solution.)

For example, one might search bythe verb “comer” in any of its forms,using “[pos="V.*"]* [lema="comer" &

func=".MV.*"].” The tool will present apage with the results in concordance form.An example of searching for this expressionin a test corpus is presented in figure 5.

Figure 5: Ensinador presenting concordancesto the user.

Note that each concordance has a checkbox at the left. They are used by theteacher to choose the example sentences forthe exercise, according to his or her personaltaste and appropriateness to the grammaticalpoints s/he wishes to illustrate. After choos-ing the sentences to be used and clicking the“OK” button, Ensinador presents the currentstatus of the exercise, as shown in figure 6.

Ensinador: corpus-based Portuguese grammar exercises

305

Page 6: Ensinador: corpus-based Portuguese grammar exercises · Ensinador: corpus-based Portuguese grammar exercises Ensinador: Ejercicios de gram´atica portuguesa basados en corpus Alberto

Figure 6: Ensinador presenting status of theexercise.

In this step the tool shows the exercise asit would be presented to the student, withthe part of the sentence that matched withthe query replaced by a blank space to befilled in by the student.

At this point the teacher is able to do sev-eral tasks, such as

• to see the solution, that is, this same in-terface but with the blanks replaced bythe respective words;

• to scramble the order of the sentences(as many times as s/he wishes);

• to download the exercise or the solutionin HTML format (without the buttoncontrols);

• to add a global title to the exercise.

Also, it is possible to perform new searches,and add them to the current exercise. Thisnew search can use a di"erent search expres-sion, can add the lines to the same exercisesection (under the same title) or in a newone, as well as change the corpus where theexpression will be searched.

As discussed previously, to produce mean-ingful exercises it is often necessary to pro-vide information about what should fill theblanks. To allow for this, the CQP syntaxwas extended in order to let the teacher se-lect extra information to be presented to thestudent. This extension allows the teacherto add one or more attributes (provided theyare encoded in the corpus) to be added tothe end of the sentence, in parenthesis. Forexample, the query “[pos="V"].lema” willfind all sentences with verbs, and present theverb lemma:

O atendimento que a casa as mulherese atraves de reunioes e cursos. (fazer)

Another example: In order to requestverbs with clitics, one could employ aquery such as [pos="PRON"].pessnum[pos="V"].lema.temcagr, which wouldthen produce exercises like the following:

Quem dinheiro era imediatamenteparticipado a polıcia. (ela, dar, imperfeito doconjuntivo)

6 Implementation

Ensinador is built over the AC/DC corpora,that are encoded in the Open IMS Cor-pus Workbench (Evert and the OCWB De-velopment Team, 2010), and tagged withPALAVRAS (Bick, 2000). The applicationinterface is written in the Perl programminglanguage, using the CWB::CQP module, andinterfaces with PHP configuration files usingthe PHP::Include module.

The tool is built using a Common Gate-way Interface (CGI), using a minimal amountof Java Script that is controlled by jQuery7.It does not use any kind of session or cookiemanagement, making it easy to use with anyexistent browser.

The most relevant part in the overall al-gorithm is how the CQP language was mod-ified, and how the extra properties of wordsare computed: A partial parser of the de-fault CQP language was implemented, andextended to interpret one or more optionalattributes after each token. As this is an ex-tension, it is not possible to easily detect re-peating tokens, and therefore the attributescan only be used in non-repeating or non-optional query expression parts. For exam-ple, one could not add the name of an at-tribute to the two first units of the nextquery, only to the first and to the last, in“[pos="V"] []+ [pos="N"].lema”, as En-sinador is not able to know how many to-kens were matched in the repeating expres-sion. When multiple optional or repeatingtokens are present, only the fixed tokens inthe beginning or the end of the expressioncan be annotated.

After associating each attribute with therespective token position, the tool activatesand deactivates attributes, one at a time,to fetch the information to be presented be-

7Available from http://jquery.com/.

Alberto Simões y Diana Santos

306

Page 7: Ensinador: corpus-based Portuguese grammar exercises · Ensinador: corpus-based Portuguese grammar exercises Ensinador: Ejercicios de gram´atica portuguesa basados en corpus Alberto

tween parentheses. This can be a very ine!-cient procedure, so Ensinador only computesthis information once for each concordance,caching subsequently the information.

7 Use in Teaching

Even though this project is quite young, wehave used Ensinador’s results while teach-ing Portuguese grammar at the Departmentof Literature, Area Studies and Languages(ILOS) at the Arts Faculty of the Universityof Oslo in the 2011 spring term, for studentsof beginning and intermediate levels in Por-tuguese grammar (all of them already profi-cient in Portuguese, to various degrees).

The way this was done was as follows:after the theoretical material had been pre-sented, the students would try to do the firstpart of the exercises in class. Doubts aboutthe exercise, as well as comments about par-ticular cases or di!culties were discussedin class, before the students took it hometo complete. Then, after a week, the re-sults (i.e., the original renderings) were madeavailable to the students, who could comparethem with their own work. Students were en-couraged to come back to the teacher if theydid not understand why a particular render-ing had been chosen by its author, but theynever did.

One should note that it was consistentlyemphasized by the teacher that in some casesthe “solution” was merely a question of stylis-tic choice by the text author; in other cases,one would need to access a larger context todecide; while in some cases only one possibil-ity was grammatically correct.

In a way, this should have been high-lighted later when the solutions were madeavailable, but was not (yet) done. We plan todo this new analysis in the Fall semester, sothat students can consolidate their grammarskills by reflecting upon their choices and theauthors’. They will also be encouraged toexpress the meaning di"erences when morethan one solution is possible.

All in all, the students’ exposition to sev-eral kinds of text genres (and varieties) byreading these authentic sentences was gener-ally felt to be positive. In fact, an addedadvantage was that it was clear which vari-ety (from Portugal or Brazil) the sentencesbelonged to, thanks to the codes associatedwith the concordance lines. This allowed abetter understanding of the commonalities

and specificities of the two varieties in thegrammatical subjects at hand.

8 Further Work

There are two distinct schools in foreign lan-guage teaching as far as the use of translationmaterials in foreign language teaching is con-cerned (Tavares, 2008), but we won’t arguefor any of these positions here.

Rather, we note that, no matter the po-sition as far as language teaching is con-cerned, looking at translation practice is ab-solutely required when you teach transla-tion, and one can reuse a variation of thepresent tools to teach translation if one hasaccess to aligned parallel corpora encodedin a AC/DC-like format like COMPARA orCorTrad, or the ones being developed in Per-Fide (Araujo et al., 2010).

One can then present translation pairswith the source or the target language itemsremoved, for the constructions or lexicalitems that the teacher is interested in, as il-lustrated by Frankenberg-Garcia (1999).

In order to concentrate precisely on the is-sues that have a non-standard solution, or forwhich several genuinely di"erent translationsolutions have been put forward, it would beadvantageous to have a corpus of many trans-lations on which to base the Trans-ensinador.As pointed out by Malmkjaer (1996), thepoints where there is disagreement or multi-ple solutions are those which illuminate bestthe di"erences between the languages and thecomplexities of the translation among them.

We plan thus in the near future to deployTrans-ensinador, based on parallel corpora.In order to o"er a richer set of options, weplan to use a word-aligner, so that the queryexpression can be more precise and the trans-lation part better highlighted than if simplepairs of (aligned) sentences were presented.This will be done with NATools (Simoes andAlmeida, 2003) using an approach alreadyapplied to COMPARA (Santos and Simoes,2008), but which needs to be extended so thatit can be used in a search expression as well.

Other AC/DC based teaching tools havealso been deployed but have so far not beingused in actual teaching, namely:

• a comparison tool that allows the user tocontrast two di"erent queries;

• a quantitative distribution tool that

Ensinador: corpus-based Portuguese grammar exercises

307

Page 8: Ensinador: corpus-based Portuguese grammar exercises · Ensinador: corpus-based Portuguese grammar exercises Ensinador: Ejercicios de gram´atica portuguesa basados en corpus Alberto

gives a bird’s eye of the frequency of sev-eral phenomena.

• the possibility to search by semantic re-lation, that is, all cases that have animalas hypernym, or are antonyms of bom(“good”), or synonyms of mesa (“ta-ble”), by annotating the AC/DC corporawith this additional information.

9 Concluding remarks

With this paper, we presented a work-ing NLP application in CALL (or CALT,computer-aided language teaching) whichradically increases a teacher’s power whileusing resources compiled for the computa-tional processing of Portuguese. The systemis available from http://www.linguateca.pt/Ensinador/ and its open source code canbe downloaded by whoever wants to use it inconnection with their own corpora (possiblyin other languages). We should anyway stressthat the wealth of the corpus material andunderlying annotation is the result of a long-term commitment by Linguateca to makingresources available for Portuguese and main-taining them in a long term perspective.

AcknowledgmentsThe work described here was developed in thescope of Linguateca, which has been funded bythe Portuguese government, EU (FEDER andFSE) POSC/339/1.3/C/NAC, UMIC and FCCN.The work in Trans-Ensinador is being alsofunded by the project Per-fide, Portugues emparalelo com seis lınguas (Portugues, Espanol,Russian, Francais, Italiano, Deutsch, En-glish) grant PTDC/CLE-LLI/108948/2008 fromFundacao para a Ciencia e a Tecnologia.

References

Afonso, Susana, Eckhard Bick, RenatoHaber, and Diana Santos. 2002. Flo-resta sinta(c)tica: um treebank para oportugues. In Anabela Goncalves andClara Nunes Correia, editors, Actas doXVII Encontro Nacional da AssociacaoPortuguesa de Linguıstica (APL 2001),pages 533–545, Lisboa, Portugal, 2-4 deOutubro de 2001. APL.

Araujo, Sılvia, Jose Joao Almeida, AlbertoSimoes, and Idalete Dias. 2010. Ap-resentacao do projecto Per-Fide: Par-alelizando o portugues com seis outraslınguas. Linguamatica, 2(2):71–74, June.

Bacelar do Nascimento, Maria Fernanda.1997. A exploracao de corpora linguısticosno ensino/aprendizagem do portugues.In Seminario Internacional do Portuguescomo Lıngua Estrangeira, Macau, 21–24,Maio.

Bick, Eckhard. 2000. The Parsing Sys-tem “Palavras”: Automatic Grammati-cal Analysis of Portuguese in a Con-straint Grammar Framework. Ph.D. the-sis, Aarhus University.

Bick, Eckhard. 2005. Grammar for fun: IT-based grammar learning with VISL. InP. Juel, editor, CALL for the Nordic Lan-guages, pages 49–64, Copenhagen. Sam-fundslitteratur.

Christ, Oliver, Bruno M. Schulze, Anja Hof-mann, and Esther Konig, 1999. The IMSCorpus Workbench: Corpus Query Pro-cessor (CQP): User’s Manual. Institutefor Natural Language Processing, Univer-sity of Stuttgart, March.

Evert, Stefan and the OCWB Develop-ment Team. 2010. The ims opencorpus workbench (cwb) cqp query lan-guage tutorial (cwb version 3.0), 17February. http://cwb.sourceforge.net/files/CQP_Tutorial.pdf.

Frankenberg-Garcia, Ana. 1999. Crosslin-guistic influence as a key to extracting sec-ond language teaching materials for mono-lingual classes from translation corpora.In Sylviane Granger, editor, Proceedingsof the Workshop Contrastive Linguisticsand Translation Studies: Empirical Ap-proaches, 5-6 February.

Frankenberg-Garcia, Ana and Diana Santos.2002. COMPARA, um corpus paralelo deportugues e de ingles na Web. Cadernosde Traducao, IX(1):61–79.

Freitas, Claudia, Diana Santos,Hugo Goncalo Oliveira, and Violeta Quen-tal. 2011. VARRA: Validacao, Avaliacaoe Revisao de Relacoes semanticas noAC/DC. In Atas do ELC2010.

Maia, Belinda and Diana Santos. 2011. Whous afraid of ... what? Fear in English andPortuguese. In 32th ICAME.

Malmkjaer, Kirsten. 1996. Who walked inthe emperor’s garden: The translation of

Alberto Simões y Diana Santos

308

Page 9: Ensinador: corpus-based Portuguese grammar exercises · Ensinador: corpus-based Portuguese grammar exercises Ensinador: Ejercicios de gram´atica portuguesa basados en corpus Alberto

pronouns in Hans Christian Andersens in-troductory passages. In Gunilla Ander-man and C. Baner, editors, Proceedingsof the Tenth Biennial Conference of theBritish Association of Scandinavian Stud-ies. The University of Surrey, Departmentof Linguistics and International Studies.

Meurers, Detmar, Ramon Ziai, Luiz Ama-ral, Adriane Boyd, Aleksandar Dimitrov,Vanessa Metcalf, and Niels Ott. 2010. En-hancing authentic web pages for languagelearners. In Proceedings of the NAACLHLT 2010 Fifth Workshop on Innova-tive Use of NLP for Building EducationalApplications, pages 10–18, Stroudsburg,PA, USA, June. Association for Compu-tational Linguistics.

Santos, Diana. 1999. Disponibilizacaode corpora de texto atraves da WWW.In Palmira Marrafa and Maria AntoniaMota, editors, Linguıstica Computacional:Investigacao Fundamental e Aplicacoes.Actas do I Workshop sobre LinguısticaComputacional da Associacao Portuguesade Linguıstica (Lisboa, 25-27 de Maio de1999), pages 323–335, Lisboa. Colibri.

Santos, Diana. 2000. O projecto Pro-cessamento Computacional do Portugues:Balanco e perspectivas. In Maria dasGracas Volpe Nunes, editor, V Encontropara o processamento computacional dalıngua portuguesa escrita e falada (PRO-POR 2000), pages 105–113, Sao Paulo, 19-22 de Novembro. ICMC/USP.

Santos, Diana. 2009. Caminhos percorri-dos no mapa da portuguesificacao: A Lin-guateca em perspectiva. Linguamatica,1(1):25–59, Maio.

Santos, Diana. 2011. Linguateca’s infras-tructure for Portuguese and how it allowsthe detailed study of language varieties.OSLa: Oslo Studies in Language, 3.

Santos, Diana and Eckhard Bick. 2000.Providing internet access to portuguesecorpora: the AC/DC project. InMaria Gavrilidou et al, editor, Second In-ternational Conference on Language Re-sources and Evaluation, LREC 2000,pages 205–210, Athens, May–June.

Santos, Diana and Alberto Simoes. 2008.Portuguese-English word alignment: someexperiments. In LREC 2008 — The 6th

edition of the Language Resources andEvaluation Conference, Marrakech, 28–30,May. European Language Resources Asso-ciation (ELRA).

Sardinha, Tony Berber. 2007. The bookis NOT on the table: Autenticidade eidiomaticidade do texto para o ensinode ingles na perspectiva da linguısticade corpus. In Maria Cristina Dami-anovic, editor, Material Didatico: Elab-oracao e Avaliacao. Taubate: Cabral Ed-itora, pages 273–286.

Simoes, Alberto M. and J. Joao Almeida.2003. NATools – a statistical word alignerworkbench. Procesamiento del LenguajeNatural, 31:217–224, September.

Sinclair, John. 1997. Corpus evidence in lan-guage description. In Anne Wichmann,Steven Fligelstone, Tony McEnery, andGerry Knowles, editors, Teaching and lan-guage corpora. Longman, London & NewYork, pages 27–39.

Tagnin, Stella E.O., Elisa Duarte Teixeira,and Diana Santos. 2009. CorTrad: amultiversion translation corpus for thePortuguese-English pair. Arena Roman-istica, 4:314–323.

Tavares, Ana. 2008. Ensino / Aprendizagemdo Portugues como Lıngua Estrangeira:Manuais de iniciacao. Lidel, Lisboa.

Ensinador: corpus-based Portuguese grammar exercises

309