POST-PRINT VERSION: final draft post-refereeing

Tercedor Sánchez, Maribel, López Rodríguez, Clara Inés and Alarcón Navío, Esperanza

(2013). “Identifying features of translation through multiword lexical units”. Belgian

Journal of Linguistics 27 (Lefer, Marie-Aude and Svetlana Vogeleer (eds.), Interference and

normalization in genre-controlled multilingual corpora [BJL 27]), 87–109.

doi 10.1075/bjl.27.05ter

Identifying translation features in multi-word lexical units1

ABSTRACT

Multi-word lexical units can often be rendered by different lexicalizations in the

target language. Variation in the translation of multi-word lexical units,

specifically multi-word cognates, can be regarded as an indicator of interference,

since there is evidence of a priming effect which leads to the production of such

units in interlinguistic communication (Kroll and Stewart 1994). This paper

studies the production of multi-word cognates in ecological experimental

translation. For this purpose real text units and multiple-choice tasks are used, and

the data thus obtained are compared with corpus instances. The results show that

there is a correlation between the spontaneous production of multi-word cognates,

as evidenced experimentally, and their frequency as attested by corpora.

Keywords: specialized translation, multi-word cognates, lexical production, lexical

decision, interference.

1. Introduction

The web has become a major source of specialized knowledge acquisition for lay

audiences. This is the case of medical information, for which the web offers a wealth of

more or less detailed resources of varying degrees of reliability. In this context,

translation plays a major role as a tool to disseminate knowledge. In the study of multi-

word lexical units representing specialized concepts, lexical units are often the result of

a translation process that takes place in a space-constrained context such as the web. In

webpages, text is often used to describe images and is subject to space restrictions.

Furthermore, the recency of the information and potential updates may affect the on-line

documentation process of translators. With regard to the Internet, the rapid changes in the

information and the possibility of accessing the web from any place make it particularly

suited to the presence and assimilation of borrowings and calques. The extent to which

these borrowings are accepted often depends on the contact between the two languages

involved.

According to Tercedor and López (2012: 252-253), medical concepts can be lexicalized

in various ways depending on the facet of the concept being highlighted. The facet

selected can reflect a certain specialized domain or a priority of the text sender. This is

evidence of the multidimensional nature of medical terminology and of terminology in

general. Furthermore, a particular term can be chosen because of the geographic,

historic, or social context in which it is going to be used. The existence of multiple ways of naming a

concept is often referred to as terminological variation2, and term variants are the

designations used for this purpose. Researchers in both translation and terminology

have long acknowledged the cognitive and communicative motivation of terminological

variation. Although in the last few years, variation has been approached mostly from a

cognitive perspective (Fernández 2010; Fernández & Kerremans 2011; Tercedor 2011),

its communicative aspects have also come into the spotlight (Freixa 2006; Freixa et al.

2008; Tercedor & Méndez 2000).

The aim of this paper is to study multi-word cognates as one of the manifestations of

lexical variation in translated texts. In this research cognates are understood as lexical

items with a shared form and semantic overlap in two languages, usually sharing

etymological roots, including borrowings (Poplack 2004) and nonce borrowings

(Sankoff et al. 1990). Nonce borrowings differ from borrowings in that they are not

necessarily recurrent, widespread, or recognized by host language monolinguals

(Sankoff et al. 1990: 71). Therefore, the presence of cognates in target texts can be

studied as a form of source-language interference (Toury 1995), or as a translation

feature in space-constrained texts. However, it is also true that there are certain cognates

that are difficult to avoid in translation. A case in point is the translation of words of

Latin and Greek origin shared by the source and target language, which are usually

transferred as cognates. In a previous study we analyzed the lexical units produced by

English-Spanish bilinguals, which could also be translated as cognates. In this context,

the use of cognates often reflected interference of the source language, a phenomenon

typical of a languages-in-contact situation (Tercedor 2010).

1.1. VARIMED and the study of Multi-word Lexical Units

This study was carried out as part of the VARIMED research project

(http://varimed.ugr.es), whose aim is to identify and describe denominative variation in

medical communication in English and Spanish from a cognitive and communicative

perspective. As part of the project, terminological variants are currently being compiled

in a multifunctional and reusable lexical resource in the field of health care. This

database also includes images and linguistic information such as the following: part of

speech, register, dialectal variation, as well as data on usage and subjective familiarity3.

As can be seen in Figure 1, relevant linguistic features of terminological variants are:

English/French calque, eponym, false friend, misspelling, Latin term, neologism,

English borrowing, acronym, most used term, spelling variant. The database is now

being designed so that term variants can have more than one tag.

Figure 1. Linguistic and usage tags for terminological variants in VARIMED.

Another aim of VARIMED is to carry out a series of experimental studies that will

provide insights into the phenomenon of term variation in relation to the cognitive

processes of lexical production and comprehension.

In this context, the study of multi-word lexical units is a valuable source of information

because a high percentage of medical terminology is composed of such units.

Consequently, the recognition and decoding of multi-word lexical units (MWLUs) as

units is crucial to the understanding of medical terminology. MWLUs constitute

knowledge units in specialized domains and express conceptual relations (Moldovan et

al. 2004; Nastase and Szpakowicz 2003) such as PART-OF, HAS-FUNCTION, IS-A, HAS-PROPERTY,

IS-LOCATED-IN, etc., as illustrated in Figure 2:

Figure 2. Multi-word lexical units and their implicit conceptual relations.

The study of MWLUs is also a priority because in specialized texts, they are often the

result of translations into different languages. Multi-word cognates can be studied as a

marker of interference (Toury 1995) or a feature of translation that can be attributed to

the influence of the source language (Rabadán et al. 2009). Thus, this paper focuses on

multi-word cognates that are manifestations of source-language interference and lexical

variation in translated texts. An illustration of this can be found in the in-context

Spanish translations for the term “Non-Hodgkin Lymphoma”: linfoma no Hodgkin

(translation influenced by the source language) and linfoma no hodgkiniano (more

natural term in Spanish). The English form (Non-Hodgkin Lymphoma) and the Spanish

variant linfoma no Hodgkin are multi-word cognates, i.e. lexical items with a shared

form and semantic equivalence.

In line with Kroll and Stewart’s (1994) Revised Hierarchical Model, we hypothesize

that in L2-L1 translation, cognates are primed since the lexical route is accessed instead

of the conceptual route. This hypothesis guides the aims and research questions of this

study.

1.2. Objectives and research questions

The objectives of our study are the following:

a. To identify translation features in a production task: source-language interference,

explicitation and “untypical collocations” (Mauranen 2008: 41-44)

b. To compare the result of experimental tasks involving the translation of MWLUs

with corpus frequencies in order to see whether spontaneous production of cognates

runs parallel to attested corpus frequencies of cognates.

c. To test the usefulness of corpora in the translation of MWLUs and to point out

which corpora provide better usage examples in their translation.

These objectives are related to the following research questions:

1. In time-constrained L2–L1 translation of naturally occurring contexts, are cognates

produced as translation equivalents?

2. Do students react differently when translating spontaneously as opposed to selecting

an option in a multiple-choice task?

3. Is there a correlation between the frequency of cognates in translated text and the

frequency of cognates in L1 non-translated corpora?

4. Is there a correlation between the experimental production of non-cognates and low

frequencies of cognate terms in non-translated L1 corpora?

2. Methods

In translation studies, MWLUs have been empirically studied with methods such as: (i)

concordancing in large corpora; (ii) comparing thesauri, lexicons, and dictionaries; (iii)

interviewing subject groups.

In this study, we combined an experimental task and corpus data to analyze English-to-

Spanish translations of MWLUs. We compared data obtained in a production task with

corpus evidence to see if there was a correlation between the use of cognates in

spontaneous translation and the frequency of cognates in comparable corpora.

Therefore, the emphasis was on contrasting results obtained in ecological experimental

tasks with corpus data analysis.

2.1. Corpus compilation

In VARIMED, we follow a combined bottom-up and top-down approach. More

specifically, our bottom-up approach implies compiling a well-balanced English-

Spanish corpus using the Oncoterm Corpus as a starting point. The Oncoterm corpus

was developed between 1999 and 2002 as part of a research project on medical

terminology in the field of Oncology (Faber, López and Tercedor 2001; López, Faber

and Tercedor 2006; Oncoterm Project 2002). The corpus has 32 million words, of which

28,771,000 belong to its English section. Based on this initial corpus as well

as on resources for corpus compilation and analysis, such as Sketch Engine (Kilgarriff et

al. 2004), we are in the process of updating our corpus to account for instances of term

variation and to identify parameters that guide the production and use of term variants.

This corpus is being used to study term variants from both an intralingual as well as an

interlingual perspective, based on multilingual approaches to corpora (Granger et al.

2003, Johansson 2007) in the contrastive analysis of medical texts.

The top-down methodology of VARIMED involves structuring the domain of SIGNS AND

SYMPTOMS, with reference to the body part where such signs and symptoms appear and

are felt, and to the condition they refer to. The final objective is to offer researchers and

general users alike a network of terminological variants for categories of symptoms and

conditions in English and Spanish.

The data for this study were based on a corpus of non-translated language (English and

Spanish) combined with a corpus of translated texts (from English into Spanish). The

comparable component includes texts that were originally written in English and

Spanish: (i) for English, the EnTenTen12 corpus and the English component of the

Oncoterm Corpus; (ii) for Spanish, the EsTenTen corpus and the Spanish component of

the Oncoterm Corpus. The ‘TenTen’ corpora for English and Spanish were compiled

and accessed by means of the Sketch Engine corpus query system (Lexical Computing

Ltd., n.d.). The EnTenTen12 corpus is a web corpus of more than 11,000 million words

tagged by TreeTagger4. The EsTenTen11 (Eu) corpus is a web corpus of European

Spanish containing more than 2,000 million words also tagged by TreeTagger. As

opposed to the TenTen corpora, the Oncoterm corpus is a specialized corpus on cancer.

Because of its size and the fact that oncology is a multidisciplinary area within

medicine, it provides many usage examples not only on oncology but also on

general medicine. The translation corpus is composed of the Spanish

translations of the English texts published in Medline Plus5.

The corpus composition is shown in Table 1. The numbers of words provided do not

include numbers and punctuation marks.

Comparable corpus (non-translated language)

    English:
        EnTenTen corpus: Sketch Engine (11,191,860,036 words)
        Oncoterm corpus (28,771,714 words)

    Spanish:
        EsTenTen11 corpus: Sketch Engine (2,103,770,763 words)
        Oncoterm Corpus (13,645,317 words)

Translation corpus

    Spanish translated texts from English:
        MedlinePlus in Spanish (709,764 words)

Table 1. Composition of the corpus.

2.2. Subjects

In our experiment, we selected 34 undergraduate students in a third-year course on

scientific and technical translation within the Degree of Translation and Interpreting of

the University of Granada. They were 20-24 years of age (mean age, 21), with an L1 of

Spanish, and an L2 of English. Initially, 44 students participated in the task but 10 had

an L1 other than Spanish and were subsequently eliminated from the study.

2.3. Experimental task

The subjects were asked to perform a time-constrained translation task in which 19

naturally-occurring English medical contexts (paragraphs or whole sentences) were

presented onscreen and centered. Their length ranged from 61 to 317 characters. Stimuli

were selected from real contexts using Sketch Engine and WordSmith Tools

concordancing modules. Each context remained onscreen for a period proportional to its length, based on a

standard reading speed of eleven characters per second (Díaz Cintas and Remaël 2007:

96). All contexts were controlled for difficulty, which was operationalized as the degree to which

each context explained the sign/symptom referred to (a paraphrase or definition was

provided to facilitate comprehension). All contexts contained a multi-word lexical unit

that could be translated by at least two options: a cognate option and a non-cognate unit.

The non-cognate unit was usually a more naturally occurring option in original Spanish

texts. In the contexts provided, English words with a Spanish cognate equivalent were

marked in bold and occupied an initial, middle, or final position in the sentence, and

stimuli were presented randomly (see Figure 3). Subjects were given both oral and

written instructions to read the context and then to translate the marked unit. Students

were asked to maintain the register and degree of technicality of the source texts. They

were also asked to write down their choice and not to go back and change their option.
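
The timing rule described above can be sketched as follows. This is only an illustration under stated assumptions: it supposes that display time is simply the character count (spaces included) divided by the reading speed, and the function name display_seconds is hypothetical. Only the speed of eleven characters per second and the length range of 61 to 317 characters come from the text.

```python
READING_SPEED_CPS = 11  # characters per second, the figure cited in the text


def display_seconds(context: str, cps: float = READING_SPEED_CPS) -> float:
    """Return the onscreen time for a context, assuming time = length / speed."""
    if cps <= 0:
        raise ValueError("reading speed must be positive")
    return len(context) / cps


# Placeholder strings standing in for the shortest and longest stimuli.
shortest = display_seconds("x" * 61)
longest = display_seconds("x" * 317)
print(f"{shortest:.1f} s to {longest:.1f} s")
```

Under these assumptions, the 19 contexts would have stayed onscreen for between roughly five and a half and twenty-nine seconds each.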

Figure 3. Example of stimuli presented onscreen.

Finally, a multiple-choice task was presented in which students had to choose between

three translation options for the MWLUs of the previous production task. The objective

was to see whether the presentation of this lexical decision task modified their first

choice.

(1) mitochondrial DNA

ADN mitocondrial

ADN mitocóndrico

ADN mitocondriano

3. Results

The use of different sources of empirical data (e.g. experimental tasks and corpus data)

provides a way to triangulate results (Alves et al. 2010; Tercedor et al. 2012). More

specifically, we used several data collection instruments and types of analysis to shed

light on the translation of cognates. This section describes the result of combining

experimental tasks with corpus data and explores whether there is a cognitive

motivation that lead students to select cognates vs. non-cognates, and whether there is a

correlation between their answers (process-based research) and the frequency of

MWLUs in different corpora (product-based research).

3.1. Research questions 1 and 2: lexical decision and cognitive motivation

The data of the onscreen production task and the multiple-choice task suggest that in

time-constrained L2-L1 translation, native Spanish students of translation use cognates

as translation equivalents for MWLUs, and that there is a cognitive motivation

underlying their production. In our study, cognates produced as target renderings appear

to be safe options. This contrasts with the results obtained by Kussmaul (1995: 17-18) and

Kussmaul and Tirkkonen-Condit (1995: 187), who found that translators avoided

cognate options in their translations for fear that they might be false cognates. As

shown in Figure 4, in the onscreen production task, the majority of students translated

the MWLUs as cognates with the exception of cardiopulmonary resuscitation and

trauma surgery. The difference between the cognate and non-cognate options is very

significant for most of the units. The exceptions, namely the choice of non-cognates as

translations for cardiopulmonary resuscitation or anabolic steroids, can be explained by

the fact that these expressions were part of their world knowledge. More specifically,

students were familiar with these concepts because of TV medical series, the use of

steroids in sports, etc. This became evident from the feedback on these concepts, which

was received from the students in an interview two days after the experiment. The cases

in which students did not write anything (transient ischaemic attack) or provided

unsatisfactory solutions (DNA mitocondrial, DNA mitocondriano for mitochondrial

DNA) could be markers of difficulty in the recognition of specialized concepts and in

understanding their meaning. In fact, one of the most frequent features in

non-existent solutions (i.e. terms not found in medical dictionaries) was the atypical

collocations feature of translation as established in Mauranen (2000; 2008: 41–44). This

feature points to the fact that translations normally display ‘odd collocations’ or

untypical combinatory tendencies. In her words, “translations tend to favor combinations

that, although possible in the target language system, are rare or absent from actual

target language texts. Conversely, translations often have few or no instances of

combinations that are frequent in target-language originals” (Mauranen 2008: 44).

For example, some of the subjects (20.59%) translated trauma surgery as cirugía

traumática, which is a non-existent collocation.
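
Percentages such as the 20.59% reported above can be derived by tallying the renderings produced by the 34 subjects. The sketch below is illustrative only: the function name response_shares is hypothetical, the count of seven cirugía traumática answers is inferred from the reported percentage (7/34), and the split of the remaining answers is invented.

```python
from collections import Counter


def response_shares(responses: list[str]) -> dict[str, float]:
    """Percentage of subjects giving each rendering, rounded to two decimals."""
    counts = Counter(responses)
    total = len(responses)
    return {answer: round(100 * n / total, 2) for answer, n in counts.items()}


# Hypothetical answers for "trauma surgery" from 34 subjects.
# Only the 7 "cirugía traumática" answers follow from the text (7/34 = 20.59%);
# the distribution of the other 27 answers is made up for illustration.
answers = (
    ["cirugía traumática"] * 7
    + ["cirugía traumatológica"] * 25
    + [""] * 2  # empty string stands for "no translation given"
)

shares = response_shares(answers)
print(shares["cirugía traumática"])  # 20.59
```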

Figure 4. Results of translation of multi-word cognates by students of scientific and

technical translation.

The rapid response obtained from students and the low percentage of non-translations

may indicate that translating cognate forms is not complex, since it involves direct access from lexical

representation in L2 to lexical representation in L1. It might be the case that in L2–L1

translation, if there is formal similarity, the lexical route is accessed (the cognate form is

used) instead of the conceptual route in line with the Revised Hierarchical Model (Kroll

and Stewart 1994). These results reinforce the hypothesis that in time-constrained

translations, there is source-language interference.

Moreover, in the domain of Medicine many terms in English and Spanish share

common Greek and Latin roots, and this explains the fact that an MWLU such as

chronic gastralgia is translated into Spanish with its ‘natural’ cognate: “gastralgia

crónica”. In future research, we plan to filter out those English examples involving

MWLUs containing Greek and Latin stems that are easily rendered into Spanish in the

same form, as opposed to Spanish-English translation, where the Latin or Greek form

might not be the primary option.6

In order to answer our second research question (i.e. Do students react differently when

translating cognates spontaneously as opposed to selecting an option in a multiple-

choice task?), we carried out a multiple-choice task in which students were given

different translation options (cognate and non-cognate) and were asked to choose one.

We wanted to make sure that using cognates was not just an “easy escape” for students

who did not know the equivalent term in time-constrained translation tasks. We wanted

to check whether their preference for cognates remained when they were given several

translation options and had time to think about their appropriateness.

Figure 5. Results of the multiple-choice task.

As can be seen in Figure 5, when students had more time, they did not choose cognates

as markedly as in the production task. It could be the case that the translation

competence of undergraduate students of Translation allows them to avoid calques and

borrowings as translation options, given the time to do so and having all options

visually available. This result brings us to the third research question.

3.2. Research question 3: correlation between the use of cognates in the production task

and their frequency in corpora

To relate experimental data with corpus data, we formulated the question: Is there a

correlation between the frequency of cognates in the production task (translated Spanish

text) and the frequency of cognates in L1 non-translation corpora (EsTenTen11)?

Frequency of cognates was measured in two ways: (i) frequency of MWLUs as a unit

(for example, adenoma* hepatocelular* encoded as "adenoma.*" []{0}

"hepatocelular.*" in Sketch Engine’s Corpus Query Language); (ii) frequency of its

components taken individually even when they do not collocate (frequency of

adenoma* + frequency of hepatocelular*). The wildcard indicates that the singular and

plural forms were retrieved.
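
The two frequency measures can be illustrated on a toy token sequence. The sketch below does not call Sketch Engine; it approximates the query with Python regular expressions over a made-up list of tokens, treating []{0} as strict adjacency. The function names unit_frequency and component_frequency are hypothetical.

```python
import re


def unit_frequency(tokens: list[str], first_pat: str, second_pat: str) -> int:
    """Measure (i): count adjacent token pairs matching both patterns."""
    first, second = re.compile(first_pat), re.compile(second_pat)
    return sum(
        1
        for a, b in zip(tokens, tokens[1:])
        if first.fullmatch(a) and second.fullmatch(b)
    )


def component_frequency(tokens: list[str], first_pat: str, second_pat: str) -> int:
    """Measure (ii): add up the individual frequencies of both components."""
    first, second = re.compile(first_pat), re.compile(second_pat)
    n_first = sum(1 for t in tokens if first.fullmatch(t))
    n_second = sum(1 for t in tokens if second.fullmatch(t))
    return n_first + n_second


# Toy, invented token sequence; not drawn from EsTenTen11.
tokens = (
    "el adenoma hepatocelular es raro pero los adenomas "
    "benignos y el carcinoma hepatocelular no lo son"
).split()

print(unit_frequency(tokens, r"adenoma.*", r"hepatocelular.*"))       # 1
print(component_frequency(tokens, r"adenoma.*", r"hepatocelular.*"))  # 4
```

In this toy sequence the unit occurs once, whereas its components occur four times in total, which mirrors the gap between the two measures discussed in this section.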

As shown in Figure 6, data from the EsTenTen11 corpus of European Spanish

indicate that cognates are more frequent than their alternative non-cognate synonyms in

naturally-occurring contexts, such as those extracted from this corpus of non-translated

texts. The only multi-word lexical units that are more frequent in their non-cognate

form are cirugía traumatológica and reanimación cardiopulmonar.


Figure 6. Frequency of multi-word cognates in the EsTenTen corpus.

The frequency of the cognate units in the corpus varies greatly as can be observed in

Figure 6. All units except for gastralgia crónica were present in the corpus, although

their frequency is very low. This is not surprising since the EsTenTen11 corpus is a

general language corpus, and therefore, does not necessarily include our multi-word

medical terms verbatim.

However, if taken separately, the elements composing the multi-word lexical units have

a high frequency in the EsTenTen11 corpus (Figure 7).

Figure 7. EsTenTen frequencies of W1+W2 multi-word cognates taken separately.

It is our understanding that a student with translation competence can produce multi-

word lexical units on the basis of their knowledge of L2 and L1 grammars and their

familiarity with the individual words. We assumed that if the words composing a

specific MWLU were frequent in a general corpus of non-translated Spanish, then

native speakers of Spanish were likely to use them or be familiar with them, and

therefore, they might associate them with their collocate more easily. Consequently, we

tested whether there was a correlation between the cognates produced in the

experimental task and the corpus frequency of MWLUs. On this occasion, we measured

the frequency of MWLUs by adding the frequency of their components when they

appeared individually, and we compared these frequencies with the results of the

production task (Figure 8).

Figure 8. Cognates in the production task (inner circle) and frequencies of their

elements in the EsTenTen11 corpus (outer circle).

The Spanish cognates used by students in the production task are represented in the

inner circle, and the frequencies of the elements of each MWLU in the EsTenTen

corpus are in the outer circle. To obtain the percentages of the inner circle,

the cognates in the production task were grouped together as a whole, and a percentage

was assigned to indicate the frequency of each cognate in relation to the overall number

of cognates. It is true that this kind of representation is limited in the sense that

percentages only refer to their overall presence in the data under consideration.

Nevertheless, it might point to those examples more likely to be translated as cognates

because the cognate form is frequent in a general corpus. For example, in the inner

circle, the higher percentages (11% for dolor epigástrico; and 10% for adenoma

hepatocelular, dolor abdominal and neuralgia trigeminal; 9% for gastralgia crónica,

8% for examen ecocardiográfico) indicate that these were the examples where students

used more cognate variants. In the outer circle, the higher frequencies indicate higher

frequencies of the components of each MWLU in the EsTenTen corpus: 23% for dolor

abdominal, 21% for dolor epigástrico, 17% for examen ecocardiográfico, 14% for

adaptación lumínica, and 8% for gastralgia crónica. Some of the most frequent

cognates coincide and, overall, there is a rough correspondence between both circles in

the sense that when words (cognates) are frequent in a corpus, speakers will use them to

communicate or translate. Therefore, it would appear that there is a correlation between

the frequency of multi-word cognates in the production task and the frequency of their

elements in non-translation corpora.
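
The percentages in both circles are shares of an overall total, and the correspondence between them is assessed visually. As a hedged sketch, the code below shows how such shares are computed and how the correspondence could additionally be quantified with a rank correlation (Spearman's rho). The rank correlation is an assumption introduced here for illustration, not a measure reported in this study, and all counts and term labels are invented.

```python
def shares(counts: dict[str, float]) -> dict[str, float]:
    """Express each count as a percentage of the overall total."""
    total = sum(counts.values())
    return {term: 100 * value / total for term, value in counts.items()}


def ranks(values: list[float]) -> list[float]:
    """Average ranks (1 = smallest value); tied values share a rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented counts for three placeholder terms (not the study's data).
production = {"term_a": 30, "term_b": 20, "term_c": 10}    # cognate answers
corpus = {"term_a": 5000, "term_b": 9000, "term_c": 1000}  # component hits

terms = list(production)
print(shares(production))
print(spearman([production[t] for t in terms], [corpus[t] for t in terms]))
```

A coefficient close to 1 would indicate that the terms most often rendered as cognates are also those whose components are most frequent in the corpus.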

3.3. Research question 4: correlation between the use of non-cognates in the production

task and low frequencies of cognate terms in corpora

The fourth research question was: Is there a correlation between the use of non-cognates

in the production task and low frequencies of cognate terms in non-translated L1

corpora? With this question we wished to test whether there was a relation between the

use of non-cognates in the production task and their significance in terms of frequency

in non-translated corpora. We assumed that non-translated corpora in Spanish

(comparable corpora)7 contain more naturally-occurring contexts, and therefore, are a

good reference to search for non-cognates in the translation classroom.

If we compare the two charts in Figure 9, which give the results of the production task

and frequency of cognate MWLUs in Spanish monolingual corpora (EsTenTen11), it is

evident that for those cases in which students resorted to using the non-cognate form for

a concept (medicamento anticonvulsivo, cirugía traumatológica), the corpus also

showed a low frequency for the particular cognate. The exception was the case of

cardiopulmonary resuscitation, for the reasons given in section 3.1.

Figure 9. The use of non-cognates and low frequencies of cognate terms in L1

monolingual corpora.

We finally checked the frequency of the most frequent cognate and non-cognate

MWLUs in our different corpora, with the aim of comparing results and exploring

specific corpus usability for the translation of these units.

COGNATES                               | NON-COGNATES
neuralgia trigeminal                   | neuralgia del trigémino
examen ecocardiográfico                | ecocardiograma, N + ecocardiográfico/a
anticonvulsivante/s, anticonvulsante/s | anticonvulsivo/a/os/as
cirugía traumática                     | cirugía traumatológica
adaptación lumínica/ a la luz          | adaptación fotópica
esteroide/s anabólico/s                | esteroide/s anabolizante/s, anabolizante/s (N)
ataque/s isquémico/s transitorio       | accidente/s isquémico/s transitorio/s
resucitación cardiopulmonar            | reanimación cardiopulmonar

Table 2. Spanish most frequent cognate (left) and non-cognate units (right) as revealed

by our corpora.

Interestingly, the terms cirugía traumática, resucitación, ataque isquémico, and

anticonvulsante are either absent from or not recommended by the recently

published Diccionario de términos médicos (Real Academia Nacional de Medicina

[Spanish Royal Academy of Medicine], 2012). In the case of resucitación, this term was

sanctioned by the RAE in 2001.

More specifically, frequency was analyzed in the EsTenTen corpus and the Oncoterm

corpus (Figures 10 and 11).

[Chart: “Appropriate solutions in EsTenTen11”. Bars on a vertical axis from 0% to 100% showing cognate and non-cognate solutions for trigeminal neuralgia, echocardiographic examination, anticonvulsant medication, trauma surgery, light adaptation, anabolic steroids, transient ischaemic attack, and cardiopulmonary resuscitation.]

Figure 10. Frequencies of cognates and non-cognates in the EsTenTen11 corpus.

[Chart: “Appropriate solutions in OncoTerm Corpus”. Bars on a vertical axis from 0% to 100% showing cognate and non-cognate solutions for trigeminal neuralgia, echocardiographic examination, anticonvulsant medication, trauma surgery, light adaptation, anabolic steroids, transient ischaemic attack, and cardiopulmonary resuscitation.]

Figure 11. Frequencies of cognates and non-cognates in the OncoTerm corpus.

The EsTenTen corpus provided more data than the OncoTerm corpus. Both corpora contained,

for the most part, more examples of non-cognates (percentages on top) than of cognates. Thus, it

seems that they may be useful for translators seeking natural expressions in Spanish.

Figure 12, by contrast, shows an extract of the frequency results obtained from the Medline corpus. Although it is a specialized corpus of translated Spanish on medicine, the Medline corpus is of little use for frequency purposes, given its reduced size (800,000 types). We can thus conclude that this corpus of translated Spanish is not very useful for obtaining idiomatic specialized MWLUs in the target language (Spanish).

[Chart: “Appropriate solutions in MedlinePlus Spanish”. Vertical axis: share of solutions, 0% to 100%. Horizontal axis: trigeminal neuralgia, echocardiographic examination, anticonvulsant medication, trauma surgery, light adaptation, anabolic steroids, transient ischaemic attack, cardiopulmonary resuscitation. Legend: non-cognates, cognates.]

Figure 12. Frequency of cognate and non-cognate solutions in the Medline corpus.

Conclusions

The purpose of this research was to study multi-word cognate production in a translation task. The results showed that, in time-constrained L2-L1 translation of MWLUs, cognates were usually preferred as translation equivalents. A correlation was found between the use of multi-word cognates in translation and the frequency of their elements in corpora. According to López, Buendía and García (2012: 68), the use of different measuring instruments is helpful in gathering and interpreting evidence from other perspectives that may confirm the results obtained from participants in a specific context. These additional converging instruments should be selected according to pragmatic criteria, such as time, place and ethics, as well as their scientific quality and familiarity, their time-saving properties, and their coherence with our initial measuring instrument.

By comparing data obtained in a production task with corpus evidence, it was possible

to better understand translation features. Our results are preliminary due to the

limitations of the task and the data at hand, but they have a direct application for the

VARIMED database in relation to the following:

(a) the classification of forms of lexical variation in medical texts as attested in

experimental tasks and web corpora;

(b) the acquisition of insights into interlinguistic competence, procedural competence,

and expert knowledge of translators;

(c) the specification of areas of improvement in translation, second language learning,

dictionary making and terminology encoding.

This study has illustrated how combining corpus evidence with data obtained in production tasks can offer valuable information on issues such as the relation between frequency, as attested by corpora, and familiarity, as evidenced in an experimental task.
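One possible way to quantify the relation between corpus frequency and experimental production is a rank correlation. The sketch below computes Spearman's coefficient in plain Python. The choice of Spearman's measure is an assumption made for illustration, since the measure used in the study is not specified here, and the helper names and input values are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for illustration only (NOT the paper's data):
# corpus hits of a cognate unit vs. times participants produced it.
corpus_hits = [3, 10, 25, 40]
times_produced = [2, 5, 6, 9]
print(round(spearman(corpus_hits, times_produced), 3))
```

A rank-based measure is a cautious choice here because corpus counts are typically very skewed, so a few very frequent units would dominate a correlation computed on the raw values.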

References

Alves, Fabio, Adriana Pagano, and Stella Neumann. 2010. “Translation units and grammatical shifts: towards an integration of product- and process-based translation research.” In Gregory M. Shreve and Erik Angelone (eds.), Translation and cognition. Amsterdam/Philadelphia: Benjamins, 109-142.

Díaz Cintas, Jorge and Aline Remaël. 2007. Audiovisual translation: subtitling.

Manchester: St. Jerome.

Faber, Pamela, Clara Inés López Rodríguez and Maribel Tercedor Sánchez. 2001. “La

utilización de técnicas de corpus en la representación del conocimiento médico.”

Terminology, 7 (2): 167-197.

Fernández Silva, Sabela. 2010. Variación terminológica y cognición. Factores

cognitivos en la denominación del concepto especializado. PhD Thesis. Barcelona:

Universitat Pompeu Fabra.

Fernández Silva, Sabela and Koen Kerremans. 2011. “Terminological variation in source texts and translations: a pilot study.” Meta: journal des traducteurs / Meta: Translators' Journal, 56 (2): 318-335.

Freixa, Judit. 2006. “Causes of denominative variation in terminology: a typology

proposal.” Terminology 12 (1), 51-57.

Freixa, Judit, Sabela Fernández and María Teresa Cabré. 2008. “La multiplicité des chemins dénominatifs.” Meta: journal des traducteurs / Meta: Translators' Journal, 53 (4): 731-747.

Geeraerts, Dirk, Stefan Grondelaers and Dirk Speelman. 1999. Convergentie en divergentie in de Nederlandse woordenschat. Een onderzoek naar kleding- en voetbaltermen. Amsterdam: Meertens Instituut.

Granger, Sylviane, Jacques Lerot, and Stephanie Petch-Tyson (eds.). 2003. Corpus-

based approaches to contrastive linguistics and translation studies. Amsterdam:

Rodopi.

Johansson, Stig. 2007. Seeing through Multilingual Corpora. On the use of corpora in

contrastive studies. Amsterdam/Philadelphia: John Benjamins.

Kilgarriff, Adam, Pavel Rychly, Pavel Smrz, and David Tugwell. 2004. “The Sketch

Engine.” In Proceedings EURALEX 2004. Lorient, France, 105-116.

Kroll, Judith and Erika Stewart. 1994. “Category interference in translation and picture

naming: evidence from asymmetric connections between bilingual memory

representations.” Journal of Memory and Language 33: 149-174.

Kussmaul, Paul. 1995. Training the Translator. Amsterdam/Philadelphia: John

Benjamins.

Kussmaul, Paul and Sonja Tirkkonen-Condit. 1995. “Think-Aloud Protocol Analysis in Translation Studies.” TTR 8: 177-199.

Lexical Computing Ltd. (n.d.). Sketch engine. Available at:

http://www.sketchengine.co.uk/

López Rodríguez, Clara Inés, Miriam Buendía Castro, and Alejandro García Aragón.

2012. “User needs to the test: evaluating a terminological knowledge base on the

environment by trainee translators.” JoSTrans (Journal of Specialised Translation) 18

(July 2012): 57-76. Available at: http://www.jostrans.org/issue18/art_lopez.pdf

López Rodríguez, Clara Inés, Pamela Faber, and Maribel Tercedor Sánchez. 2006.

“Terminología basada en el conocimiento para la traducción y la divulgación médicas:

el caso de Oncoterm.” Panace@ VII, 24. December, 2006. Available at:

http://www.medtrad.org/panacea/IndiceGeneral/n24_tradyterm-l.rodriguez.etal.pdf

Mauranen, Anna. 2000. “Strange strings in translated language: A study on corpora.” In Maeve Olohan (ed.), Intercultural faultlines. Research models in Translation Studies 1: Textual and cognitive aspects. Manchester: St. Jerome, 119-141.

Mauranen, Anna. 2008. “Universal tendencies in translation.” In Gunilla Anderman and Margaret Rogers (eds.), Incorporating corpora: the linguist and the translator. Clevedon, Buffalo: Multilingual Matters, 32-48.

Moldovan, Dan, Adriana Badulescu, Marta Tatu, Daniel Antohe, and Roxana Girju.

2004. “Models for the semantic classification of noun phrases.” In Proceedings of the

Human Language Technology Conference (HLT-NAACL) 2004, Computational

Lexical Semantics Workshop. Boston, MA.

Nastase, Vivi and Stan Szpakowicz. 2003. “Exploring noun-modifier semantic

relations”. In Fifth International Workshop on Computational Semantics (IWCS-5).

Tilburg, The Netherlands, 285-301.

OncoTerm Project. 2002. “Sistema Bilingüe de Información y Recursos Oncológicos [Bilingual System of Oncological Information and Resources].” Available at: http://www.ugr.es/~oncoterm/oncodesc.htm.

Peters, Carol, Eugenio Picchi, and Lisa Biagini. 1996. “Parallel and comparable

bilingual corpora in language teaching and learning.” In Simon Botley et al. (eds.).

Proceedings of Teaching and Language Corpora (UCREL Technical Papers, Vol. 9).

Lancaster: University of Lancaster, 68-80.

Poplack, Shana. 2004. “Code-switching”. In U. Ammon, N. Dittmar, K.J. Mattheier,

and P. Trudgill (eds.), Soziolinguistik. An international handbook of the science of

language. Berlin: Walter de Gruyter, 589-596.

Rabadán, Rosa, Belén Labrador and Noelia Ramón. 2009. “A tool for translation quality

assessment English-Spanish.” Babel 55:4, 303-328.

Real Academia Nacional de Medicina. 2012. Diccionario de términos médicos. Madrid:

Editorial médica panamericana.

Sankoff, David, Shana Poplack, and Swathi Vanniarajan. 1990. “The Case of the Nonce

loan in Tamil.” Language Variation and Change 2: 71-101.

Tercedor Sánchez, Maribel and Beatriz Méndez Cendón. 2000. “Fraseología y variación terminológica: estudio descriptivo en corpora biomédicos.” Terminologie et Traduction 2: 82-100.

Tercedor Sánchez, Maribel. 2010. “Cognates as lexical choices in translation.

Interference in space-constrained texts.” Target 22: 2. 177-193.

Tercedor Sánchez, Maribel. 2011. “The cognitive dynamics of terminological

variation.” Terminology 17 (2): 181-197.

Tercedor Sánchez, Maribel and Clara Inés López Rodríguez. 2012. “Access to health in

an intercultural setting: the role of corpora and images in grasping term variation”.

Linguistica Antverpiensia NS (Themes in Translation Studies: Translation and

knowledge mediation in medical and health settings) 11/2012: 247-268.

Tercedor Sánchez, Maribel, Clara Inés López Rodríguez and Pamela Faber. 2012.

“Working with words: research methodologies in translation-oriented lexicographic

practice.” TTR: traduction, terminologie, rédaction Vol. XXV, 1: 181-214.

Toury, Gideon. 1995. Descriptive Translation Studies and beyond.

Amsterdam/Philadelphia: John Benjamins.

NOTES

1 This research has been carried out within the framework of VariMed: Denominative variation in Medicine: Multilingual multimodal tool for research and knowledge dissemination (FFI2011-23120), a three-year (2012-2014) research project funded by the Spanish Ministry of Economy and Competitiveness, with the participation of researchers from the University of Granada, University Pablo de Olavide, and University of Valladolid (Spain), Rutgers University (USA) and Carleton University (Canada), aimed at the study of denominative variation from a cognitive and communicative perspective. This research is also part of the innovative teaching project Comunicación y ciudadanía europea (inglés-español): recursos multimodales para la salud y el medioambiente [Communication and European Citizenship (English-Spanish): multimodal resources on Health and the Environment], funded by the University of Granada.

2 The concept has been referred to as onomasiological variation in the cognitive linguistics paradigm (Geeraerts, Grondelaers and Speelman 1999), which distinguishes between formal variation (use of synonyms) and conceptual onomasiological variation (alternating use of the hyperonym and the hyponym). However, in the field of terminology this form is rarely used and full synonymy is rare.

3 In the database, the field familiarity will contain a Likert-type scale reflecting the results of such tests.

4 The TreeTagger is a tool for annotating text with part-of-speech and lemma information. For more information: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/.

5 The corpus was compiled by Miguel Ángel Jiménez-Crespo from the Medline webpage: http://www.nlm.nih.gov/medlineplus/spanish/

6 There is an interesting description of the percentage of English words that come from Latin in Dictionary.com (http://dictionary.reference.com/help/faq/language/t16.html). According to this source, over 60 percent of all English words have Greek or Latin roots. In the vocabulary of the sciences and technology, the figure rises to over 90 percent.

7 Peters et al. (1996: 69) define comparable corpora as "sets of texts from pairs or multiples of languages which can be contrasted and compared because of their common features."