POST-PRINT VERSION: final draft post-refereeing
Tercedor Sánchez, Maribel, López Rodríguez, Clara Inés and Alarcón Navío, Esperanza
(2013). “Identifying features of translation through multiword lexical units”. Belgian
Journal of Linguistics 27 (Lefer, Marie-Aude and Svetlana Vogeleer (eds.), Interference and
normalization in genre-controlled multilingual corpora [BJL 27]), 87–109.
doi 10.1075/bjl.27.05ter
Identifying translation features in multi-word lexical units1
ABSTRACT
Multi-word lexical units can often be rendered by different lexicalizations in the
target language. Variation in the translation of multi-word lexical units,
specifically multi-word cognates, can be regarded as an indicator of interference,
since there is evidence of a priming effect which leads to the production of such
units in interlinguistic communication (Kroll and Stewart 1994). This paper
studies the production of multi-word cognates in ecological experimental
translation. For this purpose real text units and multiple-choice tasks are used, and
the data thus obtained are compared with corpus instances. The results show that
there is a correlation between the spontaneous production of multi-word cognates,
as evidenced experimentally, and their frequency as attested by corpora.
Keywords: specialized translation, multi-word cognates, lexical production, lexical
decision, interference.
1. Introduction
The web has become a major source of specialized knowledge acquisition for lay
audiences. This is the case of medical information, for which the web offers a wealth of
more or less detailed resources of varying degrees of reliability. In this context,
translation plays a major role as a tool to disseminate knowledge. In the study of multi-
word lexical units representing specialized concepts, lexical units are often the result of
a translation process that takes place in a space-constrained context such as the web. In
webpages, text is often used to describe images and is subject to space restrictions.
Furthermore, the recency of the information and potential updates may affect the on-line
documentation process of translators. With regard to the Internet, the rapid changes in
information and the possibility of accessing the web from any place make it particularly
suited to the presence and assimilation of borrowings and calques. The extent to which
these borrowings are accepted often depends on the contact between the two languages
involved.
According to Tercedor and López (2012: 252-253), medical concepts can be lexicalized
in various ways depending on the facet of the concept being highlighted. The facet
selected can reflect a certain specialized domain or a priority of the text sender. This is
evidence of the multidimensional nature of medical terminology and of terminology in
general. Furthermore, a particular term can be chosen because of the geographic,
historic, or social context in which it is going to be used. The existence of multiple ways of
naming a concept is often referred to as terminological variation2, and term variants are the
designations used for this purpose. Researchers in both translation and terminology
have long acknowledged the cognitive and communicative motivation of terminological
variation. Although in the last few years variation has been approached mostly from a
cognitive perspective (Fernández 2010; Fernández & Kerremans 2011; Tercedor 2011),
its communicative aspects have also come into the spotlight (Freixa 2006; Freixa et al.
2008; Tercedor & Méndez 2000).
The aim of this paper is to study multi-word cognates as one of the manifestations of
lexical variation in translated texts. In this research cognates are understood as lexical
items with a shared form and semantic overlap in two languages, usually sharing
etymological roots, including borrowings (Poplack 2004) and nonce borrowings
(Sankoff et al. 1990). Nonce borrowings differ from borrowings in that they are not
necessarily recurrent, widespread, or recognized by host language monolinguals
(Sankoff et al. 1990: 71). Therefore, the presence of cognates in target texts can be
studied as a form of source-language interference (Toury 1995), or as a translation
feature in space-constrained texts. However, it is also true that there are certain cognates
that are difficult to avoid in translation. A case in point is the translation of words of
Latin and Greek origin shared by the source and target language, which are usually
transferred as cognates. In a previous study we analyzed the lexical units produced by
English-Spanish bilinguals, which could also be translated as cognates. In this context,
the use of cognates often reflected interference of the source language, a phenomenon
typical of a languages-in-contact situation (Tercedor 2010).
1.1. VARIMED and the study of Multi-word Lexical Units
This study was carried out as part of the VARIMED research project
(http://varimed.ugr.es), whose aim is to identify and describe denominative variation in
medical communication in English and Spanish from a cognitive and communicative
perspective. As part of the project, terminological variants are currently being compiled
in a multifunctional and reusable lexical resource in the field of health care. This
database also includes images and linguistic information such as the following: part of
speech, register, dialectal variation, as well as data on usage and subjective familiarity3.
As can be seen in Figure 1, relevant linguistic features of terminological variants are:
English/French calque, eponym, false friend, misspelling, Latin term, neologism,
English borrowing, acronym, most used term, spelling variant. The database is now
being designed so that term variants can have more than one tag.
Figure 1. Linguistic and usage tags for terminological variants in VARIMED.
Another aim of VARIMED is to carry out a series of experimental studies that will
provide insights into the phenomenon of term variation in relation to the cognitive
processes of lexical production and comprehension.
In this context, the study of multi-word lexical units is a valuable source of information
because a high percentage of medical terminology is composed of such units.
Consequently, the recognition and decoding of multi-word lexical units (MWLUs) as
units is crucial to the understanding of medical terminology. MWLUs constitute
knowledge units in specialized domains and express conceptual relations (Moldovan et
al. 2004; Nastase and Szpakowicz 2003) such as PART-OF, HAS-FUNCTION, IS-A, HAS-PROPERTY,
and IS-LOCATED-IN, as illustrated in Figure 2:
Figure 2. Multi-word lexical units and their implicit conceptual relations.
The study of MWLUs is also a priority because in specialized texts, they are often the
result of translations into different languages. Multi-word cognates can be studied as a
marker of interference (Toury 1995) or a feature of translation that can be attributed to
the influence of the source language (Rabadán et al. 2009). Thus, this paper focuses on
multi-word cognates that are manifestations of source-language interference and lexical
variation in translated texts. An illustration of this can be found in the in-context
Spanish translations for the term “Non-Hodgkin Lymphoma”: linfoma no Hodgkin
(translation influenced by the source language) and linfoma no hodgkiniano (more
natural term in Spanish). The English form (Non-Hodgkin Lymphoma) and the Spanish
variant linfoma no Hodgkin are multi-word cognates, i.e. lexical items with a shared
form and semantic equivalence.
In line with Kroll and Stewart’s (1994) Revised Hierarchical Model, we hypothesize
that in L2-L1 translation, cognates are primed since the lexical route is accessed instead
of the conceptual route. This hypothesis guides the aims and research questions of this
study.
1.2. Objectives and research questions
The objectives of our study are the following:
a. To identify translation features in a production task: source-language interference,
explicitation and “untypical collocations” (Mauranen 2008: 41-44).
b. To compare the results of experimental tasks involving the translation of MWLUs
with corpus frequencies in order to see whether spontaneous production of cognates
runs parallel to attested corpus frequencies of cognates.
c. To test the usefulness of corpora in the translation of MWLUs and to point out
which corpora provide better usage examples in their translation.
These objectives are related to the following research questions:
1. In time-constrained L2–L1 translation of naturally occurring contexts, are cognates
produced as translation equivalents?
2. Do students react differently when translating spontaneously as opposed to selecting
an option in a multiple-choice task?
3. Is there a correlation between the frequency of cognates in translated text and the
frequency of cognates in L1 non-translated corpora?
4. Is there a correlation between the experimental production of non-cognates and low
frequencies of cognate terms in non-translated L1 corpora?
2. Methods
In translation studies, MWLUs have been empirically studied with methods such as: (i)
concordancing in large corpora; (ii) comparing thesauri, lexicons, and dictionaries; (iii)
interviewing subject groups.
In this study, we combined an experimental task and corpus data to analyze English-to-
Spanish translations of MWLUs. We compared data obtained in a production task with
corpus evidence to see if there was a correlation between the use of cognates in
spontaneous translation and the frequency of cognates in comparable corpora.
Therefore, the emphasis was on contrasting results obtained in ecological experimental
tasks with corpus data analysis.
2.1. Corpus compilation
In VARIMED, we follow a combined bottom-up and top-down approach. More
specifically, our bottom-up approach implies compiling a well-balanced English-
Spanish corpus using the Oncoterm Corpus as a starting point. The Oncoterm corpus
was developed between 1999 and 2002 as part of a research project on medical
terminology in the field of Oncology (Faber, López and Tercedor 2001; López, Faber
and Tercedor 2006; Oncoterm Project 2002). The corpus has 32 million words, of which
28,771,000 belong to its English section. Based on this initial corpus as well
as on resources for corpus compilation and analysis, such as Sketch Engine (Kilgarriff et
al. 2004), we are in the process of updating our corpus to account for instances of term
variation and to identify parameters that guide the production and use of term variants.
This corpus is being used to study term variants from both an intralingual and an
interlingual perspective, based on multilingual approaches to corpora (Granger et al.
2003; Johansson 2007) in the contrastive analysis of medical texts.
The top-down methodology of VARIMED involves structuring the domain of SIGNS AND
SYMPTOMS, with reference to the body part where such signs and symptoms appear and
are felt, and to the condition they refer to. The final objective is to offer researchers and
general users alike a network of terminological variants for categories of symptoms and
conditions in English and Spanish.
The data for this study were based on a corpus of non-translated language (English and
Spanish) combined with a corpus of translated texts (from English into Spanish). The
comparable component includes texts that were originally written in English and
Spanish: (i) for English, the EnTenTen12 corpus and the English component of the
Oncoterm corpus; (ii) for Spanish, the EsTenTen11 corpus and the Spanish component of
the Oncoterm corpus. The ‘TenTen’ corpora for English and Spanish were compiled
and accessed by means of the Sketch Engine corpus query system (Lexical Computing
Ltd., n.d.). The EnTenTen12 corpus is a web corpus of more than 11,000 million words
tagged by TreeTagger4. The EsTenTen11 (Eu) corpus is a web corpus of European
Spanish containing more than 2,000 million words also tagged by TreeTagger. As
opposed to the TenTen corpora, the Oncoterm corpus is a specialized corpus on cancer.
Because of its size and the fact that oncology is a multidisciplinary area within
medicine, it provides many usage examples not only on oncology but also on
general medicine. The translation corpus is composed of the Spanish
translations of the English texts published in MedlinePlus5.
The corpus composition is shown in Table 1. The numbers of words provided do not
include numbers and punctuation marks.
Comparable corpus (non-translated language)
    English:
        EnTenTen12 corpus (Sketch Engine)     11,191,860,036 words
        Oncoterm corpus                           28,771,714 words
    Spanish:
        EsTenTen11 corpus (Sketch Engine)      2,103,770,763 words
        Oncoterm corpus                           13,645,317 words
Translation corpus (Spanish texts translated from English)
        MedlinePlus in Spanish                       709,764 words
Table 1. Composition of the corpus.
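Because the corpora in Table 1 differ in size by several orders of magnitude, raw counts drawn from them are not directly comparable. The following minimal sketch illustrates one common remedy, normalization per million words. It is an illustration only, not a step reported in this study: the corpus sizes are those of Table 1, whereas the function name per_million and the raw count of 50 hits are hypothetical.

```python
# Corpus sizes in words, taken from Table 1.
CORPUS_SIZES = {
    "EnTenTen12": 11_191_860_036,
    "Oncoterm (English)": 28_771_714,
    "EsTenTen11": 2_103_770_763,
    "Oncoterm (Spanish)": 13_645_317,
    "MedlinePlus in Spanish": 709_764,
}


def per_million(raw_count: int, corpus: str) -> float:
    """Return the frequency of an item per million words of the given corpus."""
    return raw_count / CORPUS_SIZES[corpus] * 1_000_000


if __name__ == "__main__":
    hypothetical_hits = 50  # illustrative value, not a result of the study
    for name in CORPUS_SIZES:
        rate = per_million(hypothetical_hits, name)
        print(f"{name}: {rate:.3f} per million words")
```

The same raw count thus corresponds to very different relative frequencies depending on the corpus, which is one reason why small corpora yield unstable frequency evidence.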
2.2. Subjects
In our experiment, we selected 34 undergraduate students in a third-year course on
scientific and technical translation within the Degree in Translation and Interpreting at
the University of Granada. They were 20-24 years of age (mean age, 21), with an L1 of
Spanish, and an L2 of English. Initially, 44 students participated in the task but 10 had
an L1 other than Spanish and were subsequently eliminated from the study.
2.3. Experimental task
The subjects were asked to perform a time-constrained translation task in which 19
naturally-occurring English medical contexts (paragraphs or whole sentences) were
presented onscreen and centered. Their length ranged from 61 to 317 characters. Stimuli
were selected from real contexts using Sketch Engine and WordSmith Tools
concordancing modules. Each context remained onscreen for a period calculated from a
standard reading speed of eleven characters per second (Díaz Cintas and Remaël 2007:
96). All contexts were controlled for difficulty, which was assessed as the degree to
which the context explained the sign/symptom referred to (a paraphrase or definition was
provided to facilitate comprehension). All contexts contained a multi-word lexical unit
that could be translated by at least two options: a cognate option and a non-cognate unit.
The non-cognate unit was usually a more naturally occurring option in original Spanish
texts. In the contexts provided, English words with a Spanish cognate equivalent were
marked in bold and occupied an initial, middle, or final position in the sentence, and
stimuli were presented randomly (see Figure 3). Subjects were given both oral and
written instructions to read the context and then to translate the marked unit. Students
were asked to maintain the register and degree of technicality of the source texts. They
were also asked to write down their choice and not to go back and change their option.
Figure 3. Example of stimuli presented onscreen.
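The exposure time implied by the reading-speed criterion can be sketched as follows. The snippet assumes that display time is simply the character count divided by eleven characters per second; the function name display_time_seconds is hypothetical, and any rounding or minimum exposure applied in the actual experiment is not specified in the text.

```python
READING_SPEED_CPS = 11  # characters per second (Díaz Cintas and Remaël 2007: 96)


def display_time_seconds(n_characters: int, cps: float = READING_SPEED_CPS) -> float:
    """Seconds a context stays onscreen, assuming time = characters / reading speed."""
    if n_characters < 0:
        raise ValueError("n_characters must be non-negative")
    return n_characters / cps


if __name__ == "__main__":
    # Shortest and longest contexts reported for the task: 61 and 317 characters.
    for length in (61, 317):
        print(f"{length} characters: {display_time_seconds(length):.1f} s")
```

Under this assumption, the shortest context would remain onscreen for about 5.5 seconds and the longest for about 28.8 seconds.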
Finally, a multiple-choice task was presented in which students had to choose between
three translation options for the MWLUs of the previous production task. The objective
was to see whether the presentation of this lexical decision task modified their first
choice.
(1) mitochondrial DNA
ADN mitocondrial
ADN mitocóndrico
ADN mitocondriano
3. Results
The use of different sources of empirical data (e.g. experimental tasks and corpus data)
provides a way to triangulate results (Alves et al. 2010; Tercedor et al. 2012). More
specifically, we used several data collection instruments and types of analysis to shed
light on the translation of cognates. This section describes the results of combining
experimental tasks with corpus data and explores whether there is a cognitive
motivation that leads students to select cognates vs. non-cognates, and whether there is a
correlation between their answers (process-based research) and the frequency of
MWLUs in different corpora (product-based research).
3.1. Research questions 1 and 2: lexical decision and cognitive motivation
The data of the on-screen production task and the multiple choice task suggest that in
time-constrained L2-L1 translation, native Spanish students of translation use cognates
as translation equivalents for MWLUs, and that there is a cognitive motivation
underlying their production. In our study, cognates produced as target renderings appear
to be safe options. This contrasts with the results obtained by Kussmaul (1995: 17-18) and
Kussmaul and Tirkkonen-Condit (1995: 187), who found that translators avoided
cognate options in their translations for fear that they might be false cognates. As
shown in Figure 4, in the onscreen production task, the majority of students translated
the MWLUs as cognates with the exception of cardiopulmonary resuscitation and
trauma surgery. The difference between the cognate and non-cognate options is very
significant for most of the units. The exceptions, namely the choice of non-cognates as
translations for cardiopulmonary resuscitation or anabolic steroids, can be explained by
the fact that these expressions were part of the students’ world knowledge. More specifically,
students were familiar with these concepts because of TV medical series, the use of
steroids in sports, etc. This became evident from the feedback on these concepts, which
was received from the students in an interview two days after the experiment. The cases
in which students did not write anything (transient ischaemic attack) or provided
unsatisfactory solutions (DNA mitocondrial, DNA mitocondriano for mitochondrial
DNA) could be markers of difficulty in recognizing specialized concepts and in
understanding their meaning. In fact, one of the most frequent features of non-existent
solutions (i.e. terms not found in medical dictionaries) was the atypical
collocations feature of translation established in Mauranen (2000; 2008: 41–44). This
feature points to the fact that translations normally display ‘odd collocations’ or
untypical combinatory tendencies. In her words, “translations tend to favor combinations
that, although possible in the target language system, are rare or absent from actual
target language texts. Conversely, translations often have few or no instances of
combinations that are frequent in target-language originals” (Mauranen 2008: 44).
For example, some of the subjects (20.59%) translated trauma surgery as cirugía
traumática, which is a non-existent collocation.
Figure 4. Results of translation of multi-word cognates by students of scientific and
technical translation.
The rapid response obtained from students and the low percentage of non-translations
may indicate that translating cognate forms involves little complexity, namely direct
access from the lexical representation in L2 to the lexical representation in L1. It might
be the case that in L2–L1
translation, if there is formal similarity, the lexical route is accessed (the cognate form is
used) instead of the conceptual route in line with the Revised Hierarchical Model (Kroll
and Stewart 1994). These results reinforce the hypothesis that in time-constrained
translations, there is source-language interference.
Moreover, in the domain of Medicine many terms in English and Spanish share
common Greek and Latin roots, and this explains the fact that an MWLU such as
chronic gastralgia is translated into Spanish with its ‘natural’ cognate: “gastralgia
crónica”. In future research, we plan to filter out those English examples involving
MWLUs containing Greek and Latin stems that are easily rendered into Spanish in the
same form, as opposed to Spanish-English translation, where the Latin or Greek form
might not be the primary option.6
In order to answer our second research question (i.e. Do students react differently when
translating cognates spontaneously as opposed to selecting an option in a multiple-
choice task?), we carried out a multiple-choice task in which students were given
different translation options (cognate and non-cognate) and were asked to choose one.
We wanted to make sure that using cognates was not just an “easy escape” for students
who did not know the equivalent term in time-constrained translation tasks. We wanted
to check whether their preference for cognates remained when they were given several
translation options and had time to think about their appropriateness.
Figure 5. Results of the multiple-choice task.
As can be seen in Figure 5, when students had more time, they did not choose cognates
as markedly as in the production task. It could be the case that the translation
competence of undergraduate students of Translation allows them to avoid calques and
borrowings as translation options, given the time to do so and having all options
visually available. This result brings us to the third research question.
3.2. Research question 3: correlation between the use of cognates in the production task
and their frequency in corpora
To relate experimental data with corpus data, we formulated the question: Is there a
correlation between the frequency of cognates in the production task (translated Spanish
text) and the frequency of cognates in L1 non-translation corpora (EsTenTen11)?
Frequency of cognates was measured in two ways: (i) frequency of the MWLU as a unit
(for example, adenoma* hepatocelular* encoded as "adenoma.*" []{0}
"hepatocelular.*" in Sketch Engine’s Corpus Query Language); (ii) frequency of its
components taken individually even when they do not collocate (frequency of
adenoma* + frequency of hepatocelular*). The wildcard ensures that both the singular and
plural forms were retrieved.
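The two frequency measures can be illustrated outside Sketch Engine with a short sketch. It assumes a text that has already been split into tokens and uses Python regular expressions to mimic the query "adenoma.*" []{0} "hepatocelular.*" (two adjacent tokens, each matched by its beginning). The function names and the toy sentence are hypothetical and are not corpus data.

```python
import re
from typing import List


def unit_frequency(tokens: List[str], first: str, second: str) -> int:
    """Measure (i): count adjacent token pairs matching first.* and second.*."""
    p1 = re.compile(first + ".*")
    p2 = re.compile(second + ".*")
    return sum(
        1
        for a, b in zip(tokens, tokens[1:])
        if p1.fullmatch(a) and p2.fullmatch(b)
    )


def component_frequency(tokens: List[str], first: str, second: str) -> int:
    """Measure (ii): add the frequencies of both components,
    whether or not they occur next to each other."""
    p1 = re.compile(first + ".*")
    p2 = re.compile(second + ".*")
    count_first = sum(1 for t in tokens if p1.fullmatch(t))
    count_second = sum(1 for t in tokens if p2.fullmatch(t))
    return count_first + count_second


if __name__ == "__main__":
    # Toy example, not corpus data.
    text = (
        "el adenoma hepatocelular es raro ; los adenomas hepatocelulares "
        "y el carcinoma hepatocelular difieren"
    )
    tokens = text.lower().split()
    print("as a unit:", unit_frequency(tokens, "adenoma", "hepatocelular"))
    print("components:", component_frequency(tokens, "adenoma", "hepatocelular"))
```

Measure (ii) is always at least as high as measure (i), since it also counts occurrences of each word in other combinations (here, carcinoma hepatocelular).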
As shown in Figure 6, data from the EsTenTen11 corpus of European Spanish
indicate that cognates are more frequent than their alternative non-cognate synonyms in
naturally-occurring contexts, such as those extracted from this corpus of non-translated
texts. The only multi-word lexical units that are more frequent in their non-cognate
form are cirugía traumatológica and reanimación cardiopulmonar.
Figure 6. Frequency of multi-word cognates in the EsTenTen corpus.
The frequency of the cognate units in the corpus varies greatly as can be observed in
Figure 6. All units except for gastralgia crónica were present in the corpus, although
their frequency is very low. This is not surprising since the EsTenTen11 corpus is a
general language corpus, and therefore, does not necessarily include our multi-word
medical terms verbatim.
However, if taken separately, the elements composing the multi-word lexical units have
a high frequency in the EsTenTen11 corpus (Figure 7).
Figure 7. EsTenTen frequencies of W1+W2 multi-word cognates taken separately.
It is our understanding that a student with translation competence can produce multi-
word lexical units on the basis of their knowledge of L2 and L1 grammars and their
familiarity with the individual words. We assumed that if the words composing a
specific MWLU were frequent in a general corpus of non-translated Spanish, then
native speakers of Spanish were likely to use them or be familiar with them, and
therefore, they might associate them with their collocate more easily. Consequently, we
tested whether there was a correlation between the cognates produced in the
experimental task and the corpus frequency of MWLUs. On this occasion, we measured
the frequency of MWLUs by adding the frequency of their components when they
appeared individually, and we compared these frequencies with the results of the
production task (Figure 8).
Figure 8. Cognates in the production task (inner circle) and frequencies of their
elements in the EsTenTen11 corpus (outer circle).
The Spanish cognates used by students in the production task are represented in the
inner circle, and the frequencies of the elements of each MWLU in the EsTenTen
corpus are in the outer circle. To obtain the percentages of the inner circle,
the cognates in the production task were grouped together as a whole, and a percentage
was assigned to indicate the frequency of each cognate in relation to the overall number
of cognates. It is true that this kind of representation is limited in the sense that
percentages only refer to their overall presence in the data under consideration.
Nevertheless, it might point to those examples more likely to be translated as cognates
because the cognate form is frequent in a general corpus. For example, in the inner
circle, the higher percentages (11% for dolor epigástrico; 10% for adenoma
hepatocelular, dolor abdominal and neuralgia trigeminal; 9% for gastralgia crónica;
8% for examen ecocardiográfico) indicate that these were the examples where students
used more cognate variants. In the outer circle, the higher percentages indicate higher
frequencies of the components of each MWLU in the EsTenTen corpus: 23% for dolor
abdominal, 21% for dolor epigástrico, 17% for examen ecocardiográfico, 14% for
adaptación lumínica, and 8% for gastralgia crónica. Some of the most frequent
cognates coincide and, overall, there is a rough correspondence between both circles in
the sense that when words (cognates) are frequent in a corpus, speakers will use them to
communicate or translate. Therefore, it would appear that there is a correlation between
the frequency of multi-word cognates in the production task and the frequency of their
elements in non-translation corpora.
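The computation behind the two circles, and one way of quantifying their correspondence, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this study: the raw counts are hypothetical, the function names are invented for the example, and Spearman's rank correlation is only one possible measure, which the study itself does not report.

```python
from typing import Dict, List


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Express each count as a percentage of the overall total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must not sum to zero")
    return {unit: 100 * n / total for unit, n in counts.items()}


def ranks(values: List[float]) -> List[float]:
    """Rank values from smallest (1) to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when all values are tied")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical counts: cognate answers in the task, component hits in a corpus.
    task = {"unit A": 30, "unit B": 20, "unit C": 10}
    corpus = {"unit A": 5000, "unit B": 9000, "unit C": 1000}
    units = list(task)
    task_pct, corpus_pct = shares(task), shares(corpus)
    rho = spearman([task_pct[u] for u in units], [corpus_pct[u] for u in units])
    print(task_pct, corpus_pct, round(rho, 2))
```

A rank-based measure is chosen here because corpus frequencies tend to be highly skewed; with as few units as in the production task, any such coefficient would in any case be indicative only.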
3.3. Research question 4: correlation between the use of non-cognates in the production
task and low frequencies of cognate terms in corpora
The fourth research question was: Is there a correlation between the use of non-cognates
in the production task and low frequencies of cognate terms in non-translated L1
corpora? With this question we wished to test whether there was a relation between the
use of non-cognates in the production task and their significance in terms of frequency
in non-translated corpora. We assumed that non-translated corpora in Spanish
(comparable corpora)7 contain more naturally-occurring contexts, and therefore, are a
good reference to search for non-cognates in the translation classroom.
If we compare the two charts in Figure 9, which give the results of the production task
and frequency of cognate MWLUs in Spanish monolingual corpora (EsTenTen11), it is
evident that for those cases in which students resorted to using the non-cognate form for
a concept (medicamento anticonvulsivo, cirugía traumatológica), the corpus also
showed a low frequency for the particular cognate. The exception was the case of
cardiopulmonary resuscitation, for the reasons given in section 3.1.
Figure 9. The use of non-cognates and low frequencies of cognate terms in L1
monolingual corpora.
We finally checked the frequency of the most frequent cognate and non-cognate
MWLUs in our different corpora, with the aim of comparing results and exploring
specific corpus usability for the translation of these units.
COGNATES                                  NON-COGNATES
neuralgia trigeminal                      neuralgia del trigémino
examen ecocardiográfico                   ecocardiograma, N + ecocardiográfico/a
anticonvulsivante/s, anticonvulsante/s    anticonvulsivo/a/os/as
cirugía traumática                        cirugía traumatológica
adaptación lumínica / a la luz            adaptación fotópica
esteroide/s anabólico/s                   esteroide/s anabolizante/s, anabolizante/s (N)
ataque/s isquémico/s transitorio          accidente/s isquémico/s transitorio/s
resucitación cardiopulmonar               reanimación cardiopulmonar
Table 2. Most frequent Spanish cognate (left) and non-cognate (right) units as revealed
by our corpora.
Interestingly, the terms cirugía traumática, resucitación, ataque isquémico, and
anticonvulsante are either absent from the recently published Diccionario de términos
médicos (Real Academia Nacional de Medicina [Spanish Royal Academy of Medicine], 2012)
or not recommended by it. In the case of resucitación, the term was
sanctioned by the RAE in 2001.
More specifically, frequency was analyzed in the EsTenTen corpus and the Oncoterm
corpus (Figures 10 and 11).
[Chart: “Appropriate solutions in EsTenTen11”. Percentages (0%–100%) of non-cognates and cognates for trigeminal neuralgia, echocardiographic examination, anticonvulsant medication, trauma surgery, light adaptation, anabolic steroids, transient ischaemic attack, and cardiopulmonary resuscitation.]
Figure 10. Frequencies of cognates and non-cognates in the EsTenTen11 corpus.
[Chart: “Appropriate solutions in OncoTerm Corpus”. Percentages (0%–100%) of non-cognates and cognates for trigeminal neuralgia, echocardiographic examination, anticonvulsant medication, trauma surgery, light adaptation, anabolic steroids, transient ischaemic attack, and cardiopulmonary resuscitation.]
Figure 11. Frequencies of cognates and non-cognates in the OncoTerm corpus.
The EsTenTen11 corpus provided more data than the OncoTerm corpus. In both corpora,
non-cognates (percentages on top) were generally more frequent than cognates. Thus it
seems that both may be useful for translators to retrieve natural expressions in Spanish.
Figure 12, on the other hand, shows an extract of the results of the frequency analysis of
the Medline corpus. Although it is a specialized corpus of translated Spanish
on Medicine, the Medline corpus is of little use for frequency purposes, given its
reduced size (800,000 tokens). We can thus conclude that this corpus of translated
Spanish is not very useful to obtain idiomatic specialized MWLUs in the target
language (Spanish).
[Chart: “Appropriate solutions in MedlinePlus Spanish”. Percentages (0%–100%) of non-cognates and cognates for trigeminal neuralgia, echocardiographic examination, anticonvulsant medication, trauma surgery, light adaptation, anabolic steroids, transient ischaemic attack, and cardiopulmonary resuscitation.]
Figure 12. Frequency of cognate and non-cognate solutions in the Medline corpus.
4. Conclusions
The purpose of this research was to study multi-word cognate production in a
translation task. The results showed that in time-constrained L2-L1 translation of
MWLUs, cognates were usually preferred as translation equivalents. A correlation was
found between the use of multi-word cognates in translation and the frequency of their
elements in corpora. According to López, Buendía and García (2012: 68), the use of
different measuring instruments is helpful in gathering and interpreting evidence from
other perspectives that may confirm the results obtained from participants in a specific
context. These additional converging instruments should be selected according to
pragmatic criteria, such as time, place and ethics, as well as their scientific quality and
familiarity, their time-saving properties and coherence with our initial measuring
instrument.
By comparing data obtained in a production task with corpus evidence, it was possible
to better understand translation features. Our results are preliminary due to the
limitations of the task and the data at hand, but they have a direct application for the
VARIMED database in relation to the following:
(a) the classification of forms of lexical variation in medical texts as attested in
experimental tasks and web corpora;
(b) the acquisition of insights into interlinguistic competence, procedural competence,
and expert knowledge of translators;
(c) the specification of areas of improvement in translation, second language learning,
dictionary making and terminology encoding.
This study has illustrated how combining corpus evidence with data obtained in
production tasks can offer valuable information on issues such as the relation
between frequency, as attested by corpora, and familiarity, as evidenced in an
experimental task.
References
Alves, Fabio, Adriana Pagano, and Stella Neumann. 2010. “Translation units and
grammatical shifts: towards an integration of product- and process-based translation
research.” In Gregory M. Shreve and Erik Angelone (eds.), Translation and cognition.
Amsterdam/Philadelphia: John Benjamins, 109-142.
Díaz Cintas, Jorge and Aline Remaël. 2007. Audiovisual translation: subtitling.
Manchester: St. Jerome.
Faber, Pamela, Clara Inés López Rodríguez and Maribel Tercedor Sánchez. 2001. “La
utilización de técnicas de corpus en la representación del conocimiento médico.”
Terminology, 7 (2): 167-197.
Fernández Silva, Sabela. 2010. Variación terminológica y cognición. Factores
cognitivos en la denominación del concepto especializado. PhD Thesis. Barcelona:
Universitat Pompeu Fabra.
Fernández Silva, Sabela and Koen Kerremans. 2011. “Terminological variation in
source texts and translations: a pilot study.” Meta: journal des traducteurs / Meta:
Translators' Journal 56 (2): 318-335.
Freixa, Judit. 2006. “Causes of denominative variation in terminology: a typology
proposal.” Terminology 12 (1): 51-77.
Freixa, Judit, Sabela Fernández Silva and María Teresa Cabré. 2008. “La multiplicité
des chemins dénominatifs.” Meta: journal des traducteurs / Meta: Translators' Journal
53 (4): 731-747.
Geeraerts, Dirk, Stefan Grondelaers and Dirk Speelman. 1999. Convergentie en
divergentie in de Nederlandse woordenschat. Een onderzoek naar kleding- en
voetbaltermen. Amsterdam: Meertens Instituut.
Granger, Sylviane, Jacques Lerot, and Stephanie Petch-Tyson (eds.). 2003. Corpus-
based approaches to contrastive linguistics and translation studies. Amsterdam:
Rodopi.
Johansson, Stig. 2007. Seeing through Multilingual Corpora. On the use of corpora in
contrastive studies. Amsterdam/Philadelphia: John Benjamins.
Kilgarriff, Adam, Pavel Rychly, Pavel Smrz, and David Tugwell. 2004. “The Sketch
Engine.” In Proceedings EURALEX 2004. Lorient, France, 105-116.
Kroll, Judith and Erika Stewart. 1994. “Category interference in translation and picture
naming: evidence from asymmetric connections between bilingual memory
representations.” Journal of Memory and Language 33: 149-174.
Kussmaul, Paul. 1995. Training the Translator. Amsterdam/Philadelphia: John
Benjamins.
Kussmaul, Paul and Sonja Tirkkonen-Condit. 1995. “Think-Aloud Protocol Analysis in
Translation Studies.” TTR 8: 177-199.
Lexical Computing Ltd. (n.d.). Sketch engine. Available at:
http://www.sketchengine.co.uk/
López Rodríguez, Clara Inés, Miriam Buendía Castro, and Alejandro García Aragón.
2012. “User needs to the test: evaluating a terminological knowledge base on the
environment by trainee translators.” JoSTrans (Journal of Specialised Translation) 18
(July 2012): 57-76. Available at: http://www.jostrans.org/issue18/art_lopez.pdf
López Rodríguez, Clara Inés, Pamela Faber, and Maribel Tercedor Sánchez. 2006.
“Terminología basada en el conocimiento para la traducción y la divulgación médicas:
el caso de Oncoterm.” Panace@ VII, 24. December, 2006. Available at:
http://www.medtrad.org/panacea/IndiceGeneral/n24_tradyterm-l.rodriguez.etal.pdf
Mauranen, Anna. 2000. “Strange strings in translated language: A study on corpora.” In
Maeve Olohan (ed.), Intercultural faultlines. Research models in Translation Studies
1: Textual and cognitive aspects. Manchester: St. Jerome, 119-141.
Mauranen, Anna. 2008. “Universal tendencies in translation.” In Gunilla Anderman and
Margaret Rogers (eds.), Incorporating corpora: the linguist and the translator.
Clevedon, Buffalo: Multilingual Matters, 32-48.
Moldovan, Dan, Adriana Badulescu, Marta Tatu, Daniel Antohe, and Roxana Girju.
2004. “Models for the semantic classification of noun phrases.” In Proceedings of the
Human Language Technology Conference (HLT-NAACL) 2004, Computational
Lexical Semantics Workshop. Boston, MA.
Nastase, Vivi and Stan Szpakowicz. 2003. “Exploring noun-modifier semantic
relations.” In Fifth International Workshop on Computational Semantics (IWCS-5).
Tilburg, The Netherlands, 285-301.
OncoTerm Project. 2002. “Sistema Bilingüe de Información y Recursos Oncológicos”
[Bilingual system of oncological information and resources]. Available at:
http://www.ugr.es/~oncoterm/oncodesc.htm.
Peters, Carol, Eugenio Picchi, and Lisa Biagini. 1996. “Parallel and comparable
bilingual corpora in language teaching and learning.” In Simon Botley et al. (eds.).
Proceedings of Teaching and Language Corpora (UCREL Technical Papers, Vol. 9).
Lancaster: University of Lancaster, 68-80.
Poplack, Shana. 2004. “Code-switching.” In U. Ammon, N. Dittmar, K.J. Mattheier,
and P. Trudgill (eds.), Soziolinguistik. An international handbook of the science of
language. Berlin: Walter de Gruyter, 589-596.
Rabadán, Rosa, Belén Labrador and Noelia Ramón. 2009. “A tool for translation quality
assessment English-Spanish.” Babel 55 (4): 303-328.
Real Academia Nacional de Medicina. 2012. Diccionario de términos médicos. Madrid:
Editorial Médica Panamericana.
Sankoff, David, Shana Poplack, and Swathi Vanniarajan. 1990. “The Case of the Nonce
loan in Tamil.” Language Variation and Change 2: 71-101.
Tercedor Sánchez, Maribel and Beatriz Méndez Cendón. 2000. “Fraseología y
variación terminológica: estudio descriptivo en corpora biomédicos.” Terminologie et
Traduction 2: 82-100.
Tercedor Sánchez, Maribel. 2010. “Cognates as lexical choices in translation.
Interference in space-constrained texts.” Target 22 (2): 177-193.
Tercedor Sánchez, Maribel. 2011. “The cognitive dynamics of terminological
variation.” Terminology 17 (2): 181-197.
Tercedor Sánchez, Maribel and Clara Inés López Rodríguez. 2012. “Access to health in
an intercultural setting: the role of corpora and images in grasping term variation.”
Linguistica Antverpiensia NS (Themes in Translation Studies: Translation and
knowledge mediation in medical and health settings) 11/2012: 247-268.
Tercedor Sánchez, Maribel, Clara Inés López Rodríguez and Pamela Faber. 2012.
“Working with words: research methodologies in translation-oriented lexicographic
practice.” TTR: traduction, terminologie, rédaction Vol. XXV, 1: 181-214.
Toury, Gideon. 1995. Descriptive Translation Studies and beyond.
Amsterdam/Philadelphia: John Benjamins.
NOTES
1 This research has been carried out within the framework of VariMed: Denominative variation in Medicine: Multilingual multimodal tool for research and knowledge dissemination (FFI2011-23120), a three-year (2012-2014) research project funded by the Spanish Ministry of Economy and Competitiveness, with the participation of researchers from the University of Granada, University Pablo de Olavide, and University of Valladolid (Spain), Rutgers University (USA) and Carleton University (Canada), aimed at the study of denominative variation from a cognitive and communicative perspective. This research is also part of the innovative teaching project Comunicación y ciudadanía europea (inglés-español): recursos multimodales para la salud y el medioambiente [Communication and European Citizenship (English-Spanish): multimodal resources on Health and the Environment], funded by the University of Granada.
2 The concept has been referred to as onomasiological variation in the cognitive linguistics paradigm (Geeraerts, Grondelaers and Speelman 1999), distinguishing between formal variation (use of synonyms) and conceptual onomasiological variation (use of the hyperonym and hyponym alternatively). However, in the field of terminology this form is rarely used and full synonymy is rare.
3 In the database, the field familiarity will contain a Likert-type scale reflecting the results of such tests.
4 The TreeTagger is a tool for annotating text with part-of-speech and lemma information. For more information: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/.
5 The corpus was compiled by Miguel Ángel Jiménez-Crespo from the Medline webpage: http://www.nlm.nih.gov/medlineplus/spanish/
6 There is an interesting description of the percentage of English words that come from Latin in Dictionary.com (http://dictionary.reference.com/help/faq/language/t16.html). According to this source, over 60 percent of all English words have Greek or Latin roots. In the vocabulary of the sciences and technology, the figure rises to over 90 percent.
7 Peters et al. (1996: 69) define comparable corpora as "sets of texts from pairs or multiples of languages which can be contrasted and compared because of their common features."