New Directions in Corpus Studies

  • 8/18/2019 New Directions in Corpus Studies

    New directions in corpus-based translation studies

    Edited by Claudio Fantinuoli and Federico Zanettin

    Translation and Multilingual Natural Language Processing 1

    Language Science Press

    Translation and Multilingual Natural Language Processing

    Chief Editor: Reinhard Rapp (Johannes Gutenberg-Universität Mainz)
    Consulting Editors: Silvia Hansen-Schirra, Oliver Čulo (Johannes Gutenberg-Universität Mainz)

    In this series:

    1. Fantinuoli, Claudio & Federico Zanettin (eds.). New directions in corpus-based translation studies

    Claudio Fantinuoli and Federico Zanettin (eds.). 2015. New directions in corpus-based translation studies (Translation and Multilingual Natural Language Processing 1). Berlin: Language Science Press.

    This title can be downloaded at: http://langsci-press.org/catalog/book/76
    © 2015, the authors
    Published under the Creative Commons Attribution 4.0 Licence (CC BY 4.0): http://creativecommons.org/licenses/by/4.0/
    ISBN: 978-3-944675-83-1

    Cover and concept of design: Ulrike Harbort

    Typesetting: Claudio Fantinuoli, Katrin Hamberger, Felix Kopecky, Sebastian Nordhoff
    Proofreading: Željko Agić, Benedikt Baur, Rachele De Felice, Stefan Hartmann, Rebekah Ingram, Ka Shing Ko, Kristina Pelikan, Christian Pietsch, Daniela Schröder, Charlotte van Tongeren
    Fonts: Linux Libertine, Arimo
    Typesetting software:

    Language Science Press
    Habelschwerdter Allee 45
    14195 Berlin, Germany
    langsci-press.org

    Storage and cataloguing done by FU Berlin

    Language Science Press has no responsibility for the persistence or accuracy of URLs for external or third-party Internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate. Information regarding prices, travel timetables and other factual information given in this work are correct at the time of first publication but Language Science Press does not guarantee the accuracy of such information thereafter.

    Contents

    1 Creating and using multilingual corpora in translation studies
      Claudio Fantinuoli and Federico Zanettin

    2 Development of a keystroke logged translation corpus
      Tatiana Serbina, Paula Niemietz and Stella Neumann

    3 Racism goes to the movies: A corpus-driven study of cross-linguistic racist discourse annotation and translation analysis
      Effie Mouka, Ioannis E. Saridakis and Angeliki Fotopoulou

    4 Building a trilingual parallel corpus to analyse literary translations from German into Basque
      Naroa Zubillaga, Zuriñe Sanz and Ibon Uribarri

    5 Variation in translation: Evidence from corpora
      Ekaterina Lapshinova-Koltunski

    6 Non-human agents in subject position: Translation from English into Dutch: A corpus-based translation study of “give” and “show”
      Steven Doms

    7 Investigating judicial phraseology with COSPE: A contrastive corpus-based study
      Gianluca Pontrandolfo

    Indexes

    Chapter 1

    Creating and using multilingual corpora in translation studies

    Claudio Fantinuoli and Federico Zanettin

    1 Introduction

    Corpus linguistics has become a major paradigm and research methodology in translation theory and practice, with practical applications ranging from professional human translation to machine (assisted) translation and terminology. Corpus-based theoretical and descriptive research has investigated written and interpreted language, and topics such as translation universals and norms, ideology and individual translator style (Laviosa 2002; Olohan 2004; Zanettin 2012), while corpus-based tools and methods have entered the curricula at translation training institutions (Zanettin, Bernardini & Stewart 2003; Beeby, Rodríguez Inés & Sánchez-Gijón 2009). At the same time, taking advantage of advancements in terms of computational power and increasing availability of electronic texts, enormous progress has been made in the last 20 years or so as regards the development of applications for professional translators and machine translation system users (Coehn 2009; Brunette 2013).

    The contributions to this volume, which are centred around seven European languages (Basque, Dutch, German, Greek, Italian, Spanish and English), add to the range of studies of corpus-based descriptive studies, and provide examples of some less explored applications of corpus analysis methods to translation research. The chapters, which are based on papers first presented at the 7th congress of the European Society of Translation Studies held in Germersheim in

    Claudio Fantinuoli & Federico Zanettin. 2014. Creating and using multilingual corpora in translation studies. In Claudio Fantinuoli & Federico Zanettin (eds.), New directions in corpus-based translation studies, 1–10. Berlin: Language Science Press

    2 Corpus design

    The initial thrust to descriptive corpus-based studies (CBS) in translation came in the 1990s, when researchers and scholars saw in large corpora of monolingual texts an opportunity to further a target-oriented approach to the study of translation, based on the systemic comparison and contrast between translated and non-translated texts in the target language (Baker 1993). In the wake of the first studies based on the Translation English Corpus (TEC) (Laviosa 1997) various other corpora of translated texts were compiled and used in conjunction with comparable corpora of non-translated texts. Descriptive translation research using multilingual corpora progressed more slowly, primarily because of lack of suitable resources. Pioneering projects such as the English Norwegian Parallel Corpus (ENPC), set up in the 1990s under the guidance of Stig Johansson (see e.g. Johansson 2007) and later expanded into the Oslo Multilingual Corpus, which involved more than one language and issues of bitextual annotation and alignment, were a productive source of studies in contrastive linguistics and translation, but they were not easily replicable because the creation of such resources is more time consuming and technically complex than that of monolingual corpora.² Thus, research was initially mostly restricted to small scale projects, often involving a single text pair, and non re-usable resources. However, the last few years have seen the development of some robust multilingual and parallel corpus projects, which can and have been used as resources in a number of descriptive translation studies. Two of these corpora, the Dutch Parallel Corpus (Rura, Vandeweghe & Perez 2008) and the German-English CroCo Corpus (Hansen-Schirra, Neumann & Steiner 2013) are in fact sources of data for two of the articles contained in this volume. Other corpora used in the studies in this volume were instead newly created as re-usable resources.

    ² Given the advances in parallel corpus processing behind developments in statistical machine translations, it may appear somewhat surprising that they have not benefited descriptive research more decisively. However, while descriptive and pedagogic research depends on manual analysis and requires data of high quality, research in statistical machine translation privileges automation and data quantity, and thus tools and data developed for machine translation (including alignment techniques and tools, and aligned data), are usually not suitable or available for descriptive translation studies research.

    Typically, a distinction is made between (bi- or multi-lingual) parallel corpora, said to contain source and target texts, and comparable corpora, defined as corpora created according to similar design criteria. However, not only is the terminology somewhat unstable (Zanettin 2012: 149) but the distinction between the two types of corpora is not always clear cut. First, parallel corpora do not necessarily contain translations. For instance, the largest multilingual parallel corpora publicly available, Europarl and Acquis Communautaire, created by the activity of European Institutions, contain all originals in a legal sense. Second, comparable corpora may have varying degrees of similarity and contain not only “original” texts but also translations. Third, various “hybrid texts” exist in which “translated” text is intermingled with “comparable” text, very similar in terms of subject matter, register etc., but not a translation which can be traced to “parallel” source text. Examples include news translation and text crowdsourcing (e.g. Wikipedia articles in multiple languages), which are generated through “transediting” (Stetting 1989) practices and are thus partly “original writing” and partly translation, possibly from multiple sources.

    It may thus be useful to consider the attribute “parallel” or “comparable” as referring to a type of corpus architecture, rather than to the status of the texts as concerns translation. Parallel corpora can thus be thought of as corpora in which two or more components are aligned, that is, are subdivided into compositional and sequential units (of differing extent and nature) which are linked and can thus be retrieved as pairs (or triplets, etc.). On the other hand, comparable corpora can be thought of as corpora which are compared on the whole on the basis of assumed similarity.
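One way to picture this architectural definition of a parallel corpus is as a small data structure. The Python sketch below is purely illustrative and is not taken from any of the corpora discussed in this volume: the names `AlignedUnit`, `build_parallel` and `search`, the toy sentences, and the simplifying assumption that units are linked by position are all invented for the example. Real alignment tools have to cope with one-to-many and missing correspondences.

```python
from dataclasses import dataclass


@dataclass
class AlignedUnit:
    """One alignment link: the same sequential unit in every component."""
    unit_id: int
    segments: dict  # component name -> text of the unit in that component


def build_parallel(components):
    """Link the units of each component by position (a simplifying assumption).

    `components` maps a component name (e.g. a language code) to a list of
    sequential units such as sentences.
    """
    lengths = {len(units) for units in components.values()}
    if len(lengths) != 1:
        raise ValueError("all components must contain the same number of units")
    size = lengths.pop()
    return [
        AlignedUnit(i, {name: units[i] for name, units in components.items()})
        for i in range(size)
    ]


def search(corpus, component, word):
    """Return the aligned units whose text in `component` contains `word`."""
    word = word.lower()
    return [
        unit for unit in corpus
        if word in unit.segments[component].lower().split()
    ]


# Invented toy data: two components with two aligned sentences each.
corpus = build_parallel({
    "en": ["The corpus is small.", "It grows."],
    "de": ["Das Korpus ist klein.", "Es wächst."],
})
hits = search(corpus, "en", "corpus")
for unit in hits:
    print(unit.segments["en"], "|", unit.segments["de"])
```

Because every hit is a whole `AlignedUnit`, a query on one component returns the linked units of all the others as pairs (or triplets). A comparable corpus, by contrast, would simply be a set of unlinked components.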

    A distinctive feature of the corpora described in this volume is their complexity, as most corpora contain more than two subcorpora, often in different languages, and in some cases together with different types of data. Serbina, Niemietz and Neumann’s keystroke logged corpus contains original texts and translations, together with the intermediate versions of the unfolding translation process. The corpus is based on keystroke logging and eye-tracking data recorded during translation, editing and post-editing experiments. The log of keystrokes is seen as an intermediate version between source and final translation. The corpus created by Mouka, Saridakis and Fotopoulou is a multilingual and multimodal corpus comprising five films in English together with English, Greek and Spanish subtitles. The films were selected for their related subject matter and contain a significant amount of conversation carried out in interracial communities, and feature several instances of racist discourse. Zubillaga, Sanz and Uribarri describe the design and compilation of Aleuska, a multilingual parallel corpus of translations from German to Basque. The corpus, which collates three subcorpora of literary and philosophical texts, was collected after meticulous bibliographic research. Translation into a minority language, such as Basque, is a complex phenomenon, and this complexity is reflected in the design of the corpus, which includes a subcorpus of Spanish texts used as a relay language in the translation process.

    Lapshinova-Koltunski’s VARiation in TRAnslation (VARTRA) corpus comprises five sets of translations of the same source texts carried out using different translation methods, together with the source texts and a set of comparable German originals. The first subcorpus of translations is a selection extracted from the Cross-linguistic Corpus (CroCo) (Hansen-Schirra, Neumann & Steiner 2013), which contains human translations together with their source texts from various registers of written language. Since CroCo is a bidirectional corpus, it also contains a set of comparable source texts in German (and their English translations, which however were not needed for this investigation). The second set of German translations contains texts produced by translators with the help of Computer Assisted Translation (CAT) tools, while each of the three remaining subcorpora contains the output of a different machine translation system. The last two articles in this collection focus on corpus analysis rather than on the design and construction of the corpora used, which are described extensively elsewhere. However, it is clear that results are as good as the criteria which guided the creation of the corpora from which they are derived. Doms draws his data from the Dutch Parallel Corpus (DPC), a balanced 10 million word corpus of English, French and Dutch originals and translations, while the data analyzed by Pontrandolfo come from the COrpus de Sentencias PEnales (COSPE), a carefully constructed specialized corpus of legal discourse. COSPE is a trilingual comparable corpus and does not contain translations, though its Italian, English and Spanish subcorpora are extremely similar from the point of view of domain, genre and register.

    3 Annotation and alignment

    The enrichment of a corpus with linguistic and extra-linguistic annotation may play a decisive part in descriptive studies based on corpora of translations, and is of particular concern to the first four articles, in which research implementation relies to a large extent on annotation. Issues of annotation and alignment come to the fore in the study by Serbina, Niemietz and Neumann, who show how both process and product data can be annotated in format in order to query the corpus for various features and recurring patterns. The keylogged data provided by the Translog software are pre-processed to represent individual keystroke logging events as linguistic structures, and these process units are then aligned with source and target text units. All process data, even material that does not appear in the final translation product, is preserved, under the assumption that all intermediate steps are meaningful to an understanding of the translation process.
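The idea of keeping every intermediate step, including material that never reaches the final product, can be illustrated with a toy replay of a keystroke log. The sketch below is a minimal illustration under stated assumptions: the `(kind, char)` event format and the function name `replay` are hypothetical and do not reproduce Translog's actual log format or the authors' pre-processing pipeline.

```python
def replay(events):
    """Replay a keystroke log and keep every intermediate version.

    `events` is a list of (kind, char) pairs, where kind is "insert" or
    "delete" (a backspace removing the last character). This event format
    is invented for illustration only.
    """
    text = []      # characters currently in the target text
    versions = []  # state of the text after each event
    deleted = []   # characters that were typed but later removed
    for kind, char in events:
        if kind == "insert":
            text.append(char)
        elif kind == "delete":
            if text:
                deleted.append(text.pop())
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        versions.append("".join(text))
    return "".join(text), versions, deleted


# Hypothetical log: the translator types "Der", deletes "r", then types "s".
log = [
    ("insert", "D"),
    ("insert", "e"),
    ("insert", "r"),
    ("delete", None),
    ("insert", "s"),
]
final, versions, deleted = replay(log)
print(final)     # Des
print(versions)  # ['D', 'De', 'Der', 'De', 'Des']
print(deleted)   # ['r']
```

The `versions` list corresponds to the intermediate versions of the unfolding text, and `deleted` retains the material that is absent from the final product. Grouping such raw events into linguistically meaningful units, as the chapter describes, would be a further step that this sketch does not attempt.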

    Bringing together approaches from descriptive translation studies and critical discourse linguistics, Mouka, Saridakis and Fotopoulou address the topic of racism in multimedia translation by creating a time-aligned corpus of film dialogues, and attempting to code and classify instances of racist discourse in English subtitles and their translations in multiple languages. The authors devise a taxonomy of racism-related utterances in the light of Appraisal Theory (Martin & White 2005), and use the and applications to apply multiple layers of , conformant annotation to the multimodal and multilingual corpus. Racism-related utterances in the source and target languages are classified in order to allow for the analysis of register shifts in translation. The subtitles are aligned together into the trilingual parallel corpus as well as synchronized with the audiovisual data, allowing access to the wider context for every utterance retrieved.

    Zubillaga, Sanz and Uribarri had to face the challenge of working with a minority language, Basque, for which scarce computational linguistics resources are available, and had therefore to develop their own tools. Research into literary translations from German into Basque involves direct translations from German into Basque but also indirect translation, carried out by going through a Spanish version. In order to observe both texts in the case of direct translations and all three texts for indirect translations, Zubillaga, Sanz and Uribarri have aligned their annotated parallel trilingual corpus at sentence level, using a project-specific alignment tool.

    The features chosen for comparative analysis in Lapshinova-Koltunski’s chapter were obtained on the basis of automatic linguistic annotation. All subcorpora were tokenised, lemmatised, tagged with part of speech information, and segmented into syntactic chunks and sentences, and were then encoded in a format compatible with the Open Corpus Workbench corpus management and query tool. Though the set of translations extracted from the CroCo corpus is aligned with its source texts, the five subcorpora of translations are not aligned between them since this annotation level is not necessary for the extraction of the operationalisations used in this study. In this respect, then, VARTRA is treated as a comparable rather than as a parallel corpus.
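As a rough illustration of what token-level annotation of this kind can look like on disk, Corpus Workbench tools read a "verticalised" text: one token per line with tab-separated attributes, and structural tags on lines of their own. The sketch below is an assumption-laden example rather than the actual VARTRA encoding: the function name `to_vertical`, the attribute order (word, part of speech, lemma), the `<s>` sentence tag and the sample tokens and tags are all chosen for illustration.

```python
def to_vertical(sentences):
    """Serialise annotated sentences as one-token-per-line text.

    `sentences` is a list of sentences, each a list of
    (word, pos, lemma) tuples. Each token becomes a tab-separated line;
    each sentence is wrapped in <s> ... </s> lines.
    """
    lines = []
    for sentence in sentences:
        lines.append("<s>")
        for word, pos, lemma in sentence:
            lines.append(f"{word}\t{pos}\t{lemma}")
        lines.append("</s>")
    return "\n".join(lines)


# Invented sample sentence with illustrative part-of-speech tags.
sample = [[("Corpora", "NNS", "corpus"), ("grow", "VBP", "grow"), (".", "SENT", ".")]]
vertical = to_vertical(sample)
print(vertical)
```

With annotation stored in this way, queries can address the part-of-speech or lemma layer of each token rather than the surface word form alone.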

    Doms’s data are a collection of parallel concordances drawn from the Dutch Parallel Corpus, and annotation and alignment at sentence level are clearly prerequisites for the type of investigation conducted. Pontrandolfo’s COSPE contains criminal judgements in different languages by different judicial systems, and therefore the texts in the three subcorpora cannot be aligned. However, as shown by Pontrandolfo, both researchers and translators can benefit from research based on corpora which are neither linguistically annotated nor aligned.

    4 Corpus analysis

    Serbina, Niemietz and Neumann offer several examples of possible data queries and discuss how linguistically informed quantitative analyses of the translation process data can be performed. They show how the analysis of the intermediate versions of the unfolding text during the translation process can be used to trace the development of the linguistic phenomena found in the final product. Mouka, Saridakis and Fotopoulou use the apparatus of systemic-functional linguistics to trace register shifts in instances of racist discourse in films translated from English into Greek and Spanish. They also avail themselves of large comparable monolingual corpora in English and Greek as a backdrop against which to evaluate original and translated utterances in their corpus. Zubillaga, Sanz and Uribarri provide a preliminary exploration of the type of searches that can be performed on the Aleuska corpus using the accompanying search engine. They frame their search hypothesis within Toury’s (1995) translation laws, finding evidence both of standardisation and interference, in direct as well as in indirect translation.

    Lapshinova-Koltunski’s chapter is one of the first investigations which compares corpora obtained through different methods of translation to test a theoretical hypothesis rather than to evaluate the performance of machine translation systems. The subcorpora are queried using regular expressions based on part of speech annotation which retrieve words belonging to specific word classes or phrase types. These lexicogrammatical patterns, together with word count statistics, are used as indicators of four hypothesized translation-specific features, namely simplification, explicitation, normalisation vs. “shining through”, and convergence. While these features have been amply investigated in the literature, the novelty of Lapshinova-Koltunski’s study is that the comparison takes into account not only variation between translated and non-translated texts, but also with respect to the method of translation. Preliminary results show interesting patterns of variation for the features under analysis.

    Doms analyses 338 parallel concordances containing instances of the English verbs give and show with an agent as their subject, and their Dutch translations. The analysis was carried out manually by filtering out from search results unwanted instances such as passive and idiomatic constructions, and by distinguishing between human and non-human agents. First, the author provides a discussion of the prototypical features of agents which perform the action with particular verbs, and an overview of the different constraints which certain verbs pose on the use of human and non-human agents in English and Dutch, respectively. He then zooms in on the two verbs under analysis, and discusses the data from the corpus. Since sentences with action verbs like give or show and non-human agents are less frequently attested in Dutch than in English, the expectation is that translators will not (always) translate English non-human agents as subjects of give and show with Dutch non-human agents as subjects of the Dutch cognates of give and show, geven and tonen, respectively. Doms describes the choices made by the translators both on a syntactic and semantic level, comparing the translation data with the source-text sentences to verify whether these source-text verbs give rise to different solutions, showing how the translators decided between primed translations with non-human agents and translations without non-human agents, but with specific Dutch syntactic and semantic patterns which differ from those in the English source texts.

    Pontrandolfo presents the results of an empirical study of phraseological units in a specific domain (criminal law) and type of legal genre (criminal judgments), approaching contrastive phraseology both from a quantitative and a qualitative perspective. He describes how four categories of phraseological units, namely complex prepositions, lexical doublets and triplets, lexical collocations and routine formulae, were extracted from the corpus using a mix of manual and automatic techniques. He shows how formulaic language, which plays a pivotal role in judicial discourse, can be analyzed and compared across three languages by means of concordancing software. The final goal of Pontrandolfo’s research is to provide a resource for legal translators, as well as for legal experts, which can help them develop their phraseological competence through exposure to real, authentic (con)texts in which these phraseological units are used.

    5 Conclusions

    Corpus-based translation studies have steadily grown as a disciplinary sub-category since the first studies began to appear more than twenty years ago. A bibliometric analysis of data extracted from the Translation Studies Abstracts Online database shows that in the last ten years or so about 1 out of 10 publications in the field has been concerned with or informed by corpus linguistics methods (Zanettin, Saldanha & Harding 2015). The contributions to this volume show that the area keeps evolving, as it constantly opens up to different frameworks and approaches, from Appraisal Theory to process-oriented analysis, and encompasses multiple translation settings, including (indirect) literary translation, machine (assisted)-translation and the practical work of professional legal translators (and interpreters). Finally, the studies included in the volume expand the range of corpus applications not only in terms of corpus design and methodologies, but also in terms of the tools used to accomplish the research tasks outlined. Corpus-based research critically depends on the availability of suitable tools and resources, and in order to cope properly with the challenges posed by increasingly complex and varied research settings, generally available data sources and out of the box software can be usefully complemented by tools tailored to the needs of specific research purposes. In this sense, a stronger tie between technical expertise and sound methodological practice may be key to exploring new directions in corpus-based translation studies.

    References

    Baker, Mona. 1993. Corpus linguistics and translation studies: Implications and applications. In Mona Baker, Gill Francis & Elena Tognini-Bonelli (eds.), Text and technology: In honour of John Sinclair, 233–250. Amsterdam: John Benjamins.

    Beeby, Allison, Patricia Rodríguez Inés & Pilar Sánchez-Gijón. 2009. Corpus use and translating: Corpus use for learning to translate and learning corpus use to translate. Amsterdam: John Benjamins.

    Brunette, Louise. 2013. Machine translation and the working methods of translators. Special issue of JosTrans (19). 2–7.

    Coehn, Philipp. 2009. Statistical machine translation. Cambridge: Cambridge University Press.

    Hansen-Schirra, Silvia, Stella Neumann & Erich Steiner. 2013. Cross-linguistic corpora for the study of translations. Insights from the language pair English-German. Berlin: de Gruyter.

    Johansson, Stig. 2007. Seeing through multilingual corpora: On the use of corpora in contrastive studies. Amsterdam: John Benjamins.

    Laviosa, Sara. 1997. How comparable can “comparable corpora” be? Target 9(2). 289–319.

    Laviosa, Sara. 2002. Corpus-based translation studies: Theory, findings, applications. Amsterdam: Rodopi.

    Martin, James Robert & Peter R. R. White. 2005. The language of evaluation: Appraisal in English. London: Palgrave Macmillan.

    Olohan, Maeve. 2004. Introducing corpora in translation studies. London: Routledge.

    Rura, Lidia, Willy Vandeweghe & Maribel M. Perez. 2008. Designing a parallel corpus as a multifunctional translator’s aid. In Proceedings of the XVIII FIT World Congress. Shanghai.

    Stetting, Karen. 1989. Transediting – A new term for coping with the grey area between editing and translating. In Graham Caie, Kirsten Haastrup & Arnt Lykke Jakobsen (eds.), Proceedings from the Fourth Nordic Conference for English Studies, 371–382. Copenhagen: University of Copenhagen.

    Toury, Gideon. 1995. Descriptive translation studies and beyond. Amsterdam: John Benjamins.

    Zanettin, Federico. 2012. Translation-driven corpora: Corpus resources for descriptive and applied translation studies. Manchester: St. Jerome Publishing.

    Zanettin, Federico, Silvia Bernardini & Dominic Stewart (eds.). 2003. Corpora in translator education. Manchester: St. Jerome Publishing.

    Zanettin, Federico, Gabriela Saldanha & Sue-Ann Harding. 2015. Sketching landscapes in translation studies. A bibliographic study. Perspectives: Studies in Translatology 23(2). 1–22.

    Chapter 2

    Development of a keystroke logged translation corpus

    Tatiana Serbina, Paula Niemietz and Stella Neumann

    This paper describes the development of a keystroke logged translation corpus containing both translation product and process data. The initial data comes from a translation experiment and contains original texts and translations, plus the intermediate versions of the unfolding translation process. The aim is to annotate both process and product data to be able to query for various features and recurring patterns. However, the data must first be pre-processed to represent individual keystroke logging events as linguistic structures, and align source, target and process units. All process data, even material that does not appear in the final translation product, is preserved, under the assumption that all intermediate steps are meaningful to our understanding of the translation process. Several examples of possible data queries are discussed to show how linguistically informed quantitative analyses of the translation process data can be performed.

    1 Introduction

    Empirical translation studies can be subdivided into two main branches, namely product and process-based investigations (see Laviosa 2002; Göpferich 2008). Traditionally, the former are associated with corpus studies, while the latter require translation experiments. The present study combines these two perspectives on translation by treating the translation process data as a corpus and tracing how linguistic phenomena found in the final product have developed during the translation process.

    Typically, product-based studies consider translations as texts in their own right, which can be analyzed in terms of translation properties, i.e. ways in which translated texts systematically differ from the originals. The main translation properties analyzed so far include simplification, explicitation, normalization towards the target text (TT), leveling out (Baker 1996) and shining through of the source text (ST) (Teich 2003). Investigations into these properties can be conducted using monolingual comparable corpora containing originals and translations within the same language (e.g. Laviosa 2002), bilingual parallel corpora consisting of originals and their aligned translations (e.g. Becher 2010), or also combinations of both (Čulo et al. 2012; Hansen-Schirra & Steiner 2012).

    Tatiana Serbina, Paula Niemietz & Stella Neumann. 2014. Development of a keystroke logged translation corpus. In Claudio Fantinuoli & Federico Zanettin (eds.), New directions in corpus-based translation studies, 11–31. Berlin: Language Science Press

    Empirical research requires not only description but also explanation of translation phenomena. Why, for instance, are translated texts more explicit than originals? It has been suggested that explicitation as a feature of translated texts is a rather heterogeneous phenomenon and can be subdivided into four different types: the first three classes are linked to contrastive and cultural differences, whereas instances of the fourth type are specific to the translation process (Klaudy 1998: 82–83). Other researchers propose to explain translation phenomena in general through contrastive differences between the source and target languages, register characteristics and a set of factors connected to the translation process, for instance those related to the process of understanding (Steiner 2001). Thus, studies using parallel corpora have shown that the majority of examples of explicitation found in the data can be accounted for through contrastive, register and/or cultural differences (Hansen-Schirra, Neumann & Steiner 2007; Becher 2010). Based on these corpus-based studies researchers can formulate hypotheses that ascribe the remaining instances to the characteristics of the translation process, and then test these hypotheses by considering data gathered during translation experiments, e.g. through keystroke logging. Keystroke logging software such as Translog (Jakobsen & Schou 1999) allows researchers to study intermediate steps of translations by recording all keystrokes and mouse clicks during the process of translation. Based on this behavioral data and the intermediate versions of translations, assumptions with regard to cognitive processing during translation can be made. Analysis of translation process data helps explain the properties of translation products, describe potential translation problems and identify translation strategies.

    Previous studies in this area have focused on analysis of pauses and the number as well as length of the segments in between (e.g. Dragsted 2005; Jakobsen 2005; Alves & Vale 2009; 2011). Furthermore, translation styles have been investigated in both quantitative and qualitative manners (e.g. Pagano & Silva 2008; Carl, Dragsted & Jakobsen 2011); for example, the performances of professional and student translators have been compared with regard to speed of text production during translation, length of produced chunks and revision patterns (e.g. Jakobsen 2005).


    Matthiessen 2014: 715; Taverniers 2003: 8–10), we assume that in the complex version the information is more dense and less explicit. For instance, whereas the italicized stretches of text in (1) and (2) contain the same semantic content, its realization as a clause in (1) leads to an explicit mention of the agents, namely the researchers, which are left out in the nominalized version presented in (2).

    During the experiment every participant translated one of the two versions of the text, in which simple and complex stimuli had been counterbalanced. In other words, five simple and five complex stimuli integrated into the first source text corresponded to the complex and simple variants of the same stimuli in the second text. The only translation resource allowed during the translation task was the online bilingual dictionary leo.³ The participants’ keystrokes, mouse movements and pauses in between were recorded using the software Translog. Additionally, the information on gaze points and pupil diameter was collected with the help of the remote eye-tracker Tobii 2150, using the corresponding software Tobii Studio, version 1.5 (Tobii Technology 2008). Currently the corpus considers only the keystroke logging data, but later the various data sources will be triangulated (see Alves 2003) to complement each other. The discussion of individual queries and specific examples in §3 indicates how the analysis of the data could benefit from the additional data stream.

    (1) Simple stimulus
        Instead of collapsing to a final fixed size, the height of the crushed ball continued to decrease, even three weeks after the researchers had applied the weight. (P Source text 2)

    (2) Complex stimulus
        Instead of collapsing to a final fixed size, the height of the crushed ball continued to decrease, even three weeks after the application of weight. (P Source text 1)

    The prototype of the corpus thus consists of 2 versions of the original (source texts), 16 translations (target texts) as well as 16 log files (process texts). The source and target texts together amount to approximately 3,650 words, not including the process texts. The total size, taking into account various versions of the same target text words, can be determined only after completion of the pre-processing step (see §2.2). All the texts belong to the register of popular scientific writing. After the gold standard is established, the corpus will be extended to include data from further translation experiments, e.g. data stored in the CRITT TPR-DB (Carl 2012).⁴ This database is a collection of keystroke logging and eye-tracking data recorded during translation, editing and post-editing experiments. It provides both raw and processed data: for instance, originals and final translation products are tokenized, aligned and annotated with parts of speech, whereas the process data is analyzed in terms of gaze and keystroke units (Carl 2012). According to the website, the current version of the database consists of approximately 1300 experiments.⁵ In the development of our keystroke logged translation corpus we go further by identifying all potential tokens produced during a translation process and enriching these with linguistic information. At the moment, the relatively small size of the corpus is sufficient to develop the new procedures and queries required for this type of data.

    3 http://dict.leo.org/ende/index_de.html.

    4 The CRITT TPR-DB is the Translation Process Research Database of the Centre for Research and Innovation in Translation and Translation Technology.

    5 https://sites.google.com/site/centretranslationinnovation.

    2.2 Pre-processing

    While the originals and the final translations can be automatically annotated and aligned using existing tools, the process texts require pre-processing before they can be enriched with further information. The keystroke logs consist of individual events, each corresponding to one key press or mouse click. To link this behavioral information to the linguistic level of analysis, the events have to be represented in terms of complete tokens. Since the intentions of a translator are not always clear, it is essential to reflect all possible tokens produced during the translation process. Using a modified version of the concept of target hypotheses that Lüdeling (2008) introduced for learner corpora (which also contain non-standard language with errors), the corpus will include multiple layers of annotation reflecting different versions of the same tokens that could be inferred from the process data. Thus, in our context, target hypotheses represent potential translation plans. Several hypotheses are annotated when the keystroke logging data is ambiguous, i.e. in cases when, based on the pressed keys, it is unclear what token the translator intended to produce, and when the process contains additional indicators of increased cognitive processing such as longer pauses or corrections. This method retains the necessary level of objectivity because it does not force the researcher to select only the version which appears most plausible at a certain stage of corpus compilation.

    Leijten et al. (2012) discuss the processing of monolingual keystroke logging data by aggregating it from the character (keystroke) to the word level (see also Macken et al. 2012). For translation data, however, the required processing is more complex. Within the target text keystrokes are aligned to tokens, and these tokens (representing intermediate versions of words either preserved in the final target text, or modified/deleted in the process) are in turn aligned to the alignment units consisting of source–target counterparts (see Carl 2009: 227). The same process of alignment is also performed for the phrase and grammatical function levels. These alignment links make it possible to query for all intermediate versions of individual tokens and phrases (see §3.5).

    To facilitate this alignment, an alignment tool was developed which allows the researcher to manually select items to be aligned from the source text and the target text.⁶ These alignment units are saved in the same keystroke logging file. The screenshot in Figure 1 shows the selection of an alignment pair with the tool: the words explaining from the source text list and erklären ‘to explain’ from the target text list are highlighted to become alignment pair 0 in the bottom window. The window on the left part of the screen displays the file for reference.

    Figure 1: Screenshot of an alignment process using the alignment tool

    The Translog software supplies the keystroke data in XML format. Each keystroke is identified as a log event containing values for the type of action (i.e., character, deletion, movement, mouse click), the cursor position of this keystroke, a time stamp and a block ID which identifies the number of characters highlighted in the log event (e.g. when a segment is highlighted prior to being moved or deleted). During the pre-processing stage for the corpus prototype, the data was enriched by aggregating the log events into plausible tokens to which token IDs were assigned. For each alignment level (currently only word level; in the future also phrase and grammatical function levels) a reference link was specified to link the object to the corresponding alignment unit created by the aligner. If the token did not appear in the final version and could not be linked to any existing alignment units, the reference link was designated as an empty link. In example (3) below the three words für Verwirrung sorgt ‘causes confusion’, which appear in an intermediate version of this sentence, are characterized by empty links: since the same semantic information is expressed in the final version through a different grammatical structure using non-related lexical items, namely nicht vollständig erklären können ‘could not explain entirely’, the tokens cannot be connected to any alignment units. The reference to the empty links ensures that the information contained in the intermediate versions is preserved in the data and can be queried. These tokens can only be linked on the level of units larger than words. The frequent use of semantically equivalent structures rather than structurally similar units requires alignment on multiple levels, as certain relations cannot be captured at the level of individual words.⁷

    6 The tool was developed by students Adjan Hansen-Ampah (Aachen) and Chuan Yao (Georgia Institute of Technology) during a project at Aachen University in 2013.

    (3) EO:   Yet it displays surprising strength and resists further compression, a fact that has confounded physicists. ( )

        GT_i: ♦ ⋆⋆ eine♦ Tatsache,♦ die♦ Physikerf ♦ ⋆ < ×⊐< ×⊐♦ [⋆ 11.968]< ×⊐< ×⊐< ×⊐< ×⊐< ×⊐< ×⊐< ×⊐< ×⊐< ×⊐ bei♦ Physikern♦ für♦ Verwirrung♦ sorgrt< ×⊐< ×⊐t.
              a fact the physicists by physicists for confusion caters

        GT_f: eine Tatsache, die  sich       Physiker   noch immer nicht vollständig erklären können
              a    fact      that themselves physicists yet  still not   entirely    explain  can

    7 The intermediate versions of German translations use special characters introduced in linear representation, a visualization option provided by the keystroke logging software Translog. ♦ – a space character, ⋆ – approx. 1 sec. pause, [⋆ 36.721] – a pause of 36 seconds, 721 milliseconds, < ×⊐ – a backspace character. The part of the original corresponding to the translation is written in italics. One or more intermediate versions (GT_i) and the final version (GT_f) of translations, if relevant for the discussion, are presented in their chronological order.


    Similarly, empty links were also defined in the source–target alignment units, if no corresponding element could be identified for either the source text or the target text (Čulo et al. 2012), so that this information can also be extracted from the corpus.
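    The aggregation of log events into tokens described in this section can be illustrated with a minimal Python sketch. It is not the procedure used for the corpus prototype: the event encoding (one character per keystroke, with `"\b"` standing in for a deletion event), the `Token` class and the function names are assumptions made for illustration, and linking a token to the final text by string identity is a deliberate simplification of the manual alignment described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BACKSPACE = "\b"  # assumption: stands in for a deletion log event


@dataclass
class Token:
    token_id: int
    form: str
    link: Optional[int]  # position of the word in the final text; None = empty link


def current_token(text: str) -> str:
    """Return the token under construction at the end of the text, if any."""
    if not text or text[-1].isspace():
        return ""
    return text.split()[-1]


def aggregate(events: str) -> Tuple[str, List[Token]]:
    """Replay keystrokes and collect every token that existed at some point."""
    text = ""
    forms: List[str] = []
    deleting = False
    for key in events:
        if key == BACKSPACE:
            # Record a token at the moment the translator starts deleting it.
            if not deleting and current_token(text):
                forms.append(current_token(text))
            text = text[:-1]
            deleting = True
        else:
            # A space closes the token typed so far.
            if key == " " and current_token(text):
                forms.append(current_token(text))
            text += key
            deleting = False
    if current_token(text):
        forms.append(current_token(text))
    final_words = text.split()
    # Simplification: a token is linked if its form survives in the final text.
    tokens = [
        Token(i, form, final_words.index(form) if form in final_words else None)
        for i, form in enumerate(forms)
    ]
    return text, tokens


# Keystrokes behind "sorgrt", two backspaces and "t." in example (3)
final_text, tokens = aggregate("sorgrt" + BACKSPACE * 2 + "t.")
for token in tokens:
    print(token)
```

    In this sketch the mistyped form sorgrt is kept as a token with an empty link, while the corrected form is linked to the final text, which mirrors the idea that intermediate material stays queryable even when it leaves no trace in the product.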

    2.3 Annotation

    “Corpus annotation adds value to a corpus in that it considerably extends the range of research questions that a corpus can readily address” (McEnery, Xiao & Tono 2006: 29): a systematic annotation of particular information types throughout a corpus enables researchers to search for and extract corpus examples based on certain criteria included in one or more annotation layers. At the moment all texts are annotated with meta-information specifying the participant ID, a version of the translated text and the participant’s group (translator/physicist). The meta-information will be extended to include further variables relevant for potential analyses of the translation process data, e.g. participant-specific metadata such as age or native language (see Hvelplund & Carl 2012). Furthermore, the corpus will contain several layers of linguistic annotation. The part of speech (POS) annotation of the process texts was done manually for some examples in the corpus prototype, but the aim is to perform this step automatically for process as well as source and target texts through the use of an existing tagger. Automatic syntactic parsing and annotation of grammatical functions is also planned;⁸ however, it is recognized that manual interaction to check the results will still be necessary. The multilayer annotation (see Hansen-Schirra, Neumann & Vela 2006) will be extended by integrating the target hypotheses as a separate annotation layer (see §3.2). In addition, behavioral information such as the length of individual pauses (see Alves & Vale 2009; 2011) will be annotated to facilitate quantifying these types of features, as well as querying for a combination of behavioral and linguistic information.
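    The planned annotation of pause lengths could be derived from the time stamps of consecutive log events. The following Python sketch shows one possible way of doing so; the `LogEvent` and `Pause` types, the one-second threshold and the time stamps are assumptions chosen for illustration and are not taken from the corpus.

```python
from typing import List, NamedTuple


class LogEvent(NamedTuple):
    time_ms: int  # time stamp of the keystroke in milliseconds
    key: str


class Pause(NamedTuple):
    before_index: int  # index of the event that follows the pause
    length_ms: int


def annotate_pauses(events: List[LogEvent], threshold_ms: int = 1000) -> List[Pause]:
    """Return every gap between consecutive events that reaches the threshold."""
    pauses = []
    for i in range(1, len(events)):
        gap = events[i].time_ms - events[i - 1].time_ms
        if gap >= threshold_ms:
            pauses.append(Pause(i, gap))
    return pauses


# Invented time stamps, for illustration only
log = [
    LogEvent(0, "e"),
    LogEvent(180, "i"),
    LogEvent(350, "n"),
    LogEvent(37071, "e"),
    LogEvent(37250, "s"),
]
pauses = annotate_pauses(log)
print(pauses)
```

    Storing the pause together with the index of the following event makes it possible to combine the behavioral layer with token-level annotation, for instance to retrieve all tokens whose production was preceded by a long pause.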

    3 Possible queries

    Depending on the research questions, different types of queries into the translation process data are required. The following sub-sections describe a selection of possible queries. Taking into account the novelty of this corpus type for translation process research, this section aims at showing the potential applications of the planned annotation and alignment layers introduced above for the analysis of translations.

    8 Different taggers and parsers will be tested, and in a later step trained to accommodate the non-standard features present in the corpus. The ongoing work on pre-processing and annotation of monolingual process data (Leijten et al. 2012; Macken et al. 2012) is being taken into consideration.

    3.1 Alternative versions and incomplete structures within individual intermediate versions

    One query type concerns alternative versions of an unfolding target text. During the process of translation, evolving texts typically undergo multiple revisions (e.g. in the form of deletions, overwrites or additions) before the final product is completed. One way of looking at revisions is to consider all keystrokes related to the translation of one source text sentence, up to the point where the translator begins translating other sentences, as an intermediate version of the translation of this source text sentence. The next version is identified when and if the translation of this sentence is resumed after text production and/or revision of other passages.⁹ Often such intermediate versions could function on their own: their linguistic structures are complete and could be left unchanged throughout the translation session. However, for various reasons, subsequent revisions may lead to (a series of) changes in these structures, thus creating new versions of the same sentences.

    A single intermediate version may include several alternatives for the same linguistic slot realized by the same part of speech. For example, in (4), two versions of the modal verb within a subordinate clause have been supplied by the translator: the first of these in the present (können ‘can’) and the second in the past tense (konnten ‘could’), separated by a slash.

    (4) EO:   Yet it displays surprising strength and resists further compression, a fact that has confounded physicists. ( )

        GT_i: […] die♦  sich♦      Physiker♦  nicht♦ erklären♦ können⋆ /konnten.
                  which themselves physicists not    explain   can/could

    9 The identification of intermediate versions differs from the annotation of different target hypotheses (see §3.2): for instance, in (4), one intermediate version corresponds to two target hypotheses.

    The part of speech annotation allows us to query this and similar patterns through a search for identical parts of speech separated by a punctuation mark. Figure 2 shows the XML code provided by the keystroke logging software Translog corresponding to the production of the tokens können and konnten in example (4). As can be seen, the tool generates files representing one log event (e.g. a keystroke corresponding to a letter or a slash) per line. The pre-processing step requires the grouping of these events into tokens, such as können and konnten, which can then be annotated with part of speech tags. Here we use the tags from the Stuttgart-Tübingen Tagset (STTS) for German (Schiller et al. 1999) for the purposes of illustration. Both können ‘can’ in Token 38 and konnten ‘could’ in Token 40 bear the part of speech tag VMFIN indicating ‘verb finite, modal’.

    Figure 2: XML code enriched with alignment links and information on tokens and parts of speech
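    A search for identical parts of speech separated by a punctuation mark could be sketched in Python as follows. The token triples, the function name and all token IDs other than 38 and 40 are assumptions made for illustration; the tags are STTS tags, with `$(` covering sentence-internal punctuation such as the slash.

```python
from typing import List, Tuple

Token = Tuple[int, str, str]  # (token ID, form, part-of-speech tag)

PUNCTUATION_TAGS = {"$,", "$.", "$("}  # STTS tags for punctuation


def find_alternatives(tokens: List[Token]) -> List[Tuple[Token, Token]]:
    """Find pairs of tokens with the same tag separated by one punctuation mark."""
    hits = []
    for left, middle, right in zip(tokens, tokens[1:], tokens[2:]):
        same_class = left[2] == right[2] and left[2] not in PUNCTUATION_TAGS
        if middle[2] in PUNCTUATION_TAGS and same_class:
            hits.append((left, right))
    return hits


# Tokens of the intermediate version in example (4); IDs 36, 37, 39 and 41
# are invented, only 38 and 40 are mentioned in the text.
version = [
    (36, "nicht", "PTKNEG"),
    (37, "erklären", "VVINF"),
    (38, "können", "VMFIN"),
    (39, "/", "$("),
    (40, "konnten", "VMFIN"),
    (41, ".", "$."),
]
alternatives = find_alternatives(version)
print(alternatives)
```

    A query of this kind will also return ordinary coordinations such as enumerations separated by commas, so the hits would still need to be inspected manually or restricted further, e.g. to slashes.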


    Sometimes, alternatives might fill not only one part of speech slot but a whole phrase or clause, requiring a different approach in order to query for such more complex intermediate versions. The present study differentiates between words occurring in the source text and the target text, on the one hand, and different tokens that can be identified in the intermediate versions, on the other. From the perspective of the process all meaningful items in the intermediate versions are tokens. In addition, those tokens that are kept in the final translation are designated as words. This distinction helps us keep the process and the product of translation apart and study their interrelations. For instance, combinations between one or several words and a larger number of tokens, present in the same intermediate translation version, are considered to be an indicator that several alternatives for the same linguistic unit are included. Querying for such combinations would result in a more complete list of examples similar to (4).

    However, in some cases a translator leaves a stretch of text unfinished by either writing less or more linguistic material than is required for a complete linguistic structure. Rather than adding multiple alternatives to a single translation version, a translator may also write an incomplete structure, in which a placeholder is substituted for the later linguistic unit, such as a sequence of characters “xxx” or simply several space characters, as is shown in (5). In this sequence of word classes ART ADJA * KON VVFIN (article adjective * coordinating conjunction finite verb), the head noun of the noun phrase is missing. For this reason, searches for such examples also require annotation of the intermediate versions.

    (5) EO:   Yet it displays surprising strength and resists further compression, a fact that has confounded physicists. ( )

        GT_i: Denno⋆ ch♦ ⋆⋆⋆⋆⋆ zeigt⋆ ♦ sie♦ eine♦ Er< ×⊐< ×⊐< ×⊐♦ erstaun⋆⋆ liche♦ ♦♦♦ [⋆ 36.721]♦♦ ⋆♦♦ und♦ ⋆⋆⋆⋆⋆⋆ widersteht[…].
              yet displays it a surprising and resists
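    Incomplete structures such as the noun phrase without a head noun in (5) could be retrieved from part-of-speech-tagged intermediate versions. The Python sketch below looks for an article followed by one or more attributive adjectives but no noun; the function name and the tagged sequences are assumptions for illustration (the noun Stärke in the second sequence is invented), and the tags follow the STTS.

```python
from typing import List, Tuple

NOUN_TAGS = {"NN", "NE"}  # STTS: common nouns and proper nouns


def headless_noun_phrases(tagged: List[Tuple[str, str]]) -> List[int]:
    """Return start indices of article + adjective sequences lacking a noun."""
    hits = []
    for start, (_, tag) in enumerate(tagged):
        if tag != "ART":
            continue
        end = start + 1
        while end < len(tagged) and tagged[end][1] == "ADJA":
            end += 1
        has_adjective = end > start + 1
        followed_by_noun = end < len(tagged) and tagged[end][1] in NOUN_TAGS
        if has_adjective and not followed_by_noun:
            hits.append(start)
    return hits


# Tokens of the intermediate version in example (5)
incomplete = [
    ("Dennoch", "ADV"), ("zeigt", "VVFIN"), ("sie", "PPER"),
    ("eine", "ART"), ("erstaunliche", "ADJA"),
    ("und", "KON"), ("widersteht", "VVFIN"),
]
# Invented completed variant, for comparison only
complete = [("eine", "ART"), ("erstaunliche", "ADJA"), ("Stärke", "NN")]

print(headless_noun_phrases(incomplete))
print(headless_noun_phrases(complete))
```

    Since German also allows elliptical noun phrases without an overt head, hits of such a query are candidates only and would need to be checked against the process data, e.g. for placeholder characters or long pauses at the position of the missing noun.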

    Examples of the phenomena described in this sub-section can be seen as indications of understanding difficulties or attempts at finding the most suitable translation of the source text unit. The translator is aware of the problems and, rather than taking the time to optimize this section at that point, s/he prefers to continue translating the text, intending to return to this passage later. These examples can be investigated in terms of the translation strategies that are employed by translators. It is possible that the strategies differ not simply from translator to translator, but also depending on linguistic factors such as the grammatical complexity of the original.

    3.2 Alternative target hypotheses

    As mentioned earlier, some tokens found in the intermediate versions may be ambiguous: in these cases, the researcher cannot determine the intention of the translator. Here it is essential not to interpret the data but rather reflect all possible options by annotating several target hypotheses (Lüdeling 2008). In example (6) below, the preposition innerhalb and the indefinite article eines are followed by a longer pause, after which the ending of the article is changed, turning eines into einer. Since articles in German contain morphological endings expressing the grammatical categories of person, number, gender and case, one different letter can affect the grammatical structure of the noun phrase. The researcher can, therefore, formulate a target hypothesis that the original plan for the noun phrase was eines Zylinders ‘a.M cylinder’, where the masculine genitive form of the determiner (matching the masculine noun) was typed, after which the translation plan changed. As a result, the translator deleted the –s at the end of eines, typed the –r instead (yielding the feminine form of the determiner einer) and continued typing to produce the feminine noun Zylindergeometrie ‘a.F cylinder geometry’. Although only the token Zylindergeometrie is evident at this point in the translation process, the existence of the assumed first version is supported by the fact that, at a later stage of the translation process, Zylindergeometrie was altered to Zylinder. It is plausible that the text-editing operations leading to a different grammatical suffix – especially if preceded by a longer pause (a potential indicator of increased cognitive processing, see Dragsted 2005) – do not represent the correction of a simple typing error, but rather reflect a more complex cognitive process of changes to the translation plan. Still, the researcher cannot discount the possibility that the change from –s to –r is in fact a simple correction of a typo. This scenario constitutes another target hypothesis.

    (6) EO:   The researchers crumpled a sheet of thin aluminized Mylar and then placed it inside a cylinder equipped with a piston to crush the sheet. ( )

        GT_i: […] innerhalb♦ eines⋆ ♦ ⋆⋆⋆⋆ < ×⊐< ×⊐r♦ Z⋆ ylindergeometrie […].
                  inside a.M ⋆ ♦ ⋆⋆⋆⋆ a.F cylinder.geometry.F


    Planned annotation of alternative target hypotheses will allow querying for such patterns.¹⁰ These can be analyzed with regard to more or less technical vocabulary, as is the case in example (6) above, verbal or nominal variants, etc. Taking into account a number of explanatory factors, such as register characteristics or process-related variables, a comprehensive picture of such alternations will emerge.
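    One way of picturing target hypotheses as a separate annotation layer is a token record that can carry several hypotheses at once. The Python sketch below is a hypothetical data model, not the annotation scheme of the corpus; the class names, fields and rationale strings are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    intended_form: str
    rationale: str


@dataclass
class ProcessToken:
    typed_form: str
    revised_form: str
    hypotheses: List[Hypothesis] = field(default_factory=list)


def ambiguous(tokens: List[ProcessToken]) -> List[ProcessToken]:
    """Return the tokens that carry more than one target hypothesis."""
    return [token for token in tokens if len(token.hypotheses) > 1]


# The article from example (6), annotated with both readings
article = ProcessToken(
    typed_form="eines",
    revised_form="einer",
    hypotheses=[
        Hypothesis("eines", "change of translation plan: masculine noun abandoned"),
        Hypothesis("einer", "correction of a typing error"),
    ],
)
preposition = ProcessToken(
    "innerhalb", "innerhalb", [Hypothesis("innerhalb", "no revision")]
)

hits = ambiguous([preposition, article])
print([token.typed_form for token in hits])
```

    Keeping all hypotheses side by side, rather than selecting one, corresponds to the requirement that the data should not be interpreted prematurely during corpus compilation.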

    3.3 Incorrect combinations of morphological markings in the final product

    Analyzing the final product in terms of its quality, the researcher may come across grammatical errors, as in (7).

    (7) EO:   The researchers crumpled a sheet of thin aluminized Mylar. ( )

        GT_i: Die Wissenschaftler zerknitterten eine dünne Alufolie
              the scientists      crumpled      a    thin  aluminium.foil

        GT_f: Die Wissenschaftler zerknitterten eine dünnes Blatt Alufolie
              the scientists      crumpled      a    thin   sheet aluminium.foil

    The grammatical rule in German requires that in noun phrases, not only articles and nouns but also premodifying adjectives agree in person, number, gender and case. For instance, in (7) the intermediate version contains the noun phrase eine dünne Alufolie ‘a thin aluminium foil’. The head noun Alufolie ‘aluminium foil’ has the following characteristics: third person singular, feminine gender and accusative case. Therefore, the indefinite article ein ‘a’ and the adjective dünn ‘thin’ are used with the ending -e indicating the same person, number, gender and case. In the final version the corresponding NP has the form eine dünnes Blatt Alufolie ‘a thin sheet of aluminium foil’: here the head noun is no longer Alufolie ‘aluminium foil’ but rather the noun Blatt ‘sheet’, having the same person, number and case but different gender, namely neuter. To agree with the head noun along these four parameters, the ending of the adjective has been changed to -es and the article should have been modified into ein ‘aACC.N’. However, this rule has not been observed.

    10 Since the notion of target hypotheses was originally developed for annotation of learner corpora, it has to be modified to be compatible with the translation process data.


    Considering not only the source and the target texts but also intermediate versions of translation helps understand how the grammatical error has been introduced into the final product: the noun phrase a sheet of thin aluminized Mylar was initially translated to the noun Alufolie ‘aluminium foil’ and then changed during a (later) revision phase into Blatt Alufolie ‘sheet of aluminium foil’, which is more similar to the original than the first attempt. The level of explicitness of the source text is recreated by specifying that exactly one sheet of the foil rather than simply aluminium foil was crumpled. During this revision the morphological ending of the preceding adjective was changed to agree in gender with the new head noun Blatt ‘sheet’, but the ending of the article was not modified accordingly. Since all translations were performed into the native language of test subjects, grammatical inconsistencies are not necessarily due to a lack of grammatical competence. One possible explanation could be that the increased cognitive effort during translation of this noun phrase led to a grammatical error in the final version, possibly by drawing the cognitive resources away from the grammatical article. This hypothesis can be further tested by triangulating the keystroke logging data with such eye-tracking variables as number and length of fixations or pupil dilation, which are typically used in eye-tracking research to operationalize cognitive demands (e.g. Pavlović & Hvelplund 2009).
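    Agreement errors of the kind found in (7) could be retrieved automatically once the tokens carry morphological annotation. The Python sketch below compares each word of a noun phrase with its head noun; the feature dictionaries are hand-written for illustration, and the function name and feature labels are assumptions rather than part of the corpus annotation.

```python
from typing import Dict, List, Tuple

Features = Dict[str, str]


def agreement_errors(
    phrase: List[Features], head_index: int
) -> List[Tuple[str, str, str, str]]:
    """List (form, category, value, head value) for every mismatch with the head."""
    head = phrase[head_index]
    errors = []
    for i, word in enumerate(phrase):
        if i == head_index:
            continue
        for category in ("number", "gender", "case"):
            if word[category] != head[category]:
                errors.append((word["form"], category, word[category], head[category]))
    return errors


# Hand-annotated noun phrases from example (7)
intermediate_np = [
    {"form": "eine", "number": "sg", "gender": "fem", "case": "acc"},
    {"form": "dünne", "number": "sg", "gender": "fem", "case": "acc"},
    {"form": "Alufolie", "number": "sg", "gender": "fem", "case": "acc"},
]
final_np = [
    {"form": "eine", "number": "sg", "gender": "fem", "case": "acc"},
    {"form": "dünnes", "number": "sg", "gender": "neut", "case": "acc"},
    {"form": "Blatt", "number": "sg", "gender": "neut", "case": "acc"},
]

print(agreement_errors(intermediate_np, head_index=2))
print(agreement_errors(final_np, head_index=2))
```

    Running the same check on every intermediate version would show at which revision step the mismatch was introduced, which is the information needed to relate the error to indicators of cognitive effort.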

    3.4 Substitutions of word classes

    Translation studies research has a long tradition of studying the phenomenon of translation shifts, i.e. various changes introduced during the translation process and visible in the translation product. A parallel corpus of aligned originals and translations allows a systematic analysis of shifts between translation units of various sizes and on different levels of linguistic analysis. For instance, a recent corpus-based study has concentrated on shifts between different word classes (Čulo et al. 2008), the so-called “transpositions” (Vinay & Darbelnet 1995: 36). Example (8) illustrates a change from the verb require in the English original to the adjective erforderlich ‘necessary’ in the final version of the German translation.

    (8) EO:   Crumpling a sheet of paper seems simple and doesn’t require much effort ( )

        GT_i: Ein Blatt Papier zu zerknüllen, scheint eine einfache Sache zu sein und benötigt nicht viel Kraftaufwand.
              a   sheet paper  to crumple     seems   a    simple   thing to be   and requires not   much effort

        GT_f: Ein Blatt Papier zu zerknüllen, scheint eine einfache Sache zu sein, und scheinbar  ist dazu     auch nicht viel Kraftaufwand erforderlich
              a   sheet paper  to crumple     seems   a    simple   thing to be    and apparently is  for.that also not   much effort       necessary

    It is possible to extract this translation shift from a verb in the source text to an adjective in the target text using an available English-German parallel corpus such as CroCo (Hansen-Schirra & Steiner 2012). However, this kind of product-oriented corpus does not contain the information on what happened to the original verb in the intermediate translation versions. As is shown in (8), the translation shift was not introduced until a later revision of the pattern: the verb benötigen ‘require’, initially used as a translation of the English verb, was replaced at a later stage by an adjective integrated into a different clause-level structure. The opposite pattern is also possible, in which a translation shift present in the intermediate version disappears during further editing of the translation. Thus, a keystroke logged corpus enables researchers to extract shifts present at different stages of the translation development and to compare, for instance, the two possible revision patterns involving changes of word classes.
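    The comparison of word classes across translation stages could be sketched in Python as follows. The coarse word class labels, the version labels and both function names are assumptions for illustration; in the corpus the comparison would run over the alignment units and part of speech tags described in §2.

```python
from typing import Dict, List, Tuple


def shifts_by_version(
    source_class: str, versions: List[Tuple[str, str]]
) -> Dict[str, bool]:
    """Map each version label to True if its word class differs from the source."""
    return {label: target_class != source_class for label, target_class in versions}


def revision_pattern(shifts: Dict[str, bool]) -> str:
    """Classify the development from the first to the last version."""
    values = list(shifts.values())
    first, last = values[0], values[-1]
    if not first and last:
        return "shift introduced during revision"
    if first and not last:
        return "shift removed during revision"
    return "shift present throughout" if last else "no shift"


# Example (8): require (verb) -> benötigt (verb) -> erforderlich (adjective)
shifts = shifts_by_version("verb", [("GT_i", "verb"), ("GT_f", "adjective")])
pattern = revision_pattern(shifts)
print(shifts, pattern)
```

    Counting the two revision patterns over all alignment units would give the frequencies needed to compare how often shifts are introduced and how often they are undone during revision.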

    Previous studies have suggested that translation involves a process of understanding during which the semantic content of the source text has to be unpacked by the translator. In other words, it is assumed that certain highly dense grammatical structures are typically understood in terms of grammatically less complex patterns. A number of factors influencing translations, such as contrastive differences, register characteristics or other translation process-dependent variables (e.g. time pressure), might lead to changes with respect to the level of grammatical complexity of the corresponding target text unit, depending on how information is repacked by the translator (Steiner 2001; Hansen-Schirra & Steiner 2012). Shifts of grammatical complexity have been operationalized as shifts of word classes. Thus, for example, the same semantic information can be expressed either as a clause or as a noun phrase; in the latter case the described event is presented in a more compressed manner, making certain aspects implicit. By looking at shifts between verbs and nouns, such changes of complexity can be analyzed further. The addition of intermediate versions allows the investigation of how often and under which circumstances the level of grammatical complexity is changed during the process of translation.


    (9) EO:   Once a paper ball is scrunched, it is more than 75 percent air. ( )

        GT_i: nachdem der Papierball zusammengedrückt wurde besteht  er zu mehr als  75 Prozent aus Luft.
              after   the paper.ball together.pressed was   consists it to more than 75 percent of  air

        GT_f: Ein zusammengedrückter Papierball besteht  zu mehr als  75 Prozent aus Luft
              a   together.pressed   paper.ball consists to more than 75 percent of  air

    In (9), the professional translator has initially kept the structure of the original sentence: a temporal adverbial expressed through a subordinate clause is present in both the source text and the intermediate version of the target text. However, during the final revision the clause was turned into an NP by using a strategy of premodification typical for German, namely a reduced participle clause. This compression of semantic information results in a more complex grammatical structure in the German translation than in the English original. It has been suggested that one of the factors leading to the increase of grammatical complexity could be high translation competence (Hansen-Schirra & Steiner 2012: 260). To test this hypothesis, the frequency of similar examples in translations by professional translators and physicists could be compared and submitted to statistical tests.

    3.5 Lexical substitutions

    As mentioned in §2.2, the alignment units defined between corresponding words, phrases or chunks in the source text and the target text function as reference points to which the process tokens are linked during the pre-processing of the data. Using these reference links a researcher can trace the history of the word. While the previous section discussed an example in which a verb in the intermediate version is linked to an adjective in the final target text, a revision does not necessarily affect the grammatical structure of a sentence. Thus, as is shown in example (10), the changes could also be at a lower level of complexity: in this sentence only the noun slot is repeatedly modified before the translator found the solution that s/he considered to be most suitable. This and similar instances found in the corpus are interpreted in terms of register characteristics or stylistic reasons (e.g. avoidance of repetitions).


    (10) EO:    is another matter entirely ( )

         GT_i1: so ist dies eine völlig  andere    Sache
                so is  this a    totally different thing

         GT_i2: so ist dies eine völlig  andere    Angelegenheit
                so is  this a    totally different matter

         GT_f:  so ist dies eine völlig  andere    Frage
                so is  this a    totally different question

    This particular example illustrates that the alignment of process tokens involves a certain level of interpretation on the part of the researcher: according to Kollberg & Severinson-Eklundh (2001: 92), “if a writer deletes a word, and subsequently inserts another word at the same position in the text, one cannot deduce that the writer intended the second word to replace the first (even if this is often the case)”. In other words, the authors indicate that though it might seem obvious to assume that the writer/translator meant to substitute a certain word, this is still an interpretation by the researcher and, therefore, does not belong to the formal level of data description. The functional analysis should be left to a later research stage (Kollberg & Severinson-Eklundh 2001: 92–93). The distinction between formal and functional data pre-processing can be compared to formal and functional types of annotation found in the corpora. For instance, on the formal level, sentences can be parsed into individual phrases, whereas an additional functional annotation would involve enrichment of these units with grammatical functions. The present study takes the position that both types of pre-processing and annotation are required. This combination of formal and functional levels facilitates different types of analyses. Thus, it is possible to analyze the data in a more qualitative manner by looking at individual sentences or texts; in this case the formal pre-processing of the keystroke logging data might be enough. At the same time, the queries discussed in this article are designed to conduct quantitative investigations, which certainly benefit from additional functional types of pre-processing and annotation. As long as all of the decisions involved in these processes are made transparent, the researcher can assess which information stored in the corpus is required for each individual case.
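    The separation of the two levels can be illustrated with a minimal sketch. It is not the corpus format described in this chapter: the class names, the slot index and the order of the logged events are assumptions made for illustration, and only the three nouns are taken from example (10). Formal events record what was typed or deleted; the pairing of a deletion with a later insertion is stored as a separate, explicitly labelled interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevisionEvent:
    """Formal level: what the keystroke log shows, without interpretation."""
    kind: str       # "insert" or "delete"
    position: int   # token slot in the unfolding target text (hypothetical index)
    token: str
    order: int      # sequence number in the log


@dataclass
class FunctionalAnnotation:
    """Functional level: a researcher's interpretation, stored apart from the log."""
    label: str
    event_orders: tuple
    rationale: str


def slot_history(events, position):
    """Return the tokens successively inserted at one slot, in log order."""
    ordered = sorted(events, key=lambda e: e.order)
    return [e.token for e in ordered
            if e.position == position and e.kind == "insert"]


def propose_substitutions(events):
    """Pair each deletion with the next event at the same slot if it is an
    insertion. The result is a hypothesis to be confirmed by the researcher,
    not a fact read off the log."""
    ordered = sorted(events, key=lambda e: e.order)
    proposals = []
    for i, event in enumerate(ordered):
        if event.kind != "delete":
            continue
        for later in ordered[i + 1:]:
            if later.position == event.position:
                if later.kind == "insert":
                    proposals.append(FunctionalAnnotation(
                        label="substitution",
                        event_orders=(event.order, later.order),
                        rationale=(f"'{event.token}' deleted, '{later.token}' "
                                   f"inserted at slot {event.position}"),
                    ))
                break
    return proposals


# Noun slot of example (10); the event order is assumed for illustration.
EXAMPLE_10_EVENTS = [
    RevisionEvent("insert", 6, "Sache", 1),
    RevisionEvent("delete", 6, "Sache", 2),
    RevisionEvent("insert", 6, "Angelegenheit", 3),
    RevisionEvent("delete", 6, "Angelegenheit", 4),
    RevisionEvent("insert", 6, "Frage", 5),
]

if __name__ == "__main__":
    print(slot_history(EXAMPLE_10_EVENTS, 6))
    for proposal in propose_substitutions(EXAMPLE_10_EVENTS):
        print(proposal.label, proposal.event_orders, proposal.rationale)
```

    Because the proposals live in their own objects, a qualitative analysis can work from the formal events alone, while a quantitative query can draw on the functional layer once its decisions have been documented.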

    4 Conclusion and outlook

    In this paper we have described the compilation and annotation of a keystroke logged corpus containing original and translated texts along with the process texts, with the aim of tracing the development of the linguistic phenomena found in the final product through the intermediate versions of the unfolding text during the translation process. This requires complex alignment procedures on several levels of analysis together with multilayer annotation to include information such as target hypotheses and typical translation features (e.g. grammatical shifts). The corpus will allow us to query the data in order to discover consistencies or compare intermediate versions, and to understand more about the translation process; thus, while it is particularly the quantitative research into the translation process that will be facilitated through this type of corpus, the interpretation of these quantitative findings requires taking a more qualitative perspective on the data.

    The next steps in the development of the corpus are undertaken within the work of the Boost Fund project e-cosmos. The goal of e-cosmos is to develop a transparent and user-friendly environment for the quantitative analysis of complex, multimodal humanities data, and at the same time allow researchers to interact with the data, from the collection stage through (semi-automatic) annotation to the application of a wide range of statistical tests. This approach has two immediate consequences for the translation data: 1) the data outputs and formats generated by the parsers and other tools selected for work with the data will be compatible; and 2) the platform will enable the analysis of the keystroke data together with other data streams such as the eye-tracking data, thereby allowing more fine-grained quantitative analyses. The combined analysis of the data on translation process and product will contribute to a comprehensive understanding of the various factors playing a role in translation.

    Appendix

    Shortened original

    Crumpling a sheet of paper seems simple enough and certainly doesn’t require much effort, but explaining why the resulting crinkled ball behaves the way it does is another matter entirely. Once scrunched, a paper ball is more than 75 percent air yet displays surprising strength and resists further compression, a fact that has confounded physicists. A report in the February 18 issue of Physical Review Letters, though, describes one aspect of the behavior of crumpled sheets: how their size changes in relation to the force they withstand.

    A crushed thin sheet is essentially a mass of conical points connected by curved ridges, which store energy. When the sheet is further compressed, these ridges collapse and smaller ones form, increasing the amount of stored energy within the wad. Sidney Nagel and colleagues of the University of Chicago modeled how the force required to compress the ball relates to its size. After crumpling a sheet of thin aluminized Mylar, the researchers placed it inside a cylinder equipped with a piston to crush the crumpled sheet. Instead of collapsing to a final fixed size as expected, the team writes, the height of the crushed ball continued to decrease, even three weeks after the weight was applied […].

    Graham, Sarah. 2002. A New Report Explains the Physics of Crumpled Paper. Scientific American Online. http://www.scientificamerican.com/article.cfm?id=a-new-report-explains-the.

    Source text 1

    Crumpling a sheet of paper seems simple and doesn’t require much effort, but explaining why the crumpled ball behaves the way it does is another matter entirely. A scrunched paper ball is more than 75 percent air. Yet it displays surprising strength and resistance to further compression, a fact that has confounded physicists. A report in Physical Review Letters, though, describes one aspect of the behavior of crumpled sheets: how their size changes in relation to the force they withstand. A crushed thin sheet is essentially a mass of conical points connected by curved energy-storing ridges. When the sheet is further compressed, these ridges collapse and smaller ones form, increasing the amount of stored energy within the wad. Scientists at the University of Chicago modeled how the force required to compress the ball relates to its size. After the crumpling of a sheet of thin aluminized Mylar, the researchers placed it inside a cylinder. They equipped the cylinder with a piston to crush the sheet. Instead of collapsing to a final fixed size, the height of the crushed ball continued to decrease, even three weeks after the application of weight.

    Source text 2

    Crumpling a sheet of paper seems simple and doesn’t require much effort, but explaining the crumpled ball’s behavior is another matter entirely. Once a paper ball is scrunched, it is more than 75 percent air. Yet it displays surprising strength and resists further compression, a fact that has confounded physicists. A report in Physical Review Letters, though, describes one aspect of the behavior of crumpled sheets: changes in their size in relation to the force they withstand.

    A crushed thin sheet is essentially a mass of conical points connected by curved ridges, which store energy. In the event of further compression of the sheet these ridges collapse and smaller ones form, increasing the amount of stored energy within the wad. Scientists at the University of Chicago modeled the relation between compression force and ball size. The researchers crumpled a sheet of thin aluminized Mylar and then placed it inside a cylinder equipped with a piston to crush the sheet. Instead of collapsing to a final fixed size, the height of the crushed ball continued to decrease, even three weeks after the researchers had applied the weight.

    References

    Alves, Fabio (ed.). 2003. Triangulating translation: Perspectives in process oriented research. Amsterdam: John Benjamins.

    Alves, Fabio & Célia Magalhaes. 2004. Using small corpora to tap and map the process-product interface in translation. TradTerm 10. 179–211.

    Alves, Fabio & Daniel Couto Vale. 2009. Probing the unit of translation in time: Aspects of the design and development of a web application for storing, annotating, and querying translation process data. Across Languages and Cultures 10(2). 251–273.

    Alves, Fabio & Daniel Couto Vale. 2011. On drafting and revision in translation: A corpus linguistics oriented analysis of translation process data. Translation: Computation, Corpora, Cognition 1. 105–122.

    Baker, Mona. 1996. Corpus-based translation studies: The challenges that lie ahead. In Harold Somers (ed.), Terminology, LSP and translation: Studies in language engineering in honour of Juan C. Sager, 175–186. Amsterdam: John Benjamins.

    Becher, Viktor. 2010. Abandoning the notion of “translation-inherent” explicitation: Against a dogma of translation studies. Across Languages and Cultures 11(1). 1–28.

    Carl, Michael. 2009. Triangulating product and process data: Quantifying alignment units with keystroke data. Copenhagen Studies in Language 38. 225–247.

    Carl, Michael. 2012. The CRITT TPR-DB 1.0: A database for empirical human translation process research. In Sharon O’Brien, Michel Simard & Lucia Specia (eds.), Proceedings of the AMTA 2012 workshop on post-editing technology and practice (WPTP 2012), 9–18. Stroudsburg: Association for Machine Translation in the Americas (AMTA).

    Carl, Michael, Barbara Dragsted & Arnt Lykke Jakobsen. 2011. A taxonomy of human translation styles. Translation Journal 16(2). http://translationjournal.net/journal/56taxonomy.htm.


    Carl, Michael & Arnt Lykke Jakobsen. 2009. Objectives for a query language for user-activity data. In 6th international natural language processing and cognitive science workshop. Milano.

    Dragsted, Barbara. 2005. Segmentation in translation: Differences across levels of expertise and difficulty. Target 17(1). 49–70.

    Göpferich, Susanne. 2008. Translationsprozessforschung: Stand - Methoden - Perspektiven. Tübingen: Narr.

    Halliday, Michael Alexander Kirkwood & Christian Matthiessen. 2014. Halliday’s introduction to functional grammar. 4th edition. London: Routledge.

    Hansen-Schirra, Silvia, Stella Neumann & Erich Steiner. 2007. Cohesive explicitness and explicitation in an English-German translation corpus. Languages in contrast: International journal for contrastive linguistics 7(2). 241–265.

    Hansen-Schirra, Silvia, Stella Neumann & Mihaela Vela. 2006. Multi-dimensional annotation and alignment in an English-German translation corpus. In Proceedings of the 5th workshop on NLP and XML (NLPXML-2006): Multi-dimensional markup in Natural Language Processing, 35–42. Trento: EACL.

    Hansen-Schirra, Silvia & Erich Steiner. 2012. Towards a typology of translation properties. In Silvia Hansen-Schirra, Stella Neumann & Erich Steiner (eds.), Cross-linguistic corpora for the study of translations: Insights from the language pair English-German, 255–280. Berlin: de Gruyter.

    Hvelplund, Kristian Tangsgaard & Michael Carl. 2012. User activity metadata for reading, writing and translation research. In Victoria Arranz, Daan Broeder, Bertrand Gaiffe, Maria Gavrilidou, Monica Monachini & Thorsten Trippel (eds.), Proceedings of the eighth international conference on language resources and evaluation, 55–59. Paris: ELRA.

    Jakobsen, Arnt Lykke. 2005. Instances of peak performance in translation. Lebende Sprachen 3. 111–116.

    Jakobsen, Arnt Lykke & Lasse Schou. 1999. Translog documentation. In Gyde Hansen (ed.), Probing the process in translation: Methods and results, 9–20. Frederiksberg: Samfundslitteratur.

    Klaudy, Kinga. 1998. Explicitation. In Mona Baker (ed.), Routledge encyclopedia of translation studies, 80–84. London: Routledge.

    Kollberg, Py & Kerstin Severinson-Eklundh. 2001. Studying writers’ revising patterns with S-notation analysis. In Thierry Olive & Michael Levy (eds.), Contemporary tools and techniques for studying writing, 89–93. Kluwer Academic Publishers.

    Laviosa, Sara. 2002. Corpus-based translation studies: Theory, findings, applications. Amsterdam: Rodopi.


    Leijten, Mariëlle, Lieve Macken, Veronique Hoste, Eric Van Horenbeeck & Luuk Van Waes. 2012. From character to word level: Enabling the linguistic analyses of Inputlog process data. In Proceedings of the second workshop on computational linguistics and writing, 1–8. Avignon: Association for Computational Linguistics.

    Lüdeling, Anke. 2008. Mehrdeutigkeiten und Kategorisierung: Probleme bei der Annotation von Lernerkorpora. In Maik Walter & Patrick Grommes (eds.), Fortgeschrittene Lernervarietäten, 119–140. Tübingen: Niemeyer.

    Macken, Lieve, Veronique Hoste, Mariëlle Leijten & Luuk Van Waes. 2012. From keystrokes to annotated process data: Enriching the output of Inputlog with linguistic information. In Proceedings of the international conference on language resources and evaluation, 2224–2229. Paris: ELRA.

    McEnery, Tony, Richard Xiao & Yukio Tono. 2006. Corpus-based language studies: An advanced resource book. London: Routledge.

    Pagano, Adriana & Igor Silva. 2008. Domain knowledge in translation task execution: Insights from academic researchers performing as translators. Shanghai: XVIII FIT World Congress.

    Pavlović, Natasa & Kristian Tangsgaard Hvelplund. 2009. Eye tracking translation directionality. In Anthony Pym & Alexander Perekrestenko (eds.), Translation research projects 2, 93–109. Tarragona: Intercultural Studies Group.

    Schiller, Anne, Simone Teufel, Christine Stöckert & Christine Thielen. 1999. Guidelines für das Tagging Deutscher Textcorpora mit STTS. Universität Stuttgart, Universität Tübingen.

    Steiner, Erich. 2001. Translations English-German: Investigating the relative importance of systemic contrasts and of the text type ‘translation’. SPRIKreports 7. 1–49.

    Taverniers, Miriam. 2003. Grammatical metaphor in SFL: A historiography of the introduction and initial study of the concept. In Anne-Marie Simon-Vandenbergen, Miriam Taverniers & Louise Ravelli (eds.), Grammatical metaphor: Views from systemic functional linguistics, 5–33. Amsterdam: John Benjamins.

    Teich, Elke. 2003. Cross-linguistic variation in system and text: A methodology for the investigation of translations and comparable texts. Berlin: de Gruyter.

    Tobii Technology. 2008. Tobii Studio 1.X user manual. http://www.tobii.com/Global/Analysis/Downloads/User_Manuals_and_Guides/Tobii_Studio1.X_UserManual.pdf.

    Vinay, Jean-Paul & Jean Darbelnet. 1995. Comparative stylistics of French and English: A methodology for translation. Amsterdam: John Benjamins.


    Čulo, Oliver, Silvia Hansen-Schirra, Stella Neumann & Mihaela Vela. 2008. Empirical studies on language contrast using the English-German comparable and parallel CroCo Corpus. In Proceedings of the sixth international conference on language resources and evaluation, 47–51. Paris: ELRA.

    Čulo, Oliver, Silvia Hansen-Schirra, Karin Maksymski & Stella Neumann. 2012. Heuristic examination of translation shifts. In Silvia Hansen-Schirra, Stella Neumann & Erich Steiner (eds.), Cross-linguistic corpora for the study of translations: Insights from the language pair English-German, 255–280. Berlin: de Gruyter.


    Chapter 3

    Racism goes to the movies: A corpus-driven study of cross-linguistic racist discourse annotation and translation analysis

    Effie Mouka, Ioannis E. Saridakis and Angeliki Fotopoulou

    This paper traces register shifts (Halliday & Hasan 1976: 22; Hatim & Mason 1997) between source-texts (English) and target-texts (Greek and Spanish) in instances of racist discourse in films. It presents preliminary, as yet non-exhaustive, findings and aims to ultimately formulate explanatory hypotheses concerning the emerging norms. Our methodological approach is placed in the framework of Descriptive Translation Studies (Toury 2012; Chesterman 2008) and in the school of Critical Discourse Analysis (Fairclough 1985; 1992), relying on Appraisal Theory (Martin & White 2005) to provide and analyse a taxonomy of the racism-related utterances examined.

    1 Introduction

    Technological advances in Corpus Linguistics and tools for processing and compiling linguistic corpora open up new ways of exploiting textual and research material. In a descriptive approach, textual and pragmatic annotation can largely facilitate the systematic lexico-grammatical analysis of linguistic resources (see McEnery & Hardie 2012: 29–31; Zanettin 2012: 76–79). This also holds true for translation corpora, with a particular focus on the descriptive examination of translation strategies and norms (Zanettin 2012: 78–96).

    Effie Mouka, Ioannis E. Saridakis & Angeliki Fotopoulou. 2014. Racism goes to the movies: a corpus-driven study of cross-linguistic racist discourse annotation and translation analysis. In Claudio Fantinuoli & Federico Zanettin (eds.), New directions in corpus-based translation studies, 31–61. Berlin: Language Science Press


    This paper partly presents the first author’s¹ ongoing PhD research, which aims to examine, from a descriptive viewpoint and by using corpus annotation, the translational norms of the socio-culturally marked discourse of racism, and the shifts observed during the discourse transfer from a source language into two target languages.² This paper focuses on the applied methodology and on the findings collected so far, and discusses problems and impediments observed during corpus analysis.

    Racism, as manifested in discourse, is a constantly open issue that merits research (van Dijk 1993; Reisigl & Wodak 2001) and is clearly on the agenda of (critical) discourse analysis in light of the European social, political and economic backdrop. Realistic films on racism represent discourses emanating from racist stances, while cinema, as a medium widely accessible to the public, communicates ideas apart from reflecting society. On the other hand, subtitles are considered to be among the most read translations and text types in countries with a subtitling tradition (Gottlieb 1997: 153 in Pedersen 2011: 125). To this end, the analysis of subtitles in racism-related films, rather than in films with sporadic racist utterances, seems to be better suited to research on the translation of racist-oriented discourse.

    §2 of this article outlines the aims and scope of our research, and introduces the basic concepts and theoretical tenets used in this study. First, we introduce the principal discourse-related definitions of racism, together with a discussion of how racist discourse is handled by Critical Discourse Analysis. Subsequently, we provide a brief overview of Appraisal Theory, first developed by Martin & White (2005). This theory has been used extensively in sentiment analysis. Finally, we consider the phenomenon of register shifts in subtitles.

    §3 presents our corpus-driven methodology and the corpus tools used in our research. §4 and §5 present and exemplify the implementation of our methodology and outline the principal findings with regard to context-bound register shifts in translation.

    1 The second author, I.E. Saridakis, is the PhD research director. Dr. A. Fotopoulou also participates in the project’s consultative committee. The authors express their gratitude to V. Giouli, scientific associate at the Institute for Language and Speech Processing (ILSP), for her support in initially developing and implementing the sentiment annotation scheme described in this paper, and in adopting the corpus metadata handling model used in our method.

    2 Batsalia & Sella-Mazi (2010: 120–121) define “shifts” as subsuming all changes that may appear during the translation process, on a semantic, lexical, morphological, syntactic, pragmatic, and/or stylistic level. The “translation shift” hypothesis is a useful and powerful descriptive device for approaching hermeneutically the phenomenon of differentiation of the target text from its source text, without stigmatising it.


    2 Research aims and scope

    The focus of our work is to examine racist discourse from a translation perspective, identifying its structure, its textual deployment, and its elements and traits on the basis of lexicogrammatical evidence and using a classificatory device. In other words, our aim is to examine how racist attitudes can be classified in spoken film discourse, linking this classification to the context of the utterances from which the text chunks have been drawn. This classification and analysis is based on a model adapted from Appraisal Theory (Martin & White 2005), using postulates derived from Critical Discourse Analysis (Reisigl & Wodak 2001; van Dijk 2000a; 2000b; 2002). Finally, by linking the examined utterances to their translations in two target languages, register shifts can be analysed on the basis of previous research (Hatim & Mason 1997; Mason 2001; Pettit 2005; Mubenga 2009; Munday 2012). This study is based on corpus resources and methodologies. We first constructed an ad hoc corpus and annotated it with a purpose-built annotation scheme, then set out to identify register shifts in the translation of racist utterances. This approach is exemplified by the preliminary findings reported in this article.

    2.1 Background. Racism and racist discourse

    The phenomenon of racism is fuzzy and evasive, and the term is often used rather vaguely, even to describe discriminatory phenomena other than those related to the concept of “race”. Racism subsumes everyday practices and behaviours, both verbal and non-verbal, stereotyping, discriminatory practices, institutional systemic policies, or even acts of racial segregation and genocides (Giddens 2009: 637–653).

    How racism is defined depends, in the final analysis, on the scope of individual research: for example, literature lists distinctive definitions such as “institutional” or “systemic” racism, to designate racism that is present in societal structures, such as the education