Sub-segment matching - Demystified Angelika Zerfaß Daniel Zielinski



2

Agenda

• Definition of a sub-segment
  – As a text element
  – As placeable/automatically substitutable elements
  – As terminology
  – In concordance searching (source and target language)
  – In source/target language matching (sub-segment translation)

• Technologies involved

3

Definition of sub-segments

• As a text element…
  – Separately translatable elements that are embedded in or attached to other elements (segments) in a text
    • Footnotes
    • Index entries
  – Text within elements like tags that are individual units of meaning
    • Text within translatable attributes of tags

4

Footnotes inside and at the end

Across 5

The footnote position is identified by a placeholder; the actual text of the footnote is a separate segment. Footnotes that themselves contain several segments are treated as running text: each segment is translated individually.

5

Footnotes inside and at the end

memoQ 4.5

The footnote position is identified by a placeholder; the actual text of the footnote is a separate segment. Footnotes that themselves contain several segments are treated as running text: each segment is translated individually.

7

Footnotes inside and at the end

SDL Trados 2009

The footnote position is identified by a placeholder; the actual text of the footnote is a separate segment. Footnotes that themselves contain several segments are treated as running text: each segment is translated individually.

8

Index entries examples SDL Trados 2009 and memoQ

memoQ 4.5: The start and end of the index entry are identified by placeholders; the actual text of the index entry has to be translated inside the segment.

SDL Trados 2009 Studio: The index entry position is identified by a placeholder; the actual text of the index entry is a separate segment.

9

Index entries examples: Across 5

The index entry position is identified by a placeholder; the actual text of the index entry is a separate segment, to be translated in a separate window.

10

Text inside a tag (attribute)

The text within the tag has to be translated inside the segment.

Across 5

Title of a graphic in an HTML file (pop-up during mouse-over)

11

Text inside a tag (attribute)

SDL Trados 2007

The text within the tag has to be translated inside the segment.

The text within the tag is translated as a separate segment.

memoQ 4.5

12

As placeables

• Automatically substitutable elements
  – Dates
  – Times
  – Numbers
  – Measurements
  – Variables
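To illustrate automatic substitution, here is a minimal sketch that adapts a stored TM translation when the new source segment differs from the stored one only in its numbers. It is not how any of the tools discussed here implements placeables; the function name `substitute_numbers` and the number pattern are assumptions made for this example, and real tools also handle locale-specific formats for dates, times, and measurements.

```python
import re

# Assumed pattern: integers and simple decimals such as 4, 12.5 or 1,000
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def substitute_numbers(tm_source, tm_target, new_source):
    """Adapt tm_target to new_source if the sources differ only in numbers.

    Returns the adapted target, or None when substitution is not safe.
    """
    # The text surrounding the numbers must be identical in both sources
    if NUMBER.sub("#", tm_source) != NUMBER.sub("#", new_source):
        return None
    old_nums = NUMBER.findall(tm_source)
    new_nums = NUMBER.findall(new_source)
    # Refuse ambiguous cases: a repeated number could map to two new values
    if len(set(old_nums)) != len(old_nums):
        return None
    # Every number of the stored source must be present in the stored target
    target_nums = set(NUMBER.findall(tm_target))
    if any(n not in target_nums for n in old_nums):
        return None
    mapping = dict(zip(old_nums, new_nums))
    return NUMBER.sub(lambda m: mapping.get(m.group(0), m.group(0)), tm_target)

adapted = substitute_numbers(
    "Tighten the 4 screws.",
    "Ziehen Sie die 4 Schrauben an.",
    "Tighten the 6 screws.",
)
print(adapted)
```

The sketch deliberately returns `None` rather than guessing whenever the mapping between old and new numbers is unclear, which mirrors the idea that a placeable is only substituted when the rest of the segment is unchanged.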

13

Substitution examples

• Across 5

• memoQ 4.5

15

Substitution examples

• SDL Trados 2009

16

As terminology

• Terminology search in one or more term bases

• Search will also find fuzzy matches

• Terms and translations are shown in a separate window

18

Term recognition examples

• memoQ 4.5: Allowed (blue) and forbidden (black) terms are listed; more term information is shown at the bottom

20

Term recognition examples

• SDL Trados 2009

21

Sub-segments in source language - Concordance

• Whenever the TM system does not find a match for the whole segment, a search for parts of the segment can be initiated: the so-called concordance search

• A concordance is a sorted list of words and phrases, associated with the sentences they appear in.

• Concordance search will often also find similar fragments (fuzzy search)

• Results appear highlighted in a list of all matching sentence pairs from the TM
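The idea of a fuzzy concordance search can be sketched as follows. This is an illustration only, not the algorithm of any particular tool: it slides a word window of the query's length over each TM source segment and scores it with `difflib.SequenceMatcher` from the Python standard library. The function name `concordance`, the threshold of 0.8, and the sample TM entries are assumptions.

```python
from difflib import SequenceMatcher

def concordance(query, tm, threshold=0.8):
    """Return (score, source, target) tuples for TM entries whose source
    contains a word window similar to the query, best matches first."""
    q_words = query.lower().split()
    n = len(q_words)
    q = " ".join(q_words)
    hits = []
    for source, target in tm:
        words = source.lower().split()
        best = 0.0
        # Compare the query against every window of n consecutive words
        for i in range(max(1, len(words) - n + 1)):
            window = " ".join(words[i:i + n])
            best = max(best, SequenceMatcher(None, q, window).ratio())
        if best >= threshold:
            hits.append((best, source, target))
    return sorted(hits, key=lambda h: h[0], reverse=True)

# Hypothetical translation memory for demonstration
tm = [
    ("Press the power button to start the device.",
     "Drücken Sie den Netzschalter, um das Gerät zu starten."),
    ("Press the power buttons firmly.",
     "Drücken Sie die Netzschalter fest."),
    ("Close the cover.", "Schließen Sie die Abdeckung."),
]

for score, source, target in concordance("power button", tm):
    print(f"{score:.2f}  {source}  ->  {target}")
```

Note that the result is a list of whole sentence pairs: the translator still has to locate the translation of the fragment in the target sentence, which is the limitation that automatic sub-segment translation tries to address.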

22

Concordance examples

• Across

23

Concordance examples

• Déjà Vu X

24

Concordance examples

• memoQ 4.5

26

Concordance examples

• SDL Trados 2009

27

Concordance examples: target

• Across

28

Concordance examples: target

• SDL Trados 2009

29

Automatic concordance search

• Concordance searching is often initiated manually, when you know that part of the segment has been translated before

• Some tools also offer the automation of the concordance search in the source language so that the translator gets the information that there are segment pairs in the TM that contain a certain sub-segment in the source language, without having to look for them explicitly

30

Automatic sub-segment translation

• Concordance search can speed up the translation process
  – by showing the translator sentence pairs that contain the term or phrase from the concordance search

• It would be even better if the system could find the translation for that term or phrase and offer it to the translator automatically

• As linguistic analysis would be too difficult to implement for each and every language pair, most tools today work with a statistics-based approach.

31

Automatic sub-segment translation

Fragment Assembly

• Segments can be assembled out of known parts, such as terms from the term base or smaller segments in the TM.

• The translations of those sub-segments are embedded into the source language segment.
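A minimal sketch of this embedding step, assuming the known fragments are available as a simple source-to-target dictionary (in a real tool they would come from the term base and the TM). The function name `assemble` and the sample data are hypothetical; longer fragments are tried first so that a multi-word fragment wins over a shorter one contained in it.

```python
import re

def assemble(source, fragments):
    """Replace known source fragments with their translations and
    leave the rest of the source segment untouched."""
    if not fragments:
        return source
    # Longest fragments first, so "power button" is preferred over "button"
    keys = sorted(fragments, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in fragments.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], source)

# Hypothetical fragments, e.g. collected from a term base and a TM
fragments = {
    "power button": "Netzschalter",
    "button": "Taste",
    "device": "Gerät",
}

print(assemble("Press the power button on the device.", fragments))
```

The output is a mixed-language segment: known parts appear in the target language, and everything else stays in the source language for the translator to complete.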

32

Automatic sub-segment translation

Database of fragment pairs

• The tool creates a list of sub-segments in the source language that appear frequently.

• Then, using a statistical approach also known from terminology extraction, it searches for recurring fragments in the target-language parts of the segment pairs and thus selects possible translations of the fragments.

• This list is a third database besides TM and term base
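One simple way to approximate such a statistical pairing is to count how often source and target word sequences co-occur in the same segment pair and to score each candidate with the Dice coefficient. The sketch below is a toy under that assumption and does not reproduce the method of any specific tool; the function names, the `min_count` cut-off, and the sample TM are made up for illustration.

```python
from collections import Counter
from itertools import product

def ngrams(text, max_n=2):
    """All word sequences of length 1..max_n in the text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}

def extract_fragment_pairs(tm, min_count=2, max_n=2):
    """Map each frequent source fragment to its best-scoring target
    fragment, returned as (target_fragment, dice_score)."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for source, target in tm:
        s, t = ngrams(source, max_n), ngrams(target, max_n)
        src_count.update(s)
        tgt_count.update(t)
        pair_count.update(product(s, t))  # co-occurrence within one pair
    best = {}
    for (s, t), c in pair_count.items():
        if src_count[s] < min_count:
            continue  # keep only source fragments that appear frequently
        dice = 2 * c / (src_count[s] + tgt_count[t])
        if s not in best or dice > best[s][1]:
            best[s] = (t, dice)
    return best

# Hypothetical translation memory
tm = [
    ("open the file", "Datei öffnen"),
    ("save the file", "Datei speichern"),
    ("open the window", "Fenster öffnen"),
    ("close the window", "Fenster schließen"),
]

pairs = extract_fragment_pairs(tm)
for fragment in ("file", "open", "window"):
    print(fragment, "->", pairs[fragment][0])
```

Function words such as "the" get unreliable pairings with this scoring, which is one reason practical extraction needs far more data and additional filtering than this sketch shows.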

33

Automatic sub-segment translation examples

• Across 5: Auto-completion. Auto-text suggestions out of a lexicon created with crossMining (statistical extraction)

34

Automatic sub-segment translation: Examples

• Déjà Vu: AutoAssemble. Known fragments from term base and TM are inserted into the translation field

35

Automatic sub-segment translation: Examples

• memoQ: Fragment Assembly. Known elements (from TM and term base) are embedded into the source sentence

36

Automatic sub-segment translation: Examples

• SDL Trados 2007: Pre-translation with insertion of terms from the term base as annotations

37

Automatic sub-segment translation: Examples

• Déjà Vu AutoAssemble
  – Lexicon (extracted terms of the source language text)
  – User adds translations to terms in the lexicon list
  – Assembly of segments during translation
    • Translations of similar sentences are filled with terms from the lexicon

38

Automatic sub-segment translation: Examples

• Déjà Vu AutoAssemble

39

Automatic sub-segment translation: Examples

• SDL Trados AutoSuggest: An AutoSuggest database is created by statistical means, extracting frequent source-language fragments and their (statistical) counterparts in the target language from a TM. When the translator starts to type, translation suggestions are displayed in a list.

41

Automatic sub-segment translation: Examples

• MultiTrans WordAlign: From the database of bi-texts, a list of phrase and term pairs is created, and the statistics for the terms in the translation can be shown to the translator

42

Automatic sub-segment translation: Examples (MultiTrans)

43

Sub-Segment Suggestions

• For sub-segment matching in the target language
  – Tools extract phrase pairs (from 1 to n words per phrase) by statistical means from a bilingual source (translation memory, bi-text corpus)
  – A list or database of these phrase pairs provides suggestions for the translation of those phrases as auto-completion
  – Linguistic analysis is not (yet) a part of this phrase matching process
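The auto-completion step can be sketched as a prefix lookup over the extracted phrase pairs, restricted to source phrases that actually occur in the current segment. This is a simplified illustration rather than the behaviour of any specific tool; the function name `suggest`, the `limit` parameter, and the sample phrase pairs are assumptions.

```python
def suggest(source_segment, typed_prefix, phrase_pairs, limit=5):
    """Suggest target phrases that start with what the translator has
    typed and whose source phrase occurs in the current segment."""
    segment = source_segment.lower()
    prefix = typed_prefix.lower()
    if not prefix:
        return []  # nothing typed yet, so nothing to complete
    candidates = {
        target
        for source, target in phrase_pairs
        if source.lower() in segment and target.lower().startswith(prefix)
    }
    return sorted(candidates)[:limit]

# Hypothetical phrase pairs, e.g. extracted statistically from a TM
phrase_pairs = [
    ("power button", "Netzschalter"),
    ("power supply", "Netzteil"),
    ("device", "Gerät"),
]

segment = "Press the power button on the device."
print(suggest(segment, "Netz", phrase_pairs))
```

Because the lookup is purely string-based, inflected forms in either language are not recognized, which reflects the point above that linguistic analysis is not part of this matching process.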

Thank you for your attention!

Angelika Zerfaß

Daniel Zielinski