Machine translation with free software for enterprises 09/07/2010 1/45 Machine translation with free software for enterprises Bordeaux, 6th July 2010 Garbiñe Aranbarri

Machine translation with free software for enterprises

Bordeaux, 6th July 2010

Garbiñe Aranbarri

ELEKA

www.eleka.net [email protected]

Introduction

MT and free software

Apertium and Matxin

Apertium

Modules

Adaptations

Linguistic A.

publish

sports

finance

Integrations

Matxin

ES-EU

Modules

ES≠EU

Adaptations

EU-ES

Conclusion

Translation engines with free software

Apertium

http://sourceforge.net/projects/apertium/files/

Matxin

http://sourceforge.net/projects/matxin/files/


Machine translation with free software

Based on the Apertium and Matxin technologies

www.opentrad.com


Practical and accessible


Quality: always revise!


MT: speeds up the translation process

Free software:

continuous improvements

flexibility for adaptations




Apertium in the beginning

Language pairs of the same family

Shallow transfer


Shallow-transfer modules of Apertium

Source text → Deformatting → Morphological analysis → Lexical disambiguation → Structural transfer / Lexical transfer → Morphological generation → After-generation → Reformatting → Translation
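The chain of modules can be pictured as a series of functions, each receiving the output of the previous one. The sketch below is only an illustration of that architecture: the stage names follow the slide, but `make_stage` and `run_pipeline` are hypothetical helpers, and every stage is a placeholder that passes the text on unchanged.

```python
from typing import Callable

# Stage names in the order used in the worked example of the talk.
STAGE_NAMES = [
    "deformatting",
    "morphological analysis",
    "lexical disambiguation",
    "structural and lexical transfer",
    "morphological generation",
    "after-generation",
    "reformatting",
]


def make_stage(name: str, trace: list[str]) -> Callable[[str], str]:
    """Placeholder stage: records that it ran and returns the data unchanged."""
    def stage(data: str) -> str:
        trace.append(name)
        return data
    return stage


def run_pipeline(text: str) -> tuple[str, list[str]]:
    """Feed the text through every stage in order, like a chain of pipes."""
    trace: list[str] = []
    for stage in (make_stage(name, trace) for name in STAGE_NAMES):
        text = stage(text)
    return text, trace
```

Keeping each module behind the same "text in, text out" interface is what allows a single stage to be replaced or adapted without touching the others.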


ES: He recibido un pagaré.

Modules

Source text

Deformatting

Morphological analysis

Disambiguation

Transfer

Generation

After-generation

Reformatting

Translation


Deformatting:

He recibido un pagaré.


Morphological analysis:

He: He/haber, verb, pres. indic., 1st p. sg.

recibido: recibido/recibido, adj, m. sg. / recibir, verb, participle, m. sg.

un: un/uno, det, ind, m. sg.

pagaré: pagaré/pagaré, n, m. sg. / pagar, verb, future indic, 1st p. sg.
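The analysis step can be read as a lookup that returns every possible reading of a surface form. A minimal sketch, assuming a toy in-memory lexicon that holds only the readings listed above (`LEXICON`, `Analysis` and `analyse` are hypothetical names; Apertium itself compiles its dictionaries into finite-state transducers rather than using a table like this):

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    lemma: str
    tags: tuple[str, ...]


# Toy lexicon: only the four words of the example sentence.
LEXICON: dict[str, list[Analysis]] = {
    "he": [Analysis("haber", ("verb", "pres. indic.", "1st p. sg."))],
    "recibido": [
        Analysis("recibido", ("adj", "m. sg.")),
        Analysis("recibir", ("verb", "participle", "m. sg.")),
    ],
    "un": [Analysis("uno", ("det", "ind", "m. sg."))],
    "pagaré": [
        Analysis("pagaré", ("n", "m. sg.")),
        Analysis("pagar", ("verb", "future indic", "1st p. sg.")),
    ],
}


def analyse(sentence: str) -> list[tuple[str, list[Analysis]]]:
    """Return every known reading per word; unknown words get an empty list."""
    words = sentence.rstrip(".").split()
    return [(word, LEXICON.get(word.lower(), [])) for word in words]
```

Words with more than one reading ("recibido", "pagaré") are exactly the ones the next module has to resolve.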


Lexical disambiguation:

He: haber, verb, indic., 1st p. sg.

recibido: recibir, verb, participle, m. sg.

un: uno, det., ind, m. sg.

pagaré: pagaré, n, m. sg.
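To make the choice above concrete, here is a small sketch that selects one reading per word from its left context. Apertium's own tagger is statistical; the two hand-written rules below are an assumption made purely for illustration, and `AMBIGUOUS` and `disambiguate` are hypothetical names.

```python
# Each reading is (lemma, part of speech).
Reading = tuple[str, str]

AMBIGUOUS: list[tuple[str, list[Reading]]] = [
    ("He", [("haber", "verb")]),
    ("recibido", [("recibido", "adj"), ("recibir", "participle")]),
    ("un", [("uno", "det")]),
    ("pagaré", [("pagar", "verb"), ("pagaré", "n")]),
]


def disambiguate(words):
    """Pick one reading per word, using the previously chosen reading as context."""
    chosen = []
    for surface, readings in words:
        pick = readings[0]  # default: first listed reading
        if chosen:
            prev_lemma, prev_pos = chosen[-1][1]
            for reading in readings:
                # Illustrative rule 1: after the auxiliary "haber", prefer a participle.
                if prev_lemma == "haber" and reading[1] == "participle":
                    pick = reading
                # Illustrative rule 2: after a determiner, prefer a noun.
                if prev_pos == "det" and reading[1] == "n":
                    pick = reading
        chosen.append((surface, pick))
    return chosen
```

With these rules "recibido" is read as the participle of "recibir" and "pagaré" as a noun, matching the result on the slide.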


Structural and lexical transfer:

He: (haber) pp1, sg – (avoir) pp1, sg.

recibido: (recibir) participle, m, sg – (recevoir) participle, m, sg.

un: (uno) det, ind, m, sg – (un) det, ind, m, sg.

pagaré: (pagaré) n, m, sg – billet (m, sg) à ordre
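The lexical part of this step amounts to replacing each source lemma with its target equivalent while carrying the morphological tags across. A minimal sketch under that assumption, using a toy Spanish-to-French table (`BILINGUAL` and `transfer` are hypothetical names, and marking unknown lemmas with `*` is a convention assumed here):

```python
# Toy bilingual dictionary: source lemma -> target lemma(s).
# A single source lemma may map to a multiword target, as with "pagaré".
BILINGUAL = {
    "haber": ["avoir"],
    "recibir": ["recevoir"],
    "uno": ["un"],
    "pagaré": ["billet", "à", "ordre"],
}


def transfer(analysed):
    """Replace each source lemma by its target lemma(s), keeping the tags."""
    out = []
    for lemma, tags in analysed:
        target = BILINGUAL.get(lemma)
        if target is None:
            # Unknown lemma: pass it through, marked with '*'.
            out.append(("*" + lemma, tags))
        else:
            # The tags attach to the head (first) word of a multiword target.
            out.append((target[0], tags))
            out.extend((extra, ()) for extra in target[1:])
    return out
```

Structural transfer (reordering, agreement) is not covered by this sketch; for a closely related pair such as Spanish and French the example sentence needs none.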


Morphological generation

~Je ai reçu un billet à ordre~.~...


After-generation

J'ai reçu un billet à ordre.
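The step from "Je ai" to "J'ai" is an orthographic contraction applied after the words have been generated. A rough sketch of one such rule, written as a regular expression (the `after_generation` function is hypothetical and handles only "je" before a vowel; mute "h" and other elisions are left out):

```python
import re

# "je" at the start of a word, followed by whitespace and a vowel.
_ELISION = re.compile(r"\b(j)e\s+(?=[aeiouyàâéèêîôû])", re.IGNORECASE)


def after_generation(text: str) -> str:
    """Contract "Je ai" to "J'ai"; the case of the initial letter is preserved."""
    return _ELISION.sub(lambda match: match.group(1) + "'", text)
```

For example, `after_generation("Je ai reçu un billet à ordre.")` returns `"J'ai reçu un billet à ordre."`, while "Je recevais" is left untouched.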


Reformatting

J'ai reçu un billet à ordre.


FR: J'ai reçu un billet à ordre.


Apertium today

Language pairs of different families

Deep transfer

Statistical training of the dictionaries


Linguistic adaptations

Objectives

Context

Ex.: media

Understandable

General


scie/sierra - sierra (fr-es)

sierra-sierra

scie-sierra (LR)

MULTIWORDS

scie circulaire – sierra circular

scie à archet – sierra de arco

scie à ruban – sierra de cinta

etc.
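One reading of the entries above is that "sierra-sierra" applies in both directions while "scie-sierra (LR)" is used only left to right, from French to Spanish, so that Spanish "sierra" is not turned back into "scie" by default. The sketch below illustrates that idea with a plain list of entries; `ENTRIES` and `build_tables` are hypothetical names, not Apertium's dictionary format.

```python
# Entries are (french, spanish, restriction). "LR" means the entry is used
# only left to right (French to Spanish); None means both directions.
ENTRIES = [
    ("sierra", "sierra", None),
    ("scie", "sierra", "LR"),
    ("scie circulaire", "sierra circular", None),
    ("scie à archet", "sierra de arco", None),
    ("scie à ruban", "sierra de cinta", None),
]


def build_tables(entries):
    """Split the entry list into one lookup table per translation direction."""
    fr_es, es_fr = {}, {}
    for french, spanish, restriction in entries:
        if restriction in (None, "LR"):
            fr_es[french] = spanish
        if restriction in (None, "RL"):
            es_fr[spanish] = french
    return fr_es, es_fr
```

With these tables, French "scie" still becomes "sierra", the multiword tool names translate in both directions, and Spanish "sierra" on its own stays "sierra".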


Publishing house


Several fields:

mitocondria-mitòcondria, …

alejandrino-alexandrí, …

califato-califat, …

blues-blues

Pitágoras-Pitàgores

etc.

es-ca / es-gl


« Trainières » (traditional rowing boats) association


Specific terminology:

Remo-oar

club de remo-rowing club

Caja Madrid-Caja Madrid (not “Box Madrid”)

Zarauzko estropadak-Zarauzko estropadak
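Keeping "Caja Madrid" intact means the multiword name has to be recognised before its parts are translated one by one. A minimal sketch of that idea, assuming a greedy longest-match lookup over a toy Spanish-to-English phrase table (`PHRASES` and `translate_terms` are hypothetical; the single-word entry for "caja" is included only to show what the multiword entry prevents):

```python
# Multiword entries are tried before single words, so "Caja Madrid"
# is never split into "caja" + "Madrid".
PHRASES = {
    ("caja", "madrid"): "Caja Madrid",
    ("club", "de", "remo"): "rowing club",
    ("remo",): "oar",
    ("caja",): "box",
}
MAX_LEN = max(len(key) for key in PHRASES)


def translate_terms(tokens):
    """Greedy longest-match lookup; unknown tokens are copied unchanged."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(token.lower() for token in tokens[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += n
                break
        else:
            # No entry of any length starts here: keep the token as it is.
            out.append(tokens[i])
            i += 1
    return out
```

The same mechanism covers names that must not be translated at all, such as "Zarauzko estropadak": an entry that maps the name to itself protects it.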


Savings bank


Bank terminology (multiwords):

payer en espèces - pagar en metálico

payer en liquide - pagar en metálico (LR)

billet à ordre - pagaré


Integrations

Needs?

What translation procedure?


Implementation

Integration of Apertium into the company's system

Definition of an API

Readaptation of the codification
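As an illustration of what a thin API around the engine might look like, the sketch below wraps the `apertium` command-line tool in a Python function. It assumes that the `apertium` executable and the requested language pair are installed; `build_command` and `translate` are hypothetical names, and the exact flags should be checked against the installed version.

```python
import subprocess


def build_command(pair: str, fmt: str = "txt") -> list[str]:
    """Build the argument list for the apertium command-line tool."""
    return ["apertium", "-f", fmt, pair]


def translate(text: str, pair: str, fmt: str = "txt") -> str:
    """Send text to apertium on standard input and return its output.

    Raises subprocess.CalledProcessError if the tool exits with an error.
    """
    result = subprocess.run(
        build_command(pair, fmt),
        input=text,
        capture_output=True,
        text=True,
        encoding="utf-8",  # fixed encoding, independent of the system locale
        check=True,
    )
    return result.stdout
```

A call such as `translate("He recibido un pagaré.", "es-fr")` would then hide the engine behind a single function that the company's own system can call.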




RBMT: ES → EU

Rule-based

Deep transfer


Deep transfer modules of Matxin

Source text → Deformatting → Deep analysis → Structural transfer / Lexical transfer → Syntactical generation → Morphological generation → After-generation → Reformatting → Translation


Apertium (shallow transfer): Source text → Deformatting → Morphological analysis → Lexical disambiguation → Structural transfer / Lexical transfer → Morphological generation → After-generation → Reformatting → Translation

Matxin: Source text → Deformatting → Deep analysis → Structural transfer / Lexical transfer → Syntactic generation → Morphological generation → After-generation → Reformatting → Translation

Differences between ES and EU

ES: inflected (inflections and prepositions)

EU: agglutinative (words = lexemes + affixes)


«para los de la casa = etxekoentzako»

(« for those of the house »)

para (prep.)

los (art./m/pl)

de (prep.)

la (art./f/sg)

casa (SN/f/sg)

etxe- (n)

-koentzako (morpheme = « for those of the »)
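The example shows five Spanish words collapsing into one Basque word made of a noun stem and a suffix chain. The sketch below illustrates only that final shape, using two toy tables built from the slide's example; `NOUNS`, `SUFFIXES` and `to_basque` are hypothetical, and a deep-transfer system reaches this result through syntactic analysis and morphological generation rather than a direct lookup.

```python
# Toy tables holding only the example from the slide.
NOUNS = {"casa": "etxe"}
# A run of Spanish function words maps to one Basque suffix chain.
SUFFIXES = {("para", "los", "de", "la"): "koentzako"}


def to_basque(phrase: str) -> str:
    """Turn 'function words + noun' into a single suffixed Basque word."""
    *function_words, noun = phrase.lower().split()
    stem = NOUNS[noun]
    suffix = SUFFIXES[tuple(function_words)]
    return stem + suffix
```

Here `to_basque("para los de la casa")` gives `"etxekoentzako"`: the prepositions and articles have no word-for-word counterpart, which is why a word-by-word shallow transfer is not enough for this pair.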

Terminological adaptation

Improve lexical choice

Corpus-based method


Polysemy

ES              EU_1                        EU_2

capital (m/f)   hiriburu (capital city)     kapitala (capital)

formación (f)   alderdi (political party)   trebakuntza (training)
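One simple way to make such a choice from a corpus is to keep, for each polysemous word, the candidate that occurs most often in target-language texts from the customer's domain. The sketch below assumes exactly that; it is not necessarily the method used in the project, `choose_translations` is a hypothetical helper, and the short lemma list is invented to stand in for a real domain corpus.

```python
from collections import Counter

# Candidate Basque translations taken from the table above.
CANDIDATES = {
    "capital": ["hiriburu", "kapitala"],
    "formación": ["alderdi", "trebakuntza"],
}


def choose_translations(candidates, corpus_lemmas):
    """For each source word, keep the candidate seen most often in the corpus."""
    counts = Counter(corpus_lemmas)
    return {
        source: max(options, key=lambda lemma: counts[lemma])
        for source, options in candidates.items()
    }


# Invented lemma list standing in for a financial-domain Basque corpus.
finance_corpus = ["kapitala", "trebakuntza", "kapitala", "hiriburu", "kapitala"]
```

Calling `choose_translations(CANDIDATES, finance_corpus)` selects "kapitala" and "trebakuntza"; a corpus of political news would be expected to shift the choice for "formación" towards "alderdi".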


Similar signifiers in ES and in EU:

capital => kapital

formación => formazio

Multiword


Multiword

PN: Pilar del Castillo => gazteluko mugarri

(pilier du château / pillar of the castle)

Others: ministro de educación, política social y deporte

(ministre d'éducation, politique social et sports /

education, social policy and sports minister)

recinto cerrado => esparru itxi

(enceinte fermée / enclosure)

recinto ferial => azoka barruti

(enceinte de l'exposition / exhibition site)


SMT: EU → ES

Moses

Less developed than ES-EU


Challenges

Matxin: ES<->EU productive

Apertium:

Keep on improving

Create new language pairs

Follow the technological research




Thank you very much!

[email protected] | www.eleka.net