MMT – Modern, Next Generation Machine Translation

Achim Ruopp, Director of R&D (TAUS)

MMT Project

Horizon 2020 Innovation Action

3M € funding

3 years: 2015-2017

Goal: deliver a large-scale commercial online machine translation service based on a new open-source distributed architecture.

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 645487.

MMT Team

Business Research

Special thanks to Marcello Federico (FBK) and Ulrich Germann (University of Edinburgh) for many of the slides!

Setting up MT for CAT today

1. Select TMs

2. Collect extra data

3. Train and evaluate engine

4. Doesn’t work? Go back to 2.

5. Analyse/process input documents

6. Apply MT on fake TM

7. Import TMs in CAT tool

8. Start translating

9. Adapt engine to new data - go back to 3.

10. New project? Go back to 1.

The MMT way

1. Drag & drop your private TMs

2. Connect your CAT tool with a key

3. Start translating!

Modern MT in a nutshell

Zero training time

Manages context

Learns from users

Scales with data and users

Prototype (April 2016) - Fast training

Context aware translation

SENTENCE: party
CONTEXT: We are going out.
TRANSLATION: fête

SENTENCE: party
CONTEXT: We approved the law
TRANSLATION: parti
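The idea on this slide can be illustrated with a toy sketch. The slides do not describe how MMT scores context, so everything below is an assumption made for illustration: the two miniature "domains", their sample texts and one-word lexicons, and the bag-of-words cosine similarity used to rank them. It only shows the general principle that the surrounding text can select between competing translations of the same word.

```python
import math
import re
from collections import Counter

# Hypothetical miniature "domains": each has a sample text and a tiny lexicon.
# None of this data comes from MMT; it exists only to make the sketch runnable.
DOMAINS = {
    "europarl": {
        "sample": "the parliament approved the law after the party vote",
        "lexicon": {"party": "parti"},
    },
    "conversation": {
        "sample": "we are going out to a party with friends tonight",
        "lexicon": {"party": "fête"},
    },
}


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_domains(context):
    """Return (domain, score) pairs, most similar domain first."""
    ctx = bag_of_words(context)
    scores = {
        name: cosine(ctx, bag_of_words(domain["sample"]))
        for name, domain in DOMAINS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def translate(word, context):
    """Translate a word using the lexicon of the best-matching domain."""
    best_domain = rank_domains(context)[0][0]
    return DOMAINS[best_domain]["lexicon"][word]
```

With this toy data, `translate("party", "We are going out.")` selects the conversational lexicon and `translate("party", "We approved the law")` selects the parliamentary one, mirroring the two examples on the slide.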

Prototype (March 2016)

MS Translator Hub vs Modern MT

MMT vs. Moses core language processing

● More supported languages

● Faster processing

● Simpler to use

● Tags and XML management

● Localisation of expressions

REST API

GET /translate?q=party&context=We+approved+the+law

    "translation": "parti",
    "context": [
      { "id": "europarl", "score": 0.10343984 },
      …
    ]
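A minimal sketch of how a client might build the request above and read the reply. The base URL and port are hypothetical, the helper names are made up for illustration, and the sample reply wraps the fragment shown on the slide in braces to make it valid JSON, which is an assumption about the full response shape. No network call is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical address of a locally running MMT server (not from the slides).
BASE_URL = "http://localhost:8045"


def build_translate_url(text, context=None, base_url=BASE_URL):
    """Build a GET /translate URL with the 'q' and optional 'context' parameters."""
    params = {"q": text}
    if context:
        params["context"] = context
    # urlencode escapes spaces as '+', matching the request shown on the slide.
    return f"{base_url}/translate?{urlencode(params)}"


def best_context(payload):
    """Return the id of the highest-scoring context entry, or None if absent."""
    entries = payload.get("context", [])
    if not entries:
        return None
    return max(entries, key=lambda entry: entry["score"])["id"]


# Assumed reply: the slide's fragment wrapped in braces, with the elided
# entries left out.
SAMPLE_RESPONSE = """
{
  "translation": "parti",
  "context": [
    { "id": "europarl", "score": 0.10343984 }
  ]
}
"""

url = build_translate_url("party", "We approved the law")
payload = json.loads(SAMPLE_RESPONSE)
print(url)
print(payload["translation"], best_context(payload))
```

Against a real server, the URL would be fetched with any HTTP client and the body passed to `json.loads` in the same way.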

MMT Architecture

MMT Data Pooling

Partners’ repositories: MyMemory (Translated), Data Cloud (TAUS)

Volume pooled for the English-Italian prototypes: ca. 785M words and 423M segments in total

MMT Data Collection from CommonCrawl

commoncrawl.org – US-based non-profit

“CommonCrawl is a 501(c)(3) non-profit organization dedicated to providing a copy of the internet to internet researchers, companies and individuals at no cost for the purpose of research and analysis.”

On average 1.5 billion unique URLs per crawl

vs. an estimated 50 billion pages in the Google index and 20 billion pages in the Microsoft Bing index

What can be considered the “surface web” vs. the “deep web”?

Two questions

1. What language are these pages in?

2. Which pages are translations of each other?
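One common heuristic for the second question is to look for URLs that differ only in a language code. The slides do not say which method the project uses, so the sketch below is an assumption for illustration: the function names, the `/*` placeholder, and the restriction to `en` and `it` path segments are all hypothetical choices. It yields candidate pairs only; real pipelines would still need to check page content.

```python
import re
from collections import defaultdict

# Assumed language pair, chosen to match the English-Italian prototypes.
LANG_CODES = ("en", "it")

# Matches a language code that forms a whole path segment, e.g. "/en/" or a trailing "/it".
_LANG_SEGMENT = re.compile(r"/(%s)(?=/|$)" % "|".join(LANG_CODES))


def url_key(url):
    """Return (language, URL with its language segment replaced by '/*').

    Returns (None, url) when no language segment is found.
    """
    match = _LANG_SEGMENT.search(url)
    if match is None:
        return None, url
    key = url[:match.start()] + "/*" + url[match.end():]
    return match.group(1), key


def candidate_pairs(urls, source="en", target="it"):
    """Group URLs by language-neutral key and return sorted (source, target) URL pairs."""
    buckets = defaultdict(dict)
    for url in urls:
        lang, key = url_key(url)
        if lang is not None:
            buckets[key][lang] = url
    return sorted(
        (langs[source], langs[target])
        for langs in buckets.values()
        if source in langs and target in langs
    )


urls = [
    "http://example.com/en/products",
    "http://example.com/it/products",
    "http://example.com/en/about",
    "http://example.com/news",
]
print(candidate_pairs(urls))
```

Here only the two `products` pages are paired; the `about` page has no Italian counterpart and the `news` page carries no language segment.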

Monolingual Data Including English

Monolingual Data Excluding English

Parallel Data Projections from en→it

MMT is Open Source

LGPL/Apache licences

new core technology

github.com/ModernMT/MMT

soon: github.com/ModernMT/DataCollection

email me if you are interested

Roadmap

2015 Q1: development started

2016 Q2: first alpha release (10 languages, fast training, context aware, distributed)

2016 Q4: first beta release (45 languages, incremental learning)

2017 Q4: final release, enterprise ready

This slide may not be used or copied without permission from TAUS

THANK YOU!