Smarter, Faster, Better: The secrets of productive Machine Translation, Tony O’Dowd (KantanMT)

Page 2

KantanMT.com – A Complete MT Platform

Build: KantanTemplates, KantanNER, KantanLibrary, KantanFleet, KantanBuildAnalytics

Improve: KantanAnalytics, KantanPEX, KantanLQR, Adaptive MT, KantanGENTRY, KantanTotalRecall, KantanNeural

Deploy: KantanTranslate, KantanSwift, KantanAPI, KantanAutoScale, KantanOfficeMT, KantanConnectors, KantanSnippets

Page 3

Translation Quality Evaluation

Page 4

Translation Quality Evaluation

KantanLQR: built into the KantanMT platform

Integral step in KantanMT Engine Development

Translation Quality Evaluation: Factored Model

Templates: based on Simplified Factors, MQM, DQF and MQM-DQF

A/B Testing: A, B (C or D) testing now fully supported

Real-time data analytics: built into your LQR Dashboard

Available to all KantanMT Account holders

Page 5

Improving Training Efficiency

Page 6

Improving Training Efficiency

[Figure; labels: "Engine training time" and "Product Delivery"]

Page 7

Improving Training Efficiency

Giza++

Fast_Align

Page 8

Improving Training Efficiency

Language Arc  Word Count   Unique Word Count
EN-FR         781,075      42,563
EN-FR         109,379,800  1,008,696
EN-DE         786,981      42,648
EN-DE         138,119,563  1,084,485
EN-ES         861,557      44,375
EN-ES         154,169,102  1,119,475
EN-IT         924,331      38,506
EN-IT         104,196,079  914,889
EN-ZH         810,134      33,281
EN-ZH         58,274,131   550,862

Page 9

Improving Training Efficiency

Language Arc  Word Count   Unique Word Count  GIZA++ (hh:mm:ss)
EN-FR         781,075      42,563             00:09:23
EN-FR         109,379,800  1,008,696          10:35:11
EN-DE         786,981      42,648             00:10:06
EN-DE         138,119,563  1,084,485          15:33:43
EN-ES         861,557      44,375             00:10:21
EN-ES         154,169,102  1,119,475          14:07:21
EN-IT         924,331      38,506             00:11:03
EN-IT         104,196,079  914,889            11:09:32
EN-ZH         810,134      33,281             00:10:07
EN-ZH         58,274,131   550,862            10:08:16

Page 10

Improving Training Efficiency

Language Arc  Word Count   Unique Word Count  GIZA++ (hh:mm:ss)  Fast-Align (hh:mm:ss)
EN-FR         781,075      42,563             00:09:23           00:03:49
EN-FR         109,379,800  1,008,696          10:35:11           04:02:14
EN-DE         786,981      42,648             00:10:06           00:03:57
EN-DE         138,119,563  1,084,485          15:33:43           04:13:57
EN-ES         861,557      44,375             00:10:21           00:04:20
EN-ES         154,169,102  1,119,475          14:07:21           04:54:12
EN-IT         924,331      38,506             00:11:03           00:04:32
EN-IT         104,196,079  914,889            11:09:32           05:46:41
EN-ZH         810,134      33,281             00:10:07           00:04:45
EN-ZH         58,274,131   550,862            10:08:16           03:34:13

Page 11

Improving Training Efficiency

Language Arc  Word Count   Unique Word Count  GIZA++ (hh:mm:ss)  Fast-Align (hh:mm:ss)  Difference
EN-FR         781,075      42,563             00:09:23           00:03:49               59%
EN-FR         109,379,800  1,008,696          10:35:11           04:02:14               62%
EN-DE         786,981      42,648             00:10:06           00:03:57               61%
EN-DE         138,119,563  1,084,485          15:33:43           04:13:57               73%
EN-ES         861,557      44,375             00:10:21           00:04:20               58%
EN-ES         154,169,102  1,119,475          14:07:21           04:54:12               65%
EN-IT         924,331      38,506             00:11:03           00:04:32               59%
EN-IT         104,196,079  914,889            11:09:32           05:46:41               48%
EN-ZH         810,134      33,281             00:10:07           00:04:45               55%
EN-ZH         58,274,131   550,862            10:08:16           03:34:13               65%
Average                                                                                 61%
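
The slides do not define the Difference column. For the rows checked here it matches the relative reduction in alignment time, (GIZA++ time - Fast-Align time) / GIZA++ time, so the sketch below assumes that reading. The function names are illustrative and not part of any KantanMT tooling.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def time_reduction(giza: str, fast_align: str) -> float:
    """Percentage of the GIZA++ alignment time saved by Fast-Align.

    Assumption: this is how the Difference column was derived.
    """
    baseline = to_seconds(giza)
    return 100.0 * (baseline - to_seconds(fast_align)) / baseline


# The two EN-FR rows from the table (smaller and larger corpus).
print(round(time_reduction("00:09:23", "00:03:49")))  # 59
print(round(time_reduction("10:35:11", "04:02:14")))  # 62
```

Working in seconds keeps the calculation independent of how long a training run is, so the same function covers both the ten-minute and the ten-hour rows.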

Page 12

Improving Training Efficiency

[Bar chart: F-MEASURE, axis 0% to 100%; categories EN-DE-large, EN-ES-large, EN-FR-large, EN-IT-large, EN-ZH-large; data labels as extracted: 70.8, 73.7, 70.4, 71, 74.7, 66.3, 75.9, 69.5, 66.6, 74.4]

[Bar chart: BLEU, axis 0% to 100%; categories EN-DE-large, EN-ES-large, EN-FR-large, EN-IT-large, EN-ZH-large; data labels as extracted: 66.2, 60.5, 61.8, 60.5, 53.7, 63.4, 63.5, 62.2, 61.3, 52.2]

[Bar chart: TER, axis 0% to 100%; categories EN-DE-large, EN-ES-large, EN-FR-large, EN-IT-large, EN-ZH-large; data labels as extracted: 43.7, 40.2, 42.7, 41, 48.7, 49.6, 37.2, 43.5, 44.6, 48.8]

Page 13

Improving Training Efficiency

[Bar chart: F-MEASURE, axis 0% to 100%; categories EN-DE-small, EN-ES-small, EN-FR-small, EN-IT-small, EN-ZH-small; data labels as extracted: 57.3, 69.5, 63, 61.9, 75.4, 58.6, 67.1, 61.8, 61, 76.5]

[Bar chart: BLEU, axis 0% to 100%; categories EN-DE-small, EN-ES-small, EN-FR-small, EN-IT-small, EN-ZH-small; data labels as extracted: 55.6, 59.2, 62.7, 54.2, 44.2, 59.2, 56.9, 60, 53, 45.3]

[Bar chart: TER, axis 0% to 100%; categories EN-DE-small, EN-ES-small, EN-FR-small, EN-IT-small, EN-ZH-small; data labels as extracted: 58.9, 44.9, 51.9, 52.6, 43.9, 55.1, 48.6, 53.5, 54.4, 41.5]

Page 14

Improving Training Efficiency

Dr. Dimitar Shterionov, KantanLabs
Dr. Jinhua Du, ADAPT Centre, DCU
Marc Anthony Palminteri, KantanMT.com
Laura Casanellas, KantanMT.com
Tony O’Dowd, KantanMT.com
Prof. Andy Way, ADAPT Centre, DCU

Page 15

KantanNeural™

Page 16

KantanNeural™ - Developments

3 Language Combinations

EN-DE, EN-ZH, EN-JA

Identical Training Data Catalogs

Training, Testing & Tuning

Phase 1: Automated Test Score Comparisons

Phase 2: Professional Translator A/B Testing

Arcs   # Segments   # Words      Domain
EN-DE  8.8 million  156 million  Legal
EN-ZH  3.5 million  53 million   Legal
EN-JA  8.1 million  90 million   Legal

Page 17

KantanNeural™ - Developments

Phase 1: Automated Test Score Comparisons

Arcs   Type  F-Measure  BLEU  TER
EN-DE  SMT   68%        59%   50%
EN-DE  NMT   67%        49%   51%
EN-ZH  SMT   76%        43%   45%
EN-ZH  NMT   73%        43%   44%
EN-JA  SMT   78%        53%   45%
EN-JA  NMT   68%        40%   53%
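
When reading the Phase 1 scores, note that the three metrics do not point the same way. The sketch below copies the table values and reports which engine type scores better per arc and metric. It assumes the usual convention that higher F-Measure and BLEU are better while lower TER (an edit rate) is better; the slides do not state this explicitly, and the names `SCORES` and `winner` are illustrative.

```python
# Phase 1 scores copied from the table (percentages).
SCORES = {
    "EN-DE": {"SMT": {"F-Measure": 68, "BLEU": 59, "TER": 50},
              "NMT": {"F-Measure": 67, "BLEU": 49, "TER": 51}},
    "EN-ZH": {"SMT": {"F-Measure": 76, "BLEU": 43, "TER": 45},
              "NMT": {"F-Measure": 73, "BLEU": 43, "TER": 44}},
    "EN-JA": {"SMT": {"F-Measure": 78, "BLEU": 53, "TER": 45},
              "NMT": {"F-Measure": 68, "BLEU": 40, "TER": 53}},
}

# Assumption: TER is an error rate, so a lower value is the better one.
LOWER_IS_BETTER = {"TER"}


def winner(arc: str, metric: str) -> str:
    """Return "SMT", "NMT" or "tie" for one language arc and metric."""
    smt = SCORES[arc]["SMT"][metric]
    nmt = SCORES[arc]["NMT"][metric]
    if smt == nmt:
        return "tie"
    nmt_better = nmt < smt if metric in LOWER_IS_BETTER else nmt > smt
    return "NMT" if nmt_better else "SMT"


for arc in SCORES:
    for metric in ("F-Measure", "BLEU", "TER"):
        print(arc, metric, winner(arc, metric))
```

Under these assumptions, the only cell where NMT comes out ahead on automated scores is TER for EN-ZH, which is the context for the Phase 2 evaluation by professional translators.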

Page 18

KantanNeural™ - Developments

Phase 1: Automated Test Score Comparisons

Now available for use on the KantanMT Platform

Beta I Release

Part of the KantanFleet Collection of pre-built engines

KantanMT Account holders can now translate

All document formats are supported

New Language Arcs will be added during Q1 2017

Arcs   Type  F-Measure  BLEU  TER
EN-DE  SMT   68%        59%   50%
EN-DE  NMT   67%        49%   51%
EN-JA  SMT   78%        53%   45%
EN-JA  NMT   68%        40%   53%
EN-ZH  SMT   76%        43%   45%
EN-ZH  NMT   73%        43%   44%

Page 19

KantanNeural™ - Developments

Phase 2: Professional Translator A/B Testing

KantanLQR A/B Testing starting in Feb

Will publish results in March/April timeframe

Domain Adapted NMT

Available Feb 2017

Beta I Release

Page 20

Solving

Thank you…