TAUS REPORT

Putting Machine Translation to Work

Report on the TAUS Executive Forum on March 23-24, 2006 in Washington DC

April 2006

Copyright © 2006 by TAUS


Table of Contents

Participants
Different Approaches to Machine Translation
Best Practices in Post-Editing MT
Barriers and solutions to implementing MT
User cases
Technology roadmaps
Five most critical conditions for implementing MT
Looking back, looking forward
TAUS action items

FOUNDING MEMBERS

USERS

IT: Autodesk • EMC Software Group • FileNET • Hewlett-Packard ACG • Intel • McAfee • Oracle • Sun Microsystems • Symantec • PTC • UGS

Telecom: Avaya • Cisco • Lucent

Medical: Gambro BCT • MAQUET Critical Care • Molina • Philips • Siemens • Spacelabs Medical

Food: McDonald’s Corporation

Patents: Zacco A/S

Institutions: European Patent Office • International Monetary Fund • SWIFT

PRACTITIONERS

CLS Communications • Delta International • Eurotexte • GrafiData • Lionbridge • Logrus • Merrill Brink • SDL International • TOIN • Transco • Vistatec • WH&P • Yamagata Europe

PRODUCTS

acrolinx • AuthorIT • CCID Transtech • Cross Language • DocZone • Idiom Technologies • Language Weaver • Meaningful Machines • MultiCorpora

Summary

With a good sense of history, the TAUS Executive Forum was held in the Key Bridge Marriott Hotel in Washington DC with a clear view of Georgetown University on the other side of the Potomac river, where just over fifty years ago the first MT experiment was performed on an IBM mainframe computer. The almost perfect translation of a Russian text into English convinced one of the project leaders that ‘all of the Soviet Union could easily be translated in a couple of hours’ and that ‘human translators would no longer be needed in a period of five years’. We opened the Forum meeting with a video of this news report from 1954, which Steve Richardson from Microsoft had kindly made available to us.

From the discussions of 11 user cases and 3 technology roadmaps during this two-day forum, it became clear that the predictions of that enthusiast back in the 1950s were far off the mark. But at the same time we concluded that MT technology has a very clear role to play in business these days. New applications have emerged, and in the mainstream publishing model, MT is being integrated with existing translation tools and workflows to generate productivity increases of 18% to 50%.

In the breakout sessions the 33 delegates from large user companies and smaller practitioner agencies discussed ways to streamline the integration and deployment of MT. High on the agenda are industry collaboration, sharing and learning from user cases, and validating ROI. This is exactly what TAUS likes to deliver to its members.


Participants

The TAUS Executive Forum was attended by 33 delegates from a mix of corporate and institutional users, practitioners and product developers.

Name                   Company                                       Profile               Presentation

Andrew Bredenkamp      acrolinx                                      Product
Paul Trotter           AuthorIT                                      Product
Martin Guttinger       Cisco                                         User
Anthony Clarke         CLS Communication                             Practitioner          User case
Frank Ouyang           Cognos                                        User
Heidi Depraetere       Cross Language                                Product               User case
Roger Troth            Dannemann, Siemsen, Bigler & Ipanema Moreira  User
Jessica Roland         EMC                                           User
Enrique Filloy-García  European Patent Office                        User                  User case
Tony Dolph             Idiom                                         Product
Mike Denzin            Idiom                                         Product
Julie Chang            Intel                                         User
Bill Skinner           International Monetary Fund                   User                  User case
Bernd Löffler          iSP                                           Practitioner
Svetlana Sheremetyeva  Lana Consulting                               Product               Roadmap
Kirti Vashee           Language Weaver                               Product               Roadmap
Rafa Moral             Lionbridge                                    Practitioner/Product  User case
Serge Gladkoff         Logrus                                        Practitioner
Michael Steinbaum      Meaningful Machines                           Product               Roadmap
Tim Ehrhard            Merrill Corporation                           Practitioner
Sonia Gordon           Molina Healthcare, Inc.                       User
Martha Bernadett       Molina Healthcare, Inc.                       User
Daniel Gervais         MultiCorpora R&D Inc                          Product               User case
Aiman Copty            Oracle                                        User
Julia Aymerich         PAHO                                          User/Product          User case
Sinclair Morgan        SDL                                           Practitioner/Product  User case
Guy Van Leemput        Swift                                         User                  User case
Johann Roturier        Symantec                                      User                  User case
Masaru Kawahara        TOIN                                          Practitioner
Vic Dickson            Transco                                       Practitioner
Yolanda Tan            Transco                                       Practitioner
Phil Ritchie           VistaTEC                                      Practitioner
Viggo Hansen           Zacco                                         User                  User case


Different Approaches to Machine Translation

Presented by Jaap van der Meer.

In the fifty-year history of machine translation (MT), different schools of thought have emerged about the best approach to building an MT system. While researchers fought over the best way to achieve Fully Automatic High Quality Translation (FAHQT), market requirements have gradually changed. End-users seem to be quite satisfied with useful translation quality. The real difficulty is how to measure the quality of any translation, machine or human. Various user cases prove that there is genuine value to be derived from using machine translation technology. The TAUS Foundation Report about the Different Approaches to Machine Translation gives a high-level overview of the evolution of machine translation. It presents the basic concepts and the key issues, providing a useful introduction to anyone who is relatively new to the topic of MT. This report is available on the TAUS member portal.

Best Practices in Post-Editing MT

Presented by Andrew Joscelyne.

Post-editing is best understood as an integral part of the automated translation and localization process, rather than a separate stage of editing, revising or quality assurance. It involves linguistic more than subject-area skills and is performed best by alert translators who are familiar with machine output and work in a standard translation environment. Practical training is required to spot and correct typical machine mistakes as quickly as possible, and to ensure that the automation system receives appropriate feedback to upgrade the dictionary or rule base. Publishable (dissemination) quality post-editing can output around 5,000 translated words a day, whereas lighter editing for gisting (assimilation) can at least double this rate. Since the overall aim of any translation automation solution is to reduce costs and accelerate throughput at consistent quality levels, future work on post-editing will seek to optimize the task both by improving input quality to the translation process and by using emerging tools to automatically correct egregious machine output errors before the actual post-editing begins. The TAUS Best Practices Report about Best Practices in Post-Editing MT collates findings from various user sources on current practice in post-editing in a translation automation environment. This report is available on the TAUS member portal.

During this presentation one or two of the participants shared the experience that full-time post-editors seem to become ‘immune’ to a certain level of machine translation mistakes over time. The remedy in these cases has been to rotate the task of post-editing so that a consistent level of quality review can be maintained.

Barriers and solutions to implementing MT

For the Forum to be as useful as possible, it is vital that the participants publicly share their own opinions, doubts, preconceived ideas and practical experience of language automation, especially MT. So, as an exercise in sharing, learning and consensus seeking, all attendees divided into six “tables” to discuss their views of the key barriers to implementing MT and some of the most relevant solutions that address these barriers. These brainstorming sessions gave rise to six lists of barriers and solutions. A vote was then taken to decide what this TAUS Forum considered to be the five most effective solutions to these barriers.

The top five barriers selected were:
1. Lack of ROI validation for MT solutions
2. Unrealistic customer expectations about MT capabilities
3. Poor quality of source documents
4. Fragmentation of available tools and solutions
5. Resistance from professional translators

The top five solutions voted were:
1. Prioritize domain-specific solutions; there is no one-size-fits-all MT
2. Need better, more practical quality metrics, plus more industry collaboration and sharing of resources across companies
3. Need more competition between MT players and more ROI success stories
4. Terminology extraction is vital, and should be made more accessible. Need for more prototypes and proof-of-concept MT solutions
5. Highlight MT positively as a vital support technology for language professionals

Matching the barriers with the solutions that resulted from this brainstorming session, it seems that the Executive Forum itself and TAUS in general play a vital role in facilitating a broader adoption of machine translation. The eleven user cases presented at the Washington Forum gave insight into the benefits of using MT (“the ROI validation”).

User cases

Eleven different user cases were presented at the TAUS Executive Forum in Washington. We like to report on these user cases in a standard format, since this allows us to extract intelligence, draw conclusions and make comparisons more easily. We categorize the application types roughly into dissemination and assimilation models. Of the eleven user cases below, eight are of the dissemination type. This means that MT is used to support a more efficient publishing process. Usually the MT engine is integrated with the other long-established language automation technology, translation memory (TM). Productivity increases for MT alone are quoted at between 18% at the low end and more than 50%.

The other three user cases are of the assimilation type. This means that MT is used to provide real-time translation (no post-editing) for the purpose of gisting or information retrieval to internal staff or external customers and users. The usage models in our series of cases below provide a good breakdown of the application types we generally find in the market: one is for customer support (SWIFT), one is for unlocking information on multilingual corporate Intranets (UBS) and one is for expert information access (European Patent Office). In assimilation user cases, executive management is looking for benefits such as reduced customer support costs due to fewer contact center calls, increased efficiency of corporate staff, and tighter security since confidential information is not being sent to public translation engines on the Internet.

We categorize translation engines as rule-based (also referred to as transfer-based) MT systems and statistical (or data-driven) MT systems. This simple classification is not always helpful though, as the developers like to keep differentiating themselves. As we state in our TAUS Foundation Report Different Approaches to MT, the short history of MT has taught us that all efforts at differentiating new systems from old or other systems seem to end by blending together in a new wave of convergence. “Long live the hybrid”: each of the user cases below uses a combination of MT and TM. Statistical systems do not seem to be in use in business environments yet.

User: Canadian Government Agency

Presenter: Daniel Gervais, MultiCorpora

Translation engine: Rule-based MT integrated with text-based TM

Application type: Publishing (dissemination)

Detailed description: The government agency uses a commercial MT system and MultiCorpora’s TextBase TM combined with post-editing services to translate job postings.

Customization:

Corpora: There are around 90 million words of free text per year in job postings.

Conditions for MT implementation: Improve quality of source text (spell and grammar checkers).

Integration: Translation software components are integrated in a database-driven translation workflow.

Measuring MT quality: Combined use of database, TM and MT resulted in a quality improvement of 60% over the previous solution.

Measuring MT productivity: Post-editing of 16.2% of all job offers. Productivity is 12K words per person per day (cost: US$0.0039/word). The total translation cost is US$0.0057/word.
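To give a feel for what these per-word figures imply, the sketch below multiplies them by the annual volume quoted under Corpora. It assumes, which the presentation did not state, that both per-word costs are averages over the full 90 million words; the function and variable names are ours.

```python
def annual_cost(words_per_year: int, cost_per_word: float) -> float:
    """Return the yearly cost in US$ for a given volume and per-word cost."""
    return words_per_year * cost_per_word

WORDS_PER_YEAR = 90_000_000   # free text in job postings, as reported
POST_EDIT_COST = 0.0039       # US$/word, figure quoted with post-editing
TOTAL_COST = 0.0057           # US$/word, total translation cost

# Assumption: both rates apply to the whole annual volume.
post_editing = annual_cost(WORDS_PER_YEAR, POST_EDIT_COST)
total = annual_cost(WORDS_PER_YEAR, TOTAL_COST)

print(f"Post-editing: US${post_editing:,.0f} per year")
print(f"Total:        US${total:,.0f} per year")
```

Under that assumption the totals come to roughly US$351,000 and US$513,000 a year.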


User: European Patent Office

Presenter: Enrique Filloy-García, European Patent Office

Translation engine: Rule-based or statistical MT

Application type: Gisting, information retrieval (assimilation)

Detailed description: European Patent Office will deploy MT to offer the public access to the databases of patent claims and descriptions.

Customization: Dictionaries are developed based on a term extraction process from bilingual corpora. The dictionaries are implemented in a hierarchical structure following the existing classification scheme of the patent organization. EPO follows the example of the Japan Patent Office in this hierarchical dictionary set-up.

Corpora: Existing corpora of hundreds of thousands of patent claims and descriptions.

Conditions for MT implementation: Political support, IT support, financial support and user support from the global patent community.

Integration: In the EPO network.

Measuring MT quality: Quality varies depending on the technical field and on user expectations. In general a “Degree of Understanding” is achieved that meets the requirements of the application.

Measuring MT productivity: Benefits are measured in strategic terms: faster access to patent information will stimulate economic development and industry innovation.
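The hierarchical dictionary set-up described under Customization can be pictured as a lookup that starts in the most specific class and falls back to its parents. The sketch below is an illustration only: the class codes, the German-English entries and the function name are hypothetical and are not taken from the EPO system.

```python
from typing import Optional

# Hypothetical dictionaries keyed by classification path.
# The empty path holds general terms; longer paths are more specific.
DICTIONARIES = {
    (): {"Anspruch": "claim"},
    ("H",): {"Leitung": "line"},
    ("H", "H04"): {"Leitung": "transmission line"},
    ("F",): {"Leitung": "pipe"},
}

def lookup(term: str, class_path: tuple) -> Optional[str]:
    """Return the translation from the most specific dictionary that has the term."""
    # Walk from the full path up to the root: ("H", "H04") -> ("H",) -> ()
    for depth in range(len(class_path), -1, -1):
        entry = DICTIONARIES.get(class_path[:depth], {}).get(term)
        if entry is not None:
            return entry
    return None

print(lookup("Leitung", ("H", "H04")))    # most specific entry wins
print(lookup("Leitung", ("F", "F16")))    # no F16 dictionary, falls back to F
print(lookup("Anspruch", ("H", "H04")))   # falls back to the general dictionary
```

The appeal of such a structure is that the same source term can receive a different translation in each technical field while general terms are maintained in one place.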

User: DaimlerChrysler

Presenter: Sinclair Morgan, SDL

Translation engine: Rule-based MT integrated with TM

Application type: Publishing (dissemination)

Detailed description: SDL has implemented SDL KbT System, which is an integration of the SDL rule-based (Transcend) MT system with SDL’s TM, to translate the service literature for Chrysler Corporation into FIGS (French, Italian, German, Spanish) and Canadian French.

Customization: Dictionaries are developed via term mining and encoding. The process of customizing, testing on samples and reviewing post-edited output takes about 6-8 weeks.

Corpora: Service manuals run to 5,000 to 8,000 pages, around 1 million words.

Conditions for MT implementation: No control of source but looking at authoring support.

Integration: SDL has integrated SDL KbT System with the content production system of Tweddle Litho, the documentation production partner of Chrysler. This is a seamless integration of KbTS with the XML publishing environment. The post-editors work in the familiar SDL translation editors.

Measuring MT quality: Feedback from post-editors. The J2450 quality assessment standard originally designed for the automotive industry is applied to keep statistics on the output quality.

Measuring MT productivity: 35% cost savings over previous process, 40% productivity increase using small post-editor teams.


User: Fortis AG

Presenter: Heidi Depraetere, Cross Language

Translation engine: Rule-based MT integrated with TM

Application type: Publishing (dissemination)

Detailed description: Cross Language has integrated SYSTRAN MT into the TRADOS TeamWorks translation workflow system. This gives the internal translation department at Fortis AG, an insurance company, a natural environment for post-editing TM and MT matches in the standard TRADOS translation editor. Annual volume is around 6 million words in source language.

Customization: Customer-specific dictionaries were developed based on alignment of translated documents, term extraction and validation. The whole process including coding for the SYSTRAN engine took 40 days and resulted in dictionaries of around 20,000 terms in four languages.

Corpora: Corpora for the customization effort consisted of around 750,000 words from a variety of domains, including financial and insurance text, legal documents and general interest articles on the company Intranets and Extranets. The level of repetition in this varied domain corpus is relatively low.

Conditions for MT implementation: The installation at Fortis AG became a success thanks to very strong management support. There was an urgent need to automate due to growing translation volumes and shorter deadlines.

Integration: The integration of the SYSTRAN engine in the TeamWorks workflow environment created a hybrid translation environment.

Measuring MT quality: Fortis AG needs high quality translation. Dictionary ‘tweaking’ is provided based on feedback on MT errors by the translation team.

Measuring MT productivity: MT is applied to 80% of translation requests. After three months the MT engine alone generated an 18% increase in productivity. The combination with TM in the TeamWorks environment resulted in an overall 30% increase in productivity.

User: Microsoft/Lionbridge

Presenter: Rafa Moral, Lionbridge

Translation engine: Rule-based MT integrated with TM

Application type: Publishing (dissemination)

Detailed description: Lionbridge has put its proprietary MT system Barcelona to work in an integrated service environment to increase productivity and quality in a 14-million-word translation project for Microsoft into three target languages. The MTM system is integrated with monolingual and bilingual terminology extraction, entity recognition and several other components.

Customization: The MT system was customized with project-specific dictionaries and grammar rules. An important feature of the Barcelona MT system is the easy customization of dictionaries and also of the MT grammar rules to meet the specific terminology and grammar/stylistic needs of each customer or domain.

Corpora: The MT system was applied to this one project which included software interface and Help for a new release of Visual Studio. A pilot project was run on 10% of the total corpus.

Conditions for MT implementation: A process of continuous improvement must be put in place, based on multi-disciplinary cooperation, thorough preparation and a long-term vision.

Integration: The MT system is integrated with TM and a suite of proprietary tools for reporting and post-editing.

Measuring MT quality: The translation team worked closely with the authors to address translatability issues. Lionbridge applied a tool, labeled Edit Distance, to track the quality of the MT output relative to that of professional translators. The tool is similar to the BLEU score method, which tracks the deviation of MT output from samples of professional translations.

Measuring MT productivity: Productivity was tracked against the process with no MT, using Edit Distance metrics to identify improvements in MT output quality and productivity over the life of the project as more customized dictionaries and rules were created. The project generated savings from the beginning and established the foundation (MT dictionaries and rules) for the use of MT in future versions of that and similar projects.
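The Lionbridge tool itself is proprietary and was not described in detail. As a rough idea of how an edit-distance measure can relate raw MT output to its post-edited version, the sketch below computes a generic word-level Levenshtein distance; the normalization by the length of the post-edited segment and all names are our assumptions.

```python
def edit_distance(hyp: list, ref: list) -> int:
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a word from hyp
                            curr[j - 1] + 1,      # insert a word from ref
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def normalized_distance(mt_output: str, post_edited: str) -> float:
    """Edits per word of the post-edited segment (0.0 means no edits)."""
    hyp, ref = mt_output.split(), post_edited.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(hyp, ref) / len(ref)

mt = "click the button for save the file"
pe = "click the button to save the file"
print(edit_distance(mt.split(), pe.split()))   # 1
print(round(normalized_distance(mt, pe), 3))   # 0.143
```

Tracked over the life of a project, a falling average distance would suggest that dictionary and rule customization is reducing the amount of post-editing needed.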

User: SWIFT

Presenter: Guy Van Leemput, SWIFT

Translation engine: Rule-based MT

Application type: Customer support (assimilation)

Detailed description: SWIFT has just started to deploy SYSTRAN Web Server 5.0 to provide real-time translation of the 1,500 knowledge base articles to SWIFT customers from English into French, Spanish, German and Simplified Chinese.

Customization: The MT system was customized with customer-specific dictionaries by an in-house team. The plan is to dedicate one full-time resource to maintain the MT dictionaries based on the viewing stats of the system. As SWIFT has full control over the source text, efforts will be put into implementing a ‘style for MT’ at the authoring phase.

Corpora: The MT system is applied to a corpus of around 3,000 pages of technical content that is developed internally by the customer support staff.

Conditions for MT implementation: At SWIFT a strict control of the English source is critical to the success of the MT deployment. The MT project had strong support from executive management.

Integration: The MT system is integrated in SWIFT’s customer support extranet open to the general public.

Measuring MT quality: Currently the quality is only judged on a subjective and incidental basis. However, executive management will need a more objective measurement.

Measuring MT productivity: The benefits of the MT implementation will be measured in the reduction of incoming support requests in the call center. Surveys will be conducted to track the satisfaction rate of call deflection.


User: Symantec

Presenter: Johann Roturier, Symantec

Translation engine: Rule-based MT, integrated with TM

Application type: Publishing (dissemination)

Detailed description: Symantec has started using SYSTRAN Webserver 5.0 to translate customer support documents and virus alerts with a very quick turnaround requirement. MT is integrated with TM. Light post-editing results in the required turnaround time. Current languages: French, German, Italian, Spanish, Portuguese.

Customization: Customization started with the import of glossaries of around 3,000 entries. Limited coding was required, using the SYSTRAN Intuitive coding feature.

Corpora: The MT system is applied to Symantec support documents, highly technical and restricted domain texts. Use will be extended to some user documentation.

Conditions for MT implementation: Symantec found that a controlled source “content model” instead of a free flow of information input reduced the amount of required lexical customization work and post-editing. Strict post-editing guidelines are also an absolute necessity for successful use of the MT system.

Integration: The SYSTRAN Webserver will be integrated into the Idiom WorldServer which Symantec uses to manage all of its translation workflows. This integration will result in a hybrid translation environment where each MT output segment will be a TM input segment and go through the standard translation editing process.

Measuring MT quality: The General Text Matcher is used to score unedited and post-edited output. MT errors are tracked by separating errors into terminology-related errors and syntactic errors.

Measuring MT productivity: The MT system has proven to have the potential of doubling the output average per translator per day.
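The hybrid environment described under Integration can be pictured as a two-step lookup: reuse a TM match when one is close enough, otherwise send the segment to MT and store the post-edited result back in the TM. The sketch below is a minimal illustration under our own assumptions: the 0.75 fuzzy threshold, the similarity measure (Python's standard difflib) and the `machine_translate` placeholder are hypothetical and do not reflect the WorldServer or SYSTRAN interfaces.

```python
from difflib import SequenceMatcher

def best_tm_match(segment: str, tm: dict) -> tuple:
    """Return (score, translation) for the closest source segment in the TM."""
    best = (0.0, None)
    for source, target in tm.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best[0]:
            best = (score, target)
    return best

def machine_translate(segment: str) -> str:
    """Placeholder for a call to an MT engine (hypothetical)."""
    return f"[MT] {segment}"

def translate(segment: str, tm: dict, threshold: float = 0.75) -> tuple:
    """Use the TM above the fuzzy threshold, otherwise fall back to MT."""
    score, target = best_tm_match(segment, tm)
    if target is not None and score >= threshold:
        return "TM", target
    return "MT", machine_translate(segment)

def store_post_edit(segment: str, post_edited: str, tm: dict) -> None:
    """Feed the post-edited MT output back into the TM."""
    tm[segment] = post_edited

tm = {"Restart the computer.": "Redémarrez l'ordinateur."}
print(translate("Restart the computer.", tm))
print(translate("Update your virus definitions.", tm))
store_post_edit("Update your virus definitions.",
                "Mettez à jour vos définitions de virus.", tm)
print(translate("Update your virus definitions.", tm))
```

The last call shows the point of the hybrid set-up: once a segment has been post-edited, the next occurrence is served from the TM rather than re-translated by the engine.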

User: UBS

Presenter: Anthony Clarke, CLS Communication

Translation engine: Rule-based MT

Application type: Gisting, information retrieval (assimilation)

Detailed description: CLS Communication uses the Braintribe MT system (formerly known as Comprendium and a reincarnation of the old Metal system) to provide real-time translation in a secure environment through an ASP delivery to the staff of the UBS bank in Switzerland. CLS provides additional terminology and professional translation services.

Customization: Lexical customization was done using translation memories as a base. Before the service was launched on a large scale, a pilot phase was undertaken, followed by user surveys.

Corpora: The user environment means that there is no control over the source text and a wide variety of domains. CLS offers post-editing as an additional service.

Conditions for MT implementation: Strong management support for a secure solution for direct access to translation is the number one condition. The market environment of international staff and customer base combined with a number of official languages makes the deployment of MT a favorable scenario. MT helps to improve on human translation cost. Last but not least, CLS sees a need to market the online MT service on a continuous basis.

Integration: The on-demand MT service is offered via an Internet browser.

Measuring MT quality: The MT quality is not measured, but CLS runs regular user surveys to check on user satisfaction and also keeps track of the frequency of usage.

Measuring MT productivity: Metrics are used to track customer usage, showing increased productivity from the MT solution (replacing a human translator contract), with savings in dictionary management, etc.

User: Zacco

Presenter: Viggo Hansen, Zacco

Translation engine: Rule-based MT, integrated with TM

Application type: Publishing (dissemination)

Detailed description: Zacco has been using a proprietary MT system called PaTRANS, which is a rule-based MT system built on the foundations of the Eurotra project. The MT system translates English patent documents into Danish. The purpose of translation is the validation of European patents. One hundred percent accuracy and consistency are therefore required. Zacco uses TM to complement the features of PaTRANS.

Customization: The MT system is customized with general and specific domain dictionaries. Unlike the European Patent Office, Zacco does not maintain a hierarchy in the set-up of the MT dictionaries. Customization work has been done internally by specialized translators.

Corpora: Zacco’s total translation volume is around 30 million words, but the system is used for English into Danish translation only.

Conditions for MT implementation: The condition for the implementation of MT was simple and straightforward: a solution was needed that was quicker and cheaper than conventional translation.

Integration: The PaTRANS system is integrated with a TM database.

Measuring MT quality: Quality is measured in terms of speed, cost and language quality.

Measuring MT productivity: The fully automated process using MT and post-editing services results in savings in excess of 50% compared to a conventional translation process.


User: International Monetary Fund

Presenter: Bill Skinner, International Monetary Fund

Translation engine: Rule-based MT, integrated with TM

Application type: Publishing (dissemination), plus some gisting (assimilation)

Detailed description: A plan to introduce MT (Systran 5.0) as a standard translation service option over the IMF intranet, independently of the human translation service. Also some gisting will be provided for internal end users. Currently, MultiTrans is used as TM server and for terminology extraction and glossary production. Pilot testing, not yet in production mode.

Customization: So far, minimal, with some dictionary work for certain languages.

Corpora: 18 centralized corpora, updated weekly. Need to convert legacy files from paper to digital. No control over the source text.

Conditions for MT implementation: Need for metrics on cost containment. Currently, MT success means deflecting a small volume of work from the standard human translation process.

Integration: API to in-house Phoenix workflow software.

Measuring MT quality: User assessment form for MT service.

Measuring MT productivity: N/A, but very important to develop reliable ROI metrics.

User: Pan American Health Organization

Presenter: Julia Aymerich, Pan American Health Organization

Translation engine: Rule-based MT, integrated with TM

Application type: Publishing (dissemination)

Detailed description: Originally developed in the 1970s as a special system for the PAHO, it began as a direct English-Spanish system and then developed a transfer component based on syntactic rules. Today it covers all combinations of English, Spanish and Portuguese, and has been used daily for 90% of all PAHO translation requirements. It is fully automatic, but well integrated into the translation environment and understood as a tool. It is used as a paying service by 75 client end users.

Customization: Extensive dictionary work to build 110K entries for each language. 3 staff members and 1 contractor to customize and integrate dictionaries and grammars. Massive terminology extraction from bilingual corpus using MultiTrans.

Corpora: Very large bilingual corpora, with extraneous content cleaned out before adding new documents. No control of source text apart from formatting clean-up.

Conditions for MT implementation: MT is the default solution at PAHO, and is appropriate in 90% of cases. Not used when source documents are non-digital or too idiomatic in nature.

Integration: Workflow involves an ActiveX environment for standard word-processing tools, and WordFast TM. Project to integrate TM directly into the MT process, so that the whole process is MT-driven.

Measuring MT quality: Monitor not-found words, parser statistics on output and translator feedback, and use this to track quality. Problems of post-editors getting lazy about quality, but rotation of translators prevents this problem.

Measuring MT productivity: The MT battle has been won at PAHO! The gain in productivity compared to human-only translation ranges from 50% to 100%, with up to a 33% reduction in overall costs. Translator throughput of 3K words a day.
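Monitoring not-found words, as mentioned under Measuring MT quality, amounts to counting the source words that the MT dictionary does not cover. The sketch below shows one simple way to do this; the tiny dictionary, the sample text and the function names are hypothetical and are not taken from the PAHO system.

```python
import re
from collections import Counter

def tokenize(text: str) -> list:
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[^\W\d_]+", text.lower())

def not_found_words(text: str, dictionary: set) -> Counter:
    """Count source words that are missing from the MT dictionary."""
    return Counter(w for w in tokenize(text) if w not in dictionary)

def not_found_rate(text: str, dictionary: set) -> float:
    """Share of running words that the dictionary does not cover."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(not_found_words(text, dictionary).values()) / len(words)

# Hypothetical dictionary and source text, for illustration only.
dictionary = {"the", "vaccine", "was", "delivered", "to", "clinic"}
text = "The vaccine was delivered to the telehealth clinic. Telehealth expands."

missing = not_found_words(text, dictionary)
print(missing.most_common())                      # candidates for new entries
print(round(not_found_rate(text, dictionary), 2))
```

Sorting the missing words by frequency gives the dictionary team a natural priority list, and the rate can be followed over time as a coverage indicator.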

Technology roadmaps

Three technology roadmaps were presented at the TAUS Executive Forum. They represent new approaches to MT that have not been rolled out to business users yet. TAUS likes to monitor these developments closely because we believe that new technologies will find their way to the market fairly soon as the business requirements are changing rapidly. In-depth reports on new products are scheduled in the series of TAUS Technology Focus Reports in the course of this year.

Technology: Language Weaver Presenter: Kirti Vashee, Language Weaver

Translation engine: Statistical MT Application type: Gisting (assimilation) and publishing (dissemination)

Detailed description: Language Weaver is a statistical MT system that is in use at several US military, intelligence and government organizations for information retrieval and gisting usages. The current availability of the Customizer Tool makes the Language Weaver system suitable for implementation in business environments, both for assimilation and for dissemination types of applications. No business implementations are being reported at the moment.

Customization: The Customizer Tool allows Language Weaver to use existing transla-tion memories to tune the core existing MT system to the specific domain of a customer. Language Weaver has also started adding dictionaries and linguistic rules to improve the output quality.

Corpora: The core Language Weaver system is ‘trained’ on large bilingual cor-pora. The size of these bilingual corpora may vary wildly dependent on the language pairs, but they generally need to be in excess of 10 million words. For tuning and customization of the core engine translation memories of at least 250,000 words are needed. However the credo in general for statistical systems is: the more data the better.

Conditions for success: Large volumes of high quality and perfectly aligned parallel data, best with tight domain focus. A test set is needed to measure the impact of new data, followed by iterative refinement.

Measuring MT quality: Language Weaver uses the BLEU score to measure the quality of MT output.

Measuring MT productivity: Existing translation memories help to increase productivity.
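The report names the BLEU score without describing it. As a hedged illustration, the Python sketch below computes a simplified sentence-level BLEU against a single reference translation: clipped n-gram precision for n = 1 to 4, combined by geometric mean and multiplied by a brevity penalty. It is not Language Weaver’s evaluation setup; real BLEU is normally computed over a whole test set, often with several references, and the function names here are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU for one sentence, between 0 and 1."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: a candidate n-gram is credited at most as often
        # as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # this sketch applies no smoothing
        log_precision_sum += math.log(matches / total)
    # The brevity penalty lowers the score of candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_precision_sum / max_n)
```

A candidate identical to its reference scores 1.0, and a candidate that shares no word with the reference scores 0.0. Measuring the impact of new training data then amounts to scoring the same test set before and after retraining.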

Technology: Meaningful Machines
Presenter: Mike Steinbaum, Meaningful Machines
Translation engine: Context-driven MT
Application type: Gisting (assimilation) and publishing (dissemination)

Detailed description: The presentation of Meaningful Machines at the TAUS Executive Forum marks the launch of a new MT system. Although it was originally described as a statistical MT system, the marketers at Meaningful Machines decided on the eve of commercialization to differentiate themselves and label it a ‘context-driven MT’ system. The main difference from classical statistical MT systems is that Meaningful Machines does not require bilingual corpora. Instead the engine is ‘trained’ on monolingual corpora. The required volumes run not to millions but to billions of words, but such monolingual corpora are readily available. Another difference from existing statistical MT systems is that Meaningful Machines trains its engine on much longer word patterns. Statistical systems typically work with sequences of three words (3-grams). Meaningful Machines is not limited to a fixed length and therefore takes the N-gram, with no fixed N, as its characteristic unit. Based on this N-gram approach and gigantic monolingual training corpora, Meaningful Machines can reportedly generate new fluent sentences. The transfer to other languages is then made possible by applying standard dictionaries and using gazetteers to exempt non-translatable items (such as named entities).
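To make the contrast between 3-grams and longer word patterns concrete, the Python sketch below counts n-grams of several lengths in a monolingual text. It shows only the generic counting step; the report does not describe how Meaningful Machines implements its engine, and the function name, parameters and sample sentence are illustrative assumptions.

```python
from collections import Counter


def count_ngrams(tokens, min_n=1, max_n=3):
    """Count every n-gram of length min_n..max_n in a list of tokens."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            counts[tuple(tokens[start:start + n])] += 1
    return counts


# Toy monolingual text; the report speaks of corpora of billions of words.
text = "the patent claim covers the patent holder"
counts = count_ngrams(text.split(), min_n=2, max_n=4)
```

In this toy text the 2-gram ("the", "patent") occurs twice, while every longer pattern occurs once. Longer patterns are rarer, which is consistent with the report’s point that this kind of approach needs very large monolingual corpora.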

Customization: For customization Meaningful Machines may use phrase dictionaries for a particular domain or even translation memories. No commercial implementations are reported.

Corpora: As indicated above no bilingual corpora are needed.

Conditions for success: Delivers confidence indicators on MT feasibility to support the post-editing process.

Measuring MT quality: Meaningful Machines also uses the BLEU score to measure the success of its engine.

Measuring MT productivity: Based on some tests, Meaningful Machines reports a potential cost reduction of 50% and a potential productivity increase of 40%.
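To show what figures of this kind would mean in practice, the Python sketch below applies a fractional cost reduction and productivity gain to a baseline. The baseline numbers are hypothetical and chosen only to make the arithmetic visible; the report gives no baseline, and the function is not part of any vendor tool.

```python
def projected_figures(baseline_cost, baseline_words_per_day,
                      cost_reduction=0.50, productivity_gain=0.40):
    """Apply a fractional cost reduction and productivity gain to a baseline."""
    new_cost = baseline_cost * (1 - cost_reduction)
    new_throughput = baseline_words_per_day * (1 + productivity_gain)
    return new_cost, new_throughput


# Hypothetical baseline: a 100,000 budget and 2,500 words per translator per day.
cost, throughput = projected_figures(baseline_cost=100_000,
                                     baseline_words_per_day=2_500)
```

With this baseline, the budget would halve to 50,000 and daily throughput would rise to about 3,500 words.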

Technology: Lana Consulting
Presenter: Svetlana Sheremetyeva, Lana Consulting
Translation engine: APTrans, a hybrid rules- and data-driven MT engine using an interlingua transfer approach
Application type: Dissemination (publishing) and assimilation (gisting)

Detailed description: The preview presented at the TAUS Executive Forum drew attention to the domain-centric nature of this MT system, which is still under development. APTrans is an MT system or platform for translating patent claims. The system can model all sentences unambiguously in a language-independent interlingua. It has a modular architecture, and new language pairs are easy to integrate. It uses grammar checkers to post-edit output.

Customization: Other MT engines can be integrated into the APTrans platform to expand the range and power of the system.

Corpora: Draws on a 9 million word corpus to model sub-language features of patent claim texts. This language modeling (performed by an analyst) is time-intensive at first, but very effective for delivering quality.

Conditions for success: A large, domain-specific linguistic knowledge base.

Measuring MT quality: The system is not yet in production, but aims for dissemination quality.

Measuring MT productivity: MT should reduce the cost of patent translation by 30 to 60%.

Five most critical conditions for implementing MT

At the end of the user case and technology roadmap presentations, the discussion groups reconvened for a brainstorming session. This time the groups were organized by common perspective: a ‘Patent’ group, a ‘Publishers’ group, a ‘Language Service Providers’ group, a ‘Customer Support’ group and a ‘General’ group. Each group was asked to discuss and agree on the five most critical conditions for implementing MT. The results are listed below.

Patent group

The ‘Patent’ group sees management buy-in as the number one condition because of the magnitude of the MT implementation and customization effort, which has no directly measurable return. ‘Avoid re-inventing the wheel’ points in the direction of increased collaboration between like-minded projects. In most cases patent translation efforts serve the public interest, so sharing and collaboration in this area seem very natural.

1. Get management buy-in
2. Establish good workflow
3. Need for extensive terminology coverage
4. MT must be application-dependent, and end-user centric
5. Avoid re-inventing the wheel

Publishers group

The ‘Publishers’ group seemed to really get on board after two days of discussions: MT must be implemented, but the vendors should take the initiative. Translator resistance to working with MT technology should be ‘attacked’ with an active PR campaign. The difficulty of measuring MT quality, or translation quality in general, is an issue that requires attention. In this respect we refer to the planned TAUS Foundation Report on Different Approaches to Measuring Translation Quality, which is due to come out in May 2006.

1. MT must be easier to implement, and vendors should take the initiative
2. Promote the “Post-editing is fun” message
3. “Chat” communication could be a killer app!
4. Seek wide consensus on quality evaluation

Language Service Providers group

The ‘Language Service Providers’ group would like more reassurance on the return on investment of MT implementation. They would like customers to take the lead and drive the market towards large-scale adoption of the new technology, perhaps in the way TM technology spread through the market when a few big users imposed the use of TM on their vendors. This time, however, it seems that the publishers are pushing the initiative back onto the service providers (see above). Translation vendors would be wise to take a more proactive role this time, stated Jaap van der Meer. After all, translation is their core business. Embracing the promising new technology can put the practitioners in a winning position and will stimulate the product developers to deliver new language pairs and mainstream products.

1. Need ROI to work out the business case
2. Market adoption so that MT becomes a universal service offering
3. Need for more language pairs
4. Developers should deliver mainstream products to drive uptake

Customer support group

The ‘Customer Support’ group has to work more on intuition than on hard ROI data. MT in customer support environments seems to make perfect sense, but it would be good if the business case could be supported by real, hard data on the economic benefits. Could TAUS facilitate this exchange of sensitive data? Good advice and guidance on the optimal implementation scenarios are essential. Perhaps customization efforts could even be shared on a cross-industry basis.

1. MT is only one part of a bigger solution (authoring, TM and workflow)
2. Need for good advice / guidance
3. Work out the business case for data sharing (hard numbers)
4. Easy deployment, minimal customization
5. Need for real user response feedback

General overview group

The ‘Generalist’ group takes a careful approach: prove that MT works on the ‘easy’ languages first. And again: run a PR program to promote the positive sides of working with MT technology to the translator community.

1. Focus first on language pairs that work well in MT
2. Need for good domain terminology
3. Management buy-in
4. Make MT more fun for professional translators
5. Good ROI figures to justify investment

In general this brainstorming session resulted in a number of good tips and ideas as well as a very healthy debate between the publishers and the practitioners about who should take the initiative for a large-scale adoption of the MT technology.

TAUS will take the conclusions from this session to heart and intends to deliver on many of the suggestions and requests. See last section of this report: TAUS action items.

Looking back, looking forward

Presented by Jaap van der Meer.

Companies and institutions are increasingly confronted by the localization dilemma: what to translate and what not to translate? The information pyramid keeps growing and the number of required languages is expanding. At the same time the paternalistic publisher-centric model is making way for a user-centric model. The user is taking control by surfing the Internet and retrieving just the information needed, when it is needed.

[Figure: Localization dilemma: “What to translate and what not to translate?” Content types shown: UI; IFU/Labels; Brochures, Web sites; Help, Documentation; Catalogs/Training; Customer support, Knowledge base; Intranet/Extranet; Email/Communications; Internet. Treatment labels: yes, maybe, MT gisting, self-service.]

In response to these changing market conditions, TAUS likes to imagine native language support as a utility embedded in every customer support environment. However, translation today is still a slow and costly process. Fragmentation of supply and demand in translation services and technologies has prevented the language industry from implementing advanced technology architectures on a broad scale. Buyers of services tend to follow their vendors and proceed by trial and error. Service vendors lack the scale, or the direction from their customers, to make a difference. Product developers are locked into the early-adoption phase of the market.

Because end-users, citizens and patients require pervasive native language support, translation automation is moving up the management agenda. TAUS brings together corporate and institutional users, practitioners and developers to discuss and benchmark user cases, best practices and technology roadmaps. Forums and collaboration networks help to overcome industry fragmentation and maximize the returns from translation automation.

Evolution of the translation market

Practitioners of translation services have seen their activities change dramatically over a period of around fifty years. From the simple, straightforward translation of emigrants’ diplomas and of user instructions for the first refrigerators and transistor radios in the fifties, services became much more complex in the eighties, when software publishers needed more than just the translation of user manuals. Complete product localization involved new services such as testing and project management. The first translation technology, translation memory, was introduced, which caused professional translators to change their skills and working environment. With the move to globalization at the end of the nineties, the translation industry started to learn about content management integration, standards like XML and web site translation. Now not just the manual or the product, but the whole enterprise needed to be adapted to the cultures and languages of the global marketplace. Buying criteria changed over the years from the lowest cost to the best quality and now to the shortest time, or any mix of these three. Desktop translation memory tools no longer suffice to meet these stringent customer needs. Translation workflow software is being introduced to automate supply chains and to support the central storage and retrieval of linguistic data.

[Figure: “Translation out of the wall”. Buyers’ knowledge: user cases. Practitioners’ experience: best practices. Developers’ insights: product roadmaps.]

Changes follow one another at an ever faster pace. With the emergence of the new generation of the Internet, the translation industry must be ready for what we call the transmutation phase. In this phase the network is the center of all information. The “environment”, or ecosystem in general, is creating this transmutation, as technology, market needs and computing power converge in a complex configuration centered on ubiquitous information and decentralized power. Translation, or native language support, is now embedded in everything we do.

[Figure: Evolution of the translation market along a time axis marked 1950, 1985, 2000 and 2005.
TRANSLATION (booklets): glossary; proofreading.
LOCALIZATION (products): TM tools; linguistic verification; functional testing/project tracking; vendor management/quality assurance.
GLOBALIZATION (enterprise): GMS, CRM, CMS integration; workflow; SGML-XML standardization.
TRANSMUTATION (ecosystem): ontology, taxonomy; search, MT; customer self-service; user-driven translation.
Further labels: embedded feature, infrastructure; cost of service, quality of service, speed of service; “must”, revenue, enterprise.]

TAUS action items

TAUS will focus on, among others, the following action items for the remainder of 2006.

TAUS Foundation Report on Different Approaches to Measuring Translation Quality

Almost all discussions about deploying MT technologies seem to get stranded in a dispute about the quality of MT output. But there is no single, objective way of measuring that quality (or the quality of the work of professional translators, for that matter). What is the norm for good-quality translation, and how can we measure the required translation quality as exactly as we measure cost and delivery time, the two other buying criteria of translation services? In this report we will survey the different methodologies and tools for measuring translation quality and come up with recommendations for a common approach. We welcome input from TAUS members on this important topic.

TAUS User Case Reports

Several User Case reports are planned for this year to document the implementation and use of not only MT but the whole range of authoring, translation and globalization technologies. These reports will help meet the need for more validation and transparency on economic benefits.

Collaboration Networks

In addition to the horizontal benchmarking of user cases and best practices that we have experienced in the TAUS Executive Forums so far, TAUS plans to add a second dimension to the discussions by establishing ‘Collaboration Networks’ for users and practitioners that are active in the same vertical industry domain. For instance, we have already started a collaboration effort for the CAD/CAM software companies within the TAUS community to share and possibly unify terminology. More details and proposals on the TAUS Collaboration Networks will be shared later this spring.

Executive Forum Beijing – September 21-22

An Executive Forum is scheduled in China for September 21-22. The focus of this meeting will be to unveil more user cases and technology roadmaps for translation automation for the Asian languages. The need to translate into Asian languages as well as from Asian languages is increasing rapidly. TAUS would like to build and expand its membership in Asia to be able to monitor the technology developments there. The Program Committee for this Forum so far consists of:

• Professor Yuan Qi, CCID Transtech
• Vic Dickson, Transco

We welcome additional members for the Program Committee as well as suggestions for user cases, technology roadmaps or best practices to be presented.

Executive Forum Brussels – November 23-24

An Executive Forum is scheduled in Brussels for November 23-24. The focus of this meeting will be to establish more cross-industry exchange of intelligence, among others between business and government, but also between more vertical industry domains, such as medical, telecommunications, industry automation and business software. Measuring translation quality and unifying terminology will be two of the key areas of interest within the scope of translation automation. The Program Committee for this Forum consists of:

• Dieter Rummel, Translation Centre for the Bodies of the European Union
• Erwin Pijck, Lucent Technologies
• Thomas Hecht, Siemens

We welcome additional members for the Program Committee as well as suggestions for user cases, technology roadmaps or best practices to be presented.

About TAUS:

TAUS is a networking community for corporate and institutional users, developers and practitioners of authoring, translation and localization technologies and services. By sharing best practices and intelligence in cross-industry meetings and online forums the members aim at advancing the adoption of translation automation technologies.

TAUS Reports cover:
• Different approaches: Introductions to the key areas of translation automation.
• Best practices: Overview of best practices in applying technologies. Best practice reports are regularly updated.
• User cases: Analyses of processes in member and non-member companies.

For more information on TAUS, see: www.translationautomation.com.

Replies, questions and observations can also be sent to:

Email: [email protected]

Director: Jaap van der Meer

Address: Oosteinde 9-11, 1483 AB De Rijp, Netherlands, tel. +31 299 672028

©2006 by TAUS B.V. All rights reserved.

TAUS Reports are published by TAUS B.V. exclusively for members. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means.
