Wednesday, 4 June / 09:10 – 09:40
Terminology as a Service
Indra Sāmīte, Tilde
TaaS Workshop 2014 4 June, Dublin (Ireland)
The research within the project TaaS leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), grant agreement no. 296312
TAUS TaaS Workshop
Dublin / 04.06.2014.
* What does a terminologist do?
____________________________________ TAUS TaaS Workshop | Dublin | June 4, 2014
embracing innovation
Terminology field
SaaS Software as a Service
PaaS Platform as a Service
TaaS Terminology as a Service
* Tilde, Latvia (Coordinator)
* TAUS, Netherlands
* Kilgray, Hungary
* University of Cologne, Germany
* University of Sheffield, UK
TaaS Partners
* Industry & Research Collaboration project
* Supported by EU 7th R&I Framework Programme
* Resulted in TaaS cloud-based services
* Accessible for free at the online portal
termunity.com
TaaS
* Ioannis Iakovidis, Interverbum Technology
* Uwe Muegge & Carl Yao, CSOFT International
* Luigi Muzii, sQuid
* Maria Pia Montoro, Intrasoft International, Terminology Blogger at WordLo
Invited speakers
* Short warm-up survey
* Panel of speakers
* Reflections from the audience
Discussion from 11:55
Welcome to the Cloud! Terminology as a Service
Andrejs Vasiļjevs, Indra Sāmīte
Tilde
TAUS TaaS Workshop / Dublin / 04.06.2014.
* Language technology developer
* Translation and Terminology systems
* >350 000 users
* Localization service provider
* Leadership in smaller languages
* Offices in Riga (Latvia), Tallinn (Estonia) and Vilnius (Lithuania)
* 130 employees
* Strong R&D team
* 5 PhDs, plus engineers and students; 80+ research papers
* Coordinator of several EU industry-academic R&D projects
About Tilde
EuroTermBank Portal
Microsoft Language Portal
ECDC Terminology Server
* Term identification in the source text
* Consulting online databases and local files for translation equivalents
* Creating and maintaining terminology glossaries
* Sharing term glossaries and involving others in their polishing
* Structuring data in the industry standard formats
* Integrating term glossaries in CAT and other productivity tools
* Keeping terminology up to date
* etc.
Complexity of terminology work
TaaS User Needs Survey Results: Importance of terminology work
* Very important: 43.5%
* Quite important: 39.9%
* Less important: 14.8%
* Not important: 1.8%
TaaS User Needs Survey: willingness to share
Yes, provided that…
* Joint contribution to the DB: 24.9%
* Access control: 19.2%
* Legal aspects: 14.2%
* External quality control: 11.4%
* Little effort: 7.6%
* Anonymity: 6.0%
* Other: 16.7%
No, because…
* Reasons: Legal restrictions, Poor quality/Lack of time, Own asset, Risk of misunderstanding
* Chart values: 48.6%, 22.0%, 16.5%, 8.3%, 4.6%
Chart totals: 60.5%, 39.5%
* Simplify the process for language workers to prepare, store and share task-specific multilingual term glossaries
* Provide instant access to term translation equivalents and translation candidates for professional translators
* Improve quality of machine translation systems by dynamic integration of terminology data
TaaS Mission
Cloud-based platform that automates terminology work for human and machine use
termunity.com
* Automatic extraction of monolingual term candidates from user-uploaded documents
* Automatic retrieval of translation equivalents from different public and industry terminology databases
* Translation candidate acquisition from multilingual web data
* Facilities for users to clean up automatically acquired terminological data
* Data sharing and integration facilities through APIs and export tools
Key services of TaaS
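The first key service, automatic extraction of monolingual term candidates, can be illustrated with a deliberately simple frequency-based sketch. The slides do not describe the extraction method TaaS actually uses, so everything here is an assumption for illustration: the function name `extract_term_candidates`, the tiny stopword list, and the n-gram counting approach itself.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a TaaS resource).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or",
             "to", "is", "are", "by", "with", "from"}


def extract_term_candidates(text, max_len=3, min_freq=2):
    """Return (candidate, frequency) pairs, most frequent first."""
    counts = Counter()
    # Split on punctuation so that n-grams do not cross clause boundaries.
    for segment in re.split(r"[.,;:!?()\n]+", text.lower()):
        # Words made of letters, optionally joined by internal hyphens.
        tokens = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", segment)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # A candidate may not start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    ranked = [(term, freq) for term, freq in counts.items() if freq >= min_freq]
    # Order by frequency, then longer candidates first, then alphabetically.
    ranked.sort(key=lambda tf: (-tf[1], -len(tf[0].split()), tf[0]))
    return ranked


sample = ("Machine translation improves with terminology. "
          "Machine translation systems use term glossaries. "
          "Term glossaries help machine translation.")
candidates = extract_term_candidates(sample)
```

A production extractor would normally add linguistic filtering (part-of-speech patterns, lemmatization) and statistical ranking; the sketch only shows the shape of the task: documents in, ranked candidates out, ready for a user to refine and approve.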
TaaS Services
Term identification and annotation
* Support for industry standard formats
* Integration into CAT and productivity tools
* API to integrate TaaS services into various software applications
Integration
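As an illustration of what support for an industry standard format can look like, the sketch below writes a small glossary as a TBX-style XML document. The slides do not name the formats TaaS supports, so the choice of TBX is an assumption, as is the function name `glossary_to_tbx`. The output is simplified: it omits the `martifHeader` and other elements a conformant TBX file requires.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def glossary_to_tbx(entries, source_lang="en"):
    """Serialize entries such as [{"en": "term", "lv": "termins"}].

    Each entry maps language codes to one term per language.
    Returns a simplified TBX-style XML string (no header, no validation).
    """
    root = ET.Element("martif", {"type": "TBX", XML_LANG: source_lang})
    body = ET.SubElement(ET.SubElement(root, "text"), "body")
    for index, entry in enumerate(entries, start=1):
        term_entry = ET.SubElement(body, "termEntry", {"id": f"e{index}"})
        # One langSet per language, in a stable (sorted) order.
        for lang, term in sorted(entry.items()):
            lang_set = ET.SubElement(term_entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.tostring(root, encoding="unicode")


tbx_text = glossary_to_tbx([{"en": "term", "lv": "termins"}])
```

Keeping one concept per `termEntry` with one `langSet` per language is what lets CAT tools import the same glossary regardless of which language pair the translator works in.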
TaaS in the service for MT
[Diagram: Online Terminology Services supporting SMT in two stages, Training and Translation]
Diagram elements: Parallel corpus; Monolingual corpus; Bilingual term collections; Monolingual Term Extraction; Bilingual Term Extraction; SMT System Training and adaptation; Trained SMT Model; Input Text for Translation; Online Translation Service; Translated Text
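One way to picture dynamic integration of terminology into MT is to mark known terms in the input text, together with their approved translations, before the text reaches the translation service. The sketch below is an assumption for illustration only: the function name `annotate_terms`, the `<term translation="...">` markup, and the sample Latvian equivalents are not taken from the slides, and real systems also have to handle inflected forms.

```python
import re
from html import escape


def annotate_terms(text, glossary):
    """Wrap known source terms in <term> tags carrying the glossary translation.

    glossary maps lowercase source terms to target-language equivalents.
    """
    if not glossary:
        return text
    # Try longer terms first so "machine translation" wins over "translation".
    ordered = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )

    def tag(match):
        surface = match.group(0)
        # Escape the translation so it is safe inside an attribute value.
        target = escape(glossary[surface.lower()], quote=True)
        return f'<term translation="{target}">{surface}</term>'

    return pattern.sub(tag, text)


glossary = {"machine translation": "mašīntulkošana", "translation": "tulkojums"}
annotated = annotate_terms("Machine translation needs a translation glossary.", glossary)
```

Matching the longest term first avoids tagging only the inner word of a multi-word term, which would otherwise push the wrong translation candidate to the MT system.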
TaaS Architecture
[Diagram: layered architecture]
Presentation Layer: Web Page UI, Public API
Application Logic Layer: Terminology collection management, User management, Terminology collection search, Terminology collection creation
Data Storage Layer (Shared Term Repository): DB, File Store
High-performance Computing (HPC) Cluster: HPC frontend, SGE, CPU nodes
External clients and systems: Web Browsers, External TDBs, CAT tools, MT
Connection labels: html, http/https, https, REST
[Diagram: term extraction workflows and modules running on the HPC cluster]
Term extraction workflows: Full collection creation workflow, Monolingual collection creation, Translation candidate extraction, ....
Modules: Result processing, Collection Importer, Marked Text enrichment, Text tagging with terms, Statistical DB acquisition, Statistical DB feeding, Bilingual Term Extraction System, Parameter retriever, Translation lookup, ETB & STR, IATE, TAUS API, Statistical DB, Collection merger, Term extraction, TXT extractor, TWSC, Kilgray TermExtractor, Collection creator, Term normalizer
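The workflow diagram lists a Translation lookup module that consults several sources and a Collection merger module. The slides do not say how merging works, so the following is only a sketch under assumptions: the function name `merge_candidates`, the input shape, and the ranking rule (candidates confirmed by more sources come first) are all hypothetical. The source names in the example are simply labels borrowed from the diagram.

```python
from collections import defaultdict


def merge_candidates(per_source):
    """Merge {source_name: [(source_term, target_term), ...]} into one collection.

    Returns {source_term: [(target_term, [sources...]), ...]}, with translation
    candidates confirmed by more sources listed first.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for source_name, pairs in per_source.items():
        for source_term, target_term in pairs:
            # Normalize case and whitespace so duplicates collapse into one entry.
            key = " ".join(source_term.lower().split())
            merged[key][target_term.strip()].add(source_name)
    collection = {}
    for term, targets in merged.items():
        ranked = sorted(targets.items(), key=lambda item: (-len(item[1]), item[0]))
        collection[term] = [(target, sorted(sources)) for target, sources in ranked]
    return collection


lookups = {
    "ETB": [("Term  Bank", "terminu banka")],
    "IATE": [("term bank", "terminu banka"), ("term bank", "terminu datubāze")],
}
merged_collection = merge_candidates(lookups)
```

Recording which sources proposed each candidate keeps the provenance visible, which is useful when a user later refines and approves the translations.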
Research
Development
Usage
Focus areas
* Term extraction
* Collection of domain specific multilingual corpora
* Max(FTC)
* Usability
* Outreach
* Sustainability
* Quality
* Performance
* Scalability
* Interoperability
Indra Sāmīte Business Development Director
Tilde
TaaS in action
The Modern Translator
* Search for individual terms in various sources
* Identify term candidates in your documents and extract them automatically
* Automatically look up translation candidates in various sources
* Refine and approve terms and their translations
* Share your terminology with other users
* Collaborate with colleagues & team
* Use your terminology in other working environments
Features
Simple Search
* Search for terms in various sources
Identify & extract
Automated Lookup
Refine & Approve
Share & Collaborate
CAT integrated
Machine translation by Tilde
* In the cloud
* Do it yourself or Custom
* CAT integrated
* Terminology ready
* Vast database of resources for training
* Productivity boosting
LetsMT
MT friendly
* Trusted terminology resources
* Sharing new terminology data
* Reuse of terminology resources
* Quality improvement of MT systems
* Efficient work patterns
* Increase competitiveness
* Translation quality improvement
Strategic impact
* Free access to online services
* Integrated with memoQ 2014
* Integrated with OmegaT (July’14)
Sign up now! termunity.com
Thank You!
The research within the project TaaS leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), grant agreement n° 296312
Contact: [email protected]
termunity.com @TermServ on Twitter
Terminology Services group on LinkedIn