Instant speech translation 10BM60080 - VGSOM



Instant speech translation





I Year M.B.A

VGSOM, IIT Kharagpur


1. Abstract
2. Instant Speech Translation – Eliminating Language Barriers
3. System Requirements
3.1. Speech Recognition
3.2. Language Parsing
3.3. Translation
4. Applications and their Business Potential
4.1. Mobile Applications and Services
4.2. Voice Interface Devices with Local Language support
4.3. Data Entry Applications – in Multiple Languages
4.4. e-Learning
4.5. Business Applications
5. Key Players
6. Challenges Ahead
7. Conclusion
8. References

1. Abstract

With the current pace of globalization, every industry needs to look beyond geographical borders. Indian IT firms serve Japanese, Korean, and other foreign clients, and they invest heavily in foreign-language training programs. An application that provides instant translation would not only cut these costs but also help gather requirements more precisely and in less time. Instant speech translation [IST] has wide applications in other industries as well. In a country like India, where numerous vernacular languages are in use, IST can be used in many ways in day-to-day life. There is huge potential for IST applications on mobile phones. Major players such as Google, Microsoft, and IBM have already come up with prototypes of such applications; Google Translate is one early example. Many more such applications will be in our gadgets soon. This paper elaborates on a few such applications and their business potential.

2. Instant Speech Translation – Eliminating Language Barriers

Internet and mobile services have reached even remote villages. Rural markets are now considered significant in countries like China and India. Breaking language barriers will further open up these markets for international business. Knowledge, wherever and in whatever form it exists, should be used for the growth of humanity. We should create opportunities for those who want to learn and share knowledge in their own native languages. Instant speech translation will create a platform for them. This could bring to light much that is not yet known to the wider world.

In “The Hitchhiker’s Guide to the Galaxy”, the Babel fish, a fictitious animal, performs instant translation when placed in the ear. Suppose such an application were available on a mobile phone: when I call a person in Japan and speak in English, the application would translate my speech into Japanese and then transmit it through the telecom service provider. This would eliminate language boundaries and create a truly connected world.

3. System Requirements

“We think speech-to-speech translation should be possible and work reasonably well in a few years’ time. Clearly, for it to work smoothly, you need a combination of high-accuracy machine translation and high-accuracy voice recognition, and that’s what we’re working on. If you look at the progress in machine translation and corresponding advances in voice recognition, there has been huge progress recently.”

- Franz Och, Google’s head of translation services

To develop an instant speech translation application, we need robust speech recognition and machine translation systems. The following figure depicts the basic blocks of an instant speech translation system.

Fig. Basic Functional Blocks of Instant Speech Translation
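The chain of blocks can be sketched in a few lines of Python. This is only an illustrative outline, assuming the three stages covered in Sections 3.1 to 3.3 run one after another. The function names, the `Stage` type, and the tiny English-to-Spanish word table are all hypothetical stand-ins, and the "audio" here is already text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stage:
    """One block of the pipeline: a name plus a text-to-text function."""
    name: str
    run: Callable[[str], str]


def recognize(audio_as_text: str) -> str:
    # Stand-in for speech recognition: the input is already a transcript.
    return audio_as_text.lower().strip()


def parse(text: str) -> str:
    # Stand-in for language parsing: split into word units and rejoin.
    return " ".join(text.split())


# Hypothetical word table, for illustration only.
TOY_DICTIONARY: Dict[str, str] = {"hello": "hola", "friend": "amigo"}


def translate(text: str) -> str:
    # Stand-in for machine translation: word-by-word lookup.
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.split())


PIPELINE: List[Stage] = [
    Stage("speech recognition", recognize),
    Stage("language parsing", parse),
    Stage("translation", translate),
]


def run_pipeline(utterance: str) -> str:
    """Pass the utterance through every stage in order."""
    result = utterance
    for stage in PIPELINE:
        result = stage.run(result)
    return result
```

Calling `run_pipeline("  Hello   Friend ")` returns `"hola amigo"`. Each stage can be replaced independently, which is the main point of drawing the system as separate blocks.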

3.1. Speech Recognition

Speech-recognition and dictation technology has made stunning leaps forward in recent years, although it is not yet perfect. Word Error Rate (WER) has come down drastically in the recent past.

Fig. Word Error Rate of Speech Recognition Systems over Years

Source - Communications of the ACM
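For readers unfamiliar with the metric, WER is commonly computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The sketch below uses that common definition, which this paper does not itself spell out; the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the output "the cat sat on mat" gives one deletion over six reference words, a WER of about 0.17.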

Speech recognition has achieved good usability, and there has been a sudden surge in speech-controlled devices. Even Microsoft Windows Vista had speech recognition capabilities; the feature turned out to be a failure, but basic commands did work in it. A system that merely listens and guesses is not going to take this forward.

Robust speech recognition technology is a crucial part of instant speech translation. The main problem systems face is understanding the nuances of a user’s enunciation and voice patterns. A system used by the same person over a period of time can learn these patterns and reduce its recognition error rate. Mobile phones have an upper hand over other gadgets here, since a mobile phone is mostly used by only one user, and users can hardly avoid using it. Mobiles may also soon recognise a user’s natural free-style speech. Speech recognition systems can be customized to a particular user by having the user utter a predefined set of commands or words. This helps the system recognize its master’s voice patterns. In the early stages of development, this sort of customization could be done with the help of a professional.

3.2. Language Parsing

Programs cannot parse human sentences as easily as they parse mathematical expressions. There is substantial ambiguity in the structure of human language, so some linguistic analysis is needed to extract the relevant information. The language parser splits the raw text into understandable word units, selects the correct form and class for each word that can have more than one interpretation, and identifies the head words of a sentence. The information analysed by the language parser is passed to the machine translation engine for further tasks.
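The first two parser steps, splitting text into word units and choosing a class for ambiguous words, can be illustrated with a toy example. The mini-lexicon and the single context rule below are assumptions made purely for illustration; real parsers use far richer statistical or grammatical models, and head-word identification is left out here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon: each word maps to the classes it may take.
LEXICON: Dict[str, List[str]] = {
    "i": ["PRON"],
    "the": ["DET"],
    "a": ["DET"],
    "to": ["PART"],
    "want": ["VERB"],
    "read": ["VERB"],
    "ticket": ["NOUN"],
    "book": ["NOUN", "VERB"],  # ambiguous: "a book" vs "to book"
}


def tokenize(raw: str) -> List[str]:
    """Split raw text into lower-cased word units."""
    return re.findall(r"[a-z']+", raw.lower())


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Pick one class per word, using the previous word's class as context."""
    tagged: List[Tuple[str, str]] = []
    for word in tokens:
        options = LEXICON.get(word, ["UNK"])
        if len(options) == 1:
            choice = options[0]
        else:
            previous = tagged[-1][1] if tagged else None
            # Toy rule: after a determiner prefer NOUN, otherwise prefer VERB.
            if previous == "DET" and "NOUN" in options:
                choice = "NOUN"
            elif "VERB" in options:
                choice = "VERB"
            else:
                choice = options[0]
        tagged.append((word, choice))
    return tagged
```

With this rule, "book" is tagged as a verb in "I want to book a ticket" and as a noun in "I read the book", which is the kind of disambiguation the parser has to perform before translation.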

A set of protocols should be defined for communication between different languages. For example, Indian languages generally use a SUBJECT-OBJECT-VERB pattern, whereas English generally uses a SUBJECT-VERB-OBJECT pattern. The language parser’s role is to provide a parsed language stream that translators can easily interpret.
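A word-order rule of this kind can be expressed as a simple reordering of labelled constituents. The sketch below assumes the parser has already marked each chunk as subject (S), object (O), or verb (V); the `reorder` helper and the English glosses are hypothetical and only illustrate the idea.

```python
from typing import List, Tuple

Constituent = Tuple[str, str]  # (role, text), where role is "S", "O" or "V"


def reorder(constituents: List[Constituent], target_order: str) -> List[Constituent]:
    """Rearrange labelled constituents into the target language's order."""
    by_role = {role: text for role, text in constituents}
    return [(role, by_role[role]) for role in target_order if role in by_role]


# A subject-object-verb sentence, shown with English glosses.
sov_sentence = [("S", "Ram"), ("O", "an apple"), ("V", "ate")]
svo_sentence = reorder(sov_sentence, "SVO")
english = " ".join(text for _, text in svo_sentence)
```

Here `english` becomes "Ram ate an apple". Real sentences need many more roles and rules, but the parsed stream handed to the translator would carry this kind of structural labelling.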

3.3. Translation

The machine translator converts the parsed input-language stream into a well-defined output-language stream. Its translation abides by the set of protocols defined for communication between the given pair of languages.

Fig. Machine Translation

4. Applications and their Business Potential

IST applications have great business potential. Various players are nearly ready to roll out these services on various types of gadgets.

4.1. Mobile Applications and Services

IST as a service:

Instant speech translation will have many applications on mobile phones. It is practically impossible for a single IST service provider to cover all languages and their various colloquial forms. Hence the service provider can expose Application Programming Interfaces (APIs) so that interested third parties can develop language support and sell it back to the IST service provider. This will become a viable business model once regional-language enthusiasts get involved. The IST service provider can bill users based on usage. Such services can be launched in collaboration with telecom service providers.

Fig. A Model of IST Services on mobile
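To make the service model concrete, the sketch below shows one way a provider might let third parties plug in language support and then bill users by usage. Everything here is hypothetical: the `ISTService` class, its method names, the per-word rate, and the placeholder translator are illustrative assumptions, not an existing API.

```python
from typing import Callable, Dict, Tuple

Translator = Callable[[str], str]


class ISTService:
    """Toy IST provider: third-party language packs plus per-word billing."""

    def __init__(self, rate_per_word: float) -> None:
        self.rate_per_word = rate_per_word
        self._packs: Dict[Tuple[str, str], Translator] = {}
        self._words_used: Dict[str, int] = {}

    def register_pack(self, source: str, target: str, translator: Translator) -> None:
        # Entry point a third-party developer would call to add a language pair.
        self._packs[(source, target)] = translator

    def translate(self, user: str, source: str, target: str, text: str) -> str:
        translator = self._packs.get((source, target))
        if translator is None:
            raise KeyError(f"no language pack for {source}->{target}")
        # Meter usage by the number of words submitted.
        self._words_used[user] = self._words_used.get(user, 0) + len(text.split())
        return translator(text)

    def bill(self, user: str) -> float:
        """Amount owed by the user for all words translated so far."""
        return self._words_used.get(user, 0) * self.rate_per_word


service = ISTService(rate_per_word=0.5)
# Placeholder "translator" that only upper-cases text, standing in for a real pack.
service.register_pack("en", "xx", str.upper)
reply = service.translate("asha", "en", "xx", "hello there")
```

After this call `service.bill("asha")` is 1.0, since two words were metered at 0.5 each. A telecom partner could sit in front of `translate` and add the charge to the user's phone bill.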

IST as a product:

These services can also be packaged into a product. However, an application that supports near-perfect translation would be heavy. So in the initial stages, language packs chosen by the user can be bundled into a product and sold to the user.

Fig. Users interacting through an IST application on mobile

The service model will suit Indian languages, and the product model will suit international languages like Japanese. The service model will facilitate the wide spread of these applications and will also bring various players into the market.

IST applications can also be used on other types of gadgets such as the iPod and iPad. A few basic applications are already available in the App Store, e.g. Jibbigo Voice Translation.

Fig. Screenshot of Jibbigo Application on iPod

IST Development Standards

To facilitate easy development and learning, a set of standards needs to be established, similar to HTML in web design. Just as XML and JSON are used for machine-readable data sharing, VoiceXML (VXML) can be used for these types of applications.
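As a rough illustration of what such a standard looks like, the sketch below builds a minimal voice dialog document in Python. It assumes the W3C VoiceXML 2.0 vocabulary (`vxml`, `form`, `field`, `prompt`) is what is meant here; the helper name and the prompt text are made up, and a deployable document would also need a grammar for the field.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_dialog(field_name: str, prompt_text: str) -> str:
    """Return a minimal VoiceXML-style document with one spoken prompt."""
    ET.register_namespace("", VXML_NS)  # serialise with a default namespace
    root = ET.Element(f"{{{VXML_NS}}}vxml", {"version": "2.0"})
    form = ET.SubElement(root, f"{{{VXML_NS}}}form")
    field = ET.SubElement(form, f"{{{VXML_NS}}}field", {"name": field_name})
    prompt = ET.SubElement(field, f"{{{VXML_NS}}}prompt")
    prompt.text = prompt_text
    return ET.tostring(root, encoding="unicode")


document = build_dialog("destination", "Which station are you travelling to?")
```

The resulting markup describes the dialog rather than the speech engine, so the same document could in principle be served by different IST back ends.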

4.2. Voice Interface Devices with Local Language support

Voice interface devices that support local languages will soon be in use. Consider, say, local residents interacting with a railway information kiosk by speaking their local language. Instant speech translation will play a vital role in these types of interfaces. IST applications can sit at the front end of such devices. This will also take less time to resolve a query than a traditional key-entry enquiry system. Most voice-driven applications currently support English, as does the Windows 7 operating system. An IST application at the front end can translate local-language speech input into English, which can then be processed by the speech recognition systems supported by various operating systems.


Fig. Various blocks in a railway information kiosk that supports regional languages through speech
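The kiosk flow can be sketched as a chain of small lookups: local-language input is translated to English by the IST front end, turned into a command, and then answered by the kiosk's normal processing. The phrase table (romanised Hindi), the command name, and the timetable reply below are all made-up placeholders; a real kiosk would take speech rather than typed text.

```python
from typing import Dict, Optional

# Hypothetical phrase table standing in for the IST front end.
PHRASE_TABLE: Dict[str, str] = {"agli gaadi kab hai": "when is the next train"}

# Hypothetical mapping from English requests to kiosk commands.
COMMANDS: Dict[str, str] = {"when is the next train": "NEXT_TRAIN"}

# Made-up reply standing in for the kiosk's normal processing.
RESPONSES: Dict[str, str] = {"NEXT_TRAIN": "Next train departs at 10:15"}


def ist_front_end(local_utterance: str) -> Optional[str]:
    """Translate a local-language request into English, if known."""
    return PHRASE_TABLE.get(local_utterance.lower().strip())


def generate_command(english_text: Optional[str]) -> Optional[str]:
    """Map the English request to a command the kiosk already understands."""
    if english_text is None:
        return None
    return COMMANDS.get(english_text)


def kiosk_answer(local_utterance: str) -> str:
    """Run the full chain and fall back to an apology when it fails."""
    command = generate_command(ist_front_end(local_utterance))
    if command is None:
        return "Sorry, request not understood"
    return RESPONSES[command]
```

Because the IST layer only sits in front, the existing English command handling stays unchanged, which is the appeal of this design for kiosks already in service.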

4.3. Data Entry Applications – in Multiple Languages

IST applications can help with data entry in multiple languages. This could assist in translating legal documents into various languages. We have witnessed many court proceedings being delayed for lack of documents in regional languages. Our government also invests a lot in translating documents into regional languages. In the years to come, Microsoft Word may offer options to view translated versions while typing. This could cut down the cost and time involved in such activities.

4.4. e-Learning

Advances in computing and bandwidth have brought the benefits of traditional classroom education into distance learning environments. IST will take this a step further by removing the language barriers that impede the sharing of ideas and knowledge. The figure below depicts the schema of an e-classroom that uses IST.

(Blocks of the railway information kiosk figure from Section 4.2: local-language speech input, IST application, command/query generator, normal processing done in a railway information kiosk.)


Fig. IST Applications supporting Distance Learning in Various Languages

IST applications could be used in webcasting in a similar way.

4.5. Business Applications

IST applications could also help business enterprises interact with customers located across different geographies. IST will help in understanding customer requirements in a short span of time.

Users’ contributions to IST applications are crucial. Users can suggest improvements to the translations the application provides. Credits can be given to regular users who provide valuable suggestions. This will encourage local participation, which would ultimately improve the quality of service provided by IST applications.

The applications of IST discussed here are just the tip of the iceberg. We will see many more such applications once IST is usable in real time. IST applications could then be extended to sensitive areas such as health care and defence.

5. Key Players

Google was the first company to announce that it was working on speech-to-speech translation for mobile phones. The latest translation app on Google’s Android platform is Babylon, which gives dictionary results in 75 languages as well as full-text translation in over 12 languages. Apple is working with IBM to roll out a speech-to-speech translator for iPhones. IBM and Apple are already working closely on a few applications that will run on the iPhone and iPad.

IBM has been working on translation software and machine translation for years. In fact, it created MASTOR and the Statistical Machine Translation (SMT) technology that many other translation applications use.

Microsoft has built-in speech recognition support in its operating systems. It recently demonstrated German-English translation of a conversation between two Microsoft employees. It has made no official announcements on projects pertaining to instant speech translation.


Videos of instant speech translation applications by other major players such as AT&T, NEC, and ATR are available on YouTube. Nespole, Babylon, Verbmobil, and MATRIX are a few well-known speech translation systems developed by players in this field. Extensive research projects are under way to improve the usability of speech translation systems. PDA manufacturers could collaborate with these application developers to accelerate the projects, which would also help them gain an upper hand over their competitors.

6. Challenges Ahead

Only a system that works well in a real-time environment will be successful in the long run. Numerous hurdles need to be crossed to reach a perfect real-time IST system. One is high-accuracy speech recognition, which depends heavily on the quality of the input speech. Acoustic degradation caused by additive noise is an obstacle to reaching the desired accuracy. In real use, a user is not going to run IST applications in a noise-free environment. Hence an IST application should be intelligent enough to separate the user’s voice from the surrounding noise.
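How badly additive noise degrades a signal is often summarised by the signal-to-noise ratio (SNR) in decibels. The sketch below assumes the usual additive model, where the recorded sample is the clean sample plus a noise sample, and uses short hand-made sample lists rather than real audio; neither the model nor the numbers come from this paper.

```python
import math
from typing import List, Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average squared amplitude of a signal."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(x * x for x in samples) / len(samples)


def snr_db(clean: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise power is zero, SNR is unbounded")
    return 10 * math.log10(mean_power(clean) / noise_power)


def add_noise(clean: Sequence[float], noise: Sequence[float]) -> List[float]:
    """Additive noise model: recorded sample = clean sample + noise sample."""
    return [c + n for c, n in zip(clean, noise)]


clean_speech = [1.0, -1.0, 1.0, -1.0]
street_noise = [0.1, -0.1, 0.1, -0.1]
recorded = add_noise(clean_speech, street_noise)
```

In this toy case the noise has one hundredth of the signal's power, so `snr_db(clean_speech, street_noise)` is about 20 dB. Recognition accuracy generally falls as this figure drops, which is why noise handling matters for mobile use.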


IST applications are also expected to become intelligent enough to capture the user’s mood. A monotonous voice from an IST application will soon bore the user. A customisable voice would make these applications more expressive and friendly. Adding phonemes to the computerised voice will bring it nearer to a human voice.


Industry should work in collaboration with research communities to resolve these hurdles and achieve human-like performance.

7. Conclusion

Speech and text translation applications are already used in a variety of forms on a number of devices. To attain human-like performance, we must continue to invest in research. Along with speech, other sensory user inputs can be integrated with IST applications to attain human-like performance. Once that is achieved, instant speech translation will spread to devices such as televisions. It would not be a surprise if text on the web is replaced by audio and video in the future “glocal” world.

8. References

1. “Enhancing Global and Synchronous Distance Learning and Teaching by Using Instant Transcript and Translation” by Ivan Ho, Hajime Kiyohara, Akira Sugimoto, and Kazuo Yana, Hosei University Research Institute, California.








