Speak to your customers loudly and clearly
Elan Speech mission statement
Beyond the words
As a leading world player in Text to Speech, Elan Speech focuses exclusively on the development and
marketing of natural-language interfaces.
Elan Speech brings organisations new ways of interacting with their clients,
providing new opportunities to speech-enable their world through revenue-generating applications.
“Our mission is to vocalize content to the end user with efficiency and accuracy, whatever the situation.”
Antoine Kauffeisen, CEO, Elan Speech
Worldwide speech provider
Elan Speech profile
Private company, headquartered in Toulouse, France.
Funded by venture capital (raised in 2002): IRDI, Part’Com, WT.
Strong in-house R&D, Elan Sayso™ technology ownership.
Wide offer of TTS technologies, with up to 12 languages and more to come.
Large Partner & Customer network in Europe, Eastern Europe, North America, Latin America, Japan & India.
New growth-oriented management, with a long-term vision and roadmap.
Elan Speech background
Elan Speech was created in June 2002 from the assets of the former Elan Informatique.
1980: creation of Elan Informatique
1986: beginning of work on text to speech (LPC technology)
1996: exclusive focus on TTS (diphone concatenation technology: Elan Tempo™)
2000: company sold to Lernout & Hauspie (L&H)
2001: legal battle against L&H, won in November 2001
February 2002: decision to file for court-supervised reorganization (redressement judiciaire, RJ, the French counterpart of Chapter 11)
June 2002: creation of Elan Speech, acquisition of all Elan Informatique assets, new management, new capital structure
July 2002: launch of new high-end TTS technology: Elan Sayso™
Elan Speech : figures
More than 3 million licenses in automotive and multimedia applications.
More than 10,000 ports deployed in telephony services.
More than 350 active customers.
12 languages already supported.
3 target markets: Telecom, Multimedia and Mobility.
2 text to speech technology families: Elan Tempo™ and Elan Sayso™
Worldwide speech provider
Focus #1: Europe (Germany, France, UK, Spain, Netherlands, Belgium, Italy, Switzerland …)
Focus #2: North America
Focus #3: Latin America (Brazil, Chile, Argentina, Venezuela)
Focus #4: Middle East (Arabic), India, Japan, Korea, Australia
Elan Speech : Geographical markets
Elan Speech’s markets
Telecom: server-based vocalization of content for multiple users over the phone
- For enterprises: unified messaging, auto attendant, CRM
- For telcos: unified messaging, voice portal, SMS2Voice, directory and reverse directory
Automotive and mobile terminals: on-board and off-board speech solutions that free the user from reading instructions
- On-board and off-board car navigation systems
- Traffic information
- Telematics, RDS-TMC
Multimedia: personal software on PC & Mac
- Edutainment software
- Disabilities assistance
- Personal productivity
Elan Speech markets’ requirements
> TTS component for Telecom
- High quality
- High density (ports per server)
- High reliability (24/7)
- Support of markup languages and standard APIs
- Support of 3 major OS: Windows NT/XP/2000, Solaris Sparc, Linux
> TTS component for Automotive and mobile terminals
- High quality
- Low footprint (2 to 16 MB depending on platform)
- Support of multiple RTOS (VxWorks, WinCE, PSOS, Neutrino, etc.)
- Support for multiple processors
- Support for phonetic input/output and phonetic lexicons of proper names
> TTS component for Multimedia
- High quality
- High flexibility (speed and pitch adjustment, voice customization)
- Availability for PC & Mac platforms
- Support of standard APIs
Value chain, from technology to end user:
- Elan Speech: core technology (TTS licences)
- VAR & OEM (integrator, platform vendor, publisher): solution, platform
- End customer (ASP, service provider, telco, car manufacturer): service/consumer product
- End customer (subscriber, mass market user)
Value-added services around the technology: custom voice, quality monitoring, expertise
Elan Speech’s direct & indirect business model
> Diphone based concatenative TTS
Advantages:
- High density (over 250 ports per server)
- Small footprint (2 to 6 MB)
- Flexible (pitch and speed adjustment, prosody copying)
- High intelligibility
- 12 languages supported
Disadvantage: robotic sounding
Markets/applications targeted:
- Automotive & consumer electronics (low footprint)
- High-density, short-ROI server-based TTS (telephony), low cost of ownership
- Multimedia software products
Elan Speech TTS technologies
Elan Speech TTS technologies
> Unit selection concatenative TTS
Advantages:
- Very high quality
- Highly natural
- Flexible (pitch and speed adjustment, timbre alteration, whisper feature)
- Support for custom voice (“Speech Brand” program)
Disadvantages:
- Lower density (50 ports per server)
- Larger footprint (16 to 70 MB)
Markets/applications targeted:
- High-end telephony applications
- Mass market telco services (voice portal, news)
- Public address
- High-end multimedia software
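The density figures quoted for the two technologies (over 250 ports per server for diphone synthesis, 50 for unit selection) translate directly into server counts. The sketch below is a rough sizing aid only: the function name and the example of 1,000 concurrent ports are illustrative assumptions, and real capacity will depend on hardware and voice.

```python
import math

# Port densities quoted on the slides (ports per server).
# "tempo" uses the lower bound of "over 250", so its result is a worst case.
PORTS_PER_SERVER = {"tempo": 250, "sayso": 50}


def servers_needed(concurrent_ports, technology):
    """Return the minimum number of servers for the requested port count."""
    density = PORTS_PER_SERVER[technology]
    return math.ceil(concurrent_ports / density)


if __name__ == "__main__":
    # Hypothetical deployment of 1,000 simultaneous telephony ports.
    for tech in ("tempo", "sayso"):
        print(tech, servers_needed(1000, tech))
```

For the same load, the unit selection engine needs about five times as many servers, which is the density versus naturalness trade-off the two slides describe.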
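[Diagram: the two synthesis pipelines.
Diphone pipeline: Pre-processing → Text normalization → Phonetic transcription → Prosody calculation → Synthesizer (diphone database) → Audio output
Unit selection pipeline: Pre-processing → Text normalization → Phonetic transcription → Unit selection → Decoder (units database) → Audio output
Also shown: abbreviation & exception lexicon, Elan Studio.]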
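The front end of the diphone pipeline (text normalization with an abbreviation lexicon, phonetic transcription, then a sequence of diphones to fetch from the database) can be illustrated with a toy example. This is a minimal sketch, not Elan's implementation: the two lexicons, the SAMPA-like symbols, the `_` silence marker and all function names are assumptions made for illustration, and prosody calculation and signal generation are left out.

```python
import re

# Hypothetical user lexicon of abbreviations and exceptions.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}

# Hypothetical toy pronunciation lexicon (SAMPA-like symbols).
PHONES = {
    "doctor": ["d", "Q", "k", "t", "@"],
    "smith": ["s", "m", "I", "T"],
}


def normalize(text):
    """Lower-case the text, expand known abbreviations, strip punctuation."""
    words = []
    for token in text.lower().split():
        token = ABBREVIATIONS.get(token, token)
        token = re.sub(r"[^a-z']", "", token)
        if token:
            words.append(token)
    return words


def transcribe(words):
    """Map words to phones, with a silence marker '_' around each word."""
    phones = ["_"]
    for word in words:
        # Fallback for unknown words: treat each letter as a phone.
        phones.extend(PHONES.get(word, list(word)))
        phones.append("_")
    return phones


def to_diphones(phones):
    """Pair each phone with its successor: the units a diphone database stores."""
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]


if __name__ == "__main__":
    print(to_diphones(transcribe(normalize("Dr. Smith"))))
```

A unit selection engine would replace the last step with a search over a much larger database of recorded units, which is where the larger footprint comes from.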
Comparison of Elan’s TTS technologies
[Chart: footprint versus quality/naturalness, with “human speaker” as the quality reference and footprint ticks at 4 MB, 12 MB and 32 MB.
Elan Tempo™: 2-6 MB
Elan Sayso™ Embedded: 10-16 MB
Elan Sayso™: 25-50 MB]
Positioning of the two technologies
Languages available with Elan Tempo™ technology
American English Male Female
British English Male Female
French Male Female
German Male Female
Italian Male Female
Spanish Male Female
Polish Male
Russian Male
Dutch Male
Brazilian Portuguese Male Female
Latin American Spanish Male
Arabic Male
R&D approach : Automation & Tools
> Elan Studio : a strong R&D set of tools
Advantages:
- Automates most R&D tasks
- Built-in signal processing
- Built-in linguistic analysis
- Built-in phonetic analysis
- Built-in database generation
- Automatic segmentation
- Voice factory
- Fast and easy tuning & improvement
- Optimization tools
= Key component for R&D to roll out languages and voices rapidly.
Language              Gender  Voice samples  Interactive demo  Product GA
French                Female  Available      Available         Available
US English            Female  Available      Available         Available V1.0
German                Female  Available      Available         Available
Spanish (Castilian)   Female  Available      30/06/03          22/07/03
Brazilian Portuguese  Female  Available      30/07/03          20/08/03
Italian               Female  14/08/03       28/08/03          15/09/03
UK English            Male    Available      10/09/03          01/10/03
Polish                Female  30/08/03       21/11/03          07/12/03
Arabic                Male    01/09/03       28/11/03          15/12/03
UK English            Female  Available      15/11/03          20/12/03
US English            Male    01/08/03       28/11/03          30/12/03
Elan Sayso™ language/voice roadmap
[Diagram: common product framework.
Elan Sayso™ engine: Pre-processing → Text normalization → Phonetic transcription → Unit selection → Decoder (HNM)
Elan Tempo™ engine: Pre-processing → Text normalization → Phonetic transcription → Prosody model. → Synthesizer
Speech manager
Product layer (OS-related level, native API): SAPI 4, SAPI 5, NSC API, NVIF, JavaSpeech
Audio layer]
5 APIs supported, a 6th to be discussed
Common product framework for Elan Tempo™ and Elan Sayso™, providing full compatibility
Elan Speech products framework
Elan Speech’s market (1)
Telecom: content vocalization solutions for operators & enterprises.
Applications:
- Customer service automation
- IVR
- Voice portal
- SMS to voice
- Unified messaging and email reading
Elan Speech offer: Elan Sayso™ Telecom & Elan Tempo™ Telecom
> Multilingual, multi-channel, carrier-grade TTS engine
> Client-server architecture, heterogeneous architecture supported
> Load balancing (multi-server architecture), centralized supervision
> Dynamic user lexicons (abbreviations, exceptions)
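The slides mention load balancing across several TTS servers but do not say which policy is used. As an illustration only, the sketch below applies one common policy, routing each new channel to the server with the most free ports. The `TtsServer` class, its fields and `allocate_port` are hypothetical names, not part of any Elan API.

```python
from dataclasses import dataclass


@dataclass
class TtsServer:
    """Hypothetical view of one TTS server in a multi-server deployment."""
    name: str
    capacity: int   # total ports the server can run
    in_use: int = 0  # ports currently allocated

    @property
    def free_ports(self):
        return self.capacity - self.in_use


def allocate_port(servers):
    """Reserve a port on the server with the most free ports.

    Returns the chosen server, or None when every server is full.
    """
    best = max(servers, key=lambda s: s.free_ports, default=None)
    if best is None or best.free_ports <= 0:
        return None
    best.in_use += 1
    return best


if __name__ == "__main__":
    farm = [TtsServer("tts-1", capacity=50), TtsServer("tts-2", capacity=50)]
    chosen = allocate_port(farm)
    print(chosen.name if chosen else "no capacity")
```

A centralized supervisor, as mentioned on the slide, would be the natural place to keep the `in_use` counters up to date.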
Elan Sayso™ Telecom & Elan Tempo™ Telecom
Available for Windows NT/2000/XP, Solaris Sparc, Linux x86
Support for 12 languages, with male and female voices
Support for 5 APIs (SAPI4, SAPI5, NVIF, Elan NSC API, JavaSpeech)
Cross-platform integration with Elan NSC API
Client-server architecture, heterogeneous architecture supported
Load balancing (multi-server architecture), centralized supervision
Dynamic user lexicons (abbreviations, exceptions)
Specific modules included :
- E-mail pre-processing
- automatic language identification
- Markup language supported : SSML (VoiceXML), JSML
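Since SSML is listed among the supported markup languages, here is a minimal sketch of how an application might build an SSML request with Python's standard library. The `build_ssml` helper is hypothetical, the SSML namespace declaration is omitted for brevity, and which SSML elements and attribute values the Elan engines actually honour is not stated in the slides.

```python
import xml.etree.ElementTree as ET

# Namespace of the reserved "xml" prefix, needed for the xml:lang attribute.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, rate="medium", pitch="medium", lang="en-US"):
    """Wrap text in an SSML <speak> document with a <prosody> element."""
    speak = ET.Element("speak", {"version": "1.0", f"{{{XML_NS}}}lang": lang})
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "pitch": pitch})
    prosody.text = text  # ElementTree escapes &, < and > on output
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("You have 3 new messages.", rate="slow"))
```

Building the markup with an XML library rather than string concatenation keeps user-supplied text (for example e-mail content) from breaking the document.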
Elan Speech’s market (2)
Multimedia & Web: products for personal communication and content enhancement.
Applications:
- Edutainment software
- Aid for the disabled
- Personal productivity
- Personal Web assistant (agent)
- Voice-enabled tutorials
- Consumer electronics vocal interface
Specific support:
> Elan Sayso™ for Multimedia & Elan Tempo™ for Multimedia: TTS software components for Windows and Mac OS X platforms.
> Elan Sayso™ PocketSpeech & Elan Tempo™ PocketSpeech: TTS software components for Pocket PC.
Elan Speech Markets (3)
Automotive & mobile devices: multi-platform TTS for compact embedded solutions.
Applications:
- Embedded navigation aid
- Traffic information
- Navigation systems for PDAs
- Telematics services
- Vocal interface on professional devices
- Public address services
Elan Speech offer:
> A wide range of ports covering more than 10 RTOS and 20 processors, specifically adapted to customers’ platforms.
> PocketSpeech, a specific offer for PDAs running Windows CE.
Elan Speech Markets (4)
Elan Sayso™ PocketSpeech, Elan Tempo™ PocketSpeech
Multilingual TTS engine for PDA-based applications
Support for both Tempo & Sayso technology
Available for WinCE 2.Xx, WinCE 3.0 / PocketPC 2002 / WinCE.Net
Support for 8 languages, with male and female voice
Support for 3 APIs (SAPI4, SAPI5, Elan NSC API)
Tempo PocketSpeech™: small-footprint engine, high quality: 3 to 6 MB
Sayso PocketSpeech™: high quality and high naturalness: 8 to 16 MB
Elan Speech Markets (5)
Elan Sayso™ Embedded, Elan Tempo™ Automotive
Multilingual TTS engine for embedded platforms
Supports Tempo technology in 8 languages with male & female voice
Available for WinCE 2.Xx, WinCE 3.0 Automotive, QNX, Neutrino, VxWorks, PSOS, µITRON, RTXC, Linux Embedded
On Intel x86, Motorola 68332, Motorola 68360, Motorola PowerPC, Hitachi SuperH (SH3, SH4), Philips TriMedia, OKI 763X, OKI ML2110, StrongARM, MIPS…
Support for 3 APIs (SAPI4, SAPI5, Elan NSC API)
Unlimited vocabulary (names, numbers and currencies, dates, free text, e-mail, etc.)
High-quality voice, smooth and natural intonation with concatenative synthesis
Voice speed and voice pitch control
Female and male voices
User abbreviation lexicon for each language
Text tags
Phonetic input/output (SAMPA, IPA)
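Phonetic input/output in SAMPA and IPA implies a mapping between the two notations. The sketch below converts a small subset of English SAMPA symbols to IPA; the table is deliberately partial and the function name is an assumption. It says nothing about how the Elan engines parse phonetic input.

```python
# Partial English SAMPA to IPA table (illustrative subset only).
SAMPA_TO_IPA = {
    "i:": "iː", "u:": "uː", "A:": "ɑː", "O:": "ɔː", "3:": "ɜː",
    "tS": "tʃ", "dZ": "dʒ",
    "I": "ɪ", "U": "ʊ", "V": "ʌ", "Q": "ɒ", "@": "ə", "{": "æ",
    "S": "ʃ", "Z": "ʒ", "T": "θ", "D": "ð", "N": "ŋ",
}


def sampa_to_ipa(sampa):
    """Convert a SAMPA string to IPA, preferring two-character symbols."""
    out = []
    i = 0
    while i < len(sampa):
        pair = sampa[i:i + 2]
        if pair in SAMPA_TO_IPA:
            out.append(SAMPA_TO_IPA[pair])
            i += 2
        else:
            # Symbols missing from the table (p, t, k, s, ...) are identical in IPA.
            out.append(SAMPA_TO_IPA.get(sampa[i], sampa[i]))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(sampa_to_ipa("TIN"))    # "thing"
    print(sampa_to_ipa("dZVdZ"))  # "judge"
```

Greedy two-character matching cannot tell an affricate such as `tS` from a `t` followed by `S` across a syllable boundary; a phonetic lexicon of proper names would normally mark that explicitly.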
Elan Speech Markets (6)
A-TTS (Applicative Text-to-Speech) for a mix of prompts and Elan Sayso™
[Diagram: Text input → Pre-processing → Text normalization → Phonetic transcription → Unit selection → Decoder (HNM) → Audio output.
Resources feeding the pipeline: textual abbreviations & exceptions; generic units database; sound exceptions, i.e. application-dependent prompts encoded in HNM frames.]
> A-TTS: Applicative Text to Speech means that prompts are fully tunable and updatable (application corpus) and treated like “Sound Exceptions” within the generic TTS system.
[Figure: applicative TTS and recorded messages included in the TTS system. Labels that survive from the figure:
- Sayso: 30-50 MB
- Sayso Embedded: 10-16 MB
- 6 to 10 MB
- HNM frames 10 ms to 15 ms, 50% of units removed (pruning), 1/3 to 1/4 size
- HNM frames 15 ms to 20 ms, more than 70% of units removed (pruning)
- Recorded or TTS-generated applicative prompts stored at 1,6 Kbps (22 kHz, 15 ms frames)
- Recorded prompts for applicative TTS
- Prompts generated with the full Elan Sayso version]
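The A-TTS idea, treating application prompts as “sound exceptions” inside the generic TTS system, can be sketched as a planning step that splits a message into stored prompts and free text left to the synthesizer. Everything below is a hypothetical illustration: the prompt table, the `.hnm` file names and the `plan` function are assumptions, and matching on lower-cased words is a simplification.

```python
def plan(text, prompts):
    """Split text into ("prompt", file) and ("tts", words) segments.

    At each position the longest known prompt phrase wins; words that
    belong to no prompt are grouped and left to the generic TTS engine.
    """
    words = text.lower().split()
    max_len = max((len(p.split()) for p in prompts), default=0)
    segments, pending, i = [], [], 0
    while i < len(words):
        match = None
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in prompts:
                match = (candidate, n)
                break
        if match:
            if pending:
                segments.append(("tts", " ".join(pending)))
                pending = []
            segments.append(("prompt", prompts[match[0]]))
            i += match[1]
        else:
            pending.append(words[i])
            i += 1
    if pending:
        segments.append(("tts", " ".join(pending)))
    return segments


if __name__ == "__main__":
    # Hypothetical application corpus: phrase -> stored prompt file.
    corpus = {"your balance is": "balance.hnm", "thank you": "thanks.hnm"}
    print(plan("Your balance is 42 euros thank you", corpus))
```

Because the prompt table is plain data, the application corpus can be tuned or updated without touching the generic engine, which is the property the A-TTS slide emphasizes.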
D-TTS (Distributed Text-to-Speech) for Web and Telematic applications
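[Diagram: an Elan Sayso TTS server and an application (server) stream synthesized speech to three kinds of client: an ActiveX client, a Java applet client and an embedded Java client.
Links shown: HTTP through a servlet (Java security); TCP/IP socket over an Internet connection; TCP/IP socket over GPRS/UMTS through a GPRS/UMTS gateway.
Less than 16 Kbps of bandwidth is used for TTS streamed at a 22 kHz sampling rate.]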
[Chart: Elan HNM coder bit rate in Kb/s (axis from 0 to 70) at sampling rates of 44 kHz, 22 kHz, 16 kHz, 11 kHz and 8 kHz, with one series each for skip 1, skip 2, skip 3 and skip 4.]
Coder performance for applicative TTS and distributed TTS
With Sayso Embedded, at a 22 kHz sampling rate, 1 hour of recorded prompts for applicative TTS takes less than 6.5 MB.
Skip 1: 5 ms frames, transparent
Skip 2: 10 ms frames, no audible change
Skip 3: 15 ms frames, slightly degraded acoustic quality, hard to perceive
Skip 4: 20 ms frames, audibly degraded acoustic quality, acceptable
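The storage and bandwidth figures can be cross-checked with a little arithmetic. The sketch below assumes that 1 MB means 10^6 bytes and that “Kbps” means kilobits per second, neither of which the slides state; the function names are illustrative.

```python
def frame_ms(skip):
    """Frame length for a skip setting, per the slide: skip n uses 5*n ms frames."""
    return 5 * skip


def implied_kbit_per_s(megabytes, seconds, bytes_per_mb=1_000_000):
    """Average bit rate (kbit/s) implied by a stored size and a duration."""
    return megabytes * bytes_per_mb * 8 / seconds / 1000


def storage_mb(kbit_per_s, seconds, bytes_per_mb=1_000_000):
    """Storage (MB) needed for audio coded at a given bit rate."""
    return kbit_per_s * 1000 / 8 * seconds / bytes_per_mb


if __name__ == "__main__":
    # 6.5 MB for one hour of prompts, as quoted for Sayso Embedded at 22 kHz.
    print(round(implied_kbit_per_s(6.5, 3600), 2), "kbit/s")
    # One hour at the 16 kbit/s ceiling quoted for streamed 22 kHz TTS.
    print(round(storage_mb(16, 3600), 2), "MB")
```

Under these assumptions, 6.5 MB per hour corresponds to roughly 14.4 kbit/s, which is consistent with the figure of less than 16 Kbps quoted for streamed 22 kHz TTS.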
Elan Speech Web solutions
> “Digalo cast”
Distributed TTS over an IP network (D-TTS)
- High-quality server-based Sayso and Tempo TTS
- Small-footprint remote client, native Java (100 KB)
- Low-bandwidth connection (<15 Kbps)
- “HiFi” restitution quality (22 kHz, no degradation)
- Lip synchronization tags for animated web agents (3D agents)
[Diagram: Java client for Digalo Cast (100 KB) connected over IP to a Digalo Cast server running Elan Tempo or Elan Sayso technology and serving 30 to 300 users simultaneously. IP connection: less than 15 Kbps of bandwidth used, TCP, UDP or HTTP encapsulated.]
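Combining the per-client bandwidth with the number of simultaneous users gives an upper bound on the server's outbound traffic. This is a back-of-the-envelope sketch: it assumes every connected user is receiving audio at the full 15 Kbps at the same moment, which overstates typical load, and the function name is illustrative.

```python
def aggregate_kbps(users, per_user_kbps=15.0):
    """Upper bound on outbound bandwidth when all users stream at once."""
    return users * per_user_kbps


if __name__ == "__main__":
    for users in (30, 300):
        total = aggregate_kbps(users)
        print(f"{users} users: at most {total:.0f} kbit/s ({total / 1000:.2f} Mbit/s)")
```

Even at 300 simultaneous users the bound stays under 5 Mbit/s.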
Elan Speech Tools
Elan Virtual Speaker: voice prompt creation tool for telephony or multimedia applications
Quick and easy to use, available for audio updates 24/7:
- Automatic generation
- Batch processing
- Editing features
- Multiple output formats
- Adjustable pitch and speed
- 8 kHz and 22 kHz sampling
- A-law, µ-law
Elan Speech Technology tools
Elan Prosel
Applies natural intonation to synthetic speech
Elan Lexitool
Edit and enrich exception and abbreviations lexicons
Elan Speech Services
Proprietary voice, “Speech Brand”: an exclusive TTS voice based on an existing speaker of your choice. Based on Elan Sayso technology, the new voice will mimic the timbre, the intonations and the accent of the original speaker.
Technology adaptation & porting: Elan’s core technology adapted to a specific platform (processor, RTOS), especially for embedded TTS.
Quality monitoring: a global service offer to continuously improve the result of TTS for a specific application. Audit of written contents, specialization of the TTS.
Speech Brand: the process to create custom Sayso voices
- Text corpus (5 weeks). Reduced if the language is already available with Sayso.
- Recordings (4 weeks). Requires the speaker. Might be reduced for Latin languages.
- Autosegmentation (2 weeks). Computer processing.
- Segmentation verification, manual (2 to 4 months depending on size). Longest part, required to achieve high quality. Reduction and automation of part of this task are under investigation.
- Database generation and optimization (2 weeks).
- Ready for integration.
The whole process takes 5 to 7 months and runs on the Elan Studio voice factory framework.
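The 5 to 7 month total can be checked against the individual step durations. The sketch assumes the steps run strictly one after another and converts weeks to months at 52/12 weeks per month; both are assumptions, since the slide does not say whether any steps overlap.

```python
WEEKS_PER_MONTH = 52 / 12

# Step durations quoted on the slide, in weeks.
FIXED_STEPS_WEEKS = {
    "text corpus": 5,
    "recordings": 4,
    "autosegmentation": 2,
    "database generation and optimization": 2,
}

# Manual segmentation verification: 2 to 4 months depending on size.
VERIFICATION_MONTHS = (2, 4)


def total_months():
    """Return (shortest, longest) total duration in months."""
    fixed = sum(FIXED_STEPS_WEEKS.values()) / WEEKS_PER_MONTH
    return tuple(fixed + months for months in VERIFICATION_MONTHS)


if __name__ == "__main__":
    low, high = total_months()
    print(f"{low:.1f} to {high:.1f} months")
```

The fixed steps add up to 13 weeks, about 3 months, so the manual verification step accounts for the whole spread between 5 and 7 months.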
Elan’s marketing tools
Elan’s partner program: Web, news and tradeshow support
Elan news & EvenTTS: a monthly newsletter dedicated to customers’ applications and deployments, sent out to a highly focused database of 13,000 e-mail addresses
Digalo.com: a website dedicated to promoting consumer speech-enabled applications
Joint marketing agreements: a program to refer qualified leads
Telecom references
Automotive references
Beyond the words