Text-to-Speech (TTS) Synthesis … · 2010. 5. 19. ·...

intelligible and natural-sounding synthetic speech so as to transmit information from a machine to a person

[Block diagram: input text → Text Analysis → abstract underlying linguistic description → Speech Synthesizer → synthetic speech output]

• pronunciation of text => phonemes, stress, intonation, duration

• syntactic structure of sentence (pauses, rate of speaking, emphasis)

• semantic focus, ambiguity resolution (duration, intonation)

– rules for word etymology (especially names, foreign terms)

Text Analysis Components

[Block diagram, top to bottom:]

Raw English Input Text

→ Basic Text Processing: Document Structure Detection, Text Normalization, Linguistic Analysis (uses Dictionary)

→ Tagged Text

→ Phonetic Analysis: Homograph Disambiguation, Grapheme-to-Phoneme Conversion (uses Dictionary)

→ Tagged Phones

→ Prosodic Analysis: Pitch and Duration Rules, Stress and Pause Assignment

→ Synthesis Controls (Sequence of Sounds, Durations, Pitch)

Document Structure

• end of sentence marked by ‘.?!’ is not infallible
– “The car is 72.5 in. long”
• e-mail and web pages need special processing
– Larry:
  Sure. I'll try to do it before Thursday :-)
  Ed
• multiple languages
– insertion of foreign words, unusual accent and diacritical marks, etc.
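The end-of-sentence problem above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abbreviation list `ABBREVIATIONS` and the function `split_sentences` are hypothetical names, not part of any system described in these slides. A mark counts as a boundary only if it is followed by whitespace (or the end of the text) and does not end a known abbreviation.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for illustration).
ABBREVIATIONS = {"in.", "dr.", "st.", "mr.", "mrs.", "feb."}

def split_sentences(text):
    """Split at '.', '?' or '!' unless the mark ends a known abbreviation
    or sits inside a number such as 72.5."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.?!]", text):
        end = match.end()
        # A mark followed by a non-space character (the '.' in 72.5) is not a boundary.
        if end < len(text) and not text[end].isspace():
            continue
        # The token that ends with this mark, e.g. "in." or "long."
        token = text[start:end].split()[-1].lower()
        if token in ABBREVIATIONS:
            continue
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

print(split_sentences("The car is 72.5 in. long. Is it new? Yes!"))
# ['The car is 72.5 in. long.', 'Is it new?', 'Yes!']
```

A sentence that really does end in an abbreviation ("He lives on Smith Dr.") would be merged with the next one by this sketch, which is exactly why the slide calls the punctuation cue "not infallible".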

Text Normalization

• abbreviations and acronyms
– Dr. is pronounced either as ‘Doctor’ or ‘Drive’ depending on context (Dr. Smith lives on Smith Dr.)
– St. is pronounced either as ‘street’ or ‘Saint’ depending on context (I live on Bourbon St. in St. Louis)
– DC is either direct current or District of Columbia
– MIT is pronounced as either ‘M I T’ or ‘Massachusetts Institute of Technology’ but never as ‘mitt’
– DEC is pronounced as either ‘deck’ or ‘Digital Equipment Corporation’ but never as ‘D E C’
• numbers
– 370-1111 can be either ‘three seven oh …’ or ‘three seventy-model 1111’ for the IBM 370 computer
– 1920 is either ‘nineteen-twenty’ or ‘one thousand, nine hundred, twenty’
• dates, times, currency, account numbers, ordinals, cardinals, math
– Feb. 15, 1983 needs to convert to ‘February fifteenth, nineteen eighty-three’
– $10.50 is pronounced as ‘ten dollars and fifty cents’
– part # 10-50 needs to be pronounced as ‘part number ten dash fifty’ rather than ‘part pound sign ten to fifty’
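Two of the cases above, currency and the Dr./St. ambiguity, can be sketched as regular-expression rewrites. This is a minimal sketch under stated assumptions, not the method of any system in these slides: all function names are hypothetical, amounts are limited to 0-99 dollars, and the only context cue is whether a capitalized word follows the abbreviation.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + "-" + ONES[ones]

def expand_currency(match):
    # Simplification: always plural ("one dollars" is not handled).
    dollars, cents = int(match.group(1)), int(match.group(2))
    return (number_to_words(dollars) + " dollars and "
            + number_to_words(cents) + " cents")

def expand_title_or_street(match):
    # Assumption: an abbreviation followed by a capitalized word is a title
    # ("Dr. Smith", "St. Louis"); otherwise it is a street suffix.
    abbrev, following = match.group(1), match.group(2)
    title, street = {"Dr.": ("Doctor", "Drive"), "St.": ("Saint", "Street")}[abbrev]
    return (title if following else street) + (following or "")

def normalize(text):
    text = re.sub(r"\$(\d{1,2})\.(\d{2})\b", expand_currency, text)
    text = re.sub(r"\b(Dr\.|St\.)(\s+[A-Z]\w*)?", expand_title_or_street, text)
    return text

print(normalize("Dr. Smith lives on Smith Dr."))
# Doctor Smith lives on Smith Drive
print(normalize("I live on Bourbon St. in St. Louis"))
# I live on Bourbon Street in Saint Louis
```

The capitalization cue fails when a street abbreviation ends one sentence and a capitalized word starts the next, so a real normalizer would need the sentence boundaries and part-of-speech information as well.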

Text Normalization

• proper names
– Rudy Prpch is pronounced as ‘Rudy Perpich’
– Sorin Ducan is pronounced as ‘Sorin Duchan’
• part of speech
– read is pronounced as ‘reed’ or ‘red’
– record is pronounced as ‘rec-ard’ or ‘ri-cord’
• word decomposition
– need to decompose complex words into base forms (morphemes) to determine pronunciation (‘indivisibility’ needs to be decomposed into ‘in-di-visible-ity’ to determine pronunciation)

Text Normalization

• proper handling of special symbols in text
– punctuation, e.g., . , : ; ’ ” - -- _ * & ( ) ^ % @ ! ~ < > ? / \ = +
• string resolution
– 10:20 can be pronounced as either ‘twenty after 10’ (as a time) or ‘ten to twenty’ (as a sequence)

Knowledge Source Confusions

• Let us pray ---------- Lettuce spray
• Pay per view ---------- Paper view
• Meet her on Main St. ---------- Meter on Main St.
• It is easy to recognize speech ---------- It is easy to wreck a nice beach

Linguistic Analysis

• part-of-speech (POS)
• word sense
• phrases
• anaphora
• emphasis
• style

a conventional parser could be used, but typically a simple shallow analysis is done for speed (parsers are not real-time!)

Homograph Disambiguation

• “an absent boy” versus “do you choose to absent yourself?”
• “they will abuse him” versus “they won’t take abuse”
• “an overnight bag” versus “are you staying overnight?”
• “he is a learned man” versus “he learned to play piano”
• “El Camino Real road” versus “real world”
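Several of these pairs differ only in part of speech, so once a tagger has run, disambiguation can be a table lookup. The sketch below assumes the input is already POS-tagged. The tag names, the table `HOMOGRAPHS`, the function `apply_homographs`, and the informal respellings are all illustrative assumptions.

```python
# Hypothetical pronunciation table keyed by (word, part of speech).
# Respellings are informal, in the style of 'reed' / 'red'; capitals mark stress.
HOMOGRAPHS = {
    ("record", "NOUN"): "REC-ard",
    ("record", "VERB"): "ri-CORD",
    ("absent", "ADJ"): "AB-sent",
    ("absent", "VERB"): "ab-SENT",
    ("abuse", "VERB"): "a-BYOOZ",
    ("abuse", "NOUN"): "a-BYOOS",
}

def apply_homographs(tagged_words):
    """Replace each (word, pos) pair by its respelling if the table has one,
    otherwise pass the word through unchanged."""
    return [HOMOGRAPHS.get((word.lower(), pos), word) for word, pos in tagged_words]

print(apply_homographs([("they", "PRON"), ("will", "AUX"),
                        ("abuse", "VERB"), ("him", "PRON")]))
# ['they', 'will', 'a-BYOOZ', 'him']
```

Pairs such as “Real” in “El Camino Real” versus “real world” are not separated by part of speech alone; they need language or word-sense cues that this lookup does not model.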

Letter-to-Sound (LTS) Conversion

• CART (Classification and Regression Tree) analysis
• conventional dictionary search with letter-to-sound rules

[Flowchart: Input Word → “Whole Word” Dictionary Probe → (no) Affix Stripping → “Root” Dictionary Probe → (no) Letter-to-Sound and Stress Rules → Affix Reattachment → phonemes, stress, parts-of-speech; a “yes” at a dictionary probe skips the later lookup stages]

This is only ONE way of doing LTS. FSMs are another.
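The dictionary-plus-rules path can be sketched as a chain of fallbacks. Everything below is a toy under stated assumptions: the dictionaries, the suffix table, and the one-phone-per-letter rules are invented for illustration. Stress and part-of-speech outputs are omitted, and the CART alternative is not shown.

```python
# Toy data; every entry is an illustrative assumption.
WHOLE_WORD_DICT = {"the": "dh ah", "cat": "k ae t"}
ROOT_DICT = {"cat": "k ae t", "walk": "w ao k"}
SUFFIXES = {"s": "s", "ing": "ih ng", "ed": "t"}  # suffix -> phones, no allomorphy
LETTER_RULES = {"a": "ae", "b": "b", "d": "d", "g": "g", "i": "ih",
                "k": "k", "n": "n", "p": "p", "s": "s", "t": "t"}

def letter_to_sound(word):
    """Last-resort rules: one phone per letter, unknown letters skipped."""
    return " ".join(LETTER_RULES[ch] for ch in word if ch in LETTER_RULES)

def pronounce(word):
    word = word.lower()
    # 1. "Whole word" dictionary probe.
    if word in WHOLE_WORD_DICT:
        return WHOLE_WORD_DICT[word]
    # 2. Affix stripping (suffixes only), longest suffix first.
    root, suffix_phones = word, ""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            root, suffix_phones = word[:-len(suffix)], SUFFIXES[suffix]
            break
    # 3. "Root" dictionary probe, else 4. letter-to-sound rules.
    root_phones = ROOT_DICT.get(root) or letter_to_sound(root)
    # 5. Affix reattachment.
    return (root_phones + " " + suffix_phones).strip()

print(pronounce("cat"))      # k ae t
print(pronounce("walking"))  # w ao k ih ng
print(pronounce("bats"))     # b ae t s
```

The ordering matters: exceptions are caught by the whole-word probe before the more general, and more error-prone, rules are ever consulted.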

Prosody

• pauses
– to indicate phrases and to avoid running out of breath
• pitch
– fundamental frequency (F0) as a function of time
• rate/relative duration
– phoneme durations, timing, and rhythm
• loudness
– relative amplitude/volume

Full TTS System

[Block diagram: Text → Text Analysis → Letter-to-Sound Rules → Prosody → Synthesis Backend → Speech; side path Text/Language → Face Sync]

Text Analysis:
- numerical expansion (dates, times, $$$, …)
- abbreviations, acronyms
- proper name id
“Dr. Smith lives at 23 Lakeshore Dr.”

Parsing:
- sentences, phrase and breath groups
- semantic and syntactic accent; stress of compounds
- intonation types; parts of speech

Letter-to-Sound Rules:
- text and name dictionaries
- phonetic substitution (vowel reduction, flapping rules)

Prosody:
- timing and duration [rate, stress (position, context)]
- intonation/pitch (type of phrase [list, statement, ??])
- loudness/amplitude (phrasal amp., local reduction)

Synthesis Backend:
Synthesis model impacts TTS, as well as unit representation.

choice of units:
- words
- phones
- diphones
- dyads
- syllables

choice of parameters:
- LPC
- formants
- waveform templates
- articulatory parameters
- sinusoidal parameters

method of computation:
- rules
- concatenation

Word Concatenation Synthesis

• words in sentences are much shorter than in isolation (up to 50% shorter) (see next page)
• words cannot preserve sentence-level stress, rhythm or intonation patterns
• too many words to store (1.7 million surnames), extended words using prefixes and suffixes

[Figure: Word Concatenation]

Proper Name Statistics

Number of Names    Coverage
10                 4.9%
100                16.3%
5,000              59.1%
50,000             83.2%
100,000            88.6%
200,000            93.0%

Statistics of Proper Name Coverage

name pronunciation based on etymology

Concatenative Word Synthesis

several examples of concatenated words and full sentences--SBR

word-based synthesis does not work!

Speech Synthesis Methods

• 1939: the VODER (Voice Operated DEmonstratoR), Homer Dudley
– based on a simple model of speech sound production
– select voicing source (with foot pedal control of pitch) or noise source
– ten filters shaped the source to produce vocal or noise-excited sounds, controlled by finger motions
– separate keys for stop sounds
– wrist bar control of signal energy

[Figure: The VODER]

Articulatory Synthesis

• in theory: can create more natural and more realistic motions of the articulators (rather than formant parameters), thereby leading to more natural sounding synthetic speech
– utilize physical constraints of articulator movements
– use X-ray data to characterize individual speech sounds
– model how articulatory parameters move smoothly between sounds
  • direct method: solve wave equation for sound pressure at lips
  • indirect method: convert to formants or LPC parameters for final synthesis in order to utilize existing synthesizers
– use highly constrained motions of articulatory parameters

Articulatory Model

Articulatory Parameters of Model:

• lip opening: W
• lip protrusion length: L
• tongue body height and length: Y, X
• velar closure: K
• tongue tip height and length: B, R
• jaw raising (dependent parameter)
• velum opening: N

Articulatory Synthesis Using Formant Synthesizer Backend

Cecil Coker--teaching computers to talk

Articulatory Synthesis of Speech, Cecil Coker

Articulatory Synthesis by Copying Measured Vocal Tract Data

• fully automatic closed-loop optimization
– initialized from articulatory codebooks, neural nets
– Schroeter and Sondhi, 1987

One example: original: re-synthesis:

Articulatory Synthesis Issues

• requires highly accurate models of glottis and vocal tract
• requires rules for dynamics of the articulators

Source-Filter Synthesis Models

• cascade/serial (formant) synthesis model

Serial/Formant Synthesis Model

• flaws in the serial/formant synthesis model:
– can’t handle voiced fricatives
– no zeros for nasal sounds
– no precise control for stop consonants
– pitch pulse shape fixed, independent of pitch
– spectral compensation is inadequate

OVE 1--Fant SPASS synthesis

“To Be …”-Bell Labs

Daisy-Daisy with music

We Wish You …

JSRU Synthesis

Parallel Synthesis Model

A serial synthesizer is a good approach for open, non-nasal vocal tracts (vowels, liquids). For obstruents and nasals, we need to control the amplitudes of each resonance, and to introduce zeros in addition to the poles.

\[
V(z) = \prod_{k=1}^{4} \frac{1 - 2 r_k \cos(\theta_k) + r_k^2}{1 - 2 r_k \cos(\theta_k)\, z^{-1} + r_k^2 z^{-2}}
     = \sum_{k=1}^{4} \frac{A_k + B_k z^{-1}}{1 - 2 r_k \cos(\theta_k)\, z^{-1} + r_k^2 z^{-2}}
\]

use the approximation

\[
V(z) \approx \sum_{k=1}^{4} \frac{A_k}{1 - 2 r_k \cos(\theta_k)\, z^{-1} + r_k^2 z^{-2}}
\]

parallel synthesizer provides more flexibility in matching spectrum levels at formant frequencies (via gain controls); however, zeros are introduced into the spectrum.

Parallel Synthesis

• issues:
– need individual resonance amplitudes (A1,…,A4); if resonances are close, this is a messy calculation
– phasing of resonances neglected (the B_k z^-1 terms)
– synthetic speech has both resonances and zeros (at frequencies between the resonances) that may be perceptible
– better reproduction of complex consonants

Parallel synthesis from BYU

Parallel synthesis-Holmes

Continuing Evolution (1959-1987)

• Haskins, 1959
• KTH – Stockholm, 1962
• Bell Labs, 1973
• MIT, 1976
• MIT-talk, 1979
• Speak ‘N Spell, 1980
• Bell Labs, 1985
• DEC-Talk, 1987

Text-to-Speech Synthesis (TTS) Evolution

[Timeline figure, 1962 to 1997, quality rising from Poor to Good:]

Formant Synthesis: Poor Intelligibility; Poor Naturalness
  (Bell Labs; Joint Speech Research Unit; MIT (DEC-Talk); Haskins Lab)

LPC-Based Diphone/Dyad Synthesis: Good Intelligibility; Poor Naturalness
  (Bell Labs; CNET; Bellcore; Berkeley Speech Technology)

Unit Selection Synthesis: Good Intelligibility; Customer Quality Naturalness (Limited Context)
  (ATR in Japan; CSTR in Scotland; BT in England; AT&T Labs (1998); L&H in Belgium)

Time axis: 1962 1967 1972 1977 1982 1987 1992 1997

Speech Synthesis: the 90’s

• what changed?
– TTS was highly intelligible but extremely unnatural sounding
– a decade of work had not changed the naturalness substantially
– computation and memory grew with Moore’s law, enabling highly complex concatenative systems to be created, implemented and perfected
– concatenative systems showed themselves capable of producing (in some cases) extremely natural sounding synthetic speech

Concatenation TTS Systems

• key idea: use segments of recorded speech for synthesis
• data-driven approach: more segments give better synthesis; using an infinite number of segments leads to perfect synthesis
• key issues:
– what units to use
– how to select units from natural speech
– how to label and extract consistent units from a large database
– what signal representation should be used for spectrally smoothing units (at junctures) and for prosody modification (pitch, duration, amplitude)

Concatenation Units

• choice of units:
– Words: there are an infinite number of them
– Syllables: there are about 10K in English
– Phonemes: there are about 45 in English
– Demi-syllables: there are about 2500 in English
– Diphones: there are about 1500-2500 in English

Choice of Units

Unit            # Units (English)
Allophone       60-80
Diphone         <40²-65²
Triphone        <40³-65³
Demisyllable    2K
Syllable        11K
VC*V
2-syllable      <(11K)²
Word            100K-1.5M
Phrase
Sentence

Going down the table, unit Length runs from Short to Long, Quality from Low to High, and # Rules / Necessary Unit Modifications from Many to Few.

Corpus Coverage by Unit Type

[Figure: coverage curves for 2-syllable, VC*V, triphone, syllable, demisyllable, diphone, and word units; x-axis: Top N Surnames (rank), 10K to 2M. NOTE: depends on domain (here SURNAMES)]

Concatenation Units

• Words:
– no complete coverage for broad domains; words have to be supplemented with smaller units
– limited ability to modify pitch, amplitude and duration without losing naturalness and intelligibility
– need huge database to extract multiple versions of each word
• Sub-Words:
– hard to isolate in context due to co-articulation
– need allophonic variations to characterize units in all contexts
– puts large burden on signal processing to smooth at unit join points

Block Diagram of a Concatenative TTS System

[Block diagram: Message Text (Alphabetic Characters) → Text Analysis, Letter-to-Sound, Prosody (uses Dictionary and Rules) → Phonetic Symbols, Prosody Targets → Assemble Units that Match Input Targets (uses Store of Sound Units) → Speech Waveform Modification and Synthesis → Speech]

[Audio examples: Female, Male]

Procedure for Concatenative Synthesis

• off-line inventory preparation:
– record speech corpus and process with coding method of choice
– determine location of speech units and store units in inventory
• on-line synthesis from text
– normalize input text (expand abbreviations, etc.)
– letter-to-sound (pronunciation dictionary and rules)
– prosody (melody/pitch & durations, stress patterns/amplitudes, …)
– select appropriate sequence of units from inventory
– modify units (smooth at boundaries, match desired prosody)
– synthesize and output speech signal

Unit Selection Synthesis

• need to optimally match units at boundaries, e.g., fundamental frequency (pitch), and spectrum
• need to automatically and efficiently select optimal sequence of units from database
• issues in Unit Selection Synthesis
– several examples in each unit category (from 10 to 10 )
– waveform modification used sparingly (leads to perceived distortions)
– high intelligibility must be maintained
– “customer quality” attained with reasonable training set (1-10 hours)
– “natural quality” attained with large training set (10’s of hours)
– “unit selection” algorithm must run in a fraction of real time on a state-of-the-art processor

(On-line) Unit Selection: Viterbi Search

[Lattice diagram: target unit sequence #-# → #-a → a-u → u-# → #-#, with candidate units #-a(1) to #-a(3), a-u(1) to a-u(4), and u-#(1) to u-#(3) stacked under their targets]

• transitional (concatenation) costs are based on acoustic distances
• node (target) costs are based on linguistic identity of unit
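The search over such a lattice can be sketched as a standard dynamic program. The sketch below makes several assumptions that are not in the slides: a unit is a tuple `(name, stressed, f0_start, f0_end)`, the target cost is a fixed penalty for a stress mismatch, the concatenation cost is the pitch jump at the join, and all numbers are made up. The leading and trailing #-# silence slots are omitted.

```python
def viterbi_unit_selection(candidates, target_cost, concat_cost):
    """candidates[i] lists the database units that could fill slot i.
    Returns (total_cost, best_path)."""
    # best holds, per candidate of the current slot, (cheapest cost so far, path).
    best = [(target_cost(0, u), [u]) for u in candidates[0]]
    for i in range(1, len(candidates)):
        new_best = []
        for u in candidates[i]:
            # Cheapest predecessor once the join cost to u is included.
            cost, path = min(
                ((prev_cost + concat_cost(prev_path[-1], u), prev_path)
                 for prev_cost, prev_path in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + target_cost(i, u), path + [u]))
        best = new_best
    return min(best, key=lambda item: item[0])

# Illustrative data: (name, stressed, f0_start_hz, f0_end_hz).
targets = [False, True, False]  # desired stress per slot
candidates = [
    [("#-a(1)", False, 100, 110), ("#-a(2)", True, 120, 130)],
    [("a-u(1)", True, 112, 118), ("a-u(2)", True, 150, 140), ("a-u(3)", False, 110, 115)],
    [("u-#(1)", False, 119, 90), ("u-#(2)", False, 160, 120)],
]

def target_cost(i, unit):
    return 0.0 if unit[1] == targets[i] else 10.0

def concat_cost(left, right):
    return abs(left[3] - right[2])  # pitch discontinuity at the join

cost, path = viterbi_unit_selection(candidates, target_cost, concat_cost)
print(cost, [u[0] for u in path])
# 3.0 ['#-a(1)', 'a-u(1)', 'u-#(1)']
```

Note that a-u(3) joins #-a(1) with no pitch jump at all, yet loses because of its stress mismatch: the search trades target cost against concatenation cost over the whole path rather than slot by slot.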

Unit Selection Measures

• USD (Unit Segmental Distortion): differences between desired spectral pattern of target and that of candidate unit, throughout whole unit
• UCD (Unit Concatenative Distortion): spectral discontinuity across boundaries of the concatenated units

Example: source context: want, /w ah n t/; target context: cart, /k ah r t/; USD (ah n versus ah r)

Modern TTS Systems (Natural Voices from AT&T)

NV2005

• Soliloquy from Hamlet
• Gettysburg Address
• Bob Story
• German female
• UK British female
• Spanish female
• Korean female
• French male

Modern COMMERCIAL Systems

• Lucent
• AcuVoice
• Festival
• L&H RealSpeak
• SpeechWorks female
• SpeechWorks male
• Cselt (Actor) - Italian

TTS Future Needs

• TTS needs to know how things should be said
• context-sensitive pronunciations of words
• prosody prediction: emphasis
– I gave the book to John (not someone else)
– I gave the book to John (not the photos)
– I gave the book to John (I did it, not someone else)
• unit selection process: target cost captures mismatch between predicted unit specification (phoneme name, duration, pitch, spectral properties) and actual features of a candidate recorded unit; need better spectral distance measures that incorporate human perception
• better signal processing: compress units for small footprint devices

Business Drivers of TTS

• cost reduction
– TTS as a dialog component for customer care
– TTS for delivering messages
– TTS to replace expensive recorded IVR prompts
• new products and services
– location-based services
– providing information in cars (e.g., driving directions, traffic reports)
– unified Messaging (reading e-mail, fax)
– voice Portals (enterprise, home, phone access to Web-based services)
– e-commerce (automatic information agents)
– customized News, Stock Reports, Sports Scores
– devices

Reading Email

Example: Old TTS, No Filter

From: Marilyn Walker <walker@research.att.com>
To: David Ross <davidross@home.com>
Subject: Re: Today's Meeting
Date: Tuesday, December 01, 1998 4:25 PM
------------------------------------------------------------
4:30 is fine for me. See you at the meeting.

Marilyn

-----Original Message-----
From: David Ross <davidross@home.com>
To: Marilyn Walker <walker@research.att.com>
Date: Tuesday, December 01, 1998 2:25 PM
Subject: Today's Meeting

Today's meeting has been changed from 4:00 to 4:30 PM. If the time change is a problem, please send me email at davidross@home.com.

Thanks,

david ross

Reading Email (final)

Example: Enhanced TTS

From: Marilyn Walker <walker@research.att.com>
To: David Ross <davidross@home.com>
Subject: Re: Today's Meeting
Date: Tuesday, December 01, 1998 4:25 PM
------------------------------------------------------------
4:30 is fine for me. See you at the meeting.

Marilyn Walker

-----Original Message-----
From: David Ross <davidross@home.com>
To: Marilyn Walker <walker@research.att.com>
Date: Tuesday, December 01, 1998 2:25 PM
Subject: Today's Meeting

Today's meeting has been changed from 4:00 to 4:30 PM. If the time change is a problem, please send me email at davidross@home.com.

Thanks,

david ross

TTS Application Categories

Devices
• PDAs, cellphones, gaming, talking appliances

Automotive Connectivity
• driving directions, city and restaurant guides, location services (e.g., “Macys has a sale!”)

Consumer Communications
• voice control of cell phones, VCRs, TVs
• home information access over telephone (“Home Voice Portals”)

Enterprise Communications
• information access over the phone such as sales information, HR, internal phonebook, messaging

Voice-assisted E-Commerce
• E-commerce, customer care (e.g., friendly automated talking web agents, FAQs, product information)

Call center Automation
• next-gen “HMIHY” automated operator services
