Applying the Pronunciation Lexicon Specification to ASR & TTS 1Patrizio Bergallo 1
Monday, August 20, 2007
SpeechTEK ASTS - Advances in Text-to-Speech Processing
Applying the Pronunciation Lexicon Specification to ASR & TTS
Patrizio Bergallo
Agenda
• Loquendo Today
• Introduction to PLS
– Reference Scenario
– Pronunciation Lexicons
– International Phonetic Alphabet
• Overview of PLS
– How does TTS use PLS?
– How does ASR use PLS?
• Examples of Use
• Latest Improvements
Loquendo Today
• Global company of the Telecom Italia group, leader in Europe and South America in the Speech Technologies market
• Company founded in 2001 from Telecom Italia Labs, benefiting from know-how gained from more than 30 years research experience
• Complete set of Multilingual speech technologies on a wide spectrum of devices; 25 patents, 50 voices and 20 languages
• Full support for international standards (MRCPv1/v2, VoiceXML 2.0/2.1, CCXML, SSML, SRGS, SISR)
• Company ready for challenging future scenarios: Multimodality, Security
• 100 employees, and displayed strong growth throughout 2007
• HQ in Turin, Offices in US, Spain, Germany and France, and a Worldwide Network of Partners
Reference Scenario
• Many speech applications need to specify pronunciation for words and phrases
– Surnames, locations, company names
– Acronyms
– Names in specific contexts (restaurants, sports, movie titles, etc.)
– Foreign words, mixed languages
• Pronunciation is critical both for TTS and ASR
– Improves reading of prompts by TTS
– Improves ASR performance
• VoiceXML 2.0/2.1 applications are the reference scenario
– Prompts are based on SSML 1.0 (or in future SSML 1.1)
– Recognition grammars are based on SRGS 1.0
Pronunciation Lexicons
• Pronunciation Lexicon
– a mapping between words (or short phrases), their written representations, and their pronunciations suitable for use by an ASR engine or a TTS engine
• Pronunciation lexicons are not only useful for voice browsers
– They have also proven effective mechanisms to support accessibility for the differently abled as well as greater usability for all users
– They are used to good effect in screen readers and user agents supporting multimodal interfaces
• The W3C Pronunciation Lexicon Specification (PLS) Version 1.0 is designed to enable interoperable specification of pronunciation lexicons
Pronunciation Lexicon Specification
• W3C specification status
– Second Last Call Working Draft (26 October, 2006)
– Currently the Implementation Report Plan and the Disposition of Comments are under development (all public comments were addressed)
– Candidate Recommendation expected 3Q07
[Figure: Speech Interface Framework diagram. Annotation: "Part of first version of the Speech Interface Framework (Larson, 2000)". Legend: W3C Recommendation, W3C Last Call Working Draft.]
International Phonetic Alphabet
• Pronunciation is represented by a phonetic alphabet
– Standard phonetic alphabets
  • International Phonetic Alphabet (IPA)
– Well-known phonetic alphabets
  • SAMPA - ASCII based (simple to write)
  • Pinyin (Chinese Mandarin), JEITA (Japanese), etc.
– Proprietary phonetic alphabets
• International Phonetic Alphabet (IPA)
– Created by International Phonetic Association (active since 1896), collaborative effort by all the major phoneticians around the world
– Universally agreed system of notation for sounds of languages
– Covers all languages
– Requires Unicode to write it
– Normatively referenced by PLS
Overview of PLS
• A PLS document is a container (<lexicon>) of several lexical entries (<lexeme>)
• Each lexical entry contains
– One or more spellings (<grapheme>)
– One or more pronunciations (<phoneme>) or substitutions (<alias>)
• Each PLS document is related to a single unique language (xml:lang)
• SSML 1.0 and SRGS 1.0 documents can reference one or more PLS documents
• Current version doesn’t include morphological, syntactic and semantic information associated with pronunciations
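The container structure described above can be read with a few lines of code. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function name parse_pls, the SAMPLE_PLS string and the returned dictionary layout are illustrative choices, not part of PLS.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Small sample document modelled on the PLS example in this deck.
SAMPLE_PLS = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Sepulveda</grapheme>
    <phoneme>səˈpʌlvɪdə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""


def parse_pls(xml_text):
    """Return the lexicon attributes and its lexemes in document order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    lexemes = []
    for lexeme in root.findall(f"{{{PLS_NS}}}lexeme"):
        graphemes = []
        pronunciations = []  # (element name, text) pairs, document order kept
        for child in lexeme:
            local_name = child.tag.split("}", 1)[-1]
            text = (child.text or "").strip()
            if local_name == "grapheme":
                graphemes.append(text)
            elif local_name in ("phoneme", "alias"):
                pronunciations.append((local_name, text))
        lexemes.append({"graphemes": graphemes,
                        "pronunciations": pronunciations})
    return {
        "lang": root.get(XML_LANG),          # single language per document
        "alphabet": root.get("alphabet"),
        "lexemes": lexemes,
    }


if __name__ == "__main__":
    lexicon = parse_pls(SAMPLE_PLS)
    for entry in lexicon["lexemes"]:
        print(entry["graphemes"], entry["pronunciations"])
```

Document order is preserved on purpose, because the rules for multiple pronunciations later in the deck depend on it.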
PLS Example
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
      http://www.w3.org/TR/2007/CR-pronunciation-lexicon2007@@@@/pls.xsd"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Sepulveda</grapheme>
    <phoneme>səˈpʌlvɪdə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>
How does TTS use PLS?
• SSML 1.0
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" … xml:lang="en-US">
  <lexicon uri="http://www.example.com/SSMLexample.pls"/>
  The title of the movie is: "La vita è bella" (Life is beautiful),
  which is directed by Benigni.
</speak>
• PLS 1.0
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>La vita è bella</grapheme>
    <phoneme>ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Benigni</grapheme>
    <phoneme>bɛˈniːnji</phoneme>
  </lexeme>
</lexicon>
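One way to picture what a TTS engine does with the two documents above is a text substitution pass. This is only a sketch under stated assumptions: the function apply_lexicon is hypothetical, the lexicon is given as a plain dictionary, and the matching strategy (longest grapheme first, whole words only) is an assumption of this example, not a rule taken from PLS or SSML.

```python
import re

# Entries copied from the PLS example above: grapheme -> IPA pronunciation.
LEXICON = {
    "La vita è bella": "ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə",
    "Benigni": "bɛˈniːnji",
}


def apply_lexicon(text, lexicon):
    """Split text into ("text", ...) and ("phoneme", ...) segments.

    Assumption of this sketch: longer graphemes win over shorter ones,
    and a grapheme only matches when it is not glued to other word
    characters.
    """
    if not lexicon:
        return [("text", text)]
    graphemes = sorted(lexicon, key=len, reverse=True)
    alternatives = "|".join(re.escape(g) for g in graphemes)
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")

    segments = []
    position = 0
    for match in pattern.finditer(text):
        if match.start() > position:
            segments.append(("text", text[position:match.start()]))
        segments.append(("phoneme", lexicon[match.group(0)]))
        position = match.end()
    if position < len(text):
        segments.append(("text", text[position:]))
    return segments


if __name__ == "__main__":
    prompt = ('The title of the movie is: "La vita è bella" '
              "(Life is beautiful), which is directed by Benigni.")
    for kind, value in apply_lexicon(prompt, LEXICON):
        print(kind, repr(value))
```

Segments tagged "text" would still go through the engine's normal grapheme-to-phoneme conversion; only the "phoneme" segments come from the lexicon.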
How does ASR use PLS?
• SRGS 1.0
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" … xml:lang="en-US" root="movies" mode="voice">
  <lexicon uri="http://www.example.com/SRGSexample.pls"/>
  <rule id="movies" scope="public">
    <one-of>
      <item>Terminator 2: Judgment Day</item>
      <item>Pluto's Judgement Day</item>
    </one-of>
  </rule>
</grammar>
• PLS 1.0
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>judgment</grapheme>
    <grapheme>judgement</grapheme>
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
  </lexeme>
</lexicon>
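On the ASR side, the lexicon supplies pronunciations for tokens found in the grammar items. The sketch below shows that lookup for the two items above. The helper lexicon_hits is hypothetical, and both the tokenization and the case-insensitive matching are assumptions of this example (the grammar capitalizes "Judgment" while the lexicon spells it in lower case); a real recognizer may match differently.

```python
import re

# Both spellings share one pronunciation, as in the PLS example above.
LEXICON = {
    "judgment": ["ˈdʒʌdʒ.mənt"],
    "judgement": ["ˈdʒʌdʒ.mənt"],
}

GRAMMAR_ITEMS = [
    "Terminator 2: Judgment Day",
    "Pluto's Judgement Day",
]


def lexicon_hits(item, lexicon):
    """Map each token of a grammar item that has a lexicon entry to its
    list of pronunciations. Tokens without an entry are left to the
    recognizer's built-in pronunciation rules and are not returned."""
    tokens = re.findall(r"[\w']+", item)
    return {
        token: lexicon[token.lower()]   # assumption: case-insensitive match
        for token in tokens
        if token.lower() in lexicon
    }


if __name__ == "__main__":
    for item in GRAMMAR_ITEMS:
        print(item, "->", lexicon_hits(item, LEXICON))
```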
Examples of Use
• Multiple pronunciations for the same orthography
• Multiple orthographies
• Homophones
• Homographs
• Acronyms, Abbreviations, etc.
Multiple pronunciations for the same orthography
• Multiple pronunciations are represented by more than one <phoneme> or <alias> element
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>ˈnjuːtən</phoneme>
    <phoneme>ˈnuːtən</phoneme>
  </lexeme>
</lexicon>
Multiple orthographies
• Alternative textual representations for the same word or phrase are represented by more than one <grapheme> inside the same <lexeme>
• All the pronunciations given within the <lexeme> apply to each and every <grapheme> within the <lexeme>
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="ja">
  <lexeme>
    <grapheme>nihongo</grapheme>
    <grapheme>日本語</grapheme>
    <grapheme>にほんご</grapheme>
    <phoneme>ɲihoŋo</phoneme>
  </lexeme>
</lexicon>
Homophones
• Words with the same pronunciation but different meanings are represented as different lexemes
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>cede</grapheme>
    <phoneme>siːd</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>seed</grapheme>
    <phoneme>siːd</phoneme>
  </lexeme>
</lexicon>
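Because homophones live in separate lexemes, a tool that wants to find them has to group entries by pronunciation. A minimal sketch, assuming the lexicon has already been flattened into (grapheme, pronunciation) pairs; the function homophone_sets and the third entry in the sample data are illustrative additions.

```python
from collections import defaultdict

# (grapheme, pronunciation) pairs; the first two come from the example
# above, the third is added so that one entry has no homophone.
ENTRIES = [
    ("cede", "siːd"),
    ("seed", "siːd"),
    ("Benigni", "bɛˈniːnji"),
]


def homophone_sets(entries):
    """Group graphemes by pronunciation and keep only the groups that
    contain more than one spelling."""
    by_pronunciation = defaultdict(list)
    for grapheme, pronunciation in entries:
        by_pronunciation[pronunciation].append(grapheme)
    return {
        pronunciation: graphemes
        for pronunciation, graphemes in by_pronunciation.items()
        if len(graphemes) > 1
    }


if __name__ == "__main__":
    print(homophone_sets(ENTRIES))
```

Such a report can help a grammar author spot items that a recognizer cannot tell apart by sound alone.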
Homographs (1/2)
• Words with the same spelling but pronounced in different ways are represented using the role attribute of the <lexeme> element
• This mechanism allows for the referencing of defined taxonomies of word classes (part of speech, meaning, etc.)
<lexicon version="1.0" xmlns:claws="http://www.example.com/claws7tags"
    alphabet="x-myorganization-pinyin" xml:lang="zh-CN">
  <lexeme role="claws:VV0"> <!-- base form of lexical verb -->
    <grapheme>处</grapheme>
    <phoneme>chu3</phoneme> <!-- pinyin string is: "chǔ" in 处罚 处置 -->
  </lexeme>
  <lexeme role="claws:NN"> <!-- common noun, neutral for number -->
    <grapheme>处</grapheme>
    <phoneme>chu4</phoneme> <!-- pinyin string is: "chù" in 处所 妙处 -->
  </lexeme>
</lexicon>
Homographs (2/2)
<speak version="1.1" xmlns:claws="http://www.example.com/claws7tags" xml:lang="zh-CN">
  <lexicon uri="http://www.example.com/lexicon.pls"
           type="application/pls+xml" xml:id="mylex"/>
  <lookup ref="mylex">
    他这个人很不好相<w role="claws:VV0">处</w>。
    此<w role="claws:NN">处</w>不准照相。
  </lookup>
</speak>
• SSML 1.1 will support the role attribute
• Currently PLS doesn’t define/mandate any taxonomy
• PLS generally defines role values as qualified names (QNames)
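The role mechanism amounts to a lookup keyed on both spelling and role. The sketch below illustrates this with the two lexemes from the homograph example. The function lookup and the data layout are hypothetical; roles are compared as plain prefixed strings, whereas a real processor would resolve each QName prefix to its namespace first. Returning the first lexeme when no role is given, and nothing when the role is unknown, are assumptions of this example.

```python
# Lexemes from the homograph example, in document order.
LEXEMES = [
    {"grapheme": "处", "roles": {"claws:VV0"}, "phoneme": "chu3"},
    {"grapheme": "处", "roles": {"claws:NN"}, "phoneme": "chu4"},
]


def lookup(grapheme, role=None, lexemes=LEXEMES):
    """Return the pronunciation for a grapheme, optionally narrowed by role.

    Assumptions of this sketch: without a role the first matching lexeme
    in document order is used; with a role that matches no lexeme the
    result is None.
    """
    candidates = [lx for lx in lexemes if lx["grapheme"] == grapheme]
    if role is not None:
        candidates = [lx for lx in candidates if role in lx["roles"]]
    return candidates[0]["phoneme"] if candidates else None


if __name__ == "__main__":
    print(lookup("处", "claws:VV0"))  # verb reading
    print(lookup("处", "claws:NN"))   # noun reading
```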
Acronyms, Abbreviations, etc.
• Pronunciations expressed as a sequence of other orthographies (acronyms, abbreviations, etc.) are represented by the <alias> element
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
  <lexeme>
    <grapheme>101</grapheme>
    <alias>one hundred and one</alias>
  </lexeme>
</lexicon>
Latest Improvements
• W3C Last Call Working Draft stage allows public comments to be addressed
– Large majority were clarifications
– New functionalities were deferred to a future version of PLS specification
• Major clarifications were about
– <alias> recursion
– Multiple pronunciations
• Changes are subject to a formal approval by the Working Group
• Next Steps
– PLS 1.0 is very close to Candidate Recommendation stage
– SSML 1.1 will provide a more complete support of PLS 1.0
<alias> recursion
• Pronunciations of the <alias> element contents MUST be generated by the processor, using pronunciations described by the <phoneme> element of any constituent graphemes in the PLS document, and without invoking recursive access to the PLS document on the <alias> elements of any constituent graphemes
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>GNU</grapheme>
    <alias>GNU is Not Unix</alias>
    <phoneme>gəˈnuː</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Unix</grapheme>
    <grapheme>UNIX</grapheme>
    <alias>a multiplexed information and computing service</alias>
    <phoneme>ˈjuːnɪks</phoneme>
  </lexeme>
</lexicon>
GNU is pronounced: gəˈnuː is Not ˈjuːnɪks
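The non-recursive rule can be sketched in a few lines: when an alias is expanded, each constituent word may be replaced by its own <phoneme>, but its <alias> is never followed. The function expand_alias, the dictionary layout and the whitespace tokenization are assumptions made for this illustration.

```python
# Entries from the GNU/Unix example above, keyed by grapheme.
UNIX_ENTRY = {
    "alias": "a multiplexed information and computing service",
    "phoneme": "ˈjuːnɪks",
}
LEXICON = {
    "GNU": {"alias": "GNU is Not Unix", "phoneme": "gəˈnuː"},
    "Unix": UNIX_ENTRY,
    "UNIX": UNIX_ENTRY,
}


def expand_alias(grapheme, lexicon):
    """Expand the alias of a grapheme without recursion.

    Each word of the alias text is looked up once. Only the <phoneme> of
    a constituent is used; its <alias> is deliberately ignored, so the
    expansion cannot loop. Words without a <phoneme> are kept as text
    for the engine's default pronunciation rules.
    """
    alias_text = lexicon[grapheme]["alias"]
    result = []
    for word in alias_text.split():
        entry = lexicon.get(word)
        if entry is not None and entry.get("phoneme"):
            result.append(entry["phoneme"])
        else:
            result.append(word)
    return " ".join(result)


if __name__ == "__main__":
    print(expand_alias("GNU", LEXICON))
```

Without this rule the alias of GNU would refer back to GNU itself and the expansion would never terminate.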
Multiple pronunciations (1/2)
• ASR
– If more than one pronunciation for a given <lexeme> is specified, an ASR processor MUST consider each of them as valid pronunciations for the <grapheme>
• TTS
– If more than one pronunciation for a given <lexeme> is specified, a TTS processor MUST use the first one in document order that has the prefer attribute set to "true"
– If none of the pronunciations has prefer set to "true", the TTS processor MUST use the first one in document order unless the TTS processor is documented as having a method of selecting pronunciations, in which case the processor MUST use any one of the pronunciations
Multiple pronunciations (2/2)
• An ASR processor will recognize both pronunciations, whereas a TTS processor will only use the first one (because it is the first in document order that has prefer set to "true").
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>lead</grapheme>
    <alias prefer="true">led</alias>
    <phoneme prefer="true">liːd</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>led</grapheme>
    <phoneme>led</phoneme>
  </lexeme>
</lexicon>
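The selection rules for ASR and TTS can be written down directly. This is a sketch, assuming the pronunciations of one lexeme are held as a list of dictionaries in document order; the function names and the data layout are illustrative. The branch for a TTS processor with its own documented selection method is not modelled.

```python
# Pronunciations of single lexemes, in document order.
LEAD = [
    {"kind": "alias", "value": "led", "prefer": True},
    {"kind": "phoneme", "value": "liːd", "prefer": True},
]
NEWTON = [
    {"kind": "phoneme", "value": "ˈnjuːtən", "prefer": False},
    {"kind": "phoneme", "value": "ˈnuːtən", "prefer": False},
]


def asr_pronunciations(pronunciations):
    """ASR: every listed pronunciation is valid for the grapheme."""
    return list(pronunciations)


def tts_pronunciation(pronunciations):
    """TTS: the first pronunciation with prefer set to true, otherwise
    the first one in document order."""
    for pronunciation in pronunciations:
        if pronunciation["prefer"]:
            return pronunciation
    return pronunciations[0]


if __name__ == "__main__":
    print(tts_pronunciation(LEAD)["value"])
    print([p["value"] for p in asr_pronunciations(LEAD)])
```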
References
• PLS 1.0 Second Last Call Working Draft (26 October, 2006)
– http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/
• Voice Browser Activity Page (VoiceXML, SSML, SRGS, …)
– http://www.w3.org/Voice/
• International Phonetic Association
– http://www.arts.gla.ac.uk/IPA/
• VoiceXML Forum
– http://www.voicexml.org/
Final Remarks
THANK YOU
• For more information please
– Visit Loquendo’s booth #509
– Keep an eye on: www.loquendo.com