Distributed Rendering Tool for Voices (DRTV)
Familiar, Expressive Voices & Personalities
Speech Technology & Media Solutions
By Dale Schalow, SCHALOW Innovations
Ashburn, Virginia (USA)
DRTV Goals
Professionally design, produce, and develop familiar-sounding voices from today, tomorrow, and the past
Provide an always-on service to consumers, businesses, and government
Offer it as a hosted (client/server) solution for interactive and linear media users
Description
High-quality voices for use in Internet and content:
– Entertainment and Education
• 3D animation, gaming
• Film, TV, radio
– Accessibility
• Seniors
• Low Vision
• Motor-Impaired
Managing assets with new and historic sources.
Description
Build and Manage Speech Assets:
– Establish formal voice asset collection, storage and distribution
– Facilitate asset preservation and restoration
– Coordinate with museums, libraries, 3D game/film studios, radio, foundations, colleges, etc.
Description
Build and Manage Assets:
– Refactor inventory for both audio and audio-visual physical assets (tapes, digital, reels, master sound recordings)
– Maintain digital asset libraries
– Maintain product voice library with dictionary of terms (paired vocabulary)
– Coordinate asset management IS/IT needs and initiatives with customer or partnering group
Technology
New media technology used
– NLP Toolkit (Natural Language Processing)
– Cross-Encoding for Embedded Media (PCs, HD, AAC, MP3/Internet Radio, etc.)
Standards being adopted
– W3C (World Wide Web Consortium)
– Java™ and VoiceXML, SSML (Speech Synthesis Markup Language)
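To make the SSML bullet concrete, here is a minimal Python sketch that wraps input text in an SSML `speak` document using only the standard library. The function name `build_ssml` and the voice name `familiar-voice-01` are hypothetical; the slides do not say how DRTV names its voices or which SSML elements it emits, so treat this as an illustration of the standard rather than of the product.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
# Standard namespace behind the xml: prefix (used for xml:lang).
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice_name, lang="en-US"):
    """Return an SSML string that asks for `text` in the named voice.

    `voice_name` is a hypothetical identifier, not a DRTV voice name.
    """
    # Serialize SSML elements without a prefix (default namespace).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice_name})
    voice.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("hello", "familiar-voice-01"))
```

Building the markup with an XML library rather than string concatenation means characters such as `&` or `<` in the input text are escaped automatically.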
Team/Resources
Resources allocated to this project
– Support & outside services
• Internal software development
• Internet Service Provider
• Pro Recording Studios
• 3rd party vendors (hardware/software)
Speech Tech Procedures
Step 1 - New Voice as Source
– Professionally record using N-based “tape script”
• Output format as PCM (e.g. Wave, 1-channel, 16-bit)
Step 2 - Existing Voice as Source
– Import audio source (PCM/16-bit quality)
– “Auto-Extract” using N-based “tape script” to pull phonetic features, phonemes and transcriptions
• Audio scanning with automatically generated text-based grammars
• Retaining audio output
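Steps 1 and 2 both expect mono 16-bit PCM audio. As a rough illustration, the following Python sketch checks a WAV file against that format with the standard `wave` module before it enters the pipeline. The function name `check_source_format` is hypothetical, and the slides do not state a required sample rate, so none is checked here.

```python
import wave


def check_source_format(path):
    """Return a list of format problems; an empty list means mono 16-bit PCM.

    wave.open() raises wave.Error for files it cannot read as WAV,
    so a file that opens successfully only needs its channel count and
    sample width checked.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        if channels != 1:
            problems.append(f"expected 1 channel, found {channels}")
        if sample_width != 2:
            problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    return problems
```

A caller could reject or convert any recording for which the returned list is non-empty before running the auto-extract step.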
Speech Tech Procedures
Step 3 - Apply Vocabulary
– Build a default dictionary of terms to allow automatic translation
– Minimum 40k words (more is better)
Step 4 - Process Text-to-Speech (TTS)
– Take as input some text (e.g. “hello”)
– Use the speech synthesis engine to generate audio with the applied vocabulary
Step 5 - Use the URL/file of the generated voice from Step 4 for a vertical application (Web page, game, 3D import, etc.)
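Steps 3 to 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the DRTV implementation: the tab-separated `word<TAB>pronunciation` dictionary format, the function names, and the `synthesize` callable (a stand-in for whatever speech synthesis engine is used, returning a file path or URL) are all hypothetical. Only the 40k-word minimum comes from the slides.

```python
import re

MIN_VOCABULARY_SIZE = 40_000  # minimum dictionary size named in Step 3


def load_vocabulary(lines):
    """Parse 'word<TAB>pronunciation' lines into a dict (assumed format)."""
    vocab = {}
    for line in lines:
        line = line.strip()
        if not line or "\t" not in line:
            continue  # skip blank or malformed lines
        word, pronunciation = line.split("\t", 1)
        vocab[word.lower()] = pronunciation
    return vocab


def tokenize(text):
    """Split text into lowercase words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def missing_words(text, vocab):
    """Return the distinct words in `text` that the dictionary lacks."""
    missing = []
    for word in tokenize(text):
        if word not in vocab and word not in missing:
            missing.append(word)
    return missing


def render(text, vocab, synthesize, min_size=MIN_VOCABULARY_SIZE):
    """Apply the vocabulary to `text` and hand the result to the engine.

    `synthesize` is a caller-supplied callable (hypothetical) that takes a
    list of pronunciations and returns the file path or URL of the audio.
    """
    if len(vocab) < min_size:
        raise ValueError(f"vocabulary has {len(vocab)} words, need {min_size}")
    unknown = missing_words(text, vocab)
    if unknown:
        raise KeyError(f"words not in vocabulary: {unknown}")
    pronunciations = [vocab[word] for word in tokenize(text)]
    return synthesize(pronunciations)
```

The value returned by `render` plays the role of the URL/file in Step 5, which a Web page, game, or 3D tool would then load.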
Speech Tech Procedures
Benefits
• Reduces the time and manual effort spent redoing fundamental tasks
• Achieves high-quality output
• Moves things forward on at least two fronts
– 1) Voices we already know or recognize
– 2) Voices and creations we have yet to discover in the process
• Appeals to many demographics for marketability
DRTV Contact Information
For more information:
SCHALOW Innovations
Dale B. Schalow
Phone: (703) 625-7367
Email: [email protected]
Web: http://schalow.com