Distributed Rendering Tool for Voices (DRTV) Familiar, Expressive Voices & Personalities Speech Technology & Media Solutions By Dale Schalow SCHALOW Innovations


Ashburn, Virginia (USA)

DRTV Goals

Professionally design, produce and develop familiar-sounding voices from the past, present and future

Provide always-on service to consumers, businesses and government

Deliver to interactive and linear media users as a hosted (client/server) solution

Description

High-quality voices for Internet and media content.

Managing assets from new and historic sources.


High-quality voices for Internet and media content:
– Entertainment and education
  • 3D animation, gaming
  • Film, TV, radio
– Accessibility
  • Seniors
  • Low vision
  • Motor-impaired


Build and Manage Speech Assets:
– Establish formal voice asset collection, storage and distribution
– Facilitate asset preservation and restoration
– Coordinate with museums, libraries, 3D game/film studios, radio, foundations, colleges, etc.


Build and Manage Assets:
– Reorganize the inventory of both audio and audio-visual physical assets (tapes, digital media, reels, master sound recordings)
– Maintain digital asset libraries
– Maintain a product voice library with a dictionary of terms (paired vocabulary)
– Coordinate asset management IS/IT needs and initiatives with the customer or partnering group

Technology

New media technology used
– NLP (Natural Language Processing) toolkit
– Cross-encoding for embedded media (PCs, HD, AAC, MP3/Internet radio, etc.)

Standards being adopted
– W3C (World Wide Web Consortium)
– Java™, VoiceXML and SSML (Speech Synthesis Markup Language)
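Since SSML is one of the standards being adopted, here is a minimal sketch of wrapping plain text in an SSML 1.0 `<speak>` document using only Python's standard library. The helper name `build_ssml`, the `en-US` default and the optional trailing pause are illustrative assumptions, not part of DRTV.

```python
import xml.etree.ElementTree as ET

# W3C SSML 1.0 namespace, plus the reserved XML namespace used for xml:lang.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize SSML elements with a default (unprefixed) namespace.
ET.register_namespace("", SSML_NS)


def build_ssml(text: str, lang: str = "en-US", pause_ms: int = 0) -> str:
    """Hypothetical helper: wrap plain text in a minimal SSML <speak> document."""
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", f"{{{XML_NS}}}lang": lang})
    speak.text = text  # ElementTree escapes &, < and > when serializing
    if pause_ms > 0:
        # Optional trailing pause, e.g. before the next prompt in a game or web page.
        ET.SubElement(speak, f"{{{SSML_NS}}}break", {"time": f"{pause_ms}ms"})
    return ET.tostring(speak, encoding="unicode")


doc = build_ssml("hello", pause_ms=250)
print(doc)
```

The resulting string can be passed to any SSML-capable synthesis engine; which engine DRTV uses is not specified in these slides.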

Team/Resources

Resources allocated to this project
– Support & outside services
  • Internal software development
  • Internet Service Provider
  • Pro recording studios
  • 3rd-party vendors (hardware/software)

Speech Tech Procedures

Step 1 - New voice as source
– Record professionally using the N-based “tape script”
  • Output format: PCM (e.g., WAV, 1-channel, 16-bit)
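Step 1's output format can be illustrated with Python's standard `wave` module. This is a minimal sketch that assumes float samples in [-1.0, 1.0] and a 16 kHz sample rate (the slide does not state a rate); the function name, file name and test tone are placeholders for a real studio take.

```python
import math
import os
import struct
import tempfile
import wave


def write_pcm16_mono(path: str, samples: list[float], sample_rate: int = 16000) -> None:
    """Hypothetical helper: write float samples in [-1.0, 1.0] as 1-channel, 16-bit PCM WAV."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range samples
        frames += struct.pack("<h", round(s * 32767))  # little-endian signed 16-bit
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono ("1-channel")
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)  # assumed rate; not specified in the slide
        wav.writeframes(bytes(frames))


# Usage: a 0.5 s, 440 Hz test tone standing in for one recorded line of the tape script.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(8000)]
out_path = os.path.join(tempfile.mkdtemp(), "take_001.wav")
write_pcm16_mono(out_path, tone)
```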

Step 2 - Existing voice as source
– Import the audio source (16-bit PCM quality)
– “Auto-extract” using the N-based “tape script” to pull phonetic features, phonemes and transcriptions
  • Audio scanning with automatically generated text-based grammars
  • Retain the audio output
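Before auto-extraction, an imported source can be checked against the 16-bit PCM requirement. A minimal sketch using Python's `wave` module, which only opens uncompressed PCM WAV files and raises `wave.Error` for other encodings; `check_pcm16_source` and the demo file are hypothetical.

```python
import os
import tempfile
import wave


def check_pcm16_source(path: str) -> dict:
    """Hypothetical check: return format info, or raise if samples are not 16-bit."""
    # wave.open only accepts uncompressed PCM WAV; other encodings raise wave.Error here.
    with wave.open(path, "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "frames": wav.getnframes(),
        }
    if info["sample_width_bytes"] != 2:
        bits = 8 * info["sample_width_bytes"]
        raise ValueError(f"{path}: expected 16-bit PCM, got {bits}-bit samples")
    info["duration_s"] = info["frames"] / info["sample_rate"]
    return info


# Usage: one second of 16-bit mono silence standing in for a digitized archival tape.
demo_path = os.path.join(tempfile.mkdtemp(), "archive_transfer.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(22050)
    out.writeframes(b"\x00\x00" * 22050)
info = check_pcm16_source(demo_path)
```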


Step 3 - Apply vocabulary
– Build a default dictionary of terms to allow automatic translation
– Minimum 40k words (more is better)
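The slide does not say how the dictionary is stored. As one possible layout, the sketch below assumes a plain-text file in the style of the CMU Pronouncing Dictionary (a word followed by its phonemes, with `;;;` comment lines) and checks the result against the 40k-word minimum. The file format, function names and sample entries are assumptions for illustration.

```python
MIN_ENTRIES = 40_000  # minimum vocabulary size stated in Step 3


def load_lexicon(lines):
    """Hypothetical loader: parse 'WORD PH1 PH2 ...' lines into {word: [phonemes]}."""
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        if not phones:
            raise ValueError(f"no pronunciation given for {word!r}")
        lexicon.setdefault(word.lower(), phones)  # keep the first pronunciation seen
    return lexicon


def coverage_report(lexicon, minimum=MIN_ENTRIES):
    """Report the entry count and whether it meets the minimum vocabulary size."""
    return {"entries": len(lexicon), "meets_minimum": len(lexicon) >= minimum}


sample = [
    ";;; hypothetical entries using ARPAbet-style phonemes",
    "HELLO  HH AH0 L OW1",
    "WORLD  W ER1 L D",
]
lexicon = load_lexicon(sample)
report = coverage_report(lexicon)  # a production dictionary would need 40,000+ entries
```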

Step 4 - Process text-to-speech (TTS)
– Take some text as input (e.g., “hello”)
– Use the speech synthesis engine to generate audio with the applied vocabulary

Step 5 - Use the URL/file of the voice generated in Step 4 in a vertical application (Web page, game, 3D import, etc.)
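Before the synthesis engine in Step 4 can use the applied vocabulary, the input text has to be split into words and looked up. A minimal front-end sketch, assuming a lexicon shaped like `{word: [phonemes]}`; the sample entries are hypothetical, and the audio-generation call itself is engine-specific and not shown.

```python
import re

# Hypothetical applied vocabulary: lowercase word -> phoneme list.
LEXICON = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
}


def text_to_phonemes(text, lexicon):
    """Map each word of `text` to phonemes; collect words the vocabulary lacks."""
    phonemes, missing = [], []
    # Letters and apostrophes only; digits and symbols would need separate normalization.
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in lexicon:
            phonemes.extend(lexicon[word])
        else:
            missing.append(word)
    return phonemes, missing


phones, oov = text_to_phonemes("Hello, world!", LEXICON)
# phones -> ['HH', 'AH0', 'L', 'OW1', 'W', 'ER1', 'L', 'D'], oov -> []
```

Returning out-of-vocabulary words separately lets them be added to the dictionary from Step 3 instead of being silently mispronounced.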


Benefits
• Reduces the time and manual effort of redoing fundamental tasks
• Achieves high-quality output
• Moves things forward on at least two fronts:
  – 1) Voices we already know or recognize
  – 2) Voices and creations we have yet to discover in the process
• Appeals to many demographics for marketability

DRTV Contact Information

For more information:
SCHALOW Innovations

Dale B. Schalow

Phone: (703) 625-7367

Email: [email protected]

Web: http://schalow.com
