Speech Service Creation K. W. (Bill) Scholz NewSpeech, LLC An Overview of Speech Service Creation Tools NY / NJ Chapter December, 2006

Speech Service Creationnewspeechsolutions.com/SampleFiles/BuildingSpeechApps.pdf · Avaya Dialog Designer IBM WebSphere IntervoiceInVision Microsoft Speech .NET NetByTel(TuVox) Nortel



Speech Service Creation

K. W. (Bill) Scholz

NewSpeech, LLC

An Overview of Speech Service Creation Tools

NY / NJ Chapter

December, 2006


Agenda

• Speech Applications – where we were and where we are
• Building speech applications today
• Methodologies and Tools
• Reusable components & packaged applications
• Summary of today’s leading VUI creation tools
• Highlight / compare / contrast industry’s leading tools


What’s it take to build a speech app?

Requirements, Use Cases, Project Plan

Call flow, Implementation, & Test

Dialog Design & Test

Prompts, Grammars, & Test

Data / Back-end Integration, & Test

Unit Test, Integration Test, System Test

Pilot, Limited Deployment, Analysis

Full Deployment, Analysis


Where We’ve Come From: Building Speech Apps

• Development toolkits designed for building DTMF applications were extended to support speech
• Call flows had the sound-and-feel of DTMF apps
• Grammars were constructed by hand
• Back-end integration coded by hand, often targeting closed-architecture information stores
  • Screen scraping – ‘row 12, column 37, 9 characters’
  • Proprietary closed databases
• Separate natural language processors driven by recognizer output required separate ‘NL’ grammars
• Poor TTS quality generated need for recorded prompts
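The ‘row 12, column 37, 9 characters’ style of screen scraping can be sketched as follows. This is a minimal illustration only, assuming the host screen is available as a list of fixed-width text lines and that rows and columns are 1-based; the function name `scrape_field` and the sample screen contents are hypothetical.

```python
def scrape_field(screen, row, col, length):
    """Return the text found at a fixed 1-based (row, col) position on a screen."""
    line = screen[row - 1]
    return line[col - 1:col - 1 + length].strip()


# Hypothetical 24x80 host screen with a value placed at row 12, column 37.
screen = [" " * 80 for _ in range(24)]
screen[11] = " " * 36 + "  1234.56" + " " * 35
balance = scrape_field(screen, row=12, col=37, length=9)
print(balance)
```

The extraction depends entirely on the screen layout, so any change to the host screen moves the field and the integration has to be updated by hand.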


Where We Are: Building speech apps today

• Methodologies and Tools
  • Methodology: problem statement, use cases, dialog design, project management
  • Data / Back-end integration
• Reusable components
  • OpenSpeech Dialog Modules
  • Reusable Dialog Components
• Packaged applications
• Testing & Analytics


Current Practice

Most applications use state-based dialogs

• Easiest to design, debug and test for current simple applications
• Natural fit with the directed dialogs that are easiest for novice users
• Speech recognizer grammars are simpler to construct and therefore less error prone
• As developers and users become exposed to more sophisticated dialog approaches, they will become less satisfied with state-based dialogs
  • Goal-directed
  • Conversational
  • Rule-based
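A state-based directed dialog can be sketched as a table of states, each with its own prompt and its own small grammar. This is a minimal sketch under assumed names: the `DIALOG` table, the state names, and the helper functions are hypothetical and do not come from any of the tools discussed here.

```python
# Each state names its prompt and the state to move to for each accepted answer.
# The accepted answers double as that state's recognizer grammar.
DIALOG = {
    "main_menu": {
        "prompt": "Say balance or transfer.",
        "next": {"balance": "read_balance", "transfer": "ask_amount"},
    },
    "read_balance": {"prompt": "Here is your balance.", "next": {}},
    "ask_amount": {"prompt": "How much would you like to transfer?", "next": {}},
}


def step(state, utterance):
    """Return the next state, or stay in the same state if the answer is out of grammar."""
    return DIALOG[state]["next"].get(utterance, state)


def active_grammar(state):
    """Words the recognizer needs to listen for while in this state."""
    return sorted(DIALOG[state]["next"])
```

Because only one state is active at a time, each grammar stays small, which is the property the bullets above credit for simpler and less error-prone grammars.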


Tools for Building Speech Applications

• Dialog design, evaluation, call flow development, back-end integration, prototype, deployment, tuning, life cycle support.
• Vendors
  • Active:
    • Audium: the ‘Audium Builder’
    • DBscape Vocabase
    • Fluency: ‘Voice Runner’
    • OpenMethods: ‘OpenVXML’
    • TuVox: ‘CVR’ (‘Producer’ + management & analytics)
    • Vicorp: ‘xMP’
    • VoiceObjects: ‘VoiceObjects X6’
  • Inactive:
    • Unisys: the ‘NL Speech Assistant’
    • Unveil: ‘Conversation Manager’
    • Vocalocity: ‘AppCenter’
  • Support:
    • Eclipse – Back-end integration
    • Microsoft: ‘Visio’ for call flow representation
    • Nuance: OSI – Tuning

Avaya Dialog Designer

IBM WebSphere

Intervoice InVision

Microsoft Speech .NET

NetByTel (TuVox)

Nortel MPS Developer (was PeriProducer)

Nuance OSD

Orange Nextfire OAVS

And others……


SCE Tools: what to look for

• Manipulable element – what the SCE assembles
• Element detailing – how each is tailored for use
• Business rule / back-end integration
• Architectural model – underlying design pattern
• Life cycle support – pre- and post-deployment management and testing

Visio to Represent Dialog Call flow

[Figure: Visio call flow diagram for the mixed-initiative dialog DT7. Recoverable labels: DT7.1 "prompt for data"; data-collection states DT7.C1, DT7.C2 … DT7.Cn ("What is <data1>?", "What is <data2>?", … "What is <dataN>?"); yes/no confirmation states DT7.2, DT7.3, DT7.3b and DT7.3n ("<data1> … <dataN>. Is that correct? Yes or no?"); transitions labelled "yes", 1st "no", 2nd "no", 2nd silence and 2nd misrec; exits "Return Yes", "GS1 Transaction Error Recovery" and DT7.4 "Unable to collect <data>". Diagram note: if only data2 was collected from DT7.2, then go to DT7.C1 and collect the first piece of data and then return to collect any remaining data; this capability is not implemented in AATAKECOMPLETE.]

(Source: Unisys ‘FFA’ design specification)
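The yes/no confirmation pattern that this kind of call flow diagram captures can be sketched in a few lines. The sketch assumes a policy in which a first "no", silence, or misrecognition leads to a re-prompt and a second one gives up and sends the caller back to data collection; the exact transitions of the original diagram are not recoverable here, and the function name, result strings, and `max_attempts` parameter are hypothetical.

```python
def confirm(responses, max_attempts=2):
    """Run a yes/no confirmation over a sequence of recognizer results.

    `responses` yields "yes", "no", "silence", or "misrec", one per attempt.
    Returns "confirmed" on "yes"; otherwise returns "recollect" once the
    allowed attempts are used up or the responses run out.
    """
    attempts = 0
    for result in responses:
        if result == "yes":
            return "confirmed"
        attempts += 1  # "no", silence and misrecognition all count as a failed attempt
        if attempts >= max_attempts:
            break
    return "recollect"
```

Drawing the same logic in Visio makes every counter and exit visible to reviewers, at the cost of the diagram density seen on this slide.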


Audium (Purchased by Cisco)

• Audium Builder: a GUI that permits users to create and manage multiple applications
• Visual elements include functions for managing databases, menus, dates and times, or phone transfers, as well as credit card or email processing.
• Application creation is done by dragging elements to the workspace to construct the call flow
• As elements are added their properties can be configured to load pre-recorded audio or TTS prompts, and configured to play naturally to callers.
• Elements are interconnected using the GUI to assign ‘exit states’ to reach an end goal.

Source: Joe Oh, Audium, (private communication)


Audium

Application treeview

Tools

Object properties


DBscape Vocabase

The VocaBase “Dialog Map” represents the sequence of modules, sub-modules, and steps. Clicking on any element gives access to its detailed configuration.


Fluency ‘Voice Runner’

Key features of this tool are:
• Visual component assembly
• Integrated component assembly analysis & testing
• One click assembly deployment
• Library of process and rule components:
  • Address Collection
  • Credit Card Verification


Vicorp xMP


VoiceObjects 6 Desktop

• Tree structure to represent dialog design
• Point-and-click authoring.
• Layering includes system layers and user-built layers
• Single click packages an application for deployment
• Back-end integration: ‘connectors’ support both server-side scripting and J2EE code execution
• Uses object-oriented concepts

Source: http://www.voiceobjects.com/


VoiceObjects Desktop – At a glance

Individual editor for voice object

List of all available VoiceObjects

Source: Tiemo Winterkamp, VoiceObjects (private communication)

Components

Resources

Logic

Actions


VoiceObjects Desktop - Control Center

Source: Tiemo Winterkamp, VoiceObjects (private communication)


Microsoft Speech (Visual Studio)


Unisys ‘NLSA’


NLSA Grammar Specification


Vocalocity AppCenter

Source: Ken Rehor - 2005


OpenVXML – Open Source SCE


Back-end Integration

• Java, JSP, C#
• Scripting languages
  • Perl
  • JSP / ASP
  • PHP
  • …
• Databases
  • Oracle
  • Microsoft SQL Server
  • MySQL / PostgreSQL
• Web Services
• AJAX (Asynchronous JavaScript and XML)
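Database-backed integration of this kind usually reduces to a parameterized query whose result is turned into prompt text. The sketch below is an illustration only: it uses Python's built-in `sqlite3` module as a stand-in for the databases listed above, and the `accounts` table, its columns, and the `balance_prompt` function are hypothetical.

```python
import sqlite3


def balance_prompt(conn, account_id):
    """Look up an account balance and return the text of the prompt to play."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    if row is None:
        return "Sorry, I could not find that account."
    return f"Your balance is {row[0]:.2f} dollars."


# In-memory database standing in for a production data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('1001', 250.5)")
print(balance_prompt(conn, "1001"))
```

Keeping the lookup behind one function lets the dialog layer stay unaware of which database or web service actually supplies the data.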


Eclipse


Testing

• Unit – emulation
• Call flow – WoZ (Wizard of Oz) or live
• Usability – WoZ or live
• Post-deployment analytics
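Unit testing by emulation can be sketched by replacing the live recognizer with a scripted caller. This is a minimal sketch under assumed names: `run_dialog` and `scripted_caller` are hypothetical and do not correspond to any tool covered in this presentation.

```python
def run_dialog(recognize, max_turns=3):
    """Ask for a yes/no answer, calling `recognize` in place of a live recognizer."""
    transcript = []
    for _ in range(max_turns):
        transcript.append("PROMPT: Is that correct? Yes or no?")
        answer = recognize()
        transcript.append(f"CALLER: {answer}")
        if answer in ("yes", "no"):
            return answer, transcript
    return "no-match", transcript


def scripted_caller(utterances):
    """Return a recognizer stand-in that replays canned utterances in order."""
    replies = iter(utterances)
    return lambda: next(replies)


# One out-of-grammar answer followed by a valid one.
outcome, log = run_dialog(scripted_caller(["maybe", "yes"]))
```

The returned transcript can be compared against an expected script, which is one way an emulation run becomes a repeatable regression test.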


Modules and packaged applications

Modules: components and templates

Source: Steve Erlich, Apptera (private communication)

• Application: a software program designed to perform a specific set of functions
• Component: a piece of software that can be combined with other pieces to construct a program
• Template: a pattern used to replicate objects


SCE Analysis and Evaluation

• Manipulable element – what the SCE assembles
  • Dialog state
  • Object module
  • Conversation step
• Element detailing
  • Properties and values
  • Element attributes
  • Prompt and grammar management
• Business rule / back-end integration
  • Built-in primitives
  • Integration with Java, Web Services, Databases
• Architectural model
  • OO? FSM? SOA? MVC? Design patterns?
  • Visible dialog metalanguage?
• Life cycle: Deployment and post-deployment support
  • Reuse: create, package, and integrate reusable components
  • Test capability; test script generation; WoZ capability
  • Analytics


Audium

• Application Development assets
  • GUI is implemented using Eclipse. Visio-like view
  • Inline grammars can be generated directly by the Studio
  • Centralized prompt management capability; recording scripts generated
  • OSDM integration supported (but RDCs are not)
  • XML dialog meta-language documented and the DTD provided
  • Multiple ‘Form’ elements can be combined to generate mixed-initiative dialog
  • Multi-user collaboration is well supported and demonstrated at customer sites
• Runtime assets
  • Applications published as XML; interpreted by a Java runtime engine
  • SNMP queries are generated
• Liabilities
  • Layering is not distinct – common database and external component references
  • No 3rd party application support
  • No automatic test script generation
  • No dedicated form for mixed initiative
  • No runtime cluster or server management
  • No speaker verification or video service generation capability
  • Elements oriented towards programmers, not towards VUI designers


Vicorp

• Application Development assets
  • Explicit separation of presentation layer from business objects layer
  • Visio-like presentation of application call flow.
  • Inline grammars with confidence levels generated from item lists
  • Prompt categories facilitate multiple persona and language management.
  • Invokes 3rd party applications by URI with arguments.
  • Directed dialog, mixed initiative, and sub dialogs are supported.
• Runtime assets
  • Applications published as EAR files for execution on J2EE application server.
  • Service Management Console provided to manage server clusters.
• Liabilities
  • No support for the generation of SSML for TTS
  • Internal XML dialog meta-language not exposed for use
  • No automatic testing of applications; no post-deployment analytics
  • No support for multi-user management or collaboration
  • Speaker verification and video service generation not shown
  • It is not possible to open multiple simultaneous projects then cut-and-paste between them.
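Generating an inline grammar from an item list can be sketched as follows. The sketch emits a grammar in the W3C SRGS XML format, which is an assumption: the slides do not say which grammar format Vicorp produces, and confidence levels are left out because the source does not describe how they are represented. The function name `grammar_from_items` and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def grammar_from_items(items, rule_id="choice", lang="en-US"):
    """Build an SRGS XML grammar whose root rule accepts any one of `items`."""
    # Serialize SRGS elements without a prefix (default namespace).
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {
            "version": "1.0",
            "root": rule_id,
            "mode": "voice",
            f"{{{XML_NS}}}lang": lang,
        },
    )
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for text in items:
        # One alternative per list entry.
        ET.SubElement(one_of, f"{{{SRGS_NS}}}item").text = text
    return ET.tostring(grammar, encoding="unicode")


grammar_xml = grammar_from_items(["checking", "savings"])
```

Deriving the grammar from the same list that drives the menu keeps prompts and recognizable phrases in step, which is the practical appeal of tool-generated inline grammars.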


VoiceObjects

• Application Development assets
  • Layering facilitates runtime prompt and persona remapping
  • Java extensions easily integrated as external resources
  • OSDM integration supported
  • Invokes 3rd party applications by URI with arguments.
  • XML dialog meta-language documented, DTD provided
  • Recording script generation by DB query
  • Multi-user collaboration supported: user logons with specific privileges
• Runtime assets
  • Single runtime engine accesses all applications as data
  • Runtime data collection through ‘InfoStore’ and a mature Analytics package.
  • Extensive server cluster management, including SNMP
  • Support for multi-tenancy: separate JVMs launched for each tenant
• Liabilities
  • Reusable Dialog Components are not supported
  • No explicit prompt management
  • Eclipse integration is incomplete
  • Confidence values not supported
  • No generation of SSML or recording scripts
  • No built-in application testing capability or test script generation capability
  • Natural language apps only supported by reference to external SLMs
  • External resources such as Java jar files are not managed by app dev environment.


Conclusion

• Building speech applications today…..

…..a bit like a marriage!

Dialog modules,

Packaged apps

VUI built with

tools

ASR and TTS

subsystems


Summary

• Overview of speech application creation process
• Building speech applications today
  • Methodologies and Tools
  • Reusable components
  • Packaged applications
• Where the field is going
  • Dialog description languages and tools: MI, Personalization, automatic call flow generation
  • SLMs, ASR & TTS improvements, Rule-Based and Case-Based Reasoning


Thank You.