AN EXTENSIBLE TRANSCODER FOR HTML TO VOICEXML CONVERSION

by

Narayanan Annamala

Gopal Gupta

B. Prabhakaran

DEPARTMENT OF COMPUTER SCIENCE

THE UNIVERSITY OF TEXAS AT DALLAS

Overview

Goal: Make information accessible to visually impaired individuals.

Screen readers work well, but are not completely voice/audio based.

Screen readers provide a workaround for a technology that was not created with blind people in mind.

We should leverage a technology that is more easily usable by blind individuals: VoiceXML.

Web access via voice has become important for other reasons as well, e.g., cell phones.

VoiceXML

VoiceXML is an XML-based language for marking up voice/audio data.

A standard developed by the VoiceXML Forum (AT&T, Motorola, IBM, Lucent). Now a W3C standard.

VoiceXML is a markup language for creating telephone-based human-computer interfaces.

VoiceXML pages are "browsed" via a voice browser running on a computer.

Users interact with a VoiceXML page through spoken input or telephone key presses.

The voice browser plays audio files and speech synthesized by TTS (text-to-speech) converters.

Sample VXML

<vxml version="2.0">
  <form>
    <field name="rich">
      <grammar type="application/x-gsl" mode="voice">
        <![CDATA[[ [(yes)] {<option "yes">} [(no)] {<option "no">} ]]]>
      </grammar>
      <prompt>Would you like to get rich quick?</prompt>
      <filled>
        Gotcha.
        <if cond="rich=='yes'">
          You want to be rich!
          <goto next="rich.vxml"/>
        <else/>
          You don't want to be rich.
          <goto next="notrich.vxml"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>

HTML to VXML

To make the web accessible via VoiceXML, all web pages would need to be rewritten in VXML.

The natural solution is to perform this translation automatically.

The objective of our research is to develop a translator that converts HTML to VoiceXML.

HTML pages can then be translated to VXML and browsed via voice/audio on a voice browser.

Application of the Transcoder

[Figure: three deployment configurations for the transcoder.

1. Mobile User, PSTN, Voice Server, INTERNET, Transcoder, WEB SERVER. Labels: Req., http req., html, VoiceXML, Audio.

2. Client with Voice Browser, INTERNET, Transcoder next to the WEB SERVER. Labels: http req., HTML, VXML, Audio.

3. Client with Transcoder and Voice Browser, INTERNET, WEB SERVER. Labels: http req., HTML, VXML, Audio.]

Transcoder: Objectives

Provide a means for visually impaired users to access the Web.

Strive to express the structure of HTML pages in voice form.

Allow the application to be customized to the user's wishes.

Make the transcoder extensible, to accommodate new HTML tags in the future.

VoiceXML Example

HTML file

<html>
  <head>
    <title> Sample Page</title>
  </head>
  <body>
    <h3> The output is in the form of audio </h3>
  </body>
</html>

VoiceXML file

<?xml version="1.0"?>
<vxml version="2.0">
  <form id="f1">
    <block> starting of the vxml page </block>
    <block> Sample Page </block>
    <block> The output is in the form of audio</block>
  </form>
</vxml>

HTML vs VoiceXML

HTML:

1. Single unit, presented with full efficiency.

2. Displays several inputs at the same time.

3. Input does not need any grammar for validation.

VoiceXML:

1. Consists of forms and blocks alone.

2. Inputs are collected sequentially.

3. Every input needs a grammar for validation.

System Model

The application is realized in two phases

I. Parsing Phase

II. Translation Phase

Parsing Phase: The input HTML file is parsed and the HTML node tree is obtained as output. The parser used for this purpose is the Web-Wise Systems HTML parser.

Translation Phase: Each HTML node is converted into the corresponding VoiceXML node.

System Architecture

[Figure: system architecture. Components: Input Provider, Parser, Translator, internal data sheet, external data sheet, output VoiceXML file.]

Parsing Phase

The structure of the HTML file should be carried over to the VoiceXML file.

The HTML file is parsed and the root node of the input file is obtained. The root node of any HTML file is the <html> node.

[Figure: node tree with root <html> and children <head> and <body>.]

Parsing Example

Input HTML file

<html>
  <head><title>Example 1</title></head>
  <body>
    <h1> Hello World </h1>
  </body>
</html>

Output parse tree

(htmlRoot = new RootNode())
  .addNode(new PageNode()
    .addNode(new HeadNode()
      .addNode(new TitleNode()
        .addNode(new StringNode().setHtmlData("Example 1"))
      ) // end TitleNode
    ) // end HeadNode
    .addNode(new BodyNode()
      .addNode(new H1Node().setAlign("center")
        .addNode(new StringNode().setHtmlData("Hello World"))
      ) // end H1Node
    ) // end BodyNode
  ); // end PageNode
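As a rough illustration of the parsing phase, the sketch below builds a comparable node tree with Python's standard html.parser module instead of the Web-Wise Systems parser named in the talk. The Node and TreeBuilder classes, the parse_html and dump helpers, and the VOID_TAGS set are names made up for this example; they are not part of the original transcoder.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag in HTML, so they cannot have children.
# (Illustrative subset, chosen for this sketch.)
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One node of the parse tree: an element or a text string."""

    def __init__(self, tag, attrs=None, text=""):
        self.tag = tag                  # element name, or "#text" for strings
        self.attrs = dict(attrs or [])
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree by keeping a stack of the currently open elements."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", text=data.strip()))


def parse_html(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def dump(node, depth=0):
    """Return an indented outline of the tree, one node per list entry."""
    label = repr(node.text) if node.tag == "#text" else node.tag
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


PAGE = ("<html><head><title>Example 1</title></head>"
        "<body><h1> Hello World </h1></body></html>")
print("\n".join(dump(parse_html(PAGE))))
```

The stack mirrors the nesting of open tags, so the element on top of the stack is always the parent of the next node encountered.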

Translating Phase: Issues

Translating phase: The node tree is traversed recursively (left to right, depth first).

Each HTML node is converted to the appropriate VoiceXML node.

Issues:

Inputs must be verified with the user before submission, unlike in HTML.

VoiceXML is highly structured and follows strict conventions. E.g., <prompt> It is a beautiful city </prompt> is syntactically right, but is only valid as a child of certain elements such as <field> or <block>.

One-to-one conversion is not always possible.
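The left-to-right, depth-first traversal can be sketched as follows. This is only a minimal illustration: the tuple-based TREE stands in for the real parse tree, and translate and to_vxml are hypothetical helper names. Every text node is wrapped in a <block>, which is one simple way to respect the rule that spoken text must sit under a legal parent.

```python
from xml.sax.saxutils import escape

# A tiny stand-in for the parse tree: (tag, [children]); strings are text nodes.
TREE = ("html", [
    ("head", [("title", ["Sample Page"])]),
    ("body", [("h3", ["The output is in the form of audio"])]),
])


def translate(node, out):
    """Visit children left to right, depth first; emit one <block> per text node."""
    if isinstance(node, str):
        # Bare text is wrapped in <block> so that it sits under a legal parent.
        out.append("    <block> " + escape(node) + " </block>")
        return
    _tag, children = node
    for child in children:
        translate(child, out)


def to_vxml(tree, form_id="f1"):
    """Wrap the translated blocks in a single VoiceXML form."""
    lines = ['<?xml version="1.0"?>', '<vxml version="2.0">',
             f'  <form id="{form_id}">']
    translate(tree, lines)
    lines += ["  </form>", "</vxml>"]
    return "\n".join(lines)


print(to_vxml(TREE))
```

Because the recursion finishes each subtree before moving to the next sibling, the spoken order matches the reading order of the HTML page.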

Forms: radio tag

Radio tags provide choices; the user selects one choice.

When one choice is selected, the others become inactive.

In HTML, radio tags do not have a closing tag.

The challenge is to identify the last radio button of the same group.

Example: input HTML section

<form>
  <INPUT type="radio" name="sex" value="male"> Male <br>
  <INPUT type="radio" name="sex" value="female"> Female <br>
  <h1> End of Radio </h1>
</form>

Forms: radio tag (contd.)

Output VoiceXML section

……
<field name="sex">
  <prompt> Please select an Entrée, what sex <enumerate/></prompt>
  <option dtmf="1" value="male"> Male </option>
  <option dtmf="2" value="female"> Female </option>
</field>
…….

[Figure: node tree for the form. The Form node has children Radio (male, sex), Radio (female, sex), and h1; the h1 node has the child String: 'End of Radio'.]
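One possible way to find the end of a radio group is to scan the form's children in order and treat each run of adjacent radio inputs with the same name as one group; the first sibling that is not such a radio closes the group. The sketch below assumes a simplified tuple representation of the sibling nodes. The names SIBLINGS, radio_key, and radio_fields and the prompt wording are invented for illustration and do not come from the original transcoder.

```python
from itertools import groupby
from xml.sax.saxutils import escape, quoteattr

# Flattened children of a <form> node, in document order. Each radio entry is
# ("radio", name, value, label); anything else is ("other", tag).
SIBLINGS = [
    ("radio", "sex", "male", "Male"),
    ("radio", "sex", "female", "Female"),
    ("other", "h1"),
]


def radio_key(node):
    """Radios are keyed by their name; every other node gets a key of None."""
    return node[1] if node[0] == "radio" else None


def radio_fields(siblings):
    """Turn each run of adjacent same-name radios into one <field>."""
    fields = []
    for name, run in groupby(siblings, key=radio_key):
        if name is None:
            continue  # a non-radio node closes the group that precedes it
        lines = [
            f"<field name={quoteattr(name)}>",
            f"  <prompt> Please choose {escape(name)}: <enumerate/> </prompt>",
        ]
        # Number the choices 1, 2, ... so each maps to a telephone key.
        for digit, (_kind, _name, value, label) in enumerate(run, start=1):
            lines.append(
                f'  <option dtmf="{digit}" value={quoteattr(value)}> '
                f'{escape(label)} </option>'
            )
        lines.append("</field>")
        fields.append("\n".join(lines))
    return fields


print("\n\n".join(radio_fields(SIBLINGS)))
```

Groups with more than nine choices would need a different key assignment, which this sketch does not attempt.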

Form: Text Box

Text boxes and text areas are used to obtain string inputs from the user.

There is no fixed sample space for a string, e.g., the name of a person.

VoiceXML inputs always need a grammar. The <record> element is used to solve this problem.

The user can specify the record time and other attributes.

<submit> needs a list of fields and a URL for submission.

The inputs should be verified with the user before submission.

Form: text box (contd.)

Sample HTML extract

…….
<form action=WW method=XX>
  <LABEL for="firstname"> Firstname </LABEL>
  <INPUT type="text" id="firstname">
  <INPUT type="submit" value="send">
</form>
……..

Corresponding VoiceXML extract

……..
<form id="f2">
  <record name="firstname" beep="true" maxtime="10s" finalsilence="4000ms" dtmfterm="true">
    <prompt> At tone, speak First name: </prompt>
    <noinput> I did not hear anything, please try again </noinput>
    <filled> <prompt> Your input is <audio expr="firstname"/></prompt>
    </filled>
  …….
  <submit next=WW method=XX namelist= …..>
</form>
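The mapping from a text box to a <record> item can be sketched as a small template function. This is an assumption-laden illustration: record_for_text_input is a hypothetical name, and its max_seconds parameter stands in for the user-supplied input duration. The attribute values other than maxtime are copied from the extract above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def record_for_text_input(name, label, max_seconds=10):
    """Build a <record> item for a free-form text field.

    max_seconds stands in for the user-supplied input duration.
    """
    return "\n".join([
        f'<record name={quoteattr(name)} beep="true" maxtime="{max_seconds}s"'
        ' finalsilence="4000ms" dtmfterm="true">',
        f"  <prompt> At tone, speak {escape(label)}: </prompt>",
        "  <noinput> I did not hear anything, please try again </noinput>",
        "  <filled>",
        # Play the recording back so the user can verify it before submission.
        f"    <prompt> Your input is <audio expr={quoteattr(name)}/> </prompt>",
        "  </filled>",
        "</record>",
    ])


fragment = record_for_text_input("firstname", "First name")
ET.fromstring(fragment)  # raises ParseError if the fragment is not well-formed
print(fragment)
```

Playing the recording back inside <filled> is what gives the user a chance to confirm the input, since a recording cannot be checked against a grammar.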

Links

In HTML, links are given by the <a href..> tag in two ways:

• To a different part of the same document.

• To a different document altogether.

In VXML, links are provided by the <goto next ..> element.

To internal targets: sub-dialogs are created. A sub-dialog is like a function call. <goto next= sub-dialog name>

To external documents: <goto next=URL>. The target HTML page is converted to a VoiceXML page, and the URL of that VoiceXML page is provided.
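The two link cases can be illustrated with a short sketch. The talk does not say how the VoiceXML URL for an external page is formed, so TRANSCODER_URL below is a purely hypothetical service address, and goto_for_link is an invented helper name. Internal links are assumed to use a "#name" fragment that matches the name of the generated dialog.

```python
from urllib.parse import quote
from xml.sax.saxutils import quoteattr

# Hypothetical address of a transcoding service; not part of the original system.
TRANSCODER_URL = "http://transcoder.example/convert?url="


def goto_for_link(href):
    """Map an HTML href to a VoiceXML <goto> element."""
    if href.startswith("#"):
        # Link within the same document: jump to the dialog with that name.
        return f"<goto next={quoteattr(href)}/>"
    # Link to another document: point at its transcoded VoiceXML version.
    target = TRANSCODER_URL + quote(href, safe="")
    return f"<goto next={quoteattr(target)}/>"


print(goto_for_link("#section2"))
print(goto_for_link("http://example.com/a.html"))
```

Routing external links back through the transcoder means the user never leaves the voice interface while following links from page to page.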

Text Display Tags

Tags used only for visual display do not make much sense in VoiceXML.

The function of some display tags can be spoken aloud.

<block>…….</block> and <prompt>…….</prompt> are the tags used to speak the text enclosed between them.

The content to be spoken can be tailored using the interface sheet.

The interface sheet is also used to add new HTML tags, making the system extensible.

Extensible Feature of Transcoder

A. Input Attributes

Input duration in seconds for Text-box :

Input duration in seconds for Text-Area :

………….

B. HTML Tags : Corresponding Text spoken

<blockquote> : Starting of text quoted from elsewhere

</blockquote> : Ignore

………… : …………..

Row A: Input attributes can be supplied by the user.

Row B: The treatment of HTML tags can be altered, or tags can be ignored. New tags can be added in this section.
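The idea behind the interface sheet can be sketched as a plain lookup table that the translator consults, so that supporting a new tag is a data change instead of a code change. The dictionary layout, the names INTERFACE_SHEET, INPUT_ATTRIBUTES, and spoken_block, the duration values, and the `<hr>` entry are all assumptions made for this example; the actual sheet format is not described in the talk.

```python
# Row B of the sheet: tag -> text to speak, or None to ignore the tag.
INTERFACE_SHEET = {
    "<blockquote>": "Starting of text quoted from elsewhere",
    "</blockquote>": None,
}

# Row A of the sheet: user-supplied input attributes (illustrative values only).
INPUT_ATTRIBUTES = {"text_box_seconds": 10, "text_area_seconds": 30}


def spoken_block(tag, sheet=INTERFACE_SHEET):
    """Return a <block> announcing the tag, or None if it is ignored or unknown."""
    text = sheet.get(tag)
    return None if text is None else f"<block> {text} </block>"


# Supporting a new tag only requires a new entry; the function is unchanged.
extended = {**INTERFACE_SHEET, "<hr>": "Horizontal rule"}

print(spoken_block("<blockquote>"))
print(spoken_block("<hr>", extended))
```

Here unknown tags are treated the same way as ignored ones; a real translator might instead fall back to a default treatment.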

Conclusion

Our transcoder can convert any HTML file (version 4.0 or lower) to a corresponding VoiceXML file.

Prominent features of the transcoder: extensibility and user interactivity.

HTML to VoiceXML conversion paves the way for anytime, anywhere Internet access for the visually impaired (as well as cell phone users).

Future Work

Process applets and scripts that may be present in the input HTML page.

Build a true voice-based web (see next talk).

Related Work

Visually impaired users have relied on screen readers.

F. James proposed the Auditory HTML Access system (AHA), which used distinct tones.

Neither of the above two systems has an interactive feature.

Goose et al. proposed an HTML to VoxML converter. VoxML is a predecessor of VoiceXML.
