petrR00(2)

8/8/2019 petrR00(2)

SPEECH CONTROL FOR CAR USING THE TMS320C6701 DSP

Martin Petriska, Dušan Považanec, Peter Fuchs

Department of Radio and Electronics, FEI STU, Ilkovicova 3, 812 19 Bratislava

Abstract

This paper presents the design of a voice module intended for use in cars. The module communicates with the driver through human speech: it informs the driver about the state of the car equipment and recognizes the driver's voice commands. This makes it easy to control much of the car equipment by voice. The system can also be used for speaker verification to protect the car against theft. The system consists of a DSP board with a TMS320C6701, large memories and an analog codec. The current state of speech recognition and speech synthesis is also described, with regard to applications in car technology.

Introduction

Is it possible to hold a voice conversation with a car, or is it only science fiction? The Clarion AutoPC, the first in-dash computer and personal assistant controlled through voice activation, has been available since 1998. Voice control does just about everything a push-button would do. Conversation with a machine can be divided into two parts: speech synthesis and speech recognition. In speaking mode, the speech signal is generated according to commands that the central car computer sends to the module. The spoken content can be as varied as the car position acquired from a GPS module, the name of a telephone caller, telephone numbers, current traffic information, Internet e-mails, etc. In recognition mode, the module analyses and executes the speaker's commands, e.g. to dial a telephone number, to tune the car radio, to open the windows, to run the air conditioning and to control other car peripherals.

Commercial voice controlled systems for cars

The Clarion AutoPC has been on the USA market since 1998. Drivers can tell the Clarion AutoPC to select tracks from its optional six-CD changer, activate or deactivate the navigation unit, scan or select specific radio stations, or adjust the volume. When an e-mail arrives, the Clarion AutoPC automatically informs the driver. The speech engine allows users to browse through messages verbally and hear who sent them, when they were sent and what the subject lines say. The same device gives drivers spoken traffic alerts concerning accidents and driving conditions. Speech technologies allow drivers to dial their phones via preset numbers simply by telling the unit to select a name in the address book. And, since it is all controlled through voice activation, the driver never has to take his hands off the wheel.

In Europe there is Blaupunkt's TravelPilot RNS149, which combines car audio and a navigation system in one unit. The GPS-based built-in navigation system gives turn-by-turn guidance with natural-sounding voice output.

Figure 1: Functional diagram of TTS system

Speech synthesis

Speech synthesis in the car voice module can be used for reading e-mails, SMS messages from a mobile phone, news from Internet web pages or other information that is available in text form and can simply be filtered by interest. Text-independent speech synthesis is in most systems based on PSOLA methods. TD-PSOLA is currently one of the most popular concatenation methods. These methods are used for their simplicity, high voice quality and great naturalness. Therefore we decided to use these methods in our voice module. We have developed a Slovak speech synthesizer for personal computers and ported it to the DSP system. The block scheme of the TTS system is shown in Figure 1. It consists of two main parts: the linguistic part and the digital signal processing part. The linguistic part consists of three modules: the text pre-processing module, the phonetic transcriptor and the prosody generator. The text pre-processing module processes the input text and converts numbers, abbreviations, etc. into text. The phonetic transcriptor transforms text into a sequence of diphone codes. Diphones are the segments of speech used for our concatenative speech synthesis. The prosody generator derives three prosodic parameters from the text: F (fundamental frequency, i.e. pitch), T (diphone length) and l (loudness). The prosodic features have specific functions in speech communication. They convey information such as relationships between words, finality or continuation, and segmentation of the sentence into groups of syllables.
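The data flow through the linguistic part can be illustrated with a minimal Python sketch. It is not the authors' Slovak implementation: the English digit words, the use of letters as stand-ins for phones, the `_` silence symbol, the linear pitch declination and all function names are assumptions made only for illustration.

```python
import re
from dataclasses import dataclass

# Assumption: English digit names stand in for the Slovak number rules.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def preprocess(text):
    """Text pre-processing: spell out digits, drop symbols, collapse spaces."""
    spelled = re.sub(r"\d", lambda m: " " + DIGIT_WORDS[int(m.group())] + " ", text)
    letters_only = re.sub(r"[^a-z ]", " ", spelled.lower())
    return " ".join(letters_only.split())

def to_diphones(text):
    """Toy phonetic transcription: each letter is one phone, '_' is silence.

    A diphone code is the pair of two neighbouring phones.
    """
    phones = ["_"] + [c if c.isalpha() else "_" for c in text] + ["_"]
    return [phones[i] + phones[i + 1] for i in range(len(phones) - 1)]

@dataclass
class ProsodicUnit:
    diphone: str
    f0_hz: float      # F: fundamental frequency (pitch)
    length_ms: float  # T: diphone length
    loudness: float   # l: loudness

def generate_prosody(diphones, start_f0=120.0, end_f0=90.0):
    """Hypothetical rule: pitch falls linearly, length and loudness are constant."""
    n = len(diphones)
    units = []
    for i, d in enumerate(diphones):
        frac = i / (n - 1) if n > 1 else 0.0
        units.append(ProsodicUnit(d, start_f0 + (end_f0 - start_f0) * frac, 80.0, 1.0))
    return units

if __name__ == "__main__":
    for unit in generate_prosody(to_diphones(preprocess("U = 5 V"))):
        print(unit)
```

The list of `ProsodicUnit` records is what a digital signal processing part would consume: one diphone code plus the three prosodic parameters per segment.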

Speech recognition

Figure 2: The block scheme of the DSP part

Speech recognition is a very complex problem. It requires many algorithms, which leads to high computational requirements. To meet these performance demands, the new TMS320C6x DSP was chosen. Our project consists of two steps. In the first step we implement speaker verification. Speaker verification is the process of accepting or rejecting the identity claim of a speaker. The algorithm for speaker verification will be based on DTW (dynamic time warping). The second step is command recognition, which needs some additional memory to store the reference utterances. We plan to use speech synthesis, with speech parameters adapted to the recognized speaker's voice, to generate arbitrary utterances that will be compared with the speaker's voice. This solution will save memory space, because the words to be recognized will be generated on the fly from the text. At present we are testing various recognition methods on the PC. There are many open source projects in the world that can be used as a model for developing our speech recognizer, but speech synthesis and speech recognition are language dependent. Therefore Slovak lexical and prosodic rules and a Slovak diphone database for speech synthesis must be created, as well as a Slovak speech database for independent speech recognition of the Slovak language. So far we have created two diphone databases and prosodic and lexical rules for Slovak TTS. The TTS without prosodic rules was ported to the DSP and now works as a spoken value reader in a power supply meter.
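The DTW comparison planned for speaker verification can be sketched as follows. This is a textbook dynamic time warping recursion in Python, not the authors' code; the feature vectors, the length normalization and the acceptance threshold are assumptions, since the paper does not specify them.

```python
def frame_distance(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return sum((p - q) ** 2 for p, q in zip(x, y)) ** 0.5

def dtw_distance(a, b):
    """Length-normalized DTW distance between two sequences of feature frames."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j],      # a advances, b repeats a frame
                       cost[i][j - 1],      # b advances, a repeats a frame
                       cost[i - 1][j - 1])  # both advance
            cost[i][j] = frame_distance(a[i - 1], b[j - 1]) + step
    return cost[n][m] / (n + m)

def verify_speaker(template, utterance, threshold):
    """Accept the identity claim if the utterance is close enough to the template.

    `threshold` is a hypothetical tuning parameter.
    """
    return dtw_distance(template, utterance) <= threshold

if __name__ == "__main__":
    template = [[0.0], [1.0], [2.0], [1.0], [0.0]]
    slow = [frame for frame in template for _ in range(2)]  # same shape, half speed
    print(dtw_distance(template, slow))
    print(verify_speaker(template, [[5.0]] * 5, threshold=0.5))
```

Because the warping path may repeat frames, the same utterance spoken more slowly still aligns with zero cost, which is the property that makes DTW attractive for comparing utterances of different duration.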

Hardware - DSP part with TMS320C6701

The heart of the speech processor is a DSP board with a TMS320C67xx processor. The device takes advantage of large on-board RAM and ROM memories, where the sound parameters are stored. The main reason for using the most recent TI floating-point DSPs in speech processing is their high performance. The TMS320C6701 and TMS320C6711 are very powerful and are therefore suitable for complex, computationally demanding applications such as speech analysis. So far, a DSP board with the TMS320C6701 processor has been designed, which is used in a power supply meter. It uses the TTS system with the PSOLA algorithm to read out the measured values.
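The overlap-add idea behind the PSOLA algorithm can be illustrated with a simplified Python sketch. It assumes a constant pitch period and known pitch marks, applies no gain normalization, and uses hypothetical function names; it is an illustration of the principle, not the code running on the DSP board.

```python
import math

def hann(n):
    """Symmetric Hann window with n samples (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def psola_shift(signal, pitch_marks, period, factor):
    """Scale the pitch of a voiced segment by `factor` (time-domain overlap-add).

    Simplifying assumptions: the analysis pitch period is constant (`period`
    samples) and `pitch_marks` lists the sample index of every pitch mark.
    """
    new_period = max(1, round(period / factor))  # spacing of the output grains
    window = hann(2 * period + 1)                # two periods, centred on a mark
    out = [0.0] * len(signal)
    t = pitch_marks[0]
    while t <= pitch_marks[-1]:
        # Take the grain around the analysis mark nearest to the output position.
        src = min(pitch_marks, key=lambda m: abs(m - t))
        for k in range(-period, period + 1):
            i, j = src + k, t + k
            if 0 <= i < len(signal) and 0 <= j < len(out):
                out[j] += signal[i] * window[k + period]
        t += new_period
    return out

if __name__ == "__main__":
    period = 20
    sawtooth = [(n % period) / period for n in range(200)]
    marks = list(range(20, 181, 20))
    higher = psola_shift(sawtooth, marks, period, 2.0)  # grains every 10 samples
    print(higher[40:60])
```

With `factor` equal to 1.0 the windowed grains add back up to the original samples between the first and last pitch mark; with other factors the grains are re-spaced, which changes the fundamental frequency while each grain keeps its spectral shape.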

Summary

At present there are several workplaces in Slovakia engaged in speech processing development, at differing stages of progress. Good speech recognition requires a large speech database.

At the Slovak Academy of Sciences, the SpeechDat-E project, a fixed telephone speech database, has been completed. Work on that project lasted approximately one year. Our project needs a speech database recorded in a car, but the first experiments can be done with this database.

Figure 3: Detail of DSP module with TMS320C6701

The TMS320C6701 and TMS320C6711 digital signal processors make it possible to design a system with very high computational power and large memory space using a minimal number of components, which saves printed circuit board space and simplifies the design. These processors are very suitable for speech processing. The DSP board can easily be installed in car modules as a voice interface to the central car computer. The software for this module is still under development, because speech processing, and especially speech recognition, is a very complex problem and needs a lot of time. The speech synthesis for this module also needs further development of the prosodic module.

This project is supported by project VTP 95/5195/297 and, with the participation of the Slovak Academy of Sciences, by project VEGA 47/0214/99.

