Pham Thang Presentation

8/3/2019 Pham Thang Presentation


Real-Time Speech Recognition

Thang Pham

Advisor: Shane Cotter



Background

Types of speech recognition systems: Word recognition, Connected speech recognition, Speech understanding systems

Simplest: user-dependent limited vocabulary

Hard to design any system:

Variations of speech, such as amplitude, duration, and signal-to-noise ratio

Background noise

Reverberation noise

Implemented in banking, telephone systems, etc.

Commercial example: IBM ViaVoice



Project Outline

Design a user-dependent speech recognition system to control the movement of a small remote-controlled car

Limited in vocabulary: Backward, Forward, Left, and Right

Trained to my voice

Different speech recognition algorithms were examined to understand the advantages and disadvantages of each:

Linear Predictive Coding

Cepstrum Coefficients

Mel-frequency Cepstrum Coefficients



System Design

Microphone

TI 6713 DSP Board

Sample word at 8 kHz

Segment word into time frames

Find Mel-Cepstrum coefficients for each frame

Compare input word to a codebook of defined words using dynamic time warping

Recognized word
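
The "segment word into time frames" step of the pipeline above might look roughly like the following sketch. The slides give only the 8 kHz sampling rate; the frame length (256 samples, 32 ms) and hop (128 samples) are assumptions chosen for illustration, and `split_into_frames` is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate stated on the slide


def split_into_frames(samples, frame_len=256, hop=128):
    """Split a list of samples into overlapping fixed-length frames.

    frame_len and hop are assumed values, not taken from the slides.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


# One second of silence stands in for a recorded word.
word = [0.0] * SAMPLE_RATE_HZ
frames = split_into_frames(word)
```

Each frame would then be passed to the mel-cepstrum step, producing one feature vector per frame.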



Components List

Texas Instruments TMS320C6713 DSP Board

Audio-Technica ATR35S Omnidirectional Microphone

Two step motors



Linear Predictive Coding

Provides a good model of the speech signal.

Can approximate a speech sample at time n from past samples:

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p)$$

where a_1, a_2, ..., a_p are coefficients that weight each sample.
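
As a minimal sketch of the weighted-sum prediction above, the snippet below predicts one sample from the p previous ones. The function name `lpc_predict` and the coefficient values are made up for illustration; the slides do not say how the coefficients were estimated, so that step is left out.

```python
def lpc_predict(signal, coeffs, n):
    """Predict signal[n] as a_1*s(n-1) + a_2*s(n-2) + ... + a_p*s(n-p).

    coeffs[0] holds a_1, coeffs[1] holds a_2, and so on.
    """
    p = len(coeffs)
    if n < p:
        raise ValueError("need at least p past samples before index n")
    return sum(coeffs[k] * signal[n - 1 - k] for k in range(p))


# Illustrative (assumed) second-order coefficients and two seed samples.
coeffs = [1.5, -0.5]
signal = [1.0, 2.0]

# Extend the signal by repeatedly predicting the next sample.
for n in range(2, 6):
    signal.append(lpc_predict(signal, coeffs, n))
```

In a recognizer the prediction would be compared with the real sample, and the coefficients would be chosen to make that prediction error small.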



Mel-frequency Cepstrum Coefficients

Research has shown mel-frequency cepstrum coefficients to be better than cepstrum coefficients and LPC

Modeled around the human auditory system (ear)

$$c_n = \sum_{k=1}^{M} \log(S_k) \cos\left[n \left(k - 0.5\right) \frac{\pi}{M}\right]$$

where c_n is the nth-order mel-frequency cepstrum coefficient, S_k is the power of the kth mel filter, and M is the number of mel filters.

12 mel-frequency cepstrum coefficients characterize each time frame
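
The cosine sum above can be evaluated directly once the mel filter powers for a frame are known. The sketch below assumes a base-10 logarithm and 20 mel filters, neither of which is confirmed by the slides; `mel_cepstrum` is a hypothetical name, and the filter bank itself is not shown.

```python
import math


def mel_cepstrum(filter_powers, num_coeffs=12):
    """Return c_1..c_num_coeffs from the powers S_1..S_M of M mel filters.

    Implements c_n = sum_k log(S_k) * cos(n * (k - 0.5) * pi / M).
    The log base (10) is an assumption; all powers must be positive.
    """
    M = len(filter_powers)
    log_powers = [math.log10(s) for s in filter_powers]
    return [
        sum(
            log_powers[k - 1] * math.cos(n * (k - 0.5) * math.pi / M)
            for k in range(1, M + 1)
        )
        for n in range(1, num_coeffs + 1)
    ]


# A perfectly flat spectrum (equal power in every filter) as a sanity check.
flat_frame = mel_cepstrum([10.0] * 20)
```

For a flat spectrum every coefficient c_1 to c_12 comes out as zero up to rounding, since these coefficients capture only the shape of the spectrum and not its overall level.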



Dynamic Time Warping

Arranged mel-frequency coefficients into vectors

Use dynamic time warping to find best match

Compare words that are uttered over different lengths of time

You have a reference word that you are listening for

You have a sampled word

Want to compare both words, sampled and reference, and see if they match

Compare mel-frequency cepstrum coefficients for each frame of speech
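
The matching described above can be sketched with the textbook dynamic time warping recurrence. This is not necessarily the variant used in the project: the Euclidean frame distance, the three allowed step directions, and the names `dtw_distance` and `recognize` are assumptions, and the one-dimensional "feature vectors" are toy data.

```python
import math


def frame_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(sampled, reference):
    """Accumulated cost of the best alignment between two frame sequences."""
    n, m = len(sampled), len(reference)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(sampled[i - 1], reference[j - 1])
            # Extend the cheapest of: repeat a reference frame, repeat a
            # sampled frame, or advance both sequences together.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(sampled, codebook):
    """Return the codebook word whose template is closest under DTW."""
    return min(codebook, key=lambda word: dtw_distance(sampled, codebook[word]))


# Toy codebook: one template per word, one feature per frame.
codebook = {
    "Left": [[1.0], [2.0], [3.0]],
    "Right": [[3.0], [2.0], [1.0]],
}
# The same shape as "Left", but spoken more slowly (frames repeated).
sampled = [[1.0], [1.0], [2.0], [3.0], [3.0]]
best = recognize(sampled, codebook)
```

Here the slower utterance still aligns with the "Left" template at zero cost, which is the property that makes DTW useful for words spoken at different speeds.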




Example of DTW:




Solution:



Results

Word        Recognition Rate

Backward    50%

Forward     70%

Left        90%

Right       40%

Sources of error:

1. Noise, e.g. computer fan, fluorescent light.

2. Voice changes, e.g. a word spoken on one day might not sound the same on the next day.

3. Trained on only one template per word.



Problems Encountered

Warping the frequency domain into mel-frequency, i.e. Log10:

$$F_{mel} = 2595 \log_{10}\left(1 + \frac{F_{Hz}}{700}\right)$$

Translation of MATLAB code into C, e.g. dynamic arrays, debugging process

Dynamic time warping, i.e. theory, algorithm
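
The Hz-to-mel warping listed among the problems above is a one-line formula in code. The sketch below implements it along with its algebraic inverse; the function names are illustrative, and the inverse is not taken from the slides.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels: 2595 * log10(1 + F_Hz / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel, obtained by solving the formula for F_Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# With 8 kHz sampling, the highest representable frequency is 4 kHz.
nyquist_mel = hz_to_mel(4000.0)
```

The inverse is useful when placing mel filters: centre frequencies can be spaced evenly between 0 and `nyquist_mel` and then mapped back to Hz.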



Future Work

The C implementation of this system is being developed. The implementation will be uploaded onto the TI 6713 DSP Board once it is completed.

The code will be modified to allow the recognition system to operate in real time.

More comprehensive testing of the system will be performed under a variety of noise conditions.



That is all.