8/3/2019 Pham Thang Presentation
Real-Time Speech Recognition
Thang Pham
Advisor: Shane Cotter
Background
Types of speech recognition systems: Word recognition, Connected speech recognition, Speech understanding systems
Simplest: user-dependent, limited-vocabulary systems
Hard to design any system because of:
Variations in speech, e.g. amplitude, duration, and signal-to-noise ratio
Background noise
Reverberation noise
Implemented in banking, telephone systems, etc.; e.g. IBM ViaVoice
Project Outline
Design a user-dependent speech recognition system to control the movement of a small remote-control car
Limited vocabulary: Backward, Forward, Left, and Right
Trained to my voice
Different speech recognition algorithms were examined to understand the advantages and disadvantages of each system
Linear Predictive Coding
Cepstrum Coefficients
Mel-frequency Cepstrum Coefficients
System Design
Microphone
TI 6713 DSP Board
Sample word at 8 kHz
Segment word into time frames
Find mel-cepstrum coefficients for each frame
Compare input word to a codebook of defined words using dynamic time warping
Recognized word
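The framing step above can be sketched in C as follows. The slides only state the 8 kHz sampling rate; the frame length (256 samples, 32 ms), the 50% overlap, and the names num_frames and get_frame are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SAMPLE_RATE_HZ 8000 /* from the slides: words are sampled at 8 kHz */
#define FRAME_LEN 256       /* assumption: 256 samples = 32 ms at 8 kHz */
#define FRAME_HOP 128       /* assumption: 50% overlap between frames */

/* Number of complete frames in a word of num_samples samples. */
size_t num_frames(size_t num_samples)
{
    if (num_samples < FRAME_LEN)
        return 0;
    return 1 + (num_samples - FRAME_LEN) / FRAME_HOP;
}

/* Copy frame idx of the word into out[0 .. FRAME_LEN - 1]. */
void get_frame(const short *word, size_t num_samples, size_t idx, short *out)
{
    assert(idx < num_frames(num_samples));
    const short *start = word + idx * FRAME_HOP;
    for (size_t i = 0; i < FRAME_LEN; i++)
        out[i] = start[i];
}
```

With these values a one-second word (8000 samples) yields 61 frames, each of which would then be passed to the mel-cepstrum step.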
Components List
Texas Instruments TMS320C6713 DSP Board
Audio-Technica ATR35S omnidirectional microphone
Two step motors
Linear Predictive Coding
Provides a good model of the speech signal.
Can approximate a speech sample at time n from past samples:
$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)$$
where a_1, a_2, ..., a_p are coefficients that weight each past sample.
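The slides do not say how the coefficients a_1, ..., a_p are obtained. One common choice, sketched below as an assumption rather than the author's method, is the autocorrelation method solved with the Levinson-Durbin recursion; the function names and the MAX_ORDER limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 32 /* hypothetical upper bound on the predictor order p */

/* Autocorrelation r[0..p] of the frame x[0..n-1]. */
static void autocorr(const double *x, size_t n, int p, double *r)
{
    for (int lag = 0; lag <= p; lag++) {
        double sum = 0.0;
        for (size_t i = (size_t)lag; i < n; i++)
            sum += x[i] * x[i - lag];
        r[lag] = sum;
    }
}

/* Levinson-Durbin recursion: finds a[1..p] so that
   s(n) ~ a[1] s(n-1) + ... + a[p] s(n-p); a[0] is set to 0 and unused.
   Returns the final prediction error energy. */
double lpc(const double *x, size_t n, int p, double *a)
{
    double r[MAX_ORDER + 1], tmp[MAX_ORDER + 1];
    assert(p > 0 && p <= MAX_ORDER && n > (size_t)p);
    autocorr(x, n, p, r);
    for (int j = 0; j <= p; j++)
        a[j] = 0.0;
    double err = r[0];
    assert(err > 0.0); /* an all-zero frame cannot be predicted */
    for (int i = 1; i <= p; i++) {
        /* reflection coefficient k_i for order i */
        double acc = r[i];
        for (int j = 1; j < i; j++)
            acc -= a[j] * r[i - j];
        double k = acc / err;
        /* update a[1..i-1] from the order-(i-1) coefficients */
        for (int j = 1; j < i; j++)
            tmp[j] = a[j] - k * a[i - j];
        for (int j = 1; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;
        err *= (1.0 - k * k);
    }
    return err;
}
```

For example, with a decaying exponential x(n) = 0.9^n as input and p = 2, the recursion yields a_1 close to 0.9 and a_2 close to 0, since each sample is exactly 0.9 times the previous one.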
Mel-frequency Cepstrum Coefficients
Research has shown mel-frequency cepstrum coefficients to perform better than cepstrum coefficients and LPC
Modeled around the human auditory system (ear)
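$$c_n = \sum_{k=1}^{M} \log(S_k)\,\cos\left[n\left(k - 0.5\right)\frac{\pi}{M}\right]$$
where c_n is the nth-order mel-frequency cepstrum coefficient, S_k is the power of the kth mel filter, and M is the number of mel filters.
12 mel-frequency cepstrum coefficients characterize each time frame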
Dynamic Time Warping
Arranged mel-frequency coefficients into vectors
Use dynamic time warping to find best match
Compares words that are uttered over different time spans:
You have a reference word that you are listening for
You have a sampled word
You want to compare the two words, sampled and reference, and see if they match
Compare mel-frequency cepstrum coefficients for each frame of speech
Dynamic Time Warping
Example of DTW:
Dynamic Time Warping
Solution:
Results
Recognition rate per word:
Backward: 50%
Forward: 70%
Left: 90%
Right: 40%
Sources of error:
1. Noise, e.g. computer fan, fluorescent light
2. Voice changes, e.g. a word spoken on one day might not sound the same on the next day
3. Trained to only one template per word
Problems Encountered
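Warping the frequency domain into mel frequency, i.e. the log10 mapping:
$$F_{\text{mel}} = 2595 \log_{10}\left(1 + \frac{F_{\text{Hz}}}{700}\right)$$
Translation of MATLAB code into C, e.g. dynamic arrays and the debugging process
Dynamic time warping, i.e. understanding the theory and the algorithm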
Future Work
The C implementation of this system is being developed. The implementation will be uploaded onto the TI 6713 DSP Board once it is completed.
The code will be modified to allow the recognition system to operate in real time.
More comprehensive testing of the system will be performed under a variety of noise conditions.
That is all.