1/20 A Novel Fuzzy Approach to Speech Recognition Ramin Halavati, Saeed B. Shouraki, Pujan Ziaie Sharif University of Technology Tehran, Iran Presented by: Pujan Ziaie ([email protected]) Presented at Hybrid Intelligent Systems International Conference, 2004, Kitakyushu, Japan.


1/20

A Novel Fuzzy Approach to Speech Recognition

Ramin Halavati, Saeed B. Shouraki, Pujan Ziaie

Sharif University of Technology

Tehran, Iran

Presented by: Pujan Ziaie ([email protected])

Presented at Hybrid Intelligent Systems International Conference, 2004, Kitakyushu, Japan.


2/22

Summary

Introduction: Speech Recognition

Proposed Model

Recognition Approach

Training Process

Results


3/22

Speech Recognition

Several Methods: HMM (Hidden Markov Models), TDNN (Time Delay Neural Networks), …

Common Problems:

Effect of Noise

Recognition Speed

Fuzzy approach:

Ignores details such as noise.

Resembles the human recognition process.


4/22

Human Voice Recognition

Imprecise processing

Deciding upon a rough measurement of amplitude

No counting on speech frames (relative lengths)

Sensitive to lower frequencies


5/22

Proposed Model

Base Data:

Speech Spectrogram

Phoneme Specifications (developed by using GA)

Data manipulation:

Stretching using MEL filter banks. (The human ear is more sensitive to low frequencies and less to high ones.)

Fuzzification to reduce the amount of data. (Humans do not use such precise data.)

Calculating the degree of membership ("belongness") in each phoneme


6/22

Proposed Model

Spectrogram:


7/22

Proposed Model

After MEL-Stretching
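The slides do not state which mel formula was used. The sketch below assumes the widely used conversion m = 2595 · log10(1 + f / 700) and shows how band edges spaced evenly on the mel scale give narrow bands at low frequencies and wide bands at high ones. The function names, the 0 to 8000 Hz span, and the choice of 25 bands are illustrative assumptions, not values taken from the presentation.

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (common 2595*log10 formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_min_hz, f_max_hz, n_bands):
    """Return n_bands + 1 band edges (in Hz) equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]

# Hypothetical setup: 25 bands between 0 Hz and 8000 Hz.
edges = mel_band_edges(0.0, 8000.0, 25)
widths = [b - a for a, b in zip(edges, edges[1:])]
# Low-frequency bands are narrower than high-frequency ones,
# so low frequencies get proportionally more resolution.
print(round(widths[0], 1), round(widths[-1], 1))
```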


8/22

Proposed Model

Data Reduction (Fuzzification)

Step 1, Sorting: The original signal frames are divided into 25 vertical ranges, and the values inside each range are sorted so that the more powerful ones move to the top.

Step 2, Reduction: The top 10% of the values in each range are chosen and averaged, and the result replaces all the values of that range, so that every value within a vertical range becomes identical.
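The two steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: it treats one spectrogram column as a flat list of values, splits it into equally sized ranges, and keeps at least one value per range. The function name `reduce_column` and the equal-size split are hypothetical; the slides do not specify how the ranges are bounded.

```python
def reduce_column(values, n_ranges=25, top_fraction=0.10):
    """Reduce one spectrogram column to one value per range.

    Each range is sorted in descending order, and the mean of its
    top `top_fraction` values stands in for the whole range.
    """
    if len(values) < n_ranges:
        raise ValueError("need at least one value per range")
    reduced = []
    for i in range(n_ranges):
        start = i * len(values) // n_ranges
        end = (i + 1) * len(values) // n_ranges
        band = sorted(values[start:end], reverse=True)  # step 1: sorting
        keep = max(1, round(len(band) * top_fraction))  # top 10%, at least one
        reduced.append(sum(band[:keep]) / keep)         # step 2: averaging
    return reduced

# Toy input: 500 values, i.e. 20 per range, so the top 2 are averaged.
column = list(range(500))
reduced = reduce_column(column)
print(reduced[0], reduced[-1])
```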


9/22

Proposed Model

Fuzzification (Contd.)


10/22

Proposed Model

Phoneme definition requirements:

Colors

Lengths (5 MFs)

[Figure: membership functions for the colors Black, Blue, Magenta, Cyan, and White. Horizontal axis: Range of Amplitudes, 0 to 100. Vertical axis: Degree of Belief, 0 to 1.]
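As a rough illustration of how an amplitude could be mapped to color memberships, the sketch below assumes five evenly spaced triangular membership functions over the amplitude range 0 to 100, ordered Black, Blue, Magenta, Cyan, White. The triangular shape, the spacing, and the function name `color_memberships` are assumptions; in the presented model the color definitions are part of what the GA learns.

```python
COLORS = ["Black", "Blue", "Magenta", "Cyan", "White"]

def color_memberships(amplitude, lo=0.0, hi=100.0):
    """Degree of belief (0..1) that an amplitude belongs to each color.

    Assumed shape: triangles centered at evenly spaced points between
    lo and hi, each overlapping only its immediate neighbors.
    """
    step = (hi - lo) / (len(COLORS) - 1)
    degrees = {}
    for i, color in enumerate(COLORS):
        center = lo + i * step
        degrees[color] = max(0.0, 1.0 - abs(amplitude - center) / step)
    return degrees

# An amplitude of 30 is mostly Blue and partly Magenta under these assumptions.
m = color_memberships(30)
print({color: round(degree, 2) for color, degree in m.items()})
```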


11/22

Proposed Model

Sample Phoneme Definition:

Range 25: Black or Blue

Range 24: Black or Blue

.

.

.

Range 4: Red or Yellow

Range 3: Blue or Magenta

Range 2: Black or Blue or Magenta

Range 1: Black or Blue or Magenta

Length: Average


12/22

Recognition Method

The existence of appropriate phoneme definitions is assumed.

Recognition:

Compare the given sample with all phoneme definitions.

Choose the one with the highest compatibility value.
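The selection rule above amounts to ranking all phoneme definitions by compatibility and taking the best one. A minimal sketch follows; the compatibility function is passed in as a parameter, and the toy definitions and scoring rule in the usage lines are placeholders invented for illustration, not the model's actual measure.

```python
def rank_phonemes(sample, definitions, compatibility):
    """Return (phoneme, score) pairs sorted from most to least compatible."""
    scored = [(compatibility(sample, d), name) for name, d in definitions.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored]

def recognize(sample, definitions, compatibility):
    """Return the phoneme whose definition is most compatible with the sample."""
    return rank_phonemes(sample, definitions, compatibility)[0][0]

# Toy usage: each "definition" is a single number, and compatibility
# is 100 minus the distance to it (both purely hypothetical).
toy_definitions = {"a": 10, "b": 50, "c": 80}
toy_compat = lambda sample, d: 100 - abs(sample - d)
ranking = rank_phonemes(55, toy_definitions, toy_compat)
print(ranking)
```

Keeping the full ranking rather than only the winner also makes it straightforward to check whether the correct phoneme appears among the top few guesses.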


13/22

Recognition Method

Single Phoneme Comparison:

Comparing the color pattern of the phoneme with all frames of the given sample.

Finding the matching sequences.

Comparing the length of a matching sequence with the required length.


14/22

Recognition Method

Sample, Step One:

Input: (A column of the colors of the signal which is to be recognized.)

Pattern: (The color pattern of the phoneme which is to be evaluated.)

Range 25: Black or Blue

Range 24: Black or Blue

...

Range 4: Green or Yellow

Range 3: Blue or Magenta

Range 2: Black or Blue or Magenta

Range 1: Black or Blue or Magenta

Compatibility: (The compatibility measure between the signal colors and the phoneme’s pattern.)

Range 25: 100% or 10%

Range 24: 100% or 10%

...

Range 4: 0% or 20%

Range 3: 10% or 100%

Range 2: 10% or 90% or 0%

Range 1: 10% or 90% or 0%

After applying MAX:

Range 25: 100%

Range 24: 100%

...

Range 4: 20%

Range 3: 100%

Range 2: 90%

Range 1: 90%

Final Result after applying MIN: 20%
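The MAX-then-MIN combination in this example can be reproduced with a few lines. The sketch uses only the ranges shown on the slide (the elided ranges are left out), and the function name `column_compatibility` is a label chosen here for illustration.

```python
def column_compatibility(degrees_per_range):
    """Fuzzy OR (max) within each range, then fuzzy AND (min) across ranges."""
    return min(max(degrees) for degrees in degrees_per_range.values())

# Compatibility (in %) of the input column with each allowed color,
# for the ranges listed on the slide.
shown_ranges = {
    25: [100, 10],
    24: [100, 10],
    4: [0, 20],
    3: [10, 100],
    2: [10, 90, 0],
    1: [10, 90, 0],
}

after_max = {r: max(d) for r, d in shown_ranges.items()}
result = column_compatibility(shown_ranges)
print(after_max, result)
```

The weakest range (Range 4 at 20%) determines the result, which is the usual behavior of a min-based conjunction.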


15/22

Recognition Method

Sample, Step Two:

1. Output of Step 1:

85 79 75 65 55 45 55 98 78 78 77 76 54 82 83 88 99 98 78 77

2. Assuming 75% as the threshold, the lengths of the matching sequences are:

3, 5, 7

3. Selecting the sequence with the maximum length:

82 83 88 99 98 78 77

4. Computing the Best Match Value:

( 82 + 83 + 88 + 99 + 98 + 78 + 77 ) / 7 ≈ 86

5. Assuming an Average length is requested for the pattern:

Compatibility = 86 * IsAverage( 7 )
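The following sketch walks through the same numbers. It assumes that a frame counts as matching when its score is at least the threshold (which is what yields a first run of length 3 here), and it takes the length membership function as a parameter, because the definition of IsAverage is not given on the slide. All function names are hypothetical.

```python
def matching_runs(scores, threshold=75):
    """Split scores into maximal runs of consecutive values >= threshold."""
    runs, current = [], []
    for s in scores:
        if s >= threshold:
            current.append(s)
        else:
            if current:
                runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs

def best_match(scores, threshold=75):
    """Return (mean score, length) of the longest matching run."""
    runs = matching_runs(scores, threshold)
    if not runs:
        return 0.0, 0
    longest = max(runs, key=len)
    return sum(longest) / len(longest), len(longest)

def compatibility(scores, length_membership, threshold=75):
    """Best match value weighted by how well the run length fits the pattern."""
    value, length = best_match(scores, threshold)
    return value * length_membership(length)

scores = [85, 79, 75, 65, 55, 45, 55, 98, 78, 78,
          77, 76, 54, 82, 83, 88, 99, 98, 78, 77]
run_lengths = [len(r) for r in matching_runs(scores)]
value, length = best_match(scores)
print(run_lengths, length, round(value, 2))
```

Calling `compatibility(scores, is_average)` with a membership function `is_average` of your own choosing reproduces the final step, 86 * IsAverage(7), up to rounding.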


16/22

Training

To obtain proper phoneme specifications (colors and lengths)

Using a GA to improve the data


17/22

Training Method: Genetic Algorithm

Each Genome:

Color Definitions

Length Definitions

Phoneme Descriptions

Crossover:

Combination of the phoneme description parts of two genomes.

Mutation:

Randomly change a color or length definition.

Randomly change a phoneme description part.


18/22

Training Approach: flowchart

Start

Create 100 random genomes and add them to the gene pool.

Sort genomes based on their fitness.

Throw out the last 50% of the genomes.

Randomly choose some genomes and add their crossovers to the gene pool.

Add a mutated copy of all available genomes to the gene pool.

Is the best genome’s fitness acceptable?

Yes: Terminate.

No: Repeat from the sorting step.
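A compact sketch of such a training loop is given below. It is only an illustration under several assumptions: the order of the steps inside the loop is one reading of the flowchart, the genome is a toy list of color indices rather than the full color, length, and phoneme description structure, and the fitness simply counts positions that agree with a made-up `TARGET`. The parameters `n_crossovers`, `max_generations`, and `seed` are hypothetical additions to keep the run bounded and repeatable.

```python
import random

def train(fitness, random_genome, crossover, mutate, is_acceptable,
          pool_size=100, n_crossovers=10, max_generations=50, seed=0):
    """Run a simple GA; return the best genome and the best fitness per generation."""
    rng = random.Random(seed)
    pool = [random_genome(rng) for _ in range(pool_size)]  # 100 random genomes
    best = max(pool, key=fitness)
    history = []
    for _ in range(max_generations):
        pool.sort(key=fitness, reverse=True)               # sort by fitness
        pool = pool[: max(2, len(pool) // 2)]              # drop the worst 50%
        for _ in range(n_crossovers):                      # add some crossovers
            a, b = rng.sample(pool, 2)
            pool.append(crossover(a, b, rng))
        pool.extend([mutate(g, rng) for g in pool])        # add mutated copies
        best = max(pool, key=fitness)
        history.append(fitness(best))
        if is_acceptable(history[-1]):                     # good enough: stop
            break
    return best, history

# Toy problem (hypothetical): recover a fixed sequence of five color indices.
N_COLORS = 5
TARGET = [0, 1, 2, 3, 4, 3, 2, 1]

def fitness(genome):
    return sum(1 for x, y in zip(genome, TARGET) if x == y)

def random_genome(rng):
    return [rng.randrange(N_COLORS) for _ in TARGET]

def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(genome, rng):
    child = list(genome)
    child[rng.randrange(len(child))] = rng.randrange(N_COLORS)
    return child

best, history = train(fitness, random_genome, crossover, mutate,
                      is_acceptable=lambda f: f == len(TARGET))
print(best, history)
```

Because the surviving half always includes the current best genome and mutated copies are added alongside the originals, the best fitness never decreases from one generation to the next in this sketch.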


19/22

Experimental Results

Comparison with HMM

                                       Fuzzy Approach    HMM Approach

1st correct answers:                   85%               62.28%

3rd correct answers (out of 62)[1]:    95%               79.60%

6th correct answers (out of 62):       98%               86.98%

[1] One of the top three guesses has been correct.
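The "one of the top k guesses is correct" criterion from the footnote can be computed as sketched below. The ranked guesses and true labels in the usage lines are made-up toy data for illustration only and have no relation to the reported results.

```python
def top_k_accuracy(ranked_guesses, true_labels, k):
    """Fraction of samples whose true label is among the first k guesses."""
    hits = sum(1 for guesses, truth in zip(ranked_guesses, true_labels)
               if truth in guesses[:k])
    return hits / len(true_labels)

# Toy data (hypothetical): four samples, each with a ranked list of guesses.
toy_ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["a", "c", "b"]]
toy_truth = ["a", "a", "a", "b"]

acc1 = top_k_accuracy(toy_ranked, toy_truth, 1)
acc2 = top_k_accuracy(toy_ranked, toy_truth, 2)
acc3 = top_k_accuracy(toy_ranked, toy_truth, 3)
print(acc1, acc2, acc3)
```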


20/22

Future Work

To account for color transitions in the model.

To enhance horizontal segmentation.

To test noise immunity.

To alter the model to represent and recognize words.


21/22

Acknowledgment

Special thanks to Professor Hirota (TIT) for his useful advice and for giving me the opportunity to participate in the conference.


22/22

Thank you

Any questions?