
The Teaching Experiment of Speech Recognition


Page 1: The Teaching Experiment of Speech Recognition

"THE TEACHING EXPERIMENT OF SPEECH RECOGNITION"

Presented by: Nikita Dhanvijay, M.E. 2nd year

Page 2: The Teaching Experiment of Speech Recognition

Learning Objectives

INTRODUCTION
LITERATURE SURVEY
ASR MODEL
MFCC, LPC, HMM
RESULT AND EXPERIMENT
CONCLUSION
ADVANTAGES AND APPLICATIONS
REFERENCES

Page 3: The Teaching Experiment of Speech Recognition

Introduction

Speech is the primary mode of communication among human beings, and the most natural and efficient way of exchanging information.

Speech recognition is proposed to solve the recognition problems encountered in information processing.

Voice recognition places high demands on mathematical foundations; the theory is difficult to teach and the content is deep, so purely theoretical study makes it hard for students to grasp the technology and the key points of speech recognition.

This teaching experiment introduces the theoretical basis and processes, and then gives an instance of building a speech recognition system.

It improves the students' interest in learning and their ability, and has achieved good results.

Page 4: The Teaching Experiment of Speech Recognition

Modes of Communication

Page 5: The Teaching Experiment of Speech Recognition

SPEECH RECOGNITION TECHNIQUES

The goal of speech recognition is for a machine to be able to "hear," "understand," and "act upon" spoken information.

The goal of automatic speaker recognition is to analyze, extract, characterize, and recognize information about the speaker's identity.

The speaker recognition system may be viewed as working in four stages:

1. Analysis 2. Feature extraction 3. Modeling 4. Testing

Page 6: The Teaching Experiment of Speech Recognition

The basic configuration of a speech recognition system

Page 7: The Teaching Experiment of Speech Recognition

LITERATURE SURVEY

Cardin et al. proposed inter-word coarticulation modeling and MMIE training for improved connected digit recognition; the authors describe developments made by the speech research group at CRIM.

Ye Hong presents a method to directly recognize greeting speech without segmentation, so as to avoid recognition errors caused by segmentation errors. The basic principle of biomimetic pattern recognition is applied to speaker-independent, continuous speech recognition of greetings.

Hidden Markov Models (HMM)

Modern general-purpose speech recognition systems are based on Hidden Markov Models. These are statistical models that output a sequence of symbols or quantities. HMMs are used in speech recognition because a speech signal can be viewed as a piecewise stationary signal or a short-time stationary signal.

Page 8: The Teaching Experiment of Speech Recognition

Automatic Speech Recognition System Architecture

Page 9: The Teaching Experiment of Speech Recognition

Speech Recognition Stages

Feature Extraction

In the feature extraction step, the speech waveform, which is sampled at a rate between 6.6 and 20 kHz, is processed to produce a new representation as a sequence of vectors containing values of features or parameters. The vectors typically comprise 10 to 39 parameters and are usually computed every 10 or 20 ms.
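To make the framing concrete, here is a minimal NumPy sketch (not from the slides) that splits a sampled waveform into overlapping analysis frames, one every 10 ms as mentioned above; the 25 ms frame length is an assumed, typical value.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=25, step_ms=10):
    """Split a 1-D speech signal into overlapping frames.

    A feature vector (e.g. MFCC or LPC coefficients) would later be
    computed from each frame, giving one vector every `step_ms` ms.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    step_len = int(sample_rate * step_ms / 1000)    # samples between frame starts
    n_frames = 1 + max(0, (len(signal) - frame_len) // step_len)
    frames = np.stack([signal[i * step_len : i * step_len + frame_len]
                       for i in range(n_frames)])
    return frames  # shape: (n_frames, frame_len)

# Example: 1 second of a 16 kHz signal -> 98 frames of 400 samples each
frames = frame_signal(np.random.randn(16000))
print(frames.shape)
```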

Page 10: The Teaching Experiment of Speech Recognition

Acoustic Model

The parameter values extracted from raw speech are used to build acoustic models, which approximate the probability that the portion of waveform just analyzed corresponds to a particular phonetic event that occurs in the phone-sized or whole-word reference unit being postulated.
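As a concrete illustration (not taken from the slides) of such a score, many acoustic models evaluate how well one feature vector matches a phone state with a Gaussian density; a minimal sketch, assuming a diagonal covariance:

```python
import numpy as np

def log_gaussian_score(x, mean, var):
    """Log-likelihood of feature vector x under a diagonal-covariance Gaussian,
    a common way to score how well one frame matches a phone state."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

x = np.zeros(13)  # one 13-dimensional feature vector (illustrative)
print(log_gaussian_score(x, mean=np.zeros(13), var=np.ones(13)))
```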

Page 11: The Teaching Experiment of Speech Recognition

Lexical and Language Models

Lexical models define the vocabulary for ASR; language models describe the properties of a language and predict the next word in a speech sequence. When speech is produced as a sequence of words, a language model or artificial grammar is used to restrict the combination of words, as the bigram sketch below illustrates.
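Here is a toy sketch (not part of the original experiment, which uses a hand-written HTK grammar instead) of a bigram language model that scores the next word given the previous one and thereby rules out unseen combinations.

```python
from collections import Counter, defaultdict

# Toy training corpus; a real system would use a large amount of text.
corpus = [["yes", "please"], ["no", "thanks"], ["yes", "thanks"], ["no", "please"]]

bigrams = defaultdict(Counter)
for sentence in corpus:
    for prev, nxt in zip(["<s>"] + sentence, sentence + ["</s>"]):
        bigrams[prev][nxt] += 1

def next_word_prob(prev, nxt):
    """P(next word | previous word), estimated by relative frequency."""
    total = sum(bigrams[prev].values())
    return bigrams[prev][nxt] / total if total else 0.0

print(next_word_prob("<s>", "yes"))  # 0.5 in this toy corpus
print(next_word_prob("yes", "no"))   # 0.0 -> this combination is ruled out
```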

Page 12: The Teaching Experiment of Speech Recognition

MFCC (Mel-Frequency Cepstral Coefficients)

• MFCC is based on a concept called the cepstrum.
• The cepstrum is obtained from the spectrum by the cepstrum transform: Fourier transform → complex logarithm → inverse Fourier transform.
• MFCC has been found to perform well in speech recognition systems; the idea is to apply a non-linear (mel-scaled) filter bank in the frequency domain.
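The cepstrum chain above (Fourier transform → logarithm → inverse Fourier transform) can be sketched directly in NumPy; the mel-scaled filter bank that turns this into full MFCCs is omitted, so this is only an illustrative real-cepstrum computation for a single frame.

```python
import numpy as np

def real_cepstrum(frame):
    """Cepstrum of one speech frame: FFT -> log magnitude -> inverse FFT."""
    spectrum = np.fft.fft(frame)
    log_spectrum = np.log(np.abs(spectrum) + 1e-10)  # small offset avoids log(0)
    return np.fft.ifft(log_spectrum).real

frame = np.random.randn(400)  # one 25 ms frame at 16 kHz (assumed)
cep = real_cepstrum(frame)
print(cep[:13])               # low-order coefficients are kept as features
```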

Linear Predictive Coding (LPC)

LPC produces coefficients that minimize the difference between the actual speech samples and the linearly predicted ones.
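A minimal sketch of this idea, assuming the usual least-squares formulation: each sample is predicted from a weighted sum of the previous `order` samples, and the coefficients are those that minimize the squared prediction error.

```python
import numpy as np

def lpc(frame, order=12):
    """Least-squares LPC: predict frame[n] from the previous `order` samples."""
    # Rows are windows of past samples (most recent first); targets are current samples.
    X = np.stack([frame[i:i + order][::-1] for i in range(len(frame) - order)])
    y = frame[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs  # `order` prediction coefficients

frame = np.random.randn(400)
a = lpc(frame)
print(a.shape)  # (12,)
```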

Page 13: The Teaching Experiment of Speech Recognition

HMM

A Markov model can be treated as a finite state automaton in which each transition has an associated probability; the resulting sequence of states is called a Markov chain.

A strategy that makes use of a stochastic model of speech production is known as a hidden Markov model (HMM).

It is also known as a statistical Markov model; it contains unobserved, or hidden, states.

An HMM for speech recognition has each state capable of generating a finite number of possible outputs.
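To make this concrete, here is a small numeric sketch (the numbers are invented for illustration) of such an HMM: a left-to-right set of states with a transition matrix and, per state, a distribution over a finite set of possible outputs.

```python
import numpy as np

states = ["begin", "middle", "end"]   # hidden states
outputs = ["o1", "o2", "o3"]          # finite set of possible outputs

# A[i, j] = P(next state j | current state i); a left-to-right topology
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])

# B[i, k] = P(output k | state i)
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

pi = np.array([1.0, 0.0, 0.0])        # always start in "begin"

# Each row of A and B sums to 1, as required for probability distributions.
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```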

Page 14: The Teaching Experiment of Speech Recognition

SOFTWARE:

HTK

The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models. HTK is primarily used for speech recognition. HTK works on Windows and Linux.

Page 15: The Teaching Experiment of Speech Recognition

Training

(1) K-means training provides reasonable state transition probabilities and initial values for the output probabilities.

(2) The initial parameters from the first step are then trained further with the Baum-Welch algorithm. It should be noted that the Baum-Welch algorithm is just the classic method widely used to solve this problem; it is neither the only way nor the most perfect one.
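The slides carry out this two-stage training with HTK later on; purely as a substitute illustration, the same pattern (k-means initialization followed by Baum-Welch/EM re-estimation) can be reproduced with the third-party hmmlearn library, assuming Gaussian output densities.

```python
import numpy as np
from hmmlearn import hmm  # third-party library, used here instead of HTK

# Pretend feature sequences for one word: two utterances of 13-dim vectors.
rng = np.random.default_rng(0)
utt1, utt2 = rng.normal(size=(50, 13)), rng.normal(size=(60, 13))
X = np.vstack([utt1, utt2])
lengths = [len(utt1), len(utt2)]

# Means are initialized with k-means, then refined by Baum-Welch (EM) iterations.
model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=10)
model.fit(X, lengths)
print(model.monitor_.converged)  # True once the log-likelihood stops improving
```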

Page 16: The Teaching Experiment of Speech Recognition

Identification

The Viterbi algorithm is used to dynamically find the hidden Markov model's state transition sequence (i.e., the recognition result); its time complexity is far lower than evaluating the total probability formula.

With the Viterbi algorithm, not only can a good enough state transition path be found, but the output probability corresponding to that path can also be obtained.
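A minimal NumPy sketch of the Viterbi algorithm for a discrete-output HMM (illustrative numbers, not the word models trained above); it returns both the best state transition path and the probability of that path, as described in the two points above.

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most likely hidden state sequence for a discrete-output HMM.

    obs: sequence of output indices; A: transitions; B: emissions;
    pi: initial state distribution. Returns (best path, path probability).
    """
    n_states, T = A.shape[0], len(obs)
    delta = np.zeros((T, n_states))           # best score ending in each state
    psi = np.zeros((T, n_states), dtype=int)  # back-pointers

    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A    # scores[i, j]: come from i, go to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]

    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())

# Tiny 2-state example (numbers are illustrative only)
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(viterbi([0, 1, 1], A, B, pi))
```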

Page 17: The Teaching Experiment of Speech Recognition

Implementation Process

Yes/No Recognition System

The Yes/No identification system is a two-word recognition system. It is the most basic automatic speech recognition (ASR) system; its vocabulary is {Yes, No}.

Create a training set: First, we recorded the "Yes" and "No" voice signals, to train the word models (the training set).

Acoustic analysis: HCopy, a tool contained in HTK, can be used to convert the original waveform files and generate a series of acoustic vectors.

HMM definition: Three acoustic events are modeled with HMMs: Yes, No, and Silence. An HMM must be designed for each event.

Page 18: The Teaching Experiment of Speech Recognition

HMM training process

Training: HRest, a tool in HTK, estimates the optimum values of the HMM parameters. Each HRest iteration is displayed on the screen, so we can see whether training has converged; once the convergence measure stops decreasing (in absolute value), the process should be stopped. In our example, two or three estimation iterations are sufficient.

Task definition: Before using the word models, it is necessary to define the basic structure of the recognizer (the task syntax).

Grammar and Dictionary: First, we define the simple syntax: Start-Pause, Word (Yes, No), End-Pause. The task grammar is written as a text file in HTK; a sketch of such a file follows.
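As a hedged illustration (the exact file from the slides is not reproduced here), an HTK-style task grammar for this Yes/No vocabulary would look roughly like this, with the $WORD variable expanding to either word between a starting and an ending silence:

```
$WORD = YES | NO;

( SENT-START ( $WORD ) SENT-END )
```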

Page 19: The Teaching Experiment of Speech Recognition

THE BASIC TASK SYNTAX

Page 20: The Teaching Experiment of Speech Recognition

TASK DICTIONARY

Page 21: The Teaching Experiment of Speech Recognition

RECOGNITION RESULTS

HCopy converts the input signal into a series of acoustic vectors.

Page 22: The Teaching Experiment of Speech Recognition

=========== HTK Results Analysis ===========
Date: Sun Oct 22 16:14:45 2012
Ref: testrefs.mlf
Rec: recout.mlf
------------------------ Overall Results ------------------------
SENT: %Correct=80 [H=80, S=20, N=100]
WORD: %Corr=81, Acc=75 [H=243, D=29, S=28, I=18, N=300]
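To read these figures: HResults counts H hits, D deletions, S substitutions, and I insertions against N reference tokens, and reports %Correct = H/N and Accuracy = (H - I)/N. At the word level this gives 243/300 = 81% correct and (243 - 18)/300 = 75% accuracy, matching the line above; the sentence-level line says 80 of the 100 test utterances were recognized exactly.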

Page 23: The Teaching Experiment of Speech Recognition

ADVANTAGES AND APPLICATIONS

Advantages

Speech is a very natural way to interact, and it is not necessary to sit at a keyboard or work with a remote control.

No training required for users.

Page 24: The Teaching Experiment of Speech Recognition

Applications

In-car systems: Typically a manual control input, for example a finger control on the steering wheel, enables the speech recognition system, and this is signaled to the driver by an audio prompt.

Health care: 1. Medical documentation 2. Therapeutic use

Military: 1. High-performance fighter aircraft 2. Helicopters 3. Training air traffic controllers

Telephony and other domains

Page 25: The Teaching Experiment of Speech Recognition

CONCLUSION

Although an isolated-word recognition system based on HMMs has been established, we still have some difficulties in testing. The system depends on the environment: when the voice is acquired under certain conditions, recognition must be trained and performed in the same environment, or performance drops dramatically. Differences in microphone quality and placement cause the user's input voice to be mixed with noise and echo, which also affects the accuracy of identification.

Page 26: The Teaching Experiment of Speech Recognition

REFERENCES

R. Cardin, Y. Normandin, and E. Millien, "Inter-word coarticulation modeling and MMIE training for improved connected digit recognition," ICASSP, pp. 243-246, 1994.

Yu Bo and Yu Xuefeng, "Analysis of narrowband signal with Nyquist sampling theorem [J]," Xiamen Science & Technology, pp. 37-39, 2005.

Liao Guangrui, Hu Yue, and Liu Ping, "Fast implementation of isolated-word recognition system based on CHMM [J]," Microcomputer & Its Applications, 28(13), 2009.

Ye Hong, "Speaker-independent continuous speech recognition based on the biomimetic pattern recognition [J]," Journal of Zhejiang University of Technology, 34(4), 2006.

Guo Qiumin, Liu Xiaowen, and Xu Bo, "DSP speech-recognition system based on Mel-frequency cepstrum coefficients [J]," Communication Technology, pp. 387-390, 2007.

He Qiang and He Ying, MATLAB Expand Programming [M], Beijing: Tsinghua University Press, 2002.

Bian Jie, Study on the Key Techniques of Speaker-Independent Isolated Words Speech Recognition System [D], Dalian: Dalian University of Technology, 2005.3.

Yao Tianren, Digital Speech Processing [M], Wuchang: Huazhong University Press, 2002.

S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences [J]," IEEE Transactions on Acoustics, Speech, and Signal Processing, pp. 357-366, 1980.

Page 27: The Teaching Experiment of Speech Recognition

THANK YOU