Human Speaker Recognition Based on the Integration of -05590672


Human Speaker Recognition Based on the Integration of Genetic Algorithm and RBF Network

Yan Zhou

Department of Electronics & Information Engineering, Suzhou Vocational University

Suzhou, Jiangsu, China

e-mail: [email protected]

Yunian Gu

Department of Electronics & Information Engineering, Suzhou Vocational University

Suzhou, Jiangsu, China

e-mail: [email protected]

Abstract: Although the speaker recognition system based on the RBF network is one of the main models for recognizing speakers, it has some shortcomings: the number of hidden-layer nodes of the RBF network is often hard to determine, and the system converges slowly. In this paper, an RBF network optimization scheme based on a genetic algorithm is proposed for human speaker recognition. In the scheme, a hybrid-encoding genetic algorithm is used to optimize the connection weights and the structure of the RBF network, so that redundant nodes and redundant connection weights are effectively removed from the network. The scheme combines the parallelism of the neural network with the global search capability of the genetic algorithm, and thereby improves the processing capability of the network. Experimental tests show that the scheme based on the hybrid-encoding genetic algorithm has a fast learning speed and a high recognition rate, making it a practical new scheme for human speaker recognition.

Keywords: human speaker recognition; RBF network; genetic algorithm

I. INTRODUCTION

Speaker recognition is a technology that recognizes or confirms the identity of speakers. The recognition process is based on parameters of voice samples. Generally speaking, it consists of pre-processing, feature extraction, pattern matching and reference models.

Establishing a speaker recognition system involves two stages: training and recognition. Current models for speaker recognition mainly include vector-based models, Gaussian mixture models, hidden Markov models and artificial neural network models [1,2]. Among artificial neural network models, the RBF network is the most widely used. However, existing research [3] shows that an RBF network is designed primarily from the designer's experience: the designer selects samples through repeated experiments in a large sample space, with no theoretical guidance. Therefore, the choice of initial connection weights and network structure is largely arbitrary, and the network as a whole is unlikely to be optimal.

The speaker recognition model based on the genetic algorithm (GA) [4], developed in recent years, is a robust identification method. The genetic algorithm can guide the search, determine its direction, and finally obtain the optimal solution to the network structure optimization problem. The algorithm selects the initial population and then crosses and mutates individuals mainly on the basis of a fitness function. Although there are many successful cases [5] of applying genetic algorithms to optimize neural networks, most of this research has focused on training the network weights and has ignored the connection between network structure and weights.

To address the shortcomings mentioned above, this paper proposes a hybrid coding scheme that integrates network structure and weights. The hybrid-coded individuals are trained by the genetic algorithm, and the best training result is used as the network model of the speaker recognition system. The improved genetic operators optimize the network structure and the weights simultaneously.

II. CHOICE OF MAIN SCHEME

A. Principle of Speaker Recognition

The system is realized by an RBF neural network model based on a hybrid genetic algorithm. The main design process is as follows.

First, endpoint detection, pre-emphasis, framing and Hamming windowing are applied to the input speech samples, after which the voice feature parameters can be extracted. Second, the speaker model is set up and its parameters are trained. The process is designed as follows: the algorithm randomly generates different neural network structures and encodes the corresponding parameters of each as an individual, so that every individual represents a neural network. The network structures are trained from different initial weights and their fitness is calculated. The algorithm selects the individuals with high fitness to pass on to the next generation, applies the genetic operations to the current group to produce the next generation, and repeats this process until the best individual is found. The best individual is taken as the model of the speaker recognition system. Third, the algorithm sets the network training parameters and trains on the voice feature parameters of the speakers; as a result, the reference pattern database is established. Fourth, in the recognition stage, the speaker is identified based on the network weight values.
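The pre-processing chain in the first step can be illustrated with a minimal sketch. It covers pre-emphasis, framing and Hamming windowing only; endpoint detection is left out. The pre-emphasis coefficient 0.97, the frame length 256 and the hop 128 are assumed values that the paper does not state, and the function names are hypothetical.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed, commonly used value."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n_points):
    """Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def preprocess(signal, frame_len=256, hop=128):
    """Pre-emphasize, frame and window a signal (frame_len and hop are assumptions)."""
    window = hamming(frame_len)
    emphasized = pre_emphasis(signal)
    return [[x * w for x, w in zip(frame, window)]
            for frame in split_frames(emphasized, frame_len, hop)]
```

Each returned frame would then be passed to the feature extraction stage.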

B. Extraction of speech feature

Feature extraction is the key problem in pattern recognition. Currently the most popular feature parameters are the cepstral coefficients based on the all-pole vocal tract model (LPCC) and the Mel-frequency cepstral coefficients (MFCC) based on human auditory

2010 Second International Conference on Intelligent Human-Machine Systems and Cybernetics

978-0-7695-4151-8/10 $26.00 2010 IEEE

DOI 10.1109/IHMSC.2010.66


characteristics. Studies [6] have shown that MFCC parameters give better recognition results than LPCC parameters and are less sensitive to noise. Therefore MFCC has been more widely used. MFCC feature vectors usually use 12 to 16 parameters. After testing, 12 parameters are selected in this paper.
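As background for the auditory basis of MFCC, the sketch below converts between hertz and the mel scale using the widely used formula mel = 2595 log10(1 + f/700), and places filter centers evenly on that scale. The paper does not give its filterbank design: the formula choice, the 0 to 8000 Hz range and the reading of 12 as the number of bands are all assumptions made for illustration.

```python
import math

def hz_to_mel(freq_hz):
    """Common mel-scale approximation (an assumed choice, not stated in the paper)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centers(n_bands=12, f_low=0.0, f_high=8000.0):
    """Center frequencies in Hz of n_bands filters equally spaced on the mel scale.

    f_high = 8000 Hz assumes a 16 kHz sampling rate.
    """
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (k + 1)) for k in range(n_bands)]
```

The centers are dense at low frequencies and sparse at high frequencies, which mirrors the frequency resolution of human hearing.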

C. Selection of encoding scheme

The data must be encoded before the genetic algorithm can be used to find the center values of the hidden-layer nodes. Encoding methods include binary coding, floating-point coding and symbol coding. The most common is binary coding, which uses fixed-length binary strings to represent the individuals in a population. However, the coding length is limited, so the precision of the parameters is inevitably affected during encoding and decoding. In a floating-point-encoded genetic algorithm, by contrast, each gene of an individual in the initial population can be drawn from uniformly distributed random numbers, which makes up for the shortcomings of binary coding to some extent. This paper adopts a hybrid-coded genetic algorithm; that is, the same individual contains both binary coding and floating-point coding. The binary coding represents the structure of the RBF network, while the floating-point coding represents the corresponding parameters, which helps to improve the accuracy of the algorithm.
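One possible layout for such a hybrid-coded individual is sketched below. The paper does not specify the chromosome layout, so this is an assumption: one bit per candidate hidden node switches that node on or off, and one row of floating-point output weights belongs to each candidate node. The names `HybridIndividual`, `random_individual` and `decode` are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class HybridIndividual:
    mask: list     # binary part: mask[j] == 1 keeps hidden node j in the network
    weights: list  # floating-point part: weights[j] holds the output weights of node j

def random_individual(max_hidden, n_outputs, rng=None):
    """Draw bits at random and weights uniformly from [-1, 1] (assumed range)."""
    rng = rng or random.Random()
    mask = [rng.randint(0, 1) for _ in range(max_hidden)]
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_outputs)]
               for _ in range(max_hidden)]
    return HybridIndividual(mask, weights)

def decode(individual):
    """Return the weight rows of the active hidden nodes only."""
    return [row for bit, row in zip(individual.mask, individual.weights) if bit == 1]
```

With this layout, removing a redundant node amounts to clearing one bit, while its weights stay in the chromosome and can be reactivated later.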

III. PROCEDURE OF MODEL CONSTRUCTION

A speaker recognition system typically includes four main modules: feature extraction, model training, pattern matching, and logical decision-making. In this research, an RBF network is used as the recognition model. In constructing the model, the most difficult tasks are determining the network connection weights and selecting the encoding scheme. Here, the genetic algorithm is applied to optimize the neural network. The model contributes in two respects: first, it optimizes the network connection weights; second, it optimizes the network structure. In this paper, the hybrid coding method is used to encode the RBF network, whose structure is designed with an m-dimensional input, an n-dimensional output and up to L hidden-layer nodes. The encoded individuals become the operating variables of the genetic algorithm, and the RBF networks are trained by the genetic algorithm. The detailed steps are as follows:

Step 1 Sample Initialization

The key operation in this step is to set the population size, that is, the number of encoded gene combinations. In this paper, the initial population, denoted $A_k$, is formed by L individuals. Each individual consists of two parts: the first part is the binary-coded network structure, and the second part is the floating-point-coded initial weight factors. The population is then built up as follows. First, a certain number of individuals are generated randomly; then the best of them are selected and added to the initial population. This process is repeated, and it ends when the initial population reaches the preset size.
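The iterative construction of the initial population can be sketched as follows. The batch size and the number of individuals kept per round are hypothetical parameters, since the paper only says that "a certain number" of individuals are generated and the best are kept.

```python
def build_initial_population(make_individual, fitness, size, batch=10, keep=2):
    """Generate random batches and keep the fittest of each until `size` is reached.

    make_individual: zero-argument callable returning a new random individual.
    fitness: callable mapping an individual to a number (higher is better).
    batch, keep: assumed values, not taken from the paper.
    """
    population = []
    while len(population) < size:
        candidates = sorted((make_individual() for _ in range(batch)),
                            key=fitness, reverse=True)
        population.extend(candidates[:keep])
    return population[:size]
```

For example, calling it with `size=40` would produce a population of the size used in the experiments.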

Step 2 If the best individual in the population meets the requirements, or the maximum number of generations is reached, training stops; otherwise it continues.

Step 3 Cross every individual in the population. Randomly select the same position in two individuals and exchange the genes at that position according to the crossover probability. Because each individual contains two encodings, a different crossover method is used for each, so it must first be determined whether the selected position lies in the binary part or the real-valued part. Genes in the binary part can be exchanged directly, whereas genes in the floating-point part of the selected individuals are crossed with probability $P_c$.
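A minimal sketch of such a type-aware crossover is given below. An individual is represented as a pair `(bits, reals)`. Bits at the chosen position are swapped directly; for the floating-point part the paper does not name the operator, so an arithmetic (blend) crossover is assumed here.

```python
import random

def crossover(parent_a, parent_b, p_c, rng=None):
    """Cross two (bits, reals) individuals at one random position with probability p_c."""
    rng = rng or random.Random()
    bits_a, reals_a = list(parent_a[0]), list(parent_a[1])
    bits_b, reals_b = list(parent_b[0]), list(parent_b[1])
    if rng.random() < p_c:
        pos = rng.randrange(len(bits_a) + len(reals_a))
        if pos < len(bits_a):
            # Binary part: exchange the two bits directly.
            bits_a[pos], bits_b[pos] = bits_b[pos], bits_a[pos]
        else:
            # Floating-point part: arithmetic crossover (an assumed operator).
            j = pos - len(bits_a)
            lam = rng.random()
            reals_a[j], reals_b[j] = (
                lam * reals_a[j] + (1.0 - lam) * reals_b[j],
                lam * reals_b[j] + (1.0 - lam) * reals_a[j],
            )
    return (bits_a, reals_a), (bits_b, reals_b)
```

The parents are copied first, so the operator leaves the current population untouched.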

Step 4 Mutation Operation

Following the biological principle of gene mutation, the algorithm mutates some bits of some individuals according to the mutation probability. As with crossover, the type of encoding must be determined first, and the selected position of the string is then mutated in the way appropriate to that type.
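A type-aware mutation might look like the sketch below, again for individuals represented as `(bits, reals)` pairs. Flipping a bit is the natural mutation for the binary part; for the floating-point part a small Gaussian perturbation with standard deviation `sigma` is assumed, since the paper does not specify the operator.

```python
import random

def mutate(individual, p_m, rng=None, sigma=0.1):
    """Mutate each gene independently with probability p_m.

    Binary genes are flipped; real-valued genes receive Gaussian noise
    (sigma = 0.1 is an assumed value).
    """
    rng = rng or random.Random()
    bits, reals = individual
    new_bits = [1 - b if rng.random() < p_m else b for b in bits]
    new_reals = [w + rng.gauss(0.0, sigma) if rng.random() < p_m else w
                 for w in reals]
    return new_bits, new_reals
```

With a small mutation probability, most genes pass through unchanged, so mutation acts as a rare exploratory step rather than the main search mechanism.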

Step 5 The collection of new individuals is recorded as $B_k$. The fitness of each new individual is then calculated by

$$f = \frac{1}{E},$$

where $E$ is the objective function of the network,

$$E = \sum_{t=1}^{N} \sum_{i=1}^{O} \left( Y_i(t) - \hat{Y}_i(t) \right)^2 .$$

Here $Y_i(t)$ and $\hat{Y}_i(t)$ denote the actual output and the expected output, respectively, at output node $i$ for the $t$-th training sample; $O$ and $N$ denote the number of output nodes and the number of input samples, respectively.
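The fitness computation can be sketched directly from these definitions. Outputs are given as one list per training sample with one value per output node. The small constant `eps` is an added safeguard against division by zero and is not part of the paper's formula.

```python
def sum_squared_error(actual, expected):
    """E: squared error summed over all samples t and all output nodes i."""
    return sum((y - d) ** 2
               for row_y, row_d in zip(actual, expected)
               for y, d in zip(row_y, row_d))

def fitness(actual, expected, eps=1e-12):
    """f = 1 / E; eps is an assumed guard for the case E == 0."""
    return 1.0 / (sum_squared_error(actual, expected) + eps)
```

For two samples with actual outputs `[[1, 0], [0, 1]]` and expected outputs `[[0.5, 0], [0, 0.5]]`, the error is 0.25 + 0.25 = 0.5 and the fitness is about 2.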

Step 6 Perform further rounds of selection, crossover and mutation until the required number of iterations is reached. The best individual in the final population is then decoded to obtain the genetically optimized network and its parameters, from which the speaker recognition module is set up.
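The overall loop of Steps 2 to 6 can be sketched generically, with the fitness function and the genetic operators passed in as callables. The paper does not name its selection rule, so binary tournament selection with elitism is assumed here; `evolve` and `tournament` are hypothetical names.

```python
import random

def tournament(population, fitness, rng):
    """Binary tournament: return the fitter of two randomly drawn individuals."""
    a, b = rng.choice(population), rng.choice(population)
    return a if fitness(a) >= fitness(b) else b

def evolve(population, fitness, crossover, mutate,
           max_generations, target_fitness, rng=None):
    """Run selection, crossover and mutation until a stopping condition holds.

    crossover(a, b) must return two children; mutate(x) must return one individual.
    """
    rng = rng or random.Random()
    best = max(population, key=fitness)
    for _ in range(max_generations):
        if fitness(best) >= target_fitness:
            break  # Step 2: the best individual meets the requirement
        children = [best]  # elitism (assumed): keep the best individual unchanged
        while len(children) < len(population):
            child_a, child_b = crossover(tournament(population, fitness, rng),
                                         tournament(population, fitness, rng))
            children.append(mutate(child_a))
            if len(children) < len(population):
                children.append(mutate(child_b))
        population = children
        best = max(population, key=fitness)
    return best
```

Because the best individual is carried over unchanged, the best fitness never decreases from one generation to the next.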

IV. SIMULATION EXPERIMENT

A. Experimental Design

The speaker recognition simulation experiment uses Visual C++ as the development platform. The program applies the following criterion [7] to the network output: the maximum value among the 15 output nodes is selected and set to 1, and the other outputs are set to 0. The output node with value 1 identifies the speaker. Training ends when the error $E < 10^{-5}$ or when the number of training iterations exceeds 100; at that point the program terminates.

The speech data used in the experiment are from the TIMIT database. In this experiment, different speakers use different content for network training; moreover, the training speech must be text-independent. There are 15 participants, and each reads 16 segments of speech. Each segment is 20 s long. Thus, 15-dimensional features can be extracted from the


samples and saved as records. In the experiment, each speaker's speech is divided into two parts: 8 segments are used to train the network, and the remaining segments are used to recognize the speaker.
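The output criterion and the stopping rule of the experimental design can be sketched as below. The winner-take-all decision follows the description in the text; the threshold `tol = 1e-5` is our reading of the error bound, and both function names are hypothetical.

```python
def one_hot_decision(outputs):
    """Set the largest output to 1 and all others to 0; return (one_hot, speaker_index)."""
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    return [1 if i == winner else 0 for i in range(len(outputs))], winner

def should_stop(error, iterations, tol=1e-5, max_iterations=100):
    """Stop when the error is below tol or more than max_iterations have been run."""
    return error < tol or iterations > max_iterations
```

For a 15-node output layer, the returned index would correspond to one of the 15 enrolled speakers.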

The parameters of the genetic algorithm are set as follows: population size $L = 40$, crossover probability $P_c = 0.85$, and mutation probability $P_m = 0.006$. The maximum number of iterations is 100, and the search range for the number of hidden-layer nodes is 15 to 28. Because the experiment has to distinguish the identities of 15 speakers, the number of output nodes is set to 15.

B. Analysis of experimental results

(1) Performance comparison between the RBF neural network based on the hybrid-coded genetic algorithm and the traditional RBF neural network

The parameters of the RBF neural network based on the hybrid-coded genetic algorithm (HGA-RBF network) are set according to the principles above. The simulation result is shown in Figure 1. Compared with the simulation results of the traditional RBF network, it shows that the self-adaptive optimization design of the HGA-RBF network overcomes the local-optimum problem that is unavoidable in the traditional RBF network.

Figure 1. Comparison of network performance

(2) Comparison of recognition rate

To demonstrate the superiority of the HGA-RBF network, the recognition rates of speaker recognition systems based on the HGA-RBF network and on the traditional RBF network are compared. The number of hidden-layer nodes of the HGA-RBF network is allowed to be up to 28, and training yields the RBF network structure and the corresponding weights. After training, the hidden layer needs 18 nodes, so the network structure is much simpler than that of the traditional RBF network. The HGA-RBF network structure is therefore set as 12-18-15, that is, 12 inputs, 18 hidden-layer nodes and 15 outputs. Its hidden-layer transfer function is tansig and its output-layer transfer function is purelin. Table I shows the comparison of recognition rates.

TABLE I. COMPARISON OF RECOGNITION RATE (%)

Network type               Training voice length/s
                           5        10       15       20
Traditional RBF network    72.11    78.72    80.43    81.98
HGA-RBF network            81.44    87.23    90.19    95.01

The experimental results show that, under the same error conditions, the HGA-RBF network needs fewer iterations, trains faster and more efficiently, and achieves a higher recognition rate. The proposed hybrid-coded genetic algorithm achieves simultaneous optimization of the structure, weights and threshold values, avoids randomness in the selection of the neural network, and improves computational efficiency. In summary, it is a feasible and effective scheme for speaker recognition.
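The size of the improvement can be read off the table with a short calculation. The values below are copied from Table I; treating them as percentages, so that differences are in percentage points, is an assumption.

```python
# Recognition rates from Table I, indexed by training voice length in seconds.
lengths_s = [5, 10, 15, 20]
traditional = [72.11, 78.72, 80.43, 81.98]
hga_rbf = [81.44, 87.23, 90.19, 95.01]

# Gain of the HGA-RBF network over the traditional RBF network at each length.
gains = [round(h - t, 2) for h, t in zip(hga_rbf, traditional)]
best_length = lengths_s[gains.index(max(gains))]
```

The gains are 9.33, 8.51, 9.76 and 13.03 points for 5, 10, 15 and 20 s respectively, so the advantage of the HGA-RBF network is largest at the longest training length.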

V. CONCLUSION

Using neural networks as classifiers in speaker recognition has a persistent problem: the design of the network topology and the setting of the initial weights lack theoretical support. This shortcoming results in large networks, inefficient identification and other problems. In this paper, the genetic algorithm is used to adjust the network topology and network parameters self-adaptively. This dynamic adjustment can yield an optimal network design. It also helps to overcome the tendency of neural networks to become stuck in a local optimum and, moreover, improves the generalization of the RBF network. In short, the performance of the speaker recognition system is enhanced.

ACKNOWLEDGMENT

This research was supported by grants from the National Science Foundation of China (No. 60970058), the Natural Science Foundation of Jiangsu Province of China (No. BK2009131), and the Science and Technology Foundation of Suzhou Vocational University (No. SZD09L26).

REFERENCES

[1] Zhan-ming Li, Zhen Wang. Vector quantization and neural networks combined speaker recognition system [J]. Computer Engineering and Applications, 2006, (15): 205-207.

[2] Salameh W A. Detection of Intrusion Using Neural Networks: A customized study [J]. Studies in Informatics and Control, 2004, 13(2): 137.

[3] Yoshihiro Yamamoto, Nikiforuk P N. A new supervised learning algorithm for multilayered and inter-connected neural network. IEEE Trans. on Neural Network, 2001, 11(1): 36-46.


[4] Dam M, Saraf D N. Design of neural networks using genetic algorithm for on-line property estimation of crude fractionator products [J]. Computers and Chemical Engineering, 2006, 30(4): 722-729.

[5] Chai Yi. Based on improved genetic algorithm neural network adaptive optimal design [J]. Journal of Chongqing University (Natural Science Edition), 2007, 30(4): 91-96.

[6] Ahmed Mezghani, Douglas. Speaker verification using a new representation based on a CMFCC and formants [J]. IEEE Electrical and Computer Engineering, 2005, 22: 1469-1472.

[7] Bing Wang, Jing-lin Xiang. Based on neural network human Pulse Recognition [J]. Northwestern Polytechnical University, 2002, 20(3): 454-457.
