Human Speaker Recognition based on the integration of
Genetic Algorithm and RBF Network
Yan Zhou
Department of Electronics & Information Engineering, Suzhou Vocational University
Suzhou, Jiangsu, China
e-mail: [email protected]
Yunian Gu
Department of Electronics & Information Engineering, Suzhou Vocational University
Suzhou, Jiangsu, China
e-mail: [email protected]
Abstract- Although the human speaker recognition system based on the RBF network is one of the main models for recognizing speakers, it has some shortcomings: the number of hidden-layer nodes of the RBF network is often hard to determine, and the system converges slowly. In this paper, an RBF network optimization scheme based on a genetic algorithm is proposed for human speaker recognition. In the scheme, a hybrid-encoding genetic algorithm is used to optimize the connection weights and structure of the RBF network, and redundant nodes and redundant connection weights are effectively removed from the network. The scheme combines the parallelism of the neural network with the global search capability of the genetic algorithm, and thereby improves the processing capability of the network. Experimental tests show that the scheme based on the hybrid-encoding genetic algorithm has a fast learning speed and a high recognition rate, and that it is a practical new scheme for human speaker recognition.
Keywords- human speaker recognition; RBF network;
genetic algorithm
I. INTRODUCTION
Speaker recognition is a technology that recognizes or confirms the identity of speakers. The recognition process is based on the parameters of voice samples. Generally speaking, it consists of pre-processing, feature extraction, pattern matching and module reference.
Establishing a speaker recognition system involves two stages: training and recognition. The current models for speaker recognition mainly include the vector-based model, the Gaussian mixture model, hidden Markov models and the artificial neural network model [1,2]. Among artificial neural network models, the RBF network is the most widely used. However, existing research [3] shows that RBF networks are designed primarily from the experience of designers, who select samples through repeated experiments in a large sample space with no theoretical guidance. Therefore, the initial connection weights and the choice of network structure are largely arbitrary, and there is little chance of optimizing the network as a whole.
Speaker recognition models based on the genetic algorithm (GA) [4], developed in recent years, are a robust identification method. The genetic algorithm guides the direction of the search and ultimately finds the optimal solution of the network structure optimization. The algorithm selects an initial population, then crosses and mutates individuals, mainly on the basis of the sample fitness function. Although there are many successful cases [5] of applying genetic algorithms to optimize neural networks, most of this research has focused on training the network weights and has ignored the connection between network structure and weights.
To address these shortcomings, this paper proposes a hybrid coding scheme that integrates network structure and weights. The hybrid-coded individuals are trained by the genetic algorithm, and the best training result is used as the network model of the speaker recognition system. The improved genetic operators optimize network structure and weights simultaneously.
II. CHOICE OF MAIN SCHEME
A. Principle of Speaker Recognition
The system is realized by an RBF neural network model based on a hybrid genetic algorithm. The main design process is as follows:
First, endpoint detection, pre-emphasis, framing and Hamming windowing are applied to the input speech samples, after which the voice feature parameters are extracted.
Second, the speaker model is set up and its parameters are trained. The algorithm randomly generates different neural network structures and encodes the corresponding parameters of each as an individual, so that every individual represents one neural network. The network structures are trained from different initial weights, and their fitness is then calculated. The algorithm selects the individuals with high fitness to pass on to the next generation, applies genetic operations to the current group to produce the next generation, and repeats this process until the best individual is found. The best individual is taken as the model of the speaker recognition system.
Third, the algorithm sets the network training parameters and trains on the voice feature parameters of the speakers, thereby establishing the reference pattern database.
Fourth, in the recognition stage, the speaker is identified on the basis of the network weight values.
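The front-end steps named in the first stage (pre-emphasis, framing and Hamming windowing) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pre-emphasis coefficient 0.97, the frame length 256 and the hop 128 are assumed values that the paper does not state, and endpoint detection is omitted.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """High-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming(length):
    """Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def frame_signal(signal, frame_len, hop):
    """Cut the signal into overlapping frames and apply the window to each.

    Trailing samples that do not fill a whole frame are dropped.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames

# Toy input: 400 samples of a sine wave standing in for recorded speech.
samples = [math.sin(2 * math.pi * 0.05 * n) for n in range(400)]
frames = frame_signal(pre_emphasis(samples), frame_len=256, hop=128)
print(len(frames), len(frames[0]))
```

Each windowed frame would then be passed to the feature extraction stage.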
2010 Second International Conference on Intelligent Human-Machine Systems and Cybernetics
978-0-7695-4151-8/10 $26.00 2010 IEEE
DOI 10.1109/IHMSC.2010.66
B. Extraction of speech feature
Feature extraction is the key problem in pattern recognition. Currently the most popular feature parameters are the cepstral coefficients based on the all-pole channel model (LPCC) and the Mel cepstral coefficients based on human auditory characteristics (MFCC). Studies have shown [6] that MFCC parameters give better recognition results than LPCC parameters and are less sensitive to noise, so MFCC has been more widely used. MFCC feature vectors usually use 12 to 16 parameters. After testing, 12 band parameters were selected in this paper.
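MFCC extraction rests on warping the frequency axis onto the mel scale, which approximates human pitch perception. The sketch below uses the widely used conversion mel = 2595 log10(1 + f/700). Which mel formula the authors used, the 0 to 4000 Hz range, and the reading of "12 bands" as 12 filters are all assumptions made for illustration.

```python
import math

def hz_to_mel(f_hz):
    """Common mel-scale formula (assumed; the paper does not state a variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(num_bands, f_low, f_high):
    """Edge frequencies (Hz) of num_bands filters spaced evenly on the mel scale.

    num_bands triangular filters need num_bands + 2 edge points.
    """
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (num_bands + 1)
    return [mel_to_hz(lo + k * step) for k in range(num_bands + 2)]

# Hypothetical setting: 12 bands between 0 Hz and 4000 Hz.
edges = mel_band_edges(12, 0.0, 4000.0)
for k, f in enumerate(edges):
    print(k, round(f, 1))
```

The edges are close together at low frequencies and spread out at high frequencies, which is what gives MFCC its perceptual weighting.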
C. Selection of encoding scheme
Data must be encoded before the genetic algorithm can be used to find the central values of the hidden-layer nodes. Encoding methods include binary coding, floating-point coding and symbol coding. The most common is binary coding, which uses fixed-length binary strings to represent the individuals in a group. However, the coding length is limited, so the precision of the parameters is inevitably affected during encoding and decoding. In a floating-point-coded genetic algorithm, by contrast, each gene of an individual in the initial population can be drawn from uniformly distributed random numbers, which makes up for the shortcomings of binary coding to some extent. This paper adopts a hybrid-coded genetic algorithm, in which the same individual contains both binary coding and floating-point coding. The binary coding represents the structure of the RBF network, while the floating-point coding represents the corresponding parameters, which helps to improve the accuracy of the algorithm.
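One way to picture the hybrid encoding is an individual with a binary mask, one bit per candidate hidden node, alongside floating-point parameters for every candidate node. The layout below is a hypothetical sketch: the paper does not give the exact gene layout, and the dictionary keys, the parameter range [-1, 1] and the per-node grouping (centre coordinates followed by output weights) are assumptions.

```python
import random

def random_individual(max_hidden, n_in, n_out, rng):
    """Build one hybrid-coded individual (assumed layout, for illustration).

    mask   : binary part, 1 means the hidden node is present in the network
    params : floating-point part, one row per candidate hidden node holding
             n_in centre coordinates followed by n_out output weights
    """
    mask = [rng.randint(0, 1) for _ in range(max_hidden)]
    params = [[rng.uniform(-1.0, 1.0) for _ in range(n_in + n_out)]
              for _ in range(max_hidden)]
    return {"mask": mask, "params": params}

def decode(individual, n_in):
    """Keep only the rows whose mask bit is 1 and split them into parts."""
    active = [row for bit, row in zip(individual["mask"], individual["params"])
              if bit == 1]
    centres = [row[:n_in] for row in active]
    out_weights = [row[n_in:] for row in active]
    return centres, out_weights

rng = random.Random(0)
ind = random_individual(max_hidden=28, n_in=12, n_out=15, rng=rng)
centres, out_weights = decode(ind, n_in=12)
print("active hidden nodes:", len(centres))
```

Switching a mask bit off removes a node without discarding its parameters, so a later mutation can bring the node back.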
III. PROCEDURE OF MODEL CONSTRUCTION
A speaker recognition system typically includes four main modules: feature extraction, model training, pattern matching and logical decision-making. In this research, the RBF network is used as the recognition model. The most difficult part of model construction is determining the network connection weights and selecting the encoding scheme. Here, the genetic algorithm is applied to optimize the neural network in two respects: first, the network connection weights; second, the network structure. In this paper, the hybrid coding method is used to encode the RBF network, whose structure is designed with an m-dimensional input, an n-dimensional output and up to L hidden-layer nodes. The encoded individuals become the operating variables of the genetic algorithm, and the RBF networks are trained by the genetic algorithm. The detailed steps are as follows:
Step 1 Sample initialization
The key operation in this step is to set the population size, that is, the number of encoded gene combinations. In this paper, the initial population $A_k$ is formed by L individuals. Each individual consists of two parts: the first part is the binary-coded network structure, and the second part is the floating-point-coded initial weight factors. The population is then built up iteratively: a certain number of individuals are generated randomly, and the best of them are selected and added to the initial population. The process is repeated until the initial population reaches the preset size.
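The iterative construction of the initial population can be read as: generate a small batch of random candidates, keep the best one, and repeat until the population is full. The sketch below follows that reading, which is one interpretation of the description. The batch size, the toy bit-string individuals and the toy fitness function are hypothetical placeholders.

```python
import random

def init_population(size, batch, make_individual, fitness, rng):
    """Fill the population by repeatedly keeping the best of a random batch."""
    population = []
    while len(population) < size:
        candidates = [make_individual(rng) for _ in range(batch)]
        population.append(max(candidates, key=fitness))
    return population

# Toy stand-ins: an individual is a list of 8 bits, fitness is the bit count.
def make_bits(rng):
    return [rng.randint(0, 1) for _ in range(8)]

rng = random.Random(1)
population = init_population(size=40, batch=5,
                             make_individual=make_bits, fitness=sum, rng=rng)
print(len(population))
```

Pre-selecting in this way gives the genetic algorithm a better starting point than a purely random population, at the cost of extra fitness evaluations.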
Step 2 If the optimal individual in the group meets the requirements, or the evolution limit (the maximum number of generations) is reached, training stops; otherwise, continue with the following steps.
Step 3 Cross every individual in the population. Randomly select the same position in two individuals and exchange the genes at that position according to the crossover probability. Because each individual contains two encodings, a different crossover method is used for each, so it must first be determined whether the selected position lies in the binary part or the real-number part. Binary-coded genes can be exchanged directly, whereas for floating-point coding the selected individuals are crossed with probability $P_c$.
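A hybrid crossover operator might look like the sketch below. The binary part uses a direct exchange of the tail after a random cut point, as the text describes. For the floating-point part the paper does not name the operator, so arithmetic (blend) crossover on one randomly chosen gene is used here as an assumption, and the representation of an individual as a (mask, weights) pair is hypothetical.

```python
import random

def crossover(parent_a, parent_b, p_c, rng, blend=0.5):
    """Cross two hybrid individuals given as (mask, weights) pairs.

    Binary part : tails after a random cut point are swapped directly.
    Float part  : arithmetic crossover on one random gene (assumed operator).
    Returns two new children; the parents are left unmodified.
    """
    mask_a, w_a = list(parent_a[0]), list(parent_a[1])
    mask_b, w_b = list(parent_b[0]), list(parent_b[1])
    if rng.random() < p_c:
        cut = rng.randrange(1, len(mask_a))
        mask_a[cut:], mask_b[cut:] = mask_b[cut:], mask_a[cut:]
        k = rng.randrange(len(w_a))
        a, b = w_a[k], w_b[k]
        w_a[k] = blend * a + (1.0 - blend) * b
        w_b[k] = blend * b + (1.0 - blend) * a
    return (mask_a, w_a), (mask_b, w_b)

rng = random.Random(3)
pa = ([1, 1, 1, 1], [0.2, 0.4, 0.6])
pb = ([0, 0, 0, 0], [1.0, 2.0, 3.0])
child_a, child_b = crossover(pa, pb, p_c=0.85, rng=rng)
print(child_a, child_b)
```

Arithmetic crossover keeps each child gene between the two parent values, which avoids the precision loss that motivates floating-point coding in the first place.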
Step 4 Mutation operation
Following the biological principle of gene mutation, the algorithm mutates some bits of some individuals according to the mutation probability $P_m$. As with crossover, the type of encoding at the selected position must be determined first, and the mutation is then applied in the way appropriate to that encoding.
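Mutation on a hybrid individual can be sketched with one rule per encoding type. Bit flipping for the binary part follows directly from the description. The Gaussian perturbation of floating-point genes and its standard deviation `sigma` are assumptions, since the paper does not specify how real-valued genes are mutated.

```python
import random

def mutate(individual, p_m, rng, sigma=0.1):
    """Mutate a hybrid individual given as a (mask, weights) pair.

    Binary genes are flipped with probability p_m.
    Float genes receive Gaussian noise with probability p_m (assumed rule).
    """
    mask, weights = individual
    new_mask = [1 - bit if rng.random() < p_m else bit for bit in mask]
    new_weights = [w + rng.gauss(0.0, sigma) if rng.random() < p_m else w
                   for w in weights]
    return new_mask, new_weights

rng = random.Random(5)
original = ([1, 0, 1, 1, 0], [0.5, -0.3, 0.8])
mutated = mutate(original, p_m=0.006, rng=rng)
print(mutated)
```

With a mutation probability as small as 0.006, most calls return the individual unchanged, so mutation acts as an occasional source of new structure rather than the main search mechanism.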
Step 5 Record the new collection of individuals as $B_k$. The fitness of each new individual is then calculated through the formula $f = 1/E$, where $E$ is the objective function of the network:
$$E = \sum_{t=1}^{N} \sum_{i=1}^{O} \left( Y_i(t) - \hat{Y}_i(t) \right)^2$$
In this equation, $Y_i(t)$ and $\hat{Y}_i(t)$ denote the actual output and the expected output, respectively, of output node $i$ for the $t$-th training sample, and $O$ and $N$ denote the number of output nodes and the number of input samples, respectively.
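The fitness computation is a direct transcription of the two formulas: sum the squared output errors over all samples and output nodes, then take the reciprocal. In the sketch below the small constant `eps` is an addition not found in the paper, included only to avoid division by zero when the error is exactly zero.

```python
def sum_squared_error(actual, expected):
    """E = sum over samples t and output nodes i of (actual - expected)^2.

    Both arguments are lists of rows, one row of O outputs per sample.
    """
    return sum((a - e) ** 2
               for row_a, row_e in zip(actual, expected)
               for a, e in zip(row_a, row_e))

def fitness(actual, expected, eps=1e-12):
    """f = 1 / E; eps is an assumed guard against division by zero."""
    return 1.0 / (sum_squared_error(actual, expected) + eps)

# Two samples, two output nodes each.
actual = [[1.0, 0.0], [0.0, 1.0]]
expected = [[0.5, 0.0], [0.0, 0.5]]
print(sum_squared_error(actual, expected), fitness(actual, expected))
```

Because fitness is the reciprocal of the error, individuals whose networks fit the training data more closely receive a larger share of the selection probability.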
Step 6 Perform further rounds of selection, crossover and mutation until the required number of iterations is reached. The optimal individual in the final population is then decoded to obtain the genetically optimized network and its parameters, from which the speaker recognition module is set up.
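The selection stage of each round can be sketched as below. The paper does not name its selection operator, so fitness-proportional (roulette-wheel) selection with the single best individual carried over unchanged is used here purely as an assumption. Crossover and mutation would then be applied to the selected individuals.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.random() * sum(fitnesses)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running > pick:
            return individual
    return population[-1]  # guard against floating-point round-off

def next_generation(population, fitnesses, rng):
    """Keep the best individual (elitism, assumed) and fill the rest by roulette."""
    best = max(range(len(population)), key=lambda i: fitnesses[i])
    selected = [population[best]]
    while len(selected) < len(population):
        selected.append(roulette_select(population, fitnesses, rng))
    return selected

rng = random.Random(9)
population = ["a", "b", "c", "d"]
fitnesses = [0.1, 0.2, 3.0, 0.7]
print(next_generation(population, fitnesses, rng))
```

Carrying the best individual forward guarantees that the best fitness never decreases from one generation to the next.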
IV. SIMULATION EXPERIMENT
A. Experimental Design
In the speaker recognition simulation experiment, Visual C++ is used as the development platform. The program applies the following criterion [7] to the network output: the maximum value among the 15 output nodes is set to 1 and the other outputs are set to 0. The output node set to 1 identifies the speaker. Training ends when the error satisfies $E < 10^{-5}$ or the number of training iterations exceeds 100.
The speech data used in the experiment is from the TIMIT database. In this experiment, different speakers use different content for network training, and the training speech must remain text-independent. There are 15 participants, and each reads 16 segments of speech, each 20 s long. Accordingly, 15-dimensional features are extracted from the samples and saved. The voice of each speaker is divided into two parts: 8 segments are used to train the network, and the others are used to recognize the speaker.
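The output criterion and the stopping rule described above are simple enough to state in code. The sketch below is illustrative only: the function names are hypothetical, and ties between equal maxima are resolved in favour of the first node, which the paper does not address.

```python
def one_hot_decision(outputs):
    """Set the largest output to 1 and all others to 0.

    Returns the one-hot vector and the index of the winning node, which
    identifies the speaker. Ties go to the first maximum (an assumption).
    """
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    return [1 if i == winner else 0 for i in range(len(outputs))], winner

def should_stop(error, iteration, tol=1e-5, max_iterations=100):
    """Stop when E < 1e-5 or the iteration count exceeds 100."""
    return error < tol or iteration > max_iterations

vector, speaker = one_hot_decision([0.1, 0.7, 0.2])
print(vector, speaker)
print(should_stop(error=3e-6, iteration=12))
```

With 15 output nodes, the winning index maps directly to one of the 15 enrolled speakers.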
The parameters of the genetic algorithm are set as follows: population size $L = 40$, crossover probability $P_c = 0.85$ and mutation probability $P_m = 0.006$. The maximum number of iterations is 100, and the search range for the number of hidden-layer nodes is 15 to 28. Because the experiment has to distinguish 15 speaker identities, the number of output nodes is set to 15.
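For reference, the experimental settings can be collected in a single configuration object. The values below are those stated in the text. The class name, field names and the validation method are hypothetical conveniences, not part of the original system, which was written in Visual C++.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GAConfig:
    """Genetic algorithm settings reported for the experiment."""
    population_size: int = 40      # L
    crossover_prob: float = 0.85   # P_c
    mutation_prob: float = 0.006   # P_m
    max_iterations: int = 100
    min_hidden: int = 15           # lower bound of the hidden-node search range
    max_hidden: int = 28           # upper bound of the hidden-node search range
    n_outputs: int = 15            # one output node per speaker

    def validate(self):
        """Raise ValueError if the settings are inconsistent."""
        if not 0.0 <= self.crossover_prob <= 1.0:
            raise ValueError("crossover_prob must lie in [0, 1]")
        if not 0.0 <= self.mutation_prob <= 1.0:
            raise ValueError("mutation_prob must lie in [0, 1]")
        if self.min_hidden > self.max_hidden:
            raise ValueError("min_hidden must not exceed max_hidden")
        if self.population_size < 2:
            raise ValueError("population_size must be at least 2")

config = GAConfig()
config.validate()
print(config)
```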
B. Analysis of experimental results
(1) Performance comparison between the RBF neural network based on the hybrid-coded genetic algorithm and the traditional RBF neural network
The parameters of the RBF neural network based on the hybrid-coded genetic algorithm (HGA-RBF network) are set using the principles above. The simulation result is shown in Figure 1. Compared with the simulation results of the traditional RBF network, it shows that the self-adaptive optimization design of the HGA-RBF network overcomes the local-optimum problem, which is unavoidable in the traditional RBF network.
Figure 1. Comparison of network performance
(2) Comparison of recognition rate
To demonstrate the advantage of the HGA-RBF network, the recognition rates of speaker recognition systems based on the HGA-RBF network and on the traditional RBF network are compared. The HGA-RBF network is allowed up to 28 hidden-layer nodes, and the optimization then yields the RBF network structure and the corresponding weights. After training, the hidden layer needs 18 nodes, so the network structure is much simpler than that of the traditional RBF network. The HGA-RBF network structure is therefore set as 12-18-15: 12 inputs, 18 hidden-layer nodes and 15 outputs. Its hidden-layer transfer function is tansig and its output-layer transfer function is purelin. Table I shows the comparison of recognition rates.
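Taking the stated 12-18-15 structure and transfer functions at face value, a forward pass through the network can be sketched as follows. Here tansig is the hyperbolic tangent sigmoid (mathematically identical to tanh) and purelin is the identity. The weighted-sum layer form, the bias terms and the random weights are assumptions for illustration, since the paper does not give the layer equations.

```python
import math
import random

def tansig(n):
    """Hyperbolic tangent sigmoid, 2 / (1 + exp(-2n)) - 1, i.e. tanh(n)."""
    return math.tanh(n)

def purelin(n):
    """Linear transfer function: the output equals the input."""
    return n

def layer(inputs, weights, biases, transfer):
    """One layer: transfer(weighted sum of inputs + bias) for each node."""
    return [transfer(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """12 inputs -> 18 tansig hidden nodes -> 15 purelin outputs."""
    hidden = layer(x, w_hidden, b_hidden, tansig)
    return layer(hidden, w_out, b_out, purelin)

rng = random.Random(0)
n_in, n_hidden, n_out = 12, 18, 15
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b_out = [rng.uniform(-1, 1) for _ in range(n_out)]

x = [rng.uniform(-1, 1) for _ in range(n_in)]   # stand-in for 12 MFCC values
y = forward(x, w_hidden, b_hidden, w_out, b_out)
print(len(y))
```

In the trained system the weights would come from decoding the best individual found by the genetic algorithm rather than from a random generator.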
TABLE I. COMPARISON OF RECOGNITION RATE
Network type               Training voice length/s
                           5        10       15       20
Traditional RBF network    72.11    78.72    80.43    81.98
HGA-RBF network            81.44    87.23    90.19    95.01
The experimental results show that, under the same error conditions, the HGA-RBF network needs fewer iterations, trains faster and achieves a higher recognition rate. The proposed hybrid-coded genetic algorithm optimizes the structure, weights and threshold values simultaneously, avoids randomness in the selection of the neural network, and improves computational efficiency. In summary, it is a feasible and effective scheme for speaker recognition.
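The size of the improvement can be read off Table I by subtracting the two rows. The short calculation below does this for each training voice length. The figures are copied from the table and the variable names are illustrative.

```python
# Recognition rates from Table I, indexed by training voice length in seconds.
lengths_s = [5, 10, 15, 20]
traditional = [72.11, 78.72, 80.43, 81.98]
hga_rbf = [81.44, 87.23, 90.19, 95.01]

# Absolute gain of the HGA-RBF network over the traditional RBF network.
gains = [round(h - t, 2) for h, t in zip(hga_rbf, traditional)]
mean_gain = sum(gains) / len(gains)

for length, gain in zip(lengths_s, gains):
    print(f"{length:>2} s: +{gain:.2f}")
print(f"mean: +{mean_gain:.2f}")
```

The gain is between 8.51 and 13.03 at every length and is largest for the 20 s training voice, where the HGA-RBF network reaches 95.01 against 81.98.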
V. CONCLUSION
Using neural networks as classifiers in speaker recognition has a persistent problem: the topology design of the network and the initial weight settings lack theoretical support. This shortcoming results in large network scale, inefficient identification and other problems. In this paper, a genetic algorithm is used to adjust the network topology and network parameters self-adaptively. This dynamic adjustment yields an optimized network design. It also helps to overcome the tendency of neural networks to become stuck in a local solution and, moreover, improves the generalization of the RBF network. In short, the performance of the speaker recognition system is enhanced.
ACKNOWLEDGMENT
This research was supported by grants from the National Science Foundation of China (No. 60970058), the Natural Science Foundation of Jiangsu Province of China (No. BK2009131), and the Science and Technology Foundation of Suzhou Vocational University (No. SZD09L26).
REFERENCES
[1] Zhan-ming Li, Zhen Wang. Vector quantization and neural networks combined speaker recognition system [J]. Computer Engineering and Applications, 2006, (15): 205-207.
[2] Salameh W A. Detection of Intrusion Using Neural Networks: A customized study [J]. Studies in Informatics and Control, 2004, 13(2): 137.
[3] Yoshihiro Yamamoto, Nikiforuk P N. A new supervised learning algorithm for multilayered and inter-connected neural network. IEEE Trans. on Neural Network, 2001, 11(1): 36-46.
[4] Dam M, Saraf D N. Design of neural networks using genetic algorithm for on-line property estimation of crude fractionator products [J]. Computers and Chemical Engineering, 2006, 30(4): 722-729.
[5] Chai Yi. Based on improved genetic algorithm neural network adaptive optimal design [J]. Journal of Chongqing University (Natural Science Edition), 2007, 30(4): 91-96.
[6] Ahmed Mezghani, Douglas. Speaker verification using a new representation based on a CMFCC and formants [J]. IEEE Electrical and Computer Engineering, 2005, 22: 1469-1472.
[7] Bing Wang, Jing-lin Xiang. Based on neural network human Pulse Recognition [J]. Northwestern Polytechnical University, 2002, 20(3): 454-457.