
    Human Speaker Recognition Based on the Integration of Genetic Algorithm and RBF Network

    Yan Zhou

    Department of Electronics & Information Engineering, Suzhou Vocational University

    Suzhou, Jiangsu, China

    e-mail: [email protected]

    Yunian Gu

    Department of Electronics & Information Engineering, Suzhou Vocational University

    Suzhou, Jiangsu, China

    e-mail: [email protected]

    Abstract: Although the RBF network is one of the main models for human speaker recognition, it has two shortcomings: the number of hidden-layer nodes is hard to determine, and convergence is slow. This paper proposes an RBF network optimization scheme for human speaker recognition based on a genetic algorithm. In the scheme, a hybrid-encoding genetic algorithm optimizes both the connection weights and the structure of the RBF network, and redundant nodes and redundant connection weights are removed from the network. The scheme combines the parallelism of the neural network with the global search capability of the genetic algorithm, which improves the processing capability of the network. Experimental tests show that the scheme learns quickly and achieves a high recognition rate, making it a practical new scheme for human speaker recognition.

    Keywords: human speaker recognition; RBF network; genetic algorithm

    I. INTRODUCTION

    Speaker recognition is a technology that recognizes or confirms the identity of a speaker. The recognition process is based on the parameters of voice samples. Generally speaking, it consists of pre-processing, feature extraction, pattern matching, and a reference module.

    Establishing a speaker recognition system involves two stages: training and recognition. Current models for speaker recognition mainly include the vector-based model, the Gaussian mixture model, the hidden Markov model, and the artificial neural network model [1,2]. Among artificial neural network models, the RBF network is the most widely used. However, existing research [3] shows that an RBF network is designed primarily from the designer's experience: the designer selects samples through repeated experiments in a large sample space, with no theoretical guidance. As a result, the initial connection weights and the choice of network structure are largely arbitrary, and the network is unlikely to be optimized as a whole.

    Speaker recognition models based on the genetic algorithm (GA) [4], developed in recent years, are a robust identification method. A genetic algorithm guides the direction of the search and can finally obtain an optimal solution to the network structure optimization problem. The algorithm selects an initial population and then crosses and mutates individuals, mainly on the basis of a fitness function evaluated on the samples. Although there are many successful cases [5] of applying genetic algorithms to optimize neural networks, most of this research focused on training the network weights and ignored the connection between network structure and weights.

    To address the shortcomings mentioned above, this paper proposes a hybrid coding scheme that integrates network structure and weights. The hybrid-coded individuals are trained by the genetic algorithm, and the best training result is used as the network model of the speaker recognition system. The improved genetic operators optimize the network structure and the weights simultaneously.

    II. CHOICE OF MAIN SCHEME

    A. Principle of Speaker Recognition

    The system is realized by an RBF neural network model based on a hybrid genetic algorithm. The main design process is as follows:

    First, the input speech samples undergo endpoint detection, pre-emphasis, framing, and Hamming windowing, after which the voice feature parameters are extracted. Second, the speaker model is set up and its parameters are trained as follows: the algorithm randomly generates different neural network structures and encodes the corresponding parameters as individuals, so that every individual represents one neural network. Each network structure is trained from different initial weights and its fitness is calculated. The algorithm selects the individuals with high fitness to pass to the next generation, applies the genetic operations to the current population to produce the next generation, and repeats this process until the best individual is found. The best individual is taken as the model of the speaker recognition system. Third, the algorithm sets the network training parameters and trains on the voice feature parameters of the speakers, which establishes the reference pattern database. Fourth, in the recognition stage, the speaker is identified based on the network weight values.
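    The pre-processing chain in the first stage can be illustrated with a minimal sketch. The function names, the pre-emphasis coefficient of 0.97, and the 400-sample frames with a 160-sample hop are assumptions for illustration only; the paper does not state its settings, and endpoint detection is omitted here.

```python
import math


def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def hamming(length):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def windowed_frames(signal, frame_len=400, hop=160):
    """Split the signal into overlapping frames and window each frame."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


if __name__ == "__main__":
    sample_rate = 16000  # assumed sampling rate
    # 0.1 s of a 440 Hz tone as a stand-in for a speech sample
    tone = [math.sin(2 * math.pi * 440 * n / sample_rate)
            for n in range(sample_rate // 10)]
    frames = windowed_frames(pre_emphasize(tone))
    print(len(frames), len(frames[0]))
```

    Each windowed frame would then be passed to the feature extraction step.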

    2010 Second International Conference on Intelligent Human-Machine Systems and Cybernetics

    978-0-7695-4151-8/10 $26.00 2010 IEEE

    DOI 10.1109/IHMSC.2010.66

    B. Extraction of speech feature

    Feature extraction is the key problem in pattern recognition. Currently the most popular feature parameters are the cepstral coefficients based on the all-pole vocal tract model (LPCC) and the Mel cepstral coefficients based on human auditory characteristics (MFCC). Studies have shown [6] that MFCC parameters give better recognition results than LPCC parameters and are less sensitive to noise, so MFCC has been more widely used. The number of MFCC parameters is usually between 12 and 16. After testing, 12 parameters are selected in this paper.
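    The mel scale that underlies MFCC can be sketched with the commonly used conversion mel = 2595 log10(1 + f/700). The function names, the 0 to 8000 Hz range, and the use of 12 as the number of bands are assumptions for illustration; the paper does not describe its filterbank.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands=12, f_low=0.0, f_high=8000.0):
    """Center frequencies (Hz) of n_bands bands equally spaced in mel."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (k + 1)) for k in range(n_bands)]


if __name__ == "__main__":
    for center in mel_band_centers():
        print(round(center, 1))
```

    The centers are dense at low frequencies and sparse at high frequencies, which is how the scale mimics human auditory resolution.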

    C. Selection of encoding scheme

    The data must be encoded before the genetic algorithm is used to find the center values of the hidden-layer nodes. Encoding methods include binary coding, floating-point coding, and symbol coding. The most common is binary coding, which uses fixed-length binary strings to represent the individuals in a population. However, because the code length is limited, the precision of the parameters is inevitably affected during encoding and decoding. In a floating-point-encoded genetic algorithm, each gene of an individual in the initial population can be formed from uniformly distributed random numbers, which makes up for this shortcoming of binary coding to some extent. This paper adopts a hybrid-coded genetic algorithm, in which the same individual contains both binary coding and floating-point coding. The binary part represents the structure of the RBF network and the floating-point part represents the corresponding parameters, which helps improve the accuracy of the algorithm.
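    A hybrid-coded individual can be sketched as a binary mask over candidate hidden nodes plus one row of floating-point parameters per candidate node. The class name, field names, and the uniform range of -1 to 1 are hypothetical; the paper does not give the exact chromosome layout.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class HybridIndividual:
    mask: List[int]              # binary part: 1 means the hidden node is kept
    weights: List[List[float]]   # floating-point part: one row per candidate node


def random_individual(max_hidden, n_params, rng):
    """Draw the binary part from {0, 1} and the real part uniformly."""
    mask = [rng.randint(0, 1) for _ in range(max_hidden)]
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_params)]
               for _ in range(max_hidden)]
    return HybridIndividual(mask, weights)


def decode(ind):
    """Return the parameter rows of the active hidden nodes only."""
    return [row for bit, row in zip(ind.mask, ind.weights) if bit == 1]


if __name__ == "__main__":
    rng = random.Random(0)
    ind = random_individual(max_hidden=28, n_params=12, rng=rng)
    print(sum(ind.mask), len(decode(ind)))
```

    Decoding drops the rows whose mask bit is 0, so redundant nodes disappear from the network without changing the chromosome length.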

    III. PROCEDURE OF MODEL CONSTRUCTION

    A speaker recognition system typically includes four main modules: feature extraction, model training, pattern matching, and logical decision-making. In this research, the RBF network is used as the recognition model. In constructing the model, the most difficult tasks are determining the network connection weights and selecting the encoding scheme. Here, a genetic algorithm is applied to optimize the neural network. The model contributes in two respects: first, it optimizes the network connection weights; second, it optimizes the network structure. A hybrid coding method is used to encode the RBF network, whose structure is designed with an m-dimensional input, an n-dimensional output, and up to L hidden-layer nodes. The encoded individuals become the operating variables of the genetic algorithm, and the RBF networks are trained by the genetic algorithm. The detailed steps are as follows:

    Step 1: Sample Initialization

    The key operation in this step is to set the population size, that is, the number of encoded gene combinations. In this paper, the initial population $A_k$ is formed by L individuals. Each individual consists of two parts: the first part is the binary-coded network structure, and the second part is the floating-point-encoded initial weight factors. To build the population, a certain number of individuals are first generated randomly, and the best of them are selected and added to the initial population. This is repeated until the initial population reaches the preset size, at which point the iteration ends.

    Step 2: If the best individual in the population meets the requirements, or the maximum number of generations is reached, training stops; otherwise, continue.

    Step 3: Crossover. Apply crossover to the individuals in the population: randomly select the same position in two individuals and exchange the genes at that position according to the crossover probability $P_c$. Because each individual contains two encodings, a different crossover method is used for each, so it must first be determined whether the selected position lies in the binary part or in the real-valued part. Genes in the binary part can be exchanged directly; for the floating-point part, the selected individuals are crossed with probability $P_c$.

    Step 4: Mutation Operation

    Following the biological principle of gene mutation, the algorithm mutates some bits of some individuals according to the mutation probability. As with crossover, the type of encoding must first be determined, and the mutation applied to the selected position depends on that type.

    Step 5: The new collection of individuals is recorded as $B_k$. The fitness of each new individual is then calculated as $f = 1/E$, where $E$ is the objective function of the network:

    $$E = \sum_{t=1}^{N} \sum_{i=1}^{O} \left( \hat{Y}_i(t) - Y_i(t) \right)^2$$

    Here $\hat{Y}_i(t)$ and $Y_i(t)$ denote the actual output and the expected output, respectively, at output node $i$ for training sample $t$; $O$ and $N$ denote the number of output nodes and the number of input samples.

    Step 6: Perform further rounds of selection, crossover, and mutation until the preset number of iterations is reached. The best individual in the final population is then decoded to obtain the genetically optimized network and its parameters, which are used to set up the speaker recognition module.
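    Steps 2 to 6 can be sketched as a small evolutionary loop over hybrid individuals, each held as a pair (binary mask, real-valued weights). Several choices here are assumptions that the paper does not specify: roulette-wheel selection, arithmetic crossover for the real-valued genes, Gaussian mutation, and elitism. The function toy_error is a hypothetical stand-in for the network objective E, not the speaker recognition error.

```python
import random

P_C, P_M = 0.85, 0.006  # crossover and mutation probabilities from Section IV


def fitness(ind, error_fn):
    """f = 1/E; the small epsilon avoids division by zero when E is 0."""
    return 1.0 / (error_fn(ind) + 1e-12)


def crossover(a, b, rng):
    """Exchange one position: binary genes swap, real genes blend (assumed)."""
    mask_a, w_a = list(a[0]), list(a[1])
    mask_b, w_b = list(b[0]), list(b[1])
    if rng.random() < P_C:
        pos = rng.randrange(len(mask_a) + len(w_a))
        if pos < len(mask_a):                      # binary part
            mask_a[pos], mask_b[pos] = mask_b[pos], mask_a[pos]
        else:                                      # floating-point part
            j = pos - len(mask_a)
            r = rng.random()
            w_a[j], w_b[j] = (r * w_a[j] + (1 - r) * w_b[j],
                              r * w_b[j] + (1 - r) * w_a[j])
    return (mask_a, w_a), (mask_b, w_b)


def mutate(ind, rng, sigma=0.1):
    """Flip bits in the binary part; add Gaussian noise in the real part."""
    mask = [1 - bit if rng.random() < P_M else bit for bit in ind[0]]
    weights = [w + rng.gauss(0.0, sigma) if rng.random() < P_M else w
               for w in ind[1]]
    return (mask, weights)


def evolve(population, error_fn, generations, rng):
    """Selection, crossover, and mutation for a fixed number of generations."""
    best = max(population, key=lambda ind: fitness(ind, error_fn))
    for _ in range(generations):
        scores = [fitness(ind, error_fn) for ind in population]
        parents = rng.choices(population, weights=scores, k=len(population))
        children = []
        for i in range(0, len(parents) - 1, 2):
            children.extend(crossover(parents[i], parents[i + 1], rng))
        if len(parents) % 2:
            children.append(parents[-1])
        population = [mutate(c, rng) for c in children]
        population[0] = best  # elitism: keep the best individual so far
        best = max(population, key=lambda ind: fitness(ind, error_fn))
    return best


def toy_error(ind):
    """Stand-in for E: squared gap between the sum of active weights and 1.0."""
    mask, weights = ind
    out = sum(w for bit, w in zip(mask, weights) if bit)
    return (out - 1.0) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    pop = [([rng.randint(0, 1) for _ in range(8)],
            [rng.uniform(-1.0, 1.0) for _ in range(8)]) for _ in range(40)]
    best = evolve(pop, toy_error, generations=100, rng=rng)
    print(toy_error(best))
```

    In the scheme described in the paper, the error function would instead decode the individual into an RBF network and evaluate it on the training features.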

    IV. SIMULATION EXPERIMENT

    A. Experimental Design

    The speaker recognition simulation uses Visual C++ as the development platform. The network output is interpreted with the following criterion [7]: the maximum value among the 15 output nodes is set to 1 and all other outputs are set to 0; the output node set to 1 identifies the speaker. Training ends when the error satisfies $E < 10^{-5}$ or the number of training iterations exceeds 100.

    The speech data used in the experiment are from the TIMIT database. Different speakers use different content for network training, and the training speech must be text-independent. There are 15 participants, each reading 16 segments of speech, and each segment is 20 s long. Thus, 15-dimensional features can be extracted from the samples and saved to the record. Each speaker's recordings are divided into two parts: 8 segments are used to train the network, and the others are used to recognize the speaker.
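    The output criterion and the stopping rule described in this subsection can be sketched as follows. The function names are hypothetical, and the tolerance of 1e-5 is an assumed reading of the error threshold.

```python
def winner_take_all(outputs):
    """Set the largest output to 1 and the rest to 0 (first maximum wins ties)."""
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    return [1 if i == winner else 0 for i in range(len(outputs))]


def identify_speaker(outputs):
    """Index of the output node that is set to 1."""
    return winner_take_all(outputs).index(1)


def should_stop(error, iteration, tol=1e-5, max_iter=100):
    """Stop when the error is below tol or the iteration count exceeds max_iter."""
    return error < tol or iteration > max_iter


if __name__ == "__main__":
    outputs = [0.05] * 15
    outputs[6] = 0.8          # node 6 responds most strongly
    print(identify_speaker(outputs))
```

    With 15 output nodes, the returned index corresponds to one of the 15 enrolled speakers.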

    The parameters of the genetic algorithm are set as follows: population size $L = 40$, crossover probability $P_c = 0.85$, and mutation probability $P_m = 0.006$. The maximum number of iterations is 100, and the search range for the number of hidden-layer nodes is 15 to 28. Because the experiment has to distinguish 15 speakers, the number of output nodes is set to 15.

    B. Analysis of experimental results

    (1) Performance comparison between the RBF neural network based on the hybrid-coded genetic algorithm and the traditional RBF neural network

    The parameters of the RBF neural network based on the hybrid-coded genetic algorithm (HGA-RBF network) are set using the principles above. The simulation result is shown in Figure 1. Compared with the simulation results of the traditional RBF network, the self-adaptive optimization design of the HGA-RBF network overcomes the local-optimum problem that is unavoidable in the traditional RBF network.

    Figure 1. Comparison of network performance

    (2) Comparison of recognition rate

    To demonstrate the advantage of the HGA-RBF network, the recognition rates of speaker recognition systems based on the HGA-RBF network and on the traditional RBF network are compared. The number of hidden-layer nodes of the HGA-RBF network is allowed to be at most 28, and the optimization yields the RBF network structure and the corresponding weights. After training, the hidden layer needs 18 nodes, which is much simpler than the traditional RBF network structure. The HGA-RBF network structure is therefore 12-18-15: 12 inputs, 18 hidden-layer nodes, and 15 outputs. Its hidden-layer transfer function is tansig and its output-layer transfer function is purelin. Table I shows the comparison of recognition rates.

    TABLE I. COMPARISON OF RECOGNITION RATE

    Network type               Recognition rate (%) by training voice length (s)
                               5        10       15       20
    Traditional RBF network    72.11    78.72    80.43    81.98
    HGA-RBF network            81.44    87.23    90.19    95.01
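    The forward pass of the 12-18-15 structure described above can be sketched as follows. It assumes that tansig and purelin refer to the MATLAB transfer functions of the same names, that is, a hyperbolic tangent sigmoid and a linear (identity) function. The function names and the random weight range are hypothetical; in the paper the weights come from the optimized individual.

```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """tanh hidden layer (tansig) followed by a linear output layer (purelin)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]


def random_layer(n_out, n_in, rng):
    """Random weights and biases as placeholders for trained values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


if __name__ == "__main__":
    rng = random.Random(0)
    w1, b1 = random_layer(18, 12, rng)   # 12 inputs -> 18 hidden nodes
    w2, b2 = random_layer(15, 18, rng)   # 18 hidden nodes -> 15 outputs
    features = [0.1] * 12                # placeholder for 12 MFCC parameters
    print(len(forward(features, w1, b1, w2, b2)))
```

    The 15 outputs would then be passed through the maximum-value criterion to select the speaker.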

    The experimental results show that, under the same error conditions, the HGA-RBF network needs fewer iterations, trains faster, and achieves a higher recognition rate. In Table I, its recognition rate rises from 81.44 with 5 s of training speech to 95.01 with 20 s, compared with 72.11 to 81.98 for the traditional RBF network. The proposed hybrid-coded genetic algorithm optimizes the structure, weights, and threshold values simultaneously, avoids randomness in selecting the neural network, and improves computational efficiency. In summary, it is a feasible and effective scheme for speaker recognition.

    V. CONCLUSION

    Using neural networks as classifiers in speaker recognition has a persistent problem: the design of the network topology and the setting of the initial weights lack theoretical support. This shortcoming leads to large networks, inefficient identification, and other problems. In this paper, a genetic algorithm is used to adjust the network topology and network parameters self-adaptively, and this dynamic adjustment yields an optimized network design. It also helps overcome the tendency of neural networks to become stuck in a local optimum and improves the generalization of the RBF network. In short, the performance of the speaker recognition system is enhanced.

    ACKNOWLEDGMENT

    This research was supported by grants from the National Science Foundation of China (No. 60970058), the Natural Science Foundation of Jiangsu Province of China (No. BK2009131), and the Science and Technology Foundation of Suzhou Vocational University (No. SZD09L26).

    REFERENCES

    [1] Zhan-ming Li, Zhen Wang. Vector quantization and neural networks combined speaker recognition system [J]. Computer Engineering and Applications, 2006, (15): 205-207.

    [2] Salameh W A. Detection of Intrusion Using Neural Networks: A customized study [J]. Studies in Informatics and Control, 2004, 13(2): 137.

    [3] Yoshihiro Yamamoto, Nikiforuk P N. A new supervised learning algorithm for multilayered and inter-connected neural network. IEEE Trans. On Neural Network, 2001, 11(1): 36-46.


    [4] Dam M, Saraf D N. Design of neural networks using genetic algorithm for On-line property estimation of crude fractionator products [J]. Computers and Chemical Engineering, 2006, 30(4): 722-729.

    [5] Chai Yi. Based on improved genetic algorithm neural network adaptive optimal design [J]. Journal of Chongqing University (Natural Science Edition), 2007, 30(4): 91-96.

    [6] Ahmed Mezghani, Douglas Speaker verification using a new representation based on a CMFCC and formants [J]. IEEE Electrical and Computer Engineering, 2005, 22: 1469-1472.

    [7] Bing Wang, Jing-lin Xiang. Based on neural network human Pulse Recognition [J]. Northwestern Polytechnical University, 2002, 20(3): 454-457.
