Dual-Discriminability-Analysis Type-2 Fuzzy-Neural-Network Based Speech Classification for Human-Machine Interaction

GIN-DER WU AND ZHEN-WEI ZHU

Department of Electrical Engineering National Chi Nan University

Nantou, Taiwan, R.O.C. e-mail: [email protected]

Fax: +886-49-2917810

ABSTRACT

Speech detection and speech recognition are two important classification problems in human-robot interaction, and both are easily affected by noisy environments. To address this, a dual-discriminability-analysis type-2 fuzzy neural network (DDA2FNN) is presented for noisy data classification. In handling problems with uncertainties such as noisy data, type-2 fuzzy systems generally outperform their type-1 counterparts; hence, type-2 fuzzy sets are adopted in the antecedent parts to model the noisy data. The most important consideration for classification problems is the “discriminability”. To enhance it, a dual-discriminability-analysis (DDA) method is proposed in the consequent parts. The novelty of DDA is its consideration of both linear discriminant analysis (LDA) and minimum classification error (MCE): the proposed DDA type-2 fuzzy rule includes an LDA-matrix and an MCE-matrix. Compared with other existing fuzzy neural networks, the novelty of the proposed DDA2FNN is its consideration of both uncertainty and discriminability. The effectiveness of DDA2FNN is demonstrated on two speech classification problems. Experimental results and theoretical analysis indicate that the proposed DDA2FNN performs better than the other fuzzy neural networks.

Keywords: Discriminability, fuzzy neural network, classification, linear-discriminant-analysis (LDA), minimum classification error (MCE).

1. INTRODUCTION

Fuzzy-rule-based methods [1]-[6] have received considerable attention in noisy speech classification problems. Although these fuzzy-rule-based classifiers attempt to minimize the training error, noise usually reduces the discriminability and increases the uncertainty; discriminability and uncertainty are therefore two important factors in noisy data classification. To address discriminability, principal component analysis (PCA) [7]-[9] has been applied in the optimization of classification. Some studies [10]-[11] applied a self-constructing neural fuzzy inference network (SONFIN) with PCA to classification problems. However, PCA lacks an analysis of the statistics among different classes, which explains why its discriminative capability is limited.

Linear discriminant analysis (LDA) is a stochastic algorithm that optimizes the discriminative capability among different classes. Because the discriminative capability critically determines the classification performance, LDA has been extensively adopted to classify highly confusable patterns. Based on LDA, the authors of [12] proposed a maximizing-discriminability-based self-organizing fuzzy network (MDSOFN) that can classify highly confusable patterns. Experiments and analysis indicate that the LDA-derived fuzzy network outperforms the PCA-based fuzzy network and support vector machine (SVM) based fuzzy network [13]-[16]. Other studies [17]-[18] proposed minimum classification error (MCE) methods, where MCE is used to increase the temporal discriminative capability rather than to fit the distributions to the data. This study adopts LDA and MCE in the consequent parts of the proposed dual-discriminability-analysis type-2 fuzzy neural network (DDA2FNN) to increase the discriminability for solving classification problems.

To optimize fuzzy neural networks, state-of-the-art hybrid machine learning systems apply either particle-swarm optimization (PSO) [19]-[21] or ant-colony optimization (ACO) [22]-[23]. Type-2 fuzzy systems allow researchers to model and minimize the effects of uncertainties in rule-based systems [24]-[29]. Type-2 fuzzy logic systems (FLS) outperform their type-1 counterparts in handling problems with uncertainties such as noisy data. This ability is attributed to type-2 fuzzy sets, which have 3-D membership functions. The third dimension of type-2 fuzzy sets and their footprint of uncertainty provide an additional degree of freedom for a type-2 FLS to directly model and handle uncertainties. Therefore, the antecedent parts of the proposed DDA2FNN use type-2 fuzzy sets to model the uncertainty.

Deep learning has become popular in artificial intelligence. As a typical deep learning architecture, the deep neural network (DNN) has shown good performance in speech signal processing [30]-[33]. Each network layer is regarded as a different feature space of the input data, and the DNN is trained by back-propagation through feed-forward multi-layer networks. Compared with an FNN, however, a DNN requires more powerful processors and has a higher computational load. Additionally, unlike fuzzy rules, a DNN does not provide much “interpretable” information. Hence, this study focuses on the type-2 FNN.

The rest of this paper is organized as follows. Section 2 discusses the optimization of dual-discriminability-analysis (DDA) type-2 fuzzy rule for noisy speech classification. Section 3 introduces the structure and parameters of the proposed DDA2FNN. Section 4 demonstrates the effectiveness of DDA2FNN by using two speech classification problems. Finally, Section 5 draws conclusions.

2. FUZZY RULES FOR NOISY DATA CLASSIFICATION

This section discusses the optimization of fuzzy rules for noisy data classification. The proposed dual-discriminability-analysis (DDA) type-2 fuzzy rule comprises an LDA-matrix and an MCE-matrix, which are introduced as follows.

A. Dual-Discriminability-Analysis Type-2 Fuzzy Rule

The fuzzy rule with dual-discriminability-analysis is described as follows.

Rule $r$: IF $x_1$ is $\tilde{A}_1^r$ AND … AND $x_N$ is $\tilde{A}_N^r$ THEN $y$ is $\sum_{n=0}^{p+1} \tilde{a}_n^r t_n$, $\quad r = 1, \ldots, M$   (1)

where $x_1, \ldots, x_N$ represent the input variables; $\tilde{A}_1^r, \ldots, \tilde{A}_N^r$ are interval type-2 fuzzy sets; $y$ is the output of fuzzy rule $r$; $M$ is the number of rules; and $\tilde{a}_n^r = [c_n^r - s_n^r,\, c_n^r + s_n^r]$, $n = 0, \ldots, p+1$, are interval sets. The consequent terms $t_n$ ($n = 1, \ldots, p$) are updated by the LDA-matrix $W_{\mathrm{LDA}} \in \mathbb{R}^{N \times p}$, and $t_{p+1}$ is updated by the MCE-matrix $W_{\mathrm{MCE}} \in \mathbb{R}^{N \times 1}$, as follows.

$[t_1\ t_2\ \cdots\ t_p]^T = W_{\mathrm{LDA}}^T\, [x_1\ x_2\ \cdots\ x_N]^T$   (2)

$t_{p+1} = W_{\mathrm{MCE}}^T\, [x_1\ x_2\ \cdots\ x_N]^T$   (3)

The total number of classes is $J$, and the input vector $X^{(j)}(n)$ in the training set is labeled as the $j$th class. The mean vector $\mu^{(j)}$ and covariance matrix $\Sigma^{(j)}$ of class $j$ are then computed as

$\mu^{(j)} = \dfrac{1}{N_j} \sum_{n=1}^{N_j} X^{(j)}(n)$   (4)

$\Sigma^{(j)} = \dfrac{1}{N_j} \sum_{n=1}^{N_j} \bigl(X^{(j)}(n) - \mu^{(j)}\bigr)\bigl(X^{(j)}(n) - \mu^{(j)}\bigr)^T$   (5)

where $N_j$ indicates the total number of input vectors $X^{(j)}(n)$ labeled as belonging to the $j$th class.
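The per-class statistics of Eqs. (4) and (5) can be sketched in a few lines of NumPy. This is a minimal illustration; the function name and array layout are our own choices, not from the paper.

```python
import numpy as np

def class_statistics(X, labels):
    """Compute per-class mean vectors and covariance matrices (Eqs. (4)-(5)).

    X      : (num_samples, N) array of input vectors X^(j)(n).
    labels : (num_samples,) integer class labels in {0, ..., J-1}.
    Returns lists of means mu[j] (shape (N,)) and covariances sigma[j] (N, N).
    """
    labels = np.asarray(labels)
    J = int(labels.max()) + 1
    means, covs = [], []
    for j in range(J):
        Xj = X[labels == j]                   # the N_j vectors of class j
        mu = Xj.mean(axis=0)                  # Eq. (4)
        diff = Xj - mu
        covs.append(diff.T @ diff / len(Xj))  # Eq. (5), normalized by N_j
        means.append(mu)
    return means, covs
```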


B. LDA-matrix

To generate $W_{\mathrm{LDA}} \in \mathbb{R}^{N \times p}$, the between-class matrix $S_B$ and within-class matrix $S_W$ are defined as

$S_B = \sum_{j=1}^{J} N_j \bigl(\mu^{(j)} - \mu\bigr)\bigl(\mu^{(j)} - \mu\bigr)^T$   (6)

$S_W = \sum_{j=1}^{J} N_j\, \Sigma^{(j)}$   (7)

where $\mu = \frac{1}{N} \sum_{j=1}^{J} N_j\, \mu^{(j)}$ and $N = \sum_{j=1}^{J} N_j$. To increase the discriminative capability among different classes, the optimal direction $e^{*} = \arg\max_e \frac{e^T S_B e}{e^T S_W e}$ satisfies the generalized eigenvalue problem [12]

$S_W^{-1} S_B\, e = \lambda e$   (8)

In this form, $e$ denotes an eigenvector of the matrix $S_W^{-1} S_B$. The discrimination matrix $W_{\mathrm{LDA}} \in \mathbb{R}^{N \times p}$ gathers the eigenvectors of $S_W^{-1} S_B$ corresponding to the largest $p$ eigenvalues.

C. MCE-matrix

To generate $W_{\mathrm{MCE}} \in \mathbb{R}^{N \times 1}$, the Gaussian model $\Lambda_m$ of class $m$ is expressed as follows.

$P\bigl(W_{\mathrm{MCE}}^T X^{(j)}(n) \mid \Lambda_m\bigr) = \dfrac{1}{\sqrt{2\pi\, W_{\mathrm{MCE}}^T \Sigma^{(m)} W_{\mathrm{MCE}}}}\, \exp\!\left(-\dfrac{\bigl(W_{\mathrm{MCE}}^T X^{(j)}(n) - W_{\mathrm{MCE}}^T \mu^{(m)}\bigr)^2}{2\, W_{\mathrm{MCE}}^T \Sigma^{(m)} W_{\mathrm{MCE}}}\right)$   (9)

$\ln P\bigl(W_{\mathrm{MCE}}^T X^{(j)}(n) \mid \Lambda_m\bigr) = -\dfrac{1}{2}\ln\bigl(2\pi\, W_{\mathrm{MCE}}^T \Sigma^{(m)} W_{\mathrm{MCE}}\bigr) - \dfrac{\bigl(W_{\mathrm{MCE}}^T X^{(j)}(n) - W_{\mathrm{MCE}}^T \mu^{(m)}\bigr)^2}{2\, W_{\mathrm{MCE}}^T \Sigma^{(m)} W_{\mathrm{MCE}}}$   (10)

$\dfrac{\partial \ln P\bigl(W_{\mathrm{MCE}}^T X^{(j)}(n) \mid \Lambda_m\bigr)}{\partial W_{\mathrm{MCE}}} = -\dfrac{\Sigma^{(m)} W_{\mathrm{MCE}}}{W_{\mathrm{MCE}}^T \Sigma^{(m)} W_{\mathrm{MCE}}} - \dfrac{W_{\mathrm{MCE}}^T\bigl(X^{(j)}(n) - \mu^{(m)}\bigr)}{W_{\mathrm{MCE}}^T \Sigma^{(m)} W_{\mathrm{MCE}}}\,\bigl(X^{(j)}(n) - \mu^{(m)}\bigr) + \dfrac{\bigl(W_{\mathrm{MCE}}^T (X^{(j)}(n) - \mu^{(m)})\bigr)^2}{\bigl(W_{\mathrm{MCE}}^T \Sigma^{(m)} W_{\mathrm{MCE}}\bigr)^2}\, \Sigma^{(m)} W_{\mathrm{MCE}}$   (11)

To maximize the discriminability, the misclassification measure $d_j\bigl(W_{\mathrm{MCE}}^T X^{(j)}(n)\bigr)$ is defined for a certain class $j$, in which $W_{\mathrm{MCE}}^T X^{(j)}$ belongs to this class $j$:

$d_j\bigl(W_{\mathrm{MCE}}^T X^{(j)}(n)\bigr) = -\ln P\bigl(W_{\mathrm{MCE}}^T X^{(j)}(n) \mid \Lambda_j\bigr) + \dfrac{1}{J-1} \sum_{\substack{m=1 \\ m \ne j}}^{J} \ln P\bigl(W_{\mathrm{MCE}}^T X^{(j)}(n) \mid \Lambda_m\bigr)$   (12)

Here $\{\Lambda_m,\ m = 1, 2, \ldots, J\}$ is a Gaussian model set, where $\Lambda_m = N\bigl(W_{\mathrm{MCE}}^T \mu^{(m)},\, W_{\mathrm{MCE}}^T \Sigma^{(m)} W_{\mathrm{MCE}}\bigr)$ denotes the model of class $m$, and $J$ indicates the total number of classes. $\ln P\bigl(W_{\mathrm{MCE}}^T X^{(j)}(n) \mid \Lambda_j\bigr)$ is related to the class-conditioned likelihood, and $\frac{1}{J-1}\sum_{m \ne j} \ln P\bigl(W_{\mathrm{MCE}}^T X^{(j)}(n) \mid \Lambda_m\bigr)$ defines how the class-conditioned likelihoods of the competing models $\Lambda_m$, $m = 1, 2, \ldots, J$, $m \ne j$, are counted in the classification error function. This classification error function $d_j$ is smoothed by applying a sigmoid function

$l(d) = \dfrac{1}{1 + \exp(-d)}$   (13)

Accordingly,

$l\bigl(d_j(W_{\mathrm{MCE}}^T X^{(j)}(n))\bigr) = \dfrac{1}{1 + \exp\bigl(-d_j(W_{\mathrm{MCE}}^T X^{(j)}(n))\bigr)}$   (14)

Then, a total loss function $R_{\mathrm{MCE}}$ is defined as

$R_{\mathrm{MCE}} = \sum_{j=1}^{J} \sum_{n=1}^{N_j} l\!\left(-\ln P\bigl(W_{\mathrm{MCE}}^T X^{(j)}(n) \mid \Lambda_j\bigr) + \dfrac{1}{J-1} \sum_{\substack{m=1 \\ m \ne j}}^{J} \ln P\bigl(W_{\mathrm{MCE}}^T X^{(j)}(n) \mid \Lambda_m\bigr)\right)$   (15)

To minimize this loss function $R_{\mathrm{MCE}}$ (Fig. 1), $W_{\mathrm{MCE}} \in \mathbb{R}^{N \times 1}$ is updated as shown in [34].

Fig. 1. Minimizing the classification error.
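The MCE quantities of Eqs. (10) and (12)-(15) can be sketched for the one-dimensional projection $W_{\mathrm{MCE}}^T x$ as follows. This is a simplified illustration with our own function names; the gradient update of [34] is not included.

```python
import numpy as np

def log_likelihood(w, x, mu, var_proj):
    """ln P(w^T x | Lambda_m) for the projected 1-D Gaussian (Eq. (10))."""
    d = w @ x - w @ mu
    return -0.5 * np.log(2 * np.pi * var_proj) - d * d / (2 * var_proj)

def mce_loss(w, X, labels, means, covs):
    """Smoothed total classification-error loss R_MCE (Eqs. (12)-(15))."""
    J = len(means)
    var = [float(w @ c @ w) for c in covs]   # w^T Sigma^(m) w per class
    total = 0.0
    for x, j in zip(X, labels):
        ll = [log_likelihood(w, x, means[m], var[m]) for m in range(J)]
        competing = sum(ll[m] for m in range(J) if m != j) / (J - 1)
        d_j = -ll[j] + competing             # misclassification, Eq. (12)
        total += 1.0 / (1.0 + np.exp(-d_j))  # sigmoid smoothing, Eqs. (13)-(15)
    return total
```

A discriminative projection yields a loss near 0, while a projection that collapses all classes yields 0.5 per sample.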

3. STRUCTURE AND PARAMETERS OF DDA2FNN

This section describes the structure of DDA2FNN, which attempts to find the most discriminative representation of a fuzzy neural network in the classification framework. The proposed DDA2FNN is a five-layered network. For convenience, Fig. 2 illustrates a single output of DDA2FNN.

Fig. 2. Structure of the dual-discriminability-analysis type-2 fuzzy neural network (DDA2FNN).


A. Structure of DDA2FNN

Layer 1: The inputs are crisp values. For input range normalization, each node in this layer scales the input $x_n$, $n = 1, \ldots, N$, to within the range [-1, 1]. There are no weights to be adjusted in this layer.

Layer 2: In the type-2 fuzzy-set part, an interval type-2 membership function (MF) for the $r$th fuzzy set $\tilde{A}_n^r$ in input variable $x_n$ has a fixed standard deviation $\sigma_n^r$ and an uncertain mean $m_n^r \in [m_{n1}^r, m_{n2}^r]$, as follows.

$\tilde{\mu}_n^r(x_n) = \exp\!\left(-\dfrac{1}{2}\left(\dfrac{x_n - m_n^r}{\sigma_n^r}\right)^2\right) \equiv N\bigl(m_n^r, \sigma_n^r;\, x_n\bigr),\quad m_n^r \in [m_{n1}^r, m_{n2}^r]$   (16)

The upper MF $\overline{\mu}_n^r$ and lower MF $\underline{\mu}_n^r$ are

$\overline{\mu}_n^r(x_n) = \begin{cases} N(m_{n1}^r, \sigma_n^r;\, x_n), & x_n < m_{n1}^r \\ 1, & m_{n1}^r \le x_n \le m_{n2}^r \\ N(m_{n2}^r, \sigma_n^r;\, x_n), & x_n > m_{n2}^r \end{cases}$   (17)

$\underline{\mu}_n^r(x_n) = \begin{cases} N(m_{n2}^r, \sigma_n^r;\, x_n), & x_n \le \dfrac{m_{n1}^r + m_{n2}^r}{2} \\ N(m_{n1}^r, \sigma_n^r;\, x_n), & x_n > \dfrac{m_{n1}^r + m_{n2}^r}{2} \end{cases}$   (18)

Hence, the output is the interval $[\underline{\mu}_n^r,\, \overline{\mu}_n^r]$.
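The interval membership of Eqs. (16)-(18) can be sketched directly. This is a minimal illustration (function names are ours); for a scalar input it returns the lower and upper membership grades.

```python
import numpy as np

def gaussian(m, sigma, x):
    """N(m, sigma; x) of Eq. (16), a Gaussian MF with unit peak."""
    return np.exp(-0.5 * ((x - m) / sigma) ** 2)

def interval_membership(x, m1, m2, sigma):
    """Upper/lower membership of an interval type-2 Gaussian MF with
    uncertain mean [m1, m2] and fixed std sigma (Eqs. (17)-(18))."""
    if x < m1:                       # left of the uncertain-mean interval
        upper = gaussian(m1, sigma, x)
    elif x > m2:                     # right of the interval
        upper = gaussian(m2, sigma, x)
    else:                            # inside: the upper MF saturates at 1
        upper = 1.0
    if x <= (m1 + m2) / 2:           # lower MF uses the farther mean
        lower = gaussian(m2, sigma, x)
    else:
        lower = gaussian(m1, sigma, x)
    return lower, upper
```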

In the “DDA” part, $t_n$ ($n = 1, \ldots, p$) are updated by $W_{\mathrm{LDA}} \in \mathbb{R}^{N \times p}$, and $t_{p+1}$ is updated by $W_{\mathrm{MCE}} \in \mathbb{R}^{N \times 1}$.

Layer 3: The rule-node part performs the fuzzy meet operation by using an algebraic product operation. The output of a rule node represents its corresponding firing strength, computed as the interval $[\underline{f}^r,\, \overline{f}^r]$:

$\overline{f}^r = \prod_{n=1}^{N} \overline{\mu}_n^r$   (19)

$\underline{f}^r = \prod_{n=1}^{N} \underline{\mu}_n^r$   (20)

In the consequent-node part, the fuzzy set is represented by $[w_L^r,\, w_R^r]$ as follows.

$[w_L^r,\, w_R^r] = [c_0^r - s_0^r,\, c_0^r + s_0^r] + \sum_{n=1}^{p+1} [c_n^r - s_n^r,\, c_n^r + s_n^r]\, t_n$   (21)

That is,

$w_L^r = \sum_{n=0}^{p+1} c_n^r t_n - \sum_{n=0}^{p+1} s_n^r |t_n|$   (22)

$w_R^r = \sum_{n=0}^{p+1} c_n^r t_n + \sum_{n=0}^{p+1} s_n^r |t_n|$   (23)

where $t_0 = 1$.
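The firing strengths of Eqs. (19)-(20) and the interval consequent weights of Eqs. (22)-(23) combine into a single per-rule computation; a minimal sketch (function name and interface are ours):

```python
import numpy as np

def rule_interval(lowers, uppers, c, s, t):
    """Per-rule outputs of Layers 2-3.

    lowers, uppers : length-N lower/upper memberships of one rule.
    c, s           : consequent centers c_n^r and spreads s_n^r, index 0..p+1.
    t              : consequent terms with t[0] = 1 (Eq. (21)).
    Returns the firing interval (Eqs. (19)-(20)) and [w_L, w_R]
    (Eqs. (22)-(23)).
    """
    f_lower = np.prod(lowers)        # Eq. (20)
    f_upper = np.prod(uppers)        # Eq. (19)
    w_L = c @ t - s @ np.abs(t)      # Eq. (22)
    w_R = c @ t + s @ np.abs(t)      # Eq. (23)
    return (f_lower, f_upper), (w_L, w_R)
```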

Layer 4: This layer implements the type reduction. The type-reduced set is an interval type-1 fuzzy set $[y_L,\, y_R]$. The outputs $y_L$ and $y_R$ can be calculated by using the Karnik-Mendel iterative procedure [35].

Layer 5: Based on the above interval type-1 set $[y_L,\, y_R]$, this layer implements the defuzzification operation. The defuzzified output is calculated as

$y = \dfrac{y_L + y_R}{2}$   (24)
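The type reduction of Layer 4 and the defuzzification of Eq. (24) can be sketched as a minimal Karnik-Mendel iteration (our own implementation sketch; the paper only cites the procedure in [35]):

```python
import numpy as np

def km_type_reduce(f_lower, f_upper, w_L, w_R):
    """Karnik-Mendel iterative procedure: computes the type-reduced interval
    [y_L, y_R] and the defuzzified output of Eq. (24).
    All four arguments are length-M arrays (one entry per rule)."""
    f_lower = np.asarray(f_lower, float)
    f_upper = np.asarray(f_upper, float)

    def endpoint(w, left):
        order = np.argsort(w)
        w, fl, fu = w[order], f_lower[order], f_upper[order]
        y = (w * fl).sum() / fl.sum()          # initial guess
        while True:
            k = np.searchsorted(w, y)          # switch point
            # y_L uses upper strengths below the switch point, y_R above it
            f = np.concatenate([fu[:k], fl[k:]]) if left \
                else np.concatenate([fl[:k], fu[k:]])
            y_new = (w * f).sum() / f.sum()
            if np.isclose(y_new, y):
                return y_new
            y = y_new

    y_L = endpoint(np.asarray(w_L, float), left=True)
    y_R = endpoint(np.asarray(w_R, float), left=False)
    return y_L, y_R, 0.5 * (y_L + y_R)         # Eq. (24)
```

When every rule's lower and upper firing strengths coincide, the interval collapses and the output reduces to an ordinary weighted average.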

B. Structure Learning

A new rule is generated according to clustering on the input variables; a cluster in the input space thus corresponds to a rule. The rule firing-strength center is calculated as follows.

$\hat{f}^r = \dfrac{1}{2}\bigl(\overline{f}^r + \underline{f}^r\bigr)$   (25)

This firing-strength center acts as a rule generation criterion. For each incoming datum $X = (x_1, \ldots, x_N)$, find

$I = \arg\max_{1 \le r \le M(t)} \hat{f}^r(X)$   (26)

where $M(t)$ is the number of rules at time $t$. If $\hat{f}^I(X) \le \phi_{th}$, then a new rule is generated, i.e., $M(t+1) = M(t) + 1$. Here $\phi_{th} \in (0, 1)$ is a pre-specified threshold that decides the number of input clusters. Once a new cluster is generated, its initial uncertain mean in input variable $x_n$ is

$[m_{n1}^{M(t)+1},\, m_{n2}^{M(t)+1}] = [x_n - 0.1,\, x_n + 0.1],\quad n = 1, \ldots, N$   (27)

A fuzzy set generation criterion is computed as

$\hat{\mu}_n^r = \dfrac{1}{2}\bigl(\overline{\mu}_n^r + \underline{\mu}_n^r\bigr)$   (28)

The nearest fuzzy set is defined as the one that possesses the maximum degree:

$I_n = \arg\max_{1 \le r \le M(t)} \hat{\mu}_n^r,\quad n = 1, \ldots, N$   (29)

The initial width of the first cluster is $\sigma_n^1 = \sigma_{init}$, which is set to a small value. The other initial widths are assigned as

$\sigma_n^{M(t)+1} = \beta\, \Bigl|x_n - \dfrac{1}{2}\bigl(m_{n1}^{I_n} + m_{n2}^{I_n}\bigr)\Bigr|$   (30)

where $\beta$ determines the overlap degree between two fuzzy sets. Once a new rule is generated, generation of the corresponding consequent node follows. The initial consequent parameters are set to

$[c_0^r - s_0^r,\, c_0^r + s_0^r] = [y^d - 0.1,\, y^d + 0.1],\quad r = 1, \ldots, M$   (31)

where $c_0^r = y^d$ is the desired output for input $X$, and the initial parameter $s_0^r = 0.1$ determines the initial output interval range. The other initial consequent parameters are

$c_n^r = 0.01,\quad s_n^r = s_0^r = 0.1,\quad n = 1, \ldots, p+1,\quad r = 1, \ldots, M$   (32)
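The rule-generation criterion of Eqs. (25)-(27) and (31) can be sketched as follows; the function name and rule representation are illustrative, and the firing-strength centers $\hat{f}^r$ are assumed to be computed elsewhere.

```python
import numpy as np

def maybe_add_rule(x, rules, firing_centers, phi_th, y_d):
    """Structure-learning step: if the largest firing-strength center
    f-hat (Eqs. (25)-(26)) among existing rules does not exceed the
    threshold phi_th, seed a new rule (input cluster) at the datum x.
    The 0.1 spreads follow Eqs. (27) and (31)."""
    x = np.asarray(x, float)
    if not rules or max(firing_centers) <= phi_th:
        rules.append({
            'm1': x - 0.1, 'm2': x + 0.1,   # uncertain means, Eq. (27)
            'c0': y_d, 's0': 0.1,           # consequent seeds, Eq. (31)
        })
    return rules
```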


C. Parameter Learning

The goal of this step is to minimize $E = \frac{1}{2}\bigl(y - y^d\bigr)^2$, in which $y$ and $y^d$ denote the real and desired outputs, respectively. $m_{n1}^r$, $m_{n2}^r$, and $\sigma_n^r$ are updated as in [36]:

$m_{n1}^r(t+1) = m_{n1}^r(t) - \eta\, \dfrac{\partial E}{\partial m_{n1}^r}$   (33)

$m_{n2}^r(t+1) = m_{n2}^r(t) - \eta\, \dfrac{\partial E}{\partial m_{n2}^r}$   (34)

$\sigma_n^r(t+1) = \sigma_n^r(t) - \eta\, \dfrac{\partial E}{\partial \sigma_n^r}$   (35)

In addition, the update rules for $c_n^r$ and $s_n^r$ are

$c_n^r(t+1) = c_n^r(t) - \eta\, \dfrac{\partial E}{\partial c_n^r}$   (36)

$s_n^r(t+1) = s_n^r(t) - \eta\, \dfrac{\partial E}{\partial s_n^r}$   (37)

To clarify the underlying reason for these optimization equations, Fig. 3 shows a global flowchart of the optimization. The last consequent term is updated by $W_{\mathrm{MCE}} \in \mathbb{R}^{N \times 1}$, which makes convergence slow; updating the first $p$ terms by $W_{\mathrm{LDA}} \in \mathbb{R}^{N \times p}$ improves the speed of convergence.

Fig. 3. The learning of parameters.


4. EXPERIMENTS

The effectiveness of the proposed DDA2FNN was evaluated by two noisy speech classification problems. The first is speech detection in noisy environments, and the second is speech recognition in noisy environments. This section presents a detailed comparative performance analysis for the fuzzy neural network with PCA [10], fuzzy neural network with LDA [12], and type-2 fuzzy neural network [36].

A. Speech Detection in Noisy Environments

Fig. 4 shows the flowchart of the fuzzy neural network classifiers for speech detection. This experiment uses frame-based detection with 120-point frames (15 ms at an 8 kHz sampling rate). In [11] and [37], the input feature vector of the classifier consists of the average of the logarithmic root-mean-square (rms) energy over the first five frames of the recording interval (Noise_time), the refined time-frequency (RTF) parameter, and the zero-crossing rate (ZCR). These three input parameters are normalized within [0, 1]. The output of a classifier indicates whether the corresponding frame is a speech signal or noise; the output vector (1, 0) stands for speech, and (0, 1) stands for noise. There were 60 training patterns (16-bit waveforms) selected over five SNR conditions (SNR = 0 dB, 5 dB, 10 dB, 15 dB, 20 dB). They were classified as speech or noise based on waveform and spectrum displays and audio output. Among these 60 training patterns, 30 were "speech" with a desired output vector of (1, 0), and the other 30 were "noise" with a desired output vector of (0, 1). After training, the classifier is ready for speech boundary detection. In Fig. 4, the decoder decodes the output vector (1, 0) as a speech signal and (0, 1) as noise. The output waveform of the decoder then passes through a median filter to eliminate impulse noise. Finally, a speech waveform with sufficient magnitude and duration is defined as a speech-signal island.
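The decoder and median-filter stage of this pipeline can be sketched as follows. The threshold and kernel size are illustrative choices of ours, not values from the paper; the input is the classifier's per-frame score for the "speech" class.

```python
import numpy as np

def detect_speech(frame_scores, threshold=0.5, kernel=5):
    """Frame-wise speech/noise decision followed by median filtering,
    mirroring the decoder + median-filter stage of the detection flowchart.
    Returns 1 for speech frames and 0 for noise frames."""
    decisions = (np.asarray(frame_scores) > threshold).astype(int)
    pad = kernel // 2
    padded = np.pad(decisions, pad, mode='edge')   # extend the endpoints
    # sliding median removes isolated impulse decisions
    filtered = np.array([np.median(padded[i:i + kernel])
                         for i in range(len(decisions))]).astype(int)
    return filtered
```

A single spurious "speech" frame surrounded by noise is suppressed, while a sustained run of speech frames passes through unchanged.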


Fig. 4. Fuzzy neural network classifier for speech detection in noisy environments.

Fig. 5. (a) Original speech waveform. (b) Original speech endpoint locations.

Fig. 6. (a) Speech recorded in additive white noise (SNR = 0 dB). (b) Speech detected by SONFIN-PCA. (c) Speech detected by MDSOFN-LDA. (d) Speech detected by SEIT2FNN.

Fig. 7. Speech detected by DDA2FNN.

Figs. 5-7 illustrate the speech detection performance in noisy conditions under various detection methods. Fig. 5 shows the original speech waveform and its corresponding speech endpoint locations, in which sample index 15000 belongs to the speech signal. Fig. 6(a) shows the speech waveform recorded in additive white noise (SNR = 0 dB). Figs. 6(b), (c), and (d) show the speech endpoint locations detected by SONFIN-PCA, MDSOFN-LDA, and SEIT2FNN, respectively. These detection methods did not correctly classify sample index 15000 in Fig. 6 as speech, unlike the original speech waveform in Fig. 5. To enhance the "discriminability", the proposed DDA considers both linear discriminant analysis (LDA) and minimum classification error (MCE). The endpoint detection of DDA2FNN in Fig. 7 is more precise than those in Fig. 6 because sample index 15000 is correctly classified as speech.

Table I summarizes the detection accuracy of the different fuzzy classifiers averaged over five SNR conditions (SNR = 0 dB, 5 dB, 10 dB, 15 dB, 20 dB) and five noise types (multi-talker babble noise, cockpit noise, car-factory floor noise, vehicle noise, and white noise). The noise samples were taken from NOISEX-92 [38]. The detection accuracy was measured sample-wise. According to the experimental results, the proposed DDA2FNN performed better than the other methods.

Table I. Speech detection accuracy.

FNN Classifier        SONFIN-PCA [10]  MDSOFN-LDA [12]  SEIT2FNN [36]  DDA2FNN
Structure analysis    19 rules         17 rules         10 rules       7 rules
Number of parameters  275              173              250            153
Average correct rate  86.2%            88.6%            86.0%          89.6%

For comparison, we also studied the performance of other speech detection methods using the same training and test speech sequences. First, a novel adaptive long-term sub-band entropy method for voice activity detection [39] was tested; its thresholds were determined by trial and error. Without the power of machine learning, its performance is lower than that of our proposed method. The second and third methods adopt Haar wavelet energy and entropy features for speech detection [40]; they are shown as HWEE-TRFN (type-1 recurrent fuzzy neural network) and HWEE-RSEIT2FNN (type-2 recurrent fuzzy neural network), respectively. However, both lack an analysis of the statistics among different classes, which explains why their performance is lower than that of our method.


Table II. Speech detection accuracy.

Speech detection      ALT-SubEnpy [39]  HWEE-TRFN [40]  HWEE-RSEIT2FNN [40]  DDA2FNN
Average correct rate  85.8%             87.4%           88.2%                89.6%

B. Speech Recognition in Noisy Environments

The second experiment adopted the modified two-dimensional cepstrum (MTDC) [41] in Fig. 8 as the robust speech feature. The speech signal was first passed through a first-order pre-emphasis filter with a pre-emphasis coefficient of 0.97 and then divided into frames by multiplying with a Hamming window. The energy of each frequency band was computed using the discrete Fourier transform (DFT) and a mel-scale filter bank. A 4th-order infinite-impulse-response (IIR) temporal filter and half-wave rectification were used to remove the additive noise component along the frame axis. The MFC coefficients were then obtained by taking the logarithm and the cosine transform, and the MFC coefficients of each frame along the time axis were collected to form the MFC-time matrix. Finally, the MTDC matrix was generated from the real-part coefficients of the inverse discrete Fourier transform (IDFT) along the time axis. To represent an utterance, only 30 MTDC coefficients were selected to form a feature vector ($x_i$, $i = 1, 2, \ldots, 30$); these 30 positions were randomly selected instead of using a genetic algorithm (GA). The 16-bit speech data were a set of isolated Mandarin digits (0-9) spoken by 10 speakers. The sampling rate was 8 kHz, and the frame size was 240 samples with 50% overlap. Training was performed with 2000 utterances; the other 2000 utterances were used for testing.
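The front end of this feature extraction (pre-emphasis and Hamming-windowed framing) can be sketched as follows, using the settings stated above (coefficient 0.97, 240-sample frames, 50% overlap). The function name is ours, and the later DFT/filter-bank/IDFT stages are omitted.

```python
import numpy as np

def preemphasize_and_frame(signal, coeff=0.97, frame_len=240, overlap=0.5):
    """First-order pre-emphasis y[n] = x[n] - coeff * x[n-1], followed by
    Hamming-windowed framing with the given frame length and overlap.
    Returns an array of shape (num_frames, frame_len)."""
    signal = np.asarray(signal, float)
    emphasized = np.append(signal[0], signal[1:] - coeff * signal[:-1])
    step = int(frame_len * (1 - overlap))        # 120 samples at 50% overlap
    n_frames = 1 + (len(emphasized) - frame_len) // step
    window = np.hamming(frame_len)
    return np.stack([emphasized[i * step:i * step + frame_len] * window
                     for i in range(n_frames)])
```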

To verify the performance, Table III summarizes the recognition accuracy of the different fuzzy classifiers averaged over five SNR conditions (SNR = 0 dB, 5 dB, 10 dB, 15 dB, 20 dB) and five noise types (multi-talker babble noise, cockpit noise, car-factory floor noise, vehicle noise, and white noise). The noise was again taken from NOISEX-92. Attenuation was applied to ensure that adding noise did not overflow the 16-bit integer range. The recognition accuracy was measured utterance-wise. Experimental results indicate that the proposed DDA2FNN performed better than the others.

Fig. 8. Flowchart of the modified two-dimensional cepstrum (MTDC).

Table III. Speech recognition accuracy.

FNN Classifier        SONFIN-PCA [10]  MDSOFN-LDA [12]  SEIT2FNN [36]  DDA2FNN
Structure analysis    24 rules         20 rules         12 rules       10 rules
Number of parameters  2784             2550             8520           2250
Average correct rate  77.4%            80.2%            78.6%          81.4%

For comparison, we also studied the performance of other speech recognition methods using the same training and test speech sequences. The hidden Markov model (HMM) is usually applied to speech recognition [18]; in an HMM, maximum likelihood (ML) estimation is used to determine the parameters. With the robust MFCC speech feature, the average recognition rate of the HMM is about 76.6%. Furthermore, a Gaussian mixture model (GMM) [41] was applied for comparison; in the training phase, each model (Mandarin digit) was trained as a mixture of four Gaussian density functions. According to Table IV, DDA2FNN is better than the HMM and GMM because of its ability to discriminate highly confusable patterns.

Table IV. Speech recognition accuracy.

Speech recognition    MFCC+HMM [18]  MTDC+GMM [41]  DDA2FNN
Average correct rate  76.6%          77.8%          81.4%


C. Theoretical Analysis

Table V summarizes the analysis of the different classifiers. All classifiers share the cost function $E = \frac{1}{2}(y - y^d)^2$ but differ in their additional cost functions, so the effect of each additional cost function can be observed. In SONFIN-PCA, the cost function $e^{*} = \arg\max_e e^T \Sigma_X e$ maximizes the variance. In MDSOFN-LDA, the cost function $e^{*} = \arg\max_e \frac{e^T S_B e}{e^T S_W e}$ minimizes the within-class variance and maximizes the between-class variance. In contrast to SONFIN-PCA, MDSOFN-LDA seeks directions that maximize the discriminability instead of the variance, explaining why MDSOFN-LDA performed better than SONFIN-PCA. In the cost function $R_{\mathrm{MCE}}$ of DDA2FNN, $\ln P\bigl(W^T X^{(j)}(n) \mid \Lambda_j\bigr)$ is related to the class-conditioned likelihood, and $\frac{1}{J-1}\sum_{m \ne j} \ln P\bigl(W^T X^{(j)}(n) \mid \Lambda_m\bigr)$ defines how the class-conditioned likelihoods of the competing models $\Lambda_m$, $m \ne j$, are counted in the classification error function. For this reason, the DDA-based type-2 fuzzy rules yielded the most discriminative representation in the above experiments.


Table V. Analysis of different fuzzy neural networks.

Fuzzy type:
- SONFIN-PCA: type-1 fuzzy sets
- MDSOFN-LDA: type-1 fuzzy sets
- SEIT2FNN: type-2 fuzzy sets
- DDA2FNN: type-2 fuzzy sets

Consequent parts: TS-type for all four networks.

Cost functions:
- SONFIN-PCA: $E = \frac{1}{2}(y - y^d)^2$ and $e^{*} = \arg\max_e e^T \Sigma_X e$.
- MDSOFN-LDA: $E = \frac{1}{2}(y - y^d)^2$ and $e^{*} = \arg\max_e \frac{e^T S_B e}{e^T S_W e}$.
- SEIT2FNN: $E = \frac{1}{2}(y - y^d)^2$ only.
- DDA2FNN: $E = \frac{1}{2}(y - y^d)^2$, $e^{*} = \arg\max_e \frac{e^T S_B e}{e^T S_W e}$, and $R_{\mathrm{MCE}}$ of Eq. (15).

Characteristics:
- SONFIN-PCA: PCA maximizes the covariance; the gradient descent method adjusts the Gaussian functions.
- MDSOFN-LDA: LDA maximizes the between-class variance and minimizes the within-class variance; the gradient descent method adjusts the Gaussian functions.
- SEIT2FNN: in handling problems with uncertainties such as noisy data, type-2 fuzzy systems usually outperform their type-1 counterparts.
- DDA2FNN: the antecedent parts adopt interval type-2 fuzzy sets to model the uncertainty, and the consequent parts adopt the proposed DDA to enhance the discriminability.

Drawbacks:
- SONFIN-PCA: if LDA and PCA have the same between-class scatter, PCA increases the variance but decreases the discriminative capability.
- MDSOFN-LDA: needs the between-class and within-class information.
- SEIT2FNN: the cost function does not consider the discriminability.
- DDA2FNN: DDA needs extra memory to process the LDA-matrix and MCE-matrix.


5. CONCLUSIONS

Because speech classification is crucial to human-machine interaction, this study proposes a dual-discriminability-analysis type-2 fuzzy neural network (DDA2FNN). To solve the problem of uncertainty in noisy speech classification, interval type-2 fuzzy sets are adopted in the antecedent parts to model and minimize the effect of uncertainty. To increase the discriminability, a dual-discriminability-analysis (DDA) method is proposed in the consequent parts. The novelty of DDA is its consideration of both linear discriminant analysis (LDA) and minimum classification error (MCE); the proposed DDA type-2 fuzzy rule includes an LDA-matrix and an MCE-matrix. Unlike other existing fuzzy neural networks, the proposed DDA2FNN considers both uncertainty and discriminability. Additionally, DDA2FNN can classify noisy data while preserving a small network size. The effectiveness of DDA2FNN is demonstrated on two speech classification problems; experimental results and theoretical analysis show that it performs better than the other fuzzy neural networks.

REFERENCES

1. C. F. Juang, C. N. Cheng, and T. M. Chen, “Speech detection in noisy environments by wavelet energy-based recurrent neural fuzzy network,” Expert Systems with Applications, vol. 36, no. 1, 2009, pp. 321-332.

2. C. C. Tu, and C. F. Juang, “Recurrent type-2 fuzzy neural network using Haar wavelet energy and entropy features for speech detection in noisy environments,” Expert Systems with Applications, vol. 39, no. 3, 2012, pp. 2479-2488.

3. J. Alcala-Fdez, R. Alcala, and F. Herrera, “A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning,” IEEE Trans. Fuzzy Syst., vol. 19, no. 5, 2011, pp. 857-872.

4. X. Yang, G. Zhang, J. Lu, and J. Ma, “A kernel fuzzy c-means clustering-based fuzzy support vector machine algorithm for classification problems with outliers or noises,” IEEE Trans. Fuzzy Syst., vol. 19, no. 1, 2011, pp. 105-115.

5. H. J. Song, C. Y. Miao, R. Z. Wuyts, Q. Shen, M. D'Hondt, and F. Catthoor, “An extension to fuzzy cognitive maps for classification and prediction,” IEEE Trans. Fuzzy Syst., vol. 19, no. 1, 2011, pp. 116-135.

6. G. G. Pirlo, and D. D. Impedovo, “Fuzzy-zoning-based classification for handwritten characters,” IEEE Trans. Fuzzy Syst., vol. 19, no. 4, 2011, pp. 780-785.

7. J. W. Hung, H. M. Wang, and L. S. Lee, “Comparative analysis for data-driven temporal filters obtained via principal component analysis (PCA) and linear discriminant analysis (LDA) in speech recognition,” in Proc. Eurospeech, 2001.

8. J. C. Lv, K. K. Tan, Z. Yi, and S. Huang, “A family of fuzzy learning algorithms for robust principal component analysis neural networks,” IEEE Trans. Fuzzy Syst., vol. 18, no. 1, 2010, pp. 217-226.

9. K. Honda, A. Notsu, and H. Ichihashi, “Fuzzy PCA-guided robust k-means clustering,” IEEE Trans. Fuzzy Syst., vol. 18, no. 1, 2010, pp. 67-79.

10. C. F. Juang, and C. T. Lin, “An on-line self-constructing neural fuzzy inference network and its applications,” IEEE Trans. Fuzzy Syst., vol. 16, no. 1, 1998, pp. 12-32.

11. G. D. Wu, and C. T. Lin, “Word boundary detection with mel-scale frequency bank in noisy environment,” IEEE Trans. Speech Audio Process., vol. 8, no. 5, 2000, pp. 541-554.

12. G. D. Wu, and P. H. Huang, “A maximizing-discriminability-based self-organizing fuzzy network for classification problems,” IEEE Trans. Fuzzy Syst., vol. 18, no. 2, 2010, pp. 362-373.

13. J. H. Chiang, and P. Y. Hao, “Support vector learning mechanism for fuzzy rule-based modeling: A new approach,” IEEE Trans. Fuzzy Syst., vol. 12, no. 1, 2004, pp. 1-12.

14. Y. Chen, and J. Z. Wang, “Support vector learning for fuzzy rule-based classification systems,” IEEE Trans. Fuzzy Syst., vol. 11, no. 6, 2003, pp. 716-728.

15. C. T. Lin, C. M. Yeh, S. F. Liang, J. F. Chung, and N. Kumar, “Support-vector-based fuzzy neural network for pattern classification,” IEEE Trans. Fuzzy Syst., vol. 14, no. 1, 2006, pp. 31-41.


16. C. F. Juang, S. H. Chiu, and S. W. Chang, “A self-organizing TS-type fuzzy network with support vector learning and its application to classification problems”, IEEE Trans. Fuzzy Syst., vol. 15, no. 5, 2007, pp. 997-1008.

17. A. Biem, “Minimum classification error training for online handwriting recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 7, 2006, pp. 1041-1051.

18. J. W. Hung, and L. S. Lee, “Optimization of temporal filters for constructing robust features in speech recognition,” IEEE Trans. Audio, Speech, and Lang. Process., vol. 14, no. 3, 2006, pp. 808-832.

19. M. A. Shoorehdeli, M. Teshnehlab, and A. K. Sedigh, “Training ANFIS as an identifier with intelligent hybrid stable learning algorithm based on particle swarm optimization and extended Kalman filter,” Fuzzy Sets and Systems, vol. 160, no. 7, 2009, pp. 922-948.

20. R. K. Brouwer and A. Groenwold, “Modified fuzzy c-means for ordinal valued attributes with particle swarm for optimization,” Fuzzy Sets and Systems, vol. 161, no. 13, 2010, pp. 1774-1789.

21. S. K. Oh, W. D. Kim, W. Pedrycz, and B. J. Park, “Polynomial-based radial basis function neural networks (P-RBF NNs) realized with the aid of particle swarm optimization,” Fuzzy Sets and Systems, vol. 163, no. 1, 2011, pp. 54-77.

22. R. Jensen and Q. Shen, “Fuzzy-rough data reduction with ant colony optimization,” Fuzzy Sets and Systems, vol. 149, no. 1, 2005, pp. 5-20.

23. C. F. Juang and C. Lo, “Zero-order TSK-type fuzzy system learning using a two-phase swarm intelligence algorithm,” Fuzzy Sets and Systems, vol. 159, no. 21, 2008, pp. 2910-2926.

24. C. F. Juang, R. B. Huang., and Y. Y. Lin, “A recurrent self-evolving interval type-2 fuzzy neural network for dynamic system processing,” IEEE Trans. Fuzzy Syst., vol. 17, no. 5, 2009, pp.1092-1105.

25. F. J. Lin, and P. H. Chou, “Adaptive control of two-axis motion control system using interval type-2 fuzzy neural network,” IEEE Trans. Ind. Electron., vol. 56, no. 1, 2009, pp. 178-193.

26. C. F. Juang, and C. H. Hsu, “Reinforcement interval type-2 fuzzy controller design by online rule generation and Q-value-aided ant colony optimization,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 39, no. 6, 2009, pp. 1528-1542.

27. C. F. Juang, R. B. Huang, and W. Y. Cheng, “An interval type-2 fuzzy-neural network with support-vector regression for noisy regression problems,” IEEE Trans. Fuzzy Syst., vol. 18, no. 4, 2010, pp. 686-699.

28. X. Du, and H. Ying, “Derivation and analysis of the analytical structures of the interval type-2 fuzzy-PI and PD controllers,” IEEE Trans. Fuzzy Syst., vol. 18, no. 4, 2010, pp. 802-814.

29. M. Biglarbegian, W. W. Melek, and J. M. Mendel, “On the stability of interval type-2 TSK fuzzy logic control systems,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 40, no. 3, 2010, pp. 798-818.

30. J. Markoff, “Scientists see promise in deep-learning programs,” New York Times, Nov. 24, 2012.

31. J. Huang and B. Kingsbury, “Audio-visual deep learning for noise robust speech recognition,” in Proc. ICASSP, 2013.

32. A. Mohamed, G. Dahl, and G. Hinton, “Acoustic modeling using deep belief networks,” IEEE Trans. Audio, Speech, and Lang. Process., vol. 20, no. 1, 2012, pp. 14-22.

33. N. Morgan, “Deep and wide: Multiple layers in automatic speech recognition,” IEEE Trans. Audio, Speech, and Lang. Process., vol. 20, no. 1, 2012, pp. 7-13.

34. G. D. Wu, Z. W. Zhu, and P. H. Huang, “A TS-type maximizing-discriminability-based recurrent fuzzy network for classification problems,” IEEE Trans. Fuzzy Syst., vol. 19, no. 2, 2011, pp. 339-352.

35. J. M. Mendel, “Computing derivatives in interval type-2 fuzzy logic systems,” IEEE Trans. Fuzzy Syst., vol. 12, no. 1, 2004, pp. 84-98.

36. C. F. Juang, and Y. W. Tsao, “A self-evolving interval type-2 fuzzy neural network with online structure and parameter learning,” IEEE Trans. Fuzzy Syst., vol. 16, no. 6, 2008, pp. 1411-1424.

37. G. D. Wu, and C. T. Lin, “A recurrent neural fuzzy network for word boundary detection in variable noise-level environments,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 1, 2001, pp. 84-97.

38. A. Varga, and H. J. M. Steeneken, “Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems,” Speech Communication, vol. 12, no. 3, 1993, pp. 247-251.


39. K. C. Wang, “A novel approach based on adaptive long-term sub-band entropy and multi-thresholding scheme for detecting speech signal,” IEICE Trans. on Information and Systems, vol. E95-D, no. 11, 2012, pp. 2732-2736.

40. C. C. Tu and C. F. Juang, “Recurrent type-2 fuzzy neural network using Haar wavelet energy and entropy features for speech detection in noisy environments,” Expert Systems with Applications, vol. 39, no. 3, 2012, pp. 2479-2488.

41. C. T. Lin, H. W. Nein, and J. Y. Hwu, “GA-based noisy speech recognition using two-dimensional cepstrum,” IEEE Trans. Speech and Audio Process., vol. 8, no. 6, 2000, pp. 664-675.