Speaker Identification by Combining MFCC and Phase Information Longbiao Wang (Nagaoka University of Technology, Japan) Seiichi Nakagawa (Toyohashi University of Technology, Japan)


Speaker Identification by Combining MFCC and Phase Information

Longbiao Wang (Nagaoka University of Technology, Japan)

Seiichi Nakagawa (Toyohashi University of Technology, Japan)


Background

The importance of phase in human speech recognition has been reported.

In conventional speaker recognition methods based on mel-frequency cepstral coefficients (MFCCs), phase information has hitherto been ignored.


Purpose and method

We aim to use the phase information for speaker recognition.

We propose a phase information extraction method that normalizes the variation in phase caused by the clipping position of the input speech, and we combine the phase information with MFCCs.


Investigating the effect of phase


Conventional MFCCs, which capture vocal tract information, cannot distinguish speaker characteristics that arise from the vocal source.

The phase is greatly influenced by vocal source characteristics.

We generated speech waves with different vocal sources and pitches and a fixed vocal tract shape corresponding to the vowel /a/.


Phase information extraction

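The short-term spectrum S(ω, t) for the t-th frame of a signal is obtained by the DFT of an input speech signal sequence:

$$S(\omega, t) = X(\omega, t) + jY(\omega, t) = \sqrt{X^2(\omega, t) + Y^2(\omega, t)} \, e^{j\theta(\omega, t)}$$

• For conventional MFCCs, the power spectrum X²(ω, t) + Y²(ω, t) is used, but the phase information θ(ω, t) is ignored. In this paper, the phase θ(ω, t) is also extracted as one of the feature parameters for speaker recognition.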
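As a minimal sketch of the decomposition described above, the following Python computes the power X² + Y² and the wrapped phase θ for a single frame. The naive `dft` helper, the 64-sample frame, and the toy cosine are illustrative assumptions, not the authors' front end.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n)
    ]


def power_and_phase(frame):
    """Return per-bin power X^2 + Y^2 and wrapped phase theta = atan2(Y, X)."""
    spectrum = dft(frame)
    power = [s.real ** 2 + s.imag ** 2 for s in spectrum]
    phase = [cmath.phase(s) for s in spectrum]  # wrapped to [-pi, pi]
    return power, phase


# Toy frame (assumption): a cosine centred on bin 4 with initial phase 0.5 rad.
N = 64
frame = [math.cos(2 * math.pi * 4 * i / N + 0.5) for i in range(N)]
power, phase = power_and_phase(frame)
```

MFCC extraction would keep only `power`; the phase features discussed in these slides start from `phase`.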


Problem of unnormalized phase


Example of the effect of clipping position on phase for Japanese vowel /a/

However, the phase θ(ω, t) changes depending on the clipping position of the input speech, even at the same frequency ω. As a result, the unnormalized wrapped phases of two windows clipped at different positions become quite different.
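The effect can be reproduced on a toy signal. In this sketch, the same stationary cosine is clipped at two positions five samples apart, and the phase at the same DFT bin differs by the amount expected from the time shift. The helper `bin_phase`, the 64-sample window, and the 5-sample offset are assumptions chosen for illustration.

```python
import cmath
import math


def bin_phase(signal, start, n, k):
    """Wrapped phase of DFT bin k for the n-sample window starting at `start`."""
    s = sum(signal[start + i] * cmath.exp(-2j * math.pi * k * i / n)
            for i in range(n))
    return cmath.phase(s)


N, K, OFFSET = 64, 4, 5
signal = [math.cos(2 * math.pi * K * i / N) for i in range(3 * N)]

phase_a = bin_phase(signal, 0, N, K)       # window clipped at sample 0
phase_b = bin_phase(signal, OFFSET, N, K)  # same signal, clipped 5 samples later
expected_shift = 2 * math.pi * K * OFFSET / N
```

The waveform inside both windows is the same periodic signal, yet the raw phase moves by `expected_shift`, which is why unnormalized phases are hard to compare across frames.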


Phase normalization (1/2)

To overcome this problem, the phase at a certain basis radian frequency $\omega_b$ is converted to a constant in every frame, and the phase at every other frequency is estimated relative to it. In the experiments discussed in this paper, the basis radian frequency $\omega_b$ is set to 2π × 1000 rad/s (that is, a frequency of 1000 Hz).

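For example, setting the phase at the basis radian frequency $\omega_b$ to π/4, we have the normalized spectrum

$$\tilde{S}(\omega_b, t) = |S(\omega_b, t)| \, e^{j\theta(\omega_b, t)} \times e^{j\left(\frac{\pi}{4} - \theta(\omega_b, t)\right)}$$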


Phase normalization (2/2)

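The difference between the normalized wrapped phase and the unnormalized wrapped phase at the basis radian frequency $\omega_b$ is

$$\frac{\pi}{4} - \theta(\omega_b, t)$$

At any other frequency ω = 2πf, the difference becomes

$$\frac{\omega}{\omega_b}\left(\frac{\pi}{4} - \theta(\omega_b, t)\right)$$

Thus, the spectrum at frequency ω becomes

$$\tilde{S}(\omega, t) = |S(\omega, t)| \, e^{j\theta(\omega, t)} \times e^{j\frac{\omega}{\omega_b}\left(\frac{\pi}{4} - \theta(\omega_b, t)\right)}$$

and the phase information is normalized as

$$\tilde{\theta}(\omega, t) = \theta(\omega, t) + \frac{\omega}{\omega_b}\left(\frac{\pi}{4} - \theta(\omega_b, t)\right)$$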
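The scaling by the frequency ratio follows from the fact that a time shift changes the phase at each frequency in proportion to that frequency. The sketch below applies the normalization to a toy two-harmonic signal clipped at two positions. It assumes the basis frequency sits on DFT bin 4 (so the frequency ratio equals the bin ratio k / 4) rather than at 1000 Hz, and the names `wrapped_phases` and `normalize_phases` are hypothetical.

```python
import cmath
import math


def wrapped_phases(signal, start, n, bins):
    """Wrapped DFT phase at each requested bin for the window at `start`."""
    out = {}
    for k in bins:
        s = sum(signal[start + i] * cmath.exp(-2j * math.pi * k * i / n)
                for i in range(n))
        out[k] = cmath.phase(s)
    return out


def normalize_phases(phases, basis_bin, target=math.pi / 4):
    """Shift each phase by (k / basis_bin) * (target - phase at basis), then rewrap."""
    shift = target - phases[basis_bin]
    normalized = {}
    for k, theta in phases.items():
        value = theta + (k / basis_bin) * shift
        normalized[k] = math.atan2(math.sin(value), math.cos(value))
    return normalized


N, BASIS = 64, 4
# Toy signal (assumption): harmonics on bins 4 and 8 with fixed initial phases.
signal = [math.cos(2 * math.pi * 4 * i / N + 0.3)
          + 0.5 * math.cos(2 * math.pi * 8 * i / N + 1.1)
          for i in range(3 * N)]

norm_a = normalize_phases(wrapped_phases(signal, 0, N, [4, 8]), BASIS)
norm_b = normalize_phases(wrapped_phases(signal, 5, N, [4, 8]), BASIS)
```

Both windows give π/4 at the basis bin and the same value at bin 8, regardless of where the window was clipped. The toy uses an integer frequency ratio; with wrapped phases and non-integer ratios, the 2π ambiguity of the basis phase would need extra care that this sketch does not address.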


Comparison of unnormalized phase and normalized phase

Example of the effect of clipping position on phase for Japanese vowel /a/

After normalizing the wrapped phase, the phase values become very similar.


From phase θ to phase {cosθ, sinθ}

There is a problem with this method when comparing two phase values. For example, for the two values π − θ₁ and −π + θ₁, the difference is 2π − 2θ₁. If θ₁ ≈ 0, then the difference is approximately 2π, despite the two phases being very similar to one another. Therefore, for this research, we changed the phase into coordinates on a unit circle, that is,

$$\theta \rightarrow \{\cos\theta, \sin\theta\}$$
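A small numeric check illustrates why the unit-circle coordinates help. The value θ₁ = 0.01 and the helper name `phase_to_unit_circle` are assumptions chosen for this sketch.

```python
import math


def phase_to_unit_circle(theta):
    """Map a phase angle to its (cos, sin) coordinates on the unit circle."""
    return (math.cos(theta), math.sin(theta))


theta_1 = 0.01
a = math.pi - theta_1   # just below +pi
b = -math.pi + theta_1  # just above -pi

# Difference of the raw wrapped values: close to 2*pi.
raw_gap = abs(a - b)

# Euclidean distance between the same two phases on the unit circle: close to 0.
pa, pb = phase_to_unit_circle(a), phase_to_unit_circle(b)
circle_gap = math.hypot(pa[0] - pb[0], pa[1] - pb[1])
```

Here `raw_gap` is 2π − 2θ₁ while `circle_gap` is 2·sin(θ₁), so the two nearly identical phases end up close together in the {cosθ, sinθ} representation.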


How to synchronize the splitting section


Combination method


The GMM based on MFCCs is combined with the GMM based on phase information.

The likelihood of MODEL 1 is linearly coupled with that of MODEL 2 to produce a new score given by

$$L^n_{comb} = (1 - \alpha) L^n_{MFCC} + \alpha L^n_{phase}, \quad n = 1, 2, \ldots, N$$

where $L^n_{MFCC}$ and $L^n_{phase}$ are the likelihoods produced by the n-th speaker model based on MFCC and the n-th speaker model based on phase, respectively, N is the number of speakers registered, and α is the weighting coefficient.
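The linear coupling can be sketched as follows, assuming it takes the common weighted form (1 − α)·L_MFCC + α·L_phase with a single weight α. The score values, the weight 0.3, and the function names are made up for illustration and are not results from the experiments.

```python
def combine_scores(mfcc_scores, phase_scores, alpha):
    """Per-speaker weighted sum of the MFCC-based and phase-based likelihoods."""
    if len(mfcc_scores) != len(phase_scores):
        raise ValueError("both models must score the same set of speakers")
    return [(1 - alpha) * m + alpha * p
            for m, p in zip(mfcc_scores, phase_scores)]


def identify(mfcc_scores, phase_scores, alpha):
    """Index of the registered speaker with the highest combined score."""
    combined = combine_scores(mfcc_scores, phase_scores, alpha)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical log-likelihoods for N = 3 registered speakers.
mfcc = [-41.0, -40.0, -45.0]
phase = [-30.0, -36.0, -33.0]

mfcc_only = identify(mfcc, phase, alpha=0.0)
combined_best = identify(mfcc, phase, alpha=0.3)
```

With α = 0 the decision rests on the MFCC model alone; a nonzero α lets the phase model change the ranking, as it does in this toy example.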


Experimental setup (1/3)

NTT database
• Number of speakers: 35 (22 males and 13 females)
• Number of sessions: 5 (1990.8, 1990.9, 1990.12, 1991.3, 1991.6)
• Training utterances: 5 (1990.8)
• Test utterance: 1 (about 4 seconds)
• 35 × 4 × 5 = 700 trials

JNAS database
• Number of speakers: 270 (135 males and 135 females)
• Training utterances: 5 (about 2 seconds / sentence)
• Test utterance: 1 (about 5.5 seconds), about 95 sentences / person
• 270 × 95 = 25650 trials


Experimental setup (2/3)

Noise
• Stationary noise (in a computer room)
• Non-stationary noise (in an exhibition hall)

Noisy speech
• Noise was added to clean speech at average SNRs of 20 dB and 10 dB.
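One common way to mix noise at a target SNR is to scale the noise so that the ratio of average speech power to average noise power matches the target. The sketch below assumes the SNR is measured over the whole utterance; the slides do not say how the average was computed, and the function names and toy signals are hypothetical.

```python
import math


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def scale_noise_to_snr(speech, noise, target_snr_db):
    """Scale `noise` so that speech power / noise power equals the target SNR."""
    gain = math.sqrt(mean_power(speech)
                     / (mean_power(noise) * 10 ** (target_snr_db / 10)))
    return [gain * x for x in noise]


def measure_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


# Toy stand-ins (assumptions) for clean speech and recorded noise.
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [((i * 37) % 11 - 5) / 5 for i in range(1000)]

scaled_20 = scale_noise_to_snr(speech, noise, 20)
scaled_10 = scale_noise_to_snr(speech, noise, 10)
noisy_20 = [s + n for s, n in zip(speech, scaled_20)]
```

The noisy utterance is simply the sample-wise sum of the clean speech and the scaled noise.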


Experimental setup (3/3)

                     MFCC                                        Phase
Sampling frequency   16 kHz (both feature types)
Frame length         25 ms                                       12.5 ms
Frame shift          12.5 ms                                     5 ms
Dimensions           25                                          {θ}: 12; {cosθ, sinθ}: 24
GMMs                 8 mixtures with full-covariance matrices    64 mixtures with diagonal covariance matrices
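For orientation, the frame settings translate into sample counts at 16 kHz as sketched below. The helper names and the 4-second utterance length are illustrative assumptions, and the pairing of settings with MFCC and phase follows the column order of the setup table.

```python
SAMPLE_RATE_HZ = 16000


def ms_to_samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(duration_ms * sample_rate_hz / 1000)


def frame_count(num_samples, frame_len, frame_shift):
    """Number of full frames that fit in the signal (no padding)."""
    if num_samples < frame_len:
        return 0
    return (num_samples - frame_len) // frame_shift + 1


mfcc_len, mfcc_shift = ms_to_samples(25), ms_to_samples(12.5)
phase_len, phase_shift = ms_to_samples(12.5), ms_to_samples(5)

utterance = ms_to_samples(4000)  # hypothetical 4-second utterance
mfcc_frames = frame_count(utterance, mfcc_len, mfcc_shift)
phase_frames = frame_count(utterance, phase_len, phase_shift)
```

The shorter window and shift for the phase features yield roughly 2.5 times as many frames per utterance as the MFCC analysis.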


Speaker identification using clean speech


Speaker identification result on NTT database (1/2)

Speaker identification results using the combination of MFCC-based GMM and the original phase {θ}


Speaker identification result on NTT database (2/2)

Speaker identification results using the combination of MFCC-based GMM and the modified phase {cosθ, sinθ}


Speaker identification result on JNAS database

Speaker identification results using the combination of MFCC-based GMM and the modified phase {cosθ, sinθ}


Speaker identification under stationary/non-stationary noisy conditions


Speaker identification results under noisy conditions (1/2)

[Figure: NTT database. Speaker identification rate (%) on the vertical axis (40 to 90) for MFCC, Phase, and Combination, under two conditions: clean model and clean model + frame deletion.]


Speaker identification results under noisy conditions (2/2)

[Figure: JNAS database. Speaker identification rate (%) on the vertical axis (20 to 90) for MFCC, Phase, and Combination, under two conditions: clean model and clean model + frame deletion.]


Conclusion

We proposed a phase information extraction method that normalizes the variation in phase caused by the clipping position of the input speech and integrates the phase information with MFCCs.

The experimental results showed that combining phase information with MFCCs improved speaker recognition performance remarkably compared with the MFCC-based method alone.


Thank you for your attention!