Transformation of Short-Term Spectral Envelope of Speech Signal Using Multivariate Polynomial Modeling

P. K. Lehana, P. C. Pandey

{lehana, pcpandey}@ee.iitb.ac.in

EE Dept, IIT Bombay

17th National Conference on Communications, 28-30 Jan. 2011, Bangalore, India (Sp Pr. 1, P7)

30th January, 2011



PRESENTATION OUTLINE

1. Introduction

2. Multivariate Polynomial Modeling

3. Methodology

4. Results

5. Conclusion


1. INTRODUCTION

Speaker transformation

Modification of the speech signal of the source speaker to make it perceptually similar to that of the target speaker.

Processing steps in transformation

Estimation of mapping

▫ Estimation of the source and the target parameters

▫ Alignment of the parameters

▫ Estimation of the source-to-target transformation function(s)

Transformation of source speech

▫ Estimation of the source parameters

▫ Application of the transformation function(s) on the source parameters

▫ Generation of the transformed speech
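The slides do not say how the source and target parameter frames are aligned. Dynamic time warping (DTW) is one common choice for this step; the following is a minimal sketch under that assumption. The function name `dtw_align` and the use of Euclidean frame distance are illustrative, not taken from the presentation.

```python
import numpy as np

def dtw_align(src, tgt):
    """Align two frame sequences; returns a list of (src_index, tgt_index) pairs.

    src has shape (n, d) and tgt has shape (m, d): one parameter vector per frame.
    """
    n, m = len(src), len(tgt)
    # Euclidean distance between every source frame and every target frame.
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=2)

    # cost[i, j] = minimum accumulated distance aligning src[:i] with tgt[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])

    # Backtrack from the end, preferring the diagonal step on ties.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin((cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Toy usage: the target repeats one frame, so one source frame is matched twice.
src = np.array([[0.0], [1.0], [2.0]])
tgt = np.array([[0.0], [1.0], [1.0], [2.0]])
pairs = dtw_align(src, tgt)
```

The aligned index pairs give the corresponding source and target frames from which a transformation function can then be estimated.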


Spectral parameters for transformation

▫ Formant frequencies

▫ Line spectral frequencies (LSFs)

▫ Cepstral coefficients

▫ Mel frequency cepstrum coefficients (MFCCs): robust w.r.t. noise, coefficients uncorrelated with each other and hence suitable for interpolation.

Transformation methods

▫ Vector quantization (Shikano, 86): degradation in the output speech quality due to discretization of the acoustic space.

▫ Statistical and ANN (Narendranath, 98; Stylianou, 98; Ye, 06): large set of training data and computation needed.

▫ Frequency warping and interpolation (Rinscheid, 96; Hashimoto, 96; Jian, 07; Masuda, 07; Valbret, 92): different transformation functions needed for different acoustic classes.


Research objective

Modification of spectral characteristics by modeling the source-target relationship using a single mapping applicable to all acoustic classes, by

▫ modeling each parameter of the target speech as a multivariate polynomial function of all the parameters of the source speech,

▫ using harmonic plus noise model (HNM) based analysis-synthesis.


2. MULTIVARIATE POLYNOMIAL MODELING

Modeling

Approximation of an m-dimensional function g, known at q points (w_n), by a multivariate polynomial with terms φ_k and error ε_n:

$$ g_n = \sum_{k=0}^{p-1} c_k \, \phi_k(w_{1n}, w_{2n}, \ldots, w_{mn}) + \varepsilon_n, \qquad n = 0, 1, \ldots, q-1 $$

Coefficients c_k obtained by minimizing the sum of squared errors.

Application

▫ Relationship between the parameters of the corresponding source and target frames obtained by modeling each parameter of the target speech as a multivariate polynomial function of all the parameters of the source speech.

▫ Each parameter of a target frame obtained as the corresponding function of all the parameters of the corresponding source frame.
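The least-squares fit of the coefficients c_k can be sketched as follows. This is a minimal illustration, assuming a degree-2 polynomial whose terms are a constant, each variable, and every product of two variables; the function names and the synthetic data are hypothetical and not from the presentation.

```python
import itertools
import numpy as np

def quadratic_terms(W):
    """Evaluate the polynomial terms phi_k at every known point.

    W has shape (q, m): q known points, m variables per point.
    Columns: 1, each w_i, then every product w_i * w_j with i <= j.
    """
    q, m = W.shape
    cols = [np.ones(q)]
    cols += [W[:, i] for i in range(m)]
    cols += [W[:, i] * W[:, j]
             for i, j in itertools.combinations_with_replacement(range(m), 2)]
    return np.column_stack(cols)

def fit_coefficients(W, g):
    """Coefficients c_k that minimize the sum of squared errors."""
    Phi = quadratic_terms(W)
    c, *_ = np.linalg.lstsq(Phi, g, rcond=None)
    return c

def evaluate(W, c):
    """Polynomial value at each row of W."""
    return quadratic_terms(W) @ c

# Synthetic check: g is itself a quadratic in m = 3 variables, known at q = 200 points.
rng = np.random.default_rng(0)
W = rng.normal(size=(200, 3))
g = 1.0 + 2.0 * W[:, 0] - W[:, 1] * W[:, 2]
c = fit_coefficients(W, g)
```

With m variables this choice of terms gives p = 1 + m + m(m + 1)/2 coefficients per modeled function, so the number of aligned frames q needs to be comfortably larger than p for a well-conditioned fit.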


3. METHODOLOGY

Processing

HNM based analysis-synthesis as platform for transformation

▫ Harmonic band parameters: voicing, pitch, max. voiced frequency, harmonic magnitudes and phases.

▫ Noise band parameters: LP coefficients and energy.

Modification of parameters

▫ Harmonic magnitudes converted to MFCCs (20), transformed, & converted back to magnitudes; phases estimated by minimum-phase approximation.

▫ LP coeffs. (20) converted to LSFs, transformed, & converted back to LP coeffs. Different transformation fns. for the voiced and the unvoiced frames.

▫ Linear transformation for time and pitch scaling.
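The minimum-phase approximation mentioned above can be illustrated with the standard cepstral method. This is a sketch under a stated assumption: the magnitude spectrum is available on a uniform, even-length FFT grid, whereas harmonic magnitudes would first have to be interpolated onto such a grid. The function name is hypothetical.

```python
import numpy as np

def minimum_phase_from_magnitude(mag):
    """Phase (radians) of the minimum-phase spectrum with magnitude `mag`.

    mag: magnitudes on a full, even-length FFT grid (symmetric about N/2).
    """
    n = len(mag)
    if n % 2:
        raise ValueError("this sketch assumes an even FFT length")
    # Real cepstrum of the log-magnitude spectrum.
    cep = np.fft.ifft(np.log(np.maximum(mag, 1e-12))).real
    # Fold the cepstrum onto non-negative quefrencies (makes it causal).
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1:n // 2] = 2.0
    fold[n // 2] = 1.0
    # The imaginary part of the resulting log-spectrum is the phase.
    return np.fft.fft(cep * fold).imag

# Usage: 1 - 0.5 z^-1 has its zero inside the unit circle, so it is minimum phase
# and its true phase should be recovered from its magnitude alone.
N = 512
H = np.fft.fft([1.0, -0.5], N)
phase = minimum_phase_from_magnitude(np.abs(H))
H_rebuilt = np.abs(H) * np.exp(1j * phase)
```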


Estimation of spectral transformation functions

Transformation of source speech

Transformation functions investigated

▫ Univariate linear (UL)

▫ Multivariate linear (ML)

▫ Multivariate quadratic (MQ)
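The difference between the UL and ML functions can be sketched as below. The sketch assumes that UL maps each target parameter from the corresponding source parameter only, and that ML maps each target parameter from all source parameters plus a constant; MQ would additionally include squares and pairwise products of the source parameters. Function names and the synthetic frames are hypothetical.

```python
import numpy as np

def fit_ul(X, Y):
    """Univariate linear: y_i = a_i * x_i + b_i, fitted per parameter."""
    m = X.shape[1]
    a, b = np.empty(m), np.empty(m)
    for i in range(m):
        a[i], b[i] = np.polyfit(X[:, i], Y[:, i], 1)  # slope, intercept
    return a, b

def apply_ul(X, a, b):
    return X * a + b

def fit_ml(X, Y):
    """Multivariate linear: each y_i is a linear function of all x_j."""
    A = np.column_stack([np.ones(len(X)), X])
    C, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return C  # shape (m + 1, m): one column of coefficients per target parameter

def apply_ml(X, C):
    return np.column_stack([np.ones(len(X)), X]) @ C

# Synthetic aligned frames in which target parameters depend on several source ones.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))      # stand-in source frames
M = rng.normal(size=(4, 4))        # cross-parameter coupling
Y = X @ M + 0.5                    # stand-in target frames

a, b = fit_ul(X, Y)
C = fit_ml(X, Y)
mse_ul = np.mean((apply_ul(X, a, b) - Y) ** 2)
mse_ml = np.mean((apply_ml(X, C) - Y) ** 2)
```

On data with coupling between parameters, the univariate fit cannot capture the cross terms, which is the motivation for using all source parameters in each target function.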


Evaluation

Material

A Hindi story with 80 sentences (10 kHz, 16 bits) from 5 speakers (2 M, 3 F). 77 sentences used for training, 3 for testing.

Preliminary evaluation

▫ Unity transformation (same speaker as the source and the target): identity not disturbed, a small degradation in quality.

▫ Pitch modification: target identity not achieved, quality degradation similar to the unity transformation.

▫ Spectral modification: source identity changed towards target for the same-gender transformation, slightly higher degradation in quality.

▫ Spectral modification along with pitch and time scaling: source identity close to the target for all the speaker pairs, quality same as in spectral modification.


Example: “Vah padne likhane men bahut achchha tha”

[Figure: panels labeled S, T, Tr_UL, Tr_ML, Tr_MQ for each of the speaker pairs F1-F2, F1-M2, M1-F2, M1-M2]


Objective evaluation

Mahalanobis distance between two sets of MFCC feature vectors (P, Q),

$$ D_M(P, Q) = \sqrt{(P - Q)^{T} \, \Sigma^{-1} \, (P - Q)}, \qquad \Sigma = \text{covariance matrix}, $$

where P corresponds to the target speech and Q corresponds to the source or the transformed speech.

Subjective evaluation

XAB and MOS test (automated administration)

▫ Source, target, or modified randomly presented as X. Source or target randomly presented as A or B.

▫ No. of subjects: 6

▫ Material: 2 sentences for each of the 4 speaker pairs

▫ No. of presentations for each stimulus: 3
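The Mahalanobis distance used for the objective evaluation can be computed as in the sketch below. The slides do not state how the covariance matrix is estimated or how distances are combined over frames, so estimating it from the target frames and averaging over time-aligned frame pairs are assumptions made here for illustration; the function names are hypothetical.

```python
import numpy as np

def mahalanobis(p, q, cov):
    """D_M(p, q) = sqrt((p - q)^T cov^{-1} (p - q)) for two feature vectors."""
    d = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    # Solve cov @ x = d instead of forming the inverse explicitly.
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def mean_frame_distance(P, Q, cov):
    """Assumption: average the distance over time-aligned frame pairs (rows)."""
    return float(np.mean([mahalanobis(p, q, cov) for p, q in zip(P, Q)]))

# Toy usage with random stand-ins for MFCC frames (not data from the slides).
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 4))              # stand-in "target" frames
Q = P + 0.1 * rng.normal(size=(50, 4))    # stand-in "transformed" frames
cov = np.cov(P, rowvar=False)             # assumption: covariance from target frames
score = mean_frame_distance(P, Q, cov)
```

A smaller value indicates that the compared speech is closer to the target in the MFCC space.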


4. RESULTS

• Mahalanobis distance of the target MFCCs

Distance    Transformation
            F1-F2   F1-M1   M1-F2   M1-M2
Source      0.51    0.65    0.64    0.53
Tr_UL       0.68    0.65    0.61    0.64
Tr_ML       0.45    0.47    0.44    0.43
Tr_MQ       0.38    0.39    0.38    0.33

Highest reduction in the target-transformed distance for MQ based transformation

• XAB score (2 sentences × 3 presentations × 6 listeners, averaged across the 4 speaker pairs)

Transformed: Tr_MQ along with pitch modification and time scaling

Stimulus    Source   Target   Pitch modified   Transformed
Score (%)   6        96       14               92

Identification errors: Source: 6 %, Target: 4 %, Transformed: 8 %

• MOS score (2 sentences × 3 presentations × 6 listeners, averaged across the 4 speaker pairs)

Transformation   UL    ML    MQ
Score            1.7   2.8   3.1
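As a quick check on the Mahalanobis distance table, the relative reduction of each transformation with respect to the untransformed source distance can be computed directly from the tabulated values. The averaging over the four speaker pairs is an illustrative summary, not a figure reported in the slides.

```python
# Distances to the target MFCCs, in the column order of the table (4 speaker pairs).
distance = {
    "Source": [0.51, 0.65, 0.64, 0.53],
    "Tr_UL":  [0.68, 0.65, 0.61, 0.64],
    "Tr_ML":  [0.45, 0.47, 0.44, 0.43],
    "Tr_MQ":  [0.38, 0.39, 0.38, 0.33],
}

def relative_reduction(method):
    """Per-pair reduction (source - transformed) / source; negative means worse."""
    return [(s - t) / s for s, t in zip(distance["Source"], distance[method])]

mean_reduction = {
    method: sum(relative_reduction(method)) / len(distance["Source"])
    for method in ("Tr_UL", "Tr_ML", "Tr_MQ")
}

for method, value in mean_reduction.items():
    print(f"{method}: {100 * value:.1f} %")
```

On these numbers MQ gives the largest mean reduction (about 36 %), ML a smaller one (about 22 %), and UL on average increases the distance.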


Demo

S: source, T: target, PM: pitch modified, SM: spectrum modified, TS: time scaled

UL: univariate linear, ML: multivariate linear, MQ: multivariate quadratic


5. CONCLUSION

Modification of spectral characteristics feasible by modeling the source-target relationship using multivariate polynomial functions for a single mapping applicable to all acoustic classes, without extensive training or labeling.

Methods investigated for transformation function: UL, ML, MQ. MQ resulted in satisfactory identity transformation and fair quality.

Further work

▫ Listening tests involving a larger number of speaker pairs and listeners.

▫ Comparison with other transformation techniques.
