[IEEE IEEE Workshop on Applications of Signal Processing to Audio and Acoustics - New Paltz, NY, USA (17-20 Oct. 1993)] Proceedings of IEEE Workshop on Applications of Signal Processing

ADAPTIVE PREDICTIVE CODING WITH TRANSFORM DOMAIN QUANTIZATION USING BLOCK SIZE ADAPTATION AND HIGH-RESOLUTION SPECTRAL MODELING

B. R. Udaya Bhaskar

COMSAT Laboratories, 22300 COMSAT Drive, Clarksburg, MD 20871, USA

ABSTRACT

The adaptive predictive coding with transform domain quantization (APC-TQ) technique was presented at the 1991 workshop [ref. 1] for the compression of audio signals. Over the past two years, significant developments have taken place, leading to a reduction in the coding rate while enhancing the audio quality. These developments include (i) the use of block size adaptation to exploit the variations in the stationarity of the signal, (ii) high resolution spectral modeling using LPC analysis orders up to 64 and (iii) an adaptive bit-allocation procedure to minimize coding noise power as well as minimize the perception of coding noise. The result is near transparent quality compression of 5 kHz bandwidth audio at the rate of 17 kbit/s. This technology will find applications in the distribution and transmission of AM quality audio programming over low rate channels such as the INMARSAT Standard A, B and Aeronautical systems.

1. INTRODUCTION

The fundamental characteristic of audio signals that permits bit rate reduction is the non-uniform distribution of power in the frequency domain. In addition, the perceptual sensitivity of the human auditory system varies across the frequency band, depending on the spectral power distribution. Significant reductions in bit rate can be achieved by exploiting these characteristics of the signal and the human ear. This forms the basis of a number of audio coding techniques such as perceptual transform coders and subband coders.

The APC-TQ technique exploits the non-uniform power spectral distribution by time domain prediction analysis and filtering techniques. The resulting prediction residual is quantized in the transform domain. A non-uniform distribution of the available bits controls the reconstruction noise power spectrum based on perceptual as well as objective considerations.

In general, forward adaptive predictive methods such as APC-TQ can exploit the non-uniform power spectral distribution of the input signal more efficiently than transform or subband coders. In these methods, the predictors are optimized for each block of input signal samples, resulting in a highly decorrelated prediction residual signal. In contrast, practical transform coding with a fixed suboptimal transform such as the discrete cosine transform (DCT) results in less complete decorrelation of the transform coefficients. In the case of subband coders, the number of subbands limits the extent to which the spectral variations are exploited. The strength of transform and subband coding schemes, which is the ease with which auditory characteristics are exploited, is retained in APC-TQ due to quantization in the transform domain. This permits direct implementation of auditory noise masking models [ref. 3], thereby maximizing the perceived quality of the reconstructed audio signal.

Since a description of the APC-TQ technique can be found in references 1 and 2, this paper will concentrate on the developments that have led to further bit-rate reduction and quality improvement. These are: (i) the use of block size adaptation to exploit the variations in stationarity, (ii) high resolution spectral modeling using LPC analysis orders up to 64 and (iii) an adaptive bit-allocation procedure to minimize coding noise power as well as minimize the perception of coding noise.

2. BLOCK SIZE ADAPTATION

The APC-TQ encoder is illustrated in Figure 1. The decoder is a simpler subset of encoder functions. The audio signal is bandlimited to 5 kHz bandwidth and sampled at a rate of 10240 samples/s. The codec processes these samples in blocks whose size (i.e. number of samples/block) varies in accordance with the short-term stationarity of the audio signal. Audio signals exhibit long intervals of stationarity, interspersed with periods of gradual or rapid change. During stationary intervals, it is advantageous to use a larger block size since this improves the efficiency of the spectral parametrization and/or permits a higher resolution spectral analysis. However, a small block size is essential when the signal is non-stationary, so that the changes in the signal characteristics are tracked adequately. Block size adaptation meets these conflicting requirements by matching the block size to the stationarity of the signal. The block size is adjusted in steps of 256 samples. The 256 sample unit is called a sub-block. Up to four sub-blocks may be concatenated to form a block, thus permitting block sizes of 256, 512, 768 or 1024 samples.
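As a quick check of the arithmetic implied by these figures, the sketch below tabulates the duration of each permitted block size at 10240 samples/s, together with the average bit budget per block at the 17 kbit/s rate quoted in the abstract. The function name `block_table` is illustrative, and the assumption that bits are spent at a constant average rate per block is ours, not a statement from the paper.

```python
SAMPLE_RATE_HZ = 10240   # sampling rate given in the text
SUB_BLOCK = 256          # sub-block size in samples, given in the text
BIT_RATE = 17000         # coding rate in bit/s, from the abstract

def block_table():
    """Return (block size, duration in ms, average bits) for 1 to 4 sub-blocks.

    The bit budget assumes a constant average rate per block (an assumption).
    """
    rows = []
    for n_sub in range(1, 5):
        size = n_sub * SUB_BLOCK
        duration_ms = 1000.0 * size / SAMPLE_RATE_HZ
        bits = BIT_RATE * size / SAMPLE_RATE_HZ
        rows.append((size, duration_ms, bits))
    return rows

for size, duration_ms, bits in block_table():
    print(f"{size:5d} samples  {duration_ms:6.1f} ms  {bits:7.1f} bits")
```

Under these assumptions a single sub-block spans 25 ms and the largest block spans 100 ms.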

2.1 Computation of a Non-Stationarity Measure

A measure of non-stationarity was developed based on an approach similar to the Itakura-Saito distortion measure [ref. 4]. Let $\{x(n), 0 \le n < N\}$ be the existing block, and let $\{y(n), 0 \le n < L\}$ be the new sub-block. Here, $L$ is the sub-block size, which equals 256, and $N$ is the block size, which can be 256, 512 or 768. The block and the sub-block are modeled by 16th order LPC models. Let $\{a_m, 0 \le m \le 16\}$ be the LPC parameters of the new sub-block and let $\{b_m, 0 \le m \le 16\}$ be the LPC parameters of the existing block, with $a_0 = b_0 = 1$. The prediction error power $E_a$ due to prediction filtering of the new sub-block with the LPC parameters of the new sub-block, i.e. $\{a_m\}$, is given by:

$$E_a = \sum_{n} \left[ \sum_{m=0}^{16} a_m \, y(n-m) \right]^2$$

Similarly, the prediction error power $E_b$ due to prediction filtering of the new sub-block with the LPC parameters of the existing block, i.e. $\{b_m\}$, is given by:

$$E_b = \sum_{n} \left[ \sum_{m=0}^{16} b_m \, y(n-m) \right]^2$$

The non-stationarity measure is then defined as

$$D(a,b) = 10 \log_{10} \left( \frac{E_b}{E_a} \right) \ \text{dB}$$

$D(a,b)$ is non-negative since $E_b \ge E_a$. For the new sub-block, $\{a_m\}$ minimizes the prediction error power, whereas $\{b_m\}$ can at best match the performance of $\{a_m\}$. The closer $D(a,b)$ is to zero, the higher the spectral similarity of the sub-block to the block, and hence the higher the degree of stationarity of the sub-block relative to the block. A threshold of 1.2 dB was determined as a satisfactory level to discriminate between stationarity ($D(a,b) < 1.2$) and non-stationarity ($D(a,b) \ge 1.2$). If a sub-block is found to be non-stationary relative to the existing block, the existing block, which could consist of 256, 512 or 768 samples, is terminated and encoded.

Otherwise, the present sub-block is concatenated to the existing block, and the process is repeated until either the block size reaches 1024 or a sub-block is found to be non-stationary relative to the existing block. Figure 2 illustrates the variations in the non-stationarity measure for an audio signal segment. It is seen that the transitions in the characteristics of the audio signal are clearly marked by high values of the non-stationarity measure.
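The block-growing procedure of this section can be sketched as follows. This is a minimal illustration, not the codec's implementation: it assumes the autocorrelation method of LPC analysis without windowing, takes the measure as 10 log10 of the ratio of the two prediction error powers (consistent with the 1.2 dB threshold), and evaluates each error power as a quadratic form in the sub-block autocorrelations. All function names, and the helper `ar1_noise` used to generate test signals, are hypothetical.

```python
import math
import random

def autocorr(x, order):
    """Autocorrelation lags 0..order of a finite block (no window applied)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]

def levinson(r, order):
    """Levinson recursion; returns (a, err) with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def error_power(coeffs, r):
    """Prediction error power of filter `coeffs` on a signal with lags `r`."""
    p = len(coeffs)
    return sum(coeffs[m] * coeffs[l] * r[abs(m - l)]
               for m in range(p) for l in range(p))

def nonstationarity_db(block, sub_block, order=16):
    """Ratio (in dB) of the sub-block error powers under the two LPC models."""
    b, _ = levinson(autocorr(block, order), order)
    r_y = autocorr(sub_block, order)
    a, _ = levinson(r_y, order)
    return 10.0 * math.log10(error_power(b, r_y) / error_power(a, r_y))

def segment(signal, sub=256, max_block=1024, threshold_db=1.2, order=16):
    """Split `signal` into blocks of 1 to 4 sub-blocks; returns block sizes."""
    if len(signal) < sub:
        return []
    sizes, start, end = [], 0, sub
    while end + sub <= len(signal):
        full = end - start >= max_block
        if full or nonstationarity_db(signal[start:end],
                                      signal[end:end + sub],
                                      order) >= threshold_db:
            sizes.append(end - start)   # terminate and encode existing block
            start = end
        end += sub                      # otherwise the sub-block is appended
    sizes.append(end - start)
    return sizes

def ar1_noise(n, coeff, seed):
    """Hypothetical test signal: first-order autoregressive Gaussian noise."""
    rng = random.Random(seed)
    out, prev = [], 0.0
    for _ in range(n):
        prev = coeff * prev + rng.gauss(0.0, 1.0)
        out.append(prev)
    return out

if __name__ == "__main__":
    signal = ar1_noise(1024, 0.9, 1) + ar1_noise(1024, -0.9, 2)
    print(segment(signal))
```

Samples beyond the last complete sub-block are ignored in this sketch; a real encoder would buffer them for the next call.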

3. ADAPTIVE HIGH-ORDER SPECTRAL MODELING

The APC-TQ codec relies upon a short term model for prediction filtering as well as critical band analysis leading to bit-allocation. From the perspective of objective predictor performance alone (i.e. based on average prediction gain), a relatively low order was found to suffice in our earlier studies. However, our recent studies indicate that from the perspective of critical band and masking analysis and effective bit-allocation, a significantly higher order is advantageous. With higher model orders, relatively small spectral peaks are represented and now receive bit-allocation. As model orders increased to 64 and above, perceptual performance continued to increase. However, the order cannot be arbitrarily high, since the parameters must be transmitted to the decoder. Since more bits are available to encode the parameters as the block size increases, the order can be increased in proportion to the block size. With these considerations, the short term model order was selected based on the block size. Orders of 16, 32, 48 and 64 were used respectively for the four possible block sizes mentioned earlier. Model order is denoted by $M$ in the following discussion.

3.1 Power Gain Control of LPC Parameters

For audio signals, which often display high spectral dynamic range corresponding to highly resonant sounds, the LPC parameter values can be large. The power gain $G$ of the LPC parameters $\{a_m, 0 \le m \le M\}$ is a measure of LPC parameter values and can be defined as:

$$G = \sum_{m=0}^{M} a_m^2$$

It is found that increases in model order are accompanied by sharp increases in the values of $G$. Values as high as 30 dB have been observed for certain blocks of audio signals.
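To make the decibel figures concrete, the short sketch below evaluates the power gain of a coefficient set, taking the gain as the sum of squared coefficients (including the leading 1) and expressing it as 10 log10 of that sum. This definition follows the usual convention in the predictive coding literature and is an assumption here; the function name and the example coefficients are illustrative only.

```python
import math

def power_gain_db(a):
    """Power gain of LPC coefficients a[0..M] (a[0] = 1), in dB.

    Assumes the gain is the sum of squared coefficients.
    """
    return 10.0 * math.log10(sum(c * c for c in a))

# Illustrative second-order model with a double pole at z = 0.9.
print(round(power_gain_db([1.0, -1.8, 0.81]), 2))
```

Even this mildly resonant second-order example has a gain of several dB, which gives a feel for how sharply tuned high-order models can reach much larger values.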

Such large values of $G$ are detrimental to the performance of the coder, since they reflect the gain by which the reconstruction noise of the previous block (stored in the delay lines of the synthesis filters) is amplified and added to the signal being reconstructed for the present block. In other words, the amplitude of the zero input response of the decoder synthesis filter increases with $G$. This is clearly undesirable, and the value of $G$ must be reduced for satisfactory operation of the codec. Further, this reduction must be accomplished without significantly compromising the spectral modeling accuracy of the LPC model.

This problem has been studied in the context of voice coding, where the roll-off introduced by the anti-aliasing filters causes large valued LPC parameters. The solution developed by Atal [ref. 5] is to compute the LPC parameters for a signal obtained by adding a low level noise to the signal being modeled. The addition of noise has the effect of raising the floor of the signal power spectrum, thus reducing the spectral dynamic range. As a result, the LPC parameter values and the power gain $G$ are reduced. If the power level and the power spectrum of the noise are chosen carefully, there is no deterioration in the spectral modeling accuracy in the frequency ranges of interest.

A modification of the above solution has been developed which has advantages for audio signals. The optimal LPC parameters $\{a_m, 0 \le m \le M\}$ are quantized and transmitted to the decoder. At the encoder as well as the decoder, spectral analysis and bit-allocation functions are performed based on the spectral estimates obtained using these optimal parameters. These parameters are not used for prediction or synthesis filtering operations, as they are likely to have a high power gain. A second set of LPC parameters $\{a'_m, 0 \le m \le M\}$ is derived solely from the (quantized) optimal parameters at the encoder and the decoder, using a power gain reduction procedure. The $\{a'_m\}$ are used for prediction and synthesis filtering operations.

The procedure for determination of $\{a'_m\}$ from $\{a_m\}$ is based on the use of Levinson's recursions. First, the autocorrelations $\{r_m\}$ corresponding to the optimal LPC parameters $\{a_m\}$ are determined by a reversal of Levinson's recursions. Next, the autocorrelations $\{r_m\}$ are modified in a manner similar to the method developed by Atal, to emulate the addition of a noise signal. Finally, using the modified autocorrelations, the Levinson recursions are used to determine the power gain reduced LPC parameters $\{a'_m\}$. Substantial reductions in power gain were achieved with relatively small losses in prediction gain using this procedure. At the same time, the use of optimal parameters for spectral analysis improved the efficiency of bit allocation.
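The three steps of the power gain reduction procedure (reverse Levinson recursion, autocorrelation modification, forward Levinson recursion) can be sketched as below. The paper does not specify how the autocorrelations are modified; this sketch assumes the simplest case of emulating additive white noise by scaling the zero-lag term by `1 + noise_fraction`, whereas Atal's method shapes the noise spectrum. The function names and the value of `noise_fraction` are hypothetical.

```python
def levinson(r, order):
    """Forward Levinson recursion; returns (a, err) with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_to_autocorr(a):
    """Reverse Levinson: normalized autocorrelations (r[0] = 1) implied by a.

    Assumes a stable model, i.e. all reflection coefficients below 1 in magnitude.
    """
    order = len(a) - 1
    preds = [None] * (order + 1)          # preds[i] is the order-i predictor
    preds[order] = list(a)
    for i in range(order, 0, -1):         # step-down recursion
        cur = preds[i]
        k = cur[i]
        denom = 1.0 - k * k
        preds[i - 1] = [1.0] + [(cur[j] - k * cur[i - j]) / denom
                                for j in range(1, i)]
    r, err = [1.0], 1.0
    for i in range(1, order + 1):         # rebuild lags order by order
        prev, k = preds[i - 1], preds[i][i]
        r.append(-k * err - sum(prev[j] * r[i - j] for j in range(1, i)))
        err *= 1.0 - k * k
    return r

def reduce_power_gain(a, noise_fraction=0.01):
    """Derive gain-reduced parameters from a (white-noise assumption)."""
    r = lpc_to_autocorr(a)
    r[0] *= 1.0 + noise_fraction          # emulate adding white noise
    reduced, _ = levinson(r, len(a) - 1)
    return reduced

def power_gain(a):
    """Sum of squared coefficients (assumed definition of power gain)."""
    return sum(c * c for c in a)

if __name__ == "__main__":
    a = [1.0, -1.8, 0.81]
    a_reduced = reduce_power_gain(a)
    print(power_gain(a), power_gain(a_reduced))
```

Because the second parameter set is computed only from the transmitted parameters, the encoder and decoder can run the same routine and arrive at identical filters without extra side information.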

3.2 Quantization of Short Term Parameters

In order to efficiently quantize the short term parameters, split vector quantization of the line spectral frequencies (LSFs) was investigated [ref. 6]. However, it was found that for orders above 22, the numerical procedure used in the computation of the LSFs led to unstable models for some audio signal blocks. Consequently, a less efficient quantization based on the log area ratio method [ref. 7] is being used. More efficient methods are under development which, if successful, will further reduce the bit rate of the codec to 16 kbit/s.
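One reason log area ratios are a safe fallback is that any quantized value maps back to a reflection coefficient of magnitude below one, so the decoded model stays stable. The sketch below illustrates this with a uniform scalar quantizer; the step size, the sign convention of the log area ratio, and all function names are assumptions for illustration, and the codec's actual quantizer tables are not given in the paper.

```python
import math

def lpc_to_reflection(a):
    """Step-down recursion: LPC coefficients (a[0] = 1) to reflection coefficients."""
    cur = list(a)
    order = len(cur) - 1
    k = [0.0] * order
    for i in range(order, 0, -1):
        ki = cur[i]
        k[i - 1] = ki
        denom = 1.0 - ki * ki             # assumes |ki| < 1 (stable model)
        cur = [1.0] + [(cur[j] - ki * cur[i - j]) / denom for j in range(1, i)]
    return k

def reflection_to_lpc(k):
    """Step-up recursion: reflection coefficients to LPC coefficients."""
    a = [1.0]
    for i, ki in enumerate(k, start=1):
        a = [1.0] + [a[j] + ki * a[i - j] for j in range(1, i)] + [ki]
    return a

def quantize_lar(k, step=0.25):
    """Uniformly quantize log area ratios; returns integer indices."""
    return [round(math.log((1.0 + ki) / (1.0 - ki)) / step) for ki in k]

def dequantize_lar(indices, step=0.25):
    """Map indices back to reflection coefficients via tanh(LAR / 2)."""
    return [math.tanh(0.5 * idx * step) for idx in indices]

if __name__ == "__main__":
    a = [1.0, -1.8, 0.81]
    k = lpc_to_reflection(a)
    k_q = dequantize_lar(quantize_lar(k))
    print(k, k_q, reflection_to_lpc(k_q))
```

The inverse mapping uses the identity (e^x - 1) / (e^x + 1) = tanh(x / 2), which keeps every decoded coefficient inside the open interval (-1, 1) for moderate index values.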

4. BIT-ALLOCATION BASED ON OBJECTIVE AND PERCEPTUAL CRITERIA

With bit-allocation based entirely on the auditory noise masking threshold computed using critical band analysis, the codec performance was occasionally unstable. This was probably caused by a high level of quantization noise (albeit below the masking threshold) at the frequency corresponding to a synthesis filter pole very close to the unit circle. Bit-allocation based purely on objective criteria did not have this problem, since the mean squared reconstruction noise is minimized. However, aside from this advantage, the performance of the objective bit-allocation was clearly inferior to that of the perceptual bit-allocation. Consequently, a combination bit-allocation procedure was developed, whereby a fraction of the bits are distributed based on objective criteria, and the remainder are distributed based on perceptual criteria. About 70% of the bits are distributed based on objective criteria, while the remaining 30% are distributed using perceptual criteria. This approach was very successful in maintaining stability, while providing a perceptually high level of audio quality.
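A two-stage allocation of this kind might be sketched as follows. The paper gives only the 70%/30% split, so everything else here is an assumption made for illustration: a greedy one-bit-at-a-time rule, a noise reduction of about 6.02 dB per bit, the objective stage targeting the largest remaining noise power, and the perceptual stage targeting the largest remaining noise-to-mask ratio. The function name and the inputs `power_db` and `mask_db` (per-coefficient power and masking threshold in dB) are hypothetical.

```python
DB_PER_BIT = 6.02   # approximate noise reduction per added bit (assumption)

def combined_allocation(power_db, mask_db, total_bits, objective_fraction=0.7):
    """Greedy two-stage bit allocation over transform coefficients.

    Stage 1 (objective): each bit goes to the coefficient with the largest
    remaining noise power. Stage 2 (perceptual): each bit goes to the
    coefficient whose remaining noise most exceeds its masking threshold.
    """
    n = len(power_db)
    bits = [0] * n
    n_objective = int(round(objective_fraction * total_bits))

    def noise_db(i):
        return power_db[i] - DB_PER_BIT * bits[i]

    for _ in range(n_objective):
        target = max(range(n), key=noise_db)
        bits[target] += 1
    for _ in range(total_bits - n_objective):
        target = max(range(n), key=lambda i: noise_db(i) - mask_db[i])
        bits[target] += 1
    return bits

if __name__ == "__main__":
    power = [40.0, 20.0, 0.0]     # illustrative coefficient powers in dB
    mask = [20.0, 0.0, -30.0]     # illustrative masking thresholds in dB
    print(combined_allocation(power, mask, total_bits=10))
```

In this sketch the objective stage bounds the absolute noise level everywhere, including near sharp synthesis filter resonances, before the perceptual stage refines the distribution.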

5. REFERENCES

1. B. R. Udaya Bhaskar, "Adaptive Prediction with Transform Domain Quantization for Low Rate Audio Coding", 1991 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics.

2. "Final Report - Low Rate Coding of Sound", COMSAT Laboratories Final Report, under Task MCD 6000-011, December 1992.

3. J. D. Johnston, "Transform Coding of Audio Signals Using Perceptual Criteria", IEEE Journal on Selected Areas in Communications, Vol. 6, pp. 314-323, February 1988.

4. L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice-Hall, Inc., Englewood Cliffs, N.J., 1978.

5. B. S. Atal, "Predictive Coding of Speech at Low Rates", IEEE Transactions on Communications, Vol. COM-30, No. 4, April 1982.

6. K. K. Paliwal and B. S. Atal, "Efficient Vector Quantization of LPC Parameters at 24 bits/frame", IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 1, January 1993, pp. 3-14.

7. R. Viswanathan and J. Makhoul, "Quantization Properties of Transmission Parameters in Linear Predictive Systems", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-23, No. 3, June 1975, pp. 309-321.


Figure 1. APC-TQ Encoder Schematic Block Diagram

Figure 2. Non-Stationarity Measure for an Audio Signal
