EE 4810 Final project Presented to Prof. Dan Ellis By ...dpwe/classes/e4810-2005-09/projects/dapho… · cv: the compress two dimension vector (index and coefficient). fs: the sampling

EE 4810 – Fall 2005

Digital Signal Processing

Final project

Presented to

Prof. Dan Ellis

By

Alexandre Mercier-Dalphond COO-312-4864

and ByoungJun Oh

Monday December 12th 2005


Table of Contents

Introduction
Methodology
Implementation
Results
Conclusion
Appendix


Introduction

Many have described our society as an information and communication society. The economy and everyone's daily life depend on the exchange of information. Since physical resources such as bandwidth are limited and information quickly loses its relevance, it is very important to send it as fast as possible using minimal resources. Compression reduces the size of the information to send, thus minimizing the transfer time, the necessary bandwidth, or both. Images, videos and sounds pose great challenges, and although many compression schemes have been developed, compression is still a very active research field. The purpose of this project is to implement a technique to compress audio signals and to analyse its effects. Our proposed technique applies the Fourier Transform to an audio signal to determine its most important frequency components and discards the least significant ones. This document presents our methodology, implementation, results, MatLab code and a conclusion.

Methodology

This section outlines the methodology used to compress and decompress the audio signal, and to analyse the effects of these two operations on it.

1. Using MatLab, we read the audio file to compress, and MatLab represents it as a vector of normalized values in the range [-1, 1]. Because of MatLab restrictions, our program can compress only WAV files coded in PCM.

2. Once the audio signal is represented as a vector, we divide it into segments of M samples. Each segment has L samples in common with the next one, to ensure a smooth transition between compressed segments. If the signal length minus L is not an integer multiple of M-L, the audio vector is padded with zeros until it is, so that the segments cover the whole signal exactly.
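The padding and segmentation rule of step 2 can be sketched as follows. This is a minimal illustration in plain Python rather than MatLab, chosen only so that it runs anywhere; the function name segment_starts is hypothetical, indices are 0-based, and it assumes 0 <= L < M. It mirrors the test mod(N-L,M-L) performed in codeaudio.

```python
def segment_starts(n_samples, m, l):
    """Return (padded_length, start_indices) for segments of M samples
    that overlap by L samples (hypothetical helper, 0-based indices).

    Zeros are appended until (N - L) is an integer multiple of the hop
    size M - L, the same test codeaudio performs with mod(N-L, M-L).
    """
    if not 0 <= l < m:
        raise ValueError("need 0 <= L < M")
    hop = m - l
    remainder = (n_samples - l) % hop
    padded = n_samples if remainder == 0 else n_samples + hop - remainder
    starts = list(range(0, padded - m + 1, hop))
    return padded, starts


padded, starts = segment_starts(1000, 256, 16)
print(padded, starts)  # 1216 [0, 240, 480, 720, 960]
```

Each segment starts M-L samples after the previous one, so consecutive segments share exactly L samples and the last segment ends exactly at the padded length.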

3. The Fast Fourier Transform (FFT), an efficient algorithm for computing the Discrete Fourier Transform (DFT), is applied to each segment of M samples using the fft() function in MatLab. The result is M complex coefficients representing the frequency components of that segment of the audio signal.
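To make the transform concrete, the sketch below computes the DFT directly from its definition (O(M^2), which is acceptable for a short illustrative segment) and checks the conjugate symmetry that step 4 relies on. It is a plain-Python illustration, not the project's code; dft is a hypothetical name, and it uses the same 0-based bin order as MatLab's fft().

```python
import cmath
import math


def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/M), k = 0..M-1."""
    m = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / m) for n in range(m))
            for k in range(m)]


# A 16-sample real segment: a sinusoid in bin 3 plus a DC offset of 0.5.
segment = [math.sin(2 * math.pi * 3 * n / 16) + 0.5 for n in range(16)]
X = dft(segment)

# For real input, X[M-k] is the complex conjugate of X[k].
print(all(abs(X[16 - k] - X[k].conjugate()) < 1e-9 for k in range(1, 16)))  # True
print(round(abs(X[0]), 6), round(abs(X[3]), 6))  # 8.0 8.0
```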


For example, if M=16, the frequency vector of the DFT will have the following form:

4. Because the signal is real, its Fourier Transform coefficients are conjugate symmetric: with indices counted from 0, the coefficient at index M-k is the complex conjugate of the coefficient at index k. The symmetry index is M/2 if M is even and lies between (M-1)/2 and (M+1)/2 if M is odd, so only the coefficients up to that index carry independent information. To compress the signal, the least significant frequency components are discarded: only the K coefficients with the greatest magnitude among those from the DC value up to the symmetry index are kept. In the previous example, the symmetry index is 8 and the four greatest coefficients are shown in bold (K=4). This yields the following compressed frequency vector:
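The selection rule of step 4 can be sketched as below. Only bins 0 to floor(M/2) are candidates, and the K of largest magnitude are returned as (index, coefficient) pairs with 0-based indices, like the first row of cv in codeaudio. The helper name and the toy spectrum are illustrative assumptions, and ties may be broken differently than by MatLab's sort().

```python
def keep_k_largest(X, k):
    """Keep the K largest-magnitude bins among 0 .. floor(M/2).

    Returns (index, coefficient) pairs with 0-based indices, largest first.
    """
    half = len(X) // 2 + 1  # DC up to the symmetry index: floor(M/2) + 1 bins
    if not 0 < k <= half:
        raise ValueError("K must satisfy 0 < K <= floor(M/2) + 1")
    order = sorted(range(half), key=lambda i: abs(X[i]), reverse=True)
    return [(i, X[i]) for i in order[:k]]


# Toy conjugate-symmetric 16-bin spectrum (illustrative values only).
X = [4, 1, 6j, 0.5, 3, 0.2, 0.1, 2, 0.3, 2, 0.1, 0.2, 3, 0.5, -6j, 1]
print(keep_k_largest(X, 4))  # [(2, 6j), (0, 4), (4, 3), (7, 2)]
```

Bins 9 to 15 never need to be stored, since the decoder can recover them from bins 1 to 7 by conjugation.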

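5. The compressed audio vector has two rows: the first holds the index of each retained coefficient and the second holds the corresponding complex Fourier coefficient. The K columns of each segment are stored one after the other.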

[Figure: layout of the compressed vector, 1st segment | 2nd segment | 3rd segment]

6. To reconstruct the signal, each segment of K coefficients is expanded back to M coefficients: the discarded positions are filled with zeros and the conjugate symmetry is applied to restore the mirrored half of the spectrum. This process is illustrated below with K=4 and M=16.
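Step 6 can be sketched as follows, assuming (index, coefficient) pairs with 0-based indices as produced in step 4. Bins that were not kept stay at zero, and each kept bin k with 1 <= k <= ceil(M/2)-1 is mirrored to bin M-k; the DC bin and, for even M, the bin at M/2 have no mirror. The helper name expand_segment is hypothetical.

```python
def expand_segment(pairs, m):
    """Rebuild a length-M spectrum from kept (index, coefficient) pairs.

    Bins that were not kept are zero. Each bin k with 1 <= k <= ceil(M/2)-1
    is mirrored to bin M-k with its complex conjugate.
    """
    X = [0j] * m
    for k, value in pairs:
        X[k] = complex(value)
    for k in range(1, (m + 1) // 2):
        X[m - k] = X[k].conjugate()
    return X


X = expand_segment([(2, 6j), (0, 4), (4, 3), (7, 2)], 16)
print(X[14] == -6j, X[12] == 3, X[9] == 2, X[8] == 0)  # True True True True
```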

7. Then, the Inverse Fast Fourier Transform is applied to each vector of length M using the ifft() function in MatLab. This yields a one-dimensional, real-valued audio segment of length M, with values approximately in the range [-1, 1].
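The inverse transform of step 7 can be illustrated in the same way. This is a plain-Python sketch of the inverse DFT definition with a hypothetical name, idft_real; it assumes the input spectrum is conjugate symmetric, so the imaginary parts cancel up to rounding and only the real part is kept.

```python
import cmath
import math


def idft_real(X):
    """Direct inverse DFT, x[n] = (1/M) * sum_k X[k] * exp(+j*2*pi*k*n/M).

    Intended for conjugate-symmetric spectra, whose inverse is real up to
    rounding, so only the real part of each sample is returned.
    """
    m = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * n / m) for k in range(m)) / m).real
            for n in range(m)]


# Bins 2 and 6 of an 8-point spectrum, each equal to M/2 = 4: a cosine at bin 2.
x = idft_real([0, 0, 4, 0, 0, 0, 4, 0])
print(all(abs(v - math.cos(2 * math.pi * 2 * n / 8)) < 1e-9 for n, v in enumerate(x)))  # True
```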

8. Because the audio signal was divided into overlapping segments, the total length of all decompressed segments exceeds the original length of the audio signal. Therefore, the inverse-transformed segments cannot simply be placed end to end; their overlapping samples must be combined. To do so, each segment is multiplied by a scaling window of length M that reduces the weight of its overlapping portions. Our program compares two types of scaling window: linear and cosine. Their definitions are given below, where n = 1, ..., M is the sample index within a segment.

It is important to note that the first and last segments of the audio signal have only one scaled portion instead of two: the first segment is scaled only at its end, and the last segment only at its beginning.

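Cosine window:

$$
W_{\cos}(n) =
\begin{cases}
\dfrac{1}{2} - \dfrac{1}{2}\cos\left(\dfrac{n\pi}{L+1}\right), & 1 \le n \le L \\[4pt]
1, & L+1 \le n \le M-L \\[4pt]
\dfrac{1}{2} - \dfrac{1}{2}\cos\left(\dfrac{(M-n+1)\pi}{L+1}\right), & M-L+1 \le n \le M
\end{cases}
$$

Linear window:

$$
W_{\mathrm{Linear}}(n) =
\begin{cases}
\dfrac{n}{L+1}, & 1 \le n \le L \\[4pt]
1, & L+1 \le n \le M-L \\[4pt]
\dfrac{M-n+1}{L+1}, & M-L+1 \le n \le M
\end{cases}
$$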
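The two windows can be generated and checked with the sketch below (hypothetical helper names, plain Python, assuming 2L <= M so that the two ramps of a segment do not overlap). It shows the property that makes the simple addition of step 9 work: the falling ramp of one segment and the rising ramp of the next sum to exactly 1 at every overlapping sample, for both windows. These are the in-between windows; the first and last segments would receive only one ramp.

```python
import math


def linear_window(m, l):
    """Linear window W_Linear(n): n/(L+1) rising, (M-n+1)/(L+1) falling, 1 elsewhere."""
    w = [1.0] * m
    for n in range(1, l + 1):          # 1-based n, as in the formulas
        w[n - 1] = n / (l + 1)         # rising ramp, 1 <= n <= L
        w[m - n] = n / (l + 1)         # falling ramp at sample M-n+1
    return w


def cosine_window(m, l):
    """Cosine window W_cos(n): 1/2 - 1/2*cos(n*pi/(L+1)) on both ramps, 1 elsewhere."""
    w = [1.0] * m
    for n in range(1, l + 1):
        value = 0.5 - 0.5 * math.cos(n * math.pi / (l + 1))
        w[n - 1] = value               # rising ramp, 1 <= n <= L
        w[m - n] = value               # falling ramp at sample M-n+1
    return w


for make_window in (linear_window, cosine_window):
    w = make_window(16, 3)
    # Falling ramp of one segment + rising ramp of the next segment.
    print([round(a + b, 12) for a, b in zip(w[-3:], w[:3])])  # [1.0, 1.0, 1.0] for each window
```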


The following figure is an example of the linear window with M=16 and L=3.

9. After multiplying each decompressed segment by its window, we simply add the L overlapping samples of consecutive segments.
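Steps 8 and 9 together amount to an overlap-add. The sketch below is a hypothetical plain-Python counterpart of the buffer loop in decaudio, assuming every segment has already been multiplied by its window; the example uses hand-picked complementary weights rather than one of the two windows above.

```python
def overlap_add(segments, l):
    """Join windowed segments of equal length M that overlap by L samples.

    Samples outside the overlaps are copied unchanged; the L overlapping
    samples of consecutive segments are added. The output therefore has
    S*(M-L) + L samples for S segments.
    """
    out = list(segments[0])
    for seg in segments[1:]:
        for i in range(l):
            out[len(out) - l + i] += seg[i]   # tail of previous + head of this one
        out.extend(seg[l:])                   # rest of this segment
    return out


# Two segments of M = 4 with L = 2, using hand-picked complementary ramps.
print(overlap_add([[1.0, 1.0, 0.75, 0.25], [0.25, 0.75, 1.0, 1.0]], 2))
# [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

With the padding rule of step 2, S*(M-L) + L equals the padded signal length, so the output lines up sample by sample with the padded original.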


10. Finally, the two decompressed audio files are compared with the original one according to four criteria: size, SNR, frequency density and audio perception. This comparison is performed for various values of M, L and K in order to find optimal values.
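For reference, the SNR used in the comparison can be sketched as below. This follows the expression written in decaudio, 10*log10(mean(Vn.^2)/mean((Vn-vn).^2)), in which the signal power is taken from the decompressed signal; the helper name snr_db is hypothetical, and the sketch assumes equal-length inputs that differ somewhere (otherwise the noise power is zero).

```python
import math


def snr_db(original, decoded):
    """SNR in dB as written in decaudio:
    10*log10(mean(decoded^2) / mean((decoded - original)^2)).

    The signal power is taken from the decoded signal, as in the appendix
    code. Both sequences must have the same length and differ somewhere.
    """
    signal_power = sum(d * d for d in decoded) / len(decoded)
    noise_power = sum((d - o) ** 2 for o, d in zip(original, decoded)) / len(decoded)
    return 10 * math.log10(signal_power / noise_power)


print(round(snr_db([1, -1, 1, -1], [1.1, -0.9, 1.1, -0.9]), 2))  # 20.04
```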

Implementation

Our methodology is implemented in two MatLab functions: codeaudio and decaudio. This section outlines their purpose, the built-in MatLab functions they use, and their inputs and outputs.

Codeaudio.m

This function transforms the input WAV file into a compressed vector containing the most important frequency components of the audio signal, and it outputs all the information necessary to reconstruct the audio signal. Codeaudio performs lossy compression.

[cv,fs,B,M,L,K] = codeaudio('original.wav',M,L,K)

where:
cv: the compressed two-row vector (indices and coefficients)
fs: the sampling frequency (samples/s)
B: the precision per sample (bits/sample); can only be 8 or 16
M: the length of each segment to compress
L: the number of samples shared by two consecutive segments
K: the number of complex coefficients kept for each segment

(The version listed in the Appendix also returns the zero-padded signal vn as its first output.)

Codeaudio.m uses the following MatLab functions:

[Vn,fs,B]=wavread('file.wav')

This function reads the specified WAV file and returns the sampled data in Vn. Vn is a column vector of amplitude values normalized to the range [-1, +1].

Vk=fft(Vn)

This function computes the Discrete Fourier Transform of Vn according to the following formula, where v(n) is the n-th sample of Vn and N is the length of Vn:

$$
V(k) = \sum_{n=0}^{N-1} v(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1
$$

Decaudio.m

This function decompresses a two-row vector of frequency components according to fs, B, M, L and K. It saves two different decompressed audio files, because two scaling functions for the overlaps are implemented. It also outputs the signal-to-noise ratio of each decompressed signal relative to the original, the frequency density of the three signals and the compression rate, and it plays the three audio signals on the speakers for comparison.


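decaudio(cv,fs,B,M,L,K,'output1.wav','output2.wav','original.wav')

where:
output1: the decompressed audio signal with linear scaling
output2: the decompressed audio signal with cosine scaling
original: the original audio file

(The version listed in the Appendix instead takes the padded signal vn as its first argument, in place of the original file name, and returns the two SNR values.)

Decaudio.m uses the following MatLab functions: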

vn=ifft(Vk)

It computes the inverse Discrete Fourier Transform.

wavwrite(x,fs,B,'fichier2.wav')

It writes vector x to a Windows WAVE file with a sample rate of fs Hz and B bits per sample.

sound(y,fs)

It plays the audio vector y on the computer speakers at the sampling frequency fs. It plays the vector itself, not the WAV file from which the vector was read with wavread.

Results

Our objective during testing was to find the best compromise between audio quality and compression ratio. With three parameters to vary (M, L and K), we needed to evaluate the impact of each of them on our two criteria. The first parameter we fixed was the segment length, M. To do so, we observed the effects of varying M while keeping the ratio K/M at 10% and the number of overlapping samples, L, at 8. We observed that if M is too large, the audio quality suffers: even if K is large, each segment contains too many very different frequencies. On the other hand, if M is too small, the compression ratio increases, because the overlapping samples become a significant portion of each segment, and the time needed for decompression becomes very high. After our tests, we fixed the segment length at 256 samples.

For the second part, we varied the parameter with the most influence on both the compression ratio and the audio quality: the number of coefficients kept, K. While changing K with L=16, we observed the audio quality, the compression ratio and the SNR of the decompressed signal. The following graph illustrates the effects of K on the SNR and the compression.


The previous graph highlights the large effect of K on both quantities. For voice with little background music, we relied on our audio perception to find the lowest K that gave good audio quality: K=35. The final step was to determine the optimal number of overlapping samples, L. Since this parameter had very little influence on how we perceived the signal, we evaluated the SNR and the compression ratio with M=256 and K=35.

As the graphs show, beyond L=45 the SNR barely changes, while the compression ratio keeps increasing linearly with L. The greatest gains in SNR and audio quality occurred between L=0 and L=20. Based on the graph, we chose L=16 to keep the compression ratio low while still obtaining a substantial SNR increase over small values of L. The final results were almost identical for both window functions, the linear window yielding a slightly higher SNR (by 0.01 dB) than the cosine window. The achieved compression was 71%, which is substantial: the compressed data occupies about 29% of the original size. In other words, a 100 kbit/s audio stream would require only about 29 kbit/s of bandwidth to be transmitted with our compression scheme. To further evaluate the difference between the original and decompressed signals, we used a spectrogram. The spectrogram below was produced from a 1.4-second French speech sample, "Belle démonstration" ("beautiful demonstration"), with K=35, L=16 and M=256. The effects of compression are easy to observe: only the most important frequency components, shown in bright red, are preserved, while the energy in the least important frequencies is nearly eliminated. The latter can be seen where red, yellow or green regions turn green or turquoise.
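The 29% figure can be checked with a short calculation. The sketch below counts the compressed size the way decaudio does (coding_ratio = length(cv)*2/length(vn), i.e. two values per kept coefficient, divided by the zero-padded signal length); the helper name is hypothetical. For long signals the ratio approaches 2K/(M-L), which for K=35, M=256 and L=16 gives 70/240, about 0.292, and it also shows why a larger L raises the ratio.

```python
def coding_ratio(n_samples, m, l, k):
    """Compressed size / original size, counted as in decaudio:
    2 values (index and coefficient) per kept coefficient, divided by the
    zero-padded signal length."""
    hop = m - l
    remainder = (n_samples - l) % hop
    padded = n_samples if remainder == 0 else n_samples + hop - remainder
    segments = (padded - l) // hop
    return 2 * k * segments / padded


print(round(coding_ratio(240016, 256, 16, 35), 3))  # 0.292
print(round(2 * 35 / (256 - 16), 4))                # 0.2917 (limit for long signals)
```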


Conclusion

The purpose of this project was to implement an audio compression technique and to observe its effects in order to determine the parameters that give the best audio quality while achieving significant compression. Our technique keeps only the most important frequency components in each segment of the signal and discards the others. Although our technique was very basic, we achieved a low compression ratio, on the order of 30%, which means that the signal would use 70% less bandwidth if transmitted, or 70% less memory if stored. Our testing was done using a voice sample with no background music; therefore, our parameters are optimized for such audio signals. We also tested our parameters on a music sample, and the resulting noise is easily noticeable. Since music has many more frequency components than voice, a much larger K would be needed to obtain quality similar to that of voice, at the expense of a higher compression ratio. Much more advanced compression schemes, such as MP3, exist today and rely on the same principle of keeping the most relevant frequency information of a signal. Even though significant progress has been achieved in this field, it remains an active research interest for many scientists and engineers.


Appendix

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%This function compresses the audio signal by keeping only the
%K largest-magnitude coefficients of the Fourier Transform.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [vn,cv,fs,B,M,L,K]=codeaudio(nom,M,L,K)

%%%%%%%%%%%%%%%%%%%%%%%%%
% Read the ".wav" file
%%%%%%%%%%%%%%%%%%%%%%%%%
[vn fs B]=wavread(nom);
N=length(vn);

% The input signal will be divided in segments of M samples
%M=256;
% The overlap between two consecutive segments will be L samples
%L=8;
% The K greatest coefficients per segment of M samples will be kept.
%K=35;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Initialisation of the compressed vector that will contain the highest
% coefficients
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
cv=[];

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% N-L must be an integer multiple of M-L so that the segments cover
% the signal exactly; if not, we pad with "0".
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
vn=vn';
cvn=mod(N-L,M-L);
if cvn~=0
    vn=[vn zeros(1,M-L-cvn)];
    N=N+M-L-cvn;
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Compression of the signal (M samples/segment)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Candidate bins: DC up to the symmetry index, i.e. floor(M/2)+1 bins
lmod=floor(0.5*M+1);
for i=1:M-L:N-M+1
    % Fourier Transform of the audio signal (one segment)
    vk=fft(vn(i:i+M-1));
    modvk=abs(vk(1:lmod));
    % We look for the K greatest coefficients (sort is ascending, so
    % they are the last K entries); indices are stored 0-based
    [vct kvct]=sort(modvk);
    vct=[kvct(lmod:-1:lmod-K+1)-1;vk(kvct(lmod:-1:lmod-K+1))];
    % Construction of the compressed vector containing all segments
    cv=[cv vct];
end


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%This function decompresses an audio file: it rebuilds each segment from
%its K Fourier coefficients using the inverse Fourier transform.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [snr_linear, snr_sinus] = decaudio(vn,cv,fs,B,M,L,K,file_linear,file_sinus)

pos = 0;
VnTampon_linear = [];
VnTampon_sinus = [];

while pos < size(cv,2)
    % We initialize all the Fourier coefficients to '0'.
    Vk = zeros(1,M);

    % We read the received signal vector and put the fft coefficients
    % at the right position (indices in cv are 0-based).
    for i=1:K
        k = cv(1,i+pos)+1;
        Vk(k) = cv(2,i+pos);
    end

    % Conjugate symmetry: Vk(M-j) = conj(Vk(j+2)) for j = 0..ceil(M/2)-2,
    % which covers both even and odd M.
    for j=0:ceil(M/2)-2
        Vk(M-j) = conj(Vk(j+2));
    end

    % We compute the Inverse Fourier Transform. The spectrum is conjugate
    % symmetric, so the result is real up to rounding errors.
    VnSegment_linear = real(ifft(Vk));
    VnSegment_sinus = VnSegment_linear;

    % The window depends on whether the segment is the first, the last
    % or in between.
    if pos < K
        % First segment: only the last L samples overlap
        for h=0:(L-1)
            VnSegment_linear(M-h) = (h+1)/(L+1)*VnSegment_linear(M-h);
            VnSegment_sinus(M-h) = (1/2 - 1/2*cos((h+1)*pi/(L+1)))*VnSegment_sinus(M-h);
        end
    elseif pos > (size(cv,2)-K-1)
        % Last segment: only the first L samples overlap
        for h=1:L
            VnSegment_linear(h) = h/(L+1)*VnSegment_linear(h);
            VnSegment_sinus(h) = (1/2 - 1/2*cos(h*pi/(L+1)))*VnSegment_sinus(h);
        end
    else
        % In-between segments: the first and the last L samples overlap
        for h=0:(L-1)
            VnSegment_linear(h+1) = (h+1)/(L+1)*VnSegment_linear(h+1);  % L first
            VnSegment_linear(M-h) = (h+1)/(L+1)*VnSegment_linear(M-h);  % L last
            VnSegment_sinus(h+1) = (1/2 - 1/2*cos((h+1)*pi/(L+1)))*VnSegment_sinus(h+1);  % L first
            VnSegment_sinus(M-h) = (1/2 - 1/2*cos((h+1)*pi/(L+1)))*VnSegment_sinus(M-h);  % L last
        end
    end

    % Buffer vector holding the windowed segments end to end. It is longer
    % than the audio signal because the segments are not overlapped yet.
    VnTampon_linear = [VnTampon_linear VnSegment_linear];
    VnTampon_sinus = [VnTampon_sinus VnSegment_sinus];

    % We go to the next set of coefficients, which defines another segment.
    pos = pos + K;
end

% This loop creates the output vectors by overlapping the samples in
% the buffer vectors that contain the uncompressed audio signal.
Vn_linear = [];
Vn_sinus = [];
% Pointer that reads the buffer vectors
pos = 1;
% Pointer that writes the audio signals Vn_linear and Vn_sinus
n = 1;
% Until the end of the buffer...
while pos <= length(VnTampon_linear)
    % Are we at the beginning of an overlap (the last L samples of a
    % segment), with another segment after it?
    if mod(pos-1,M) == (M-L) && (pos+L-1) < length(VnTampon_linear)
        for g=0:(L-1)
            % We simply add the overlapping samples
            Vn_linear(n+g) = VnTampon_linear(pos+g) + VnTampon_linear(pos+L+g);
            Vn_sinus(n+g) = VnTampon_sinus(pos+g) + VnTampon_sinus(pos+L+g);
        end
        % We skip 2*L buffer samples because of the overlap...
        pos = pos + 2*L;
        % ...but write only L samples to Vn_linear and Vn_sinus
        n = n + L;
    else
        % We are not in an overlapping section: copy the buffer sample
        Vn_linear(n) = VnTampon_linear(pos);
        Vn_sinus(n) = VnTampon_sinus(pos);
        pos = pos + 1;
        n = n + 1;
    end
end

% Respective SNR (dB) for the linear and sinus window functions
% (the signal power is taken from the decompressed output)
snr_linear = 10*log10( mean(Vn_linear.^2) / mean((Vn_linear-vn).^2) )
snr_sinus = 10*log10( mean(Vn_sinus.^2) / mean((Vn_sinus-vn).^2) )
coding_ratio = size(cv,2)*2/length(vn)

% We write the output vectors to audio files (as column vectors)
wavwrite(Vn_linear(:),fs,B,file_linear);
wavwrite(Vn_sinus(:),fs,B,file_sinus);
