Improvement of Audio Capture in Handheld Devices through Digital Filtering

Problem

Handheld devices use low-quality microphones to reduce cost. This makes speech recognition less reliable.

Choosing a Test Sound

Two test sounds were compared to find which one produced the better filter. One test sound was a sine wave increasing in frequency from 80 Hz to 8000 Hz. The other test sound was pink noise. Speech was recorded from four different people, and these recordings were filtered using the coefficients produced by the least mean square algorithm.
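As a rough illustration of the two test sounds, the sketch below generates a sine sweep from 80 Hz to 8000 Hz and an approximation of pink noise. Several details are assumptions rather than facts from the poster: the sweep is taken to be linear, the 4 second duration and 44100 Hz sample rate are placeholders, and the Voss-McCartney method is only one common way to approximate pink noise. The function names are hypothetical.

```python
import math
import random

def linear_sweep(f_start=80.0, f_end=8000.0, duration=4.0, fs=44100):
    """Sine wave whose frequency rises linearly from f_start to f_end (Hz).

    Assumption: the poster does not say whether its sweep was linear.
    """
    n_samples = int(duration * fs)
    slope = (f_end - f_start) / duration  # Hz per second
    samples = []
    for n in range(n_samples):
        t = n / fs
        # Phase is the integral of the instantaneous frequency f_start + slope * t.
        phase = 2.0 * math.pi * (f_start * t + 0.5 * slope * t * t)
        samples.append(math.sin(phase))
    return samples

def pink_noise(n_samples, n_rows=16, seed=0):
    """Approximate pink (1/f) noise with the Voss-McCartney method."""
    rng = random.Random(seed)
    rows = [rng.uniform(-1.0, 1.0) for _ in range(n_rows)]
    total = sum(rows)
    samples = []
    for i in range(1, n_samples + 1):
        # The lowest set bit of i picks one row to refresh, so row r changes
        # every 2**(r + 1) samples and the slower rows carry the low frequencies.
        row = (i & -i).bit_length() - 1
        if row < n_rows:
            total -= rows[row]
            rows[row] = rng.uniform(-1.0, 1.0)
            total += rows[row]
        samples.append(total / n_rows)
    return samples

sweep = linear_sweep()
noise = pink_noise(44100)
```

Both functions return plain lists of samples in the range -1 to 1, which would still need to be written to a sound file before playback.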

These recordings were then tested against two different speech recognizers, the one built into Windows XP and the one built into Windows 7. The Windows 7 recognizer had a higher baseline success rate than the XP recognizer.

Overall, the filter created from the pink noise fixed more speech recognition errors than the filter created from the sine wave. Also, all but one of the phrases fixed by the sine wave filter were also fixed by the pink noise filter.

For the Future

• Test more filter lengths, iterations, gains, sound files
• Insert filter into Windows Mobile recording stack
• Add options to the program to change the filter creation parameters

Jonathan Brown <> Sam Marlin <> Advisor: W. T. Miller

Proposed Solution

Using digital signal processing, a filter will be created to “undo” the distortion caused by the poor-quality microphone. This process will be able to generate a filter for any handheld that uses the Windows Mobile platform, creating a custom-tailored filter based on the acoustic characteristics of each device. Reference audio files with known frequency components will be used to find which frequencies are attenuated by the handheld.

Testing the Code

All the code was first written in MATLAB for testing purposes. The code was then ported to C# for final deployment.

Steps of the Solution Process

Setup
Set up the computer, speakers, and handheld device.

Record Test Sound
Play an ideal test sound from the computer while recording it on the handheld.

Create the Filter
The program on the computer compares the test sound and the recorded sound to create the filter.

Save the Filter
The filter coefficients are then saved into the registry of the handheld device for use by any audio recording or voice recognition application.
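Once the coefficients exist, a recording application would apply them to incoming audio. The sketch below shows one way to do that, assuming the coefficients describe a direct-form FIR filter. The function name `apply_fir` is hypothetical, and reading the coefficients back from the device registry is not shown.

```python
def apply_fir(coefficients, signal):
    """Filter `signal` with FIR `coefficients`: y(n) = sum over k of c(k) * x(n - k)."""
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coefficients):
            if n - k < 0:
                break  # samples before the start of the signal count as zero
            acc += c * signal[n - k]
        output.append(acc)
    return output

# A two-tap averaging filter smooths a step input.
smoothed = apply_fir([0.5, 0.5], [1.0, 1.0, 1.0])
```

With 257 coefficients, as used on the poster, each output sample costs up to 257 multiply-adds, so a real-time implementation would likely use an optimized routine instead of nested Python loops.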

Lining up the Sound Files

Each test sound file had 10 cycles of a 440 Hz sine wave at its start. This known preamble was used to line up the two sound files through cross-correlation.

Problem

The initial cross-correlation did not keep the sound files lined up for their whole length. The time steps in the two sounds are different, so after 1000 samples the files would be noticeably unaligned. To fix this, cross-correlation was used again to match the indexes in one file to those in the other.
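The two alignment stages can be sketched as follows: one cross-correlation search to find the starting offset, then repeated searches over short blocks to follow the drift. This is an illustration under assumptions, not the authors' code: the block length of 1000 samples is borrowed from the figure quoted above, the search window of 3 samples is arbitrary, and all function names are hypothetical. The demonstration uses synthetic noise instead of the 440 Hz preamble.

```python
import random

def correlation(reference, recorded, start, length, lag):
    """Sum of reference[i] * recorded[i + lag] over one block of the reference."""
    total = 0.0
    for i in range(start, start + length):
        j = i + lag
        if 0 <= j < len(recorded):
            total += reference[i] * recorded[j]
    return total

def best_lag(reference, recorded, start, length, lag_min, lag_max):
    """Lag in [lag_min, lag_max] that maximizes the cross-correlation."""
    return max(range(lag_min, lag_max + 1),
               key=lambda lag: correlation(reference, recorded, start, length, lag))

def track_lags(reference, recorded, initial_lag, block=1000, search=3):
    """Re-estimate the lag block by block so slow drift can be followed."""
    lags = []
    lag = initial_lag
    for start in range(0, len(reference) - block + 1, block):
        lag = best_lag(reference, recorded, start, block, lag - search, lag + search)
        lags.append(lag)
    return lags

# Synthetic check: the "recording" starts 5 samples late and gains one extra
# sample after the first 1000, imitating a slightly different sample clock.
rng = random.Random(1)
reference = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
recorded = [0.0] * 5 + reference[:1000] + [reference[999]] + reference[1000:]

initial = best_lag(reference, recorded, 0, 1000, 0, 20)
lags = track_lags(reference, recorded, initial)
```

The per-block lags give a mapping from indexes in the reference to indexes in the recording, which is what the filter-creation step needs.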

Creating the Filter

The least mean square algorithm was used to create the filter coefficients. For this algorithm to work, the test files have to be lined up in time. The algorithm has several adjustable parameters, so tests were done to find the best filter parameters for this problem. The sine wave test sound file was used in these parameter tests.

The choice of parameters depended on two criteria: the RMS of the error value used in the algorithm, and whether the filter coefficients still changed when the number of iterations was varied.

Numbers used in testing:
Gain: 0.001, 0.0001, 0.00001
Iterations: 500 to 3900 in steps of 200
Filter Size: 257

$$e(n) = I(n) - \sum_{k=0}^{FS} FC(k)\, NI(n-k)$$

$$FC(k) = FC(k) + g \times e(n) \times NI(n-k)$$

I = the ideal waveform
NI = the non-ideal waveform
FC = the filter coefficients
FS = filter size
e = equalization error
g = the gain
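A minimal sketch of the least mean square update, using the symbols defined above (I, NI, FC, e, g). Two points are assumptions: the filter is taken to have exactly `filter_size` taps indexed from 0, and an "iteration" is treated as one full pass over the aligned signals, which the poster does not specify. The function name `lms` and the small synthetic example are hypothetical.

```python
import math
import random

def lms(ideal, non_ideal, filter_size, gain, passes):
    """Adapt FC so that filtering `non_ideal` approximates `ideal`.

    Returns the coefficients and the RMS error of the final pass.
    """
    fc = [0.0] * filter_size
    rms = 0.0
    for _ in range(passes):
        squared_error = 0.0
        count = 0
        for n in range(filter_size - 1, len(ideal)):
            # window[k] holds NI(n - k) for k = 0 .. filter_size - 1
            window = [non_ideal[n - k] for k in range(filter_size)]
            estimate = sum(c * x for c, x in zip(fc, window))
            e = ideal[n] - estimate                # equalization error e(n)
            for k in range(filter_size):
                fc[k] += gain * e * window[k]      # FC(k) = FC(k) + g * e(n) * NI(n - k)
            squared_error += e * e
            count += 1
        rms = math.sqrt(squared_error / count)
    return fc, rms

# Synthetic check: build an "ideal" signal that a known 3-tap filter produces
# from the "non-ideal" one, then see whether LMS recovers those taps.
rng = random.Random(0)
true_fc = [0.5, -0.3, 0.2]
non_ideal = [rng.uniform(-1.0, 1.0) for _ in range(500)]
ideal = [sum(true_fc[k] * non_ideal[n - k] for k in range(3) if n - k >= 0)
         for n in range(500)]

fc, rms = lms(ideal, non_ideal, filter_size=3, gain=0.05, passes=20)
```

Watching the returned RMS error and the change in `fc` between runs with different `passes` values mirrors the two criteria the poster used to pick its parameters. The gain of 0.05 suits this toy signal only; the poster's own gains were far smaller.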

Windows 7 Speech Recognizer

Noise        Unfiltered   Filter from Sine Wave   Filter from Pink Noise
Recognized   158          159                     167
Broke        -            5                       6
Fixed        -            6                       15

[Figure: Spectrogram of Sine Wave Test Signal. Axes: Time (s), Frequency (Hz).]

[Figure: Welch Power Spectral Density Estimate of Pink Noise Test Signal. Axes: Frequency (kHz), Power/frequency (dB/Hz).]

[Figure: Frequency Response of Sine Wave Test Signal. Panels: Magnitude (dB) and Phase (degrees) versus Normalized Frequency (×π rad/sample).]

[Figure: Frequency Response of Pink Noise Test Signal. Panels: Magnitude (dB) and Phase (degrees) versus Normalized Frequency (×π rad/sample).]

Final Values:
Gain: 0.0001
Iterations: 1500
Filter Size: 257

Conclusions

The filter developed using the pink noise test signal resulted in a statistically significant improvement in speech recognizer performance at the 90% confidence level (from 79% to 83.5% correct). This indicates that the technique could provide a functionally significant improvement in practice, and warrants further investigation.
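The poster does not say which statistical test was used. As one plausible check, the sketch below runs an exact two-sided sign test on the "Broke" and "Fixed" counts from the Windows 7 table, treating each changed phrase as equally likely to go either way if the filter had no effect. The total of 200 phrases is inferred from 158 recognized phrases being 79%, and the function name is hypothetical.

```python
from math import comb

def sign_test_p_value(broke, fixed):
    """Exact two-sided sign test on phrases whose recognition result changed."""
    n = broke + fixed
    k = min(broke, fixed)
    # Probability of a split at least this lopsided when each change is a fair coin flip.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

total_phrases = 200  # inferred: 158 recognized corresponds to 79%

pink_p = sign_test_p_value(broke=6, fixed=15)
sine_p = sign_test_p_value(broke=5, fixed=6)
pink_accuracy = (158 - 6 + 15) / total_phrases
```

Under these assumptions the pink noise filter gives a p-value below 0.10, in line with the stated 90% confidence level, while the sine wave filter does not.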
