Speech Recognition Front End

[Diagram: front-end block diagram with the labels Speech, Pre-emphasis, Windowing, Spectral Analysis, Temporal Features, Frequency Features, Consolidate Features, Enhance Features, and Feature Vectors]

This week’s focus is on the spectral analysis

Spectral Analysis

• Goal: Find useful frequency-related features
• Approaches
  – Without Fourier analysis:
      • Apply a recursive bank of band-pass filters
      • Use linear predictive coding (LPC)
  – With Fourier analysis:
      • Calculate a Fourier transform
      • Warp the results based on the mel scale
• Applications: Auditory models mimicking human hearing
  – Eliminate noise by removing non-voice frequencies
  – Detect formants present in the signal
  – Perform cepstral analysis to detect pitch and recognize speech
  – Auditory nerves stop responding to extended occurrences of the same frequency
      • Idea: Deemphasize frequencies present for extended periods.
      • Results: Effective for speech recognition in noisy environments

This week’s emphasis will be on Fourier Analysis

The Fourier Transform Family

• Fourier Series: A weighted sum of sinusoidal functions that models an arbitrary periodic continuous function
• Fourier Transform: A linear operation that maps an arbitrary function with infinite range into a spectrum of its frequency components
• Discrete Fourier Transform (DFT): A Fourier transform applied to a discrete, infinitely repeating (periodic) series of complex numbers
• Discrete Time Fourier Transform (DTFT): A Fourier transform applied to an aperiodic discrete series of complex numbers that extends from −∞ to +∞
• Fast Fourier Transform (FFT): A fast way to calculate the DFT

The number e

• $e = \lim_{n \to \infty} (1 + 1/n)^n$
• When n = 1: $e \approx 2$
  When n = 2: $e \approx (1 + 1/2)^2 = 9/4 = 2.25$
  When n = 3: $e \approx (1 + 1/3)^3 = 64/27 \approx 2.37037$
• When n is extremely large, the expression approaches the value e = 2.718281828…
• What does this have to do with sound? Answer: The later slides will tell.

Quick Calculus Review

• The derivative of a function at a point is the slope of the function at that point (change in y over change in x).
• The derivative of $x^2$ is $2x$:
  $\lim_{\Delta x \to 0} \frac{(x+\Delta x)^2 - x^2}{\Delta x} = \lim_{\Delta x \to 0} \frac{x^2 + 2x\Delta x + \Delta x^2 - x^2}{\Delta x} = \lim_{\Delta x \to 0} (2x + \Delta x) = 2x$
• Tables of derivatives proved by mathematicians exist
• We will need these:
  – $\frac{d}{dx} x^n = n x^{n-1}$
  – $\frac{d}{dx} \sin x = \cos x$, $\frac{d}{dx} \cos x = -\sin x$
  – $\frac{d}{dx} e^x = e^x$, $\frac{d}{dx} e^{ax} = a e^{ax}$

Complex Numbers

• Extends the number line to a plane
  – Horizontal axis: Real numbers
  – Vertical axis: Imaginary numbers
  – Rectangular notation: a + bi
      • a along the real axis
      • b along the imaginary axis
• Operations
  – Addition: (a+bi) + (c+di) = (a+c) + (b+d)i
  – Multiplication: (a+bi) * (c+di) = (ac – bd) + (ad + bc)i
  – Division: (a+bi)/(c+di) is solved by multiplying numerator and denominator by the conjugate of c+di, which equals c-di
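The three operations above can be sketched in Java as follows. The class name `ComplexSketch` and its method names are hypothetical choices for illustration; Java's standard library has no complex number type.

```java
// Hypothetical helper for illustration; not part of the Java standard library.
final class ComplexSketch {
    final double re, im;

    ComplexSketch(double re, double im) { this.re = re; this.im = im; }

    // (a+bi) + (c+di) = (a+c) + (b+d)i
    ComplexSketch plus(ComplexSketch o) {
        return new ComplexSketch(re + o.re, im + o.im);
    }

    // (a+bi)(c+di) = (ac - bd) + (ad + bc)i
    ComplexSketch times(ComplexSketch o) {
        return new ComplexSketch(re * o.re - im * o.im, re * o.im + im * o.re);
    }

    // The conjugate of c+di is c-di.
    ComplexSketch conjugate() { return new ComplexSketch(re, -im); }

    // Multiply numerator and denominator by the conjugate of the divisor.
    ComplexSketch dividedBy(ComplexSketch o) {
        double denom = o.re * o.re + o.im * o.im;   // (c+di)(c-di) = c^2 + d^2
        ComplexSketch num = this.times(o.conjugate());
        return new ComplexSketch(num.re / denom, num.im / denom);
    }
}
```

For instance, (1+2i)(3+4i) = -5+10i and (1+2i)/(3+4i) = (11+2i)/25.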

Polar Notation

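• Rectangular form: 4 + 3i
• Convert to polar form: (5, 36.87°)
  – $M = \sqrt{4^2 + 3^2} = 5$
  – $\theta = \arctan(3/4) \approx 36.87°$
• Convert to rectangular form
  – $a + ib = M(\cos\theta + i \sin\theta)$

[Figure: the magnitude M and angle θ are the distance and angle from the origin]

Note: At 90 and 270 degrees the real part a is zero, so arctan(b/a) divides by zero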
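One common way to avoid the divide by zero is the two-argument arctangent. The sketch below uses Java's `Math.atan2` and `Math.hypot`; the class and method names are hypothetical.

```java
// Illustrative sketch; class and method names are hypothetical.
final class PolarSketch {
    // Returns {magnitude, angle in degrees} for the complex number a + bi.
    static double[] toPolar(double a, double b) {
        double m = Math.hypot(a, b);                      // sqrt(a^2 + b^2)
        double theta = Math.toDegrees(Math.atan2(b, a));  // defined even when a == 0
        return new double[] { m, theta };
    }

    // Returns {a, b} such that a + bi = m (cos theta + i sin theta).
    static double[] toRectangular(double m, double thetaDegrees) {
        double t = Math.toRadians(thetaDegrees);
        return new double[] { m * Math.cos(t), m * Math.sin(t) };
    }
}
```

Calling `PolarSketch.toPolar(4, 3)` gives a magnitude of 5 and an angle of about 36.87 degrees, and `toPolar(0, 1)` gives 90 degrees without any division.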

Maclaurin Series for e, sin, cos

• A Maclaurin series estimates any well-behaved function in terms of polynomials:
  $f(x) = f(0)x^0/0! + f'(0)x^1/1! + \dots + f^{(n)}(0)x^n/n! + \dots$
• Try it out, say for the third derivative at x = 0: differentiating the series three times and setting x = 0 leaves only the $x^3$ term,
  $f'''(0) = 0 + 0 + 0 + (3 \cdot 2 \cdot 1)\, f'''(0)/(3 \cdot 2 \cdot 1) + 0 + 0 + \dots$
  All the derivatives match at x = 0.
• Series that we will need
  – $e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + \dots$
  – $\sin x = x - x^3/3! + x^5/5! - x^7/7! + \dots$
  – $\cos x = 1 - x^2/2! + x^4/4! - x^6/6! + \dots$
• Another way to calculate e: $e = 1 + 1 + 1/2! + 1/3! + \dots$

Note: 0! = 1
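As a quick numerical check of these series, the following sketch sums the first few terms of each. The class name and the `terms` parameter are assumptions made for illustration; production code would simply call `Math.exp`, `Math.sin`, and `Math.cos`.

```java
// Illustrative sketch: partial sums of the three Maclaurin series.
final class MaclaurinSketch {
    // e^x = sum over n of x^n / n!
    static double exp(double x, int terms) {
        double sum = 0, term = 1;                 // term starts at x^0 / 0!
        for (int n = 0; n < terms; n++) {
            sum += term;
            term *= x / (n + 1);                  // next term: x^(n+1) / (n+1)!
        }
        return sum;
    }

    // sin x = x - x^3/3! + x^5/5! - ...
    static double sin(double x, int terms) {
        double sum = 0, term = x;
        for (int n = 0; n < terms; n++) {
            sum += term;
            term *= -x * x / ((2 * n + 2) * (2 * n + 3));
        }
        return sum;
    }

    // cos x = 1 - x^2/2! + x^4/4! - ...
    static double cos(double x, int terms) {
        double sum = 0, term = 1;
        for (int n = 0; n < terms; n++) {
            sum += term;
            term *= -x * x / ((2 * n + 1) * (2 * n + 2));
        }
        return sum;
    }
}
```

With 20 terms, `MaclaurinSketch.exp(1, 20)` agrees with `Math.E` to within floating-point rounding, which is the "another way to calculate e" mentioned above.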

Sine, Cosine and e

$e^{i\theta} = 1 + i\theta + (i\theta)^2/2! + (i\theta)^3/3! + (i\theta)^4/4! + (i\theta)^5/5! + (i\theta)^6/6! + (i\theta)^7/7! + \cdots$

(Multiply out the terms to eliminate higher powers of i)
$= 1 + i\theta - \theta^2/2! - i\theta^3/3! + \theta^4/4! + i\theta^5/5! - \theta^6/6! - i\theta^7/7! + \cdots$

(Gather the real and imaginary terms together)
$= (1 - \theta^2/2! + \theta^4/4! - \theta^6/6! + \cdots) + i(\theta - \theta^3/3! + \theta^5/5! - \theta^7/7! + \cdots)$

(Substitute cos and sin for their series)
$e^{i\theta} = \cos\theta + i \sin\theta$ (This is called Euler's formula)

Key Formulae and Identities

Euler's Formula: $e^{ix} = \cos(x) + i \sin(x)$

Trigonometric Identities:

• $\cos(x) = \cos(-x)$ and $\sin(x) = -\sin(-x)$
• $\cos(x) = (e^{ix} + e^{-ix})/2$ and $\sin(x) = (e^{ix} - e^{-ix})/(2i)$
• $\sin^2(x) + \cos^2(x) = 1$
• $\sin(x+y) = \sin(x)\cos(y) + \cos(x)\sin(y)$
• $\cos(x+y) = \cos(x)\cos(y) - \sin(x)\sin(y)$

Quick Linear Algebra Review

• Linear algebra extends Euclidean space beyond three dimensions.
• <3,4,5> represents a vector going from point (0,0,0) to point (3,4,5).
• Two vectors are orthogonal (roughly speaking, perpendicular) if their inner (dot) product equals 0.
  – Example: <1,0,0> • <0,1,0> = 1*0 + 0*1 + 0*0 = 0
  – Example: <3,1> • <-1,3> = 3*-1 + 1*3 = 0
• Two functions are orthogonal between a and b if $\int_a^b f(x)g(x)\,dx = 0$
• A set of functions is mutually orthogonal if $\int_a^b f_i(x)f_j(x)\,dx = 0$ when i ≠ j, and equals some c > 0 when i = j.
• Why do we need this? Orthogonal function sets can be used to decompose or construct signals.

Inner product: sum the products of corresponding coordinates

Basis to span a space

• Consider the orthogonal basis <1,0,0>, <0,1,0>, <0,0,1>
  – These form a basis of three-dimensional space.
  – Why? Any 3-dimensional vector is a linear combination of these
  – Example: <4,3,2> = 4 * <1,0,0> + 3 * <0,1,0> + 2 * <0,0,1>
• Consider the orthogonal basis vectors <1,2>, <-2,1>
  – They are orthogonal because <1,2> • <-2,1> = 0
• Consider the basis vectors <1/√5, 2/√5>, <-2/√5, 1/√5>
  – Also orthogonal because the inner (dot) product is 0
  – <1/√5, 2/√5> has a length of unity: $((1/\sqrt{5})^2 + (2/\sqrt{5})^2)^{1/2} = 1$
  – <-2/√5, 1/√5> also has a length of unity (same distance calculation)
• Orthonormal basis vectors: orthogonal and of unit length

Orthogonal and Orthonormal

• Experiment (intuitive example, not mathematically precise)
• Goal: construct <4,7> from basis vectors
  – Orthogonal basis: <1,2> and <-2,1>
  – <1,2> • <4,7> = 18 and <-2,1> • <4,7> = -1
  – 18 <1,2> + (-1) <-2,1> = <20,35>, which is five times <4,7>
• Another experiment
  – Orthonormal basis: <1/√5, 2/√5>, <-2/√5, 1/√5>
  – <1/√5, 2/√5> • <4,7> = 18/√5 and <-2/√5, 1/√5> • <4,7> = -1/√5
  – (18/√5) <1/√5, 2/√5> + (-1/√5) <-2/√5, 1/√5> = <20/5, 35/5> = <4,7>
• Conclusion: Correlating a vector with each orthonormal basis vector (taking the dot product) gives the multiple of that basis vector needed to rebuild the vector.
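The two experiments can be reproduced with a few lines of Java. This is a minimal sketch; `BasisSketch`, `dot`, and `reconstruct` are hypothetical names, and the reconstruction is exact only when the basis is orthonormal.

```java
// Illustrative sketch of the two experiments; names are hypothetical.
final class BasisSketch {
    // Inner (dot) product: sum of the products of corresponding coordinates.
    static double dot(double[] u, double[] v) {
        double sum = 0;
        for (int i = 0; i < u.length; i++) sum += u[i] * v[i];
        return sum;
    }

    // Sums (v . b) * b over every basis vector b.
    // This returns v itself only if the basis is orthonormal.
    static double[] reconstruct(double[] v, double[][] basis) {
        double[] out = new double[v.length];
        for (double[] b : basis) {
            double coeff = dot(v, b);
            for (int i = 0; i < out.length; i++) out[i] += coeff * b[i];
        }
        return out;
    }
}
```

With the orthogonal basis <1,2>, <-2,1> the result for <4,7> is <20,35>; after dividing each basis vector by √5 the result is <4,7> up to rounding.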

Fourier Series

• A Fourier series is a sum (possibly but not necessarily infinite) of sines and cosines that models a continuous signal.
• Fourier modeling allows us to decompose a signal, perform processing, and recombine the results to solve the original problem

Fourier Decomposition

[Figure: the top signal decomposes into nine cosine and sine waves]

Fourier Square Wave Synthesis

Fourier Cosine Series

• The set of functions $\{\cos(2\pi k F_0 t)\}$, where k is an integer ≥ 0
  – Mutually orthogonal over the interval from −T to T, where $T = 1/F_0 > 0$ is the fundamental period
  – $\int_{-L}^{L} \cos(2\pi k_1 x/P)\cos(2\pi k_2 x/P)\,dx = 0$ if $k_1 \ne k_2$, and $\ne 0$ if $k_1 = k_2$ (with 2L a whole number of periods P)
  – The proof requires some calculus, namely integration
• $x(t) = a_0\cos(0 \cdot 2\pi F_0 t) + a_1\cos(1 \cdot 2\pi F_0 t) + a_2\cos(2 \cdot 2\pi F_0 t) + \dots$
  $x(t) = a_0 + \sum_{k=1}^{\infty} a_k \cos(2\pi k F_0 t)$, where $F_0 = 1/T$
• Comment: The series doesn't include phases; if we add phases, we have twice as many unknowns to compute

[Figure: plots of cos(πx/3) and cos(2πx/3), and of the integrand cos(πx/3) · cos(2πx/3)]
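The orthogonality of these cosines can be checked numerically without doing the integration by hand. The sketch below approximates the integral over one period with the midpoint rule; the class name, the method name, and the `steps` parameter are assumptions for illustration.

```java
// Illustrative sketch: numerically integrate the product of two cosines
// cos(2*pi*k1*x/P) * cos(2*pi*k2*x/P) over one period, from -P/2 to P/2.
final class CosineOrthogonalitySketch {
    static double innerProduct(int k1, int k2, double period, int steps) {
        double dx = period / steps;
        double sum = 0;
        for (int i = 0; i < steps; i++) {
            double x = -period / 2 + (i + 0.5) * dx;   // midpoint of slice i
            sum += Math.cos(2 * Math.PI * k1 * x / period)
                 * Math.cos(2 * Math.PI * k2 * x / period) * dx;
        }
        return sum;
    }
}
```

With a period of 6, k1 = 1 and k2 = 2 correspond to cos(πx/3) and cos(2πx/3); the result is zero to within rounding, while k1 = k2 = 1 gives about 3 (half the period).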

A General Orthogonal Function Set

• Euler equation: $e^{i\varphi} = \cos(\varphi) + i \sin(\varphi)$
  – Radius = magnitude (always unity); φ = phase.
• Consider the function set $\{e^{i\omega_k t}\}$
  – Angular frequency: $\omega_k = 2\pi k F_0 = 2\pi k/T_0$
  – $F_0$, $T_0$: fundamental frequency and period.
  – k sets the speed at which $e^{i\omega_k t}$ traverses the circle
  – Orthogonal because $\int_{-\infty}^{\infty} e^{j\omega_n t} e^{j\omega_m t}\,dt = 0$ whenever n ≠ −m

Notes

1. The book uses j instead of i
2. Electrical engineers prefer j
3. Mathematicians prefer i
4. Get used to both!
5. In the diagram, $\varphi = 2\pi F_0$

Orthogonality Example

• Left: Correlating the top signal with the middle signal gives the bottom signal, whose area ≠ 0
• Right: Correlating the top signal with the middle signal gives the bottom signal, whose area = 0

Putting it all together

• $\{e^{i\omega_k t}\}$ is an orthogonal basis for signals
  – Each function $e^{i\omega_k t}$ is a basis function
  – We can use the basis functions to synthesize signals
• Synthesize (Fourier series)
  – Source: frequency magnitudes; Sink: time signal
  – $x(t) = (1/T)\sum_{k=0}^{T} a_k e^{ik\omega_0 t}$, where x(t) is the signal at time t
  – T = number of basis functions (possibly infinite); $a_k$ = magnitude of $\omega_k$
• For computer processing, we need a discrete counterpart
  – Why? We don't want to deal with infinitely many points or basis functions
  – $x[n] = (1/N)\sum_{k=0}^{N-1} X[k]\, e^{i2\pi kn/N}$
  – k determines how fast the sum traverses the circle (higher k is faster)
  – N basis functions and N frequencies
• Note: For periodic functions, we can use [0,T] instead of [−∞,∞]

Fourier Analysis

• Goal: Compute the coefficients given the signal.
• Synthesis equation: $x(t) = \sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}$
• Multiply both sides by $e^{-ik\omega_0 t}$:
  $x(t)e^{-ik\omega_0 t} = \left(\sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}\right) e^{-ik\omega_0 t}$
• Integrate over the period $[0, T_0]$:
  $\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt = \int_0^{T_0} \left(\sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}\right) e^{-ik\omega_0 t}\,dt$
• Move the sum outside the integral and combine the exponentials:
  $\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt = \sum_{m=-\infty}^{\infty} a_m \int_0^{T_0} e^{i(m-k)\omega_0 t}\,dt$
• By orthogonality, the only time the integral on the right is non-zero is when m = k:
  $\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt = a_k \int_0^{T_0} dt = a_k\, t\,\Big|_0^{T_0} = a_k T_0$
• The answer (value of coefficient k): $a_k = (1/T_0)\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt$
• Note: $1/T_0$ is simply a constant that scales the result
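The analysis equation can be tried numerically on a known signal. The following sketch approximates the integral with the midpoint rule; the class name, the method signature, and the `steps` parameter are assumptions for illustration, not part of the original slides.

```java
import java.util.function.DoubleUnaryOperator;

// Illustrative sketch: a_k = (1/T0) * integral over [0, T0] of x(t) e^{-i k w0 t} dt,
// approximated with the midpoint rule.
final class FourierCoefficientSketch {
    // Returns {real part, imaginary part} of the coefficient a_k.
    static double[] coefficient(DoubleUnaryOperator x, int k, double period, int steps) {
        double w0 = 2 * Math.PI / period;          // fundamental angular frequency
        double dt = period / steps;
        double re = 0, im = 0;
        for (int i = 0; i < steps; i++) {
            double t = (i + 0.5) * dt;
            double v = x.applyAsDouble(t);
            // e^{-i k w0 t} = cos(k w0 t) - i sin(k w0 t)
            re += v * Math.cos(k * w0 * t) * dt;
            im -= v * Math.sin(k * w0 * t) * dt;
        }
        return new double[] { re / period, im / period };
    }
}
```

For x(t) = 3 cos(2ω₀t) with period 1, the coefficient for k = 2 comes out as 1.5 + 0i (the other half of the amplitude sits at k = −2), and the coefficient for k = 1 is zero to within rounding.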

Discrete Version

• Definition: Continuous Fourier Transform and Inverse
  – Transform: $X(\omega) = \int_{-\infty}^{\infty} x(t)e^{-i\omega t}\,dt$
  – Inverse: $x(t) = (1/2\pi)\int_{-\infty}^{\infty} X(\omega)e^{i\omega t}\,d\omega$
• Convert from the continuous version:
  – Evaluate at N equally spaced points (the period is now N)
  – Use sums to approximate the integrals
  – Note: x(t) is the value at time t; x[n] is x(t) evaluated at the n-th sample point, where the phase ωt becomes 2πkn/N
• Discrete Fourier Transform and Inverse
  – Transform: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-i2\pi kn/N}$
  – Inverse: $x[n] = (1/N)\sum_{k=0}^{N-1} X[k]\, e^{i2\pi kn/N}$
• Note: X[k] is a complex number representing magnitude and phase
• Conclusion: We can go between the time and frequency domains

Signal Plot

• The phases are shown in the spectrum plot in the complex plane.

• The phase affects how the time domain signal looks.

• The amplitude of the spectrum plot remains constant regardless of phase.

Fourier Transform of Square Wave

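Fourier transforms exhibit the property of duality

• A square wave (rectangular window) in frequency corresponds to a sinc function in time, and vice versa
• Convolution in time = multiplication in frequency, and vice versa
• Proof with calculus, for x(t) = 1 when −1/2 ≤ t ≤ 1/2 and 0 elsewhere:

$\int_{-\infty}^{\infty} x(t)e^{-j\omega t}\,dt = \int_{-1/2}^{1/2} e^{-j\omega t}\,dt = -\frac{1}{j\omega}\, e^{-j\omega t}\Big|_{-1/2}^{1/2}$

$= -\frac{1}{j\omega}\left(e^{-j\omega/2} - e^{j\omega/2}\right) = \frac{1}{j\omega}\left(e^{j\omega/2} - e^{-j\omega/2}\right) = \frac{\sin(\omega/2)}{\omega/2}$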

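Complex DFT by Correlation

    // time holds N complex samples as interleaved pairs:
    // even indices = real part, odd indices = imaginary part.
    double[] dft(double[] time, int N) {
        double[] freq = new double[2 * N];
        for (int k = 0; k < N; k++) {
            for (int i = 0; i < N; i++) {
                // e^(-i*2*PI*k*i/N) = cos(angle) - i*sin(angle)
                double angle = 2 * Math.PI * k * i / N;
                double real = Math.cos(angle);
                double imag = -Math.sin(angle);
                // Complex multiply: (tr + i*ti) * (real + i*imag)
                freq[2*k]   += time[2*i] * real - time[2*i+1] * imag;
                freq[2*k+1] += time[2*i] * imag + time[2*i+1] * real;
            }
        }
        return freq;
    }

Note: even indices = real part, odd indices = imaginary part

• Complexity: O(N²) because of the double loop of N iterations each
• Example: For 512 samples, the inner loop body runs 262,144 times
• Evaluation: Too slow, but the FFT is O(N lg N)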

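The FFT Algorithm

• The FFT algorithm is based on divide-and-conquer

[Diagram: recursion tree in which a problem of size n splits into two of size n/2 and then four of size n/4; each level costs O(n) and there are O(log n) levels]

The running time complexity is O(n log n)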

Why do we need FFT?

• The correlation algorithm is O(N²)
• Too slow to be practical even on today's processors
• An optimized FFT is O(N lg N), which is orders of magnitude faster
• Assume 512 elements in a window
  – O(N): C * 512
  – O(N²): C * 512 * 512 = C * 262,144
  – O(N lg N): C * 512 * 9 = C * 4,608

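Theory for Optimization

Base case (N = 1): $F_0 = x[0]$

Recursive relationship: split the sum into even-indexed and odd-indexed samples.

$$
\begin{aligned}
F_k &= \sum_{t=0}^{N-1} x[t]\, e^{-i2\pi kt/N} \\
    &= \sum_{t=0}^{N/2-1} x[2t]\, e^{-i2\pi k(2t)/N} + \sum_{t=0}^{N/2-1} x[2t+1]\, e^{-i2\pi k(2t+1)/N} \\
    &= \sum_{t=0}^{N/2-1} x[2t]\, e^{-i2\pi kt/(N/2)} + \sum_{t=0}^{N/2-1} x[2t+1]\, e^{-i2\pi k(2t+1)/N} \\
    &= \sum_{t=0}^{N/2-1} x[2t]\, e^{-i2\pi kt/(N/2)} + e^{-i2\pi k/N} \sum_{t=0}^{N/2-1} x[2t+1]\, e^{-i2\pi kt/(N/2)} \\
    &= F_k^{\text{even}} + e^{-i2\pi k/N}\, F_k^{\text{odd}}
\end{aligned}
$$

Note: the work at each level is O(N); there are lg(N) levels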

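Simple Recursive FFT Solution

    // Complex is a helper class with plus, minus and times methods
    // (it is not part of the Java standard library).
    Complex[] fft(Complex[] x) {
        int N = x.length;
        Complex[] y = new Complex[N];
        if (N == 1) { y[0] = x[0]; return y; }      // base case

        // Split into even-indexed and odd-indexed samples.
        Complex[] even = new Complex[N/2];
        Complex[] odd = new Complex[N/2];
        for (int m = 0; m < N/2; m++) {
            even[m] = x[2*m];
            odd[m] = x[2*m+1];
        }
        Complex[] q = fft(even), r = fft(odd);

        // Combine: F[k] = Feven[k] + w^k * Fodd[k], with w^k = e^(-i*2*PI*k/N).
        for (int k = 0; k < N/2; k++) {
            double exp = -2 * k * Math.PI / N;
            Complex wk = new Complex(Math.cos(exp), Math.sin(exp));
            y[k] = q[k].plus(wk.times(r[k]));
            y[k+N/2] = q[k].minus(wk.times(r[k]));
        }
        return y;
    }

Note: $e^{-i2\pi k/N} = -e^{-i2\pi (k+N/2)/N}$, which is why the second half uses minus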

Inefficiencies

• The Complex class causes many jumps and puts pressure on the hardware cache

• Declaring and copying arrays at every step slows things down at least by half

• Repetitive calculations of sines and cosines are extremely slow

• A shift (N>>1) is faster than a division (N/2); the slide's estimate is ten times faster
• Creating an activation record for every recursive call adds significant overhead

The computations still are an order of magnitude slower than needed

Eliminating the Recursion

• The numbers in the rectangles are the array indices

• You see the original indices as we pass through each level of recursion

• Can you see a pattern ?

Level 0: 000 001 010 011 100 101 110 111
Level 1: 000 010 100 110 001 011 101 111
Level 2: 000 100 010 110 001 101 011 111

[Figure: butterfly algorithm]

Butterfly Code

    // Reorder x into bit-reversed index order.
    // swap(x, i, j) is a helper that exchanges elements i and j.
    int j = N >> 1, k;
    for (int i = 1; i < N - 1; i++) {
        if (i < j) { swap(x, i, j); }
        k = N >> 1;
        while (k >= 2 && j >= k) {
            j -= k;
            k >>= 1;
        }
        j += k;
    }

• Most significant bit: SwapBit(x, x + lg N)
• Second most significant bit: SwapBit(x, x + lg(N/2))
• Third most significant bit: SwapBit(x, x + lg(N/4))
• kth most significant bit: SwapBit(x, x + lg(N/2^(k-1)))

Flip bits from left to right
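The pattern in the index table is bit reversal: each index ends up at the position given by reading its bits backwards. As an alternative sketch (not the slide's counter-based loop), the same permutation can be written with Java's `Integer.reverse`. The class and method names are hypothetical, and the array length is assumed to be a power of 2.

```java
// Illustrative sketch: bit-reversal permutation using Integer.reverse.
final class BitReversalSketch {
    // Reverses the lowest `bits` bits of i, e.g. with bits = 3: 001 -> 100.
    static int reverseBits(int i, int bits) {
        return Integer.reverse(i) >>> (32 - bits);
    }

    // Reorders x in place into bit-reversed index order.
    // Assumes x.length is a power of 2.
    static void permute(double[] x) {
        int n = x.length;
        int bits = Integer.numberOfTrailingZeros(n);   // lg n for a power of 2
        for (int i = 0; i < n; i++) {
            int j = reverseBits(i, bits);
            if (i < j) {                               // swap each pair only once
                double tmp = x[i];
                x[i] = x[j];
                x[j] = tmp;
            }
        }
    }
}
```

Applied to the array 0, 1, ..., 7 this yields 0, 4, 2, 6, 1, 5, 3, 7, which matches the last row of the index table (000 100 010 110 001 101 011 111).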

Sin and Cosine Table Look Up

• $e^{i2\pi k/N} = \cos(2\pi k/N) + i \sin(2\pi k/N)$
• We can store the sines in an array (sinX[]):
  sin(2π·0/N), sin(2π·1/N), sin(2π·2/N), sin(2π·3/N), …, sin(2π(N-1)/N)
• cos(2πk/N) = sinX[(k+(N>>2))%N]

Compute the values ahead of time and save repetitive calculations
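A minimal sketch of such a table is shown below. The class name `SineTableSketch` is hypothetical, and the cosine lookup assumes N is divisible by 4 so that a quarter turn is a whole number of table entries.

```java
// Illustrative sketch: precomputed sine table with a cosine lookup
// based on cos(a) = sin(a + PI/2), i.e. a shift of N/4 entries.
final class SineTableSketch {
    final double[] sinX;
    final int n;

    // Assumes n is divisible by 4 (true for any power of 2 >= 4).
    SineTableSketch(int n) {
        this.n = n;
        this.sinX = new double[n];
        for (int k = 0; k < n; k++) {
            sinX[k] = Math.sin(2 * Math.PI * k / n);
        }
    }

    double sin(int k) { return sinX[k % n]; }

    double cos(int k) { return sinX[(k + (n >> 2)) % n]; }
}
```

The table costs N calls to `Math.sin` once, after which every twiddle factor in the FFT is a pair of array reads.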

Optimized FFT – after butterfly code

    // Perform the fft calculations.
    // Remember that complex numbers require pairs of doubles.
    int kInc = N >> 1;  // Number of 2PIk/N steps for odd/even entries.
    for (int stage = 1; stage <= M; stage++)        // M = lg N
    {
        int fftSubGroupGap = 2 << stage;            // 4, 8, 16, ... subgroup distance
        int gap = fftSubGroupGap >> 1;              // 2, 4, 8, ... odd/even distance

        // Outer loop: each sub-fft group; inner loop: combine group elements
        for (int even = 0; even < complex.length; even += fftSubGroupGap)
        {
            int k = 0;  // Index into the trigonometric lookup table.
            for (int element = even; element < (even + gap); element += 2)
            {
                // ***** See Next Slide *****
                k += kInc;  // position for next look up.
            }
        }
        kInc >>= 1;     // twice as many distinct angles at the next stage
    }

Multiplication Portion

    // Look up e^(-2PIki/N), avoiding trig calculations here.
    double realW = sines[(k + (N >> 2)) % N];   // cos(2PIk/N)
    double imagW = -sines[k % N];               // -sin(2PIk/N)

    // Complex multiplication of the odd entry of the subgroup
    // with (e^(-2PIi/N))^k = (cos(2PI/N) - i * sin(2PI/N))^k
    int j = element + gap;
    double tempReal = realW * complex[j] - imagW * complex[j+1];
    double tempImag = realW * complex[j+1] + imagW * complex[j];

    // Adjust the odd entry (subtract: the fft is periodic).
    complex[j] = complex[element] - tempReal;
    complex[j+1] = complex[element+1] - tempImag;

    // Adjust the even entry.
    complex[element] += tempReal;
    complex[element+1] += tempImag;

Final Notes

• Standard Fast Fourier Transform
  – Requires N to be a power of 2 for the recursion to work
  – Can pad the array with zeroes to extend it to a power of 2
• Can it work if N is not a power of 2?
  – Yes, but special slower processing is needed
• How do we know if it works?
  – For real input the magnitudes mirror around N/2: Point N/2-1 = Point N/2+1, Point N/2-2 = Point N/2+2, Point N/2-k = Point N/2+k, etc.
  – Note: Points 0 and N/2 have no mirror partner, so don't check these
  – The inverse FFT should restore the time domain signal
  – Compare to the slower correlation DFT calculation
  – Try some simple impulses and check the results
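These checks can be automated. The sketch below implements them against a slow correlation DFT, so the same helper can serve as the reference when testing an FFT. All names here are hypothetical, and the real and imaginary parts are kept in separate arrays for readability rather than interleaved as on the earlier slides.

```java
// Illustrative sketch of the verification checks; names are hypothetical.
final class DftCheckSketch {
    // Forward (sign = -1) or inverse (sign = +1) DFT by direct correlation, O(N^2).
    // Returns {real parts, imaginary parts}. The inverse divides by N.
    static double[][] transform(double[] re, double[] im, int sign) {
        int n = re.length;
        double[] outRe = new double[n], outIm = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double a = sign * 2 * Math.PI * k * t / n;
                double c = Math.cos(a), s = Math.sin(a);
                outRe[k] += re[t] * c - im[t] * s;
                outIm[k] += re[t] * s + im[t] * c;
            }
            if (sign > 0) { outRe[k] /= n; outIm[k] /= n; }
        }
        return new double[][] { outRe, outIm };
    }

    // For a real input signal, |X[N/2 - k]| should equal |X[N/2 + k]|.
    // Points 0 and N/2 have no partner and are skipped.
    static boolean magnitudesMirror(double[] re, double[] im, double tol) {
        int n = re.length;
        for (int k = 1; k < n / 2; k++) {
            double lo = Math.hypot(re[n / 2 - k], im[n / 2 - k]);
            double hi = Math.hypot(re[n / 2 + k], im[n / 2 + k]);
            if (Math.abs(lo - hi) > tol) return false;
        }
        return true;
    }
}
```

A typical check transforms a real signal, confirms the mirror property, applies the inverse, and compares the result with the original samples. An impulse at time 0 is another simple case: its spectrum should be 1 at every frequency.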