Digital Audio Signal Processing Lecture-2 Microphone Array Processing Marc Moonen Dept. E.E./ESAT-STADIUS, KU Leuven [email protected] homes.esat.kuleuven.be/~moonen

Digital Audio Signal Processing

Lecture-2

Microphone Array Processing

Marc Moonen Dept. E.E./ESAT-STADIUS, KU Leuven

[email protected]

homes.esat.kuleuven.be/~moonen/

Digital Audio Signal Processing Version 2015-2016 Lecture-2: Microphone Array Processing p. 2

Overview

• Introduction & beamforming basics– Data model & definitions

• Fixed beamforming– Filter-and-sum beamformer design – Matched filtering

• White noise gain maximization• Ex: Delay-and-sum beamforming

– Superdirective beamforming • Directivity maximization

– Directional microphones (delay-and-subtract)

• Adaptive Beamforming– LCMV beamformer– Generalized sidelobe canceler


Introduction

• Directivity pattern of a microphone– A microphone (*) is characterized by a `directivity pattern which

specifies the gain & phase shift that the microphone gives to a

signal coming from a certain direction (i.e. `angle-of-arrival’) – In general the directivity pattern is a function of frequency (ω) – In a 3D scenario `angle-of-arrival’

is azimuth + elevation angle– Will consider only 2D scenarios for

simplicity, with one angle-of arrival (θ),

hence directivity pattern is H(ω,θ) – Directivity pattern is fixed and defined

by physical microphone design

(*) We do digital signal prcessing, so this includes front-end filtering/A-to-D/..

|H(ω,θ)| for 1 frequency


Introduction

• Virtual directivity pattern – By weighting or filtering (=freq. dependent weighting) and then

summing signals from different microphones, a (software controlled) virtual directivity pattern (=weigthed sum of individual patterns) can be produced

– This assumes all microphones receive the same signals (so are all in the same position). However…

M

mmm HFH

1virtual ),().(),(

01000

20003000

045

90135

180

0

0.5

1

Angle (deg)

+:


+:

Introduction

• However, in a microphone array different microphones are in different positions/locations, hence also receive different signals

• Example : uniform linear array i.e. microphones placed on a line & uniform inter-micr. distances (d) & ideal micr. characteristics (p.9) For a far-field source signal (plane waveforms), each microphone receives the same signal, up to an angle-dependent delay… fs=sampling rate c=propagation speed

][][ 1 mm kyky

sm

m fc

d cos)(

dmdm )1(


+:

Introduction

• Beamforming = `spatial filtering’ based on

microphone characteristics (directivity patterns)

AND microphone array configuration (`spatial sampling’)

• Classification:

Fixed beamforming: data-independent, fixed filters Fm

e.g. delay-and-sum, filter-and-sum

Adaptive beamforming: data-dependent filters Fm

e.g. LCMV-beamformer, generalized sidelobe canceler


Introduction

• Background/history: ideas borrowed from antenna array design and processing for radar & (later) wireless communications

• Microphone array processing considerably more difficult than antenna array processing: – narrowband radio signals versus broadband audio signals– far-field (plane wavefronts) versus near-field (spherical wavefronts)– pure-delay environment versus multi-path environment

• Applications: voice controlled systems (e.g. Xbox Kinect), speech communication systems, hearing aids,…


Data model and definitions

Data model: source signal in far-field (see p.13 for near-field)

• Microphone signals are filtered versions of source signal S() at angle

• Stack all microphone signals (m=1..M) in a vector

d is `steering vector’• Output signal after `filter-and-sum’ is

TjM

j MeHeH )()(1 ).,(...).,(),( 1 d

)()}.,().({),().(),().(),(1

* SYFZ HHM

mmm dFYF

H instead of T for convenience (**)

)( ..),(),(

shift phase dep.-pos.

)(

pattern dir.

SeHY mjmm

)().,(),( SdY


Data model: source signal in far-field

• If all microphones have the same directivity pattern Ho(ω,θ), steering vector can be factored as…

• Will often consider arrays with ideal omni-directional microphones : Ho(ω,θ)=1 Example : uniform linear array, see p.5


)().,(),( SdY

positions spatial

)()(

pattern dir.

0 ...1.),(),( 2Tjj MeeH d

microphone-1 is used as a reference (=arbitrary)



Definitions: (1)

• In a linear array (p.5) : =90o=broadside direction = 0o =end-fire direction

• Array directivity pattern (compare to p.3) = `transfer function’ for source at angle ( -π<< π )

• Steering direction = angle with maximum amplification (for 1 freq.)

• Beamwidth (BW) = region around max with (e.g.) amplification > -3dB (for 1 freq.)

),().()(

),(),(

dFH

S

ZH

),(maxarg)(max H



Data model: source signal + noise• Microphone signals are corrupted by additive noise

• Define noise correlation matrix as

• Will assume noise field is homogeneous, i.e. all diagonal elements of noise correlation matrix are equal :

• Then noise coherence matrix is

TMNNN )(...)()()( 21 N

})().({)( Hnoise E NNΦ

iΦΦ noiseii , )()(

1....

....

....1

)(.)(

1)(

noise

noisenoise ΦΓ

)()().,(),( NdY S



Definitions: (2) • Array Gain = improvement in SNR for source at angle ( -π<< π )

• White Noise Gain =array gain for spatially uncorrelated noise

(e.g. sensor noise)

ps: often used as a measure for robustness• Directivity =array gain for diffuse noise (=coming from all directions)

DI and WNG evaluated at max is often used as a performance criterion

c

ddf ijsdiffuseij

)(sinc)(

skip this formula)(.).(

),().(),(

2

FF

dFdiffusenoise

H

H

DI

)().(

),().(),(

2

FF

dFH

H

WNG Iwhite

noise

|signal transfer function|^2

|noise transfer function|^2)().().(

),().(),(

2

FΓF

dF

noiseH

H

input

output

SNR

SNRG

PS: ω is rad/sample ( -Π≤ω≤Π ) ω fs is rad/sec


• Far-field assumptions not valid for sources close to microphone array– spherical wavefronts instead of planar waveforms– include attenuation of signals– 2 coordinates ,r (=position q) instead of 1 coordinate (in 2D case)

• Different steering vector (e.g. with Hm(ω,θ)=1 m=1..M) :

PS: Near-field beamforming

TjM

jj Meaeaea )()(2

)(1

21),( qqqqd ),( d

s

mref

m fc

pqpqq

)(

with q position of source pref position of reference microphone pm position of mth microphone

m

ref

mapq

pq

e

e=1 (3D)…2 (2D)


PS: Multipath propagation

• In a multipath scenario, acoustic waves are reflected against walls, objects, etc..

• Every reflection may be treated as a separate source (near-field or far-field)

• A more realistic data model is then..

with q position of source and Hm(ω,q), complete transfer function from source position to m-the microphone (incl. micr. characteristic, position, and multipath propagation)

`Beamforming’ aspect vanishes here, see also Lecture-3

(`multi-channel noise reduction’)

)()().,(),( NqdqY S

TMHHH ),(...),(),(),( 21 qqqqd


Overview








Filter-and-sum beamformer design

• Basic: procedure based on page 10

Array directivity pattern to be matched to given (desired) pattern over frequency/angle range of interest

• Non-linear optimization for FIR filter design (=ignore phase response)

• Quadratic optimization for FIR filter design (=co-design phase response)

),().()(

),(),(

dFH

S

ZH

),( dH

1

0,1 .)( , )(...)()(

N

n

jnnmm

TM efFFF F

ddHH dNnMmf nm ),(),(min

2

1..0,..1,,


2

1..0,..1,,



• Quadratic optimization for FIR filter design (continued)

With

optimal solution is


2

1..0,..1,,

),(

).1(

.0

1

11,0,

),(.)(.),(.)(...)(),().(),(

... , ...

d

dfddF

ffffNMxM

jN

j

MxMTH

MHH

TTM

TTNmmm

e

e

IFFH

ff

ddHdd d

Hoptimal ),().,( , ),().,( , . *1 dpddQpQf

Kronecker product



• Design example

M=8Logarithmic arrayN=50fs=8 kHz

01000

20003000

045

90135

180

0

0.5

1

Angle (deg)

Frequency (Hz)


Matched filtering: WNG maximization


• Maximize White Noise Gain (WNG) for given steering angle ψ

• A priori knowledge/assumptions:– angle-of-arrival ψ of desired signal + corresponding steering vector– noise scenario = white

})().(

),().(arg{max)},(arg{max)(

2

)()(MF

FF

dFF FF H

H

WNG


Matched filtering: WNG maximization

• Maximization in

is equivalent to minimization of noise output power (under

white input noise), subject to unit response for steering angle (**)

• Optimal solution (`matched filter’) is

• [FIR approximation]

),(.),(

1)( 2

MF

dd

F

1),().( s.t. ),().(min )( dFFFFHH

})().(

),().(arg{max)(

2

)(MF

FF

dFF F H

H

dNnMmf nm

2MF1..0,..1, )()(min

,FF


Matched filtering example: Delay-and-sum

• Basic: Microphone signals are delayed and then summed together

• Fractional delays implemented with truncated interpolation filters (=FIR)

• Consider array with ideal omni-directional micr’s

Then array can be steered to angle :

Hence (for ideal omni-dir. micr.’s) this is matched filter solution

1),( H

d

cos)1( dm

d

2

m

1

M

1

M

eF

mj

m

)(

M

mmm ky

Mkz

1

][.1

][

)(mm M

),()(

dF



• Array directivity pattern H(ω,θ):

=destructive interference

=constructive interference• White noise gain :

(independent of ω)

For ideal omni-dir. micr. array, delay-and-sum beamformer provides

WNG equal to M for all freqs. (in the direction of steering angle ψ).

)( 1),(

1),(

),().,(1

),(

max

H

HM

H H dd

MWNGH

H

..)().(

),().(),(

2

FF

dF

ideal omni-dir. micr.’s


02000

40006000

8000 045

90135

180

0.2

0.4

0.6

0.8

1

Angle (deg)Frequency (Hz)

• Array directivity pattern H(,) for uniform linear array:

H(,) has sinc-like shape and is frequency-dependent


)2/sin(

)2/sin(

),(

2/

2/1

)cos(cos)1(

j

jM

M

m

fc

dmj

e

Me

eHs

-20

-10

0

90

270

180 0

Spatial directivity pattern for f=5000 Hz

M=5 microphones

d=3 cm inter-microphone distance

=60 steering angle

fs=16 kHz sampling frequency

=endfire

1),( H

=60wavelength=4cm



0

2000

4000

6000

8000 050

100150

0.20.40.60.8

1

Angle (deg)


For an ambiguity, called spatial aliasing, occurs.

This is analogous to time-domain aliasing where now the spatial sampling (=d) is too large. Aliasing does not occur (for any ) if

M=5, =60, fs=16 kHz, d=8 cm

)cos1.(

d

cf

2.2min

max

f

c

f

cd

s

)cos1.(.. and 0for occurs 2 then

2 if 3)

)cos1.(.. and for occurs 2 then

2 if 2)

) all(for for 0 )1

integer for 2 1),(

Details...

d

cfπ

d

cfπ

pπ.pγiffH


)cos1.(

d

cf



• Beamwidth for a uniform linear array:

hence large dependence on # microphones, distance (compare p.24 & 25) and frequency (e.g. BW infinitely large at DC)

• Array topologies:– Uniformly spaced arrays– Nested (logarithmic) arrays (small d for high , large d for small )– 2D- (planar) / 3D-arrays

with e.g. =1/sqrt(2) (-3 dB)

d

2d

4d

sec

)1(96

dMcBW



Super-directive beamforming : DI maximization


• Maximize Directivity (DI) for given steering angle ψ

• A priori knowledge/assumptions:– angle-of-arrival ψ of desired signal + corresponding steering vector– noise scenario = diffuse

})(.).(

),().(arg{max)},(arg{max)(

2

)()(SD

FF

dFF FF diffuse

noiseH

H

DI


• Maximization in

is equivalent to minimization of noise output power (under

diffuse input noise), subject to unit response for steering angle (**)

• Optimal solution is

• [FIR approximation]

})(.).(

),().(arg{max)(

2

)(SD

FF

dFF F diffuse

noiseH

H

1),().( s.t. ),().().(min )( dFFFFHdiffuse

noiseH

),(.)}(.{1

)( 1

),(.1)}(.{),(

SD

dFdd

diffusenoisediffuse

noiseH

dNnMmf nm

2SD1..0,..1, )()(min

,FF



• Directivity patterns for end-fire steering (ψ=0):

Superdirective beamformer has highest DI, but very poor WNG (at low frequencies, where diffuse noise coherence matrix becomes ill-conditioned)

hence problems with robustness (e.g. sensor noise) !

-20

-10

0

90

270

180 0

Superdirective beamformer (f=3000 Hz)

-20

-10

0

90

270

180 0

Delay-and-sum beamformer (f=3000 Hz)

M=5 d=3 cmfs=16 kHz

Maximum directivity=M.M obtained for end-fire steering and for frequency->0 (no proof)


0 2000 4000 6000 80000

5

10

15

20

25

Frequency (Hz)

Dir

ecti

vit

y (

lin

ear)

SuperdirectiveDelay-and-sum

0 2000 4000 6000 8000-60

-50

-40

-30

-20

-10

0

10

Frequency (Hz)

Whit

e n

ois

e g

ain

(d

B)

SuperdirectiveDelay-and-sum

WNG=M= 5

PS: diffuse noise ≈white noise for high frequencies(cfr. ωΠ and c/fs=λmin/2≈min(dj-di) in diffuse noise coherence matrix)

DI=WNG=5

DI=M 2=25



• First-order differential microphone = directional microphone 2 closely spaced microphones, where one microphone is delayed (=hardware) and whose outputs are then subtracted from each other

• Array directivity pattern:

– First-order high-pass frequency dependence– P() = freq.independent (!) directional response– 0 1 1 : P() is scaled cosine, shifted up with 1

such that max = 0o (=end-fire) and P(max )=1

d

+

_

)cos

(1),( c

dj

eH

d/c <<, <<

cd /1

cos)1()( 11 P

Differential microphones : Delay-and-subtract


Differential microphones : Delay-and-subtract

• Types: dipole, cardioid, hypercardioid, supercardioid (HJ84)

=endfire

=broadside

Dipole:

1= 0 (=0) zero at 90o

DI=4.8dB

Cardioid:

1= 0.5 zero at 180o

DI=4.8dB

Supercardioid:

1=

zero at 125o, DI=5.7 dB highest front-to-back ratio

35.02/)13(

Hypercardioid:

1= 0.25

zero at 109o highest DI=6.0dB


Overview








LCMV beamformer

• Adaptive filter-and-sum structure:– Aim is to minimize noise output power, while maintaining a chosen

response in a given look direction

(and/or other linear constraints, see below). – This is similar to operation of a superdirective array (in diffuse

noise), or delay-and-sum (in white noise), cfr. (**) p.20 & 27, but now noise field is unknown !

– Implemented as adaptive FIR filter :

]1[...]1[][][ T Nkykykyk mmmmy

M

mm

Tm

T kkkz1

][][][ yfyf

TTM

TT kkkk ][][][][ 21 yyyy

TTM

TT ffff 21 ... T

1,1,0, Nmmmm ffff+:


LCMV beamformer

LCMV = Linearly Constrained Minimum Variance– f designed to minimize output power (variance) (read on..) z[k] :

– To avoid desired signal cancellation, add (J) linear constraints

Example: fix array response in look-direction ψ for sample freqs ωi

For sufficiently large J constrained output power minimization approximately corresponds to constrained output noise power minimization(why?)

– Solution is (obtained using Lagrange-multipliers, etc..):

fRfff

][min][min 2 kkzE yyT

JJMNMNT bCfbfC ,, .

bCRCCRf111 ][][

kk yyT

yyopt


Generalized sidelobe canceler (GSC)

GSC = Adaptive filter formulation of the LCMV-problem

Constrained optimisation is reformulated as a constraint pre-processing, followed by an unconstrained optimisation, leading to a simple adaptation scheme– LCMV-problem is

– Define `blocking matrix’ Ca, ,with columns spanning the null-space of C

– Define ‘quiescent response vector’ fq satisfying constraints

– Parametrize all f’s that satisfy constraints (verify!)

I.e. filter f can be decomposed in a fixed part fq and a variable part Ca. fa

bfCfRff

Tyy

T k ,][min

)( JMNMNa

C

JJMNMN bCf ,,


Generalized sidelobe canceler

GSC = Adaptive filter formulation of the LCMV-problem

Constrained optimisation is reformulated as a constraint pre-processing, followed by an unconstrained optimisation, leading to a simple adaptation scheme– LCMV-problem is

– Unconstrained optimization of fa : (MN-J coefficients)

bfCfRff

Tyy

T k ,][min JJMNMN bCf ,,

).].([.).(min aaqyyT

aaq ka

fCfRfCff


GSC (continued)

– Hence unconstrained optimization of fa can be implemented as an adaptive filter (adaptive linear combiner), with filter inputs (=‘left- hand sides’) equal to and desired filter output (=‘right-hand side’) equal to

– LMS algorithm :


}))..][().][({min...).].([.).(min

2][~][

aa

kd

qaaqyyT

aaq

T

aakkEk fCyfyfCfRfCf

ky

TTff

][~ ky

][kd

])[..][][..(][..][]1[

][~][][~

kkkkkk a

k

aT

kd

Tq

k

Taaa

T

fCyyfyCff

yy



GSC then consists of three parts:

• Fixed beamformer (cfr. fq ), satisfying constraints but not yet minimum

variance), creating `speech reference’ • Blocking matrix (cfr. Ca), placing spatial nulls in the direction of the

speech source (at sampling frequencies) (cfr. C’.Ca=0), creating `noise

references’ • Multi-channel adaptive filter (linear combiner) your favourite one, e.g. LMS

][kd

][~ ky



A popular & significantly cheaper GSC realization is as follows

Note that some reorganization has been done: the blocking matrix now generates (typically) M-1 (instead of MN-J) noise references, the multichannel adaptive filter performs FIR-filtering on each noise reference (instead of merely scaling in the linear combiner). Philosophy is the same, mathematics are different (details on next slide).

Postproc

y1

yM



• Math details: (for Delta’s=0)

][.

~][~

]1[~...]1[~][~][~

~...00

:::

0...~

0

0...0~

][...][][][

]1[...]1[][][

][.][.][~

:1:1

:1:1:1

permuted,

21:1

:1:1:1permuted

permutedpermuted,

kk

Lkkkk

kykykyk

Lkkkk

kkk

MTaM

TTM

TM

TM

Ta

Ta

Ta

Ta

TMM

TTM

TM

TM

Ta

Ta

yCy

yyyy

C

C

C

C

y

yyyy

yCyCy

=input to multi-channel adaptive filter

select `sparse’ blocking matrix such that :

=use this as blocking matrix now



• Blocking matrix Ca (cfr. scheme page 24)

– Creating (M-1) independent noise references by placing spatial nulls in look-direction

– different possibilities

(broadside steering)

• Problems of GSC:– impossible to reduce noise from look-direction– reverberation effects cause signal leakage in noise references adaptive filter should only be updated when no speech is present to avoid

signal cancellation!

1111TC

1001

0101

0011TaC

1111

1111

1111TaC 0

20004000

60008000 0 45 90 135 180

0

0.5

1

1.5

Angle (deg)

Griffiths-Jim

Walsh

Documents

Digital Audio Signal Processing Lecture-2 Microphone Array Processing Marc Moonen Dept. E.E./ESAT-STADIUS, KU Leuven [email protected] homes.esat.kuleuven.be/~moonen