
  • Linear and Parametric Microphone Array Processing

    Part 4 - Parametric Spatial Processing

Emanuël A. P. Habets¹ and Sharon Gannot²

¹ International Audio Laboratories Erlangen, Germany

A joint institution of the University of Erlangen-Nuremberg and Fraunhofer IIS

² Faculty of Engineering, Bar-Ilan University, Israel

    ICASSP 2013, Vancouver, Canada

  • Overview

    1 Overview

    2 Sound Field and Signal Models

    3 Parameter Estimation

    4 Estimation of the Sound Pressures

    5 Applications and Examples

    6 Summary


  • 1. Overview

Parametric spatial audio processing makes use of an efficient parametric representation of the sound field. A major advantage over classical spatial processing is that only a small number of parameters is required.

    Examples:

    - Directional Audio Coding (DirAC)∗
    - High Angular Resolution Planewave Expansion (HARPEX)∗
    - Computational Auditory Scene Analysis (CASA)
    - Some well-known dereverberation techniques that make use of the reverberation time and the signal-to-reverberation ratio.

    Applications:

    - Spatial audio recording, coding and reproduction
    - Source separation / source extraction / noise reduction / dereverberation
    - Acoustic scene analysis / localization

    ∗Patented technologies.


  • 1. Overview

[Block diagram: signal(s) and parameters flow from an Analysis stage through Transmission / Storage / Manipulation to a Synthesis stage.]

    Figure: Block diagram of a parametric spatial audio processing system.


  • 1. Overview

[Analysis stage: parameter estimation yields parameters and a reference signal, which are (optionally) coded for transmission or storage. Processing stage: a single-channel filter or TF mask, driven by the parameters, is applied to the reference signal to produce the processed signal.]

    Figure: An example of the analysis stage (left) and the processing stage (right) using a single reference signal.


  • Overview

    1 Overview

    2 Sound Field and Signal Models

    3 Parameter Estimation

    4 Estimation of the Sound Pressures

    5 Applications and Examples

    6 Summary


  • 2. Sound Field and Signal Models

The sound pressure at a position $\mathbf{p}_i$, time frame $m$ and discrete frequency $k$ can be decomposed into a direct sound component and a diffuse sound component:

    $S_i(k,m) \triangleq S(k,m,\mathbf{p}_i) = S_\mathrm{dir}(k,m,\mathbf{p}_i) + S_\mathrm{diff}(k,m,\mathbf{p}_i). \qquad (1)$

    While other models have been presented in the literature, we will focus on the model in (1) in this tutorial.

    We assume that

    - The direct components are sparse in the time-frequency domain.
    - The directional sound components are produced by i) a plane wave or ii) an isotropic point-like source (IPLS).
    - The direct and diffuse components are uncorrelated, i.e., $E\{S_\mathrm{dir}(k,m,\mathbf{p}_i)\, S^*_\mathrm{diff}(k,m,\mathbf{p}_i)\} = 0$.
    - $E\{|S_\mathrm{diff}(k,m,\mathbf{p}_i)|^2\} = P_\mathrm{diff}(k,m)$ for all positions $i$.
    - Optional: $E\{|S_\mathrm{dir}(k,m,\mathbf{p}_i)|^2\} = E\{|S_\mathrm{dir}(k,m,\mathbf{p}_j)|^2\}$ when $\|\mathbf{p}_i - \mathbf{p}_j\|$ is small.


  • 2. Sound Field and Signal Models

The directional sound component $S_\mathrm{dir}(k,m,\mathbf{p}_i)$ can be expressed as:

    1) An isotropic point-like source:

    $S_\mathrm{dir}(k,m,\mathbf{p}_i) = H_\mathrm{dir}(k,\mathbf{p}_i,\mathbf{p}_s)\,\sqrt{P_s(k,m,\mathbf{p}_s)}\; e^{j\phi_0(k,m,\mathbf{p}_s)},$

    where $H_\mathrm{dir}(k,\mathbf{p}_i,\mathbf{p}_s)$ describes the transfer function from the source position $\mathbf{p}_s(k,m)$ to position $\mathbf{p}_i$, and $P_s(k,m,\mathbf{p}_s)$ is the power of the source.

    2) A plane wave:

    $S_\mathrm{dir}(k,m,\mathbf{p}_i) = e^{-j\mu(k,\phi_s)}\,\sqrt{P_s(k,m)}\; e^{j\phi_0(k,m)},$

    where $\mu(k,\phi_s)$ denotes the spatial frequency, which depends on the direction of arrival (DOA) $\phi_s(k,m)$.

    The $i$-th microphone signal can be expressed as

    $X_i(k,m) \triangleq X(k,m,\mathbf{p}_i) = S(k,m,\mathbf{p}_i) + V(k,m,\mathbf{p}_i),$

    where $V$ denotes the noise received at position $\mathbf{p}_i$, and $S$ and $V$ are assumed to be uncorrelated, i.e., $E\{S(k,m,\mathbf{p}_i)\,V^*(k,m,\mathbf{p}_j)\} = 0$ for all $i$, $j$, $k$ and $m$.
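
    As a toy illustration of model (1) and the plane-wave case above, the following sketch synthesizes single-frame microphone coefficients for a uniform linear array. The geometry, the powers, and the convention $\mu(k) = \frac{2\pi k f_s}{K}\frac{\Delta\cos\phi_s}{c}$ (including the sampling rate $f_s$) are illustrative assumptions, and the diffuse term is simplified to spatially uncorrelated Gaussian noise rather than a field with the correct spatial coherence.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical setup: M mics, K subbands, spacing delta, plane wave from phi_s.
    M, K, fs, c = 4, 512, 16000, 343.0
    delta, phi_s = 0.02, np.deg2rad(60)
    P_s, P_diff, P_v = 1.0, 0.1, 0.01            # direct, diffuse and noise powers

    k = np.arange(1, K // 2)                     # frequency bin indices
    mu = 2 * np.pi * k * fs / K * delta * np.cos(phi_s) / c   # spatial frequency per bin
    d = np.exp(-1j * np.outer(np.arange(M), mu))              # propagation vector d(k)

    phi0 = rng.uniform(0, 2 * np.pi, k.size)     # random phase of the direct sound
    S_dir = d * np.sqrt(P_s) * np.exp(1j * phi0)

    def cgauss(power, shape):
        """Circularly symmetric complex Gaussian with the given power."""
        return np.sqrt(power / 2) * (rng.standard_normal(shape)
                                     + 1j * rng.standard_normal(shape))

    S_diff = cgauss(P_diff, d.shape)             # simplified diffuse component
    V = cgauss(P_v, d.shape)                     # sensor noise
    X = S_dir + S_diff + V                       # microphone signals X_i(k, m), one frame
    ```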


  • Overview

    1 Overview

    2 Sound Field and Signal Models

3 Parameter Estimation
       Direction of Arrival Estimation
       Position Estimation
       Signal-to-Diffuse Ratio
       Diffuseness Estimation

    4 Estimation of the Sound Pressures

    5 Applications and Examples

    6 Summary


  • 3. Parameter Estimation

- Parameter estimation is commonly performed in the time-frequency domain (e.g., the short-time Fourier transform (STFT) domain or a perceptually motivated transform domain).

    - It is assumed that the sound sources do not overlap in the time-frequency domain. See also [Rickard and Yilmaz, 2002], in which the W-disjoint orthogonality of speech is studied.

    - We desire high temporal, spatial and spectral resolutions.

    - Parameters:
      - Direction of arrival
      - Position
      - Signal-to-diffuse ratio (SDR)
      - Diffuseness

    - In the context of CASA, the following parameters are used:
      - Interaural Level Differences
      - Interaural Time Differences / Interaural Phase Differences
      - Interaural Coherence


  • 3.1 Direction of Arrival Estimation

- The active sound intensity vector describes the direction and magnitude of the net flow of energy characterizing the sound field at the measurement position.

    - The instantaneous active sound intensity vector is given by

      $\mathbf{i}_a(k,m,\mathbf{p}_i) \propto \mathrm{Re}\{X(k,m,\mathbf{p}_i)\,\mathbf{u}^*(k,m,\mathbf{p}_i)\},$

      where $X(k,m,\mathbf{p}_i)$ is the sound pressure and $\mathbf{u}(k,m,\mathbf{p}_i)$ is a vector containing the particle velocities in the x, y and z dimensions.

    - Signals proportional to the sound pressure and particle velocity (a.k.a. B-format signals) can be measured using:
      - An acoustic vector sensor
      - A sound field microphone
      - A circular microphone array
      - A spherical microphone array


  • 3.1 Direction of Arrival Estimation

- The instantaneous DOA can be expressed by means of the unit vector $\mathbf{e}_\mathrm{DOA}$, which points towards the direction the sound is coming from, namely

      $\mathbf{e}_i(k,m) = -\dfrac{\mathbf{i}_a(k,m,\mathbf{p}_i)}{\|\mathbf{i}_a(k,m,\mathbf{p}_i)\|}.$
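
    A minimal sketch of this estimator, assuming the pressure spectrum X and particle-velocity spectra U per time-frequency bin are already available (the proportionality constant of the intensity cancels in the normalization):

    ```python
    import numpy as np

    def doa_from_intensity(X, U):
        """DOA unit vectors from pressure X (K, N) and particle velocity U (3, K, N).

        Implements i_a ∝ Re{X u*} followed by e = -i_a / ||i_a||.
        """
        ia = np.real(X[None] * np.conj(U))               # active intensity, up to a constant
        norm = np.linalg.norm(ia, axis=0, keepdims=True)
        return -ia / np.maximum(norm, 1e-12)             # unit vectors per (k, m)

    # Azimuth per time-frequency bin from the x and y components:
    # e = doa_from_intensity(X, U); azimuth = np.arctan2(e[1], e[0])
    ```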

- It is well known that the response of differential arrays deviates from the desired figure-of-eight response. The effect of such deviations in the context of DirAC parameter estimation was studied in [Kallinger et al., 2008]. In the absence of noise, it was shown that the relation between the observed azimuth $\hat\phi$ and the corresponding true azimuth $\phi$ can be formulated analytically:

      $\hat\phi = \arctan\left(\dfrac{\sin\left(\frac{2\pi k r}{Kc}\sin\phi\right)}{\sin\left(\frac{2\pi k r}{Kc}\cos\phi\right)}\right),$

      where $r$ denotes the radius of the array, $c$ denotes the speed of sound, and $K$ denotes the number of subbands.


  • 3.1 Direction of Arrival Estimation

Figure: Instantaneous DOA estimates obtained using a circular array ($r = 2.65$ cm and $M = 4$); panels show Source A, Source B, and A + B. Source A was located at 90° and source B at 120°.


  • 3.1 Direction of Arrival Estimation

Figure: Illustration of the bias of azimuth estimation based on active sound intensity vectors (cylindrical array: $r = 2.65$ cm and $M = 4$); estimated azimuth plotted against true azimuth (0° to 90°), with curves for 4, 5, 6 and 7 kHz. The solid gray line corresponds to an unbiased angle estimation.


  • 3.1 Direction of Arrival Estimation

- We can estimate the instantaneous DOA using, for example, the ESPRIT (Estimation of Signal Parameters via Rotational Invariance Techniques) algorithm.

    Figure: Two subarrays of a uniform linear array with microphones $1,\ldots,M$; the direct sound $S_\mathrm{dir}$ arrives from angle $\phi$.

    - A unit vector pointing towards the direct sound source is given by

      $\mathbf{e}_i(k,m) = \begin{bmatrix}\cos[\phi(k,m)]\\ \sin[\phi(k,m)]\end{bmatrix}.$


  • 3.1 Direction of Arrival Estimation

- Assuming a sound source at a fixed position, we can express the direct components received by $M$ microphones as:

      $\mathbf{s}_\mathrm{dir}(k,m) = \mathbf{d}(k,m)\,\sqrt{P_s(k,m)}\; e^{j\phi_0(k,m)},$

      where $\mathbf{d}(k,m) = \begin{bmatrix}1 & e^{-j\mu(k,m)} & \cdots & e^{-j(M-1)\mu(k,m)}\end{bmatrix}^T$ with

      $\mu(k,m) = \dfrac{2\pi k}{Kc}\,\Delta\cos[\phi(k,m)]$

      and $K$ denotes the number of subbands.

    - The power spectral density (PSD) matrix (in the absence of noise) is given by

      $\Phi_{\mathbf{s}_\mathrm{dir}} = E\{\mathbf{s}_\mathrm{dir}\mathbf{s}_\mathrm{dir}^H\} = P_s\begin{bmatrix}1 & e^{j\mu} & \cdots & e^{j(M-1)\mu}\\ e^{-j\mu} & 1 & \cdots & e^{-j(M-2)\mu}\\ \vdots & & \ddots & \vdots\\ e^{-j(M-1)\mu} & e^{-j(M-2)\mu} & \cdots & 1\end{bmatrix} = P_s\begin{bmatrix}\mathbf{a}_1\\ \mathbf{A}\\ \mathbf{a}_2\end{bmatrix},$

      where $\mathbf{a}_1$ denotes the first row, $\mathbf{a}_2$ the last row, and $\mathbf{A}$ the remaining rows of the matrix.


  • 3.1 Direction of Arrival Estimation

- Using the two subarrays, we can now form the following expression:

      $\begin{bmatrix}\mathbf{a}_1\\ \mathbf{A}\end{bmatrix}\rho = \begin{bmatrix}\mathbf{A}\\ \mathbf{a}_2\end{bmatrix},$

      where $\rho$ is a complex scalar.

    - We can, for example, solve for $\rho$ in the least-squares sense, such that the estimated DOA is given by (a sketch follows below)

      $\phi(k,m) = \arccos\left(\dfrac{cK\,\angle\rho_\mathrm{LS}(k,m)}{2\pi k\Delta}\right).$

    - Alternative estimators (some of which require the computation of the eigenvalue decomposition of the PSD matrix) are available [van Trees, 2002].
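
    The following sketch implements the least-squares step for one subband, given an estimated PSD matrix. The sign of the recovered spatial frequency depends on the steering-vector convention and may need flipping for a given array; this is an assumption of the sketch, not part of the slide's derivation.

    ```python
    import numpy as np

    def esprit_doa(Phi, k, K, delta, c=343.0):
        """Least-squares ESPRIT sketch for one subband, following the slides.

        Phi   : (M, M) PSD matrix of the direct sound at bin k
        K     : number of subbands; delta : microphone spacing [m]
        """
        B1 = Phi[:-1, :].ravel()                  # rows [a1; A] (first subarray)
        B2 = Phi[1:, :].ravel()                   # rows [A; a2] (second subarray)
        rho = np.vdot(B1, B2) / np.vdot(B1, B1)   # scalar LS solution of B1*rho = B2
        mu = -np.angle(rho)                       # since B2 = exp(-j*mu) * B1
        cos_phi = np.clip(mu * K * c / (2 * np.pi * k * delta), -1.0, 1.0)
        return np.arccos(cos_phi)
    ```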


  • 3.1 Direction of Arrival Estimation

- The narrowband DOA estimators provide incorrect results above the spatial aliasing frequency, which is given by

      $f_a = \sin\left(\dfrac{\pi}{M}\right)\dfrac{c}{r}\ \mathrm{Hz}$

      for a uniform circular microphone array ($r$ denotes the radius of the array and $c$ denotes the speed of sound) and

      $f_a = \dfrac{1}{2}\,\dfrac{c}{\Delta}\ \mathrm{Hz}$

      for a uniform linear microphone array ($\Delta$ denotes the inter-microphone distance).

    - To overcome this problem, an envelope-based DOA estimator was recently presented in [Kratschmer et al., 2012]. The estimator can be used to reliably estimate the DOA above the spatial aliasing frequency.

    - In particular, the time delay between the Hilbert envelopes of two subband microphone signals is used to estimate the DOA.
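
    For quick sanity checks, both aliasing limits can be evaluated directly. The circular-array expression is transcribed from the reconstructed formula above (so it inherits any uncertainty in that reconstruction), and the numbers below are merely illustrative:

    ```python
    import numpy as np

    c = 343.0   # speed of sound [m/s]

    def f_alias_ula(delta):
        """Spatial aliasing frequency of a uniform linear array with spacing delta [m]."""
        return c / (2.0 * delta)

    def f_alias_uca(r, M):
        """Spatial aliasing frequency of a uniform circular array (radius r [m], M mics),
        per the slide's expression f_a = sin(pi/M) * c / r."""
        return np.sin(np.pi / M) * c / r

    print(f_alias_ula(0.02))        # 2 cm spacing -> 8575 Hz
    print(f_alias_uca(0.0265, 4))   # r = 2.65 cm, M = 4 -> approx. 9.2 kHz
    ```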


  • 3.1 Direction of Arrival Estimation

Figure: Estimated DOA over time and frequency for (a) ESPRIT and (b) ESPRIT combined with the envelope-based DOA estimator above the spatial aliasing frequency, for a recorded reverberant double-talk scenario [Kratschmer et al., 2012]. STFT analysis with a 1024-point STFT and 50% overlap; sampling frequency $f_s = 44.1$ kHz.


  • 3.2 Position Estimation

- We can use two or more microphone arrays with known relative positions and orientations.

    - Step 1: At each array, the direction of arrival (DOA) is determined for each time-frequency instance $(k,m)$.

    - Step 2: The position (w.r.t. a reference coordinate system) of each IPLS is found via triangulation; in this example, using the fact that $\mathbf{p}_1 + \|\mathbf{d}_1\|\,\mathbf{e}_1 = \mathbf{p}_2 + \|\mathbf{d}_2\|\,\mathbf{e}_2$.

    Figure: Position estimation using the DOAs estimated by two arrays.
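
    A minimal 2-D triangulation sketch for Step 2, assuming the array positions p1, p2 and unit DOA vectors e1, e2 (pointing from each array towards the source) are given. Noisy DOA estimates make the two rays skew, so a least-squares intersection is used:

    ```python
    import numpy as np

    def triangulate(p1, e1, p2, e2):
        """Least-squares intersection of the rays p1 + t1*e1 and p2 + t2*e2 (2-D)."""
        A = np.column_stack((e1, -e2))                  # solve [e1 -e2][t1 t2]^T = p2 - p1
        t, *_ = np.linalg.lstsq(A, p2 - p1, rcond=None)
        return p1 + t[0] * e1                           # estimated IPLS position

    # Arrays at (0,0) and (1,0); source seen at 45 and 135 degrees -> (0.5, 0.5):
    e = lambda a: np.array([np.cos(a), np.sin(a)])
    print(triangulate(np.zeros(2), e(np.pi / 4), np.array([1.0, 0.0]), e(3 * np.pi / 4)))
    ```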


  • 3.3 Signal-to-Diffuse Ratio

- The signal-to-diffuse ratio (SDR) is defined as

      $\Gamma(k,m,\mathbf{p}_i) = \dfrac{P_\mathrm{dir}(k,m,\mathbf{p}_i)}{P_\mathrm{diff}(k,m,\mathbf{p}_i)},$

      where $P_\mathrm{dir}$ is the power of the direct component and $P_\mathrm{diff}$ is the power of the diffuse component.

    - Different estimation methods have been proposed using:
      - Omnidirectional microphones, assuming DOA = 90° (real part of the complex coherence) [Jeub et al., 2011]
      - Omnidirectional microphones (complex coherence) [Thiergart et al., 2012b]
      - Virtual first-order microphones (complex coherence) [Thiergart et al., 2011]
      - First-order directional microphones (complex coherence) [Thiergart et al., 2012a]


  • 3.3 Signal-to-Diffuse Ratio

- For two omnidirectional microphones and $P_\mathrm{diff} = P_\mathrm{diff}(\mathbf{p}_i)\ \forall\, \mathbf{p}_i$, an estimate of the SDR is given by [Thiergart et al., 2012b]

      $\hat\Gamma(k,m) = \mathrm{Re}\left\{\dfrac{\gamma_\mathrm{diff}(k) - \hat\gamma_\mathrm{sig}(k,m)}{\hat\gamma_\mathrm{sig}(k,m) - e^{j\hat\mu(k,m)}}\right\},$

      where $\gamma_\mathrm{diff}(k)$ is the theoretical (or measured) complex spatial coherence between the two sensors in a purely diffuse field, $\gamma_\mathrm{sig}(k,m)$ is the complex spatial coherence between $S_i$ and $S_j$, and $\hat\mu(k,m)$ is the estimated spatial frequency of the direct signal components.

    - In [Thiergart et al., 2012b], we proposed the following estimator for $\mu(k,m)$:

      $\hat\mu(k,m) = \angle\psi_{ij}(k,m),$

      where $\psi_{ij}(k,m) = E\{X_i(k,m)X_j^*(k,m)\}$ is the cross-PSD.
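
    A sketch of this estimator for one subband, with the expectations approximated by recursive averaging over time frames; the smoothing constant, the regularization, and the clipping of negative estimates are practical assumptions, not part of the estimator itself:

    ```python
    import numpy as np

    def estimate_sdr(X1, X2, gamma_diff, alpha=0.9, eps=1e-12):
        """SDR sketch after [Thiergart et al., 2012b] for one subband over time.

        X1, X2     : STFT coefficients of the two omnidirectional mics, shape (N,)
        gamma_diff : diffuse-field coherence of the pair at this subband
        """
        def smooth(x):                                  # recursive expectation proxy
            y, acc = np.empty_like(x), x[0]
            for n, v in enumerate(x):
                acc = alpha * acc + (1 - alpha) * v
                y[n] = acc
            return y

        psd1, psd2 = smooth(np.abs(X1) ** 2), smooth(np.abs(X2) ** 2)
        cpsd = smooth(X1 * np.conj(X2))
        gamma_sig = cpsd / np.sqrt(psd1 * psd2 + eps)   # complex spatial coherence
        mu = np.angle(cpsd)                             # spatial frequency estimate
        sdr = np.real((gamma_diff - gamma_sig) / (gamma_sig - np.exp(1j * mu) + eps))
        return np.maximum(sdr, 0.0)                     # clip physically impossible values
    ```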


  • 3.3 Signal-to-Diffuse Ratio

Figure: SDR estimated from the complex spatial coherence as a function of the true SDR (both in dB, from −30 to 30 dB) at $f = 2.48$ kHz with 10 time frames (≈ 110 ms), for DOAs of 90°, 120° and 150°; the "correct" line indicates an unbiased estimate.


  • 3.4 Diffuseness Estimation

- The diffuseness $\Psi(k,m) \in [0,1]$ can be expressed in terms of the SDR:

      $\Psi(k,m) = \dfrac{1}{1+\Gamma(k,m)} = \dfrac{P_\mathrm{diff}(k,m)}{P_\mathrm{dir}(k,m)+P_\mathrm{diff}(k,m)}.$

    - The diffuseness can be approximated by the square root of one minus the coefficient of variation of the active intensity vector $\mathbf{i}_a(k,m)$ [Ahonen and Pulkki, 2009]:

      $\Psi(k,m) = \sqrt{1 - \dfrac{\|E\{\mathbf{i}_a(k,m)\}\|}{E\{\|\mathbf{i}_a(k,m)\|\}}}.$

    - This estimator determines the diffuseness of the sound field by comparing the length of the average intensity vector (numerator) with the average length of the intensity vector (denominator).
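
    A sketch of the coefficient-of-variation estimator for one subband, where the expectations are approximated by a sliding average over L frames (the window length L is a practical assumption):

    ```python
    import numpy as np

    def diffuseness(ia, L=10):
        """Diffuseness after [Ahonen and Pulkki, 2009] for one subband.

        ia : real active intensity vectors over time, shape (3, N)
        """
        w = np.ones(L) / L                                # averaging window
        mean_ia = np.stack([np.convolve(ia[d], w, mode='same') for d in range(3)])
        num = np.linalg.norm(mean_ia, axis=0)             # || E{i_a} ||
        den = np.convolve(np.linalg.norm(ia, axis=0), w, mode='same')  # E{ ||i_a|| }
        return np.sqrt(np.clip(1.0 - num / np.maximum(den, 1e-12), 0.0, 1.0))
    ```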


  • 3.4 Diffuseness Estimation

Figure: Diffuseness estimates obtained using a circular array with $M = 4$ microphones and the coefficient-of-variation method; panels show Source A, Source B, and A + B. Source A was located at 90° at a distance of 1 m and source B at 120° at a distance of 2 m.


  • Overview

    1 Overview

    2 Sound Field and Signal Models

    3 Parameter Estimation

    4 Estimation of the Sound Pressures

    5 Applications and Examples

    6 Summary


  • 4. Estimation of the Sound Pressures

- To manipulate or synthesize the recorded reference signal, we estimate $S_\mathrm{dir}$ and $S_\mathrm{diff}$ based on the SDR or the diffuseness.

    - In principle, more recorded signals can be transmitted and used.

    - In the absence of noise, we can, for example, estimate $S_\mathrm{dir}$ by applying the square-root Wiener filter to a reference signal $R(k,m)$:

      $\hat S_\mathrm{dir}(k,m) = \sqrt{\dfrac{P_\mathrm{dir}(k,m)}{P_\mathrm{dir}(k,m)+P_\mathrm{diff}(k,m)}}\, R(k,m) = \sqrt{\dfrac{\Gamma(k,m)}{1+\Gamma(k,m)}}\, R(k,m) = \sqrt{1-\Psi(k,m)}\, R(k,m),$

      such that $E\{|\hat S_\mathrm{dir}(k,m)|^2\} = P_\mathrm{dir}(k,m)$.

    - Note that the square-root Wiener filter has received considerable attention in the context of noise reduction for hearing aids.


  • 4. Estimation of the Sound Pressures

- In a similar way, we can estimate the diffuse sound component $S_\mathrm{diff}$:

      $\hat S_\mathrm{diff}(k,m) = \sqrt{\dfrac{P_\mathrm{diff}(k,m)}{P_\mathrm{dir}(k,m)+P_\mathrm{diff}(k,m)}}\, R(k,m) = \sqrt{\dfrac{1}{1+\Gamma(k,m)}}\, R(k,m) = \sqrt{\Psi(k,m)}\, R(k,m).$

    - Note that $E\{|\hat S_\mathrm{dir}(k,m)|^2\} + E\{|\hat S_\mathrm{diff}(k,m)|^2\} = E\{|R(k,m)|^2\}$.

    - Obviously, many existing estimators can be used, as well as techniques to reduce musical noise and other artifacts (see the sketch below).

    - Finally, we can take into account the ambient noise $V(k,m)$.
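
    A minimal sketch of the noise-free decomposition, assuming an estimated diffuseness map psi of the same shape as the reference spectrum R:

    ```python
    import numpy as np

    def decompose(R, psi):
        """Square-root Wiener decomposition of a reference signal R(k, m).

        psi : diffuseness estimates in [0, 1], same shape as R.
        Per bin, |s_dir|^2 + |s_diff|^2 = |R|^2, as noted above.
        """
        s_dir = np.sqrt(1.0 - psi) * R     # direct sound estimate
        s_diff = np.sqrt(psi) * R          # diffuse sound estimate
        return s_dir, s_diff
    ```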


  • Overview

    1 Overview

    2 Sound Field and Signal Models

    3 Parameter Estimation

    4 Estimation of the Sound Pressures

5 Applications and Examples
       Spatial Reproduction
       Directional Filtering
       Dereverberation
       Acoustic Zoom
       Virtual Microphone

    6 Summary


  • 5.1 Spatial Reproduction

[Block diagram: the direct sound $S_\mathrm{dir}(k,m)$ is routed to the loudspeaker channels with vector base amplitude panning gain factors derived from $\phi(k,m)$; the diffuse sound $S_\mathrm{diff}(k,m)$ is fed through decorrelators, and the decorrelated signals are added to the panned direct sound per channel.]

    Figure: DirAC spatial reproduction [Pulkki, 2007], shown here without the inverse STFT.


  • 5.2 Directional Filtering

- In [Kallinger et al., 2009], a directional filter was proposed in the parametric domain by modifying the reference signal $R(k,m)$, the diffuseness $\Psi(k,m)$, and the DOA $\phi(k,m)$ (i.e., the signal and parametric side information used in DirAC).

    - In this tutorial, we apply two gain functions directly to the directional and diffuse components:

      $\tilde S_\mathrm{dir}(k,m) = G_\mathrm{dir}[\phi(k,m)]\, \hat S_\mathrm{dir}(k,m)$
      $\tilde S_\mathrm{diff}(k,m) = G_\mathrm{diff}\, \hat S_\mathrm{diff}(k,m)$

    - The modified reference signal is then given by:

      $\tilde R(k,m) = \tilde S_\mathrm{dir}(k,m) + \tilde S_\mathrm{diff}(k,m).$

    - The directional response of a first-order directional microphone is given by

      $D[\phi(k,m)] = \alpha + (1-\alpha)\cos[\phi(k,m) - \phi_d],$

      where $\alpha$ is the shape parameter of a first-order directivity response and $\phi_d$ is the desired look direction.


  • 5.2 Directional Filtering

- The power of a diffuse component at the output of such a first-order directional microphone is given by

      $Q = \dfrac{1}{2\pi}\int_0^{2\pi} D^2[\phi]\, d\phi = \dfrac{3}{2}\alpha^2 - \alpha + \dfrac{1}{2}.$

    - The gain functions are therefore given by

      $G_\mathrm{dir}[\phi(k,m)] = D[\phi(k,m)], \qquad G_\mathrm{diff} = \sqrt{Q}.$

    - Obviously, other directivity patterns can be used. Because of errors in the DOA estimates, we cannot use very sharp responses. In practice, a beam width of 60° can be used without introducing severe audible artifacts (assuming the direct components are sparse in the time-frequency domain). A minimal sketch follows.
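
    The sketch below combines the pieces above into one directional-filtering step; the shape parameter alpha = 0.5 (cardioid) is an illustrative choice:

    ```python
    import numpy as np

    def directional_filter(s_dir, s_diff, phi, phi_d, alpha=0.5):
        """Directional filtering in the parametric domain (cf. [Kallinger et al., 2009]).

        s_dir, s_diff : estimated direct/diffuse components per (k, m)
        phi           : DOA estimates per (k, m) [rad]; phi_d : look direction [rad]
        alpha         : first-order shape parameter (0.5 gives a cardioid)
        """
        D = alpha + (1.0 - alpha) * np.cos(phi - phi_d)   # G_dir = D[phi]
        Q = 1.5 * alpha**2 - alpha + 0.5                  # diffuse power of D
        return D * s_dir + np.sqrt(Q) * s_diff            # modified reference signal
    ```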


  • 5.2 Directional Filtering

Figure: Directional filtering example with two speakers; waveforms and spectrograms (0 to 8 kHz, 0 to 28 s) for (a) the reference microphone, (b) focus on the female speaker, and (c) focus on the male speaker. Thanks to Markus Kallinger for providing the audio examples.


  • 5.3 Dereverberation

- A signal which contains less reverberation than the reference signal $R(k,m)$ is given by [Kallinger et al., 2011]

      $\tilde R(k,m) = S_\mathrm{dir}(k,m) + \beta\, S_\mathrm{diff}(k,m),$

      where $\beta$ ($0 \le \beta \le 1$) is the reverberation reduction factor.

    - We seek a (real-valued) gain function that can be applied directly to the reference signal in order to estimate $\tilde R(k,m)$, i.e.,

      $\tilde R(k,m) = G(k,m)\, R(k,m).$

    - The gain function that minimizes the error in the minimum mean square sense is given by

      $G(k,m) = \arg\min_{G(k,m)} E\left\{\left|\tilde R(k,m) - G(k,m)R(k,m)\right|^2\right\} = 1 - (1-\beta)\,\Psi(k,m) = \dfrac{\Gamma(k,m)+\beta}{\Gamma(k,m)+1}.$
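
    As a sketch, the gain can be applied per time-frequency bin given a diffuseness estimate; note that beta = 0.5 corresponds to roughly −6 dB suppression of the diffuse amplitude (20 log10 0.5 ≈ −6 dB):

    ```python
    import numpy as np

    def dereverberate(R, psi, beta=0.5):
        """MMSE dereverberation gain G = 1 - (1 - beta) * Psi [Kallinger et al., 2011].

        R    : reference STFT coefficients
        psi  : diffuseness estimates in [0, 1], same shape as R
        beta : reverberation reduction factor, 0 <= beta <= 1
        """
        return (1.0 - (1.0 - beta) * psi) * R
    ```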


  • 5.3 Dereverberation

Figure: Dereverberation example; waveforms and spectrograms (0 to 8 kHz, 0 to 3 s) for (a) the reference microphone, (b) an MVDR beamformer (super-directive), and (c) parametric −6 dB suppression. Thanks to Markus Kallinger for providing the audio examples.


  • 5.4 Acoustic Zoom

- In [Schultz-Amling et al., 2010], a technique was proposed for an acoustic zoom, which allows us to virtually change the recording position.

    - To change the recording position, we need to:

      1. Change the DOAs of the directional sound sources.
      2. Change the signal-to-diffuse ratio and the levels of the direct sound components.

    Figure: Acoustic zoom (T = talker, L = listener).


  • 5.4 Acoustic Zoom

- It was proposed to remap the DOAs such that they correspond to the new listening position.

    - The region of interest increases from $2\phi$ to $2\phi'$ when the listener moves $d$ meters closer.

    - The following mapping function was derived (see the sketch below):

      $\phi' = \arccos\left(\dfrac{r^2\cos\phi + d^2 - r\,d\,[1 + \cos\phi]}{(r-d)\sqrt{d^2 + r^2 - 2\,r\,d\cos\phi}}\right).$

    Figure: Details of the geometric setup, with original distance $r$, zoomed distance $r'$ and angle $\phi'$, zoom step $d$, and positions $\mathbf{p}_1$ and $\mathbf{p}_2$.
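
    A direct transcription of the mapping; all inputs are in consistent units, and d < r is assumed so the listener does not move past the source circle. The worked example was checked against the geometry (a source 2 m away at 90° moves to about 116.6° after stepping 1 m forward):

    ```python
    import numpy as np

    def remap_doa(phi, r, d):
        """DOA remapping for the acoustic zoom [Schultz-Amling et al., 2010].

        phi : original azimuth [rad]; r : assumed source distance [m];
        d   : distance the listener virtually moves forward [m], d < r.
        """
        num = r**2 * np.cos(phi) + d**2 - r * d * (1.0 + np.cos(phi))
        den = (r - d) * np.sqrt(d**2 + r**2 - 2.0 * r * d * np.cos(phi))
        return np.arccos(np.clip(num / np.maximum(den, 1e-12), -1.0, 1.0))

    print(np.rad2deg(remap_doa(np.deg2rad(90.0), r=2.0, d=1.0)))   # approx. 116.6 deg
    ```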


  • 5.4 Acoustic Zoom

- Three assumptions were made for a zoomed-in audio scene:

      1. A sound source becomes louder as we approach it.
      2. Sound coming from the side and back should be attenuated as it moves out of focus.
      3. A sound source moving closer becomes less diffuse, and sound sources moving to the background become more diffuse.

    - This can be accomplished by extending the aforementioned directional filtering technique, which now depends on the DOA, the radius $r$ and the distance $d$. The direct and diffuse signals can be modified as follows:

      $\tilde S_\mathrm{dir}(k,m) = G_\mathrm{dir}[\phi(k,m), d, r]\, \hat S_\mathrm{dir}(k,m)$
      $\tilde S_\mathrm{diff}(k,m) = G_\mathrm{diff}[\phi(k,m), d, r]\, \hat S_\mathrm{diff}(k,m)$

    � More details can be found in [Schultz-Amling et al., 2010].


  • 5.4 Acoustic Zoom

Figure: PDF of the azimuth (−180° to 180°) for the given scenario of three simultaneous talkers [Schultz-Amling et al., 2010]. Top: microphone at position $\mathbf{p}_1$. Middle: microphone at position $\mathbf{p}_2$. Bottom: microphone at position $\mathbf{p}_1$, virtually moved to $\mathbf{p}_2$ by the acoustic zoom processing.


  • 5.5 Virtual Microphone

- In [Del Galdo et al., 2011], a technique was proposed to generate virtual microphone signals.

    - The virtual microphone signal is computed using the position of the IPLS, denoted by $\mathbf{p}_s$. In the following, we assume that $S_\mathrm{diff}(k,m) = 0$.

    - The position of the virtual microphone is defined by the user and is denoted by $\mathbf{p}_v$.

    Figure: Geometric illustration of the problem.


  • 5.5 Virtual Microphone

- In the following, we use $X(k,m,\mathbf{p}_1)$ as the reference signal. We could also use any other microphone signal or a combination of the microphone signals.

    - According to the model, and in the absence of noise, we have

      $X(k,m,\mathbf{p}_1) = H_\mathrm{dir}(\mathbf{p}_1,\mathbf{p}_s)\, S(k,m,\mathbf{p}_s).$

    - Our objective is to compute a signal that sounds perceptually similar to a signal recorded by a microphone placed at position $\mathbf{p}_v$:

      $X(k,m,\mathbf{p}_v) = H_\mathrm{dir}(\mathbf{p}_v,\mathbf{p}_s)\, S(k,m,\mathbf{p}_s) = H_\mathrm{dir}(\mathbf{p}_v,\mathbf{p}_s)\, H_\mathrm{dir}^{-1}(\mathbf{p}_1,\mathbf{p}_s)\, X(k,m,\mathbf{p}_1).$

    - As we do not know $H_\mathrm{dir}$, we propose to use a simple model in which we only model the attenuation of the sound pressure:

      $H_\mathrm{dir}[\mathbf{p}_1,\mathbf{p}_s(k,m)] = \dfrac{1}{\|\mathbf{p}_s(k,m)-\mathbf{p}_1\|} = \dfrac{1}{\|\mathbf{d}_1(k,m)\|}.$


  • 5.5 Virtual Microphone

- Using the same model, we can predict the attenuation from the IPLS to the position of the virtual microphone, i.e.,

      $H_\mathrm{dir}[\mathbf{p}_v,\mathbf{p}_s(k,m)] = \dfrac{1}{\|\mathbf{p}_s(k,m)-\mathbf{p}_v\|} = \dfrac{1}{\|\mathbf{d}_v(k,m)\|}.$

    - Therefore, the virtual microphone signal is given by

      $X(k,m,\mathbf{p}_v) = \dfrac{\|\mathbf{d}_1(k,m)\|}{\|\mathbf{d}_v(k,m)\|}\, X(k,m,\mathbf{p}_1).$


  • 5.5 Virtual Microphone

- We can simulate an arbitrary directional response by defining the angle $\phi_v(k,m)$ that represents the DOA of the IPLS from the perspective of the virtual microphone:

      $\phi_v(k,m) = \arccos\left(\dfrac{\mathbf{d}_v(k,m)\cdot\mathbf{c}_v}{\|\mathbf{d}_v(k,m)\|}\right),$

      where $\mathbf{c}_v$ is a unit vector describing the orientation of the virtual microphone.

    - Finally, the virtual microphone signal is given by (see the sketch below)

      $X(k,m,\mathbf{p}_v) = D[\phi_v(k,m)]\, \dfrac{\|\mathbf{d}_1(k,m)\|}{\|\mathbf{d}_v(k,m)\|}\, X(k,m,\mathbf{p}_1).$

    - We can, for instance, use

      $D[\phi_v(k,m)] = \dfrac{1}{2} + \dfrac{1}{2}\cos[\phi_v(k,m)]$

      to simulate a virtual microphone with cardioid directivity.
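
    A sketch that strings the three steps together (distance compensation, virtual DOA, cardioid weighting). Positions are 2-D or 3-D NumPy vectors, and the IPLS position ps is assumed to come from the triangulation step in Section 3.2:

    ```python
    import numpy as np

    def virtual_mic(X1, p1, ps, pv, cv):
        """Virtual microphone sketch after [Del Galdo et al., 2011], ignoring diffuse sound.

        X1 : reference STFT coefficient(s); p1 : reference mic position
        ps : IPLS position(s), shape (..., dim); pv : virtual mic position
        cv : unit orientation vector of the virtual (cardioid) microphone
        """
        d1 = np.linalg.norm(ps - p1, axis=-1)             # IPLS -> reference mic
        dv_vec = ps - pv                                  # IPLS seen from the virtual mic
        dv = np.maximum(np.linalg.norm(dv_vec, axis=-1), 1e-12)
        phi_v = np.arccos(np.clip((dv_vec @ cv) / dv, -1.0, 1.0))
        D = 0.5 + 0.5 * np.cos(phi_v)                     # cardioid directivity
        return D * (d1 / dv) * X1
    ```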


  • 5.5 Virtual Microphone

Figure: Spatial power density (in dB, from −50 to 0) over a 10 m × 8 m region, obtained using two circular arrays ($M = 4$ and $r = 1.6$ cm), for one talker (left) and two talkers (right).


  • 5.5 Virtual Microphone

Figure: Spatial power densities (in dB) of the localized positions $\mathbf{p}_s$ for a single sound source, shown for the direct, early and reverberant parts of the room impulse response $h_1(t)$ (0 to 40 ms).


  • 5.5 Virtual Microphone

Figure: Spectrograms (0 to 4 kHz, in dB from −60 to 0) of a virtual omnidirectional microphone signal (left) and a virtual cardioid microphone pointing at Source A (right); the time axis is annotated with the segments Source A, Source B, and A+B.


  • Overview

    1 Overview

    2 Sound Field and Signal Models

    3 Parameter Estimation

    4 Estimation of the Sound Pressures

    5 Applications and Examples

    6 Summary


  • 6. Summary

- Parametric spatial audio processing relies on a simple yet powerful description of the sound field.

    - Accurate estimation of the time- and frequency-dependent parameters is paramount.

    - Model violations can introduce artifacts.

    - Several applications have been developed over the last five years.


  • References I

    Ahonen, J. and Pulkki, V. (2009).

    Diffuseness estimation using temporal variation of intensity vectors.

In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 285–288, San Francisco, USA.

    Del Galdo, G., Thiergart, O., Weller, T., and Habets, E. A. P. (2011).

Generating virtual microphone signals using geometrical information gathered by distributed arrays.

    In Proc. Hands-Free Speech Communication and Microphone Arrays (HSCMA), Edinburgh, United Kingdom.

    Jeub, M., Nelke, C. M., Beaugeant, C., and Vary, P. (2011).

    Blind estimation of the coherent-to-diffuse energy ratio from noisy speech signals.

    In Proc. European Signal Processing Conf. (EUSIPCO), pages 1347–1351.

    Kallinger, M., Del Galdo, G., Kuech, F., and Thiergart, O. (2011).

    Dereverberation in the spatial audio coding domain.

    In 130th AES Convention, Paper 8429, London, UK.

Kallinger, M., Kuech, F., Schultz-Amling, R., Del Galdo, G., Ahonen, J., and Pulkki, V. (2008).

    Enhanced direction estimation using microphone arrays for directional audio coding.

    In Proc. Hands-Free Speech Communication and Microphone Arrays (HSCMA), pages 45–48.


  • References II

Kallinger, M., Ochsenfeld, H., Del Galdo, G., Kuech, F., Mahne, D., Schultz-Amling, R., and Thiergart, O. (2009).

    A spatial filtering approach for directional audio coding.

    In 126th AES Convention, Paper 7653, Munich, Germany.

    Kratschmer, M., Thiergart, O., and Pulkki, V. (2012).

    Envelope-based spatial parameter estimation in directional audio coding.

    In Audio Engineering Society Convention 133, San Francisco, USA.

    Pulkki, V. (2007).

    Spatial sound reproduction with directional audio coding.

    J. Audio Eng. Soc, 55(6):503–516.

Rickard, S. and Yilmaz, O. (2002).

    On the approximate W-disjoint orthogonality of speech.

    In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP).

    Schultz-Amling, R., Kuech, F., Thiergart, O., and Kallinger, M. (2010).

    Acoustical zooming based on a parametric sound field representation.

In Audio Engineering Society Convention 128, London, UK.


  • References III

    Thiergart, O., Del Galdo, G., and Habets, E. A. P. (2011).

Diffuseness estimation with high temporal resolution via spatial coherence between virtual first-order microphones.

    In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 217–220.

    Thiergart, O., Del Galdo, G., and Habets, E. A. P. (2012a).

On the spatial coherence in mixed sound fields and its application to signal-to-diffuse ratio estimation.

    J. Acoust. Soc. Am., 132(4):2337–2346.

    Thiergart, O., Del Galdo, G., and Habets, E. A. P. (2012b).

Signal-to-reverberant ratio estimation based on the complex spatial coherence between omnidirectional microphones.

    In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP).

    van Trees, H. L. (2002).

    Detection, Estimation, and Modulation Theory, volume IV, Optimum Array Processing.

    Wiley, New York, USA.

