1
Increasing the environment-awareness of rake beamforming for directive acoustic sources Pasi Pertilä , Alessio Brui * Tampere University of Technology, Department of Signal Processing, Finland, [email protected] * Fondazione Bruno Kessler ITC - irst, via Sommarive 18, 38123, Trento, Italy, [email protected] Introduction I Distant speech is deteriorated by reverberation and noise I Beamforming is oen used to enhance the source signal I The multi-path acoustic propagation leading to reverberation is not completely detrimental and can be used in a constructive way I Sound reflections from surfaces has been successfully applied in various research areas . Sound Source Localization (SSL) [1, 2], . Inferring the room reflector positions [3] and . Room shape estimation [4]. I This paper investigates the use of mirrored microphones, (see Fig. 1), associated to isolated and strong reflections, in combination with source directivity, to improve the captured speech quality. I Reflections are used to improve the quality of the beamformed signal Methodology Rake Beamforming The concept of echo exploitation in microphone array enhancement was introduced in [5] as Acoustic Rake Receiver (ARR). In [6], given Room Impulse Response (RIR) knowledge regarding reflections, the echoes are exploited considering multiple beamformers. Virtual source beamforming is developed in [7]. Distributed microphones Besides distance to source d , the angle γ between a directive source’s orientation and receiver direction aects the signal level in the receiver. The source directivity has been exploited in Delay and Sum Beamforming (DSB) framework for a distributed microphone array [8]. Beamforming with virtual microphone arrays θ s m (0,0,0) j m (1,0,0) j m (1,0,0) j y x -90 90 -45 k 45 Reflected path Direct path Virtual array Real array Virtual microphone Real microphone Figure 1: Illustration of microphone mirroring and the utilized geometry. A microphone m j of the array and its 1st order mirrors are shown. Right: actual measurement room. I We adopt a purely geometric modeling of the RIR, without any knowledge about the actual propagation paerns. I The contributions of the mirror microphones are weighted according to the surface specific reflection coeicient β . I We include the source directivity scheme into the ARR DSB. The beamforming weights for the j th microphone’s signal at m (n,m,p ) j are obtained based on "filter-and-sum" principle W j (ω )= M n=-M M m=-M M p =-M a (n,m,p ) j e j ω c -1 (d (n,m,p) j -d 0 ) , (1) where a (n,m,p ) j is the gain, defined as the product of 1. normalized inverse distance to (virtual) microphone d (n,m,p ) j 2. source radiation paern ζ (γ,ω ) for angle γ between source orientation and (virtual) microphone direction. 3. the reflection losses β (n,m,p ) incurred in the specific path defined by triple (n, m, p ), refer to Fig. 1. DSB designs considered 1. DSB: only real microphone array, reflection order M = 0 2. Rake Delay and Sum Beamforming (R-DSB) [7]: real and virtual arrays using assumption 1 (distance based aenuation). 3. Directivity and Reflection Weighted R-DSB (DRWR-DSB): real and virtual microphones using the gain assumptions 1–3, with directivity ζ (γ j )= 1+cos(γ j ) 2 α Results Speech from source orientation angles between -90 , 90 are simulated and measured using RIRs (3 m distance, SNR + 5 dB). 0 0.5 0.75 1 1.5 2 4 6 8 40 60 80 100 STOI improvement over nearest microphone Source directivity parameter α, SNR 5 dB Relative improvement [%] (a) STOI for real RIRs 0 0.5 0.75 1 1.5 2 4 6 8 3.5 4 4.5 5 SSNR improvement over nearest microphone Source directivity parameter α, SNR 5 dB Increase in SNR [dB] DSB DRWR-DSB, Mirror: P1 DRWR-DSB, Mirror: F1 DRWR-DSB, Mirror: F2 R-DSB, Mirror: P1 R-DSB, Mirror: F1 R-DSB, Mirror: F2 (b) SSNR for real RIRs (a) Using actual measured RIRs 0 0.5 1 1.5 2 3 4 5 6 90 100 110 120 130 140 150 160 170 STOI improvement over nearest microphone Source directivity parameter α Relative improvement [%] DSB DRWR-DSB, Mirror: P1 DRWR-DSB, Mirror: F1 DRWR-DSB, Mirror: F2 R-DSB, Mirror: P1 R-DSB, Mirror: F1 R-DSB, Mirror: F2 0 0.5 1 1.5 2 3 4 5 6 2 2.5 3 3.5 SSNR improvement over nearest microphone Source directivity parameter α Increase in SNR [dB] (b) sim., T 60 = 0.5s, directive source (α = 2) Figure 2: Performance improvement over unprocessed microphone signal averaged over all source orientation. 90 68 45 23 0 338 315 293 270 0.1 0.15 0.2 0.25 0.3 0.35 0.4 Source orientation θ, SNR 5 dB STOI Near mic. DSB R-DSB, Mirror: F1 DRWR-DSB, Mirror: F1, α=2 90 68 45 23 0 338 315 293 270 -14 -12 -10 -8 -6 Source orientation θ, SNR 5 dB SSNR [dB] (a) Performance as a function of source orientation using measured RIRs. 0.1 0.3 0.5 0.7 0.9 60 80 100 120 140 160 180 STOI improvement over nearest microphone Reverberation time T 60 (s) Relative improvement [%] 0.1 0.3 0.5 0.7 0.9 1 2 3 4 5 6 SSNR improvement over nearest microphone Reverberation time T 60 (s) Increase in SNR [dB] DSB R-DSB, Mirror: F2 DRWR-DSB,Mirror:F2 α= 0 DRWR-DSB,Mirror:F2 α= 2 (b) Performance gain as a function of the reverberation time. Figure 3: Performance improvement for each angle in real-data (le) and dierent simulated reverberation times (right), SNR +5 dB. Conclusions Experiments on simulated as well as real RIRs using objective intelligibility and SNR of speech signals show that the rake beamforming benefits from the proposed enhanced modeling of i) distance based gain, ii) source radiation modeling, and iii) reflection loss modeling. [1] P. Svaizer, A. Brui, and M. Omologo, “Environment aware estimation of the orientation of acoustic sources using a line array,” in Proceedings of the 20th European Signal Processing Conference (EUSIPCO), Aug 2012, pp. 1024–1028. [2] Teemu Korhonen, “Acoustic localization using reverberation with virtual microphones,” in Proc. IWAENC, 2008. [3] F. Antonacci, J. Filos, M. R. P. Thomas, E. A. P. Habets, A. Sarti, P. A. Naylor, and S. Tubaro, “Inference of room geometry from acoustic impulse responses,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 10, pp. 2683–2695, Dec 2012. [4] Ivan Dokmanić, Reza Parhizkar, Andreas Walther, Yue M. Lu, and Martin Veerli, “Acoustic echoes reveal room shape,” Proceedings of the National Academy of Sciences (PNAS), vol. 110, no. 30, pp. 12186–12191, 2013. [5] P. Annibale, F. Antonacci, P. Bestagini, A. Brui, A. Canclini, L. Cristoforei, E. A. P. Habets, J. Filos, W. Kellermann, K. Kowalczyk, A. Lombard, E. Mabande, D. Markovic, P. A. Naylor, M. Omologo, R. Rabenstein, A. Sarti, P. Svaizer, andM. R. P. Thomas, “The SCENIC Project: Space-Time Audio Processing for Environment-Aware Acoustic Sensing and Rendering,” in Audio Engineering Society Convention 131, Oct 2011. [6] T. Nishiura, S. Nakanura, and K. Shikano, “Speech enhancement by multiple beamforming with reflection signal equalization,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’01), 2001, vol. 1, pp. 189–192. [7] I. Dokmanić, R. Scheibler, and M. Veerli, “Raking the cocktail party,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 5, pp. 825–836, Aug 2015. [8] T. Betlehem and R.C. Williamson, “Acoustic beamforming exploiting directionality of human speech sources,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2003, pp. 147–148. http://www.cs.tut.fi/ pertila/ [email protected]

Increasing the environment-awareness of rake beamforming ...pertila/pertila-iwaenc-2016-poster.pdf · 3.2. Results on measured RIRs We evaluated the proposed algorithms on RIRs measured

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Increasing the environment-awareness of rake beamforming ...pertila/pertila-iwaenc-2016-poster.pdf · 3.2. Results on measured RIRs We evaluated the proposed algorithms on RIRs measured

Increasing the environment-awareness of rakebeamforming for directive acoustic sources

Pasi Pertilä†, Alessio Bru�i∗

†Tampere University of Technology, Department of Signal Processing, Finland, [email protected]∗Fondazione Bruno Kessler ITC - irst, via Sommarive 18, 38123, Trento, Italy, bru�[email protected]

Introduction

I Distant speech is deteriorated by reverberation and noise

I Beamforming is o�en used to enhance the source signal

I The multi-path acoustic propagation leading to reverberationis not completely detrimental and can be used in aconstructive way

I Sound reflections from surfaces has been successfully appliedin various research areas

. Sound Source Localization (SSL) [1, 2],

. Inferring the room reflector positions [3] and

. Room shape estimation [4].

I This paper investigates the use of mirrored microphones, (seeFig. 1), associated to isolated and strong reflections, incombination with source directivity, to improve the capturedspeech quality.

I Reflections are used to improve the quality of thebeamformed signal

Methodology

Rake Beamforming The concept of echo exploitation inmicrophone array enhancement was introduced in [5] asAcoustic Rake Receiver (ARR). In [6], given Room ImpulseResponse (RIR) knowledge regarding reflections, the echoesare exploited considering multiple beamformers. Virtualsource beamforming is developed in [7].

Distributed microphones Besides distance to source d, theangle γ between a directive source’s orientation and receiverdirection a�ects the signal level in the receiver.The source directivity has been exploited in Delay and SumBeamforming (DSB) framework for a distributed microphonearray [8].

Beamforming with virtual microphone arrays

θ

s

m(0,0,0)j m

(−1,0,0)j

m(1,0,0)j

y

x

-90◦ 90◦

-45◦

k 45◦

Reflected pathDirect pathVirtual arrayReal arrayVirtual microphoneReal microphone

Fig. 1: Illustration of microphone mirroring and the utilized geom-etry. When source orientation is parallel to y-axis with orientationgiven by k = [0, −1, 0]T, the corresponding spherical coordinate’sazimuth is θ = 0◦. The array construction is given in Fig. 2. A mi-crophone mj and its mirrors are shown.

Y (ω) =

N−1!

j=0

Wj(ω)Xj(ω), (1)

where the microphone signal is modeled as Xj(ω) = S(ω)Hj(ω)+Nj(ω): S(ω) is the source signal, Hj(ω) is the RIR between s andmj , Wj(ω) represents a filter associated with jth microphone, andNj(ω) is noise that is assumed independent between microphones.For sake of brevity the frequency term ω is omitted.

For an empty room, the RIR Hj can be approximated using theimage method [16], by mirroring the microphone mj with respect tothe surfaces of the enclosure. This mirroring procedure produces aset of virtual microphones which can be directly included in Eq. (1).Note that, due to the intrinsic symmetry of the problem, reflectionscan be modeled by means of microphone mirrors instead of sourcemirrors (as commonly done [14, 16]). An integer triple (n,m, p)is used to denote mirroring with respect to x,y,z surfaces, respec-tively, and n, m, p ∈ [−M, M ], where M is the reflection order.The sign of a triple variable indicates whether the mirroring is per-formed with respect to surface in the negative or in the positive axisdirection as illustrated in Fig. 1. The triple (0, 0, 0) refers to a realnon-mirrored microphone. The real/virtual microphone location isdenoted as m

(n,m,p)j . The distance from source to microphone j is

d(n,m,p)j = ∥m(n,m,p)

j − s∥, (2)

and τ(n,m,p)j = c−1(d

(n,m,p)j − d0) is used to denote Time Dif-

ference of Arrival (TDoA) between jth microphone m(n,m,p)j and

a reference microphone m0, where c is the speed of sound. Thebeamforming weights for the microphone’s signal at m

(n,m,p)j are

obtained based on same ”filter-and-sum” principle as for real micro-phones:

Wj =

M!

n=−M

M!

m=−M

M!

p=−M

a(n,m,p)j ejωτ

(n,m,p)j , (3)

where the gain a(n,m,p)j in near-field design is based on distance [17]

a(n,m,p)j = d0/d

(n,m,p)j . (4)

In this paper we consider a more detailed definition of a(n,m,p)j that

accounts for the source directivity [15] and for the energy dissipatedin the reflections, leading to the introduction of the Directivity and

Reflection Weighted rake-DSB (DRWR-DSB) gain:

a(n,m,p)j =

d0

d(n,m,p)j

ζ(γj , ω)β(n,m,p). (5)

In Eq. (5) the terms β(n,m,p) is the product of the room surfacespecific reflection coefficients involved in the sound reflection pathdefined by the triple (n, m, p). The direct path coefficient β(0,0,0)

evaluates as 1. While microphones are assumed omnidirectional, thefrequency dependent term ζ(γj , ω) models the source radiation pat-tern. The angular distance γj of the jth receiver from the source withorientation vector k = [kx, ky, kz]

T is:

γj = arccos

"kT(m

(n,m,p)j − s)

∥k∥ · ∥m(n,m,p)j − s∥

#. (6)

From Eqs. (3)– (5) three different DSB designs are considered:

• DSB: only real array (M = 0) using the gain in Eq. (4).

• R-DSB [14]: real and virtual arrays (M > 0) using the gainin Eq. (4).

• DRWR-DSB: real and virtual microphones (M > 0) usingthe gain in Eq. (5) with a directive source and energy dissipa-tion in reflections.

Note that all methods assume knowledge of source and micro-phone positions to obtain the beamforming weights. The R-DSB andDRWR-DSB assume here a shoebox-shaped enclosure with knowndimensions, while DRWR-DSB parameterizes the reflection coeffi-cient β and source radiation pattern ζ(γj , ω) (which in the R-DSBare β = 1 and ζ(γj , ω) = 1).

3. EXPERIMENTAL ANALYSIS

In this section, we experiment with a set of RIRs to explore the im-provements of modeling source directivity and reflection coefficientsover R-DSB. In addition, the order of reflections for beamformingweight design is experimented with, to see how the raking meth-ods behave in 1) a practical scenario where there is mismatch in themodeled room dimensions, source and microphone positions, and re-flection behavior from actuality and in 2) simulated setup with exactknowledge of corresponding parameters.

The experimental analysis is based on the set up shown in Fig. 1using both simulated and measured RIRs. The 13 microphone nestedarray depicted in Fig. 2 was installed parallel to the x-axis in a roomwhose x,y,z -dimensions are 3.49×5.22×2.56 meters. Nine sourceorientations were considered : θ = 90◦, 67.5◦, 45◦, 22.5◦, 0◦, -22.5◦,-45◦, -67.5◦, -90◦, where in θ = 0◦ the source direction is paral-lel to y-axis, and elevation angle was kept horizontal. The loud-speaker was located at s = [0.985, 3.200, 1.465]T. The referencemicrophone, having the largest x-coordinate value, was located atm0 = [2.680, 0.425, 1.515]T. Microphones were omnidirectional.Ten sentences spoken by randomly selected readers (four women,six men) from the TIMIT database [18] were used to comprise a34 s test sentence, which was kept the same for all trials. This orig-inal sentence was then convolved with the impulse responses (realor simulated) of the individual microphones to get the reverberatedmicrophone signals. White Gaussian noise (WGN) was added to thereverberated microphone signals in order to obtain a realistic level ofSNR for the array of microphones. The DSB methods used a win-dow length of 43 ms with 50 % overlap between the adjacent framesto obtain the enhanced speech signal at 16 kHz.

Figure 1: Illustration of microphone mirroring and the utilized geometry. A microphonemj of the array and its 1st order mirrors are shown. Right: actual measurement room.

I We adopt a purely geometric modeling of the RIR, withoutany knowledge about the actual propagation pa�erns.

I The contributions of the mirror microphones are weightedaccording to the surface specific reflection coe�icient β.

I We include the source directivity scheme into the ARR DSB.The beamforming weights for the jth microphone’s signal atm(n,m,p)

j are obtained based on "filter-and-sum" principle

Wj(ω) =∑M

n=−M∑M

m=−M∑M

p=−M a(n,m,p)j ejωc−1(d(n,m,p)j −d0), (1)

where a(n,m,p)j is the gain, defined as the product of

1. normalized inverse distance to (virtual) microphone d(n,m,p)j

2. source radiation pa�ern ζ(γ, ω) for angle γ between sourceorientation and (virtual) microphone direction.

3. the reflection losses β(n,m,p) incurred in the specific pathdefined by triple (n,m, p), refer to Fig. 1.

DSB designs considered

1. DSB: only real microphone array, reflection order M = 0

2. Rake Delay and Sum Beamforming (R-DSB) [7]: real andvirtual arrays using assumption 1 (distance based a�enuation).

3. Directivity and Reflection WeightedR-DSB (DRWR-DSB): real and virtual microphones usingthe gain assumptions 1–3, with directivity ζ(γj) =

(1+cos(γj)

2

Results

Speech from source orientation angles between−90◦, 90◦ aresimulated and measured using RIRs (3 m distance, SNR+5 dB).

Fig. 2: Geometry of the nested array used in the experiments.

Two metrics were used to evaluate the performance of the in-vestigated methods. The first metric is the Short-Term Objective In-telligibility (STOI) [19, 20] that is designed to predict the perceivedintelligibility that would be given by subjects in a listening test. Thesecond metric is the Segmental Signal to Noise Ratio (SSNR), com-monly used in evaluating speech enhancement algorithms [21]. Bothmetrics are established and standardized.

3.1. Array reflection order and source directivity

To consider different amount of reflections for the virtual arrayschemes (”R-DSB” and ”DRWR-DSB”), the following reflection or-ders are tested in the corresponding array gain design Eqs. (4)–(5):1. Partial 1st order (P1): reflections from the y-axis walls and fromthe ceiling was used to mirror the array. These surfaces providedintuitively most dominant reflections. In total, 3 virtual arrays (39microphones) and the real array (13 microphones) were considered.2. Full 1st order reflections (F1) considers all 1st order reflections,i.e. M = 1 in Eq. (3). In total 26 virtual arrays (338 virtual micro-phones) are used in addition to the real array.3. Full 2nd order reflections (F2): 2nd order reflections from eachsurface are considered, i.e M = 2 in Eq. (3). In total 124 virtualarrays are used in addition to the real array.

The surface reflections coefficients β are generally frequency,incidence angle, and material dependent [22]. For simplicity, thereflection coefficients are assumed identical for each surface and arederived by using the Eyering’s reverberation time formula [22] usingthe desired reverberation time T60 and room dimensions.

The radiation pattern of the source is modeled through a simpleparametric polar pattern

ζ(γj) =

(1 + cos(γj)

2

, (7)

where γj is the angular difference between source orientation andjth receiver, refer to Eq. (6), and parameter α ≥ 0 is used to controlthe amount of directivity; α=0 is an omnidirectional source, α=1is a cardioid source, and α > 1 represents a hypercardioid source.Refer to Table 1 for source directivity characterization. Note that thepattern is assumed frequency-independent for simplicity and doesnot restrict the generalization of the method.

3.2. Results on measured RIRs

We evaluated the proposed algorithms on RIRs measured in a realroom whose layout is depicted in Figure 1. The reverberation timeis approximately T60 = 0.35 s, and corresponding reflection coef-ficient is evaluated as β = 0.84. The room features a very highisolation from external noise sources, ensuring a good quality in theaudio recording. RIRs were obtained using an MLS sequence oforder 20 played at 48 kHz from a loudspeaker with nine orientations

Table 1: Source directivity ζ(γ) width given as the angle γ where−6 dB attenuation is reached for different α values, refer to Eq. (7).

α 0.0 0.5 0.75 1.0 1.5 2.0 3.0 4.0 5.0 6.0 7.0 8.0γ − 90.0◦ 75.0◦ 65.6◦ 54.1◦ 47.1◦ 38.6◦ 33.6◦ 30.1◦ 27.5◦ 25.5◦ 23.8◦

0 0.5 0.75 1 1.5 2 4 6 840

60

80

100

STOI improvement over nearest microphone

Source directivity parameter α, SNR 5 dB

Rel

ativ

e im

pro

vem

ent

[%]

(a) STOI for real RIRs

0 0.5 0.75 1 1.5 2 4 6 8

3.5

4

4.5

5SSNR improvement over nearest microphone

Source directivity parameter α, SNR 5 dB

Incr

ease

in S

NR

[dB

]

DSBDRWR-DSB, Mirror: P1

DRWR-DSB, Mirror: F1

DRWR-DSB, Mirror: F2R-DSB, Mirror: P1

R-DSB, Mirror: F1

R-DSB, Mirror: F2

(b) SSNR for real RIRs

Fig. 3: Performance improvement over unprocessed microphonesignal averaged over all source orientation, SNR +5 dB.

in the position speficied in Section 3, refer also to Fig. 1. The ex-act directivity pattern was not known. The first half a second of theRIR length was considered. Array signals for each source orienta-tion were obtained by convolving the clean speech signal with theRIRs and then adding WGN noise to obtain a SNR level of +5 dB.

Figure 3 reports the performance of the proposed environmentaware beamforming as a function of the source directivity param-eter α, averaged over the source orientations. The performance isreported as improvement over the unprocessed nearest array micro-phone’s STOI (3a) and SSNR (3b). The DSB is further improved byusing raking. The methods using raking have their best STOI val-ues considering partial first order reflections (”P1”), whereas con-sidering more reflections (”F1,F2”) reduces STOI, refer to Fig. 3a.The R-DSB is more negatively affected by varying the reflection or-der than the proposed DRWR-DSB. In particular, the STOI is alwaysbetter when the reflection coefficient is included in the modeling, re-gardless of the mirror order. Including also the directivity (α > 0)to the raking beamformer results in steady increase in STOI overomnidirectional source modeling (α = 0) for all reflection orders(P1,F1,F2). The best average STOI score is obtained with α = 6using P1 order reflections with the proposed approach.

The proposed method improves SSNR over the R-DSB for eachconsidered reflection order. Unlike the STOI score, where increas-ing the directivity leads to improved scores, the SSNR score startsto decline below the non-directive DSB-raking after approximatelyα > 2, refer to Fig. 3b.

To better understand the impact of the source directivity onbeamforming, Fig. 4 presents the performance of the proposed meth-ods for each orientation. The F1 mirrors are considered, since theyrepresent a compromise in performance between the two metrics,and the directivity pattern is set to α = 2. The DSB and nearest mi-crophone scores are provided as reference. When the source pointsdirectly towards the real array (θ = 0) the scores for DSB and sin-gle microphone are maximized. In such orientations the direct pathdominates over the reflection paths. Using the mirroring in such ori-entations introduces strong pre-echoes into the beamformed signal,which decreases the intelligibility. Instead, the mirroring improvesthe STOI for non-frontal source orientations. In such orientationsnear (θ = 90◦, and θ = 270◦), the reflections can become strongerthan the direct path. This explains why larger α leads to improvedSTOI over DSB, since only few path are considered.

In the experiments the source directivity was assumed frequencyindependent and all surface reflections coefficients were set to equalvalue. However, if the frequency dependency of the source and/orsurface reflection coefficients are known, a more accurate directivityweighted raking could be obtained. The dimensions of the room,loudspeaker location and array position were measured manuallyand any errors are multiplied in virtual microphone positions as mir-

(a) Using actual measured RIRs

90 68 45 23 0 338 315 293 2700.1

0.15

0.2

0.25

0.3

0.35

0.4

Source orientation θ, SNR 5 dB

ST

OI

Near mic.

DSB

R−DSB, Mirror: F1

DRWR−DSB, Mirror: F1, α=2

(a) STOI using real RIR

90 68 45 23 0 338 315 293 270−14

−12

−10

−8

−6

Source orientation θ, SNR 5 dB

SS

NR

[dB

]

(b) SSNR using real RIR

Fig. 4: Performance as a function of source orientation for best rak-ing variants and baseline methods using measured RIRs.

roring order is increased. These inaccuracies are probably the reasonwhy performance decreases when F2 mirror order is considered overF1. Nevertheless, the mirroring scheme was shown to have signif-icant improvement over the DSB and the proposed reflection coef-ficients and source directivity modeling improved the performanceover the traditional raking DSB method.

3.3. Results on Simulated RIRs

To complete the experimental analysis we evaluated the performanceof the investigated algorithms on a set of RIRs simulated via theimage method [16], considering the same layout as for the real data,with varying reverberation time (T60 = 0.1, 0.3, . . . , 0.9 s). Weemployed a modified version of the image method that accounts forsource directivity using the modeling in Eq. (7). We considered theaverage performance over all the 9 orientations for SNR=+5 dB andwith directivity parameter2 D=2.

Figure 5 analyses the impact of modeling the source directivityin the attenuation definition of Eq. (5), reporting the average perfor-mance of the proposed DRWR-DSB in comparison with R-DSB asa function of the directivity parameter α, in moderate reverberationT60 = 0.5 s. First of all, note that, opposite to what was observedin the experiments on measured RIRs, the performance improves asthe number of mirrors increases for both metrics. This makes senseas in this case the propagation model perfectly matches the simu-lated acoustic propagation. When a non-omnidirectional source isconsidered (α > 0) DRWR-DSB is always superior to the R-DSB.Note that a minor gain is obtained also for α = 0 thanks to modelingthe reflection coefficient. For all mirrors and both metrics, the bestperformance is achieved when the radiation parameters in Eq. (5)matches the source radiation patter (i.e. α = D = 2). The algorithmis not very sensitive to the selection of α, performance improves forany α > 0 and presents a rather flat maximum.

Finally, we conclude the analysis of the simulated data consider-ing, in Fig. 6, the average performance as a function of the reverber-ation time. Also in this case, the proposed DRWR-DSB outperformsall the other methods independently of the reverberation time. Inter-estingly, the performance of DRWR-DSB for α = 0 converges tothose of R-DSB for high T60, since the reflection coefficients tendsto 1, providing only a minor improvement. When using the STOImetric, the traditional DSB performs very similarly to DRWR-DSBfor T60 = 0.1 s. The reason is that echoes we are adding are ex-tremely weak in this case and cannot improve the performance whenthe source is not frontal to the array. However, accounting for thesource directivity provides slightly better performance.

2Note that we use D to indicate the actual source radiation pattern whileα is the parameter used in the weight computation in Eq. (5)

0 0.5 1 1.5 2 3 4 5 690

100

110

120

130

140

150

160

170

STOI improvement over nearest microphone

Source directivity parameter α

Rel

ativ

e im

pro

vem

ent

[%]

DSBDRWR-DSB, Mirror: P1

DRWR-DSB, Mirror: F1

DRWR-DSB, Mirror: F2R-DSB, Mirror: P1

R-DSB, Mirror: F1

R-DSB, Mirror: F2

(a) STOI for simulations

0 0.5 1 1.5 2 3 4 5 6

2

2.5

3

3.5

SSNR improvement over nearest microphone

Source directivity parameter α

Incr

ease

in S

NR

[dB

]

(b) SSNR for simulations

Fig. 5: Performance gain with respect to the nearest microphonein terms of STOI and SSNR, averaged over all orientations, on thesimulated data for T60 = 0.5s, D = 2 and different values of α

0.1 0.3 0.5 0.7 0.960

80

100

120

140

160

180STOI improvement over nearest microphone

Reverberation time T60

(s)

Rel

ativ

e im

pro

vem

ent

[%]

(a) STOI for simulations

0.1 0.3 0.5 0.7 0.9

1

2

3

4

5

6SSNR improvement over nearest microphone

Reverberation time T60

(s)

Incr

ease

in S

NR

[dB

]

DSB

R−DSB, Mirror: F2

DRWR−DSB,Mirror:F2 α = 0

DRWR−DSB,Mirror:F2 α = 2

(b) SSNR for simulations

Fig. 6: Performance gain with respect to the nearest microphone interms of STOI and SSNR on the simulated data for D = 2 andmirrors F2, as a function of the reverberation time.

Although not reported here, scores for real and simulated datawith high SNR (+20 dB) exhibit similar benefits regarding STOI andSSNR, but are less pronounced for the SSNR gain.

4. CONCLUSIONSThis paper presented an environment-aware speech enhancementmethod for directive sources based on the use of DSB in combinationwith mirrored microphone arrays. The proposed method extends so-lutions already available in literature in two ways: i) by weightingthe contribution of mirrored microphones according to the reflectioncoefficient to take energy loss of reflections into consideration, andii) by accounting for the source directivity. Experiments on sim-ulated as well as real RIRs show that the rake beamforming ben-efits from this enhanced modeling. Interestingly, the performanceimprovement is not sensitive to parameter variations, in particularthe source directivity α. As shown in Fig. 6 the introduction of thereflection coefficient in the weight computation is particularly ben-eficial over the R-DSB in low reverberation, where the reflectioncoefficients are smaller than 1. Ideally, increasing the number ofreflections would improve the performance. However, in practicalscenarios the mismatch between the actual acoustic propagation andthe geometric model, together with inaccuracies in the environmentdescription, would lead to error accumulation and reduced perfor-mance if a large number of mirrors is considered. Therefore, theoptimum number of mirrors has to be determined based on the ap-plication scenario, the modeling and the desired computational cost.

Future work will address the adoption of this approach in moreadvanced enhancement frameworks (e.g., MVDR). A further openissue to be investigated is related to the robustness against erroneoussource position and orientation and inaccurate reflector characteriza-tion. This is particularly crucial in a fully automated system wherethe room geometry and the acoustic properties of the environment(and of the source) are automatically estimated.

(b) sim., T60=0.5s, directive source (α=2)Figure 2: Performance improvement over unprocessed microphone signalaveraged over all source orientation.

90 68 45 23 0 338 315 293 2700.1

0.15

0.2

0.25

0.3

0.35

0.4

Source orientation θ, SNR 5 dB

ST

OI

Near mic.

DSB

R−DSB, Mirror: F1

DRWR−DSB, Mirror: F1, α=2

(a) STOI using real RIR

90 68 45 23 0 338 315 293 270−14

−12

−10

−8

−6

Source orientation θ, SNR 5 dB

SS

NR

[d

B]

(b) SSNR using real RIR

Fig. 4: Performance as a function of source orientation for best rak-ing variants and baseline methods using measured RIRs.

roring order is increased. These inaccuracies are probably the reasonwhy performance decreases when F2 mirror order is considered overF1. Nevertheless, the mirroring scheme was shown to have signif-icant improvement over the DSB and the proposed reflection coef-ficients and source directivity modeling improved the performanceover the traditional raking DSB method.

3.3. Results on Simulated RIRs

To complete the experimental analysis we evaluated the performanceof the investigated algorithms on a set of RIRs simulated via theimage method [16], considering the same layout as for the real data,with varying reverberation time (T60 = 0.1, 0.3, . . . , 0.9 s). Weemployed a modified version of the image method that accounts forsource directivity using the modeling in Eq. (7). We considered theaverage performance over all the 9 orientations for SNR=+5 dB andwith directivity parameter2 D=2.

Figure 5 analyses the impact of modeling the source directivityin the attenuation definition of Eq. (5), reporting the average perfor-mance of the proposed DRWR-DSB in comparison with R-DSB asa function of the directivity parameter α, in moderate reverberationT60 = 0.5 s. First of all, note that, opposite to what was observedin the experiments on measured RIRs, the performance improves asthe number of mirrors increases for both metrics. This makes senseas in this case the propagation model perfectly matches the simu-lated acoustic propagation. When a non-omnidirectional source isconsidered (α > 0) DRWR-DSB is always superior to the R-DSB.Note that a minor gain is obtained also for α = 0 thanks to modelingthe reflection coefficient. For all mirrors and both metrics, the bestperformance is achieved when the radiation parameters in Eq. (5)matches the source radiation patter (i.e. α = D = 2). The algorithmis not very sensitive to the selection of α, performance improves forany α > 0 and presents a rather flat maximum.

Finally, we conclude the analysis of the simulated data consider-ing, in Fig. 6, the average performance as a function of the reverber-ation time. Also in this case, the proposed DRWR-DSB outperformsall the other methods independently of the reverberation time. Inter-estingly, the performance of DRWR-DSB for α = 0 converges tothose of R-DSB for high T60, since the reflection coefficients tendsto 1, providing only a minor improvement. When using the STOImetric, the traditional DSB performs very similarly to DRWR-DSBfor T60 = 0.1 s. The reason is that echoes we are adding are ex-tremely weak in this case and cannot improve the performance whenthe source is not frontal to the array. However, accounting for thesource directivity provides slightly better performance.

2Note that we use D to indicate the actual source radiation pattern whileα is the parameter used in the weight computation in Eq. (5)

0 0.5 1 1.5 2 3 4 5 690

100

110

120

130

140

150

160

170

STOI improvement over nearest microphone

Source directivity parameter α

Rel

ativ

e im

pro

vem

ent

[%]

DSBDRWR-DSB, Mirror: P1

DRWR-DSB, Mirror: F1

DRWR-DSB, Mirror: F2R-DSB, Mirror: P1

R-DSB, Mirror: F1

R-DSB, Mirror: F2

(a) STOI for simulations

0 0.5 1 1.5 2 3 4 5 6

2

2.5

3

3.5

SSNR improvement over nearest microphone

Source directivity parameter α

Incr

ease

in S

NR

[dB

]

(b) SSNR for simulations

Fig. 5: Performance gain with respect to the nearest microphonein terms of STOI and SSNR, averaged over all orientations, on thesimulated data for T60 = 0.5s, D = 2 and different values of α

0.1 0.3 0.5 0.7 0.960

80

100

120

140

160

180STOI improvement over nearest microphone

Reverberation time T60

(s)

Rel

ativ

e im

pro

vem

ent

[%]

(a) STOI for simulations

0.1 0.3 0.5 0.7 0.9

1

2

3

4

5

6SSNR improvement over nearest microphone

Reverberation time T60

(s)

Incr

ease

in S

NR

[dB

]

DSB

R−DSB, Mirror: F2

DRWR−DSB,Mirror:F2 α = 0

DRWR−DSB,Mirror:F2 α = 2

(b) SSNR for simulations

Fig. 6: Performance gain with respect to the nearest microphone interms of STOI and SSNR on the simulated data for D = 2 andmirrors F2, as a function of the reverberation time.

Although not reported here, scores for real and simulated datawith high SNR (+20 dB) exhibit similar benefits regarding STOI andSSNR, but are less pronounced for the SSNR gain.

4. CONCLUSIONSThis paper presented an environment-aware speech enhancementmethod for directive sources based on the use of DSB in combinationwith mirrored microphone arrays. The proposed method extends so-lutions already available in literature in two ways: i) by weightingthe contribution of mirrored microphones according to the reflectioncoefficient to take energy loss of reflections into consideration, andii) by accounting for the source directivity. Experiments on sim-ulated as well as real RIRs show that the rake beamforming ben-efits from this enhanced modeling. Interestingly, the performanceimprovement is not sensitive to parameter variations, in particularthe source directivity α. As shown in Fig. 6 the introduction of thereflection coefficient in the weight computation is particularly ben-eficial over the R-DSB in low reverberation, where the reflectioncoefficients are smaller than 1. Ideally, increasing the number ofreflections would improve the performance. However, in practicalscenarios the mismatch between the actual acoustic propagation andthe geometric model, together with inaccuracies in the environmentdescription, would lead to error accumulation and reduced perfor-mance if a large number of mirrors is considered. Therefore, theoptimum number of mirrors has to be determined based on the ap-plication scenario, the modeling and the desired computational cost.

Future work will address the adoption of this approach in moreadvanced enhancement frameworks (e.g., MVDR). A further openissue to be investigated is related to the robustness against erroneoussource position and orientation and inaccurate reflector characteriza-tion. This is particularly crucial in a fully automated system wherethe room geometry and the acoustic properties of the environment(and of the source) are automatically estimated.

(a) Performance as a function of sourceorientation using measured RIRs.

90 68 45 23 0 338 315 293 2700.1

0.15

0.2

0.25

0.3

0.35

0.4

Source orientation θ, SNR 5 dB

ST

OI

Near mic.

DSB

R−DSB, Mirror: F1

DRWR−DSB, Mirror: F1, α=2

(a) STOI using real RIR

90 68 45 23 0 338 315 293 270−14

−12

−10

−8

−6

Source orientation θ, SNR 5 dB

SS

NR

[dB

]

(b) SSNR using real RIR

Fig. 4: Performance as a function of source orientation for best rak-ing variants and baseline methods using measured RIRs.

roring order is increased. These inaccuracies are probably the reasonwhy performance decreases when F2 mirror order is considered overF1. Nevertheless, the mirroring scheme was shown to have signif-icant improvement over the DSB and the proposed reflection coef-ficients and source directivity modeling improved the performanceover the traditional raking DSB method.

3.3. Results on Simulated RIRs

To complete the experimental analysis we evaluated the performanceof the investigated algorithms on a set of RIRs simulated via theimage method [16], considering the same layout as for the real data,with varying reverberation time (T60 = 0.1, 0.3, . . . , 0.9 s). Weemployed a modified version of the image method that accounts forsource directivity using the modeling in Eq. (7). We considered theaverage performance over all the 9 orientations for SNR=+5 dB andwith directivity parameter2 D=2.

Figure 5 analyses the impact of modeling the source directivityin the attenuation definition of Eq. (5), reporting the average perfor-mance of the proposed DRWR-DSB in comparison with R-DSB asa function of the directivity parameter α, in moderate reverberationT60 = 0.5 s. First of all, note that, opposite to what was observedin the experiments on measured RIRs, the performance improves asthe number of mirrors increases for both metrics. This makes senseas in this case the propagation model perfectly matches the simu-lated acoustic propagation. When a non-omnidirectional source isconsidered (α > 0) DRWR-DSB is always superior to the R-DSB.Note that a minor gain is obtained also for α = 0 thanks to modelingthe reflection coefficient. For all mirrors and both metrics, the bestperformance is achieved when the radiation parameters in Eq. (5)matches the source radiation patter (i.e. α = D = 2). The algorithmis not very sensitive to the selection of α, performance improves forany α > 0 and presents a rather flat maximum.

Finally, we conclude the analysis of the simulated data consider-ing, in Fig. 6, the average performance as a function of the reverber-ation time. Also in this case, the proposed DRWR-DSB outperformsall the other methods independently of the reverberation time. Inter-estingly, the performance of DRWR-DSB for α = 0 converges tothose of R-DSB for high T60, since the reflection coefficients tendsto 1, providing only a minor improvement. When using the STOImetric, the traditional DSB performs very similarly to DRWR-DSBfor T60 = 0.1 s. The reason is that echoes we are adding are ex-tremely weak in this case and cannot improve the performance whenthe source is not frontal to the array. However, accounting for thesource directivity provides slightly better performance.

2Note that we use D to indicate the actual source radiation pattern whileα is the parameter used in the weight computation in Eq. (5)

0 0.5 1 1.5 2 3 4 5 690

100

110

120

130

140

150

160

170

STOI improvement over nearest microphone

Source directivity parameter α

Rel

ativ

e im

pro

vem

ent

[%]

DSBDRWR-DSB, Mirror: P1

DRWR-DSB, Mirror: F1

DRWR-DSB, Mirror: F2R-DSB, Mirror: P1

R-DSB, Mirror: F1

R-DSB, Mirror: F2

(a) STOI for simulations

0 0.5 1 1.5 2 3 4 5 6

2

2.5

3

3.5

SSNR improvement over nearest microphone

Source directivity parameter α

Incr

ease

in S

NR

[dB

]

(b) SSNR for simulations

Fig. 5: Performance gain with respect to the nearest microphonein terms of STOI and SSNR, averaged over all orientations, on thesimulated data for T60 = 0.5s, D = 2 and different values of α

0.1 0.3 0.5 0.7 0.960

80

100

120

140

160

180STOI improvement over nearest microphone

Reverberation time T60

(s)

Rel

ativ

e im

pro

vem

ent

[%]

(a) STOI for simulations

0.1 0.3 0.5 0.7 0.9

1

2

3

4

5

6SSNR improvement over nearest microphone

Reverberation time T60

(s)

Incr

ease

in S

NR

[dB

]

DSB

R−DSB, Mirror: F2

DRWR−DSB,Mirror:F2 α = 0

DRWR−DSB,Mirror:F2 α = 2

(b) SSNR for simulations

Fig. 6: Performance gain with respect to the nearest microphone interms of STOI and SSNR on the simulated data for D = 2 andmirrors F2, as a function of the reverberation time.

Although not reported here, scores for real and simulated datawith high SNR (+20 dB) exhibit similar benefits regarding STOI andSSNR, but are less pronounced for the SSNR gain.

4. CONCLUSIONSThis paper presented an environment-aware speech enhancementmethod for directive sources based on the use of DSB in combinationwith mirrored microphone arrays. The proposed method extends so-lutions already available in literature in two ways: i) by weightingthe contribution of mirrored microphones according to the reflectioncoefficient to take energy loss of reflections into consideration, andii) by accounting for the source directivity. Experiments on sim-ulated as well as real RIRs show that the rake beamforming ben-efits from this enhanced modeling. Interestingly, the performanceimprovement is not sensitive to parameter variations, in particularthe source directivity α. As shown in Fig. 6 the introduction of thereflection coefficient in the weight computation is particularly ben-eficial over the R-DSB in low reverberation, where the reflectioncoefficients are smaller than 1. Ideally, increasing the number ofreflections would improve the performance. However, in practicalscenarios the mismatch between the actual acoustic propagation andthe geometric model, together with inaccuracies in the environmentdescription, would lead to error accumulation and reduced perfor-mance if a large number of mirrors is considered. Therefore, theoptimum number of mirrors has to be determined based on the ap-plication scenario, the modeling and the desired computational cost.

Future work will address the adoption of this approach in moreadvanced enhancement frameworks (e.g., MVDR). A further openissue to be investigated is related to the robustness against erroneoussource position and orientation and inaccurate reflector characteriza-tion. This is particularly crucial in a fully automated system wherethe room geometry and the acoustic properties of the environment(and of the source) are automatically estimated.

(b) Performance gain as a function of thereverberation time.

Figure 3: Performance improvement for each angle in real-data (le�) anddi�erent simulated reverberation times (right), SNR +5 dB.

Conclusions

Experiments on simulated as well as real RIRs using objectiveintelligibility and SNR of speech signals show that the rakebeamforming benefits from the proposed enhanced modelingof i) distance based gain, ii) source radiation modeling, and iii)reflection loss modeling.

[1] P. Svaizer, A. Bru�i, and M. Omologo, “Environment aware estimation of the orientation of acoustic sources using a line array,” in Proceedings ofthe 20th European Signal Processing Conference (EUSIPCO), Aug 2012, pp. 1024–1028.

[2] Teemu Korhonen, “Acoustic localization using reverberation with virtual microphones,” in Proc. IWAENC, 2008.

[3] F. Antonacci, J. Filos, M. R. P. Thomas, E. A. P. Habets, A. Sarti, P. A. Naylor, and S. Tubaro, “Inference of room geometry from acoustic impulseresponses,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 10, pp. 2683–2695, Dec 2012.

[4] Ivan Dokmanić, Reza Parhizkar, Andreas Walther, Yue M. Lu, and Martin Ve�erli, “Acoustic echoes reveal room shape,” Proceedings of theNational Academy of Sciences (PNAS), vol. 110, no. 30, pp. 12186–12191, 2013.

[5] P. Annibale, F. Antonacci, P. Bestagini, A. Bru�i, A. Canclini, L. Cristofore�i, E. A. P. Habets, J. Filos, W. Kellermann, K. Kowalczyk, A. Lombard,E. Mabande, D. Markovic, P. A. Naylor, M. Omologo, R. Rabenstein, A. Sarti, P. Svaizer, and M. R. P. Thomas, “The SCENIC Project: Space-TimeAudio Processing for Environment-Aware Acoustic Sensing and Rendering,” in Audio Engineering Society Convention 131, Oct 2011.

[6] T. Nishiura, S. Nakanura, and K. Shikano, “Speech enhancement by multiple beamforming with reflection signal equalization,” in Proc. IEEEInternational Conference on Acoustics, Speech, and Signal Processing (ICASSP’01), 2001, vol. 1, pp. 189–192.

[7] I. Dokmanić, R. Scheibler, and M. Ve�erli, “Raking the cocktail party,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 5, pp. 825–836,Aug 2015.

[8] T. Betlehem and R.C. Williamson, “Acoustic beamforming exploiting directionality of human speech sources,” in IEEE Workshop on Applications ofSignal Processing to Audio and Acoustics (WASPAA), 2003, pp. 147–148.

http://www.cs.tut.fi/∼pertila/ [email protected]