Koyama ASA ASJ joint meeting 2016

Super-resolution in sound field recording and reproduction based on sparse representation

Shoichi Koyama1,2, Naoki Murata1, and Hiroshi Saruwatari1

1 The University of Tokyo, 2 Paris Diderot University / Institute Langevin

November 29, 2016

Sound field reproduction for audio system

[Figure: microphone array in the recording area, loudspeaker array in the target area]

• Large listening area can be achieved
• Listeners can perceive source distance
• Real-time recording and reproduction can be achieved without recording engineers



Applications

• Telecommunication system (via network)
• Home theatre
• Live broadcasting



Sound Field Recording and Reproduction

Improve reproduction accuracy when # of array elements is small

• # of microphones > # of loudspeakers
– Higher reproduction accuracy within local region of target area [Ahrens+ AES Conv. 2010], [Ueno+ ICASSP 2017 (submitted)]
• # of microphones < # of loudspeakers
– Higher reproduction accuracy of sources in local region of recording area [Koyama+ IEEE JSTSP 2015], [Koyama+ ICASSP 2014, 2015]


Obtain driving signals of secondary sources (= loudspeakers) arranged on the boundary of the target area to reconstruct the desired sound field inside it

Inherently, the sound pressure and its gradient on the boundary are required to obtain the driving signals, but usually only the sound pressure is known

Signal conversion for sound field recording and reproduction with ordinary acoustic sensors and transducers is necessary

Primary sources


Conventional: WFR filtering method [Koyama+ IEEE TASLP 2013]

[Figure: primary sources in the recording area are captured on the receiving plane; the received signals pass through signal conversion to become driving signals for the secondary source plane in the target area; plane waves are shown on both sides]

• Each plane wave determines entire sound field
• Signal conversion can be achieved in spatial frequency domain

Conventional: WFR filtering method [Koyama+ IEEE TASLP 2013]

• Each plane wave determines entire sound field
• Spatial aliasing artifacts due to plane wave decomposition
• Significant error at high frequencies even when # of microphones < # of loudspeakers
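The aliasing problem can be illustrated with a short sketch. It assumes a uniform linear array with the 32 microphones and 6 cm spacing quoted in the simulation setup, a speed of sound of 343 m/s, and a grazing-incidence plane wave at 4 kHz; the helper name `aliased_wavenumber` is made up for this example. Once the trace wavenumber exceeds pi / d, the sampled plane wave cannot be told apart from a lower-wavenumber one.

```python
import numpy as np

def aliased_wavenumber(kx, spacing):
    """Fold a trace wavenumber kx [rad/m] into the baseband [-pi/d, pi/d)."""
    k_nyq = np.pi / spacing
    return (kx + k_nyq) % (2.0 * k_nyq) - k_nyq

c = 343.0                        # assumed speed of sound [m/s]
d = 0.06                         # microphone spacing [m]
x = np.arange(32) * d            # microphone positions along the array
kx = 2.0 * np.pi * 4000.0 / c    # grazing-incidence plane wave at 4 kHz
kx_alias = aliased_wavenumber(kx, d)

# On the sampling grid the two plane waves give identical samples.
same_samples = np.allclose(np.exp(1j * kx * x), np.exp(1j * kx_alias * x))
```

Here `kx` lies above pi / d, so a plane wave decomposition of the microphone signals assigns its energy to `kx_alias` instead.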

Sound field representation for super-resolution

Plane wave decomposition suffers from spatial aliasing artifacts because many basis functions are used

Observed signals should be represented by a few basis functions for accurate interpolation of sound field

Appropriate basis function may be close to pressure distribution originating from sound sources

To obtain driving signals of loudspeakers, basis functions must be fundamental solutions of Helmholtz equation (e.g., Green's functions)
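As a concrete example of such a basis function, the sketch below evaluates the three-dimensional free-field Green's function of a monopole at a set of microphone positions. It assumes the exp(-j omega t) time convention, a speed of sound of 343 m/s, and a hypothetical array on the x-axis; the function name `free_field_green` is illustrative only.

```python
import numpy as np

def free_field_green(r, r_src, k):
    """3D free-field Green's function exp(jk|r - r'|) / (4 pi |r - r'|).

    Assumes the exp(-j*omega*t) time convention; the opposite convention
    flips the sign of the exponent.
    """
    diff = np.asarray(r, dtype=float) - np.asarray(r_src, dtype=float)
    dist = np.linalg.norm(diff, axis=-1)
    return np.exp(1j * k * dist) / (4.0 * np.pi * dist)

c = 343.0                          # assumed speed of sound [m/s]
k = 2.0 * np.pi * 1000.0 / c       # wavenumber at 1 kHz
# Hypothetical linear array: 32 microphones at 6 cm spacing on the x-axis.
mics = np.stack([np.arange(32) * 0.06, np.zeros(32), np.zeros(32)], axis=-1)
src = np.array([-0.32, -0.84, 0.0])
g = free_field_green(mics, src, k)   # one basis function sampled at the array
```

The magnitude of `g` decays as 1 / (4 pi r) with distance, which is the pressure pattern a point source would produce at the array.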

[Figure: received signals represented with basis functions]

Sound field decomposition into fundamental solutions of Helmholtz equation is necessary

Sound field decomposition

Generative model of sound field [Koyama+ ICASSP 2014]

• Sound field is divided into two regions
• Inhomogeneous and homogeneous Helmholtz eq.
• Distribution of source components
• Inhomogeneous + homogeneous terms
– Inhomogeneous term: Green's function
– Homogeneous term: plane wave
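The equations of this slide did not survive extraction. One common way to write such a generative model is sketched below; the symbols (source region \Omega, source distribution Q, Green's function G, homogeneous term p_h) are assumed notation, not taken from the slide.

```latex
(\nabla^2 + k^2)\, p(\mathbf{r}) =
\begin{cases}
  -Q(\mathbf{r}), & \mathbf{r} \in \Omega \\
  0,              & \mathbf{r} \notin \Omega
\end{cases}
\qquad
p(\mathbf{r}) =
  \underbrace{\int_{\Omega} Q(\mathbf{r}')\, G(\mathbf{r}|\mathbf{r}')\, \mathrm{d}\mathbf{r}'}_{\text{inhomogeneous term}}
  + \underbrace{p_{\mathrm{h}}(\mathbf{r})}_{\text{homogeneous term}}
```

Here G is taken to satisfy (\nabla^2 + k^2) G(\mathbf{r}|\mathbf{r}') = -\delta(\mathbf{r} - \mathbf{r}'), and p_h satisfies the homogeneous Helmholtz equation, so it can be expanded into plane waves.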


Generative model of sound field

• Observe sound pressure distribution on plane
• Conversion into driving signals
– Direct source components: synthesize monopole sources [Spors+ AES Conv. 2008]
– Ambient components: apply WFR filtering method [Koyama+ IEEE TASLP 2013]
• Decomposition into two components can lead to higher reproduction accuracy above spatial Nyquist frequency

Sparse sound field representation

[Figure: microphone array observing source components placed on grid points (discretization) together with ambient components]

Sparse signal representation in vector form
• Observed signal = dictionary matrix of Green's functions × distribution of source components + ambient components
• Only a few elements of the source-component distribution have non-zero values under the assumption of a spatially sparse source distribution

Sparsity-based signal decomposition
• Signal decomposition based on sparsity of the source-component distribution
• Minimize a sparsity-inducing norm of the source-component distribution
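A minimal sketch of such a decomposition is given below, using regularized FOCUSS (iteratively reweighted minimum-norm) as the solver. Everything specific here is an assumption for illustration: the notation y = D x, the array and grid geometry, the source position, the speed of sound (343 m/s), and the parameters p, lam, and n_iter. It is not the authors' implementation, and the ambient component is omitted for brevity.

```python
import numpy as np

def green_dictionary(mics, grid, k):
    """Dictionary whose (m, n) entry is the free-field Green's function
    between microphone m and candidate grid point n."""
    dist = np.linalg.norm(mics[:, None, :] - grid[None, :, :], axis=-1)
    return np.exp(1j * k * dist) / (4.0 * np.pi * dist)

def focuss(D, y, p=0.8, lam=1e-6, n_iter=100):
    """Regularized FOCUSS: reweighted minimum-norm iterations that promote
    a sparse x with D @ x close to y."""
    M, N = D.shape
    x = np.ones(N, dtype=complex)            # uniform weights on the first pass
    for _ in range(n_iter):
        w2 = np.abs(x) ** (2.0 - p)          # diagonal of the weighting matrix
        A = (D * w2) @ D.conj().T            # D diag(w2) D^H
        reg = lam * np.trace(A).real / M     # scale-relative regularization
        x = w2 * (D.conj().T @ np.linalg.solve(A + reg * np.eye(M), y))
    return x

# Hypothetical geometry: 32 microphones at 6 cm spacing on the x-axis and a
# 2.4 m x 2.4 m grid (10 cm and 20 cm steps) in front of the array.
c, f = 343.0, 1000.0
k = 2.0 * np.pi * f / c
mics = np.stack([(np.arange(32) - 15.5) * 0.06, np.zeros(32)], axis=-1)
gx, gy = np.meshgrid(np.linspace(-1.2, 1.2, 25), np.linspace(-2.6, -0.2, 13))
grid = np.stack([gx.ravel(), gy.ravel()], axis=-1)

D = green_dictionary(mics, grid, k)
x_true = np.zeros(grid.shape[0], dtype=complex)
x_true[234] = 1.0                            # one source on a grid point
y = D @ x_true                               # noiseless observation
x_hat = focuss(D, y)
n_active = int(np.sum(np.abs(x_hat) > 1e-2 * np.abs(x_hat).max()))
```

The relative regularization keeps the update insensitive to the overall scale of the dictionary; a fixed value would need retuning whenever the geometry changes.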

Group sparsity based on physical properties

Sparse signal representation in vector form: structure of sparsity induced by physical properties

Group sparse signal models for robust decomposition
• Multiple time frames
• Temporal frequencies
• Multipole components

Decomposition algorithm extending FOCUSS [Koyama+ ICASSP 2015]
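The multiple-time-frame case can be sketched as a row-sparse problem: every frame shares the same active grid points, so one weight is used per row. The code below is in the spirit of M-FOCUSS and is not the algorithm of [Koyama+ ICASSP 2015]. The random dictionary is a stand-in for a Green's function dictionary, and all sizes and parameters are assumptions.

```python
import numpy as np

def m_focuss(D, Y, p=0.8, lam=1e-6, n_iter=100):
    """Row-sparse (group-sparse) FOCUSS: all columns of Y share one support."""
    M, N = D.shape
    X = np.ones((N, Y.shape[1]), dtype=complex)
    for _ in range(n_iter):
        w2 = np.linalg.norm(X, axis=1) ** (2.0 - p)   # one weight per group (row)
        A = (D * w2) @ D.conj().T                     # D diag(w2) D^H
        reg = lam * np.trace(A).real / M              # scale-relative regularization
        X = w2[:, None] * (D.conj().T @ np.linalg.solve(A + reg * np.eye(M), Y))
    return X

rng = np.random.default_rng(0)
M, N, T = 16, 64, 8            # sensors, grid points, time frames (assumed sizes)
# Stand-in dictionary with unit-norm random columns (not Green's functions).
D = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
D /= np.linalg.norm(D, axis=0)

support = np.array([5, 40])    # two active grid points shared by all frames
X_true = np.zeros((N, T), dtype=complex)
X_true[support] = rng.standard_normal((2, T)) + 1j * rng.standard_normal((2, T))
Y = D @ X_true                 # noiseless multi-frame observation

X_hat = m_focuss(D, Y)
row_energy = np.linalg.norm(X_hat, axis=1)
estimated_support = np.sort(np.argsort(row_energy)[-2:])
```

Grouping over temporal frequencies or multipole components would follow the same pattern, with the row norm replaced by a norm over the corresponding group of coefficients.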

Block diagram of signal conversion

• Decomposition stage
– Group sparse decomposition of the observed signal
• Reconstruction stage
– Source components and ambient components are respectively converted into driving signals
– The driving signal is obtained as the sum of the two components

Simulation Experiment

Proposed method (Proposed), WFR filtering method (WFR), and Sound Pressure Control method (SPC) were compared

• 32 microphones (6 cm intervals) and 48 loudspeakers (4 cm intervals)
• Source region: rectangular region of 2.4 × 2.4 m
• Grid points: (10 cm, 20 cm) intervals
• Source directivity: unidirectional
• Source signal: single-frequency sine wave
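For reference, the spatial Nyquist frequency of a uniform array with spacing d is commonly taken as c / (2d). A minimal sketch for the two spacings quoted above, assuming c = 343 m/s (the slides do not state the speed of sound); `spatial_nyquist_hz` is an illustrative name.

```python
def spatial_nyquist_hz(spacing_m, c=343.0):
    """Frequency above which a uniform array with the given spacing aliases."""
    return c / (2.0 * spacing_m)

f_mic = spatial_nyquist_hz(0.06)   # microphone array, about 2.86 kHz
f_spk = spatial_nyquist_hz(0.04)   # loudspeaker array, about 4.29 kHz
```

Under this assumption, 1.0 kHz lies below the microphone-array limit and 4.0 kHz lies above it.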



Simulation Experiment

Signal-to-distortion ratio of reproduction (SDRR)



Original pressure distribution

Reproduced pressure distribution
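The SDRR formula itself did not survive extraction. A sketch under the usual definition is given below: the energy of the original pressure distribution divided by the energy of the reproduction error, in decibels, summed over the evaluation points. The function name and the sample values are illustrative.

```python
import numpy as np

def sdrr_db(p_org, p_rep):
    """Signal-to-distortion ratio of reproduction in dB (assumed definition)."""
    p_org = np.asarray(p_org)
    p_rep = np.asarray(p_rep)
    signal_energy = np.sum(np.abs(p_org) ** 2)
    error_energy = np.sum(np.abs(p_rep - p_org) ** 2)
    return 10.0 * np.log10(signal_energy / error_energy)

# Toy check: a uniform 10 % amplitude error gives an SDRR of about 20 dB.
p_org = np.ones(100, dtype=complex)
p_rep = 1.1 * p_org
value = sdrr_db(p_org, p_rep)
```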


Frequency vs. SDRR

SDRRs above spatial Nyquist frequency were improved

Source location: (-0.32, -0.84, 0.0) m

Reproduced sound pressure distribution (1.0 kHz)

Source location: (-0.32, -0.84, 0.0) m

[Figure: pressure and error distributions for each method]

Method     SDRR
Proposed   18.1 dB
WFR        18.0 dB
SPC        19.4 dB

Reproduced sound pressure distribution (4.0 kHz)

Source location: (-0.32, -0.84, 0.0) m

[Figure: pressure and error distributions for each method]

Method     SDRR
Proposed   19.7 dB
WFR        6.8 dB
SPC        7.8 dB

Frequency response of reproduced sound field


Frequency response at (0.0, 1.0, 0.0) m

Reproduced frequency response was improved

Conclusion

Super-resolution sound field recording and reproduction based on sparse representation
– Conventional plane wave decomposition suffers from spatial aliasing artifacts
– Sound field representation using source and plane wave components
– Sound field decomposition based on spatial sparsity of source components
– Group sparsity based on physical properties of sound field
– Experimental results indicated that reproduction accuracy above spatial Nyquist frequency can be improved

Thank you for your attention!

Related publications

• S. Koyama and H. Saruwatari, “Sound field decomposition in reverberant environment using sparse and low-rank signal models,” Proc. IEEE ICASSP, 2016.
• N. Murata, S. Koyama, et al., “Sparse sound field decomposition with multichannel extension of complex NMF,” Proc. IEEE ICASSP, 2016.
• S. Koyama, et al., “Sparse sound field decomposition using group sparse Bayesian learning,” Proc. APSIPA ASC, 2015.
• N. Murata, S. Koyama, et al., “Sparse sound field decomposition with parametric dictionary learning for super-resolution recording and reproduction,” Proc. IEEE CAMSAP, 2015.
• S. Koyama, et al., “Source-location-informed sound field recording and reproduction with spherical arrays,” Proc. IEEE WASPAA, 2015.
• S. Koyama, et al., “Source-location-informed sound field recording and reproduction,” IEEE J. Sel. Topics Signal Process., vol. 9, no. 5, pp. 881-894, 2015.
• S. Koyama, et al., “Structured sparse signal models and decomposition algorithm for super-resolution in sound field recording and reproduction,” Proc. IEEE ICASSP, 2015.
• S. Koyama, et al., “Sparse sound field representation in recording and reproduction for reducing spatial aliasing artifacts,” Proc. IEEE ICASSP, 2014.
