Super-resolution in sound field recording and reproduction based on sparse representation
Shoichi Koyama (1,2), Naoki Murata (1), and Hiroshi Saruwatari (1)
(1) The University of Tokyo, (2) Paris Diderot University / Institute Langevin
November 29, 2016
Sound field reproduction for audio system
[Figure: microphone array in the recording area, loudspeaker array in the target area]
• Large listening area can be achieved
• Listeners can perceive source distance
• Real-time recording and reproduction can be achieved without recording engineers
Applications
• Telecommunication system (over a network)
• Home theatre
• Live broadcasting
Improve reproduction accuracy when the number of array elements is small
• # of microphones > # of loudspeakers
– Higher reproduction accuracy within a local region of the target area
• # of microphones < # of loudspeakers
– Higher reproduction accuracy of sources in a local region of the recording area [Koyama+ IEEE JSTSP 2015], [Koyama+ ICASSP 2014, 2015]
[Ahrens+ AES Conv. 2010], [Ueno+ ICASSP 2017 (submitted)]
Sound Field Recording and Reproduction
[Figure: primary sources in the recording area, secondary sources around the target area]
• Obtain the driving signals of secondary sources (= loudspeakers) arranged on the boundary of the target area so as to reconstruct the desired sound field inside it
• Inherently, both the sound pressure and its gradient on the boundary are required to obtain the driving signals, but usually only the sound pressure is known
• Signal conversion for sound field recording and reproduction with ordinary acoustic sensors and transducers is therefore necessary
Conventional: WFR filtering method [Koyama+ IEEE TASLP 2013]
[Figure: primary sources, receiving plane (received signals), signal conversion, secondary source plane (driving signals); plane waves in the recording and target areas]
• Each plane wave determines the entire sound field
• Signal conversion can be achieved in the spatial frequency domain
• Spatial aliasing artifacts arise from the plane wave decomposition
• Significant error at high frequencies even when # of microphones < # of loudspeakers
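The spatial aliasing mentioned for plane wave decomposition can be illustrated with a minimal sketch. It samples a single plane wave along a uniform line of microphones and looks at where its spatial DFT peaks. The function name `dominant_wavenumber`, the 32-element, 6 cm array, and the chosen wavenumbers are illustrative assumptions, not part of the WFR filtering method itself.

```python
import numpy as np

def dominant_wavenumber(kx, spacing, n_mics=32):
    """Wavenumber (rad/m) at which the spatial DFT of a sampled plane wave peaks.

    kx is the wavenumber component along the array, spacing the microphone
    interval in metres. Hypothetical helper for illustration only.
    """
    x = np.arange(n_mics) * spacing              # microphone positions
    p = np.exp(1j * kx * x)                      # plane wave sampled on the array
    spectrum = np.fft.fft(p)                     # spatial DFT
    k_axis = 2.0 * np.pi * np.fft.fftfreq(n_mics, d=spacing)
    return k_axis[np.argmax(np.abs(spectrum))]

spacing, n_mics = 0.06, 32
bin_width = 2.0 * np.pi / (n_mics * spacing)     # DFT resolution in rad/m
k_nyquist = np.pi / spacing                      # spatial Nyquist wavenumber

below = dominant_wavenumber(5 * bin_width, spacing, n_mics)    # under the limit
above = dominant_wavenumber(20 * bin_width, spacing, n_mics)   # over the limit
print(f"Nyquist wavenumber: {k_nyquist:.2f} rad/m")
print(f"true {5 * bin_width:.2f} -> observed {below:.2f} rad/m")
print(f"true {20 * bin_width:.2f} -> observed {above:.2f} rad/m")
```

A wave below the Nyquist wavenumber is observed at its true wavenumber, while one above it is folded back to a different (here negative) wavenumber, so the decomposition assigns it to the wrong plane wave.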
Sound field representation for super-resolution
• Plane wave decomposition suffers from spatial aliasing artifacts because many basis functions are used
• Observed signals should be represented by only a few basis functions for accurate interpolation of the sound field
• An appropriate basis function should be close to the pressure distribution originating from the sound sources
• To obtain the driving signals of the loudspeakers, the basis functions must be fundamental solutions of the Helmholtz equation (e.g. Green's functions)
[Figure: basis functions and received signals]
• Sound field decomposition into fundamental solutions of the Helmholtz equation is necessary
Sound field decomposition
Generative model of sound field [Koyama+ ICASSP 2014]
• Sound field is divided into two regions
• Inhomogeneous and homogeneous Helmholtz equations, with a distribution of source components
• Sound field = inhomogeneous + homogeneous terms: Green's functions and plane waves
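The equations of this slide did not survive extraction. As a hedged sketch of what an inhomogeneous plus homogeneous Helmholtz model usually looks like, the block below writes it in standard form. All symbols are assumptions introduced here: p is the sound pressure, k the wavenumber, Ω the region containing the sources, Q the distribution of source components, G the free-field Green's function, and p_h the homogeneous term, which can be expanded into plane waves. The sign of the exponent depends on the time convention and is also an assumption.

```latex
% Assumed notation; not recovered from the original slide.
\begin{align}
(\nabla^2 + k^2)\, p(\mathbf{r}) &=
  \begin{cases}
    -Q(\mathbf{r}), & \mathbf{r} \in \Omega \\
    0, & \mathbf{r} \notin \Omega
  \end{cases} \\
p(\mathbf{r}) &= \int_{\Omega} Q(\mathbf{r}')\, G(\mathbf{r} | \mathbf{r}')\,
  \mathrm{d}\mathbf{r}' + p_h(\mathbf{r}) \\
G(\mathbf{r} | \mathbf{r}') &=
  \frac{e^{jk\|\mathbf{r} - \mathbf{r}'\|}}{4\pi \|\mathbf{r} - \mathbf{r}'\|}
\end{align}
```

With this G, (∇² + k²)G = −δ, so the integral term solves the inhomogeneous equation and p_h accounts for everything that satisfies the homogeneous one.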
• Observe the sound pressure distribution on a plane
Conversion into driving signals
• Direct source components: synthesize monopole sources [Spors+ AES Conv. 2008]
• Ambient components: apply the WFR filtering method [Koyama+ IEEE TASLP 2013]
• Decomposition into two components can lead to higher reproduction accuracy above the spatial Nyquist frequency
Sparse sound field representation
[Figure: microphone array, grid points, source components, ambient components]
• Discretization: the region containing the sources is discretized into grid points
• Sparse signal representation in vector form: the observed signal equals the dictionary matrix of Green's functions times the distribution of source components, plus the ambient components
• Only a few elements of the source distribution vector have non-zero values under the assumption of a spatially sparse source distribution
Sparse signal decomposition
• Signal decomposition based on the sparsity of the source distribution vector
• Minimize a sparsity-inducing norm of the source distribution vector
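The decomposition described above can be sketched numerically. The following is a minimal illustration, assuming the notation y ≈ D x (observed signal y, Green's-function dictionary D, source distribution x) and using a plain regularized FOCUSS-style reweighting. It is not the authors' algorithm. The function names, the geometry, the 1 kHz frequency, the norm order p = 1, and the regularization value are all assumptions chosen for a small toy case.

```python
import numpy as np

def green_dictionary(mic_pos, grid_pos, k):
    """Free-field Green's functions exp(jkr)/(4*pi*r), shape (M, N).

    mic_pos: (M, 3) microphone positions, grid_pos: (N, 3) candidate source
    positions, k: wavenumber in rad/m. Hypothetical helper.
    """
    r = np.linalg.norm(mic_pos[:, None, :] - grid_pos[None, :, :], axis=-1)
    return np.exp(1j * k * r) / (4.0 * np.pi * r)

def focuss(D, y, p=1.0, lam=1e-6, n_iter=30):
    """Regularized FOCUSS-style iteration for a sparse x with y ~= D @ x."""
    M = D.shape[0]
    DH = D.conj().T
    # Start from the regularized minimum-norm solution.
    x = DH @ np.linalg.solve(D @ DH + lam * np.eye(M), y)
    for _ in range(n_iter):
        w2 = np.abs(x) ** (2.0 - p)              # squared reweighting factors
        A = (D * w2) @ DH + lam * np.eye(M)      # D W^2 D^H + lam I
        x = w2 * (DH @ np.linalg.solve(A, y))
    return x

# Toy setup: 32 microphones on a line, a coarse grid of candidate sources.
c, f = 343.0, 1000.0                             # assumed speed of sound, frequency
k = 2.0 * np.pi * f / c
mic_x = (np.arange(32) - 15.5) * 0.06
mic_pos = np.stack([mic_x, np.zeros(32), np.zeros(32)], axis=1)
gx, gy = np.meshgrid(np.linspace(-0.8, 0.8, 5), np.array([-0.4, -0.8, -1.2]))
grid_pos = np.stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)], axis=1)

D = green_dictionary(mic_pos, grid_pos, k)
true_idx = 7                                     # one active grid point
y = D[:, true_idx]                               # noiseless observation
x_hat = focuss(D, y)
print("strongest grid point:", int(np.argmax(np.abs(x_hat))))
```

In this noiseless, on-grid case the energy of the estimate concentrates on the active grid point. Real recordings add ambient components and off-grid sources, which is where the regularization and the grid design start to matter.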
Group sparsity based on physical properties
Sparse signal representation in vector form: structure of sparsity induced by physical properties
Group sparse signal models for robust decomposition
• Multiple time frames
• Temporal frequencies
• Multipole components
Decomposition algorithm extending FOCUSS [Koyama+ ICASSP 2015]
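One way to picture group sparsity over multiple time frames is a row-sparse model: the same few grid points are active in every frame, so the reweighting uses the norm of each row rather than of each element. The sketch below follows the general idea of multiple-measurement FOCUSS variants and is not the algorithm of [Koyama+ ICASSP 2015]. The function name, the random stand-in dictionary, and all sizes are assumptions for illustration.

```python
import numpy as np

def group_focuss(D, Y, p=1.0, lam=1e-6, n_iter=30):
    """Row-sparse FOCUSS-style iteration for Y ~= D @ X.

    Y has one column per time frame; rows of X (grid points) are kept or
    suppressed as a group. Hypothetical helper.
    """
    M = D.shape[0]
    DH = D.conj().T
    X = DH @ np.linalg.solve(D @ DH + lam * np.eye(M), Y)
    for _ in range(n_iter):
        w2 = np.linalg.norm(X, axis=1) ** (2.0 - p)   # one weight per row
        A = (D * w2) @ DH + lam * np.eye(M)
        X = w2[:, None] * (DH @ np.linalg.solve(A, Y))
    return X

# Toy data: random complex dictionary standing in for Green's functions.
rng = np.random.default_rng(0)
M, N, T = 24, 40, 8                                   # mics, grid points, frames
D = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
active = [5, 21]                                      # shared support
X_true = np.zeros((N, T), dtype=complex)
X_true[active] = rng.standard_normal((2, T)) + 1j * rng.standard_normal((2, T))
Y = D @ X_true

X_hat = group_focuss(D, Y)
row_energy = np.linalg.norm(X_hat, axis=1)
estimated = sorted(np.argsort(row_energy)[-2:].tolist())
print("estimated active grid points:", estimated)
```

Sharing the support across frames lets every frame contribute evidence for the same grid points, which is the intuition behind calling the decomposition more robust.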
Block diagram of signal conversion
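Decomposition stage
– Group sparse decomposition of the observed signal into source components and ambient components
Reconstruction stage
– The source components and the ambient components are respectively converted into driving signals
– The final driving signal is obtained as the sum of the two components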
Simulation Experiment
Proposed method (Proposed), WFR filtering method (WFR), and Sound Pressure Control method (SPC) were compared
• 32 microphones (6 cm intervals) and 48 loudspeakers (4 cm intervals)
• Assumed source region: rectangular region of 2.4 × 2.4 m; grid points at (10 cm, 20 cm) intervals
• Source directivity: unidirectional
• Source signal: single-frequency sine wave
[Figure: recording area and target area]
Evaluation measure: signal-to-distortion ratio of reproduction (SDRR), computed from the original pressure distribution and the reproduced pressure distribution
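The SDRR formula itself is not recoverable from the slide text. A common energy-ratio definition, sketched below, divides the energy of the original pressure distribution by the energy of the reproduction error and expresses it in decibels. Whether the slides also sum over time or restrict the evaluation to part of the target area is unknown, so the function name `sdrr_db` and its exact form are assumptions.

```python
import numpy as np

def sdrr_db(p_original, p_reproduced):
    """Signal-to-distortion ratio in dB between two sampled pressure fields.

    Both inputs are arrays of (possibly complex) pressures at the same
    evaluation points. Assumed definition, for illustration.
    """
    p_original = np.asarray(p_original)
    p_reproduced = np.asarray(p_reproduced)
    signal = np.sum(np.abs(p_original) ** 2)
    distortion = np.sum(np.abs(p_original - p_reproduced) ** 2)
    return 10.0 * np.log10(signal / distortion)

# A reproduction with a uniform 10 % amplitude error.
example = sdrr_db(np.ones(100), 0.9 * np.ones(100))
print(f"SDRR: {example:.1f} dB")
```

Under this definition each 10 dB step corresponds to a tenfold reduction in error energy relative to the original field.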
Frequency vs. SDRR
Source location: (-0.32, -0.84, 0.0) m
• SDRRs above the spatial Nyquist frequency were improved
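For orientation, the spatial Nyquist frequency of a uniformly spaced array follows from the half-wavelength rule, f = c / (2d). The sketch below evaluates it for the 6 cm microphone and 4 cm loudspeaker intervals of the simulation. The speed of sound of 343 m/s is an assumption, as the slides do not state the value used.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed (not stated in the slides)

def spatial_nyquist_hz(spacing_m, c=SPEED_OF_SOUND):
    """Frequency above which a uniform array with this spacing undersamples the field."""
    return c / (2.0 * spacing_m)

mic_limit = spatial_nyquist_hz(0.06)   # 32 microphones at 6 cm intervals
spk_limit = spatial_nyquist_hz(0.04)   # 48 loudspeakers at 4 cm intervals
print(f"microphone array: {mic_limit:.0f} Hz, loudspeaker array: {spk_limit:.0f} Hz")
```

Under that assumption the microphone array's limit is a little under 2.9 kHz, so the 1.0 kHz result lies below it and the 4.0 kHz result above it.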
Reproduced sound pressure distribution (1.0 kHz)
Source location: (-0.32, -0.84, 0.0) m
[Figure: pressure and error distributions for each method]
Method    Proposed   WFR       SPC
SDRR      18.1 dB    18.0 dB   19.4 dB
Reproduced sound pressure distribution (4.0 kHz)
Source location: (-0.32, -0.84, 0.0) m
[Figure: pressure and error distributions for each method]
Method    Proposed   WFR       SPC
SDRR      19.7 dB    6.8 dB    7.8 dB
Frequency response of reproduced sound field
Frequency response at (0.0, 1.0, 0.0) m
• The reproduced frequency response was improved by the proposed method
Conclusion
Super-resolution sound field recording and reproduction based on sparse representation
– Conventional plane wave decomposition suffers from spatial aliasing artifacts
– Sound field representation using source and plane wave components
– Sound field decomposition based on spatial sparsity of source components
– Group sparsity based on physical properties of the sound field
– Experimental results indicated that reproduction accuracy above the spatial Nyquist frequency can be improved
Thank you for your attention!
Related publications
• S. Koyama and H. Saruwatari, “Sound field decomposition in reverberant environment using sparse and low-rank signal models,” Proc. IEEE ICASSP, 2016.
• N. Murata, S. Koyama, et al., “Sparse sound field decomposition with multichannel extension of complex NMF,” Proc. IEEE ICASSP, 2016.
• S. Koyama, et al., “Sparse sound field decomposition using group sparse Bayesian learning,” Proc. APSIPA ASC, 2015.
• N. Murata, S. Koyama, et al., “Sparse sound field decomposition with parametric dictionary learning for super-resolution recording and reproduction,” Proc. IEEE CAMSAP, 2015.
• S. Koyama, et al., “Source-location-informed sound field recording and reproduction with spherical arrays,” Proc. IEEE WASPAA, 2015.
• S. Koyama, et al., “Source-location-informed sound field recording and reproduction,” IEEE J. Sel. Topics Signal Process., vol. 9, no. 5, pp. 881-894, 2015.
• S. Koyama, et al., “Structured sparse signal models and decomposition algorithm for super-resolution in sound field recording and reproduction,” Proc. IEEE ICASSP, 2015.
• S. Koyama, et al., “Sparse sound field representation in recording and reproduction for reducing spatial aliasing artifacts,” Proc. IEEE ICASSP, 2014.