

Ville Pulkki, Tapio Lokki

Virtual acoustics and spatial audio

ELEC-E5620, 2017 Audio Signal Processing

Agenda 10.3.2017
• Virtual Acoustics
• Spatial audio techniques
• Vector-base amplitude panning VBAP (separate slides)
• Directional audio coding DirAC (separate slides)



Basics of sound
Sound propagates as waves
• Audible frequency range 20 … 20 000 Hz
• Speed of sound in air ~340 m/s
• Wavelength 17 m … 17 mm
• Dynamics 0 … 120 dB
• Scattering, diffraction, interference...
Modeling of room acoustics: source – medium – receiver model


Virtual Acoustics (Väänänen, 2003)



Impulse response of a room

(Figure: source and receiver in a room, with distances of 10 meters and 7 meters.)

Impulse response of a room


Impulse response
A linear time-invariant system (LTI) can be modeled with an impulse response

The output y(t) is the convolution of the input x(t) and the impulse response h(t)

Discrete form (convolution is sum)
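In standard notation (the equations themselves are not included in this text export), the continuous and discrete forms read:

y(t) = (x * h)(t) = ∫ x(τ) h(t − τ) dτ

y[n] = Σ_k x[k] h[n − k]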


Measured impulse responses of real concert halls


(Figure: impulse responses measured in a concert hall at 15 m from the stage – Promenadisali, Sibeliustalo and Musiikkitalo; amplitude −1…1, time axis 0–1 s.)


Measured impulse responses of real concert halls


(Figure: the same impulse responses on a dB scale – Promenadisali, Sibeliustalo and Musiikkitalo at 15 m from the stage; magnitude −40…0 dB, time axis 0–0.5 s.)

Two goals of room acoustics modeling
Goal 1: room acoustics prediction
• Static source and receiver positions
• No real-time requirement
Goal 2: auralization, sound rendering
• Possibly moving source(s) and listener, even geometry
• Both off-line and interactive (real-time) applications
• Need of anechoic stimulus signals



Goal 1: Prediction of room acoustics


(Springer Handbook of Acoustics 2007)

Prediction of acoustics of a room (goal 1)
Input data:
• Geometry, materials, source(s) and receiver(s) locations and orientations
Goal:
• Impulse response(s)
  - room acoustical parameters (T60, C80, EDT, LEF...), ISO 3382-1:2009
  - low-frequency behaviour
Modeling:
• Source(s)
  - omnidirectional, sometimes directional
• Medium
  - sound propagation in air and reflections
  - as accurate as possible
• Receiver(s)
  - output mono, figure-of-eight, binaural



Goal 2: Auralization / sound rendering

“Auralization is the process of rendering audible, by physical or mathematical modeling, the sound field of a source in a space, in such a way as to simulate the binaural listening experience at a given position in the modeled space.” (Kleiner et al. 1993, JAES)
Sound rendering: plausible 3-D sound, e.g., in games

3-D model ⇒ spatial IR * dry signal = auralization
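A minimal Matlab sketch of this step, assuming a dry (anechoic) recording dry.wav and a measured or modeled impulse response ir.wav (both file names are placeholders):

% auralize.m - minimal auralization sketch: dry signal convolved with an IR
[dry, fs]  = audioread('dry.wav');   % anechoic (dry) stimulus, placeholder file
[ir,  fs2] = audioread('ir.wav');    % room impulse response, placeholder file
assert(fs == fs2, 'sample rates must match');
out = fftfilt(ir(:,1), dry(:,1));    % fast FFT-based convolution
out = out / max(abs(out));           % normalize to avoid clipping
soundsc(out, fs);                    % listen to the auralization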


Auralization, sound rendering (goal 2)
Input data:
• Anechoic stimulus signal(s)!
• Geometry, materials, source(s) and receiver(s) locations and orientations
Goal:
• Plausible spatial sound, authentic auralization
Modeling:
• Source(s)
  - omnidirectional, directional
• Medium
  - physically-based sound propagation in a room
  - perceptual models, i.e., artificial reverb
• Receiver
  - spatial sound reproduction (binaural or multichannel)



Dynamic auralization (≈ sound rendering)
Method 1: A grid of impulse responses is computed and convolution is performed with interpolation between 2/4/8 nodes
• Applied in CATT software (http://www.catt.se)
Method 2: "Parametric rendering"


Source Modeling
Stimulus
• Sound signal synthesis
• Anechoic recordings
  - https://mediatech.aalto.fi/en/research/virtual-acoustics/research/acoustic-measurement-and-analysis/85-anechoic-recordings
Radiation
• Directivity is a measure of the directional characteristic of a sound source
• Point sources
  - omnidirectional
  - frequency-dependent directivity characteristics
• Line and volume sources
• Database of loudspeakers: http://www.clfgroup.org/



Directivity of musical instruments
Pätynen and Lokki (AAuA 2010)
Data available: https://mediatech.aalto.fi/en/research/virtual-acoustics/research/acoustic-measurement-and-analysis/77-directivity-of-instruments


Room acoustics modeling

Scale models
• 1:10, 1:20, 1:50
Wave-based methods
• Element methods (FEM, BEM)
• Time-domain methods (FDTD, e.g., waveguide mesh)
Ray-based methods
• Image-source method, beam tracing
• Ray tracing, particles, phonon tracing, sonel mapping, etc.
• Acoustic radiance transfer
Statistical methods
• SEA
• Sabine, Eyring, etc.


Room acoustics modeling

(Diagram of the same method families:
• Scale models
• Wave-based methods: FEM, BEM; FDTD, e.g., waveguide mesh
• Ray-based methods: image-source method, beam tracing; ray tracing, particles, phonon tracing, sonel mapping; acoustic radiance transfer
• Statistical methods: SEA
The wave-based and ray-based families form the physically-based modeling methods.)

Building acoustics, such as structural coupling, is not covered in this presentation

Room acoustics modeling methods


Scale models 1:10

(Tachibana et al. 2004)

Copenhagen 2006

Musiikkitalo 2009


Reproduction
The most intuitive way to study room acoustic prediction results
• Not only for experts
Anechoic stimulus signal
• Only a few recordings available
Reproduction with binaural or multichannel techniques
The impulse response also has to contain spatial information
• Binaural IR
• IR for each loudspeaker (e.g., SIRR, Spatial Impulse Response Rendering, or SDM, Spatial Decomposition Method)


Spatial audio (a.k.a. 3D sound)
Humans are able to perceive the direction of a sound event using only two ears. The mechanisms are based on monaural and binaural analysis of the ear canal signals.
A three-dimensional sound illusion can be achieved using headphones or a pair of loudspeakers. We can "cheat" the auditory system using 3-D audio!
• We need to know:
  - basics of human hearing
  - basics of spatial hearing
  - signal processing
  - basics of loudspeakers
  - room acoustics



Modeling the acoustics of the listener
HRIR = head-related impulse response
HRTF = head-related transfer function


Binaural hearing
Humans have two ears
First studies as early as 1876 and 1907 (Lord Rayleigh)
Binaural hearing is based on:
• interaural time difference (ITD), below ~700 Hz
• interaural level difference (ILD), above ~2000 Hz
• in between (700–2000 Hz) both ITD and ILD (also other features)
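For reference, a commonly used spherical-head approximation for the low-frequency ITD (not given on the slide) is Woodworth's formula, with head radius a ≈ 8.75 cm, speed of sound c and source azimuth θ:

ITD(θ) ≈ (a / c) (θ + sin θ)

which gives roughly 0.65 ms at θ = 90°.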



Each human pinna is individual

The torso, shoulders, head and pinna modify the perceived spectrum as a function of the incoming angle of sound
Room acoustics (early reflections), head movements and vision also contribute to the spatial hearing sensation
HRTF definition:

• Free-field impulse response from a point in a space to a point in the listener's ear canal


HRTF modeling and filter design
Choice of HRTFs
• Individual HRTFs yield best results
Minimum-phase reconstruction
• Reconstruction of ITD using a delay line
Spectral preprocessing
• Equalization (diffuse-field, free-field)
• Auditory smoothing
Filter design
• FIR, IIR, warped structures
• Least squares, Chebyshev, Hankel norm designs
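A sketch of the minimum-phase reconstruction plus separate ITD delay (the HRIR and the ITD value are placeholders; rceps is from the Signal Processing Toolbox):

% Minimum-phase HRIR plus separately modeled ITD (sketch)
fs = 48000;
hrir = randn(256,1) .* exp(-(0:255)'/32);  % placeholder HRIR; use a measured one
[~, hmin] = rceps(hrir);                   % minimum-phase reconstruction via the real cepstrum
itd = round(0.0003 * fs);                  % e.g. 0.3 ms ITD for this direction (placeholder)
h_contra = [zeros(itd,1); hmin];           % contralateral ear: delay line + minimum-phase filter
h_ipsi   = hmin;                           % ipsilateral ear: minimum-phase filter only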



HRTF measurements
Microphones in ears
Turntable


http://www.ais.riec.tohoku.ac.jp/Lab3/localization/

HRTF measurements, new development
[Huttunen et al. Rapid generation of personalized HRTFs. In AES 55th Conf., 2014]
Scanning head geometry, mathematical model, simulation (FM-BEM)
• Multiple cameras, 3D laser scanners, video, etc. (http://ownsurround.com)
Reciprocal measurement: loudspeaker in the ear, N receivers



Binaural reproduction with headphones
Separate sound signals for both ears
• HRTFs required
Head tracking is needed to perceive sources outside the head (externalization)
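A minimal static binaural-rendering sketch (the HRIR pair and the file names are placeholders):

% Mono source rendered binaurally over headphones (sketch)
[x, fs] = audioread('dry.wav');              % anechoic source signal (placeholder file)
load('hrir_pair.mat', 'hrir_L', 'hrir_R');   % HRIRs of the desired direction (placeholders)
yL = fftfilt(hrir_L, x(:,1));                % left-ear signal
yR = fftfilt(hrir_R, x(:,1));                % right-ear signal
soundsc([yL yR], fs);                        % play over headphones
% With head tracking, the HRIR pair is switched/interpolated as the head turns.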


Cross-talk cancellation: binaural reproduction with two loudspeakers
Cross-talk cancellation was first introduced by Atal and Schroeder (1963 publication, 1966 patent)
Originally intended for playing back dummy-head recordings over loudspeakers
Symmetry exploited in the shuffler structure, transaural processing, by Cooper and Bauck (1989); covered by many patents
Loudspeaker setups
• 60 degrees
• 10 degrees (stereo dipole)

Nintendo DS, Nokia phones
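A frequency-domain sketch of the canceller; the loudspeaker-to-ear impulse responses hLL, hLR, hRL, hRR, the desired binaural signals eL, eR, and the regularization constant are assumptions for illustration:

% Cross-talk cancellation by regularized inversion of the 2x2 plant, per frequency bin (sketch)
% Assumes hLL,hLR,hRL,hRR (loudspeaker X -> ear Y impulse responses) and eL,eR exist in the workspace.
N = 1024; beta = 1e-3;                        % FFT size and regularization (assumed values)
HLL = fft(hLL,N); HLR = fft(hLR,N);
HRL = fft(hRL,N); HRR = fft(hRR,N);
EL = fft(eL,N); ER = fft(eR,N);               % desired (binaural) ear signals
SL = zeros(N,1); SR = zeros(N,1);
for k = 1:N
    H = [HLL(k) HRL(k); HLR(k) HRR(k)];       % rows: ears (L,R), columns: loudspeakers (L,R)
    C = (H'*H + beta*eye(2)) \ H';            % regularized inverse of the plant
    s = C * [EL(k); ER(k)];
    SL(k) = s(1); SR(k) = s(2);
end
sL = real(ifft(SL)); sR = real(ifft(SR));     % loudspeaker feeds (circular convolution; sketch only)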



Vector base amplitude panning (VBAP)
Developed by Ville Pulkki in 1997 at TKK
Amplitude panning extended to 3D
Simple calculation of gain factors
Arbitrary positioning of loudspeakers


Ambisonics
Invented by Michael Gerzon (1973)
Both a recording and a reproduction technique
• Soundfield microphone
Based on spherical harmonics theory
2D panning, 1st-order gain factors
IRCAM recently had a 9th-order system (>300 loudspeakers)


Quadraphonic loudspeaker setup: first-order Ambisonics
Quadraphonic loudspeaker setup: second-order Ambisonics


Wave field synthesis (WFS)
Idea proposed by Berkhout (JAES, 1988) at Delft University of Technology
• Based on the Huygens–Fresnel principle
Requires a lot of loudspeakers
Large listening area
Great animations: http://www.syntheticwave.de/wfs-properties.htm
• http://www.syntheticwave.de/Principle%20of%20wave%20field%20synthesis.htm
• http://www.syntheticwave.de/pictures/wave_field_synthesis.swf
IOSONO
• http://www.iosono-sound.com


Multichannel loudspeaker setups


(Figures: multichannel loudspeaker setups, e.g. 5.1, 10.2 and 22.2.)


Loudspeaker-setup agnostic systems

• 40–120 discrete channels are transmitted to the user
• Each channel contains spatial metadata (panning direction, spread, etc.)
• Supports in principle any number of loudspeakers
• Cinema audio format for 3D loudspeaker setups
• Also on Blu-ray


- Dolby Atmos: http://www.dolby.com/us/en/professional/technology/cinema/dolby-atmos.html

- DTS:X: http://dts.com/dtsx

- MPEG-H: https://en.wikipedia.org/wiki/MPEG-H_3D_Audio

Virtual loudspeakers with headphones

Each loudspeaker signal is convolved with the corresponding HRTFs
• Head tracking needs a real-time implementation


Dolby Headphones (5.1 or 7.1 with headphones)

Sony Playstation VR utilizes VBAP with N virtual loudspeakers, head tracking affects the virtual source directions, not HRTFs


Case: Auralization in the DIVA system (Savioja et al., JAES 1999)

1. Scene definition

2. Parametric presentation of sound paths

3. Auralization with parametric DSP structure


Auralization parameters in DIVA

• Input (given to the image-source calculation) contains
  – geometry data, material data, positions and orientations of sources and the listener
  – an anechoic recording
• For the direct sound and each image source, the following set of auralization parameters is provided:
  – Distance from the listener
  – Azimuth and elevation angles with respect to the listener
  – Source orientation with respect to the listener
  – Set of filter coefficients which describe the material properties in reflections


Treatment of one image source – a DSP view

• Directivity
• Air absorption
• Distance attenuation
• Reflection filters
• Listener modeling

• Linear system
• Commutation
• Cascading
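A monaural sketch of that cascade for one image source (the distance, cutoff frequencies and the source signal are placeholders; DIVA implements these as low-order digital filters designed from measured data):

% One image source, monaural part of the processing chain (sketch)
fs = 48000; c = 343;
d = 8.3;                                   % path length of this image source [m] (placeholder)
x = randn(fs,1);                           % source signal after directivity filtering (placeholder)
delay = round(d/c * fs);                   % propagation delay in samples
g = 1/d;                                   % 1/r distance attenuation
[b_air, a_air] = butter(1, 8000/(fs/2));   % crude air-absorption lowpass (placeholder cutoff)
[b_mat, a_mat] = butter(1, 5000/(fs/2));   % crude reflection/material filter (placeholder cutoff)
y = [zeros(delay,1); g*x];                 % delay-line pick-up and distance gain
y = filter(b_air, a_air, y);               % air absorption
y = filter(b_mat, a_mat, y);               % one absorbing reflection
% Listener modeling (ITD + HRTF filtering) follows, as in the block diagrams below.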


DIVA auralization block diagram


Treatment of each image source


Late reverberation algorithm
• A special version of a feedback delay network (Väänänen et al. 1997)
  – also a time-variant method (Lokki & Hiipakka, 2001)
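A generic feedback-delay-network sketch (not the Väänänen et al. variant; the delay lengths, target T60 and feedback matrix are illustrative):

% 4-delay-line feedback delay network, generic sketch (illustrative parameters)
fs = 48000; T60 = 1.5;                        % target reverberation time [s] (assumed)
x  = [1; zeros(fs-1,1)];                      % impulse in -> 1 s of impulse response out
m  = [1423 1871 2311 2957];                   % mutually prime delay lengths [samples]
g  = 10.^(-3*m/(fs*T60));                     % per-line gains giving the chosen T60
A  = hadamard(4)/2;                           % orthogonal (lossless) feedback matrix
M  = max(m); buf = zeros(M,4); ptr = 1; y = zeros(size(x));
for n = 1:length(x)
    outs = zeros(4,1);
    for i = 1:4                               % read the output of each delay line
        outs(i) = buf(mod(ptr - m(i) - 1, M) + 1, i);
    end
    y(n) = sum(outs);                         % late-reverb output (direct sound added elsewhere)
    buf(ptr,:) = (A*(g(:).*outs) + x(n))';    % attenuate, mix through A, inject input, write back
    ptr = mod(ptr, M) + 1;
end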


A Case Study: a Lecture Room


Image sources 1st order


Image sources up to 2nd order


Image sources up to 3rd order


Distance attenuation


Distance attenuation (zoomed)


Gain + air absorption


Gain + air and material absorption


All monaural filtering


All monaural filtering (zoomed)


Treatment of each image source


Only ITD for pure impulse


Only ITD for pure impulse (zoom)


ITD + minimum phase HRTF


Monaural filterings + ITD


Monaural filterings + ITD + HRTF


DIVA auralization block diagram


Reverb


Image sources + reverberation


Image sources + reverberation


Image sources + reverberation


Dynamic Sound Rendering

• Dynamic rendering
  – properties of image sources are time-variant
• The coefficients of the filters are changing all the time
  – every single parameter has to be interpolated
  – in delay-line pick-ups, fractional-delay filters have to be used to avoid clicks and artifacts
  – late reverberation is static
• Update rate ⇔ latency


Auralization quality

• What is the wanted quality?
  – Assessment of quality is possible only through case studies
• Objectively:
  – Acoustical attributes
  – With auditory modeling
• Subjectively:
  – Listening tests


A case study, lecture hall T3


Example impulse responses

• Image sources up to 4th order auralized
• First-order diffraction modeled
• Statistical late reverberation


Quality of auralization

Stimuli: clarinet, drum
Results, clarinet: recording vs. auralization
Results, drum: recording vs. auralization


Conclusions on room acoustics modeling
Two goals:
• Prediction of acoustical attributes
• Auralization, sound rendering
A lot of different methods are applied in room acoustics modeling
• All of them have weaknesses
• A hybrid method, combining many techniques, would be the ideal solution
• A few commercial software packages are available and in everyday use by consultants
• Some methods are still much too complex for modern computers (computation time and memory)


Literature
• Required reading (for exam):
  – Savioja, L., Huopaniemi, J., Lokki, T., and Väänänen, R. 1999. Creating virtual acoustic environments. Journal of the Audio Engineering Society, vol. 47, no. 9, pp. 675-705, September 1999.
• Recommended reading:
  – Avendano, C., Jot, J.-M. Frequency domain techniques for stereo to multichannel upmix. Proc. AES 22nd International Conference, June 15-17, 2002, Espoo, Finland, pp. 121-130.
  – Lokki, T. Tasting music like wine: Sensory evaluation of concert halls. Physics Today, vol. 67, no. 1, pp. 27-32, 2014. http://dx.doi.org/10.1063/PT.3.2242
  – TKK doctoral dissertations, see http://lib.tkk.fi/Diss/
    • Huopaniemi, J. Virtual acoustics and 3-D sound in multimedia signal processing, 1999
    • Savioja, L. Modeling Techniques for Virtual Acoustics, 1999
    • Pulkki, V. Spatial Sound Generation and Perception by Amplitude Panning Techniques, 2001
    • Lokki, T. Physically-based Auralization – Design, Implementation, and Evaluation, 2002
    • Väänänen, R. Parametrization, Auralization, and Authoring of Room Acoustics for Virtual Reality Applications, 2003
    • Merimaa, J. Analysis, Synthesis, and Perception of Spatial Sound – Binaural Localization Modeling and Multichannel Loudspeaker Reproduction, 2006
    • Siltanen, S. Efficient Physics-Based Room-Acoustics Modeling and Auralization, 2010
    • Pätynen, J. A virtual symphony orchestra for studies on concert hall acoustics, 2011
    • Tervo, S. Localization and tracing of early acoustic reflections in enclosures, 2012
    • Laitinen, M.-V. Techniques for versatile spatial-audio reproduction in time-frequency domain, 2014


Vector-base amplitude panning VBAP
Directional audio coding DirAC

Ville Pulkki
Professor of Acoustics (Associate Professor)
Department of Signal Processing and Acoustics
School of Electrical Engineering
Aalto University, Helsinki, Finland

Installation lecture

January 19, 2016

These slides

Vector base amplitude panning (VBAP)
Variants and enhancement of VBAP
Time-frequency-domain parametric spatial audio
Directional audio coding



A music student with an MSc (Eng) needs extra income (1995)
The Sibelius Academy chamber music hall had lots of loudspeakers on the walls and ceiling
SibA wanted a "panning tool" for their loudspeaker system (one month's salary for the student)
A 1-month joint project between TKK and SibA


Reformulation of amplitude panning

Tried to generalize the sine panning law to 3D, with no luck
"Could this be formulated with vector bases?" – "Yes!"
Vector base amplitude panning (VBAP) was born
Divide the setup into triplets, and compute gain factors for each

(Figure: a loudspeaker triplet k, m, n with unit vectors lk, lm, ln and the virtual source direction p.)


Vector base amplitude panning

PhD degree in 2001.


Dissemination of VBAP

Published the VBAP paper in JAES 1997
Provided free software implementations of the method
The article has been cited 990 times in Google Scholar (2017)
Second of all JAES papers when ranked by the number of citations (Scopus)


Products with "VBAP inside"

MPEG-H audio standard (broadcast)
DTS:X audio format (cinema + Blu-ray)
Sony PlayStation VR (gaming)
Dedicated audio programming software


VBAP maths

(Figure: loudspeaker triplet k, m, n with unit vectors lk, lm, ln and virtual source direction p.)

p = g1 l1 + g2 l2 + g3 l3

g = [g1 g2 g3] = [p1 p2 p3] [ l11 l12 l13 ; l21 l22 l23 ; l31 l32 l33 ]^(-1)

g holds the barycentric coordinates of the virtual source in the vector base
Loudspeaker signals: y_i(t) = g_i x(t)
g_i controls the amplitude of the signal in each loudspeaker
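A minimal numerical sketch of these equations (the loudspeaker and source directions are arbitrary examples; the Matlab library linked below is the full implementation):

% VBAP gains for one loudspeaker triplet (sketch; directions are arbitrary examples)
lk = [1 1 0.4]; lm = [1 -1 0.4]; ln = [1 0 1.2];   % loudspeaker direction vectors
L  = [lk/norm(lk); lm/norm(lm); ln/norm(ln)];      % unit vectors as the rows of the base matrix
p  = [1 0.2 0.5]; p = p/norm(p);                   % virtual source direction
g  = p / L;                                        % solves g*L = p, i.e. g = p*inv(L)
if all(g >= -eps)                                  % all gains non-negative: p is inside this triplet
    g = g / norm(g);                               % q = 2 normalization (normal rooms)
end
% loudspeaker signals of the triplet: y_k = g(1)*x(t), y_m = g(2)*x(t), y_n = g(3)*x(t)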


VBAP maths

(Figure: a virtual source p between loudspeakers l1 and l2 for loudspeaker-base opening angles of 40°, 90° and 120°; the resulting gain factors g1 = g2 range from 0.5 to 1.0 depending on the opening angle, here with q = 2.)

The g_i depend on the opening angle of the loudspeaker base – not good!
The length of g must be normalized to avoid changes in loudness:
g_norm = g / (Σ_i g_i^q)^(1/q)
Thus (Σ_i g_norm,i^q)^(1/q) = 1
q = 1 for anechoic cases (also headphones with virtual loudspeakers)
q = 2 for normal rooms

Matlab code available https://se.mathworks.com/matlabcentral/fileexchange/53884-vector-base-amplitude-panning-library


VBAP runtime cycle

init: Feed in loudspeaker directions
init: 2D: form pairs; 3D: form triplets. Compute inverse matrices
run: Multiply input sound x(t) with g, output to loudspeakers
intrpt1: Start interrupt when the virtual source direction changes
intrpt2: Compute gain factors for each LS pair/triplet, select the pair/triplet with positive gains
intrpt3: Normalize gains

Max demo


Amplitude panning audio quality

The direction of an amplitude-panned source is perceived relatively accurately in the best listening position
Outside of the sweet spot: directional perception is dominated by the nearest loudspeaker
No prominent coloration issues in normal rooms inside or outside the sweet spot
The most-used virtual source positioning method: all mixers have "panpot" buttons
Coloration issues in anechoic listening


Amplitude panning, mechanism behind formation of perceived direction
Perceiving a virtual source between the loudspeakers does not correspond to the actual situation. If you have a red LED in both the left and the right hand, you see two LEDs, not one in between.
Amplitude panning causes cross-talk and affects both ITD and ILD in a complex way

(Figure: amplitude-panned impulse responses at the ear canals: each ear receives the sound of both loudspeakers, the later arrival delayed by about 0.2 ms, shown for g1 = g2 and for g1 > g2.)


Amplitude panning, mechanism behind formation of perceived direction
The loudspeaker amplitude difference changes to an interaural time difference at low frequencies
The loudspeaker amplitude difference changes to an interaural level difference at high frequencies

(Figures: directions decoded from the ITD and ILD cues with an auditory model, in degrees, as a function of ERB channel (frequency 0.2–18.2 kHz), for different panning directions θT.)

Pulkki, Ville. Spatial sound generation and perception by amplitude panning techniques. PhD thesis. Helsinki University of Technology, 2001.

Spread issue

The perceived spread of amplitude-panned virtual sources depends on the virtual source direction p
When p coincides with a loudspeaker direction: "point-like"
When p is in between loudspeakers: "more or less spread"
The frequency-dependent ITD and ILD cues do not match those of a real source


Multiple-direction amplitude panning

Make the spread even!

Define p
Define a number of vectors p_i around p within a given angular range around p
Compute g_i for each p_i
Sum over i: g = Σ_i g_i
Result: always more than one loudspeaker has considerable gain (see the sketch after the reference below).

Max demo

Pulkki, V. (1999). Uniform spreading of amplitude panned virtual sources. In Applications of Signal Processing to Audio and Acoustics, 1999 IEEE Workshop on (pp. 187-190). IEEE.
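A sketch of the summation idea, reusing the base matrix L and the direction p from the VBAP sketch above (the spread directions p_i are crude illustrative perturbations; a real implementation evaluates all pairs/triplets and keeps only non-negative gains):

% Multiple-direction amplitude panning (MDAP) sketch: sum VBAP gains over spread directions
spread = 10*pi/180;                                       % angular range around p (example: 10 degrees)
offs = spread * [0 0 0; 0 1 0; 0 -1 0; 0 0 1; 0 0 -1];    % crude offsets around p
g = zeros(1,3);
for i = 1:size(offs,1)
    pi_vec = p + offs(i,:); pi_vec = pi_vec/norm(pi_vec); % one spread direction p_i
    gi = pi_vec / L;                                      % VBAP gains for p_i
    g = g + max(gi, 0);                                   % accumulate (triplet selection omitted)
end
g = g / norm(g);                                          % normalize the summed gain vector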


Coloration of amplitude-panned sources

(Figure: impulse responses from the two loudspeakers as they arrive at the two ear canals.)

Direct sounds from the loudspeakers interfere → comb-filter effect → audible coloration
Reflected and reverberated sound paths arrive at the ear canals in an incoherent manner, and no comb-filter effect occurs
⇒ Amplitude-panned sources are not perceived as colored in normal rooms


Coloration of amplitude-panned sources

Amplitude-panned sources are colored in anechoic listening
Isn't anechoic listening just a niche that nobody cares about?
Headphone listening with virtual loudspeakers + panning: that is anechoic listening (Sony PlayStation VR + many other VR applications)
We should do something about this
An easy solution is to utilize more loudspeakers: when the angle between loudspeakers is smaller, the travel-time difference is smaller, and the comb-filter effect migrates to higher frequencies and becomes less salient


Coloration of amplitude-panned sources in anechoic listening
Characteristics
• Dip around 1-2 kHz
• At high frequencies a bit lower level
The effect depends on
• panning angle
• loudspeaker directions
• room effect

Can we compensate for this by equalization or by other means?


Coloration of amplitude-panned sources in anechoic listening
Gain factor normalization g_norm = g / (Σ_i g_i^q)^(1/q)
q = 1 or q = 2
In anechoic listening only frequencies below about 800 Hz satisfy the q = 1 condition
At frequencies above 2 kHz it is not intuitively clear what happens.

Let's make q depend on frequency and on the listening-room response


Coloration of amplitude-panned sources in anechoic listening
A solution has been proposed in [1]
Gain factor normalization: g_norm = g / (Σ_i g_i^q)^(1/q)
The solution has been obtained numerically using auditory models and room measurements: the exponent is made a function of frequency and of the direct-to-total energy ratio (DTT), q = q(f, DTT), based on
q0(f) = 1.5 − 0.5 cos[ 4.7 tanh(a1 f) max(0, 1 − a2 f) ]
where a1 = 0.00045 and a2 = 0.000085 (see [1] for the full expression of q(f, DTT))

[1] Laitinen, M.-V., Vilkamo, J., Jussila, K., Politis, A., & Pulkki, V. (2014, August). Gain normalization in amplitude panning as a function of frequency and room reverberance. In Audio Engineering Society Conference: 55th International Conference: Spatial Audio. Audio Engineering Society.


Coloration of amplitude-panned sources in anechoic listening
The figures show q(f, DTT) values
Results were simulated with a large number of listening conditions with loudspeaker spans from 30° to 80°
Requires a frequency-domain implementation of panning
Mitigates the coloration issues
Readily implementable in time-frequency-domain processing, such as in DirAC
Can be implemented with IIR filters (?)


Directional audio coding (DirAC)

Developed in Ville Pulkki's research group since 2001
• Reproduce recorded spatial sound
• Synthesize spatial properties to sound (e.g., a game sound engine)
• Time-frequency-domain parametric method
• Non-linear: the processing depends on the signal and on the spatial properties of the sound field


How could a sound field be reproduced

Problems with existing techniques


Spatial sound reproduction

Target: relay the perception of sound!


Analogy with video

How does a video camera work?

Lens: light from a distinct direction is projected to one position on the CCD
The CCD encodes the light energy at three frequency channels (RGB)
Visible light wavelengths 380 nm - 780 nm (less than one octave)
Very similar to the eye


Spatial sound reproduction

Could we do the same with sound as with a video camera?
Create a narrow beam for each loudspeaker
Audible sound includes wavelengths from 2 cm to about 30 m
Impossible to build a microphone having a constant narrow beam width without coloration and noise problems
Higher-order Ambisonics / beam steering try to do it


Spatial sound reproduction

Holography then, perhaps?

Lots of spaced microphones
Lots of loudspeakers
Wave field synthesis
Problems
• High price
• The directivity of the microphones should be matched with the directivity of the loudspeakers - is it possible?


Sound fields reproduced with WFS and HOA

Monochromatic plane waves reproduced
Valid sound field only in a limited listening area, the "sweet spot"
At high frequencies, huge errors

Daniel, Jérôme, Sébastien Moreau, and Rozenn Nicol. "Further investigations of high-order ambisonics and wavefield synthesis for holophonic sound imaging." Audio Engineering Society Convention 114. Audio Engineering Society, 2003.


Parametric spatial sound reproduction

Are there any workarounds?

Human spatial hearing can be fooled easily
E.g. two coherent sources produce one virtual source in the middle
Compare with vision: coherent sources do not produce virtual sources
Assumption: at one frequency band humans perceive only one direction and one coherence cue


Parametric spatial sound reproduction

Capture the sound
Analyze spatial parameters
Reproduce the sound in a way which recreates the spatial parameters

(Block diagram: microphones → time-frequency analysis → spatial analysis → spatial synthesis; the microphone signals and the spatial metadata are in the TF domain, the output is loudspeaker or headphone signals.)


Assumptions in DirAC

Assumption 1: the listener is able to localize only one sound object at one time-frequency position
Assumption 2: good reproduction quality is obtained if we correctly reproduce the
• direction,
• diffuseness, and
• spectrum of sound


Example implementation


B-format microphones


B-format directional patterns

(Figure: directional patterns of orders N = 0, 1, 2, 3.)

N denotes the order of the patterns (and of the microphone)
0th-order (omni) microphones capture the pressure signal [W]
1st-order dipole microphones capture the volume velocity signals [X, Y, Z]
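For reference, a first-order "virtual microphone" pointing towards azimuth θ0 can be formed from the horizontal B-format signals (assuming the FuMa-style convention W = p/√2, which is not stated on the slide):

V(θ0) = a·√2·W + (1 − a)·(X cos θ0 + Y sin θ0),   0 ≤ a ≤ 1

a = 1 gives an omnidirectional pattern, a = 0 a dipole, and a = 0.5 a cardioid.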


DirAC details, some of them

Time-frequency transform
• Filter banks
• STFT
The system can be seen as a filter whose weights change rapidly in time; aliasing issues have to be taken into account
p ∝ W is the pressure signal, u ∝ [X Y Z] is the 3D velocity vector
Ia = Re{ p*(k, n) u(k, n) }   (active intensity vector)
e = (ρ0/2) ||u||^2 + |p|^2 / (2 ρ0 c^2)   (energy density)
Direction of arrival: DOA = −Ia
Diffuseness: Ψ = 1 − ||E[Ia]|| / (c E[e])
Temporal integration of parameters
• Short time constants for DOA and diffuseness
• Longer for loudspeaker gains


Matlab code for directional analysis

% diranalysis.m
% Author: V. Pulkki
% Example of directional analysis of a simulated B-format recording
Fs=44100;
% Generate signals
sig1=2*(mod([1:Fs]',40)/80-0.5).*min(1,max(0,(mod([1:Fs]',Fs/5)-Fs/10)));
sig2=2*(mod([1:Fs]',32)/72-0.5).*min(1,max(0,(mod([[1:Fs]+Fs/6]',Fs/3)-Fs/6)));
% Simulate two sources in directions of 50 and 170 degrees
w=(sig1+sig2)/sqrt(2);
x=sig1*cos(50/180*pi)+sig2*cos(-170/180*pi);
y=sig1*sin(50/180*pi)+sig2*sin(-170/180*pi);
% Add fading-in diffuse noise with 36 sources evenly in the horizontal plane
for dir=0:10:350
    noise=(rand(Fs,1)-0.5).*(10.^((([1:Fs]'/Fs)-1)*2));
    w=w+noise/sqrt(2);
    x=x+noise*cos(dir/180*pi);
    y=y+noise*sin(dir/180*pi);
end
% Do directional analysis with STFT
hopsize=256; winsize=512; i=2; alpha=1./(0.02*Fs/winsize);
Intens=zeros(hopsize,2)+eps; Energy=zeros(hopsize,2)+eps;

Pulkki, Ville, Tapio Lokki, and Davide Rocchesso. "Spatial effects." DAFX: Digital Audio Effects, Second Edition (2011): 139-183.


Matlab code for directional analysis

for time=1:hopsize:(length(x)-winsize)
    % Move to the frequency domain
    W=fft(w(time:(time+winsize-1)).*hanning(winsize));
    X=fft(x(time:(time+winsize-1)).*hanning(winsize));
    Y=fft(y(time:(time+winsize-1)).*hanning(winsize));
    W=W(1:hopsize); X=X(1:hopsize); Y=Y(1:hopsize);
    % Intensity computation
    tempInt=real(conj(W)*[1 1].*[X Y])/sqrt(2);   % instantaneous
    Intens=tempInt*alpha+Intens*(1-alpha);        % smoothed
    % Compute direction from the intensity vector
    Azimuth(:,i)=round(atan2(Intens(:,2),Intens(:,1))*(180/pi));
    % Energy computation
    tempEn=0.5*(sum(abs([X Y]).^2,2)*0.5+abs(W).^2+eps);   % instantaneous
    Energy(:,i)=tempEn*alpha+Energy(:,(i-1))*(1-alpha);    % smoothed
    % Diffuseness computation
    Diffuseness(:,i)=1-sqrt(sum(Intens.^2,2))./(Energy(:,i));
    i=i+1;
end

% Plot the variables
figure(1); imagesc(log(Energy)); title('Energy');
set(gca,'YDir','normal'); xlabel('Time frame'); ylabel('Freq bin');
figure(2); imagesc(Azimuth); colorbar;
set(gca,'YDir','normal'); title('Azimuth'); xlabel('Time frame'); ylabel('Freq bin');
figure(3); imagesc(Diffuseness); colorbar;
set(gca,'YDir','normal'); title('Diffuseness'); xlabel('Time frame'); ylabel('Freq bin');


"HQ" implementation

Too high coherence between the virtual microphone channels is reduced as follows:
• diffuse stream: loudspeaker-specific frequency-dependent delay (decorrelation)
• non-diffuse stream: panning factors used as gates


Applications of DirAC

Teleconferencing [1]
Realistic reproduction of spatial sound environments [2]
– especially for head-mounted displays (VR) [3]
Virtual reality (game) audio engines [4]
Spatial audio effects [5]

[1] Ahonen, Jukka. "Microphone front-ends for spatial sound analysis and synthesis with Directional Audio Coding." PhD thesis. Aalto University (2013).
[2] V. Pulkki, M.-V. Laitinen, J. Vilkamo, and J. Ahonen. "First-order directional audio coding (DirAC)." Parametric Time-Frequency Domain Spatial Audio. Wiley (2017), in press, ask for a copy.
[3] Laitinen, M.-V., and Pulkki, V. (2009, October). Binaural reproduction for directional audio coding. In Applications of Signal Processing to Audio and Acoustics, 2009. WASPAA'09. IEEE Workshop on (pp. 337-340). IEEE.
[4] Laitinen, M.-V., Pihlajamäki, T., Erkut, C., and Pulkki, V. (2012). Parametric time-frequency representation of spatial sound in virtual worlds. ACM Transactions on Applied Perception (TAP), 9(2), 8.
[5] Politis, A., Pihlajamäki, T., and Pulkki, V. (2012). Parametric spatial audio effects. York, UK, September.


Capturing the reality

Omnidirectional camera
• at least 6 lenses
• stitched to spherical video
3D microphone
• generic representation of spatial audio
• can be reproduced over DirAC

(The slide shows two figures from Gomez Bolanos and Pulkki, "Immersive Audiovisual Environment with 3D Audio", AES 132nd Convention, Budapest, 2012: Fig. 7, the omnidirectional camera (Ladybug 3) setup with the A-format microphone (SPS200), and Fig. 8, a video cropping utility (Max patch) applied to a group meeting recording.)


Head-mounted audiovisual displays

Reproduction
• Head-mounted visual display + headphones
• Both video and spatial audio are updated with head-tracking information
• The generic representation of audio in DirAC is well suited for this
Demonstration by Aalto
Fraunhofer IIS demonstration


Head-mounted audio-visual displays (VR displays)
• Virtual content (computer-generated world)
• Recorded content (surrounding camera + 3D sound)
• Very strong feeling of being somewhere else for the subject
• Ability to produce both externalized and internalized sound scenes


DirAC as virtual reality audio engine

(Block diagram, DirAC as a VR audio engine, with the elements: sound synth 1 … sound synth N, DirAC mono synth (single audio channel + directional parameters), propagation simulation, mono DirAC streams, mono-reverb and B-format-reverb parameters, mux, DirAC-to-B-format conversion, B-format stream, DirAC encoding/decoding, loudspeaker or headphone signals.)

Control of the spatial extent of virtual sources
With headphones: creation of external and internal sources
Loudspeaker-setup-independent reverberator
Efficient transmission of spatial sound


Development
Different versions of DirAC available
1st-order B-format input
• some artifacts in acoustically challenging situations: applause, broadband sources in opposite directions, very strong early reflections
• the decorrelation process causes artifacts
• covariance-domain rendering minimizes the level of decorrelated sound [1]
• different assumptions about the parameters yield different approaches [2], which have somewhat different problems
With a higher number of microphones, higher quality is obtained and the parametric processing needs to be less aggressive
A number of different techniques that use a parametric approach to spatial audio have been proposed; see the overview in [3]

[1] Vilkamo, Juha, and Ville Pulkki. "Minimization of decorrelator artifacts in directional audio coding by covariance domain rendering." Journal of the Audio Engineering Society 61.9 (2013): 637-646.
[2] Barrett, Natasha, and Svein Berge. "A new method for B-format to binaural transcoding." Audio Engineering Society Conference: 40th International Conference: Spatial Audio: Sense the Sound of Space. Audio Engineering Society, 2010.
[3] A. Politis, S. Delikaris-Manias and V. Pulkki. "Overview to time-frequency-domain parametric spatial audio techniques." Parametric Time-Frequency Domain Spatial Audio. Wiley (2017), in press, ask for a copy.


Higher-order microphones with TF-domain parametric processing

(Diagram: the sound field, captured as 2nd-order B-format, is divided virtually into sectors; each sector yields 1st-order B-format that feeds its own energetic analysis (DirAC 1, DirAC 2, DirAC 3), followed by covariance-domain rendering.)

Higher-order DirAC (Politis & Pulkki): divide the sound field into virtual sound fields


Dissemination of DirAC

DirAC was published first as SIRR [1] for impulse responses, and later for continuous sound [2]
The first TF-domain parametric audio method where the parameters are measured using a microphone setup
346 references to [2] in 10 years, ninth among all JAES articles
The corresponding patents were transferred to Fraunhofer IIS
Commercialization

[1] Merimaa, Juha, and Ville Pulkki. "Spatial impulse response rendering I: Analysis and synthesis." Journal of the Audio Engineering Society 53.12 (2005): 1115-1127.
[2] Pulkki, Ville. "Spatial sound reproduction with directional audio coding." Journal of the Audio Engineering Society 55.6 (2007): 503-516.
