Perceptual Audio Rendering, Nicolas Tsingos, Dolby Laboratories (nicolas.tsingos@dolby.com)


  • Slide 1
  • Perceptual Audio Rendering Nicolas Tsingos Dolby Laboratories nicolas.tsingos@dolby.com
  • Slide 2
  • Motivation: many applications require processing hundreds of audio streams in real time, e.g., games/simulators and multi-track mixing (Eden Games, Steinberg).
  • Slide 3
  • Massive audio processing often exceeds available resources (limited CPU or hardware processing, bus traffic). It typically involves individual processing of each signal, e.g., 3D audio rendering, followed by a mix-down of all signals to the outputs.
  • Slide 4
  • Perceptual audio rendering: perceptually-based processing of many sources with efficient DSP effects; level-of-detail rendering, independent of the reproduction system; handles extended sound sources and sound reflections.
  • Slide 5
  • Leveraging limitations of human hearing: a large part of complex sound mixtures is likely to be perceptually irrelevant (e.g., auditory masking), and spatial hearing is limited (e.g., localization accuracy, ventriloquism).
  • Slide 6
  • Perceptual audio rendering components (diagram): masking, clustering, and progressive processing, applied between the sources and the listener.
  • Slide 7
  • Masking
  • Slide 8
  • Real-time masking evaluation: remove inaudible sources, fetching and processing only perceptually relevant input. This is different from culling invisible or occluded sound sources. Inter-source masking is estimated by building upon perceptual audio coding work; computing the audibility threshold requires knowledge of the signal characteristics.
  • Slide 9
  • Signal characteristics are pre-computed over short time frames (20 ms) of the pre-recorded signal: a power spectrum and a tonality index in [0,1] (1 = tone, 0 = noise).
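The per-frame descriptors above can be sketched as follows. This is a minimal illustration, not the talk's actual implementation: the tonality estimate here uses the spectral flatness measure, which is one common choice but an assumption on my part (the slide only states that tonality lies in [0,1]).

```python
import numpy as np

def precompute_frame_descriptors(signal, sr, frame_ms=20.0):
    """Pre-compute (power spectrum, tonality) for each 20 ms frame.

    Tonality is approximated as 1 - spectral flatness, where flatness
    is the geometric mean over the arithmetic mean of the power
    spectrum: flat (noisy) spectra give ~1, peaky (tonal) spectra ~0.
    """
    n = int(sr * frame_ms / 1000.0)
    window = np.hanning(n)
    descriptors = []
    for start in range(0, len(signal) - n + 1, n):
        frame = signal[start:start + n] * window
        spectrum = np.abs(np.fft.rfft(frame)) ** 2  # power spectrum
        s = spectrum + 1e-12                        # avoid log(0)
        flatness = np.exp(np.mean(np.log(s))) / np.mean(s)
        descriptors.append((spectrum, 1.0 - flatness))
    return descriptors
```

Because the descriptors are computed offline, the run-time cost per source is just a table lookup per frame.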
  • Slide 10
  • Greedy culling algorithm: sort sources by decreasing loudness (loudness relates to the sensation of sound intensity). Efficient run-time loudness evaluation: retrieve the pre-computed power spectrum for each source, modulate it by propagation effects, and convert it to loudness using look-up tables [Moore92].
  • Slide 11
  • Masking evaluation (diagram): candidate sources 1-4 are added to the current mix in loudness order; the current masking threshold (power in dB at the listener) rises with each addition, and the loop stops as soon as the next candidate falls below it.
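A sketch of the greedy culling loop described above, under simplifying assumptions: loudness is approximated by total spectral power, and the masking threshold is modeled as the mix power minus a fixed per-band offset, whereas the talk builds on full perceptual-coding masking models.

```python
import numpy as np

def cull_masked_sources(source_spectra, masking_offset_db=12.0):
    """Greedy masking evaluation: return indices of audible sources.

    Sources are visited in decreasing loudness order; each is added to
    the running mix, and the loop STOPs once the next candidate lies
    entirely below the current masking threshold (since all remaining,
    quieter sources must then be masked too).
    """
    order = sorted(range(len(source_spectra)),
                   key=lambda i: np.sum(source_spectra[i]), reverse=True)
    mix = np.zeros_like(source_spectra[0])
    audible = []
    for i in order:
        threshold = mix * 10 ** (-masking_offset_db / 10.0)
        if audible and np.all(source_spectra[i] < threshold):
            break  # this source and all quieter ones are masked
        audible.append(i)
        mix += source_spectra[i]
    return audible
```

The early exit is what makes the algorithm cheap: masked sources are never fetched or processed at all.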
  • Slide 12
  • Clustering
  • Slide 13
  • Dynamic spatial clustering: amortize (costly) 3D-audio processing over groups of sources, leveraging the limited resolution of spatial hearing. Group neighboring sources together and compute an impostor for the group: perceptually equivalent but cheaper to render, a unique point source with a complex response (the mixture of all source signals in the cluster).
  • Slide 14
  • Dynamic spatial clustering builds on the limited spatial perception of human hearing [Blauert, Middlebrooks]. Prior work on static sound source clustering [Herder99] used a non-uniform subdivision of direction space with the Cartesian centroid as representative.
  • Slide 15
  • Dynamic spatial clustering: group neighboring sources together under a uniform direction constraint and a log(1/distance) constraint, weighted by loudness, using the Hochbaum-Shmoys heuristic [Hochbaum85] with a fast hierarchical implementation.
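The clustering step above can be sketched with a greedy farthest-point (Hochbaum-Shmoys style) k-center pass over a (direction, log-distance) feature space. The exact metric and loudness weighting are assumptions for illustration; the real system uses a fast hierarchical implementation.

```python
import numpy as np

def cluster_sources(positions, loudness, k):
    """Greedy k-center clustering of sources around a listener at the origin.

    Each source maps to a feature (unit direction, log(1/distance)); the
    farthest-point heuristic picks k representatives, with candidate
    distances weighted by loudness so loud sources are clustered tightly.
    """
    positions = np.asarray(positions, dtype=float)
    loudness = np.asarray(loudness, dtype=float)
    dist = np.linalg.norm(positions, axis=1) + 1e-9
    feats = np.hstack([positions / dist[:, None],      # direction constraint
                       np.log(1.0 / dist)[:, None]])   # log-distance constraint
    n = len(feats)
    centers = [int(np.argmax(loudness))]               # seed with loudest source
    d = np.full(n, np.inf)
    for _ in range(k - 1):
        # Distance of every source to its nearest chosen representative.
        d = np.minimum(d, np.linalg.norm(feats - feats[centers[-1]], axis=1))
        centers.append(int(np.argmax(d * loudness)))   # farthest weighted point
    # Assign each source to its nearest representative (cluster impostor).
    labels = np.argmin(
        [np.linalg.norm(feats - feats[c], axis=1) for c in centers], axis=0)
    return centers, labels
```

The representatives play the role of the cluster impostors: each one is rendered as a single point source carrying the mix of its members.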
  • Slide 16
  • Rendering clusters: mix the signals of all sources in the cluster to create a single source with a complex response.
  • Slide 17
  • Dynamic spatial clustering
  • Slide 18
  • Pilot validation study: culling and masking are transparent, rated 4.4/5 on average (5 = indistinguishable from reference). Clustering preserves localization cues: 74% success on average (90% within 1 meter of the true location), with no significant correlation with the number of clusters.
  • Slide 19
  • Progressive processing
  • Slide 20
  • Progressive signal processing: a scalable pipeline for filtering and mixing many audio streams. Goals: fetch and process only perceptually relevant input; continuously adapt quality vs. speed; remain perceptually transparent; use a standard representation of the inputs.
  • Slide 21
  • Progressive signal processing uses Fourier-domain coefficients for processing, allowing both signal quality and spatial cues to be degraded gracefully. It combines processing with audio coding and uses additional signal descriptors for decision making.
  • Slide 22
  • Progressive processing pipeline (diagram): N input frames, each with an importance value, pass through masking and importance sampling; a Process + Reconstruct stage then produces 1 output frame.
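The importance-sampling stage of the pipeline can be sketched as follows. The proportional budget split and the magnitude-based choice of which coefficients to keep are illustrative assumptions; the actual pipeline drives these decisions from the perceptual signal descriptors.

```python
import numpy as np

def progressive_mix(frames_fft, importance, budget):
    """Mix N Fourier-domain frames under a total coefficient budget.

    The budget is split across streams in proportion to their importance
    scores; for each stream only its allotted number of largest-magnitude
    coefficients is processed, the rest are dropped, and the kept
    coefficients are summed into one output frame.
    """
    frames_fft = np.asarray(frames_fft)
    importance = np.asarray(importance, dtype=float)
    alloc = np.floor(budget * importance / importance.sum()).astype(int)
    out = np.zeros(frames_fft.shape[1], dtype=complex)
    for frame, n_keep in zip(frames_fft, alloc):
        if n_keep == 0:
            continue  # stream contributes nothing this frame
        keep = np.argsort(np.abs(frame))[-n_keep:]  # largest-magnitude bins
        mask = np.zeros(len(frame), dtype=bool)
        mask[keep] = True
        out += np.where(mask, frame, 0)
    return out
```

Lowering the budget degrades quality continuously rather than dropping whole sources, which is what makes the pipeline scalable.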
  • Slide 23
  • Progressive signal processing
  • Slide 24
  • Progressive processing and sound synthesis: sound synthesis from physics-driven animation uses modal models. Resonant modes can be synthesized in the Fourier domain, and the number of Fourier coefficients can be allocated on-the-fly, balancing processing costs for recorded and synthesized sounds at the same time.
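A minimal sketch of Fourier-domain modal synthesis with an adjustable coefficient count. Here each mode is a damped sinusoid and only a chosen number of bins around its resonant frequency is kept; in the talk's pipeline that bin count would be allocated on-the-fly alongside the budgets of recorded sounds. Computing each mode's spectrum via an FFT of its time signal is a simplification for clarity (the frequency-domain shape of a damped sinusoid can also be evaluated analytically).

```python
import numpy as np

def modal_frame(modes, n, sr, bins_per_mode):
    """Synthesize one frame of a modal sound, keeping only
    bins_per_mode Fourier coefficients around each mode's resonance.

    modes: iterable of (amplitude, frequency_hz, decay_rate) tuples.
    Returns the time-domain frame reconstructed from the kept bins.
    """
    t = np.arange(n) / sr
    out = np.zeros(n // 2 + 1, dtype=complex)
    for amp, freq, decay in modes:
        spec = np.fft.rfft(amp * np.exp(-decay * t) * np.sin(2 * np.pi * freq * t))
        center = int(round(freq * n / sr))           # bin of the resonance
        lo = max(0, center - bins_per_mode // 2)
        hi = min(len(spec), lo + bins_per_mode)
        mask = np.zeros(len(spec), dtype=bool)
        mask[lo:hi] = True                           # keep bins near the peak
        out += np.where(mask, spec, 0)
    return np.fft.irfft(out, n)
```

Because a damped mode's spectrum is sharply concentrated around its resonant frequency, a handful of bins already captures most of its energy.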
  • Slide 25
  • Conclusions: perceptually motivated techniques for rendering and authoring virtual auditory environments; a human listener only processes a small amount of information in complex situations. Future work: extend to more complex auditory processing models and cross-modal perception ("Efficient and Practical Audio-Visual Rendering for Games using Crossmodal Perception", David Grelaud, Nicolas Bonneel, Michael Wimmer, Manuel Asselot, George Drettakis, Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, 2009). Other problems: dynamic range management, e.g., the HDR audio approach of the EA/DICE studio for Battlefield.
  • Slide 26
  • Additional references: www-sop.inria.fr/reves. This work was supported by the RNTL project OPERA (http://www.inria.fr/reves/OPERA), the EU IST project CREATE (http://www.cs.ucl.ac.uk/create), and the EU FET OPEN project CROSSMOD (http://www-sop.inria.fr/reves/CrossmodPublic/).