COMP 4010 Lecture5 VR Audio and Tracking

LECTURE 5: VR AUDIO AND TRACKING

COMP 4010 – Virtual Reality Semester 5 – 2016

Bruce Thomas, Mark Billinghurst University of South Australia

August 23rd 2016

Recap – Last Week • Visual Displays

•  Head Mounted Display •  Vive, Mobile VE

•  Projection/Large Screen Display •  CAVE, Allosphere

• Haptic Displays •  Active Haptics

•  Actively Resist Motion •  Passive Haptics

•  Physical Props

•  Tactile Displays •  Vibrating actuators

AUDIO DISPLAYS

Audio Displays Definition: Computer interfaces that provide synthetic sound feedback to users interacting with the virtual world. The sound can be monaural (both ears hear the same sound) or binaural (each ear hears a different sound).

Burdea, Coiffet (2003)

Virtual Reality Audio Overview

•  https://www.youtube.com/watch?v=yUlnMbxTuY0

Motivation • Most of the focus in Virtual Reality is on the visuals

• GPUs continue to drive the field • Users want more

•  More realism, More complexity, More speed

• However sound can significantly enhance realism • Example: Mood music in horror games

• Sound can provide valuable user interface feedback • Example: Alert in training simulation

Creating/Capturing Sounds • Sounds can be captured from nature (sampled) or synthesized computationally •  High-quality recorded sounds are

•  Cheap to play •  Easy to create realism •  Expensive to store and load •  Difficult to manipulate for expressiveness

•  Synthetic sounds are •  Cheap to store and load •  Easy to manipulate •  Expensive to compute before playing •  Difficult to create realism

Types of Audio Recordings

• Monaural: Recording with one microphone – no positioning

• Stereo Sound: Recording with two microphones placed several feet apart. Perceived sound position as recorded by microphones.

• Binaural: Recording microphones embedded in a dummy head. Audio filtered by head shape.

•  3D Sound: Using tiny microphones in the ears of a real person. Generate HRTF based on ear shape and audio response.

Synthetic Sounds • Complex sounds can be built from simple waveforms (e.g., sawtooth, sine) and combined using operators

• Waveform parameters (frequency, amplitude) could be taken from motion data, such as object velocity

• Can combine waveforms in various ways

•  This is what classic synthesizers do

• Works well for many non-speech sounds

Combining Wave Forms • Adding up waves creates new waves
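A minimal sketch of adding waves, assuming an 8 kHz sample rate and hypothetical helper names (`sine_wave`, `add_waves`, `sawtooth_approx`) that are not from the lecture. It sums sine harmonics with amplitude 1/k, which is one standard way to approximate a sawtooth from simple waveforms.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)

def sine_wave(freq_hz, amplitude, n_samples, sample_rate=SAMPLE_RATE):
    """Return n_samples of a sine wave as a list of floats."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def add_waves(*waves):
    """Sum several equal-length waveforms sample by sample."""
    return [sum(samples) for samples in zip(*waves)]

def sawtooth_approx(freq_hz, n_harmonics, n_samples, sample_rate=SAMPLE_RATE):
    """Approximate a sawtooth by adding harmonics k*f with amplitude 1/k."""
    harmonics = [sine_wave(k * freq_hz, 1.0 / k, n_samples, sample_rate)
                 for k in range(1, n_harmonics + 1)]
    return add_waves(*harmonics)

if __name__ == "__main__":
    wave = sawtooth_approx(freq_hz=100, n_harmonics=10, n_samples=80)
    print(["%.2f" % s for s in wave[:5]])
```

Using more harmonics sharpens the sawtooth edge; the frequency or amplitude arguments could equally be driven by motion data such as object velocity.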

Digital Audio Workstation Software

• Software for recording, editing, producing audio files •  Mixing console, synthesizer, waveform editor, etc

• Wide variety available •  https://en.wikipedia.org/wiki/Digital_audio_workstation

Typical Audio Display Properties Presentation Properties • Number of channels • Sound stage •  Localization • Masking • Amplification

Logistical Properties • Noise pollution • User mobility • Interface with tracking • Environmental requirements • Integration • Portability • Throughput • Cumber • Safety • Cost

Channels and Masking • Number of channels

• Stereo vs. mono vs. quadrophonic •  2.1, 5.1, 7.1

• Two kinds of masking •  Louder sounds mask softer ones

•  We have too many things vying for our audio attention these days!

• Physical objects mask sound signals •  Happens with speakers, but not with headphones

Audio Displays: Head-worn

Ear Buds On Ear Open Back

Closed Bone Conduction

Audio Displays: Room Mounted

• Stereo, 5.1, 7.1, 11.1, etc • Sound cube

11.1 Speaker Array

Spatialization vs. Localization • Spatialization is the processing of sound signals to make them emanate from a point in space • This is a technical topic

• Localization is the ability of people to identify the source position of a sound • This is a human topic, i.e., some people are better at it than others.

Stereo Sound

•  Seems to come from inside user's head •  Follows head motion as user moves head

3D Spatial Sound

• Seems to be external to the head •  Fixed in space when user moves head • Has reflected sound properties

Spatialized Audio Effects • Naïve approach

• Simple left/right shift for lateral position • Amplitude adjustment for distance

• Easy to produce using consumer hardware/software • Does not give us "true" realism in sound

• No up/down or front/back cues • We can use multiple speakers for this

• Surround the user with speakers • Send different sound signals to each one
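The naïve approach above (left/right shift plus amplitude adjustment) can be sketched as follows. This is an illustration only: the function name, the constant-power pan law, and the inverse-distance gain are assumptions chosen for the sketch, not something the lecture specifies.

```python
import math

def naive_spatialize(sample, pan, distance_m, ref_distance_m=1.0):
    """Return (left, right) output for one mono sample.

    pan: -1.0 (hard left) to +1.0 (hard right), clamped to that range.
    distance_m: source distance; gain falls off as 1/distance beyond
    ref_distance_m (assumed inverse-distance law).
    """
    pan = max(-1.0, min(1.0, pan))
    angle = (pan + 1.0) * math.pi / 4.0          # 0 .. pi/2
    gain = ref_distance_m / max(distance_m, ref_distance_m)
    left = sample * gain * math.cos(angle)       # constant-power pan
    right = sample * gain * math.sin(angle)
    return left, right

if __name__ == "__main__":
    print(naive_spatialize(1.0, pan=0.5, distance_m=2.0))
```

Nothing here encodes elevation or front/back position, which is why this approach does not sound truly external to the head.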

Example: The BoomRoom

• Use surround speakers to create spatial audio effects • Gesture based interaction •  https://www.youtube.com/watch?time_continue=54&v=6RQMOyQ3lyg

Audio Localization • Main cues used by humans to localize sound:

1.  Interaural time differences: Time difference for sound wave to travel between ears

2.  Interaural level differences: For high frequency sounds (> 1.5 kHz), volume difference between ears used to determine source direction

3.  Spectral filtering done by outer ears: Ear shape changes frequency heard

Interaural Time Difference

•  Takes fixed time to travel between ears • Can use time difference to determine sound location
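As a rough worked example of the interaural time difference, the sketch below uses Woodworth's spherical-head approximation, ITD = (r / c)(θ + sin θ). The head radius of 8.75 cm and the function name are assumptions for illustration, and the formula is only meant for sources between straight ahead and directly to one side.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at about 20 degrees C
HEAD_RADIUS = 0.0875     # m, an assumed average head radius

def itd_seconds(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference for a far source at the given azimuth.

    azimuth_deg: 0 is straight ahead, +90 is directly to one side.
    Valid for azimuths between -90 and +90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))

if __name__ == "__main__":
    for az in (0, 30, 60, 90):
        print(az, "deg ->", round(itd_seconds(az) * 1e6), "microseconds")
```

The largest difference, for a source directly to one side, comes out well under a millisecond, so the auditory system is resolving very small timing offsets.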

Spectral Filtering

Ear shape filters sound depending on direction it is coming from. This change in frequency determines sound source elevation.

Natural Hearing vs. Headphones

• Due to ear shape natural hearing provides different audio response depending on sound location

Head-Related Transfer Functions (HRTFs)

• A set of functions that model how sound from a source at a known location reaches the eardrum

More About HRTFs • Functions take into account,

•  Individual ear shape • Slope of shoulders • Head shape

• So, each person has their own HRTF! • Need to have parameterizable HRTFs

• Some sound cards/APIs allow specifying an HRTF

Constructing HRTFs • Small microphones placed into ear canals • Subject sits in an anechoic chamber

• Can use a mannequin's head instead

• Sounds played from a large number of known locations around the chamber • HRTFs are constructed from this data

• Sound signal is filtered through inverse functions to place the sound at the desired source

Constructing HRTFs

• Putting microphones in mannequin or human ears • Playing sound from fixed positions • Record response

How HRTFs are Used • HRTF is the Fourier transform of the in-ear microphone audio response (head related impulse response (HRIR))

•  From HRTF we can calculate pairs of finite impulse response (FIR) filters for specific sound positions •  One filter per ear

•  To place virtual sound at a position, apply set of FIR filters for that position to the incoming sound

HRTF Processing

•  Input sound is convolved with FIR to generate L/R outputs
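The convolution step can be sketched in a few lines. The two impulse responses below are made-up toy values standing in for a measured HRIR pair (a source to the listener's right: the left ear hears it later and quieter), and the function names are hypothetical.

```python
def convolve(signal, fir):
    """Direct FIR convolution; output length is len(signal) + len(fir) - 1."""
    out = [0.0] * (len(signal) + len(fir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(fir):
            out[i + j] += s * h
    return out

# Toy HRIR pair (NOT measured data): source on the listener's right.
HRIR_LEFT = [0.0, 0.0, 0.0, 0.4, 0.2]    # delayed and attenuated
HRIR_RIGHT = [0.9, 0.3, 0.1, 0.0, 0.0]   # arrives first, louder

def spatialize(mono, hrir_left, hrir_right):
    """Apply one FIR filter per ear to a mono input, returning (left, right)."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

if __name__ == "__main__":
    left, right = spatialize([1.0, 0.5, 0.25], HRIR_LEFT, HRIR_RIGHT)
    print(left)
    print(right)
```

A real system would select (or interpolate) the filter pair for the source's current direction and would normally use FFT-based convolution for speed.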

Environmental Effects • Sound is also changed by objects in the environment

• Can reverberate off of reflective objects • Can be absorbed by objects • Can be occluded by objects

• Doppler shift • Moving sound sources

• Need to simulate environmental audio properties •  Takes significant processing power
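Of the effects listed above, Doppler shift is the simplest to compute. The sketch below uses the standard formula for motion along the line between source and listener; the function name and sign conventions are choices made for this example.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C

def doppler_frequency(source_hz, source_speed, listener_speed=0.0,
                      c=SPEED_OF_SOUND):
    """Observed frequency for motion along the source-listener line.

    source_speed: m/s, positive when the source moves toward the listener.
    listener_speed: m/s, positive when the listener moves toward the source.
    """
    if abs(source_speed) >= c:
        raise ValueError("source speed must be below the speed of sound")
    return source_hz * (c + listener_speed) / (c - source_speed)

if __name__ == "__main__":
    print(doppler_frequency(440.0, 34.3))    # approaching: pitch rises
    print(doppler_frequency(440.0, -34.3))   # receding: pitch falls
```

Reverberation, absorption, and occlusion are far more expensive because they depend on scene geometry rather than on a single relative velocity.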

Sound Reverberation

• Need to consider first and second order reflections • Need to model material properties, objects in room, etc

The Tough Part • All of this takes a lot of processing • Need to keep track of

• Multiple (possibly moving) sound sources • Path of sounds through a dynamic environment • Position and orientation of listener(s)

• Most sound cards only support a limited number of spatialized sound channels

•  Increasingly complex geometry increases load on audio system as well as visuals •  That's why we fake it ;-)

• GPUs might change this too!

Sound Display Hardware • Designed to reduce CPU load • Early Hardware

•  Custom HRTF •  Crystal River Engineering Convolvotron (1988)

•  Real time 3D audio localizer, 4 sound sources

•  Lake Technology (2002) •  Huron 20, custom DSP hardware, $40,000

• Modern Consumer Hardware •  Uses generic HRTF •  SoundBlaster Audigy/EAX •  Aureal A3D/Vortex card

Convolvotron Block Diagram

For N sound sources

GPU Based Audio Acceleration

• Using GPU for audio physics calculations • AMD TrueAudio Next

•  https://www.youtube.com/watch?v=Z6nwYLHG8PU

Audio Software SDKs • Modern CPUs are fast enough that spatial audio can be generated without dedicated hardware • Several 3D audio SDKs exist

•  OpenAL •  www.openal.org •  Open source, cross platform •  Renders multichannel three-dimensional positional audio

•  Google VR SDK •  Android, iOS, Unity •  https://developers.google.com/vr/concepts/spatial-audio

•  Oculus •  https://developer3.oculus.com/documentation/audiosdk/latest/

•  Microsoft DirectX, Unity, etc

Google VR Spatial Audio Demo

•  https://www.youtube.com/watch?v=I9zf4hCjRg0&feature=youtu.be

OSSIC 3D Audio Headphones

•  3D audio headphones •  Calibrates to user – calculates HRTF •  Integrated head tracking •  Multi-driver array providing sound to correct part of ear •  Raised $2.7 million on Kickstarter

•  https://www.ossic.com/3d-audio/

Ossic vs. Traditional Headphone

• Provides frequency reproduction of real sound

OSSIC vs. Generic Headphone

• Sound source localization (T = target)

OSSIC Technology

•  https://www.youtube.com/watch?time_continue=71&v=ko-VeQ7Aflg

Designing Spatial Audio

•  There are several tools available for designing 3D audio •  E.g. Facebook Spatial Workstation

•  Audio tools for cinematic VR and 360 video •  https://facebook360.fb.com/spatial-workstation/

•  Spatial Audio Designer •  Mixing of surround sound and 3D audio •  http://www.newaudiotechnology.com/en/products/spatial-audio-designer/

Demo: Spatial Audio In VR

• AltspaceVR spatial audio for speaker discrimination •  https://www.youtube.com/watch?v=dV3Qog44z6E

TRACKING

Immersion and Tracking

• Motivation: For immersion, when the user changes position in reality the VR view also needs to change

• Requires tracking of the user’s pose (position/orientation) in the real world and mapping to the Virtual World

Definitions • Tracking: measuring the position and orientation of an object relative to a known frame of reference

• VR Tracker: technology used in VR to measure the real time change in a 3D object position and orientation

(1968) Ivan Sutherland Mechanical Tracker

•  Frames of Reference •  Real World Coordinate System (Wcs) •  Head Coordinate System (Hcs) •  Eye Coordinate System (Ecs)

• Need to create a mapping between Frames •  E.g. Transformation from Wcs to Hcs to Ecs •  Movement in real world maps to movement in Ecs frame
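The Wcs to Hcs to Ecs mapping can be sketched with 4x4 homogeneous transforms. The poses below (head 1.7 m above the floor and yawed 90°, eye offset 3.2 cm from the head origin) and all function names are hypothetical values for illustration.

```python
import math

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_y(deg):
    """Rotation about the vertical (y) axis, i.e. yaw."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(m):
    """Invert a rotation+translation matrix: R^T and -R^T t."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    new_t = [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rt[0] + [new_t[0]], rt[1] + [new_t[1]], rt[2] + [new_t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def transform_point(m, p):
    v = [p[0], p[1], p[2], 1.0]
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

# Hypothetical tracked poses.
world_from_head = matmul(translation(0.0, 1.7, 0.0), rotation_y(90.0))
head_from_eye = translation(0.032, 0.0, 0.0)

# Wcs -> Hcs -> Ecs: apply the inverse head pose, then the inverse eye offset.
eye_from_world = matmul(invert_rigid(head_from_eye),
                        invert_rigid(world_from_head))

if __name__ == "__main__":
    print(transform_point(eye_from_world, (1.0, 1.7, 0.0)))
```

Each tracker update replaces `world_from_head`; the eye offset stays fixed, so only one matrix changes per frame.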

Frames of Reference

Example Frames of Reference

Assuming Head Tracker mounted on HMD

Assuming tracking relative to fixed table object

Tracking Degrees of Freedom • Typically 6 Degrees of Freedom (DOF) • Translation along, or rotation about, an axis

1.  Moving up and down

2.  Moving left and right

3.  Moving forward and backward

4.  Tilting forward and backward (pitching)

5.  Turning left and right (yawing)

6.  Tilting side to side (rolling)

Key Tracking Performance Criteria • Static Accuracy • Dynamic Accuracy • Latency • Update Rate • Tracking Jitter • Signal to Noise Ratio • Tracking Drift

Static vs. Dynamic Accuracy • Static Accuracy

•  Ability of tracker to determine coordinates of a position in space

•  Depends on sensor sensitivity, errors (algorithm, operator), environment

• Dynamic Accuracy •  System accuracy as sensor moves •  Depends on static accuracy

• Resolution •  Minimum change sensor can detect

• Repeatability •  Same input giving same output

Tracker Latency, Update Rate

•  Latency: Time between change in object pose and time sensor detects the change •  Large latency (> 10 ms) can cause simulator sickness •  Larger latency (> 50 ms) can reduce VR immersion

• Update Rate: Number of measurements per second •  Typically > 30 Hz

Tracker Jitter, Signal to Noise Ratio •  Jitter: Change in tracker output when tracked object is stationary •  Range of change is sensor noise •  Tracker with no jitter reports constant value if tracked object is stationary •  Jitter makes tracker data change randomly about an average value

• Signal to Noise Ratio: Signal in data relative to noise •  Found by calculating the mean of samples taken at known positions
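One way to put numbers on jitter and signal to noise ratio from a batch of readings taken while the tracked object is held still. Treating jitter as the standard deviation and SNR as mean divided by standard deviation is an assumption of this sketch (other definitions are in use), and the function names are hypothetical.

```python
import statistics

def jitter(samples):
    """Spread (population standard deviation) of stationary readings."""
    return statistics.pstdev(samples)

def signal_to_noise(samples):
    """Mean reading divided by its spread; infinite if there is no noise."""
    noise = statistics.pstdev(samples)
    if noise == 0:
        return float("inf")
    return abs(statistics.mean(samples)) / noise

if __name__ == "__main__":
    readings_mm = [9.9, 10.1, 9.9, 10.1, 10.0]  # made-up stationary readings
    print("jitter:", jitter(readings_mm))
    print("SNR:", signal_to_noise(readings_mm))
```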

Tracker Drift • Drift: Steady increase in tracker error over time •  Accumulative (additive) error over time •  Relative to dynamic sensitivity over time •  Controlled by periodic recalibration (zeroing)

Tracking Technologies • Mechanical

•  Physical Linkage

• Electromagnetic •  Magnetic sensing

•  Inertial •  Accelerometer, MEMS

• Acoustic •  Ultrasonic

• Optical •  Computer Vision

• Hybrid •  Combination of Technologies

Contact-less

Contact-based

Mechanical Tracker

• Idea: mechanical arms with joint sensors

• ++: high accuracy, low jitter, low latency • -- : cumbersome, limited range, fixed position

Microscribe Sutherland

Example: Fake Space Boom

•  BOOM (Binocular Omni-Orientation Monitor) •  Counterbalanced arm with 100° FOV HMD mounted on it •  6 DOF, 4mm position accuracy, 300Hz sampling, < 5 ms latency

Demo: Fake Space Tele Presence

• Using Boom with HMD to control robot view •  https://www.youtube.com/watch?v=QpTQTu7A6SI

Magnetic Tracker • Idea: Measure difference in current between a magnetic transmitter and a receiver

• ++: 6DOF, robust, accurate, no line of sight needed • -- : limited range, sensitive to metal, noisy, expensive

Flock of Birds (Ascension)

Example: Polhemus Fastrak • Degrees-of-Freedom: 6DOF • Number of Sensors: 1-4 •  Latency: 4ms • Update Rate: 120 Hz/(num sensors) • Static Accuracy Position: 0.03in RMS • Static Accuracy Orientation: 0.15° RMS • Range from Standard Source: Up to 5 feet or 1.52 meters • Extended Range Source: Up to 15 feet or 4.6 meters •  Interface: RS-232 or USB (both included) • Host OS compatibility: GUI/API Toolkit 2000/XP •  http://polhemus.com/motion-tracking/all-trackers/fastrak

Polhemus Tracker Demo

•  https://www.youtube.com/watch?v=7DlEfd0VH_o

Polhemus Magnetic Tracking Error

Example: Razer Hydra

• Developed by Sixense • Magnetic source + 2 wired controllers

•  Short range (< 1 m), Precision of 1mm and 1°

•  62Hz sampling rate, < 50 ms latency •  $600 USD

Razer Hydra Demo

•  https://www.youtube.com/watch?v=jnqFdSa5p7w

Inertial Tracker •  Idea: Measuring linear and angular orientation rates (accelerometer/gyroscope)

• ++: no transmitter, cheap, small, high sample rate, wireless •  -- : drift, hysteresis, noise, only 3DOF

IS300 (Intersense) Wii Remote

Types of Inertial Trackers • Gyroscopes

•  The rate of change in object orientation or angular velocity is measured.

• Accelerometers • Measure acceleration. • Can be used to determine object position, if the starting point is known.

•  Inclinometer • Measures inclination, "level" position. •  Like carpenter's level, but giving electrical signal.

Example: MEMS Sensor

• Uses spring-supported load • Reacts to gravity and inertia

•  Changes its electrical parameters •  < 5 ms latency, 0.01° accuracy •  up to 1000Hz sampling

• Problems •  Rapidly accumulating errors. •  Error in position increases with the square of time.

•  Cheap units can get position drift of 4 cm in 2 seconds. •  Expensive units have same error in 200 seconds.

•  Not good for measuring location •  Need to periodically reset the output
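The square-of-time growth follows from integrating acceleration twice: a constant, uncorrected accelerometer bias b produces a position error of ½·b·t². The sketch below assumes that simple constant-bias model (real sensors also have noise and scale errors) and works backwards from the figures quoted above; the function names are hypothetical.

```python
def position_error(accel_bias, t):
    """Position error (m) after t seconds of a constant bias (m/s^2)."""
    return 0.5 * accel_bias * t * t

def bias_for_error(error_m, t):
    """Constant bias (m/s^2) that would give error_m after t seconds."""
    return 2.0 * error_m / (t * t)

def integrate_twice(accel_bias, t, dt=0.001):
    """Numerically double-integrate the bias, as dead reckoning would."""
    v = x = 0.0
    for _ in range(round(t / dt)):
        v += accel_bias * dt   # acceleration -> velocity
        x += v * dt            # velocity -> position
    return x

if __name__ == "__main__":
    cheap = bias_for_error(0.04, 2.0)       # 4 cm in 2 s
    costly = bias_for_error(0.04, 200.0)    # 4 cm in 200 s
    print("implied bias, cheap unit:", cheap, "m/s^2")
    print("implied bias, expensive unit:", costly, "m/s^2")
    print("cheap unit after 10 s:", position_error(cheap, 10.0), "m")
```

Under this model, taking 100 times longer to reach the same error corresponds to a bias 10,000 times smaller, and doubling the time since the last reset quadruples the error.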

Demo: MEMS Sensor Working

•  https://www.youtube.com/watch?v=9eSnxebfuxg

MEMS Gyro Bias Drift

•  Zero reading of MEMS Gyro drifts over time due to noise

Example: iPhone Sensors •  Three-axis accelerometer •  Gives direction of acceleration, affected by gravity and movement

•  Three-axis gyroscope •  Measures rotation rate (angular velocity), affected by movement

•  Three-axis magnetometer •  Gives (approximate) direction of magnetic north

• GPS •  Gives geolocation; multiple samples over time can be used to detect direction and speed

iPhone Sensor Monitor app

Acoustic - Ultrasonics Tracker •  Idea: Time of Flight or Phase-Coherence Sound Waves

• ++: Small, Cheap •  -- : 3DOF, Line of Sight, Low resolution, Affected by Environment (pressure, temperature), Low sampling rate

Ultrasonic Logitech IS600

Acoustic Tracking Methods •  Two approaches:

•  Time difference, •  Phase difference

•  Time-of-flight (TOF): •  All current commercial systems •  Time that sound pulse travels is proportional to distance from the receiver. •  Problem: differentiating the pulse from noise. •  Each transmitter works sequentially – increased latency.

•  Phase coherent approach (Sutherland 1968): •  No pulse, but continuous signal (~50 kHz) •  Many transmitters on different frequencies •  Phase differences between sent and received signals continuously give the change in distance, no latency •  Only relative distance, cumulative & multi-path errors possible.
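For the time-of-flight approach, distance is simply the speed of sound multiplied by the pulse travel time. The sketch below uses a common linear approximation for the speed of sound in dry air (about 331.3 + 0.606·T m/s, with T in °C) to show why temperature affects accuracy; the function names are hypothetical.

```python
def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air (m/s) at temp_c degrees C."""
    return 331.3 + 0.606 * temp_c

def tof_distance(time_s, temp_c=20.0):
    """Transmitter-to-receiver distance (m) from a one-way pulse time."""
    return speed_of_sound(temp_c) * time_s

if __name__ == "__main__":
    t = 0.005  # 5 ms one-way travel time (made-up measurement)
    assumed = tof_distance(t, temp_c=20.0)
    actual = tof_distance(t, temp_c=30.0)
    print("distance if 20 C is assumed:", assumed, "m")
    print("distance if the room is really 30 C:", actual, "m")
    print("error from the wrong temperature:", actual - assumed, "m")
```

A tracker that assumes the wrong air temperature therefore misjudges every distance by the same percentage, which then propagates into the triangulated position.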

Acoustic Tracking Principles • Measurements are based on triangulation

•  Minimum distances at transmitter and receiver required. •  Can be a problem if trying to make the receiver very small.

• Each speaker is activated in cycle and 3 distances from it to the 3 microphones are calculated, 9 distances total.

•  Tracking performance can degrade when operating in a noisy environment.

• Update rate about 50 datasets/s •  Time multiplexing is possible •  With 4 receivers, update rate drops to 12 datasets/s

Example: Logitech Head Tracker •  Transmitter is a set of three ultrasonic speakers, 30 cm from each other •  Rigid and fixed triangular frame •  50 Hz update, 30 ms latency

• Receiver is a set of three microphones placed at the top of the HMD •  May be part of 3D mice, stereo glasses, or other interface devices

• Range typically about 1.5 m •  Direct line of sight required •  Accuracy 0.1° orientation, 2% distance

Optical Tracker • Idea: Image Processing and Computer Vision • Specialized

•  Infrared, Retro-Reflective, Stereoscopic

• ++: Long range, cheap, immune to metal • -- : Line of Sight, Visual Targets, Low Sampling rate

ART Hi-Ball

Outside-In vs. Inside-Out Tracking

Optical Tracking Technologies

• Scalable active trackers • InterSense IS-900, 3rd Tech HiBall

• Passive optical computer vision • Line of sight, may require landmarks • Can be brittle. • Computer vision is computationally-intensive

3rd Tech, Inc.

Example: HiBall Tracking System (3rd Tech)

• Inside-Out Tracker • $50K USD

• Scalable over large area • Fast update (2000Hz) • Latency Less than 1 ms.

• Accurate • Position 0.4mm RMS • Orientation 0.02° RMS

Example: Microsoft Kinect

• Outside-in tracking • Components:

•  RGB camera •  Range camera •  IR light source •  Multi-array microphone

• Specifications •  Range 1-6m •  Update rate 30Hz •  Latency 100ms •  Tracking resolution < 5mm

• Range Camera extracts depth information and combines it with a video signal

Hybrid Tracking •  Idea: Multiple technologies overcome limitations of each one • A system that utilizes two or more position/orientation measurement technologies (e.g. inertial + vision)

•  ++: Robust, reduce latency, increase accuracy •  -- : More complex, expensive
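A complementary filter is one simple way to combine a fast but drifting sensor with a slower drift-free one. It is shown here as a generic illustration of sensor fusion, not as the method used by any particular product; the blend factor, bias value, and function names are assumptions.

```python
def complementary_filter(prev_angle, gyro_rate, reference_angle, dt,
                         alpha=0.98):
    """Blend gyro integration with an absolute reference measurement.

    prev_angle: previous fused estimate (degrees)
    gyro_rate: angular rate from the inertial sensor (degrees/second)
    reference_angle: drift-free angle, e.g. from optical tracking (degrees)
    alpha: weight given to the inertial prediction (assumed value)
    """
    predicted = prev_angle + gyro_rate * dt
    return alpha * predicted + (1.0 - alpha) * reference_angle

def simulate(steps=1000, dt=0.01, gyro_bias=0.5):
    """Stationary object at 0 degrees; the gyro reports only its bias."""
    integrated = fused = 0.0
    for _ in range(steps):
        integrated += gyro_bias * dt                       # inertial only
        fused = complementary_filter(fused, gyro_bias, 0.0, dt)
    return integrated, fused

if __name__ == "__main__":
    integrated, fused = simulate()
    print("inertial only after 10 s:", integrated, "degrees")
    print("fused estimate after 10 s:", fused, "degrees")
```

The inertial-only estimate keeps growing with time, while the fused estimate settles at a small bounded offset because the reference keeps pulling it back.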

Intersense IS-900 Ascension Laser Bird

Example: Intersense IS-900 •  Inertial Ultrasonic Hybrid tracking

•  Use ultrasonic strips for position sensing •  Inertial sensing for orientation •  Sensor fusion to combine together

• Specifications •  Latency 4ms •  Update 180 Hz •  Resolution 0.75mm, 0.05° •  Accuracy 3mm, 0.25°

•  Up to 140 m² tracking volume

•  http://www.intersense.com/pages/20/14

Demo: IS-1200 and IS-900

•  https://www.youtube.com/watch?v=NkYLlTyuYkA

Example: Vive Lighthouse Tracking • Outside-in hybrid tracking system •  2 base stations

•  Each with 2 laser scanners, LED array

• Headworn/handheld sensors •  37 photo-sensors in HMD, 17 in hand •  Additional IMU sensors (500 Hz)

• Performance •  Tracking server fuses sensor samples •  Sampling rate 250 Hz, 4 ms latency •  2mm RMS tracking accuracy •  Large area - 5 x 5m range

•  See http://doc-ok.org/?p=1478

Lighthouse Components

Base station - IR LED array - 2 x scanned lasers

Head Mounted Display - 37 photo sensors - 9 axis IMU

Lighthouse Setup

How Lighthouse Tracking Works • Position tracking using IMU

•  500 Hz sampling •  But drifts over time

• Drift correction using optical tracking •  IR synchronization pulse (60 Hz) •  Laser sweep between pulses •  Photo-sensors recognize sync pulse, measure time to laser •  Know when sensor hit and which sensor hit •  Calculate position of sensor relative to base station •  Use 2 base stations to calculate pose

• Use IMU sensor data between pulses (500Hz) • See http://xinreality.com/wiki/Lighthouse
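The timing step can be sketched as follows: the delay between the sync pulse and the laser hit, multiplied by the rotor's angular speed, gives the angle of the sensor as seen from the base station. This is a simplified geometric model, not the actual Lighthouse implementation; it assumes one full rotor revolution per 60 Hz sync period, and the function names are hypothetical.

```python
import math

SYNC_HZ = 60.0  # sync pulse rate; one rotor turn per period is assumed

def sweep_angle(dt_s, sync_hz=SYNC_HZ):
    """Rotor angle (radians) swept between sync pulse and laser hit."""
    return 2.0 * math.pi * sync_hz * dt_s

def ray_direction(horizontal_rad, vertical_rad):
    """Unit vector from the base station toward a sensor.

    Both angles are measured from the station's forward (+z) axis, so the
    caller first subtracts the rotor angle that means "straight ahead".
    """
    x, y, z = math.tan(horizontal_rad), math.tan(vertical_rad), 1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)

if __name__ == "__main__":
    angle = sweep_angle(0.004)  # laser hit 4 ms after the sync pulse
    print("sweep angle:", math.degrees(angle), "degrees")
    print("ray:", ray_direction(math.radians(10.0), math.radians(-5.0)))
```

One horizontal and one vertical sweep give a ray per sensor; with several sensors at known positions on the headset, or rays from two base stations, the pose can then be solved for.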

Lighthouse Tracking

Base station scanning

https://www.youtube.com/watch?v=avBt_P0wg_Y https://www.youtube.com/watch?v=oqPaaMR4kY4

Room tracking

www.empathiccomputing.org

@marknb00
