A (Mis-) Guided Tour of the Web Audio API

Edward B. Rockower, Ph.D.

Presented 10/15/14

Monterey Bay Information Technologists (MBIT) Meetup


Abstract
• Audio for websites has a very checkered past.
• The HTML5 <audio> tag is a big step forward.
• The “Web Audio API” is more of a giant leap:
– modeled on a modular graph of “audio nodes”
– provides filters, gains, convolvers, spectral analysis, and spatially-located sound sources
– very important for sounds in games, online music synthesis, speech recognition, and analyses
• JavaScript Arrays and XHR2 (AJAX)
• “getUserMedia” to capture real-time camera and microphone input
• Arriving “as we speak” (check availability: www.CanIUse.com)

Presenter

Presentation Notes

AJAX (Asynchronous JavaScript and XML)

Organizing Principles (Evolutionary / Revolutionary)

[Diagram: topics converging on the Web Audio API. Labels: Enabling Technologies; Music; Audio Engineering; Events, asynchronous/callbacks; Web Workers; Theremin (artificial); Transistor; Printed circuit; FFT; A/D, PCM, DSP; Moog synthesizer; Audio; Internet Browser Wars; Human/computer interaction; Online & Games; Natural (birds, voices); Computer Generated; demos.]

Presenter

Presentation Notes

Organizing Principles: 1) enabling technologies, 2) Music & Sound Engineering, 3) Games, 4) Internet & Browser Evolution

What’s New in AJAX, HTML5, & JavaScript
• New XHR2 (arraybuffer, typed arrays, CORS)
• Asynchronous (callbacks, non-blocking, events)
• Audio threads (Audio Worker)
• getUserMedia (HTML5, WebRTC)
• requestAnimationFrame (60 fps, HTML5)
• <audio> (HTML5 media element)
• Web Audio API (optimized ‘native’ C code, modular audio graph, Fast Fourier Transform)
• Vendor-prefixed syntax (webkitAudioContext)
• Firefox v. 32 Dev Tools display/edit the Web Audio graph

Presenter

Presentation Notes

Access-Control-Allow-Origin: http://example.com
Header set Access-Control-Allow-Origin "*" (in .htaccess)
V8 JavaScript engine in Google Chrome
Demo: the checkbox animations example, http://lab.hakim.se/checkwave/

Both Sound-Amplitude & Time are Quantized


Presenter

Presentation Notes

… bit rate, which refers to the number of bits of audio required per second of playback. In the Web Audio API world, this long array of numbers representing a sound is abstracted as an AudioBuffer. AudioBuffers can store multiple audio channels (often in stereo, meaning a left and right channel) represented as arrays of floating-point numbers normalized between −1 and 1. The same signal can also be represented as an array of integers, which, in 16-bit, range from −2^15 to 2^15 − 1 (−32,768 to 32,767). http://chimera.labs.oreilly.com/books/1234000001552/ch01.html#s01_5
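A minimal JavaScript sketch of the float-to-16-bit conversion described above, assuming simple scaling by 2^15 for negative values and 2^15 − 1 for positive values, plus clamping. `floatTo16Bit` and `int16ToFloat` are hypothetical helper names, not Web Audio API methods.

```javascript
// Convert normalized float samples (-1..1, as stored in an AudioBuffer channel)
// to 16-bit signed PCM integers, and back. Hypothetical helpers for illustration.
function floatTo16Bit(floats) {
  const out = new Int16Array(floats.length);
  for (let i = 0; i < floats.length; i++) {
    // Clamp to [-1, 1] so out-of-range values do not wrap around.
    const s = Math.max(-1, Math.min(1, floats[i]));
    // Negative values scale toward -32768 (-2^15), positive toward 32767 (2^15 - 1).
    out[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return out;
}

function int16ToFloat(ints) {
  const out = new Float32Array(ints.length);
  for (let i = 0; i < ints.length; i++) {
    // Undo the asymmetric scaling used above.
    out[i] = ints[i] < 0 ? ints[i] / 32768 : ints[i] / 32767;
  }
  return out;
}

const pcm = floatTo16Bit(new Float32Array([-1, -0.5, 0, 0.5, 1, 1.5]));
// pcm now holds -32768, -16384, 0, 16384, 32767, 32767 (1.5 was clamped to 1)
```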

PCM Digitization: Analog to Digital (A/D)

• 4 bits: 2^4 = 16 different values
– Quantization of values
– Encode as binary numbers

• Ts = sampling interval
• 1/Ts = sampling frequency

• 44.1 kHz is used in compact discs
– Nyquist frequency = 44.1 kHz / 2 = 22.05 kHz, just above the upper limit of human hearing (about 20 kHz)
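A small sketch of PCM digitization as listed on the slide: sample a sine tone every Ts = 1/fs seconds, then quantize each value to 4 bits (16 levels) with a simple uniform quantizer. The 1 kHz test tone and the `samplePcm` helper are assumptions for illustration.

```javascript
// Sample a sine tone every Ts = 1/fs seconds and quantize each sample to
// `bits` bits (2^bits levels), mimicking PCM A/D conversion.
function samplePcm(freqHz, fs, numSamples, bits) {
  const levels = 2 ** bits;              // e.g. 4 bits -> 16 levels
  const ts = 1 / fs;                     // sampling interval Ts
  const codes = [];
  for (let n = 0; n < numSamples; n++) {
    const x = Math.sin(2 * Math.PI * freqHz * n * ts); // "analog" value in [-1, 1]
    // Map [-1, 1] onto integer codes 0 .. levels-1 (x = 1 lands on the top code).
    const code = Math.min(levels - 1, Math.floor(((x + 1) / 2) * levels));
    codes.push(code);
  }
  return codes;
}

const sampleRate = 44100;           // CD sampling rate
const nyquist = sampleRate / 2;     // 22050 Hz: highest frequency representable
const codes = samplePcm(1000, sampleRate, 8, 4);
// Each code fits in 4 bits, e.g. shown as binary strings:
console.log(codes.map(c => c.toString(2).padStart(4, "0")));
```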


Buffers and views: “typed array” architecture

JavaScript typed arrays split the implementation into buffers and views.
• A “buffer” (ArrayBuffer object) represents a chunk of data:
– no format to speak of
– no mechanism for accessing its contents
• You need a “view”, which provides context, i.e. data type, starting offset, and number of elements
• Not your standard Arrays, but fast!

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays

16 bytes = 16 × 8 bits = 128 bits
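To make the buffer/view split concrete, the sketch below (runnable in a browser console or Node.js) views one 16-byte ArrayBuffer through several typed arrays; the variable names are arbitrary.

```javascript
// One 16-byte ArrayBuffer (128 bits) viewed through different typed arrays.
const buffer = new ArrayBuffer(16);

const bytes = new Uint8Array(buffer);     // 16 elements x 8 bits
const shorts = new Int16Array(buffer);    // 8 elements x 16 bits
const ints = new Int32Array(buffer);      // 4 elements x 32 bits
const floats = new Float32Array(buffer);  // 4 elements x 32 bits

console.log(bytes.length, shorts.length, ints.length, floats.length); // 16 8 4 4

// Views share the same memory: writing through one view is visible in the others.
ints[0] = 0x01020304;
// On little-endian platforms (typical PCs and phones) bytes[0] is 4 and bytes[3] is 1.
console.log(bytes[0], bytes[3]);

// A view can also start at a byte offset and cover only part of the buffer.
const tail = new Uint8Array(buffer, 8, 8); // bytes 8..15
tail[0] = 255;
console.log(bytes[8]); // 255
```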

Presenter

Presentation Notes

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays
16 bytes = 8 bits/byte × 16 bytes = 128 bits
XHR: request.responseType = 'arraybuffer';
audioContext.decodeAudioData(request.response, function (buffer) { … });
// Create the array for the data values
frequencyArray = new Uint8Array(analyserNode.frequencyBinCount);
analyserNode.getByteFrequencyData(frequencyArray);
Fast Fourier Transform (FFT), i.e. spectrum. Uint8Array(k) has k samples, where each ‘sample’ is a quantized measurement or computed value with 8 bits per value, i.e. “bit depth” = 8.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer

Buffers, Arrays, XHR, …

• XMLHttpRequest (XHR): request.responseType = 'arraybuffer';
• audioContext.decodeAudioData(request.response, function (anotherBuffer) { … });
• // Create the array for the data values
frequencyArray = new Uint8Array(analyserNode.frequencyBinCount);
• analyserNode.getByteFrequencyData(frequencyArray); Fast Fourier Transform (FFT), i.e. spectrum. The FFT populates frequencyArray.
• requestAnimationFrame plots data at each “frame” redraw (60 fps), more efficiently than setTimeout() or setInterval()

(Here 8 bits is the quantization in the value of each measurement/sample ‘frame’, whereas the inverse of the sampling rate, e.g. 1/22,050 s ≈ 45 µs, is the quantization in time.)
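Both kinds of quantization can be computed directly. The sketch assumes the usual FFT relationship used by AnalyserNode, where frequencyBinCount is fftSize / 2 and bin k sits at k × sampleRate / fftSize Hz; the helper names are hypothetical.

```javascript
// Time quantization: one sample lasts 1/sampleRate seconds.
function sampleIntervalSeconds(sampleRate) {
  return 1 / sampleRate;
}

// Frequency quantization of an FFT spectrum: with an FFT of size fftSize
// there are fftSize / 2 bins (an AnalyserNode's frequencyBinCount), and
// bin k corresponds to k * sampleRate / fftSize hertz.
function binFrequencies(sampleRate, fftSize) {
  const binCount = fftSize / 2;
  const freqs = new Float64Array(binCount);
  for (let k = 0; k < binCount; k++) {
    freqs[k] = (k * sampleRate) / fftSize;
  }
  return freqs;
}

console.log(sampleIntervalSeconds(22050) * 1e6); // about 45.35 microseconds
const freqs = binFrequencies(44100, 2048);         // 2048 is AnalyserNode's default fftSize
console.log(freqs.length, freqs[1]);               // 1024 bins, about 21.5 Hz apart
```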

Presenter

Presentation Notes

Uint8Array(analyserNode.frequencyBinCount): Uint8Array(k) has k samples, where each ‘sample’ is a quantized measurement or computed value with 8 bits per value (here 8 bits is the quantization in the value of each measurement ‘frame’, whereas the inverse of the sampling rate, e.g. 1/22,050 s ≈ 45 µs, is the quantization in time).

Leon Theremin


http://mdn.github.io/violent-theremin/ http://youtu.be/w5qf9O6c20o

Presenter

Presentation Notes

Lev Sergeevich Termen http://en.wikipedia.org/wiki/Theremin


Moog Synthesizer Audio Graphs


Presenter

Presentation Notes

The modular synthesizer is a type of synthesizer consisting of separate specialized modules. The modules are not hardwired together but are connected, usually with patch cords or a matrix patching system, to create a patch. The voltages from the modules may function as (audio) signals, control voltages, or logic conditions. The Moog synthesizer gained wider attention in the music industry after it was demonstrated at the Monterey International Pop Festival in 1967.

Bourne Identity: Sound Engineers (explaining how car sounds are modified to be more exciting)


Audio Graph Setup: Typical Workflow

1. Create an audio context.
2. Inside the context, create sources, e.g. <audio>, oscillator, stream.
3. Create effects nodes, e.g. reverb, biquad filter, panner, compressor.
4. Choose the final destination of the audio, for example your system speakers.
5. Connect the sources up to the effects, and the effects to the destination.

developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API
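The five steps map onto a few lines of code. This is a sketch, not the MDN example: the node choices (an OscillatorNode through a low-pass BiquadFilterNode and a GainNode) and the `buildSimpleGraph` name are assumptions, and the context is passed in as a parameter so the wiring can be checked outside a browser.

```javascript
// Steps 2-5 of the workflow, written against an injected context object.
// In a browser, pass the context created in step 1: new AudioContext().
function buildSimpleGraph(audioCtx) {
  // Step 2: a source (here an OscillatorNode).
  const source = audioCtx.createOscillator();
  source.frequency.value = 440;            // A4

  // Step 3: effects nodes (a low-pass BiquadFilterNode and a GainNode).
  const filter = audioCtx.createBiquadFilter();
  filter.type = "lowpass";
  filter.frequency.value = 1000;           // cutoff in Hz
  const gain = audioCtx.createGain();
  gain.gain.value = 0.2;                   // keep the volume modest

  // Steps 4 and 5: connect source -> effects -> destination (the speakers).
  source.connect(filter);
  filter.connect(gain);
  gain.connect(audioCtx.destination);
  return { source, filter, gain };
}

// Browser usage (step 1 creates the context):
//   const ctx = new AudioContext();
//   const { source } = buildSimpleGraph(ctx);
//   source.start();
```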

Presenter

Presentation Notes

https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API

Polyfills – Vendor-prefixed (webkit, moz, ms), e.g. using a self-executing function

(function() {
  // Polyfill for AudioContext
  window.AudioContext = window.AudioContext || window.webkitAudioContext || window.mozAudioContext;
  // Polyfill for requestAnimationFrame (replaces setTimeout)
  var requestAnimationFrame = window.requestAnimationFrame ||
      window.mozRequestAnimationFrame ||
      window.webkitRequestAnimationFrame ||
      window.msRequestAnimationFrame;
  window.requestAnimationFrame = requestAnimationFrame;
})();


Presenter

Presentation Notes

Polyfills: there’s a kind of spackle in Europe called “Polyfilla”.

Audio Sources

• <audio>
• new Audio('sounds/mySound.mp3');
• XHR (AJAX)
• OscillatorNode(s)
• “getUserMedia()” (live, USB microphone)
• Procedurally generated (ScriptProcessor)

Presenter

Presentation Notes

Audacity

Draw the AudioBuffer (no audio graph)

var audioContext = new AudioContext();
function initAudio() {
  var audioRequest = new XMLHttpRequest();
  audioRequest.open("GET", "sounds/myAudio.ogg", true);
  audioRequest.responseType = "arraybuffer";
  audioRequest.onload = function() {
    audioContext.decodeAudioData(audioRequest.response, function(buffer) {
      var canvas = document.getElementById("view1");
      drawBuffer(canvas.width, canvas.height, canvas.getContext('2d'), buffer);
    });
  };
  audioRequest.send();
}
function drawBuffer(width, height, context, buffer) {
  var data = buffer.getChannelData(0);
  var step = Math.ceil(data.length / width);
  var amp = height / 2;
  for (var i = 0; i < width; i++) {
    var min = 1.0;
    var max = -1.0;
    for (var j = 0; j < step; j++) {
      var datum = data[(i * step) + j];
      if (datum < min) min = datum;
      if (datum > max) max = datum;
    }
    context.fillRect(i, (1 + min) * amp, 1, Math.max(1, (max - min) * amp));
  }
}

Draws a Web Audio AudioBuffer to a canvas https://github.com/cwilso/Audio-Buffer-Draw/commits/master
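The heart of drawBuffer is a min/max reduction: each canvas column summarizes `step` consecutive samples by their minimum and maximum. The sketch below isolates that step as a pure function (`computePeaks` is a hypothetical name) so it can be checked without a canvas; it also stops at the end of the data instead of reading past it.

```javascript
// Same min/max reduction that drawBuffer performs, but returning numbers
// instead of drawing, so it can run without a canvas.
function computePeaks(data, width) {
  const step = Math.ceil(data.length / width); // samples per pixel column
  const peaks = [];
  for (let i = 0; i < width; i++) {
    let min = 1.0;
    let max = -1.0;
    for (let j = 0; j < step; j++) {
      const idx = i * step + j;
      if (idx >= data.length) break;           // the last column may be short
      const datum = data[idx];
      if (datum < min) min = datum;
      if (datum > max) max = datum;
    }
    peaks.push({ min, max });
  }
  return peaks;
}

// With canvas height h, drawBuffer fills column i from y = (1 + min) * h/2
// for a height of (max - min) * h/2 (at least 1 pixel).
const samples = new Float32Array([0, 0.5, -0.5, 1, -1, 0.25]);
const peaks = computePeaks(samples, 3); // 2 samples per column
```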

Plot Audio Spectrum

var audioEl = document.querySelector('audio'); // <audio>
var audioCtx = new AudioContext();
var canvasEl = document.querySelector('canvas'); // <canvas>
var canvasCtx = canvasEl.getContext('2d');
var mySource = audioCtx.createMediaElementSource(audioEl); // create source
var myAnalyser = audioCtx.createAnalyser(); // create analyser
mySource.connect(myAnalyser); // connect audio nodes
myAnalyser.connect(audioCtx.destination); // connect to speakers
var freqData = new Uint8Array(myAnalyser.frequencyBinCount);
function processIt() {
  myAnalyser.getByteFrequencyData(freqData); // place spectrum in freqData
  canvasCtx.clearRect(0, 0, canvasEl.width, canvasEl.height);
  canvasCtx.fillStyle = "#ff0000";
  for (var i = 0; i < freqData.length; i++) {
    // plot frequency spectrum: a bar of height freqData[i] rising from the bottom
    canvasCtx.fillRect(i, canvasEl.height - freqData[i], 1, freqData[i]);
  } // end for
  requestAnimationFrame(processIt); // redraw on the next frame (~60 fps)
} // end fcn processIt
requestAnimationFrame(processIt);
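getByteFrequencyData() fills the Uint8Array with bytes rather than decibels. A sketch of that conversion, assuming the linear mapping of the range [minDecibels, maxDecibels] onto 0..255 with clipping that is described for AnalyserNode, and its default range of −100 to −30 dB; `dbToBytes` is a hypothetical helper.

```javascript
// How a decibel spectrum is squeezed into the 0..255 values that
// getByteFrequencyData() writes into a Uint8Array: the range
// [minDecibels, maxDecibels] is mapped linearly onto [0, 255] and clipped.
// In a browser, myAnalyser.minDecibels and myAnalyser.maxDecibels set this range.
function dbToBytes(dbValues, minDecibels = -100, maxDecibels = -30) {
  const out = new Uint8Array(dbValues.length);
  const scale = 255 / (maxDecibels - minDecibels);
  for (let k = 0; k < dbValues.length; k++) {
    const b = Math.floor(scale * (dbValues[k] - minDecibels));
    out[k] = Math.max(0, Math.min(255, b)); // clip to the byte range
  }
  return out;
}

// Quiet bins (below minDecibels) become 0; loud bins (above maxDecibels) become 255.
const exampleBytes = dbToBytes([-120, -80, -50, -10]);
```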


Plot Audio Spectrogram*

var audioEl = document.querySelector('audio'); // <audio>
var audioCtx = new AudioContext();
var canvasEl = document.querySelector('canvas'); // <canvas>
var canvasCtx = canvasEl.getContext('2d');
var mySource = audioCtx.createMediaElementSource(audioEl);
var myAnalyser = audioCtx.createAnalyser();
myAnalyser.smoothingTimeConstant = 0;
var myScriptProcessor = audioCtx.createScriptProcessor(myAnalyser.frequencyBinCount, 1, 1);
mySource.connect(myAnalyser);
myAnalyser.connect(audioCtx.destination); // speakers/headphone
myScriptProcessor.connect(audioCtx.destination);
var x = 0;
myScriptProcessor.onaudioprocess = function () {
  if (!audioEl.paused) {
    x += 1;
    var freqData = new Uint8Array(myAnalyser.frequencyBinCount);
    myAnalyser.getByteFrequencyData(freqData);
    requestAnimationFrame(function() {
      if (x > canvasEl.width) {
        canvasCtx.clearRect(0, 0, canvasEl.width, canvasEl.height);
        x = 0;
      }
      for (var i = 0; i < freqData.length; i++) {
        canvasCtx.fillStyle = "hsl(" + freqData[i] + ",100%, 50%)";
        canvasCtx.fillRect(x, canvasEl.height - i, 1, 1);
      } // end for
    }); // end requestAnimationFrame
  } // end if
}; // end onaudioprocess

*plot of the spectrum as a function of time

[Figure: spectrogram with Time on the horizontal axis and Frequency on the vertical axis]
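A spectrogram is a sequence of spectra stacked over time. To show that idea without a browser, this sketch uses a naive discrete Fourier transform (O(N²), no windowing or smoothing) in place of the FFT that AnalyserNode performs internally; `magnitudeSpectrum`, `spectrogram`, and the 64-sample frame size are assumptions for illustration.

```javascript
// Magnitude spectrum of one frame via a naive DFT (for clarity, not speed).
function magnitudeSpectrum(frame) {
  const n = frame.length;
  const mags = new Float64Array(n / 2);          // bins 0 .. n/2 - 1
  for (let k = 0; k < n / 2; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(angle);
      im += frame[t] * Math.sin(angle);
    }
    mags[k] = Math.sqrt(re * re + im * im) / n;  // normalized magnitude
  }
  return mags;
}

// Split the signal into consecutive frames: one spectrum (column) per time step.
function spectrogram(signal, frameSize) {
  const columns = [];
  for (let start = 0; start + frameSize <= signal.length; start += frameSize) {
    columns.push(magnitudeSpectrum(signal.subarray(start, start + frameSize)));
  }
  return columns;
}

// Test tone that lands exactly on bin 4 of a 64-sample frame.
const N = 64;
const tone = new Float64Array(4 * N);
for (let t = 0; t < tone.length; t++) tone[t] = Math.sin((2 * Math.PI * 4 * t) / N);
const cols = spectrogram(tone, N); // 4 columns, each peaking at bin 4
```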

Types of Audio Nodes

• Source
• <audio> element
• Buffer Source (use with XHR)
• Oscillator
• Analyser Node
• Panner
• Doppler shift (cf. voice changer)? http://chromium.googlecode.com/svn/trunk/samples/audio/doppler.html
• Script Processor/AudioWorker (e.g. add your own higher-resolution FFT)
• Compressor (e.g. avoid ‘clipping’)
• Convolution (e.g. add the impulse response of a large cathedral)
• Delay
• …

Developer Tools Console “Hints”: Explore Latest Syntax, Methods & Params


e.g. Firefox

A Fluid Specification
• http://webaudio.github.io/web-audio-api for the latest
• Updated frequently: W3C Editor's Draft 14 October 2014
– August 29th + …
– September 29th + …
– October 5th, 8th, 14th
• Boris Smus web book with syntax changes
– http://chimera.labs.oreilly.com/books/1234000001552
• Script Processor Node is deprecated; use createAudioWorker
• “AudioProcessingEvent” (deprecated) is dispatched to ScriptProcessorNode. When the ScriptProcessorNode is replaced by AudioWorker, we’ll use AudioProcessEvent.

Presenter

Presentation Notes

http://webaudio.github.io/web-audio-api/ for the latest spec. Audio workers provide the ability for direct scripted audio processing to be done inside a web worker context, and are defined by a couple of interfaces (new as of 29th August 2014). These are not implemented in any browsers yet. When implemented, they will replace ScriptProcessorNode.
https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API#Defining_audio_effects_filters
http://chimera.labs.oreilly.com/books/1234000001552

Boris Smus Book, Deprecations (http://chimera.labs.oreilly.com/books/1234000001552/apa.html)

• AudioBufferSourceNode.noteOn() has been changed to start().

• AudioBufferSourceNode.noteGrainOn() has been changed to start().

• AudioBufferSourceNode.noteOff() has been changed to stop().

• AudioContext.createGainNode() has been changed to createGain().

• AudioContext.createDelayNode() has been changed to createDelay().

• AudioContext.createJavaScriptNode() has been changed to createScriptProcessor() (changing to Audio Workers).

• OscillatorNode.noteOn() has been changed to start().

• OscillatorNode.noteOff() has been changed to stop().
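Code written against the old names breaks on current engines, and vice versa. A hedged compatibility sketch: prefer the new method names from the list and fall back to the old ones; `startNode`, `stopNode`, and `createGainCompat` are hypothetical helpers, not part of any library.

```javascript
// Prefer the current method names and fall back to the deprecated ones
// if that is all an older engine provides.
function startNode(node, when = 0) {
  if (typeof node.start === "function") node.start(when);
  else if (typeof node.noteOn === "function") node.noteOn(when);
  else throw new Error("node has neither start() nor noteOn()");
}

function stopNode(node, when = 0) {
  if (typeof node.stop === "function") node.stop(when);
  else if (typeof node.noteOff === "function") node.noteOff(when);
  else throw new Error("node has neither stop() nor noteOff()");
}

// createGainNode() was renamed createGain().
function createGainCompat(ctx) {
  return typeof ctx.createGain === "function" ? ctx.createGain() : ctx.createGainNode();
}
```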


Firefox Web Audio Editor

https://developer.mozilla.org/en-US/docs/Tools/Web_Audio_Editor

Activate Web Audio Editor


Firefox Web Audio Editor (cont.)

• Press F12 or Ctrl-Shift-K to show Developer Tools
• Select the “Web Audio” tab
• Edit AudioParams
• Update the audio graph (and sound!) in real time

[Screenshot: Oscillator Node AudioParams]

Presenter

Presentation Notes

The Web Audio Editor examines an audio context constructed in the page and provides a visualization of its graph. This gives you a high-level view of its operation, and enables you to ensure that all the nodes are connected in the way you expect. You can then examine and edit the AudioParam properties for each node in the graph. Some non-AudioParam properties, like an OscillatorNode's type property, are displayed, and you can edit these as well.

Demos
• http://borismus.github.io/spectrogram Realtime, “getUserMedia”
• http://webaudioapi.com Boris Smus
• https://webaudiodemos.appspot.com Chris Wilson
• https://webaudiodemos.appspot.com/Vocoder
• https://webaudiodemos.appspot.com/slides/mediademo
• http://chromium.googlecode.com/svn/trunk/samples/audio/doppler.html
• http://chromium.googlecode.com/svn/trunk/samples/audio/ (shows you files; you can view sources)
• http://labs.dinahmoe.com/ToneCraft
• Localhost demos: C:\Users\rockower\Dropbox\Audio\MBIT-WebAudioTalk\demos\startPythonServer.bat

@echo off
rem start Python3 Web Server in demos folder
call python -m http.server 80


Web Audio Playground
• http://webaudioplayground.appspot.com/: interactive creation of an audio graph
• getUserMedia requests permission to access the microphone


source.connect(B); B.connect(C); … C.connect(audioContext.destination);
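The same pattern generalizes to any number of nodes. A tiny hypothetical helper, `connectChain`, connects each node to the next, relying only on each node having a `connect(destination)` method.

```javascript
// Connect any number of nodes in series: nodes[0] -> nodes[1] -> ... -> last.
function connectChain(...nodes) {
  for (let i = 0; i < nodes.length - 1; i++) {
    nodes[i].connect(nodes[i + 1]);
  }
  return nodes[nodes.length - 1]; // the final node (usually the destination)
}

// Browser usage (B and C stand for any effect nodes):
//   connectChain(source, B, C, audioContext.destination);
```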

Impulse Response, Convolution, Spatialization, …

• * http://www.openairlib.net
• http://www.openairlib.net/auralizationdb/content/r1-nuclear-reactor-hall
– Upload a sound (.wav, under 5 MB) to hear it in that space
– Or download the “impulse response” to convolve with your sound
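Convolving a sound with an impulse response means replacing every input sample with a scaled, delayed copy of the response and summing. A direct-form sketch for illustration only; `convolve` and the toy three-sample “room” are assumptions, and browsers typically implement ConvolverNode with much faster FFT-based methods.

```javascript
// Direct linear convolution: y[n] = sum over k of x[k] * h[n - k].
// Output length is x.length + h.length - 1 (the "tail" of the reverb).
function convolve(x, h) {
  const y = new Float64Array(x.length + h.length - 1);
  for (let n = 0; n < x.length; n++) {
    for (let k = 0; k < h.length; k++) {
      y[n + k] += x[n] * h[k]; // each input sample triggers a scaled copy of h
    }
  }
  return y;
}

// A toy impulse response: direct sound plus one quieter echo two samples later.
const dry = [1, 2, 3];
const ir = [1, 0, 0.5];
const wet = convolve(dry, ir); // 1, 2, 3.5, 1, 1.5
```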


Boris Smus says (in his O’Reilly book):

– Room Effects: ‘The convolver node “smushes” the input sound and its impulse response* by computing a convolution, a mathematically intensive function. The result is something that sounds as if it was produced in the room where the impulse response was recorded.’

– Spatialized Sounds: the Web Audio API comes with built-in positional audio features

– Position and orientation of sources and listeners

– Parameters associated with the source audio cones

– Relative velocities of sources and listeners (Doppler Shift)
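The pitch change from relative motion follows the textbook Doppler formula for sound in still air. This sketch assumes a speed of sound of 343 m/s (air at roughly 20 °C) and is not necessarily how a browser's panner computes the shift; `dopplerShift` is a hypothetical helper.

```javascript
// Textbook Doppler formula for sound in still air:
//   f' = f * (c + vListener) / (c - vSource)
// vListener > 0 when the listener moves toward the source;
// vSource > 0 when the source moves toward the listener.
function dopplerShift(freqHz, vSource, vListener = 0, speedOfSound = 343) {
  return (freqHz * (speedOfSound + vListener)) / (speedOfSound - vSource);
}

const approaching = dopplerShift(440, 34.3); // source approaching at 34.3 m/s (~10% of c)
const receding = dopplerShift(440, -34.3);   // same speed, moving away
// approaching is about 488.9 Hz (pitch rises); receding is about 400 Hz (pitch falls)
```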


References/links

• http://webaudio.github.io/web-audio-api latest specification
• http://webaudioapi.com/ Boris Smus site
• http://chimera.labs.oreilly.com/books/1234000001552 “Web Audio API” book online
• https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays
• https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer
• http://www.html5rocks.com/en/tutorials/webaudio/intro/ (Smus)
• https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Using_XMLHttpRequest
• http://webaudiodemos.appspot.com/ Chris Wilson
• http://webaudioplayground.appspot.com create an ‘audio graph’, including analyser, gain, filter, delay
• http://www.html5rocks.com/en/tutorials/file/xhr2/ Bidelman tutorial
• Book: “JavaScript Creativity”, Shane Hudson, Apress, chapter 3, etc.
• https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API/Using_Web_Audio_API

Caveat: many audio websites have outdated, i.e. non-working, syntax for AudioContext and/or audio nodes; some are “vendor-prefixed”, e.g. webkitAudioContext (as well as for requestAnimationFrame).


Backup Slides


To make it as an audio engineer, you MUST know:

• Digital audio
• The ins and outs of signal flow and patch bays
• How analog consoles work
• In-depth study of analog consoles
• Audio processing
• Available audio plugins and how they work
• Signal processing and compressors
• How to perform a professional mix-down
• How various studios are designed and how their monitors work
• Electronic music and beat matching
• Sync and automation
• Recording and mixing ins and outs
• Surround mixing

http://www.recordingconnection.com/courses/audio-engineering

What is a “biquad” filter?

• A digital biquad filter is a second-order recursive linear filter,
• containing two poles and two zeros.
• “Biquad” is an abbreviation of “biquadratic”, i.e. in the Z domain, its transfer function is the ratio of two quadratic functions.
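A biquad is a difference equation with three feed-forward and two feedback coefficients, y[n] = b0·x[n] + b1·x[n−1] + b2·x[n−2] − a1·y[n−1] − a2·y[n−2], i.e. H(z) = (b0 + b1·z⁻¹ + b2·z⁻²) / (1 + a1·z⁻¹ + a2·z⁻²). The sketch below computes low-pass coefficients with the formulas from Robert Bristow-Johnson's Audio EQ Cookbook (linear Q) and runs the filter in Direct Form I. The helper names are hypothetical, and BiquadFilterNode's own parameter conventions (for example how it interprets Q for a low-pass filter) may differ, so consult the specification before expecting identical output.

```javascript
// Low-pass biquad coefficients (Audio EQ Cookbook), normalized so a0 = 1.
function lowpassCoefficients(f0, Q, fs) {
  const w0 = (2 * Math.PI * f0) / fs;
  const cosw0 = Math.cos(w0);
  const alpha = Math.sin(w0) / (2 * Q);
  const a0 = 1 + alpha;
  return {
    b0: (1 - cosw0) / 2 / a0,
    b1: (1 - cosw0) / a0,
    b2: (1 - cosw0) / 2 / a0,
    a1: (-2 * cosw0) / a0,
    a2: (1 - alpha) / a0,
  };
}

// Direct Form I: keep the two previous inputs and two previous outputs.
function biquad(input, c) {
  const out = new Float32Array(input.length);
  let x1 = 0, x2 = 0, y1 = 0, y2 = 0;
  for (let n = 0; n < input.length; n++) {
    const x0 = input[n];
    const y0 = c.b0 * x0 + c.b1 * x1 + c.b2 * x2 - c.a1 * y1 - c.a2 * y2;
    out[n] = y0;
    x2 = x1; x1 = x0;   // shift input history
    y2 = y1; y1 = y0;   // shift output history (the recursive part)
  }
  return out;
}

// A 1 kHz low-pass at 44.1 kHz with Q = 1/sqrt(2) (a Butterworth response).
const coeffs = lowpassCoefficients(1000, Math.SQRT1_2, 44100);
```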


http://en.wikipedia.org/wiki/Recursive_filter

http://en.wikipedia.org/wiki/Linear_filter

http://en.wikipedia.org/wiki/Pole%E2%80%93zero_plot


http://en.wikipedia.org/wiki/Z-transform

http://en.wikipedia.org/wiki/Transfer_function

http://en.wikipedia.org/wiki/Quadratic_function

Uint8Array(k) has k samples, where each ‘sample’ is a quantized measurement or computed value with 8 bits per value.

• The analog signal is sampled every Ts seconds.
• Ts is referred to as the sampling interval.
• fs = 1/Ts is called the sampling rate or sampling frequency.

Presenter

Presentation Notes

2^4 = 16

Abstract of Presentation: Audio for websites has a very checkered past. Finally, however, we can forget about using media tags like “embed” and “object”, browser plugins like Flash, and the annoying “bgsound” of IE. The HTML5 <audio> tag is a big step forward… But the “Web Audio API”, modeled on a graph of “audio nodes” providing filters, gains, spectral analysis, and spatially-located sound sources, is more of a giant leap forward for sounds in games and online music synthesis. That, along with “getUserMedia” to capture real-time camera and microphone input, is arriving “as we speak”. Plan on lots of eye- (and ear-) candy to whet your appetite, with a modest taste of geeky code and advances in JavaScript Arrays and XHR2.

General audio graph definition

• General containers and definitions that shape audio graphs in Web Audio API usage.

• AudioContext: represents an audio-processing graph built from audio modules linked together, each represented by an AudioNode. An audio context controls the creation of the nodes it contains and the execution of the audio processing, or decoding. You need to create an AudioContext before you do anything else, as everything happens inside a context.

• AudioNode: an interface representing an audio-processing module, such as an audio source (e.g. an HTML <audio> or <video> element), an audio destination, or an intermediate processing module (e.g. a filter like BiquadFilterNode, or volume control like GainNode).

• AudioParam: an interface representing an audio-related parameter, such as a parameter of an AudioNode. It can be set to a specific value or a change in value, and the change can be scheduled to happen at a specific time and follow a specific pattern.
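Scheduled changes follow simple formulas. This sketch computes the value of a parameter during a linear or exponential ramp between (t0, v0) and (t1, v1), mirroring what linearRampToValueAtTime() and exponentialRampToValueAtTime() are specified to produce; the helper names are hypothetical and only the browser comment uses the real API.

```javascript
// Linear ramp: v(t) = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
function linearRampValue(t, t0, v0, t1, v1) {
  if (t <= t0) return v0;
  if (t >= t1) return v1;
  return v0 + (v1 - v0) * ((t - t0) / (t1 - t0));
}

// Exponential ramp: v(t) = v0 * (v1 / v0) ^ ((t - t0) / (t1 - t0))
// v0 and v1 must be non-zero and of the same sign.
function exponentialRampValue(t, t0, v0, t1, v1) {
  if (t <= t0) return v0;
  if (t >= t1) return v1;
  return v0 * Math.pow(v1 / v0, (t - t0) / (t1 - t0));
}

// Browser equivalent (gainNode is any GainNode, ctx its AudioContext):
//   gainNode.gain.setValueAtTime(0.01, ctx.currentTime);
//   gainNode.gain.exponentialRampToValueAtTime(1, ctx.currentTime + 2);
```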

• ended (event): fired when playback has stopped because the end of the media was reached.

Interfaces defining audio sources

• OscillatorNode: represents a periodic waveform, such as a sine wave. It is an AudioNode audio-processing module that causes a waveform of a given frequency to be created.

• AudioBuffer: represents a short audio asset residing in memory, created from an audio file using the AudioContext.decodeAudioData() method, or created with raw data using AudioContext.createBuffer(). Once decoded into this form, the audio can then be put into an AudioBufferSourceNode.

• AudioBufferSourceNode: represents an audio source consisting of in-memory audio data, stored in an AudioBuffer. It is an AudioNode that acts as an audio source.

• MediaElementAudioSourceNode: represents an audio source consisting of an HTML5 <audio> or <video> element. It is an AudioNode that acts as an audio source.

• MediaStreamAudioSourceNode: represents an audio source consisting of a WebRTC MediaStream (such as a webcam or microphone). It is an AudioNode that acts as an audio source.
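To show what a source produces, this sketch generates the basic oscillator shapes one sample at a time. It is deliberately naive: the real OscillatorNode produces band-limited waveforms to avoid aliasing, and its phase conventions may differ. `oscillatorSamples` is a hypothetical helper; the browser comment shows how such raw data could fill an AudioBuffer.

```javascript
// Naive (not band-limited) sine, square, sawtooth, and triangle waves.
function oscillatorSamples(type, freqHz, sampleRate, length) {
  const out = new Float32Array(length);
  for (let n = 0; n < length; n++) {
    const phase = ((freqHz * n) / sampleRate) % 1; // position in the cycle, 0..1
    switch (type) {
      case "sine":     out[n] = Math.sin(2 * Math.PI * phase); break;
      case "square":   out[n] = phase < 0.5 ? 1 : -1; break;
      case "sawtooth": out[n] = 2 * phase - 1; break;              // ramps -1 -> 1
      case "triangle": out[n] = 1 - 4 * Math.abs(phase - 0.5); break; // -1 -> 1 -> -1
      default: throw new Error("unknown type: " + type);
    }
  }
  return out;
}

// Browser: copy raw samples into an AudioBuffer for an AudioBufferSourceNode:
//   const buf = audioCtx.createBuffer(1, samples.length, audioCtx.sampleRate);
//   buf.getChannelData(0).set(samples);
```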

Define effects you want to apply to audio sources.

• BiquadFilterNode: represents a simple low-order filter; it can represent different kinds of filters, tone-control devices, or graphic equalizers.

• ConvolverNode: performs a linear convolution on a given AudioBuffer, often used to achieve a reverb effect.

• DelayNode: causes a delay between the arrival of input data and its propagation to the output.

• DynamicsCompressorNode: a compression effect; lowers the volume of the loudest parts of the signal to help prevent clipping and distortion when multiple sounds are played and multiplexed together.

• GainNode: represents a change in volume, causes a given gain to be applied to the input signal

• WaveShaperNode: represents a non-linear distorter, uses a curve to apply a waveshaping distortion, often used to add a warm feeling

• PeriodicWave: defines a periodic waveform that can be used to shape the output of an OscillatorNode.
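A WaveShaperNode maps each input sample through a curve array. This sketch builds a tanh soft-clipping curve and looks samples up with linear interpolation, assuming inputs in [−1, 1] are spread evenly across the curve's entries as the specification describes; `makeSoftClipCurve` and `shapeSample` are hypothetical helpers and the drive value is arbitrary.

```javascript
// Soft-clipping curve: entry i corresponds to input x = -1 .. 1.
function makeSoftClipCurve(numSamples, drive) {
  const curve = new Float32Array(numSamples);
  for (let i = 0; i < numSamples; i++) {
    const x = (i * 2) / (numSamples - 1) - 1;
    curve[i] = Math.tanh(drive * x);
  }
  return curve;
}

// Look one input sample up in the curve, interpolating between entries
// and clamping inputs outside [-1, 1] to the curve's ends.
function shapeSample(x, curve) {
  const n = curve.length;
  const v = ((n - 1) / 2) * (x + 1); // fractional curve index
  const k = Math.floor(v);
  if (k < 0) return curve[0];
  if (k >= n - 1) return curve[n - 1];
  const f = v - k;
  return (1 - f) * curve[k] + f * curve[k + 1];
}

// Browser usage (shaper is a WaveShaperNode):
//   shaper.curve = makeSoftClipCurve(1025, 3);
```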


Audio Analysis, Spatialization & Destinations
• AnalyserNode: represents a node able to provide real-time frequency and time-domain analysis, for data analysis and visualization.
• Audio spatialization: adds panning effects to your audio sources.

– AudioListener: represents the position and orientation of the unique person listening to the audio scene.

– PannerNode: represents the behavior of a signal in space, describing its position with right-hand Cartesian coordinates, its movement using a velocity vector, and its directionality using a directionality cone.

• AudioDestinationNode: represents the end destination of an audio source in a given context, usually the speakers of your device.

• MediaStreamAudioDestinationNode: represents an audio destination consisting of a WebRTC MediaStream with a single AudioMediaStreamTrack

– can be used in a similar way to a MediaStream obtained from Navigator.getUserMedia; acts as an audio destination.
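Part of what a PannerNode does is attenuate a source with distance. A sketch of the "inverse" (default) and "exponential" distance models, assuming the default refDistance and rolloffFactor of 1; in a browser these correspond to the panner's distanceModel, refDistance, and rolloffFactor properties. `distanceGain` and `distance3d` are hypothetical helpers.

```javascript
// Distance attenuation for a PannerNode-style source.
function distanceGain(distance, model = "inverse", refDistance = 1, rolloffFactor = 1) {
  const d = Math.max(distance, refDistance); // no boost closer than refDistance
  if (model === "inverse") {
    return refDistance / (refDistance + rolloffFactor * (d - refDistance));
  }
  if (model === "exponential") {
    return Math.pow(d / refDistance, -rolloffFactor);
  }
  throw new Error("unsupported model: " + model);
}

// Euclidean distance between listener and source positions.
function distance3d(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

const listener = { x: 0, y: 0, z: 0 };
const source = { x: 3, y: 4, z: 0 };                     // 5 units away
const gain = distanceGain(distance3d(listener, source)); // 1 / 5 = 0.2
```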


Firefox Web Audio Editor: AudioParams to adjust