Audio for websites has a very checkered past. Finally, however, we can forget about using media tags like “embed” & “object”, and browser plugins like Flash, along with the annoying “bgsound” of IE. The HTML5 <audio> tag is a big step forward…. But the “Web Audio API”, modeled on a graph of “audio nodes” providing filters, gains, spectral analysis, and spatially-located sound sources, is more of a giant leap forward for sounds in games and online music synthesis. That, along with “getUserMedia” to capture real-time camera and microphone input, is arriving “as we speak”. Plan on lots of eye- (and ear-) candy to whet your appetite, with a modest taste of geeky code and advances in JavaScript Arrays and XHR2.
A (Mis-) Guided Tour of the “Web Audio API”
Edward B. Rockower, Ph.D.
Presented 10/15/14
Monterey Bay Information Technologists (MBIT) Meetup
Abstract
• Audio for websites has a very checkered past.
• The HTML5 <audio> tag is a big step forward.
• The “Web Audio API” is more of a giant leap:
  – modeled on a modular graph of “audio nodes”
  – provides filters, gains, convolvers, spectral analysis, and spatially-located sound sources
  – very important for sounds in games, online music synthesis, speech recognition, and analyses
• JavaScript Arrays and XHR2 (AJAX)
• “getUserMedia” to capture real-time camera and microphone input
• Arriving “as we speak” (check availability: www.CanIUse.com)
Organizing Principles (Evolutionary Revolutionary)
[Diagram. Headings: Enabling Technologies; Music, Audio Engineering; Web Audio API. Labels: Events, asynchronous/callbacks; Web Workers; Theremin (artificial); transistor; printed circuit; FFT; A/D; PCM; DSP; Moog audio synthesizer; Internet; browser wars; human/computer interaction; online & games; natural (birds, voices); computer generated; demos.]
What’s New in AJAX, HTML5, & JavaScript
• New XHR2 (arraybuffer, typed arrays, CORS)
• Asynchronous (callbacks, non-blocking, events)
• Audio Threads (Audio Worker)
• getUserMedia (HTML5, WebRTC)
• requestAnimationFrame (60 fps, HTML5)
• <audio> (HTML5 mediaElement)
• Web Audio API (optimized ‘native’ C code, modular audio graph, Fast Fourier Transform)
• Vendor-prefixed syntax (webkitAudioContext)
• Firefox v. 32 Dev Tools displays/edits the Web Audio graph
Both Sound-Amplitude & Time are Quantized
PCM Digitization: Analog to Digital (A/D)
• 4 bits: 2^4 = 16 different values
  – Quantization of values
  – Encode as binary numbers
• Ts = sampling interval
• 1/Ts = sampling frequency
• 44.1 kHz is used in compact discs
  – Nyquist frequency = 44.1 kHz / 2 = 22.05 kHz, roughly the upper limit of human hearing
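The quantization and Nyquist arithmetic above can be sketched in a few lines. This is only an illustration: the function names quantize and nyquist are made up for this example, and samples are assumed to lie in the range [-1, 1].

```javascript
// Hypothetical helper: quantize a sample in [-1, 1] to one of 2^bits levels
// and return the level index (0 .. 2^bits - 1).
function quantize(sample, bits) {
  const levels = Math.pow(2, bits);                    // 4 bits -> 16 levels
  const clamped = Math.max(-1, Math.min(1, sample));   // keep the input in range
  // Map [-1, 1] onto [0, levels - 1] and round to the nearest level.
  return Math.round(((clamped + 1) / 2) * (levels - 1));
}

// Hypothetical helper: the Nyquist frequency is half the sampling frequency.
function nyquist(sampleRate) {
  return sampleRate / 2;
}

const cdSampleRate = 44100;                  // CD sampling frequency, in Hz
const samplingInterval = 1 / cdSampleRate;   // Ts, in seconds

console.log(quantize(-1, 4), quantize(1, 4)); // lowest and highest of the 16 levels
console.log(quantize(0.3, 4).toString(2));    // a level encoded as a binary number
console.log(samplingInterval);                // Ts for CD audio
console.log(nyquist(cdSampleRate));           // 22050
```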
Buffers and views: “typed array” architecture
JavaScript typed arrays split the implementation into buffers and views.
• A “buffer” (ArrayBuffer object) represents a chunk of data
  – no format to speak of
  – no mechanism for accessing its contents
• To access the data you need a “view”, which provides context: data type, starting offset, and number of elements
• Not your standard Arrays, but fast!
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays
16 bytes = 8 * 16 bits = 128 bits
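A short sketch of the buffer/view split, using one 16-byte buffer as in the figure. The variable names are illustrative only.

```javascript
// One 16-byte buffer: raw memory with no format of its own.
const buffer = new ArrayBuffer(16);

// Three views onto the same bytes, each supplying its own context.
const bytes = new Uint8Array(buffer);         // data type: 8-bit, 16 elements
const shorts = new Uint16Array(buffer);       // data type: 16-bit, 8 elements
const tail = new Uint8Array(buffer, 12, 4);   // starting offset 12, 4 elements

// Writing through one view is visible through the others,
// because they all share the same underlying buffer.
bytes[12] = 255;

console.log(buffer.byteLength, bytes.length, shorts.length); // 16 16 8
console.log(tail[0]);                                        // 255
```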
Buffers, Arrays, XHR, …
• XMLHttpRequest (XHR): request.responseType = 'arraybuffer';
• audioContext.decodeAudioData(request.response, function (anotherBuffer) { … });
• // Create the array for the data values
  frequencyArray = new Uint8Array(analyserNode.frequencyBinCount);
• analyserNode.getByteFrequencyData(frequencyArray); runs a Fast Fourier Transform (FFT), i.e. computes the spectrum, and populates frequencyArray
• requestAnimationFrame plots the data at each “frame” redraw (60 fps); more efficient than setTimeout() or setInterval()
(Here 8 bits is the quantization in the value of each measurement/sample ‘frame’, whereas the inverse of the sampling rate, e.g. 1/22,050 s ≈ 45 µs, is the quantization in time.)
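To make “spectrum” concrete without a browser, here is a minimal sketch that computes a magnitude spectrum with a naive discrete Fourier transform (deliberately not a fast one) and finds the strongest bin. The function names are hypothetical, and unlike getByteFrequencyData the result is a linear magnitude, not a byte scaled from decibels.

```javascript
// Naive O(N^2) DFT of a real signal. Returns N/2 magnitudes, one per
// frequency bin (the same count as frequencyBinCount = fftSize / 2).
function magnitudeSpectrum(samples) {
  const n = samples.length;
  const bins = new Float32Array(Math.floor(n / 2));
  for (let k = 0; k < bins.length; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += samples[t] * Math.cos(angle);
      im -= samples[t] * Math.sin(angle);
    }
    bins[k] = Math.sqrt(re * re + im * im) / n;
  }
  return bins;
}

// Index of the largest magnitude.
function peakBin(bins) {
  let best = 0;
  for (let k = 1; k < bins.length; k++) {
    if (bins[k] > bins[best]) best = k;
  }
  return best;
}

// A 1000 Hz test tone sampled at 8000 Hz, 64 samples long (assumed values).
const toneSampleRate = 8000;
const fftSize = 64;
const tone = new Float32Array(fftSize);
for (let i = 0; i < fftSize; i++) {
  tone[i] = Math.sin((2 * Math.PI * 1000 * i) / toneSampleRate);
}

const spectrum = magnitudeSpectrum(tone);
const k = peakBin(spectrum);
// Bin k corresponds to the frequency k * sampleRate / fftSize.
console.log(k, (k * toneSampleRate) / fftSize);
```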
Leon Theremin
http://mdn.github.io/violent-theremin/
http://youtu.be/w5qf9O6c20o
Moog Synthesizer Audio Graphs
Bourne Identity: Sound Engineers (explaining how car sounds are modified to be more exciting)
Audio Graph Setup: Typical Workflow
1. Create an audio context
2. Inside the context, create sources, e.g. <audio>, oscillator, stream
3. Create effects nodes, e.g. reverb, biquad filter, panner, compressor
4. Choose the final destination of the audio, for example your system speakers
5. Connect the sources to the effects, and the effects to the destination.
developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API
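The five steps can be sketched as one small function. This is a minimal sketch, not a definitive setup: buildToneGraph is a hypothetical helper, the 440 Hz frequency and 0.2 volume are arbitrary, and the playback part only runs where the browser provides AudioContext.

```javascript
// Build a minimal graph: oscillator -> gain -> destination.
// `ctx` is expected to be an AudioContext (or any object with the same methods).
function buildToneGraph(ctx, frequency, volume) {
  const oscillator = ctx.createOscillator();   // step 2: a source
  oscillator.frequency.value = frequency;

  const gain = ctx.createGain();               // step 3: an effect node
  gain.gain.value = volume;

  oscillator.connect(gain);                    // step 5: source -> effect
  gain.connect(ctx.destination);               // steps 4 and 5: effect -> speakers

  return { oscillator: oscillator, gain: gain };
}

// Browser-only part: skipped wherever the Web Audio API is unavailable.
if (typeof AudioContext !== 'undefined') {
  const audioContext = new AudioContext();     // step 1
  const graph = buildToneGraph(audioContext, 440, 0.2);
  graph.oscillator.start();
  graph.oscillator.stop(audioContext.currentTime + 1); // play for about one second
}
```

Current browsers may keep a new AudioContext suspended until the user interacts with the page, so in practice the browser-only part would usually be called from a click handler.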
Polyfills – Vendor-prefixed (webkit, moz, ms) (e.g. using a self-executing function)
(function() {
  // Polyfill for AudioContext
  window.AudioContext = window.AudioContext ||
                        window.webkitAudioContext ||
                        window.mozAudioContext;

  // Polyfill for requestAnimationFrame (replaces setTimeout)
  var requestAnimationFrame = window.requestAnimationFrame ||
                              window.mozRequestAnimationFrame ||
                              window.webkitRequestAnimationFrame ||
                              window.msRequestAnimationFrame;
  window.requestAnimationFrame = requestAnimationFrame;
})();
Audio Sources
• <audio>
• new Audio('sounds/mySound.mp3');
• XHR (AJAX)
• oscillatorNode(s)
• “getUserMedia()” (live, USB microphone)
• Procedurally generated (Script Processor)
Draw the AudioBuffer (no audiograph)
var audioContext = new AudioContext();

function initAudio() {
  var audioRequest = new XMLHttpRequest();
  audioRequest.open("GET", "sounds/myAudio.ogg", true);
  audioRequest.responseType = "arraybuffer";
  audioRequest.onload = function() {
    audioContext.decodeAudioData(audioRequest.response, function(buffer) {
      var canvas = document.getElementById("view1");
      drawBuffer(canvas.width, canvas.height, canvas.getContext('2d'), buffer);
    });
  };
  audioRequest.send();
}

function drawBuffer(width, height, context, buffer) {
  var data = buffer.getChannelData(0);
  var step = Math.ceil(data.length / width);
  var amp = height / 2;
  for (var i = 0; i < width; i++) {
    var min = 1.0;
    var max = -1.0;
    for (var j = 0; j < step; j++) {
      var datum = data[(i * step) + j];
      if (datum < min) min = datum;
      if (datum > max) max = datum;
    }
    context.fillRect(i, (1 + min) * amp, 1, Math.max(1, (max - min) * amp));
  }
}
Draws a Web Audio AudioBuffer to a canvas
https://github.com/cwilso/Audio-Buffer-Draw/commits/master
Plot Audio Spectrum
var audioEl = document.querySelector('audio');               // <audio>
var audioCtx = new AudioContext();
var canvasEl = document.querySelector('canvas');             // <canvas>
var canvasCtx = canvasEl.getContext('2d');
var mySource = audioCtx.createMediaElementSource(audioEl);   // create source
var myAnalyser = audioCtx.createAnalyser();                  // create analyser
mySource.connect(myAnalyser);                                // connect audio nodes
myAnalyser.connect(audioCtx.destination);                    // connect to speakers

function processIt() {
  var freqData = new Uint8Array(myAnalyser.frequencyBinCount);
  myAnalyser.getByteFrequencyData(freqData);                 // place spectrum in freqData
  requestAnimationFrame(function() {
    canvasCtx.clearRect(0, 0, canvasEl.width, canvasEl.height);
    canvasCtx.fillStyle = "#ff0000";
    for (var i = 0; i < freqData.length; i++) {
      // plot frequency spectrum: one bar per bin, rising from the bottom edge
      canvasCtx.fillRect(i, canvasEl.height - freqData[i], 1, freqData[i]);
    } // end for
  }); // end requestAnimationFrame
} // end fcn processIt

setInterval(processIt, 1000/60);
Plot Audio Spectrogram*
var audioEl = document.querySelector('audio');     // <audio>
var audioCtx = new AudioContext();
var canvasEl = document.querySelector('canvas');   // <canvas>
var canvasCtx = canvasEl.getContext('2d');
var mySource = audioCtx.createMediaElementSource(audioEl);
var myAnalyser = audioCtx.createAnalyser();
myAnalyser.smoothingTimeConstant = 0;
var myScriptProcessor = audioCtx.createScriptProcessor(myAnalyser.frequencyBinCount, 1, 1);

mySource.connect(myAnalyser);
myAnalyser.connect(audioCtx.destination);          // speakers/headphone
myScriptProcessor.connect(audioCtx.destination);

var x = 0;
myScriptProcessor.onaudioprocess = function () {
  if (!audioEl.paused) {
    x += 1;
    var freqData = new Uint8Array(myAnalyser.frequencyBinCount);
    myAnalyser.getByteFrequencyData(freqData);
    requestAnimationFrame(function() {
      if (x > canvasEl.width) {
        canvasCtx.clearRect(0, 0, canvasEl.width, canvasEl.height);
        x = 0;
      }
      for (var i = 0; i < freqData.length; i++) {
        canvasCtx.fillStyle = "hsl(" + freqData[i] + ",100%, 50%)";
        canvasCtx.fillRect(x, canvasEl.height - i, 1, 1);
      } // end for
    }); // end requestAnimationFrame
  } // end if
}; // end onaudioprocess
*Plot of the spectrum as a function of time
[Figure: spectrogram; axes: Time, Frequency]
Types of Audio Nodes
• Source
  – <audio> Element
  – Buffer Source (use with XHR)
  – Oscillator
• Analyser Node
• Panner
  – Doppler Shift (cf. voice changer)?
  – http://chromium.googlecode.com/svn/trunk/samples/audio/doppler.html
• Script Processor/AudioWorker (e.g. add your own higher-resolution FFT)
• Compressor (e.g. avoid ‘clipping’)
• Convolution (e.g. add the impulse response of a large cathedral)
• Delay
• …
Developer Tools Console “Hints”: Explore Latest Syntax, Methods & Params
e.g. Firefox
A Fluid Specification
• http://webaudio.github.io/web-audio-api for the latest version
• Updated frequently: W3C Editor's Draft 14 October 2014
  – August 29th + …
  – September 29th + …
  – October 5th, 8th, 14th
• Boris Smus web book with syntax changes
  – http://chimera.labs.oreilly.com/books/1234000001552
• Script Processor Node is deprecated; use createAudioWorker
• “AudioProcessingEvent” (deprecated) is dispatched to ScriptProcessorNode. When the ScriptProcessorNode is replaced by AudioWorker, we’ll use AudioProcessEvent.
Boris Smus Book, Deprecations (http://chimera.labs.oreilly.com/books/1234000001552/apa.html)
• AudioBufferSourceNode.noteOn() has been changed to start().
• AudioBufferSourceNode.noteGrainOn() has been changed to start().
• AudioBufferSourceNode.noteOff() has been changed to stop().
• AudioContext.createGainNode() has been changed to createGain().
• AudioContext.createDelayNode() has been changed to createDelay().
• AudioContext.createJavaScriptNode() has been changed to createScriptProcessor(). (changing to Audio Workers )
• OscillatorNode.noteOn() has been changed to start().
• OscillatorNode.noteOff() has been changed to stop().
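One way to cope with older code or older browsers is a small compatibility shim built from the renames listed above. The sketch below is an assumption-laden illustration: addCurrentNames and RENAMED_METHODS are hypothetical names, and the shim is shown on a plain object rather than on a real audio node.

```javascript
// Old method name -> current method name, taken from the list above.
const RENAMED_METHODS = {
  noteOn: 'start',
  noteGrainOn: 'start',
  noteOff: 'stop',
  createGainNode: 'createGain',
  createDelayNode: 'createDelay',
  createJavaScriptNode: 'createScriptProcessor'
};

// Hypothetical helper: give an object that only has the old method names
// the current names as well. Existing methods are never overwritten.
function addCurrentNames(target) {
  Object.keys(RENAMED_METHODS).forEach(function (oldName) {
    const newName = RENAMED_METHODS[oldName];
    if (typeof target[oldName] === 'function' &&
        typeof target[newName] !== 'function') {
      target[newName] = target[oldName];
    }
  });
  return target;
}

// Stand-in for a legacy source node that only knows noteOn/noteOff.
const legacySource = {
  noteOn: function (when) { return 'started at ' + when; },
  noteOff: function (when) { return 'stopped at ' + when; }
};

addCurrentNames(legacySource);
console.log(legacySource.start(0)); // "started at 0"
console.log(legacySource.stop(1));  // "stopped at 1"
```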
Firefox Web Audio Editor
https://developer.mozilla.org/en-US/docs/Tools/Web_Audio_Editor
Activate Web Audio Editor
Firefox Web Audio Editor (cont.)
• Press F12 or Ctrl-Shift-K to show Developer Tools
• Select the “Web Audio” tab
• Edit AudioParams
• Update the audio graph (and sound!) in real time
[Figure: Oscillator Node AudioParams]
Demos
• http://borismus.github.io/spectrogram Realtime, “getUserMedia”
• http://webaudioapi.com Boris Smus
• https://webaudiodemos.appspot.com Chris Wilson
• https://webaudiodemos.appspot.com/Vocoder
• https://webaudiodemos.appspot.com/slides/mediademo
• http://chromium.googlecode.com/svn/trunk/samples/audio/doppler.html
• http://chromium.googlecode.com/svn/trunk/samples/audio/ (shows you files, can view sources)
• http://labs.dinahmoe.com/ToneCraft
• Localhost Demos: C:\Users\rockower\Dropbox\Audio\MBIT-WebAudioTalk\demos\startPythonServer.bat
@echo off
rem start Python3 Web Server in demos folder
call python -m http.server 80
http://webaudioplayground.appspot.com/
• Web Audio Playground: interactive creation of an audio graph
• getUserMedia requests permission to access the microphone
source.connect(B); B.connect(C); …
…; C.connect(audioContext.destination);
Impulse Response, Convolution, Spatialization, …
• *http://www.openairlib.net
• http://www.openairlib.net/auralizationdb/content/r1-nuclear-reactor-hall
  – Upload a sound (.wav, < 5 MB) to hear it in that space
  – Or download the “impulse response” to convolve with your sound
Boris Smus says (in his O’Reilly book):
– Room Effects: ‘The convolver node “smushes” the input sound and its impulse response* by computing a convolution, a mathematically intensive function. The result is something that sounds as if it was produced in the room where the impulse response was recorded.’
– Spatialized Sounds: the Web Audio API comes with built-in positional audio features
– Position and orientation of sources and listeners
– Parameters associated with the source audio cones
– Relative velocities of sources and listeners (Doppler Shift)
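What “convolving a sound with an impulse response” means can be shown with a direct time-domain convolution. This is only a toy sketch: convolve is a hypothetical helper, the three-sample “room” is invented for illustration, and a real ConvolverNode would work on much longer responses (and is not assumed to use this direct method).

```javascript
// Direct linear convolution of a signal with an impulse response.
// The output has signal.length + impulse.length - 1 samples.
function convolve(signal, impulse) {
  const out = new Float32Array(signal.length + impulse.length - 1);
  for (let i = 0; i < signal.length; i++) {
    for (let j = 0; j < impulse.length; j++) {
      // Every input sample launches a scaled copy of the impulse response.
      out[i + j] += signal[i] * impulse[j];
    }
  }
  return out;
}

const click = new Float32Array([1, 0, 0, 0]);   // a single click (the "dry" sound)
const room = new Float32Array([1, 0.5, 0.25]);  // toy impulse response: decaying echoes
const wet = convolve(click, room);

// A lone click simply reproduces the impulse response.
console.log(Array.from(wet)); // [1, 0.5, 0.25, 0, 0, 0]
```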
References/links
• http://webaudio.github.io/web-audio-api latest specification
• http://webaudioapi.com/ Boris Smus site
• http://chimera.labs.oreilly.com/books/1234000001552 “Web Audio API” book online
• https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays
• https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer
• http://www.html5rocks.com/en/tutorials/webaudio/intro/ (Smus)
• https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Using_XMLHttpRequest
• http://webaudiodemos.appspot.com/ Chris Wilson
• http://webaudioplayground.appspot.com create ‘audio graph’, include analyser, gain, filter, delay
• http://www.html5rocks.com/en/tutorials/file/xhr2/ Bidelman tutorial
• Book “Javascript Creativity”, Shane Hudson, Apress, chapter 3, etc.
• https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API/Using_Web_Audio_API
Caveat: many audio websites have outdated, i.e. non-working, syntax for AudioContext and/or audio nodes; some are “vendor-prefixed”, e.g. webkitAudioContext (and likewise for requestAnimationFrame).
Backup Slides
To make it as an audio engineer, you MUST know:
• Digital audio
• The ins and outs of signal flow and patch bays
• How analog consoles work
• In-depth study of analog consoles
• Audio processing
• Available audio plugins and how they work
• Signal processing and compressors
• How to perform a professional mix-down
• How various studios are designed and how their monitors work
• Electronic music and beat matching
• Sync and automation
• Recording and mixing ins and outs
• Surround mixing
http://www.recordingconnection.com/courses/audio-engineering
What is a “biquad” filter?
• A digital biquad filter is a second-order recursive linear filter containing two poles and two zeros.
• “Biquad” is an abbreviation of “biquadratic”: in the Z domain, its transfer function is the ratio of two quadratic functions.
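Written out, the ratio of two quadratics is H(z) = (b0 + b1·z^-1 + b2·z^-2) / (1 + a1·z^-1 + a2·z^-2), which corresponds to the difference equation y[n] = b0·x[n] + b1·x[n-1] + b2·x[n-2] - a1·y[n-1] - a2·y[n-2]. The sketch below applies that equation directly; the function name biquad and the coefficient values are illustrative assumptions, not the coefficients any particular BiquadFilterNode type uses.

```javascript
// Apply a second-order recursive filter to an array of samples.
// `c` holds the coefficients b0, b1, b2 (zeros) and a1, a2 (poles).
function biquad(input, c) {
  const output = new Float32Array(input.length);
  let x1 = 0, x2 = 0;   // previous two inputs
  let y1 = 0, y2 = 0;   // previous two outputs (the "recursive" part)
  for (let n = 0; n < input.length; n++) {
    const x0 = input[n];
    const y0 = c.b0 * x0 + c.b1 * x1 + c.b2 * x2 - c.a1 * y1 - c.a2 * y2;
    output[n] = y0;
    x2 = x1; x1 = x0;
    y2 = y1; y1 = y0;
  }
  return output;
}

// Two-point average: no feedback, so it behaves like a simple smoother.
const averager = { b0: 0.5, b1: 0.5, b2: 0, a1: 0, a2: 0 };
console.log(Array.from(biquad(new Float32Array([1, 1, 1, 1]), averager))); // [0.5, 1, 1, 1]

// One feedback term: an impulse decays by half on every sample.
const decay = { b0: 1, b1: 0, b2: 0, a1: -0.5, a2: 0 };
console.log(Array.from(biquad(new Float32Array([1, 0, 0, 0]), decay)));    // [1, 0.5, 0.25, 0.125]
```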
Uint8Array(k) has k samples, where each ‘sample’ is a quantized measurement or computed value with 8 bits per value
• The analog signal is sampled every Ts seconds.
• Ts is referred to as the sampling interval.
• fs = 1/Ts is called the sampling rate or sampling frequency.
General audio graph definition
• General containers and definitions that shape audio graphs in Web Audio API usage.
• AudioContext: represents an audio-processing graph built from audio modules linked together, each represented by an AudioNode. An audio context controls the creation of the nodes it contains and the execution of the audio processing, or decoding. You need to create an AudioContext before you do anything else, as everything happens inside a context.
• AudioNode: interface represents an audio-processing module like an audio source (e.g. an HTML <audio> or <video> element), audio destination, intermediate processing module (e.g. a filter like BiquadFilterNode, or volume control like GainNode).
• AudioParam: interface represents an audio-related parameter, such as a parameter of an AudioNode. It can be set to a specific value, or a change in value can be scheduled to happen at a specific time and to follow a specific pattern.
• ended (event): fired when playback has stopped because the end of the media was reached.
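As an illustration of a parameter change “following a specific pattern”, the sketch below computes the value of a linear ramp between two scheduled points. On a real AudioParam the scheduling would be done with setValueAtTime and linearRampToValueAtTime; linearRampValue is a hypothetical helper that only mirrors the arithmetic.

```javascript
// Value at time t of a parameter that ramps linearly from v0 (at time t0)
// to v1 (at time t1). Before t0 it holds v0; after t1 it holds v1.
function linearRampValue(v0, t0, v1, t1, t) {
  if (t <= t0) return v0;
  if (t >= t1) return v1;
  return v0 + (v1 - v0) * ((t - t0) / (t1 - t0));
}

// Fade a gain from 0 to 1 over two seconds and sample it along the way.
for (let t = 0; t <= 2; t += 0.5) {
  console.log(t, linearRampValue(0, 0, 1, 2, t));
}
```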
Interfaces defining audio sources
• OscillatorNode: represents a periodic waveform, such as a sine wave. It is an AudioNode audio-processing module that generates the waveform at a given frequency.
• AudioBuffer: represents a short audio asset residing in memory, created from an audio file using the AudioContext.decodeAudioData() method, or created with raw data using AudioContext.createBuffer(). Once decoded into this form, the audio can then be put into an AudioBufferSourceNode.
• AudioBufferSourceNode: represents an audio source consisting of in-memory audio data, stored in an AudioBuffer. It is an AudioNode that acts as an audio source.
• MediaElementAudioSourceNode: represents an audio source consisting of an HTML5 <audio> or <video> element. It is an AudioNode that acts as an audio source.
• MediaStreamAudioSourceNode: represents an audio source consisting of a WebRTC MediaStream (such as a webcam or microphone). It is an AudioNode that acts as an audio source.
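A sketch of creating audio “with raw data”: the samples of a sine wave are computed by hand and, where a browser is available, copied into an AudioBuffer and played through an AudioBufferSourceNode. The helper sineSamples and the 441 Hz tone are assumptions made for this example.

```javascript
// Fill a Float32Array with a sine wave: sample[n] = sin(2 * PI * f * n / fs).
function sineSamples(frequency, sampleRate, length) {
  const data = new Float32Array(length);
  for (let n = 0; n < length; n++) {
    data[n] = Math.sin((2 * Math.PI * frequency * n) / sampleRate);
  }
  return data;
}

// One full cycle of a 441 Hz tone at 44.1 kHz is exactly 100 samples.
const oneCycle = sineSamples(441, 44100, 100);
console.log(oneCycle[0], oneCycle[25]); // start of the cycle and its quarter-cycle peak

// Browser-only part: skipped wherever the Web Audio API is unavailable.
if (typeof AudioContext !== 'undefined') {
  const ctx = new AudioContext();
  const oneSecond = sineSamples(441, ctx.sampleRate, ctx.sampleRate);
  const buffer = ctx.createBuffer(1, oneSecond.length, ctx.sampleRate); // 1 channel
  buffer.getChannelData(0).set(oneSecond);       // copy the raw samples in
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start();
}
```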
Define effects you want to apply to audio sources.
• BiquadFilterNode: represents a simple low-order filter; it can represent different kinds of filters, tone control devices, or graphic equalizers.
• ConvolverNode: performs a linear convolution on a given AudioBuffer, often used to achieve a reverb effect.
• DelayNode: causes a delay between the arrival of input data and its propagation to the output.
• DynamicsCompressorNode: a compression effect; lowers the volume of the loudest parts of the signal to help prevent clipping and distortion when multiple sounds are played and multiplexed together.
• GainNode: represents a change in volume; causes a given gain to be applied to the input signal.
• WaveShaperNode: represents a non-linear distorter; uses a curve to apply a waveshaping distortion, often used to add a warm feeling.
• PeriodicWave: defines a periodic waveform that can be used to shape the output of an OscillatorNode.
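To show what “uses a curve to apply a waveshaping distortion” can look like, here is a sketch that looks an input sample up in a curve with linear interpolation, which is how the shaping is understood to work. Both applyCurve and makeSoftClipCurve are hypothetical helpers, and the tanh curve is just one possible choice of shape.

```javascript
// Map an input sample x in [-1, 1] through a curve (an array of output
// values), interpolating linearly between neighbouring curve points.
function applyCurve(curve, x) {
  const last = curve.length - 1;
  const v = (last / 2) * (x + 1);      // position of x along the curve
  if (v <= 0) return curve[0];         // at or below -1: first point
  if (v >= last) return curve[last];   // at or above +1: last point
  const k = Math.floor(v);
  const f = v - k;
  return (1 - f) * curve[k] + f * curve[k + 1];
}

// Build a soft-clipping curve: loud inputs are squashed, quiet ones pass
// through almost linearly. `amount` controls how hard the squashing is.
function makeSoftClipCurve(size, amount) {
  const curve = new Float32Array(size);
  for (let i = 0; i < size; i++) {
    const x = (2 * i) / (size - 1) - 1;   // spread the points over [-1, 1]
    curve[i] = Math.tanh(amount * x);
  }
  return curve;
}

const identity = new Float32Array([-1, 0, 1]);
console.log(applyCurve(identity, 0.5));   // 0.5: an identity curve changes nothing

const softClip = makeSoftClipCurve(257, 3);
console.log(applyCurve(softClip, 0.1), applyCurve(softClip, 0.9)); // quiet vs. loud input
```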
Audio Analysis, Spatialization & Destinations
• AnalyserNode: represents a node able to provide real-time frequency and time-domain analysis, for data analysis and visualization.
• Audio spatialization: add panning effects to your audio sources.
  – AudioListener: represents the position and orientation of the unique person listening to the audio scene
  – PannerNode: represents the behavior of a signal in space, describing its position with right-hand Cartesian coordinates, its movement using a velocity vector and its directionality using a directionality cone.
• AudioDestinationNode: represents the end destination of an audio source in a given context, usually the speakers of your device.
• MediaStreamAudioDestinationNode: represents an audio destination consisting of a WebRTC MediaStream with a single AudioMediaStreamTrack
  – can be used in a similar way to a MediaStream obtained from Navigator.getUserMedia; acts as an audio destination.
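One ingredient of spatialization is how a source gets quieter with distance from the listener. The sketch below shows the arithmetic of an “inverse” distance model as it is understood here, using the PannerNode property names refDistance and rolloffFactor; inverseDistanceGain itself is a hypothetical helper, and the formula should be checked against the current specification before relying on it.

```javascript
// Gain applied to a source at the given distance from the listener.
// Inside refDistance the source plays at full volume; beyond it the
// gain falls off, faster for larger rolloffFactor values.
function inverseDistanceGain(distance, refDistance, rolloffFactor) {
  const d = Math.max(distance, refDistance);
  return refDistance / (refDistance + rolloffFactor * (d - refDistance));
}

// With refDistance = 1 and rolloffFactor = 1 the gain is simply 1 / distance.
[0.5, 1, 2, 4, 8].forEach(function (distance) {
  console.log(distance, inverseDistanceGain(distance, 1, 1));
});
```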
Firefox Web Audio Editor: AudioParams to adjust