MPEG-4 CS Division University of California at Berkeley www.cs.berkeley.edu/~johnw John Lazzaro John Wawrzynek June 18, 2001 Modified by Francois Thibault January 20, 2003 Further modified by Ichiro Fujinaga January 20, 2005

MPEG-4 Standard

Finalized its standardization process in 1999 (Vancouver)

Designed to integrate visual and audio

Includes "natural" (recorded) and "synthetic" (synthesized) coding of audio and video

MPEG-4 Scope

Provides a set of technologies to satisfy the needs of authors, network service providers, and end users.

Enables the production of content that has far greater reusability in digital television, animated graphics, and web pages.

MPEG-4 Features

MPEG-4 provides standardized ways to:

represent units of aural, visual, or audiovisual content, called “media objects”, which can be of natural origin (recorded with a camera or microphone) or synthetic origin (generated with a computer)

describe the composition of these objects to create compound media objects that form audiovisual scenes

multiplex and synchronize the data associated with media objects, so that they can be transported over networks providing a QoS (Quality of Service)

interact with the audiovisual scene generated at the receiver’s end

MPEG-4 Standard (audio)

[Diagram: MPEG-4 tree. Top level: audio, system, video. Audio splits into natural coding (AAC T/F, CELP, Parametric) and synthetic coding (TTS, SA). Label: ISO/IEC 14496-3 sec5.]

MPEG-4 Audio: Natural (recorded)

AAC: Advanced Audio Coding. Originally created as an extension to MPEG-2. Provides better quality at 64 kbit/sec/channel than MP3 does at 128 kbit/sec/channel.

CELP: A code-excited linear prediction scheme optimized for telephone-quality transmission of speech in the range 8-32 kbps.

Parametric: A novel "harmonic vector + noise" method that allows lossy but extremely low-bitrate coding of wideband sounds, down to 2 kbps/channel.

MPEG-4 Audio: Synthetic (synthesized)

Structured Audio: A downloadable synthesis method that allows producers to describe new synthesis methods as part of the bitstream. The receiver implements a reconfigurable synthesis engine and synthesizes the sound on-the-fly as the instructions are received.

Text-to-Speech: An interface to standalone TTS systems is provided, so that synthetic speech can be synchronized in multimedia presentations. No "method" of creating synthetic speech is standardized by MPEG.

MPEG-4 Standard - Structured Audio

Structured Audio: One “component” in the MPEG audio standard.

[Diagram: MPEG-4 tree. Top level: audio, system, video. Audio splits into natural coding (AAC T/F, CELP, Parametric) and synthetic coding (TTS, SA). Label: ISO/IEC 14496-3 sec5.]

Audio Compression Basics

[Diagram: Traditional Technique for Music. Encoder: an amplitude-versus-time signal passes through Filter into Critical Bands, Compute Masking, Allocate Bits, and Format Bitstream; the result goes to the decoder.]

The Kolmogorov alternative:

Write a computer program that generates the desired audio stream.

Transmit the computer program.

To decode, execute the program.

MPEG-4 Structured Audio (MP4-SA) uses this approach.

Eric Scheirer, Editor (MIT Media Lab). http://sound.media.mit.edu/~eds/mpeg4/

Similar to PostScript!
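
As a rough illustration of the "transmit the program, execute it to decode" idea, here is a minimal Python sketch. It is not MP4-SA: the note-list "program", the decode function, and the 44.1 kHz 16-bit mono format are all assumptions made for the example.

```python
import math
import struct

SAMPLE_RATE = 44100  # assumed output rate; not taken from the slides

# The transmitted "program": (start_s, dur_s, freq_hz, amplitude) per note.
# This note-list format is hypothetical, chosen only for illustration.
PROGRAM = [(0.0, 0.5, 440.0, 0.3), (0.5, 0.5, 660.0, 0.3)]

def decode(program, sample_rate=SAMPLE_RATE):
    """Execute the program and return 16-bit little-endian mono PCM bytes."""
    total = max(start + dur for start, dur, _, _ in program)
    n = int(round(total * sample_rate))
    mix = [0.0] * n
    for start, dur, freq, amp in program:
        first = int(round(start * sample_rate))
        count = int(round(dur * sample_rate))
        for i in range(count):
            if first + i < n:
                mix[first + i] += amp * math.sin(2 * math.pi * freq * i / sample_rate)
    # Clip to [-1, 1] and convert to signed 16-bit integers.
    return b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in mix
    )

pcm = decode(PROGRAM)                                 # what the receiver hears
program_bytes = len(repr(PROGRAM).encode("ascii"))    # what was transmitted
print(len(pcm), program_bytes)
```

The size comparison leaves out the decode function itself. In MP4-SA the synthesis code travels in the bitstream as SAOL, so a fair comparison would count it too.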

MP4-SA Encoding

Encoding may be a creative act: writing a program, directly (emacs) or indirectly (GUI, webpage). In this case, MP4-SA is a lossless compressor.

Encoding may be automatic: given a sound, an encoder writes a program that generates the sound. Automatic encoding is hard in the general case.

MP4-SA Decoders

Decoders are interpreters or compilers.

Key Application: Music Production

Modern music production is computer-based.

Musicians enter performances into computers as control information, not audio waveforms.

Digital synthesizers, effects, and mixes create the final audio, under engineer/producer control.

MP4-SA Maps to Modern Music Production

[Diagram: “The Program” (synthesis algorithms, effects “boxes”, mixers) and the control information (musical performance, mix-down) travel over a network, where there is a premium on low bandwidth, to “The Decoder” (sound rendering).]

[Diagram, file-system variant: MP4-SA as a standard framework on a file system. Ideal for collaborative productions, remixes, and ...]

Key Application: Music Performance

Music performance requires dynamic control.

True interactivity requires parameterized sounds.

Musicians control instruments and effects with interactive controllers.

Control could be indirect and remote (ex: games).

MP4-SA Enables Networked Music Performance

[Diagram: two instances of “The Decoder” (sound rendering), joined by a network where there is a premium on low bandwidth.]

MPEG-4 Structured Audio:

A binary file format that encodes the programming language SAOL (pronounced "sail"), the musical score language SASL, MIDI (for legacy support), and audio sample data.

Result is normative: an MP4-SA file will sound identical on all compliant decoders.

Different from MIDI files.

Why SAOL and MP4-SA? Why not Java?

Musical performances have temporal structure that changes over several timescales:

Sample-by-sample: 10’s of usec

Amplitude & timbre envelopes: 10’s of msec

Note-by-note: 100’s of msec

Writing sound generation code in a conventional language results in code dominated by time-scale management. Hard to maintain, hard to optimize.

Time management is built into SAOL.

A SAOL program executes by moving a simulated clock forward in time, performing calculations along the way in a synchronous fashion.

Work is scheduled to happen at the a-rate (the audio sample rate), at the k-rate (envelope control rate), or at the i-rate (rate for new notes).

Language variables are typed as a/k/i-rate.

A language statement is scheduled based on the rate of the variables it contains.
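
The rate scheme can be modeled with a small Python sketch. This is a simplified model, not how a SAOL decoder is implemented: the function names and the tiny rates are made up for readability, and the "fastest variable wins" rule is an assumed simplification of the standard's actual rate rules.

```python
S_RATE = 8   # a-rate: audio samples per second (tiny, for readability)
K_RATE = 4   # k-rate: control passes per second (assumed to divide S_RATE)

def run_instance(duration_s):
    """Count how often each class of statement runs for one note."""
    counts = {"i": 0, "k": 0, "a": 0}
    counts["i"] += 1                       # i-rate: once, when the note starts
    a_per_k = S_RATE // K_RATE
    for _ in range(int(duration_s * K_RATE)):
        counts["k"] += 1                   # k-rate: once per control period
        for _ in range(a_per_k):
            counts["a"] += 1               # a-rate: once per audio sample
    return counts

def statement_rate(variable_rates):
    """Assumed rule: a statement runs at the fastest rate among its variables."""
    order = {"i": 0, "k": 1, "a": 2}
    return max(variable_rates, key=order.__getitem__)

print(run_instance(1.0))                 # one-second note
print(statement_rate(["i", "a", "a"]))   # e.g. x = x - a*y with ivar a, asig x, y
```

In a conventional language the nested loops above have to be written by hand around every sound generator. In SAOL the programmer declares each variable's rate and the language supplies the loops.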

SAOL, SASL, and Scheduling:

Sound creation in MP4-SA can be compared to a musician playing notes on an instrument.

A SAOL subprogram (called an instr or instrument) serves as the instrument.

SASL commands (called score lines) act to play notes on SAOL instruments.

Many instances of a SAOL instr can be active at one time, making sounds corresponding to notes launched by different score lines in a SASL file.

An example:

SAOL instrument tone, that plays a gated sine wave. (SAOL code in next slide.)

This SASL file plays a melody on tone:

0.5 tone 0.75 52 0.25
1.5 tone 0.75 64 0.25
2.5 tone 0.5 63 0.25
3 tone 0.25 59 0.2
3.25 tone 0.25 61 0.225
3.5 tone 0.5 63 0.225
4 tone 0.5 64 0.25
5 end

Each score line gives, in order: when the instance is launched, the instrument name, how long the instrument runs, and the instance parameters (note number, loudness).

SAOL code for tone

instr tone (note, loudness)
{
  ivar a;           // sets osc f
  ksig env;         // env output
  asig x, y;        // osc state
  asig init;

  a = 2*sin(3.141593*cpsmidi(note)/s_rate);
  env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0);

  if (init == 0)    // first a-pass only
  {
    x = loudness;
    init = 1;
  }

  x = x - a*y;      // the FLOPS happen in
  y = y + a*x;      // these 3 statements
  output(y*env);    // creates audio output

}                   // end of instr tone
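
For readers without a SAOL decoder at hand, here is a Python sketch of what the instrument computes. It is an approximation under stated assumptions: cpsmidi is modeled as equal temperament with note 69 at 440 Hz, kline as a piecewise-linear envelope sampled once per control period, and the sample and control rates (44100 and 1050) are arbitrary choices.

```python
import math

def cpsmidi(note):
    """MIDI note number to Hz, assuming note 69 = 440 Hz, equal temperament."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def kline_env(t, dur):
    """Assumed kline reading: 0 -> 0.5 over 0.1 s, hold, 0.5 -> 0 over 0.1 s."""
    if t < 0.1:
        return 0.5 * t / 0.1
    if t < dur - 0.1:
        return 0.5
    return max(0.0, 0.5 * (dur - t) / 0.1)

def tone(note, loudness, dur, s_rate=44100, k_rate=1050):
    """Gated sine wave from a two-variable recursive oscillator."""
    a = 2.0 * math.sin(math.pi * cpsmidi(note) / s_rate)   # i-rate: once per note
    x, y = loudness, 0.0          # replaces the init flag of the SAOL version
    out = []
    a_per_k = s_rate // k_rate    # audio samples per control period
    for k in range(int(dur * k_rate)):
        env = kline_env(k / k_rate, dur)    # k-rate: once per control period
        for _ in range(a_per_k):            # a-rate: once per sample
            x = x - a * y
            y = y + a * x
            out.append(y * env)
    return out

samples = tone(69, 0.25, 1.0)
middle = samples[8820:35280]      # 0.2 s to 0.8 s, where the envelope is flat
crossings = sum(1 for p, q in zip(middle, middle[1:]) if p * q < 0)
print(len(samples), crossings)
```

The two update statements form a rotation whose angle per sample is 2*pi*f/s_rate when a = 2*sin(pi*f/s_rate), which is why the oscillator needs no sine evaluation inside the audio loop. The zero-crossing count over 0.6 s should come out near 528 for a 440 Hz tone.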

SAOL Features

Rate semantics: i/k/a-rate execution

Vector arithmetic: ex: A=B+C computes A[i]=B[i]+C[i] for i=1..n

All floating-point arithmetic.

Extensive built-in audio function library: signal generators, table operators, pitch converters, filters, fft, sample rate conversion, effects, ...

Sfront - a SAOL-to-C translator

[Diagram: sfront translates foo.mp4 into sa.c]

Converts MP4-SA files to an ANSI C program that, when executed, produces audio.

Runs on UNIX, Windows, MacOS. Under Linux, supports real-time MIDI input, real-time audio input and output, and MIDI over RTP (Real-time Transport Protocol).

www.cs.berkeley.edu/~lazzaro/sa

[Diagram: foo.mp4, containing SAOL, SASL, MIDI, and uncompressed samples, passes through sfront to become sa.c]

Handles SAOL, SASL, MIDI, uncompressed samples.

Generator Techniques

Much of the SA standard describes a library: 104 core opcodes (ex: pow(), allpass(), reverb()) and 16 wave table generators (ex: harm, spline, random).

Sfront optimizes the code produced for each library element instance based on the invocation attributes: rate, width, size, constancy, integral nature of the parameters, number of parameters.

Conclusions

MP4-SA puts emphasis on sound synthesis methods that can be described in a small amount of space. Physical modeling: good. Sampling natural instruments: bad.

If models are chosen carefully, compression ratios of 100 to 10,000 are possible.

MP4-SA specifies that a decoder produces audio that “sounds identical” to computing the program accurately.
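
To give the quoted compression ratios some scale, this Python sketch works out how large a program may be for a given ratio. The three-minute stereo song and the CD-style PCM format (44.1 kHz, 16-bit) are assumptions chosen for the arithmetic, not figures from the slides.

```python
def raw_pcm_bytes(seconds, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Size of uncompressed PCM audio (CD-style defaults are assumptions)."""
    return int(seconds * sample_rate) * channels * bytes_per_sample

def program_budget(seconds, ratio):
    """Largest program size, in bytes, that still achieves the given ratio."""
    return raw_pcm_bytes(seconds) // ratio

song = raw_pcm_bytes(180)             # hypothetical three-minute stereo song
print(song)                           # 31752000
print(program_budget(180, 100))       # 317520
print(program_budget(180, 10_000))    # 3175
```

At the high end of the range, the whole piece (instruments, score, and any sample data) must fit in about 3 KB, which is why the deck favors compact synthesis models over sampled instruments.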