Multimedia

T.Sharon-A.Frank


Video/Audio Compression


Hybrid coding

• Images:
  – JPEG
• Video/Audio:
  – M-JPEG
  – MPEG (1, 2, 4)
  – Other codings
  – H.26x


Video Coding Requirements

• Random access
• Fast forward/reverse searches
• Reverse playback
• Audio-visual synchronization
• Robustness to errors
• Low coding/decoding delay
• Editability
• Format flexibility
• Cost tradeoffs


Video Compression

• Spatial (intra-frame) compression:
  – Compresses each frame in isolation, treating it as a bitmapped image.
  – Based on quantization of DCT coefficients.
• Temporal (inter-frame) compression:
  – Compresses sequences of frames by only storing differences between them.
  – Records displacement of an object plus changed pixels in the area exposed by its movement.
  – Based on Motion Compensation (MC).
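The idea of "quantization of DCT coefficients" can be sketched in a few lines. This is only an illustration under assumptions that are not on the slide: a naive 8x8 DCT-II, a single uniform step size of 16 (real coders use a quantization table), and hypothetical helper names `dct_2d` and `quantize`.

```python
import math

N = 8  # block size, assumed here as in JPEG-style coders

def dct_2d(block):
    """Naive 2-D DCT-II of an N x N block given as a list of lists."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step):
    """Uniform quantization: divide by a step size and round (the lossy step)."""
    return [[round(coef / step) for coef in row] for row in coeffs]

# A flat grey block: all of its energy lands in the DC coefficient.
flat = [[128] * N for _ in range(N)]
q = quantize(dct_2d(flat), step=16)
nonzero = sum(1 for row in q for value in row if value != 0)
```

For a smooth block almost every quantized coefficient is zero, which is what makes the later run-length and entropy coding stages effective.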


Spatial Compression

• Image compression applied to each frame.
• Can therefore be lossless or lossy, but lossless rarely produces sufficiently high compression ratios for the volume of data.
• Lossy compression implies a loss of quality if decompressed then recompressed.
• Ideally, work with uncompressed video during post-production.


Temporal Compression

• Key frames are spatially compressed only
  – Key frames often regularly spaced (e.g., every 12 frames).
• Difference frames only store the differences between the frame and the preceding frame or most recent key frame.
• Difference frames can be efficiently spatially compressed.
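A minimal sketch of a difference frame, assuming greyscale frames stored as lists of rows. The function names `difference_frame` and `reconstruct` are illustrative and do not come from any codec.

```python
def difference_frame(prev, curr):
    """Per-pixel difference between the current and the previous frame."""
    return [[c - p for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def reconstruct(prev, diff):
    """Rebuild the current frame from the previous frame plus the difference."""
    return [[p + d for p, d in zip(prow, drow)]
            for prow, drow in zip(prev, diff)]

key = [[10, 10, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 10, 10]]
nxt = [[10, 10, 10, 10],
       [10, 50, 50, 10],
       [10, 10, 10, 10]]

diff = difference_frame(key, nxt)
changed = sum(1 for row in diff for d in row if d != 0)
restored = reconstruct(key, diff)
```

Only two of the twelve difference values are non-zero here, which is why a difference frame compresses well spatially.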


Motion-JPEG (M-JPEG)

• Purely spatial compression.
• Apply JPEG compression to each video frame.
• Compression rates: 2:1 to 12:1
  – lossy: up to 5:1 is considered broadcast quality.

• No standard, but MJPEG-A format widely supported.

• Excellent when there are rapid scene changes in the video.

• Easy to edit.


Video Compression

Coding of video is carried out in a series of steps:

• Divide the image into blocks, usually:
  – 16x16 luminance
  – 8x8 chrominance (color)
• Use DCT-based techniques for spatial redundancy removal (intra-frame compression).
• Use MC (Motion Compensation) techniques for temporal redundancy removal (inter-frame compression).
• Final stage is two-dimensional run-length coding.


Three consecutive video frames


Motion Compensation

• Motion compensation compensates for inter-frame differences.

• Real-time communication consideration – only the closest previous frame is used for prediction to reduce the encoding delay.

[Figure: a block in the current frame and its best match in the previous frame]


Motion Compensation Algorithm

• Sends the new location of the block.
• If the block changed by more than a certain threshold, resends the whole block.
• Refreshes the whole image once in a while.

[Figure: a block in the current frame and its best match in the previous frame]
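The steps above can be sketched as an exhaustive block-matching search. Everything specific here is an assumption made for illustration: the sum of absolute differences (SAD) as the match cost, the small search radius, the threshold value, and the names `sad`, `best_match` and `encode_block`.

```python
def sad(prev, curr, px, py, cx, cy, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(curr[cy + j][cx + i] - prev[py + j][px + i])
               for j in range(size) for i in range(size))

def best_match(prev, curr, cx, cy, size, radius):
    """Search the previous frame around (cx, cy); return (cost, dx, dy)."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            px, py = cx + dx, cy + dy
            if 0 <= px <= width - size and 0 <= py <= height - size:
                cost = sad(prev, curr, px, py, cx, cy, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best

def encode_block(prev, curr, cx, cy, size=2, radius=2, threshold=100):
    """Send a motion vector, or resend the block if the match is too poor."""
    cost, dx, dy = best_match(prev, curr, cx, cy, size, radius)
    if cost > threshold:
        return ("resend_block", None)
    return ("motion_vector", (dx, dy))

def frame_with_square(x, y, width=8, height=8, size=2):
    """Black frame with one bright size x size square at (x, y)."""
    frame = [[0] * width for _ in range(height)]
    for j in range(size):
        for i in range(size):
            frame[y + j][x + i] = 200
    return frame

prev_frame = frame_with_square(1, 1)
curr_frame = frame_with_square(3, 2)      # the square moved by (+2, +1)
moved = encode_block(prev_frame, curr_frame, 3, 2)

empty = [[0] * 8 for _ in range(8)]
appeared = encode_block(empty, curr_frame, 3, 2)  # nothing to match against
```

In the first case the encoder only needs the vector pointing back to the block's old position; in the second no good match exists, so the block itself is resent.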


Frame Types in Compressed Video

• Key Frame
  – Compression is based on content of this frame.
• Difference/Delta Frame
  – Compression is based on last key frame.


Bi-directional Motion Compensated Interpolation


MPEG Dynamics

• Delicate balance between Intra-frame and Inter-frame coding.

• Two basic techniques:
  – Transform domain DCT-based compression for the reduction of spatial redundancy (intra-frame).
  – Block-based bi-directional MC for the reduction of temporal redundancy (inter-frame).


The MPEG Standard

Three types of MPEG-2 frames processed by the viewing program:

1. I (Intracoded) frames: self-contained JPEG-encoded still pictures.

2. P (Predictive) frames: block-by-block difference with the last frame.

3. B (Bidirectional) frames: differences with the last and next frame.


Use of MPEG Image Types

<I> Intra-picture/frame/image
  – Access points for random access
  – Moderate compression

<P> Predicted pictures
  – Coded with a reference to a past picture
  – Used as reference for future predicted pictures

<B> Bi-directional prediction (interpolated pictures)
  – Require past and future reference for prediction
  – Highest compression


MPEG GOPs

• Group of Pictures (GOP):
  – Repeating sequence of I-, P- and B-pictures.
  – Always begins with an I-picture.
  – Display order: frames in the order they will be displayed.
  – Bitstream order: re-ordered so that every P- or B-picture comes after the frames it depends on, allowing reconstruction of the complete frames.


A Typical MPEG Picture Display Order

[Figure: pictures in display order, I B B B P B B B I, with forward prediction marked; annotation: 25fps (9 I/P, 17B)]


A Typical MPEG Picture Bitstream Order

• Transmitting order: 1, 5, 2, 3, 4, 9, 6, 7, 8

Display position:  1 2 3 4 5 6 7 8 9
Picture type:      I B B B P B B B I

[Figure: arrows mark forward prediction and bi-directional prediction between the pictures]
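The transmitting order above follows from one rule: each reference picture (I or P) is sent before the B-pictures that precede it in display order. A small sketch of that rule, with `bitstream_order` as a hypothetical helper name:

```python
def bitstream_order(display_types):
    """Return 1-based display positions in transmission order.

    display_types is a string such as "IBBBPBBBI" in display order.
    """
    order, pending_b = [], []
    for position, picture_type in enumerate(display_types, start=1):
        if picture_type == "B":
            # B-pictures wait until their future reference has been sent.
            pending_b.append(position)
        else:
            # I- or P-picture: send it, then the B-pictures that depend on it.
            order.append(position)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)  # trailing B-pictures with no later reference
    return order

result = bitstream_order("IBBBPBBBI")
```

For the sequence on this slide the function returns 1, 5, 2, 3, 4, 9, 6, 7, 8, matching the transmitting order listed.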


MPEG Standards

• MPEG-1
  – 352x240 at 30 fps.
  – Quality is slightly below standard VCR videos.
• MPEG-2
  – 720x480 & 1280x720 at 60 fps, with full CD-quality audio.
  – Sufficient for television (including HDTV).
  – Used on DVD-ROMs.
• MP3
  – Audio compression.
  – Reduces digital sound files by 12:1 ratio with virtually no loss in quality.


MPEG-1 Compression

• Source Interchange Format (SIF)
  – 4:2:0 chrominance sub-sampling
  – 352x240 pixel frame
• MPEG-1 compressed SIF video at 30 frames per second has a data rate of 1.86Mbps (CD video – 40mins of video at that rate).
• MPEG-1 can be scaled up to larger frames, but cannot handle interlacing.


MPEG Profiles & Levels

• Profiles define subsets of the features of the data stream.
• Levels define parameters such as frame size and data rate.
• Each profile may be implemented at one or more levels.
• Notation: profile@level, e.g. MP@ML.


MPEG-2 Main Profile & Level

• MPEG-2 Main Profile at Main Level (MP@ML) used for DVD video:
  – CCIR 601 scanning
  – 4:2:0 chrominance sub-sampling
  – 15 Mbits per second
  – Most elaborate representation of MPEG-2 compressed data.


MPEG-4 (1)

• Refinement of MPEG-1 compression:
  – I-pictures compressed by quantizing and Huffman coding DCT coefficients.
  – Improved motion compensation leads to better quality than MPEG-1 at the same bit rates.
• Designed to support a range of multimedia data at bit rates from 10Kbps to >1.8Mbps.
• Applications from mobile phones to HDTV.
• Video codec becoming popular for Internet use – is incorporated in QuickTime, RealMedia and DivX.


MPEG-4 (2)

• Standard defines an encoding for multimedia streams made up of different sorts of objects: video, still images, animation, 3-D models…
• Higher profiles divide a scene into arbitrarily shaped video objects, where each one may be compressed and transmitted separately; the scene is composed at the receiving end by combining them.
• SP and ASP profiles are restricted to rectangular objects, usually complete frames.


MPEG-4 Profiles & Levels

• Simple Profile (SP), suitable for low bandwidth streaming over Internet:
  – P-pictures only
  – Efficient decompression, suitable for PDAs, etc.
  – SP@L1, 64 kbps, 176x144 pixel frame.
• Advanced Simple Profile (ASP), suitable for broadband streaming:
  – B-pictures
  – Global Motion Compensation
  – Sub-pixel motion compensation
  – ASP@L5, 8000 Kbps, full CCIR 601 frame.


DV Compression

• Starts with chrominance sub-sampling of CCIR 601.
• Constant data rate 25Mbits per second; higher quality than MJPEG at same rate.
• Apply DCT, quantization, run-length and Huffman coding on zig-zag sequence – like JPEG – to 8x8 blocks of pixels.
• If little or no difference between fields (almost static frame), apply DCT to block containing alternate lines from odd and even fields.
• If motion between fields, apply DCT to two 8x4 blocks (one from each field) separately, leading to more efficient compression of frames with motion.
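The zig-zag scan followed by run-length coding can be illustrated as below. This is a sketch and not the DV bitstream format: the `(zero_run, value)` pairs and the `"EOB"` end-of-block marker are assumed conventions borrowed from JPEG-style coders, and the function names are made up for the example.

```python
def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even anti-diagonals are walked bottom-left to top-right,
        # odd ones top-right to bottom-left.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order

def run_length(values):
    """Encode as (zero_run, value) pairs; trailing zeros collapse to 'EOB'."""
    encoded, run = [], 0
    for value in values:
        if value == 0:
            run += 1
        else:
            encoded.append((run, value))
            run = 0
    encoded.append("EOB")
    return encoded

# A quantized block with only three non-zero low-frequency coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 64, 3, -1

scan = [block[r][c] for r, c in zigzag_indices()]
encoded = run_length(scan)
```

The zig-zag order groups the low-frequency coefficients at the front, so the 64 values of the block shrink to three pairs and an end marker.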


DVI (Digital Video Interactive)

• Developed by General Electric.
• Uses specialized processors for compression.
• Hardware-only codec – lossless transforms.
• Compression rate: 80:1-160:1
  – 10 sec video clip is compressed to ~2MB.
• Intel – software version of DVI algorithms, marketed as Indeo (a software-only codec):
  – there is also an audio version of Indeo.
  – latest version uses hybrid wavelet transform for compression algorithm.


Cinepak

• Developed by Apple and SuperMac.
• Outputs 320x240 (quarter screen) at 15 fps with good quality
  – data rate that even slow single-speed and 2x CD-ROM players can deliver.
• Software-only codec supported by Microsoft’s Video for Windows and Apple’s QuickTime.
• Better color definition than other codecs, so good for natural video without graphics or animation.


QuickTime

• Developed by Apple but is now cross-platform.

• Supports Cinepak, Indeo, M-JPEG and MPEG-1, and is extensible to support future codecs, such as DVCAM.

• Synchronizes all types of digital media.
  – For example, video frames are dropped if necessary for synchronization with audio.


Video For Windows

• Microsoft (therefore, not cross-platform).
• Uses generic AVI (audio video interleaved) format which is provided by MCI (media control interface).
• Supports a number of compression methods in real-time, non-real-time, with or without hardware assistance
  – Cinepak, Indeo, Microsoft Video-1.


ActiveMovie (API from Microsoft)

• Now called DirectShow (supports DVD).
• Solves problems of VfW and QuickTime.
• Cross-platform.
• Supports codecs supported by VfW as well as MPEG audio, WAV audio, MPEG video, and Apple QuickTime video.

• Fully integrated with DirectX technology, allowing use of DirectX components and more graphics card features.


Video Streaming Players

• RealVideo (from RealNetworks)
  – G2 Player also plays RealAudio.
  – Uses a variety of compression techniques.
• RealProducer (also from RealNetworks)
  – Allows you to create streaming audio and video.
  – Free software just like G2!


H.261 (Px64)

• Video compression for videoconferences
  – Compression in real-time
  – Targeted to ISDN
• Compressed data stream: p*64 Kbits/s (p = 1, …, 30)
• 2 resolutions:
  – Common Intermediate Format (CIF)
  – Quarter CIF (QCIF)


H.261 (Px64) Resolutions

                    CIF                          QCIF
                    Lines/frame   Pixels/line    Lines/frame   Pixels/line
Luminance (Y)       288           352            144           176
Chrominance (Cb)    144           176            72            88
Chrominance (Cr)    144           176            72            88

• CIF: Common Intermediate Format
• QCIF: Quarter CIF


Image Preparation

• Uncompressed CIF
  – One frame = 288*352*8 + 2*144*176*8 = 1,216,512 bits
  – 30 fps
  – Bandwidth = 1,216,512*30 ≈ 36.5 Mbits/s
• Uncompressed QCIF = 9.1Mbits/s
• ISDN channels: 64Kbits/s-2Mbits/s

=> bit reduction required
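The arithmetic on this slide can be checked with a few lines. The sketch assumes 8 bits per sample and chrominance planes with half the lines and half the pixels per line of the luminance plane, as in the CIF/QCIF resolutions; `frame_bits` is an illustrative name.

```python
def frame_bits(lines, pixels_per_line, bits_per_sample=8):
    """Bits in one uncompressed frame: one Y plane plus Cb and Cr planes."""
    luminance = lines * pixels_per_line * bits_per_sample
    chrominance = 2 * (lines // 2) * (pixels_per_line // 2) * bits_per_sample
    return luminance + chrominance

FPS = 30

cif_bits = frame_bits(288, 352)
cif_rate = cif_bits * FPS                  # bits per second
qcif_rate = frame_bits(144, 176) * FPS     # bits per second

print(f"CIF frame:  {cif_bits:,} bits")
print(f"CIF rate:   {cif_rate / 1e6:.1f} Mbits/s")
print(f"QCIF rate:  {qcif_rate / 1e6:.1f} Mbits/s")
```

Both rates are far above the 64 Kbits/s to 2 Mbits/s that ISDN offers, which is the point of the slide.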


Desktop Videophone Applications

• Channel capacity (p=1) = 64Kbits/s
• QCIF at 10 fps --> 3 Mbits/s
• Required compression ratio = 3Mbits/s / 64Kbits/s = 47
• Channel capacity (p=10) = 640Kbits/s
• CIF at 30 fps --> 36.5 Mbits/s
• Required compression ratio = 36.5Mbits/s / 640Kbits/s = 57
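The required ratios can be recomputed from the unrounded source rates. The sketch assumes 1 Kbit = 1000 bits and takes the per-frame sizes from the uncompressed CIF and QCIF formats; `required_ratio` is an illustrative name. The results differ slightly from the slide because the slide rounds the source rate before dividing.

```python
QCIF_FRAME_BITS = 144 * 176 * 8 + 2 * 72 * 88 * 8     # 304,128 bits
CIF_FRAME_BITS = 288 * 352 * 8 + 2 * 144 * 176 * 8    # 1,216,512 bits

def required_ratio(frame_bits, fps, p):
    """Compression ratio needed to fit the video into a p*64 Kbits/s channel."""
    source_rate = frame_bits * fps      # bits per second, uncompressed
    channel_rate = p * 64_000           # bits per second, H.261 channel
    return source_rate / channel_rate

videophone = required_ratio(QCIF_FRAME_BITS, fps=10, p=1)
conference = required_ratio(CIF_FRAME_BITS, fps=30, p=10)

print(f"QCIF, 10 fps, p=1:  {videophone:.1f}:1")
print(f"CIF, 30 fps, p=10:  {conference:.1f}:1")
```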


Audio Compression

• In general, lossy methods required because of complex and unpredictable nature of audio data.
• CD quality, stereo, 3-minute song requires over 25 Mbytes
  – Data rate exceeds bandwidth of dial-up Internet connection.
• Difference in the way we perceive sound and image means a different approach from image compression is needed.
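The size claim can be checked from the CD audio parameters (44.1 kHz sampling, 16 bits per sample, two channels). The 56 Kbits/s dial-up figure is an assumption added for comparison and is not stated on the slide; `pcm_bytes` is an illustrative name.

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size in bytes of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample // 8 * channels * seconds

song_bytes = pcm_bytes(44_100, 16, 2, seconds=180)
cd_rate_bps = 44_100 * 16 * 2          # bits per second
DIALUP_BPS = 56_000                    # assumed modem speed

print(f"3-minute song: {song_bytes / 1e6:.1f} Mbytes")
print(f"CD data rate:  {cd_rate_bps / 1e3:.1f} Kbits/s")
print(f"Exceeds dial-up by a factor of {cd_rate_bps / DIALUP_BPS:.0f}")
```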


Audio Compression Techniques

Sampling frequency (kHz)   Quantization (bits)   Format   Quality                 CD quality
44.1                       16                    PCM      HiFi music              CD-DA
37.8                       8                     ADPCM    HiFi music              CD-I level
37.8                       8                     ADPCM    FM broadcast (music)    CD-I level B
18.9                       4                     ADPCM    AM broadcast (speech)   CD-I level
8                          8                     PCM      Telephone               N/A


Standards of Speech Encoding

Standard   Description
G.711      PCM of voice frequencies
G.722      Audio coding at 7 kHz within 64 Kbit/s (ADPCM)
G.728      Coding of speech at 16 Kbit/s using low-delay code-excited linear prediction (LD-CELP)


Basic Steps of Audio Encoding

[Figure: block diagram. Uncompressed audio data passes through filter banks (32 sub-bands), quantization, an entropy coder and a multiplexer to become compressed audio data; a psychoacoustical model provides the control input.]


MP3

• MP3 = MPEG-1 Audio, Layer 3
• Three layers of audio compression in MPEG-1 (MPEG-2 essentially identical).
• From Layer 1 to Layer 3, the encoding process increases in complexity and the data rate for the same quality decreases
  – e.g., same quality at 192kbps for Layer 1, 128kbps for Layer 2, 64kbps for Layer 3.
• 10:1 compression ratio at high quality.
• Variable bit rate coding (VBR).


Voice Quality - QoS

The Objective: Provide unfailing, ubiquitous, toll quality service

[Figure: One-Way Delay (ms), ticks 0 to 400, plotted against Packet Loss (%), ticks 0, 1, 5, 10, with low and high thresholds separating three regions: Acceptable Operation, Marginal Acceptance, and Area of Unacceptable Operation (Service Level Agreement Violation)]

The Challenge: Eliminate the impact of delay-insensitive traffic on real-time traffic


QoS Parameters         Best            High            Medium          Best Effort
Mouth-to-Ear Delay:    0ms - 150ms     150ms - 250ms   250ms - 450ms   450ms and above
Call Setup:            0 sec - 1 sec   1 sec - 3 sec   3 sec - 5 sec   5 sec and above
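The delay row of the table can be read as a simple classification rule. The sketch below assumes each lower bound is inclusive, since the table's ranges share their end points; `delay_class` is a hypothetical helper name.

```python
def delay_class(mouth_to_ear_ms):
    """Map a mouth-to-ear delay in milliseconds to a QoS class from the table."""
    if mouth_to_ear_ms < 0:
        raise ValueError("delay cannot be negative")
    if mouth_to_ear_ms < 150:
        return "Best"
    if mouth_to_ear_ms < 250:
        return "High"
    if mouth_to_ear_ms < 450:
        return "Medium"
    return "Best Effort"

classes = [delay_class(ms) for ms in (40, 150, 300, 600)]
```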

[Figure: Delay Budgets. Echo path across PSTN, G, IP network, G, PSTN, with EC blocks at both ends; segments are annotated with delays of "few ms" or "hundred ms".]
