January 22, 2014 Sam Siewert Computer and Machine Vision Deeper Dive into MPEG Digital Video Encoding

Computer and Machine Vision - University of Colorado Boulder, ecee.colorado.edu/~siewerts/extra/ecen5763/ecen5763_doc/Lecture…



Reminders

CV and MV Use UNCOMPRESSED FRAMES

Remote Cameras (E.g. Security) May Need to Transport Captured Frames Over a Network to the CV/MV Processor

We NEED to Understand Both!

BEWARE of LOSSY COMPRESSION

I-Frame-Only MPEG or MJPEG Is a Decent Compromise Between Both

Sam Siewert 2


MPEG: Order Of Operators


#1: POINT (Pixel) Encoding

#2 A-C: Macro-Block Lossy Intra-Frame Compression

#3: Motion-Based Compression in Group of Pictures

[Figure: MPEG encoder pipeline with stages labeled #1, #2A, #2B, #2C, and #3]


Step #1 – RGB to YCrCb 4:4:4 24-bit

(Lossless) For every Y sample in a scan-line, there is also one Cr and one Cb sample

– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits

– No compression between RGB and YCrCb 4:4:4 (both 24 bits/pixel)

Typically a Post-Production, CEDIA, or DCI Format

[Figure: 320x240 frame, pixels 0 to 319 on the first scan-line through 76,480 to 76,799 on the last; legend: Y, Cr, and Cb sample / Y sample only]
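A minimal per-pixel sketch of the 4:4:4 conversion, assuming full-range BT.601 (JFIF) coefficients; broadcast MPEG video normally uses studio-range levels instead, so treat the constants as an assumption. Each pixel keeps its own Y, Cb, and Cr byte, so the frame stays at 24 bits per pixel and the only loss is 8-bit rounding.

```python
def _clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit (Y, Cb, Cr).

    Assumption: full-range BT.601 (JFIF) coefficients, not studio range.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _clamp8(y), _clamp8(cb), _clamp8(cr)


def frame_bits_444(width, height):
    """4:4:4 keeps one Y, one Cb, and one Cr byte per pixel: 24 bits per pixel."""
    return width * height * 24


print(rgb_to_ycbcr(255, 255, 255))  # (255, 128, 128)
```

Neutral colors (white, gray, black) land at Cb = Cr = 128, which is why chroma can later be subsampled with little visible loss.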


48 bit to 32 bit


Step #1 – RGB to YCrCb 4:2:2 (Lossy): For every 2 Y samples in a scan-line, one Cr and one Cb sample

– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits

– Two RGB Pixels = 48 bits, Whereas Two YCrCb 4:2:2 Pixels = 32 bits, or 16 bits per pixel vs. 24 bits per pixel (33% smaller frame size)

[Figure: 320x240 frame, pixels 0 to 319 on the first scan-line through 76,480 to 76,799 on the last; legend: Y, Cr, and Cb sample / Y sample only]
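A short sketch of horizontal 2:1 chroma subsampling, assuming each pair of chroma samples on a scan-line is simply averaged with rounding; real encoders may use longer filters and different chroma siting. The frame-size helper reproduces the 16 bits per pixel figure.

```python
def subsample_422(chroma_row):
    """Average each horizontal pair of Cb (or Cr) samples: 4:4:4 -> 4:2:2.

    Assumption: plain pair averaging with round-half-up; width must be even.
    """
    if len(chroma_row) % 2:
        raise ValueError("scan-line width must be even")
    return [(chroma_row[i] + chroma_row[i + 1] + 1) // 2
            for i in range(0, len(chroma_row), 2)]


def frame_bytes_422(width, height):
    """Every 2 pixels carry 2 Y + 1 Cb + 1 Cr bytes: 2 bytes (16 bits) per pixel."""
    return width * height * 2


print(subsample_422([100, 102, 50, 51]))             # [101, 51]
print(frame_bytes_422(320, 240) * 8 / (320 * 240))   # 16.0
```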


Step #1 – RGB to YCrCb 4:2:0 (Lossy): For every 2x2 block of Y samples (2 samples on each of 2 scan-lines), one Cr and one Cb sample

– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits

– Four RGB Pixels = 96 bits, Whereas Four YCrCb 4:2:0 Pixels (4 Y + 1 Cr + 1 Cb) = 48 bits, or 12 bits per pixel on average vs. 24 bits per pixel (50% smaller)

[Figure: 320x240 frame, pixels 0 to 319 on the first scan-line through 76,480 to 76,799 on the last; legend: Cr, Cb sample / Y sample only]
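The 4:2:0 case halves chroma both horizontally and vertically. A minimal sketch, assuming each 2x2 neighborhood of a chroma plane is averaged (again, real encoders may filter and site chroma differently):

```python
def subsample_420(plane):
    """Average each 2x2 block of a Cb (or Cr) plane: 4:4:4 -> 4:2:0."""
    h, w = len(plane), len(plane[0])
    if h % 2 or w % 2:
        raise ValueError("plane dimensions must be even")
    return [[(plane[r][c] + plane[r][c + 1]
              + plane[r + 1][c] + plane[r + 1][c + 1] + 2) // 4
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]


def frame_bytes_420(width, height):
    """Full-resolution Y plane plus two quarter-resolution chroma planes."""
    return width * height + 2 * (width // 2) * (height // 2)


bits_per_pixel = frame_bytes_420(320, 240) * 8 / (320 * 240)
print(bits_per_pixel)  # 12.0
```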


Step #2 – Convert to 8x8 Macroblocks; Transform Aspect Ratios Are Designed to Fit the 8x8 Macroblock

E.g. 640 x 480 => 80 x 60 Macroblocks

Discrete Cosine Transform Applied to Each 8x8

– Spatial Intensity to Frequency Transform

– Applied on X Axis (Row)

– Applied on Y Axis (Column)

Set up for Intra-frame (I-frame) Compression
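Applying the DCT first along each row (X axis) and then along each column (Y axis) can be sketched directly. This assumes the orthonormal DCT-II scaling, which matches the (1/4)C(u)C(v) normalization commonly used for 8x8 blocks, and favors clarity over speed; it is not a fast DCT.

```python
import math


def dct_1d(x):
    """Orthonormal 1D DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)))
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row (X axis), then every column (Y axis)."""
    rows = [dct_1d(row) for row in block]
    height, width = len(rows), len(rows[0])
    cols = [dct_1d([rows[r][c] for r in range(height)]) for c in range(width)]
    # cols[c][v] is the coefficient at vertical frequency v, horizontal frequency c.
    return [[cols[c][v] for c in range(width)] for v in range(height)]


flat = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```

For a flat block all the energy lands in the DC term: `coeffs[0][0]` is 800 up to floating-point error, and every other coefficient is essentially zero.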


Convolution Concepts: Math Operation on 2 Functions That Produces a 3rd

Point Spread Function “Sharpen” meets this Definition

So do Many Mask Operations applied to Pixel Neighborhoods


[Figure: two functions f(t) and g(X - t); the area inside their intersection, as X varies, is f convolved with g over t]
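In discrete form, (f * g)[n] = Σ_k f[k] g[n - k]: each output sample sums the products where one sequence overlaps the flipped, shifted other. A minimal full-convolution sketch:

```python
def convolve(f, g):
    """Full discrete convolution: (f * g)[n] = sum over k of f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for k, fk in enumerate(f):
        for j, gj in enumerate(g):
            out[k + j] += fk * gj   # here n = k + j, so g[n - k] is g[j]
    return out


print(convolve([1, 2, 3], [0, 1, 0.5]))  # [0.0, 1.0, 2.5, 4.0, 1.5]
```

Using a 3-tap kernel such as [-1, 3, -1] as g gives a 1D analogue of a sharpening mask, the same neighborhood-mask idea applied to pixels.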


DCT – Discrete Cosine Transform: Correlation of Each Image Block with Discrete Cosine Basis Functions

See http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/example-dct1/

Inverse DCT (iDCT) Restores the Image from the Transformed Coefficients

[Figure: image after DCT and after inverse DCT]

DCT Concepts

F(x) is a sum of sinusoids (with frequency, amplitude)

DCT operates on a discrete number of samples

Can evaluate the DCT sum at any x, even where F(x) is not known

N x N Macro-block has Zero Frequency (DC) at (0,0)

Horizontal Frequency Increases Left to Right

Vertical Frequency Increases Top to Bottom

Can De-convolve (inverse DCT, or iDCT)

Can Eliminate High Frequency Horizontal and Vertical Terms

– Minimal Losses from Truncation (otherwise lossless)

– Loss of High Frequency Image Features (What are These?)
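The effect of eliminating high-frequency terms can be checked on 8-sample scan-lines. This sketch uses an orthonormal 1D DCT-II and its inverse with an arbitrarily chosen cut-off `keep`: it zeroes every coefficient at index `keep` and above, then inverts. A smooth ramp survives almost unchanged, while a sharp edge picks up visible ringing.

```python
import math


def _scale(k, n):
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct(x):
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]


def idct(X):
    n = len(X)
    return [sum(_scale(k, n) * X[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]


def truncate_and_restore(x, keep):
    """Zero all DCT terms with index >= keep, then apply the inverse DCT."""
    X = [c if k < keep else 0.0 for k, c in enumerate(dct(x))]
    return idct(X)


def max_error(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


ramp = [i * 100 / 7 for i in range(8)]   # smooth gradient, 0 to 100
edge = [0.0] * 4 + [100.0] * 4           # sharp step edge, 0 to 100
for name, signal in (("ramp", ramp), ("edge", edge)):
    print(name, round(max_error(signal, truncate_and_restore(signal, keep=4)), 2))
```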


Basic Concept of Waveforms

Complex Waveform is Sum of Simple Fundamentals

Simple Fundamentals Can Be Derived from the Complex Waveform


What Is Lost with DCT Quantization? Noise More Than Anything Else

Complex XY Variable Patterns (Real Science Data?)


– Complex Tiling: Higher Frequency X and Y Terms Can Still be Ignored

– Complex Wood Texture: Most Detail in X, Far Less in Y

– Randomized Texture Image: High X and Y Detail; Most Loss of Detail, But Noisy


Step #2A: Macro-block Discrete Cosine Transform

8x8 Pixel Block – Macro-block (in MPEG-2 Terms, a 16x16 Luma Macroblock Holds Four 8x8 DCT Blocks)

– SD NTSC 720x480 (90x60 Macro-blocks), 3:2 Aspect Ratio

– HD 720 1280x720 (160x90 Macro-blocks), 16:9 AR

– HD 1080 1920x1080 (240x135 Macro-blocks), 16:9 AR
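The block counts above follow from dividing each dimension by the block size. A small helper, assuming partial blocks at the right or bottom edge are padded up to a whole block:

```python
def block_grid(width, height, block=8):
    """Blocks across and down, rounding partial edge blocks up (padding assumed)."""
    return -(-width // block), -(-height // block)   # integer ceiling division


for w, h in [(640, 480), (720, 480), (1280, 720), (1920, 1080)]:
    print(f"{w}x{h}: {block_grid(w, h)}")
```

With 16x16 macroblocks, as MPEG-2 uses for motion compensation, 1080 lines do not divide evenly: `block_grid(1920, 1080, 16)` gives (120, 68), so the coded frame is padded to 1088 lines.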


Step #2B: Macro-block Quantization (Lossy)

Apply an 8x8 Weighting Matrix and Scale Factor to the DCT Coefficients

Produces Lots of Repeated Values (and Zeros) Compared to the Original
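A simplified quantization sketch: each coefficient is divided by its weighting-matrix entry and rounded. Both the matrix and the input coefficients are hypothetical illustrations (the input falls off away from DC, as for a smooth block); MPEG-2's actual default matrices, quantizer-scale factor, and rounding rules are not reproduced here.

```python
def quant_matrix(base=8, step=4):
    """Hypothetical weighting matrix: coarser steps at higher frequencies."""
    return [[base + step * (u + v) for v in range(8)] for u in range(8)]


def quantize(coeffs, q):
    """level[u][v] = round(coeff[u][v] / q[u][v]); this division is the lossy step."""
    return [[round(coeffs[u][v] / q[u][v]) for v in range(8)] for u in range(8)]


# Synthetic coefficients that fall off away from DC (hypothetical input).
coeffs = [[round(1000 / (1 + u + v) ** 2) for v in range(8)] for u in range(8)]
levels = quantize(coeffs, quant_matrix())
zeros = sum(row.count(0) for row in levels)
print(levels[0][:4], zeros)  # [125, 21, 7, 3] 36
```

Over half the 64 levels become zero, and those zeros cluster in the high-frequency corner, which is what the run-length step exploits.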


Decode Process for #2A-B


How Lossy Is the Decoded Macro-Block?
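One way to measure the loss is a round trip through the decode path: DCT, quantize, dequantize (multiply back by the step), inverse DCT, then compare with the original. The sketch uses a direct (slow) 8x8 orthonormal DCT and a single uniform step `q` in place of a weighting matrix; both choices and the test block are assumptions for illustration.

```python
import math


def _basis(k, i, n=8):
    """Orthonormal DCT-II basis value for frequency k at sample i."""
    scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))


def dct2(block):
    """Forward 8x8 DCT, evaluated directly from the 2D double sum (slow but clear)."""
    return [[sum(block[x][y] * _basis(u, x) * _basis(v, y)
                 for x in range(8) for y in range(8))
             for v in range(8)]
            for u in range(8)]


def idct2(coeffs):
    """Inverse 8x8 DCT: weighted sum of the same basis functions."""
    return [[sum(coeffs[u][v] * _basis(u, x) * _basis(v, y)
                 for u in range(8) for v in range(8))
             for y in range(8)]
            for x in range(8)]


def encode_decode(block, q):
    """DCT, quantize with uniform step q, dequantize, inverse DCT."""
    levels = [[round(c / q) for c in row] for row in dct2(block)]   # encoder, lossy
    dequantized = [[level * q for level in row] for row in levels]  # decoder
    return idct2(dequantized)


def mse(a, b):
    return sum((a[x][y] - b[x][y]) ** 2 for x in range(8) for y in range(8)) / 64


# Hypothetical test block: a diagonal gradient with a little texture.
block = [[16 * x + 8 * y + (x * y) % 3 for y in range(8)] for x in range(8)]
for q in (1, 4, 16):
    print(f"q={q}: MSE {mse(block, encode_decode(block, q)):.3f}")
```

Without quantization the round trip is exact up to floating-point error; all of the loss comes from rounding `c / q`.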


OpenCV Macroblock DCT Example

Same Cactus 320x240 with 80x80 DCT Macroblocks (Panels: DCT, iDCT)

Same Cactus 320x240 Again with 8x8 DCT Macroblocks (Panels: DCT, iDCT)


Mathematics for 2D DCT: Frequency Variation on the X and Y Axes from Top Left to Bottom Right

A Straightforward Algorithm Based on the 2D Equation Is O(n^2) per Dimension

Like Cooley-Tukey for the DFT, an O(n log2(n)) DCT Algorithm Has Been Formulated (Arai, Y.; Agui, T.; Nakajima, M.; see also Numerical Recipes: The Art of Scientific Computing, 3rd ed.)

http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/dct2/dct2.c


http://en.wikipedia.org/wiki/File:Dctjpeg.png
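For reference, the 8x8 forward and inverse 2D DCT in the form commonly used for JPEG and MPEG blocks, with f(x, y) the sample at position (x, y) and F(u, v) the coefficient for frequency pair (u, v):

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)
         \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}

f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v)
         \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}

C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}
```

Evaluating the double sum directly for every (u, v) costs O(n^4) per n x n block; applying the 1D transform to rows and then columns reduces that to O(n^3), and a fast 1D DCT reduces it further.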


Step #2C: Macro-block Run-Length and Huffman Encoding

Zig-Zag Run-Length Encoding to Exploit Repeated Data and Zeros Found in the Higher-Order Terms (H.O.T.) of the Quantized DCT

– 86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0, 0, 0, 0, -1, 0, 0, …

Becomes:
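A sketch of the zig-zag scan and the run-length step applied to the sequence above. It assumes the first value is the DC term (MPEG-2 codes DC separately), that the elided "…" tail is all zeros, and it writes the AC terms as (run-of-zeros, level) pairs; the `"EOB"` string is a hypothetical stand-in for the standard's end-of-block code.

```python
def zigzag_order(n=8):
    """(row, col) visiting order for an n x n block, zig-zag style."""
    order = []
    for s in range(2 * n - 1):                      # s = row + col (anti-diagonal)
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                   # even diagonals run up and right
        order.extend((r, s - r) for r in rows)
    return order


def run_length(ac):
    """(run-of-zeros, level) pairs; trailing zeros collapse into one 'EOB'."""
    pairs, run = [], 0
    for level in ac:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")
    return pairs


scan = [86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0, 0, 0, 0, -1, 0, 0]
dc, ac = scan[0], scan[1:]
print(dc, run_length(ac))
```

In an encoder, `zigzag_order()` gives the order in which the quantized 8x8 levels are read out to form a sequence like the one above, so low frequencies come first and the zeros bunch up at the end.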


Huffman Applied to RLE Data

Huffman Tables for MPEG-2 Macro-Blocks Defined in ISO/IEC 13818-2 (Lossless)

Compression Based on Probability of Occurrence

Shannon’s Source Coding Theorem: Ideal Code Length is -log2(P) Bits, P = Probability of Occurrence; Binary Encoding of Symbols
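MPEG-2 uses fixed variable-length code tables from 13818-2 rather than building codes per stream, but the idea behind them can be sketched with a generic Huffman construction. The symbol probabilities below are hypothetical; because they are powers of 1/2, each Huffman code length equals -log2(P) exactly and the average length meets the Shannon entropy.

```python
import heapq
import math
from itertools import count


def huffman_code_lengths(probs):
    """Code length in bits per symbol from the standard Huffman merge."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), [sym]) for sym, p in probs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for sym in group1 + group2:
            lengths[sym] += 1   # each merge adds one bit to every member's code
        heapq.heappush(heap, (p1 + p2, next(tiebreak), group1 + group2))
    return lengths


def entropy_bits(probs):
    """Shannon limit: average of -log2(P), weighted by P."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


# Hypothetical probabilities, e.g. for four run/level symbols.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(probs)
avg = sum(probs[s] * lengths[s] for s in probs)
print(lengths, avg, entropy_bits(probs))
```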


Step #3: Group of Pictures Concept – Transmit Change-Only Data

I-Frame Compressed Only Intra-Frame, by Applying Methods #2A-2C to Macro-Blocks

I-Frame Can Be Decoded Alone

P-Frame Encodes Only Motion-Compensated Differences from the Preceding I- or P-Frame in the GoP

B-Frame Encodes Only Differences Predicted from Both the Preceding and the Following I- or P-Frame

Difference Data Can Be Further Encoded with Lossless Methods, but Without Steps 2A-C (Specifically Quantization), High-Motion Video Could Blow Up the Data Size
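P- and B-frames rely on motion compensation: for each block the encoder searches the reference frame for the best match, then sends a motion vector plus the difference. A minimal exhaustive-search sketch, assuming a sum-of-absolute-differences (SAD) cost, integer-pixel motion, and 4x4 blocks for brevity (MPEG-2 predicts 16x16 macroblocks); the frames are synthetic.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
               for r in range(size) for c in range(size))


def best_motion_vector(cur, ref, top, left, size=4, search=2):
    """Exhaustive integer-pixel search; returns (lowest SAD, (dy, dx))."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= top + dy and top + dy + size <= height
                      and 0 <= left + dx and left + dx + size <= width)
            if not inside:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best


def frame_with_square(top, left, dim=12, value=200):
    """Synthetic frame: a bright 4x4 square on a black background."""
    frame = [[0] * dim for _ in range(dim)]
    for r in range(top, top + 4):
        for c in range(left, left + 4):
            frame[r][c] = value
    return frame


ref = frame_with_square(4, 4)   # reference (I- or P-) frame
cur = frame_with_square(5, 6)   # same square, moved down 1 and right 2
cost, mv = best_motion_vector(cur, ref, top=5, left=6)
print(cost, mv)  # 0 (-1, -2)
```

A zero SAD means the residual for this block is all zeros, so only the motion vector needs to be coded; with high motion outside the search window, the residual grows and so does the bit rate.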


Group of Pictures: High Level View
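Because a B-frame is predicted from a following I- or P-frame, that anchor has to be transmitted and decoded before the B-frames displayed ahead of it. A minimal reordering sketch, assuming frames are labeled by type and display index (the labels are hypothetical) and that trailing B-frames with no later anchor simply stay at the end:

```python
def decode_order(display):
    """Reorder a GoP from display order to decode/transmit order.

    Each B-frame needs the next I- or P-frame as a backward reference,
    so that anchor is moved ahead of the B-frames that precede it.
    """
    out, pending_b = [], []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:                      # I or P anchor frame
            out.append(frame)
            out.extend(pending_b)
            pending_b = []
    return out + pending_b


print(decode_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
```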


Overall MPEG YCrCb Compression

Performance: Standard Definition 720x480x2 (675KB/frame) @ 30fps

– Requires ≈20MB/sec (≈166 Mbps) Uncompressed

– Typical MPEG-2 @ 3.75 Mbps, ≈44x Compression

– Typical MPEG-4 @ 1.5 Mbps, > 100x Compression

– 10 to 20 Programs on QAM 256 (≈38.8 Mbps, 6 MHz/Ch)

– ≈10 MPEG-4 Programs on ATSC 8VSB (19.39 Mbps, 6 MHz/Ch)

HD 720p (1280x720x2, 1800KB/frame) @ 30fps

– Requires ≈53MB/sec (≈442 Mbps) Uncompressed

– Typical MPEG-2 @ 20 Mbps, ≈22x Compression

– Typical MPEG-4 @ 10 Mbps, ≈44x Compression

HD 1080p (1920x1080x2, 4050KB/frame) @ 30fps

– Requires ≈120MB/sec (≈995 Mbps) Uncompressed

– Typical MPEG-2, VC-1 @ 45 Mbps, ≈22x Compression

– Typical MPEG-4 @ 20 Mbps, ≈50x Compression
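The uncompressed rates and compression ratios can be recomputed from the frame size: width x height x 2 bytes (8-bit 4:2:2) x 30 fps x 8 bits, taking Mbps as 10^6 bits per second. A sketch that uses the coded bit rates listed above as inputs:

```python
def uncompressed_mbps(width, height, bytes_per_pixel=2, fps=30):
    """Raw rate in Mbps (10**6 bits/s); 2 bytes/pixel matches 8-bit 4:2:2."""
    return width * height * bytes_per_pixel * fps * 8 / 1e6


formats = [  # name, width, height, MPEG-2 Mbps, MPEG-4 Mbps (rates listed above)
    ("SD", 720, 480, 3.75, 1.5),
    ("720p", 1280, 720, 20, 10),
    ("1080p", 1920, 1080, 45, 20),
]
for name, w, h, mpeg2, mpeg4 in formats:
    raw = uncompressed_mbps(w, h)
    print(f"{name}: {raw:.0f} Mbps raw, "
          f"MPEG-2 {raw / mpeg2:.0f}x, MPEG-4 {raw / mpeg4:.0f}x")
```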


Parsing an Elementary Video Stream


Many 188-Byte Packet Types; the Packet Header Allows Multiplexing of Many Video and Audio Streams on a Carrier
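Each 188-byte transport stream packet starts with a 4-byte header (ISO/IEC 13818-1) whose 13-bit PID identifies the stream the packet belongs to, which is what makes the multiplexing work. A minimal header parser plus a PID counter for demultiplexing; it ignores adaptation fields and payload contents, and the test packets are synthetic.

```python
from collections import Counter
from dataclasses import dataclass

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


@dataclass
class TsHeader:
    transport_error: bool
    payload_unit_start: bool
    priority: bool
    pid: int                        # 13-bit packet identifier
    scrambling: int                 # 2 bits
    adaptation_field_control: int   # 2 bits: payload and/or adaptation field
    continuity_counter: int         # 4 bits, increments per PID


def parse_ts_header(packet: bytes) -> TsHeader:
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte packet starting with sync byte 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TsHeader(
        transport_error=bool(b1 & 0x80),
        payload_unit_start=bool(b1 & 0x40),
        priority=bool(b1 & 0x20),
        pid=((b1 & 0x1F) << 8) | b2,
        scrambling=(b3 >> 6) & 0x03,
        adaptation_field_control=(b3 >> 4) & 0x03,
        continuity_counter=b3 & 0x0F,
    )


def count_pids(stream: bytes) -> Counter:
    """Demultiplex by PID: count packets per stream in a packet-aligned capture."""
    return Counter(parse_ts_header(stream[i:i + TS_PACKET_SIZE]).pid
                   for i in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE))


# Synthetic packets: PID 0x100 with payload-unit-start set, and PID 0 (the PAT).
video = bytes([0x47, 0x41, 0x00, 0x10]) + bytes(184)
pat = bytes([0x47, 0x40, 0x00, 0x10]) + bytes(184)
print(count_pids(video + pat + video))
```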


MPEG-4 vs. MPEG-2

MPEG-2 – Defined by ISO/IEC 13818-1, 13818-2 – Leverages MPEG-1 (Moving Picture Experts Group, Formed 1988)

– Widely Used for Digital Video – Digital Cable TV, DVD

– Transport Stream Designed for Broadcast (Tolerates a Lossy Channel; No Beginning or End of Stream)

ATSC – Advanced Television Systems Committee (HDTV Broadcast) – 8VSB Modulation – 8-Level Vestigial Sideband Modulation, 6 MHz Channel, 19.39 Mbps, Reed-Solomon Error Correction

– Up to 1080p (1920x1080) Video Resolution

– AC-3 (Dolby) Audio

DVB – Digital Video Broadcasting (Europe, Satellite)

– Program Stream designed for Playback Media (DVD, Flash, HDD, etc.)

MPEG-4 – Defined by ISO/IEC 14496 (1998) – Leverages MPEG-2 Standards for Program/Transport, Encode/Decode

– Better Compression Rates (Improved Motion Prediction for P and B Frames), MPEG-4 Part 10 (H.264/AVC), e.g. Blu-ray

– Extensions for Digital Rights Management

– Advanced Audio Coding (AAC)

– Becoming More Widely Deployed for HD Because of Lower Bit-Rate Transport Streams
