Image and Video Compression
A presentation to Avocent
Noel O’Connor, Andrew Kinane, Daniel Larkin
Centre for Digital Video Processing
19/09/2006

Overview

• Lossless Compression
  – Entropy coding: a brief review
    • Huffman Coding
    • Arithmetic Coding
  – Lossless Compression Standards
    • The FAX Group Standards, JBIG, Lossless JPEG
• Lossy Compression
  – Generic Codec Structure
    • DCT/IDCT
    • Quantization
    • Motion Estimation
    • Motion Compensation
  – Lossy Compression Standards
    • JPEG, JPEG2000, H.261 / H.263 / H.264, MPEG-1/-2/-4
• Image Analysis Techniques
  – Visual Feature Extraction

Lossless Compression

Entropy Coding

Entropy Coding

• Also referred to as source coding
• Assign each symbol a binary codeword
  – Allocate a specific string of bits to a symbol
• Based on information theory:
  – S = {s1 … sN} is the set of symbols to encode, with probabilities p1 … pN
  – Entropy H(s) is a measure of the information content:
    $$H(s) = -\sum_{i=1}^{N} p_i \log_2 p_i \quad \text{bits/symbol}$$
  – Specifies a lower bound on the average number of bits per symbol that a lossless code can achieve

Huffman Coding

• A form of Variable Length Coding:
  – Assign shorter code-words to symbols most likely to occur, longer to those less likely
• Problem: must choose code-words carefully!
  – Must obey prefix condition so decoder can parse bitstream

[Figure: the sequence s1, s4, s3, s2 is sent as the bitstream 1 0 1 0 0 1 1 0 1; the decoder should recover s1 s4 s3 s2, but without the prefix condition it cannot tell after s1 whether s2 or s4 follows]

Huffman Coding

• Ensures instantaneously parseable code-words
• 100% efficient when p1 … pN are negative integer powers of 2 (0.5, 0.25, etc …)
• Algorithm: generate Huffman coding tree:
  – Form the tree:
    • Sort the symbols by their probabilities
    • Merge the two smallest probabilities by adding them and produce a new node in the tree
    • Repeat until only a single node is reached
  – Assign bits:
    • Traverse the tree from the root to the leaf nodes, assigning each branch encountered a one or zero.
• Decoding based on storing codewords in a specially constructed look-up table (LUT)

Huffman Coding

• Generate code-words for each grey level

• S = {s1 s2 s3 s4 s5} = {0,4,5,6,7}

• p1 p2 p3 p4 p5 = 0.125, 0.484, 0.25, 0.125, 0.016
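The tree-building steps can be sketched in a few lines of Python for the grey-level distribution above. This is only an illustrative sketch: the function name `huffman_code`, the use of a heap, and the choice of which branch gets 0 or 1 are assumptions, not taken from the slides.

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Return {symbol: bit string} for a dict {symbol: probability}."""
    tie = count()  # tie-breaker so the heap never has to compare the dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable nodes into a new node.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

# Grey levels and probabilities from the slide.
grey_levels = {0: 0.125, 4: 0.484, 5: 0.25, 6: 0.125, 7: 0.016}
codes = huffman_code(grey_levels)
for level, word in sorted(codes.items()):
    print(level, word)
```

The most probable grey level (4) receives a 1-bit code-word and the least probable (7) a 4-bit one. The two levels with probability 0.125 tie, so which of them gets 3 bits and which gets 4 depends on the tie-breaking rule.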

Huffman Coding

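• Efficiency:
  – Calculate the average coding rate R: sum over all symbols of symbol probability (pi) x code-word length (li)
    $$R = \sum_{i=1}^{N} p_i l_i \quad \text{bits/symbol}$$
  – Compare to the entropy H(s): efficiency = H(s) / R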
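As a rough check of the efficiency calculation, the sketch below evaluates R and H(s) for the five grey-level probabilities used in the Huffman example. The code-word lengths are an assumption: they are the lengths that the merge procedure produces for these probabilities (the two symbols with probability 0.125 may swap lengths without changing R).

```python
import math

# Probabilities p1..p5 from the grey-level example.
p = [0.125, 0.484, 0.25, 0.125, 0.016]
# Assumed Huffman code-word lengths l1..l5 for those symbols.
lengths = [4, 1, 2, 3, 4]

rate = sum(pi * li for pi, li in zip(p, lengths))   # average coding rate R
entropy = -sum(pi * math.log2(pi) for pi in p)      # entropy H(s)
efficiency = entropy / rate

print(f"R = {rate:.3f} bits/symbol")
print(f"H = {entropy:.3f} bits/symbol")
print(f"efficiency = {efficiency:.1%}")
```

Under these assumptions the code spends about 1.92 bits/symbol against an entropy of about 1.85 bits/symbol, i.e. roughly 96% efficiency.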

Huffman Coding

• Problems:
  – Lower bound of 1 bit/symbol
  – Does not facilitate adaptive coding
• Example

Arithmetic Coding

• Treat groups of symbols … but maintain a symbol-by-symbol encoding mechanism

• Assign a single codeword to a group of symbols

• Codeword represents a half-open interval on [0.0, 1.0)

• By assigning enough precision bits, one interval can be distinguished from another

• Symbols with higher probabilities correspond to larger intervals, thereby requiring fewer precision bits

Arithmetic Coding

• S = {a, b}, p1 p2 = 1/3, 2/3
• First symbol narrows interval to that symbol’s range:
  – Subsequent symbols further restrict the current interval.
• Decoding reverses this:
  – Receives number in [0.0, 1.0)
  – Checks which symbol’s range contains this number and decodes that symbol
  – Since the lower & upper bounds of the symbol are known, their effects on the encoded number can be reversed
  – Gives a new number …
  – REPEAT
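The interval narrowing and its reversal can be illustrated with exact fractions for S = {a, b}, p = 1/3, 2/3. This is a minimal sketch, not a practical coder: the names `RANGES`, `encode` and `decode` are hypothetical, the ordering of a below b on the interval is an assumption, and the decoder is simply told the message length.

```python
from fractions import Fraction

# Assumed symbol ranges on [0, 1): a -> [0, 1/3), b -> [1/3, 1).
RANGES = {
    "a": (Fraction(0), Fraction(1, 3)),
    "b": (Fraction(1, 3), Fraction(1)),
}

def encode(message):
    """Narrow [low, high) once per symbol and return the final interval."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        s_low, s_high = RANGES[sym]
        low, high = low + width * s_low, low + width * s_high
    return low, high

def decode(value, length):
    """Recover `length` symbols from any number inside the final interval."""
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in RANGES.items():
            if s_low <= value < s_high:
                out.append(sym)
                # Undo the effect of this symbol's bounds on the number.
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

low, high = encode("bab")
print(low, high)        # final interval for "bab"
print(decode(low, 3))   # any value in [low, high) decodes to "bab"
```

The width of the final interval equals the product of the symbol probabilities (2/3 x 1/3 x 2/3 = 4/27), which is why more probable messages need fewer bits to identify.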

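Arithmetic Coding

• Incremental transmission
• Example: message “BILL<space>GATES”

[Figure: the interval from 0.0 to 1.0 narrowing symbol by symbol, with the leading digits of the code value emitted incrementally: 2, 25, 257, 2572, …, 257216, 2572167]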

Arithmetic Coding

• Can be performed very efficiently using 16/32 bit integer mathematics
• Bits are transmitted as they become available
• Simplification: use the value 0.999… rather than 1.0
• In binary arithmetic this corresponds to 0.111…
• Only use fractional part => only need integers
• High initially stores 0xFFFF, whilst Low stores 0x0000
• For each symbol encoded, examine most significant bit of both High and Low:
  – If these bits are the same, output that bit

Lossless Compression

Standards

ITU-T Facsimile

• ITU-T Rec. T.4 (Group 3)
• Targets scanned business documents:
  – Binary images: white (1), black (0)
• Two modes:
  – Modified Huffman (MH):
    • Run-length encoding is used to form runs of 1s and 0s for each line in the image;
    • Huffman coding applied to these (run, symbol) pairs;
    • Different Huffman codes for runs of 1s and 0s;
    • A special end-of-line (EOL) symbol is encoded for error detection purposes.
  – Modified Read (MR):
    • Pixel values from the previous line used as predictors for current pixels to be encoded;
    • Prediction residual is then encoded using Huffman coding.
  – MR mode is periodically interspersed with MH mode.
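The run-length step of MH mode can be sketched as follows. Only the formation of (run, symbol) pairs is shown; the actual T.4 Huffman tables and the EOL code are not reproduced, and the function names are hypothetical.

```python
from itertools import groupby

def run_lengths(line):
    """Turn one scan line of 0/1 pixels into (run, symbol) pairs."""
    return [(len(list(group)), pixel) for pixel, group in groupby(line)]

def expand(runs):
    """Inverse operation: rebuild the scan line from (run, symbol) pairs."""
    return [pixel for run, pixel in runs for _ in range(run)]

# White = 1, black = 0, as on the slide.
line = [1] * 6 + [0] * 3 + [1] * 4 + [0] * 1 + [1] * 2
runs = run_lengths(line)
print(runs)
```

In MH mode each of these pairs would then be mapped to a Huffman code-word, using one table for white runs and another for black runs.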

JBIG

• Joint Bi-level Image Experts Group (JBIG) standard, developed jointly by ITU-T and ISO
• Targets bi-level images:
  – may be either business documents or grey-scale images of natural scenes rendered as bi-level images.
• Uses adaptive arithmetic encoding:
  – Modeling step estimates probability of next symbol based on a context consisting of local pixels;
  – Probability is then used to drive the arithmetic encoder;
  – JBIG can be applied to grey-scale images by treating each bit-plane of the grey-level image as a bi-level image.

Lossless JPEG

• Joint Photographic Experts Group (JPEG) has a lossless image compression mode.
• Prediction for pixel to be encoded based on a context of previously encoded pixels:
  – Different ways for forming the prediction;
  – Method used encoded as side-information for each scan line.
• To encode the prediction residual:
  – (length, magnitude) pair formed;
  – length indicates the number of bits used to encode the magnitude:
    • A static Huffman code is used.
  – magnitude is the actual residual value directly encoded.

Lossless JPEG

• p = 190
• p1 = 184, p2 = 176
• P = 180
• R = 180 - 190 = -10
• Encoded as the event (4, 0101)
  – Negative residuals encoded as 1s complement
  – Huffman code for 4 is 001, which gives the final codeword “0010101”
• Decoder:
  – Calculates the prediction value (180)
  – Parses the Huffman code, which allows decoding of the magnitude (0101)
  – Detects a leading zero => knows the value must be negative, so the four magnitude bits are decoded as -10.
  – Reconstruction: p = P - R = 180 - (-10) = 190
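The worked example can be reproduced with a short sketch. The helper names are hypothetical, the predictor is assumed to be the average of p1 and p2 (which matches P = 180), and only the Huffman code quoted on the slide (001 for length 4) is included rather than a full table.

```python
def encode_residual(r):
    """Return (length, magnitude bits); negatives use 1s complement."""
    if r == 0:
        return 0, ""
    length = abs(r).bit_length()
    value = r if r > 0 else ~(-r) & ((1 << length) - 1)
    return length, format(value, f"0{length}b")

def decode_residual(bits):
    """A leading 0 marks a negative value stored in 1s complement."""
    if not bits:
        return 0
    value = int(bits, 2)
    if bits[0] == "1":
        return value
    return -(~value & ((1 << len(bits)) - 1))

LENGTH_CODE = {4: "001"}  # only the entry given on the slide

p, p1, p2 = 190, 184, 176
P = (p1 + p2) // 2          # assumed predictor: average of the two neighbours
R = P - p                   # residual as defined on the slide
length, bits = encode_residual(R)
codeword = LENGTH_CODE[length] + bits
print(P, R, (length, bits), codeword)

# Decoder side: recover R from the magnitude bits, then p = P - R.
reconstructed = P - decode_residual(bits)
print(reconstructed)
```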

Lossy Compression

Generic structure of a video codec

Redundancy in Video Sequences

• Video compression targets 3 kinds of redundancy:
  – Spatial: the correlation that exists between (groups of) pixels;
  – Temporal: similarity between video frames;
  – Perceptual: Human Visual System (HVS) is less sensitive to high-frequency information.

• Lossy compression throws information away as part of these processes

• Remaining information is encoded losslessly using entropy coding

Redundancy in Video Sequences

• Spatial redundancy:
  – Transform data to be encoded into a new representation where data is less correlated;
  – Leads to a more compact representation.
• Temporal redundancy:
  – Only encode difference between 2 video frames (lower entropy);
  – Form prediction of frame to be encoded and encode prediction residual;
• Perceptual redundancy:
  – Suppress/remove high frequency components corresponding to fine image detail.

Coding Modes

• INTRA:
  – Encode a frame completely independently (i.e. with no reference to previous/future frames);
  – Forms random access point in bitstream, resets encoding, limits error propagation;
  – Equivalent to having a JPEG-encoded still image at periodic intervals in bitstream.

[Figure: frame sequence starting at Frame 0, with an INTRA frame every N frames]

Coding Modes

• INTER:
  – Use a previous/future frame (termed reference frame) as the basis for a prediction of the current frame;
  – Could just simply subtract reference frame from current frame;
  – Or use a more sophisticated prediction method;
  – Need to use reconstructed frame as basis for prediction so that encoder/decoder stay synchronised.

[Figure: reference frame example (Frame 0)]

Coding Unit

• Break image/frame up into 16 x 16 “macro-blocks”
• For YUV (4:2:0 sampling):
  – 4 8x8 luminance pixel blocks;
  – 2 8x8 chrominance pixel blocks.
• Coding decisions made on macro-block basis:
  – INTRA/INTER coding mode;
  – prediction method if INTER;
  – Loss introduced.

• Decisions flagged in bitstream syntax.

Generic Codec Structure

Discrete Cosine Transform (DCT)

• Why DCT?
• What is it?
• How does it work?
• How is it computed (in reality)?
• Adoption and variations
• What about the DWT?
• Quantisation

Why DCT?

• Neighbouring pixels are likely to be similar
• The same is true for prediction residual data
• Want to exploit this spatial correlation
• We want a transform that:
  – Removes correlation from data
  – Packs signal energy into as few coefficients as possible
• Coefficients suitable for entropy coding

Why DCT?

• Optimal solution
  – Use eigenvectors of the covariance matrix of the input pixel data
  – Order based on size of eigenvalue
  – Based on theory of principal component analysis (PCA)
  – Referred to as the Karhunen-Loeve Transform (KLT) [rao90]
    • Achieves complete de-correlation
    • Packs most energy into fewest coefficients
    • Minimises MSE for a given number of coefficients (Quantisation)
    • Minimises the entropy
  – Disadvantages:
    • Very computationally demanding
    • Transform kernel is data dependent
    • Kernel must be sent to decoder also!
    • Not practical in a real compression system
• Compromise => The DCT

What is the DCT?

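• Treat frame as a grid of 8x8 pixel blocks
  – Pixel data (intra block)
  – Prediction Residual (inter block)
• Compute 8x8 2D DCT on each block
• Formula:
  $$F(u,v) = \frac{1}{4} C_u C_v \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$
  $$C_u, C_v = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases}$$
• Basis functions derived using Fourier theory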

What is the DCT?

• Fourier’s theorem and the Nyquist sampling criterion mean only certain discrete frequencies can be present in an 8x8 block of sampled data.
• DCT coefficients tell us “how much” of a particular frequency is present in a particular block
  – Very crude explanation!
• Inverse DCT (IDCT) reverses this process
  – Essentially Fourier synthesis
  $$f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C_u C_v F(u,v) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$
  $$C_u, C_v = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases}$$
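A direct, unoptimised transcription of the 8x8 DCT and IDCT definitions helps to check the formulas. It is a sketch for illustration only (real codecs use fast algorithms), and the function names `dct2` and `idct2` are hypothetical.

```python
import math

N = 8

def c(k):
    """Normalisation factor: 1/sqrt(2) for index 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(x, u):
    return math.cos((2 * x + 1) * u * math.pi / 16)

def dct2(f):
    """Forward 8x8 DCT: f[x][y] -> F[u][v], straight from the definition."""
    return [[0.25 * c(u) * c(v) * sum(f[x][y] * basis(x, u) * basis(y, v)
                                      for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(F):
    """Inverse 8x8 DCT: F[u][v] -> f[x][y]."""
    return [[0.25 * sum(c(u) * c(v) * F[u][v] * basis(x, u) * basis(y, v)
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

flat = [[128] * N for _ in range(N)]           # block with no detail
ramp = [[8 * x + y for y in range(N)] for x in range(N)]

F_flat = dct2(flat)
print(round(F_flat[0][0]))                     # all energy sits in the DC term
restored = idct2(dct2(ramp))
print(max(abs(restored[x][y] - ramp[x][y]) for x in range(N) for y in range(N)))
```

For the flat block every coefficient except F(0,0) is zero to within rounding error, and the round trip through `idct2` recovers the ramp block, which is consistent with the DCT itself being lossless.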

How does the DCT work?

• DCT does not compress anything in isolation!
• Compression is achieved by the quantiser and entropy coding
• DCT output is easier to compress though
• Most natural video is dominated by low frequencies

How does the DCT work?

• Human eye less sensitive to high frequencies
  – Use a quantiser whose step size depends on frequency
  – Effectively discard perceptually unimportant data
  – After quantisation there will be many zero valued coeffs
    • Typically only 5 or 6 non-zero valued coeffs [xanthopoulos99]
    • Suitable for run length and entropy coding

How does the DCT work?

• Zig-zag scan
  – Keep statistically related coeffs together
  – Better run-length coding
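One way to generate a zig-zag order is to sort the block positions by anti-diagonal and alternate the direction on each diagonal. The sketch below assumes the conventional JPEG-style order starting (0,0), (0,1), (1,0), (2,0), …; the function name and the sample coefficient values are made up for illustration.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Odd diagonals run top-right to bottom-left, even ones the other way.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )

# Hypothetical quantised block: a few low-frequency coefficients survive.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 52, -3, 2, 1

order = zigzag_order()
scanned = [block[r][c] for r, c in order]
print(order[:6])
print(scanned[:8], "... then", scanned[8:].count(0), "zeros")
```

Because the non-zero values cluster at the start of the scan, the long tail of zeros collapses into a single run for the run-length coder.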

How is the DCT Computed?

• Most implementations exploit the fact that the 2D DCT is separable
  – Compute 1D DCT on each column
  – Compute 1D DCT on each resultant row
  – 16 x 1D 8-point DCTs in total
• Need efficient implementation of 1D 8-point DCT
  – 30 years of research in this field
  – Basic implementation (64 multiplications, 56 additions)
  – Fast implementation [loeffler89] (11 multiplications, 29 additions)
  – Video codec optimised implementation “AAN” [arai89] (5 multiplications, 29 additions)
  – Arithmetic precision a vital decision
• If constraint is 1920x1080 @ 30Hz
  – 972000 8x8 luminance blocks per second (240 x 135 blocks per frame x 30 frames)
  – Need at least 171x10^6 multiplications and 451x10^6 additions per second using Loeffler!
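The throughput requirement follows from simple counting, sketched below. The calculation assumes luminance blocks only and 16 one-dimensional transforms per 8x8 block; a 1920x1080 frame holds 240 x 135 = 32400 such blocks, so 30 Hz gives 972000 blocks per second. The dictionary keys are just labels for the three per-transform operation counts quoted on the slide.

```python
WIDTH, HEIGHT, FPS = 1920, 1080, 30
TRANSFORMS_PER_BLOCK = 16   # 8 column DCTs + 8 row DCTs

# (multiplications, additions) per 1D 8-point DCT, as quoted on the slide.
COSTS = {"basic": (64, 56), "loeffler89": (11, 29), "arai89": (5, 29)}

blocks_per_second = (WIDTH // 8) * (HEIGHT // 8) * FPS

def ops_per_second(name):
    """Return (multiplications, additions) per second for one algorithm."""
    mults, adds = COSTS[name]
    scale = blocks_per_second * TRANSFORMS_PER_BLOCK
    return scale * mults, scale * adds

print(blocks_per_second)
for name in COSTS:
    print(name, ops_per_second(name))
```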

How is the DCT Computed?

• Sometimes dedicated hardware needed
  – Performance and/or power reasons
• Hardware architecture taxonomy

[Figure: taxonomy of DCT implementations. Node labels: DCT Implementation; Software; Hardware; Fast Algorithm; Distributed Arithmetic; ROM Based; Adder Based; Systolic Array; Recursive; CORDIC; Approx Based; Integer Encoding]

Adoption and Variations

• 8x8 DCT
  – Used in JPEG, H.261, H.263, MPEG-1, MPEG-2, MPEG-4 with specific quality requirements
• Shape Adaptive DCT
  – Used in MPEG-4 Advanced Coding Efficiency (ACE) profile
  – Kernel basis functions determined by object shape
• Integer DCT Approximation
  – Used in H.264
  – Block size of 4x4 and 8x8 depending on mode
  – Avoids the “IDCT mismatch” problem
  – Less computationally demanding (16bit integer arith)
  – More features (can discuss later if necessary)

What about the DWT

• Discrete Wavelet Transform (DWT)
• Used by JPEG-2000
• MPEG-4 uses SA-DWT (for static shape textures)
• Why? “Better than Fourier analysis for non-stationary data”
• Inherently scalable
  – Involves successive LPF and HPF of data and subsampling
• More efficient at very low bit rates
  – DCT and coarse Q => Blocking artefacts
  – DWT and coarse Q => Blurring/smearing (much less perceptible)
• More computationally demanding than DCT

What is Quantisation?

• A lossy process
• Get rid of information
  – Gives compression gain
  – Try to minimise distortion
  – Try to reduce entropy
• Two primary types
  – Scalar quantiser (one to one)
  – Vector quantiser (many to one)

Scalar Quantiser

• Need to find optimal values for
  – Decision levels di
  – Reconstruction levels ri
• Difficult in general!

Scalar Quantiser

• Aim to minimise distortion
  – Minimise MSE => Lloyd-Max quantiser
• A good quantiser design depends on probability distribution of the input data
  – Want less error for more probable inputs
• Case 1: Uniform distribution
  – Decision bands all same width: $d_{i+1} - d_i = \Delta$
  – Reconstruction levels equally spaced: $r_i = d_i + \frac{\Delta}{2}$
  – Referred to as a “linear quantiser”
  – Used frequently for simplicity

Scalar Quantiser

• Case 2: Piecewise constant distribution
  – Used when # of decision levels N is large
  – Decision level solution difficult (use numerical methods for Lagrange multipliers)
  – Reconstruction levels: $r_i = \frac{d_i + d_{i+1}}{2}$

Scalar Quantiser

• Case 3: Nonuniform distribution
  – Need numerical methods for di and ri
  – Tables available for standard distributions (Gaussian, Laplacian, Rayleigh, …) for popular N
  – This is a true Lloyd-Max quantiser (or optimum mean square quantiser)
• Case 4: Uniform quantiser
  – Uniform refers to equal spacing between decision levels regardless of distribution
  – Similar structure to ‘Case 1’ but different performance because distribution not uniform
  – Commonly used (e.g. in JPEG, …)

Scalar Quantiser Performance

• MSE correlates well with subjective degradation
• Don’t rely on MSE minimisation in isolation though
• Need to consider overall rate-distortion
  – Measures MSE as a function of number of bits n: $f(n) = a\,2^{-bn}$
  – Constants a and b depend on distribution
  – When designing a quantiser for each DCT coefficient i need to know ni
  – 64 quantisers: $f(n_i) = a\,2^{-b n_i}, \quad 0 \le i \le 63$
• How to determine ni (number of bits per coefficient)?
  – Depends on variance of coefficient i relative to others and specified average bitrate nav
  – Bit allocation algorithm paradigm

Bit allocation algorithms

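• Try to keep the distortion $D_i(n_i) = \sigma_i^2 f(n_i)$ constant across coefficients
• As variance increases, distortion is kept down by using more bits
• Optimal allocation for N coefficients:
  $$n_i = n_{av} + \frac{1}{b} \log_2 \frac{\sigma_i^2}{\left( \prod_{j=0}^{N-1} \sigma_j^2 \right)^{1/N}}$$
• Often a rate controller after entropy encoder with feedback path to quantiser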
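The allocation rule gives each coefficient the average bit budget plus a correction that depends on how its variance compares with the geometric mean of all variances. A small sketch, assuming b = 2 and made-up variances (neither comes from the slides):

```python
import math

def allocate_bits(variances, n_av, b=2.0):
    """n_i = n_av + (1/b) * log2(var_i / geometric mean of all variances)."""
    n = len(variances)
    log_geo_mean = sum(math.log2(v) for v in variances) / n
    return [n_av + (math.log2(v) - log_geo_mean) / b for v in variances]

variances = [16.0, 4.0, 1.0, 1.0]   # hypothetical coefficient variances
bits = allocate_bits(variances, n_av=2.0)
print(bits)
print(sum(bits) / len(bits))        # average stays at n_av
```

The result is generally fractional and can go negative for very low-variance coefficients, so a practical allocator would round and clip the values; that step is not shown here.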

Scalar Quantiser Summary

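• Uniform quantiser most commonly used
• In fact, rather than transmitting a quantised coefficient, usually transmit the quantisation index
  $$I(u,v) = \mathrm{round}\left( \frac{F(u,v)}{\Delta(u,v)} \right)$$
  where $\Delta(u,v)$ is the quantiser step size for coefficient (u,v)
• This has much lower entropy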
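A minimal sketch of index-based uniform quantisation follows. The rounding rule (nearest integer, halves away from zero), the function names and the sample values are assumptions for illustration; individual standards define their own rounding and dead-zone behaviour.

```python
import math

def quantise(coeff, step):
    """Quantisation index: coeff / step rounded to the nearest integer."""
    sign = -1 if coeff < 0 else 1
    return sign * math.floor(abs(coeff) / step + 0.5)

def reconstruct(index, step):
    """Decoder side: scale the index back up by the step size."""
    return index * step

coeffs = [100, -24, -7, 3]   # hypothetical DCT coefficients
step = 16                    # hypothetical step size
indices = [quantise(f, step) for f in coeffs]
rebuilt = [reconstruct(i, step) for i in indices]
print(indices)
print(rebuilt)
```

The indices span a much smaller range than the coefficients and small values collapse to zero, which is what lowers the entropy of the transmitted data.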

Vector Quantiser

• Quantise blocks of samples together
  – Each block assigned a single code
• A code book used to find code for block
• Code book can be dynamic or pre-defined
• Each pattern has specific encoding
• Can give very good performance
• Quite computationally expensive
• Difficult to design tables
• Used by GIF standard
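The encoding side of a vector quantiser is a nearest-neighbour search through the code book. The sketch below uses a tiny hand-made code book of 4-sample blocks and squared error as the distance; both are assumptions for illustration, and code book design (the hard part) is not shown.

```python
def nearest_codeword(block, codebook):
    """Return the index of the code book entry closest to the block."""
    def squared_error(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)),
               key=lambda k: squared_error(block, codebook[k]))

# Hypothetical pre-defined code book of 2x2 blocks, flattened to 4 samples.
codebook = [
    (0, 0, 0, 0),
    (255, 255, 255, 255),
    (0, 255, 0, 255),
    (128, 128, 128, 128),
]

blocks = [(10, 5, 0, 3), (250, 240, 255, 251), (120, 130, 125, 131), (3, 250, 8, 240)]
codes = [nearest_codeword(b, codebook) for b in blocks]
decoded = [codebook[k] for k in codes]
print(codes)
```

Each block is replaced by a single index, so the search cost grows with the code book size while decoding is a simple table look-up.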

Demo

Compression gain

Perceptual quality

Motion Estimation & Compensation

• Exploiting temporal redundancy
• Motion Estimation
  – Block matching algorithm overview
    • Matching Criteria
    • Selection of Search Strategies
• More advanced motion estimation techniques
• Software / Hardware Considerations
• Motion Compensation
• Adoption in standards discussed later

Exploiting Temporal Redundancy

[Figure: A) Frame number 1; B) Frame number 2; C) Residual = frame 1 - frame 2; D) Scaled residual (for ease of viewing)]

• Very slight change between successive frames (e.g. A & B)
• Camera & Object Motion
• Temporal prediction model at encoder & decoder provides compression if:
  – model parameters + correction terms < raw pixel information
• e.g. Frame differencing (C)
  – Entropy
    • B = 7.15 bits/pixel
    • C = 4.38 bits/pixel
• More complex models can reduce entropy further
  – Computational expense, memory and prediction performance trade off
• Temporal Prediction model
  – Motion estimation
  – Motion compensation
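The entropy comparison between a frame and its frame-difference residual can be sketched as follows. This is a minimal illustration only: the 4x4 frames are made-up values, and the helper names (`entropy_bits_per_pixel`, `frame_difference`, `flatten`) are hypothetical, not taken from the presentation.

```python
import math
from collections import Counter


def entropy_bits_per_pixel(values):
    """First-order entropy (bits/symbol) of a flat sequence of pixel values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def frame_difference(frame1, frame2):
    """Residual = frame1 - frame2, computed pixel by pixel."""
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(frame1, frame2)]


def flatten(frame):
    return [p for row in frame for p in row]


# Toy frames (assumed data): frame2 equals frame1 except for one pixel.
frame1 = [[10, 20, 30, 40], [50, 60, 70, 80],
          [90, 100, 110, 120], [130, 140, 150, 160]]
frame2 = [row[:] for row in frame1]
frame2[1][2] = 72

residual = frame_difference(frame1, frame2)
print("frame 2 entropy :", entropy_bits_per_pixel(flatten(frame2)))
print("residual entropy:", entropy_bits_per_pixel(flatten(residual)))
```

Because almost every residual pixel is zero, the residual's entropy is far below that of the raw frame, which is the effect the B versus C figures above describe.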


Taxonomy of Motion Estimation Algorithms

• Good Motion Estimation reviews: [Mitchell96][Furht97][Kuhn99]

[Figure: taxonomy tree of motion estimation algorithms]

• Motion Estimation Algorithms
  – Time Domain
    • Gradient descent algorithms: pel recursive, block recursive
    • Matching Algorithms: Feature Matching, Block Matching
  – Frequency Domain
    • Wavelet based matching
    • Phase correlation
    • DCT based matching
• Block Matching
  – Search Strategy: search space reduction, fast heuristic search strategies, block subsampling / hierarchical, prediction
  – Matching Criteria: Mean Squared Error, Mean Absolute Error, Sum of Absolute Differences, Binary Block Matching, SAD summation truncation, SAD estimation, Reduced Bit Mean Absolute Difference, Minimised Maximum Error function, Pixel Difference Classification, Different Pixel Count, Adaptive Bit Truncation, Mean Absolute Difference of Means
  – Other Issues: block size (fixed / variable), number of reference frames
  – Optimisations: rate / distortion, complexity / distortion


Block Matching Algorithm

• For each MxN block in the current frame, find the best matching block within a predetermined or adaptive ±S pel search range in one or more reference frames
  – Estimates motion of a group of pixels
  – Assumes translational motion only
  – Typically operates on luminance component only
  – Good trade off between computational complexity & prediction accuracy
• Motion vector (relative offsets to the best match) undergoes VLC
• Prediction Residual undergoes further processing (DCT, VLC, etc)


Matching Criteria

• At each MxN block search position a matching criterion is evaluated
• Wide variety of matching criteria:
  – Mean Squared Error: MSE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( B_{curr}(i,j) - B_{ref}(i,j) \right)^2
  – Mean Absolute Difference: MAD = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| B_{curr}(i,j) - B_{ref}(i,j) \right|
  – Sum of Absolute Differences: SAD = \sum_{i=1}^{M} \sum_{j=1}^{N} \left| B_{curr}(i,j) - B_{ref}(i,j) \right|
• Reduced complexity matching criteria
  – Binary Block Match: BBM = \sum_{i=1}^{M} \sum_{j=1}^{N} B_{curr}(i,j) \oplus B_{ref}(i,j)
• Others
  – Cross correlation
  – SAD summation truncation
  – SAD estimation
  – Reduced Bit Mean Absolute Difference
  – Minimised Maximum Error function
  – Etc
• Choice of matching criterion is a complexity/prediction performance trade off
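The matching criteria listed on this slide can be written out directly for two equally sized blocks. The sketch below is illustrative: blocks are plain lists of rows, the function names are hypothetical, and the binary block match is assumed to be an XOR count over blocks that have already been reduced to 1-bit pixels.

```python
def sad(cur, ref):
    """Sum of absolute differences between two M x N blocks."""
    return sum(abs(c - r) for rc, rr in zip(cur, ref) for c, r in zip(rc, rr))


def mad(cur, ref):
    """Mean absolute difference: SAD normalised by the block area."""
    return sad(cur, ref) / (len(cur) * len(cur[0]))


def mse(cur, ref):
    """Mean squared error between two M x N blocks."""
    total = sum((c - r) ** 2 for rc, rr in zip(cur, ref) for c, r in zip(rc, rr))
    return total / (len(cur) * len(cur[0]))


def bbm(cur_bits, ref_bits):
    """Binary block match (assumed form): count of differing 1-bit pixels."""
    return sum(c ^ r for rc, rr in zip(cur_bits, ref_bits) for c, r in zip(rc, rr))


cur = [[1, 2], [3, 4]]
ref = [[1, 4], [0, 4]]
print(sad(cur, ref), mad(cur, ref), mse(cur, ref))
print(bbm([[1, 0], [1, 1]], [[1, 1], [0, 1]]))
```

SAD avoids both the multiplication in MSE and the division in MAD, which is one reason it is popular in hardware.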


Search Strategies (1/4)

• Many possible search strategies!
• Full Search: search every position
  – Best results, but very computationally expensive
  – Operations required to generate 1 MV for 1 current block:
    • (2S+1)^2 block matches
    • For each pixel in an M * N block match: subtract, absolute, accumulate
    • After each block match, minimum SAD comparison
    • Therefore total operations:
      » (2S+1)^2 * (M * N * 3 + 1), e.g. S=8: 289 * (M * N * 3 + 1)
• Reduce computational expense
  – Logarithmic: reduces number of search positions
    • Assumes the matching criterion increases monotonically moving away from the minimum point, so the search iteratively converges to the minimum
    • Possibility of getting stuck in a local minimum
      » Yields higher energy prediction residual
    • Pseudocode for the Three Step Search
      – 1: R = 2^(log2(S) - 1), i.e. R = S/2
      – 2: Search positions within the search window defined using R
      – 3: R = R/2
      – 4: if R < 1 finished, else go to 2
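A runnable sketch of the full search and of the Three Step Search pseudocode is given below. It assumes S is a power of two so that the initial step is R = S/2, uses SAD as the matching criterion, and skips displacements that fall outside the reference frame. The function names and the synthetic frames are hypothetical.

```python
def block_sad(cur, ref, bx, by, dx, dy, m, n):
    """SAD between the m x n block of cur at (bx, by) and ref displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(m))


def inside(ref, bx, by, dx, dy, m, n):
    """True if the displaced block lies fully inside the reference frame."""
    return (0 <= bx + dx and bx + dx + m <= len(ref[0])
            and 0 <= by + dy and by + dy + n <= len(ref))


def full_search(cur, ref, bx, by, m, n, s):
    """Evaluate every displacement in the +/-s window; returns (sad, dx, dy)."""
    best = None
    for dy in range(-s, s + 1):
        for dx in range(-s, s + 1):
            if inside(ref, bx, by, dx, dy, m, n):
                cost = block_sad(cur, ref, bx, by, dx, dy, m, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best


def three_step_search(cur, ref, bx, by, m, n, s):
    """Steps 1-4 of the pseudocode: 8 neighbours at distance R, then halve R."""
    r = s // 2
    best = (block_sad(cur, ref, bx, by, 0, 0, m, n), 0, 0)
    while r >= 1:
        cx, cy = best[1], best[2]  # centre for this step
        for oy in (-r, 0, r):
            for ox in (-r, 0, r):
                if (ox, oy) == (0, 0):
                    continue  # centre was evaluated in an earlier step
                dx, dy = cx + ox, cy + oy
                if inside(ref, bx, by, dx, dy, m, n):
                    cost = block_sad(cur, ref, bx, by, dx, dy, m, n)
                    if cost < best[0]:
                        best = (cost, dx, dy)
        r //= 2
    return best


# Synthetic frames (assumed data): cur is ref displaced by (3, -2).
SIZE = 40
ref = [[x * x + y * y for x in range(SIZE)] for y in range(SIZE)]
cur = [[(x + 3) ** 2 + (y - 2) ** 2 for x in range(SIZE)] for y in range(SIZE)]
fs = full_search(cur, ref, 16, 16, 4, 4, 8)
tss = three_step_search(cur, ref, 16, 16, 4, 4, 8)
print("full search:", fs, " three step search:", tss)
```

The full search always returns the global minimum of the window. The Three Step Search visits far fewer positions but may settle on a different vector with a higher SAD when the error surface is not monotonic, which is the local-minimum risk noted above.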


Search Strategies (2/4)

• Logarithmic searches contd.
  – Three Step Search [Koga81]
    • S = 8, initial R = 4
    • Search positions defined using R:
      – (x-R,y-R), (x,y-R), (x+R,y-R), …, (x,y), …, (x+R,y+R)
    • Operations required to generate 1 MV
      – (9+8+8) * (M * N * 3 + 1)
  – Variants:
    • 2-D logarithmic [Jain81], Parallel 1-D [Chen91], CDS [Rao83], N3SS [Li94], 4SS [Po96]
• Hierarchical Search Strategies
  – Search fewer positions & use fewer pixels in the matching criteria
    • Achieved via sub-sampling current & reference frames
    • Disadvantage: increased memory
  – Best match in lower resolution seeds search for subsequent resolutions
  – Can help to avoid local minima due to low pass filtering effect
  – Local minima still possible for small regions which disappear during sub-sampling


Search Strategies (3/4)

• 3 Level Hierarchical Search Example:
  – Level 1: Original
  – Level 1 sub-sampled by a factor of 2 generating level 2
  – Level 1 sub-sampled by 4 generating level 3
  – Motion Estimation starts at level 3
    • Block size: N/4 x M/4
    • Search window ±S/4
    • FS or TSS employed within this window
    • Produces motion vector (Vx3, Vy3)
  – Motion Estimation level 2
    • Block size: N/2 x M/2
    • Centered on (x/2+2*Vx3, y/2+2*Vy3)
    • Search window ±1 around this point
    • Produces motion vector (Vx2, Vy2)
  – Motion Estimation level 1
    • Centered on (x+2*Vx2, y+2*Vy2)
    • Search window ±1 around this point
    • Produces final motion vector (Vx1, Vy1)
• Operations required to generate 1 MV using a FS at level 3
  – (2*(S/4)+1)^2 * (M/4 * N/4 * 3 + 1) + 9*(M/2 * N/2 * 3 + 1) + 9*(M * N * 3 + 1)
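The operation counts for the three strategies can be compared numerically. This sketch uses the simplified cost model from the slides (3 operations per pixel plus 1 comparison per block match) and assumes the level 2 block is M/2 x N/2; the function names are hypothetical.

```python
def block_match_ops(m, n):
    """Subtract, absolute, accumulate per pixel, plus one minimum comparison."""
    return m * n * 3 + 1


def full_search_ops(m, n, s):
    return (2 * s + 1) ** 2 * block_match_ops(m, n)


def three_step_ops(m, n):
    """9 positions in the first step, 8 new positions in each of the next two."""
    return (9 + 8 + 8) * block_match_ops(m, n)


def hierarchical_ops(m, n, s):
    """Full search at level 3, then +/-1 refinements at levels 2 and 1."""
    level3 = (2 * (s // 4) + 1) ** 2 * block_match_ops(m // 4, n // 4)
    level2 = 9 * block_match_ops(m // 2, n // 2)
    level1 = 9 * block_match_ops(m, n)
    return level3 + level2 + level1


M = N = 16
S = 8
for name, ops in (("full search", full_search_ops(M, N, S)),
                  ("three step", three_step_ops(M, N)),
                  ("hierarchical", hierarchical_ops(M, N, S))):
    print(f"{name:12s} {ops:7d} ops/MV  {ops / (M * N):6.1f} ops/pixel")
```

For a 16x16 block and S = 8 this gives roughly 868 operations/pixel for the full search and roughly 39 for the three level hierarchical search.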


Search Strategies (4/4)

• Scene adaptive search area
  – Zone based search strategies
    • Can employ stopping threshold in each zone
    • Advantageous in a rate/distortion sense
    • [Chan95][Jung96][Zhe97]
  – Spiral Search
  – Dynamic search window size
    • Many techniques used to adjust range:
      – Spatial correlation of MV [Chain95][In97]
  – Gradient based methods
    • Block based gradient descent search [Liu96]
      – Stops after 4 steps
    • Diamond search [Cote97]
• Early stopping technique
  – Skip to next block match when the minimum SAD has been exceeded
  – Successive elimination algorithm [Li95]
  – Conservative block SAD [Do98]

[Figures: spiral search based motion estimation; zone-based motion estimation]
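The early stopping idea (abandon a candidate as soon as its partial SAD reaches the best SAD found so far) can be sketched as below. Checking once per row rather than once per pixel is an assumption made here for simplicity, and the function name is hypothetical.

```python
def sad_early_exit(cur_block, ref_block, best_so_far):
    """Return the SAD, or None if the candidate cannot beat best_so_far."""
    total = 0
    for row_c, row_r in zip(cur_block, ref_block):
        for c, r in zip(row_c, row_r):
            total += abs(c - r)
        if total >= best_so_far:  # partial sum already too large: skip candidate
            return None
    return total


cur_block = [[1, 2], [3, 4]]
print(sad_early_exit(cur_block, [[1, 2], [3, 5]], best_so_far=10))  # completes
print(sad_early_exit(cur_block, [[9, 9], [3, 4]], best_so_far=10))  # abandoned
```

The result is identical to an exhaustive SAD comparison because only candidates that are already worse than the current best are discarded.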


Different Search Strategy Performance*

• Frame Differencing
  – “0” Motion Vector
  – Entropy: 4.38 bits/pixel
  – 1 operation/pixel (subtraction)
• Full Search
  – Block size 16x16
  – Search range ±8
  – Entropy: 2.61 bits/pixel
  – ~868 operations/pixel
• Hierarchical Search
  – Block size 4x4, 8x8, 16x16
  – Search window ±2, ±4, ±8
  – Entropy: 3.08 bits/pixel
  – ~39 operations/pixel
• Hierarchical Search
  – Block size 4x4, 16x16, 32x32
  – Search window ±2, ±4, ±8
  – Entropy: 2.91 bits/pixel
  – ~35 operations/pixel


More advanced techniques (1/2)

• Bi-directional (Forward and Reverse) Prediction
  – Termed B-frames
  – Not feasible for real-time systems
• Multiple Reference Frames
  – Improves prediction
  – Increases computational expense & memory requirements
• Unrestricted Motion Vectors
  – Allow block matches outside the reference frame
  – Pixel padding used to extend beyond frame boundaries
• Predictive Motion Vectors
  – Rather than start at collocated block use a MV predictor
    • Temporal and/or Spatial prediction [Lee97][Kos97][Zheng97]
    • Can improve prediction residual quality
    • Can employ thresholds to “gate-off” motion estimation
    • H/W: Reduces pixel reusability between current block positions
• Global Motion Compensation
  – “Default motion” for the frame/object

[Figure labels: frame boundary; original search window; predicted search window; MV predictor]
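Pixel padding for unrestricted motion vectors can be illustrated by clamping coordinates to the frame, so that positions outside the reference frame repeat the nearest edge pixel. This is a sketch under that assumption (edge replication without building a padded copy); the function names are hypothetical.

```python
def padded_pixel(frame, x, y):
    """Read frame[y][x], replicating the nearest edge pixel when out of range."""
    height, width = len(frame), len(frame[0])
    return frame[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]


def fetch_block(frame, x0, y0, m, n):
    """Fetch an m-wide, n-high block whose top-left corner may lie outside the frame."""
    return [[padded_pixel(frame, x0 + i, y0 + j) for i in range(m)]
            for j in range(n)]


frame = [[1, 2],
         [3, 4]]
print(fetch_block(frame, -1, -1, 2, 2))  # overlaps the top-left corner
print(fetch_block(frame, 1, 1, 2, 2))    # overlaps the bottom-right corner
```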


More advanced techniques (2/2)

• Sub-pel Motion Estimation
  – Real motion is not constrained by integer pixel amounts
  – Half-pel & quarter pel frequently used
  – But memory increases
  – H.264:
    • 6-tap FIR filter for ½ pel
    • Bilinear for ¼ pel
• Variable Block Size Motion
  – Smaller block size will lead to smaller residual
  – But number of motion vectors & signalling info increases
    • 41 candidate MVs per 16x16 block in H.264 (one per partition across all seven block sizes: 1 + 2 + 2 + 4 + 8 + 8 + 16)
  – MPEG-4 & H.263 Advanced Prediction Motion Estimation (4MV)
  – H.264:
    • Dynamically adapts between multiple block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4)
    • Rate/Distortion Optimised
• Motion Vector Coding Prediction
  – Adding MVs to bitstream can be costly, particularly if block size is small
  – DPCM used to exploit spatial MV redundancies

[Figure: H.264 macroblock partitions: 16x16 block, 8x16 blocks, 16x8 blocks, 8x8 blocks, 8x4 blocks, 4x8 blocks, 4x4 blocks]
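The H.264 sub-pel interpolation and the motion vector count can be sketched in one dimension. The tap values (1, -5, 20, 20, -5, 1) with rounding and a 5-bit shift are the standard H.264 luma half-sample filter; the 1-D restriction, the edge clamping and the function names are simplifying assumptions for illustration.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # H.264 luma half-sample filter taps


def half_pel(row, x):
    """Half-sample value between row[x] and row[x + 1] (8-bit samples)."""
    last = len(row) - 1
    acc = sum(t * row[min(max(x - 2 + k, 0), last)] for k, t in enumerate(TAPS))
    return min(max((acc + 16) >> 5, 0), 255)  # round, divide by 32, clip


def quarter_pel(a, b):
    """Quarter-sample value: rounded average of two neighbouring samples."""
    return (a + b + 1) >> 1


# Number of motion vectors if every partition size of a 16x16 block is searched.
PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
mv_count = sum((16 // w) * (16 // h) for w, h in PARTITIONS)

edge = [0, 0, 0, 255, 255, 255]
print(half_pel(edge, 2), quarter_pel(edge[2], half_pel(edge, 2)), mv_count)
```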


ME Software/Hardware considerations

• Software algorithmic complexity (simplified analysis)
  – To support 1920x1280 at 30 frames/sec: 9600 16x16 blocks/frame x 30 = 288K 16x16 blocks/sec
  – ±8 Search Window = 289 block matches per current block
  – Total block matches: 289 * 288K = 83,232,000 matches/sec
  – Operations = 83,232,000 * (256 pixels * 3 + 1) ~= 64 GOPS
• Hardware implementations can be attractive
  – Systolic Array (1D/2D) approaches typically employed
    • Memory bandwidth efficient & high throughput
• Full Search commonly used
  – Architectures also available for heuristic search strategies
• Architectures for H.264 Variable Block Size emerging
  – Ball park figures for H.264 VBSME core:
    • 1-D 16 PE SA:
      – Area: 40-60K gates; Memory Bandwidth: ~3 pixels per clock cycle
      – 1 16x16 block match every 4096 clock cycles (±8 search range)
    • 2-D 256 PE SA:
      – Area: 100-200K gates; Memory Bandwidth: ~48 pixels per clock cycle
      – 1 16x16 block match every 256 clock cycles (±8 search range)
• To support 1920x1280: 9600 x 30 = 288K 16x16 blocks/sec
  – 256 PE 2D SA requires a clock frequency ~= 75 MHz
  – For higher throughput: Arrays of 1-D/2-D modules required
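The throughput arithmetic on this slide can be reproduced with a few lines. The sketch assumes 30 frames/sec, 16x16 blocks, a ±8 full search and the simplified cost of 3 operations per pixel plus 1 comparison per block match; the function name `me_load` is hypothetical.

```python
def me_load(width, height, fps, block=16, s=8, cycles_per_block=256):
    """Return (blocks/sec, matches/sec, ops/sec, clock Hz for a 2-D array)."""
    blocks_per_sec = (width // block) * (height // block) * fps
    matches_per_sec = (2 * s + 1) ** 2 * blocks_per_sec
    ops_per_sec = matches_per_sec * (block * block * 3 + 1)
    clock_hz = blocks_per_sec * cycles_per_block
    return blocks_per_sec, matches_per_sec, ops_per_sec, clock_hz


blocks, matches, ops, clock = me_load(1920, 1280, 30)
print(f"{blocks:,} blocks/sec")
print(f"{matches:,} block matches/sec")
print(f"{ops / 1e9:.1f} GOPS in software")
print(f"{clock / 1e6:.1f} MHz for a 256 PE 2-D array")
```

With these inputs the software load is about 64 GOPS, while a 256 PE array that finishes one current block every 256 cycles needs a clock of roughly 74 MHz.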


Motion Compensation

• Straightforward relative to motion estimation
  – Reconstructed MB = Residual + Mot. Comp. MB (pointed to by MVs)
• Copy block of pixels from displaced block in the reference frame into the current frame
  – Reference frame must be stored in decoder
  – For encoder and decoder to remain synchronised
    • Encoder also needs to do motion compensation
• Considerations:
  – Additional frame memory at the decoder
  – Low computational requirements
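The reconstruction rule above (reconstructed block = residual + motion compensated block) can be sketched as follows. The sketch assumes integer-pel motion vectors that stay inside the reference frame and 8-bit samples; the function names and toy data are hypothetical.

```python
def motion_compensate(ref, bx, by, mv, m, n):
    """Copy the m x n block of ref displaced by mv = (dx, dy) from (bx, by)."""
    dx, dy = mv
    return [[ref[by + dy + j][bx + dx + i] for i in range(m)] for j in range(n)]


def reconstruct_block(residual, prediction):
    """Add the decoded residual to the prediction and clip to 8 bits."""
    return [[min(max(r + p, 0), 255) for r, p in zip(rr, pr)]
            for rr, pr in zip(residual, prediction)]


# Toy reference frame (assumed data): pixel value encodes its position.
ref = [[10 * y + x for x in range(4)] for y in range(4)]
prediction = motion_compensate(ref, bx=1, by=1, mv=(1, -1), m=2, n=2)
block = reconstruct_block([[1, -1], [0, 2]], prediction)
print(prediction, block)
```

The encoder runs the same two steps on its own decoded data so that its reference frame matches the one held by the decoder.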


Lossy Compression

Standards


Standards Evolution

[Figure: timeline of standards, 1984 to 2004]
• JPEG standards: JPEG, JPEG2000
• MPEG standards: MPEG-1, MPEG-4
• ITU / MPEG standards: H.262/MPEG-2, H.26L (H.264 / MPEG-4v10)
• ITU standards: H.261, H.263, H.263+, H.263++


JPEG

• Flexible image coding standard
• 4 Modes of operation
  – Lossless encoding (earlier)
  – Baseline sequential encoding
  – Progressive encoding
  – Hierarchical encoding (towards JPEG-2000)
• Motion JPEG
  – Baseline encoding of each frame
  – No motion estimation
  – Not properly standardised


JPEG-2000

• JPEG not optimised for a wide range of apps
• JPEG-2000 even more flexible
• Interesting features:
  – Uses DWT instead of DCT
  – Region of Interest (ROI) coding
  – Scalability
    • Spatial scalability
    • SNR scalability
  – More resilient to channel errors
    • Individual quality packets independently decoded
  – Also supports lossless coding
• Added flexibility comes at computational cost


JPEG/JPEG-2000 Summary

• JPEG capable of average compression of 15:1 for subjectively transparent quality
• JPEG-2000 better compression @ fixed rate
  – For ‘Foreman’:
    • Gain of 1.54 dB for range of 1.2 to 0.12 bpp
• Applications
  – Internet
  – Digital photography
  – Many more


ITU-T H.261

• ITU-T: narrow bandwidth real-time apps
• H.261 (p x 64) kb/s over ISDN (1≤p≤30)
• CIF and QCIF resolution
• Real time video telephony/conferencing
• Up to 3 frames interpolated by decoder
  – Supports framerates of 30Hz, 15Hz, 10Hz, 7.5Hz
• Video compression tools
  – 8x8 DCT
  – Uniform scalar quantiser (rate control optional)
  – Entropy coder is modified run length and Huffman
  – Motion Estimation
    • Only forward direction
    • Search window limited to ±15
    • Integer pixel accuracy only
  – Motion Compensation is optional
  – Loop filter (alleviate blocking)


ISO/IEC MPEG-1

• Storage of AV content for delivery at ~1.5Mb/s
• Flexible
  – Resolutions typically ≤768x586
  – Framerate typically ≤30Hz
• H.261 was starting point for the standard
• Compression gain at expense of latency
• Specific features
  – Standard VLCs determined by Huffman coding
  – DCT DC coeffs are differentially predicted
  – Bi-directional prediction (I,P,B frames)
  – Motion compensation with half-pixel accuracy
  – Maximum MV range of (-512,+511.5) for half pixel and (-1024,+1023) for integer pixel
  – Weighted quantisation (H.261 does not have this)
  – Random access to bitstream, FF, FR


ISO/IEC MPEG-2

• High quality video @ 4-15Mb/s
  – VOD, Broadcast TV, DVD, HDTV, Satellite TV
• Major differences w.r.t. MPEG-1
  – More resolutions, framerates, qualities and bitrates
    • SIF (352x288@25Hz) to HDTV (1920x1250@60Hz)
    • Profiles and levels
  – Has interlaced/progressive option
    • Frame/Field based ME, MC and DCT
  – Scalability (temporal, spatial, SNR)
• Minor differences
  – More bits for quantisation
  – Alternate scan (as well as zigzag)


ITU-T H.263

• Very low bitrate apps (< 64kb/s)
  – Video telephony over PSTN, mobile telephony
  – Recommended resolutions: subQCIF, QCIF, CIF, 4CIF, 16CIF
  – Non-interlaced @ 29.97Hz
• Similar to H.261
• Extensions (Some optional in Annex but included in H.264)
  – MVs differentially encoded
  – Half-pixel accurate motion estimation
    • Extensions support quarter and one eighth
  – Unrestricted motion vector mode
    • MVs can point outside image, edge pixels form prediction
  – Advanced prediction mode
    • MB can have 4 MVs associated with it
  – Syntax-based arithmetic encoding (SAC)
    • Optional mode to replace VLCs with arithmetic encoding
  – “PB” frames
  – Error resilience
    • Synchronisation markers
    • Reversible VLCs
    • More suggested in technical annex to standard


ISO/IEC MPEG-4

• An all encompassing standard!
  – Improved compression at 5kb/s to 1Gb/s
  – Resolutions of sub-QCIF to studio
  – Content-based interactivity (semantic ‘objects’)
  – Universal access (scalability, error resilience)
  – Synthetic and natural hybrid coding (SNHC)


ISO/IEC MPEG-4

[Figure: MPEG-4 encoder block diagram]
• Blocks: SA-DCT, Quantiser, Inverse Quantiser, SA-IDCT, Entropy Encoder, Frame Memory, Motion Compensation, Motion Estimation, Shape Coder, Shape Decoder
• Signals: Video In, Shape In, Current Frame, Current Frame Shape, Reference Frame, Prediction, Prediction Residual, Decoded Prediction Residual, Reconstruction, Motion Vectors, Decoded Current Frame Shape, Bitstream


ISO/IEC MPEG-4

• Video coding tools
  – Integer, half and quarter pixel ME
  – Boundary MB ME: padding or polygon matching
  – Global ME
  – Shape Adaptive DCT
  – AC/DC intra prediction
  – Enhanced scalability: FGS
  – Still texture coding (uses SA-DWT)
• Shape Coding tools
  – Context-based arithmetic encoding (CAE)
    • Compute context
    • Index into LUT for probability of 0,1
    • Drive arithmetic encoder


ITU-T H.264 or ISO/IEC MPEG-4 Part 10 (AVC)

• Targets enhanced compression for wide range of apps
• Improved prediction
  – Variable block-size MC with small block sizes
  – Up to quarter-pixel MC
  – Unrestricted motion vector mode
  – Multiple reference picture MC
  – Weighted prediction (generalised B-pictures)
  – Directional intra prediction (9 4x4 modes, 4 16x16 modes)
  – In the loop adaptive deblocking filter
• Improved coding efficiency tools
  – Small block size transform
  – Hierarchical block transform
  – Short word length transform (16 bit integer arith)
  – Exact match inverse transform
  – CAVLC, CABAC
• Enhanced error robustness and network friendliness


ITU-T H.264 or ISO/IEC MPEG-4 Part 10 (AVC)

• H.264 Version 1 has 3 profiles
  – Baseline
  – Main
  – Extended
• Fidelity Range Extension (FRExt) Amendment
  – High Profile
  – High 10 Profile
  – High 4:2:2 Profile
  – High 4:4:4 Profile
    • Up to 12 bits per sample
    • Supports lossless region coding
    • Codes RGB to avoid colour space transformation error


Comparing Standards

• Video conferencing applications
  – Low latency real-time requirement
  – H.264/AVC MP would improve by further 10-20%
    • Using low delay bi-prediction, CABAC


Comparing Standards

• Video streaming applications
  – Less of delay constraint


Comparing Standards

• Entertainment-quality applications
  – High resolution, delay tolerable


Comparing Standards

• Professional motion picture production
  – Random access to individual frames
• Up to HDTV, H.264/AVC MP comparable or better than Motion-JPEG2000


Comparing Standards

• PSNR, while good, does not take into account intricacies of the human eye
  – Need subjective video tests
  – Other metrics
    • MPQM, …
• Experiments show that H.264 gives lowest bitrate for subjectively equivalent video over a range of apps
• Improved performance comes at the cost of computational complexity
  – Main bottleneck is ME (very memory intensive)
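For reference, PSNR is a purely numerical measure derived from the mean squared error, which is why it cannot capture perceptual effects. A minimal sketch for 8-bit images stored as lists of rows (the function name and the `peak` parameter are illustrative choices):

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    pixels = sum(len(row) for row in original)
    mse = sum((a - b) ** 2
              for ro, rd in zip(original, decoded)
              for a, b in zip(ro, rd)) / pixels
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


original = [[100, 100], [100, 100]]
decoded = [[101, 99], [101, 99]]  # every pixel off by one, so MSE = 1
print(round(psnr(original, decoded), 2))
```

Two decoded images with the same PSNR can look very different, for example when the error is concentrated on visible edges in one and spread over texture in the other.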


Image Analysis

Visual Feature Extraction


Visual Features - Still Images

• What features are important?
  – Colour
  – Texture
    • The feel, appearance, consistency of a surface
• In an image:
  • Distribution over the entire image?
  • Of specific parts of the image?

[Figure: example images labelled "No texture" and "Highly textured"]


Visual Features - Colour

• Colour is visually important to humans
• Colour features and similarity metrics easy to compute
  – Histogram [Swain and Ballard, 1992]
    • Most commonly used structure to represent global image features.
    • Invariant to translation and rotation and can be made invariant to scale by normalisation
• MPEG-7 Scalable Colour Description:
  – H (16 levels), S (4 levels), V (4 levels)
  – Histogram encoded with a Haar transform for efficiency & scaling
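A normalised HSV histogram with the 16 x 4 x 4 quantisation mentioned above can be sketched with the standard library. This covers only the histogram step (the Haar transform encoding is not shown), assumes a non-empty list of 8-bit RGB tuples, and uses uniform bin edges and a bin ordering chosen here for illustration.

```python
import colorsys


def hsv_histogram(pixels, h_levels=16, s_levels=4, v_levels=4):
    """Normalised histogram over quantised (H, S, V) for 8-bit RGB pixels."""
    hist = [0] * (h_levels * s_levels * v_levels)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hi = min(int(h * h_levels), h_levels - 1)
        si = min(int(s * s_levels), s_levels - 1)
        vi = min(int(v * v_levels), v_levels - 1)
        hist[(hi * s_levels + si) * v_levels + vi] += 1
    total = len(pixels)
    return [count / total for count in hist]  # normalising gives scale invariance


red_image = [(255, 0, 0)] * 3
hist = hsv_histogram(red_image)
print(len(hist), hist.index(max(hist)))
```

Because the histogram ignores pixel positions, translating or rotating the image leaves it unchanged, and dividing by the pixel count removes the dependence on image size.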


Visual Features - Texture

• Simple texture descriptors [Pratt, 1991]:
  – Autocorrelation function
  – Co-occurrence matrices
  – Edge frequency
  – Primitive length
• More sophisticated (based on transforms and/or filtering)
  – Wavelet [Mallat, 1990], Haar [Theodoridis, 1999], Gabor [Bovik, 1990]
• Others:
  – Mathematical morphology
  – Fractals


Visual Features - Texture

• Example: MPEG-7 Edge Histogram
  – Represents the global (and possibly local [Won, 2002]) spatial distribution of edges
    • Need to first generate edge map
      – Roberts, Sobel and Prewitt, Canny, …
    • Build histogram based on 5 edge types
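One way to classify image blocks into five edge types is sketched below. Each block is summarised by the mean intensity of its four quadrants and tested against five 2x2 filters. The coefficients are those commonly quoted for the MPEG-7 edge histogram descriptor, but they, the threshold value and all names here should be treated as assumptions rather than as a statement of the standard.

```python
import math

ROOT2 = math.sqrt(2)

# Coefficients for (top-left, top-right, bottom-left, bottom-right) quadrant means.
EDGE_FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diagonal_45": (ROOT2, 0, 0, -ROOT2),
    "diagonal_135": (0, ROOT2, -ROOT2, 0),
    "non_directional": (2, -2, -2, 2),
}


def classify_block(means, threshold=11):
    """Return the strongest edge type for one block, or None if below threshold."""
    strengths = {name: abs(sum(c * m for c, m in zip(coeffs, means)))
                 for name, coeffs in EDGE_FILTERS.items()}
    best = max(strengths, key=strengths.get)
    return best if strengths[best] >= threshold else None


def edge_histogram(blocks, threshold=11):
    """Count edge types over a list of blocks (each given as 4 quadrant means)."""
    hist = dict.fromkeys(EDGE_FILTERS, 0)
    for means in blocks:
        label = classify_block(means, threshold)
        if label is not None:
            hist[label] += 1
    return hist


blocks = [(100, 0, 100, 0), (100, 100, 0, 0), (50, 50, 50, 50)]
print(edge_histogram(blocks))
```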


Change Detection

• Compare 2 temporally adjacent images and determine how different they are
• Why?
  – Surveillance-type applications
    • Assume static camera & background
    • Anything changing between one frame and the next must be an object!
    • In fact, this is naïve, but it is the starting point of many object segmentation techniques
  – Temporal video structuring
    • Breaking video up into “chunks” for non-linear browsing: shots, scenes, events, story-lines
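The naive change detector described above amounts to thresholding the absolute difference of two frames. A minimal sketch, assuming greyscale frames stored as lists of rows and an arbitrary per-pixel threshold (both the threshold value and the function names are hypothetical):

```python
def change_mask(prev, curr, pixel_threshold=20):
    """Mark pixels whose absolute difference exceeds the threshold."""
    return [[abs(c - p) > pixel_threshold for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]


def changed_fraction(mask):
    """Fraction of pixels flagged as changed."""
    flat = [flag for row in mask for flag in row]
    return sum(flat) / len(flat)


prev = [[10, 10], [10, 10]]
curr = [[10, 200], [12, 10]]  # one large change, one small (noise-like) change
mask = change_mask(prev, curr)
print(mask, changed_fraction(mask))
```

The threshold suppresses small differences such as sensor noise, but illumination changes or camera shake still produce false detections, which is why this is only a starting point.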


Temporal Video Structuring

• Shot boundary detection

[Figure labels: a video document; a set of keyframes; keyframe-based video browsers]


Temporal Video Structuring

• Shot boundary detection
  – A shot is a continuous piece of video taken with one camera
  – A shot cut is the abrupt or gradual transition between two shots
• Uncompressed domain:
  – Calculate colour histogram for each frame
  – Calculate difference between histograms using suitable metric: L1 (city-block), L2 (Euclidean), Mahalanobis, etc
  – Threshold
• Compressed domain:
  – Parse features directly from bitstream:
    • E.g. use DCT coefficients for each frame to reconstruct approximation of image
    • E.g. motion vectors for each pair of frames and detect changes in global statistics
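The uncompressed-domain procedure (histogram per frame, histogram difference, threshold) can be sketched as below. For brevity the sketch uses a greyscale histogram with 8 bins instead of a colour histogram, the L1 metric, and a threshold chosen arbitrarily; all names and data are illustrative.

```python
def grey_histogram(frame, bins=8):
    """Normalised histogram of 8-bit grey levels."""
    hist = [0] * bins
    pixels = 0
    for row in frame:
        for value in row:
            hist[value * bins // 256] += 1
            pixels += 1
    return [count / pixels for count in hist]


def l1_distance(h1, h2):
    """City-block distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames, threshold=0.5):
    """Indices i where the histogram of frame i differs sharply from frame i-1."""
    hists = [grey_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if l1_distance(hists[i - 1], hists[i]) > threshold]


dark = [[10, 20], [30, 20]]
bright = [[200, 220], [240, 250]]
frames = [dark, dark, bright, bright]
print(detect_cuts(frames))
```

A single fixed threshold works for abrupt cuts; gradual transitions such as fades spread the change over many frames and generally need a windowed or cumulative comparison.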