Image & Video Compression (19/09/2006) - 1 -
Centre for Digital Video Processing
Image and Video Compression
A presentation to Avocent
Noel O’Connor, Andrew Kinane, Daniel Larkin
19/09/2006
Overview
• Lossless Compression
  – Entropy coding: a brief review
    • Huffman Coding
    • Arithmetic Coding
  – Lossless Compression Standards
    • The FAX Group Standards, JBIG, Lossless JPEG
• Lossy Compression
  – Generic Codec Structure
    • DCT/IDCT
    • Quantization
    • Motion Estimation
    • Motion Compensation
  – Lossy Compression Standards
    • JPEG, JPEG2000, H.261 / H.263 / H.264, MPEG-1/-2/-4
• Image Analysis Techniques
  – Visual Feature Extraction
Lossless Compression
Entropy Coding
Entropy Coding
• Also referred to as source coding
• Assign each symbol a binary codeword
  – Allocate a specific string of bits to a symbol
• Based on information theory:
  – S = {s1 … sN} is the set of symbols to encode, with probabilities p1 … pN
  – Entropy H(s) is a measure of the information content, in bits/symbol:
    $$H(s) = -\sum_{i=1}^{N} p_i \log_2 p_i$$
  – Specifies a lower bound on the average code-word length, and hence on coding efficiency
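As a quick illustration of the entropy bound, a minimal Python sketch is given here. The function name `entropy` and the example probabilities are illustrative choices, not taken from the slides.

```python
from math import log2

def entropy(probs):
    """Entropy in bits/symbol; zero-probability symbols contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Four symbols whose probabilities are negative powers of two.
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```

No code for this source can average fewer than 1.75 bits per symbol.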
Huffman Coding
• A form of Variable Length Coding:
  – Assign shorter code-words to symbols most likely to occur, longer to those less likely
• Problem: must choose code-words carefully!
  – Must obey prefix condition so decoder can parse bitstream
[Figure: the sequence s1, s4, s3, s2 is sent as the bitstream 1 0 1 0 0 1 1 0 1; after decoding s1, the decoder cannot tell whether the next symbol is s2 or s4]
Huffman Coding
• Ensures instantaneously parseable code-words
• 100% efficient when p1 … pN are negative integer powers of 2 (0.5, 0.25, etc.)
• Algorithm: generate Huffman coding tree:
  – Form the tree:
    • Sort the symbols by their probabilities
    • Merge the two smallest probabilities by adding them and produce a new node in the tree
    • Repeat until only a single node is reached
  – Assign bits:
    • Traverse the tree from the root to the leaf nodes, assigning each branch encountered a one or zero
• Decoding based on storing codewords in a specially constructed LUT
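The tree-building steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the name `huffman_code`, the use of a heap in place of repeated sorting, and the 0/1 branch labelling are illustrative choices, and symbols are assumed not to be tuples.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Return {symbol: codeword} for a dict {symbol: probability}."""
    tie = count()  # tie-breaker so the heap never compares tree payloads
    heap = [(p, next(tie), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    # Repeatedly merge the two least probable nodes into a new tree node.
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p0 + p1, next(tie), (left, right)))
    codes = {}

    def assign(node, prefix):
        # Internal nodes are (left, right) tuples; leaves are symbols.
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix

    assign(heap[0][2], "")
    return codes

codes = huffman_code({"s1": 0.5, "s2": 0.25, "s3": 0.125, "s4": 0.125})
```

With these dyadic probabilities the code-word lengths come out as 1, 2, 3 and 3 bits, matching -log2 of each probability, which is the "100% efficient" case.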
Huffman Coding
• Generate code-words for each grey level
• S = {s1 s2 s3 s4 s5} = {0,4,5,6,7}
• p1 p2 p3 p4 p5 = 0.125, 0.484, 0.25, 0.125, 0.016
Huffman Coding
• Efficiency:
  – Calculate Average Coding Rate R
    • Sum over all symbols of symbol probability (pi) x code-word length (li): $$R = \sum_{i=1}^{N} p_i l_i$$
  – Compare to entropy H(s): efficiency = H(s) / R
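A worked version of this efficiency calculation for the grey-level example (probabilities 0.125, 0.484, 0.25, 0.125, 0.016) might look like the sketch below. The code-word lengths are an assumption: they are one valid outcome of applying the merge procedure by hand, since the code table from the original figure is not available here.

```python
from math import log2

probs = [0.125, 0.484, 0.25, 0.125, 0.016]   # p1 … p5 from the example
lengths = [3, 1, 2, 4, 4]                     # assumed Huffman code-word lengths

def entropy(p):
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def average_rate(p, l):
    """Average coding rate R = sum of p_i * l_i, in bits/symbol."""
    return sum(pi * li for pi, li in zip(p, l))

H = entropy(probs)
R = average_rate(probs, lengths)
print(f"H = {H:.3f}, R = {R:.3f}, efficiency = {H / R:.1%}")
```

R works out to 1.923 bits/symbol, slightly above the entropy, so the code is close to but not exactly 100% efficient.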
Huffman Coding
• Problems:
  – Lower bound of 1 bit/symbol
  – Does not facilitate adaptive coding
• Example
Arithmetic Coding
• Treat groups of symbols … but maintain a symbol-by-symbol encoding mechanism
• Assign a single codeword to a group of symbols
• Codeword represents a half-open interval on [0.0, 1.0)
• By assigning enough precision bits, one interval can be distinguished from another
• Symbols with higher probabilities correspond to larger intervals, thereby requiring fewer precision bits
Arithmetic Coding
• S = {a, b}, p1, p2 = 1/3, 2/3
• First symbol narrows interval to that symbol’s range:
  – Subsequent symbols further restrict the current interval.
• Decoding reverses this:
  – Receives number in [0.0, 1.0)
  – Checks which symbol’s range contains this & decodes symbol
  – Since lower & upper bounds of symbol known, their effects on the encoded number can be reversed
  – Gives a new number …
  – REPEAT
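The interval narrowing and its reversal can be sketched with exact fractions for the two-symbol source above. This is a minimal sketch, not a practical coder: the names `RANGES`, `encode` and `decode` are hypothetical, the message length is assumed to be known to the decoder, and assigning a the lower sub-interval is an arbitrary choice.

```python
from fractions import Fraction

# Sub-interval of [0, 1) owned by each symbol: a has p = 1/3, b has p = 2/3.
RANGES = {"a": (Fraction(0), Fraction(1, 3)),
          "b": (Fraction(1, 3), Fraction(1))}

def encode(message):
    """Return the final half-open interval [low, high) for the message."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        s_lo, s_hi = RANGES[sym]
        low, high = low + width * s_lo, low + width * s_hi
    return low, high

def decode(x, n_symbols):
    """Recover n_symbols from any number x inside the encoded interval."""
    out = []
    for _ in range(n_symbols):
        for sym, (s_lo, s_hi) in RANGES.items():
            if s_lo <= x < s_hi:
                out.append(sym)
                x = (x - s_lo) / (s_hi - s_lo)  # undo this symbol's narrowing
                break
    return "".join(out)

low, high = encode("bab")
assert decode(low, 3) == "bab"
```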
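Arithmetic Coding
• Incremental transmission
• Example: message “BILL<space>GATES”
[Figure: the interval [0.0, 1.0) narrows with each symbol of the message; leading digits of the code value are emitted as they settle: 2, 25, 257, 2572, 257216, 2572167]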
Arithmetic Coding
• Can be performed very efficiently using 16/32 bit integer mathematics
• Bits are transmitted as they become available
• Simplification: use the value 0.999… rather than 1.0
  • In binary arithmetic this corresponds to 0.111…
  • Only use fractional part => only need integers
• High initially stores 0xFFFF, whilst Low stores 0x0000
• For each symbol encoded, examine most significant bit of both High and Low:
  – If these bits are the same, output bit
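The most-significant-bit test can be sketched as below for 16-bit registers. This is only a fragment of an integer arithmetic coder, and it rests on assumptions not stated on the slide: after a bit is output, a 0 is shifted into Low and a 1 into High (so High keeps representing 0.111…), and underflow handling is omitted entirely. The function name is hypothetical.

```python
def shift_out_matching_msbs(low, high):
    """Emit bits while the top bits of the 16-bit Low and High registers agree."""
    bits = []
    while (low & 0x8000) == (high & 0x8000):
        bits.append(low >> 15)               # the shared MSB is now settled
        low = (low << 1) & 0xFFFF            # shift a 0 into Low
        high = ((high << 1) & 0xFFFF) | 1    # shift a 1 into High
    return bits, low, high

bits, low, high = shift_out_matching_msbs(0x4000, 0x7FFF)
```

Starting from Low = 0x4000 and High = 0x7FFF, two bits (0 then 1) are emitted before the registers return to 0x0000 and 0xFFFF.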
Lossless Compression
Standards
ITU-T Facsimile
• ITU-T Rec. T.4 (Group 3)
• Targets scanned business documents:
  – Binary images: white (1), black (0)
• Two modes:
  – Modified Huffman (MH):
    • Run-length encoding is used to form runs of 1s and 0s for each line in the image;
    • Huffman coding applied to these (run, symbol) pairs;
    • Different Huffman codes for runs of 1s and 0s;
    • A special end-of-line (EOL) symbol is encoded for error detection purposes.
  – Modified Read (MR):
    • Pixel values from the previous line used as predictors for current pixels to be encoded;
    • Prediction residual is then encoded using Huffman coding.
  – MR mode is periodically interspersed with MH mode.
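The run-length step of MH mode can be illustrated with a short Python sketch. It only forms the (symbol, run) pairs for one scan line; the Huffman tables of the standard are not reproduced, and the function name is an illustrative choice.

```python
from itertools import groupby

def run_lengths(scan_line):
    """Collapse a line of 0/1 pixels into (symbol, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(scan_line)]

line = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
print(run_lengths(line))  # [(1, 4), (0, 2), (1, 3), (0, 1)]
```

Each pair would then be mapped to a variable-length code, with separate tables for white and black runs.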
JBIG
• Joint Bi-level Image Experts Group (JBIG): standard developed jointly by ITU-T and ISO
• Targets bi-level images:
  – may be either business documents or grey-scale images of natural scenes rendered as bi-level images.
• Uses adaptive arithmetic encoding:
  – Modeling step estimates probability of next symbol based on a context consisting of local pixels;
  – Probability is then used to drive the arithmetic encoder;
  – JBIG can be applied to grey-scale images by treating each bit plane of the grey-level image as a bi-level image.
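Splitting a grey-scale image into bi-level planes, as in the last point, can be sketched as follows. This is a plain binary bit-plane split under the assumption of 8-bit pixels held in nested lists; the function names are hypothetical and no JBIG coding is performed.

```python
def bit_planes(image, bits=8):
    """Return a list of bi-level images, plane k holding bit k of every pixel."""
    return [[[(pix >> k) & 1 for pix in row] for row in image]
            for k in range(bits)]

def from_bit_planes(planes):
    """Rebuild the grey-scale image from its bit planes."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[sum(planes[k][r][c] << k for k in range(len(planes)))
             for c in range(cols)] for r in range(rows)]

image = [[0, 255], [5, 128]]
planes = bit_planes(image)
assert from_bit_planes(planes) == image
```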
Lossless JPEG
• Joint Photographic Experts Group (JPEG) has a lossless image compression mode.
• Prediction for pixel to be encoded based on a context of previously encoded pixels:
  – Different ways for forming the prediction;
  – Method used encoded as side-information for each scan line.
• To encode the prediction residual:
  – (length, magnitude) pair formed;
  – length indicates the number of bits used to encode the magnitude:
    • A static Huffman code is used.
  – magnitude is the actual residual value directly encoded.
Lossless JPEG
• p = 190
• p1 = 184, p2 = 176
• P = 180 (the prediction, here the mean of p1 and p2)
• R = 180 - 190 = -10
• Encoded as the event (4, 0101)
  – Negative residuals encoded as 1s complement
  – Huffman code for 4 is 001, which gives the final codeword “0010101”
• Decoder:
  – Calculates the prediction value (180)
  – Parses the Huffman code, which allows decoding of the magnitude (0101)
  – Detects a leading zero => knows the value must be negative, so next four bits decoded as -10.
  – Reconstruction: p = P - R = 180 - (-10) = 190
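The (length, magnitude) encoding of this example can be traced with a small Python sketch. The function names are hypothetical, and `HUFFMAN_FOR_LENGTH` holds only the single entry quoted in the example (length 4 maps to 001); the rest of the static table is not reproduced.

```python
HUFFMAN_FOR_LENGTH = {4: "001"}  # only the entry given in the example

def encode_residual(r):
    """Return (length, magnitude bits); negatives use ones' complement."""
    if r == 0:
        return 0, ""
    length = abs(r).bit_length()
    value = r if r > 0 else r + (1 << length) - 1  # ones' complement of |r|
    return length, format(value, f"0{length}b")

def decode_residual(length, bits):
    if length == 0:
        return 0
    value = int(bits, 2)
    # A leading 1 means positive; a leading 0 means a negative residual.
    return value if bits[0] == "1" else value - ((1 << length) - 1)

P, p = 180, 190
length, bits = encode_residual(P - p)            # (4, "0101")
codeword = HUFFMAN_FOR_LENGTH[length] + bits     # "0010101"
assert P - decode_residual(length, bits) == p    # reconstruction p = P - R
```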
Lossy Compression
Generic structure of a video codec
Redundancy in Video Sequences
• Video compression targets 3 kinds of redundancy:
  – Spatial: the correlation that exists between (groups of) pixels;
  – Temporal: similarity between video frames;
  – Perceptual: Human Visual System (HVS) is less sensitive to high-frequency information.
• Lossy compression throws information away as part of these processes
• Remaining information is encoded losslessly using entropy coding
Redundancy in Video Sequences
• Spatial redundancy:
  – Transform data to be encoded into a new representation where data is less correlated;
  – Leads to a more compact representation.
• Temporal redundancy:
  – Only encode difference between 2 video frames (lower entropy);
  – Form prediction of frame to be encoded and encode prediction residual;
• Perceptual redundancy:
  – Suppress/remove high frequency components corresponding to fine image detail.
Coding Modes
• INTRA:
  – Encode a frame completely independently (i.e. with no reference to previous/future frames);
  – Forms random access point in bitstream, resets encoding, limits error propagation;
  – Equivalent to having a JPEG-encoded still image at periodic intervals in bitstream.
[Figure: INTRA-coded frames inserted every N frames, starting at Frame 0]
Coding Modes
• INTER:
  – Use a previous/future frame (termed reference frame) as the basis for a prediction of the current frame;
  – Could just simply subtract reference frame from current frame;
  – Or use a more sophisticated prediction method;
  – Need to use reconstructed frame as basis for prediction so that encoder/decoder stay synchronised.
Coding Unit
• Break image/frame up into 16 x 16 “macro-blocks”
• For YUV (4:2:0 sampling), each macro-block holds:
  – 4 8x8 luminance pixel blocks;
  – 2 8x8 chrominance pixel blocks.
• Coding decisions made on macro-block basis:
  – INTRA/INTER coding mode;
  – prediction method if INTER;
  – Loss introduced.
• Decisions flagged in bitstream syntax.
Generic Codec Structure
Discrete Cosine Transform (DCT)
• Why DCT?
• What is it?
• How does it work?
• How is it computed (in reality)?
• Adoption and variations
• What about the DWT?
• Quantisation
Why DCT?
• Neighbouring pixels are likely to be similar
• The same is true for prediction residual data
• Want to exploit this spatial correlation
• We want a transform that:
  – Removes correlation from data
  – Packs signal energy into as few coefficients as possible
• Coefficients suitable for entropy coding
Why DCT?
• Optimal solution
  – Use eigenvectors of the covariance matrix of the input pixel data
  – Order based on size of eigenvalue
  – Based on theory of principal component analysis (PCA)
  – Referred to as the Karhunen-Loeve Transform (KLT) [rao90]
    • Achieves complete de-correlation
    • Packs most energy into fewest coefficients
    • Minimises MSE for a given number of coefficients (Quantisation)
    • Minimises the entropy
  – Disadvantages:
    • Very computationally demanding
    • Transform kernel is data dependent
    • Kernel must be sent to decoder also!
    • Not practical in a real compression system
• Compromise: the DCT
What is the DCT?
• Treat frame as a grid of 8x8 pixel blocks
  – Pixel data (intra block)
  – Prediction Residual (inter block)
• Compute 8x8 2D DCT on each block
• Formula:
$$F(u,v) = \frac{1}{4} C_u C_v \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16}, \qquad C_u, C_v = \begin{cases}\frac{1}{\sqrt{2}} & \text{for } u, v = 0\\ 1 & \text{otherwise}\end{cases}$$
• Basis functions derived using Fourier theory
What is the DCT?
• Fourier’s theorem and the Nyquist sampling criterion mean only certain discrete frequencies can be present in an 8x8 block of sampled data.
• DCT coefficients tell us “how much” of a particular frequency is present in a particular block
  – Very crude explanation!
• Inverse DCT (IDCT) reverses this process
  – Essentially Fourier synthesis
$$f(x,y) = \frac{1}{4} \sum_{u=0}^{7}\sum_{v=0}^{7} C_u C_v F(u,v)\cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16}, \qquad C_u, C_v = \begin{cases}\frac{1}{\sqrt{2}} & \text{for } u, v = 0\\ 1 & \text{otherwise}\end{cases}$$
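The 8x8 DCT and IDCT definitions can be evaluated directly, as in the Python sketch below. This is a deliberately slow, literal transcription for checking the formulas, not how a codec would compute them; the function names `dct2` and `idct2` are illustrative.

```python
from math import cos, pi, sqrt

N = 8

def c(k):
    """Normalisation factor C_k: 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / sqrt(2) if k == 0 else 1.0

def basis(a, k):
    return cos((2 * a + 1) * k * pi / 16)

def dct2(f):
    """Forward 8x8 DCT: F(u, v) from pixel block f(x, y)."""
    return [[0.25 * c(u) * c(v) * sum(f[x][y] * basis(x, u) * basis(y, v)
                                      for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(F):
    """Inverse 8x8 DCT: f(x, y) from coefficients F(u, v)."""
    return [[0.25 * sum(c(u) * c(v) * F[u][v] * basis(x, u) * basis(y, v)
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

flat = [[100] * N for _ in range(N)]
F = dct2(flat)   # all energy lands in the DC coefficient F[0][0] (= 800)
```

A flat block produces a single non-zero coefficient, which is the energy-compaction behaviour the transform is chosen for.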
How does the DCT work?
• DCT does not compress anything in isolation!
• This is achieved by quantiser and entropy coding
• DCT output easier to compress though
• Most natural video dominated by low frequencies
How does the DCT work?
• Human eye less sensitive to high frequencies
  – Use a quantiser whose step size depends on frequency
  – Effectively discard perceptually unimportant data
  – After quantisation there will be many zero valued coeffs
    • Typically only 5 or 6 non-zero valued coeffs [xanthopoulos99]
• Suitable for run length and entropy coding
How does the DCT work?
• Zig-zag scan
  – Keep statistically related coeffs together
  – Better run-length coding
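The zig-zag ordering can be generated rather than tabulated, as in this Python sketch. It assumes the conventional JPEG-style scan (starting at the DC coefficient, moving right first) and (row, column) indexing; the function names are illustrative.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zig-zag scan order."""
    def key(pos):
        r, c = pos
        d = r + c  # index of the anti-diagonal
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        return (d, r if d % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def zigzag_scan(block):
    """Flatten a square block of coefficients into zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

order = zigzag_order()
print(order[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

After quantisation the non-zero coefficients cluster at the start of this sequence, leaving a long run of zeros at the end for the run-length coder.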
How is the DCT Computed?
• Most implementations exploit the fact that the 2D DCT is separable
  – Compute 1D DCT on each column
  – Compute 1D DCT on each resultant row
  – 16 x 1D 8-point DCTs in total
• Need efficient implementation of 1D 8-point DCT
  – 30 years of research in this field
  – Basic implementation (64 multiplications, 56 additions)
  – Fast implementation [loeffler89] (11 multiplications, 29 additions)
  – Video codec optimised implementation “AAN” [arai89] (5 multiplications, 29 additions)
  – Arithmetic precision a vital decision
• If constraint is 1920x1080 @ 30Hz
  – 972,000 8x8 blocks per second (240 x 135 blocks per frame x 30 frames)
  – Need at least (171x10^6 multiplications, 451x10^6 additions) per second using Loeffler!
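The throughput figures for the 1920x1080 @ 30Hz case follow from simple arithmetic, sketched below. The calculation assumes one 8x8 block per 64 pixels of a single image plane and the Loeffler costs of 11 multiplications and 29 additions per 1D 8-point DCT; variable names are illustrative.

```python
width, height, fps = 1920, 1080, 30

blocks_per_second = (width // 8) * (height // 8) * fps
one_d_dcts_per_second = 16 * blocks_per_second   # 8 column + 8 row transforms

mults_per_second = 11 * one_d_dcts_per_second
adds_per_second = 29 * one_d_dcts_per_second

print(blocks_per_second, mults_per_second, adds_per_second)
```

This gives 972,000 blocks, roughly 171 million multiplications and 451 million additions per second.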
How is the DCT Computed?
• Sometimes dedicated hardware needed
  – Performance and/or power reasons
• Hardware architecture taxonomy
[Figure: taxonomy of DCT implementations. Labels: DCT Implementation, Software, Hardware, Fast Algorithm, Distributed Arithmetic, ROM Based, Adder Based, Systolic Array, Recursive, CORDIC, Approx. Based, Integer Encoding]
Adoption and Variations
• 8x8 DCT
  – Used in JPEG, H.261, H.263, MPEG-1, MPEG-2, MPEG-4 with specific quality requirements
• Shape Adaptive DCT
  – Used in MPEG-4 Advanced Coding Efficiency (ACE) profile
  – Kernel basis functions determined by object shape
• Integer DCT Approximation
  – Used in H.264
  – Block size of 4x4 and 8x8 depending on mode
  – Avoids the “IDCT mismatch” problem
  – Less computationally demanding (16-bit integer arithmetic)
  – More features (can discuss later if necessary)
What about the DWT
• Discrete Wavelet Transform (DWT)
• Used by JPEG-2000
• MPEG-4 uses SA-DWT (for static shape textures)
• Why? “Better than Fourier analysis for non-stationary data”
• Inherently scalable
  – Involves successive LPF and HPF of data and subsampling
• More efficient at very low bit rates
  – DCT and coarse Q → Blocking artefacts
  – DWT and coarse Q → Blurring/smearing (much less perceptible)
• More computationally demanding than DCT
What is Quantisation?
• A lossy process
• Get rid of information
  – Gives compression gain
  – Try to minimise distortion
  – Try to reduce entropy
• Two primary types
  – Scalar quantiser (one to one)
  – Vector quantiser (many to one)
Scalar Quantiser
• Need to find optimal values for
  – Decision levels d_i
  – Reconstruction levels r_i
• Difficult in general!
Scalar Quantiser
• Aim to minimise distortion
  – Minimise MSE → Lloyd-Max quantiser
• A good quantiser design depends on probability distribution of the input data
  – Want less error for more probable inputs
• Case 1: Uniform distribution
  – Decision bands all same width: $$d_{i+1} - d_i = \Delta$$
  – Reconstruction levels equally spaced: $$r_i = d_i + \frac{\Delta}{2}$$
  – Referred to as a “linear quantiser”
  – Used frequently for simplicity
Scalar Quantiser
• Case 2: Piecewise constant distribution
  – Used when # of decision levels N is large
  – Decision level solution difficult (Use numerical methods for Lagrange multipliers)
  – Reconstruction levels: $$r_i = \frac{d_i + d_{i+1}}{2}$$
Scalar Quantiser
• Case 3: Nonuniform distribution
  – Need numerical methods for d_i and r_i
  – Tables available for standard distributions (Gaussian, Laplacian, Rayleigh, …) for popular N
  – This is a true Lloyd-Max quantiser (or optimum mean square quantiser)
• Case 4: Uniform quantiser
  – Uniform refers to equal spacing between decision levels regardless of distribution
  – Similar structure to ‘Case 1’ but different performance because distribution not uniform
  – Commonly used (e.g. in JPEG, …)
Scalar Quantiser Performance
• MSE correlates well with subjective degradation
• Don’t rely on MSE minimisation in isolation though
• Need to consider overall rate-distortion
  – Measures MSE as a function of number of bits n: $$f(n) = a\,2^{-bn}$$
  – Constants a and b depend on distribution
  – When designing a quantiser for each DCT coefficient i need to know n_i
  – 64 quantisers: $$f(n_i) = a\,2^{-b n_i}, \qquad 0 \le i \le 63$$
• How to determine n_i (number of bits per coefficient)?
  – Depends on variance of coefficient i relative to others and specified average bitrate n_av
  – Bit allocation algorithm paradigm
Bit allocation algorithms
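• Try to keep the distortion of each coefficient constant: $$D_i(n_i) = \sigma_i^2 f(n_i)$$
• As variance increases, hold distortion down by using more bits
• Optimal allocation for N coefficients: $$n_i = n_{av} + \frac{1}{b}\log_2\frac{\sigma_i^2}{\left(\prod_{j=0}^{N-1}\sigma_j^2\right)^{1/N}}$$
• Often a rate controller after entropy encoder with feedback path to quantiser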
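The allocation rule, bits in proportion to the log of each coefficient's variance relative to the geometric mean, can be sketched in Python. The function name, the example variances and the value of b are hypothetical; the sketch also leaves the results unrounded and unclipped, whereas a real allocator would need non-negative integer bit counts.

```python
from math import log2

def allocate_bits(variances, n_av, b):
    """n_i = n_av + (1/b) * log2(variance_i / geometric mean of variances)."""
    mean_log = sum(log2(v) for v in variances) / len(variances)
    return [n_av + (log2(v) - mean_log) / b for v in variances]

bits = allocate_bits([16.0, 4.0, 1.0, 0.25], n_av=2.0, b=2.0)
print(bits)  # [3.5, 2.5, 1.5, 0.5]
```

The allocations always average to n_av, so the overall bit budget is preserved while high-variance coefficients receive more bits.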
Scalar Quantiser Summary
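• Uniform quantiser most commonly used
• In fact, rather than transmitting a quantised coefficient, usually transmit the quantisation index
• This has much lower entropy
$$I(u,v) = \mathrm{round}\!\left(\frac{F(u,v)}{\Delta(u,v)}\right)$$
where Δ(u,v) is the quantiser step size for coefficient (u,v).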
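A uniform quantiser that transmits indices rather than reconstructed values can be sketched as follows. The step-size table, block contents and function names are made up for illustration, and rounding to the nearest integer is an assumption about the index rule (Python's `round` sends exact halves to the nearest even integer).

```python
def quantise(F, step):
    """Quantisation index for each coefficient: F(u, v) / step(u, v), rounded."""
    return [[round(f / s) for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(F, step)]

def dequantise(I, step):
    """Decoder side: rebuild approximate coefficients from the indices."""
    return [[i * s for i, s in zip(i_row, s_row)]
            for i_row, s_row in zip(I, step)]

F = [[100.0, -7.0], [3.0, 0.4]]
step = [[16, 8], [8, 8]]
I = quantise(F, step)          # [[6, -1], [0, 0]]
approx = dequantise(I, step)   # [[96, -8], [0, 0]]
```

The indices are small integers with many zeros, which is what makes them cheap to entropy code.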
Vector Quantiser
• Quantise blocks of samples together
  – Each block assigned a single code
• A code book used to find code for block
• Code book can be dynamic or pre-defined
• Each pattern has specific encoding
• Can give very good performance
• Quite computationally expensive
• Difficult to design tables
• Used by GIF standard
Demo
Compression gain
Perceptual quality
Motion Estimation & Compensation
• Exploiting temporal redundancy
• Motion Estimation
  – Block matching algorithm overview
    • Matching Criteria
    • Selection of Search Strategies
• More advanced motion estimation techniques
• Software / Hardware Considerations
• Motion Compensation
• Adoption in standards discussed later
Exploiting Temporal Redundancy
[Figure panels: A) Frame number 1; B) Frame number 2; C) Residual = frame1 - frame2; D) Scaled residual (ease of viewing)]
• Very slight change between successive frames (e.g. A & B)
• Camera & Object Motion
• Temporal prediction model at encoder & decoder provides compression if:
  – model parameters + correction terms < raw pixel information
• e.g. Frame differencing (C)
  – Entropy
    • B = 7.15 bits/pixel
    • C = 4.38 bits/pixel
• More complex models can reduce entropy further
  – Computational expense, memory and prediction performance trade off
• Temporal Prediction model
  – Motion estimation
  – Motion compensation
Taxonomy of Motion Estimation Algorithms
• Good Motion Estimation reviews: [Mitchell96][Furht97][Kuhn99]
[Figure: taxonomy tree]
Motion Estimation Algorithms
  – Time Domain
    • Gradient descent algorithms
      – Pel recursive
      – Block recursive
    • Matching Algorithms
      – Feature Matching
      – Block Matching
        » Search Strategy: Block Subsampling/Hierarchical; Prediction; Search space reduction; Fast heuristic search strategies
        » Matching Criteria: Mean Squared Error; Mean Absolute Error; Sum of absolute difference; Binary Block Matching; SAD summation truncation; SAD estimation; Reduced Bit Mean Absolute Difference; Minimised Maximum Error function; Pixel Difference Classification; Different Pixel Count; Adaptive Bit Truncation; Mean Absolute Difference of Means
        » Other Issues: Block Size (Fixed, Variable); Number of reference frames; Optimisations (Rate / distortion, Complexity / distortion)
  – Frequency Domain
    • Wavelet based matching
    • Phase correlation
    • DCT based matching
Block Matching Algorithm
• For each MxN block in the current frame, find the associated best matching block within a predetermined or adaptive ±S pel search range in one or more reference frames
  – Estimates motion of a group of pixels
  – Assumes translational motion only
  – Typically operates on luminance component only
  – Good trade off between computational complexity & prediction accuracy
• Motion vector (relative offsets to the best match) undergoes VLC
• Prediction Residual undergoes further processing (DCT, VLC, etc)
Matching Criteria
• At each MxN block search position a matching criterion is evaluated
• Wide variety of matching criteria:
  – Mean Squared Error: $$MSE = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(B_{curr}(i,j) - B_{ref}(i,j)\right)^2$$
  – Mean Absolute Differences: $$MAD = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left|B_{curr}(i,j) - B_{ref}(i,j)\right|$$
  – Sum of Absolute Differences: $$SAD = \sum_{i=1}^{M}\sum_{j=1}^{N}\left|B_{curr}(i,j) - B_{ref}(i,j)\right|$$
• Reduced complexity matching criteria
  – Binary Block Match: $$BBM = \sum_{i=1}^{M}\sum_{j=1}^{N} B_{curr}(i,j) \oplus B_{ref}(i,j)$$
• Others
  – Cross correlation
  – SAD summation truncation
  – SAD estimation
  – Reduced Bit Mean Absolute Difference
  – Minimised Maximum Error function
  – Etc
• Choice of matching criterion is a complexity/prediction performance trade off
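The three main criteria differ only in the per-pixel error term and the normalisation, as this Python sketch shows. Blocks are assumed to be equally sized nested lists of pixel values; the function names are illustrative.

```python
def sad(curr, ref):
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(abs(c - r) for c_row, r_row in zip(curr, ref)
               for c, r in zip(c_row, r_row))

def mad(curr, ref):
    """Mean Absolute Difference: SAD normalised by the block area M*N."""
    return sad(curr, ref) / (len(curr) * len(curr[0]))

def mse(curr, ref):
    """Mean Squared Error between two equally sized blocks."""
    total = sum((c - r) ** 2 for c_row, r_row in zip(curr, ref)
                for c, r in zip(c_row, r_row))
    return total / (len(curr) * len(curr[0]))

curr = [[1, 2], [3, 4]]
ref = [[1, 0], [5, 4]]
print(sad(curr, ref), mad(curr, ref), mse(curr, ref))  # 4 1.0 2.0
```

SAD avoids both the multiplication of MSE and the division of MAD, which is one reason it is popular in hardware.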
Search Strategies (1/4)
• Many possible search strategies!
• Full Search: search every position
  • Best results, but very computationally expensive
  • Operations required to generate 1 MV for 1 current block:
    – (2S+1)^2 block matches
    – For each pixel in an M * N block match: subtract, absolute, accumulate
    – After each block match, minimum SAD comparison
    – Therefore total operations:
      » (2S+1)^2 * (M * N * 3 + 1), e.g. S = 8: 289 * (M * N * 3 + 1)
• Reduce computational expense
  – Logarithmic: reduces number of search positions
    • Assumes matching criterion increases monotonically moving away from the minimum point, so the search can iteratively converge to the minimum point
  – Possibility of getting stuck in local minimum
    » Yields higher energy prediction residual
  • Pseudocode for the Three Step Search
    – 1: R = 2^(log2(S) - 1), i.e. S/2 (R = 4 for S = 8);
    – 2: Search positions within the search window defined using R
    – 3: R = R/2;
    – 4: if R < 1 finished, else go to 2.
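Both strategies can be sketched in Python for comparison. This is a minimal sketch under stated assumptions: frames are nested lists indexed [row][column], SAD is the matching criterion, candidates falling outside the reference frame are skipped, and every function name is hypothetical. The three step search re-evaluates the centre at each step, so it performs slightly more matches than the 9+8+8 of an optimised version.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and its displaced match in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def in_bounds(ref, bx, by, dx, dy, n):
    return (0 <= by + dy and by + dy + n <= len(ref)
            and 0 <= bx + dx and bx + dx + n <= len(ref[0]))

def full_search(cur, ref, bx, by, n, s):
    """Test every displacement in [-s, s]; return (best SAD, dx, dy)."""
    best = None
    for dy in range(-s, s + 1):
        for dx in range(-s, s + 1):
            if in_bounds(ref, bx, by, dx, dy, n):
                cost = block_sad(cur, ref, bx, by, dx, dy, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best

def three_step_search(cur, ref, bx, by, n, s):
    """Logarithmic search: step size R starts at s/2 and halves until below 1."""
    r = s // 2
    best = (block_sad(cur, ref, bx, by, 0, 0, n), 0, 0)
    while r >= 1:
        cx, cy = best[1], best[2]          # centre of this step's 3x3 pattern
        for dy in (-r, 0, r):
            for dx in (-r, 0, r):
                mx, my = cx + dx, cy + dy
                if in_bounds(ref, bx, by, mx, my, n):
                    cost = block_sad(cur, ref, bx, by, mx, my, n)
                    if cost < best[0]:
                        best = (cost, mx, my)
        r //= 2
    return best
```

Because the three step search visits only a subset of the window, its SAD can never be lower than that of the full search, and on awkward content it may settle in a local minimum.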
Search Strategies (2/4)
• Logarithmic searches contd.
  – Three Step Search [Koga81]
    • S = 8, initial R = 4
    • Search positions defined using R:
      – (x-R,y-R), (x,y-R), (x+R,y-R) … (x,y), … (x+R,y+R)
    • Operations required to generate 1 MV
      – (9+8+8) * (M * N * 3 + 1)
  – Variants:
    • 2-D logarithmic [Jain81], Parallel 1-D [Chen91], CDS [Rao83], N3SS [Li94], 4SS [Po96]
• Hierarchical Search Strategies
  – Search fewer positions & use fewer pixels in the matching criteria
    • Achieved via sub-sampling current & reference frames
    • Disadvantage: increased memory
  – Best match in lower resolution seeds search for subsequent resolutions
  – Can help to avoid local minima due to low pass filtering effect
  – Local minima still possible for small regions which disappear during sub-sampling
Search Strategies (3/4)
• 3 Level Hierarchical Search Example:
  – Level 1: Original
  – Level 1 sub-sampled by factor of 2 generating level 2
  – Level 1 sub-sampled by 4 generating level 3
  – Motion Estimation starts at level 3
    • block size: N/4 x M/4
    • Search window ±S/4
    • FS or TSS employed within this window
    • Produces motion vector (Vx3, Vy3)
  – Motion Estimation level 2
    • block size: N/2 x M/2
    • Centered on (x/2+2*Vx3, y/2+2*Vy3)
    • Search window ±1 around this point
    • Produces motion vector (Vx2, Vy2)
  – Motion Estimation level 1
    • Centered on (x+2*Vx2, y+2*Vy2)
    • Search window ±1 around this point
    • Produces final motion vector (Vx1, Vy1)
• Operations required to generate 1 MV using a FS at level 3
  • (2*(S/4)+1)^2 * (M/4 * N/4 * 3 + 1) + 9*(M/2 * N/2 * 3 + 1) + 9*(M * N * 3 + 1)
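The operation-count expressions for the search strategies can be compared numerically with a short Python sketch. The function names are illustrative, the cost model is the simple one used in these slides (three operations per pixel plus one comparison per block match), and the level 2 term assumes a block of M/2 x N/2 pixels.

```python
def block_match_ops(m, n):
    """Subtract, absolute, accumulate per pixel, plus one minimum comparison."""
    return m * n * 3 + 1

def full_search_ops(s, m, n):
    return (2 * s + 1) ** 2 * block_match_ops(m, n)

def three_step_ops(m, n):
    return (9 + 8 + 8) * block_match_ops(m, n)

def hierarchical_ops(s, m, n):
    level3 = (2 * (s // 4) + 1) ** 2 * block_match_ops(m // 4, n // 4)
    level2 = 9 * block_match_ops(m // 2, n // 2)
    level1 = 9 * block_match_ops(m, n)
    return level3 + level2 + level1

for name, ops in [("full", full_search_ops(8, 16, 16)),
                  ("three step", three_step_ops(16, 16)),
                  ("hierarchical", hierarchical_ops(8, 16, 16))]:
    print(f"{name}: {ops} ops/block, {ops / 256:.1f} ops/pixel")
```

For S = 8 and 16x16 blocks this gives about 868 operations per pixel for the full search and about 39 for the 3-level hierarchical search.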
Search Strategies (4/4)
• Scene adaptive search area
  – Zone based search strategies
    • Can employ stopping threshold in each zone
    • Advantageous in a rate/distortion sense
    • [chan95][Jung96][Zhe97]
  – Spiral Search
  – Dynamic search window size
    • Many techniques used to adjust range:
      – Spatial correlation of MV [Chain95][In97]
  – Gradient based methods
    • Block based gradient descent search [Liu96]
      – Stops after 4 steps
    • Diamond search [Cote97]
• Early stopping technique
  – Skip to next block match when the minimum SAD has been exceeded
  – Successive elimination algorithm [Li95]
  – Conservative block SAD [Do98]
[Figures: spiral search based motion estimation; zone-based motion estimation]
Different Search Strategy Performance*
• Frame Differencing
  – “0” Motion Vector
  – Entropy: 4.38 bits/pixel
  – 1 operation/pixel (subtraction)
• Full Search
  – Block size 16x16
  – Search range ±8
  – Entropy: 2.61 bits/pixel
  – ~868 operations/pixel
• Hierarchical Search
  – Block size 4x4, 8x8, 16x16
  – Search window ±2, ±4, ±8
  – Entropy: 3.08 bits/pixel
  – ~39 operations/pixel
• Hierarchical Search
  – Block size 4x4, 16x16, 32x32
  – Search window ±2, ±4, ±8
  – Entropy: 2.91 bits/pixel
  – ~35 operations/pixel
More advanced techniques (1/2)
• Bi-directional (Forward and Reverse) Prediction
  – Termed B-frames
  – Not feasible for real-time systems
• Multiple Reference Frames
  – Improves prediction
  – Increases computational expense & memory requirements
• Unrestricted Motion Vectors
  – Allow block matches outside the reference frame
  – Pixel padding used to extend beyond frame boundaries
• Predictive Motion Vectors
  – Rather than start at collocated block use a MV predictor
    • Temporal and/or Spatial prediction [Lee97][Kos97][Zheng97]
    • Can improve prediction residual quality
    • Can employ thresholds to “gate-off” motion estimation
    • H/W: Reduces pixel reusability between current block positions
• Global Motion Compensation
  – “Default motion” for the frame/object
[Figure: frame boundary, original search window, MV predictor and predicted search window]
More advanced techniques (2/2)
• Sub-pel Motion Estimation
  – Real motion is not constrained by integer pixel amounts
  – Half-pel & quarter-pel frequently used
  – But memory increases
  – H.264:
    • 6-tap FIR filter for ½ pel
    • Bilinear for ¼ pel
• Variable Block Size Motion
  – Smaller block size will lead to smaller residual
  – But number of motion vectors & signalling info increases
    • 41 MV per 16x16 block in H.264
  – MPEG-4 & H.263 Advanced Prediction Motion Estimation (4MV)
  – H.264:
    • Dynamically adapts between multiple block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4)
    • Rate/Distortion Optimised
• Motion Vector Coding Prediction
  – Adding MVs to bitstream can be costly, particularly if block size is small
  – DPCM used to exploit spatial MV redundancies
[Figure: H.264 macro-block partitions: 16x16 block, 8x16 blocks, 16x8 blocks, 8x8 blocks, 8x4 blocks, 4x8 blocks, 4x4 blocks]
ME Software/Hardware considerations
• Software algorithmic complexity (simplified analysis)
  – To support 1920x1280 @ 30 frames/sec: 9600 16x16 blocks/frame x 30 = 288K 16x16 blocks/sec
  – ±8 Search Window = 17 x 17 = 289 block matches per current block
  – Total block matches: 289 * 288K = 83,232,000 matches/sec
  – Operations = 83,232,000 * (256 pixels*3+1) ~= 64 GOPS
• Hardware implementations can be attractive
  – Systolic Array (1D/2D) approaches typically employed
    • Memory bandwidth efficient & high throughput
    • Full Search commonly used
  – Architectures also available for heuristic search strategies
• Architectures for H.264 Variable Block Size emerging
  – Ball park figures for H.264 VBSME core:
    • 1-D 16 PE SA:
      – Area: 40-60K gates; Memory Bandwidth: ~3 pixels per clock cycle
      – 1 16x16 block match every 4096 clock cycles (±8 search range)
    • 2-D 256 PE SA:
      – Area: 100-200K gates; Memory Bandwidth: ~48 pixels per clock cycle
      – 1 16x16 block match every 256 clock cycles (±8 search range)
    • To support 1920x1280: 9600 x 30 = 288K 16x16 blocks/sec
      – 256 PE 2D SA requires a clock frequency ~= 75 MHz
      – For higher throughput: Arrays of 1-D/2-D modules required
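The arithmetic on this slide can be reproduced in a few lines. The sketch assumes 30 frames/sec, the slide's cost model of 256*3+1 operations per block match, and 256 clock cycles per current block for the 2-D array; the variable names are illustrative.

```python
WIDTH, HEIGHT, FPS = 1920, 1280, 30
BLOCK = 16
SEARCH_RANGE = 8

blocks_per_frame = (WIDTH // BLOCK) * (HEIGHT // BLOCK)  # 120 * 80 = 9600
blocks_per_sec = blocks_per_frame * FPS                   # 288,000
candidates = (2 * SEARCH_RANGE + 1) ** 2                  # 17 * 17 = 289
matches_per_sec = candidates * blocks_per_sec             # 83,232,000
ops_per_match = BLOCK * BLOCK * 3 + 1                     # 769
ops_per_sec = matches_per_sec * ops_per_match             # 64,005,408,000

# 2-D 256 PE systolic array: one current block processed every 256 cycles.
CYCLES_PER_BLOCK_2D = 256
clock_hz = blocks_per_sec * CYCLES_PER_BLOCK_2D           # 73,728,000

print(f"{ops_per_sec / 1e9:.1f} GOPS for full search in software")
print(f"{clock_hz / 1e6:.1f} MHz for the 256 PE 2-D array")
```

Under these assumptions the software full search needs roughly 64 x 10^9 operations per second, and the 2-D array needs a clock of about 74 MHz.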
Motion Compensation
• Straightforward relative to motion estimation
  – Reconstructed MB = Residual + Mot. Comp. MB (pointed to by MVs)
• Copy block of pixels from displaced block in the reference frame into the current frame
  – Reference frame must be stored in decoder
  – For encoder and decoder to remain synchronised
    • Encoder also needs to do motion compensation
• Considerations:
  – Additional frame memory at the decoder
  – Low computational requirements
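The reconstruction rule above (Reconstructed MB = Residual + Mot. Comp. MB) can be sketched as follows. This is an illustrative integer-pel version on plain nested lists; the function names, the (dy, dx) vector convention and the 8-bit clipping are assumptions, and the displaced block is assumed to lie inside the reference frame.

```python
def motion_compensate(ref, top, left, mv, size):
    """Copy the size x size block displaced by mv = (dy, dx) from the reference frame."""
    dy, dx = mv
    return [[ref[top + dy + y][left + dx + x] for x in range(size)]
            for y in range(size)]


def reconstruct(prediction, residual):
    """Add the decoded residual to the prediction and clip to the 8-bit range."""
    return [[min(max(p + r, 0), 255) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


# 4x4 reference frame with pixel value 10 * row + column.
ref = [[10 * y + x for x in range(4)] for y in range(4)]

prediction = motion_compensate(ref, top=1, left=1, mv=(-1, 1), size=2)
residual = [[1, -1], [0, 2]]
print(reconstruct(prediction, residual))
```

The work per block is a copy plus an addition, which is why motion compensation is cheap compared with the search performed in motion estimation.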
Lossy Compression
Standards
Standards Evolution
[Figure: timeline of standards, 1984 to 2004. JPEG standards: JPEG, JPEG2000. MPEG standards: MPEG-1, MPEG-4. ITU standards: H.261, H.263, H.263+, H.263++. Joint ITU / MPEG standards: H.262/MPEG-2, H.26L (H.264 / MPEG-4 Part 10)]
JPEG
• Flexible image coding standard
• 4 Modes of operation
  – Lossless encoding (earlier)
  – Baseline sequential encoding
  – Progressive encoding
  – Hierarchical encoding (towards JPEG-2000)
• Motion JPEG
  – Baseline encoding of each frame
  – No motion estimation
  – Not properly standardised
JPEG-2000
• JPEG not optimised for a wide range of apps
• JPEG-2000 even more flexible
• Interesting features:
  – Uses DWT instead of DCT
  – Region of Interest (ROI) coding
  – Scalability
    • Spatial scalability
    • SNR scalability
  – More resilient to channel errors
    • Individual quality packets independently decoded
  – Also supports lossless coding
• Added flexibility comes at computational cost
JPEG/JPEG-2000 Summary
• JPEG capable of average compression of 15:1 for subjectively transparent quality
• JPEG-2000 better compression @ fixed rate
  – For ‘Foreman’:
    • Gain of 1.5 to 4 dB over the range 1.2 to 0.12 bpp
• Applications
  – Internet
  – Digital photography
  – Many more
ITU-T H.261
• ITU-T: narrow bandwidth real-time apps
• H.261: (p x 64) kb/s over ISDN (1≤p≤30)
• CIF and QCIF resolution
• Real time video telephony/conferencing
• Up to 3 frames interpolated by decoder
  – Supports framerates of 30Hz, 15Hz, 10Hz, 7.5Hz
• Video compression tools
  – 8x8 DCT
  – Uniform scalar quantiser (rate control optional)
  – Entropy coder is modified run length and Huffman
  – Motion Estimation
    • Only forward direction
    • Search window limited to ±15
    • Integer pixel accuracy only
  – Motion Compensation is optional
  – Loop filter (alleviate blocking)
ISO/IEC MPEG-1
• Storage of AV content for delivery at ~1.5Mb/s
• Flexible
  – Resolutions typically ≤768x576
  – Framerate typically ≤30Hz
• H.261 was starting point for the standard
• Compression gain at expense of latency
• Specific features
  – Standard VLCs determined by Huffman coding
  – DCT DC coeffs are differentially predicted
  – Bi-directional prediction (I,P,B frames)
  – Motion compensation with half-pixel accuracy
  – Maximum MV range of (-512,+511.5) for half pixel and (-1024,+1023) for integer pixel
  – Weighted quantisation (H.261 does not have this)
  – Random access to bitstream, FF, FR
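Differential prediction of the DCT DC coefficients is plain DPCM: each DC value is coded as its difference from the previous one. A minimal sketch follows; the function names and the initial predictor of 0 are assumptions for illustration (a real codec resets the predictor to a value fixed by the standard).

```python
def dpcm_encode(dc_values, initial_pred=0):
    """Replace each DC coefficient by its difference from the previous one."""
    pred = initial_pred
    diffs = []
    for dc in dc_values:
        diffs.append(dc - pred)
        pred = dc
    return diffs


def dpcm_decode(diffs, initial_pred=0):
    """Rebuild the DC coefficients by accumulating the differences."""
    pred = initial_pred
    values = []
    for d in diffs:
        pred += d
        values.append(pred)
    return values


dc = [120, 124, 123, 130]
diffs = dpcm_encode(dc)
print(diffs)
print(dpcm_decode(diffs))
```

Neighbouring blocks tend to have similar average brightness, so the differences are small and cost fewer bits in the VLC stage than the raw DC values.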
ISO/IEC MPEG-2
• High quality video @ 4-15Mb/s
  – VOD, Broadcast TV, DVD, HDTV, Satellite TV
• Major differences w.r.t. MPEG-1
  – More resolutions, framerates, qualities and bitrates
    • SIF (352x288@25Hz) to HDTV (1920x1250@60Hz)
    • Profiles and levels
  – Has interlaced/progressive option
    • Frame/Field based ME, MC and DCT
  – Scalability (temporal, spatial, SNR)
• Minor differences
  – More bits for quantisation
  – Alternate scan (as well as zigzag)
ITU-T H.263
• Very low bitrate apps (< 64kb/s)
  – Video telephony over PSTN, mobile telephony
  – Recommended resolutions: subQCIF, QCIF, CIF, 4CIF, 16CIF
  – Non-interlaced @ 29.97Hz
• Similar to H.261
• Extensions (Some optional in Annex but included in H.264)
  – MVs differentially encoded
  – Half-pixel accurate motion estimation
    • Extensions support quarter and one eighth
  – Unrestricted motion vector mode
    • MVs can point outside image, edge pixels form prediction
  – Advanced prediction mode
    • MB can have 4 MVs associated with it
  – Syntax-based arithmetic encoding (SAC)
    • Optional mode to replace VLCs with arithmetic encoding
  – “PB” frames
  – Error resilience
    • Synchronisation markers
    • Reversible VLCs
    • More suggested in technical annex to standard
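Differential MV coding in H.263 predicts each vector from already-coded neighbours and transmits only the difference. The sketch below assumes the component-wise median of three neighbouring vectors as the predictor; the function names and sample vectors are illustrative, and the special cases at picture and slice borders are not handled.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(left, above, above_right):
    """Component-wise median of three neighbouring motion vectors (x, y)."""
    return (median3(left[0], above[0], above_right[0]),
            median3(left[1], above[1], above_right[1]))


def mv_difference(mv, left, above, above_right):
    """The value actually coded: motion vector minus its prediction."""
    px, py = predict_mv(left, above, above_right)
    return (mv[0] - px, mv[1] - py)


left, above, above_right = (2, 0), (3, -1), (8, 1)
print(predict_mv(left, above, above_right))
print(mv_difference((4, 0), left, above, above_right))
```

The median ignores a single outlying neighbour, such as the (8, 1) vector here, so the coded difference stays small when most of the neighbourhood moves together.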
ISO/IEC MPEG-4
• An all encompassing standard!
  – Improved compression at 5kb/s to 1Gb/s
  – Resolutions of sub-QCIF to studio
  – Content-based interactivity (semantic ‘objects’)
  – Universal access (scalability, error resilience)
  – Synthetic and natural hybrid coding (SNHC)
ISO/IEC MPEG-4
[Figure: MPEG-4 video encoder block diagram. Inputs: Video In (Current Frame), Shape In (Current Frame Shape). Blocks: Motion Estimation, Motion Compensation, SA-DCT, Quantiser, Inverse Quantiser, SA-IDCT, Entropy Encoder, Frame Memory, Shape Coder, Shape Decoder. Signals: Prediction, Prediction Residual, Decoded Prediction Residual, Reconstruction, Reference Frame, Motion Vectors, Decoded Current Frame Shape. Output: Bitstream]
ISO/IEC MPEG-4
• Video coding tools
  – Integer, half and quarter pixel ME
  – Boundary MB ME: padding or polygon matching
  – Global ME
  – Shape Adaptive DCT
  – AC/DC intra prediction
  – Enhanced scalability: FGS
  – Still texture coding (uses SA-DWT)
• Shape Coding tools
  – Context-based arithmetic encoding (CAE)
    • Compute context
    • Index into LUT for probability of 0,1
    • Drive arithmetic encoder
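The three CAE steps (compute context, index a probability LUT, drive the arithmetic encoder) can be sketched on a binary shape mask. Everything specific here is hypothetical: the 4-pixel template, the made-up probability table `P_ZERO` and the function names are illustrative and are not the MPEG-4 tables. Instead of a real arithmetic coder, the sketch sums the ideal cost of -log2(p) bits per pixel, which is what an arithmetic coder approaches.

```python
import math

# Hypothetical causal template: (dy, dx) offsets of already-coded neighbours.
TEMPLATE = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]

# Hypothetical LUT: probability that the pixel is 0, indexed by context.
# The more neighbours are set, the less likely a 0 is assumed to be.
P_ZERO = [(4.5 - bin(c).count("1")) / 5 for c in range(16)]


def context(mask, y, x):
    """Pack the template neighbours into a context number (outside pixels count as 0)."""
    ctx = 0
    for dy, dx in TEMPLATE:
        yy, xx = y + dy, x + dx
        inside = 0 <= yy < len(mask) and 0 <= xx < len(mask[0])
        ctx = (ctx << 1) | (mask[yy][xx] if inside else 0)
    return ctx


def estimated_bits(mask):
    """Ideal arithmetic-coding cost of the mask under the LUT probabilities."""
    bits = 0.0
    for y, row in enumerate(mask):
        for x, pixel in enumerate(row):
            p0 = P_ZERO[context(mask, y, x)]
            bits += -math.log2(p0 if pixel == 0 else 1.0 - p0)
    return bits


mask = [[1, 1, 1],
        [1, 0, 0]]
print(context(mask, 1, 1))
print(round(estimated_bits(mask), 2))
```

A well-matched table gives high probability to the pixel values that actually occur, so smooth shapes cost far less than one bit per pixel.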
ITU-T H.264 or ISO/IEC MPEG-4 Part 10 (AVC)
• Targets enhanced compression for wide range of apps
• Improved prediction
  – Variable block-size MC with small block sizes
  – Up to quarter-pixel MC
  – Unrestricted motion vector mode
  – Multiple reference picture MC
  – Weighted prediction (generalised B-pictures)
  – Directional intra prediction (9 4x4 modes, 4 16x16 modes)
  – In the loop adaptive deblocking filter
• Improved coding efficiency tools
  – Small block size transform
  – Hierarchical block transform
  – Short word length transform (16 bit integer arith)
  – Exact match inverse transform
  – CAVLC, CABAC
• Enhanced error robustness and network friendliness
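To illustrate why an integer transform permits an exact match between encoder and decoder, the sketch below applies the 4x4 H.264 forward core transform matrix and then inverts it using integer arithmetic only. The inverse shown is an assumption made for illustration: it relies on the rows of the matrix being orthogonal with squared norms 4, 10, 4, 10, whereas the standard uses its own inverse matrix and folds the scaling into quantisation.

```python
C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]
NORM = [4, 10, 4, 10]  # squared norm of each row of C


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward(x):
    """Y = C X C^T using integer additions and multiplications only."""
    return matmul(matmul(C, x), transpose(C))


def inverse(y):
    """Recover X exactly: 400 X = C^T S C with S[i][j] = Y[i][j] * 400 / (n_i n_j)."""
    scaled = [[y[i][j] * (400 // (NORM[i] * NORM[j])) for j in range(4)]
              for i in range(4)]
    z = matmul(matmul(transpose(C), scaled), C)
    return [[v // 400 for v in row] for row in z]


x = [[5, 11, 8, 10],
     [9, 8, 4, 12],
     [1, 10, 11, 4],
     [19, 6, 15, 7]]
y = forward(x)
print(y[0][0])           # DC term: the sum of all 16 samples
print(inverse(y) == x)   # integer round trip reproduces the input
```

Because no floating-point rounding is involved, every conforming implementation produces identical results, which avoids the encoder/decoder drift possible with a floating-point DCT.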
ITU-T H.264 or ISO/IEC MPEG-4 Part 10 (AVC)
• H.264 Version 1 has 3 profiles
  – Baseline
  – Main
  – Extended
• Fidelity Range Extension (FRExt) Amendment
  – High Profile
  – High 10 Profile
  – High 4:2:2 Profile
  – High 4:4:4 Profile
    • Up to 12 bits per sample
    • Supports lossless region coding
    • Codes RGB to avoid colour space transformation error
Comparing Standards
• Video conferencing applications
  – Low latency real-time requirement
  – H.264/AVC MP would improve by further 10-20%
    • Using low delay bi-prediction, CABAC
Comparing Standards
• Video streaming applications
  – Less of delay constraint
Comparing Standards
• Entertainment-quality applications
  – High resolution, delay tolerable
Comparing Standards
• Professional motion picture production
  – Random access to individual frames
• Up to HDTV, H.264/AVC MP comparable or better than Motion-JPEG2000
Comparing Standards
• PSNR, while good, does not take into account intricacies of the human eye
  – Need subjective video tests
  – Other metrics
    • MPQM,…
• Experiments show that H.264 gives lowest bitrate for subjectively equivalent video over a range of apps
• Improved performance comes at the cost of computational complexity
  – Main bottleneck is ME (very memory intensive)
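For reference, PSNR for 8-bit images is 10 log10(255^2 / MSE). A minimal sketch on nested lists follows; the function name and the convention of returning infinity for identical images are choices made for illustration.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_error = 0
    count = 0
    for orow, drow in zip(original, decoded):
        for o, d in zip(orow, drow):
            squared_error += (o - d) ** 2
            count += 1
    if squared_error == 0:
        return math.inf  # identical images
    mse = squared_error / count
    return 10 * math.log10(peak * peak / mse)


original = [[100, 100], [100, 100]]
decoded = [[101, 99], [100, 100]]
print(round(psnr(original, decoded), 2))
```

The metric treats every pixel error alike regardless of where it falls, which is why it can disagree with subjective quality.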
Image Analysis
Visual Feature Extraction
Visual Features - Still Images
• What features are important?
  – Colour
  – Texture
    • The feel, appearance, consistency of a surface
    • In an image:
    • Distribution over the entire image?
    • Of specific parts of the image?

[Figure: example images, one with no texture and one highly textured]
Visual Features - Colour
• Colour is visually important to humans
• Colour features and similarity metrics easy to compute
  – Histogram [Swain and Ballard, 1992]
    • Most commonly used structure to represent global image features.
    • Invariant to translation and rotation and can be made invariant to scale by normalisation
• MPEG-7 Scalable Colour Description:
  – H(16 levels) S(4 levels) V(4 Levels)
  – histogram encoded with a Haar transform for efficiency & scaling
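A sketch of a normalised HSV histogram with the 16 x 4 x 4 quantisation mentioned above (256 bins). It uses the standard-library `colorsys` module for the colour conversion; the function names and bin ordering are assumptions, and the Haar transform encoding stage of the MPEG-7 descriptor is not shown.

```python
import colorsys

H_LEVELS, S_LEVELS, V_LEVELS = 16, 4, 4


def quantise(value, levels):
    """Map a value in [0, 1] to an integer level in [0, levels - 1]."""
    return min(int(value * levels), levels - 1)


def hsv_histogram(pixels):
    """Normalised 256-bin HSV histogram of a list of 8-bit (r, g, b) pixels."""
    hist = [0.0] * (H_LEVELS * S_LEVELS * V_LEVELS)
    if not pixels:
        return hist
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        index = ((quantise(h, H_LEVELS) * S_LEVELS + quantise(s, S_LEVELS))
                 * V_LEVELS + quantise(v, V_LEVELS))
        hist[index] += 1
    return [count / len(pixels) for count in hist]


pixels = [(255, 0, 0), (128, 128, 128)]
hist = hsv_histogram(pixels)
print([i for i, value in enumerate(hist) if value > 0])
```

Dividing by the pixel count is the normalisation that makes the histogram comparable across images of different sizes.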
Visual Features - Texture
• Simple texture descriptors [Pratt, 1991]:
  – Autocorrelation function
  – Co-occurrence matrices
  – Edge frequency
  – Primitive length
• More sophisticated (based on transforms and/or filtering)
  – Wavelet [Mallat, 1990], Haar [Theodoridis, 1999], Gabor [Bovik, 1990]
• Others:
  – Mathematical morphology
  – Fractals
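As an example of one of the simple descriptors, a grey-level co-occurrence matrix counts how often level i appears next to level j at a fixed offset. The sketch below is illustrative: the function names, the default offset of one pixel to the right and the contrast statistic are choices made here, and the image is assumed to be already quantised to a small number of levels.

```python
def cooccurrence(image, levels, dy=0, dx=1):
    """Count pairs (image[y][x], image[y + dy][x + dx]) over the whole image."""
    height, width = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for y in range(height):
        for x in range(width):
            yy, xx = y + dy, x + dx
            if 0 <= yy < height and 0 <= xx < width:
                matrix[image[y][x]][image[yy][xx]] += 1
    return matrix


def contrast(matrix):
    """Mean squared level difference over all counted pairs."""
    total = sum(sum(row) for row in matrix)
    weighted = sum((i - j) ** 2 * count
                   for i, row in enumerate(matrix)
                   for j, count in enumerate(row))
    return weighted / total


image = [[0, 0, 1],
         [1, 1, 0]]
m = cooccurrence(image, levels=2)
print(m)
print(contrast(m))
```

A flat region puts all counts on the diagonal and gives zero contrast, while a busy texture spreads counts away from it.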
Visual Features - Texture
• Example: MPEG-7 Edge Histogram
  – Represents the global (and possibly local [Won, 2002]) spatial distribution of edges
    • Need to first generate edge map
      – Roberts, Sobel and Prewitt, Canny, …
    • Build histogram based on 5 edge types
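A simplified sketch of a five-type edge histogram. Rather than running a separate edge detector, it classifies each non-overlapping 2x2 pixel block with five small filters (vertical, horizontal, two diagonals, non-directional), in the way the MPEG-7 descriptor is commonly described. The filter labels, the threshold of 50 and the single global histogram are assumptions for illustration; the actual descriptor works on sub-images and larger image blocks.

```python
import math

SQRT2 = math.sqrt(2)

# Filter coefficients for a 2x2 block laid out as (a, b / c, d).
FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diag_45": (SQRT2, 0, 0, -SQRT2),
    "diag_135": (0, SQRT2, -SQRT2, 0),
    "non_directional": (2, -2, -2, 2),
}


def classify_block(a, b, c, d, threshold=50):
    """Return the strongest edge type of a 2x2 block, or None if below threshold."""
    strengths = {name: abs(f[0] * a + f[1] * b + f[2] * c + f[3] * d)
                 for name, f in FILTERS.items()}
    best = max(strengths, key=strengths.get)
    return best if strengths[best] > threshold else None


def edge_histogram(image, threshold=50):
    """Count edge types over all non-overlapping 2x2 blocks of the image."""
    hist = {name: 0 for name in FILTERS}
    for y in range(0, len(image) - 1, 2):
        for x in range(0, len(image[0]) - 1, 2):
            label = classify_block(image[y][x], image[y][x + 1],
                                   image[y + 1][x], image[y + 1][x + 1],
                                   threshold)
            if label is not None:
                hist[label] += 1
    return hist


image = [[10, 200, 10, 10],
         [10, 200, 200, 200],
         [5, 5, 200, 100],
         [5, 5, 100, 10]]
print(edge_histogram(image))
```

Blocks whose strongest response stays below the threshold are treated as edge-free and do not contribute to any bin.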
Change Detection
• Compare 2 temporally adjacent images and determine how different they are
• Why?
  – Surveillance-type applications
    • Assume static camera & background
    • Anything changing between one image and the next must be an object!
    • In fact, this is naïve but is the starting point of many object segmentation techniques
  – Temporal video structuring
    • Breaking video up into “chunks” for non-linear browsing: shots, scenes, events, story-lines
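The naïve change detector described above amounts to thresholding the absolute difference of two frames. A minimal sketch on grey-level nested lists; the function names and the threshold of 25 are illustrative assumptions.

```python
def change_mask(previous, current, threshold=25):
    """Mark pixels whose grey level changed by more than the threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(previous, current)]


def changed_fraction(mask):
    """Fraction of pixels flagged as changed."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


previous = [[50, 50, 50],
            [50, 50, 50]]
current = [[52, 50, 200],
           [49, 180, 190]]
mask = change_mask(previous, current)
print(mask)
print(changed_fraction(mask))
```

The threshold absorbs sensor noise and small lighting drift, but the static camera and background assumption still has to hold for the flagged pixels to correspond to an object.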
Temporal Video Structuring
• Shot boundary detection
[Figure: a video document, a set of keyframes and keyframe-based video browsers]
Temporal Video Structuring
• Shot boundary detection
  – A shot is a continuous piece of video taken with one camera
  – A shot cut is the abrupt or gradual transition between two shots
• Uncompressed domain:
  – Calculate colour histogram for each frame
  – Calculate difference between histograms using suitable metric: L1 (city-block), L2 (Euclidean), Mahalanobis, etc
  – Threshold
• Compressed domain:
  – Parse features directly from bitstream:
    • E.g. use DCT coefficients for each frame to reconstruct approximation of image
    • E.g. motion vectors for each pair of frames and detect changes in global statistics
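The uncompressed-domain steps (histogram per frame, histogram difference, threshold) can be sketched as follows. For brevity the sketch uses an 8-bin grey-level histogram rather than a colour histogram and the L1 (city-block) metric; the function names, bin count and threshold of 0.5 are assumptions, and only abrupt cuts are targeted.

```python
def grey_histogram(frame, bins=8):
    """Normalised histogram of 8-bit grey levels in a frame (nested lists)."""
    hist = [0.0] * bins
    count = 0
    for row in frame:
        for pixel in row:
            hist[pixel * bins // 256] += 1
            count += 1
    return [value / count for value in hist]


def l1_distance(h1, h2):
    """City-block distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames, threshold=0.5):
    """Indices i where the histogram changes sharply between frame i - 1 and frame i."""
    hists = [grey_histogram(frame) for frame in frames]
    return [i for i in range(1, len(hists))
            if l1_distance(hists[i - 1], hists[i]) > threshold]


dark = [[10, 20], [30, 10]]
bright = [[200, 220], [240, 250]]
frames = [dark, dark, bright, bright]
print(detect_cuts(frames))
```

A single fixed threshold catches abrupt cuts; gradual transitions spread the change over many frames and usually need a comparison over a longer window.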