Transcript
Page 1: Image and Video Coding II - TU Berlin

JPEG


Introduction

Video Codec

[Figure: block diagram. Input sample arrays → video encoder → bitstream → video decoder → reconstructed sample arrays]

Video Encoder
- Maps raw sample arrays into a bitstream
- Determines coding efficiency for given bitstream syntax and decoding process

Video Decoder
- Reconstructs sample arrays from the received bitstream

Typical use case
- Encoder and decoder know bitstream syntax and decoding process

T. Wiegand (TU Berlin) — Image and Video Coding: JPEG 2 / 49


Introduction

Lossy Compression

| | HD movie on Blu-ray disc | UHD broadcast over DVB-S2 | Video chat over the Internet |
|---|---|---|---|
| raw video data format | 1920×1080, 24 fps, Y'CbCr 4:2:0, 8 bit | 3840×2160, 60 fps, Y'CbCr 4:2:0, 10 bit | 1280×720, 50 fps, Y'CbCr 4:2:0, 8 bit |
| raw data rate | ca. 600 Mbit/s | ca. 7.5 Gbit/s | ca. 550 Mbit/s |
| channel bit rate | 36 Mbit/s (read speed) | 58 Mbit/s (8PSK 2/3) | depends on connection |
| typical video bit rate | ca. 20 Mbit/s | ca. 25 Mbit/s | ca. 1 Mbit/s |
| required compression | ca. 30 : 1 | ca. 300 : 1 | ca. 500 : 1 |

- Required video bit rate is typically much smaller than raw data rate
- Required bit rate can only be achieved with lossy compression
- Decoded signal represents approximation of the input signal
- Goal: Represent signal with small impact on perceived quality
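The raw data rates and compression ratios in the table can be reproduced with a few lines of arithmetic. The sketch below assumes that 4:2:0 sampling carries 1.5 samples per luma position (one luma sample plus two quarter-resolution chroma samples); the function name and the `chroma_factor` parameter are illustrative and not taken from the slides.

```python
def raw_bit_rate(width, height, fps, bit_depth, chroma_factor=1.5):
    """Raw bit rate in bit/s. chroma_factor=1.5 models Y'CbCr 4:2:0 (assumption)."""
    return width * height * chroma_factor * bit_depth * fps


hd = raw_bit_rate(1920, 1080, 24, 8)     # HD movie
uhd = raw_bit_rate(3840, 2160, 60, 10)   # UHD broadcast
chat = raw_bit_rate(1280, 720, 50, 8)    # video chat

# Typical video bit rates from the table, in bit/s
for name, raw, coded in [("HD", hd, 20e6), ("UHD", uhd, 25e6), ("chat", chat, 1e6)]:
    print(f"{name}: raw {raw / 1e6:.0f} Mbit/s, compression about {raw / coded:.0f} : 1")
```

The computed ratios land close to the rounded values on the slide, which are stated only approximately.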


Introduction

Coding Efficiency

Coding efficiency
- Trade-off between bit rate and reconstruction quality
- Measured for sets of representative test sequences

Bit rate
- Use average bit rate in our comparisons (count bits in bitstream)
- In practice: Bit distribution among frames is also important

Reconstruction quality
- Human perception of visual quality: Difficult to evaluate
- Most reliable method: Subjective viewing tests (expensive, time consuming)
- Use mean squared error (MSE) and peak signal-to-noise ratio (PSNR)

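$$\mathrm{MSE} = \frac{1}{W \cdot H} \sum_{x=0}^{W-1} \sum_{y=0}^{H-1} \bigl( s'[x,y] - s[x,y] \bigr)^2$$

$$\mathrm{PSNR} = 10 \cdot \log_{10} \frac{(2^B - 1)^2}{\mathrm{MSE}} \qquad (B\text{: bit depth})$$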
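A minimal sketch of the two quality measures, assuming pictures are given as lists of rows of integer samples. The function names and the tiny 2×2 example pictures are made up for illustration.

```python
import math


def mse(orig, recon):
    """Mean squared error between two equally sized sample arrays."""
    height, width = len(orig), len(orig[0])
    total = sum((recon[y][x] - orig[y][x]) ** 2
                for y in range(height) for x in range(width))
    return total / (width * height)


def psnr(orig, recon, bit_depth=8):
    """Peak signal-to-noise ratio in dB; infinite for identical pictures."""
    err = mse(orig, recon)
    if err == 0:
        return math.inf
    peak = (1 << bit_depth) - 1          # 2^B - 1
    return 10 * math.log10(peak * peak / err)


# Hypothetical 2x2 example
original = [[100, 102], [98, 101]]
decoded = [[101, 102], [97, 103]]
print(mse(original, decoded), psnr(original, decoded))
```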


Introduction

Rate-Distortion Curves & Bit-Rate Savings

[Figure, left panel: rate-distortion curves of codec A and codec B, PSNR (YCbCr) [dB] over bit rate [kbit/s]; the bit rates R_A and R_B are marked at an example target quality of 39 dB]

[Figure, right panel: bit-rate saving of codec B vs. codec A [%] over PSNR (YCbCr) [dB], with B_BA = (R_A − R_B) / R_A; the example target quality of 39 dB is marked]

Rate-distortion curve
- Measure average bit rate and average PSNR for multiple operation points

Bit-rate savings
- Determine bit-rate savings using interpolated rate-distortion curves

$$B_{BA} = \frac{R_A - R_B}{R_A} \qquad \text{(codec B relative to codec A)}$$

- Calculate average bit-rate savings by integrating curve
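The bit-rate saving at one target quality can be sketched as follows. This assumes simple linear interpolation between measured (bit rate, PSNR) points, which is only one possible interpolation choice; the operation points are invented for illustration and are not measurements from the lecture.

```python
def rate_at_quality(points, target_psnr):
    """Interpolate the bit rate at target_psnr.

    points: list of (bit_rate, psnr) pairs sorted by increasing bit rate,
    with strictly increasing PSNR (assumption of this sketch).
    """
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        if p0 <= target_psnr <= p1:
            return r0 + (r1 - r0) * (target_psnr - p0) / (p1 - p0)
    raise ValueError("target quality outside the measured range")


def bit_rate_saving(points_a, points_b, target_psnr):
    """B_BA = (R_A - R_B) / R_A at the given target quality."""
    r_a = rate_at_quality(points_a, target_psnr)
    r_b = rate_at_quality(points_b, target_psnr)
    return (r_a - r_b) / r_a


# Hypothetical operation points: (bit rate in kbit/s, PSNR in dB)
codec_a = [(1000, 36.0), (2000, 38.0), (4000, 40.0)]
codec_b = [(500, 36.0), (1000, 38.0), (2000, 40.0)]
saving = bit_rate_saving(codec_a, codec_b, 39.0)
print(f"bit-rate saving at 39 dB: {100 * saving:.1f} %")
```

An average saving would evaluate this quantity over the overlapping PSNR range of both curves and integrate it.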


Intra-Picture Coding

Still Image Coding / Intra-Picture Coding

Still Image Coding
- Exchange, transmission and storage of images
- Used in virtually all digital cameras and picture editing applications
- Most widely used image coding standard: JPEG
- Further standards: JPEG-2000, JPEG-XR, HEVC-Intra

Simple Approach for Video Coding
- Separate coding of pictures of a video sequence

Hybrid Video Coding: Two types of block coding modes
- Intra-picture coding modes
  - Represent blocks of samples without referring to other pictures
  - Utilize only dependencies inside pictures
- Inter-picture coding modes
  - Utilize dependencies between pictures (motion-compensated prediction)


Intra-Picture Coding

Intra-Picture Coding in Hybrid Video Codecs

Two different settings:

Intra Pictures
- All blocks of a picture are coded using intra-picture coding modes
- First picture of a video sequence
- Required for "clean" random access / bitstream splicing
- Can be advantageous in error-prone environments

Individual Intra Blocks
- Some blocks of an inter picture are coded in intra-picture coding modes
- Increases error robustness
- Older standards: Stops accumulation of transform mismatches
- Main reason: Coding efficiency (remember: Non-matched prediction can decrease coding efficiency)


Intra-Picture Coding

Intra Blocks in Inter Pictures — Coding Efficiency

[Figure, left panel: Basketball Drive, IPPP; PSNR (Y) [dB] over bit rate [Mbit/s] for "all coding modes" and for "intra-picture coding modes are disabled in inter pictures"]

[Figure, right panel: Basketball Drive, IPPP; bit-rate increase [%] over PSNR (Y) [dB] due to disabling of intra-picture coding modes in inter pictures (on average, 10.9%)]

Example: IPPP coding with H.265 | MPEG-H HEVC
- Disabling intra blocks in inter pictures increases the bit rate by ca. 11%
- Certain regions of a picture cannot be well predicted using motion-compensated prediction (MCP)
  - Uncovered background
  - Regions with non-translational motion
  - ...


JPEG Overview

The JPEG Standard

Joint Photographic Experts Group (JPEG)
- Standard is named after the group which created it
- Joint committee of ITU-T and ISO/IEC JTC 1
  - ITU-T Study Group 16, Question 6 (Visual Coding Experts Group: VCEG)
  - ISO/IEC Joint Technical Committee 1, Subcommittee 29, Working Group 1 (ISO/IEC JTC 1/SC 29/WG 1)

Standard: Digital Compression and Coding of Continuous-Tone Still Images
- Officially ITU-T Rec. T.81 and ISO/IEC 10918-1
- Commonly referred to as JPEG
- Specifies compression for gray-level and color images
- Work commenced in 1986
- Standard published in 1992
- Still the most widely used image compression standard


JPEG Overview

JPEG: Color Components and Partitioning

- Color components (e.g., Y, Cb, Cr) are coded independently of each other
- Color components are partitioned into 8×8 blocks (padding at borders)
- The 8×8 blocks are coded using transform coding
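The partitioning of one color component into 8×8 blocks might be sketched as below. The slide only says that padding is applied at the borders; replicating the last row and column is an assumption made here for illustration, and the function names are hypothetical.

```python
def pad_to_multiple(plane, block=8):
    """Extend a sample array so that width and height are multiples of `block`.

    Padding by edge replication is an assumption of this sketch.
    """
    height, width = len(plane), len(plane[0])
    pad_w = -width % block
    pad_h = -height % block
    rows = [list(row) + [row[-1]] * pad_w for row in plane]
    rows += [list(rows[-1]) for _ in range(pad_h)]
    return rows


def split_blocks(plane, block=8):
    """Return the blocks of the padded plane in raster-scan order."""
    padded = pad_to_multiple(plane, block)
    return [[row[x:x + block] for row in padded[y:y + block]]
            for y in range(0, len(padded), block)
            for x in range(0, len(padded[0]), block)]


# Hypothetical 10x12 component: padded to 16x16, giving four 8x8 blocks
plane = [[y * 12 + x for x in range(12)] for y in range(10)]
blocks = split_blocks(plane)
print(len(blocks), len(blocks[0]), len(blocks[0][0]))
```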


JPEG Overview

JPEG: Block Transform Coding (8× 8 Blocks)

[Figure: block diagram. Encoder: block of samples → 2D transform → scalar quantization → entropy coding → sequence of bits. Decoder: sequence of bits → entropy decoding → decoder mapping → 2D inverse transform → reconstructed block]

2D Block Transform
- Energy compaction (reduce dependencies between coefficients)

Scalar Quantization
- Approximate signal in a suitable way

Entropy Coding
- Represent quantization indexes with a small number of bits


2D Block Transform / Orthogonal Block Transform

Orthogonal Block Transform

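Linear Transform
- General case: Samples of a block are arranged in a vector $\mathbf{s}_{vec}$
- Forward and inverse transforms are given by

$$\mathbf{t}_{vec} = \mathbf{A} \cdot \mathbf{s}_{vec} \quad\text{and}\quad \mathbf{s}'_{vec} = \mathbf{B} \cdot \mathbf{t}'_{vec}$$

- $\mathbf{s}'_{vec}$ is given by weighted sum of transform basis functions (columns of $\mathbf{B}$)

$$\mathbf{s}'_{vec} = t'_0 \cdot \mathbf{b}_0 + t'_1 \cdot \mathbf{b}_1 + t'_2 \cdot \mathbf{b}_2 + \cdots$$

Perfect Reconstruction
- Perfect reconstruction in the absence of quantization

$$\Longrightarrow \quad \mathbf{A} = \mathbf{B}^{-1}$$

Orthogonal Transforms
- Transform basis functions $\mathbf{b}_k$ are orthogonal to each other
- Transform basis functions $\mathbf{b}_k$ have unit norms ($\mathbf{b}_k^T \mathbf{b}_k = 1$)

$$\Longrightarrow \quad \mathbf{A} = \mathbf{B}^{-1} = \mathbf{B}^T$$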


2D Block Transform / Orthogonal Block Transform

Orthogonal Block Transform

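Orthogonal Block Transforms
- Rotation (and reflection) in signal space (axes remain orthogonal)
- Forward and inverse transforms are given by

$$\mathbf{t}_{vec} = \mathbf{B}^T \cdot \mathbf{s}_{vec} \quad\text{and}\quad \mathbf{s}'_{vec} = \mathbf{B} \cdot \mathbf{t}'_{vec}$$

Main Advantage
- SSD distortion in signal space = SSD distortion in transform domain

$$D = (\mathbf{s} - \mathbf{s}')^T (\mathbf{s} - \mathbf{s}') = (\mathbf{t} - \mathbf{t}')^T \mathbf{B}^T \mathbf{B}\, (\mathbf{t} - \mathbf{t}') = (\mathbf{t} - \mathbf{t}')^T (\mathbf{t} - \mathbf{t}')$$

$$\sum_k (t_k - t'_k)^2 = \sum_k (s_k - s'_k)^2$$

- SSD distortion can be minimized with independent scalar quantizers
- Lagrangian costs $D + \lambda \cdot R$ can be minimized using simple algorithms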
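The equality of the SSD distortion in both domains can be checked numerically. The sketch below uses a 2×2 rotation matrix as a stand-in for an orthogonal transform and rounding to integers as a crude quantizer; both choices, and all names, are illustrative assumptions.

```python
import math


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def ssd(a, b):
    """Sum of squared differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


angle = math.pi / 6
B = [[math.cos(angle), -math.sin(angle)],
     [math.sin(angle), math.cos(angle)]]   # columns are orthonormal basis vectors

s = [10.0, 4.0]                      # hypothetical sample vector
t = matvec(transpose(B), s)          # forward transform: t = B^T s
t_q = [float(round(c)) for c in t]   # crude quantization with step size 1
s_rec = matvec(B, t_q)               # inverse transform: s' = B t'

d_signal = ssd(s, s_rec)
d_transform = ssd(t, t_q)
print(d_signal, d_transform)         # equal up to floating-point error
```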


2D Block Transform / Separable 2D Block Transforms

Separable 2D Block Transforms

Separable Transforms
- N×M blocks of samples and transform coefficients
- Forward and inverse transforms are given by

$$\mathbf{t} = \mathbf{B}_V^T \cdot \mathbf{s} \cdot \mathbf{B}_H \quad\text{and}\quad \mathbf{s}' = \mathbf{B}_V \cdot \mathbf{t}' \cdot \mathbf{B}_H^T$$

Interpretation (forward transform)
1. Transform all columns of the block using the vertical transform
2. Transform all rows of the intermediate block using the horizontal transform

or
1. Transform all rows of the block using the horizontal transform
2. Transform all columns of the intermediate block using the vertical transform

Advantage
- Significantly reduced complexity
- Potential loss in coding efficiency is very small (due to 2D character of data)


2D Block Transform / JPEG: 2D DCT-II

JPEG: Discrete Cosine Transform (DCT) of Type II

Discrete Cosine Transform of type II (DCT-II)
- Horizontal and vertical transforms $\mathbf{B}_H$ and $\mathbf{B}_V$ are DCTs of type II
- N×N inverse transform matrix $\mathbf{B}_{DCT} = \{b_{ik}\}$ is given by coefficients

$$b_{ik} = \frac{a_k}{\sqrt{N}} \cos\left( \frac{\pi}{N}\, k \left( i + \frac{1}{2} \right) \right) \quad\text{with}\quad a_k = \begin{cases} 1 & : k = 0 \\ \sqrt{2} & : k > 0 \end{cases}$$

- Each basis vector $\mathbf{b}_k = \{b_{ik}\}$ represents a sampled cosine function, where the frequency $\omega_k = k\,\pi / N$ increases with increasing $k$

Why DCT-II?
1. Fourier transform of double size applied to signal with mirror extension
   - Overcomes discontinuity issues of Fourier transform
2. Optimal transform for Gauss-Markov sources with ρ → 1
   - Independent transform coefficients: KLT for Gauss-Markov with ρ → 1
3. Fast algorithms for computing forward and inverse transform
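The DCT-II definition and the separable 2D transform can be sketched directly from the formulas, without any fast algorithm. This is a plain, unoptimized illustration using nested lists; the helper names are hypothetical, and real codecs use fast or integer implementations instead.

```python
import math


def dct2_matrix(n):
    """Inverse transform matrix B = {b_ik}; column k holds basis vector b_k."""
    return [[(1.0 if k == 0 else math.sqrt(2.0)) / math.sqrt(n)
             * math.cos(math.pi * k * (i + 0.5) / n)
             for k in range(n)]
            for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_2d(block, b_v, b_h):
    """t = B_V^T * s * B_H (columns first, then rows)."""
    return matmul(matmul(transpose(b_v), block), b_h)


def inverse_2d(coeffs, b_v, b_h):
    """s' = B_V * t' * B_H^T."""
    return matmul(matmul(b_v, coeffs), transpose(b_h))


B = dct2_matrix(8)
flat = [[16.0] * 8 for _ in range(8)]   # hypothetical constant block
t = forward_2d(flat, B, B)              # only the DC coefficient is non-zero
rec = inverse_2d(t, B, B)               # reproduces the block without quantization
print(round(t[0][0], 6))
```

For a constant block all energy ends up in the DC coefficient, which is the extreme case of energy compaction.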


2D Block Transform / JPEG: 2D DCT-II

Basis Functions of DCT-II (size 8)

[Figure: plots of the eight DCT-II basis functions b0 to b7]


2D Block Transform / JPEG: 2D DCT-II

JPEG: Separable 2D DCT-II for 8× 8 Blocks

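[Figure: original block, block after horizontal DCT, block after vertical DCT, and block after 2D DCT. The 2D DCT is reached either by a horizontal DCT followed by a vertical DCT or by a vertical DCT followed by a horizontal DCT]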


2D Block Transform / JPEG: 2D DCT-II

Basis Images of Separable 8×8 DCT-II (used in JPEG)


2D Block Transform / Integer Approximation

Integer Approximations of DCT

Disadvantage of DCT
- Most matrix coefficients are irrational numbers
- Have to be approximated by binary numbers with finite precision
- Mismatches if encoder and decoder use different approximations
- Mismatches can accumulate if prediction between blocks/pictures is used

Transforms in newer Video Coding Standards (H.264 | AVC, H.265 | HEVC)
- New video coding standards specify integer approximation of DCT
- Same approximation is used in all implementations
- No encoder/decoder mismatches due to different implementations


2D Block Transform / Coding Efficiency

Transform Gain

Coding efficiency of a transform
- Difficult to evaluate
  - All components of a transform codec influence each other
- For Gaussian sources, high rates, and entropy-constrained scalar quantizers
  - Transform gain (ratio of arithmetic and geometric mean of variances)

$$G_T = 10 \cdot \log_{10} \left( \frac{\frac{1}{N} \sum_k \sigma_k^2}{\left( \prod_k \sigma_k^2 \right)^{1/N}} \right)$$

- Transform gain $G_T$ represents a measure for the decorrelation property / energy compaction property of a transform

Karhunen-Loève transform (KLT)
- Orthogonal transform that maximizes transform gain $G_T$
  - Optimal transform for Gaussian signals
- KLT is signal dependent and, for 2D signals, it is a non-separable transform
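The transform gain formula can be evaluated as follows. The sketch assumes strictly positive coefficient variances and computes the geometric mean through logarithms to avoid overflow of the product; the example variances are invented.

```python
import math


def transform_gain_db(variances):
    """Ratio of arithmetic to geometric mean of the variances, in dB."""
    n = len(variances)
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return 10 * math.log10(arithmetic / geometric)


# Hypothetical coefficient variances: strong energy compaction gives a large gain
print(transform_gain_db([16.0, 1.0, 1.0, 1.0]))
# Equal variances: no gain
print(transform_gain_db([2.0, 2.0, 2.0, 2.0]))
```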


2D Block Transform / Coding Efficiency

Transform Gain of KLT, DCT-II, HEVC Integer Transform

[Figure, left panel: Original pictures; loss relative to non-separable KLT [dB] for the sequences Bas, BQT, Cac, Kim, Par, with bars for separable KLT, DCT (type II), and HEVC transform. DCT transform gain (in dB) is shown above the bars; extracted values: 17.05, 16.08, 18.99, 23.13, 16.56]

[Figure, right panel: Residual pictures; same axes and legend. DCT transform gain (in dB) is shown above the bars; extracted values: 2.87, 1.03, 2.08, 7.38, 2.69]

Experimental investigation of transform gain
- Blocks of 8×8 samples (original and residual) for 5 test sequences
- Restriction to separable transforms has rather small impact
- DCT and integer approximation slightly decrease transform gain

Note: Transform gain does not reflect all effects (for non-Gaussian sources)


2D Block Transform / Coding Efficiency

Coding Optimal Transform (COT)

Coding Optimal Transform
- No straightforward design criterion
- Design transform, quantizer, entropy coding using iterative algorithm
- Given: Lagrange multiplier λ and sufficiently large training set $\{\mathbf{s}_k\}$

Example algorithm for designing a coding optimal transform
1. Choose initial transform (e.g., KLT); given by inverse transform matrix $\mathbf{B}$
2. Generate transform coefficient vectors $\{\mathbf{t}_k\}$ by transforming all sample vectors $\{\mathbf{s}_k\}$ of the training set using the forward transform $\mathbf{B}^T$
3. Develop an ECSQ (using the given λ) for each transform coefficient
4. Generate set of reconstructed transform coefficients $\{\mathbf{t}'_k\}$ using the quantizers
5. Choose the inverse orthogonal transform matrix $\mathbf{B}$ that minimizes the MSE distortion $D$ between $\{\mathbf{s}_k\}$ and $\{\mathbf{B}\,\mathbf{t}'_k\}$
6. Repeat the previous four steps until convergence


2D Block Transform / Coding Efficiency

Coding Efficiency of KLT, DCT-II, COT

[Figure, left panel: Cactus (original pictures); loss relative to non-separable KLT [dB] over bit rate (first-order entropy) [Mbit/s] for separable KLT, DCT (type II), and COT]

[Figure, right panel: Cactus (residual pictures); loss relative to non-separable KLT [dB] over bit rate (first-order entropy) [Mbit/s] for separable KLT, DCT (type II), and COT]

Coding experiment for 8×8 blocks of original and residual pictures
- Entropy-constrained scalar quantizers & optimal independent entropy coding
- Compare non-separable KLT, separable KLT, DCT-II, and COT

2D DCT-II represents a reasonable choice for transform coding
- Signal independent, low complexity, no side information
- Rather small losses in coding efficiency compared to KLT & COT


Scalar Quantization

Scalar Quantization

Consider
- Scalar quantization of transform coefficients
- Separate quantization of each transform coefficient
- Assume independent, but optimal entropy coding

Optimal scalar quantizers
- Entropy-constrained scalar quantizers (ECSQ)
- ECSQs depend on distribution of transform coefficients
- Require transmission of reconstruction levels
- Not used in practical video codecs

Scalar quantizers used in practice
- Uniform reconstruction quantizers (URQs)
- URQs with extra-wide dead zone (older video coding standards)
- Can be characterized by single parameter (step size)


Scalar Quantization / Distribution of Transform Coefficients

Distribution of Transform Coefficients

[Figure, left panel: Transform coefficient t1,1 (residual); probability density over transform coefficient value, showing the histogram, the approximation by a Laplacian pdf (better fit), and the approximation by a Gaussian pdf]

[Figure, right panel: Transform coefficient t2,4 (residual); probability density over transform coefficient value, showing the histogram, the approximation by a Laplacian pdf, and the approximation by a Gaussian pdf (better fit)]

Experimental investigation for 8×8 DCT
- Many coefficients can be well modeled by a Laplacian pdf
- For other coefficients, Gaussian model provides better fit
- Good model: Generalized Gaussian distribution (typically between Laplacian and Gaussian)


Scalar Quantization / Uniform Reconstruction Quantizers

Uniform Reconstruction Quantizers (URQs)

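[Figure: uniform reconstruction quantizer on the s axis. Reconstruction levels s'-4 ... s'4 are located at −4·Δ ... 4·Δ; the decision thresholds u-3 ... u4 and the quantization offsets z0 ... z3 (symmetric around zero) are marked]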

Uniform reconstruction quantizers
- Equally spaced reconstruction levels (indicated by step size Δ)
- Simple decoder mapping

$$t' = \Delta \cdot q$$

- Encoder has freedom to adapt decision thresholds to source
- Decision thresholds can be specified by quantization offsets $z_k$ (see figure)
- Iterative design algorithm similar to that for ECSQs
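A minimal sketch of a URQ with a single encoder-side offset. The decoder mapping t' = Δ · q is taken from the slide; the encoder rule with a `rounding_offset` parameter is a commonly used parameterization assumed here for illustration, and it is not necessarily identical to the definition of the offsets z_k in the figure.

```python
import math


def urq_quantize(t, step, rounding_offset=0.5):
    """Encoder: map a coefficient to a quantization index.

    rounding_offset=0.5 is plain rounding; smaller values widen the zero bin
    and shift all thresholds away from zero (assumed parameterization).
    """
    sign = -1 if t < 0 else 1
    return sign * math.floor(abs(t) / step + rounding_offset)


def urq_reconstruct(q, step):
    """Decoder mapping: t' = step * q."""
    return step * q


step = 2.0
for t in (3.0, -3.0, 0.9, 7.4):
    q_round = urq_quantize(t, step)               # plain rounding
    q_offset = urq_quantize(t, step, 1.0 / 3.0)   # thresholds shifted away from zero
    print(t, q_round, urq_reconstruct(q_round, step), q_offset)
```

The decoder is unaffected by the offset, which is why the encoder is free to adapt the decision thresholds.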


Scalar Quantization / Uniform Reconstruction Quantizers

Coding Efficiency Comparison: URQs vs Optimal ECSQs

[Figure, left panel: SNR loss relative to ECSQ [dB] over rate (entropy) [bit/sample] for URQ (opt.), URQ2T, and URQ1T; vertical axis from 0 to 0.0025 dB]

[Figure, right panel: SNR loss relative to ECSQ [dB] over rate (entropy) [bit/sample] for URQ (opt.), URQ3T, URQ2T, and URQ1T (maximum at ≈ 0.13327 dB); vertical axis from 0 to 0.025 dB]

Experimental investigation for Laplacian and Gaussian sources
- URQ (opt.): URQ with optimally selected decision thresholds
- URQ1T: URQ with single quantization offset ($z_0 = z_1 = z_2 = \cdots$)
- URQ2T: URQ with two quantization offsets ($z_0$ and $z_1 = z_2 = \cdots$)

Restriction to URQs has (typically) very small impact on efficiency


Scalar Quantization / Bit Allocation

Bit Allocation among Transform Coefficients

Optimal bit allocation
- All scalar quantizers are designed using the same Lagrange multiplier λ

High-rate approximation for URQs
- Operational distortion-rate function $D_k(R_k)$ for component quantizers

$$D_k(R_k) = \varepsilon_k^2 \cdot \sigma_k^2 \cdot 2^{-2 R_k}$$

- Optimal bit allocation

$$\lambda = -\frac{dD_k}{dR_k} = 2 \ln 2 \cdot \varepsilon_k^2 \cdot \sigma_k^2 \cdot 2^{-2 R_k} = 2 \ln 2 \cdot D_k = \text{const}$$

- High-rate approximation for distortion

$$D_k = \frac{1}{12} \cdot \Delta_k^2$$

- High-rate bit allocation rule

$$\Delta_k = \text{const}$$

Same quantization step size $\Delta_k$ for all transform coefficients
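The step from the constant-slope condition to a constant step size can be spelled out as follows. This is a sketch under the same high-rate assumptions as above; the symbol D for the common component distortion is introduced here only for illustration.

```latex
% Constant slope implies the same distortion D for every component k
\lambda = 2 \ln 2 \cdot D_k
\quad\Longrightarrow\quad
D_k = \frac{\lambda}{2 \ln 2} = D \quad \text{for all } k

% With the high-rate approximation D_k = \Delta_k^2 / 12, the step size is the same for all k
\frac{1}{12} \Delta_k^2 = D
\quad\Longrightarrow\quad
\Delta_k = \sqrt{12\, D} = \sqrt{\frac{6 \lambda}{\ln 2}}

% The component rates then follow from the distortion-rate function
D = \varepsilon_k^2 \, \sigma_k^2 \, 2^{-2 R_k}
\quad\Longrightarrow\quad
R_k = \frac{1}{2} \log_2 \frac{\varepsilon_k^2 \, \sigma_k^2}{D}
```

Components with larger variance therefore receive more bits, while the step size stays the same for all of them.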


Scalar Quantization / Bit Allocation

Coding Experiment: Bit Allocation

[Figure, left panel: Cactus (residual pictures); loss relative to optimal ECSQ [dB] over bit rate (first-order entropy) [Mbit/s] for URQ (same λ), URQ (same Δ), URQ2T (same Δ), and URQ1T (same Δ)]

[Figure, right panel: Kimono (residual pictures); loss relative to optimal ECSQ [dB] over bit rate (first-order entropy) [Mbit/s] for URQ (same λ), URQ (same Δ), URQ2T (same Δ), and URQ1T (same Δ)]

Experimental investigation for 8×8 residual blocks
- Reference: Optimal ECSQs with optimal bit allocation (same λ)
- URQ (same λ): Optimal URQs, all designed for the same λ
- URQ (same Δ): Optimal URQs with the same quantization step size Δ
- URQXT: Restricted URQ design with X different quantization offsets

Simple and efficient: URQs with the same quantization step size


Scalar Quantization / JPEG

Quantization in JPEG: Uniform Reconstruction Quantizers

Inverse quantization (scaling) in decoder

$$t'_{ik} = \Delta_{ik} \cdot q_{ik}$$

with
- $q_{ik}$: Quantization index for coefficient at location (i, k) inside block
- $\Delta_{ik}$: Quantization step size for coefficient at location (i, k)
- $t'_{ik}$: Reconstructed transform coefficient at location (i, k)

Quantization (in encoder)
- No normative encoding procedure, but informative quantization rule

$$q_{ik} = \mathrm{round}\left( \frac{t_{ik}}{\Delta_{ik}} \right) \qquad (t_{ik}\text{: original transform coefficient})$$

- Can be improved (discuss later: Rate-distortion optimized quantization)

Quantization weighting matrices
- Separate quantization step sizes $\Delta_{ik}$ for each coefficient location (i, k)
- Additional freedom for encoder optimization (perceptual optimization)


Scalar Quantization / JPEG

Quantization Weighting Matrices in JPEG

Quantization Weighting Matrices / Quantization Tables
- Determine rate and distortion (among other parameters)
- Need to be transmitted (no default tables in JPEG)
- Example tables for YCbCr format are specified in Annex K of standard (empirically derived based on psychovisual threshold experiments)

luma blocks

    16  11  10  16  24  40  51  61
    12  12  14  19  26  58  60  55
    14  13  16  24  40  57  69  56
    14  17  22  29  51  87  80  62
    18  22  37  56  68 109 103  77
    24  35  55  64  81 104 113  92
    49  64  78  87 103 121 120 101
    72  92  95  98 112 100 103  99

chroma blocks

    17  18  24  47  99  99  99  99
    18  21  26  66  99  99  99  99
    24  26  56  99  99  99  99  99
    47  66  99  99  99  99  99  99
    99  99  99  99  99  99  99  99
    99  99  99  99  99  99  99  99
    99  99  99  99  99  99  99  99
    99  99  99  99  99  99  99  99

Optimal PSNR: Use same quantization step size for all coefficients
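Quantization and scaling with a weighting matrix can be sketched as below, using the example luma table from Annex K. The rounding rule (round half away from zero) is an assumption, since the encoder side is not normative, and the coefficient block is invented for illustration.

```python
import math

# Example luma quantization table (Annex K of the JPEG standard)
LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def round_half_away(x):
    """Round to nearest integer, ties away from zero (assumed rounding rule)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(coeffs, table):
    """Encoder side: q_ik = round(t_ik / Delta_ik)."""
    return [[round_half_away(t / d) for t, d in zip(t_row, d_row)]
            for t_row, d_row in zip(coeffs, table)]


def dequantize(levels, table):
    """Decoder side: t'_ik = Delta_ik * q_ik."""
    return [[q * d for q, d in zip(q_row, d_row)]
            for q_row, d_row in zip(levels, table)]


# Hypothetical coefficient block with three non-zero entries
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 128.0, -30.0, 40.0
levels = quantize(coeffs, LUMA_TABLE)
recon = dequantize(levels, LUMA_TABLE)
print(levels[0][:2], levels[7][7], recon[0][:2])
```

The high-frequency coefficient is quantized to zero because its step size is much larger, which is the intended perceptual weighting.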


Entropy Coding / Statistical Dependencies

Entropy Coding of Transform Coefficient Levels: Scanning

For investigation of transforms and quantizers
- Ignored potential dependencies between transform coefficient levels
- Used sum of first-order entropies for approximating rate

Statistical dependencies & Scanning patterns
- Transform coefficient levels are not independent of each other (see later)
- High-frequency transform coefficient levels are more likely to be equal to zero
- Scanning: Traverse coefficients from low to high frequency positions

Probabilities $P(q_k \neq 0)$:

    0.242  0.108  0.053  0.009
    0.105  0.053  0.022  0.002
    0.046  0.017  0.006  0.001
    0.009  0.002  0.001  0.000

[Figure: zig-zag scan (JPEG) and diagonal scan (HEVC)]
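A zig-zag scan order can be generated by walking the anti-diagonals of the block and alternating the direction. The sketch below uses (row, column) positions and follows the usual JPEG zig-zag convention as far as I can reconstruct it from the first positions; the function names are illustrative.

```python
def zigzag_order(n):
    """Return the (row, column) positions of an n x n block in zig-zag order."""
    order = []
    for d in range(2 * n - 1):                  # d = row + column of the anti-diagonal
        rows = range(max(0, d - n + 1), min(d, n - 1) + 1)
        if d % 2 == 0:                          # even diagonals run from bottom-left to top-right
            rows = reversed(rows)
        order.extend((r, d - r) for r in rows)
    return order


def scan(block):
    """Convert a square block into a 1D sequence using the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Scanning the probability table above puts the likely non-zero positions first
probabilities = [
    [0.242, 0.108, 0.053, 0.009],
    [0.105, 0.053, 0.022, 0.002],
    [0.046, 0.017, 0.006, 0.001],
    [0.009, 0.002, 0.001, 0.000],
]
print(scan(probabilities))
```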


Entropy Coding / Statistical Dependencies

Statistical Dependencies: Transform Coefficient Levels

Are there statistical dependencies?
- Can compare marginal and conditional pmfs
  - Infeasible due to very large signal space
- Instead: Evaluate coding methods that utilize potential dependencies
  - Motivated by approaches found in actual video coding standards
  - If levels are independent, no gain will be observed

Investigate coding concepts that exploit potential dependencies
- Coded block flag (CBF): Signals whether all levels in block are equal to zero
- End-of-block flag (EOB): Signals whether all following levels are equal to zero (transmitted at beginning and after each non-zero level)
- LastPos: Transmit position of last non-zero level in advance
- CtxNumSig: Conditional codes depending on number of already coded non-zero levels (in forward and backward scanning order)


Entropy Coding / Statistical Dependencies

Experiment: Statistical Dependencies

[Figure, left panel: Cactus (residual pictures); bit-rate increase [%] over bit rate (first-order entropy) [Mbit/s] for EOB, LastPos, CtxNumSig (forward), CtxNumSig (backward), and CBF]

[Figure, right panel: Kimono (residual pictures); bit-rate increase [%] over bit rate (first-order entropy) [Mbit/s] for EOB, LastPos, CtxNumSig (forward), CtxNumSig (backward), and CBF]

Experimental investigation for 8×8 residual blocks (DCT + optimal URQs)
- Investigate coding techniques: CBF, EOB, LastPos, CtxNumSig
- No actual coding: Calculate entropy limits for the considered techniques
- Compare limits with sum of marginal entropies (limit for independent coding)

There are statistical dependencies between the levels in a block
- Can be utilized for an efficient entropy coding


Entropy Coding / Run-Level Coding

Entropy Coding Example: Run-Level Coding

Run-Level Coding (e.g., H.262 | MPEG-2 Video)
- Scan block of transform coefficient levels (e.g., using zig-zag scan)
- Map scanned sequence of transform coefficients to (run,level) pairs
  - run: Number of transform coefficient levels equal to zero that precede the next non-zero transform coefficient level
  - level: Value of the next non-zero transform coefficient level
- Codewords are assigned to (run,level) pairs
- Code includes an additional end-of-block symbol (eob)
  - Signals that all following transform coefficient levels are equal to zero

Example:
- Scanned sequence of 20 transform coefficient levels

    5 −3 0 0 0 1 0 −1 0 0 −1 0 0 0 0 0 0 0 0 0

- A conversion into run-level pairs (run,level) yields

    (0,5) (0,−3) (3,1) (1,−1) (2,−1) (eob)
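The conversion in the example can be sketched as follows. The representation of the end-of-block symbol as the string "eob" and the function names are choices made for this illustration; the assignment of codewords to the pairs is not covered.

```python
def run_level_encode(levels):
    """Map a scanned sequence of levels to (run, level) pairs plus an eob symbol."""
    symbols = []
    run = 0
    for level in levels:
        if level == 0:
            run += 1                     # count zeros preceding the next non-zero level
        else:
            symbols.append((run, level))
            run = 0
    symbols.append("eob")                # all remaining levels are equal to zero
    return symbols


def run_level_decode(symbols, length):
    """Rebuild the scanned sequence of the given length from the symbols."""
    levels = []
    for symbol in symbols:
        if symbol == "eob":
            break
        run, level = symbol
        levels.extend([0] * run + [level])
    return levels + [0] * (length - len(levels))


scanned = [5, -3, 0, 0, 0, 1, 0, -1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
symbols = run_level_encode(scanned)
print(symbols)   # [(0, 5), (0, -3), (3, 1), (1, -1), (2, -1), 'eob']
```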


Entropy Coding / JPEG

Entropy Coding in JPEG

Different concepts for DC and AC levels
- DC levels: Differential coding using codeword tables
- AC levels: Run-level coding of scanned coefficients

Coding of DC transform coefficient levels
- DC level is predicted by previous DC level
- Difference to predictor is coded using VLC and FLC
- Category C is coded using VLC
  - Specifies range of values
  - Specifies number of following bits (for FLC)
- FLC specifies actual value of DIFF inside category
  - If DIFF > 0, low-order bits of DIFF
  - If DIFF < 0, low-order bits of DIFF − 1
- VLC table for category needs to be transmitted
  - Increases side information (no default table)
  - Allows adaptation to actual statistics
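The category and the fixed-length bits of a DC difference can be computed as in the sketch below. It assumes that the category equals the number of bits needed for |DIFF|, which is consistent with the DIFF ranges listed for the categories; the function names are illustrative, and the VLC codeword for the category itself is left out.

```python
def dc_category(diff):
    """Category C: number of bits needed to represent |DIFF| (0 for DIFF = 0)."""
    return abs(diff).bit_length()


def dc_extra_bits(diff):
    """FLC part: the C low-order bits of DIFF (if positive) or DIFF - 1 (if negative)."""
    category = dc_category(diff)
    if category == 0:
        return ""                                  # no additional bits for DIFF = 0
    value = diff if diff > 0 else diff - 1
    mask = (1 << category) - 1                     # keeps the C low-order bits
    return format(value & mask, f"0{category}b")


previous_dc = 12                                   # hypothetical DC levels
for dc in (17, 7, 12):
    diff = dc - previous_dc                        # prediction by the previous DC level
    print(dc, diff, dc_category(diff), dc_extra_bits(diff))
    previous_dc = dc
```

Python's `&` on negative integers behaves like two's complement, which is what makes the `DIFF − 1` rule work without special cases.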


Entropy Coding / JPEG

Example VLC Table for Coding DC Difference Category

Example VLC table for coding category C

Range of DIFF values is specified in standard

Codeword assignment has to be transmitted

Example shows recommended table for luma DC (Annex K of JPEG)

Category C   Range of DIFF value            Example codeword
0            0                              00
1            -1, 1                          010
2            -3, -2, 2, 3                   011
3            -7..-4, 4..7                   100
4            -15..-8, 8..15                 101
5            -31..-16, 16..31               110
6            -63..-32, 32..63               1110
7            -127..-64, 64..127             11110
8            -255..-128, 128..255           111110
9            -511..-256, 256..511           1111110
10           -1023..-512, 512..1023         11111110
11           -2047..-1024, 1024..2047       111111110
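As an illustration, the codewords of this example table can be combined with the fixed-length part to form the complete bit representation of a DC level. The following Python sketch uses the hypothetical names `DC_LUMA_VLC` and `encode_dc`. The input values 185 and 178 are those of the worked example later in this part.

```python
# Example codewords for categories 0..11, copied from the table above.
DC_LUMA_VLC = [
    "00", "010", "011", "100", "101", "110",
    "1110", "11110", "111110", "1111110", "11111110", "111111110",
]


def encode_dc(dc, prev_dc):
    """Return the bit string (VLC for the category + FLC) for one DC level."""
    diff = dc - prev_dc                      # prediction by previous DC level
    c = abs(diff).bit_length()               # category C
    value = diff if diff > 0 else diff - 1   # DIFF or DIFF - 1
    flc = format(value & ((1 << c) - 1), "b").zfill(c) if c else ""
    return DC_LUMA_VLC[c] + flc


print(encode_dc(185, 178))  # 100111
```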


Entropy Coding / JPEG

Entropy Coding of AC Transform Coefficient Levels

Representation of AC levels:

Convert into sequence using a zig-zag scan

AC coefficients are likely to be quantized to zero (in particular those at high-frequency locations)

Successive runs of zeros are represented using a run (number of consecutive levels equal to zero)

Non-zero levels are represented by a category and a value inside the category (same as for DC levels)

Coding of AC levels: Combination of VLC and FLC

Variable-length code table is used for coding events {run,category}

VLC table includes a special symbol (EOB) for signaling the end-of-block (all remaining AC levels are equal to zero)

Fixed-length code is used for coding the exact value inside a category (number of bits is given by category), same as for DC difference levels

VLC table has to be transmitted (no default table)
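The mapping from scanned AC levels to {run,category} events with attached FLC bits can be sketched as follows. The function name `ac_events` is hypothetical. The sketch emits EOB only if trailing zeros remain, and it deliberately omits the standard's special handling of runs longer than 15. The input levels are those of the worked example later in this part.

```python
def ac_events(ac_levels):
    """Map zig-zag scanned AC levels to ((run, category), flc_bits) events."""
    events = []
    run = 0
    for level in ac_levels:
        if level == 0:
            run += 1
            continue
        c = abs(level).bit_length()               # category of the level
        value = level if level > 0 else level - 1
        flc = format(value & ((1 << c) - 1), "0{}b".format(c))
        events.append(((run, c), flc))
        run = 0
    if run > 0:
        events.append(("EOB", ""))                # remaining levels are zero
    return events


levels = [3, 1, 0, 0, 1, 0, -1] + [0] * 6 + [-3, -1, -2, -1] + [0] * 46
for event in ac_events(levels):
    print(event)
```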


Entropy Coding / JPEG

Example VLC Table for Run-Category Coding of AC Levels

Example: First entries of recommended table (Annex K of JPEG)

run/category   codeword
EOB            1010
0/1            00
0/2            01
0/3            100
0/4            1011
0/5            11010
0/6            1111000
0/7            11111000
0/8            1111110110
0/9            1111111110000010
0/10           1111111110000011
1/1            1100
1/2            11011
1/3            1111001
1/4            111110110
1/5            11111110110
1/6            1111111110000100
1/7            1111111110000101
1/8            1111111110000110
1/9            1111111110000111
1/10           1111111110001000
2/1            11100
2/2            11111001
2/3            1111110111
2/4            111111110100
2/5            1111111110001001
2/6            1111111110001010
2/7            1111111110001011
2/8            1111111110001100
2/9            1111111110001101
2/10           1111111110001110
3/1            111010
3/2            111110111
3/3            111111110101
3/4            1111111110001111
3/5            1111111110010000
3/6            1111111110010001
3/7            1111111110010010
3/8            1111111110010011
3/9            1111111110010100
3/10           1111111110010101
· · ·          · · ·


Entropy Coding / JPEG

Example: JPEG Transform Coefficient Level Coding

Coding of DC transform coefficient level:

Last DC level: DC(N − 1) = 178

Prediction difference: DIFF = 185 − 178 = 7

Category C = 3: Codeword “100”

Fixed-length code (lowest 3 bits of “7”): “111”

Final bit representation (6 bit): “100111”

Coding of AC transform coefficient levels:

Zig-zag scanning and conversion into (run,level) pairs yields

(0, 3) (0, 1) (2, 1) (1,−1) (6,−3) (0,−1) (0,−2) (0,−1) (EOB)

Representation as (run,category) [FLC bits] sequence

(0, 2)[11] (0, 1)[1] (2, 1)[1] (1, 1)[0] (6, 2)[00] (0, 1)[0] (0, 2)[01] (0, 1)[0] (EOB)

Bit sequence: VLC bits [FLC bits] (in total: 46 bits for 63 AC levels)

01 [11] 00 [1] 11100 [1] 1100 [0] 111111110110 [00] 00 [0] 01 [01] 00 [0] 1010
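The AC bit sequence of this example can be reproduced with a short Python sketch. The names `AC_VLC`, `flc` and `encode_ac` are hypothetical. The dictionary holds only the codewords that occur in this example, with the entry for 6/2 read off the bit sequence above since it is not among the first entries of the example table.

```python
# Subset of the example VLC table: only the events needed for this block.
AC_VLC = {
    "EOB": "1010",
    (0, 1): "00",
    (0, 2): "01",
    (1, 1): "1100",
    (2, 1): "11100",
    (6, 2): "111111110110",
}


def flc(level):
    """FLC bits of a non-zero level: low-order bits of level or level - 1."""
    c = abs(level).bit_length()
    value = level if level > 0 else level - 1
    return format(value & ((1 << c) - 1), "0{}b".format(c))


def encode_ac(pairs):
    """Concatenate VLC and FLC bits for (run, level) pairs, then append EOB."""
    bits = []
    for run, level in pairs:
        c = abs(level).bit_length()
        bits.append(AC_VLC[(run, c)] + flc(level))
    bits.append(AC_VLC["EOB"])
    return "".join(bits)


pairs = [(0, 3), (0, 1), (2, 1), (1, -1), (6, -3), (0, -1), (0, -2), (0, -1)]
bitstring = encode_ac(pairs)
print(bitstring, len(bitstring))  # length is 46 bits
```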


JPEG Examples

JPEG Compression Example – Original (YCbCr 4:2:0, 12 bpp)


JPEG Examples

JPEG Compression Example – 1:10 Compression (1.2 bpp)


JPEG Examples

JPEG Compression Example – 1:25 Compression (0.48 bpp)


JPEG Examples

JPEG Compression Example – 1:50 Compression (0.24 bpp)


JPEG Examples

JPEG Compression Example – 1:100 Compression (0.12 bpp)


JPEG Examples

JPEG Compression Example – 1:200 Compression (0.06 bpp)
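The bit rates quoted in the titles of these examples follow from the raw rate of the original. A small Python sketch of the arithmetic, assuming 8 bits per sample (inferred from the stated 12 bpp for 4:2:0) and using the hypothetical helper name `raw_bits_per_pixel`:

```python
def raw_bits_per_pixel(bit_depth=8, chroma_samples_per_luma_sample=0.5):
    """Raw bits per pixel. For 4:2:0, each of the two chroma planes has
    a quarter of the luma samples, i.e. 0.5 chroma samples per luma sample."""
    return bit_depth * (1 + chroma_samples_per_luma_sample)


raw = raw_bits_per_pixel()  # 12.0 bpp for 8-bit 4:2:0
for ratio in (10, 25, 50, 100, 200):
    print("1:{} -> {:.2f} bpp".format(ratio, raw / ratio))
```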


JPEG Features

JPEG Baseline

JPEG Baseline Features:

Sequential processing of 8×8 blocks (all color components)

Transform: Separable 8×8 discrete cosine transform (DCT) of type II

Quantizer: Uniform reconstruction quantizer (using quantization matrix)

DC coding: Prediction and combination of VLC and FLC

AC coding: Zig-zag scan and run-level coding (combination of VLC & FLC)

Encoding options:

Choose quantization and VLC tables

Determine transform coefficient levels (rounding / advanced quantization)

One-pass encoding: Predefine tables and choose transform coefficient levels during encoding

Multi-pass encoding: One or more iterations between

Use given tables and choose transform coefficient levels during encoding

Optimize VLC tables based on given statistics of transform coefficient levels
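The table optimization step of multi-pass encoding can be illustrated with a plain Huffman construction over event counts. This is only a sketch under simplifying assumptions: the function name `huffman_code_lengths` and the event counts are made up for illustration, only codeword lengths are derived (not the actual codeword assignment), and the JPEG constraint that codewords are at most 16 bits long is not enforced.

```python
import heapq
from collections import Counter


def huffman_code_lengths(counts):
    """Return {symbol: codeword length} of a Huffman code for the given counts."""
    if len(counts) == 1:
        return {sym: 1 for sym in counts}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in counts}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # each merge adds one bit to these symbols
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths


# Hypothetical statistics of {run,category} events collected in a first pass.
events = [(0, 1)] * 50 + [(0, 2)] * 20 + ["EOB"] * 20 + [(1, 1)] * 10
counts = Counter(events)
lengths = huffman_code_lengths(counts)
total_bits = sum(counts[s] * lengths[s] for s in counts)
print(lengths, total_bits)
```

In a second pass, the levels would be coded again with the table derived from these lengths, and the table itself is transmitted as side information.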


JPEG Features

JPEG – Beyond Baseline

Extended bit depth:

Images with 1, 2, 3 or 4 color components and 8 or 12 bits per sample

Entropy coding:

Up to 4 AC and DC entropy coding tables can be specified

Adaptive binary arithmetic coding can be optionally used as a replacement for variable-length coding (rarely supported)

Supported modes of operation:

Sequential DCT-based coding (as in Baseline)

Progressive DCT-based coding

Lossless coding

Hierarchical coding

Extended file formats (not part of JPEG standard):

Additional information can be embedded in files (e.g., EXIF): Date/time, camera & lens model, aperture, shutter speed, focal length, etc.


Summary

Part Summary

JPEG Baseline: Transform coding of 8×8 blocks of samples

Transform coding using

Transform: Decorrelation / Energy Compaction

Quantization: Approximation of signal in suitable way

Entropy Coding: Represent quantization indexes with small number of bits

Transform and Quantization:

Separable DCT-II

Uniform reconstruction quantizers (URQs)

Support of quantization weighting matrices (side information)

Entropy Coding:

Different approaches for DC and AC coefficients

DC: Prediction + VLC for Category + Fixed-length coding

AC: Run-Category coding using VLC + Fixed-length coding

Transmission of entropy coding tables as side information
