
Source Coding Fundamentals

Thomas Wiegand · Digital Image Communication


Outline

Introduction

Lossless Coding
  Huffman Coding
  Elias and Arithmetic Coding

Rate-Distortion Theory
  Rate-Distortion Function
  Shannon Lower Bound

Quantization
  Scalar Quantization
  Vector Quantization

Predictive Coding
  Linear Prediction
  Differential Pulse Code Modulation (DPCM)

Transform Coding
  Orthogonal Transforms and Bit Allocation
  Karhunen-Loève Transform (KLT)
  Discrete Cosine Transform (DCT)


Practical Communication Problem

Source codecs are primarily characterized in terms of:

Throughput of the channel, a characteristic influenced by
  the transmission channel bit rate and
  the amount of protocol and error-correction coding overhead incurred by the transmission system

Distortion of the decoded signal, which is primarily induced by
  the source encoder and
  channel errors introduced on the path to the source decoder

The following additional constraints must also be considered:

Delay (start-up latency and end-to-end delay), including
  processing delay, buffering,
  structural delays of the source and channel codecs, and
  the speed at which data are conveyed through the transmission channel

Complexity (computation, memory capacity, memory access) of
  the source codec,
  protocol stacks, and the network


Formulation of the Practical Communication Problem

The practical source coding design problem can be posed as follows:

Given a maximum allowed delay and a maximum allowed complexity, achieve an optimal trade-off between bit rate and distortion for the transmission problem in the targeted applications

Scope of the consideration in this lecture: the source codec

Delay is evaluated only for the source codec

Complexity is likewise assessed only for the algorithm used in the source codec


Types of Compression

Lossless coding:

Uses redundancy reduction as the only principle and is therefore reversible

Also referred to as noiseless coding, invertible coding, data compaction, or entropy coding

Well-known examples are Lempel-Ziv coding (e.g., gzip) for general data and JPEG-LS for picture signals

Lossy coding:

Uses redundancy reduction and irrelevancy reduction and is therefore not reversible

It is the primary coding type in compression for speech, audio, picture, and video signals

The practically relevant bit rate reduction that is achievable through lossy compression is typically more than an order of magnitude larger than with lossless compression

Well-known examples are MPEG-1 Layer 3 (mp3) for audio coding, JPEG for still picture coding, and H.264/AVC for video coding


Distortion Measures

Usage of distortion measures

The use of lossy compression requires the ability to measure distortion

Often, the distortion that a human perceives in coded content is a very difficult quantity to measure, as the characteristics of human perception are complex

Perceptual models are far more advanced for speech and audio codecs than for picture or video codecs

In picture and video coding,
  perceptual models have limited use for guiding encoding decisions (mainly focusing on properties of the human visual system), and
  viewing tests are used to determine the subjective quality of coding results

For investigating source coding techniques, simple objective distortion measures such as MSE and PSNR are often used


Objective Distortion Measures

Mean Squared Error (MSE)

Pictures (X: picture height, Y: picture width):

  MSE = (1/(X · Y)) Σ_{x=0}^{X−1} Σ_{y=0}^{Y−1} ( s′[x, y] − s[x, y] )²    (1)

Videos (N: number of pictures, MSE_n: MSE of picture n):

  MSE = (1/N) Σ_{n=0}^{N−1} MSE_n    (2)

Peak Signal-to-Noise Ratio (PSNR)

Pictures (2^k − 1: maximum amplitude of the picture samples):

  PSNR = 10 · log₁₀( (2^k − 1)² / MSE )    (3)

Videos (N: number of pictures, PSNR_n: PSNR of picture n):

  PSNR = (1/N) Σ_{n=0}^{N−1} PSNR_n    (4)
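A minimal NumPy sketch of Eqs. (1) and (3), assuming pictures are given as 2-D arrays and k is the bit depth (function names are illustrative):

```python
import numpy as np

def mse(ref: np.ndarray, rec: np.ndarray) -> float:
    """Mean squared error between original and decoded picture, Eq. (1)."""
    diff = rec.astype(np.float64) - ref.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(ref: np.ndarray, rec: np.ndarray, k: int = 8) -> float:
    """Peak signal-to-noise ratio in dB, Eq. (3), with peak amplitude 2^k - 1."""
    m = mse(ref, rec)
    return float('inf') if m == 0 else 10.0 * np.log10((2 ** k - 1) ** 2 / m)
```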


Distortion Measures for Picture and Video Coding

Typical artifacts in compressed videos

Blockiness, motion errors, blur, ringing

Picture and video encoding

Measurements of properties of the human visual system have been used to derive spatial contrast sensitivity functions that are used in the encoding of pictures

Temporal sensitivity functions that have been measured have so far not been included in video encoding algorithms known in the public domain

Mean squared error is a widely used measure

Picture and video quality assessment

Objective quality measures often have very limited correlation with the actual subjective quality of a compressed picture or video signal

Picture and video quality is assessed in viewing tests

Viewing tests with human subjects are costly and time-consuming


Video Quality Measurement

Viewing conditions

The viewing conditions are fixed and described in order to be able to reproduce the subjective test

ITU-R Rec. BT.500-11 specifies viewing conditions including
  low room illumination; the viewing screen is the main light source
  screen size and preferred viewing distance as a ratio between the height of the screen and the distance from the screen

Example of a viewing test method

Double stimulus continuous quality scale (DSCQS) is a test method where the subjects view stimuli compressed by a codec under test and a reference

Coded sequence and reference are presented alternately

Each sequence is presented twice

Sequences are separated by a mid-gray level sequence of 3 s duration

Subjects have no knowledge of the display chronology

Subjects rate the video on a continuous scale that is partitioned into intervals labeled "Excellent", "Good", "Fair", "Poor", and "Bad"


Lossless Coding


Discrete Random Variables and Probability Mass Function

Definitions

A random variable S is a function of the sample space O that assigns a value S(ζ) to each outcome ζ ∈ O of a random experiment

A random variable S is called a discrete random variable if it takes values of a countable alphabet A = {a_0, a_1, ...}

Probability mass function (pmf) for discrete random variables:

  p_S(a) = P(S = a) = P( {ζ : S(ζ) = a} )    (5)

Examples of pmfs

Binary pmf:

  A = {a_0, a_1},    p_S(a_0) = p,  p_S(a_1) = 1 − p    (6)

Uniform pmf:

  A = {a_0, a_1, ..., a_{M−1}},    p_S(a_i) = 1/M  ∀ a_i ∈ A    (7)

Geometric pmf:

  A = {a_0, a_1, ...},    p_S(a_i) = (1 − p) p^i  ∀ a_i ∈ A    (8)
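A small Python sketch of these three pmfs, representing a pmf as a dictionary from alphabet letters to probabilities (the representation and names are illustrative, not part of the slides):

```python
def binary_pmf(p):
    """Binary pmf over A = {a0, a1}, Eq. (6)."""
    return {'a0': p, 'a1': 1 - p}

def uniform_pmf(M):
    """Uniform pmf over an M-ary alphabet, Eq. (7)."""
    return {f'a{i}': 1 / M for i in range(M)}

def geometric_pmf(p, num_letters):
    """First num_letters probabilities of the geometric pmf, Eq. (8).
    The full alphabet is countably infinite; truncated here for illustration."""
    return {f'a{i}': (1 - p) * p ** i for i in range(num_letters)}

# The binary and uniform pmfs sum to 1; the geometric pmf only in the limit.
assert abs(sum(binary_pmf(0.25).values()) - 1.0) < 1e-12
assert abs(sum(uniform_pmf(8).values()) - 1.0) < 1e-12
```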


Joint and Conditional Pmfs

Joint probability mass function

The N-dimensional pmf or joint pmf for a random vector S = (S_0, ..., S_{N−1})^T is defined by

  p_S(a) = P(S = a) = P(S_0 = a_0, ..., S_{N−1} = a_{N−1})    (9)

Joint pmf of two random vectors X and Y:

  p_{XY}(a_x, a_y) = P(X = a_x, Y = a_y)    (10)

Conditional probability mass functions

The conditional pmf p_{S|B}(a | B) of a random variable S given an event B, with P(B) > 0, is defined by

  p_{S|B}(a | B) = P(S = a | B)    (11)

Conditional pmf of a random vector X given another random vector Y:

  p_{X|Y}(a_x | a_y) = p_{XY}(a_x, a_y) / p_Y(a_y)    (12)


Example for Joint Pmf

Example: Joint pmf for neighboring picture samples

Samples in picture and video signals typically show strong statistical dependencies

Below: Histogram of two horizontally adjacent samples for the picture 'Lena'

[Figure: 3-D histogram showing the relative frequency of occurrence over the amplitude of the current pixel and the amplitude of the adjacent pixel]
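A NumPy sketch of how such a joint histogram can be estimated, assuming an 8-bit grayscale picture img given as a 2-D array (name and bit depth are assumptions):

```python
import numpy as np

def joint_pmf_adjacent(img: np.ndarray) -> np.ndarray:
    """Relative frequency of occurrence of pairs of horizontally
    adjacent 8-bit samples, as a normalized 256x256 joint histogram."""
    current = img[:, :-1].ravel()   # amplitude of current pixel
    adjacent = img[:, 1:].ravel()   # amplitude of adjacent pixel
    hist, _, _ = np.histogram2d(current, adjacent,
                                bins=256, range=[[0, 256], [0, 256]])
    return hist / hist.sum()
```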


Expectation

Expectation value or expectation

Definition for discrete random variables S:

  E{g(S)} = Σ_{a∈A} g(a) p_S(a)    (13)

Important expectation values are the mean µ_S and the variance σ²_S:

  µ_S = E{S}    and    σ²_S = E{ (S − µ_S)² }    (14)

Conditional expectation

The conditional expectation value of a function g(S) given an event B, with P(B) > 0, is defined by

  E{g(S) | B} = Σ_{a∈A} g(a) p_S(a | B) = Σ_{a∈A} g(a) P(S = a, B) / P(B)    (15)
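A Python sketch of Eqs. (13) and (14), again with a pmf as a dictionary (a representation assumed here for illustration):

```python
def expectation(pmf, g=lambda a: a):
    """E{g(S)} = sum over a of g(a) * p_S(a), Eq. (13)."""
    return sum(g(a) * p for a, p in pmf.items())

pmf = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
mean = expectation(pmf)                                  # mu_S, Eq. (14)
variance = expectation(pmf, lambda a: (a - mean) ** 2)   # sigma_S^2, Eq. (14)
```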


Discrete Random Processes

Discrete random process

A series of random experiments at time instants t_n, with n = 0, 1, 2, ..., characterized by a series of random variables S = {S_n}

Statistical properties of a discrete-time random process S: N-th order joint pmf

  p_{S_n}(a) = P(S_n = a_0, ..., S_{n+N−1} = a_{N−1})    (16)

Stationary random process

Statistical properties are invariant to a shift in time: p_{S_n}(a) = p_S(a)

Memoryless random process

The random variables S_n are independent of each other

Independent and identically distributed (iid) random process

Stationary and memoryless: p_S(a) = p_S(a_0) · p_S(a_1) · · · p_S(a_{N−1})

Markov process

Future outcomes do not depend on past outcomes, but only on the present outcome:

  p_{S_n}(a_n | a_{n−1}, ...) = p_{S_n}(a_n | a_{n−1})    (17)


Lossless Source Coding – Overview

Lossless source coding

Reversible mapping of sequences of discrete source symbols into sequences of codewords

Other names: noiseless coding, entropy coding

The original source sequence can be exactly reconstructed – not the case in lossy coding

Bit rate reduction is possible if the source data contain statistical properties that are exploitable for data compression


Lossless Source Coding – Terminology

Terminology

Message s(L) = {s_0, ..., s_{L−1}} drawn from a stochastic process S = {S_n}

Sequence b(K) = {b_0, ..., b_{K−1}} of K bits (b_k ∈ B = {0, 1})

Process of lossless coding: message s(L) is converted to b(K)

Assume:
  a subsequence s(N) = {s_n, ..., s_{n+N−1}} with 1 ≤ N ≤ L, and
  the bits b(ℓ)(s(N)) = {b_0, ..., b_{ℓ−1}} assigned to it

Lossless source code

Encoder mapping:

  b(ℓ) = γ( s(N) )    (18)

Decoder mapping:

  s(N) = γ⁻¹( b(ℓ) ) = γ⁻¹( γ( s(N) ) )    (19)


Classification of Lossless Source Codes

Lossless source code

Encoder mapping:

  b(ℓ) = γ( s(N) )    (20)

Decoder mapping:

  s(N) = γ⁻¹( b(ℓ) ) = γ⁻¹( γ( s(N) ) )    (21)

Classification

Fixed-to-fixed mapping: N and ℓ are both fixed (discussed as a special case of fixed-to-variable)

Fixed-to-variable mapping: N fixed and ℓ variable – Huffman algorithm for scalars and vectors (discussed in lecture)

Variable-to-fixed mapping: N variable and ℓ fixed – Tunstall codes (not discussed in lecture)

Variable-to-variable mapping: ℓ and N are both variable – arithmetic codes (discussed in lecture)


Variable-Length Coding for Scalars

Assigning codewords to scalar symbols

Assign a separate codeword to each scalar symbol s_n of a message s(L)

Assume: message s(L) generated by a stationary random process S = {S_n}

Random variables S_n = S with symbol alphabet A = {a_0, ..., a_{M−1}} and marginal pmf p(a) = P(S = a)

Lossless source code: assign a binary codeword b_i = {b_{i,0}, ..., b_{i,ℓ(a_i)−1}} to each alphabet letter a_i

The length (number of bits) of the codeword b_i that is assigned to a_i is denoted by ℓ(a_i), with ℓ(a_i) ≥ 1

Average codeword length:

  ℓ̄ = E{ℓ(S)} = Σ_{i=0}^{M−1} p(a_i) ℓ(a_i)    (22)


Optimization Problem

The average codeword length is given as

  ℓ̄ = Σ_{k=0}^{M−1} p(a_k) · ℓ(a_k)    (23)

Goal of lossless code design:

Minimize the average codeword length ℓ̄ while providing unique decodability

  a_i    p(a_i)   code A   code B   code C   code D   code E
  a_0    0.5      0        0        0        00       0
  a_1    0.25     10       01       01       01       10
  a_2    0.125    11       010      011      10       110
  a_3    0.125    11       011      111      110      111
  ℓ̄              1.5      1.75     1.75     2.125    1.75
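A short Python check of Eq. (23) for code E from the table (dictionary representation assumed):

```python
def average_codeword_length(pmf, code):
    """l_bar = sum over k of p(a_k) * l(a_k), Eq. (23)."""
    return sum(pmf[a] * len(code[a]) for a in pmf)

pmf = {'a0': 0.5, 'a1': 0.25, 'a2': 0.125, 'a3': 0.125}
code_e = {'a0': '0', 'a1': '10', 'a2': '110', 'a3': '111'}
print(average_codeword_length(pmf, code_e))  # 1.75, as in the table
```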


Unique Decodability and Prefix Codes

Unique decodability

The code γ has to specify a mapping a_i → b_i such that

  a_k ≠ a_j  ⇒  b_k ≠ b_j    (24)

For sequences of symbols, the above constraint needs to be extended to the concatenation of multiple symbols

→ For a uniquely decodable code, a sequence of codewords can only be generated by one possible sequence of source symbols.

Prefix codes

One class of codes that satisfies the constraint of unique decodability is the class of prefix codes

A code is called a prefix code if no codeword is a prefix of any other codeword

It is obvious that if condition (24) is satisfied and the code is a prefix code, then any concatenation of symbols is uniquely decodable


Binary Code Trees

[Figure: binary code tree with a root node, interior nodes, terminal nodes, and branches labeled '0' and '1'; the codewords '0', '10', '110', and '111' are assigned to the terminal nodes]

Representation of prefix codes with binary trees

Prefix codes can be represented by trees

A binary tree contains nodes with two branches (labeled '0' and '1') leading to other nodes, starting from a root node

A node from which branches depart is called an interior node, while a node from which no branches depart is called a terminal node

A prefix code can be constructed by assigning letters of the alphabet A to terminal nodes of a binary tree


Parsing of Prefix Codes

Given the codeword assignment to terminal nodes of the binary tree, the parsing rule for this prefix code is given as follows:

1 Set the current node n_i equal to the root node.

2 Read the next bit b from the bitstream.

3 Follow the branch labeled with the value of b from the current node n_i to the descendant node n_j.

4 If n_j is a terminal node, return the associated alphabet letter and proceed with step 1. Otherwise, set the current node n_i equal to n_j and repeat the previous two steps.

Prefix codes are not only uniquely decodable, but also instantaneously decodable. A Python sketch of this parsing rule follows below.
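A minimal sketch of the four parsing steps, assuming a prefix code given as a dictionary from letters to bit strings (names are illustrative):

```python
def build_tree(code):
    """Build a binary code tree from a prefix code {letter: bit string}."""
    root = {}
    for letter, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = letter  # terminal node carries the alphabet letter
    return root

def parse(bits, code):
    """Parse a bit string according to the parsing rule above."""
    root = build_tree(code)
    node, out = root, []
    for b in bits:                      # step 2: read the next bit
        node = node[b]                  # step 3: follow the branch labeled b
        if not isinstance(node, dict):  # step 4: terminal node reached
            out.append(node)
            node = root                 # step 1: back to the root node
    return out

code = {'a0': '0', 'a1': '10', 'a2': '110', 'a3': '111'}
assert parse('0100110111', code) == ['a0', 'a1', 'a0', 'a2', 'a3']
```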


Kraft Inequality for Prefix Codes

Property of codeword lengths for prefix codes

Assume a fully balanced tree with depth ℓ_max (= longest codeword)

Codewords are assigned to nodes with codeword length ℓ(a_k) ≤ ℓ_max

Each choice with ℓ(a_k) ≤ ℓ_max eliminates 2^{ℓ_max − ℓ(a_k)} possibilities of codeword assignment at level ℓ_max, for example:
  → ℓ_max − ℓ(a_k) = 0: one option is covered
  → ℓ_max − ℓ(a_k) = 1: two options are covered

The number of terminal nodes is less than or equal to the number of terminal nodes in the balanced tree with depth ℓ_max, which is 2^{ℓ_max}:

  Σ_{k=0}^{M−1} 2^{ℓ_max − ℓ(a_k)} ≤ 2^{ℓ_max}    (25)


Kraft Inequality and Unique Decodability

Kraft inequality for lossless codes

The observation for prefix codes can be generalized

It can be shown that the Kraft inequality

  ζ(γ) = Σ_{k=0}^{M−1} 2^{−ℓ(a_k)} ≤ 1    (26)

is a necessary condition for the unique decodability of a code γ

A proof can be found in [Cover and Thomas, 2006, p. 116] or [Wiegand and Schwarz, 2011, p. 25]
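A quick Python check of Eq. (26) on the codeword lengths of the codes from the earlier table:

```python
def kraft_sum(lengths):
    """zeta(gamma) = sum over k of 2^(-l(a_k)), Eq. (26)."""
    return sum(2.0 ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0  -> code E satisfies the Kraft inequality
print(kraft_sum([1, 2, 2, 2]))  # 1.25 -> code A cannot be uniquely decodable
```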


Bound for Scalar Variable-Length Coding: The Entropy

Lower bound on the average codeword length

Based on the Kraft inequality it can be shown that the average codeword length ℓ̄ for uniquely decodable scalar codes is bounded by

  ℓ̄ ≥ H(S) = E{−log₂ p(S)} = −Σ_{i=0}^{M−1} p(a_i) log₂ p(a_i)    (27)

An example proof can be found in [Wiegand and Schwarz, 2011, p. 27]

The lower bound H(S) is called the entropy of the source S

Redundancy of a code

The redundancy of a scalar code is given by the difference

  ϱ = ℓ̄ − H(S) ≥ 0    (28)

The redundancy is zero only if for all alphabet letters a_i the lengths of the corresponding codewords are ℓ(a_i) = −log₂ p(a_i)

The redundancy can therefore only be zero if all probability masses p(a_i) are negative integer powers of 2
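A Python sketch of Eqs. (27) and (28); for the pmf of the earlier table, code E reaches zero redundancy because all probabilities are negative integer powers of 2:

```python
import math

def entropy(pmf):
    """H(S) = -sum over i of p(a_i) * log2 p(a_i), Eq. (27)."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def redundancy(pmf, code):
    """rho = l_bar - H(S) >= 0, Eq. (28)."""
    l_bar = sum(pmf[a] * len(code[a]) for a in pmf)
    return l_bar - entropy(pmf)

pmf = {'a0': 0.5, 'a1': 0.25, 'a2': 0.125, 'a3': 0.125}
code_e = {'a0': '0', 'a1': '10', 'a2': '110', 'a3': '111'}
print(entropy(pmf), redundancy(pmf, code_e))  # 1.75 0.0
```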


Upper Bound for Minimum Average Codeword Length

Bounds for the achievable average codeword length

The fundamental lower bound for ℓ̄ is given by the entropy H(S), but it is not always achievable (codewords must have an integer number of bits)

A code with the codeword lengths ℓ(a_i) = ⌈−log₂ p(a_i)⌉, ∀ a_i ∈ A, can always be constructed, yielding the upper bound

  ℓ̄ = Σ_{i=0}^{M−1} p(a_i) ⌈−log₂ p(a_i)⌉ < Σ_{i=0}^{M−1} p(a_i) (1 − log₂ p(a_i)) = H(S) + 1    (29)

The minimum achievable average codeword length ℓ̄_min is bounded by

  H(S) ≤ ℓ̄_min < H(S) + 1    (30)


Entropy of a Binary Source

Binary entropy function

A binary source has probabilities p(0) = p and p(1) = 1 − p

The entropy of the binary source is given as

  H(S) = −p log₂ p − (1 − p) log₂(1 − p) = H_b(p)    (31)

with H_b(x) being the so-called binary entropy function

[Figure: the binary entropy function, rate R in bit/symbol plotted over P(a_0) ∈ [0, 1]; the maximum of 1 bit/symbol is reached at P(a_0) = 0.5]


The Huffman Algorithm

Constructing lossless codes with minimum redundancy

Question: How can one generate a prefix code with minimum redundancy?

The answer was given by D. A. Huffman in 1952 [Huffman, 1952]

The Huffman algorithm always yields a prefix code with minimum redundancy

For a proof that Huffman codes are optimal instantaneous codes (with minimum expected length), see [Cover and Thomas, 2006, p. 124ff]

The Huffman algorithm

1 Given is an ensemble representing a memoryless discrete source

2 Pick the two symbols with the lowest probabilities, merge them into a new auxiliary symbol, and calculate its probability

3 If more than one symbol remains, repeat the previous step

4 Convert the code tree into a prefix code

A Python sketch of these steps follows below.
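A compact sketch of the algorithm using a binary heap; each heap entry carries a partial code table whose codewords grow by one prefix bit per merge (the representation and names are assumptions):

```python
import heapq
from itertools import count

def huffman_code(pmf):
    """Construct a minimum-redundancy prefix code for the given pmf."""
    tiebreak = count()  # keeps heap entries comparable for equal probabilities
    heap = [(p, next(tiebreak), {a: ''}) for a, p in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Step 2: merge the two symbols with the lowest probabilities
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {a: '0' + w for a, w in c0.items()}
        merged.update({a: '1' + w for a, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

pmf = {'a0': 0.5, 'a1': 0.25, 'a2': 0.125, 'a3': 0.125}
print(huffman_code(pmf))  # codeword lengths 1, 2, 3, 3
```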


Example for the Design of a Huffman Code

Merging order of the auxiliary symbols: 0.01 + 0.02 = 0.03; 0.03 + 0.03 = 0.06; 0.06 + 0.07 = 0.13; 0.13 + 0.14 = 0.27; 0.27 + 0.16 = 0.43; 0.28 + 0.29 = 0.57

Resulting code:

  symbol probability   codeword
  P(7) = 0.29          '11'
  P(6) = 0.28          '10'
  P(5) = 0.16          '01'
  P(4) = 0.14          '001'
  P(3) = 0.07          '0001'
  P(2) = 0.03          '00001'
  P(1) = 0.02          '000001'
  P(0) = 0.01          '000000'


Conditional Huffman Codes

Reduce the average codeword length for sources with memory

Random process {S_n} with memory: design a VLC for the conditional pmf

Example:
  stationary discrete Markov process, A = {a_0, a_1, a_2}
  conditional pmfs p(a|a_k) = P(S_n = a | S_{n−1} = a_k) with k = 0, 1, 2

             a_0    a_1    a_2     conditional entropy
  p(a|a_0)   0.90   0.05   0.05    H(Sn|a_0) = 0.5690
  p(a|a_1)   0.15   0.80   0.05    H(Sn|a_1) = 0.8842
  p(a|a_2)   0.25   0.15   0.60    H(Sn|a_2) = 1.3527
  p(a)       0.64   0.24   0.10    H(S) = 1.2575

  H(Sn|Sn−1) = 0.7331

Design a Huffman code for each conditional pmf:

  a_i   Sn−1 = a_0   Sn−1 = a_1   Sn−1 = a_2   Huffman code for marginal pmf
  a_0   1            00           00           1
  a_1   00           1            01           00
  a_2   01           01           1            01

  ℓ̄_0 = 1.1,  ℓ̄_1 = 1.2,  ℓ̄_2 = 1.4;  conditional code: ℓ̄_c = 1.1578;  marginal code: ℓ̄ = 1.3556
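A Python sketch of the conditional entropy H(Sn|Sn−1) for this example (cf. Eq. (36) below). Note that the marginal pmf in the table is rounded; the sketch uses the exact stationary pmf of the Markov chain (≈ 0.6444, 0.2444, 0.1111), which reproduces the value 0.7331:

```python
import math

def conditional_entropy(pmf, cond_pmf):
    """H(Sn|Sn-1) = sum over k of p(a_k) * H(Sn | Sn-1 = a_k)."""
    return -sum(p_ak * sum(p * math.log2(p)
                           for p in cond_pmf[ak].values() if p > 0)
                for ak, p_ak in pmf.items())

cond = {'a0': {'a0': 0.90, 'a1': 0.05, 'a2': 0.05},
        'a1': {'a0': 0.15, 'a1': 0.80, 'a2': 0.05},
        'a2': {'a0': 0.25, 'a1': 0.15, 'a2': 0.60}}
stationary = {'a0': 29/45, 'a1': 11/45, 'a2': 5/45}  # exact stationary pmf
print(conditional_entropy(stationary, cond))  # ~0.7331 bit/symbol
```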


Average Codeword Length of Conditional Huffman Codes

Bounds for the average codeword length

The average codeword length ℓ̄_k = ℓ̄(S_{n−1} = a_k) is bounded by

  H(Sn|a_k) ≤ ℓ̄_k < H(Sn|a_k) + 1    (32)

with the conditional entropy of S_n given the event {S_{n−1} = a_k}

  H(Sn|a_k) = H(Sn|S_{n−1} = a_k) = −Σ_{i=0}^{M−1} p(a_i|a_k) log₂ p(a_i|a_k)    (33)

Taking the expectation yields

  Σ_{k=0}^{M−1} p(a_k) H(Sn|a_k) ≤ Σ_{k=0}^{M−1} p(a_k) ℓ̄_k < Σ_{k=0}^{M−1} p(a_k) H(Sn|a_k) + 1    (34)

where the average codeword length of the conditional code is given by

  ℓ̄ = Σ_{k=0}^{M−1} p(a_k) ℓ̄_k    (35)


Conditional Entropy

Minimum average codeword length and conditional entropy

The lower bound for the average codeword length of conditional codes is called the conditional entropy of S_n given the random variable S_{n−1}:

  H(Sn|Sn−1) = E{−log₂ p(Sn|Sn−1)} = Σ_{k=0}^{M−1} p(a_k) H(Sn|S_{n−1} = a_k)    (36)

The minimum achievable average codeword length ℓ̄_min for conditional codes is bounded by

  H(Sn|Sn−1) ≤ ℓ̄_min < H(Sn|Sn−1) + 1    (37)

Conditioning may reduce the average codeword length

The conditional entropy is less than or equal to the marginal entropy (with equality if and only if the process {S_n} is iid):

  H(Sn|Sn−1) ≤ H(Sn)    (38)


Huffman Coding of Vectors

Joint coding of blocks of N symbols

Stationary discrete random source S = {S_n} with an M-ary alphabet A = {a_0, ..., a_{M−1}}

N symbols are coded jointly: design a Huffman code for the joint pmf

  p(a_0, ..., a_{N−1}) = P(S_n = a_0, ..., S_{n+N−1} = a_{N−1})

The minimum average codeword length ℓ̄_min per symbol is bounded by

  H(S_n, ..., S_{n+N−1}) / N ≤ ℓ̄_min < H(S_n, ..., S_{n+N−1}) / N + 1/N    (39)

where the block entropy is defined by

  H(S_n, ..., S_{n+N−1}) = E{−log₂ p(S_n, ..., S_{n+N−1})}    (40)

The following limit is called the entropy rate:

  H(S) = lim_{N→∞} H(S_0, ..., S_{N−1}) / N    (41)


Entropy Rate

Entropy rate as fundamental bound for lossless coding

Entropy rate:

  H(S) = lim_{N→∞} H(S_0, ..., S_{N−1}) / N    (42)

The limit in (42) always exists for stationary sources [Gallager, 1968]

The entropy rate H(S) is the greatest lower bound for the average codeword length ℓ̄ per symbol over all lossless coding techniques:

  ℓ̄ ≥ H(S)    (43)

Entropy rate for iid processes:

  H(S) = lim_{N→∞} E{−log₂ p(S_0, S_1, ..., S_{N−1})} / N = lim_{N→∞} ( Σ_{n=0}^{N−1} E{−log₂ p(S_n)} ) / N = H(S)    (44)

i.e., for iid processes the entropy rate equals the marginal entropy


Entropy Rate for Markov Processes

Entropy rate for stationary Markov processes:

  H(S) = lim_{N→∞} E{−log₂ p(S_0, S_1, ..., S_{N−1})} / N
       = lim_{N→∞} ( E{−log₂ p(S_0)} + Σ_{n=1}^{N−1} E{−log₂ p(S_n|S_{n−1})} ) / N
       = H(Sn|Sn−1)    (45)

Example: joint Huffman coding of pairs of events, and ℓ̄ per symbol vs. table size N_C

  a_i a_k   p(a_i, a_k)   codeword
  a0 a0     0.58          1
  a0 a1     0.032         00001
  a0 a2     0.032         00010
  a1 a0     0.036         0010
  a1 a1     0.195         01
  a1 a2     0.012         000000
  a2 a0     0.027         00011
  a2 a1     0.017         000001
  a2 a2     0.06          0011

  N   ℓ̄        N_C
  1   1.3556    3
  2   1.0094    9
  3   0.9150    27
  4   0.8690    81
  5   0.8462    243
  6   0.8299    729
  7   0.8153    2187
  8   0.8027    6561
  9   0.7940    19683


Elias Coding and Arithmetic Coding

Motivation

Main drawback of block Huffman codes: large table sizes

Another class of uniquely decodable codes are Elias codes and arithmetic codes

Encoding: mapping of a string of N symbols s = {s_0, s_1, ..., s_{N−1}} onto a string of K bits b = {b_0, b_1, ..., b_{K−1}}

  γ : s ↦ b    (46)

Decoding or parsing: mapping the bit string onto the string of symbols

  γ⁻¹ : b ↦ s    (47)

Complexity of code construction: linear per symbol

Construction method: recursive subdivision of the unit interval [0, 1)

Iterative encoding and decoding procedures


Mapping of Symbol Sequences to Numbers

Representing a symbol sequence by a number in the interval [0, 1)

Consider a sequence of N random variables S(N) = {S_0, S_1, ..., S_{N−1}}, each S_i with an alphabet of size M_i

Order the alphabet symbols and let η_i(s_i) be a function that returns the corresponding symbol index in the range from 0 to M_i − 1

A realization s(N) = {s_0, s_1, ..., s_{N−1}} of S(N) can be represented by a unique real number r ∈ [0, 1):

  r = ζ( s(N) ) = Σ_{i=0}^{N−1} η_i(s_i) · B_i    with    B_i = Π_{j=0}^{i} M_j⁻¹    (48)

Note that when all M_j = M, the basis simplifies to B_i = M^{−i−1}

Define comparison operators for symbol sequences:

  s_a(N) > s_b(N)  ⟺  ζ( s_a(N) ) > ζ( s_b(N) )    (49)


Mapping of Symbol Sequences to Intervals

Representing a symbol sequence by a probability interval

The probability of the symbol sequence s(N) can be written as

  p( s(N) ) = P( S(N) = s(N) ) = P( S(N) ≤ s(N) ) − P( S(N) < s(N) )    (50)

A symbol sequence s(N) can be represented by an interval I_N between two successive levels of the cumulative probability mass function:

  I_N = [ L_N, L_N + W_N ) = [ P( S(N) < s(N) ), P( S(N) ≤ s(N) ) )    (51)

with

  L_N = P( S(N) < s(N) )    and    W_N = P( S(N) = s(N) )    (52)

The intervals I_N for different symbol sequences s(N) are disjoint

A symbol sequence s(N) can be uniquely represented by any value v inside the interval I_N


How Many Bits for Identifying an Interval?

Bit sequence b = {b_0, b_1, ..., b_{K−1}} of K bits for representing an interval I_N

Represent the value v as a binary fraction:

  v = Σ_{i=0}^{K−1} b_i · 2^{−i−1} = 0.b_0 b_1 ··· b_{K−1}    with    v ∈ I_N( s(N) )    (53)

The size of the interval, p( s(N) ), governs the number K of required bits:

  p( s(N) ) = 1/2  →  B = { .0, .1 }
  p( s(N) ) = 1/4  →  B = { .00, .01, .10, .11 }
  p( s(N) ) = 1/8  →  B = { .000, .001, .010, .011, .100, .101, .110, .111 }

The minimum number of bits is

  K = K( s(N) ) = ⌈ −log₂ p( s(N) ) ⌉    (54)

The binary number v that identifies the interval I_N and determines the bit string b is

  v = ⌈ L_N · 2^K ⌉ · 2^{−K}    (55)


Redundancy of Elias Coding

Average codeword length

Consider coding of N symbols

Average codeword length per symbol:

  ℓ̄ = (1/N) E{ K( S(N) ) } = (1/N) E{ ⌈ −log₂ p( S(N) ) ⌉ }    (56)

Applying the inequalities ⌈x⌉ ≥ x and ⌈x⌉ < x + 1 yields

  (1/N) E{ −log₂ p( S(N) ) } ≤ ℓ̄ < (1/N) E{ 1 − log₂ p( S(N) ) }    (57)

so the average codeword length is bounded:

  (1/N) H( S(N) ) ≤ ℓ̄ < (1/N) H( S(N) ) + 1/N    (58)

The redundancy approaches zero as the number N of coded symbols approaches infinity (similar to block Huffman coding)


Example: IID Source

Example of an iid source for which an optimal Huffman code exists:

  symbol a_k   pmf p(a_k)     Huffman code
  a_0 = 'A'    0.25 = 2⁻²     00
  a_1 = 'B'    0.25 = 2⁻²     01
  a_2 = 'C'    0.50 = 2⁻¹     1

Suppose we intend to send the symbol string s = 'CABAC'

Using the Huffman code, the bit string would be b = '10001001'

An alternative to Huffman coding is Elias coding

The probability p(s) is given by

  p(s) = p('C') · p('A') · p('B') · p('A') · p('C') = 1/2 · 1/4 · 1/4 · 1/4 · 1/2 = 1/256

The size of the bit string is ⌈ −log₂ p(s) ⌉ = 8 bit


Iterative Algorithm for Elias Coding

Iterative interval subdivision

Consider sub-sequences s(n) = {s_0, s_1, ..., s_{n−1}} with 1 ≤ n ≤ N

Iteration rule for the interval width:

  W_{n+1} = W_n · p( s_n | s_0, s_1, ..., s_{n−1} )    (59)

Iteration rule for the lower interval boundary:

  L_{n+1} = L_n + W_n · c( s_n | s_0, s_1, ..., s_{n−1} )    (60)

with the cumulative probability mass function (cmf) c(·) being defined as

  c( s_n | s_0, s_1, ..., s_{n−1} ) = Σ_{∀a∈A_n: a<s_n} p( a | s_0, s_1, ..., s_{n−1} )    (61)

The interval I_{n+1} is always nested inside the interval I_n


Iterative Interval Subdivision for Different Sources

Iterative interval subdivision for different sources

The derivation above covers the general case of dependent and differently distributed random variables

For iid sources, the interval refinement can be simplified:

  W_n = W_{n−1} · p(s_{n−1})    (62)
  L_n = L_{n−1} + W_{n−1} · c(s_{n−1})    (63)

For Markov sources with conditional pmf p(s_n|s_{n−1}) and conditional cmf c(s_n|s_{n−1}):

  W_n = W_{n−1} · p(s_{n−1}|s_{n−2})    (64)
  L_n = L_{n−1} + W_{n−1} · c(s_{n−1}|s_{n−2})    (65)

For non-stationary sources, the probabilities p(·) can be adapted during the coding process


Encoding Algorithm for Elias Codes

Encoding algorithm:

1 Given is a sequence {s_0, ..., s_{N−1}} of N symbols

2 Initialize the iterative process with W_0 = 1, L_0 = 0

3 For each n = 0, 1, ..., N−1, determine the interval I_{n+1} by

  W_{n+1} = W_n · p(s_n | s_0, ..., s_{n−1})
  L_{n+1} = L_n + W_n · c(s_n | s_0, ..., s_{n−1})

4 Determine the codeword length by K = ⌈ −log₂ W_N ⌉

5 Transmit the codeword b of K bits that represents the fractional part of v = ⌈ L_N 2^K ⌉ 2^{−K}

(A Python sketch of these steps follows below.)
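A runnable sketch of the five steps for the iid case, using exact rational arithmetic to sidestep the precision issue discussed on the arithmetic coding slide (names are illustrative; the general form would use conditional pmfs instead):

```python
from fractions import Fraction
from math import ceil, log2

def elias_encode(symbols, pmf):
    """Elias encoding of an iid symbol sequence (steps 1-5 above)."""
    letters = sorted(pmf)  # fixed symbol order defines the cmf
    cmf, acc = {}, Fraction(0)
    for a in letters:      # c(a) = sum of p over all letters smaller than a
        cmf[a] = acc
        acc += Fraction(pmf[a])
    L, W = Fraction(0), Fraction(1)         # step 2
    for s in symbols:                       # step 3: interval refinement
        L = L + W * cmf[s]
        W = W * Fraction(pmf[s])
    K = ceil(-log2(W))                      # step 4: codeword length
    v = Fraction(ceil(L * 2**K), 2**K)      # step 5: v = ceil(L_N 2^K) 2^-K
    bits = []
    for _ in range(K):                      # emit the K fractional bits of v
        v *= 2
        bits.append(str(int(v)))
        v -= int(v)
    return ''.join(bits)

# Reproduces the 'CABAC' example of the next slide
# (0.25 and 0.5 are exact binary fractions, so Fraction() is exact):
assert elias_encode('CABAC', {'A': 0.25, 'B': 0.25, 'C': 0.5}) == '10001001'
```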


Example for Elias Encoding

s0 = 'C':  W1 = W0 · p('C') = 1 · 2⁻¹ = 2⁻¹ = (0.1)b
           L1 = L0 + W0 · c('C') = 0 + 1 · 2⁻¹ = 2⁻¹ = (0.1)b

s1 = 'A':  W2 = W1 · p('A') = 2⁻¹ · 2⁻² = 2⁻³ = (0.001)b
           L2 = L1 + W1 · c('A') = L1 + 2⁻¹ · 0 = 2⁻¹ = (0.100)b

s2 = 'B':  W3 = W2 · p('B') = 2⁻³ · 2⁻² = 2⁻⁵ = (0.00001)b
           L3 = L2 + W2 · c('B') = L2 + 2⁻³ · 2⁻² = 2⁻¹ + 2⁻⁵ = (0.10001)b

s3 = 'A':  W4 = W3 · p('A') = 2⁻⁵ · 2⁻² = 2⁻⁷ = (0.0000001)b
           L4 = L3 + W3 · c('A') = L3 + 2⁻⁵ · 0 = 2⁻¹ + 2⁻⁵ = (0.1000100)b

s4 = 'C':  W5 = W4 · p('C') = 2⁻⁷ · 2⁻¹ = 2⁻⁸ = (0.00000001)b
           L5 = L4 + W4 · c('C') = L4 + 2⁻⁷ · 2⁻¹ = 2⁻¹ + 2⁻⁵ + 2⁻⁸ = (0.10001001)b

Termination:  K = ⌈ −log₂ W5 ⌉ = 8
              v = ⌈ L5 2^K ⌉ 2^{−K} = 2⁻¹ + 2⁻⁵ + 2⁻⁸
              b = '10001001'


Illustration of Iterative Coding

[Figure: graphical illustration of the iterative interval subdivision for the encoding example above]

Decoding Algorithm for Elias Codes

Decoding algorithm:

1 Given is the number N of symbols to be decoded and a codeword b = {b_0, ..., b_{K−1}} of K bits

2 Determine the interval representative v according to

  v = Σ_{i=0}^{K−1} b_i 2^{−i−1}

3 Initialize the iterative process with W_0 = 1, L_0 = 0

4 For each n = 0, 1, ..., N−1, do the following:

  1 For each a_i ∈ A_n, determine the interval I_{n+1}(a_i) by

    W_{n+1}(a_i) = W_n · p(a_i | s_0, ..., s_{n−1})
    L_{n+1}(a_i) = L_n + W_n · c(a_i | s_0, ..., s_{n−1})

  2 Select the letter a_i ∈ A_n for which v ∈ I_{n+1}(a_i), and set s_n = a_i, W_{n+1} = W_{n+1}(a_i), L_{n+1} = L_{n+1}(a_i)

(A matching Python sketch follows below.)
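The matching decoder sketch for the iid case; it recomputes the candidate subintervals per step and picks the one containing v (same assumed representation as the encoder sketch above):

```python
from fractions import Fraction

def elias_decode(bits, num_symbols, pmf):
    """Elias decoding of an iid symbol sequence (steps 1-4 above)."""
    letters = sorted(pmf)
    probs = {a: Fraction(pmf[a]) for a in letters}
    cmf, acc = {}, Fraction(0)
    for a in letters:
        cmf[a] = acc
        acc += probs[a]
    # Step 2: interval representative v = sum of b_i 2^-(i+1)
    v = sum(Fraction(int(b), 2 ** (i + 1)) for i, b in enumerate(bits))
    L, W, out = Fraction(0), Fraction(1), []   # step 3
    for _ in range(num_symbols):               # step 4
        for a in letters:                      # find the letter with v in I(a)
            lo = L + W * cmf[a]
            hi = lo + W * probs[a]
            if lo <= v < hi:
                L, W = lo, W * probs[a]
                out.append(a)
                break
    return ''.join(out)

assert elias_decode('10001001', 5, {'A': 0.25, 'B': 0.25, 'C': 0.5}) == 'CABAC'
```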


Arithmetic Coding

Arithmetic coding as finite-precision implementation of Elias coding

Problem with Elias codes: precision requirements for W_N and L_N

Arithmetic codes: variant of Elias codes with fixed-precision arithmetic

Represent the pmfs p(a), the cmfs c(a), and the width W_n with a finite number of bits

The loss in coding efficiency due to rounding is typically negligible

The representation of the lower interval boundary L_n has the structure

  L_n = 0. aaaaa···a  0111111···1  xxxxx···x  00000···
           (settled   (outstanding (active    (trailing
            bits)      bits)        bits)      bits)

where
  "settled bits" are not modified in following interval updates (and can be output)
  "outstanding bits" may be modified by a carry from the active bits
  "active bits" are directly modified by the following interval update

The most practical variant of arithmetic coding is binary arithmetic coding

  Symbols are first binarized using a variable-length code
  The decoding search is reduced to one comparison
  Multiplication-free algorithms (e.g., the M coder) exist for binary arithmetic coding


Comparison of Lossless Coding Techniques

Example: Markov source

Instantaneous entropy rate: H_inst(S, L) = (1/L) H(S_0, S_1, ..., S_{L−1})


Conditional and Adaptive Codes

Coding of sources with memory and/or varying statistics

One approach would be switched Huffman codes trained on the conditional probabilities

The resulting number of Huffman code tables is often too large in practice

Hence, conditional entropy coding is typically done using arithmetic codes

In adaptive arithmetic coding, the probabilities p(a_k) are estimated/adapted simultaneously at encoder and decoder

Statistical dependencies can be exploited using so-called context modeling techniques: conditional probabilities p(a_k|z_k), with z_k being a context/state that is simultaneously computed at encoder and decoder


Forward and Backward Adaptation

The two basic approaches for adaptation are

Forward adaptation:
  Gather statistics for a large enough block of source symbols
  Transmit the adaptation signal to the decoder as side information
  Disadvantage: increased bit rate due to the side information

Backward adaptation:
  Gather statistics simultaneously at encoder and decoder
  Drawback: reduced error resilience

With today's packet-switched transmission systems, an efficient combination of the two adaptation approaches can be achieved:

1 Gather statistics for the entire packet and provide an initialization of the entropy code at the beginning of the packet

2 Conduct backward adaptation for each symbol inside the packet in order to minimize the size of the packet


Illustration of Adaptive Coding

[Figure: block diagrams of forward and backward adaptation. Forward adaptation: an adaptation signal is computed from the source symbols (introducing delay) and sent over the channel to the decoder alongside the coded data. Backward adaptation: encoder and decoder each compute the adaptation signal from previously processed symbols (with delay), so no adaptation signal is transmitted over the channel.]


Summary

Entropy is the lower bound for the average number of bits/symbol for uniquely decodable scalar codes

Entropy rate is the lower bound for the average number of bits/symbol for all uniquely decodable lossless codes

Huffman coding
  is an efficient and simple entropy coding method
  needs a code table
  can be inefficient for certain probabilities
  is difficult to use for exploiting statistical dependencies and time-varying probabilities

Arithmetic coding
  is a universal method for encoding strings of symbols
  does not need a code table, but a table for storing probabilities
  typically requires serial computation of the interval and of the probability estimation update (in case probabilities are adapted)
  approaches the entropy for long strings
  is well suited for exploiting statistical dependencies and for coding with time-varying probabilities
