Image Encoding & Compression | Information Theory | Pixel-Based Encoding | Predictive Encoding | Transform-Based Encoding
Digital Image Processing, Lectures 25 & 26
M.R. Azimi, Professor
Department of Electrical and Computer Engineering, Colorado State University
M.R. Azimi Digital Image Processing
Area 4: Image Encoding and Compression
Goal:
To exploit the redundancies in the image in order to reduce the number of bits to represent an image or a sequence of images (e.g., video).
Applications:
Image Transmission: e.g., HDTV, 3DTV, satellite/military communication, and teleconferencing.
Image Storage: e.g., document storage & retrieval, medical image archives, weather maps, and geological surveys.

Categories of Techniques:
1. Pixel Encoding: PCM, run-length encoding, bit-plane encoding, Huffman encoding, entropy encoding
2. Predictive Encoding: delta modulation, 2-D DPCM, inter-frame methods
3. Transform-Based Encoding: DCT-based, WT-based, zonal encoding
4. Others: vector quantization (clustering), neural network-based, hybrid encoding
Encoding System
There are three steps involved in any encoding system (Fig. 1).
a. Mapping: removes redundancies in the image. Should be invertible.
b. Quantization: the mapped values are quantized using uniform or Lloyd-Max quantizers.
c. Coding: optimal codewords are assigned to the quantized values.
Figure 1: A Typical Image Encoding System.
However, before we discuss several types of encoding systems, we need to
review some basic results from information theory.
Measure of Information & Entropy
Assume there is a source (e.g., an image) that generates a discrete set of independent messages (e.g., grey levels) r_k with probability P_k, k ∈ [1, L], where L is the number of messages (or number of levels).
Figure 2: Source and message.
Then the information associated with r_k is

I_k = −log2 P_k   bits
Clearly, Σ_{k=1}^{L} P_k = 1. For equally likely levels (messages), the information can be transmitted as an n-bit binary number:

P_k = 1/L = 1/2^n  →  I_k = n bits
For images, the P_k's are obtained from the histogram.
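As a quick sketch of these definitions (Python/NumPy, with a small made-up 8-level image for illustration), the I_k's and the entropy H follow directly from the histogram:

```python
import numpy as np

# Hypothetical 3x4 image with grey levels in 0..7 (illustration only).
img = np.array([[4, 4, 5, 0],
                [4, 6, 5, 0],
                [4, 5, 7, 4]])

levels, counts = np.unique(img, return_counts=True)
P = counts / counts.sum()        # P_k estimated from the histogram
I = -np.log2(P)                  # information of each level, in bits
H = float(np.sum(P * I))         # entropy: average bits/pixel

print(dict(zip(levels.tolist(), np.round(I, 3).tolist())))
print(f"H = {H:.3f} bits/pixel")
```

Note that H never exceeds log2 L for L occupied levels, matching the entropy bound discussed next.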
As an example, consider a binary image with r_0 = black, P_0 = 1 and r_1 = white, P_1 = 0; then I_k = 0, i.e. no information.
Entropy: Average information generated by the source

H = Σ_{k=1}^{L} P_k I_k = −Σ_{k=1}^{L} P_k log2 P_k   (avg. bits/pixel)
Entropy also represents a measure of redundancy.
Let L = 4, P_1 = P_2 = P_3 = 0 and P_4 = 1; then H = 0, i.e. the most certain case and thus maximum redundancy.
Now let L = 4, P_1 = P_2 = P_3 = P_4 = 1/4; then H = 2, i.e. the most uncertain case and hence least redundant.
Maximum entropy occurs when the levels are equally likely, P_k = 1/L, k ∈ [1, L]; then

H_max = −Σ_{k=1}^{L} (1/L) log2(1/L) = log2 L

Thus, 0 ≤ H ≤ H_max.
Entropy and Coding
Entropy represents the lower bound on the number of bits required to code the coder inputs, i.e. for a set of coder inputs v_k, k ∈ [1, L], with probabilities P_k, it is guaranteed that it is not possible to code them using fewer than H bits on average. If we design a code with codewords C_k, k ∈ [1, L], with corresponding word lengths β_k, the average number of bits required by the coder is

R(L) = Σ_{k=1}^{L} β_k P_k
Figure 3: Coder producing codewords C_k with lengths β_k.
Shannon's Entropy Coding Theorem (1949)
The average length R(L) is bounded by

H ≤ R(L) ≤ H + ε,   ε = 1/L
i.e. it is possible to encode the source without distortion using an average of H + ε bits/message, or to encode it with distortion using H bits/message. The optimality of the coder depends on how close R(L) is to H.
Example: Let L = 2, P_1 = p and P_2 = 1 − p, 0 ≤ p ≤ 1. Thus the entropy is H = −p log2 p − (1 − p) log2(1 − p). The above figure shows H as a function of p. Clearly, since the source is binary, we can use 1 bit/pixel; this corresponds to H_max = 1 at p = 1/2. However, if p = 1/8, then H ≈ 0.54, i.e. more redundancy, so it is possible to find a coding scheme that uses only about 0.54 bits/pixel.
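The binary entropy in this example can be sketched in a few lines of Python:

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))      # 1.0: the most uncertain (least redundant) case
print(binary_entropy(1 / 8))    # ≈ 0.544 bits/pixel
```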
Remark: The maximum achievable compression is

C = (average bit rate of original raw data, B) / (average bit rate of encoded data, R(L))

Thus,

B/(H + ε) ≤ C ≤ B/H,   ε = 1/L
Since certain distortion is inevitable in any image transmission, it isnecessary to find the minimum number of bits to encode the image whileallowing a certain level of distortion.
Rate Distortion Function
Let D be a fixed distortion between the actual values x and the reproduced values x̂. The question is: allowing distortion D, what is the minimum number of bits required to encode the data? If we consider x as a Gaussian r.v. with variance σ_x², the distortion D is

D = E[(x − x̂)²]

The rate distortion function is defined by
R_D = (1/2) log2(σ_x²/D)   for 0 ≤ D ≤ σ_x²
R_D = 0                    for D > σ_x²

i.e. R_D = max[0, (1/2) log2(σ_x²/D)].

At maximum distortion, D ≥ σ_x², R_D = 0, i.e. no information needs to be transmitted.
Figure 4: Rate Distortion Function RD versus D.
R_D gives the number of bits required at distortion D. Since R_D represents the number of bits/pixel, the corresponding number of quantizer levels is N = 2^{R_D} = (σ_x²/D)^{1/2}, where D is considered to be the quantization noise variance. This variance can be minimized using the Lloyd-Max quantizer. In the transform domain we can assume that x is white (e.g., due to the KL transform).
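The Gaussian rate distortion function above can be sketched as follows (Python):

```python
import math

def rate_distortion(var_x: float, D: float) -> float:
    """R_D = max(0, 0.5*log2(var_x / D)) for a Gaussian source."""
    if D >= var_x:
        return 0.0          # D >= sigma_x^2: nothing needs to be sent
    return 0.5 * math.log2(var_x / D)

print(rate_distortion(1.0, 0.25))   # 1.0 bit: each halving of D costs 1/2 bit
print(rate_distortion(1.0, 2.0))    # 0.0
```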
Pixel-Based Encoding
Encode each pixel independently, ignoring inter-pixel dependencies. Among these methods are:
1. Entropy Coding
Every block of an image is entropy encoded based upon the P_k's within the block. This produces a variable-length code for each block, depending on the spatial activity within the block.
2. Run-Length Encoding
Scan the image horizontally or vertically and, while scanning, assign each group of consecutive pixels with the same intensity to a pair (g_i, l_i), where g_i is the intensity and l_i is the length of the "run". This method can also be used for detecting edges and boundaries of an object. It is mostly used for images with a small number of gray levels and is not effective for highly textured images.
Example 1: Consider the following 8 × 8 image.

4 4 4 4 4 4 4 0
4 5 5 5 5 5 4 0
4 5 6 6 6 5 4 0
4 5 6 7 6 5 4 0
4 5 6 6 6 5 4 0
4 5 5 5 5 5 4 0
4 4 4 4 4 4 4 0
4 4 4 4 4 4 4 0
The run-length codes using the vertical (continuous top-down) scanning mode are:
(4,9) (5,5) (4,3) (5,1) (6,3)(5,1) (4,3) (5,1) (6,1) (7,1)(6,1) (5,1) (4,3) (5,1) (6,3)(5,1) (4,3) (5,5) (4,10) (0,8)
i.e. a total of 20 pairs = 40 numbers. Horizontal scanning would lead to 34 pairs = 68 numbers, which is more than the actual number of pixels (i.e. 64).
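These counts can be checked with a short sketch (Python/NumPy); the continuous vertical and horizontal scans correspond to column-major and row-major flattening:

```python
import numpy as np

img = np.array([[4, 4, 4, 4, 4, 4, 4, 0],
                [4, 5, 5, 5, 5, 5, 4, 0],
                [4, 5, 6, 6, 6, 5, 4, 0],
                [4, 5, 6, 7, 6, 5, 4, 0],
                [4, 5, 6, 6, 6, 5, 4, 0],
                [4, 5, 5, 5, 5, 5, 4, 0],
                [4, 4, 4, 4, 4, 4, 4, 0],
                [4, 4, 4, 4, 4, 4, 4, 0]])

def run_length_encode(seq):
    """Collapse a 1-D scan into (intensity g_i, run length l_i) pairs."""
    runs = []
    for v in seq:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([int(v), 1])
    return [tuple(r) for r in runs]

vertical = run_length_encode(img.flatten(order="F"))    # continuous top-down
horizontal = run_length_encode(img.flatten(order="C"))  # continuous left-right
print(len(vertical), len(horizontal))   # 20 pairs vs. 34 pairs
```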
Example 2: Let the transition probabilities for run-length encoding of a binary image (0: black, 1: white) be p_0 = P(0|1) and p_1 = P(1|0). Assuming all runs are independent, find (a) the average run lengths, (b) the entropies of the white and black runs, and (c) the compression ratio.
Solution: A run of length l ≥ 1 can be represented by a geometric r.v. X_i with PMF P(X_i = l) = p_i(1 − p_i)^{l−1}, i = 0, 1, which corresponds to the first occurrence of a 0 or 1 after l independent trials. (Note that 1 − P(0|1) = P(1|1) and 1 − P(1|0) = P(0|0).) Thus, for the average we have
μ_{X_i} = Σ_{l=1}^{∞} l P(X_i = l) = Σ_{l=1}^{∞} l p_i (1 − p_i)^{l−1}

which, using the series Σ_{n=1}^{∞} n a^{n−1} = 1/(1 − a)², reduces to μ_{X_i} = 1/p_i. The entropy is given by
H_{X_i} = −Σ_{l=1}^{∞} P(X_i = l) log2 P(X_i = l)
        = −p_i Σ_{l=1}^{∞} (1 − p_i)^{l−1} [log2 p_i + (l − 1) log2(1 − p_i)]
Using the same series formula, we get

H_{X_i} = −(1/p_i) [p_i log2 p_i + (1 − p_i) log2(1 − p_i)]

The average number of bits per pixel of the run-length code is

R = (H_{X_0} + H_{X_1}) / (μ_{X_0} + μ_{X_1})

so the achievable compression ratio is C = B/R = (μ_{X_0} + μ_{X_1})/(H_{X_0} + H_{X_1}), with B = 1 bit/pixel for the raw binary image. Here P_i = p_i/(p_0 + p_1) are the a priori probabilities of the black and white pixels.
3. Huffman Encoding
The algorithm consists of the following steps.
1. Arrange the symbols with probabilities P_k in decreasing order and consider them as the "leaf nodes" of a tree.
2. Merge the two nodes with the smallest probabilities to form a new node whose probability is the sum of the two merged nodes. Go to Step 1 and repeat until only two nodes are left (the "root nodes").
3. Arbitrarily assign 1's and 0's to each pair of branches merging into a node.
4. Read sequentially from the root node to each leaf node to form the associated code for each symbol.
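The steps above can be sketched with Python's heapq, here applied to the gray-level counts of the 8×8 image from Example 1:

```python
import heapq

def huffman_codes(freqs):
    """Build Huffman codewords from a {symbol: count} map."""
    # Heap entries: (merged count, tie-breaker, {symbol: code-so-far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)   # the two least-probable nodes
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]

counts = {0: 8, 4: 31, 5: 16, 6: 8, 7: 1}   # levels 1-3 never occur
codes = huffman_codes(counts)
total = sum(counts.values())
avg_len = sum(len(codes[s]) * n for s, n in counts.items()) / total
print(codes)
print(f"R = {avg_len:.3f} bits/pixel")   # ~1.92, close to the entropy bound
```

The resulting codeword lengths (1, 2, 3, 4, 4 bits) match the table on the next slide; the exact bit patterns may differ since the 0/1 branch labels are assigned arbitrarily.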
Example 3: For the same image as in the previous example, which requires 3 bits/pixel using standard PCM, we can arrange the table on the next page.
Gray level | # occurrences | P_k   | C_k  | β_k | P_k β_k | −P_k log2 P_k
0          | 8             | 0.125 | 0000 | 4   | 0.5     | 0.375
1          | 0             | 0     | -    | -   | -       | -
2          | 0             | 0     | -    | -   | -       | -
3          | 0             | 0     | -    | -   | -       | -
4          | 31            | 0.484 | 1    | 1   | 0.484   | 0.507
5          | 16            | 0.25  | 01   | 2   | 0.5     | 0.5
6          | 8             | 0.125 | 001  | 3   | 0.375   | 0.375
7          | 1             | 0.016 | 0001 | 4   | 0.064   | 0.095
Total      | 64            | 1     |      |     | R=1.923 | H=1.852

Codewords C_k are obtained by constructing the binary tree as in Fig. 5.
Figure 5: Tree Structure for Huffman Encoding.
Note that in this case, we have

R = Σ_{k=1}^{8} β_k P_k = 1.923 bits/pixel

H = −Σ_{k=1}^{8} P_k log2 P_k = 1.852 bits/pixel

Thus,

1.852 ≤ R = 1.923 ≤ H + 1/L = 1.977

i.e. an average of 2 bits/pixel (instead of 3 bits/pixel using PCM) can be used to code the image. However, the drawback of the standard Huffman encoding method is that the codes have variable lengths.
Predictive Encoding
Idea: Remove the mutual redundancy among successive pixels in a region of support (ROS), or neighborhood, and encode only the new information. This method is based upon linear prediction. Let us start with 1-D linear predictors. An N-th order linear prediction of x(n) based on the N previous samples is generated using a 1-D autoregressive (AR) model:
x̂(n) = a1x(n− 1) + a2x(n− 2) + · · ·+ aNx(n−N)
The a_i's are model coefficients determined from sample signals. Now, instead of encoding x(n), the prediction error
e(n) = x(n)− x̂(n)
is encoded, as it requires a substantially smaller number of bits. Then, at the receiver, we reconstruct x(n) using the previously encoded values x(n − k) and the encoded error signal, i.e.
x(n) = x̂(n) + e(n)
This method is also referred to as differential PCM (DPCM).
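A minimal 1-D illustration (Python/NumPy, with a synthetic AR(1) signal standing in for a scan line; the coefficient 0.95 is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated test signal: x(n) = 0.95*x(n-1) + e(n).
a1 = 0.95
e = rng.normal(size=10_000)
x = np.zeros_like(e)
for n in range(1, len(x)):
    x[n] = a1 * x[n - 1] + e[n]

# First-order linear prediction and its error.
x_hat = np.concatenate(([0.0], a1 * x[:-1]))
err = x - x_hat

print(f"var(x) = {x.var():.2f}, var(err) = {err.var():.2f}")
# The error variance is roughly 10x smaller, so e(n) needs far fewer bits.
```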
Minimum Variance Prediction
The predictor

x̂(n) = Σ_{i=1}^{N} a_i x(n − i)
is the best N th order linear mean-squared predictor of x(n), whichminimizes the MSE
ε = E[(x(n) − x̂(n))²]

This minimization w.r.t. the a_k's results in the following "orthogonality property":
∂ε/∂a_k = −2E[(x(n) − x̂(n)) x(n − k)] = 0,   1 ≤ k ≤ N
which leads to the normal equations

r_xx(k) − Σ_{i=1}^{N} a_i r_xx(k − i) = σ_e² δ(k),   0 ≤ k ≤ N

where r_xx(k) is the autocorrelation of the data x(n) and σ_e² is the variance of the driving process e(n).
Plugging in the values k ∈ [0, N] gives the AR Yule-Walker equations for solving for the a_i's and σ_e², i.e.

[ r_xx(0)    r_xx(1)    ...  r_xx(N)   ] [  1   ]   [ σ_e² ]
[ r_xx(1)    r_xx(0)    ...  r_xx(N−1) ] [ −a_1 ]   [  0   ]
[   ...        ...      ...    ...     ] [  ... ] = [  ... ]
[ r_xx(N)    r_xx(N−1)  ...  r_xx(0)   ] [ −a_N ]   [  0   ]     (1)
Note that the correlation matrix R_x is in this case both Toeplitz and Hermitian. The solution to this system of linear equations is given by

σ_e² = 1/[R_x⁻¹]_{1,1},   a_i = −σ_e² [R_x⁻¹]_{i+1,1}

where [R_x⁻¹]_{i,j} is the (i, j)-th element of the matrix R_x⁻¹.
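A quick numerical check of this closed form (Python/NumPy) on an AR(1) model with a_1 = 0.9 and unit signal variance, for which r_xx(k) = 0.9^|k|:

```python
import numpy as np

r = np.array([1.0, 0.9])            # r_xx(0), r_xx(1)
R = np.array([[r[0], r[1]],
              [r[1], r[0]]])        # Toeplitz correlation matrix (N = 1)
R_inv = np.linalg.inv(R)

sigma_e2 = 1.0 / R_inv[0, 0]        # sigma_e^2 = 1 / [R^-1]_{1,1}
a = -sigma_e2 * R_inv[1:, 0]        # a_i = -sigma_e^2 * [R^-1]_{i+1,1}

print(a, sigma_e2)                  # [0.9] and 0.19, as expected for AR(1)
```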
In the 2-D case, an AR model with a non-symmetric half-plane (NSHP) ROS is used. This ROS is shown in Fig. 6 when the image is scanned left-to-right and top-to-bottom.
Figure 6: A 1st Order 2-D AR Model with NSHP ROS.
For a 1st order 2-D AR model,

x(m,n) = a_{01} x(m,n−1) + a_{11} x(m−1,n−1) + a_{10} x(m−1,n) + a_{1,−1} x(m−1,n+1) + e(m,n)

where the a_{i,j}'s are model coefficients. Then the best linear prediction of x(m,n) is

x̂(m,n) = a_{01} x(m,n−1) + a_{11} x(m−1,n−1) + a_{10} x(m−1,n) + a_{1,−1} x(m−1,n+1)
Note that at every pixel, four previously scanned pixels are needed to generate the predicted value x̂(m,n). Fig. 7 shows the pixels that need to be stored in the global "state vector" for this 1st order predictor.
Figure 7: Global State Vector.
Assuming that the reproduced (quantized) values up to (m, n−1) are available, we generate

x̂⁰(m,n) = a_{01} x⁰(m,n−1) + a_{11} x⁰(m−1,n−1) + a_{10} x⁰(m−1,n) + a_{1,−1} x⁰(m−1,n+1)

Then the prediction error is applied to the quantizer:

e(m,n) := x(m,n) − x̂⁰(m,n)   (quantizer input)

The quantized value e⁰(m,n) is encoded and transmitted. It is also used to generate the reproduced value using
x⁰(m,n) = e⁰(m,n) + x̂⁰(m,n)   (reproduced value)
The entire process at the transmitter and receiver is depicted in Fig. 8.Clearly, it is assumed that the model coefficients are available at thereceiver.
Figure 8: Block Diagram of 2-D Predictive Encoding System.
It is interesting to note that
q(m,n) := x(m,n) − x⁰(m,n)   (PCM quantization error)
        = e(m,n) − e⁰(m,n)   (DPCM quantization error)

However, for the same quantization error q(m,n), DPCM requires far fewer bits.
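The transmitter loop can be sketched as follows (Python/NumPy). The model coefficients and quantizer step are illustrative assumptions, not trained values; the key point is that prediction always uses reproduced values, exactly as the receiver will:

```python
import numpy as np

def dpcm_2d(img, coeffs=(0.95, -0.9, 0.95, 0.0), step=4.0):
    """Sketch of 2-D NSHP DPCM with a uniform quantizer.

    coeffs = (a01, a11, a10, a1m1) are illustrative, not trained.
    Returns the quantized errors e0 and the reproduced image x0.
    """
    a01, a11, a10, a1m1 = coeffs
    M, N = img.shape
    x0 = np.zeros((M, N))
    e0 = np.zeros((M, N))
    for m in range(M):
        for n in range(N):
            # Predict from reproduced (not original) neighbors, since the
            # receiver only ever has reproduced values.
            pred = 0.0
            if n > 0:
                pred += a01 * x0[m, n - 1]
            if m > 0:
                pred += a10 * x0[m - 1, n]
                if n > 0:
                    pred += a11 * x0[m - 1, n - 1]
                if n < N - 1:
                    pred += a1m1 * x0[m - 1, n + 1]
            e = img[m, n] - pred                  # quantizer input
            e0[m, n] = step * np.round(e / step)  # uniform quantizer
            x0[m, n] = e0[m, n] + pred            # reproduced value
    return e0, x0

rng = np.random.default_rng(1)
img = np.cumsum(np.cumsum(rng.normal(size=(32, 32)), 0), 1)  # smooth field
e0, x0 = dpcm_2d(img)
print(img.var(), e0.var())       # the error variance is much smaller
print(np.abs(img - x0).max())    # bounded by half the quantizer step
```

Note the per-pixel identity from above: x − x⁰ = e − e⁰, so the reconstruction error never exceeds half the quantizer step.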
Performance Analysis of DPCM
For straight PCM, the rate distortion function is

R_PCM = (1/2) log2(σ_x²/σ_q²)   bits/pixel

i.e. the number of bits required per pixel in the presence of a particular distortion σ_q² = E[q²(m,n)]. Now for DPCM, the rate distortion function is
R_DPCM = (1/2) log2(σ_e²/σ_q²)   bits/pixel

for the same distortion. Clearly, σ_e² ≪ σ_x² → R_DPCM ≪ R_PCM. The bit reduction of DPCM over PCM is

R_PCM − R_DPCM = (1/2) log2(σ_x²/σ_e²) = (1/0.6) log10(σ_x²/σ_e²)

The achieved compression depends on the inter-pixel redundancy; for an image with no redundancy (random),

σ_x² = σ_e² → R_PCM = R_DPCM
Transform-Based Encoding
Idea: Reduce redundancy by applying a unitary transformation to blocks of an image; the decorrelated coefficients/features are then encoded.
The process of transform-based encoding, or block quantization, is depicted in Fig. 9. The image is first partitioned into non-overlapping blocks. Each block is then unitary transformed, and the principal coefficients are quantized and encoded.
Figure 9: Transform-Based Encoding Process.
Q1: What are the best mapping matrices A and B, so that maximum redundancy removal is achieved while, at the same time, the distortion due to discarding coefficients is minimized?
Q2: What is the best quantizer, i.e. the one giving minimum quantization distortion?
Theorem: Let x be a random vector representing blocks of an image, and let y = Ax be its transformed version, with components y(k) that are mutually uncorrelated. These components are quantized to y⁰, then encoded and transmitted. At the receiver, the decoded values are reconstructed using matrix B, i.e. x⁰ = By⁰. The objective is to find the optimum matrices A and B and the optimum quantizer such that

D = E[‖x − x⁰‖²]

is minimized.
1. The optimum matrices are A = Ψ*ᵗ and B = Ψ, i.e. the KL transform pair.
2. The optimum quantizer is the Lloyd-Max quantizer.
Proof: See Jain's book (Ch. 11).
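A numerical illustration of the decorrelation claim (Python/NumPy, with an assumed Toeplitz covariance ρ^|i−j| standing in for neighboring-pixel correlation):

```python
import numpy as np

rng = np.random.default_rng(2)

rho = 0.9
Rx = rho ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
x = rng.multivariate_normal(np.zeros(4), Rx, size=100_000)

# KL transform: A = Psi^T, where the columns of Psi are eigenvectors of Rx
# (real case, so Psi* = Psi).
eigvals, Psi = np.linalg.eigh(Rx)
y = x @ Psi                      # y = A x applied to each sample block

Ry = np.cov(y.T)
print(np.round(Ry, 2))           # ~diagonal: the y(k) are mutually uncorrelated
```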
Bit Allocation
Goal: allocate a given total number of bits M optimally to the N (retained) components of y⁰, so that the distortion

D = (1/N) Σ_{k=1}^{N} E[(y(k) − y⁰(k))²] = (1/N) Σ_{k=1}^{N} σ_k² f(m_k)

is minimized.
f(·): quantizer distortion function.
σ_k²: variance of coefficient y(k).
m_k: number of bits allocated to y⁰(k).
Optimal bit allocation involves finding the m_k's that minimize D subject to M = Σ_{k=1}^{N} m_k. Note that coefficients with higher variance contain more information than those with lower variance; thus more bits are assigned to them to improve the performance.
i. Shannon's Allocation Strategy
m_k = m_k(θ) = max(0, (1/2) log2(σ_k²/θ))

θ: must be chosen to produce an average rate of p = M/N bits per pixel (bpp).
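A sketch of finding θ by bisection for the Shannon strategy (Python/NumPy; the variances are made up, and the m_k here are left real-valued, before any integer rounding):

```python
import numpy as np

def shannon_allocation(variances, p, tol=1e-10):
    """Find theta so that mean(max(0, 0.5*log2(var_k/theta))) hits the
    target rate p bits per pixel; returns the real-valued m_k."""
    variances = np.asarray(variances, dtype=float)

    def rate(theta):
        return np.maximum(0.0, 0.5 * np.log2(variances / theta)).mean()

    lo, hi = 1e-12, variances.max()   # rate(hi) = 0, rate(lo) very large
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate(mid) > p:
            lo = mid                  # too many bits: raise theta
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return np.maximum(0.0, 0.5 * np.log2(variances / theta))

m = shannon_allocation([16.0, 4.0, 1.0, 0.25], p=1.5)
print(m.round(2), m.mean())   # high-variance coefficients get more bits
```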
ii. Segall Allocation Strategy

m_k(θ) = (1/1.78) log2(1.46 σ_k²/θ)   for 0 < θ ≤ 0.083 σ_k²
m_k(θ) = (1/1.57) log2(σ_k²/θ)        for 0.083 σ_k² < θ ≤ σ_k²
m_k(θ) = 0                            for θ > σ_k²

where θ solves Σ_{k=1}^{N} m_k(θ) = M.
iii. Huang/Schultheiss Allocation Strategy
This bit allocation approximates the optimal non-uniform allocation for Gaussian coefficients, giving

m̂_k = M/N + (1/2) log2 σ_k² − (1/2N) Σ_{i=1}^{N} log2 σ_i²

m_k = Int[m̂_k],   with M = Σ_{k=1}^{N} m_k fixed.
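A sketch of the real-valued allocation, assuming the standard Huang/Schultheiss form for Gaussian coefficients, i.e. m̂_k = M/N + ½ log2 σ_k² − (1/2N) Σ_i log2 σ_i² (the variances below are made up):

```python
import numpy as np

def huang_schultheiss(variances, M):
    """Real-valued H-S allocation; the m_hat_k sum to M exactly,
    before the Int[.] rounding step."""
    v = np.asarray(variances, dtype=float)
    log_v = np.log2(v)
    return M / len(v) + 0.5 * log_v - 0.5 * log_v.mean()

m_hat = huang_schultheiss([16.0, 4.0, 1.0, 0.25], M=6)
print(m_hat, m_hat.sum())   # [3. 2. 1. 0.] with the sum fixed at M = 6
```

After rounding, the m_k may need a final adjustment so that their sum still equals M.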
Figs. 10 and 11 show the reconstructed images of Lena and Barbara using the Shannon (SNR_Lena = 20.55 dB, SNR_Barb = 17.24 dB) and Segall (SNR_Lena = 21.23 dB, SNR_Barb = 16.90 dB) bit allocation methods for an average of p = 1.5 bpp, together with the corresponding error images.
Figure 10: Reconstructed & Error Images-Shannon’s (1.5bpp).
Figure 11: Reconstructed & Error Images-Segall’s (1.5bpp).