Image Encoding & Compression | Information Theory | Pixel-Based Encoding | Predictive Encoding | Transform-Based Encoding
Digital Image Processing, Lectures 25 & 26
M.R. Azimi, Professor
Department of Electrical and Computer Engineering, Colorado State University
M.R. Azimi Digital Image Processing
Area 4: Image Encoding and Compression
Goal:
To exploit the redundancies in the image in order to reduce the number of bits to represent an image or a sequence of images (e.g., video).
Applications:
Image Transmission: e.g., HDTV, 3DTV, satellite/military communication, and teleconferencing.
Image Storage: e.g., document storage & retrieval, medical image archives, weather maps, and geological surveys.

Categories of Techniques:
1. Pixel Encoding: PCM, run-length encoding, bit-plane encoding, Huffman encoding, entropy encoding
2. Predictive Encoding: delta modulation, 2-D DPCM, inter-frame methods
3. Transform-Based Encoding: DCT-based, WT-based, zonal encoding
4. Others: vector quantization (clustering), neural network-based, hybrid encoding
Encoding System
There are three steps involved in any encoding system (Fig. 1).
a. Mapping: removes redundancies in the image. Should be invertible.
b. Quantization: the mapped values are quantized using uniform or Lloyd-Max quantizers.
c. Coding: optimal codewords are assigned to the quantized values.
Figure 1: A Typical Image Encoding System.
However, before we discuss several types of encoding systems, we need to
review some basic results from information theory.
Measure of Information & Entropy
Assume there is a source (e.g., an image) that generates a discrete set of independent messages (e.g., grey levels) r_k with probability P_k, k ∈ [1, L], where L is the number of messages (or number of levels).
Figure 2: Source and message.
Then the information associated with r_k is

I_k = −log2 P_k   bits
Clearly, Σ_{k=1}^{L} P_k = 1. For equally likely levels (messages), the information can be transmitted as an n-bit binary number:

P_k = 1/L = 1/2^n  →  I_k = n bits
For images, the P_k's are obtained from the histogram.
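As a quick sketch of these definitions (Python/NumPy, with a small made-up 8-level image for illustration), the I_k's and the entropy H follow directly from the histogram:

```python
import numpy as np

# Hypothetical 3x4 image with grey levels in 0..7 (illustration only).
img = np.array([[4, 4, 5, 0],
                [4, 6, 5, 0],
                [4, 5, 7, 4]])

levels, counts = np.unique(img, return_counts=True)
P = counts / counts.sum()        # P_k estimated from the histogram
I = -np.log2(P)                  # information of each level, in bits
H = float(np.sum(P * I))         # entropy: average bits/pixel

print(dict(zip(levels.tolist(), np.round(I, 3).tolist())))
print(f"H = {H:.3f} bits/pixel")
```

Note that H never exceeds log2 L for L occupied levels, matching the entropy bound discussed next.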
As an example, consider a binary image with r_0 = black, P_0 = 1 and r_1 = white, P_1 = 0; then I_k = 0, i.e. no information.
Entropy: Average information generated by the source

H = Σ_{k=1}^{L} P_k I_k = −Σ_{k=1}^{L} P_k log2 P_k   (avg. bits/pixel)
Entropy also represents a measure of redundancy.
Let L = 4, P_1 = P_2 = P_3 = 0 and P_4 = 1; then H = 0, i.e. the most certain case and thus maximum redundancy.
Now let L = 4, P_1 = P_2 = P_3 = P_4 = 1/4; then H = 2, i.e. the most uncertain case and hence least redundant.
Maximum entropy occurs when the levels are equally likely, P_k = 1/L, k ∈ [1, L]; then

H_max = −Σ_{k=1}^{L} (1/L) log2(1/L) = log2 L

Thus, 0 ≤ H ≤ H_max.
Entropy and Coding
Entropy represents the lower bound on the number of bits required to code the coder inputs, i.e. for a set of coder inputs v_k, k ∈ [1, L], with probabilities P_k, it is guaranteed that it is not possible to code them using fewer than H bits on average. If we design a code with codewords C_k, k ∈ [1, L], with corresponding word lengths β_k, the average number of bits required by the coder is

R(L) = Σ_{k=1}^{L} β_k P_k
Figure 3: Coder producing codewords C_k with lengths β_k.
Shannon's Entropy Coding Theorem (1949)
The average length R(L) is bounded by

H ≤ R(L) ≤ H + ε,   ε = 1/L
i.e. it is possible to encode the source without distortion using an average of H + ε bits/message, or to encode it with distortion using H bits/message. The optimality of the coder depends on how close R(L) is to H.
Example: Let L = 2, P_1 = p and P_2 = 1 − p, 0 ≤ p ≤ 1. Thus the entropy is H = −p log2 p − (1 − p) log2(1 − p). The above figure shows H as a function of p. Clearly, since the source is binary, we can use 1 bit/pixel; this corresponds to H_max = 1 at p = 1/2. However, if p = 1/8, then H ≈ 0.54, i.e. more redundancy, so it is possible to find a coding scheme that uses only about 0.54 bits/pixel.
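The binary entropy in this example can be sketched in a few lines of Python:

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))      # 1.0: the most uncertain (least redundant) case
print(binary_entropy(1 / 8))    # ≈ 0.544 bits/pixel
```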
Remark: The maximum achievable compression is

C = (average bit rate of original raw data, B) / (average bit rate of encoded data, R(L))

Thus,

B/(H + ε) ≤ C ≤ B/H,   ε = 1/L
Since certain distortion is inevitable in any image transmission, it isnecessary to find the minimum number of bits to encode the image whileallowing a certain level of distortion.
Rate Distortion Function
Let D be a fixed distortion between the actual values x and the reproduced values x̂. The question is: allowing distortion D, what is the minimum number of bits required to encode the data? If we consider x as a Gaussian r.v. with variance σ_x², the distortion D is

D = E[(x − x̂)²]

The rate distortion function is defined by
R_D = (1/2) log2(σ_x²/D)   for 0 ≤ D ≤ σ_x²
R_D = 0                    for D > σ_x²

i.e. R_D = max[0, (1/2) log2(σ_x²/D)].

At maximum distortion, D ≥ σ_x², R_D = 0, i.e. no information needs to be transmitted.
Figure 4: Rate Distortion Function RD versus D.
R_D gives the number of bits required at distortion D. Since R_D represents the number of bits/pixel, the corresponding number of quantizer levels is N = 2^{R_D} = (σ_x²/D)^{1/2}, where D is considered to be the quantization noise variance. This variance can be minimized using the Lloyd-Max quantizer. In the transform domain we can assume that x is white (e.g., due to the KL transform).
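The Gaussian rate distortion function above can be sketched as follows (Python):

```python
import math

def rate_distortion(var_x: float, D: float) -> float:
    """R_D = max(0, 0.5*log2(var_x / D)) for a Gaussian source."""
    if D >= var_x:
        return 0.0          # D >= sigma_x^2: nothing needs to be sent
    return 0.5 * math.log2(var_x / D)

print(rate_distortion(1.0, 0.25))   # 1.0 bit: each halving of D costs 1/2 bit
print(rate_distortion(1.0, 2.0))    # 0.0
```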
Pixel-Based Encoding
Encode each pixel independently, ignoring inter-pixel dependencies. Among these methods are:
1. Entropy Coding
Every block of an image is entropy encoded based upon the P_k's within the block. This produces a variable-length code for each block, depending on the spatial activity within the block.
2. Run-Length Encoding
Scan the image horizontally or vertically and, while scanning, assign each group of consecutive pixels with the same intensity to a pair (g_i, l_i), where g_i is the intensity and l_i is the length of the "run". This method can also be used for detecting edges and boundaries of an object. It is mostly used for images with a small number of gray levels and is not effective for highly textured images.
Example 1: Consider the following 8 × 8 image.

4 4 4 4 4 4 4 0
4 5 5 5 5 5 4 0
4 5 6 6 6 5 4 0
4 5 6 7 6 5 4 0
4 5 6 6 6 5 4 0
4 5 5 5 5 5 4 0
4 4 4 4 4 4 4 0
4 4 4 4 4 4 4 0
The run-length codes using the vertical (continuous top-down) scanning mode are:
(4,9) (5,5) (4,3) (5,1) (6,3)(5,1) (4,3) (5,1) (6,1) (7,1)(6,1) (5,1) (4,3) (5,1) (6,3)(5,1) (4,3) (5,5) (4,10) (0,8)
i.e. a total of 20 pairs = 40 numbers. Horizontal scanning would lead to 34 pairs = 68 numbers, which is more than the actual number of pixels (i.e. 64).
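These counts can be checked with a short sketch (Python/NumPy); the continuous vertical and horizontal scans correspond to column-major and row-major flattening:

```python
import numpy as np

img = np.array([[4, 4, 4, 4, 4, 4, 4, 0],
                [4, 5, 5, 5, 5, 5, 4, 0],
                [4, 5, 6, 6, 6, 5, 4, 0],
                [4, 5, 6, 7, 6, 5, 4, 0],
                [4, 5, 6, 6, 6, 5, 4, 0],
                [4, 5, 5, 5, 5, 5, 4, 0],
                [4, 4, 4, 4, 4, 4, 4, 0],
                [4, 4, 4, 4, 4, 4, 4, 0]])

def run_length_encode(seq):
    """Collapse a 1-D scan into (intensity g_i, run length l_i) pairs."""
    runs = []
    for v in seq:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([int(v), 1])
    return [tuple(r) for r in runs]

vertical = run_length_encode(img.flatten(order="F"))    # continuous top-down
horizontal = run_length_encode(img.flatten(order="C"))  # continuous left-right
print(len(vertical), len(horizontal))   # 20 pairs vs. 34 pairs
```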
Example 2: Let the transition probabilities for run-length encoding of a binary image (0: black, 1: white) be p_0 = P(0|1) and p_1 = P(1|0). Assuming all runs are independent, find (a) the average run lengths, (b) the entropies of the white and black runs, and (c) the compression ratio.
Solution: A run of length l ≥ 1 can be represented by a geometric r.v. X_i with PMF P(X_i = l) = p_i(1 − p_i)^{l−1}, i = 0, 1, which corresponds to the first occurrence of a 0 or 1 after l independent trials. (Note that 1 − P(0|1) = P(1|1) and 1 − P(1|0) = P(0|0).) Thus, for the average we have
μ_{X_i} = Σ_{l=1}^{∞} l P(X_i = l) = Σ_{l=1}^{∞} l p_i (1 − p_i)^{l−1}

which, using the series Σ_{n=1}^{∞} n a^{n−1} = 1/(1 − a)², reduces to μ_{X_i} = 1/p_i. The entropy is given by
H_{X_i} = −Σ_{l=1}^{∞} P(X_i = l) log2 P(X_i = l)
        = −p_i Σ_{l=1}^{∞} (1 − p_i)^{l−1} [log2 p_i + (l − 1) log2(1 − p_i)]
Using the same series formula, we get

H_{X_i} = −(1/p_i) [p_i log2 p_i + (1 − p_i) log2(1 − p_i)]

The average number of bits per pixel of the run-length code is

R = (H_{X_0} + H_{X_1}) / (μ_{X_0} + μ_{X_1})

so the achievable compression ratio is C = B/R = (μ_{X_0} + μ_{X_1})/(H_{X_0} + H_{X_1}), with B = 1 bit/pixel for the raw binary image. Here P_i = p_i/(p_0 + p_1) are the a priori probabilities of the black and white pixels.
3. Huffman Encoding
The algorithm consists of the following steps.
1. Arrange the symbols with probabilities P_k in decreasing order and consider them as the "leaf nodes" of a tree.
2. Merge the two nodes with the smallest probabilities to form a new node whose probability is the sum of the two merged nodes. Go to Step 1 and repeat until only two nodes are left (the "root nodes").
3. Arbitrarily assign 1's and 0's to each pair of branches merging into a node.
4. Read sequentially from the root node to each leaf node to form the associated code for each symbol.
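The steps above can be sketched with Python's heapq, here applied to the gray-level counts of the 8×8 image from Example 1:

```python
import heapq

def huffman_codes(freqs):
    """Build Huffman codewords from a {symbol: count} map."""
    # Heap entries: (merged count, tie-breaker, {symbol: code-so-far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)   # the two least-probable nodes
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]

counts = {0: 8, 4: 31, 5: 16, 6: 8, 7: 1}   # levels 1-3 never occur
codes = huffman_codes(counts)
total = sum(counts.values())
avg_len = sum(len(codes[s]) * n for s, n in counts.items()) / total
print(codes)
print(f"R = {avg_len:.3f} bits/pixel")   # ~1.92, close to the entropy bound
```

The resulting codeword lengths (1, 2, 3, 4, 4 bits) match the table on the next slide; the exact bit patterns may differ since the 0/1 branch labels are assigned arbitrarily.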
Example 3: For the same image as in the previous example, which requires 3 bits/pixel using standard PCM, we can arrange the table on the next page.
Gray level | # occurrences | P_k   | C_k  | β_k | P_k β_k | −P_k log2 P_k
0          | 8             | 0.125 | 0000 | 4   | 0.5     | 0.375
1          | 0             | 0     | -    | -   | -       | -
2          | 0             | 0     | -    | -   | -       | -
3          | 0             | 0     | -    | -   | -       | -
4          | 31            | 0.484 | 1    | 1   | 0.484   | 0.507
5          | 16            | 0.25  | 01   | 2   | 0.5     | 0.5
6          | 8             | 0.125 | 001  | 3   | 0.375   | 0.375
7          | 1             | 0.016 | 0001 | 4   | 0.064   | 0.095
Total      | 64            | 1     |      |     | R=1.923 | H=1.852

Codewords C_k are obtained by constructing the binary tree as in Fig. 5.
Figure 5: Tree Structure for Huffman Encoding.
Note that in this case, we have

R = Σ_{k=1}^{8} β_k P_k = 1.923 bits/pixel

H = −Σ_{k=1}^{8} P_k log2 P_k = 1.852 bits/pixel

Thus,

1.852 ≤ R = 1.923 ≤ H + 1/L = 1.977

i.e. an average of 2 bits/pixel (instead of 3 bits/pixel using PCM) can be used to code the image. However, the drawback of the standard Huffman encoding method is that the codes have variable lengths.
Predictive Encoding
Idea: Remove the mutual redundancy among successive pixels in a region of support (ROS), or neighborhood, and encode only the new information. This method is based upon linear prediction. Let us start with 1-D linear predictors. An N-th order linear prediction of x(n) based on the N previous samples is generated using a 1-D autoregressive (AR) model:
x̂(n) = a1x(n− 1) + a2x(n− 2) + · · ·+ aNx(n−N)
The a_i's are model coefficients determined from sample signals. Now, instead of encoding x(n), the prediction error
e(n) = x(n)− x̂(n)
is encoded, as it requires a substantially smaller number of bits. Then, at the receiver, we reconstruct x(n) using the previously encoded values x(n − k) and the encoded error signal, i.e.
x(n) = x̂(n) + e(n)
This method is also referred to as differential PCM (DPCM).
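A minimal 1-D illustration (Python/NumPy, with a synthetic AR(1) signal standing in for a scan line; the coefficient 0.95 is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated test signal: x(n) = 0.95*x(n-1) + e(n).
a1 = 0.95
e = rng.normal(size=10_000)
x = np.zeros_like(e)
for n in range(1, len(x)):
    x[n] = a1 * x[n - 1] + e[n]

# First-order linear prediction and its error.
x_hat = np.concatenate(([0.0], a1 * x[:-1]))
err = x - x_hat

print(f"var(x) = {x.var():.2f}, var(err) = {err.var():.2f}")
# The error variance is roughly 10x smaller, so e(n) needs far fewer bits.
```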
Minimum Variance Prediction
The predictor

x̂(n) = Σ_{i=1}^{N} a_i x(n − i)
is the best N th order linear mean-squared predictor of x(n), whichminimizes the MSE
ε = E[(x(n) − x̂(n))²]

This minimization w.r.t. the a_k's results in the following "orthogonality property":
∂ε/∂a_k = −2E[(x(n) − x̂(n)) x(n − k)] = 0,   1 ≤ k ≤ N
which leads to the normal equations

r_xx(k) − Σ_{i=1}^{N} a_i r_xx(k − i) = σ_e² δ(k),   0 ≤ k ≤ N

where r_xx(k) is the autocorrelation of the data x(n) and σ_e² is the variance of the driving process e(n).
Plugging in the values k ∈ [0, N] gives the AR Yule-Walker equations for solving for the a_i's and σ_e², i.e.

[ r_xx(0)    r_xx(1)    ...  r_xx(N)   ] [  1   ]   [ σ_e² ]
[ r_xx(1)    r_xx(0)    ...  r_xx(N−1) ] [ −a_1 ]   [  0   ]
[   ...        ...      ...    ...     ] [  ... ] = [  ... ]
[ r_xx(N)    r_xx(N−1)  ...  r_xx(0)   ] [ −a_N ]   [  0   ]     (1)
Note that the correlation matrix R_x is in this case both Toeplitz and Hermitian. The solution to this system of linear equations is given by

σ_e² = 1/[R_x⁻¹]_{1,1},   a_i = −σ_e² [R_x⁻¹]_{i+1,1}

where [R_x⁻¹]_{i,j} is the (i, j)-th element of the matrix R_x⁻¹.
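A quick numerical check of this closed form (Python/NumPy) on an AR(1) model with a_1 = 0.9 and unit signal variance, for which r_xx(k) = 0.9^|k|:

```python
import numpy as np

r = np.array([1.0, 0.9])            # r_xx(0), r_xx(1)
R = np.array([[r[0], r[1]],
              [r[1], r[0]]])        # Toeplitz correlation matrix (N = 1)
R_inv = np.linalg.inv(R)

sigma_e2 = 1.0 / R_inv[0, 0]        # sigma_e^2 = 1 / [R^-1]_{1,1}
a = -sigma_e2 * R_inv[1:, 0]        # a_i = -sigma_e^2 * [R^-1]_{i+1,1}

print(a, sigma_e2)                  # [0.9] and 0.19, as expected for AR(1)
```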
In the 2-D case, an AR model with a non-symmetric half-plane (NSHP) ROS is used. This ROS is shown in Fig. 6 when the image is scanned left-to-right and top-to-bottom.
Figure 6: A 1st Order 2-D AR Model with NSHP ROS.
For a 1st order 2-D AR model,

x(m,n) = a_{01} x(m,n−1) + a_{11} x(m−1,n−1) + a_{10} x(m−1,n) + a_{1,−1} x(m−1,n+1) + e(m,n)

where the a_{i,j}'s are model coefficients. Then the best linear prediction of x(m,n) is

x̂(m,n) = a_{01} x(m,n−1) + a_{11} x(m−1,n−1) + a_{10} x(m−1,n) + a_{1,−1} x(m−1,n+1)
Note that at every pixel, four previously scanned pixels are needed to generate the predicted value x̂(m,n). Fig. 7 shows the pixels that need to be stored in the global "state vector" for this 1st order predictor.
Figure 7: Global State Vector.
Assuming that the reproduced (quantized) values up to (m, n−1) are available, we generate

x̂⁰(m,n) = a_{01} x⁰(m,n−1) + a_{11} x⁰(m−1,n−1) + a_{10} x⁰(m−1,n) + a_{1,−1} x⁰(m−1,n+1)

Then the prediction error is applied to the quantizer:

e(m,n) := x(m,n) − x̂⁰(m,n)   (quantizer input)

The quantized value e⁰(m,n) is encoded and transmitted. It is also used to generate the reproduced value using
x⁰(m,n) = e⁰(m,n) + x̂⁰(m,n)   (reproduced value)
The entire process at the transmitter and receiver is depicted in Fig. 8.Clearly, it is assumed that the model coefficients are available at thereceiver.
Figure 8: Block Diagram of 2-D Predictive Encoding System.
It is interesting to note that
q(m,n) := x(m,n) − x⁰(m,n)   (PCM quantization error)
        = e(m,n) − e⁰(m,n)   (DPCM quantization error)

However, for the same quantization error q(m,n), DPCM requires far fewer bits.
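The transmitter loop can be sketched as follows (Python/NumPy). The model coefficients and quantizer step are illustrative assumptions, not trained values; the key point is that prediction always uses reproduced values, exactly as the receiver will:

```python
import numpy as np

def dpcm_2d(img, coeffs=(0.95, -0.9, 0.95, 0.0), step=4.0):
    """Sketch of 2-D NSHP DPCM with a uniform quantizer.

    coeffs = (a01, a11, a10, a1m1) are illustrative, not trained.
    Returns the quantized errors e0 and the reproduced image x0.
    """
    a01, a11, a10, a1m1 = coeffs
    M, N = img.shape
    x0 = np.zeros((M, N))
    e0 = np.zeros((M, N))
    for m in range(M):
        for n in range(N):
            # Predict from reproduced (not original) neighbors, since the
            # receiver only ever has reproduced values.
            pred = 0.0
            if n > 0:
                pred += a01 * x0[m, n - 1]
            if m > 0:
                pred += a10 * x0[m - 1, n]
                if n > 0:
                    pred += a11 * x0[m - 1, n - 1]
                if n < N - 1:
                    pred += a1m1 * x0[m - 1, n + 1]
            e = img[m, n] - pred                  # quantizer input
            e0[m, n] = step * np.round(e / step)  # uniform quantizer
            x0[m, n] = e0[m, n] + pred            # reproduced value
    return e0, x0

rng = np.random.default_rng(1)
img = np.cumsum(np.cumsum(rng.normal(size=(32, 32)), 0), 1)  # smooth field
e0, x0 = dpcm_2d(img)
print(img.var(), e0.var())       # the error variance is much smaller
print(np.abs(img - x0).max())    # bounded by half the quantizer step
```

Note the per-pixel identity from above: x − x⁰ = e − e⁰, so the reconstruction error never exceeds half the quantizer step.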
Performance Analysis of DPCM
For straight PCM, the rate distortion function is

R_PCM = (1/2) log2(σ_x²/σ_q²)   bits/pixel

i.e. the number of bits required per pixel in the presence of a particular distortion σ_q² = E[q²(m,n)]. Now for DPCM, the rate distortion function is
R_DPCM = (1/2) log2(σ_e²/σ_q²)   bits/pixel

for the same distortion. Clearly, σ_e² ≪ σ_x² → R_DPCM ≪ R_PCM. The bit reduction of DPCM over PCM is

R_PCM − R_DPCM = (1/2) log2(σ_x²/σ_e²) = (1/0.6) log10(σ_x²/σ_e²)

The achieved compression depends on the inter-pixel redundancy; for an image with no redundancy (random),

σ_x² = σ_e² → R_PCM = R_DPCM
Transform-Based Encoding
Idea: Reduce redundancy by applying a unitary transformation to blocks of an image; the decorrelated coefficients/features are then encoded.
The process of transform-based encoding, or block quantization, is depicted in Fig. 9. The image is first partitioned into non-overlapping blocks. Each block is then unitary transformed, and the principal coefficients are quantized and encoded.
Figure 9: Transform-Based Encoding Process.
Q1: What are the best mapping matrices A and B, so that maximum redundancy removal is achieved while, at the same time, the distortion due to discarding coefficients is minimized?
Q2: What is the best quantizer, i.e. the one giving minimum quantization distortion?
Theorem: Let x be a random vector representing blocks of an image, and let y = Ax be its transformed version, with components y(k) that are mutually uncorrelated. These components are quantized to y⁰, then encoded and transmitted. At the receiver, the decoded values are reconstructed using matrix B, i.e. x⁰ = By⁰. The objective is to find the optimum matrices A and B and the optimum quantizer such that

D = E[‖x − x⁰‖²]

is minimized.
1. The optimum matrices are A = Ψ*ᵗ and B = Ψ, i.e. the KL transform pair.
2. The optimum quantizer is the Lloyd-Max quantizer.
Proof: See Jain's book (Ch. 11).
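A numerical illustration of the decorrelation claim (Python/NumPy, with an assumed Toeplitz covariance ρ^|i−j| standing in for neighboring-pixel correlation):

```python
import numpy as np

rng = np.random.default_rng(2)

rho = 0.9
Rx = rho ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
x = rng.multivariate_normal(np.zeros(4), Rx, size=100_000)

# KL transform: A = Psi^T, where the columns of Psi are eigenvectors of Rx
# (real case, so Psi* = Psi).
eigvals, Psi = np.linalg.eigh(Rx)
y = x @ Psi                      # y = A x applied to each sample block

Ry = np.cov(y.T)
print(np.round(Ry, 2))           # ~diagonal: the y(k) are mutually uncorrelated
```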
Bit Allocation
Goal: allocate a given total number of bits M optimally to the N (retained) components of y⁰, so that the distortion

D = (1/N) Σ_{k=1}^{N} E[(y(k) − y⁰(k))²] = (1/N) Σ_{k=1}^{N} σ_k² f(m_k)

is minimized.
f(·): quantizer distortion function.
σ_k²: variance of coefficient y(k).
m_k: number of bits allocated to y⁰(k).
Optimal bit allocation involves finding the m_k's that minimize D subject to M = Σ_{k=1}^{N} m_k. Note that coefficients with higher variance contain more information than those with lower variance; thus more bits are assigned to them to improve the performance.
i. Shannon's Allocation Strategy
m_k = m_k(θ) = max(0, (1/2) log2(σ_k²/θ))

θ: must be chosen to produce an average rate of p = M/N bits per pixel (bpp).
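A sketch of finding θ by bisection for the Shannon strategy (Python/NumPy; the variances are made up, and the m_k here are left real-valued, before any integer rounding):

```python
import numpy as np

def shannon_allocation(variances, p, tol=1e-10):
    """Find theta so that mean(max(0, 0.5*log2(var_k/theta))) hits the
    target rate p bits per pixel; returns the real-valued m_k."""
    variances = np.asarray(variances, dtype=float)

    def rate(theta):
        return np.maximum(0.0, 0.5 * np.log2(variances / theta)).mean()

    lo, hi = 1e-12, variances.max()   # rate(hi) = 0, rate(lo) very large
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate(mid) > p:
            lo = mid                  # too many bits: raise theta
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return np.maximum(0.0, 0.5 * np.log2(variances / theta))

m = shannon_allocation([16.0, 4.0, 1.0, 0.25], p=1.5)
print(m.round(2), m.mean())   # high-variance coefficients get more bits
```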
ii. Segall Allocation Strategy

m_k(θ) = (1/1.78) log2(1.46 σ_k²/θ)   for 0 < θ ≤ 0.083 σ_k²
m_k(θ) = (1/1.57) log2(σ_k²/θ)        for 0.083 σ_k² < θ ≤ σ_k²
m_k(θ) = 0                            for θ > σ_k²

where θ solves Σ_{k=1}^{N} m_k(θ) = M.
iii. Huang/Schultheiss Allocation Strategy
This bit allocation approximates the optimal non-uniform allocation for Gaussian coefficients, giving

m̂_k = M/N + (1/2) log2 σ_k² − (1/2N) Σ_{i=1}^{N} log2 σ_i²

m_k = Int[m̂_k],   with M = Σ_{k=1}^{N} m_k fixed.
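A sketch of the real-valued allocation, assuming the standard Huang/Schultheiss form for Gaussian coefficients, i.e. m̂_k = M/N + ½ log2 σ_k² − (1/2N) Σ_i log2 σ_i² (the variances below are made up):

```python
import numpy as np

def huang_schultheiss(variances, M):
    """Real-valued H-S allocation; the m_hat_k sum to M exactly,
    before the Int[.] rounding step."""
    v = np.asarray(variances, dtype=float)
    log_v = np.log2(v)
    return M / len(v) + 0.5 * log_v - 0.5 * log_v.mean()

m_hat = huang_schultheiss([16.0, 4.0, 1.0, 0.25], M=6)
print(m_hat, m_hat.sum())   # [3. 2. 1. 0.] with the sum fixed at M = 6
```

After rounding, the m_k may need a final adjustment so that their sum still equals M.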
Figs. 10 and 11 show the reconstructed images of Lena and Barbara using the Shannon (SNR_Lena = 20.55 dB, SNR_Barb = 17.24 dB) and Segall (SNR_Lena = 21.23 dB, SNR_Barb = 16.90 dB) bit allocation methods for an average of p = 1.5 bpp, together with the corresponding error images.
Figure 10: Reconstructed & Error Images-Shannon’s (1.5bpp).
Figure 11: Reconstructed & Error Images-Segall’s (1.5bpp).