Chapter 8
Image Compression
What is Image Compression?
• Reduction of the amount of data required to represent a digital image → removal of redundant data
• Transforming a 2-D pixel array into a statistically uncorrelated data set
Why Compression?
• Important in data storage and data transmission
• Examples:
– Progressive transmission of images (Internet)
– Video coding
– Digital libraries and image databases
– Remote sensing
– Medical imaging
Lossless vs. Lossy Compression
• Compression techniques
- Lossless (or information-preserving) compression:
Images can be compressed and restored without
any loss of information (e.g., medical imaging,
satellite imaging)
- Lossy compression:
Perfect recovery is not possible but provides a large data compression (e.g., TV signals)
Data and Information
• Data are the means by which information is conveyed
• Various amounts of data may be used to represent the same amount of information
• Data redundancy: if n1 and n2 denote the number of information-carrying units in two data sets that represent the same information, the relative data redundancy of the first data set is
R = 1 - 1/C, where C = n1/n2 is the compression ratio
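These two definitions can be sketched directly in code; the bit counts below are illustrative values, not from the slides:

```python
# Relative data redundancy: R = 1 - 1/C, where C = n1/n2
def compression_ratio(n1, n2):
    """Compression ratio of a representation using n2 units vs. n1 units."""
    return n1 / n2

def relative_redundancy(n1, n2):
    """Fraction of the first data set that is redundant."""
    return 1 - 1 / compression_ratio(n1, n2)

# Hypothetical example: 8 bits/pixel reduced to 1.81 bits/pixel
C = compression_ratio(8, 1.81)
R = relative_redundancy(8, 1.81)
print(round(C, 2), round(R, 3))  # 4.42 0.774
```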
Redundancy
• In digital image compression there exist three basic data redundancies:
1. Coding redundancy
2. Spatial and temporal redundancy
3. Irrelevant redundancy
Coding Redundancy
• Let r_k be a discrete random variable representing the gray levels in an image
• Its probability is represented by p_r(r_k)
• Let l(r_k) be the total number of bits used to represent each value of r_k
• The average number of bits required to represent each pixel is
L_avg = Σ_k l(r_k) p_r(r_k)
Example: Variable-Length Coding
Example: Rationale behind Variable-Length Coding
L_avg = 1.81 bits
C = 8/1.81 = 4.42
R = 1 - (1/4.42) = 0.774
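A small sketch reproducing the quoted numbers; the probability and code-length tables here are assumed for illustration (the actual table is in the referenced figure):

```python
# Hypothetical gray-level probabilities and variable-length code lengths
# chosen to reproduce the slide's L_avg = 1.81 bits
probs   = [0.25, 0.47, 0.25, 0.03]   # p_r(r_k), assumed values
lengths = [2, 1, 3, 3]               # l(r_k) for a variable-length code

l_avg = sum(l * p for l, p in zip(lengths, probs))
C = 8 / l_avg          # vs. a fixed 8-bit natural code
R = 1 - 1 / C
print(round(l_avg, 2))  # 1.81
print(round(C, 2))      # 4.42
print(round(R, 3))      # 0.774
```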
Spatial and Temporal Redundancy
• Any pixel can be predicted from the values of its neighbors
• The information carried by individual pixels is relatively small
• Spatial redundancy / interframe redundancy
Interpixel Redundancy
• Mapping
– To reduce spatial redundancy, the image must be transformed into a more efficient format
– Reversible mapping vs. irreversible mapping
– Reversible mapping: original elements can be reconstructed from the transformed set
Example: Run-Length Coding
• Run-length pairs
• Each pair consists of:
– An intensity value
– The number of pixels that have this intensity value
Psychovisual (Irrelevant) Redundancy
• The eye does not respond with equal sensitivity to all visual information
• Certain information has less relative importance than other information in normal visual processing (psychovisually redundant)
• It can be eliminated without significantly impairing the quality of image perception
Psychovisual redundancy
Objective Fidelity Criteria
Signal-to-Noise Ratio (SNR)
Mean-square signal-to-noise ratio of the output image:
SNR_ms = [Σ_x Σ_y f̂(x,y)²] / [Σ_x Σ_y (f̂(x,y) - f(x,y))²]
Subjective Fidelity Criteria
• Example of an absolute comparison scale:
Measuring Image Information
• A discrete source of information generates one of N possible symbols from a source alphabet set A = {a_0, a_1, ..., a_{N-1}} in unit time.
• Example: A = {a, b, c, ..., z}, {0, 1}, {0, 1, 2, ..., 255}
• The source output can be modeled as a discrete random variable E, which can take values in the set A = {a_0, a_1, ..., a_{N-1}}, with corresponding probabilities {p_0, p_1, ..., p_{N-1}}
• We will denote the symbol probabilities by the vector
z = [P(a_0), P(a_1), ..., P(a_{N-1})]^T = [p_0, p_1, ..., p_{N-1}]^T
• Naturally, Σ_{i=0}^{N-1} p_i = 1
• The information source is characterized by the pair (A, z).
Measuring Information
• Observing an occurrence (or realization) of the random variable E results in some gain of information denoted by I(E). This gain of information was defined by Shannon to be
I(E) = -log P(E)
• The base of the logarithm depends on the units for measuring information. Usually, we use base 2, which gives the information in units of “binary digits” or “bits.”
• The entropy H(z) of a source is the average information per source symbol:
H(z) = -Σ_{i=0}^{N-1} p_i log p_i
Measuring Information
• The higher the source entropy, the higher the information associated with the source.
• For a fixed number of source symbols, the entropy is maximized if all the symbols are equally likely
• H = 1.6614 for image in figure 8.1
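Both claims above can be checked numerically; a minimal sketch with illustrative distributions:

```python
import math

def entropy(p):
    """Shannon entropy in bits: H = -sum p_i log2 p_i (0 log 0 taken as 0)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

uniform = [0.25] * 4             # four equally likely symbols
skewed  = [0.7, 0.1, 0.1, 0.1]   # same symbol count, unequal probabilities

print(entropy(uniform))                      # 2.0 = log2(4), the maximum
print(entropy(uniform) > entropy(skewed))    # True
```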
Measuring Information
Image Compression Models
Source Encoder
• The encoder is responsible for reducing or eliminating any coding, interpixel, or psychovisual redundancy.
• The first block “Mapper” transforms the input data into a (usually nonvisual) format, designed to reduce interpixel (spatial) redundancy. This block is reversible and may or may not reduce the amount of data.
Example: run-length encoding, image transform.
Source Encoder
• The Quantizer reduces accuracy of the mapper output in accordance with some fidelity criterion.
• This block reduces psychovisual redundancy and is usually not invertible.
• The Symbol Encoder creates a fixed- or variable-length codeword to represent the quantizer output and maps the output in accordance with this code. This block is reversible and reduces coding redundancy.
Source Decoder
• The decoder blocks are inverse operations of the corresponding encoder blocks (except the quantizer block, which is not invertible).
Huffman Coding
• If the symbols of an information source are coded individually, Huffman coding yields the smallest possible number of code symbols per source symbol
• Method: create a series of source reductions by ordering the probabilities of the symbols under consideration and combining the lowest-probability symbols into a single symbol that replaces them in the next source reduction
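The source-reduction procedure above maps naturally onto a min-heap; a sketch (the six symbol names and probabilities below are illustrative, and `huffman_code` is my own helper name):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability}: repeatedly merge
    the two least probable entries until a single source remains."""
    tie = count()  # unique tiebreaker so heapq never compares the dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # lowest probability
        p2, _, c2 = heapq.heappop(heap)   # second lowest
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

probs = {"a2": 0.4, "a6": 0.3, "a1": 0.1, "a4": 0.1, "a3": 0.06, "a5": 0.04}
code = huffman_code(probs)
l_avg = sum(probs[s] * len(w) for s, w in code.items())
print(round(l_avg, 2))  # 2.2 bits/symbol for this distribution
```

Tie-breaking may change individual codeword lengths, but any Huffman code for a given distribution has the same (minimal) average length.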
Huffman Coding: Source Reduction
Huffman Coding: Properties
• The resulting code is called a Huffman code. It has some interesting properties:
(1) The source symbols can be encoded (and decoded) one at a time (with a lookup table).
(2) It is called a block code because each source symbol is
mapped into a fixed sequence of code symbols.
(3) It is instantaneous because each codeword in a string of
code symbols can be decoded without referencing succeeding symbols.
(4) It is uniquely decodable because any string of code symbols can be decoded in only one way.
Huffman Coding: Properties
Disadvantage: For a source with J symbols, we need J - 2 source reductions. This can be computationally intensive for large J (e.g., J = 256 for an image with 256 gray levels).
Huffman Coding: Properties
Huffman coded with 7.428 bits/pixel
H = 7.3838 bits/pixel
C = 8/7.428 = 1.077
R = 1 - (1/1.077) = 0.0715
7.15% coding redundancy
Lempel-Ziv-Welch Coding
• LZW coding is also an error-free compression technique
• Uses a dictionary
• The dictionary is adaptive to the data
• The decoder constructs the matching dictionary based on the codewords received
• The LZW encoder sequentially examines the image’s pixels; gray-level sequences that are not in the dictionary are placed in algorithmically determined locations
LZW Coding: Example
• Consider an example: a 4 × 4, 8-bit image
• The dictionary values 0–255 correspond to the pixel values 0–255. Assume a 512-word dictionary.
LZW Coding: Example
• The image is encoded by processing its pixels in a left-to-right, top-to-bottom manner.
LZW Decoding
Just like the compression algorithm, the decoder adds a new string to the string table each time it reads in a new code. All it needs to do in addition is translate each incoming code into a string and send it to the output.
LZW Decoding: Example
LZW Decoding
• It needs to be able to take the stream of codes output from the compression algorithm, and use them to exactly recreate the input stream. One reason for the efficiency of the LZW algorithm is that it does not need to pass the sequence dictionary to the decompression code.
• The dictionary can be built exactly as it was during compression, using the input stream as data.
• Original image: 128 bits, reduced to 90 bits
• Disadvantage: handling dictionary (table) overflow
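A compact sketch of the LZW encoder and decoder on a 4 × 4 image assumed to be two columns of 39s followed by two columns of 126s (a pattern consistent with the slides' numbers: ten 9-bit codes give the 90 bits quoted):

```python
def lzw_encode(data):
    """Dictionary starts with single symbols 0-255; each unseen
    sequence is assigned the next free code."""
    dictionary = {(i,): i for i in range(256)}
    next_code, out, s = 256, [], ()
    for pixel in data:
        if s + (pixel,) in dictionary:
            s = s + (pixel,)               # extend the current sequence
        else:
            out.append(dictionary[s])      # emit code for longest match
            dictionary[s + (pixel,)] = next_code
            next_code += 1
            s = (pixel,)
    if s:
        out.append(dictionary[s])
    return out

def lzw_decode(codes):
    """Rebuild the identical dictionary from the code stream alone."""
    dictionary = {i: (i,) for i in range(256)}
    next_code = 256
    prev = dictionary[codes[0]]
    out = list(prev)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                              # "code not yet defined" case
            entry = prev + (prev[0],)
        out.extend(entry)
        dictionary[next_code] = prev + (entry[0],)
        next_code += 1
        prev = entry
    return out

img = [39, 39, 126, 126] * 4               # 16 pixels, 128 bits at 8 bpp
codes = lzw_encode(img)
print(len(codes) * 9)                      # 90 bits for ten 9-bit codes
assert lzw_decode(codes) == img            # lossless round trip
```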
Run-Length Coding
• Images with repeating intensities along their rows (or columns) can often be compressed by representing the runs of identical intensities as run-length pairs.
• Each run-length pair specifies the start of new intensity, and the number of consecutive pixels that have that intensity.
• Removes spatial redundancy
• Very good for binary images
Run-Length Coding: Approaches
(1) Start positions and lengths of runs of 1s for each row are used:
Row 1: (1,3) (7,2) (12,4) (17,2) (20,3)
Row 2: (5,13) (19,4)
Row 3: (1,3) (17,6)
(2) Only the lengths of runs, starting with the length of the run of 1s, are used:
Row 1: 3,3,2,3,4,1,2,1,3
Row 2: 0,4,13,1,4
Row 3: 3,13,6
Run-Length Coding
• This technique is very effective in encoding binary images with large contiguous black and white regions, which give rise to a small number of long runs of 1s and 0s.
• The run lengths can in turn be encoded using a variable-length code (e.g., a Huffman code) for further compression.
Run-Length Coding
• Let a_k be the fraction of runs of 0s with length k. Naturally, (a_1, a_2, ..., a_M) represents a vector of probabilities (the probability of a run of 0s being of length k).
• Let H_0 = -Σ_{k=1}^{M} a_k log a_k be the entropy associated with (a_1, a_2, ..., a_M), and
L_0 = Σ_{k=1}^{M} k a_k be the average length of runs of 0s.
Run-Length Coding
• Let b_k be the fraction of runs of 1s with length k. Naturally, (b_1, b_2, ..., b_M) represents a vector of probabilities (the probability of a run of 1s being of length k).
• Let H_1 = -Σ_{k=1}^{M} b_k log b_k be the entropy associated with (b_1, b_2, ..., b_M), and
L_1 = Σ_{k=1}^{M} k b_k be the average length of runs of 1s.
• The approximate run-length entropy of the image is
H_RL = (H_0 + H_1) / (L_0 + L_1)
• H_RL provides an estimate of the average number of bits per pixel required to code the run lengths in a binary image, using a variable-length code.
Bit-Plane Coding
• A grayscale image is decomposed into a series of binary images and each binary image is compressed by some binary compression method.
• This removes coding and interpixel redundancy.
• Given a grayscale image with 2^m gray levels, each gray value can be represented by m bits, say (a_{m-1}, a_{m-2}, ..., a_1, a_0).
• The gray value r represented by (a_{m-1}, a_{m-2}, ..., a_1, a_0) is given by the base-2 polynomial
r = a_{m-1} 2^{m-1} + a_{m-2} 2^{m-2} + ... + a_1 2^1 + a_0 2^0
• This bit representation can be used to decompose the grayscale image into m binary images (bit planes).
Gray Code
Alternatively, one can use the m-bit Gray code (g_{m-1}, g_{m-2}, ..., g_1, g_0) to represent a given gray value.
The Gray code can be obtained from (a_{m-1}, ..., a_0) by the following relationship:
g_i = a_i XOR a_{i+1}, 0 ≤ i ≤ m - 2
g_{m-1} = a_{m-1}
The Gray codes of successive gray levels differ in only one position:
127 → 01111111 (binary code representation), 01000000 (Gray)
128 → 10000000 (binary code representation), 11000000 (Gray)
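The bitwise relationship above reduces to a one-line shift-and-XOR; a minimal sketch (function names are mine):

```python
def binary_to_gray(value):
    """g_i = a_i XOR a_{i+1} for all bits at once: g = b ^ (b >> 1)."""
    return value ^ (value >> 1)

def gray_to_binary(g):
    """Invert by accumulating XORs from the most significant bit down."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

print(format(binary_to_gray(127), "08b"))  # 01000000
print(format(binary_to_gray(128), "08b"))  # 11000000
assert gray_to_binary(binary_to_gray(200)) == 200  # round trip
```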
Bit-Plane Coding
Block Transform Coding
• Image transforms are able to concentrate the image energy in a few transform coefficients (energy packing)
• A large number of coefficients can be discarded
• Transform coding techniques operate on the coefficients of a reversible linear transform of the image (e.g., DCT, DFT, DWT)
Image Compression: Transform Coding
• The input N × N image is subdivided into subimages of size n × n
• The n × n subimages are converted into transform arrays. This tends to pack as much information as possible into the smallest number of coefficients.
Image Compression: Transform Coding
• Quantizer selectively eliminates or coarsely quantizes the coefficients with least information.
• Symbol encoder uses a variable-length code to encode the quantized coefficients.
• Any of the above steps can be adapted to each subimage (adaptive transform coding), based on local image information, or fixed for all subimages.
Discrete Cosine Transform (DCT)
• Given a two-dimensional N × N image f(x, y), its discrete cosine transform (DCT) C(u, v) is defined as
C(u, v) = α(u) α(v) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x, y) cos[(2x+1)uπ / 2N] cos[(2y+1)vπ / 2N]
where α(0) = √(1/N) and α(u) = √(2/N) for u ≠ 0.
• Similarly, the inverse discrete cosine transform (IDCT) is given by
f(x, y) = Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} α(u) α(v) C(u, v) cos[(2x+1)uπ / 2N] cos[(2y+1)vπ / 2N]
Discrete Cosine Transform (DCT)
(1) Separable (the 2-D transform can be performed as two passes of the 1-D transform).
(2) Symmetric (the operations on the variables x and y are identical).
(3) Forward and inverse transforms are identical in form.
• The DCT is the most popular transform for image compression algorithms like JPEG (still images), MPEG (motion pictures).
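A direct, unoptimized sketch of the orthonormal DCT, using the separability property: a 1-D DCT over every row, then over every column (pure Python, no external libraries):

```python
import math

def dct_1d(v):
    """Orthonormal 1-D DCT-II: alpha(0)=sqrt(1/N), alpha(u)=sqrt(2/N)."""
    N = len(v)
    out = []
    for u in range(N):
        a = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
        out.append(a * sum(v[x] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                           for x in range(N)))
    return out

def dct_2d(block):
    """Separability: 1-D DCT on each row, then on each column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

# Energy packing: a constant 8x8 block puts all its energy in the DC term
block = [[100.0] * 8 for _ in range(8)]
C = dct_2d(block)
print(round(C[0][0]))  # 800 (= 8 * 100 for the orthonormal transform)
print(all(abs(C[u][v]) < 1e-9
          for u in range(8) for v in range(8) if (u, v) != (0, 0)))  # True
```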
Transform Selection
• Commonly used ones are Karhunen-Loeve (Hotelling) transform (KLT), discrete cosine transform (DCT), discrete Fourier transform (DFT), discrete Wavelet transform (DWT), Walsh-Hadamard transform (WHT).
• Choice depends on the computational resources available and the reconstruction error that can be tolerated.
Transform Selection
Subimage Size Selection
• Images are subdivided into subimages of size n × n (in the figure, 75% of the coefficients were truncated).
• Usually n = 2^k for some integer k. This simplifies the computation of the transforms (e.g., the FFT algorithm).
• Typical block sizes used in practice are 8 × 8 and 16 × 16.
Subimage Size Selection
Bit Allocation
• After transforming each subimage, only a fraction of the coefficients are retained. This can be done in two ways:
(1) Zonal coding: Transform coefficients with large variance are retained. Same set of coefficients retained in all subimages.
(2) Threshold coding: Transform coefficients with large magnitude in each subimage are retained. Different set of coefficients retained in different subimages.
• The retained coefficients are quantized and then encoded.
• The overall process of truncating, quantizing, and coding the transformed coefficients of the subimage is called bit-allocation.
Zonal Coding
• Transform coefficients with large variance carry most of the information about the image. Hence a fraction of the coefficients with the largest variance is retained
Threshold Coding
• In each subimage, the transform coefficients of largest magnitude contribute most significantly and are therefore retained.
• For each subimage
(1) Arrange the transform coefficients in decreasing order of magnitude.
(2) Keep only the top X% of the coefficients and discard the rest.
(3) Encode the retained coefficients using a variable-length code.
Threshold Mask & Zigzag Ordering
•Example
Normalization Matrix
T̂(u,v) = round[T(u,v)/Z(u,v)]
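The normalization step can be sketched as follows; Z below is the standard JPEG luminance quantization table, while the input DCT block T is hypothetical (a large DC term with small AC terms):

```python
# Standard JPEG luminance quantization matrix Z(u,v)
Z = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(T):
    """T_hat(u,v) = round(T(u,v) / Z(u,v)); small coefficients become 0."""
    return [[round(T[u][v] / Z[u][v]) for v in range(8)] for u in range(8)]

def denormalize(T_hat):
    """Decoder side: multiply back. The rounding error is the lossy step."""
    return [[T_hat[u][v] * Z[u][v] for v in range(8)] for u in range(8)]

# Hypothetical DCT block: DC = 415, every AC coefficient = 3
T = [[415 if (u, v) == (0, 0) else 3 for v in range(8)] for u in range(8)]
T_hat = quantize(T)
print(T_hat[0][0])                                 # 26 (= round(415/16))
print(sum(x == 0 for row in T_hat for x in row))   # 63: all ACs vanish
```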
Normalization Matrix
JPEG (Joint Photographic Experts Group)
• Is a compression standard for still images
• It defines three different coding systems:
1. A lossy baseline coding system based on DCT (adequate for most compression applications)
2. An extended coding system for greater compression, higher precision, or progressive reconstruction applications
3. A lossless independent coding system for reversible compression
JPEG – Baseline Coding System
1. Computing the DCT: the image is divided into 8 × 8 blocks; each pixel is level shifted by subtracting the quantity 2^(k-1), where 2^k is the maximum number of gray levels in the image. Then, the DCT of each block is computed.
2. Quantization: the DCT coefficients are thresholded and coded using a quantization matrix, then reordered using zigzag scanning to form a 1-D sequence.
3. Coding : The non-zero AC coefficients are Huffman coded. The DC coefficients of each block are coded relative to the DC coefficient of the previous block
The JPEG Standard: Baseline
• (1) Consider the 8x8 image (s)
• (2) Level shifted (s-128)
• (3) 2D-DCT
• (4) After dividing by quantization matrix qmat
• (5) Zigzag scan as in threshold coding.
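The zigzag ordering used in step (5) can be generated programmatically; a small sketch assuming an 8 × 8 block and 0-based indices:

```python
def zigzag_order(n=8):
    """Zigzag scan: walk the anti-diagonals (constant i+j), alternating
    direction. Even diagonals run with j increasing, odd with i increasing."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    return sorted(cells, key=lambda c: (c[0] + c[1],
                                        c[0] if (c[0] + c[1]) % 2 else c[1]))

order = zigzag_order()
flat = [i * 8 + j for i, j in order]    # row-major indices in scan order
print(flat[:10])  # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```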
Example: Implementing the JPEG Baseline Coding System
Example: Level Shifting
Example: Computing the DCT
Example: The Quantization Matrix
Example: Quantization
Zig-Zag Scanning of the Coefficients
JPEG
Example: Coding the Coefficients
• The DC coefficient is coded as the difference between the DC coefficients of the current and previous blocks
• The AC coefficients are mapped to run-length pairs
– (0,-26) (0,-31) … (5,-1) (0,-1) EOB
• These are then Huffman coded (codes are specified in the JPEG scheme)
Example: Decoding the Coefficients
Example: Denormalization
Example: IDCT
Example: Shifting Back the Coefficients
Example
Predictive Coding
• Does not require decomposition of the grayscale image into bit planes.
• Eliminates interpixel redundancy by extracting and coding only the new information in each pixel.
• The new information in a pixel is the difference between its actual value and its predicted value (based on previous pixel values).
Lossless Predictive Coding
• Generates an estimate of the value of a given pixel based on the values
– of some past input pixels
or
– of some neighboring pixels (spatial prediction)
Example: linear prediction,
f̂(n) = round[ Σ_{i=1}^{m} α_i f(n-i) ]
where m is the order of the predictor
Example: 1-D linear predictive coding
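A minimal sketch of lossless 1-D predictive coding under these definitions, using a first-order predictor (m = 1) and an illustrative pixel row:

```python
def predict_encode(pixels, alpha=1.0):
    """First-order predictor f_hat(n) = round(alpha * f(n-1));
    transmit the error e(n) = f(n) - f_hat(n). First pixel sent as-is."""
    errors = [pixels[0]]
    for n in range(1, len(pixels)):
        f_hat = round(alpha * pixels[n - 1])
        errors.append(pixels[n] - f_hat)
    return errors

def predict_decode(errors, alpha=1.0):
    """Decoder forms the same prediction and adds the received error."""
    pixels = [errors[0]]
    for n in range(1, len(errors)):
        pixels.append(errors[n] + round(alpha * pixels[n - 1]))
    return pixels

row = [100, 102, 103, 103, 104, 110, 110, 109]
e = predict_encode(row)
print(e)                          # [100, 2, 1, 0, 1, 6, 0, -1]
assert predict_decode(e) == row   # lossless reconstruction
```

The errors cluster near zero, so they can be entropy coded with far fewer bits than the raw pixel values.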
Lossless Predictive Coding
Lossy Predictive Coding
Example: Delta Modulation
Digital Image Watermarking
• Process of inserting data into an image in such a way that it can be used to make an assertion about the image.
• Used in:
– Copyright identification
– User identification
Digital Image Watermarking
• Watermarks can be either visible or invisible.
• A visible watermark is a semi-transparent sub-image or image that is placed on top of another image (the watermarked image).
Digital Image Watermarking
• Visible watermark example:
f_w = (1 - α)f + αw
• α controls the relative visibility of the watermark and the underlying image
Digital Image Watermarking
• Invisible watermark example:
– f_w = 4(f/4) + w/64 (integer divisions: the two least significant bits of f are replaced by the two most significant bits of w)
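Both formulas can be sketched on flat pixel lists; the pixel values below are illustrative, and α is arbitrary:

```python
def visible_watermark(f, w, alpha=0.3):
    """f_w = (1 - alpha)*f + alpha*w, per pixel."""
    return [(1 - alpha) * fp + alpha * wp for fp, wp in zip(f, w)]

def invisible_watermark(f, w):
    """f_w = 4*(f // 4) + w // 64: the two LSBs of the image carry
    the two MSBs of the watermark (integer division throughout)."""
    return [4 * (fp // 4) + (wp // 64) for fp, wp in zip(f, w)]

f = [200, 201, 57, 13]   # hypothetical 8-bit pixel values
w = [255, 0, 128, 64]    # hypothetical watermark values
fw = invisible_watermark(f, w)
print(fw)                                    # [203, 200, 58, 13]
print([abs(a - b) for a, b in zip(f, fw)])   # each pixel moves at most 3
```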
Digital Image Watermarking
• Robust invisible watermarks are designed to survive image modification (attack).
• Attacks: compression, filtering, rotation, cropping, ….
Digital Image Watermarking
• Mark insertion:
1. Compute the 2-D DCT of the image to be watermarked
2. Locate the K largest coefficients c_1, c_2, ..., c_K by magnitude
3. Create a watermark by generating a K-element pseudo-random sequence ω_1, ω_2, ..., ω_K drawn from a Gaussian distribution with mean 0 and variance 1
Digital Image Watermarking
4. Embed the watermark from step 3 into the K largest coefficients from step 2 using the equation:
c′_i = c_i (1 + α ω_i),  1 ≤ i ≤ K
5. Compute the inverse DCT of the result obtained from step 4.
Digital Image Watermarking
• Advantages:
– No obvious structure
– Embedded in multiple frequency components
– Attacks against them tend to degrade the image itself
Digital Image Watermarking
• Watermark extraction:
– Compute the 2-D DCT of the watermarked image
– Extract the K largest DCT coefficients ĉ_1, ĉ_2, ..., ĉ_K
– Compute the extracted watermark ω̂ using ω̂_i = (ĉ_i - c_i) / (α c_i)
– Measure the similarity between ω̂ and ω:
γ = (ω̂ · ω) / √(ω̂ · ω̂)
Digital Image Watermarking
• D = 1 if γ ≥ T, 0 otherwise
• D = 1 indicates that the watermark is present; D = 0 indicates that it is not.
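The insertion and detection steps can be sketched end to end in the coefficient domain; the 2-D DCT/IDCT steps are omitted for brevity, and the coefficient list stands in for the DCT of a hypothetical image:

```python
import math
import random

def embed(coeffs, alpha=0.1, K=4, seed=1):
    """Steps 2-4 of the insertion procedure (DCT itself omitted)."""
    rng = random.Random(seed)
    omega = [rng.gauss(0, 1) for _ in range(K)]                # step 3
    idx = sorted(range(len(coeffs)),
                 key=lambda i: -abs(coeffs[i]))[:K]            # step 2
    marked = list(coeffs)
    for i, wi in zip(idx, omega):
        marked[i] = coeffs[i] * (1 + alpha * wi)               # step 4
    return marked, omega, idx

def gamma(coeffs, marked, omega, idx, alpha=0.1):
    """Extraction: omega_hat_i = (c_hat_i - c_i) / (alpha c_i),
    then the correlation gamma = (omega_hat . omega) / ||omega_hat||."""
    w_hat = [(marked[i] - coeffs[i]) / (alpha * coeffs[i]) for i in idx]
    return (sum(a * b for a, b in zip(w_hat, omega))
            / math.sqrt(sum(a * a for a in w_hat)))

coeffs = [310.0, -42.0, 150.0, 7.0, -88.0, 3.0, 25.0, -1.0]  # hypothetical
marked, omega, idx = embed(coeffs)
g_true = gamma(coeffs, marked, omega, idx)
g_wrong = gamma(coeffs, marked, [-w for w in omega], idx)    # wrong mark
print(g_true > g_wrong)  # True: the embedded watermark correlates best
```

With no attack, ω̂ recovers ω exactly, so γ equals the norm of ω; a detector would compare γ against a threshold T chosen from the false-alarm rate desired.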
Digital Image Watermarking