Principles of Data Compression: Theory and Applications

Dr. Daniel Leon-Salas

Tutorial - Intercon 2014


Page 1: Tutorial - Intercon 2014

Principles of Data Compression: Theory and Applications

Dr. Daniel Leon-Salas

Page 2: Tutorial - Intercon 2014

Motivation

The Information Revolution


Page 3: Tutorial - Intercon 2014

Motivation

• Consider a 3-minute song: assuming two channels, 16-bit resolution, and a sampling rate of 48 kHz, it will take 33 MB of disk space to store the song.

• Consider a 5-megapixel camera: assuming an 8-bit resolution per pixel, it will take 5 MB of disk space to store one picture.

• One second of video using the CCIR 601 format (720×485) needs more than 30 megabytes of storage space.
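A quick back-of-the-envelope check of these figures (a minimal Python sketch; the 30 frames/s rate and the 24 bits per pixel assumed for CCIR 601 video are not stated above):

```python
MiB = 2**20  # sizes below are reported in binary megabytes

# 3-minute song: two channels, 16 bits/sample, 48 kHz
song_bytes = 2 * (16 // 8) * 48_000 * 3 * 60
print(f"song:  {song_bytes / MiB:.1f} MiB")   # ~33 MiB

# 5-megapixel picture, 8 bits/pixel
image_bytes = 5_000_000 * 1
print(f"image: {image_bytes / MiB:.1f} MiB")  # ~4.8 MiB (5 MB in decimal units)

# 1 second of CCIR 601 video (720x485), assuming 24 bits/pixel and 30 frames/s
video_bytes = 720 * 485 * 3 * 30
print(f"video: {video_bytes / MiB:.1f} MiB")  # ~30 MiB
```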


Page 4: Tutorial - Intercon 2014

Introduction

• If data generation is growing at an explosive rate, why not focus on improving transmission and storage technologies?

• Transmission and storage technologies are improving but not at the same rate as data is generated.

• This is especially true for wireless communications where the radio spectrum is limited.


Page 5: Tutorial - Intercon 2014

Introduction

• Data compression is the art or science of representing information in a compact form.

• Data compression is performed by identifying and exploiting structure and redundancies in the data.

• Data can be samples of audio, images, or text files; it can also be generated by sensors or scientific instruments, social networks, markets, etc.


Page 6: Tutorial - Intercon 2014

Introduction

• Consider Morse code, developed in the 19th century, in which letters are encoded with dots and dashes.

Some letters (e and a) occur more often than others (q and j).

Letters that occur more frequently are encoded using shorter sequences: e → .   a → .-

Letters that occur less frequently are encoded using longer sequences: q → --.-   j → .---

• In this case the statistical structure of the data was exploited.
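As a small illustration, the sketch below encodes a string with the four Morse codewords quoted above (a toy table for illustration only; real Morse code covers the full alphabet):

```python
# Variable-length coding in miniature: frequent letters get short codewords.
MORSE = {"e": ".", "a": ".-", "q": "--.-", "j": ".---"}

def encode(text: str) -> str:
    """Encode text with the (partial) Morse table, separating letters with spaces."""
    return " ".join(MORSE[ch] for ch in text)

print(encode("aqea"))   # .- --.- . .-
```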


Page 7: Tutorial - Intercon 2014

Introduction

• There are many other types of structure in data that can be exploited to achieve compression.

• In speech, the physical structure of our vocal tract determines the kinds of sounds that we can produce; instead of sending speech samples, we can send information about the state of the vocal tract to the receiver.

• We can also exploit characteristics of the end user of the data.


Page 8: Tutorial - Intercon 2014

Introduction

• In many cases, when transmitting images or audio, the end user is a human.

• Humans have limited hearing and vision abilities.

• We can exploit the limitations of human perception to discard irrelevant information and obtain higher compression.


Page 9: Tutorial - Intercon 2014

Compression and Reconstruction


[Figure: the original data is passed through a compression algorithm; a reconstruction (decompression) step produces the reconstructed data.]

Page 10: Tutorial - Intercon 2014

Lossless Compression


• Lossless compression involves no loss of information.

• The recovered data is an exact copy of the original.

• Useful in applications that cannot tolerate any difference:

medical images, scientific data, financial records, computer programs

Page 11: Tutorial - Intercon 2014

Lossy Compression


• In lossy compression some loss of information is tolerated.

• The original data cannot be recovered exactly, but higher compression ratios can be achieved.

• Useful in applications where some loss of information is not critical:

speech coding, telephone communications, video coding, digital photography

Page 12: Tutorial - Intercon 2014

Compression Performance


• Compression ratio (CR):

CR = (number of bits required to represent the data without compression) / (number of bits required to represent the data with compression)

• Distortion (for lossy compression):

\mathrm{MSE} = \frac{1}{N} \lVert X - \hat{X} \rVert_2^2

\mathrm{PSNR}\ \mathrm{(dB)} = 10 \log_{10} \frac{X_{\max}^2}{\mathrm{MSE}}

• Rate: average number of bits per sample or symbol
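A minimal sketch of these three measures (NumPy-based; the peak value X_max = 255 used in the example is an assumption for 8-bit data, not something stated above):

```python
import numpy as np

def compression_ratio(bits_original: int, bits_compressed: int) -> float:
    """CR = bits without compression / bits with compression."""
    return bits_original / bits_compressed

def mse(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Mean squared error between the original X and the reconstruction X_hat."""
    return float(np.mean((x.astype(float) - x_hat.astype(float)) ** 2))

def psnr(x: np.ndarray, x_hat: np.ndarray, x_max: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB, with x_max the peak value of the signal."""
    return 10.0 * np.log10(x_max**2 / mse(x, x_hat))

# Example: a tiny 8-bit image reconstructed with small errors
x = np.array([[52, 55], [61, 59]], dtype=np.uint8)
x_hat = np.array([[50, 56], [60, 58]], dtype=np.uint8)
print(compression_ratio(65536, 16384))   # 4.0
print(mse(x, x_hat), psnr(x, x_hat))     # 1.75, ~45.7 dB
```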

Page 13: Tutorial - Intercon 2014

Example 1


Let’s consider the following input sequence:

𝑋 = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

To encode this sequence using plain binary code, we would need to use 5 bits per number and a total of 60 bits.

K. Sayood, Introduction to Data Compression, 2nd edition, Morgan Kaufmann

Page 14: Tutorial - Intercon 2014

Example 1


If we use the model X̂[n] = n + 8 and compute the residual

e = X − X̂ = [0, 1, 0, −1, 1, −1, 0, 1, −1, −1, 1, 1]

then the residual consists of only three numbers {−1, 0, 1}, which can be encoded using 2 bits per number for a total of 24 bits.
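A quick check of this example (plain Python; the 24-bit total assumes 2 bits per residual value and ignores the cost of sending the model parameters):

```python
X = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

# Model: X_hat[n] = n + 8 with n = 1, 2, ..., 12
X_hat = [n + 8 for n in range(1, len(X) + 1)]
e = [x - xh for x, xh in zip(X, X_hat)]   # prediction residual

print(X_hat)   # [9, 10, 11, ..., 20]
print(e)       # [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]
print(5 * len(X), "bits without the model,", 2 * len(e), "bits for the residual")
```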

Page 15: Tutorial - Intercon 2014

Example 2


• Input sequence: a_barayaran_array_ran_far_faar_faaar_away

• The sequence is made of eight different characters (symbols):

a, b, f, n, r, w, y, _

• Hence, we can use three bits per symbol to encode the sequence resulting in a total of 41×3=123 bits for the entire sequence.

• However, we can use fewer bits if we realize that some symbols occur more frequently than others.

• We can use fewer bits to encode the more frequent symbols.

K. Sayood, Introduction to Data Compression, 2nd edition, Morgan Kaufmann

Page 16: Tutorial - Intercon 2014

Example 2


Using variable-length codes we can encode the sequence using only 97 bits.

Input character | Frequency | Variable-length code | Fixed-length code
a | 16 | 1     | 000
_ |  7 | 001   | 001
b |  1 | 01100 | 010
f |  3 | 0100  | 011
n |  2 | 0111  | 100
r |  6 | 000   | 101
w |  1 | 01101 | 110
y |  3 | 0101  | 111

Input sequence: a_barayaran_array_ran_far_faar_faaar_away

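The 97-bit figure can be reproduced from the table above (plain Python, taking the listed frequencies and codewords as given):

```python
# Frequencies and variable-length codewords from the table above.
table = {
    "a": (16, "1"),   "_": (7, "001"),  "b": (1, "01100"), "f": (3, "0100"),
    "n": (2, "0111"), "r": (6, "000"),  "w": (1, "01101"), "y": (3, "0101"),
}

variable_bits = sum(freq * len(code) for freq, code in table.values())
print(variable_bits)   # 97 bits, compared with 41 * 3 = 123 bits for the fixed-length code
```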

Page 17: Tutorial - Intercon 2014

Statistical Redundancy


• Statistical redundancy was employed in Example 2 to build a code to encode the input sequence.

• When compressing text, statistical redundancy can be exploited not only at the level of characters but also at the level of words and phrases (the dictionary technique).

• Examples of compression solutions that use the dictionary technique include the Lempel-Ziv (LZ) family of algorithms (e.g., LZ77) and formats and tools such as gzip, Zip, PNG, and PKZip.

Page 18: Tutorial - Intercon 2014

Information and Entropy


• Information can be defined as a message that helps to resolve uncertainty.

• In Information Theory information is taken as a sequence of symbols from an alphabet.

• Entropy is a measure of information.

[Figure: a source with alphabet A = {a1, a2, …, an} emits a message, i.e., a sequence of symbols: a1 a2 a3 a6 a8 a5 a3 a4]

First-order entropy of the source:

H(A) = -\sum_{i=1}^{n} P(a_i) \log P(a_i)

Page 19: Tutorial - Intercon 2014

Entropy


• If the base of the logarithm is 2 the units of entropy are bits. If the base is 10 the units are hartleys. If the base is e the units are nats.

• The first-order entropy assumes that the symbols occur independently of each other.

• The entropy is a measure of the average number of bits needed to encode the output of the source.

• Claude Shannon showed that the best rate that a lossless compression algorithm can achieve is equal to the entropy of the source.

• Example: Let’s consider a source with an alphabet consisting of four symbols, a1, a2, a3, a4, with

P(a1) = 1/2, P(a2) = 1/4, P(a3) = 1/8, P(a4) = 1/8

Applying H(A) = -\sum_{i=1}^{n} P(a_i) \log_2 P(a_i):

H = -(1/2 log2(1/2) + 1/4 log2(1/4) + 1/8 log2(1/8) + 1/8 log2(1/8)) = 1.75 bits/symbol.
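The 1.75 bits/symbol figure can be checked with a small helper (a minimal sketch using base-2 logarithms, so the result is in bits):

```python
import math

def entropy(probabilities):
    """First-order entropy H = -sum(p_i * log2(p_i)); terms with p_i = 0 are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([1/2, 1/4, 1/8, 1/8]))   # 1.75
```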

Page 20: Tutorial - Intercon 2014

Coding


• Coding is the process of assigning binary sequences to symbols of an alphabet.

• Example: Let’s consider a source with a four-symbol alphabet such that P(a1) = 1/2, P(a2) = 1/4, P(a3) = 1/8, P(a4) = 1/8, so that H = 1.75 bits/symbol.

Symbol | Probability | Code 1 | Code 2 | Code 3 | Code 4
a1 | 0.5   | 0  | 0  | 0   | 0
a2 | 0.25  | 0  | 1  | 10  | 01
a3 | 0.125 | 1  | 00 | 110 | 011
a4 | 0.125 | 10 | 11 | 111 | 0111
Average length | | 1.125 bits | 1.25 bits | 1.75 bits | 1.875 bits

Only Codes 3 and 4 are uniquely decodable.

Page 21: Tutorial - Intercon 2014

Prefix Codes


Consider two codewords, C1 (k bits long) and C2 (n bits long, with n > k).

IF the first k bits of C2 are identical to C1, then we say that C1 is a prefix of C2, and the remaining n − k bits of C2 are called the dangling suffix.

• If the dangling suffix is itself a codeword, the code is not uniquely decodable.

• A prefix code is a code in which no codeword is a prefix of another codeword.

• Prefix codes are uniquely decodable.
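A small helper that tests the prefix property of a set of codewords (a sketch; testing full unique decodability would require iterating on the dangling suffixes, as in the Sardinas-Patterson procedure):

```python
def is_prefix_code(codewords) -> bool:
    """Return True if no codeword is a prefix of another codeword."""
    for c1 in codewords:
        for c2 in codewords:
            if c1 != c2 and c2.startswith(c1):
                return False
    return True

print(is_prefix_code(["0", "10", "110", "111"]))    # True  (Code 3 from the previous slide)
print(is_prefix_code(["0", "01", "011", "0111"]))   # False (Code 4: "0" is a prefix of "01")
```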

Page 22: Tutorial - Intercon 2014

Huffman Coding


• Huffman coding is an algorithm for building optimum prefix codes.

• It was developed by David Huffman as a class assignment in the first course on information theory, taught by Robert Fano at MIT in 1950.

• Huffman coding assumes that the probabilities of the source are known.

• Huffman coding is based on the following observations about optimum prefix codes:

- Symbols with higher probability have shorter codewords than less probable symbols.

- The two symbols with the lowest probabilities have codewords of the same length (proof by contradiction).

- In a Huffman code, the codewords corresponding to the two symbols with the lowest probabilities differ only in the last bit.

Page 23: Tutorial - Intercon 2014

Huffman Coding


Example: Let’s build a Huffman code for a source with a four-symbol alphabet such that P(a1) = 0.5, P(a2) = 0.25, P(a3) = 0.125, P(a4) = 0.125.

[Figure: steps 1 and 2 of the tree construction. The symbols a1, a2, a3, a4 are listed with probabilities 0.5, 0.25, 0.125, 0.125; the two least probable symbols, a3 and a4, are merged into a node with probability 0.25, and the two branches are labeled 0 and 1.]

Page 24: Tutorial - Intercon 2014

Huffman Coding


[Figure: steps 2 and 3 of the tree construction. The merged node {a3, a4} (probability 0.25) is combined with a2 (probability 0.25) into a node with probability 0.5; again the two branches are labeled 0 and 1.]

Page 25: Tutorial - Intercon 2014

Huffman Coding


[Figure: step 4 of the tree construction. The node with probability 0.5 is combined with a1 (probability 0.5) to form the root with probability 1.0; reading the 0/1 branch labels from the root to each leaf gives the codewords below.]

Symbol | Probability | Codeword
a1 | 0.5   | 0
a2 | 0.25  | 10
a3 | 0.125 | 110
a4 | 0.125 | 111

Average codeword length: lavg = 0.5×1 + 0.25×2 + 0.125×3 + 0.125×3 = 1.75 bits

It can be shown that for Huffman codes:

H(S) ≤ lavg ≤ H(S)+1
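A compact sketch of the construction illustrated above, using a binary heap to repeatedly merge the two least probable nodes (the tie-breaking counter is an implementation detail; relabeling the 0/1 branches could produce a different but equally optimal code):

```python
import heapq

def huffman_code(probabilities: dict) -> dict:
    """Build a Huffman code for {symbol: probability}; returns {symbol: codeword}."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, code1 = heapq.heappop(heap)   # pop the two least probable nodes
        p2, _, code2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in code1.items()}        # prefix 0 for one subtree
        merged.update({s: "1" + c for s, c in code2.items()})  # prefix 1 for the other
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}
code = huffman_code(probs)
print(code)   # {'a1': '0', 'a2': '10', 'a3': '110', 'a4': '111'}
print(sum(p * len(code[s]) for s, p in probs.items()))   # 1.75 bits, equal to H here
```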

Page 26: Tutorial - Intercon 2014

Decoding Huffman Codes


[Figure: the Huffman code tree from the previous example (a1 = 0, a2 = 10, a3 = 110, a4 = 111).]

Example: Decode the following message using the Huffman code from the previous example: 0110101110

0 → a1
0 110 → a1 a3
0 110 10 → a1 a3 a2
0 110 10 111 → a1 a3 a2 a4
0 110 10 111 0 → a1 a3 a2 a4 a1

Encoded message: 0110101110    Decoded message: a1 a3 a2 a4 a1
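The same walk down the code tree can be written as a short loop over the code table (a sketch that relies on the code being a prefix code, so every match is unambiguous):

```python
def huffman_decode(bits: str, code: dict) -> list:
    """Decode a bit string using a prefix code given as {symbol: codeword}."""
    inverse = {cw: sym for sym, cw in code.items()}
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in inverse:   # a complete codeword has been read
            symbols.append(inverse[current])
            current = ""
    return symbols

code = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}
print(huffman_decode("0110101110", code))   # ['a1', 'a3', 'a2', 'a4', 'a1']
```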

Page 27: Tutorial - Intercon 2014

Adaptive Huffman Codes


• Huffman coding requires knowledge of the probabilities of the source.

• If this knowledge is not available, Huffman coding becomes a two-pass procedure:

- a first pass to compute the probabilities,

- a second pass to encode the output of the source.

• The adaptive Huffman coding algorithm converts this two-pass procedure into a single-pass procedure.

• In adaptive Huffman coding, the transmitter and the receiver start with a code tree that has a single node corresponding to all the symbols not yet transmitted (NYT).

• As transmission progresses, nodes corresponding to transmitted symbols are added to the tree.

• The first time a symbol is transmitted, the code for NYT is transmitted first followed by a non-adaptive code agreed by the transmitter and the receiver before transmission starts.

Page 28: Tutorial - Intercon 2014

Golomb-Rice Codes


• The Golomb-Rice codes are a family of codes commonly used in data compression applications due to their low complexity and good compression performance.

• The JPEG committee and the Consultative Committee for Space Data Systems (CCSDS), for instance, have adopted the Golomb-Rice codes as part of their standards.

• Golomb-Rice codes are also used in video coding standards such as H.264 (as exp-Golomb codes) and in many commercial lossless audio compression programs.

• The Golomb-Rice codes have their origin in the pioneering work of Golomb, who proposed a method to encode the run lengths produced by a binary source when p0^m = 1/2, where p0 is the probability of the more frequent symbol and m is an integer.

Page 29: Tutorial - Intercon 2014

Golomb-Rice Codes


[Figure: a binary source with alphabet A = {0, 1} produces the sequence 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 …; the run lengths of 0s between 1s are 4, 3, 4, 7, 3, 2, … (non-negative integers). With p0 the probability of a 0 and m an integer chosen so that p0^m = 1/2, the run lengths n follow a geometric distribution P(n).]

Page 30: Tutorial - Intercon 2014

Golomb-Rice Codes


The Golomb-Rice codes consider the special case m = 2^k (k ≥ 0).

Encoding procedure: the quotient ⌊n / 2^k⌋ is encoded in unary (a run of ones terminated by a zero), followed by the remainder (n mod 2^k) encoded as a natural binary number using k bits.

Example: n = 17 (00010001 in binary)

k = 0: codeword = 111111111111111110
k = 1: codeword = 1111111101
k = 2: codeword = 1111001
k = 3: codeword = 110001
k = 4: codeword = 100001
k = 5: codeword = 010001
k = 6: codeword = 0010001
k = 7: codeword = 00010001
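A minimal encoder matching the procedure above; running it for n = 17 reproduces the codewords listed on this slide:

```python
def golomb_rice_encode(n: int, k: int) -> str:
    """Golomb-Rice codeword for a non-negative integer n with parameter k (m = 2**k):
    the quotient n >> k in unary (ones terminated by a zero), then the k low-order bits of n."""
    quotient = n >> k
    remainder = n & ((1 << k) - 1)
    unary = "1" * quotient + "0"
    binary = format(remainder, f"0{k}b") if k > 0 else ""
    return unary + binary

for k in range(4):
    print(k, golomb_rice_encode(17, k))
# 0 111111111111111110
# 1 1111111101
# 2 1111001
# 3 110001
```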

Page 31: Tutorial - Intercon 2014

Golomb-Rice Codes


[Figure: a double-sided distribution P(n) over …, −3, −2, −1, 0, 1, 2, 3, …]

Practical sources produce positive and negative numbers (a double-sided distribution).

Use the following mapping:

M(n) = 2n        if n ≥ 0
M(n) = 2|n| − 1  if n < 0

The mapping sends non-negative input numbers to even integers and negative input numbers to odd integers.
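A one-line version of this mapping, together with its inverse (the inverse is added here for completeness; only the forward direction is shown above):

```python
def map_signed(n: int) -> int:
    """n >= 0 maps to 2n (even); n < 0 maps to 2|n| - 1 (odd)."""
    return 2 * n if n >= 0 else 2 * (-n) - 1

def unmap_signed(m: int) -> int:
    """Inverse mapping, used by the decoder to recover the signed value."""
    return m // 2 if m % 2 == 0 else -(m + 1) // 2

print([map_signed(n) for n in [0, -1, 1, -2, 2, -3]])   # [0, 1, 2, 3, 4, 5]
print([unmap_signed(m) for m in range(6)])              # [0, -1, 1, -2, 2, -3]
```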

Page 32: Tutorial - Intercon 2014

Adaptive Golomb-Rice Codes


[Figure: adaptive Golomb-Rice coder. The source output is passed through the mapping M and then through the G-R coder, which produces the codewords; an adaptive algorithm adjusts the coding parameter k.]

Page 33: Tutorial - Intercon 2014

Adaptive Golomb-Rice Codes


1) Initialize k to kini;

2) Reset counter;

3) Read input n and encode it using parameter k;

4) If (unary part of the codeword ≥ 1) increment counter;

5) If (unary part of the codeword = 0) decrement counter;

6) If (counter value ≥ M) k++; Goto 2;

7) If (counter value ≤ -M) k--; Goto 2;
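A sketch of this adaptation loop, built on the golomb_rice_encode sketch given earlier (the initial k, the threshold M, and the clamping of k to the range 0..k_max are illustrative choices; the slide leaves them as parameters):

```python
def adaptive_gr_encode(samples, k_ini: int = 2, M: int = 4, k_max: int = 15) -> list:
    """Adaptive Golomb-Rice coding following steps 1-7 above."""
    k, counter, codewords = k_ini, 0, []                  # steps 1 and 2
    for n in samples:
        codewords.append(golomb_rice_encode(n, k))        # step 3
        quotient = n >> k                                  # number of ones in the unary part
        counter += 1 if quotient >= 1 else -1              # steps 4 and 5
        if counter >= M and k < k_max:                     # step 6: codewords too long, grow k
            k, counter = k + 1, 0
        elif counter <= -M and k > 0:                      # step 7: k is wasteful, shrink it
            k, counter = k - 1, 0
    return codewords

print(adaptive_gr_encode([3, 2, 1, 40, 45, 60, 52, 48]))
```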

Page 34: Tutorial - Intercon 2014

Entropy Coding


[Figure: source → entropy encoder → compressed output, for a source with a narrow distribution P(n).]

If the source has a narrow distribution, an entropy encoder (Huffman, Golomb-Rice, arithmetic) can be used directly.

Otherwise, a decorrelation step might be necessary:

[Figure: source → decorrelation (predictive coding, transform coding, sub-band coding) → entropy encoder → compressed output.]

Page 35: Tutorial - Intercon 2014

Predictive Coding Decorrelation


[Figure: a block of pixel values X (values such as 61, 63, 58, 69, …), the pixel prediction X̂ formed from neighboring pixels (e.g., X̂ = 64), and the prediction residual e = X − X̂, whose values are much smaller than the original pixel values.]

In an image, a pixel generally has a value close to one of its neighbors.
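A minimal sketch of this idea for a single image row, predicting each pixel from its left neighbor (the sample values are illustrative; the neighborhood-based predictor discussed on the next slides is more elaborate):

```python
import numpy as np

def residual_row(row: np.ndarray) -> np.ndarray:
    """Predict each pixel by its left neighbor and return the prediction residual.
    The first pixel has no left neighbor, so it is left unpredicted."""
    prediction = np.concatenate(([0], row[:-1]))
    return row.astype(int) - prediction.astype(int)

row = np.array([61, 63, 58, 69, 64, 60, 57, 59], dtype=np.uint8)
print(residual_row(row))   # [61  2 -5 11 -5 -4 -3  2]  (small values after the first)
```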

Page 36: Tutorial - Intercon 2014

Predictive Coding Decorrelation


[Figure: histogram of the original image and histogram of the prediction residual.]

Page 37: Tutorial - Intercon 2014

Context Adaptive Lossless Image Compression (CALIC)


[Figure: pixel neighborhood. The current pixel X has W and WW to its left, NW, N, NE in the row above, and NN, NNE two rows above.]

The neighboring pixels N, W, NE, NW, NN, WW, NNE are available to both the encoder and the decoder (assuming a raster scan).

1. To get an idea of the boundaries present in the neighborhood, compute horizontal and vertical gradient estimates:

d_h = |W − WW| + |N − NW| + |NE − N|
d_v = |W − NW| + |N − NN| + |NNE − NE|

2. Initial pixel prediction X̂:

if d_h − d_v > 80:       X̂ ← N
else if d_v − d_h > 80:  X̂ ← W
else {
    X̂ ← (N + W)/2 + (NE − NW)/4
    if d_h − d_v > 32:       X̂ ← (X̂ + N)/2
    else if d_v − d_h > 32:  X̂ ← (X̂ + W)/2
    else if d_h − d_v > 8:   X̂ ← (3X̂ + N)/4
    else if d_v − d_h > 8:   X̂ ← (3X̂ + W)/4
}

3. The initial prediction is refined based on the relationships of the pixels in the neighborhood (contexts). For each context we keep track of how much prediction error is generated and use it to refine the initial prediction.
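A sketch of steps 1 and 2 above (the gradient test and the initial prediction only; the context-based refinement of step 3 and the handling of image borders are omitted):

```python
def calic_initial_prediction(N, W, NE, NW, NN, WW, NNE):
    """Gradient-adjusted initial prediction of pixel X from its causal neighbors."""
    dh = abs(W - WW) + abs(N - NW) + abs(NE - N)    # horizontal gradient estimate
    dv = abs(W - NW) + abs(N - NN) + abs(NNE - NE)  # vertical gradient estimate
    if dh - dv > 80:        # strong horizontal variation: predict from the pixel above
        return N
    if dv - dh > 80:        # strong vertical variation: predict from the pixel to the left
        return W
    x_hat = (N + W) / 2 + (NE - NW) / 4
    if dh - dv > 32:
        x_hat = (x_hat + N) / 2
    elif dv - dh > 32:
        x_hat = (x_hat + W) / 2
    elif dh - dv > 8:
        x_hat = (3 * x_hat + N) / 4
    elif dv - dh > 8:
        x_hat = (3 * x_hat + W) / 4
    return x_hat

print(calic_initial_prediction(N=60, W=62, NE=58, NW=61, NN=59, WW=63, NNE=57))   # 60.25
```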

Page 38: Tutorial - Intercon 2014

Transform Coding


• In transform coding the input sequence is transformed into another sequence in which most of the information is contained in only a few elements.

• For a 1D signal x, such as audio or speech, the forward transform is defined as

θ = A x

and the inverse transform is defined as

x = B θ

The transforms used are orthonormal: B = A^{-1} = A^T.

• For 2D signals such as images, a two-dimensional separable transform is used: a 1D transform is applied along one dimension, followed by another 1D transform along the other dimension.

• In matrix notation, the forward transform is

Θ = A X A^T

and the inverse transform is given by

X = B Θ B^T

Page 39: Tutorial - Intercon 2014

Transform Coding


• In the JPEG standard, the forward transform is the Discrete Cosine Transform (DCT) and the inverse transform is the Inverse Discrete Cosine Transform (IDCT).

• The DCT transform matrix A is defined as:

A_{i,j} = \sqrt{1/N} \cos\frac{(2j+1) i \pi}{2N}   for i = 0 and j = 0, 1, …, N−1

A_{i,j} = \sqrt{2/N} \cos\frac{(2j+1) i \pi}{2N}   for i = 1, 2, …, N−1 and j = 0, 1, …, N−1

[Figure: JPEG encoder block diagram. Input image → DCT → quantization (using a quantization table) → the DC coefficient is coded with DPCM and the AC coefficients with run-length coding (RLC) → entropy encoder → compressed image.]
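The transform matrix defined above can be generated and checked directly (NumPy sketch; this is the orthonormal DCT matrix, so the inverse is simply the transpose):

```python
import numpy as np

def dct_matrix(N: int) -> np.ndarray:
    """N x N DCT transform matrix A as defined above."""
    A = np.zeros((N, N))
    for i in range(N):
        scale = np.sqrt(1.0 / N) if i == 0 else np.sqrt(2.0 / N)
        for j in range(N):
            A[i, j] = scale * np.cos((2 * j + 1) * i * np.pi / (2 * N))
    return A

A = dct_matrix(8)
print(np.allclose(A @ A.T, np.eye(8)))   # True: A is orthonormal, so B = A^T

# 2D separable transform of an 8x8 block X: Theta = A X A^T
X = np.random.randint(0, 256, (8, 8)).astype(float)
Theta = A @ X @ A.T
print(np.allclose(A.T @ Theta @ A, X))   # True: the transform itself is lossless
```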

Page 40: Tutorial - Intercon 2014

Transform Coding - DCT


8×8 block of pixel values:

183 177 147 79 41 34 35 43
189 153 63 39 38 37 39 44
187 99 37 38 42 41 46 46
101 42 36 39 61 63 59 44
41 41 38 45 57 73 52 47
44 49 49 50 54 60 58 54
51 58 55 50 55 57 58 54
44 50 52 54 55 59 67 63

8×8 block of DCT coefficients (the top-left value is the DC coefficient; the remaining 63 are the AC coefficients):

502.0 119.5 83.8 48.3 6.0 0.0 -0.1 -0.3
88.6 173.4 90.9 22.5 11.5 -1.8 -0.2 -0.8
62.0 78.7 22.2 -44.9 -19.8 -9.4 -7.3 -1.1
12.2 4.7 -37.1 -44.6 -30.2 -12.2 5.0 -3.0
3.5 -22.5 -36.9 -20.3 -13.0 4.1 11.5 5.1
12.1 9.7 -7.0 -6.6 2.6 11.3 8.5 11.5
9.2 7.9 3.7 -6.4 6.3 10.1 3.8 1.8
2.6 9.8 1.4 -2.0 0.3 -1.2 2.3 -5.1

Page 41: Tutorial - Intercon 2014

Quantization of DCT Coefficients


DCT coefficients (Θ):

502.0 119.5 83.8 48.3 6.0 0.0 -0.1 -0.3
88.6 173.4 90.9 22.5 11.5 -1.8 -0.2 -0.8
62.0 78.7 22.2 -44.9 -19.8 -9.4 -7.3 -1.1
12.2 4.7 -37.1 -44.6 -30.2 -12.2 5.0 -3.0
3.5 -22.5 -36.9 -20.3 -13.0 4.1 11.5 5.1
12.1 9.7 -7.0 -6.6 2.6 11.3 8.5 11.5
9.2 7.9 3.7 -6.4 6.3 10.1 3.8 1.8
2.6 9.8 1.4 -2.0 0.3 -1.2 2.3 -5.1

Quantized coefficients (Θ̂):

496 121 80 48 0 0 0 0
84 168 84 19 0 0 0 0
56 78 16 -48 0 0 0 0
14 0 -44 -58 -51 0 0 0
0 -22 -37 0 0 0 0 0
24 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Quantization table (Q):

16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99

\hat{\Theta} = \mathbf{Q} \cdot \mathrm{round}(\Theta / \mathbf{Q})   (applied element-wise)

After quantization, the DCT coefficients are transmitted following a zig-zag pattern. The coefficients are encoded using a Huffman code.
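The quantization rule above is a single element-wise operation (NumPy sketch using the luminance table Q listed on this slide; the example reproduces the first row of the quantized block):

```python
import numpy as np

# Quantization table Q from the slide (the standard JPEG luminance table)
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def quantize(theta: np.ndarray) -> np.ndarray:
    """Apply the rule from the slide, Theta_hat = Q * round(Theta / Q), element-wise."""
    return Q * np.round(theta / Q)

# First row of the DCT coefficient block from the previous slide (remaining rows zero-filled)
theta = np.zeros((8, 8))
theta[0] = [502.0, 119.5, 83.8, 48.3, 6.0, 0.0, -0.1, -0.3]
print(quantize(theta)[0].astype(int))   # [496 121 80 48 0 0 0 0]
```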

Page 42: Tutorial - Intercon 2014

Transform Coding - DCT


[Figure: the original image and the image coded using the DCT.]

Page 43: Tutorial - Intercon 2014

Sub-band Coding


• In sub-band coding the input signal is decomposed into several sub-bands using an analysis filter bank.

• Depending on the signal different sub-bands will contain different amounts of information.

• Sub-bands with lots of information are encoded using more bits while sub-bands with little information are encoded using fewer bits.

• At the decoder side, the signal is reconstructed using a bank of synthesis filters.

[Figure: the signal spectrum divided into sub-bands f1, f2, f3, …, fM.]

Page 44: Tutorial - Intercon 2014

Subband Coding


[Figure: sub-band coding system. The input is fed to a bank of M analysis filters; each filter output is downsampled by M and passed to its own entropy encoder. At the decoder, each sub-band is entropy decoded, upsampled by M and filtered by the corresponding synthesis filter, and the filter outputs are combined to form the output.]
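A minimal two-band (M = 2) version of this system using Haar analysis and synthesis filters; the per-band entropy coders are omitted, and the Haar filters are just one simple choice of analysis/synthesis bank:

```python
import numpy as np

def haar_analysis(x: np.ndarray):
    """Split an even-length signal into low-pass and high-pass sub-bands, each downsampled by 2."""
    x = x.astype(float)
    low = (x[0::2] + x[1::2]) / np.sqrt(2)    # averages: most of the signal energy
    high = (x[0::2] - x[1::2]) / np.sqrt(2)   # differences: usually small and cheap to code
    return low, high

def haar_synthesis(low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Recombine the two sub-bands (upsampling plus synthesis filtering)."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x

x = np.array([61, 63, 58, 69, 64, 60, 57, 59], dtype=float)
low, high = haar_analysis(x)
print(np.round(low, 2), np.round(high, 2))        # the energy concentrates in the low band
print(np.allclose(haar_synthesis(low, high), x))  # True: perfect reconstruction
```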

Page 45: Tutorial - Intercon 2014

Further Reading


• Khalid Sayood, Introduction to Data Compression, 4th edition, Morgan Kaufmann, San Francisco, 2012.

• G. Held and T. R. Marshall, Data Compression, 3rd edition, John Wiley and Sons, New York, 1991.

• N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice Hall, Englewood Cliffs, 1984.

• B. E. Usevitch, “A tutorial on modern lossy wavelet image compression: foundations of JPEG 2000,” IEEE Signal Processing Magazine, vol. 18, no. 5, 2001.

• D. Pan, “Digital audio compression,” Digital Technical Journal, vol. 5, no. 2, 1993.

• M. Hans and R. W. Schafer, “Lossless compression of digital audio,” IEEE Signal Processing Magazine, vol. 18, no. 4, 2001.

• G. E. Blelloch, Introduction to Data Compression, course notes, Computer Science Department, Carnegie Mellon University