Data Compression
Hae-sun Jung, CS146, Dr. Sin-Min Lee, Spring 2004


Page 1

Data Compression

Hae-sun Jung

CS146 Dr. Sin-Min Lee

Spring 2004


Page 3

Introduction

Compression is used to reduce the volume of information to be kept in storage, or to reduce the communication bandwidth required for its transmission over a network.


Compression Principles: Entropy Encoding

Run-length encoding
Lossless & independent of the type of source information
Used when the source information comprises long substrings of the same character or binary digit
Encoded as (string or bit pattern, # of occurrences) pairs, as in FAX

e.g.) 000000011111111110000011……
(0,7) (1,10) (0,5) (1,2)…… → 7,10,5,2……
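The run-length pairs above can be sketched in a few lines of Python (a minimal illustration only; real FAX encoders use a standardized variant with fixed codewords for the run lengths):

```python
def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Run-length encode a string of 0s and 1s into (symbol, count) pairs."""
    runs = []
    i = 0
    while i < len(bits):
        j = i
        # advance j to the end of the current run of identical symbols
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        runs.append((bits[i], j - i))
        i = j
    return runs

print(rle_encode("000000011111111110000011"))
# [('0', 7), ('1', 10), ('0', 5), ('1', 2)]
```

Because a FAX line always starts with white, only the counts (7, 10, 5, 2, …) need to be transmitted, as the slide shows.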

Page 6

Compression Principles

Entropy Encoding: Statistical encoding

Based on the probability of occurrence of a pattern
The more probable the pattern, the shorter its codeword
"Prefix property": a shorter codeword must not form the start of a longer codeword

Page 7

Compression Principles

Huffman Encoding
Entropy, H: the theoretical minimum average number of bits required to transmit a particular stream

H = -Σ(i=1..n) Pi log2(Pi)
where n = # of symbols, Pi = probability of symbol i

Efficiency, E = H/H'
where H' = average # of bits per codeword = Σ(i=1..n) Ni Pi
Ni = # of bits in the codeword for symbol i

Page 8

e.g.) Symbols M(10), F(11), Y(010), N(011), 0(000), 1(001) with probabilities 0.25, 0.25, 0.125, 0.125, 0.125, 0.125

H' = Σ(i=1..6) Ni Pi = 2×(2×0.25) + 4×(3×0.125) = 2.5 bits/codeword
H = -Σ(i=1..6) Pi log2(Pi) = -(2×(0.25 log2 0.25) + 4×(0.125 log2 0.125)) = 2.5
E = H/H' = 100%, vs. 3 bits/codeword if we use fixed-length codewords for the six symbols
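The H, H', and E figures above can be checked numerically (a quick sketch; variable names are illustrative, not from the slides):

```python
import math

# Probabilities of the slide's six-symbol example
probs = {"M": 0.25, "F": 0.25, "Y": 0.125, "N": 0.125, "0": 0.125, "1": 0.125}
# Codeword lengths of M(10), F(11), Y(010), N(011), 0(000), 1(001)
lengths = {"M": 2, "F": 2, "Y": 3, "N": 3, "0": 3, "1": 3}

H = -sum(p * math.log2(p) for p in probs.values())   # entropy, bits/symbol
H_prime = sum(lengths[s] * probs[s] for s in probs)  # avg. bits per codeword
print(H, H_prime, H / H_prime)  # 2.5 2.5 1.0
```

Efficiency E = H/H' = 1.0, i.e. 100%: this particular code is optimal because every probability is an exact power of 1/2.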

Page 9

Huffman Algorithm

Method of construction for an encoding tree

• Full binary tree representation
• Each edge of the tree has a value (0 on the edge to the left child, 1 on the edge to the right child)
• Data is at the leaves, not the internal nodes
• Result: an encoding tree
• "Variable-Length Encoding"

Page 10

Huffman Algorithm

• 1. Maintain a forest of trees
• 2. Weight of a tree = sum of the frequencies of its leaves
• 3. Repeat N-1 times:
  – Select the two smallest-weight trees
  – Form a new tree from them

Page 11

• Huffman coding
• A variable-length code: a codeword's length is inversely related to its character's frequency
• Must satisfy the prefix property to be uniquely decodable
• Two-pass algorithm
  – the first pass accumulates the character frequencies and generates the codebook
  – the second pass does the compression with the codebook

Page 12

Huffman coding

• Create codes by constructing a binary tree

1. Consider all characters as free nodes
2. Assign the two free nodes with the lowest frequencies to a parent node whose weight equals the sum of their frequencies
3. Remove the two free nodes and add the newly created parent node to the list of free nodes
4. Repeat steps 2 and 3 until there is one free node left; it becomes the root of the tree
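Steps 1-4 can be sketched with a min-heap of free nodes (an illustrative sketch, not the slides' own code). Run on the color frequencies of the 64-pixel example a few slides later, it reproduces that slide's codeword lengths; the exact bit patterns depend on how ties are broken:

```python
import heapq
from itertools import count

def huffman_codes(freq: dict[str, int]) -> dict[str, str]:
    """Build Huffman codewords from symbol frequencies (steps 1-4 above)."""
    tie = count()  # unique tie-breaker so the dicts are never compared
    # each heap entry: (tree weight, tie-breaker, {symbol: partial code})
    heap = [(w, next(tie), {s: ""}) for s, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lowest-weight free nodes
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}   # left edge = 0
        merged.update({s: "1" + c for s, c in right.items()})  # right edge = 1
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

codes = huffman_codes({"R": 19, "K": 17, "G": 14, "B": 7, "C": 4, "M": 2, "Y": 1})
print(sorted((s, len(c)) for s, c in codes.items()))
# [('B', 3), ('C', 4), ('G', 2), ('K', 2), ('M', 5), ('R', 2), ('Y', 5)]
```

Prepending a bit at every merge is equivalent to reading edge labels from the root down to each leaf.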

Page 13

• Right branch of the binary tree: 1
• Left branch of the binary tree: 0
• Prefix violation (example)
  – e: "01", b: "010"
  – "01" is a prefix of "010", so the bits "010" would be read as "e" followed by a dangling 0; decoding is ambiguous
• Equal frequencies: break ties consistently (always the same side, left or right)
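The ambiguity above can be detected mechanically. A small sketch of a prefix-property test (function name is illustrative); it relies on the fact that in sorted order a prefix lands immediately before one of its extensions:

```python
def is_prefix_free(codes: set[str]) -> bool:
    """True iff no codeword is a prefix of another
    (i.e. the code is uniquely decodable left to right)."""
    cs = sorted(codes)
    # if any codeword is a prefix of another, the offending pair is adjacent
    return all(not b.startswith(a) for a, b in zip(cs, cs[1:]))

print(is_prefix_free({"01", "010"}))              # False: "01" prefixes "010"
print(is_prefix_free({"1", "01", "001", "000"}))  # True
```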

Page 14

• Example (64 data values)

R K K K K K K K
K K K R R K K K
K K R R R R G G
K K B C C C R R
G G G M C B R R
B B B M Y B B R
G G G G G G G R
G R R R R G R R

Page 15

Color   Frequency   Huffman code
================================
R       19          00
K       17          01
G       14          10
B        7          110
C        4          1110
M        2          11110
Y        1          11111
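A quick check of what this code buys for the 64-pixel example (the totals are computed here; the slides do not state them):

```python
# (frequency, codeword) per color, from the table above
table = {"R": (19, "00"), "K": (17, "01"), "G": (14, "10"),
         "B": (7, "110"), "C": (4, "1110"), "M": (2, "11110"), "Y": (1, "11111")}

huffman_bits = sum(n * len(code) for n, code in table.values())
fixed_bits = 64 * 3  # 7 distinct colors need 3 bits each at fixed length
print(huffman_bits, fixed_bits)  # 152 192
```

The variable-length code transmits the image in 152 bits instead of 192, because the frequent colors R, K, and G get 2-bit codewords.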


Static Huffman Coding: Huffman (Code) Tree

Given: a number of symbols (or characters) and their relative probabilities, known in advance
Must hold the "prefix property" among the codes

Symbol  Occurrence  Code
A       4/8         1
B       2/8         01
C       1/8         001
D       1/8         000

4×1 + 2×2 + 1×3 + 1×3 = 14 bits are required to transmit "AAAABBCD"

[Tree diagram: edges labeled 0 (left) and 1 (right); leaf nodes A, B, C, D; branch nodes with weights 4 and 2 below the root node (weight 8). "Prefix Property!"]
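Encoding "AAAABBCD" with this codebook confirms the 14-bit count, and greedy left-to-right decoding recovers the message precisely because the code has the prefix property (a short sketch; names are illustrative):

```python
codes = {"A": "1", "B": "01", "C": "001", "D": "000"}  # codebook from the table
msg = "AAAABBCD"

bits = "".join(codes[s] for s in msg)
print(bits, len(bits))  # 11110101001000 14

# Greedy decoding: grow a buffer bit by bit; the prefix property guarantees
# the first codeword match is the only possible one.
inv = {v: k for k, v in codes.items()}
out, buf = [], ""
for b in bits:
    buf += b
    if buf in inv:
        out.append(inv[buf])
        buf = ""
print("".join(out))  # AAAABBCD
```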

Page 18

The end