Data Compression
Hae-sun Jung
CS146 Dr. Sin-Min Lee
Spring 2004
Introduction
Compression is used to reduce the volume of information to be stored, or to reduce the communication bandwidth required to transmit it over a network.
Compression Principles: Entropy Encoding
Run-length encoding
• Lossless and independent of the type of source information
• Used when the source information comprises long substrings of the same character or binary digit
• Each run is sent as a (string or bit pattern, # of occurrences) pair, as in FAX
• e.g.) 000000011111111110000011……
  → (0,7) (1,10) (0,5) (1,2)……, or, since the bits alternate, simply 7,10,5,2……
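As a concrete illustration (a minimal sketch of mine, not taken from the slides; the function names are hypothetical), run-length encoding of a bit string in Python:

```python
from itertools import groupby

def rle_encode(bits: str):
    """Encode a bit string as (bit, run-length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

def rle_decode(pairs):
    """Rebuild the original bit string from (bit, run-length) pairs."""
    return "".join(bit * count for bit, count in pairs)

pairs = rle_encode("000000011111111110000011")
print(pairs)              # [('0', 7), ('1', 10), ('0', 5), ('1', 2)]
print(rle_decode(pairs))  # 000000011111111110000011
```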
Compression Principles
Entropy Encoding: Statistical encoding
• Based on the probability of occurrence of a pattern
• The more probable the pattern, the shorter its codeword
• "Prefix property": a shorter codeword must not form the start of a longer codeword
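The prefix property can be checked mechanically; a small sketch of mine (not from the slides), using codes that appear later in this deck:

```python
def is_prefix_free(codewords):
    """Return True if no codeword is the start of another codeword."""
    return not any(a != b and b.startswith(a)
                   for a in codewords for b in codewords)

print(is_prefix_free(["1", "01", "001", "000"]))  # True: a valid prefix code
print(is_prefix_free(["01", "010"]))              # False: "01" starts "010"
```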
Compression Principles
Huffman Encoding
Entropy, H: the theoretical minimum average number of bits required to transmit a particular stream

  H = -Σ(i=1..n) Pi log2(Pi)

where n = # of symbols and Pi = probability of symbol i

Efficiency, E = H/H'

where H' = average # of bits per codeword = Σ(i=1..n) Ni Pi, and Ni = # of bits in the codeword of symbol i
E.g.) symbols M(10), F(11), Y(010), N(011), 0(000), 1(001) with probabilities 0.25, 0.25, 0.125, 0.125, 0.125, 0.125

H' = Σ(i=1..6) Ni Pi = 2(2 × 0.25) + 4(3 × 0.125) = 2.5 bits/codeword
H = -Σ(i=1..6) Pi log2(Pi) = -(2(0.25 log2 0.25) + 4(0.125 log2 0.125)) = 2.5
E = H/H' = 100%, versus 3 bits/codeword if we used fixed-length codewords for the six symbols
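These numbers can be verified with a short script mirroring the formulas above (a sketch of mine):

```python
import math

probs   = [0.25, 0.25, 0.125, 0.125, 0.125, 0.125]  # M, F, Y, N, 0, 1
lengths = [2, 2, 3, 3, 3, 3]                        # codeword lengths Ni

H  = -sum(p * math.log2(p) for p in probs)      # entropy
Hp = sum(n * p for n, p in zip(lengths, probs)) # H' = avg bits per codeword
print(H, Hp, H / Hp)                            # 2.5 2.5 1.0 -> E = 100%
```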
Huffman Algorithm
Method of constructing an encoding tree
• Full binary tree representation
• Each edge of the tree has a value (0 for the left child, 1 for the right child)
• Data is at the leaves, not at internal nodes
• Result: an encoding tree
• "Variable-length encoding"
Huffman Algorithm
• 1. Maintain a forest of trees
• 2. Weight of a tree = sum of the frequencies of its leaves
• 3. Repeat N-1 times (see the sketch below):
  – Select the two smallest-weight trees
  – Form a new tree from them
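A minimal sketch of this forest-merging loop, assuming Python's heapq as the forest and nested tuples as trees (conventions of mine, not from the slides):

```python
import heapq
from collections import Counter

def build_tree(text):
    """Merge the two lightest trees until one Huffman tree remains."""
    # Forest of single-leaf trees: (weight, tie-breaker, tree).
    # A tree is either a symbol (leaf) or a (left, right) tuple (branch).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:                 # N-1 merges for N symbols
        w1, _, t1 = heapq.heappop(heap)  # the two smallest-weight trees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))  # merged tree
        count += 1
    return heap[0][2]

print(build_tree("AAAABBCD"))  # ('A', ('B', ('C', 'D')))
```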
• Huffman coding
• A variable-length code: a character's codeword length is inversely related to that character's frequency
• Must satisfy the prefix property to be uniquely decodable
• Two-pass algorithm
  – the first pass accumulates the character frequencies and generates the codebook
  – the second pass does the compression with the codebook
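The two passes can be sketched as follows (my own outline; the codebook is hard-coded here to the codes derived later in these slides, and building it is shown in the tree sketches):

```python
from collections import Counter

def compress(text, codebook):
    """Second pass: map each character through the codebook."""
    return "".join(codebook[ch] for ch in text)

# First pass: accumulate character frequencies (input to codebook building).
freqs = Counter("AAAABBCD")  # Counter({'A': 4, 'B': 2, 'C': 1, 'D': 1})

# Codebook for these frequencies, as derived later in this deck.
codebook = {"A": "1", "B": "01", "C": "001", "D": "000"}

print(compress("AAAABBCD", codebook))  # 11110101001000 (14 bits)
```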
Huffman coding
• Create codes by constructing a binary tree
1. Consider all characters as free nodes
2. Assign the two free nodes with the lowest frequencies to a parent node whose weight equals the sum of their frequencies
3. Remove the two free nodes and add the newly created parent node to the list of free nodes
4. Repeat steps 2 and 3 until only one free node is left; it becomes the root of the tree
• Right branch of the binary tree: 1
• Left branch of the binary tree: 0
• Prefix (example)
  – e: "01", b: "010"
  – "01" is a prefix of "010" ==> "010" would be misread as "e" plus a leftover 0 ("e0")
• Same frequency: need a consistent rule for choosing left or right
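Assigning the codewords is a walk over the tree, appending 0 on each left edge and 1 on each right edge. A sketch of mine, using nested tuples for the tree (the example tree matches the A/B/C/D codes shown later in this deck):

```python
def assign_codes(tree, prefix="", codebook=None):
    """Walk the tree: left child appends '0', right child appends '1'."""
    if codebook is None:
        codebook = {}
    if isinstance(tree, tuple):   # branch node: (left, right)
        left, right = tree
        assign_codes(left, prefix + "0", codebook)
        assign_codes(right, prefix + "1", codebook)
    else:                         # leaf node holds a symbol
        codebook[tree] = prefix
    return codebook

tree = ((("D", "C"), "B"), "A")
print(assign_codes(tree))  # {'D': '000', 'C': '001', 'B': '01', 'A': '1'}
```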
• Example (64 data values)
R K K K K K K K
K K K R R K K K
K K R R R R G G
K K B C C C R R
G G G M C B R R
B B B M Y B B R
G G G G G G G R
G R R R R G R R
Color   Frequency   Huffman code
=================================
R          19           00
K          17           01
G          14           10
B           7           110
C           4           1110
M           2           11110
Y           1           11111
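With this codebook the 64 values take 19×2 + 17×2 + 14×2 + 7×3 + 4×4 + 2×5 + 1×5 = 152 bits, versus 64×3 = 192 bits for a fixed 3-bit code over the seven colors.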
Static Huffman Coding: Huffman (Code) Tree
Given: a set of symbols (or characters) and their relative probabilities, known a priori
Must hold the "prefix property" among codes

Symbol   Occurrence
A          4/8
B          2/8
C          1/8
D          1/8

Symbol   Code
A          1
B          01
C          001
D          000

4×1 + 2×2 + 1×3 + 1×3 = 14 bits are required to transmit "AAAABBCD"
[Figure: the Huffman code tree for this example, with edges labeled 0 (left) and 1 (right); leaf nodes A, B, C, D; branch nodes with weights 8 (root node), 4, and 2; annotated "Prefix Property!"]
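Because the codes are prefix-free, a decoder can consume the bit stream greedily, emitting a symbol as soon as the buffered bits match a codeword. A minimal sketch of mine:

```python
def decode(bits, codebook):
    """Greedy decoding: the prefix property guarantees a unique match."""
    inverse = {code: sym for sym, code in codebook.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:       # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

codebook = {"A": "1", "B": "01", "C": "001", "D": "000"}
print(decode("11110101001000", codebook))  # AAAABBCD (the 14 bits above)
```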
The end