DATA COMPRESSION
Simple Dictionary Compression
Manish T I
• It is a two-pass algorithm: the first pass analyzes the data in the source file, and the second pass compresses the data to a file.
First Pass:-
• The distinct bytes in the source file are identified.
• The number of times each byte occurs in the source file is counted.
• A list of the bytes is sorted in descending order of frequency, so that the bytes (characters) with the highest counts appear at the top of the list. This list is known as the dictionary.
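The first pass can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function name `build_dictionary` is hypothetical, and breaking ties by byte value is an assumption, since the slides do not say how bytes with equal counts are ordered.

```python
from collections import Counter


def build_dictionary(data: bytes) -> list[int]:
    """Return the distinct bytes of `data`, most frequent first."""
    counts = Counter(data)  # byte value -> number of occurrences
    # Sort by descending count; ties are broken by byte value (an assumption).
    return sorted(counts, key=lambda b: (-counts[b], b))


if __name__ == "__main__":
    dictionary = build_dictionary(b"TTVVVEGTVEN")
    print(bytes(dictionary))
```

With these tie-breaking rules the sample data yields the order V, T, E, G, N.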
Second Pass:-
• The source file is read again, byte by byte.
• Each byte is located in the dictionary by a direct search, and its index is noted.
• The index can take 256 values, ranging from 0 to 255, so it needs between 1 and 8 bits.
• The index is written to the compressed file, preceded by a 3-bit code denoting the index's length.
• Index Table
Binary code   Value   Index length (bits)
000           0       1
001           1       2
010           2       3
011           3       4
100           4       5
101           5       6
110           6       7
111           7       8
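The second pass and the index table can be sketched as follows. The names `encode_index` and `compress` are hypothetical, the output is a string of '0'/'1' characters rather than packed bytes, and indexes are assumed to start at 0; a real implementation would also have to store the dictionary and pack the bits into bytes.

```python
def encode_index(index: int) -> str:
    """Return the 3-bit length code followed by the index bits."""
    if not 0 <= index <= 255:
        raise ValueError("index must be in 0..255")
    length = max(1, index.bit_length())  # index 0 still occupies 1 bit
    # The 3-bit code stores (length - 1), so 000 means a 1-bit index.
    return format(length - 1, "03b") + format(index, f"0{length}b")


def compress(data: bytes, dictionary: list[int]) -> str:
    """Replace each byte by the prefixed index of its dictionary entry."""
    # list.index is a linear (direct) search through the dictionary.
    return "".join(encode_index(dictionary.index(b)) for b in data)


if __name__ == "__main__":
    for i in (0, 1, 2, 5, 255):
        print(i, encode_index(i))
```

Under these assumptions every code is between 4 bits (indexes 0 and 1) and 11 bits (indexes 128 to 255) long.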
Input File sample data : - TTVVVEGTVEN
Dictionary File : -
Compressed File (4 – 11 bits)
Byte   Length code   Index bits   No. of bits used
T      001           10           5
V      000           1            4
V      000           1            4
V      000           1            4
E      001           11           5
G      010           100          6
T      001           10           5
V      000           1            4
E      001           11           5
N      010           101          6
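As a quick check of the saving, the bit counts listed for the sample can be totalled and compared with 8 bits per byte. This is a small sketch only; it ignores the space taken by the dictionary itself, which a real compressed file would also need.

```python
# Bits used per encoded byte, as listed for the sample.
bits_used = [5, 4, 4, 4, 5, 6, 5, 4, 5, 6]

compressed_bits = sum(bits_used)        # total size of the codes
original_bits = 8 * len(bits_used)      # the same bytes stored uncompressed
ratio = compressed_bits / original_bits

if __name__ == "__main__":
    print(compressed_bits, original_bits, ratio)
```

The ten codes take 48 bits against 80 bits uncompressed, a ratio of 0.6 before the dictionary is counted.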
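• Compression is achieved because the dictionary is sorted by the frequency of the bytes, so the most common bytes get the shortest indexes. Each 8-bit byte is replaced by a quantity of between 4 and 11 bits.
• The dictionary is not sorted by byte value, so locating a byte requires a search through the list.
• Disadvantage: compression is slow because of this search. Decompression is not, because each index points directly at its dictionary entry.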
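A decoder sketch shows why decompression avoids the search: each decoded index addresses the dictionary directly. The function name `decompress` is hypothetical, the input is assumed to be a string of '0'/'1' characters with no padding bits, and indexes are assumed to start at 0.

```python
def decompress(bits: str, dictionary: list[int]) -> bytes:
    """Decode a string of prefixed indexes back into the original bytes."""
    out = bytearray()
    pos = 0
    while pos < len(bits):
        # The 3-bit code stores (index length - 1).
        length = int(bits[pos:pos + 3], 2) + 1
        pos += 3
        index = int(bits[pos:pos + length], 2)
        pos += length
        out.append(dictionary[index])  # direct lookup, no search needed
    return bytes(out)


if __name__ == "__main__":
    # Dictionary V, T, E and the codes for indexes 0, 1, 2.
    print(decompress("0000" + "0001" + "00110", [86, 84, 69]))
```

A real file format would also need to record how many bytes were encoded, so that padding bits at the end of the last byte are not decoded as data.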
Reference:-
Data Compression : The Complete Reference, David Salomon, Springer Science & Business Media, 2004
For any queries contact:
Web: www.iprg.co.in
E-mail: [email protected]
@ImageProcessingResearchGroup