brnanavaty
DATA COMPRESSION
Chapter - 3
DEFINITION AND CONCEPT
• Implies reduction.
– Grouping all mobile numbers that begin with the same code saves 4 or 5 characters per number
• Different types of data are to be maintained
• Process of encoding information applying some specific encoding scheme.
• Used for storage of huge amount of data.
• Used for transmission of data.
• Goal is to use fewest number of bits
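
The mobile-number idea above can be sketched in a few lines. This is only an illustration: it uses the sample numbers listed at the end of this deck (prefixes 9427 and 9825), and the function names and the 4-digit prefix length are assumptions made for the example.

```python
from collections import defaultdict

def group_by_prefix(numbers, prefix_len=4):
    """Store each shared prefix once, followed by the remaining digits."""
    groups = defaultdict(list)
    for number in numbers:
        groups[number[:prefix_len]].append(number[prefix_len:])
    return dict(groups)

def stored_chars(groups):
    """Characters needed: every prefix once plus all of its suffixes."""
    return sum(len(prefix) + sum(len(s) for s in suffixes)
               for prefix, suffixes in groups.items())

numbers = ["9427098764", "9427984532", "9427231876",
           "9825098764", "9825984532", "9825231876"]
groups = group_by_prefix(numbers)
original = sum(len(n) for n in numbers)   # characters without grouping
compressed = stored_chars(groups)         # characters with grouping
print(original, compressed)
```

Each number after the first in a group no longer repeats its 4-digit prefix, which is where the saving comes from.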
NEED
• Information Security.
• To increase the capacity of the communication channel.
• To reduce storage space, including for data backups.
TYPES OF COMPRESSION
• LOSSY
– Where loss of data is acceptable
– Ideal for high compression
– Decoding to exact data not possible
– Not preferred for textual data
– Preferred for Digitally Sampled Analog Data (DSAD) – sound, video or graphics
TYPES OF COMPRESSION
• LOSSLESS
–Decoding to exact data is possible
–Finds repeated patterns in a message and encodes those patterns in an efficient manner
–Also called redundancy reduction
–Ideal for text
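
A quick way to see that lossless decoding returns the exact data is a round trip through Python's standard zlib module. This is a minimal sketch; the repetitive sample message is an assumption chosen so that there is redundancy to remove.

```python
import zlib

# Highly repetitive sample text (an assumption for illustration).
message = b"NAMAN " * 50

packed = zlib.compress(message)      # lossless compression
restored = zlib.decompress(packed)   # decoding back to the original bytes

print(len(message), len(packed), restored == message)
```

The restored bytes are identical to the input, and the packed form is shorter because the repeated pattern is stored only once in effect.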
METHODS FOR COMPRESSION
• Static
– A symbol is mapped to the same code every time it appears
• Dynamic
– Codes change over time
– Assignment is based on the relative frequency of occurrence at each point in time
• Both follow some algorithm
SHANNON-FANO CODING
• Simple lossless technique.
• Frequency of each symbol is counted.
• Arranged in descending order of their frequency.
• Divided into two sets whose total probabilities are as close as possible to being equal.
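• All symbols then have the first digits of their codes assigned; symbols in the first set receive "0" and symbols in the second set receive "1".
• The same division is repeated within each set, adding one digit per level, until every set holds a single symbol.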
Symbol  Frequency  Code Length  Code  Total Length
A       24         2            00    48
B       12         2            01    24
C       10         2            10    20
D       8          3            110   24
E       8          3            111   24
Total: 62 symbols. SF coded: 140 bits. Linear (3 bits/symbol): 186 bits.
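
The steps above can be sketched as a short recursive routine. This is an illustrative implementation, not taken from the deck: the function name is an assumption, and ties in frequency are broken by input order.

```python
def shannon_fano(freqs):
    """Return {symbol: code} for a {symbol: frequency} mapping."""
    # Arrange symbols in descending order of frequency.
    symbols = sorted(freqs, key=lambda s: -freqs[s])
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(freqs[s] for s in group)
        running, best_diff, cut = 0, None, 1
        # Find the cut that makes the two totals as close as possible.
        for i in range(1, len(group)):
            running += freqs[group[i - 1]]
            diff = abs(total - 2 * running)
            if best_diff is None or diff < best_diff:
                best_diff, cut = diff, i
        for s in group[:cut]:
            codes[s] += "0"   # first set receives "0"
        for s in group[cut:]:
            codes[s] += "1"   # second set receives "1"
        split(group[:cut])
        split(group[cut:])

    split(symbols)
    return codes

freqs = {"A": 24, "B": 12, "C": 10, "D": 8, "E": 8}
codes = shannon_fano(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes, total_bits)
```

For the frequencies in the table this reproduces the listed codes and the 140-bit total, compared with 186 bits at a fixed 3 bits per symbol.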
[Figure: binary tree with labels: root node (1st node), 2nd-level node, leaf node, left child, right child]
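EXAMPLE: NAMAN
• Symbol frequencies: N – 2, A – 2, M – 1 (total 5)
• Tree: the root NAM (5) splits into NA and M; NA splits into N and A
• Codes assigned: N is 00, A is 01, M is 10
• NAMAN is encoded as 00 01 10 01 00
• Because no code is the prefix of another code, these are also called "prefix codes"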
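
The NAMAN example can be checked with a small encoder and decoder. This is a sketch using the codes listed in the example (N = 00, A = 01, M = 10); the function names are assumptions made for illustration.

```python
CODES = {"N": "00", "A": "01", "M": "10"}   # codes from the example

def encode(text, codes):
    """Concatenate the code of each symbol."""
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Read bits left to right, emitting a symbol whenever a code matches."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

def is_prefix_free(codes):
    """True when no code is the beginning of a different code."""
    values = list(codes.values())
    return not any(a != b and b.startswith(a) for a in values for b in values)

bits = encode("NAMAN", CODES)
print(bits, decode(bits, CODES), is_prefix_free(CODES))
```

The decoder needs no separators between codes, which works only because the code set is prefix-free.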
HUFFMAN CODING
• Developed by David A. Huffman.
• Based on frequency of occurrence of a symbol
• Shorter codes are assigned to the more frequent symbols
• A tree is created to decide on the codes
• A parent node is constructed from the two least frequently occurring nodes; its frequency is the sum of theirs
• This process continues until the list of symbols is empty
• Left branch is 0 and right branch is 1
• The codeword is obtained by following the path from the root to the symbol's leaf
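
The tree-building procedure above can be sketched with a priority queue. This is a minimal illustration rather than a reference implementation: the function name is an assumption, symbols are assumed not to be tuples (tuples are used for internal nodes), and a counter breaks frequency ties so that nodes are never compared directly.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Return {symbol: code} for a {symbol: frequency} mapping."""
    tie = count()
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                      # a single symbol still needs one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two least frequent nodes under a new parent.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}

    def walk(node, path):
        if isinstance(node, tuple):         # internal node: (left, right)
            walk(node[0], path + "0")       # left branch is 0
            walk(node[1], path + "1")       # right branch is 1
        else:
            codes[node] = path              # leaf: path from the root is the code

    walk(heap[0][2], "")
    return codes

freqs = {"A": 24, "B": 12, "C": 10, "D": 8, "E": 8}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes, total_bits)
```

With the frequencies from the Shannon-Fano table, the most frequent symbol A receives a 1-bit code and the total comes to 138 bits, slightly better than the 140 bits obtained there.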
RUN-LENGTH ENCODING
• Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run.
• Example: with 80 characters per line, successive lines are interpreted as 80b, 80b, 30b20p30b, 25b30p25b, etc.
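
The count-and-value notation above (such as 30b20p30b) can be produced with a short sketch like the following. The function names are assumptions, and the decoder assumes the data itself contains no digit characters.

```python
import re
from itertools import groupby

def rle_encode(text):
    """Replace each run of identical characters with <count><character>."""
    return "".join(f"{len(list(run))}{value}" for value, run in groupby(text))

def rle_decode(encoded):
    """Expand <count><character> pairs back into the original runs."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(value * int(n) for n, value in pairs)

line = "b" * 30 + "p" * 20 + "b" * 30   # one 80-character line
encoded = rle_encode(line)
print(encoded, rle_decode(encoded) == line)
```

An 80-character line collapses to 9 characters here; data with few repeated runs would gain little or even grow under this scheme.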
WINZIP
• Zip files (.zip or .zipx) are single files, sometimes called "archives", that contain one or more compressed files.
• Zip files make it easy to keep related files together and make transporting, e-mailing, downloading and storing data and software faster and more efficient.
• WinZip features an intuitive point and click, drag and drop interface for viewing, running, extracting, adding and deleting files in archives.
WHAT IS AN ARCHIVE?
• Files that contain other files which are compressed.
• File names ending with ZIP, LZH, ARJ or ARC.
• Used to:
– Distribute files on the Internet
– Send a group of related files
– Save disk space
[Figure: archive tool window. Callouts: compress files and add them to an archive; decompress files of an archive and put them as separate files on the disk; the file list can be sorted]
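
The two archive operations, compressing files into an archive and extracting them again, can be sketched with Python's standard zipfile module. This is not WinZip itself; the file names and contents are made up for illustration, and the archive is kept in memory rather than written to disk.

```python
import io
import zipfile

# Hypothetical files to archive (names and contents are assumptions).
files = {"notes.txt": b"data compression " * 100, "readme.txt": b"hello"}

# Compress and add to an archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, data in files.items():
        archive.writestr(name, data)

# Decompress the files of the archive again.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    extracted = {name: archive.read(name) for name in names}

archive_size = len(buffer.getvalue())
print(names, archive_size, extracted == files)
```

The single archive holds both files, and the extracted contents match the originals byte for byte.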
1) The goal of data compression is to?
a) Minimize a window
b) Flatten a computer
c) Reduce the number of bits used to store or transmit information
d) Write more concisely
2) Which of the following are examples of data compression?
a) WinZip
b) JPEG files
c) MP3s
d) All of the above
3) Which of the following is NOT an example of a use of data compression?
a) To save storage space
b) To destroy data
c) To improve the speed of data transfers
d) To speed up downloads
4) What was the first data compression algorithm?
a) Shannon-Fano coding
b) Huffman coding
c) Run-Length coding
d) Arithmetic coding
6) What kind of files would be appropriate for lossy compression?
a) .doc files
b) .exe files
c) .gif files
d) None of the above
END
Example: mobile numbers grouped by common code
BSNL 9427: 098764, 984532, 231876
Vodafone 9825: 098764, 984532, 231876
Example: names
Amin, 3raj, 3sha, 3t
AND NOW COMPRESS THIS INFORMATION AND STORE IT IN YOUR MIND