Mult 8 Compression


8/13/2019 Mult 8 Compression

    8. Compression


    Video and Audio Compression

    Video and audio files are very large. Unless we develop and maintain very high bandwidth networks (gigabytes per second or more), we have to compress the data. Relying on higher bandwidths is not a good option -

    M25 Syndrome: traffic needs ever increase and will adapt to swamp the current limit, whatever this is.

    Compression therefore becomes part of the representation or coding scheme of the popular audio, image and video formats.


    What is Compression?

    Compression basically exploits redundancy in the data:

    Temporal - in 1D data, 1D signals, audio, etc.

    Spatial - correlation between neighbouring pixels or data items.

    Spectral - correlation between colour or luminance components. This uses the frequency domain to exploit relationships between the frequency of change in the data.

    Psycho-visual - exploits perceptual properties of the human visual system.


    Compression can be categorised in two broad ways:

    Lossless Compression

    where data is compressed and can be reconstituted (uncompressed) without loss of detail or information. These are also referred to as bit-preserving or reversible compression systems.

    Lossy Compression

    where the aim is to obtain the best possible fidelity for a given bit-rate, or to minimise the bit-rate needed to achieve a given fidelity measure. Video and audio compression


    If an image is compressed it clearly needs to be uncompressed (decoded) before it can be viewed/listened to. Some processing of data may be possible in encoded form, however.

    Lossless compression frequently involves some form of entropy encoding and is based on information theoretic techniques (see next fig.).

    Lossy compression uses source encoding techniques that may involve transform encoding, differential encoding or vector quantisation (see next fig.).


    Lossless Compression Algorithms

    (Repetitive Sequence Suppression)

    Simple Repetition Suppression

    If a series of n successive tokens appears in a sequence, we can replace these with a single token and a count of the number of occurrences. We usually need a special flag to denote when the repeated token appears. For example

    89400000000000000000000000000000000

    can be replaced with

    894f32

    where f is the flag for zero.

    Compression savings depend on the content of the data.
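    The scheme can be sketched in a few lines of Python. This is an illustration only: the digit-string representation, the flag character and the "only when shorter" threshold are choices of this example, not part of the slide.

```python
def zero_suppress(digits, flag="f"):
    """Replace each run of zeros with flag + run length, when that is shorter."""
    out, i = "", 0
    while i < len(digits):
        if digits[i] != "0":
            out += digits[i]
            i += 1
            continue
        j = i
        while j < len(digits) and digits[j] == "0":
            j += 1                          # find the end of the zero run
        run = j - i
        encoded = flag + str(run)
        # Only encode the run if doing so actually saves space.
        out += encoded if len(encoded) < run else "0" * run
        i = j
    return out

print(zero_suppress("894" + "0" * 32))  # -> 894f32
```

    Note that short runs are left alone: replacing "00" with "f2" would save nothing.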


    Applications of this simple compression technique include:

    Suppression of zeros in a file (Zero Length Suppression)

    Silence in audio data, pauses in conversation, etc.

    Bitmaps

    Blanks in text or program source files

    Backgrounds in images

    Other regular image or data tokens


    Run-length Encoding

    This encoding method is frequently applied to images (or pixels in a scan line). It is a small compression component used in JPEG compression.

    In this instance, sequences of image elements X1, X2, ..., Xn are mapped to pairs (c1, l1), (c2, l2), ..., (cn, ln), where ci represents the image intensity or colour and li the length of the ith run of pixels (not dissimilar to zero length suppression above).
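    The mapping can be sketched as follows (a minimal illustration; the list-of-pairs output format is this example's choice):

```python
def run_length_encode(pixels):
    """Map a scan line X1..Xn to pairs (c_i, l_i): pixel value and run length."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1] = (p, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((p, 1))               # start a new run
    return runs

print(run_length_encode([255, 255, 255, 0, 0, 7]))  # -> [(255, 3), (0, 2), (7, 1)]
```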


    Lossless Compression Algorithms

    (Pattern Substitution)

    This is a simple form of statistical encoding. Here we substitute a frequently repeating pattern with a code. The code is shorter than the pattern, giving us compression.

    A simple pattern substitution scheme could employ a predefined code (for example, replace all occurrences of 'The' with the code '&').


    More typically, tokens are assigned according to the frequency of occurrence of patterns:

    Count occurrences of tokens

    Sort in descending order

    Assign some symbols to the highest-count tokens

    A predefined symbol table may be used, i.e. assign code i to token i. However, it is more usual to dynamically assign codes to tokens. The entropy encoding schemes below basically attempt to decide the optimum assignment of codes to achieve the best compression.
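    The count/sort/assign steps can be sketched like this (the token stream and the symbol set are made up for illustration):

```python
from collections import Counter

def assign_codes(tokens, symbols):
    """Count token occurrences, sort in descending order, and assign the
    available symbols to the highest-count tokens."""
    counts = Counter(tokens)
    ranked = [tok for tok, _ in counts.most_common()]  # descending frequency
    return {tok: sym for tok, sym in zip(ranked, symbols)}

table = assign_codes("the cat and the dog and the bird".split(), ["&", "#", "@"])
print(table["the"])  # -> & (most frequent token gets the first symbol)
```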


    Lossless Compression Algorithms

    (Entropy Encoding)

    Lossless compression frequently involves some form of entropy encoding and is based on information theoretic techniques; Shannon is the father of information theory.


    The Shannon-Fano Algorithm

    This is a basic information theoretic

    algorithm. A simple example will be used to

    illustrate the algorithm:

    Symbol   A   B   C   D   E
    Count   15   7   6   6   5


    Encoding for the Shannon-Fano Algorithm:

    A top-down approach

    1. Sort symbols according to their frequencies/probabilities, e.g., ABCDE.

    2. Recursively divide into two parts, each with approximately the same number of counts.
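    The two steps can be sketched in Python. One detail is left open by the slide: where exactly to split. This sketch assumes we choose the split point that best balances the counts of the two halves.

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, count) pairs, sorted by count descending.
    Returns a dict mapping each symbol to its binary code string."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(count for _, count in symbols)
    # Step 2: find the split where the two halves' counts are most balanced.
    split, running, best_diff = 1, 0, total
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(total - 2 * running)
        if diff < best_diff:
            best_diff, split = diff, i
    codes = {}
    for sym, code in shannon_fano(symbols[:split]).items():
        codes[sym] = "0" + code          # left half gets prefix 0
    for sym, code in shannon_fano(symbols[split:]).items():
        codes[sym] = "1" + code          # right half gets prefix 1
    return codes

# Step 1: the slide's example table, already sorted by frequency.
codes = shannon_fano([("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)])
print(codes)  # -> A: 00, B: 01, C: 10, D: 110, E: 111
```

    For the example table this first splits {A, B} (count 22) from {C, D, E} (count 17), then recurses within each half.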


    Huffman Coding

    Huffman coding is based on the frequency of occurrence of a data item (pixel in images). The principle is to use a lower number of bits to encode the data that occurs more frequently. Codes are stored in a Code Book, which may be constructed for each image or for a set of images. In all cases the code book plus encoded data must be transmitted to enable decoding.


    The Huffman algorithm is now briefly summarised:

    A bottom-up approach

    1. Initialization: put all nodes in an OPEN list and keep it sorted at all times (e.g., ABCDE).

    2. Repeat until the OPEN list has only one node left:

    (a) From OPEN pick the two nodes having the lowest frequencies/probabilities and create a parent node for them.

    (b) Assign the sum of the children's frequencies/probabilities to the parent node and insert it into OPEN.

    (c) Assign code 0, 1 to the two branches of the tree, and delete the children from OPEN.
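    The algorithm can be sketched with a priority queue standing in for the sorted OPEN list (a common implementation choice, not mandated by the slide):

```python
import heapq

def huffman_codes(freqs):
    """freqs: dict symbol -> count. Returns dict symbol -> binary code string."""
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # the two lowest-frequency nodes
        f2, _, c2 = heapq.heappop(heap)
        # Step (c): prefix 0 on one branch, 1 on the other, then merge
        # the children into a parent carrying the summed frequency.
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5})
print(codes)  # A gets a 1-bit code; B, C, D, E each get 3-bit codes
```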


    The following points are worth noting about the above algorithm:

    Decoding for the above two algorithms is trivial as long as the coding table (the statistics) is sent before the data. (There is a bit overhead for sending this, negligible if the data file is big.)

    Unique Prefix Property: no code is a prefix of any other code (all symbols are at the leaf nodes) -> great for the decoder, unambiguous.

    If prior statistics are available and accurate, then Huffman coding is very good.


    Huffman Coding of Images

    In order to encode images:

    Divide the image up into 8x8 blocks

    Each block is a symbol to be coded

    Compute Huffman codes for the set of blocks

    Encode blocks accordingly
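    The block-splitting step can be sketched as follows, representing the image as a plain 2D list (an assumption of this example). Each block is built as a tuple of tuples so that it is hashable and can serve directly as a symbol for the Huffman coder.

```python
def image_blocks(image, n=8):
    """Split a 2D list (H x W, both multiples of n) into n x n block symbols."""
    h, w = len(image), len(image[0])
    blocks = []
    for r in range(0, h, n):
        for c in range(0, w, n):
            block = tuple(tuple(image[r + i][c + j] for j in range(n))
                          for i in range(n))
            blocks.append(block)   # hashable, so usable as a Huffman symbol
    return blocks

flat_image = [[0] * 16 for _ in range(8)]   # a uniform 8 x 16 test image
print(len(image_blocks(flat_image)))        # -> 2
```

    On the uniform test image both blocks are identical, so a single short Huffman code would cover the whole image.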


    Adaptive Huffman Coding

    The basic Huffman algorithm has been extended, for the following reasons:

    (a) The previous algorithms require statistical knowledge which is often not available (e.g., live audio, video).

    (b) Even when it is available, it could be a heavy overhead, especially when many tables have to be sent when a non-order-0 model is used, i.e. taking into account the impact of the previous symbol on the probability of the current symbol (e.g., "qu" often come together, ...).

    The solution is to use adaptive algorithms, e.g. Adaptive Huffman coding (the idea is applicable to other adaptive compression algorithms).


    Arithmetic Coding

    Huffman coding and the like use an integer number (k) of bits for each symbol, hence k is never less than 1. Sometimes, e.g., when sending a 1-bit image, compression becomes impossible.

    The idea: map all possible length 2, 3, ... messages to intervals in the range [0..1] (in general, we need about -log2 p bits to represent an interval of size p).

    To encode a message, just send enough bits of a binary fraction that uniquely specifies the interval.
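    A toy floating-point sketch of the interval idea (real arithmetic coders use integer arithmetic and incremental renormalisation to avoid precision limits):

```python
def arith_encode(message, probs):
    """Toy arithmetic encoder: narrow the interval [low, high) once per symbol."""
    cum, start = {}, 0.0
    for s, p in probs.items():       # cumulative start of each sub-interval
        cum[s] = start
        start += p
    low, high = 0.0, 1.0
    for s in message:
        width = high - low
        high = low + width * (cum[s] + probs[s])
        low = low + width * cum[s]
    return low, high

def bits_for_interval(low, high):
    """Shortest binary fraction whose dyadic interval fits inside [low, high)."""
    v, width, bits = 0.0, 1.0, ""
    while not (low <= v and v + width <= high):
        width /= 2
        if v + width <= low:
            v += width
            bits += "1"
        else:
            bits += "0"
    return bits

low, high = arith_encode("ab", {"a": 0.5, "b": 0.5})
print(bits_for_interval(low, high))  # -> 01 (the interval [0.25, 0.5))
```

    The message "ab" has probability 0.25, and indeed -log2(0.25) = 2 bits suffice.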


    Problem: how do we determine the probabilities?

    A simple idea is to use an adaptive model: start with a guess of the symbol frequencies, then update the frequency with each new symbol.

    Another idea is to take account of intersymbol probabilities, e.g., Prediction by Partial Matching.
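    The adaptive-model idea can be sketched as follows, assuming an initial count of 1 per symbol as the "guess" (that initial value is this example's choice):

```python
def adaptive_probs(stream, alphabet):
    """Start with a uniform guess (count 1 per symbol) and update the
    frequency after each symbol; yields (symbol, probability-before-update).
    Because updates happen after coding each symbol, a decoder can
    reconstruct the identical sequence of models."""
    counts = {s: 1 for s in alphabet}
    for sym in stream:
        total = sum(counts.values())
        yield sym, counts[sym] / total
        counts[sym] += 1

model = list(adaptive_probs("aab", "ab"))
print(model[0])  # -> ('a', 0.5): both symbols start equally likely
```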


    Lempel-Ziv-Welch (LZW) Algorithm

    The LZW algorithm is a very common compression technique.

    Suppose we want to encode the Oxford Concise English Dictionary, which contains about 159,000 entries. Why not just transmit each word as an 18-bit number?


    Problems:

    Too many bits,

    everyone needs a dictionary,

    only works for English text.

    Solution: find a way to build the dictionary adaptively.


    The original methods are due to Ziv and Lempel in 1977 and 1978. Terry Welch improved the scheme in 1984 (called LZW compression).

    It is used in UNIX compress - a 1D token stream (similar to below).

    It is used in GIF compression - 2D window tokens (treating the image as with Huffman coding above).


    The LZW Compression Algorithm

    can be summarised as follows:

    w = NIL;
    while ( read a character k )
    {
        if wk exists in the dictionary
            w = wk;
        else
        {
            add wk to the dictionary;
            output the code for w;
            w = k;
        }
    }
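    In runnable form (operating on byte strings, with the dictionary initialised to all 256 single-byte entries - a standard choice this sketch assumes):

```python
def lzw_encode(data):
    """LZW compression of a byte string; returns a list of integer codes."""
    dictionary = {bytes([i]): i for i in range(256)}  # all single bytes
    w, codes = b"", []
    for byte in data:
        wk = w + bytes([byte])
        if wk in dictionary:
            w = wk                            # grow the current phrase
        else:
            dictionary[wk] = len(dictionary)  # add wk to the dictionary
            codes.append(dictionary[w])       # output the code for w
            w = bytes([byte])                 # w = k
    if w:
        codes.append(dictionary[w])           # flush the final phrase
    return codes

codes = lzw_encode(b"TOBEORNOTTOBEORTOBEORNOT")
print(len(codes))  # -> 16 codes for 24 input bytes
```

    Repeated phrases such as "TO", "BE" and "TOB" are emitted as single codes the second time they occur.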


    The LZW Decompression Algorithm

    is as follows:

    read a character k;
    output k;
    w = k;
    while ( read a character k )
    /* k could be a character or a code. */
    {
        entry = dictionary entry for k;
        output entry;
        add w + entry[0] to the dictionary;
        w = entry;
    }
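    In runnable form, including one special case the pseudocode on the slide glosses over: the decoder can receive a code that is not yet in its dictionary (it refers to the phrase being built at that very moment), in which case the entry must be w plus the first character of w.

```python
def lzw_decode(codes):
    """Inverse of LZW compression: rebuild the dictionary while decoding."""
    dictionary = {i: bytes([i]) for i in range(256)}  # all single bytes
    w = dictionary[codes[0]]
    result = bytearray(w)
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        else:
            entry = w + w[:1]       # special case: code not yet defined
        result += entry
        dictionary[len(dictionary)] = w + entry[:1]  # add w + entry[0]
        w = entry
    return bytes(result)
```

    Decoding the code stream produced for "TOBEORNOTTOBEORTOBEORNOT" recovers the original text; encoder and decoder build identical dictionaries without the dictionary ever being transmitted.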
