Mult 8 Compression


8/13/2019 Mult 8 Compression

    8. Compression


    Video and Audio Compression

    Video and audio files are very large. Unless we develop and maintain very high bandwidth networks (gigabytes per second or more), we have to compress the data. Relying on higher bandwidths is not a good option -

    M25 Syndrome: traffic needs ever increase and will adapt to swamp the current limit, whatever this is.

    Compression therefore becomes part of the representation or coding scheme of the popular audio, image and video formats.


    What is Compression?

    Compression basically exploits redundancy in the data:

    Temporal - in 1D data, 1D signals, audio, etc.

    Spatial - correlation between neighbouring pixels or data items.

    Spectral - correlation between colour or luminance components. This uses the frequency domain to exploit relationships between the frequency of change in the data.

    Psycho-visual - exploits perceptual properties of the human visual system.


    Compression can be categorised in two broad ways:

    Lossless Compression

    where data is compressed and can be reconstituted (uncompressed) without loss of detail or information. These are also referred to as bit-preserving or reversible compression systems.

    Lossy Compression

    where the aim is to obtain the best possible fidelity for a given bit-rate, or to minimise the bit-rate needed to achieve a given fidelity measure. Video and audio compression


    If an image is compressed it clearly needs to be uncompressed (decoded) before it can be viewed/listened to. Some processing of data may be possible in encoded form, however.

    Lossless compression frequently involves some form of entropy encoding and is based on information theoretic techniques (see next fig.).

    Lossy compression uses source encoding techniques that may involve transform encoding, differential encoding or vector quantisation (see next fig.).


    Lossless Compression Algorithms

    (Repetitive Sequence Suppression)

    Simple Repetition Suppression

    If a series of n successive tokens appears in a sequence, we can replace these with a single token and a count of the number of occurrences. We usually need a special flag to denote when the repeated token appears. For example

    89400000000000000000000000000000000

    can be replaced with

    894f32

    where f is the flag for zero.

    Compression savings depend on the content of the data.
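    The scheme can be sketched in a few lines of Python. This is an illustration only: the digit-string representation, the flag character and the "only when shorter" threshold are choices of this example, not part of the slide.

```python
def zero_suppress(digits, flag="f"):
    """Replace each run of zeros with flag + run length, when that is shorter."""
    out, i = "", 0
    while i < len(digits):
        if digits[i] != "0":
            out += digits[i]
            i += 1
            continue
        j = i
        while j < len(digits) and digits[j] == "0":
            j += 1                          # find the end of the zero run
        run = j - i
        encoded = flag + str(run)
        # Only encode the run if doing so actually saves space.
        out += encoded if len(encoded) < run else "0" * run
        i = j
    return out

print(zero_suppress("894" + "0" * 32))  # -> 894f32
```

    Note that short runs are left alone: replacing "00" with "f2" would save nothing.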


    Applications of this simple compression technique include:

    Suppression of zeros in a file (Zero Length Suppression)

    Silence in audio data, pauses in conversation, etc.

    Bitmaps

    Blanks in text or program source files

    Backgrounds in images

    Other regular image or data tokens


    Run-length Encoding

    This encoding method is frequently applied to images (or pixels in a scan line). It is a small compression component used in JPEG compression.

    In this instance, sequences of image elements X1, X2, ..., Xn are mapped to pairs (c1, l1), (c2, l2), ..., (cn, ln), where ci represents the image intensity or colour and li the length of the ith run of pixels (not dissimilar to zero length suppression above).
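    The mapping can be sketched as follows (a minimal illustration; the list-of-pairs output format is this example's choice):

```python
def run_length_encode(pixels):
    """Map a scan line X1..Xn to pairs (c_i, l_i): pixel value and run length."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1] = (p, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((p, 1))               # start a new run
    return runs

print(run_length_encode([255, 255, 255, 0, 0, 7]))  # -> [(255, 3), (0, 2), (7, 1)]
```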


    Lossless Compression Algorithms

    (Pattern Substitution)

    This is a simple form of statistical encoding. Here we substitute a frequently repeating pattern with a code. The code is shorter than the pattern, giving us compression.

    A simple pattern substitution scheme could employ a predefined code (for example, replace all occurrences of 'The' with the code '&').


    More typically, tokens are assigned according to the frequency of occurrence of patterns:

    Count occurrences of tokens

    Sort in descending order

    Assign some symbols to the highest-count tokens

    A predefined symbol table may be used, i.e. assign code i to token i. However, it is more usual to dynamically assign codes to tokens. The entropy encoding schemes below basically attempt to decide the optimum assignment of codes to achieve the best compression.
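    The count/sort/assign steps can be sketched like this (the token stream and the symbol set are made up for illustration):

```python
from collections import Counter

def assign_codes(tokens, symbols):
    """Count token occurrences, sort in descending order, and assign the
    available symbols to the highest-count tokens."""
    counts = Counter(tokens)
    ranked = [tok for tok, _ in counts.most_common()]  # descending frequency
    return {tok: sym for tok, sym in zip(ranked, symbols)}

table = assign_codes("the cat and the dog and the bird".split(), ["&", "#", "@"])
print(table["the"])  # -> & (most frequent token gets the first symbol)
```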


    Lossless Compression Algorithms

    (Entropy Encoding)

    Lossless compression frequently involves some form of entropy encoding and is based on information theoretic techniques; Shannon is the father of information theory.


    The Shannon-Fano Algorithm

    This is a basic information theoretic

    algorithm. A simple example will be used to

    illustrate the algorithm:

    Symbol   A   B   C   D   E
    Count   15   7   6   6   5


    Encoding for the Shannon-Fano Algorithm:

    A top-down approach

    1. Sort symbols according to their frequencies/probabilities, e.g., ABCDE.

    2. Recursively divide into two parts, each with approximately the same number of counts.
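    The two steps can be sketched in Python. One detail is left open by the slide: where exactly to split. This sketch assumes we choose the split point that best balances the counts of the two halves.

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, count) pairs, sorted by count descending.
    Returns a dict mapping each symbol to its binary code string."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(count for _, count in symbols)
    # Step 2: find the split where the two halves' counts are most balanced.
    split, running, best_diff = 1, 0, total
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(total - 2 * running)
        if diff < best_diff:
            best_diff, split = diff, i
    codes = {}
    for sym, code in shannon_fano(symbols[:split]).items():
        codes[sym] = "0" + code          # left half gets prefix 0
    for sym, code in shannon_fano(symbols[split:]).items():
        codes[sym] = "1" + code          # right half gets prefix 1
    return codes

# Step 1: the slide's example table, already sorted by frequency.
codes = shannon_fano([("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)])
print(codes)  # -> A: 00, B: 01, C: 10, D: 110, E: 111
```

    For the example table this first splits {A, B} (count 22) from {C, D, E} (count 17), then recurses within each half.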


    Huffman Coding

    Huffman coding is based on the frequency of occurrence of a data item (pixel in images). The principle is to use a lower number of bits to encode the data that occurs more frequently. Codes are stored in a Code Book, which may be constructed for each image or for a set of images. In all cases the code book plus encoded data must be transmitted to enable decoding.


    The Huffman algorithm is now briefly summarised:

    A bottom-up approach

    1. Initialization: put all nodes in an OPEN list and keep it sorted at all times (e.g., ABCDE).

    2. Repeat until the OPEN list has only one node left:

    (a) From OPEN pick the two nodes having the lowest frequencies/probabilities and create a parent node for them.

    (b) Assign the sum of the children's frequencies/probabilities to the parent node and insert it into OPEN.

    (c) Assign code 0, 1 to the two branches of the tree, and delete the children from OPEN.
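    The algorithm can be sketched with a priority queue standing in for the sorted OPEN list (a common implementation choice, not mandated by the slide):

```python
import heapq

def huffman_codes(freqs):
    """freqs: dict symbol -> count. Returns dict symbol -> binary code string."""
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # the two lowest-frequency nodes
        f2, _, c2 = heapq.heappop(heap)
        # Step (c): prefix 0 on one branch, 1 on the other, then merge
        # the children into a parent carrying the summed frequency.
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5})
print(codes)  # A gets a 1-bit code; B, C, D, E each get 3-bit codes
```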


    The following points are worth noting about the above algorithm:

    Decoding for the above two algorithms is trivial as long as the coding table (the statistics) is sent before the data. (There is a bit overhead for sending this, negligible if the data file is big.)

    Unique Prefix Property: no code is a prefix of any other code (all symbols are at the leaf nodes) -> great for the decoder, unambiguous.

    If prior statistics are available and accurate, then Huffman coding is very good.


    Huffman Coding of Images

    In order to encode images:

    Divide the image up into 8x8 blocks

    Each block is a symbol to be coded

    Compute Huffman codes for the set of blocks

    Encode blocks accordingly
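    The block-splitting step can be sketched as follows, representing the image as a plain 2D list (an assumption of this example). Each block is built as a tuple of tuples so that it is hashable and can serve directly as a symbol for the Huffman coder.

```python
def image_blocks(image, n=8):
    """Split a 2D list (H x W, both multiples of n) into n x n block symbols."""
    h, w = len(image), len(image[0])
    blocks = []
    for r in range(0, h, n):
        for c in range(0, w, n):
            block = tuple(tuple(image[r + i][c + j] for j in range(n))
                          for i in range(n))
            blocks.append(block)   # hashable, so usable as a Huffman symbol
    return blocks

flat_image = [[0] * 16 for _ in range(8)]   # a uniform 8 x 16 test image
print(len(image_blocks(flat_image)))        # -> 2
```

    On the uniform test image both blocks are identical, so a single short Huffman code would cover the whole image.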


    Adaptive Huffman Coding

    The basic Huffman algorithm has been extended, for the following reasons:

    (a) The previous algorithms require statistical knowledge which is often not available (e.g., live audio, video).

    (b) Even when it is available, it could be a heavy overhead, especially when many tables have to be sent when a non-order-0 model is used, i.e. taking into account the impact of the previous symbol on the probability of the current symbol (e.g., "qu" often come together, ...).

    The solution is to use adaptive algorithms, e.g. Adaptive Huffman coding (the idea is applicable to other adaptive compression algorithms).


    Arithmetic Coding

    Huffman coding and the like use an integer number (k) of bits for each symbol, hence k is never less than 1. Sometimes, e.g., when sending a 1-bit image, compression becomes impossible.

    The idea: map all possible length 2, 3, ... messages to intervals in the range [0..1] (in general, we need about -log2 p bits to represent an interval of size p).

    To encode a message, just send enough bits of a binary fraction that uniquely specifies the interval.
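    A toy floating-point sketch of the interval idea (real arithmetic coders use integer arithmetic and incremental renormalisation to avoid precision limits):

```python
def arith_encode(message, probs):
    """Toy arithmetic encoder: narrow the interval [low, high) once per symbol."""
    cum, start = {}, 0.0
    for s, p in probs.items():       # cumulative start of each sub-interval
        cum[s] = start
        start += p
    low, high = 0.0, 1.0
    for s in message:
        width = high - low
        high = low + width * (cum[s] + probs[s])
        low = low + width * cum[s]
    return low, high

def bits_for_interval(low, high):
    """Shortest binary fraction whose dyadic interval fits inside [low, high)."""
    v, width, bits = 0.0, 1.0, ""
    while not (low <= v and v + width <= high):
        width /= 2
        if v + width <= low:
            v += width
            bits += "1"
        else:
            bits += "0"
    return bits

low, high = arith_encode("ab", {"a": 0.5, "b": 0.5})
print(bits_for_interval(low, high))  # -> 01 (the interval [0.25, 0.5))
```

    The message "ab" has probability 0.25, and indeed -log2(0.25) = 2 bits suffice.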


    Problem: how do we determine the probabilities?

    A simple idea is to use an adaptive model: start with a guess of the symbol frequencies, then update the frequency with each new symbol.

    Another idea is to take account of intersymbol probabilities, e.g., Prediction by Partial Matching.
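    The adaptive-model idea can be sketched as follows, assuming an initial count of 1 per symbol as the "guess" (that initial value is this example's choice):

```python
def adaptive_probs(stream, alphabet):
    """Start with a uniform guess (count 1 per symbol) and update the
    frequency after each symbol; yields (symbol, probability-before-update).
    Because updates happen after coding each symbol, a decoder can
    reconstruct the identical sequence of models."""
    counts = {s: 1 for s in alphabet}
    for sym in stream:
        total = sum(counts.values())
        yield sym, counts[sym] / total
        counts[sym] += 1

model = list(adaptive_probs("aab", "ab"))
print(model[0])  # -> ('a', 0.5): both symbols start equally likely
```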


    Lempel-Ziv-Welch (LZW) Algorithm

    The LZW algorithm is a very common compression technique.

    Suppose we want to encode the Oxford Concise English Dictionary, which contains about 159,000 entries. Why not just transmit each word as an 18-bit number?


    Problems:

    Too many bits,

    everyone needs a dictionary,

    only works for English text.

    Solution: find a way to build the dictionary adaptively.


    The original methods are due to Ziv and Lempel in 1977 and 1978. Terry Welch improved the scheme in 1984 (called LZW compression).

    It is used in UNIX compress - a 1D token stream (similar to below).

    It is used in GIF compression - 2D window tokens (treating the image as with Huffman coding above).


    The LZW Compression Algorithm

    can be summarised as follows:

    w = NIL;
    while ( read a character k )
    {
        if wk exists in the dictionary
            w = wk;
        else
        {
            add wk to the dictionary;
            output the code for w;
            w = k;
        }
    }
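    In runnable form (operating on byte strings, with the dictionary initialised to all 256 single-byte entries - a standard choice this sketch assumes):

```python
def lzw_encode(data):
    """LZW compression of a byte string; returns a list of integer codes."""
    dictionary = {bytes([i]): i for i in range(256)}  # all single bytes
    w, codes = b"", []
    for byte in data:
        wk = w + bytes([byte])
        if wk in dictionary:
            w = wk                            # grow the current phrase
        else:
            dictionary[wk] = len(dictionary)  # add wk to the dictionary
            codes.append(dictionary[w])       # output the code for w
            w = bytes([byte])                 # w = k
    if w:
        codes.append(dictionary[w])           # flush the final phrase
    return codes

codes = lzw_encode(b"TOBEORNOTTOBEORTOBEORNOT")
print(len(codes))  # -> 16 codes for 24 input bytes
```

    Repeated phrases such as "TO", "BE" and "TOB" are emitted as single codes the second time they occur.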


    The LZW Decompression Algorithm

    is as follows:

    read a character k;
    output k;
    w = k;
    while ( read a character k )
    /* k could be a character or a code. */
    {
        entry = dictionary entry for k;
        output entry;
        add w + entry[0] to the dictionary;
        w = entry;
    }
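    In runnable form, including one special case the pseudocode on the slide glosses over: the decoder can receive a code that is not yet in its dictionary (it refers to the phrase being built at that very moment), in which case the entry must be w plus the first character of w.

```python
def lzw_decode(codes):
    """Inverse of LZW compression: rebuild the dictionary while decoding."""
    dictionary = {i: bytes([i]) for i in range(256)}  # all single bytes
    w = dictionary[codes[0]]
    result = bytearray(w)
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        else:
            entry = w + w[:1]       # special case: code not yet defined
        result += entry
        dictionary[len(dictionary)] = w + entry[:1]  # add w + entry[0]
        w = entry
    return bytes(result)
```

    Decoding the code stream produced for "TOBEORNOTTOBEORTOBEORNOT" recovers the original text; encoder and decoder build identical dictionaries without the dictionary ever being transmitted.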
