
Data Compression

The Encoder and PCA

Neural network techniques have been shown to be useful in the area of data compression. In general, data compression can be lossless or lossy. In the latter, some portion of the information represented is actually lost. The JPEG and MPEG (video and audio) compression standards are examples of lossy compression, whereas LZW and PackBits are lossless.

Neural net techniques can be applied to achieve both lossless and lossy compression. The following is a closer look at examples of different neural-net-based compression techniques.


The Encoder

• Self-supervised backpropagation
  - The input is reproduced on the output
  - Hidden layer compresses data
  - Only hidden layer outputs transmitted
  - Output layer used for decoding

The encoder is a multi-layer perceptron, trained to act as an autoassociator using backpropagation.


The Encoder

[Diagram: the encoder network; the hidden-layer outputs are the transmitted data, and the output layer decodes the received values.]

The net is trained to produce the same output pattern that appears on the input. This is also known as self-supervised backpropagation. The aim is to reproduce the input pattern on the output, but using as few hidden-layer neurons as possible. The output of the hidden layer then becomes the data to be transmitted. The "compressed" data is decoded at the receiver using the weights of the output layer.

The illustration shows how an n-dimensional input pattern can be transmitted using fewer than n values, since there are fewer than n hidden units.
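As a concrete illustration, the following is a minimal NumPy sketch of such an encoder. The layer sizes, sigmoid activations, learning rate and the names (Autoencoder, encode, decode, train_step) are illustrative assumptions rather than the original implementation; only the idea of reproducing the input through a narrow hidden layer comes from the text.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Autoencoder:
    """An n-h-n multilayer perceptron trained to reproduce its own input
    (self-supervised backpropagation)."""
    def __init__(self, n_in, n_hidden, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))   # input  -> hidden
        self.W2 = rng.normal(0.0, 0.1, (n_in, n_hidden))    # hidden -> output
        self.lr = lr

    def encode(self, x):
        # Hidden-layer outputs: the "compressed" code that would be transmitted.
        return sigmoid(self.W1 @ x)

    def decode(self, h):
        # Output layer reconstructs the input pattern from the received code.
        return sigmoid(self.W2 @ h)

    def train_step(self, x):
        # Forward pass, then backpropagate the reconstruction error;
        # the target pattern is the input itself.
        h = self.encode(x)
        y = self.decode(h)
        err = y - x
        delta_out = err * y * (1.0 - y)
        delta_hid = (self.W2.T @ delta_out) * h * (1.0 - h)
        self.W2 -= self.lr * np.outer(delta_out, h)
        self.W1 -= self.lr * np.outer(delta_hid, x)
        return float(np.mean(err ** 2))

At the transmitter only encode is evaluated and the hidden activations are sent; the receiver holds W2 and runs decode to recover an approximation of the original pattern.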


The Encoder

• Lossless compression
  - N orthogonal input patterns can be mapped onto log2 N hidden units
• Lossy compression
  - fewer than log2 N hidden units

It is known (Rumelhart & McClelland, 1986) that a set of N orthogonal input patterns can be mapped onto log2 N hidden units. Thus, log2 N can be taken as the theoretical minimum number of hidden units needed to achieve lossless compression.
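A quick sanity check of this figure, under the assumption that the N orthogonal patterns are one-hot vectors: 8 such patterns can be given distinct codes on log2 8 = 3 hidden units, so the mapping is invertible and nothing is lost.

import numpy as np

N = 8                            # number of orthogonal (one-hot) input patterns
patterns = np.eye(N)             # each row is one pattern
n_hidden = int(np.log2(N))       # theoretical minimum: log2(8) = 3 hidden units

# One possible lossless code: the 3-bit binary index of each pattern.
codes = [tuple((i >> b) & 1 for b in range(n_hidden)) for i in range(N)]
assert len(set(codes)) == N      # all 8 codes are distinct, so decoding is possible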


The Encoder

• Cottrell et al. (1987)
  - Image compression
• Greyscale 8-bit image, any dimensions
• Network size: 64 inputs, 64 outputs and 16 hidden units
• Image processed in 8x8 patches

An example of this approach for image compression was investigated by Cottrell et al. (1987). The aim here was to compress an image (of any size). Their approach used a network with 64 inputs (representing an 8x8 image patch), 16 hidden units and 64 outputs. Each input represented a 256-level (8-bit) grey value.


The Encoder

• Near state-of-the-art results obtained!
• 64 greyscale pixels compressed by 16 hidden units
• 150,000 training patterns
• Compression is image dependent, however

[Diagram: each non-overlapping 8x8 patch is encoded and transmitted in turn, starting with the first patch.]

The net was trained using 150,000 presentations of input taken randomly from 8x8 patches of the image. Applying the net to each non-overlapping 8x8 patch of the image, Cottrell obtained near state-of-the-art compression results! Note, however, that compression was very much tuned to the actual image compressed and that results with other kinds of images were less impressive.
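A sketch of how that training procedure might look, reusing the Autoencoder class sketched earlier; the patch sampling, pixel scaling and the stand-in image are assumptions, while the 64-16-64 architecture and the 150,000 presentations come from the text.

import numpy as np

rng = np.random.default_rng(0)

def random_patch(image, size=8):
    """Pick a random size x size patch from a 2-D greyscale image array."""
    r = rng.integers(0, image.shape[0] - size + 1)
    c = rng.integers(0, image.shape[1] - size + 1)
    return image[r:r + size, c:c + size]

# Stand-in for the real image: any 2-D array of 8-bit grey levels will do here.
image = rng.integers(0, 256, size=(256, 256)).astype(np.float64)

net = Autoencoder(n_in=64, n_hidden=16)          # 64 in, 16 hidden, 64 out
for _ in range(150_000):
    x = random_patch(image).reshape(64) / 255.0  # scale grey levels to [0, 1]
    net.train_step(x)

# Compression: each non-overlapping 8x8 patch of the image is then replaced
# by its 16 hidden-layer outputs, net.encode(patch.reshape(64) / 255.0).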


Principal Component Analysis

• PCA is dimensionality reduction
  - m-bit data
  - converted to n-bit data
  - where n < m

Another way to view data compression is to regard it as a reduction in dimensionality. That is, can a representation of a set of patterns expressed using m bits of information be adequately described using n bits, where n is less than m? The goal is to effectively represent the same data using a reduced set of features. Given a set of data, principal component analysis, as we have already seen, attempts to identify axes (or principal components) along which the data varies the most.
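A minimal NumPy sketch of this view of PCA: centre the data, take the eigenvectors of the covariance matrix with the largest eigenvalues, and project onto them. The function name and the example dimensions are illustrative.

import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X (samples x features) onto the top principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # centre the data
    cov = np.cov(Xc, rowvar=False)                 # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # largest variance first
    components = eigvecs[:, order[:n_components]]  # axes of greatest variance
    return Xc @ components, components, mean

# Example: 64-dimensional data reduced to 8 features, then (lossily) reconstructed.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))
Z, components, mean = pca_reduce(X, 8)
X_hat = Z @ components.T + mean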


PCA

• By definition
  - PCA is lossy compression
  - Reduction in the number of features used to represent data
  - Which features to keep and which to remove?


PCA

• Principal components
  - Are axes along which the data varies the most
  - 1st principal component exhibits the greatest variance
  - 2nd principal component exhibits the next greatest variance
  - Etc.

The first principal component is regarded as the axis that exhibits the greatest variance in the data. The second component is orthogonal to the first and shows the second greatest variance of the data, the third is orthogonal to the first two, and so on.


PCA

• 2nd component orthogonal to 1st
• 3rd orthogonal to 2nd
• Etc.

[Figure: scatter plot of the original data with the 1st and 2nd principal components marked; on the original axes the clusters are difficult to discriminate, along the principal components they are easier to discriminate.]


PCA

• The Hebb rule
  - Oja 1992: a single neuron can be trained to find the 1st PC
  - Sanger 1989: in general, m neurons can be trained to find the first m PCs
• The generalized Hebbian algorithm (GHA)

In terms of neural nets, a Hebb-like learning rule can be used to train a single neuron so that the weights converge to the principal component of a distribution (Oja, 1992). In general, a layer of m neurons can be trained using a "generalized Hebbian algorithm" (GHA) to find the first m principal components of a set of input data (Sanger, 1989).
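The two learning rules mentioned here can be written in a few lines. The NumPy sketch below is an assumed formulation (linear neurons, fixed learning rate), not code taken from the sources.

import numpy as np

def oja_step(w, x, lr=0.01):
    """Oja's rule: a single linear neuron whose weight vector converges to the 1st PC."""
    y = w @ x
    return w + lr * y * (x - y * w)

def gha_step(W, x, lr=0.01):
    """Sanger's generalized Hebbian algorithm (GHA): the rows of W converge,
    in order, to the first m principal components of the input distribution."""
    y = W @ x                                            # outputs of the m linear neurons
    return W + lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

# Example: two neurons recovering the leading components of correlated 2-D data.
rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 2)) * np.array([3.0, 1.0])   # more variance along axis 0
W = rng.normal(scale=0.1, size=(2, 2))
for x in data:
    W = gha_step(W, x, lr=0.001)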


PCA & Image Compression

• Haykin 1994
  - Describes the GHA for image compression
  - Example:
• Input image
  - 256 x 256, each pixel with 256 grey levels
• PCA network
  - 8 neurons, each with an 8 x 8 receptive field

Haykin describes an application of GHA for image compression.

A 256 by 256 image, where each pixel had 256 grey levels, was chosen for encoding. The image was coded using a linear feedforward network of 8 neurons, each with 64 inputs. Training was performed by presenting the net with data taken from 8x8 non-overlapping patches of the image. To allow convergence, the image was scanned from left to right and top to bottom, twice. The 8 neurons represent the first 8 eigenvectors obtained at convergence (Sanger's rule).
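Put together with the gha_step function sketched above, the training pass described here might look roughly as follows; the stand-in image and learning rate are assumptions, while the 8 neurons, 64 inputs, 8x8 non-overlapping patches and the two scans come from the text.

import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256)).astype(np.float64)  # stand-in for the real image

W = rng.normal(scale=0.1, size=(8, 64))      # 8 neurons, each with a 64-input (8x8) receptive field
for sweep in range(2):                       # two complete scans of the image
    for r in range(0, 256, 8):               # top to bottom
        for c in range(0, 256, 8):           # left to right within each row
            x = image[r:r + 8, c:c + 8].reshape(64) / 255.0
            W = gha_step(W, x, lr=0.001)
# After convergence the 8 rows of W approximate the first 8 eigenvectors of the patch data.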


PCA & Image Compression

• Haykin 1994: Processing
  - Image scanned left to right and top to bottom
  - The neurons are allowed to converge
  - The 8 neurons represent the first 8 eigenvectors


PCA & Image Compression

• Example input image

From Haykin, S., "Neural Networks - A Comprehensive Foundation", 1994

Once the weights of the network had converged, they were used to encode the image (shown above) for transmission.


PCA & Image Compression

• Encoding details
  - Each 8 x 8 block multiplied by each neuron
  - This gives 8 coefficients
  - Coefficients transmitted
• In Haykin's example, 23 bits needed
• I.e. 23 bits encoded an 8x8 patch of 8-bit pixels (512 bits)

Transmission

Each 8x8 block of the image was multiplied by the weights from each of the eight neurons (i.e. the block was applied to each neuron). This generated 8 outputs, or coefficients.

The coefficient from each neuron was transmitted. The number of bits chosen to represent each coefficient was determined by the variance of that coefficient over the whole image (i.e. a larger number of bits is needed to represent something that varies a lot rather than something that varies a little). In the example described in Haykin, this required 23 bits to code the outputs of the 8 neurons. That is, 23 bits were required to encode each 8x8 block of pixels, where each pixel was represented using 8 bits.
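In matrix terms the encoding step is a single projection per block. The uniform quantizer below is only an illustrative way of spending more bits on the higher-variance coefficients; the text specifies the variance-based bit allocation but not the quantizer itself.

import numpy as np

def encode_block(W, block):
    """Multiply an 8x8 block by each of the 8 weight vectors -> 8 coefficients."""
    return W @ block.reshape(64)

def quantize(value, bits, lo, hi):
    """Illustrative uniform quantizer: coefficients with larger variance get more bits."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    return int(np.clip(np.floor((value - lo) / step), 0, levels - 1))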


PCA & Image Compression

• The weights of the 8 neurons

From Haykin, S., "Neural Networks - A Comprehensive Foundation", 1994

The illustration above shows the weights obtained by each of the eight neurons. In the diagram, light areas depict positive weights and dark areas negative (or inhibitory) weights.


PCA & Image Compression

• Decoding details
  - Neurons used to decode the transmitted coefficients
  - Weights x coefficient, summed, reconstruct each 8 x 8 patch

Receiving (decoding)

The image was reconstructed from the transmitted coefficients using the neurons again. This time, however, the weights of each neuron were multiplied by their coefficient and then added together to reconstruct each 8x8 patch of the image.
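The reconstruction is the matching operation in reverse: each weight vector is scaled by its (dequantized) coefficient and the scaled vectors are summed, which in matrix form is simply the transpose of the weight matrix applied to the coefficient vector.

import numpy as np

def decode_block(W, coeffs):
    """Reconstruct an 8x8 patch as the coefficient-weighted sum of the weight vectors."""
    return (W.T @ coeffs).reshape(8, 8)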



PCA & Image Compression

[Diagram: an 8x8 patch of the image feeds the eight neurons (1, 2, ..., 8); the weights of each neuron represent one of the first eight principal components of the image data, obtained using Sanger's rule; the eight resulting coefficients are transmitted.]


PCA & Image Compression

• Example output image

From Haykin, S., "Neural Networks - A Comprehensive Foundation", 1994

[Figure: the input image (left) and the reconstructed output image (right).]