
Data Compression

The Encoder and PCA

Neural network techniques have been shown to be useful in the area of data compression. In general, data compression can be lossless or lossy. In the latter, some portion of the information represented is actually lost. The JPEG and MPEG (video and audio) compression standards are examples of lossy compression, whereas LZW and PackBits are lossless.

Neural net techniques can be applied to achieve both lossless and lossy compression. The following is a closer look at examples of different neural-net-based compression techniques.


The Encoder

• Self-supervised backpropagation
  - The input is reproduced on the output
  - Hidden layer compresses data
  - Only hidden layer outputs transmitted
  - Output layer used for decoding

The encoder is a multi-layer perceptron, trained to act as an autoassociator using backpropagation.


The Encoder

[Diagram: the encoder network; the hidden-layer outputs are the transmitted data, and the output layer decodes the received values.]

The net is trained to produce the same output pattern that appears on the input. This is also known as self-supervised backpropagation. The aim is to reproduce the input pattern on the output, but using as few hidden-layer neurons as possible. The output of the hidden layer then becomes the data to be transmitted. The "compressed" data is decoded at the receiver using the weights of the output layer.

The illustration shows how an n-dimensional input pattern can be transmitted using fewer than n values, since there are fewer than n hidden units.
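As a concrete illustration, the following is a minimal NumPy sketch of such an encoder. The layer sizes, sigmoid activations, learning rate and the names (Autoencoder, encode, decode, train_step) are illustrative assumptions rather than the original implementation; only the idea of reproducing the input through a narrow hidden layer comes from the text.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Autoencoder:
    """An n-h-n multilayer perceptron trained to reproduce its own input
    (self-supervised backpropagation)."""
    def __init__(self, n_in, n_hidden, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))   # input  -> hidden
        self.W2 = rng.normal(0.0, 0.1, (n_in, n_hidden))    # hidden -> output
        self.lr = lr

    def encode(self, x):
        # Hidden-layer outputs: the "compressed" code that would be transmitted.
        return sigmoid(self.W1 @ x)

    def decode(self, h):
        # Output layer reconstructs the input pattern from the received code.
        return sigmoid(self.W2 @ h)

    def train_step(self, x):
        # Forward pass, then backpropagate the reconstruction error;
        # the target pattern is the input itself.
        h = self.encode(x)
        y = self.decode(h)
        err = y - x
        delta_out = err * y * (1.0 - y)
        delta_hid = (self.W2.T @ delta_out) * h * (1.0 - h)
        self.W2 -= self.lr * np.outer(delta_out, h)
        self.W1 -= self.lr * np.outer(delta_hid, x)
        return float(np.mean(err ** 2))

At the transmitter only encode is evaluated and the hidden activations are sent; the receiver holds W2 and runs decode to recover an approximation of the original pattern.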


The Encoder

• Lossless compression
  - N orthogonal input patterns can be mapped onto log2 N hidden units
• Lossy compression
  - fewer than log2 N hidden units

It is known (Rumelhart & McClelland, 1986) that a set of N orthogonal input patterns can be mapped onto log2 N hidden units. Thus, log2 N can be taken as the theoretical minimum number of hidden units needed to achieve lossless compression.
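A quick sanity check of this figure, under the assumption that the N orthogonal patterns are one-hot vectors: 8 such patterns can be given distinct codes on log2 8 = 3 hidden units, so the mapping is invertible and nothing is lost.

import numpy as np

N = 8                            # number of orthogonal (one-hot) input patterns
patterns = np.eye(N)             # each row is one pattern
n_hidden = int(np.log2(N))       # theoretical minimum: log2(8) = 3 hidden units

# One possible lossless code: the 3-bit binary index of each pattern.
codes = [tuple((i >> b) & 1 for b in range(n_hidden)) for i in range(N)]
assert len(set(codes)) == N      # all 8 codes are distinct, so decoding is possible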


The Encoder

• Cottrell et al. (1987)
  - Image compression
• Greyscale 8-bit image, any dimensions
• Network size: 64 inputs, 64 outputs and 16 hidden units
• Image processed in 8x8 patches

An example of this approach for image compression was investigated by Cottrell et al. (1987). The aim here was to compress an image (of any size). Their approach used a network with 64 inputs (representing an 8x8 image patch), 16 hidden units and 64 outputs. Each input represented a 256-level (8-bit) grey value.


The Encoder

• Near state-of-the-art results obtained!
• 64 greyscale pixels compressed by 16 hidden units
• 150,000 training patterns
• Compression is image dependent, however

[Diagram: each non-overlapping 8x8 patch is encoded and transmitted in turn, starting with the first patch.]

The net was trained using 150,000 presentations of input taken randomly from 8x8 patches of the image. Applying the net to each non-overlapping 8x8 patch of the image, Cottrell obtained near state-of-the-art compression results! Note, however, that compression was very much tuned to the actual image compressed and that results with other kinds of images were less impressive.
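A sketch of how that training procedure might look, reusing the Autoencoder class sketched earlier; the patch sampling, pixel scaling and the stand-in image are assumptions, while the 64-16-64 architecture and the 150,000 presentations come from the text.

import numpy as np

rng = np.random.default_rng(0)

def random_patch(image, size=8):
    """Pick a random size x size patch from a 2-D greyscale image array."""
    r = rng.integers(0, image.shape[0] - size + 1)
    c = rng.integers(0, image.shape[1] - size + 1)
    return image[r:r + size, c:c + size]

# Stand-in for the real image: any 2-D array of 8-bit grey levels will do here.
image = rng.integers(0, 256, size=(256, 256)).astype(np.float64)

net = Autoencoder(n_in=64, n_hidden=16)          # 64 in, 16 hidden, 64 out
for _ in range(150_000):
    x = random_patch(image).reshape(64) / 255.0  # scale grey levels to [0, 1]
    net.train_step(x)

# Compression: each non-overlapping 8x8 patch of the image is then replaced
# by its 16 hidden-layer outputs, net.encode(patch.reshape(64) / 255.0).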


Principal Component Analysis

• PCA is dimensionality reduction
  - m-bit data
  - converted to n-bit data
  - where n < m

Another way to view data compression is to regard it as a reduction in dimensionality. That is, can a representation of a set of patterns expressed using m bits of information be adequately described using n bits, where n is less than m? The goal is to effectively represent the same data using a reduced set of features. Given a set of data, principal component analysis, as we have already seen, attempts to identify axes (or principal components) along which the data varies the most.
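A minimal NumPy sketch of this view of PCA: centre the data, take the eigenvectors of the covariance matrix with the largest eigenvalues, and project onto them. The function name and the example dimensions are illustrative.

import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X (samples x features) onto the top principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # centre the data
    cov = np.cov(Xc, rowvar=False)                 # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # largest variance first
    components = eigvecs[:, order[:n_components]]  # axes of greatest variance
    return Xc @ components, components, mean

# Example: 64-dimensional data reduced to 8 features, then (lossily) reconstructed.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))
Z, components, mean = pca_reduce(X, 8)
X_hat = Z @ components.T + mean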


PCA

• By definition
  - PCA is lossy compression
  - Reduction in the number of features used to represent data
  - Which features to keep and which to remove?


PCA

• Principal components
  - Are axes along which the data varies the most
  - 1st principal component exhibits the greatest variance
  - 2nd principal component exhibits the next greatest variance
  - Etc.

The first principal component is regarded as the axis that exhibits the greatest variance in the data. The second component is orthogonal to the first and shows the second greatest variance of the data, the third is orthogonal to the first two, and so on.


PCA

• 2nd component orthogonal to 1st
• 3rd orthogonal to 2nd
• Etc.

[Figure: scatter plot of the original data with the 1st and 2nd principal components marked; on the original axes the clusters are difficult to discriminate, along the principal components they are easier to discriminate.]


PCA

• The Hebb rule
  - Oja 1992: a single neuron can be trained to find the 1st PC
  - Sanger 1989: in general, m neurons can be trained to find the first m PCs
• The generalized Hebbian algorithm (GHA)

In terms of neural nets, a Hebb-like learning rule can be used to train a single neuron so that the weights converge to the principal component of a distribution (Oja, 1992). In general, a layer of m neurons can be trained using a "generalized Hebbian algorithm" (GHA) to find the first m principal components of a set of input data (Sanger, 1989).
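The two learning rules mentioned here can be written in a few lines. The NumPy sketch below is an assumed formulation (linear neurons, fixed learning rate), not code taken from the sources.

import numpy as np

def oja_step(w, x, lr=0.01):
    """Oja's rule: a single linear neuron whose weight vector converges to the 1st PC."""
    y = w @ x
    return w + lr * y * (x - y * w)

def gha_step(W, x, lr=0.01):
    """Sanger's generalized Hebbian algorithm (GHA): the rows of W converge,
    in order, to the first m principal components of the input distribution."""
    y = W @ x                                            # outputs of the m linear neurons
    return W + lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

# Example: two neurons recovering the leading components of correlated 2-D data.
rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 2)) * np.array([3.0, 1.0])   # more variance along axis 0
W = rng.normal(scale=0.1, size=(2, 2))
for x in data:
    W = gha_step(W, x, lr=0.001)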


PCA & Image Compression

• Haykin 1994
  - Describes the GHA for image compression
  - Example:
• Input image
  - 256 x 256, each pixel with 256 grey levels
• PCA network
  - 8 neurons, each with an 8 x 8 receptive field

Haykin describes an application of GHA for image compression.

A 256 by 256 image, where each pixel had 256 grey levels, was chosen for encoding. The image was coded using a linear feedforward network of 8 neurons, each with 64 inputs. Training was performed by presenting the net with data taken from 8x8 non-overlapping patches of the image. To allow convergence, the image was scanned from left to right and top to bottom, twice. The 8 neurons represent the first 8 eigenvectors obtained at convergence (Sanger's rule).
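Put together with the gha_step function sketched above, the training pass described here might look roughly as follows; the stand-in image and learning rate are assumptions, while the 8 neurons, 64 inputs, 8x8 non-overlapping patches and the two scans come from the text.

import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256)).astype(np.float64)  # stand-in for the real image

W = rng.normal(scale=0.1, size=(8, 64))      # 8 neurons, each with a 64-input (8x8) receptive field
for sweep in range(2):                       # two complete scans of the image
    for r in range(0, 256, 8):               # top to bottom
        for c in range(0, 256, 8):           # left to right within each row
            x = image[r:r + 8, c:c + 8].reshape(64) / 255.0
            W = gha_step(W, x, lr=0.001)
# After convergence the 8 rows of W approximate the first 8 eigenvectors of the patch data.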


PCA & Image Compression

• Haykin 1994: Processing
  - Image scanned left to right and top to bottom
  - The neurons are allowed to converge
  - The 8 neurons represent the first 8 eigenvectors


PCA & Image Compression

• Example input image

From Haykin, S., "Neural Networks - A Comprehensive Foundation", 1994

Once the weights of the network had converged, they were used to encode the image (shown above) for transmission.


PCA & Image Compression

• Encoding details
  - Each 8 x 8 block multiplied by each neuron
  - This gives 8 coefficients
  - Coefficients transmitted
• In Haykin's example, 23 bits needed
• I.e. 23 bits encoded an 8x8 patch of 8-bit pixels (512 bits)

Transmission

Each 8x8 block of the image was multiplied by the weights from each of the eight neurons (i.e. the block was applied to each neuron). This generated 8 outputs, or coefficients.

The coefficient from each neuron was transmitted. The number of bits chosen to represent each coefficient was determined by the variance of that coefficient over the whole image (i.e. a larger number of bits is needed to represent something that varies a lot rather than something that varies a little). In the example described in Haykin, this required 23 bits to code the outputs of the 8 neurons. That is, 23 bits were required to encode each 8x8 block of pixels, where each pixel was represented using 8 bits.
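In matrix terms the encoding step is a single projection per block. The uniform quantizer below is only an illustrative way of spending more bits on the higher-variance coefficients; the text specifies the variance-based bit allocation but not the quantizer itself.

import numpy as np

def encode_block(W, block):
    """Multiply an 8x8 block by each of the 8 weight vectors -> 8 coefficients."""
    return W @ block.reshape(64)

def quantize(value, bits, lo, hi):
    """Illustrative uniform quantizer: coefficients with larger variance get more bits."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    return int(np.clip(np.floor((value - lo) / step), 0, levels - 1))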


PCA & Image Compression

• The weights of the 8 neurons

From Haykin, S., "Neural Networks - A Comprehensive Foundation", 1994

The illustration above shows the weights obtained by each of the eight neurons. In the diagram, light areas depict positive weights and dark areas negative (or inhibitory) weights.


PCA & Image Compression

• Decoding details
  - Neurons used to decode the transmitted coefficients
  - Weights x coefficient, summed, reconstruct each 8 x 8 patch

Receiving (decoding)

The image was reconstructed from the transmitted coefficients using the neurons again. This time, however, the weights of each neuron were multiplied by their coefficient and then added together to reconstruct each 8x8 patch of the image.
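The reconstruction is the matching operation in reverse: each weight vector is scaled by its (dequantized) coefficient and the scaled vectors are summed, which in matrix form is simply the transpose of the weight matrix applied to the coefficient vector.

import numpy as np

def decode_block(W, coeffs):
    """Reconstruct an 8x8 patch as the coefficient-weighted sum of the weight vectors."""
    return (W.T @ coeffs).reshape(8, 8)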



PCA & Image Compression

[Diagram: an 8x8 patch of the image feeds the eight neurons (1, 2, ..., 8); the weights of each neuron represent one of the first eight principal components of the image data, obtained using Sanger's rule; the eight resulting coefficients are transmitted.]


PCA & Image Compression

• Example output image

From Haykin, S., "Neural Networks - A Comprehensive Foundation", 1994

[Figure: the input image (left) and the reconstructed output image (right).]