Discrete Cosine Transforms - Semantic Scholar · Sinusoidal unitary transform: ~ is an invertible linear transform whose kernel describes a set of complete, orthogonal discrete cosine

DISCRETE COSINE TRANSFORMS

Jennie G. Abraham

Fall 2009, EE5355

Reference Book: THE TRANSFORM AND DATA COMPRESSION HANDBOOK, edited by K.R. Rao and P.C. Yip

4.0 Transform Introduction

In general, several characteristics are desirable for the purpose of data compression. Transforms are useful because they can provide some or all of these characteristics:

Data decorrelation: The ideal transform completely decorrelates the data in a sequence/block; i.e., it packs the most energy into the fewest coefficients. In this way, many coefficients can be discarded after quantization and prior to encoding. It is important to note that the transform operation itself does not achieve any compression. It aims at decorrelating the original data and compacting a large fraction of the signal energy into relatively few transform coefficients.
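To make energy compaction concrete, here is a minimal sketch, assuming NumPy. The helper names `dct_matrix` and `energy_in_first` and the test signal t² are illustrative choices, not from the source. It applies an orthonormal DCT-II to a smooth 16-sample signal and compares how much of the total energy the first four transform coefficients hold against the first four raw samples.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal N-point DCT-II matrix (hypothetical helper)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # a[0] = sqrt(1/N), a[k] = sqrt(2/N) otherwise
    return c

def energy_in_first(v, count):
    """Fraction of the total energy held by the first `count` entries of v."""
    e = np.abs(v) ** 2
    return e[:count].sum() / e.sum()

n = 16
t = (np.arange(n) + 0.5) / n      # sample positions inside (0, 1)
x = t ** 2                         # smooth, highly correlated test signal
X = dct_matrix(n) @ x              # transform coefficients

print("first 4 samples:", energy_in_first(x, 4))
print("first 4 DCT coefficients:", energy_in_first(X, 4))
```

The transform does not change the total energy. It only moves almost all of that energy into a few low-index coefficients, and those are the ones a coder would keep.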

Data-independent basis functions: Owing to the large statistical variations among data, the optimum transform usually depends on the data, and finding the basis functions of such a transform is a computationally intensive task. This is particularly a problem if the data blocks are highly nonstationary, which necessitates the use of more than one set of basis functions to achieve high decorrelation. Therefore, it is desirable to trade optimum performance for a transform whose basis functions are data-independent.

Fast implementation: The number of operations required for an n-point transform is generally of the order O(n^2). Some transforms have fast implementations, which reduce the number of operations to O(n log n). For a separable n × n 2-D transform, performing the row and column 1-D transforms successively (each with a fast 1-D algorithm) reduces the number of operations from O(n^4) to O(2n^2 log n).
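The row-column idea can be checked numerically. The sketch below assumes NumPy, and `dct_matrix`, `dct2_rowcol` and `dct2_direct` are hypothetical helper names. It computes the 2-D DCT of an 8 × 8 block in two ways. The first uses the full four-index kernel, which costs O(n^4) multiply-adds. The second applies 1-D transforms to the rows and then to the columns. Here the 1-D steps are plain matrix products, about O(2n^3) in total. Replacing them with a fast 1-D algorithm is what gives the O(2n^2 log n) count quoted above.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal N-point DCT-II matrix (hypothetical helper)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # a[0] = sqrt(1/N)
    return c

def dct2_rowcol(block):
    """Separable 2-D DCT: 1-D DCT of every row, then of every column."""
    c = dct_matrix(block.shape[0])
    rows_done = block @ c.T   # each row r becomes (C r^T)^T
    return c @ rows_done      # then transform each column

def dct2_direct(block):
    """Direct 2-D DCT with the full kernel K[u, v, x, y] = C[u, x] * C[v, y]."""
    c = dct_matrix(block.shape[0])
    kernel = np.einsum("ux,vy->uvxy", c, c)
    return np.einsum("uvxy,xy->uv", kernel, block)

rng = np.random.default_rng(0)
block = rng.standard_normal((8, 8))
print(np.allclose(dct2_rowcol(block), dct2_direct(block)))
```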

4.1 DCT Introduction

The discrete cosine transforms (DCTs) and discrete sine transforms (DSTs) are members of a family of sinusoidal unitary transforms. They are real, orthogonal, and separable, with fast algorithms for their computation. They are of great relevance to data compression.

Sinusoidal unitary transform: an invertible linear transform whose kernel describes a set of complete, orthogonal discrete cosine and/or sine basis functions.

E.g.: the KLT, the generalized DFT, the generalized discrete Hartley transform, and the various types of DCT and DST are members of this class of unitary transforms.

The family of discrete trigonometric transforms consists of 8 versions of the DCT (and, correspondingly, 8 versions of the DST). Each transform is identified as EVEN or ODD and as type I, II, III, or IV.

Current digital signal and image processing applications (mainly transform coding and digital filtering of signals) involve only the even types of the DCT and DST. Therefore, we consider these four even types of DCT:

DCT-I (Wang and Hunt): defined for the order N + 1.

DCT-II (Ahmed, Natarajan, and Rao): excellent energy compaction property; best approximation to the optimal KLT.

DCT-III (Ahmed, Natarajan, and Rao): inverse of DCT-II.

DCT-IV (Jain): fast implementation of the lapped orthogonal transform for efficient transform/subband coding.

4.1.2 Definitions of DCTs

Notes on the normalized even types of DCT in matrix form:

The (n, k) entry of each matrix is obtained by evaluating the right-hand side of its definition at that n and k.

N is assumed to be an integer power of 2, i.e., N = 2^m.

The subscript of a matrix denotes its order.

The superscript denotes the version number.
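The matrix definitions themselves did not survive in this copy. For reference, the commonly used normalized forms of the four even DCTs are reproduced below. These are standard textbook forms written in the notation above, not a verbatim restoration of the original equations. With this convention, C^III_N = (C^II_N)^T = (C^II_N)^{-1}.

```latex
\begin{aligned}
[C^{I}_{N+1}]_{nk} &= \sqrt{\tfrac{2}{N}}\; k_n k_k \cos\frac{nk\pi}{N}, & n,k &= 0,1,\ldots,N \\
[C^{II}_{N}]_{nk}  &= \sqrt{\tfrac{2}{N}}\; k_n \cos\frac{n(2k+1)\pi}{2N}, & n,k &= 0,1,\ldots,N-1 \\
[C^{III}_{N}]_{nk} &= \sqrt{\tfrac{2}{N}}\; k_k \cos\frac{(2n+1)k\pi}{2N}, & n,k &= 0,1,\ldots,N-1 \\
[C^{IV}_{N}]_{nk}  &= \sqrt{\tfrac{2}{N}}\; \cos\frac{(2n+1)(2k+1)\pi}{4N}, & n,k &= 0,1,\ldots,N-1 \\
k_p &= \begin{cases} 1/\sqrt{2}, & p = 0 \text{ or } p = N \\ 1, & \text{otherwise} \end{cases}
\end{aligned}
```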

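4.1.3 Mathematical Properties

DCT matrices are real and orthogonal: $\mathbf{C} = \mathbf{C}^*$ and $\mathbf{C}^{-1} = \mathbf{C}^T$.

Unitary property: being real and orthogonal, every DCT matrix is unitary, $\mathbf{C}^{-1} = (\mathbf{C}^*)^T = \mathbf{C}^T$, so the transform preserves energy ($\|\mathbf{C}\mathbf{x}\| = \|\mathbf{x}\|$).

Linearity property: $\mathbf{M}(\alpha \mathbf{g} + \beta \mathbf{f}) = \alpha \mathbf{M}\mathbf{g} + \beta \mathbf{M}\mathbf{f}$ for a matrix M, constants α and β, and vectors g and f; all DCTs are linear transforms.

The convolution-multiplication property: convolution in the spatial domain is equivalent to taking an inverse transform of the product of the forward transforms of the two data sequences (for the DCTs, this holds for symmetric convolution rather than ordinary circular convolution). The convolution-multiplication property is a powerful tool for performing digital filtering in the transform domain.

Separability property: all DCTs are separable transforms; a multidimensional transform can be decomposed into successive applications of one-dimensional (1-D) transforms in the appropriate directions.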
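Several of these properties can be checked numerically. The sketch below assumes NumPy. The helper `even_dcts` is hypothetical and builds the four normalized even DCT matrices in their commonly used textbook forms (DCT-I of order N + 1, the others of order N). It then checks that each matrix is orthogonal, that DCT-III is the transpose and hence the inverse of DCT-II, and that the transform is linear.

```python
import numpy as np

def even_dcts(n):
    """Normalized DCT-I (order n+1) and DCT-II/III/IV (order n) matrices."""
    s = np.sqrt(2.0 / n)
    p = np.arange(n + 1)
    k1 = np.ones(n + 1)
    k1[[0, n]] = 1 / np.sqrt(2)          # k_p = 1/sqrt(2) at p = 0 and p = N
    c1 = s * np.outer(k1, k1) * np.cos(np.outer(p, p) * np.pi / n)
    i = np.arange(n)
    k2 = np.ones(n)
    k2[0] = 1 / np.sqrt(2)               # for orders 0..N-1 only p = 0 is special
    c2 = s * k2[:, None] * np.cos(np.outer(i, 2 * i + 1) * np.pi / (2 * n))
    c3 = s * k2[None, :] * np.cos(np.outer(2 * i + 1, i) * np.pi / (2 * n))
    c4 = s * np.cos(np.outer(2 * i + 1, 2 * i + 1) * np.pi / (4 * n))
    return c1, c2, c3, c4

c1, c2, c3, c4 = even_dcts(8)

# Real and orthogonal: C C^T = I for every type
for c in (c1, c2, c3, c4):
    assert np.allclose(c @ c.T, np.eye(len(c)))

# DCT-III is the transpose (hence the inverse) of DCT-II
assert np.allclose(c3, c2.T) and np.allclose(c3 @ c2, np.eye(8))

# Linearity: M(alpha g + beta f) = alpha M g + beta M f
rng = np.random.default_rng(1)
g, f = rng.standard_normal(8), rng.standard_normal(8)
assert np.allclose(c2 @ (2.0 * g - 3.0 * f), 2.0 * (c2 @ g) - 3.0 * (c2 @ f))
print("all property checks passed")
```

DCT-I and DCT-IV are also symmetric in this form, so each of them is its own inverse.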

4.3 Relations to the KLT

The KLT is an optimal transform for data compression in a statistical sense because it

decorrelates a signal in the transform domain,

packs the most information into a few coefficients, and

minimizes the mean-square error between the reconstructed and original signals compared with other transforms.

However, the KLT is constructed from the eigenvalues and corresponding eigenvectors of the covariance matrix of the data to be transformed; it is signal-dependent, and there is no general algorithm for its fast computation.

For a first-order stationary Markov process, the family of DCTs is asymptotically equivalent to the KLT, in terms of both the transform size and the adjacent (inter-element) correlation coefficient ρ.

The performance of DCTs, which is particularly important in transform coding, is therefore judged against the KLT. For finite-length data, DCTs and DSTs provide different approximations to the KLT, and the best approximating transform varies with the value of the correlation coefficient ρ.

E.g.:

ρ       KLT reduces to
1       DCT-II (DCT-III)
0       DST-I
-1      DST-II (DST-III)

For infinite-length data, i.e., as the transform size N tends to infinity, the KLT reduces to DCT-I or DCT-IV.

This asymptotic behavior implies that DCTs and DSTs can be used as substitutes for the KLT of certain random processes.
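The ρ → 1 behavior can be illustrated with a short experiment, assuming NumPy. The sketch models a unit-variance first-order Markov process by its covariance R[i, j] = ρ^|i−j|, takes N = 8 and ρ = 0.95, and computes the KLT basis as the eigenvectors of R sorted by decreasing eigenvalue. It then measures how closely each KLT basis vector matches the DCT-II basis vector of the same index. The helper names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal N-point DCT-II matrix (hypothetical helper)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def ar1_covariance(n, rho):
    """Covariance R[i, j] = rho**|i - j| of a unit-variance first-order Markov process."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

n, rho = 8, 0.95
eigvals, eigvecs = np.linalg.eigh(ar1_covariance(n, rho))  # eigenvalues ascending
klt = eigvecs[:, ::-1].T     # rows = KLT basis vectors, largest variance first
dct = dct_matrix(n)          # rows = DCT-II basis vectors, lowest frequency first

# Eigenvector signs are arbitrary, so compare absolute inner products row by row
similarity = np.abs(np.sum(klt * dct, axis=1))
print(np.round(similarity, 4))
```

Every KLT basis vector comes out nearly parallel to the corresponding DCT-II basis vector. That is why a fixed DCT-II can stand in for the data-dependent KLT on highly correlated data.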

4.4 Relation to DFT [Question (?)]

The DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but it uses only real numbers. DCTs are equivalent to DFTs of roughly twice the length operating on real data with even symmetry. The obvious distinction between a DCT and a DFT is that the former uses only cosine functions, while the latter uses both cosines and sines (in the form of complex exponentials).

Compared with the DFT, the DCT has two main advantages:

It is a real transform, with better computational efficiency than the DFT, which by definition is a complex transform.

It does not introduce a discontinuity when imposing periodicity on the time signal. In the DFT, the time signal is truncated and assumed periodic, so a discontinuity is introduced in the time domain and corresponding artifacts are introduced in the frequency domain. Because the DCT assumes even symmetry when truncating the time signal, it introduces no such discontinuity or related artifacts.
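The effect of the boundary discontinuity can be illustrated with a minimal sketch, assuming NumPy. The test signal is a linear ramp, chosen because its periodic (DFT) extension jumps from the last sample back to the first, while its even (DCT) extension is continuous. The sketch compares the fraction of energy captured by the 4 largest coefficients of the unitary DFT and of the orthonormal DCT-II. The helper names are hypothetical. Each complex DFT coefficient carries two real numbers, so counting it as one coefficient is, if anything, generous to the DFT.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal N-point DCT-II matrix (hypothetical helper)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def top_energy_fraction(coeffs, count):
    """Fraction of total energy in the `count` largest-magnitude coefficients."""
    energy = np.sort(np.abs(coeffs) ** 2)[::-1]
    return energy[:count].sum() / energy.sum()

n = 32
x = np.arange(n, dtype=float)       # ramp: periodic extension jumps from 31 back to 0
dft = np.fft.fft(x, norm="ortho")   # unitary DFT, complex coefficients
dct = dct_matrix(n) @ x             # orthonormal DCT-II, real coefficients

dft_frac = top_energy_fraction(dft, 4)
dct_frac = top_energy_fraction(dct, 4)
print(f"DFT: {dft_frac:.4f}  DCT: {dct_frac:.4f}")
```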

4.5 Relevance of DCT-II to Data Compression

The performance of DCT-II is closest to that of the statistically optimal KLT under a number of performance criteria:

variance distribution,

energy packing efficiency,

residual correlation,

rate distortion,

maximum reducible bits …

It exhibits the desirable characteristics for data compression, namely:

o Data decorrelation

o Data-independent basis functions

o Fast implementation

The importance of DCT-II is further accentuated by its:

Superiority in bandwidth compression (redundancy reduction) of a wide range of signals.

Strong performance in bit-rate reduction.

Existence of fast algorithms for its implementation.

DCT-II and its inverse, DCT-III, have been employed in the international image/video coding standards, e.g., JPEG, MPEG, H.261, H.263, H.264…

4.6 DCT Computation

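4.6.1 DCT Definition

The (orthonormal) N-point DCT, i.e., DCT-II, of a sequence $x[n]$, $n = 0, 1, \ldots, N-1$, is

$$X[k] = a[k] \sum_{n=0}^{N-1} x[n] \cos\frac{(2n+1)k\pi}{2N}, \qquad k = 0, 1, \ldots, N-1, \qquad a[0] = \sqrt{1/N},\; a[k] = \sqrt{2/N}\ (k \ge 1)$$

4.6.2 DCT Matrix Form:

Example of a 4x4 DCT Matrix:

$$\mathbf{C}_4 \approx \begin{bmatrix} 0.5000 & 0.5000 & 0.5000 & 0.5000 \\ 0.6533 & 0.2706 & -0.2706 & -0.6533 \\ 0.5000 & -0.5000 & -0.5000 & 0.5000 \\ 0.2706 & -0.6533 & 0.6533 & -0.2706 \end{bmatrix}$$

Example of a 4x4 IDCT Matrix (the transpose of the DCT matrix):

$$\mathbf{C}_4^{T} \approx \begin{bmatrix} 0.5000 & 0.6533 & 0.5000 & 0.2706 \\ 0.5000 & 0.2706 & -0.5000 & -0.6533 \\ 0.5000 & -0.2706 & -0.5000 & 0.6533 \\ 0.5000 & -0.6533 & 0.5000 & -0.2706 \end{bmatrix}$$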

Example: A -point DCT matrix can be generated by

Assume the signal is , then its DCT transform is:

The inverse transform is:
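One way to generate an N-point DCT matrix and apply it is sketched below, assuming NumPy. The helper `dct_matrix` and the test signal x = (1, 2, 3, 4) are illustrative, not taken from the source. For N = 4, the first row of the matrix is 0.5 everywhere and the second row is about (0.6533, 0.2706, −0.2706, −0.6533). For this test signal, X[0] = 5 and X[2] = 0. The inverse transform is simply multiplication by the transpose.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal N-point DCT-II matrix: C[k, m] = a[k] cos((2m+1) k pi / (2N))."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # a[0] = sqrt(1/N), a[k] = sqrt(2/N) otherwise
    return c

c4 = dct_matrix(4)
print(np.round(c4, 4))      # 4x4 DCT matrix
print(np.round(c4.T, 4))    # 4x4 IDCT matrix (the transpose)

x = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical test signal
X = c4 @ x                            # forward DCT
x_back = c4.T @ X                     # inverse DCT
print(np.round(X, 4), np.allclose(x_back, x))
```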

4.6.3 Computation of DCT from DFT (using 2N-point FFT):

To derive the DCT of an N-point real signal sequence $x[n]$ ($n = 0, 1, \ldots, N-1$), we first construct a new sequence of $2N$ points:

$$x'[n] = \begin{cases} x[n], & 0 \le n \le N-1 \\ x[-n-1], & -N \le n \le -1 \end{cases}$$

This 2N-point sequence is assumed to repeat itself outside the range $-N \le n \le N-1$, i.e., it is periodic with period $2N$, and it is even symmetric with respect to the point $n = -1/2$:

$$x'[n] = x'[-n-1]$$

If we shift the signal to the right by 1/2, or, equivalently, shift the time axis to the left by 1/2 by defining another index $n' = n + 1/2$, then $x'[n' - 1/2]$ is even symmetric with respect to $n' = 0$. In the following we simply represent this new function by $x'[n']$.

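The DFT of this 2N-point even symmetric sequence can be found as:

$$X[k] = \frac{1}{\sqrt{2N}} \sum_{n'=-N+1/2}^{N-1/2} x'[n']\, e^{-j2\pi n'k/(2N)} = \frac{1}{\sqrt{2N}} \sum_{n'=-N+1/2}^{N-1/2} x'[n'] \cos\frac{\pi n'k}{N} - \frac{j}{\sqrt{2N}} \sum_{n'=-N+1/2}^{N-1/2} x'[n'] \sin\frac{\pi n'k}{N}$$

Since $x'[n']$ is even and $\sin(\pi n'k/N)$ is odd with respect to $n' = 0$, all terms in the second summation are odd and the summation is zero (while all terms in the first summation are even). It can also be seen that $X[k]$ is real and even, $X[-k] = X[k]$. Next, we replace $n'$ by $n + 1/2$ and get

$$X[k] = \frac{1}{\sqrt{2N}} \sum_{n=-N}^{N-1} x'[n] \cos\frac{(2n+1)k\pi}{2N}$$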

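Note that since all terms in the summation are even symmetric, only the first half of the data points need to be used:

$$X[k] = \frac{2}{\sqrt{2N}} \sum_{n=0}^{N-1} x[n] \cos\frac{(2n+1)k\pi}{2N} = \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} x[n] \cos\frac{(2n+1)k\pi}{2N}$$

Moreover, as the cosine function is even, $X[k]$ is also even, $X[-k] = X[k]$, and since $X[k + 2N] = -X[k]$ it is periodic with period $4N$. Together these give

$$X[N+k] = -X[N-k],$$

indicating that a point $X[N+k]$ ($0 < k < N$) in the second half has the same magnitude as its corresponding point $X[N-k]$ in the first half (with opposite sign), and $X[N] = 0$; i.e., the second half is redundant and therefore can be dropped.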

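Now we have the discrete cosine transform (DCT), written here with $n$ as the frequency index and $m$ as the time index:

$$X[n] = \sqrt{\frac{2}{N}} \sum_{m=0}^{N-1} x[m] \cos\frac{(2m+1)n\pi}{2N} = \sum_{m=0}^{N-1} c[n,m]\, x[m], \qquad n = 0, 1, \ldots, N-1$$

where the element in the $n$th row and $m$th column of the DCT matrix is

$$c[n,m] = \sqrt{\frac{2}{N}} \cos\frac{(2m+1)n\pi}{2N}$$

All row vectors of this DCT matrix are orthogonal and normalized except the first one ($n = 0$), whose squared norm is $\sum_{m=0}^{N-1} 2/N = 2$.

It is straightforward to show this: for $n \ge 1$ the norm of each row is unity, and the dot product of any pair of distinct rows is zero (each product term can be expressed as the sum of a pair of cosine functions, each of which sums to zero over $m = 0, \ldots, N-1$).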

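To make the DCT an orthonormal transform, we define a coefficient

$$a[n] = \begin{cases} \sqrt{1/N}, & n = 0 \\ \sqrt{2/N}, & n = 1, \ldots, N-1 \end{cases}$$

so that the DCT now becomes

$$X[n] = a[n] \sum_{m=0}^{N-1} x[m] \cos\frac{(2m+1)n\pi}{2N} = \sum_{m=0}^{N-1} c[n,m]\, x[m], \qquad n = 0, 1, \ldots, N-1$$

where $c[n,m]$ is modified with $a[n]$, $c[n,m] = a[n] \cos\frac{(2m+1)n\pi}{2N}$, which is also the component in the $n$th row and $m$th column of the $N \times N$ cosine transform matrix

$$\mathbf{C} = [\mathbf{c}_0, \mathbf{c}_1, \ldots, \mathbf{c}_{N-1}]^T$$

Here $\mathbf{c}_i^T = [c[i,0], \ldots, c[i,N-1]]$ is the $i$th row of the DCT transform matrix $\mathbf{C}$. As these row vectors are orthonormal,

$$\langle \mathbf{c}_i, \mathbf{c}_j \rangle = \delta_{ij},$$

the DCT matrix is orthogonal:

$$\mathbf{C}^{-1} = \mathbf{C}^T$$

The inverse DCT is

$$x[m] = \sum_{n=0}^{N-1} c[n,m]\, X[n] = \sum_{n=0}^{N-1} a[n]\, X[n] \cos\frac{(2m+1)n\pi}{2N}, \qquad m = 0, 1, \ldots, N-1$$

or in matrix form:

$$\mathbf{x} = \mathbf{C}^T \mathbf{X}$$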
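The 2N-point derivation above can be turned into a short routine. This is a minimal sketch assuming NumPy, and `dct_via_2n_fft` is a hypothetical name. It builds the even extension x′, takes an unnormalized 2N-point FFT, and applies the half-sample phase factor e^{−jπk/(2N)} that corresponds to the index shift n′ = n + 1/2. It keeps the real part of the first N bins and applies the 1/√(2N) scaling and the a[0] correction. The result is checked against the matrix definition.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal N-point DCT-II matrix (hypothetical helper)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def dct_via_2n_fft(x):
    """Orthonormal DCT-II computed from a 2N-point FFT of the even extension."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    v = np.concatenate([x, x[::-1]])   # x'[0..N-1], then x'[-N..-1] in FFT order
    V = np.fft.fft(v)                   # unnormalized 2N-point DFT
    k = np.arange(n)
    # The half-sample shift becomes a phase factor; the first N bins suffice
    X = np.real(np.exp(-1j * np.pi * k / (2 * n)) * V[:n]) / np.sqrt(2 * n)
    X[0] /= np.sqrt(2.0)                # a[0] = sqrt(1/N) instead of sqrt(2/N)
    return X

rng = np.random.default_rng(2)
x = rng.standard_normal(8)
print(np.allclose(dct_via_2n_fft(x), dct_matrix(8) @ x))
```

This method does not require N to be even. Its drawback is that it spends a 2N-point complex FFT on N real samples, which motivates the N-point algorithms listed next.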

4.6.4 DCT Fast Algorithms:

1. N-point DCT via 2N-point FFT

2. N-point DCT via N-point FFT

3. Recursive Fast Algorithm

4. Sparse Matrix Factors

5. Prime Factor Algorithm for DCT

6. DIT & DIF Algorithms for DCT

Fast DCT algorithm

Forward DCT

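The DCT of a sequence $x[m]$ ($m = 0, 1, \ldots, N-1$) can be implemented by FFT. First we define a new sequence $y[m]$:

$$y[m] = x[2m], \quad y[N-1-m] = x[2m+1], \qquad m = 0, 1, \ldots, N/2-1$$

Then the DCT of $x[m]$ can be written as follows (the coefficient $a[n]$ is dropped for now for simplicity):

$$X[n] = \sum_{m=0}^{N/2-1} x[2m] \cos\frac{(4m+1)n\pi}{2N} + \sum_{m=0}^{N/2-1} x[2m+1] \cos\frac{(4m+3)n\pi}{2N} = \sum_{m=0}^{N/2-1} y[m] \cos\frac{(4m+1)n\pi}{2N} + \sum_{m=0}^{N/2-1} y[N-1-m] \cos\frac{(4m+3)n\pi}{2N}$$

where the first summation is for all even terms and the second for all odd terms. For the second summation we define $m' = N-1-m$; then the limits of the summation $m = 0$ and $m = N/2-1$ become $m' = N-1$ and $m' = N/2$, and, since $4m + 3 = 4N - (4m'+1)$, the second summation can be written as

$$\sum_{m'=N/2}^{N-1} y[m'] \cos\left(2n\pi - \frac{(4m'+1)n\pi}{2N}\right) = \sum_{m'=N/2}^{N-1} y[m'] \cos\frac{(4m'+1)n\pi}{2N}$$

where the equal sign is due to the trigonometric identity $\cos(2n\pi - \theta) = \cos\theta$.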

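Now the two summations in the expression of $X[n]$ can be combined:

$$X[n] = \sum_{m=0}^{N-1} y[m] \cos\frac{(4m+1)n\pi}{2N}$$

Next, consider the DFT of $y[m]$:

$$Y[n] = \sum_{m=0}^{N-1} y[m]\, e^{-j2\pi mn/N}$$

If we multiply both sides by

$$e^{-jn\pi/(2N)} = \cos\frac{n\pi}{2N} - j \sin\frac{n\pi}{2N}$$

and take the real part of the result (keeping in mind that $y[m]$ is real), we get:

$$\mathrm{Re}\left[e^{-jn\pi/(2N)}\, Y[n]\right] = \sum_{m=0}^{N-1} y[m] \left[\cos\frac{2\pi mn}{N} \cos\frac{n\pi}{2N} - \sin\frac{2\pi mn}{N} \sin\frac{n\pi}{2N}\right] = \sum_{m=0}^{N-1} y[m] \cos\frac{(4m+1)n\pi}{2N}$$

The last equal sign is due to the trigonometric identity $\cos(\alpha + \beta) = \cos\alpha \cos\beta - \sin\alpha \sin\beta$.

This expression for $\mathrm{Re}[e^{-jn\pi/(2N)} Y[n]]$ is identical to that for $X[n]$ above; therefore we get

$$X[n] = \mathrm{Re}\left[e^{-jn\pi/(2N)}\, Y[n]\right]$$

where $Y[n]$ is the DFT of $y[m]$ (defined from $x[m]$), which can be computed using the FFT algorithm with time complexity O(N log N).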

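In summary, the fast forward DCT can be implemented in 3 steps:

Step 1: Generate a sequence $y[m]$ from the given sequence $x[m]$:

$$y[m] = x[2m], \quad y[N-1-m] = x[2m+1], \qquad m = 0, 1, \ldots, N/2-1$$

Step 2: Obtain the DFT $Y[n]$ of $y[m]$ using the FFT. (As $y[m]$ is real, $Y[n]$ is conjugate symmetric, $Y[N-n] = Y^*[n]$, and only half of the data points need be computed.)

Step 3: Obtain the DCT $X[n]$ from $Y[n]$ by

$$X[n] = a[n]\, \mathrm{Re}\left[e^{-jn\pi/(2N)}\, Y[n]\right], \qquad n = 0, 1, \ldots, N-1$$

(restoring the normalization coefficient $a[n]$).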
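The three steps translate almost line for line into code. This is a minimal sketch assuming NumPy and an even N. `fast_dct` is a hypothetical name, and the sketch uses a full complex FFT for clarity rather than exploiting the conjugate symmetry mentioned in Step 2 (NumPy's `np.fft.rfft` could be used for that). The result is compared with the direct matrix definition.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal N-point DCT-II matrix (hypothetical helper)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def fast_dct(x):
    """Orthonormal DCT-II via one N-point FFT (N assumed even)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    assert n % 2 == 0, "this sketch assumes an even length N"
    # Step 1: y[m] = x[2m], y[N-1-m] = x[2m+1]
    y = np.concatenate([x[0::2], x[1::2][::-1]])
    # Step 2: N-point FFT of the real sequence y
    Y = np.fft.fft(y)
    # Step 3: X[n] = a[n] Re[exp(-j n pi / 2N) Y[n]]
    k = np.arange(n)
    a = np.full(n, np.sqrt(2.0 / n))
    a[0] = np.sqrt(1.0 / n)
    return a * np.real(np.exp(-1j * np.pi * k / (2 * n)) * Y)

rng = np.random.default_rng(3)
x = rng.standard_normal(16)
print(np.allclose(fast_dct(x), dct_matrix(16) @ x))
```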

Inverse DCT

The most obvious way to do the inverse DCT is to reverse the order and the mathematical operations of the three steps for the forward DCT:

Step 1: Obtain $Y[n]$ from $X[n]$. In step 3 above there are N equations but 2N variables (both the real and imaginary parts of $Y[n]$). However, note that as $y[m]$ is real, the real part of its spectrum is even (N/2 + 1 independent variables) and the imaginary part is odd (N/2 - 1 independent variables). So there are only N variables, which can be obtained by solving the N equations.

Step 2: Obtain $y[m]$ from $Y[n]$ by the inverse DFT, also using the FFT, in O(N log N) complexity.

Step 3: Obtain $x[m]$ from $y[m]$ by

$$x[2m] = y[m], \quad x[2m+1] = y[N-1-m], \qquad m = 0, 1, \ldots, N/2-1$$

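However, there is a more efficient way to do the inverse DCT. Consider first the real part of the inverse DFT (taken here without the $1/N$ factor) of the sequence $a[n]\, X[n]\, e^{jn\pi/(2N)}$:

$$y[m] = \mathrm{Re}\left[\sum_{n=0}^{N-1} a[n]\, X[n]\, e^{jn\pi/(2N)}\, e^{j2\pi mn/N}\right] = \sum_{n=0}^{N-1} a[n]\, X[n] \cos\frac{(4m+1)n\pi}{2N}, \qquad m = 0, 1, \ldots, N-1$$

For $m = 0, \ldots, N/2-1$, this equation gives the inverse DCT of all even data points, $x[2m] = y[m]$, since $4m + 1 = 2(2m) + 1$. To obtain the odd data points, recall that $x[2m+1] = y[N-1-m]$ ($m = 0, \ldots, N/2-1$), so all odd data points can be obtained from the second half of the previous equation, $m = N/2, \ldots, N-1$, in reverse order.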

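In summary, we have these steps to compute the IDCT:

Step 1: Generate a sequence $Y[n]$ from the given DCT sequence $X[n]$:

$$Y[n] = a[n]\, X[n]\, e^{jn\pi/(2N)}, \qquad n = 0, 1, \ldots, N-1$$

Step 2: Obtain $y[m]$ from $Y[n]$ by the inverse DFT (without the $1/N$ factor), also using the FFT. (Only the real part need be computed.)

Step 3: Obtain $x[m]$ from $y[m]$ by

$$x[2m] = y[m], \quad x[2m+1] = y[N-1-m], \qquad m = 0, 1, \ldots, N/2-1$$

These three steps are mathematically equivalent to the steps of the first method.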
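The more efficient inverse can be sketched the same way, assuming NumPy and an even N. `fast_idct` is a hypothetical name. NumPy's `np.fft.ifft` includes the 1/N factor, so the sketch multiplies it back out to match the unscaled sum in Step 2. The round trip is checked against the matrix form of the forward DCT.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal N-point DCT-II matrix (hypothetical helper)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def fast_idct(X):
    """Inverse of the orthonormal DCT-II via one N-point inverse FFT (N assumed even)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    assert n % 2 == 0, "this sketch assumes an even length N"
    k = np.arange(n)
    a = np.full(n, np.sqrt(2.0 / n))
    a[0] = np.sqrt(1.0 / n)
    # Step 1: Y[n] = a[n] X[n] exp(j n pi / 2N)
    Y = a * X * np.exp(1j * np.pi * k / (2 * n))
    # Step 2: unscaled inverse DFT (ifft divides by N, so multiply back); keep real part
    y = n * np.real(np.fft.ifft(Y))
    # Step 3: x[2m] = y[m], x[2m+1] = y[N-1-m]
    x = np.empty(n)
    x[0::2] = y[: n // 2]
    x[1::2] = y[n // 2:][::-1]
    return x

rng = np.random.default_rng(4)
x = rng.standard_normal(16)
print(np.allclose(fast_idct(dct_matrix(16) @ x), x))
```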

Data Compression

Although representing images in digital form allows visual information to be manipulated easily in useful and novel ways, there is one potential problem with digital images: the large number of bits required to represent even a single digital image directly. The need for image compression becomes apparent when we compute the number of bits per image resulting from typical sampling and quantization schemes. Consider the amount of storage for the “Lena” digital image shown in Fig. 4.7.

The monochrome (grayscale) version of this image, with a resolution of 512 × 512 pixels at 8 bits/pixel, requires a total of 2,097,152 bits, or equivalently 262,144 bytes. The color version of the same image in RGB format (red, green, and blue color bands) with 8 bits per color band requires a total of 6,291,456 bits (= 512 × 512 × 3 × 8 bits), or 786,432 bytes. Such an image should be compressed for efficient storage or transmission.
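The storage figures above follow from multiplying width × height × bands × bits per sample. A tiny sketch of that arithmetic is below. `raw_image_bits` is a hypothetical helper that counts uncompressed bits with no header or container overhead.

```python
def raw_image_bits(width, height, bands, bits_per_sample):
    """Uncompressed size in bits (no header or container overhead)."""
    return width * height * bands * bits_per_sample

gray_bits = raw_image_bits(512, 512, 1, 8)
color_bits = raw_image_bits(512, 512, 3, 8)
print(gray_bits, gray_bits // 8)     # 2097152 262144
print(color_bits, color_bits // 8)   # 6291456 786432
```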

In order to use digital images effectively, specific techniques are needed to reduce the number of bits required for their representation. Fortunately, digital images generally contain a significant amount of redundancy (spatial, spectral, or temporal). Image data compression (the art/science of efficient coding of picture data) aims at taking advantage of this redundancy to reduce the number of bits required to represent an image. This can significantly reduce the memory needed for image storage and the channel capacity needed for image transmission.

Image compression methods can be classified into two fundamental groups: lossless and lossy.

Lossless compression:

The image reconstructed after compression is identical to the original image.

Only modest compression ratios, around 2:1 or 3:1, are achieved.

Lossy compression:

The reconstructed image contains degradations relative to the original.

Generally, more compression is obtained at the expense of more distortion.

Transform Coding Compression Scheme: [Question (?)]

The most widely used lossy compression technique is transform coding.

A general transform coding scheme involves subdividing an N × N image into smaller nonoverlapping n × n sub-image blocks and performing a unitary transform on each block. The transform operation itself does not achieve any compression. It aims at decorrelating the original data and compacting a large fraction of the signal energy into a relatively small set of transform coefficients (energy packing property). In this way, many coefficients can be discarded after quantization and prior to encoding.

In principle, the DCT introduces no loss to the source samples; it merely transforms them to a domain in which they can be encoded more efficiently.

Most practical transform coding systems are based on the DCT of types II and III, which:

Provide a good compromise between energy packing ability and computational complexity.

Have an energy packing property close to that of the optimal KLT and superior to that of most other data-independent unitary transforms. Transforms that redistribute or pack the most information into the fewest coefficients provide the best sub-image approximations and, consequently, the smallest reconstruction errors.

Use basis images that are fixed (image independent), as opposed to the optimal KLT, which is data dependent.
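The block-based scheme can be sketched in a few lines, assuming NumPy. The sketch transforms each 8 × 8 tile with a 2-D DCT, keeps only the `keep` largest-magnitude coefficients, and inverse-transforms the tile. `blockwise_dct_approx`, the `keep` parameter and the synthetic test image are hypothetical. This simple coefficient selection stands in for the quantization and entropy coding of a real codec such as JPEG, and it is not JPEG's actual procedure. Keeping all 64 coefficients reproduces the image exactly, which illustrates that the transform itself is lossless. Keeping 10 of 64 gives a close but lossy approximation.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal N-point DCT-II matrix (hypothetical helper)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def blockwise_dct_approx(image, block=8, keep=10):
    """Per tile: 2-D DCT, zero all but (about) the `keep` largest coefficients,
    inverse 2-D DCT. Image sides are assumed to be multiples of `block`."""
    c = dct_matrix(block)
    h, w = image.shape
    out = np.empty((h, w))
    for i in range(0, h, block):
        for j in range(0, w, block):
            coeffs = c @ image[i:i + block, j:j + block] @ c.T  # 2-D DCT of the tile
            cutoff = np.sort(np.abs(coeffs), axis=None)[-keep]   # keep-th largest magnitude
            coeffs[np.abs(coeffs) < cutoff] = 0.0                # discard small coefficients
            out[i:i + block, j:j + block] = c.T @ coeffs @ c     # inverse 2-D DCT
    return out

yy, xx = np.mgrid[0:32, 0:32]
image = 100 + 50 * np.sin(xx / 6.0) + 0.5 * yy   # smooth synthetic test image

exact = blockwise_dct_approx(image, keep=64)     # all 64 coefficients kept
approx = blockwise_dct_approx(image, keep=10)    # 10 of 64 kept per tile
rmse = np.sqrt(np.mean((approx - image) ** 2))
print(np.max(np.abs(exact - image)), rmse)
```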

E.g.: DCT-Based Image Compression/Decompression

Block diagram of encoder and decoder for JPEG DCT-based image compression and decompression.
