Shift, scaling and derivative properties for the discrete cosine transform


Signal Processing 86 (2006) 1597–1603

www.elsevier.com/locate/sigpro


Robert Reeves (a,*), Kurt Kubik (b)

(a) School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, Q 4001, Australia

(b) School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, Q 4072, Australia

(*) Corresponding author.

Received 5 December 2002; accepted 5 September 2005

Available online 30 March 2006

Abstract

A set of DCT domain properties for shifting and scaling by real amounts, and taking linear operations such as differentiation is described. The DCT coefficients of a sampled signal are subjected to a linear transform, which returns the DCT coefficients of the shifted, scaled and/or differentiated signal. The properties are derived by considering the inverse discrete transform as a cosine series expansion of the original continuous signal, assuming sampling in accordance with the Nyquist criterion. This approach can be applied in the signal domain, to give, for example, DCT based interpolation or derivatives. The same approach can be taken in decoding from the DCT to give, for example, derivatives in the signal domain. The techniques may prove useful in compressed domain processing applications, and are interesting because they allow operations from the continuous domain such as differentiation to be implemented in the discrete domain. An image matching algorithm illustrates the use of the properties, with improvements in computation time and matching quality.

© 2006 Elsevier B.V. All rights reserved.

Keywords: DCT; Image compression; Derivative; Shift; Scale; Image matching; JPEG

1. Introduction

The Discrete Cosine Transform (DCT) [1,2] has found wide application in image and video compression, and continues to be at the center of innovative research, with recent publications focussing on computation speed (e.g. [3,4]) and transform domain filtering (e.g. [5]). In this paper, a novel interpretation of the DCT is presented, which allows some interesting properties to be derived. These properties include ways to generate the DCT of shifted, scaled or differentiated versions of a signal, directly from its DCT coefficients. The method is based on interpreting the DCT coefficients as the coefficients of a cosine series expansion of a band-limited, symmetrically extended continuous signal. Similar approaches based on splines and polynomial bases have been reported [6,7]. In Section 2 we establish the validity of treating the DCT as a sum of continuous sinusoidal bases. In Section 3, we use this interpretation to derive properties for DCT domain shifting, scaling and differentiation, and in Section 4, we illustrate the use of the properties in an image matching algorithm.

doi:10.1016/j.sigpro.2005.09.041


2. DCT as sum of continuous basis functions

Usually, the DCT is interpreted as a sum of discrete bases, summing to a discrete sequence. In this section we interpret the DCT as a sum of continuous bases. These bases sum to the symmetrically extended, band-limited continuous signal, which when sampled, gives rise to the discrete sequence referred to above.

Let $g(x)$ be a band-limited continuous signal, such that $\omega_{\max} < \pi$. Without loss of generality, a sampling interval of one is used, producing $N$ samples $g(n)$ at $n = 0, \ldots, N-1$. The forward discrete transform and its inverse are defined as

$$G(m) = T\{g(n)\} = \sum_{n=0}^{N-1} g(n)\, f_n(m) \tag{1}$$

and

$$g(n) = T^{-1}\{G(m)\} = \sum_{m=0}^{N-1} G(m)\, r_m(n), \tag{2}$$

where $f_n(m)$ is the forward transform kernel, $r_m(n)$ is the reverse transform kernel, and $n$ and $m$ are integers from $0$ to $N-1$. The type-2 DCT [8], as used in JPEG [1] and related compression schemes, is defined by

$$f_n(m) = r_m(n) = c(m)\,\frac{\sqrt{2}}{\sqrt{N}}\,\cos\!\left(\frac{(2n+1)\,m\pi}{2N}\right), \tag{3}$$

with $c(m) = 1/\sqrt{2}$ for $m = 0$ and $c(m) = 1$ otherwise, and $g(x)$ is assumed to be symmetrically extended with period $2N$, so that $g(x) = g(x + 2N)$ and $g(-\tfrac{1}{2} + x) = g(-\tfrac{1}{2} - x)$.

Though $r_m(n)$ is defined only for integer values of $n$, the expression can be computed for any real value. Replacing the discrete $n$ by real $x$ gives a sum of continuous cosine basis functions,

$$g(x) = \sum_{m=0}^{N-1} r_m(x)\, G(m). \tag{4}$$
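As a minimal sketch of Eqs. (1)-(4), the following Python/NumPy fragment builds the kernel of Eq. (3) and evaluates the cosine series at real positions. The helper names `dct_kernel`, `dct_forward` and `evaluate`, and the sample values, are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def dct_kernel(N, x):
    """Kernel r_m(x) of Eq. (3), m = 0..N-1, at real positions x.

    Returns an array of shape (len(x), N); row i holds r_m(x[i]).
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    m = np.arange(N)
    c = np.where(m == 0, 1.0 / np.sqrt(2.0), 1.0)
    angle = (2.0 * x[:, None] + 1.0) * m * np.pi / (2.0 * N)
    return c * np.sqrt(2.0 / N) * np.cos(angle)

def dct_forward(g):
    """Eq. (1): G(m) = sum_n g(n) f_n(m), with f_n(m) = r_m(n)."""
    g = np.asarray(g, dtype=float)
    N = len(g)
    return dct_kernel(N, np.arange(N)).T @ g

def evaluate(G, x):
    """Eq. (4): g(x) = sum_m r_m(x) G(m) at real positions x."""
    return dct_kernel(len(G), x) @ G

g = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 1.0, 2.0])  # arbitrary samples
G = dct_forward(g)
at_samples = evaluate(G, np.arange(len(g)))  # reproduces g up to rounding
between = evaluate(G, [2.5])                 # value between samples 2 and 3
```

Evaluating at integer positions returns the original samples, while non-integer positions give values of the band-limited, symmetrically extended signal.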

By considering the periodicity of the bases $r_m(x)$ and the orthonormality of the DCT kernel, it is evident that when $g(x)$ is sampled with an interval of one, the values $g(n)$ and their symmetric and periodic repetitions result, as follows:

$$g(x) = \sum_{m=0}^{N-1} r_m(x) \sum_{n=0}^{N-1} g(n)\, f_n(m) \tag{5}$$

$$\phantom{g(x)} = \sum_{n=0}^{N-1} g(n) \sum_{m=0}^{N-1} r_m(x)\, f_n(m). \tag{6}$$

Taking samples at values of $x = p$, where $p$ is an integer from $0$ to $N-1$, and noting that $f_n(m) = r_m(n)$, we have

$$g(p) = \sum_{n=0}^{N-1} g(n) \sum_{m=0}^{N-1} r_m(p)\, f_n(m) \tag{7}$$

$$\phantom{g(p)} = \sum_{n=0}^{N-1} g(n) \sum_{m=0}^{N-1} f_p(m)\, f_n(m). \tag{8}$$

By orthonormality of the DCT kernel, $\sum_{m=0}^{N-1} f_p(m)\, f_n(m)$ is equal to one if $p = n$, and zero otherwise. Thus the sum of continuous bases, sampled at $x = p$, produces the same samples $g(n)$ as sampling $g(x)$. This is sufficient to imply the equivalence of these two signals, as long as $g(x)$ is sampled in accordance with the Nyquist criterion. It is trivial to extend this argument to those values of $p$ outside the range $0$ to $N-1$ by considering the periodic and symmetric extensions of $r_m(p)$: $r_m(p) = r_m(p + 2N)$, and $r_m(p) = r_m(-1 - p)$.
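The orthonormality and extension properties used in this argument can be checked numerically. The sketch below is an illustration under assumed names (`dct_kernel` is a hypothetical helper implementing Eq. (3)); it is not code from the paper.

```python
import numpy as np

def dct_kernel(N, x):
    """Kernel r_m(x) of Eq. (3); returns shape (len(x), N)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    m = np.arange(N)
    c = np.where(m == 0, 1.0 / np.sqrt(2.0), 1.0)
    angle = (2.0 * x[:, None] + 1.0) * m * np.pi / (2.0 * N)
    return c * np.sqrt(2.0 / N) * np.cos(angle)

N = 8
F = dct_kernel(N, np.arange(N))  # F[n, m] = f_n(m) = r_m(n)

# sum_m f_p(m) f_n(m) should be the identity matrix in (p, n).
orthonormal = np.allclose(F @ F.T, np.eye(N))

# Integer positions well outside the range 0..N-1.
p = np.arange(-3 * N, 3 * N)
periodic = np.allclose(dct_kernel(N, p), dct_kernel(N, p + 2 * N))
symmetric = np.allclose(dct_kernel(N, p), dct_kernel(N, -1 - p))
print(orthonormal, periodic, symmetric)
```

All three checks are expected to hold to within floating point rounding.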

3. DCT domain properties

In this section simple expressions are derived for computing the DCT of any linear operation on a signal, from the DCT coefficients of the original signal's samples. Applying a linear operation to both sides of (4), and adopting the notation $g_L(n)$ to mean the linearly transformed signal sampled at $x = n$, and $r_{Lm}(n)$ to refer to the linearly transformed kernel sampled at $x = n$,

$$g_L(n) = \sum_{m=0}^{N-1} G(m)\, r_{Lm}(n). \tag{9}$$

It follows from (1) and (9) that

$$T\{g_L(n)\}(m) = \sum_{p=0}^{N-1} G(p) \sum_{n=0}^{N-1} f_n(m)\, r_{Lp}(n). \tag{10}$$

This represents a linear transform which computes the DCT of $g_L(n)$ from the DCT of $g(n)$, in a single matrix multiplication. The values of the terms $\sum_{n=0}^{N-1} f_n(m)\, r_{Lp}(n)$ are independent of the signal and its samples, depending only on the type of linear transformation.
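The signal-independent inner sum of Eq. (10) can be precomputed as a matrix. The following sketch assumes a hypothetical helper `operator_matrix` that accepts any function returning the transformed kernel; the two checks use operations whose result is known in advance (the identity, and a shift by one full period $2N$).

```python
import numpy as np

def dct_kernel(N, x):
    """Kernel r_m(x) of Eq. (3); returns shape (len(x), N)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    m = np.arange(N)
    c = np.where(m == 0, 1.0 / np.sqrt(2.0), 1.0)
    angle = (2.0 * x[:, None] + 1.0) * m * np.pi / (2.0 * N)
    return c * np.sqrt(2.0 / N) * np.cos(angle)

def operator_matrix(N, transformed_kernel):
    """M[m, p] = sum_n f_n(m) r_Lp(n), the inner sum of Eq. (10).

    transformed_kernel(N, x) must return r_Lp(x) with shape (len(x), N).
    """
    n = np.arange(N)
    F = dct_kernel(N, n)            # F[n, m] = f_n(m)
    RL = transformed_kernel(N, n)   # RL[n, p] = r_Lp(n)
    return F.T @ RL

N = 8
# The identity operation leaves the kernel unchanged, so M should be I.
M_identity = operator_matrix(N, dct_kernel)
# A shift by one full period 2N should also act as the identity.
M_period = operator_matrix(N, lambda size, x: dct_kernel(size, x + 2 * size))
```

With `M` in hand, the DCT of the transformed signal is the single product `M @ G`, where `G` holds the DCT coefficients of the original samples.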


Differentiation is one example of a linear operation that can be performed in the DCT domain. In this case $g_L(n)$ denotes the derivative of $g(x)$ sampled at $x = n$, and

$$r_{Lp}(n) = -c(p)\,\frac{\sqrt{2}}{\sqrt{N}}\,\frac{p\pi}{N}\,\sin\!\left(\frac{(2n+1)\,p\pi}{2N}\right), \tag{11}$$

given by the derivative of the reverse transform kernel, sampled at $x = n$. The extension to second and higher derivatives, real valued shift, scaling, integrals, or any combination of them, is trivial. For example, a shifting and scaling property is given by letting $r_{Lp}(n) = r_p(a_0 + a_1 n)$.
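To illustrate the derivative and shift/scale properties, the sketch below builds both operators and applies them to a test signal that is an exact combination of two cosine bases, so the analytic answer is known. The names `dct_kernel_derivative`, `D`, `S`, and the values of `a0` and `a1` are assumptions made for this example only.

```python
import numpy as np

def dct_kernel(N, x):
    """Kernel r_p(x) of Eq. (3); returns shape (len(x), N)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    p = np.arange(N)
    c = np.where(p == 0, 1.0 / np.sqrt(2.0), 1.0)
    angle = (2.0 * x[:, None] + 1.0) * p * np.pi / (2.0 * N)
    return c * np.sqrt(2.0 / N) * np.cos(angle)

def dct_kernel_derivative(N, x):
    """d/dx of r_p(x): the kernel of Eq. (11) at real positions x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    p = np.arange(N)
    c = np.where(p == 0, 1.0 / np.sqrt(2.0), 1.0)
    angle = (2.0 * x[:, None] + 1.0) * p * np.pi / (2.0 * N)
    return -c * np.sqrt(2.0 / N) * (p * np.pi / N) * np.sin(angle)

N = 16
n = np.arange(N)
F = dct_kernel(N, n)                    # F[n, m] = f_n(m)
D = F.T @ dct_kernel_derivative(N, n)   # derivative operator of Eq. (10)
a0, a1 = 0.25, 1.0                      # hypothetical shift and scale
S = F.T @ dct_kernel(N, a0 + a1 * n)    # shift/scale operator

def g(x):
    """Test signal: a combination of the m = 3 and m = 1 cosine bases."""
    return (np.cos((2 * x + 1) * 3 * np.pi / (2 * N))
            + 0.5 * np.cos((2 * x + 1) * np.pi / (2 * N)))

def dg(x):
    """Analytic derivative of the test signal."""
    return (-(3 * np.pi / N) * np.sin((2 * x + 1) * 3 * np.pi / (2 * N))
            - 0.5 * (np.pi / N) * np.sin((2 * x + 1) * np.pi / (2 * N)))

G = F.T @ g(n)            # DCT of the samples
G_derivative = D @ G      # DCT of the sampled derivative
G_shifted = S @ G         # DCT of the shifted and scaled samples
```

For this band-limited test signal, `G_derivative` agrees with the DCT of the analytically differentiated samples, and `G_shifted` with the DCT of `g(a0 + a1 * n)`.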

4. Example—image matching

As an example of how these DCT properties can be used, two-dimensional versions were incorporated into an image matching algorithm based on the standard approach of Ackermann [9], in which partial derivatives are required. An affine transformation models the transformation of left image patch to right image patch as follows:

$$g_1(x,y) = h_0 + h_1\, g(a_0 + a_1 x + a_2 y,\; b_0 + b_1 x + b_2 y) + n_1(x,y) \tag{12}$$

and

$$g_2(x,y) = g(x,y) + n_2(x,y), \tag{13}$$

where $g_1(x,y)$ and $g_2(x,y)$ are the image patches to be matched, $h_0$ and $h_1$ are radiometric transformation parameters, $a_i$ and $b_i$ are geometric transformation parameters, and $n_1(x,y)$ and $n_2(x,y)$ are Gaussian noise.

Using Taylor's theorem to linearize each equation about an initial guess and then subtracting yields

$$\begin{aligned}
\Delta g(x,y) = {} & dh_0 + dh_1\, g(x,y) + da_0 \frac{\partial}{\partial x} g(x,y) + da_1\, x \frac{\partial}{\partial x} g(x,y) + da_2\, y \frac{\partial}{\partial x} g(x,y) \\
& + db_0 \frac{\partial}{\partial y} g(x,y) + db_1\, x \frac{\partial}{\partial y} g(x,y) + db_2\, y \frac{\partial}{\partial y} g(x,y) + v(x,y),
\end{aligned} \tag{14}$$

where $x$ and $y$ take on a series of discrete values within a match window. This results in a system of equations for the perturbations to the initial radiometric and geometric transformation parameters.

The system of equations can be expressed in matrix form

$$L = Ax + v, \tag{15}$$

with the solution given by

$$x = (A^T A)^{-1} A^T L, \tag{16}$$

where $x$ is the vector of perturbations to the initially chosen transformation parameters that result in a better match between the two image patches. Vector $v$ is a vector of noise terms, and $A$ is given by

$$A = \begin{bmatrix}
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & g(\cdot) & \dfrac{\partial}{\partial x} g(\cdot) & x \dfrac{\partial}{\partial x} g(\cdot) & y \dfrac{\partial}{\partial x} g(\cdot) & \dfrac{\partial}{\partial y} g(\cdot) & x \dfrac{\partial}{\partial y} g(\cdot) & y \dfrac{\partial}{\partial y} g(\cdot) \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots
\end{bmatrix}. \tag{17}$$

Since the solution is based around a linear approximation, it can be improved by linearizing around the new solution, and re-solving. This is repeated until the solution converges.
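A single linearized step of this adjustment can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the patch is synthetic random data, the partial derivatives come from `np.gradient` purely as a stand-in, and `np.linalg.lstsq` is used in place of forming $(A^T A)^{-1}$ explicitly.

```python
import numpy as np

def design_matrix(g, gx, gy):
    """Build A with one row per pixel of the match window, as in Eq. (17).

    g, gx, gy: 2D arrays holding the patch and its partial derivatives
    with respect to x (along columns) and y (along rows).
    """
    rows, cols = g.shape
    y, x = np.mgrid[0:rows, 0:cols]
    columns = [np.ones_like(g), g, gx, x * gx, y * gx, gy, x * gy, y * gy]
    return np.stack([c.ravel() for c in columns], axis=1)

def solve_update(A, L):
    """Least squares solution of L = A x + v, equivalent to Eq. (16)."""
    x, *_ = np.linalg.lstsq(A, L, rcond=None)
    return x

rng = np.random.default_rng(0)
g = rng.standard_normal((16, 16))  # synthetic stand-in for an image patch
gy, gx = np.gradient(g)            # finite differences, for illustration only
A = design_matrix(g, gx, gy)

# Hypothetical parameter perturbations (dh0, dh1, da0, da1, da2, db0, db1, db2).
true_update = np.array([0.1, 0.05, 0.3, 0.01, -0.02, -0.2, 0.02, 0.01])
L = A @ true_update                # noise-free difference image, raveled
estimate = solve_update(A, L)
```

In the full algorithm this step would be repeated, relinearizing about the updated parameters each time, until the update becomes negligible.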

By choosing a suitable ordering system, the images can be expressed as column vectors, the 2D linear transform as a matrix, and (15) can be expressed in the transform domain as

$$TAx + Tv = TL, \tag{18}$$

where multiplying by matrix $T$ takes the 2D DCT. This can be viewed as defining transform domain $A$ and $L$ matrices given by $TA$ and $TL$. It has been shown previously that as long as $T$ is orthogonal, which is the case for the DCT, the solution of (16) is unaffected by using the transform domain $A$ and $L$ matrices [10]. For typical images, the DCT behaves in a similar manner to the Karhunen–Loeve transform, which constructs basis functions in order of decreasing variance. In image compression, this fact is used to justify discarding many of the high frequency (low variance) coefficients, while maintaining the information important to the structure of the image [11]. This same principle can be extended to image matching. Since the bulk of the image energy appears in the low order DCT coefficients, discarding the higher order coefficients should not impair image matching. We can significantly reduce the size of the $A$ matrix by transforming each column into the DCT domain, and then omitting the same high frequency coefficients from each column. Since the computational effort in the solution of the least squares system depends on the size of matrix $A^T A$, this should enable the solution to be computed more quickly, without detriment to the quality of the match result. The method of (10) is used to compute the transforms of the columns of the $A$ matrix involving partial derivatives, from the transform of the image patch. The extension to two dimensions is straightforward, with full details given in [12]. An experimental investigation is fully reported elsewhere [12,13]. Here we briefly summarise the main results concerning accuracy and computation time, when compared to a fully pixel domain algorithm in which the partial derivatives are estimated by first differences.

Fig. 1. Image one. The left image was formed by subsampling a 1600 × 1600 fragment using a 10 × 10 Gaussian window, while the right image was formed by first shifting by 5 pixels, then subsampling.

It is important to note that, as far as this application is concerned, the computational efficiency results from solving the least squares problem (18) in the transform domain after removing those transform domain equations which effectively involve only noise. While we have found it expedient to use the transform domain properties we have proposed to compute the transforms of the columns of $A$ which involve partial derivatives from the transform of the image patch, an equivalent procedure would be to first compute the columns of $A$, finding the partial derivatives by some other means, and then take the DCT of each column of $A$. However, this introduces the problem of estimating the partial derivatives. Apart from the issues of the assumed periodic extension, and the satisfaction of the Nyquist criterion, the partial derivatives involved in the methods we propose are those of the original continuous function, not discrete estimates with an associated imprecision. As we discuss in Section 4.3, analogous properties in the time (space) domain can be used to estimate the columns of $A$. However, in this case we would have an additional DCT to perform for each column of $A$ involving a partial derivative.
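The invariance of the least squares solution under an orthogonal $T$, and the mechanics of dropping transform domain equations, can be illustrated numerically. The sketch below makes several assumptions that are not in the paper: the helper `dct_matrix`, random stand-in data for $A$ and $L$, a row-major ordering of the patch, and a simple "$u + v < 6$" rule for selecting low order coefficients. Because random data has no energy compaction, the truncated solve shows only the reduction in system size, not the accuracy behaviour reported for real images.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal type-2 DCT matrix T1 with T1[m, n] = f_n(m) of Eq. (3)."""
    n = np.arange(N)
    m = n[:, None]
    c = np.where(m == 0, 1.0 / np.sqrt(2.0), 1.0)
    return c * np.sqrt(2.0 / N) * np.cos((2.0 * n + 1.0) * m * np.pi / (2.0 * N))

N = 8
T1 = dct_matrix(N)
T = np.kron(T1, T1)   # 2D DCT acting on row-major raveled N x N patches

rng = np.random.default_rng(1)
A = rng.standard_normal((N * N, 8))   # synthetic stand-in for the design matrix
L = A @ rng.standard_normal(8) + 0.01 * rng.standard_normal(N * N)

x_pixel, *_ = np.linalg.lstsq(A, L, rcond=None)
x_dct, *_ = np.linalg.lstsq(T @ A, T @ L, rcond=None)   # Eq. (18), all rows

# Keep only low order coefficients (vertical index u plus horizontal index v
# below 6), which leaves fewer equations in the least squares system.
u, v = np.divmod(np.arange(N * N), N)
keep = (u + v) < 6
x_truncated, *_ = np.linalg.lstsq((T @ A)[keep], (T @ L)[keep], rcond=None)
```

Here `x_pixel` and `x_dct` agree to rounding error, while the truncated system has 21 equations instead of 64.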

An artificial horizontal disparity was introduced into two fragments of aerial photographs as follows. In image one (Fig. 1), the left image was formed by subsampling a 1600 × 1600 fragment using a 10 × 10 Gaussian window, while the right image was formed by first shifting by 5 pixels, then subsampling. In image two (Fig. 2), the left image was formed by subsampling a 328 × 328 fragment using a 2 × 2 Gaussian window, while the right image was formed by first shifting by 1 pixel, then subsampling. This resulted in a known disparity of 0.5 pixels being introduced in each case between the left and right images.

4.1. Standard deviation of the disparity errors

The standard deviation of the disparity errors represents the accuracy of the matching result, and in the case of the errors having zero mean, which is approximately true for our data, it is equivalent to the RMS error in the disparity estimate. Fig. 3 shows that for all window sizes, and for two different aerial images, the standard deviation of the errors starts off large, and as more DCT coefficients are added, quickly reduces, and then flattens out, after which adding further DCT coefficients has little impact on the accuracy. The knee in the curve occurred at around 5–10% of the DCT coefficients for the 32 × 32 window, at around 20% for the 16 × 16 window, and around 25–30% for the 8 × 8 window. In all cases, after the knee, the accuracy was comparable to or better than that achieved by the pixel domain algorithm, sometimes markedly so.

Fig. 2. Image two. The left image was formed by subsampling a 328 × 328 fragment using a 2 × 2 Gaussian window, while the right image was formed by first shifting by 1 pixel, then subsampling.

[Figure 3: two panels, each titled "Standard Deviation of Disparity Error for Converged Match Windows". Horizontal axis: Percentage of DCT Coefficients Taken in Least Squares Adjustment (0 to 100). Vertical axis: Standard Deviation of Disparity Error (pixels). Legend: 8×8 TD, 6×6 PD, 16×16 TD, 14×14 PD, 32×32 TD, 30×30 PD.]

Fig. 3. Optimum matching accuracy is shown to be achieved with a small percentage of the available DCT coefficients. Results are shown for 8 × 8, 16 × 16 and 32 × 32 windows for image one (top) and image two (bottom). The accuracy achieved by a comparable pixel domain algorithm is shown as dotted lines for each of three window sizes.

4.2. Average convergence time

The average time for match windows to converge is shown in Fig. 4. For the 8 × 8 window, the times for the DCT domain algorithm are comparable to the pixel domain for DCT coefficient percentages up to about 30%, but then gradually increase as further DCT coefficients are added. For the 16 × 16 window, taking between 10% and 30% of the coefficients resulted in reducing the average convergence time to about 50% of the pixel domain time in one image, and about 75% in the other image. The improvements were more pronounced for the 32 × 32 window, where in both images the average convergence time was under 50% of the pixel domain time, when between 5% and 20% of the DCT coefficients were taken.

4.3. Discussion

From our use of these DCT domain properties, several important considerations emerge. Firstly, the properties are based on the assumption that the signal is symmetrically extended at each end of the DCT window [14], or block in the 2D case. Where the result of the linear operation is outside the DCT window (as is possible with shifting or scaling), a point on the symmetrically extended waveform is returned. In the case of the DCT support being the entire signal or image, this may be an acceptable edge effect. However, in block based decompositions, edge effects may be introduced into each block. The symmetric extension also causes the derivative to tend towards zero at the edges of the DCT window. This may also be problematic in block based schemes.
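The vanishing derivative at the window edges follows from the kernel itself: every differentiated basis function is a sine that is zero at the two points of symmetry, $x = -\tfrac{1}{2}$ and $x = N - \tfrac{1}{2}$. A short numerical check, using a hypothetical helper `dct_kernel_derivative` for the derivative of the kernel in Eq. (3):

```python
import numpy as np

def dct_kernel_derivative(N, x):
    """d/dx of the kernel r_p(x) of Eq. (3); returns shape (len(x), N)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    p = np.arange(N)
    c = np.where(p == 0, 1.0 / np.sqrt(2.0), 1.0)
    angle = (2.0 * x[:, None] + 1.0) * p * np.pi / (2.0 * N)
    return -c * np.sqrt(2.0 / N) * (p * np.pi / N) * np.sin(angle)

N = 8
edges = np.array([-0.5, N - 0.5])               # the two points of symmetry
edge_slopes = dct_kernel_derivative(N, edges)   # shape (2, N): one column per basis
largest = np.abs(edge_slopes).max()             # zero up to rounding error
```

Since every basis has zero slope at both edges, so does any signal reconstructed from them, whatever its DCT coefficients.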

[Figure 4: two panels, each titled "Average Match Time for Converged Match Windows". Horizontal axis range: 0 to 100. Vertical axis: Time (milliseconds). Legend: 8×8 TD, 6×6 PD, 16×16 TD, 14×14 PD, 32×32 TD, 30×30 PD.]

Fig. 4. The effect of taking only a fraction of the available DCT coefficients in each least squares adjustment on the average time taken to converge for each match window is shown for image one (top) and image two (bottom). The times for the pixel domain algorithms are shown as dotted horizontal lines for comparison.

Time (space) domain versions of the property provide a means of shifting, scaling and taking the derivative of sampled signals. For example, a shifted signal can be computed by

$$g(p + a) = \sum_{n=0}^{N-1} g(n) \sum_{m=0}^{N-1} f_n(m)\, r_m(p + a), \tag{19}$$

where $p$ is the integer index of the output sample, from $0$ to $N-1$.

This equation represents a linear transform, based on the DCT kernels and the shift parameter. Note that the shift parameter $a$ can be any real value. This equation can also be viewed as an interpolation function. Such an interpolation can be combined with scaling and taking the derivative. It differs from a DCT interpolation technique proposed by Wang [15–17] which results in an increased number of samples, spanning the same signal support. In the method proposed here, the number of samples remains fixed, but the signal support may change if scaling or shifting is involved.
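The time domain shift of Eq. (19) can be sketched as a single $N \times N$ matrix applied to the samples. The helper names `dct_kernel` and `shift_matrix` and the sample values below are assumptions for illustration.

```python
import numpy as np

def dct_kernel(N, x):
    """Kernel r_m(x) of Eq. (3); returns shape (len(x), N)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    m = np.arange(N)
    c = np.where(m == 0, 1.0 / np.sqrt(2.0), 1.0)
    angle = (2.0 * x[:, None] + 1.0) * m * np.pi / (2.0 * N)
    return c * np.sqrt(2.0 / N) * np.cos(angle)

def shift_matrix(N, a):
    """W[p, n] = sum_m f_n(m) r_m(p + a), so W @ g samples the signal at p + a."""
    p = np.arange(N)
    F = dct_kernel(N, p)               # F[n, m] = f_n(m)
    return dct_kernel(N, p + a) @ F.T

g = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 1.0, 2.0])  # arbitrary samples
half_sample = shift_matrix(len(g), 0.5) @ g  # values midway between samples
one_sample = shift_matrix(len(g), 1.0) @ g   # g(1), ..., g(N-1), then mirrored g(N-1)
```

A shift of exactly one sample reproduces the neighbouring samples, with the last output taken from the symmetric extension; the number of output samples stays at $N$, in line with the remark above about fixed sample count and changed support.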

The property can also be incorporated directly into the decoding step by making use of (9).

5. Summary and conclusions

Shift, scale and derivative properties for the DCT can be derived by treating the inverse transform as a sum of continuous cosine bases. This sum of continuous bases is identical to the original continuous signal, subject to the Nyquist criterion and the assumed symmetric periodic extension.

A single linear transform can be used to compute the DCT of the sampled derivative, from the DCT of the original signal. Linear transforms can also be constructed for other linear operations, for example shifting and scaling. Any number of sequentially applied linear operations may be combined into a single linear transform, based on applying the combined transform to the cosine bases.

The property described in this paper may also be applied in the time or space domain, for example, to shift a signal by a real (possibly fractional) number of samples, or to differentiate it. Given that the 2D DCT is separable, there is no impediment to a straightforward 2D extension, which has been used in a DCT domain image matching algorithm. We expect therefore that these techniques may be useful for DCT based image representations, particularly where geometric transformations or derivatives are required.

As an example of how these properties may be used, a standard least squares image matching algorithm was implemented in the DCT domain, making use of the properties described. The algorithm was able to perform more accurately and converge faster than a comparable pixel domain algorithm using first differences to estimate the partial derivatives. This improved performance may be attributed to two factors. Firstly, the DCT domain algorithm enables us to discard a high percentage of DCT coefficients in the least squares adjustment, thus reducing the size of the solution without losing significant image information. Secondly, the DCT properties described provide a better estimate of the partial derivatives than the first differences.
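Because the 2D DCT is separable, a 1D operator can be applied along either axis of the 2D coefficient array. The sketch below shows one way this might look for partial derivatives; it is an assumption-laden illustration (the paper defers the 2D details to [12]), with hypothetical names `dct2`, `ddx` and `ddy`, and a separable test image whose derivatives are known analytically.

```python
import numpy as np

def dct_kernel(N, x):
    """Kernel r_p(x) of Eq. (3); returns shape (len(x), N)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    p = np.arange(N)
    c = np.where(p == 0, 1.0 / np.sqrt(2.0), 1.0)
    angle = (2.0 * x[:, None] + 1.0) * p * np.pi / (2.0 * N)
    return c * np.sqrt(2.0 / N) * np.cos(angle)

def dct_kernel_derivative(N, x):
    """d/dx of r_p(x), as in Eq. (11); returns shape (len(x), N)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    p = np.arange(N)
    c = np.where(p == 0, 1.0 / np.sqrt(2.0), 1.0)
    angle = (2.0 * x[:, None] + 1.0) * p * np.pi / (2.0 * N)
    return -c * np.sqrt(2.0 / N) * (p * np.pi / N) * np.sin(angle)

N = 8
n = np.arange(N)
F = dct_kernel(N, n)                    # F[n, m] = f_n(m)
D = F.T @ dct_kernel_derivative(N, n)   # 1D derivative operator, Eq. (10)

def dct2(img):
    """Separable 2D DCT of an N x N array indexed as img[y, x]."""
    return F.T @ img @ F

def ddx(G2):
    """2D DCT of the x-derivative: apply D along the horizontal index."""
    return G2 @ D.T

def ddy(G2):
    """2D DCT of the y-derivative: apply D along the vertical index."""
    return D @ G2

y, x = np.mgrid[0:N, 0:N]
wx, wy = 2 * np.pi / (2 * N), np.pi / (2 * N)   # basis frequencies m = 2 and m = 1
img = np.cos((2 * x + 1) * wx) * np.cos((2 * y + 1) * wy)
img_dx = -2 * wx * np.sin((2 * x + 1) * wx) * np.cos((2 * y + 1) * wy)
img_dy = -2 * wy * np.cos((2 * x + 1) * wx) * np.sin((2 * y + 1) * wy)
```

For this test image, `ddx(dct2(img))` matches `dct2(img_dx)` and `ddy(dct2(img))` matches `dct2(img_dy)`, so each partial derivative costs one extra matrix product in the DCT domain.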


References

[1] G.K. Wallace, The JPEG still-picture compression standard, Comm. ACM 34 (4) (1991) 31–44.

[2] K.-H. Tzou, Video coding techniques: an overview, in: P. Pirsch (Ed.), VLSI Implementations for Image Communications, Elsevier Science Publishers, Amsterdam, 1993, pp. 1–47.

[3] S. Lee, Improved algorithm for efficient computation of the forward and backward MDCT in MPEG audio coder, IEEE Trans. Circuits Syst. II: Analog Digital Signal Process. 48 (10) (2001) 990–994.

[4] J. Liang, T. Tran, Fast multiplierless approximations of the DCT with the lifting scheme, IEEE Trans. Signal Process. 49 (12) (2001) 3032–3044.

[5] N. Nikolaev, A. Gotchev, K. Egiazarian, Z. Nikolov, Suppression of electromyogram interference on the electrocardiogram by transform domain denoising, Medical Biological Eng. Comput. 39 (6) (2001) 649–655.

[6] M. Unser, Splines: a perfect fit for signal and image processing, IEEE Signal Process. Mag. 16 (6) (1999) 22–38.

[7] H. Ridha, J. Vesma, T. Saramaki, M. Renfors, Derivative approximations for sampled signals based on polynomial interpolation, in: Proceedings of the 13th International Conference on Digital Signal Processing, vol. 2, IEEE, New York, 1997, pp. 939–942.

[8] K. Rao, P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications, Academic Publishers, San Diego, 1990.

[9] F. Ackermann, Digital image correlation: performance and potential application in photogrammetry, Photogramm. Rec. 11 (64) (1984) 429–439.

[10] R. Reeves, K. Kubik, Least squares matching in the transform domain, Internat. Arch. Photogramm. Remote Sensing 32 (3/1) (1998) 168–176.

[11] M. Rabbani, P. Jones, Digital Image Compression Techniques, SPIE Optical Engineering Press, Bellingham, WA, 1991.

[12] R. Reeves, Image matching in the compressed domain, Ph.D. Thesis, Space Centre for Satellite Navigation, Queensland University of Technology, Brisbane, Australia, 1999.

[13] R. Reeves, K. Kubik, Benefits of hybrid DCT domain image matching, Internat. Arch. Photogramm. Remote Sensing 32 (2000) 761–768.

[14] S.A. Martucci, Symmetric convolution and the discrete sine and cosine transforms, IEEE Trans. Signal Process. 42 (5) (1994) 1038–1051.

[15] Z. Wang, Interpolation using type I discrete cosine transform, Electron. Lett. 26 (15) (1990) 1170–1171.

[16] Z. Wang, Interpolation using the discrete cosine transform: reconsideration, Electron. Lett. 29 (2) (1993) 198–200.

[17] J. Agbinya, Two dimensional interpolation of real sequences using the DCT, Electron. Lett. 29 (2) (1993) 204–205.
