*Corresponding author. Tel.: +65-799-4823; fax: +65-7912687. E-mail address: [email protected] (G. Bi).
Signal Processing 80 (2000) 1917–1935
Fast recursive algorithms for 2-D discrete cosine transform
Teng Chork Tan, Guoan Bi*, Han Ngee Tan
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore
Received 7 June 1999; received in revised form 9 November 1999
Abstract
A new algorithm for the computation of the two-dimensional (2-D) type-III discrete cosine transform is presented. The algorithm is particularly suited to block sizes of (p1·2^m) by (p2·2^n), where p1 and p2 are odd integers and m and n are non-negative integers. It is shown that the 2-D type-III DCT can be decomposed into cosine–cosine, cosine–sine, sine–cosine and sine–sine sequences, which can be further decomposed into similar sequences. The proposed algorithm provides flexibility in choosing the block size, a simple index mapping scheme and a fairly regular computation structure. The algorithm also requires a smaller number of arithmetic operations for p1 = p2 = 3. © 2000 Elsevier Science B.V. All rights reserved.
0165-1684/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0165-1684(00)00102-X
1. Introduction
The discrete cosine transform (DCT) has been widely used in various digital signal processing applications because it has high energy-packing capability and approaches the Karhunen–Loève transform (KLT) for highly correlated signals [2,8]. Many fast DCT algorithms have been reported in the literature. Duhamel and Guillemot [3] reported that computation using the direct polynomial transform requires the lowest number of multiplications. However, the polynomial transform potentially has a complex computational structure and is difficult to generalize to larger transform sizes and higher dimensions. Unlike the conventional row-column approach, which requires 2N 1-D DCTs to compute an N×N 2-D DCT, Cho and Lee [6,7] regrouped the input matrix so that only N 1-D DCTs are used by their algorithm. However, N indexing schemes are required for the regrouping of an N×N input matrix, and one potential disadvantage is that the implementation of the indexing scheme becomes more complex for larger transform sizes. The algorithm proposed by Haque [4] allows a more flexible combination of row and column sizes. A comprehensive survey of DCT algorithms can be found in [8], and comments on various fast 2-D DCT algorithms can be found in [5].
Most reported algorithms assume that the input matrix has equal dimensions that are powers of two. Although algorithms using the polynomial transform or prime-factor decomposition can support transform sizes other than powers of two, they are difficult to generalize for efficient computation of DCTs of various sizes. If the size of the input matrix does not match the transform sizes supported by the fast algorithm, measures such as zero padding have to be taken, which inevitably require more computation than necessary. The possibility of such a mismatch can be minimized if the fast algorithm naturally supports various transform sizes.
This paper presents a fast algorithm for the type-III 2-D DCT (or inverse DCT) that supports arbitrary even transform sizes in each dimension. The proposed algorithm possesses a fairly regular structure, and the input/output indexing schemes can be implemented easily. The organization of this paper is as follows. Section 2 shows that the 2-D IDCT can be decomposed into cosine–cosine, sine–cosine, cosine–sine and sine–sine sub-sequences. Sections 3–8 show that these sub-sequences can be recursively decomposed into similar sequences. Implementation issues of the proposed algorithm are discussed in Section 9.
2. Algorithm
The computation of the 2-D type-III DCT of an input matrix X(k1, k2) is defined by

x(n_1, n_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} X(k_1,k_2) \cos\left(\frac{(2n_1+1)k_1\pi}{2N_1}\right) \cos\left(\frac{(2n_2+1)k_2\pi}{2N_2}\right),
n_1 = 0, \ldots, N_1-1,  n_2 = 0, \ldots, N_2-1,   (2.1)
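As a point of reference, (2.1) can be evaluated by a brute-force double sum. The sketch below (Python; the function name and test input are our own, and no normalization factors are applied) is illustrative only:

```python
import math

def dct3_2d(X):
    """Brute-force 2-D type-III DCT of Eq. (2.1), without normalization."""
    N1, N2 = len(X), len(X[0])
    return [[sum(X[k1][k2]
                 * math.cos((2 * n1 + 1) * k1 * math.pi / (2 * N1))
                 * math.cos((2 * n2 + 1) * k2 * math.pi / (2 * N2))
                 for k1 in range(N1) for k2 in range(N2))
             for n2 in range(N2)]
            for n1 in range(N1)]

# The k1 = k2 = 0 basis function is identically 1, so an input that is
# 1 at (0, 0) and 0 elsewhere transforms to an all-ones output.
X = [[1.0 if (k1, k2) == (0, 0) else 0.0 for k2 in range(4)]
     for k1 in range(4)]
x = dct3_2d(X)
```

The direct sum costs O(N1²N2²) operations; the decomposition developed in the remainder of this section reduces this dramatically.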
which can be decomposed into cosine–cosine, sine–cosine, cosine–sine and sine–sine sequences of smaller block size. If the dimensions N1 and N2 are defined as

N_1 = p_1 \cdot 2^m,  N_2 = p_2 \cdot 2^n,   (2.2)
where p1 and p2 are odd integers and m and n are integers greater than zero, then the transformed matrix x(n1, n2) can be partitioned into

\begin{bmatrix} x_{11}(n_1,n_2) & x_{12}(n_1,n_2) \\ x_{21}(n_1,n_2) & x_{22}(n_1,n_2) \end{bmatrix}
=
\begin{bmatrix}
x\left(2n_1+\frac{p_1-1}{2},\, 2n_2+\frac{p_2-1}{2}\right) & x\left(2n_1+\frac{p_1-1}{2},\, 2n_2-\frac{p_2+1}{2}\right) \\
x\left(2n_1-\frac{p_1+1}{2},\, 2n_2+\frac{p_2-1}{2}\right) & x\left(2n_1-\frac{p_1+1}{2},\, 2n_2-\frac{p_2+1}{2}\right)
\end{bmatrix}.   (2.3)
It is possible that some indices in (2.3) are greater than or equal to N1 (or N2), or less than zero. Appendix A illustrates a mapping process between these invalid indices and the valid ones. The mapping process is defined by

x(-t-1) \Leftarrow x(t),  x(N+t) \Leftarrow x(N-t-1),  0 \le t \le (p-1)/2,   (2.4)

where the data on the left-hand side of the symbol \Leftarrow have invalid indices and are therefore replaced by the data on the right-hand side, and

x[-(p+1)/2] = 0,  x[N+(p-1)/2] = 0.   (2.5)
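The mapping (2.4)–(2.5) can be sketched as a small index function (Python; the function name is ours, and `None` stands for the two samples that (2.5) forces to zero):

```python
def remap(i, N, p):
    """Return the valid index that replaces index i under (2.4)-(2.5),
    or None for the two boundary indices that are set to zero."""
    if i == -(p + 1) // 2 or i == N + (p - 1) // 2:
        return None                 # x[-(p+1)/2] = x[N+(p-1)/2] = 0
    if i < 0:
        return -i - 1               # x(-t-1) <= x(t)
    if i >= N:
        return 2 * N - i - 1        # x(N+t) <= x(N-t-1)
    return i

# Example for p = 5, N = 10: x(-1) is replaced by x(0) and x(10) by x(9),
# while x(-3) and x(12) are zeroed.
examples = [remap(i, 10, 5) for i in (-3, -1, 10, 12)]
```

These values match the examples tabulated for p = 5, N = 10 and p = 11, N = 22 in Appendix A.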
By using the property

\cos(A \pm B) = \cos A \cos B \mp \sin A \sin B,

Eq. (2.1) is decomposed into
\begin{bmatrix} x_{11}(n_1,n_2) \\ x_{12}(n_1,n_2) \\ x_{21}(n_1,n_2) \\ x_{22}(n_1,n_2) \end{bmatrix}
=
\begin{bmatrix} 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix}
y^{\mathrm{I}}_{CC}(n_1,n_2), & 0 \le n_1 \le N_1/2, & 0 \le n_2 \le N_2/2 \\
y^{\mathrm{I}}_{SC}(n_1,n_2), & 0 < n_1 < N_1/2, & 0 \le n_2 \le N_2/2 \\
y^{\mathrm{I}}_{CS}(n_1,n_2), & 0 \le n_1 \le N_1/2, & 0 < n_2 < N_2/2 \\
y^{\mathrm{I}}_{SS}(n_1,n_2), & 0 < n_1 < N_1/2, & 0 < n_2 < N_2/2
\end{bmatrix},   (2.6)
where

y^{\mathrm{I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} X(k_1,k_2) \cos(\alpha_1 k_1) \cos(\alpha_2 k_2) \cos\frac{\pi n_1 k_1}{N_1/2} \cos\frac{\pi n_2 k_2}{N_2/2},

y^{\mathrm{I}}_{SC}(n_1,n_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} X(k_1,k_2) \sin(\alpha_1 k_1) \cos(\alpha_2 k_2) \sin\frac{\pi n_1 k_1}{N_1/2} \cos\frac{\pi n_2 k_2}{N_2/2},

y^{\mathrm{I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} X(k_1,k_2) \cos(\alpha_1 k_1) \sin(\alpha_2 k_2) \cos\frac{\pi n_1 k_1}{N_1/2} \sin\frac{\pi n_2 k_2}{N_2/2},

y^{\mathrm{I}}_{SS}(n_1,n_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} X(k_1,k_2) \sin(\alpha_1 k_1) \sin(\alpha_2 k_2) \sin\frac{\pi n_1 k_1}{N_1/2} \sin\frac{\pi n_2 k_2}{N_2/2},
and

\alpha_1 = \frac{p_1\pi}{2N_1},  \alpha_2 = \frac{p_2\pi}{2N_2}.   (2.7)
By using the property

\cos\frac{\pi n(N-k)}{N/2} = \cos\frac{\pi n k}{N/2},  \sin\frac{\pi n(N-k)}{N/2} = -\sin\frac{\pi n k}{N/2},

we can obtain Eqs. (2.8)–(2.11):
y^{\mathrm{I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2} \Big\{ \cos(\alpha_1 k_1)\big[X(k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} r_2 X(k_1,N_2-k_2)\sin(\alpha_2 k_2)\big] + (-1)^{(p_1-1)/2} r_1 \sin(\alpha_1 k_1)\big[X(N_1-k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} r_2 X(N_1-k_1,N_2-k_2)\sin(\alpha_2 k_2)\big] \Big\} \cos\frac{\pi n_1 k_1}{N_1/2} \cos\frac{\pi n_2 k_2}{N_2/2},
n_1 = 0, \ldots, N_1/2,  n_2 = 0, \ldots, N_2/2,   (2.8)
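A numerical check of the folding in (2.8), as a Python sketch under our own naming: the half-range sum with the r1, r2 flags and the (−1)^((p−1)/2) sign factors should reproduce the full-range sum that defines y^I_CC.

```python
import math, random

def y_cc_full(X, N1, N2, p1, p2, n1, n2):
    # full-range definition of y_CC^I (first row of the block after (2.6))
    a1, a2 = p1 * math.pi / (2 * N1), p2 * math.pi / (2 * N2)
    return sum(X[k1][k2] * math.cos(a1 * k1) * math.cos(a2 * k2)
               * math.cos(math.pi * n1 * k1 / (N1 / 2))
               * math.cos(math.pi * n2 * k2 / (N2 / 2))
               for k1 in range(N1) for k2 in range(N2))

def y_cc_folded(X, N1, N2, p1, p2, n1, n2):
    # half-range form of Eq. (2.8)
    a1, a2 = p1 * math.pi / (2 * N1), p2 * math.pi / (2 * N2)
    s1, s2 = (-1) ** ((p1 - 1) // 2), (-1) ** ((p2 - 1) // 2)
    total = 0.0
    for k1 in range(N1 // 2 + 1):
        r1 = 0 if k1 in (0, N1 // 2) else 1
        for k2 in range(N2 // 2 + 1):
            r2 = 0 if k2 in (0, N2 // 2) else 1
            # modulo only keeps indices in range; the corresponding
            # terms are already zeroed by r1/r2 at the boundaries
            kb1, kb2 = (N1 - k1) % N1, (N2 - k2) % N2
            inner = (math.cos(a1 * k1)
                     * (X[k1][k2] * math.cos(a2 * k2)
                        + s2 * r2 * X[k1][kb2] * math.sin(a2 * k2))
                     + s1 * r1 * math.sin(a1 * k1)
                     * (X[kb1][k2] * math.cos(a2 * k2)
                        + s2 * r2 * X[kb1][kb2] * math.sin(a2 * k2)))
            total += (inner * math.cos(math.pi * n1 * k1 / (N1 / 2))
                            * math.cos(math.pi * n2 * k2 / (N2 / 2)))
    return total

random.seed(0)
N1 = N2 = 6
p1 = p2 = 3
X = [[random.random() for _ in range(N2)] for _ in range(N1)]
err = max(abs(y_cc_full(X, N1, N2, p1, p2, n1, n2)
              - y_cc_folded(X, N1, N2, p1, p2, n1, n2))
          for n1 in range(N1 // 2 + 1) for n2 in range(N2 // 2 + 1))
```

The sign factors arise because cos(α1(N1−k1)) = (−1)^((p1−1)/2) sin(α1 k1) for odd p1, which is the identity the fold relies on.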
y^{\mathrm{I}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} \Big\{ \sin(\alpha_1 k_1)\big[X(k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} r_2 X(k_1,N_2-k_2)\sin(\alpha_2 k_2)\big] - (-1)^{(p_1-1)/2} \cos(\alpha_1 k_1)\big[X(N_1-k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} r_2 X(N_1-k_1,N_2-k_2)\sin(\alpha_2 k_2)\big] \Big\} \sin\frac{\pi n_1 k_1}{N_1/2} \cos\frac{\pi n_2 k_2}{N_2/2},
n_1 = 1, \ldots, N_1/2,  n_2 = 0, \ldots, N_2/2,   (2.9)
y^{\mathrm{I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=1}^{N_2/2-1} \Big\{ \cos(\alpha_1 k_1)\big[X(k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} X(k_1,N_2-k_2)\cos(\alpha_2 k_2)\big] + (-1)^{(p_1-1)/2} r_1 \sin(\alpha_1 k_1)\big[X(N_1-k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} X(N_1-k_1,N_2-k_2)\cos(\alpha_2 k_2)\big] \Big\} \cos\frac{\pi n_1 k_1}{N_1/2} \sin\frac{\pi n_2 k_2}{N_2/2},
n_1 = 0, \ldots, N_1/2,  n_2 = 1, \ldots, N_2/2-1,   (2.10)
y^{\mathrm{I}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=1}^{N_2/2-1} \Big\{ \sin(\alpha_1 k_1)\big[X(k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} X(k_1,N_2-k_2)\cos(\alpha_2 k_2)\big] - (-1)^{(p_1-1)/2} \cos(\alpha_1 k_1)\big[X(N_1-k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} X(N_1-k_1,N_2-k_2)\cos(\alpha_2 k_2)\big] \Big\} \sin\frac{\pi n_1 k_1}{N_1/2} \sin\frac{\pi n_2 k_2}{N_2/2},
n_1 = 1, \ldots, N_1/2-1,  n_2 = 1, \ldots, N_2/2-1,   (2.11)
where

r_1 = \begin{cases} 0 & \text{if } k_1 = 0 \text{ or } N_1/2, \\ 1 & \text{otherwise,} \end{cases}
\qquad
r_2 = \begin{cases} 0 & \text{if } k_2 = 0 \text{ or } N_2/2, \\ 1 & \text{otherwise.} \end{cases}
It is noted that the ranges of the indices k1 and k2 in Eqs. (2.8)–(2.11) are defined according to the type of trigonometric identity involved. If a cosine function is involved, the related index runs from 0 to Ni/2, where i = 1 or 2; if a sine function is involved, the index runs from 1 to Ni/2 − 1. Such an arrangement includes all the valid indices (see Table 2 in Appendix A). It differs from the index range used in other decomposition processes, in which the index runs from 0 to Ni/2 − 1.
At first glance, it seems that r1 and r2 are redundant. In fact, they are introduced for compact mathematical expressions, especially when ki = Ni/2, and they do not require any extra arithmetic operations. Once y^I_CC, y^I_CS, y^I_SC and y^I_SS are computed according to (2.8)–(2.11), the final transformed outputs can be combined according to (2.6). Input data indexing in (2.8)–(2.11) is straightforward, although a mapping process dealing with a few invalid output indices is needed, as defined in (2.4) and (2.5).
A detailed derivation of the decomposition cost can be found in Appendix B; we simply state here the decomposition costs in terms of the number of arithmetic operations:
(a) T_A = 2N_1N_2 - N_1(p_2+1) - N_2(p_1+1) additions for (2.8)–(2.11);
(b) T_B = 2(N_1N_2 - N_1 - N_2) additions for (2.6);
(c) T_M = 3(N_1-2p_1)(N_2-2p_2) + 4p_2(N_1-2p_1) + 4p_1(N_2-2p_2) + 2p_1p_2 multiplications for (2.8)–(2.11).
The total number of multiplications is

M^{\mathrm{III}}_{\mathrm{DCT}}(N_1,N_2) = M^{\mathrm{I}}_{CC}(N_1/2, N_2/2) + M^{\mathrm{I}}_{SC}(N_1/2, N_2/2) + M^{\mathrm{I}}_{CS}(N_1/2, N_2/2) + M^{\mathrm{I}}_{SS}(N_1/2, N_2/2) + T_M   (2.12)
and the total number of additions is

A^{\mathrm{III}}_{\mathrm{DCT}}(N_1,N_2) = A^{\mathrm{I}}_{CC}(N_1/2, N_2/2) + A^{\mathrm{I}}_{SC}(N_1/2, N_2/2) + A^{\mathrm{I}}_{CS}(N_1/2, N_2/2) + A^{\mathrm{I}}_{SS}(N_1/2, N_2/2) + T_A + T_B.   (2.13)
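The per-stage costs (a)–(c) above can be sketched directly (Python; illustrative only — the full totals of (2.12)–(2.13) additionally require the mutually recursive cost formulas of Sections 3–8, which are not reproduced here):

```python
def stage_costs(N1, N2, p1, p2):
    """Per-stage decomposition costs (a)-(c) of Section 2."""
    T_A = 2 * N1 * N2 - N1 * (p2 + 1) - N2 * (p1 + 1)   # adds, (2.8)-(2.11)
    T_B = 2 * (N1 * N2 - N1 - N2)                       # adds, (2.6)
    T_M = (3 * (N1 - 2 * p1) * (N2 - 2 * p2)
           + 4 * p2 * (N1 - 2 * p1) + 4 * p1 * (N2 - 2 * p2)
           + 2 * p1 * p2)                               # mults, (2.8)-(2.11)
    return T_A, T_B, T_M

# Smallest non-trivial case for p1 = p2 = 3: a 6 x 6 block.
costs = stage_costs(6, 6, 3, 3)
```

For the 6 × 6 block the three contributions are 24 additions, 48 additions and 18 multiplications, respectively.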
In the following sections, we consider the further decomposition of y^I_CC, y^I_CS, y^I_SC and y^I_SS into similar sequences of smaller computational blocks. For simplicity of presentation, a detailed decomposition is given for the type-I cosine–cosine sequence y^I_CC; the computation of the other sequences is described by mathematical equations only.
3. Computation of the type-I cosine–cosine sequence
For a general description of the decomposition procedure, the type-I cosine–cosine sequence u^I_CC(n1, n2) is defined as

u^{\mathrm{I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2} W(k_1,k_2) \cos\left(\frac{n_1 k_1 \pi}{N_1}\right) \cos\left(\frac{n_2 k_2 \pi}{N_2}\right),
n_1 = 0, \ldots, N_1,  n_2 = 0, \ldots, N_2,   (3.1)
where W(k1, k2) is the input matrix. With decimation-in-time (DIT) decomposition, (3.1) can be expressed by

b^{\mathrm{I}}_{CC}(n_1,n_2) = u^{\mathrm{I}}_{CC}(2n_1, 2n_2) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2} W(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2,  n_2 = 0, \ldots, N_2/2,   (3.2)
b^{\mathrm{I-III}}_{CC}(n_1,n_2) = u^{\mathrm{I}}_{CC}(2n_1, 2n_2+1) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2} W(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 0, \ldots, N_1/2,  n_2 = 0, \ldots, N_2/2-1,   (3.3)
b^{\mathrm{III-I}}_{CC}(n_1,n_2) = u^{\mathrm{I}}_{CC}(2n_1+1, 2n_2) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2} W(k_1,k_2) \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2,   (3.4)
b^{\mathrm{III}}_{CC}(n_1,n_2) = u^{\mathrm{I}}_{CC}(2n_1+1, 2n_2+1) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2} W(k_1,k_2) \cos\left(\frac{\pi (2n_1+1) k_1}{2(N_1/2)}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{2(N_2/2)}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2-1.   (3.5)
Based on the properties of trigonometric identities, (3.2)–(3.5) can be recursively divided into similar sub-sequences of reduced size. For example,
b^{\mathrm{I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2} W(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right)
+ \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2-1} W(k_1, N_2-k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi n_2 (N_2-k_2)}{N_2/2}\right)
+ \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} W(N_1-k_1, k_2) \cos\left(\frac{\pi n_1 (N_1-k_1)}{N_1/2}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right)
+ \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2-1} W(N_1-k_1, N_2-k_2) \cos\left(\frac{\pi n_1 (N_1-k_1)}{N_1/2}\right) \cos\left(\frac{\pi n_2 (N_2-k_2)}{N_2/2}\right),   (3.6a)
which can be rewritten as

b^{\mathrm{I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2} \big[ W(k_1,k_2) + l_2 W(k_1, N_2-k_2) + l_1 W(N_1-k_1, k_2) + l_1 l_2 W(N_1-k_1, N_2-k_2) \big] \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),   (3.6b)

where, for i = 1 and 2, l_i = 0 for k_i = N_i/2 and l_i = 1 otherwise. Similarly,
b^{\mathrm{I-III}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2} W(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{N_2}\right)
+ \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2-1} W(k_1, N_2-k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1)(N_2-k_2)}{N_2}\right)
+ \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} W(N_1-k_1, k_2) \cos\left(\frac{\pi n_1 (N_1-k_1)}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{N_2}\right)
+ \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2-1} W(N_1-k_1, N_2-k_2) \cos\left(\frac{\pi n_1 (N_1-k_1)}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1)(N_2-k_2)}{N_2}\right),   (3.7a)
which is converted into

b^{\mathrm{I-III}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2-1} \big[ W(k_1,k_2) - W(k_1, N_2-k_2) + l_1 W(N_1-k_1, k_2) - l_1 W(N_1-k_1, N_2-k_2) \big] \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{N_2}\right).   (3.7b)
With similar arrangements for b^{III-I}_CC and b^III_CC, we have

b^{\mathrm{III-I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} \big[ W(k_1,k_2) + l_2 W(k_1, N_2-k_2) - W(N_1-k_1, k_2) - l_2 W(N_1-k_1, N_2-k_2) \big] \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right)   (3.8)
and

b^{\mathrm{III}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2-1} \big[ W(k_1,k_2) - W(k_1, N_2-k_2) - W(N_1-k_1, k_2) + W(N_1-k_1, N_2-k_2) \big] \cos\left(\frac{\pi (2n_1+1) k_1}{2(N_1/2)}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{2(N_2/2)}\right).   (3.9)
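The key property of (3.9) — that the bracketed combination feeds an (N1/2) × (N2/2) type-III DCT — can be checked numerically (Python sketch under our own naming; the transform is unnormalized, as in (2.1)):

```python
import math, random

def u_cc(W, N1, N2, n1, n2):
    # type-I cosine-cosine sequence, Eq. (3.1)
    return sum(W[k1][k2] * math.cos(math.pi * n1 * k1 / N1)
                         * math.cos(math.pi * n2 * k2 / N2)
               for k1 in range(N1 + 1) for k2 in range(N2 + 1))

def dct3_2d(B):
    # unnormalized 2-D type-III DCT, as in Eq. (2.1)
    M1, M2 = len(B), len(B[0])
    return [[sum(B[k1][k2]
                 * math.cos((2 * n1 + 1) * k1 * math.pi / (2 * M1))
                 * math.cos((2 * n2 + 1) * k2 * math.pi / (2 * M2))
                 for k1 in range(M1) for k2 in range(M2))
             for n2 in range(M2)] for n1 in range(M1)]

random.seed(1)
N1 = N2 = 4
W = [[random.random() for _ in range(N2 + 1)] for _ in range(N1 + 1)]
# Bracketed combination of Eq. (3.9): an (N1/2) x (N2/2) matrix.
B = [[W[k1][k2] - W[k1][N2 - k2] - W[N1 - k1][k2] + W[N1 - k1][N2 - k2]
      for k2 in range(N2 // 2)] for k1 in range(N1 // 2)]
# Its type-III DCT matches the odd-odd outputs u_CC(2n1+1, 2n2+1).
b3 = dct3_2d(B)
err = max(abs(b3[n1][n2] - u_cc(W, N1, N2, 2 * n1 + 1, 2 * n2 + 1))
          for n1 in range(N1 // 2) for n2 in range(N2 // 2))
```

This is the recursion's termination branch: each stage hands a quarter of its work to a half-size type-III DCT.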
The decomposition cost for (3.6)–(3.9) is 2N1N2 + N1 + N2 additions, and the proposed decomposition requires no twiddle factors. In particular, b^I_CC is the same as u^I_CC except that the former has a reduced size, and b^III_CC is the type-III (N1/2) by (N2/2) 2-D DCT. Computation of b^{III-I}_CC can be done in terms of b^{I-III}_CC if we swap N1, n1, k1 with N2, n2, k2, respectively. The decomposition of b^{I-III}_CC will be discussed in Section 7. The total computation costs for the type-I cosine–cosine sequence are
M^{\mathrm{I}}_{CC}(N_1,N_2) = M^{\mathrm{I}}_{CC}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{CC}(N_1/2, N_2/2) + M^{\mathrm{III-I}}_{CC}(N_1/2, N_2/2) + M^{\mathrm{III}}_{\mathrm{DCT}}(N_1/2, N_2/2),   (3.10)

A^{\mathrm{I}}_{CC}(N_1,N_2) = A^{\mathrm{I}}_{CC}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{CC}(N_1/2, N_2/2) + A^{\mathrm{III-I}}_{CC}(N_1/2, N_2/2) + A^{\mathrm{III}}_{\mathrm{DCT}}(N_1/2, N_2/2) + 2N_1N_2 + N_1 + N_2.   (3.11)
4. Computation of the type-I sine–cosine sequence
The type-I sine–cosine sequence of W(k1, k2) is defined as

u^{\mathrm{I}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1-1} \sum_{k_2=0}^{N_2} W(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2}\right),
n_1 = 1, \ldots, N_1-1,  n_2 = 0, \ldots, N_2,   (4.1)
which can be decomposed into

b^{\mathrm{I}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} G_1(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 1, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2,   (4.2)

b^{\mathrm{I-III}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=0}^{N_2/2-1} G_2(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 1, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2-1,   (4.3)

b^{\mathrm{III-I}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} G_3(k_1,k_2) \sin\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2,   (4.4)

b^{\mathrm{III}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2} \sum_{k_2=0}^{N_2/2-1} G_4(k_1,k_2) \sin\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2-1,   (4.5)
where

\begin{bmatrix} G_1(k_1,k_2) \\ G_2(k_1,k_2) \\ G_3(k_1,k_2) \\ G_4(k_1,k_2) \end{bmatrix}
=
\begin{bmatrix} 1 & l_2 & 1 & l_2 \\ 1 & -1 & 1 & -1 \\ 1 & l_2 & -l_1 & -l_1 l_2 \\ 1 & -1 & -l_1 & l_1 \end{bmatrix}
\begin{bmatrix} W(k_1,k_2) \\ W(k_1, N_2-k_2) \\ W(N_1-k_1, k_2) \\ W(N_1-k_1, N_2-k_2) \end{bmatrix},   (4.6)
where l_i, i = 1, 2, is defined in the last section. The number of additions for the decomposition is 2N1N2 − 3N2 + N1 − 2. Note that b^I_SC is the same as u^I_SC with a reduced size, and b^III_SC can be converted into b^III_CC when n1 is replaced by N1/2 − n1. Similarly, computation of b^{III-I}_SC can be done in terms of b^{III-I}_CC, which can in turn be converted into b^{I-III}_CC by swapping the values of n1, k1, N1 and n2, k2, N2, respectively. The decomposition of b^{I-III}_SC will be discussed in Section 8. The total computation costs for the type-I sine–cosine sequence are
M^{\mathrm{I}}_{SC}(N_1,N_2) = M^{\mathrm{I}}_{SC}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{SC}(N_1/2, N_2/2) + M^{\mathrm{III-I}}_{SC}(N_1/2, N_2/2) + M^{\mathrm{III}}_{\mathrm{DCT}}(N_1/2, N_2/2),   (4.7)

A^{\mathrm{I}}_{SC}(N_1,N_2) = A^{\mathrm{I}}_{SC}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{SC}(N_1/2, N_2/2) + A^{\mathrm{III-I}}_{SC}(N_1/2, N_2/2) + A^{\mathrm{III}}_{\mathrm{DCT}}(N_1/2, N_2/2) + 2N_1N_2 + N_1 - 3N_2 - 2.   (4.8)
5. Computation of the type-I cosine–sine sequence
The type-I cosine–sine sequence of W(k1, k2) is defined as

u^{\mathrm{I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1} \sum_{k_2=1}^{N_2-1} W(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2}\right),
n_1 = 0, \ldots, N_1,  n_2 = 1, \ldots, N_2-1,   (5.1)
which is decomposed into

b^{\mathrm{I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=1}^{N_2/2-1} G_1(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2,  n_2 = 1, \ldots, N_2/2-1,   (5.2)

b^{\mathrm{I-III}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=1}^{N_2/2} G_2(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 0, \ldots, N_1/2,  n_2 = 0, \ldots, N_2/2-1,   (5.3)

b^{\mathrm{III-I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=1}^{N_2/2-1} G_3(k_1,k_2) \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 1, \ldots, N_2/2-1,   (5.4)

b^{\mathrm{III}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=1}^{N_2/2} G_4(k_1,k_2) \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2-1,   (5.5)
where

\begin{bmatrix} G_1(k_1,k_2) \\ G_2(k_1,k_2) \\ G_3(k_1,k_2) \\ G_4(k_1,k_2) \end{bmatrix}
=
\begin{bmatrix} 1 & -1 & l_1 & -l_1 \\ 1 & 1 & l_1 & l_1 \\ 1 & -1 & -1 & 1 \\ 1 & l_2 & -1 & -l_2 \end{bmatrix}
\begin{bmatrix} W(k_1,k_2) \\ W(k_1, N_2-k_2) \\ W(N_1-k_1, k_2) \\ W(N_1-k_1, N_2-k_2) \end{bmatrix}.   (5.6)
The required decomposition cost is 2N1N2 − 3N1 + N2 − 2 additions. The sub-sequence b^I_CS is the same as u^I_CS but with a reduced size, and b^III_CS can be converted into b^III_CC if n2 is replaced by N2/2 − n2. Similarly, computation of b^{I-III}_CS can be achieved from b^{I-III}_CC, and b^{III-I}_CS can be converted into b^{I-III}_SC by swapping n1, k1, N1 with n2, k2, N2, respectively. The total computation cost for the type-I cosine–sine sequence is the same as that for the type-I sine–cosine sequence.
6. Computation of the type-I sine–sine sequence
The type-I sine–sine sequence of W(k1, k2) is defined as

u^{\mathrm{I}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1-1} \sum_{k_2=1}^{N_2-1} W(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2}\right),
n_1 = 1, \ldots, N_1-1,  n_2 = 1, \ldots, N_2-1,   (6.1)
which can be decomposed into the following four sub-sequences.
b^{\mathrm{I}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=1}^{N_2/2-1} G_1(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 1, \ldots, N_1/2-1,  n_2 = 1, \ldots, N_2/2-1,   (6.2)
b^{\mathrm{I-III}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=1}^{N_2/2} G_2(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 1, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2-1,   (6.3)

b^{\mathrm{III-I}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2} \sum_{k_2=1}^{N_2/2-1} G_3(k_1,k_2) \sin\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 1, \ldots, N_2/2-1,   (6.4)

b^{\mathrm{III}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2} \sum_{k_2=1}^{N_2/2} G_4(k_1,k_2) \sin\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2-1,   (6.5)
where

\begin{bmatrix} G_1(k_1,k_2) \\ G_2(k_1,k_2) \\ G_3(k_1,k_2) \\ G_4(k_1,k_2) \end{bmatrix}
=
\begin{bmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} W(k_1,k_2) \\ W(k_1, N_2-k_2) \\ W(N_1-k_1, k_2) \\ W(N_1-k_1, N_2-k_2) \end{bmatrix}.   (6.6)
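The sign pattern in the first row of (6.6) can be checked numerically (Python sketch, names ours): with G1 = W(k1,k2) − W(k1,N2−k2) − W(N1−k1,k2) + W(N1−k1,N2−k2), the half-range sum (6.2) reproduces u^I_SS of (6.1) at the even output indices.

```python
import math, random

def u_ss(W, N1, N2, n1, n2):
    # type-I sine-sine sequence, Eq. (6.1)
    return sum(W[k1][k2] * math.sin(math.pi * n1 * k1 / N1)
                         * math.sin(math.pi * n2 * k2 / N2)
               for k1 in range(1, N1) for k2 in range(1, N2))

def b_ss(W, N1, N2, n1, n2):
    # folded form of Eq. (6.2) with G1 from Eq. (6.6)
    total = 0.0
    for k1 in range(1, N1 // 2):
        for k2 in range(1, N2 // 2):
            g1 = (W[k1][k2] - W[k1][N2 - k2]
                  - W[N1 - k1][k2] + W[N1 - k1][N2 - k2])
            total += (g1 * math.sin(math.pi * n1 * k1 / (N1 / 2))
                         * math.sin(math.pi * n2 * k2 / (N2 / 2)))
    return total

random.seed(3)
N1 = N2 = 6
W = [[random.random() for _ in range(N2)] for _ in range(N1)]
err = max(abs(b_ss(W, N1, N2, n1, n2) - u_ss(W, N1, N2, 2 * n1, 2 * n2))
          for n1 in range(1, N1 // 2) for n2 in range(1, N2 // 2))
```

No l flags are needed here because the sine factors vanish at ki = Ni/2, so the self-paired terms drop out on their own.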
The sub-sequence b^I_SS has the same definition as u^I_SS but with a reduced size, and b^III_SS can be converted into b^III_CC when n1 and n2 are replaced by N1/2 − n1 and N2/2 − n2, respectively. Similarly, b^{III-I}_SS can be converted into b^{I-III}_SS by swapping n1, k1, N1 with n2, k2, N2, respectively, and b^{I-III}_SS can in turn be converted into b^{I-III}_SC by replacing n2 by N2/2 − n2. The decomposition requires 2N1N2 − 3N2 − 3N1 + 4 additions. The total computational costs are
M^{\mathrm{I}}_{SS}(N_1,N_2) = M^{\mathrm{I}}_{SS}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{SS}(N_1/2, N_2/2) + M^{\mathrm{III-I}}_{SS}(N_1/2, N_2/2) + M^{\mathrm{III}}_{SS}(N_1/2, N_2/2),   (6.7)

A^{\mathrm{I}}_{SS}(N_1,N_2) = A^{\mathrm{I}}_{SS}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{SS}(N_1/2, N_2/2) + A^{\mathrm{III-I}}_{SS}(N_1/2, N_2/2) + A^{\mathrm{III}}_{SS}(N_1/2, N_2/2) + 2N_1N_2 - 3N_1 - 3N_2 + 4.   (6.8)
7. Computation of the type-I–III cosine–cosine sequence
The type-I–III cosine–cosine sequence of W(k1, k2) is defined as

u^{\mathrm{I-III}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2-1} W(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{2N_2}\right),
n_1 = 0, \ldots, N_1,  n_2 = 0, \ldots, N_2-1,   (7.1)
which can be expressed by

\begin{bmatrix} u^{\mathrm{I-III}}_{CC}\left(2n_1,\, 2n_2-\frac{p_2+1}{2}\right) \\ u^{\mathrm{I-III}}_{CC}\left(2n_1,\, 2n_2+\frac{p_2-1}{2}\right) \end{bmatrix}
=
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} b^{\mathrm{I}}_{CC}(n_1,n_2), & n_1 = 0, \ldots, N_1/2, & n_2 = 0, \ldots, N_2/2 \\ b^{\mathrm{I}}_{CS}(n_1,n_2), & n_1 = 0, \ldots, N_1/2, & n_2 = 1, \ldots, N_2/2-1 \end{bmatrix},   (7.2a)

where

\begin{bmatrix} b^{\mathrm{I}}_{CC}(n_1,n_2) \\ b^{\mathrm{I}}_{CS}(n_1,n_2) \end{bmatrix}
=
\begin{bmatrix}
\sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2-1} W(k_1,k_2)\cos(\alpha_2 k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right) \\
\sum_{k_1=0}^{N_1} \sum_{k_2=1}^{N_2-1} W(k_1,k_2)\sin(\alpha_2 k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right)
\end{bmatrix},   (7.2b)
\begin{bmatrix} u^{\mathrm{I-III}}_{CC}\left(2n_1+1,\, 2n_2-\frac{p_2+1}{2}\right) \\ u^{\mathrm{I-III}}_{CC}\left(2n_1+1,\, 2n_2+\frac{p_2-1}{2}\right) \end{bmatrix}
=
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} b^{\mathrm{III-I}}_{CC}(n_1,n_2), & n_1 = 0, \ldots, N_1/2-1, & n_2 = 0, \ldots, N_2/2 \\ b^{\mathrm{III-I}}_{CS}(n_1,n_2), & n_1 = 0, \ldots, N_1/2-1, & n_2 = 1, \ldots, N_2/2-1 \end{bmatrix},   (7.3a)

where \alpha_2 = p_2\pi/(2N_2) and
\begin{bmatrix} b^{\mathrm{III-I}}_{CC}(n_1,n_2) \\ b^{\mathrm{III-I}}_{CS}(n_1,n_2) \end{bmatrix}
=
\begin{bmatrix}
\sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2-1} W(k_1,k_2)\cos(\alpha_2 k_2) \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right) \\
\sum_{k_1=0}^{N_1} \sum_{k_2=1}^{N_2-1} W(k_1,k_2)\sin(\alpha_2 k_2) \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right)
\end{bmatrix}.   (7.3b)
Both (7.2b) and (7.3b) contain sub-sequences that have been considered in the previous sections. The index associated with n2 in (7.2a) and (7.3a) requires a mapping process for the invalid indices, which can be performed in the same way as that given in Appendix A. The sub-sequences b^I_CC, b^I_CS, b^{III-I}_CC and b^{III-I}_CS can be further decomposed into
b^{\mathrm{I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2} \big[ W(k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} l_2 W(k_1,N_2-k_2)\sin(\alpha_2 k_2) + l_1 W(N_1-k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} l_1 l_2 W(N_1-k_1,N_2-k_2)\sin(\alpha_2 k_2) \big] \cos\frac{\pi n_1 k_1}{N_1/2} \cos\frac{\pi n_2 k_2}{N_2/2},   (7.5)
b^{\mathrm{I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=1}^{N_2/2-1} \big[ W(k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} W(k_1,N_2-k_2)\cos(\alpha_2 k_2) + l_1 W(N_1-k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} l_1 W(N_1-k_1,N_2-k_2)\cos(\alpha_2 k_2) \big] \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),   (7.6)
b^{\mathrm{III-I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} \big[ W(k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} l_2 W(k_1,N_2-k_2)\sin(\alpha_2 k_2) - W(N_1-k_1,k_2)\cos(\alpha_2 k_2) - (-1)^{(p_2-1)/2} l_2 W(N_1-k_1,N_2-k_2)\sin(\alpha_2 k_2) \big] \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),   (7.7)
b^{\mathrm{III-I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=1}^{N_2/2-1} \big[ W(k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} W(k_1,N_2-k_2)\cos(\alpha_2 k_2) - W(N_1-k_1,k_2)\sin(\alpha_2 k_2) + (-1)^{(p_2-1)/2} W(N_1-k_1,N_2-k_2)\cos(\alpha_2 k_2) \big] \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right).   (7.8)
These sub-sequences can be further decomposed, as shown in the previous sections. The computation in (7.2a) and (7.3a) needs N1N2 + N2 − 2N1 − 2 additions. The cost for combining the terms inside the brackets of (7.5)–(7.8) is 2N1N2 + N2 − (N1+1)(p2+1) additions and 2N1N2 + 2N2 − 3p2(N1+1) multiplications. Therefore, the total computation cost for the type-I–III cosine–cosine sequence is
M^{\mathrm{I-III}}_{CC}(N_1,N_2) = M^{\mathrm{I}}_{CC}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{CC}(N_1/2, N_2/2) + M^{\mathrm{I}}_{CS}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{CS}(N_1/2, N_2/2) + 2N_1N_2 + 2N_2 - 3p_2(N_1+1),   (7.10)

A^{\mathrm{I-III}}_{CC}(N_1,N_2) = A^{\mathrm{I}}_{CC}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{CC}(N_1/2, N_2/2) + A^{\mathrm{I}}_{CS}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{CS}(N_1/2, N_2/2) + 3N_1N_2 + 2N_2 - (N_1+1)(p_2+3).   (7.11)
8. Computation of the type-I–III sine–cosine sequence
The type-I–III sine–cosine sequence of W(k1, k2) is defined as

u^{\mathrm{I-III}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1-1} \sum_{k_2=0}^{N_2-1} W(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{2N_2}\right),
n_1 = 1, \ldots, N_1-1,  n_2 = 0, \ldots, N_2-1.   (8.1)
We have
\begin{bmatrix} u^{\mathrm{I-III}}_{SC}\left(2n_1,\, 2n_2-\frac{p_2+1}{2}\right) \\ u^{\mathrm{I-III}}_{SC}\left(2n_1,\, 2n_2+\frac{p_2-1}{2}\right) \end{bmatrix}
=
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} b^{\mathrm{I}}_{SC}(n_1,n_2), & n_1 = 1, \ldots, N_1/2-1, & n_2 = 0, \ldots, N_2/2 \\ b^{\mathrm{I}}_{SS}(n_1,n_2), & n_1 = 1, \ldots, N_1/2-1, & n_2 = 1, \ldots, N_2/2-1 \end{bmatrix},   (8.2)
\begin{bmatrix} u^{\mathrm{I-III}}_{SC}\left(2n_1+1,\, 2n_2-\frac{p_2+1}{2}\right) \\ u^{\mathrm{I-III}}_{SC}\left(2n_1+1,\, 2n_2+\frac{p_2-1}{2}\right) \end{bmatrix}
=
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} b^{\mathrm{III-I}}_{SC}(n_1,n_2), & n_1 = 0, \ldots, N_1/2-1, & n_2 = 0, \ldots, N_2/2 \\ b^{\mathrm{III-I}}_{SS}(n_1,n_2), & n_1 = 0, \ldots, N_1/2-1, & n_2 = 1, \ldots, N_2/2-1 \end{bmatrix},   (8.3)
where for (8.2) and (8.3)
b^{\mathrm{I}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} \big[ W(k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} r_2 W(k_1,N_2-k_2)\sin(\alpha_2 k_2) - W(N_1-k_1,k_2)\cos(\alpha_2 k_2) - (-1)^{(p_2-1)/2} r_2 W(N_1-k_1,N_2-k_2)\sin(\alpha_2 k_2) \big] \sin\frac{\pi n_1 k_1}{N_1/2} \cos\frac{\pi n_2 k_2}{N_2/2},
n_1 = 1, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2,   (8.4)
b^{\mathrm{I}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=1}^{N_2/2-1} \big[ W(k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} W(k_1,N_2-k_2)\cos(\alpha_2 k_2) - W(N_1-k_1,k_2)\sin(\alpha_2 k_2) + (-1)^{(p_2-1)/2} W(N_1-k_1,N_2-k_2)\cos(\alpha_2 k_2) \big] \sin\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 1, \ldots, N_1/2-1,  n_2 = 1, \ldots, N_2/2-1,   (8.5)
b^{\mathrm{III-I}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2} \sum_{k_2=0}^{N_2/2} \big[ W(k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} r_2 W(k_1,N_2-k_2)\sin(\alpha_2 k_2) + l_1 W(N_1-k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} l_1 r_2 W(N_1-k_1,N_2-k_2)\sin(\alpha_2 k_2) \big] \sin\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2,   (8.6)
b^{\mathrm{III-I}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2} \sum_{k_2=1}^{N_2/2-1} \big[ W(k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} W(k_1,N_2-k_2)\cos(\alpha_2 k_2) + l_1 W(N_1-k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} l_1 W(N_1-k_1,N_2-k_2)\cos(\alpha_2 k_2) \big] \sin\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 1, \ldots, N_2/2-1,   (8.7)
where α2 = p2π/(2N2), and l1 and r2 are defined in Sections 3 and 2, respectively. The sub-sequences in (8.4)–(8.7) can be further decomposed as shown in the previous sections. The computation for (8.2) and (8.3) requires N1N2 − 2N1 − N2 + 2 additions. It further requires 2N1N2 − 3N2 − (p2+1)(N1−1) additions and 2N1N2 − 2N2 − 3p2(N1−1) multiplications to combine the terms inside the brackets in (8.4)–(8.7). The total computation cost for the type-I–III sine–cosine sequence is

Fig. 1. Relation between the different sequences at different time indices.
M^{\mathrm{I-III}}_{SC}(N_1,N_2) = M^{\mathrm{I}}_{SC}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{SC}(N_1/2, N_2/2) + M^{\mathrm{I}}_{SS}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{SS}(N_1/2, N_2/2) + 2N_1N_2 - 2N_2 - 3p_2(N_1-1),   (8.8)

A^{\mathrm{I-III}}_{SC}(N_1,N_2) = A^{\mathrm{I}}_{SC}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{SC}(N_1/2, N_2/2) + A^{\mathrm{I}}_{SS}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{SS}(N_1/2, N_2/2) + 3N_1N_2 - 4N_2 - (N_1-1)(p_2+3).   (8.9)
9. Discussion
In Section 2, it was shown that the 2-D type-III DCT can be decomposed into four sub-sequences, which are further decomposed into similar sub-sequences of smaller sizes. It was also shown that one type of sub-sequence can be converted into another according to two properties. The first property is that the parameters n1, k1 and N1 can be swapped with n2, k2 and N2, which is equivalent to transposing the input matrix; for example, the type-III–I cosine–sine sequence can be implemented by the type-I–III sine–cosine sequence. The second property is that substituting n by N/2 − n converts a type-III sine term into a type-III cosine term and vice versa. These conversion processes can be realized in subroutines without much computational overhead. By using these properties, the type-III DCT computation can be accomplished using only seven types of sub-sequences. Fig. 1 shows the relation between these sub-sequences, each being decomposed into four sub-sequences of smaller size either directly (solid lines) or through conversions (dashed lines). The approaches used in Sections 3–8 are based on decimation-in-time decomposition; the type-II DCT is needed instead if the sub-sequences are decomposed by decimation-in-frequency (DIF).

The proposed algorithm needs an index mapping process, which can be considered trivial because it involves only a small number of data indices. In addition, Appendix A shows that the number of indices involved in the mapping process depends only on the values of p1 and p2 and does not increase with the transform size.

Table 1 shows the computational complexity needed by the proposed algorithm and by the algorithms reported by Cho [6] and Chan [1]. For N1 = N2 = 2^r (i.e. p1 = p2 = 1), the proposed and Chan's algorithms require about the same number of arithmetic operations, slightly more than Cho's algorithm. Table 1 also lists the number of operations required by the proposed algorithm for p1 = 1, 3 and p2 = 3; Fig. 2 shows that these cases need a smaller computational complexity than the case p1 = p2 = 1. In general, the proposed algorithm uses fewer additions and more multiplications than the other reported algorithms. This is a consequence of assuming four multiplications and two additions per butterfly in our analysis, rather than three multiplications and three additions. It is up to users to decide which implementation scheme suits their applications; at present, multiplication and addition can be performed at the same speed on some DSP chips, so it is important to minimize the overall computational complexity rather than to reduce the number of multiplications at the cost of more additions.

Table 1
Computational complexity needed by various algorithms

N1 × N2      Mul        Add        Total

Proposed algorithm (p1 = p2 = 3)
6 × 6        38         192        230
12 × 12      342        1104       1446
24 × 24      1934       5952       7886
48 × 48      10502      29760      40262
96 × 96      52542      143616     196158
192 × 192    254774     672000     926774
384 × 384    1196078    3081216    4277294
768 × 768    5503014    13894656   19397670

Proposed algorithm (p1 = 1, p2 = 3)
8 × 6        102        284        386
16 × 12      626        1628       2254
32 × 24      3178       8484       11662
64 × 48      16050      41876      57926
128 × 96     77786      199796     277582
256 × 192    369154     929076     1298230
512 × 384    1710762    4238516    5949278

Proposed algorithm (p1 = p2 = 1)
8 × 8        183        417        600
16 × 16      975        2305       3280
32 × 32      5195       11921      17116
64 × 64      25483      58417      83900
128 × 128    122099     277233     399332
256 × 256    567243     1282929    1850172
512 × 512    2589891    5830257    8420148

Chan's algorithm [1]
8 × 8        144        464        608
16 × 16      768        2592       3360
32 × 32      3840       13376      17216
64 × 64      18432      65664      84096
128 × 128    86016      311552     397568
256 × 256    393216     1442304    1835520

Cho's algorithm [6]
8 × 8        112        472        584
16 × 16      640        2624       3264
32 × 32      3328       13504      16832
64 × 64      16384      66176      82560
128 × 128    77824      313600     391424
256 × 256    360446     1450496    1810942

Fig. 2. Comparison of computational complexity needed by various algorithms.
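The butterfly trade-off discussed in Section 9 can be illustrated with a plane rotation, the core operation behind twiddle-factor multiplications (Python sketch; this is a generic textbook identity, not code from the paper):

```python
import math

def rotate_4m2a(x, y, c, s):
    """Plane rotation with 4 multiplications and 2 additions."""
    return c * x - s * y, s * x + c * y

def rotate_3m3a(x, y, c, s):
    """Same rotation with 3 multiplications and 3 additions;
    (c + s) and (s - c) would be precomputed constants in practice."""
    t = c * (x + y)
    return t - (c + s) * y, t + (s - c) * x

theta = 0.3
c, s = math.cos(theta), math.sin(theta)
a = rotate_4m2a(1.25, -0.5, c, s)
b = rotate_3m3a(1.25, -0.5, c, s)
```

Whether the 3-multiply form is preferable depends on the relative cost of multiplication and addition on the target hardware, which is exactly the point made in the discussion.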
10. Conclusion
A fast algorithm is presented for the two-dimensional type-III DCT. It is shown that the type-III DCT can berecursively decomposed into a number of sub-sequences with reduced sizes. This algorithm has a fairly regularstructure and a simple input and output indexing scheme. One important feature of the proposed algorithm isto naturally support various transform sizes with a possible reduction of computational complexity.
Appendix A. Index mapping process

We consider the mapping process for the 1-D DCT, whose sequence length is N = p·2^r, where r > 1 and p is an odd integer. In general, we divide x(n), n = 0, …, N−1, into the sub-sequences x[2n−(p+1)/2] and x[2n+(p−1)/2], where n = 0, …, N/2. It can easily be verified that for n < (p+1)/4 the index 2n−(p+1)/2 becomes negative, and for n ≥ N/2−(p+1)/4 the index 2n+(p−1)/2 exceeds N−1. Both are invalid indices, and they therefore require an index mapping process.

Based on the definition of the type-III DCT, x(n) is associated with cos[(2n+1)kπ/(2N)]. Similarly, for t ≥ 0, x(−t−1) is associated with cos[(−2t−1)kπ/(2N)] = cos[(2t+1)kπ/(2N)]; it is therefore reasonable for the invalid datum x(−t−1) to be replaced by x(t) in the DCT computation. Similarly, if x(N+t) is replaced by x(N−t−1), where t ≥ 0, we have x(N+t) cos[kπ + (2t+1)kπ/(2N)] = x(N−t−1) cos[kπ − (2t+1)kπ/(2N)]. In summary, the mapping process is defined by

x(−t−1) ⇐ x(t) and x(N+t) ⇐ x(N−t−1) for 0 ≤ t ≤ (p−1)/2.

Now let us consider the indices of the sub-sequences x[2n−(p+1)/2] and x[2n+(p−1)/2] for n = 0, …, N/2. Based on the above mapping process, it can be verified that for n = 0, x[−(p+1)/2] would be replaced by x[(p−1)/2], which would then be used twice. The same duplication occurs when x[N+(p−1)/2] = x[N−(p+1)/2] for n = N/2. The duplication problem is eliminated by defining x[−(p+1)/2] = x[N+(p−1)/2] = 0. Table 2 illustrates two examples, for p = 5 and p = 11; it shows that the mapping process involves only a few data whose indices are invalid. The number of invalid data is generally 2⌈p/4⌉, where ⌈x⌉ denotes the smallest integer larger than x.

Table 2
Mapping process for p = 5, N = 10 and for p = 11, N = 22

         p = 5, N = 10                   p = 11, N = 22
n        x(2n−3)         x(2n+2)         x(2n−6)         x(2n+5)
0        x(−3) = 0       x(2)            x(−6) = 0       x(5)
1        x(−1) = x(0)*   x(4)            x(−4) = x(3)*   x(7)
2        x(1)            x(6)            x(−2) = x(1)*   x(9)
3        x(3)            x(8)            x(0)            x(11)
4        x(5)            x(10) = x(9)*   x(2)            x(13)
5        x(7)            x(12) = 0       x(4)            x(15)
6        –               –               x(6)            x(17)
7        –               –               x(8)            x(19)
8        –               –               x(10)           x(21)
9        –               –               x(12)           x(23) = x(20)*
10       –               –               x(14)           x(25) = x(18)*
11       –               –               x(16)           x(27) = 0

*Mapping region.
Appendix B. Computation complexity
In Sections 2-6, we decompose one 2-D sequence into four sub-sequences, as shown in (2.6). Because the summations [(2.7), for example] for these sub-sequences have different limits, it is difficult to calculate the decomposition cost in terms of the number of additions. Table 3 lists the details of the additive costs for (2.6).
Table 3
Decomposition cost

n1                 n2                 No. of additions
0                  0                  0
0                  N2/2               0
0                  1, …, N2/2-1       N2-2
N1/2               0                  0
N1/2               N2/2               0
N1/2               1, …, N2/2-1       N2-2
1, …, N1/2-1       0                  N1-2
1, …, N1/2-1       N2/2               N1-2
1, …, N1/2-1       1, …, N2/2-1       2(N1-2)(N2-2)
Table 4
No. of additions needed by the decomposition

k1                 k2                 No. of additions
0                  0                  0
0                  N2/2               0
0                  1, …, N2/2-1       N2-p2-1
N1/2               0                  0
N1/2               N2/2               0
N1/2               1, …, N2/2-1       N2-p2-1
1, …, N1/2-1       0                  N1-p1-1
1, …, N1/2-1       N2/2               N1-p1-1
1, …, N1/2-1       1, …, N2/2-1       2(N1-p1-1)(N2-p2-1)
The total number of additions for (2.6) is 2(N1·N2 - N1 - N2).
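As a quick consistency check (mine, not part of the paper), the row costs of Table 3 can be summed numerically against the stated total 2(N1·N2 - N1 - N2) for any even block sizes:

```python
# Sum the Table 3 row costs: two boundary rows in each dimension contribute
# (N2 - 2) and (N1 - 2) each, and the interior indices contribute the
# 2(N1 - 2)(N2 - 2) term.
def table3_total(N1, N2):
    boundary = 2 * (N2 - 2) + 2 * (N1 - 2)   # n1 or n2 equal to 0 or Ni/2
    interior = 2 * (N1 - 2) * (N2 - 2)       # 1 <= n1 < N1/2, 1 <= n2 < N2/2
    return boundary + interior

# Agreement with the closed-form total over a few even sizes:
for N1 in (6, 10, 24):
    for N2 in (6, 12, 22):
        assert table3_total(N1, N2) == 2 * (N1 * N2 - N1 - N2)
```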
Additions and multiplications are needed by the computation inside the brackets of (2.8)-(2.11). To reduce the computational complexity, we can express the related computation as
cos(a1 k1)[X11(k1, k2) cos(a2 k2) + r2 X21(k1, k2) sin(a2 k2)]
    + r1 sin(a1 k1)[X31(k1, k2) cos(a2 k2) - r2 X41(k1, k2) sin(a2 k2)],
    0 ≤ k1 ≤ N1/2,  0 ≤ k2 ≤ N2/2,                                   (B.1)

sin(a1 k1)[X11(k1, k2) cos(a2 k2) + r2 X21(k1, k2) sin(a2 k2)]
    + cos(a1 k1)[X31(k1, k2) cos(a2 k2) - r2 X41(k1, k2) sin(a2 k2)],
    1 ≤ k1 < N1/2,  0 ≤ k2 ≤ N2/2,                                   (B.2)

cos(a1 k1)[X11(k1, k2) cos(a2 k2) - X21(k1, k2) sin(a2 k2)]
    + r1 sin(a1 k1)[X31(k1, k2) cos(a2 k2) + X41(k1, k2) sin(a2 k2)],
    0 ≤ k1 ≤ N1/2,  1 ≤ k2 < N2/2,                                   (B.3)

sin(a1 k1)[X11(k1, k2) cos(a2 k2) - X21(k1, k2) sin(a2 k2)]
    + cos(a1 k1)[X31(k1, k2) cos(a2 k2) + X41(k1, k2) sin(a2 k2)],
    1 ≤ k1 < N1/2,  1 ≤ k2 < N2/2.                                   (B.4)
Based on the above equations, and taking into account the reduction of additions due to trivial twiddle factors, Table 4 shows the details of the number of additions required by the decomposition process.
Hence, after taking into account the reduction of multiplications and additions due to trivial twiddle factors and special twiddle factors such as 1, 0.5 and 0.7071, the total decomposition cost in additions is

TA = 2N1N2 - N1(p2 + 1) - N2(p1 + 1).                                (B.5)
To reduce the number of multiplications, (B.1)-(B.4) can be calculated as
cos(a1 k1) cos(a2 k2)[X11(k1, k2) + r2 X21(k1, k2) tan(a2 k2)]
    + r1 sin(a1 k1) cos(a2 k2)[X31(k1, k2) - r2 X41(k1, k2) tan(a2 k2)],
    0 ≤ k1 ≤ N1/2,  0 ≤ k2 ≤ N2/2,                                   (B.6)
Table 5
Decomposition cost due to trivial twiddle factors (multiplications)

k1                 k2                 No. of multiplications
0                  0                  0
0                  N2/2               1
0                  1, …, N2/2-1       2N2-3p2-1
N1/2               0                  1
N1/2               N2/2               0
N1/2               1, …, N2/2-1       2N2-3p2-1
1, …, N1/2-1       0                  2N1-3p1-1
1, …, N1/2-1       N2/2               2N1-3p1-1
1, …, N1/2-1       1, …, N2/2-1       3(N1-2p1)(N2-2p2) + 4(N1-2p1)(p2-1) + 4(N2-2p2)(p1-1) + 2(p1-1)(p2-1)
sin(a1 k1) cos(a2 k2)[X11(k1, k2) + r2 X21(k1, k2) tan(a2 k2)]
    + cos(a1 k1) cos(a2 k2)[X31(k1, k2) - r2 X41(k1, k2) tan(a2 k2)],
    1 ≤ k1 < N1/2,  0 ≤ k2 ≤ N2/2,                                   (B.7)

cos(a1 k1) cos(a2 k2)[X11(k1, k2) - X21(k1, k2) tan(a2 k2)]
    + r1 sin(a1 k1) cos(a2 k2)[X31(k1, k2) + X41(k1, k2) tan(a2 k2)],
    0 ≤ k1 ≤ N1/2,  1 ≤ k2 < N2/2,                                   (B.8)

sin(a1 k1) cos(a2 k2)[X11(k1, k2) - X21(k1, k2) tan(a2 k2)]
    + cos(a1 k1) cos(a2 k2)[X31(k1, k2) + X41(k1, k2) tan(a2 k2)],
    1 ≤ k1 < N1/2,  1 ≤ k2 < N2/2.                                   (B.9)
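A numerical spot-check (my own, not from the paper) confirms that the tangent form (B.6) agrees with the direct form (B.1) for arbitrary values of X11, X21, X31, X41, r1 and r2, whenever tan(a2 k2) is finite:

```python
import math
import random

# Compare the direct form (B.1) against the tangent form (B.6), which pulls
# the common factor cos(a2*k2) out of each bracket, over random inputs.
random.seed(1)
max_err = 0.0
for _ in range(1000):
    X11, X21, X31, X41, r1, r2 = (random.uniform(-2, 2) for _ in range(6))
    t1, t2 = random.uniform(0.1, 1.4), random.uniform(0.1, 1.4)  # a1*k1, a2*k2

    # (B.1): each bracket mixes cos(a2*k2) and sin(a2*k2) terms directly
    direct = (math.cos(t1) * (X11 * math.cos(t2) + r2 * X21 * math.sin(t2))
              + r1 * math.sin(t1) * (X31 * math.cos(t2) - r2 * X41 * math.sin(t2)))
    # (B.6): cos(a2*k2) factored out, leaving a single tan(a2*k2) per bracket
    tangent = (math.cos(t1) * math.cos(t2) * (X11 + r2 * X21 * math.tan(t2))
               + r1 * math.sin(t1) * math.cos(t2) * (X31 - r2 * X41 * math.tan(t2)))
    max_err = max(max_err, abs(direct - tangent))
```

The two forms agree to floating-point precision; the saving comes from precomputing the products cos(a1 k1)cos(a2 k2) and sin(a1 k1)cos(a2 k2) once per frequency pair.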
Table 5 shows the decomposition cost in multiplications. When the term tan(a2 k2) in (B.6)-(B.9) becomes infinite, (B.1)-(B.4) are used instead.
Hence, the total number of multiplications for (2.8)-(2.11) is
TM = 3(N1 - 2p1)(N2 - 2p2) + 4p2(N1 - 2p1) + 4p1(N2 - 2p2) + 2p1p2.  (B.10)
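The closed forms (B.5) and (B.10) are straightforward to evaluate; the sketch below (function names are mine) computes both for the example p1 = p2 = 3 with m = n = 2, i.e. N1 = N2 = 12:

```python
# Evaluate the operation counts (B.5) and (B.10) for block size N1 x N2,
# where N1 = p1*2^m and N2 = p2*2^n.

def additions(N1, N2, p1, p2):
    # (B.5): total decomposition cost in additions
    return 2 * N1 * N2 - N1 * (p2 + 1) - N2 * (p1 + 1)

def multiplications(N1, N2, p1, p2):
    # (B.10): total multiplications for (2.8)-(2.11)
    return (3 * (N1 - 2 * p1) * (N2 - 2 * p2)
            + 4 * p2 * (N1 - 2 * p1) + 4 * p1 * (N2 - 2 * p2)
            + 2 * p1 * p2)

# Example: p1 = p2 = 3, m = n = 2, so N1 = N2 = 12
ta = additions(12, 12, 3, 3)        # 2*144 - 12*4 - 12*4 = 192
tm = multiplications(12, 12, 3, 3)  # 3*36 + 72 + 72 + 18 = 270
```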