*Corresponding author. Tel.: +65-799-4823; fax: +65-7912687. E-mail address: [email protected] (G. Bi).
Signal Processing 80 (2000) 1917–1935
Fast recursive algorithms for 2-D discrete cosine transform
Teng Chork Tan, Guoan Bi*, Han Ngee Tan
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore
Received 7 June 1999; received in revised form 9 November 1999
Abstract
A new algorithm for the computation of the two-dimensional (2-D) type-III discrete cosine transform is presented. The algorithm is particularly suited to block sizes of (p1·2^m) by (p2·2^n), where p1 and p2 are odd integers and m and n are non-negative integers. It is shown that the 2-D type-III DCT can be decomposed into cosine–cosine, cosine–sine, sine–cosine and sine–sine sequences, which can be further decomposed into similar sequences. The proposed algorithm provides flexibility in choosing the block size, a simple index mapping scheme and a fairly regular computation structure. The algorithm also requires a smaller number of arithmetic operations for p1 = p2 = 3. © 2000 Elsevier Science B.V. All rights reserved.
0165-1684/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0165-1684(00)00102-X
1. Introduction
The discrete cosine transform (DCT) has been widely used in various digital signal processing applications because it has high energy-packing capability and approaches the Karhunen–Loève transform (KLT) for highly correlated signals [2,8]. Many fast DCT algorithms have been reported in the literature. Duhamel and Guillemot [3] reported that computation using the direct polynomial transform requires the lowest number of multiplications. However, the polynomial transform potentially has a complex computational structure and is difficult to generalize to larger transform sizes and higher dimensions. Unlike the conventional row-column approach, which requires 2N 1-D DCTs to compute an N×N 2-D DCT, Cho and Lee [6,7] regrouped the input matrix so that only N 1-D DCTs are used by their algorithm. However, N indexing schemes are required for the regrouping of an N×N input matrix, and one potential disadvantage is that the implementation of the indexing scheme becomes more complex for larger transform sizes. The algorithm proposed by Haque [4] allows a more flexible combination of row and column sizes. A comprehensive survey of DCT algorithms can be found in [8], and comments on various fast 2-D DCT algorithms can be found in [5].
Most reported algorithms assume that the input matrix has equal dimensions that are powers of two. Although algorithms using the polynomial transform or prime-factor decomposition can support transform sizes other than powers of two, they are difficult to generalize for efficient computation of DCTs of various sizes. If the size of the input matrix does not match the transform sizes supported by the fast algorithm, measures such as zero padding have to be taken, which inevitably require more computation than necessary. The possibility of such a mismatch can be minimized if the fast algorithm naturally supports various transform sizes.
This paper presents a fast algorithm for the type-III 2-D DCT (or inverse DCT) that supports arbitrary even transform sizes in each dimension. The proposed algorithm possesses a fairly regular structure, and the input/output indexing schemes can be implemented easily. The organization of this paper is as follows. Section 2 shows that the 2-D IDCT can be decomposed into cosine–cosine, sine–cosine, cosine–sine and sine–sine sub-sequences. Sections 3–8 show that these sub-sequences can be recursively decomposed into similar sequences. Implementation issues of the proposed algorithm are discussed in Section 9.
2. Algorithm
The computation of the 2-D type-III DCT of an input matrix X(k1, k2) is defined by

x(n_1, n_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} X(k_1,k_2) \cos\left(\frac{(2n_1+1)k_1\pi}{2N_1}\right) \cos\left(\frac{(2n_2+1)k_2\pi}{2N_2}\right),
n_1 = 0, \ldots, N_1-1,  n_2 = 0, \ldots, N_2-1,   (2.1)
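As a point of reference, (2.1) can be evaluated by a brute-force double sum. The sketch below (Python; the function name and test input are our own, and no normalization factors are applied) is illustrative only:

```python
import math

def dct3_2d(X):
    """Brute-force 2-D type-III DCT of Eq. (2.1), without normalization."""
    N1, N2 = len(X), len(X[0])
    return [[sum(X[k1][k2]
                 * math.cos((2 * n1 + 1) * k1 * math.pi / (2 * N1))
                 * math.cos((2 * n2 + 1) * k2 * math.pi / (2 * N2))
                 for k1 in range(N1) for k2 in range(N2))
             for n2 in range(N2)]
            for n1 in range(N1)]

# The k1 = k2 = 0 basis function is identically 1, so an input that is
# 1 at (0, 0) and 0 elsewhere transforms to an all-ones output.
X = [[1.0 if (k1, k2) == (0, 0) else 0.0 for k2 in range(4)]
     for k1 in range(4)]
x = dct3_2d(X)
```

The direct sum costs O(N1²N2²) operations; the decomposition developed in the remainder of this section reduces this dramatically.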
which can be decomposed into cosine–cosine, sine–cosine, cosine–sine and sine–sine sequences of smaller block size. If the dimensions N1 and N2 are defined as

N_1 = p_1 \cdot 2^m,  N_2 = p_2 \cdot 2^n,   (2.2)
where p1 and p2 are odd integers and m and n are integers greater than zero, then the transformed matrix x(n1, n2) can be partitioned into

\begin{bmatrix} x_{11}(n_1,n_2) & x_{12}(n_1,n_2) \\ x_{21}(n_1,n_2) & x_{22}(n_1,n_2) \end{bmatrix}
=
\begin{bmatrix}
x\left(2n_1+\frac{p_1-1}{2},\, 2n_2+\frac{p_2-1}{2}\right) & x\left(2n_1+\frac{p_1-1}{2},\, 2n_2-\frac{p_2+1}{2}\right) \\
x\left(2n_1-\frac{p_1+1}{2},\, 2n_2+\frac{p_2-1}{2}\right) & x\left(2n_1-\frac{p_1+1}{2},\, 2n_2-\frac{p_2+1}{2}\right)
\end{bmatrix}.   (2.3)
It is possible that some indices in (2.3) are greater than or equal to N1 (or N2), or less than zero. Appendix A illustrates a mapping process between these invalid indices and the valid ones. The mapping process is defined by

x(-t-1) \Leftarrow x(t),  x(N+t) \Leftarrow x(N-t-1),  0 \le t \le (p-1)/2,   (2.4)

where the data on the left-hand side of the symbol \Leftarrow have invalid indices and are therefore replaced by the data on the right-hand side, and

x[-(p+1)/2] = 0,  x[N+(p-1)/2] = 0.   (2.5)
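The mapping (2.4)–(2.5) can be sketched as a small index function (Python; the function name is ours, and `None` stands for the two samples that (2.5) forces to zero):

```python
def remap(i, N, p):
    """Return the valid index that replaces index i under (2.4)-(2.5),
    or None for the two boundary indices that are set to zero."""
    if i == -(p + 1) // 2 or i == N + (p - 1) // 2:
        return None                 # x[-(p+1)/2] = x[N+(p-1)/2] = 0
    if i < 0:
        return -i - 1               # x(-t-1) <= x(t)
    if i >= N:
        return 2 * N - i - 1        # x(N+t) <= x(N-t-1)
    return i

# Example for p = 5, N = 10: x(-1) is replaced by x(0) and x(10) by x(9),
# while x(-3) and x(12) are zeroed.
examples = [remap(i, 10, 5) for i in (-3, -1, 10, 12)]
```

These values match the examples tabulated for p = 5, N = 10 and p = 11, N = 22 in Appendix A.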
By using the property

\cos(A \pm B) = \cos A \cos B \mp \sin A \sin B,

Eq. (2.1) is decomposed into
\begin{bmatrix} x_{11}(n_1,n_2) \\ x_{12}(n_1,n_2) \\ x_{21}(n_1,n_2) \\ x_{22}(n_1,n_2) \end{bmatrix}
=
\begin{bmatrix} 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix}
y^{\mathrm{I}}_{CC}(n_1,n_2), & 0 \le n_1 \le N_1/2, & 0 \le n_2 \le N_2/2 \\
y^{\mathrm{I}}_{SC}(n_1,n_2), & 0 < n_1 < N_1/2, & 0 \le n_2 \le N_2/2 \\
y^{\mathrm{I}}_{CS}(n_1,n_2), & 0 \le n_1 \le N_1/2, & 0 < n_2 < N_2/2 \\
y^{\mathrm{I}}_{SS}(n_1,n_2), & 0 < n_1 < N_1/2, & 0 < n_2 < N_2/2
\end{bmatrix},   (2.6)
where

y^{\mathrm{I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} X(k_1,k_2) \cos(\alpha_1 k_1) \cos(\alpha_2 k_2) \cos\frac{\pi n_1 k_1}{N_1/2} \cos\frac{\pi n_2 k_2}{N_2/2},

y^{\mathrm{I}}_{SC}(n_1,n_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} X(k_1,k_2) \sin(\alpha_1 k_1) \cos(\alpha_2 k_2) \sin\frac{\pi n_1 k_1}{N_1/2} \cos\frac{\pi n_2 k_2}{N_2/2},

y^{\mathrm{I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} X(k_1,k_2) \cos(\alpha_1 k_1) \sin(\alpha_2 k_2) \cos\frac{\pi n_1 k_1}{N_1/2} \sin\frac{\pi n_2 k_2}{N_2/2},

y^{\mathrm{I}}_{SS}(n_1,n_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} X(k_1,k_2) \sin(\alpha_1 k_1) \sin(\alpha_2 k_2) \sin\frac{\pi n_1 k_1}{N_1/2} \sin\frac{\pi n_2 k_2}{N_2/2},
and

\alpha_1 = \frac{p_1\pi}{2N_1},  \alpha_2 = \frac{p_2\pi}{2N_2}.   (2.7)
By using the property

\cos\frac{\pi n(N-k)}{N/2} = \cos\frac{\pi n k}{N/2},  \sin\frac{\pi n(N-k)}{N/2} = -\sin\frac{\pi n k}{N/2},

we can obtain Eqs. (2.8)–(2.11):
y^{\mathrm{I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2} \Big\{ \cos(\alpha_1 k_1)\big[X(k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} r_2 X(k_1,N_2-k_2)\sin(\alpha_2 k_2)\big] + (-1)^{(p_1-1)/2} r_1 \sin(\alpha_1 k_1)\big[X(N_1-k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} r_2 X(N_1-k_1,N_2-k_2)\sin(\alpha_2 k_2)\big] \Big\} \cos\frac{\pi n_1 k_1}{N_1/2} \cos\frac{\pi n_2 k_2}{N_2/2},
n_1 = 0, \ldots, N_1/2,  n_2 = 0, \ldots, N_2/2,   (2.8)
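A numerical check of the folding in (2.8), as a Python sketch under our own naming: the half-range sum with the r1, r2 flags and the (−1)^((p−1)/2) sign factors should reproduce the full-range sum that defines y^I_CC.

```python
import math, random

def y_cc_full(X, N1, N2, p1, p2, n1, n2):
    # full-range definition of y_CC^I (first row of the block after (2.6))
    a1, a2 = p1 * math.pi / (2 * N1), p2 * math.pi / (2 * N2)
    return sum(X[k1][k2] * math.cos(a1 * k1) * math.cos(a2 * k2)
               * math.cos(math.pi * n1 * k1 / (N1 / 2))
               * math.cos(math.pi * n2 * k2 / (N2 / 2))
               for k1 in range(N1) for k2 in range(N2))

def y_cc_folded(X, N1, N2, p1, p2, n1, n2):
    # half-range form of Eq. (2.8)
    a1, a2 = p1 * math.pi / (2 * N1), p2 * math.pi / (2 * N2)
    s1, s2 = (-1) ** ((p1 - 1) // 2), (-1) ** ((p2 - 1) // 2)
    total = 0.0
    for k1 in range(N1 // 2 + 1):
        r1 = 0 if k1 in (0, N1 // 2) else 1
        for k2 in range(N2 // 2 + 1):
            r2 = 0 if k2 in (0, N2 // 2) else 1
            # modulo only keeps indices in range; the corresponding
            # terms are already zeroed by r1/r2 at the boundaries
            kb1, kb2 = (N1 - k1) % N1, (N2 - k2) % N2
            inner = (math.cos(a1 * k1)
                     * (X[k1][k2] * math.cos(a2 * k2)
                        + s2 * r2 * X[k1][kb2] * math.sin(a2 * k2))
                     + s1 * r1 * math.sin(a1 * k1)
                     * (X[kb1][k2] * math.cos(a2 * k2)
                        + s2 * r2 * X[kb1][kb2] * math.sin(a2 * k2)))
            total += (inner * math.cos(math.pi * n1 * k1 / (N1 / 2))
                            * math.cos(math.pi * n2 * k2 / (N2 / 2)))
    return total

random.seed(0)
N1 = N2 = 6
p1 = p2 = 3
X = [[random.random() for _ in range(N2)] for _ in range(N1)]
err = max(abs(y_cc_full(X, N1, N2, p1, p2, n1, n2)
              - y_cc_folded(X, N1, N2, p1, p2, n1, n2))
          for n1 in range(N1 // 2 + 1) for n2 in range(N2 // 2 + 1))
```

The sign factors arise because cos(α1(N1−k1)) = (−1)^((p1−1)/2) sin(α1 k1) for odd p1, which is the identity the fold relies on.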
y^{\mathrm{I}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} \Big\{ \sin(\alpha_1 k_1)\big[X(k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} r_2 X(k_1,N_2-k_2)\sin(\alpha_2 k_2)\big] - (-1)^{(p_1-1)/2} \cos(\alpha_1 k_1)\big[X(N_1-k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} r_2 X(N_1-k_1,N_2-k_2)\sin(\alpha_2 k_2)\big] \Big\} \sin\frac{\pi n_1 k_1}{N_1/2} \cos\frac{\pi n_2 k_2}{N_2/2},
n_1 = 1, \ldots, N_1/2,  n_2 = 0, \ldots, N_2/2,   (2.9)
y^{\mathrm{I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=1}^{N_2/2-1} \Big\{ \cos(\alpha_1 k_1)\big[X(k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} X(k_1,N_2-k_2)\cos(\alpha_2 k_2)\big] + (-1)^{(p_1-1)/2} r_1 \sin(\alpha_1 k_1)\big[X(N_1-k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} X(N_1-k_1,N_2-k_2)\cos(\alpha_2 k_2)\big] \Big\} \cos\frac{\pi n_1 k_1}{N_1/2} \sin\frac{\pi n_2 k_2}{N_2/2},
n_1 = 0, \ldots, N_1/2,  n_2 = 1, \ldots, N_2/2-1,   (2.10)
y^{\mathrm{I}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=1}^{N_2/2-1} \Big\{ \sin(\alpha_1 k_1)\big[X(k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} X(k_1,N_2-k_2)\cos(\alpha_2 k_2)\big] - (-1)^{(p_1-1)/2} \cos(\alpha_1 k_1)\big[X(N_1-k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} X(N_1-k_1,N_2-k_2)\cos(\alpha_2 k_2)\big] \Big\} \sin\frac{\pi n_1 k_1}{N_1/2} \sin\frac{\pi n_2 k_2}{N_2/2},
n_1 = 1, \ldots, N_1/2-1,  n_2 = 1, \ldots, N_2/2-1,   (2.11)
where

r_1 = \begin{cases} 0 & \text{if } k_1 = 0 \text{ or } N_1/2, \\ 1 & \text{otherwise,} \end{cases}
\qquad
r_2 = \begin{cases} 0 & \text{if } k_2 = 0 \text{ or } N_2/2, \\ 1 & \text{otherwise.} \end{cases}
It is noted that the ranges of the indices k1 and k2 in Eqs. (2.8)–(2.11) are defined according to the type of trigonometric identity involved. If a cosine function is involved, the related index runs from 0 to Ni/2, where i = 1 or 2; if a sine function is involved, the index runs from 1 to Ni/2 − 1. Such an arrangement includes all the valid indices (see Table 2 in Appendix A). It differs from the index range used in other decomposition processes, in which the index runs from 0 to Ni/2 − 1.
At first glance, it seems that r1 and r2 are redundant. In fact, they are introduced for compact mathematical expressions, especially when ki = Ni/2, and they do not require any extra arithmetic operations. Once y^I_CC, y^I_CS, y^I_SC and y^I_SS are computed according to (2.8)–(2.11), the final transformed outputs can be combined according to (2.6). Input data indexing in (2.8)–(2.11) is straightforward, although a mapping process dealing with a few invalid output indices is needed, as defined in (2.4) and (2.5).
A detailed derivation of the decomposition cost can be found in Appendix B; we simply state here the decomposition costs in terms of the number of arithmetic operations:
(a) T_A = 2N_1N_2 - N_1(p_2+1) - N_2(p_1+1) additions for (2.8)–(2.11);
(b) T_B = 2(N_1N_2 - N_1 - N_2) additions for (2.6);
(c) T_M = 3(N_1-2p_1)(N_2-2p_2) + 4p_2(N_1-2p_1) + 4p_1(N_2-2p_2) + 2p_1p_2 multiplications for (2.8)–(2.11).
The total number of multiplications is

M^{\mathrm{III}}_{\mathrm{DCT}}(N_1,N_2) = M^{\mathrm{I}}_{CC}(N_1/2, N_2/2) + M^{\mathrm{I}}_{SC}(N_1/2, N_2/2) + M^{\mathrm{I}}_{CS}(N_1/2, N_2/2) + M^{\mathrm{I}}_{SS}(N_1/2, N_2/2) + T_M   (2.12)
and the total number of additions is

A^{\mathrm{III}}_{\mathrm{DCT}}(N_1,N_2) = A^{\mathrm{I}}_{CC}(N_1/2, N_2/2) + A^{\mathrm{I}}_{SC}(N_1/2, N_2/2) + A^{\mathrm{I}}_{CS}(N_1/2, N_2/2) + A^{\mathrm{I}}_{SS}(N_1/2, N_2/2) + T_A + T_B.   (2.13)
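The per-stage costs (a)–(c) above can be sketched directly (Python; illustrative only — the full totals of (2.12)–(2.13) additionally require the mutually recursive cost formulas of Sections 3–8, which are not reproduced here):

```python
def stage_costs(N1, N2, p1, p2):
    """Per-stage decomposition costs (a)-(c) of Section 2."""
    T_A = 2 * N1 * N2 - N1 * (p2 + 1) - N2 * (p1 + 1)   # adds, (2.8)-(2.11)
    T_B = 2 * (N1 * N2 - N1 - N2)                       # adds, (2.6)
    T_M = (3 * (N1 - 2 * p1) * (N2 - 2 * p2)
           + 4 * p2 * (N1 - 2 * p1) + 4 * p1 * (N2 - 2 * p2)
           + 2 * p1 * p2)                               # mults, (2.8)-(2.11)
    return T_A, T_B, T_M

# Smallest non-trivial case for p1 = p2 = 3: a 6 x 6 block.
costs = stage_costs(6, 6, 3, 3)
```

For the 6 × 6 block the three contributions are 24 additions, 48 additions and 18 multiplications, respectively.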
In the following sections, we consider the further decomposition of y^I_CC, y^I_CS, y^I_SC and y^I_SS into similar sequences of smaller computational blocks. For simplicity of presentation, a detailed decomposition is given for the type-I cosine–cosine sequence y^I_CC; the computation of the other sequences is described by mathematical equations only.
3. Computation of the type-I cosine–cosine sequence
For a general description of the decomposition procedure, the type-I cosine–cosine sequence u^I_CC(n1, n2) is defined as

u^{\mathrm{I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2} W(k_1,k_2) \cos\left(\frac{n_1 k_1 \pi}{N_1}\right) \cos\left(\frac{n_2 k_2 \pi}{N_2}\right),
n_1 = 0, \ldots, N_1,  n_2 = 0, \ldots, N_2,   (3.1)
where W(k1, k2) is the input matrix. With decimation-in-time (DIT) decomposition, (3.1) can be expressed by

b^{\mathrm{I}}_{CC}(n_1,n_2) = u^{\mathrm{I}}_{CC}(2n_1, 2n_2) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2} W(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2,  n_2 = 0, \ldots, N_2/2,   (3.2)
b^{\mathrm{I-III}}_{CC}(n_1,n_2) = u^{\mathrm{I}}_{CC}(2n_1, 2n_2+1) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2} W(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 0, \ldots, N_1/2,  n_2 = 0, \ldots, N_2/2-1,   (3.3)
b^{\mathrm{III-I}}_{CC}(n_1,n_2) = u^{\mathrm{I}}_{CC}(2n_1+1, 2n_2) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2} W(k_1,k_2) \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2,   (3.4)
b^{\mathrm{III}}_{CC}(n_1,n_2) = u^{\mathrm{I}}_{CC}(2n_1+1, 2n_2+1) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2} W(k_1,k_2) \cos\left(\frac{\pi (2n_1+1) k_1}{2(N_1/2)}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{2(N_2/2)}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2-1.   (3.5)
Based on the properties of trigonometric identities, (3.2)–(3.5) can be recursively divided into similar sub-sequences of reduced size. For example,
b^{\mathrm{I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2} W(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right)
+ \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2-1} W(k_1, N_2-k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi n_2 (N_2-k_2)}{N_2/2}\right)
+ \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} W(N_1-k_1, k_2) \cos\left(\frac{\pi n_1 (N_1-k_1)}{N_1/2}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right)
+ \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2-1} W(N_1-k_1, N_2-k_2) \cos\left(\frac{\pi n_1 (N_1-k_1)}{N_1/2}\right) \cos\left(\frac{\pi n_2 (N_2-k_2)}{N_2/2}\right),   (3.6a)
which can be rewritten as

b^{\mathrm{I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2} \big[ W(k_1,k_2) + l_2 W(k_1, N_2-k_2) + l_1 W(N_1-k_1, k_2) + l_1 l_2 W(N_1-k_1, N_2-k_2) \big] \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),   (3.6b)

where, for i = 1 and 2, l_i = 0 for k_i = N_i/2 and l_i = 1 otherwise. Similarly,
b^{\mathrm{I-III}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2} W(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{N_2}\right)
+ \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2-1} W(k_1, N_2-k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1)(N_2-k_2)}{N_2}\right)
+ \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} W(N_1-k_1, k_2) \cos\left(\frac{\pi n_1 (N_1-k_1)}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{N_2}\right)
+ \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2-1} W(N_1-k_1, N_2-k_2) \cos\left(\frac{\pi n_1 (N_1-k_1)}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1)(N_2-k_2)}{N_2}\right),   (3.7a)
which is converted into

b^{\mathrm{I-III}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2-1} \big[ W(k_1,k_2) - W(k_1, N_2-k_2) + l_1 W(N_1-k_1, k_2) - l_1 W(N_1-k_1, N_2-k_2) \big] \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{N_2}\right).   (3.7b)
With similar arrangements for b^{III-I}_CC and b^III_CC, we have

b^{\mathrm{III-I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} \big[ W(k_1,k_2) + l_2 W(k_1, N_2-k_2) - W(N_1-k_1, k_2) - l_2 W(N_1-k_1, N_2-k_2) \big] \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right)   (3.8)
and

b^{\mathrm{III}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2-1} \big[ W(k_1,k_2) - W(k_1, N_2-k_2) - W(N_1-k_1, k_2) + W(N_1-k_1, N_2-k_2) \big] \cos\left(\frac{\pi (2n_1+1) k_1}{2(N_1/2)}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{2(N_2/2)}\right).   (3.9)
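The key property of (3.9) — that the bracketed combination feeds an (N1/2) × (N2/2) type-III DCT — can be checked numerically (Python sketch under our own naming; the transform is unnormalized, as in (2.1)):

```python
import math, random

def u_cc(W, N1, N2, n1, n2):
    # type-I cosine-cosine sequence, Eq. (3.1)
    return sum(W[k1][k2] * math.cos(math.pi * n1 * k1 / N1)
                         * math.cos(math.pi * n2 * k2 / N2)
               for k1 in range(N1 + 1) for k2 in range(N2 + 1))

def dct3_2d(B):
    # unnormalized 2-D type-III DCT, as in Eq. (2.1)
    M1, M2 = len(B), len(B[0])
    return [[sum(B[k1][k2]
                 * math.cos((2 * n1 + 1) * k1 * math.pi / (2 * M1))
                 * math.cos((2 * n2 + 1) * k2 * math.pi / (2 * M2))
                 for k1 in range(M1) for k2 in range(M2))
             for n2 in range(M2)] for n1 in range(M1)]

random.seed(1)
N1 = N2 = 4
W = [[random.random() for _ in range(N2 + 1)] for _ in range(N1 + 1)]
# Bracketed combination of Eq. (3.9): an (N1/2) x (N2/2) matrix.
B = [[W[k1][k2] - W[k1][N2 - k2] - W[N1 - k1][k2] + W[N1 - k1][N2 - k2]
      for k2 in range(N2 // 2)] for k1 in range(N1 // 2)]
# Its type-III DCT matches the odd-odd outputs u_CC(2n1+1, 2n2+1).
b3 = dct3_2d(B)
err = max(abs(b3[n1][n2] - u_cc(W, N1, N2, 2 * n1 + 1, 2 * n2 + 1))
          for n1 in range(N1 // 2) for n2 in range(N2 // 2))
```

This is the recursion's termination branch: each stage hands a quarter of its work to a half-size type-III DCT.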
The decomposition cost for (3.6)–(3.9) is 2N1N2 + N1 + N2 additions, and the proposed decomposition requires no twiddle factors. In particular, b^I_CC is the same as u^I_CC except that the former has a reduced size, and b^III_CC is the type-III (N1/2) by (N2/2) 2-D DCT. Computation of b^{III-I}_CC can be done in terms of b^{I-III}_CC if we swap N1, n1, k1 with N2, n2, k2, respectively. The decomposition of b^{I-III}_CC will be discussed in Section 7. The total computation costs for the type-I cosine–cosine sequence are
M^{\mathrm{I}}_{CC}(N_1,N_2) = M^{\mathrm{I}}_{CC}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{CC}(N_1/2, N_2/2) + M^{\mathrm{III-I}}_{CC}(N_1/2, N_2/2) + M^{\mathrm{III}}_{\mathrm{DCT}}(N_1/2, N_2/2),   (3.10)

A^{\mathrm{I}}_{CC}(N_1,N_2) = A^{\mathrm{I}}_{CC}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{CC}(N_1/2, N_2/2) + A^{\mathrm{III-I}}_{CC}(N_1/2, N_2/2) + A^{\mathrm{III}}_{\mathrm{DCT}}(N_1/2, N_2/2) + 2N_1N_2 + N_1 + N_2.   (3.11)
4. Computation of the type-I sine–cosine sequence
The type-I sine–cosine sequence of W(k1, k2) is defined as

u^{\mathrm{I}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1-1} \sum_{k_2=0}^{N_2} W(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2}\right),
n_1 = 1, \ldots, N_1-1,  n_2 = 0, \ldots, N_2,   (4.1)
which can be decomposed into

b^{\mathrm{I}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} G_1(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 1, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2,   (4.2)

b^{\mathrm{I-III}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=0}^{N_2/2-1} G_2(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 1, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2-1,   (4.3)

b^{\mathrm{III-I}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} G_3(k_1,k_2) \sin\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2,   (4.4)

b^{\mathrm{III}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2} \sum_{k_2=0}^{N_2/2-1} G_4(k_1,k_2) \sin\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2-1,   (4.5)
where

\begin{bmatrix} G_1(k_1,k_2) \\ G_2(k_1,k_2) \\ G_3(k_1,k_2) \\ G_4(k_1,k_2) \end{bmatrix}
=
\begin{bmatrix} 1 & l_2 & 1 & l_2 \\ 1 & -1 & 1 & -1 \\ 1 & l_2 & -l_1 & -l_1 l_2 \\ 1 & -1 & -l_1 & l_1 \end{bmatrix}
\begin{bmatrix} W(k_1,k_2) \\ W(k_1, N_2-k_2) \\ W(N_1-k_1, k_2) \\ W(N_1-k_1, N_2-k_2) \end{bmatrix},   (4.6)
where l_i, i = 1, 2, is defined in the last section. The number of additions for the decomposition is 2N1N2 − 3N2 + N1 − 2. Note that b^I_SC is the same as u^I_SC with a reduced size, and b^III_SC can be converted into b^III_CC when n1 is replaced by N1/2 − n1. Similarly, computation of b^{III-I}_SC can be done in terms of b^{III-I}_CC, which can in turn be converted into b^{I-III}_CC by swapping the values of n1, k1, N1 and n2, k2, N2, respectively. The decomposition of b^{I-III}_SC will be discussed in Section 8. The total computation costs for the type-I sine–cosine sequence are
M^{\mathrm{I}}_{SC}(N_1,N_2) = M^{\mathrm{I}}_{SC}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{SC}(N_1/2, N_2/2) + M^{\mathrm{III-I}}_{SC}(N_1/2, N_2/2) + M^{\mathrm{III}}_{\mathrm{DCT}}(N_1/2, N_2/2),   (4.7)

A^{\mathrm{I}}_{SC}(N_1,N_2) = A^{\mathrm{I}}_{SC}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{SC}(N_1/2, N_2/2) + A^{\mathrm{III-I}}_{SC}(N_1/2, N_2/2) + A^{\mathrm{III}}_{\mathrm{DCT}}(N_1/2, N_2/2) + 2N_1N_2 + N_1 - 3N_2 - 2.   (4.8)
5. Computation of the type-I cosine–sine sequence
The type-I cosine–sine sequence of W(k1, k2) is defined as

u^{\mathrm{I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1} \sum_{k_2=1}^{N_2-1} W(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2}\right),
n_1 = 0, \ldots, N_1,  n_2 = 1, \ldots, N_2-1,   (5.1)
which is decomposed into

b^{\mathrm{I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=1}^{N_2/2-1} G_1(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2,  n_2 = 1, \ldots, N_2/2-1,   (5.2)

b^{\mathrm{I-III}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=1}^{N_2/2} G_2(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 0, \ldots, N_1/2,  n_2 = 0, \ldots, N_2/2-1,   (5.3)

b^{\mathrm{III-I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=1}^{N_2/2-1} G_3(k_1,k_2) \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 1, \ldots, N_2/2-1,   (5.4)

b^{\mathrm{III}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=1}^{N_2/2} G_4(k_1,k_2) \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2-1,   (5.5)
where

\begin{bmatrix} G_1(k_1,k_2) \\ G_2(k_1,k_2) \\ G_3(k_1,k_2) \\ G_4(k_1,k_2) \end{bmatrix}
=
\begin{bmatrix} 1 & -1 & l_1 & -l_1 \\ 1 & 1 & l_1 & l_1 \\ 1 & -1 & -1 & 1 \\ 1 & l_2 & -1 & -l_2 \end{bmatrix}
\begin{bmatrix} W(k_1,k_2) \\ W(k_1, N_2-k_2) \\ W(N_1-k_1, k_2) \\ W(N_1-k_1, N_2-k_2) \end{bmatrix}.   (5.6)
The required decomposition cost is 2N1N2 − 3N1 + N2 − 2 additions. The sub-sequence b^I_CS is the same as u^I_CS but with a reduced size, and b^III_CS can be converted into b^III_CC if n2 is replaced by N2/2 − n2. Similarly, computation of b^{I-III}_CS can be achieved from b^{I-III}_CC, and b^{III-I}_CS can be converted into b^{I-III}_SC by swapping n1, k1, N1 with n2, k2, N2, respectively. The total computation cost for the type-I cosine–sine sequence is the same as that for the type-I sine–cosine sequence.
6. Computation of the type-I sine–sine sequence
The type-I sine–sine sequence of W(k1, k2) is defined as

u^{\mathrm{I}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1-1} \sum_{k_2=1}^{N_2-1} W(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2}\right),
n_1 = 1, \ldots, N_1-1,  n_2 = 1, \ldots, N_2-1,   (6.1)
which can be decomposed into the following four sub-sequences.
b^{\mathrm{I}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=1}^{N_2/2-1} G_1(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 1, \ldots, N_1/2-1,  n_2 = 1, \ldots, N_2/2-1,   (6.2)
b^{\mathrm{I-III}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=1}^{N_2/2} G_2(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 1, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2-1,   (6.3)

b^{\mathrm{III-I}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2} \sum_{k_2=1}^{N_2/2-1} G_3(k_1,k_2) \sin\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 1, \ldots, N_2/2-1,   (6.4)

b^{\mathrm{III}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2} \sum_{k_2=1}^{N_2/2} G_4(k_1,k_2) \sin\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi (2n_2+1) k_2}{N_2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2-1,   (6.5)
where

\begin{bmatrix} G_1(k_1,k_2) \\ G_2(k_1,k_2) \\ G_3(k_1,k_2) \\ G_4(k_1,k_2) \end{bmatrix}
=
\begin{bmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} W(k_1,k_2) \\ W(k_1, N_2-k_2) \\ W(N_1-k_1, k_2) \\ W(N_1-k_1, N_2-k_2) \end{bmatrix}.   (6.6)
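The sign pattern in the first row of (6.6) can be checked numerically (Python sketch, names ours): with G1 = W(k1,k2) − W(k1,N2−k2) − W(N1−k1,k2) + W(N1−k1,N2−k2), the half-range sum (6.2) reproduces u^I_SS of (6.1) at the even output indices.

```python
import math, random

def u_ss(W, N1, N2, n1, n2):
    # type-I sine-sine sequence, Eq. (6.1)
    return sum(W[k1][k2] * math.sin(math.pi * n1 * k1 / N1)
                         * math.sin(math.pi * n2 * k2 / N2)
               for k1 in range(1, N1) for k2 in range(1, N2))

def b_ss(W, N1, N2, n1, n2):
    # folded form of Eq. (6.2) with G1 from Eq. (6.6)
    total = 0.0
    for k1 in range(1, N1 // 2):
        for k2 in range(1, N2 // 2):
            g1 = (W[k1][k2] - W[k1][N2 - k2]
                  - W[N1 - k1][k2] + W[N1 - k1][N2 - k2])
            total += (g1 * math.sin(math.pi * n1 * k1 / (N1 / 2))
                         * math.sin(math.pi * n2 * k2 / (N2 / 2)))
    return total

random.seed(3)
N1 = N2 = 6
W = [[random.random() for _ in range(N2)] for _ in range(N1)]
err = max(abs(b_ss(W, N1, N2, n1, n2) - u_ss(W, N1, N2, 2 * n1, 2 * n2))
          for n1 in range(1, N1 // 2) for n2 in range(1, N2 // 2))
```

No l flags are needed here because the sine factors vanish at ki = Ni/2, so the self-paired terms drop out on their own.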
The sub-sequence b^I_SS has the same definition as u^I_SS but with a reduced size, and b^III_SS can be converted into b^III_CC when n1 and n2 are replaced by N1/2 − n1 and N2/2 − n2, respectively. Similarly, b^{III-I}_SS can be converted into b^{I-III}_SS by swapping n1, k1, N1 with n2, k2, N2, respectively, and b^{I-III}_SS can in turn be converted into b^{I-III}_SC by replacing n2 by N2/2 − n2. The decomposition requires 2N1N2 − 3N2 − 3N1 + 4 additions. The total computational costs are
M^{\mathrm{I}}_{SS}(N_1,N_2) = M^{\mathrm{I}}_{SS}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{SS}(N_1/2, N_2/2) + M^{\mathrm{III-I}}_{SS}(N_1/2, N_2/2) + M^{\mathrm{III}}_{SS}(N_1/2, N_2/2),   (6.7)

A^{\mathrm{I}}_{SS}(N_1,N_2) = A^{\mathrm{I}}_{SS}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{SS}(N_1/2, N_2/2) + A^{\mathrm{III-I}}_{SS}(N_1/2, N_2/2) + A^{\mathrm{III}}_{SS}(N_1/2, N_2/2) + 2N_1N_2 - 3N_1 - 3N_2 + 4.   (6.8)
7. Computation of the type-I–III cosine–cosine sequence
The type-I–III cosine–cosine sequence of W(k1, k2) is defined as

u^{\mathrm{I-III}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2-1} W(k_1,k_2) \cos\left(\frac{\pi n_1 k_1}{N_1}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{2N_2}\right),
n_1 = 0, \ldots, N_1,  n_2 = 0, \ldots, N_2-1,   (7.1)
which can be expressed by

\begin{bmatrix} u^{\mathrm{I-III}}_{CC}\left(2n_1,\, 2n_2-\frac{p_2+1}{2}\right) \\ u^{\mathrm{I-III}}_{CC}\left(2n_1,\, 2n_2+\frac{p_2-1}{2}\right) \end{bmatrix}
=
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} b^{\mathrm{I}}_{CC}(n_1,n_2), & n_1 = 0, \ldots, N_1/2, & n_2 = 0, \ldots, N_2/2 \\ b^{\mathrm{I}}_{CS}(n_1,n_2), & n_1 = 0, \ldots, N_1/2, & n_2 = 1, \ldots, N_2/2-1 \end{bmatrix},   (7.2a)

where

\begin{bmatrix} b^{\mathrm{I}}_{CC}(n_1,n_2) \\ b^{\mathrm{I}}_{CS}(n_1,n_2) \end{bmatrix}
=
\begin{bmatrix}
\sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2-1} W(k_1,k_2)\cos(\alpha_2 k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right) \\
\sum_{k_1=0}^{N_1} \sum_{k_2=1}^{N_2-1} W(k_1,k_2)\sin(\alpha_2 k_2) \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right)
\end{bmatrix},   (7.2b)
\begin{bmatrix} u^{\mathrm{I-III}}_{CC}\left(2n_1+1,\, 2n_2-\frac{p_2+1}{2}\right) \\ u^{\mathrm{I-III}}_{CC}\left(2n_1+1,\, 2n_2+\frac{p_2-1}{2}\right) \end{bmatrix}
=
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} b^{\mathrm{III-I}}_{CC}(n_1,n_2), & n_1 = 0, \ldots, N_1/2-1, & n_2 = 0, \ldots, N_2/2 \\ b^{\mathrm{III-I}}_{CS}(n_1,n_2), & n_1 = 0, \ldots, N_1/2-1, & n_2 = 1, \ldots, N_2/2-1 \end{bmatrix},   (7.3a)

where \alpha_2 = p_2\pi/(2N_2) and
\begin{bmatrix} b^{\mathrm{III-I}}_{CC}(n_1,n_2) \\ b^{\mathrm{III-I}}_{CS}(n_1,n_2) \end{bmatrix}
=
\begin{bmatrix}
\sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2-1} W(k_1,k_2)\cos(\alpha_2 k_2) \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right) \\
\sum_{k_1=0}^{N_1} \sum_{k_2=1}^{N_2-1} W(k_1,k_2)\sin(\alpha_2 k_2) \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right)
\end{bmatrix}.   (7.3b)
Both (7.2b) and (7.3b) contain sub-sequences that have been considered in the previous sections. The index associated with n2 in (7.2a) and (7.3a) requires a mapping process for the invalid indices, which can be performed in the same way as that given in Appendix A. The sub-sequences b^I_CC, b^I_CS, b^{III-I}_CC and b^{III-I}_CS can be further decomposed into
b^{\mathrm{I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=0}^{N_2/2} \big[ W(k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} l_2 W(k_1,N_2-k_2)\sin(\alpha_2 k_2) + l_1 W(N_1-k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} l_1 l_2 W(N_1-k_1,N_2-k_2)\sin(\alpha_2 k_2) \big] \cos\frac{\pi n_1 k_1}{N_1/2} \cos\frac{\pi n_2 k_2}{N_2/2},   (7.5)
b^{\mathrm{I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2} \sum_{k_2=1}^{N_2/2-1} \big[ W(k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} W(k_1,N_2-k_2)\cos(\alpha_2 k_2) + l_1 W(N_1-k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} l_1 W(N_1-k_1,N_2-k_2)\cos(\alpha_2 k_2) \big] \cos\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),   (7.6)
b^{\mathrm{III-I}}_{CC}(n_1,n_2) = \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} \big[ W(k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} l_2 W(k_1,N_2-k_2)\sin(\alpha_2 k_2) - W(N_1-k_1,k_2)\cos(\alpha_2 k_2) - (-1)^{(p_2-1)/2} l_2 W(N_1-k_1,N_2-k_2)\sin(\alpha_2 k_2) \big] \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),   (7.7)
b^{\mathrm{III-I}}_{CS}(n_1,n_2) = \sum_{k_1=0}^{N_1/2-1} \sum_{k_2=1}^{N_2/2-1} \big[ W(k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} W(k_1,N_2-k_2)\cos(\alpha_2 k_2) - W(N_1-k_1,k_2)\sin(\alpha_2 k_2) + (-1)^{(p_2-1)/2} W(N_1-k_1,N_2-k_2)\cos(\alpha_2 k_2) \big] \cos\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right).   (7.8)
These sub-sequences can be further decomposed, as shown in the previous sections. The computation in (7.2a) and (7.3a) needs N1N2 + N2 − 2N1 − 2 additions. The cost for combining the terms inside the brackets of (7.5)–(7.8) is 2N1N2 + N2 − (N1+1)(p2+1) additions and 2N1N2 + 2N2 − 3p2(N1+1) multiplications. Therefore, the total computation cost for the type-I–III cosine–cosine sequence is
M^{\mathrm{I-III}}_{CC}(N_1,N_2) = M^{\mathrm{I}}_{CC}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{CC}(N_1/2, N_2/2) + M^{\mathrm{I}}_{CS}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{CS}(N_1/2, N_2/2) + 2N_1N_2 + 2N_2 - 3p_2(N_1+1),   (7.10)

A^{\mathrm{I-III}}_{CC}(N_1,N_2) = A^{\mathrm{I}}_{CC}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{CC}(N_1/2, N_2/2) + A^{\mathrm{I}}_{CS}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{CS}(N_1/2, N_2/2) + 3N_1N_2 + 2N_2 - (N_1+1)(p_2+3).   (7.11)
8. Computation of the type-I–III sine–cosine sequence
The type-I–III sine–cosine sequence of W(k1, k2) is defined as

u^{\mathrm{I-III}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1-1} \sum_{k_2=0}^{N_2-1} W(k_1,k_2) \sin\left(\frac{\pi n_1 k_1}{N_1}\right) \cos\left(\frac{\pi (2n_2+1) k_2}{2N_2}\right),
n_1 = 1, \ldots, N_1-1,  n_2 = 0, \ldots, N_2-1.   (8.1)
We have
\begin{bmatrix} u^{\mathrm{I-III}}_{SC}\left(2n_1,\, 2n_2-\frac{p_2+1}{2}\right) \\ u^{\mathrm{I-III}}_{SC}\left(2n_1,\, 2n_2+\frac{p_2-1}{2}\right) \end{bmatrix}
=
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} b^{\mathrm{I}}_{SC}(n_1,n_2), & n_1 = 1, \ldots, N_1/2-1, & n_2 = 0, \ldots, N_2/2 \\ b^{\mathrm{I}}_{SS}(n_1,n_2), & n_1 = 1, \ldots, N_1/2-1, & n_2 = 1, \ldots, N_2/2-1 \end{bmatrix},   (8.2)
\begin{bmatrix} u^{\mathrm{I-III}}_{SC}\left(2n_1+1,\, 2n_2-\frac{p_2+1}{2}\right) \\ u^{\mathrm{I-III}}_{SC}\left(2n_1+1,\, 2n_2+\frac{p_2-1}{2}\right) \end{bmatrix}
=
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} b^{\mathrm{III-I}}_{SC}(n_1,n_2), & n_1 = 0, \ldots, N_1/2-1, & n_2 = 0, \ldots, N_2/2 \\ b^{\mathrm{III-I}}_{SS}(n_1,n_2), & n_1 = 0, \ldots, N_1/2-1, & n_2 = 1, \ldots, N_2/2-1 \end{bmatrix},   (8.3)
where for (8.2) and (8.3)
b^{\mathrm{I}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=0}^{N_2/2} \big[ W(k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} r_2 W(k_1,N_2-k_2)\sin(\alpha_2 k_2) - W(N_1-k_1,k_2)\cos(\alpha_2 k_2) - (-1)^{(p_2-1)/2} r_2 W(N_1-k_1,N_2-k_2)\sin(\alpha_2 k_2) \big] \sin\frac{\pi n_1 k_1}{N_1/2} \cos\frac{\pi n_2 k_2}{N_2/2},
n_1 = 1, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2,   (8.4)
b^{\mathrm{I}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2-1} \sum_{k_2=1}^{N_2/2-1} \big[ W(k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} W(k_1,N_2-k_2)\cos(\alpha_2 k_2) - W(N_1-k_1,k_2)\sin(\alpha_2 k_2) + (-1)^{(p_2-1)/2} W(N_1-k_1,N_2-k_2)\cos(\alpha_2 k_2) \big] \sin\left(\frac{\pi n_1 k_1}{N_1/2}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 1, \ldots, N_1/2-1,  n_2 = 1, \ldots, N_2/2-1,   (8.5)
b^{\mathrm{III-I}}_{SC}(n_1,n_2) = \sum_{k_1=1}^{N_1/2} \sum_{k_2=0}^{N_2/2} \big[ W(k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} r_2 W(k_1,N_2-k_2)\sin(\alpha_2 k_2) + l_1 W(N_1-k_1,k_2)\cos(\alpha_2 k_2) + (-1)^{(p_2-1)/2} l_1 r_2 W(N_1-k_1,N_2-k_2)\sin(\alpha_2 k_2) \big] \sin\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \cos\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 0, \ldots, N_2/2,   (8.6)
b^{\mathrm{III-I}}_{SS}(n_1,n_2) = \sum_{k_1=1}^{N_1/2} \sum_{k_2=1}^{N_2/2-1} \big[ W(k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} W(k_1,N_2-k_2)\cos(\alpha_2 k_2) + l_1 W(N_1-k_1,k_2)\sin(\alpha_2 k_2) - (-1)^{(p_2-1)/2} l_1 W(N_1-k_1,N_2-k_2)\cos(\alpha_2 k_2) \big] \sin\left(\frac{\pi (2n_1+1) k_1}{N_1}\right) \sin\left(\frac{\pi n_2 k_2}{N_2/2}\right),
n_1 = 0, \ldots, N_1/2-1,  n_2 = 1, \ldots, N_2/2-1,   (8.7)
where α2 = p2π/(2N2), and l1 and r2 are defined in Sections 3 and 2, respectively. The sub-sequences in (8.4)–(8.7) can be further decomposed as shown in the previous sections. The computation for (8.2) and (8.3) requires N1N2 − 2N1 − N2 + 2 additions. It further requires 2N1N2 − 3N2 − (p2+1)(N1−1) additions and 2N1N2 − 2N2 − 3p2(N1−1) multiplications to combine the terms inside the brackets in (8.4)–(8.7). The total computation cost for the type-I–III sine–cosine sequence is

Fig. 1. Relation between the different sequences at different time indices.
M^{\mathrm{I-III}}_{SC}(N_1,N_2) = M^{\mathrm{I}}_{SC}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{SC}(N_1/2, N_2/2) + M^{\mathrm{I}}_{SS}(N_1/2, N_2/2) + M^{\mathrm{I-III}}_{SS}(N_1/2, N_2/2) + 2N_1N_2 - 2N_2 - 3p_2(N_1-1),   (8.8)

A^{\mathrm{I-III}}_{SC}(N_1,N_2) = A^{\mathrm{I}}_{SC}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{SC}(N_1/2, N_2/2) + A^{\mathrm{I}}_{SS}(N_1/2, N_2/2) + A^{\mathrm{I-III}}_{SS}(N_1/2, N_2/2) + 3N_1N_2 - 4N_2 - (N_1-1)(p_2+3).   (8.9)
9. Discussion
In Section 2, it was shown that the 2-D type-III DCT can be decomposed into four sub-sequences, which are further decomposed into similar sub-sequences of smaller sizes. It was also shown that one type of sub-sequence can be converted into another according to two properties. The first property is that the parameters n1, k1 and N1 can be swapped with n2, k2 and N2, which is equivalent to transposing the input matrix; for example, the type-III–I cosine–sine sequence can be implemented by the type-I–III sine–cosine sequence. The second property is that substituting n by N/2 − n converts a type-III sine term into a type-III cosine term and vice versa. These conversion processes can be realized in subroutines without much computational overhead. By using these properties, the type-III DCT computation can be accomplished using only seven types of sub-sequences. Fig. 1 shows the relation between these sub-sequences, each being decomposed into four sub-sequences of smaller size either directly (solid lines) or through conversions (dashed lines). The approaches used in Sections 3–8 are based on decimation-in-time decomposition; the type-II DCT is needed instead if the sub-sequences are decomposed by decimation-in-frequency (DIF).

The proposed algorithm needs an index mapping process, which can be considered trivial because it involves only a small number of data indices. In addition, Appendix A shows that the number of indices involved in the mapping process depends only on the values of p1 and p2 and does not increase with the transform size.

Table 1 shows the computational complexity needed by the proposed algorithm and by the algorithms reported by Cho [6] and Chan [1]. For N1 = N2 = 2^r (i.e. p1 = p2 = 1), the proposed and Chan's algorithms require about the same number of arithmetic operations, slightly more than Cho's algorithm. Table 1 also lists the number of operations required by the proposed algorithm for p1 = 1, 3 and p2 = 3; Fig. 2 shows that these cases need a smaller computational complexity than the case p1 = p2 = 1. In general, the proposed algorithm uses fewer additions and more multiplications than the other reported algorithms. This is a consequence of assuming four multiplications and two additions per butterfly in our analysis, rather than three multiplications and three additions. It is up to users to decide which implementation scheme suits their applications; at present, multiplication and addition can be performed at the same speed on some DSP chips, so it is important to minimize the overall computational complexity rather than to reduce the number of multiplications at the cost of more additions.

Table 1
Computational complexity needed by various algorithms

N1 × N2      Mul        Add        Total

Proposed algorithm (p1 = p2 = 3)
6 × 6        38         192        230
12 × 12      342        1104       1446
24 × 24      1934       5952       7886
48 × 48      10502      29760      40262
96 × 96      52542      143616     196158
192 × 192    254774     672000     926774
384 × 384    1196078    3081216    4277294
768 × 768    5503014    13894656   19397670

Proposed algorithm (p1 = 1, p2 = 3)
8 × 6        102        284        386
16 × 12      626        1628       2254
32 × 24      3178       8484       11662
64 × 48      16050      41876      57926
128 × 96     77786      199796     277582
256 × 192    369154     929076     1298230
512 × 384    1710762    4238516    5949278

Proposed algorithm (p1 = p2 = 1)
8 × 8        183        417        600
16 × 16      975        2305       3280
32 × 32      5195       11921      17116
64 × 64      25483      58417      83900
128 × 128    122099     277233     399332
256 × 256    567243     1282929    1850172
512 × 512    2589891    5830257    8420148

Chan's algorithm [1]
8 × 8        144        464        608
16 × 16      768        2592       3360
32 × 32      3840       13376      17216
64 × 64      18432      65664      84096
128 × 128    86016      311552     397568
256 × 256    393216     1442304    1835520

Cho's algorithm [6]
8 × 8        112        472        584
16 × 16      640        2624       3264
32 × 32      3328       13504      16832
64 × 64      16384      66176      82560
128 × 128    77824      313600     391424
256 × 256    360446     1450496    1810942

Fig. 2. Comparison of computational complexity needed by various algorithms.
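The butterfly trade-off discussed in Section 9 can be illustrated with a plane rotation, the core operation behind twiddle-factor multiplications (Python sketch; this is a generic textbook identity, not code from the paper):

```python
import math

def rotate_4m2a(x, y, c, s):
    """Plane rotation with 4 multiplications and 2 additions."""
    return c * x - s * y, s * x + c * y

def rotate_3m3a(x, y, c, s):
    """Same rotation with 3 multiplications and 3 additions;
    (c + s) and (s - c) would be precomputed constants in practice."""
    t = c * (x + y)
    return t - (c + s) * y, t + (s - c) * x

theta = 0.3
c, s = math.cos(theta), math.sin(theta)
a = rotate_4m2a(1.25, -0.5, c, s)
b = rotate_3m3a(1.25, -0.5, c, s)
```

Whether the 3-multiply form is preferable depends on the relative cost of multiplication and addition on the target hardware, which is exactly the point made in the discussion.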
10. Conclusion
A fast algorithm is presented for the two-dimensional type-III DCT. It is shown that the type-III DCT can berecursively decomposed into a number of sub-sequences with reduced sizes. This algorithm has a fairly regularstructure and a simple input and output indexing scheme. One important feature of the proposed algorithm isto naturally support various transform sizes with a possible reduction of computational complexity.
Appendix A. Index mapping process

We consider the mapping process for the 1-D DCT, whose sequence length is N = p·2^r, where r > 1 and p is an odd integer. In general, we divide x(n), n = 0, …, N−1, into the sub-sequences x[2n−(p+1)/2] and x[2n+(p−1)/2], where n = 0, …, N/2. It can easily be verified that for n < (p+1)/4 the index 2n−(p+1)/2 becomes negative, and for n ≥ N/2−(p+1)/4 the index 2n+(p−1)/2 exceeds N−1. Both are invalid indices, and they therefore require an index mapping process.

Based on the definition of the type-III DCT, x(n) is associated with cos[(2n+1)kπ/(2N)]. Similarly, for t ≥ 0, x(−t−1) is associated with cos[(−2t−1)kπ/(2N)] = cos[(2t+1)kπ/(2N)]; it is therefore reasonable for the invalid datum x(−t−1) to be replaced by x(t) in the DCT computation. Similarly, if x(N+t) is replaced by x(N−t−1), where t ≥ 0, we have x(N+t) cos[kπ + (2t+1)kπ/(2N)] = x(N−t−1) cos[kπ − (2t+1)kπ/(2N)]. In summary, the mapping process is defined by

x(−t−1) ⇐ x(t) and x(N+t) ⇐ x(N−t−1) for 0 ≤ t ≤ (p−1)/2.

Now let us consider the indices of the sub-sequences x[2n−(p+1)/2] and x[2n+(p−1)/2] for n = 0, …, N/2. Based on the above mapping process, it can be verified that for n = 0, x[−(p+1)/2] would be replaced by x[(p−1)/2], which would then be used twice. The same duplication occurs when x[N+(p−1)/2] = x[N−(p+1)/2] for n = N/2. The duplication problem is eliminated by defining x[−(p+1)/2] = x[N+(p−1)/2] = 0. Table 2 illustrates two examples, for p = 5 and p = 11; it shows that the mapping process involves only a few data whose indices are invalid. The number of invalid data is generally 2⌈p/4⌉, where ⌈x⌉ denotes the smallest integer larger than x.

Table 2
Mapping process for p = 5, N = 10 and for p = 11, N = 22

         p = 5, N = 10                   p = 11, N = 22
n        x(2n−3)         x(2n+2)         x(2n−6)         x(2n+5)
0        x(−3) = 0       x(2)            x(−6) = 0       x(5)
1        x(−1) = x(0)*   x(4)            x(−4) = x(3)*   x(7)
2        x(1)            x(6)            x(−2) = x(1)*   x(9)
3        x(3)            x(8)            x(0)            x(11)
4        x(5)            x(10) = x(9)*   x(2)            x(13)
5        x(7)            x(12) = 0       x(4)            x(15)
6        –               –               x(6)            x(17)
7        –               –               x(8)            x(19)
8        –               –               x(10)           x(21)
9        –               –               x(12)           x(23) = x(20)*
10       –               –               x(14)           x(25) = x(18)*
11       –               –               x(16)           x(27) = 0

*Mapping region.
Appendix B. Computation complexity
In Sections 2-6, we decompose one 2-D sequence into four sub-sequences, as shown in (2.6). Because the summations [(2.7), for example] for these sub-sequences have different limits, it is difficult to calculate the decomposition cost in terms of the number of additions. Table 3 lists the details of the additive costs for (2.6).
Table 3
Decomposition cost

n1                 n2                 No. of additions
0                  0                  0
0                  N2/2               0
0                  1, …, N2/2-1       N2-2
N1/2               0                  0
N1/2               N2/2               0
N1/2               1, …, N2/2-1       N2-2
1, …, N1/2-1       0                  N1-2
1, …, N1/2-1       N2/2               N1-2
1, …, N1/2-1       1, …, N2/2-1       2(N1-2)(N2-2)
Table 4
No. of additions needed by the decomposition

k1                 k2                 No. of additions
0                  0                  0
0                  N2/2               0
0                  1, …, N2/2-1       N2-p2-1
N1/2               0                  0
N1/2               N2/2               0
N1/2               1, …, N2/2-1       N2-p2-1
1, …, N1/2-1       0                  N1-p1-1
1, …, N1/2-1       N2/2               N1-p1-1
1, …, N1/2-1       1, …, N2/2-1       2(N1-p1-1)(N2-p2-1)
The total number of additions for (2.6) is 2(N1·N2 - N1 - N2).
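As a quick consistency check (mine, not part of the paper), the row costs of Table 3 can be summed numerically against the stated total 2(N1·N2 - N1 - N2) for any even block sizes:

```python
# Sum the Table 3 row costs: two boundary rows in each dimension contribute
# (N2 - 2) and (N1 - 2) each, and the interior indices contribute the
# 2(N1 - 2)(N2 - 2) term.
def table3_total(N1, N2):
    boundary = 2 * (N2 - 2) + 2 * (N1 - 2)   # n1 or n2 equal to 0 or Ni/2
    interior = 2 * (N1 - 2) * (N2 - 2)       # 1 <= n1 < N1/2, 1 <= n2 < N2/2
    return boundary + interior

# Agreement with the closed-form total over a few even sizes:
for N1 in (6, 10, 24):
    for N2 in (6, 12, 22):
        assert table3_total(N1, N2) == 2 * (N1 * N2 - N1 - N2)
```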
Additions and multiplications are needed by the computation inside the brackets of (2.8)-(2.11). To reduce the computational complexity, we can express the related computation as
cos(a1 k1)[X11(k1, k2) cos(a2 k2) + r2 X21(k1, k2) sin(a2 k2)]
    + r1 sin(a1 k1)[X31(k1, k2) cos(a2 k2) - r2 X41(k1, k2) sin(a2 k2)],
    0 ≤ k1 ≤ N1/2,  0 ≤ k2 ≤ N2/2,                                   (B.1)

sin(a1 k1)[X11(k1, k2) cos(a2 k2) + r2 X21(k1, k2) sin(a2 k2)]
    + cos(a1 k1)[X31(k1, k2) cos(a2 k2) - r2 X41(k1, k2) sin(a2 k2)],
    1 ≤ k1 < N1/2,  0 ≤ k2 ≤ N2/2,                                   (B.2)

cos(a1 k1)[X11(k1, k2) cos(a2 k2) - X21(k1, k2) sin(a2 k2)]
    + r1 sin(a1 k1)[X31(k1, k2) cos(a2 k2) + X41(k1, k2) sin(a2 k2)],
    0 ≤ k1 ≤ N1/2,  1 ≤ k2 < N2/2,                                   (B.3)

sin(a1 k1)[X11(k1, k2) cos(a2 k2) - X21(k1, k2) sin(a2 k2)]
    + cos(a1 k1)[X31(k1, k2) cos(a2 k2) + X41(k1, k2) sin(a2 k2)],
    1 ≤ k1 < N1/2,  1 ≤ k2 < N2/2.                                   (B.4)
Based on the above equations, and taking into account the reduction of additions due to trivial twiddle factors, Table 4 shows the details of the number of additions required by the decomposition process.
Hence, after taking into account the reduction of multiplications and additions due to trivial twiddle factors and special twiddle factors such as 1, 0.5 and 0.7071, the total decomposition cost in additions is

TA = 2N1N2 - N1(p2 + 1) - N2(p1 + 1).                                (B.5)
To reduce the number of multiplications, (B.1)-(B.4) can be calculated as
cos(a1 k1) cos(a2 k2)[X11(k1, k2) + r2 X21(k1, k2) tan(a2 k2)]
    + r1 sin(a1 k1) cos(a2 k2)[X31(k1, k2) - r2 X41(k1, k2) tan(a2 k2)],
    0 ≤ k1 ≤ N1/2,  0 ≤ k2 ≤ N2/2,                                   (B.6)
Table 5
Decomposition cost due to trivial twiddle factors (multiplications)

k1                 k2                 No. of multiplications
0                  0                  0
0                  N2/2               1
0                  1, …, N2/2-1       2N2-3p2-1
N1/2               0                  1
N1/2               N2/2               0
N1/2               1, …, N2/2-1       2N2-3p2-1
1, …, N1/2-1       0                  2N1-3p1-1
1, …, N1/2-1       N2/2               2N1-3p1-1
1, …, N1/2-1       1, …, N2/2-1       3(N1-2p1)(N2-2p2) + 4(N1-2p1)(p2-1) + 4(N2-2p2)(p1-1) + 2(p1-1)(p2-1)
sin(a1 k1) cos(a2 k2)[X11(k1, k2) + r2 X21(k1, k2) tan(a2 k2)]
    + cos(a1 k1) cos(a2 k2)[X31(k1, k2) - r2 X41(k1, k2) tan(a2 k2)],
    1 ≤ k1 < N1/2,  0 ≤ k2 ≤ N2/2,                                   (B.7)

cos(a1 k1) cos(a2 k2)[X11(k1, k2) - X21(k1, k2) tan(a2 k2)]
    + r1 sin(a1 k1) cos(a2 k2)[X31(k1, k2) + X41(k1, k2) tan(a2 k2)],
    0 ≤ k1 ≤ N1/2,  1 ≤ k2 < N2/2,                                   (B.8)

sin(a1 k1) cos(a2 k2)[X11(k1, k2) - X21(k1, k2) tan(a2 k2)]
    + cos(a1 k1) cos(a2 k2)[X31(k1, k2) + X41(k1, k2) tan(a2 k2)],
    1 ≤ k1 < N1/2,  1 ≤ k2 < N2/2.                                   (B.9)
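A numerical spot-check (my own, not from the paper) confirms that the tangent form (B.6) agrees with the direct form (B.1) for arbitrary values of X11, X21, X31, X41, r1 and r2, whenever tan(a2 k2) is finite:

```python
import math
import random

# Compare the direct form (B.1) against the tangent form (B.6), which pulls
# the common factor cos(a2*k2) out of each bracket, over random inputs.
random.seed(1)
max_err = 0.0
for _ in range(1000):
    X11, X21, X31, X41, r1, r2 = (random.uniform(-2, 2) for _ in range(6))
    t1, t2 = random.uniform(0.1, 1.4), random.uniform(0.1, 1.4)  # a1*k1, a2*k2

    # (B.1): each bracket mixes cos(a2*k2) and sin(a2*k2) terms directly
    direct = (math.cos(t1) * (X11 * math.cos(t2) + r2 * X21 * math.sin(t2))
              + r1 * math.sin(t1) * (X31 * math.cos(t2) - r2 * X41 * math.sin(t2)))
    # (B.6): cos(a2*k2) factored out, leaving a single tan(a2*k2) per bracket
    tangent = (math.cos(t1) * math.cos(t2) * (X11 + r2 * X21 * math.tan(t2))
               + r1 * math.sin(t1) * math.cos(t2) * (X31 - r2 * X41 * math.tan(t2)))
    max_err = max(max_err, abs(direct - tangent))
```

The two forms agree to floating-point precision; the saving comes from precomputing the products cos(a1 k1)cos(a2 k2) and sin(a1 k1)cos(a2 k2) once per frequency pair.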
Table 5 shows the decomposition cost in multiplications. When the term tan(a2 k2) in (B.6)-(B.9) becomes infinite, (B.1)-(B.4) are used instead.
Hence, the total number of multiplications for (2.8)-(2.11) is
TM = 3(N1 - 2p1)(N2 - 2p2) + 4p2(N1 - 2p1) + 4p1(N2 - 2p2) + 2p1p2.  (B.10)
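The closed forms (B.5) and (B.10) are straightforward to evaluate; the sketch below (function names are mine) computes both for the example p1 = p2 = 3 with m = n = 2, i.e. N1 = N2 = 12:

```python
# Evaluate the operation counts (B.5) and (B.10) for block size N1 x N2,
# where N1 = p1*2^m and N2 = p2*2^n.

def additions(N1, N2, p1, p2):
    # (B.5): total decomposition cost in additions
    return 2 * N1 * N2 - N1 * (p2 + 1) - N2 * (p1 + 1)

def multiplications(N1, N2, p1, p2):
    # (B.10): total multiplications for (2.8)-(2.11)
    return (3 * (N1 - 2 * p1) * (N2 - 2 * p2)
            + 4 * p2 * (N1 - 2 * p1) + 4 * p1 * (N2 - 2 * p2)
            + 2 * p1 * p2)

# Example: p1 = p2 = 3, m = n = 2, so N1 = N2 = 12
ta = additions(12, 12, 3, 3)        # 2*144 - 12*4 - 12*4 = 192
tm = multiplications(12, 12, 3, 3)  # 3*36 + 72 + 72 + 18 = 270
```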