A Fast Algorithm of the DCT and IDCT for VLSI Implementation



    Proceedings of ICSP '96
    A Fast Algorithm of the DCT and IDCT for VLSI Implementation

    Mong Ying    Hou Zhaohuan
    Institute of Acoustics, Chinese Academy of Sciences
    P.O. Box 2712, Beijing 100080, P.R. China

    ABSTRACT
    Since the DCT performs very close to the statistically optimal Karhunen-Loeve Transform (KLT), it is widely used in digital signal processing, especially for speech and image data compression. DCT algorithms and VLSI architectures with real-time computation capabilities are urgently required. It is known that VLSI implementation of distributed arithmetic is very efficient for computing convolution. Here, an algorithm is presented to convert the DCT and inverse DCT (IDCT) to skew-convolution. VLSI implementation of the algorithm has the same advantages as any implementation using distributed arithmetic.

    1. INTRODUCTION
    Since the introduction of the DCT in the 1970's, a considerable amount of research has been performed on algorithm, architecture and processor designs for computing the DCT. The DCT and inverse DCT (IDCT) have also been used extensively in areas of digital speech and image processing. In particular, the DCT has become an integral part of several standards such as JPEG, MPEG and CCITT Recommendation H.261.

    Traditionally, the DCT is accomplished by using multipliers to implement the butterfly structure of various fast algorithms. In VLSI realization, however, the multipliers, the irregular architecture and the complicated routing of the butterfly approach require more silicon area and reduce the speed. It is known that with distributed arithmetic the resulting VLSI architecture has a highly regular structure and eliminates the need for multipliers, while circular and skew-circular convolution can be computed efficiently using distributed arithmetic.

    Therefore, in this paper we present an algorithm that converts the DCT to skew-circular convolution by using the number theoretic transforms (NTTs) technique. The algorithm is very efficient for VLSI application.

    2. THE ALGORITHM
    Given an input sequence {x(n), n = 0, 1, ..., N-1}, the 1-D DCT is defined as follows:

    $$X(k) = c \sum_{n=0}^{N-1} x(n) \cos\frac{\pi(2n+1)k}{2N}, \qquad k = 1, 2, \ldots, N-1 \tag{1}$$

    In the following derivation of the algorithm, the constant factor $c$ in front of the sum is ignored and N is assumed to be a power of 2. The definition equation of the odd indexed DCT components can be rewritten as follows:

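    $$\begin{aligned}
    X(2k+1) &= \sum_{n=0}^{N-1} x(n) \cos\frac{\pi(2n+1)(2k+1)}{2N} \\
            &= \sum_{n=0}^{N/2-1} x(2n) \cos\frac{\pi(4n+1)(2k+1)}{2N}
             + \sum_{n=0}^{N/2-1} x(N-1-2n) \cos\frac{\pi[2(N-1-2n)+1](2k+1)}{2N} \\
            &= \sum_{n=0}^{N/2-1} [x(2n) - x(N-1-2n)] \cos\frac{2\pi(4n+1)(2k+1)}{4N},
               \qquad k = 0, 1, \ldots, N/2-1
    \end{aligned} \tag{2}$$

    It is known from Number Theory that there is a one-to-one mapping relation $2k_{s,i}+1 = (-1)^{s}\,3^{i} \bmod 4N$

    written as follows: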

    0-7803-2912-0 637


    2k,+l=.

    Therefore, (2) can be rewritten as follows:

    $$X(2k_{s,i}+1) = \sum_{j=0}^{N/2-1} [x(2n_j) - x(N-1-2n_j)] \cos\frac{2\pi(4n_j+1)(2k_{s,i}+1)}{4N}$$

    '2k,., +1 2k;,,+1
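The folding of the odd-indexed outputs into a half-length sum over differences x(2n) - x(N-1-2n), and the number-theoretic fact behind the index mapping, can be checked numerically. The following is a minimal Python sketch, assuming the unnormalized DCT (constant factor dropped) and N a power of 2. The function names are illustrative and do not come from the paper.

```python
import math

def dct_direct(x, k):
    """Unnormalized DCT output X(k), straight from the definition."""
    N = len(x)
    return sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
               for n in range(N))

def dct_odd_folded(x, k):
    """X(2k+1) as a half-length sum over the differences x(2n) - x(N-1-2n)."""
    N = len(x)
    return sum((x[2 * n] - x[N - 1 - 2 * n])
               * math.cos(2 * math.pi * (4 * n + 1) * (2 * k + 1) / (4 * N))
               for n in range(N // 2))

def signed_powers_of_3(N):
    """All residues (-1)^s * 3^i mod 4N for s in {0, 1}, i = 0..N-1, sorted."""
    mod = 4 * N
    return sorted(((-1) ** s * pow(3, i, mod)) % mod
                  for s in (0, 1) for i in range(N))

x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]   # arbitrary test input, N = 8
max_err = max(abs(dct_odd_folded(x, k) - dct_direct(x, 2 * k + 1))
              for k in range(len(x) // 2))
covers_all_odd = signed_powers_of_3(8) == list(range(1, 32, 2))
```

For N = 8 the signed powers of 3 hit every odd residue modulo 32 exactly once, which is what makes a one-to-one re-indexing of the odd frequencies possible.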


    circular convolution of length N/2. From (13), we also see that by precomputing and storing all possible combinations of one set of fixed coefficients $\cos\frac{2\pi\cdot 3^{i}}{4N}$, i = 0, 1, ..., N/2-1, the multipliers can be replaced by memory look-up tables, and large savings in the number of arithmetic operations can be achieved in VLSI realization when using distributed arithmetic.
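To make the look-up-table idea concrete, here is a minimal Python sketch of a distributed-arithmetic inner product. It assumes inputs are signed integers in two's complement with a fixed word length `bits`, and it uses floating-point table entries for simplicity (a hardware table would hold fixed-point words). The helper names `build_lut` and `da_inner_product` are hypothetical, not taken from the paper.

```python
import math

def build_lut(coeffs):
    """Table entry `addr` holds the sum of coeffs[k] over the set bits k of addr."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]

def da_inner_product(lut, xs, bits):
    """sum(coeffs[k] * xs[k]) using only table look-ups, shifts and adds."""
    acc = 0.0
    for b in range(bits):
        # Gather bit b of every input word into one table address.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        # The most significant (sign) bit carries weight -2^(bits-1).
        weight = -(1 << b) if b == bits - 1 else (1 << b)
        acc += weight * lut[addr]
    return acc

# Example with the N = 8 coefficient set cos(2*pi*3^i / 4N), i = 0..3.
coeffs = [math.cos(2 * math.pi * pow(3, i, 32) / 32) for i in range(4)]
xs = [5, -3, 7, -8]                      # each fits in 4-bit two's complement
da_result = da_inner_product(build_lut(coeffs), xs, bits=4)
reference = sum(c * x for c, x in zip(coeffs, xs))
```

The table has 2^K entries for K coefficients, so the approach trades memory for multipliers; the inner loop over `b` corresponds to one table access per input bit position.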

    Above, the procedure for computing the odd-indexed DCT components has been shown. It includes:
    1. an input mapping x(2n) → x(2n_j), j = 0, 1, ..., N/2-1, according to (4) and (6).
    2. subtractions according to (8) with appropriate negations as follows:

    $$z'(2n_j) = \pm\,[x(2n_j) - x(N-1-2n_j)], \qquad j = 0, 1, \ldots, N/2-1$$

    3. a skew-circular convolution of the sequence {z'(2n_j), j = 0, 1, ..., N/2-1} and the constant sequence {cos(2π·3^i/4N), i = 0, 1, ..., N/2-1} according to (13) (noting that the constant factor of (1) is combined with the cos terms), obtaining X(2k_{s,i}+1), i = 0, 1, ..., N/2-1, with appropriate negations depending on 2k_{s,i}+1.
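The three steps can be sketched end to end in Python and compared with the direct definition. This is a sketch under stated assumptions, not the paper's exact formulation: it assumes N is a power of 2 with N >= 4, drops the constant factor, and uses one workable index mapping (inverse powers of 3 on the input side, so that the sum takes convolution form). The paper's own mappings (4) and (6) may differ in detail, and all function names are illustrative.

```python
import math

def odd_dct_direct(x):
    """X(2k+1), k = 0..N/2-1, from the unnormalized DCT definition."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * (2 * k + 1) / (2 * N))
                for n in range(N))
            for k in range(N // 2)]

def odd_dct_skew(x):
    """Same outputs via input mapping, skew-circular convolution, output mapping."""
    N = len(x)                       # assumed: power of 2, N >= 4
    half, mod = N // 2, 4 * N
    inv3 = pow(3, -1, mod)           # modular inverse of 3 (Python 3.8+)

    # Constant sequence h(i) = cos(2*pi*3^i / 4N); it satisfies h(i + N/2) = -h(i).
    h = [math.cos(2 * math.pi * pow(3, i, mod) / mod) for i in range(half)]

    # Steps 1 and 2: permute the inputs, subtract, negate where needed.
    u = []
    for j in range(half):
        a = ((-1) ** j * pow(inv3, j, mod)) % mod    # a is congruent to 1 mod 4
        sign = 1.0
        if a >= 2 * N:               # cosine argument shifted by pi: flip the sign
            a -= 2 * N
            sign = -1.0
        n = (a - 1) // 4             # now a = 4n + 1 with 0 <= n < N/2
        u.append(sign * (x[2 * n] - x[N - 1 - 2 * n]))

    # Step 3: skew-circular convolution; wrapped-around terms enter negated.
    y = [sum(u[j] * (h[i - j] if i >= j else -h[i - j + half])
             for j in range(half))
         for i in range(half)]

    # Output mapping: fold 3^i mod 4N onto an odd index 2k+1 in [1, N).
    out = [0.0] * half
    for i in range(half):
        m, sign = pow(3, i, mod), 1.0
        if m >= 2 * N:
            m, sign = m - 2 * N, -sign
        if m > N:
            m, sign = 2 * N - m, -sign
        out[(m - 1) // 2] = sign * y[i]
    return out

x8 = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
err8 = max(abs(a - b) for a, b in zip(odd_dct_skew(x8), odd_dct_direct(x8)))
```

The double loop in step 3 is the part that a distributed-arithmetic datapath would replace with table look-ups, since the sequence `h` is fixed for a given N.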


    registers only, with no multiplier of the kind required in any butterfly-structure implementation. Moreover, the major computation for the DCT and IDCT is similar, except that the DCT needs some preadditions, while the IDCT needs some postadditions. Therefore, a processor can be devised to compute the DCT and IDCT with little overhead. The advantage of the algorithm is the input data mapping and output results mapping.

    REFERENCES
    [1] B. G. Lee, "A new algorithm to compute the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. 32, no. 6, pp. 1243-1245, Dec. 1984.

    [2] M. T. Sun, L. Wu, "A concurrent architecture for VLSI implementation of discrete cosine transform," IEEE Trans. Circuits and Systems, vol. CAS-34, no. 8, pp. 992-994, Aug. 1987.

    [3] H. S. Hou, "A fast recursive algorithm for computing the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. 35, no. 10, pp. 1455-1461, Oct. 1987.

    [4] Weiping Li, "A new algorithm to compute the DCT and its inverse," IEEE Trans. Signal Processing, vol. 39, no. 6, pp. 1305-1313, June 1991.

    [5] S. Uramoto et al., "A 100 MHz 2-D discrete cosine transform core processor," IEEE J. Solid-State Circuits, vol. 27, pp. 492-499, Apr. 1992.

    [6] E. Feig and S. Winograd, "Fast algorithms for the discrete cosine transform," IEEE Trans. Signal Processing, vol. 40, pp. 2174-2193, Sept. 1992.

    [7] D. Slawecki and W. Li, "DCT/IDCT processor design for high data rate image coding," IEEE Trans. Circuits Syst. Video Technol., vol. 2, pp. 135-146, June 1992.

    [8] Avanindra Madisetti, Alan N. Willson, "A 100 MHz 2-D 8x8 DCT/IDCT processor for HDTV application," IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 2, pp. 158-165, Apr. 1995.

    [9] H. J. Nussbaumer, Fast Fourier Transform and Convolution Algorithms. New York: Springer, 1987.

    Fig. 1 Computing the DCT (N=8)
