
AD-A100 782 AIR FORCE INST OF TECH WRIGHT-PATTERSON AFB OH SCHOOL--ETC F/G 19/1

EFFICIENT COMPUTER IMPLEMENTATIONS OF FAST FOURIER TRANSFORMS (U)

DEC 80 J D BLANKEN

UNCLASSIFIED AFIT/GE/EE/80D-9 NL


DISCLAIMER NOTICE

THIS DOCUMENT IS BEST QUALITY PRACTICABLE. THE COPY FURNISHED TO DTIC CONTAINED A SIGNIFICANT NUMBER OF PAGES WHICH DO NOT REPRODUCE LEGIBLY.


EFFICIENT COMPUTER IMPLEMENTATIONS OF

FAST FOURIER TRANSFORMS

THESIS

AFIT/GE/EE/80D-9

John D. Blanken

Captain USAF


Approved for public release; distribution unlimited


AFIT/GE/EE/80D-9

EFFICIENT COMPUTER IMPLEMENTATIONS

OF FAST FOURIER TRANSFORMS

THESIS

Presented to the Faculty of the School of Engineering

of the Air Force Institute of Technology

Air University

in Partial Fulfillment of the

Requirements for the Degree of

Master of Science

John D. Blanken, B.S.E.E.

Capt. USAF

Graduate Electrical Engineering

Approved for public release; distribution unlimited

Acknowledgments

Dr. John Hines and Mr. Harold Noffke of the Air Force

Wright Aeronautical Laboratory proposed this topic. I am

indebted to them for both the topic and their support.

Gratitude is due Drs. Burrus and Parks of Rice

University for many helpful discussions and ideas through-

out this effort. They graciously provided the Prime Factor

Algorithm studied and tested in this paper.

A special thanks is extended to Mr. Jim Thompson

of Control Data Corporation and Dr. Poirier of Aeronautical

Systems Division for their invaluable assistance in quanti-

fying the multiply and add speed on the Cyber 74. A debt

of gratitude is owed to my thesis readers Capt. Larry Kizer

and Dr. Kabrisky. Their willingness to let me work inde-

pendently was greatly appreciated.

My very deepest appreciation is extended to my faculty

advisor Dr. Pedro Rustan for his patience, confidence, and

skilled guidance. Most of all I appreciate his enthusiasm

and hard work, which motivated me to continue my efforts.

Finally, I am most indebted to my wife, Linda, for

her encouragement, typing the rough draft, and allowing me

to sacrifice our family life and complete my work at AFIT.

John D. Blanken

This thesis was typed by Niki Maxwell.

Contents

Page

Acknowledgments ....................................... ii

List of Figures ....................................... vii

List of Tables ........................................ x

Glossary of Terms ..................................... xi

Abstract .............................................. xii

I. Introduction ..................................... 1

1.1 Background ..................................... 1

1.2 Problem ........................................ 2

1.3 Scope .......................................... 2

1.4 Assumptions .................................... 3

1.5 Approach and Presentation ...................... 5

II. Literature Review ................................ 7

III. FFT Theory .................................... 12

3.1 Computing Trigonometric Function Values ........ 13

3.2 Fixed Radix Algorithms ......................... 14

3.2.1 Development of Radix-2 Theory ............... 14

3.2.2 Development of Radix-3 Theory ............... 26

3.2.3 Radix-5 Theory .............................. 40

3.2.4 Digit Reversal Algorithm .................... 41

3.2.5 Development of a Radix-3 FFT Based on the Cube Root of Unity .. 45

3.2.6 Summary ..................................... 51

3.3 Real Operations Count for Fixed Radix FFTs ..... 51

3.3.1 Number of Butterflies in Fixed Radix-p FFTs . 52

3.3.2 Number of Twiddle Factors in Fixed Radix-p FFTs .. 53

3.3.3 Number of Trigonometric Functions Required for the Fixed Radix Algorithms .. 53

3.3.4 Number of Real Operations in Radix-p FFTs ... 56

3.3.5 Real Operations Count for the Radix-3 FFT Using the Complex Cube Root of Unity .. 58

3.3.6 Memory Requirements for Fixed Radix FFTs .... 61


3.4 Mixed Radix FFT Algorithms ............... 63

3.4.1 Mixed Radix Theory .......................... 65

3.4.2 Digit Reversal Algorithm (General) .......... 70

3.4.3 Twiddle Factors ............................. 72

3.4.4 Real Operations Count for Computing Sine and Cosine Difference Equation .. 73

3.4.5 Real Operations Count for Mixed Radix FFTs .. 78

3.4.6 Memory Requirements for Mixed Radix FFTs .... 99

3.5 Fourier Transforms Using Fast Convolution Algorithms .. 109

3.5.1 Converting a DFT to Circular Convolution .... 110

3.5.2 Reordering the Data Arrays .................. 112

3.5.3 The Winograd Fourier Transform .............. 113

3.5.4 The Prime Factor Algorithm Theory ........... 118

3.5.5 Real Operations for WFTA .................... 122

3.5.6 Memory Requirements for WFTA ................ 127

3.5.7 Real Operations for the PFA ................. 134

3.5.8 Memory Requirements for PFA ................. 137

3.5.9 Summary ..................................... 144

IV. Comparison Results of Efficient Discrete Fourier Transforms .. 145

4.1 Introduction ................................... 145

4.2 Conventional Radix-3 vs R(u) Field Radix-3 ..... 147

4.3 Fixed Radix vs Mixed Radix FFTs ................ 147

4.4 Mixed Radix FFT Comparison: IMSL vs Singleton .. 150

4.5 Conventional vs Fast Convolution Mixed Radix FFTs .. 158

4.5.1 Real Operations Count ....................... 159

4.5.2 Memory ...................................... 171

4.5.3 WFTA vs PFA Operations Count ................ 173

4.6 Flexibility of the DFT Algorithms .............. 174

4.7 An Algorithm to Select the Most Efficient DFT Technique .. 178

4.7.1 Arguments ................................... 178

4.7.2 Usage ....................................... 179


V. Conclusions ..................................... 185

5.1 Results and Conclusions ........................ 185

5.2 Recommendations ................................ 188

Bibliography ........................................... 190

Appendix A: Radix-2 FFT Algorithm ................... 193

Appendix B: Radix-3 FFT Algorithm .................... 195

Appendix C: Radix-3 FFT in R(u) ...................... 201

Appendix D: Radix-5 FFT Algorithm .................... 208

Appendix E: Mixed Radix FFT Algorithm ............... 222

Appendix F: Singleton's Mixed Radix FFT ............. 242

Appendix G: IMSL Mixed Radix FFT .................... 258

Appendix H: An Algorithm for Computing the WFTA ...... 264

Appendix I: Computing the Prime Factor Algorithm (PFA) .. 286

Appendix J: Timing Tests on the CDC Cyber 74 ........ 298


List of Figures

Figure Page

3.1 Flowgraph of the Decimation-In-Time Decomposition of an N-Point DFT Computation into Two N/2-Point DFT Computations (N=8) ........ 16

3.2 Flowgraph of the Decimation-In-Time Decomposition of an N/2-Point DFT Computation into Two N/4-Point DFT Computations (N=8) ... 19

3.3 Result of Substituting Figure 3.2 into Figure 3.1 .................................... 20

3.4 Flowgraph of Two-Point DFT ................... 22

3.5 Flowgraph of Complete Decimation-In-Time Decomposition of an 8-Point DFT ............. 23

3.6 Flowgraph of Basic Butterfly Composition .... 25

3.7 Flowgraph of Simplified Butterfly Composition ................................... 27

3.8 Flowgraph of 8-Point DFT Using the "Twiddle Factor" Butterfly of Figure 3.7 ............. 28

3.9 Butterfly Flowgraph for First Stage Decimation (N=9) ............................... 31

3.10 Butterfly Flowgraph for F(k) ................ 33

3.11 Complete Butterfly Flowgraph (N=9) .......... 34

3.12 General Radix-3 Butterfly Flowgraph ......... 38

3.13 Basic Twiddle Factor Radix-3 Butterfly ...... 39

3.14 Radix-5 Twiddle Factor Butterfly ............ 42

3.15 Digit Reversed Input and Output Arrays ...... 44

3.16 Radix-3 Butterfly in R(u) Arithmetic ........ 49

3.17 First Decomposition N=30 ...................... 67

3.18 Butterfly Flowgraph for N=30 ................. 68

3.19 Radix-4 Butterfly Flowgraph Showing the 4 Twiddle Factor Multipliers ................... 75


3.20 Radix-2 Section of Singleton's FFT .......... 84




3.24 General Factor Section of Singleton's FFT... 91

3.25 Multiplications vs N for Singleton's FFT (N<200) .................................... 97

3.26 Additions vs N for Singleton's FFT (N<200).. 98

3.27 Multiplications vs N for Multiples of 2, 3, 4, and 5 ................................. 100

3.28 Additions vs N for Multiples of 2,3,4 and 5. 101

3.29 Memory Array vs N (<200) for Singleton's FFT 107

3.30 Memory Array vs N (<200) for IMSL's FFT ..... 108

3.31 Flow Control in WFTA Program ................ 124

3.32 Real Multiplications for WFTA ............... 131

3.33 Real Additions for WFTA ..................... 132

3.34 Memory Comparison Between Modified and Original WFTA ................................ 133

3.35 Real Multiplications for the PFA ............. 140

3.36 Real Additions for the PFA .................. 141

3.37 Percentage Savings of Multiplications by Using Shifts in PFA .......................... 142

3.38 Memory Array Required by PFA ................ 143

4.1 Fourier Transform of e-t cos 50iit ........... 146

4.2 Memory Array Saved Using Singleton's Instead of IMSL's FFT ................................ 155

4.3 Real Multiplication Comparison for PFA, WFTA, and MFFT (N<500) ............................ 160


4.4 Real Addition Comparison for PFA, WFTA, and MFFT (N<500) ............................. 162

4.5 Predicted Times of Execution as a Percentage of Measured Time for the MFFT, WFTA, and PFA .. 170

4.6 Memory Arrays Required by MFFT, WFTA, and PFA ......................................... 172

4.7 Relative Efficiencies of MFFT, WFTA, and PFA ......................................... 175

4.8 Flowchart to Select Most Efficient Algorithm ................................... 180


List of Tables

Table Page

3.1 Real Operations Count for Radix-2,3 and 5... 59

3.2 Comparison Between Complex and R(u) Radix-3 FFT for Real Operations ..................... 62

3.3 Fixed Radix Memory Required .................... 64

3.4 Results of Counters in Singleton's FFT ...... 79

3.5 Operations Executed for Each Counter .......... 81

3.6 Small-N Operations Count for WFTA ............ 119

3.7 PFA Small-N DFT Operations Count ............. 121

3.8 McClellan and Nawab's WFTA Real Operations for the Small-N Algorithms .................... 128

3.9 Real Operations and Memory for McClellan and Nawab WFTA .............................. 129

3.10 PFA Small-N DFT Operations Count for No Shifts ...................................... 136

3.11 PFA Real Operations and Memory Count for N<72 ........................................ 138

4.1 Radix-3 Timing Comparison ................... 148

4.2 Fixed Radix (FR) vs Mixed Radix (MR) FFTs... 149

4.3 Program Memory Required by FFTs ............... 151

4.4 Timing Results for IMSL and Singleton FFTs.. 154

4.5 Operations and Memory Array Comparison for MFFT, WFTA, and PFA ......................... 164

4.6 Timing Results from the WFTA Subroutines .... 167

4.7 Measured and Predicted Timing Results for MFFT, WFTA, and PFA ......................... 168

5.1 Comparison of DFT Algorithms .................. 189


Glossary of Terms

1. Butterfly: The flowgraph of the two-point DFT computation in Figure 3.4 has the appearance of a "butterfly", hence the name.

2. Fixed Radix: The term "radix" is commonly used to

describe a specific FFT decomposition. The term

"fixed" radix means that all the factors of N are

the same.

3. Mixed Radix: Not all of the factors of N are identical.

4. Relatively Prime: The numbers in a given set are said to be relatively prime when no two numbers in the set share a common factor greater than 1. For example, (2, 3, 7, 9) is not a relatively prime set because 3 and 9 share the factor 3. The following example is relatively prime: (2, 3, 5, 7).

5. Square and Square-free Factors: For the case where

$N = 4 \cdot 3 \cdot 7 \cdot 4$, the "4s" are square factors and

the 3 and 7 are square-free.

6. Twiddle Factors: The term refers to the complex

multipliers of Figure 3.8 which pre-multiply the FFT

butterflies. They are sometimes called phase or

rotation factors.
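Item 4 is the property that the prime factor (Chinese Remainder Theorem) index mapping requires of the factors of N. A minimal Python sketch of the pairwise test follows; `relatively_prime` is an illustrative name, not a routine from the thesis programs.

```python
from itertools import combinations
from math import gcd

def relatively_prime(factors):
    """True when no two numbers in the set share a common factor greater than 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(factors, 2))

print(relatively_prime([2, 3, 7, 9]))  # False: 3 and 9 share the factor 3
print(relatively_prime([2, 3, 5, 7]))  # True
print(relatively_prime([4, 6]))        # False, although neither number divides the other
```

The last case shows why the test compares common factors rather than divisibility.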


Abstract

A comprehensive comparison of the most efficient

Discrete Fourier Transform (DFT) techniques is presented.

The DFT algorithms selected are the fixed radix Fast

Fourier Transform (FFT), mixed radix FFT, the Winograd

Fourier Transform Algorithm (WFTA), and the Prime Factor

Algorithm (PFA). Comparison of the algorithms is based

on the number of real multiplications, additions, and

memory arrays required as a function of sequence length N.

This paper reviews the literature, selects the most

efficient DFT FORTRAN programs available, develops the

number of real multiplications and additions as a function

of N, and compares the algorithms using tables and plots of

real multiplications, additions, and memory arrays. This

comparison shows that the WFTA and PFA require the fewest

real multiplications and additions, but the fixed radix

and mixed radix FFTs require the least memory. The mixed

radix FFT is much more flexible than WFTA or PFA since N

can be any length sequence. The WFTA and PFA are closely

studied and tradeoffs between the two are discussed. The

PFA uses fewer additions but more multiplications for most

sequence lengths which means the WFTA is more efficient

when multiplications are "costly" relative to additions.

The PFA uses less memory than the WFTA, making the PFA preferable when machine memory is limited. Based on the results of the paper, an algorithm is presented to select the most efficient DFT for an N length sequence given the multiply speed, add speed, and memory size of the computer.


I. Introduction

1.1 Background

Computing the Discrete Fourier Transform (DFT) of N

points has many applications in scientific and engineering

calculations. In 1965 Cooley and Tukey described an

algorithm which became known as the Fast Fourier Transform

(FFT) because it reduced the number of complex operations required to compute the DFT from $N^2$ to $N \log_2 N$, where $N = 2^m$ with m an integer. Using ideas proposed in the Cooley-Tukey paper, Singleton wrote a mixed radix algorithm, published in 1969, which permitted N to be any positive integer.

In 1976 Winograd proposed a mixed radix DFT algorithm

which (1) converted the DFT to circular convolution,

(2) used fast convolution algorithms to perform "short-

DFTs", and (3) nested these short-DFTs into a structure to

perform long Fourier transforms on complex data sequences.

This algorithm became known as the Winograd Fourier Transform Algorithm (WFTA). The WFTA kept the real additions count at the FFT level while significantly reducing the real multiplications required.

Kolba and Parks, 1977, used Winograd's fast convolution algorithms and proposed a new Prime Factor Algorithm (PFA). This new algorithm modified the short-DFTs to use "shifts" instead of multiplication by 1/2 and did not use the nested structure of the WFTA. As a consequence, the PFA uses more real multiplications and fewer additions than the WFTA for a given sequence length N.

1.2 Problem

Both Winograd, 1976, and Kolba-Parks, 1977, compared their operations counts to that of the FFT but did not include all possible WFTA and PFA sequence lengths. Further, no comparisons were made on the basis of memory arrays

required by each algorithm as a function of N. This paper

presents a comprehensive comparison of fixed radix FFTs,

mixed radix FFTs, WFTA, and PFA based on real operations

and memory arrays. This comparison provides the informa-

tion needed to select the most efficient algorithm to

perform the DFT based on machine size, machine speed,

and real operations.

1.3 Scope

This paper reviews the literature, selects DFT

algorithms for comparison, studies the theory of each

algorithm selected, develops the real operation and

memory count as a function of N, compares these algorithms

using tables and plots of operation and memory counts,

and presents an algorithm to select the most efficient

techniques.

The DFT algorithms selected for study and comparison are:

(1) Radix-2 FFT

(2) Radix-3 FFT

(3) Radix-3 FFT in the R(u) field

(4) Radix-5 FFT

(5) Mixed radix FFT written by the author

(6) Mixed radix FFT written by Singleton

(7) Mixed radix FFT available from International Mathematical Subroutine Library (IMSL) on the CDC Cyber 74

(8) WFTA

(9) PFA.

Each of these algorithms has a particular advantage which

makes selection of the best algorithm dependent on the

machine size, machine speed, and sequence length.

1.4 Assumptions

To a first approximation, the speed of an FFT algorithm is proportional to the number of complex multiplications used. The number of times the data array is indexed is, however, an important secondary factor (Singleton, 1969). Kolba and Parks, 1977, substantiated this assumption by timing the PFA and FFTs on an IBM 370/155 for several sequence lengths and showing that the FORTRAN coded PFA (having fewer real additions and multiplications) was faster than the FFT FORTRAN algorithms.


In 1978 Morris demonstrated that the sequence of

arithmetic operations in a DFT algorithm's internal

structure can result in different execution times "between

ostensibly equivalent algorithms on a given machine"

and that the computer dependent algorithm/architecture

interactions may also alter relative performance of the

different algorithms. He modified the FORTRAN coded

radix-4 FFT and WFTA programs and matched them to the

PDP 11/55 and IBM 370/168 architecture and showed that

the WFTA offered neither time nor space advantages over the

radix-4 FFT. Morris achieved these results because "the

radix-4 FFT appears almost ideally matched to the PDP-11

architecture" whereas the WFTA "has extra load/store

burdens" and requires extra data array indexing.

Morris demonstrated that it may be possible to optimize DFT algorithms to match a certain machine; however, this type of optimization of the FORTRAN DFT algorithms is outside the scope of this paper. It is assumed that existing FORTRAN coded DFT algorithms will not be modified and that selecting an algorithm which minimizes real operations produces the most efficient algorithm.

This paper derives and tabulates real operations

counts as a function of N for the algorithms listed in

Section 1.3. The most efficient DFT algorithms are timed

on the CDC Cyber 74 computer and compared to the predicted

execution time based on real operations. These predicted

times are shown to be consistent with the timing results.


1.5 Approach and Presentation

A literature review is presented in Chapter II which

starts with the 1965 Cooley-Tukey paper and follows the

various DFT algorithm developments up through Kolba-Parks'

1977 article. The review puts Rader's 1968 landmark paper

in perspective with Winograd's "nested" DFT algorithm and

the subsequent work by Kolba and Parks.

Next, the theory behind the DFT algorithms is reviewed,

the real operations count developed, and the memory array

count needed for a sequence length N is determined. The

general expressions for real operations and memory array

counts are developed from published articles or from the

background theory and then plotted and tabulated as a

function of N. Readers familiar with the FFT and

Winograd background theory may wish to skip Sections 3.1

and 3.2.

In Chapter IV comparison tables and plots of the

DFT algorithms make it possible to select the most

efficient algorithm based on real operations and memory

array required. Timing results from the CDC Cyber 74

system for representative sequence lengths are tabulated

to substantiate the assumption that minimizing real

operations equates to maximizing efficiency. An algorithm

is also presented at the end of Chapter IV which uses the

tables in this paper to select the most efficient DFT

technique given the sequence length, memory size, and

computer add and multiply speed.

Conclusions and recommendations are presented in Chapter V.

II. LITERATURE REVIEW

The calculation of the Discrete Fourier Transform (DFT) is a central operation performed in digital signal processing but was not widely used for other than trivial sequence lengths because of the cumbersome DFT evaluation:

$$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j2\pi nk/N) \qquad (2.1)$$

which required on the order of $N^2$ complex operations.
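For reference, Eq (2.1) can be evaluated literally. The following Python sketch (the name `direct_dft` is illustrative) makes the $N^2$ cost visible as a double loop. It is only a baseline for checking faster algorithms, not one of the programs compared in this thesis.

```python
import cmath

def direct_dft(x):
    """Evaluate Eq (2.1) term by term: X(k) = sum_n x(n) exp(-j*2*pi*n*k/N).

    The nested loops perform N*N complex multiply-adds, the O(N^2) cost
    that made the direct DFT impractical for long sequences.
    """
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

# An impulse has a flat magnitude spectrum.
print([round(abs(v), 6) for v in direct_dft([1, 0, 0, 0])])  # [1.0, 1.0, 1.0, 1.0]
```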

In 1965 Cooley and Tukey published "An Algorithm for

the Machine Calculation of Complex Fourier Series" which

stimulated the widespread use of an algorithm which became

known as the "Fast Fourier Transform" (FFT). Their paper

proposed an efficient method of computing the DFT by factoring the sequence length N into its prime components:

$$N = n_1 n_2 \cdots n_m \qquad (2.2)$$

and then decomposing Eq (2.1) into m steps with $N/n_i$ transformations within each step. If $n_1 = n_2 = \cdots = n_m = 2$, the operations are reduced to the $N \log_2 N$ level from the previous $N^2$ level.
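The decomposition of Eq (2.2) can be sketched recursively: at each level one factor of N is split off, leaving smaller DFTs that are recombined with twiddle factors. This is a minimal Python illustration of the general Cooley-Tukey idea, under the assumption that the smallest prime factor is peeled off first. It is not Singleton's or the IMSL program, and the function names are hypothetical.

```python
import cmath

def smallest_factor(N):
    """Smallest prime factor of N (N itself when N is prime or 1)."""
    p = 2
    while p * p <= N:
        if N % p == 0:
            return p
        p += 1
    return N

def mixed_radix_fft(x):
    """Recursive decimation-in-time FFT for any length N = n1*n2*...*nm.

    Each level splits x into p interleaved subsequences of length M = N/p,
    transforms them recursively, and combines them with twiddle factors
    W_N^(r*k). Each level costs about N*p complex multiply-adds, so the
    total grows like N*(n1 + ... + nm) instead of N^2.
    """
    N = len(x)
    p = smallest_factor(N)
    if p == N:  # prime length (or 1): evaluate the DFT directly
        return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
                for k in range(N)]
    M = N // p
    subs = [mixed_radix_fft(x[r::p]) for r in range(p)]  # p DFTs of length M
    return [sum(cmath.exp(-2j * cmath.pi * r * k / N) * subs[r][k % M] for r in range(p))
            for k in range(N)]
```

The `k % M` index uses the periodicity of each length-M transform, the same property Section 3.2.1 exploits for radix 2.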

Most of the early work on the FFT (Bergland, 1968) was directed toward the special case $N = 2^m$, which yielded simple and efficient algorithms. These algorithms are efficient because no multiplications are needed to evaluate the 2-point DFT butterflies, which can reduce the operations count below the $N \log_2 N$ level.


Other "fixed radix" algorithms were studied, and Dubois and Venetsanopoulos published "A New Radix-3 Algorithm" in 1978, which demonstrated that a radix-3 butterfly could be computed without multiplications by defining a new basis (1, u) instead of using the complex plane (1, i) basis, where u is the complex cube root of unity. This technique was later shown to be limited to the special cases $N = 3^m$ and $N = 6^m$ (Burrus and Parks, 1979).
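The reason a radix-3 butterfly can avoid multiplications in the (1, u) basis is that $u^2 = -1 - u$, so multiplying $a + bu$ by u gives $-b + (a - b)u$, which needs only a sign change and a subtraction. The Python sketch below illustrates just the 3-point butterfly. It assumes $u = \exp(-j2\pi/3)$ and that the inputs are already stored as (a, b) pairs meaning $a + bu$; the helper names are hypothetical, and the full-length algorithm of Dubois and Venetsanopoulos is not shown.

```python
import cmath

# Assumption: u = exp(-j*2*pi/3). Any primitive cube root of unity satisfies
# u**2 + u + 1 = 0, i.e. u**2 = -1 - u, which is all the arithmetic below uses.
U = cmath.exp(-2j * cmath.pi / 3)

def times_u(z):
    """(a + b*u) * u = a*u + b*u**2 = -b + (a - b)*u : no multiplications."""
    a, b = z
    return (-b, a - b)

def add(*zs):
    """Component-wise sum of numbers stored as (a, b) = a + b*u."""
    return (sum(z[0] for z in zs), sum(z[1] for z in zs))

def dft3_ru(x0, x1, x2):
    """3-point DFT X(k) = sum_n x(n) * u**(n*k), computed with additions only."""
    x1u, x2u = times_u(x1), times_u(x2)
    x1uu, x2uu = times_u(x1u), times_u(x2u)
    return add(x0, x1, x2), add(x0, x1u, x2uu), add(x0, x1uu, x2u)

def to_complex(z):
    """Convert (a, b) back to the ordinary complex number a + b*u."""
    return z[0] + z[1] * U
```

Because $u^4 = u$, the X(2) output uses $u^2$ on x(1) and u on x(2).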

Based on Cooley and Tukey's paper "mixed-radix"

algorithms were written by Brenner and Singleton. The

most efficient and popular of these algorithms was "An

Algorithm For Computing the Mixed Radix Fast Fourier Trans-

form" published in 1969 by Singleton and is frequently used

in digital signal processing where a wider choice of N is

needed. The Singleton algorithm can perform the DFT using

FFT techniques of any length sequence N but becomes most

efficient when N is highly composite from the set of inte-

gers 2, 3, 4, and 5. If N is a prime number the algorithm

performs a DFT using $N^2$ operations. The Singleton algorithm

became the standard against which all future DFT techniques

were measured.

In 1968 Rader presented "DFTs when the Number of Data

Samples Is Prime" which showed that a prime number length

sequence contains an (N-1)-point circular convolution. He showed how to isolate the convolution by applying a permutation to the (N-1) signal points x(1), x(2), ..., x(N-1). He also gave the permutation applied to the complex multipliers from the set $\{\exp(-j2\pi nk/N),\ k = 1, 2, \ldots, N-1\}$. Both of the permutations were generated by using a "primitive" root, which exists for every prime length N

(McClellan and Rader, 1979). Rader's paper was largely

overlooked for many years but took on new significance when

Winograd presented his new DFT algorithm "On Computing the

Discrete Fourier Transform" in 1976.
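Rader's idea can be sketched in a few lines. With a primitive root g of the prime N, setting $n = g^q$ and $k = g^{-p}$ turns the nonzero-index part of Eq (2.1) into $X(g^{-p}) = x(0) + \sum_{q=0}^{N-2} x(g^q)\, W_N^{g^{q-p}}$, a circular convolution of length N-1. The Python below is a minimal illustration with hypothetical function names; it evaluates the convolution directly, whereas Winograd's contribution was computing such convolutions with fast algorithms.

```python
import cmath

def primitive_root(N):
    """Smallest primitive root of a prime N (brute force; fine for short lengths)."""
    for g in range(2, N):
        if len({pow(g, q, N) for q in range(N - 1)}) == N - 1:
            return g
    return 1  # N == 2: the only nonzero residue is 1

def rader_dft(x):
    """DFT of prime length N through Rader's (N-1)-point circular convolution.

    a(q) = x(g^q) and b(q) = W^(g^-q); then X(g^-p) = x(0) + sum_q a(q) b(p - q).
    """
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    g = primitive_root(N)
    g_inv = pow(g, N - 2, N)                      # inverse of g modulo the prime N
    L = N - 1
    a = [x[pow(g, q, N)] for q in range(L)]       # permuted signal points
    b = [W ** pow(g_inv, q, N) for q in range(L)] # permuted complex multipliers
    conv = [sum(a[q] * b[(p - q) % L] for q in range(L)) for p in range(L)]
    X = [0j] * N
    X[0] = sum(x)
    for p in range(L):
        X[pow(g_inv, p, N)] = x[0] + conv[p]
    return X
```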

Winograd combined Rader's idea of converting a DFT to

circular convolution with his own fast convolution algo-

rithms to produce a new DFT method called the "Winograd

Fourier Transform Algorithm" (WFTA). Winograd provided the

fast convolution algorithms for short prime and prime power

length sequences and proposed that longer transforms be

computed by "nesting" the short, high-speed transforms. He

presented a table comparing the WFTA to the radix-2 FFT

operations and showed that the number of additions remained

at the FFT levels while the number of multiplications was

significantly reduced.

Kolba and Parks published "A Prime Factor FFT Algorithm

Using High Speed Convolution" in 1977 which modified

Winograd's fast convolution algorithms to permit "shifts"

instead of multiplications by 1/2. They also changed the

nested structure of the WFTA in favor of a conventional FFT

decomposition. The decomposition of the sequence was based

on an algorithm proposed by Thomas, 1963, in his article

"Using a Computer to Solve Problems in Physics" which uses

an index mapping based on the Chinese Remainder Theorem.
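The Thomas (Chinese Remainder Theorem) mapping can be illustrated for two relatively prime factors $N = N_1 N_2$. In the Python sketch below, the input is read through $n = (N_2 n_1 + N_1 n_2) \bmod N$ and the output is written through the CRT map, so the N-point DFT becomes an $N_1 \times N_2$ array of short DFTs with no twiddle factors between them. Plain direct DFTs stand in for the fast-convolution short-N modules of Kolba and Parks; the function names are hypothetical, and `pow(a, -1, m)` needs Python 3.8 or later.

```python
import cmath

def dft(x):
    """Direct DFT, used here as a stand-in for the short-N modules."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def prime_factor_dft(x, N1, N2):
    """Two-factor prime factor DFT for N = N1*N2 with gcd(N1, N2) = 1 (sketch).

    Input map:  n = (N2*n1 + N1*n2) mod N
    Output map: k = (e1*k1 + e2*k2) mod N, where e1 = 1 mod N1, 0 mod N2
                and e2 = 0 mod N1, 1 mod N2 (Chinese Remainder Theorem).
    """
    N = N1 * N2
    # N2-point DFTs along n2 for each n1.
    rows = [dft([x[(N2 * n1 + N1 * n2) % N] for n2 in range(N2)]) for n1 in range(N1)]
    # N1-point DFTs along n1 for each k2; no twiddle factors are needed in between.
    cols = [dft([rows[n1][k2] for n1 in range(N1)]) for k2 in range(N2)]
    e1 = N2 * pow(N2, -1, N1)
    e2 = N1 * pow(N1, -1, N2)
    X = [0j] * N
    for k1 in range(N1):
        for k2 in range(N2):
            X[(e1 * k1 + e2 * k2) % N] = cols[k2][k1]
    return X
```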


Kolba and Parks selected several N length sequences and

compared their operations count to WFTA and FFT.

Paralleling Winograd's fast convolution work are the

studies into number theoretic transforms (NTTs) which have

been proposed for digital cyclic convolution and digital

filtering. The NTTs were first published by Pollard, 1971,

in "The Fast Fourier Transform in the Finite Field". He

showed that an analogous transform to the DFT exists in the

finite (or Galois) field, where the $\exp(j2\pi nk/N)$ terms are replaced by $r^{nk}$ in the DFT expression such that:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, r^{nk} \qquad (2.3)$$

Notice that Pollard chose the alternative definition of the

DFT where the exponent of e is positive. The r term is

defined in the Galois field (GF) such that the same cyclic

convolution properties exist in GF and in the complex field

for the DFT. He then proved that this analogous DFT could

apply prime factor decomposition to the N length sequence

and perform $N/n_i$ transformations to reduce the operations in GF to the $N \log_2 N$ level, which provided the FFT in GF.

Pollard proposed that this technique be applied to cyclic

convolutions in GF, multiplication of polynomials over

$GF(p^n)$, aperiodic convolution of integer sequences, multiplication of very large integers, division of polynomials over $GF(p)$, and a chirp-z transform for NTTs (McClellan and

Rader, 1979).
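A small numeric instance of Eq (2.3) shows the exact cyclic convolution property. The choice $p = 17$, $N = 8$, $r = 2$ is an illustrative assumption (2 has multiplicative order 8 modulo 17); the function names are hypothetical, and the true convolution values are recovered exactly only when they lie in $0 \le c < p$.

```python
P, N, R = 17, 8, 2  # illustrative: prime modulus p, length N, r of order N mod p

def ntt(x, r=R, p=P):
    """Pollard-style transform of Eq (2.3): X(k) = sum_n x(n) r^(nk) mod p."""
    n_len = len(x)
    return [sum(x[n] * pow(r, n * k, p) for n in range(n_len)) % p for k in range(n_len)]

def intt(X, r=R, p=P):
    """Inverse transform: x(n) = N^-1 * sum_k X(k) r^(-nk) mod p."""
    n_len = len(X)
    r_inv = pow(r, -1, p)
    n_inv = pow(n_len, -1, p)
    return [n_inv * sum(X[k] * pow(r_inv, n * k, p) for k in range(n_len)) % p
            for n in range(n_len)]

def cyclic_convolution(a, b):
    """Cyclic convolution of integer sequences via the transform, with no roundoff."""
    A, B = ntt(a), ntt(b)
    return intt([(u * v) % P for u, v in zip(A, B)])

print(cyclic_convolution([1, 2, 3, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0]))
# [1, 3, 5, 3, 0, 0, 0, 0]
```

Every operation is integer arithmetic modulo p, which is why no roundoff error can occur.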


Pollard's paper stimulated more study of the NTTs. Reed and Truong's 1975 paper, "The Use of Finite Fields to Compute Convolutions", includes complex valued NTTs. It was shown that this NTT over $GF(q^2)$ can reduce convolution operations to the FFT levels. If q is sufficiently large, the NTT can be used over $GF(q^2)$ to transform a sequence of complex integers x(n) into X(k) on $GF(q^2)$, for which the inverse transform of X(k) on $GF(q^2)$ is precisely the original sequence x(n). Using these ideas, filtering or convolutions without roundoff errors can be obtained on a sequence of complex integers.

Most applications of the NTTs have been in the areas

of digital filtering and convolution. The author was not

able to find any NTT algorithm which could be compared to

the FFT, WFTA, or PFA and perform all the same functions

as these three algorithms.

PFA, WFTA, and FFT represent the most efficient and

flexible FORTRAN programs available to perform the DFT.

Each algorithm has its own particular advantage over the

other two depending on machine size and speed for a particular

sequence length. None of the articles reviewed presents a comprehensive evaluation or comparison of the three algorithms based on real operations and memory arrays

required to perform a DFT for any sequence length N. This

paper fills that need so that an efficient algorithm can

be selected.


III. FFT Theory

The set of algorithms known as the Fast Fourier Transforms (FFT) uses a variety of methods to reduce the computation time required to evaluate the Discrete Fourier Transform (DFT). The DFT is the central part of most spectrum analysis problems, and the FFT can improve performance by a factor of 100 or more over direct evaluation of the DFT (Rabiner and Gold, 1975). Therefore, the FFT is crucially important to digital signal processing.

This chapter begins with "fixed radix" FFT algorithms

by discussing a "decimation-in-time" algorithm, the data

reordering (bit reversal) theory, the real operations

(addition and multiplication) count, a new fixed radix

algorithm based on the complex cube root of unity (the R(u) field), and then summarizes the

memory required to use the fixed radix algorithms. Next

the conventional "mixed" radix algorithms are presented

by discussing the theory, digit reversal, real operations

count, and memory required to utilize the mixed radix

algorithms. This theory chapter concludes with a dis-

cussion of mixed radix algorithms based on fast convolu-

tion. The theory, data reordering, real operations count

and memory are also presented for these algorithms.

Before the FFT algorithms are discussed, some comments must be made about computing the trigonometric function values needed to evaluate the DFT.


3.1 Computing Trigonometric Function Values

The trigonometric values used in FFTs can be represented as values on the unit circle. The values are based on integer powers of

$$\exp(-j2\pi/N)$$

which can be computed using sine and cosine functions. It

is useful to have accurate methods of generating the sine

and cosine terms other than the method of repeated use of

library sine and cosine functions.

The method most widely used in FFT algorithms (Singleton, 1967) generates the trigonometric functions by a difference equation given by:

$$\cos((k+1)a) = \left(C \cos(ka) - S \sin(ka)\right) + \cos(ka)$$

$$\sin((k+1)a) = \left(C \sin(ka) + S \cos(ka)\right) + \sin(ka)$$

where

$$C = -2\sin^2(a/2), \qquad S = \sin(a), \qquad \cos(0) = 1, \qquad \sin(0) = 0$$

This technique is used for all FFTs presented in this paper (except where noted otherwise) because it minimizes calls to the FORTRAN library subroutines cos(·) and sin(·), thereby reducing the overall FFT computation time.
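The difference equation is short enough to check numerically. The Python sketch below (the function name is hypothetical) generates $\cos(ka)$ and $\sin(ka)$ for $a = 2\pi/N$ with only two library sine calls inside the generator, then reports the largest deviation from direct library evaluation. Computing C as $-2\sin^2(a/2)$ rather than $\cos(a) - 1$ avoids cancellation when a is small.

```python
import math

def twiddles_by_recurrence(N):
    """(cos(k*a), sin(k*a)) for k = 0..N-1, a = 2*pi/N, via the Section 3.1 recurrence."""
    a = 2 * math.pi / N
    C = -2.0 * math.sin(a / 2) ** 2   # equals cos(a) - 1 without cancellation
    S = math.sin(a)
    c, s = 1.0, 0.0                    # cos(0), sin(0)
    out = []
    for _ in range(N):
        out.append((c, s))
        # Tuple assignment: both updates use the previous (c, s).
        c, s = (C * c - S * s) + c, (C * s + S * c) + s
    return out

N = 1024
err = max(max(abs(c - math.cos(2 * math.pi * k / N)), abs(s - math.sin(2 * math.pi * k / N)))
          for k, (c, s) in enumerate(twiddles_by_recurrence(N)))
print(f"max deviation from library values: {err:.1e}")
```

The sign of the exponent in $\exp(-j2\pi/N)$ is left to the FFT code that consumes these values.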


3.2 Fixed Radix Algorithms

While FFT algorithms are well known and widely used, they are relatively intricate and somewhat difficult to grasp at first reading. There are two excellent textbooks (Rabiner and Gold, 1975; Oppenheim and Schafer, 1975) which discuss the FFT theory in great detail and present FFTs based on decimation-in-time and decimation-in-frequency. Both texts spend a great deal of time discussing the radix-2 FFT, which is the most widely known and used. For this

reason, the radix-2 development is presented here as a

convenience for the reader and provides a theoretical

background from which the other fixed radix algorithms are

derived.

3.2.1 Development of Radix-2 Theory. To achieve the reduction in complex operations (defined as four real multiplications and two real additions) from $N^2$ to $N \log_2 N$, it is necessary to decompose the DFT computation into successively smaller DFT computations. In doing so, the symmetry and periodicity of the complex exponential $W_N^{nk} = \exp(-j2\pi nk/N)$ can be exploited. The radix-2 algorithm developed here decomposes the sequence x(n) in the DFT expression

$$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j2\pi nk/N), \qquad k = 0, 1, \ldots, N-1, \quad N = 2^m \qquad (3.1)$$

and is therefore known as a "decimation-in-time" algorithm (Oppenheim and Schafer, 1975). Since N is an even integer,


X(k) can be computed by separating x(n) into two N/2 length sequences consisting of the even-numbered points and the odd-numbered points in x(n). Using n = 2r for n even and n = 2r+1 for n odd, Eq (3.1) becomes:

$$X(k) = \sum_{r=0}^{T} x(2r)\, W_N^{2rk} + \sum_{r=0}^{T} x(2r+1)\, W_N^{(2r+1)k} \qquad (3.2)$$

where $T = (N/2) - 1$ and $W_N = \exp(-j2\pi/N)$. By expanding $W_N^{(2r+1)k}$ and factoring out $W_N^k$, Eq (3.2) can be rewritten as:

$$X(k) = \sum_{r=0}^{T} x(2r) \left(W_N^2\right)^{rk} + W_N^k \sum_{r=0}^{T} x(2r+1) \left(W_N^2\right)^{rk} \qquad (3.3)$$

But $W_N^2 = \exp(-j4\pi/N) = \exp(-j2\pi/(N/2)) = W_{N/2}$, and Eq (3.3) can be written as:

$$X(k) = \sum_{r=0}^{T} x(2r)\, W_{N/2}^{rk} + W_N^k \sum_{r=0}^{T} x(2r+1)\, W_{N/2}^{rk} = G(k) + W_N^k H(k) \qquad (3.4)$$

Each of the sums in Eq (3.4) is an $N/2$-point DFT. The first sum is over the even-numbered points of the original sequence and the second over the odd-numbered points. The index $k$ runs over $0, 1, \ldots, N-1$. Even so, each of the sums in Eq (3.4) need only be computed over $k = 0, 1, \ldots, (N/2)-1$, since $G(k)$ and $H(k)$ are periodic in $k$ with period $N/2$. After the two DFTs in Eq (3.4) are computed, they are combined to yield the $N$-point DFT, $X(k)$. Figure 3.1 shows the computation of $X(k)$ according to Eq (3.4) for an eight-point sequence.

Figure 3.1. Flowgraph of the Decimation-in-Time Decomposition of an N-Point DFT Computation into Two N/2-Point DFT Computations (N = 8). NOTE: The integers on the branches of the flowgraph represent the powers of $W_N$.

Figure 3.1 (Oppenheim and Schafer, 1975) uses the signal flowgraph convention that branches entering a node are summed to produce the node variable. When no coefficient is shown, the branch transmittance is assumed to be one. For other branches the transmittance is an integer power of $W_N$.

Note in Figure 3.1 that two four-point DFTs are computed, giving $G(k)$ and $H(k)$. $X(0)$ is obtained by multiplying $H(0)$ by $W_N^0$ and adding the product to $G(0)$. $X(1)$ is obtained by multiplying $H(1)$ by $W_N^1$ and adding the result to $G(1)$. For $X(4)$ it would follow that $H(4)$ is multiplied by $W_N^4$ and added to $G(4)$. However, $G(k)$ and $H(k)$ are both periodic in $k$ with period 4, so $H(4) = H(0)$ and $G(4) = G(0)$. Thus $X(4)$ results from multiplying $H(0)$ by $W_N^4$ and adding the product to $G(0)$.
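The single decimation step of Eq (3.4) can be checked numerically with a short Python sketch. It assumes Python's `cmath` module and uses the hypothetical helper names `dft` and `split_once`. The $N/2$-point DFTs are evaluated directly here. $G$ and $H$ are indexed modulo $N/2$, exactly as in the $X(4)$ example above.

```python
import cmath


def dft(x):
    """Direct evaluation of Eq (3.1): N**2 complex multiply-adds."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def split_once(x):
    """Eq (3.4): X(k) = G(k) + W_N^k H(k) from two N/2-point DFTs."""
    N = len(x)
    half = N // 2
    G = dft(x[0::2])   # DFT of the even-numbered points x(2r)
    H = dft(x[1::2])   # DFT of the odd-numbered points x(2r+1)
    # G and H have period N/2 in k, so X(4) uses G(0) and H(0) when N = 8.
    return [G[k % half] + cmath.exp(-2j * cmath.pi * k / N) * H[k % half]
            for k in range(N)]


x = [complex(n, (3 * n) % 5) for n in range(8)]
error = max(abs(a - b) for a, b in zip(dft(x), split_once(x)))
print(f"max |direct - split| = {error:.1e}")   # round-off level only
```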

With the $N$-point DFT of Eq (3.4) established, its number of computations can be compared with that of the direct DFT computation of Eq (3.1). The direct computation, without using symmetry properties, requires $N^2$ complex multiplications. Eq (3.4) requires the computation of two $N/2$-point DFTs. These require $2(N/2)^2$ complex multiplications and about $2(N/2)^2$ complex additions (Oppenheim and Schafer, 1975).

The two $N/2$-point DFTs must then be combined. This requires $N$ complex multiplications, corresponding to multiplying the second sum by $W_N^k$, and then $N$ complex additions, corresponding to adding the product to the first sum. As a result, the computation of Eq (3.4) for all values of $k$ requires $N + 2(N/2)^2$, or $N + N^2/2$, complex multiplications and additions. For $N > 2$, $N + N^2/2$ is less than $N^2$.

The expression in Eq (3.4) corresponds to decimating the original $N$-point sequence into even and odd $N/2$-point sequences. Since $N = 2^m$, $N/2$ is also even (for $N \ge 4$). Each of $G(k)$ and $H(k)$ can therefore be further decimated into two $N/4$-point DFTs, which are then combined to yield the $N/2$-point DFTs. With $g(r) = x(2r)$ and $h(r) = x(2r+1)$, decimating the $N/2$-point sequences in Eq (3.4) into $N/4$-point sequences gives:

$$G(k) = \sum_{r=0}^{(N/2)-1} g(r)W_{N/2}^{rk} = \sum_{p=0}^{(N/4)-1} g(2p)W_{N/2}^{2pk} + \sum_{p=0}^{(N/4)-1} g(2p+1)W_{N/2}^{(2p+1)k}$$

Letting $R = (N/4)-1$,

$$G(k) = \sum_{p=0}^{R} g(2p)W_{N/4}^{pk} + W_{N/2}^k\sum_{p=0}^{R} g(2p+1)W_{N/4}^{pk} \tag{3.5}$$

Similarly,

$$H(k) = \sum_{p=0}^{R} h(2p)W_{N/4}^{pk} + W_{N/2}^k\sum_{p=0}^{R} h(2p+1)W_{N/4}^{pk} \tag{3.6}$$

If the four-point DFTs in Figure 3.1 are computed using Eqs (3.5) and (3.6), the computation is carried out as indicated in Figure 3.2. Inserting the computation in Figure 3.2 into the flowgraph of Figure 3.1 produces the complete flowgraph in Figure 3.3. Note that $W_{N/2}^k = W_N^{2k}$ was used.


For the 8-point DFT used as an example, the computation has now been reduced to a computation of $N/4$-point DFTs, where $N/4 = 2$. An example 2-point DFT, for $x(0)$ and $x(4)$, is shown in Figure 3.4. The complete flowgraph for the computation of the 8-point DFT is shown in Figure 3.5. It was obtained by inserting the computation of Figure 3.4 into Figure 3.3.

For the more general case with $N$ a larger power of 2, the same decimation procedure would be continued. The $N/4$-point transforms in Eqs (3.5) and (3.6) would be decomposed into $N/8$-point transforms, and so on. This requires $v$ stages of computation, where $v = \log_2 N$.

Recall that decomposing the $N$-point transform into two $N/2$-point transforms required $N + 2(N/2)^2$ complex multiplications and additions. When the $N/2$-point transforms are decomposed into $N/4$-point transforms, the factor $(N/2)^2$ is replaced by $N/2 + 2(N/4)^2$. The overall computation then requires $N + N + 4(N/4)^2$ complex multiplications and additions. If $N = 2^v$, this can be done at most $v = \log_2 N$ times, "so that after carrying out this decomposition as many times as possible the number of complex multiplications and additions is equal to $N\log_2 N$" (Oppenheim and Schafer, 1975).
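The counting argument can be reproduced with a small recursive Python function. It is a sketch of the bookkeeping only, and `complex_ops` is a hypothetical name. It assumes, as the quotation above implies, that a 1-point DFT costs nothing. Under that assumption, carrying the decomposition as far as possible gives exactly $N\log_2 N$.

```python
def complex_ops(N, levels):
    """Complex multiplications (equally, additions) for an N-point DFT, N a power
    of 2, decimated `levels` times before the remaining DFTs are done directly."""
    if N == 1:
        return 0                    # a 1-point DFT is the sample itself
    if levels == 0:
        return N * N                # direct evaluation without symmetry
    # Two half-length DFTs, plus N multiplications by W_N^k and N additions.
    return N + 2 * complex_ops(N // 2, levels - 1)


N = 1024
v = N.bit_length() - 1              # v = log2(N) = 10
for levels in (0, 1, 2, v):
    print(levels, complex_ops(N, levels))
```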

The flowgraph of Figure 3.5 displays the operations explicitly. Counting the branches with transmittances of the form $W_N^r$ shows that each stage has $N$ complex multiplications and $N$ complex additions. Since there are $\log_2 N$ stages, there are a total of $N\log_2 N$ complex multiplications and additions, as shown before. Further reductions in the complex operations count can be achieved by exploiting the symmetry and periodicity of $W_N$.

Note that in each "stage" of Figure 3.5 the computation takes a set of $N$ complex numbers and transforms them into another set of $N$ complex numbers. This process is repeated $v = \log_2 N$ times, resulting in the DFT computation. For example, in computing the first stage of Figure 3.5, one set of storage registers would contain the input data sequence. A second set of storage registers would contain the computed results for the first stage.

The sequence of numbers resulting from the $m$th stage of computation is denoted $X_m(i)$, where $i = 0, 1, \ldots, N-1$ and $m = 1, 2, \ldots, v$; $X_0(i)$ denotes the input array. For the following stage, the previous output array, $X_m(i)$, becomes the input array, and the new output array is $X_{m+1}(i)$ for the $(m+1)$st stage of computation. Using this notation, the basic flowgraph in Figure 3.5 is given by Figure 3.6. Using the notation of Figure 3.6, the equations of the butterfly are given by:

$$X_{m+1}(p) = X_m(p) + W_N^r X_m(q) \tag{3.7}$$

$$X_{m+1}(q) = X_m(p) + W_N^{r+N/2} X_m(q) \tag{3.8}$$

Because of the appearance of Figure 3.6, the computations of Eqs (3.7) and (3.8) are referred to as the "butterfly" computations.


The number of complex multiplications can be reduced by a factor of 2 using the symmetry:

$$W_N^{N/2} = \exp\left(-j(2\pi/N)\cdot N/2\right) = \exp(-j\pi) = -1 \tag{3.9}$$

so that Eqs (3.7) and (3.8) become:

$$X_{m+1}(p) = X_m(p) + W_N^r X_m(q) \tag{3.10}$$

$$X_{m+1}(q) = X_m(p) - W_N^r X_m(q) \tag{3.11}$$

Eqs (3.10) and (3.11) are shown in Figure 3.7, which places the "twiddle factor" $W_N^r$ in front of the butterfly. The product $W_N^r X_m(q)$ is therefore formed once and used twice. There are $N/2$ "butterflies" of the form of Figure 3.7 per stage and $\log_2 N$ stages. The total number of complex multiplications required is therefore $(N/2)\log_2 N$, instead of the $N\log_2 N$ used in Figure 3.5. Figure 3.8 is obtained from Figure 3.5 by replacing the butterfly of Figure 3.6 with the "twiddle factor" butterfly flowgraph of Figure 3.7.
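A compact Python sketch of the resulting radix-2 decimation-in-time FFT follows. It is an illustration under stated assumptions, not a transcription of the Appendix A FORTRAN:

- The input is first put in bit-reversed order, which Section 3.2.4 discusses.
- Each stage applies the twiddle-factor butterfly of Eqs (3.10) and (3.11).
- Twiddle factors come from `cmath.exp` rather than the Section 3.1 recurrence.

The function names `reverse_bits` and `fft_radix2` are hypothetical. A counter confirms the $(N/2)\log_2 N$ complex multiplications.

```python
import cmath


def reverse_bits(i, m):
    """Reverse the m-bit binary representation of index i."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Radix-2 DIT FFT; returns (X, number of complex multiplications)."""
    N = len(x)
    m = N.bit_length() - 1
    if N != 1 << m:
        raise ValueError("length must be a power of 2")
    X = [complex(x[reverse_bits(i, m)]) for i in range(N)]   # input array X_0
    mults = 0
    span = 1                                  # distance between butterfly points
    while span < N:
        group = 2 * span
        for r in range(span):                 # one twiddle factor type per r
            w = cmath.exp(-2j * cmath.pi * r / group)   # W_N^(r*N/group)
            for p in range(r, N, group):
                q = p + span
                t = w * X[q]                  # twiddle factor applied once
                mults += 1
                X[p], X[q] = X[p] + t, X[p] - t   # Eqs (3.10) and (3.11)
        span = group
    return X, mults


X, mults = fft_radix2([1, 2, 3, 4, 0, 0, 0, 0])
print(mults)   # (N/2) * log2(N) = 4 * 3 = 12
```

In this sketch the loops hold one twiddle factor fixed while every butterfly that uses it is processed. That grouping is the same "types of twiddle factors" bookkeeping used in Section 3.3.3.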

3.2.2 Development of Radix-3 FFT Theory. Start with the restriction that the length of the $N$-point sequence be an integer power of three ($N = 3^m$, $m = 1, 2, 3, \ldots$). The DFT $X(k)$ is then computed by separating the discrete-time sequence $x(n)$ into three $N/3$-point sequences. $X(k)$ is given by the DFT expression:

$$X(k) = \sum_{n=0}^{N-1} x(n)W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \quad W_N = \exp(-j2\pi/N) \tag{3.12}$$

Breaking $x(n)$ into three $N/3$-point sequences yields $x(3r)$, $x(3r+1)$, and $x(3r+2)$. Substitute these into Eq (3.12) and change the upper limits of the respective summations to $(N/3)-1$. This yields:


$$X(k) = \sum_{r=0}^{P} x(3r)W_N^{3rk} + \sum_{r=0}^{P} x(3r+1)W_N^{(3r+1)k} + \sum_{r=0}^{P} x(3r+2)W_N^{(3r+2)k}, \qquad P = (N/3)-1 \tag{3.13}$$

By regrouping the exponents of $W_N$, Eq (3.13) can be rewritten as:

$$X(k) = \sum_{r=0}^{P} x(3r)W_N^{3rk} + W_N^k\sum_{r=0}^{P} x(3r+1)W_N^{3rk} + W_N^{2k}\sum_{r=0}^{P} x(3r+2)W_N^{3rk} \tag{3.14}$$

By rewriting $W_N^3$ as:

$$W_N^3 = \exp(-j6\pi/N) = \exp(-j2\pi/(N/3)) = W_{N/3} \tag{3.15}$$

Eq (3.14) can be expressed as:

$$X(k) = \sum_{r=0}^{P} x(3r)W_{N/3}^{rk} + W_N^k\sum_{r=0}^{P} x(3r+1)W_{N/3}^{rk} + W_N^{2k}\sum_{r=0}^{P} x(3r+2)W_{N/3}^{rk} \tag{3.16}$$

Each of the sums in Eq (3.16) represents an $N/3$-point DFT:

- The first is the $N/3$-point DFT of the points $x(3r)$ of the original sequence.
- The second is the DFT of the points $x(3r+1)$.
- The third is the DFT of the points $x(3r+2)$.

The index $k$ of $X(k)$ ranges over $N$ values ($k = 0, 1, \ldots, N-1$). Even so, each summation in Eq (3.16) needs to be computed only for the $N/3$ values $k = 0, 1, \ldots, (N/3)-1$, because each is periodic in $k$ with period $N/3$. Eq (3.16) can be rewritten to reflect this:

$$X(k) = F(k) + W_N^k G(k) + W_N^{2k} H(k) \tag{3.17}$$

Eq (3.17) can be implemented as the butterfly flowgraph in Figure 3.9 using the accepted notational conventions (Oppenheim and Schafer, 1975). When no coefficient is shown, the branch transmittance is assumed to be one. For other branches the transmittance (multiplier) is an integer power of $W_N$. In Figure 3.9 there are three $N/3$-point DFTs, where $r = 0, 1, \ldots, (N/3)-1$:

- $F(k)$ designates the three-point DFT of the points $x(3r)$.
- $G(k)$ designates the three-point DFT of the points $x(3r+1)$.
- $H(k)$ designates the three-point DFT of the points $x(3r+2)$.

$X(0)$ is obtained by (1) multiplying $H(0)$ by a branch transmittance of 1 (which equals $W_N^0$), (2) multiplying $G(0)$ by 1, (3) multiplying $F(0)$ by 1, and (4) summing the three. Likewise, $X(1)$ is obtained by multiplying $H(1)$ by $W_N^2$, multiplying $G(1)$ by $W_N^1$, and adding the results to $F(1)$. For $X(6)$, $H(6)$ is multiplied by $W_N^{12}$ and $G(6)$ by $W_N^6$, and the products are added to $F(6)$, giving:

$$X(6) = F(6) + W_N^6 G(6) + W_N^{12} H(6) \tag{3.18}$$

However, $F(k)$, $G(k)$, and $H(k)$ are all periodic in $k$ with period $N/3 = 3$. This periodicity yields $F(6) = F(0)$, $G(6) = G(0)$, and $H(6) = H(0)$. Substituting these results into Eq (3.18) gives:

$$X(6) = F(0) + W_N^6 G(0) + W_N^{12} H(0) \tag{3.19}$$

Figure 3.9. Butterfly Flowgraph for First-Stage Decimation (N = 9). NOTE: The numbers on the branches represent powers of $W_9$.

Continuing to use the periodic properties, the results for $X(0)$ through $X(8)$ are:

$$X(0) = F(0) + G(0) + H(0) \tag{3.20}$$

$$X(1) = F(1) + W_9^1 G(1) + W_9^2 H(1) \tag{3.21}$$

$$X(2) = F(2) + W_9^2 G(2) + W_9^4 H(2) \tag{3.22}$$

$$X(3) = F(0) + W_9^3 G(0) + W_9^6 H(0) \tag{3.23}$$

$$X(4) = F(1) + W_9^4 G(1) + W_9^8 H(1) \tag{3.24}$$

$$X(5) = F(2) + W_9^5 G(2) + W_9^{10} H(2) \tag{3.25}$$

$$X(6) = F(0) + W_9^6 G(0) + W_9^{12} H(0) \tag{3.26}$$

$$X(7) = F(1) + W_9^7 G(1) + W_9^{14} H(1) \tag{3.27}$$

$$X(8) = F(2) + W_9^8 G(2) + W_9^{16} H(2) \tag{3.28}$$
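The first-stage decimation of Eq (3.17) and Eqs (3.20) through (3.28) can be checked with a short Python sketch. The names `dft` and `radix3_first_stage` are hypothetical, and the three $N/3$-point DFTs are evaluated directly. Indexing $F$, $G$, and $H$ modulo $N/3$ is exactly the periodicity argument used for $X(6)$.

```python
import cmath


def dft(x):
    """Direct DFT, used both for the N/3-point pieces and as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def radix3_first_stage(x):
    """Eq (3.17): X(k) = F(k) + W_N^k G(k) + W_N^2k H(k), N divisible by 3."""
    N = len(x)
    third = N // 3
    F = dft(x[0::3])   # points x(3r)
    G = dft(x[1::3])   # points x(3r+1)
    H = dft(x[2::3])   # points x(3r+2)

    def W(e):
        return cmath.exp(-2j * cmath.pi * e / N)   # W_N^e

    # Period N/3: for N = 9, X(6) uses F(0), G(0), H(0) as in Eq (3.26).
    return [F[k % third] + W(k) * G[k % third] + W(2 * k) * H[k % third]
            for k in range(N)]


x = [complex((n * n) % 7, n % 2) for n in range(9)]
print(max(abs(a - b) for a, b in zip(dft(x), radix3_first_stage(x))))
```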

Eqs (3.20) through (3.28) conclude the first-stage decimation of the 9-point sequence. The DFT computation has been reduced to computations of $N/3$-point DFTs, where $N/3 = 3$. An example 3-point DFT for $x(0)$, $x(3)$, and $x(6)$ is shown in Figure 3.10. The complete flowgraph for the computation of the 9-point DFT is shown in Figure 3.11. It was obtained by substituting the flowgraph of Figure 3.10 into Figure 3.9.

For the more general case with $N$ a larger power of 3, the same decimation procedure would be continued. The $N/3$-point DFTs $F(k)$, $G(k)$, and $H(k)$ would be decomposed into $N/9$-point computations. The DFT $F(k)$ is:

Figure 3.11. Complete Butterfly Flowgraph (N = 9).

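$$F(k) = \sum_{r=0}^{(N/3)-1} f(r)W_{N/3}^{rk} \tag{3.29}$$

Here $f(r) = x(3r)$, $g(r) = x(3r+1)$, and $h(r) = x(3r+2)$ are the sequences whose $N/3$-point DFTs are $F(k)$, $G(k)$, and $H(k)$. Letting $Q = (N/9)-1$, Eq (3.29) can be divided into three sums over $N/9$-length sequences:

$$F(k) = \sum_{i=0}^{Q} f(3i)W_{N/3}^{3ik} + \sum_{i=0}^{Q} f(3i+1)W_{N/3}^{(3i+1)k} + \sum_{i=0}^{Q} f(3i+2)W_{N/3}^{(3i+2)k} \tag{3.30}$$

Expanding the exponents of $W_{N/3}$, Eq (3.30) can be rewritten:

$$F(k) = \sum_{i=0}^{Q} f(3i)W_{N/3}^{3ik} + W_{N/3}^k\sum_{i=0}^{Q} f(3i+1)W_{N/3}^{3ik} + W_{N/3}^{2k}\sum_{i=0}^{Q} f(3i+2)W_{N/3}^{3ik} \tag{3.31}$$

Using the substitution $W_{N/3}^3 = W_{N/9}$:

$$F(k) = \sum_{i=0}^{Q} f(3i)W_{N/9}^{ik} + W_{N/3}^k\sum_{i=0}^{Q} f(3i+1)W_{N/9}^{ik} + W_{N/3}^{2k}\sum_{i=0}^{Q} f(3i+2)W_{N/9}^{ik} \tag{3.32}$$

Similar expressions for $G(k)$ and $H(k)$ can be derived:

$$G(k) = \sum_{i=0}^{Q} g(3i)W_{N/9}^{ik} + W_{N/3}^k\sum_{i=0}^{Q} g(3i+1)W_{N/9}^{ik} + W_{N/3}^{2k}\sum_{i=0}^{Q} g(3i+2)W_{N/9}^{ik} \tag{3.33}$$

$$H(k) = \sum_{i=0}^{Q} h(3i)W_{N/9}^{ik} + W_{N/3}^k\sum_{i=0}^{Q} h(3i+1)W_{N/9}^{ik} + W_{N/3}^{2k}\sum_{i=0}^{Q} h(3i+2)W_{N/9}^{ik} \tag{3.34}$$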

Eqs (3.32) through (3.34) were used to derive the general expressions for a radix-3 butterfly flowgraph. Letting $N = 9$ (so that $Q = 0$), the expressions for $F(k)$, $G(k)$, and $H(k)$ become:

$$\begin{aligned} F(0) &= f(0) + W_3^0 f(1) + W_3^0 f(2)\\ F(1) &= f(0) + W_3^1 f(1) + W_3^2 f(2)\\ F(2) &= f(0) + W_3^2 f(1) + W_3^4 f(2) \end{aligned} \tag{3.35}$$

$$\begin{aligned} G(0) &= g(0) + W_3^0 g(1) + W_3^0 g(2)\\ G(1) &= g(0) + W_3^1 g(1) + W_3^2 g(2)\\ G(2) &= g(0) + W_3^2 g(1) + W_3^4 g(2) \end{aligned} \tag{3.36}$$

$$\begin{aligned} H(0) &= h(0) + W_3^0 h(1) + W_3^0 h(2)\\ H(1) &= h(0) + W_3^1 h(1) + W_3^2 h(2)\\ H(2) &= h(0) + W_3^2 h(1) + W_3^4 h(2) \end{aligned} \tag{3.37}$$

From Eqs (3.35) through (3.37), the general butterfly multipliers are derived (consistent with Oppenheim and Schafer) to be:

$$X(k) = F(k) + W_N^k G(k) + W_N^{2k} H(k) \tag{3.38}$$

$$X(k+r) = F(k) + W_N^{k+r} G(k) + W_N^{2k+2r} H(k) \tag{3.39}$$

$$X(k+2r) = F(k) + W_N^{k+2r} G(k) + W_N^{2k+4r} H(k) \tag{3.40}$$

where $r$ represents the distance between the endpoints of the butterfly. In Figure 3.11, $r = 1$ for stage 1 and $r = 3$ for stage 2. Eqs (3.38) through (3.40) are represented in Figure 3.12, which is the general radix-3 butterfly.

The exponents of Figure 3.12 can be rewritten as:

$$W_N^{k+r} = W_N^k W_N^r \tag{3.41}$$

$$W_N^{2k+2r} = W_N^{2k} W_N^{2r} \tag{3.42}$$

$$W_N^{k+2r} = W_N^k W_N^{2r} \tag{3.43}$$

$$W_N^{2k+4r} = W_N^{2k} W_N^{4r} \tag{3.44}$$

With these expressions for the butterfly multipliers, an alternative arrangement to Figure 3.12 is possible. The butterfly inputs $G(k)$ and $H(k)$ are "premultiplied," or "twiddled," before the butterfly (Gentleman and Sande, 1966). The multipliers $W_N^k$ and $W_N^{2k}$ represent the twiddle factors of the butterfly in Figure 3.13. Since $N = 3r$ (Oppenheim and Schafer, 1975), the butterfly multipliers reduce to:

$$W_N^r = W_{3r}^r = \exp(-j2\pi r/3r) = \exp(-j2\pi/3) = -0.5 - j0.866 \tag{3.45}$$

$$W_N^{2r} = \exp(-j4\pi/3) = -0.5 + j0.866 \tag{3.46}$$

$$W_N^{4r} = \exp(-j8\pi/3) = -0.5 - j0.866 \tag{3.47}$$

Oppenheim and Schafer observed that the twiddle-factor version in Figure 3.13 offers no advantage over Figure 3.12. Their reason is that $\exp(-j2\pi/3)$ and all of its powers are complex coefficients that still require multiplications.

However, for the particular FORTRAN radix-3 FFT programs which implement Figures 3.12 and 3.13, the twiddle factor version was much more efficient to implement. Only two twiddle factors ($W_N^k$ and $W_N^{2k}$) had to be computed per butterfly, and the butterfly multipliers were the constants in Eqs (3.45) and (3.46). The original version of Figure 3.12, by contrast, requires that all six complex multipliers be computed for each butterfly. The twiddle factor version therefore represents a simplification over the original radix-3 butterfly.
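The twiddle-factor form of the radix-3 butterfly can be sketched recursively in Python. This is an assumption-laden illustration rather than the Appendix B FORTRAN:

- It recurses on $x(3r)$, $x(3r+1)$, and $x(3r+2)$ as in Eq (3.16), instead of working in place on digit-reversed data.
- The name `fft_radix3` is hypothetical.
- Twiddle factors come from `cmath.exp`.

Per butterfly, only $W_N^k$ and $W_N^{2k}$ vary. The constants of Eqs (3.45) through (3.47) are computed once.

```python
import cmath

# Constant butterfly multipliers, Eqs (3.45)-(3.47), computed once.
W_R = cmath.exp(-2j * cmath.pi / 3)   # W_N^r  = -0.5 - j0.866
W_2R = W_R * W_R                      # W_N^2r = -0.5 + j0.866
W_4R = W_2R * W_2R                    # W_N^4r = -0.5 - j0.866 (same as W_N^r)


def fft_radix3(x):
    """Radix-3 decimation-in-time FFT for len(x) a power of 3."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 3:
        raise ValueError("length must be a power of 3")
    r = N // 3                        # butterfly span at this level
    F = fft_radix3(x[0::3])
    G = fft_radix3(x[1::3])
    H = fft_radix3(x[2::3])
    X = [0j] * N
    for k in range(r):
        w_k = cmath.exp(-2j * cmath.pi * k / N)   # twiddle factor W_N^k
        g = w_k * G[k]                # twiddled input W_N^k G(k)
        h = w_k * w_k * H[k]          # twiddled input W_N^2k H(k)
        X[k] = F[k] + g + h                         # Eq (3.38)
        X[k + r] = F[k] + W_R * g + W_2R * h        # Eq (3.39)
        X[k + 2 * r] = F[k] + W_2R * g + W_4R * h   # Eq (3.40)
    return X


print([round(abs(v), 6) for v in fft_radix3([1, 1, 1, 0, 0, 0, 0, 0, 0])])
```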

3.2.3 Radix-5 Theory. The theory for the radix-5 algorithm follows a development similar to that of the radix-3. Because of this similarity, only the radix-5 results are given here, for comparison with the radix-3. Readers interested in the detailed development are referred to Appendix D.

The basic butterfly multipliers for the radix-5 are given by:

$$X(k) = A(k) + W_N^k B(k) + W_N^{2k} C(k) + W_N^{3k} D(k) + W_N^{4k} E(k) \tag{3.48}$$

$$X(k+r) = A(k) + W_N^{k+r} B(k) + W_N^{2k+2r} C(k) + W_N^{3k+3r} D(k) + W_N^{4k+4r} E(k) \tag{3.49}$$

$$X(k+2r) = A(k) + W_N^{k+2r} B(k) + W_N^{2k+4r} C(k) + W_N^{3k+6r} D(k) + W_N^{4k+8r} E(k) \tag{3.50}$$

$$X(k+3r) = A(k) + W_N^{k+3r} B(k) + W_N^{2k+6r} C(k) + W_N^{3k+9r} D(k) + W_N^{4k+12r} E(k) \tag{3.51}$$

$$X(k+4r) = A(k) + W_N^{k+4r} B(k) + W_N^{2k+8r} C(k) + W_N^{3k+12r} D(k) + W_N^{4k+16r} E(k) \tag{3.52}$$

Eqs (3.48) through (3.52) are shown in the twiddle factor butterfly of Figure 3.14, where $r$ is the distance between the butterfly endpoints. Since $N = 5r$, the butterfly multipliers reduce to the constant complex multipliers:

$$W_N^r = W_N^{6r} = W_N^{16r} = \cos(2\pi/5) - j\sin(2\pi/5)$$

$$W_N^{2r} = W_N^{12r} = \cos(4\pi/5) - j\sin(4\pi/5)$$

$$W_N^{3r} = \left(W_N^{2r}\right)^* = W_N^{8r} = \cos(4\pi/5) + j\sin(4\pi/5)$$

$$W_N^{4r} = \left(W_N^{r}\right)^* = W_N^{9r} = \cos(2\pi/5) + j\sin(2\pi/5)$$

where $^*$ denotes the complex conjugate.

These constant butterfly multipliers are computed once during the FFT computation and used in every radix-5 butterfly.
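The reductions above can be confirmed numerically with a few lines of Python. Here $r = 7$ is an arbitrary span chosen for illustration; any positive integer works. The helper name `W` is hypothetical.

```python
import cmath
import math


def W(e, N):
    """W_N^e = exp(-j 2 pi e / N)."""
    return cmath.exp(-2j * cmath.pi * e / N)


r = 7                     # arbitrary butterfly span; N = 5r
N = 5 * r
M1 = complex(math.cos(2 * math.pi / 5), -math.sin(2 * math.pi / 5))   # W_N^r
M2 = complex(math.cos(4 * math.pi / 5), -math.sin(4 * math.pi / 5))   # W_N^2r

checks = {
    "W^r = W^6r = W^16r": [W(e, N) - M1 for e in (r, 6 * r, 16 * r)],
    "W^2r = W^12r": [W(e, N) - M2 for e in (2 * r, 12 * r)],
    "W^3r = conj(W^2r) = W^8r": [W(e, N) - M2.conjugate() for e in (3 * r, 8 * r)],
    "W^4r = conj(W^r) = W^9r": [W(e, N) - M1.conjugate() for e in (4 * r, 9 * r)],
}
for name, diffs in checks.items():
    print(name, max(abs(d) for d in diffs) < 1e-12)
```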

3.2.4 Digit Reversal Algorithm. For the DFT to be computed as discussed above, the input data must be stored in nonsequential order. The input data are stored in "bit-reversed" order for the radix-2 FFT and in "digit-reversed" order for the other fixed-radix algorithms. To see what this terminology means, consider the 8-point radix-2 flowgraph of Figure 3.8. Three binary digits are required to index through its data array. Writing the input indices of $X_0$ in binary form and then reversing the order of the bits gives:


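$$\begin{aligned} X_0(0) &= X_0(000) = x(000) = x(0)\\ X_0(1) &= X_0(001) = x(100) = x(4)\\ X_0(2) &= X_0(010) = x(010) = x(2)\\ X_0(3) &= X_0(011) = x(110) = x(6)\\ X_0(4) &= X_0(100) = x(001) = x(1)\\ X_0(5) &= X_0(101) = x(101) = x(5)\\ X_0(6) &= X_0(110) = x(011) = x(3)\\ X_0(7) &= X_0(111) = x(111) = x(7) \end{aligned} \tag{3.53}$$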

Let $(n_2 n_1 n_0)$ be the binary representation of the index of the sequence $x(n)$. Then the sequence value $x(n_2 n_1 n_0)$ is stored in array position $X_0(n_0 n_1 n_2)$. That is, to determine the position of $x(n_2 n_1 n_0)$ in the input array, the bits of the index $n$ must be reversed in order.
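The bit-reversed ordering of Eq (3.53) can be generated by reversing the $m$-bit binary index, as in this minimal Python sketch. The helper name `reverse_bits` is hypothetical.

```python
def reverse_bits(i, m):
    """Return the m-bit index i with its bit order reversed."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (i & 1)    # move the lowest remaining bit of i into r
        i >>= 1
    return r


m = 3                              # N = 2**3 = 8
for i in range(1 << m):
    j = reverse_bits(i, m)
    print(f"X0({i}) = X0({i:03b}) = x({j:03b}) = x({j})")
```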

For the radix-3 FFT the input array must be in a similar nonsequential order. The order is determined by "digit reversing" the index of each input sequence value using a modulo-3 counter. The digit-reversed radix-3 FFT example for $N = 9$ is shown in Figure 3.15. The modulo-3 counter is given by:

$$\mathrm{COUNT} = (b_1 \cdot 3^1) + (b_0 \cdot 3^0) \tag{3.54}$$

where $b_k = 0, 1, 2$. The reversed count is given by:

$$\mathrm{REVCOUNT} = (b_0 \cdot 3^1) + (b_1 \cdot 3^0) \tag{3.55}$$

Eqs (3.54) and (3.55) show the modulo-3 counter for $N = 9$. It requires only two base-3 digits, $b_1$ and $b_0$, to represent the index of the input sequence. For the case $N = 3^3 = 27$, three digits are needed to represent the index of the input sequence $x(n)$, and the modulo-3 counter becomes:

Figure 3.15. Digit-Reversed Input and Output Arrays (N = 9).

$$\mathrm{COUNT} = (b_2 \cdot 3^2) + (b_1 \cdot 3^1) + (b_0 \cdot 3^0) \tag{3.56}$$

and the reversed digit counter is:

$$\mathrm{REVCOUNT} = (b_0 \cdot 3^2) + (b_1 \cdot 3^1) + (b_2 \cdot 3^0) \tag{3.57}$$

Similarly, general expressions for COUNT and REVCOUNT can be given for $N = 3^m$ and $b_k = 0, 1, 2$:

$$\mathrm{COUNT} = (b_{m-1} \cdot 3^{m-1}) + (b_{m-2} \cdot 3^{m-2}) + \cdots + (b_1 \cdot 3^1) + (b_0 \cdot 3^0) \tag{3.58}$$

and

$$\mathrm{REVCOUNT} = (b_0 \cdot 3^{m-1}) + (b_1 \cdot 3^{m-2}) + \cdots + (b_{m-2} \cdot 3^1) + (b_{m-1} \cdot 3^0) \tag{3.59}$$

Once COUNT and REVCOUNT are computed, their magnitudes are compared:

- If REVCOUNT is less than or equal to COUNT, no swap of the values indexed by COUNT and REVCOUNT is required. Either the pair has already been exchanged or the index is its own reversal.
- Otherwise, the array value indexed by COUNT is exchanged with the array value indexed by REVCOUNT.

COUNT is then incremented by one and REVCOUNT is advanced to the digit reversal of the new COUNT. The process continues until all $N$ indices have been tested.
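The swap procedure can be sketched in Python as follows. As a simplifying assumption, REVCOUNT is computed directly from COUNT by repeated division (`divmod`) rather than by a separately incremented reversed counter. The function names are hypothetical, and the `radix` parameter lets the same sketch reproduce the radix-2 bit reversal.

```python
def digit_reverse(count, m, radix=3):
    """REVCOUNT for COUNT: the m base-`radix` digits of count in reverse order."""
    rev = 0
    for _ in range(m):
        count, digit = divmod(count, radix)   # peel off b0, then b1, ...
        rev = rev * radix + digit             # b0 ends up most significant
    return rev


def digit_reverse_in_place(a, radix=3):
    """Reorder list a (length radix**m) into digit-reversed order by swapping."""
    N = len(a)
    m = 0
    while radix ** m < N:
        m += 1
    if radix ** m != N:
        raise ValueError("length must be a power of the radix")
    for count in range(N):
        revcount = digit_reverse(count, m, radix)
        if revcount > count:          # REVCOUNT <= COUNT: already swapped or fixed
            a[count], a[revcount] = a[revcount], a[count]
    return a


print(digit_reverse_in_place(list(range(9))))   # [0, 3, 6, 1, 4, 7, 2, 5, 8]
```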

3.2.5 Development of a Radix-3 FFT Using Cube Roots of Unity. This section presents the theory of a radix-3 FFT algorithm which uses the complex cube root of unity to perform the complex Fourier transformation (butterfly) without using multiplications. The benefit of this technique is also discussed in the section on real operations count.

Dubois and Venetsanopoulos (1978) present this technique in a form which aids in understanding the theory, and for that reason it is presented again here.

This algorithm uses the basis vectors $(1, u)$ instead of the conventional complex-plane vectors $(1, j)$ to perform the complex Fourier transform. Here $u$ is a complex cube root of 1 and $j$ is the square root of $-1$. The new basis vectors use the arithmetic notation:

$$a + bu \in R(u); \qquad a, b \text{ real numbers} \tag{3.60}$$

Taking $u$ as a cube root of 1 implies:

$$u^3 - 1 = 0 \tag{3.61}$$

or

$$(u - 1)(u^2 + u + 1) = 0 \tag{3.62}$$

Since $u \neq 1$ by choice,

$$u^2 + u + 1 = 0 \tag{3.63}$$

or

$$u^2 = -1 - u \tag{3.64}$$

Eq (3.60) is used in the definition of multiplication in the $R(u)$ field:

$$(a + bu)(c + du) = ac + bdu^2 + adu + bcu \tag{3.65}$$

Substituting Eq (3.64) into Eq (3.65) results in:

$$(a + bu)(c + du) = (ac - bd) + \left(ad + b(c - d)\right)u \tag{3.66}$$

The coefficient of $u$ in Eq (3.66) can be expanded and then recombined to reduce the number of multiplications:

$$ad + b(c - d) = ad + bc - bd - bd + bd + ac - ac \tag{3.67}$$

$$= ac + ad + bc + bd - ac - bd - bd \tag{3.68}$$

$$= (a + b)(c + d) - ac - bd - bd \tag{3.69}$$

Substituting Eq (3.69) into Eq (3.66) gives:

$$(a + bu)(c + du) = (ac - bd) + \left((a + b)(c + d) - ac - bd - bd\right)u \tag{3.70}$$

The result in Eq (3.70) requires three real multiplications ($ac$, $bd$, and $(a + b)(c + d)$) and six real additions. Conventional complex multiplication requires four real multiplications and two real additions. Multiplication in the $R(u)$ field therefore requires one less multiplication but four more additions.
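Eq (3.70) translates directly into code. The Python sketch below uses hypothetical function names and represents $a + bu$ by the pair $(a, b)$. It checks the three-multiplication form against Eq (3.66). It also checks it against ordinary complex arithmetic with $u = \exp(-j2\pi/3)$, the value the text derives next.

```python
import cmath


def mul_ru(a, b, c, d):
    """(a + b u)(c + d u) with u**2 = -1 - u, using Eq (3.70):
    3 real multiplications (ac, bd, (a+b)(c+d)) and 6 real additions."""
    ac = a * c
    bd = b * d
    s = (a + b) * (c + d)
    return ac - bd, s - ac - bd - bd


def mul_ru_eq366(a, b, c, d):
    """Eq (3.66) for comparison: (ac - bd) + (ad + b(c - d)) u."""
    return a * c - b * d, a * d + b * (c - d)


U = cmath.exp(-2j * cmath.pi / 3)     # a complex cube root of unity


def to_complex(pair):
    """Map (a, b) to the complex number a + b u."""
    return pair[0] + pair[1] * U


print(mul_ru(2, 3, 5, 7), mul_ru_eq366(2, 3, 5, 7))   # (-11, 8) (-11, 8)
```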

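The expression for $u$ is obtained from $u^3 = 1$ by letting $u^3 = \left(\exp(-j2\pi/3)\right)^3 = 1$. Consequently, $u = \exp(-j2\pi/3) = -1/2 - j(\sqrt{3}/2)$. This value is used for conversion between $a + bj$ and $c + du$:

$$c + du = c + d\left(-\tfrac{1}{2} - j\tfrac{\sqrt{3}}{2}\right) = c - \tfrac{d}{2} - j\tfrac{\sqrt{3}}{2}d \tag{3.71}$$

$$c + du = \left(c - \tfrac{d}{2}\right) + j\left(-\tfrac{\sqrt{3}}{2}\right)d \tag{3.72}$$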

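To find the conversion from $a + bj$ to $c + du$, solve Eq (3.72) for $j$:

$$\begin{aligned} c + du &= \left(c - \tfrac{d}{2}\right) + j\left(-\tfrac{\sqrt{3}}{2}\right)d\\ \tfrac{d}{2} + du &= \left(-\tfrac{\sqrt{3}}{2}\right)d\,j\\ d\left(\tfrac{1}{2} + u\right) &= \left(-\tfrac{\sqrt{3}}{2}\right)d\,j\\ j &= \left(-\tfrac{2}{\sqrt{3}}\right)\left(\tfrac{1}{2} + u\right) \end{aligned} \tag{3.73}$$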

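Substituting Eq (3.73) into $a + bj$, the conversion to $c + du$ is:

$$a + bj = a + b\left(-\tfrac{2}{\sqrt{3}}\right)\tfrac{1}{2} + b\left(-\tfrac{2}{\sqrt{3}}\right)u$$

$$a + bj = \left(a - \tfrac{b}{\sqrt{3}}\right) + \left(-\tfrac{2b}{\sqrt{3}}\right)u \tag{3.74}$$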
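The two conversions, Eqs (3.72) and (3.74), can be sketched as a Python round trip. The function names are hypothetical.

```python
import cmath
import math

SQRT3 = math.sqrt(3.0)
U = cmath.exp(-2j * math.pi / 3)      # u = -1/2 - j sqrt(3)/2


def complex_to_ru(z):
    """Eq (3.74): a + bj -> (c, d) with c = a - b/sqrt(3), d = -2b/sqrt(3)."""
    a, b = z.real, z.imag
    return a - b / SQRT3, -2.0 * b / SQRT3


def ru_to_complex(c, d):
    """Eq (3.72): c + du -> (c - d/2) + j(-sqrt(3)/2) d."""
    return complex(c - d / 2.0, -SQRT3 / 2.0 * d)


z = 3.0 - 4.0j
c, d = complex_to_ru(z)
print(c, d, ru_to_complex(c, d))
```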

Using the $R(u)$ arithmetic developed above, a radix-3 FFT butterfly can be constructed that requires no multiplications except for the twiddle factors in Figure 3.13.

Using Eq (3.74) with $W_N^r = \cos(2\pi r/N) + j\left(-\sin(2\pi r/N)\right)$ produces:

$$c + du = \left(\cos(2\pi r/N) + \sin(2\pi r/N)/\sqrt{3}\right) + \left(2\sin(2\pi r/N)/\sqrt{3}\right)u \tag{3.75}$$

Substituting $N = 3r$ in Eq (3.75) reduces it to:

$$W_N^r = \left(\cos(2\pi/3) + \sin(2\pi/3)/\sqrt{3}\right) + \left(2\sin(2\pi/3)/\sqrt{3}\right)u = 0 + 1u = u \tag{3.76}$$

Likewise, the remaining $W$ terms in Figure 3.13 can be reduced:

$$W_N^{2r} = \left(\cos(4\pi/3) + \sin(4\pi/3)/\sqrt{3}\right) + \left(2\sin(4\pi/3)/\sqrt{3}\right)u = -1 - u \tag{3.77}$$

$$W_N^{4r} = 0 + 1u = u \tag{3.78}$$

Substituting Eqs (3.76) through (3.78) into Figure 3.13 produces Figure 3.16.


Figure 3.16 shows that, apart from the twiddle factors, no multiplications are required to evaluate the butterfly flowgraph. $X_i$ and $Y_i$ are the butterfly inputs after twiddle factor multiplication, expressed in $R(u)$ as $X_i + Y_i u$. $A(\cdot)$ and $B(\cdot)$ are the butterfly outputs in the $R(u)$ field.

$$A(1) + B(1)u = (X_1 + X_2 + X_3) + (Y_1 + Y_2 + Y_3)u \tag{3.79}$$

$$\begin{aligned} A(2) + B(2)u &= (X_2 + Y_2u)(0 + u) + (X_3 + Y_3u)(-1 - u) + (X_1 + Y_1u)\\ &= (-Y_2) + (X_2 - Y_2)u + (-X_3 + Y_3) + (-X_3)u + X_1 + Y_1u\\ &= (X_1 - Y_2 - X_3 + Y_3) + (Y_1 + X_2 - Y_2 - X_3)u \end{aligned} \tag{3.80}$$

$$\begin{aligned} A(3) + B(3)u &= X_1 + Y_1u + (X_2 + Y_2u)(-1 - u) + (X_3 + Y_3u)(0 + u)\\ &= X_1 + Y_1u + (-X_2 + Y_2) + (-X_2)u + (-Y_3) + (X_3 - Y_3)u\\ &= (X_1 - X_2 + Y_2 - Y_3) + (Y_1 - X_2 + X_3 - Y_3)u \end{aligned} \tag{3.81}$$

There are 16 real additions in Eqs (3.79) through (3.81). Combining the common terms $-Y_2 - X_3 = -R$ and $-X_2 - Y_3 = -S$ reduces this. The radix-3 butterfly can then be evaluated using only fourteen real additions (neglecting the twiddle factors):

$$\begin{aligned} A(1) &= X_1 + X_2 + X_3, & B(1) &= Y_1 + Y_2 + Y_3\\ A(2) &= X_1 + Y_3 - R, & B(2) &= Y_1 + X_2 - R, \qquad R = Y_2 + X_3\\ A(3) &= X_1 + Y_2 - S, & B(3) &= Y_1 + X_3 - S, \qquad S = X_2 + Y_3 \end{aligned}$$
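A Python sketch of the fourteen-addition butterfly follows. It assumes the inputs $X_i + Y_i u$ have already been twiddled, and the function name is hypothetical. The check compares it with the complex butterfly of Figure 3.13, using $W_N^r = W_N^{4r} = u$ and $W_N^{2r} = u^2$ from Eqs (3.76) through (3.78).

```python
import cmath

U = cmath.exp(-2j * cmath.pi / 3)     # u; note u**2 = -1 - u


def butterfly_ru(X1, Y1, X2, Y2, X3, Y3):
    """Radix-3 butterfly in R(u) on twiddled inputs Xi + Yi*u.
    Uses 14 real additions and no multiplications."""
    R = Y2 + X3                       # shared term of A(2), B(2)
    S = X2 + Y3                       # shared term of A(3), B(3)
    A1, B1 = X1 + X2 + X3, Y1 + Y2 + Y3
    A2, B2 = X1 + Y3 - R, Y1 + X2 - R
    A3, B3 = X1 + Y2 - S, Y1 + X3 - S
    return (A1, B1), (A2, B2), (A3, B3)


def to_complex(a, b):
    """Map the R(u) pair (a, b) to the complex number a + b*u."""
    return a + b * U


ins = (1.0, 2.0, -0.5, 3.0, 4.0, -1.5)          # X1, Y1, X2, Y2, X3, Y3
f, g, h = (to_complex(*ins[i:i + 2]) for i in (0, 2, 4))
expected = (f + g + h, f + U * g + U**2 * h, f + U**2 * g + U * h)
outs = butterfly_ru(*ins)
print(max(abs(to_complex(*o) - e) for o, e in zip(outs, expected)))
```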

3.2.6 Summary. This concludes the discussion of fixed radix FFT theory. In this section the general theory was developed using the radix-3 case as an alternative to the more common radix-2 development. A decimation-in-time for $N = 9$ was shown, and the basic butterfly equations for radix-3 were derived. Because radix-5 butterflies are similar to radix-3 butterflies, the radix-5 theory was not developed. However, the butterfly equations necessary to implement a radix-5 FFT were given. Finally, a new radix-3 FFT (Dubois and Venetsanopoulos, 1978) was developed.

3.3 Real Operations Count for Fixed Radix FFTs

The speed at which an FFT algorithm can perform the DFT is, to a first approximation, proportional to the number of complex multiplications used in the algorithm (Singleton, 1969). The number of times the data array is indexed is a secondary factor and is shown to have minimal impact on the results of this paper.

An anomaly in the nomenclature should be pointed out before further discussion of "complex multiplications" related to FFTs. A complex multiplication implies four real multiplications and two real additions. Singleton (1969) showed that $(p-1)^2$ real multiplications are required to evaluate a complex transform of dimension $p$, $p$ odd, where $N = p^m$. Singleton then refers to the $(p-1)^2$ real multiplications as $(p-1)^2$ complex multiplications ... form of dimension $p$, requires more than $(p-1)^2/2$ real additions. Throughout this paper, all references to multiplications and additions are in terms of real operations, not complex operations.

The real operations are determined from three sources:

1. The number of butterflies times the number of real operations required to compute each butterfly.
2. The number of twiddle factors times the real operations required per twiddle factor.
3. The number of trigonometric functions (sine and cosine) which must be computed.

The real operations count for radix-$p$ FFTs is derived as a function of $N$, $m$, and $p$, where $N = p^m$.

3.3.1 Number of Butterflies in Fixed Radix-p FFTs. The number of butterflies depends on $N$, $m$, and $p$, where $N = p^m$. The radix-2 FFT in Figure 3.8 has 8 input points and 8 output points for each stage. The radix-2 butterfly in Figure 3.7 has 2 inputs and 2 outputs, so Figure 3.8 must have $8/2 = 4$ butterflies per stage. There are 3 stages in this radix-2 FFT (where $N = 2^3$), giving a total of 12 butterflies.

In general, the number of butterflies is given by:

$$mN/p \tag{3.82}$$

This equation can be checked for the radix-3 example. Given that $N = 9$, $p = 3$, and $m = 2$, Eq (3.82) gives a total of $2 \cdot 9/3 = 6$ butterflies. This is verified by Figure 3.11, which contains 6 butterflies.
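Eq (3.82) amounts to a few lines of Python. The helper names `radix_stages` and `butterfly_count` are hypothetical, and the sketch assumes $N$ is an exact power of $p$ (it raises otherwise).

```python
def radix_stages(N, p):
    """Return m with N = p**m, or raise if N is not a power of p."""
    m, size = 0, 1
    while size < N:
        size *= p
        m += 1
    if size != N:
        raise ValueError("N must be a power of p")
    return m


def butterfly_count(N, p):
    """Eq (3.82): m stages of N/p radix-p butterflies."""
    return radix_stages(N, p) * N // p


print(butterfly_count(8, 2), butterfly_count(9, 3))   # 12 6
```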

3.3.2 Number of Twiddle Factors in Radix-p FFTs. The twiddle factors are complex multipliers of the form $W_N^k = \exp(-j2\pi k/N)$ which multiply each radix-$p$ butterfly, as shown in Figure 3.8. Each stage of Figure 3.8 has $N/p = 8/2 = 4$ butterflies. Each butterfly requires $p - 1 = 2 - 1 = 1$ complex twiddle factor. The general expression for the number of twiddle factors in each stage becomes:

$$N(p-1)/p \tag{3.84}$$

Given that $N = p^m$, there are $m$ stages in a radix-$p$ FFT. The total number of twiddle factors for the FFT is therefore:

$$mN(p-1)/p \tag{3.85}$$

Some of the complex twiddle factors are $W_N^0 = 1$ and can be eliminated. In any FFT there are $N - 1$ of these unity twiddle factors (Singleton, 1969). This gives the final expression for the number of complex twiddle factors:

$$mN(p-1)/p - (N-1) \tag{3.86}$$

For $N = 2^3$, Eq (3.86) gives $3 \cdot 8 \cdot (2-1)/2 - (8-1) = 5$. This agrees with Figure 3.8 for $N = 2^3$, which has 5 non-unity twiddle factors.
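The closed form of Eq (3.86) can be cross-checked by enumerating twiddle factors stage by stage. The sketch below assumes a decimation-in-time layout:

- Stage $s$ has $N/p^s$ groups of $p^{s-1}$ butterflies.
- The butterfly at position $j$ multiplies its $t$th input ($t = 1, \ldots, p-1$) by $W_N^{jtN/p^s}$.

This layout is an assumption chosen to match Figure 3.8 for radix 2, and all function names are hypothetical.

```python
def radix_stages(N, p):
    """Return m with N = p**m, or raise if N is not a power of p."""
    m, size = 0, 1
    while size < N:
        size *= p
        m += 1
    if size != N:
        raise ValueError("N must be a power of p")
    return m


def twiddle_count(N, p):
    """Eq (3.86): non-unity complex twiddle factors in a radix-p FFT."""
    m = radix_stages(N, p)
    return m * N * (p - 1) // p - (N - 1)


def twiddle_count_by_enumeration(N, p):
    """Count non-unity twiddles directly under the assumed DIT layout."""
    m = radix_stages(N, p)
    nonunity = 0
    for s in range(1, m + 1):
        groups = N // p ** s
        for j in range(p ** (s - 1)):      # butterfly position within a group
            for t in range(1, p):          # twiddled inputs of the butterfly
                exponent = (j * t * N // p ** s) % N
                if exponent != 0:          # W_N^0 = 1 needs no multiplication
                    nonunity += groups
    return nonunity


print(twiddle_count(8, 2), twiddle_count_by_enumeration(8, 2))   # 5 5
```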

3.3.3 Number of Trigonometric Functions Required for the Fixed Radix FFTs. The trigonometric functions sine and cosine are required to compute the twiddle factors. The radix-2 algorithm uses calls to the FORTRAN library SIN and COS functions as well as the difference equations given in Section 3.1. The radix-3 and radix-5 FFTs use sine and cosine lookup tables computed with the difference equations of Section 3.1.

The radix-2 algorithm in Appendix A computes one sine and cosine at each stage of the FFT using:

    W = CMPLX(COS(PI/LE1), SIN(PI/LE1))

Each radix-2 FFT has $m$ stages, where $N = 2^m$, so the sine and cosine functions are called $m$ times for the FFT. Once the initial sine and cosine are computed for the stage, each new twiddle factor in the stage is computed using the complex multiplication:

    U = U * W

where the complex variable U is initialized to U = (1,0). The complex multiplication U * W effectively implements the sine and cosine difference equations of Section 3.1.

The number of times U * W is computed for each FFT stage depends on the number of different twiddle factors in the stage. In Figure 3.8 the first stage has only one type of twiddle factor, $W_N^0$. The second stage has two types, $W_N^0$ and $W_N^2$. Stage 3 has four: $W_N^0$, $W_N^1$, $W_N^2$, and $W_N^3$. The general expression for the number of types of twiddle factors in stage $k$ is:

$$TF = 2^{k-1}$$

Thus:

- For stage 1, $k = 1$ and $TF = 2^0 = 1$, giving one type of twiddle factor.
- For stage 2, $k = 2$ and $TF = 2^1 = 2$, giving two types of twiddle factors.
- For the last stage in this example, $k = 3$ and $TF = 2^2 = 4$, so four types of twiddle factors are required.

In general, for the radix-2 FFT in Appendix A the complex multiplication U * W is evaluated

$$\sum_{k=1}^{m} 2^{k-1} = N - 1$$

times, where $m$ is the number of stages for $N = 2^m$. Each complex multiplication requires 4 real multiplications and 2 real additions. The number of operations required to compute sines and cosines for this radix-2 FFT is therefore:

$$\text{real mult} = 4\sum_{k=1}^{m} 2^{k-1} \tag{3.87}$$

$$\text{real add} = 2\sum_{k=1}^{m} 2^{k-1} \tag{3.88}$$

$$\text{sine and cosine calls} = m \tag{3.89}$$
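The per-stage twiddle generation and the counts of Eqs (3.87) through (3.89) can be sketched in Python. This assumes $W = \exp(-j\pi/\mathrm{LE1})$ with $\mathrm{LE1} = 2^{k-1}$ in stage $k$. That sign convention is an assumption of the sketch, not a quotation of the Appendix A code, and the function names are hypothetical.

```python
import cmath


def stage_twiddles(k):
    """Twiddle factors of radix-2 stage k generated by repeated U = U * W.
    Assumes W = exp(-j*pi/LE1) with LE1 = 2**(k-1) (sign convention assumed)."""
    le1 = 2 ** (k - 1)                        # TF = 2**(k-1) twiddle types
    W = cmath.exp(-1j * cmath.pi / le1)       # one sine/cosine pair per stage
    U = 1 + 0j                                # U = (1, 0)
    factors = []
    for _ in range(le1):
        factors.append(U)
        U = U * W    # one complex multiplication per pass (the last is unused)
    return factors


def radix2_trig_cost(m):
    """Eqs (3.87)-(3.89) for N = 2**m."""
    uw = sum(2 ** (k - 1) for k in range(1, m + 1))   # equals N - 1
    return {"U*W": uw, "real mult": 4 * uw, "real add": 2 * uw,
            "sine/cosine calls": m}


print(radix2_trig_cost(3))
```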

Computing the sine and cosine lookup tables for the radix-3 and radix-5 algorithms is less complex than the corresponding computation for the radix-2 FFT. In these algorithms, the difference equation from Section 3.1 is used to compute sine and cosine lookup tables of length $N$. Because of the symmetries $\cos(-\theta) = \cos(\theta)$ and $\sin(-\theta) = -\sin(\theta)$, only $N/2$ computations of the difference equations are required. The equations are given by:

    WKC(I) = C * WKC(I-1) - S * WKS(I-1) + WKC(I-1)
    WKS(I) = C * WKS(I-1) + S * WKC(I-1) + WKS(I-1)

which need a total of 4 real multiplications and 10 additions to compute. For an $N$-length sequence, computing the lookup tables requires:

$$\text{real mult} = 4(N/2) = 2N \tag{3.90}$$

$$\text{real add} = 10(N/2) = 5N \tag{3.91}$$
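A Python sketch of the half-length table construction follows, assuming the tables hold $\cos(2\pi I/N)$ and $\sin(2\pi I/N)$. The function name is hypothetical. The sketch runs the WKC/WKS difference equations for $I = 1, \ldots, N/2$ and fills the remaining entries by symmetry.

```python
import math


def trig_tables(N):
    """Length-N tables WKC(I) = cos(2*pi*I/N) and WKS(I) = sin(2*pi*I/N).
    The difference equation runs for I = 1..N/2 only; the rest is filled
    using cos(2*pi - t) = cos(t) and sin(2*pi - t) = -sin(t)."""
    a = 2.0 * math.pi / N
    C = -2.0 * math.sin(a / 2.0) ** 2
    S = math.sin(a)
    wkc = [0.0] * N
    wks = [0.0] * N
    wkc[0], wks[0] = 1.0, 0.0
    steps = 0
    for i in range(1, N // 2 + 1):
        # 4 real multiplications per step, as in the WKC/WKS equations above.
        wkc[i] = C * wkc[i - 1] - S * wks[i - 1] + wkc[i - 1]
        wks[i] = C * wks[i - 1] + S * wkc[i - 1] + wks[i - 1]
        steps += 1
    for i in range(N // 2 + 1, N):
        wkc[i] = wkc[N - i]
        wks[i] = -wks[N - i]
    return wkc, wks, steps


wkc, wks, steps = trig_tables(27)
print(steps, max(abs(wkc[i] - math.cos(2 * math.pi * i / 27)) for i in range(27)))
```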

3.3.4 Number of Real Operations in Radix-p FFTs. The general expressions in Eqs (3.82) through (3.91) determine the total number of real operations for a given $N = p^m$, where $N$, $p$, and $m$ are integers. Three contributions are summed:

1. Each radix-$p$ butterfly requires multiplications, additions, or both. The exact number of multiplies and adds is determined from the FORTRAN code, as shown below.
2. Each complex twiddle factor multiplication requires 4 real multiplications and 2 real additions.
3. The real operations needed to compute the sines and cosines are added to those for the butterflies and twiddle factors to give the total operations count for each algorithm.

For the case $N = 2^m$, Section 3.2.1 showed that the radix-2 butterfly can be computed with 4 real additions and no multiplications. However, this radix-2 FFT does not eliminate the multiplications by $W_N^0$. Each radix-2 butterfly is therefore multiplied by a complex twiddle factor, as shown in Figure 3.8. For this particular radix-2 FFT, the number of twiddle factors equals the number of butterflies. Combining all sources of real operations for the radix-2 FFT gives a total of:

56

$$\text{real mult} = (\#\text{ mult per butterfly})(\#\text{ butterflies}) + 4(\#\text{ twiddle factors}) + 4(\#\text{ types of twiddle factors}) \qquad (3.92)$$

Substituting the appropriate values for the radix-2 FFT gives:

$$\text{real mult} = (0)(mN/2) + 4(mN/2) + 4\sum_{k=1}^{m} 2^{k-1} = 2mN + 4\sum_{k=1}^{m} 2^{k-1} \qquad (3.93)$$

Likewise, for the number of real additions:

$$\text{real adds} = (\#\text{ adds per butterfly})(\#\text{ butterflies}) + 2(\#\text{ twiddle factors}) + 2(\#\text{ types of twiddle factors}) \qquad (3.94)$$

$$\text{real adds} = 4(mN/2) + 2(mN/2) + 2\sum_{k=1}^{m} 2^{k-1} = 3mN + 2\sum_{k=1}^{m} 2^{k-1} \qquad (3.95)$$
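Because $\sum_{k=1}^{m} 2^{k-1} = 2^m - 1 = N - 1$, Eqs (3.93) and (3.95) reduce to 2mN + 4(N - 1) and 3mN + 2(N - 1). The short Python sketch below evaluates them term by term, as a check on the bookkeeping; the function name is ours and is not taken from the thesis code.

```python
def radix2_real_ops(m):
    """Real (multiplications, additions) for the radix-2 FFT of length N = 2**m.

    Follows Eqs (3.93) and (3.95): each of the mN/2 butterflies costs
    4 real additions and carries one complex twiddle multiplication
    (4 real mults, 2 real adds); the sine/cosine setup adds
    sum_{k=1}^{m} 2**(k-1) further complex multiplications.
    """
    N = 2 ** m
    twiddle_setup = sum(2 ** (k - 1) for k in range(1, m + 1))  # equals N - 1
    butterflies = m * N // 2
    mults = 0 * butterflies + 4 * butterflies + 4 * twiddle_setup   # Eq (3.93)
    adds = 4 * butterflies + 2 * butterflies + 2 * twiddle_setup    # Eq (3.95)
    return mults, adds


for m in (3, 4, 10):
    print(2 ** m, radix2_real_ops(m))
```

For m = 3 this gives 76 multiplications and 86 additions, the N = 8 entry of Table 3.1.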

For the radix-p FFTs where p is an odd prime, it has been shown by Singleton (1969) that the butterflies can be evaluated using $(p-1)^2$ real multiplications. The FORTRAN codes for the radix-3 and radix-5 FFTs in Appendices B and D require 4 real multiplications and 12 additions for radix-3 butterflies, and 16 real multiplications and 30 additions for radix-5 butterflies. Using these in Eqs (3.87) through (3.91) yields the total real operations for the radix-3 FFT as:

$$\text{real mult} = (4 \text{ mult per butterfly})\frac{mN}{3} + 4\left(\frac{mN(3-1)}{3} - (N-1)\right) + 2N = \frac{4mN}{3} + \frac{8mN}{3} - 4(N-1) + 2N = 4mN - 4(N-1) + 2N \qquad (3.96)$$

$$\text{real adds} = (12 \text{ adds per butterfly})\frac{mN}{3} + 2\left(\frac{mN(3-1)}{3} - (N-1)\right) + 5N = \frac{12mN}{3} + \frac{4mN}{3} - 2(N-1) + 5N = \frac{16mN}{3} - 2(N-1) + 5N \qquad (3.97)$$

Similarly, the real operations count for the radix-5 FFT becomes:

$$\text{real mult} = (16 \text{ mult per butterfly})\frac{mN}{5} + 4\left(\frac{mN(5-1)}{5} - (N-1)\right) + 2N = \frac{16mN}{5} + \frac{16mN}{5} - 4(N-1) + 2N = \frac{32mN}{5} - 4(N-1) + 2N \qquad (3.98)$$

$$\text{real adds} = (30 \text{ adds per butterfly})\frac{mN}{5} + 2\left(\frac{mN(5-1)}{5} - (N-1)\right) + 5N = \frac{30mN}{5} + \frac{8mN}{5} - 2(N-1) + 5N = \frac{38mN}{5} - 2(N-1) + 5N \qquad (3.99)$$
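The same bookkeeping for the odd radices can be sketched as follows, assuming the butterfly costs quoted above (4 mults and 12 adds for radix-3, 16 mults and 30 adds for radix-5) and the lookup-table costs of Eqs (3.90) and (3.91). The function name is hypothetical.

```python
def radix_p_real_ops(p, m):
    """Real (multiplications, additions) for the radix-3 or radix-5 FFT, N = p**m.

    Implements Eqs (3.96)-(3.99): butterfly cost times mN/p butterflies,
    plus 4 mults and 2 adds for each of the m*N*(p-1)/p - (N-1) nontrivial
    twiddle factors, plus the 2N mults and 5N adds of the lookup tables.
    """
    butterfly_cost = {3: (4, 12), 5: (16, 30)}   # (mults, adds) per butterfly
    bm, ba = butterfly_cost[p]
    N = p ** m
    butterflies = m * N // p
    twiddles = m * N * (p - 1) // p - (N - 1)    # exact: p divides N
    mults = bm * butterflies + 4 * twiddles + 2 * N
    adds = ba * butterflies + 2 * twiddles + 5 * N
    return mults, adds


for p, m in [(3, 2), (5, 2), (3, 5)]:
    print(p ** m, radix_p_real_ops(p, m))
```

With p = 3 and m = 2 this returns (58, 125), and with p = 5 and m = 2 it returns (274, 457), matching the N = 9 and N = 25 rows of Table 3.1.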

The results of Eqs (3.92) through (3.99) are given in Table 3.1 for N between 8 and 16,000. This table also summarizes the possible values of N for the fixed radix-2, 3, and 5 FFTs.

TABLE 3.1

N       Radix   Multiplications   Additions   Sine/Cosine Calls
8       2^3     76                86          3
9       3^2     58                125         1
16      2^4     188               222         4
25      5^2     274               457         1
27      3^3     274               515         1
32      2^5     444               542         5
64      2^6     1020              1278        6
81      3^4     1138              1973        1
125     5^3     2154              3227        1
128     2^7     2300              2942        7
243     3^5     4378              7211        1
256     2^8     5116              6654        8
512     2^9     11260             14846       9
625     5^4     14754             20877       1
729     3^6     16042             25517       1
1024    2^10    24572             32766       10
2048    2^11    53244             71678       11
2187    3^7     56866             88211       1
3125    5^5     93754             128127      1
4096    2^12    114684            155646      12
6561    3^8     196834            299621      1
8192    2^13    245756            335870      13

3.3.5 Real Operations Count for the Radix-3 FFT Using the Complex Cube Root of Unity. This algorithm represents an alternative to the conventional radix-3 FFT. It is shown in this section that selective use of this algorithm can reduce the number of real multiplications, depending upon the sequence length.

The radix-3 FFT in the R(u) field has four sources of real multiplication when $N = 3^m$:

1. 2mN/3 - (N - 1) complex twiddle factors, derived in Section 3.3.3.

2. Conversion from complex to R(u) of $\sum_{i=2}^{m} 2(3^{i-1} - 1)$ twiddle factors, derived from the FORTRAN code in Appendix C.

3. Conversion of the complex array of length N to the R(u) field, derived from the FORTRAN code.

4. Conversion of the R(u) array of length N back to the complex field, derived from the FORTRAN code.

The radix-3 in R(u) has five sources of real additions:

1. mN/3 butterflies, derived in Section 3.3.3.

2. The four sources of real multiplies listed above.

Based on the FORTRAN code in Appendix C, there are three real multiplications per complex twiddle factor, two per twiddle factor conversion, two per conversion from complex to the R(u) field, and two per conversion from R(u) to the complex field. Condensing the above into an equation for real multiplications yields:

$$\text{real mult} = 3\left(\frac{2mN}{3} - N + 1\right) + 2\sum_{i=2}^{m} 2\left(3^{i-1} - 1\right) + 4N \qquad (3.100)$$

There are 14 real additions per butterfly, six per twiddle factor, one per twiddle factor conversion, one per conversion to the R(u) array, and one per conversion to the complex array. Expressing the total number of real additions as a function of the above yields:

$$\text{real adds} = \frac{14mN}{3} + 6\left(\frac{2mN}{3} - N + 1\right) + \sum_{i=2}^{m} 2\left(3^{i-1} - 1\right) + 2N \qquad (3.101)$$

The results for the number of real multiplications and additions for both radix-3 algorithms are given in Table 3.2 for N = 27 to N = 19683. Because the R(u) radix-3 requires more multiplications and more additions for N = 27 and 81, it will always run slower than the complex field radix-3 FFT at those lengths. For N = 243 and higher, however, the R(u) radix-3 may run faster, depending upon the speed of additions relative to multiplications on the computer being used to perform the FFTs.

Table 3.2 also gives the "Add to Multiply Ratio" required for the R(u) field radix-3 FFT to run faster than the conventional radix-3 FFT. Because the R(u) version trades multiplications for additions, the ratio is the difference in the number of additions divided by the difference in the number of multiplications. For the case of N = 729, a multiplication must take 3.77 times longer than an addition before the R(u) field radix-3 can run faster than the complex field radix-3. This means that, prior to selecting either of the algorithms, the relative costs of additions and multiplications must be known as well as the length of the data sequence.
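The break-even test described above can be sketched in Python using the N = 729 counts from Table 3.2 (sine and cosine terms excluded). The class and function names are hypothetical, and the cost model assumes that run time is simply multiplications times the relative multiply cost plus additions, ignoring indexing and memory traffic.

```python
from dataclasses import dataclass


@dataclass
class OpCount:
    mults: int
    adds: int

    def cost(self, mult_to_add: float) -> float:
        """Run time in units of one addition, if a multiply costs mult_to_add adds."""
        return self.mults * mult_to_add + self.adds


def break_even_ratio(conventional: OpCount, alternative: OpCount):
    """Multiply/add time ratio above which `alternative` runs faster.

    Returns None when the alternative saves no multiplications (it can then
    never win); a negative value would mean it is faster at any ratio.
    """
    saved_mults = conventional.mults - alternative.mults
    extra_adds = alternative.adds - conventional.adds
    if saved_mults <= 0:
        return None
    return extra_adds / saved_mults


# Table 3.2, N = 729
complex_729 = OpCount(mults=14584, adds=21872)
ru_729 = OpCount(mults=10912, adds=35714)
print(round(break_even_ratio(complex_729, ru_729), 2))  # 3.77
for r in (3.0, 4.0):
    faster = "R(u)" if ru_729.cost(r) < complex_729.cost(r) else "complex"
    print(r, faster)
```

On a machine where a multiply costs three additions the complex field version wins. At four additions per multiply the R(u) version wins, which is consistent with the 3.77 threshold.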

TABLE 3.2
COMPARISON BETWEEN COMPLEX AND R(u) RADIX-3 FFT FOR REAL OPERATIONS*

          Complex Radix-3          R(u) Radix-3            Add to
N         Real Mult   Real Adds    Real Mult   Real Adds   Mult Ratio
27        220         380          232         624         NA
81        976         1568         1284        2562        NA
243       3892        5996         3140        9796        5.05
729       14584       21872        10912       35714       3.77
2187      52492       77276        37152       126108      3.18
6561      183712      266816       124628      435202      2.85
19683     629860      905420       413308      1476212     2.63

* Does not include computing sine and cosine terms

3.3.6 Memory Requirements for Fixed Radix FFTs. A major consideration in selecting a particular FFT algorithm is the sequence length and the memory required to execute the subroutine relative to the memory available in the computer. For this reason the memory requirements for the radix-2, 3, and 5 FFTs are given here as a function of sequence length N. The program memory and the array storage requirements for each algorithm are enumerated below.

The program memory required by each routine was determined from a "load map" generated by the command MAP,PART. The array storage requirements were determined by inspection of the DIMENSION statements in the FORTRAN code for each subroutine listed in Appendices A to D. The results are:

FFT                Program   Arrays
Radix-2            108       2N
Radix-3            301       4N + m + 30
Radix-3 in R(u)    396       4N + m + 30
Radix-5            458       4N + m + 30
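The array formulas above can be evaluated directly. The sketch below is a hypothetical helper that assumes m is the number of stages (N = radix^m) and returns storage in the same units as the list above.

```python
def fixed_radix_array_words(radix, N):
    """Array storage for the fixed radix FFTs, from the formulas listed above."""
    if radix == 2:
        return 2 * N
    if radix in (3, 5):              # the radix-3 in R(u) uses the same formula
        m, n = 0, N
        while n > 1:                 # m = number of stages, N = radix**m
            if n % radix:
                raise ValueError("N must be a power of the radix")
            n //= radix
            m += 1
        return 4 * N + m + 30
    raise ValueError("only radix 2, 3, and 5 are listed")


for radix, N in [(2, 1024), (3, 27), (5, 625)]:
    print(radix, N, fixed_radix_array_words(radix, N))
```

For comparable N, the radix-3 and radix-5 arrays need roughly twice the storage of the radix-2 arrays (4N versus 2N).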

The memory arrays required for each algorithm as a

function of N are listed in Table 3.3. The program memory

was not included because it is dependent on machine word

size which varies from machine to machine.

3.4 Mixed Radix FFT Algorithms

Up to this point only fixed radix FFTs have been discussed. Explanation and programming for the special cases where $N = 2^m$, $3^m$, or $5^m$ are simpler than for the general case of $N = p_1 p_2 \cdots p_m$, and for most applications the restricted choice of values is adequate. However, when the application does not permit "zero-packing" of the data sequence to reach one of the special cases, a wider choice of N is needed.

TABLE 3.3
FIXED RADIX FFT MEMORY ARRAYS

N       Memory Arrays
8       16
9       68
16      32
25      132
27      141
32      64
64      128
81      358
125     533
128     256
243     1007
256     512
512     1024
625     2534
729     2952
1024    2048
2048    4096
2187    8785

Singleton has developed a mixed radix algorithm which is widely used and documented. The International Mathematical and Statistical Libraries (IMSL), which are available on the CDC Cyber 74 computer, include a mixed radix FFT based on Singleton's work. In addition, the author has written and tested a mixed radix algorithm. The theory, digit reversal, real operations count, and memory requirements for these algorithms are discussed in the following sections.

3.4.1 Mixed Radix Theory. All FFT theory can be developed by rearranging a one-dimensional sequence of length N as several two-dimensional matrices and performing operations on these matrices. Understanding this approach on first exposure is difficult. For this reason the general development is presented here, followed by a specific example to increase understanding of the theory. The DFT of a sequence x(n) is

$$X(k) = \sum_{n=0}^{N-1} x(n)\,W_N^{nk}, \qquad W_N = e^{-j2\pi/N}, \qquad k = 0, 1, \ldots, N-1 \qquad (3.102)$$

where X(k) and x(n) are both complex values. The DFT can be expressed as a matrix product

$$X = T\,x$$

The matrix T can be decimated-in-time (Cooley and Tukey, 1965) or decimated-in-frequency (Gentleman and Sande, 1966) to produce the factored form:

$$T = P\,F_m F_{m-1} \cdots F_2 F_1$$

where the decomposition corresponds to the factoring $N = n_1 n_2 \cdots n_m$ and P is the permutation matrix (Singleton, 1969). Each matrix $F_i$ has only $n_i$ nonzero elements in each row and column and can be partitioned into $N/n_i$ square submatrices of dimension $n_i$; it is this partition that is the basis for the efficiency of the FFT (Singleton, 1969). The matrices $F_i$ can be further factored into:

$$F_i = R_i T_i \qquad (3.103)$$

where $R_i$ is the diagonal matrix of twiddle (rotation) factors. These twiddle factors allow the trigonometric symmetries and complex multipliers (e.g., $W_N = e^{-j2\pi/N}$) to be exploited in the DFT butterflies, reducing the number of real operations. A decimation-in-time example flowgraph which uses the above ideas is considered next.

Given an N-point sequence, the factors of N can be taken in any possible combination. If N = 30, it can be factored as 5 · 6, and the 5 · 6 decomposition is shown in Figure 3.17 as six-point DFTs followed by five-point DFTs. The full extent of decomposition is 30 = 5 · 6 = 5 · 3 · 2 and is shown in Figure 3.18. Starting with the DFT expression in Eq (3.102), the sequence can be split into five 6-point sequences, forming a 5 by 6 matrix. With N = pq,

$$X(k) = \sum_{m=0}^{p-1} W_N^{mk} \sum_{r=0}^{q-1} x(pr+m)\,W_N^{prk} \qquad (3.104)$$

Now the inner sums can be expressed as p q-point DFTs:

$$C_m(k) = \sum_{r=0}^{q-1} x(pr+m)\,W_q^{rk} \qquad (3.105)$$

since

$$W_N^{prk} = \exp(-j2\pi prk/N) = \exp(-j2\pi prk/pq) = W_q^{rk} \qquad (3.106)$$

Using p = 5 and q = 6 in Eq (3.104) produces:

$$X(k) = \sum_{m=0}^{4} W_{30}^{mk} \sum_{r=0}^{5} x(5r+m)\,W_{30}^{5rk} \qquad (3.107)$$

The inner sum in Eq (3.107) is a 6-point DFT, which can be decomposed into a 3 by 2 matrix by dividing each sequence x(5r+m) into three sequences, each two points long. Using the notation of Eq (3.104), the inner summation in Eq (3.107) can be represented as:

$$C_m(k) = \sum_{s=0}^{p-1} W_N^{sk} \sum_{t=0}^{q-1} x_m(pt+s)\,W_N^{ptk}, \qquad x_m(r) = x(5r+m) \qquad (3.108)$$

where now N = 6, p = 3, and q = 2. Substituting for N, p, and q yields:

$$C_m(k) = \sum_{s=0}^{2} W_6^{sk} \sum_{t=0}^{1} x_m(3t+s)\,W_6^{3tk} \qquad (3.109)$$

which gives the complete decomposition

$$X(k) = \sum_{m=0}^{4} W_{30}^{mk} \sum_{s=0}^{2} W_6^{sk} \sum_{t=0}^{1} x(5r+m)\,W_2^{tk}, \qquad r = 3t + s \qquad (3.110)$$

since

$$W_6^{3tk} = \exp(-j2\pi \cdot 3tk/6) = \exp(-j2\pi \cdot tk/2) = W_2^{tk}$$

with m = 0, 1, 2, 3, 4; s = 0, 1, 2; and t = 0, 1. The complete flowgraph is shown in Figure 3.18 and implements Eq (3.110).
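The nested sums can be checked numerically. The Python sketch below (the function names are ours) evaluates Eq (3.110) literally for N = 30 and compares it with the direct DFT of Eq (3.102). It recomputes every inner sum for each k, so it demonstrates the index mapping r = 3t + s rather than the operation savings; an actual FFT computes each 2-point and 6-point DFT once and reuses the results.

```python
import cmath
import random


def W(N, e):
    """Twiddle factor W_N^e = exp(-j*2*pi*e/N), as in Eq (3.106)."""
    return cmath.exp(-2j * cmath.pi * e / N)


def dft(x):
    """Direct evaluation of Eq (3.102)."""
    N = len(x)
    return [sum(x[n] * W(N, n * k) for n in range(N)) for k in range(N)]


def dft30_nested(x):
    """Evaluate Eq (3.110) for N = 30 = 5 * 3 * 2, with r = 3t + s."""
    assert len(x) == 30
    X = []
    for k in range(30):
        outer = 0j
        for m in range(5):                      # five subsequences x(5r + m)
            middle = 0j
            for s in range(3):                  # each 6-point DFT split 3 by 2
                inner = sum(x[5 * (3 * t + s) + m] * W(2, t * k)
                            for t in range(2))  # 2-point DFTs
                middle += W(6, s * k) * inner
            outer += W(30, m * k) * middle
        X.append(outer)
    return X


random.seed(1)
x = [complex(random.random(), random.random()) for _ in range(30)]
err = max(abs(a - b) for a, b in zip(dft(x), dft30_nested(x)))
print(f"max |difference| = {err:.2e}")
```

The difference is at the level of floating-point roundoff, confirming that the 5 · 3 · 2 factoring reproduces the 30-point DFT.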

3.4.2 Digit Reversal Algorithm (General). The permutation matrix P is required because the transformed result is in digit-reversed order. Given a factorization $N = n_1 n_2 \cdots n_m$, the Fourier coefficient with index

$$k = k_m n_{m-1} \cdots n_1 + \cdots + k_2 n_1 + k_1, \qquad 0 \le k_i < n_i$$

is found in location

$$k' = k_1 n_2 n_3 \cdots n_m + k_2 n_3 \cdots n_m + \cdots + k_{m-1} n_m + k_m \qquad (3.112)$$

The interchange of k with k' can be done "in place" if N is factored such that (Singleton, 1977)

$$n_i = n_{m-i+1} \qquad (3.113)$$

for i = 1, 2, ..., m. With this factoring of N, k is counted in natural order and k' in digit-reversed order; this is the mixed radix counterpart of the bit reversal described for the fixed-radix algorithms.
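Eq (3.112) translates directly into code. The sketch below uses hypothetical helper names and computes k' by Horner's rule. It also illustrates why Eq (3.113) matters: for symmetric factors, applying the map twice returns k, so the reordering reduces to pair interchanges that can be done in place.

```python
from math import prod


def mixed_radix_digits(k, factors):
    """Digits k_1..k_m of k = k_m n_{m-1}...n_1 + ... + k_2 n_1 + k_1,
    where factors = [n_1, ..., n_m] and 0 <= k_i < n_i."""
    digits = []
    for n in factors:
        digits.append(k % n)
        k //= n
    return digits


def digit_reverse(k, factors):
    """Location k' = k_1 n_2...n_m + k_2 n_3...n_m + ... + k_m of Eq (3.112)."""
    kp = 0
    for d, n in zip(mixed_radix_digits(k, factors), factors):
        kp = kp * n + d          # Horner's rule: k_1 ends up most significant
    return kp


def reversal_permutation(factors):
    """k' for every k = 0, 1, ..., N-1."""
    N = prod(factors)
    return [digit_reverse(k, factors) for k in range(N)]


print(reversal_permutation([2, 2, 2]))   # ordinary bit reversal for N = 8
```

For N = 8 this prints [0, 4, 2, 6, 1, 5, 3, 7]. With the non-symmetric factoring [2, 3], reversing twice does not return the starting index (1 maps to 3, and 3 maps to 4), so a simple pair-interchange scheme would fail.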

To implement this technique for mixed radices, N is factored into its prime factors, and the "square" factors are arranged symmetrically around the "square-free" factors of N. For example, let N = 270 be factored as:

$$N = 3 \cdot 2 \cdot 3 \cdot 5 \cdot 3$$

Now the reordering P is factored into:

$$P = P_2 P_1 \qquad (3.114)$$

The reordering $P_1$ is "associated with the square factors of n and is done by pair interchanges as previously described, except that the digits of n corresponding to the square-free factors are held constant and the digits of the square factors are exchanged symmetrically" (Singleton, 1977).
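The arrangement step can be sketched as follows. This assumes that trial-division factorization is adequate (N small) and that the square-free factors are taken in increasing order; the text does not specify that order, and the function names are hypothetical.

```python
def prime_factors(N):
    """Prime factorization of N by trial division, as {p: exponent}."""
    counts, p = {}, 2
    while p * p <= N:
        while N % p == 0:
            counts[p] = counts.get(p, 0) + 1
            N //= p
        p += 1
    if N > 1:
        counts[N] = counts.get(N, 0) + 1
    return counts


def symmetric_factoring(N):
    """Order the prime factors of N with each square factor p*p split
    between the two ends, around the square-free factors."""
    pairs, square_free = [], []
    for p, e in sorted(prime_factors(N).items()):
        pairs += [p] * (e // 2)       # one entry per square factor p*p
        if e % 2:
            square_free.append(p)
    return pairs + square_free + pairs[::-1]


print(symmetric_factoring(270))   # [3, 2, 3, 5, 3]
```

For N = 270 = 2 · 3^3 · 5 this reproduces the factoring above. The outer 3s are the digits exchanged by $P_1$, while the middle 2 · 3 · 5 block is left for $P_2$.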

For example, if

$$N = n_1 n_2 n_3 n_4 n_5 n_6 n_7 \qquad (3.115)$$

with $n_1 = n_7$, $n_2 = n_6$, and $n_3$, $n_4$, $n_5$ relatively prime, the interchange associated with the square factors $n_1$, $n_7$, $n_2$, and $n_6$ exchanges

$$k = k_7 n_6 n_5 \cdots n_1 + k_6 n_5 n_4 \cdots n_1 + k_5 n_4 n_3 n_2 n_1 + k_4 n_3 n_2 n_1 + k_3 n_2 n_1 + k_2 n_1 + k_1 \qquad (3.116)$$

with

$$k' = k_1 n_6 n_5 \cdots n_1 + k_2 n_5 n_4 \cdots n_1 + k_5 n_4 n_3 n_2 n_1 + k_4 n_3 n_2 n_1 + k_3 n_2 n_1 + k_6 n_1 + k_7 \qquad (3.117)$$

The effect of this reordering is that each element of X(k) is in the correct segment of length $N/(n_1 n_2)$, grouped in subsequences of $n_1 n_2$ consecutive elements (Singleton, 1977). The next reordering, $P_2$, then finishes the reordering of these $n_1 n_2$-element subsequences within each $N/(n_1 n_2)$ segment.

The above factorization is used in the Singleton and IMSL mixed radix algorithms and generates complicated FORTRAN code. A simpler alternative factorization was written by the author and used in his mixed radix algorithm. The simpler algorithm requires two additional arrays of length N to store the intermediate results, which detracts from the algorithm's utility when longer sequences are transformed. The details of this factorization are presented in Appendix E for interested readers.

3.4.3 Twiddle Factors. In Section 3.4.1 the factoring into $F_i$ was described, corresponding to a factor $n_i$. $F_i$ can be factored to give a product $R_i T_i$, where the matrix $T_i$ consists of $N/n_i$ identical Fourier transforms of dimension $n_i$ and $R_i$ is a diagonal twiddle factor matrix. The elements of $R_i$ are specified by the decimation-in-frequency version of the FFT (Singleton, 1977).

The twiddle factor matrix $R_i$ multiplies the outputs of each transform of dimension $n_i$ by a complex exponential whose angle is taken from the set:

$$0,\; Z,\; 2Z,\; \ldots,\; (n_i - 1)Z \qquad (3.118)$$

where Z is a multiple of $2\pi/N$. No multiplication is needed for the zero angle, which gives at most $N(n_i - 1)/n_i$ complex multiplications for the factor $n_i$, or at most

$$\sum_{i=1}^{m} \frac{N(n_i - 1)}{n_i} \qquad (3.119)$$

for all m factors. This result is used in computing the number of multiplications and additions required by an N length FFT.

3.4.4 Real Operations Count for Computing the Sine and Cosine Difference Equations. Recall from Section 3.1 that the trigonometric values used in an FFT can be computed recursively by:

$$\cos((k+1)a) = \left(C\cos(ka) - S\sin(ka)\right) + \cos(ka) \qquad (3.120)$$

$$\sin((k+1)a) = \left(C\sin(ka) + S\cos(ka)\right) + \sin(ka) \qquad (3.121)$$

where

$$a = 2\pi/N \text{ radians}, \quad C = -2\sin^2(a/2), \quad S = \sin(a), \quad \cos(0) = 1, \quad \sin(0) = 0$$
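Eqs (3.120) and (3.121) can be sketched in Python as follows, assuming IEEE double-precision arithmetic; the function name is ours. Writing C as $-2\sin^2(a/2)$ rather than $\cos(a) - 1$ avoids the cancellation that the direct form suffers for small a, so the table stays close to the library values even for long sequences.

```python
import math


def twiddle_table(N):
    """cos(k*a) and sin(k*a) for k = 0..N-1, a = 2*pi/N, via Eqs (3.120)-(3.121).

    Only C, S, and the starting values use library calls; each recursion
    step in this sketch costs 4 multiplications and 4 additions.
    """
    a = 2.0 * math.pi / N
    C = -2.0 * math.sin(a / 2.0) ** 2      # equals cos(a) - 1 without cancellation
    S = math.sin(a)
    cos_t, sin_t = [1.0], [0.0]            # cos(0) = 1, sin(0) = 0
    for _ in range(1, N):
        c, s = cos_t[-1], sin_t[-1]
        cos_t.append((C * c - S * s) + c)  # Eq (3.120)
        sin_t.append((C * s + S * c) + s)  # Eq (3.121)
    return cos_t, sin_t


cos_t, sin_t = twiddle_table(1024)
err = max(max(abs(c - math.cos(2 * math.pi * k / 1024)),
              abs(s - math.sin(2 * math.pi * k / 1024)))
          for k, (c, s) in enumerate(zip(cos_t, sin_t)))
print(err)
```

The worst-case error after 1024 steps is many orders of magnitude below single-precision resolution.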

In the case of the author's mixed radix FFT, the difference equations are computed N times and the sine and cosine values are stored in lookup tables. The difference equations are:

WKC(I) = C * WKC(I-1) - S * WKS(I-1) + WKC(I-1)     (3.122)
WKS(I) = C * WKS(I-1) + S * WKC(I-1) + WKS(I-1)     (3.123)

Eqs (3.122) and (3.123) require 4 real multiplications and 10 real additions per evaluation, so the real operations count is given by:

$$\text{real mult} = 4N \qquad (3.124)$$

$$\text{real adds} = 10N \qquad (3.125)$$

The IMSL and Singleton FFTs do not use sine and cosine lookup tables, in order to save memory. Instead, the sine and cosine values are computed as needed within the FFT program, resulting in intricate FORTRAN code. Inspection of the IMSL and Singleton FORTRAN code showed that both use the same method of computing the sine and cosine difference equations. For this reason only the Singleton FFT algorithm was studied.

An algorithm which computes the number of real operations required was derived from "counters" placed in the FFT FORTRAN code in Appendix F. The counters recorded the number of times each section of the FFT subroutine was used to compute the sine and cosine values for different values of N. Listed below are the counters and the lines of code where they were positioned; the lines of code are shown in Appendix F.

I2C: Counter for the radix-2 difference equation in lines 2330 - 2340.
I2CL: Counter for the radix-2 sine and cosine library calls in lines 2590 - 2600.
I4C1: Counter for the radix-4 section which computes the sine and cosine terms of the $W_N$ legs of the radix-4 butterfly in lines 3030 - 3040 (Figure 3.19 shows the radix-4 butterfly flowgraph).
I4C2: Counter for the radix-4 section which computes the sine and cosine terms of the $W_N^k$ legs of the radix-4 butterfly flowgraph in lines 3140 - 3170.
I4CL: Counter for the radix-4 sine and cosine library calls in lines 3690 - 3700.
IGTF: Counter for the general twiddle factors section in lines 4990 - 5000, which computes the sine and cosine for the $W_N$ leg of the general radix-p FFT.
IGTFE: Counter for the general twiddle factors section which computes the sine and cosine for the remainder of the radix-p butterfly legs in lines 5170 - 5190.
IGTFL: Counter for the general radix-p sine and cosine library calls in lines 5290 - 5300.

Figure 3.19. Radix-4 Butterfly Flowgraph.

Data was collected for over 70 values of N using these counters. A subset of these values was the 59 permissible sequence lengths of the PFA and WFTA. Based on the results of these tests and a study of the FFT FORTRAN code in Appendix F, the general expressions for these counters were determined. Given that:

N = sequence length
NFAC(i) = ith factor of N (as factored by the Singleton subroutine)
m = number of factors of N
KSPAN_i = N/(NFAC(1) * NFAC(2) * ... * NFAC(i-1))

then

I2C_i = (KSPAN_i - 3)/2    for KSPAN_i > 4 and odd
I2C_i = (KSPAN_i - 2)/2    for KSPAN_i >= 4 and even
I2C_i = 0                  for KSPAN_i < 4

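For the total of all factors of 2 in N, the expression for I2C becomes:

$$\mathrm{I2C} = \sum_{i=1}^{r} \mathrm{I2C}_i \qquad (3.126)$$

where r is the number of factors of 2 in N. The number of sine and cosine library calls during computation of a factor of 2 is $[\mathrm{KSPAN}_i/\ldots]$, where [ ] represents truncation of the result inside the brackets. Using the truncation notation:

$$\mathrm{I2CL} = \sum_{i=1}^{r} \left[\mathrm{KSPAN}_i/\ldots\right] \qquad (3.127)$$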

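The radix-4 section uses the same notational conventions for KSPAN and truncation. The expressions for I4C1, I4C2, and I4CL become:

$$\mathrm{I4C2}_i = \mathrm{KSPAN}_i - 1 \qquad (3.128)$$

$$\mathrm{I4CL}_i = \left[\mathrm{KSPAN}_i/32\right] \qquad (3.129)$$

$$\mathrm{I4C1}_i = \mathrm{I4C2}_i - \mathrm{I4CL}_i \qquad (3.130)$$

For all factors of 4 in N the expressions become:

$$\mathrm{I4C2} = \sum_{i=1}^{k} \left(\mathrm{KSPAN}_i - 1\right) \qquad (3.131)$$

$$\mathrm{I4CL} = \sum_{i=1}^{k} \left[\mathrm{KSPAN}_i/32\right] \qquad (3.132)$$

$$\mathrm{I4C1} = \mathrm{I4C2} - \mathrm{I4CL} \qquad (3.133)$$

where there are k factors of 4 in N.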

The expressions for IGTF, IGTFE, and IGTFL were:

~PI~TCTL:~ N/? (3.1-34)

I V'i E;AN -lciL.-2 (3.135)


IJGT i. (Izf'hv (N I) ( ' (i) 1) . 1.336)

The result for the general radix-p section gives:

$$\mathrm{IGTF} = \sum_{i=1}^{k} \mathrm{IGTF}_i \qquad (3.137)$$

$$\mathrm{IGTFE} = \sum_{i=1}^{k} \mathrm{IGTFE}_i \qquad (3.138)$$

$$\mathrm{IGTFL} = \sum_{i=1}^{k} \mathrm{IGTFL}_i \qquad (3.139)$$

Eqs (3.124) through (3.139) were programmed in FORTRAN and then tabulated as a function of N in Table 3.4. These results match the tests conducted using the counters exactly.

Examining the FORTRAN code where the counters were located gives the number of operations performed each time one of the counters was incremented. These results are presented in Table 3.5 for all the counters. The number of real operations, sine and cosine library calls, and exponentiations can then be determined for any sequence length N:

$$\mathrm{KADD} = 4(\mathrm{I2C} + \mathrm{I4C2} + \mathrm{IGTF}) + 3(\mathrm{I4C1}) + 2(\mathrm{IGTFE}) \qquad (3.140)$$

$$\mathrm{KMULT} = 4(\mathrm{I2C} + \mathrm{I4C2} + \mathrm{IGTF} + \mathrm{IGTFE}) + 6(\mathrm{I4C1}) \qquad (3.141)$$

$$\mathrm{KEXP} = 2(\mathrm{I4C1}) \qquad (3.142)$$
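Eqs (3.140)-(3.142) amount to a weighted sum over the counters. The sketch below encodes the per-increment costs implied by those equations in a lookup table. The function name and the counter totals in the example are hypothetical, chosen only for illustration; they are not measured data.

```python
# Work per increment of each counter, as implied by Eqs (3.140)-(3.142):
# (real additions, real multiplications, exponentiations).
COUNTER_COST = {
    "I2C": (4, 4, 0),
    "I4C1": (3, 6, 2),
    "I4C2": (4, 4, 0),
    "IGTF": (4, 4, 0),
    "IGTFE": (2, 4, 0),
}


def sine_cosine_overhead(counts):
    """Return (KADD, KMULT, KEXP) from a mapping of counter name -> total.

    Library-call counters (I2CL, I4CL, IGTFL) involve no arithmetic in these
    equations, so names missing from COUNTER_COST contribute nothing.
    """
    kadd = kmult = kexp = 0
    for name, n in counts.items():
        a, m, e = COUNTER_COST.get(name, (0, 0, 0))
        kadd += a * n
        kmult += m * n
        kexp += e * n
    return kadd, kmult, kexp


# Hypothetical counter totals, for illustration only
example = {"I2C": 10, "I4C1": 2, "I4C2": 3, "IGTF": 1, "IGTFE": 5, "I2CL": 7}
print(sine_cosine_overhead(example))   # (72, 88, 4)
```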

3.4.5 Real Operations Count for the Mixed Radix FFTs. The real operations count can be derived from the number of complex twiddle factors and butterflies, together with the number of sine and cosine terms computed by the difference equations.


TABLE 3.5

Counter   Add   Multiplication   Exponentiation   Sine Call   Cosine Call
I2C       4     4                0                0           0
I2CL      0     0                0                1           1
I4C1      3     6                2                0           0
I4C2      4     4                0                0           0
I4CL      0     0                0                1           1
IGTF      4     4                0                0           0
IGTFE     2     4                0                0           0
IGTFL     0     0                0                1           1


Given that N is factored as:

$$N = p_1 p_2 \cdots p_m \qquad (3.143)$$

the number of twiddle factors has been shown (Singleton, 1969) to be:

$$\sum_{i=1}^{m} \frac{N(p_i - 1)}{p_i} - (N - 1) \qquad (3.144)$$

where m is the total number of factors of N. The number of butterflies required for an N length sequence is given by:

$$\sum_{i=1}^{m} \frac{N}{p_i} \qquad (3.145)$$
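Eqs (3.144) and (3.145) can be evaluated for any factor list. The sketch below uses a hypothetical function name and is shown with the N = 30 example from Section 3.4.1 and with N = 8.

```python
from math import prod


def butterfly_and_twiddle_counts(factors):
    """Butterflies (Eq 3.145) and complex twiddle factors (Eq 3.144)
    for N = prod(factors)."""
    N = prod(factors)
    butterflies = sum(N // p for p in factors)
    twiddles = sum(N * (p - 1) // p for p in factors) - (N - 1)
    return butterflies, twiddles


print(butterfly_and_twiddle_counts([5, 3, 2]))   # (31, 30)
print(butterfly_and_twiddle_counts([2, 2, 2]))   # (12, 5)
```

Both sums are symmetric in the factors, so the order in which N is factored does not change these counts. The order matters only for the digit reversal and the program structure.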

The total real operations count is determined by adding (a)

the number of real multiplications and additions required

per butterfly times Eq (3.145), plus (b) the complex twiddle

factor multiplications times Eq (3.144), plus (c) the number

of additions and multiplications given by Eq (3.140) and

(3.141).

Assuming a complex multiplication requires four real

multiplications and two additions a general expression for

the real operations count can be determined for the mixed

radix FFTs.

Singleton's mixed radix algorithm contains special transform sections for factors of 2, 3, 4, and 5, as well as a general section for other odd factors. This requires that N be represented as:

$$N = 2^r\,3^s\,4^t\,5^u\,p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k} \qquad (3.146)$$

The IMST mixed radix FFT (! PJTCC) does not have a special

section for factors of 5 and uses the general suction to

transform Lhese factors. The author's mixed radix FFT (IFTMIR)

has sections for 2, 3, 4, and 5 but does not have the general

transform section. Only the detailed development of oper-

ations count for Singleton's algorithm is presented here

because the ether two algorithms are subsets thereof. The

general expressions for real operations versus N are given

for the other two algorithms in Appendix G and H.

The radix-2 section of the FORTRAN code for Singleton's algorithm is shown in Figure 3.20. For factors of two, the twiddle (rotation) factor complex multiplications are computed in this section rather than in the "general rotation section" to reduce the array indexing required. From Eq (3.145), the total number of butterflies is rN/2, and from Eq (3.144), the total number of twiddle factors is rN/2. (This neglects the -(N-1) term, which will be subtracted once the complete real operations count for all factors has been developed.) The transform for a factor of 2 (refer to Figure 3.20) is computed in lines 2200-2230 using 4 real additions if no twiddles are required, or in lines 2450-2500 if twiddles are necessary. The general expression for factors of two becomes:

real mult = 4(rN/2) = 2rN (3.147)

real adds = 4(rN/2) + 2(rN/2) = 3rN (3.148)

Figure 3.20. Radix-2 Section of Singleton's FFT.

Figure 3.21. Radix-3 Section of Singleton's FFT.

The factors of 3 section shown in Figure 3.21 performs only the butterfly and uses the general rotation (twiddle) section to twiddle the data (the general twiddle factor section is shown in Figure 3.24). Using

Eqs (3.144) and (3.145), the number of butterflies for factors of 3 is sN/3 and the number of complex twiddles is s(2N/3). Examining lines 2760-2870 in Figure 3.21 shows 4 real multiplications and 12 real additions per butterfly. Each complex twiddle requires 4 real multiplications and 2 real additions. The expression for the factors of 3 section becomes:

real mult = 4(N/3)s + 4(2/3)Ns

= 4sN (3.149)

real adds = 12(N/3)s + 2(2/3)Ns

= 16sN/3 (3.150)

The factors of 4 section in Figures 3.22a and b includes the twiddles in the butterfly section to minimize array indexing. From Eqs (3.145) and (3.144), the number of butterflies computed for t factors of 4 is tN/4 and the number of complex twiddles is t(3N/4). From lines 3210-3320 and 3540-3570, the number of real additions per butterfly is 16. Every complex twiddle requires 4 real multiplications and 2 additions. Combining the butterfly and twiddle operations results in the general expression for factors of 4:

real mult = 4(3N/4)t = 3tN (3.151)

real adds = 2(3N/4)t + 16(N/4)t

= 3tN/2 + 8tN/2 = 11tN/2 (3.152)

Figure 3.22a. Radix-4 Section of Singleton's FFT.

Figure 3.22b. Radix-4 Section of Singleton's FFT.

Figure 3.23. Radix-5 Section of Singleton's FFT.

The transform section for factors of 5 shown in Figure 3.23 computes the butterflies for the u factors of 5. Based on Eqs (3.145) and (3.144), there are uN/5 butterflies and u(4N/5) complex twiddles. Examination of lines 3920-4090 in Figure 3.23 shows that 16 real multiplications and 32 real additions are required per butterfly. Combining the butterfly and complex twiddle operations provides the general expression for real operations for factors of 5:

real mult = 16(N/5)u + 4(4N/5)u

= 32uN/5 (3.153)

real adds = 32(N/5)u + 2(4N/5)u

= 8uN (3.154)

where u is the number of factors of 5 in N.

The general transform section for odd prime factors is more complex than the special factor sections. To aid in describing the number of real operations, a p-radix is defined such that p is an odd prime greater than 5 with an associated integer power $m_i$. The real operations count for the general section does not include additions associated with array indexing, nor does it count the multiplications and additions needed to recursively compute the sine and cosine terms.

Based on the FORTRAN program for the odd factors shown in Figures 3.24a and b, there are five sources of real operations for each factor $p_i$. The first source, shown in lines 4310-4360, is computing the $(p_i - 1)/2$ complex multipliers for the butterfly legs, which require:

real mult = 4(p_i - 1)/2 = 2(p_i - 1) (3.155)

real adds = 2(p_i - 1)/2 = (p_i - 1) (3.156)

Figure 3.24a. General Factor Section of Singleton's FFT.

Figure 3.24b. General Factor Section of Singleton's FFT.

These complex multipliers are computed only once for each factor $p_i$. For example, if N = 28 = 7 · 4, the factor 7 requires (7 - 1)/2 complex multipliers; if N = 196 = 7 · 7 · 4, there are still only (7 - 1)/2 complex multipliers needed.

The second source of real operations is the computation of the butterfly transmittances, which require only real additions. From Eq (3.145) there are $m_i N/p_i$ butterflies required for the $m_i$ factors of $p_i$. For each butterfly there are $(p_i - 1)/2$ transmittances which require only real additions. Examining lines 4470-4540 in Figure 3.24a shows that each of these $(p_i - 1)/2$ transmittances requires 6 additions. Combining these results produces the general expression for the real additions:

real adds = (6(p_i - 1)/2)(m_i N/p_i)

= 3N m_i (p_i - 1)/p_i (3.157)

The third source of operations is produced by the $(p_i - 1)^2/4$ butterfly transmittances which require real multiplications and additions. Lines 4510-4750 in Figure 3.24b show that 4 real multiplications and 4 real additions are needed for each. Combining this with the number of transmittances and butterflies gives:

real mult = 4(m_i N/p_i)((p_i - 1)^2/4)

= m_i N (p_i - 1)^2/p_i (3.158)

real adds = m_i N (p_i - 1)^2/p_i (3.159)

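The fourth source of real operations appears in the lines beginning at 4800 in Figure 3.24b, which require 4 real additions, repeated $(p_i - 1)/2$ times for each butterfly. Combining these results gives the total as:

real adds = (m_i N/p_i) · 4(p_i - 1)/2

= 2 m_i N (p_i - 1)/p_i (3.160)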

The final source of real operations is shown in Figure 3.24b, lines 5120-5140, which perform the complex twiddle multiplications. From Eq (3.144) there are $m_i N (p_i - 1)/p_i$ complex twiddles, which give the general expressions:

real mult = 4 m_i N (p_i - 1)/p_i (3.161)

real adds = 2 m_i N (p_i - 1)/p_i (3.162)

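Combining Eqs (3.155) through (3.162) gives the expressions for the real operations in the general odd factors section:

$$\text{real mult} = \sum_{i=1}^{k}\left[2(p_i - 1) + \frac{m_i N (p_i - 1)^2}{p_i} + \frac{4 m_i N (p_i - 1)}{p_i}\right] \qquad (3.163)$$

$$\text{real adds} = \sum_{i=1}^{k}\left[(p_i - 1) + \frac{3N m_i (p_i - 1)}{p_i} + \frac{m_i N (p_i - 1)^2}{p_i} + \frac{2 m_i N (p_i - 1)}{p_i} + \frac{2 m_i N (p_i - 1)}{p_i}\right] = \sum_{i=1}^{k}\left[(p_i - 1) + \frac{7N m_i (p_i - 1)}{p_i} + \frac{m_i N (p_i - 1)^2}{p_i}\right] \qquad (3.164)$$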

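Assuming that the sequence can be factored into $N = 2^r 3^s 4^t 5^u p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$, the expressions for the total number of real operations can be written using Eqs (3.140) through (3.164) as:

$$\text{real mult} = 2rN + 4sN + 3tN + \frac{32uN}{5} + \sum_{i=1}^{k}\left[2(p_i - 1) + \frac{m_i N (p_i - 1)^2}{p_i} + \frac{4 m_i N (p_i - 1)}{p_i}\right] - 4(N - 1) + \mathrm{KMULT} \qquad (3.165)$$

$$\text{real adds} = 3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN + \sum_{i=1}^{k}\left[(p_i - 1) + \frac{7N m_i (p_i - 1)}{p_i} + \frac{m_i N (p_i - 1)^2}{p_i}\right] - 2(N - 1) + \mathrm{KADD} \qquad (3.166)$$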

Notice that Eqs (3.165) and (3.166) have the corresponding

4(N-1) and 2(N-1) real operations subtracted from the total

multiplications and additions because the first stage of any

FFT decimation-in-time does not require the "twiddle factors"

(likewise with the last stage of an FFT decimation-in-

frequency). These equations also include KADD and KMULT

which are the real operations required to compute the

recursive sine and cosine difference equation.
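Eqs (3.165) and (3.166) can be sketched in Python as follows. The sketch omits KMULT and KADD because those come from the counter tabulation rather than from a closed form. The function name and the dictionary argument for the odd primes are our own choices, not taken from the thesis code.

```python
from math import prod


def singleton_real_ops(r=0, s=0, t=0, u=0, odd=None):
    """(N, real mults, real adds) for Singleton's mixed radix FFT from
    Eqs (3.165)-(3.166), excluding the sine/cosine terms KMULT and KADD.

    N = 2**r * 3**s * 4**t * 5**u * prod(p**m for p, m in odd.items()),
    where `odd` maps each odd prime p > 5 to its power m >= 1.  Every
    division is exact because the divisor divides N whenever its term
    is nonzero.
    """
    odd = odd or {}
    N = 2**r * 3**s * 4**t * 5**u * prod(p**m for p, m in odd.items())
    mult = 2*r*N + 4*s*N + 3*t*N + 32*u*N // 5
    add = 3*r*N + 16*s*N // 3 + 11*t*N // 2 + 8*u*N
    for p, m in odd.items():
        mult += 2*(p - 1) + m*N*(p - 1)**2 // p + 4*m*N*(p - 1) // p
        add += (p - 1) + 7*N*m*(p - 1) // p + m*N*(p - 1)**2 // p
    mult -= 4*(N - 1)    # no twiddles in the first stage
    add -= 2*(N - 1)
    return N, mult, add


print(singleton_real_ops(r=1, s=1, u=1))   # (30, 256, 432)
print(singleton_real_ops(odd={7: 1}))      # (7, 48, 72)
```

For a prime length, the $(p_i - 1)^2$ butterfly term dominates, so the count grows roughly as $N^2$. For highly composite N, the linear-in-N section terms dominate.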

Similar expressions were derived for the IMSL FFT and the author's FFT; because of the redundancy, these derivations appear in Appendices G and E, respectively. The general expression for real operations required by the IMSL mixed radix FFT (where $N = 2^r 3^s 4^t p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$) is given by:

$$\text{real mult} = 2rN + 4sN + 3tN + \sum_{i=1}^{k}\left[2(p_i - 1) + \frac{4 m_i N (p_i - 1)}{p_i} + \frac{m_i N (p_i - 1)^2}{p_i}\right] - 4(N - 1) + \mathrm{KMULT} \qquad (3.167)$$

$$\text{real adds} = 3rN + \frac{16sN}{3} + \frac{11tN}{2} + \sum_{i=1}^{k}\left[(p_i - 1) + \frac{8 m_i N (p_i - 1)}{p_i} + \frac{N m_i (p_i - 1)^2}{p_i}\right] - 2(N - 1) + \mathrm{KADD} \qquad (3.168)$$

where KMULT and KADD are the multiplies and adds needed to compute the sine and cosine terms. The general expression for the real operations required by the author's mixed radix FFT (where $N = 2^r 3^s 4^t 5^u$) is given by:

$$\text{real mult} = 2rN + 4sN + 3tN + \frac{32uN}{5} - 4(N-1) + 4N \qquad (3.169)$$

$$\text{real adds} = 3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN - 2(N-1) + 10N \qquad (3.170)$$
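Eqs (3.165) and (3.166) are easy to mis-evaluate by hand, so a small calculator can help when checking the plotted counts. The following is a minimal Python sketch; the function name `singleton_real_ops` and its argument layout are hypothetical, and KMULT and KADD are left as caller-supplied inputs because their values depend on the sine/cosine recursion. Exact rational arithmetic is used so that terms such as 16sN/3 stay integral.

```python
from fractions import Fraction


def singleton_real_ops(r, s, t, u, odd_factors=None, kmult=0, kadd=0):
    """Evaluate Eqs (3.165)-(3.166) for N = 2^r 3^s 4^t 5^u * prod(p_i^m_i).

    odd_factors maps each general odd prime factor p_i to its exponent m_i.
    kmult and kadd stand for KMULT and KADD (sine/cosine recursion cost).
    Returns (N, real multiplications, real additions).
    """
    odd_factors = odd_factors or {}
    n = 2**r * 3**s * 4**t * 5**u
    for p, m in odd_factors.items():
        n *= p**m
    N = Fraction(n)

    # Radix-2, 3, 4 and 5 butterfly and twiddle terms.
    mult = 2*r*N + 4*s*N + 3*t*N + Fraction(32, 5)*u*N
    adds = 3*r*N + Fraction(16, 3)*s*N + Fraction(11, 2)*t*N + 8*u*N

    # General odd factor section, Eqs (3.163)-(3.164).
    for p, m in odd_factors.items():
        mult += 2*(p - 1) + m*N*(p - 1)**2 / p + 4*m*N*(p - 1) / p
        adds += (p - 1) + 7*m*N*(p - 1) / p + m*N*(p - 1)**2 / p

    # The first decimation-in-time stage needs no twiddle factors.
    mult += -4*(N - 1) + kmult
    adds += -2*(N - 1) + kadd
    assert mult.denominator == 1 and adds.denominator == 1
    return n, int(mult), int(adds)
```

With no general odd factors, passing kmult = 4N and kadd = 10N reproduces the author's Eqs (3.169) and (3.170); for example `singleton_real_ops(1, 1, 1, 1, kmult=480, kadd=1200)` evaluates N = 120 = 5·4·3·2.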

The real operations count for Singleton's mixed radix FFT is shown for N < 200 in Figures 3.26 and 3.27. The operations count plotted includes only the additions and multiplications for the butterflies and twiddle factors, in order to demonstrate the $N^2$ "upper bound" and the $N \log_2 N$ "lower bound". The $N^2$ upper bound occurs in the mixed radix FFTs when a prime length must be transformed. The $N \log_2 N$ lower bound is reached when $N = 2^m$. Between the $N^2$ and $N \log_2 N$ bounds there are other "bounds", which are observed in Figure 3.25. The dashed lines represent numbers


which are not primes but are not highly factorable either. The dashed lines approach $N \log_2 N$ as N becomes more factorable.

The relative efficiency of radix-2, 3, 4 and 5 FFTs is observed in Figures 3.27 and 3.28. These figures plot real operations counts for the mixed radix FFT for N less than 250 (where N has no factors other than 2, 3, 4 and 5) and annotate the integer powers of 2, 3, 4 and 5. Notice that the fixed radix-2 and radix-4 cases provide the "lower bound" and the radix-3 and radix-5 cases provide the "upper bound" on the number of real operations, which shows that integer powers of 2 and 4 require the fewest real operations and integer powers of 3 and 5 the most. Other combinations of factors, e.g., N = 120 = 5·4·3·2, have real operations counts which fall between these "bounds".

3.4.6 Memory Requirements for Mixed Radix FFTs.

As in the case of fixed radix algorithms, a major consider-

ation in selecting a particular mixed radix algorithm is

the memory required to execute the FFT subroutine given the

memory storage limitations of the computer to be used. The

memory requirements for the three mixed radix FFTs are given here as a function of the sequence length N. Each algorithm has program and memory array requirements, which are listed below.

All the algorithms were compiled on the CDC Cyber system at AFIT and the program memory required by each subroutine was determined from a "load map" generated by the


command MAP, PART. This load map gives the size of all

programs used during execution. The array storage require-

ments were determined from the FORTRAN coded programs and

reference material provided with the IMSL and Singleton FFT

subroutines. The general expression for the memory requirements of each FFT subroutine (as a function of N) is given below.

The subroutine written by the author requires 899

words of program memory. This subroutine (FFTMR) also

requires the "calling" program to dimension 6 arrays

(A, B, AT, BT, WKS, and WKC) to length N. (Use of these

arrays is explained in Appendix E.) This gives the total memory array required as:

$$\text{FFTMR memory} = 6N \qquad (3.171)$$

The mixed radix subroutine written by Singleton (FFTSNG) requires 1100 words of program memory. Four arrays (AT, BT, CK, SK) are dimensioned to equal the maximum prime factor of N. If there are no prime factors greater than 5, these arrays may be reduced to length 1. A fifth array (NP) is dimensioned to at least one less than the product K of the square-free factors (see Glossary) of N. If N contains at most one square-free factor, this array can be reduced to M + 1, where M is the maximum number of prime factors of N. Two more arrays (XR and XI) are dimensioned to length N. The total memory array storage becomes:

$$\text{FFTSNG memory} = 2N + 4\cdot\text{MAXPF} + (K-1 \text{ or } M+1) \qquad (3.172)$$

where

N = sequence length
MAXPF = maximum prime factor of N
K = product of the square-free factors
M = maximum number of prime factors

NOTE: K-1 or M+1 is selected in Eq (3.172) based on the number of square-free factors of N, as described in the preceding paragraph.
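The bookkeeping in Eq (3.172) can be automated. The sketch below is a hypothetical Python helper, not part of Singleton's code; it assumes that the "square-free factors" are the primes left over after all pairs of equal primes are removed (primes with an odd exponent), which matches the N=2100 example that follows, where K = 3·7.

```python
def prime_factors(n):
    """Prime factorization of n as {prime: exponent} by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def fftsng_array_memory(n):
    """Array words needed by FFTSNG according to Eq (3.172)."""
    f = prime_factors(n)
    maxpf = max(f) if f else 1
    # AT, BT, CK, SK can be cut to length 1 when no prime factor exceeds 5.
    work_len = maxpf if maxpf > 5 else 1
    m = sum(f.values())                        # prime factors counted with multiplicity
    square_free = [p for p, e in f.items() if e % 2 == 1]
    k = 1
    for p in square_free:
        k *= p                                 # K, product of the square-free factors
    np_len = k - 1 if len(square_free) > 1 else m + 1
    return 2 * n + 4 * work_len + np_len
```

Adding the 1100 words of program memory to the returned value gives the Cyber 74 total used in the examples below.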

The mixed radix subroutine (FFTCC) provided as part

of the IMSL package on the CDC Cyber system requires 1061

words of program memory. A complex array (A) must be

dimensioned to length N and two other arrays (IWK and WK)

are dimensioned to length "IWORD", where:

$$\text{IWORD} = 3M + 3 + \max(4M + 7 + 6K,\; KB + 1 + 2\,JK) \qquad (3.173)$$

To define the quantities M, K, KB and JK, a prime factor decomposition of N is required such that:

$$N = f_1^2 f_2^2 \cdots f_{KT}^2\, f_{KT+1} \cdots f_{KT+JT}$$

where each $f_i$ is a prime number (other than 1) and $f_i \ne f_r$ for $i \ne r$, given that:

$$i, r \ge KT + 1, \qquad KT \ge 0, \qquad JT \ge 0$$

Then:

$$M = 2KT + JT \qquad (3.174)$$

is the number of prime factors in N and:

$$K = \max_{1 \le j \le KT+JT} (f_j) \qquad (3.175)$$

is the largest prime factor of N. KB and JK are defined as follows:

$$JK = 1 \cdot f_1 f_2 \cdots f_{KT} \qquad (3.176)$$

where JK = 1 if KT = 0, and

$$KB = N/(JK)^2 - 2 \qquad (3.177)$$

Once M, K, JK, and KB are determined they are substituted

into Eq (3.173) to determine the value of IWORD, the actual

work storage requirement. Counting only the arrays for the

work vectors (IWK and WK) and the data arrays (A and B)

gives the total array memory required for the IMSL FFT:

$$\text{Memory} = 2N + 2\cdot\text{IWORD} \qquad (3.178)$$
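Eqs (3.173) through (3.178) can be evaluated the same way. In this hypothetical Python sketch (function names are illustrative only), KT counts how many squared primes can be split off N and JT counts the primes left over, following the decomposition given above.

```python
def prime_factors(n):
    """Prime factorization of n as {prime: exponent} by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def fftcc_iword(n):
    """Work-vector length IWORD for the IMSL FFTCC, Eqs (3.173)-(3.177)."""
    f = prime_factors(n)
    kt = sum(e // 2 for e in f.values())       # number of squared factors f_i^2
    jt = sum(e % 2 for e in f.values())        # number of unpaired factors
    m = 2 * kt + jt                            # Eq (3.174)
    k = max(f) if f else 1                     # Eq (3.175)
    jk = 1
    for p, e in f.items():
        jk *= p ** (e // 2)                    # Eq (3.176); JK = 1 when KT = 0
    kb = n // (jk * jk) - 2                    # Eq (3.177)
    return 3 * m + 3 + max(4 * m + 7 + 6 * k, kb + 1 + 2 * jk)   # Eq (3.173)


def fftcc_array_memory(n):
    """Array words for FFTCC, Eq (3.178): 2N for the data plus IWK and WK."""
    return 2 * n + 2 * fftcc_iword(n)
```

For N = 2100 the sketch gives IWORD = 94, in agreement with the hand calculation below; note that Eq (3.178) then doubles IWORD.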

An example of N=2100 is used to demonstrate the use of Eqs (3.172) through (3.178) in computing the memory array required by the IMSL and Singleton subroutines. For N=2100 the factors are $2^2 \cdot 5^2 \cdot 3 \cdot 7$, for which the FFTSNG variables become:

N = 2100 (sequence length)
MAXPF = 7 (maximum prime factor in N)
K = 3·7 = 21 (product of the square-free factors)
M = 6 (maximum number of prime factors)

Using Eq (3.172), the FFTSNG memory array is given by

$$2\cdot 2100 + 4\cdot 7 + (20 \text{ or } 7) = 4248 \qquad (3.179)$$

NOTE: There are two square-free factors, 3 and 7; therefore choose 20 for the last term of Eq (3.179).

If this subroutine were used on the Cyber 74 computer, the program memory would be added to the memory array to give a total memory of:

$$\text{memory} = 4248 + 1100 = 5348 \text{ words} \qquad (3.180)$$

The same example of N=2100 is applied to the IMSL memory equation, where:

$$N = f_1^2 f_2^2 \cdots f_{KT}^2\, f_{KT+1} \cdots f_{KT+JT} = 2^2 \cdot 5^2 \cdot 3 \cdot 7 = 2100 \qquad (3.181)$$

From Eq (3.174) the expression for M becomes:

$$M = 2\,KT + JT = 2\cdot 2 + 2 = 6 \qquad (3.182)$$

which is the number of prime factors in N. The largest prime factor in N is given by Eq (3.175):

$$K = \max_{1 \le j \le KT+JT} (f_j) = 7 \qquad (3.183)$$

JK, which is the product of the "square factors", is:

$$JK = 1 \cdot f_1 f_2 \cdots f_{KT} = 2 \cdot 5 = 10 \qquad (3.184)$$

and KB is

$$KB = N/(JK)^2 - 2 = 2100/100 - 2 = 19 \qquad (3.185)$$

The results of Eqs (3.181) through (3.185) provide the size of the work vector IWORD given by Eq (3.173):

$$\begin{aligned}\text{IWORD} &= 3M + 3 + \max(4M + 7 + 6K,\; KB + 1 + 2JK)\\ &= 18 + 3 + \max(24 + 7 + 42,\; 19 + 1 + 20)\\ &= 21 + \max(73,\; 40) = 94\end{aligned}$$

Substituting IWORD = 94 and N = 2100 into Eq (3.178) gives the memory array for FFTCC as:

$$2N + 2\cdot\text{IWORD} = 4200 + 188 = 4388 \qquad (3.186)$$

Using this subroutine on the Cyber 74 computer requires 1061 words of program memory, which makes the total memory required equal to:

$$4388 + 1061 = 5449 \text{ words} \qquad (3.187)$$

For this length N = 2100 sequence, the Singleton FFTSNG uses less memory (5348 words) than the IMSL FFTCC (5449 words).

The array memory requirements given by Eqs (3.172) and (3.178) are plotted in Figures 3.29 and 3.30 for N less than 200. It is readily observed that selectively adjusting N to be highly factorable (composite) minimizes the memory required by subroutines FFTCC or FFTSNG. As an example of how prime numbers increase the memory array sizes, consider the prime N = 2099 for each algorithm. For FFTSNG the variables are MAXPF = 2099, K = 2099, and M = 1. Since N = 2099 contains only one square-free factor, the array NP can be dimensioned to M + 1 = 2. The memory array for FFTSNG becomes:

$$2N + 4\cdot\text{MAXPF} + 2 = 4198 + 8396 + 2 = 12596 \text{ words of memory array}$$

Adding the program memory of 1100 words yields the total memory required to execute FFTSNG on the Cyber 74:

$$\text{memory} = 12596 + 1100 = 13696 \qquad (3.188)$$

For the IMSL FFT the variables are K = 2099, JK = 1, KT = 0, JT = 1, KB = 2097, and M = 1. The expression for IWORD becomes:

$$\begin{aligned}\text{IWORD} &= 3M + 3 + \max(4M + 7 + 6K,\; KB + 1 + 2JK)\\ &= 3 + 3 + \max(12605,\; 2100) = 12611\end{aligned}$$

The memory array is then

$$2N + 2\cdot\text{IWORD} = 2\cdot 2099 + 2\cdot 12611 = 29420 \qquad (3.189)$$

and adding the 1061 words of program memory gives a total of 30481 words on the Cyber 74 system, about 5.6 times the total memory required for N = 2100.


3.5 Fourier Transforms Using Fast Convolution Algorithms

The paper by Cooley and Tukey, 1965, had a major impact

on digital signal processing by stimulating the development

and wide use of the FFT. More recently, several new ideas have been used to compute the DFT which have also impacted digital signal processing. In 1968 Rader observed that, when N is prime, computation of the DFT can be changed to a circular convolution by rearranging the data. Consequently, given a fast way to perform circular convolution, one has a fast

DFT method. Winograd showed the minimum number of multi-

plications for circular convolution of primes and prime

power length sequences. He then proposed that these high

speed prime power convolutions be "nested" into long trans-

forms to minimize multiplications. The Winograd nested

algorithm has been studied and programmed (Silverman, 1977;

McClellan and Nawab, 1979; Zohar, 1979) for computing the

DFT of complex valued sequences.

An alternative to the Winograd algorithm was proposed by Kolba and Parks, which combines the concept of fast convolution with conventional DFT techniques to give another efficient DFT implementation. Kolba and Parks' prime factor algorithm (PFA) uses the same reordering technique as the Winograd Fourier transform algorithm (WFTA). The original PFA (Kolba and Parks, 1978) has been modified (Burrus and Eschenbacher, 1980) so it can transform the same sequence lengths as the WFTA.

This section presents the theory of the WFTA "small-N" algorithms, the data reordering (which is the same for PFA and WFTA), the PFA theory, the real operations count, and the memory array requirements for both PFA and WFTA. Since both algorithms follow a similar development, the conversion of a DFT to circular convolution and the data reordering are presented only once and apply to both algorithms.

3.5.1 Converting a DFT to Circular Convolution.

To convert the DFT expression to a circular convolution the

DFT matrix [W] must be "mapped" into the circular convolution matrix [W_c]. The mapping between these two matrices, and hence the basis for the WFTA and PFA, was developed by Rader in 1968.

Rader showed that if "N is prime, there is some number g, not necessary unique, such that a one-to-one mapping from the integers i = 1, 2, ..., N-1 to the integers j = 1, 2, ..., N-1 is given by:

$$j = ((g^i))_N \qquad (3.190)$$

where the notation $((x))_N$ implies x modulo N." The example of N=7 and g=3 using the mapping of Eq (3.190) gives:

i   1  2  3  4  5  6
j   3  2  6  4  5  1

The number g is referred to as a "primitive root" in number theory. The mapping of Eq (3.190) provides the convolution matrix [W_c] from the DFT matrix [W]. Examples of this mapping are extensively treated in the references (Silverman, 1977; Kolba and Parks, 1977) and are not repeated in this paper.
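Rader's mapping can be checked numerically. The Python sketch below (function names are hypothetical) reproduces the N=7, g=3 table and then uses the same permutation to compute a prime-length DFT as X(0) plus an (N-1)-point circular convolution. The convolution is evaluated directly here for clarity; a fast convolution would replace that step in a practical WFTA or PFA module.

```python
import cmath


def rader_index_map(n, g):
    """j = ((g**i))_N for i = 1, ..., N-1, as in Eq (3.190)."""
    return [pow(g, i, n) for i in range(1, n)]


def rader_dft(x, g):
    """Prime-length DFT via Rader's reordering; g must be a primitive root of N."""
    n = len(x)

    def w(e):
        return cmath.exp(-2j * cmath.pi * e / n)

    g_inv = pow(g, n - 2, n)                  # inverse of g modulo the prime N
    # With n = g^-q and k = g^p, W^(nk) = W^(g^(p-q)): a circular convolution.
    a = [x[pow(g_inv, q, n)] for q in range(n - 1)]
    b = [w(pow(g, p, n)) for p in range(n - 1)]
    conv = [sum(a[q] * b[(p - q) % (n - 1)] for q in range(n - 1))
            for p in range(n - 1)]
    out = [sum(x)] + [0j] * (n - 1)           # X(0) is the plain sum
    for p in range(n - 1):
        out[pow(g, p, n)] = x[0] + conv[p]    # add back the x(0) term
    return out
```

Because g is a primitive root, the powers $g^p$ for p = 0, ..., N-2 visit every nonzero output index exactly once, so the loop fills X(1) through X(N-1).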

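A brief example using the results of the convolution matrix is presented to aid in developing the small-N algorithm operations count. Consider the following 3-point DFT written in matrix notation as:

$$\begin{bmatrix}X(0)\\X(1)\\X(2)\end{bmatrix} = \begin{bmatrix}W^0 & W^0 & W^0\\ W^0 & W^1 & W^2\\ W^0 & W^2 & W^1\end{bmatrix}\begin{bmatrix}x(0)\\x(1)\\x(2)\end{bmatrix} \qquad (3.191)$$

where $W = \exp(-j2\pi/3)$ is assumed and the exponents have been reduced modulo 3 ($W^4 = W^1$). The circular convolution is given by:

$$\begin{bmatrix}\hat X(1)\\ \hat X(2)\end{bmatrix} = \begin{bmatrix}W^1 & W^2\\ W^2 & W^1\end{bmatrix}\begin{bmatrix}x(1)\\ x(2)\end{bmatrix} \qquad (3.192)$$

which provides $\hat X(1)$ and $\hat X(2)$. Then the DFT in Eq (3.191) can be rewritten using Eq (3.192) to give:

$$\begin{aligned}X(0) &= W^0\,(x(0) + x(1) + x(2))\\ X(1) &= W^0 x(0) + \hat X(1)\\ X(2) &= W^0 x(0) + \hat X(2)\end{aligned} \qquad (3.193)$$

Using techniques similar to the one presented here, convolution expressions to perform DFTs have been developed for N = 2, 4, 5, 7, 8, 9 and 16.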

3.5.2 Reordering the Data Arrays. Implementing the WFTA or the PFA in a useful form involves making long transforms from the short, fast-convolution transforms for 2, 3, 4, 5, 7, 8, 9, and 16. The general idea is "to convert a one-dimensional length $N = M_1 M_2 \cdots M_i$ transform into an i-dimensional transform requiring computation of i shorter length $M_k$ transforms for k = 1, 2, ..., i" (Kolba and Parks, 1977). The mapping from one dimension to i dimensions is based on the Chinese Remainder Theorem, which requires relatively prime factors $M_1, M_2, \ldots, M_i$. The example for two mutually prime factors given by Kolba and Parks, 1977, is presented here because the mapping is common to both WFTA and PFA.

In the DFT:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} \qquad (3.194)$$

the index n of the input sequence is referred to as the input index, and the index k of the output sequence X(k) is called the output index. Mapping from one to two dimensions maps the input index n into a pair of indices $(n_1, n_2)$:

$$n_1 = r_1 n \bmod M_1, \quad n_1 = 0, \ldots, M_1-1, \quad r_1 M_2 \equiv 1 \pmod{M_1}$$

$$n_2 = r_2 n \bmod M_2, \quad n_2 = 0, \ldots, M_2-1, \quad r_2 M_1 \equiv 1 \pmod{M_2}$$

The output index is mapped by

$$k_1 = k \bmod M_1, \quad k_1 = 0, \ldots, M_1-1$$

$$k_2 = k \bmod M_2, \quad k_2 = 0, \ldots, M_2-1$$

The inverse mapping (two dimensions to one) for the output index is:

$$k = (s_1 k_1 + s_2 k_2) \bmod N \qquad (3.195)$$

where

$$s_1 \equiv 1 \pmod{M_1} \text{ and } s_1 \equiv 0 \pmod{M_2}, \qquad s_2 \equiv 0 \pmod{M_1} \text{ and } s_2 \equiv 1 \pmod{M_2}$$

While the same inverse mapping in Eq (3.195) could be used for the input index n, it is more convenient (Kolba and Parks, 1977) to use:

$$n = (M_2 n_1 + M_1 n_2) \bmod N \qquad (3.196)$$

When the mappings in Eqs (3.195) and (3.196) are used, the DFT becomes:

$$X(k_1,k_2) = \sum_{n_1=0}^{M_1-1}\sum_{n_2=0}^{M_2-1} x(n_1,n_2)\, W_{M_2}^{n_2k_2}\, W_{M_1}^{n_1k_1} \qquad (3.197)$$

At this point the WFTA and PFA approach the implementation of Eq (3.197) differently, as seen below.
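The index maps of Eqs (3.195) and (3.196) and the two-dimensional form of Eq (3.197) can be verified directly. In this Python sketch (names are hypothetical; Python 3.8 or later is assumed for the modular inverse `pow(a, -1, m)`), s1 and s2 are built from the Chinese Remainder Theorem conditions listed above, and Eq (3.197) is evaluated by brute force so it can be compared with an ordinary DFT.

```python
import cmath


def w(m, e):
    """Twiddle W_M^e = exp(-j*2*pi*e/M)."""
    return cmath.exp(-2j * cmath.pi * e / m)


def index_maps(m1, m2):
    """Input map of Eq (3.196) and output map of Eq (3.195) for coprime M1, M2."""
    n = m1 * m2
    s1 = m2 * pow(m2, -1, m1) % n     # s1 = 1 mod M1, 0 mod M2
    s2 = m1 * pow(m1, -1, m2) % n     # s2 = 0 mod M1, 1 mod M2
    in_map = {(n1, n2): (m2 * n1 + m1 * n2) % n
              for n1 in range(m1) for n2 in range(m2)}
    out_map = {(k1, k2): (s1 * k1 + s2 * k2) % n
               for k1 in range(m1) for k2 in range(m2)}
    return in_map, out_map


def dft_via_eq_3_197(x, m1, m2):
    """Evaluate Eq (3.197) directly and scatter X(k1, k2) back to X(k)."""
    in_map, out_map = index_maps(m1, m2)
    X = [0j] * (m1 * m2)
    for (k1, k2), k in out_map.items():
        X[k] = sum(x[n] * w(m2, n2 * k2) * w(m1, n1 * k1)
                   for (n1, n2), n in in_map.items())
    return X
```

The key point the check exposes is that no $W_N$ twiddle survives: the cross terms in nk vanish modulo N because $M_1 s_1$ and $M_2 s_2$ are multiples of N.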

3.5.3 The Winograd Fourier Transform. A new algorithm for computing the DFT was proposed by Winograd in July 1975. The WFTA has the property that the number of real additions remains at the FFT level while the number of real multiplications necessary to evaluate the DFT is reduced (Silverman, 1977). This paper will not derive the "small-N" algorithms. Readers interested in the derivation of the WFTA are referred to the articles which extensively treat the topic (Winograd, 1976; Silverman, 1977; Kolba and Parks, 1977; Zohar, 1979).

Winograd's proof started with the N by N matrix with elements:

$$W_N^{ir} = W_N^{(ir \bmod N)} = Q_N(i,r) \qquad (3.198)$$

which can be decomposed to:

$$Q_N = O_N\, D_N\, I_N \qquad (3.199)$$

where $I_N$ is a u by N incidence matrix with values of 0, 1, and -1 only, $D_N$ is a u by u diagonal matrix, and $O_N$ is an N by u incidence matrix (Silverman, 1977). The decomposition of $Q_N$ is possible with large values of u relative to N (i.e., $u = N^2$). Winograd solved the more difficult problem of decomposing $Q_N = O_N D_N I_N$ with a dimension u smaller than $N^2$. Winograd applied field theory to give solutions where u approximately equals N for small values of N, namely N = 2, 3, 4, 5, 7, 8, 9, and 16 (Silverman, 1977).

Not only did Winograd prove the minimum multiplication count for the above small-N DFTs, but he also proposed a special structure of Eq (3.197) using Eq (3.199). The two-dimensional transform in Eq (3.197) may be implemented by first calculating $M_1$ length-$M_2$ DFTs:

$$y(n_1,k_2) = \sum_{n_2=0}^{M_2-1} x(n_1,n_2)\, W_{M_2}^{n_2k_2} \qquad (3.200)$$

and then calculating $M_2$ length-$M_1$ DFTs:

$$X(k_1,k_2) = \sum_{n_1=0}^{M_1-1} y(n_1,k_2)\, W_{M_1}^{n_1k_1} \qquad (3.201)$$

Using the notation of Eq (3.199), the length-$M_1$ short transform can be written in terms of the input additions $I^{(1)}$, output additions $O^{(1)}$, and multiplications $d^{(1)}$. The length-$M_2$ transform uses $I^{(2)}$, $O^{(2)}$, and $d^{(2)}$ (Kolba and Parks, 1977). Then Eq (3.200) becomes:

$$y(n_1,k_2) = \sum_{r=0}^{u_2-1} O^{(2)}_{k_2 r}\, d^{(2)}_r \sum_{n_2=0}^{M_2-1} I^{(2)}_{r n_2}\, x(n_1,n_2) \qquad (3.202)$$

$X(k_1,k_2)$ in Eq (3.201) is a length-$M_1$ transform of $y(n_1,k_2)$, which can also be written:

$$X(k_1,k_2) = \sum_{m=0}^{u_1-1} O^{(1)}_{k_1 m}\, d^{(1)}_m \sum_{n_1=0}^{M_1-1} I^{(1)}_{m n_1}\, y(n_1,k_2) \qquad (3.203)$$

Substituting Eq (3.202) into Eq (3.203) gives:

$$X(k_1,k_2) = \sum_{m=0}^{u_1-1} O^{(1)}_{k_1 m}\, d^{(1)}_m \sum_{n_1=0}^{M_1-1} I^{(1)}_{m n_1} \sum_{r=0}^{u_2-1} O^{(2)}_{k_2 r}\, d^{(2)}_r \sum_{n_2=0}^{M_2-1} I^{(2)}_{r n_2}\, x(n_1,n_2) \qquad (3.204)$$

The order of summation may be interchanged to "nest" the multiplications in the center, which gives Eq (3.204) rewritten as:

$$X(k_1,k_2) = \sum_{r=0}^{u_2-1} O^{(2)}_{k_2 r} \sum_{m=0}^{u_1-1} O^{(1)}_{k_1 m}\, d^{(1)}_m\, d^{(2)}_r \sum_{n_1=0}^{M_1-1} I^{(1)}_{m n_1} \sum_{n_2=0}^{M_2-1} I^{(2)}_{r n_2}\, x(n_1,n_2) \qquad (3.205)$$

Eq (3.205) is the form that is implemented in FORTRAN code (McClellan and Nawab, 1979) and listed in Appendix H.
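The nesting of Eq (3.205) can be illustrated for N = 6 = 2·3. The sketch below is hypothetical Python, not a transcription of the McClellan and Nawab FORTRAN; it uses the trivial N=2 factorization (I = [[1, 1], [1, -1]], d = [1, 1], O = identity) and the N=3 O·D·I form of Eq (3.214), reorders the data with Eqs (3.195) and (3.196), and then applies all input additions, all multiplications, and all output additions in turn. Only $u_1 u_2 = 6$ multiplications occur, one per nested diagonal entry, which agrees with NMULT = $M_1 M_2$ of Eq (3.221) using the Table 3.8 values.

```python
import math

# Small-N factorizations (O, d, I) with O * diag(d) * I equal to the DFT matrix.
SMALL_N = {
    2: ([[1, 0], [0, 1]], [1, 1], [[1, 1], [1, -1]]),
    3: ([[1, 0, 0], [1, 1, 1], [1, 1, -1]],             # O, Eq (3.214)
        [1, -1.5, -1j * math.sqrt(3) / 2],               # d, Eq (3.214)
        [[1, 1, 1], [0, 1, 1], [0, 1, -1]]),             # I, Eq (3.214)
}


def nested_wfta(x, m1, m2):
    """Two-factor WFTA in the nested form of Eq (3.205); M1, M2 coprime."""
    o1, d1, i1 = SMALL_N[m1]
    o2, d2, i2 = SMALL_N[m2]
    u1, u2, n = len(d1), len(d2), m1 * m2
    s1 = m2 * pow(m2, -1, m1) % n                  # output map, Eq (3.195)
    s2 = m1 * pow(m1, -1, m2) % n
    # Reorder the input into x(n1, n2) using Eq (3.196).
    xx = [[x[(m2 * a + m1 * b) % n] for b in range(m2)] for a in range(m1)]
    # Input additions: I(2) along n2, then I(1) along n1.
    t = [[sum(i2[r][b] * xx[a][b] for b in range(m2)) for r in range(u2)]
         for a in range(m1)]
    t = [[sum(i1[m][a] * t[a][r] for a in range(m1)) for r in range(u2)]
         for m in range(u1)]
    # Multiplication stage: one product per nested entry d(1)_m * d(2)_r
    # (these coefficients would normally be precomputed once).
    t = [[d1[m] * d2[r] * t[m][r] for r in range(u2)] for m in range(u1)]
    # Output additions: O(1) over m, then O(2) over r.
    t = [[sum(o1[k1][m] * t[m][r] for m in range(u1)) for r in range(u2)]
         for k1 in range(m1)]
    X = [0j] * n
    for k1 in range(m1):
        for k2 in range(m2):
            X[(s1 * k1 + s2 * k2) % n] = sum(o2[k2][r] * t[k1][r] for r in range(u2))
    return X
```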

As an example of the "nesting" structure for the WFTA, consider the case of N=3 given in Eqs (3.190) through (3.192). First, let

$$\begin{bmatrix}\hat X(1)\\ \hat X(2)\end{bmatrix} = \begin{bmatrix}M_1 + M_2\\ M_1 - M_2\end{bmatrix} \qquad (3.206)$$

then equating Eqs (3.206) and (3.191) gives:

$$\begin{bmatrix}M_1 + M_2\\ M_1 - M_2\end{bmatrix} = \begin{bmatrix}x(1)W^1 + x(2)W^2\\ x(1)W^2 + x(2)W^1\end{bmatrix} \qquad (3.207)$$

Substituting

$$W^1 = \exp(-j2\pi/3) = -1/2 - j(\sqrt{3}/2), \qquad W^2 = \exp(-j4\pi/3) = -1/2 + j(\sqrt{3}/2)$$

into Eq (3.207) provides:

$$M_1 + M_2 = -x(1)/2 - j(\sqrt{3}/2)\,x(1) - x(2)/2 + j(\sqrt{3}/2)\,x(2) \qquad (3.208)$$

$$M_1 - M_2 = -x(1)/2 + j(\sqrt{3}/2)\,x(1) - x(2)/2 - j(\sqrt{3}/2)\,x(2) \qquad (3.209)$$

Solving for $M_1$ and $M_2$ gives:

$$M_1 = -(1/2)\,(x(1) + x(2)), \qquad M_2 = -j(\sqrt{3}/2)\,(x(1) - x(2)) \qquad (3.210)$$

For this algorithm to be used within Winograd's nested algorithm, the multiplications by $W^0 = 1$ must be accounted for and minimized. This is accomplished by modifying the length-3 DFT to:

$$a_1 = x(1) + x(2), \qquad a_2 = x(1) - x(2), \qquad a_3 = x(0) + a_1 \qquad (3.211)$$

$$M_1 = (-1/2 - 1)\,a_1 = -(3/2)\,a_1, \qquad M_2 = -j(\sqrt{3}/2)\,a_2, \qquad M_3 = W^0 a_3 = a_3 \qquad (3.212)$$

$$C_1 = M_3 + M_1, \qquad X(0) = M_3, \qquad X(1) = C_1 + M_2, \qquad X(2) = C_1 - M_2 \qquad (3.213)$$

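Eqs (3.211) through (3.213) result in 2 multiplications, 1 multiplication by $W^0$, and 6 additions, which can now be expressed in the $X = O \cdot D \cdot I \cdot x$ notation as:

$$\begin{bmatrix}X(0)\\X(1)\\X(2)\end{bmatrix} = \begin{bmatrix}1&0&0\\1&1&1\\1&1&-1\end{bmatrix}\begin{bmatrix}1&0&0\\0&-3/2&0\\0&0&-j\sqrt{3}/2\end{bmatrix}\begin{bmatrix}1&1&1\\0&1&1\\0&1&-1\end{bmatrix}\begin{bmatrix}x(0)\\x(1)\\x(2)\end{bmatrix} \qquad (3.214)$$

and then rewritten into summations as:

$$X(k) = \sum_{r=0}^{u-1} O_{kr}\, d_r \sum_{n=0}^{N-1} I_{rn}\, x(n) \qquad (3.215)$$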
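The length-3 module of Eqs (3.211) through (3.213), and the claim that the three factors of Eq (3.214) multiply out to the DFT matrix, can both be checked numerically. This Python sketch is illustrative only; the names are chosen for readability and do not come from the FORTRAN listings.

```python
import math

SQRT3_2 = math.sqrt(3) / 2


def wfta3(x0, x1, x2):
    """Length-3 WFTA small-N module, Eqs (3.211)-(3.213)."""
    a1 = x1 + x2
    a2 = x1 - x2
    a3 = x0 + a1
    m1 = -1.5 * a1            # (cos(2*pi/3) - 1) * a1
    m2 = -1j * SQRT3_2 * a2   # -j * sin(2*pi/3) * a2
    m3 = a3                   # W^0 * a3; kept separate so it can carry a nested factor
    c1 = m3 + m1
    return [m3, c1 + m2, c1 - m2]


# Factors of Eq (3.214): X = O * diag(D) * I * x
O_MAT = [[1, 0, 0], [1, 1, 1], [1, 1, -1]]
D_DIAG = [1, -1.5, -1j * SQRT3_2]
I_MAT = [[1, 1, 1], [0, 1, 1], [0, 1, -1]]


def odi_matrix():
    """Q(k, n) = sum_r O(k, r) d_r I(r, n), the matrix form of Eq (3.215)."""
    return [[sum(O_MAT[k][r] * D_DIAG[r] * I_MAT[r][n] for r in range(3))
             for n in range(3)] for k in range(3)]
```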

The fast convolution cases for N = 2, 4, 5, 7, 8, 9, and 16 were developed similarly to the method used for N=3 above. The explicit equations for these cases provided the small-N operations count shown in Table 3.6, which is used in computing the real operations count as a function of N for the WFTA.

3.5.4 The Prime Factor Algorithm Theory. An alternative to the nested algorithm proposed by Winograd was developed by Kolba and Parks. Because of the algorithm's structure it is called the prime factor algorithm (PFA), and it uses a modified version of Winograd's high-speed convolution technique.

Converting the DFT to circular convolution and reordering the data arrays for the PFA is identical up through Eq (3.197), where $W_{M_1} = \exp(-j2\pi/M_1)$ and $W_{M_2} = \exp(-j2\pi/M_2)$, with $M_1$ and $M_2$ relatively prime.

The transform in Eq (3.197) may be performed by calculating $M_1$ length-$M_2$ DFTs:

$$y(n_1,k_2) = \sum_{n_2=0}^{M_2-1} x(n_1,n_2)\, W_{M_2}^{n_2k_2} \qquad (3.216)$$

then calculating $M_2$ length-$M_1$ DFTs:

$$X(k_1,k_2) = \sum_{n_1=0}^{M_1-1} y(n_1,k_2)\, W_{M_1}^{n_1k_1} \qquad (3.217)$$

The expressions in Eqs (3.216) and (3.217) are implemented as short DFTs instead of the "nested" operations shown in Eq (3.205).
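Eqs (3.216) and (3.217) amount to two passes of short DFTs with no twiddle factors between them, which is what distinguishes the PFA from a mixed radix FFT. In this hypothetical Python sketch the short transforms are plain direct DFTs standing in for the fast small-N modules of Table 3.7; Python 3.8 or later is assumed for `pow(a, -1, m)`.

```python
import cmath


def small_dft(v):
    """Direct short DFT; a PFA would use a fast small-N module here."""
    m = len(v)
    return [sum(v[i] * cmath.exp(-2j * cmath.pi * i * k / m) for i in range(m))
            for k in range(m)]


def pfa_two_factor(x, m1, m2):
    """Two-factor PFA: M1 length-M2 DFTs (3.216), then M2 length-M1 DFTs (3.217)."""
    n = m1 * m2
    s1 = m2 * pow(m2, -1, m1) % n                 # output map, Eq (3.195)
    s2 = m1 * pow(m1, -1, m2) % n
    rows = [[x[(m2 * n1 + m1 * n2) % n] for n2 in range(m2)] for n1 in range(m1)]
    y = [small_dft(row) for row in rows]          # y[n1][k2], Eq (3.216)
    X = [0j] * n
    for k2 in range(m2):
        col = small_dft([y[n1][k2] for n1 in range(m1)])   # over n1, Eq (3.217)
        for k1 in range(m1):
            X[(s1 * k1 + s2 * k2) % n] = col[k1]
    return X
```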

TABLE 3.6

SMALL-N OPERATIONS COUNT FOR WFTA

N     Mult    Mult by W^0    Adds
2      0          2            2
3      2          1            6
4      0          4            8
5      5          1           17
7      8          1           36
8      2          6           26
9     12          1           44
16    10          8           74

For both algorithm structures the small-N equations are similar. In the case of the PFA structure, the small-N algorithms are modified to permit a "shift operation" instead of a multiplication by 1/2. For the N=3 example, Eqs (3.211) through (3.213) are modified to:

$$a_1 = x(1) + x(2), \qquad a_2 = x(1) - x(2), \qquad a_3 = x(0) + a_1 \qquad (3.218)$$

$$M_1 = -(1/2)\,a_1, \qquad M_2 = -j(\sqrt{3}/2)\,a_2 \qquad (3.219)$$

$$C_1 = x(0) + M_1, \qquad X(0) = a_3, \qquad X(1) = C_1 + M_2, \qquad X(2) = C_1 - M_2 \qquad (3.220)$$

Eqs (3.218) through (3.220) have 1 multiplication, 1 shift (multiplication by 1/2), and 6 additions.
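A direct transcription of Eqs (3.218) through (3.220) follows, as a hypothetical Python sketch. In floating point the "shift" is simply a multiplication by 0.5; on a fixed-point machine it would be an arithmetic right shift. Unlike the WFTA module of Eqs (3.211) through (3.213), X(0) is taken straight from $a_3$ and $C_1$ starts from x(0), so no multiplication by $W^0$ is needed.

```python
import math


def pfa3(x0, x1, x2):
    """Length-3 PFA small-N module, Eqs (3.218)-(3.220)."""
    a1 = x1 + x2
    a2 = x1 - x2
    a3 = x0 + a1
    m1 = -0.5 * a1                       # the "shift" (multiply by 1/2)
    m2 = -1j * (math.sqrt(3) / 2) * a2   # the single genuine multiplication
    c1 = x0 + m1
    return [a3, c1 + m2, c1 - m2]
```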

Similar small-N DFTs result for N = 2, 4, 5, 7, 8, 9 and 16 to produce the operations count for the PFA small-N algorithms shown in Table 3.7 (Burrus and Eschenbacher, 1980). (Complex valued sequences require the counts in Table 3.7 to be doubled.) If the implementation of the PFA does not use "shifts", the multiplication count must be adjusted to reflect the multiplications by 1/2. The original FORTRAN program (Kolba, 1977) did not include the factor of 16. Later modifications (Burrus and Eschenbacher, 1980) included the factor of 16, which increases the number of sequence lengths that can be transformed. It should be noted that neither version implemented the "shifts", which increased the number of real multiplications.

TABLE 3.7

PFA SMALL-N DFT OPERATIONS COUNT

N     Multiplies    Shifts    Adds
2         0            0        2
3         1            1        6
4         0            0        8
5         4            2       17
7         8            0       36
8         2            0       26
9         8            2       49
16       10            0       74

NOTE: For complex sequences the values in the table must be doubled.

3.5.5 Real Operations for WFTA. To use the WFTA, the length-N sequence must be factorable into R relatively prime factors $N_1 N_2 \cdots N_R$, where each factor corresponds to one of the Winograd small-N algorithms for 2, 3, 4, 5, 7, 8, 9 and 16. It has been shown (Silverman, 1977) that the number of real multiplications is a function of the factors of N. To aid in the development of the number of real operations, the following terms are defined:

$M_r$ = number of real multiplications in factor $N_r$
$A_r$ = number of real additions in factor $N_r$
$N_r$ = rth factor of N

Winograd proved that the $D_N$ matrix is an M by M diagonal matrix, where $M = M_1 M_2 \cdots M_R$, and that $O_N$ and $I_N$ are N by M and M by N incidence matrices, respectively, with entries of only 0, 1, or -1. Evaluating the nested multiplications of $D_N$ (Silverman, 1977) requires:

$$\text{NMULT} = M_1 M_2 \cdots M_R \qquad (3.221)$$

which is the real multiplications count for real valued sequences. For complex valued transforms, Eq (3.221) must be multiplied by 2.

All previously reported WFTA operations counts (Winograd, 1976; Kolba and Parks, 1977; Silverman, 1977) use only Eq (3.221) as the source of real multiplications for the WFTA. The multiplications in Eq (3.221) are all performed by the MULT subroutine in Figure 3.31. Other real multiplications, required in the WFTA for computing the multiplier coefficients and determining the input and output permutation vectors, are performed in the INISHL subroutine in Figure 3.31.

The DFT multiplier coefficients are computed in lines 1450-1510 of the WFTA listed in Appendix H and require:

$$\text{real mult} = 3\cdot\text{NMULT} \qquad (3.222)$$

where NMULT was computed in Eq (3.221). Determining the output permutation vector in lines 2080-2170 requires:

$$\text{real mult} = 4N \qquad (3.223)$$

where N is the sequence length to be transformed. Combining Eqs (3.222) and (3.223) provides the number of real operations required for initializing the WFTA. Subsequent transforms of the same sequence length do not require initialization. The first complex transform of length N using the WFTA requires:

$$\text{real mult} = 2\cdot\text{NMULT} + 3\cdot\text{NMULT} + 4N \qquad (3.224)$$

Subsequent complex transforms require:

$$\text{real mult} = 2\cdot\text{NMULT} \qquad (3.225)$$

Counting the number of real additions is more complicated because the factorization order of N changes the real additions count (Silverman, 1977). For a given factorization $N = N_1 N_2 \cdots N_R$, the number of real additions is determined by the WEAVE1 and WEAVE2 subroutines in Figure 3.31.

[Figure 3.31. Flow Control in WFTA Program.]

First, the real additions from the "WEAVEs" can be developed by considering the special case of $N = N_1 N_2$, where $N_1$ is the "innermost" factor and $N_2$ is the "outermost" factor.

For two factors of N, Silverman has shown the number of real additions to be:

$$A(2) = N_1 A_2 + M_2 A_1 \qquad (3.226)$$

(Recall that $A_2$ is the number of real additions and $M_2$ the number of real multiplications needed to evaluate factor $N_2$.) Now consider $N = N_1 N_2 N_3$, where $(N_1 N_2)$ is considered to be the "innermost" factor. The number of real additions becomes:

$$A(3) = (N_1 N_2)A_3 + M_3 A(2) = N_1 N_2 A_3 + M_3 N_1 A_2 + M_3 M_2 A_1 \qquad (3.227)$$

By iterative substitution the number of additions for $N = N_1 N_2 N_3 N_4$ becomes:

$$A(4) = (N_1 N_2 N_3)A_4 + M_4 A(3) = N_1 N_2 N_3 A_4 + M_4 N_1 N_2 A_3 + M_4 M_3 N_1 A_2 + M_4 M_3 M_2 A_1 \qquad (3.228)$$

Eqs (3.226) through (3.228) are used to write a compact expression for the number of real additions needed in the WEAVE subroutines:

$$A(R) = \sum_{r=1}^{R}\left(\prod_{i=1}^{r-1} N_i\right) A_r \left(\prod_{j=r+1}^{R} M_j\right) \qquad (3.229)$$

The expression in Eq (3.229) represents only the real additions used in WEAVE1 and WEAVE2. Other additions are required by the INISHL initialization subroutine to index the coefficient array and compute the output index vector. The DFT coefficient array is indexed with a J counter in line 1500 of the FORTRAN WFTA program in Appendix H. This part of the INISHL subroutine requires NMULT real additions. The input index array INDX1 requires another J counter in line 1720, which uses N real additions. The output index array INDX2 uses a J counter in line 2160, which uses N real additions. Also, the INDX2 computation requires 8N real additions in line 2120.

Totaling the real additions in the initialization subroutine gives:

$$\text{real adds} = \text{NMULT} + 10N \qquad (3.330)$$

Adding the results of Eq (3.330) to Eq (3.229) gives the total additions needed to transform a length-N sequence the first time. Subsequent transforms at the same sequence length N require only the number of adds in Eq (3.229).

The FORTRAN WFTA program written by McClellan and Nawab, 1979, decreased the number of real multiplications for N=9 from 13 to 11 while the number of additions remained constant at 44. Modifying Table 3.6 to reflect the new multiply count for N=9 gives the McClellan and Nawab real operations count in Table 3.8.

Using Eqs (3.229) and (3.330) with Table 3.8 gives the number of real operations listed in Table 3.9a and b. The columns labeled "REAL MULT I" and "REAL ADD I" represent the real operations for the initial transform of length N. The columns labeled "REAL MULT" and "REAL ADD" give the operations count for subsequent transformations of the same sequence length. The number of real operations is plotted as a function of N in Figures 3.32 and 3.33. These graphs demonstrate the large reduction possible after the WFTA has been initialized for a length-N sequence.
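Eqs (3.221) through (3.229) and (3.330) can be combined into a small calculator driven by Table 3.8. The Python sketch below is hypothetical (it is not the procedure used to build Table 3.9); factors are listed innermost first, and, following the text, the WEAVE additions of Eq (3.229) are used as they stand while the multiplications carry the factor of 2 for complex data.

```python
from math import gcd, prod

# Table 3.8 (McClellan and Nawab): N -> (M(N), A(N))
TABLE_3_8 = {2: (2, 2), 3: (3, 6), 4: (4, 8), 5: (6, 17),
             7: (9, 36), 8: (8, 26), 9: (11, 44), 16: (18, 74)}


def wfta_counts(factors):
    """WFTA real operations for N = prod(factors), innermost factor first."""
    for i, p in enumerate(factors):
        for q in factors[i + 1:]:
            assert gcd(p, q) == 1, "WFTA factors must be relatively prime"
    m = [TABLE_3_8[f][0] for f in factors]
    a = [TABLE_3_8[f][1] for f in factors]
    n = prod(factors)
    nmult = prod(m)                                             # Eq (3.221)
    weave_adds = sum(prod(factors[:r]) * a[r] * prod(m[r + 1:])
                     for r in range(len(factors)))             # Eq (3.229)
    return {
        "nmult": nmult,
        "first_mult": 2 * nmult + 3 * nmult + 4 * n,           # Eq (3.224)
        "mult": 2 * nmult,                                     # Eq (3.225)
        "first_add": weave_adds + nmult + 10 * n,              # Eqs (3.229) + (3.330)
        "add": weave_adds,                                     # Eq (3.229)
    }
```

The two orderings of N = 48 = 3·16 give 330 and 318 WEAVE additions (`wfta_counts([3, 16])` versus `wfta_counts([16, 3])`), which illustrates the order dependence noted above.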

3.5.6 Memory Requirements for WFTA. The FORTRAN subroutine WFTA listed in Appendix H requires 2348 words of program memory when compiled for the CDC Cyber 74 computer. The memory array requirements are given by:

XR, XI, INDX1, INDX2: length N
COEF, SR, SI: length NMULT, which is the number of multiplications determined by the factors of N. NMULT is listed in Table 3.9a and b.
C03, C04, C05, C03, C016, CDA, CDB, CDC, CDD: total of 88

The original version of WFTA dimensioned INDX1, INDX2, COEF, SR, and SI to their maximum possible lengths of 5040, 5040, 10692, 10692, and 10692, respectively. This made the memory array storage very large even for the shortest sequence lengths:

$$\text{memory array} = 2N + 2\cdot 5040 + 3\cdot 10692 + 88 = 2N + 42244 \qquad (3.331)$$

TABLE 3.8

MCCLELLAN AND NAWAB'S WFTA

REAL OPERATIONS FOR THE SMALL-N ALGORITHMS

N     M(N)    A(N)
2      2       2
3      3       6
4      4       8
5      6      17
7      9      36
8      8      26
9     11      44
16    18      74

The memory arrays INDX1, INDX2, COEF, SR, and SI were variably dimensioned in the author's version of WFTA in Appendix H. This reduced the memory arrays required to:

$$\text{memory array} = 4N + 3\cdot\text{NMULT} + 88 \qquad (3.332)$$

The results of Eq (3.332) are listed in Table 3.9a and b for all values of N. A comparison of the memory required by Eqs (3.331) and (3.332) is plotted in Figure 3.34, which shows the drastic savings in memory storage obtained by using variable dimensions. The "cost" of variable dimensions is more work for the user of WFTA, because the dimensions must be passed to the WFTA subroutine through additional arguments in the subroutine call. The original version required:

CALL WFTA (XR, XI, N, INIT, IERR)

The modified WFTA call is:

CALL WFTA (N, XR, XI, INIT, IERR, SR, SI, COEF, M, INDX1, INDX2)

where M = NMULT. The increased complexity of the second call is justified by the savings in memory arrays.
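The effect of variable dimensioning can be tabulated with a few lines. This is a hypothetical Python sketch of Eqs (3.331) and (3.332) only; NMULT is formed from the M(N) column of Table 3.8 as in Eq (3.221).

```python
from math import prod

# M(N) from Table 3.8 (McClellan and Nawab)
SMALL_N_MULTS = {2: 2, 3: 3, 4: 4, 5: 6, 7: 9, 8: 8, 9: 11, 16: 18}


def wfta_memory(factors):
    """(original, variable) array words for WFTA, Eqs (3.331) and (3.332)."""
    n = prod(factors)
    nmult = prod(SMALL_N_MULTS[f] for f in factors)      # Eq (3.221)
    original = 2 * n + 2 * 5040 + 3 * 10692 + 88        # fixed maximum dimensions
    variable = 4 * n + 3 * nmult + 88                   # variable dimensions
    return original, variable
```

For N = 12 the fixed dimensions need 42268 words against 172 with variable dimensions. For the largest length, N = 16·9·7·5 = 5040, NMULT = 18·11·9·6 = 10692 and the two expressions coincide, which is consistent with 5040 and 10692 being the maximum lengths used in the original version.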

3.5.7 Real Operations for the PFA. The real operations for the PFA come from reordering the data and performing the small-N DFTs. The unscrambling constant A, which maps the PFA result from arrays X and Y to arrays A and B, requires N real additions and no multiplications.

The second source, computing the small-N DFTs using fast convolution, has been shown (Kolba and Parks, 1977) for two factors $(M_1 M_2)$ to require:

$$\text{real mult} = 2(M_1 u_2 + M_2 u_1) \qquad (3.333)$$

$$\text{real add} = 2(M_1 A_2 + M_2 A_1) \qquad (3.334)$$

for three factors $(M_1 M_2 M_3)$:

$$\text{real mult} = 2(M_2 M_3 u_1 + M_1 M_3 u_2 + M_1 M_2 u_3) \qquad (3.335)$$

$$\text{real add} = 2(M_2 M_3 A_1 + M_1 M_3 A_2 + M_1 M_2 A_3) \qquad (3.336)$$

and for four factors $(M_1 M_2 M_3 M_4)$:

$$\text{real mult} = 2(M_2 M_3 M_4 u_1 + M_1 M_3 M_4 u_2 + M_1 M_2 M_4 u_3 + M_1 M_2 M_3 u_4) \qquad (3.337)$$

$$\text{real add} = 2(M_2 M_3 M_4 A_1 + M_1 M_3 M_4 A_2 + M_1 M_2 M_4 A_3 + M_1 M_2 M_3 A_4) \qquad (3.338)$$

where $u_i$ is the number of multiplications required for $M_i$ and $A_i$ is the number of additions required for $M_i$. Notice that complex data transforms have been assumed in Eqs (3.333) through (3.338), so the numbers of multiplications and additions were multiplied by two.

As shown in Section 3.5.4, the small-N algorithms can be implemented by using "shifts" instead of multiplications by 1/2. The available FORTRAN programs do not make use of these shifts. Therefore, the operations count for the PFA small-N DFTs shown in Table 3.7 is modified to produce Table 3.10. Using the results of Eqs (3.333) through (3.338), the N adds required for the output mapping, and Table 3.10, the numbers of real multiplications and additions are listed for all permissible N values in Table 3.11a and b. The corresponding graphs in Figures 3.35 and 3.36 show the multiplications and additions as a function of N.

TABLE 3.10

PFA SMALL-N DFT OPERATIONS COUNT FOR NO SHIFTS

N     MULT    ADD
2      0       2
3      2       6
4      0       8
5      6      17
7      8      36
8      2      26
9     10      42
16    10      74
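Eqs (3.333) through (3.338) follow one pattern: each factor $M_i$ is transformed $N/M_i$ times, and the total is doubled for complex data. The hypothetical Python sketch below uses that general form (which reproduces the two-, three- and four-factor expressions) with Table 3.10 and adds the N output-mapping additions.

```python
from math import gcd, prod

# Table 3.10: PFA small-N counts without shifts, N -> (u, A)
TABLE_3_10 = {2: (0, 2), 3: (2, 6), 4: (0, 8), 5: (6, 17),
              7: (8, 36), 8: (2, 26), 9: (10, 42), 16: (10, 74)}


def pfa_counts(factors, table=TABLE_3_10):
    """Complex-data real (mult, add) counts for the PFA, N = prod(factors)."""
    for i, p in enumerate(factors):
        for q in factors[i + 1:]:
            assert gcd(p, q) == 1, "PFA factors must be relatively prime"
    n = prod(factors)
    # Each factor M_i is transformed N / M_i times; the leading 2 is for complex data.
    mult = 2 * sum((n // f) * table[f][0] for f in factors)
    add = 2 * sum((n // f) * table[f][1] for f in factors) + n   # + N output-mapping adds
    return mult, add
```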

Even though this FORTRAN program did not use a shift to perform multiplication by 1/2, incorporating shifts into the small-N DFTs represents a significant savings of real multiplications. The major benefit would be in small computers, where software multiplies are more costly relative to additions. The benefit of performing multiplications by 1/2 with shifts is given in Table 3.11a and b under the PCT (percentage) column. PCT was calculated by:

$$\text{PCT} = \frac{(M - M_S)\cdot 100}{M} \qquad (3.339)$$

where $M$ is the number of multiplications without shifts and $M_S$ is the number with shifts. The percentage savings is plotted as a function of N in Figure 3.37 for all values of N.
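A minimal sketch of Eq (3.339). The one design point worth noting is that some permissible lengths (for example N = 2 or N = 4, whose Table 3.10 modules need no multiplications) have M = 0, so the percentage is undefined; the sketch returns 0 there. The counts in the usage line are hypothetical, not taken from Table 3.11.

```python
def pct_savings(m_no_shift, m_shift):
    """Eq (3.339): percent of real multiplications saved by using shifts for 1/2.

    Lengths whose transform needs no multiplications at all (for example
    N = 2 or 4 from Table 3.10) have nothing to save, so 0 is returned
    instead of dividing by zero.
    """
    if m_no_shift == 0:
        return 0.0
    return (m_no_shift - m_shift) * 100 / m_no_shift


print(pct_savings(200, 150))  # hypothetical counts -> 25.0
```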

3.5.8 Memory Requirements for PFA. The PFA program listed in Appendix I requires 770 words of program memory when compiled for the CDC Cyber 74 computer. The memory arrays X, Y, A, and B each have length N, so the memory array required by PFA is:

memory array = 4N

and is listed in Table 3.11a and b and plotted in Figure 3.38.

TABLE 3.11a

PFA REAL OPERATIONS AND MEMORY COUNT FOR N<72

[Table body not legible in the scanned copy.]

TABLE 3.11b

PFA REAL OPERATIONS AND MEMORY COUNT FOR N>80

[Table body not legible in the scanned copy.]

[Figures 3.35 through 3.38: PFA real multiplications, real additions, percentage multiplication savings with shifts, and memory array, each vs. N.]

3.5.9 Summary. Two algorithms which use high-speed convolution techniques have been presented. Both use convolution for computing the small-N DFTs, and both require N to be factored into relatively prime factors. This particular factorization uses the Chinese Remainder Theorem and the "Sino correspondence" to reorder the data arrays. The theory, structure, and operations count were presented in this section.

IV. Comparison Results of Efficient Discrete Fourier Transforms

4.1 Introduction

Several fixed radix and mixed radix algorithms have been studied, and the number of real operations and memory count required has been computed in the preceding sections. The results from these sections are compared and presented here.

Tradeoffs and advantages of fixed radix and mixed radix algorithms are discussed; the justification for selecting Singleton's algorithm over the IMSL and mixed radix FFTs is given; and tables and graphs comparing the conventional mixed radix FFT with the fast convolution algorithms (WFTA and PFA) are presented and the advantages of each discussed. This chapter concludes with an algorithm which selects the most efficient algorithm based on memory available, machine speed, zeropacking, and sequence length. A flowchart implementation of the algorithm is included.

The timing tests in this section used the Cyber 74 system clock. This clock was accessed using the FORTRAN routine SECOND(CP), which provides a timer accurate to .001 seconds. The transforms were all performed using samples from the function $e^{-t}\cos 50\pi t$, which has the magnitude transform shown in Figure 4.1 for N=625.
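The test input can be sketched as follows. The sampling rate of 100 samples per second and the start time t = 0 are assumptions (the -50 to 50 frequency axis of Figure 4.1 suggests, but does not confirm, a Nyquist frequency of 50), and a direct DFT stands in for the FFTs under test. Since $\cos 50\pi t = \cos(2\pi \cdot 25\,t)$, the magnitude should peak near 25 Hz.

```python
import cmath
import math


def sample_signal(n, fs):
    """Samples of e^(-t) cos(50*pi*t) at t = k / fs, k = 0 .. n-1."""
    return [math.exp(-k / fs) * math.cos(50 * math.pi * k / fs) for k in range(n)]


def dft_magnitude(x):
    """Direct O(N^2) DFT magnitude; slow but adequate as a reference for N = 625."""
    n = len(x)
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [abs(sum(x[j] * twiddle[(j * k) % n] for j in range(n)))
            for k in range(n)]


N, FS = 625, 100.0  # FS (samples per second) is an assumption, not from the text
mag = dft_magnitude(sample_signal(N, FS))
peak_bin = max(range(N // 2 + 1), key=lambda k: mag[k])
print(peak_bin * FS / N)  # positive-frequency peak in Hz, close to 25
```

With these assumptions the bin spacing is 100/625 = 0.16 Hz, so the peak lands on the bin nearest 25 Hz rather than exactly on it.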

Figure 4.1. Fourier Transform of $e^{-t}\cos 50\pi t$ (magnitude vs. frequency, N = 625).

The memory comparisons made in this chapter are based on memory array only; program memory from compilation on the Cyber 74 is not applicable to smaller machines and would not permit valid memory comparisons. The program memory required on the Cyber 74 is given only to show the relative sizes of the algorithms.

4.2 Conventional Radix-3 vs R(u) Field Radix-3

In the previous chapter the real operations count for

these two radix-3 FFTs was given in Table 3.2. From this

table the most efficient radix-3 algorithm can be selected

based on machine speed. Validation of this table was performed on the CDC Cyber 74 computer, which has a 1.1 multiply-to-add ratio, using test data.

With a 1.1 multiply-to-add ratio Table 3.2 indicates

that the conventional radix-3 algorithm is more efficient

for all sequence lengths shown. The timing results in

Table 4.1 verify this conclusion.

4.3 Fixed Radix vs Mixed Radix FFTs

In Sections 3.3 and 3.4 the real operations count and memory requirements were developed for the fixed radix and mixed radix FFTs. Using the results from these sections, the real operations count and memory requirements are given in Table 4.2 along with results from timing tests conducted on the CDC Cyber 74. This table demonstrates that Singleton's mixed radix FFT (MFFT) reduces the operations count for factors of 2, 3, and 5 to the level of the fixed radix algorithms.

TABLE 4.1

RADIX-3 TIMING COMPARISON

           Conventional          R(u)
   N       Radix-3 Time (sec)    Radix-3 Time (sec)
   27      .002                  .003
   81      .009                  .011
  243      .026                  .034
  729      .094                  .117
 2187      .305                  .393

[Table 4.2: real operations count, memory requirements, and Cyber 74 timing results for the fixed radix and mixed radix FFTs. Table body not legible in the scanned copy.]

The program memory required by each algorithm is given in Table 4.3. The MFFT requires more program memory because of the extra sections needed to transform any length sequence and the extra FORTRAN code required to perform multi-variate transforms. None of the other FFTs is capable of performing a multi-variate transform without a significant amount of additional user programming. Singleton's MFFT can perform up to a tri-variate transform; however, this additional flexibility is a disadvantage on memory-limited computers when performing single-variate FFTs.

The fixed radix and mixed radix FFTs are roughly

equivalent in efficiency. The fixed radix FFTs offer a

memory savings over the MFFT for all radix-2 transform

sequence lengths shown in Table 4.2 and some of the radix-3

and 5 transform lengths. The main advantage the MFFT offers

is the capability to transform any length sequence N while

the fixed radix algorithms are limited to integer powers

of 2, 3, and 5.

4.4 Mixed Radix FFT Comparison: IMSL vs Singleton

In Chapter 3 and Appendix C the real operations and memory required for the IMSL and Singleton's mixed radix FFTs were derived as a function of N. These two algorithms are now compared on the basis of real operations and memory, and the better algorithm is selected.

TABLE 4.3

PROGRAM MEMORY REQUIRED BY FFTs

FFT                          Program Memory (words)
Radix-2                       108
Radix-3                       301
Radix-5                       458
Singleton's Mixed Radix      1100

The expressions for real multiplications and additions of the IMSL FFT are compared with Singleton's mixed radix FFT expressions for real operations to show the extra operations required by IMSL. Recall that both the Singleton and IMSL versions of the FFT compute sine and cosine using the difference equation of Section 3.1. Both implement the sine and cosine computation similarly and require the same number of real operations to compute them.

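Assuming that N can be factored as:

$$N = 2^r\,3^s\,4^t\,5^u\,p_1^{m_1}\cdots p_k^{m_k} \qquad (4.1)$$

the difference in real multiplications between IMSL and Singleton's becomes:

$$\Delta_{\text{mult}} = [\text{IMSL multiplication expression}] - [\text{Singleton multiplication expression}]$$

$$
\begin{aligned}
\Delta_{\text{mult}} ={}& \Big[2rN + 4sN + 3tN + 8 + \frac{32uN}{5} + \sum_{i=1}^{k}\Big(2(p_i-1) + \frac{4m_iN(p_i-1)}{p_i} + \frac{m_iN(p_i-1)^2}{p_i}\Big) - 4(N-1) + K_{\text{MULT}}\Big]\\
&- \Big[2rN + 4sN + 3tN + \frac{32uN}{5} + \sum_{i=1}^{k}\Big(2(p_i-1) + \frac{m_iN(p_i-1)^2}{p_i} + \frac{4m_iN(p_i-1)}{p_i}\Big) - 4(N-1) + K_{\text{MULT}}\Big]\\
={}& 8 \qquad (4.2)
\end{aligned}
$$

For large values of N the difference in multiplications is negligible.

The difference in real additions is derived from:

$$\Delta_{\text{add}} = [\text{IMSL addition expression}] - [\text{Singleton addition expression}]$$

$$
\begin{aligned}
\Delta_{\text{add}} ={}& \Big[3rN + 6sN + \frac{15tN}{2} + 4 + \frac{48uN}{5} + \sum_{i=1}^{k}\Big((p_i-1) + \frac{8m_iN(p_i-1)}{p_i} + \frac{m_iN(p_i-1)^2}{p_i}\Big) - 2(N-1) + K_{\text{ADD}}\Big]\\
&- \Big[3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN + \sum_{i=1}^{k}\Big((p_i-1) + \frac{7m_iN(p_i-1)}{p_i} + \frac{m_iN(p_i-1)^2}{p_i}\Big) - 2(N-1) + K_{\text{ADD}}\Big]\\
={}& \frac{2sN}{3} + 2tN + \frac{8uN}{5} + 4 + \sum_{i=1}^{k}\frac{m_iN(p_i-1)}{p_i} \qquad (4.3)
\end{aligned}
$$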
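The closed forms of Eqs (4.2) and (4.3) are easy to evaluate for a given factorization. The sketch below assumes the caller supplies the exponents r, s, t, u of Eq (4.1) and a list of the remaining (p_i, m_i) pairs; the prime-factor term is the difference of the 8 and 7 coefficients in the two brackets, Σ m_i N(p_i - 1)/p_i. The function name is illustrative. Exact fractions avoid rounding in the N/3 and N/5 terms.

```python
from fractions import Fraction


def delta_ops(r, s, t, u, primes=()):
    """IMSL minus Singleton real operations from Eqs (4.2) and (4.3).

    N = 2^r 3^s 4^t 5^u times p^m for each (p, m) in primes, where primes
    lists the remaining odd prime factors p_i > 5 and their exponents m_i.
    Returns (N, extra multiplications, extra additions).
    """
    n = 2**r * 3**s * 4**t * 5**u
    for p, m in primes:
        n *= p**m
    extra_mults = 8
    extra_adds = (Fraction(2 * s * n, 3) + 2 * t * n + Fraction(8 * u * n, 5) + 4
                  + sum((Fraction(m * n * (p - 1), p) for p, m in primes), Fraction(0)))
    return n, extra_mults, extra_adds


print(*delta_ops(0, 0, 0, 4))  # N = 625: 625 8 4004
print(*delta_ops(0, 1, 1, 1))  # N = 60:  60 8 260
```

For N = 625 (four radix-5 stages) IMSL needs about 4000 more real additions, which is at least consistent with the large IMSL-Singleton timing gap at that length in Table 4.4.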

The results from Eqs (4.2) and (4.3) demonstrate that the IMSL FFT has approximately the same number of real multiplications but requires significantly more additions than Singleton's mixed radix algorithm. Based on these results, and because the data reordering for the two subroutines is the same, the Singleton FFT is the more efficient of the two subroutines. This conclusion was confirmed by timing tests on the CDC Cyber 74 computer at AFIT. The results are shown in Table 4.4 for selected sequence lengths.

The memory array required for each of the algorithms was derived in the preceding chapter. Those results are now compared for N less than 200, and the percentage of array memory saved by Singleton's FFT over the IMSL FFT is plotted in Figure 4.2 using Eq (4.4).

TABLE 4.4

TIMING RESULTS FOR IMSL AND SINGLETON FFTs

    N     IMSL Time (sec)    Singleton Time (sec)
   60     .010               .008
  120     .018               .014
  125     .019               .012
  128     .013               .011
  210     .039               .036
  243     .031               .031
  256     .028               .021
  315     .054               .052
  420     .081               .072
  504     .090               .082
  625     .128               .076
  729     .107               .107
  840     .163               .150
 1008     .151               .157
 1024     .126               .092
 1250     .275               .158
 1260     .268               .231
 2048     .269               .224
 2187     .366               .364
 2520     .565               .495

[Figure 4.2: percentage of array memory saved by Singleton's FFT over the IMSL FFT vs. N, for N less than 200.]

$$\%\ \text{savings} = \frac{(\text{MEMCC} - \text{MEMSNG})\cdot 100}{\text{MEMCC}} \qquad (4.4)$$

where MEMCC is the IMSL array memory and MEMSNG is Singleton's array memory.

From the plot it is evident that Singleton's algorithm uses less memory than the IMSL program. The peak of the curve approaches 57%, which can be verified by examination of Eqs (3.172) through (3.175) for N a prime number. This number represents the memory savings at the points where N is prime.

The values of M, K, KB, and JK used to compute the IWORD constant in Eq (3.173) are M = 1, K = N, KB = N - 2, and JK = 1:

$$\text{IWORD} = 3\cdot M + 3 + \max(4\cdot M + 7 + 6\cdot K,\; KB + 1 + 2\cdot JK) \qquad (4.5)$$

$$\text{IWORD} = 3 + 3 + \max(6N + 11,\; N + 1) \qquad (4.6)$$

$$\text{IWORD} = 6N + 17 \qquad (4.7)$$

Now the memory for IMSL, given that N is prime, becomes:

$$\text{MEMCC} = 2N + 2(6N + 17) \qquad (4.8)$$

$$\text{MEMCC} = 14N + 34 \qquad (4.9)$$

The array memory required by Singleton's FFT is based on the values NP and KD. NP is dimensioned not less than the product of the square-free factors of N or, if at most one square-free factor is present, NP can be dimensioned to M+1, where M is the number of prime factors in N. KD is the size of arrays AT, BT, CK, and SK, where KD equals the largest prime factor in N. Using these results, the expression for array memory where N is prime becomes:

$$\text{MEMSNG} = 2N + 4\cdot KD + NP \qquad (4.10)$$

Substituting KD = N and NP = M + 1 = 2 (N prime, so M = 1), this equation becomes:

$$\text{MEMSNG} = 2N + 4N + 2 \qquad (4.11)$$

$$\text{MEMSNG} = 6N + 2 \qquad (4.12)$$

Substituting Eqs (4.9) and (4.12) into the percentage expression of Eq (4.4) shows the savings approaching 57%:

$$\%\ \text{savings} = \frac{[(14N + 34) - (6N + 2)]\cdot 100}{14N + 34} \qquad (4.13)$$

$$\%\ \text{savings} = \frac{(8N + 32)\cdot 100}{14N + 34} \qquad (4.14)$$

As N gets large, Eq (4.14) becomes:

$$\%\ \text{savings} \approx \frac{800N}{14N} \approx 57\% \qquad (4.15)$$

which corresponds to the results shown by Figure 4.2.
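The prime-N memory comparison can be reproduced numerically. This sketch evaluates Eq (4.5) directly (with the prime-N values M = 1, K = N, KB = N - 2, JK = 1 as defaults), MEMCC = 2N + 2·IWORD, Eq (4.10) with KD = N and NP = 2, and the percentage of Eq (4.4). The function names are illustrative, not identifiers from the IMSL or Singleton code.

```python
def imsl_array_memory(n, m=1, k=None, kb=None, jk=1):
    """MEMCC = 2N + 2*IWORD, with IWORD from Eq (4.5).

    The defaults are the prime-N values quoted in the text:
    M = 1, K = N, KB = N - 2, JK = 1.
    """
    k = n if k is None else k
    kb = n - 2 if kb is None else kb
    iword = 3 * m + 3 + max(4 * m + 7 + 6 * k, kb + 1 + 2 * jk)
    return 2 * n + 2 * iword


def singleton_array_memory(n, kd, np_):
    """MEMSNG = 2N + 4*KD + NP, Eq (4.10)."""
    return 2 * n + 4 * kd + np_


def pct_saved_prime(n):
    """Eq (4.4) for prime N, where KD = N and NP = M + 1 = 2."""
    memcc = imsl_array_memory(n)
    memsng = singleton_array_memory(n, kd=n, np_=2)
    return (memcc - memsng) * 100 / memcc


for n in (7, 101, 10**6):
    print(n, imsl_array_memory(n), round(pct_saved_prime(n), 2))
```

For small primes the saving is larger than 57% (about 67% at N = 7) and it falls toward 800/14, about 57.1%, as N grows, which is the limiting value of Eq (4.15).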

The memory array must be added to the program memory

to determine the size of the program. The program memory

required by each algorithm was determined by compiling each

algorithm for the CDC Cyber 74. The IMSL FFT used 1061

words and the Singleton FFT used 1100 words. The larger

size of the Singleton FFT relative to the IMSL version

is because of the extra FORTRAN code needed to perform multi-variate FFTs. These program memory figures are only applicable for the FORTRAN compiler used here at AFIT,

however, they do provide a relative measure of the program

memory size. Singleton's program requires about 3.7% more

program memory.

The results for real operations count and memory required show that Singleton's mixed radix FFT is superior to the IMSL FFT; it is therefore the mixed radix FFT used for comparison to the WFTA and PFA in the following sections.

4.5 Conventional vs Convolution Algorithms

Singleton's algorithm (MFFT) is referred to as a "conventional" FFT because it uses the Cooley-Tukey decimation and reordering of the data array. The WFTA and PFA use Winograd's small-N fast convolution algorithms to perform the DFT. The operation and memory array counts are presented as a function of N in Figures 4.3 and 4.4 and Tables 4.5a and b for comparison of the three algorithms. These tables and plots illustrate the advantages and disadvantages of each algorithm and are used, along with the fixed radix results in Table 4.2, to select the most efficient algorithm for a particular sequence length and machine capability (size and speed).

The tables and plots refer to the algorithms as MFFT (Singleton), WFTA (Winograd), and PFA (Kolba-Parks). The PFA used for the operation counts and memory is the one described by Burrus and Eschenbacher, which includes prime power factors of 2, 3, 4, 5, 7, 8, 9, and 16. The FORTRAN coded program for PFA was obtained from C. S. Burrus of Rice University and does not make use of "shifts" for multiplications by 1/2. Both the WFTA and MFFT FORTRAN programs were obtained from the IEEE Press "Programs for Digital Signal Processing".

Program memory is not included in the tables because the program memory changes based on machine word length. The program memory required for the Cyber 74 is given for each algorithm so that the relative sizes can be compared.

4.5.1 Real Operations Count. The mixed radix MFFT written by Singleton includes special sections for factors of 2, 3, 4, and 5, as well as a general section for odd prime factors, which permits the transformation of a sequence of any positive integer length N. Because of the special sections, the operations count is lower for an N which is highly factorable by 2, 3, 4, or 5 than for an N containing larger prime factors. Figures 4.3 and 4.4 demonstrate the efficiency of Singleton's MFFT relative to the radix-2 complex transform multiplications and additions counts of $2N\log_2 N$ and $3N\log_2 N$, respectively (Winograd, 1976). The MFFT operations counts shown in Figures 4.3a,b and 4.4a,b are for N factorable by 2, 3, 4, or 5, or combinations thereof. The WFTA and PFA counts are shown for all 59 sequence lengths which they can transform. Recall from Sections 3.4 and 3.5 that WFTA and PFA sequence lengths are limited by the data reordering algorithm used by the WFTA and PFA. These figures also reflect the WFTA "post-initialization" operations count. As shown in Section 3.5, the post-initialization count is significantly less than the number of operations required for the initial transform of length N.
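The radix-2 reference counts against which Figures 4.3 and 4.4 are drawn can be computed directly. This is a minimal sketch of the two formulas only; the function name is illustrative, and it accepts only powers of two, where $\log_2 N$ is an integer.

```python
def radix2_baseline(n):
    """Reference counts 2N log2 N real mults and 3N log2 N real adds (radix-2, complex data)."""
    if n < 2 or n & (n - 1):
        raise ValueError("N must be a power of two")
    log2n = n.bit_length() - 1  # exact log2 for a power of two
    return 2 * n * log2n, 3 * n * log2n


for n in (64, 256, 1024):
    print(n, *radix2_baseline(n))
```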

[Figures 4.3a, b and 4.4a, b: real multiplications and real additions vs. N for MFFT, WFTA, and PFA.]

[Tables 4.5a and b: real operations and memory array counts vs. N for MFFT, WFTA, and PFA. Table body not legible in the scanned copy.]

Table 4.6 shows how the WFTA execution time is divided among its subroutines, including the initialization routine INISHL. The data presented there was collected by timing the individual subroutines (INISHL, PERM 1, WEAVE 1, MULT, WEAVE 2, PERM 2) in the WFTA for different sequence lengths and then dividing the time required for each subroutine by the total time for all of the subroutines. Comparing the MFFT and PFA against the post-initialized WFTA is assumed to be valid because most applications of DFTs involve the repeated transform of N length sequences.

A point-by-point comparison of MFFT, WFTA, and PFA real operations is presented in Table 4.7. The sequence lengths in this table represent the only lengths permissible for both PFA and WFTA, whereas the mixed radix MFFT can transform any sequence length. The operations counts presented in Tables 4.2 and 4.7, together with a computer's multiply and add speeds, can predict the most efficient (fastest) DFT technique for that particular computer.

Using the multiply and add speeds determined for the CDC Cyber 74 (see Appendix J) as 1.9 x 10^-6 seconds and 1.7 x 10^-6 seconds, respectively, the execution speeds were predicted from the operations count in Tables 3.9 and 4.7. The predicted execution speeds do not account for all of the actual execution time measured, as shown in Figure 4.5. The extra time which was not predicted by the real operations count comes from array indexing and data reordering needed in all of


TABLE 4.6

TIMING RESULTS FROM THE WFTA SUBROUTINES

    N    INISHL   PERM 1   WEAVE 1   MULT    WEAVE 2   PERM 2
  315    48.0%    7.5%     16.3%     4.5%    16.3%     7.4%
  360    47.0%    5.9%     15.7%     5.9%    21.6%     3.9%
  630    43.9%    5.6%     18.7%     5.6%    21.5%     4.7%
  720    44.0%    3.5%     20.0%     6.1%    22.8%     3.6%
  840    34.5%    5.5%     23.6%     6.4%    23.6%     6.4%
 1008    48.0%    1.7%     19.2%     6.2%    21.5%     3.4%
 1260    38.2%    5.3%     18.1%     6.4%    27.7%     4.3%

Results are given as % of total time to execute WFTA.
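Table 4.6 can be used to estimate what the post-initialized WFTA saves. The sketch below checks that each row sums to 100% and, under the assumption (ours, not measured in the text) that a post-initialized run costs the initial run minus the INISHL time, reports the remaining fraction. The average INISHL share comes out near 43%, roughly in line with the 40% improvement over the initial WFTA reported in the conclusions.

```python
# Table 4.6, percent of total WFTA time per subroutine, in the order
# INISHL, PERM 1, WEAVE 1, MULT, WEAVE 2, PERM 2.
TABLE_4_6 = {
    315: (48.0, 7.5, 16.3, 4.5, 16.3, 7.4),
    360: (47.0, 5.9, 15.7, 5.9, 21.6, 3.9),
    630: (43.9, 5.6, 18.7, 5.6, 21.5, 4.7),
    720: (44.0, 3.5, 20.0, 6.1, 22.8, 3.6),
    840: (34.5, 5.5, 23.6, 6.4, 23.6, 6.4),
    1008: (48.0, 1.7, 19.2, 6.2, 21.5, 3.4),
    1260: (38.2, 5.3, 18.1, 6.4, 27.7, 4.3),
}

for n, row in TABLE_4_6.items():
    # Assumption: a post-initialized run costs the initial run minus INISHL.
    print(f"N={n:5d}  row total={sum(row):5.1f}%  "
          f"post-init ~ {100 - row[0]:.1f}% of initial")

mean_inishl = sum(row[0] for row in TABLE_4_6.values()) / len(TABLE_4_6)
print(f"mean INISHL share: {mean_inishl:.1f}%")
```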

[Table 4.7: predicted and measured execution times for MFFT, WFTA1 (initial), WFTA2 (post-initialized), and PFA. Table body not legible in the scanned copy.]

[Figure 4.5: predicted vs. measured execution times.]

based only on real operations are sufficient to select the

most efficient algorithm as demonstrated by Table 4.7.

The timing results in Table 4.7 compare one-to-one with the predicted times (the standard deviation is shown in parentheses) for all three algorithms. Several observations can be made from Table 4.7. First, WFTA1, which represents the initial transform made by WFTA, may be slower than MFFT for certain sequence lengths. Examples are N=315, 630, and 720, all of which were correctly predicted to be slower from the operations counts in Tables 3.9 and 4.6. Second, the post-initialized WFTA2 and the PFA were predicted to be, and are, faster than MFFT for all sequence lengths. Third, the PFA and WFTA2 (post-initialization) are close in efficiency for all sequence lengths.

4.5.2 Memory. The memory array for MFFT, WFTA, and PFA was compiled from the previous chapter and is presented in Figure 4.6 and Table 4.5a and b. The figure clearly demonstrates how much less memory array is required by MFFT. These results are due to the efficient data reordering technique of MFFT, which can essentially be done in place with very little additional memory relative to the sequence length. The WFTA and PFA base their data reordering on the Chinese Remainder Theorem; PFA requires two additional length-N arrays. The WFTA uses even more memory array because of the algorithm's structure, which "nests" multiplications inside all the additions. This requires additional arrays to store the multiplication coefficients, and further array storage because the WFTA is not computed in place.

[Figure 4.6: memory array vs. N for MFFT, WFTA, and PFA.]

The program memory was not included in the comparisons because the program memory required depends on the machine word size. The program memory required on the Cyber 74 for each algorithm is:

PFA program memory = 770 words
WFTA program memory = 2348 words
MFFT program memory = 1100 words

These results were obtained with the standard compiler command FTN for the FORTRAN IV language. For short sequences, these program memory requirements contribute significantly to the choice of the most memory-efficient algorithm.

4.5.3 WFTA vs PFA Operations Count. The tradeoffs between WFTA and PFA for real multiplications and additions can be seen in Figures 4.3 and 4.4. In most cases the WFTA requires fewer multiplications but more additions than PFA. The selection of the most efficient algorithm then becomes dependent on the machine speed for real addition and real multiplication. As an example of this tradeoff between additions and multiplications, consider the case of N=630. For this sequence length the PFA requires 4352 multiplications and 18534 additions, while the WFTA requires 2376 multiplications and 22072 additions. Assuming a machine add speed of 1.7 x 10^-6 seconds and a multiply speed of 1.9 x 10^-6 seconds, the predicted times are about .040 seconds for PFA and .042 seconds for WFTA.

For the selected add and multiply speeds, PFA was faster. However, consider a case where a multiply requires three times as long as an add, 5.1 x 10^-6 seconds. For the same N=630 the PFA speed is predicted to be .054 seconds and the WFTA speed .050 seconds. With the increase in multiply time from 1.9 to 5.1 microseconds, the WFTA became the more efficient algorithm. This example illustrates why the add and multiply speeds must be known to select the fastest algorithm for a particular sequence length N.
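The N = 630 example reduces to a weighted sum of the operation counts. This sketch predicts time from real operations only, as in the text, so array indexing and data reordering are ignored; the dictionary of counts simply restates the numbers above, and the names are illustrative.

```python
def predicted_time(mults, adds, t_mult, t_add):
    """Execution time from real operations only; indexing and reordering are ignored."""
    return mults * t_mult + adds * t_add


T_ADD = 1.7e-6                                   # Cyber 74 add time, seconds
COUNTS_630 = {"PFA": (4352, 18534), "WFTA": (2376, 22072)}

for t_mult in (1.9e-6, 5.1e-6):                  # measured multiply, then 3x the add time
    times = {name: predicted_time(m, a, t_mult, T_ADD)
             for name, (m, a) in COUNTS_630.items()}
    fastest = min(times, key=times.get)
    print(f"t_mult = {t_mult * 1e6:.1f} us:",
          ", ".join(f"{k} {v:.3f} s" for k, v in times.items()), "->", fastest)
```

With the measured multiply time the PFA is predicted at about 0.040 s against 0.042 s for WFTA; at 5.1 microseconds the order flips to about 0.054 s against 0.050 s, matching the figures quoted above.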

The effects of changing the multiply-to-add ratio from 1 to 20 are shown in Figure 4.7a, b, and c for MFFT, WFTA, and PFA. For the sequences N=315 and 1008 the PFA is most efficient at low multiply-to-add ratios, but as multiplies become "more costly" the WFTA soon becomes the most efficient. For N=30 the WFTA is the most efficient for all ratios.
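The crossover in Figure 4.7 can be located exactly when both operation counts are known. Measuring time in units of one add, an algorithm costs mults·r + adds at multiply-to-add ratio r, so two algorithms tie where the lines intersect. The sketch uses the N = 630 counts from Section 4.5.3, since those are the ones stated in the text; the function name is illustrative.

```python
from fractions import Fraction


def crossover_ratio(a, b):
    """Multiply-to-add ratio r at which two (mults, adds) counts take equal time.

    In units of one add time, an algorithm costs mults * r + adds, so the tie is
    r = (adds_b - adds_a) / (mults_a - mults_b). Returns None when the
    multiply counts are equal (the ratio then cannot change the ordering).
    """
    (mults_a, adds_a), (mults_b, adds_b) = a, b
    if mults_a == mults_b:
        return None
    return Fraction(adds_b - adds_a, mults_a - mults_b)


r = crossover_ratio((4352, 18534), (2376, 22072))  # PFA vs WFTA, N = 630
print(r, round(float(r), 2))  # above this ratio the WFTA is predicted to be faster
```

For N = 630 the tie falls near r = 1.79: the Cyber 74 ratio of about 1.1 lies below it (PFA faster), and a ratio of 3 lies above it (WFTA faster), consistent with the example in Section 4.5.3.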

[Figure 4.7a, b, and c: computation time vs. multiply-to-add ratio for MFFT, WFTA, and PFA.]

4.6 Flexibility of Algorithms

It is clear from the plots in Figures 4.3 and 4.4 and the data in Table 4.2 that the fixed radix FFT, PFA, and WFTA are somewhat limited in permissible sequence lengths, whereas the mixed radix FFT provides a much more "dense" selection even for sequence lengths factorable by only 2, 3, 4, or 5. The restriction in possible values of N for WFTA and PFA comes from the small-N DFT lengths available: 2, 3, 4, 5, 7, 8, 9, and 16. This limits N to at most four relatively prime factors and a maximum of 5040. The fixed radix algorithms are even more restricted than WFTA and PFA, since they can transform only sequence lengths which are an integer power of 2, 3, or 5.
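The restriction on WFTA and PFA lengths can be checked by enumeration. This sketch assumes, as the module list implies, that a length uses at most one module from each relatively prime group {2, 4, 8, 16}, {3, 9}, {5}, {7}; it recovers the 59 permissible lengths and the 5040 maximum, and finds the nearest permissible length for zeropacking. The function names are illustrative.

```python
from itertools import product


def pfa_wfta_lengths():
    """All N > 1 built from at most one factor in each relatively prime group:
    {2, 4, 8, 16}, {3, 9}, {5}, {7}."""
    lengths = {a * b * c * d
               for a, b, c, d in product((1, 2, 4, 8, 16), (1, 3, 9), (1, 5), (1, 7))}
    lengths.discard(1)
    return sorted(lengths)


def next_length(n, lengths):
    """Smallest permissible length >= n, or None if n exceeds the maximum."""
    return next((length for length in lengths if length >= n), None)


LENGTHS = pfa_wfta_lengths()
print(len(LENGTHS), LENGTHS[-1])  # 59 5040
print(next_length(410, LENGTHS))  # 420
```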

4.7 An Algorithm to Select the Most Efficient DFT Technique

The results of this chapter are used to develop a systematic approach to selecting the most efficient DFT method from the fixed radix FFTs, the mixed radix FFT (MFFT), WFTA, and PFA. A flowchart is presented which selects the most efficient algorithm based on real operations, computer memory, machine speed, and sequence length. The algorithm requires inputs of machine speed for add and multiply, sequence length, zeropack limits, and computer memory. This algorithm also assumes that the same length sequence will be repeatedly transformed, such that the WFTA is initialized only once.

4.7.1 Arguments. The inputs to the algorithm are:

N: Sequence length to be transformed
NP: The upper limit to which the sequence length can be filled (zeropacked) to reach an efficient transform length
A: Machine addition speed (time per real addition)
M: Machine multiplication speed (time per real multiplication)
MEM: Computer memory available


4.7.2 Usage. The algorithm is presented as a flowchart. The basic logic of the algorithm is:

(1) Zeropack (if permitted) to the nearest WFTA or PFA

sequence length.

(2) Determine the memory requirements for the WFTA and PFA.

(3) If WFTA and PFA both fit in computer memory available,

select between the two by using real operations and

computer speed.

(4) If only PFA or WFTA fit in computer memory, select the

one that fits.

(5) If neither PFA nor WFTA will fit in computer memory,

zeropack to nearest N an integer power of 2, 3, or 5.

Choose the most efficient algorithm from the fixed radix

FFT and MFFT based on real operations counts and

machine speed.

(6) If fixed radix FFT cannot be used, zeropack to nearest

N factorable by 2, 3, or 5 and use the mixed radix FFT.
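Steps (1) through (4) above can be sketched in Python. This is an illustration under stated assumptions, not the flowchart of Figure 4.8: `wfta_model` and `pfa_model` are hypothetical callables that return (real mults, real adds, memory words) for a length, standing in for the table lookups, and steps (5) and (6), the fixed radix FFT or MFFT fallback, are only signalled, not carried out. The usage reproduces the N = 410 example that follows, with the N = 420 counts quoted there and the PFA memory array of 4N.

```python
from itertools import product


def pfa_wfta_lengths():
    """The 59 WFTA/PFA lengths: at most one factor from each of {2,4,8,16}, {3,9}, {5}, {7}."""
    return sorted({a * b * c * d for a, b, c, d in
                   product((1, 2, 4, 8, 16), (1, 3, 9), (1, 5), (1, 7))} - {1})


def select_dft(n, np_limit, t_add, t_mult, mem, wfta_model, pfa_model):
    """Steps (1)-(4) of the selection logic; returns (algorithm, length, predicted seconds).

    wfta_model and pfa_model are hypothetical callables mapping a length to
    (real mults, real adds, memory words), e.g. looked up from the tables.
    Steps (5) and (6), the fixed radix FFT / MFFT fallback, are left to the caller.
    """
    fallback = ("fixed radix FFT or MFFT", n, None)
    # Step (1): zeropack to the nearest WFTA/PFA length, if that is permitted.
    target = next((length for length in pfa_wfta_lengths() if length >= n), None)
    if target is None or target > np_limit:
        return fallback
    times = {}
    for name, model in (("WFTA", wfta_model), ("PFA", pfa_model)):
        mults, adds, words = model(target)
        if words <= mem:  # steps (2) and (4): keep only what fits in memory
            times[name] = mults * t_mult + adds * t_add
    if not times:
        return fallback
    best = min(times, key=times.get)  # step (3): fastest by real operations
    return best, target, times[best]


# Worked example: N = 410, 10% zeropacking, A = 450 ns, M = 1000 ns, no memory limit.
def wfta_420(length):
    return 1296, 11352, 0           # counts valid for length 420 only; memory placeholder


def pfa_420(length):
    return 2528, 10956, 4 * length  # counts valid for length 420 only; memory array = 4N


print(select_dft(410, 451, 450e-9, 1000e-9, float("inf"), wfta_420, pfa_420))
```

The returned time, about 6.4 ms for WFTA, matches the hand calculation in step (9) of the example.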

Using the flow diagram of Figure 4.8a, b, and c along with the specified tables, the most efficient algorithm can be selected. An example for N=410 demonstrates the use of Figure 4.8 and the tables in this paper to select the most efficient DFT. Given that A=450 nanoseconds (ns), M=1000 ns, 10% zeropacking is permitted, and there are no memory limitations, the most efficient algorithm is selected as follows.

(1) MEM is very large and is not a limitation.

(2) N=410

(3) NP=410 + .10(410) = 451

[Flowchart: read MEM, N, and the zeropacking permitted; set NP; determine A and M.]

Figure 4.8a. Flowchart to Select Most Efficient Algorithm.

[Flowchart: zeropack to the nearest WFTA & PFA length; test whether PFA and WFTA fit in memory; if both fit, determine the fastest from Table 4.6 using A & M; if only one fits, use it to transform the sequence; if neither fits, continue with Figure 4.8c.]

Figure 4.8b. Flowchart to Select Most Efficient Algorithm.

[Flowchart: if NP admits an integer power of 2, 3, or 5, select the fastest fixed radix FFT or MFFT from Table 4.2, otherwise use MFFT; compute the memory required for the FFT chosen (Eq 3.172); if the FFT memory is less than MEM, use the FFT to transform the sequence; otherwise zeropack to the most "factorable" N, or N cannot be transformed.]

Figure 4.8c. Flowchart to Select Most Efficient Algorithm.

(4) NP=N? No, continue.

(5) NP<5040? Yes, continue.

(6) Zeropack to the nearest WFTA/PFA length given in Table 4.6, which is NP=420.

(7) PFA fit in computer? Yes, continue.

(8) WFTA fit in computer? Yes, continue.

(9) Determine the fastest algorithm between WFTA and PFA from Table 4.6. For N=420:

            WFTA     PFA
    Mult    1296     2528
    Add     11352    10956

Using A=450 ns and M=1000 ns, the predicted speeds are:

    WFTA = 6.4 milliseconds
    PFA = 7.5 milliseconds

For this sequence, N=420, and for the add and multiply speeds given, the WFTA is the fastest algorithm. However, if this sequence were only being transformed once for a particular utilization, and the WFTA could not be repeatedly used without initialization, the WFTA counts must be taken from Table 3.11, where 4920 multiplications and 16200 additions are used to initialize the WFTA and perform the transform. Now the WFTA is predicted to use 56.5 milliseconds to transform N=420. When selecting between WFTA and PFA, the particular utilization must be considered.

It should also be noted that the predicted times from Table 4.6 are based only on real operations, which do not account for all of the execution time required, as shown by the timing tests. For the cases tested in Table 4.7 on the CDC Cyber 74, the real operations accounted on average for 67% of the PFA, 65% of the WFTA, and 61% of the MFFT actual execution time.

V. Conclusions

This paper, for the first time, presented a capability

to select the most efficient DFT based on real operations.

These real operations were tabulated and plotted as a

function of N. The algorithms studied and compared for real

operations and memory include:

1. Radix-2 FFT from Rabiner and Gold.

2. Radix-3 FFT written by the author.

3. Radix-3 FFT in R(u) from Dubois and

Venetsanopoulos.

4. Radix-5 FFT written by the author.

5. Mixed radix FFT for factors of 2, 3, or 5

written by the author.

6. IMSL mixed radix FFT which can transform

any sequence length N.

7. Singleton's mixed radix FFT which can

transform any sequence length N.

8. Winograd Fourier transform algorithm (WFTA)

written by McClellan and Nawab.

9. Prime Factor Algorithm (PFA) written by

Burrus and Eschenbacher.

5.1 Results and Conclusions

The two radix-3 FFTs were compared for real operations and memory required to perform the DFT of N length sequences where N = 3^m. Selection criteria were developed and tabulated based on machine speed. The new radix-3 FFT in the R(u) field uses fewer multiplications but more real additions than the conventional radix-3 FFT. The more efficient of the two algorithms depends on the relative costs of multiplications and additions. The radix-3 in R(u) is most efficient when multiplications are costly.

All of the fixed radix algorithms were compared to the Singleton mixed radix FFT for real operations and memory. The operations counts show that the most efficient algorithm depends on the multiplication and addition speed of the computer. Data were tabulated for selecting the best algorithm based on this criterion. The FFT algorithm using the least memory can also be selected from Tables 4.2 and 4.3. The limited choice of sequence lengths possible with the fixed radix FFTs reduces their utility compared to Singleton's mixed radix FFT.

Three conventional mixed radix FFT algorithms were compared for efficiency, memory array, and flexibility. The author's mixed radix FFT was very efficient but required more memory array and was not as flexible, since N was limited to factors of 2, 3, 4, and 5. It was shown that Singleton's mixed radix FFT was more efficient, more flexible, and used less memory array than the IMSL mixed radix FFT, and it was chosen as the best conventional mixed radix FFT.

Singleton's mixed radix FFT (labeled MFFT) and the fixed radix FFTs were compared to the WFTA and PFA. The real operations and memory required were tabulated and plotted for all of the N length sequences permitted by WFTA and PFA. This comparison showed that the WFTA and PFA required fewer real operations but that the FFTs require less memory. The MFFT was much more flexible than WFTA or PFA, since N can be any length.

The WFTA and PFA were then studied more closely and the tradeoffs between the two were discussed. For most sequence lengths N the PFA uses fewer additions but more multiplications, which means the WFTA is more efficient when multiplications are "costly" relative to additions. The PFA uses less memory than the WFTA, which makes the PFA preferable when the machine is memory limited. Two further criteria were considered in selecting between these algorithms: (1) the machine language and (2) the particular application. If the machine language permits "shifts" to be used for multiplication by 1/2, the PFA performance can be improved. (The percentage improvements have been tabulated for all permissible PFA sequence lengths.) The second consideration affects the WFTA, since repeated use of the WFTA for sequences of the same length N does not require the algorithm to re-initialize the multiplier coefficients. Improvements in operating speed of 40% over the initial WFTA were realized on the Cyber 74 for various sequence lengths.
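Skipping re-initialization for a repeated length is a form of caching per-length setup work. As a loose illustration only, the Python sketch below applies that idea to a plain direct DFT rather than the WFTA. The WFTA's actual multiplier coefficients are different and are not reproduced here. The sketch memoizes a coefficient table per N with `functools.lru_cache`, so that repeated transforms of the same length reuse it. The names `coefficient_table` and `dft_with_cached_table` are hypothetical.

```python
import cmath
from functools import lru_cache


@lru_cache(maxsize=None)
def coefficient_table(n):
    """Hypothetical per-length initialization: W_N^k for k = 0..N-1.

    Computed on the first call for a given N and reused on later calls.
    """
    return tuple(cmath.exp(-2j * cmath.pi * k / n) for k in range(n))


def dft_with_cached_table(x):
    """Direct O(N^2) DFT that reads its multipliers from the cached table."""
    n = len(x)
    w = coefficient_table(n)
    return [sum(x[j] * w[(j * k) % n] for j in range(n)) for k in range(n)]


# Three transforms of the same length: the table is built once and reused twice.
for frame in ([1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 1, 1]):
    dft_with_cached_table(frame)
print(coefficient_table.cache_info())  # CacheInfo(hits=2, misses=1, maxsize=None, currsize=1)
```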

An algorithm to select the most efficient DFT method from the WFTA (Winograd), MFFT (Singleton), fixed radix FFTs, and PFA (Kolba and Parks) was presented. This selection is based on minimizing real operations and minimizing memory size for the machine used. Minimizing real operations is the best "first order" criterion (Singleton, 1969), and this was verified by timing the transforms on the CDC Cyber 74. A summary of the above conclusions is presented in Table 5.1.
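The selection rule described above, minimizing time-weighted real operations subject to a memory limit, can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the thesis's selection algorithm. The candidate list is assumed to contain only methods that accept the desired N. Ties in operation time are broken by the smaller memory requirement. All names and numbers in the example are hypothetical.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    mults: int    # real multiplications for the desired N
    adds: int     # real additions for the desired N
    memory: int   # storage required, in words


def select_dft_method(candidates, t_mult, t_add, memory_limit) -> Optional[str]:
    """Pick the feasible method with the fewest time-weighted real operations.

    `candidates` should hold only methods that accept the desired N.
    Methods needing more than `memory_limit` words are excluded; ties in
    weighted operations go to the method using less memory.
    """
    feasible = [c for c in candidates if c.memory <= memory_limit]
    if not feasible:
        return None
    best = min(feasible,
               key=lambda c: (c.mults * t_mult + c.adds * t_add, c.memory))
    return best.name


# Hypothetical figures for a single N, for illustration only.
example = [
    Candidate("alg_a", 100, 300, 50),
    Candidate("alg_b", 150, 250, 30),
    Candidate("alg_c", 80, 260, 90),
]
print(select_dft_method(example, 1.0, 1.0, memory_limit=60))   # alg_b
print(select_dft_method(example, 1.0, 1.0, memory_limit=100))  # alg_c
```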

The PFA was chosen as the best DFT technique because

it minimizes real operations well below the FFT levels,

requires substantially less memory than WFTA, and is more

flexible than the fixed radix FFTs. Of course, the "optimum"

algorithm depends on the specific application and computer,

but for general applications the PFA provides the best mix

of minimizing real operations and memory.
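The property that lets a prime factor algorithm avoid twiddle factors can be shown with a two-factor example. The Python sketch below implements the Good-Thomas index mapping for N = N1·N2 with relatively prime factors. It is a sketch under stated assumptions and not the Burrus-Eschenbacher or Kolba-Parks program: those use optimized short-length modules and in-place, in-order indexing, whereas this sketch uses direct DFTs for the short transforms and extra storage. The names `dft` and `pfa` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct DFT, standing in for the optimized short-length modules."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def pfa(x, n1, n2):
    """Two-factor prime factor (Good-Thomas) DFT for N = n1*n2, gcd(n1, n2) = 1."""
    n = n1 * n2
    if len(x) != n or math.gcd(n1, n2) != 1:
        raise ValueError("need len(x) == n1*n2 with n1, n2 relatively prime")
    # Input index map n = (n2*i1 + n1*i2) mod N: the 1-D DFT becomes a 2-D
    # DFT with no twiddle factors between the two sets of short DFTs.
    grid = [[x[(n2 * i1 + n1 * i2) % n] for i2 in range(n2)] for i1 in range(n1)]
    rows = [dft(row) for row in grid]                    # n1 DFTs of length n2
    cols = [dft([rows[i1][k2] for i1 in range(n1)])      # n2 DFTs of length n1
            for k2 in range(n2)]
    # Output index map (Chinese remainder theorem) restores natural order.
    t1 = pow(n2, -1, n1)   # inverse of n2 modulo n1 (Python 3.8+)
    t2 = pow(n1, -1, n2)   # inverse of n1 modulo n2
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(n2 * t1 * k1 + n1 * t2 * k2) % n] = cols[k2][k1]
    return out


x = [complex(v) for v in range(15)]
X = pfa(x, 3, 5)   # same values as dft(x), up to rounding
```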

5.2 Recommendations

The above conclusions about an algorithm's efficiency were based on real operations and then verified by timing tests on the CDC Cyber 74. The Cyber 74 is a representative large mainframe computer with very high-speed additions and multiplications.

To further substantiate the conclusions of this paper

it is recommended that similar timing tests be made on other

computers (large and small) available at AFIT and the results

compared to the predicted efficiencies based on real additions

and multiplications. All of the data necessary to perform

these tests is available in this paper.

Table 5.1 [table contents illegible in the source copy]

Bibliography

1. Bergland, G.D. "A Fast Fourier Transform Using Base-8 Interactions," Math. Comp., Vol. 22, pp. 275-279, April 1968.

2. Brenner, N.M. "Three FORTRAN Programs that Perform the Cooley-Tukey Fourier Transform," M.I.T. Lincoln Lab, Lexington, MASS, Tech Note 1967-2, July 1967.

3. Brigham, Oran E. The Fast Fourier Transform. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1974.

4. Burrus, C.S. and P.W. Eschenbacher. "An In-Place, In-Order Prime Factor FFT Algorithm," Unpublished Article. School of Electrical Engineering, Rice University, Houston, TX 77001, 1980.

5. Burrus, C.S. and T.W. Parks. "Efficient Techniques for Signal Processing," Final Report, Contract DASG50-78-C-0082, Ballistic Missile Defense Advanced Defence Center, 30 June 1979 (AD B039920L).

6. Cooley, J.W. and J.W. Tukey. "An Algorithm for the Machine Calculation of Complex Fourier Series," Mathematics of Computation, Vol. 19, No. 90, pp. 297-301, 1965.

7. Dubois, E. and A.N. Venetsanopoulos. "A New Algorithm for the Radix-3 FFT," IEEE Trans. on Acoustics, Speech, and Sig. Processing, Vol. ASSP-26, No. 3, pp. 222-225, June 1978.

8. Gentleman, W.M. and G. Sande. "Fast Fourier Transforms for Fun and Profit," 1966 Fall Joint Computer Conf., AFIPS Proc., Vol. 29, Washington, D.C., Spartan, pp. 563-578, 1966.

9. Kolba, D.P. A Prime Factor Algorithm Using High Speed Convolution. MS Thesis. Houston, Texas: Rice University School of Engineering, May 1977.

10. Kolba, D.P. and T.W. Parks. "A Prime Factor Algorithm Using High Speed Convolution," IEEE Trans. on Acoustics, Speech, and Sig. Processing, Vol. ASSP-25, No. 4, August 1977.

11. McClellan, J.H. and H. Nawab. "Complex General-N Winograd Fourier Transform Algorithm (WFTA)," Programs for Digital Signal Processing, Edited by Digital Signal Processing Committee, IEEE Acoustics, Speech, and Sig. Processing Society. IEEE Press. New York: John Wiley and Sons, Inc., 1979.

12. McClellan, J.H. and C.M. Rader. Number Theory in Digital Signal Processing. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1979.

13. Morris, L.R. "A Comparative Study of Time Efficient FFT and WFTA Programs for General Purpose Computers," IEEE Trans. on Acoustics, Speech, and Sig. Processing, Vol. ASSP-26, No. 2, April 1978.

14. Oppenheim, A.V. and R.W. Schafer. Digital Signal Processing. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1975.

15. Pollard, J.M. "The Fast Fourier Transform in a Finite Field," Math. Comp., Vol. 25, April 1971.

16. Rabiner, L.R. and B. Gold. Theory and Application of Digital Signal Processing. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1975.

17. Rader, C.M. "Discrete Fourier Transforms When the Number of Data Samples is Prime," Proc. IEEE, Vol. 56, pp. 1107-1108, June 1968.

18. Reed, I.S. and T.K. Truong. "The Use of Finite Fields to Compute Convolutions," IEEE Trans. on Inf. Theory, Vol. IT-21, No. 2, March 1975.

19. Silverman, H.F. "An Introduction to Programming the Winograd Fourier Transform Algorithm," IEEE Trans. on Acoustics, Speech, and Sig. Processing, Vol. ASSP-25, No. 2, April 1977.

20. Singleton, R.C. "On Computing the Fast Fourier Transform," Communications of the ACM, Vol. 10, No. 10, p. 651, October 1967.

21. Singleton, R.C. "An Algorithm for Computing the Mixed Radix Fast Fourier Transform," IEEE Trans. on Audio and Electroacoustics, Vol. AU-17, pp. 93-103, June 1969.

22. Thomas, L.H. "Using a Computer to Solve Problems in Physics," Applications of Digital Computers, Boston, MASS: Ginn, 1963.

23. Winograd, S. "Some Bilinear Forms Whose Multiplicative Complexity Depends on the Field of Constants," IBM T.J. Watson Res. Ctr., Yorktown Heights, NY, IBM Res. Rep. RC 5669, 10 October 1975.

24. Winograd, S. "On Computing the Discrete Fourier Transform," Proc. Nat. Acad. Sci. USA, Vol. 73, No. 4, pp. 1005-1006, April 1976.

25. Zohar, S. "Fast Fourier Transformation: The Algorithm of S. Winograd," NASA, Jet Propulsion Lab, California Institute of Technology, Pasadena, CA, 15 February 1979 (AS N79-19733).

Appendix A. Radix-2 FFT Algorithm

This appendix presents an algorithm for computing the complex fast Fourier transform (FFT) defined by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}, \qquad k = 0, 1, \ldots, N-1,$$

where N = 2^M and M is an integer.

A FORTRAN subroutine is listed for computing the radix-2 FFT of a single-variate forward complex Fourier transform, or for calculating one variate of a multivariate transform.

Arguments.

A = The complex array to be transformed which is

dimensioned to length N.

N = The integer sequence length to be transformed, which must equal 2^M.

M = The integer power of 2.

Usage. For a single variate forward transform:

(1) Specify the input complex sequence A along with

parameters M and N.

(2) Dimension complex array A to length N.

(3) Call FFT2C (A,M,N).

(4) A contains the complex output vector X(k).
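The radix-2 decimation-in-time procedure can be sketched in Python. This is a minimal illustration with the same shape as the calling sequence above (an array A transformed in place, given M), not a translation of FFT2C. It assumes a Python list of complex numbers in place of the FORTRAN complex array. The name `fft2c` is reused only for readability. The sketch first reorders the data into bit-reversed order and then applies M stages of 2-point butterflies.

```python
import cmath


def fft2c(a, m):
    """In-place radix-2 decimation-in-time FFT of the complex list a (len 2**m)."""
    n = 1 << m
    if len(a) != n:
        raise ValueError("len(a) must equal 2**m")

    # Reorder the input into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # m stages of 2-point butterflies; the butterfly span doubles each stage.
    span = 1
    while span < n:
        for k in range(span):
            w = cmath.exp(-1j * cmath.pi * k / span)  # twiddle W_{2*span}^k
            for top in range(k, n, 2 * span):
                bottom = top + span
                t = w * a[bottom]
                a[bottom] = a[top] - t
                a[top] = a[top] + t
        span *= 2
    return a


data = [complex(v) for v in range(8)]
fft2c(data, 3)   # data now holds X(0), ..., X(7); X(0) is 28
```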

[FORTRAN listing of subroutine FFT2C: illegible in the source copy]

Appendix B. Radix-3 FFT Algorithm

This section presents an algorithm for computing the fast Fourier transform (FFT) based on the decimation-in-time method described in Chapter III. This algorithm is an efficient method for computing the transformation:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}, \qquad k = 0, 1, 2, \ldots, N-1,$$

where X(k) and x(n) are complex valued. This algorithm requires that the sequence length be N = 3^M, M = 0, 1, 2, ....

This appendix lists a FORTRAN subroutine for computing

the radix-3 FFT. This subroutine computes the single-

variate complex Fourier transform or calculates one

variate of a multivariate transform.

Arguments.

A = The real part of the array to be transformed

which is dimensioned to length N.

B = The imaginary part of the array to be transformed

which is dimensioned to length N.

M = The exponent of 3.

N = The length of the data sequence (N = 3^M).

IW = A work vector of length M.

WKS and WKC = Storage arrays of length N used for

sine and cosine lookup tables.

Usage. For a single variate forward transform:

(1) Specify the input sequences A and B along with the parameters M and N.

(2) Dimension A, B, IW, WKS, and WKC to the correct lengths.

(3) Call FFT3TM (A,B,M,N,IW,WKC,WKS).

(4) A and B are the output real and imaginary portion of

the complex vector X(k).
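A minimal Python sketch of the radix-3 decimation-in-time FFT follows. It assumes complex lists in place of the separate A (real) and B (imaginary) arrays. It also uses recursion instead of the in-place, digit-reversed loop structure of a subroutine like FFT3TM, so it is not a translation of that listing. The name `fft3` is hypothetical. Each 3-point butterfly uses only the real constants 1/2 and sin(2π/3) = √3/2 beyond the twiddle multiplications, because W_3 = -1/2 - j√3/2.

```python
import cmath
import math

SIN60 = math.sqrt(3.0) / 2.0  # sin(2*pi/3)


def fft3(x):
    """Recursive radix-3 decimation-in-time FFT; len(x) must be 3**M."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 3:
        raise ValueError("length must be a power of 3")
    m = n // 3
    a = fft3(x[0::3])   # DFT of x(3r)
    b = fft3(x[1::3])   # DFT of x(3r+1)
    c = fft3(x[2::3])   # DFT of x(3r+2)
    out = [0j] * n
    for k in range(m):
        bk = b[k] * cmath.exp(-2j * cmath.pi * k / n)   # W_N^k  * B(k)
        ck = c[k] * cmath.exp(-4j * cmath.pi * k / n)   # W_N^2k * C(k)
        s = bk + ck
        d = -1j * SIN60 * (bk - ck)
        mid = a[k] - 0.5 * s
        out[k] = a[k] + s
        out[k + m] = mid + d        # A + W_3 * bk + W_3^2 * ck
        out[k + 2 * m] = mid - d    # A + W_3^2 * bk + W_3 * ck
    return out
```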

[FORTRAN listing of subroutine FFT3TM: illegible in the source copy]

Appendix C. Radix-3 FFT in R(u)

This appendix presents an algorithm for computing the radix-3 FFT based on a method that transforms the array from the complex domain (1, i) to the R(u) domain (1, u).

Arguments

A = Real portion of the complex data sequence to be

transformed. It is dimensioned to length N.

B = Imaginary portion of the complex data sequence

to be transformed. It is dimensioned to length N.

M = The exponent of 3.

N = The length of the data sequence (N = 3^M).

IW = Work vector dimensioned to length M.

WKC and WKS = Storage arrays dimensioned to length N and used for the sine and cosine lookup tables.

RTEST = Set equal to zero or one. If the data sequence is real, RTEST = 1; if the data sequence is complex, RTEST = 0.

Usage. This algorithm is an efficient method for computing the transformation:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}, \qquad k = 0, 1, \ldots, N-1.$$

The algorithm restricts N to equal 3^M, where M = 0, 1, 2, ....

For a single variate forward transform:

(1) Specify the input sequences A and B along with

parameters M, N, and RTEST.

(2) Dimension A, B, WKC, WKS, and IW.

(3) Call FFT3RU (A,B,M,N,IW,WKC,WKS,RTEST).
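The key property of the R(u) representation can be sketched in Python. Assume u = exp(-j2π/3), so that 1 + u + u² = 0; the sign convention used in FFT3RU may differ. A number stored as p + qu is then multiplied by u or u² with a single subtraction and no multiplications, so a 3-point butterfly in R(u) needs only additions and subtractions. Conversion into and out of R(u) does cost multiplications. This is presumably where the multiplication savings reported in Chapter 5 originate. `to_ru`, `from_ru`, `times_u`, `times_u2`, and `butterfly3_ru` are hypothetical names, and this is not the thesis's subroutine.

```python
import cmath
import math

SQRT3 = math.sqrt(3.0)
U = cmath.exp(-2j * cmath.pi / 3)  # assumed choice of u; 1 + u + u**2 == 0


def to_ru(z):
    """Complex z -> (p, q) with z = p + q*u (costs real multiplications)."""
    q = -2.0 * z.imag / SQRT3
    return (z.real + 0.5 * q, q)


def from_ru(pq):
    """(p, q) -> complex number p + q*u."""
    p, q = pq
    return complex(p - 0.5 * q, -SQRT3 / 2.0 * q)


def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def times_u(x):
    """u*(p + q u) = -q + (p - q) u, using u**2 = -1 - u: one subtraction."""
    p, q = x
    return (-q, p - q)


def times_u2(x):
    """u**2 * (p + q u) = (q - p) - p u: one subtraction."""
    p, q = x
    return (q - p, -p)


def butterfly3_ru(a, b, c):
    """3-point DFT of (a, b, c) in R(u), using only additions and subtractions."""
    y0 = add(add(a, b), c)
    y1 = add(add(a, times_u(b)), times_u2(c))
    y2 = add(add(a, times_u2(b)), times_u(c))
    return y0, y1, y2
```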



[FORTRAN listing of subroutine FFT3RU: illegible in the source copy]

Appendix D. Radix-5 FFT Algorithm

This section presents an algorithm for computing the FFT based on decimation-in-time of the discrete Fourier transform defined by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}, \qquad k = 0, 1, 2, \ldots, N-1.$$

The algorithm restricts the length of the sequence to N = 5^M, where M is an integer.

In this appendix a FORTRAN subroutine FFT5TF is listed

for computing the radix-5 FFT. This subroutine computes

the single-variate complex Fourier transform or performs

the calculation for one variate of a multivariate transform.

Arguments

A = Real portion of the complex data sequence to be

transformed. It is dimensioned to length N.

B = Imaginary portion of the complex data sequence.

It is dimensioned to length N.

M = Exponent of 5, where N = 5^M.

N = Length of the data sequence (N = 5^M).

IW = Work vector of length M.

WKC and WKS = Storage arrays dimensioned to length N


Usage. For a single variate forward transform:

(1) Specify the input sequences A and B along with the

parameters M and N.

(2) Dimension A,B,IW,WKS and WKC to correct lengths.



[FORTRAN listing of subroutine FFT5TF: illegible in the source copy]

Radix-5 FFT Theory

This section presents the theory of the radix-5 FFT, starting with the DFT definition and then decomposing the DFT equation using the decimation-in-time algorithm (Cooley and Tukey, 1965). This development closely parallels the radix-3 development presented earlier, so the radix-5 theory is given only briefly.

The DFT X(k) is computed by separating the discrete time sequence x(n) into five N/5-point sequences (N must equal 5^m, m = 0, 1, 2, ...). X(k) is given by the DFT expression:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \qquad W_N = e^{-j2\pi/N} \qquad (D.1)$$

Breaking x(n) into five N/5-point sequences yields x(5r), x(5r+1), x(5r+2), x(5r+3), and x(5r+4). Using these sequences in Eq (D.1) gives:

$$X(k) = \sum_{r=0}^{N/5-1} x(5r)\, W_N^{5rk} + \sum_{r=0}^{N/5-1} x(5r+1)\, W_N^{(5r+1)k} + \sum_{r=0}^{N/5-1} x(5r+2)\, W_N^{(5r+2)k} + \sum_{r=0}^{N/5-1} x(5r+3)\, W_N^{(5r+3)k} + \sum_{r=0}^{N/5-1} x(5r+4)\, W_N^{(5r+4)k} \qquad (D.2)$$

By regrouping exponents and making the substitution

$$W_N^{5r} = W_{N/5}^{r} \qquad (D.3)$$

Eq (D.2) can be written in final form as:

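$$X(k) = \sum_{r=0}^{N/5-1} x(5r)\, W_{N/5}^{rk} + W_N^{k} \sum_{r=0}^{N/5-1} x(5r+1)\, W_{N/5}^{rk} + W_N^{2k} \sum_{r=0}^{N/5-1} x(5r+2)\, W_{N/5}^{rk} + W_N^{3k} \sum_{r=0}^{N/5-1} x(5r+3)\, W_{N/5}^{rk} + W_N^{4k} \sum_{r=0}^{N/5-1} x(5r+4)\, W_{N/5}^{rk} \qquad (D.4)$$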

Each summation in Eq (D.4) is an N/5-point DFT of one of the five N/5-length sequences, and the W_N terms in front of the summations are the butterfly multipliers.

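Eq (D.4) can be rewritten in terms of the N/5-point DFTs A, B, C, D, and E of x(5r), x(5r+1), x(5r+2), x(5r+3), and x(5r+4) as:

$$X(k) = A(m) + W_N^{k} B(m) + W_N^{2k} C(m) + W_N^{3k} D(m) + W_N^{4k} E(m) \qquad (D.5)$$

where m = k mod (N/5), since each N/5-point DFT is periodic with period N/5.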

For N = 5^2 = 25, the Eq (D.5) representation is shown in Figure D.1, which uses a less cumbersome FFT notation (Rabiner and Gold, 1975). X(k) is obtained by evaluating Eq (D.5) as:

$$X(0) = A(0) + B(0) + C(0) + D(0) + E(0)$$

$$X(1) = A(1) + W_{25}^{1} B(1) + W_{25}^{2} C(1) + W_{25}^{3} D(1) + W_{25}^{4} E(1)$$

$$X(2) = A(2) + W_{25}^{2} B(2) + W_{25}^{4} C(2) + W_{25}^{6} D(2) + W_{25}^{8} E(2)$$

$$\vdots$$

$$X(6) = A(1) + W_{25}^{6} B(1) + W_{25}^{12} C(1) + W_{25}^{18} D(1) + W_{25}^{24} E(1)$$

Figure D.1. First Stage Decimation for N = 25.

$$\vdots$$

$$X(24) = A(4) + W_{25}^{24} B(4) + W_{25}^{48} C(4) + W_{25}^{72} D(4) + W_{25}^{96} E(4)$$

The above expressions explicitly describe the first stage decimation for N = 25. The next step is to evaluate A(m) through E(m), which are also 5-point DFTs. The 5-point DFT A(m) can be evaluated as:

$$A(m) = \sum_{r=0}^{N/5-1} a(r)\, W_{N/5}^{rm} \qquad (D.6)$$

where a(r) = x(5r),

which results in five N/25-length sequences:

$$A(m) = \sum_{i=0}^{N/25-1} a(5i)\, W_{N/25}^{im} + W_{N/5}^{m} \sum_{i=0}^{N/25-1} a(5i+1)\, W_{N/25}^{im} + W_{N/5}^{2m} \sum_{i=0}^{N/25-1} a(5i+2)\, W_{N/25}^{im} + W_{N/5}^{3m} \sum_{i=0}^{N/25-1} a(5i+3)\, W_{N/25}^{im} + W_{N/5}^{4m} \sum_{i=0}^{N/25-1} a(5i+4)\, W_{N/25}^{im} \qquad (D.7)$$

It can be seen from Figure D.1 that a(5i) = x(0), a(5i+1) = x(5), a(5i+2) = x(10), a(5i+3) = x(15), and a(5i+4) = x(20) for the 5-point DFT A(m). The final expression for the 5-point DFT A(m) follows from Eq (D.7) with N = 25:


$$A(0) = a(0) + W_5^{0} a(1) + W_5^{0} a(2) + W_5^{0} a(3) + W_5^{0} a(4) \qquad (D.8)$$

$$A(1) = a(0) + W_5^{1} a(1) + W_5^{2} a(2) + W_5^{3} a(3) + W_5^{4} a(4) \qquad (D.9)$$

$$A(2) = a(0) + W_5^{2} a(1) + W_5^{4} a(2) + W_5^{6} a(3) + W_5^{8} a(4) \qquad (D.10)$$

$$A(3) = a(0) + W_5^{3} a(1) + W_5^{6} a(2) + W_5^{9} a(3) + W_5^{12} a(4) \qquad (D.11)$$

$$A(4) = a(0) + W_5^{4} a(1) + W_5^{8} a(2) + W_5^{12} a(3) + W_5^{16} a(4) \qquad (D.12)$$

From Eqs (D.8)-(D.12) the basic butterfly multipliers are derived to be:

$$X(k) = A(k) + W_N^{k} B(k) + W_N^{2k} C(k) + W_N^{3k} D(k) + W_N^{4k} E(k) \qquad (D.13)$$

$$X(k+r) = A(k) + W_N^{k+r} B(k) + W_N^{2k+2r} C(k) + W_N^{3k+3r} D(k) + W_N^{4k+4r} E(k) \qquad (D.14)$$

$$X(k+2r) = A(k) + W_N^{k+2r} B(k) + W_N^{2k+4r} C(k) + W_N^{3k+6r} D(k) + W_N^{4k+8r} E(k) \qquad (D.15)$$

$$X(k+3r) = A(k) + W_N^{k+3r} B(k) + W_N^{2k+6r} C(k) + W_N^{3k+9r} D(k) + W_N^{4k+12r} E(k) \qquad (D.16)$$

$$X(k+4r) = A(k) + W_N^{k+4r} B(k) + W_N^{2k+8r} C(k) + W_N^{3k+12r} D(k) + W_N^{4k+16r} E(k) \qquad (D.17)$$

Eqs (D.13)-(D.17) are shown as a twiddle factor butterfly in Figure D.2, where "r" is the distance between the butterfly end points. Since N = 5r, the butterfly multipliers reduce to the constant complex multipliers:

$$W_N^{r} = W_N^{6r} = W_N^{16r} = \cos(2\pi/5) - j\sin(2\pi/5)$$

$$W_N^{2r} = W_N^{12r} = \cos(4\pi/5) - j\sin(4\pi/5)$$

$$W_N^{3r} = (W_N^{2r})^{*} = W_N^{8r} = \cos(4\pi/5) + j\sin(4\pi/5)$$

$$W_N^{4r} = (W_N^{r})^{*} = W_N^{9r} = \cos(2\pi/5) + j\sin(2\pi/5)$$

These constant butterfly multipliers are computed once during the FFT computation and used in every radix-5 butterfly.
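Eqs (D.5) and (D.13)-(D.17) can be turned into a short recursive Python sketch. This is an illustration under stated assumptions, not a translation of FFT5TF: it uses complex lists instead of separate A and B arrays, and recursion instead of in-place loops. The name `fft5` is hypothetical. The constant multipliers W_5^p are computed once, as described above. Each output X(k + qr) combines the twiddled sub-DFTs using W_N^{p(k+qr)} = W_N^{pk} W_5^{pq}.

```python
import cmath

# Constant radix-5 butterfly multipliers W_5^p = exp(-j*2*pi*p/5), computed once.
W5 = [cmath.exp(-2j * cmath.pi * p / 5) for p in range(5)]


def fft5(x):
    """Recursive radix-5 decimation-in-time FFT; len(x) must be 5**M."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 5:
        raise ValueError("length must be a power of 5")
    r = n // 5                                   # distance between butterfly end points
    parts = [fft5(x[p::5]) for p in range(5)]    # A(k), B(k), C(k), D(k), E(k)
    out = [0j] * n
    for k in range(r):
        # Twiddle the p-th sub-DFT by W_N^{pk}, as in Eq (D.5).
        t = [parts[p][k] * cmath.exp(-2j * cmath.pi * p * k / n) for p in range(5)]
        for q in range(5):
            # Eqs (D.13)-(D.17): W_N^{p(k+qr)} = W_N^{pk} * W_5^{pq}
            out[k + q * r] = sum(t[p] * W5[(p * q) % 5] for p in range(5))
    return out
```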


Appendix E. Mixed Radix FFT Algorithm

This section presents an algorithm for computing the FFT based on the discrete Fourier transform:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}$$

The algorithm described here accepts any sequence length N that factors entirely into 2s, 3s, 4s, and 5s. To aid in selecting an appropriate sequence length for this algorithm, the numbers less than 50,000 containing no prime factors larger than five are listed in Table E.
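The admissible lengths listed in Table E can also be generated directly. The Python sketch below lists every integer below a limit whose only prime factors are 2, 3, and 5. Whether Table E itself starts at 1 or at some larger length is not recoverable from the source. The name `five_smooth_below` is hypothetical.

```python
def five_smooth_below(limit):
    """All integers 1 <= n < limit whose prime factors are only 2, 3 and 5."""
    values = []
    p2 = 1
    while p2 < limit:                 # powers of 2
        p23 = p2
        while p23 < limit:            # times powers of 3
            p235 = p23
            while p235 < limit:       # times powers of 5
                values.append(p235)
                p235 *= 5
            p23 *= 3
        p2 *= 2
    return sorted(values)


lengths = five_smooth_below(50_000)
print(lengths[:18])  # [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30]
```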

Arguments

A = The real portion of the complex data sequence to be transformed. It is dimensioned to length N.

B = Imaginary portion of the complex data sequence to

be transformed. It is dimensioned to length N.

M = Number of factors of N.

WKC and WKS = Storage arrays dimensioned to length N


N = Length of the sequence to be transformed. N must be an integer power of 2, 3, 4, 5, or a combination thereof.

AT and BT = Arrays used in the subroutine for tem-

porary storage of A and B during the data reordering (digit

reversal).

NFAC = Contains all the factors of N. NFAC is computed

by the user and passed to the subroutine in the argument list.

Dimensioned to length M.

IWK = Array dimensioned to length 4, where

IWK(1) = powers of 5

IWK(2) = powers of 4

IWK(3) = powers of 3

IWK(4) = powers of 2 (must be 0 or 1)

Usage. The subroutine listed permits a maximum of 11 factors, which is adequate for any N less than 2^16 with the factoring used by this subroutine.

(1) Dimension arrays A,B,AT,BT,WKC, and WKS to length

N and array NFAC to length M.

(2) Factor N and store the factors in array NFAC. Array NFAC must contain the factors of N starting with the highest factor, 5, and continuing to the lowest, 2. For example, for N = 480: NFAC(1) = 5, NFAC(2) = 4, NFAC(3) = 4, NFAC(4) = 3, NFAC(5) = 2.

(3) Specify the integer powers of 5, 4, 3, and 2 in the array IWK. For example, for N = 480: IWK(1) = 1, IWK(2) = 2, IWK(3) = 1, IWK(4) = 1. In general, if

$$N = 2^{m}\, 3^{n}\, 4^{p}\, 5^{q},$$

then IWK(1) = q, IWK(2) = p, IWK(3) = n, IWK(4) = m.

(4) Specify values for A, the real part of the data sequence, and B, the imaginary part of the data sequence.

(5) Call FFTMR(A,B,M,N,WKC,WKS,AT,BT,NFAC,IWK).

(6) A and B contain the real and imaginary parts of the transform X(k).
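Steps (2) and (3) can be automated, and the resulting factor list can drive a mixed radix transform. The Python sketch below assumes the factor ordering described above: all 5s, then 4s, then 3s, then at most one 2. `fftmr_factors` returns NFAC and IWK for a given N. `fft_mixed` is a recursive mixed radix decimation-in-time FFT that consumes the factor list. Both names are hypothetical and are not part of FFTMR. The recursive structure also differs from the subroutine's in-place loops and digit reversal.

```python
import cmath


def fftmr_factors(n):
    """Split n into NFAC order (5s, then 4s, then 3s, then at most one 2).

    Returns (nfac, iwk) with iwk = [powers of 5, 4, 3, 2], mirroring the
    NFAC/IWK conventions described in the usage steps.
    """
    iwk = [0, 0, 0, 0]
    rest = n
    for slot, f in enumerate((5, 4, 3, 2)):
        while rest % f == 0:
            rest //= f
            iwk[slot] += 1
    if rest != 1:
        raise ValueError("n has a prime factor larger than 5")
    nfac = [5] * iwk[0] + [4] * iwk[1] + [3] * iwk[2] + [2] * iwk[3]
    return nfac, iwk


def fft_mixed(x, nfac):
    """Recursive mixed radix DIT FFT; the product of nfac must equal len(x)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if not nfac or n % nfac[0]:
        raise ValueError("nfac does not match len(x)")
    p = nfac[0]
    m = n // p
    subs = [fft_mixed(x[s::p], nfac[1:]) for s in range(p)]  # p DFTs of length m
    out = [0j] * n
    for k in range(m):
        # Twiddle each sub-DFT by W_N^{sk}, then do one radix-p butterfly.
        t = [subs[s][k] * cmath.exp(-2j * cmath.pi * s * k / n) for s in range(p)]
        for q in range(p):
            out[k + q * m] = sum(t[s] * cmath.exp(-2j * cmath.pi * s * q / p)
                                 for s in range(p))
    return out


print(fftmr_factors(480))  # ([5, 4, 4, 3, 2], [1, 2, 1, 1])
```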

Table E. Numbers Less Than 50,000 Containing No Prime Factors Larger Than Five [entries illegible in the source copy]

[FORTRAN listing of subroutine FFTMR: illegible in the source copy]

Operations Count for FFTMP

The operations count for the factorization used in this algorithm is a function of (1) the number of butterflies, (2) the number of complex twiddle factors, and (3) the number of times the cosine and sine difference equations must be computed. The number of butterflies in a mixed radix algorithm has been shown to be (Singleton, 1969):

$$\sum_{i=1}^{m} \frac{N}{p_i} \qquad \text{(E.1)}$$

and the number of complex twiddle factors is:

$$\sum_{i=1}^{m} \frac{N(p_i-1)}{p_i} - (N-1) \qquad \text{(E.2)}$$

where $N = p_1 p_2 \cdots p_m$. The radices in this algorithm are restricted to:

$$N = 2^r\, 3^s\, 4^t\, 5^u \qquad \text{(E.3)}$$
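Eqs (E.1) and (E.2) can be evaluated mechanically once N is written as a list of radices. The Python sketch below does this with exact fractions; the function name and the list-of-radices input are illustrative assumptions, not part of FFTMP.

```python
from fractions import Fraction

def butterflies_and_twiddles(radices):
    """Evaluate Eq (E.1) and Eq (E.2) for N = p1*p2*...*pm.

    radices: list of the factors p_i (e.g. [2, 3, 5] for N = 30).
    Returns (butterflies, complex_twiddles) as exact Fractions.
    """
    N = 1
    for p in radices:
        N *= p
    # Eq (E.1): one group of N/p_i butterflies per stage
    butterflies = sum(Fraction(N, p) for p in radices)
    # Eq (E.2): N(p_i-1)/p_i twiddles per stage, less the N-1 trivial ones
    twiddles = sum(Fraction(N * (p - 1), p) for p in radices) - (N - 1)
    return butterflies, twiddles
```

For N = 30 = 2x3x5 this gives 15 + 10 + 6 = 31 butterflies and 15 + 20 + 24 - 29 = 30 complex twiddle factors.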

Given the factorization in Eq (E.3), the radix-2 section (where $p_i = 2$) has

$$\sum_{i=1}^{r} \frac{N}{p_i} = \sum_{i=1}^{r} \frac{N}{2} = \frac{rN}{2} \qquad \text{(E.4)}$$

butterflies, which require four real additions each. The number of complex twiddle factors for the radix-2 section is given as:

$$\sum_{i=1}^{r} \frac{N(p_i-1)}{p_i} = \sum_{i=1}^{r} \frac{N}{2} = \frac{rN}{2} \qquad \text{(E.5)}$$

which require four real multiplications and two real additions each. Notice that the N-1 term has not been subtracted as in Eq (E.2). The N-1 term will be subtracted after the total operations count has been derived for factors of 3, 4, and 5 and combined with the factors of 2. Using Eqs (E.4)-(E.5) and the number of additions and multiplications required for each butterfly and twiddle factor gives the operations count for the radix-2 section as:

$$\text{real mult} = 4\left(\frac{rN}{2}\right) = 2rN \qquad \text{(E.6)}$$

$$\text{real adds} = 4\left(\frac{rN}{2}\right) + 2\left(\frac{rN}{2}\right) = 3rN \qquad \text{(E.7)}$$

The radix-3 section requires 4 real multiplications and 12 real additions per butterfly, and 4 real multiplications and 2 real additions per complex twiddle factor. Using Eqs (E.1) and (E.2), the number of butterflies for $p_i = 3$ is:

$$\sum_{i=1}^{s} \frac{N}{p_i} = \sum_{i=1}^{s} \frac{N}{3} = \frac{sN}{3} \qquad \text{(E.8)}$$

and the number of twiddle factors (neglecting the N-1 term) is:

$$\sum_{i=1}^{s} \frac{N(p_i-1)}{p_i} = \frac{2sN}{3} \qquad \text{(E.9)}$$

Combining the additions and multiplications required for each butterfly and twiddle factor with Eqs (E.8)-(E.9) gives the operations count for the radix-3 section as:

$$\text{real mult} = 4\left(\frac{sN}{3}\right) + 4\left(\frac{2sN}{3}\right) = 4sN \qquad \text{(E.10)}$$

$$\text{real adds} = 12\left(\frac{sN}{3}\right) + 2\left(\frac{2sN}{3}\right) = \frac{16sN}{3} \qquad \text{(E.11)}$$

The radix-4 section has zero real multiplications and 16 real additions per butterfly, with 4 real multiplications and 2 real additions per twiddle factor. The number of butterflies, where $p_i = 4$, is given by:

$$\sum_{i=1}^{t} \frac{N}{p_i} = \sum_{i=1}^{t} \frac{N}{4} = \frac{tN}{4} \qquad \text{(E.12)}$$

and the number of twiddle factors is:

$$\sum_{i=1}^{t} \frac{N(p_i-1)}{p_i} = \sum_{i=1}^{t} \frac{3N}{4} = \frac{3tN}{4} \qquad \text{(E.13)}$$

Using the number of multiplications and additions per butterfly and twiddle factor with Eqs (E.12)-(E.13) gives the total operations for factors of 4 as:

$$\text{real mult} = 4\left(\frac{3tN}{4}\right) = 3tN \qquad \text{(E.14)}$$

$$\text{real adds} = 16\left(\frac{tN}{4}\right) + 2\left(\frac{3tN}{4}\right) = \frac{11tN}{2} \qquad \text{(E.15)}$$

The radix-5 section requires 16 real multiplications and 32 real additions per butterfly, with 4 real multiplications and 2 real additions per twiddle factor. Using Eqs (E.1) and (E.2) with $p_i = 5$ gives the total number of butterflies as:

$$\sum_{i=1}^{u} \frac{N}{p_i} = \sum_{i=1}^{u} \frac{N}{5} = \frac{uN}{5} \qquad \text{(E.16)}$$

and the number of twiddle factors as:

$$\sum_{i=1}^{u} \frac{N(p_i-1)}{p_i} = \sum_{i=1}^{u} \frac{4N}{5} = \frac{4uN}{5} \qquad \text{(E.17)}$$

Combining Eqs (E.16)-(E.17) with the operations required for butterflies and twiddle factors in the radix-5 section gives the total as:

$$\text{real mult} = 16\left(\frac{uN}{5}\right) + 4\left(\frac{4uN}{5}\right) = \frac{32uN}{5} \qquad \text{(E.18)}$$

$$\text{real adds} = 32\left(\frac{uN}{5}\right) + 2\left(\frac{4uN}{5}\right) = 8uN$$

Using the results of Eqs (E.4)-(E.18) and subtracting the N-1 complex twiddles gives the number of real operations used for butterflies and twiddle factors in the mixed radix algorithm. The expressions are:

$$\text{real mult} = 2rN + 4sN + 3tN + \frac{32uN}{5} - 4(N-1) \qquad \text{(E.19)}$$

$$\text{real adds} = 3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN - 2(N-1) \qquad \text{(E.20)}$$

Recall that Eqs (E.19)-(E.20) account for only two of the three sources of real operations in this algorithm. The third source is computing the sine and cosine look-up table. From the FORTRAN program in this appendix, the expressions computing the look-up table are:

WKC(I) = C*WKC(I-1) - S*WKS(I-1) + WKC(I-1)     (E.21)

WKS(I) = C*WKS(I-1) + S*WKC(I-1) + WKS(I-1)     (E.22)

Each equation requires 5 real additions and 2 real multiplications, and the pair is computed N-1 times for the mixed radix FFT. The real operations required to compute the look-up table are:

$$\text{real mult} = 4(N-1) \qquad \text{(E.23)}$$

$$\text{real adds} = 10(N-1) \qquad \text{(E.24)}$$
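The recurrences (E.21)-(E.22) are the angle-addition formulas written so that a small correction is added to the previous table entry. The constants C and S are not defined in the surviving text; the Python sketch below assumes C = cos(θ) - 1 and S = sin(θ) with θ = 2π/N, which makes WKC(I) and WKS(I) equal cos(Iθ) and sin(Iθ). The function name twiddle_table is hypothetical.

```python
import math

def twiddle_table(N):
    """Build cosine/sine tables with the recurrences of Eqs (E.21)-(E.22).

    Assumption (not stated in the surviving text): C = cos(theta) - 1 and
    S = sin(theta) with theta = 2*pi/N, starting from WKC(0) = 1, WKS(0) = 0.
    """
    theta = 2.0 * math.pi / N
    C = math.cos(theta) - 1.0   # small number; adding it to 1 recovers cos(theta)
    S = math.sin(theta)
    wkc, wks = [1.0], [0.0]
    for i in range(1, N):       # the pair of recurrences is evaluated N-1 times
        c_prev, s_prev = wkc[i - 1], wks[i - 1]
        wkc.append(C * c_prev - S * s_prev + c_prev)   # Eq (E.21)
        wks.append(C * s_prev + S * c_prev + s_prev)   # Eq (E.22)
    return wkc, wks
```

Writing the multiplier as cos(θ) - 1 rather than cos(θ) keeps the correction term small relative to the running value, which is the usual reason for this form of the recurrence.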

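Combining Eqs (E.23)-(E.24) with the real operations for butterflies and twiddle factors gives the total real operations for the mixed radix FFT:

$$\begin{aligned}
\text{real mult} &= 2rN + 4sN + 3tN + \frac{32uN}{5} - 4(N-1) + 4(N-1) \\
&= 2rN + 4sN + 3tN + \frac{32uN}{5} \qquad \text{(E.25)}
\end{aligned}$$

$$\begin{aligned}
\text{real adds} &= 3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN - 2(N-1) + 10(N-1) \\
&= 3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN + 8(N-1) \qquad \text{(E.26)}
\end{aligned}$$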
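As a check on the arithmetic, Eqs (E.25)-(E.26) can be evaluated directly for a given factorization. The Python sketch below takes the exponents r, s, t, u of Eq (E.3); the function name is illustrative and the counts are exactly those of the two equations, nothing more.

```python
from fractions import Fraction

def fftmp_real_ops(r, s, t, u):
    """Real multiplications and additions from Eqs (E.25)-(E.26)
    for N = 2**r * 3**s * 4**t * 5**u. Returns (N, mults, adds)."""
    N = 2**r * 3**s * 4**t * 5**u
    mults = 2*r*N + 4*s*N + 3*t*N + Fraction(32*u*N, 5)                 # Eq (E.25)
    adds = (3*r*N + Fraction(16*s*N, 3) + Fraction(11*t*N, 2)
            + 8*u*N + 8*(N - 1))                                         # Eq (E.26)
    return N, mults, adds
```

For N = 30 (r = s = u = 1) the formulas give 372 real multiplications and 722 real additions. For N = 1024 they give 15,360 multiplications and 36,344 additions with five radix-4 stages, against 20,480 and 38,904 with ten radix-2 stages.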


Development of the Mixed

Radix Digit-Reversed Algorithm

Assuming that the number of points to be transformed satisfies $N = r_1 r_2 \cdots r_m$, where $r_1, r_2, \ldots, r_m$ are integer values, the indices of x(n) and X(k) can be expressed as (Brigham, 1974):

$$n = n_{m-1}(r_2 r_3 \cdots r_m) + n_{m-2}(r_3 r_4 \cdots r_m) + \cdots + n_1 r_m + n_0 \qquad \text{(E.27)}$$

$$k = k_{m-1}(r_1 r_2 \cdots r_{m-1}) + k_{m-2}(r_1 r_2 \cdots r_{m-2}) + \cdots + k_1 r_1 + k_0 \qquad \text{(E.28)}$$

where

$$k_{i-1} = 0, 1, 2, \ldots, r_i - 1, \qquad 1 \le i \le m$$

$$n_i = 0, 1, 2, \ldots, r_{m-i} - 1, \qquad 0 \le i \le m-1$$

For $N = 30 = 2 \times 3 \times 5 = r_1 r_2 r_3$ and $m = 3$, the input sequence x(n) counter is:

$$n = n_2(15) + n_1(5) + n_0 \qquad \text{(E.29)}$$

where

$$n_0 = 0, 1, 2, 3, 4 \qquad n_1 = 0, 1, 2 \qquad n_2 = 0, 1$$

The output counter k for X(k) is:

$$k = k_2(6) + k_1(2) + k_0$$

where

$$k_0 = 0, 1 \qquad k_1 = 0, 1, 2 \qquad k_2 = 0, 1, 2, 3, 4$$

To implement the general digit-reversed counter, let the input counter n use the digit-reversed multipliers of the output counter k:

$$n = n_{m-1} + n_{m-2}(r_1) + \cdots + n_1(r_1 r_2 \cdots r_{m-2}) + n_0(r_1 r_2 \cdots r_{m-1}) \qquad \text{(E.30)}$$

For the example $r_1 r_2 r_3 = 2 \times 3 \times 5 = 30$, the digit-reversed counter becomes:

$$n = n_2 + 2n_1 + 6n_0 \qquad \text{(E.31)}$$

where, as before:

$$n_0 = 0, 1, 2, 3, 4 \qquad n_1 = 0, 1, 2 \qquad n_2 = 0, 1$$
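The two counters can be compared digit by digit: decompose each natural-order index n with the weights of Eq (E.27), then rebuild it with the weights of Eq (E.30). The Python sketch below does exactly that; the function name and the list-of-radices input are illustrative, not taken from the thesis program.

```python
def digit_reversed_indices(radices):
    """Digit-reversed counter of Eq (E.30) for N = r1*r2*...*rm.

    radices: [r1, r2, ..., rm]. Entry n of the result is the value of the
    counter of Eq (E.30) for the digits (n_{m-1}, ..., n_0) that give n
    under Eq (E.27).
    """
    m = len(radices)
    N = 1
    for r in radices:
        N *= r
    perm = []
    for n in range(N):
        # Peel off digits n_0, n_1, ..., n_{m-1}; digit n_i has radix r_{m-i}.
        digits, rem = [], n
        for i in range(m):
            r = radices[m - 1 - i]
            digits.append(rem % r)
            rem //= r
        # Eq (E.30): n_{m-1} has weight 1, n_{m-2} weight r1, ...,
        # n_0 weight r1*r2*...*r_{m-1}.
        rev, weight = 0, 1
        for j in range(m):
            rev += digits[m - 1 - j] * weight
            weight *= radices[j]
        perm.append(rev)
    return perm
```

For radices [2, 3, 5] the natural index 1 (n_0 = 1) maps to 6, index 5 (n_1 = 1) maps to 2, and index 15 (n_2 = 1) maps to 1, as Eq (E.31) predicts. For radices [2, 2, 2] the function reduces to ordinary bit reversal.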

Appendix F. Singleton's Mixed Radix FFT

This program was written by R.C. Singleton and published by the IEEE Press in "Programs for Digital Signal Processing". It computes the DFT defined by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}$$

It also computes the 1/N scaled inverse Fourier transform.
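A direct evaluation of this definition is a convenient reference when checking the output of any FFT routine, including those in these appendices. The Python sketch below costs O(N²) operations and is not Singleton's algorithm; the function name dft and the inverse flag are illustrative.

```python
import cmath

def dft(x, inverse=False):
    """Direct evaluation of X(k) = sum_n x(n) exp(-j*2*pi*n*k/N).

    With inverse=True the exponent sign is flipped and the result is
    scaled by 1/N, matching the 1/N scaled inverse described in the text.
    """
    N = len(x)
    sign = 1 if inverse else -1
    X = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / N) for n in range(N))
         for k in range(N)]
    if inverse:
        X = [v / N for v in X]
    return X
```

For example, dft([1, 2, 3, 4]) gives approximately [10, -2+2j, -2, -2-2j], and applying dft(..., inverse=True) to that result recovers the input.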

The subroutine listed in this appendix factors N into

"square" and "square-free" factors and stores the results

in an array NFAC. It then calls subroutine FFTMX to com-

pute the complex Fourier transform, twiddle the data, and

reorder the complex array to final order.

Use of this subroutine for multi-variate transforms is described in the comments section at the beginning of the program. A multi-variate transform is basically a single-variate transform with modified indexing (Singleton, 1977).

The subroutine listed accepts sequence lengths that have 15 or fewer factors. The smallest number that has more than 15 factors is 12,754,584; if this condition is encountered, an error message is printed.

The transform portion of the subroutine includes sections for factors of 2, 3, 4, and 5 as well as a general section for odd prime factors. The special sections for 2 and 4 perform the twiddle factor multiplication themselves instead of using the general twiddle factor section. "Performing the transform in this manner produces a 10 percent speed improvement over the general twiddle section" (Singleton, 1969). The special sections for 3 and 5 are similar to the general odd factor section but reduce the indexing required and thus improve the speed (Singleton, 1969).

Arguments. The Singleton FFT for computing a complex

single-variate transform is called using the following

arguments:

A = The real part of the array to be transformed and

is dimensioned to length N.

B = The imaginary part of the array to be transformed

and is dimensioned to length N.

N = Length of the input sequence N which must be a

positive integer with no more than 15 factors.

NSPN = The spacing of consecutive data values while

indexing the current variable (in units determined by the

magnitude of ISN).

ISN = The sign of ISN determines the transform direction (negative for forward and positive for inverse). The magnitude of ISN determines the indexing increment for arrays A and B. Normally the magnitude of ISN is unity.

NSEG = An integer value such that NSEG x N x NSPN equals the total number of complex data values.

Usage. For a single-variate forward transform:

(1) Specify the input sequences A and B and the parameters NSEG=1, N=transform length, NSPN=1, and ISN=-1.

(2) Dimension A and B to length N.

(3) Call FFTSNG (A,B,NSEG,N,NSPN,ISN).

(4) A and B are the output real and imaginary portions of the complex vector X(k).

To perform a real-valued, inverse, or multi-variate transform, refer to the comments portion of FFTSNG.
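The NSEG and NSPN arguments are what turn the single-variate routine into a multi-variate one. The Python sketch below emulates the argument convention as described above, using a direct DFT rather than Singleton's algorithm and assuming |ISN| = 1; the function name dft_segments is hypothetical and no scaling is applied.

```python
import cmath

def dft_segments(a, nseg, n, nspn, isn=-1):
    """Hypothetical emulation of the FFTSNG argument convention (direct DFT).

    a holds nseg*n*nspn complex values. Within each of the nseg segments,
    every offset 0..nspn-1 starts a length-n transform whose points are
    nspn elements apart. isn < 0 selects the forward transform; assumes
    |isn| = 1 and applies no scaling.
    """
    if len(a) != nseg * n * nspn:
        raise ValueError("len(a) must equal NSEG*N*NSPN")
    sign = -1 if isn < 0 else 1
    out = list(a)
    for seg in range(nseg):
        base = seg * n * nspn
        for off in range(nspn):
            idx = [base + off + j * nspn for j in range(n)]
            vals = [a[i] for i in idx]
            for k, i in enumerate(idx):
                out[i] = sum(v * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                             for j, v in enumerate(vals))
    return out
```

For an N1 x N2 array stored with the first index varying fastest (FORTRAN order), the first variable is transformed with NSEG=N2, N=N1, NSPN=1 and the second with NSEG=1, N=N2, NSPN=N1; the two passes together give the two-dimensional DFT.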

[Listing of subroutine FFTSNG. Its opening comments state that arrays A and B originally hold the real and imaginary components of the data and return the real and imaginary components of the resulting Fourier coefficients. The rest of this part of the source copy is illegible.]

The International Mathematical and Statistical Libraries (IMSL) contain a mixed radix subroutine that can perform the FFT of a sequence of any positive integer length. This subroutine was based on Singleton's article "On Computing the Fast Fourier Transform", Comm. ACM 10(10), 1967, in which he proposed several ideas used in the IMSL subroutine. As stated in Chapter III, the program closely resembles Singleton's algorithm published in the open literature, but the IMSL version is copyrighted and the FORTRAN code is not listed in this paper. The IMSL description of the algorithm and its usage are included in this appendix for the convenience of the reader, together with a detailed development of the real operations count, which was not presented in the main text.

Real Operations Count for IMSL Mixed Radix Algorithm

A copyrighted mixed radix FFT is available through the International Mathematical and Statistical Libraries (IMSL) on the CDC computer used at AFIT. This subroutine (FFTCC) can accept a sequence of any length N, including prime numbers. It is based on an article written by Singleton, "On Computing the Fast Fourier Transform", published in 1967.

Functionally this subroutine has few differences from Singleton's algorithm described in the preceding section. The factoring, twiddle factors, and reordering of the data are the same; however, the special sections for factors of 3 and 4 require 2 and 8 more additions, respectively, than Singleton's subroutine. Also, this mixed radix algorithm uses the general section for odd prime factors of 5 or greater, which further reduces its efficiency compared to Singleton's.

As in the case of Singleton's FFT subroutine, the real operations count for the IMSL subroutine is determined from the number of twiddle factors:

$$\sum_{i=1}^{m} \frac{N(p_i-1)}{p_i} - (N-1) \qquad \text{(G.1)}$$

and the number of butterflies:

$$\sum_{i=1}^{m} \frac{N}{p_i} \qquad \text{(G.2)}$$

where $N = p_1 p_2 \cdots p_m$. In this subroutine the factorization is performed such that $N = 2^r 3^s 4^t p_1^{m_1} \cdots p_k^{m_k}$, with the real operations count being derived from the FORTRAN-coded subroutine FFTCC and Eqs (G.1) and (G.2). The radix-2 section of FFTCC combines the twiddle factor multiplications with the butterfly computation. In this case there are rN/2 combined butterfly and twiddle factor computations, each using 4 real multiplications and 6 real additions, giving:

$$\text{real mult} = 4\left(\frac{rN}{2}\right) = 2rN \qquad \text{(G.3)}$$

$$\text{real adds} = 6\left(\frac{rN}{2}\right) = 3rN \qquad \text{(G.4)}$$

The radix-3 section uses sN/3 butterflies and 2sN/3 twiddle factors, which require 4 real multiplications and 14 real additions per butterfly and 4 real multiplications and 2 real additions per twiddle factor. Combining the butterflies and twiddle factors, the real operations count for the radix-3 section is given by:

$$\text{real mult} = 4\left(\frac{2sN}{3}\right) + 4\left(\frac{sN}{3}\right) = 4sN \qquad \text{(G.5)}$$

$$\text{real adds} = 14\left(\frac{sN}{3}\right) + 2\left(\frac{2sN}{3}\right) = 6sN \qquad \text{(G.6)}$$

The radix-4 section uses 24 real additions and no real multiplications for each of the tN/4 butterflies. Each of the 3tN/4 twiddle factors requires 2 real additions and 4 real multiplications. Combining the results gives:

$$\text{real mult} = 4\left(\frac{3tN}{4}\right) = 3tN \qquad \text{(G.7)}$$

$$\text{real adds} = 24\left(\frac{tN}{4}\right) + 2\left(\frac{3tN}{4}\right) = \frac{15tN}{2} \qquad \text{(G.8)}$$

All odd prime factors equal to or greater than 5 use the general transform section. Based on the FORTRAN program written by IMSL, there are five sources of real operations in this general radix-$p_i$ transform, excluding the array indexing additions. First, the complex multipliers for the butterfly transmittance are computed:

$$\text{real mult} = 2(p_i-1) \qquad \text{(G.9)}$$

$$\text{real adds} = p_i - 1 \qquad \text{(G.10)}$$

for each new factor $p_i$; e.g., N = 7x4 = 28 and N = 7x7x4 = 196 each require the same $p_i - 1 = 7 - 1$ complex multiplications for the factor $p_i = 7$. Second, the complex twiddle factor multiplications are performed on the data array. Assuming N can be factored as:

$$N = 2^r\, 3^s\, 4^t\, p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$$

where $p_i^{m_i}$ represents the i-th factor raised to some positive integer $m_i$, the number of complex twiddles is $m_i N(p_i-1)/p_i - (N-1)$. The N-1 term is subtracted only once for each FFT, which means the intermediate result can be written as:

$$\text{real mult} = \frac{4 m_i N(p_i-1)}{p_i} \qquad \text{(G.11)}$$

$$\text{real adds} = \frac{2 m_i N(p_i-1)}{p_i} \qquad \text{(G.12)}$$

The individual butterflies are computed next. The first output of each butterfly requires only $8(p_i-1)/2$ real additions and no multiplications. For each factor $p_i^{m_i}$ there are $m_i N/p_i$ butterflies in the FFT, giving:

$$\text{real adds} = \left(\frac{8(p_i-1)}{2}\right)\left(\frac{m_i N}{p_i}\right) = \frac{4 N m_i (p_i-1)}{p_i} \qquad \text{(G.13)}$$

Now the remaining portion of each butterfly is computed using $(p_i-1)^2$ real multiplications and additions. This gives a total of:

$$\text{real mult} = \frac{N(p_i-1)^2 m_i}{p_i} \qquad \text{(G.14)}$$

$$\text{real adds} = \frac{N(p_i-1)^2 m_i}{p_i} \qquad \text{(G.15)}$$

Finally, the results of the butterfly operations are stored in the proper array locations, requiring 4 real additions times $(p_i-1)/2$ times the number of radix-$p_i$ butterflies. This total is:

$$\text{real adds} = \left(\frac{4(p_i-1)}{2}\right)\left(\frac{N m_i}{p_i}\right) = \frac{2 m_i N(p_i-1)}{p_i} \qquad \text{(G.16)}$$

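Combining Eqs (G.9)-(G.16), the number of real operations for the $p_i$ factors becomes:

$$\text{real mult} = \sum_{i=1}^{k}\left(2(p_i-1) + \frac{4 m_i N(p_i-1)}{p_i} + \frac{N(p_i-1)^2 m_i}{p_i}\right) \qquad \text{(G.17)}$$

$$\begin{aligned}
\text{real adds} &= \sum_{i=1}^{k}\left((p_i-1) + \frac{2 m_i N(p_i-1)}{p_i} + \frac{4 m_i N(p_i-1)}{p_i} + \frac{N(p_i-1)^2 m_i}{p_i} + \frac{2 m_i N(p_i-1)}{p_i}\right) \\
&= \sum_{i=1}^{k}\left((p_i-1) + \frac{8 m_i N(p_i-1)}{p_i} + \frac{N(p_i-1)^2 m_i}{p_i}\right) \qquad \text{(G.18)}
\end{aligned}$$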

Using Eqs (G.17) and (G.18) for the odd prime factors and the real operations count for factors of 2, 3, and 4, the total operations count for $N = 2^r 3^s 4^t p_1^{m_1} \cdots p_k^{m_k}$ can be written as:

$$\text{real mult} = 2rN + 4sN + 3tN + \sum_{i=1}^{k}\left(2(p_i-1) + \frac{4 m_i N(p_i-1)}{p_i} + \frac{N m_i (p_i-1)^2}{p_i}\right) - 4(N-1) \qquad \text{(G.19)}$$

$$\text{real adds} = 3rN + 6sN + \frac{15tN}{2} + \sum_{i=1}^{k}\left((p_i-1) + \frac{8 m_i N(p_i-1)}{p_i} + \frac{N m_i (p_i-1)^2}{p_i}\right) - 2(N-1) \qquad \text{(G.20)}$$

As in any FFT, the real operations associated with the twiddle factors have been reduced by N-1 complex twiddle multiplications (4(N-1) real multiplications and 2(N-1) real additions), because the last stage of a decimation-in-frequency FFT, or the first stage of a decimation-in-time FFT, requires no twiddles.
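Eqs (G.19)-(G.20) can be evaluated the same way as Eqs (E.25)-(E.26). In the Python sketch below, the odd prime factors handled by the general section are passed as (p_i, m_i) pairs; the function name and that input format are illustrative assumptions, and the function returns only what the two equations state.

```python
from fractions import Fraction

def fftcc_real_ops(r, s, t, odd_factors):
    """Real operation counts from Eqs (G.19)-(G.20).

    N = 2**r * 3**s * 4**t * prod(p**m for p, m in odd_factors), where
    odd_factors lists (p_i, m_i) pairs for odd primes p_i >= 5.
    Returns (N, mults, adds).
    """
    N = 2**r * 3**s * 4**t
    for p, m in odd_factors:
        N *= p**m
    # Radix-2, -3, -4 sections, Eqs (G.3)-(G.8), less the N-1 trivial twiddles
    mults = 2*r*N + 4*s*N + 3*t*N - 4*(N - 1)
    adds = 3*r*N + 6*s*N + Fraction(15*t*N, 2) - 2*(N - 1)
    # General odd-prime section, Eqs (G.17)-(G.18)
    for p, m in odd_factors:
        mults += 2*(p - 1) + Fraction(4*m*N*(p - 1), p) + Fraction(N*m*(p - 1)**2, p)
        adds += (p - 1) + Fraction(8*m*N*(p - 1), p) + Fraction(N*m*(p - 1)**2, p)
    return N, mults, adds
```

For N = 28 = 4x7 the equations give 228 real multiplications and 498 real additions; for N = 30 = 2x3x5 they give 264 and 504.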

Appendix H. An Algorithm for Computing the WFT

This program computes the DFT defined by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}, \qquad k = 0, 1, \ldots, N-1$$

where the sequence length N is a product of relatively prime factors from the set {2, 3, 4, 5, 7, 8, 9, 16}.

Program Description. The WFTA consists of the six subroutines PERM 1, PERM 2, MULT, WEAVE 1, WEAVE 2, and INISHL. Step One maps the sequence x(n) into a u-dimensional array s(n_1, n_2, ..., n_u). Step Two implements the "pre-weave" modules in subroutine WEAVE 1, one for each factor of N. Each of the pre-weave modules contains only additions. Step Three performs a point-by-point multiplication of the data array by real constants (subroutine MULT) derived from the small-N DFT algorithms. These constant multipliers are functions of the complex exponentials W_N and are the only complex multiplications required in the algorithm. Step Four implements the post-weave module (subroutine WEAVE 2), which contains additions, subtractions, and multiplications by j. Step Five maps the u-dimensional array s(k_1, k_2, ..., k_u) into the correct one-dimensional DFT X(k) according to the Chinese remainder theorem given in Eq (3.144) (McClellan and Nawab, 1979).
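Steps One and Five rest on an index map that turns a length-N DFT with relatively prime factors into a multidimensional DFT with no twiddle factors. The Python sketch below shows the two-factor case with one common choice of maps: an input map n = (N2 n1 + N1 n2) mod N and a Chinese-remainder output map. McClellan and Nawab's program may use a different but equivalent pair of maps, and the short transforms here are direct DFTs rather than Winograd pre-weave/multiply/post-weave modules, so the sketch illustrates the indexing only. The function names are hypothetical.

```python
import cmath
from math import gcd

def dft(x):
    """Direct DFT, X(k) = sum_n x(n) exp(-j*2*pi*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def prime_factor_map_dft(x, N1, N2):
    """Length-N DFT (N = N1*N2, gcd(N1, N2) = 1) computed as an N1 x N2
    two-dimensional DFT with no twiddle factors."""
    N = N1 * N2
    if len(x) != N or gcd(N1, N2) != 1:
        raise ValueError("need len(x) == N1*N2 with N1, N2 relatively prime")
    # Input map (Step One analogue): n = (N2*n1 + N1*n2) mod N.
    s = [[x[(N2 * n1 + N1 * n2) % N] for n2 in range(N2)] for n1 in range(N1)]
    # Length-N2 transforms along each row, then length-N1 along each column.
    s = [dft(row) for row in s]
    cols = [dft([s[n1][k2] for n1 in range(N1)]) for k2 in range(N2)]
    # Output map (Step Five analogue, CRT): k % N1 == k1 and k % N2 == k2.
    e1 = N2 * pow(N2, -1, N1)   # e1 % N1 == 1, e1 % N2 == 0
    e2 = N1 * pow(N1, -1, N2)   # e2 % N1 == 0, e2 % N2 == 1
    X = [0j] * N
    for k1 in range(N1):
        for k2 in range(N2):
            X[(e1 * k1 + e2 * k2) % N] = cols[k2][k1]
    return X
```

The map works because, with these index choices, W_N^{nk} factors into W_{N1}^{n1 k1} W_{N2}^{n2 k2}, so no twiddle multiplications remain between the two sets of short transforms. The three-argument pow with exponent -1 requires Python 3.8 or later.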

Arguments. The WFTA is called using the following arguments. More arguments exist in this list than in the one given by McClellan and Nawab because array storage is minimized in this WFTA version.

N = Transform length, which must be factorable into mutually prime factors from the set {2, 3, 4, 5, 7, 8, 9, 16}. A list of acceptable sequence lengths is given in the leftmost column of Table 3.9a,b.

XR and XI = The real and imaginary arrays to be trans-

formed and are dimensioned to length N in the calling

program.

INIT = A flag specifying whether the call to FFTWIN requires initialization. INIT = 0 means initialization is required, and INIT ≠ 0 skips this phase. Initialization is needed when calling FFTWIN for the first time for a given sequence length.

IERR = Contains an error code upon return from FFTWIN.

If the DFT was successful IERR = 0; if an error occurred

IERR = -1 or -2. There are two causes for an error:

(1) The transform length is illegal, or

(2) The program has not been initialized for

the correct length N sequence.

SR and SI = One-dimensional working arrays of length M = M1 x M2 x M3 x M4, which is the product of the multiplies required by the small-N algorithms. The value of M for any permissible N is given in the rightmost column of Table H.1.

COEF = One-dimensional array of length M used to store the constant coefficients generated by INISHL for the "weave" modules.

INDX 1 and INDX 2 = One-dimensional arrays of length N holding the mapping vectors for the pre- and post-permutations of the data.

Usage

(1) Specify the input sequences XR and XI and the parameters N, INIT, IERR, SR, SI, COEF, INDX 1, INDX 2.

(2) Call WFTA (XR, XI, N, INIT, IERR, SR, SI, COEF, INDX 1, INDX 2).

(3) XR and XI are the output real and imaginary vectors. The error code IERR = 0 specifies successful completion of the transform.

(4) After the initial call, use INIT ≠ 0 as long as N remains constant.

[Listing of the WFTA subroutines, including the input permutation and the pre-weave modules for the 9-, 7-, and 5-point factors. This part of the source copy is illegible.]

S65 0= MLUP&=S.NDC-NP);-: ~ ~~ 4'_ I- =rL Ij P 2'_:=7. r 1 1i: i,,rt[ , ~l:'

: Hk7*:II [] 3 NIFSE 1 i- . i,

.F ['U - i) [44 -1 • r ,

';7, i:[IL .- ' I,.' Fi, * H

- '.-:' + 4 =rJ., -4 : 7 r ', r '4 -

7 - 1 E-1; F Pr&41

P. T I . 4 P _ I PI P::,,. -_ h=T 1=! ' tF~-H : F .P -1 .' t. 4 F i ,::'.:::*11- rF;,..= P'Ft' r'u: :Triv R. '

- ?:: .i=.:-. r:;7 ~,, - - - riR 4 :,

*L " T4= - :r. F' -. I ,. riF', 5-ii- T5=;tr4s *.'t:3;:: 4 * T = ... ,P NR~' 7, €-- I ,. rN ,

8860= TF (h'4) =T I

282

=T t 1= 4 +-TtNP = 14 1-T

I

.-.- '1 iL--- -t! ',l r - ' -i 311N

A fl= 7..'!>

9 , 0 -_ (T= P :., ;,- , , r .

941tAA1 4 0 1 ' lhF"4 ' = N 1

911 1 p-- ..1 FN- H ' E ET4 TF,

9113 ffir'

'41 4' = 14 *r = 1

1.. THE FU L MIF CODE -M L ME ]- TH: 16 POINTr- ' --'- -lH'' , :MOD

N 'P *' P

':0A0 = - F', 'H'.' F,=1=-

'a fl;;.0 =S2O IAF P -i P- ,Sf7 fl= N p = 0 W .I +

9 110 tr1ln0 IF':NH.,NF-E. 16:' RETURgN91NP Ii

914r I 4 +' 19=,0 THE FOILLI.h11N13 COPTE IMP-LEMENI :-. I t 16 POIINT PO:.T-hliE '.,'E riO[,l_

E

94 pi I J;n=t4 1

S1 -;28

91 } 0 = rMk U.F;?_-= 1:: * ,:iT,:.' -Nt f:

0,' '= rIF:&:E_ =1- -'L-h= jJ 1'-.,4I 14= 1 q b

4 - 1- iI l Ii 4' 1 = *n

.- ,11 Li I.- = , ,

' 'I II F4- =I :; -

4---1,1 I ' ,' -i -Ii

'"-: hi H r4 .=Ht. - 1

' .. ' it: HF.'-:=rF;'*I,'

-V 4-s: ii= rP4pl:=rt[, 1 s I

* 9'.' 0tIf= NP'11b=fl ',l 1

283

-4411 4~ +i' 7 z ij

'4.

~~~ 'Z~z-4Z j -

- 4-. liz - , ,; ,<F- ,=;4-4 'i' , €-' :

''4-w':,-l,.:T .:,= -& r1t4. -,, - t ci .

= I ,: 'I I

T Of. r TT ..-.:- =.1' ' T ( 7 ) , i 1 1 T , i 1

-'. ln "' 1 T ,T , -1I' =;. 1. =T ' E-", -T .:5S)

962O= T 1 ) =T ,jz .T '+1T '

9 PE. - f0 = 0.. ,('3) =T ' 9.' -T 1064 I= I' 1:'=T ,1 11 *T ,12)

4 A5 '=) =T ' 11 -T (12'J6 .- = t.),1: 4 k. T ,1 )+ :1

9.k'. ",7 0) CI (5_,) =T ,15'.-: -T ( 1 4

' ,: A - p;, ,'- =T I' +T 1: 11'

97l (i= '7-. riP 7:, =,:. 7:' -i .. :;.-I Il= I-- '.

4=1i i",'-Iii

'r-, '

-.€. 4 i =- ' = : . - -;.i' 4

iI ?r

L "

S41i .. ,T + ..; , - P .- , PS:

-• - , ,.-,=i J 'r. 4 , - ;..,r-4 I- '

:T ; t': ' 4- . T r"7)w

':4 :4 ll= I It. + 14

-'-'4 i = T ':i ,,I = - i F I t - I a. NP 1 4" fq I= T , W fi -,lr-, 1 1-1 + 7I P, r 'p 1 -"

;- J' I= T: I31 i ' I ' - .I NP 1 1)9'-4 -- [= T I:l, ,= I ,r.lwtl ,-SI ,rF' 1 7 l

' l'11 T ,I I't =' w 11' 1'F;- N -. ; P 1 )

284

"4'"C.r:I1 ":1 ', I , = i ,C . +1

-4--.Il= * .'=- 14. . '-;I

I ITlI I I = , I-i,

1 i 1 11= i T.<, =1- - -T 1 ,'

I-II v'1 ,17 T" i-T I '1~~I 4 z ' 4'T +- rT'

. llf i ': ".'Z =0T - 1 s:I 7

17= m~Y 171 T' sr

~~~ Iii;- Tz T4)t 'z '

1 ftF : -izI ,:tI ' :, ',, - ,- ,-,

U 12111:= - I *: N-F.'' , =', I I, *1- ' ,r4

liii lu I'Air it' =i - '-ZI '.t

1A120= I P ' '1's4I )

1' 1 4 NP I : - * i. - , ,' : ' *

1 1 7 i -.I I, : ' 1 =T F + ; 2 ,

101': 1' 1, ,r41 4 = CT,- I1 .= F. ',: tro , = + H

102 1,1= -F'N .r1 ' z .:..P'6

I 'I,-' 20= F; m ,nr P3 :, = 3 -. ,.106:'.-_- Vi F . r4Rt;, i=P', l 0.F '

1 02.4 n= FP ,'HF 1 F1,1 " " = .F' arP , =:F25

1 A ll,7iz I r-,- I-i t4PH 2" F zlF - F *i:::

1 I I *4;z= ,1 r1"H TI ii- -IL'~

] I I I I= fF- TU [

I :1 I-s.5=*EOF

SI

i

285

Appendix I. Computing the Prime Factor Algorithm (PFA)

This program computes the DFT defined by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\,\exp(+j2\pi nk/N), \qquad k = 0, 1, \ldots, N-1$$

where the sequence length N is a product of relatively prime factors from the set {2, 3, 4, 5, 7, 8, 9, 16}. This algorithm was proposed by Kolba and Parks in 1977 and was modified into the program presented here by Burrus and Eschenbacher in 1980.
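Why relatively prime factors matter can be illustrated with a short sketch. It is a generic two-factor Good-Thomas (prime factor) index mapping written in Python for illustration, not a translation of the Burrus-Eschenbacher program; the names `dft` and `pfa_two_factor` are hypothetical, and the default sign follows the exp(+j2πnk/N) convention printed in the equation above. Because the factors are coprime, the 1-D transform becomes a 2-D transform with no twiddle factors between stages, and the result matches the direct DFT.

```python
import cmath
import math


def dft(x, sign=+1):
    """Direct O(N^2) DFT: X(k) = sum_n x(n) * exp(sign * j*2*pi*n*k/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(sign * 2j * math.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]


def pfa_two_factor(x, N1, N2, sign=+1):
    """DFT of length N = N1*N2 (gcd(N1, N2) == 1) via the Good-Thomas index maps.

    No twiddle factors are needed between the two stages because N1 and N2
    are relatively prime.
    """
    N = N1 * N2
    if len(x) != N or math.gcd(N1, N2) != 1:
        raise ValueError("need len(x) == N1*N2 with N1, N2 relatively prime")
    # Input map: n = (N2*n1 + N1*n2) mod N turns x into an N1-by-N2 array.
    grid = [[x[(N2 * n1 + N1 * n2) % N] for n2 in range(N2)] for n1 in range(N1)]
    # Length-N2 DFTs along each row.
    rows = [dft(row, sign) for row in grid]
    # Length-N1 DFTs down each column; cols[k2][k1] holds the 2-D result.
    cols = [dft([rows[n1][k2] for n1 in range(N1)], sign) for k2 in range(N2)]
    # Output (Chinese Remainder Theorem) map: k = (K3*k1 + K4*k2) mod N.
    K3 = N2 * pow(N2, -1, N1)  # pow(a, -1, m) needs Python 3.8+
    K4 = N1 * pow(N1, -1, N2)
    X = [0j] * N
    for k1 in range(N1):
        for k2 in range(N2):
            X[(K3 * k1 + K4 * k2) % N] = cols[k2][k1]
    return X


if __name__ == "__main__":
    x = [complex(i, (i * i) % 7) for i in range(15)]
    err = max(abs(a - b) for a, b in zip(dft(x), pfa_two_factor(x, 3, 5)))
    print(f"max |direct - PFA| for N = 15: {err:.2e}")
```

The full PFA nests this idea over up to four factors; the two-factor case is shown only because it is the smallest instance that exhibits the index maps.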

Arguments. The PFA is called using the following arguments.

N = The transform length, which must factor into mutually prime factors from the set {2, 3, 4, 5, 7, 8, 9, 16}. A list of acceptable sequence lengths is given in Table 3.11-a,b.

X and Y = The real and imaginary data arrays containing the sequence to be transformed. These arrays are dimensioned to length N.

NI = The array containing the factors of N. If all four factors are not used, the unused factors are set equal to 1. For example, with N = 30 we have NI(1) = 5, NI(2) = 3, NI(3) = 2, and NI(4) = 1. The unity factors must follow the M nonunity factors.

M = The number of nonunity factors. For N = 30, M = 3.


UNSC = An output indexing constant which must be precomputed: UNSC = N/(NI(1) + ... + NI(M)).

A and B = Data arrays of length N which contain the results of the DFT. The real part is in A and the imaginary part is in B.

Usage. To compute the forward single-variate DFT:

(1) Dimension X, Y, A, and B to length N.

(2) Define N, M, and NI(4).

(3) Compute UNSC.

(4) Input the sequence to be transformed in X and Y.

(5) Call PFA (X,Y,A,B,N,M,NI,UNSC).

(6) The Fourier transform results are located in A and B.
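Step (2) of the usage list, defining N, M, and NI(4), can be sketched with a hypothetical Python helper `pfa_factors`. It splits N into its prime powers, checks each against the allowed factor set {2, 3, 4, 5, 7, 8, 9, 16} given at the start of this appendix, and pads NI with ones. Listing the factors largest first mirrors the N = 30 example (NI = 5, 3, 2, 1); whether the original program requires that order is an assumption. UNSC is deliberately not computed here.

```python
ALLOWED = {2, 3, 4, 5, 7, 8, 9, 16}  # factor set stated for the PFA


def pfa_factors(N):
    """Return (NI, M): NI is a 4-element factor list padded with 1s,
    M is the number of nonunity factors.  Raises ValueError if N is not
    a product of mutually prime factors from ALLOWED."""
    factors = []
    rest = N
    for p in (2, 3, 5, 7):
        pe = 1  # largest power of p dividing N
        while rest % p == 0:
            rest //= p
            pe *= p
        if pe > 1:
            if pe not in ALLOWED:
                raise ValueError(f"prime power {pe} of N={N} is not an allowed factor")
            factors.append(pe)
    if rest != 1:
        raise ValueError(f"N={N} has a prime factor outside {{2, 3, 5, 7}}")
    factors.sort(reverse=True)  # assumed ordering, as in the N = 30 example
    M = len(factors)
    NI = factors + [1] * (4 - M)  # unity factors come after the M nonunity ones
    return NI, M


if __name__ == "__main__":
    print(pfa_factors(30))    # ([5, 3, 2, 1], 3)
    print(pfa_factors(5040))  # ([16, 9, 7, 5], 4)
```

Since the allowed prime powers are 16, 9, 7, and 5, the largest length such a factorization admits is 16 × 9 × 7 × 5 = 5040.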

(PFA program listing, pages 287-297; illegible in the source scan.)

Appendix J. Timing Tests on the CDC Cyber 74

The timing tests on the CDC Cyber 74 used the FORTRAN routine SECOND(CP), which, according to the FORTRAN IV reference manual, returns time accurate to "two decimal places", i.e., 0.01 seconds. The results of timing the various DFT algorithms showed that this clock was accurate to three decimal places (0.001 seconds), giving a time resolution of 0.002 seconds. Using three decimal places was justified since almost every standard deviation was less than or equal to 0.002 seconds.
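The measurement approach used in this appendix (time many repetitions of a loop with a CPU-time clock, then report the mean and standard deviation per iteration) can be sketched in Python. Here `time.process_time()` plays the role of SECOND(CP); the function name, the iteration and trial counts, and the use of a lambda as the loop body are illustrative assumptions, and timings from a modern interpreter say nothing about the Cyber 74 itself.

```python
import statistics
import time


def time_per_iteration(body, iterations=100_000, trials=5):
    """Mean and standard deviation (seconds) of CPU time per loop iteration.

    time.process_time() returns the CPU time of the current process,
    analogous to SECOND(CP) in the FORTRAN tests.  trials must be >= 2
    for the standard deviation to be defined.
    """
    samples = []
    for _ in range(trials):
        start = time.process_time()
        for _ in range(iterations):
            body()
        elapsed = time.process_time() - start
        samples.append(elapsed / iterations)
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    a, b = 1.5, 2.5
    mean_empty, sd_empty = time_per_iteration(lambda: None)
    mean_add, sd_add = time_per_iteration(lambda: a + b)
    print(f"empty loop: {mean_empty * 1e6:.3f} us (sd {sd_empty * 1e6:.3f} us)")
    print(f"add loop:   {mean_add * 1e6:.3f} us, net {(mean_add - mean_empty) * 1e6:.3f} us")
```

Subtracting the empty-loop time isolates the cost of the operation itself, which is the same comparison the appendix makes with its no-operation, add, and multiply loops.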

To verify the premise that the number of real operations performed in a DFT is the primary factor determining the execution speed of the algorithm on a computer, the DFT execution times were measured on the CDC Cyber 74. The execution speeds of the WFTA, the PFA, and the mixed/fixed-radix FFTs were compared to the "predicted" execution speed of each algorithm. To perform these comparisons, the multiply and add speeds of the Cyber 74 were determined.

The execution times of the floating-point multiply and add instructions are given in the CDC 6000 Series Computer Systems Reference Manual. The execution times for several instructions are listed below and include preparing the next instruction for execution:


Instruction        Assembly Language   Minor Cycles
Floating sum       FX.                 4
Floating product   FX.                 10
Normalize result   NX.                 4
Fetch/store        SA.                 3

where one minor cycle equals 0.1 microsecond (μs). Simply using an add time of 4 × 0.1 μs = 0.4 μs and a multiply time of 10 × 0.1 μs = 1 μs is not sufficient, because the operands must also be fetched and stored, which adds more time. To determine the instructions the computer actually executes for adds and multiplies, the assembly (COMPASS) language code was studied and timed for three cases. First, a DO loop with no operations was executed 100,000 times:

DO 102 J = 1,N

102 CONTINUE

The associated COMPASS language code was listed as an

output of the program:

(AA BSS 0B
SB0 B2 + 7B
SA5 J
SA4 N
SX7 X5 + 1B
IX0 X4 - X7
SA7 A5
PL X5, (AA


This loop required an average of 2.70 μs (standard deviation 0.03 μs) to execute. Next, the addition instruction was executed 100,000 times using the FORTRAN code:

DO 102 J = 1,N
102 TAD = A + B

The associated COMPASS code for the addition loop is:

(AA BSS 0B
SB0 B2 + 7B
SA5 A
SA4 B
SA3 J
SA2 N
FX0 X4 + X5
NX7 B0, X0
SX6 X3 + 1B
IX5 X2 - X6
SA6 A3
SA7 TAD
PL X5, (AA

This add loop required an average of 3.34 μs (standard deviation 0.03 μs) to execute. Notice the "extra" instructions of the add loop relative to the no-operation loop:


Command        Minor Cycles
SA5 A          3
SA4 B          3
FX0 X4 + X5    4
NX7 B0, X0     4
SA7 TAD        3
Total          17

Finally, the multiply loop was executed 100,000 times. The FORTRAN code is:

DO 102 J = 1, N

102 TAD = A*B

and the corresponding COMPASS code loop is:

)AA BSS 0B
SB0 B2 + 7B
SA5 B
SA4 A
SA3 J
SA2 N
FX7 X4*X5
SX6 X3 + 1B
IX0 X2 - X6
SA6 A3
SA7 TAD
PL X5, )AA

The multiply loop averaged 3.37 μs (standard deviation 0.03 μs) to execute. The extra instructions required by the multiply loop relative to the no-operation loop are:


Command        Minor Cycles
SA5 B          3
SA4 A          3
FX7 X4 * X5    10
SA7 TAD        3
Total          19

Comparing the measured execution times of the three loops shows that the add loop is 0.64 μs longer, and the multiply loop 0.67 μs longer, than the no-operation loop. Based on the minor-cycle times of the extra add and multiply instructions, the add loop should be 17 × 0.1 μs = 1.7 μs longer and the multiply loop should be 19 × 0.1 μs = 1.9 μs longer than the "no-operation" loop. (Notice that every floating-point addition must be "normalized" by the instruction NX7, which requires 4 minor cycles. The floating-point product does not require normalization.)
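The cycle bookkeeping above can be checked with a few lines of Python. The instruction lists and cycle counts are copied from the two "extra instruction" tables (the floating add FX0 is counted at 4 cycles, which is what the stated 17-cycle total requires), and the measured loop averages are the 2.70, 3.34, and 3.37 μs figures reported for the three loops. The constant and function names are illustrative only.

```python
MINOR_CYCLE_US = 0.1  # one minor cycle, in microseconds

# Extra instructions (minor cycles) relative to the empty DO loop
ADD_EXTRA = {"SA5 A": 3, "SA4 B": 3, "FX0 X4+X5": 4, "NX7 B0,X0": 4, "SA7 TAD": 3}
MUL_EXTRA = {"SA5 B": 3, "SA4 A": 3, "FX7 X4*X5": 10, "SA7 TAD": 3}

# Measured per-iteration loop averages, microseconds
EMPTY_US, ADD_US, MUL_US = 2.70, 3.34, 3.37


def predicted_extra_us(extra):
    """Predicted extra time per iteration from the minor-cycle table."""
    return sum(extra.values()) * MINOR_CYCLE_US


def measured_extra_us(loop_us):
    """Measured extra time per iteration relative to the empty loop."""
    return loop_us - EMPTY_US


if __name__ == "__main__":
    for name, extra, loop_us in (("add", ADD_EXTRA, ADD_US),
                                 ("multiply", MUL_EXTRA, MUL_US)):
        print(f"{name}: {sum(extra.values())} cycles, "
              f"predicted {predicted_extra_us(extra):.2f} us, "
              f"measured {measured_extra_us(loop_us):.2f} us")
    # add: 17 cycles, predicted 1.70 us, measured 0.64 us
    # multiply: 19 cycles, predicted 1.90 us, measured 0.67 us
```

The gap between the predicted and measured columns is the discrepancy the following paragraph attributes to the instruction stack.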

The difference between the measured add and multiply times (0.64 μs and 0.67 μs) and the predicted add and multiply times (1.7 μs and 1.9 μs) is a result of the very short loops fitting inside the Cyber's "instruction/execution stack", which is a 12-word stack with 60 bits per word. Since the entire loop fit in the stack, the instructions were fetched only once instead of 100,000 times, whereas "all execution times (minor cycles) listed include readying the next instruction for execution". During normal DFT algorithm execution all of the instructions must be fetched, which means the add time is 1.7 μs and the multiply time is 1.9 μs. These numbers were then used to predict the execution speed of the DFT algorithms.
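The prediction rule amounts to weighting an algorithm's real operation counts by these effective times. A minimal Python sketch follows; the function name and the operation counts in the example are hypothetical, not taken from the thesis tables, and other overheads such as indexing and loop control are ignored, consistent with the premise that operation counts dominate.

```python
T_ADD_US = 1.7  # effective real add time on the Cyber 74, microseconds
T_MUL_US = 1.9  # effective real multiply time on the Cyber 74, microseconds


def predicted_time_us(real_adds, real_mults, t_add=T_ADD_US, t_mul=T_MUL_US):
    """Predicted DFT execution time (microseconds) from real operation counts."""
    return real_adds * t_add + real_mults * t_mul


if __name__ == "__main__":
    # Hypothetical operation counts, for illustration only
    t = predicted_time_us(real_adds=1000, real_mults=500)
    print(f"predicted time: {t:.0f} us")  # predicted time: 2650 us
```

Because the multiply time is only about 12 percent larger than the add time on this machine, an algorithm that trades multiplies for extra adds gains little here.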


Vita

John David Blanken was born on 18 April 1953 in Junction City, Kansas. He graduated from Junction City High School in 1971 and attended Kansas State University, from which he received a Bachelor of Science in Electrical Engineering in May 1975. Upon graduation, he was designated an AFROTC distinguished graduate and received a commission in the United States Air Force. He entered active duty on 15 July 1975 and was assigned to the Air Force Avionics Laboratory at Wright-Patterson Air Force Base, Ohio. On June 6, 1979 he was assigned to the School of Engineering, Air Force Institute of Technology.

Permanent Address: 716 West 8th Street, Junction City, Kansas 66441
