AD-A100 782 AIR FORCE INST OF TECH WRIGHT-PATTERSON AFB OH SCHOO--ETC F/S 19/1
EFFICIENT COMPUTER IMPLEMENTATIONS OF FAST FOURIER TRANSFORMS (U)
DEC 80 J D BLANKEN
UNCLASSIFIED AFIT/GE/EE/80D-9 NL
DISCLAIMER NOTICE
THIS DOCUMENT IS BEST QUALITY PRACTICABLE. THE COPY FURNISHED TO DTIC CONTAINED A SIGNIFICANT NUMBER OF PAGES WHICH DO NOT REPRODUCE LEGIBLY.
EFFICIENT COMPUTER IMPLEMENTATIONS OF
FAST FOURIER TRANSFORMS
THESIS
AFIT/GE/EE/80D-9 John D. Blanken
Captain USAF
JUL 1 1'331
Approved for public release; distribution unlimited
AFIT/GE/EE/80D-9
EFFICIENT COMPUTER IMPLEMENTATIONS
OF FAST FOURIER TRANSFORMS,
THESIS
Presented to the Faculty of the School of Engineering
of the Air Force Institute of Technology
Air University
in Partial Fulfillment of the
Requirements for the Degree of
Master of Science
John D. Blanken, B.S.E.E.
Capt. USAF
Graduate Electrical Engineering
Approved for public release; distribution unlimited
Acknowledgments
Dr. John Hines and Mr. Harold Noffke of the Air Force
Wright Aeronautical Laboratory proposed this topic. I am
indebted to them for both the topic and their support.
Gratitude is due Drs. Burrus and Parks of Rice
University for many helpful discussions and ideas through-
out this effort. They graciously provided the Prime Factor
Algorithm studied and tested in this paper.
A special thanks is extended to Mr. Jim Thompson
of Control Data Corporation and Dr. Poirier of Aeronautical
Systems Division for their invaluable assistance in quanti-
fying the multiply and add speed on the Cyber 74. A debt
of gratitude is owed to my thesis readers Capt. Larry Kizer
and Dr. Kabrisky. Their willingness to let me work inde-
pendently was greatly appreciated.
My very deepest appreciation is extended to my faculty
advisor Dr. Pedro Rustan for his patience, confidence, and
skilled guidance. Most of all I appreciate his enthusiasm
and hard work which motivated me to continue my efforts.
Finally, I am indebted the most to my wife, Linda, for
her encouragement, typing the rough draft, and allowing me
to sacrifice our family life and complete my work at AFIT.
John D. Blanken
This thesis was typed by Niki Maxwell.
Contents
Page
Acknowledgments ...... ii
List of Figures ...... vii
List of Tables ...... x
Glossary of Terms ...... xi
Abstract ...... xii
I. Introduction ...... 1
1.1 Background ...... 1
1.2 Problem ...... 2
1.3 Scope ...... 2
1.4 Assumptions ...... 3
1.5 Approach and Presentation ...... 5
II. Literature Review ...... 7
III. FFT Theory ...... 12
3.1 Computing Trigonometric Function Values ...... 13
3.2 Fixed Radix Algorithms ...... 14
3.2.1 Development of Radix-2 Theory ...... 14
3.2.2 Development of Radix-3 Theory ...... 26
3.2.3 Radix-5 Theory ...... 40
3.2.4 Digit Reversal Algorithm ...... 41
3.2.5 Development of a Radix-3 FFT Based on the Cube Root of Unity ...... 45
3.2.6 Summary ...... 51
3.3 Real Operations Count for Fixed Radix FFTs ...... 51
3.3.1 Number of Butterflies in Fixed Radix-p FFTs ...... 52
3.3.2 Number of Twiddle Factors in Fixed Radix-p FFTs ...... 53
3.3.3 Number of Trigonometric Functions Required for the Fixed Radix Algorithms ...... 53
3.3.4 Number of Real Operations in Radix-p FFTs ...... 56
3.3.5 Real Operations Count for the Radix-3 FFT Using the Complex Cube Root of Unity ...... 58
3.3.6 Memory Requirements for Fixed Radix FFTs ...... 61
3.4 Mixed Radix FFT Algorithms ...... 63
3.4.1 Mixed Radix Theory ...... 65
3.4.2 Digit Reversal Algorithm (General) ...... 70
3.4.3 Twiddle Factors ...... 72
3.4.4 Real Operations Count for Computing Sine and Cosine Difference Equation ...... 73
3.4.5 Real Operations Count for Mixed FFTs ...... 78
3.4.6 Memory Requirements for Mixed Radix FFTs ...... 99
3.5 Fourier Transforms Using Fast Convolution Algorithms ...... 109
3.5.1 Converting a DFT to Circular Convolution ...... 110
3.5.2 Reordering the Data Arrays ...... 112
3.5.3 The Winograd Fourier Transform ...... 113
3.5.4 The Prime Factor Algorithm Theory ...... 118
3.5.5 Real Operations for WFTA ...... 122
3.5.6 Memory Requirements for WFTA ...... 127
3.5.7 Real Operations for the PFA ...... 134
3.5.8 Memory Requirements for PFA ...... 137
3.5.9 Summary ...... 144
IV. Comparison Results of Efficient Discrete Fourier Transforms ...... 145
4.1 Introduction ...... 145
4.2 Conventional Radix-3 vs R(u) Field Radix-3 ...... 147
4.3 Fixed Radix vs Mixed Radix FFTs ...... 147
4.4 Mixed Radix FFT Comparison: IMSL vs Singleton ...... 150
4.5 Conventional vs Fast Convolution Mixed Radix FFTs ...... 158
4.5.1 Real Operations Count ...... 159
4.5.2 Memory ...... 171
4.5.3 WFTA vs PFA Operations Count ...... 173
4.6 Flexibility of the DFT Algorithms ...... 174
4.7 An Algorithm to Select the Most Efficient DFT Technique ...... 178
4.7.1 Arguments ...... 178
4.7.2 Usage ...... 179
V. Conclusions ...... 185
5.1 Results and Conclusions ...... 185
5.2 Recommendations ...... 188
Bibliography ...... 190
Appendix A: Radix-2 FFT Algorithm ...... 193
Appendix B: Radix-3 FFT Algorithm ...... 195
Appendix C: Radix-3 FFT in R(u) ...... 201
Appendix D: Radix-5 FFT Algorithm ...... 208
Appendix E: Mixed Radix FFT Algorithm ...... 222
Appendix F: Singleton's Mixed Radix FFT ...... 242
Appendix G: IMSL Mixed Radix FFT ...... 258
Appendix H: An Algorithm for Computing the WFTA ...... 264
Appendix I: Computing the Prime Factor Algorithm (PFA) ...... 286
Appendix J: Timing Tests on the CDC Cyber 74 ...... 298
List of Figures
Figure Page
3.1 Flowgraph of the Decimation-In-Time Decomposition of an N-Point DFT Computation into Two N/2-Point DFT Computations (N=8) ...... 16
3.2 Flowgraph of the Decimation-In-Time Decomposition of an N/2-Point DFT Computation into Two N/4-Point DFT Computations (N=8) ...... 19
3.3 Result of Substituting Figure 3.2 into Figure 3.1 ...... 20
3.4 Flowgraph of Two-Point DFT ...... 22
3.5 Flowgraph of Complete Decimation-In-Time Decomposition of an 8-Point DFT ...... 23
3.6 Flowgraph of Basic Butterfly Composition ...... 25
3.7 Flowgraph of Simplified Butterfly Composition ...... 27
3.8 Flowgraph of 8-Point DFT Using the "Twiddle Factor" Butterfly of Figure 3.7 ...... 28
3.9 Butterfly Flowgraph for First Stage Decimation (N=9) ...... 31
3.10 Butterfly Flowgraph for F(k) ...... 33
3.11 Complete Butterfly Flowgraph (N=9) ...... 34
3.12 General Radix-3 Butterfly Flowgraph ...... 38
3.13 Basic Twiddle Factor Radix-3 Butterfly ...... 39
3.14 Radix-5 Twiddle Factor Butterfly ...... 42
3.15 Digit Reversed Input and Output Arrays ...... 44
3.16 Radix-3 Butterfly in R(u) Arithmetic ...... 49
3.17 First Decomposition N=30 ...... 67
3.18 Butterfly Flowgraph for N=30 ...... 68
3.19 Radix-4 Butterfly Flowgraph Showing the 4 Twiddle Factor Multipliers ...... 75
3.20 Radix-2 Section of Singleton's FFT ...... 84
3.21 Radix-3 Section of Singleton's FFT ...... 85
3.22 Radix-4 Section of Singleton's FFT ...... 87
3.23 Radix-5 Section of Singleton's FFT ...... 89
3.24 General Factor Section of Singleton's FFT ...... 91
3.25 Multiplications vs N for Singleton's FFT (N<200) ...... 97
3.26 Additions vs N for Singleton's FFT (N<200) ...... 98
3.27 Multiplications vs N for Multiples of 2, 3, 4, and 5 ...... 100
3.28 Additions vs N for Multiples of 2, 3, 4 and 5 ...... 101
3.29 Memory Array vs N (<200) for Singleton's FFT ...... 107
3.30 Memory Array vs N (<200) for IMSL's FFT ...... 108
3.31 Flow Control in WFTA Program ...... 124
3.32 Real Multiplications for WFTA ...... 131
3.33 Real Additions for WFTA ...... 132
3.34 Memory Comparison Between Modified and Original WFTA ...... 133
3.35 Real Multiplications for the PFA ...... 140
3.36 Real Additions for the PFA ...... 141
3.37 Percentage Savings of Multiplications by Using Shifts in PFA ...... 142
3.38 Memory Array Required by PFA ...... 143
4.1 Fourier Transform of e^(-t) cos 50πt ...... 146
4.2 Memory Array Saved Using Singleton's Instead of IMSL's FFT ...... 155
4.3 Real Multiplication Comparison for PFA, WFTA, and MFFT (N<500) ...... 160
4.4 Real Addition Comparison for PFA, WFTA, and MFFT (<500) ...... 162
4.5 Predicted Times of Execution as a Percentage of Measured Time for the MFFT, WFTA, and PFA ...... 170
4.6 Memory Arrays Required by MFFT, WFTA, and PFA ...... 172
4.7 Relative Efficiencies of MFFT, WFTA, and PFA ...... 175
4.8 Flowchart to Select Most Efficient Algorithm ...... 180
List of Tables
Table Page
3.1 Real Operations Count for Radix-2, 3 and 5 ...... 59
3.2 Comparison Between Complex and R(u) Radix-3 FFT For Real Operations ...... 62
3.3 Fixed Radix Memory Required ...... 64
3.4 Results of Counters in Singleton's FFT ...... 79
3.5 Operations Executed for Each Counter ...... 81
3.6 Small-N Operations Count for WFTA ...... 119
3.7 PFA Small-N DFT Operations Count ...... 121
3.8 McClellan and Nawab's WFTA Real Operations for the Small-N Algorithms ...... 128
3.9 Real Operations and Memory for McClellan and Nawab WFTA ...... 129
3.10 PFA Small-N DFT Operations Count for No Shifts ...... 136
3.11 PFA Real Operations and Memory Count for N<72 ...... 138
4.1 Radix-3 Timing Comparison ...... 148
4.2 Fixed Radix (FR) vs Mixed Radix (MR) FFTs ...... 149
4.3 Program Memory Required by FFTs ...... 151
4.4 Timing Results for IMSL and Singleton FFTs ...... 154
4.5 Operations and Memory Array Comparison for MFFT, WFTA, and PFA ...... 164
4.6 Timing Results from the WFTA Subroutines ...... 167
4.7 Measured and Predicted Timing Results for MFFT, WFTA, and PFA ...... 168
5.1 Comparison of DFT Algorithms ...... 189
Glossary of Terms
1. Butterfly: The DFT computation of Figure 3.4 pro-
vides the notation whose appearance is that of a
"butterfly".
2. Fixed Radix: The term "radix" is commonly used to
describe a specific FFT decomposition. The term
"fixed" radix means that all the factors of N are
the same.
3. Mixed Radix: All the factors of N are not identical.
4. Relatively Prime: The numbers in a given set are said
to be relatively prime when no two numbers in the set
share a common factor greater than one. For example,
the set (2, 3, 7, 9) is not relatively prime because
3 and 9 share the factor 3. The following set is
relatively prime: (2, 3, 5, 7).
5. Square and Square-free Factors: For the case where
N = 4 · 3 · 7 · 4, the "4s" are square factors and
the 3 and 7 are square-free.
6. Twiddle Factors: The term refers to the complex
multipliers of Figure 3.8 which pre-multiply the FFT
butterflies. They are sometimes called phase or
rotation factors.
Abstract
A comprehensive comparison of the most efficient
Discrete Fourier Transform (DFT) techniques is presented.
The DFT algorithms selected are the fixed radix Fast
Fourier Transform (FFT), mixed radix FFT, the Winograd
Fourier Transform Algorithm (WFTA), and the Prime Factor
Algorithm (PFA). Comparison of the algorithms is based
on the number of real multiplications, additions, and
memory arrays required as a function of sequence length N.
This paper reviews the literature, selects the most
efficient DFT FORTRAN programs available, develops the
number of real multiplications and additions as a function
of N, and compares the algorithms using tables and plots of
real multiplications, additions, and memory arrays. This
comparison shows that the WFTA and PFA require the fewest
real multiplications and additions, but the fixed radix
and mixed radix FFTs require the least memory. The mixed
radix FFT is much more flexible than WFTA or PFA since N
can be any length sequence. The WFTA and PFA are closely
studied and tradeoffs between the two are discussed. The
PFA uses fewer additions but more multiplications for most
sequence lengths, which means the WFTA is more efficient
when multiplications are "costly" relative to additions.
The PFA uses less memory than the WFTA, making the PFA
preferable when the machine memory is limited. Based on
the results of the paper, an algorithm is presented to select
the most efficient DFT for an N length sequence given the
multiply speed, add speed, and memory size of the computer.
I. Introduction
1.1 Background
Computing the Discrete Fourier Transform (DFT) of N
points has many applications in scientific and engineering
calculations. In 1965 Cooley and Tukey described an
algorithm which became known as the Fast Fourier Transform
(FFT) because it reduced the number of complex operations
required to compute the DFT from $N^2$ to $N \log_2 N$ where
$N = 2^m$, m an integer. Using ideas proposed in the Cooley-Tukey
paper, a mixed radix algorithm was written and published
in 1969 by Singleton which permitted N to be any
positive integer length sequence.
In 1976 Winograd proposed a mixed radix DFT algorithm
which (1) converted the DFT to circular convolution,
(2) used fast convolution algorithms to perform "short-
DFTs", and (3) nested these short-DFTs into a structure to
perform long Fourier transforms on complex data sequences.
This algorithm became known as the Winograd Fourier Transform
Algorithm (WFTA). The WFTA maintained the real
additions count at the FFT levels while significantly
reducing the real multiplications required.
Kolba and Parks, 1977, used Winograd's fast convolution
algorithms and proposed a new Prime Factor Algorithm
(PFA). This new algorithm modified the short-DFTs to use
"shifts" instead of multiplication by 1/2 and did not use
the nested structure of WFTA. As a consequence the PFA
uses more real multiplications and fewer additions relative
to the WFTA for a given length sequence N.
1.2 Problem
Both Winograd, 1976, and Kolba-Parks, 1977, compared
their operations count to that of the FFT but did not
include all possible WFTA and PFA sequence lengths. Further,
no comparisons were made on the basis of memory arrays
required by each algorithm as a function of N. This paper
presents a comprehensive comparison of fixed radix FFTs,
mixed radix FFTs, WFTA, and PFA based on real operations
and memory arrays. This comparison provides the informa-
tion needed to select the most efficient algorithm to
perform the DFT based on machine size, machine speed,
and real operations.
1.3 Scope
This paper reviews the literature, selects DFT
algorithms for comparison, studies the theory of each
algorithm selected, develops the real operation and
memory count as a function of N, compares these algorithms
using tables and plots of operation and memory counts,
and presents an algorithm to select the most efficient
techniques.
The DFT algorithms selected for study and comparison are:
(1) Radix-2 FFT
(2) Radix-3 FFT
(3) Radix-3 FFT in the R(u) field
(4) Radix-5 FFT
(5) Mixed radix FFT written by the author
(6) Mixed radix FFT written by Singleton
(7) Mixed radix FFT available from International
Mathematical Subroutine Library (IMSL) on the CDC Cyber 74
(8) WFTA
(9) PFA.
Each of these algorithms has a particular advantage which
makes selection of the best algorithm dependent on the
machine size, machine speed, and sequence length.
1.4 Assumptions
To a first approximation, the speed of an FFT
algorithm is proportional to the number of complex
multiplications used. The number of times the data array
is indexed is, however, an important secondary factor
(Singleton, 1969). Kolba and Parks, 1977, substantiated
this assumption by timing the PFA and FFTs on an IBM
370/155 for several sequence lengths and showing that the
FORTRAN coded PFA (having fewer real additions and multiplications)
was faster than the FFT FORTRAN algorithms.
In 1978 Morris demonstrated that the sequence of
arithmetic operations in a DFT algorithm's internal
structure can result in different execution times "between
ostensibly equivalent algorithms on a given machine"
and that the computer dependent algorithm/architecture
interactions may also alter relative performance of the
different algorithms. He modified the FORTRAN coded
radix-4 FFT and WFTA programs and matched them to the
PDP 11/55 and IBM 370/168 architecture and showed that
the WFTA offered neither time nor space advantages over the
radix-4 FFT. Morris achieved these results because "the
radix-4 FFT appears almost ideally matched to the PDP-11
architecture" whereas the WFTA "has extra load/store
burdens" and requires extra data array indexing.
Morris demonstrated that it may be possible to
optimize DFT algorithms to match a certain machine. However,
this type of optimization of the FORTRAN DFT algorithms
is outside the scope of this paper. It is assumed
that existing FORTRAN coded DFT algorithms will not be
modified and that selecting an algorithm which minimizes real
operations produces the most efficient algorithm.
This paper derives and tabulates real operations
counts as a function of N for the algorithms listed in
Section 1.3. The most efficient DFT algorithms are timed
on the CDC Cyber 74 computer and compared to the predicted
execution time based on real operations. These predicted
times are shown to be consistent with the timing results.
1.5 Approach and Presentation
A literature review is presented in Chapter II which
starts with the 1965 Cooley-Tukey paper and follows the
various DFT algorithm developments up through Kolba-Parks'
1977 article. The review puts Rader's 1968 landmark paper
in perspective with Winograd's "nested" DFT algorithm and
the subsequent work by Kolba and Parks.
Next, the theory behind the DFT algorithms is reviewed,
the real operations count developed, and the memory array
count needed for a sequence length N is determined. The
general expressions for real operations and memory array
counts are developed from published articles or from the
background theory and then plotted and tabulated as a
function of N. The readers familiar with the FFT and
Winograd background theory may wish to skip Sections 3.1
and 3.2.
In Chapter IV comparison tables and plots of the
DFT algorithms make it possible to select the most
efficient algorithm based on real operations and memory
array required. Timing results from the CDC Cyber 74
system for representative sequence lengths are tabulated
to substantiate the assumption that minimizing real
operations equates to maximizing efficiency. An algorithm
is also presented at the end of Chapter IV which uses the
tables in this paper to select the most efficient DFT
technique given the sequence length, memory size, and
computer add and multiply speed.
Conclusions and recommendations are presented in
Chapter V.
II. Literature Review
The calculation of the Discrete Fourier Transform (DFT)
is a central operation performed in digital signal processing
but was not widely used for other than trivial sequence
lengths because of the cumbersome DFT evaluation:
$$X(k) = \sum_{n=0}^{N-1} x(n)\exp(-j2\pi nk/N) \qquad (2.1)$$
which required on the order of $N^2$ complex operations.
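The $N^2$ growth of Eq (2.1) can be made concrete with a short sketch. The Python below is an illustration only (the thesis programs are in FORTRAN); the function name `direct_dft` and the operation counter are assumptions introduced here, not part of any program studied in this paper.

```python
import cmath

def direct_dft(x):
    """Evaluate Eq (2.1) term by term.

    Returns the transform and the number of complex multiply-add
    steps performed, which is N*N for an N-point input.
    """
    n_points = len(x)
    ops = 0
    result = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            # One complex multiplication and one complex addition per term.
            acc += x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            ops += 1
        result.append(acc)
    return result, ops

# A unit impulse transforms to a constant spectrum of ones.
X, ops = direct_dft([1, 0, 0, 0, 0, 0, 0, 0])
```

For the eight-point impulse above, the counter reaches 64, i.e. $N^2$ for N = 8.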
In 1965 Cooley and Tukey published "An Algorithm for
the Machine Calculation of Complex Fourier Series" which
stimulated the widespread use of an algorithm which became
known as the "Fast Fourier Transform" (FFT). Their paper
proposed an efficient method of computing the DFT by factoring
an N length sequence into its prime components:
$$N = n_1 n_2 \cdots n_m \qquad (2.2)$$
and then decomposing Eq (2.1) into m steps with $N/n_i$ transformations
within each step. If $n_1 = n_2 = \cdots = n_m = 2$, the
operations are reduced to the $N \log_2 N$ level from the
previous $N^2$ level.
Most of the early work on the FFT (Bergland, 1968) was
directed toward the special cases where $N = 2^m$ which yielded
simple and efficient algorithms. These algorithms are
efficient because no multiplications are needed to evaluate
the 2-point DFT butterflies which can reduce the operations
count below the $N \log_2 N$ level.
Other "fixed radix" algorithms were studied and Dubois
and Venetsanopoulos published "A New Radix-3 Algorithm" in
1978 which demonstrated that a radix-3 butterfly could be
computed without multiplications by defining a new basis
(1,u) instead of using the complex plane (1,i) basis, where
u is the complex cube root of unity. This technique was
later shown to be limited to the special cases of $3^m$ and $6^m$
(Burrus and Parks, 1979).
Based on Cooley and Tukey's paper "mixed-radix"
algorithms were written by Brenner and Singleton. The
most efficient and popular of these algorithms was "An
Algorithm For Computing the Mixed Radix Fast Fourier Trans-
form" published in 1969 by Singleton and is frequently used
in digital signal processing where a wider choice of N is
needed. The Singleton algorithm can perform the DFT of any
length sequence N using FFT techniques but becomes most
efficient when N is highly composite from the set of integers
2, 3, 4, and 5. If N is a prime number the algorithm
performs a DFT using $N^2$ operations. The Singleton algorithm
became the standard against which all future DFT techniques
were measured.
In 1968 Rader presented "DFTs when the Number of Data
Samples Is Prime" which showed that a prime number length
sequence contains an (N-1) point circular convolution. He
showed how to isolate the convolution by applying a permutation
to the (N-1) signal points x(1), x(2), ..., x(N-1).
He also gave the permutation applied to the complex
multipliers from the set [$\exp(-j2\pi nk/N)$, k = 1, 2, ..., N-1].
Both of the permutations were generated by using a "primitive"
root which exists for N length prime sequences
(McClellan and Rader, 1979). Rader's paper was largely
overlooked for many years but took on new significance when
Winograd presented his new DFT algorithm "On Computing the
Discrete Fourier Transform" in 1976.
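The permutation idea can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not Rader's or Winograd's published code: the names `primitive_root` and `rader_dft` are hypothetical, the length is assumed to be an odd prime, and the (N-1) point circular convolution is evaluated directly rather than by a fast convolution algorithm.

```python
import cmath

def primitive_root(p):
    """Smallest g whose powers modulo the odd prime p visit 1..p-1."""
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; p must be an odd prime")

def rader_dft(x):
    """DFT of prime length p written as a (p-1) point circular convolution."""
    p = len(x)
    g = primitive_root(p)
    g_inv = pow(g, p - 2, p)                 # inverse of g modulo p
    w = cmath.exp(-2j * cmath.pi / p)
    # Permuted signal points x(g^m) and permuted multipliers W^(g^-m).
    a = [x[pow(g, m, p)] for m in range(p - 1)]
    b = [w ** pow(g_inv, m, p) for m in range(p - 1)]
    X = [0j] * p
    X[0] = sum(x)                            # the k = 0 term needs no products
    for q in range(p - 1):
        conv = sum(a[m] * b[(q - m) % (p - 1)] for m in range(p - 1))
        X[pow(g_inv, q, p)] = x[0] + conv    # output index is g^-q mod p
    return X
```

The inner sum is exactly a circular convolution of the two permuted sequences, which is the structure that the fast convolution algorithms discussed next are designed to exploit.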
Winograd combined Rader's idea of converting a DFT to
circular convolution with his own fast convolution algo-
rithms to produce a new DFT method called the "Winograd
Fourier Transform Algorithm" (WFTA). Winograd provided the
fast convolution algorithms for short prime and prime power
length sequences and proposed that longer transforms be
computed by "nesting" the short, high-speed transforms. He
presented a table comparing the WFTA to the radix-2 FFT
operations and showed that the number of additions remained
at the FFT levels while the number of multiplications was
significantly reduced.
Kolba and Parks published "A Prime Factor FFT Algorithm
Using High Speed Convolution" in 1977 which modified
Winograd's fast convolution algorithms to permit "shifts"
instead of multiplications by 1/2. They also changed the
nested structure of the WFTA in favor of a conventional FFT
decomposition. The decomposition of the sequence was based
on an algorithm proposed by Thomas, 1963, in his article
"Using a Computer to Solve Problems in Physics" which uses
an index mapping based on the Chinese Remainder Theorem.
Kolba and Parks selected several N length sequences and
compared their operations count to WFTA and FFT.
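An index mapping of the kind based on the Chinese Remainder Theorem can be sketched for two relatively prime factors. The Python below is an illustrative assumption, not the Kolba-Parks program: the names `dft` and `prime_factor_dft` are hypothetical, and the short transforms are evaluated directly instead of by the fast convolution modules.

```python
import cmath
from math import gcd

def dft(x):
    """Direct DFT, used here for the short transforms."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def prime_factor_dft(x, n1, n2):
    """Length n1*n2 DFT (n1, n2 relatively prime) with no twiddle factors."""
    n = n1 * n2
    if len(x) != n or gcd(n1, n2) != 1:
        raise ValueError("need len(x) == n1*n2 with n1, n2 relatively prime")
    # Input map: index (n2*a + n1*b) mod N places x on an n1-by-n2 grid.
    grid = [[x[(n2 * a + n1 * b) % n] for b in range(n2)] for a in range(n1)]
    # Length-n2 DFT along each row, then length-n1 DFT down each column.
    rows = [dft(row) for row in grid]
    cols = [dft([rows[a][k2] for a in range(n1)]) for k2 in range(n2)]
    # Output map: k is the unique index with k = k1 mod n1 and k = k2 mod n2.
    return [cols[k % n2][k % n1] for k in range(n)]
```

Because the two maps make the exponent separate exactly into a length-n1 part and a length-n2 part, no multiplications are needed between the two stages, in contrast to the twiddle factors of the conventional mixed radix decomposition.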
Paralleling Winograd's fast convolution work are the
studies into number theoretic transforms (NTTs) which have
been proposed for digital cyclic convolution and digital
filtering. The NTTs were first published by Pollard, 1971,
in "The Fast Fourier Transform in the Finite Field". He
showed that an analogous transform to the DFT exists in the
finite (or Galois) field where $\exp(j2\pi nk/N)$ terms are
replaced by $r^{nk}$ in the DFT expression such that:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, r^{nk} \qquad (2.3)$$
Notice that Pollard chose the alternative definition of the
DFT where the exponent of e is positive. The r term is
defined in the Galois field (GF) such that the same cyclic
convolution properties exist in GF and in the complex field
for the DFT. He then proved that this analogous DFT could
apply prime factor decomposition to the N length sequence
and perform $N/n_i$ transformations to reduce the operations
in GF to the $N \log_2 N$ level which provided the FFT in GF.
Pollard proposed that this technique be applied to cyclic
convolutions in GF, multiplication of polynomials over
$GF(p^n)$, aperiodic convolution of integer sequences, multiplication
of very large integers, division of polynomials
over GF(p), and a chirp-Z-transform for NTTs (McClellan and
Rader, 1979).
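A small numerical sketch may help show how Eq (2.3) yields an exact cyclic convolution. The Python below is an illustration under assumptions chosen here, not taken from Pollard's paper: the modulus 257, length 8, and root 4 (an element of order 8 modulo 257) are example values, and the transform is evaluated directly rather than by a fast algorithm.

```python
P = 257   # prime modulus (example value chosen for this sketch)
N = 8     # transform length; N must divide P - 1
R = 4     # element of order N modulo P: 4**4 % 257 == 256, 4**8 % 257 == 1

def ntt(x, root=R):
    """Eq (2.3) evaluated directly in GF(P): X(k) = sum of x(n) * root^(n*k)."""
    return [sum(x[n] * pow(root, n * k, P) for n in range(N)) % P
            for k in range(N)]

def intt(X):
    """Inverse transform: use the inverse root and scale by N^-1 modulo P."""
    n_inv = pow(N, P - 2, P)
    r_inv = pow(R, P - 2, P)
    return [(n_inv * v) % P for v in ntt(X, r_inv)]

def cyclic_convolve(a, b):
    """Length-N cyclic convolution of integer sequences, exact modulo P."""
    A, B = ntt(a), ntt(b)
    return intt([(u * v) % P for u, v in zip(A, B)])

result = cyclic_convolve([1, 2, 3, 0, 0, 0, 0, 0], [4, 5, 0, 0, 0, 0, 0, 0])
# result == [4, 13, 22, 15, 0, 0, 0, 0]
```

All arithmetic is on integers, so there is no roundoff; the result is correct as long as the true convolution values are smaller than the modulus.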
Pollard's paper stimulated more study of the NTTs.
Reed and Truong's 1975 paper, "The Use of Finite Fields to
Compute Convolutions", includes complex valued NTTs. It
was shown that this NTT over $GF(q^2)$ can reduce convolution
operations to the FFT levels. If q is sufficiently large
the NTT can be used over $GF(q^2)$ to transform a sequence of
complex integers x(n) into X(k) on $GF(q^2)$ for which the
inverse transform of X(k) on $GF(q^2)$ is precisely the
original sequence x(n). Using these ideas filtering or
convolutions without roundoff errors can be obtained on a
sequence of complex integers.
Most applications of the NTTs have been in the areas
of digital filtering and convolution. The author was not
able to find any NTT algorithm which could be compared to
the FFT, WFTA, or PFA and perform all the same functions
as these three algorithms.
PFA, WFTA, and FFT represent the most efficient and
flexible FORTRAN programs available to perform the DFT.
Each algorithm has its own particular advantage over the
other two depending on machine size and speed for a particular
sequence length. None of the articles reviewed presents a
comprehensive evaluation or comparison of the three
algorithms based on real operations and memory arrays
required to perform a DFT for any sequence length N. This
paper fills that need so that an efficient algorithm can
be selected.
III. FFT Theory
The set of algorithms known as the Fast Fourier
Transforms (FFT) uses a variety of methods to reduce the
computation time required to evaluate the Discrete
Fourier Transform (DFT). The DFT is the central part
in most spectrum analysis problems and the FFT can improve
performance by a factor of 100 or more over direct eval-
uation of the DFT (Rabiner and Gold, 1975). Therefore,
the FFT is crucially important to the digital signal
processing techniques.
This section begins with "fixed radix" FFT algorithms
by discussing a "decimation-in-time" algorithm, the data
reordering (bit reversal) theory, the real operations
(addition and multiplication) count, a new fixed radix
algorithm in the finite field, and then summarizes the
memory required to use the fixed radix algorithms. Next
the conventional "mixed" radix algorithms are presented
by discussing the theory, digit reversal, real operations
count, and memory required to utilize the mixed radix
algorithms. This theory chapter concludes with a dis-
cussion of mixed radix algorithms based on fast convolu-
tion. The theory, data reordering, real operations count
and memory are also presented for these algorithms.
Before discussing the FFT algorithms comments must
be made relative to computing the trigonometric function
values needed to evaluate the FFT.
3.1 Computing Trigonometric Function Values
The trigonometric values used in FFTs can be represented
as values on the unit circle. The values are based
on integer powers of
$$\exp(-j2\pi/N)$$
which can be computed using sine and cosine functions. It
is useful to have accurate methods of generating the sine
and cosine terms other than the method of repeated use of
library sine and cosine functions.
The method most widely used in FFT algorithms
(Singleton, 1967) generates the trigonometric functions by
a difference equation given by:
$$\cos((k+1)a) = (C \cos(ka) - S \sin(ka)) + \cos(ka)$$
$$\sin((k+1)a) = (C \sin(ka) + S \cos(ka)) + \sin(ka)$$
where
$$C = -2\sin^2(a/2)$$
$$S = \sin(a)$$
$$\cos(0) = 1$$
$$\sin(0) = 0$$
This technique is used for all FFTs presented in this paper
(except where noted otherwise) because it minimizes the use of the FORTRAN
library subroutines cos(·) and sin(·), thereby reducing
the overall FFT computation time.
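The difference equation can be exercised with a short sketch. The Python below is an illustration only (the thesis programs are in FORTRAN); the function name `trig_table` is an assumption introduced here. It calls the library sine twice to form C and S and then generates every later value by the recurrence.

```python
import math

def trig_table(a, count):
    """Return [(cos(k*a), sin(k*a)) for k = 0..count-1] by the recurrence."""
    c = -2.0 * math.sin(a / 2.0) ** 2   # C = -2 sin^2(a/2), equal to cos(a) - 1
    s = math.sin(a)                     # S = sin(a)
    cos_k, sin_k = 1.0, 0.0             # starting values cos(0), sin(0)
    table = [(cos_k, sin_k)]
    for _ in range(count - 1):
        # Both updates use the values from step k, so compute them together.
        cos_next = (c * cos_k - s * sin_k) + cos_k
        sin_next = (c * sin_k + s * cos_k) + sin_k
        cos_k, sin_k = cos_next, sin_next
        table.append((cos_k, sin_k))
    return table

# Sine and cosine values for a 64-point transform, a = 2*pi/64.
table = trig_table(2.0 * math.pi / 64, 64)
```

Each new pair costs four real multiplications and four real additions. Writing the update as a small correction added to the previous value, rather than multiplying by cos(a) directly, is a common way to limit roundoff growth when a is small.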
3.2 Fixed Radix Algorithms
While FFT algorithms are well known and widely used,
they are relatively intricate and somewhat difficult to
grasp at first reading. There are two excellent textbooks
(Rabiner and Gold, 1975; Oppenheim and Schafer, 1975)
which discuss the FFT theory in great detail and present
FFTs based on decimation-in-time and frequency. Both
texts spend a great deal of time discussing the radix-2
FFT, which is the most widely known and used. For this
reason, the radix-2 development is presented here as a
convenience for the reader and provides a theoretical
background from which the other fixed radix algorithms are
derived.
3.2.1 Development of Radix-2 Theory. To achieve
the reduction in complex operations (defined as four real
multiplications and two real additions) from $N^2$ to $N \log_2 N$
it is necessary to decompose the DFT computation into
smaller and smaller DFT computations. As a result, the
symmetry and periodicity of the complex exponential
$\exp(-j2\pi nk/N) = W_N^{nk}$ can be exploited. This radix-2
algorithm is based on decomposition of the sequence x(n)
from the DFT expression:
$$X(k) = \sum_{n=0}^{N-1} x(n)\exp(-j2\pi nk/N) \qquad (3.1)$$
$$k = 0, 1, \ldots, N-1 \quad \text{and} \quad N = 2^m$$
which is known as a "decimation-in-time" algorithm
(Oppenheim and Schafer, 1975). Since N is an even integer,
X(k) can be computed by separating x(n) into two N/2 length
sequences consisting of the even-numbered points and the odd-numbered
points in x(n). Using n=2r for n even and n=2r+1
for n odd Eq (3.1) becomes:
$$X(k) = \sum_{r=0}^{T} x(2r)\,W_N^{2rk} + \sum_{r=0}^{T} x(2r+1)\,W_N^{(2r+1)k} \qquad (3.2)$$
where T = (N/2)-1 and $W_N = \exp(-j2\pi/N)$. By expanding
$W_N^{(2r+1)k}$ and factoring out $W_N^k$, Eq (3.2) can be rewritten as:
$$X(k) = \sum_{r=0}^{T} x(2r)\,(W_N^2)^{rk} + W_N^k \sum_{r=0}^{T} x(2r+1)\,(W_N^2)^{rk} \qquad (3.3)$$
But $W_N^2 = \exp(-j4\pi/N) = \exp(-j2\pi/(N/2)) = W_{N/2}$ and Eq (3.3)
can be written as:
$$X(k) = \sum_{r=0}^{T} x(2r)\,W_{N/2}^{rk} + W_N^k \sum_{r=0}^{T} x(2r+1)\,W_{N/2}^{rk} = G(k) + W_N^k H(k) \qquad (3.4)$$
Each of the sums in Eq (3.4) is an N/2 point DFT, the
first sum being the even numbered points of the original
sequence and the second sum being the odd numbered points
of the original sequence. Although the index k = 0,1,...,N-1,
each of the sums in Eq (3.4) need only be computed over
k = 0, 1, ..., (N/2)-1, since G(k) and H(k) are periodic
in k with period N/2. After the two DFTs in Eq (3.4) are
computed, they are then combined to yield the N-point DFT,
X(k). Figure 3.1 indicates the computation involved in
computing X(k) according to Eq (3.4) for an eight-point
sequence. Figure 3.1 (Oppenheim and Schafer, 1975) uses
the signal flow convention such that branches entering a
node are summed to produce the node variable. When no
coefficient is shown the branch transmittance is assumed
to be one. For other branches the transmittance of a branch
is an integer power of $W_N$. Note in Figure 3.1 that two
four-point DFTs are computed using G(k) and H(k). X(0)
is obtained by multiplying H(0) by $W_N^0$ and adding the product
to G(0). X(1) is obtained by multiplying H(1) by $W_N^1$ and
adding the result to G(1). For X(4) it would follow that
H(4) is multiplied by $W_N^4$ and added to G(4); however, since
G(k) and H(k) are both periodic in k with period 4, H(4) =
H(0) and G(4) = G(0). Thus X(4) results from multiplying
H(0) by $W_N^4$ and adding the product to G(0).

Figure 3.1. Flowgraph of the Decimation-in-Time Decomposition
of an N-Point DFT Computation into Two N/2-Point DFT
Computations (N=8).
NOTE: The integers on the branches of the flowgraph
represent the powers of $W_N$; i.e., the "4" denotes $W_N^4$.
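
The single decomposition step of Eq (3.4) can be checked numerically. The following is a minimal sketch in Python (not the FORTRAN of the appendices); the function names `dft` and `dft_one_stage` and the test sequence are illustrative assumptions, not part of the original programs.

```python
import cmath

def dft(x):
    """Direct DFT of Eq (3.1)."""
    n_points = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
                for n in range(n_points))
            for k in range(n_points)]

def dft_one_stage(x):
    """N-point DFT built from two N/2-point DFTs, Eq (3.4)."""
    n_points = len(x)
    half = n_points // 2
    g = dft(x[0::2])   # G(k): DFT of the even-numbered points
    h = dft(x[1::2])   # H(k): DFT of the odd-numbered points
    # G(k) and H(k) are periodic in k with period N/2, so index modulo N/2.
    return [g[k % half]
            + cmath.exp(-2j * cmath.pi * k / n_points) * h[k % half]
            for k in range(n_points)]

# Arbitrary eight-point test sequence (hypothetical data).
x = [complex(n, (n * n) % 3) for n in range(8)]
max_error = max(abs(a - b) for a, b in zip(dft(x), dft_one_stage(x)))
```

For k = 4 the expression uses G(0) and H(0), which is the periodicity argument made above for X(4).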
With the computation of the N-point DFT of Eq (3.4)
the number of computations can be compared with the direct
DFT computation of Eq (3.1). For the direct computation
without using symmetry properties, $N^2$ complex multiplications
were required. Eq (3.4) requires computation of two N/2-point
DFTs, which require $2(N/2)^2$ complex multiplications
and about $2(N/2)^2$ complex additions (Oppenheim and Schafer,
1975). The two N/2-point DFTs must be combined, requiring
N complex multiplications, corresponding to multiplying the
second sum by $W_N^k$, and then N complex additions, corresponding
to adding the product to the first sum. As a result, the
computation of Eq (3.4) for all values of k requires
$N + 2(N/2)^2$ or $N + N^2/2$ complex multiplications and
additions. For N>2, $N + N^2/2$ is less than $N^2$.
The expression in Eq (3.4) corresponds to decimating
the original N-point sequence into odd and even N/2-point
sequences. Since $N = 2^m$ the lengths of the N/2-point sequences
are also even, and each of G(k) and H(k) can be further decimated
into two N/4-point DFTs, which could then be combined to
yield the N/2-point DFTs. Decimating the N/2-point sequences
in Eq (3.4) into N/4-point sequences gives:
\[ G(k) = \sum_{r=0}^{(N/2)-1} g(r) W_{N/2}^{rk} = \sum_{p=0}^{(N/4)-1} g(2p) W_{N/2}^{2pk} + \sum_{p=0}^{(N/4)-1} g(2p+1) W_{N/2}^{(2p+1)k} \]
Letting R = (N/4)-1,
\[ G(k) = \sum_{p=0}^{R} g(2p) W_{N/4}^{pk} + W_{N/2}^{k} \sum_{p=0}^{R} g(2p+1) W_{N/4}^{pk} \tag{3.5} \]
Similarly,
\[ H(k) = \sum_{p=0}^{R} h(2p) W_{N/4}^{pk} + W_{N/2}^{k} \sum_{p=0}^{R} h(2p+1) W_{N/4}^{pk} \tag{3.6} \]
If the four-point DFTs in Figure 3.1 are computed using
Eqs (3.5) and (3.6) then that computation would be carried
out as indicated in Figure 3.2. Inserting the computation
in Figure 3.2 into the flowgraph of Figure 3.1 produces the
complete flowgraph in Figure 3.3. Note that $W_{N/2} = W_N^2$ was
used.
For the 8-point DFT that has been used as an example,
the computation has been reduced to a computation of N/4-point
DFTs where N/4=2. An example 2-point DFT for x(0)
and x(4) is shown in Figure 3.4. The complete flowgraph
for the computation of the 8-point DFT is shown in Figure
3.5 and was obtained by inserting the computation of
Figure 3.4 into Figure 3.3.
Considering the more general case with $N = 2^v$ and v
greater than 3, the same decimation procedure would be
continued by decomposing the N/4-point transforms in
Eqs (3.5) and (3.6) into N/8-point transforms. This
requires v stages of computation where $v = \log_2 N$. Recall
that in the original decomposition of the N-point transform
into two N/2-point transforms, the number of complex
multiplications and additions required was $N + 2(N/2)^2$.
When the N/2-point transforms were decomposed into N/4-point
transforms the factor of $(N/2)^2$ is replaced by
$N/2 + 2(N/4)^2$ so that the overall computation now requires
$N + N + 4(N/4)^2$ complex multiplications and additions.
If $N = 2^v$ this can be done at most $v = \log_2 N$ times, "so
that after carrying out this decomposition as many times
as possible the number of complex multiplications and
additions is equal to $N \log_2 N$" (Oppenheim and Schafer, 1975).
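
The count quoted above can be reproduced by applying the substitution repeatedly. The sketch below is an illustration in Python; it assumes (this is not stated in the text) that the recursion ends at one-point transforms, which need no arithmetic, and the function names are hypothetical.

```python
def one_split_ops(n):
    """Complex operations after a single decimation: N + 2(N/2)^2."""
    return n + 2 * (n // 2) ** 2

def full_split_ops(n):
    """Complex operations when every (N/2)^2 term is replaced in turn,
    i.e. ops(N) = N + 2*ops(N/2), for N a power of 2."""
    if n == 1:
        return 0  # assumption: a one-point transform costs nothing
    return n + 2 * full_split_ops(n // 2)

# (direct N^2, one decimation, full decimation) for a few sizes
counts = {n: (n * n, one_split_ops(n), full_split_ops(n))
          for n in (8, 64, 1024)}
```

For N = 8 the three counts are 64, 40, and 24, the last being $N \log_2 N$.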
The flowgraph of Figure 3.5 displays the operations
explicitly. By counting branches with transmittances of
the form $W_N^r$ it is seen that each stage has N complex
multiplications and N complex additions. Since there are
$\log_2 N$ stages there are a total of $N \log_2 N$ complex
multiplications and additions as shown before. Further reductions
in the complex operations count can be achieved by exploiting
the symmetry and periodicity of $W_N^r$.
Note that on each "stage" of Figure 3.5 the computation
takes a set of N complex numbers and transforms them into
another set of N complex numbers. This process is repeated
$v = \log_2 N$ times resulting in the DFT computation. For example,
in computing the first stage of Figure 3.5 one set of storage
registers would contain the input data sequence and a
second set of storage registers would contain the computed
results for the first stage. The sequence of numbers
resulting from the mth stage of computation is denoted as
$X_m(i)$, where i = 0, 1, ..., N-1 and m = 1, 2, ..., v. For
the following stage, the previous output array, $X_m(i)$,
becomes the input array and the new output array is $X_{m+1}(i)$
for the (m+1)th stage of computation. Using this notation,
it can be seen that the basic flowgraph in Figure 3.5 is
given by Figure 3.6. Using the notation of Figure 3.6 the
equations of the butterfly are given by:
\[ X_{m+1}(p) = X_m(p) + W_N^{r} X_m(q) \tag{3.7} \]
\[ X_{m+1}(q) = X_m(p) + W_N^{r+N/2} X_m(q) \tag{3.8} \]
Because of the appearance of Figure 3.6 the computations of
Eqs (3.7) and (3.8) are referred to as the "butterfly"
computations.
The number of complex multiplications can be reduced
by a factor of 2 using the symmetry:
\[ W_N^{N/2} = \exp(-j(2\pi/N) \cdot N/2) = \exp(-j\pi) = -1 \tag{3.9} \]
so that Eqs (3.7) and (3.8) become:
\[ X_{m+1}(p) = X_m(p) + W_N^{r} X_m(q) \tag{3.10} \]
\[ X_{m+1}(q) = X_m(p) - W_N^{r} X_m(q) \tag{3.11} \]
Eqs (3.10) and (3.11) are shown in Figure 3.7 which reflects
the "twiddle factor" $W_N^r$ out front in the butterfly. Since
there are N/2 "butterflies" of the form of Figure 3.7 per
stage and $\log_2 N$ stages, the total number of complex
multiplications required is $(N/2) \log_2 N$ instead of the
$N \log_2 N$ used in Figure 3.5. Using the "twiddle factor"
butterfly flowgraph of Figure 3.6 as a replacement for the
butterfly of Figure 3.4, Figure 3.8 is obtained.
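
The butterfly of Eqs (3.10) and (3.11) and its multiplication count can be illustrated with a short recursive sketch. This is Python written for illustration only; it is not the in-place FORTRAN program of Appendix A, and the names `fft_radix2`, `dft`, and `counter` are assumptions introduced here.

```python
import cmath

def dft(x):
    """Direct DFT of Eq (3.1), used only as a reference."""
    n_points = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
                for n in range(n_points))
            for k in range(n_points)]

def fft_radix2(x, counter):
    """Recursive decimation-in-time FFT; len(x) must be a power of 2.
    counter[0] accumulates the number of twiddle multiplications."""
    n_points = len(x)
    if n_points == 1:
        return list(x)
    g = fft_radix2(x[0::2], counter)   # even-numbered points
    h = fft_radix2(x[1::2], counter)   # odd-numbered points
    half = n_points // 2
    out = [0j] * n_points
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n_points) * h[k]  # W_N^r X_m(q)
        counter[0] += 1
        out[k] = g[k] + t           # Eq (3.10)
        out[k + half] = g[k] - t    # Eq (3.11)
    return out

x = [complex(n, -n) for n in range(8)]
counter = [0]
result = fft_radix2(x, counter)
max_error = max(abs(a - b) for a, b in zip(result, dft(x)))
```

For N = 8 the counter ends at 12, which is $(N/2) \log_2 N$; the sketch multiplies by $W_N^0$ as well, so unity twiddle factors are included in that count.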
3.2.2 Development of Radix-3 FFT Theory. Starting
with the restriction that the length of the N-point sequence be an
integer power of three ($N = 3^m$, m = 1, 2, 3, ...), the
DFT X(k) is computed by separating the discrete time
sequence x(n) into three N/3 point sequences. X(k) is
given by the DFT expression:
\[ X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk} \tag{3.12} \]
where k = 0, 1, ..., N-1 and $W_N = \exp(-j2\pi/N)$.
Breaking x(n) into three N/3 point sequences yields x(3r),
x(3r+1) and x(3r+2). Substituting these into Eq (3.12)
and adjusting the respective summations to (N/3)-1 yields:
\[ X(k) = \sum_{r=0}^{P} x(3r) W_N^{(3r)k} + \sum_{r=0}^{P} x(3r+1) W_N^{(3r+1)k} + \sum_{r=0}^{P} x(3r+2) W_N^{(3r+2)k} \tag{3.13} \]
where P = (N/3)-1.
By regrouping the exponents of $W_N$, Eq (3.13) can be
rewritten as:
\[ X(k) = \sum_{r=0}^{P} x(3r) W_N^{3rk} + W_N^{k} \sum_{r=0}^{P} x(3r+1) W_N^{3rk} + W_N^{2k} \sum_{r=0}^{P} x(3r+2) W_N^{3rk} \tag{3.14} \]
By rewriting $W_N^3$ as:
\[ W_N^3 = \exp(-j6\pi/N) = \exp(-j2\pi/(N/3)) = W_{N/3} \tag{3.15} \]
Eq (3.14) can be expressed as:
\[ X(k) = \sum_{r=0}^{P} x(3r) W_{N/3}^{rk} + W_N^{k} \sum_{r=0}^{P} x(3r+1) W_{N/3}^{rk} + W_N^{2k} \sum_{r=0}^{P} x(3r+2) W_{N/3}^{rk} \tag{3.16} \]
Each of the sums in Eq (3.16) represents an N/3 point DFT:
the first being the N/3 point DFT of the 3r points in the
original sequence, the second being that of the 3r+1 points,
and the third being that of the 3r+2 points of
the original sequence. Although the index k of X(k) ranges
over N values (k = 0, 1, ..., N-1), each of the summations
in Eq (3.16) needs computation only over k = 0, 1, ..., (N/3)-1. Eq
(3.16) can be rewritten to reflect this:
\[ X(k) = F(k) + W_N^{k} G(k) + W_N^{2k} H(k) \tag{3.17} \]
Eq (3.17) can be implemented as the butterfly flowgraph
in Figure 3.9 using the accepted notational conventions
(Oppenheim and Schafer, 1975). The convention used for
the flowgraph is that when no coefficient is shown, the branch
transmittance is assumed to be one. For other branches the
transmittance (multiplier) is an integer power
of $W_N$. In Figure 3.9 there are three N/3 point DFTs and
these are computed with F(k) designating the three point
DFT of the 3r points, G(k) designating the three point DFT
of the 3r+1 points, and H(k) designating the DFT of the 3r+2 points,
where r = 0, 1, ..., (N/3)-1.
X(0) is obtained by (1) multiplying H(0) by a branch
transmittance of 1 (which equals $W_N^0$), (2) multiplying
G(0) by 1, (3) multiplying F(0) by 1, and (4) summing the
three. Likewise, X(1) is obtained by multiplying H(1) by
$W_N^2$, multiplying G(1) by $W_N^1$, and adding the results to F(1).
X(6) has H(6) multiplied by $W_N^{12}$ and G(6) multiplied by
$W_N^6$ and the products added to F(6) giving:
\[ X(6) = F(6) + W_N^{6} G(6) + W_N^{12} H(6) \tag{3.18} \]
However, since F(k), G(k), and H(k) are all periodic in
k with period N/3=3, the periodicity can be exploited to
yield F(6) = F(0), G(6) = G(0), and H(6) = H(0). These
results can be substituted into Eq (3.18) to give:
\[ X(6) = F(0) + W_N^{6} G(0) + W_N^{12} H(0) \tag{3.19} \]
Figure 3.9. Butterfly Flowgraph for First Stage Decimation (N=9).
NOTE: The numbers on the branch transmittances represent powers of $W_9$.
Continuing to use the periodic properties, the
results for X(0) through X(8) are:
\[ X(0) = F(0) + G(0) + H(0) \tag{3.20} \]
\[ X(1) = F(1) + W_9^{1} G(1) + W_9^{2} H(1) \tag{3.21} \]
\[ X(2) = F(2) + W_9^{2} G(2) + W_9^{4} H(2) \tag{3.22} \]
\[ X(3) = F(0) + W_9^{3} G(0) + W_9^{6} H(0) \tag{3.23} \]
\[ X(4) = F(1) + W_9^{4} G(1) + W_9^{8} H(1) \tag{3.24} \]
\[ X(5) = F(2) + W_9^{5} G(2) + W_9^{10} H(2) \tag{3.25} \]
\[ X(6) = F(0) + W_9^{6} G(0) + W_9^{12} H(0) \tag{3.26} \]
\[ X(7) = F(1) + W_9^{7} G(1) + W_9^{14} H(1) \tag{3.27} \]
\[ X(8) = F(2) + W_9^{8} G(2) + W_9^{16} H(2) \tag{3.28} \]
Eqs (3.20) through (3.28) conclude the first stage decimation
of the 9-point sequence. The DFT computation has been
reduced to computations of N/3-point DFTs where N/3 = 3.
An example 3-point DFT for x(0), x(3), and x(6) is shown in
Figure 3.10. The complete flowgraph for the computation of
the 9-point DFT is shown in Figure 3.11 and was obtained by
substituting the flowgraph of Figure 3.10 into Figure 3.9.
Considering the more general case with N a power of 3
greater than 9, the same decimation procedure would be
continued by decomposing the N/3 DFTs into N/9 computations
for F(k), G(k), and H(k). The DFT F(k) is:
Figure 3.11. Complete Butterfly Flowgraph ...
NOTE: Digits on the branch transmittance ...
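\[ F(k) = \sum_{r=0}^{(N/3)-1} f(r) W_{N/3}^{rk} \tag{3.29} \]
This equation, letting Q = (N/9)-1, can be divided into
three N/9 length sequences:
\[ F(k) = \sum_{i=0}^{Q} f(3i) W_{N/3}^{3ik} + \sum_{i=0}^{Q} f(3i+1) W_{N/3}^{(3i+1)k} + \sum_{i=0}^{Q} f(3i+2) W_{N/3}^{(3i+2)k} \tag{3.30} \]
Expanding the exponents of $W_{N/3}$, Eq (3.30) can be rewritten:
\[ F(k) = \sum_{i=0}^{Q} f(3i) W_{N/3}^{3ik} + W_{N/3}^{k} \sum_{i=0}^{Q} f(3i+1) W_{N/3}^{3ik} + W_{N/3}^{2k} \sum_{i=0}^{Q} f(3i+2) W_{N/3}^{3ik} \tag{3.31} \]
Using the substitution $W_{N/3}^3 = W_{N/9}$:
\[ F(k) = \sum_{i=0}^{Q} f(3i) W_{N/9}^{ik} + W_{N/3}^{k} \sum_{i=0}^{Q} f(3i+1) W_{N/9}^{ik} + W_{N/3}^{2k} \sum_{i=0}^{Q} f(3i+2) W_{N/9}^{ik} \tag{3.32} \]
Similar expressions for G(k) and H(k) can be derived:
\[ G(k) = \sum_{i=0}^{Q} g(3i) W_{N/9}^{ik} + W_{N/3}^{k} \sum_{i=0}^{Q} g(3i+1) W_{N/9}^{ik} + W_{N/3}^{2k} \sum_{i=0}^{Q} g(3i+2) W_{N/9}^{ik} \tag{3.33} \]
\[ H(k) = \sum_{i=0}^{Q} h(3i) W_{N/9}^{ik} + W_{N/3}^{k} \sum_{i=0}^{Q} h(3i+1) W_{N/9}^{ik} + W_{N/3}^{2k} \sum_{i=0}^{Q} h(3i+2) W_{N/9}^{ik} \tag{3.34} \]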
3] 3 2 1 !rjo 14 1 I: u ( t .i( i t I .-
Eqs (3.32) through (3.34) are used to obtain the
general expression for a radix-3 butterfly flowgraph.
Letting N=9 the expressions for F(k), G(k) and H(k) become:
\[ F(0) = f(0) + W_3^{0} f(1) + W_3^{0} f(2) \]
\[ F(1) = f(0) + W_3^{1} f(1) + W_3^{2} f(2) \]
\[ F(2) = f(0) + W_3^{2} f(1) + W_3^{4} f(2) \tag{3.35} \]
\[ G(0) = g(0) + W_3^{0} g(1) + W_3^{0} g(2) \]
\[ G(1) = g(0) + W_3^{1} g(1) + W_3^{2} g(2) \]
\[ G(2) = g(0) + W_3^{2} g(1) + W_3^{4} g(2) \tag{3.36} \]
\[ H(0) = h(0) + W_3^{0} h(1) + W_3^{0} h(2) \]
\[ H(1) = h(0) + W_3^{1} h(1) + W_3^{2} h(2) \]
\[ H(2) = h(0) + W_3^{2} h(1) + W_3^{4} h(2) \tag{3.37} \]
From Eqs (3.35) through (3.37) the general butterfly
multipliers are derived (consistent with Oppenheim and
Schafer) to be:
\[ X(k) = F(k) + W_N^{k} G(k) + W_N^{2k} H(k) \tag{3.38} \]
\[ X(k+r) = F(k) + W_N^{k+r} G(k) + W_N^{2k+2r} H(k) \tag{3.39} \]
\[ X(k+2r) = F(k) + W_N^{k+2r} G(k) + W_N^{2k+4r} H(k) \tag{3.40} \]
where r represents the distance between the endpoints of
the butterfly. In Figure 3.11 r=1 for stage 1 and r=3 for
stage 2. Eqs (3.38) through (3.40) are represented in
Figure 3.12, which is the general radix-3 butterfly.
The exponents of Figure 3.12 can be rewritten as:
\[ W_N^{k+r} = W_N^{k} W_N^{r} \tag{3.41} \]
\[ W_N^{2k+2r} = W_N^{2k} W_N^{2r} \tag{3.42} \]
\[ W_N^{k+2r} = W_N^{k} W_N^{2r} \tag{3.43} \]
\[ W_N^{2k+4r} = W_N^{2k} W_N^{4r} \tag{3.44} \]
With these expressions for the butterfly multipliers an
alternative arrangement to Figure 3.12 is possible by
"premultiplying" or "twiddling" the inputs to G(k) and
H(k) (Gentleman and Sande, 1966). The multipliers $W_N^k$
and $W_N^{2k}$ represent the twiddle factors of the butterfly
in Figure 3.13. Since N=3r (Oppenheim and Schafer, 1975)
the butterfly multipliers can be reduced to:
\[ W_N^{r} = W_{3r}^{r} = \exp(-j2\pi r/3r) = \exp(-j2\pi/3) = -0.5 - j0.866 \tag{3.45} \]
\[ W_N^{2r} = W_{3r}^{2r} = \exp(-j4\pi/3) = -0.5 + j0.866 \tag{3.46} \]
\[ W_N^{4r} = W_{3r}^{4r} = \exp(-j8\pi/3) = -0.5 - j0.866 \tag{3.47} \]
Oppenheim and Schafer observed that there is no advantage
in Figure 3.12 to the alternate twiddle factor version in
Figure 3.13 because "exp(-j2π/3) and all the powers thereof
are complex coefficients that require multiplications."
However, for the particular FORTRAN FFT radix-3 programs
which implement Figures 3.12 and 3.13, the twiddle factor
version of the radix-3 FFT was much more efficient to
implement because only two twiddle factors had to be computed
($W_N^k$ and $W_N^{2k}$) per butterfly and the butterfly multipliers were
the constants in Eqs (3.45) and (3.46). The original version
of Figure 3.12 requires that all six complex multipliers be
computed for each butterfly. The twiddle factor version
represents a simplification over the original radix-3
butterfly.
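
The twiddle factor arrangement can be sketched as a short recursive routine. The following Python is an illustration only (the thesis programs are FORTRAN and work in place); the names `fft_radix3`, `W3`, and `W3_SQ` are assumptions introduced for this sketch. Each level applies Eqs (3.38) through (3.40) with r equal to one third of the current transform length, so the butterfly multipliers are the constants of Eqs (3.45) through (3.47).

```python
import cmath

W3 = cmath.exp(-2j * cmath.pi / 3)   # W_N^r  = W_N^4r, Eqs (3.45), (3.47)
W3_SQ = W3 * W3                      # W_N^2r, Eq (3.46)

def dft(x):
    """Direct DFT of Eq (3.12), used only as a reference."""
    n_points = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
                for n in range(n_points))
            for k in range(n_points)]

def fft_radix3(x):
    """Recursive decimation-in-time FFT; len(x) must be a power of 3."""
    n_points = len(x)
    if n_points == 1:
        return list(x)
    f = fft_radix3(x[0::3])   # F(k): the 3r points
    g = fft_radix3(x[1::3])   # G(k): the 3r+1 points
    h = fft_radix3(x[2::3])   # H(k): the 3r+2 points
    r = n_points // 3
    out = [0j] * n_points
    for k in range(r):
        # Only two twiddle factors are computed per butterfly.
        gk = cmath.exp(-2j * cmath.pi * k / n_points) * g[k]   # W_N^k  G(k)
        hk = cmath.exp(-4j * cmath.pi * k / n_points) * h[k]   # W_N^2k H(k)
        out[k] = f[k] + gk + hk                        # Eq (3.38)
        out[k + r] = f[k] + W3 * gk + W3_SQ * hk       # Eq (3.39)
        out[k + 2 * r] = f[k] + W3_SQ * gk + W3 * hk   # Eq (3.40)
    return out

x = [complex(n % 4, n) for n in range(9)]
max_error = max(abs(a - b) for a, b in zip(fft_radix3(x), dft(x)))
```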
3.2.3 Radix-5 Theory. The theory for the radix-5
algorithm follows a development similar to the radix-3.
Because of this similarity only the radix-5 results are
given here for comparison to the radix-3; readers interested
in the detailed development are referred to Appendix D.
The basic butterfly multipliers for the radix-5 are
given by:
\[ X(k) = A(k) + W_N^{k} B(k) + W_N^{2k} C(k) + W_N^{3k} D(k) + W_N^{4k} E(k) \tag{3.48} \]
\[ X(k+r) = A(k) + W_N^{k+r} B(k) + W_N^{2k+2r} C(k) + W_N^{3k+3r} D(k) + W_N^{4k+4r} E(k) \tag{3.49} \]
\[ X(k+2r) = A(k) + W_N^{k+2r} B(k) + W_N^{2k+4r} C(k) + W_N^{3k+6r} D(k) + W_N^{4k+8r} E(k) \tag{3.50} \]
\[ X(k+3r) = A(k) + W_N^{k+3r} B(k) + W_N^{2k+6r} C(k) + W_N^{3k+9r} D(k) + W_N^{4k+12r} E(k) \tag{3.51} \]
\[ X(k+4r) = A(k) + W_N^{k+4r} B(k) + W_N^{2k+8r} C(k) + W_N^{3k+12r} D(k) + W_N^{4k+16r} E(k) \tag{3.52} \]
Eqs (3.48) through (3.52) are shown in the twiddle
factor butterfly of Figure 3.14 where "r" is the distance
between the butterfly endpoints. Since N=5r the butterfly
multipliers reduce to constant complex multipliers of:
\[ W_N^{r} = W_N^{6r} = W_N^{16r} = \cos(2\pi/5) - j \sin(2\pi/5) \]
\[ W_N^{2r} = W_N^{12r} = \cos(4\pi/5) - j \sin(4\pi/5) \]
\[ W_N^{3r} = (W_N^{2r})^{*} = W_N^{8r} = \cos(4\pi/5) + j \sin(4\pi/5) \]
\[ W_N^{4r} = (W_N^{r})^{*} = W_N^{9r} = \cos(2\pi/5) + j \sin(2\pi/5) \]
These constant butterfly multipliers are computed once
during the FFT computation and used in every radix-5
butterfly.
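
The reduction of the radix-5 exponents to four constants can be illustrated as follows. This Python sketch is an assumption-labeled illustration, not the Appendix D program: `W5` and `radix5_butterfly` are hypothetical names, and the inputs are taken to be already multiplied by their twiddle factors $W_N^k$ through $W_N^{4k}$. Since N=5r, the multiplier on input i for output X(k+qr) is $W_N^{iqr}$, which depends only on iq modulo 5.

```python
import cmath
import math

# W5[q] = W_N^{qr} for N = 5r; only W5[1]..W5[4] are non-trivial.
W5 = [cmath.exp(-2j * cmath.pi * q / 5) for q in range(5)]

def radix5_butterfly(a, b, c, d, e):
    """Return [X(k), X(k+r), X(k+2r), X(k+3r), X(k+4r)] from inputs that
    have already been multiplied by their twiddle factors."""
    inputs = (a, b, c, d, e)
    return [sum(W5[(i * q) % 5] * inputs[i] for i in range(5))
            for q in range(5)]

# With k = 0 and N = 5 the butterfly is simply a five-point DFT.
values = [1 + 2j, -3j, 0.5 + 0j, 2 - 1j, -1 + 1j]
outputs = radix5_butterfly(*values)
```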
3.2.4 Digit Reversal Algorithm. In order for the
DFT to be computed as discussed above, the input data must
be stored in nonsequential order. In fact the order in
which the input data are stored is in "bit-reversed" order
for the radix-2 FFT and "digit-reversed" order for the
other fixed-radix algorithms. To see what is meant by this
terminology note that for the 8-point radix-2 flowgraph of
Figure 3.8 three binary digits are required to index through
the data array. Writing the input indices of $X_0$ in binary form
and then reversing the order of the bits gives:
\[ \begin{aligned}
X_0(0) &= X_0(000) = x(000) = x(0) \\
X_0(1) &= X_0(001) = x(100) = x(4) \\
X_0(2) &= X_0(010) = x(010) = x(2) \\
X_0(3) &= X_0(011) = x(110) = x(6) \\
&\;\;\vdots \\
X_0(7) &= X_0(111) = x(111) = x(7)
\end{aligned} \tag{3.53} \]
If ($n_2 n_1 n_0$) is the binary representation of the index of
the sequence x(n), then sequence value $x(n_2 n_1 n_0)$ is stored
in array position $X_0(n_0 n_1 n_2)$. That is, in determining the
position of $x(n_2 n_1 n_0)$ in the input array, the bits of
index n must be reversed in order.
For the radix-3 FFT the input array must be in a
similar nonsequential order. The order is determined by
"digit reversing" the input sequence index using a modulo-3
counter. The digit reversed radix-3 FFT example where N=9
is shown in Figure 3.15. The modulo-3 counter is given by:
\[ \mathrm{COUNT} = (b_1 \cdot 3^1) + (b_0 \cdot 3^0) \tag{3.54} \]
where $b_k$ = 0, 1, 2. The reversed count is given by:
\[ \mathrm{REVCOUNT} = (b_0 \cdot 3^1) + (b_1 \cdot 3^0) \tag{3.55} \]
Eqs (3.54) and (3.55) show the modulo-3 counter for N=9
which requires only two base-3 digits, $b_1$ and $b_0$, to represent the
input sequence index. For the case where $N = 3^3 = 27$ three digits are
needed to represent the index of the input sequence x(n) and the modulo-3
counter becomes:
  Input array                                  Output array
  base 3    base 10                            base 10   base 3
  x(00)  =  x(0)                               X(0)  =  X(00)
  x(10)  =  x(3)                               X(1)  =  X(01)
  x(20)  =  x(6)                               X(2)  =  X(02)
  x(01)  =  x(1)                               X(3)  =  X(10)
  x(11)  =  x(4)       FFT Butterfly           X(4)  =  X(11)
  x(21)  =  x(7)                               X(5)  =  X(12)
  x(02)  =  x(2)                               X(6)  =  X(20)
  x(12)  =  x(5)                               X(7)  =  X(21)
  x(22)  =  x(8)                               X(8)  =  X(22)

Figure 3.15. Digit Reversed Input and Output Arrays.
\[ \mathrm{COUNT} = (b_2 \cdot 3^2) + (b_1 \cdot 3^1) + (b_0 \cdot 3^0) \tag{3.56} \]
and the reversed digit counter is:
\[ \mathrm{REVCOUNT} = (b_0 \cdot 3^2) + (b_1 \cdot 3^1) + (b_2 \cdot 3^0) \tag{3.57} \]
Similarly the general expressions for COUNT and REVCOUNT
can be given where $N = 3^m$ and $b_k$ = 0, 1, 2:
\[ \mathrm{COUNT} = (b_{m-1} \cdot 3^{m-1}) + (b_{m-2} \cdot 3^{m-2}) + \cdots + (b_1 \cdot 3^1) + (b_0 \cdot 3^0) \tag{3.58} \]
and
\[ \mathrm{REVCOUNT} = (b_0 \cdot 3^{m-1}) + (b_1 \cdot 3^{m-2}) + \cdots + (b_{m-2} \cdot 3^1) + (b_{m-1} \cdot 3^0) \tag{3.59} \]
Once COUNT and REVCOUNT are computed the magnitudes are
compared. If REVCOUNT is less than or equal to COUNT a
swap of the values indexed by COUNT and REVCOUNT is not
required; otherwise exchange the array value indexed
by COUNT with the array value indexed by REVCOUNT. The
counters are incremented by one and the process continues
until all N indices have been tested.
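
The COUNT/REVCOUNT procedure can be sketched for any fixed radix. The Python below is an illustration under stated assumptions (it is not the thesis FORTRAN; `reverse_digits` and `digit_reverse_permute` are hypothetical names, and REVCOUNT is recomputed from COUNT each time rather than maintained as a running counter).

```python
def reverse_digits(count, radix, num_digits):
    """REVCOUNT of Eq (3.59): the base-`radix` digits of COUNT reversed."""
    revcount = 0
    for _ in range(num_digits):
        count, digit = divmod(count, radix)   # peel off the lowest digit
        revcount = revcount * radix + digit   # append it at the low end
    return revcount

def digit_reverse_permute(data, radix):
    """Return a copy of `data` in digit-reversed order; len(data) = radix**m."""
    n_points = len(data)
    num_digits, size = 0, 1
    while size < n_points:
        size *= radix
        num_digits += 1
    if size != n_points:
        raise ValueError("length must be an integer power of the radix")
    out = list(data)
    for count in range(n_points):
        revcount = reverse_digits(count, radix, num_digits)
        if revcount > count:   # swap once per pair, as described above
            out[count], out[revcount] = out[revcount], out[count]
    return out

order_radix3 = digit_reverse_permute(list(range(9)), 3)
order_radix2 = digit_reverse_permute(list(range(8)), 2)
```

For N=9 the resulting input order is x(0), x(3), x(6), x(1), x(4), x(7), x(2), x(5), x(8), and for the 8-point radix-2 case it is the bit-reversed order of Eq (3.53).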
3.2.5 Development of a Radix-3 FFT Using the Complex
Cube Root of Unity. This section presents the theory of
a radix-3 FFT algorithm which uses the complex cube root of
unity to perform the complex Fourier transformation (butterfly)
without using multiplications. The benefit of this
technique will also be discussed in the section on real
operations count.
While the original paper (Dubois and Venetsanopoulos,
1978) presents this technique, it
leaves out several steps which aid in understanding the
theory and for that reason it is presented again here.
This algorithm uses basis vectors (1,u) instead of the
conventional complex plane vectors (1,j) to perform the
complex Fourier transform (where u is the cube root of 1
and j is the square root of -1). The new basis vectors
use arithmetic notation:
\[ a + bu \in R(u); \quad a, b \text{ real numbers} \tag{3.60} \]
Taking u as the cube root of 1 implies:
\[ u^3 - 1 = 0 \tag{3.61} \]
or
\[ (u - 1)(u^2 + u + 1) = 0 \tag{3.62} \]
Since it is known that $u \neq 1$, then
\[ u^2 + u + 1 = 0 \tag{3.63} \]
or
\[ u^2 = -1 - u \tag{3.64} \]
Eq (3.60) is used in the definition of multiplication in
the R(u) field:
\[ (a + bu)(c + du) = ac + bd\,u^2 + ad\,u + bc\,u \tag{3.65} \]
Substituting Eq (3.64) into Eq (3.65) results in:
\[ (a + bu)(c + du) = (ac - bd) + (ad + b(c - d))u \tag{3.66} \]
The expression in Eq (3.66) can be expanded and then
recombined to reduce the number of multiplications:
\[ ad + b(c - d) = ad + bc - bd + bd - bd + ac - ac \tag{3.67} \]
\[ = ac + ad + bc + bd - ac - bd - bd \tag{3.68} \]
\[ = (a + b)(c + d) - ac - bd - bd \tag{3.69} \]
Substituting Eq (3.69) into Eq (3.66) gives:
\[ (a + bu)(c + du) = (ac - bd) + ((a + b)(c + d) - ac - bd - bd)u \tag{3.70} \]
The result in Eq (3.70) requires three real multiplications
and six real additions compared with conventional complex
multiplication which requires four real multiplications and
two real additions. Multiplication in the R(u) field requires
one less multiplication but four more additions.
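
Eq (3.70) can be written out directly. The following Python sketch is illustrative only (the names `ru_multiply` and `ru_value` are assumptions); it forms the three products ac, bd, and (a+b)(c+d) and uses six additions or subtractions.

```python
import cmath

U = cmath.exp(-2j * cmath.pi / 3)   # a complex cube root of unity

def ru_multiply(a, b, c, d):
    """(a + b u)(c + d u) in R(u) using three real multiplications, Eq (3.70).
    Returns the pair (real part, coefficient of u)."""
    ac = a * c
    bd = b * d
    cross = (a + b) * (c + d)
    return ac - bd, cross - ac - bd - bd

def ru_value(c, d):
    """Complex value of c + d u, used here only to check the product."""
    return c + d * U

product = ru_multiply(2, 3, 5, -7)
check_error = abs(ru_value(*product) - ru_value(2, 3) * ru_value(5, -7))
```

For (2 + 3u)(5 - 7u) the result is 31 + 22u, which agrees with Eq (3.66): ac - bd = 31 and ad + b(c - d) = 22.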
The expression for u is obtained from $u^3 = 1$ by letting
$u^3 = (\exp(-j2\pi/3))^3 = 1$. Consequently, $u = \exp(-j2\pi/3) =
-1/2 - j(\sqrt{3}/2)$, which is used for conversion between a + bj
and c + du:
\[ c + du = c + d(-1/2 - j(\sqrt{3}/2)) = c - d/2 - j(\sqrt{3}/2)d \tag{3.71} \]
\[ c + du = (c - d/2) + j(-\sqrt{3}/2)d \tag{3.72} \]
To find the conversion from a + bj to c + du, solve
Eq (3.72) for j:
\[ c + du = (c - d/2) + (-\sqrt{3}/2)d\,j \]
\[ d/2 + du = (-\sqrt{3}/2)d\,j \]
\[ d(1/2 + u) = (-\sqrt{3}/2)d\,j \]
\[ 1/2 + u = (-\sqrt{3}/2)\,j \]
\[ j = (-2/\sqrt{3})(1/2 + u) \tag{3.73} \]
Using Eq (3.73) and a + bj the conversion to c + du is:
\[ a + bj = a + b(-2/\sqrt{3})(1/2) + b(-2/\sqrt{3})u \]
\[ a + bj = (a - b/\sqrt{3}) + (-2b/\sqrt{3})u \tag{3.74} \]
Using the R(u) arithmetic developed above, it can be
shown that a radix-3 FFT butterfly can be developed which
requires no multiplications except for the twiddle factors
in Figure 3.13.
Using Eq (3.74) and $W_N = \cos(2\pi/N) + j(-\sin(2\pi/N))$
produces:
\[ c + du = (\cos(2\pi/N) + \sin(2\pi/N)/\sqrt{3}) + (2\sin(2\pi/N)/\sqrt{3})u \tag{3.75} \]
Using the substitution of N = 3r in Eq (3.75) reduces it to:
\[ W_N^{r} = (\cos(2\pi/3) + \sin(2\pi/3)/\sqrt{3}) + (2\sin(2\pi/3)/\sqrt{3})u \]
\[ W_N^{r} = 0 + 1u = u \tag{3.76} \]
Likewise the remaining W terms in Figure 3.13 can be reduced:
\[ W_N^{2r} = (\cos(4\pi/3) + \sin(4\pi/3)/\sqrt{3}) + (2\sin(4\pi/3)/\sqrt{3})u \]
\[ W_N^{2r} = -1 - u \tag{3.77} \]
\[ W_N^{4r} = 0 + 1u = u \tag{3.78} \]
Substituting Eqs (3.76) through (3.78) into Figure 3.13
produces Figure 3.16.
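
The conversions of Eqs (3.72) and (3.74), and the reductions in Eqs (3.76) through (3.78), can be checked with a few lines of Python. The function names below are assumptions introduced for this sketch.

```python
import cmath
import math

SQRT3 = math.sqrt(3.0)

def complex_to_ru(z):
    """a + bj -> (c, d) with a + bj = c + d u, Eq (3.74)."""
    return z.real - z.imag / SQRT3, -2.0 * z.imag / SQRT3

def ru_to_complex(c, d):
    """c + d u -> complex number, Eq (3.72)."""
    return complex(c - d / 2.0, -SQRT3 * d / 2.0)

w_r = cmath.exp(-2j * cmath.pi / 3)    # W_N^r  with N = 3r
w_2r = cmath.exp(-4j * cmath.pi / 3)   # W_N^2r
w_4r = cmath.exp(-8j * cmath.pi / 3)   # W_N^4r

ru_r = complex_to_ru(w_r)      # expected near (0, 1), i.e. u
ru_2r = complex_to_ru(w_2r)    # expected near (-1, -1), i.e. -1 - u
ru_4r = complex_to_ru(w_4r)    # expected near (0, 1), i.e. u
round_trip = ru_to_complex(*complex_to_ru(3 - 4j))
```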
Evaluating the butterfly of Figure 3.16 in the R(u) field
shows that no
multiplications are required to evaluate the butterfly flowgraph.
$X_i$, $Y_i$ are the butterfly inputs after twiddle factor
multiplication and A(i), B(i) are the butterfly outputs in the
R(u) field.
\[ A(1) + B(1)u = (X_1 + X_2 + X_3) + (Y_1 + Y_2 + Y_3)u \tag{3.79} \]
\[ A(2) + B(2)u = (X_1 + Y_1 u) + (X_2 + Y_2 u)(0 + u) + (X_3 + Y_3 u)(-1 - u) \]
\[ = X_1 + Y_1 u + (-Y_2) + (X_2 + Y_2(-1))u + (-X_3 + Y_3) + (-X_3)u \]
\[ = (X_1 - Y_2 - X_3 + Y_3) + (Y_1 + X_2 - Y_2 - X_3)u \tag{3.80} \]
\[ A(3) + B(3)u = X_1 + Y_1 u + (X_2 + Y_2 u)(-1 - u) + (X_3 + Y_3 u)(0 + u) \]
\[ = X_1 + Y_1 u + (-X_2 + Y_2) + (-X_2)u + (-Y_3) + (X_3 + Y_3(-1))u \]
\[ = (X_1 - X_2 + Y_2 - Y_3) + (Y_1 - X_2 + X_3 - Y_3)u \tag{3.81} \]
There are 16 real additions shown in Eqs (3.79) through
(3.81); however, by combining common terms $-Y_2 - X_3 = -R$
and $-X_2 - Y_3 = -S$, the radix-3 butterfly can be evaluated
using only fourteen real additions (neglecting the twiddle
factors):
\[ A(1) = X_1 + X_2 + X_3 \]
\[ B(1) = Y_1 + Y_2 + Y_3 \]
\[ A(2) = X_1 + Y_3 - R \]
\[ B(2) = Y_1 + X_2 - R \quad \text{where } R = Y_2 + X_3 \]
\[ A(3) = X_1 + Y_2 - S \]
\[ B(3) = Y_1 + X_3 - S \quad \text{where } S = X_2 + Y_3 \]
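
The fourteen-addition butterfly can be written out and compared with the same butterfly evaluated in ordinary complex arithmetic. This is an illustrative Python sketch; `ru_butterfly` and `ru_value` are hypothetical names, and the inputs are assumed to be already multiplied by their twiddle factors.

```python
import cmath

U = cmath.exp(-2j * cmath.pi / 3)   # cube root of unity used for checking

def ru_butterfly(x1, y1, x2, y2, x3, y3):
    """Radix-3 butterfly in R(u): 14 real additions, no multiplications.
    Inputs are X_i + Y_i u; returns ((A1, B1), (A2, B2), (A3, B3))."""
    r = y2 + x3
    s = x2 + y3
    a1 = x1 + x2 + x3
    b1 = y1 + y2 + y3
    a2 = x1 + y3 - r
    b2 = y1 + x2 - r
    a3 = x1 + y2 - s
    b3 = y1 + x3 - s
    return (a1, b1), (a2, b2), (a3, b3)

def ru_value(c, d):
    """Complex value of c + d u."""
    return c + d * U

inputs = (1.0, 2.0, -3.0, 0.5, 4.0, -1.5)
outputs = ru_butterfly(*inputs)
z1, z2, z3 = (ru_value(inputs[0], inputs[1]),
              ru_value(inputs[2], inputs[3]),
              ru_value(inputs[4], inputs[5]))
# Multipliers u, -1 - u = u^2, and u from Eqs (3.76) through (3.78).
reference = (z1 + z2 + z3,
             z1 + U * z2 + U * U * z3,
             z1 + U * U * z2 + U * z3)
max_error = max(abs(ru_value(*out) - ref)
                for out, ref in zip(outputs, reference))
```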
3.2.6 Summary. This completes the discussion of
fixed radix FFT theory. In this section the general theory
was developed using the radix-3 case as an alternative to
the more common radix-2 development. A decimation-in-time
for N=9 was shown and the basic butterfly equations for
radix-3 were derived. Because of the similarity to radix-3
butterflies, the radix-5 theory was not developed but the
butterfly equations necessary to implement a radix-5 FFT
were given. Finally, a new radix-3 FFT (Dubois and
Venetsanopoulos, 1978) was developed.
3.3 Real Operations Count for Fixed Radix FFTs
The speed at which an FFT algorithm can perform the
DFT is (to a first approximation) proportional to the
number of complex multiplications used in the algorithm
(Singleton, 1969). The number of times the data array is
indexed is a secondary factor and is shown to have minimal
impact on the results of this paper.
An anomaly in the nomenclature should be pointed out
before further discussion of "complex multiplications"
related to FFTs. A complex multiplication implies four
real multiplications and two real additions. It has been
shown (Singleton, 1969) that $(p-1)^2$ real multiplications
are required to evaluate a complex transform of dimension
p, p odd, where $N = p^m$. Singleton then refers to the $(p-1)^2$
real multiplications as $(p-1)^2$ complex multiplications
... transform of dimension p, requires more than $(p-1)^2/2$ real additions.
Throughout this paper all references to multiplications and
additions are in terms of real operations and not complex
operations.
The real operations are determined from (1) the number
of butterflies times the number of real operations required
to compute the butterfly, (2) the number of twiddle
factors times the real operations required per twiddle factor,
and (3) the number of trigonometric functions (sine and
cosine) which must be computed. The real operations counts
for radix-p FFTs are derived as a function of N, m, and
p where $N = p^m$.
3.3.1 Number of Butterflies in Fixed Radix-p FFTs.
The number of butterflies is dependent on N, m, and p,
where $N = p^m$. Examining the radix-2 FFT in Figure 3.8 shows
that there are 8 input points and 8 output points for each
stage. The radix-2 butterfly in Figure 3.7 has 2 inputs
and 2 outputs, which means that Figure 3.8 must have
8/2 = 4 butterflies per stage. There are 3 stages in this
radix-2 FFT (where $N = 2^3$) giving a total of 12 butterflies
in this FFT.
In general the number of butterflies is given
by:
\[ mN/p \tag{3.82} \]
This equation can be checked for the radix-3 example.
Given that N=9, p=3, and m=2, Eq (3.82) gives the total
number of butterflies as 2 * 9/3 = 6. This is verified
by Figure 3.11, which has 6 butterflies.
3.3.2 Number of Twiddle Factors in Fixed Radix-p
FFTs. The twiddle factors are complex multipliers of the
form $\exp(-j2\pi k/N)$ which multiply each radix-p butterfly
as shown in Figure 3.8. Notice that each stage has N/p =
8/2 = 4 butterflies, each of which requires p-1 = 2-1 = 1
complex twiddle factor. The general expression for the number
of twiddle factors in each stage becomes:
\[ N(p-1)/p \tag{3.84} \]
Given that $N = p^m$ there are m stages in a radix-p FFT making
the total number of twiddle factors for the FFT equal:
\[ mN(p-1)/p \tag{3.85} \]
Some of the complex twiddle factors are $W_N^0 = 1$ and can be
eliminated. In any FFT there are N-1 of these unity twiddle
factors (Singleton, 1969) which gives the final expression
for the number of complex twiddle factors:
\[ mN(p-1)/p - (N-1) \tag{3.86} \]
For N=8, p=2, and m=3, Eq (3.86) gives 12 - 7, so the number of twiddle
factors is seen to be 5. Examination of Figure 3.8 for $N = 2^3$
shows there are 5 non-unity twiddle factors.
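
Eqs (3.82) and (3.86) are simple enough to tabulate directly. The following Python sketch is illustrative; the function names are assumptions.

```python
def butterfly_count(n_points, radix, stages):
    """Total number of butterflies, Eq (3.82): m N / p."""
    return stages * n_points // radix

def twiddle_count(n_points, radix, stages):
    """Non-unity complex twiddle factors, Eq (3.86): m N (p-1)/p - (N-1)."""
    return stages * n_points * (radix - 1) // radix - (n_points - 1)

radix2_example = (butterfly_count(8, 2, 3), twiddle_count(8, 2, 3))
radix3_example = (butterfly_count(9, 3, 2), twiddle_count(9, 3, 2))
```

The radix-2 example gives 12 butterflies and 5 twiddle factors; the radix-3 example with N=9 gives 6 butterflies and 12 - 8 = 4 twiddle factors.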
3.3.3 Number of Trigonometric Functions Required
for the Fixed Radix FFTs. The trigonometric functions
of sine and cosine are needed to compute the twiddle factors.
The radix-2 algorithm uses calls to the FORTRAN
intrinsic SIN and COS functions as well as the difference
equations given in Section 3.1. The radix-3 and 5 FFTs
use lookup tables.
The radix-2 algorithm in Appendix A computes one sine
and cosine at each stage of the FFT using:
W -- CMPL)I (COS( P1/L]]_), SIN(PIr/FiL]].))
Each radix-2 FFT has m stages where $N = 2^m$, which means the
sine and cosine functions are called m times for the FFT.
Once the initial sine and cosine are computed for the
stage, each new twiddle factor in the stage is computed
using the complex multiplication:
U = U * W
where the complex U was originally initialized to U = (1,0).
The complex multiplication U * W effectively implements
the sine and cosine difference equations in Section 3.1.
The number of times U * W is computed for each FFT stage
is a function of the number of different twiddle factors in
the stage. In Figure 3.8 the first stage has only one
type of twiddle factor, $W_N^0$; the second stage has two types,
$W_N^0$ and $W_N^2$; while stage 3 has four: $W_N^0$, $W_N^1$, $W_N^2$, and $W_N^3$. The
general expression for the types of twiddle factors in
stage k is:
\[ TF = 2^{k-1} \]
Thus for stage 1, k=1 and $TF = 2^0 = 1$, which gives one type
of twiddle factor; for stage 2, k=2 and $TF = 2^1 = 2$, giving two
types of twiddle factors; and finally for the last stage
in this example k=3 and $TF = 2^2 = 4$, or four types of twiddle
factors are required. In general for the radix-2 FFT in
Appendix A the complex multiplication U * W is evaluated
\[ \sum_{k=1}^{m} 2^{k-1} \]
times, where m is the number of stages for $N = 2^m$. Given
that the complex multiplication requires 4 real multiplications
and 2 additions, the number of operations required
to compute sines and cosines for this radix-2 FFT is:
\[ \text{real mult} = 4 \sum_{k=1}^{m} 2^{k-1} \tag{3.87} \]
\[ \text{real add} = 2 \sum_{k=1}^{m} 2^{k-1} \tag{3.88} \]
\[ \text{sine and cosine calls} = m \tag{3.89} \]
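
The U = U * W recurrence can be sketched as follows. This Python is an illustration, not the Appendix A program: the name `stage_twiddles` is hypothetical, and the sign of the exponent is chosen here to match $W_N = \exp(-j2\pi/N)$ as used in this chapter.

```python
import cmath

def stage_twiddles(stage):
    """Twiddle factors of one radix-2 stage generated with U = U * W.
    Stage k has 2**(k-1) distinct twiddle factors."""
    types = 2 ** (stage - 1)
    w = cmath.exp(-1j * cmath.pi / types)   # one sine/cosine pair per stage
    u = 1 + 0j                              # U initialized to (1, 0)
    factors = []
    for _ in range(types):
        factors.append(u)
        u = u * w                           # one complex multiplication
    return factors

stages = 3   # N = 2**3 = 8
u_times_w_count = sum(len(stage_twiddles(k)) for k in range(1, stages + 1))
last_stage = stage_twiddles(3)
last_stage_error = max(abs(last_stage[k] - cmath.exp(-2j * cmath.pi * k / 8))
                       for k in range(4))
```

For m = 3 the product U * W is formed 1 + 2 + 4 = 7 times, so Eqs (3.87) and (3.88) give 28 real multiplications and 14 real additions.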
The real operations required to compute the sine and
cosine lookup tables for the radix-3 and 5 algorithms are
less complex than for the radix-2 FFT. In these algorithms
the difference equation from Section 3.1 is used to compute
sine and cosine lookup tables which have length N. Because
of the symmetry sin(k) = -sin(-k), only N/2 computations
of the difference equations are required. The equations
are given by:
WKC(I) = C * WKC(I-1) - S * WKS(I-1) + WKC(I-1)
WKS(I) = C * WKS(I-1) + S * WKC(I-1) + WKS(I-1)
which need a total of 4 real multiplications and 10 additions
to compute. For an N length sequence computing the lookup
tables requires:
\[ \text{real mult} = 4(N/2) = 2N \tag{3.90} \]
\[ \text{real add} = 10(N/2) = 5N \tag{3.91} \]
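
The difference equations can be exercised in a few lines. Section 3.1 is not reproduced here, so the sketch below assumes the usual form of this recurrence, with C = -2 sin²(θ/2) and S = sin θ for θ = 2π/N; those definitions, the starting values, and the function name `trig_tables` are assumptions of this illustration rather than details confirmed by the text.

```python
import math

def trig_tables(n_points):
    """Cosine and sine tables for angles 2*pi*I/N, I = 0..N/2, built with
    the difference equations for WKC and WKS."""
    theta = 2.0 * math.pi / n_points
    c = -2.0 * math.sin(theta / 2.0) ** 2   # assumed: C = cos(theta) - 1
    s = math.sin(theta)                     # assumed: S = sin(theta)
    wkc = [1.0]                             # cos(0)
    wks = [0.0]                             # sin(0)
    for i in range(1, n_points // 2 + 1):
        wkc.append(c * wkc[i - 1] - s * wks[i - 1] + wkc[i - 1])
        wks.append(c * wks[i - 1] + s * wkc[i - 1] + wks[i - 1])
    return wkc, wks

n_points = 243
wkc, wks = trig_tables(n_points)
table_error = max(
    max(abs(wkc[i] - math.cos(2.0 * math.pi * i / n_points)),
        abs(wks[i] - math.sin(2.0 * math.pi * i / n_points)))
    for i in range(len(wkc)))
```

Each pass through the loop uses four real multiplications, which is the source of the 4(N/2) = 2N count in Eq (3.90).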
3.3.4 Number of Real Operations in Radix-p FFTs.
Based on the general expressions in Eqs (3.82) through
(3.91) the total number of real multiplications can be
determined given $N = p^m$ where N, p, and m are integers.
First, each radix-p butterfly computation requires
multiplications or additions or both to be evaluated. The
exact number of multiplies and adds is determined from the
FORTRAN code as shown below. Second, each complex twiddle
factor multiplication requires 4 real multiplications and
2 real additions. Third, the number of real operations to
compute the sines and cosines is added to the butterflies
and twiddle factors to give the total operations count for
each algorithm.
For the case of $N = 2^m$ it was shown in the radix-2
Section 3.2.1 that the radix-2 butterfly can be computed with
4 real additions and no multiplications. This radix-2 FFT
does not eliminate all multiplications by $W^0$. Therefore
each radix-2 butterfly is multiplied
by a complex twiddle factor as shown in Figure 3.8. For this
particular radix-2 FFT the number of twiddle factors equals
the number of butterflies. Combining all sources of real
operations for this radix-2 FFT gives a total of:
real mult = (# mult per butterfly) * (# butterflies)
+ 4 (# twiddle factors)
+ 4 (# types of twiddle factors) (3.92)
Substituting the appropriate values for the radix-2 gives:
\[ \text{real mult} = (0)(mN/2) + 4(mN/2) + 4 \sum_{k=1}^{m} 2^{k-1} = 2mN + 4 \sum_{k=1}^{m} 2^{k-1} \tag{3.93} \]
Likewise for the number of real additions:
real adds = (# adds per butterfly) * (# butterflies)
+ 2 (# twiddle factors)
+ 2 (# types of twiddle factors) (3.94)
\[ \text{real adds} = 4(mN/2) + 2(mN/2) + 2 \sum_{k=1}^{m} 2^{k-1} = 3mN + 2 \sum_{k=1}^{m} 2^{k-1} \tag{3.95} \]
For the radix-p FFTs where p is an odd prime it has
been shown by Singleton, 1969, that those butterflies can
be evaluated using (p-1)^2 real multiplications. The
FORTRAN coded radix-3 and radix-5 in Appendices B and D
require 4 real multiplications and 12 additions for radix-3
butterflies and 16 real multiplications and 30 additions
for radix-5 butterflies. Using these in Eqs (3.87) and
(3.91) yields the total real operations for the radix-3 as:

    real mult = (4 mult per butterfly) * mN/3
                + 4(mN(3-1)/3 - (N-1)) + 2N
              = 4mN/3 + 8mN/3 - 4(N-1) + 2N
              = 4mN - 4(N-1) + 2N                           (3.96)

    real adds = (12 adds per butterfly) * mN/3
                + 2(mN(3-1)/3 - (N-1)) + 5N
              = 12mN/3 + 4mN/3 - 2(N-1) + 5N
              = 16mN/3 - 2(N-1) + 5N                        (3.97)
Similarly the real operations count for the radix-5 FFT
becomes:

    real mult = (16 mult per butterfly) * mN/5
                + 4(mN(5-1)/5 - (N-1)) + 2N
              = 16mN/5 + 16mN/5 - 4(N-1) + 2N
              = 32mN/5 - 4(N-1) + 2N                        (3.98)

    real adds = (30 adds per butterfly) * mN/5
                + 2(mN(5-1)/5 - (N-1)) + 5N
              = 30mN/5 + 8mN/5 - 2(N-1) + 5N
              = 38mN/5 - 2(N-1) + 5N                        (3.99)
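Eqs (3.96) through (3.99) share one structure: butterflies, complex twiddle factors, and a sine/cosine term. The sketch below evaluates that structure for an odd radix p. The function name and argument names are hypothetical, and the per-butterfly counts are passed in as assumptions taken from the text (4 and 12 for radix-3, 16 and 30 for radix-5).

```python
def fixed_radix_real_ops(p, m, mult_per_butterfly, adds_per_butterfly):
    """Real multiplies and adds for a fixed radix-p FFT of length N = p**m,
    following the pattern of Eqs (3.96) through (3.99)."""
    n = p ** m
    butterflies = m * n // p                      # mN/p, exact since p divides N
    twiddles = m * n * (p - 1) // p - (n - 1)     # complex twiddle factors
    real_mult = mult_per_butterfly * butterflies + 4 * twiddles + 2 * n
    real_adds = adds_per_butterfly * butterflies + 2 * twiddles + 5 * n
    return real_mult, real_adds


if __name__ == "__main__":
    print(9, fixed_radix_real_ops(3, 2, 4, 12))     # radix-3, N = 9
    print(25, fixed_radix_real_ops(5, 2, 16, 30))   # radix-5, N = 25
```

The 2N and 5N terms are the sine and cosine contributions as they appear in Eqs (3.96) through (3.99).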
The results of Eqs (3.92) through (3.99) are given in Table
3.1 for N between 8 and 16,000. This table also summarizes
the possible values of N for the fixed radix-2, 3, and 5
FFTs.

                          TABLE 3.1

    N      Radix   Multiplications   Additions   Trig. (heading illegible)
    8      2^3            76              86          3
    9      3^2            58             125          1
    16     2^4           188             222          4
    25     5^2           274             457          1
    27     3^3           274             515          1
    32     2^5           444             542          5
    64     2^6          1020            1278          6
    81     3^4          1138            1973          1
    125    5^3          2154            3227          1
    128    2^7          2300            2942          7
    243    3^5          4378            7211          1
    256    2^8          5116            6654          8
    512    2^9         11260           14846          9
    625    5^4         14754           20877          1
    729    3^6         15142           25517          1
    1024   2^10        24572           32766         10
    2048   2^11        53244           71678         11
    2187   3^7         56866           88211          1
    3125   5^5         93754          128127          1
    4096   2^12       114684          155646         12
    6561   3^8        196834          299621          1
    8192   2^13       245756          335870         13

3.3.5 Real Operations Count for the Radix-3 FFT
Using the Complex Cube Root of Unity. This algorithm
represents an alternative to the conventional radix-3 FFT.
It is shown in this section that selective use of this
algorithm can reduce the number of real operations
(depending on the sequence length N).
The radix-3 FFT in the R(u) field has four sources
of real multiplication when N = 3^m:

1. 2mN/3 - (N-1) complex twiddle factors (derived
   in Section 3.3.3).

2. Conversion from complex to R(u) of \sum_{i=2}^{m} 2(3^{i-1} - 1)
   twiddle factors derived from the FORTRAN code in
   Appendix C.

3. Conversion of the complex array of length N to the
   R(u) field derived from the FORTRAN code.

4. Conversion of the R(u) array of length N back to the
   complex field derived from the FORTRAN code.

The radix-3 in R(u) has five sources of real additions:

1. mN/3 butterflies derived in Section 3.3.3.

2. The four sources of real multiplies listed above.

Based on the FORTRAN code in Appendix C, there are three
real multiplications per complex twiddle factor, two per
twiddle factor conversion, two per conversion from complex
to the R(u) field, and two per conversion from R(u) to the
complex field. Condensing the above into an equation for
real multiplications yields:

    real mult = 3(2mN/3 - N + 1)
                + 2 \sum_{i=2}^{m} 2(3^{i-1} - 1) + 4N      (3.100)

There are 14 real additions per butterfly, six per
twiddle factor, one per twiddle factor conversion, one per
conversion to the R(u) array, and one per conversion to the
complex array. Expressing the total number of real additions
as a function of the above yields:

    real adds = 14(mN/3) + 6(2mN/3 - N + 1)
                + \sum_{i=2}^{m} 2(3^{i-1} - 1) + 2N        (3.101)
The results for the number of real multiplications and
additions for both radix-3 algorithms are given in Table 3.2
for N=27 to N=19683. Because the R(u) radix-3 requires more
multiplications and additions for N=27 and 81 it will always
run slower than the complex field radix-3 FFT. But, for
N=243 and higher the R(u) radix-3 may run faster depending
upon the speed of additions relative to multiplications for
the computer being used to perform the FFTs.
Table 3.2 also gives the "Add to Multiply Ratio"
required for the R(u) field radix-3 FFT to run faster than
the conventional radix-3 FFT. (The ratio is the difference
in the number of additions divided by the difference in
the number of multiplies.) For the case of N=729, a multiply
operation must take 3.77 times longer than an addition
before the R(u) field radix-3 can run faster than the com-
plex field radix-3. This means that prior to selecting
either of the algorithms the relative costs of additions
to multiplications must be known as well as the length of
the data sequence.
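The comparison between the two radix-3 algorithms can be reproduced from Eqs (3.96), (3.97), (3.100), and (3.101). The sketch below is illustrative only; the function names are hypothetical, and the complex-field counts omit the sine and cosine terms, as the footnote to Table 3.2 states.

```python
def complex_radix3_ops(m):
    """Complex-field radix-3 counts for N = 3**m, without the sine/cosine
    terms (Eqs (3.96) and (3.97) less the 2N and 5N terms)."""
    n = 3 ** m
    twiddles = 2 * m * n // 3 - (n - 1)
    return 4 * (m * n // 3) + 4 * twiddles, 12 * (m * n // 3) + 2 * twiddles


def ru_radix3_ops(m):
    """R(u)-field radix-3 counts for N = 3**m from Eqs (3.100) and (3.101)."""
    n = 3 ** m
    twiddles = 2 * m * n // 3 - (n - 1)
    conversions = sum(2 * (3 ** (i - 1) - 1) for i in range(2, m + 1))
    real_mult = 3 * twiddles + 2 * conversions + 4 * n
    real_adds = 14 * (m * n // 3) + 6 * twiddles + conversions + 2 * n
    return real_mult, real_adds


def add_to_multiply_ratio(m):
    """Extra additions divided by multiplications saved; None when the
    R(u) algorithm saves no multiplications."""
    c_mult, c_adds = complex_radix3_ops(m)
    r_mult, r_adds = ru_radix3_ops(m)
    if r_mult >= c_mult:
        return None
    return (r_adds - c_adds) / (c_mult - r_mult)


if __name__ == "__main__":
    print(729, ru_radix3_ops(6), round(add_to_multiply_ratio(6), 2))
```

A multiplication must cost more than this ratio, in units of one addition, before the R(u) algorithm can be the faster of the two.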
                          TABLE 3.2

    Comparison Between Complex and R(u) Radix-3 FFT
                  for Real Operations*

             Complex Radix-3           R(u) Radix-3         Add to
    N        Real Mult   Real Adds    Real Mult   Real Adds   Mult Ratio
    27           220        380          232         624        NA
    81           976       1568         1284        2562        NA
    243         3892       5996         3140        9796        5.05
    729        14584      21872        10912       35714        3.77
    2187       52492      77276        37152      126108        3.18
    6561      183712     266816       124628      435202        2.85
    19683     629860     905420       413308     1476212        2.63

    * Does not include computing sine and cosine terms

3.3.6 Memory Requirements for Fixed Radix FFTs. A
major consideration before selecting a particular FFT
algorithm is the sequence length and memory required to
execute the subroutine relative to the memory available
in the computer. For this reason the memory requirements
for the radix-2, 3, and 5 FFTs are given here as a function
of sequence length N. The program memory and the array
storage requirements for each algorithm are enumerated
below.
The program memory required by each routine was
determined from a "load map" generated by the command MAP,
PART. The array storage requirements were determined by
inspection of the DIMENSION statements in the FORTRAN code
for each subroutine listed in Appendices A to D. The
results are:

    FFT                Program    Arrays
    Radix-2              108      2N
    Radix-3              301      4N + m + 30
    Radix-3 in R(u)      396      4N + m + 30
    Radix-5              458      4N + m + 30
The memory arrays required for each algorithm as a
function of N are listed in Table 3.3. The program memory
was not included because it is dependent on machine word
size which varies from machine to machine.
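The array storage expressions above are simple enough to evaluate directly. The sketch below is an illustration only; the function name is hypothetical, and it assumes that the m in "4N + m + 30" is the exponent in N = p^m.

```python
def fixed_radix_array_words(radix, m):
    """Array storage (in words) for the fixed radix FFTs with N = radix**m,
    using the expressions listed above: 2N for radix-2 and 4N + m + 30
    for the radix-3, radix-3 in R(u), and radix-5 routines."""
    n = radix ** m
    if radix == 2:
        return 2 * n
    if radix in (3, 5):
        return 4 * n + m + 30
    raise ValueError("only radix 2, 3, and 5 are covered by this sketch")


if __name__ == "__main__":
    for radix, m in ((2, 4), (3, 2), (5, 2), (3, 6)):
        print(radix ** m, fixed_radix_array_words(radix, m))
```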
                          TABLE 3.3

                  Fixed Radix Memory Arrays

    N        Memory Arrays
    8             16
    9             68
    16            32
    25           132
    27           141
    32            64
    64           128
    81           358
    125          533
    128          256
    243         1007
    256          512
    512         1024
    625         2534
    729         2952
    1024        2048
    2048        4096
    2187        8785

3.4 Mixed Radix FFT Algorithms
Up to this point only fixed radix FFTs have been
discussed. Explanation and programming for the special
cases where N = 2^m or 3^m or 5^m are simpler than the general
case of N = p_1 p_2 ... p_m, and for most applications the
restricted choice of values is adequate. However, when the
application does not permit "zero packing" of the data
sequence to reach one of the special cases a wider choice
of N is needed.
Singleton has developed a mixed radix FFT algorithm
which is widely used. This algorithm is listed in
Appendix F. The International Mathematical and Statistical
Libraries (IMSL), available on the CDC Cyber 74 computer,
also has a mixed radix FFT (based on Singleton's work).
The author has also written and tested a mixed radix
algorithm, which is listed in Appendix E. The theory, digit
reversal, real operations count, and memory requirements for
these algorithms are discussed in the following sections.
3.4.1 Mixed Radix Theory. All FFT theory can be
developed by rearranging an N-point sequence as several
two-dimensional matrices and performing operations on these
matrices. Understanding this approach when exposed to it
for the first time is difficult. For this reason the
derivation is presented here, and then a specific example
is given to increase understanding of the theory. The
discrete Fourier transform is:

    X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}                   (3.102)

for k = 0, 1, ..., N-1, where X(k) and x(n) are both complex
valued. Eq (3.102) can be expressed as a matrix product:

    X = Tx

The matrix T can be decimated-in-time (Cooley and Tukey,
1965) or decimated-in-frequency (Gentleman and Sande, 1966)
to produce a factored form:

    T = P F_m ... F_2 F_1

where the decimation F_i corresponds to the factor n_i of
N = n_1 n_2 ... n_m and P is the permutation matrix
(Singleton, 1969). The matrix F_i has only n_i nonzero
elements in each row and column and can be partitioned into
N/n_i square submatrices of dimension n_i; it is this
partition that is the basis for these mixed radix algorithms
(Singleton, 1969). The matrices F_i can be further factored
into:

    F_i = R_i T_i                                           (3.103)

where R_i is the diagonal matrix of twiddle (rotation)
factors. Using these twiddle factors enables the
trigonometric symmetries and complex multipliers to be
exploited in the DFT butterflies, which reduces the number
of real operations and hence the computation time. An
example is now considered which uses the above ideas.
Given an N point sequence, the factors of N can be
taken in any possible combination. If N = 30, it can be
factored as 5*6 and then as 5*3*2. The first stage of this
decomposition and the next stage, from 5*6 to 5*3*2, are
shown as flowgraphs; the latter is Figure 3.18. Starting
with the DFT equation, the sequence can be partitioned into
a 5 by 6 matrix. With N = pq:

(Flowgraph figures for the N = 30 example, including
Figure 3.18, are not legible in this copy.)

    X(k) = \sum_{m=0}^{p-1} W_N^{mk} \sum_{r=0}^{q-1} x(pr+m) W_N^{prk}   (3.104)

Now the inner sums can be expressed as the q-point DFTs:

    C_m(k) = \sum_{r=0}^{q-1} x(pr+m) W_q^{rk}              (3.105)

since

    W_N^{prk} = exp(-j 2\pi prk/N) = exp(-j 2\pi prk/pq) = W_q^{rk}   (3.106)

Using p=5 and q=6 in Eq (3.104) produces:

    X(k) = \sum_{m=0}^{4} W_30^{mk} \sum_{r=0}^{5} x(5r+m) W_30^{5rk}   (3.107)

The inner sum in Eq (3.107) is a 6-point DFT which
can be decomposed into a 3 by 2 matrix by dividing the
sequences x(5r+m) into three sequences, each two points
long. The inner summation in Eq (3.107) can be represented
using the notation of Eq (3.104) as:

    C_m(k) = \sum_{s=0}^{p-1} W_N^{sk} \sum_{t=0}^{q-1} x(5(pt+s)+m) W_N^{ptk}   (3.108)

where now N=6, p=3, and q=2. Substituting these values
yields:

    C_m(k) = \sum_{s=0}^{2} W_6^{sk} \sum_{t=0}^{1} x(5(3t+s)+m) W_6^{3tk}   (3.109)

Combining Eqs (3.107) and (3.109) gives:

    X(k) = \sum_{m=0}^{4} W_30^{mk} \sum_{s=0}^{2} W_6^{sk} \sum_{t=0}^{1} x(5r+m) W_2^{tk}   (3.110)

where

    r = 3t + s
    W_6^{3tk} = exp(-j 2\pi 3tk/6) = exp(-j 2\pi tk/2) = W_2^{tk}
    m = 0, 1, 2, 3, 4
    s = 0, 1, 2
    t = 0, 1

The complete flowgraph is shown in Figure 3.18 and
implements Eq (3.110).
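The 5*3*2 decomposition of the 30-point DFT can be checked numerically against the direct DFT sum. The sketch below evaluates the nested sums of Eq (3.110) literally, with r = 3t + s; it is meant only to confirm the algebra and makes no attempt at efficiency. The helper names `w`, `dft_direct`, and `dft_30_nested` are hypothetical.

```python
import cmath


def w(n, e):
    """W_n**e with W_n = exp(-j*2*pi/n)."""
    return cmath.exp(-2j * cmath.pi * e / n)


def dft_direct(x):
    """Direct evaluation of the N-point DFT sum."""
    n = len(x)
    return [sum(x[i] * w(n, i * k) for i in range(n)) for k in range(n)]


def dft_30_nested(x):
    """30-point DFT evaluated as the nested 5 * 3 * 2 sums of Eq (3.110)."""
    assert len(x) == 30
    out = []
    for k in range(30):
        total = 0
        for m in range(5):
            inner = 0
            for s in range(3):
                # Two-point sum over t, with r = 3t + s and input index 5r + m.
                two_point = sum(x[5 * (3 * t + s) + m] * w(2, t * k)
                                for t in range(2))
                inner += w(6, s * k) * two_point
            total += w(30, m * k) * inner
        out.append(total)
    return out


if __name__ == "__main__":
    data = [complex(i % 7, (3 * i) % 5) for i in range(30)]
    err = max(abs(a - b) for a, b in zip(dft_direct(data), dft_30_nested(data)))
    print("max difference:", err)
```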
3.4.2 Digit Reversal Algorithm (General). The
permutation matrix P is required because the transformed
result is in digit reversed order. Given a factorization
N = n_1 n_2 ... n_m, the Fourier coefficient of index

    k = k_m n_{m-1} ... n_1 + ... + k_2 n_1 + k_1

is found in location

    k' = k_1 n_2 n_3 ... n_m + k_2 n_3 ... n_m + ... + k_m  (3.112)

The interchange of k with k' can be done "in place"
if N is factored such that (Singleton, 1977):

    n_i = n_{m-i+1}                                         (3.113)

for i < m-i+1. For this factoring k can be counted
in natural order and k' in digit reversed order, as described
for the fixed-radix algorithm's bit reversal.
To implement this technique for mixed radices N is
factored into its prime factors and the "square" factors
arranged symmetrically around the "square-free" factors
of N. For example, let N=270 and be factored as:

    3 * 2 * 3 * 5 * 3

Now the reordering, P, is factored into:

    P = P_1 P_2                                             (3.114)

The reordering P_1 is "associated with the square factors of
n and is done by pair interchanges as previously described,
except that the digits of n corresponding to the square-
free factors are held constant and the digits of the
square factors are exchanged symmetrically" (Singleton, 1977).
For example, if:

    N = n_1 n_2 n_3 n_4 n_5 n_6 n_7                         (3.115)

with n_1 = n_7, n_2 = n_6, and n_3, n_4, n_5 relatively prime,
the interchange associated with the square factors n_1, n_7,
n_2, and n_6 is given by:

    k  = k_7 n_6 n_5 ... n_1 + k_6 n_5 n_4 ... n_1 + k_5 n_4 n_3 n_2 n_1
         + k_4 n_3 n_2 n_1 + k_3 n_2 n_1 + k_2 n_1 + k_1    (3.116)

interchanged with:

    k' = k_1 n_6 n_5 ... n_1 + k_2 n_5 n_4 ... n_1 + k_5 n_4 n_3 n_2 n_1
         + k_4 n_3 n_2 n_1 + k_3 n_2 n_1 + k_6 n_1 + k_7    (3.117)

The reordering in this example leaves each element of
X(k) in the correct segment of length N/n_1 n_2, grouped in
subsequences of n_1 n_2 consecutive elements (Singleton,
1977). The next reordering P_2 then finishes the reordering
of the n_3 n_4 n_5 subsequences within each N/n_1 n_2 segment.
The above factorization is used in the Singleton and
IMSL mixed radix algorithms and generates a complicated
FORTRAN code. A simpler alternative factorization was
written by the author and used in his mixed radix algorithm.
The simpler algorithm requires an additional two arrays of
length N to store the intermediate results, which detracts
from the algorithm's utility when longer sequence lengths
are transformed. The details of this factorization are
presented in Appendix E for interested readers.
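The mapping from a natural-order index k to its digit reversed location k' can be sketched in a few lines. This is an illustration of the general digit reversal only; it is neither Singleton's in-place pair-interchange code nor the author's Appendix E routine, and the function name is hypothetical.

```python
def digit_reversed_index(k, factors):
    """Map k = k_m n_{m-1}...n_1 + ... + k_2 n_1 + k_1 to
    k' = k_1 n_2 n_3...n_m + k_2 n_3...n_m + ... + k_m,
    where factors = [n_1, n_2, ..., n_m]."""
    digits = []
    for n_i in factors:
        digits.append(k % n_i)   # extracts k_1 first, then k_2, ...
        k //= n_i
    k_rev = 0
    for d, n_i in zip(digits, factors):
        k_rev = k_rev * n_i + d  # Horner form of the sum defining k'
    return k_rev


if __name__ == "__main__":
    # With all factors equal to 2 this reduces to ordinary bit reversal.
    print([digit_reversed_index(k, [2, 2, 2]) for k in range(8)])
```

When the factors are symmetric (n_i = n_{m-i+1}) the mapping is its own inverse, which is what allows the reordering to be done by pair interchanges.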
3.4.3 Twiddle Factors. In Section 3.4.1 the factoring
into F_i was described corresponding to a factor n_i. F_i can
be factored to give a product R_i T_i where the matrix T_i is
one of N/n_i identical Fourier transforms of dimension n_i
and R_i is a diagonal twiddle factor matrix. The elements
of R_i are specified by the decimation-in-frequency version
of the FFT (Singleton, 1977).
The twiddle factor matrix R_i multiplies each transform
T_i of dimension n_i by exp(j\theta), where \theta is an angle
from the set:

    0, Z, 2Z, ..., (n_i - 1)Z                               (3.118)

and Z = 2\pi/N. No multiplication is needed for the zero
angle, which gives at most N(n_i - 1)/n_i complex
multiplications for each factor n_i.

(Eq (3.119) and the sentence introducing it are not legible
in this copy.)

This result is used in computing the number of
multiplications and additions required by an N length FFT.
3.4.4 Real Operations Count for Computing Sine
and Cosine Difference Equations. Recall from Section 3.1
that trigonometric values used in an FFT can be computed
from the difference equations:

    cos((k+1)a) = (C cos(ka) - S sin(ka)) + cos(ka)         (3.120)
    sin((k+1)a) = (C sin(ka) + S cos(ka)) + sin(ka)         (3.121)

where

    a = 2\pi/N radians
    C = -2 sin^2(a/2)
    S = sin(a)
    cos(0) = 1
    sin(0) = 0

In the case of the author's mixed radix FFT the
difference equations are computed N times and the sine and
cosine terms are stored in lookup tables. The difference
equations as coded are:

    WKC(I) = C * WKC(I-1) - S * WKS(I-1) + WKC(I-1)         (3.122)
    WKS(I) = C * WKS(I-1) + S * WKC(I-1) + WKS(I-1)         (3.123)

Since Eqs (3.122) and (3.123) are computed N times, the
real operations count is given by:

    real mult = 4N                                          (3.124)
    real adds = 10N                                         (3.125)
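The recursion of Eqs (3.120) and (3.121) can be illustrated by building the cosine and sine lookup tables and comparing them with library values. This is a sketch in Python rather than the author's FORTRAN; the function name is hypothetical, and the list names `wkc` and `wks` merely echo the WKC and WKS arrays.

```python
import math


def sine_cosine_tables(n):
    """Build cos(k*a) and sin(k*a), k = 0..n-1, a = 2*pi/n, using the
    difference equations (3.120) and (3.121)."""
    a = 2.0 * math.pi / n
    c = -2.0 * math.sin(a / 2.0) ** 2   # C = cos(a) - 1
    s = math.sin(a)                     # S = sin(a)
    wkc = [1.0] * n                     # cos(0) = 1
    wks = [0.0] * n                     # sin(0) = 0
    for i in range(1, n):
        wkc[i] = (c * wkc[i - 1] - s * wks[i - 1]) + wkc[i - 1]
        wks[i] = (c * wks[i - 1] + s * wkc[i - 1]) + wks[i - 1]
    return wkc, wks


if __name__ == "__main__":
    cos_tab, sin_tab = sine_cosine_tables(64)
    worst = max(abs(cos_tab[i] - math.cos(2 * math.pi * i / 64)) for i in range(64))
    print("largest cosine error:", worst)
```

Each pass through the loop uses four real multiplications; only two library calls (one sine for C and one for S) are needed for the whole table.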
The IMSL and Singleton FFTs do not use the sine and
cosine lookup tables in order to save memory arrays.
Instead the sine and cosine values are computed as needed
in the FFT program, resulting in an intricate FORTRAN code.
It was determined from the FORTRAN coded IMSL and Singleton
FFTs that both utilize the same method of computing the sine
and cosine difference equations. For this reason only the
Singleton FFT algorithm was studied.
An algorithm which computes the number of real
operations required was interpolated from "counters" placed
in the FFT FORTRAN code in Appendix F. They provided the
number of times that each section of the FFT subroutine
was used to compute the sine and cosine values for different
values of N. The names of the counters are listed below
along with the lines of FORTRAN code where they were
positioned. The lines of code are shown in Appendix F.

    I2C:    Counter for the radix-2 difference equation
            in lines 2330 - 2340.

    I2CL:   Counter for the radix-2 sine and cosine
            library calls (line numbers illegible).

    I4C1:   Counter for the radix-4 section which com-
            putes the sine and cosine terms of one W_N
            leg of the radix-4 butterfly in lines
            3030 - 3040. Refer to Figure 3.19, which
            shows the radix-4 butterfly flowgraph.

    I4C2:   Counter for the radix-4 section which computes
            the sine and cosine terms of the remaining W_N
            legs of the radix-4 butterfly flowgraph in
            lines 3140 - 3170.

    I4CL:   Counter for radix-4 sine and cosine library
            calls in lines 3690 - 3700.

    IGTF:   Counter for the general twiddle factors section
            in lines 4990 - 5000, which computes the sine
            and cosine for the W_N leg of the general radix-p
            FFT.

    IGTFE:  Counter for the general twiddle factors section
            which computes the sine and cosine for the
            remainder of the radix-p butterfly legs in
            lines 5170 - 5190.

    IGTFL:  Counter for the general radix-p sine and cosine
            library calls in lines 5290 - 5300.

(Figure 3.19. Radix-4 butterfly flowgraph; not reproduced.)
Data was collected for over 70 values of N using these
counters. A subset of the values were the 59 permissible
sequence lengths of PFA and WFTA. Based on the results of
these tests and study of the FORTRAN code FFT in Appendix F
the general expressions for these counters were determined.
Given that:

    N        = sequence length
    NFAC(i)  = factors of N (as factored by the Singleton
               subroutine)
    M        = number of factors of N
    KSPAN_i  = N/(NFAC(1) * NFAC(2) * ... * NFAC(i-1))

then

    I2C_i = (KSPAN_i - 3)/2    for KSPAN_i > 4 and odd
    I2C_i = (KSPAN_i - 2)/2    for KSPAN_i >= 4 and even
    I2C_i = 0                  for KSPAN_i < 4

For the counter I2C the expression for all factors of 2
in N becomes:

    I2C = \sum_i I2C_i   (over the factors of 2 in N)       (3.126)

The expression for the number of sine and cosine calls
during computation of a factor of 2 is a truncated quotient
of KSPAN_i (the divisor is illegible in the source), where
[ ] represents truncation of the result inside the brackets.
Using the "truncation" notation, I2CL is the sum of these
truncated quotients over the factors of 2 in N (3.127).
The radix-4 section uses the same notational conven-
tions for KSPAN and truncation. The expressions for
I4C1, I4C2, and I4CL become:

    I4C2_i = KSPAN_i - 1                                    (3.128)
    I4CL_i = [KSPAN_i/32]                                   (3.129)
    I4C1_i = I4C2_i - I4CL_i                                (3.130)

For all factors of 4 in N the expression becomes:

    I4C2 = \sum_{i=1}^{k} (KSPAN_i - 1)                     (3.131)
    I4CL = \sum_{i=1}^{k} [KSPAN_i/32]                      (3.132)
    I4C1 = I4C2 - I4CL                                      (3.133)

where there are k factors of 4 in N.
The general expressions for IGTF, IGTFE, and IGTFL were
also determined.

(Eqs (3.134) through (3.136), which give IGTFL_i, IGTF_i,
and IGTFE_i, are not legible in this copy.)

The result for the general radix-p section gives:

    IGTF  = \sum_{i=1}^{k} IGTF_i                           (3.137)
    IGTFE = \sum_{i=1}^{k} IGTFE_i                          (3.138)
    IGTFL = \sum_{i=1}^{k} IGTFL_i                          (3.139)
Eqs (3.124) through (3.139) were programmed in FORTRAN and
then tabulated as a function of N in Table 3.4. These
results identically match the tests conducted using the
counters.
Examining the FORTRAN code where the counters were
located gives the number of operations performed each time
one of the counters was incremented. These results are
presented in Table 3.5 for all the counters. The number
of real operations, sine and cosine library calls, and
exponentiations can be determined for all N length sequences:

    KADD  = 4(I2C + I4C2 + IGTF) + 3(I4C1) + 2(IGTFE)       (3.140)
    KMULT = 4(I2C + I4C2 + IGTF + IGTFE) + 6(I4C1)          (3.141)
    KEXP  = 2(I4C1)                                         (3.142)
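Eqs (3.140) through (3.142) are weighted sums of the counters, with the weights being the per-increment operation counts of Table 3.5 as far as they can be read from the source. The sketch below shows the arithmetic only; the function name is hypothetical, and the counter values in the example are invented for illustration and are not measurements from the thesis.

```python
def sine_cosine_overhead(i2c, i4c1, i4c2, igtf, igtfe):
    """Real adds, real multiplies, and exponentiations used by the
    recursive sine/cosine computation, from Eqs (3.140)-(3.142)."""
    kadd = 4 * (i2c + i4c2 + igtf) + 3 * i4c1 + 2 * igtfe
    kmult = 4 * (i2c + i4c2 + igtf + igtfe) + 6 * i4c1
    kexp = 2 * i4c1
    return kadd, kmult, kexp


if __name__ == "__main__":
    # Hypothetical counter values, chosen only to exercise the formulas.
    print(sine_cosine_overhead(i2c=1, i4c1=1, i4c2=1, igtf=1, igtfe=1))
```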
(Table 3.4, the counter values tabulated as a function of N,
is not legible in this copy.)

                          TABLE 3.5

              Real    Real    Exponen-    Sine     Cosine
    Counter   Adds    Mult    tiation     Calls    Calls
    I2C         4       4        0          0        0
    I2CL        0       0        0          1        1
    I4C1        3       6        2          0        0
    I4C2        4       4        0          0        0
    I4CL        0       0        0          1        1
    IGTF        4       4        0          0        0
    IGTFE       2       4        0          0        0
    IGTFL       0       0        0          1        1

3.4.5 Real Operations Count for Mixed Radix FFTs. The
real operations count is derived from the number of complex
twiddle factors, the number of butterflies, and the number
of sine and cosine terms required by the difference equations.
Given that N is factored as:

    N = p_1 p_2 ... p_m                                     (3.143)

the number of twiddle factors has been shown (Singleton,
1969) to be:

    \sum_{i=1}^{m} (N(p_i - 1)/p_i) - (N-1)                 (3.144)

where m is the total number of factors of N. The number of
butterflies required for an N length sequence is given by:

    \sum_{i=1}^{m} (N/p_i)                                  (3.145)
The total real operations count is determined by adding (a)
the number of real multiplications and additions required
per butterfly times Eq (3.145), plus (b) the complex twiddle
factor multiplications times Eq (3.144), plus (c) the number
of additions and multiplications given by Eq (3.140) and
(3.141).
Assuming a complex multiplication requires four real
multiplications and two additions a general expression for
the real operations count can be determined for the mixed
radix FFTs.
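The twiddle factor and butterfly counts of Eqs (3.144) and (3.145) depend only on the list of factors. A minimal sketch follows; the function name is hypothetical.

```python
def twiddle_and_butterfly_counts(factors):
    """Complex twiddle factors (Eq 3.144) and butterflies (Eq 3.145)
    for N equal to the product of the given factors p_1 ... p_m."""
    n = 1
    for p in factors:
        n *= p
    twiddles = sum(n * (p - 1) // p for p in factors) - (n - 1)
    butterflies = sum(n // p for p in factors)
    return twiddles, butterflies


if __name__ == "__main__":
    print(twiddle_and_butterfly_counts([5, 3, 2]))   # N = 30
```

For N = 30 = 5*3*2 this gives 30 complex twiddle factors and 31 butterflies. For a prime N the twiddle count is zero, as expected for a single stage.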
Singleton's mixed radix algorithm contains special
transform sections for factors of 2, 3, 4, and 5 as well as
a general section for other odd factors. This requires
that N be represented as:

    N = 2^r 3^s 4^t 5^u p_1^{m_1} p_2^{m_2} ... p_k^{m_k}   (3.146)

The IMSL mixed radix FFT (FFTCC) does not have a special
section for factors of 5 and uses the general section to
transform these factors. The author's mixed radix FFT (FFTMR)
has sections for 2, 3, 4, and 5 but does not have the general
transform section. Only the detailed development of the oper-
ations count for Singleton's algorithm is presented here
because the other two algorithms are subsets thereof. The
general expressions for real operations versus N are given
for the other two algorithms in Appendix G and H.
The radix-2 section of the FORTRAN code for Singleton's
algorithm is shown in Figure 3.20. For factors of two the
twiddle (rotation) factor complex multiplications are com-
puted in this section rather than the "general rotation
section" to reduce the array indexing required. Using
Eq (3.145) the total number of butterflies is rN/2 and from
Eq (3.144) the total number of twiddle factors is rN/2
(neglecting the -(N-1) term which will be subtracted once
the complete real operations count for all factors has been
developed). The transform for a factor of 2 (refer to
Figure 3.20) is computed in lines 2200-2230 using 4 real
additions if no twiddles are required, or it is computed
in lines 2450-2500 if twiddles are necessary. The general
expression for factors of two becomes:

    real mult = 4(rN/2) = 2rN                               (3.147)
    real adds = 4(rN/2) + 2(rN/2) = 3rN                     (3.148)
(Figure 3.20. Radix-2 Section of Singleton's FFT; FORTRAN
listing not legible in this copy.)

(Figure 3.21. Radix-3 Section of Singleton's FFT; FORTRAN
listing not legible in this copy.)

The factors of 3 section shown in Figure 3.21 performs
only the butterfly in this section and uses the general
rotation (twiddle) section to twiddle the data (the general
twiddle factor section is shown in Figure 3.24). Using
Eqs (3.144) and (3.145) the number of butterflies for
factors of 3 is sN/3 and the number of complex twiddles is
s(2N/3). Examining lines 2760-2870 in Figure 3.21 shows
4 real multiplications and 12 real additions. Each complex
twiddle requires 4 real multiplications and 2 real additions.
The expression for the factors of 3 section becomes:
real mult = 4(N/3)s + 4(2/3)Ns
= 4sN (3.149)
real adds = 12(N/3)s + 2(2/3)Ns
= 16sN/3 (3.150)
The factors of 4 section in Figures 3.22a and b includes
the twiddles in the butterfly section to minimize array
indexing. The number of butterflies computed for t factors
of 4 is tN/4 and the number of complex twiddles is t(3N/4)
from Eqs (3.144) and (3.145). From lines 3210-3320 and
3540-3570 the number of real additions per butterfly is 16.
Every complex twiddle requires 4 real multiplications and
2 additions. Combining the butterfly and twiddle operations
results in the general expression for factors of 4:

    real mult = 4(3N/4)t = 3tN                              (3.151)
    real adds = 2(3N/4)t + 16(N/4)t
              = 3tN/2 + 8tN/2 = 11tN/2                      (3.152)
(Figures 3.22a and 3.22b. Radix-4 Section of Singleton's FFT;
FORTRAN listings not legible in this copy.)

(Figure 3.23. Radix-5 Section of Singleton's FFT; FORTRAN
listing not legible in this copy.)

The transform section for factors of 5 shown in Figure
3.23 computes the butterflies for the u factors of 5. There
are uN/5 butterflies and u(4N/5) complex twiddles based on
Eqs (3.144) and (3.145). Examination of lines 3?20-4090
in Figure 3.23 shows 16 real multiplications and 32 real
additions are required per butterfly. Combining the
butterfly and complex twiddle operations provides the
general expression for real operations for factors of 5:
real mult = 16(N/5)u + 4(4N/5)u
= 32uN/5 (3.153)
real adds = 32(N/5)u + 2(4N/5)u
= 8uN (3.154)
where u is the number of factors of 5 in N.
The general transform section for odd prime factors
is more complex than the special factors sections. To
aid in describing the number of real operations a p-radix
is defined such that p is an odd prime greater than 5 with
an associated "m_i" integer power. The real operations
count for the general section does not include additions
associated with array indexing nor does it count multi-
plications and additions needed to recursively compute the
sine and cosine terms.
Based on the FORTRAN program for the odd factors shown
in Figure 3.24a and b there are five sources of real
operations for each p_i factor. The first source shown in
lines 4310-4360 is computing the (p_i - 1)/2 complex multi-
pliers for the butterfly legs which require:

    real mult = 4(p_i - 1)/2 = 2(p_i - 1)                   (3.155)
    real adds = 2(p_i - 1)/2 = (p_i - 1)                    (3.156)
(Figures 3.24a and 3.24b. General Factor Section of
Singleton's FFT; FORTRAN listings not legible in this copy.)

These multipliers are computed only once for each
factor p_i. For example, if N = 28 = 7*4, the factor 7
requires (7-1)/2 complex multipliers. If N = 196 = 7*7*4
there are still only (7-1)/2 complex multipliers needed.
The second source of real operations is produced by
computing the butterfly transmittances which require only
real additions. From Eq (3.145) there are (m_i)N/p_i
butterflies required for the (m_i) factors of p_i. For
each butterfly there are (p_i - 1)/2 transmittances which
require only real additions. Examining lines 4470-4540
in Figure 3.24a shows that each of the (p_i - 1)/2
transmittances requires 6 additions. Combining these results
produces the general expression for the real additions:

    real adds = (6(p_i - 1)/2)(m_i)N/p_i
              = 3N(m_i)(p_i - 1)/p_i                        (3.157)

The third source of operations is produced by the
(p_i - 1)^2/4 butterfly transmittances which require real
multiplications and additions. Lines 4510-4750 in
Figure 3.24b show there are 4 real multiplications and
4 real additions needed. Combining this with the number
of transmittances and butterflies gives:

    real mult = 4((m_i)N/p_i)((p_i - 1)^2/4)
              = (m_i)N(p_i - 1)^2/p_i                       (3.158)
    real adds = (m_i)N(p_i - 1)^2/p_i                       (3.159)
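The fourth source of real operations (its description is
not legible in this copy) is found in Figure 3.24b;
lines 4800-4830 show that this function requires 4 real
additions. Combining these results gives the total as:

    real adds = ((m_i)N/p_i) 4 (p_i - 1)/2
              = 2(m_i)N(p_i - 1)/p_i                        (3.160)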
The final source of real operations is shown in
Figure 3.24b lines 5120-5140 which performs the complex
twiddle multiplications. From Eq (3.144) there are
(m_i)N(p_i - 1)/p_i complex twiddles which provide the general
expression:

    real mult = 4(m_i)N(p_i - 1)/p_i                        (3.161)
    real adds = 2(m_i)N(p_i - 1)/p_i                        (3.162)

Combining Eqs (3.155) through (3.162) gives the expression
for the real operations in the general odd factors section:

    real mult = \sum_{i=1}^{k} ( 2(p_i - 1) + (m_i)N(p_i - 1)^2/p_i
                                 + 4(m_i)N(p_i - 1)/p_i )   (3.163)

    real adds = \sum_{i=1}^{k} ( (p_i - 1) + 3N(m_i)(p_i - 1)/p_i
                                 + (m_i)N(p_i - 1)^2/p_i
                                 + 2(m_i)N(p_i - 1)/p_i
                                 + 2(m_i)N(p_i - 1)/p_i )

              = \sum_{i=1}^{k} ( (p_i - 1) + 7N(m_i)(p_i - 1)/p_i
                                 + (m_i)N(p_i - 1)^2/p_i )  (3.164)
Assuming that the sequence can be factored into
N = 2^r 3^s 4^t 5^u p_1^{m_1} p_2^{m_2} ... p_k^{m_k} the
expressions for the total number of real operations can be
written using Eqs (3.140) through (3.164) as:

    real mult = 2rN + 4sN + 3tN + 32uN/5
                + \sum_{i=1}^{k} ( 2(p_i - 1) + (m_i)N(p_i - 1)^2/p_i
                                   + 4(m_i)N(p_i - 1)/p_i )
                - 4(N-1) + KMULT                            (3.165)

    real adds = 3rN + 16sN/3 + 11tN/2 + 8uN
                + \sum_{i=1}^{k} ( (p_i - 1) + 7N(m_i)(p_i - 1)/p_i
                                   + (m_i)N(p_i - 1)^2/p_i )
                - 2(N-1) + KADD                             (3.166)

Notice that Eqs (3.165) and (3.166) have the corresponding
4(N-1) and 2(N-1) real operations subtracted from the total
multiplications and additions because the first stage of any
FFT decimation-in-time does not require the "twiddle factors"
(likewise with the last stage of an FFT decimation-in-
frequency). These equations also include KADD and KMULT
which are the real operations required to compute the
recursive sine and cosine difference equation.
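Eqs (3.165) and (3.166) can be evaluated for any admissible factorization. The sketch below is an illustration under stated assumptions: the function name and the `odd_factors` dictionary (mapping each odd prime p_i greater than 5 to its power m_i) are hypothetical, and KMULT and KADD are left as caller-supplied values because they come from the counter expressions rather than from a closed form.

```python
def singleton_real_ops(r, s, t, u, odd_factors=None, kmult=0, kadd=0):
    """Real multiplies and adds from Eqs (3.165) and (3.166) for
    N = 2**r * 3**s * 4**t * 5**u * prod(p_i**m_i)."""
    odd_factors = odd_factors or {}
    n = 2 ** r * 3 ** s * 4 ** t * 5 ** u
    for p, m in odd_factors.items():
        n *= p ** m
    # Special sections for factors of 2, 3, 4, and 5; each division is
    # exact whenever the corresponding exponent is nonzero.
    mult = 2 * r * n + 4 * s * n + 3 * t * n + 32 * u * n // 5
    adds = 3 * r * n + 16 * s * n // 3 + 11 * t * n // 2 + 8 * u * n
    # General section for odd primes greater than 5.
    for p, m in odd_factors.items():
        mult += 2 * (p - 1) + m * n * (p - 1) ** 2 // p + 4 * m * n * (p - 1) // p
        adds += (p - 1) + 7 * m * n * (p - 1) // p + m * n * (p - 1) ** 2 // p
    mult += kmult - 4 * (n - 1)
    adds += kadd - 2 * (n - 1)
    return n, mult, adds


if __name__ == "__main__":
    print(singleton_real_ops(0, 0, 0, 0, {7: 1}))   # prime length N = 7
    print(singleton_real_ops(3, 0, 0, 0))           # N = 8 as three factors of 2
```

For a prime length the twiddle term and the -4(N-1) correction cancel, leaving only the butterfly operations.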
Similar expressions and derivations were performed
for the IMSL FFT and the author's FFT but due to the
redundancy they were derived in Appendices G and E
respectively. The general expression for real operations
required by the IMSL mixed radix FFT (where
N = 2^r 3^s 4^t p_1^{m_1} p_2^{m_2} ... p_k^{m_k}) is given by:

    real mult = 2rN + 4sN + 3tN
                + \sum_{i=1}^{k} ( 2(p_i - 1) + 4(m_i)N(p_i - 1)/p_i
                                   + (m_i)N(p_i - 1)^2/p_i )
                - 4(N-1) + KMULT                            (3.167)

    real adds = 3rN + 6sN + 11tN/2
                + \sum_{i=1}^{k} ( (p_i - 1) + 8(m_i)N(p_i - 1)/p_i
                                   + N(m_i)(p_i - 1)^2/p_i )
                - 2(N-1) + KADD                             (3.168)

where KMULT and KADD are the multiplies and adds needed
to compute the sine and cosine terms. The general expression
for real operations required by the author's mixed radix
FFT (where N = 2^r 3^s 4^t 5^u) is given by:

    real mult = 2rN + 4sN + 3tN
                + 32uN/5 - 4(N-1) + 4N                      (3.169)
    real adds = 3rN + 16sN/3
                + 11tN/2 + 8uN - 2(N-1) + 10N               (3.170)
The real operations count for Singleton's mixed radix
FFT is shown for N up to 200 in Figures 3.25 and 3.26. The
operations count plotted includes only the additions and
multiplications for the butterfly and twiddle factors in
order to demonstrate the N^2 "upper bound" and the N log2 N
"lower bound". The N^2 upper bound occurs in the mixed
radix FFTs when a prime number must be transformed. The
N log2 N lower bound is reached when N = 2^m. In between the
N^2 and N log2 N bounds there are other "bounds" which are
observed in Figure 3.25. The dashed lines represent numbers
which are not primes, but are not highly factorable either.
The dashed line approaches N log2 N as N becomes more
factorable.

(Figures 3.25 and 3.26, plots of real operations count versus
N for Singleton's mixed radix FFT, are not reproduced; the
axis labels and captions are not legible in this copy.)
The relative efficiency of radix 2, 3, 4 and 5 FFTs
is observed in Figures 3.27 and 3.28. These figures plot
real operations counts for the mixed radix FFT for N less
than 250 (where N is divisible by 2, 3, 4 and 5 only) and
annotate the integer powers of 2, 3, 4 and 5. Notice that
the fixed radix-2 and 4 provide the "lower bound" and the
radix-3 and 5 provide the "upper bound" on the number of
real operations which shows that integer powers of 2 and 4
require the least number of real operations and radix-3
and 5 the most. Other combinations of factors, i.e.,
N=120=5*4*3*2, have real operations counts which fall
between the "bounds".
3.4.6 Memory Requirements for Mixed Radix FFTs.
As in the case of fixed radix algorithms, a major
consideration in selecting a particular mixed radix
algorithm is the memory required to execute the FFT
subroutine given the memory storage limitations of the
computer to be used. The memory requirements for the three
mixed radix FFTs are given here as a function of the
sequence length N. Each algorithm has program and memory
array requirements which are listed below.
All the algorithms were compiled on the CDC Cyber
system at AFIT and the program memory required by each
subroutine was determined from a "load map" generated by the
command MAP, PART. This load map gives the size of all
programs used during execution. The array storage
requirements were determined from the FORTRAN coded programs
and reference material provided with the IMSL and Singleton
FFT subroutines. The general expression for memory
requirements for each FFT subroutine (as a function of N) is
given below.
The subroutine written by the author requires 899
words of program memory. This subroutine (FFTMR) also
requires the "calling" program to dimension 6 arrays
(A, B, AT, BT, WKS, and WKC) to length N. (Use of these
arrays is explained in Appendix E.) This gives the total
memory array required as:
    FFTMR memory = 6N                                  (3.171)
The mixed radix subroutine written by Singleton
(FFTSNG) requires 1100 words of program memory. Four arrays
(AT, BT, CK, SK) are dimensioned to equal the maximum prime
factor of N. If there are no prime factors greater than 5
these arrays may be reduced to 1. A fifth array (NP) is
dimensioned to at least one less than the product K of the
square-free factors (see Glossary) of N. If N contains at
most one square-free factor this array can be reduced to
M + 1 where M is the maximum number of prime factors of
N. Two more arrays (XR and XI) are dimensioned to length
N. The total memory array storage becomes:
    FFTSNG memory = 2N + 4 MAXPF + (K-1 or M+1)        (3.172)
where
    N     = Sequence length
    MAXPF = Maximum prime factor of N
    K     = Product of square-free factors
    M     = Maximum number of prime factors
NOTE: K-1 or M+1 is selected in Eq (3.172) based
on the number of square-free factors of N as described
in the preceding paragraph.
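The selection rule in Eq (3.172) can be sketched in a few lines. This is an illustration under stated assumptions, not part of FFTSNG: the function names are hypothetical, a "square-free factor" is taken to be a prime that occurs an odd number of times in N, and the optional reduction of AT, BT, CK and SK to length 1 is not applied.

```python
def prime_factors(n):
    """Prime factors of n (n >= 2) with multiplicity, in ascending order."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def fftsng_array_words(n):
    """Array storage in words for FFTSNG, following Eq (3.172)."""
    factors = prime_factors(n)
    maxpf = max(factors)                 # MAXPF
    m = len(factors)                     # M, counted with multiplicity
    # Assumption: square-free factors are the primes of odd multiplicity.
    square_free = [p for p in set(factors) if factors.count(p) % 2 == 1]
    k = 1
    for p in square_free:
        k *= p                           # K, product of square-free factors
    # NP is K-1 long, or M+1 when there is at most one square-free factor.
    np_len = k - 1 if len(square_free) > 1 else m + 1
    return 2 * n + 4 * maxpf + np_len


print(fftsng_array_words(2100))
```

For N = 2100 = 2^2 · 5^2 · 3 · 7 the sketch takes MAXPF = 7, K = 21 and M = 6, and so selects K-1 = 20 for the last term.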
The mixed radix subroutine (FFTCC) provided as part
of the IMSL package on the CDC Cyber system requires 1061
words of program memory. A complex array (A) must be
dimensioned to length N and two other arrays (IWK and WK)
are dimensioned to length "IWORD", where:
    IWORD = 3M + 3 + MAX(4M + 7 + 6K, KB + 1 + 2JK)    (3.173)
To define the quantities M, K, KB and JK a prime factor
decomposition of N is required such that:
    N = f_1^2 f_2^2 \cdots f_{KT}^2 f_{KT+1} \cdots f_{KT+JT}
where each f_i is a prime number (other than 1) and
f_i \ne f_r given that:
    i, r \ge KT + 1,  i \ne r
    KT \ge 0;  JT \ge 0
Then:
    M = 2KT + JT                                       (3.174)
is the number of prime factors in N and:
    K = \max(f_j),  1 \le j \le KT + JT                (3.175)
is the largest prime factor of N. KB and JK are defined
as follows:
    JK = 1 \cdot f_1 \cdot f_2 \cdots f_{KT}           (3.176)
where JK = 1 if KT = 0 and
    KB = N/(JK)^2 - 2                                  (3.177)
Once M, K, JK, and KB are determined they are substituted
into Eq (3.173) to determine the value of IWORD, the actual
work storage requirement. Counting only the arrays for the
work vectors (IWK and WK) and the data arrays (A and B)
gives the total array memory required for the IMSL FFT:
    Memory = 2N + 2 IWORD                              (3.178)
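Eqs (3.173) through (3.178) can be chained into a short calculation. The sketch below is illustrative only: the function names are hypothetical and are not IMSL routines, and it assumes that a prime occurring e times contributes e // 2 squared factors to JK.

```python
from collections import Counter


def prime_factors(n):
    """Prime factors of n (n >= 2) with multiplicity, in ascending order."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def fftcc_iword(n):
    """Work vector length IWORD from Eq (3.173)."""
    counts = Counter(prime_factors(n))
    m = sum(counts.values())             # Eq (3.174): M = 2*KT + JT
    k = max(counts)                      # Eq (3.175): largest prime factor
    jk = 1
    for p, e in counts.items():
        jk *= p ** (e // 2)              # Eq (3.176): product of squared factors
    kb = n // (jk * jk) - 2              # Eq (3.177)
    return 3 * m + 3 + max(4 * m + 7 + 6 * k, kb + 1 + 2 * jk)


def fftcc_array_words(n):
    """Array storage in words from Eq (3.178): 2N + 2*IWORD."""
    return 2 * n + 2 * fftcc_iword(n)


print(fftcc_iword(2100))
```

For N = 2100 the sketch finds M = 6, K = 7, JK = 10 and KB = 19, which gives IWORD = 94.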
An example of N=2100 is used to demonstrate the use
of Eqs (3.172) through (3.178) in computing the memory array
required by the IMSL and Singleton subroutines. For N=2100
the factors are 2^2 · 5^2 · 3 · 7 for which FFTSNG memory
becomes:
    N     = 2100        sequence length
    MAXPF = 7           maximum prime factor in N
    K     = 3 · 7 = 21  product of the square-free factors
    M     = 6           maximum number of prime factors
Using Eq (3.172) the expression for FFTSNG memory array
is given by
    2 · 2100 + 4 · 7 + (20 or 7) = 4248                (3.179)
NOTE: There are two square-free factors, 3 and 7;
therefore choose 20 for the last term of Eq (3.179).
If this subroutine were used on the Cyber 74 computer, the
program memory is added to the memory array to give a
total memory of:
    memory = 4248 + 1100 = 5348 words                  (3.180)
The same example of N=2100 is applied to the IMSL
memory equation where:
    N = f_1^2 f_2^2 \cdots f_{KT}^2 f_{KT+1} \cdots f_{KT+JT}
      = 2^2 · 5^2 · 3 · 7 = 2100                       (3.181)
From Eq (3.174) the expression for M becomes:
    M = 2 KT + JT = 2 · 2 + 2 = 6                      (3.182)
which is the number of prime factors in N. The largest
prime factor in N is given by Eq (3.175):
    K = \max(f_j) = 7,  1 \le j \le KT + JT            (3.183)
JK, which is the product of the "square-factors", is:
    JK = 1 \cdot f_1 \cdot f_2 \cdots f_{KT} = 2 · 5 = 10   (3.184)
and KB is
    KB = N/(JK)^2 - 2 = 2100/100 - 2 = 19              (3.185)
The results of Eqs (3.181) through (3.185) provide the
size of the work vector IWORD given by Eq (3.173):
    IWORD = 3M + 3 + MAX(4M + 7 + 6K, KB + 1 + 2JK)
          = 18 + 3 + MAX(24 + 7 + 42, 19 + 1 + 20)
          = 21 + MAX(73, 40) = 94
Substituting IWORD=94 and N=2100 into Eq (3.178) gives the
memory array for FFTCC as:
    2N + 2 IWORD = 4200 + 188 = 4388                   (3.186)
Using this subroutine on the Cyber 74 computer requires
1061 words of program memory which makes the total memory
required equal to:
    4388 + 1061 = 5449 words                           (3.187)
For this length N=2100 sequence the Singleton FFTSNG used
less memory (5348) than the IMSL FFTCC (5449).
The array memory requirements given by Eqs (3.172) and
(3.178) are plotted in Figures 3.29 and 3.30 for N less than
200. It is readily observed that selective adjustment of N
to be highly factorable (composite) minimizes the memory
required by subroutines FFTCC or FFTSNG. As an example of
how prime numbers increase the memory array sizes, consider
N = 2099 for each algorithm. For FFTSNG the variables are
MAXPF = 2099, K = 2099, and M = 1. Since N = 2099 contains
only one square-free factor the array NP can be dimensioned
to M+1=2. The memory array for FFTSNG becomes:
    2N + 4 MAXPF + 2 = 12596 words of memory array
Adding the program memory of 1100 yields the total memory
required to execute the FFTSNG on the Cyber 74:
    memory = 12596 + 1100 = 13696                      (3.188)
For the IMSL FFT the variables are K = 2099, JK = 1,
KT = 0, JT = 1, KB = 2097, and M = 1. The expression for
IWORD becomes:
    IWORD = 3M + 3 + MAX(4M + 7 + 6K, KB + 1 + 2JK)
          = 3 + 3 + MAX(12605, 2100) = 12611
The memory array required on the Cyber 74 system is:
    2N + 2 IWORD = 2 · 2099 + 2 · 12611 = 29420        (3.189)
which is more than five times the total memory required
for N=2100.
3.5 Fourier Transforms Using Fast Convolution Algorithms
The paper by Cooley and Tukey, 1965, had a major impact
on digital signal processing by stimulating the development
and wide use of the FFT. Recently several new ideas have
been used to compute the DFT which have impacted digital
signal processing. In 1968 it was observed by Rader that
computation of the DFT could be changed to circular
convolution by rearranging the data when N is prime. Now, if
given a fast way to do circular convolution, one has a fast
DFT method. Winograd showed the minimum number of
multiplications for circular convolution of prime and prime
power length sequences. He then proposed that these high
speed prime power convolutions be "nested" into long
transforms to minimize multiplications. The Winograd nested
algorithm has been studied and programmed (Silverman, 1977;
McClellan and Nawab, 1979; Zohar, 1979) for computing the
DFT of complex valued sequences.
An alternative to the Winograd algorithm was proposed
by Kolba and Parks and combined the concept of fast
convolution with conventional DFT techniques to give another
efficient DFT implementation. Kolba and Parks' prime
factor algorithm (PFA) uses the same reordering technique
as the Winograd Fourier transform algorithm (WFTA). The
original PFA (Kolba and Parks, 1978) has been modified
(Burrus and Eschenbacher, 1980) so it can transform the same
sequence lengths as the WFTA.
This section presents the theory of the WFTA "small-N"
algorithms, the data reordering (which is the same for PFA
and WFTA), the PFA theory, the real operations count, and the
memory array requirements for both PFA and WFTA. Since both
algorithms follow a similar development the conversion of a
DFT to circular convolution and data reordering are only
presented once and apply to both algorithms.
3.5.1 Converting a DFT to Circular Convolution.
To convert the DFT expression to a circular convolution the
DFT matrix [W] must be "mapped" into the circular
convolution matrix [Wc]. The mapping between these two
matrices, and hence the basis for the WFTA and PFA, was
developed by Rader in 1968.
Rader showed that if "N is prime, there is some
number g, not necessarily unique, such that a one-to-one
mapping from the integers i = 1, 2, ..., N-1 to the integers
j = 1, 2, ..., N-1 is given by:
    j = ((g^i))_N                                      (3.190)
where the notation ((x))_N implies x modulo N." The example
of N=7 and g=3 using the mapping of Eq (3.190) gives:
    i   1  2  3  4  5  6
    j   3  2  6  4  5  1
The number g is referred to as a "primitive root" in number
theory. The mapping of Eq (3.190) provides the convolution
matrix [Wc] from the DFT matrix [W]. Examples of this
mapping are extensively treated in the references
(Silverman, 1977; Kolba and Parks, 1977) and are not
repeated in this paper.
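The index mapping of Eq (3.190) is short enough to check by machine. The sketch below is illustrative; the function names are hypothetical and it only tests whether a candidate g produces a one-to-one mapping.

```python
def rader_index_map(n, g):
    """Return [j_1, ..., j_(n-1)] with j_i = g**i mod n, as in Eq (3.190)."""
    return [pow(g, i, n) for i in range(1, n)]


def is_primitive_root(n, g):
    """True when the map visits every integer 1..n-1 exactly once."""
    return sorted(rader_index_map(n, g)) == list(range(1, n))


print(rader_index_map(7, 3))      # [3, 2, 6, 4, 5, 1]
print(is_primitive_root(7, 2))    # False: 2 only generates 2, 4, 1
```

The first call reproduces the N=7, g=3 row given above; the second shows that not every g is usable.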
A brief example of using the results of the convolution
matrix is presented to aid in developing the small-N
algorithm operations count. Consider the following 3-point
DFT written in matrix notation as:
    \begin{bmatrix} X(0) \\ X(1) \\ X(2) \end{bmatrix} =
    \begin{bmatrix} W^0 & W^0 & W^0 \\ W^0 & W^1 & W^2 \\ W^0 & W^2 & W^1 \end{bmatrix}
    \begin{bmatrix} x(0) \\ x(1) \\ x(2) \end{bmatrix}          (3.191)
where W = \exp(-j2\pi/3) is assumed and W^4 = W^1 is used.
The circular convolution is given by:
    \begin{bmatrix} \bar{X}(1) \\ \bar{X}(2) \end{bmatrix} =
    \begin{bmatrix} W^1 & W^2 \\ W^2 & W^1 \end{bmatrix}
    \begin{bmatrix} x(1) \\ x(2) \end{bmatrix}                  (3.192)
which provides \bar{X}(1) and \bar{X}(2). Then the DFT in
Eq (3.191) can be rewritten using Eq (3.192) to give:
    X(0) = W^0 (x(0) + x(1) + x(2))
    X(1) = W^0 x(0) + \bar{X}(1)
    X(2) = W^0 x(0) + \bar{X}(2)                       (3.193)
Using similar techniques to the one presented here,
convolution expressions to perform DFTs have been developed
for N = 2, 4, 5, 7, 8, 9 and 16.
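The 3-point rewriting can be checked numerically against the DFT definition. This is a minimal sketch with hypothetical function names, taking W = exp(-j2π/3) and W^0 = 1.

```python
import cmath


def dft3_via_convolution(x):
    """3-point DFT built from a 2-point circular convolution."""
    w = cmath.exp(-2j * cmath.pi / 3)      # W = exp(-j*2*pi/3)
    xbar1 = w * x[1] + w**2 * x[2]         # first row of the convolution
    xbar2 = w**2 * x[1] + w * x[2]         # second row of the convolution
    return [x[0] + x[1] + x[2],            # X(0), using W^0 = 1
            x[0] + xbar1,                  # X(1)
            x[0] + xbar2]                  # X(2)


def dft3_direct(x):
    """Reference 3-point DFT straight from the definition."""
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / 3) for n in range(3))
            for k in range(3)]


x = [1 + 2j, -3, 0.5j]
print(max(abs(a - b) for a, b in
          zip(dft3_via_convolution(x), dft3_direct(x))))
```

The printed difference should be at the level of floating-point rounding.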
3.5.2 Reordering the Data Arrays. Implementing the
WFTA or the PFA into a useful form involves making long
transforms from the short, fast-convolution transforms for
2, 3, 4, 5, 7, 8, 9, and 16. The general idea is "to
convert a one-dimensional length N = M_1 M_2 ... M_i
transform into an i-dimensional transform requiring
computation of shorter length M_k transforms for
k = 1, 2, ..., i" (Kolba and Parks, 1977). The mapping from
one-dimension to i-dimensions is based on the Chinese
Remainder Theorem which requires relatively prime factors
M_1, M_2, ..., M_i. The example for two mutually prime
factors given by Kolba and Parks, 1977, is presented here
because the mapping is common to both WFTA and PFA.
In the DFT:
    X(k) = \sum_{n=0}^{N-1} x(n) W^{nk}                (3.194)
the index n of the input sequence is referred to as the
input index, and the index k of the output sequence X(k)
is called the output index. Mapping from one-to-two
dimensions maps the input index n into a pair of indices
(n_1, n_2):
    n_1 = r_1 n mod M_1,  n_1 = 0, 1, ..., M_1-1,  r_1 = M_2^{-1} mod M_1
    n_2 = r_2 n mod M_2,  n_2 = 0, 1, ..., M_2-1,  r_2 = M_1^{-1} mod M_2
The output index is
    k_1 = k mod M_1,  k_1 = 0, 1, ..., M_1-1
    k_2 = k mod M_2,  k_2 = 0, 1, ..., M_2-1
The inverse mapping from two-to-one dimension for the
output index is:
    k = (s_1 k_1 + s_2 k_2) mod N                      (3.195)
where
    s_1 \equiv 1 mod M_1  and  s_1 \equiv 0 mod M_2
    s_2 \equiv 0 mod M_1  and  s_2 \equiv 1 mod M_2
While the same inverse mapping in Eq (3.195) could be used
for the input index n, it is more convenient (Kolba and
Parks, 1977) to use:
    n = (M_2 n_1 + M_1 n_2) mod N                      (3.196)
When the mappings in Eqs (3.195) and (3.196) are used the
DFT becomes:
    X(k_1,k_2) = \sum_{n_1=0}^{M_1-1} \sum_{n_2=0}^{M_2-1}
                 x(n_1,n_2) W_{M_2}^{n_2 k_2} W_{M_1}^{n_1 k_1}   (3.197)
At this point the WFTA and PFA approach the implementation
of Eq (3.197) differently as seen below.
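The two index maps can be exercised on a small case such as N = 15 = 3 · 5. The sketch below is illustrative and is neither the WFTA nor the PFA: the function names are hypothetical, the short transforms are plain double sums rather than fast-convolution modules, and the three-argument pow with a negative exponent assumes Python 3.8 or later.

```python
import cmath


def dft(x):
    """Direct N-point DFT from the definition."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def two_factor_dft(x, m1, m2):
    """DFT of length N = m1*m2 (m1, m2 relatively prime) as a 2-D transform."""
    n = m1 * m2
    # Input map: n = (M2*n1 + M1*n2) mod N
    grid = [[x[(m2 * n1 + m1 * n2) % n] for n2 in range(m2)]
            for n1 in range(m1)]
    # Chinese Remainder Theorem constants for the output map:
    # s1 = 1 mod M1 and 0 mod M2; s2 = 0 mod M1 and 1 mod M2.
    s1 = m2 * pow(m2, -1, m1)
    s2 = m1 * pow(m1, -1, m2)
    w1 = cmath.exp(-2j * cmath.pi / m1)
    w2 = cmath.exp(-2j * cmath.pi / m2)
    out = [0j] * n
    for k1 in range(m1):
        for k2 in range(m2):
            total = sum(grid[n1][n2] * w2 ** (n2 * k2) * w1 ** (n1 * k1)
                        for n1 in range(m1) for n2 in range(m2))
            out[(s1 * k1 + s2 * k2) % n] = total   # k = (s1*k1 + s2*k2) mod N
    return out


x = [complex(i, (i * i) % 7) for i in range(15)]
print(max(abs(a - b) for a, b in zip(two_factor_dft(x, 3, 5), dft(x))))
```

No twiddle factors appear between the two dimensions, which is the property that both the WFTA and the PFA exploit.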
3.5.3 The Winograd Fourier Transform. A new
algorithm for computing the DFT was proposed by Winograd
in July 1975. The WFTA has properties such that the number
of real additions remained at the FFT level while the
number of real multiplications necessary to evaluate the
DFT was reduced (Silverman, 1977). This paper will not
derive the "small-N" algorithms. Readers interested in
derivation of the WFTA are referred to the articles which
extensively treat the topic (Winograd, 1976; Silverman, 1977;
Kolba and Parks, 1977; Zohar, 1979).
Winograd's proof started with the N by N matrix with
elements:
    W_N^{ir} = W_N^{ir mod N} = Q_N(i,r)               (3.198)
which can be decomposed to:
    Q_N = O_N D_N I_N                                  (3.199)
where I_N is a u by N incidence matrix with values of 0, 1,
and -1 only, D_N is a u by u diagonal matrix, and O_N is an
N by u incidence matrix (Silverman, 1977). The
decomposition of Q_N is possible with large values of u
relative to N (i.e., u = N^2). Winograd solved the more
difficult problem of decomposing Q_N = O_N D_N I_N given an
incidence matrix which has dimension u smaller than N^2.
Winograd applied field theory to give solutions where u
approximately equals N for small values of N, where
N = 2, 3, 4, 5, 7, 8, 9, and 16 (Silverman, 1977).
Not only did Winograd prove the minimum multiplication
count for the above small-N DFTs but he also proposed a
special structure of Eq (3.197) using Eq (3.199). The two
dimensional transform in Eq (3.197) may be implemented by
first calculating M_1 length M_2 DFTs:
    y(n_1,k_2) = \sum_{n_2=0}^{M_2-1} x(n_1,n_2) W_{M_2}^{n_2 k_2}    (3.200)
and then calculating M_2 length M_1 DFTs:
    X(k_1,k_2) = \sum_{n_1=0}^{M_1-1} y(n_1,k_2) W_{M_1}^{n_1 k_1}    (3.201)
Using the notation of Eq (3.199) the M_1 short transform
can be written in terms of the input additions i^{(1)},
output additions o^{(1)}, and multiplications d^{(1)}. The
length M_2 transform uses i^{(2)}, o^{(2)}, and d^{(2)}
(Kolba and Parks, 1977). Then Eq (3.200) becomes:
    y(n_1,k_2) = \sum_{r=0}^{u_2-1} o^{(2)}_{k_2 r} d^{(2)}_r
                 \sum_{n_2=0}^{M_2-1} i^{(2)}_{r n_2} x(n_1,n_2)       (3.202)
X(k_1,k_2) in Eq (3.201) is a length M_1 transform of
y(n_1,k_2) which can also be written:
    X(k_1,k_2) = \sum_{m=0}^{u_1-1} o^{(1)}_{k_1 m} d^{(1)}_m
                 \sum_{n_1=0}^{M_1-1} i^{(1)}_{m n_1} y(n_1,k_2)       (3.203)
Substituting Eq (3.202) into Eq (3.203) gives:
    X(k_1,k_2) = \sum_{m=0}^{u_1-1} o^{(1)}_{k_1 m} d^{(1)}_m
                 \sum_{n_1=0}^{M_1-1} i^{(1)}_{m n_1}
                 \sum_{r=0}^{u_2-1} o^{(2)}_{k_2 r} d^{(2)}_r
                 \sum_{n_2=0}^{M_2-1} i^{(2)}_{r n_2} x(n_1,n_2)       (3.204)
The order of summation may be interchanged to "nest" the
multiplications in the center which gives Eq (3.204)
rewritten as:
    X(k_1,k_2) = \sum_{r=0}^{u_2-1} o^{(2)}_{k_2 r}
                 \sum_{m=0}^{u_1-1} o^{(1)}_{k_1 m} d^{(1)}_m d^{(2)}_r
                 \sum_{n_1=0}^{M_1-1} i^{(1)}_{m n_1}
                 \sum_{n_2=0}^{M_2-1} i^{(2)}_{r n_2} x(n_1,n_2)       (3.205)
Eq (3.205) is the form that is implemented into FORTRAN
code (McClellan and Nawab, 1979) and listed in Appendix H.
As an example of the "nesting" structure for the WFTA
consider the case of N=3 given in Eqs (3.191) through (3.193).
First, let
    \bar{X}(1) = M_1 + M_2
    \bar{X}(2) = M_1 - M_2                             (3.206)
then equating Eqs (3.206) and (3.192) gives:
    M_1 + M_2 = x(1) W^1 + x(2) W^2
    M_1 - M_2 = x(1) W^2 + x(2) W^1                    (3.207)
Substituting,
    W^1 = \exp(-j2\pi/3) = -1/2 - j(\sqrt{3}/2)
    W^2 = \exp(-j4\pi/3) = -1/2 + j(\sqrt{3}/2)
into Eq (3.207) provides:
    M_1 + M_2 = -x(1)/2 - j(x(1)\sqrt{3}/2)
                -x(2)/2 + j(x(2)\sqrt{3}/2)            (3.208)
    M_1 - M_2 = -x(1)/2 + j(x(1)\sqrt{3}/2)
                -x(2)/2 - j(x(2)\sqrt{3}/2)            (3.209)
Solving for M_1 and M_2 gives:
    M_1 = -(1/2)(x(1) + x(2))
    M_2 = -j(\sqrt{3}/2)(x(1) - x(2))                  (3.210)
For the algorithm to be used in Winograd's algorithm the
multiplications by W^0 = 1 must be accounted for and
minimized. This is accomplished by modifying the length 3
DFT to:
    a_1 = x(1) + x(2)
    a_2 = x(1) - x(2)
    a_3 = x(0) + a_1                                   (3.211)
    M_1 = (-1/2 - 1) a_1 = -(3/2) a_1
    M_2 = -j(\sqrt{3}/2) a_2
    M_3 = W^0 a_3 = a_3                                (3.212)
    C_1 = M_3 + M_1
    X(0) = M_3
    X(1) = C_1 + M_2
    X(2) = C_1 - M_2                                   (3.213)
Eqs (3.211) through (3.213) result in 2 multiplications,
1 multiplication by W^0, and 6 additions which can now be
expressed in the X = O·D·I·x notation as:
    \begin{bmatrix} X(0) \\ X(1) \\ X(2) \end{bmatrix} =
    \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & -1 \end{bmatrix}
    \begin{bmatrix} 1 & 0 & 0 \\ 0 & -3/2 & 0 \\ 0 & 0 & -j\sqrt{3}/2 \end{bmatrix}
    \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & -1 \end{bmatrix}
    \begin{bmatrix} x(0) \\ x(1) \\ x(2) \end{bmatrix}          (3.214)
and then rewritten into summations as:
    X(k) = \sum_{r=0}^{u-1} o_{kr} d_r \sum_{n=0}^{N-1} i_{rn} x(n)   (3.215)
The fast convolution cases for N=2,4,5,7,8,9, and 16
were developed similar to the method used for N=3 above.
The explicit equations for these cases provided the small-N
operations count shown in Table 3.6, which is used in
computing the real operations count as a function of N for
the WFTA.
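The length-3 module of Eqs (3.211) through (3.213) can be written out and compared with the DFT definition. This is a sketch with hypothetical function names; it assumes that M2 multiplies a2 = x(1) - x(2), as it does in the PFA form of the same module.

```python
import cmath
import math


def wfta3(x):
    """Length-3 DFT: 2 multiplications, 1 multiplication by W^0, 6 additions."""
    a1 = x[1] + x[2]
    a2 = x[1] - x[2]
    a3 = x[0] + a1
    m1 = -1.5 * a1                          # (-1/2 - 1) * a1
    m2 = -1j * (math.sqrt(3) / 2) * a2      # assumed to act on a2
    m3 = a3                                 # multiplication by W^0 = 1
    c1 = m3 + m1
    return [m3, c1 + m2, c1 - m2]           # X(0), X(1), X(2)


def dft3_direct(x):
    """Reference 3-point DFT straight from the definition."""
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / 3) for n in range(3))
            for k in range(3)]


x = [0.5 - 1j, 2.0, -4 + 3j]
print(max(abs(a - b) for a, b in zip(wfta3(x), dft3_direct(x))))
```

The additions are a1, a2, a3, C1 and the two output sums, which accounts for the six counted in the text.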
3.5.4 The Prime Factor Algorithm Theory. An
alternative to the nested algorithm proposed by Winograd was
developed by Kolba and Parks. Because of the algorithm's
structure it is called the prime factor algorithm (PFA)
and uses a modified version of Winograd's high-speed
convolution technique.
Converting the DFT to circular convolution and
reordering the data arrays for the PFA is identical up
through Eq (3.197)
where W_{M_1} = \exp(-j2\pi/M_1),
      W_{M_2} = \exp(-j2\pi/M_2), with M_1 and M_2 relatively
prime.
The transform in Eq (3.197) may be performed by calculating
M_1 length M_2 DFTs:
    y(n_1,k_2) = \sum_{n_2=0}^{M_2-1} x(n_1,n_2) W_{M_2}^{n_2 k_2}    (3.216)
then calculating M_2 length M_1 DFTs:
    X(k_1,k_2) = \sum_{n_1=0}^{M_1-1} y(n_1,k_2) W_{M_1}^{n_1 k_1}    (3.217)
The expressions in Eqs (3.216) and (3.217) are implemented
as short DFTs instead of "nested" operations as shown in
Eq (3.205).
TABLE 3.6
SMALL-N OPERATIONS COUNT FOR WFTA
     N     Mult    Mult by W^0    Adds
     2       0          2           2
     3       2          1           6
     4       0          4           8
     5       5          1          17
     7       8          1          36
     8       2          6          26
     9      12          1          44
    16      10          8          74
For both algorithm structures the small-N equations
are the same; only the implementation is different. In
the case of the PFA structure, the small-N algorithms are
modified to permit a "shift operation" instead of a
multiplication by 1/2. For the N=3 example Eqs (3.211)
through (3.213) are modified to:
    a_1 = x(1) + x(2)
    a_2 = x(1) - x(2)
    a_3 = x(0) + a_1                                   (3.218)
    M_1 = -(1/2) a_1
    M_2 = -j(\sqrt{3}/2) a_2                           (3.219)
    C_1 = x(0) + M_1
    X(0) = a_3
    X(1) = C_1 + M_2
    X(2) = C_1 - M_2                                   (3.220)
Eqs (3.218) through (3.220) have 1 multiplication, 1 shift
(multiplication by 1/2) and 6 additions.
Similar small-N DFTs result for N=2,4,5,7,8,9 and 16
to produce the operations count for PFA small-N algorithms
shown in Table 3.7 (Burrus and Eschenbacher, 1980).
(Complex valued sequences require the count in Table 3.7
to be doubled.) If the implementation of the PFA does not
use "shifts" the multiplication count must be adjusted to
reflect the multiplications by 1/2. The original FORTRAN
program written (Kolba, 1977) did not include the factor
of 16. Later modifications (Burrus and Eschenbacher, 1980)
included the factor of 16, which made the PFA capable of
transforming the same sequence lengths as the WFTA. It
should be noted that neither FORTRAN version implemented
the "shifts" which increased the number of real
multiplications.
TABLE 3.7
PFA SMALL-N DFT OPERATIONS COUNT
     N    Multiplies    Shifts    Adds
     2        0            0        2
     3        1            1        6
     4        0            0        8
     5        4            2       17
     7        8            0       36
     8        2            0       26
     9        8            2       49
    16       10            0       74
NOTE: For complex sequences the values in
the table must be doubled.
3.5.5 Real Operations for WFTA. To use the WFTA
the N length sequence must be factorable into R relatively
prime factors N_1 N_2 ... N_R where each factor corresponds
to one of the Winograd small-N algorithms for 2, 3, 4, 5, 7,
8, 9 and 16. It has been shown (Silverman, 1977) that the
number of real multiplications is a function of the factors
of N. To aid in the development of the number of real
operations the following terms are defined:
    M_r = number of real multiplications in factor N_r
    A_r = number of real additions in factor N_r
    N_r = rth factor of N
Winograd proved that the D_N matrix is an M_r by M_r
diagonal matrix and that O_N and I_N are N_r by M_r and
M_r by N_r incidence matrices, respectively, with only
0, 1, or -1 for entries. To evaluate the nested
multiplications of D_N (Silverman, 1977) requires:
    NMULT = M_1 M_2 ... M_R                            (3.221)
which is the real multiplications count for real valued
sequences. For complex valued transforms Eq (3.221) must
be multiplied by 2.
All real multiplication counts (Winograd, 1976;
Kolba and Parks, 1977; Silverman, 1977) use only Eq (3.221)
as the source of real multiplications for the WFTA. The
multiplications in Eq (3.221) are all performed by the MULT
subroutine in Figure 3.31. Other real multiplications are
required in the WFTA for computing the multiplier
coefficients and determining the input and output
permutation vectors of the INISHL subroutine in Figure 3.31.
The DFT multiplier coefficients are computed in lines
1450-1510 of the WFTA listed in Appendix H and require:
    real mult = 3 * NMULT                              (3.222)
where NMULT was computed in Eq (3.221). Determining the
output permutation vector in lines 2080-2170 requires:
    real mult = 4 * N                                  (3.223)
where N is the sequence length to be transformed. Combining
Eqs (3.222) and (3.223) provides the number of real
operations required for initializing the WFTA. Subsequent
transforms of the same sequence length do not require
initialization. The first complex transform of length N
using the WFTA requires:
    real mult = 2 * NMULT + 3 * NMULT + 4 * N          (3.224)
Subsequent complex transforms require:
    real mult = 2 * NMULT                              (3.225)
Figure 3.31. Flow Control in WFTA Program.
Counting the number of real additions is more
complicated because the factorization order of N will change
the real additions count (Silverman, 1977). For a given
factorization of N = N_1 N_2 ... N_R the number of real
additions comes from two sources: the INISHL initialization
subroutine and the WEAVE1 and WEAVE2 subroutines in
Figure 3.31. First the real additions from the "WEAVEs" can
be developed by considering the special case of N = N_1 N_2,
where N_1 is the "innermost" factor and N_2 is the
"outermost" factor. For two factors of N Silverman has
shown the number of real additions to be:
    A(2) = N_1 A_2 + M_2 A_1                           (3.226)
(Recall A_2 equals the real adds to evaluate factor N_2 and
M_2 equals the real multiplies to evaluate N_2.) Now consider
N = N_1 N_2 N_3 where (N_1 N_2) is considered to be the
"innermost" factor. The number of real additions becomes:
    A(3) = (N_1 N_2) A_3 + M_3 A(2)
         = N_1 N_2 A_3 + M_3 N_1 A_2 + M_3 M_2 A_1     (3.227)
By iterative substitution the number of additions for
N = N_1 N_2 N_3 N_4 becomes:
    A(4) = (N_1 N_2 N_3) A_4 + M_4 A(3)
         = N_1 N_2 N_3 A_4 + M_4 N_1 N_2 A_3
           + M_4 M_3 N_1 A_2 + M_4 M_3 M_2 A_1         (3.228)
Eqs (3.226) through (3.228) are used to write a compact
expression for the number of real additions needed in the
WEAVE subroutines:
    A(R) = \sum_{k=1}^{R} \left( \prod_{j=1}^{k-1} N_j \right) A_k
           \left( \prod_{j=k+1}^{R} M_j \right)        (3.229)
where an empty product equals 1.
The expression in Eq (3.229) represents only real additions
used in WEAVE1 and WEAVE2. Other additions are required in
the INISHL initialization subroutine to index the DFT
coefficient array and compute the output index vector.
The DFT coefficient array is indexed with a J counter
in line 1500 of the FORTRAN WFTA program in Appendix H.
This part of the INISHL subroutine requires NMULT real
additions. The input index array INDX1 requires another
J counter in line 1720 which uses N real additions. The
output index array INDX2 uses a J counter in line 2160
which uses N real additions. Also the INDX2 computation
requires 8N real additions in line 2120.
Totaling the real additions in the initialization
subroutine gives:
    real adds = NMULT + 10N                            (3.330)
Adding the results of Eq (3.330) to Eq (3.229) gives the
total additions needed to transform an N length sequence
the first time. Subsequent transforms at the same sequence
length require only the number of adds in Eq (3.229).
The FORTRAN WFTA program written by McClellan and
Nawab, 1979, decreased the number of real multiplications
for N=9 from 13 to 11 while the number of additions remained
constant at 44. Modifying Table 3.6 to reflect the new
multiply count for N=9 gives the McClellan and Nawab real
operations count shown in Table 3.8.
Using Eqs (3.229) and (3.330) with Table 3.8 gives the
number of real operations listed in Table 3.9a and b. The
columns labeled "REAL MUL 1" and "REAL ADD 1" represent the
operations count for the initial transform of length N. The
columns labeled "REAL MULT" and "REAL ADD" give the
operations count for subsequent transformations of the same
sequence length. The number of real operations are plotted
as a function of N in Figures 3.32 and 3.33. These graphs
demonstrate the large reduction possible after the WFTA has
been initialized for an N length sequence.
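The WFTA counts of this section can be collected into one short routine. The sketch below is illustrative only: the function and key names are hypothetical, the small-N values are copied from Table 3.8, and the WEAVE additions are accumulated with the recursion A(k) = (N_1 ... N_{k-1}) A_k + M_k A(k-1) of Eqs (3.226) through (3.228), with no further doubling for complex data.

```python
# factor -> (M, A) = (real multiplies, real adds), copied from Table 3.8
SMALL_N = {2: (2, 2), 3: (3, 6), 4: (4, 8), 5: (6, 17),
           7: (9, 36), 8: (8, 26), 9: (11, 44), 16: (18, 74)}


def wfta_counts(factors):
    """Counts for N = product(factors), listed innermost factor first."""
    n = 1        # N_1 * ... * N_k accumulated so far
    nmult = 1    # Eq (3.221)
    adds = 0     # A(k) from the recursion of Eqs (3.226)-(3.228)
    for f in factors:
        m, a = SMALL_N[f]
        adds = n * a + m * adds
        n *= f
        nmult *= m
    return {
        "N": n,
        "NMULT": nmult,
        "weave_adds": adds,                              # Eq (3.229)
        "init_adds": nmult + 10 * n,                     # Eq (3.330)
        "first_mult": 2 * nmult + 3 * nmult + 4 * n,     # Eq (3.224)
        "later_mult": 2 * nmult,                         # Eq (3.225)
    }


print(wfta_counts([3, 5])["weave_adds"])   # 3*17 + 6*6 = 87
print(wfta_counts([5, 3])["weave_adds"])   # 5*6 + 3*17 = 81
```

The two calls differ only in factor order, which illustrates why the additions count depends on the factorization order while NMULT does not.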
TABLE 3.8
MCCLELLAN AND NAWAB'S WFTA
REAL OPERATIONS FOR THE SMALL-N ALGORITHMS
     N    M(N)    A(N)
     2      2       2
     3      3       6
     4      4       8
     5      6      17
     7      9      36
     8      8      26
     9     11      44
    16     18      74
3.5.6 Memory Requirements for WFTA. The FORTRAN
subroutine WFTA listed in Appendix H requires 2348 words of
program memory when compiled for the CDC Cyber 74 computer.
The memory array requirements are given by:
    XR, XI, INDX1, INDX2: length N
    COEF, SR, SI: length NMULT, which is the number of
        multiplications required by the factors of N.
        NMULT is listed in Table 3.9a and b.
    CO3, CO4, CO8, CO9, CO16, CDA, CDB, CDC, CDD: Total of 88
The original version of WFTA dimensioned INDX1, INDX2, COEF,
SR, and SI to their maximum possible lengths of 5040, 5040,
10692, 10692, and 10692 respectively. This made the memory
array storage very large even for the shortest sequence
lengths:
    memory array = 2N + 2*5040 + 3*10692 + 88
                 = 2N + 42244                          (3.331)
The memory arrays INDX1, INDX2, COEF, SR, and SI were
variably dimensioned by the author's version of WFTA in
Appendix H. This reduced the memory arrays required to:
    memory array = 4N + 3 NMULT + 88                   (3.332)
The results of Eq (3.332) are listed in Table 3.9a and b
for all values of N. A comparison of the memory required
by Eqs (3.331) and (3.332) is plotted in Figure 3.34 which
shows the drastic savings in memory storage by using the
variable dimensions. The "cost" of variable dimensions is
more work for the user of WFTA because the dimensions must
be passed to the WFTA subroutine using more arguments in the
subroutine call. The original version required:
    CALL WFTA (XR, XI, N, INIT, IERR)
The modified WFTA call is:
    CALL WFTA (N, XR, XI, INIT, IERR, SR, SI, COEF,
               M, INDX1, INDX2)
where M = NMULT. The increased complexity of the second
call is worth the savings of memory arrays.
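The difference between Eqs (3.331) and (3.332) is easy to tabulate for any permissible N. The following is a minimal sketch with hypothetical function names; NMULT must be supplied by the caller, for example as the product of the M(N) values for the factors of N.

```python
def wfta_array_words_fixed(n):
    """Eq (3.331): index and coefficient arrays at their maximum lengths."""
    return 2 * n + 2 * 5040 + 3 * 10692 + 88


def wfta_array_words_variable(n, nmult):
    """Eq (3.332): arrays dimensioned to N and NMULT."""
    return 4 * n + 3 * nmult + 88


# N = 15 = 3 * 5 with NMULT = 3 * 6 = 18
print(wfta_array_words_fixed(15))          # 42274
print(wfta_array_words_variable(15, 18))   # 202
```

For short sequences the fixed dimensions are dominated by the 42244-word constant, which is the saving the variable dimensions recover.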
3.5.7 Real Operations for the PFA. The real operation
sources for the PFA are computed from reordering the data
and performing the small-N DFTs. The unscrambling constant
A which maps the PFA result from arrays X and Y to arrays
A and B requires N real additions and no multiplications.
The second source, computing the small-N DFTs using fast
convolution, has been shown (Kolba and Parks, 1977) for
two factors (M_1 M_2) to be:
    real mult = 2(M_1 u_2 + M_2 u_1)                   (3.333)
    real add  = 2(M_1 A_2 + M_2 A_1)                   (3.334)
for three factors (M_1 M_2 M_3):
    real mult = 2(M_2 M_3 u_1 + M_1 M_3 u_2 + M_1 M_2 u_3)   (3.335)
    real add  = 2(M_2 M_3 A_1 + M_1 M_3 A_2 + M_1 M_2 A_3)   (3.336)
and for four factors (M_1 M_2 M_3 M_4):
    real mult = 2(M_2 M_3 M_4 u_1 + M_1 M_3 M_4 u_2 + M_1 M_2 M_4 u_3
                + M_1 M_2 M_3 u_4)                     (3.337)
    real add  = 2(M_2 M_3 M_4 A_1 + M_1 M_3 M_4 A_2 + M_1 M_2 M_4 A_3
                + M_1 M_2 M_3 A_4)                     (3.338)
where u_i is the number of multiplications required for
M_i and A_i is the number of additions required for M_i.
Notice that complex data transforms have been assumed in
Eqs (3.333) through (3.338) and the number of multiplications
and additions were multiplied by two.
As shown in the PFA theory section the small-N
algorithms can be implemented by using "shifts" instead of
multiplications by 1/2. The FORTRAN programs available do
not make use of these shifts. Therefore, the operations
count for the PFA small-N DFTs shown in Table 3.7 is
modified to produce Table 3.10.
TABLE 3.10
PFA SMALL-N DFT OPERATIONS COUNT FOR NO SHIFTS
     N    MULT    ADD
     2      0       2
     3      2       6
     4      0       8
     5      6      17
     7      8      36
     8      2      26
     9     10      42
    16     10      74
Using the results of Eqs (3.333) through (3.338), the N adds
required for the output mapping, and Table 3.10 the number
of real multiplications and additions are listed for all
permissible N values in Table 3.11a and b. The corresponding
graphs in Figures 3.35 and 3.36 show the multiplications and
additions as a function of N.
Even though this FORTRAN program did not use a shift
to perform multiplication by 1/2, incorporating shifts into
the small-N DFTs represents a significant savings of real
multiplications. The major benefit would be in small
computers where software multiplies are more costly relative
to additions. The benefit of performing multiplications by
using shifts is given in Table 3.11a and b under the PCT
(percentage) column. PCT was calculated by:
    PCT = ((M - MS) * 100)/M                           (3.339)
where M is the number of multiplications without using
shifts and MS is the number using shifts. The percentage
savings as a function of N is plotted in Figure 3.37 for
all values of N.
3.5.8 Memory Requirements for PFA. The PFA program
listed in Appendix I requires 770 words of program memory
when compiled for the CDC Cyber 74 computer. The memory
array requirements are given by:
    X, Y, A, B: length N
The memory array required by PFA is given by:
    memory array = 4N
and is listed in Table 3.11a and b and plotted in
Figure 3.38.
TABLE 3.11a
PFA REAL OPERATIONS AND MEMORY COUNT FOR N<72
TABLE 3.11b
PFA REAL OPERATIONS AND MEMORY COUNT FOR N>80
3.5.9 Summary. Two algorithms which use high-speed
convolution techniques have been presented. Both use
the convolution for computing small-N DFTs and both require
N to be factored into relatively prime factors. This
particular factorization used the Chinese Remainder Theorem
and the "Sino correspondence" to reorder the data arrays.
The theory, structure, and operations count were presented
in this section.
IV. Comra-ria;on Results of Efficient
ric-.t ou i ,r IT an-; forpi!
4.1 Introduction
Several fixed radix and mixed radix algorithms have
been studied nd C. number of real operations an6 meporv
count required have been computed in the preceding sections.
The results from these sections are compared and presented
here.
Tradeoffs and advantages of fixed radix and mixed radix
algorithms are discussed, the justification for selecting
Singleton's algorithm over the IMSL and mixed radix FFT
is given, tables and graphs comparing the conventional
rmixed radix FFT with the fast convolution algorithms (WFTA
and PFA) are presented and advantages of each are discussed.
This chapter concludes with an algorithm which selects the
most efficient algorithm based on memory available, machine
speed, zeropacking, and sequence length. A flowchart imple-
mentation of the algorithm is included.
The timing tests in this section used the Cyber 74
system clock. This clock was accessed using the FORTRAN
command SECOND(CP), which provides a timer accurate to .001
seconds. The transforms were all performed using samples
from the function $e^{-t}\cos 50\pi t$, which has the magnitude
transform shown in Figure 4.1 for N=625.
Figure 4.1. Fourier Transform of $e^{-t}\cos 50\pi t$ for N = 625 (magnitude MAG versus frequency FREQ).
The memory comparisons made in this chapter are based
on the memory array only, because the program memory from
compilation on the Cyber 74 is not applicable to smaller
machines and would not permit valid memory comparisons. The
program memory required for the Cyber 74 is given to show
the relative sizes of the algorithms.
4.2 Conventional Radix-3 vs R(u) Field Radix-3
In the previous chapter the real operations count for
these two radix-3 FFTs was given in Table 3.2. From this
table the most efficient radix-3 algorithm can be selected
based on machine speed. Validation of this table was
performed using test data on the CDC Cyber 74 computer,
which has a 1.1 multiply-to-add ratio.
With a 1.1 multiply-to-add ratio Table 3.2 indicates
that the conventional radix-3 algorithm is more efficient
for all sequence lengths shown. The timing results in
Table 4.1 verify this conclusion.
4.3 Fixed Radix vs Mixed Radix FFTs
In Sections 3.3 and 3.4 the real operations count and
memory requirements were developed for the fixed radix and mixed
radix FFTs. Using the results from these sections, the real
operations count and memory requirements are given in Table
4.2 along with results from timing tests conducted on the
CDC Cyber 74. This table demonstrates that Singleton's
mixed radix FFT (MFFT) minimizes the operations count for
factors of 2, 3, and 5 to the level of the fixed radix
algorithms.
TABLE 4.1
RADIX-3 TIMING COMPARISON

        Conventional    R(u) Field
N       Radix-3 Time    Radix-3 Time
27      .002            .003
81      .009            .011
243     .026            .034
729     .094            .117
2187    .305            .393
The program memory required by each algorithm is given
in Table 4.3. The MFFT is the largest because of the
extra sections needed to perform a transform of any length and
the extra FORTRAN code required to perform multi-variate
transforms. None of the other FFTs are capable of performing
multi-variate transforms without a significant amount of
additional user programming. Singleton's MFFT can perform
up to a tri-variate transform; however, this additional
flexibility is a disadvantage on memory limited computers
when performing single-variate FFTs.
The fixed radix and mixed radix FFTs are roughly
equivalent in efficiency. The fixed radix FFTs offer a
memory savings over the MFFT for all radix-2 transform
sequence lengths shown in Table 4.2 and some of the radix-3
and 5 transform lengths. The main advantage the MFFT offers
is the capability to transform any length sequence N, while
the fixed radix algorithms are limited to integer powers
of 2, 3, and 5.
4.4 Mixed Radix FFT Comparison: IMSL vs Singleton
In Chapter 3 and Appendix C the real operations and
memory required for the IMSL and Singleton's mixed radix
FFTs were derived as a function of N. Those two algorithms
are now compared on the basis of real operations and memory
and the best algorithm selected.
TABLE 4.3
PROGRAM MEMORY REQUIRED BY FFTs

FFT                        Program Memory
Radix-2                    108
Radix-3                    301
Radix-5                    458
Singleton's Mixed Radix    1100
The expressions for real multiplications and additions
developed for Singleton's FFT are subtracted from the IMSL
FFT expressions for real operations to show the extra
operations required by IMSL. Recall that both the Singleton and
IMSL versions of the FFT compute sine and cosine using the
difference equation of Section 3.1. Both implement the
sine and cosine computation similarly and require the same
number of real operations to compute them.
Assuming that N can be factored as:

$$N = 2^{r}\,3^{s}\,4^{t}\,5^{u}\,p_1^{m_1}\cdots p_k^{m_k} \tag{4.1}$$

the difference in real multiplications between IMSL and
Singleton's becomes:

delta multiplies = [IMSL multiplication expression]
- [Singleton multiplication expression]

$$\begin{aligned}
\Delta_{\text{multiplies}} ={}& \Big[2rN + 4sN + 3tN + 8 + \frac{32uN}{5} \\
&\quad + \sum_{i=1}^{k}\Big(2(p_i-1) + \frac{4m_iN(p_i-1)}{p_i} + \frac{m_iN(p_i-1)^2}{p_i}\Big) - 4(N-1) + \text{KMULT}\Big] \\
&- \Big[2rN + 4sN + 3tN + \frac{32uN}{5} \\
&\quad + \sum_{i=1}^{k}\Big(2(p_i-1) + \frac{m_iN(p_i-1)^2}{p_i} + \frac{4m_iN(p_i-1)}{p_i}\Big) - 4(N-1) + \text{KMULT}\Big] \\
={}& 8
\end{aligned} \tag{4.2}$$
For large values of N the difference in multiplications is
negligible.
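The difference in real additions is derived from:

delta adds = [IMSL addition expression]
- [Singleton addition expression]

$$\begin{aligned}
\Delta_{\text{adds}} ={}& \Big[3rN + 6sN + \frac{15tN}{2} + 4 + \frac{48uN}{5} \\
&\quad + \sum_{i=1}^{k}\Big((p_i-1) + \frac{8m_iN(p_i-1)}{p_i} + \frac{m_iN(p_i-1)^2}{p_i}\Big) - 2(N-1) + \text{KADD}\Big] \\
&- \Big[3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN \\
&\quad + \sum_{i=1}^{k}\Big((p_i-1) + \frac{7m_iN(p_i-1)}{p_i} + \frac{m_iN(p_i-1)^2}{p_i}\Big) - 2(N-1) + \text{KADD}\Big] \\
={}& \frac{2sN}{3} + 2tN + \frac{8uN}{5} + 4 + \sum_{i=1}^{k}\frac{m_iN(p_i-1)}{p_i}
\end{aligned} \tag{4.3}$$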
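The cancellations behind Eqs (4.2) and (4.3) can be checked numerically. The following is an illustrative Python sketch, not part of the thesis software: the function names are hypothetical, the bracketed expressions are transcribed from the equations above, and KMULT and KADD are treated as arbitrary constants common to both subroutines.

```python
from fractions import Fraction


def imsl_ops(N, r, s, t, u, primes, kmult=0, kadd=0):
    """IMSL (multiplies, adds); primes is a list of (p_i, m_i) pairs."""
    mults = 2*r*N + 4*s*N + 3*t*N + 8 + Fraction(32*u*N, 5)
    adds = 3*r*N + 6*s*N + Fraction(15*t*N, 2) + 4 + Fraction(48*u*N, 5)
    for p, m in primes:
        square_term = Fraction(m*N*(p - 1)**2, p)
        mults += 2*(p - 1) + Fraction(4*m*N*(p - 1), p) + square_term
        adds += (p - 1) + Fraction(8*m*N*(p - 1), p) + square_term
    return mults - 4*(N - 1) + kmult, adds - 2*(N - 1) + kadd


def singleton_ops(N, r, s, t, u, primes, kmult=0, kadd=0):
    """Singleton (multiplies, adds) for the same factorization."""
    mults = 2*r*N + 4*s*N + 3*t*N + Fraction(32*u*N, 5)
    adds = 3*r*N + Fraction(16*s*N, 3) + Fraction(11*t*N, 2) + 8*u*N
    for p, m in primes:
        square_term = Fraction(m*N*(p - 1)**2, p)
        mults += 2*(p - 1) + square_term + Fraction(4*m*N*(p - 1), p)
        adds += (p - 1) + Fraction(7*m*N*(p - 1), p) + square_term
    return mults - 4*(N - 1) + kmult, adds - 2*(N - 1) + kadd


def deltas(r, s, t, u, primes, kmult=0, kadd=0):
    """Return (N, extra IMSL multiplies, extra IMSL adds)."""
    N = 2**r * 3**s * 4**t * 5**u
    for p, m in primes:
        N *= p**m
    im, ia = imsl_ops(N, r, s, t, u, primes, kmult, kadd)
    sm, sa = singleton_ops(N, r, s, t, u, primes, kmult, kadd)
    return N, im - sm, ia - sa


# Example: N = 2 * 3 * 4 * 5 * 7 = 840
print(deltas(1, 1, 1, 1, [(7, 1)]))
```

Exact rational arithmetic is used so that terms such as 32uN/5 cancel exactly rather than to within floating-point rounding.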
The results from Eqs (4.2) and (4.3) demonstrate that
the IMSL FFT has approximately the same number of real
multiplications but requires significantly more additions than
Singleton's mixed radix algorithm. Based on these results,
and because the data reordering for the two subroutines
is the same, the Singleton FFT is the more efficient of the
two subroutines. This conclusion was confirmed by timing
tests on the CDC Cyber 74 computer at AFIT. The results
are shown in Table 4.4 for selected sequence lengths.
The memory array required for each of the algorithms
was derived in the preceding chapter. Those results are
now compared for N less than 200, and the percentage of array
memory saved by Singleton's FFT over the IMSL FFT is plotted
in Figure 4.2 using the equation:
TABLE 4.4
TIMING RESULTS FOR IMSL AND SINGLETON FFTs
IMSL Singleton
N Time (sec) Time (sec)
60 .010 .008
120 .018 .014
125 .019 .012
128 .013 .011
210 .039 .036
243 .031 .031
256 .028 .021
315 .054 .052
420 .081 .072
504 .090 .082
625 .128 .076
729 .107 .107
840 .163 .150
1008 .151 .157
1024 .126 .092
1250 .275 .158
1260 .268 .231
2048 .269 .224
2187 .366 .364
2520 .565 .495
$$\%\ \text{savings} = \frac{(\text{MEMCC} - \text{MEMSNG})\cdot 100}{\text{MEMCC}} \tag{4.4}$$

where MEMCC is the IMSL array memory and MEMSNG is Singleton's array memory.
From the plot it is evident that Singleton's algorithm uses
less memory than the IMSL program. The upper edge of
the curve approaches 57%, which can be verified by
examination of Eqs (3.172) through (3.175) for N a prime number.
This number represents the memory savings at the points
where N is prime.
The values of M, K, KB, and JK used to compute the
IWORD constant in Eq (3.173) are M=1, K=N, KB=N-2 and JK=1.

$$\text{IWORD} = 3M + 3 + \max(4M + 7 + 6K,\; KB + 1 + 2\,JK) \tag{4.5}$$

$$\text{IWORD} = 3 + 3 + \max(6N + 11,\; N + 1) \tag{4.6}$$

$$\text{IWORD} = 6N + 17 \tag{4.7}$$

Now the memory for IMSL given that N is prime becomes:

$$\text{MEMCC} = 2N + 2(6N + 17) \tag{4.8}$$

$$\text{MEMCC} = 14N + 34 \tag{4.9}$$
The array memory required by Singleton's FFT is based
on the values NP and KD. NP is dimensioned one less than
the product of the square free factors of N or, if at most one
square free factor is present, NP can be dimensioned to M+1
where M is the number of prime factors in N. KD is the size
of arrays AT, BT, CK, and SK where KD equals the largest
prime factor in N. Using these results the expression for
array memory where N is prime becomes:

$$\text{MEMSNG} = 2N + 4\,KD + NP \tag{4.10}$$
Substituting for NP and KD (for N prime, KD = N and NP = M + 1 = 2) this equation is:

$$\text{MEMSNG} = 2N + 4N + 2 \tag{4.11}$$

$$\text{MEMSNG} = 6N + 2 \tag{4.12}$$
Substituting Eqs (4.9) and (4.12) into the percentage
expression in Eq (4.4), the savings is seen to reach approximately
57%:

$$\%\ \text{savings} = \frac{\big((14N + 34) - (6N + 2)\big)\cdot 100}{14N + 34} \tag{4.13}$$

$$\%\ \text{savings} = \frac{(8N + 32)\cdot 100}{14N + 34} \tag{4.14}$$

As N gets large Eq (4.14) becomes:

$$\%\ \text{savings} \approx \frac{800N}{14N} \approx 57\% \tag{4.15}$$

which corresponds to the results shown by Figure 4.2.
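The approach to the 57% limit can be illustrated with a few prime sequence lengths. This is a minimal Python sketch with hypothetical function names; it assumes only the prime-N array memory expressions of Eqs (4.9) and (4.12) and the percentage definition of Eq (4.4).

```python
def imsl_array_memory(n):
    """IMSL array memory for prime N, Eq (4.9)."""
    return 14 * n + 34


def singleton_array_memory(n):
    """Singleton array memory for prime N, Eq (4.12)."""
    return 6 * n + 2


def percent_savings(n):
    """Percentage of array memory saved by Singleton's FFT, Eq (4.4)."""
    memcc = imsl_array_memory(n)
    memsng = singleton_array_memory(n)
    return 100.0 * (memcc - memsng) / memcc


# A few prime lengths below 200, plus one large prime for the limit.
for n in (7, 61, 199, 1000003):
    print(n, round(percent_savings(n), 2))
```

The savings falls toward 800/14 percent from above as N grows, which is why the prime-N points sit at or slightly above 57% for the range N less than 200.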
The memory array must be added to the program memory
to determine the size of the program. The program memory
required by each algorithm was determined by compiling each
algorithm for the CDC Cyber 74. The IMSL FFT used 1061
words and the Singleton FFT used 1100 words. The larger
size of the Singleton FFT relative to the IMSL version
is because of the extra FORTRAN code needed to perform
multi-variate FFTs. These program memory figures are only
applicable for the FORTRAN compiler used here at AFIT;
however, they do provide a relative measure of the program
memory size. Singleton's program requires about 3.7% more
program memory.
The results for real operations count and memory
required show that Singleton's mixed radix FFT is superior
to the IMSL FFT and is the best conventional mixed radix FFT
available for comparison to the WFTA and PFA in the
following sections.
4.5 Conventional vs Convolution Algorithms
Singleton's algorithm (MFFT) is referred to as a
"conventional" FFT because it uses the Cooley-Tukey
decimation and reordering of the data array. The WFTA and
PFA use Winograd's small-N fast convolution algorithms
to perform the DFT. The operation and memory array counts
are presented in Figures 4.3 and 4.4 and Tables 4.5a and b
as a function of N for comparison of the three algorithms.
These tables and plots illustrate the advantages and
disadvantages of each algorithm and are used along with the
fixed radix results in Table 4.2 to select the most
efficient algorithm for a particular sequence length and
machine capability (size and speed).
The tables and plots refer to the algorithms as MFFT
(Singleton), WFTA (Winograd), and PFA (Kolba-Parks). The
PFA used for the operations counts and memory results is
the one described by Burrus and Eschenbacher, which includes
prime power factors of 2, 3, 4, 5, 7, 8, 9 and 16. The FORTRAN
coded program for PFA was obtained from C. S. Burrus of
Rice University and does not make use of "shifts" for
multiplications by 1/2. Both the WFTA and MFFT FORTRAN
programs were obtained from the IEEE Press "Programs for
Digital Signal Processing".
Only the memory array is compared because
the program memory changes based on machine word length.
The program memory required for the Cyber 74 is given for
each algorithm so that the relative sizes can be compared.
4.5.1 Real Operations Count. The mixed radix MFFT
written by Singleton includes special sections for factors
of 2, 3, 4, and 5 as well as a general section for odd
prime factors, which permits the transformation of a sequence
of any positive integer length N. Because of the special
sections the operations count is less for an N which is
highly factorable by 2, 3, 4, or 5 instead of higher prime
powers. Figures 4.3 and 4.4 demonstrate the efficiency of
Singleton's MFFT relative to the radix-2 complex transform
multiplications and additions counts of $2N\log_2 N$ and
$3N\log_2 N$, respectively (Winograd, 1976). The MFFT
operations counts shown in Figures 4.3a,b and 4.4a,b are for N
factorable by 2, 3, 4, or 5 or combinations thereof. The
WFTA and PFA counts are shown for all 59 sequence lengths
which they can transform. Recall from Sections 3.4 and 3.5
that the WFTA and PFA sequence lengths are limited by the data
reordering algorithm used by the WFTA and PFA. These
figures also reflect the WFTA "post-initialization"
operations count. As shown in Section 3.5 the
post-initialization count is significantly less than the number of
operations required for the initial transform of length N.
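The radix-2 reference counts quoted above can be evaluated directly. The sketch below is illustrative Python with hypothetical function names. It assumes that the reference formulas are simply evaluated at lengths that are not powers of two when an algorithm's counts are normalized, and it borrows the PFA counts for N=630 that are quoted in Section 4.5.3.

```python
import math


def radix2_baseline(n):
    """Reference counts (2N log2 N multiplies, 3N log2 N adds)."""
    log2n = math.log2(n)
    return 2 * n * log2n, 3 * n * log2n


def relative_counts(mults, adds, n):
    """Counts of an algorithm expressed as fractions of the baseline."""
    base_mults, base_adds = radix2_baseline(n)
    return mults / base_mults, adds / base_adds


print(radix2_baseline(1024))
# PFA at N = 630: 4352 multiplications and 18534 additions (Section 4.5.3)
print(relative_counts(4352, 18534, 630))
```

A ratio below 1 means the algorithm needs fewer operations of that kind than the radix-2 reference formula gives for the same N.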
The portion of the WFTA execution time spent on
initialization is shown in Table 4.6. The
data presented here was collected by timing the individual
subroutines (INISHL, PERM 1, WEAVE 1, MULT, WEAVE 2, PERM 2)
in the WFTA for different sequence lengths and then dividing
the time required for each subroutine by the total time for
all of the subroutines. Comparing the MFFT and PFA against
the post-initialized WFTA is assumed to be valid because
most applications of DFTs involve the repeated transform
of N length sequences.
A point by point comparison of MFFT, WFTA, and PFA
real operations is presented in Table 4.7. The
sequence lengths in this table represent the only lengths
permissible for both PFA and WFTA, whereas the mixed radix
MFFT can transform any sequence length. The operations
counts presented in Tables 4.2 and 4.7, together with a computer's
multiply and add speed, can predict the most efficient
(fastest) DFT technique for that particular computer.
Using the multiply and add speeds determined for
the CDC Cyber 74 (see Appendix J) as $1.9 \times 10^{-6}$ seconds
and $1.7 \times 10^{-6}$ seconds, respectively, the
execution speeds were predicted from the operations count
in Tables 3.9 and 4.7. The predicted execution speeds
do not account for all of the actual execution time
measured, as shown in Figure 4.5. The extra time which
was not predicted by the real operations count comes
from array indexing and data reordering needed in all of
the algorithms.
TABLE 4.6
TIMING RESULTS FROM THE WFTA SUBROUTINES

N       INISHL   PERM 1   WEAVE 1   MULT    WEAVE 2   PERM 2
315     48.0%    7.5%     16.3%     4.5%    16.3%     7.4%
360     47.0%    5.9%     15.7%     5.9%    21.6%     3.9%
630     43.9%    5.6%     18.7%     5.6%    21.5%     4.7%
720     44.0%    3.5%     20.0%     6.1%    22.8%     3.6%
840     34.5%    5.5%     23.6%     6.4%    23.6%     6.4%
1008    48.0%    1.7%     19.2%     6.2%    21.5%     3.4%
1260    38.2%    5.3%     18.1%     6.4%    27.7%     4.3%

Results are given as % of total time to execute WFTA.
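The share of time taken by initialization can be summarized from the table. The following Python sketch is illustrative only; the names are hypothetical, the percentages are the Table 4.6 entries, and it assumes that the INISHL share is the part of the run time avoided when the same length is transformed again.

```python
SUBROUTINES = ("INISHL", "PERM 1", "WEAVE 1", "MULT", "WEAVE 2", "PERM 2")

# Percent of total WFTA time per subroutine, keyed by sequence length N.
WFTA_TIME_SHARES = {
    315: (48.0, 7.5, 16.3, 4.5, 16.3, 7.4),
    360: (47.0, 5.9, 15.7, 5.9, 21.6, 3.9),
    630: (43.9, 5.6, 18.7, 5.6, 21.5, 4.7),
    720: (44.0, 3.5, 20.0, 6.1, 22.8, 3.6),
    840: (34.5, 5.5, 23.6, 6.4, 23.6, 6.4),
    1008: (48.0, 1.7, 19.2, 6.2, 21.5, 3.4),
    1260: (38.2, 5.3, 18.1, 6.4, 27.7, 4.3),
}


def initialization_share(n):
    """Percent of the initial WFTA run spent in INISHL."""
    return WFTA_TIME_SHARES[n][SUBROUTINES.index("INISHL")]


def mean_initialization_share():
    """Average INISHL share over the tabulated sequence lengths."""
    shares = [initialization_share(n) for n in WFTA_TIME_SHARES]
    return sum(shares) / len(shares)


print(round(mean_initialization_share(), 1))
```

Under the stated assumption, the average share indicates roughly how much faster a repeated transform is than the first one.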
Nevertheless, predictions
based only on real operations are sufficient to select the
most efficient algorithm, as demonstrated by Table 4.7.
The timing results in Table 4.7 compare one-to-one with
the predicted times (given the standard deviation shown in
parentheses) for all three algorithms. Several
observations can be made from Table 4.7. First, the WFTA1, which
represents the initial transform made by WFTA, may be slower
than MFFT for certain sequence lengths. Examples of this
are N=315, 630, and 720, all of which were correctly
predicted to be slower from the operations counts in Tables
3.9 and 4.6. Second, the post-initialized WFTA2 and the
PFA were predicted to be, and are, faster than MFFT for all
sequence lengths. Third, the PFA and WFTA2
(post-initialization) are close in efficiency for all sequence lengths.
4.5.2 Memory. The memory array for MFFT, WFTA, and
PFA was compiled from the previous chapter and is presented in
Figure 4.6 and Table 4.5a and b. The figure clearly
demonstrates how much less memory array is required by MFFT.
These results are due to the efficient data reordering
technique of MFFT, which can essentially be done in place
with very little additional memory relative to the sequence
length. The WFTA and PFA base their data reordering on
the Chinese Remainder Theorem and require an additional two
length N arrays for PFA. The WFTA uses even more memory
array because of the algorithm's structure, which "nests"
multiplications inside all the additions. This requires
three additional arrays of length N to
store the multiplication coefficients and to provide working
array storage, because the WFTA is not computed in place.
The program memory was not included in the tables
for comparison because the program memory required depends on
the machine word size. The program memory required on
the Cyber 74 for each algorithm is:
PFA program memory = 770 words
WFTA program memory = 2348 words
MFFT program memory = 1100 words
These results were achieved from the standard compiler
command FTN for the FORTRAN IV language. For short sequences
these program memory requirements contribute significantly
to the choice of the most memory efficient algorithm.
4.5.3 WFTA vs PFA Operations Count. The tradeoffs
between WFTA and PFA for real multiplications and additions
can be seen in Figures 4.3 and 4.4. In most cases the WFTA
requires fewer multiplications but more additions than PFA.
The selection of the most efficient algorithm then becomes
dependent on the machine speed for real addition and
real multiplication. As an example of this tradeoff between
additions and multiplications consider the case of N=630.
For this sequence length the PFA requires 4352
multiplications and 18534 additions while the WFTA requires 2376
multiplications and 22072 additions. Assuming the machine
add speed of $1.7 \times 10^{-6}$ seconds and a multiply speed of
$1.9 \times 10^{-6}$ seconds, the PFA speed is predicted to be
.040 seconds and the WFTA speed is .042 seconds.
For the selected add and multiply speed PFA was faster.
However, consider the case where a multiply requires three
times the add time, or $5.1 \times 10^{-6}$ seconds. For the
same N=630 the PFA speed is predicted to be .054 seconds
and the WFTA speed is .050 seconds. With the increase in
multiply time from 1.9 to 5.1 microseconds the WFTA
became the more efficient algorithm. This example illustrates
why the add and multiply speed must be known to
select the fastest algorithm for a particular sequence
length N.
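The N=630 tradeoff can be reproduced with a simple cost model. This is a minimal Python sketch under the assumption, used throughout this chapter, that predicted time is the sum of multiplications times multiply speed and additions times add speed; the function names are hypothetical and the break-even ratio is derived here rather than taken from the text.

```python
# (multiplications, additions) for N = 630, as quoted above
PFA_630 = (4352, 18534)
WFTA_630 = (2376, 22072)


def predicted_time(counts, mult_time, add_time):
    """Predicted execution time in seconds from real operations only."""
    mults, adds = counts
    return mults * mult_time + adds * add_time


def breakeven_ratio(a, b):
    """Multiply-to-add ratio at which algorithms a and b take equal time.

    Setting ma*M + aa*A = mb*M + ab*A gives M/A = (ab - aa) / (ma - mb).
    """
    (ma, aa), (mb, ab) = a, b
    return (ab - aa) / (ma - mb)


ADD = 1.7e-6
for mult in (1.9e-6, 5.1e-6):
    print(mult,
          round(predicted_time(PFA_630, mult, ADD), 3),
          round(predicted_time(WFTA_630, mult, ADD), 3))
print(round(breakeven_ratio(PFA_630, WFTA_630), 2))
```

For these counts the break-even multiply-to-add ratio lies between the two cases in the example, which is consistent with the PFA being faster at the Cyber 74 ratio and the WFTA being faster when a multiply costs three adds.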
The effects of changing the multiply to add ratio from
1 to 20 are shown in Figure 4.7a, b, and c for MFFT, WFTA,
and PFA. For the sequences N=315 and 1008 the PFA is most
efficient at the low multiply to add ratios, but as the
multiplies become "more costly" the WFTA soon becomes the
most efficient. For N=30 the WFTA is the most efficient
for all ratios.
4.6 Flexibility of the Algorithms
It is clear from the plots in Figures 4.3, 4.4, and the
data in Table 4.2 that the fixed radix FFT, PFA, and WFTA
are somewhat limited in permissible sequence lengths,
whereas the mixed radix FFT provides a much more "dense"
selection even for sequence lengths factorable by only
2, 3, 4, or 5. The restriction in possible values for N
for the WFTA and PFA arises because N must be a product of
relatively prime factors chosen from 2, 3, 4, 5, 7, 8, 9 and 16.
This limits N to four factors and a
maximum of 5040. The fixed radix algorithms are even
more restricted than WFTA or PFA because each can transform
only sequence lengths which are an integer power of 2, 3, or 5.
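The set of lengths implied by these factors can be enumerated directly. The sketch below is illustrative Python with hypothetical names. It assumes that "relatively prime" means at most one factor is taken from each of the groups {2, 4, 8, 16}, {3, 9}, {5} and {7}, and that the trivial length N=1 is excluded.

```python
from itertools import product
from math import prod

# A 1 in a group means "no factor taken from this group".
FACTOR_GROUPS = [(1, 2, 4, 8, 16), (1, 3, 9), (1, 5), (1, 7)]


def permissible_lengths():
    """Sorted WFTA/PFA sequence lengths built from coprime factors."""
    lengths = {prod(choice) for choice in product(*FACTOR_GROUPS)}
    lengths.discard(1)  # drop the empty product
    return sorted(lengths)


lengths = permissible_lengths()
print(len(lengths), lengths[-1])
print([n for n in lengths if 300 <= n <= 700])
```

Under these assumptions the enumeration gives 59 lengths with a maximum of 5040, matching the counts quoted in this chapter.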
4.7 An Algorithm to Select the Most Efficient DFT Technique.
The results of this chapter are used to develop a
systematic approach to selecting the most efficient DFT
method from the fixed radix FFTs, mixed radix FFT (MFFT),
WFTA, and PFA. A flowchart is presented which selects the
most efficient algorithm based on real operations, computer
memory, machine speed, and sequence length. The algorithm
requires inputs of machine speed for add and multiply,
sequence length, zeropack limits, and computer memory. This
algorithm also assumes that the same length sequence will
be repeatedly transformed such that the WFTA is initialized
only once.
4.7.1 Arguments. The algorithm requires these inputs:
N: Sequence length to be transformed
NP: The upper limit to which the sequence length
can be filled to reach an efficient transform
length.
A: Machine addition speed
M: Machine multiplication speed
MEM: Computer memory available
4.7.2 Usage. The algorithm is presented as a flowchart.
The basic logic of the algorithm is:
(1) Zeropack (if permitted) to the nearest WFTA or PFA
sequence length.
(2) Determine the memory requirements for the WFTA and PFA.
(3) If WFTA and PFA both fit in computer memory available,
select between the two by using real operations and
computer speed.
(4) If only PFA or WFTA fit in computer memory, select the
one that fits.
(5) If neither PFA nor WFTA will fit in computer memory,
zeropack to nearest N an integer power of 2, 3, or 5.
Choose the most efficient algorithm from the fixed radix
FFT and MFFT based on real operations counts and
machine speed.
(6) If fixed radix FFT cannot be used, zeropack to nearest
N factorable by 2, 3, or 5 and use the mixed radix FFT.
Using the flow diagram of Figure 4.8a, b, and c along with
the specified tables selects the most efficient algorithm.
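Steps (3) through (5) of the logic above can be sketched in code. This is a hedged Python illustration, not the flowchart itself: the class and function names are hypothetical, the operation counts are the N=420 values used in the example of this section, and the memory figures are placeholder values chosen only to exercise the branches.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mults: int   # real multiplications for the chosen length
    adds: int    # real additions for the chosen length
    memory: int  # words required (hypothetical values in this sketch)

    def predicted_time(self, add_time, mult_time):
        return self.mults * mult_time + self.adds * add_time


def select_convolution_algorithm(wfta, pfa, mem, add_time, mult_time):
    """Return the name of the faster candidate that fits in memory.

    Returns None when neither fits, in which case the caller falls
    back to the fixed radix FFT and MFFT steps.
    """
    fitting = [c for c in (wfta, pfa) if c.memory <= mem]
    if not fitting:
        return None
    best = min(fitting, key=lambda c: c.predicted_time(add_time, mult_time))
    return best.name


wfta = Candidate("WFTA", 1296, 11352, memory=3000)  # memory is a placeholder
pfa = Candidate("PFA", 2528, 10956, memory=2000)    # memory is a placeholder
print(select_convolution_algorithm(wfta, pfa, 10**6, 450e-9, 1000e-9))
```

Returning None for the "neither fits" case keeps the fallback to the fixed radix and mixed radix FFTs as a separate decision, as in steps (5) and (6).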
An example for N=410 demonstrates the use of Figure 4.8
and the tables in this paper to select the most efficient
DFT. Given that A=450 nanoseconds (ns), M=1000 ns, 10%
zeropacking permitted, and no memory limitations, the most
efficient algorithm can be selected.
(1) MEM is very large and is not a limitation
(2) N=410
(3) NP=410 + .10(410) = 451
Figure 4.8a, b, and c. Flowchart to Select Most Efficient Algorithm.
(4) NP=N? No, continue
(5) NP<5040? Yes, continue
(6) Zeropack to nearest WFTA or PFA length given in Table 4.6,
which is NP=420.
(7) PFA fit in computer? Yes, continue
(8) WFTA fit in computer? Yes, continue
(9) Determine fastest algorithm between WFTA and PFA from
Table 4.6. For N=420,
WFTA PFA
Mult 1296 2528
Add 11352 10956
Using A=450 ns and M=1000 ns the predicted speeds
are: WFTA = 6.4 milliseconds
PFA = 7.5 milliseconds
For this sequence N=420 and for the add and multiply speeds
given, the WFTA is the fastest algorithm. However, if this
sequence were only being transformed once for a particular
utilization and the WFTA could not be repeatedly used without
initialization, the WFTA counts must be taken from Table 3.11
where 4920 multiplications and 16200 additions are used to
initialize the WFTA and perform the transform. Now the
WFTA is predicted to use 12.2 milliseconds to transform
N=420. When selecting between WFTA and PFA the particular
utilization must be considered.
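The zeropacking step and the post-initialization time predictions of this example can be reproduced as follows. This is an illustrative Python sketch with hypothetical names; the list of lengths is only an excerpt of the permissible WFTA and PFA lengths near N=410, and the times are computed in integer nanoseconds from the counts given in step (9).

```python
# Excerpt of permissible WFTA/PFA lengths near the example (not the full set).
LENGTHS_EXCERPT = (315, 336, 360, 420, 504, 560, 630)


def zeropack_target(n, percent, lengths):
    """Smallest permissible length in [n, n + percent% of n], else None."""
    limit = n + n * percent // 100
    candidates = [length for length in lengths if n <= length <= limit]
    return min(candidates) if candidates else None


def predicted_ms(mults, adds, mult_ns, add_ns):
    """Predicted time in milliseconds from real operations only."""
    return (mults * mult_ns + adds * add_ns) / 1e6


target = zeropack_target(410, 10, LENGTHS_EXCERPT)
wfta_ms = predicted_ms(1296, 11352, mult_ns=1000, add_ns=450)
pfa_ms = predicted_ms(2528, 10956, mult_ns=1000, add_ns=450)
print(target, round(wfta_ms, 1), round(pfa_ms, 1))
```

Working in integer nanoseconds avoids rounding surprises when the zeropack limit and the operation totals are formed.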
It should also be noted that the predicted times from
Table 4.6 are based only on real operations, which do not
account for all of the execution time required, as shown by
the timing tests. For the cases tested in Table 4.7 on the
CDC Cyber 74 the real operations accounted for an average of 67%
of the PFA, 65% of the WFTA, and 61% of the MFFT actual
execution time.
V. Conclusions
This paper, for the first time, presented a capability
to select the most efficient DFT based on real operations.
These real operations were tabulated and plotted as a
function of N. The algorithms studied and compared for real
operations and memory include:
1. Radix-2 FFT from Rabiner and Gold.
2. Radix-3 FFT written by the author.
3. Radix-3 FFT in R(u) from Dubois and
Venetsanopoulos.
4. Radix-5 FFT written by the author.
5. Mixed radix FFT for factors of 2, 3, or 5
written by the author.
6. IMSL mixed radix FFT which can transform
any sequence length N.
7. Singleton's mixed radix FFT which can
transform any sequence length N.
8. Winograd Fourier transform algorithm (WFTA)
written by McClellan and Nawab.
9. Prime Factor Algorithm (PFA) written by
Burrus and Eschenbacher.
5.1 Results and Conclusions
The two radix-3 FFTs were compared for real operations
and memory required to perform the DFT of N length sequences
where $N = 3^m$. Selection criteria were developed and tabulated
based on machine speed. The new radix-3 FFT in the R(u)
field uses fewer multiplications but more real additions
than the conventional Radix-3 FFT. The more efficient of
the two algorithms depends on the relative costs of
multiplications and additions. The Radix-3 in R(u) is most
efficient when multiplications are costly.
All of the fixed radix algorithms were compared to the
Singleton mixed radix FFT for real operations and memory.
The operations counts show that the most efficient algorithm
depends on multiplication and addition speed of the computer.
Data was tabulated for selecting the best algorithm based on
these criteria. The FFT algorithm using the least memory can
also be selected from Tables 4.2 and 4.3. The limited choice
of sequence lengths possible with the fixed radix FFTs
reduces their utility compared to Singleton's mixed radix FFT.
Three conventional mixed radix FFT algorithms were
compared for efficiency, memory array, and flexibility. The
author's mixed radix FFT was very efficient but required
more memory array and was not as flexible since N was limited
to factors of 2, 3, 4, and 5. It was shown that Singleton's
mixed radix FFT was more efficient, flexible, and used less
memory array than the IMSL mixed radix FFT and was chosen
as the best conventional mixed radix FFT.
Singleton's mixed radix FFT (labeled MFFT) and the fixed
radix FFTs were compared to the WFTA and PFA. The real
operations and memory required were tabulated and plotted for
all of the N length sequences permitted by WFTA and PFA.
This comparison showed that the WFTA and PFA required fewer
real operations but that the FFTs require less memory. The
MFFT was much more flexible than WFTA or PFA since N can be
any length sequence.
The WFTA and PFA were then more closely studied and
the tradeoffs between the two were discussed. The PFA uses
fewer additions but more multiplications for most N length
sequences, which means WFTA is more efficient when
multiplications are "costly" relative to additions. The PFA uses
less memory than the WFTA, which makes PFA preferable when
the machine is memory limited. Further criteria considered
in selecting between these two algorithms are (1) the
machine language and (2) the particular application of the
algorithms. If the machine language permits "shifts" to be
used for multiplication by 1/2 the PFA performance can be
improved. (The percentage improvements have been tabulated
for all permissible PFA sequence lengths.) The second
consideration affects the WFTA since any repeated use of WFTA
for the same length N sequence does not require the algorithm
to re-initialize the multiplier coefficients. Improvements
in operating speeds of 40% over the initial WFTA were realized
on the Cyber 74 for various sequence lengths.
An algorithm to select the most efficient DFT method
from WFTA (Winograd), MFFT (Singleton), fixed radix FFTs,
and PFA (Kolba and Parks) was presented. This selection is
based on minimizing real operations and minimizing memory
size for the machine used. Minimizing real operations is
the best "first order" criterion (Singleton, 1969) and was
verified by timing the transforms on the CDC Cyber 74. A
summary of the above conclusions is presented in Table 5.1.
The PFA was chosen as the best DFT technique because
it minimizes real operations well below the FFT levels,
requires substantially less memory than WFTA, and is more
flexible than the fixed radix FFTs. Of course, the "optimum"
algorithm depends on the specific application and computer,
but for general applications the PFA provides the best mix
of minimizing real operations and memory.
5.2 Recommendations
The above conclusions related to an algorithm's
efficiency were based on real operations and then verified
by timing tests on the CDC Cyber 74. The Cyber 74 is a
representative large main frame computer with very high
speed additions and multiplications.
To further substantiate the conclusions of this paper
it is recommended that similar timing tests be made on other
computers (large and small) available at AFIT and the results
compared to the predicted efficiencies based on real additions
and multiplications. All of the data necessary to perform
these tests is available in this paper.
ill ±ographv
1. Bergland, G.D. "A Fast Fourier Transform Using Base-8 Iterations," Math. Comp., Vol. 22, pp. 275-279, April 1968.
2. Brenner, N.M. "Three FORTRAN Programs that Perform the Cooley-Tukey Fourier Transform," M.I.T. Lincoln Lab, Lexington, MASS, Tech Note 1967-2, July 1967.
3. Brigham, Oran E. The Fast Fourier Transform. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1974.
4. Burrus, C.S. and P.W. Eschenbacher. "An In-Place, In-Order Prime Factor FFT Algorithm," Unpublished Article. School of Electrical Engineering, Rice University, Houston, TX 77001, 1980.
5. Burrus, C.S. and T.W. Parks. "Efficient Techniques for Signal Processing," Final Report, Contract DASG50-78-C-0082, Ballistic Missile Defense Advanced Defence Center, 30 June 1979 (AD B039920L).
6. Cooley, J.W. and J.W. Tukey. "An Algorithm for the Machine Calculation of Complex Fourier Series," Mathematics of Computation, Vol. 19, No. 90, pp. 297-301, 1965.
7. Dubois, E. and A.N. Venetsanopoulos. "A New Algorithm for the Radix-3 FFT," IEEE Trans. on Acoustics, Speech, and Sig. Processing, Vol. ASSP-26, No. 3, pp. 222-225, June 1978.
8. Gentleman, W.M. and G. Sande. "Fast Fourier Transforms for Fun and Profit," 1966 Fall Joint Computer Conf., AFIPS Proc., Vol. 29, Washington, D.C., Spartan, pp. 563-578, 1966.
9. Kolba, D.P. A Prime Factor Algorithm Using High Speed Convolution. MS Thesis. Houston, Texas: Rice University School of Engineering, May 1977.
10. Kolba, D.P. and T.W. Parks. "A Prime Factor Algorithm Using High Speed Convolution," IEEE Trans. on Acoustics, Speech, and Sig. Processing, Vol. ASSP-25, No. 4, August 1977.
11. McClellan, J.H. and H. Nawab. "Complex General-N Winograd Fourier Transform Algorithm (WFTA)," Programs for Digital Signal Processing, Edited by Digital Signal Processing Committee, IEEE Acoustics, Speech, and Sig. Processing Society, IEEE Press. New York: John Wiley and Sons, Inc., 1979.
12. McClellan, J.H. and C.M. Rader. Number Theory in Digital Signal Processing. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1979.
13. Morris, L.R. "A Comparative Study of Time Efficient FFT and WFTA Programs for General Purpose Computers," IEEE Trans. on Acoustics, Speech, and Sig. Processing, Vol. ASSP-26, No. 2, April 1978.
14. Oppenheim, A.V. and R.W. Schafer. Digital Signal Processing. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1975.
15. Pollard, J.M. "The Fast Fourier Transform in a Finite Field," Math. Comp., Vol. 25, April 1971.
16. Rabiner, L.R. and B. Gold. Theory and Application of Digital Signal Processing. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1975.
17. Rader, C.M. "Discrete Fourier Transforms When the Number of Data Samples is Prime," Proc. IEEE, Vol. 56, pp. 1107-8, June 1968.
18. Reed, I.S. and T.K. Truong. "The Use of Finite Fields to Compute Convolutions," IEEE Trans. on Inf. Theory, Vol. IT-21, No. 2, March 1975.
19. Silverman, H.F. "An Introduction to Programming the Winograd Fourier Transform Algorithm," IEEE Trans. on Acoustics, Speech, and Sig. Processing, Vol. ASSP-25, No. 2, April 1977.
20. Singleton, R.C. "On Computing the Fast Fourier Transform," Communications of the ACM, Vol. 10, No. 10, p. 651, October 1967.
21. Singleton, R.C. "An Algorithm for Computing the Mixed Radix Fast Fourier Transform," IEEE Trans. on Audio and Electroacoustics, Vol. AU-17, pp. 93-103, June 1969.
22. Thomas, L.H. "Using a Computer to Solve Problems in Physics," Applications of Digital Computers, Boston, MASS: Ginn, 1963.
23. Winograd, S. "Some Bilinear Forms Whose Multiplicative Complexity Depends on the Field of Constants," IBM T.J. Watson Res. Ctr., Yorktown Heights, NY, IBM Res. Rep., RC 5669, 10 October 1975.
24. Winograd, S. "On Computing the Discrete Fourier Transform," Proc. Nat. Acad. Sci. USA, Vol. 73, No. 4, pp. 1005-6, April 1976.
25. Zohar, S. "Fast Fourier Transformation: The Algorithm of S. Winograd," NASA, Jet Propulsion Lab, California Institute of Technology, Pasadena, CA, 15 February 79 (AS N79-19733).
Appendix A. Radix-2 FFT Algorithm
This appendix presents an algorithm for computing the
complex fast Fourier transform (FFT) defined by:
$$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j 2\pi nk/N)$$
where k = 0, 1, ..., N-1
and N = 2^M, M an integer.
A FORTRAN subroutine is listed that computes the radix-2
FFT for a single-variate forward complex Fourier
transform, or for one variate of a multivariate
transform.
Arguments.
A = The complex array to be transformed which is
dimensioned to length N.
N = The integer sequence length to be transformed,
which must equal 2^M.
M = The integer power of 2.
Usage. For a single variate forward transform:
(1) Specify the input complex sequence A along with
parameters M and N.
(2) Dimension complex array A to length N.
(3) Call FFT2C (A,M,N).
(4) A contains the complex output vector X(k).
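As a rough illustration of what a call such as FFT2C (A,M,N) computes, the following Python sketch performs an in-place radix-2 decimation-in-time FFT. It is not a transcription of the FORTRAN listing: the name `fft2c_sketch`, the bit-reversal loop, and the use of Python complex numbers are assumptions made for this example.

```python
import cmath


def fft2c_sketch(a, m, n):
    """In-place radix-2 decimation-in-time FFT of list a, with n = 2**m."""
    if n != 2 ** m or len(a) != n:
        raise ValueError("n must equal 2**m and match len(a)")

    # Reorder the input into bit-reversed order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # m stages of two-point butterflies; the butterfly span doubles each stage.
    span = 1
    for _ in range(m):
        w_step = cmath.exp(-1j * cmath.pi / span)  # exp(-j*2*pi/(2*span))
        for start in range(0, n, 2 * span):
            w = 1.0 + 0.0j
            for k in range(start, start + span):
                t = w * a[k + span]
                a[k + span] = a[k] - t
                a[k] = a[k] + t
                w *= w_step
        span *= 2
    return a
```

The list is overwritten with X(k), mirroring step (4) above, and is also returned for convenience.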
[FORTRAN listing of subroutine FFT2C: not legible in this copy.]
Appendix B. Radix-3 FFT Algorithm
This section presents an algorithm for computing the
fast Fourier transform (FFT) based on a method called
decimation-in-time described in Chapter III. This algorithm
is an efficient method for computing the
transformation:
$$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j 2\pi nk/N), \quad k = 0, 1, 2, \ldots, N-1$$
where X(k) and x(n) are complex valued. This algorithm
requires that the sequence length be N = 3^m, m = 0, 1, 2, ....
This appendix lists a FORTRAN subroutine for computing
the radix-3 FFT. This subroutine computes the single-
variate complex Fourier transform or calculates one
variate of a multivariate transform.
Arguments.
A = The real part of the array to be transformed
which is dimensioned to length N.
B = The imaginary part of the array to be transformed
which is dimensioned to length N.
M = The exponent of 3.
N = The length of the data sequence (N = 3^M).
IW = A work vector of length M.
WKS and WKC = Storage arrays of length N used for
sine and cosine lookup tables.
Usage. For a single variate forward transform:
(1) Specify the input sequences A and B along with the
parameters M and N.
(2) Dimension A,B,IW,WKS and WKC to the correct lengths.
(3) Call FFT3TM (A,B,M,N,IW,WKC,WKS).
(4) A and B are the output real and imaginary portion of
the complex vector X(k).
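The decimation-in-time idea behind the radix-3 routine can be sketched in Python as follows. This is an illustrative recursive version, not the iterative table-driven FORTRAN subroutine; the names `fft3_complex` and `fft3tm_sketch` are hypothetical, and the twiddle factors are computed directly rather than read from the WKC and WKS lookup tables.

```python
import cmath


def fft3_complex(x):
    """Recursive radix-3 decimation-in-time FFT; len(x) must be a power of 3."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 3:
        raise ValueError("sequence length must be a power of 3")
    third = n // 3
    # N/3-point DFTs of the decimated sequences x(3r), x(3r+1), x(3r+2).
    sub = [fft3_complex(x[q::3]) for q in range(3)]
    w3 = cmath.exp(-2j * cmath.pi / 3)  # constant butterfly multiplier
    out = [0j] * n
    for k in range(third):
        t0 = sub[0][k]
        t1 = sub[1][k] * cmath.exp(-2j * cmath.pi * k / n)      # twiddle W_N^k
        t2 = sub[2][k] * cmath.exp(-2j * cmath.pi * 2 * k / n)  # twiddle W_N^2k
        out[k] = t0 + t1 + t2
        out[k + third] = t0 + w3 * t1 + w3 ** 2 * t2
        out[k + 2 * third] = t0 + w3 ** 2 * t1 + w3 ** 4 * t2
    return out


def fft3tm_sketch(a, b):
    """Overwrite lists a (real parts) and b (imaginary parts) with X(k)."""
    y = fft3_complex([complex(re, im) for re, im in zip(a, b)])
    a[:] = [v.real for v in y]
    b[:] = [v.imag for v in y]
```

As in the usage steps above, the separate real and imaginary arrays hold the transform on return.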
[FORTRAN listing of subroutine FFT3TM. Much of the listing is not legible in this copy; only the recoverable comments and statements are reproduced, and "C ..." marks illegible passages.]

C     IW:  WORK VECTOR OF LENGTH M.
C     WKC: ARRAY FOR STORAGE OF THE COSINE TERMS USED TO TWIDDLE
C     WKS: ARRAY USED TO STORE THE SINE TERMS TO TWIDDLE THE RESULTS.
C     N:   LENGTH OF THE ARRAY TO BE TRANSFORMED.
C     AUTHOR: JOHN D. BLANKEN, CAPT, USAF   4 AUG 80
C ...
C     COMPUTE THE SINE AND COSINE LOOKUP TABLES USING THE RECURSIVE
C ...
C     THE INTEGER D IS THE DISTANCE BETWEEN BUTTERFLIES (BF)
C     WHICH HAVE THE SAME COMPLEX TWIDDLE FACTORS (TF)
      D=3**L
C     TYPES OF BF IN STAGE WHICH USE DIFFERENT TF:
      L1=L-1
      TYPE=3**L1
C     INITIALIZE THE TWIDDLE FACTORS:
C ...
C     COMPUTE DISTANCE BETWEEN BF ENDPOINTS FOR THIS STAGE:
      R=TYPE
C ...
      DO 40 J=1,TYPE
C     FIRST STAGE HAS NO TF'S SO SKIP TF COMPUTATION
      IF(L.EQ.1) GO TO 60
      IF(J.EQ.1) GO TO 60
C ...
C     COMPUTE THE BF:
C ...
C     END OF FFT3TM SUBROUTINE
      END
Appendix C. Radix-3 FFT in R(u)
This appendix presents an algorithm for computing
the radix-3 FFT based on a method which transforms the
array from the complex domain (1,i) to the R(u) domain
(1,u) .
Arguments
A = Real portion of the complex data sequence to be
transformed. It is dimensioned to length N.
B = Imaginary portion of the complex data sequence
to be transformed. It is dimensioned to length N.
M = The exponent of 3.
N = The length of the data sequence (N = 3^M).
IW = Work vector dimensioned to length M.
WKC and WKS = Storage arrays dimensioned to length N
and used for sine and cosine look up tables.
RTEST = Set equal to zero or one. If the data sequence
is real, RTEST=1; if the data sequence is complex, RTEST=0.
Usage. This algorithm is an efficient method for
computing the transformation:
$$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j 2\pi nk/N), \quad k = 0, 1, \ldots, N-1$$
where X(k) and x(n) are complex valued. This algorithm
restricts N to equal 3^M where M = 0, 1, 2, ....
For a single variate forward transform:
(1) Specify the input sequences A and B along with
parameters M, N, and RTEST.
(2) Dimension A,B,WKC,WKS, and IW.
(3) Call FFT3RU (A,B,M,N,IW,WKC,WKS,RTEST).
(4) A and B are the output real and imaginary portion of
the complex vector X(k).
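The change of basis from (1,i) to (1,u) can be illustrated with a small Python sketch. It assumes that u denotes the primitive cube root of unity exp(j2π/3), as in the radix-3 algorithm of Dubois and Venetsanopoulos; this appendix does not define u at this point, so the definition, the function names, and the (p, q) pair representation are all assumptions of the example. Under that assumption u² = -1 - u, so multiplying by u needs no real multiplications.

```python
import math

SQRT3 = math.sqrt(3.0)


def to_ru(z):
    """Express complex z = a + jb as (p, q) with z = p + q*u, u = exp(j*2*pi/3)."""
    q = 2.0 * z.imag / SQRT3
    p = z.real + z.imag / SQRT3
    return (p, q)


def from_ru(pq):
    """Convert (p, q), meaning p + q*u, back to an ordinary complex number."""
    p, q = pq
    return complex(p - 0.5 * q, 0.5 * SQRT3 * q)


def mul_by_u(pq):
    """Multiply p + q*u by u using only a subtraction: u*u = -1 - u."""
    p, q = pq
    return (-q, p - q)


def mul_ru(x, y):
    """General product of two R(u) numbers."""
    p1, q1 = x
    p2, q2 = y
    return (p1 * p2 - q1 * q2, p1 * q2 + q1 * p2 - q1 * q2)
```

The multiplication-free `mul_by_u` is the property that makes this representation attractive for the three-point butterfly.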
[FORTRAN listing of subroutine FFT3RU. Much of the listing is not legible in this copy; only the recoverable comments and statements are reproduced, and "C ..." marks illegible passages.]

C     IW:    WORK VECTOR OF LENGTH M.
C     N:     LENGTH OF THE ARRAY TO BE TRANSFORMED.
C     RTEST: TEST FLAG = 1 IF REAL TRANSFORM
C            TEST FLAG = 0 IF COMPLEX TRANSFORM
C ...
C     TEST FOR REAL OR COMPLEX ARRAY
      IF(RTEST.EQ.1) GO TO 64
C ...
C     CONVERT TO R(U) NOTATION
C ...
      DO 40 J=1,TYPE
C     FIRST STAGE HAS NO TF'S SO SKIP TF COMPUTATION
      IF(L.EQ.1) GO TO 60
      IF(J.EQ.1) GO TO 60
C ...
   50 CONTINUE
   40 CONTINUE
C ...
Appendix D. Radix-5 FFT Algorithm
This section presents an algorithm for computing the
FFT based on decimation-in-time of the discrete Fourier
transform defined by:
$$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j 2\pi nk/N), \quad k = 0, 1, 2, \ldots, N-1$$
where X(k) and x(n) are complex valued. This algorithm
restricts the length of the sequence to be N = 5^m where m
is an integer.
In this appendix a FORTRAN subroutine FFT5TF is listed
for computing the radix-5 FFT. This subroutine computes
the single-variate complex Fourier transform or performs
the calculation for one variate of a multivariate transform.
Arguments
A = Real portion of the complex data sequence to be
transformed. It is dimensioned to length N.
B = Imaginary portion of the complex data sequence.
It is dimensioned to length N.
M = Exponent of 5, where N = 5^M.
N = Length of the data sequence (N = 5^M).
IW = Work vector of length M.
WKC and WKS = Storage arrays dimensioned to length N
and used for sine and cosine look up tables.
Usage. For a single variate forward transform:
(1) Specify the input sequences A and B along with the
parameters M and N.
(2) Dimension A,B,IW,WKS and WKC to correct lengths.
(3) A and B are the output real and imaginary portion of
the complex vector X(k).
[FORTRAN listing of subroutine FFT5TF. Much of the listing is not legible in this copy; only the recoverable comments and statements are reproduced, and "C ..." marks illegible passages.]

C     ON OUTPUT A IS REPLACED BY THE FOURIER TRANSFORM.
C     M:   INPUT EXPONENT TO WHICH 5 IS RAISED.
C     N:   LENGTH OF THE ARRAY TO BE TRANSFORMED.
C     IW:  WORK VECTOR OF LENGTH M.
C     WKC: ARRAY OF LENGTH N USED TO STORE COSINE TERMS.
C     WKS: ARRAY OF LENGTH N USED TO STORE SINE TERMS.
C     AUTHOR: JOHN D. BLANKEN, CAPT, USAF
C     START OF SUBROUTINE FFT5TF
C ...
C     COMPUTE BASE NOS. OF COUNTER
C ...
C     ZERO THE NCOUNT(J) ARRAY
      DO 24 J=1,M
   24 NCOUNT(J)=0
C ...
C     THE INTEGER D IS THE DISTANCE BETWEEN BUTTERFLIES (BF)
C     WHICH HAVE THE SAME COMPLEX TWIDDLE FACTORS (TF)
      D=5**L
C ...
C     INITIALIZE THE TWIDDLE FACTORS
C ...
C     TWIDDLE THE BF INPUTS AND STORE RESULTS IN TEMP LOCATIONS
      A1T=A(I1)
      B1T=B(I1)
C     IF L=1, NO TF'S ARE REQUIRED
      IF(L.EQ.1) GO TO 61
C ...
C     END OF FFT5TF SUBROUTINE
      END
Radix-5 FFT Theory
This section presents the theory of the radix-5 FFT
starting with the DFT definition and then decomposing the
DFT equation using the decimation-in-time algorithm
(Cooley and Tukey, 1965). This development closely
parallels the radix-3 development presented earlier and
consequently the radix-5 theory will be brief.
The DFT X(k) is computed by separating the discrete
time sequence x(n) into five N/5 point sequences (N must
equal 5^m, m = 0, 1, 2, ...). X(k) is given by the
DFT expression:
$$X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk} \tag{D.1}$$
where k = 0, 1, ..., N-1 and W_N = exp(-j2π/N).
Breaking x(n) into five N/5 point sequences yields x(5r),
x(5r+1), x(5r+2), x(5r+3), and x(5r+4). Using these
sequences and Eq (D.1) gives:
$$X(k) = \sum_{r=0}^{N/5-1} x(5r) W_N^{5rk} + \sum_{r=0}^{N/5-1} x(5r+1) W_N^{(5r+1)k} + \sum_{r=0}^{N/5-1} x(5r+2) W_N^{(5r+2)k} + \sum_{r=0}^{N/5-1} x(5r+3) W_N^{(5r+3)k} + \sum_{r=0}^{N/5-1} x(5r+4) W_N^{(5r+4)k} \tag{D.2}$$
By regrouping exponents and making the substitution of:
$$W_N^{5r} = W_{N/5}^{r} \tag{D.3}$$
then Eq (D.2) can be written in final form as:
$$X(k) = \sum_{r=0}^{N/5-1} x(5r) W_{N/5}^{rk} + W_N^{k} \sum_{r=0}^{N/5-1} x(5r+1) W_{N/5}^{rk} + W_N^{2k} \sum_{r=0}^{N/5-1} x(5r+2) W_{N/5}^{rk} + W_N^{3k} \sum_{r=0}^{N/5-1} x(5r+3) W_{N/5}^{rk} + W_N^{4k} \sum_{r=0}^{N/5-1} x(5r+4) W_{N/5}^{rk} \tag{D.4}$$
Each of the N/5 point DFTs in Eq (D.4) represents an N/5
length sequence and the W_N terms in front of the summations
are the butterfly multipliers.
Eq (D.4) can be rewritten to reflect the N/5 point
DFTs as:
$$X(k) = A(m) + W_N^{k} B(m) + W_N^{2k} C(m) + W_N^{3k} D(m) + W_N^{4k} E(m) \tag{D.5}$$
For N = 5^2 = 25 the Eq (D.5) representation is shown in Figure
D.1 and uses a less cumbersome FFT notation (Rabiner and
Gold, 1975). X(k) is obtained by evaluating Eq (D.5) as:
$$\begin{aligned}
X(0) &= A(0) + B(0) + C(0) + D(0) + E(0) \\
X(1) &= A(1) + W_{25}^{1} B(1) + W_{25}^{2} C(1) + W_{25}^{3} D(1) + W_{25}^{4} E(1) \\
X(2) &= A(2) + W_{25}^{2} B(2) + W_{25}^{4} C(2) + W_{25}^{6} D(2) + W_{25}^{8} E(2) \\
&\;\;\vdots \\
X(6) &= A(1) + W_{25}^{6} B(1) + W_{25}^{12} C(1) + W_{25}^{18} D(1) + W_{25}^{24} E(1)
\end{aligned}$$
[Figure D.1. First Stage Decimation for N=25.]
$$\begin{aligned}
&\;\;\vdots \\
X(24) &= A(4) + W_{25}^{24} B(4) + W_{25}^{48} C(4) + W_{25}^{72} D(4) + W_{25}^{96} E(4)
\end{aligned}$$
The above expressions explicitly describe the first
stage decimation for N=25. The next step is to evaluate
A(m) through E(m), which are also 5-point DFTs. The 5-point DFT
for A(m) can be evaluated as:
$$A(m) = \sum_{r=0}^{N/5-1} a(r) W_{N/5}^{rm} \tag{D.6}$$
which results in five N/25 length sequences:
$$A(m) = \sum_{i=0}^{N/25-1} a(5i) W_{N/25}^{im} + W_{N/5}^{m} \sum_{i=0}^{N/25-1} a(5i+1) W_{N/25}^{im} + W_{N/5}^{2m} \sum_{i=0}^{N/25-1} a(5i+2) W_{N/25}^{im} + W_{N/5}^{3m} \sum_{i=0}^{N/25-1} a(5i+3) W_{N/25}^{im} + W_{N/5}^{4m} \sum_{i=0}^{N/25-1} a(5i+4) W_{N/25}^{im} \tag{D.7}$$
It can be seen from Figure D.1 that a(5i) = x(0),
a(5i+1) = x(5), a(5i+2) = x(10), a(5i+3) = x(15), and
a(5i+4) = x(20) for the 5-point DFT of A(m). The final
expression for the A(m) 5-point DFT is given from Eq (D.7)
where N=25:
[Figure D.2. Radix-5 twiddle factor butterfly.]
$$A(0) = a(0) + W_5^{0} a(1) + W_5^{0} a(2) + W_5^{0} a(3) + W_5^{0} a(4) \tag{D.8}$$
$$A(1) = a(0) + W_5^{1} a(1) + W_5^{2} a(2) + W_5^{3} a(3) + W_5^{4} a(4) \tag{D.9}$$
$$A(2) = a(0) + W_5^{2} a(1) + W_5^{4} a(2) + W_5^{6} a(3) + W_5^{8} a(4) \tag{D.10}$$
$$A(3) = a(0) + W_5^{3} a(1) + W_5^{6} a(2) + W_5^{9} a(3) + W_5^{12} a(4) \tag{D.11}$$
$$A(4) = a(0) + W_5^{4} a(1) + W_5^{8} a(2) + W_5^{12} a(3) + W_5^{16} a(4) \tag{D.12}$$
From Eqs (D.8) - (D.12) the basic butterfly multipliers are
derived to be:
$$X(k) = A(k) + W_N^{k} B(k) + W_N^{2k} C(k) + W_N^{3k} D(k) + W_N^{4k} E(k) \tag{D.13}$$
$$X(k+r) = A(k) + W_N^{k+r} B(k) + W_N^{2k+2r} C(k) + W_N^{3k+3r} D(k) + W_N^{4k+4r} E(k) \tag{D.14}$$
$$X(k+2r) = A(k) + W_N^{k+2r} B(k) + W_N^{2k+4r} C(k) + W_N^{3k+6r} D(k) + W_N^{4k+8r} E(k) \tag{D.15}$$
$$X(k+3r) = A(k) + W_N^{k+3r} B(k) + W_N^{2k+6r} C(k) + W_N^{3k+9r} D(k) + W_N^{4k+12r} E(k) \tag{D.16}$$
$$X(k+4r) = A(k) + W_N^{k+4r} B(k) + W_N^{2k+8r} C(k) + W_N^{3k+12r} D(k) + W_N^{4k+16r} E(k) \tag{D.17}$$
The Eqs (D.13) - (D.17) are shown in the twiddle factor
butterfly of Figure D.2 where "r" is the distance between
the butterfly end points. Since N=5r the butterfly
multipliers reduce to constant complex multipliers of:
$$\begin{aligned}
W_N^{r} &= W_N^{6r} = W_N^{16r} = \cos(2\pi/5) - j \sin(2\pi/5) \\
W_N^{2r} &= W_N^{12r} = \cos(4\pi/5) - j \sin(4\pi/5) \\
W_N^{3r} &= (W_N^{2r})^{*} = W_N^{8r} = \cos(4\pi/5) + j \sin(4\pi/5) \\
W_N^{4r} &= (W_N^{r})^{*} = W_N^{9r} = \cos(2\pi/5) + j \sin(2\pi/5)
\end{aligned}$$
These constant butterfly multipliers are computed once
during the FFT computation and used in every radix-5
butterfly.
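The radix-5 decomposition of Eqs (D.4) and (D.13) - (D.17) can be sketched in Python as below. This is a recursive illustration under stated assumptions, not the FFT5TF subroutine: the names `W5` and `fft5_sketch` are hypothetical, and the twiddle factors are evaluated directly instead of being taken from lookup tables. The list `W5` holds the constant butterfly multipliers, computed once and reused in every butterfly because W_N^r = W_5 when N = 5r.

```python
import cmath

# Constant butterfly multipliers W_5^0 .. W_5^4, computed once.
W5 = [cmath.exp(-2j * cmath.pi * i / 5) for i in range(5)]


def fft5_sketch(x):
    """Recursive radix-5 decimation-in-time FFT; len(x) must be a power of 5."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 5:
        raise ValueError("sequence length must be a power of 5")
    r = n // 5  # distance between butterfly end points
    # N/5-point DFTs of x(5r), x(5r+1), ..., x(5r+4), as in Eq (D.4).
    sub = [fft5_sketch(x[q::5]) for q in range(5)]
    out = [0j] * n
    for k in range(r):
        # Apply the twiddle factors W_N^(pk) to the five butterfly inputs.
        t = [sub[p][k] * cmath.exp(-2j * cmath.pi * p * k / n) for p in range(5)]
        # Five-point butterfly: output k + q*r uses multipliers W_5^(p*q).
        for q in range(5):
            out[k + q * r] = sum(t[p] * W5[(p * q) % 5] for p in range(5))
    return out
```

Reducing the exponent p*q modulo 5 is what collapses multipliers such as W_N^{6r} and W_N^{16r} onto W_N^{r}.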
Appendix E. Mixed Radix FFT Algorithm
This section presents an algorithm for computing the
FFT based on the discrete Fourier transform:
$$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j 2\pi nk/N)$$
The algorithm described here can accept an N length sequence
which is factorable by 2, 3, 4, or 5. To aid in selecting
an appropriate sequence length for this algorithm, a list of
numbers less than 50,000 containing no prime factors larger
than five is given in Table E.
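A list of admissible lengths like the one in Table E can be regenerated with a few lines of Python. The function name `smooth_lengths` is hypothetical, and including the trivial length 1 is a choice made for this sketch rather than something taken from the table.

```python
def smooth_lengths(limit):
    """Return, in increasing order, all n < limit with no prime factor above 5."""
    found = []
    p2 = 1
    while p2 < limit:
        p3 = p2
        while p3 < limit:
            p5 = p3
            while p5 < limit:
                found.append(p5)  # p5 = 2**a * 3**b * 5**c
                p5 *= 5
            p3 *= 3
        p2 *= 2
    return sorted(found)
```

For example, `smooth_lengths(50000)` contains 480 but not 490, since 490 has the prime factor 7.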
Arguments
A = The real portion of the complex data sequence to
be transformed. It is dimensioned to length N.
B = Imaginary portion of the complex data sequence to
be transformed. It is dimensioned to length N.
M = Number of factors of N.
WKC and WKS = Storage arrays dimensioned to length N
and used for sine and cosine look up tables.
N = Length of the sequence to be transformed. N
must be an integer power of 2, 3, 4, 5, or a combination
thereof.
AT and BT = Arrays used in the subroutine for tem-
porary storage of A and B during the data reordering (digit
reversal).
NFAC = Contains all the factors of N. NFAC is computed
by the user and passed to the subroutine in the argument list.
Dimensioned to length M.
IWK = Array containing the integer powers of the factors
of N, dimensioned to length 4.
IWK(1) = powers of 5
IWK(2) = powers of 4
IWK(3) = powers of 3
IWK(4) = powers of 2 (must be 0 or 1)
Usage. The subroutine listed permits a maximum of 11
factors which is adequate for any N less than 2^16 with the
factoring used by this subroutine.
(1) Dimension arrays A,B,AT,BT,WKC, and WKS to length
N and array NFAC to length M.
(2) Factor N and store the factors in array NFAC. Array NFAC
must contain the factors of N starting with the
highest factor, 5, and continuing to the lowest, 2.
E.g., N=480: NFAC(1) = 5, NFAC(2) = 4, NFAC(3) = 4, NFAC(4) = 3, NFAC(5) = 2.
(3) Specify the integer powers of 2, 3, 4, and 5 in the
array IWK.
E.g., N=480: IWK(1) = 1, IWK(2) = 2, IWK(3) = 1, IWK(4) = 1.
In general,
N = 2^m 3^n 4^p 5^q and
IWK(1) = q, IWK(2) = p, IWK(3) = n, IWK(4) = m.
(4) Specify values for A the real part of data sequence
and B the imaginary part of the data sequence.
(5) Call FFTMR (A,B,M,N,WKC,WKS,AT,BT,NFAC,IWK).
(6) A and B contain the real and imaginary part of the
transform X(k).
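Steps (2) and (3) ask the user to supply NFAC and IWK by hand. A small helper along the following lines could produce them; the function name `factor_for_fftmr` is hypothetical, and pairing the factors of 2 into as many factors of 4 as possible is an assumption inferred from the rule that IWK(4) must be 0 or 1.

```python
def factor_for_fftmr(n):
    """Return (nfac, iwk): factors ordered 5, 4, 3, 2 and their powers."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    counts = {}
    rest = n
    for prime in (5, 3, 2):
        counts[prime] = 0
        while rest % prime == 0:
            rest //= prime
            counts[prime] += 1
    if rest != 1:
        raise ValueError("n has a prime factor larger than 5")
    # Pair the 2s into radix-4 factors; at most one factor of 2 is left over.
    fours, twos = divmod(counts[2], 2)
    iwk = [counts[5], fours, counts[3], twos]  # IWK(1)..IWK(4)
    nfac = [5] * iwk[0] + [4] * iwk[1] + [3] * iwk[2] + [2] * iwk[3]
    return nfac, iwk
```

For N=480 this reproduces the example above: NFAC = 5, 4, 4, 3, 2 and IWK = 1, 2, 1, 1, with M equal to the length of NFAC.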
[Table E. Numbers less than 50,000 containing no prime factors larger than five. The table entries are not legible in this copy.]
[FORTRAN listing of subroutine FFTMR. Much of the listing is not legible in this copy; only the recoverable comments and statements are reproduced, and "C ..." marks illegible passages.]

C     ON OUTPUT A IS REPLACED BY THE FOURIER TRANSFORM.
C     M:   NUMBER OF FACTORS OF N.
C     N:   THE LENGTH OF THE SEQUENCE TO BE TRANSFORMED.
C     WKC: DIMENSIONED TO LENGTH N AND CONTAINS THE COSINE TERMS FOR
C          THE FFT.
C     WKS: DIMENSIONED TO LENGTH N AND CONTAINS THE SINE TERMS FOR
C          THE FFT.
C ...
C     SHUFFLE THE INPUT ARRAY TO REVERSE DIGITS
      FFTIN=SECOND(CP)
      DO 135 I=1,N
      AT(I)=A(I)
  135 BT(I)=B(I)
C     COMPUTE THE BASE NUMBERS OF THE COUNTER
C ...
C     CHECK IF SHUFFLE IS NECESSARY ON THIS PAIR
C ...
C     COMPUTE THE TRANSFORM RADIX 5 SECTION
C     ARE THERE POWERS OF 5?
C ...
C     ARE THERE ANY POWERS OF 4?
C ...
      A2T=A(I2)*TFA2-B(I2)*TFB2
      B2T=A(I2)*TFB2+B(I2)*TFA2
      GO TO 417
  415 A2T=A(I2)
      B2T=B(I2)
  417 A(I1)=A1T+A2T
      B(I1)=B1T+B2T
      A(I2)=A1T-A2T
      B(I2)=B1T-B2T
  480 CONTINUE
      PRINT*,"RADIX 2 DONE"
C     END RADIX 2
      FFTOUT=SECOND(CP)-FFTIN
      PRINT*,"TIME TO PERFORM FFT= ",FFTOUT
C     END OF FFTMR SUBROUTINE
  300 RETURN
      END
Operations Count for FFTMR
The operations count for the factorization used in
this algorithm is a function of (1) the number of butterflies,
(2) the number of complex twiddle factors, and
(3) the number of times the cosine and sine difference
equations must be computed. The number of butterflies in
a mixed radix algorithm has been shown to be (Singleton, 1969):
$$\sum_{i=1}^{m} (N/p_i) \tag{E.1}$$
and the number of complex twiddle factors is:
$$\sum_{i=1}^{m} (N(p_i-1)/p_i) - (N-1) \tag{E.2}$$
where N = p_1 p_2 ... p_m. The radices in this algorithm are
restricted to:
$$N = 2^r 3^s 4^t 5^u \tag{E.3}$$
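Eqs (E.1) and (E.2) are easy to evaluate for a concrete factorization. The short Python sketch below does so for a list of radices; the function name `butterfly_and_twiddle_counts` is hypothetical and the code simply mirrors the two sums.

```python
def butterfly_and_twiddle_counts(factors):
    """Evaluate Eq (E.1) and Eq (E.2) for N equal to the product of factors."""
    n = 1
    for p in factors:
        n *= p
    butterflies = sum(n // p for p in factors)                  # Eq (E.1)
    twiddles = sum(n * (p - 1) // p for p in factors) - (n - 1)  # Eq (E.2)
    return n, butterflies, twiddles
```

For the factors 5, 4, 4, 3, 2 (N=480) the sums give 96 + 120 + 120 + 160 + 240 = 736 butterflies and 1664 - 479 = 1185 complex twiddle factors.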
Given the factorization in Eq (E.3) the radix-2 section
(where p=2) has
$$\sum_{i=1}^{r} (N/p_i) = \sum_{i=1}^{r} (N/2) = rN/2 \tag{E.4}$$
butterflies which require four real additions each. The
number of complex twiddle factors for the radix-2 section is
given as:
$$\sum_{i=1}^{r} (N(p_i-1)/p_i) = \sum_{i=1}^{r} (N/2) = rN/2 \tag{E.5}$$
which require four real multiplications and two real
additions each. Notice that the N-1 term has not been
subtracted as in Eq (E.2). The N-1 term will be subtracted
after the total operations count has been derived for 3, 4,
and 5 factors and combined with factors of 2. Using Eqs
(E.4) - (E.5) and the number of additions and multiplications
required for each provides the operations count for the
radix-2 section as:
$$\text{real mult} = 4(rN/2) = 2rN \tag{E.6}$$
$$\text{real adds} = 4(rN/2) + 2(rN/2) = 3rN \tag{E.7}$$
The radix-3 section requires 4 real multiplications
and 12 real additions per butterfly and 4 real
multiplications and two additions per complex twiddle factor.
Using Eqs (E.1) and (E.2) the number of butterflies for
p=3 is:
$$\sum_{i=1}^{s} (N/p_i) = \sum_{i=1}^{s} (N/3) = sN/3 \tag{E.8}$$
and the number of twiddle factors (neglecting the N-1 term)
is:
$$\sum_{i=1}^{s} (N(p_i-1)/p_i) = 2sN/3 \tag{E.9}$$
Combining the additions and multiplications required for
each butterfly and twiddle factor with Eqs (E.8) - (E.9)
gives the operations count for the radix-3 section as:
$$\text{real mult} = 4(sN/3) + 4(2sN/3) = 4sN \tag{E.10}$$
$$\text{real adds} = 12(sN/3) + 2(2sN/3) = 16sN/3 \tag{E.11}$$
The radix-4 section has zero real multiplications
and 16 real additions per butterfly with 4 real
multiplications and 2 real additions per twiddle factor.
The number of butterflies, where p=4, is given by:
$$\sum_{i=1}^{t} (N/p_i) = \sum_{i=1}^{t} (N/4) = tN/4 \tag{E.12}$$
and the number of twiddle factors is:
$$\sum_{i=1}^{t} (N(p_i-1)/p_i) = \sum_{i=1}^{t} (3N/4) = 3tN/4 \tag{E.13}$$
Using the number of multiplications and additions per
butterfly and twiddle factor in Eqs (E.12) - (E.13) gives
the total operations for factors of 4 as:
$$\text{real mult} = 4(3tN/4) = 3tN \tag{E.14}$$
$$\text{real adds} = 16(tN/4) + 2(3tN/4) = 11tN/2 \tag{E.15}$$
The radix-5 section requires 16 real multiplications
and 32 additions per butterfly with 4 real multiplications
and 2 additions per twiddle factor. Using Eqs (E.1) and
(E.2) where p=5 gives the total butterflies as:
$$\sum_{i=1}^{u} (N/p_i) = \sum_{i=1}^{u} (N/5) = uN/5 \tag{E.16}$$
and the number of twiddle factors as:
$$\sum_{i=1}^{u} (N(p_i-1)/p_i) = \sum_{i=1}^{u} (4N/5) = 4uN/5 \tag{E.17}$$
Combining Eqs (E.16) - (E.17) with the operations required
for butterflies and twiddle factors in the radix-5 section
gives the total as:
$$\begin{aligned}
\text{real mult} &= 16(uN/5) + 4(4uN/5) = 32uN/5 \\
\text{real adds} &= 32(uN/5) + 2(4uN/5) = 8uN
\end{aligned} \tag{E.18}$$
Using the results of Eqs (E.4) - (E.18) and subtracting
the N-1 complex twiddles provides the number of real oper-
ations used for butterflies and twiddle factors for the mixed
radix algorithm. The expressions are:
real mult = 2rN + 4sN + 3tN + 32uN/5 - 4(N-1) (E.19)
real adds = 3rN + 16sN/3 + 11tN/2 + 8uN - 2(N-1) (E.20)
Recall that Eqs (E.19) - (E.20) account for only two
of the three sources of real operations in this algorithm.
The third source is computing the sine and cosine look up
table. From the FORTRAN program in this appendix the
expressions computing the look up table are:
WKC(I) = C*WKC(I-1) - S*WKS(I-1) + WKC(I-1) (E.21)
WKS(I) = C*WKS(I-1) + S*WKC(I-1) + WKS(I-1) (E.22)
Each equation requires 5 real additions and 2 real
multiplications and they are computed N-1 times for the
mixed radix FFT. The real operations required to compute
the look up table are:
real mult = 4(N-1) (E.23)
real adds = 10(N-1) (E.24)
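The recurrence of Eqs (E.21) - (E.22) can be sketched in Python as
follows. This is a minimal illustration and not the thesis FORTRAN.
It assumes that C = cos(theta) - 1 and S = sin(theta) with
theta = 2*pi/N, which is the choice that makes the recurrence
reproduce cos(I*theta) and sin(I*theta). The function name
build_trig_table is hypothetical.

```python
import math


def build_trig_table(n):
    """Build cosine and sine tables with the recurrence of Eqs (E.21)-(E.22).

    Assumption: C = cos(theta) - 1 and S = sin(theta), theta = 2*pi/n.
    """
    theta = 2.0 * math.pi / n
    c = -2.0 * math.sin(theta / 2.0) ** 2  # equals cos(theta) - 1
    s = math.sin(theta)
    wkc = [1.0] * n  # cos(0) = 1
    wks = [0.0] * n  # sin(0) = 0
    # The loop body is executed N-1 times, matching the count in the text.
    for i in range(1, n):
        wkc[i] = c * wkc[i - 1] - s * wks[i - 1] + wkc[i - 1]
        wks[i] = c * wks[i - 1] + s * wkc[i - 1] + wks[i - 1]
    return wkc, wks
```

Writing C as -2*sin(theta/2)**2 rather than cos(theta) - 1 avoids
subtracting two nearly equal numbers when theta is small.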
Combining Eqs (E.23) - (E.24) with the real operations
for butterflies and twiddle factors provides the total
real operations for the mixed radix FFT:
real mult = 2rN + 4sN + 3tN + 32uN/5 - 4(N-1) + 4(N-1)
          = 2rN + 4sN + 3tN + 32uN/5 (E.25)
real adds = 3rN + 16sN/3 + 11tN/2 + 8uN - 2(N-1) + 10(N-1)
          = 3rN + 16sN/3 + 11tN/2 + 8uN + 8(N-1) (E.26)
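As a check on Eqs (E.25) - (E.26), the totals can be evaluated for
a given factorization. The Python sketch below is illustrative only.
The function name mixed_radix_real_ops is hypothetical, and r, s, t,
and u are taken to be the number of radix-2, radix-3, radix-4, and
radix-5 stages so that N = 2^r 3^s 4^t 5^u.

```python
def mixed_radix_real_ops(n, r, s, t, u):
    """Evaluate Eqs (E.25)-(E.26) for N = 2**r * 3**s * 4**t * 5**u."""
    if n != 2**r * 3**s * 4**t * 5**u:
        raise ValueError("n does not match the supplied factor counts")
    # Each division is exact: a nonzero s, t, or u implies that
    # 3, 4, or 5 divides n.
    mult = 2 * r * n + 4 * s * n + 3 * t * n + 32 * u * n // 5
    adds = (3 * r * n + 16 * s * n // 3 + 11 * t * n // 2
            + 8 * u * n + 8 * (n - 1))
    return mult, adds
```

For N=30 (r=1, s=1, t=0, u=1) this gives 372 real multiplications
and 722 real additions.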
Development of the Mixed
Radix Digit-Reversed Algorithm
Assuming that the number of points to be transformed
satisfies N = r1 r2 ... rm, where r1, r2, ..., rm are
integer values, the indices of x(n) and X(k) can be
expressed as (Brigham, 1974):
$n = n_{m-1}(r_2 r_3 \cdots r_m) + n_{m-2}(r_3 r_4 \cdots r_m) + \cdots + n_1 r_m + n_0$    (E.27)
$k = k_{m-1}(r_1 r_2 \cdots r_{m-1}) + k_{m-2}(r_1 r_2 \cdots r_{m-2}) + \cdots + k_1 r_1 + k_0$    (E.28)
where
$k_{i-1} = 0, 1, 2, \ldots, r_i - 1$ ;  $1 \le i \le m$
$n_i = 0, 1, 2, \ldots, r_{m-i} - 1$ ;  $0 \le i \le m-1$
For N=30 = 2x3x5 = r1 r2 r3 and m=3 the input sequence
x(n) counter is:
n = n2 (15) + n1 (5) + n0 (E.29)
where
n0 = 0, 1, 2, 3, 4
n1 = 0, 1, 2
n2 = 0, 1
The output counter k for X(k) is:
k = k2 (6) + k1 (2) + k0
where
k0 = 0, 1
k1 = 0, 1, 2
k2 = 0, 1, 2, 3, 4
To implement the general digit reversed counter let the
input counter n use the digit reversed multipliers of the
output counter k:
$n = n_{m-1} + n_{m-2}(r_1) + \cdots + n_1(r_1 r_2 \cdots r_{m-2}) + n_0(r_1 r_2 \cdots r_{m-1})$    (E.30)
For the example r1 r2 r3 = 2x3x5 = 30 the digit reversed
counter becomes:
n = n2 + 2n1 + 6n0 (E.31)
where, as before:
n0 = 0, 1, 2, 3, 4
n1 = 0, 1, 2
n2 = 0, 1
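The digit-reversed counter of Eqs (E.30) - (E.31) can be illustrated
with a short Python sketch. It is not taken from the thesis program.
The function name digit_reversed_index is hypothetical, and the
radices are assumed to be supplied in the order r1, r2, ..., rm.

```python
def digit_reversed_index(n, radices):
    """Map a natural-order index n to its mixed radix digit-reversed index.

    radices = [r1, r2, ..., rm] with N = r1 * r2 * ... * rm.
    """
    # Extract the digits n0, n1, ..., n(m-1) of Eq (E.27); n0 counts
    # modulo rm, n1 modulo r(m-1), and so on.
    digits = []
    for r in reversed(radices):
        digits.append(n % r)
        n //= r
    # Reassemble with the multipliers of Eq (E.30):
    # n(m-1) * 1 + n(m-2) * r1 + ... + n0 * (r1 r2 ... r(m-1)).
    index, weight = 0, 1
    for digit, r in zip(reversed(digits), radices):
        index += digit * weight
        weight *= r
    return index


# Example for N = 30 = 2 x 3 x 5, i.e. n' = n2 + 2 n1 + 6 n0 as in Eq (E.31).
order = [digit_reversed_index(n, [2, 3, 5]) for n in range(30)]
```

The resulting list is a permutation of 0, 1, ..., 29. For example
n=1 (n0=1) maps to 6 and n=15 (n2=1) maps to 1.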
Appendix F. Singleton's Mixed Radix FFT
This program was written by R.C. Singleton and pub-
lished by the IEEE Press in "Programs for Digital Signal
Processing". It computes the DFT defined by:
$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j2\pi nk/N)$
It also computes the 1/N scaled inverse Fourier transform.
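For reference, the transform pair stated above can be written as a
direct O(N^2) sum. The Python sketch below is only a check on the
definition and is not Singleton's algorithm. The function name dft
is hypothetical, and the isn argument is an assumption modeled on
the sign convention of the ISN argument described later in this
appendix (negative for forward, positive for the 1/N scaled inverse).

```python
import cmath


def dft(x, isn=-1):
    """Direct evaluation of the DFT definition.

    isn < 0: forward transform, exp(-j*2*pi*n*k/N).
    isn > 0: inverse transform, exp(+j*2*pi*n*k/N), scaled by 1/N.
    """
    n = len(x)
    sign = -1.0 if isn < 0 else 1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n)
        out.append(acc)
    if isn > 0:
        out = [value / n for value in out]
    return out
```

Applying the forward and then the inverse transform should return
the original sequence to within rounding error.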
The subroutine listed in this appendix factors N into
"square" and "square-free" factors and stores the results
in an array NFAC. It then calls subroutine FFTMX to com-
pute the complex Fourier transform, twiddle the data, and
reorder the complex array to final order.
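The split into "square" and "square-free" factors can be sketched
as follows. This Python fragment is an illustration written for this
description and is not a transcription of the FFTSNG listing. The
function name, the use of 4 in place of 2x2, and the symmetric
ordering of the returned list (square factors, then square-free
factors, then the square factors in reverse) are assumptions.

```python
def factor_square_and_square_free(n):
    """Return a factor list: square factors, square-free part, mirrored squares."""
    square = []
    # A pair of 4s is removed for every factor of 16.
    while n % 16 == 0:
        square.append(4)
        n //= 16
    # Remove pairs of odd factors j*j.
    j = 3
    while j * j <= n:
        while n % (j * j) == 0:
            square.append(j)
            n //= j * j
        j += 2
    # What remains contains no repeated odd factor.
    free = []
    if n <= 4:
        if n > 1:
            free.append(n)
    else:
        if n % 4 == 0:
            free.append(4)
            n //= 4
        j = 2
        while n > 1:
            while n % j == 0:
                free.append(j)
                n //= j
            j = 3 if j == 2 else j + 2
    return square + free + square[::-1]
```

For example N=360 yields the factors 3, 4, 2, 5, 3, where 3 is the
square factor and 4, 2, 5 form the square-free part.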
Use of this subroutine for multi-variate transforms is
described in the comments section at the beginning of the
program. A multi-variate transform is basically a single-
variate transform with modified indexing (Singleton, 1977).
The subroutine listed permits sequence lengths that
have 15 or fewer factors.
The smallest number that has more than 15 factors is
12,754,584 and if this condition is encountered an error
message is printed.
The transform portion of the subroutine includes
sections for factors of 2, 3, 4, or 5 as well as a general
section for odd prime factors. The special sections for
2 and 4 include the twiddle factor multiplication in these
special sections instead of using the general twiddle factor
section. "Performing the transform in this manner pro-
duces a 10 percent speed improvement over the general
twiddle section" (Singleton, 1969). The special sections
for 3 and 5 are similar to the general odd factor section
but reduce the indexing required and thus improve the
speed (Singleton, 1969).
Arguments. The Singleton FFT for computing a complex
single-variate transform is called using the following
arguments:
A = The real part of the array to be transformed and
is dimensioned to length N.
B = The imaginary part of the array to be transformed
and is dimensioned to length N.
N = Length of the input sequence N which must be a
positive integer with no more than 15 factors.
NSPN = The spacing of consecutive data values while
indexing the current variable (in units determined by the
magnitude of ISN).
ISN = The sign of ISN determines the transform direc-
tion (negative for forward and positive for inverse). The
magnitude of ISN determines the indexing increment for
arrays A and B. Normally the magnitude of ISN is unity.
NSEG = An integer value such that NSEG x N x NSPN
equals the total number of complex data values.
Usage. For a single-variate forward transform:
(1) Specify the input sequences A and B and parameters
NSEG=1, N=transform length, NSPN=1, and ISN= -1.
(2) Dimension A and B to length N.
(3) Call FFTSNG (A,B,NSEG,N,NSPN,ISN).
(4) A and B are the output real and imaginary portions
of the complex vector X(k).
To perform a real valued, inverse, or multi-variate
transform refer to the comments portion of FFTSNG.
[FORTRAN listing of subroutine FFTSNG: not legible in this copy.]
The International Mathematical and Statistical Library
contains a mixed radix subroutine which can perform the
FFT of any positive integer length sequence. This sub-
routine was based on Singleton's article "On Computing the
Fast Fourier Transform", Comm. ACM 10(10) 1967 in which he
proposed several ideas used in the IMSL subroutine. As
stated in Chapter III the program closely resembles
Singleton's algorithm published in the open literature
but the IMSL version has been copyrighted and the FORTRAN
code is not listed in this paper. The IMSL description of
the algorithm and its usage are included in this appendix
for the convenience of the reader and a detailed develop-
ment of the real operations count which was not presented
in the main text is also in this appendix.
Real Operations Count for
IMSL Mixed Radix Algorithm
A copyrighted mixed radix FFT is available through
the International Mathematical and Statistical Library (IMSL)
on the CDC computer used at AFIT. This subroutine (FFTCC)
can accept any length sequence N including prime numbers.
It is based on an article written by Singleton, "On
Computing the Fast Fourier Transform" published in 1967.
Functionally this subroutine has few differences from
Singleton's algorithm described in the preceding section.
The factoring, twiddle factors, and reordering of the data
are the same; however, the special sections for factors of
3 and 4 require 2 and 8 more additions, respectively, than
Singleton's subroutine. Also this mixed radix algorithm
uses the general factors section for odd prime factors of
5 or greater which further reduces the efficiency compared
to Singleton's.
As in the case of Singleton's FFT subroutine the real
operations count for the IMSL subroutine is determined from
the number of twiddle factors:
$\sum_{i=1}^{m} N(p_i - 1)/p_i - (N-1)$    (G.1)
and the number of butterflies:
$\sum_{i=1}^{m} N/p_i$    (G.2)
where N = p1 p2 ... pm. In this subroutine the factoring is
performed such that N = 2^r 3^s 4^t p1 ... pk with the
real operations count being derived from the FORTRAN
coded subroutine FFTCC and Eqs (G.1) and (G.2). The
radix-2 section of FFTCC includes the twiddle factor multi-
plications with the butterfly computation. In this case
there are rN/2 butterflies and twiddle factors to be com-
puted using 4 real multiplications and 6 real additions
giving:
real mult = 4(rN/2) = 2rN (G.3)
real adds = 6(rN/2) = 3rN (G.4)
The radix-3 section uses sN/3 butterflies and 2sN/3 twiddle
factors which require 4 real multiplications and 14 additions
per butterfly and 4 real multiplications and 2 real additions
per twiddle factor. Combining the butterflies and twiddle
factors the real operations count for the radix-3 section
is given by:
real mult = 4(2sN/3) + 4(sN/3) = 4sN (G.5)
real adds = 14(sN/3) + 2(2sN/3) = 6sN (G.6)
The radix-4 section uses 24 real additions and no real
multiplications for the tN/4 butterflies. The 3tN/4 twiddle
factors require 2 real additions and 4 real multiplications.
Combining the results gives:
real mult = 3tN (G.7)
real adds = 24tN/4 + 2(3tN/4)
          = 15tN/2 (G.8)
All odd prime factors equal to or greater than 5 use
the general transform section. Based on the FORTRAN
program written by IMSL there are five sources of real
operations in this general radix-$p_i$ transform excluding the
array indexing additions. First the complex multipliers
are computed for the butterfly transmittance:
$\text{real mult} = 2(p_i - 1)$    (G.9)
$\text{real adds} = (p_i - 1)$    (G.10)
for each new factor $p_i$, e.g., N=7*4=28 and N=7*7*4=196
each require the same $(p_i - 1) = (7-1)$ complex multiplications
for the factor $p_i = 7$. Second the complex twiddle factor
multiplications are performed on the data array. Assuming
N can be factored as:
$N = 2^r 3^s 4^t p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$
where $p_i^{m_i}$ represents the i-th factor raised to some positive
integer $m_i$, the number of complex twiddles is $m_i N(p_i - 1)/p_i$
$- (N-1)$. The N-1 term is subtracted only once for each FFT,
which means the intermediate result can be written as:
$\text{real mult} = 4 m_i N(p_i - 1)/p_i$    (G.11)
$\text{real adds} = 2 m_i N(p_i - 1)/p_i$    (G.12)
The individual butterflies are computed next. The first
output of each butterfly requires only $8(p_i - 1)/2$ real
additions and no multiplications. For each radix-$p_i$ there
are $m_i N/p_i$ butterflies in the FFT giving:
$\text{real adds} = (8(p_i - 1)/2)(m_i N/p_i) = 4 m_i N(p_i - 1)/p_i$    (G.13)
Now the remaining portion of each butterfly is computed
using $(p_i - 1)^2$ real multiplications and additions. This
gives a total of:
$\text{real mult} = N(p_i - 1)^2 m_i/p_i$    (G.14)
$\text{real adds} = N(p_i - 1)^2 m_i/p_i$    (G.15)
Finally the results of the butterfly operations are stored
in the proper array locations requiring 4 real additions
times $(p_i - 1)/2$ times the number of radix-$p_i$ butterflies.
This total is:
$\text{real adds} = (4(p_i - 1)/2)(N m_i/p_i) = 2 m_i N(p_i - 1)/p_i$    (G.16)
Combining Eqs (G.9) - (G.16) the number of real
operations for the $p_i$ factors becomes:
$\text{real mult} = \sum_{i=1}^{k} \left( 2(p_i - 1) + 4 m_i N(p_i - 1)/p_i + N(p_i - 1)^2 m_i/p_i \right)$    (G.17)
$\text{real adds} = \sum_{i=1}^{k} \left( (p_i - 1) + 2 m_i N(p_i - 1)/p_i + 4 m_i N(p_i - 1)/p_i + N(p_i - 1)^2 m_i/p_i + 2 m_i N(p_i - 1)/p_i \right)$
$= \sum_{i=1}^{k} \left( (p_i - 1) + 8 m_i N(p_i - 1)/p_i + N(p_i - 1)^2 m_i/p_i \right)$    (G.18)
Using Eqs (G.17) and (G.18) for the odd prime factors and
the real operations count for factors of 2, 3, and 4 the
total operations count for N = 2^r 3^s 4^t p1 ... pk can
be written as:
$\text{real mult} = 2rN + 4sN + 3tN + \sum_{i=1}^{k} \left( 2(p_i - 1) + 4 m_i N(p_i - 1)/p_i + N m_i (p_i - 1)^2/p_i \right) - 4(N-1)$    (G.19)
$\text{real adds} = 3rN + 6sN + 15tN/2 + \sum_{i=1}^{k} \left( (p_i - 1) + 8 m_i N(p_i - 1)/p_i + N m_i (p_i - 1)^2/p_i \right) - 2(N-1)$    (G.20)
As in any FFT the real operations associated with the
twiddle factors have been reduced by (N-1) multiplications
and additions because the last stage of a decimation-in-
frequency or the first stage of a decimation-in-time FFT
requires no twiddles.
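Eqs (G.19) - (G.20) can be evaluated numerically with the Python
sketch below. It is an illustration only and does not reproduce any
IMSL code. The function name imsl_real_ops and the dictionary
argument odd_primes, which maps each odd prime factor of 5 or
greater to its exponent, are hypothetical.

```python
from fractions import Fraction


def imsl_real_ops(n, r, s, t, odd_primes):
    """Evaluate Eqs (G.19)-(G.20) for N = 2**r * 3**s * 4**t * prod(p**m)."""
    expected = 2**r * 3**s * 4**t
    for p, m in odd_primes.items():
        expected *= p**m
    if expected != n:
        raise ValueError("n does not match the supplied factorization")
    # Contributions of the special radix-2, radix-3, and radix-4 sections.
    mult = Fraction(2 * r * n + 4 * s * n + 3 * t * n)
    adds = Fraction(3 * r * n + 6 * s * n) + Fraction(15 * t * n, 2)
    # Contributions of the general section, one term per odd prime factor.
    for p, m in odd_primes.items():
        twiddle_term = Fraction(m * n * (p - 1), p)         # m N (p-1)/p
        butterfly_term = Fraction(m * n * (p - 1) ** 2, p)  # m N (p-1)^2/p
        mult += 2 * (p - 1) + 4 * twiddle_term + butterfly_term
        adds += (p - 1) + 8 * twiddle_term + butterfly_term
    # The N-1 trivial complex twiddles are subtracted once per transform.
    mult -= 4 * (n - 1)
    adds -= 2 * (n - 1)
    return int(mult), int(adds)
```

For N=28=4x7 (t=1 and one factor of 7) this gives 228 real
multiplications and 498 real additions.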
Appendix H. An Algorithm for Computing the WFTA
This program computes the DFT defined by:
$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j2\pi nk/N)$ ;  k = 0, 1, ..., N-1
where the sequence length N is a product of the relatively
prime factors from the set (2,3,4,5,7,8,9,16).
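The sequence lengths permitted by this rule can be enumerated
directly. The Python sketch below is illustrative and applies only
the stated rule (a product of mutually prime members of the set);
the function name wfta_lengths is hypothetical. Because 2, 4, 8, and
16 share a common factor, at most one of them can be used, and the
same holds for 3 and 9.

```python
from itertools import product


def wfta_lengths():
    """List every N > 1 that is a product of mutually prime set members."""
    choices = [
        (1, 2, 4, 8, 16),  # at most one power of 2
        (1, 3, 9),         # at most one power of 3
        (1, 5),
        (1, 7),
    ]
    lengths = sorted(a * b * c * d for a, b, c, d in product(*choices))
    return [n for n in lengths if n > 1]
```

Under this rule there are 59 permissible lengths, the largest being
5x7x9x16 = 5040.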
Program Description. The WFTA consists of the six
subroutines PERM 1, PERM 2, MULT, WEAVE 1, WEAVE 2, and
INISHL. Step One is to map the sequence x(n) into a
u-dimensional array s(n1, n2, ..., nu). Step Two implements
the "pre-weave" modules in subroutine WEAVE 1, one for each
factor of N. Each of the pre-weave modules contains only
additions. Step Three performs a point by point multiply
on the data array (subroutine MULT) of real constants
derived from the small-N DFT algorithms. These constant
multipliers are a function of the complex exponentials of
WN and are the only complex multiplications required in
the algorithm. Step Four implements the post-weave
module (subroutine WEAVE 2) which contains additions,
subtractions, and multiplies by j. Step Five maps the
u-dimensional array s(k1, k2, ..., ku) into the correct
one-dimensional DFT X(k) according to the Chinese remainder
theorem given in Eq (3.144) (McClellan and Nawab, 1979).
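The Chinese remainder theorem mapping used in Step Five can be
sketched in Python as follows. This is the textbook form of the
reconstruction, offered as an illustration; it is not a transcription
of Eq (3.144) or of the WFTA permutation subroutines, and the
function name crt_index is hypothetical. It requires Python 3.8 or
later for the modular inverse pow(a, -1, m).

```python
def crt_index(residues, factors):
    """Recover k (0 <= k < N) from its residues modulo mutually prime factors.

    residues[i] = k mod factors[i], with N = product of the factors.
    """
    n = 1
    for f in factors:
        n *= f
    k = 0
    for res, f in zip(residues, factors):
        partial = n // f               # product of the other factors
        inverse = pow(partial, -1, f)  # modular inverse of partial mod f
        k += res * partial * inverse
    return k % n
```

For N=30 with factors 5, 3, and 2 the residues (2, 2, 1) map back
to k=17, and every k from 0 to 29 is recovered from its own residues.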
Arguments. The WFTA is called using the following
arguments. More arguments exist in this list than in the
one given by McClellan and Nawab because array storage is
minimized in this WFTA version.
N = Transform length which must be factorable into
mutually prime factors from the set 2,3,4,5,7,8,9,16.
A list of acceptable sequence lengths is given in the left-
most column of Table 3.9a,b.
XR and XI = The real and imaginary arrays to be trans-
formed and are dimensioned to length N in the calling
program.
INIT = A flag to specify whether the call to FFTWIN
requires initialization. INIT = 0 means initialization
is required and INIT # 0 skips the phase. Initialization
is needed when calling FFTWIN for the first time for a
given sequence length.
IERR = Contains an error code upon return from FFTWIN.
If the DFT was successful IERR = 0; if an error occurred
IERR = -1 or -2. There are two causes for an error:
(1) The transform length is illegal, or
(2) The program has not been initialized for
the correct length N sequence.
SR and SI = One-dimensional working arrays of length
M = M1 x M2 x M3 x M4 which is the product of the multi-
plies required by the small-N algorithms. The value of M
for any permissible N is given in Table H.1 in the right-
most column.
COEF = One-dimensional array of length M used to store
the constant coefficients generated by INISHL for the
"weave" modules.
INDX 1 and INDX 2 = One-dimensional length N mapping
vectors for pre- and post-permutations of the data.
Usage
(1) Specify the input sequences XR and XI with parameters
N, INIT, IERR, SR, SI, COEF, INDX 1, INDX 2.
(2) Call WFTA (XR, XI, N, INIT, IERR, SR, SI, COEF,
INDX 1, INDX 2).
(3) XR and XI are the output real and imaginary vectors.
The error code IERR=0 specifies successful completion
of the transform.
(4) After the initial call, use INIT # 0 as long as N
remains constant.
[FORTRAN listing of the WFTA program: not legible in this copy.
Surviving comment lines identify the pre-weave and post-weave
modules for the individual factor lengths.]
Appendix I. Computinq the Prime
Factor Algorithm (PFA)
This program computes the DFT defined by:
$X(k) = \sum_{n=0}^{N-1} x(n) \exp(+j2\pi nk/N)$ ;  k = 0, 1, ..., N-1
where the sequence length N is a product of the relatively
prime factors from the set (2,3,4,5,7,8,9 and 16). This
algorithm was proposed by Kolba and Parks in 1977 and was
modified to the program presented here in 1980 by Burrus
and Eschenbacher.
Arguments. The PFA is called using the following
arguments.
N = The transform length, which must factor into
mutually prime factors from the set 2, 3, 4, 5, 7, 8, 9, and 16.
A list of acceptable sequence lengths is given in
Table 3.11a,b.
X and Y = The real and imaginary data arrays containing
the sequence to be transformed. These arrays are dimensioned
to length N.
NI = The array containing the factors of N. If all
four factors are not used, the unused factors are set equal
to 1. For example, with N=30 we have NI(1)=5, NI(2)=3,
NI(3)=2, and NI(4)=1. The factors of one must be the last
of the NI's.
M = The number of nonunity factors. For N=30, M=3.
UNSC = An output indexing constant which must be
precomputed. UNSC = N/(NI(1) + ... + NI(M)).
A and B = Data arrays of length N which contain the
results of the DFT. The real part is in A and the
imaginary part is in B.
Usage. To compute the forward single-variate DFT:
(1) Dimension X, Y, A, and B to length N.
(2) Define N, M, and NI(4).
(3) Compute UNSC.
(4) Input the sequence to be transformed in X and Y.
(5) Call PFA (X,Y,A,B,N,M,NI,UNSC).
(6) The Fourier transform results are located in A and B.
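When checking the contents of A and B, a direct evaluation of the DFT definition gives an independent reference. The Python sketch below is an O(N^2) evaluation of the defining sum, not the PFA itself. The function name `direct_dft` and the `sign` parameter are hypothetical; the default exponent sign follows the definition as printed in this appendix.

```python
import cmath


def direct_dft(x, y, sign=+1):
    """Evaluate X(k) = sum over n of x(n) * exp(sign * j*2*pi*n*k/N) directly.

    x, y : real and imaginary parts of the input (the appendix's X and Y).
    Returns (a, b): real and imaginary parts of the result (A and B).
    """
    n_points = len(x)
    a, b = [], []
    for k in range(n_points):
        total = 0j
        for n in range(n_points):
            twiddle = cmath.exp(sign * 2j * cmath.pi * n * k / n_points)
            total += complex(x[n], y[n]) * twiddle
        a.append(total.real)
        b.append(total.imag)
    return a, b


# A unit impulse at n = 0 transforms to a constant real sequence of ones.
a, b = direct_dft([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
print(a, b)
```

Agreement between such a reference and a fast algorithm to within rounding error is a reasonable sanity check for small N.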
[FORTRAN listing of subroutine PFA; not legible in the scanned copy.]
Appendix J. Timing Tests on the CDC Cyber 74
The timing tests on the CDC Cyber 74 used the FORTRAN
command SECOND(CP) which, according to the FORTRAN IV
reference manual, returns time accurate to "two decimal
places", i.e., 0.01 seconds. The results of timing the
various DFT algorithms showed this clock was accurate to
three decimal places (0.001 seconds), giving a time resolution
of 0.002 seconds. Using three decimal places was
justified since almost every standard deviation was less than
or equal to 0.002 seconds.
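The mean and standard deviation reporting described above can be illustrated with a short Python sketch. It is an analogy only: `time.process_time` stands in for SECOND(CP), the function names are hypothetical, the number of runs is arbitrary, and the sample standard deviation is assumed since the text does not say which form was used.

```python
import statistics
import time


def time_runs(func, runs=10):
    """Call func `runs` times; return (mean, standard deviation) of the
    CPU time per call, in seconds."""
    samples = []
    for _ in range(runs):
        start = time.process_time()  # CPU time of this process
        func()
        samples.append(time.process_time() - start)
    return statistics.mean(samples), statistics.stdev(samples)


def summarize(samples, places=3):
    """Mean and sample standard deviation rounded to `places` decimals."""
    return (round(statistics.mean(samples), places),
            round(statistics.stdev(samples), places))


# Made-up readings in seconds, used only to show the rounding step.
print(summarize([0.271, 0.270, 0.269, 0.272, 0.268]))
```

Rounding to three places is only meaningful when the spread of the readings is of that order, which is the justification the text gives.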
To verify the premise that counting the real operations
performed in a DFT is the primary factor determining execu-
tion speed of the algorithm on a computer, the DFT execution
times were measured on the CDC Cyber 74. The execution speeds
for the WFTA, PFA, and the mixed/fixed radix FFTs were com-
pared to the "predicted" execution speed of the algorithm.
To perform these comparisons the multiply and add speeds
were determined for the Cyber 74 computer.
The execution times of the floating point multiply and
add instructions are given in the CDC 6000 Series Computer
Systems Reference Manual. The execution times for several
instructions are listed below and include preparing the next
instruction for execution:

Instruction         Assembly Language   Minor Cycles
Floating sum        FXi                  4
Floating product    FXi                 10
Normalize result    NXi                  4
Fetch/store         SAi                  3

where one minor cycle equals 0.1 microsecond (µs). Simply
using an add time of 4*0.1 µs and a multiply time of
10*0.1 µs = 1 µs is not sufficient because the operands must
be fetched and stored, which adds more time. To determine
the commands executed by the computer for adds and multiplies,
the assembly (COMPASS) language was studied and timed for
three cases. First, the DO loop with no operations was
executed 100,000 times:
DO 102 J = 1,N
102 CONTINUE
The associated COMPASS language code was listed as an
output of the program:
)AA    BSS   0B
       SB0   B2 + 7B
       SA5   J
       SA4   N
       SX7   X5 + 1B
       IX0   X4 - X7
       SA7   A5
       PL    X5, )AA
This loop required an average of 2.70 µs (standard
deviation 0.03 µs) to execute. Next the addition
instruction was executed 100,000 times using the FORTRAN
code:
DO 102 J = 1,N
102 TAD = A + B
The associated COMPASS code for the addition loop is:
)AA    BSS   0B
       SB0   B2 + 7B
       SA5   A
       SA4   B
       SA3   J
       SA2   N
       FX0   X4 + X5
       NX7   B0, X0
       SX6   X3 + 1B
       IX5   X2 - X6
       SA6   A3
       SA7   TAD
       PL    X5, )AA
This add loop required an average of 3.34 µs (standard
deviation 0.03 µs) to execute. Notice the "extra" instructions
of the add loop versus the no operation loop:

Command          Minor Cycles
SA5 A             3
SA4 B             3
FX0 X4 + X5       4
NX7 B0, X0        4
SA7 TAD           3
Total            17

Finally the multiply loop was executed 100,000 times.
The FORTRAN code is:
DO 102 J = 1, N
102 TAD = A*B
and the corresponding COMPASS code loop is:
)AA    BSS   0B
       SB0   B2 + 7B
       SA5   B
       SA4   A
       SA3   J
       SA2   N
       FX7   X4*X5
       SX6   X3 + 1B
       IX0   X2 - X6
       SA6   A3
       SA7   TAD
       PL    X5, )AA
The multiply loop averaged 3.37 µs (standard deviation 0.03 µs)
to execute. The extra instructions required for the multiply
loop relative to the no operation loop are:

Command          Minor Cycles
SA5 B             3
SA4 A             3
FX7 X4*X5        10
SA7 TAD           3
Total            19

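The minor cycle totals for the two loops can be tallied as follows. This Python sketch only restates the arithmetic of the tables, taking the floating sum at 4 minor cycles as in the instruction table; the dictionary and function names are hypothetical.

```python
MINOR_CYCLES_PER_US = 10  # one minor cycle = 0.1 microsecond

# Extra instructions relative to the empty loop, with minor-cycle costs.
ADD_EXTRA = {
    "SA5 A": 3,
    "SA4 B": 3,
    "FX0 X4 + X5": 4,
    "NX7 B0, X0": 4,
    "SA7 TAD": 3,
}
MULTIPLY_EXTRA = {
    "SA5 B": 3,
    "SA4 A": 3,
    "FX7 X4*X5": 10,
    "SA7 TAD": 3,
}


def extra_cycles(instructions):
    """Total minor cycles of the listed instructions."""
    return sum(instructions.values())


def cycles_to_us(cycles):
    """Convert minor cycles to microseconds."""
    return cycles / MINOR_CYCLES_PER_US


print(extra_cycles(ADD_EXTRA), cycles_to_us(extra_cycles(ADD_EXTRA)))            # 17 1.7
print(extra_cycles(MULTIPLY_EXTRA), cycles_to_us(extra_cycles(MULTIPLY_EXTRA)))  # 19 1.9
```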
Comparing the measured execution times of the three
loops shows the add loop is 0.64 µs longer and the multiply
loop is 0.67 µs longer than the "no operation" loop. Based on the
minor cycle times for the extra add and multiply commands,
the add loop should be 17*0.1 µs = 1.7 µs longer and the multiply
loop should be 19*0.1 µs = 1.9 µs longer than the "no operation"
loop. (Notice that every floating point addition must be
"normalized" by the command NX7, which requires 4 minor
cycles. The floating point product does not require normalization.)
The difference in measured add and multiply speed (0.64 µs
and 0.67 µs) versus the predicted add and multiply speed
(1.7 µs and 1.9 µs) is a result of the very short loops
fitting inside the Cyber's "instruction/execution stack",
which is a 12 word stack with 60 bits per word. Since the
entire loop could fit in the stack, the instructions were
fetched only once instead of 100,000 times, whereas "all
execution times (minor cycles) listed include readying the
next instruction for execution". During normal DFT
algorithm execution all of the instructions must be
fetched, which means the add speed is 1.7 µs and the multiply
speed is 1.9 µs. These numbers were then used to predict
execution speed of the DFT algorithms.
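One way to turn these per-operation speeds into a predicted execution time is a linear model in the operation counts. The Python sketch below assumes that model; the appendix does not spell out the exact formula, and the function name and the operation counts in the example are hypothetical.

```python
ADD_CYCLES = 17        # minor cycles per real add (1.7 microseconds)
MULTIPLY_CYCLES = 19   # minor cycles per real multiply (1.9 microseconds)
MINOR_CYCLES_PER_US = 10


def predicted_time_us(real_multiplies, real_adds):
    """Predicted execution time in microseconds, assuming the time is
    the sum of per-operation costs and nothing else."""
    cycles = MULTIPLY_CYCLES * real_multiplies + ADD_CYCLES * real_adds
    return cycles / MINOR_CYCLES_PER_US


# Made-up operation counts: 100 multiplies and 200 adds.
print(predicted_time_us(100, 200))  # 530.0
```

Working in integer minor cycles and dividing once at the end avoids accumulating rounding error from repeated multiplication by 0.1.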
Vita
John David Blanken was born on 18 April 1953 in
Junction City, Kansas. He graduated from Junction City
High School in 1971 and attended Kansas State University,
from which he received a Bachelor of Science in Electrical
Engineering in May 1975. Upon graduation, he was designated
an AFROTC distinguished graduate and received a
commission in the United States Air Force. He entered
active duty 15 July 1975 and was assigned to the Air Force
Avionics Laboratory at Wright-Patterson Air Force Base,
Ohio. On June 6, 1979 he was assigned to the School of
Engineering, Air Force Institute of Technology.
Permanent Address: 716 West 8th Street
                   Junction City, Kansas 66441