Parallelizing the Fast Fourier Transform
David Monismith
CS599
Outline
• An example use of the Fast Fourier Transform (FFT): audio processing.
• Explanation of the Discrete Fourier Transform (DFT).
• Making use of divide and conquer to implement the recursive FFT (Cooley-Tukey algorithm).
• Creating an iterative algorithm.
• Parallelization of the FFT.
Examples: Audio Processing
• Compression
– In audio and video processing, often only certain frequencies can be heard or seen.
– The FFT or a similar tool can be used to remove frequency data that cannot be seen or heard.
– Only the important data (the frequencies we care about) needs to be stored.
– An inverse FFT (or similar operation) can be applied to decompress the data.
• Audio synthesis
– Frequencies can be quickly added or adjusted and then converted to a signal (a sound) by using the (inverse) FFT.
– Such operations are often applied in audio synthesizers.
Band Pass Filter Algorithm
• Convert the signal to the frequency domain with the FFT.
• Multiply the desired frequencies by 1.
• Multiply the remaining frequencies by zero.
• Apply the inverse FFT to convert the signal back to the space-time domain.
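The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a naive O(n^2) DFT in place of an FFT so that it stays self-contained, the names `dft` and `band_pass` are hypothetical, and the mirror bins n - k are also kept so that the filtered signal remains real-valued.

```python
import cmath
import math


def dft(x, sign=1):
    """Naive O(n^2) DFT. sign=+1 is the forward transform, sign=-1 the unscaled inverse."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * t * k / n) for t in range(n))
            for k in range(n)]


def band_pass(signal, keep):
    """Keep only the frequency bins in `keep` (and their mirror bins); zero the rest."""
    n = len(signal)
    spectrum = dft(signal, +1)                        # step 1: to the frequency domain
    wanted = set(keep) | {(n - k) % n for k in keep}  # mirror bins keep the output real
    filtered = [c if k in wanted else 0 for k, c in enumerate(spectrum)]  # steps 2 and 3
    # step 4: inverse transform uses the opposite sign and a 1/n scaling
    return [v.real / n for v in dft(filtered, -1)]


# A low tone (2 cycles per window) mixed with a higher tone (9 cycles per window).
n = 32
low = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
high = [0.5 * math.sin(2 * math.pi * 9 * t / n) for t in range(n)]
mixed = [a + b for a, b in zip(low, high)]
recovered = band_pass(mixed, keep=[2])  # should match the low tone alone
```

Multiplying by 1 or 0 is expressed here as keeping or zeroing a bin; a production filter would normally use a library FFT and a smoother frequency mask.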
Discrete Fourier Transform
• Converts a sequence of values from the space-time domain to the frequency domain.
• Useful for signal processing.
• The standard DFT is too slow for practical use on large inputs: it requires O(n^2) operations.
• Notice that each output value is a summation over all the components of the space-time sequence.
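Discrete Fourier Transform
The nth root of unity is defined as ω_n, as shown below. (The positive exponent matches the Cooley-Tukey pseudocode later in these slides.)
$$\omega_n = e^{2\pi i / n}$$
The DFT can be performed as shown in the equation below.
$$X[k] = \sum_{j=0}^{n-1} x[j]\,\omega_n^{jk}, \qquad k = 0, 1, \ldots, n-1$$
Written as a matrix-vector product, the matrix entry in row k and column j is ω_n^{jk}.
Many values in the matrix are repeated, but the repetition is not obvious.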
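As a rough illustration of the O(n^2) cost, the DFT sum can be written directly as two nested loops over k and j. This is a minimal sketch, assuming the positive-exponent root of unity e^(2*PI*I/n) that the pseudocode in these slides uses; the function name `naive_dft` is hypothetical.

```python
import cmath
import math


def naive_dft(x):
    """Direct DFT: n output values, each a sum of n terms, so n*n multiplications."""
    n = len(x)
    omega = cmath.exp(2j * math.pi / n)  # principal nth root of unity (positive exponent)
    return [sum(x[j] * omega ** (j * k) for j in range(n)) for k in range(n)]


X = naive_dft([1, 2, 3, 4])
# For n = 4 the root of unity is i, so X[1] = 1 + 2i - 3 - 4i = -2 - 2i.
```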
Example of Repetition
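Assume that N = 5.
Notice that the first power of the root of unity is:
$$\omega_5^{1} = e^{2\pi i / 5}$$
The sixth power of the root of unity is the same value:
$$\omega_5^{6} = e^{12\pi i / 5} = e^{2\pi i}\, e^{2\pi i / 5} = \omega_5^{1}$$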
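The repetition can be checked numerically. The sketch below assumes the root of unity e^(2*PI*I/N) with N = 5 and uses a hypothetical helper `omega_power`; it also counts how many distinct exponents (reduced mod N) appear among the N*N entries of the DFT matrix.

```python
import cmath
import math


def omega_power(n, p):
    """Return the p-th power of the principal nth root of unity e^(2*pi*i/n)."""
    return cmath.exp(2j * math.pi * p / n)


n = 5
first = omega_power(n, 1)
sixth = omega_power(n, 6)   # same point on the unit circle as `first`

# Exponents j*k of the n-by-n DFT matrix, reduced mod n: only n distinct values
# occur even though the matrix has n*n entries.
distinct_exponents = {(j * k) % n for j in range(n) for k in range(n)}
```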
Fast Fourier Transform (FFT)
• The repetition in the roots of unity is most regular, and easiest to exploit, when the array size is a power of two.
• This can be taken advantage of with a divide and conquer algorithm called the FFT.
• The FFT can compute a DFT in O(n lg n) time by multiplying array values by the appropriate root of unity and adding the array value at an appropriate stride.
• This algorithm allows for fast compression and manipulation of signals.
Divide and Conquer in the FFT
[Figure: divide and conquer diagram for an 8-point FFT. Rows, top to bottom: input x[0..7], intermediate stages s1[0..7] and s2[0..7], and output X[0..7].]
Cooley-Tukey FFT Algorithm
y = fft(x, n, stride)
    if(n == 1)
        y[0] = x[0];
    else
        //y1 transforms the even-indexed elements, y2 the odd-indexed elements
        y1 = fft(x, n/2, 2*stride);
        y2 = fft(x+stride, n/2, 2*stride);
        for(int i = 0; i < n/2; i++)
            y[i] = y1[i] + e^((2*PI*I*i)/n)*y2[i];
            y[i+n/2] = y1[i] + e^((2*PI*I*(i+n/2))/n)*y2[i];
        end for
    end if
end fft
Example Code
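• Let’s quickly take a look at some example code for the recursive FFT.
• Given an array of 8 values:
[ 7.0, 9.0, -1.3, 6.3, 8.5, 4.2, 9.1, -5.2 ]
• Matlab (and Octave) can be used to check the result. With the e^(+2*PI*I/n) sign convention of the pseudocode, the FFT is:
[37.60 + 0.00*i, -6.24 + 1.13*i, 7.70 + 12.10*i, 3.24 + 21.93*i, 9.00 + 0.00*i, 3.24 - 21.93*i, 7.70 - 12.10*i, -6.24 - 1.13*i]
• Matlab's fft uses e^(-2*PI*I/n), so for this real-valued input it reports the complex conjugates of these values.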
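A possible runnable version of the recursive algorithm is sketched below in Python. It is not the author's example code: the function name `fft_recursive` is hypothetical, list slicing stands in for the pointer-plus-stride arguments of the pseudocode, and the positive exponent e^(+2*PI*I/n) is assumed so that the output lines up with the values on this slide. The input is the eight sample values 7.0 through -5.2.

```python
import cmath
import math


def fft_recursive(x):
    """Radix-2 Cooley-Tukey FFT with the e^(+2*pi*i/n) convention of the pseudocode."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_recursive(x[0::2])  # plays the role of fft(x, n/2, 2*stride)
    odd = fft_recursive(x[1::2])   # plays the role of fft(x+stride, n/2, 2*stride)
    y = [0j] * n
    for i in range(n // 2):
        t = cmath.exp(2j * math.pi * i / n) * odd[i]
        y[i] = even[i] + t
        # e^(2*pi*i*(i + n/2)/n) = -e^(2*pi*i*i/n), so the second output subtracts t
        y[i + n // 2] = even[i] - t
    return y


samples = [7.0, 9.0, -1.3, 6.3, 8.5, 4.2, 9.1, -5.2]
spectrum = fft_recursive(samples)
```

Slicing copies the input at every level, which costs extra memory compared with the stride-based pseudocode; it is used here only to keep the sketch short.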
Iterative FFT
y = fft(x, n) {
    r = ceil(log2(n));
    //Allocate arrays R and S to be of size n
    R = x;
    for(m = 0; m < r; m++) {
        S = R; //copy the previous stage, i.e. S[i] = R[i] for all i
        //Elements to add at each stage differ in exactly one bit.
        bit = 1 << (r - m - 1);
        notBit = ~bit;
        for(i = 0; i < n; i++) {
            j = i & notBit;
            k = i | bit;
            expFactor = revAndShift(i, r, m);
            R[i] = S[j] + S[k] * cexp( (2*PI*I*expFactor)/n );
        }
    }
    //R[i] now holds the DFT value whose index is i with its r bits reversed.
    y = R;
}
Reverse and Shift Function
//Given i = b0 b1 b2 … br-1 obtain bm bm-1 … b0 0 … 0
//Note that there are r - m - 1 zeros.
result = revAndShift(unsigned int i, int r, int m) {
    i = i >> (r-m-1); //remove unwanted bits
    result = 0;
    for(int j = 0; j < m+1; j++) {
        result |= i & 1;
        i = i >> 1;
        if(j < m)
            result = result << 1;
    }
    //pad result with zeros
    result = result << (r-m-1);
}
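The iterative pseudocode and its helper can be tried out with the Python sketch below. The names `rev_and_shift`, `fft_iterative`, and `bit_reverse` are hypothetical transliterations, and the positive exponent is assumed as in the pseudocode. Tracing the stages by hand suggests that the result is left in bit-reversed order, so `bit_reverse` is included to map position i back to the DFT index it holds.

```python
import cmath
import math


def rev_and_shift(i, r, m):
    """Reverse the top m+1 bits of the r-bit index i, then pad with r-m-1 zeros."""
    i >>= (r - m - 1)                 # drop the unwanted low bits
    result = 0
    for _ in range(m + 1):
        result = (result << 1) | (i & 1)
        i >>= 1
    return result << (r - m - 1)


def fft_iterative(x):
    """Iterative FFT following the slide; output position i holds X[bit_reverse(i)]."""
    n = len(x)
    r = n.bit_length() - 1
    if n < 1 or (1 << r) != n:
        raise ValueError("length must be a power of two")
    R = [complex(v) for v in x]
    for m in range(r):
        S = R[:]                      # snapshot of the previous stage
        bit = 1 << (r - m - 1)
        for i in range(n):
            j = i & ~bit              # partner index with the stage bit cleared
            k = i | bit               # partner index with the stage bit set
            w = cmath.exp(2j * math.pi * rev_and_shift(i, r, m) / n)
            R[i] = S[j] + S[k] * w
    return R


def bit_reverse(i, r):
    """Reverse the r-bit binary representation of i."""
    return int(format(i, "0{}b".format(r))[::-1], 2) if r else 0


samples = [7.0, 9.0, -1.3, 6.3, 8.5, 4.2, 9.1, -5.2]
unordered = fft_iterative(samples)
ordered = [unordered[bit_reverse(k, 3)] for k in range(8)]  # natural-order coefficients
```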
Parallelizing the FFT
• Assume the number of processes (i.e. running programs) is a power of 2.
• The number of processes will be referred to as npes.
• Assume the array size (N) is a power of 2 and is larger than the number of processes.
• Partition the array into npes chunks of N/npes elements each.
• Assign one chunk to each process.
Parallelizing the FFT
[Figure: the 8-point FFT diagram (rows x[0..7], s1[0..7], s2[0..7], X[0..7]) with its columns divided among Process 0, Process 1, Process 2, and Process 3.]
Parallelizing the FFT
• Notice that in each of the first lg(npes) stages, every process must send data to another process and receive data from it.
• Additionally, notice that within a stage where data transfer must occur, each process sends to and receives from the same partner process.
• This operation can be accomplished using a function called MPI_Sendrecv from the Message Passing Interface (MPI) API.
• We will investigate the algorithm to perform this operation next.
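Before looking at the full algorithm, the pairing of processes can be sketched without any MPI calls. The helper below is hypothetical; it applies the splitPoint rule from the parallel pseudocode in these slides to report which partner a rank would exchange data with in stage m, or None when the stage needs no communication.

```python
def exchange_partner(rank, npes, m):
    """Partner rank for stage m, or None if stage m only uses local data."""
    split_point = npes >> (m + 1)     # same as npes / (1 << (m+1)) in the pseudocode
    if split_point == 0:
        return None                   # the partner elements live in the same chunk
    if rank % (2 * split_point) < split_point:
        return rank + split_point     # lower half of the pair talks to the upper half
    return rank - split_point         # upper half of the pair talks to the lower half


# Example: 4 processes and an 8-element array, so r = 3 stages.
npes = 4
schedule = {rank: [exchange_partner(rank, npes, m) for m in range(3)]
            for rank in range(npes)}
```

In an MPI program each non-None entry would correspond to one paired send and receive with that partner, which is the situation MPI_Sendrecv is designed for.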
Parallel Algorithm
//Rank is the process id and npes is the number of processes
y = fft(x, n, rank, npes) {
    r = ceil(log2(n));
    workToDo = n/npes;
    start = rank*workToDo;
    end = start + workToDo;
    //Allocate arrays S, Sk, and R of size workToDo
    R = x[start…end-1];
    for(int m = 0; m < r; m++) {
        Sk = S = R;
        bit = 1 << (r - m - 1), notbit = ~bit;
        splitPoint = npes / (1 << (m+1));
Parallel FFT Algorithm, Cont’d
        if(splitPoint > 0) {
            if( ( rank % (splitPoint << 1) ) < splitPoint)
                Send S to process rank + splitPoint, and receive Sk from rank + splitPoint.
            else
                Send Sk to process rank - splitPoint, and receive S from rank - splitPoint.
        }
        else
            Sk = S;
Parallel FFT Algorithm, Cont’d
        for(int i = start, l = 0; l < workToDo; i++, l++) {
            j = (i & notbit) % workToDo;
            k = (i | bit) % workToDo;
            expFactor = revAndShift(i, r, m);
            R[l] = S[j] + Sk[k] * e^( (2*PI*I*expFactor)/n );
        }
    }
    y = R;
}
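The parallel algorithm can be exercised on a single machine by simulating the processes in a loop. The Python sketch below is only a serial simulation under stated assumptions: no MPI library is used, reading a partner's chunk from a per-stage snapshot stands in for the send and receive step, and the names `rev_and_shift` and `simulated_parallel_fft` are hypothetical. As with the iterative version, the concatenated output appears to be in bit-reversed order.

```python
import cmath
import math


def rev_and_shift(i, r, m):
    """Reverse the top m+1 bits of the r-bit index i, then pad with r-m-1 zeros."""
    i >>= (r - m - 1)
    result = 0
    for _ in range(m + 1):
        result = (result << 1) | (i & 1)
        i >>= 1
    return result << (r - m - 1)


def simulated_parallel_fft(x, npes):
    """Run the per-process loop of the slide for every rank, one stage at a time."""
    n = len(x)
    r = n.bit_length() - 1
    if n < 1 or (1 << r) != n or npes < 1 or npes & (npes - 1) or npes > n:
        raise ValueError("n and npes must be powers of two with npes <= n")
    work = n // npes                                  # workToDo in the pseudocode
    local = [[complex(v) for v in x[p * work:(p + 1) * work]] for p in range(npes)]
    for m in range(r):
        bit = 1 << (r - m - 1)
        split_point = npes >> (m + 1)
        previous = [chunk[:] for chunk in local]      # data as it stood before stage m
        for rank in range(npes):
            S = Sk = previous[rank]
            if split_point > 0:
                if rank % (2 * split_point) < split_point:
                    Sk = previous[rank + split_point]  # "receive Sk" from the upper partner
                else:
                    S = previous[rank - split_point]   # "receive S" from the lower partner
            start = rank * work
            for l in range(work):
                i = start + l                          # global index of local element l
                j = (i & ~bit) % work
                k = (i | bit) % work
                w = cmath.exp(2j * math.pi * rev_and_shift(i, r, m) / n)
                local[rank][l] = S[j] + Sk[k] * w
    return [v for chunk in local for v in chunk]


samples = [7.0, 9.0, -1.3, 6.3, 8.5, 4.2, 9.1, -5.2]
result = simulated_parallel_fft(samples, 4)
```

Because every rank reads only the snapshot taken before the stage, the simulation behaves as if all exchanges in a stage happened simultaneously.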
References
• [1] A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing, 2nd Edition, 2003
• [2] J. Demmel, Fast Fourier Transform Lecture, Efficient Algorithms and Intractable Problems, Spring 2007, http://www.cs.berkeley.edu/~demmel/cs170_spr07/LectureNotes/Lecture_FFT.pdf
• [3] Discrete Fourier Transform, http://en.wikipedia.org/wiki/Discrete_Fourier_transform
• [4] Fast Fourier Transform, http://en.wikipedia.org/wiki/Fast_Fourier_transform
• [5] Gilbert Strang, http://en.wikipedia.org/wiki/Gilbert_Strang
• [6] Additive Synthesis, http://en.wikipedia.org/wiki/Additive_synthesis